Text augmentation based on operation weighting using genetic algorithm

Document Type : Article

Authors

Department of Information Technology Engineering, Faculty of Electrical and Computer Engineering, University of Sistan and Baluchestan, Zahedan, Iran

Abstract

A shortage of training samples is one of the major challenges in deep learning, and data augmentation is one promising solution. Most existing methods for text data augmentation use a fixed strategy: simple operations such as word replacement, insertion, deletion, and shuffling are selected at random and applied to words that are themselves sampled uniformly at random. This paper proposes a task-independent text augmentation approach that weights the augmentation operations using a genetic algorithm and thereby chooses an appropriate operation type and position for each sentence in the dataset. To evaluate the proposed method, extensive experiments were conducted on several sentiment analysis datasets. Compared with the baseline (no data augmentation), EDA (a well-known task-independent text augmentation method), and TTA (a state-of-the-art text augmentation method for sentiment analysis), the proposed method improves average accuracy by 9.19%, 3.63%, and 1.04%, respectively, on datasets of size 100, and by 5.27%, 3.18%, and 1.18%, respectively, on datasets of size 500.
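
The general idea of weighted operation selection can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the chromosome encoding, the genetic operators, or the fitness function, so everything below is an assumption made for illustration. The names `apply_op`, `augment`, `evolve`, and `toy_fitness` are hypothetical, the `<w>` filler stands in for a real synonym lookup, and the toy fitness replaces what would presumably be a classifier-based score.

```python
import random

# The four simple operations named in the abstract ("swap" stands in for shuffling).
OPS = ("replace", "insert", "delete", "swap")


def apply_op(words, op, pos, rng, filler="<w>"):
    """Apply one edit at index pos; filler is a placeholder for a real synonym."""
    words = list(words)
    if op == "replace":
        words[pos] = filler
    elif op == "insert":
        words.insert(pos, filler)
    elif op == "delete" and len(words) > 1:
        del words[pos]
    elif op == "swap" and len(words) > 1:
        other = rng.randrange(len(words))
        words[pos], words[other] = words[other], words[pos]
    return words


def augment(words, op_weights, pos_weights, rng):
    """Sample the operation type and position from weights instead of uniformly."""
    op = rng.choices(OPS, weights=op_weights, k=1)[0]
    pos = rng.choices(range(len(words)), weights=pos_weights, k=1)[0]
    return apply_op(words, op, pos, rng)


def evolve(fitness, n_genes, rng, pop_size=20, generations=30, mut_rate=0.2):
    """Toy GA (assumed design): truncation selection, one-point crossover, Gaussian mutation."""
    pop = [[rng.random() + 1e-6 for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # the fitter half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)
            child = a[:cut] + b[cut:]
            # Mutate some genes, keeping every weight strictly positive.
            child = [
                max(1e-6, g + rng.gauss(0, 0.1)) if rng.random() < mut_rate else g
                for g in child
            ]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


rng = random.Random(0)
sentence = "the movie was not good".split()


def toy_fitness(genes):
    """Hypothetical fitness: favor replacement and avoid editing the word 'not'."""
    op_w, pos_w = genes[: len(OPS)], genes[len(OPS):]
    protect = pos_w[sentence.index("not")] / sum(pos_w)
    return op_w[0] / sum(op_w) - protect


# Chromosome layout (assumed): four operation weights, then one weight per word position.
best = evolve(toy_fitness, len(OPS) + len(sentence), rng)
print(augment(sentence, best[: len(OPS)], best[len(OPS):], rng))
```

In this sketch a single chromosome holds both the operation weights and the per-position weights for one sentence, so the search can learn, for example, to leave a negation word untouched. A fixed-strategy method in the style described above corresponds to passing equal weights to `augment`.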

Articles in Press, Accepted Manuscript
Available Online from 05 March 2025
  • Receive Date: 20 September 2024
  • Revise Date: 26 November 2024
  • Accept Date: 05 March 2025