Using generative adversarial networks to increase the classification efficiency of imbalanced user reviews

Document Type : Original Article

Authors

1 Computer Engineering Department, Shahrood University of Technology

2 Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Ir

Abstract

Text generation methods use artificial intelligence to produce natural language text automatically. One application of text generation is text classification. Many real-world problems involve imbalanced textual data, which can reduce classification efficiency. One approach to the imbalanced data problem is oversampling the minority class. Given the progress of generative adversarial networks (GANs) in data generation, these networks can be used to generate the text samples needed for oversampling. Generating text with GANs is difficult, however, because text is discrete. Despite their potential, GANs have rarely been investigated as a solution to the problem of imbalanced textual data. This article examines the effect of using the SentiGAN network to address imbalanced user reviews, with the aim of improving classification efficiency. To evaluate the proposed method, four classification algorithms were applied to the data before and after oversampling with traditional methods, recent methods, and SentiGAN, and the evaluation criteria were calculated in each case. Oversampling with SentiGAN increased the accuracy, precision, specificity, and F-score of the zero class compared with both the imbalanced data and the data oversampled by the other methods.
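
To make the evaluation setup concrete, the following is a minimal Python sketch of two pieces named in the abstract: minority-class oversampling and the per-class evaluation criteria. It is not the authors' implementation. The oversampler shown is plain random duplication, standing in for the "traditional" baselines rather than for SentiGAN, and the function names `random_oversample` and `class_metrics` are hypothetical. The sketch also assumes that the zero class is treated as the positive class when precision, specificity, and F-score are computed.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest one (a traditional baseline,
    not SentiGAN)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == cls]
        for _ in range(target - n):
            out_texts.append(rng.choice(pool))
            out_labels.append(cls)
    return out_texts, out_labels


def class_metrics(y_true, y_pred, positive=0):
    """Accuracy, precision, recall, specificity and F-score, treating
    `positive` (assumed here to be the zero class) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_score": f_score,
    }
```

In a comparison like the one described, the same classifier would be trained once on the original data and once on each oversampled version, and `class_metrics` would be computed on a held-out test set that is left imbalanced.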

Keywords