Performance evaluation of multilabel emotion classification using data augmentation techniques

Ahanin, Zahra and Ismail, Maizatul Akmar and Herawan, Tutut (2024) Performance evaluation of multilabel emotion classification using data augmentation techniques. Malaysian Journal of Computer Science, 37 (2). pp. 154-168. ISSN 0127-9084, DOI https://doi.org/10.22452/mjcs.vol37no2.4.

Full text not available from this repository.
Official URL: https://doi.org/10.22452/mjcs.vol37no2.4

Abstract

One of the challenges of emotion classification is the scarcity of annotated datasets, which makes the task more complex. Existing datasets also often suffer from imbalance across emotion classes. Several data augmentation approaches can help to overcome the challenges posed by imbalanced datasets. However, existing data augmentation techniques for emotion classification do not account for the contextual nuances of emotions, and this area remains relatively underexplored. In this work, we study the impact of data augmentation on the classification performance of three machine learning models (Logistic Regression, BiLSTM, and BERT) and compare frequently used methods for addressing the issue. Specifically, we assessed Easy Data Augmentation (EDA) and contextual embedding-based data augmentation (BERT) on two datasets. Based on the experimental results, we combined two BERT-based augmentation techniques, insert and substitute, to generate data for minority emotion classes. Furthermore, we proposed a data augmentation method using ChatGPT. Compared to the baseline models, combining the BERT augmentation techniques with the BERT model resulted in improvements of +4.34% and +5.56% in Macro F1 score on the SemEval-2018 and GoEmotions datasets, respectively. Moreover, the proposed augmentation technique utilizing ChatGPT yielded improvements of +3.55% and +4.83% on the same datasets.
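
The abstract reports its gains in Macro F1, which averages the per-class F1 scores so that each emotion class counts equally regardless of how many examples it has. The following is a minimal sketch of that metric for multilabel indicator vectors, not the authors' evaluation code. The function name `macro_f1`, the three-label layout (joy, anger, fear), and the toy data are assumptions made for illustration only.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Macro-averaged F1 over labels for multilabel 0/1 indicator rows."""
    n_labels = len(y_true[0])
    per_label: List[float] = []
    for j in range(n_labels):
        # Count true positives, false positives, and false negatives for label j.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Convention assumed here: a label with no positives anywhere scores 0.
        per_label.append(2 * tp / denom if denom else 0.0)
    # Unweighted mean: every label contributes equally.
    return sum(per_label) / n_labels


# Hypothetical toy data; columns are (joy, anger, fear).
y_true = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 0]]

score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

In this toy case the majority class (joy) is predicted perfectly while the two minority classes are missed entirely, so the macro average falls to one third. This sensitivity to minority classes is why augmentation aimed at those classes can be expected to show up in a Macro F1 score.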

Item Type: Article
Funders: Universiti Malaya International Collaboration Grant (ST005-2023)
Uncontrolled Keywords: Text classification; Deep learning; Class imbalance; NLP; Data augmentation; ChatGPT
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Computer Science & Information Technology > Department of Information System
Depositing User: Ms. Juhaida Abd Rahim
Date Deposited: 12 Nov 2024 07:33
Last Modified: 12 Nov 2024 07:33
URI: http://eprints.um.edu.my/id/eprint/45835
