Context-based feature technique for sarcasm identification in benchmark datasets using deep learning and bert model

Eke, Christopher Ifeanyi and Norman, Azah Anir and Shuib, Liyana (2021) Context-based feature technique for sarcasm identification in benchmark datasets using deep learning and bert model. IEEE Access, 9. pp. 48501-48518. ISSN 2169-3536, DOI https://doi.org/10.1109/ACCESS.2021.3068323.

Full text not available from this repository.

Abstract

Sarcasm is a complex linguistic phenomenon commonly found on e-commerce and social media sites. Failure to identify sarcastic utterances in Natural Language Processing (NLP) applications such as sentiment analysis and opinion mining confuses classification algorithms and produces false results. Several studies on sarcasm detection have utilised different learning algorithms. However, most of these learning models focus only on the content of the expression and ignore its contextual information. As a result, they fail to capture the context of the sarcastic expression. Secondly, many deep learning methods in NLP use a word embedding learning algorithm as the standard approach for feature vector representation, which ignores the sentiment polarity of the words in the sarcastic expression. To address these issues, this study proposes a context-based feature technique for sarcasm identification using a deep learning model, the BERT model, and conventional machine learning. Two Twitter benchmark datasets and the Internet Argument Corpus, version two (IAC-v2) benchmark dataset were utilised for classification using the three learning models. The first model uses an embedding-based representation via a deep learning model with bidirectional long short-term memory (Bi-LSTM), a variant of the Recurrent Neural Network (RNN), applying Global Vectors (GloVe) for the construction of word embeddings and context learning. The second model is based on the Transformer, using pre-trained Bidirectional Encoder Representations from Transformers (BERT). The third model is based on feature fusion that combines BERT features, sentiment-related features, syntactic features, and GloVe embedding features with conventional machine learning. The effectiveness of the technique is tested with various evaluation experiments. On the two Twitter benchmark datasets, the technique attained highest precisions of 98.5% and 98.0%, respectively. On the IAC-v2 dataset, it achieved a highest precision of 81.2%, which shows the significance of the proposed technique over the baseline approaches for sarcasm analysis.
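The feature fusion described for the third model, and the precision metric used to report results, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fuse_features` and `precision`, the use of simple concatenation as the fusion step, and all toy feature values are assumptions made for illustration only.

```python
from typing import List, Sequence


def fuse_features(*groups: Sequence[float]) -> List[float]:
    """Concatenate per-utterance feature groups into one vector.

    Concatenation is an assumed fusion strategy; the abstract does not
    say how the feature groups are combined.
    """
    fused: List[float] = []
    for group in groups:
        fused.extend(float(x) for x in group)
    return fused


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Precision = TP / (TP + FP) for the given positive (sarcastic) label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy feature groups for a single utterance (not real data).
bert_vec = [0.12, -0.40, 0.88]   # stand-in for a BERT feature vector
sentiment_vec = [1.0, 0.0]       # stand-in for sentiment-related features
syntactic_vec = [3.0]            # stand-in for a syntactic feature
glove_vec = [0.05, 0.31]         # stand-in for a GloVe embedding feature

# The fused vector is what a conventional classifier would consume.
fused = fuse_features(bert_vec, sentiment_vec, syntactic_vec, glove_vec)

# Toy labels: 1 = sarcastic, 0 = not sarcastic.
score = precision([1, 0, 1, 1], [1, 1, 1, 0])
```

In practice the fused vector for each utterance would be passed to a conventional classifier, and precision would be computed on held-out predictions.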

Item Type: Article
Funders: None
Uncontrolled Keywords: Feature extraction; Sentiment analysis; Deep learning; Context modeling; Semantics; Bit error rate; Social networking (online); Natural language processing; Sarcasm identification; Bi-LSTM; GloVe embedding; BERT
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Divisions: Faculty of Computer Science & Information Technology
Depositing User: Ms Zaharah Ramly
Date Deposited: 05 Apr 2022 07:34
Last Modified: 05 Apr 2022 07:34
URI: http://eprints.um.edu.my/id/eprint/26991
