A multi-label emoji classification method using balanced pointwise mutual information-based feature selection

Ahanin, Zahra and Ismail, Maizatul Akmar (2022) A multi-label emoji classification method using balanced pointwise mutual information-based feature selection. Computer Speech & Language, 73. ISSN 0885-2308, DOI https://doi.org/10.1016/j.csl.2021.101330.

Full text not available from this repository.

Abstract

The availability of social media such as Twitter allows users to express their feelings, emotions and opinions toward a topic. Emojis are graphic symbols that are regarded as the new generation of emoticons and an effective way of conveying feelings and emotions in social media. With the surging popularity of emojis, researchers in the area of emotion classification strive to understand the emotion correlated with each emoji. Two of the most successful approaches in emoji analysis rely on: 1) the official Unicode description and 2) manually built emoji lexicons. Since emoji use is socially determined, the former approach is not aligned with the intended semantics and usage, which leads researchers to opt for emoji lexicons. To overcome the problems of the lexicon-based approach, we propose a method to classify emojis automatically. We present a modified Pointwise Mutual Information (PMI) method, called Balanced Pointwise Mutual Information-Based (B-PMI), to develop a balanced, weighted emoji classification based on semantic similarity. Further, a deep neural network is used to represent emojis in vector form (emoji embeddings) to extend pre-trained word embeddings. We carefully evaluated the proposed method on multiple Twitter datasets used for sentiment and emotion classification, with both machine learning (ML) and deep learning (DL) approaches. In both approaches, extending word embeddings with the proposed emoji embeddings improved results. The DL-based approach achieved the highest F1-score of 70.01% for sentiment classification and an accuracy of 56.36% for emotion classification. The ML-based approach obtained an accuracy of 52.17% for emotion classification.
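The abstract does not give the B-PMI formula, so the sketch below illustrates only the standard PMI that B-PMI modifies: PMI(e, c) = log2(P(e, c) / (P(e) P(c))), estimated from how often an emoji e and an emotion label c appear in the same tweet of a multi-label dataset. The balancing step of B-PMI is not reproduced here. As a hypothetical stand-in for turning scores into a multi-label weighting, negative PMI is clipped to zero (positive PMI) and each emoji's scores are normalized to sum to one. The function names and the input format (a set of emojis and a set of labels per tweet) are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def emoji_label_pmi(
    tweets: Iterable[Tuple[Set[str], Set[str]]],
) -> Dict[Tuple[str, str], float]:
    """Standard (not balanced) PMI between each emoji and each label.

    Hypothetical helper: probabilities are estimated from document-level
    co-occurrence, i.e. the fraction of tweets containing the emoji, the
    label, or both. Pairs that never co-occur (PMI = -inf) are omitted.
    """
    n_docs = 0
    emoji_counts: Counter = Counter()
    label_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for emojis, labels in tweets:
        n_docs += 1
        emoji_counts.update(emojis)
        label_counts.update(labels)
        for e in emojis:
            for c in labels:
                pair_counts[(e, c)] += 1

    scores: Dict[Tuple[str, str], float] = {}
    for (e, c), n_ec in pair_counts.items():
        p_ec = n_ec / n_docs
        p_e = emoji_counts[e] / n_docs
        p_c = label_counts[c] / n_docs
        scores[(e, c)] = math.log2(p_ec / (p_e * p_c))
    return scores


def ppmi_weights(
    scores: Dict[Tuple[str, str], float],
) -> Dict[str, Dict[str, float]]:
    """Hypothetical weighting step (NOT the paper's B-PMI balancing):
    clip negative PMI to zero, then normalize each emoji's label scores
    so they sum to 1 (all zeros if no label has positive PMI)."""
    positive: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (e, c), s in scores.items():
        positive[e][c] = max(s, 0.0)

    weights: Dict[str, Dict[str, float]] = {}
    for e, per_label in positive.items():
        total = sum(per_label.values())
        weights[e] = {
            c: (v / total if total > 0 else 0.0) for c, v in per_label.items()
        }
    return weights


if __name__ == "__main__":
    # Toy multi-label data: (emojis in the tweet, emotion labels of the tweet)
    toy = [
        ({"😂"}, {"joy"}),
        ({"😂", "😍"}, {"joy", "love"}),
        ({"😢"}, {"sadness"}),
        ({"😍"}, {"love"}),
    ]
    print(ppmi_weights(emoji_label_pmi(toy)))
```

On the toy data, 😂 co-occurs with "joy" twice as often as independence would predict (PMI = 1.0) and with "love" exactly as often (PMI = 0.0), so all of its weight goes to "joy". A real pipeline would also need smoothing or a minimum-count threshold, since PMI is unstable for rare emojis.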

Item Type: Article
Funders: None
Uncontrolled Keywords: Multi-label classification; Emotion classification; Emoji classification; Emoji lexicon; Sentiment analysis; Pointwise Mutual Information; Natural Language Processing
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Computer Science & Information Technology
Depositing User: Ms. Juhaida Abd Rahim
Date Deposited: 21 Jul 2022 02:24
Last Modified: 21 Jul 2022 02:24
URI: http://eprints.um.edu.my/id/eprint/33651
