An integrated semi-automated framework for domain-based polarity words extraction from an unannotated non-English corpus

Kaity, Mohammed and Balakrishnan, Vimala (2020) An integrated semi-automated framework for domain-based polarity words extraction from an unannotated non-English corpus. Journal of Supercomputing, 76 (12). pp. 9772-9799. ISSN 0920-8542, DOI https://doi.org/10.1007/s11227-020-03222-0.

Full text not available from this repository.

Abstract

Building sentiment analysis resources is a fundamental step before developing any sentiment analysis model. Sentiment lexicons are one of these critical resources. However, many non-English languages suffer from a severe shortage of these resources and lexicons. This study proposes an integrated framework for extracting domain-based polarity words from a massive unannotated non-English corpus. The framework consists of three layers, namely lexicon-based, corpus-based and human-based. The first two layers automatically recognize and extract new polarity words from the corpus using initial seed lexicons. A key advantage of the proposed framework is that it needs only an initial seed lexicon and an unannotated corpus to start the extraction process. The framework is therefore semi-automated, since it relies on seed lexicons. Experiments on three languages indicate that the proposed framework outperformed existing lexicons, achieving F-scores of 77.8%, 83.8% and 68.6% for the Arabic, French and Malay lexicons, respectively.
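
The abstract does not spell out how the automatic layers score candidate words, so the following is only a minimal sketch of one common way to expand a seed lexicon from an unannotated corpus: counting how often a word shares a document with positive versus negative seeds. The function name `expand_lexicon`, the parameters `min_count` and `margin`, the scoring formula and the toy French corpus are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def expand_lexicon(corpus, pos_seeds, neg_seeds, min_count=2, margin=0.5):
    """Propose new polarity words by document-level co-occurrence with seeds.

    Hypothetical illustration only; not the scoring used in the paper.
    Returns {word: score} with score in [-1, 1]; the sign is the proposed polarity.
    """
    pos_hits, neg_hits = Counter(), Counter()
    for doc in corpus:
        tokens = set(doc.lower().split())
        has_pos = bool(tokens & pos_seeds)
        has_neg = bool(tokens & neg_seeds)
        # Only non-seed words are candidates for extraction.
        for tok in tokens - pos_seeds - neg_seeds:
            if has_pos:
                pos_hits[tok] += 1
            if has_neg:
                neg_hits[tok] += 1

    candidates = {}
    for tok in set(pos_hits) | set(neg_hits):
        p, n = pos_hits[tok], neg_hits[tok]
        if p + n < min_count:
            continue  # too rare to judge
        score = (p - n) / (p + n)
        if abs(score) >= margin:
            candidates[tok] = score  # neutral words (score near 0) are dropped
    return candidates


# Toy corpus (assumed data, for demonstration only).
corpus = [
    "service excellent et rapide",
    "repas excellent et rapide",
    "service mauvais et lent",
    "repas mauvais et lent",
]
new_words = expand_lexicon(corpus, {"excellent"}, {"mauvais"})
```

On this toy input, words that appear equally often with both seeds ("service", "repas", "et") are discarded, while "rapide" and "lent" are proposed as positive and negative candidates. In a semi-automated setting, such candidates could then be handed to a human reviewer before being added to the lexicon.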

Item Type:	Article
Funders:	UNSPECIFIED
Uncontrolled Keywords:	Multilingual sentiment analysis; Sentiment lexicon; Polarity words; Social media analysis; Unannotated corpus
Subjects:	P Language and Literature > PE English; Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions:	Faculty of Computer Science & Information Technology
Depositing User:	Ms Zaharah Ramly
Date Deposited:	05 Oct 2023 00:49
Last Modified:	05 Oct 2023 00:49
URI:	http://eprints.um.edu.my/id/eprint/36822
