Tracking dengue on twitter using hybrid filtration-polarity and Apache Flume

Ghani, Norjihan Binti Abdul and Hamid, Suraya and Ahmad, Muneer and Saadi, Younes and Jhanjhi, N. Z. and Alzain, Mohammed A. and Masud, Mehedi (2022) Tracking dengue on twitter using hybrid filtration-polarity and Apache Flume. Computer Systems Science and Engineering, 40 (3). pp. 913-926. ISSN 0267-6192, DOI

Full text not available from this repository.


The World Health Organization (WHO) describes dengue as a serious illness that threatens almost half of the world's population and has no specific treatment. Early and accurate detection of its spread in affected regions can save lives. Despite the severity of the disease, only a few notable works apply sentiment analysis to mine accurate insights from social media text streams. Moreover, the explosion of data in recent years has made storing and processing large volumes of data difficult, since it requires both reliable mechanisms to gather the data and suitable techniques to extract meaningful insights from it. This study proposes a polarity-based sentiment analysis approach for collecting data and extracting relevant information about dengue using Apache Hadoop. The method consists of two main parts: the first collects data from social media using Apache Flume, and the second queries and extracts relevant information with a hybrid filtration-polarity algorithm using Apache Hive. To cope with the noisy, unstructured nature of the data, information extraction includes pre-filtration and post-filtration phases. Only by integrating Flume and Hive with filtration and polarity analysis can a reliable sentiment analysis technique be offered for collecting and processing large-scale data from social networks. We show how components of the Apache Hadoop ecosystem, namely Flume and Hive, can provide sentiment analysis capability by storing and processing large amounts of data. An important finding of this paper is that efficient sentiment analysis applications for disease detection can be developed more reliably with Hadoop ecosystem components than with conventional machines.
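The abstract does not give the details of the hybrid filtration-polarity algorithm, so the following is only a minimal single-machine sketch of the general idea of a pre-filtration, polarity, post-filtration pipeline over tweet text. It does not use Flume or Hive. The keyword set, the tiny polarity lexicon, and the function names (`pre_filter`, `polarity`, `post_filter`, `hybrid_filtration_polarity`) are hypothetical illustrations, and the simple lexicon-count scoring is a stand-in technique, not the authors' method.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical topic keywords for pre-filtration (assumption, not from the paper).
DENGUE_TERMS = {"dengue", "aedes"}
# Tiny illustrative polarity lexicon (assumption, not from the paper).
POSITIVE = {"recovered", "better", "safe", "clean"}
NEGATIVE = {"fever", "outbreak", "died", "hospital", "sick", "cases"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z]+")


def pre_filter(tweet: str) -> Optional[str]:
    """Drop retweets and off-topic tweets; strip URLs and @mentions, lowercase the rest."""
    if tweet.startswith("RT "):
        return None
    cleaned = MENTION_RE.sub("", URL_RE.sub("", tweet)).lower()
    tokens = set(TOKEN_RE.findall(cleaned))
    return cleaned if tokens & DENGUE_TERMS else None


def polarity(text: str) -> int:
    """Score = (# positive lexicon hits) - (# negative lexicon hits)."""
    tokens = TOKEN_RE.findall(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def post_filter(scored: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Keep only tweets whose polarity is non-neutral."""
    return [(text, score) for text, score in scored if score != 0]


def hybrid_filtration_polarity(tweets: List[str]) -> List[Tuple[str, int]]:
    """Pre-filter, score polarity, then post-filter."""
    cleaned = (pre_filter(t) for t in tweets)
    scored = [(t, polarity(t)) for t in cleaned if t is not None]
    return post_filter(scored)


if __name__ == "__main__":
    sample = [
        "Dengue outbreak in Selangor, many cases at hospital http://x.co",
        "RT @a: dengue fever again",
        "Lovely weather today",
        "Dengue patient recovered and feeling better",
        "dengue awareness talk tomorrow",
    ]
    for text, score in hybrid_filtration_polarity(sample):
        print(score, text)
```

In this sketch the retweet and the off-topic tweet are removed before scoring, and the neutral awareness tweet is removed afterwards. In a Hadoop deployment like the one the abstract describes, the collection step would be handled by Flume and the filtering and scoring by Hive queries rather than by in-memory Python.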

Item Type: Article
Funders: Taif University Researchers Supporting Project [TURSP-2020/98]
Uncontrolled Keywords: Big data analysis; Data filtration; Text analysis; Sentiment analysis; Social media; Event detection
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Computer Science & Information Technology
Depositing User: Ms. Juhaida Abd Rahim
Date Deposited: 04 Aug 2022 02:58
Last Modified: 04 Aug 2022 02:58
