Partition-based pattern matching approach for efficient retrieval of Arabic text

Hakak, Saqib Iqbal and Kamsin, Amirrudin and Shivakumara, Palaiahnakote and Idris, Mohd Yamani Idna (2018) Partition-based pattern matching approach for efficient retrieval of Arabic text. Malaysian Journal of Computer Science, 31 (3). pp. 200-209. ISSN 0127-9084

Full text not available from this repository.
Official URL: https://doi.org/10.22452/mjcs.vol31no3.3

Abstract

Arabic text is encoded with the Unicode Transformation Format (UTF), whereas English text can be encoded with the American Standard Code for Information Interchange (ASCII). In addition, Arabic uses diacritics, symbols and elongated characters, which make searching Arabic text more challenging in information retrieval. In this paper, we propose a new partition-based pattern matching approach that divides each query word into two equal parts (sub-parts). The proposed approach treats the two sub-parts as independent query words and searches for them in parallel to match content in the database. In addition, the proposed approach modifies conventional brute-force pattern matching to speed up the search, resulting in efficient text retrieval from any database. The proposed approach is evaluated experimentally in terms of processing time. A comparative analysis shows that the proposed approach outperforms the existing approaches it was compared with in terms of computational time.
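The abstract outlines the method only briefly, so the following Python sketch illustrates the general idea under stated assumptions. The query is split at its midpoint, and each half is searched with a plain brute-force scan in its own worker thread. A full match is reported wherever the second half begins immediately after the first. The function names (`brute_force_find_all`, `split_query`, `partition_search`), the rule that the second half takes the extra character for odd-length queries, and the use of threads for parallelism are all hypothetical choices for illustration. The paper's specific modification of brute-force matching is not described in the abstract and is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def brute_force_find_all(text: str, pattern: str) -> list[int]:
    """Plain brute-force exact matching: every start index where pattern occurs in text."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        j = 0
        # Compare character by character until a mismatch or a full match.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            hits.append(i)
    return hits


def split_query(query: str) -> tuple[str, str]:
    """Split the query at its midpoint; for odd lengths the second half is one longer (assumption)."""
    mid = len(query) // 2
    return query[:mid], query[mid:]


def partition_search(text: str, query: str) -> list[int]:
    """Hypothetical partition-based search: match both halves independently, then join the results."""
    if not query:
        raise ValueError("query must be non-empty")
    left, right = split_query(query)
    # Search the two sub-parts as independent queries, one worker each.
    with ThreadPoolExecutor(max_workers=2) as pool:
        left_future = pool.submit(brute_force_find_all, text, left)
        right_future = pool.submit(brute_force_find_all, text, right)
        left_hits = left_future.result()
        right_hits = set(right_future.result())
    # The full query starts at i only if the right half starts right after the left half.
    return [i for i in left_hits if i + len(left) in right_hits]


if __name__ == "__main__":
    verse = "بسم الله الرحمن الرحيم"
    print(partition_search(verse, "الرحمن"))  # [9]
```

Because of CPython's global interpreter lock, the two threads here show the structure of the parallel search rather than delivering a real speedup. A process pool or a compiled implementation would be needed for that. The sketch also compares code points exactly, so a query without diacritics will not match diacritized text unless both are normalized first.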

Item Type: Article
Uncontrolled Keywords: Arabic texts; Digital Quran; Exact matching; Information retrieval; Partition-based pattern matching; Short patterns
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Computer Science & Information Technology
Depositing User: Ms. Juhaida Abd Rahim
Date Deposited: 20 Aug 2019 04:47
Last Modified: 20 Aug 2019 04:47
URI: http://eprints.um.edu.my/id/eprint/21978
