Qur'anic words stemming

Raja Yusof, Raja Jamilah and Zainuddin, R. and Baba, Mohd Sapiyan and Yusoff, Z.M. (2010) Qur'anic words stemming. Arabian Journal for Science and Engineering, 35 (2C). pp. 37-49. ISSN 2193-567X

Full text not available from this repository.
Official URL: https://pdfs.semanticscholar.org/fec6/3fee7f1ab03e...

Abstract

Arabic words are known to have a complex morphological structure. The different structures produce various word patterns, or derivatives, from a root word. This paper attempts to identify the word patterns that originate from a root word and compares them with the words in the 30th part of the Qur'an. Nine stemming test cases were outlined for words in the 30th part of the Qur'an. Analysis showed that stemming nouns and particles leads to a lower percentage error than stemming the 10 letters that can be added as affixes to a root word. A rule-based stemming engine (RSE) was also implemented; it achieved a stemming accuracy of 62.5%, and the average time taken to stem 1000 word tokens was 11.7 ms. The accuracy was comparable to that of other stemming engines such as the Khoja stemmer, the Buckwalter Morphological Analyzer (BAMA), the Tri-literal Root Extraction (TRE) algorithm, and the Voting algorithm.
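To make the idea of rule-based affix stripping concrete, here is a minimal sketch in Python. It is not the RSE described in the abstract: the affix lists, the minimum stem length, and the function names strip_affixes and accuracy are all assumptions chosen for illustration, and words are written in Buckwalter transliteration to keep the example ASCII-only. Note that this kind of light stripping yields a stem, not necessarily the root.

```python
# Illustrative only: these affix lists are hypothetical and are NOT the
# rule set of the paper's rule-based stemming engine (RSE).
# Words use Buckwalter transliteration (e.g. "AlktAb" = al-kitab).
PREFIXES = ["wAl", "bAl", "kAl", "fAl", "Al", "w", "f", "b", "l"]  # longest first
SUFFIXES = ["wn", "yn", "At", "An", "hA", "hm", "p", "h", "y"]     # longest first
MIN_STEM_LEN = 3  # assumed guard so short words are not over-stripped


def strip_affixes(word):
    """Remove at most one listed prefix and one listed suffix from word."""
    stem = word
    for prefix in PREFIXES:
        if stem.startswith(prefix) and len(stem) - len(prefix) >= MIN_STEM_LEN:
            stem = stem[len(prefix):]
            break
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM_LEN:
            stem = stem[:-len(suffix)]
            break
    return stem


def accuracy(gold_pairs):
    """Fraction of (word, expected_stem) pairs the stemmer gets right."""
    correct = sum(1 for word, expected in gold_pairs
                  if strip_affixes(word) == expected)
    return correct / len(gold_pairs)


if __name__ == "__main__":
    for token in ["AlktAb", "wAlm&mnwn", "byt"]:
        print(token, "->", strip_affixes(token))
```

In this sketch "byt" is left unchanged because stripping the leading "b" would leave fewer than three letters. A word such as "yktbwn" would lose its suffix but keep its verbal prefix "y", which shows why stemming single affix letters is more error-prone than removing whole particles.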

Item Type: Article
Funders: UNSPECIFIED
Additional Information: Export Date: 7 November 2012; Source: Scopus; Language of Original Document: English; Correspondence Address: Yusof, R.J.R.; Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia; email: rjry@um.edu.my
Uncontrolled Keywords: Arabic stemming; Arabic word morphology; Qur'an; Rule-Based algorithm
Subjects: T Technology > T Technology (General)
Divisions: Faculty of Computer Science & Information Technology > Department of Artificial Intelligence
Depositing User: Ms Maisarah Mohd Muksin
Date Deposited: 04 Jan 2013 16:48
Last Modified: 30 Apr 2021 00:43
URI: http://eprints.um.edu.my/id/eprint/5673
