Improving document relevancy using integrated language modeling techniques

Balakrishnan, Vimala and Humaidi, N. and Lloyd-Yemoh, E. (2016) Improving document relevancy using integrated language modeling techniques. Malaysian Journal of Computer Science, 29 (1). pp. 45-55. ISSN 0127-9084, DOI https://doi.org/10.22452/mjcs.vol29no1.4.

Official URL: https://doi.org/10.22452/mjcs.vol29no1.4

Abstract

This paper presents an integrated language model to improve document relevancy for text queries. Specifically, an integrated stemming-lemmatization (S-L) model was developed and its retrieval performance was compared at three document cut-offs: the top 5, 10 and 15 results. A prototype search engine was developed and fifteen queries were executed. The mean average precision scores showed that the S-L model outperformed the baseline (i.e. no language processing), the stemming model and the lemmatization model at all three cut-offs. These results were supported by the precision histograms, which also showed the integrated model improving document relevancy. However, the precision differences between the models were not significant. Overall, the study found that combining the two language processing techniques, stemming and lemmatization, retrieves more relevant documents.
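
The evaluation measure named in the abstract, mean average precision at a top-k cut-off, can be illustrated with a minimal sketch. The function names, document IDs and relevance judgments below are hypothetical and are not taken from the paper. The sketch also assumes one common definition of average precision at k (normalizing by the smaller of k and the number of relevant documents), which may differ from the variant the authors used.

```python
from typing import List, Sequence, Set, Tuple


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Average precision over the top-k results of one query.

    Assumption: the sum of precisions at each relevant rank is divided by
    min(k, number of relevant documents). Other normalizations exist.
    """
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            hits += 1
            # Precision measured at the rank where a relevant document appears.
            precision_sum += hits / rank
    return precision_sum / min(k, len(relevant))


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Mean of the per-query average precision values at cut-off k."""
    if not runs:
        return 0.0
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical rankings and relevance judgments for two queries.
    runs = [
        (["d1", "d2", "d3", "d4", "d5"], {"d1", "d3"}),
        (["a", "b"], {"b"}),
    ]
    for k in (5, 10, 15):  # the three cut-offs mentioned in the abstract
        print(k, round(mean_average_precision(runs, k), 4))
```

Comparing models as in the abstract would amount to computing this score once per model (baseline, stemming, lemmatization, S-L) over the same set of queries and cut-offs.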

Item Type: Article
Funders: University of Malaya: (UMRG-RP028A-14AET)
Uncontrolled Keywords: Document relevancy; Information retrieval; Language modeling; Lemmatization; Mean average precision; Stemming
Subjects: H Social Sciences > HF Commerce; Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Computer Science & Information Technology
Depositing User: Ms. Juhaida Abd Rahim
Date Deposited: 05 Dec 2017 04:37
Last Modified: 08 Jan 2020 07:41
URI: http://eprints.um.edu.my/id/eprint/18453
