Estimating reliability of the retrieval systems effectiveness rank based on performance in multiple experiments

Zhang, S. and Ravana, S.D. (2017) Estimating reliability of the retrieval systems effectiveness rank based on performance in multiple experiments. Cluster Computing, 20 (1). pp. 925-940. ISSN 1386-7857, DOI

Full text not available from this repository.

For decades, test collections have been the standard approach to information retrieval evaluation. However, given how test collections are constructed, this approach has a number of limitations, such as pooling bias, disagreement between human assessors, varying topic difficulty, and performance constraints of the evaluation metrics. Any of these factors may distort the measured relative effectiveness of different retrieval strategies (that is, of the retrieval systems) and thus result in unreliable system rankings. In this study, we propose techniques for estimating the reliability of a retrieval system's effectiveness rank based on rankings from multiple experiments. These rankings may come from previous experimental results or be generated by conducting multiple experiments using a smaller number of topics. These techniques will assist in precisely predicting the performance of each system in future experiments. To validate the proposed rank reliability estimation methods, two alternative system ranking methods are proposed to generate new system rankings. The experiments show that the system rank correlation coefficient values mostly remain above 0.8 against the gold standard. In addition, the proposed techniques generate system rankings that are more reliable than the baseline (the traditional system ranking techniques used in Text REtrieval Conference (TREC)-like initiatives). The results from both TREC-2004 and TREC-8 show the same outcome, which further confirms the effectiveness of the proposed rank reliability estimation method.
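
The comparison of a system ranking against a gold standard described above can be illustrated with a small sketch. The abstract does not say which rank correlation coefficient was used; the example below assumes Kendall's tau (tau-a, no ties), which is common in TREC-style evaluation. The function name `kendall_tau` and the system names and ranks are hypothetical and are not data from the article.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same systems.

    Each ranking is a dict mapping a system name to its rank position
    (1 = best). Assumes no tied ranks; this is an illustrative choice,
    not necessarily the coefficient used in the article.
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must cover the same systems")
    systems = sorted(rank_a)
    if len(systems) < 2:
        raise ValueError("need at least two systems to compare")

    concordant = 0
    discordant = 0
    for s, t in combinations(systems, 2):
        # Positive product: both rankings order the pair the same way.
        product = (rank_a[s] - rank_a[t]) * (rank_b[s] - rank_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical rankings of four systems (made-up values).
gold = {"sysA": 1, "sysB": 2, "sysC": 3, "sysD": 4}
estimated = {"sysA": 1, "sysB": 3, "sysC": 2, "sysD": 4}

tau = kendall_tau(gold, estimated)
print(f"Kendall's tau against the gold standard: {tau:.3f}")
```

In this made-up case one of the six system pairs is swapped, so the coefficient falls below the 0.8 level mentioned in the abstract; identical rankings give 1.0 and fully reversed rankings give -1.0.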

Item Type: Article
Funders: UMRG RP028E-14AET, Exploratory Research Grant Scheme (ERGS) ER027-2013A
Uncontrolled Keywords: Information retrieval; Relevance judgments; Retrieval evaluation; System rankings; TREC
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Computer Science & Information Technology
Depositing User: Ms. Juhaida Abd Rahim
Date Deposited: 06 Jun 2018 06:46
Last Modified: 06 Jun 2018 06:46
