1. Promoting ranking diversity for genomics search with relevance-novelty combined model
- Author
- Xiaohua Hu, Zhoujun Li, Jimmy Xiangji Huang, and Xiaoshi Yin
- Subjects
020205 medical informatics; Computer science; Information Storage and Retrieval; Information needs; 02 engineering and technology; Biochemistry; Ranking (information retrieval); 03 medical and health sciences; 0302 clinical medicine; Structural Biology; Controlled vocabulary; 0202 electrical engineering, electronic engineering, information engineering; Relevance (information retrieval); 030212 general & internal medicine; Graphical model; Molecular Biology; Anchor text; Cognitive models of information retrieval; Information retrieval; Applied Mathematics; Novelty; Genomics; Unified Medical Language System; Semantics; Computer Science Applications; Proceedings; Vocabulary, Controlled; Ranking; Human–computer information retrieval; Algorithms
- Abstract
Background: In the biomedical domain, the information a biologist wants from a question (query) is usually a list of entities of a certain type, such as genes, proteins, diseases, or mutations, covering different aspects related to the question. A biomedical information retrieval system should therefore provide comprehensive and diverse answers to fulfill biologists’ information needs. However, traditional retrieval models assume that the relevance of a document is independent of the relevance of other documents, an assumption that may result in high redundancy and low diversity in the ranked lists.
Results: In this paper, we propose a relevance-novelty combined model, named the RelNov model, based on the framework of an undirected graphical model. It consists of two component models, the aspect-term relevance model and the aspect-term novelty model, which model the relevance and the novelty of a document, respectively. We show that our approach achieves a 16.4% improvement over the highest aspect-level MAP and a 9.8% improvement over the highest passage-level MAP reported in the TREC 2007 Genomics track.
Conclusions: The proposed combination model, which models aspects, terms, topic relevance, and document novelty as potential functions, is demonstrated to be effective in promoting ranking diversity as well as in improving the relevance of ranked lists for genomics search. We also show that the use of aspects plays an important role in the model. Moreover, the proposed model can easily integrate different relevance and novelty measures.
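The general idea of trading off relevance against novelty over aspects can be illustrated with a small sketch. This is not the RelNov model itself, which the abstract describes as an undirected graphical model with potential functions; it is a generic greedy re-ranker written under simplifying assumptions. The function name `rerank`, the mixing weight `lam`, the linear scoring formula, and the sample documents and aspect labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def rerank(relevance: Dict[str, float],
           aspects: Dict[str, Set[str]],
           lam: float = 0.5) -> List[str]:
    """Greedily order documents by a mix of relevance and aspect novelty.

    Assumption: novelty is the fraction of a document's aspects that no
    already-selected document covers. This is an illustrative stand-in,
    not the measure used in the paper.
    """
    remaining = set(relevance)
    covered: Set[str] = set()   # aspects seen in documents ranked so far
    ranked: List[str] = []

    while remaining:
        def score(doc: str) -> float:
            doc_aspects = aspects.get(doc, set())
            unseen = doc_aspects - covered
            novelty = len(unseen) / len(doc_aspects) if doc_aspects else 0.0
            return lam * relevance[doc] + (1.0 - lam) * novelty

        # Sorting first makes ties resolve to the smallest document id.
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
        covered |= aspects.get(best, set())

    return ranked


# Hypothetical data: d1 and d2 cover the same aspect, d3 covers a new one.
relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.6}
aspects = {"d1": {"BRCA1"}, "d2": {"BRCA1"}, "d3": {"TP53"}}

print(rerank(relevance, aspects))           # ['d1', 'd3', 'd2']
print(rerank(relevance, aspects, lam=1.0))  # ['d1', 'd2', 'd3']
```

With `lam=1.0` the sketch reduces to the independence assumption the abstract criticizes: documents are ordered by relevance alone, so the redundant `d2` outranks `d3`. Lowering `lam` promotes `d3`, which adds an aspect not yet covered.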
- Published
- 2011