On Term Selection Techniques for Patent Prior Art Search

Authors :
Scott Sanner
Mohamed Reda Bouadjenek
David Hawking
Gabriela Ferraro
Mona Golestan Far
Australian National University (ANU)
National ICT Australia [Sydney] (NICTA)
Oregon State University (OSU)
Scientific Data Management (ZENITH)
Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM)
Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Inria Sophia Antipolis - Méditerranée (CRISAM)
Institut National de Recherche en Informatique et en Automatique (Inria)
BING - Microsoft
Source :
SIGIR: Research and Development in Information Retrieval, Aug 2015, Santiago, Chile. ⟨10.1145/2766462.2767801⟩
Publication Year :
2015
Publisher :
HAL CCSD, 2015.

Abstract

In this paper, we investigate the influence of term selection on retrieval performance on the CLEF-IP prior art test collection, using the Description section of the patent query with Language Model (LM) and BM25 scoring functions. We find that an oracular relevance feedback system that extracts terms from the judged relevant documents far outperforms the baseline and performs twice as well on MAP as the best competitor in CLEF-IP 2010. We find a very clear term selection value threshold for use when choosing terms. We also noticed that most of the useful feedback terms are actually present in the original query, and hypothesized that the baseline system could be substantially improved by removing negative query terms. We tried four simple automated approaches to identify negative terms for query reduction, but none of them notably improved on the baseline performance. However, we show that a simple, minimal interactive relevance feedback approach, in which terms are selected from only the first retrieved relevant document, outperforms the best result from CLEF-IP 2010, suggesting the promise of interactive methods for term selection in patent prior art search.

Details

Language :
English
Database :
OpenAIRE
Journal :
SIGIR: Research and Development in Information Retrieval, Aug 2015, Santiago, Chile. ⟨10.1145/2766462.2767801⟩
Accession number :
edsair.doi.dedup.....d72c3c85d3d286e1032b347816eb4c5b
Full Text :
https://doi.org/10.1145/2766462.2767801