When is the Time Ripe for Natural Language Processing for Patent Passage Retrieval?
- Source :
- CIKM
- Publication Year :
- 2016
- Publisher :
- ACM, 2016.
- Abstract :
- Patent text is a mixture of legal terms and domain-specific terms. In technical English text, multi-word units are often deployed as a word formation strategy to expand the working vocabulary, i.e. to introduce a new concept without inventing an entirely new word. In this paper we explore query generation using natural language processing technologies in order to capture domain-specific concepts represented as multi-word units. We examine a range of query generation methods using both linguistic and statistical information. We also propose a new method to distinguish domain-specific terms from other, more general phrases. We apply a machine learning approach using domain knowledge and corpus linguistic information in order to learn domain-specific terms in relation to phrases' termhood values. The experiments are conducted on the English part of the CLEF-IP 2013 test collection. The experiments show that the favoured method in terms of PRES and recall uses a language model, with search terms extracted by a part-of-speech tagger and a noun phrase chunker. With our proposed methods we improve each evaluation metric significantly compared to the existing state of the art for the CLEF-IP 2013 test collection: PRES@100 by 26% (0.544 from 0.433), recall@100 by 17% (0.631 from 0.540) and document MAP by 57% (0.300 from 0.191).
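The abstract mentions extracting search terms with a part-of-speech tagger and a noun phrase chunker. The following is only a minimal sketch of that general idea, not the authors' pipeline: it assumes the text has already been tagged with Penn Treebank style tags, and the function names `chunk_noun_phrases` and `build_query`, the adjective/noun pattern, and the example sentence are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (token, POS tag); Penn Treebank style tags are assumed


def chunk_noun_phrases(tagged: List[Tagged]) -> List[str]:
    """Return multi-word units: runs of adjectives/nouns that end in a noun."""
    phrases: List[str] = []
    run: List[Tagged] = []
    # A sentinel with a non-matching tag flushes the final run.
    for token, tag in tagged + [("", "END")]:
        if tag.startswith("JJ") or tag.startswith("NN"):
            run.append((token, tag))
            continue
        # Trim trailing adjectives so the phrase ends in a noun.
        while run and not run[-1][1].startswith("NN"):
            run.pop()
        # Keep only multi-word units (two or more tokens).
        if len(run) >= 2:
            phrases.append(" ".join(tok.lower() for tok, _ in run))
        run = []
    return phrases


def build_query(tagged: List[Tagged]) -> str:
    """Join the distinct phrases into a quoted phrase query (hypothetical format)."""
    seen: List[str] = []
    for phrase in chunk_noun_phrases(tagged):
        if phrase not in seen:
            seen.append(phrase)
    return " ".join(f'"{p}"' for p in seen)


# Hand-tagged example sentence (illustrative, not taken from CLEF-IP).
example = [
    ("The", "DT"), ("optical", "JJ"), ("fiber", "NN"), ("transmits", "VBZ"),
    ("a", "DT"), ("laser", "NN"), ("beam", "NN"), ("to", "TO"),
    ("the", "DT"), ("sensor", "NN"),
]

print(build_query(example))
```

In this sketch single nouns such as "sensor" are dropped because only multi-word units are kept. A real system would obtain the tags from a trained tagger and would weight or filter the phrases, for example by a termhood score as the abstract describes.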
- Subjects :
- Vocabulary
Relation (database)
Computer science
02 engineering and technology
Domain (software engineering)
Text mining
Rule-based machine translation
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Information retrieval
05 social sciences
Word formation
Noun phrase
Information extraction
Domain knowledge
Language model
Artificial intelligence
0509 other social sciences
050904 information & library sciences
Natural language processing
Details
- Database :
- OpenAIRE
- Journal :
- Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
- Accession number :
- edsair.doi...........685cf1cfc0393309fa14542b2c5ffb79
- Full Text :
- https://doi.org/10.1145/2983323.2983858