Research on Large-scale Corpus Full-text Retrieval Based on Lucene Technology.

Authors :
Cai, Fang
Source :
Procedia Computer Science; 2024, Vol. 247, p943-952, 10p
Publication Year :
2024

Abstract

Corpus retrieval is an important research direction in corpus studies: it helps learners and applications find relevant content and improves learning and application efficiency. Traditional retrieval methods suffer from high development costs, complex structural design, and low data coverage. This article builds on Lucene, using its inverted index structure, incremental indexing, object-oriented design, text analysis interfaces, and query engine services to provide a complete solution for developing high-performance, large-scale corpus full-text retrieval systems. First, the article examines the foundations of Lucene, including its system architecture, functional packages, and data flow. Next, it studies probabilistic retrieval models, which directly model relevance to user needs. Then it presents the system design and implementation, including system function design, inverted index design, and the retrieval function implementation. Finally, it covers system testing and challenges, including building a test environment, functional testing, and reliability testing. [ABSTRACT FROM AUTHOR]
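
The abstract names three mechanisms without showing them: an inverted index, incremental indexing, and a probabilistic retrieval model. The Python sketch below illustrates how they fit together in the simplest possible form. It is not the article's system and does not use Lucene's API. The class name `TinyIndex`, the whitespace analyzer, and the choice of BM25 as the probabilistic model are all assumptions made for illustration; the abstract does not state which model the author uses.

```python
import math
from collections import Counter, defaultdict


class TinyIndex:
    """Toy in-memory inverted index (hypothetical; not Lucene's format)."""

    def __init__(self, k1=1.2, b=0.75):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}                  # doc_id -> number of tokens
        self.k1, self.b = k1, b            # common BM25 parameter defaults

    @staticmethod
    def analyze(text):
        # Stand-in for a text analyzer: lowercase, split on whitespace.
        return text.lower().split()

    def add(self, doc_id, text):
        # Incremental indexing: adding a document only touches the posting
        # lists of its own terms. Assumes doc_id has not been added before.
        tokens = self.analyze(text)
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def search(self, query, top_k=10):
        # Score documents with BM25 (assumed model) and return the best
        # top_k as (doc_id, score) pairs, highest score first.
        n = len(self.doc_len)
        if n == 0:
            return []
        avg_len = sum(self.doc_len.values()) / n
        scores = defaultdict(float)
        for term in self.analyze(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            idf = math.log(1 + (n - len(docs) + 0.5) / (len(docs) + 0.5))
            for doc_id, tf in docs.items():
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:top_k]


if __name__ == "__main__":
    index = TinyIndex()
    index.add(1, "lucene builds an inverted index")
    index.add(2, "corpus retrieval for language learners")
    index.add(3, "full-text retrieval over a large corpus with lucene")
    for doc_id, score in index.search("lucene corpus"):
        print(doc_id, round(score, 3))
```

The lookup cost depends on the posting lists of the query terms rather than on the total number of documents, which is the property that makes an inverted index suitable for large corpora. A production system would additionally need persistent storage, document updates and deletions, and a real analyzer.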

Details

Language :
English
ISSN :
18770509
Volume :
247
Database :
Supplemental Index
Journal :
Procedia Computer Science
Publication Type :
Academic Journal
Accession number :
180928978
Full Text :
https://doi.org/10.1016/j.procs.2024.10.114