Descriptor: "Full text search" / Journal: information processing & management - Searchworks@Jio Institute Digital Library Search Results

Your search for "Full text search" returned 9 results

1. Hybrid compression of inverted lists for reordered document collections

Author: Diego Arroyuelo, Mauricio Oyarzún, Senén González, and Víctor Rondón Sepúlveda
Subjects: Information retrieval, Computer science, Full text search, Engineering and technology, Library and Information Sciences, Management Science and Operations Research, Inverted index, Data structure, Computer Science Applications, Term (time), Identifier, Reduction (complexity), Index (publishing), Electrical engineering, electronic engineering, information engineering, Media Technology, Artificial intelligence & image processing, Information Systems, Integer (computer science)
Abstract: Text search engines are a fundamental tool nowadays. Their efficiency relies on a popular and simple data structure: the inverted index, which stores an inverted list per term of the vocabulary. The inverted list of a given term stores, among other things, the document identifiers (docIDs) of the documents that contain the term. Currently, inverted indexes can be stored efficiently using integer compression schemes. Previous research also studied how an optimized document ordering can be used to assign docIDs to the document database. This yields important improvements in index compression and query processing time. In this paper we show that using a hybrid compression approach on the inverted lists is more effective in this scenario, with two main contributions. First, we introduce a document reordering approach that aims at generating runs of consecutive docIDs in a properly selected subset of inverted lists of the index. Second, we introduce hybrid compression approaches that combine gap and run-length encodings within inverted lists, in order to take advantage not only of small gaps, but also of the long runs of consecutive docIDs generated by our document reordering approach. Our experimental results indicate a reduction of about 10%–30% in the space usage of the whole index (considering docIDs only), compared with the most efficient state-of-the-art results. Also, decompression speed is up to 1.22 times faster if the runs of consecutive docIDs must be explicitly decompressed, and up to 4.58 times faster if implicit decompression of these runs is allowed (e.g., representing the runs as intervals in the output). Finally, we also improve the query processing time of AND queries (by up to 12%), WAND queries (by up to 23%), and full (non-ranked) OR queries (by up to 86%), outperforming the best existing approaches.
Published: 2018
Full Text: View/download PDF
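
The abstract of result 1 describes combining gap and run-length encodings within an inverted list. The following Python sketch illustrates that general idea only; it is not the authors' scheme. The token layout, the `min_run` threshold, and the function names `encode`, `decode`, and `decode_intervals` are assumptions made for illustration, and a real index would further compress the resulting integers with an integer codec.

```python
from typing import List, Tuple

# Hypothetical token layout: (kind, gap from the previous docID, run length).
Token = Tuple[str, int, int]


def encode(doc_ids: List[int], min_run: int = 3) -> List[Token]:
    """Encode a strictly increasing docID list as gap tokens and run tokens."""
    tokens: List[Token] = []
    prev = -1  # virtual docID before the list, so every gap is >= 1
    i, n = 0, len(doc_ids)
    while i < n:
        # Find the end of the run of consecutive docIDs starting at i.
        j = i
        while j + 1 < n and doc_ids[j + 1] == doc_ids[j] + 1:
            j += 1
        run_len = j - i + 1
        if run_len >= min_run:
            # Long run: store the gap to its first docID and its length.
            tokens.append(("run", doc_ids[i] - prev, run_len))
            prev = doc_ids[j]
            i = j + 1
        else:
            # Short run: fall back to plain gap encoding, one docID at a time.
            tokens.append(("gap", doc_ids[i] - prev, 1))
            prev = doc_ids[i]
            i += 1
    return tokens


def decode(tokens: List[Token]) -> List[int]:
    """Explicit decompression: materialize every docID."""
    out: List[int] = []
    prev = -1
    for _kind, gap, length in tokens:
        start = prev + gap
        out.extend(range(start, start + length))
        prev = start + length - 1
    return out


def decode_intervals(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Implicit decompression: report each token as a closed interval."""
    out: List[Tuple[int, int]] = []
    prev = -1
    for _kind, gap, length in tokens:
        start = prev + gap
        prev = start + length - 1
        out.append((start, prev))
    return out


doc_ids = [3, 4, 5, 6, 10, 12, 13, 20, 21, 22]
tokens = encode(doc_ids)
assert decode(tokens) == doc_ids
```

In this sketch a run token costs the same whether the run has three docIDs or three thousand, which is why a reordering that creates long runs pays off, and `decode_intervals` shows why implicit decompression can be cheaper than expanding each run.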

2. Term discrimination for text search tasks derived from negative binomial distribution

Author: Lorenz Bernauer, Eun Jin Han, and So Young Sohn
Subjects: Term Discrimination, Social sciences, Negative binomial distribution, Full text search, Engineering and technology, Library and Information Sciences, Management Science and Operations Research, Residual, Interaction, Computer Science Applications, Term (time), Normalized discounted cumulative gain, Statistics, Electrical engineering, electronic engineering, information engineering, Media Technology, Artificial intelligence & image processing, Other social sciences, Information & library sciences, tf–idf, Information Systems, Mathematics
Abstract: Accurate term discrimination in information retrieval is essential for identifying important terms in specific documents. In addition to the widely known inverse document frequency (IDF) method, alternative approaches such as the residual inverse document frequency (RIDF) scheme have been introduced for term discrimination. However, the performance of existing methods is not unconditionally convincing. We propose a new collection frequency weighting scheme derived from the negative binomial distribution model of term occurrences. Factorial experiments were performed to examine potential interaction effects between collection frequency weighting methods and term frequency weighting methods, using mean average precision and normalized discounted cumulative gain as performance measures. The results indicate that our proposed term discrimination method offers a significant gain in accuracy compared to the IDF and RIDF schemes. This finding is reinforced by the fact that the results show no interaction effects among factors.
Published: 2018
Full Text: View/download PDF
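
For readers unfamiliar with the two baselines named in the abstract of result 2, the sketch below computes IDF and RIDF in Python. It uses the commonly cited definitions (IDF as the base-2 log of collection size over document frequency, and RIDF as observed IDF minus the IDF expected under a Poisson model of term occurrences). The negative binomial scheme the paper proposes is not given in the abstract and is not reproduced here; the function names and sample counts are illustrative assumptions.

```python
import math


def idf(df: int, n_docs: int) -> float:
    """Inverse document frequency: log2(N / df)."""
    return math.log2(n_docs / df)


def ridf(df: int, cf: int, n_docs: int) -> float:
    """Residual IDF: observed IDF minus the IDF a Poisson model predicts.

    cf is the collection frequency (total occurrences of the term).
    Under a Poisson model with rate cf / N, the probability that a
    document contains the term at least once is 1 - exp(-cf / N).
    """
    expected_idf = -math.log2(1.0 - math.exp(-cf / n_docs))
    return idf(df, n_docs) - expected_idf


# Hypothetical counts: two terms with the same collection frequency.
n_docs = 1000
spread_out = ridf(df=100, cf=100, n_docs=n_docs)  # one occurrence per document
bursty = ridf(df=20, cf=100, n_docs=n_docs)       # concentrated in few documents
assert bursty > spread_out
```

A term whose occurrences cluster in a few documents deviates from the Poisson prediction and receives a higher RIDF than an evenly spread term with the same collection frequency.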

3. A usage study of retrieval modalities for video shot retrieval

Author: Alan F. Smeaton and Paul Browne
Subjects: Information retrieval, Modalities, Multimedia, Computer science, Search engine indexing, Image processing and computer vision, Full text search, Library and Information Sciences, Management Science and Operations Research, Object (computer science), Computer Science Applications, Video tracking, Media Technology, Segmentation, Video browsing, Set (psychology), Information Systems
Abstract: As an information medium, video offers many possible retrieval and browsing modalities, far more than text, image or audio. Some of these, like searching the text of the spoken dialogue, are well developed; others, like keyframe browsing tools, are in their infancy; and others are not yet technically achievable. For those browsing and retrieval modalities that we cannot yet achieve, we can only speculate as to how useful they will actually be. In our work we have created a system to support multiple modalities for video browsing and retrieval, including text search through the spoken dialogue, image matching against shot keyframes and object matching against segmented video objects. For the last of these, automatic segmentation and tracking of video objects is a computationally demanding problem which is not yet solved for generic natural video material; once it is solved, it is expected to open up possibilities for user interaction with objects in video, including searching and browsing. In this paper we achieve object segmentation by working in a closed domain of animated cartoons. We describe an interactive user experiment on a medium-sized corpus of video where we were able to measure users' use of video objects versus other modes of retrieval during multiple-iteration searching. Results of this experiment show that although object searching is used far less than text searching in the first iteration of a user's search, it is a popular and useful search type once an initial set of relevant shots has been found.
Published: 2006
Full Text: View/download PDF

4. An evaluation method of words tendency depending on time-series variation and its improvements

Author: Masami Shishibori, Makoto Okada, El-Sayed Atlam, and Jun-ichi Aoe
Subjects: Computer science, Decision tree, Stability (learning theory), Full text search, Library and Information Sciences, Management Science and Operations Research, Machine learning, Computer Science Applications, Word lists by frequency, Media Technology, Data analysis, Proper noun, Artificial intelligence, Natural language processing, Word (computer architecture), Information Systems, Test data
Abstract: In every text, some words appear frequently and are considered keywords because they have a strong relationship with the subject of the text. The frequencies of these words change over time within a given period. However, traditional text processing methods and text search techniques do not consider the importance of frequency change over time, and therefore cannot correctly determine an index of a word's popularity in a given period. In this paper, a new method is proposed to automatically estimate the stability classes (increasing, relatively constant, and decreasing) that indicate a word's popularity over time, based on the frequency change in past text data. First, learning data were produced by defining five attributes that quantitatively measure the frequency change of a word. These five attributes were extracted automatically from electronic texts. The learning data were manually classified into the three stability classes. Then, a decision tree was built from these data to automatically determine the stability classes of the analysis data (test data). For learning data, we obtained the attribute values of 443 proper nouns that were extracted from 2216 CNN articles (1997-1999) that discussed professional baseball. For test data, 472 proper nouns were extracted from 972 CNN articles (1997-2000) and classified automatically using the decision tree. Comparing the decision tree results with the manual classification, the F-measures of the increasing, relatively constant and decreasing classes were 0.847, 0.851, and 0.768, respectively, which demonstrates the effectiveness of the method.
Published: 2002
Full Text: View/download PDF
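
The abstract of result 4 reports one F-measure per stability class. As a reminder of how such per-class scores are usually obtained, here is a small Python sketch that compares predicted labels with manual labels and computes the standard F-measure (the harmonic mean of precision and recall) for each class. The label lists are invented toy data, not the paper's, and `f_measures` is a hypothetical helper name.

```python
from typing import Dict, List


def f_measures(gold: List[str], predicted: List[str]) -> Dict[str, float]:
    """Per-class F-measure: 2 * P * R / (P + R)."""
    scores: Dict[str, float] = {}
    pairs = list(zip(gold, predicted))
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


# Toy labels (assumed for illustration): manual classes vs. classifier output.
gold = ["increasing", "increasing", "constant", "decreasing", "constant", "decreasing"]
pred = ["increasing", "constant", "constant", "decreasing", "constant", "increasing"]
scores = f_measures(gold, pred)
```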

5. Towards data abstraction in networked information retrieval systems

Author: Norbert Fuhr
Subjects: Information retrieval, Computer science, Interoperability, Database schema, Full text search, Inference, Library and Information Sciences, Management Science and Operations Research, Data type, Computer Science Applications, Data independence, Media Technology, Uncertain inference, Proper noun, Information Systems
Abstract: Networked information retrieval aims at the interoperability of heterogeneous information retrieval (IR) systems. In this paper, we show how differences concerning search operators and database schemas can be handled by applying data abstraction concepts in combination with uncertain inference. Different data types with vague predicates are required to allow for queries referring to arbitrary attributes of documents. Physical data independence separates search operators from access paths, thus solving text search problems related to noun phrases, compound words and proper nouns. Projection and inheritance on attributes support the creation of unified views on a set of IR databases. Uncertain inference allows for query processing even on incompatible database schemas.
Published: 1999
Full Text: View/download PDF

6. Automatic text structuring and categorization as a first step in summarizing legal cases

Author: Marie-Francine Moens and Caroline Uyttendaele
Subjects: Information retrieval, Grammar, Knowledge representation and reasoning, Computer science, Full text search, Library and Information Sciences, Management Science and Operations Research, Structuring, Semantic network, Computer Science Applications, Information extraction, Categorization, Media Technology, Relevance (information retrieval), Artificial intelligence, Natural language processing, Information Systems
Abstract: The SALOMON system automatically summarizes Belgian criminal cases in order to improve access to the large number of existing and future court decisions. SALOMON extracts relevant text units from the case text to form a case summary. Such a case profile facilitates the rapid determination of the relevance of the case or may be employed in text search. In a first important abstracting step, SALOMON performs an initial categorization of legal criminal cases and structures the case text into separate legally relevant and irrelevant components. A text grammar represented as a semantic network is used to automatically determine the category of the case and its components. In this way, we are able to extract general data from the case and to identify text portions relevant for further abstracting. It is argued that prior knowledge of the text structure and its indicative cues may support automatic abstracting. A text grammar is a promising form for representing the knowledge involved.
Published: 1997
Full Text: View/download PDF

7. Expanding end-users' query statements for free text searching with a search-aid thesaurus

Author: Jaana Kristensen
Subjects: Information retrieval, Recall, Computer science, Information storage and retrieval, Full text search, Library and Information Sciences, Management Science and Operations Research, Computer Science Applications, User assistance, Controlled vocabulary, Media Technology, Equivalence (formal languages), Document retrieval, Precision and recall, Associative property, Information Systems
Abstract: Authors and searchers usually express the same things in many different ways, which causes problems in free text searching of text databases. Thus, a switching tool connecting the different names of one concept is needed. This study tests the effectiveness of a thesaurus as a search aid in free text searching of a full text database. A set of queries was searched against a large full text database of newspaper articles. The search-aid thesaurus constructed for the test contains the usual relationships of a thesaurus, namely equivalence, hierarchical, and associative relationships. Each query was searched in five distinct modes: basic search, synonym search, narrower term search, related term search, and union of all previous searches. The basic searches contained only terms included in the original query statements. In the synonym searches, the terms of the basic search were extended by disjunction of the synonyms given by the search-aid thesaurus without modifying the overall logic of the basic search. Likewise, the basic search was extended in turn with the narrower terms and with the related terms given by the search-aid thesaurus. The last search mode included the basic terms and all the terms used in the previous searches. The searches were analyzed in terms of relative recall and precision; relative recall was estimated by setting the recall of the union search to 100%. On average, relative recall was 47.2% in the basic search, compared with 100% in the union search; average precision decreased only from 62.5% in the basic search to 51.2% in the union search.
Published: 1993
Full Text: View/download PDF
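
The search modes and measures described in the abstract of result 7 can be illustrated with a small Python sketch. It assumes a Boolean query that is a conjunction of concepts, expands each concept by disjunction with thesaurus terms without changing the overall query logic, and estimates relative recall against the relevant documents found by the broadest search. The toy index, thesaurus, relevance judgments, and all function names are hypothetical and not taken from the study.

```python
from typing import Dict, List, Set


def search(index: Dict[str, Set[int]], concepts: List[Set[str]]) -> Set[int]:
    """Boolean search: OR across terms within a concept, AND across concepts."""
    result = None
    for terms in concepts:
        docs: Set[int] = set()
        for term in terms:
            docs |= index.get(term, set())
        result = docs if result is None else result & docs
    return result if result is not None else set()


def expand(concepts: List[Set[str]], thesaurus: Dict[str, Set[str]]) -> List[Set[str]]:
    """Add thesaurus terms to each concept by disjunction; the AND logic is unchanged."""
    expanded = []
    for terms in concepts:
        extra: Set[str] = set()
        for term in terms:
            extra |= thesaurus.get(term, set())
        expanded.append(terms | extra)
    return expanded


def precision(retrieved: Set[int], relevant: Set[int]) -> float:
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def relative_recall(retrieved: Set[int], union_retrieved: Set[int], relevant: Set[int]) -> float:
    """Recall relative to the relevant documents found by the union search."""
    pool = union_retrieved & relevant
    return len(retrieved & pool) / len(pool) if pool else 0.0


# Toy data (assumed): term -> docIDs, synonym thesaurus, relevance judgments.
index = {"car": {1, 2}, "automobile": {3}, "tax": {1, 3, 4}, "levy": {2, 5}}
synonyms = {"car": {"automobile"}, "tax": {"levy"}}
relevant = {1, 2, 6}

query = [{"car"}, {"tax"}]
basic = search(index, query)
union = search(index, expand(query, synonyms))  # only one expansion mode here
```

With this toy data the basic search finds one of the two relevant documents that the expanded search finds, while the expanded search also retrieves a non-relevant document, which mirrors the recall gain and precision loss the abstract reports in aggregate.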

8. Improving full text search performance through textual analysis

Author: Mavis B. Molto
Subjects: Computer science, Information structure, Full text search, Design strategy, Library and Information Sciences, Management Science and Operations Research, Computer Science Applications, Term (time), Text mining, Media Technology, Proper noun, Personal name, Artificial intelligence, Natural language processing, Natural language, Information Systems
Abstract: The increased availability of full text databases has given rise to a number of retrieval problems, resulting from the ambiguities in natural language. The purpose of this study was to explore the potential of text analysis as a tool in full text search and design improvement. A trial analysis was performed in a selected domain, family history literature, and search and design recommendations were then developed from the findings. The findings included information specific to name searching, along with article length and graphical data. Surprisingly, life event terms (e.g., birth year or marriage state), which are commonly used terms in name searches, occurred in the trial text relevant to less than a third of the sampled persons. This suggests that the higher frequency personal name terms (e.g., the subject's name or father's name) should be searched instead. Differences in male versus female search term patterns also occurred, suggesting gender-specific search strategies. There was a low incidence of pedigree charts in the literature, a finding of potential use in design. All of the findings offered insights into possible gains and losses in using one search or design strategy versus another, with strong evidence provided as to the potential of text analysis in full text search and design improvement.
Published: 1993
Full Text: View/download PDF

9. The second Text Retrieval Conference (TREC-2)

Author: Donna Harman
Subjects: Information retrieval, Computer science, Media Technology, Full text search, Library and Information Sciences, Management Science and Operations Research, Text Retrieval Conference, Computer Science Applications, Information Systems
Published: 1995
Full Text: View/download PDF
