108 results
Search Results
2. Modelling of User Preferences and Needs in Boolean Retrieval Systems.
- Author
-
Danilowicz, Czeslaw
- Abstract
Discusses end-user searching in Boolean information retrieval systems, considers the role of search intermediaries, and proposes a model of user preferences that incorporates a user's profile. Highlights include document representation; information queries; document output ranking; calculating user profiles; and selecting documents for a local database using the user's profile. (Contains 28 references.) (LRW)
- Published
- 1994
3. A Learning Scheme for Information Retrieval in Hypertext.
- Author
-
Savoy, Jacques
- Abstract
Proposes a new learning algorithm to improve the retrieval effectiveness of the search system used in the hypertext environment using an extended Boolean model with links to improve the ranking of retrieved items. Highlights include the basic retrieval scheme, the learning scheme, and a review of basic probabilistic retrieval models. (Contains 32 references.) (JLB)
- Published
- 1994
4. Fuzzy Query Processing Using Clustering Techniques.
- Author
-
Kamel, M.
- Abstract
Discusses the problem of processing fuzzy queries in databases and information retrieval systems and presents a prototype of a fuzzy query processing system for databases that is based on data clustering and uses the Pascal programming language. Clustering schemes are explained, and the system architecture that uses natural language is described. (14 references) (LRW)
- Published
- 1990
5. Information Storage and Retrieval Scientific Report No. ISR-22.
- Author
-
Cornell Univ., Ithaca, NY. Dept. of Computer Science. and Salton, Gerard
- Abstract
The twenty-second in a series, this report describes research in information organization and retrieval conducted by the Department of Computer Science at Cornell University. The report covers work carried out during the period summer 1972 through summer 1974 and is divided into four parts: indexing theory, automatic content analysis, feedback searching, and dynamic file management. Twelve individual papers are presented. (Author/DGC)
- Published
- 1974
6. Term Fragment Analysis for Inversion of Large Files.
- Author
-
Illinois Inst. of Tech., Chicago. Research Inst. and Schipma, Peter B.
- Abstract
Words and word fragments from the computer-readable data bases "Chemical Abstracts Condensates" and "Biological Abstracts Previews" were analyzed in terms of length, number, and frequency of appearance to determine some parameters upon which inversion of these data bases could be predicated. Types (unique words or fragments) and tokens (all appearances of types) were counted and type:token ratios calculated. A KLIC (Key-letter-in-Context) Index was also generated from each of the data bases. The paper discusses the impact of the various counts, ratios and projections on the problem of inverting the data bases for retrospective search purposes. (Author)
- Published
- 1971
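A minimal sketch of the type:token counting described in result 6, in Python; the sample fragments and the KLIC rotation format are illustrative assumptions, not taken from the report.

```python
from collections import Counter

def type_token_stats(terms):
    """Count types (unique terms), tokens (all occurrences), and their ratio."""
    counts = Counter(terms)
    tokens = sum(counts.values())
    types = len(counts)
    return types, tokens, types / tokens if tokens else 0.0

def klic_rotations(word):
    """Key-Letter-In-Context: one rotation per letter, so the index can file
    the word under each of its letters (a simple reading of the KLIC idea)."""
    return [word[i:] + "/" + word[:i] for i in range(len(word))]

# Hypothetical fragment stream drawn from index terms.
fragments = ["acid", "acid", "amine", "benzene", "benzene", "benzene"]
types, tokens, ratio = type_token_stats(fragments)
print(f"types={types} tokens={tokens} type:token={ratio:.2f}")
print(klic_rotations("acid"))
```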
7. Efficient Organization and Access of Multi-Dimensional Datasets on Tertiary Storage Systems.
- Author
-
Chen, L. T.
- Abstract
This paper addresses the problem of data management techniques for efficiently retrieving requested subsets of large data sets from mass storage devices. Describes the development of algorithms and software that facilitate the partitioning of a large data set into multiple "clusters" that reflect their expected access. (Author/JKP)
- Published
- 1995
8. Information Processing of Remote-Sensing Data.
- Author
-
Berry, P. A. M. and Meadows, A. J.
- Abstract
Reviews the current status of satellite remote sensing data, including problems with efficient storage and rapid retrieval of the data, and appropriate computer graphics to process images. Areas of research concerned with overcoming these problems are described. (16 references) (CLB)
- Published
- 1987
9. Boolean Interpretation of Conjunctions for Document Retrieval.
- Author
-
Das-Gupta, Padmini
- Abstract
Presents an algorithm for use in natural language document retrieval systems which automatically determines if the conjunction "and" in a statement representing an information need should be translated into a Boolean "and" or "or." The results of an experiment that used the algorithm are reported, and further research is suggested. (CLB)
- Published
- 1987
10. Automatic Identification of Duplicates after Multidatabase Online Searching.
- Author
-
Onorato, Eveline and Bianchi, Gianfranco
- Abstract
Discusses the problem of duplicate citations resulting from file overlaps in multidatabase searching and shows that such duplicates could be identified automatically and eliminated by a host computer as a complementary service to online retrieval. Steps involved in the realization of this service are described, and 11 references are listed. (RBF)
- Published
- 1981
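The automatic duplicate identification in result 10 amounts to building a normalized match key per citation and dropping repeats; a hedged sketch, with the field names ("title", "authors", "year") assumed rather than taken from the paper.

```python
import re

def citation_key(cite):
    # Normalize fields that tend to survive across databases.
    title = re.sub(r"[^a-z0-9 ]", "", cite["title"].lower())
    first_author = cite["authors"][0].split(",")[0].strip().lower()
    return (first_author, cite.get("year"), " ".join(title.split()[:6]))

def dedupe(citations):
    seen, unique = set(), []
    for c in citations:
        key = citation_key(c)
        if key not in seen:       # keep the first occurrence only
            seen.add(key)
            unique.append(c)
    return unique
```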
11. Mining Web data for Chinese segmentation.
- Author
-
Fu Lee Wang and Yang, Christopher C.
- Subjects
*PAPER, *DATA mining, *CHINESE people, *ALGORITHMS, *DATABASES, *SEARCH engines, *BLOGS, *INFORMATION retrieval, *INTERNET
- Abstract
Modern information retrieval systems use keywords within documents as indexing terms for search of relevant documents. As Chinese is an ideographic character-based language, the words in the texts are not delimited by white spaces. Indexing of Chinese documents is impossible without a proper segmentation algorithm. Many Chinese segmentation algorithms have been proposed in the past. Traditional segmentation algorithms cannot operate without a large dictionary or a large corpus of training data. Nowadays, the Web has become the largest corpus that is ideal for Chinese segmentation. Although most search engines have problems in segmenting texts into proper words, they maintain huge databases of documents and frequencies of character sequences in the documents. Their databases are important potential resources for segmentation. In this paper, we propose a segmentation algorithm by mining Web data with the help of search engines. On the other hand, the Romanized pinyin of the Chinese language indicates boundaries of words in the text. Our algorithm is the first to utilize the Romanized pinyin for segmentation. It is the first unified segmentation algorithm for the Chinese language from different geographical areas, and it is also domain independent because of the nature of the Web. Experiments have been conducted on the datasets of a recent Chinese segmentation competition. The results show that our algorithm outperforms the traditional algorithms in terms of precision and recall. Moreover, our algorithm can effectively deal with the problems of segmentation ambiguity, new word (unknown word) detection, and stop words. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
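Result 11 scores candidate segmentations with character-sequence frequencies mined from search engines. A minimal dynamic-programming sketch of that idea; the frequency table is a toy dictionary standing in for Web hit counts, and the smoothing constant is an assumption.

```python
def segment(text, freq, max_len=4):
    """Segmentation maximizing the product of candidate-word frequencies;
    freq would be populated from search-engine counts in the paper's setting."""
    n = len(text)
    best = [0.0] * (n + 1)   # best score for a segmentation of text[:i]
    best[0] = 1.0
    back = [0] * (n + 1)     # backpointer to the start of the last word
    total = sum(freq.values()) or 1
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = text[j:i]
            p = freq.get(word, 0.5) / total   # assumed add-0.5 smoothing
            if best[j] * p > best[i]:
                best[i], back[i] = best[j] * p, j
    # Recover the segmentation from the backpointers.
    out, i = [], n
    while i > 0:
        out.append(text[back[i]:i])
        i = back[i]
    return list(reversed(out))

freq = {"中国": 90, "人民": 80, "中国人": 40, "民": 5, "人": 10}
print(segment("中国人民", freq))  # -> ['中国', '人民']
```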
12. Retrieval of Morphological Variants in Searches of Latin Text Databases
- Author
-
Schinke, Robyn, Greengrass, Mark, Robertson, Alexander M., and Willett, Peter
- Published
- 1997
13. An Algorithmic Method of Determining the Key of a Musical Piece [Algorytmiczna metoda określania tonacji utworu muzycznego].
- Author
-
KOKKINOPOULOS, Konstantinos, KANIA, Paulina, and KANIA, Dariusz
- Subjects
MUSICAL form, INFORMATION retrieval, ALGORITHMS, DATABASES
- Published
- 2019
- Full Text
- View/download PDF
14. An Enhanced Symptom Clustering with Profile Based Prescription Suggestion in Biomedical application.
- Author
-
Vijayarajeswari, R., Nagabhushan, M., and Parthasarathy, P.
- Subjects
DIAGNOSIS, SYMPTOMS, ALGORITHMS, ARTIFICIAL intelligence, BIOMEDICAL engineering, DATABASE management, DATABASES, DECISION making, INFORMATION retrieval, INTERPROFESSIONAL relations, MEDICAL records, DATA mining, ONTOLOGIES (Information retrieval)
- Abstract
Applications of data mining are increasing daily while databases grow simultaneously, so retrieving required content from a huge database is a critical task. This paper focuses on the biomedical engineering field and concentrates on the initial stages of database work, such as data preprocessing and cleansing, to deal with noise and missing data in large biomedical data sets. Because biomedical databases are huge and constantly growing, retrieving specific content is difficult, and suggesting a prescription for an identified disease based on profile analysis of a specific patient is not available in current systems. This paper proposes a prescription recommendation system in which disease identification is performed by combining user and professional suggestions with profile-based analysis, and a report is generated from the profile-based suggestions. Specific suggestions are retrieved from the huge database by a hybrid feature selection algorithm. The approach thus attains better retrieval of required content from a huge database than other existing approaches and suggests better recommendations with respect to the user profile. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
15. An Efficient Schema-Graph-Based Keyword Query Method for Databases [一种高效基于模式图的数据库关键字查询方法].
- Author
-
费云峰, 丁国辉, 滕一平, 李 景, and 孙莎莎
- Subjects
DATABASE searching, KEYWORD searching, INFORMATION retrieval, DATABASES, ALGORITHMS, RELATIONAL databases
- Published
- 2019
- Full Text
- View/download PDF
16. Authority-Based Keyword Search in Databases.
- Author
-
Hristidis, Vagelis, Heasoo Hwang, and Papakonstantinou, Yannis
- Subjects
KEYWORD searching, ASSISTED searching (Information retrieval), DATABASES, QUERY (Information retrieval system), INFORMATION storage & retrieval systems, INFORMATION retrieval
- Abstract
Our system applies authority-based ranking to keyword search in databases modeled as labeled graphs. Three ranking factors are used: the relevance to the query, the specificity and the importance of the result. All factors are handled using authority-flow techniques that exploit the link-structure of the data graph, in contrast to traditional Information Retrieval. We address the performance challenges in computing the authority flows in databases by using precomputation and exploiting the database schema if present. We conducted user surveys and performance experiments on multiple real and synthetic datasets, to assess the semantic meaningfulness and performance of our system. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
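Result 16 ranks keyword-search results by authority flow over the data graph. A sketch of the underlying biased power iteration (in the ObjectRank/PageRank style); the damping factor, iteration count, and toy graph are illustrative, not the paper's settings.

```python
import numpy as np

def authority_flow(adj, seeds, d=0.85, iters=50):
    """Power iteration over the row-normalized link matrix, restarting at
    nodes that match the query (a sketch, not the paper's exact scheme)."""
    deg = adj.sum(axis=1, keepdims=True)
    P = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
    base = seeds / seeds.sum()            # restart distribution over query matches
    r = np.full(adj.shape[0], 1.0 / adj.shape[0])
    for _ in range(iters):
        r = d * (r @ P) + (1 - d) * base
    return r

# Toy 3-node data graph; node 0 matches the keyword query.
adj = np.array([[0, 1, 1], [0, 0, 1], [1, 0, 0]], dtype=float)
seeds = np.array([1.0, 0.0, 0.0])
print(authority_flow(adj, seeds))
```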
17. Adapted One-versus-All Decision Trees for Data Stream Classification.
- Author
-
Hashemi, Sattar, Ying Yang, Mirzamomen, Zahra, and Kangavari, Mohammadreza
- Subjects
DECISION making, BINARY control systems, INFORMATION services, DATABASES, DATA mining, INFORMATION retrieval, DATABASE searching, ELECTRONIC data processing, ALGORITHMS
- Abstract
One-versus-all (OVA) classifiers learn κ individual binary classifiers, each distinguishing the instances of a single class from the instances of all other classes. To classify a new instance, the κ classifiers are run, and the one that returns the highest confidence is chosen. Thus, OVA is different from existing data stream classification schemes whose majority use multiclass classifiers, each discriminating among all the classes. This paper advocates some outstanding advantages of OVA for data stream classification. First, there is low error correlation and, hence, high diversity among OVA's component classifiers, which leads to high classification accuracy. Second, OVA is adept at accommodating new class labels that often appear in data streams. However, there also remain many challenges to deploy traditional OVA for classifying data streams. First, traditional OVA does not handle concept change, a key feature of data streams. Second, as every instance is fed to all component classifiers, OVA is known as an inefficient model. Third, OVA's classification accuracy is adversely affected by the imbalanced class distributions in data streams. This paper addresses those key challenges and consequently proposes a new OVA scheme that is adapted for data stream classification. Theoretical analysis and empirical evidence reveal that the adapted OVA can offer faster training, faster updating, and higher classification accuracy than many existing popular data stream classification algorithms. We expect these results to be of interest to researchers and practitioners because they suggest a simple but very elegant and effective alternative to existing classification schemes for data streams. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
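A compact sketch of the basic one-versus-all scheme that result 17 adapts: one incremental binary learner per class, prediction by the most confident model. scikit-learn's SGDClassifier is used here as a stand-in learner, and the paper's concept-drift and class-imbalance machinery is omitted.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

class OneVsAllStream:
    """One binary model per class; each separates its class from the rest."""
    def __init__(self):
        self.models = {}

    def partial_fit(self, X, y):
        y = np.asarray(y)
        for label in np.unique(y):          # new labels get a new binary model
            self.models.setdefault(label, SGDClassifier())
        for label, m in self.models.items():
            m.partial_fit(X, (y == label).astype(int), classes=[0, 1])

    def predict(self, X):
        # Pick the class whose binary model is most confident.
        labels = list(self.models)
        scores = np.vstack([self.models[c].decision_function(X) for c in labels])
        return [labels[i] for i in scores.argmax(axis=0)]
```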
18. WAVELET-BASED MULTIRESOLUTION HISTOGRAM FOR FAST IMAGE RETRIEVAL.
- Author
-
Jain, Pawan and Merchant, S. N.
- Subjects
INFORMATION storage & retrieval systems, INFORMATION retrieval, MULTIMEDIA systems, ALGORITHMS, IMAGE retrieval, DATABASES
- Abstract
Most of the content-based image retrieval systems require a distance computation of feature vectors for each candidate image in the image database. This exhaustive search is highly time-consuming and inefficient. This limits the usefulness of such systems. Thus there is a growing need for a fast image retrieval system. The multiresolution data-structure algorithm provides a good solution to the above problem. In this paper we propose a wavelet-based multiresolution data-structure algorithm. The wavelet-based multiresolution data-structure further reduces the number of computations by around 50%. In the proposed approach we reuse the information obtained at lower resolution levels to calculate the distance at a higher resolution level. Apart from this, the proposed structure saves memory overheads by about 50% over the multiresolution data-structure algorithm. The proposed algorithm can be easily combined with other algorithms for performance enhancement. In this paper we use the proposed technique to match luminance histograms for image retrieval. Fuzzy histograms enhance performance by considering the similarity between neighboring bins. We have extended the proposed approach to fuzzy histograms for better performance. [ABSTRACT FROM AUTHOR]
- Published
- 2004
- Full Text
- View/download PDF
19. NONLINEAR DYNAMICS TEXT MINING USING BIBLIOMETRICS AND DATABASE TOMOGRAPHY.
- Author
-
Kostoff, Ronald N., Shlesinger, Michael F., and Tshiteya, Rene
- Subjects
DATABASES, TOMOGRAPHY, ALGORITHMS, DYNAMICS, INFORMATION services, FREQUENCIES of oscillating systems
- Abstract
Database Tomography (DT) is a textual database analysis system consisting of two major components: (1) algorithms for extracting multiword phrase frequencies and phrase proximities (physical closeness of the multiword technical phrases) from any type of large textual database, to augment (2) interpretative capabilities of the expert human analyst. DT was used to derive technical intelligence from a Nonlinear Dynamics database derived from the Science Citation Index/Social Science Citation Index (SCI). Phrase frequency analysis by the technical domain experts provided the pervasive technical themes of the Nonlinear Dynamics database, and the phrase proximity analysis provided the relationships among the pervasive technical themes. Bibliometric analysis of the Nonlinear Dynamics literature supplemented the DT results with author/journal/institution publication and citation data. [ABSTRACT FROM AUTHOR]
- Published
- 2004
- Full Text
- View/download PDF
20. A critical assessment of using ChatGPT for extracting structured data from clinical notes.
- Author
-
Huang, Jingwei, Yang, Donghan M., Rong, Ruichen, Nezafati, Kuroush, Treager, Colin, Chi, Zhikai, Wang, Shidan, Cheng, Xian, Guo, Yujia, Klesse, Laura J., Xiao, Guanghua, Peterson, Eric D., Zhan, Xiaowei, and Xie, Yang
- Subjects
ARTIFICIAL intelligence tests, OSTEOSARCOMA, DATABASES, MEDICAL information storage & retrieval systems, TERMS & phrases, ARTIFICIAL intelligence, RESEARCH evaluation, EVALUATION of organizational effectiveness, DATA curation, NATURAL language processing, DECISION making in clinical medicine, PROBLEM solving, DESCRIPTIVE statistics, WORKFLOW, INFORMATION retrieval, MEDICAL records, CONCEPTUAL structures, LUNG tumors, TUMOR classification, ALGORITHMS
- Abstract
Existing natural language processing (NLP) methods to convert free-text clinical notes into structured data often require problem-specific annotations and model training. This study aims to evaluate ChatGPT's capacity to extract information from free-text medical notes efficiently and comprehensively. We developed a large language model (LLM)-based workflow, utilizing systems engineering methodology and spiral "prompt engineering" process, leveraging OpenAI's API for batch querying ChatGPT. We evaluated the effectiveness of this method using a dataset of more than 1000 lung cancer pathology reports and a dataset of 191 pediatric osteosarcoma pathology reports, comparing the ChatGPT-3.5 (gpt-3.5-turbo-16k) outputs with expert-curated structured data. ChatGPT-3.5 demonstrated the ability to extract pathological classifications with an overall accuracy of 89%, in lung cancer dataset, outperforming the performance of two traditional NLP methods. The performance is influenced by the design of the instructive prompt. Our case analysis shows that most misclassifications were due to the lack of highly specialized pathology terminology, and erroneous interpretation of TNM staging rules. Reproducibility shows the relatively stable performance of ChatGPT-3.5 over time. In pediatric osteosarcoma dataset, ChatGPT-3.5 accurately classified both grades and margin status with accuracy of 98.6% and 100% respectively. Our study shows the feasibility of using ChatGPT to process large volumes of clinical notes for structured information extraction without requiring extensive task-specific human annotation and model training. The results underscore the potential role of LLMs in transforming unstructured healthcare data into structured formats, thereby supporting research and aiding clinical decision-making. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
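Result 20 batches clinical notes through OpenAI's API and asks for structured fields back. A hedged sketch of one such call; the request shape follows the public chat-completions API, but the prompt wording and field list are illustrative, not the study's engineered prompts.

```python
import os
import requests

def extract(note, fields):
    """Ask the model to emit the listed fields as JSON (prompt and field
    names are invented examples, not the paper's prompts)."""
    prompt = (f"Extract the fields {fields} from this pathology report "
              f"and answer as JSON only:\n{note}")
    r = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": "gpt-3.5-turbo-16k",   # model named in the abstract
              "messages": [{"role": "user", "content": prompt}],
              "temperature": 0},
        timeout=60,
    )
    return r.json()["choices"][0]["message"]["content"]
```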
21. Content-based image retrieval in DCT compressed domain with MPEG-7 edge descriptor and genetic algorithm.
- Author
-
Phadikar, Baisakhi Sur, Phadikar, Amit, and Maity, Goutam Kumar
- Subjects
INFORMATION retrieval, IMAGE retrieval, IMAGE processing, DATABASES, ALGORITHMS
- Abstract
In this paper, we propose a content-based image retrieval scheme in discrete cosine transform compressed domain with the help of genetic algorithm (GA). A combination of three image features, i.e., color histogram, color moments, and edge histogram, is extracted directly from the compressed domain and is used for similarity matching using the Euclidean distance. However, all the above image features are not equally important in image retrieval. Before similarity matching, GA is used to provide optimal weight factor (importance) on the image features to improve the system performance. Extensive experiments are carried out on three publicly available databases, and the comparison results demonstrate the outperforming performance of the proposed method over state-of-the-art techniques. Moreover, it is also seen that the use of GA improves 5.83% in precision (P), 6.36% in recall (R), and 5.84% in F-score values over non-GA-based techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
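In result 21, a GA assigns weights to three feature blocks before Euclidean matching. A sketch of the weighted matching step only; the weight values and 8-bin feature vectors are placeholders (in the paper the weights come from the GA).

```python
import numpy as np

def weighted_distance(q, img, weights):
    """Euclidean distance over concatenated features, each block scaled
    by its weight."""
    d = 0.0
    for name, w in weights.items():
        d += w * np.sum((q[name] - img[name]) ** 2)
    return np.sqrt(d)

# Assumed weights; the paper's GA would optimize these instead.
weights = {"color_hist": 0.5, "color_moments": 0.3, "edge_hist": 0.2}
query = {k: np.random.rand(8) for k in weights}
cand = {k: np.random.rand(8) for k in weights}
print(weighted_distance(query, cand, weights))
```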
22. GLIP: A Concurrency Control Protocol for Clipping Indexing.
- Author
-
Chang-Tien Lu, Jing Dai, Ying Jin, and Mathuria, Janak
- Subjects
MULTIDIMENSIONAL databases, COMPUTER network protocols, DATABASES, COMPUTER programming, INFORMATION services, DISTRIBUTED computing, DATA mining, INFORMATION retrieval, ALGORITHMS
- Abstract
Multidimensional databases are beginning to be used in a wide range of applications. To meet this fast-growing demand, the R-tree family is being applied to support fast access to multidimensional data, for which the R+-tree exhibits outstanding search performance. In order to support efficient concurrent access in multiuser environments, concurrency control mechanisms for multidimensional indexing have been proposed. However, these mechanisms cannot be directly applied to the R+-tree because an object in the R+-tree may be indexed in multiple leaves. This paper proposes a concurrency control protocol for R-tree variants with object clipping, namely, Granular Locking for clIPping indexing (GLIP). GLIP is the first concurrency control approach specifically designed for the R+-tree and its variants, and it supports efficient concurrent operations with serializable isolation, consistency, and deadlock-free. Experimental tests on both real and synthetic data sets validated the effectiveness and efficiency of the proposed concurrent access framework. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
23. Information Preserving Time Decompositions of Time Stamped Documents.
- Author
-
Chundi, Parvathi and Rosenkrantz, Daniel J.
- Subjects
ARCHIVES, PUBLICATIONS, ALGORITHMS, DECOMPOSITION method, INFORMATION retrieval, DATABASES, SYSTEM analysis, DATA mining
- Abstract
Extraction of sequences of events from news and other documents based on the publication times of these documents has been shown to be extremely effective in tracking past events. This paper addresses the issue of constructing an optimal information preserving decomposition of the time period associated with a given document set, i.e., a decomposition with the smallest number of subintervals, subject to no loss of information. We introduce the notion of the compressed interval decomposition, where each subinterval consists of consecutive time points having identical information content. We define optimality, and show that any optimal information preserving decomposition of the time period is a refinement of the compressed interval decomposition. We define several special classes of measure functions (functions that measure the prevalence of keywords in the document set and assign them numeric values), based on their effect on the information computed as document sets are combined. We give algorithms, appropriate for different classes of measure functions, for computing an optimal information preserving decomposition of a given document set. We studied the effectiveness of these algorithms by computing several compressed interval and information preserving decompositions for a subset of the Reuters-21578 document set. The experiments support the obvious conclusion that the temporal information gleaned from a document set is strongly dependent on the measure function used and on other user-defined parameters. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
24. Toward Efficient Multifeature Query Processing.
- Author
-
Jagadish, H.V., Beng Chin Ooi, Heng Tao Shen, and Tan, Kian-Lee
- Subjects
ALGORITHMS, QUERY languages (Computer science), ELECTRONIC data processing, DATABASES, ELECTRONIC information resource searching, TOPOLOGY, INFORMATION retrieval, DATABASE searching, INDEXING
- Abstract
In many advanced applications, data are described by multiple high-dimensional features. Moreover, different queries may weight these features differently; some may not even specify all the features. In this paper, we propose our solution to support efficient query processing in these applications. We devise a novel representation that compactly captures f features into two components: The first component is a 2D vector that reflects a distance range (minimum and maximum values) of the f features with respect to a reference point (the center of the space) in a metric space and the second component is a bit signature, with two bits per dimension, obtained by analyzing each feature's descending energy histogram. This representation enables two levels of filtering: The first component prunes away points that do not share similar distance ranges, while the bit signature filters away points based on the dimensions of the relevant features. Moreover, the representation facilitates the use of a single index structure to further speed up processing. We employ the classical B+ tree for this purpose. We also propose a KNN search algorithm that exploits the access orders of critical dimensions of highly selective features and partial distances to prune the search space more effectively. Our extensive experiments on both real-life and synthetic data sets show that the proposed solution offers significant performance advantages over sequential scan and retrieval methods using single and multiple VA-files. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
25. A Fuzzy Knowledge-Based System for Intelligent Retrieval.
- Author
-
Koyuncu, Murat and Yazici, Adnan
- Subjects
FUZZY systems, INFORMATION retrieval, INFERENCE (Logic), ALGORITHMS, OBJECT-oriented databases, DATABASES
- Abstract
For many knowledge-intensive applications, it is important to develop an environment that permits flexible modeling and fuzzy querying of complex data and knowledge including uncertainty. With such an environment, one can have intelligent retrieval of information and knowledge, which has become a critical requirement for those applications. In this paper, we introduce a fuzzy knowledge-based (FKB) system along with the model and the inference mechanism. The inference mechanism is based on the extension of the Rete algorithm to handle fuzziness using a similarity-based approach. The proposed FKB system is used in the intelligent fuzzy object-oriented database (IFOOD) environment, in which a fuzzy object-oriented database is used to handle large scale of complex data while the FKB system is used to handle knowledge of the application domain. Both the fuzzy object-oriented database system and the fuzzy knowledge-based system are based on the object-oriented concepts to eliminate data type mismatches. The aim of this paper is mainly to introduce the FKB system of the IFOOD environment. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
26. Hierarchical directory mapping for category-constrained meta-search.
- Author
-
Tsay, Jyh-Jong and Lin, Chi-Hsiang
- Subjects
FEDERATED searching, INFORMATION storage & retrieval systems, INFORMATION resources, INTERNET searching, DATABASES, ALGORITHMS
- Abstract
Hierarchical category directories, in which categories are recursively partitioned into sub-categories, have been provided by many information sources, such as news, online stores and shopping websites. Such information sources categorize instances in their databases, and support category-constrained search in which one usually navigates along the category directory to select a category, and then submits a query to find objects in the selected category whose descriptions match the query. As more and more online sources are available, it is challenging to build a meta-search system which provides a unified directory and a meta-search capability to search and access all sources from different websites in one query submission. One of the fundamental problems in building such a meta-search system is category mapping which maps the selected category in the unified directory to categories provided by the information sources. In this paper, we develop an efficient algorithm for category mapping between hierarchical directories. Our algorithm is based on the following two techniques: consistency refinement and hierarchical substitution, which are developed with extensive use of hierarchical structures. Experiment shows that our approach substantially improves previous approaches, and can be used to implement automatic category mapping for meta-search systems which support category-constrained search. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
27. SIMILARITY-BASED RELATIONS IN DATALOG PROGRAMS.
- Author
-
HAJDINJAK, MELITA and BAUER, ANDREJ
- Subjects
DATA loggers, DATABASES, INFORMATION retrieval, ALGORITHMS, RELATION algebras, QUERY (Information retrieval system)
- Abstract
We consider similarity-based relational databases that allow to retrieve approximate data, find data within a given range of distance or similarity, and support imprecise queries. We focus on the recently introduced relational algebra with similarities on -relations, which are annotated with multi-dimensional similarity values with each dimension referring to a single attribute. The codomains of the annotated relations are De Morgan frames, and the annotations express the relevance of the tuples as answers to a similarity-based query. In this paper, we study Datalog programs on -relations, with and without negation. We describe the least-fixpoint algorithm for safe and rectified Datalog programs on -relations with finite support but without negative literals in the body. We further describe the perfect-minimal-fixpoint algorithm of a Datalog program on -relations with finite support and negative literals in the body when rules are safe, rectified and stratified. We introduce the idea of controlling the calculation of the annotations such that the tuples that enter an IDB relation last will be announced less desirable than those that enter first. For this we define a damping function that augments/diminishes the individual annotations that contribute to the final annotations of tuples. With a damping function, for instance, long chains of inferences may be made significantly less desirable or even totally undesirable. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
28. Rethinking information delivery: using a natural language processing application for point-of-care data discovery.
- Author
-
Workman, T. Elizabeth and Stoddart, Joan M.
- Subjects
INFORMATION retrieval, ALGORITHMS, DATABASE searching, DATABASES, DECISION support systems, HEALTH, INFORMATION storage & retrieval systems, MEDICAL databases, WEB development, MEDLINE, NATURAL language processing, PROGRAMMING languages, RESEARCH funding, SEMANTICS, INFORMATION needs
- Abstract
Objective: This paper examines the use of Semantic MEDLINE, a natural language processing application enhanced with a statistical algorithm known as Combo, as a potential decision support tool for clinicians. Semantic MEDLINE summarizes text in PubMed citations, transforming it into compact declarations that are filtered according to a user's information need that can be displayed in a graphic interface. Integration of the Combo algorithm enables Semantic MEDLINE to deliver information salient to many diverse needs. Methods: The authors selected three disease topics and crafted PubMed search queries to retrieve citations addressing the prevention of these diseases. They then processed the citations with Semantic MEDLINE, with the Combo algorithm enhancement. To evaluate the results, they constructed a reference standard for each disease topic consisting of preventive interventions recommended by a commercial decision support tool. Results: Semantic MEDLINE with Combo produced an average recall of 79% in primary and secondary analyses, an average precision of 45%, and a final average F-score of 0.57. Conclusion: This new approach to point-of-care information delivery holds promise as a decision support tool for clinicians. Health sciences libraries could implement such technologies to deliver tailored information to their users. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
29. A Privacy-Preserved Analytical Method for eHealth Database with Minimized Information Loss.
- Author
-
Ya-Ling Chen, Bo-Chao Cheng, Hsueh-Lin Chen, Chia-I Lin, Guo-Tan Liao, Bo-Yu Hou, and Shih-Chun Hsu
- Subjects
MEDICAL ethics, PRIVACY, INFORMATION retrieval, RISK assessment, RISK management in business, ALGORITHMS, DATABASES, USER interfaces, DATA security, ELECTRONIC health records
- Abstract
Digitizing medical information is an emerging trend that employs information and communication technology (ICT) to manage health records, diagnostic reports, and other medical data more effectively, in order to improve the overall quality of medical services. However, medical information is highly confidential and involves private information, so even legitimate access to data raises privacy concerns. Medical records provide health information on an as-needed basis for diagnosis and treatment, and the information is also important for medical research and other health management applications. Traditional privacy risk management systems have focused on reducing reidentification risk, and they do not consider information loss. In addition, such systems cannot identify and isolate data that carries high risk of privacy violations. This paper proposes the Hiatus Tailor (HT) system, which ensures low re-identification risk for medical records, while providing more authenticated information to database users and identifying high-risk data in the database for better system management. The experimental results demonstrate that the HT system achieves much lower information loss than traditional risk management methods, with the same risk of reidentification. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
30. Query processing issues in region-based image databases.
- Author
-
Bartolini, Ilaria, Ciaccia, Paolo, and Patella, Marco
- Subjects
DATABASES, IMAGE databases, INFORMATION storage & retrieval systems, INFORMATION retrieval, ALGORITHMS, IMAGE retrieval
- Abstract
Many modern image database systems adopt a region-based paradigm, in which images are segmented into homogeneous regions in order to improve the retrieval accuracy. With respect to the case where images are dealt with as a whole, this leads to some peculiar query processing issues that have not been investigated so far in an integrated way. Thus, it is currently hard to understand how the different alternatives for implementing the region-based image retrieval model might impact on performance. In this paper, we analyze in detail such issues, in particular the type of matching between regions (either one-to-one or many-to-many). Then, we propose a novel ranking model, based on the concept of Skyline, as an alternative to the usual one based on aggregation functions and k-Nearest Neighbors queries. We also discuss how different query types can be efficiently supported. For all the considered scenarios we detail efficient index-based algorithms that are provably correct. Extensive experimental analysis shows, among other things, that: (1) the 1-1 matching type has to be preferred to the N-M one in terms of efficiency, whereas the two have comparable effectiveness, (2) indexing regions rather than images performs much better, and (3) the novel Skyline ranking model is consistently the most efficient one, even if this sometimes comes at the price of a reduced effectiveness. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
31. A new hardware-assisted PIR with O(n) shuffle cost.
- Author
-
Ding, Xuhua, Yang, Yanjiang, Deng, Robert, and Wang, Shuhong
- Subjects
INFORMATION retrieval, ALGORITHMS, DATABASE security, DATABASE searching, DATABASES, DATABASE management, QUERY (Information retrieval system)
- Abstract
Since the concept of private information retrieval (PIR) was first formalized by Chor et al., various constructions have been proposed with a common goal of reducing communication complexity. Unfortunately, none of them is suitable for practical settings mainly due to the prohibitively high cost for either communications or computations. The booming of the Internet and its applications, especially, the recent trend in outsourcing databases, fuels the research on practical PIR schemes. In this paper, we propose a hardware-assisted PIR scheme with a novel shuffle algorithm. Our PIR construction entails O(n) offline computation cost, and constant online operations and O(log n) communication cost, where n is the database size. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
32. A new sampling technique for association rule mining.
- Author
-
Mahafzah, Basel A., Al-Badarneh, Amer F., and Zakaria, Mohammed Z.
- Subjects
ASSOCIATION rule mining, SAMPLING (Process), DATA reduction, DATA mining, INFORMATION retrieval, ALGORITHMS, DATABASES, MATHEMATICAL models
- Abstract
Association Rule Mining (ARM) is one of the data mining techniques used to extract hidden knowledge from datasets, that can be used by an organization's decision makers to improve overall profit. However, performing ARM requires repeated passes over the entire database. Obviously, for large database, the role of input/output overhead in scanning the database is very significant. A popular solution to improve the speed of ARM is to apply the mining algorithm on a sample instead of the entire database. In this paper, a parameterized sampling algorithm for ARM is presented. This algorithm extracts sample datasets based on three parameters: transaction frequency, transaction length and transaction frequency-length. To evaluate its performance and accuracy, a comparison against a two-phase sampling-based algorithm is performed using real and synthetic datasets. The experimental results show that the proposed sampling algorithm in some cases outperforms two-phase sampling algorithm, and achieves up to 98% accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
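Result 32 samples transactions by parameters such as transaction length before mining. A minimal sketch of length-biased sampling (frequency and frequency-length weighting would follow the same pattern); the toy database is illustrative.

```python
import random

def length_biased_sample(transactions, k, seed=0):
    """Draw k transactions with probability proportional to their length,
    one of the three weighting parameters named in the abstract."""
    rng = random.Random(seed)
    weights = [len(t) for t in transactions]
    return rng.choices(transactions, weights=weights, k=k)

db = [["a", "b"], ["a", "b", "c", "d"], ["c"], ["b", "c", "d"]]
print(length_biased_sample(db, k=2))  # long transactions are favored
```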
33. A Survey of Uncertain Data Algorithms and Applications.
- Author
-
Aggarwal, Charu C. and Yu, Philip S.
- Subjects
ALGORITHMS, INFORMATION services, DATABASES, DATA mining, INFORMATION retrieval, DATABASE searching, ELECTRONIC data processing, OLAP technology, DOCUMENT clustering
- Abstract
In recent years, a number of indirect data collection methodologies have led to the proliferation of uncertain data. Such databases are much more complex because of the additional challenges of representing the probabilistic information. In this paper, we provide a survey of uncertain data mining and management applications. We will explore the various models utilized for uncertain data representation. In the field of uncertain data management, we will examine traditional database management methods such as join processing, query processing, selectivity estimation, OLAP queries, and indexing. In the field of uncertain data mining, we will examine traditional mining problems such as frequent pattern mining, outlier detection, classification, and clustering. We discuss different methodologies to process and mine uncertain data in a variety of forms. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
34. From Wrapping to Knowledge.
- Author
-
Arjona, José L., Corchuelo, Rafael, Ruiz, David, and Toro, Miguel
- Subjects
WORLD Wide Web, INFORMATION retrieval, INTELLIGENT agents, COMPUTER software, SEMANTICS, NATURAL language processing, COMPUTER science, DATABASES, ANNOTATIONS, ALGORITHMS
- Abstract
One of the most challenging problems for Enterprise Information Integration is to deal with heterogeneous information sources on the Web. The reason is that they usually provide information that is in human-readable form only, which makes it difficult for a software agent to understand it. Current solutions build on the idea of annotating the information with semantics. If the information is unstructured, proposals such as S-CREAM, MnM, or Armadillo may be effective enough since they rely on using natural language processing techniques; furthermore, their accuracy can be improved by using redundant information on the Web, as C-PANKOW has proved recently. If the information is structured and closely related to a back-end database, Deep Annotation ranges among the most effective proposals, but it requires the information providers to modify their applications; if Deep Annotation is not applicable, the easiest solution consists of using a wrapper and transforming its output into annotations. In this paper, we prove that this transformation can be automated by means of an efficient, domain-independent algorithm. To the best of our knowledge, this is the first attempt to devise and formalize such a systematic, general solution. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
35. Parallel CBIR implementations with load balancing algorithms
- Author
-
Bosque, José L., Robles, Oscar D., Pastor, Luis, and Rodríguez, Angel
- Subjects
*ALGORITHMS, *INFORMATION retrieval, *DATABASES, *INFORMATION services
- Abstract
Abstract: The purpose of content-based information retrieval (CBIR) systems is to retrieve, from real data stored in a database, information that is relevant to a query. When large volumes of data are considered, as it is very often the case with databases dealing with multimedia data, it may become necessary to look for parallel solutions in order to store and gain access to the available items in an efficient way. Among the range of parallel options available nowadays, clusters stand out as flexible and cost effective solutions, although the fact that they are composed of a number of independent machines makes it easy for them to become heterogeneous. This paper describes a heterogeneous cluster-oriented CBIR implementation. First, the cluster solution is analyzed without load balancing, and then, a new load balancing algorithm for this version of the CBIR system is presented. The load balancing algorithm described here is dynamic, distributed, global and highly scalable. Nodes are monitored through a load index which allows the estimation of their total amount of workload, as well as the global system state. Load balancing operations between pairs of nodes take place whenever a node finishes its job, resulting in a receptor-triggered scheme which minimizes the system's communication overhead. Globally, the CBIR cluster implementation together with the load balancing algorithm can cope effectively with varying degrees of heterogeneity within the cluster; the experiments presented within the paper show the validity of the overall strategy. Together, the CBIR implementation and the load balancing algorithm described in this paper span a new path for performant, cost effective CBIR systems which has not been explored before in the technical literature. [Copyright © Elsevier]
- Published
- 2006
- Full Text
- View/download PDF
36. Coarse-to-Fine Vision-Based Localization by Indexing Scale-Invariant Features.
- Author
-
Junqiu Wang, Hongbin Zha, and Cipolla, Roberto
- Subjects
LOCALIZATION theory, INFORMATION retrieval, VECTOR spaces, DATABASES, ALGORITHMS, SOFTWARE localization, MATRICES (Mathematics)
- Abstract
This paper presents a novel coarse-to-fine global localization approach inspired by object recognition and text retrieval techniques. Harris-Laplace interest points characterized by scale-invariant transformation feature descriptors are used as natural landmarks. They are indexed into two databases: a location vector space model (LVSM) and a location database. The localization process consists of two stages: coarse localization and fine localization. Coarse localization from the LVSM is fast, but not accurate enough, whereas localization from the location database using a voting algorithm is relatively slow, but more accurate. The integration of coarse and fine stages makes fast and reliable localization possible. If necessary, the localization result can be verified by epipolar geometry between the representative view in the database and the view to be localized. In addition, the localization system recovers the position of the camera by essential matrix decomposition. The localization system has been tested in indoor and outdoor environments. The results show that our approach is efficient and reliable. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
37. An experimental effectiveness comparison of methods for 3D similarity search.
- Author
-
Bustos, Benjamin, Keim, Daniel, Saupe, Dietmar, Schreck, Tobias, and Vranić, Dejan
- Subjects
DATABASES, ALGORITHMS, INFORMATION retrieval, INFORMATION storage & retrieval systems, ELECTRONIC information resources
- Abstract
Methods for content-based similarity search are fundamental for managing large multimedia repositories, as they make it possible to conduct queries for similar content, and to organize the repositories into classes of similar objects. 3D objects are an important type of multimedia data with many promising application possibilities. Defining the aspects that constitute the similarity among 3D objects, and designing algorithms that implement such similarity definitions is a difficult problem. Over the last few years, a strong interest in 3D similarity search has arisen, and a growing number of competing algorithms for the retrieval of 3D objects have been proposed. The contributions of this paper are to survey a body of recently proposed methods for 3D similarity search, to organize them along a descriptor extraction process model, and to present an extensive experimental effectiveness and efficiency evaluation of these methods, using several 3D databases. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
38. Modeling and designing a proteomics application on PROTEUS.
- Author
-
Cannataro, M., Cuda, G., and Veltri, P.
- Subjects
BIOINFORMATICS, BREAST cancer, PROTEOMICS, MASS spectrometry, COMPUTERS in biology, GENETIC mutation, ALGORITHMS, COMPARATIVE studies, DATABASES, INFORMATION retrieval, INTERNATIONAL relations, INTERNET, MANAGEMENT information systems, RESEARCH methodology, MEDICAL cooperation, MEDICAL informatics, PROBLEM solving, RESEARCH, SYSTEM integration, EVALUATION research, HUMAN services programs
- Abstract
Objectives: Biomedical applications, such as analysis and management of mass spectrometry proteomics experiments, involve heterogeneous platforms and knowledge, massive data sets, and complex algorithms. Main requirements of such applications are semantic modeling of the experiments and data analysis, as well as high performance computational platforms. In this paper we propose a software platform for modeling and executing biomedical applications on the Grid. Methods: Computational Grids offer the required computational power, whereas ontologies and workflow help to face the heterogeneity of biomedical applications. In this paper we propose the use of domain ontologies and workflow techniques for modeling biomedical applications, whereas Grid middleware is responsible for high performance execution. As a case study, the modeling of a proteomics experiment is discussed. Results: The main result is the design and first use of PROTEUS, a Grid-based problem-solving environment for biomedical and bioinformatics applications. Conclusion: To manage the complexity of biomedical experiments, ontologies help to model applications and to identify appropriate data and algorithms, and workflow techniques allow the elements of such applications to be combined in a systematic way. Finally, translation of workflow into execution plans allows the exploitation of the computational power of Grids. Along this direction, in this paper we present PROTEUS, discussing a real case study in the proteomics domain. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
39. A method of spotting retrieval of similar intervals using frame-wisely renewing query.
- Author
-
Sekimoto, Nobuhiro, Nishimura, Takuichi, Takahashi, Hironobu, and Oka, Ryuichi
- Subjects
INFORMATION retrieval, AUDIOVISUAL materials, DATABASES, ALGORITHMS, ALGEBRA, ELECTRONIC information resources
- Abstract
The authors propose a method called the Rutic method for detecting time-series intervals similar to a sequentially entered time-series query in a large-scale time-series database containing data such as video or audio data. Although conventional methods such as RIFCDP or IPM handled sequentially entered time-series queries, they required a relatively large amount of calculations and were unsuitable for real-time retrieval from a large database. The proposed method enables retrieval results for the sequentially entered time-series query to be output for each entered frame. Since the Rutic method requires few calculations, it enables spotting retrieval to be implemented in real time. This paper describes the algorithm of the Rutic method and verifies its effectiveness by performing experiments using video retrieval to compare it with other methods. © 2004 Wiley Periodicals, Inc. Electron Comm Jpn Pt 2, 87(4): 65–76, 2004; Published online in Wiley InterScience (
www.interscience.wiley.com). DOI 10.1002/ecjb.20072 [ABSTRACT FROM AUTHOR]
- Published
- 2004
- Full Text
- View/download PDF
40. Large-Scale Video Retrieval Using Image Queries.
- Author
-
Araujo, Andre and Girod, Bernd
- Subjects
IMAGE processing, INFORMATION retrieval, DATABASES, ALGORITHMS, IMAGING systems
- Abstract
Retrieving videos from large repositories using image queries is important for many applications, such as brand monitoring or content linking. We introduce a new retrieval architecture, in which the image query can be compared directly with database videos—significantly improving retrieval scalability compared with a baseline system that searches the database on a video frame level. Matching an image to a video is an inherently asymmetric problem. We propose an asymmetric comparison technique for Fisher vectors and systematically explore query or database items with varying amounts of clutter, showing the benefits of the proposed technique. We then propose novel video descriptors that can be compared directly with image descriptors. We start by constructing Fisher vectors for video segments, by exploring different aggregation techniques. For a database of lecture videos, such methods obtain a two orders of magnitude compression gain with respect to a frame-based scheme, with no loss in retrieval accuracy. Then, we consider the design of video descriptors, which combine Fisher embedding with hashing techniques, in a flexible framework based on Bloom filters. Large-scale experiments using three datasets show that this technique enables faster and more memory-efficient retrieval, compared with a frame-based method, with similar accuracy. The proposed techniques are further compared against pre-trained convolutional neural network features, outperforming them on three datasets by a substantial margin. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
41. Rapid Retrieval of Lung Nodule CT Images Based on Hashing and Pruning Methods.
- Author
-
Pan, Ling, Qiang, Yan, Yuan, Jie, and Wu, Lidong
- Subjects
*LUNG radiography, *ALGORITHMS, *COMPARATIVE studies, *COMPUTED tomography, *DATABASES, *INFORMATION retrieval, *MEDICAL information storage & retrieval systems, *LUNG tumors, *RESEARCH funding, *TIME, *DECISION making in clinical medicine, *EARLY diagnosis, *DATA analysis software, *DESCRIPTIVE statistics, *COMPUTER-aided diagnosis, *DIAGNOSIS
- Abstract
The similarity-based retrieval of lung nodule computed tomography (CT) images is an important task in the computer-aided diagnosis of lung lesions. It can provide similar clinical cases for physicians and help them make reliable clinical diagnostic decisions. However, when handling large-scale lung images with a general-purpose computer, traditional image retrieval methods may not be efficient. In this paper, a new retrieval framework based on a hashing method for lung nodule CT images is proposed. This method can translate high-dimensional image features into a compact hash code, so the retrieval time and required memory space can be reduced greatly. Moreover, a pruning algorithm is presented to further improve the retrieval speed, and a pruning-based decision rule is presented to improve the retrieval precision. Finally, the proposed retrieval method is validated on 2,450 lung nodule CT images selected from the public Lung Image Database Consortium (LIDC) database. The experimental results show that the proposed pruning algorithm effectively reduces the retrieval time of lung nodule CT images and improves the retrieval precision. In addition, the retrieval framework is evaluated by differentiating benign and malignant nodules, and the classification accuracy can reach 86.62%, outperforming other commonly used classification methods. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
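Result 41 maps image features to compact hash codes and prunes candidates before ranking. A generic hash-then-prune sketch over binary codes; the code length, radius, and random data are assumptions, and the paper's pruning-based decision rules are more elaborate.

```python
import numpy as np

def hamming_search(db_codes, q_code, radius=8):
    """Rank database hash codes by Hamming distance to the query code and
    discard everything beyond the pruning radius."""
    dists = np.count_nonzero(db_codes != q_code, axis=1)
    keep = np.flatnonzero(dists <= radius)        # pruning step
    return keep[np.argsort(dists[keep])]          # ranked survivors

codes = np.random.randint(0, 2, size=(1000, 64), dtype=np.uint8)
q = np.random.randint(0, 2, size=64, dtype=np.uint8)
print(hamming_search(codes, q)[:5])
```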
42. PRACTICAL ALGORITHM FOR EXTRACTING MULTIPLE DATA SAMPLES FROM GOOGLE TRENDS EXTENDED FOR HEALTH.
- Author
-
Raubenheimer, Jacques E
- Subjects
DATABASES, MEDICAL information storage & retrieval systems, USER interfaces, AUTOMATIC data collection systems, SEARCH engines, INFORMATION retrieval, DESCRIPTIVE statistics, STATISTICAL correlation, STATISTICAL models, STATISTICAL sampling, ALGORITHMS
- Abstract
The authors discuss a sampling algorithm for obtaining estimates from data such as those given by Google via the Google Trends Extended for Health (GT-E) API. Highlights include an outline of different samplings based on width of sampling range, a Visual Basic for Applications code sheet provided for the algorithm, and the use of the algorithm with the unscaled GT-E data in favor of the scaled public facing data.
- Published
- 2022
- Full Text
- View/download PDF
43. Data Mining and Knowledge Discovery With Evolutionary Algorithms.
- Author
-
Ghosh, Ashish and Freitas, Alex A.
- Subjects
DATA mining, ALGORITHMS, CLASSIFICATION, DATABASE searching, INFORMATION retrieval, DATABASES
- Abstract
Introduces a series of articles on advances in the area of data mining and knowledge discovery with evolutionary algorithms. Information on classification as a task of data mining; Algorithm for knowledge discovery from texts; Data reduction in knowledge discovery in databases.
- Published
- 2003
- Full Text
- View/download PDF
44. Extraction of Plant Identification Keys Using Approximate String Matching for Species Properties Classification.
- Author
-
Sharifalillah, N., Mohd, S. B., and Khairuddin, I.
- Subjects
BIOINFORMATICS, BIOLOGISTS, DATABASES, INFORMATION storage & retrieval systems, ALGORITHMS
- Abstract
Most biologists keep data in separate databases. These databases are not necessarily well-structured. Plant identification keys are among such data. They are data-rich descriptions containing plant identification terminologies and may be used to identify various plant species. The way the data is kept often requires the species identification to be done using rules that are applied sequentially. Done manually, this is very time consuming. Information extraction (IE) is a process of selecting information such as names, terms, or phrases, from natural language text documents. This information is then structured into a specified template for retrieval. This method is applied to plant identification keys kept by the biologists. Before the keys are extracted from the description, they have to go through a number of processes. In this paper, we illustrate the pre-processing and processing methods with an example from a database, with emphasis on the approximate string matching algorithm used to extract the most relevant keys from the description. [ABSTRACT FROM AUTHOR]
- Published
- 2007
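Result 44 relies on approximate string matching to align extracted terminology with known identification keys. A minimal sketch using difflib's similarity ratio as a stand-in for whatever matcher the paper implements; the key strings and cutoff are invented examples.

```python
import difflib

def best_key_match(term, keys, cutoff=0.75):
    """Return the known identification key most similar to the extracted
    term, or None if nothing clears the cutoff."""
    hits = difflib.get_close_matches(term, keys, n=1, cutoff=cutoff)
    return hits[0] if hits else None

keys = ["leaf margin serrate", "leaf margin entire", "petiole pubescent"]
print(best_key_match("leaf margin serate", keys))  # tolerates the typo
```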
45. Optimal Parameters for Locality-Sensitive Hashing.
- Author
-
Slaney, Malcolm, Lifshits, Yury, and He, Junfeng
- Subjects
ALGORITHMS, HASHING, INFORMATION retrieval, DATABASES, WEB search engines
- Abstract
Locality-sensitive hashing (LSH) is the basis of many algorithms that use a probabilistic approach to find nearest neighbors. We describe an algorithm for optimizing the parameters and use of LSH. Prior work ignores these issues or suggests a search for the best parameters. We start with two histograms: one that characterizes the distributions of distances to a point's nearest neighbors and the second that characterizes the distance between a query and any point in the data set. Given a desired performance level (the chance of finding the true nearest neighbor) and a simple computational cost model, we return the LSH parameters that allow an LSH index to meet the performance goal and have the minimum computational cost. We can also use this analysis to connect LSH to deterministic nearest-neighbor algorithms such as k-d trees and thus start to unify the two approaches. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
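For context on result 45: LSH indexes points in hash tables so that near neighbors tend to collide, and the paper's contribution is choosing the table and bit parameters optimally. A random-hyperplane LSH sketch with arbitrary (not optimized) parameters.

```python
import numpy as np

def lsh_tables(X, n_tables=4, n_bits=8, seed=0):
    """Index rows of X under n_tables sign-pattern keys; n_tables and n_bits
    are the kind of parameters the paper's cost model would choose."""
    rng = np.random.default_rng(seed)
    planes = [rng.normal(size=(n_bits, X.shape[1])) for _ in range(n_tables)]
    tables = []
    for P in planes:
        table = {}
        for i, key in enumerate(map(tuple, X @ P.T > 0)):
            table.setdefault(key, []).append(i)
        tables.append(table)
    return planes, tables

def query(q, planes, tables):
    # Union of matching buckets across tables gives the candidate set.
    cands = set()
    for P, table in zip(planes, tables):
        cands.update(table.get(tuple(q @ P.T > 0), []))
    return cands

X = np.random.default_rng(1).normal(size=(100, 16))
planes, tables = lsh_tables(X)
print(query(X[0], planes, tables))  # candidates colliding with point 0
```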
46. Locality-Sensitive Hashing for Chi2 Distance.
- Author
-
Gorisse, David, Cord, Matthieu, and Precioso, Frederic
- Subjects
DATA structures, ALGORITHMS, INFORMATION retrieval, DATABASES, NEAREST neighbor analysis (Statistics), SEARCH algorithms, APPROXIMATION theory, COMPUTATIONAL complexity, SENSITIVITY analysis
- Abstract
In the past 10 years, new powerful algorithms based on efficient data structures have been proposed to solve the problem of Nearest Neighbors search (or Approximate Nearest Neighbors search). Although the Euclidean Locality Sensitive Hashing algorithm, which provides approximate nearest neighbors in a Euclidean space with sublinear complexity, is probably the most popular, the Euclidean metric does not always provide results as accurate and relevant as similarity measures such as the Earth-Mover Distance and the χ² distance. In this paper, we present a new LSH scheme adapted to χ² distance for approximate nearest neighbors search in high-dimensional spaces. We define the specific hashing functions, we prove their local-sensitivity, and compare, through experiments, our method with the Euclidean Locality Sensitive Hashing algorithm in the context of image retrieval on real image databases. The results prove the relevance of such a new LSH scheme either providing far better accuracy in the context of image retrieval than the Euclidean scheme for an equivalent speed, or providing an equivalent accuracy but with a high gain in terms of processing speed. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
47. Web Mining Technique for Collaborative Web Surfing.
- Author
-
Jalbani, Akhtar Ali, Menghwar, Gordhan Das, Memon, Mukhtiar, and Yasmin, Aneela
- Subjects
DATA mining, WEB browsing, DATABASES, ALGORITHMS, INFORMATION retrieval
- Abstract
Web mining is an application of data mining techniques used for knowledge discovery. In this paper, web mining techniques are studied for collaborative Web surfing, where more than one surfer searches for the same data in the world's largest database, i.e., the WWW. On the WWW the data is placed in an unstructured way, so finding relevant information is always a time-consuming and tiring job. We propose a data mining technique using a pattern matching method for collaborative web surfing. [ABSTRACT FROM AUTHOR]
- Published
- 2012
48. Image retrieval systems based on compact shape descriptor and relevance feedback information
- Author
-
Zagoris, Konstantinos, Ergina, Kavallieratou, and Papamarkos, Nikos
- Subjects
*IMAGE retrieval, *INFORMATION retrieval, *SUPPORT vector machines, *ALGORITHMS, *ALGEBRA, *DATABASES, *MPEG (Video coding standard)
- Abstract
Abstract: One of the most important and most used low-level image features is shape, employed in a variety of systems such as document image retrieval through word spotting. In this paper an MPEG-like descriptor is proposed that contains conventional contour and region shape features with a wide applicability from any arbitrary shape to document retrieval through word spotting. Its size and storage requirements are kept to a minimum without limiting its discriminating ability. In addition to that, a relevance feedback technique based on Support Vector Machines is provided that employs the proposed descriptor with the purpose to measure how well it performs with it. In order to evaluate the proposed descriptor it is compared against different descriptors at the MPEG-7 CE1 Set B database. [Copyright © Elsevier]
- Published
- 2011
- Full Text
- View/download PDF
49. Implementing Temporal Databases in Object-Oriented Systems.
- Author
-
Steiner, A. and Norrie, M. C.
- Subjects
DATABASES, QUERYING (Computer science), INFORMATION retrieval, INFORMATION resources management, ALGORITHMS
- Published
- 1997
50. Text detection in images using sparse representation with discriminative dictionaries
- Author
-
Zhao, Ming, Li, Shutao, and Kwok, James
- Subjects
*ENCYCLOPEDIAS & dictionaries, *IMAGE analysis, *SPARSE matrices, *ALGORITHMS, *INFORMATION retrieval, *WAVELETS (Mathematics), *DATABASES, *COLOR
- Abstract
Abstract: Text detection is important in the retrieval of texts from digital pictures, video databases and webpages. However, it can be very challenging since the text is often embedded in a complex background. In this paper, we propose a classification-based algorithm for text detection using a sparse representation with discriminative dictionaries. First, the edges are detected by the wavelet transform and scanned into patches by a sliding window. Then, candidate text areas are obtained by applying a simple classification procedure using two learned discriminative dictionaries. Finally, the adaptive run-length smoothing algorithm and projection profile analysis are used to further refine the candidate text areas. The proposed method is evaluated on the Microsoft common test set, the ICDAR 2003 text locating set, and an image set collected from the web. Extensive experiments show that the proposed method can effectively detect texts of various sizes, fonts and colors from images and videos. [Copyright © Elsevier]
- Published
- 2010
- Full Text
- View/download PDF