GEMINI: a computationally-efficient search engine for large gene expression datasets.

Authors :: Timothy DeFreitas
Hachem Saddiki
Patrick Flaherty
Source :: BMC Bioinformatics. 2/24/2016, Vol. 17, p1-7. 7p. 1 Diagram, 2 Charts, 3 Graphs.
Publication Year :: 2016
Abstract: Background: Low-cost DNA sequencing allows organizations to accumulate massive amounts of genomic data and use that data to answer a diverse range of research questions. Presently, users must search for relevant genomic data using a keyword, accession number or meta-data tag. However, in this search paradigm the form of the query -- a text-based string -- is mismatched with the form of the target -- a genomic profile. Results: To improve access to massive genomic data resources, we have developed a fast search engine, GEMINI, that uses a genomic profile as a query to search for similar genomic profiles. GEMINI implements a nearest-neighbor search algorithm using a vantage-point tree to store a database of n profiles and in certain circumstances achieves an O(log n) expected query time in the limit. We tested GEMINI on breast and ovarian cancer gene expression data from The Cancer Genome Atlas project and show that it achieves a query time that scales as the logarithm of the number of records in practice on genomic data. In a database with 10^5 samples, GEMINI identifies the nearest neighbor in 0.05 sec compared to a brute force search time of 0.6 sec. Conclusions: GEMINI is a fast search engine that uses a query genomic profile to search for similar profiles in a very large genomic database. It enables users to identify similar profiles independent of sample label, data origin or other meta-data information. [ABSTRACT FROM AUTHOR]
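
The abstract names a vantage-point tree as the structure behind the nearest-neighbor search. As a rough illustration of that general technique, and not GEMINI's actual implementation, the sketch below builds such a tree and queries it. Everything specific here is an assumption made for the example: the class name `VPTree`, the helper `brute_force_nearest`, the use of Euclidean distance, the random choice of vantage point, and the representation of a profile as a tuple of floats.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class VPTree:
    """Hypothetical vantage-point tree over a non-empty collection of profiles."""

    def __init__(self, points, dist=euclidean, rng=None):
        self.dist = dist
        rng = rng or random.Random(0)
        points = list(points)
        # Pick one profile as the vantage point and remove it from the pool.
        self.vp = points.pop(rng.randrange(len(points)))
        self.mu = 0.0
        self.inside = None
        self.outside = None
        if not points:
            return
        # Split the remaining profiles at the median distance to the vantage point.
        dists = [dist(self.vp, p) for p in points]
        self.mu = sorted(dists)[len(dists) // 2]
        inner = [p for p, d in zip(points, dists) if d < self.mu]
        outer = [p for p, d in zip(points, dists) if d >= self.mu]
        if inner:
            self.inside = VPTree(inner, dist, rng)
        if outer:
            self.outside = VPTree(outer, dist, rng)

    def nearest(self, query, best=None):
        """Return (distance, profile) for the stored profile closest to query."""
        d = self.dist(query, self.vp)
        if best is None or d < best[0]:
            best = (d, self.vp)
        # Visit the side of the split that contains the query first.
        if d < self.mu:
            first, second = self.inside, self.outside
        else:
            first, second = self.outside, self.inside
        if first is not None:
            best = first.nearest(query, best)
        # By the triangle inequality, the other side can only hold a closer
        # profile if the current best radius reaches across the split at mu.
        if second is not None and abs(d - self.mu) <= best[0]:
            best = second.nearest(query, best)
        return best


def brute_force_nearest(points, query, dist=euclidean):
    """Linear scan over all profiles, used as a baseline for comparison."""
    return min(((dist(query, p), p) for p in points), key=lambda t: t[0])
```

A query such as `VPTree(profiles).nearest(query_profile)` returns the same nearest distance as the linear scan, but it can skip whole subtrees when the pruning test fails. That skipping is what makes a logarithmic expected query time possible under favorable conditions.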

Subjects :: *NUCLEOTIDE sequencing
*GENE expression
*METADATA
*SEARCH algorithms
*DATABASE searching
*OVARIAN cancer
*CANCER genetics
*GENETICS of breast cancer

Details

Language :: English
ISSN :: 14712105
Volume :: 17
Database :: Academic Search Index
Journal :: BMC Bioinformatics
Publication Type :: Academic Journal
Accession number :: 113299814
Full Text :: https://doi.org/10.1186/s12859-016-0934-8
