Blazing Signature Filter: a library for fast pairwise similarity comparisons
- Source :
- BMC Bioinformatics, Vol 19, Iss 1, Pp 1-12 (2018)
- Publication Year :
- 2017
- Publisher :
- Cold Spring Harbor Laboratory, 2017.
- Abstract :
- Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts that would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity, or specificity of such computations will have broad impact, as they allow scientists to explore the wealth of scientific data more completely. A significant practical drawback of large-scale data mining is that the vast majority of pairwise comparisons are unlikely to be relevant, meaning that the pair does not share a signature of interest. It is therefore essential to identify these unproductive comparisons as rapidly as possible and exclude them from more time-intensive similarity calculations. The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm that enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to use computationally efficient bit operators and provide a coarse measure of similarity. As a result, the BSF can scale to high dimensionality and rapidly filter out unproductive pairwise comparisons. Two bioinformatics applications of the tool are presented to demonstrate the usefulness of this approach and its ability to scale to billions of pairwise comparisons.
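The filtering idea described in the abstract (binarize each signature, then compare pairs with bit operators) can be illustrated with a minimal sketch. This is not the BSF library's API: the function names, the single fixed `threshold` used for binarization, and the `min_shared` cutoff are all assumptions made for illustration only.

```python
# Illustrative sketch only; names and thresholds are hypothetical,
# not taken from the BSF library.

def binarize(values, threshold):
    """Pack a numeric signature into one int, setting bit i when values[i] >= threshold."""
    bits = 0
    for i, v in enumerate(values):
        if v >= threshold:
            bits |= 1 << i
    return bits


def shared_features(a, b):
    """Coarse similarity: number of bit positions set in both packed signatures."""
    return bin(a & b).count("1")


def filter_pairs(signatures, threshold, min_shared):
    """Return (name_x, name_y, shared_count) for pairs sharing at least min_shared set bits."""
    packed = {name: binarize(vals, threshold) for name, vals in signatures.items()}
    names = sorted(packed)
    kept = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            n = shared_features(packed[x], packed[y])
            if n >= min_shared:
                kept.append((x, y, n))
    return kept


if __name__ == "__main__":
    # Toy expression-like signatures (made-up numbers).
    signatures = {
        "g1": [2.0, 0.1, 3.0, 0.0],
        "g2": [2.5, 0.0, 2.2, 0.3],
        "g3": [0.0, 1.9, 0.1, 0.2],
    }
    print(filter_pairs(signatures, threshold=1.0, min_shared=2))
```

Only the pairs that survive this cheap bitwise pass would then be handed to a more time-intensive similarity calculation, which is the role the abstract assigns to the filter.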
- Subjects :
- 0301 basic medicine
Computer science
Scale (descriptive set theory)
lcsh:Computer applications to medicine. Medical informatics
Machine learning
computer.software_genre
Biochemistry
Task (project management)
03 medical and health sciences
0302 clinical medicine
Similarity (network science)
Structural Biology
Humans
lcsh:QH301-705.5
Pairwise similarity comparison
Molecular Biology
030304 developmental biology
Large-scale data mining
0303 health sciences
030102 biochemistry & molecular biology
Genome, Human
business.industry
Applied Mathematics
Computational Biology
High-Throughput Nucleotide Sequencing
Genomics
Sequence Analysis, DNA
Filter (signal processing)
Expression (mathematics)
Computer Science Applications
Data set
ComputingMethodologies_PATTERNRECOGNITION
030104 developmental biology
lcsh:Biology (General)
Filter (video)
030220 oncology & carcinogenesis
lcsh:R858-859.7
Pairwise comparison
Artificial intelligence
Data mining
Filtering
business
computer
Software
Algorithms
- Database :
- OpenAIRE
- Journal :
- BMC Bioinformatics, Vol 19, Iss 1, Pp 1-12 (2018)
- Accession number :
- edsair.doi.dedup.....136e35a659b9ab5fa666114f0257f82b
- Full Text :
- https://doi.org/10.1101/162750