
An efficient sequential pattern mining algorithm for motifs with gap constraints

Authors :
Ming-Syan Chen
Vance Chiang-Chi Liao
Source :
BIBM
Publication Year :
2012
Publisher :
IEEE, 2012.

Abstract

Mining biological data can provide insight into various realms of biology, for example by finding co-occurring biosequences, a task essential to biological analysis and data mining. Sequential pattern mining reveals implicit motifs of all lengths, which have specific structures and functional significance in biological sequences. Traditional sequential pattern mining algorithms are inefficient on data with small alphabets and long sequences, such as DNA and protein sequences, so a different approach is needed. This work proposes the Depth-First Spelling algorithm for mining sequential patterns (motifs) with Gap constraints in biological sequences (DFSG). On biological sequences, DFSG runs substantially faster than GenPrefixSpan, a method based on PrefixSpan, which is one of the fastest traditional sequential pattern mining algorithms.
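The abstract does not describe DFSG's internals. As a rough, non-authoritative illustration of the general idea of "depth-first spelling" under a gap constraint, the sketch below grows patterns one symbol at a time over a small alphabet and keeps, for each sequence, only the positions where an occurrence of the current pattern ends. Everything here is an assumption for illustration, not the authors' algorithm: the function name `mine_gap_patterns`, the gap semantics (`min_gap`/`max_gap` count the symbols skipped between consecutive matched positions), support defined as the number of sequences containing at least one occurrence, and the `max_len` cutoff.

```python
from typing import Dict, List, Set


def mine_gap_patterns(
    sequences: List[str],
    alphabet: str,
    min_support: int,
    min_gap: int,
    max_gap: int,
    max_len: int,
) -> Dict[str, int]:
    """Hypothetical sketch: depth-first pattern growth with a gap constraint.

    Returns every pattern of length <= max_len whose support (number of
    sequences containing it under the assumed gap rule) is >= min_support.
    """
    results: Dict[str, int] = {}

    def grow(pattern: str, ends: Dict[int, Set[int]]) -> None:
        # ends maps sequence index -> positions where an occurrence of pattern ends
        results[pattern] = len(ends)
        if len(pattern) == max_len:
            return
        for symbol in alphabet:
            new_ends: Dict[int, Set[int]] = {}
            for sid, positions in ends.items():
                seq = sequences[sid]
                hits: Set[int] = set()
                for p in positions:
                    # Next symbol must skip between min_gap and max_gap symbols after p
                    lo = p + min_gap + 1
                    hi = min(p + max_gap + 1, len(seq) - 1)
                    for q in range(lo, hi + 1):
                        if seq[q] == symbol:
                            hits.add(q)
                if hits:
                    new_ends[sid] = hits
            # Extending a pattern cannot raise its support, so prune infrequent branches
            if len(new_ends) >= min_support:
                grow(pattern + symbol, new_ends)

    # Seed the depth-first search with each frequent single symbol
    for symbol in alphabet:
        start: Dict[int, Set[int]] = {}
        for sid, seq in enumerate(sequences):
            pos = {i for i, c in enumerate(seq) if c == symbol}
            if pos:
                start[sid] = pos
        if len(start) >= min_support:
            grow(symbol, start)
    return results


patterns = mine_gap_patterns(["ACGT", "AGCT", "ACCT"], "ACGT", 2, 0, 1, 3)
print(patterns["AC"])  # 3: "AC" occurs within the gap limit in all three sequences
```

Tracking only end positions keeps each extension step local to the last matched symbol, which is what a gap constraint between consecutive symbols requires; a production miner would need more careful indexing for long DNA or protein sequences.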

Details

Database :
OpenAIRE
Journal :
2012 IEEE International Conference on Bioinformatics and Biomedicine
Accession number :
edsair.doi...........d2de28fc87dd4ae1ecec659c6697ee72