SUBSEQUENCE COUNTING AS A MEASURE OF SIMILARITY FOR SEQUENCES.

Authors :
WANG, HUI
Source :
International Journal of Pattern Recognition & Artificial Intelligence. Jun 2007, Vol. 21, Issue 4, p745-758. 14p. 9 Diagrams, 1 Graph.
Publication Year :
2007

Abstract

The longest common subsequence is a well-known and popular method for measuring similarity between sequences. It advocates the use of information contained in the longest common subsequence as an indication of similarity. In this paper we consider the count of all common subsequences as a measure of sequence similarity, with the view that all common information is captured. This measure is inspired by and derived from a generic similarity measure, the neighborhood counting metric. The close connection of the neighborhood counting metric with probability and the Bayes classifier helps give insight, from a probabilistic perspective, into the all-common subsequences measure. We design algorithms to calculate this measure, and we also carry out an experiment in the framework of k-nearest neighbors on a gene sequence classification task. The experiment shows that the all-common subsequences measure and the longest common subsequence measure differ little for small k values, but differ significantly for large k values. The performance of the all-common subsequences measure remains steady as k gets larger, whereas the performance of the longest common subsequence measure drops sharply. Such a property may be useful for tasks where we are interested not only in the nearest neighbor, but also in the first k nearest neighbors. The main contribution of this paper is a suite of algorithms for finding all common subsequences. [ABSTRACT FROM AUTHOR]
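
To make the contrast between the two measures concrete, the following is a minimal Python sketch, not the paper's algorithms, which are not reproduced in this record. It assumes that the measure counts distinct common subsequences and includes the empty one; the record does not state either convention. The function names `count_common_subsequences` and `lcs_length` are hypothetical, and the counting routine uses a standard next-occurrence recursion swapped in for illustration.

```python
from functools import lru_cache


def count_common_subsequences(a: str, b: str) -> int:
    """Count distinct common subsequences of a and b.

    Assumption: the empty subsequence is counted, so the result is >= 1.
    """
    alphabet = set(a) & set(b)

    def next_table(s: str):
        # nxt[i][c] = smallest index k >= i with s[k] == c (key absent if none)
        nxt = [dict() for _ in range(len(s) + 1)]
        for i in range(len(s) - 1, -1, -1):
            nxt[i] = dict(nxt[i + 1])
            nxt[i][s[i]] = i
        return nxt

    na, nb = next_table(a), next_table(b)

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Distinct common subsequences of a[i:] and b[j:].
        total = 1  # the empty subsequence
        for c in alphabet:
            if c in na[i] and c in nb[j]:
                # Match c at its leftmost position in both strings, so each
                # distinct subsequence is counted exactly once.
                total += count(na[i][c] + 1, nb[j][c] + 1)
        return total

    return count(0, 0)


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (classic row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(count_common_subsequences("ACGT", "AGT"), lcs_length("ACGT", "AGT"))
```

The recursion depth grows with the length of the shorter sequence, so long gene sequences would need an iterative formulation. Either score could then serve as the similarity in a k-nearest-neighbors classifier, which is the setting the abstract describes.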

Details

Language :
English
ISSN :
0218-0014
Volume :
21
Issue :
4
Database :
Academic Search Index
Journal :
International Journal of Pattern Recognition & Artificial Intelligence
Publication Type :
Academic Journal
Accession number :
25484947
Full Text :
https://doi.org/10.1142/S021800140700565X