Back to Search Start Over

Quasi-metrics, Similarities and Searches: aspects of geometry of protein datasets

Authors :
Stojmirovic, Aleksandar
Publication Year :
2008

Abstract

A quasi-metric is a distance function which satisfies the triangle inequality but is not symmetric: it can be thought of as an asymmetric metric. The central result of this thesis, developed in Chapter 3, is that a natural correspondence exists between similarity measures between biological (nucleotide or protein) sequences and quasi-metrics. Chapter 2 presents basic concepts of the theory of quasi-metric spaces and introduces a new examples of them: the universal countable rational quasi-metric space and its bicompletion, the universal bicomplete separable quasi-metric space. Chapter 4 is dedicated to development of a notion of the quasi-metric space with Borel probability measure, or pq-space. The main result of this chapter indicates that `a high dimensional quasi-metric space is close to being a metric space'. Chapter 5 investigates the geometric aspects of the theory of database similarity search in the context of quasi-metrics. The results about $pq$-spaces are used to produce novel theoretical bounds on performance of indexing schemes. Finally, the thesis presents some biological applications. Chapter 6 introduces FSIndex, an indexing scheme that significantly accelerates similarity searches of short protein fragment datasets. Chapter 7 presents the prototype of the system for discovery of short functional protein motifs called PFMFind, which relies on FSIndex for similarity searches.<br />Comment: 299 pages, 44 figures, 10 tables, 9 algorithms. PhD thesis in mathematics defended in May 2005 at the Victoria University of Wellington, Wellington, New Zealand (supervisors: Prof. Vladimir Pestov and Dr. Bill Jordan)

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.0810.5407
Document Type :
Working Paper