Back to Search
Start Over
Quasi-metrics, Similarities and Searches: aspects of geometry of protein datasets
- Publication Year :
- 2008
-
Abstract
- A quasi-metric is a distance function which satisfies the triangle inequality but is not symmetric: it can be thought of as an asymmetric metric. The central result of this thesis, developed in Chapter 3, is that a natural correspondence exists between similarity measures between biological (nucleotide or protein) sequences and quasi-metrics. Chapter 2 presents basic concepts of the theory of quasi-metric spaces and introduces a new examples of them: the universal countable rational quasi-metric space and its bicompletion, the universal bicomplete separable quasi-metric space. Chapter 4 is dedicated to development of a notion of the quasi-metric space with Borel probability measure, or pq-space. The main result of this chapter indicates that `a high dimensional quasi-metric space is close to being a metric space'. Chapter 5 investigates the geometric aspects of the theory of database similarity search in the context of quasi-metrics. The results about $pq$-spaces are used to produce novel theoretical bounds on performance of indexing schemes. Finally, the thesis presents some biological applications. Chapter 6 introduces FSIndex, an indexing scheme that significantly accelerates similarity searches of short protein fragment datasets. Chapter 7 presents the prototype of the system for discovery of short functional protein motifs called PFMFind, which relies on FSIndex for similarity searches.<br />Comment: 299 pages, 44 figures, 10 tables, 9 algorithms. PhD thesis in mathematics defended in May 2005 at the Victoria University of Wellington, Wellington, New Zealand (supervisors: Prof. Vladimir Pestov and Dr. Bill Jordan)
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.0810.5407
- Document Type :
- Working Paper