Gene prediction with conditional random fields

Authors :
James Galagan and David DeCaprio.
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Doherty, Matthew K
Publication Year :
2008

Abstract

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.
Includes bibliographical references (p. 75-77).
The accurate annotation of an organism's protein-coding genes is crucial for subsequent genomic analysis. The rapid advance of sequencing technology has created a gap between genomic sequences and their annotations. Automated annotation methods are needed to bridge this gap, but existing solutions based on hidden Markov models cannot easily incorporate diverse evidence to make more accurate predictions. In this thesis, I built upon the semi-Markov conditional random field framework created by DeCaprio et al. to predict protein-coding genes in DNA sequences. Several novel extensions were designed and implemented, including a 29-state model with both semi-Markov and Markov states, an N-best Viterbi inference algorithm, several classes of discriminative feature functions that incorporate diverse evidence, and parallelization of the training and inference algorithms. The extensions were tested on the genomes of Phytophthora infestans, Culex pipiens, and Homo sapiens. The gene predictions were analyzed and the benefits of discriminative methods were explored.
by Matthew K. Doherty.
M.Eng.

Details

Database :
OAIster
Notes :
77 p., application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1143338538
Document Type :
Electronic Resource