Discovering isochores by least-squares optimal segmentation.

Authors :
Haiminen N
Mannila H
Source :
Gene [Gene] 2007 Jun 01; Vol. 394 (1-2), pp. 53-60. Date of Electronic Publication: 2007 Feb 16.
Publication Year :
2007

Abstract

The isochore structure of a genome is observable by variation in the G+C (guanine and cytosine) content within and between the chromosomes. Describing the isochore structure of vertebrate genomes is a challenging task, and many computational methods have been developed and applied to it. Here we apply a well-known least-squares optimal segmentation algorithm to isochore discovery. The algorithm finds the best division of the sequence into k pieces, such that the segments are internally as homogeneous as possible. We show how this simple segmentation method can be applied to isochore discovery using as input the G+C content of sliding windows on the sequence. To evaluate the performance of this segmentation technique on isochore detection, we present results from segmenting previously studied isochore regions of the human genome. Detailed results on the MHC locus, on parts of chromosomes 21 and 22, and on a 100 Mb region from chromosome 1 are similar to previously suggested isochore structures. We also give results on segmenting all 22 autosomal human chromosomes. An advantage of this technique is that oversegmentation of G+C rich regions can generally be avoided. This is because the technique concentrates on greater global, instead of smaller local, differences in the sequence composition. The effect is further emphasized by a log-transformation of the data that lowers the high variance that is observed in G+C rich regions. We conclude that the least-squares optimal segmentation method is computationally efficient and yields results close to previous biologically motivated isochore structures.
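The abstract describes the method only at a high level. The following is a minimal sketch of the general technique it names, not the authors' implementation. It uses the standard dynamic-programming formulation of least-squares optimal k-segmentation, applied to the G+C fraction of sliding windows. The function names (`gc_windows`, `log_transform`, `optimal_segmentation`), the window and step sizes, and the exact form of the log-transform (here `log(x + eps)`) are all hypothetical choices made for illustration.

```python
import math


def gc_windows(seq, window, step):
    """G+C fraction of each sliding window (hypothetical helper).

    Assumes the sequence contains only A/C/G/T (either case); N-runs or
    other symbols are not handled specially in this sketch.
    """
    seq = seq.upper()
    return [
        (seq.count("G", start, start + window) + seq.count("C", start, start + window)) / window
        for start in range(0, len(seq) - window + 1, step)
    ]


def log_transform(values, eps=1e-6):
    """Assumed log-transform; eps avoids log(0) for windows with no G or C."""
    return [math.log(v + eps) for v in values]


def optimal_segmentation(x, k):
    """Split x into k contiguous segments minimizing total squared error.

    Returns (segments, cost), where segments is a list of half-open index
    ranges (i, j). Runs in O(k * n^2) time using prefix sums.
    """
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(x)")
    # Prefix sums of x and x^2 give each segment's squared error in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(i, j):
        # Sum of squared deviations of x[i:j] from its own mean.
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j]: best error for splitting the first j points into m segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # i = start of the last segment
                c = cost[m - 1][i] + sse(i, j)
                if c < cost[m][j]:
                    cost[m][j] = c
                    back[m][j] = i
    # Walk the back-pointers to recover the segment boundaries.
    segments, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        segments.append((i, j))
        j = i
    segments.reverse()
    return segments, cost[k][n]
```

For example, a toy sequence with an A+T-only half followed by a G+C-only half yields ten 20 bp windows, and the two-segment solution splits them at window 5:

```python
seq = "AT" * 50 + "GC" * 50
segments, cost = optimal_segmentation(log_transform(gc_windows(seq, 20, 20)), k=2)
# segments == [(0, 5), (5, 10)]
```

Because the objective is the total squared error over the whole sequence, a split is only worth making where it removes a large share of that error. This is consistent with the abstract's point that the method favors larger global differences over small local fluctuations.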

Details

Language :
English
ISSN :
0378-1119
Volume :
394
Issue :
1-2
Database :
MEDLINE
Journal :
Gene
Publication Type :
Academic Journal
Accession number :
17389148
Full Text :
https://doi.org/10.1016/j.gene.2007.01.028