Overlapping Genomic Sequences: A Treasure Trove of Single-Nucleotide Polymorphisms
- Author
Pui-Yan Kwok, Patricia Taillon-Miller, Qun Li, Zhijie Gu, and LaDeana W. Hillier
- Subjects
Sequence analysis; Molecular Sequence Data; Population; Human artificial chromosome; Biology; Genome; Genome Methods; DNA sequencing; Genes, Overlapping; Genetics; Humans; Bacteriophage P1; Cloning, Molecular; education; Genetics (clinical); Bacterial artificial chromosome; Polymorphism, Genetic; Base Sequence; Chromosomes, Human, Pair 13; Genome, Human; Sequence Analysis, DNA; Chromosomes, Bacterial; SNP genotyping; Chromosomes, Human, Pair 5; Human genome; Chromosomes, Human, Pair 7
- Abstract
An efficient strategy for developing a dense set of single-nucleotide polymorphism (SNP) markers is to take advantage of the human genome sequencing effort currently under way. Our approach is based on the fact that the bacterial artificial chromosomes (BACs) and P1-based artificial chromosomes (PACs) used in long-range sequencing projects come from diploid libraries. If overlapping sequenced clones are from different lineages, comparing them in the overlapping region amounts to comparing the sequences of two homologous chromosomes. We have analyzed in detail every SNP identified while sequencing three sets of overlapping clones located on chromosomes 5p15.2, 7q21–7q22, and 13q12–13q13. In the 200.6 kb of DNA sequence analyzed in these overlaps, 153 SNPs were identified. Computer screening for repetitive elements and for suitability for sequence-tagged site (STS) development yielded 44 STSs containing 68 SNPs for further study. All 68 SNPs were confirmed to be present in at least one of the three populations studied (Caucasian, African-American, and Hispanic). Furthermore, of the 68 SNPs tested, 42 (62%) were informative in at least one population, 32 (47%) in two or more populations, and 23 (34%) in all three populations. These results clearly indicate that developing SNP markers from overlapping genomic sequence is highly efficient and cost-effective, requiring only two simple steps: developing STSs around the known SNPs and characterizing them in the appropriate populations.

[The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AC003015 (GS113423), AC002380 (GS330J10), AC000066 (RG293F11), AC003086 (RG104F04), AC002525 (257C22A), and U73331 (96A18A).]
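The core idea, comparing two overlapping clones base by base to find single-nucleotide differences, can be illustrated with a minimal Python sketch. It assumes (hypothetically, and far more simply than a real pipeline) that the two overlap sequences are already aligned to equal length with no gaps. Every mismatch is reported only as a *candidate* SNP, since a difference may also be a sequencing error; in the study, candidates were confirmed by typing them in populations. The function names `find_snp_candidates`, `kb_per_snp`, and `percent` are illustrative, not from the paper. The sketch also reproduces the abstract's arithmetic: 153 SNPs in 200.6 kb is roughly one SNP every 1.31 kb, and 42, 32, and 23 of the 68 tested SNPs correspond to 62%, 47%, and 34%.

```python
from typing import List, Tuple


def find_snp_candidates(seq_a: str, seq_b: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, base in clone A, base in clone B) for each mismatch.

    Hypothetical simplification: seq_a and seq_b are the overlapping regions of
    two clones, already aligned base-for-base with no insertions or deletions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("overlap sequences must be aligned to equal length")
    candidates = []
    for pos, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1):
        # Report only clear base differences; skip ambiguous calls such as N.
        if a != b and a in "ACGT" and b in "ACGT":
            candidates.append((pos, a, b))
    return candidates


def kb_per_snp(overlap_kb: float, n_snps: int) -> float:
    """Average spacing between detected SNPs, in kilobases."""
    return overlap_kb / n_snps


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


if __name__ == "__main__":
    # Toy overlap between two clones (made-up sequences, not study data).
    print(find_snp_candidates("ACGTTAGCCA", "ACGTCAGCTA"))
    # Figures reported in the abstract.
    print(f"{kb_per_snp(200.6, 153):.2f} kb per SNP")
    print([percent(n, 68) for n in (42, 32, 23)])
```

Run as a script, the toy example reports the two mismatches `[(5, 'T', 'C'), (9, 'C', 'T')]`, followed by `1.31 kb per SNP` and `[62, 47, 34]`.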
- Published
1998