6 results on '"Albrechtsen, Anders"'
Search Results
2. ANGSD: Analysis of Next Generation Sequencing Data
- Author
- Korneliussen, Thorfinn Sand, Albrechtsen, Anders, and Nielsen, Rasmus
- Subjects
Human Genome, Genetics, Networking and Information Technology R&D (NITRD), Gene Frequency, Genetics, Population, Genotype, High-Throughput Nucleotide Sequencing, Likelihood Functions, Polymorphism, Single Nucleotide, Software, Next-generation sequencing, Bioinformatics, Population genetics, Association studies, Mathematical Sciences, Biological Sciences, Information and Computing Sciences
- Abstract
Background: High-throughput DNA sequencing technologies are generating vast amounts of data. Fast, flexible and memory-efficient implementations are needed in order to facilitate analyses of thousands of samples simultaneously. Results: We present a multithreaded program suite called ANGSD. This program can calculate various summary statistics, and perform association mapping and population genetic analyses utilizing the full information in next generation sequencing data by working directly on the raw sequencing data or by using genotype likelihoods. Conclusions: The open source C/C++ program ANGSD is available at http://www.popgen.dk/angsd. The program is tested and validated on GNU/Linux systems. The program supports multiple input formats including BAM and imputed beagle genotype probability files. The program allows the user to choose between combinations of existing methods and can perform analyses that are not implemented elsewhere.
- Published
- 2014
3. Calculation of Tajima's D and other neutrality test statistics from low depth next-generation sequencing data
- Author
- Korneliussen, Thorfinn Sand, Moltke, Ida, Albrechtsen, Anders, and Nielsen, Rasmus
- Abstract
Background: A number of different statistics are used for detecting natural selection using DNA sequencing data, including statistics that are summaries of the frequency spectrum, such as Tajima’s D. These statistics are now often being applied in the analysis of Next Generation Sequencing (NGS) data. However, estimates of frequency spectra from NGS data are strongly affected by low sequencing coverage; the inherent technology-dependent variation in sequencing depth causes systematic differences in the value of the statistic among genomic regions. Results: We have developed an approach that accommodates the uncertainty of the data when calculating site frequency based neutrality test statistics. A salient feature of this approach is that it implicitly solves the problems of varying sequencing depth and missing data, and avoids the need to infer variable sites for the analysis, thereby avoiding ascertainment problems introduced by a SNP discovery process. Conclusion: Using an empirical Bayes approach for fast computations, we show that this method produces results for low-coverage NGS data comparable to those achieved when the genotypes are known without uncertainty. We also validate the method in an analysis of data from the 1000 Genomes Project. The method is implemented in a fast framework which enables researchers to perform these neutrality tests on a genome-wide scale.
- Published
- 2013
4. Estimation of allele frequency and association mapping using next-generation sequencing data
- Author
- Kim, Su, Lohmueller, Kirk E, Albrechtsen, Anders, Li, Yingrui, Korneliussen, Thorfinn, Tian, Geng, Grarup, Niels, Jiang, Tao, Andersen, Gitte, Witte, Daniel, Jorgensen, Torben, Hansen, Torben, Pedersen, Oluf, Wang, Jun, and Nielsen, Rasmus
- Abstract
Background: Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies is associated with substantial statistical uncertainty because of varying coverage and high error rates. Results: We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data. Conclusions: Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
- Published
- 2011
5. Estimation of allele frequency and association mapping using next-generation sequencing data
- Author
- Andersen Gitte, Witte Daniel, Jiang Tao, Grarup Niels, Tian Geng, Korneliussen Thorfinn, Li Yingrui, Albrechtsen Anders, Lohmueller Kirk E, Kim Su, Jorgensen Torben, Hansen Torben, Pedersen Oluf, Wang Jun, and Nielsen Rasmus
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies is associated with substantial statistical uncertainty because of varying coverage and high error rates. Results: We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data. Conclusions: Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
- Published
- 2011
6. Estimation of allele frequency and association mapping using next-generation sequencing data.
- Author
- Su Yeon Kim, Lohmueller, Kirk E., Albrechtsen, Anders, Yingrui Li, Korneliussen, Thorfinn, Geng Tian, Grarup, Niels, Tao Jiang, Andersen, Gitte, Witte, Daniel, Jorgensen, Torben, Hansen, Torben, Pedersen, Oluf, Jun Wang, and Nielsen, Rasmus
- Subjects
POPULATION genetics, COST effectiveness, STATISTICS, GENETIC polymorphisms, GENETICS
- Abstract
Background: Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies is associated with substantial statistical uncertainty because of varying coverage and high error rates. Results: We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data. Conclusions: Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score. [ABSTRACT FROM AUTHOR]
- Published
- 2011
Discovery Service for Jio Institute Digital Library