
MethParquet: an R package for rapid and efficient DNA methylation association analysis adopting Apache Parquet.

Authors :
Wang Z
Cassidy M
Wallace DA
Sofer T
Source :
Bioinformatics (Oxford, England) [Bioinformatics] 2024 Jul 01; Vol. 40 (7).
Publication Year :
2024

Abstract

Summary: Genome-wide DNA methylation (DNAm) profiling is indispensable for unveiling how DNAm regulates biological pathways and individual phenotypes. However, managing and analyzing extensive DNAm data generated from large cohort studies present computational obstacles. Apache Parquet is a data file format that allows for efficient data storage, retrieval, and manipulation, alleviating computational hurdles associated with conventional row-based formats. We here introduce MethParquet, the first R package leveraging the columnar Parquet format for efficient DNAm data analysis. It can be used for data extraction, methylation risk score calculation, epigenome-wide association analyses, and other standard post-quality control tasks. The package flexibly implements diverse regression models. Via a public methylation dataset, we show the efficiency of this package in reducing running time and RAM usage in large-scale EWAS.

Availability and Implementation: The MethParquet R package is publicly available on the GitHub repository https://github.com/ZWangTen/MethParquet. It includes a vignette and a toy dataset derived from a public resource.

(© The Author(s) 2024. Published by Oxford University Press.)

Details

Language :
English
ISSN :
1367-4811
Volume :
40
Issue :
7
Database :
MEDLINE
Journal :
Bioinformatics (Oxford, England)
Publication Type :
Academic Journal
Accession number :
38897661
Full Text :
https://doi.org/10.1093/bioinformatics/btae410