A system for phenotype harmonization in the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine (TOPMed) program

Authors :: Tanika N. Kelly
May E. Montasser
Alyna T. Khan
Laura M. Raffield
Carla Wilson
Elizabeth C. Oelsner
Kerri L. Wiggins
Ming-Huei Chen
Gina M. Peloso
Adolfo Correa
Andrew D. Johnson
Donna K. Arnett
Xiuqing Guo
Jai G. Broome
Daniel E. Weeks
Rebecca D. Jackson
Lucia Juarez
Stephen T. McGarvey
Pradeep Natarajan
Braxton D. Mitchell
Kent D. Taylor
Bruce M. Psaty
Santhi K. Ganesh
Cathy C. Laurie
Nicola L. Hawley
Leslie S. Emery
Adrienne M. Stilp
Alanna C. Morrison
Jennifer A. Smith
Charles Kooperberg
Catherine M. D’Augustine
Jan Graffelman
Paul S. de Vries
Chancellor Hohensee
Sharon L. R. Kardia
Patricia A. Peyser
Wan-Ling Hsu
Erin J. Buth
Kathleen C. Barnes
Susan R. Heckbert
Ramachandran S. Vasan
Nathan Pankratz
Karen M. Mutalik
Quenna Wong
Brian E. Cade
Jingmin Liu
Joshua C. Bis
Cecelia A. Laurie
Kari E. North
Fei Fei Wang
Mariza de Andrade
Nancy L. Heard-Costa
William Craig Johnson
L. Adrienne Cupples
Scott T. Weiss
Seyed Mehdi Nouraie
Patrick T. Ellinor
Jerome I. Rotter
Weiniu Gan
Shannon Kelly
Stephen S. Rich
Cashell E. Jaquish
Dongquan Chen
Nora Franceschini
Lisa R. Yanek
Jiwon Lee
Alexander P. Reiner
Megan L. Grove
Stella Aslibekyan
Myriam Fornage
Lawrence F. Bielak
Rasika A. Mathias
Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa
Universitat Politècnica de Catalunya. COSDA-UPC - COmpositional and Spatial Data Analysis
Source :: UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC); American Journal of Epidemiology, vol 190, iss 10
Publication Year :: 2021
Abstract: Genotype-phenotype association studies often combine phenotype data from multiple studies to increase statistical power. Harmonization of the data usually requires substantial effort due to heterogeneity in phenotype definitions, study design, data collection procedures, and data-set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data-sharing mechanisms. This system was developed for the National Heart, Lung, and Blood Institute’s Trans-Omics for Precision Medicine (TOPMed) program, which is generating genomic and other -omics data for more than 80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants (recruited in 1948–2012) from up to 17 studies per phenotype. Here we discuss challenges in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include 1) the software code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify, or extend these harmonizations to additional studies, and 2) the results of labeling thousands of phenotype variables with controlled vocabulary terms.

Subjects :: 0301 basic medicine
Program evaluation
Computer science
Epidemiology
common data elements
hematologic disease
Matemàtiques i estadística::Matemàtica aplicada a les ciències [Àrees temàtiques de la UPC]
Medical and Health Sciences
Mathematical Sciences
0302 clinical medicine
Documentation
cardiovascular disease
030212 general & internal medicine
Phenomics
Precision Medicine
lung diseases
Sleep-wake disorders
phenotypes
92 Biology and other natural sciences::92B Mathematical biology in general [Classificació AMS]
Common data elements
Cardiovascular disease
Phenotype
Phenotypes
Biomatemàtica
Information Dissemination
Harmonization
62 Statistics::62D05 Sampling theory, sample surveys [Classificació AMS]
Hematologic disease
03 medical and health sciences
Data Aggregation
Clinical Research
Controlled vocabulary
Genetics
Humans
AcademicSubjects/MED00860
sleep-wake disorders
Sampling (Statistics)
Genetic Association Studies
Lung diseases
Biomathematics
Data collection
Study Design
Matemàtiques i estadística::Estadística aplicada::Estadística biosanitària [Àrees temàtiques de la UPC]
Information dissemination
Human Genome
Precision medicine
Data science
United States
030104 developmental biology
Good Health and Well Being
National Heart, Lung, and Blood Institute (U.S.)
Mostreig (Estadística)
Program Evaluation

Details

Language :: English
Database :: OpenAIRE
Journal :: UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC); American Journal of Epidemiology, vol 190, iss 10
Accession number :: edsair.doi.dedup.....0380088474cfe1cc5218a3fad8da984a
