Large scale proteomic studies create novel privacy considerations

Authors :
Andrew C. Hill
Claire Guo
Elizabeth M. Litkowski
Ani W. Manichaikul
Bing Yu
Iain R. Konigsberg
Betty A. Gorbet
Leslie A. Lange
Katherine A. Pratte
Katerina J. Kechris
Matthew DeCamp
Marilyn Coors
Victor E. Ortega
Stephen S. Rich
Jerome I. Rotter
Robert E. Gerzsten
Clary B. Clish
Jeffrey L. Curtis
Xiaowei Hu
Ma-en Obeidat
Melody Morris
Joseph Loureiro
Debby Ngo
Wanda K. O’Neal
Deborah A. Meyers
Eugene R. Bleecker
Brian D. Hobbs
Michael H. Cho
Farnoush Banaei-Kashani
Russell P. Bowler
Source :
Scientific Reports, Vol 13, Iss 1, Pp 1-14 (2023)
Publication Year :
2023
Publisher :
Nature Portfolio, 2023.

Abstract

Privacy protection is a core principle of genomic but not proteomic research. We identified independent single nucleotide polymorphism (SNP) protein quantitative trait loci (pQTL) from COPDGene and the Jackson Heart Study (JHS), calculated continuous protein level genotype probabilities, and then applied a naïve Bayesian approach to link SomaScan 1.3K proteomes to genomes for 2812 independent subjects from COPDGene, JHS, the SubPopulations and InteRmediate Outcome Measures In COPD Study (SPIROMICS), and the Multi-Ethnic Study of Atherosclerosis (MESA). We linked 90–95% of proteomes to their correct genome, and for 95–99% of proteomes the correct genome was among the 1% most likely links. Linking accuracy in subjects with African ancestry was lower (~60%) unless training included diverse subjects. With larger profiling (SomaScan 5K) in the Atherosclerosis Risk in Communities (ARIC) study, correct identification was >99% even in mixed ancestry populations. We also linked proteomes to proteomes and used the proteome alone to determine features such as sex, ancestry, and first-degree relatives. When serial proteomes are available, the linking algorithm can be used to identify and correct mislabeled samples. This work also demonstrates the importance of including diverse populations in omics research and shows that large proteomic datasets (>1000 proteins) can be accurately linked to a specific genome through pQTL knowledge and should not be considered unidentifiable.
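
The linking idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each pQTL pairs one SNP (genotype coded 0, 1, or 2) with one protein, models the protein level as Gaussian around a per-genotype mean with a shared residual standard deviation, treats pQTLs as independent (the naïve Bayes assumption), and uses a uniform prior over candidate genomes. All function names, pQTL keys, and toy values below are hypothetical.

```python
import math


def fit_pqtl_model(levels, genotypes):
    """Fit per-genotype mean protein levels with one shared residual SD."""
    overall = sum(levels) / len(levels)
    means = {}
    for g in (0, 1, 2):
        xs = [x for x, gt in zip(levels, genotypes) if gt == g]
        # Fall back to the overall mean if a genotype is absent from training.
        means[g] = sum(xs) / len(xs) if xs else overall
    resid_sq = sum((x - means[gt]) ** 2 for x, gt in zip(levels, genotypes))
    sd = math.sqrt(resid_sq / max(len(levels) - 1, 1))
    return means, max(sd, 1e-6)  # floor the SD to avoid division by zero


def log_likelihood(level, genotype, model):
    """Gaussian log density of a protein level given a genotype."""
    means, sd = model
    z = (level - means[genotype]) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def link_score(proteome, genome, models):
    """Naive Bayes score: sum of per-pQTL log-likelihoods."""
    return sum(log_likelihood(proteome[k], genome[k], m) for k, m in models.items())


def rank_genomes(proteome, genomes, models):
    """Return candidate subject IDs ordered from most to least likely."""
    return sorted(
        genomes,
        key=lambda sid: link_score(proteome, genomes[sid], models),
        reverse=True,
    )


# Toy training data for two hypothetical pQTLs (assumed values, not study data).
train_genotypes = [0, 0, 1, 1, 2, 2]
models = {
    "pqtl_a": fit_pqtl_model([1.0, 1.2, 2.0, 2.2, 3.0, 3.2], train_genotypes),
    "pqtl_b": fit_pqtl_model([5.0, 5.2, 4.0, 4.2, 3.0, 3.2], train_genotypes),
}

# Candidate genomes, keyed by hypothetical subject ID.
genomes = {
    "s1": {"pqtl_a": 0, "pqtl_b": 2},
    "s2": {"pqtl_a": 1, "pqtl_b": 1},
    "s3": {"pqtl_a": 2, "pqtl_b": 0},
}

unlabeled_proteome = {"pqtl_a": 2.05, "pqtl_b": 4.1}
print(rank_genomes(unlabeled_proteome, genomes, models))
```

With only two pQTLs the ranking is fragile. The abstract's accuracy figures are reported for panels of more than 1000 proteins, where many independent pQTLs each add a small amount of evidence to the summed score.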

Subjects

Subjects :
Medicine
Science

Details

Language :
English
ISSN :
20452322
Volume :
13
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Scientific Reports
Publication Type :
Academic Journal
Accession number :
edsdoj.9424c0bfdf7d4d96b4dca84e9697c383
Document Type :
article
Full Text :
https://doi.org/10.1038/s41598-023-34866-6