21 results on "Data anonymization"
Search Results
2. Summary of the National Cancer Institute 2023 Virtual Workshop on Medical Image De-identification-Part 1: Report of the MIDI Task Group - Best Practices and Recommendations, Tools for Conventional Approaches to De-identification, International Approaches to De-identification, and Industry Panel on Image De-identification.
- Author
- Clunie D, Prior F, Rutherford M, Moore S, Parker W, Kondylakis H, Ludwigs C, Klenk J, Lou B, O'Sullivan LT, Marcus D, Dobes J, Gutman A, and Farahani K
- Subjects
- United States, Humans, Diagnostic Imaging methods, Data Anonymization, National Cancer Institute (U.S.)
- Abstract
De-identification of medical images intended for research is a core requirement for data-sharing initiatives, particularly as the demand for data for artificial intelligence (AI) applications grows. The Center for Biomedical Informatics and Information Technology (CBIIT) of the US National Cancer Institute (NCI) convened a virtual workshop with the intent of summarizing the state of the art in de-identification technology and processes and exploring interesting aspects of the subject. This paper summarizes the highlights of the first day of the workshop, the recordings and presentations of which are publicly available for review. The topics covered included the report of the Medical Image De-Identification Initiative (MIDI) Task Group on best practices and recommendations, tools for conventional approaches to de-identification, international approaches to de-identification, and an industry panel. Competing Interests: Declarations. Conflict of Interest: Authors FP, MR, SM, WP, HK, and KF declare they have no financial interests. Author DC is the owner of PixelMed Publishing. Author CL is an employee of Aigora. Author JK is an employee of Deloitte Consulting. Author BL is an employee of Google. Author LO is an employee of IBIS. Author DM is an employee of Flywheel. Author JD is an employee of John Snow Labs. Author AG is a founder of AG Mednet. (© 2024. The Author(s).)
- Published
- 2025
3. Sharing ICU Patient Data Responsibly Under the Society of Critical Care Medicine/European Society of Intensive Care Medicine Joint Data Science Collaboration: The Amsterdam University Medical Centers Database (AmsterdamUMCdb) Example.
- Author
- Thoral, Patrick J., Peppink, Jan M., Driessen, Ronald H., Sijbrands, Eric J. G., Kompanje, Erwin J. O., Kaplan, Lewis, Bailey, Heatherlee, Kesecioglu, Jozef, Cecconi, Maurizio, Churpek, Matthew, Clermont, Gilles, van der Schaar, Mihaela, Ercole, Ari, Girbes, Armand R. J., Elbers, Paul W. G., and Amsterdam University Medical Centers Database (AmsterdamUMCdb) Collaborators and the SCCM/ESICM Joint Data Science Task Force
- Subjects
- *CRITICAL care medicine, *ACADEMIC medical centers, *MEDICAL databases, *DATA science, *INFORMATION sharing, *ELECTRONIC data interchange standards, *MEDICAL ethics laws, *RIGHT of privacy, *DATABASE laws, *DATABASES, *PRIVACY, *INTENSIVE care units, *RESEARCH, *ELECTRONIC data interchange, *RESEARCH methodology, *HEALTH Insurance Portability & Accountability Act, *MEDICAL cooperation, *EVALUATION research, *COMPARATIVE studies, *MEDICAL ethics, *RESEARCH funding, *MEDICAL societies, *ETHICS, *LAW
- Abstract
Objectives: Critical care medicine is a natural environment for machine learning approaches to improve outcomes for critically ill patients as admissions to ICUs generate vast amounts of data. However, technical, legal, ethical, and privacy concerns have so far limited the critical care medicine community from making these data readily available. The Society of Critical Care Medicine and the European Society of Intensive Care Medicine have identified ICU patient data sharing as one of the priorities under their Joint Data Science Collaboration. To encourage ICUs worldwide to share their patient data responsibly, we now describe the development and release of Amsterdam University Medical Centers Database (AmsterdamUMCdb), the first freely available critical care database in full compliance with privacy laws from both the United States and Europe, as an example of the feasibility of sharing complex critical care data. Setting: University hospital ICU. Subjects: Data from ICU patients admitted between 2003 and 2016. Interventions: We used a risk-based deidentification strategy to maintain data utility while preserving privacy. In addition, we implemented contractual and governance processes, and a communication strategy. Patient organizations, supporting hospitals, and experts on ethics and privacy audited these processes and the database. Measurements and Main Results: AmsterdamUMCdb contains approximately 1 billion clinical data points from 23,106 admissions of 20,109 patients. The privacy audit concluded that reidentification is not reasonably likely, and AmsterdamUMCdb can therefore be considered as anonymous information, both in the context of the U.S. Health Insurance Portability and Accountability Act and the European General Data Protection Regulation. The ethics audit concluded that responsible data sharing imposes minimal burden, whereas the potential benefit is tremendous. Conclusions: Technical, legal, ethical, and privacy challenges related to responsible data sharing can be addressed using a multidisciplinary approach. A risk-based deidentification strategy that complies with both U.S. and European privacy regulations should be the preferred approach to releasing ICU patient data. This supports the shared Society of Critical Care Medicine and European Society of Intensive Care Medicine vision to improve critical care outcomes through scientific inquiry of vast and combined ICU datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2021
4. Preliminary Evaluation of Fine-Tuning the OpenDeID Deidentification Pipeline Across Multi-Center Corpora.
- Author
- Gupta S, Liu J, Wong ZS, and Jonnagaddala J
- Subjects
- Humans, Confidentiality, Data Anonymization, Deep Learning, United States, Australia, Natural Language Processing, Electronic Health Records
- Abstract
Automatic deidentification of Electronic Health Records (EHR) is a crucial step in secondary usage for biomedical research. This study introduces evaluation of an intricate hybrid deidentification strategy to enhance patient privacy in secondary usage of EHR. Specifically, this study focuses on assessing automatic deidentification using OpenDeID pipeline across diverse corpora for safeguarding sensitive information within EHR datasets by incorporating diverse corpora. Three distinct corpora were utilized: the OpenDeID v2 corpus containing pathology reports from Australian hospitals, the 2014 i2b2/UTHealth deidentification corpus with clinical narratives from the USA, and the 2016 CEGS N-GRID identification corpus comprising psychiatric notes. The OpenDeID pipeline employs a hybrid approach based on deep learning and contextual rules. Pre-processing steps involved harmonizing and addressing encoding and format issues. Precision, Recall, F-measure metrics were used to assess the performance. The evaluation metrics demonstrated the superior performance of the Discharge Summary BioBERT model. Trained on three corpora with a total of 4,038 reports, the best performing model exhibited robust deidentification capabilities when applied to EHR. It achieved impressive micro-averaged F1-scores of 0.9248 and 0.9692 for strict and relaxed settings, respectively. These results offer valuable insights into the model's efficacy and its potential role in safeguarding patient privacy in secondary usage of EHR.
- Published
- 2024
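The strict and relaxed micro-averaged F1-scores reported in record 4 can be illustrated with a minimal sketch. It assumes entity spans are represented as hypothetical `(doc_id, start, end, type)` tuples, that "strict" means exact boundary agreement, and that "relaxed" means any character overlap with the same type; the evaluation rules actually used in the study may differ, and the sample spans below are invented for illustration.

```python
def _matches(pred, gold, relaxed):
    """Return True if a predicted span counts as a hit for a gold span."""
    same_doc_and_type = pred[0] == gold[0] and pred[3] == gold[3]
    if not same_doc_and_type:
        return False
    if relaxed:
        # Assumed relaxed rule: any character overlap is enough.
        return pred[1] < gold[2] and gold[1] < pred[2]
    # Strict rule: both boundaries must agree exactly.
    return pred[1] == gold[1] and pred[2] == gold[2]


def micro_f1(gold, pred, relaxed=False):
    """Micro-averaged F1: counts are pooled over all documents and types."""
    matched_pred = sum(any(_matches(p, g, relaxed) for g in gold) for p in pred)
    matched_gold = sum(any(_matches(p, g, relaxed) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example spans: (doc_id, start, end, entity_type)
gold = [("d1", 0, 10, "NAME"), ("d1", 20, 30, "DATE"), ("d2", 5, 9, "NAME")]
pred = [("d1", 0, 10, "NAME"), ("d1", 20, 28, "DATE"), ("d2", 40, 45, "NAME")]

f1_strict = micro_f1(gold, pred)
f1_relaxed = micro_f1(gold, pred, relaxed=True)
```

The partially overlapping DATE span counts only under the relaxed rule, which is why a relaxed score is never lower than the strict one on the same predictions.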
5. De-identification of free text data containing personal health information: a scoping review of reviews.
- Author
- Negash B, Katz A, Neilson CJ, Moni M, Nesca M, Singer A, and Enns JE
- Subjects
- Data Anonymization, Electronic Health Records, Health Insurance Portability and Accountability Act, Review Literature as Topic, United States, Confidentiality, Health Records, Personal
- Abstract
Introduction: Using data in research often requires that the data first be de-identified, particularly in the case of health data, which often include Personal Identifiable Information (PII) and/or Personal Health Identifying Information (PHII). There are established procedures for de-identifying structured data, but de-identifying clinical notes, electronic health records, and other records that include free text data is more complex. Several different ways to achieve this are documented in the literature. This scoping review identifies categories of de-identification methods that can be used for free text data. Methods: We adopted an established scoping review methodology to examine review articles published up to May 9, 2022, in Ovid MEDLINE; Ovid Embase; Scopus; the ACM Digital Library; IEEE Xplore; and Compendex. Our research question was: What methods are used to de-identify free text data? Two independent reviewers conducted title and abstract screening and full-text article screening using the online review management tool Covidence. Results: The initial literature search retrieved 3,312 articles, most of which focused primarily on structured data. Eighteen publications describing methods of de-identification of free text data met the inclusion criteria for our review. The majority of the included articles focused on removing categories of personal health information identified by the Health Insurance Portability and Accountability Act (HIPAA). The de-identification methods they described combined rule-based methods or machine learning with other strategies such as deep learning. Conclusion: Our review identifies and categorises de-identification methods for free text data as rule-based methods, machine learning, deep learning and a combination of these and other approaches. Most of the articles we found in our search refer to de-identification methods that target some or all categories of PHII. Our review also highlights how de-identification systems for free text data have evolved over time and points to hybrid approaches as the most promising approach for the future. Competing Interests: Conflict of interest: The author(s) declared no potential conflicts of interest with respect to the research, and/or publication of this article.
- Published
- 2023
6. Nonspecific deidentification of date-like text in deidentified clinical notes enables reidentification of dates.
- Author
- Alexander J and Beatty A
- Subjects
- Confidentiality, Electronic Health Records, Health Insurance Portability and Accountability Act, Humans, United States, Data Anonymization, Text Messaging
- Abstract
To facilitate the secondary usage of electronic health record data for research, the University of California, San Francisco (UCSF) recently implemented a clinical data warehouse including, among other data, deidentified clinical notes and reports, which are available to UCSF researchers without Institutional Review Board approval. For deidentification of these notes, most of the Health Insurance Portability and Accountability Act identifiers are redacted, but dates are transformed by shifting all dates for a patient back by the same random number of days. We describe an issue in which nonspecific (ie, excess) transformation of nondate, date-like text by this deidentification process enables reidentification of all dates, including birthdates, for certain patients. This issue undercuts the common assumption that excess deidentification is a safe tradeoff to protect patient privacy. We present this issue as a caution to other institutions that may also be considering releasing deidentified notes for research. (© The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.)
- Published
- 2022
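The per-patient date shifting described in record 6 can be sketched as follows. This is a hypothetical illustration, not the UCSF implementation: the function names, the hash-derived offset, the `demo-secret` value, and the 365-day cap are all assumptions. The last two lines show why a single known pair of original and shifted values is enough to recover every other date for that patient, which is the general risk the abstract warns about.

```python
import hashlib
from datetime import date, timedelta


def patient_offset(patient_id, secret, max_days=365):
    """Derive one stable offset in [1, max_days] per patient (assumed scheme)."""
    digest = hashlib.sha256(f"{secret}:{patient_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % max_days + 1


def shift_date(d, patient_id, secret):
    """Shift a date back by the patient's fixed offset."""
    return d - timedelta(days=patient_offset(patient_id, secret))


# Invented example values for one hypothetical patient.
birth = date(1980, 5, 17)
visit = date(2020, 3, 2)
shifted_birth = shift_date(birth, "patient-001", "demo-secret")
shifted_visit = shift_date(visit, "patient-001", "demo-secret")

# Intervals between a patient's own dates survive the shift.
interval_preserved = (shifted_visit - shifted_birth) == (visit - birth)

# If any original/shifted pair leaks, the offset and all other dates follow.
leaked_offset = (visit - shifted_visit).days
recovered_birth = shifted_birth + timedelta(days=leaked_offset)
```

Because the offset is constant within a patient, preserving intervals and exposing every date after one leak are two sides of the same design choice.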
7. HIPAA and the Leak of "Deidentified" EHR Data.
- Author
- Mandl KD and Perakslis ED
- Subjects
- Computer Security, Datasets as Topic, Humans, Information Dissemination, United States, Data Anonymization, Electronic Health Records, Health Insurance Portability and Accountability Act
- Published
- 2021
8. Benchmarking Modern Named Entity Recognition Techniques for Free-text Health Record Deidentification.
- Author
- Ahmed A, Abbasi A, and Eickhoff C
- Subjects
- Electronic Health Records, Humans, United States, Benchmarking, Data Anonymization
- Abstract
Electronic Health Records (EHRs) have become the primary form of medical data-keeping across the United States. Federal law restricts the sharing of any EHR data that contains protected health information (PHI). De-identification, the process of identifying and removing all PHI, is crucial for making EHR data publicly available for scientific research. This project explores several deep learning-based named entity recognition (NER) methods to determine which method(s) perform better on the de-identification task. We trained and tested our models on the i2b2 training dataset, and qualitatively assessed their performance using EHR data collected from a local hospital. We found that 1) Bi-LSTM-CRF represents the best-performing encoder/decoder combination, 2) character-embeddings tend to improve precision at the price of recall, and 3) transformers alone under-perform as context encoders. Future work focused on structuring medical text may improve the extraction of semantic and syntactic information for the purposes of EHR deidentification. (©2021 AMIA - All rights reserved.)
- Published
- 2021
9. Religious residue: Cross-cultural evidence that religious psychology and behavior persist following deidentification.
- Author
- Van Tongeren DR, DeWall CN, Chen Z, Sibley CG, and Bulbulia J
- Subjects
- Adult, Data Anonymization, Emotions, Female, Hong Kong, Humans, Male, Middle Aged, Netherlands, New Zealand, United States, Young Adult, Cross-Cultural Comparison, Religion and Psychology, Social Identification
- Abstract
More than 1 billion people worldwide report no religious affiliation. These religious "nones" represent the world's third largest religion-related identity group and are a diverse group, with some having previous religious identification and others never identifying as religious. We examined how 3 forms of religious identification (current, former, and never) influence a range of cognitions, emotions, and behavior. Three studies using nationally representative samples of religious Western (United States), secular Western (Netherlands, New Zealand) and Eastern (Hong Kong) cultures showed evidence of a religious residue effect: Formerly religious individuals (i.e., religious "dones") differed from never religious and currently religious individuals in cognitive, emotional, and behavioral processes. Study 1 (n = 3,071) offered initial cross-cultural evidence, which was extended in a preregistered replication study that also included measures of charitable contribution (Study 2; n = 1,626). Study 3 (N = 31,604) found that individuals who deidentified were still relatively likely to engage in prosocial behavior (e.g., volunteering) after leaving religion. This research has broad implications for understanding changing global trends in religious identification and their consequences for psychology and behavior. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
- Published
- 2021
10. Using word embeddings to improve the privacy of clinical notes.
- Author
- Abdalla M, Abdalla M, Rudzicz F, and Hirst G
- Subjects
- Health Insurance Portability and Accountability Act, Health Records, Personal, Humans, United States, Algorithms, Confidentiality, Data Anonymization, Electronic Health Records, Natural Language Processing
- Abstract
Objective: In this work, we introduce a privacy technique for anonymizing clinical notes that guarantees all private health information is secured (including sensitive data, such as family history, that are not adequately covered by current techniques). Materials and Methods: We employ a new "random replacement" paradigm (replacing each token in clinical notes with neighboring word vectors from the embedding space) to achieve 100% recall on the removal of sensitive information, unachievable with current "search-and-secure" paradigms. We demonstrate the utility of this paradigm on multiple corpora in a diverse set of classification tasks. Results: We empirically evaluate the effect of our anonymization technique both on upstream and downstream natural language processing tasks to show that our perturbations, while increasing security (ie, achieving 100% recall on any dataset), do not greatly impact the results of end-to-end machine learning approaches. Discussion: As long as current approaches utilize precision and recall to evaluate deidentification algorithms, there will remain a risk of overlooking sensitive information. Inspired by differential privacy, we sought to make it statistically infeasible to recreate the original data, although at the cost of readability. We hope that the work will serve as a catalyst to further research into alternative deidentification methods that can address current weaknesses. Conclusion: Our proposed technique can secure clinical texts at a low cost and extremely high recall with a readability trade-off while remaining useful for natural language processing classification tasks. We hope that our work can be used by risk-averse data holders to release clinical texts to researchers. (© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association.)
- Published
- 2020
11. Effect of Public Deliberation on Patient Attitudes Regarding Consent and Data Use in a Learning Health Care System for Oncology.
- Author
- Jagsi R, Griffith KA, Jones RD, Krenz C, Gornick M, Spence R, De Vries R, Hawley ST, Zon R, Bolte S, Sadeghi N, Schilsky RL, and Bradbury AR
- Subjects
- Adult, Aged, Aged, 80 and over, Female, Health Insurance Portability and Accountability Act, Health Knowledge, Attitudes, Practice, Humans, Male, Middle Aged, Policy Making, United States, Data Anonymization, Electronic Health Records, Health Services Research, Informed Consent, Learning Health System, Medical Oncology, Patient Preference, Patients psychology, Public Opinion
- Abstract
Purpose: We sought to generate informed and considered opinions regarding acceptable secondary uses of deidentified health information and consent models for oncology learning health care systems. Methods: Day-long democratic deliberation sessions included 217 patients with cancer at four geographically and sociodemographically diverse sites. Patients completed three surveys (at baseline, immediately after deliberation, and 1-month follow-up). Results: Participants were 67.3% female, 21.7% black, and 6.0% Hispanic. The most notable changes in perceptions after deliberation related to use of deidentified medical-record data by insurance companies. After discussion, 72.3% of participants felt comfortable if the purpose was to make sure patients receive recommended care (v 79.5% at baseline; P = .03); 24.9% felt comfortable if the purpose was to determine eligibility for coverage or reimbursement (v 50.9% at baseline; P < .001). The most notable change about secondary research use related to believing it was important that doctors ask patients at least once whether researchers can use deidentified medical-records data for future research. The proportion endorsing high importance decreased from baseline (82.2%) to 68.7% immediately after discussion (P < .001), and remained decreased at 73.1% (P = .01) at follow-up. At follow-up, non-Hispanic whites were more likely to consider it highly important to be able to conduct medical research with deidentified electronic health records (96.8% v 87.7%; P = .01) and less likely to consider it highly important for doctors to get a patient's permission each time deidentified medical record information is used for research (23.2% v 51.6%; P < .001). Conclusion: This research confirms that most patients wish to be asked before deidentified medical records are used for research. 
Policies designed to realize the potential benefits of learning health care systems can, and should be, grounded in informed and considered public opinion.
- Published
- 2019
12. Novel Data Sharing Agreement to Accelerate Big Data Translational Research Projects in the One Health Sphere.
- Author
- Staley J, Mazloom R, Lowe P, Newsum CT, Jaberi-Douraki M, Riviere J, and Wyckoff GJ
- Subjects
- Animals, Data Anonymization, Health Insurance Portability and Accountability Act, Humans, One Health, United States, Big Data, Information Dissemination methods, Translational Research, Biomedical organization & administration
- Abstract
When conducting translational research, the ability to share data generated by researchers and clinicians working with for-profit companies is essential, particularly in cases that involve "one health" data (i.e., data that could come from human, animal, or environmental sources). The 1DATA Project, a collaboration between Kansas State University and the University of Missouri, has examined and overcome some of the barriers to sharing this information for "big data" projects. This article discusses some of the obstacles we encountered, and the ways those obstacles can be surmounted via a novel form of Master Sharing Agreement. Developed in collaboration with industry partners, it is presented here as a template for expediting future one health work. (Copyright © 2019 Elsevier Inc. All rights reserved.)
- Published
- 2019
13. Researchers sound alarm on European data law.
- Author
- Rabesandratana T
- Subjects
- Data Anonymization, Finland, Humans, National Institutes of Health (U.S.), Personally Identifiable Information legislation & jurisprudence, United States, Computer Security legislation & jurisprudence, Confidentiality legislation & jurisprudence, European Union, Information Dissemination legislation & jurisprudence, International Cooperation legislation & jurisprudence
- Published
- 2019
14. On Anonymizing Medical Microdata with Large-Scale Missing Values - A Case Study with the FAERS Dataset.
- Author
- Hsiao MH, Lin WY, Hsu KY, and Shen ZX
- Subjects
- Algorithms, Humans, Privacy, United States, United States Food and Drug Administration, Data Anonymization, Drug-Related Side Effects and Adverse Reactions
- Abstract
As big data analysis becomes one of the main driving forces for productivity and economic growth, the concern of individual privacy disclosure increases as well, especially for applications accessing medical or health data that contain personal information. Most contemporary techniques for privacy preserving data publishing follow a simple assumption: the data of concern is complete, i.e., containing no missing values, which however is not the case in the real world. This paper presents our endeavors on inspecting the effect of missing values upon medical data privacy. In particular, we inspected the US FAERS dataset, a public dataset containing adverse drug events released by US FDA. Following the presumption of the current anonymization paradigm (that the data should contain no missing values), we investigated three intuitive strategies, including or excluding missing values or executing imputation, to anonymize the FAERS dataset. Our results demonstrate the awkwardness of these intuitive strategies in handling data with a massive amount of missing values. Accordingly, we propose a new strategy, consolidation, and the corresponding privacy protection model and anonymization algorithm. Experimental results show that our method can prevent privacy disclosure and sustain the data utility for ADR signal detection.
- Published
- 2019
15. Incorporating a location-based socioeconomic index into a de-identified i2b2 clinical data warehouse.
- Author
- Gardner BJ, Pedersen JG, Campbell ME, and McClay JC
- Subjects
- Adolescent, Adult, Aged, Aged, 80 and over, Censuses, Child, Child, Preschool, Data Anonymization, Emergency Service, Hospital statistics & numerical data, Female, Geographic Information Systems, Humans, Infant, Infant, Newborn, Logistic Models, Male, Middle Aged, Nebraska, Socioeconomic Factors, United States, Young Adult, Data Warehousing, Electronic Health Records, Geographic Mapping, Social Class, Social Determinants of Health
- Abstract
Objective: Clinical research data warehouses are largely populated from information extracted from electronic health records (EHRs). While these data provide information about a patient's medications, laboratory results, diagnoses, and history, her social, economic, and environmental determinants of health are also major contributing factors in readmission, morbidity, and mortality and are often absent or unstructured in the EHR. Details about a patient's socioeconomic status may be found in the U.S. census. To facilitate researching the impacts of socioeconomic status on health outcomes, clinical and socioeconomic data must be linked in a repository in a fashion that supports seamless interrogation of these diverse data elements. This study demonstrates a method for linking clinical and location-based data and querying these data in a de-identified data warehouse using Informatics for Integrating Biology and the Bedside. Materials and Methods: Patient data were extracted from the EHR at Nebraska Medicine. Socioeconomic variables originated from the 2011-2015 five-year block group estimates from the American Community Survey. Data querying was performed using Informatics for Integrating Biology and the Bedside. All location-based data were truncated to prevent identification of a location with a population <20 000 individuals. Results: We successfully linked location-based and clinical data in a de-identified data warehouse and demonstrated its utility with a sample use case. Discussion: With location-based data available for querying, research investigating the impact of socioeconomic context on health outcomes is possible. Efforts to improve geocoding can readily be incorporated into this model. Conclusion: This study demonstrates a means for incorporating and querying census data in a de-identified clinical data warehouse. (© The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.)
- Published
- 2019
16. Human Biospecimens Come from People.
- Author
- Tomlinson T and De Vries RG
- Subjects
- Costs and Cost Analysis, Data Anonymization, Electronic Health Records ethics, Electronic Health Records legislation & jurisprudence, Humans, Informed Consent ethics, Risk Assessment, Selection Bias, United States, United States Dept. of Health and Human Services legislation & jurisprudence, Biomedical Research ethics, Biomedical Research legislation & jurisprudence, Informed Consent legislation & jurisprudence, Mental Competency, Specimen Handling ethics
- Abstract
Contrary to the revised Common Rule, and contrary to the views of many bioethicists and researchers, we argue that broad consent should be sought for anticipated later research uses of deidentified biospecimens and health information collected during medical care. Individuals differ in the kinds of risk they find concerning and in their willingness to permit use of their biospecimens for future research. For this reason, asking their permission for unspecified research uses is a fundamental expression of respect for them as persons and should be done absent some compelling moral consideration to the contrary. We examine three moral considerations and argue that each of them fails: that there is a duty of easy rescue binding on all, that seeking consent creates a selection bias that undermines the validity of biospecimen research, and that seeking and documenting consent will be prohibitively expensive. (© 2019 by The Hastings Center. All rights reserved.)
- Published
- 2019
17. Re-Identification Risk in HIPAA De-Identified Datasets: The MVA Attack.
- Author
- Janmey V and Elkin PL
- Subjects
- Confidentiality, Health Insurance Portability and Accountability Act, Humans, Male, Risk, United States, Accidents, Traffic, Data Anonymization, Datasets as Topic
- Abstract
We present a re-identification attack that uses indirect (non-HIPAA) identifiers to target a vulnerable subset of records de-identified to the HIPAA Safe Harbor standard, those involving motor vehicle accidents (MVAs). Documentation of an MVA in a patient note creates a significant risk to patient privacy through the MVA re-identification attack, with a relative risk of 537 compared to the general population. Patients in a significant MVA resulting in either permanent injury, hospitalization or death (for any victim) should have the accident location information omitted due to the significant risk of re-identification of HIPAA de-identified data. Clinicians should also consider omitting location information for any MVA, as it significantly increases the risk of re-identification.
- Published
- 2018
18. An Open Source Tool for Game Theoretic Health Data De-Identification.
- Author
- Prasser F, Gaupp J, Wan Z, Xia W, Vorobeychik Y, Kantarcioglu M, Kuhn K, and Malin B
- Subjects
- Censuses, Confidentiality, Data Accuracy, Humans, United States, Data Anonymization, Game Theory, Software
- Abstract
Biomedical data continues to grow in quantity and quality, creating new opportunities for research and data-driven applications. To realize these activities at scale, data must be shared beyond its initial point of collection. To maintain privacy, healthcare organizations often de-identify data, but they assume worst-case adversaries, inducing high levels of data corruption. Recently, game theory has been proposed to account for the incentives of data publishers and recipients (who attempt to re-identify patients), but this perspective has been more hypothetical than practical. In this paper, we report on a new game theoretic data publication strategy and its integration into the open source software ARX. We evaluate our implementation with an analysis on the relationship between data transformation, utility, and efficiency for over 30,000 demographic records drawn from the U.S. Census Bureau. The results indicate that our implementation is scalable and can be combined with various data privacy risk and quality measures.
- Published
- 2018
19. Probabilistic Matching of Deidentified Data From a Trauma Registry and a Traumatic Brain Injury Model System Center: A Follow-up Validation Study.
- Author
- Kumar RG, Wang Z, Kesinger MR, Newman M, Huynh TT, Niemeier JP, Sperry JL, and Wagner AK
- Subjects
- Adolescent, Adult, Data Anonymization, Databases, Factual, Female, Follow-Up Studies, Humans, Male, Middle Aged, Predictive Value of Tests, Prospective Studies, Registries, Sensitivity and Specificity, United States, Young Adult, Algorithms, Brain Injuries, Traumatic, Datasets as Topic statistics & numerical data, Models, Statistical, Trauma Severity Indices
- Abstract
In a previous study, individuals from a single Traumatic Brain Injury Model Systems and trauma center were matched using a novel probabilistic matching algorithm. The Traumatic Brain Injury Model Systems is a multicenter prospective cohort study containing more than 14,000 participants with traumatic brain injury, following them from inpatient rehabilitation to the community over the remainder of their lifetime. The National Trauma Databank is the largest aggregation of trauma data in the United States, including more than 6 million records. Linking these two databases offers a broad range of opportunities to explore research questions not otherwise possible. Our objective was to refine and validate the previous protocol at another independent center. An algorithm generation and validation data set were created, and potential matches were blocked by age, sex, and year of injury; total probabilistic weight was calculated based on 12 common data fields. Validity metrics were calculated using a minimum probabilistic weight of 3. The positive predictive value was 98.2% and 97.4% and sensitivity was 74.1% and 76.3%, in the algorithm generation and validation set, respectively. These metrics were similar to the previous study. Future work will apply the refined probabilistic matching algorithm to the Traumatic Brain Injury Model Systems and the National Trauma Databank to generate a merged data set for clinical traumatic brain injury research use.
- Published
- 2018
- Full Text
- View/download PDF
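The blocking and weighting steps described in the abstract above can be illustrated with a minimal sketch in the style of Fellegi-Sunter record linkage. This is not the authors' algorithm: the abstract does not list the 12 fields, their weights, or the scale of the weight threshold, so the three field names, the m/u probabilities, and the log2 weight scale below are all hypothetical. Only the blocking variables (age, sex, year of injury) and the minimum weight of 3 come from the abstract.

```python
import math
from collections import defaultdict

# Hypothetical per-field parameters (NOT taken from the study):
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELD_PARAMS = {
    "injury_cause": (0.90, 0.20),
    "admission_month": (0.95, 1 / 12),
    "zip3": (0.85, 0.05),
}


def field_weight(agrees, m, u):
    """Log2 agreement weight if the field agrees, disagreement weight otherwise."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def total_weight(rec_a, rec_b):
    """Sum the per-field weights over all comparison fields."""
    return sum(
        field_weight(rec_a[name] == rec_b[name], m, u)
        for name, (m, u) in FIELD_PARAMS.items()
    )


def block_key(rec):
    """Blocking variables named in the abstract: age, sex, year of injury."""
    return (rec["age"], rec["sex"], rec["injury_year"])


def match(records_a, records_b, min_weight=3.0):
    """Compare only records that share a block; keep pairs at or above min_weight."""
    blocks = defaultdict(list)
    for rec in records_b:
        blocks[block_key(rec)].append(rec)

    pairs = []
    for rec_a in records_a:
        for rec_b in blocks.get(block_key(rec_a), []):
            weight = total_weight(rec_a, rec_b)
            if weight >= min_weight:
                pairs.append((rec_a["id"], rec_b["id"], weight))
    return pairs


# Made-up records for illustration only.
a1 = {"id": "A1", "age": 30, "sex": "M", "injury_year": 2010,
      "injury_cause": "fall", "admission_month": 5, "zip3": "152"}
b1 = dict(a1, id="B1")  # same block, all fields agree
b2 = dict(a1, id="B2", injury_cause="mvc", admission_month=9, zip3="441")  # same block, all disagree
b3 = dict(a1, id="B3", sex="F")  # different block, never compared

matches = match([a1], [b1, b2, b3])
```

Blocking keeps the number of pairwise comparisons manageable, since records are only scored against candidates with the same age, sex, and injury year. The cost is that any error in a blocking variable prevents a true match from ever being considered.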
20. The Promise and Perils of Open Medical Data.
- Author
-
Hoffman S
- Subjects
- Genome, Human, Humans, Internet, United States, Biological Specimen Banks ethics, Biological Specimen Banks standards, Data Anonymization, Genetic Privacy ethics, Genetic Privacy trends, Human Genome Project ethics, Information Dissemination, Precision Medicine, Public Policy, Social Discrimination prevention & control
- Published
- 2016
- Full Text
- View/download PDF
21. Ascertainment of outpatient visits by patients with diabetes: The National Ambulatory Medical Care Survey (NAMCS) and the National Hospital Ambulatory Medical Care Survey (NHAMCS).
- Author
-
Asao K, McEwen LN, Lee JM, and Herman WH
- Subjects
- Adolescent, Adult, Aged, Aged, 80 and over, Blood Glucose analysis, Data Anonymization, Diabetes Mellitus blood, Diabetes Mellitus drug therapy, Diabetes Mellitus epidemiology, Drug Monitoring, Drug Prescriptions, Electronic Health Records, Female, Glycated Hemoglobin analysis, Health Care Surveys, Humans, Hypoglycemic Agents therapeutic use, International Classification of Diseases, Male, Middle Aged, Physicians, Primary Care, Prevalence, United States epidemiology, Young Adult, Ambulatory Care, Diabetes Mellitus therapy, Primary Health Care
- Abstract
Aims: To estimate and evaluate the sensitivity and specificity of providers' diagnosis codes and medication lists to identify outpatient visits by patients with diabetes. Methods: We used data from the 2006 to 2010 National Ambulatory Medical Care Survey and National Hospital Ambulatory Medical Care Survey. We assessed the sensitivity and specificity of providers' diagnoses and medication lists to identify patients with diabetes, using the checkbox for diabetes as the gold standard. We then examined differences in sensitivity by patients' characteristics using multivariate logistic regression models. Results: The checkbox identified 12,647 outpatient visits by adults with diabetes among the 70,352 visits used for this analysis. The sensitivity and specificity of providers' diagnoses or listed diabetes medications were 72.3% (95% CI: 70.8% to 73.8%) and 99.2% (99.1% to 99.4%), respectively. Diabetic patients ≥75 years of age, women, non-Hispanics, and those with private insurance or Medicare were more likely to be missed by providers' diagnoses and medication lists. Diabetic patients who had more diagnosis codes and medications recorded, had glucose or hemoglobin A1c measured, or made office- rather than hospital-outpatient visits were less likely to be missed. Conclusions: Providers' diagnosis codes and medication lists fail to identify approximately one quarter of outpatient visits by patients with diabetes. (Copyright © 2015 Elsevier Inc. All rights reserved.)
- Published
- 2015
- Full Text
- View/download PDF
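The sensitivity and specificity reported in the abstract above are computed by comparing a test indicator (diagnosis codes or medication lists) against a gold standard (the diabetes checkbox). A minimal unweighted sketch follows. The function name and the ten example visits are hypothetical and are not study data, and the study's own estimates may also reflect survey weighting that this sketch ignores.

```python
def sensitivity_specificity(gold, test):
    """Return (sensitivity, specificity) for paired boolean indicators.

    gold[i] is the reference standard for visit i (e.g. checkbox ticked),
    test[i] is the indicator being evaluated (e.g. diagnosis code present).
    Assumes at least one gold-positive and one gold-negative visit.
    """
    pairs = list(zip(gold, test))
    true_pos = sum(1 for g, t in pairs if g and t)
    false_neg = sum(1 for g, t in pairs if g and not t)
    true_neg = sum(1 for g, t in pairs if not g and not t)
    false_pos = sum(1 for g, t in pairs if not g and t)

    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Ten made-up visits: 4 gold-standard positives, 6 negatives.
gold = [True, True, True, True, False, False, False, False, False, False]
test = [True, True, True, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(gold, test)
```

In this toy example one of the four true cases is missed, which mirrors the abstract's conclusion that a test can be highly specific while still missing a sizeable share of true cases.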