Surveys have found that patients' opinions of how their data should be protected fall along a continuum, and although most can be classified as cautious regarding the use of their electronic health record (EHR) data for research, the majority of patients are not averse to the idea.1 The most prevalent reason for keeping EHR data private is the perceived risk to patients' personal lives: stigmatizing health conditions appearing in the EHR can threaten social relationships and status.2 It should not be taken for granted that EHR data may be used for anything other than caring for the specific patient from whom they were collected, but when patients' legitimate concerns are dealt with in a sensitive manner, it is possible to work with EHR data in ethical ways while promoting clinical research. We have identified three areas of importance in maintaining patient privacy: de-identification of the data, the patients' trust in the researcher and in the research, and the technical data security of the computer system.

Is there a form of de-identification appropriate for all comers? Algorithms have been developed for structured data to prevent the disclosure of sensitive information even when the data are distributed at a detailed, non-aggregated, line-item patient level, using a method known as 'k-level anonymity',3 4 where k represents the minimum number of records in the set that must be indistinguishable from one another for the data to pass scrutiny. When a patient record exceeds this level of uniqueness, that is, when fewer than k records share its identifying values, data values are removed until the record is no longer unique. Although such methods superficially seem an adequate solution to the de-identification problem, they have been shown to be subject to 'reverse engineering', an undoing of the obfuscation.5 Furthermore, they often remove critical attributes from the data.6 Another popular form of de-identification used for medical records is the 'scrubbing' of textual medical reports.7–9 Computer programs search the text and attempt to remove patient names, dates, locations and other potentially identifying information. These programs perform with varying levels of accuracy and involve trade-offs similar to those described above for structured data: to ensure that the data are de-identified and 'unmatchable' to the original record, sentence structure and other important attributes of the data must often be removed.10
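To make the structured-data trade-off concrete, the sketch below enforces k-anonymity by simple suppression. The quasi-identifier fields, the toy cohort and the strategy of blanking or dropping values are illustrative assumptions only; the published algorithms instead generalize values (for example, truncating a ZIP code) so that more of the data survive.

```python
from collections import Counter

# Quasi-identifiers: fields that, in combination, could re-identify a patient.
# The field names and the toy cohort below are illustrative only.
QUASI_IDENTIFIERS = ["zip", "birth_year", "sex"]

def equivalence_classes(records, fields):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[f] for f in fields) for r in records)

def suppress_to_k_anonymity(records, k, fields=QUASI_IDENTIFIERS):
    """Crude k-anonymity by suppression: blank quasi-identifier values, one field
    at a time, for any record whose combination of values is shared by fewer than
    k records; drop records that remain unique even so.  Published algorithms
    generalize values instead (e.g. a 5-digit ZIP to a 3-digit ZIP) to retain
    more information, but the trade-off is the same."""
    records = [dict(r) for r in records]              # work on a copy
    for field in fields:                              # suppress progressively more fields
        counts = equivalence_classes(records, fields)
        if all(c >= k for c in counts.values()):
            break                                     # every record already hides in a crowd of k
        for r in records:
            if counts[tuple(r[f] for f in fields)] < k:
                r[field] = "*"                        # information loss: the price of anonymity
    counts = equivalence_classes(records, fields)
    return [r for r in records if counts[tuple(r[f] for f in fields)] >= k]

cohort = [
    {"zip": "53449", "birth_year": 1947, "sex": "F", "dx": "diabetes"},
    {"zip": "53449", "birth_year": 1947, "sex": "F", "dx": "asthma"},
    {"zip": "53401", "birth_year": 1982, "sex": "M", "dx": "hypertension"},
]
# With k=2 the third record stays unique even after its quasi-identifiers are
# blanked, so it is dropped: anonymity is bought with lost data.
print(suppress_to_k_anonymity(cohort, k=2))
```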
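In the same spirit, the following is a much-simplified sketch of report scrubbing, assuming a handful of invented patterns and a caller-supplied name list; the systems cited above rely on dictionaries, statistical models and institution-specific knowledge, and achieve correspondingly better accuracy.

```python
import re

# Pattern-based scrubber.  Real scrubbers use dictionaries, statistical models
# and institution-specific knowledge; these few regular expressions are only a
# sketch of the idea.
PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),                   # e.g. 3/14/2009
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),                          # e.g. 123-45-6789
    (re.compile(r"\b(MRN|medical record number)[:\s]*\d+\b", re.I), "[MRN]"),
    (re.compile(r"\b(Dr|Mr|Mrs|Ms)\.\s+[A-Z][a-z]+\b"), "[NAME]"),            # titled names only
]

def scrub(text, known_names=()):
    """Replace likely identifiers with placeholder tags.  Anything the patterns
    miss (unusual date formats, untitled names, local place names) leaks
    through; this is the accuracy/information trade-off noted above."""
    for name in known_names:                  # e.g. names from the registration system
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    for pattern, tag in PATTERNS:
        text = pattern.sub(tag, text)
    return text

note = "Jane Roe (MRN 004521) seen by Dr. Smith on 3/14/2009 for follow-up."
print(scrub(note, known_names=["Jane Roe"]))
# -> [NAME] ([MRN]) seen by [NAME] on [DATE] for follow-up.
```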
The failure of technology alone to offer a foolproof de-identification solution is not surprising. People are extremely resourceful at solving challenging puzzles, such as the re-identification of de-identified data. However, the true risk may be greatly overemphasized by these demonstrations,11 and the result is two not entirely satisfactory approaches to de-identification: one that produces de-identified output stripped of meaningful data, and another that retains germane information using methods that can be breached if they fall into the wrong hands. In attempts to resolve this paradox, illogical decisions can be made about patient privacy solutions. For example, at Marshfield Clinic there has been an enormous investment in a bank of over 20 000 consented patients who are genotyped using donated blood and tissue. These genotypes are combined with de-identified phenotypic data from the Marshfield Clinic electronic medical record.12 A well-intentioned policy was put into place to keep the people who view identified phenotypic data from having access to the associated de-identified genomic data, the reasoning being that a person who could see both datasets might find a way to tie them together. Because all physicians at Marshfield Clinic must have access to the EHR, the outcome is that many Marshfield physicians who are investigators cannot look at the data from their own studies.

One approach to resolving such privacy management discordance is to match the level of data de-identification to the trustworthiness of the data recipients: the more identified the data, the more 'trustworthy' the recipients are required to be, and vice versa. This solution requires that trustworthiness be quantified and governed by established, socially acceptable processes, such as the criminal history checks, letters of reference and credentialing systems that have long been used in many areas of society to perform objective trust assessments. Specific methods used at Partners Healthcare and Harvard University will be described later in the paper. The level of trust in a data recipient thus becomes a critical factor in determining what data that person may see.

We must also consider the technical protection of the patient data itself. The Health Information Technology for Economic and Clinical Health (HITECH) Act requires covered entities to conduct a risk analysis and to implement the physical, administrative and technical safeguards that each covered entity determines are reasonable and appropriate.13 Technical safeguards to consider include user access and authentication controls, assignment of privileges, maintenance of file and system integrity, back-ups, monitoring processes, log-keeping, auditing and physical security of the data.

A range of possible solutions exists for managing the technical protection of the data, each representing a different balance among risk, cost and flexibility. The solution at the University of California at San Francisco (UCSF) was to create an exclusive, protected area for data and analysis inside a specially firewalled zone for the research community. The incentive to use the protected area is that legal coverage is provided should a data breach occur within it. This solution guarantees that the technical safeguards implemented by the institution within the protected area, such as firewalls, network intrusion detection, virtual private networks and disk encryption, are followed by the researchers. However, it requires a high resource commitment from the institution to maintain the protected area, and the use of specialized software on privately funded platforms is not supported.
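Several of the safeguards listed above, notably access and authentication controls, assignment of privileges, log-keeping and auditing, ultimately reduce to a small amount of enforcement code on whatever platform holds the data. The sketch below is a hypothetical illustration of that point, with an invented role-to-privilege table and log file name; it is not the access-control implementation of any institution discussed here.

```python
import logging
from datetime import datetime, timezone

# Hypothetical privilege table: which form of the data each role may request.
# The role names and data levels are invented for this illustration.
ROLE_PRIVILEGES = {
    "analyst":      {"aggregate"},
    "investigator": {"aggregate", "de-identified"},
    "care_team":    {"aggregate", "de-identified", "identified"},
}

# Append-only audit log: who asked for what, when, and whether it was allowed.
logging.basicConfig(filename="phi_access_audit.log", level=logging.INFO)
audit = logging.getLogger("phi_audit")

def request_data(user, role, data_level):
    """Grant or deny a data request and record the decision for later auditing."""
    allowed = data_level in ROLE_PRIVILEGES.get(role, set())
    audit.info("%s user=%s role=%s level=%s allowed=%s",
               datetime.now(timezone.utc).isoformat(), user, role, data_level, allowed)
    if not allowed:
        raise PermissionError(f"role '{role}' may not access {data_level} data")
    return f"(query against the {data_level} repository would run here)"

print(request_data("jdoe", "investigator", "de-identified"))  # allowed, and logged
```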
With more responsibility and trust given to the researchers, institutions such as Partners Healthcare have policies similar to those of UCSF; however, researchers are free to use most areas behind the institutional firewalls. Researchers must demonstrate their knowledge of the security policies by taking a certified course on human subject research protection and by specifying the technical protections of the patient data in their institutional review board (IRB) applications. The researchers have more freedom to use their local computational platforms and software, but the institution loses the 'guarantee' of a flawless implementation of its technical security policies that exists in the UCSF solution. The more liberal approach at institutions such as Partners Healthcare therefore demands greater attention to data de-identification or encryption, and a better determination of the trustworthiness of its data recipients and of their ability to set up a technically safe environment.

Our objective was to create the i2b2 software platform so that it complies with real-world use cases for how patient privacy solutions are implemented and, given that no solution can be perfect, represents a balance among the data de-identification technology, the safety of the technical platform, and the various levels of trust required of the researchers. The use cases were simplified to five patient privacy levels, each with clear requirements for these three components, not because the situation is simple, but because of the complexity of keeping the platform consistent across the data protection levels. Of course, as i2b2 is open source, it can be adapted to satisfy the patient privacy requirements of a local site; however, careful attention must be paid to maintaining a consistent data protection formulation throughout the platform.
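Because the five i2b2 privacy levels are not enumerated in this section, the sketch below is only a hypothetical illustration of the kind of consistent three-component formulation described above: each level ties a form of the data to the trust required of its recipient and to the technical environment in which the data may reside. The level names and requirements are invented for illustration and are not the i2b2 definitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProtectionLevel:
    """One row of a hypothetical tiered-access policy (not the i2b2 definitions)."""
    data_form: str        # how identified the released data are
    trust_required: str   # credential the recipient must hold
    environment: str      # technical platform on which the data may reside

# Invented example tiers: the more identified the data, the more trustworthy the
# recipient and the more controlled the technical environment must be.
POLICY = {
    1: ProtectionLevel("aggregate counts only", "registered user",
                       "any institutional workstation"),
    2: ProtectionLevel("de-identified, limited fields", "human-subjects training",
                       "institutional network"),
    3: ProtectionLevel("de-identified, full record", "IRB protocol on file",
                       "encrypted, access-logged server"),
    4: ProtectionLevel("limited data set (dates, ZIP)", "IRB approval and data use agreement",
                       "firewalled enclave"),
    5: ProtectionLevel("fully identified record", "IRB approval and clinical credentialing",
                       "firewalled enclave, audited"),
}

def may_release(level, recipient_credentials, environment):
    """Release data at `level` only if both the trust and the technical requirements are met."""
    p = POLICY[level]
    return p.trust_required in recipient_credentials and environment == p.environment

print(may_release(3, {"human-subjects training", "IRB protocol on file"},
                  "encrypted, access-logged server"))           # True
print(may_release(5, {"IRB protocol on file"}, "personal laptop"))  # False
```

The point of such a table is that a release decision can be evaluated the same way at every level, which is what keeping the platform consistent across the data protection levels requires.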