Programmatic extraction of information from unstructured clinical data and the assessment of potential impacts on epidemiological research
Abstract
- Background: For epidemiological research, structured data provide identifiable and immediate access to the information that has been recorded. However, many quantitative recordings in electronic medical records are unstructured, so researchers have to identify and extract the information of interest manually. This is costly in time and money, and as larger volumes of electronically stored data become available the manual approach is becoming increasingly impractical.
Method: Two programmatic methods were developed to extract and classify numeric quantities and identify attributes from unstructured dosage instructions and clinical comments in The Health Improvement Network (THIN) database. Both methods are based on frequently occurring patterns of recording, from which models were formed.
Dosage instructions: Automated coding was achieved through the interpretation of a representative set of language phrases with identifiable traits. The dosage data table was automatically recoded and assessed for accuracy and coverage of a daily dosage value, then assessed in the context of 146 commonly prescribed medications.
Clinical comments: Automated coding was achieved through the identification of a representative set of text and/or Read code qualifications. The model was initially trained on THIN data for a wide range of numeric health indicators, then tested for generalizability using comments from an alternative source and assessed for accuracy, sensitivity, and specificity using a subset of 12 commonly recorded health indicators.
Results:
Dosage instructions: The coverage of a daily dosage value within the dosage data table increased from 42.1% to 84.8%, with an accuracy of 84.6%. For the 146 medications assessed, coverage on a per-unique-instruction basis was 79.7% on average, with an accuracy of 95.4%. On an all-recorded-instructions basis, the weighted coverage was 65.9% on average, with an accuracy of 99.3%.
Clinical comments: For all 12 of the he
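As an illustration only, the pattern-based coding of dosage instructions described in the abstract might look something like the sketch below. It is not the thesis's model: the phrase vocabulary, the regular expression, and the function names `daily_dose` and `coverage` are all hypothetical, chosen to show how a daily dosage value could be derived from free text and how per-unique-instruction coverage can differ from coverage weighted by how often each instruction is recorded.

```python
import re
from collections import Counter

# Hypothetical vocabulary: illustrative assumptions, not the phrase set
# used in the thesis.
NUMBER_WORDS = {"one": 1.0, "two": 2.0, "three": 3.0, "four": 4.0, "half": 0.5}
FREQUENCY_PER_DAY = {
    "once a day": 1, "daily": 1, "od": 1,
    "twice a day": 2, "twice daily": 2, "bd": 2,
    "three times a day": 3, "tds": 3,
    "four times a day": 4, "qds": 4,
}

# A quantity is either a decimal number or one of the number words.
QUANTITY = r"(?P<qty>\d+(?:\.\d+)?|" + "|".join(NUMBER_WORDS) + r")"
# Try longer frequency phrases first.
FREQUENCY = "(?P<freq>" + "|".join(
    sorted(map(re.escape, FREQUENCY_PER_DAY), key=len, reverse=True)
) + ")"
PATTERN = re.compile(
    r"^(?:take\s+)?" + QUANTITY + r"(?:\s+(?:tablets?|capsules?))?\s+" + FREQUENCY + r"$"
)


def daily_dose(instruction):
    """Return units per day, or None if the instruction is not recognised."""
    text = " ".join(instruction.lower().split())  # normalise case and spacing
    match = PATTERN.match(text)
    if match is None:
        return None
    qty = match.group("qty")
    amount = NUMBER_WORDS[qty] if qty in NUMBER_WORDS else float(qty)
    return amount * FREQUENCY_PER_DAY[match.group("freq")]


def coverage(instructions):
    """Return (per-unique coverage, coverage weighted by recording frequency)."""
    counts = Counter(instructions)
    coded = {text for text in counts if daily_dose(text) is not None}
    per_unique = len(coded) / len(counts)
    weighted = sum(counts[text] for text in coded) / sum(counts.values())
    return per_unique, weighted


recorded = ["1 tablet daily"] * 3 + ["2 bd"] + ["as directed"] * 4
print(daily_dose("Take ONE tablet twice a day"))
print(coverage(recorded))
```

In this toy sample two of the three distinct instructions can be coded, but the uncodable one ("as directed") is the most frequently recorded, so the weighted coverage is lower than the per-unique coverage. That is the same direction as the gap the abstract reports between the two bases.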
Details
- Database :
- OAIster
- Notes :
- application/pdf; Cochrane, Nicholas J.K. (2015) Programmatic extraction of information from unstructured clinical data and the assessment of potential impacts on epidemiological research. PhD thesis, University of Nottingham; English
- Publication Type :
- Electronic Resource
- Accession number :
- edsoai.on1312872119
- Document Type :
- Electronic Resource