
Improving early diagnosis of rare diseases using Natural Language Processing in unstructured medical records: an illustration from Dravet syndrome

Authors :
Mathieu Kuchenbuch
Antoine Neuraz
Rima Nabbout
Tommaso Lo Barco
Nicolas Garcelon
Service de neurologie pédiatrique [CHU Necker]
Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)-CHU Necker - Enfants Malades [AP-HP]
Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)
Imagine - Institut des maladies génétiques (IHU) (Imagine - U1163)
Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Paris Cité (UPCité)
Centre de Recherche des Cordeliers (CRC (UMR_S_1138 / U1138))
École pratique des hautes études (EPHE)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Sorbonne Université (SU)-Université Paris Cité (UPCité)
Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Paris (UP)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Sorbonne Université (SU)-Université de Paris (UP)
Université Paris Cité - UFR Médecine [Santé] (UPCité UFR Médecine)
Université Paris Cité (UPCité)
CHU Necker - Enfants Malades [AP-HP]
Università degli studi di Verona = University of Verona (UNIVR)
Service d'informatique médicale et biostatistiques [CHU Necker]
This work was supported by State funding from the Agence Nationale de la Recherche under 'Investissements d’Avenir' program (ANR-10-IAHU-01) and the 'Fondation Bettencourt Schueller' (RN).
École Pratique des Hautes Études (EPHE)
Health data- and model- driven Knowledge Acquisition (HeKA)
Inria de Paris
Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche des Cordeliers (CRC (UMR_S_1138 / U1138))
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Sorbonne Université (SU)-Université Paris Cité (UPCité)-École Pratique des Hautes Études (EPHE)
Source :
Orphanet Journal of Rare Diseases, BioMed Central, 2021, Vol 16, Iss 1, Article 309, Pp 1-12. ⟨10.1186/s13023-021-01936-9⟩
Publication Year :
2021
Publisher :
BioMed Central, 2021.

Abstract

Background The growing use of Electronic Health Records (EHRs) is promoting the application of data mining in health care. A promising use of big data in this field is to develop models that support early diagnosis and establish natural history. Dravet Syndrome (DS) is a rare developmental and epileptic encephalopathy that commonly begins in the first year of life with febrile seizures (FS). Diagnosis is often delayed beyond 2 years of age, as it is difficult to differentiate DS at onset from FS. We aimed to explore whether some clinical terms (concepts) are used significantly more often in the electronic narrative medical reports of individuals with DS before the age of 2 years than in those of individuals with FS. These concepts would allow earlier detection of patients with DS, resulting in earlier referral to expert centers that can provide early diagnosis and care.
Methods Data were collected from the Necker Enfants Malades Hospital using a document-based data warehouse, Dr Warehouse, which employs Natural Language Processing (NLP), a computer technology for processing written information. Using the Unified Medical Language System (UMLS) Metathesaurus, phenotype concepts can be recognized in medical reports. We selected individuals with DS (DS Cohort) and individuals with FS (FS Cohort) whose diagnosis was confirmed after the age of 4 years. A phenome-wide analysis was performed, evaluating the statistical associations between the phenotypes of DS and FS using a series of logistic regressions, based on concepts found in the reports produced before 2 years of age.
Results We found a significantly higher representation of concepts related to seizure phenotypes distinguishing DS from FS in the early phases, namely the major recurrence of complex febrile convulsions (long-lasting and/or with focal signs) and other seizure types. Some typical early-onset non-seizure concepts also emerged, relating to neurodevelopment and gait disorders.
Conclusions Narrative medical reports of individuals younger than 2 years with FS contain specific concepts linked to DS diagnosis, which can be automatically detected by software exploiting NLP. This approach could represent an innovative and sustainable methodology to shorten the time to diagnosis of DS and could be transposed to other rare diseases.
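
The phenome-wide analysis described in the Methods (one logistic regression per concept, comparing two cohorts) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify covariates, software, or multiple-testing handling. The function names, the set-per-patient data layout, and the 0.5 continuity correction are all assumptions made for illustration. The sketch relies on the fact that a logistic regression with an intercept and a single binary predictor has a closed-form fit: the coefficient equals the log odds ratio of the 2x2 table, so no iterative solver is needed.

```python
import math


def concept_association(cases, controls, concept):
    """Odds ratio and two-sided Wald p-value for one binary concept.

    cases / controls: lists of sets of concept labels, one set per patient
    (a hypothetical layout, not the Dr Warehouse data model).
    """
    a = sum(concept in patient for patient in cases)     # cases with the concept
    b = len(cases) - a                                   # cases without it
    c = sum(concept in patient for patient in controls)  # controls with the concept
    d = len(controls) - c                                # controls without it

    # Assumption: add 0.5 to every cell when any cell is empty, so the
    # log odds ratio and its standard error stay finite.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))

    # Univariate logistic regression with a binary predictor:
    # the fitted coefficient is the log odds ratio of the 2x2 table.
    beta = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return math.exp(beta), p_value


def phenome_wide_scan(cases, controls):
    """Test every concept seen in either cohort; smallest p-value first."""
    concepts = set().union(*cases, *controls)
    results = [
        (concept, *concept_association(cases, controls, concept))
        for concept in concepts
    ]
    return sorted(results, key=lambda row: (row[2], row[0]))
```

Calling `phenome_wide_scan(ds_reports, fs_reports)` with two lists of per-patient concept sets returns `(concept, odds_ratio, p_value)` tuples. A real analysis of this kind would typically also correct for the number of concepts tested, which this sketch leaves out.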

Details

Language :
English
ISSN :
1750-1172
Volume :
16
Database :
OpenAIRE
Journal :
Orphanet Journal of Rare Diseases
Accession number :
edsair.doi.dedup.....165b2bbed5f2873b6310089403c5ede4