32 results on '"Rouhizadeh M"'
Search Results
2. Modeling and Visualizing Locus of Control with Facebook Language
- Author
- Jaidka, K., Buffone, A., Eichstaedt, J., Rouhizadeh, M., and Ungar, L. H.
- Abstract
A body of literature has demonstrated that users' psychological traits such as personality can be predicted from their posts on social media. However, there is still a gap between the computational and descriptive analyses of the language features associated with different psychological traits, and their use by social scientists and psychologists to make deeper behavioral inferences. In this study, we aim to bridge this gap with a visualization that situates the language associated with one psychological trait in the context of other psychological dimensions. We predict Locus of Control (LoC), an individual's perception of personal control over events in their lives, from their Facebook language (F1=0.82). We then look at how language explains the relationship of LoC with consciousness and emotional stability.
- Published
- 2018
3. Compound Verbs in Persian Wordnet
- Author
- Mansoory, N., Shamsfard, M., and Rouhizadeh, M.
- Published
- 2011
4. Data collection and normalization for building the Scenario-Based Lexical Knowledge Resource of a text-to-scene conversion system.
- Author
- Rouhizadeh, M., Bowler, M., Sproat, R., and Coyne, B.
- Published
- 2010
5. Distributional semantic models for the evaluation of disordered language
- Author
- Rouhizadeh, M., Prud'hommeaux, E., Roark, B., and van Santen, J.
- Subjects
- Article
- Abstract
Atypical semantic and pragmatic expression is frequently reported in the language of children with autism. Although this atypicality often manifests itself in the use of unusual or unexpected words and phrases, the rate of use of such unexpected words is rarely directly measured or quantified. In this paper, we use distributional semantic models to automatically identify unexpected words in narrative retellings by children with autism. The classification of unexpected words is sufficiently accurate to distinguish the retellings of children with autism from those with typical development. These techniques demonstrate the potential of applying automated language analysis techniques to clinically elicited language data for diagnostic purposes.
6. Enhancing suicidal behavior detection in EHRs: A multi-label NLP framework with transformer models and semantic retrieval-based annotation.
- Author
- Zandbiglari K, Kumar S, Bilal M, Goodin A, and Rouhizadeh M
- Subjects
- Humans, Suicidal Ideation, Algorithms, Natural Language Processing, Electronic Health Records, Semantics, Suicide
- Abstract
Background: Suicide is a leading cause of death worldwide, making early identification of suicidal behaviors crucial for clinicians. Current Natural Language Processing (NLP) approaches for identifying suicidal behaviors in Electronic Health Records (EHRs) rely on keyword searches, rule-based methods, and binary classification, which may not fully capture the complexity and spectrum of suicidal behaviors. This study aims to create a multi-class labeled dataset with annotation guidelines and develop a novel NLP approach for fine-grained, multi-label classification of suicidal behaviors, improving the efficiency of the annotation process and accuracy of the NLP methods. Methods: We develop a multi-class labeling system based on guidelines from FDA, CDC, and WHO, distinguishing between six categories of suicidal behaviors and allowing for multiple labels per data sample. To efficiently create an annotated dataset, we use an MPNet-based semantic retrieval framework to extract relevant sentences from a large EHR dataset, reducing annotation space while capturing diverse expressions. Experts annotate the extracted sentences using the multi-class system. We then formulate the task as a multi-label classification problem and fine-tune transformer-based models on the curated dataset to accurately classify suicidal behaviors in EHRs. Results: Lexical analysis revealed key themes in assessing suicide risk, considering an individual's history, mental health, substance use, and family background. Fine-tuned transformer-based models effectively identified suicidal behaviors from EHRs, with Bio_ClinicalBERT, BioBERT, and XLNet achieving F1 scores of 0.81, outperforming BERT and RoBERTa. The proposed approach, based on a multi-label classification system, captures the complexity of suicidal behaviors effectively, particularly "Suicide Attempt" and "Family History" instances. The proposed approach, using task-specific NLP models and a multi-label classification system, captures the complexity of suicidal behaviors more effectively than traditional binary classification. However, direct comparisons with existing studies are difficult due to varying metrics and label definitions. Conclusion: This study presents a robust NLP framework for detecting suicidal behaviors in EHRs, leveraging task-specific fine-tuning of transformer-based models and a semi-automated pipeline. Despite limitations, the approach demonstrates the potential of advanced NLP techniques in enhancing the identification of suicidal behaviors. Future work should focus on model expansion and integration to further improve patient care and clinical decision-making. Competing Interests: All authors declare that they have no conflicts of interest. (Copyright © 2024 Elsevier Inc. All rights reserved.)
- Published
- 2025
7. Cannabis Use and Inhalational Anesthesia Administration in Older Adults: A Propensity-matched Retrospective Cohort Study.
- Author
- Sajdeya R, Rouhizadeh M, Cook RL, Ison RL, Bai C, Jugl S, Gao H, Mardini MT, Zandbiglari K, Adiba FI, Dasa O, Winterstein AG, Price CC, Pearson TA, Seubert CN, and Tighe PJ
- Subjects
- Humans, Male, Aged, Female, Retrospective Studies, Cohort Studies, Aged, 80 and over, Isoflurane administration & dosage, Sevoflurane administration & dosage, Marijuana Use epidemiology, Anesthetics, Inhalation administration & dosage, Propensity Score, Anesthesia, Inhalation methods
- Abstract
Background: Cannabis use is associated with higher intravenous anesthetic administration. Similar data regarding inhalational anesthetics are limited. With rising cannabis use prevalence, understanding any potential relationship with inhalational anesthetic dosing is crucial. Average intraoperative isoflurane or sevoflurane minimum alveolar concentration equivalents between older adults with and without cannabis use were compared. Methods: The electronic health records of 22,476 surgical patients 65 yr or older at the University of Florida Health System between 2018 and 2020 were reviewed. The primary exposure was cannabis use within 60 days of surgery, determined via (1) a previously published natural language processing algorithm applied to unstructured notes and (2) structured data, including International Classification of Diseases codes for cannabis use disorders and poisoning by cannabis, laboratory cannabinoids screening results, and RxNorm codes. The primary outcome was the intraoperative time-weighted average of isoflurane or sevoflurane minimum alveolar concentration equivalents at 1-min resolution. No a priori minimally clinically important difference was established. Patients demonstrating cannabis use were matched 4:1 to non-cannabis use controls using a propensity score. Results: Among 5,118 meeting inclusion criteria, 1,340 patients (268 cannabis users and 1,072 nonusers) remained after propensity score matching. The median and interquartile range age was 69 (67 to 73) yr; 872 (65.0%) were male, and 1,143 (85.3%) were non-Hispanic White. The median (interquartile range) anesthesia duration was 175 (118 to 268) min. After matching, all baseline characteristics were well-balanced by exposure. Cannabis users had statistically significantly higher average minimum alveolar concentrations than nonusers (mean ± SD, 0.58 ± 0.23 vs. 0.54 ± 0.22, respectively; mean difference, 0.04; 95% confidence limits, 0.01 to 0.06; P = 0.020). Conclusion: Cannabis use was associated with administering statistically significantly higher inhalational anesthetic minimum alveolar concentration equivalents in older adults, but the clinical significance of this difference is unclear. These data do not support the hypothesis that cannabis users require clinically meaningfully higher inhalational anesthetics doses. (Copyright © 2024 American Society of Anesthesiologists. All Rights Reserved.)
- Published
- 2024
8. Cannabis use and acute postoperative pain outcomes in older adults: a propensity matched retrospective cohort study.
- Author
- Sajdeya R, Rouhizadeh M, Cook RL, Ison RL, Bai C, Jugl S, Gao H, Mardini MT, Dasa O, Zandbiglari K, Adiba FI, Winterstein AG, Price CC, Pearson TA, Seubert CN, and Tighe PJ
- Abstract
Introduction: Cannabis use is increasing among older adults, but its impact on postoperative pain outcomes remains unclear in this population. We examined the association between cannabis use and postoperative pain levels and opioid doses within 24 hours of surgery. Methods: We conducted a propensity score-matched retrospective cohort study using electronic health records data of 22 476 older surgical patients with at least 24-hour hospital stays at University of Florida Health between 2018 and 2020. Of the original cohort, 2577 patients were eligible for propensity-score matching (1:3 cannabis user: non-user). Cannabis use status was determined via natural language processing of clinical notes within 60 days of surgery and structured data. The primary outcomes were average Defense and Veterans Pain Rating Scale (DVPRS) score and total oral morphine equivalents (OME) within 24 hours of surgery. Results: 504 patients were included (126 cannabis users and 378 non-users). The median (IQR) age was 69 (65-72) years; 295 (58.53%) were male, and 442 (87.70%) were non-Hispanic white. Baseline characteristics were well balanced. Cannabis users had significantly higher average DVPRS scores (median (IQR): 4.68 (2.71-5.96) vs 3.88 (2.33-5.17); difference=0.80; 95% confidence limit (CL), 0.19 to 1.36; p=0.01) and total OME (median (IQR): 42.50 (15.00-60.00) mg vs 30.00 (7.50-60.00) mg; difference=12.5 mg; 95% CL, 3.80 mg to 21.20 mg; p=0.02) than non-users within 24 hours of surgery. Discussion: This study showed that cannabis use in older adults was associated with increased postoperative pain levels and opioid doses. Competing Interests: None declared. (© American Society of Regional Anesthesia & Pain Medicine 2024. No commercial re-use. See rights and permissions. Published by BMJ.)
- Published
- 2024
9. Classifying early infant feeding status from clinical notes using natural language processing and machine learning.
- Author
- Lemas DJ, Du X, Rouhizadeh M, Lewis B, Frank S, Wright L, Spirache A, Gonzalez L, Cheves R, Magalhães M, Zapata R, Reddy R, Xu K, Parker L, Harle C, Young B, Louis-Jaques A, Zhang B, Thompson L, Hogan WR, and Modave F
- Subjects
- Female, Humans, Infant, Software, Electronic Health Records, Mothers, Natural Language Processing, Machine Learning
- Abstract
The objective of this study is to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Annotation of notes was completed using TeamTat to uniquely classify clinical notes according to infant feeding status. We trained 6 machine learning models to classify infant feeding status: logistic regression, random forest, XGBoost gradient descent, k-nearest neighbors, and support-vector classifier. Model comparison was evaluated based on overall accuracy, precision, recall, and F1 score. Our modeling corpus included an even number of clinical notes that was a balanced sample across each class. We manually reviewed 999 notes that represented 746 mother-infant dyads with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status classification present for this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)], and exclusive feeding of expressed mother's milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis evaluated the classification of clinical notes as breast, formula/bottle, and missing. The machine learning models were trained on these three classes after performing balancing and down sampling. The XGBoost model outperformed all others by achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in the electronic health records to classify infant feeding status. 
Early identification of breastfeeding status using NLP on unstructured electronic health records data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients. (© 2024. The Author(s).)
- Published
- 2024
10. An open natural language processing (NLP) framework for EHR-based clinical research: a case demonstration using the National COVID Cohort Collaborative (N3C).
- Author
- Liu S, Wen A, Wang L, He H, Fu S, Miller R, Williams A, Harris D, Kavuluru R, Liu M, Abu-El-Rub N, Schutte D, Zhang R, Rouhizadeh M, Osborne JD, He Y, Topaloglu U, Hong SS, Saltz JH, Schaffter T, Pfaff E, Chute CG, Duong T, Haendel MA, Fuentes R, Szolovits P, Xu H, and Liu H
- Subjects
- Humans, Electronic Health Records, Algorithms, Natural Language Processing, COVID-19
- Abstract
Despite recent methodology advancements in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. Concurrently, these factors also dramatically increase the difficulty in developing NLP models in multi-site settings, which is necessary for algorithm robustness and generalizability. Here, we reported on our experience developing an NLP solution for Coronavirus Disease 2019 (COVID-19) signs and symptom extraction in an open NLP framework from a subset of sites participating in the National COVID Cohort Collaborative (N3C). We then empirically highlight the benefits of multi-site data for both symbolic and statistical methods, as well as highlight the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts. (© The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association.)
- Published
- 2023
11. Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system.
- Author
- Gray GM, Zirikly A, Ahumada LM, Rouhizadeh M, Richards T, Kitchen C, Foroughmand I, and Hatef E
- Abstract
Objectives: To develop and test a scalable, performant, and rule-based model for identifying 3 major domains of social needs (residential instability, food insecurity, and transportation issues) from the unstructured data in electronic health records (EHRs). Materials and Methods: We included patients aged 18 years or older who received care at the Johns Hopkins Health System (JHHS) between July 2016 and June 2021 and had at least 1 unstructured (free-text) note in their EHR during the study period. We used a combination of manual lexicon curation and semiautomated lexicon creation for feature development. We developed an initial rules-based pipeline (Match Pipeline) using 2 keyword sets for each social needs domain. We performed rule-based keyword matching for distinct lexicons and tested the algorithm using an annotated dataset comprising 192 patients. Starting with a set of expert-identified keywords, we tested the adjustments by evaluating false positives and negatives identified in the labeled dataset. We assessed the performance of the algorithm using measures of precision, recall, and F1 score. Results: The algorithm for identifying residential instability had the best overall performance, with a weighted average for precision, recall, and F1 score of 0.92, 0.84, and 0.92 for identifying patients with homelessness and 0.84, 0.82, and 0.79 for identifying patients with housing insecurity. Metrics for the food insecurity algorithm were high but the transportation issues algorithm was the lowest overall performing metric. Discussion: The NLP algorithm in identifying social needs at JHHS performed relatively well and would provide the opportunity for implementation in a healthcare system. Conclusion: The NLP approach developed in this project could be adapted and potentially operationalized in the routine data processes of a healthcare system. Competing Interests: None declared. (© The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association.)
- Published
- 2023
12. Understanding the circumstances of paediatric fall injuries: a machine learning analysis of NEISS narratives.
- Author
- Omaki E, Shields W, Rouhizadeh M, Delgado-Barroso P, Stefanos R, and Gielen A
- Subjects
- Infant, Humans, Child, Child, Preschool, Infant, Newborn, Cross-Sectional Studies, Prevalence, Wounds and Injuries epidemiology
- Abstract
Objectives: Falls are the leading cause of non-fatal injury among young children. The aim of this study was to identify and quantify the circumstances contributing to medically attended paediatric fall injuries among children 0-4 years old. Methods: Cross-sectional data for falls among kids under 5 years recorded between 2012 and 2016 in the National Electronic Injury Surveillance System was obtained. A sample of 4546 narratives was manually coded for: (1) where the child fell from; (2) what the child fell onto; (3) the activities preceding the fall and (4) how the fall occurred. A natural language processing model was developed and subsequently applied to the remaining uncoded data to yield a set of 91 325 cases coded for what the child fell from, fell onto, the activities preceding the fall, and how the fall occurred. Data were descriptively tabulated by age and disposition. Results: Children most often fell from the bed accounting for one-third (33%) of fall injuries in infants, 13% in toddlers and 12% in preschoolers. Children were more likely to be hospitalised if they fell from another person (7.4% vs 2.6% for all other sources; p<0.01). After adjusting for age, the odds of a child being hospitalised following a fall from another person were 2.1 times higher than falling from other surfaces (95% CI 1.6 to 2.7). Conclusions: The prevalence of injuries due to falling off the bed, and the elevated risk of serious injury from falling from another person highlights the need for more robust and effective communication to caregivers on fall injury prevention. Competing Interests: None declared. (© Author(s) (or their employer(s)) 2023. No commercial re-use. See rights and permissions. Published by BMJ.)
- Published
- 2023
13. A risk identification model for detection of patients at risk of antidepressant discontinuation.
- Author
- Zolnour A, Eldredge CE, Faiola A, Yaghoobzadeh Y, Khani M, Foy D, Topaz M, Kharrazi H, Fung KW, Fontelo P, Davoudi A, Tabaie A, Breitinger SA, Oesterle TS, Rouhizadeh M, Zonnor Z, Moen H, Patrick TB, and Zolnoori M
- Abstract
Purpose: Between 30 and 68% of patients prematurely discontinue their antidepressant treatment, posing significant risks to patient safety and healthcare outcomes. Online healthcare forums have the potential to offer a rich and unique source of data, revealing dimensions of antidepressant discontinuation that may not be captured by conventional data sources. Methods: We analyzed 891 patient narratives from the online healthcare forum, "askapatient.com," utilizing content analysis to create PsyRisk, a corpus highlighting the risk factors associated with antidepressant discontinuation. Leveraging PsyRisk, alongside PsyTAR [a publicly available corpus of adverse drug reactions (ADRs) related to antidepressants], we developed a machine learning-driven algorithm for proactive identification of patients at risk of abrupt antidepressant discontinuation. Results: From the analyzed 891 patients, 232 reported antidepressant discontinuation. Among these patients, 92% experienced ADRs, and 72% found these reactions distressful, negatively affecting their daily activities. Approximately 26% of patients perceived the antidepressants as ineffective. Most reported ADRs were physiological (61%, 411/673), followed by cognitive (30%, 197/673), and psychological (28%, 188/673) ADRs. In our study, we employed a nested cross-validation strategy with an outer 5-fold cross-validation for model selection, and an inner 5-fold cross-validation for hyperparameter tuning. The performance of our risk identification algorithm, as assessed through this robust validation technique, yielded an AUC-ROC of 90.77 and an F1-score of 83.33. The most significant contributors to abrupt discontinuation were high perceived distress from ADRs and perceived ineffectiveness of the antidepressants. Conclusion: The risk factors identified and the risk identification algorithm developed in this study have substantial potential for clinical application. They could assist healthcare professionals in identifying and managing patients with depression who are at risk of prematurely discontinuing their antidepressant treatment. Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The handling editor LW declared a shared affiliation with author DF at the time of the review. The reviewer LT declared a shared affiliation with the authors MK and TP at the time of the review. (Copyright © 2023 Zolnour, Eldredge, Faiola, Yaghoobzadeh, Khani, Foy, Topaz, Kharrazi, Fung, Fontelo, Davoudi, Tabaie, Breitinger, Oesterle, Rouhizadeh, Zonnor, Moen, Patrick and Zolnoori.)
- Published
- 2023
14. A Natural Language Processing Algorithm for Classifying Suicidal Behaviors in Alzheimer's Disease and Related Dementia Patients: Development and Validation Using Electronic Health Records Data.
- Author
- Zandbiglari K, Hasanzadeh HR, Kotecha P, Sajdeya R, Goodin AJ, Jiao T, Adiba FI, Mardini MT, Bian J, and Rouhizadeh M
- Abstract
This study aimed to develop a natural language processing algorithm (NLP) using machine learning (ML) and Deep Learning (DL) techniques to identify and classify documentation of suicidal behaviors in patients with Alzheimer's disease and related dementia (ADRD). We utilized MIMIC-III and MIMIC-IV datasets and identified ADRD patients and subsequently those with suicide ideation using relevant International Classification of Diseases (ICD) codes. We used cosine similarity with ScAN (Suicide Attempt and Ideation Events Dataset) to calculate semantic similarity scores of ScAN with extracted notes from MIMIC for the clinical notes. The notes were sorted based on these scores, and manual review and categorization into eight suicidal behavior categories were performed. The data were further analyzed using conventional ML and DL models, with manual annotation as a reference. The tested classifiers achieved classification results close to human performance with up to 98% precision and 98% recall of suicidal ideation in the ADRD patient population. Our NLP model effectively reproduced human annotation of suicidal ideation within the MIMIC dataset. These results establish a foundation for identifying and categorizing documentation related to suicidal ideation within ADRD population, contributing to the advancement of NLP techniques in healthcare for extracting and classifying clinical concepts, particularly focusing on suicidal ideation among patients with ADRD. Our study showcased the capability of a robust NLP algorithm to accurately identify and classify documentation of suicidal behaviors in ADRD patients.
- Published
- 2023
15. Developing and validating a natural language processing algorithm to extract preoperative cannabis use status documentation from unstructured narrative clinical notes.
- Author
- Sajdeya R, Mardini MT, Tighe PJ, Ison RL, Bai C, Jugl S, Hanzhi G, Zandbiglari K, Adiba FI, Winterstein AG, Pearson TA, Cook RL, and Rouhizadeh M
- Subjects
- Humans, Natural Language Processing, Algorithms, Documentation, Electronic Health Records, Cannabis
- Abstract
Objective: This study aimed to develop a natural language processing algorithm (NLP) using machine learning (ML) techniques to identify and classify documentation of preoperative cannabis use status. Materials and Methods: We developed and applied a keyword search strategy to identify documentation of preoperative cannabis use status in clinical documentation within 60 days of surgery. We manually reviewed matching notes to classify each documentation into 8 different categories based on context, time, and certainty of cannabis use documentation. We applied 2 conventional ML and 3 deep learning models against manual annotation. We externally validated our model using the MIMIC-III dataset. Results: The tested classifiers achieved classification results close to human performance with up to 93% and 94% precision and 95% recall of preoperative cannabis use status documentation. External validation showed consistent results with up to 94% precision and recall. Discussion: Our NLP model successfully replicated human annotation of preoperative cannabis use documentation, providing a baseline framework for identifying and classifying documentation of cannabis use. We add to NLP methods applied in healthcare for clinical concept extraction and classification, mainly concerning social determinants of health and substance use. Our systematically developed lexicon provides a comprehensive knowledge-based resource covering a wide range of cannabis-related concepts for future NLP applications. Conclusion: We demonstrated that documentation of preoperative cannabis use status could be accurately identified using an NLP algorithm. This approach can be employed to identify comparison groups based on cannabis exposure for growing research efforts aiming to guide cannabis-related clinical practices and policies. (© The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.)
- Published
- 2023
16. Representing and utilizing clinical textual data for real world studies: An OHDSI approach.
- Author
- Keloth VK, Banda JM, Gurley M, Heider PM, Kennedy G, Liu H, Liu F, Miller T, Natarajan K, V Patterson O, Peng Y, Raja K, Reeves RM, Rouhizadeh M, Shi J, Wang X, Wang Y, Wei WQ, Williams AE, Zhang R, Belenkaya R, Reich C, Blacketer C, Ryan P, Hripcsak G, Elhadad N, and Xu H
- Subjects
- Humans, Electronic Health Records, Natural Language Processing, Narration, Data Science, Medical Informatics
- Abstract
Clinical documentation in electronic health records contains crucial narratives and details about patients and their care. Natural language processing (NLP) can unlock the information conveyed in clinical notes and reports, and thus plays a critical role in real-world studies. The NLP Working Group at the Observational Health Data Sciences and Informatics (OHDSI) consortium was established to develop methods and tools to promote the use of textual data and NLP in real-world observational studies. In this paper, we describe a framework for representing and utilizing textual data in real-world evidence generation, including representations of information from clinical text in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), the workflow and tools that were developed to extract, transform and load (ETL) data from clinical notes into tables in OMOP CDM, as well as current applications and specific use cases of the proposed OHDSI NLP solution at large consortia and individual institutions with English textual data. Challenges faced and lessons learned during the process are also discussed to provide valuable insights for researchers who are planning to implement NLP solutions in real-world studies. Competing Interests: Dr. Hua Xu and The University of Texas Health Science Center at Houston have research related financial interests at Melax Technologies Inc. Dr. Xiaoyan Wang has related financial interests at Sema4 Mount Sinai Genomics Inc. (Copyright © 2023 Elsevier Inc. All rights reserved.)
- Published
- 2023
17. Identification of Prediabetes Discussions in Unstructured Clinical Documentation: Validation of a Natural Language Processing Algorithm.
- Author
- Schwartz JL, Tseng E, Maruthur NM, and Rouhizadeh M
- Abstract
Background: Prediabetes affects 1 in 3 US adults. Most are not receiving evidence-based interventions, so understanding how providers discuss prediabetes with patients will inform how to improve their care. Objective: This study aimed to develop a natural language processing (NLP) algorithm using machine learning techniques to identify discussions of prediabetes in narrative documentation. Methods: We developed and applied a keyword search strategy to identify discussions of prediabetes in clinical documentation for patients with prediabetes. We manually reviewed matching notes to determine which represented actual prediabetes discussions. We applied 7 machine learning models against our manual annotation. Results: Machine learning classifiers were able to achieve classification results that were close to human performance with up to 98% precision and recall to identify prediabetes discussions in clinical documentation. Conclusions: We demonstrated that prediabetes discussions can be accurately identified using an NLP algorithm. This approach can be used to understand and identify prediabetes management practices in primary care, thereby informing interventions to improve guideline-concordant care. (©Jessica L Schwartz, Eva Tseng, Nisa M Maruthur, Masoud Rouhizadeh. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 24.02.2022.)
- Published
- 2022
18. Development and assessment of a natural language processing model to identify residential instability in electronic health records' unstructured data: a comparison of 3 integrated healthcare delivery systems.
- Author
- Hatef E, Rouhizadeh M, Nau C, Xie F, Rouillard C, Abu-Nasser M, Padilla A, Lyons LJ, Kharrazi H, Weiner JP, and Roblin D
- Abstract
Objective: To evaluate whether a natural language processing (NLP) algorithm could be adapted to extract, with acceptable validity, markers of residential instability (ie, homelessness and housing insecurity) from electronic health records (EHRs) of 3 healthcare systems. Materials and Methods: We included patients 18 years and older who received care at 1 of 3 healthcare systems from 2016 through 2020 and had at least 1 free-text note in the EHR during this period. We conducted the study independently; the NLP algorithm logic and method of validity assessment were identical across sites. The approach to the development of the gold standard for assessment of validity differed across sites. Using the EntityRuler module of spaCy 2.3 Python toolkit, we created a rule-based NLP system made up of expert-developed patterns indicating residential instability at the lead site and enriched the NLP system using insight gained from its application at the other 2 sites. We adapted the algorithm at each site then validated the algorithm using a split-sample approach. We assessed the performance of the algorithm by measures of positive predictive value (precision), sensitivity (recall), and specificity. Results: The NLP algorithm performed with moderate precision (0.45, 0.73, and 1.0) at 3 sites. The sensitivity and specificity of the NLP algorithm varied across 3 sites (sensitivity: 0.68, 0.85, and 0.96; specificity: 0.69, 0.89, and 1.0). Discussion: The performance of this NLP algorithm to identify residential instability in 3 different healthcare systems suggests the algorithm is generally valid and applicable in other healthcare systems with similar EHRs. Conclusion: The NLP approach developed in this project is adaptable and can be modified to extract types of social needs other than residential instability from EHRs across different healthcare systems. (© The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association.)
- Published
- 2022
19. Suicidal Ideation and Suicide-Attempt-Related Hospitalizations among People with Alzheimer's Disease (AD) and AD-Related Dementias in the United States during 2016-2018.
- Author
-
Alipour-Haris G, Armstrong MJ, Sullivan JL, Suryadevara U, Rouhizadeh M, and Brown JD
- Abstract
People living with Alzheimer's disease (AD) and AD-related dementias (ADRDs) are at a higher risk of suicidal behaviors given intersecting risk factors. Previous studies generally only focused on AD, small clinical samples, or grouped all dementia subtypes together, limiting insights for other ADRD subtypes. The objective of this study was to generate evidence related to the relative burden of suicidal behaviors (suicidal ideation and suicide attempt) among people with AD and ADRDs. This retrospective cross-sectional study identified hospitalizations related to suicidal behaviors (suicidal ideation and suicide attempt) for patients with Alzheimer's disease (AD) and AD-related dementias using ICD-10-CM codes from the Nationwide Readmissions Database (NRD). A logistic regression model was estimated to assess associations between AD/ADRD subtype and patient characteristics, and the risk for a suicidal-behavior-related hospitalization and modes of harm were reported. During 2016-2018, there were 12,538 hospitalizations related to suicidal behaviors for people with AD/ADRDs. The overall prevalence of suicidal-behavior-related hospitalizations was lowest for AD (0.8%) and highest for frontotemporal dementia (2.6%). Among hospitalizations for suicide attempts, the most common mode of harm was medications or drugs (89.2% of all attempts), followed by weapons (17.7%). We found that there was a difference in the frequency of suicidal-behavior-related hospitalizations among AD/ADRD hospitalized patients across dementia subtypes.
- Published
- 2022
20. Measuring the Value of a Practical Text Mining Approach to Identify Patients With Housing Issues in the Free-Text Notes in Electronic Health Record: Findings of a Retrospective Cohort Study.
- Author
-
Hatef E, Singh Deol G, Rouhizadeh M, Li A, Eibensteiner K, Monsen CB, Bratslaver R, Senese M, and Kharrazi H
- Subjects
- Data Mining, Female, Humans, Retrospective Studies, Social Determinants of Health, United States, Electronic Health Records, Housing
- Abstract
Introduction: Despite the growing efforts to standardize coding for social determinants of health (SDOH), they are infrequently captured in electronic health records (EHRs). Most SDOH variables are still captured in the unstructured fields (i.e., free-text) of EHRs. In this study we attempt to evaluate a practical text mining approach (i.e., advanced pattern matching techniques) in identifying phrases referring to housing issues, an important SDOH domain affecting value-based healthcare providers, using EHR of a large multispecialty medical group in the New England region, United States. To present how this approach would help the health systems to address the SDOH challenges of their patients we assess the demographic and clinical characteristics of patients with and without housing issues and briefly look into the patterns of healthcare utilization among the study population and for those with and without housing challenges. Methods: We identified five categories of housing issues [i.e., homelessness current (HC), homelessness history (HH), homelessness addressed (HA), housing instability (HI), and building quality (BQ)] and developed several phrases addressing each one through collaboration with SDOH experts, consulting the literature, and reviewing existing coding standards. We developed pattern-matching algorithms (i.e., advanced regular expressions), and then applied them in the selected EHR. We assessed the text mining approach for recall (sensitivity) and precision (positive predictive value) after comparing the identified phrases with manually annotated free-text for different housing issues. Results: The study dataset included EHR structured data for a total of 20,342 patients and 2,564,344 free-text clinical notes. The mean (SD) age in the study population was 75.96 (7.51). Additionally, 58.78% of the cohort were female. BQ and HI were the most frequent housing issues documented in EHR free-text notes and HH was the least frequent one. The regular expression methodology, when compared to manual annotation, had a high level of precision (positive predictive value) at phrase, note, and patient levels (96.36, 95.00, and 94.44%, respectively) across different categories of housing issues, but the recall (sensitivity) rate was relatively low (30.11, 32.20, and 41.46%, respectively). Conclusion: Results of this study can be used to advance the research in this domain, to assess the potential value of EHR's free-text in identifying patients with a high risk of housing issues, to improve patient care and outcomes, and to eventually mitigate socioeconomic disparities across individuals and communities. Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. (Copyright © 2021 Hatef, Singh Deol, Rouhizadeh, Li, Eibensteiner, Monsen, Bratslaver, Senese and Kharrazi.)
- Published
- 2021
21. COVID-19 SignSym: a fast adaptation of a general clinical NLP tool to identify and normalize COVID-19 signs and symptoms to OMOP common data model.
- Author
-
Wang J, Abu-El-Rub N, Gray J, Pham HA, Zhou Y, Manion FJ, Liu M, Song X, Xu H, Rouhizadeh M, and Zhang Y
- Subjects
- Deep Learning, Humans, Symptom Assessment methods, COVID-19 diagnosis, Electronic Health Records, Information Storage and Retrieval methods, Natural Language Processing
- Abstract
The COVID-19 pandemic swept across the world rapidly, infecting millions of people. An efficient tool that can accurately recognize important clinical concepts of COVID-19 from free text in electronic health records (EHRs) will be valuable to accelerate COVID-19 clinical research. To this end, this study aims at adapting the existing CLAMP natural language processing tool to quickly build COVID-19 SignSym, which can extract COVID-19 signs/symptoms and their 8 attributes (body location, severity, temporal expression, subject, condition, uncertainty, negation, and course) from clinical text. The extracted information is also mapped to standard concepts in the Observational Medical Outcomes Partnership common data model. A hybrid approach of combining deep learning-based models, curated lexicons, and pattern-based rules was applied to quickly build the COVID-19 SignSym from CLAMP, with optimized performance. Our extensive evaluation using 3 external sites with clinical notes of COVID-19 patients, as well as the online medical dialogues of COVID-19, shows COVID-19 SignSym can achieve high performance across data sources. The workflow used for this study can be generalized to other use cases, where existing clinical natural language processing tools need to be customized for specific information needs within a short time. COVID-19 SignSym is freely accessible to the research community as a downloadable package (https://clamp.uth.edu/covid/nlp.php) and has been used by 16 healthcare organizations to support clinical research of COVID-19. (© The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.)
- Published
- 2021
22. Assessing the Impact of Social Needs and Social Determinants of Health on Health Care Utilization: Using Patient- and Community-Level Data.
- Author
-
Hatef E, Ma X, Rouhizadeh M, Singh G, Weiner JP, and Kharrazi H
- Subjects
- Aged, Electronic Health Records, Female, Humans, Male, Patient Acceptance of Health Care, Residence Characteristics, United States, Medicare, Social Determinants of Health
- Abstract
As the US health care system moves to expand access to and quality of medical care, the importance of addressing patient-level social needs and community-level social determinants of health (SDOH) is increasingly being recognized. This study evaluates individual- and community-level needs of housing (one of the SDOH domains) across the patient population of an academic medical center and explores how the level of housing needs impacts health care utilization. The authors performed a descriptive analysis of housing issues identified in both structured and unstructured (eg, clinical notes) data extracted from the electronic health record (EHR) and compared this to community-level characteristics of patients' neighborhood as measured by the Area Deprivation Index. Multivariate analyses were performed to assess the association between these and other factors on the frequency of service encounters. Among the 1,034,683 study participants, 59,703 (5.8%) had at least 1 housing issue identified in their EHR from structured or unstructured data combined. After adjusting for other factors, patients with housing instability and homelessness had 49% and 34% more encounters with the health care system compared to patients without housing issues (P < 0.00001). Patients living in the most disadvantaged neighborhoods had 55% more encounters with the health care system compared to those living in the most advantaged neighborhoods (P < 0.00001). This data collection approach and findings can inform health care systems aiming to make use of their EHRs and community-level SDOH information to provide a full assessment of patients' social needs and challenges.
- Published
- 2021
23. Analysis of Primary Care Provider Electronic Health Record Notes for Discussions of Prediabetes Using Natural Language Processing Methods.
- Author
-
Tseng E, Schwartz JL, Rouhizadeh M, and Maruthur NM
- Published
- 2021
24. Patient Trajectories Among Persons Hospitalized for COVID-19: A Cohort Study.
- Author
-
Garibaldi BT, Fiksel J, Muschelli J, Robinson ML, Rouhizadeh M, Perin J, Schumock G, Nagy P, Gray JH, Malapati H, Ghobadi-Krueger M, Niessen TM, Kim BS, Hill PM, Ahmed MS, Dobkin ED, Blanding R, Abele J, Woods B, Harkness K, Thiemann DR, Bowring MG, Shah AB, Wang MC, Bandeen-Roche K, Rosen A, Zeger SL, and Gupta A
- Subjects
- Adolescent, Adult, Aged, Aged, 80 and over, Child, Child, Preschool, Disease Progression, Female, Humans, Infant, Male, Middle Aged, Pandemics, Retrospective Studies, Risk Factors, SARS-CoV-2, United States epidemiology, COVID-19 mortality, Hospital Mortality, Hospitalization, Severity of Illness Index
- Abstract
Background: Risk factors for progression of coronavirus disease 2019 (COVID-19) to severe disease or death are underexplored in U.S. cohorts. Objective: To determine the factors on hospital admission that are predictive of severe disease or death from COVID-19. Design: Retrospective cohort analysis. Setting: Five hospitals in the Maryland and Washington, DC, area. Patients: 832 consecutive COVID-19 admissions from 4 March to 24 April 2020, with follow-up through 27 June 2020. Measurements: Patient trajectories and outcomes, categorized by using the World Health Organization COVID-19 disease severity scale. Primary outcomes were death and a composite of severe disease or death. Results: Median patient age was 64 years (range, 1 to 108 years); 47% were women, 40% were Black, 16% were Latinx, and 21% were nursing home residents. Among all patients, 131 (16%) died and 694 (83%) were discharged (523 [63%] had mild to moderate disease and 171 [20%] had severe disease). Of deaths, 66 (50%) were nursing home residents. Of 787 patients admitted with mild to moderate disease, 302 (38%) progressed to severe disease or death: 181 (60%) by day 2 and 238 (79%) by day 4. Patients had markedly different probabilities of disease progression on the basis of age, nursing home residence, comorbid conditions, obesity, respiratory symptoms, respiratory rate, fever, absolute lymphocyte count, hypoalbuminemia, troponin level, and C-reactive protein level and the interactions among these factors. Using only factors present on admission, a model to predict in-hospital disease progression had an area under the curve of 0.85, 0.79, and 0.79 at days 2, 4, and 7, respectively. Limitation: The study was done in a single health care system. Conclusion: A combination of demographic and clinical variables is strongly associated with severe COVID-19 disease or death and their early onset. The COVID-19 Inpatient Risk Calculator (CIRC), using factors present on admission, can inform clinical and resource allocation decisions. Primary Funding Source: Hopkins inHealth and COVID-19 Administrative Supplement for the HHS Region 3 Treatment Center from the Office of the Assistant Secretary for Preparedness and Response.
- Published
- 2021
25. COVID-19 SignSym: a fast adaptation of a general clinical NLP tool to identify and normalize COVID-19 signs and symptoms to OMOP common data model.
- Author
-
Wang J, Abu-El-Rub N, Gray J, Pham HA, Zhou Y, Manion FJ, Liu M, Song X, Xu H, Rouhizadeh M, and Zhang Y
- Abstract
The COVID-19 pandemic swept across the world rapidly, infecting millions of people. An efficient tool that can accurately recognize important clinical concepts of COVID-19 from free text in electronic health records (EHRs) will be valuable to accelerate COVID-19 clinical research. To this end, this study aims at adapting the existing CLAMP natural language processing tool to quickly build COVID-19 SignSym, which can extract COVID-19 signs/symptoms and their 8 attributes (body location, severity, temporal expression, subject, condition, uncertainty, negation, and course) from clinical text. The extracted information is also mapped to standard concepts in the Observational Medical Outcomes Partnership common data model. A hybrid approach of combining deep learning-based models, curated lexicons, and pattern-based rules was applied to quickly build the COVID-19 SignSym from CLAMP, with optimized performance. Our extensive evaluation using 3 external sites with clinical notes of COVID-19 patients, as well as the online medical dialogues of COVID-19, shows COVID-19 SignSym can achieve high performance across data sources. The workflow used for this study can be generalized to other use cases, where existing clinical natural language processing tools need to be customized for specific information needs within a short time. COVID-19 SignSym is freely accessible to the research community as a downloadable package (https://clamp.uth.edu/covid/nlp.php) and has been used by 16 healthcare organizations to support clinical research of COVID-19. Competing Interests: Dr Hua Xu, Mr Jingqi Wang, and The University of Texas Health Science Center at Houston have financial related research interest in Melax Technologies, Inc.
- Published
- 2020
26. Language impairment in adults with end-stage liver disease: application of natural language processing towards patient-generated health records.
- Author
-
Dickerson LK, Rouhizadeh M, Korotkaya Y, Bowring MG, Massie AB, McAdams-Demarco MA, Segev DL, Cannon A, Guerrerio AL, Chen PH, Philosophe BN, and Mogul DB
- Abstract
End-stage liver disease (ESLD) is associated with cognitive impairment ranging from subtle alterations in attention to overt hepatic encephalopathy that resolves after transplant. Natural language processing (NLP) may provide a useful method to assess cognitive status in this population. We identified 81 liver transplant recipients with ESLD (4/2013-2/2018) who sent at least one patient-to-provider electronic message pre-transplant and post-transplant, and matched them 1:1 to "healthy" controls (who had similar disease but had not been evaluated for liver transplant) by age, gender, race/ethnicity, and liver disease. Messages written by patients pre-transplant and post-transplant and by controls were compared across 19 NLP measures using paired Wilcoxon signed-rank tests. While there was no difference overall in word length, patients with Model for End-Stage Liver Disease Score (MELD) ≥ 30 (n = 31) had decreased word length in pre-transplant messages (3.95 [interquartile range (IQR) 3.79, 4.14]) compared to post-transplant (4.13 [3.96, 4.28], p = 0.01) and controls (4.2 [4.0, 4.4], p = 0.01); there was no difference between post-transplant and controls (p = 0.4). Patients with MELD ≥ 30 had fewer 6+ letter words in pre-transplant messages (19.5% [16.4, 25.9]) compared to post-transplant (23.4% [20.0, 26.7]; p = 0.02) and controls (25.0% [19.2, 29.4]; p = 0.01). Overall, patients had increased sentence length pre-transplant (12.0 [9.8, 13.7]) compared to post-transplant (11.0 [9.2, 13.3]; p = 0.046); the same was seen for MELD ≥ 30 (12.3 [9.8, 13.7] pre-transplant vs. 10.8 [9.6, 13.0] post-transplant; p = 0.050). Application of NLP to patient-generated messages identified language differences (longer sentences with shorter words) that resolved after transplant. NLP may provide opportunities to detect cognitive impairment in ESLD. Competing Interests: D.L.S. receives speaking and advisory honoraria from Novartis, Sanofi, and CSL Behring. The remaining authors declare no competing interests. (© The Author(s) 2019.)
- Published
- 2019
27. Assessing the Availability of Data on Social and Behavioral Determinants in Structured and Unstructured Electronic Health Records: A Retrospective Analysis of a Multilevel Health Care System.
- Author
-
Hatef E, Rouhizadeh M, Tia I, Lasser E, Hill-Briggs F, Marsteller J, and Kharrazi H
- Abstract
Background: Most US health care providers have adopted electronic health records (EHRs) that facilitate the uniform collection of clinical information. However, standardized data formats to capture social and behavioral determinants of health (SBDH) in structured EHR fields are still evolving and not adopted widely. Consequently, at the point of care, SBDH data are often documented within unstructured EHR fields that require time-consuming and subjective methods to retrieve. Meanwhile, collecting SBDH data using traditional surveys on a large sample of patients is infeasible for health care providers attempting to rapidly incorporate SBDH data in their population health management efforts. A potential approach to facilitate targeted SBDH data collection is applying information extraction methods to EHR data to prescreen the population for identification of immediate social needs. Objective: Our aim was to examine the availability and characteristics of SBDH data captured in the EHR of a multilevel academic health care system that provides both inpatient and outpatient care to patients with varying SBDH across Maryland. Methods: We measured the availability of selected patient-level SBDH in both structured and unstructured EHR data. We assessed various SBDH including demographics, preferred language, alcohol use, smoking status, social connection and/or isolation, housing issues, financial resource strains, and availability of a home address. EHR's structured data were represented by information collected between January 2003 and June 2018 from 5,401,324 patients. EHR's unstructured data represented information captured for 1,188,202 patients between July 2016 and May 2018 (a shorter time frame because of limited availability of consistent unstructured data). We used text-mining techniques to extract a subset of SBDH factors from EHR's unstructured data. Results: We identified a valid address or zip code for 5.2 million (95.00%) of approximately 5.4 million patients. Ethnicity was captured for 2.7 million (50.00%), whereas race was documented for 4.9 million (90.00%) and a preferred language for 2.7 million (49.00%) patients. Information regarding alcohol use and smoking status was coded for 490,348 (9.08%) and 1,728,749 (32.01%) patients, respectively. Using the International Classification of Diseases-10th Revision diagnoses codes, we identified 35,171 (0.65%) patients with information related to social connection/isolation, 10,433 (0.19%) patients with housing issues, and 3543 (0.07%) patients with income/financial resource strain. Of approximately 1.2 million unique patients with unstructured data, 30,893 (2.60%) had at least one clinical note containing phrases referring to social connection/isolation, 35,646 (3.00%) included housing issues, and 11,882 (1.00%) had mentions of financial resource strain. Conclusions: Apart from demographics, SBDH data are not regularly collected for patients. Health care providers should assess the availability and characteristics of SBDH data in EHRs. Evaluating the quality of SBDH data can potentially enable health care providers to modify underlying workflows to improve the documentation, collection, and extraction of SBDH data from EHRs. (©Elham Hatef, Masoud Rouhizadeh, Iddrisu Tia, Elyse Lasser, Felicia Hill-Briggs, Jill Marsteller, Hadi Kharrazi. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 02.08.2019.)
- Published
- 2019
28. Measuring idiosyncratic interests in children with autism.
- Author
-
Rouhizadeh M, Prud'hommeaux E, van Santen J, and Sproat R
- Abstract
A defining symptom of autism spectrum disorder (ASD) is the presence of restricted and repetitive activities and interests, which can surface in language as a perseverative focus on idiosyncratic topics. In this paper, we use semantic similarity measures to identify such idiosyncratic topics in narratives produced by children with and without ASD. We find that neurotypical children tend to use the same words and semantic concepts when retelling the same narrative, while children with ASD, even when producing accurate retellings, use different words and concepts relative not only to neurotypical children but also to other children with ASD. Our results indicate that children with ASD not only stray from the target topic but do so in idiosyncratic ways according to their own restricted interests.
- Published
- 2015
29. Similarity Measures for Quantifying Restrictive and Repetitive Behavior in Conversations of Autistic Children.
- Author
-
Rouhizadeh M, Sproat R, and van Santen J
- Abstract
Restrictive and repetitive behavior (RRB) is a core symptom of autism spectrum disorder (ASD) and is manifest in language. Based on this, we expect children with autism to talk about fewer topics, and more repeatedly, during their conversations. We thus hypothesize a higher semantic overlap ratio between dialogue turns in children with ASD compared to those with typical development (TD). Participants of this study include children ages 4-8, 44 with TD and 25 with ASD without language impairment. We apply several semantic similarity metrics to the children's dialogue turns in semi-structured conversations with examiners. We find that children with ASD have significantly more semantically overlapping turns than children with TD, across different turn intervals. These results support our hypothesis, and could provide a convenient and robust ASD-specific behavioral marker.
- Published
- 2015
30. COMPUTATIONAL ANALYSIS OF TRAJECTORIES OF LINGUISTIC DEVELOPMENT IN AUTISM.
- Author
-
Prud'hommeaux E, Morley E, Rouhizadeh M, Silverman L, van Santen J, Roark B, Sproat R, Kauper S, and DeLaHunta R
- Abstract
Deficits in semantic and pragmatic expression are among the hallmark linguistic features of autism. Recent work in deriving computational correlates of clinical spoken language measures has demonstrated the utility of automated linguistic analysis for characterizing the language of children with autism. Most of this research, however, has focused either on young children still acquiring language or on small populations covering a wide age range. In this paper, we extract numerous linguistic features from narratives produced by two groups of children with and without autism from two narrow age ranges. We find that although many differences between diagnostic groups remain constant with age, certain pragmatic measures, particularly the ability to remain on topic and avoid digressions, seem to improve. These results confirm findings reported in the psychology literature while underscoring the need for careful consideration of the age range of the population under investigation when performing clinically oriented computational analysis of spoken language.
- Published
- 2014
31. Distributional semantic models for the evaluation of disordered language.
- Author
-
Rouhizadeh M, Prud'hommeaux E, Roark B, and van Santen J
- Abstract
Atypical semantic and pragmatic expression is frequently reported in the language of children with autism. Although this atypicality often manifests itself in the use of unusual or unexpected words and phrases, the rate of use of such unexpected words is rarely directly measured or quantified. In this paper, we use distributional semantic models to automatically identify unexpected words in narrative retellings by children with autism. The classification of unexpected words is sufficiently accurate to distinguish the retellings of children with autism from those with typical development. These techniques demonstrate the potential of applying automated language analysis techniques to clinically elicited language data for diagnostic purposes.
- Published
- 2013
32. Automatic detection of pragmatic deficits in children with autism.
- Author
-
Prud'hommeaux E and Rouhizadeh M
- Abstract
Autism spectrum disorder (ASD) is characterized by atypical and idiosyncratic language, which often has its roots in pragmatic deficits. Identifying and measuring pragmatic language ability is challenging and requires substantial clinical expertise. In this paper, we present a method for automatically identifying pragmatically inappropriate language in narratives using two features related to relevance and topicality. These features, which are derived using techniques from machine translation and information retrieval, are able to distinguish the narratives from children with ASD from those of their language-matched peers and may prove useful in the development of automated screening tools for autism and neurodevelopmental disorders.
- Published
- 2012