EILEEN: A Multi-Modal Framework for Extracting Alcohol Consumption Patterns From Bilingual Clinical Notes
- Author
Han Kyul Kim, Yujin Park, Yoon Ji Kim, Seungah Yi, Yeju Park, Sujin So, Hyeon-Ji Lee, and Ye Seul Bae
- Subjects
Clinical informatics, alcohol information extraction, natural language processing, multimodal learning, multilingual transformers, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
In this work, we introduce EILEEN (Efficient Inference for Language-based Extraction of EHR Notes), a novel multi-modal natural language processing (NLP) framework designed to extract alcohol consumption patterns from unstructured clinical notes, particularly in bilingual and non-English settings. Recent advances in NLP have substantially improved information extraction across many domains. However, identifying alcohol consumption patterns in medical documents remains underexplored, and existing approaches rely heavily on traditional NLP methods, such as bag-of-words models, that require extensive text preprocessing. These methods are largely confined to English-language clinical settings, where mature medical ontologies and NLP toolkits are available to support preprocessing. This limitation hinders their use in multilingual healthcare settings and in environments that lack such toolkits. Motivated by the need for a more generalizable and accurate approach, this paper investigates how large language models (LLMs) can advance the extraction of alcohol consumption patterns from clinical notes. By reducing the need for manual preprocessing and improving adaptability to multilingual notes, this work aims to enable broader, more practical use of NLP models for this task. By fine-tuning multilingual language models together with additional data sources, EILEEN analyzes unstructured electronic health records (EHR) without relying on traditional concept normalization or extensive preprocessing resources. Furthermore, the multi-modal component of EILEEN integrates diverse types of alcohol-related information, such as the types and amounts of alcohol a patient consumes, thereby improving its pattern extraction accuracy.
Our experiments, conducted at two medical institutions in Korea, demonstrate that EILEEN significantly outperforms existing NLP methods in identifying clinically relevant alcohol consumption patterns. By providing accurate, detailed, and clinically useful alcohol consumption patterns from unstructured clinical notes, EILEEN gives healthcare practitioners actionable insights for informed clinical decision-making.
- Published
- 2025