Author: "Wang, Yanshan" / Language: english - Searchworks@Jio Institute Digital Library Search Results

1. Family History Extraction From Synthetic Clinical Narratives Using Natural Language Processing: Overview and Evaluation of a Challenge Data Set and Solutions for the 2019 National NLP Clinical Challenges (n2c2)/Open Health Natural Language Processing (OHNLP) Competition

Author: Shen, Feichen, Liu, Sijia, Fu, Sunyang, Wang, Yanshan, Henry, Sam, Uzuner, Ozlem, and Liu, Hongfang
Subjects: Computer applications to medicine. Medical informatics, R858-859.7
Abstract: Background: As a risk factor for many diseases, family history (FH) captures both shared genetic variations and living environments among family members. Though there are several systems focusing on FH extraction using natural language processing (NLP) techniques, the evaluation protocol of such systems has not been standardized. Objective: The n2c2/OHNLP (National NLP Clinical Challenges/Open Health Natural Language Processing) 2019 FH extraction task aims to encourage community efforts on standard evaluation and system development for FH extraction from synthetic clinical narratives. Methods: We organized the first BioCreative/OHNLP FH extraction shared task in 2018. We continued the shared task in 2019 in collaboration with the n2c2 and OHNLP consortium, and organized the 2019 n2c2/OHNLP FH extraction track. The shared task comprises 2 subtasks. Subtask 1 focuses on identifying family member entities and clinical observations (diseases), and subtask 2 expects the association of the living status, side of the family, and clinical observations with family members to be extracted. Subtask 2 is an end-to-end task which is based on the result of subtask 1. We manually curated the first deidentified clinical narrative from FH sections of clinical notes at Mayo Clinic Rochester, the content of which is highly relevant to patients’ FH. Results: A total of 17 teams from all over the world participated in the n2c2/OHNLP FH extraction shared task, where 38 runs were submitted for subtask 1 and 21 runs were submitted for subtask 2. For subtask 1, the top 3 runs were generated by Harbin Institute of Technology, ezDI, Inc., and The Medical University of South Carolina with F1 scores of 0.8745, 0.8225, and 0.8130, respectively. For subtask 2, the top 3 runs were from Harbin Institute of Technology, ezDI, Inc., and University of Florida with F1 scores of 0.681, 0.6586, and 0.6544, respectively. The workshop was held in conjunction with the AMIA 2019 Fall Symposium. Conclusions: A wide variety of methods were used by different teams in both tasks, such as Bidirectional Encoder Representations from Transformers, convolutional neural network, bidirectional long short-term memory, conditional random field, support vector machine, and rule-based strategies. System performances show that relation extraction from FH is a more challenging task than entity identification.
Published: 2021
Full Text: View/download PDF

2. The 2019 n2c2/OHNLP Track on Clinical Semantic Textual Similarity: Overview

Author: Wang, Yanshan, Fu, Sunyang, Shen, Feichen, Henry, Sam, Uzuner, Ozlem, and Liu, Hongfang
Subjects: Computer applications to medicine. Medical informatics, R858-859.7
Abstract: Background: Semantic textual similarity is a common task in the general English domain to assess the degree to which the underlying semantics of 2 text segments are equivalent to each other. Clinical Semantic Textual Similarity (ClinicalSTS) is the semantic textual similarity task in the clinical domain that attempts to measure the degree of semantic equivalence between 2 snippets of clinical text. Due to the frequent use of templates in the Electronic Health Record system, a large amount of redundant text exists in clinical notes, making ClinicalSTS crucial for the secondary use of clinical text in downstream clinical natural language processing applications, such as clinical text summarization, clinical semantics extraction, and clinical information retrieval. Objective: Our objective was to release ClinicalSTS data sets and to motivate natural language processing and biomedical informatics communities to tackle semantic text similarity tasks in the clinical domain. Methods: We organized the first BioCreative/OHNLP ClinicalSTS shared task in 2018 by making available a real-world ClinicalSTS data set. We continued the shared task in 2019 in collaboration with National NLP Clinical Challenges (n2c2) and the Open Health Natural Language Processing (OHNLP) consortium and organized the 2019 n2c2/OHNLP ClinicalSTS track. We released a larger ClinicalSTS data set comprising 1642 clinical sentence pairs, including 1068 pairs from the 2018 shared task and 1006 new pairs from 2 electronic health record systems, GE and Epic. We released 80% (1642/2054) of the data to participating teams to develop and fine-tune the semantic textual similarity systems and used the remaining 20% (412/2054) as blind testing to evaluate their systems. The workshop was held in conjunction with the American Medical Informatics Association 2019 Annual Symposium. Results: Of the 78 international teams that signed on to the n2c2/OHNLP ClinicalSTS shared task, 33 produced a total of 87 valid system submissions. The top 3 systems were generated by IBM Research, the National Center for Biotechnology Information, and the University of Florida, with Pearson correlations of r=.9010, r=.8967, and r=.8864, respectively. Most top-performing systems used state-of-the-art neural language models, such as BERT and XLNet, and state-of-the-art training schemas in deep learning, such as pretraining and fine-tuning schema, and multitask learning. Overall, the participating systems performed better on the Epic sentence pairs than on the GE sentence pairs, despite a much larger portion of the training data being GE sentence pairs. Conclusions: The 2019 n2c2/OHNLP ClinicalSTS shared task focused on computing semantic similarity for clinical text sentences generated from clinical notes in the real world. It attracted a large number of international teams. The ClinicalSTS shared task could continue to serve as a venue for researchers in natural language processing and medical informatics communities to develop and improve semantic textual similarity techniques for clinical text.
Published: 2020
Full Text: View/download PDF
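
Note on the metric: the abstract above ranks ClinicalSTS systems by the Pearson correlation between system-predicted and gold-standard similarity scores. A minimal sketch of that computation follows. The function name and the score lists are hypothetical and for illustration only; they are not challenge data or the organizers' evaluation script.

```python
import math


def pearson_r(gold, predicted):
    """Pearson correlation between two equal-length lists of scores."""
    if len(gold) != len(predicted) or len(gold) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(gold)
    mean_g = sum(gold) / n
    mean_p = sum(predicted) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel out).
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, predicted))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_g * var_p)


# Hypothetical similarity scores for five sentence pairs (illustrative only).
gold_scores = [0.0, 1.5, 3.0, 4.5, 5.0]
system_scores = [0.5, 1.0, 3.5, 4.0, 5.0]
r = pearson_r(gold_scores, system_scores)
print(f"Pearson r = {r:.4f}")
```

Because Pearson correlation is invariant to linear rescaling of either score list, a system is rewarded for ordering and spacing pairs consistently with the gold standard rather than for matching the absolute scale.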

3. Characterizing Chronic Pain Episodes in Clinical Text at Two Health Care Systems: Comprehensive Annotation and Corpus Analysis

Author: Carlson, Luke A, Jeffery, Molly M, Fu, Sunyang, He, Huan, McCoy, Rozalina G, Wang, Yanshan, Hooten, William Michael, St Sauver, Jennifer, Liu, Hongfang, and Fan, Jungwei
Subjects: Computer applications to medicine. Medical informatics, R858-859.7
Abstract: Background: Chronic pain affects more than 20% of adults in the United States and is associated with substantial physical, mental, and social burden. Clinical text contains rich information about chronic pain, but no systematic appraisal has been performed to assess the electronic health record (EHR) narratives for these patients. A formal content analysis of the unstructured EHR data can inform clinical practice and research in chronic pain. Objective: We characterized individual episodes of chronic pain by annotating and analyzing EHR notes for a stratified cohort of adults with known chronic pain. Methods: We used the Rochester Epidemiology Project infrastructure to screen all residents of Olmsted County, Minnesota, for evidence of chronic pain, between January 1, 2005, and September 30, 2015. Diagnosis codes were used to assemble a cohort of 6586 chronic pain patients; people with cancer were excluded. The records of an age- and sex-stratified random sample of 62 patients from the cohort were annotated using an iteratively developed guideline. The annotated concepts included date, location, severity, causes, effects on quality of life, diagnostic procedures, medications, and other treatment modalities. Results: A total of 94 chronic pain episodes from 62 distinct patients were identified by reviewing 3272 clinical notes. Documentation was written by clinicians across a wide spectrum of specialties. Most patients (40/62, 65%) had 1 pain episode during the study period. Interannotator agreement ranged from 0.78 to 1.00 across the annotated concepts. Some pain-related concepts (eg, body location) had 100% (94/94) coverage among all the episodes, while others (eg, effects on quality of life) had moderate coverage (55/94, 59%). Back pain and leg pain were the most common types of chronic pain in the annotated cohort. Musculoskeletal issues like arthritis were annotated as the most common causes. Opioids were the most commonly captured medication, while physical and occupational therapies were the most common nonpharmacological treatments. Conclusions: We systematically annotated chronic pain episodes in clinical text. The rich content analysis results revealed complexity of the chronic pain episodes and of their management, as well as the challenges in extracting pertinent information, even for humans. Despite the pilot study nature of the work, the annotation guideline and corpus should be able to serve as informative references for other institutions with shared interest in chronic pain research using EHRs.
Published: 2020
Full Text: View/download PDF

4. Implementation of a Cohort Retrieval System for Clinical Data Repositories Using the Observational Medical Outcomes Partnership Common Data Model: Proof-of-Concept System Validation

Author: Liu, Sijia, Wang, Yanshan, Wen, Andrew, Wang, Liwei, Hong, Na, Shen, Feichen, Bedrick, Steven, Hersh, William, and Liu, Hongfang
Subjects: Computer applications to medicine. Medical informatics, R858-859.7
Abstract: Background: Widespread adoption of electronic health records has enabled the secondary use of electronic health record data for clinical research and health care delivery. Natural language processing techniques have shown promise in their capability to extract the information embedded in unstructured clinical data, and information retrieval techniques provide flexible and scalable solutions that can augment natural language processing systems for retrieving and ranking relevant records. Objective: In this paper, we present the implementation of a cohort retrieval system that can execute textual cohort selection queries on both structured data and unstructured text: Cohort Retrieval Enhanced by Analysis of Text from Electronic Health Records (CREATE). Methods: CREATE is a proof-of-concept system that leverages a combination of structured queries and information retrieval techniques on natural language processing results to improve cohort retrieval performance using the Observational Medical Outcomes Partnership Common Data Model to enhance model portability. The natural language processing component was used to extract common data model concepts from textual queries. We designed a hierarchical index to support the common data model concept search utilizing information retrieval techniques and frameworks. Results: Our case study on 5 cohort identification queries, evaluated using the precision at 5 information retrieval metric at both the patient-level and document-level, demonstrates that CREATE achieves a mean precision at 5 of 0.90, which outperforms systems using only structured data or only unstructured text with mean precision at 5 values of 0.54 and 0.74, respectively. Conclusions: The implementation and evaluation of Mayo Clinic Biobank data demonstrated that CREATE outperforms cohort retrieval systems that only use one of either structured data or unstructured text in complex textual cohort queries.
Published: 2020
Full Text: View/download PDF
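
Note on the metric: the abstract above reports mean precision at 5, the fraction of the top 5 retrieved items that are relevant, averaged over queries. A minimal sketch under stated assumptions follows. The function names, patient identifiers, and relevance judgments are all hypothetical; this is not the CREATE system's evaluation code.

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k ranked items that are judged relevant.

    Divides by k even when fewer than k items were retrieved, so short
    result lists are penalized (a common convention, assumed here).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / k


def mean_precision_at_k(runs, k=5):
    """Average precision@k over (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        raise ValueError("need at least one query")
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)


# Hypothetical results for two cohort queries (illustrative identifiers only).
query_runs = [
    (["p1", "p2", "p3", "p4", "p5", "p6"], {"p1", "p3", "p4", "p5", "p9"}),
    (["p7", "p8", "p9", "p10", "p11"], {"p7", "p8", "p9", "p10", "p11"}),
]
print(mean_precision_at_k(query_runs, k=5))
```

In this toy data the first query has 4 relevant items in its top 5 and the second has 5, so the two per-query values are 0.8 and 1.0 before averaging.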

5. Integrating structured and unstructured data for predicting emergency severity: an association and predictive study using transformer-based natural language processing models

Author: Zhang, Xingyu, Wang, Yanshan, Jiang, Yun, Pacella, Charissa B., and Zhang, Wenbin
Published: 2024
Full Text: View/download PDF

6. PRISM: Patient Records Interpretation for Semantic clinical trial Matching system using large language models

Author: Gupta, Shashi, Basu, Aditya, Nievas, Mauro, Thomas, Jerrin, Wolfrath, Nathan, Ramamurthi, Adhitya, Taylor, Bradley, Kothari, Anai N., Schwind, Regina, Miller, Therica M., Nadaf-Rahrov, Sorena, Wang, Yanshan, and Singh, Hrituraj
Published: 2024
Full Text: View/download PDF

7. A framework for human evaluation of large language models in healthcare derived from literature review

Author: Tam, Thomas Yu Chow, Sivarajkumar, Sonish, Kapoor, Sumit, Stolyar, Alisa V., Polanska, Katelyn, McCarthy, Karleigh R., Osterhoudt, Hunter, Wu, Xizhi, Visweswaran, Shyam, Fu, Sunyang, Mathur, Piyush, Cacciamani, Giovanni E., Sun, Cong, Peng, Yifan, and Wang, Yanshan
Published: 2024
Full Text: View/download PDF

8. Enhancing post-traumatic stress disorder patient assessment: leveraging natural language processing for research of domain criteria identification using electronic medical records

Author: Miranda, Oshin, Kiehl, Sophie Marie, Qi, Xiguang, Brannock, M. Daniel, Kosten, Thomas, Ryan, Neal David, Kirisci, Levent, Wang, Yanshan, and Wang, LiRong
Published: 2024
Full Text: View/download PDF

9. Emerging opportunities of using large language models for translation between drug molecules and indications

Author: Oniani, David, Hilsman, Jordan, Zang, Chengxi, Wang, Junmei, Cai, Lianjin, Zawala, Jan, and Wang, Yanshan
Published: 2024
Full Text: View/download PDF

10. Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI

Author: Abbasian, Mahyar, Khatibi, Elahe, Azimi, Iman, Oniani, David, Shakeri Hossein Abad, Zahra, Thieme, Alexander, Sriram, Ram, Yang, Zhongqi, Wang, Yanshan, Lin, Bryant, Gevaert, Olivier, Li, Li-Jia, Jain, Ramesh, and Rahmani, Amir M.
Published: 2024
Full Text: View/download PDF

11. Clinical Information Retrieval: A Literature Review

Author: Sivarajkumar, Sonish, Mohammad, Haneef Ahamed, Oniani, David, Roberts, Kirk, Hersh, William, Liu, Hongfang, He, Daqing, Visweswaran, Shyam, and Wang, Yanshan
Published: 2024
Full Text: View/download PDF

12. Effect of torrefaction atmospheres on the pyrolysis and combustion of torrefied municipal solid waste

Author: Zhu, Xiaochao, Li, Songjiang, Wang, Yanshan, Zhou, Shengquan, Li, Jian, Su, Hong, Sun, Yunan, Yan, Beibei, and Chen, Guanyi
Published: 2024
Full Text: View/download PDF

13. Leveraging generative AI for clinical evidence synthesis needs to ensure trustworthiness

Author: Zhang, Gongbo, Jin, Qiao, Jered McInerney, Denis, Chen, Yong, Wang, Fei, Cole, Curtis L., Yang, Qian, Wang, Yanshan, Malin, Bradley A, Peleg, Mor, Wallace, Byron C., Lu, Zhiyong, Weng, Chunhua, and Peng, Yifan
Published: 2024
Full Text: View/download PDF

14. Statistics of Generative Artificial Intelligence and Nongenerative Predictive Analytics Machine Learning in Medicine

Author: Rashidi, Hooman H., Hu, Bo, Pantanowitz, Joshua, Tran, Nam, Liu, Silvia, Chamanzar, Alireza, Gur, Mert, Chang, Chung-Chou H., Wang, Yanshan, Tafti, Ahmad, Pantanowitz, Liron, and Hanna, Matthew G.
Published: 2024
Full Text: View/download PDF

15. Adopting and expanding ethical principles for generative artificial intelligence from military to healthcare

Author: Oniani, David, Hilsman, Jordan, Peng, Yifan, Poropatich, Ronald K., Pamplin, Jeremy C., Legault, Gary L., and Wang, Yanshan
Published: 2023
Full Text: View/download PDF

16. Activation of peroxymonosulfate by food waste digestate derived biochar for sulfamethoxazole degradation: Performance and mechanism

Author: Wang, Yanshan, Liang, Lan, Dai, Haoxi, Li, Ning, Song, Yingjin, Yan, Beibei, Chen, Guanyi, and Hou, Li'an
Published: 2023
Full Text: View/download PDF

17. Fair patient model: Mitigating bias in the patient representation learned from the electronic health records

Author: Sivarajkumar, Sonish, Huang, Yufei, and Wang, Yanshan
Published: 2023
Full Text: View/download PDF

18. Wet torrefaction coupled pyrolysis of camel dung: Fuel properties, pyrolysis behaviors and evolved gases

Author: Wang, Yanshan, Zhu, Xiaochao, Li, Songjiang, Song, Yingjin, Chen, Guanyi, and Hou, Li'an
Published: 2023
Full Text: View/download PDF

19. ReDWINE: A clinical datamart with text analytical capabilities to facilitate rehabilitation research

Author: Oniani, David, Parmanto, Bambang, Saptono, Andi, Bove, Allyn, Freburger, Janet, Visweswaran, Shyam, Cappella, Nickie, McLay, Brian, Silverstein, Jonathan C., Becich, Michael J., Delitto, Anthony, Skidmore, Elizabeth, and Wang, Yanshan
Published: 2023
Full Text: View/download PDF

20. Conversion and impact of dissolved organic matters in a heterogeneous catalytic peroxymonosulfate system for pollutant degradation

Author: Wang, Yanshan, Li, Ning, Fu, Qinglong, Cheng, Zhanjun, Song, Yingjin, Yan, Beibei, Chen, Guanyi, Hou, Li'an, and Wang, Shaobin
Published: 2023
Full Text: View/download PDF

21. Effect of the recurring random signal waveform on SBS and self-pulsing in a phase-modulated narrow-linewidth linearly polarized fiber amplifier

Author: Wang, Yanshan, Sun, Yinhong, Peng, Wanjing, Wang, Jue, Feng, Yujun, Ma, Yi, Gao, Qingsong, Zhu, Rihong, and Tang, Chun
Published: 2022
Full Text: View/download PDF

22. Breaking rate-limiting steps in a red mud-sewage sludge carbon catalyst activated peroxymonosulfate system: Effect of pyrolysis temperature

Author: Liang, Lan, Wang, Yanshan, Li, Ning, Yan, Beibei, Chen, Guanyi, and Hou, Li'an
Published: 2022
Full Text: View/download PDF

23. Effect of phosphates on oxidative species generation and sulfamethoxazole degradation in a pig manure derived biochar activated peroxymonosulfate system

Author: Wang, Chuanbin, Wang, Yanshan, Yu, Yang, Cui, Xiaoqiang, Yan, Beibei, Song, Yingjin, Li, Ning, Chen, Guanyi, and Wang, Shaobin
Published: 2022
Full Text: View/download PDF

24. Influences and mechanisms of phosphate ions onto persulfate activation and organic degradation in water treatment: A review

Author: Li, Ning, Wang, Yanshan, Cheng, Xiaoshuang, Dai, Haoxi, Yan, Beibei, Chen, Guanyi, Hou, Li'an, and Wang, Shaobin
Published: 2022
Full Text: View/download PDF

25. Sulfamethoxazole degradation by regulating active sites on distilled spirits lees-derived biochar in a continuous flow fixed bed peroxymonosulfate reactor

Author: Wang, Yanshan, Peng, Wenchao, Wang, Jun, Chen, Guanyi, Li, Ning, Song, Yingjin, Cheng, Zhanjun, Yan, Beibei, Hou, Li’an, and Wang, Shaobin
Published: 2022
Full Text: View/download PDF

26. Extraction of sleep information from clinical notes of Alzheimer's disease patients using natural language processing.

Author: Sivarajkumar, Sonish, Tam, Thomas Yu Chow, Mohammad, Haneef Ahamed, Viggiano, Samuel, Oniani, David, Visweswaran, Shyam, and Wang, Yanshan
Abstract: Objectives: Alzheimer's disease (AD) is the most common form of dementia in the United States. Sleep is one of the lifestyle-related factors that has been shown to be critical for optimal cognitive function in old age. However, there is a lack of research studying the association between sleep and AD incidence. A major bottleneck for conducting such research is that the traditional way to acquire sleep information is time-consuming, inefficient, non-scalable, and limited to patients' subjective experience. We aim to automate the extraction of specific sleep-related patterns, such as snoring, napping, poor sleep quality, daytime sleepiness, night wakings, other sleep problems, and sleep duration, from clinical notes of AD patients. These sleep patterns are hypothesized to play a role in the incidence of AD, providing insight into the relationship between sleep and AD onset and progression. Materials and Methods: A gold standard dataset is created from manual annotation of 570 randomly sampled clinical note documents from adSLEEP, a corpus of 192 000 de-identified clinical notes of 7266 AD patients retrieved from the University of Pittsburgh Medical Center (UPMC). We developed a rule-based natural language processing (NLP) algorithm, machine learning models, and large language model (LLM)-based NLP algorithms to automate the extraction of sleep-related concepts, including snoring, napping, sleep problem, bad sleep quality, daytime sleepiness, night wakings, and sleep duration, from the gold standard dataset. Results: The annotated dataset of 482 patients comprised a predominantly White (89.2%), older adult population with an average age of 84.7 years, where females represented 64.1%, and a vast majority were non-Hispanic or Latino (94.6%). The rule-based NLP algorithm achieved the best F1 performance across all sleep-related concepts. In terms of positive predictive value (PPV), the rule-based NLP algorithm achieved the highest PPV scores for daytime sleepiness (1.00) and sleep duration (1.00), while the machine learning models had the highest PPV for napping (0.95) and bad sleep quality (0.86), and LLAMA2 with finetuning had the highest PPV for night wakings (0.93) and sleep problem (0.89). Discussion: Although sleep information is infrequently documented in the clinical notes, the proposed rule-based NLP algorithm and LLM-based NLP algorithms still achieved promising results. In comparison, the machine learning-based approaches did not achieve good results, which is due to the small size of sleep information in the training data. Conclusion: The results show that the rule-based NLP algorithm consistently achieved the best performance for all sleep concepts. This study focused on the clinical notes of patients with AD but could be extended to general sleep information extraction for other diseases. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

27. Active sites decoration on sewage sludge-red mud complex biochar for persulfate activation to degrade sulfanilamide

Author: Liang, Lan, Chen, Guanyi, Li, Ning, Liu, Hengxin, Yan, Beibei, Wang, Yanshan, Duan, Xiaoguang, Hou, Li'an, and Wang, Shaobin
Published: 2022
Full Text: View/download PDF

28. Tunable active sites on biogas digestate derived biochar for sulfanilamide degradation by peroxymonosulfate activation

Author: Wang, Yanshan, Song, Yingjin, Li, Ning, Liu, Wen, Yan, Beibei, Yu, Yang, Liang, Lan, Chen, Guanyi, Hou, Li’an, and Wang, Shaobin
Published: 2022
Full Text: View/download PDF

29. Co/N co-doped carbonized wood sponge with 3D porous framework for efficient peroxymonosulfate activation: Performance and internal mechanism

Author: Yu, Yang, Li, Ning, Lu, Xukai, Yan, Beibei, Chen, Guanyi, Wang, Yanshan, Duan, Xiaoguang, Cheng, Zhanjun, and Wang, Shaobin
Published: 2022
Full Text: View/download PDF

30. Computational drug repurposing based on electronic health records: a scoping review

Author: Zong, Nansu, Wen, Andrew, Moon, Sungrim, Fu, Sunyang, Wang, Liwei, Zhao, Yiqing, Yu, Yue, Huang, Ming, Wang, Yanshan, Zheng, Gang, Mielke, Michelle M., Cerhan, James R., and Liu, Hongfang
Published: 2022
Full Text: View/download PDF

31. MedSTS : a resource for clinical semantic textual similarity

Author: Wang, Yanshan, Afzal, Naveed, Fu, Sunyang, Wang, Liwei, Shen, Feichen, Rastegar-Mojarad, Majid, and Liu, Hongfang
Published: 2020

32. Classifying the lifestyle status for Alzheimer’s disease from clinical notes using deep learning with weak supervision

Author: Shen, Zitao, Schutte, Dalton, Yi, Yoonkwon, Bompelli, Anusha, Yu, Fang, Wang, Yanshan, and Zhang, Rui
Published: 2022
Full Text: View/download PDF

33. Using weak supervision and deep learning to classify clinical notes for identification of current suicidal ideation

Author: Cusick, Marika, Adekkanattu, Prakash, Campion, Thomas R., Jr., Sholle, Evan T., Myers, Annie, Banerjee, Samprit, Alexopoulos, George, Wang, Yanshan, and Pathak, Jyotishman
Published: 2021
Full Text: View/download PDF

34. An aberration detection-based approach for sentinel syndromic surveillance of COVID-19 and other novel influenza-like illnesses

Author: Wen, Andrew, Wang, Liwei, He, Huan, Liu, Sijia, Fu, Sunyang, Sohn, Sunghwan, Kugel, Jacob A., Kaggal, Vinod C., Huang, Ming, Wang, Yanshan, Shen, Feichen, Fan, Jungwei, and Liu, Hongfang
Published: 2021
Full Text: View/download PDF

35. Large language models for biomedicine: foundations, opportunities, challenges, and best practices.

Author: Sahoo, Satya S, Plasek, Joseph M, Xu, Hua, Uzuner, Özlem, Cohen, Trevor, Yetisgen, Meliha, Liu, Hongfang, Meystre, Stéphane, and Wang, Yanshan
Abstract: Objectives: Generative large language models (LLMs) are a subset of transformers-based neural network architecture models. LLMs have successfully leveraged a combination of an increased number of parameters, improvements in computational efficiency, and large pre-training datasets to perform a wide spectrum of natural language processing (NLP) tasks. Using a few examples (few-shot) or no examples (zero-shot) for prompt-tuning has enabled LLMs to achieve state-of-the-art performance in a broad range of NLP applications. This article by the American Medical Informatics Association (AMIA) NLP Working Group characterizes the opportunities, challenges, and best practices for our community to leverage and advance the integration of LLMs in downstream NLP applications effectively. This can be accomplished through a variety of approaches, including augmented prompting, instruction prompt tuning, and reinforcement learning from human feedback (RLHF). Target Audience: Our focus is on making LLMs accessible to the broader biomedical informatics community, including clinicians and researchers who may be unfamiliar with NLP. Additionally, NLP practitioners may gain insight from the described best practices. Scope: We focus on 3 broad categories of NLP tasks, namely natural language understanding, natural language inferencing, and natural language generation. We review the emerging trends in prompt tuning, instruction fine-tuning, and evaluation metrics used for LLMs while drawing attention to several issues that impact biomedical NLP applications, including falsehoods in generated text (confabulation/hallucinations), toxicity, and dataset contamination leading to overfitting. We also review potential approaches to address some of these current challenges in LLMs, such as chain of thought prompting, and the phenomena of emergent capabilities observed in LLMs that can be leveraged to address complex NLP challenges in biomedical applications. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

36. Distilling large language models for matching patients to clinical trials.

Author: Nievas, Mauro, Basu, Aditya, Wang, Yanshan, and Singh, Hrituraj
Abstract: Objective: The objective of this study is to systematically examine the efficacy of both proprietary (GPT-3.5, GPT-4) and open-source large language models (LLMs) (LLAMA 7B, 13B, 70B) in the context of matching patients to clinical trials in healthcare. Materials and methods: The study employs a multifaceted evaluation framework, incorporating extensive automated and human-centric assessments along with a detailed error analysis for each model, and assesses LLMs' capabilities in analyzing patient eligibility against clinical trial's inclusion and exclusion criteria. To improve the adaptability of open-source LLMs, a specialized synthetic dataset was created using GPT-4, facilitating effective fine-tuning under constrained data conditions. Results: The findings indicate that open-source LLMs, when fine-tuned on this limited and synthetic dataset, achieve performance parity with their proprietary counterparts, such as GPT-3.5. Discussion: This study highlights the recent success of LLMs in the high-stakes domain of healthcare, specifically in patient-trial matching. The research demonstrates the potential of open-source models to match the performance of proprietary models when fine-tuned appropriately, addressing challenges like cost, privacy, and reproducibility concerns associated with closed-source proprietary LLMs. Conclusion: The study underscores the opportunity for open-source LLMs in patient-trial matching. To encourage further research and applications in this field, the annotated evaluation dataset and the fine-tuned LLM, Trial-LLAMA, are released for public use. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

37. A taxonomy for advancing systematic error analysis in multi-site electronic health record-based clinical concept extraction.

Author: Fu, Sunyang, Wang, Liwei, He, Huan, Wen, Andrew, Zong, Nansu, Kumari, Anamika, Liu, Feifan, Zhou, Sicheng, Zhang, Rui, Li, Chenyu, Wang, Yanshan, Sauver, Jennifer St, Liu, Hongfang, and Sohn, Sunghwan
Abstract: Background: Error analysis plays a crucial role in clinical concept extraction, a fundamental subtask within clinical natural language processing (NLP). The process typically involves a manual review of error types, such as contextual and linguistic factors contributing to their occurrence, and the identification of underlying causes to refine the NLP model and improve its performance. Conducting error analysis can be complex, requiring a combination of NLP expertise and domain-specific knowledge. Due to the high heterogeneity of electronic health record (EHR) settings across different institutions, challenges may arise when attempting to standardize and reproduce the error analysis process. Objectives: This study aims to facilitate a collaborative effort to establish common definitions and taxonomies for capturing diverse error types, fostering community consensus on error analysis for clinical concept extraction tasks. Materials and Methods: We iteratively developed and evaluated an error taxonomy based on existing literature, standards, real-world data, multisite case evaluations, and community feedback. The finalized taxonomy was released in both .dtd and .owl formats at the Open Health Natural Language Processing Consortium. The taxonomy is compatible with several different open-source annotation tools, including MAE, Brat, and MedTator. Results: The resulting error taxonomy comprises 43 distinct error classes, organized into 6 error dimensions and 4 properties, including model type (symbolic and statistical machine learning), evaluation subject (model and human), evaluation level (patient, document, sentence, and concept), and annotation examples. Internal and external evaluations revealed strong variations in error types across methodological approaches, tasks, and EHR settings. Key points emerged from community feedback, including the need to enhance the clarity, generalizability, and usability of the taxonomy, along with dissemination strategies. Conclusion: The proposed taxonomy can facilitate the acceleration and standardization of the error analysis process in multi-site settings, thus improving the provenance, interpretability, and portability of NLP models. Future researchers could explore the potential direction of developing automated or semi-automated methods to assist in the classification and standardization of error analysis. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

38. Clinical concept extraction: A methodology review

Author: Fu, Sunyang, Chen, David, He, Huan, Liu, Sijia, Moon, Sungrim, Peterson, Kevin J., Shen, Feichen, Wang, Liwei, Wang, Yanshan, Wen, Andrew, Zhao, Yiqing, Sohn, Sunghwan, and Liu, Hongfang
Published: 2020
Full Text: View/download PDF

39. Unsupervised machine learning for the discovery of latent disease clusters and patient subgroups using electronic health records

Author: Wang, Yanshan, Zhao, Yiqing, Therneau, Terry M., Atkinson, Elizabeth J., Tafti, Ahmad P., Zhang, Nan, Amin, Shreyasee, Limper, Andrew H., Khosla, Sundeep, and Liu, Hongfang
Published: 2020
Full Text: View/download PDF

40. Automatic extraction and assessment of lifestyle exposures for Alzheimer’s disease using natural language processing

Author: Zhou, Xin, Wang, Yanshan, Sohn, Sunghwan, Therneau, Terry M., Liu, Hongfang, and Knopman, David S.
Published: 2019
Full Text: View/download PDF

41. HPO2Vec+: Leveraging heterogeneous knowledge resources to enrich node embeddings for the Human Phenotype Ontology

Author: Shen, Feichen, Peng, Suyuan, Fan, Yadan, Wen, Andrew, Liu, Sijia, Wang, Yanshan, Wang, Liwei, and Liu, Hongfang
Published: 2019
Full Text: View/download PDF

42. A comparison of word embeddings for the biomedical natural language processing

Author: Wang, Yanshan, Liu, Sijia, Afzal, Naveed, Rastegar-Mojarad, Majid, Wang, Liwei, Shen, Feichen, Kingsbury, Paul, and Liu, Hongfang
Published: 2018
Full Text: View/download PDF

43. Clinical information extraction applications: A literature review

Author: Wang, Yanshan, Wang, Liwei, Rastegar-Mojarad, Majid, Moon, Sungrim, Shen, Feichen, Afzal, Naveed, Liu, Sijia, Zeng, Yuqun, Mehrabi, Saeed, Sohn, Sunghwan, and Liu, Hongfang
Published: 2018
Full Text: View/download PDF

44. Salience of Medical Concepts of Inside Clinical Texts and Outside Medical Records for Referred Cardiovascular Patients

Author: Moon, Sungrim, Liu, Sijia, Chen, David, Wang, Yanshan, Wood, Douglas L., Chaudhry, Rajeev, Liu, Hongfang, and Kingsbury, Paul
Published: 2019
Full Text: View/download PDF

45. Spectral broadening in narrow linewidth, continuous-wave high power fiber amplifiers

Author: Feng, Yujun, Wang, Xiaojun, Ke, Weiwei, Sun, Yinhong, Zhang, Kai, Ma, Yi, Li, Tenglong, Wang, Yanshan, and Wu, Juan
Published: 2017
Full Text: View/download PDF

46. Boosting electrocatalytic activities of plasmonic metallic nanostructures by tuning the kinetic pre-exponential factor

Author: Xiong, Yunjie, Ren, Mingjun, Li, Dongdong, Lin, Bolin, Zou, Liangliang, Wang, Yanshan, Zheng, Haifeng, Zou, Zhiqing, Zhou, Yi, Ding, Yihong, Wang, Zhongyang, Dai, Liming, and Yang, Hui
Published: 2017
Full Text: View/download PDF

47. Flexible broadband plasmonic absorber on moth-eye substrate

Author: Ji, Ting, Wang, Yanshan, Cui, Yanxia, Lin, Yinyue, Hao, Yuying, and Li, Dongdong
Published: 2017
Full Text: View/download PDF

48. Influence of seed instability on the stimulated Raman scattering of high power narrow linewidth fiber amplifier

Author: Wang, Yanshan, Peng, Wanjing, Ke, Weiwei, Sun, Yinhong, Chang, Zhe, Ma, Yi, Zhu, Rihong, and Tang, Chun
Published: 2020
Full Text: View/download PDF

49. Use of Natural Language Processing Algorithms to Identify Common Data Elements in Operative Notes for Total Hip Arthroplasty

Author: Wyles, Cody C., Tibbo, Meagan E., Fu, Sunyang, Wang, Yanshan, Sohn, Sunghwan, Kremers, Walter K., Berry, Daniel J., Lewallen, David G., and Maradit-Kremers, Hilal
Published: 2019
Full Text: View/download PDF

50. Atrial overexpression of microRNA-27b attenuates angiotensin II-induced atrial fibrosis and fibrillation by targeting ALK5

Author: Wang, Yanshan, Cai, Heng, Li, Hongmei, Gao, Zhisheng, and Song, Kunqing
Published: 2018
Full Text: View/download PDF

311 results on '"Wang, Yanshan"'
