
A disease-specific language representation model for cerebrovascular disease research.

Authors :
Lin, Ching-Heng
Hsu, Kai-Cheng
Liang, Chih-Kuang
Lee, Tsong-Hai
Liou, Chia-Wei
Lee, Jiann-Der
Peng, Tsung-I
Shih, Ching-Sen
Fann, Yang C.
Source :
Computer Methods & Programs in Biomedicine. Nov2021, Vol. 211, pN.PAG-N.PAG. 1p.
Publication Year :
2021

Abstract

• StrokeBERT is a disease-specific BERT-based model pre-trained on real-world evidence (RWE) from cerebrovascular disease-related corpora.
• The model was evaluated and validated on larger, multicenter datasets through two independent empirical tasks (stenosis detection and stroke recurrence prediction).
• A disease-specific BERT model improves results on various disease-specific language processing tasks compared to similar BERT models pre-trained on general-domain corpora.

Effectively utilizing disease-relevant text information from unstructured clinical notes for medical research presents many challenges. BERT (Bidirectional Encoder Representations from Transformers) related models such as BioBERT and ClinicalBERT, pre-trained on biomedical corpora and general clinical information, have shown promising performance in various biomedical language processing tasks. This study aims to explore whether a BERT-based model pre-trained on disease-related clinical information can be more effective for cerebrovascular disease-relevant research.

This study proposed StrokeBERT, which was initialized from BioBERT and pre-trained on large-scale cerebrovascular disease-related clinical text. The pre-training corpora contained 113,590 discharge notes, 105,743 radiology reports, and 38,199 neurological reports. Two real-world empirical clinical tasks were conducted to validate StrokeBERT's performance. The first task identified extracranial and intracranial artery stenosis from two independent sets of radiology angiography reports. The second task predicted the risk of recurrent ischemic stroke based on patients' first discharge information.

In stenosis detection, StrokeBERT outperformed ClinicalBERT on the targeted carotid arteries, with average AUCs of 0.968 ± 0.021 and 0.956 ± 0.018, respectively. In recurrent ischemic stroke prediction, after 10-fold cross-validation on 1,700 discharge records, StrokeBERT showed better predictive ability (AUC ± SD = 0.838 ± 0.017) than ClinicalBERT (AUC ± SD = 0.808 ± 0.045). StrokeBERT's attention scores also showed a better ability to detect and associate cerebrovascular disease-related terms than those of current BERT-based models.

This study shows that a disease-specific BERT model improved the performance and accuracy of various disease-specific language processing tasks and can readily be fine-tuned to advance cerebrovascular disease research and further developed for clinical applications. [ABSTRACT FROM AUTHOR]
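
The "AUC ± SD" figures reported above summarize one AUC per cross-validation fold. The abstract does not describe the authors' evaluation code, so the following is only a minimal sketch of how such a summary could be computed. The function names `roc_auc` and `summarize_folds` are hypothetical, the use of the sample standard deviation is an assumption, and the toy numbers are illustrative rather than data from the study.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """Return the ROC AUC for binary labels (0/1) and real-valued scores.

    Uses the pairwise (Mann-Whitney) identity: AUC is the fraction of
    positive/negative pairs in which the positive example scores higher,
    with ties counted as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_results):
    """Return (mean AUC, sample SD) over (labels, scores) pairs, one per fold.

    Assumption: the SD is the sample standard deviation across folds;
    the abstract does not say which estimator the authors used.
    """
    aucs = [roc_auc(labels, scores) for labels, scores in fold_results]
    return mean(aucs), stdev(aucs)


if __name__ == "__main__":
    # Toy held-out predictions for three folds (illustrative only).
    folds = [
        ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]),
        ([0, 1, 0, 1], [0.20, 0.90, 0.30, 0.70]),
        ([1, 0, 1, 0], [0.60, 0.50, 0.55, 0.65]),
    ]
    auc_mean, auc_sd = summarize_folds(folds)
    print(f"AUC ± SD = {auc_mean:.3f} ± {auc_sd:.3f}")
```

In a 10-fold setting, `fold_results` would hold ten pairs, each containing the true labels and the model's predicted scores for that fold's held-out records.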

Details

Language :
English
ISSN :
01692607
Volume :
211
Database :
Academic Search Index
Journal :
Computer Methods & Programs in Biomedicine
Publication Type :
Academic Journal
Accession number :
153173183
Full Text :
https://doi.org/10.1016/j.cmpb.2021.106446