XAIQA: Explainer-Based Data Augmentation for Extractive Question Answering

Authors :
Stremmel, Joel
Saeedi, Ardavan
Hassanzadeh, Hamid
Batra, Sanjit
Hertzberg, Jeffrey
Murillo, Jaime
Halperin, Eran
Publication Year :
2023

Abstract

Extractive question answering (QA) systems can enable physicians and researchers to query medical records, a foundational capability for designing clinical studies and understanding patient medical history. However, building these systems typically requires expert-annotated QA pairs. Large language models (LLMs), which can perform extractive QA, depend on high-quality data in their prompts that is specialized for the application domain. We introduce a novel approach, XAIQA, for generating synthetic QA pairs at scale from data naturally available in electronic health records. Our method uses the idea of a classification model explainer to generate questions and answers about medical concepts corresponding to medical codes. In an expert evaluation with two physicians, our method identifies 2.2× more semantic matches and 3.8× more clinical abbreviations than two popular approaches that use sentence transformers to create QA pairs. In an ML evaluation, adding our QA pairs improves the performance of GPT-4 as an extractive QA model, including on difficult questions. In both the expert and ML evaluations, we examine trade-offs between our method and sentence transformers for QA pair generation depending on question difficulty.

Comment: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 8 pages

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2312.03567
Document Type :
Working Paper