Back to Search Start Over

Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation

Authors :
Gauy, Marcelo Matheus
Koza, Natalia Hitomi
Morita, Ricardo Mikio
Stanzione, Gabriel Rocha
Junior, Arnaldo Candido
Berti, Larissa Cristina
Levin, Anna Sara Shafferman
Sabino, Ester Cerdeira
Svartman, Flaviane Romani Fernandes
Finger, Marcelo
Publication Year :
2024

Abstract

We contrast high effectiveness of state of the art deep learning architectures designed for general audio classification tasks, refined for respiratory insufficiency (RI) detection and blood oxygen saturation (SpO$_2$) estimation and classification through automated audio analysis. Recently, multiple deep learning architectures have been proposed to detect RI in COVID patients through audio analysis, achieving accuracy above 95% and F1-score above 0.93. RI is a condition associated with low SpO$_2$ levels, commonly defined as the threshold SpO$_2$ <92%. While SpO$_2$ serves as a crucial determinant of RI, a medical doctor's diagnosis typically relies on multiple factors. These include respiratory frequency, heart rate, SpO$_2$ levels, among others. Here we study pretrained audio neural networks (CNN6, CNN10 and CNN14) and the Masked Autoencoder (Audio-MAE) for RI detection, where these models achieve near perfect accuracy, surpassing previous results. Yet, for the regression task of estimating SpO$_2$ levels, the models achieve root mean square error values exceeding the accepted clinical range of 3.5% for finger oximeters. Additionally, Pearson correlation coefficients fail to surpass 0.3. As deep learning models perform better in classification than regression, we transform SpO$_2$-regression into a SpO$_2$-threshold binary classification problem, with a threshold of 92%. However, this task still yields an F1-score below 0.65. Thus, audio analysis offers valuable insights into a patient's RI status, but does not provide accurate information about actual SpO$_2$ levels, indicating a separation of domains in which voice and speech biomarkers may and may not be useful in medical diagnostics under current technologies.<br />Comment: 23 pages, 4 figures, in review at Journal of Biomedical Signal Processing and Control

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2407.20989
Document Type :
Working Paper