1. Role of the Pretraining and the Adaptation data sizes for low-resource real-time MRI video segmentation
- Author
- Tholan, Masoud Thajudeen; Hegde, Vinayaka; Sharma, Chetan; and Ghosh, Prasanta Kumar
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Signal Processing
- Abstract
Real-time Magnetic Resonance Imaging (rtMRI) is frequently used in speech production studies as it provides a complete view of the vocal tract during articulation. This study investigates the effectiveness of rtMRI in analyzing vocal tract movements by employing the SegNet and UNet models for Air-Tissue Boundary (ATB) segmentation tasks. We pretrained a few base models using increasing numbers of subjects and videos and assessed their performance on two datasets. The first consists of unseen subjects with unseen videos from the same data source, on which the models scored 0.33% and 0.91% better (Pixel-wise Classification Accuracy (PCA) and Dice Coefficient, respectively) than the matched condition. The second comprises unseen videos from a new data source, on which the models reached 99.63% and 98.09% (PCA and Dice Coefficient, respectively) of the matched condition performance. Here, matched condition performance refers to the performance of a model trained only on the test subjects, which serves as the benchmark for the other models. Our findings highlight the significance of fine-tuning and adapting models with limited data. Notably, we demonstrate that effective model adaptation can be achieved with as few as 15 rtMRI frames from any new dataset.
Comment: Accepted to ICASSP 2025
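The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary masks (1 = tissue, 0 = air) stored as nested lists and uses the standard textbook definitions of pixel accuracy and the Dice coefficient; the function names are hypothetical and this is not the authors' evaluation code. The helper `relative_to_matched` reflects one reading of "99.63% of its matched condition performance", namely a simple ratio of scores.

```python
def _pixel_pairs(pred, target):
    """Flatten two equally sized 2-D masks into (predicted, reference) pairs."""
    return [(p, t) for p_row, t_row in zip(pred, target)
            for p, t in zip(p_row, t_row)]


def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the reference label."""
    pairs = _pixel_pairs(pred, target)
    return sum(1 for p, t in pairs if p == t) / len(pairs)


def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks (1 = foreground)."""
    pairs = _pixel_pairs(pred, target)
    intersection = sum(1 for p, t in pairs if p == 1 and t == 1)
    total = sum(p for p, _ in pairs) + sum(t for _, t in pairs)
    # Two empty masks agree perfectly, so define Dice as 1.0 in that case.
    return 1.0 if total == 0 else 2 * intersection / total


def relative_to_matched(score, matched_score):
    """Score expressed as a percentage of the matched-condition score
    (assumed interpretation of the abstract's wording)."""
    return 100 * score / matched_score


# Toy 2x3 masks, purely illustrative (not data from the paper).
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 0]]

accuracy = pixel_accuracy(pred, target)   # 5 of 6 pixels agree
dice = dice_coefficient(pred, target)     # 2*2 / (3 + 2)
```

Accuracy counts agreement on both classes, while Dice only rewards overlap of the foreground class, which is why the two figures in the abstract can differ for the same model.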
- Published
- 2025