
Advanced montane bird monitoring using self-supervised learning and transformer on passive acoustic data.

Authors :
Wei, Yu-Cheng
Chen, Wei-Lun
Tuanmu, Mao-Ning
Lu, Sheng-Shan
Shiao, Ming-Tang
Source :
Ecological Informatics; Dec2024, Vol. 84, pN.PAG-N.PAG, 1p
Publication Year :
2024

Abstract

Passive acoustic monitoring combined with deep learning-based bird sound classifiers is an effective tool, particularly in remote areas. While self-supervised learning has recently excelled in natural language processing and image recognition, its application to bird sound recognition remains limited. This study proposes an innovative self-supervised learning approach that leverages vast amounts of passive acoustic recordings for pre-training, followed by fine-tuning on the target species. Compared with three state-of-the-art models based on transfer learning from ImageNet, the proposed method improved overall recognition performance, with even larger gains for tail-end species. These results confirm that domain-specific pre-training in self-supervised learning enhances downstream recognition tasks and provides greater robustness, benefiting tail-end species in imbalanced ecological datasets. Our experiments further demonstrate that integrating open-source datasets and data augmentation techniques is the most effective strategy for mitigating data imbalances and cross-domain issues. In addition, introducing a 'catch-all' category into training datasets was shown to improve model robustness in open-set recognition scenarios. We also identified the minimum viable sample size requirements for our proposed model and explored the impact of overlapping bird vocalizations during dawn choruses on model performance. Targeting 31 bird species in the montane regions of subtropical Taiwan, the model achieved a class-wise mean average precision of 0.782 and an overall precision of 85.6% at the F0.5 threshold in dawn chorus soundscape recordings. This study confirms the effectiveness and advantages of self-supervised learning in bird sound recognition, supporting long-term monitoring of bird distribution and vocal activity in remote montane areas.
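The two evaluation figures quoted above can be illustrated with a short sketch. It is not the authors' evaluation code; it assumes that class-wise mean average precision (mAP) is the unweighted mean of per-class, non-interpolated average precision over clip-level scores, and that the "F0.5 threshold" is a score threshold chosen to maximize F0.5, which weights precision more heavily than recall. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP for one class: mean precision at each true positive's rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def class_wise_map(labels: List[List[int]], scores: List[List[float]]) -> float:
    """Unweighted mean of per-class AP; inputs are indexed [clip][class]."""
    n_classes = len(labels[0])
    aps = []
    for c in range(n_classes):
        y = [row[c] for row in labels]
        s = [row[c] for row in scores]
        if any(y):  # assumption: classes with no positive clips are skipped
            aps.append(average_precision(y, s))
    return sum(aps) / len(aps)


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F_beta = (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP)."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_f05_threshold(
    labels: Sequence[int], scores: Sequence[float], beta: float = 0.5
) -> Tuple[float, float, float]:
    """Scan observed scores as thresholds; return (threshold, F_beta, precision) at the best F_beta."""
    best = (1.0, 0.0, 0.0)
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(labels, scores) if s < t and y == 1)
        f = f_beta(tp, fp, fn, beta)
        if f > best[1]:
            precision = tp / (tp + fp) if tp + fp else 0.0
            best = (t, f, precision)
    return best


# Toy usage with made-up scores for four clips of a single species.
print(best_f05_threshold([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Because beta = 0.5 weights a false negative at one quarter of a false positive, the selected threshold tends to favor precision, which is consistent with reporting precision at that operating point.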
• Self-supervised audio-MAE excels in bird sound recognition, notably for tail species.
• Domain-specific pre-training enhances recognition, outperforming ImageNet-based models.
• Data augmentation mitigates data imbalance and cross-domain challenges.
• A 'catch-all' category boosts model robustness in open-set recognition.
• Dawn chorus overlaps affect the false negative rate (FNR) rather than the false positive rate (FPR) of bird sound models.
[ABSTRACT FROM AUTHOR]
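The highlights name a self-supervised audio masked autoencoder (audio-MAE) as the pre-training approach. As a rough illustration of that kind of objective, the sketch below splits a spectrogram into non-overlapping patches, hides a random subset, and computes reconstruction error only on the hidden patches. The transformer encoder and decoder are omitted entirely, and the 2-by-2 patch size and 0.8 masking ratio in the usage lines are illustrative assumptions, not settings reported in this record; the helper names are hypothetical.

```python
import random
from typing import List, Tuple

Patch = List[float]


def patchify(spec: List[List[float]], ph: int, pw: int) -> List[Patch]:
    """Split a [freq][time] spectrogram into non-overlapping ph x pw patches, row-major, flattened."""
    n_f, n_t = len(spec), len(spec[0])
    patches = []
    for f0 in range(0, n_f - ph + 1, ph):
        for t0 in range(0, n_t - pw + 1, pw):
            patches.append([spec[f][t] for f in range(f0, f0 + ph) for t in range(t0, t0 + pw)])
    return patches


def random_mask(n_patches: int, mask_ratio: float, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Shuffle patch indices and keep the first (1 - mask_ratio) share visible; the rest are masked."""
    n_keep = int(round(n_patches * (1 - mask_ratio)))
    order = list(range(n_patches))
    rng.shuffle(order)
    return sorted(order[:n_keep]), sorted(order[n_keep:])


def masked_mse(pred: List[Patch], target: List[Patch], masked_idx: List[int]) -> float:
    """Mean squared reconstruction error over masked patches only (visible patches are ignored)."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


# Toy usage: a 4 x 8 "spectrogram" yields 8 patches; with ratio 0.8, 2 stay visible.
rng = random.Random(0)
spec = [[float(f * 8 + t) for t in range(8)] for f in range(4)]
patches = patchify(spec, 2, 2)
visible, masked = random_mask(len(patches), 0.8, rng)
# A placeholder "prediction" that outputs zeros for every patch.
pred = [[0.0] * len(p) for p in patches]
print(len(visible), len(masked), masked_mse(pred, patches, masked))
```

In a real masked autoencoder only the visible patches are fed to the encoder, which is what makes a high masking ratio computationally attractive; the loss on masked patches is what drives the model to learn spectro-temporal structure from unlabeled recordings.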

Details

Language :
English
ISSN :
15749541
Volume :
84
Database :
Supplemental Index
Journal :
Ecological Informatics
Publication Type :
Academic Journal
Accession number :
181648138
Full Text :
https://doi.org/10.1016/j.ecoinf.2024.102927