201. A novel visual representation method for multi-dimensional sound scene analysis in source localization problem.
- Author
- Jung, In-Jee and Cho, Wan-Ho
- Subjects
- *LOCALIZATION (Mathematics), *ACOUSTIC localization, *COLOR space, *ACOUSTIC field, *DATA augmentation, *ANECHOIC chambers, *MACHINE learning
- Abstract
Real-time source localization test for multiple moving quadcopter drones. Spatial scene analysis using an open-access audio dataset for machine learning.
• A novel visual representation method for multi-dimensional sound scene analysis.
• Representation of the estimated localization result as RGB color channels.
• Any localization method can be adopted in the preprocessing stage for encoding.
• Human-interpretable dataset capable of quantitative analysis after decoding.
• Exported image file includes metadata for decoding.
A visual representation method, DoAgram, for multi-dimensional sound scene analysis is suggested. The visual representation of the sound source localization result gives the end user intuitive information about the estimated result. Also, image-based deep learning is now widely used in the acoustics field, so such a visualization can also serve for data augmentation. To analyze the spatial sound scene of a moving source, the method displays the estimated azimuth and elevation angles of the source, together with the corresponding time stamps and frequencies, as RGB color channels and metadata by mapping the spatial coordinates to a color space. Although the suggested representation is human-interpretable, decoding is needed for quantitative analysis; therefore, a time- and frequency-scanning method and a histogram-based estimator of the source DoA are proposed. An experiment is conducted in an anechoic chamber to localize two quadcopter drones with mean angular velocities of 8°/s ± 9°/s (95 % CI) and 25°/s ± 31°/s (95 % CI), respectively, and the spatial sound scene analysis is implemented using the proposed methods. The test result shows that the time trajectories of the two sources are well separated. An additional test is conducted using an open-access audio dataset for machine learning. The cumulative source mapping method is adopted for the spatial sound scene analysis, and the decoded result shows that DoAgram is feasible for machine learning applications. [ABSTRACT FROM AUTHOR]
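The core idea described in the abstract, encoding the estimated azimuth and elevation as color channels over a time axis, can be sketched as follows. This is a minimal illustration of the concept, not the authors' exact DoAgram specification: the channel assignments, normalization, and the use of the blue channel here are assumptions made for the example.

```python
import numpy as np

def encode_doa_image(azimuth_deg, elevation_deg, n_freq=64):
    """Illustrative DoAgram-style encoding (assumed layout, not the paper's spec).

    azimuth_deg:   (n_time,) azimuth estimates in degrees, range [-180, 180)
    elevation_deg: (n_time,) elevation estimates in degrees, range [-90, 90]

    Returns an RGB image of shape (n_freq, n_time, 3) with values in [0, 1]:
    the red channel carries normalized azimuth, the green channel carries
    normalized elevation, and the blue channel is left at zero (it could
    carry, e.g., source power or a confidence measure).
    """
    az = (np.asarray(azimuth_deg, dtype=float) + 180.0) / 360.0   # -> [0, 1)
    el = (np.asarray(elevation_deg, dtype=float) + 90.0) / 180.0  # -> [0, 1]
    n_time = az.shape[0]
    img = np.zeros((n_freq, n_time, 3))
    img[..., 0] = az[np.newaxis, :]  # broadcast azimuth along the frequency axis
    img[..., 1] = el[np.newaxis, :]  # broadcast elevation along the frequency axis
    return img

def decode_doa_image(img):
    """Invert the encoding above: recover azimuth/elevation per time frame."""
    azimuth_deg = img[0, :, 0] * 360.0 - 180.0
    elevation_deg = img[0, :, 1] * 180.0 - 90.0
    return azimuth_deg, elevation_deg

# A source sweeping from -90° to +90° azimuth at 0° elevation:
img = encode_doa_image(np.linspace(-90, 90, 100), np.zeros(100))
```

Because the mapping is invertible, the image remains both human-viewable (the hue shifts as the source moves) and quantitatively decodable, which is the property the abstract emphasizes.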
- Published
- 2024