Spectral automated classification in large databases
- Source :
- UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC)
- Publication Year :
- 2021
- Publisher :
- Universitat Politècnica de Catalunya, 2021.
Abstract
- Due to the vast amount of data collected every day, there is a need for Machine Learning algorithms that can process and link raw data with as little human supervision as possible. One of the most popular is the Random Forest, which can be used to solve a great variety of classification tasks. In Astronomy in particular, millions of objects are observed by satellites and telescopes, for instance by the Gaia space mission, and the received signals are recorded as spectra. Random Forest algorithms have proven to be a versatile and powerful tool for identifying and classifying stellar populations. In the present project, we apply a Random Forest algorithm based on spectroscopic analysis with the aim of efficiently classifying three different populations of stars of particular interest. Our main objective is to study the principal parameters and variables that affect the classification performance of the algorithm, and also to build a Random Forest model that categorizes spectra observed by current and future missions. We aim to obtain the best results according to the characteristics of each population, while maintaining an efficient and versatile model. To achieve that, we rely on both simulated and observed spectra to train and test the algorithm, and on quantitative metrics to measure its performance. Throughout this project, we have laid the foundations of the Random Forest classifier and of the data preparation, analyzing the theoretical classification with simulated data. We have used the Random Forest model to classify a real set of spectroscopic data collected by the Sloan Digital Sky Survey, which revealed a notable agreement between the human-made and the Random Forest classifications, greatly enhanced after the application of different improvements to the algorithm. Finally, we simulated spectra of the expected observed population that will be released by the Gaia space mission, and built a Random Forest model based on them.
After introducing several improvements, we eventually achieved a solid model with satisfactory results. With it, we were able to classify two different sets of stellar spectra with different characteristics, maximizing the number of correctly classified objects while minimizing the number of false positives.
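
The abstract does not say which quantitative metrics were used. As an illustration only, the trade-off it describes (many correctly classified objects, few false positives) is commonly measured per class with recall and precision. The minimal sketch below assumes that choice; the function name `precision_recall` and the labels `pop_a`, `pop_b`, `pop_c` are hypothetical and do not come from the thesis.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Return (precision, recall) for the class `target`.

    Precision: fraction of objects assigned to `target` that truly belong to it
               (high precision means few false positives).
    Recall:    fraction of true `target` objects that were recovered
               (high recall means many correctly classified objects).
    """
    tp = fp = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == target and truth == target:
            tp += 1          # correctly assigned to the target class
        elif guess == target:
            fp += 1          # false positive: assigned to target, belongs elsewhere
        elif truth == target:
            fn += 1          # missed member of the target class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for six spectra (not real data).
truth = ["pop_a", "pop_a", "pop_a", "pop_b", "pop_b", "pop_c"]
guess = ["pop_a", "pop_a", "pop_b", "pop_b", "pop_a", "pop_c"]

precision, recall = precision_recall(truth, guess, "pop_a")
```

Here `truth` would stand for the human-made classification and `guess` for the classifier's output; evaluating each population separately shows whether a gain in recovered objects comes at the cost of more contaminants.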
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC)
- Accession number :
- edsair.dedup.wf.001..b854eb8770c29922c51e85ec6da4b6a8