1. Racing trees to query partial data
- Author
- Vu-Linh Nguyen, Marie-Hélène Masson, Rashad Ghassani, and Sébastien Destercke; Heuristique et Diagnostic des Systèmes Complexes [Compiègne] (Heudiasyc), Université de Technologie de Compiègne (UTC), Centre National de la Recherche Scientifique (CNRS)
- Subjects
- 0209 industrial biotechnology, Process (engineering), Active learning (machine learning), Computer science, Decision tree, Computational intelligence, 02 engineering and technology, Machine learning, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Theoretical Computer Science, 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, Quality (business), Missing data, Line (geometry), 020201 artificial intelligence & image processing, Geometry and Topology, Artificial intelligence, Value (mathematics), Software
- Abstract
- Dealing with partially known or missing data is a common problem in machine learning. This work addresses the problem of querying the true value of data to improve the quality of the learned model when those data are only partially known. The study follows the line of active learning, since we consider that the precise value of some partial data can be queried to reduce the uncertainty in the learning process, yet it can handle any kind of partial data (not only entirely missing values). We propose a querying strategy based on the concept of racing algorithms, in which several models compete. The idea is to identify the query that will help the most to quickly decide the winning model in the competition. After discussing and formalizing the general ideas of our approach, we study the particular case of decision trees with interval-valued features and set-valued labels. The experimental results indicate that, in comparison to other baselines, the proposed approach significantly outperforms simpler strategies in the case
- Published
- 2021
- Full Text
- View/download PDF