The space of models in machine learning: using Markov chains to model transitions
- Publication Year :
- 2021
- Publisher :
- Umeå universitet, Institutionen för datavetenskap, 2021.
Abstract
- Machine and statistical learning is about constructing models from data. Data is usually understood as a set of records, that is, a database. Databases, however, are not static: they change over time. We can understand this as follows: there is a space of possible databases, and a database moves through this space during its lifetime. We may therefore consider the database space and the transitions between databases. NoSQL databases also fit this representation. In addition, when we learn models from databases, we can consider the space of models. Naturally, the space of data and the space of models are related: any transition in the space of data may correspond to a transition in the space of models. We argue that a better understanding of these two spaces, and of the relationships between them, is fundamental for machine and statistical learning. The relationship can be exploited in several contexts, e.g., model selection and data privacy. We consider that it is also fundamental for understanding generalization and overfitting. In this paper, we develop these ideas. We then consider a distance on the space of models based on a distance on the space of data. More specifically, we consider distance distribution functions and probabilistic metric spaces on the space of data and the space of models. Our modeling of changes in databases is based on Markov chains and transition matrices, and this modeling is used in the definition of the distances. We provide examples of our definitions.
CC BY 4.0 © 2021, The Author(s).
Correspondence Address: Torra, V.; School of Informatics, Sweden; email: vtorra@ieee.org
Published: 12 April 2021
Acknowledgements: This study was partially funded by Vetenskapsrådet project “Disclosure risk and transparency in big data privacy” (VR 2016-03346, 2017-2020), by the Spanish project TIN2017-87211-R, and by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.
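The abstract's idea of a database moving through a space of possible databases under a Markov chain can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three states, the values in the transition matrix `P`, and the helper names `step` and `after` are hypothetical and are not taken from the paper, whose own definitions of distances are not reproduced here.

```python
# Hypothetical toy example: three possible states of a small database.
STATES = ["empty", "one_record", "two_records"]

# Row-stochastic transition matrix: P[i][j] is the assumed probability of
# moving from state i to state j in one update. Values are illustrative only.
P = [
    [0.5, 0.5, 0.0],
    [0.2, 0.5, 0.3],
    [0.0, 0.4, 0.6],
]


def step(dist, matrix):
    """Return the distribution over states after one transition."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def after(dist, matrix, steps):
    """Apply `steps` transitions to an initial distribution."""
    for _ in range(steps):
        dist = step(dist, matrix)
    return dist


if __name__ == "__main__":
    start = [1.0, 0.0, 0.0]  # the database starts in the "empty" state
    for name, prob in zip(STATES, after(start, P, 3)):
        print(f"{name}: {prob:.3f}")
```

Each row of `P` sums to 1, so the result of `after` remains a probability distribution over the possible databases at every step.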
- Subjects :
- Theoretical computer science
Distance distribution functions
Computer science
Space of data
Overfitting
Space (mathematics)
NoSQL
Artificial Intelligence
Machine learning
Machine and statistical learning models
Probabilistic metric spaces
Representation (mathematics)
Model Selection
Computer Science::Databases
Nosql database
Hypothesis space
Markov chain
Markov chains
Constructing models
Space of models
Computer Sciences
Model selection
Model transition
Transition matrices
Probabilistic logic
Statistical learning
Metric space
Datavetenskap (datalogi)
Database systems
Data privacy
Distribution functions
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....dda9eed10d885243c75840ffc0da949b