Distributions-free Martingales Test Distributions-shift.

Authors :
Xi, Zepu
Chen, Hongbo
Chen, Xiaoqian
Yao, Wen
Source :
Procedia Computer Science; 2023, Vol. 222, p157-166, 10p
Publication Year :
2023

Abstract

A standard assumption in machine learning theory is that the data are generated from a fixed but unknown probability distribution. Although this assumption rests on the foundations of probability theory, in practice most learning problems involve randomly shuffling the original dataset, for example by randomly splitting it into training and test sets before training, so that the assumption is satisfied; the shuffled training set is then used to train the machine learning model. In real-life applications, however, data pairs are observed batch by batch in their own original order, and that order is typically not randomly shuffled in advance. From a mathematical point of view, we test whether random shuffling has a non-negligible influence on the generalization of learning machines. We reduce the problem of random shuffling to the problem of distribution-shift detection. This paper is devoted to testing the null hypothesis that random shuffling does not affect the generalization of learning machines, and it introduces a distribution-free martingale method for testing against this null hypothesis. We report experimental performance on five real-life benchmarks using Support Vector Machines and a multi-layer perceptron model. The results show that distribution shift in the data is an inescapable reality when machine learning algorithms are built on the data in its original order. [ABSTRACT FROM AUTHOR]
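For orientation, the sketch below illustrates one standard instance of a distribution-free (exchangeability) martingale test of the kind the abstract refers to: smoothed conformal p-values are computed online from a stream of nonconformity scores and combined through a power betting function, and a large martingale value is evidence against the no-shift null hypothesis. The running-mean nonconformity score, the epsilon parameter, and the toy data stream are illustrative assumptions, not the authors' exact construction.

import numpy as np

rng = np.random.default_rng(0)

def conformal_p_values(scores):
    # Smoothed online conformal p-values from nonconformity scores a_1..a_N:
    # p_n = (#{i<=n: a_i > a_n} + tau_n * #{i<=n: a_i = a_n}) / n, tau_n ~ U(0,1).
    p = np.empty(len(scores))
    for n in range(1, len(scores) + 1):
        a_n, past = scores[n - 1], scores[:n]
        tau = rng.uniform()
        p[n - 1] = ((past > a_n).sum() + tau * (past == a_n).sum()) / n
    return p

def power_martingale(p, epsilon=0.92):
    # Power martingale S_n = prod_i eps * p_i^(eps-1); under exchangeability
    # (no distribution shift) it rarely grows large, so large values are
    # evidence against the null hypothesis.
    return np.exp(np.cumsum(np.log(epsilon) + (epsilon - 1.0) * np.log(p)))

# Toy stream with a shift halfway through; the nonconformity score is the
# distance of each observation to the running mean of past observations
# (an illustrative choice, not the score used in the paper).
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(3, 1, 300)])
running_mean = np.concatenate([[0.0], np.cumsum(x)[:-1]]) / np.maximum(np.arange(len(x)), 1)
S = power_martingale(conformal_p_values(np.abs(x - running_mean)))
print("final log-martingale:", np.log(S[-1]))  # large positive => reject exchangeability

By Ville's inequality, a martingale of this kind exceeds a threshold C under the null with probability at most 1/C, so the final (or maximum) martingale value can be read directly as a measure of evidence for distribution shift.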

Details

Language :
English
ISSN :
18770509
Volume :
222
Database :
Supplemental Index
Journal :
Procedia Computer Science
Publication Type :
Academic Journal
Accession number :
171311515
Full Text :
https://doi.org/10.1016/j.procs.2023.08.153