3,065 results for "Generalization error"
Search Results
2. Lipschitzness effect of a loss function on generalization performance of deep neural networks trained by Adam and AdamW optimizers
- Author
-
Mohammad Lashkari and Amin Gheibi
- Subjects
generalization error, adam algorithm, lipschitz constant, Mathematics, QA1-939
- Abstract
The generalization performance of deep neural networks with regard to the optimization algorithm is one of the major concerns in machine learning. This performance can be affected by various factors. In this paper, we theoretically prove that the Lipschitz constant of a loss function is an important factor to diminish the generalization error of the output model obtained by Adam or AdamW. The results can be used as a guideline for choosing the loss function when the optimization algorithm is Adam or AdamW. In addition, to evaluate the theoretical bound in a practical setting, we choose the human age estimation problem in computer vision. For assessing the generalization better, the training and test datasets are drawn from different distributions. Our experimental evaluation shows that the loss function with a lower Lipschitz constant and maximum value improves the generalization of the model trained by Adam or AdamW.
- Published
- 2024
3. Lipschitzness effect of a loss function on generalization performance of deep neural networks trained by Adam and AdamW optimizers.
- Author
-
Lashkari, Mohammad and Gheibi, Amin
- Subjects
GENERALIZATION, MACHINE learning, OPTIMIZATION algorithms, ARTIFICIAL neural networks, COMPUTER vision
- Abstract
The generalization performance of deep neural networks with regard to the optimization algorithm is one of the major concerns in machine learning. This performance can be affected by various factors. In this paper, we theoretically prove that the Lipschitz constant of a loss function is an important factor to diminish the generalization error of the output model obtained by Adam or AdamW. The results can be used as a guideline for choosing the loss function when the optimization algorithm is Adam or AdamW. In addition, to evaluate the theoretical bound in a practical setting, we choose the human age estimation problem in computer vision. For assessing the generalization better, the training and test datasets are drawn from different distributions. Our experimental evaluation shows that the loss function with a lower Lipschitz constant and maximum value improves the generalization of the model trained by Adam or AdamW. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Adaptive Learning of the Latent Space of Wasserstein Generative Adversarial Networks.
- Author
-
Qiu, Yixuan, Gao, Qingyi, and Wang, Xiao
- Abstract
Generative models based on latent variables, such as generative adversarial networks (GANs) and variational auto-encoders (VAEs), have gained lots of interests due to their impressive performance in many fields. However, many data such as natural images usually do not populate the ambient Euclidean space but instead reside in a lower-dimensional manifold. Thus an inappropriate choice of the latent dimension fails to uncover the structure of the data, possibly resulting in mismatch of latent representations and poor generative qualities. Toward addressing these problems, we propose a novel framework called the latent Wasserstein GAN (LWGAN) that fuses the Wasserstein auto-encoder and the Wasserstein GAN so that the intrinsic dimension of the data manifold can be adaptively learned by a modified informative latent distribution. We prove that there exist an encoder network and a generator network in such a way that the intrinsic dimension of the learned encoding distribution is equal to the dimension of the data manifold. We theoretically establish that our estimated intrinsic dimension is a consistent estimate of the true dimension of the data manifold. Meanwhile, we provide an upper bound on the generalization error of LWGAN, implying that we force the synthetic data distribution to be similar to the real data distribution from a population perspective. Comprehensive empirical experiments verify our framework and show that LWGAN is able to identify the correct intrinsic dimension under several scenarios, and simultaneously generate high-quality synthetic data by sampling from the learned latent distribution. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. Prediction and model evaluation for space–time data.
- Author
-
Watson, G. L., Reid, C. E., Jerrett, M., and Telesca, D.
- Subjects
*PREDICTION models, *CALIFORNIA wildfires, *SPACETIME, *AIR pollution, *INTERPOLATION
- Abstract
Evaluation metrics for prediction error, model selection and model averaging on space–time data are understudied and poorly understood. The absence of independent replication makes prediction ambiguous as a concept and renders evaluation procedures developed for independent data inappropriate for most space–time prediction problems. Motivated by air pollution data collected during California wildfires in 2008, this manuscript attempts a formalization of the true prediction error associated with spatial interpolation. We investigate a variety of cross-validation (CV) procedures employing both simulations and case studies to provide insight into the nature of the estimand targeted by alternative data partition strategies. Consistent with recent best practice, we find that location-based cross-validation is appropriate for estimating spatial interpolation error as in our analysis of the California wildfire data. Interestingly, commonly held notions of bias-variance trade-off of CV fold size do not trivially apply to dependent data, and we recommend leave-one-location-out (LOLO) CV as the preferred prediction error metric for spatial interpolation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
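The leave-one-location-out (LOLO) partition recommended in the abstract of result 5 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `leave_one_location_out` and the toy `locations` list are assumptions made here, and the sketch only builds the train/test index sets (it fits no model).

```python
from collections import defaultdict


def leave_one_location_out(locations):
    """Yield (held_out_location, train_idx, test_idx), one fold per distinct location.

    `locations[i]` is the location label of observation i. All observations
    sharing a location are held out together, so no location appears in both
    the training and the test set of a fold.
    """
    by_location = defaultdict(list)
    for index, location in enumerate(locations):
        by_location[location].append(index)

    for location, test_idx in by_location.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(locations)) if i not in held_out]
        yield location, train_idx, test_idx


# Hypothetical toy data: six observations recorded at three monitoring sites.
locations = ["A", "A", "B", "C", "B", "C"]
splits = list(leave_one_location_out(locations))

for location, train_idx, test_idx in splits:
    print(location, train_idx, test_idx)
```

Holding out whole locations, rather than random rows, is what makes the fold error an estimate of interpolation error at an unobserved site.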
6. Separating hard clean samples from noisy samples with samples' learning risk for DNN when learning with noisy labels.
- Author
-
Deng, Lihui, Yang, Bo, Kang, Zhongfeng, Wu, Jiajin, Li, Shaosong, and Xiang, Yanping
- Subjects
ARTIFICIAL neural networks, MEMORIZATION, SOURCE code
- Abstract
Learning with Noisy Labels (LNL) methods aim to improve the accuracy of Deep Neural Networks (DNNs) when the training set contains samples with noisy or incorrect labels, and have become popular in recent years. Existing popular LNL methods frequently regard samples with high learning difficulty (high-loss and low prediction probability) as noisy samples; however, irregular feature patterns from hard clean samples can also cause high learning difficulty, which can lead to the misclassification of hard clean samples as noisy samples. To address this insufficiency, we propose the Samples' Learning Risk-based Learning with Noisy Labels (SLRLNL) method. Specifically, we propose to separate noisy samples from hard clean samples using samples' learning risk, which represents samples' influence on DNN's accuracy. We show that samples' learning risk is comprehensively determined by samples' learning difficulty as well as samples' feature similarity to other samples, and thus, compared to existing LNL methods that solely rely on the learning difficulty, our method can better separate hard clean samples from noisy samples, since the former frequently possess irregular feature patterns. Moreover, to extract more useful information from samples with irregular feature patterns (i.e., hard samples), we further propose the Relabeling-based Label Augmentation (RLA) process to prevent the memorization of hard noisy samples and better learn the hard clean samples, thus enhancing the learning for hard samples. Empirical studies show that samples' learning risk can identify noisy samples more accurately, and the RLA process can enhance the learning for hard samples. To evaluate the effectiveness of our method, we compare it with popular existing LNL methods on CIFAR-10, CIFAR-100, Animal-10N, Clothing1M, and Docred. The experimental results indicate that our method outperforms other existing methods. The source code for SLRLNL can be found at https://github.com/yangbo1973/SLRLNL. 
[ABSTRACT FROM AUTHOR]
- Published
- 2024
7. Fundamentals of Surrogate Modeling and Surrogate-Assisted Optimization
- Author
-
Pietrenko-Dabrowska, Anna and Koziel, Slawomir
- Published
- 2024
8. Separating hard clean samples from noisy samples with samples’ learning risk for DNN when learning with noisy labels
- Author
-
Lihui Deng, Bo Yang, Zhongfeng Kang, Jiajin Wu, Shaosong Li, and Yanping Xiang
- Subjects
Learning with noisy labels, Deep neural networks, Generalization error, Learning risk, Electronic computers. Computer science, QA75.5-76.95, Information technology, T58.5-58.64
- Abstract
Learning with Noisy Labels (LNL) methods aim to improve the accuracy of Deep Neural Networks (DNNs) when the training set contains samples with noisy or incorrect labels, and have become popular in recent years. Existing popular LNL methods frequently regard samples with high learning difficulty (high-loss and low prediction probability) as noisy samples; however, irregular feature patterns from hard clean samples can also cause high learning difficulty, which can lead to the misclassification of hard clean samples as noisy samples. To address this insufficiency, we propose the Samples’ Learning Risk-based Learning with Noisy Labels (SLRLNL) method. Specifically, we propose to separate noisy samples from hard clean samples using samples’ learning risk, which represents samples’ influence on DNN’s accuracy. We show that samples’ learning risk is comprehensively determined by samples’ learning difficulty as well as samples’ feature similarity to other samples, and thus, compared to existing LNL methods that solely rely on the learning difficulty, our method can better separate hard clean samples from noisy samples, since the former frequently possess irregular feature patterns. Moreover, to extract more useful information from samples with irregular feature patterns (i.e., hard samples), we further propose the Relabeling-based Label Augmentation (RLA) process to prevent the memorization of hard noisy samples and better learn the hard clean samples, thus enhancing the learning for hard samples. Empirical studies show that samples’ learning risk can identify noisy samples more accurately, and the RLA process can enhance the learning for hard samples. To evaluate the effectiveness of our method, we compare it with popular existing LNL methods on CIFAR-10, CIFAR-100, Animal-10N, Clothing1M, and Docred. The experimental results indicate that our method outperforms other existing methods. The source code for SLRLNL can be found at https://github.com/yangbo1973/SLRLNL.
- Published
- 2024
9. Bounding the Rademacher complexity of Fourier neural operators.
- Author
-
Kim, Taeyoung and Kang, Myungjoo
- Subjects
NONLINEAR operators, FUNCTION spaces, OPERATOR functions, MACHINE learning, GENERALIZATION
- Abstract
Recently, several types of neural operators have been developed, including deep operator networks, graph neural operators, and Multiwavelet-based operators. Compared with these models, the Fourier neural operator (FNO), a physics-inspired machine learning method, is computationally efficient and can learn nonlinear operators between function spaces independent of a certain finite basis. This study investigated the bounding of the Rademacher complexity of the FNO based on specific group norms. Using capacity based on these norms, we bound the generalization error of the model. In addition, we investigate the correlation between the empirical generalization error and the proposed capacity of FNO. We infer that the type of group norm determines the information about the weights and architecture of the FNO model stored in capacity. The experimental results offer insight into the impact of the number of modes used in the FNO model on the generalization error. The results confirm that our capacity is an effective index for estimating generalization errors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
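For readers unfamiliar with the quantity bounded in result 9, the empirical Rademacher complexity of a function class F on a sample x_1..x_n is the expectation, over independent random signs s_i in {-1, +1}, of sup over f in F of (1/n) * sum_i s_i * f(x_i). The sketch below estimates it by Monte Carlo for a small finite class. It is a generic illustration under assumptions made here (the helper name `empirical_rademacher`, the toy class, and the number of draws are all hypothetical) and has nothing to do with the group-norm bounds derived in the paper.

```python
import random


def empirical_rademacher(function_values, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    `function_values` holds one row per function f in a finite class,
    each row being [f(x_1), ..., f(x_n)] on a fixed sample.
    """
    rng = random.Random(seed)
    n = len(function_values[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw one vector of independent Rademacher signs.
        signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        # Supremum over the (finite) class of the sign-weighted average.
        total += max(
            sum(s * v for s, v in zip(signs, row)) / n
            for row in function_values
        )
    return total / n_draws


# Toy class: the two constant functions f = +1 and f = -1 on n = 50 points.
toy_class = [[1.0] * 50, [-1.0] * 50]
estimate = empirical_rademacher(toy_class)
print(f"estimated empirical Rademacher complexity: {estimate:.3f}")
```

A richer class can correlate with more sign patterns and so yields a larger value, which is why the quantity serves as a capacity measure in generalization bounds.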
10. Exploring the Learning Difficulty of Data: Theory and Measure.
- Author
-
Zhu, Weiyao, Wu, Ou, Su, Fengguang, and Deng, Yingjun
- Subjects
LEARNING strategies, MACHINE learning
- Abstract
"Easy/hard sample" is a popular parlance in machine learning. Learning difficulty of samples refers to how easy/hard a sample is during a learning procedure. An increasing need of measuring learning difficulty demonstrates its importance in machine learning (e.g., difficulty-based weighting learning strategies). Previous literature has proposed a number of learning difficulty measures. However, no comprehensive investigation for learning difficulty is available to date, resulting in that nearly all existing measures are heuristically defined without a rigorous theoretical foundation. This study attempts to conduct a pilot theoretical study for learning difficulty of samples. First, influential factors for learning difficulty are summarized. Under various situations conducted by summarized influential factors, correlations between learning difficulty and two vital criteria of machine learning, namely, generalization error and model complexity, are revealed. Second, a theoretical definition of learning difficulty is proposed on the basis of these two criteria. A practical measure of learning difficulty is proposed under the direction of the theoretical definition by importing the bias-variance trade-off theory. Subsequently, the rationality of theoretical definition and the practical measure are demonstrated, respectively, by analysis of several classical weighting methods and abundant experiments realized under all situations conducted by summarized influential factors. The mentioned weighting methods can be reasonably explained under the proposed theoretical definition and concerned propositions. The comparison in these experiments indicates that the proposed measure significantly outperforms the other measures throughout the experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
11. A Priori Error Estimate of Deep Mixed Residual Method for Elliptic PDEs.
- Author
-
Li, Lingfeng, Tai, Xue-Cheng, Yang, Jiang, and Zhu, Quanhui
- Abstract
In this work, we derive an a priori error estimate of the deep mixed residual method (DMRM) when solving some elliptic partial differential equations (PDEs). DMRM is a new deep-learning based method for solving PDEs and it has been shown to be efficient and accurate in previous studies. Our work is the first theoretical study of DMRM. We prove that the neural network solutions will converge if we increase the training samples and network size without any constraint on the ratio of training samples to the network size. Besides, our results suggest that the DMRM can approximate the Laplacian of the solution by the intermediate auxiliary variable, which is missing in the deep Ritz method. It is verified by the numerical experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
12. Conductivity Imaging from Internal Measurements with Mixed Least-Squares Deep Neural Networks.
- Author
-
Bangti Jin, Xiyao Li, Qimeng Quan, and Zhi Zhou
- Subjects
ARTIFICIAL neural networks, PROBLEM solving, ELECTRICAL conductivity measurement, DEEP learning
- Abstract
In this work, we develop a novel approach using deep neural networks (DNNs) to reconstruct the conductivity distribution in elliptic problems from one measurement of the solution over the whole domain. The approach is based on a mixed reformulation of the governing equation and utilizes the standard least-squares objective, with DNNs as ansatz functions to approximate the conductivity and flux simultaneously. We provide a thorough analysis of the DNN approximations of the conductivity for both continuous and empirical losses, including rigorous error estimates that are explicit in terms of the noise level, various penalty parameters, and neural network architectural parameters (depth, width, and parameter bounds). We also provide multiple numerical experiments in two dimensions and multidimensions to illustrate distinct features of the approach, e.g., excellent stability with respect to data noise and capability of solving high-dimensional problems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
13. Statistical Learning
- Author
-
Gao, Qingyi, Wang, Xiao, Merkle, Dieter, Managing Editor, and Pham, Hoang, editor
- Published
- 2023
14. Understanding Difficulty-Based Sample Weighting with a Universal Difficulty Measure
- Author
-
Zhou, Xiaoling, Wu, Ou, Zhu, Weiyao, Liang, Ziyang, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Amini, Massih-Reza, editor, Canu, Stéphane, editor, Fischer, Asja, editor, Guns, Tias, editor, Kralj Novak, Petra, editor, and Tsoumakas, Grigorios, editor
- Published
- 2023
15. A prospective study of the randomized forest approach to predict the effectiveness of art healing in the treatment of depression
- Author
-
Ma Di and Ma Tianhe
- Subjects
random forest, c4.5 decision tree, generalization error, roc curve, depression treatment, 68t09, Mathematics, QA1-939
- Abstract
Art healing methods have gradually become a new outlet for the treatment of depression in recent years. In this paper, an expressive art healing method is proposed based on the condition of depression, and the treatment effect is predicted using the randomized forest method. The direction of influencing factor selection for depression is proposed from three perspectives: demographics, physical and cognitive health status, and mental health. After determining the influencing factors, the samples of depression were extracted, and the classification tree model was constructed using the C4.5 decision tree algorithm. The random forest mode classifier’s generalization error is identified, and the model’s overfitting problem is prevented by restricting the random forest’s convergence. Finally, the evaluation index of the model is proposed, and the prediction accuracy and effect of the random forest model are investigated through the combination of simulation experiments and practical application. The area under the ROC curve shows that the number of cases correctly categorized by the random forest prediction model before and after art healing is 456 and 468, respectively, and the AUC values are 0.8145 and 0.8265, respectively, which makes the model’s prediction ability better. From the prediction results of the random forest model, the intervention group has a significant difference from the control group in two aspects of paranoia level and horror mood, with p-values of 0.01 and 0.001, respectively, after the prediction of the random forest model has a better effect on the intervention of depression.
- Published
- 2024
16. LapRamp: a noise resistant classification algorithm based on manifold regularization.
- Author
-
Liang, Xijun, Yu, Qi, Zhang, Kaili, Zeng, Pan, and Jian, Ling
- Subjects
CLASSIFICATION algorithms, NOISE, DATA quality, GENERALIZATION
- Abstract
In many applications, data samples contain incorrect labels due to data quality issues and the high cost of labeling. Although current noise-resistant classification algorithms can handle specific types of label noise, identifying the type of noise present in the given data is challenging. To address this issue, we propose a robust classification method called LapRamp, which works with multiple types of label noise. LapRamp utilizes the ramp loss function to minimize the impact of mislabeled samples far from the discriminant surface. Additionally, we incorporate manifold regularization to capture the inherent geometric structure of the data. We analyze the generalization error bound of the model in terms of Rademacher complexity, and the preliminary experimental results indicate that LapRamp has good generalization performance despite the presence of mixed label noise. Furthermore, it demonstrates stable classification accuracy when dealing with noisy labels in various scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2023
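The abstract of result 16 relies on the ramp loss to cap the penalty on samples far on the wrong side of the decision surface. A common textbook form writes the ramp loss as a difference of two hinge functions, R_s(z) = H_1(z) - H_s(z) with H_a(z) = max(0, a - z) and s < 1, where z is the signed margin y * f(x). The sketch below uses that form with s = -1; both the parametrization and the value of s are assumptions made here for illustration, and the paper may define its loss differently.

```python
def hinge(margin, offset=1.0):
    """Shifted hinge H_a(z) = max(0, a - z)."""
    return max(0.0, offset - margin)


def ramp(margin, s=-1.0):
    """Ramp loss as a difference of hinges: R_s(z) = H_1(z) - H_s(z), s < 1.

    It equals the hinge loss for margins in [s, 1] and is capped at 1 - s
    for margins below s, so a grossly mislabeled point cannot dominate.
    """
    return hinge(margin, 1.0) - hinge(margin, s)


# Compare the two losses on a well-classified point, a point inside the
# margin, and two points far on the wrong side of the boundary.
for z in (2.0, 0.5, -5.0, -100.0):
    print(f"margin={z:7.1f}  hinge={hinge(z):6.1f}  ramp={ramp(z):4.1f}")
```

The hinge loss grows without bound as the margin becomes more negative, while the ramp loss stays at 1 - s, which is the mechanism the abstract credits for robustness to mislabeled samples far from the discriminant surface.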
17. Robust supervised learning with coordinate gradient descent.
- Author
-
Merad, Ibrahim and Gaïffas, Stéphane
- Abstract
This paper considers the problem of supervised learning with linear methods when both features and labels can be corrupted, either in the form of heavy tailed data and/or corrupted rows. We introduce a combination of coordinate gradient descent as a learning algorithm together with robust estimators of the partial derivatives. This leads to robust statistical learning methods that have a numerical complexity nearly identical to non-robust ones based on empirical risk minimization. The main idea is simple: while robust learning with gradient descent requires the computational cost of robustly estimating the whole gradient to update all parameters, a parameter can be updated immediately using a robust estimator of a single partial derivative in coordinate gradient descent. We prove upper bounds on the generalization error of the algorithms derived from this idea, that control both the optimization and statistical errors with and without a strong convexity assumption of the risk. Finally, we propose an efficient implementation of this approach in a new Python library called linlearn, and demonstrate through extensive numerical experiments that our approach introduces a new interesting compromise between robustness, statistical performance and numerical efficiency for this problem. [ABSTRACT FROM AUTHOR]
- Published
- 2023
18. The role of mutual information in variational classifiers.
- Author
-
Vera, Matias, Rey Vega, Leonardo, and Piantanida, Pablo
- Subjects
STIMULUS generalization, BOLTZMANN machine, GENERALIZATION, INFORMATION theory
- Abstract
Overfitting data is a well-known phenomenon related with the generation of a model that mimics too closely (or exactly) a particular instance of data, and may therefore fail to predict future observations reliably. In practice, this behaviour is controlled by various—sometimes based on heuristics—regularization techniques, which are motivated by upper bounds to the generalization error. In this work, we study the generalization error of classifiers relying on stochastic encodings which are trained on the cross-entropy loss, which is often used in deep learning for classification problems. We derive bounds to the generalization error showing that there exists a regime where the generalization error is bounded by the mutual information between input features and the corresponding representations in the latent space, which are randomly generated according to the encoding distribution. Our bounds provide an information-theoretic understanding of generalization in the so-called class of variational classifiers, which are regularized by a Kullback–Leibler (KL) divergence term. These results give theoretical grounds for the highly popular KL term in variational inference methods that was already recognized to act effectively as a regularization penalty. We further observe connections with well studied notions such as Variational Autoencoders, Information Dropout, Information Bottleneck and Boltzmann Machines. Finally, we perform numerical experiments on MNIST, CIFAR and other datasets and show that mutual information is indeed highly representative of the behaviour of the generalization error. [ABSTRACT FROM AUTHOR]
- Published
- 2023
19. CNNs Avoid the Curse of Dimensionality by Learning on Patches
- Author
-
Vamshi C. Madala, Shivkumar Chandrasekaran, and Jason Bunk
- Subjects
A priori analysis, convolutional neural networks, curse of dimensionality, generalization error, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Despite the success of convolutional neural networks (CNNs) in numerous computer vision tasks and their extraordinary generalization performances, several attempts to predict the generalization errors of CNNs have only been limited to a posteriori analyses thus far. A priori theories explaining the generalization performances of deep neural networks have mostly ignored the convolutionality aspect and do not specify why CNNs are able to seemingly overcome curse of dimensionality on computer vision tasks like image classification where the image dimensions are in thousands. Our work attempts to explain the generalization performance of CNNs on image classification under the hypothesis that CNNs operate on the domain of image patches. Ours is the first work we are aware of to derive an a priori error bound for the generalization error of CNNs and we present both quantitative and qualitative evidences in the support of our theory. Our patch-based theory also offers explanation for why data augmentation techniques like Cutout, CutMix and random cropping are effective in improving the generalization error of CNNs.
- Published
- 2023
20. Diametrical Risk Minimization: theory and computations.
- Author
-
Norton, Matthew D. and Royset, Johannes O.
- Subjects
GENERALIZATION, NEIGHBORHOODS
- Abstract
The theoretical and empirical performance of Empirical Risk Minimization (ERM) often suffers when loss functions are poorly behaved with large Lipschitz moduli and spurious sharp minimizers. We propose and analyze a counterpart to ERM called Diametrical Risk Minimization (DRM), which accounts for worst-case empirical risks within neighborhoods in parameter space. DRM has generalization bounds that are independent of Lipschitz moduli for convex as well as nonconvex problems and it can be implemented using a practical algorithm based on stochastic gradient descent. Numerical results illustrate the ability of DRM to find quality solutions with low generalization error in sharp empirical risk landscapes from benchmark neural network classification problems with corrupted labels. [ABSTRACT FROM AUTHOR]
- Published
- 2023
21. Theoretical bounds of generalization error for generalized extreme learning machine and random vector functional link network.
- Author
-
Kim, Meejoung
- Subjects
*MACHINE learning, *GENERALIZATION, *LEAST squares, *LINEAR network coding, *NETWORK performance, *BIG data
- Abstract
Ensuring the prediction accuracy of a learning algorithm on a theoretical basis is crucial and necessary for building the reliability of the learning algorithm. This paper analyzes prediction error obtained through the least square estimation in the generalized extreme learning machine (GELM), which applies the limiting behavior of the Moore–Penrose generalized inverse (M–P GI) to the output matrix of ELM. ELM is the random vector functional link (RVFL) network without direct input to output links. Specifically, we analyze tail probabilities associated with upper and lower bounds to the error expressed by norms. The analysis employs the concepts of the $L^2$ norm, the Frobenius norm, the stable rank, and the M–P GI. The coverage of theoretical analysis extends to the RVFL network. In addition, a criterion for more precise bounds of prediction errors that may give stochastically better network environments is provided. The analysis is applied to simple examples and large-size datasets to illustrate the procedure and verify the analysis and execution speed with big data. Based on this study, we can immediately obtain the upper and lower bounds of prediction errors and their associated tail probabilities through matrices calculations appearing in the GELM and RVFL. This analysis provides criteria for the reliability of the learning performance of a network in real-time and for network structure that enables obtaining better performance reliability. This analysis can be applied in various areas where the ELM and RVFL are adopted. The proposed analytical method will guide the theoretical analysis of errors occurring in DNNs, which employ a gradient descent algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2023
22. Improving the Performance and Stability of TIC and ICE.
- Author
-
Ward, Tyler
- Subjects
*AKAIKE information criterion
- Abstract
Takeuchi's Information Criterion (TIC) was introduced as a generalization of Akaike's Information Criterion (AIC) in 1976. Though TIC avoids many of AIC's strict requirements and assumptions, it is only rarely used. One of the reasons for this is that the trace term introduced in TIC is numerically unstable and computationally expensive to compute. An extension of TIC called ICE was published in 2021, which allows this trace term to be used for model fitting (where it was primarily compared to L2 regularization) instead of just model selection. That paper also examined numerically stable and computationally efficient approximations that could be applied to TIC or ICE, but these approximations were only examined on small synthetic models. This paper applies and extends these approximations to larger models on real datasets for both TIC and ICE. This work shows that practical models may use TIC and ICE in a numerically stable way to achieve superior results at a reasonable computational cost. [ABSTRACT FROM AUTHOR]
- Published
- 2023
23. Optimality of the rescaled pure greedy learning algorithms.
- Author
-
Zhang, Wenhui, Ye, Peixin, and Xing, Shuo
- Subjects
*MACHINE learning, *GREEDY algorithms, *COMPUTATIONAL complexity, *KERNEL operating systems
- Abstract
We propose the Rescaled Pure Greedy Learning Algorithm (RPGLA) for solving the kernel-based regression problem. The computational complexity of the RPGLA is less than the Orthogonal Greedy Learning Algorithm (OGLA) and Relaxed Greedy Learning Algorithm (RGLA). We obtain the convergence rates of the RPGLA for continuous kernels. When the kernel is infinitely smooth, we derive a convergence rate that can be arbitrarily close to the best rate $O(m^{-1})$ under a mild assumption of the regression function. [ABSTRACT FROM AUTHOR]
- Published
- 2023
24. Generalization error guaranteed auto-encoder-based nonlinear model reduction for operator learning.
- Author
-
Liu, Hao, Dahal, Biraj, Lai, Rongjie, and Liao, Wenjing
- Subjects
*NONLINEAR partial differential operators, *NONLINEAR differential equations, *PARTIAL differential equations, *ESTIMATION theory, *ERROR analysis in mathematics
- Abstract
Many physical processes in science and engineering are naturally represented by operators between infinite-dimensional function spaces. The problem of operator learning, in this context, seeks to extract these physical processes from empirical data, which is challenging due to the infinite or high dimensionality of data. An integral component in addressing this challenge is model reduction, which reduces both the data dimensionality and problem size. In this paper, we utilize low-dimensional nonlinear structures in model reduction by investigating Auto-Encoder-based Neural Network (AENet). AENet first learns the latent variables of the input data and then learns the transformation from these latent variables to corresponding output data. Our numerical experiments validate the ability of AENet to accurately learn the solution operator of nonlinear partial differential equations. Furthermore, we establish a mathematical and statistical estimation theory that analyzes the generalization error of AENet. Our theoretical framework shows that the sample complexity of training AENet is intricately tied to the intrinsic dimension of the modeled process, while also demonstrating the robustness of AENet to noise. [ABSTRACT FROM AUTHOR]
- Published
- 2025
25. Deep learning based on randomized quasi-Monte Carlo method for solving linear Kolmogorov partial differential equation.
- Author
-
Xiao, Jichang, Fu, Fengjiang, and Wang, Xiaoqun
- Subjects
- *
DEEP learning , *PARTIAL differential equations , *MACHINE learning , *APPROXIMATION error , *MONTE Carlo method - Abstract
Deep learning algorithms have been widely used to solve linear Kolmogorov partial differential equations (PDEs) in high dimensions, where the loss function is defined as a mathematical expectation. We propose to use the randomized quasi-Monte Carlo (RQMC) method instead of the Monte Carlo (MC) method for computing the loss function. In theory, we decompose the error from empirical risk minimization (ERM) into the generalization error and the approximation error. Notably, the approximation error is independent of the sampling methods. We prove that the convergence order of the mean generalization error for the RQMC method is O(n^{-1+ϵ}) for arbitrarily small ϵ > 0, while for the MC method it is O(n^{-1/2+ϵ}) for arbitrarily small ϵ > 0. Consequently, we find that the overall error for the RQMC method is asymptotically smaller than that for the MC method as n increases. Our numerical experiments show that the algorithm based on the RQMC method consistently achieves smaller relative L^2 error than that based on the MC method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
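The rate gap reported in item 25 (roughly n^{-1} for RQMC against n^{-1/2} for MC) can be illustrated on a toy problem. The following is a minimal sketch, not the authors' algorithm: it estimates a one-dimensional expectation E[f(U)] with f(x) = x^2, and it assumes a randomly shifted base-2 van der Corput sequence as the RQMC point set. The function names, the integrand, and the sample sizes are all illustrative choices.

```python
import math
import random


def van_der_corput(i, base=2):
    """Return the i-th point of the van der Corput sequence in the given base."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x


def mc_estimate(f, n, rng):
    """Plain Monte Carlo: average f over n independent uniform draws."""
    return sum(f(rng.random()) for _ in range(n)) / n


def rqmc_estimate(f, n, rng):
    """Randomized QMC (assumed variant): one uniform random shift, modulo 1,
    applied to the first n van der Corput points."""
    shift = rng.random()
    return sum(f((van_der_corput(i) + shift) % 1.0) for i in range(n)) / n


def rmse(estimator, f, n, truth, reps, rng):
    """Root-mean-square error of an estimator over independent replications."""
    sq_errors = [(estimator(f, n, rng) - truth) ** 2 for _ in range(reps)]
    return math.sqrt(sum(sq_errors) / reps)


rng = random.Random(0)          # fixed seed so the run is reproducible
f = lambda x: x * x             # toy integrand, E[f(U)] = 1/3 for U ~ Uniform(0, 1)
truth = 1.0 / 3.0
n = 1024

mc_rmse = rmse(mc_estimate, f, n, truth, reps=20, rng=rng)
rqmc_rmse = rmse(rqmc_estimate, f, n, truth, reps=20, rng=rng)
print(f"MC RMSE:   {mc_rmse:.2e}")
print(f"RQMC RMSE: {rqmc_rmse:.2e}")
```

The abstract concerns high-dimensional PDE losses, where a scrambled Sobol' or similar construction would be the more usual choice; the one-dimensional shift is used here only because it keeps the example dependency-free.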
26. Towards a robust out-of-the-box neural network model for genomic data
- Author
-
Zhaoyi Zhang, Songyang Cheng, and Claudia Solis-Lemus
- Subjects
Generalization error ,Phenotype prediction ,Convolutional ,Natural language processing ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Background: The accurate prediction of biological features from genomic data is paramount for precision medicine and sustainable agriculture. For decades, neural network models have been widely popular in fields like computer vision, astrophysics and targeted marketing given their prediction accuracy and their robust performance under big data settings. Yet neural network models have not made a successful transition into the medical and biological world due to the ubiquitous characteristics of biological data such as modest sample sizes, sparsity, and extreme heterogeneity. Results: Here, we investigate the robustness, generalization potential and prediction accuracy of widely used convolutional neural network and natural language processing models with a variety of heterogeneous genomic datasets. Mainly, recurrent neural network models outperform convolutional neural network models in terms of prediction accuracy, overfitting and transferability across the datasets under study. Conclusions: While the perspective of a robust out-of-the-box neural network model is out of reach, we identify certain model characteristics that translate well across datasets and could serve as a baseline model for translational researchers.
- Published
- 2022
27. Online machine learning modeling and predictive control of nonlinear systems with scheduled mode transitions.
- Author
-
Hu, Cheng, Cao, Yuan, and Wu, Zhe
- Subjects
PREDICTIVE control systems ,ONLINE education ,RECURRENT neural networks ,SLIDING mode control ,REAL-time computing ,PREDICTION models ,MACHINE learning - Abstract
This work develops a model predictive control (MPC) scheme using online learning of recurrent neural network (RNN) models for nonlinear systems switched between multiple operating regions following a prescribed switching schedule. Specifically, an RNN model is initially developed offline to model process dynamics using the historical operational data collected in a small region around a certain steady‐state. After the system is switched to another operating region under a Lyapunov‐based MPC with suitable constraints to ensure satisfaction of the prescribed switching schedule policy, RNN models are updated using real‐time process data to improve closed‐loop performance. A generalization error bound is derived for the updated RNN models using the notion of regret, and closed‐loop stability results are established for the switched nonlinear system under RNN‐based MPC. Finally, a chemical process example with the operation schedule that requires switching between two steady‐states is used to demonstrate the effectiveness of the proposed RNN‐MPC scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2023
28. Analysis of Kernel Matrices via the von Neumann Entropy and Its Relation to RVM Performances.
- Author
-
Belanche-Muñoz, Lluís A. and Wiejacha, Małgorzata
- Subjects
- *
QUANTUM entropy , *KERNEL functions , *MATRICES (Mathematics) , *GENERATING functions - Abstract
Kernel methods have played a major role in the last two decades in the modeling and visualization of complex problems in data science. The choice of kernel function remains an open research area and the reasons why some kernels perform better than others are not yet understood. Moreover, the high computational costs of kernel-based methods make it extremely inefficient to use standard model selection methods, such as cross-validation, creating a need for careful kernel design and parameter choice. These reasons justify the prior analyses of kernel matrices, i.e., mathematical objects generated by the kernel functions. This paper explores these topics from an entropic standpoint for the case of kernelized relevance vector machines (RVMs), pinpointing desirable properties of kernel matrices that increase the likelihood of obtaining good model performances in terms of generalization power, and relating these properties to the model's fitting ability. We also derive a heuristic for achieving close-to-optimal modeling results while keeping the computational costs low, thus providing a recipe for efficient analysis when processing resources are limited. [ABSTRACT FROM AUTHOR]
- Published
- 2023
29. Generalization bounds for sparse random feature expansions.
- Author
-
Hashemi, Abolfazl, Schaeffer, Hayden, Shi, Robert, Topcu, Ufuk, Tran, Giang, and Ward, Rachel
- Subjects
- *
SCIENCE education , *GENERALIZATION , *FUNCTION spaces , *MACHINE learning , *RANDOM forest algorithms - Abstract
Random feature methods have been successful in various machine learning tasks, are easy to compute, and come with theoretical accuracy bounds. They serve as an alternative approach to standard neural networks since they can represent similar function spaces without a costly training phase. However, for accuracy, random feature methods require more measurements than trainable parameters, limiting their use for data-scarce applications. We introduce the sparse random feature expansion to obtain parsimonious random feature models. We leverage ideas from compressive sensing to generate random feature expansions with theoretical guarantees even in the data-scarce setting. We provide generalization bounds for functions in a certain class depending on the number of samples and the distribution of features. By introducing sparse features, i.e. features with random sparse weights, we provide improved bounds for low order functions. We show that our method outperforms shallow networks in several scientific machine learning tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2023
30. Estimates on the generalization error of physics-informed neural networks for approximating PDEs.
- Author
-
Mishra, Siddhartha and Molinaro, Roberto
- Subjects
- *
GENERALIZATION - Abstract
Physics-informed neural networks (PINNs) have recently been widely used for robust and accurate approximation of partial differential equations (PDEs). We provide upper bounds on the generalization error of PINNs approximating solutions of the forward problem for PDEs. An abstract formalism is introduced and stability properties of the underlying PDE are leveraged to derive an estimate for the generalization error in terms of the training error and number of training samples. This abstract framework is illustrated with several examples of nonlinear PDEs. Numerical experiments, validating the proposed theory, are also presented. [ABSTRACT FROM AUTHOR]
- Published
- 2023
31. Robust Methods for High-Dimensional Linear Learning.
- Author
-
Merad, Ibrahim and Gaïffas, Stéphane
- Subjects
- *
MACHINE learning , *PYTHON programming language , *LOW-rank matrices , *VANILLA , *SAMPLE size (Statistics) - Abstract
We propose statistically robust and computationally efficient linear learning methods in the high-dimensional batch setting, where the number of features d may exceed the sample size n. We employ, in a generic learning setting, two algorithms depending on whether the considered loss function is gradient-Lipschitz or not. Then, we instantiate our framework on several applications including vanilla sparse, group-sparse and low-rank matrix recovery. This leads, for each application, to efficient and robust learning algorithms, that reach near-optimal estimation rates under heavy-tailed distributions and the presence of outliers. For vanilla s-sparsity, we are able to reach the s log(d)/n rate under heavy-tails and η-corruption, at a computational cost comparable to that of non-robust analogs. We provide an efficient implementation of our algorithms in an open-source Python library called linlearn, by means of which we carry out numerical experiments which confirm our theoretical findings together with a comparison to other recent approaches proposed in the literature. [ABSTRACT FROM AUTHOR]
- Published
- 2023
32. Dropout Training is Distributionally Robust Optimal.
- Author
-
Blanchet, José, Yang Kang, Montiel Olea, José Luis, Viet Anh Nguyen, and Xuhui Zhang
- Subjects
- *
ERRORS-in-variables models , *ZERO sum games , *STATISTICIANS - Abstract
This paper shows that dropout training in generalized linear models is the minimax solution of a two-player, zero-sum game where an adversarial nature corrupts a statistician's covariates using a multiplicative nonparametric errors-in-variables model. In this game, nature's least favorable distribution is dropout noise, where nature independently deletes entries of the covariate vector with some fixed probability δ. This result implies that dropout training indeed provides out-of-sample expected loss guarantees for distributions that arise from multiplicative perturbations of in-sample data. The paper makes a concrete recommendation on how to select the tuning parameter δ. The paper also provides a novel, parallelizable, unbiased multi-level Monte Carlo algorithm to speed up the implementation of dropout training. Our algorithm has a much smaller computational cost compared to the naive implementation of dropout, provided the number of data points is much smaller than the dimension of the covariate vector. [ABSTRACT FROM AUTHOR]
- Published
- 2023
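The dropout noise described in item 32, where each covariate entry is deleted independently with probability δ, can be sketched as follows. This is an illustrative sketch only: the function name `dropout_covariates` is hypothetical, and the rescaling of surviving entries by 1/(1 − δ) is an assumption of this example (the common "inverted dropout" convention, which makes the multiplicative noise have mean one), not something stated in the abstract.

```python
import random


def dropout_covariates(x, delta, rng):
    """Delete each entry of x independently with probability delta.

    Surviving entries are rescaled by 1 / (1 - delta) (an assumption of this
    sketch) so that the noisy vector equals x in expectation.
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    keep_scale = 1.0 / (1.0 - delta)
    # rng.random() >= delta happens with probability 1 - delta: the entry is kept.
    return [xi * keep_scale if rng.random() >= delta else 0.0 for xi in x]


rng = random.Random(0)   # fixed seed for reproducibility
x = [2.0, -1.0, 0.5]     # toy covariate vector
delta = 0.25             # deletion probability

draws = [dropout_covariates(x, delta, rng) for _ in range(20000)]
mean = [sum(d[j] for d in draws) / len(draws) for j in range(len(x))]
print("empirical mean of noisy covariates:", mean)
```

Averaging many noisy copies recovers the original vector up to sampling error, which is the sense in which the perturbation is multiplicative and unbiased under the stated rescaling assumption.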
33. On generalization error of neural network models and its application to predictive control of nonlinear processes.
- Author
-
Alhajeri, Mohammed S., Alnajdi, Aisha, Abdullah, Fahim, and Christofides, Panagiotis D.
- Subjects
- *
ARTIFICIAL neural networks , *PREDICTIVE control systems , *MACHINE learning , *RECURRENT neural networks , *STIMULUS generalization , *NONLINEAR dynamical systems - Abstract
In order to approximate nonlinear dynamic systems utilizing time-series data, recurrent neural networks (RNNs) and long short-term memory (LSTM) networks have frequently been used. The training error of neural networks may often be made suitably modest; however, the accuracy can be further improved by incorporating prior knowledge in the construction of machine learning-based models. Specifically, physics-based RNN modeling has yielded more reliable RNN models than traditional RNNs. Yet, a framework for constructing and assessing the generalization ability of such RNN models as well as LSTM models to be utilized in model predictive control (MPC) systems is lacking. In this work, we develop a methodological framework to quantify the generalization error bounds for partially-connected RNNs and LSTM models. The partially-connected RNN model is then utilized to predict the state evolution in an MPC scheme. We illustrate through open-loop and closed-loop simulations of a nonlinear chemical process of two reactors-in-series that the proposed approach provides a flexible framework for leveraging both prior knowledge and data, thereby improving the performance significantly when compared to a fully-connected modeling approach under Lyapunov-based MPC. • ML model generalization error bounds are calculated. • Fully-connected and partially-connected ML models are studied. • ML model is used in model predictive control. • Application to a chemical process example is carried out. [ABSTRACT FROM AUTHOR]
- Published
- 2023
34. Graph convolutional network based on structural error.
- Author
-
吴琳, 许茹玉, 粟兴旺, 黄金玻, and 王晓明
- Subjects
- *
SUPPORT vector machines , *ERROR analysis in mathematics , *GENERALIZATION , *OVERTRAINING , *DEEP learning , *MATHEMATICAL convolutions - Abstract
Selecting cross entropy as the loss function in a graph convolution network may lead to over-training and weak generalization on small-sample datasets. To address these problems, this paper proposes a graph convolution network based on structural error. Using an improved support vector machine (SVM) as the classifier of the graph convolution network reduces the risk of over-fitting. Building on the generalization error theory of SVMs, the proposed method modifies the SVM loss function to maximize the margin between samples of different classes and to limit the spread of samples of the same class, which improves the generalization ability of the model. The method first calculates the average distance from the feature vectors to the center point in the feature space and uses it as an approximate replacement for the radius of the sphere; the new loss function then guides model learning. Experiments on the NTU RGB+D60 and NTU RGB+D120 datasets in the field of skeleton-based behavior recognition show that, compared with the traditional graph convolution network model, the proposed method clearly improves recognition accuracy and has better generalization performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
35. Bridging the Gap Between Few-Shot and Many-Shot Learning via Distribution Calibration.
- Author
-
Yang, Shuo, Wu, Songhua, Liu, Tongliang, and Xu, Min
- Subjects
- *
GAUSSIAN distribution , *CALIBRATION , *APPROXIMATION error , *BRIDGES , *DATA distribution - Abstract
A major gap between few-shot and many-shot learning is the data distribution empirically observed by the model during training. In few-shot learning, the learned model can easily become over-fitted based on the biased distribution formed by only a few training examples, while the ground-truth data distribution is more accurately uncovered in many-shot learning to learn a well-generalized model. In this paper, we propose to calibrate the distribution of these few-sample classes to be more unbiased to alleviate such an over-fitting problem. The distribution calibration is achieved by transferring statistics from the classes with sufficient examples to those few-sample classes. After calibration, an adequate number of examples can be sampled from the calibrated distribution to expand the inputs to the classifier. Specifically, we assume every dimension in the feature representation from the same class follows a Gaussian distribution so that the mean and the variance of the distribution can borrow from that of similar classes whose statistics are better estimated with an adequate number of samples. Extensive experiments on three datasets, miniImageNet, tieredImageNet, and CUB, show that a simple linear classifier trained using the features sampled from our calibrated distribution can outperform the state-of-the-art accuracy by a large margin. Besides the favorable performance, the proposed method also exhibits high flexibility by showing consistent accuracy improvement when it is built on top of any off-the-shelf pretrained feature extractors and classification models without extra learnable parameters. The visualization of these generated features demonstrates that our calibrated distribution is an accurate estimation thus the generalization ability gain is convincing.
We also establish a generalization error bound for the proposed distribution-calibration-based few-shot learning, which consists of the distribution assumption error, the distribution approximation error, and the estimation error. This generalization error bound theoretically justifies the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2022
36. Lower Bounds on the Generalization Error of Nonlinear Learning Models.
- Author
-
Seroussi, Inbar and Zeitouni, Ofer
- Subjects
- *
GENERALIZATION , *RANDOM matrices , *ARTIFICIAL neural networks , *COMPLEXITY (Philosophy) - Abstract
We study in this paper lower bounds for the generalization error of models derived from multi-layer neural networks, in the regime where the size of the layers is commensurate with the number of samples in the training data. We derive explicit generalization lower bounds for general biased estimators, in the cases of two-layered networks. For linear activation function, the bound is asymptotically tight. In the nonlinear case, we provide a comparison of our bounds with an empirical study of the stochastic gradient descent algorithm. In addition, we derive bounds for unbiased estimators, which show that the latter have unacceptable performance for truly nonlinear networks. The analysis uses elements from the theory of large random matrices. [ABSTRACT FROM AUTHOR]
- Published
- 2022
37. Improved Information-Theoretic Generalization Bounds for Distributed, Federated, and Iterative Learning.
- Author
-
Barnes, Leighton Pate, Dytso, Alex, and Poor, Harold Vincent
- Subjects
- *
GENERALIZATION , *STATISTICAL learning , *STATISTICAL errors - Abstract
We consider information-theoretic bounds on the expected generalization error for statistical learning problems in a network setting. In this setting, there are K nodes, each with its own independent dataset, and the models from the K nodes have to be aggregated into a final centralized model. We consider both simple averaging of the models as well as more complicated multi-round algorithms. We give upper bounds on the expected generalization error for a variety of problems, such as those with Bregman divergence or Lipschitz continuous losses, that demonstrate an improved dependence of 1/K on the number of nodes. These "per node" bounds are in terms of the mutual information between the training dataset and the trained weights at each node and are therefore useful in describing the generalization properties inherent to having communication or privacy constraints at each node. [ABSTRACT FROM AUTHOR]
- Published
- 2022
38. Application of convex hull analysis for the evaluation of data heterogeneity between patient populations of different origin and implications of hospital bias in downstream machine-learning-based data processing: A comparison of 4 critical-care patient datasets
- Author
-
Konstantin Sharafutdinov, Jayesh S. Bhat, Sebastian Johannes Fritsch, Kateryna Nikulina, Moein E. Samadi, Richard Polzin, Hannah Mayer, Gernot Marx, Johannes Bickenbach, and Andreas Schuppert
- Subjects
dataset-bias ,data pooling ,ARDS ,convex hull (CH) ,generalization error ,Information technology ,T58.5-58.64 - Abstract
Machine learning (ML) models are developed on a learning dataset covering only a small part of the data of interest. If model predictions are accurate for the learning dataset but fail for unseen data then generalization error is considered high. This problem manifests itself within all major sub-fields of ML but is especially relevant in medical applications. Clinical data structures, patient cohorts, and clinical protocols may be highly biased among hospitals such that sampling of representative learning datasets to learn ML models remains a challenge. As ML models exhibit poor predictive performance over data ranges sparsely or not covered by the learning dataset, in this study, we propose a novel method to assess their generalization capability among different hospitals based on the convex hull (CH) overlap between multivariate datasets. To reduce dimensionality effects, we used a two-step approach. First, CH analysis was applied to find mean CH coverage between each of the two datasets, resulting in an upper bound of the prediction range. Second, 4 types of ML models were trained to classify the origin of a dataset (i.e., from which hospital) and to estimate differences in datasets with respect to underlying distributions. To demonstrate the applicability of our method, we used 4 critical-care patient datasets from different hospitals in Germany and USA. We estimated the similarity of these populations and investigated whether ML models developed on one dataset can be reliably applied to another one. We show that the strongest drop in performance was associated with the poor intersection of convex hulls in the corresponding hospitals' datasets and with a high performance of ML methods for dataset discrimination. Hence, we suggest the application of our pipeline as a first tool to assess the transferability of trained models. 
We emphasize that datasets from different hospitals represent heterogeneous data sources, and the transfer from one database to another should be performed with utmost care to avoid implications during real-world applications of the developed models. Further research is needed to develop methods for the adaptation of ML models to new hospitals. In addition, more work should be aimed at the creation of gold-standard datasets that are large and diverse with data from varied application sites.
- Published
- 2022
39. Generalization Performance Comparison of Machine Learners for the Detection of Computer Worms Using Behavioral Features
- Author
-
Ochieng, Nelson, Mwangi, Waweru, Ateya, Ismail, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Pant, Millie, editor, Kumar Sharma, Tarun, editor, Arya, Rajeev, editor, Sahana, B.C., editor, and Zolfagharinia, Hossein, editor
- Published
- 2020
40. Att-ConvLSTM: PM2.5 Prediction Model and Application
- Author
-
Xu, Zhe, Lv, Yi, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Liu, Yong, editor, Wang, Lipo, editor, Zhao, Liang, editor, and Yu, Zhengtao, editor
- Published
- 2020
41. Basics of Data-Driven Surrogate Modeling
- Author
-
Koziel, Slawomir and Pietrenko-Dabrowska, Anna
- Published
- 2020
42. Generalization error of random feature and kernel methods: Hypercontractivity and kernel matrix concentration.
- Author
-
Mei, Song, Misiakiewicz, Theodor, and Montanari, Andrea
- Subjects
- *
ARTIFICIAL neural networks , *GENERALIZATION , *SUPERVISED learning , *APPROXIMATION error , *VECTOR data - Abstract
Consider the classical supervised learning problem: we are given data (y_i, x_i), i ≤ n, with y_i a response and x_i ∈ X a covariates vector, and try to learn a model \hat{f} : X → R to predict future responses. Random feature methods map the covariates vector x_i to a point ϕ(x_i) in a higher dimensional space R^N, via a random featurization map ϕ. We study the use of random feature methods in conjunction with ridge regression in the feature space R^N. This can be viewed as a finite-dimensional approximation of kernel ridge regression (KRR), or as a stylized model for neural networks in the so called lazy training regime. We define a class of problems satisfying certain spectral conditions on the underlying kernels, and a hypercontractivity assumption on the associated eigenfunctions. These conditions are verified by classical high-dimensional examples. Under these conditions, we prove a sharp characterization of the error of random feature ridge regression. In particular, we address two fundamental questions: (1) What is the generalization error of KRR? (2) How big N should be for the random feature approximation to achieve the same error as KRR? In this setting, we prove that KRR is well approximated by a projection onto the top ℓ eigenfunctions of the kernel, where ℓ depends on the sample size n. We show that the test error of random feature ridge regression is dominated by its approximation error and is larger than the error of KRR as long as N ≤ n^{1−δ} for some δ > 0. We characterize this gap. For N ≥ n^{1+δ}, random features achieve the same error as the corresponding KRR, and further increasing N does not lead to a significant change in test error. [ABSTRACT FROM AUTHOR]
- Published
- 2022
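For orientation, the random feature ridge regression estimator discussed in item 42 can be written out explicitly. The display below is a sketch of the standard ridge formula in the feature space, using the abstract's symbols (data (y_i, x_i), featurization map ϕ, feature dimension N, sample size n). The matrix Φ, the coefficient vector \hat{a}, and the unnormalized penalty λ‖a‖² are notational assumptions of this sketch; the paper may scale the regularization differently.

```latex
% Phi is the n x N matrix whose i-th row is phi(x_i)^T; y = (y_1, ..., y_n)^T.
\[
  \hat{a}(\lambda)
  = \arg\min_{a \in \mathbb{R}^N}
    \Big\{ \sum_{i=1}^{n} \big( y_i - \langle a, \phi(x_i) \rangle \big)^2
           + \lambda \lVert a \rVert_2^2 \Big\}
  = \big( \Phi^\top \Phi + \lambda I_N \big)^{-1} \Phi^\top y ,
  \qquad
  \hat{f}(x) = \langle \hat{a}(\lambda), \phi(x) \rangle .
\]
```

For λ > 0 the matrix Φ^⊤Φ + λI_N is invertible, so the closed form is well defined whether N is smaller or larger than n, which is what allows the two regimes N ≤ n^{1−δ} and N ≥ n^{1+δ} to be compared.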
43. Full error analysis for the training of deep neural networks.
- Author
-
Beck, Christian, Jentzen, Arnulf, and Kuckuck, Benno
- Subjects
- *
MACHINE learning , *DEEP learning , *ARTIFICIAL neural networks , *MATHEMATICAL errors , *ERROR analysis in mathematics , *APPROXIMATION error - Abstract
Deep learning algorithms have been applied very successfully in recent years to a range of problems out of reach for classical solution paradigms. Nevertheless, there is no completely rigorous mathematical error and convergence analysis which explains the success of deep learning algorithms. The error of a deep learning algorithm can in many situations be decomposed into three parts, the approximation error, the generalization error, and the optimization error. In this work we estimate for a certain deep learning algorithm each of these three errors and combine these three error estimates to obtain an overall error analysis for the deep learning algorithm under consideration. In particular, we thereby establish convergence with a suitable convergence speed for the overall error of the deep learning algorithm under consideration. Our convergence speed analysis is far from optimal and the convergence speed that we establish is rather slow, increases exponentially in the dimensions, and, in particular, suffers from the curse of dimensionality. The main contribution of this work is, instead, to provide a full error analysis (i) which covers each of the three different sources of errors usually emerging in deep learning algorithms and (ii) which merges these three sources of errors into one overall error estimate for the considered deep learning algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2022
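The three-part split named in item 43 (approximation, generalization, and optimization error) can be made concrete with a telescoping identity. The display below is a textbook-style sketch and not necessarily the exact form used by the authors. All symbols are assumptions of this sketch: R is the true risk, R_n the empirical risk on n samples, R^* the smallest achievable risk, H the hypothesis class, f_H a risk minimizer in H, \hat{f}_n an empirical risk minimizer in H, and \tilde{f} ∈ H the output actually returned by the optimizer.

```latex
\[
  R(\tilde{f}) - R^{*}
  = \underbrace{R(f_{H}) - R^{*}}_{\text{approximation error}}
  + \underbrace{R(\hat{f}_n) - R(f_{H})}_{\text{generalization (estimation) error}}
  + \underbrace{R(\tilde{f}) - R(\hat{f}_n)}_{\text{optimization error}} ,
\]
\[
  \text{where}\quad
  R(\hat{f}_n) - R(f_{H})
  \;\le\; 2 \sup_{f \in H} \big| R(f) - R_n(f) \big| .
\]
```

The second line holds because \hat{f}_n minimizes R_n over H, so R_n(\hat{f}_n) ≤ R_n(f_H); adding and subtracting the two empirical risks leaves two deviation terms, each bounded by the supremum.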
44. Confidence intervals for the random forest generalization error.
- Author
-
Marques F., Paulo C.
- Subjects
- *
GENERALIZATION , *FOREST productivity , *RANDOM forest algorithms , *CONFIDENCE intervals , *STOCHASTIC processes , *CONSTRUCTION costs , *SAMPLE size (Statistics) - Abstract
• A simple way to construct confidence intervals for the random forest generalization error. • Low computational cost: no model retraining or data splitting. • Quantification of the confidence on the generalization capacity beyond point estimates. We show that the byproducts of the standard training process of a random forest yield not only the well known and almost computationally free out-of-bag point estimate of the model generalization error, but also open a direct path to compute confidence intervals for the generalization error which avoids processes of data splitting and model retraining. Besides the low computational cost involved in their construction, these confidence intervals are shown through simulations to have good coverage and appropriate shrinking rate of their width in terms of the training sample size. [ABSTRACT FROM AUTHOR]
- Published
- 2022
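The out-of-bag (OOB) point estimate mentioned in item 44 can be sketched in a few lines. This is a simplified illustration, not the paper's procedure: it bags one-dimensional threshold stumps instead of growing full random-forest trees, uses synthetic data, and computes only the OOB point estimate (the confidence-interval construction from the paper is not reproduced). The function names and the data-generating rule are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Return the threshold t (taken from xs) that minimises training errors
    of the rule 'predict 1 if x > t else 0'."""
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def oob_error(xs, ys, n_trees, rng):
    """Out-of-bag misclassification rate of a bagged ensemble of stumps.

    Each point is scored only by the stumps whose bootstrap sample did not
    contain it, so no separate test split and no retraining are needed.
    """
    n = len(xs)
    votes = [[0, 0] for _ in range(n)]  # votes[i][c]: OOB votes for class c
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        t = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        for i in range(n):
            if i not in in_bag:
                votes[i][int(xs[i] > t)] += 1
    # Points that were in every bootstrap sample have no OOB vote and are skipped.
    scored = [(v, y) for v, y in zip(votes, ys) if v[0] + v[1] > 0]
    errors = sum((v[1] > v[0]) != bool(y) for v, y in scored)
    return errors / len(scored)


rng = random.Random(0)                       # fixed seed for reproducibility
xs = [rng.random() for _ in range(200)]      # synthetic 1-D feature
ys = [int(x > 0.5) for x in xs]              # noiseless labels, boundary at 0.5
err = oob_error(xs, ys, n_trees=25, rng=rng)
print(f"OOB error estimate: {err:.3f}")
```

Because every point is left out of roughly a third of the bootstrap samples, the estimate comes as a byproduct of training, which is the "almost computationally free" property the abstract refers to.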
45. Data‐driven storage operations: Cross‐commodity backtest and structured policies.
- Author
-
Mandl, Christian, Nadarajah, Selvaprabu, Minner, Stefan, and Gavirneni, Srinagesh
- Subjects
STORAGE facilities ,COMMODITY exchanges ,FEATURE selection ,COMMODITY futures ,STORAGE ,MARKOV processes - Abstract
Storage assets are critical for physical trading of commodities under volatile prices. State‐of‐the‐art methods for managing storage facilities such as the reoptimization heuristic (RH), which are part of commercial software, approximate a Markov Decision Process (MDP) assuming full information regarding the state and the stochastic commodity price process and hence suffer from informational inconsistencies with observed price data and structural inconsistencies with the true optimal policy, which are both components of generalization error. Focusing on spot trades, we find via an extensive backtest that this error can lead to significantly suboptimal RH policies. We develop a forward‐looking data‐driven approach (DDA) to learn policies and reduce generalization error. This approach extends standard (backward‐looking) DDA in two ways: (i) It represents historical and estimated future profits as functions of features in the training objective, which typically includes only past profits; and (ii) it enforces structural properties of the optimal policy. To elaborate, DDA trains parameters of bang‐bang and base‐stock policies, respectively, using linear‐ and mixed‐integer programs, thereby extending known DDAs that parameterize decisions as functions of features without policy structure. We backtest the performance of RH and DDA on six major commodities, employing feature selection across data from Reuters, Bloomberg, and other public data sets. DDA can improve RH on real data, with policy structure needed to realize this improvement. Our research advances the state‐of‐the‐art for storage operations and can be extended beyond spot trading to handle generalization error when also including forward trades. [ABSTRACT FROM AUTHOR]
- Published
- 2022
46. Individually Conditional Individual Mutual Information Bound on Generalization Error.
- Author
-
Zhou, Ruida, Tian, Chao, and Liu, Tie
- Subjects
- *
GENERALIZATION , *ERROR-correcting codes , *HEURISTIC algorithms , *NOISE measurement , *MATHEMATICAL decoupling , *CONDITIONAL expectations - Abstract
We propose an information-theoretic bound on the generalization error based on a combination of the error decomposition technique of Bu et al. and the conditional mutual information (CMI) construction of Steinke and Zakynthinou. In a previous work, Haghifam et al. proposed a different bound combining the two aforementioned techniques, which we refer to as the conditional individual mutual information (CIMI) bound. However, in a simple Gaussian setting, both the CMI and the CIMI bounds are order-wise worse than that by Bu et al. This observation motivated us to propose the bound, which overcomes this issue by reducing the conditioning terms in the conditional mutual information. In the process of establishing this bound, a conditional decoupling lemma is established, which also leads to a meaningful dichotomy and comparison among these information-theoretic bounds. As an application of the proposed bound, we analyze the noisy and iterative stochastic gradient Langevin dynamics and provide an upper bound on its generalization error. [ABSTRACT FROM AUTHOR]
- Published
- 2022
47. Statistical machine‐learning–based predictive control of uncertain nonlinear processes.
- Author
-
Wu, Zhe, Alnajdi, Aisha, Gu, Quanquan, and Christofides, Panagiotis D.
- Subjects
PREDICTIVE control systems ,RECURRENT neural networks ,STABILITY of nonlinear systems ,STATISTICAL learning ,ADAPTIVE control systems ,CHEMICAL reactors ,CLOSED loop systems - Abstract
In this study, we present machine‐learning–based predictive control schemes for nonlinear processes subject to disturbances, and establish closed‐loop system stability properties using statistical machine learning theory. Specifically, we derive a generalization error bound via Rademacher complexity method for the recurrent neural networks (RNN) that are developed to capture the dynamics of the nominal system. Then, the RNN models are incorporated in Lyapunov‐based model predictive controllers, under which we study closed‐loop stability properties for the nonlinear systems subject to two types of disturbances: bounded disturbances and stochastic disturbances with unbounded variation. A chemical reactor example is used to demonstrate the implementation and evaluate the performance of the proposed approach. [ABSTRACT FROM AUTHOR]
- Published
- 2022
48. Towards a robust out-of-the-box neural network model for genomic data.
- Author
-
Zhang, Zhaoyi, Cheng, Songyang, and Solis-Lemus, Claudia
- Subjects
ARTIFICIAL neural networks ,RECURRENT neural networks ,CONVOLUTIONAL neural networks ,COMPUTER vision ,NATURAL language processing ,BIG data - Abstract
Background: The accurate prediction of biological features from genomic data is paramount for precision medicine and sustainable agriculture. For decades, neural network models have been widely popular in fields like computer vision, astrophysics and targeted marketing given their prediction accuracy and their robust performance under big data settings. Yet neural network models have not made a successful transition into the medical and biological world due to the ubiquitous characteristics of biological data such as modest sample sizes, sparsity, and extreme heterogeneity. Results: Here, we investigate the robustness, generalization potential and prediction accuracy of widely used convolutional neural network and natural language processing models with a variety of heterogeneous genomic datasets. Mainly, recurrent neural network models outperform convolutional neural network models in terms of prediction accuracy, overfitting and transferability across the datasets under study. Conclusions: While the perspective of a robust out-of-the-box neural network model is out of reach, we identify certain model characteristics that translate well across datasets and could serve as a baseline model for translational researchers. [ABSTRACT FROM AUTHOR]
- Published
- 2022
49. Multicategory large margin classification with unequal costs.
- Author
-
Zheng, Qingle and Che, Yuezhang
- Subjects
COST ,CLASSIFICATION ,CLASSIFICATION algorithms ,COPULA functions ,ALGORITHMS - Abstract
In this paper, we propose a multicategory large margin classification with unequal costs. In addition to extending the standard multicategory SVM to the case of unequal costs, we also develop an unequal costs classification for ψ-loss and propose an efficient algorithm for computation. Besides commonly used L2 penalty, the adaptive LASSO is also examined to remove irrelevant variables. Theoretically, we derive the Bayes rules under the generalized cost and reduce the infinite sum-to-zero constraint to a finite constraint. Numerically, we demonstrate the good performance of our methodology on two simulated examples and one real-life dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2022
50. Revisiting Analog Over-the-Air Machine Learning: The Blessing and Curse of Interference.
- Author
-
Yang, Howard H., Chen, Zihan, Quek, Tony Q. S., and Poor, H. Vincent
- Abstract
We study a distributed machine learning problem carried out by an edge server and multiple agents in a wireless network. The objective is to minimize a global function that is a sum of the agents’ local loss functions, and the optimization is conducted by analog over-the-air model training. Specifically, each agent modulates its local gradient onto a set of waveforms and transmits it to the edge server simultaneously with the other agents. From the received analog signal, the edge server extracts a noisy aggregated gradient, which is distorted by channel fading and interference, uses it to update the global model, and feeds the updated model back to all the agents for another round of local computing. Since electromagnetic interference generally exhibits heavy-tailed behavior, we use the $\alpha$-stable distribution to model its statistics. In consequence, the global gradient has an infinite variance, which hinders the use of conventional convergence-analysis techniques that rely on the existence of second-order moments. To circumvent this challenge, we take a new route to establish the analysis of the convergence rate, as well as the generalization error, of the algorithm. We also show that the training algorithm can be run in tandem with the momentum scheme to accelerate convergence. Our analyses reveal a two-sided effect of the interference on the overall training procedure. On the negative side, heavy-tailed noise slows down the convergence rate of the model training: the heavier the tail in the distribution of interference, the slower the algorithm converges. On the positive side, heavy-tailed noise has the potential to increase the generalization power of the trained model: the heavier the tail, the better the model generalizes.
This perhaps counterintuitive conclusion implies that the prevailing thinking on interference – that it is only detrimental to the edge learning system – is outdated and we shall seek new techniques that exploit, rather than simply mitigate, the interference for better machine learning in wireless networks. [ABSTRACT FROM AUTHOR]
- Published
- 2022