Multi-resolution subsampling for large-scale linear classification
- Publication Year :
- 2024
Abstract
- Subsampling is a popular way to balance statistical efficiency and computational efficiency in the big data era. Most approaches aim to select informative or representative sample points that capture the overall information in the full data well. The present work takes the view that sampling should be reserved for the region of interest, while summary measures suffice to capture the information in the remaining data under a well-designed data partition. We propose a multi-resolution subsampling strategy that combines global information described by summary measures with local information obtained from selected subsample points. We show that the proposed method leads to a more efficient subsample-based estimator for general large-scale classification problems. Asymptotic properties of the proposed method are established, and connections to existing subsampling procedures are explored. Finally, we illustrate the proposed subsampling strategy with simulated and real-world examples.
Comment: 40 pages
- Subjects :
- Statistics - Methodology
Mathematics - Statistics Theory
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2407.05691
- Document Type :
- Working Paper