ForeXGBoost: passenger car sales prediction based on XGBoost
- Source :
- Distributed and Parallel Databases. 38:713-738
- Publication Year :
- 2020
- Publisher :
- Springer Science and Business Media LLC, 2020.
Abstract
- The rapid development of machine learning has spurred wide application across industries, where prediction models are built to forecast sales and help enterprises and governments plan better. Alibaba Cloud and the Yancheng Municipal Government held a competition in 2018, calling for global efforts to build machine learning models that can accurately forecast vehicle sales based on large-scale datasets. This paper presents the design, implementation and evaluation of ForeXGBoost, our proposed model that won first place in the competition. ForeXGBoost uses carefully designed data filling algorithms for missing values to improve data quality. It uses a sliding window to extract features from historical sales and production data, which improves prediction accuracy. An extensive study evaluates the influence of different attributes on vehicle sales via information gain and data correlation, and on that basis we select the most indicative features from the feature set for prediction. Furthermore, we leverage the XGBoost prediction algorithm to achieve high prediction accuracy with a short running time. Extensive experiments confirm that ForeXGBoost achieves high prediction accuracy with low overhead.
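The two preprocessing ideas named in the abstract, filling missing values and extracting sliding-window features from historical sales, can be illustrated with a minimal sketch. This is not the authors' implementation: the forward-fill rule, the window mean, and the names `fill_missing` and `sliding_window_features` are assumptions chosen for illustration, and the resulting rows would still need to be passed to a gradient-boosting regressor such as XGBoost, which is omitted here.

```python
from statistics import mean
from typing import List, Optional, Tuple


def fill_missing(series: List[Optional[float]]) -> List[float]:
    """Forward-fill gaps (assumed rule); leading gaps take the first observed value."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled = []
    last = observed[0]
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def sliding_window_features(
    series: List[float], window: int
) -> List[Tuple[List[float], float]]:
    """Build (features, target) rows: the previous `window` values plus their mean."""
    rows = []
    for t in range(window, len(series)):
        past = series[t - window:t]          # the `window` months before month t
        features = past + [mean(past)]       # lagged values and a simple aggregate
        rows.append((features, series[t]))   # target is the value at month t
    return rows


# Hypothetical monthly sales with two missing months.
monthly_sales = [120.0, None, 150.0, 160.0, None, 180.0]
rows = sliding_window_features(fill_missing(monthly_sales), window=3)
for features, target in rows:
    print(features, "->", target)
```

Each row pairs a fixed-length window of past values with the value to predict, which is the tabular shape that tree-based regressors expect.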
- Subjects :
- Information Systems and Management
Computer science
Cloud computing
Engineering and technology
Data structure
Missing data
Competition (economics)
Hardware and Architecture
Sliding window protocol
Data quality
Electrical engineering, electronic engineering, information engineering
Leverage (statistics)
Data mining
Software
Predictive modelling
Information Systems
Details
- ISSN :
- 1573-7578 and 0926-8782
- Volume :
- 38
- Database :
- OpenAIRE
- Journal :
- Distributed and Parallel Databases
- Accession number :
- edsair.doi...........a181138d7b19c02c8ca388919c485eca
- Full Text :
- https://doi.org/10.1007/s10619-020-07294-y