291 results on '"GBDT"'
Search Results
2. Threshold and interaction effects of environmental variables affecting the spatial distribution of Pb
- Author
-
Jiang, Yongcheng, Li, Fupeng, Gong, Yufeng, Yang, Xiuyuan, and Zhang, Zhenming
- Published
- 2024
- Full Text
- View/download PDF
3. Research on glass-forming ability based on transformer and tabular data transformation
- Author
-
Lin, Yuancheng, Liang, Yongchao, and Chen, Qian
- Published
- 2025
- Full Text
- View/download PDF
4. Advanced machine learning analysis of radiation hardening in reduced-activation ferritic/martensitic steels
- Author
-
Wang, Pengxin, Tao, Qing, Dong, Hongbiao, and El-Fallah, G.M.A.M.
- Published
- 2025
- Full Text
- View/download PDF
5. Identification of multiple complications as independent risk factors associated with 1-, 3-, and 5-year mortality in hepatitis B-associated cirrhosis patients.
- Author
-
Shen, Duo, Sha, Ling, Yang, Ling, and Gu, Xuefeng
- Subjects
- *FISHER discriminant analysis, *RECEIVER operating characteristic curves, *PROGNOSTIC models, *DECISION trees, *OVERALL survival
- Abstract
Background: Hepatitis B-associated cirrhosis (HBC) is associated with severe complications and adverse clinical outcomes. This study aimed to develop and validate a predictive model for the occurrence of multiple complications (three or more) in patients with HBC and to explore the effects of multiple complications on HBC prognosis. Methods: In this retrospective cohort study, data from 121 HBC patients treated at Nanjing Second Hospital from February 2009 to November 2019 were analysed. The maximum follow-up period was 10.75 years, with a median of 5.75 years. Eight machine learning techniques were employed to construct predictive models, including C5.0, linear discriminant analysis (LDA), least absolute shrinkage and selection operator (LASSO), k-nearest neighbour (KNN), gradient boosting decision tree (GBDT), support vector machine (SVM), generalised linear model (GLM) and naive Bayes (NB), utilising variables such as medical history, demographics, clinical signs, and laboratory test results. Model performance was evaluated via receiver operating characteristic (ROC) curve analysis, residual analysis, calibration curve analysis, and decision curve analysis (DCA). The influence of multiple complications on HBC survival time was assessed via Kaplan‒Meier curve analysis. Furthermore, LASSO and univariable and multivariable Cox regression analyses were conducted to identify independent prognostic factors for overall survival (OS) in patients with HBC, followed by ROC, C-index, calibration curve, and DCA curve analyses of the constructed prognostic nomogram model. This study utilized bootstrap resampling for internal validation and employed the Medical Information Mart for Intensive Care IV (MIMIC-IV) database for external validation. Results: The GBDT model exhibited the highest area under the curve (AUC) and emerged as the optimal model for predicting the occurrence of multiple complications. 
The key predictive factors included posthospitalisation fever (PHF), body mass index (BMI), retinol binding protein (RBP), total bilirubin (TB) levels, and eosinophils (EOS). Kaplan–Meier analysis revealed that patients with multiple complications had significantly worse OS than those with fewer complications. Additionally, multivariable Cox regression analysis, informed by LASSO selection, identified hepatocellular carcinoma (HCC), multiple complications, and lactate dehydrogenase (LDH) levels as independent prognostic factors for OS. The prognostic model demonstrated 1-year, 3-year, and 5-year OS ROC AUCs of 0.802, 0.793, and 0.817, respectively. For the internal validation cohort, the corresponding AUC values were 0.797, 0.832, and 0.835. In contrast, the external validation cohort yielded a 1-year ROC AUC of 0.707. Calibration curves indicated good consistency of the model, and DCA demonstrated the model's clinical utility, showing high net benefits within certain threshold ranges. Compared with the univariable models, the multivariable ROC curves indicated higher AUC values for this prognostic model, and the model also possessed the best C-index. Conclusion: The GBDT prediction model provides a reliable tool for the early identification of high-risk HBC patients prone to developing multiple complications. The concurrent occurrence of multiple complications is an independent prognostic factor for OS in patients with HBC. The constructed prognostic model demonstrated remarkable predictive performance and clinical applicability, indicating its crucial role in enhancing patient outcomes through timely and targeted interventions. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
6. What matters in promoting new town by High-Speed Railway station? Evidence from China.
- Author
-
Liu, Yuting, Xu, Shuxian, Tian, Junfang, Liu, Tian-Liang, and Dong, Tao
- Subjects
- *URBAN growth, *HIGH speed trains, *PUBLIC transit, *DECISION trees, *RAILROAD stations
- Abstract
While theoretically High-Speed Rail (HSR) should stimulate local development, in practice, the emergence of prosperous HSR-driven new towns is not always guaranteed. An in-depth and comprehensive exploration of the determinants of HSR new town development, especially station-district conditions, is lacking. Using a dataset of newly constructed HSR stations in China for the period 2009–2019, we employ an interpretable model combining Gradient Boosting Decision Tree (GBDT) and SHapley Additive exPlanations (SHAP) to explore how macro-level urban development, the meso-level HSR station-district conditions, and the micro-level HSR station environment influence HSR station area development. The results indicate that travel convenience to the HSR station and station location significantly enhance the development of HSR new towns. Public transit lines and metro have positive effects, and the distances to the city center or pre-existing station exhibit negative impacts. We also identify an inverse relationship that urban and station-district economic levels drive HSR new town development. What's more, we identify the nonlinear and threshold effects and analyze the interactive effects of different influencing factors. These findings offer new perspectives on the promotion of the efficient development of HSR new towns. • Nighttime lighting data is used to evaluate HSR new town development. • Influencing factors are quantified from three distinct levels. • The nonlinear effects of influencing factors are explained by GBDT and SHAP. • The number of transit routes interacts strongly with many variables. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. A graphics-based digital twin (GBDT) framework for accurate UAV localization in GPS-denied environments.
- Author
-
Matiki, Thomas, Narazaki, Yasutaka, Chowdhary, Girish, and Spencer, Billie F
- Subjects
- *POSE estimation (Computer vision), *OBJECT recognition algorithms, *INFRASTRUCTURE (Economics), *DIGITAL twins, *DRONE aircraft, *GLOBAL Positioning System
- Abstract
Autonomous navigation of Unmanned Aerial Vehicles (UAVs) is crucial for effective assessment of large-scale civil infrastructure, as manual UAV control is time consuming and prone to mishaps. Autonomous navigation outdoors typically employs GPS signals to enable accurate localization and reduce long-term drifts. However, in many civil engineering applications, GPS signals are either poor or unavailable due to interference and/or multi-path effects. Current approaches for UAV localization in GPS-denied environments, track geometric features such as corners; however, these natural features can be misrepresented due to the presence of occlusions and/or image processing errors, resulting in significant localization errors that can grow with time. AprilTags can reduce localization errors, but installation on large-scale civil infrastructure can be challenging. Therefore, this paper proposes a framework that leverages the wealth of visual and geometric information encoded in a Graphics-Based Digital Twin (GBDT) of a target infrastructure to provide accurate localization of a UAV. The GBDT is comprised of a computer graphics model that faithfully represents a target structure's geometry, structural features, and visual textures. The visual details of the GBDT are exploited to design an object recognition algorithm that detects GBDTtags, which are distinctive objects (or a collection of components) on the target structure in advance; GBDTtags provide functionality similar to AprilTags. When the camera attached to a UAV detects one or more GBDTtags, the vertices of the GBDTtags are mapped from the image plane into the GBDT coordinate system, allowing for the UAV to be localized. The framework, termed herein as GBDTpose, is first validated numerically using Blender, Gazebo, and Mavros software-in-the-loop (SITL). Subsequently, field validation is carried out using the Kavita and Lalit Bahl Smart Bridge at the University of Illinois Urbana-Champaign (UIUC). 
Results show that localization in GPS-denied environments can be achieved with 5-50 cm accuracy without the need for physical markers being placed on the structure. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. A comparative analysis of consumer credit risk models in Peer-to-Peer Lending.
- Author
-
Lua Thi Trinh
- Subjects
- *ARTIFICIAL neural networks, *FISHER discriminant analysis, *MACHINE learning, *CREDIT risk, *LOANS, *COUNTERPARTY risk, *PEER-to-peer lending
- Abstract
Purpose: The purpose of this paper is to compare nine different models to evaluate consumer credit risk, which are the following: Logistic Regression (LR), Naive Bayes (NB), Linear Discriminant Analysis (LDA), k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), Classification and Regression Tree (CART), Artificial Neural Network (ANN), Random Forest (RF) and Gradient Boosting Decision Tree (GBDT) in Peer-to-Peer (P2P) Lending. Design/methodology/approach: The author uses data from P2P Lending Club (LC) to assess the efficiency of a variety of classification models across different economic scenarios and to compare the ranking results of credit risk models in P2P lending through three families of evaluation metrics. Findings: The results from this research indicate that the risk classification models in the 2013-2019 economic period show greater measurement efficiency than for the difficult 2007-2012 period. Besides, the results of ranking models for predicting default risk show that GBDT is the best model for most of the metrics or metric families included in the study. The findings of this study also support the results of Tsai et al. (2014) and Teplý and Polena (2019) that LR, ANN and LDA models classify loan applications quite stably and accurately, while CART, k-NN and NB show the worst performance when predicting borrower default risk on P2P loan data. Originality/value: The main contributions of the research to the empirical literature review include: comparing nine prediction models of consumer loan application risk through statistical and machine learning algorithms evaluated by the performance measures according to three separate families of metrics (threshold, ranking and probabilistic metrics) that are consistent with the existing data characteristics of the LC lending platform through two periods of reviewing the current economic situation and platform development. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Attack detection model based on stacking ensemble learning for Internet of vehicles.
- Author
-
徐会彬, 方龙, and 张莎
- Abstract
Copyright of Telecommunications Science is the property of Beijing Xintong Media Co., Ltd. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
10. Examining the nonlinear and threshold effects of the 5Ds built environment to land values using interpretable machine learning models.
- Author
-
Doan, Quang Cuong, Vu, Khac Hung, Trinh, Thi Kieu Trang, and Bui, Thi Cam Ngoc
- Abstract
Previous studies have extensively explored the critical influence of the built environment on land values, but the non-linear relationship has yet to be fully revealed. This study aims to uncover the non-linear relationship between land values and the five built environment dimensions using machine learning algorithms and Shapley Additive exPlanation (SHAP). The results highlight that the Gradient Boost Decision Tree (GBDT) outperforms eXtreme Gradient Boosting (XGBoost), Ordinary Least Squares (OLS), and Multiscale Geographically Weighted Regression (MGWR) in land value estimation, exhibiting higher R² and lower Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). The results illustrate that density and destination accessibility are the dominant factors, contributing 32.48% and 37.38% to land value variation, respectively. We observed that the top three factors affecting land values are the built-floor area ratio, the number of floors and the number of restaurants. Additionally, the results revealed the non-linear relationship between the built environment and land values, suggesting that maintaining built environment features at optimal thresholds may increase land values. Neglecting interaction effects may lead to bias in determining relationships between land values and the built environment. This study contributes to the literature by providing non-linear and threshold identification evidence in land value determinants, offering valuable insights for urban planners and real estate managers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Attack detection model based on stacking ensemble learning for Internet of vehicles
- Author
-
XU Huibin, FANG Long, and ZHANG Sha
- Subjects
- Internet of vehicles, intrusion detection, ADASYN, GBDT, stacking, Telecommunication, TK5101-6720, Technology
- Abstract
Due to the openness of wireless communication, the Internet of vehicles (IoV) is vulnerable to many cyber-attacks such as denial of service, spoofing and fuzzy attacks. Therefore, a stacking intrusion detection model based on random forest (RF) and gradient boosting decision tree (GBDT), named RG-IDS, was proposed. Firstly, the adaptive synthetic sampling (ADASYN) algorithm was adopted to generate more similar samples through the nearest neighbor sampling strategy in order to balance the training samples of different categories and form a relatively symmetric dataset. Secondly, GBDT was used to evaluate the importance of features and select sample data with important features to build a lightweight classifier. Finally, the k-fold cross-validation stacking method was used to reduce the probability of overfitting. RF, GBDT and LightGBM classifiers were used as base learners. The RG-IDS model was tested on the CICIDS 2017 and NSL-KDD datasets. The experimental results demonstrate that the RG-IDS model can achieve a higher F1-score.
- Published
- 2024
- Full Text
- View/download PDF
12. Machine Learning-Based Water Quality Classification Assessment.
- Author
-
Chen, Wenliang, Xu, Duo, Pan, Bowen, Zhao, Yuan, and Song, Yan
- Subjects
- GROUNDWATER quality, OPTIMIZATION algorithms, WATER quality, ENVIRONMENTAL quality, WATER analysis
- Abstract
Water is a vital resource, and its quality has a direct impact on human health. Groundwater, as one of the primary water sources, requires careful monitoring to ensure its safety. Although manual methods for testing water quality are accurate, they are often time-consuming, costly, and inefficient when dealing with large and complex data sets. In recent years, machine learning has become an effective alternative for water quality assessment. However, current approaches still face challenges, such as the limited performance of individual models, minimal improvements from optimization algorithms, lack of dynamic feature weighting mechanisms, and potential information loss when simplifying model inputs. To address these challenges, this paper proposes a hybrid model, BS-MLP, which combines GBDT (gradient-boosted decision tree) and MLP (multilayer perceptron). The model leverages GBDT's strength in feature selection and MLP's capability to manage nonlinear relationships, enabling it to capture complex interactions between water quality parameters. We employ Bayesian optimization to fine-tune the model's parameters and introduce a feature-weighting attention mechanism to develop the BS-FAMLP model, which dynamically adjusts feature weights, enhancing generalization and classification accuracy. In addition, a comprehensive parameter selection strategy is employed to maintain data integrity. These innovations significantly improve the model's classification performance and efficiency in handling complex water quality environments and imbalanced datasets. This model was evaluated using a publicly available groundwater quality dataset consisting of 188,623 samples, each with 15 water quality parameters and corresponding labels. The BS-FAMLP model shows strong classification performance, with optimized hyperparameters and an adjusted feature-weighting attention mechanism. 
Specifically, it achieved an accuracy of 0.9616, precision of 0.9524, recall of 0.9655, F1 Score of 0.9589, and an AUC score of 0.9834 on the test set. Compared to single models, classification accuracy improved by approximately 10%, and when compared to other hybrid models with additional attention mechanisms, BS-FAMLP achieved an optimal balance between classification performance and computational efficiency. The core objective of this study is to utilize the acquired water quality parameter data for efficient classification and assessment of water samples, with the aim of streamlining traditional laboratory-based water quality analysis processes. By developing a reliable water quality classification model, this research provides robust technical support for water safety management. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. A fair and verifiable federated learning profit-sharing scheme.
- Author
-
Li, Xianxian, Huang, Mei, Gao, Shiqi, and Shi, Zhenkui
- Subjects
- *FEDERATED learning, *MACHINE learning, *DECISION trees, *PROFIT-sharing, *FAIRNESS
- Abstract
In recent years, gradient boosting decision trees (GBDTs) have become a popular machine learning algorithm and there have been some studies on federated GBDT training to preserve clients' privacy. However, existing schemes face some severe issues. For example, the integrity of the training process cannot be guaranteed, and most of the schemes ignore how to evaluate the performance gains from different clients' datasets fairly. Developing a fair and secure contribution evaluation mechanism in federated learning to motivate clients to join federated learning remains a challenge. In this paper, we propose a fair and verifiable secure federated GBDT scheme that utilizes Trusted Execution Environments (TEEs) to ensure the integrity of the GBDT training process and quantify the contribution of different parties fairly. We propose a fair and verifiable contribution calculation mechanism based on TEE and the adaptive truncated Monte Carlo approximation Shapley value method. The mechanism can adapt to the limited resources of the device and avoid dishonest behaviors during the training process. In addition, to the best of our knowledge, this is the first attempt to implement the validation of contributions in a federated GBDT scheme. We implement a prototype of our scheme and evaluate it comprehensively. The results show that, compared with calculating the contribution of each party by the Shapley value method, our scheme can significantly improve the efficiency of contribution calculation in the case of more parties, and provide integrity and fairness guarantees for model and contribution calculations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Predictive AI Models for Early Pest Infestation Alerts Using Climate and Soil Data.
- Author
-
K. V., Rameswara Reddy, Reddy, A. Vishnuvardhan, and Reddy, Mukkamalla Madhusudhan
- Subjects
- SUSTAINABLE agriculture, SUSTAINABILITY, PEST control, DECISION trees, RANDOM forest algorithms
- Abstract
This research aims to develop predictive models that use artificial intelligence (AI) to forecast early pest infestations in agriculture by integrating climate and soil data. Pests significantly threaten global food security, causing up to 40% of crop losses annually, highlighting the need for proactive pest management strategies. The study uses a hybrid approach, combining Gradient Boosting Decision Trees (GBDT) and Long Short-Term Memory (LSTM) networks, to analyze how variables such as temperature, humidity, rainfall, soil pH, soil moisture, and nutrient levels influence pest behavior. The models were trained and tested on diverse datasets, and evaluation metrics like accuracy, precision, recall, F1-score, and ROC-AUC were used to determine their effectiveness. The Random Forest model showed the highest accuracy at 89%, making it the most reliable for early pest detection. The findings demonstrate the potential of AI in enhancing agricultural productivity by enabling early warnings, reducing pesticide use, and supporting more sustainable farming practices. This study contributes to the development of scalable, data-driven solutions that integrate environmental variables, enabling better pest management and supporting global food security efforts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
15. Dynamic Hazard Assessment of Rainfall-Induced Landslides Using Gradient Boosting Decision Tree with Google Earth Engine in Three Gorges Reservoir Area, China.
- Author
-
Yang, Ke, Niu, Ruiqing, Song, Yingxu, Dong, Jiahui, Zhang, Huaidan, and Chen, Jie
- Subjects
- LANDSLIDE hazard analysis, LANDSLIDES, RAINSTORMS, DECISION trees, NATURAL disaster warning systems, REMOTE sensing by radar, GORGES, HAZARD mitigation
- Abstract
Rainfall-induced landslides are a major hazard in the Three Gorges Reservoir area (TGRA) of China, encompassing 19 districts and counties with extensive coverage and significant spatial variation in terrain. This study introduces the Gradient Boosting Decision Tree (GBDT) model, implemented on the Google Earth Engine (GEE) cloud platform, to dynamically assess landslide risks within the TGRA. Utilizing the GBDT model for landslide susceptibility analysis, the results show high accuracy with a prediction precision of 86.2% and a recall rate of 95.7%. Furthermore, leveraging GEE's powerful computational capabilities and real-time updated rainfall data, we dynamically mapped landslide hazards across the TGRA. The integration of the GBDT with GEE enabled near-real-time processing of remote sensing and meteorological radar data from the significant "8–31" 2014 rainstorm event, achieving dynamic and accurate hazard assessments. This study provides a scalable solution applicable globally to similar regions, making a significant contribution to the field of geohazard analysis by improving real-time landslide hazard assessment and mitigation strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Fault diagnosis of belt conveyor idlers based on gradient boosting decision tree.
- Author
-
Soares, João L. L., Costa, Thiago B., Moura, Lis S., Sousa, Walter S., Mesquita, Alexandre L. A., Mesquita, André L. A., de Figueiredo, Jullyane M. S., and Braga, Danilo S.
- Subjects
- *CONVEYOR belts, *BELT conveyors, *DECISION trees, *CLASSIFICATION algorithms, *FAILURE mode & effects analysis, *BOOSTING algorithms, *FAULT diagnosis
- Abstract
Maintenance planning and control should prioritize predictive techniques (e.g., vibration analysis on critical components such as belt conveyors and idlers) for addressing the low reliability of bulk transportation systems. The extensive length of the belt conveyor hampers manual inspections and machine learning based on vibration measurements becomes an effective method for fault diagnosis. Models that classify faults, such as Gradient Boosting Decision Tree (GBDT), offer flexible algorithms that optimize decision tree classification through gradient-based techniques for minimizing predictive function errors. However, in case of non-stationary and nonlinear vibration signals, traditional techniques like Fourier Transform can hinder vibration analysis. Methods such as Wavelet Packet Decomposition (WPD) have emerged as an alternative to improve defect detection by extracting energy from signal frequency bands. This paper proposes a combination of WPD and GBDT for feature extraction and classification, respectively, for diagnosing two different failure modes in laboratory belt conveyor idlers, namely, bearing faults and surface wear. GBDT hyperparameters were well-fitted from 21 boosting stages, corroborating the high flexibility of the classification algorithm applied for less robust datasets. Furthermore, GBDT models achieved diagnosis accuracies of 100% for bearing defects and 97.5% for surface wear, showing the effectiveness of the combination for fault identification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. How Does the Built Environment Affect Mechanical Parking Space Planning: A Case Study in Xi'an City.
- Author
-
Wang, Yuejiao and Li, Weijia
- Subjects
- BUILT environment, PARK design, PUBLIC spaces, PARKING lots, ENVIRONMENTAL protection, URBAN planners
- Abstract
Mechanical parking lots and spaces are known as the "energy saver" of urban space because of their small footprint, high efficiency, and environmental protection. However, the location and number of mechanical parking lots and space planning have become an important part of effectively exerting the function of mechanical parking lots. In order to explore the planning problem of mechanical parking lots, this study used the gradient boosting decision tree–Shapley additive explanations (GBDT-SHAPs) to measure the non-linear impact of the urban built environment on the mechanical parking spaces ratio and extract the optimal threshold of key variables. The results show that land use mix and distance to Bell Tower (CBD) are two key variables affecting mechanical parking space planning, and both have a non-linear relationship with the built environment. The threshold values are 0.83 and 7 km. The results will provide urban and transport planners with strategies for planning mechanical parking lots and spaces. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. GBDT-based multivariate structural stress data analysis for predicting the sinking speed of an open caisson foundation.
- Author
-
Dong, Xuechao, Guo, Mingwei, and Wang, Shuilin
- Subjects
- STRAINS & stresses (Mechanics), CAISSONS, BOOSTING algorithms, DATA analysis, BRIDGE design & construction, SPEED, STRUCTURAL health monitoring
- Abstract
Open caisson foundations are often used in large-span bridge construction because of their advantages, and an open caisson foundation gradually sinks to a predetermined position via earth excavation. Excessive sinking speed may result in various construction risks (e.g. inclination and structural damage of the open caisson). To prevent these risks, it is important to analyse the sinking situation of the foundation and predict the sinking speed during the sinking process. A sinking speed prediction model is proposed based on the gradient boosting decision tree (GBDT) algorithm, and the model can extract the data features of the structural stress monitoring data to predict the sinking speed of open caissons. Taking the supersized open caisson foundation in the Changtai Yangtze River Bridge Project as a case study, the proposed model was validated by using the monitoring data of this foundation. The validation results of this project indicate that the proposed model has high prediction accuracy, short time consumption, a good prediction effect and high practicability. Based on the model's prediction, an earth excavation scheme can be flexibly adjusted to prevent the potential construction risks of open caissons caused by excessive sinking speed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Prospectivity and Uncertainty Analysis of Tungsten Polymetallogenic Mineral Resources in the Nanling Metallogenic Belt, South China: A Comparative Study of AdaBoost, GBDT, and XgBoost Algorithms.
- Author
-
Li, Tongfei, Xia, Qinglin, Ouyang, Yongpeng, Zeng, Runling, Liu, Qiankun, and Li, Taotao
- Subjects
- MINES & mineral resources, SUPERVISED learning, MACHINE learning, BOOSTING algorithms, ORE deposits, PROSPECTING
- Abstract
Supervised machine learning algorithms are utilized to predict undiscovered mineral resources by analyzing the correlation between geological data and mineral deposits. The scarcity of mineralization and the uncertainty arising from the selection of training samples also affect the accuracy and generalization of such algorithms. This study employed the adaptive boosting (AdaBoost), gradient boosting decision tree (GBDT), and extreme gradient boosting (XgBoost) algorithms to map the prospectivity of tungsten polymetallic mineral resources in the Nanling metallogenic belt. Firstly, the under-sampling and synthetic minority oversampling technique (SMOTE) methods were used to generate training datasets. Secondly, 50 groups of training datasets were generated using under-sampling, and another 50 groups of training datasets were generated using the SMOTE method. These datasets were used to separately train different boosting algorithms in order to assess the uncertainty associated with the selection of negative samples and the generation of positive samples. Finally, the risk–return analysis was used to mitigate uncertainty, and an enhanced prediction–area (P–A) plot was proposed to evaluate the performance. The results indicate that AdaBoost is the least affected by the selection of negative samples, followed by XgBoost. The SMOTE method not only enhances the performance of the AdaBoost and XgBoost algorithms but also reduces the uncertainty related to the selection of negative samples and the generation of positive samples. In addition, the enhanced P–A plot can simultaneously account for both prediction accuracy and uncertainty, making it a potential tool for model evaluation. According to the results, eight potential areas with high return and low risk have been identified as priority areas for exploration. This research not only introduces a new method for mineral prospectivity mapping and uncertainty evaluation but also provides guidance for mineral exploration in this region. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Financial Budget Item Identification Model: Accurately Matching the Budget Items of Reimbursement Claims based on KNN Algorithm
- Author
-
Wáng, Junyi, Suo, Peichun, Kou, Weili, Zhang, Yan, Zhu, Meicai, Fournier-Viger, Philippe, Series Editor, Lau, Evan Poh Hock, editor, Baharum, Aslina, editor, Wheeb, Ali Hussein, editor, and Chen, Lei, editor
- Published
- 2024
- Full Text
- View/download PDF
21. Forecast Model of the Price of a Product with a Cold Start
- Author
-
Drin, Svitlana, Shchestyuk, Nataliya, Corazza, Marco, editor, Gannon, Frédéric, editor, Legros, Florence, editor, Pizzi, Claudio, editor, and Touzé, Vincent, editor
- Published
- 2024
- Full Text
- View/download PDF
22. Electronic Nose Using Machine Learning Techniques
- Author
-
Gondaliya, Sanskruti H., Gondaliya, Nirali H., Öchsner, Andreas, Series Editor, da Silva, Lucas F. M., Series Editor, Altenbach, Holm, Series Editor, Joshi, Nirav J., editor, and Navale, Sachin, editor
- Published
- 2024
- Full Text
- View/download PDF
23. SecureBoost: Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree
- Author
-
Fan, Tao, Chen, Weijing, Ma, Guoqiang, Kang, Yan, Fan, Lixin, Yang, Qiang, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Yang, De-Nian, editor, Xie, Xing, editor, Tseng, Vincent S., editor, Pei, Jian, editor, Huang, Jen-Wei, editor, and Lin, Jerry Chun-Wei, editor
- Published
- 2024
- Full Text
- View/download PDF
24. Transformer fault diagnosis method based on SMOTE and NGO-GBDT
- Author
-
Li-zhong Wang, Jian-fei Chi, Ye-qiang Ding, Hai-yan Yao, Qiang Guo, and Hai-qi Yang
- Subjects
Fault diagnosis ,Transformers ,Oversampling ,LightGBM feature selection ,GBDT ,Northern goshawk optimization algorithm ,Medicine ,Science - Abstract
To improve the accuracy of transformer fault diagnosis and to reduce the loss of identification accuracy that unbalanced samples cause through insufficient model training, this paper proposes a transformer fault diagnosis method based on SMOTE and NGO-GBDT. First, the Synthetic Minority Over-sampling Technique (SMOTE) was used to expand the minority samples. Second, the non-coding ratio method was used to construct multi-dimensional feature parameters, and the Light Gradient Boosting Machine (LightGBM) feature optimization strategy was introduced to screen the optimal feature subset. Finally, the Northern Goshawk Optimization (NGO) algorithm was used to optimize the parameters of the Gradient Boosting Decision Tree (GBDT), and transformer fault diagnosis was then carried out. The results show that the proposed method can reduce the misjudgment of minority samples. Compared with other integrated models, the proposed method has high fault identification accuracy, a low misjudgment rate, and stable performance.
- Published
- 2024
- Full Text
- View/download PDF
25. GBDT Method Integrating Feature-Enhancement and Active-Learning Strategies—Sea Ice Thickness Inversion in Beaufort Sea.
- Author
-
Han, Yanling, Huang, Junjie, Ma, Zhenling, Zheng, Bowen, Wang, Jing, and Zhang, Yun
- Subjects
- *
SEA ice , *ACTIVE learning , *STANDARD deviations , *OCEAN temperature - Abstract
Sea ice, as an important component of the Earth's ecosystem, has a profound impact on global climate and human activities due to its thickness. Therefore, the inversion of sea ice thickness has important research significance. Due to environmental and equipment-related limitations, the number of samples available for remote sensing inversion is currently insufficient. At high spatial resolutions, remote sensing data contain limited information and noise interference, which seriously affect the accuracy of sea ice thickness inversion. In response to the above issues, we conducted experiments using ice draft data from the Beaufort Sea and designed an improved GBDT method that integrates feature-enhancement and active-learning strategies (IFEAL-GBDT). In this method, the incident angle and time series are used to perform spatiotemporal correction of the data, reducing both temporal and spatial impacts. Meanwhile, based on the original polarization information, effective multi-attribute features are generated to expand the information content and improve the separability of sea ice with different thicknesses. Taking into account the growth cycle and age of sea ice, attributes were added for month and seawater temperature. In addition, we studied an active learning strategy based on the maximum standard deviation to select more informative and representative samples and improve the model's generalization ability. The improved GBDT model was used for training and prediction, offering advantages in dealing with nonlinear, high-dimensional data, and data noise problems, further expanding the effectiveness of feature-enhancement and active-learning strategies. Compared with other methods, the method proposed in this paper achieves the best inversion accuracy, with an average absolute error of 8 cm and a root mean square error of 13.7 cm for IFEAL-GBDT and a correlation coefficient of 0.912. 
This research proves the effectiveness of our method, which is suitable for the high-precision inversion of sea ice thickness determined using Sentinel-1 data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
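The abstract above mentions an active-learning strategy based on the maximum standard deviation. One common way to read this, assumed here and not confirmed by the source, is to score each unlabeled candidate by the standard deviation of predictions from several models and to label the highest-scoring candidates first. The function name `select_most_uncertain` and the toy values are hypothetical.

```python
from statistics import pstdev


def select_most_uncertain(candidate_predictions, n_select):
    """Return the ids of the n_select candidates whose predictions disagree most.

    candidate_predictions maps a candidate id to a list of predicted values,
    one per model (assumed setup: an ensemble of regressors).
    """
    ranked = sorted(
        candidate_predictions,
        key=lambda cid: pstdev(candidate_predictions[cid]),
        reverse=True,  # largest standard deviation first
    )
    return ranked[:n_select]


# Hypothetical thickness predictions (in metres) from three models.
predictions = {
    "a": [1.0, 1.0, 1.0],  # models agree: little to learn from labeling
    "b": [0.5, 1.5, 2.5],  # strong disagreement
    "c": [1.0, 1.2, 0.8],  # mild disagreement
}
chosen = select_most_uncertain(predictions, n_select=2)
```

Candidates on which the models already agree are skipped, which is the usual motivation for uncertainty-driven sample selection when labeled data are scarce.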
26. Model and design of an efficient controller for microgrid connected HRES system with integrated DC–DC converters: ATLA-GBDT approach.
- Author
-
Vechalapu, Kamaraju and Bhaskara Reddy, Chintapalli V. V. S.
- Subjects
RENEWABLE energy sources ,MICROGRIDS ,ENERGY consumption ,DC-to-DC converters ,BOOSTING algorithms ,DATABASES ,DECISION trees - Abstract
A controller is modelled and designed to optimize the power transfer in microgrid-connected hybrid renewable energy systems using an integrated DC/DC converter. To maximize the converter's output power and minimize the switching losses of the converter, a model is developed by including a simplified high conversion ratio converter, a maximal power point tracker, and an optimal controller with an effective control strategy. The proposed control system is a combination of the Artificial Transgender Longicorn Algorithm (ATLA) and the Gradient Boosting Decision Tree (GBDT) algorithm, named the ATLA-GBDT method. In the suggested technique, the ATLA is used as an assessment method to build up accurate control signals for the system and to improve the control signals database for offline use while considering the power exchange between the source and load. In addition, for training a GBDT system online, the data set received from the sensor is used to develop a control system for faster response. In addition, the goal function is defined by the system data, which is subject to equality and inequality constraints. Various constraints considered in the problem formulation are the output of renewable energy sources, power requirements, and the state of charge of storage components. The proposed control system is simulated using the MATLAB/Simulink platform, and the implementation is compared with the existing techniques. Various performance metrics like accuracy, specificity, recall and precision, RMSE, MAPE, and MBE of the proposed method and existing methods in the literature are presented. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Flight Trainee Performance Evaluation Using Gradient Boosting Decision Tree, Particle Swarm Optimization, and Convolutional Neural Network (GBDT-PSO-CNN) in Simulated Flights.
- Author
-
Shang, Lei, Wang, Haibo, Si, Haiqing, Wang, Yonghu, Pan, Ting, Liu, Haibo, and Li, Yixuan
- Subjects
CONVOLUTIONAL neural networks ,PARTICLE swarm optimization ,DECISION trees ,FLIGHT training - Abstract
Flight simulation training is one of the most important methods in early-stage civil aviation flight training. In this regard, flight simulation competitions are effective tools for evaluating the flight skills of trainees. In this study, a model is developed for evaluating the flight skills of trainees by integrating GBDT (Gradient Boosting Decision Tree), PSO (Particle Swarm Optimization), and CNNs (Convolutional Neural Networks). Flight data from simulations is employed for model training. Initially, performance data and scores are gathered from a simulated flight competition platform. The GBDT algorithm is then applied to filter and identify essential flight parameters from the collected data. Subsequently, the PSO-CNN model is utilized to train on the extracted flight parameters. The proposed GBDT-PSO-CNN model achieves a recognition rate of 93.8% on the test dataset. This assessment system is of significant importance for improving the specific maneuvering skill level of flight trainees. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Transformer fault diagnosis method based on SMOTE and NGO-GBDT.
- Author
-
Wang, Li-zhong, Chi, Jian-fei, Ding, Ye-qiang, Yao, Hai-yan, Guo, Qiang, and Yang, Hai-qi
- Subjects
GOSHAWK ,DIAGNOSIS methods ,FAULT diagnosis ,OPTIMIZATION algorithms ,DECISION trees ,FEATURE selection - Abstract
To improve the accuracy of transformer fault diagnosis and to reduce the loss of identification accuracy that unbalanced samples cause through insufficient model training, this paper proposes a transformer fault diagnosis method based on SMOTE and NGO-GBDT. First, the Synthetic Minority Over-sampling Technique (SMOTE) was used to expand the minority samples. Second, the non-coding ratio method was used to construct multi-dimensional feature parameters, and the Light Gradient Boosting Machine (LightGBM) feature optimization strategy was introduced to screen the optimal feature subset. Finally, the Northern Goshawk Optimization (NGO) algorithm was used to optimize the parameters of the Gradient Boosting Decision Tree (GBDT), and transformer fault diagnosis was then carried out. The results show that the proposed method can reduce the misjudgment of minority samples. Compared with other integrated models, the proposed method has high fault identification accuracy, a low misjudgment rate, and stable performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Fusion of GBDT and neural network for click-through rate estimation.
- Author
-
Zhao, Bin, Cao, Wei, Zhang, Jiqun, Gao, Yilong, Li, Bin, and Chen, Fengmei
- Abstract
To address the problem that current click-through rate prediction methods ignore the varying impacts of different input features on prediction accuracy and exhibit low accuracy when dealing with large-scale data, this paper proposes a click-through rate prediction method (GBIFM) that combines Gradient Boosting Decision Tree (GBDT) and Input-aware Factorization Machine (IFM). The proposed GBIFM method employs GBDT for data processing, which can flexibly handle various types of data without the need for one-hot encoding of discrete features. An input-aware strategy is introduced to refine the weight vector and embedding vector of each feature for different instances, adaptively learning the impact of each input vector on feature representation. Furthermore, a fully connected network is incorporated to capture high-order features in a non-linear manner, enhancing the method’s ability to express and generalize complex structured data. A comprehensive experiment is conducted on the Criteo and Avazu datasets. The results show that, compared to typical methods such as DeepFM, AFM, and IFM, the proposed GBIFM method can increase the AUC value by 10%–12% and decrease the Logloss value by 6%–20%, effectively improving the accuracy of click-through rate prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Comparison of Machine Learning Models in Simulating Glacier Mass Balance: Insights from Maritime and Continental Glaciers in High Mountain Asia.
- Author
-
Ren, Weiwei, Zhu, Zhongzheng, Wang, Yingzheng, Su, Jianbin, Zeng, Ruijie, Zheng, Donghai, and Li, Xin
- Subjects
- *
MACHINE learning , *ARTIFICIAL neural networks , *GLACIERS , *ALPINE glaciers , *DEEP learning , *STANDARD deviations , *GLACIAL melting , *EVIDENCE gaps - Abstract
Accurately simulating glacier mass balance (GMB) data is crucial for assessing the impacts of climate change on glacier dynamics. Since physical models often face challenges in comprehensively accounting for factors influencing glacial melt and uncertainties in inputs, machine learning (ML) offers a viable alternative due to its robust flexibility and nonlinear fitting capability. However, the effectiveness of ML in modeling GMB data across diverse glacier types within High Mountain Asia has not yet been thoroughly explored. This study addresses this research gap by evaluating ML models used for the simulation of annual glacier-wide GMB data, with a specific focus on comparing maritime glaciers in the Niyang River basin and continental glaciers in the Manas River basin. For this purpose, meteorological predictive factors derived from monthly ERA5-Land datasets, and topographical predictive factors obtained from the Randolph Glacier Inventory, along with target GMB data rooted in geodetic mass balance observations, were employed to drive four selective ML models: the random forest model, the gradient boosting decision tree (GBDT) model, the deep neural network model, and the ordinary least-square linear regression model. The results highlighted that ML models generally exhibit superior performance in the simulation of GMB data for continental glaciers compared to maritime ones. Moreover, among the four ML models, the GBDT model was found to consistently exhibit superior performance with coefficient of determination (R²) values of 0.72 and 0.67 and root mean squared error (RMSE) values of 0.21 m w.e. and 0.30 m w.e. for glaciers within Manas and Niyang river basins, respectively. Furthermore, this study reveals that topographical and climatic factors differentially influence GMB simulations in maritime and continental glaciers, providing key insights into glacier dynamics in response to climate change. 
In summary, ML, particularly the GBDT model, demonstrates significant potential in GMB simulation. Moreover, the application of ML can enhance the accuracy of GMB modeling, providing a promising approach to assess the impacts of climate change on glacier dynamics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
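For readers comparing the R² and RMSE figures quoted in the abstract above, the following is a minimal sketch of how the two metrics are conventionally computed. The function names and sample values are hypothetical and do not come from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and simulated values (any consistent unit).
observed = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]
mean_only = [2.5, 2.5, 2.5, 2.5]  # always predicts the observed mean

print(r2_score(observed, perfect), rmse(observed, perfect))
print(r2_score(observed, mean_only), rmse(observed, mean_only))
```

A model that always predicts the observed mean scores R² = 0, so values such as 0.72 or 0.67 indicate how much of the variance beyond that baseline a model explains.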
31. A black-box model for predicting difficulty of word puzzle games: a case study of Wordle.
- Author
-
Shi, Ling, Chen, Yingke, Lin, Jiaxuan, Chen, Xiaoyu, and Dai, Guangming
- Subjects
WORD games ,DEEP learning ,VIDEO games ,RULES of games ,PARTS of speech ,WORD frequency - Abstract
The popular word-filling game Wordle has gained widespread attention since its release in 2022. Much attention has been paid to finding the optimal strategy. This article instead proposes a black-box prediction model that can accurately predict the difficulty level of words in the game in order to find the deep rules in the game data. In this work, we scientifically established a black-box model for game difficulty prediction. We achieve high accuracy on new datasets and show strong stability in similar tasks. The black-box model is divided into the game input content feature extraction model and the game output content rule extraction model. This research scientifically and effectively extracts word attributes, including word frequency, letter frequency, part of speech, number of letter repetitions, and word meaning score, from the input content. Then it reduces the seven proportions of players across different numbers of tries in the output content to two indices using the Critic method. Finally, it establishes a gradient boosting decision tree-based multiple regression model, with which the final prediction accuracy of the difficulty level for new words reaches 95%. It is believed that the black-box prediction model can provide valuable insights for game designers and developers. The research also provides an innovative method to predict and understand user behavior in online games, contributing to the broader field of data science. The integration of data-driven methodologies in the gaming industry opens new possibilities for understanding player interactions and further enhancing game development strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Intelligent Prediction of the Sport Game Outcome Using a Hybrid Machine Learning Model
- Author
-
Kaiwen Cui, Xuanyi Li, and Shuo Yang
- Subjects
basketball game ,GBDT ,hybrid model ,NCAA ,SVM ,Tabnet ,Engineering (General). Civil engineering (General) ,TA1-2040 - Abstract
The National Collegiate Athletic Association (NCAA) serves as the platform for showcasing the skills of talented basketball players from various colleges. Using the historical dataset provided by the NCAA, this study proposes a hybrid model that combines the gradient boosting decision tree (GBDT), TabNet, and support vector machine (SVM) to predict 2023 NCAA basketball game outcomes. For each possible matchup between two top college teams, the model predicts the win probability and the winning team. The fusion model combines the strengths of tree-based models, linear models such as SVM, and TabNet to enhance prediction performance, robustness, and interpretability. The data exploration and preparation part presents important features, such as the win ratio of different teams, and the feature engineering used for model training. The experiment part shows the data distribution and the performance of each model. The hybrid model beats the individual models with a better Brier score of 0.176, which shows the superiority of the hybrid model.
- Published
- 2024
- Full Text
- View/download PDF
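The Brier score cited in the abstract above is the mean squared difference between predicted win probabilities and the actual 0/1 outcomes, so lower is better. A minimal sketch follows; the function name and the sample probabilities are hypothetical and not taken from the paper.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    total = sum((p - y) ** 2 for p, y in zip(probabilities, outcomes))
    return total / len(probabilities)


# Hypothetical predicted win probabilities for three games and their results.
predicted = [0.9, 0.3, 0.6]
actual = [1, 0, 1]

score = brier_score(predicted, actual)             # (0.01 + 0.09 + 0.16) / 3
baseline = brier_score([0.5, 0.5, 0.5], actual)    # always predicting 50%
```

Always predicting 50% yields a score of 0.25, which gives a reference point for judging a reported value such as 0.176.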
33. GBDT_KgluSite: An improved computational prediction model for lysine glutarylation sites based on feature fusion and GBDT classifier
- Author
-
Xin Liu, Bao Zhu, Xia-Wei Dai, Zhi-Ao Xu, Rui Li, Yuting Qian, Ya-Ping Lu, Wenqing Zhang, Yong Liu, and Junnian Zheng
- Subjects
Lysine glutarylation ,Post-translational modification ,GBDT ,Elastic Net ,NearMiss-3 ,Biotechnology ,TP248.13-248.65 ,Genetics ,QH426-470 - Abstract
Background: Lysine glutarylation (Kglu) is one of the most important post-translational modifications (PTMs), which plays significant roles in various cellular functions, including metabolism, mitochondrial processes, and translation. Therefore, accurate identification of the Kglu site is important for elucidating protein molecular function. Due to the time-consuming and expensive limitations of traditional biological experiments, computational-based Kglu site prediction research is gaining more and more attention. Results: In this paper, we proposed GBDT_KgluSite, a novel Kglu site prediction model based on GBDT and appropriate feature combinations, which achieved satisfactory performance. Specifically, seven features including sequence-based features, physicochemical property-based features, structural-based features, and evolutionary-derived features were used to characterize proteins. NearMiss-3 and Elastic Net were applied to address data imbalance and feature redundancy issues, respectively. The experimental results show that GBDT_KgluSite has good robustness and generalization ability, with accuracy and AUC values of 93.73% and 98.14% on five-fold cross-validation as well as 90.11% and 96.75% on the independent test dataset, respectively. Conclusion: GBDT_KgluSite is an effective computational method for identifying Kglu sites in protein sequences. It has good stability and generalization ability and could be useful for the identification of new Kglu sites in the future. The relevant code and dataset are available at https://github.com/flyinsky6/GBDT_KgluSite.
- Published
- 2023
- Full Text
- View/download PDF
34. Optimization of shale gas fracturing parameters based on artificial intelligence algorithm
- Author
-
Shihao Qian, Zhenzhen Dong, Qianqian Shi, Wei Guo, Xiaowei Zhang, Zhaoxia Liu, Lingjun Wang, Lei Wu, Tianyang Zhang, and Weirong Li
- Subjects
Shale gas ,Parameter optimization ,Prediction ,GBDT ,PSO ,Geography (General) ,G1-922 ,Information technology ,T58.5-58.64 - Abstract
Resource-rich shale gas plays a pivotal role in new energy types. The key to scientifically and efficiently developing shale gas fields is to clarify the main factors that affect the production of shale gas wells. In this paper, according to the shale gas reservoir characteristics of the Fuling marine Longmaxi Formation, a single-well geological model was established using the reservoir numerical simulation software CMG. Then, 10,000 different reservoir models were randomly generated for different formation physical parameters, completion parameters, and fracturing parameters using the Monte Carlo method, and these 10,000 models were simulated numerically. The machine learning model uses a dataset of 10,000 different geological, completion, and fracturing parameters as input and 10,000 production curves as output. Multiple machine learning regression methods were used to train and test the dataset, and the optimal method (the GBDT algorithm) was selected; the R² of the GBDT prediction model on the test set is 0.96. A fracturing parameter optimization workflow was constructed by combining a production prediction model with a particle swarm optimizer (PSO). The process can quickly optimize the fracturing parameters and predict the production for each time by targeting the cumulative gas production under different geological conditions. The optimized parameters are Fracture Spacing, Fracture Width, Intrinsic Permeability, Fracture Half-length, Langmuir Pressure, and Langmuir Volume. The initial predicted cumulative gas production was 4.59 × 10⁸ m³, which was optimized to 4.90 × 10⁸ m³. The proposed PSO-GBDT proxy model can instantly predict the production of shale gas wells with considerable accuracy, reliability, and efficiency, which is a vital tool for optimizing fracture design. This investigation provides a solid foundation for predicting the production of unconventional gas reservoirs and for parameter optimization.
- Published
- 2023
- Full Text
- View/download PDF
35. Measurement and Calibration of EMF: A Study Using Phone and GBDT for Mobile Communication Signals.
- Author
-
Zeng, Sheng, Chen, Weiwei, Ji, Yuhang, Yan, Liping, and Zhao, Xiang
- Subjects
ELECTRIC field strength ,SIGNALS & signaling ,MOBILE communication systems ,SMARTPHONES ,TELECOMMUNICATION ,ELECTRIC fields ,CELL phones - Abstract
Electromagnetic exposure caused by mobile communication signals has always been a cause of concern. Due to the cost and inconvenience of professional measurement equipment, researchers have turned to smartphone apps to study and assess the electric field strength caused by mobile communication signals. However, existing cell phone‐based measurements have two weaknesses. First, no system architecture suitable for large‐scale crowdsourced testing has been proposed. Second, since smartphone sensors cannot measure electric field strength directly, existing methods for converting the received signal power of the phone to electric field strength have errors of more than 5 dB. This paper proposes a measurement and calibration method for the electric field strength of mobile communication signals based on a smartphone app and gradient boosting decision tree (GBDT). This method consists of a downlink signal acquisition system based on an app and a calibration model based on GBDT to convert received signal power into electric field strength. The experimental results show that the proposed model achieves an R² score of 0.93 and an MAE of 0.97 dB. Compared with the existing methods, our method improves the calibration accuracy by 4 dB, enabling large‐scale, low‐cost, and high‐precision direct measurement of the electric field strength of mobile communication signals. Key Points: (1) Using smartphones and machine learning to measure the electric field strength of mobile communication signals. (2) A system converting the power of mobile communication signals to electric field strength with high accuracy. (3) Suitable for large‐scale and low‐cost measurement of the electric field strength of mobile communication signals. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Assessing landslide susceptibility using improved machine learning methods and considering spatial heterogeneity for the Three Gorges Reservoir Area, China.
- Author
-
Dong, Jiahui, Niu, Ruiqing, Chen, Tao, and Dong, LiangYun
- Subjects
LANDSLIDE hazard analysis ,LANDSLIDES ,MACHINE learning ,GORGES ,RECEIVER operating characteristic curves ,HETEROGENEITY ,SUPPORT vector machines - Abstract
When conducting susceptibility evaluation for study areas of special significance, especially those with spatial heterogeneity of landslide development, it is easy to ignore the potential errors caused by spatial asymmetry of geographic factors and differences in landslide development when evaluating the whole area. This study proposed an evaluation method that breaks down the Three Gorges Reservoir Area (TGRA) into smaller regions and assesses the landslide susceptibility of each sub-region in order to assess and resolve the effect of spatial heterogeneity within the entire reservoir area of the TGRA. This method uses a combination of certainty factors (CF) and machine learning models to identify the key factors behind a high susceptibility index. Three machine learning models, namely the support vector machine (SVM), logistic regression (LR), and the gradient boosting decision tree (GBDT), were improved in this study. These enhanced models incorporate CF, resulting in the creation of CF-LR, CF-SVM, and CF-GBDT models. The results of the zonal evaluation are superior to those of the direct overall assessment, according to the examination of receiver operating characteristic (ROC) curves, and CF-GBDT outperforms the other five models in terms of determining the susceptibility of the entire TGRA. The occurrence of regional heterogeneity in the TGRA is confirmed by the CF-GBDT model, which also takes into account the importance of landslide influence factors between Region I and Region II. By analyzing the impact of zonal evaluation on each district and county in the TGRA, the significance of zoning in the study of landslide susceptibility within large watersheds is emphasized, providing a new perspective for regional landslide susceptibility assessment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. The Real-Time Dynamic Prediction of Optimal Taxi Cruising Area Based on Deep Learning.
- Author
-
Wang, Sai, Wang, Jianjun, Ma, Chicheng, Li, Dongyi, and Cai, Lu
- Abstract
A real-time, effective, and dynamic taxi cruising recommendation strategy is essential to solving the difficulty taxis face in finding passengers while cruising and to easing urban road traffic congestion. This study focuses on two aspects, the real-time accessible range and the pick-up ratio (PR), and proposes a real-time dynamic identification method for the optimal taxi cruising area. Firstly, based on the cumulative opportunity method, a univariate temporal convolutional network (UTCN) accessible range dynamic prediction model is proposed to predict the real-time accessible range of taxis. Secondly, based on the gradient boosting decision tree (GBDT) model, the influencing factors with a high correlation with the PR are selected from the four dimensions of traffic characteristics, environmental meteorology, and time and space variables. Then, a multivariate temporal convolutional network (MTCN) global grid PR prediction model is constructed, and the optimal taxi cruising area is identified based on the maximum PR. The results show that the taxi accessible range and PR of the same grid change over time, and, based on the model comparison, the accessible range and PR predictions of the UTCN and MTCN algorithms in different periods are the best for identifying the optimal cruising area of taxis in different periods. The main contribution of this study is that the proposed optimal cruising area prediction model has timeliness, accessibility, and dynamics. It can not only improve the probability of taxis receiving passengers and avoid taxis cruising aimlessly, but also solve the shortage of taxis in hotspots, thus shortening the waiting time of passengers. This provides a scientific basis for improving taxi cruising efficiency and the government's formulation of taxi operation management policies, which can effectively promote the sustainable development of urban traffic. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Predicting Risk of Bullying Victimization among Primary and Secondary School Students: Based on a Machine Learning Model.
- Author
-
Qiu, Tian, Wang, Sizhe, Hu, Di, Feng, Ningning, and Cui, Lijuan
- Subjects
- *
SCHOOL bullying , *SCHOOL children , *MACHINE learning , *SECONDARY school students , *CONTROL (Psychology) , *FAMILY relations - Abstract
School bullying among primary and secondary school students has received increasing attention, and identifying relevant factors is a crucial way to reduce the risk of bullying victimization. Machine learning methods can help researchers predict and identify individual risk behaviors. Through a machine learning approach (i.e., the gradient boosting decision tree model, GBDT), the present longitudinal study aims to systematically examine individual, family, and school environment factors that can predict the risk of bullying victimization among primary and secondary school students a year later. A total of 2767 participants (2065 secondary school students, 702 primary school students, 55.20% female students, mean age at T1 was 12.22) completed measures of 24 predictors at the first wave, including individual factors (e.g., self-control, gender, grade), family factors (family cohesion, parental control, parenting style), peer factor (peer relationship), and school factors (teacher–student relationship, learning capacity). A year later (i.e., T2), they completed the Olweus Bullying Questionnaire. The GBDT model predicted whether primary and secondary school students would be exposed to school bullying after one year by training a series of base learners and outputting the importance ranking of predictors. The GBDT model performed well. The GBDT model yielded the top 6 predictors: teacher–student relationship, peer relationship, family cohesion, negative affect, anxiety, and denying parenting style. The protective factors (i.e., teacher–student relationship, peer relationship, and family cohesion) and risk factors (i.e., negative affect, anxiety, and denying parenting style) associated with the risk of bullying victimization a year later among primary and secondary school students are identified by using a machine learning approach. 
The GBDT model can be used as a tool to predict the future risk of bullying victimization for children and adolescents and to help improve the effectiveness of school bullying interventions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. GBDT_KgluSite: An improved computational prediction model for lysine glutarylation sites based on feature fusion and GBDT classifier.
- Author
-
Liu, Xin, Zhu, Bao, Dai, Xia-Wei, Xu, Zhi-Ao, Li, Rui, Qian, Yuting, Lu, Ya-Ping, Zhang, Wenqing, Liu, Yong, and Zheng, Junnian
- Subjects
PREDICTION models ,POST-translational modification ,CELL physiology ,AMINO acid sequence ,LYSINE - Abstract
Background: Lysine glutarylation (Kglu) is one of the most important Post-translational modifications (PTMs), which plays significant roles in various cellular functions, including metabolism, mitochondrial processes, and translation. Therefore, accurate identification of the Kglu site is important for elucidating protein molecular function. Due to the time-consuming and expensive limitations of traditional biological experiments, computational-based Kglu site prediction research is gaining more and more attention. Results: In this paper, we proposed GBDT_KgluSite, a novel Kglu site prediction model based on GBDT and appropriate feature combinations, which achieved satisfactory performance. Specifically, seven features including sequence-based features, physicochemical property-based features, structural-based features, and evolutionary-derived features were used to characterize proteins. NearMiss-3 and Elastic Net were applied to address data imbalance and feature redundancy issues, respectively. The experimental results show that GBDT_KgluSite has good robustness and generalization ability, with accuracy and AUC values of 93.73%, and 98.14% on five-fold cross-validation as well as 90.11%, and 96.75% on the independent test dataset, respectively. Conclusion: GBDT_KgluSite is an effective computational method for identifying Kglu sites in protein sequences. It has good stability and generalization ability and could be useful for the identification of new Kglu sites in the future. The relevant code and dataset are available at https://github.com/flyinsky6/GBDT_KgluSite. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
40. Research on a multivariable passenger car sales forecasting model.
- Author
-
段昊江 and 吴冰
- Abstract
Copyright of Automotive Digest is the property of Automotive Digest Editorial Office and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
41. Do CEO and chairman characteristics affect green innovation? Evidence from a comparative analysis of machine learning models
- Author
-
Xue, Ruixiang, Ong, Tze San, and Demir, Ezgi
- Published
- 2024
- Full Text
- View/download PDF
42. Accurate Identification of Submitochondrial Protein Location Based on Deep Representation Learning Feature Fusion
- Author
-
Sui, Jianan, Chen, Yuehui, Cao, Yi, and Zhao, Yaou. Series and volume editors: Goos, Gerhard (Founding Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa (Editorial Board Member), Gao, Wen (Editorial Board Member), Steffen, Bernhard (Editorial Board Member), Yung, Moti (Editorial Board Member), Huang, De-Shuang (editor), Premaratne, Prashan (editor), Jin, Baohua (editor), Qu, Boyang (editor), Jo, Kang-Hyun (editor), and Hussain, Abir (editor)
- Published
- 2023
- Full Text
- View/download PDF
43. Bayesian model averaging for predicting factors associated with length of COVID-19 hospitalization
- Author
-
Shabnam Bahrami, Karimollah Hajian-Tilaki, Masomeh Bayani, Mohammad Chehrazi, Zahra Mohamadi-Pirouz, and Abazar Amoozadeh
- Subjects
AIC ,GBDT ,Bayesian model averaging ,BIC ,COVID-19 ,Length of hospital stay ,Medicine (General) ,R5-920 - Abstract
Introduction: The length of hospital stay (LOHS) caused by COVID-19 has imposed a financial burden and cost on the healthcare service system and a high psychological burden on patients and health workers. The purpose of this study is to adopt Bayesian model averaging (BMA) based on linear regression models and to determine the predictors of the LOHS of COVID-19. Methods: In this historical cohort study, of 5100 COVID-19 patients registered in the hospital database, 4996 were eligible to enter the study. The data included demographic, clinical, and biomarker variables and LOHS. Factors affecting the LOHS were fitted in six models: the stepwise method, AIC, and BIC in classical linear regression models; two BMA models using Occam's Window and Markov Chain Monte Carlo (MCMC) methods; and the GBDT algorithm, a new method of machine learning. Results: The average length of hospitalization was 6.7 ± 5.7 days. In fitting classical linear models, both the stepwise and AIC methods (R² = 0.168 and adjusted R² = 0.165) performed better than BIC (R² = 0.160 and adjusted R² = 0.158). In fitting the BMA, the Occam's Window model performed better than MCMC, with R² = 0.174. The GBDT method, with R² = 0.64, performed worse than the BMA in the testing dataset but not in the training dataset. Based on the six fitted models, ICU hospitalization, respiratory distress, age, diabetes, CRP, PO2, WBC, AST, BUN, and NLR were significantly associated with the LOHS of COVID-19. Conclusion: The BMA with Occam's Window method has a better fit and better performance in predicting the factors affecting LOHS in the testing dataset than the other models.
- Published
- 2023
- Full Text
- View/download PDF
44. Application of Sustainable Education in Chinese Language Education in the Context of Big Data
- Author
-
Nan Yunfan
- Subjects
data mining ,random forest ,light gbm ,gbdt ,chinese language education ,00a35 ,Mathematics ,QA1-939 - Abstract
This paper explores all student-related data using educational data mining techniques to draw conclusions about students’ performance and behavior. The main idea and theoretical basis of Random Forest are described, the importance of each feature is calculated using a Random Forest-based important feature selection algorithm, and the features are ranked so that the best ones are selected as effective features for constructing the performance prediction model. Light GBM, which further improves on the GBDT and XGBoost algorithms, is used to construct the grade prediction model in order to improve the training speed and predictive ability of the model. To verify the feasibility of the constructed model, the application of sustainable education in Chinese language education is tested from aspects such as model testing and learning behavior. The results show that the accuracy of the Light GBM grade prediction model increases from 0.68 to 0.918 when the course progress is raised from 10 to 100, i.e., the model's accuracy in predicting students’ grades gradually increases as the course progresses, so that it can effectively analyze the application of sustainable education in Chinese education.
- Published
- 2024
- Full Text
- View/download PDF
45. Artificial Intelligence Technology Empowers Practical Teaching of Higher Vocational Accounting Majors
- Author
-
Wu Jian, Chen Xuemei, Zhang Yuan, and Wang Hui
- Subjects
lightgbm model ,gbdt ,objective function ,practice teaching system ,accounting specialty ,68t01 ,Mathematics ,QA1-939 - Abstract
This paper proposes an interactive study of online practice teaching based on the LightGBM model. To keep the LightGBM model from overfitting the training data, its objective function is derived using the GBDT gradient boosting tree. On this basis, the accounting practice teaching system is built using B/S mode, and a simulation analysis of online practice teaching for accounting majors in the era of artificial intelligence is carried out. The results show that the AUC value for students in class C is 0.819, which is 0.095 higher than before optimization, and that the AUC values of the other classes are likewise higher than before optimization. The LightGBM model optimized by the grid search algorithm can effectively identify and interact with students’ accounting practice behavioral characteristics and, to a certain extent, effectively predict students’ accounting practice ability. This study has the potential to guide students in mastering accounting practice knowledge, guaranteeing quality practice teaching, and fostering the growth of accounting professionals.
- Published
- 2024
- Full Text
- View/download PDF
46. A Study of Measurement Modeling of Decision Trees in Machine Learning Processes
- Author
-
Li Guo, Qin Yi, and Wang Minghua
- Subjects
decision tree ,gradient boosting ,cnn ,gbdt ,measurement modeling ,00a71 ,Mathematics ,QA1-939 - Abstract
With the rapid development of the economy and of science and technology, ordinary measurement models, which rely on a single method of parameter determination and cannot guarantee accuracy, have found it difficult to meet the measurement needs of complex data in industrial engineering and other systems. This study proposes a measurement model for complex data through the optimization of decision trees in the process of machine learning. Firstly, the gradient-boosting-based decision tree measurement model (GBDT) is constructed by analyzing the decision tree model, and the model is then solved. Latent variables are also included in the model: SEM describes how the explicit variables reflect the latent variables, and a GBDT optimization model with latent variables is constructed from the resulting measurements. Then, for the measurement of multivariate data, a fusion convolutional network is used for image data feature extraction, and a combined measurement model with multi-source data fusion (MDF-DTFEE) is constructed on the basis of the decision tree measurement model. In the empirical analysis of the measurement model, the predicted and actual values of the model training were fitted between 4~60 mg/L and 5~45 ml/L, respectively, and its R² on the training set and test set were 0.948 and 0.886, respectively, with the RMSE lower than 1.2 and none of the MAPE values exceeding 0.2. In practical application the error always stayed within 1 mg/L, which meets the practical application requirements, demonstrates the practical value of the measurement model in this paper, and provides a useful solution for measuring complex data.
- Published
- 2024
- Full Text
- View/download PDF
47. A Predictive Model of Learning Effectiveness in Flipped Classroom Mode: An Exploration of Higher Vocational English Learning Based on Machine Learning
- Author
-
Wang Lizhen
- Subjects
random forest ,gbdt ,xgboost prediction ,stacking fusion algorithm ,flipped classroom ,00a71 ,Mathematics ,QA1-939 - Abstract
Taking English microclasses as an example, this paper analyzes the practical operation of flipped classroom teaching in the reform of higher vocational English teaching across the three phases of pre-course, in-course and post-course. After comparing the advantages of each fusion algorithm, the Stacking model fusion algorithm is selected to construct a multi-model fusion prediction model of students’ learning effectiveness, and the experimental process of the Stacking-based prediction model is summarized. The performance of each machine learning prediction model is determined using each evaluation index. The multi-model fusion prediction model is then used to predict and analyze the overall and individual effectiveness of English learning from organized data on students’ English learning. On the flipped classroom platform data, the overall performance of the multi-model fusion prediction model is more stable, with a more balanced distribution in the range of 0.7~0.9; it obtains better accuracy than LR, GBDT and XGBoost and is better able to predict students’ learning effectiveness across the stages of learning (certified, grade, and total_time) in real life.
- Published
- 2024
- Full Text
- View/download PDF
48. Empirical analysis of monthly electricity consumption prediction in manufacturing industry using machine learning techniques
- Author
-
Shi Yan, Yang Fengjiu, Zhang Yi, Wang Siteng, and Han Junjie
- Subjects
svm ,gbdt ,particle swarm optimization ,wavelet decomposition ,fuzzy mean clustering ,electricity consumption prediction ,00a79 ,Mathematics ,QA1-939 - Abstract
Electricity consumption prediction is an important part of power planning and the basis of power dispatch planning. In this paper, SVM and GBDT algorithms optimized by the PSO algorithm are used to build a machine learning-based electricity consumption prediction model. The decomposed and reconstructed high-frequency and low-frequency signals are predicted by the PSO-optimized SVM and GBDT, respectively. The clustering performance of the traditional fuzzy C-means algorithm on unbalanced data is also improved. The performance of the algorithms is analyzed in two application scenarios, namely an artificial dataset and a dataset of real measurements from power users, and power consumption prediction is performed for the manufacturing industry. On the grid user-measured dataset, the ARI, FMI, and AMI index values of the improved algorithms proposed in this paper are 0.9543, 0.9347, and 0.9344, respectively, while the traditional DPC and K-means clustering algorithms are less effective. Compared to PSO-GBDT, the machine learning algorithm optimized after wavelet decomposition increased R² by 8.98% and decreased MAPE by 19.78% and RMSE by 11.53%, so all predictive evaluation indexes improved. The two machine learning algorithms in this paper, optimized by wavelet decomposition combined with PSO, have a good predictive effect, with R² increasing from 0.801 to 0.842. The designed machine learning model for electricity consumption prediction based on wavelet decomposition and PSO has excellent performance, and the design expectation has been fulfilled. This paper makes a useful exploration and proposes an effective method for accurate prediction of electricity consumption in the manufacturing industry.
- Published
- 2024
- Full Text
- View/download PDF
49. Research on Detection and Restoration Methods of Basic Operation Data for Inter-Basin Water Diversion Projects.
- Author
-
Lu, Mengyao, Xu, Guitao, and Liu, Xiaolian
- Subjects
WATER diversion ,WATER levels ,WATER supply ,WATER pipelines ,HYDRAULIC models ,DECISION trees - Abstract
Inter-basin water diversion is an essential means to alleviate the contradiction between the supply and demand of water resources, and accurate hydraulic modelling guarantees smooth operation. However, due to the increasing complexity of water diversion methods, structures, water conservancy facilities and equipment, it is difficult to obtain accurate and effective measured data to establish a model. Therefore, based on a data-driven method, this paper diagnoses and restores the important parameters of the water diversion projects, including the elevation of pipeline and water level data, which can be used to establish the adaptive hydraulic transition model of the water diversion projects. Firstly, the abnormal data of the elevation of pipeline were identified using expert data annotation and support vector classification (SVC), with the identification accuracy of abnormal data being as high as 91%. Then, the single and continuous abnormal data were restored using an interpolation method and multiple linear regression algorithm (MLR), and the restored data were found to be consistent with the judgment of expert knowledge. Secondly, K-medoids was used to classify the complex multi-dimensional water level data, combined with the 3-sigma method to identify the outliers in each class. The gradient boosting decision tree algorithm (GBDT) trained on normal data restored outliers in a predictive manner, and the mean absolute percentage error (MAPE) was 0.003%, 0.025% and 0.091% in each class, respectively, showing the best accuracy compared with other models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
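Note on entry 49: the abstract mentions a 3-sigma rule for flagging outliers within each water-level class but gives no implementation. The following is only a minimal sketch of that rule for a single class, using the Python standard library; the function name `three_sigma_outliers`, the parameter `k`, and the sample readings are hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev

def three_sigma_outliers(values, k=3.0):
    """Return the indices of values more than k standard deviations from the mean.

    Hypothetical helper for illustration; not the paper's code.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of this class
    if sigma == 0:
        return []  # all readings identical, so nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]

# Made-up water-level readings for one class: 20 normal values and one spike.
levels = [10.0] * 20 + [25.0]
print(three_sigma_outliers(levels))  # prints [20]
```

In the workflow the abstract describes, the flagged readings would then be replaced by predictions from a GBDT trained on the normal data; that step is not shown here.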
50. A generalized decision tree ensemble based on the NeuralNetworks architecture: Distributed Gradient Boosting Forest (DGBF).
- Author
-
Delgado-Panadero, Ángel, Benítez-Andrades, José Alberto, and García-Ordás, María Teresa
- Subjects
DISTRIBUTED algorithms ,DECISION trees - Abstract
Tree ensemble algorithms such as RandomForest and GradientBoosting are currently the dominant methods for modeling discrete or tabular data. However, they are unable to perform hierarchical representation learning from raw data as NeuralNetworks do thanks to their multi-layered structure, which is a key feature for DeepLearning problems and for modeling unstructured data. This limitation arises because tree algorithms cannot be trained with back-propagation, owing to their mathematical nature. In this work, however, we demonstrate that the mathematical formulations of bagging and boosting can be combined to define a graph-structured tree-ensemble algorithm in which a distributed representation learning process between trees arises naturally (without using back-propagation). We call this novel approach Distributed Gradient Boosting Forest (DGBF), and we demonstrate that both RandomForest and GradientBoosting can be expressed as particular graph architectures of DGBF. Finally, we see that the distributed learning outperforms both RandomForest and GradientBoosting in 7 out of 9 datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF