4,715 results on '"BOOSTING algorithms"'
Search Results
2. Machine learning for zombie hunting: predicting distress from firms' accounts and missing values.
- Author
Bargagli-Stoffi, Falco J, Incerti, Fabio, Riccaboni, Massimo, and Rungi, Armando
- Subjects
BOOSTING algorithms, MACHINE learning, DISCLOSURE laws, ACCOUNTING firms, BANKRUPTCY
- Abstract
In this contribution, we propose machine learning techniques to predict zombie firms. First, we derive the risk of failure by training and testing our algorithms on disclosed financial information and nonrandom missing values of 304,906 firms active in Italy from 2008 to 2017. We then identify the highest financial distress conditional on predictions that lie above a threshold for which a combination of the false positive rate (false prediction of firm failure) and the false negative rate (false prediction of active firms) is minimized. Therefore, we identify zombies as firms that remain in financial distress, i.e. whose forecasts fall into the risk category above the threshold for at least three consecutive years. To this end, we implement a gradient boosting algorithm (XGBoost) that exploits information about missing values. The inclusion of missing values in our prediction model is crucial because patterns of undisclosed accounts are correlated with firm failure. Finally, we show that our preferred machine learning algorithm outperforms (i) proxy models such as Z -scores and the distance-to-default, (ii) traditional econometric methods, and (iii) other widely used machine learning techniques. We provide evidence that zombies are less productive and smaller on average and that they tend to increase in times of crisis. Finally, we argue that our application can help financial institutions and public authorities design evidence-based policies—e.g. optimal bankruptcy laws and information disclosure policies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
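The thresholding and zombie rule described in the abstract above can be illustrated with a minimal sketch. It assumes the "combination" of error rates is a plain sum of the false positive and false negative rates, and that a score at or above the threshold counts as a failure prediction; the abstract does not specify either detail, and all function names are hypothetical.

```python
def error_rates(y_true, scores, threshold):
    """Return (false positive rate, false negative rate) at a threshold.

    y_true: 1 = firm failed, 0 = firm stayed active.
    A firm is predicted to fail when its score is >= threshold (assumption).
    """
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    negatives = sum(1 for y in y_true if y == 0)
    positives = sum(1 for y in y_true if y == 1)
    return fp / negatives, fn / positives


def best_threshold(y_true, scores):
    """Pick the observed score that minimizes FPR + FNR (assumed combination)."""
    candidates = sorted(set(scores))
    return min(candidates, key=lambda t: sum(error_rates(y_true, scores, t)))


def is_zombie(yearly_scores, threshold, min_years=3):
    """True if scores stay at or above the threshold for min_years consecutive years."""
    run = 0
    for score in yearly_scores:
        run = run + 1 if score >= threshold else 0
        if run >= min_years:
            return True
    return False
```

For example, `is_zombie([0.6, 0.7, 0.4, 0.6, 0.6, 0.6], 0.5)` is true because the last three yearly scores all reach the threshold, while dropping the final year makes it false.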
3. Prediction of thermoelectric-figure-of-merit based on autoencoder and light gradient boosting machine.
- Author
Xu, Yingying, Liu, Xinyi, and Wang, Jifen
- Subjects
*THERMOELECTRIC generators, *MACHINE learning, *DEEP learning, *BOOSTING algorithms, *THERMOELECTRIC materials, *MATERIALS handling, *FORECASTING
- Abstract
The evaluation of thermoelectric materials relies significantly on the thermoelectric figure of merit, ZT, which serves as a crucial parameter in assessing their properties. The accurate prediction of ZT values can be accomplished by utilizing machine learning models to learn material characteristics. However, factors such as the size of the dataset, model hyperparameters, and data quality can all impact the accuracy of machine learning. In contrast to previous research where high-dimensional features were simply discarded to transform them into low-dimensional ones, deep learning models such as autoencoder can extract more effective information. Therefore, in this article, the combination of autoencoders and the Light Gradient Boosting Machine (LightGBM) is employed to learn the chemical characteristics and ZT values of various materials. The reliability of the model was confirmed by achieving an R2 score of 0.94 during tenfold cross-validation. 130 000 materials were predicted and screened, the temperature dependence of the screened materials was studied in depth, and 13 materials with high ZT values were identified. Four of the 13 most promising candidates identified are existing thermoelectric materials, while nine are ideal candidates for future experimental studies and validation. This work utilizes autoencoders for extensive prediction and screening of promising materials, providing an effective approach for handling high-dimensional material data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Impact of Project Updates and Their Social Endorsement in Online Medical Crowdfunding.
- Author
Wu, Yi, Ye, Hua, Jensen, Matthew L., and Liu, Linwei
- Subjects
CROWD funding, PANEL analysis, SOCIAL influence, INDIVIDUAL needs, FUNDRAISING, BOOSTING algorithms, USER-generated content
- Abstract
Online crowdfunding has become an important fundraising channel for medical care. Yet, individuals in need face numerous challenges in meeting their fundraising goals. To improve fundraising, individuals may use project updates to evoke donors' sympathy and secure donations. Informed by the sympathy bias literature, this paper conceptualizes two important but distinct aspects of project updates—positive sentiment and negative sentiment—and hypothesizes their individual and relative impacts on the donation amount of a crowdfunding project. In addition, this paper explores moderating effects of social endorsements on the influence of update sentiment. To test our hypotheses, we conducted two studies. Study 1 examined unique project-day panel data from 1,467 projects on a leading medical crowdfunding platform. Results reveal that both positive and negative update sentiment positively affect the donation amount of a crowdfunding project, but negative updates have a greater effect. Further, endorsements by strong ties and weak ties attenuate the positive effects of update sentiment. Study 2 was a controlled, randomized experiment that corroborated findings from Study 1, established the causality of observed effects, and confirmed the mediating effects of sympathy. The findings of this paper underscore the role of sympathy in enhancing online charitable crowdfunding to individual donors and show project updates with positive or negative sentiment to be potent mechanisms to boost donations. Furthermore, the substitution effect of social endorsement on donation amount also adds nuance to our knowledge suggesting that a proper combination of various information can better help improve crowdfunding performance. Platform operators and fundraisers should prioritize project update sentiment to enhance online charitable crowdfunding success. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. Causal machine learning models for predicting low birth weight in midwife-led continuity care intervention in North Shoa Zone, Ethiopia.
- Author
Moges, Wudneh Ketema, Tegegne, Awoke Seyoum, Mitku, Aweke A., Tesfahun, Esubalew, and Hailemeskel, Solomon
- Subjects
*MACHINE learning, *LOW birth weight, *BOOSTING algorithms, *MEDICAL sciences, *POSTNATAL care
- Abstract
Background: Low birth weight (LBW) is a critical global health issue that affects infants disproportionately, particularly in developing countries. This study adopted causal machine learning (CML) algorithms for predicting LBW in newborns, drawing from midwife-led continuity care (MLCC). Methods: A quasi-experimental study was carried out in the North Shoa Zone of Ethiopia from August 2019 to September 2020. A total of 1166 women were allocated into two groups. The first group, the MLCC group, received all their antenatal, labor, birth, and immediate post-natal care from a single midwife. The second group received care from various staff members at different times throughout their pregnancy and childbirth. In this study, CML was implemented to predict LBW. Data preprocessing, including data cleaning, was conducted. CML was then employed to identify the most suitable classifier for predicting LBW. Gradient boosting algorithms were used to estimate the causal effect of MLCC on LBW. Moreover, meta-learner algorithms were utilized to estimate the individual treatment effect (ITE), the average treatment effect (ATE), and performance. Results: The study results revealed that Causal K-Nearest Neighbors (CKNN) was the most effective classifier by accuracy, predicting LBW with 94.52% accuracy, 90.25% precision, 92.57% recall, and an F1 score of 88.2%. Meconium aspiration, perinatal mortality, pregnancy-induced hypertension, vacuum babies in need of resuscitation, and previous surgeries on their reproductive organs were identified as the top five features affecting LBW. The estimated impact of MLCC versus other professional groups on LBW was analyzed using gradient boosting algorithms and was found to be 0.237. The estimated ATE for the S-learner was 0.284, which is lower than the true ATE of 0.216.
Additionally, the estimated ITE for both the T-learner and X-learner was less than -0.5, indicating that mothers would not choose to participate in the MLCC program. Conclusions: Based on these findings, the CKNN classifier demonstrated a higher accuracy and effectiveness. The S-learner and R-learner models, utilizing the XGBoost Regressor and BaseSRegressor, provided accurate estimations of ITE for assessing the impact of the MLCC program. Promoting the MLCC program could help stabilize LBW outcomes. [ABSTRACT FROM AUTHOR]
- Published
- 2025
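As a rough illustration of how a meta-learner turns outcome models into ITE and ATE estimates, the sketch below implements a T-learner: one outcome model per treatment arm, with the ITE taken as the difference of their predictions and the ATE as its average. The base learner here is a toy per-stratum mean, a hypothetical stand-in for the gradient boosting regressors mentioned in the abstract; it is not the study's pipeline.

```python
from collections import defaultdict


def fit_stratum_means(rows):
    """Toy base learner: predict the mean outcome observed for each covariate value.

    rows: list of (x, y) pairs. Unseen covariate values fall back to the overall mean.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y in rows:
        sums[x] += y
        counts[x] += 1
    overall = sum(y for _, y in rows) / len(rows)
    means = {x: sums[x] / counts[x] for x in sums}
    return lambda x: means.get(x, overall)


def t_learner(data):
    """data: list of (x, treated, y). Returns (ite_function, ate_estimate)."""
    mu1 = fit_stratum_means([(x, y) for x, t, y in data if t == 1])  # treated arm
    mu0 = fit_stratum_means([(x, y) for x, t, y in data if t == 0])  # control arm

    def ite(x):
        # Individual treatment effect: predicted outcome if treated minus if not.
        return mu1(x) - mu0(x)

    ate = sum(ite(x) for x, _, _ in data) / len(data)
    return ite, ate
```

An S-learner would instead fit a single model with the treatment flag as an extra input and difference its predictions at treated = 1 and treated = 0.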
6. Machine learning for predicting severe dengue in Puerto Rico.
- Author
Madewell, Zachary J., Rodriguez, Dania M., Thayer, Maile B., Rivera-Amill, Vanessa, Paz-Bailey, Gabriela, Adams, Laura E., and Wong, Joshua M.
- Subjects
*CLINICAL decision support systems, *ENSEMBLE learning, *MACHINE learning, *BOOSTING algorithms, *ARTIFICIAL neural networks
- Abstract
Background: Distinguishing between non-severe and severe dengue is crucial for timely intervention and reducing morbidity and mortality. World Health Organization (WHO)-recommended warning signs offer a practical approach for clinicians but have limited sensitivity and specificity. This study aims to evaluate machine learning (ML) model performance compared to WHO-recommended warning signs in predicting severe dengue among laboratory-confirmed cases in Puerto Rico. Methods: We analyzed data from Puerto Rico's Sentinel Enhanced Dengue Surveillance System (May 2012–August 2024), using 40 clinical, demographic, and laboratory variables. Nine ML models, including Decision Trees, K-Nearest Neighbors, Naïve Bayes, Support Vector Machines, Artificial Neural Networks, AdaBoost, CatBoost, LightGBM, and XGBoost, were trained using fivefold cross-validation and evaluated with area under the receiver operating characteristic curve (AUC-ROC), sensitivity, and specificity. A subanalysis excluded hemoconcentration and leukopenia to assess performance in resource-limited settings. An AUC-ROC value of 0.5 indicates no discriminative power, while values closer to 1.0 reflect better performance. Results: Among the 1708 laboratory-confirmed dengue cases, 24.3% were classified as severe. Gradient boosting algorithms achieved the highest predictive performance, with an AUC-ROC of 97.1% (95% CI: 96.0–98.3%) for CatBoost using the full 40-variable feature set. Feature importance analysis identified hemoconcentration (≥ 20% increase during illness or ≥ 20% above baseline for age and sex), leukopenia (white blood cell count < 4000/mm3), and timing of presentation at 4–6 days post-symptom onset as key predictors. When excluding hemoconcentration and leukopenia, the CatBoost AUC-ROC was 96.7% (95% CI: 95.5–98.0%), demonstrating minimal reduction in performance. 
Individual warning signs like abdominal pain and restlessness had sensitivities of 79.0% and 64.6%, but lower specificities of 48.4% and 59.1%, respectively. Combining ≥ 3 warning signs improved specificity (80.9%) while maintaining moderate sensitivity (78.6%), resulting in an AUC-ROC of 74.0%. Conclusions: ML models, especially gradient boosting algorithms, outperformed traditional warning signs in predicting severe dengue. Integrating these models into clinical decision-support tools could help clinicians better identify high-risk patients, guiding timely interventions like hospitalization, closer monitoring, or the administration of intravenous fluids. The subanalysis excluding hemoconcentration confirmed the models' applicability in resource-limited settings, where access to laboratory data may be limited. [ABSTRACT FROM AUTHOR]
- Published
- 2025
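The evaluation metrics named in the abstract above (sensitivity, specificity, AUC-ROC) and the "three or more warning signs" rule can be sketched as follows. This is a generic illustration with hypothetical function names and toy inputs, not the study's code; AUC is computed as the probability that a randomly chosen severe case scores higher than a randomly chosen non-severe case, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = severe)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def flag_by_warning_signs(sign_counts, min_signs=3):
    """Flag a patient as high risk when at least min_signs warning signs are present."""
    return [1 if count >= min_signs else 0 for count in sign_counts]
```

With this definition an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, matching the interpretation given in the abstract.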
7. Predicting the hub interactome of COVID-19 and oral squamous cell carcinoma: uncovering ALDH-mediated Wnt/β-catenin pathway activation via salivary inflammatory proteins.
- Author
Yadalam, Pradeep Kumar, Arumuganainar, Deepavalli, Natarajan, Prabhu Manickam, and Ardila, Carlos M.
- Subjects
*ARTIFICIAL neural networks, *MACHINE learning, *SALIVARY proteins, *SQUAMOUS cell carcinoma, *RANDOM forest algorithms, *BOOSTING algorithms
- Abstract
Understanding shared pathways and mechanisms involved in the pathogenesis of diseases like oral squamous cell carcinoma (OSCC) and COVID-19 could lead to the development of novel therapeutic strategies and diagnostic biomarkers. This study aims to predict the interactome of OSCC and COVID-19 based on salivary inflammatory proteins. Datasets for OSCC and COVID-19 were obtained from https://www.salivaryproteome.org/differential-expression and selected for differential gene expression analysis. Differential gene expression analysis was performed using log transformation and a fold change of two. Hub proteins were identified using Cytoscape and Cytohubba, and machine learning algorithms including naïve Bayes, neural networks, gradient boosting, and random forest were used to predict hub genes. Top hub genes identified included ALDH1A1, MT-CO2, SERPINC1, FGB, and TF. The random forest model achieved the highest accuracy (93%) and class accuracy (84%). The naive Bayes model had lower accuracy (63%) and class accuracy (66%), while the neural network model showed 55% accuracy and class accuracy, possibly due to data pre-processing issues. The gradient boosting model outperformed all models with an accuracy of 95% and class accuracy of 95%. Salivary proteomic interactome analysis revealed novel hub proteins as potential common biomarkers. [ABSTRACT FROM AUTHOR]
- Published
- 2025
8. EpiForecaster: a novel deep learning ensemble optimization approach to combining forecasts for emerging epidemic outbreaks.
- Author
Soto-Ferrari, Milton, Carrasco-Pena, Alejandro, and Prieto, Diana
- Subjects
*LONG short-term memory, *BOOSTING algorithms, *DEEP learning, *CONVOLUTIONAL neural networks, *MONKEYPOX
- Abstract
This study introduces EpiForecaster, a novel deep-learning ensemble method designed to improve the accuracy of epidemic forecasts during rapidly evolving outbreaks. Using weekly case data from the 2022 Mpox outbreak in Brazil, the USA, Mexico, the UK, and France, spanning from July 10 to October 9, we experiment with combinations of four state-of-the-art deep learning models: Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, Bi-directional LSTMs (Bi-LSTM), and hybrid CNN-LSTM architectures. Through extensive hyperparameter tuning, each model generates individual stacked forecasts, which are subsequently combined using various meta-model strategies. The primary ensemble approach, EpiForecaster, optimizes model weights using the Limited-memory Broyden–Fletcher–Goldfarb–Shanno with Bounds (L-BFGS-B) algorithm, dynamically assigning weights based on each model's performance. Comparative analyses included Gradient Boosting and Genetic Algorithm ensembles, with the Equal-Weight ensemble serving as a baseline. Performance evaluation based on Mean Squared Error (MSE) and Mean Absolute Error (MAE) showed that EpiForecaster consistently enhanced forecast accuracy, challenging the traditional view that larger ensembles uniformly yield better forecasts. EpiForecaster presents a scalable and effective solution for managing the unpredictability of non-stationary data, demonstrating its value in building reliable ensemble models for a variety of predictive applications. [ABSTRACT FROM AUTHOR]
- Published
- 2025
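A minimal sketch of the forecast-combination idea in the abstract above: several model forecasts are merged by a weighted average, with the equal-weight ensemble as the baseline, and the result is scored with MSE and MAE. The weights below are supplied by hand; the L-BFGS-B weight optimization described in the abstract is not reproduced here, and all names are hypothetical.

```python
def combine(forecasts, weights=None):
    """Weighted average of per-model forecasts; equal weights when none are given.

    forecasts: list of equal-length lists, one per model.
    """
    k = len(forecasts)
    if weights is None:
        weights = [1.0 / k] * k  # equal-weight baseline
    total = sum(weights)
    horizon = len(forecasts[0])
    return [
        sum(w * f[t] for w, f in zip(weights, forecasts)) / total
        for t in range(horizon)
    ]


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

An optimizer would search over `weights` to minimize one of these error functions on held-out weeks; the equal-weight call `combine(forecasts)` gives the baseline to beat.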
9. Statistical feature engineering using discrete wavelet transform: a tool for improving the prediction of compressional and shear sonic travel time logs.
- Author
Kaleem, Waquar, Khan, Kashan Ahmed, and Saxena, Amit
- Subjects
*DISCRETE wavelet transforms, *TRAVEL time (Traffic engineering), *BOOSTING algorithms, *HYDROCARBON reservoirs, *MACHINE learning
- Abstract
This paper aims to predict compressional and shear sonic travel time logs using conventional logs that are easier and less expensive to acquire during the development cycle. The study uses a gradient boosting algorithm and incorporates a novel method of using statistical feature engineering with wavelet transform to improve prediction accuracy across thin beds in hydrocarbon reservoirs. The approach uses a discrete wavelet transform over the neutron log, a prominent feature, to identify thin layers and enhance prediction accuracy. The detailed coefficients are analyzed using Daubechies and Haar wavelets to reconstruct newly transformed data with an enhanced thin-layer signal. The Haar wavelet with three levels is the most optimum wavelet and decomposition levels. The algorithm for the reconstructed log shows an increased thin layer resolution, with an accuracy improvement of 6.8% for the prediction. The proposed approach significantly contributes to geologists, geophysicists, and reservoir technologists for reservoir characterization and safe drilling operations. [ABSTRACT FROM AUTHOR]
- Published
- 2025
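To make the three-level Haar decomposition mentioned in the abstract above concrete, here is a minimal sketch of an orthonormal Haar discrete wavelet transform and its inverse. It assumes the input length is divisible by 2 to the power of the number of levels and does not reproduce the paper's feature engineering on the detail coefficients; function names are hypothetical.

```python
import math


def haar_step(signal):
    """One Haar level: pairwise scaled sums (approximation) and differences (detail)."""
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels=3):
    """Return (approximation, details); details[0] holds the finest-scale coefficients."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest level."""
    s = math.sqrt(2)
    signal = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(signal, detail):
            rebuilt.extend([(a + d) / s, (a - d) / s])
        signal = rebuilt
    return signal
```

Large detail coefficients mark abrupt changes between neighboring samples, which is why they are a natural place to look for thin layers in a log.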
10. A machine learning model to predict liver-related outcomes after the functional cure of chronic hepatitis B.
- Author
Hur, Moon Haeng, Yip, Terry Cheuk-Fung, Kim, Seung Up, Lee, Hyun Woong, Lee, Han Ah, Lee, Hyung-Chul, Wong, Grace Lai-Hung, Wong, Vincent Wai-Sun, Park, Jun Yong, Ahn, Sang Hoon, Kim, Beom Kyung, Kim, Hwi Young, Seo, Yeon Seok, Shin, Hyunjae, Park, Jeayeon, Ko, Yunmi, Park, Youngsu, Lee, Yun Bin, Yu, Su Jong, and Lee, Sang Hyub
- Subjects
*MACHINE learning, *HEPATITIS associated antigen, *BOOSTING algorithms, *CHRONIC hepatitis B, *ARTIFICIAL intelligence
- Abstract
The risk of hepatocellular carcinoma (HCC) and hepatic decompensation persists after hepatitis B surface antigen (HBsAg) seroclearance. This study aimed to develop and validate a machine learning model to predict the risk of liver-related outcomes (LROs) following HBsAg seroclearance. A total of 4,787 consecutive patients who achieved HBsAg seroclearance between 2000 and 2022 were enrolled from six centers in South Korea and a territory-wide database in Hong Kong, comprising the training (n = 944), internal validation (n = 1,102), and external validation (n = 2,741) cohorts. Three machine learning-based models were developed and compared in each cohort. The primary outcome was the development of any LRO, including HCC, decompensation, and liver-related death. During a median follow-up of 55.2 (IQR 30.1–92.3) months, 123 LROs were confirmed (1.1%/person-year) in the Korean cohort. The model with the best predictive performance in the training cohort was selected as the final model (designated as PLAN-B-CURE), which was constructed using a gradient boosting algorithm and seven variables (age, sex, diabetes, alcohol consumption, cirrhosis, albumin, and platelet count). Compared to previous HCC prediction models, PLAN-B-CURE showed significantly superior accuracy in the training cohort (c-index: 0.82 vs. 0.63–0.70, all p <0.001; area under the receiver-operating characteristic curve: 0.86 vs. 0.62–0.72, all p <0.01; area under the precision-recall curve: 0.53 vs. 0.13–0.29, all p <0.01). PLAN-B-CURE showed a reliable calibration function (Hosmer–Lemeshow test p >0.05) and these results were reproduced in the internal and external validation cohorts. This novel machine learning model consisting of seven variables provides reliable risk prediction of LROs after HBsAg seroclearance that can be used for personalized surveillance. 
Using large-scale multinational data, we developed a machine learning model to predict the risk of liver-related outcomes (i.e., hepatocellular carcinoma, decompensation, and liver-related death) after the functional cure of chronic hepatitis B (CHB). The new model named PLAN-B-CURE was constructed using seven variables (age, sex, alcohol consumption, diabetes, cirrhosis, serum albumin, and platelet count) and a gradient boosting machine algorithm, and it demonstrated significantly better predictive accuracy than previous models in both the training and validation cohorts. The inclusion of diabetes and significant alcohol intake as model inputs suggests the importance of metabolic risk factor management after the functional cure of CHB. Using seven readily available clinical factors, PLAN-B-CURE, the first machine learning-based model for risk prediction after the functional cure of CHB, may serve as a basis for individualized risk stratification.
• A new machine learning model was developed to predict the risk of liver-related outcomes after the functional cure of CHB.
• PLAN-B-CURE incorporated 7 variables: age, sex, alcohol consumption, diabetes, cirrhosis, serum albumin, and platelet count.
• PLAN-B-CURE demonstrated significantly superior predictive accuracy to previous models in both training and validation cohorts.
• PLAN-B-CURE low-risk group showed a significantly lower 5-year incidence of liver-related outcomes than the other groups.
[ABSTRACT FROM AUTHOR]
- Published
- 2025
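The c-index reported in the abstract above measures how well predicted risks order patients by time to event. The sketch below computes Harrell's concordance index for right-censored data: a pair is comparable when the patient with the shorter follow-up had an event, and concordant when that patient also has the higher predicted risk. Whether the study used exactly this variant is an assumption, and the function name is hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index.

    times:  follow-up time per patient
    events: 1 if the outcome occurred, 0 if the patient was censored
    risks:  predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported 0.82 versus 0.63 to 0.70 reflects a substantially better ranking of patients.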
11. A backtracking search-based extreme gradient boosting algorithm for soil moisture prediction using meteorological variables.
- Author
Emami, Hojjat, Emami, Somayeh, and Rezaverdinejad, Vahid
- Subjects
*MACHINE learning, *BOOSTING algorithms, *SOIL moisture, *ECOSYSTEM management, *SEARCH algorithms
- Abstract
Developing effective soil moisture estimating systems can provide substantial information for various applications including precision agriculture and ecosystem management. This highlights the need to use data mining and machine learning algorithms to estimate soil moisture accurately. For this purpose, an efficient backtracking search-based extreme gradient boosting algorithm (BS-XGB) is presented for soil moisture estimation. The key idea of the proposed BS-XGB is to tune the hyper-parameters of extreme gradient boosting optimally by incorporating the backtracking search algorithm, which significantly improves the prediction performance. The proposed algorithm is evaluated on a benchmark dataset containing daily soil moisture parameters at four depths (10, 25, 50, and 100 cm) sampled from the Kingston station in the United States of America. The results indicated that the BS-XGB model achieved impressive performance in estimating soil moisture, with an R² of 0.999 for the training dataset and an R² of 0.973 for the testing dataset. Comparing the results of BS-XGB with those of the counterpart algorithms proved its superiority in terms of statistical metrics. The feature importance analysis suggested that the variables of soil temperature, relative humidity, minimum temperature, and solar radiation are the most important factors in soil moisture prediction. The results reveal that the proposed model can be used with a high degree of confidence as a qualified alternative to predict soil moisture and save time and cost. [ABSTRACT FROM AUTHOR]
- Published
- 2025
12. Enhancing Cold Joint Shear Strength Prediction in Concrete Structures: Novel Approach with Ensemble Spiking Neural Networks.
- Author
Barkhordari, Mohammad Sadegh
- Subjects
ARTIFICIAL neural networks, CONCRETE construction, BOOSTING algorithms, MACHINE learning, ENSEMBLE learning, PRECAST concrete, SHEAR strength, CIVIL engineering
- Abstract
Cold joints often appear in precast structures, bridges, and retrofitted buildings, where concrete parts cast at different times meet. The potential of these cold joints to transfer shear stresses between concrete interfaces severely affects the overall structural integrity. Therefore, when developing or evaluating precast and retrofitted structures, it is crucial to comprehend the shear force transfer capability of cold joints. This research explores the application of ensemble spiking neural network models for predicting interface shear strength in concrete structures, a crucial parameter in civil engineering. The study utilizes a database of 217 cold joints, categorized by surface type (smooth or roughened), and employs a range of input parameters, including concrete strength, reinforcement characteristics, and interface dimensions, among others. Three ensemble learning techniques, namely, model averaging, separated stacking, integrated stacking, and local cascade ensemble, are employed, with spiking neural networks serving as base learners. The proposed models are compared with established machine learning algorithms, including eXtreme gradient boosting, gradient boosting, random forests, AdaBoost, and bagging. Results indicate that the stacked separate models with the bagging regressor algorithm outperforms other models, achieving the lowest RMSE, competitive mean absolute error, and a high R2 score on the testing set. [ABSTRACT FROM AUTHOR]
- Published
- 2025
13. An hourly solar radiation prediction model using eXtreme gradient boosting algorithm with the effect of fog-haze.
- Author
Chunxiao Zhang, Yingbo Zhang, Jihong Pu, Zhengguang Liu, Zhanwei Wang, and Lin Wang
- Subjects
AIR quality indexes, SOLAR radiation, GLOBAL radiation, BOOSTING algorithms, SOLAR energy
- Abstract
Hourly global solar radiation data is an important factor for solar energy utilization. Due to the lack of solar radiation observation stations in many areas, some hourly solar radiation models have been proposed to predict hourly solar radiation. However, the existing models perform poorly in heavy fog-haze areas because the weakening effect of fog-haze on solar radiation is not considered. Thus, in this paper, hourly global solar radiation prediction models are developed considering air quality index (AQI) using the XGBoost algorithm. The results show a general improvement in the accuracy of models with AQI as an additional input (Model B1-B6) compared to models that do not consider AQI (Model A1-A6). Compared to Model A, Model B shows an increase in R value from 0.927 to 0.948, a decrease in RMSE value from 0.300 to 0.282, and a decrease in MAPE value from 0.159 to 0.145. In addition, for hourly solar radiation prediction, the six most important inputs are the day of the year, air temperature difference, surface temperature difference, hour, AQI, and total cloud cover. [ABSTRACT FROM AUTHOR]
- Published
- 2025
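The before-and-after figures quoted in the abstract above can be put on a common scale by expressing each as a relative change. The short sketch below does only that arithmetic on the reported values; the helper name is hypothetical.

```python
def relative_change(before, after):
    """Signed relative change from `before` to `after` (-0.06 means a 6% drop)."""
    return (after - before) / before


# Values quoted in the abstract: Model A (without AQI) -> Model B (with AQI).
reported = {"R": (0.927, 0.948), "RMSE": (0.300, 0.282), "MAPE": (0.159, 0.145)}
changes = {name: relative_change(a, b) for name, (a, b) in reported.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
```

On these figures, adding AQI raises R by roughly 2.3% and lowers RMSE and MAPE by roughly 6.0% and 8.8% respectively.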
14. Predicting the Risk of Maxillary Canine Impaction Based on Maxillary Measurements Using Supervised Machine Learning.
- Author
de Araujo, Cristiano Miranda, Freitas, Pedro Felipe de Jesus, Ferraz, Aline Xavier, Andreis, Patricia Kern Di Scala, Meger, Michelle Nascimento, Baratto‐Filho, Flares, Augusto Rodenbusch Poletto, Cesar, Küchler, Erika Calvano, Camargo, Elisa Souza, and Schroder, Angela Graciela Deliga
- Subjects
SUPERVISED learning, CONE beam computed tomography, MACHINE learning, BOOSTING algorithms, SUPPORT vector machines
- Abstract
Objectives: To predict palatally impacted maxillary canines based on maxilla measurements through supervised machine learning techniques. Materials and Methods: The maxilla images from 138 patients were analysed to investigate intermolar width, interpremolar width, interpterygoid width, maxillary length, maxillary width, nasal cavity width and nostril width, obtained through cone beam computed tomography scans. The predictive models were built using the following machine learning algorithms: Adaboost Classifier, Decision Tree, Gradient Boosting Classifier, K‐Nearest Neighbours (KNN), Logistic Regression, Multilayer Perceptron Classifier (MLP), Random Forest Classifier and Support Vector Machine (SVM). A 5‐fold cross‐validation approach was employed to validate each model. Metrics such as area under the curve (AUC), accuracy, recall, precision and F1 Score were calculated for each model, and ROC curves were constructed. Results: The predictive model included four variables (two dental and two skeletal measurements). The interpterygoid width and nostril width showed the largest effect sizes. The Gradient Boosting Classifier algorithm exhibited the best metrics, with AUC values ranging from 0.91 [CI95% = 0.74–0.98] for test data to 0.89 [CI95% = 0.86–0.94] for crossvalidation. The nostril width variable demonstrated the highest importance across all tested algorithms. Conclusion: The use of maxillary measurements, through supervised machine learning techniques, is a promising method for predicting palatally impacted maxillary canines. Among the models evaluated, both the Gradient Boosting Classifier and the Random Forest Classifier demonstrated the best performance metrics, with accuracy and AUC values exceeding 0.8, indicating strong predictive capability. [ABSTRACT FROM AUTHOR]
- Published
- 2025
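The 5-fold cross-validation mentioned in the abstract above partitions the sample into five folds and uses each in turn as the held-out set. The sketch below generates such index splits with contiguous, unshuffled folds; whether the study shuffled or stratified its folds is not stated, so that part is an assumption, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

For the 138 patients in the abstract, this produces test folds of 28, 28, 28, 27, and 27 samples, each paired with the remaining samples for training.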
15. Efficient diagnosis of diabetes mellitus using an improved ensemble method.
- Author
Olorunfemi, Blessing Oluwatobi, Ogunde, Adewale Opeoluwa, Almogren, Ahmad, Adeniyi, Abidemi Emmanuel, Ajagbe, Sunday Adeola, Bharany, Salil, Altameem, Ayman, Rehman, Ateeq Ur, Mehmood, Asif, and Hamam, Habib
- Subjects
*RANDOM forest algorithms, *FEATURE selection, *ARTIFICIAL intelligence, *REGRESSION trees, *IMAGE processing, *BOOSTING algorithms
- Abstract
Diabetes is a growing health concern in developing countries, causing considerable mortality rates. While machine learning (ML) approaches have been widely used to improve early detection and treatment, several studies have shown low classification accuracies due to overfitting, underfitting, and data noise. This research employs parallel and sequential ensemble ML approaches paired with feature selection techniques to boost classification accuracy. The Pima India Diabetes Data from the UCI ML Repository served as the dataset. Data preprocessing included cleaning the dataset by replacing missing values with column means and selecting highly correlated features using forward and backward selection methods. The dataset was split into two parts: training (70%), and testing (30%). Python was used for classification in Jupyter Notebook, and there were two design phases. The first phase utilized J48, Classification and Regression Tree (CART), and Decision Stump (DS) to create a random forest model. The second phase employed the same algorithms alongside sequential ensemble methods—XG Boost, AdaBoostM1, and Gradient Boosting—using an average voting algorithm for binary classification. Evaluation revealed that XG Boost, AdaBoostM1, and Gradient Boosting achieved classification accuracies of 100%, with performance metrics including F1 score, MCC, Precision, Recall, AUC-ROC, and AUC-PR all equal to 1.00, indicating reliable predictions of diabetes presence. Researchers and practitioners can leverage the predictive model developed in this work to make quick predictions of diabetes mellitus, which could save many lives. [ABSTRACT FROM AUTHOR]
- Published
- 2025
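Two preprocessing steps named in the abstract above, replacing missing values with column means and a 70/30 train/test split, can be sketched in plain Python. Missing values are represented as `None`, the shuffle seed is arbitrary, and all names are hypothetical; this is not the authors' notebook code.

```python
import random


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        present = [row[c] for row in rows if row[c] is not None]
        means.append(sum(present) / len(present))
    return [
        [means[c] if row[c] is None else row[c] for c in range(n_cols)]
        for row in rows
    ]


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

The sketch imputes with means computed from whatever rows it is given; computing them on the training rows only and reusing them for the test rows is a common way to keep test information out of training.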
16. Research on Transformer Temperature Early Warning Method Based on Adaptive Sliding Window and Stacking.
- Author
Zhang, Pan, Zhang, Qian, Hu, Huan, Hu, Huazhi, Peng, Runze, and Liu, Jiaqi
- Subjects
ENSEMBLE learning, MACHINE learning, RANDOM forest algorithms, DEBYE temperatures, GENERALIZATION, BOOSTING algorithms
- Abstract
This paper proposes a transformer temperature early warning method based on an adaptive sliding window and stacking ensemble learning algorithm, aiming to improve the accuracy and robustness of temperature prediction. The transformer temperature early warning system is crucial for ensuring the safe operation of the power system, and temperature prediction, as the foundation of early warning, directly affects the early warning effectiveness. This paper analyzes the characteristics of transformer temperature using support vector regression, random forest, and gradient boosting regression as base learners and ridge regression as the meta-learner to construct a stacking model. At the same time, Bayesian optimization is used to automatically adjust the sliding window size, achieving adaptive sliding window processing. The experimental results indicate that the temperature prediction method based on adaptive sliding window and stacking significantly reduces prediction errors, enhances the model's adaptability and generalization ability, and provides more reliable technical support for transformer fault warning. [ABSTRACT FROM AUTHOR]
- Published
- 2025
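The sliding-window step mentioned in the abstract above can be illustrated with a short sketch. It only shows how a window size turns a temperature series into training pairs. The window size is fixed by hand here, whereas the abstract tunes it with Bayesian optimization; `make_windows` and the readings are hypothetical.

```python
def make_windows(series, window):
    """Build (features, target) pairs: each target value is paired with
    the `window` values that immediately precede it."""
    X, y = [], []
    for t in range(window, len(series)):
        X.append(series[t - window:t])
        y.append(series[t])
    return X, y

temps = [60.1, 60.4, 61.0, 61.3, 62.0, 62.2]  # hypothetical readings
X, y = make_windows(temps, window=3)
```

A tuner would call `make_windows` with different `window` values and keep the one that gives the lowest validation error for the downstream model.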
17. Predictive models and determinants of mortality among T2DM patients in a tertiary hospital in Ghana, how do machine learning techniques perform?
- Author
-
Kpene, Godsway Edem, Lokpo, Sylvester Yao, and Darfour-Oduro, Sandra A.
- Subjects
- *
DIABETES complications , *RISK assessment , *BOOSTING algorithms , *SEX distribution , *SMOKING , *TERTIARY care , *RETROSPECTIVE studies , *AGE distribution , *FAMILY history (Medicine) , *TYPE 2 diabetes , *MEDICAL records , *ACQUISITION of data , *MARITAL status , *ALCOHOL drinking , *EDUCATIONAL attainment , *EMPLOYMENT , *COMORBIDITY ,MORTALITY risk factors - Abstract
Background: The increasing prevalence of type 2 diabetes mellitus (T2DM) in lower and middle – income countries call for preventive public health interventions. Studies from Africa including those from Ghana, consistently reveal high T2DM-related mortality rates. While previous research in the Ho municipality has primarily examined risk factors, comorbidity, and quality of life of T2DM patients, this study specifically investigated mortality predictors among these patients. Method: The study was retrospective involving medical records of T2DM patients. Data extracted included mortality outcome (dead or alive), sociodemographic characteristics (age, sex, marital status, educational level, occupation and location), family history of diseases (diabetes, cardiovascular disease (CVD), or asthma), lifestyle (smoking and alcohol intake), comorbidities (such as skin infections, sickle cell disease, urinary tract infections, and pneumonia) and complications of diabetes (CVD, nephropathy, neuropathy, foot ulcers, and diabetic ketoacidosis) were analyzed using Stata version 16.0 and Python 3.6.1 programming language. Both descriptive and inferential statistics were done to describe and build predictive models respectively. The performance of machine learning (ML) techniques such as support vector machine (SVM), decision tree, k nearest neighbor (kNN), eXtreme Gradient Boosting (XGBoost) and logistic regression were evaluated using the best-fitting predictive model for T2DM mortality. Results: Of the 328 participants, 183 (55.79%) were female, and the percentage of mortality was 11.28%. A 100% mortality was recorded among the T2DM patients with sepsis (p-value = 0.012). T2DM in-patients were 3.83 times as likely to die [AOR = 3.83; 95% CI: (1.53–9.61)] if they had nephropathy compared to T2DM in-patients without nephropathy (p-value = 0.004). 
The full model which included sociodemographic characteristics, family history, lifestyle variables and complications of T2DM had the best prediction of T2DM mortality outcome (ROC = 72.97%). The accuracy for (test and train datasets) were as follows: (90% and 90%), (100% and 100%), (90% and 90%), (90% and 88%) and (88% and 90%) respectively for the various ML classification techniques: logistic regression, Decision tree classifier, kNN classifier, SVM and XGBoost. Conclusion: This study found that all in-patients with sepsis died. Nephropathy was the identified significant predictor of T2DM mortality. Decision tree classifier provided the best classifying potential. [ABSTRACT FROM AUTHOR]
- Published
- 2025
18. Early childhood caries risk prediction using machine learning approaches in Bangladesh.
- Author
-
Hasan, Fardous, Tantawi, Maha El, Haque, Farzana, Foláyan, Moréniké Oluwátóyìn, and Virtanen, Jorma I.
- Subjects
RISK assessment ,RANDOM forest algorithms ,BOOSTING algorithms ,PREDICTION models ,RECEIVER operating characteristic curves ,SECONDARY analysis ,QUESTIONNAIRES ,DISEASE prevalence ,AGE distribution ,SUPPORT vector machines ,HEALTH behavior ,PSYCHOLOGY of mothers ,DENTAL plaque ,DENTAL caries ,EARLY diagnosis ,MACHINE learning ,TOOTH care & hygiene ,ALGORITHMS ,EDUCATIONAL attainment ,DISEASE risk factors ,CHILDREN - Abstract
Background: In the last years, artificial intelligence (AI) has contributed to improving healthcare including dentistry. The objective of this study was to develop a machine learning (ML) model for early childhood caries (ECC) prediction by identifying crucial health behaviours within mother-child pairs. Methods: For the analysis, we utilized a representative sample of 724 mothers with children under six years in Bangladesh. The study utilized both clinical and survey data. ECC was assessed using ICDAS II criteria in the clinical examinations. Recursive Feature Elimination (RFE) and Random Forest (RF) was applied to identify the optimal subsets of features. Random forest classifier (RFC), extreme gradient boosting (XGBoost), support vector machine (SVM), adaptive boosting (AdaBoost), and multi-layer perceptron (MLP) models were used to identify the best fitted model as the predictor of ECC. SHAP and MDG-MDA plots were visualized for model interpretability and identify significant predictors. Results: The RFC model identified 10 features as the most relevant for ECC prediction obtained by RFE feature selection method. The features were: plaque score, age of child, mother's education, number of siblings, age of mother, consumption of sweet, tooth cleaning tools, child's tooth brushing frequency, helping child brushing, and use of F-toothpaste. The final ML model achieved an AUC-ROC score (0.77), accuracy (0.72), sensitivity (0.80) and F1 score (0.73) in the test set. Of the prediction model, dental plaque was the strongest predictor of ECC (MDG: 0.08, MDA: 0.10). Conclusions: Our final ML model, integrating 10 key features, has the potential to predict ECC effectively in children under five years. Additional research is needed for validation and optimization across various groups. [ABSTRACT FROM AUTHOR]
- Published
- 2025
19. Predictive Mortality and Gastric Cancer Risk Using Clinical and Socio-Economic Data: A Nationwide Multicenter Cohort Study.
- Author
-
Kang, Seong Uk, Nam, Seung-Joo, Kwon, Oh Beom, Yim, Inhyeok, Kim, Tae-Hoon, Yeo, Na Young, Lim, Myoung Nam, Kim, Woo Jin, and Park, Sang Won
- Subjects
- *
RISK assessment , *LIFESTYLES , *RANDOM forest algorithms , *BOOSTING algorithms , *LYMPH nodes , *STOMACH tumors , *PREDICTION models , *HUMAN services programs , *RECEIVER operating characteristic curves , *RESEARCH funding , *SOCIOECONOMIC factors , *SMOKING , *CAUSES of death , *DESCRIPTIVE statistics , *LONGITUDINAL method , *RESEARCH , *MACHINE learning , *ALCOHOL drinking , *DATA analysis software , *PROPORTIONAL hazards models , *DIABETES , *SENSITIVITY & specificity (Statistics) , *DISEASE risk factors - Abstract
Simple Summary: Gastric cancer (GC) affects more than one million and is the fifth most frequently diagnosed cancer and the fourth leading cause of cancer death worldwide. Past studies have usually focused on a limited number of clinical or demographic factors to predict GC prognosis. In contrast, we used twenty-four features, including demographic, laboratory, clinical, and socio-economic information, to predict GC mortality. We investigated two case groups divided by cause of mortality (all-cause and disease-specific) with the construction of six machine learning (ML) models. In addition, the Shapley Additive Explanation (SHAP) method, an explainable artificial intelligence technique, was used. This approach allows us to identify and interpret the key features that have a significant impact on GC mortality. Key predictors of the mortality classification model included occurrence in other organs, age at diagnosis, AJCC7 stage, tumor size, CEA, smoking, and CA19-9. Accurate prediction of mortality and detection of risk factors for GC based on ML might provide opportunities for appropriate therapeutic interventions and decision-making. Background/Objectives: Gastric cancer is a leading cause of cancer-related mortality, particularly in East Asia, with a notable burden in Republic of Korea. This study aimed to construct and develop machine learning models for the prediction of gastric cancer mortality and the identification of risk factors. Methods: All data were acquired from the Korean Clinical Data Utilization for Research Excellence by multiple medical centers in South Korea. A total of 23,717 gastric cancer patients were divided into two groups by cause of mortality (all-cause of 2664 and disease-specific of 1620) and investigated. We used comprehensive data integrating clinical, pathological, lifestyle, and socio-economic factors. Cox proportional hazards analysis was conducted to estimate hazard ratios for mortality. 
Five machine learning models (random forest, gradient boosting machine, XGBoost, light GBM, and cat boosting) were developed to predict mortality. The models were interpreted by SHAP, one of the explainable AI techniques. Results: For all-cause mortality, the gradient-boosting machine learning model demonstrated the highest performance with an AUC-ROC of 0.795. For disease-specific mortality, the light GBM model outperformed others, achieving an AUC-ROC of 0.867. Significant predictors included the AJCC7 stage, tumor size, lymph node count, and lifestyle factors such as smoking, drinking, and diabetes. Conclusions: This study underscores the importance of integrating both clinical and lifestyle data to enhance mortality prediction accuracy in gastric cancer patients. The findings highlight the need for personalized treatment approaches in the Korean population and emphasize the role of demographic-specific data in predictive modeling. [ABSTRACT FROM AUTHOR]
- Published
- 2025
20. Price Prediction for Fresh Agricultural Products Based on a Boosting Ensemble Algorithm.
- Author
-
Zhang, Nana, An, Qi, Zhang, Shuai, and Ma, Huanhuan
- Subjects
- *
ARTIFICIAL neural networks , *MACHINE learning , *ENSEMBLE learning , *FARM produce prices , *SUSTAINABLE agriculture , *BOOSTING algorithms - Abstract
The time series of agricultural prices exhibit brevity and considerable volatility. Considering that traditional time series models and machine learning models are facing challenges in making predictions with high accuracy and robustness, this paper proposes a Light gradient boosting machine model based on the boosting ensemble learning algorithm to predict prices for three representative types of fresh agricultural products (bananas, beef, crucian carp). The prediction performance of the Light gradient boosting machine model is evaluated by comparing it against multiple benchmark models (ARIMA, decision tree, random forest, support vector machine, XGBoost, and artificial neural network) in terms of accuracy, generalizability, and robustness on different datasets and under different time windows. Among these models, the Light gradient boosting machine model is shown to have the highest prediction accuracy and the most stable performance across three different datasets under both long-term and short-term time windows. As the time window length increases, the Light gradient boosting machine model becomes more advantageous for effectively reducing error fluctuation, demonstrating better robustness. Consequently, the model proposed in this paper holds significant potential for forecasting fresh agricultural product prices, thereby facilitating the advancement of precision and sustainable farming practices. [ABSTRACT FROM AUTHOR]
- Published
- 2025
21. Combining Postural Sway Parameters and Machine Learning to Assess Biomechanical Risk Associated with Load-Lifting Activities.
- Author
-
Prisco, Giuseppe, Pirozzi, Maria Agnese, Santone, Antonella, Cesarelli, Mario, Esposito, Fabrizio, Gargiulo, Paolo, Amato, Francesco, and Donisi, Leandro
- Subjects
- *
MACHINE learning , *BOOSTING algorithms , *WEIGHT lifting , *ARTIFICIAL intelligence , *LUMBOSACRAL region - Abstract
Background/Objectives: Long-term work-related musculoskeletal disorders are predominantly influenced by factors such as the duration, intensity, and repetitive nature of load lifting. Although traditional ergonomic assessment tools can be effective, they are often challenging and complex to apply due to the absence of a streamlined, standardized framework. Recently, integrating wearable sensors with artificial intelligence has emerged as a promising approach to effectively monitor and mitigate biomechanical risks. This study aimed to evaluate the potential of machine learning models, trained on postural sway metrics derived from an inertial measurement unit (IMU) placed at the lumbar region, to classify risk levels associated with load lifting based on the Revised NIOSH Lifting Equation. Methods: To compute postural sway parameters, the IMU captured acceleration data in both anteroposterior and mediolateral directions, aligning closely with the body's center of mass. Eight participants undertook two scenarios, each involving twenty consecutive lifting tasks. Eight machine learning classifiers were tested utilizing two validation strategies, with the Gradient Boost Tree algorithm achieving the highest accuracy and an Area under the ROC Curve of 91.2% and 94.5%, respectively. Additionally, feature importance analysis was conducted to identify the most influential sway parameters and directions. Results: The results indicate that the combination of sway metrics and the Gradient Boost model offers a feasible approach for predicting biomechanical risks in load lifting. Conclusions: Further studies with a broader participant pool and varied lifting conditions could enhance the applicability of this method in occupational ergonomics. [ABSTRACT FROM AUTHOR]
- Published
- 2025
22. A Hybrid Model for Psoriasis Subtype Classification: Integrating Multi Transfer Learning and Hard Voting Ensemble Models †.
- Author
-
Avcı, İsmail Anıl, Zirekgür, Merve, Karakaya, Barış, and Demir, Betül
- Subjects
- *
ENSEMBLE learning , *BOOSTING algorithms , *DATA augmentation , *K-nearest neighbor classification , *BLENDED learning - Abstract
Background: Psoriasis is a chronic, immune-mediated skin disease characterized by lifelong persistence and fluctuating symptoms. The clinical similarities among its subtypes and the diversity of symptoms present challenges in diagnosis. Early diagnosis plays a vital role in preventing the spread of lesions and improving patients' quality of life. Methods: This study proposes a hybrid model combining multiple transfer learning and ensemble learning methods to classify psoriasis subtypes accurately and efficiently. The dataset includes 930 images labeled by expert dermatologists from the Dermatology Clinic of Fırat University Hospital, representing four distinct subtypes: generalized, guttate, plaque, and pustular. Class imbalance was addressed by applying synthetic data augmentation techniques, particularly for the rare subtype. To reduce the influence of nonlesion environmental factors, the images underwent systematic cropping and preprocessing steps, such as Gaussian blur, thresholding, morphological operations, and contour detection. DenseNet-121, EfficientNet-B0, and ResNet-50 transfer learning models were utilized to extract feature vectors, which were then combined to form a unified feature set representing the strengths of each model. The feature set was divided into 80% training and 20% testing subsets and evaluated using a hard voting classifier consisting of logistic regression, random forest, support vector classifier, k-nearest neighbors, and gradient boosting algorithms. Results: The proposed hybrid approach achieved 93.14% accuracy, 96.75% precision, and an F1 score of 91.44%, demonstrating superior performance compared to individual transfer learning models. Conclusions: This method offers significant potential to enhance the classification of psoriasis subtypes in clinical and real-world settings. [ABSTRACT FROM AUTHOR]
- Published
- 2025
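Hard voting, as used in the abstract above, takes the majority label across classifiers for each sample. The sketch below assumes each classifier has already produced one label per sample. The vote table is hypothetical, and the tie-breaking rule (first label encountered wins) is a choice made here, not something the abstract specifies.

```python
from collections import Counter

def hard_vote(predictions):
    """predictions: one list of labels per classifier, all of equal length.
    Returns the most common label for each sample; on a tie, the label
    that appears first among the classifiers wins."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Hypothetical outputs of five classifiers on three images.
votes = [
    ["plaque", "guttate", "pustular"],
    ["plaque", "plaque", "pustular"],
    ["guttate", "guttate", "generalized"],
    ["plaque", "guttate", "pustular"],
    ["plaque", "plaque", "generalized"],
]
ensemble = hard_vote(votes)
```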
23. The Key Descriptors for Predicting the Exciton Binding Energy of Organic Photovoltaic Materials.
- Author
-
Zhu, Lingyun, Huang, Miaofei, Han, Guangchao, Wei, Zhixiang, and Yi, Yuanping
- Subjects
- *
MACHINE learning , *BOOSTING algorithms , *ENERGY dissipation , *OPTOELECTRONIC devices , *SOLAR cells - Abstract
Exciton binding energy (Eb) is a key parameter to determine the mechanism and performance of organic optoelectronic devices. Small Eb benefits to reduce the interfacial energy offset and the energy loss of organic solar cells. However, quantum‐chemical calculations of the Eb in solid state with considering electronic polarization effects are extremely time‐consuming. Furthermore, current studies lack critical descriptors. Here, we use data‐driven machine learning (ML) to accelerate the computation and identify the key descriptors most relevant to the solid‐state Eb. The results verify two key descriptors associated with molecular and aggregation‐state properties for efficient prediction of the solid‐state Eb. Moreover, a very high accuracy is achieved by using the extreme gradient boosting algorithm, with the Pearson's correlation coefficient of 0.92. Finally, we use this ML model to predict the Eb of thin films, which is difficult to achieve using the current quantum‐chemical calculations due to the large structural disorder. Remarkably, the predicted thin‐film Eb values are fully consistent with the results of temperature‐dependent photoluminescence spectra. Therefore, our work provides an accurate and efficient approach to predict the solid‐state Eb and would be helpful to accelerate the exploitation of novel promising organic photovoltaic materials. [ABSTRACT FROM AUTHOR]
- Published
- 2025
24. Improving Clinical Preparedness: Community Health Nurses and Early Hypoglycemia Prediction in Type 2 Diabetes Using Hybrid Machine Learning Techniques.
- Author
-
Gaikwad, Sachin Ramnath, Bontha, Mallikarjun Reddy, Devi, Seeta, and Dumbre, Dipali
- Subjects
- *
RANDOM forest algorithms , *BOOSTING algorithms , *COMMUNITY health nurses , *EARLY medical intervention , *PREDICTION models , *RESEARCH funding , *RECEIVER operating characteristic curves , *COMMUNITY health nursing , *INTERVIEWING , *DESCRIPTIVE statistics , *CHI-squared test , *BLOOD sugar , *TYPE 2 diabetes , *ARTIFICIAL neural networks , *DATA quality , *MACHINE learning , *COMPARATIVE studies , *HYPOGLYCEMIA , *ALGORITHMS , *REGRESSION analysis , *SYMPTOMS - Abstract
Objectives: The aim of the study was to analyze the data of diabetic patients regarding warning signs of hypoglycemia to predict it at an early stage using various novel machine learning (ML) algorithms. Individual interviews with diabetic patients were conducted over 6 months to acquire information regarding their experience with hypoglycemic episodes. Design: This information included warning signs of hypoglycemia, such as incoherent speech, exhaustion, weakness, and other clinically relevant cases of low blood sugar. Researchers used supervised, unsupervised, and hybrid techniques. In supervised techniques, researchers applied regression, while in hybrid classification ML techniques were used. In a 5‐fold cross‐validation approach, the prediction performance of seven models was examined using the area under the receiver operating characteristic curve (AUROC). We analyzed the data of 290 diabetic patients with low blood sugar episodes. Results: Our investigation discovered that gradient boosting and neural networks performed better in regression, with accuracies of 0.416 and 0.417, respectively. In classification models, gradient boosting, AdaBoost, and random forest performed better overall, with AUC scores of 0.821, 0.814, and 0.821, individually. Precision values were 0.779, 0.775, and 0.776 for gradient boosting, AdaBoost, and random forest, respectively. Conclusion: AdaBoost and Gradient Boosting models, in particular, outperformed all others in predicting the probability of clinically severe hypoglycemia. These techniques enable community health nurses to predict hypoglycemia at an early stage and provide the necessary therapies to patients to prevent complications resulting from hypoglycemia. [ABSTRACT FROM AUTHOR]
- Published
- 2025
25. Boosting the HP filter for trending time series with long-range dependence.
- Author
-
Biswas, Eva, Sabzikar, Farzad, and Phillips, Peter C. B.
- Subjects
- *
BOOSTING algorithms , *BROWNIAN motion , *FOURIER series , *TIME series analysis , *DETERMINISTIC processes - Abstract
This article extends recent asymptotic theory developed for the Hodrick Prescott (HP) filter and boosted HP (bHP) filter to long-range dependent time series that have fractional Brownian motion (fBM) limit processes after suitable standardization. Under general conditions, it is shown that the asymptotic form of the HP filter is a smooth curve, analogous to the finding in Phillips and Jin for integrated time series and series with deterministic drifts. Boosting the filter using the iterative procedure suggested in Phillips and Shi leads under well-defined rate conditions to a consistent estimate of the fBM limit process or the fBM limit process with an accompanying deterministic drift when that is present. A stopping criterion is used to automate the boosting algorithm, giving a data-determined method for practical implementation. The theory is illustrated in simulations and two real data examples that highlight the differences between simple HP filtering and the use of boosting. The analysis is assisted by employing a uniformly and almost surely convergent trigonometric series representation of fBM. [ABSTRACT FROM AUTHOR]
- Published
- 2025
26. Predicting dyslipidemia in Chinese elderly adults using dietary behaviours and machine learning algorithms.
- Author
-
Wang, Biying, Lin, Luotao, Wang, Wenjun, Song, Hualing, and Xu, Xianglong
- Subjects
- *
RISK assessment , *CROSS-sectional method , *BOOSTING algorithms , *RANDOM forest algorithms , *HYPERLIPIDEMIA , *LOGISTIC regression analysis , *DESCRIPTIVE statistics , *SUPPORT vector machines , *FOOD habits , *MACHINE learning , *DIET , *ALGORITHMS , *DISEASE risk factors - Abstract
We aimed to predict dyslipidemia risk in elderly Chinese adults using machine learning and dietary analysis for public health. This cross-sectional study includes 13,668 Chinese adults aged 65 or older from the 2018 Chinese Longitudinal Healthy Longevity Survey. Dyslipidemia prediction was carried out using a variety of machine learning algorithms, including Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Gaussian Naive Bayes (GNB), Gradient Boosting Machine (GBM), Adaptive Boosting Classifier (AdaBoost), Light Gradient Boosting Machine (LGBM), and K-Nearest Neighbour (KNN), as well as conventional logistic regression (LR). The prevalence of dyslipidemia among eligible participants was 5.4 %. LGBM performed best in predicting dyslipidemia, followed by LR, XGBoost, SVM, GBM, AdaBoost, RF, GNB, and KNN (all AUC > 0.70). Frequency of nut product consumption, childhood water source, and housing types were key predictors for dyslipidemia. Machine learning algorithms that integrated dietary behaviours accurately predicted dyslipidemia in elderly Chinese adults. Our research identified novel predictors such as the frequency of nut product consumption, the main source of drinking water during childhood, and housing types, which could potentially prevent and control dyslipidemia in elderly adults. [ABSTRACT FROM AUTHOR]
- Published
- 2025
27. Data preprocessing methods for selective sweep detection using convolutional neural networks.
- Author
-
Zhao, Hanqing and Alachiotis, Nikolaos
- Subjects
- *
CONVOLUTIONAL neural networks , *CLASSIFICATION algorithms , *POPULATION genetics , *ALGORITHMS , *PIXELS , *BOOSTING algorithms - Abstract
The identification of positive selection has been framed as a classification task, with Convolutional Neural Networks (CNNs) already outperforming summary statistics and likelihood-based approaches in accuracy. Despite the prevalence of CNN-based methods that manipulate the pixels of images representing raw genomic data as a preprocessing step to improve classification accuracy, the efficacy of these pixel-rearrangement techniques remains inadequately examined, particularly in the presence of confounding factors like population bottlenecks, migration and recombination hotspots. We introduce a set of pixel rearrangement algorithms aimed at enhancing CNN classification accuracy in detecting selective sweeps. These algorithms are employed to assess the performance of four CNN models for selective sweep detection. Our findings illustrate that the judicious application of rearrangement algorithms notably enhances the overall classification accuracy of a CNN across various datasets simulating confounding factors. We observed that sorting the columns of the genomic matrices has a higher impact on CNN performance than rearranging the sequences. To some extent, these rearrangement algorithms are more robust to misspecified demographic models compared with the utilization of the default preprocessing algorithm as suggested by the respective authors of each CNN architecture. We provide the data rearrangement algorithms as a distinct package available for download at: https://github.com/Zhaohq96/Genetic-data-rearrangement. • Data rearrangement algorithms can boost the overall classification accuracy of CNNs in identifying selective sweeps. • To some extent, data rearrangement algorithms improve classification robustness to demographic model misspecification. • Suitable rearrangement algorithms per CNN are robust to varying genomic window sizes. [ABSTRACT FROM AUTHOR]
- Published
- 2025
28. Gait Pattern Recognition through Force Sensor Platform based on XGBoost Model and Harris' Hawks Optimization.
- Author
-
Lie Yu, Pengzhi Mei, and Lei Ding
- Subjects
- *
OPTIMIZATION algorithms , *SHOE soles , *DATA transmission systems , *BOOSTING algorithms , *MICROCONTROLLERS , *ALGORITHMS - Abstract
This study developed a gait pattern classification system based on ground contact forces measured by six force sensors embedded inside the shoe sole. The data transmission is facilitated via the Bluetooth module integrated into an STM32 microcontroller. The extreme gradient boosting (XGBoost) algorithm is used to identify the gait patterns, and the basic idea of XGBoost is to use second-order derivatives to make the loss function more precise, incorporate regularization to prevent tree overfitting, and enable block storage for parallel computation. By optimizing the XGBoost algorithm with four algorithms, the exploration capabilities of these algorithms are effectively incorporated into the fusion model. Experimental results indicate that the XGBoost algorithm optimized by Harris' hawks optimization (HHO) outperforms the other optimization algorithms. Specifically, the HHO-XGBoost achieved high values of 97.41%, 97.03%, and 97.22%, respectively, in the metrics of precision, recall, and F1 score. This research illustrates the HHO-XGBoost method's superiority in gait phase recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2025
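The "second-order derivatives" and "regularization" that the abstract above attributes to XGBoost correspond to the standard formulation of its per-round objective. The notation below follows the usual presentation of the algorithm and is not taken from this paper: $g_i$ and $h_i$ are the first and second derivatives of the loss with respect to the current prediction for sample $i$, $f_t$ is the tree added at round $t$, $T$ is its number of leaves, $w_j$ is the weight of leaf $j$, $I_j$ is the set of samples falling in leaf $j$, and $\gamma$, $\lambda$ are regularization parameters.

```latex
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t),
\qquad
\Omega(f_t) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2

w_j^{*} = -\frac{G_j}{H_j + \lambda},
\qquad
G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i
```

A larger $\lambda$ shrinks every leaf weight toward zero, and $\gamma$ penalizes each additional leaf, which is how the regularization term discourages overly complex trees.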
29. XGBoost Based Multiclass NLOS Channels Identification in UWB Indoor Positioning System.
- Author
-
Majeed, Ammar Fahem, Arsat, Rashidah, Baharudin, Muhammad Ariff, Abdul Latiff, Nurul Mu'azzah, and Albaidhani, Abbas
- Subjects
INDOOR positioning systems ,SEARCH algorithms ,LOCATION-based services ,GENETIC algorithms ,MACHINE learning ,BOOSTING algorithms - Abstract
Accurate non-line of sight (NLOS) identification technique in ultra-wideband (UWB) location-based services is critical for applications like drone communication and autonomous navigation. However, current methods using binary classification (LOS/NLOS) oversimplify real-world complexities, with limited generalisation and adaptability to varying indoor environments, thereby reducing the accuracy of positioning. This study proposes an extreme gradient boosting (XGBoost) model to identify multi-class NLOS conditions. We optimise the model using grid search and genetic algorithms. Initially, the grid search approach is used to identify the most favourable values for integer hyperparameters. In order to achieve an optimised model configuration, the genetic algorithm is employed to fine-tune the floating-point hyperparameters. The model evaluations utilise a wide-ranging dataset of real-world measurements obtained with a Qorvo DW1000 UWB device, covering various indoor scenarios. Experimental results show that our proposed XGBoost achieved the highest overall accuracy of 99.47%, precision of 99%, recall of 99%, and an F-score of 99% on an open-source dataset. Additionally, based on a local dataset, the model achieved the highest performance, with an accuracy of 96%, precision of 96%, recall of 97%, and an F-score of 97%. In contrast to current machine learning methods in the literature, the suggestion model enhances classification accuracy and effectively addresses the NLOS/LOS identification as a multiclass propagation channel. This approach provides a robust solution with generalisation and adaptability across various dataset types and environments for more reliable and accurate indoor positioning technologies. [ABSTRACT FROM AUTHOR]
- Published
- 2025
30. Forestry climate adaptation with HarvesterSeasons service—a gradient boosting model to forecast soil water index SWI from a comprehensive set of predictors in Destination Earth.
- Author
-
Strahlendorff, Mikko, Kröger, Anni, Prakasam, Golda, Kosmale, Miriam, Moisander, Mikko, Ovaskainen, Heikki, and Poikela, Asko
- Subjects
MACHINE learning ,STANDARD deviations ,BOOSTING algorithms ,NUMERICAL weather forecasting ,LONG-range weather forecasting - Abstract
Soil wetness forecasts on a local level are needed to ensure sustainable forestry operations during summer when the soil is neither frozen nor covered with snow. Training gradient boosting models has been successful in predicting satellite observation-based products into the future using Numerical Weather Prediction (NWP) and Earth Observation (EO) climate data as inputs. The Copernicus Global Land Monitoring Service's Soil Water Index (SWI) satellite-based observations from 2015 to 2023 at 10,000 locations in Europe were used as the predictand (target parameter) to train an artificial intelligence (AI) model to predict soil wetness with XGBoost (eXtreme Gradient Boosting) and LightGBM (Light Gradient Boosting Machine) implementations of gradient boosting algorithms. The locations were selected as a representative set of points from the Land Use/Cover Area Frame Survey (LUCAS) sites, which helped evaluate the characteristics of distinct locations used in fitting to represent diverse landscapes across Europe. Over 40 predictors, mainly from ERA5-Land reanalysis, were used in the final model. Over 70 predictors were tested, including the climatology of EO based predictors like SWI and Leaf-Area Index (LAI). The final model achieved a mean absolute error of 5.5% and a root mean square error of 7% for variable values ranging from 0% to 100%, an accuracy sufficient for forestry use case. To further validate the model, SWI prediction was made using the 215-day seasonal forecast ensemble from April 2021, consisting of 51 members. With this, the quality could also be demonstrated in the way our forestry climate service (HarvesterSeasons.com) would use the forecasts. As soil wetness is not changing as rapidly as many weather parameters, the forecast skill appears to last longer for it than for the weather variables. 
The technology demonstration and machine learning work were conducted as a part of the HarvesterDestinE project, supported by European Union Destination Earth funding managed by the European Center for Medium-Range Weather Forecasts (ECMWF) contract DE_370d_FMI. The authors wish to acknowledge CSC – IT Center for Science, Finland, for computational resources. The code for the machine learning work and the predictions are available as open source at https://github.com/fmidev/ml-harvesterseasons (see README-SWI2). The training data and ML models are at https://destine.data.lit.fmi.fi/soilwater/. All data used for predictions are accessible from the SmartMet server at https://desm.harvesterseasons.com/grid-gui and the work flow is available in the script https://github.com/fmidev/harvesterseasons-smartmet/blob/master/bin/get-seasonal.sh Everything is made available for ensuring reproducibility. One will need to register and use their own https://cds.climate.copernicus.eu credentials for doing so. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
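The soil wetness abstract above reports a mean absolute error (MAE) and a root mean square error (RMSE) on a 0% to 100% scale. As a minimal sketch of how these two metrics are computed, the following Python example uses made-up observed and predicted values; the function names and the sample numbers are illustrative assumptions and are not taken from the authors' code or data.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average of the absolute differences between observations and predictions."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_square_error(observed, predicted):
    """Square root of the average squared difference; penalizes large errors more than MAE."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical soil wetness values in percent (illustrative only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 35.0]

mae = mean_absolute_error(observed, predicted)
rmse = root_mean_square_error(observed, predicted)
print(f"MAE = {mae:.2f}%, RMSE = {rmse:.2f}%")
```

RMSE is never smaller than MAE on the same data, which is consistent with the 7% and 5.5% figures quoted in the abstract.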
31. IDSSCNN-XgBoost: Improved Dual-Stream Shallow Convolutional Neural Network Based on Extreme Gradient Boosting Algorithm for Micro Expression Recognition.
- Author
-
Ahmad, Adnan, Li, Zhao, Tariq, Irfan, and He, Zhengran
- Subjects
ARTIFICIAL neural networks ,CONVOLUTIONAL neural networks ,BOOSTING algorithms ,SCIENCE databases ,DATABASES - Abstract
Micro-expression (ME) recognition is a complex task that requires advanced techniques to extract informative features from facial expressions. Numerous deep neural networks (DNNs) with convolutional structures have been proposed. However, shallow convolutional neural networks often outperform deeper models in mitigating overfitting, particularly with small datasets. Still, many of these methods rely on a single feature for recognition, resulting in an insufficient ability to extract highly effective features. To address this limitation, in this paper, an Improved Dual-stream Shallow Convolutional Neural Network based on an Extreme Gradient Boosting Algorithm (IDSSCNN-XgBoost) is introduced for ME recognition. The proposed method utilizes a dual-stream architecture in which motion vectors (temporal features) are extracted using Optical Flow TV-L1 and subtle changes (spatial features) are amplified via Eulerian Video Magnification (EVM). These features are processed by IDSSCNN, with an attention mechanism applied to refine the extracted effective features. The outputs are then fused, concatenated, and classified using the XgBoost algorithm. This comprehensive approach significantly improves recognition accuracy by leveraging the strengths of both temporal and spatial information, supported by the robust classification power of XgBoost. The proposed method is evaluated on three publicly available ME databases named Chinese Academy of Sciences Micro-expression Database (CASMEII), Spontaneous Micro-Expression Database (SMIC-HS), and Spontaneous Actions and Micro-Movements (SAMM). Experimental results indicate that the proposed model can achieve outstanding results compared to recent models. The accuracy results are 79.01%, 69.22%, and 68.99% on CASMEII, SMIC-HS, and SAMM, and the F1-scores are 75.47%, 68.91%, and 63.84%, respectively. The proposed method has the advantage of operational efficiency and less computational time. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
32. CSRWA: Covert and Severe Attacks Resistant Watermarking Algorithm.
- Author
-
Majeed, Balsam Dhyia, Taherinia, Amir Hossein, Yazdi, Hadi Sadoghi, and Harati, Ahad
- Subjects
DIGITAL watermarking ,CONVOLUTIONAL neural networks ,BOOSTING algorithms ,WATERMARKS ,ALGORITHMS - Abstract
Watermarking is embedding visible or invisible data within media to verify its authenticity or protect copyright. The watermark is embedded in significant spatial or frequency features of the media to make it more resistant to intentional or unintentional modification. Some of these features are important perceptual features according to the human visual system (HVS), which means that the embedded watermark should be imperceptible in these features. Therefore, both the designers of watermarking algorithms and potential attackers must consider these perceptual features when carrying out their actions. The two roles will be considered in this paper when designing a robust watermarking algorithm against the most harmful attacks, like volumetric scaling, histogram equalization, and non-conventional watermarking attacks like the Denoising Convolution Neural Network (DnCNN), which must be considered in watermarking algorithm design due to its rising role in the state-of-the-art attacks. The DnCNN is initialized and trained using watermarked image samples created by our proposed Covert and Severe Attacks Resistant Watermarking Algorithm (CSRWA) to prove its robustness. For this algorithm to satisfy the robustness and imperceptibility tradeoff, implementing the Dither Modulation (DM) algorithm is boosted by utilizing the Just Noticeable Distortion (JND) principle to get an improved performance in this sense. Sensitivity, luminance, inter and intra-block contrast are used to adjust the JND values. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
33. Efficient OpenMP Based Z-curve Encoding and Decoding Algorithms.
- Author
-
Zhou, Zicheng, Sun, Shaowen, Liang, Teng, Li, Mengjuan, and Xia, Fengling
- Subjects
BOOSTING algorithms ,ALGORITHMS ,DECODING algorithms ,ENCODING - Abstract
Z-curve encoding and decoding algorithms are of prime importance in many Z-curve-based applications. The bit interleaving algorithm is the current state-of-the-art algorithm for encoding and decoding the Z-curve. Although simple, its efficiency is hindered by step-by-step coordinate shifting and bitwise operations. To tackle this problem, we first propose the efficient encoding algorithm LTFe and the corresponding decoding algorithm LTFd, which adopt two optimization methods to boost the algorithm's efficiency: 1) we design efficient lookup tables (LT) that convert encoding and decoding operations into table-lookup operations; 2) we design a bit detection mechanism that skips partial order of a coordinate or a Z-value with consecutive 0s in the front, avoiding unnecessary iterative computations. We propose order-parallel and point-parallel OpenMP-based algorithms to exploit modern multi-core hardware. Experimental results on discrete, skewed, and real datasets indicate that our point-parallel algorithms can be up to 12.6× faster than the existing algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
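The Z-curve abstract above contrasts bit-by-bit interleaving with a lookup-table approach. The Python sketch below illustrates both ideas for 2D points with non-negative integer coordinates. It is not the authors' LTFe/LTFd implementation: the function names, the 8-bit table size, the choice of placing x bits in even positions, and the early-exit loop (a loose analogue of skipping leading zeros) are all assumptions made for illustration.

```python
def interleave_naive(x: int, y: int, bits: int = 16) -> int:
    """Bit-by-bit Z-curve (Morton) encoding: one shift and mask per coordinate bit."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions (assumed convention)
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return z


# Lookup table: for every byte value, the same bits spread out to even positions.
SPREAD = [interleave_naive(b, 0, 8) for b in range(256)]


def interleave_lut(x: int, y: int) -> int:
    """Table-driven encoding: handles 8 bits of each coordinate per iteration."""
    z = 0
    shift = 0
    while x or y:  # stop once the remaining high-order bits are all zero
        chunk = SPREAD[x & 0xFF] | (SPREAD[y & 0xFF] << 1)
        z |= chunk << (2 * shift)
        x >>= 8
        y >>= 8
        shift += 8
    return z


def decode_naive(z: int, bits: int = 16) -> tuple[int, int]:
    """Inverse of interleave_naive: split even and odd bits back into (x, y)."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y


print(interleave_naive(3, 5), interleave_lut(3, 5), decode_naive(39))
```

The table-driven version replaces sixteen per-bit steps with two table lookups per byte, which is the general idea behind converting encoding operations into table-lookup operations; the parallel OpenMP variants described in the abstract are outside the scope of this sketch.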
34. Pseudo-static slope stability analysis using explainable machine learning techniques.
- Author
-
Waris, Kenue Abdul, Fayaz, Sheikh Junaid, Reddy, Alluri Harshith, and Basha, B. Munwar
- Subjects
ARTIFICIAL neural networks ,MACHINE learning ,MONTE Carlo method ,KRIGING ,ARTIFICIAL intelligence ,BOOSTING algorithms - Abstract
This research focuses on developing the optimal machine learning (ML) based predictive model for calculating the factor of safety (FSMP) for finite slopes using the Morgenstern-Price method of slices. The ML models utilize geometric and geotechnical parameters, including slope angle, friction angle, cohesion, slope height, unit weight, horizontal seismic acceleration coefficient, and the ratio of horizontal to vertical seismic acceleration coefficients. A comprehensive dataset of 19,128 data points is generated using in-house MATLAB code. The ML models are trained on these data points to learn the underlying correlations for the prediction of FSMP. Various ML predictive models, such as multiple linear regression, support vector regression, Gaussian process regression, random forest, extreme gradient boosting, and artificial neural networks, are considered for constructing the optimal model. The objective is to develop a tailored framework for arriving at the best-performing predictive model for replication of pseudo-static stability analysis of soil slopes in geotechnical engineering. A comparison of different data-driven models is also presented. The study also utilized interpretable machine learning models with Shapley values to mitigate the inherent "black box" nature of ML models. The study also establishes a physically interpretable error validation model to assess model predictions. The findings illustrate the effectiveness and precision of the Gaussian process regression (GPR) model, as evidenced by R² values of 99.9% and 99.8% for the training and test sets, respectively. Further, the R² for the artificial neural network (ANN) reached 99.7% and 99.6% for the training and test sets, respectively. The GPR model offers conservative results over ANN, making it the preferred predictive model for safe FSMP predictions. It serves as an efficient estimation tool for field practitioners, can be integrated into smartphones and, above all, integrated into the performance function for uncertainty quantification in the otherwise computationally expensive Monte Carlo simulations. Design charts are also generated using the selected optimal model for depicting the generalizability of this model, enabling geotechnical engineers to determine FSMP without complex calculations. This research showcases the potential of ML techniques for complex geotechnical problems, advancing conventional slope stability analysis and opening avenues for their practical and reliable use in geotechnical engineering. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
35. Recognition of Threats in Hybrid Wireless Sensor Networks by Integrating Harris Hawks with Gradient Boosting Algorithm.
- Author
-
Rasool, Hussein Ali, Najim, Ali Hamzah, Abd Alsadh, Mustafa Hamid, and Hariz, Hussein Muhi
- Subjects
WIRELESS sensor networks ,BOOSTING algorithms ,INFRASTRUCTURE (Economics) ,DIGITAL technology ,VIRTUAL communities - Abstract
Due to the increasing sophistication and complexity of cyber-attacks, particularly in Hybrid Wireless Sensor Networks (HWSNs), digital community infrastructures face significant security challenges. The Gradient Boosting Machine (GBM) is known for its strong predictive capabilities in hazard identification, while Harris Hawks Optimization (HHO), inspired by hawk hunting behavior, enhances the efficient exploration and exploitation of the search space. The proposed method involves pre-processing the data to ensure cleanliness and consistency, followed by the application of HHO and GBM for threat detection, using the NSL-KDD, WSN-DS, and CIDDS-001 datasets. HHO’s iterative optimization process accelerates convergence toward optimal solutions, while GBM builds a robust and accurate threat detection model. This advanced approach provides network administrators and security experts with a powerful tool to protect HWSNs from malicious activities, offering real-world applicability. With high detection accuracy and efficiency, it is well-suited to address evolving threats and ensure the availability and integrity of critical infrastructure in modern network environments. Using Python for implementation, the model achieved exceptional results, with 99.6% accuracy on NSL-KDD, 99.1% on CIDDS-001, and 98.9% on WSN-DS when HHO and GBM were combined for threat recognition in HWSNs. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
36. Cheating Detection in Online Exams Using Deep Learning and Machine Learning.
- Author
-
Erdem, Bahaddin and Karabatak, Murat
- Subjects
ARTIFICIAL neural networks ,MACHINE learning ,STUDENT cheating ,SUPPORT vector machines ,EDUCATION ethics ,BOOSTING algorithms ,DEEP learning - Abstract
Featured Application: The proposed model can contribute to the field in revealing the best performance with the least error rate in deep learning and machine learning applications. It can be useful in classifying and detecting unethical behavior patterns in online distance education exams. This study aims to identify the best deep learning and machine learning models to identify the unethical behavior patterns of learners using distance education exam data of an educational institution. One hundred twenty-nine online exam data were analyzed by the researcher with three different scenarios to reveal the best model performance in regression and classification. For regression and classification, deep neural network (DNN) from deep learning algorithms and support vector machine (SVM), decision trees (DTs), k-nearest neighbor (KNN), random forest (RF), logistic regression (LR), and extreme gradient boosting (XGBoost) algorithms from machine learning algorithms were used. In the regression analysis conducted within the scope of Scenario-1, the model we proposed to detect "cheating" behavior, which is one of the unethical learner behaviors, was found to be a 5-layer DNN model with a test performance success of 80.9%. In the binary classification analysis for Scenario-2, students who "copied" from unethical behaviors were obtained with an accuracy rate of 96.9% by the model established by the 10-layer DNN algorithm we proposed. In the triple classification analysis for Scenario-3 defined in the study, the XGBoost model was found to have the highest accuracy rate of 97.7% for students who "cheated" due to unethical behaviors and the highest performance in all other metric values. In addition, SHAP and LIME methods, which are explanatory methods for the XGBoost model, which is one of the best-performing models, were applied, and the attributes and percentages affecting the model were shared. 
As a result of this study, it has been shown that the application of the most appropriate layer functions and parameter selection that will increase performance can be effective in estimating complex problems and target values that cannot be solved using classical mathematical models. The proposed models can provide educational institutions with a roadmap and insight in evaluating online examination practices and ensuring academic integrity. Future researchers may need more data sets and different analyses for better performance of the established models. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
37. Comparison of Machine Learning-Based Breakdown Pressure Prediction and Initiation Criteria in Hydraulic Fracturing Testing.
- Author
-
Ma, Dongdong, Wu, Yu, Tang, Jizhou, Hao, Yang, Pu, Hai, and Yuan, Hai
- Subjects
COMPUTATIONAL learning theory ,MACHINE learning ,POISSON'S ratio ,PATTERN recognition systems ,SUPERVISED learning ,BOOSTING algorithms ,PANTOGRAPH ,STEEL pipe
- Published
- 2025
- Full Text
- View/download PDF
38. In Situ Classification of Original Rocks by Portable Multi-Directional Laser-Induced Breakdown Spectroscopy Device.
- Author
-
Zhang, Mengyang, Fu, Hongbo, Wang, Huadong, Shi, Feifan, Jamali, Saifullah, Ding, Zongling, Wu, Bian, and Zhang, Zhirong
- Subjects
LASER-induced breakdown spectroscopy ,PETROLEUM prospecting ,SHALE oils ,SUPPORT vector machines ,K-nearest neighbor classification ,BOOSTING algorithms - Abstract
In situ rapid classification of rock lithology is crucial in various fields, including geological exploration and petroleum logging. Laser-induced breakdown spectroscopy (LIBS) is particularly well-suited for in situ online analysis due to its rapid response time and minimal sample preparation requirements. To facilitate in situ raw rock discrimination analysis, a portable LIBS device was developed specifically for outdoor use. This device built upon a previous multi-directional optimization scheme and integrated machine learning to classify seven types of original rock samples: mudstone, basalt, dolomite, sandstone, conglomerate, gypsolyte, and shale from oil logging sites. Initially, spectral data were collected from random areas of each rock sample, and a series of pre-processing steps and data dimensionality reduction were performed to enhance the accuracy and efficiency of the LIBS device. Subsequently, four classification algorithms—linear discriminant analysis (LDA), K-nearest neighbor (KNN), support vector machine (SVM), and extreme gradient boosting (XGBoost)—were employed for classification discrimination. The results were evaluated using a confusion matrix. The final average classification accuracies achieved were 95.71%, 93.57%, 92.14%, and 98.57%, respectively. This work not only demonstrates the effectiveness of the portable LIBS device in classifying various original rock types, but it also highlights the potential of the XGBoost algorithm in improving LIBS analytical performance in field scenarios and geological applications, such as oil logging sites. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
39. Investigation on eXtreme Gradient Boosting for cutting force prediction in milling.
- Author
-
Heitz, Thomas, He, Ning, Ait-Mlouk, Addi, Bachrathy, Daniel, Chen, Ni, Zhao, Guolong, and Li, Liang
- Subjects
MACHINE learning ,BOOSTING algorithms ,STANDARD deviations ,CUTTING force ,COST control - Abstract
Accurate prediction of cutting forces is critical in milling operations, with implications for cost reduction and improved manufacturing efficiency. While traditional mechanistic models provide high accuracy, their reliance on extensive milling data for force coefficient fitting poses challenges. The eXtreme Gradient Boosting algorithm offers a potential solution with reduced data requirements, yet the optimal utilization of eXtreme Gradient Boosting remains unexplored. This study investigates its effectiveness in predicting cutting forces during down-milling of Al2024. A novel framework is proposed optimizing its precision, efficiency, and user-friendliness. The model training incorporates the mechanistic force model in both time and frequency domains as new features. Through rigorous experimentation, various aspects of the eXtreme Gradient Boosting configuration are explored, including identifying the optimal number of periods for the training dataset, determining the best normalization and scaling technique, and assessing the hyperparameters' impact on model performance in terms of accuracy and computational time. The results show the remarkable effectiveness of the eXtreme Gradient Boosting model with an average normalized root mean square error of 14.7%, surpassing the 21.9% obtained by the mechanistic force model. Additionally, the machine learning model could capture the runout effect. These findings enable optimized milling operations regarding cost, accuracy and computation time. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
40. An intelligent predictive maintenance system based on random forest for addressing industrial conveyor belt challenges.
- Author
-
Wu, Mingyu, Goh, Kai Woon, Chaw, Kam Heng, Koh, Ye Sheng, Dares, Marvin, Yeong, Che Fai, Su, Eileen L. M., William, Holderbaum, and Zhang, Yunhui
- Subjects
MACHINE learning ,RANDOM forest algorithms ,DECISION trees ,INDUSTRIALISM ,BELT conveyors ,BOOSTING algorithms - Abstract
This study introduces a sophisticated intelligent predictive maintenance system for industrial conveyor belts powered by a random forest machine learning model. The random forest model was evaluated against established models such as logistic regression, neural networks, decision trees, and gradient boosting, demonstrating superior performance. The model achieved 100% accuracy in classifying gearbox lubricant levels and sprocket conditions, highlighting its potential for addressing critical challenges in predictive maintenance, such as avoiding unexpected downtime. However, further validation with larger datasets and varied operational environments is recommended to confirm robustness. This performance highlights its effectiveness in multiclass fault detection and overfitting mitigation, establishing a new standard in predictive maintenance technology. The system, enhanced by a comprehensive sensor array, not only adeptly captures but also intelligently analyzes critical operational data, providing proactive and data-driven insights for maintenance decision-making. This study not only affirms the dominance of the random forest model in predictive analytics but also underscores its pivotal role in optimizing maintenance strategies, enhancing operational efficiency, and ensuring the reliability of conveyor systems in industrial settings. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
41. Vibration analysis of embedded porous nanobeams under thermal effects using boosting machine learning algorithms and semi-analytical approach.
- Author
-
Tariq, Aiman, Uzun, Büşra, Deliktaş, Babür, and Yaylı, Mustafa Özgür
- Subjects
MACHINE learning ,BOOSTING algorithms ,STRAINS & stresses (Mechanics) ,FOURIER series ,THERMAL analysis - Abstract
This study presents a thermal vibration analysis of functionally graded porous nanobeams using boosting machine learning models and a semi-analytical approach. Nonlocal strain gradient theory is employed to explore vibration behavior, accounting for thermal and size effects. A semi-analytical solution is presented that uses Fourier series and Stokes' transform to establish an eigenvalue problem capable of examining vibrational frequencies of porous nanobeams in both rigid and deformable boundary conditions. Four boosting models, namely gradient boosting (GBoost), light gradient boosting (LGBoost), extreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost), are employed to study the impact of seven crucial parameters on natural frequencies of a nanobeam. The Sobol quasi-random space-filling method is used to generate the samples by varying input feature combinations for different porous nanobeam distributions. The model performance is assessed using statistical metrics, visualization tools, 5-fold cross-validation, and SHAP analysis for feature importance. The results highlight the effectiveness of boosting ML models in predicting natural frequencies, particularly XGBoost, which achieved an exceptional R² value of 0.999, accompanied by the lowest MAE, MAPE, and RMSE values among the models assessed. LGBoost and AdaBoost follow XGBoost in performance, while GBoost exhibits relatively lower effectiveness, as highlighted by radar plots. The SHAP analysis revealed the significant impact of the foundation parameter on frequency prediction, with porosity coefficients notably influencing higher vibration modes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. Estimation of compressive strength for spiral stirrup-confined circular concrete column using optimized machine learning with interpretable techniques.
- Author
-
Sun, Yang
- Subjects
CONCRETE columns ,MACHINE learning ,CONCRETE construction ,GRAPHICAL user interfaces ,COMPRESSIVE strength ,BOOSTING algorithms - Abstract
The compressive strength (CS) of a concrete column confined with spiral stirrups is an important indicator for assessing the safety and stability of concrete structures. However, achieving accurate CS estimation for confined concrete remains challenging due to the complex confinement mechanism provided by spiral stirrups. In this study, three robust machine learning (ML) algorithms, namely support vector regression (SVR), random forest (RF), and extreme gradient boosting (XGBoost), are employed to predict the CS value of spiral stirrup-confined circular concrete columns. The hyperparameters of the ML models undergo fine-tuning via Bayesian optimization with 10-fold cross-validation, and the optimized ML models are evaluated for their predictive capabilities. Results show that compared to SVR and RF, XGBoost exhibits more stable generalization performance, achieving an average coefficient of determination (R²) of 0.944 for the 10-fold cross-validation, and demonstrates superior accuracy on the testing dataset with an R² value of 0.967. To provide insights into the relationship between input features and the output CS value, Individual Conditional Expectation (ICE) plots, one/two-dimensional Partial Dependence Plots (PDPs), and Shapley Additive Explanation (SHAP) techniques are utilized to interpret the optimized XGBoost model. Additionally, a user-friendly online graphical user interface (GUI) has been developed based on the optimized XGBoost model to facilitate convenient CS estimation for spiral stirrup-confined circular concrete columns. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. Machine learning algorithms in constructing prediction models for assisted reproductive technology (ART) related live birth outcomes.
- Author
-
Peng, Junwei, Geng, Xiaoyujie, Zhao, Yiyue, Hou, Zhijin, Tian, Xin, Liu, Xinyi, Xiao, Yuanyuan, and Liu, Yang
- Subjects
RECEIVER operating characteristic curves ,RANDOM forest algorithms ,REPRODUCTIVE technology ,MEDICAL sciences ,LOGISTIC regression analysis ,FERTILIZATION in vitro ,MACHINE learning ,BOOSTING algorithms - Abstract
Currently applicable models for predicting live birth outcomes in patients who received assisted reproductive technology (ART) have methodological or study design limitations that greatly obstruct their dissemination and application. Models suitable for Chinese couples have not yet been identified. We conducted a retrospective study using a database that includes a total of 11,938 couples who underwent in vitro fertilization (IVF) treatment between January 2015 and December 2022 in a medical institution in Yunnan province, southwest China. Multiple candidate predictors were screened by using the importance scores. Four machine learning (ML) algorithms, including random forest, extreme gradient boosting, light gradient boosting machine, and binary logistic regression, were used to construct prediction models. An initial assessment of the predictive performance was conducted and validated by using cross-validation and bootstrap methods. A total of seven predictors were identified, namely maternal age, duration of infertility, basal follicle-stimulating hormone (FSH), progressive sperm motility, progesterone (P) on HCG day, estradiol (E2) on HCG day, and luteinizing hormone (LH) on HCG day. Of the four predictive models, the random forest model and the logistic regression model were considered to have the optimal performance, with areas under the receiver operating characteristic curve (AUROC) of 0.671 (95% CI 0.630–0.713) and 0.674 (95% CI 0.627–0.720). The Brier scores were 0.183 (95% CI 0.170–0.196) and 0.183 (95% CI 0.170–0.196), respectively. Considering the simplicity of model fitting, we recommend the logistic regression model as the best predictive model for live birth. Furthermore, maternal age, P on HCG day, and E2 on HCG day were deemed to have the highest contribution to model prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Assessment of resilient modulus of soil using hybrid extreme gradient boosting models.
- Author
-
Duan, Xiangfeng
- Subjects
REGRESSION analysis ,MATHEMATICAL statistics ,ARTIFICIAL intelligence ,IMAGE processing ,MACHINE learning ,BOOSTING algorithms - Abstract
Accurate estimation of the soil resilient modulus (MR) is essential for designing and monitoring pavements. However, experimental methods tend to be time-consuming and costly; regression equations and constitutive models usually have limited applications, while the predictive accuracy of some machine learning studies still has room for improvement. To forecast MR efficiently and accurately, a new model named black-winged kite algorithm-extreme gradient boosting (BKA-XGBOOST) is proposed. In BKA-XGBOOST, XGBOOST captures the many-to-one nonlinear relationship between geotechnical factors and MR, while BKA provides the optimal hyperparameters for XGBOOST. By combining them, XGBOOST has stable and accurate predictive capabilities for different combinations of soil data. Comparisons with nine models show that the proposed model outperforms other models in terms of MR prediction accuracy, with a determination coefficient (R2) of 0.995 and a mean absolute error (MAE) of 0.975 MPa. In addition, an efficient MR prediction software is developed based on the model to improve its practicality and interactivity, which is promising for assisting engineers in evaluating pavement properties. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Identifying lithofacies types by boosting algorithm and resampling technique: a case study of deep-water submarine fans in an oil field in West Africa.
- Author
-
Zhen, Yan, Xiao, Yifei, Zhao, Xiaoming, Lu, Xiaoya, Fang, Junyi, Kang, Jintao, and Liu, Liang
- Subjects
MACHINE learning ,SUBMARINE fans ,BOOSTING algorithms ,OIL fields ,PETROLEUM prospecting ,GAS fields ,NATURAL gas prospecting - Abstract
The continuous discovery of giant oil and gas fields in deep-water lowstand fans has made deep-water submarine fan reservoirs, with their huge oil and gas potential, important targets for oil and gas exploration and development. Machine learning algorithms have been proven to be an effective method to classify various rock types from geophysical logging data, but previous studies have rarely focused on predicting deep-water submarine fans. In this paper, we utilized five classical boosting machine learning algorithms, namely GBDT, XGBoost, LightGBM, CatBoost, and LogitBoost, to identify 14 deep-water submarine fan lithofacies types from 7 wells in a West African oilfield. To address the class imbalance problem, we employed the SMOTE and MAHAKIL oversampling techniques and optimized the hyperparameters of the model using a Genetic Algorithm. The experimental results show that the model performance is improved by using oversampling techniques and hyperparameter optimization. The proposed MAHAKIL-GA-GBDT algorithm is the most effective in identifying the lithofacies of deep-water submarine fans, with an accuracy of 0.986. This study provides a new approach for identifying deep-water submarine fan lithofacies and highlights the potential of machine learning algorithms in this field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Machine learning model for constant volume depletion prediction of the Niger Delta condensate systems.
- Author
-
Fadairo, Adesina, Olatunji, David, Ling, Kegang, Rasouli, Vamegh, Adeyemi, Gbadegesin, and Gbadamosi, Olumide
- Subjects
BOOSTING algorithms ,ARTIFICIAL intelligence ,WATER temperature ,GAS condensate reservoirs ,ALGORITHMS ,MODEL validation - Abstract
Accurate assessment and evaluation of condensate reservoir performance depends on the depth of study of the condensate systems. Due to the anomalous behavior of these reservoirs, a great deal of knowledge is required to accurately forecast the performance of these systems. Three machine learning (ML) models with proven capabilities and accuracy were developed based on the Extratrees (ET), Adaptive Boosting (Adaboost), and Gradient Boosting (GBM) algorithms. These models were used to exploit various parameters such as gas composition, C7+ fraction properties, depletion pressure steps, and reservoir temperature. Data points from different Niger Delta samples that passed the quality control tests were used to validate the proposed models. Evaluated with several accuracy metrics on two samples, A and B, the ET algorithm achieved R2 of 0.9868 and 0.9951, MAPE of 0.21224 and 0.10984, RMSE of 0.01746 and 0.01697, and MAE of 0.0128 and 0.01457 for samples A and B, respectively. The ET algorithm gives a more accurate prediction than the Adaboost and GBM models for both samples. The results from this study can be serialized and deployed into PVT simulation packages. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. Tabnet efficiency for facies classification and learning feature embedding from well log data.
- Author
-
Ta, Viet-Cuong, Hoang, Thi-Linh, Doan, Ngoc-San, Nguyen, Van-Thang, Nguyen Dieu, Nuong, Pham, Thi Thanh Thuy, and Nguyen Dang, Nam
- Subjects
MACHINE learning ,SUPPORT vector machines ,TRANSFORMER models ,DATA logging ,RANDOM forest algorithms ,BOOSTING algorithms ,DEEP learning - Abstract
Well log data is represented as raw tabular data with diverse and nonlinear features. This poses a challenge for feature learning by machine learning models. Recently popular decision tree-based algorithms, such as random forest (RF) and extreme gradient boosting (XGB), are prominent for learning nonlinear relationships in well log data compared with other methods such as support vector machines (SVMs) and even deep learning models. In this work, we propose using the Tabnet model to learn directly from tabular well log data. To our knowledge, this is the first time the state-of-the-art transformer-based Tabnet model has been utilized for this task. The efficiency of Tabnet-based feature embedding is evaluated in two tasks: rock facies classification and learning feature embedding. We demonstrate the efficiency of the Tabnet model with experimental results on two small datasets: the public Kansas dataset, which has nine wells for training and two wells for testing, and our own dataset, which has four wells for training and one well for testing. Although trained on a modest amount of well log data, the proposed Tabnet model still achieves better classification efficiency than the tree-based models RF, XGBoost, and LightGBM and the deep learning models MLP, CNN-1D, and ResNet-1D. KEY POINTS: Tabnet efficiency for facies classification and learning feature embedding from well log data. A challenge to learn these raw features directly for separating classes of facies. The superiority of the Tabnet network in comparison with other ruling tree-based methods and deep learning models. Facies classification and learning feature embeddings for categorical variables of well logs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
48. Analysis of the 10-day ultra-marathon using a predictive XG boost model.
- Author
-
Knechtle, Beat, Villiger, Elias, Valero, David, Braschler, Lorin, Weiss, Katja, Vancini, Rodrigo Luiz, Andrade, Marilia S., Scheer, Volker, Nikolaidis, Pantelis T., Cuk, Ivan, Rosemann, Thomas, and Thuany, Mabliny
- Subjects
- *
MACHINE learning , *RUNNING races , *ULTRAMARATHON running , *RUNNERS (Sports) , *BOOSTING algorithms , *RUNNING speed - Abstract
Objective: Ultra-marathon running races are held as distance-limited or time-limited events, ranging from 6 h to 10 days. Only a few runners compete in 10-day events, and so far, we have little knowledge about the athletes' origins, performance, and event characteristics. The aim of the present study was to investigate the origin and performance of these runners and the fastest race locations. A machine learning model based on the XG Boost algorithm was built to predict running speed from the athlete's age, gender, country of origin, country where the race takes place, the type of race, and the kind of running surface. Model explainability tools were then used to investigate how each independent variable would influence the predicted running speed. Results: The model rated the origin of the athlete as the most important predictor, followed by age group, running on a dirt path, gender, running on asphalt, and event location. Running on a dirt path led to a significant reduction in running speed, while running on asphalt showed faster running speeds compared to other surfaces. Most athletes came from the USA, followed by Russia, Germany, Ukraine, the Czech Republic, and Slovakia. Most of the runners competed in the USA. The fastest 10-day runners were from Finland and Israel. The fastest 10-day races were held in Greece. Conclusions: Most 10-day runners originated from the USA, but the fastest runners originated from Finland and Israel. The fastest race courses were in Greece. Running on dirt paths leads to a significant reduction in running speed, while running on asphalt leads to faster running speeds. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Alphabet Handwriting Recognition: From Wood‐Framed Hydrogel Arrays Design to Machine Learning Decoding.
- Author
-
Yan, Guihua, Hu, Xichen, Miao, Ziyue, Liu, Yongde, Zeng, Xianhai, Lin, Lu, Ikkala, Olli, and Peng, Bo
- Subjects
- *
BOOSTING algorithms , *FEATURE extraction , *WOOD , *MACHINE learning , *POLYACRYLIC acid , *HANDWRITING recognition (Computer science) - Abstract
Handwriting recognition is a highly integrated system, demanding hardware to collect handwriting signals and software to process the input data. Nonetheless, designing such a system from scratch with sustainable materials and an easily accessible computing network presents significant challenges. In pursuit of this goal, a flexible, electrically conductive wood‐derived hydrogel array is developed as a handwriting input panel, enabling recognition of alphabet handwriting with the assistance of machine learning. For this, lignin extraction‐refill, polypyrrole coating, and polyacrylic acid filling are sequentially implemented, endowing the wood with flexibility and electrical conduction. Subsequently, these woods are manufactured into a 5 × 5 array, creating a matrix of signals upon handwriting. Efficient handwriting recognition is then achieved through appropriate manual feature extraction and low‐complexity algorithms within a computing network. As demonstrated in this work, the strategic choice of expertise‐based feature engineering and simplified algorithms effectively boosts the overall model performance on handwriting recognition. With potential adaptability, further applications in customized wearable devices and hands‐on healthcare appliances are envisioned. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. Rapid diagnosis of power battery faults in new energy vehicles based on improved boosting algorithm and big data.
- Author
-
Wang, Jiali and Chen, Jia
- Subjects
ELECTRIC vehicles ,RANDOM forest algorithms ,PROCESS capability ,FAULT diagnosis ,ARTIFICIAL intelligence ,BOOSTING algorithms - Abstract
In recent years, the new energy vehicle industry has developed rapidly. A fast diagnostic method based on Boosting and big data is proposed to address the low accuracy and efficiency of fault diagnosis in new energy vehicle power batteries. Boosting is a machine learning technique that combines multiple weak learners into a strong learner. Big data refers to large-scale, complex datasets that exceed traditional data processing capabilities. First, the big data uploaded by the battery was analyzed and preprocessed. Subsequently, the importance of indicators in the data was analyzed using the Random Forest algorithm (RF). Finally, three improved Boosting algorithms were proposed, namely Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting Tree (XGBoost), and Gradient Boosting Decision Tree (CatBoost). The experimental results indicate that the LightGBM model effectively detects anomalies in battery big data. The accuracy values of XGBoost, CatBoost, and LightGBM are 97.84%, 98.57%, and 99.16%, respectively. The recall rates of the XGBoost, CatBoost, and LightGBM models are all 1. The F1 values of XGBoost, CatBoost, and LightGBM are 0.873, 0.983, and 0.985, respectively. The power battery is the core component of new energy vehicles, and its safety performance directly affects the operational safety of the vehicle. Timely identification and diagnosis of battery faults can effectively reduce potential accidents such as battery overheating and short circuits. The research enables real-time monitoring and timely reminders of potential faults. By detecting issues such as battery overheating and voltage imbalance early, this method can effectively reduce the risk of serious safety accidents and improve the overall operational reliability of new energy vehicles during driving. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
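Note on the metrics reported in result 50: the abstract gives recall and F1 for each model but not precision. Assuming the figures follow the standard definition F1 = 2PR / (P + R) and that recall and F1 refer to the same class, the implied precision can be recovered with a short Python sketch. The function name `implied_precision` is illustrative and does not come from the paper, and the first F1 value is taken to be XGBoost's, in line with the model order the abstract uses for accuracy and recall.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for precision P, giving P = F1*R / (2R - F1)."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 and recall are inconsistent: need 2 * recall > F1")
    return f1 * recall / denominator


# F1 values as reported in the abstract; recall is reported as 1 for all three models.
reported_f1 = {"XGBoost": 0.873, "CatBoost": 0.983, "LightGBM": 0.985}

# Precision implied by each reported F1 under the assumptions stated above.
implied = {
    model: implied_precision(f1, recall=1.0)
    for model, f1 in reported_f1.items()
}

for model, precision in implied.items():
    print(f"{model}: implied precision = {precision:.3f}")
```

Under these assumptions, a recall of 1 combined with an F1 below 1 means that the entire shortfall comes from false positives, so the gap between the models is a gap in precision.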