37,258 results for "RANDOM forest algorithms"
Search Results
2. Predicting the length-of-stay of pediatric patients using machine learning algorithms.
- Author
-
Boff Medeiros, Natália, Fogliatto, Flávio Sanson, Karla Rocha, Miriam, and Tortorella, Guilherme Luz
- Subjects
RANDOM forest algorithms, MACHINE learning, LENGTH of stay in hospitals, CHILDREN'S injuries, FOREST productivity, CHILD patients - Abstract
The management of hospitals' resource capacity has a strong impact on the quality of care, and the length-of-stay (LOS) of patients is an indicator that reflects its efficiency and effectiveness. This study aims at predicting the LOS of pediatric patients (LOS-P) in hospitals to assist in decision-making regarding resource utilisation. LOS-P forecasting presents additional challenges to the analyst compared to other medical specialties since Pediatrics comprises several other subspecialties (e.g. pediatric oncology and traumatology). Pediatric patients within subspecialties compete for the same hospital resources, and aggregate LOS-P predictions are more useful for resource planning. However, aggregate pediatric LOS datasets are harder to model and result in lower forecasting accuracy. To address that problem, we propose a forecasting model based on Machine Learning algorithms. The method for LOS-P forecasting comprises five steps (data visualisation, data pre-processing, sample partitioning, model testing, and model definition through parameter setting and variable selection) and is tested using a dataset of hospitalisations of pediatric patients from a large Brazilian University hospital. Multiple linear regression, random forest, support vector regression, ridge regression, and partial least squares algorithms are applied and compared to determine the best forecasting model. Results indicate that all forecasting models yield satisfactory accuracy, with the best algorithms being random forest and support vector regressor. After refining the model through variable selection and using a Grid Search to find the best parameters, the random forest algorithm yielded an R² of 65.67%, with an average absolute error of 3.51 days. Highlights: Prediction of the length of stay of pediatric patients (LOS-P) in hospitals based on Machine Learning algorithms; multiple linear regression, random forest, support vector regression, ridge regression, and partial least squares algorithms were applied and compared; the random forest algorithm yielded an R² of 65.67%, with an average absolute error of 3.51 days. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
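The modeling step this abstract describes (a random forest regressor tuned by grid search and scored by mean absolute error and R²) can be sketched with scikit-learn. The data, grid, and parameter values below are illustrative stand-ins, not the study's actual hospital dataset or search space.

```python
# Sketch: random forest regression + grid search scored by MAE, as in the
# abstract's workflow. Synthetic data stands in for the LOS records.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Small illustrative grid; the paper's actual search space is not reported.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
    scoring="neg_mean_absolute_error",
    cv=3,
)
grid.fit(X_train, y_train)
pred = grid.predict(X_test)
mae = mean_absolute_error(y_test, pred)
r2 = r2_score(y_test, pred)
print(f"MAE={mae:.2f}, R2={r2:.2f}")
```

Scoring the search with `neg_mean_absolute_error` keeps tuning aligned with the error metric the paper reports.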
3. Predicting Basketball Shot Outcome From Visuomotor Control Data Using Explainable Machine Learning.
- Author
-
Aitcheson-Huehn, Nikki, MacPherson, Ryan, Panchuk, Derek, and Kiefer, Adam W.
- Subjects
VISUOMOTOR coordination, RANDOM forest algorithms, EYE tracking, DECISION trees, MACHINE learning - Abstract
Quiet eye (QE), the visual fixation on a target before initiation of a critical action, is associated with improved performance. While QE is trainable, it is unclear whether QE can directly predict performance, which has implications for training interventions. This study predicted basketball shot outcome (make or miss) from visuomotor control variables using a decision tree classification approach. Twelve basketball athletes completed 200 shots from six on-court locations while wearing mobile eye-tracking glasses. Training and testing data sets were used for modeling eight predictors (shot location, arm extension time, and absolute and relative QE onset, offset, and duration) via standard and conditional inference decision trees and random forests. On average, the trees predicted over 66% of makes and over 50% of misses. The main predictor, relative QE duration, indicated success for durations over 18.4% (range: 14.5%–22.0%). Training to prolong QE duration beyond 18% may enhance shot success. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
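The study above reads a decision threshold (relative QE duration around 18.4%) off fitted trees. With scikit-learn, a learned split point can be inspected via the fitted tree's `tree_` attribute; the data below is synthetic and the "make the shot above 18.4%" rule is a toy encoding of the abstract's finding, not the real athlete data.

```python
# Sketch: recover a split threshold from a fitted decision tree, mirroring
# how the paper identified relative QE duration ~18.4% as the key cut point.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
qe_duration = rng.uniform(5, 30, size=(200, 1))      # relative QE duration, %
made_shot = (qe_duration[:, 0] > 18.4).astype(int)   # toy rule from the paper

tree = DecisionTreeClassifier(max_depth=1, random_state=0)
tree.fit(qe_duration, made_shot)
threshold = tree.tree_.threshold[0]  # split point learned at the root node
print(f"learned threshold: {threshold:.1f}%")
```

A depth-1 "stump" is enough here because the toy labels depend on a single cut point; the paper's conditional inference trees and forests generalize the same idea to eight predictors.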
4. Uncovering Financial Constraints.
- Author
-
Linn, Matthew and Weagley, Daniel
- Subjects
BUSINESS enterprises, RANDOM forest algorithms, BUSINESS finance, INVESTORS, MARKET sentiment - Abstract
We use a random forest model to classify firms' financial constraints using only financial variables. Our methodology expands the range of classified firms compared to text-based measures while maintaining similar levels of informativeness. We construct two versions of our constraint measures, one using many firm characteristics and the other using a small set of more primitive characteristics. Using our measures, we find that institutional investors hold a lower percentage of shares in equity-focused constrained firms, while retail investors show a preference for them. Equity issuance and investment of constrained firms also increases during periods of high investor sentiment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Not seeing the wood for the trees: Influences on random forest accuracy.
- Author
-
Hand, Chris and Fitkov-Norris, Elena
- Subjects
RANDOM forest algorithms, RANDOM noise theory, WOOD, MACHINE learning, STATISTICAL sampling - Abstract
Machine learning classifiers are increasingly widely used. This research note explores how a particular widely used classifier, the Random Forest, performs when faced with imbalanced samples and noisy data. Both are known to affect accuracy, but whether their effects are independent has not been explored. Based on an experiment using synthetic data generated for the study, we find that the effects of noise and sample balance interact with each other; classification accuracy is worse when faced with both noisy data and sample imbalance. This has implications for the use of RF in market research, but also for how methods to address either sample imbalance or noise are assessed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
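The experimental design in this note (vary class balance and label noise, measure random forest accuracy) is easy to reproduce in miniature with scikit-learn's synthetic data generator. The sample sizes, noise levels, and imbalance ratios below are illustrative choices, not the study's.

```python
# Sketch: measure random forest accuracy under combinations of class
# imbalance (weights) and label noise (flip_y), as in the study's design.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def rf_accuracy(weight_majority, flip_frac, seed=0):
    X, y = make_classification(
        n_samples=1000, n_features=10, weights=[weight_majority],
        flip_y=flip_frac, random_state=seed,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    return clf.fit(X_tr, y_tr).score(X_te, y_te)

balanced_clean = rf_accuracy(weight_majority=0.5, flip_frac=0.0)
imbalanced_noisy = rf_accuracy(weight_majority=0.9, flip_frac=0.2)
print(balanced_clean, imbalanced_noisy)
```

Crossing a grid of `weight_majority` and `flip_frac` values (rather than the two corner cases shown) is what would reveal the interaction effect the note reports.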
6. A novel approach for predicting Lockout/Tagout safety procedures for smart maintenance strategies.
- Author
-
Delpla, Victor, Chapron, Kevin, Kenné, Jean-Pierre, and Hof, Lucas A.
- Subjects
SMART cities, ARTIFICIAL neural networks, RANDOM forest algorithms, CONCRETE industry - Abstract
This article presents an approach for predicting Lockout/Tagout (LOTO) procedure sheets, which are commonly used in the manufacturing industry to prevent premature equipment restart during maintenance. The prediction problem of energetic devices to lock from machine names is regarded as a multi-task classification problem. The dataset was obtained by processing LOTO sheets in Portable Document Format (PDF). The K-Nearest Neighbours (KNN), Random Forest (RF), and Deep Neural Network (DNN) algorithms were compared for this problem. The best prediction performance was achieved with the DNN method, with top-1 accuracies exceeding 63% and top-2 accuracies exceeding 90% for all devices. The sensitivity analysis conducted on the results indicates that the approach is robust and reliable, regardless of the industrial sector considered. In other words, the approach is not significantly affected by variations in the industry or its specific characteristics. These results suggest that the proposed approach can be used to assist workers in drafting LOTO sheets, and offers strong potential for concrete applications in safety management in the era of smart manufacturing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Quality design based on kernel trick and Bayesian semiparametric model for multi-response processes with complex correlations.
- Author
-
Yang, Shijuan, Wang, Jianjun, Cheng, Xiaoying, Wu, Jiawei, and Liu, Jinpei
- Subjects
PRINCIPAL components analysis, EVOLUTIONARY algorithms, RANDOM forest algorithms, LEAST squares - Abstract
Processes or products are typically complex systems with numerous interrelated procedures and interdependent components. This results in complex relationships between responses and input factors, as well as complex nonlinear correlations among multiple responses. If the two types of complex correlations in the quality design cannot be properly dealt with, it will affect the prediction accuracy of the response surface model, as well as the accuracy and reliability of the recommended optimal solutions. In this paper, we combine kernel trick-based kernel principal component analysis, spline-based Bayesian semiparametric additive model, and normal boundary intersection-based evolutionary algorithm to address these two types of complex correlations. The effectiveness of the proposed method in modeling and optimisation is validated through a simulation study and a case study. The results show that the proposed Bayesian semiparametric additive model can better describe the process relationships compared to least squares regression, random forest regression, and support vector basis regression, and the proposed multi-objective optimisation method performs well on several indicators mentioned in the paper. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Sales forecasting of a food and beverage company using deep clustering frameworks.
- Author
-
Mitra, Rony, Saha, Priyam, and Kumar Tiwari, Manoj
- Subjects
SALES forecasting, FOOD industry, GAUSSIAN mixture models, RANDOM forest algorithms, RETAIL industry, HIERARCHICAL clustering (Cluster analysis) - Abstract
The competition among Food & Beverage companies has substantially increased in today's age of digitization, and sales forecasting is one of their main challenges. Due to space limitations, employee shortages, and rising online demand, retail sales forecasting has become extremely important for Food and Beverage companies. This research analyzed the sales data of a multinational Food & Beverage company and proposed a framework using Gaussian Mixture Model (GMM) clustering, Hierarchical Agglomerative Clustering (HAC), and the Random Forest algorithm for forecasting sales. The model analyzes the impact of weekends, holidays, promotional activities, customer sentiments, festivals, and socio-economic situations in sales data, and is able to forecast sales over horizons ranging from one to 15 months. An investigation of the suggested model's performance compared to numerous cutting-edge sales forecasting techniques is carried out to show its efficacy. We demonstrate that the proposed hybrid model surpasses current methods in both forecasting accuracy and computational efficiency. The results of this study can help retail managers allocate resources and manage inventories in well-informed ways. The findings suggest that combining several strategies may produce the most precise forecasts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
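The hybrid idea in this abstract (cluster the data first, then fit a separate random forest per cluster) can be sketched as follows. The feature names, toy sales formula, and two-component mixture are invented for illustration; the paper's actual features and cluster counts are not given here.

```python
# Sketch: GMM clustering followed by one random forest forecaster per
# cluster, routing new points to their cluster's model. Toy data throughout.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy features: [week_of_year_frac, is_holiday, promo_intensity]
X = rng.random((300, 3))
y = 100 + 50 * X[:, 2] + 20 * X[:, 1] + rng.normal(0, 5, 300)  # toy sales

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
clusters = gmm.predict(X)

# One forecaster per cluster.
models = {}
for c in np.unique(clusters):
    mask = clusters == c
    models[c] = RandomForestRegressor(n_estimators=50, random_state=0).fit(
        X[mask], y[mask]
    )

# Forecast a new point by routing it to its cluster's model.
x_new = rng.random((1, 3))
c_new = gmm.predict(x_new)[0]
forecast = models[c_new].predict(x_new)[0]
print(f"cluster {c_new}, forecast {forecast:.1f}")
```

Fitting per-cluster models lets each forest specialize on a regime (e.g. promotional vs. regular weeks), which is the intuition behind the paper's deep clustering framework.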
9. The optimization of evaporation rate in graphene-water system by machine learning algorithm.
- Author
-
Qiao, Degao, Yang, Ming, Gao, Yin, Hou, Jue, Zhang, Xingli, and Zhang, Hang
- Subjects
RANDOM forest algorithms, INTERFACIAL bonding, PRODUCTION methods, INSTRUCTIONAL systems, PREDICTION models, ERROR rates, MACHINE learning, PEARSON correlation (Statistics), DATA extraction - Abstract
Solar interfacial evaporation, as a novel practical freshwater production method, requires continuous research on how to improve evaporation rates to increase water production. In this study, sets of data were obtained from molecular dynamics simulation and the literature, in which the parameters included height, diameter, height–radius ratio, evaporation efficiency, and evaporation rate. Initially, the correlation between the four input parameters and the output of the evaporation rate was examined through traditional pairwise plots and Pearson correlation analysis, revealing weak correlations. Subsequently, the accuracy and generalization performance of the evaporation rate prediction models established by neural network and random forest were compared, with the latter demonstrating superior performance and reliability, confirmed via random data extraction. Furthermore, the impact of different percentages (10%, 20%, and 30%) of the data on model performance was explored, and the results indicated that performance is best when the test set is 20% and all the constructed models converge. Moreover, the mean absolute error and mean squared error of the evaporation rate prediction model for the three ratios were calculated to evaluate their performance. Additionally, the relationship between the height–radius ratio and optimal evaporation rate was investigated using the enumeration method, and it was determined that the evaporation efficiency was optimal when the height–radius ratio was 6. Finally, the importance of height, diameter, height–radius ratio, and evaporation efficiency was calculated to optimize evaporator structure, increase the evaporation rate, and facilitate the application of interfacial evaporation in solar desalination. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Performance Prediction of Server Using Neural Network Algorithm Compared with Random Forest Algorithm Based on Option Posted by Players
- Author
-
Janardhan, K. P., Selvakumar, A., Ramesh, S., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Dutta, Soumi, editor, Bhattacharya, Abhishek, editor, Shahnaz, Celia, editor, and Chakrabarti, Satyajit, editor
- Published
- 2025
- Full Text
- View/download PDF
11. An empirical study on online gender-based violence: Classification analysis utilizing XGBoost.
- Author
-
Primandari, Arum Handini and Ermayani, Putri
- Subjects
GENDER-based violence, RANDOM forest algorithms, DECISION trees, CYBERBULLYING, SOCIAL media - Abstract
It is undeniable that the broad reach of the internet provides space for online gender-based violence (Kekerasan Berbasis Gender Online, KBGO). As with real-world violence, perpetrators in the online world intend to harass their victims based on gender or sexuality. The kinds of online violence reported by Komnas Perempuan include cyber grooming, cyberbullying, cyber harassment, hacking, infringement of privacy, etc. In response to this phenomenon, this research builds an automated model to classify social media comments into those indicated to contain Online Gender-Based Violence (GBV) and those that do not. The 18,239 documents (statements) were taken from comments on celebrity or influencer accounts during 2022-2021. The model used for classification analysis is XGBoost (eXtreme Gradient Boosting), an ensemble decision tree model with a gradient boosting algorithm. XGBoost attempts to build a robust classifier from a number of weak classifiers; the main benefit of using it to train tree ensembles is its speed, which is expected to be much faster than random forest itself. Documents went through a pre-processing series, then a text vectorization process with TF-IDF. Because the classes in the sample are imbalanced, the SMOTE-ENN method is employed to balance them. Measurement metrics, including accuracy, f1-score, precision, and recall, show good XGBoost performance, all exceeding 90%. Comparing several models, including random forest, AdaBoost, and XGBoost, XGBoost does not have the highest accuracy, but it is the fastest: its running time is almost seven to eight times faster than random forest. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
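The classification pipeline this abstract describes (TF-IDF vectorization feeding a gradient-boosted tree classifier) can be sketched with scikit-learn. Here `GradientBoostingClassifier` stands in for XGBoost, the toy English comments and labels stand in for the real Indonesian corpus, and the SMOTE-ENN rebalancing step (from the imbalanced-learn library) is omitted for brevity.

```python
# Sketch: TF-IDF text features + gradient boosting, a stand-in for the
# paper's TF-IDF + XGBoost pipeline. Toy comments and labels throughout.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

texts = [
    "you are amazing", "great post", "nice photo",
    "awful person go away", "you deserve harassment", "stop posting you idiot",
]
labels = [0, 0, 0, 1, 1, 1]  # 1 = indicated as abusive (toy labels)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts).toarray()

clf = GradientBoostingClassifier(random_state=0).fit(X, labels)
print(clf.predict(X))
```

In a real run the vectorizer and classifier would be wrapped in a `Pipeline` and evaluated on held-out data; scoring on the training set, as above, only checks that the pieces fit together.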
12. Estimating COVID-19 using chest x-ray images through AI-driven diagnosis.
- Author
-
Sofia, R., Mahendran, K., and Devi, K. Nirmala
- Subjects
MACHINE learning, COVID-19 pandemic, RANDOM forest algorithms, SUPPORT vector machines, X-ray imaging - Abstract
The rapid global spread of COVID-19 has sparked a significant increase in testing efforts worldwide, marking it as a pandemic. This unprecedented situation has profoundly impacted daily life, public health, and the global economy. Traditional laboratory methods, like Polymerase Chain Reaction (PCR) testing, though considered the gold standard, are time-consuming and can yield false negatives. Consequently, there arose an urgent demand for swift and accurate diagnostic techniques to identify COVID-19 cases promptly and curb the pandemic's spread. Artificial intelligence (AI) has emerged as a potent tool in conjunction with radiographic imaging to aid in detecting COVID-19. This study proposes a classification approach for identifying infectious conditions in chest X-ray images. A dataset comprising X-ray images from healthy individuals, pneumonia cases including SARS, Streptococcus, Pneumococcus, and COVID-19 patients was compiled. Leveraging the Histogram of Oriented Gradients (HOG) technique for feature extraction, the study employed machine learning algorithms such as K-Nearest Neighbors (KNN), Random Forests, and Support Vector Machines (SVM) for classification. Results demonstrated classification accuracies of 98.14%, 96.29%, and 88.89% for KNN, Random Forests, and SVM, respectively. These findings underscore promising opportunities for utilizing image analysis in the detection of COVID-19 and other respiratory illnesses, providing a robust framework for future research and clinical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
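The feature-extraction-then-classify approach in this abstract can be sketched with a simplified gradient-orientation histogram in place of full HOG descriptors (which would normally come from a library such as scikit-image), and k-NN as one of the three classifiers compared. The toy "images" below are random arrays with an artificial structural difference between classes, not chest X-rays.

```python
# Sketch: a simplified HOG-like descriptor (one orientation histogram per
# image) followed by k-NN classification, mirroring the abstract's pipeline.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def orientation_histogram(img, bins=9):
    gy, gx = np.gradient(img.astype(float))
    angles = np.arctan2(gy, gx)  # gradient orientation per pixel
    weights = np.hypot(gx, gy)   # magnitude-weighted votes
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi),
                           weights=weights)
    return hist / (hist.sum() + 1e-9)

rng = np.random.default_rng(0)
# Toy images: class 1 gets an added sinusoidal stripe pattern.
imgs = [rng.random((32, 32)) + cls * np.sin(np.arange(32) / 3)
        for cls in (0, 1) for _ in range(20)]
labels = [0] * 20 + [1] * 20

X = np.array([orientation_histogram(im) for im in imgs])
knn = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
acc = knn.score(X, labels)
print(f"training accuracy: {acc:.2f}")
```

Full HOG additionally computes histograms per cell and normalizes over blocks; this single-histogram version only illustrates the idea of turning gradients into a fixed-length feature vector.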
13. Hyperparameter optimization in cardio vascular disease prediction.
- Author
-
Kuppusamy, Saraswathi, Thangavel, Renukadevi, Kumar, Deepan, Janani, and Yamunadevi
- Subjects
SUPPORT vector machines, HEART diseases, RANDOM forest algorithms, DECISION trees, VASCULAR diseases - Abstract
A condition affecting the heart and blood vessels is called Cardio Vascular Disease (CVD). It is a leading cause of death. Early detection can help prevent or lessen it, which lowers mortality. Various study articles describe the application of machine learning algorithms to identify cardiac diseases. When an algorithm is applied to the dataset's records, a faster and more precise prediction of cardiovascular illness will enable the patient to receive therapy. Cardiologists can make judgments quickly with the aid of these predictions. The suggested study employs self-defined Decision Tree, Random Forest, Logistic Regression, and Support Vector Machine (SVM) models with grid search to identify the presence of cardiovascular illness. We examine and assess their performance in forecasting it. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
14. Enhancing predictive maintenance in the industrial sector: A comparative analysis of machine learning models.
- Author
-
Levin, Semen
- Subjects
ARTIFICIAL neural networks, MACHINE learning, PLANT maintenance, RANDOM forest algorithms, EXPERTISE - Abstract
This research evaluates the efficacy of machine learning (ML) models – Random Forest, Gradient Boosting Machine (GBM), and Deep Neural Networks (DNNs) – for predictive maintenance in the industrial sector, using a dataset reflective of centrifuge, pump, and compressor operations. It assesses these models based on accuracy, precision, recall, F1 score, and ROC AUC metrics, focusing on the GBM model's feature importance analysis, identifying vibration levels and operational hours as key predictors of equipment failure. The study demonstrates DNNs' superior performance, highlighting their potential to significantly enhance predictive maintenance through improved prediction accuracy and operational efficiency. Despite the advantages, the paper notes challenges in implementing these advanced models, including computational demands and the need for specialised expertise. By offering a comprehensive framework for applying ML to predictive maintenance and addressing gaps in existing literature, this research contributes valuable insights into developing more efficient and reliable industrial maintenance strategies, paving the way for future innovations in the field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
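The comparison protocol in this abstract (fit several model families on the same equipment data, then score each with accuracy, F1, and ROC AUC) can be sketched as below. Synthetic data stands in for the centrifuge/pump/compressor dataset, and a small `MLPClassifier` stands in for the paper's deep neural networks.

```python
# Sketch: compare RF, GBM, and a small neural net with a shared metric set,
# as in the paper's evaluation. Synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "rf": RandomForestClassifier(random_state=0),
    "gbm": GradientBoostingClassifier(random_state=0),
    "nn": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    proba = model.predict_proba(X_te)[:, 1]
    scores[name] = (accuracy_score(y_te, pred), f1_score(y_te, pred),
                    roc_auc_score(y_te, proba))
print(scores)
```

For the GBM, `models["gbm"].feature_importances_` gives the feature-importance ranking the paper used to single out vibration levels and operational hours.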
15. A machine learning based model for customer churn prediction in telecommunication.
- Author
-
Sehrawat, Neeshant, Yadav, Saneh Lata, and Dahiya, Mamta
- Subjects
RANDOM forest algorithms, LOGISTIC regression analysis, CONSUMERS, RESEARCH personnel, PREDICTION models - Abstract
Customer churn is a major problem and one of the biggest challenges for large firms. Companies are developing techniques to predict probable customer churn since it directly affects their revenue. In order to decrease customer turnover, it is crucial to identify the variables that contribute to this churn. This paper's key contribution is to showcase the importance of customer churn prediction in telecom, helping telecom providers identify consumers that are most likely to churn. The work described in this study applies machine learning methods to datasets to predict whether a customer is likely to churn or not. To evaluate the effectiveness of churn prediction models, researchers have focused on assessing the accuracy of various machine learning models. Another significant contribution is the use of hyper-parameter tuning to increase the efficiency of the best-performing model; tuning improved the model's accuracy from 80.17% to 80.31%. The model is constructed and evaluated on the Python platform using a sizable dataset produced by converting massive raw data provided by Telco, a fictional telecommunications business. Five different machine learning algorithms were tested: Logistic Regression, Naive Bayes Classifier, Support Vector Classifier (SVC), Decision Tree Classifier, and Random Forest Classifier. The Logistic Regression method produced the best results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Development of bio-medicinal plants and herbs classifier with random forest algorithm and QR code generator.
- Author
-
Sharma, Mansi, Srivastav, Gaurav, Puri, Chetan, and Khedkar, Sandip
- Subjects
TWO-dimensional bar codes, RANDOM forest algorithms, COVID-19 pandemic, CODE generators, PLANT identification - Abstract
India has drawn on Ayurvedic practice since ancient times, and the bio-medicinal plants used in Ayurveda are made into medicines from herbs to treat disease. Bio-medicinal plants, also known as medicinal herbs, have therapeutic properties and are used for various medicinal purposes. They can be classified into several types based on usage and chemical components, such as alkaloid-rich, terpenoid-rich, and glycoside-rich plants, among others. In developing the bio-medicinal plants and herbs classifier, we used the Random Forest classifier: an ensemble that trains many decision trees on different subsets of the input dataset and averages their results to improve predictive accuracy. QR codes have also become increasingly popular in recent years, as they provide a quick and easy way to retrieve information with a smartphone, and they are particularly useful for identifying and learning about medicinal plants and herbs. By generating QR codes for plants and herbs, gardeners, horticulturists, and other learners can rapidly access information about a particular plant or herb, including its planting location, species, origin, care instructions, and identification. The use of QR codes became more prominent during the COVID-19 pandemic, as they offer a contactless way to retrieve information and perform transactions, and they are increasingly popular in the horticulture industry. This article covers the development of a QR code system for various medicinal plants and herbs, its benefits, and potential applications in education and commercial plant sales.
By using QR codes, individuals can expand their knowledge, promote sustainability, and enhance their overall gardening experience. The main advantage of the QR code system is that it stores large amounts of data in a small area, so both time and space are used effectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Comparative analysis of machine learning algorithms on different diabetes datasets.
- Author
-
Kawarkhe, Madhuri and Kaur, Parminder
- Subjects
MACHINE learning, K-nearest neighbor classification, RANDOM forest algorithms, DECISION trees, PATIENT monitoring - Abstract
Diabetes is a disease caused by elevated blood glucose levels. If diabetes is not treated properly it may lead to many health complications and may even cause death. Diabetes prevention is a major need in the near future. Recent trends in the healthcare system provide a pathway for disease diagnosis, patient monitoring, and prediction of individual health conditions. In this paper we compared the Naive Bayes, Random Forest, Logistic Regression, AdaBoost, Decision Tree, and K-Nearest Neighbor machine learning algorithms for prediction of diabetes. Experimentation is performed using three different diabetes datasets. The results show that Random Forest outperformed the other algorithms on all datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Enhancing credit card transaction fraud detection with random forest and robust scaling.
- Author
-
Wajgi, Rakhi, Agarkar, Himanshu, Patil, Rohan, Rao, Harshit, and Petkar, Nipun
- Subjects
CREDIT card fraud, MACHINE learning, RANDOM forest algorithms, FRAUD investigation, CREDIT cards - Abstract
In an era where credit card fraud poses an ever-increasing threat to financial institutions and consumers, the precise detection of fraudulent transactions is paramount. This study delves into the realm of data science and machine learning to fortify the defenses against credit card fraud. We evaluate the performance of three distinct machine learning models—decision trees, random forests, and logistic regression—in classifying, predicting, and detecting fraudulent credit card transactions. Our findings reveal that the Random Forest model emerged as the standout performer, achieving an impressive accuracy rate of 99% and boasting an AUC (Area Under the Curve) of 98.5% in the identification and prediction of fraudulent credit card transactions. This remarkable accuracy, combined with superior precision, recall, and F1-score, positions Random Forest as the optimal choice for the critical task of credit card fraud detection. Furthermore, we emphasize the importance of employing the RobustScaler preprocessing technique, which contributed significantly to enhancing the robustness and overall performance of our machine learning models. The study underscores the applicability of Random Forest for precise and equitable categorization, particularly for the minority class, making it a compelling choice for real-world applications. As fraudsters continue to evolve their tactics, the use of advanced machine learning techniques, exemplified by Random Forest, becomes increasingly crucial in safeguarding the integrity of credit card transactions. This research offers valuable insights into the frontlines of fraud detection, providing a foundation for enhanced security in the payment ecosystem. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
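The abstract pairs `RobustScaler` preprocessing with a random forest; wrapping both in a scikit-learn `Pipeline` keeps the scaling fitted only on training folds. The synthetic imbalanced dataset below (about 5% positives) is a stand-in for real credit card transactions.

```python
# Sketch: RobustScaler + random forest in one pipeline, scored by ROC AUC
# on an imbalanced synthetic "fraud" dataset, as in the abstract's setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

X, y = make_classification(n_samples=1000, n_features=12,
                           weights=[0.95], random_state=0)  # ~5% "fraud"
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = make_pipeline(RobustScaler(), RandomForestClassifier(random_state=0))
model.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"ROC AUC: {auc:.3f}")
```

`RobustScaler` centers on the median and scales by the interquartile range, so extreme transaction amounts (common in fraud data) do not dominate the scaling the way they would with standard z-scoring.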
19. Unleashing cricket's potential: The ultimate portal for prediction, analysis, and live scores.
- Author
-
Hambarde, Bhagyashree, Govardhan, Prashant, Parkhi, Priya, and Bodhe, Ketan D.
- Subjects
RANDOM forest algorithms, WEB-based user interfaces, DECISION trees, LOGISTIC regression analysis, MACHINE learning - Abstract
Cricket stands as one of the world's most-watched sports, drawing an ever-growing audience eager to delve into its intricacies. With an array of data types (numerical, categorical, time-series, text, and ordinal) providing diverse insights, the game's outcome hinges on various game-changing factors. Enter Cricviz: a Django and Bootstrap-powered web application tailored to offer comprehensive cricket analysis, predictions, and live scores for all IPL seasons. In the prediction model, several machine learning techniques were implemented and compared. Accuracy values are 60.8%, 44.5%, 62%, 58.6%, 56.5%, and 58% for Decision Tree, KNN, Random Forest, Logistic Regression, GaussianNB, and Gradient Boosting, respectively. The top-performing model takes centre stage in the prediction page, enhancing user experience and reliability. For in-depth analysis and visualization, the portal draws on the capabilities of Microsoft Power BI, embedding interactive dashboards within the Analysis section. Live match summaries are presented over-by-over and ball-by-ball, accompanied by insightful commentary, on the Live Score page. The paper presents how cricket engagement can be transformed through data-driven insights and real-time updates. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Predictive analysis of bullying victimization trajectory in a Chinese early adolescent cohort based on machine learning.
- Author
-
Wen, Xue, Tang, Ting, Wang, Xinhui, Tong, Yingying, Zhu, Dongxue, Wang, Fan, Ding, Han, Su, Puyu, and Wang, Gengfu
- Subjects
MACHINE learning, RANDOM forest algorithms, MULTIPLE regression analysis, LOGISTIC regression analysis, SATISFACTION, BULLYING, CYBERBULLYING - Abstract
The development of bullying victimization among adolescents displays significant individual variability, with general, group-based interventions often proving insufficient for partial victims. This study aimed to conduct a machine learning-based predictive analysis of bullying victimization trajectories among Chinese early adolescents and to examine the underlying determinants. Data were collected from 1549 students who completed three assessments of bullying victimization from 2019 to 2021. Self-reported questionnaires were used to measure bullying victimization and its associated risk and protective factors. Trajectories were classified using the Group-based Trajectory Model (GBTM), while a Random Forest algorithm was employed to develop a predictive model. Associations between baseline characteristics and victimization trajectories were evaluated via multiple logistic regression analysis. The GBTM identified four distinct victimization trajectories, with the predictive model demonstrating adequate accuracy across these trajectories, ranging from 0.812 to 0.990. Predictors exhibited varying influences across different trajectory subgroups. Odds ratios (ORs) were notably higher in the persistent severe victimization group compared to the low victimization group (OR for adverse school experiences: 3.698 vs. 1.386; for age: 2.160 vs. 1.252; for irritability traits: 1.867 vs. 1.270). Adolescents reporting lower school satisfaction and higher borderline personality features showed a greater likelihood of persistent severe victimization, while those with lower peer satisfaction faced increased victimization over time. The machine learning-based predictive model facilitates the identification of adolescents across different victimization trajectory groups, offering insights for designing targeted interventions. The identified risk factors are instrumental in guiding effective intervention strategies. 
• The machine learning model showed desirable performance in predicting bullying victimization trajectories. • The important predictors had different effects across trajectory subgroups. • Physical aggression and hostility were significantly associated with the low victimization trajectory. • Satisfaction with school and borderline personality features were associated with persistent severe victimization. • The results may provide valuable insights for identifying at-risk groups and designing targeted intervention strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
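The odds ratios quoted in the entry above come from logistic-regression coefficients via OR = exp(beta). A minimal illustration follows; the coefficient values are hypothetical, chosen only so that they reproduce the reported ORs, and are not the study's fitted model.

```python
import math

# Hypothetical coefficients for the persistent-severe-victimization group,
# chosen to back out the ORs reported above (3.698, 2.160, 1.867).
coefficients = {
    "adverse school experiences": 1.308,
    "age": 0.770,
    "irritability": 0.624,
}

# OR = exp(beta): each one-unit increase multiplies the odds by the OR.
odds_ratios = {k: math.exp(b) for k, b in coefficients.items()}
```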
21. Granular-ball-matrix-based incremental semi-supervised feature selection approach to high-dimensional variation using neighbourhood discernibility degree for ordered partially labelled dataset: W.H. Xu and J.L. Li
- Author
-
Xu, Weihua and Li, Jinlong
- Subjects
ARTIFICIAL intelligence ,RANDOM forest algorithms ,IMAGE processing ,MATRIX functions ,NEIGHBORHOODS - Abstract
In numerous real-world applications, data tend to be ordered and partially labelled, predominantly due to the constraints of labeling costs. Current methodologies for managing such data are inadequate, especially when confronted with high-dimensional datasets, which often require reprocessing from scratch, resulting in significant inefficiencies. To tackle this, we introduce an incremental semi-supervised feature selection algorithm that is grounded in neighborhood discernibility and incorporates pseudolabel granular balls and matrix updating techniques. This approach evaluates the significance of features for labelled and unlabelled data independently, using neighborhood distinguishability to identify an optimal subset of features. To enhance computational efficiency, especially with large datasets, we adopt a pseudolabel granular-ball technique, which segments the dataset into more manageable samples prior to feature selection. For high-dimensional data, we employ matrices to store neighborhood information, with distance functions and matrix structures tailored to both low- and high-dimensional contexts. Furthermore, we present a matrix updating method designed to accommodate fluctuations in the number of features. Experimental results across 12 datasets, including 4 with over 2000 features, demonstrate that our algorithm not only outperforms existing methods in handling large samples and high-dimensional datasets but also achieves an average time reduction of more than sixfold compared to similar semi-supervised algorithms. Moreover, we observe an average accuracy improvement of 1.4%, 0.6%, and 0.2% per dataset for SVM, KNN, and Random Forest classifiers, respectively, compared to the best-performing of the compared algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
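The neighborhood discernibility idea in the entry above can be illustrated with a much-simplified sketch: score a feature by the fraction of differently-labelled sample pairs it separates by more than a neighborhood radius. The `discernibility` function, the radius `delta`, and the toy data are illustrative assumptions, not the authors' exact formulation.

```python
from itertools import combinations

def discernibility(values, labels, delta=0.5):
    """Fraction of differently-labelled pairs that this feature
    separates by more than the neighborhood radius delta."""
    pairs = [(i, j) for i, j in combinations(range(len(values)), 2)
             if labels[i] != labels[j]]
    if not pairs:
        return 0.0
    separated = sum(abs(values[i] - values[j]) > delta for i, j in pairs)
    return separated / len(pairs)

# Toy data: feature A tracks the label, feature B is noise.
labels    = [0, 0, 0, 1, 1, 1]
feature_a = [0.1, 0.2, 0.3, 2.1, 2.2, 2.3]
feature_b = [0.5, 2.0, 1.1, 0.6, 1.9, 1.0]

scores = {"A": discernibility(feature_a, labels),
          "B": discernibility(feature_b, labels)}
# Feature A separates every cross-label pair; feature B does not.
```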
22. Factors and Reasons Associated with Hesitating to Seek Care for Migraine: Results of the OVERCOME (US) Study.
- Author
-
Shapiro, Robert E., Muenzel, Eva Jolanda, Nicholson, Robert A., Zagar, Anthony J., Reed, Michael L., Buse, Dawn C., Hutchinson, Susan, Ashina, Sait, Pearlman, Eric M., and Lipton, Richard B.
- Subjects
- *
SUPERVISED learning , *MEDICAL personnel , *MIGRAINE , *BURDEN of care , *RANDOM forest algorithms - Abstract
Introduction: Despite a variety of available treatment options for migraine, many people with migraine do not seek medical care, thereby reducing opportunities for diagnosis and effective treatment and potentially leading to missed opportunities to reduce the burden of disease. Understanding why people hesitate to seek care for migraine may help healthcare professionals and advocates address barriers and improve outcomes. The aim of this study, in a large adult population sample in the United States (US), was to identify factors associated with and reasons for hesitating to seek healthcare for migraine. Methods: The web-based OVERCOME (US) survey study identified adults with active migraine in a demographically representative US sample who answered questions about hesitating to seek care from a healthcare provider for migraine and reasons for hesitating. Supervised machine learning (random forest, least absolute shrinkage and selection operator) identified factors associated with hesitation; logistic regression models assessed association of factors on hesitation. Results: The study results show that of the 58,403 participants with active migraine who completed the OVERCOME (US) baseline survey and provided responses to the question on hesitating to seek care for migraine, 45.1% (n = 26,330/58,403) with migraine indicated that they had ever hesitated to seek care for migraine. Factors most associated with hesitating to seek care were hiding migraine (odds ratio [OR] = 2.69; 95% confidence interval [CI]: 2.50, 2.89), experiencing migraine-related stigma (OR = 2.13; 95% CI 1.95, 2.33), higher migraine-related disability (OR = 1.30; 95% CI 1.23, 1.38), and higher ictal cutaneous allodynia (OR = 1.26; 95% CI 1.19, 1.35). 
The most common reasons participants stated for hesitating included (1) 44.2% wanting to try to take care of migraine on their own, (2) 33.8% feeling that their migraine or headache would not be taken seriously, (3) 29.2% thinking that their migraine was not serious/painful enough, and (4) 27.4% not being able to afford it or not wanting to spend the money. The main limitations of the study include the requirement for respondents to have internet access, which may have introduced cohort bias, and the use of quota sampling rather than random sampling to create a demographically representative sample. Conclusions: Hesitating to seek migraine care is common and is most strongly associated with hiding the disease and migraine-related stigma. Those experiencing higher migraine-related burden are more hesitant to seek the care that might alleviate it. These findings suggest that migraine's social context (e.g., stigma) is a major determinant of hesitance to seek migraine care. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
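Odds ratios like those reported in the entry above can be computed from a 2×2 exposure-outcome table. A short sketch with a Wald confidence interval follows; the counts are made up for illustration and are not taken from the OVERCOME data.

```python
import math

def odds_ratio(a, b, c, d):
    """OR for a 2x2 table:
                 hesitated   did not hesitate
    exposed          a             b
    unexposed        c             d
    """
    return (a * d) / (b * c)

def wald_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval on the log-odds scale."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts: hiding migraine vs. hesitating to seek care.
a, b, c, d = 300, 200, 150, 350
or_value = odds_ratio(a, b, c, d)   # (300*350)/(200*150) = 3.5
lo, hi = wald_ci(a, b, c, d)
```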
23. Inverse modeling of untethered electromagnetic actuators using machine learning.
- Author
-
Türkmen, Gökmen Atakan and Çetin, Levent
- Subjects
- *
ELECTROMAGNETIC actuators , *VECTOR fields , *MAGNETIC fields , *RANDOM forest algorithms , *MAXWELL equations - Abstract
Untethered electromagnetic actuation has become an appealing concept for developing applications in microscale motion control. Although actuator modeling is critical, there is a lack of inverse modeling methods for untethered electromagnetic actuators (EMA) for control design and implementation. Herein, we focus on a machine learning-based framework to obtain inverse models of untethered EMAs. The inverse model is defined as a model that takes a point in the workspace of the EMA, together with the magnetic field at that point, as input and gives the current(s) and position(s) of the electromagnets as output. To obtain the inverse model, the Maxwell equations are first solved numerically for a defined set of coil currents and electromagnet positions. Then, a classification problem is defined by treating the obtained magnetic field values as data and the corresponding input values (currents and positions) as labels. A Random Forest classifier is trained to obtain an inverse model that matches a given magnetic field vector at a position with input values. The proposed approach is employed for three common structures: single, double, and quadruple EMA. Performance tests showed that the obtained inverse model is capable of giving the required magnetic field with an accuracy of 1.43%. Moreover, an experimental study showed that the obtained inverse model is also capable of simulating the real-time behavior of EMA systems. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
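The inverse-model setup above (magnetic field at a point → coil currents and positions) can be sketched with a nearest-neighbour lookup standing in for the Random Forest classifier. The field vectors and actuation labels below are invented placeholders for the Maxwell-equation solutions, not values from the paper.

```python
import math

# Forward simulations: (Bx, By) field sample -> (current, position) label.
# Invented placeholders for the numerically solved field values.
samples = [
    ((0.10, 0.00), "I=1A,x=0cm"),
    ((0.00, 0.10), "I=1A,x=5cm"),
    ((0.20, 0.00), "I=2A,x=0cm"),
    ((0.00, 0.20), "I=2A,x=5cm"),
]

def inverse_model(field):
    """Return the actuation label whose simulated field is closest to the
    requested field vector (1-NN stand-in for the trained classifier)."""
    return min(samples, key=lambda s: math.dist(s[0], field))[1]

print(inverse_model((0.19, 0.01)))  # I=2A,x=0cm
```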
24. A lightweight machine learning method for malware classification.
- Author
-
Farfoura, Mahmoud E., Mashal, Ibrahim, Alkhatib, Ahmad, and Batyha, Radwan M.
- Subjects
- *
INFORMATION technology , *RANDOM forest algorithms , *COMPUTER systems , *MACHINE learning , *EDGE computing - Abstract
Today's Information Technology landscape is rapidly evolving, and cyber professionals are increasingly concerned about maintaining security and privacy. Research has shown that the emergence of new malware is on the rise. The realm of malware attack and defense is an endless cycle: antivirus firms are always striving to create signatures for hazardous malware, while attackers are constantly seeking to circumvent these signatures. Machine learning is highly successful at detecting malware. ML-based malware detection comprises two stages: feature extraction and malware classification. The proposed solutions are designed specifically for low-power embedded devices and edge computing systems, allowing real-time malware detection without imposing a significant computing burden. This study provides an in-depth analysis of feature reduction and lightweight algorithms that enable the proposed method to work effectively and efficiently on any device, from PCs to IoT devices and servers. Extensive experiments were carried out on the BODMAS dataset to provide the best low-complexity method, with an F1 score of more than 99%. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
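The F1 score reported in the entry above is the harmonic mean of precision and recall. A minimal sketch follows; the labels are illustrative, not drawn from the BODMAS dataset.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative labels: malware = 1, benign = 0.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(round(f1_score(y_true, y_pred), 3))  # 0.75
```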
25. Prediction of heart disease using random forest algorithm, support vector machine, and neural network.
- Author
-
Setiyadi, Didik, Henderi, Suryaningrat, Anrie, Swastika, Rulin, Saludin, Mutoffar, Muhamad Malik, and Yunianto, Imam
- Subjects
- *
RANDOM forest algorithms , *MEDICAL forecasting , *SUPPORT vector machines , *HEART diseases , *DIAGNOSTIC errors - Abstract
The heart is a vital organ responsible for pumping blood throughout the human body. Machine learning has become an increasingly important tool in medical forecasting, improving diagnostic accuracy and reducing human error. This study focuses on detecting heart disease using machine learning algorithms. It aims to compare the performance of three key algorithms: random forest (RF), support vector machine (SVM), and neural network (NN). Using a patient dataset with both nominal and numeric attributes, record mining techniques were applied through Orange software. The target classes indicated the absence (0) or presence (1) of heart disorders. The evaluation was based on the prediction accuracy of each algorithm. Results show that SVM achieved the highest accuracy, at 85%, outperforming RF and NN. The findings suggest that the SVM algorithm is a reliable tool for heart disease prediction, helping reduce diagnostic errors and improve medical decision-making. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
26. Enhancing stormwater network overflow prediction: investigation of ensemble learning models.
- Author
-
Boughandjioua, Samira, Laouacheria, Fares, and Azizi, Nabiha
- Subjects
- *
ENSEMBLE learning , *FLOOD warning systems , *ARTIFICIAL intelligence , *RANDOM forest algorithms , *WATER depth - Abstract
This study addresses the critical issue of urban flooding caused by stormwater network overflow, necessitating unified and efficient management measures to handle increasing water volumes and the effects of climate change. The proposed approach aims to improve the precision and efficiency of overflow rate predictions by investigating advanced machine learning algorithms, specifically ensemble methods such as gradient boosting and random forest. The main contribution lies in introducing the SWN-ML approach, which integrates hydraulic simulations using MIKE+ with machine learning to predict average overflow rates for various rainfall durations and return periods. The MIKE+ model was calibrated against the only available observed data: water depth at the outlet point during the storm event of February 4, 2019. The datasets used in the ML models consisted of input variables such as peak flow, maximum depth, length, slope, roughness, and diameter, with the average overflow rate as the output variable. Experimental results show that these methods are effective under a variety of scenarios, with the ensemble methods consistently outperforming classical machine learning models. For example, the models exhibit similar performance metrics, with an MSE of 0.023, RMSE of 0.15, and MAE of 0.101 for a 2-h rainfall duration and a 10-year return period. Correlation analysis further confirms the strong correlation between ensemble method predictions and MIKE+ simulations, with values ranging between 0.72 and 0.80, indicating their effectiveness in capturing stormwater network dynamics. These results validate the utility of ensemble learning models in predicting overflow rates in flood-prone urban areas. The study highlights the potential of ensemble learning models in forecasting overflow rates, offering valuable insights for the development of early warning systems and flood mitigation strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
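The evaluation metrics quoted in the entry above (MSE, RMSE, MAE, and the Pearson correlation between ML predictions and hydraulic simulations) can be computed in a few lines. The overflow values below are invented, not the study's data.

```python
import math

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, and MAE between observed/simulated and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented overflow rates: hydraulic-simulated vs. ML-predicted.
simulated = [0.2, 0.5, 0.9, 1.4, 2.0]
predicted = [0.3, 0.4, 1.0, 1.3, 2.2]
m = regression_metrics(simulated, predicted)
r = pearson_r(simulated, predicted)
```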
27. A Weighted Likelihood Ensemble Approach for Failure Prediction of Water Pipes.
- Author
-
Beig Zali, Ramiz, Latifi, Milad, Javadi, Akbar A., and Farmani, Raziyeh
- Subjects
- *
SUPPORT vector machines , *WATER distribution , *CLASSIFICATION algorithms , *RANDOM forest algorithms , *MACHINE learning - Abstract
This paper presents a novel weighted likelihood ensemble approach for predicting pipe failures in water distribution networks (WDNs). The proposed method leverages ensemble modeling, specifically stacking, to enhance prediction capability. The study utilizes a data set of water pipe failures from 2006 to 2017, segmented into different time intervals. Various classification algorithms, including logistic regression (LR), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGB), are employed to predict failures within these segments. These individual models are then combined to create ensemble models. The results show that the stacked models consistently outperform the models that use the training data set as a whole. Along with traditional evaluation metrics, practical assessments are conducted, considering different percentages of pipes for replacement. These evaluations align with tactical and strategic maintenance plans. Remarkably, the most significant improvements are observed in models with lower replacement percentages. The novel aspect of this approach lies in assigning weights to prediction results from different models, each utilizing distinct time segments of data. By developing a meta-model with linear regression based on weighted likelihoods of pipe failures, this method provides valuable insights for asset managers and decision makers. It aids in prioritizing pipe rehabilitation programs, with the potential for further refinement as new failure data becomes available. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
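The weighted-likelihood stacking idea in the entry above can be sketched as fitting least-squares weights that combine base-model failure probabilities into a meta-prediction. The two-model normal-equation solver and the hold-out probabilities below are illustrative assumptions, not the paper's implementation.

```python
def fit_weights(p1, p2, y):
    """Least-squares weights (no intercept) for combining two base-model
    probability vectors into a meta-prediction: solves the 2x2 normal
    equations of min ||w1*p1 + w2*p2 - y||^2."""
    a11 = sum(a * a for a in p1)
    a12 = sum(a * b for a, b in zip(p1, p2))
    a22 = sum(b * b for b in p2)
    b1 = sum(a * t for a, t in zip(p1, y))
    b2 = sum(b * t for b, t in zip(p2, y))
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

# Invented hold-out predictions from two base models (e.g. RF and SVM)
# and observed outcomes (1 = pipe failed).
p_rf   = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3]
p_svm  = [0.6, 0.9, 0.2, 0.3, 0.8, 0.1]
failed = [1, 1, 0, 0, 1, 0]
w_rf, w_svm = fit_weights(p_rf, p_svm, failed)

def stacked(p1, p2):
    """Meta-model score for a new pipe."""
    return w_rf * p1 + w_svm * p2
```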
28. Estimation of distribution algorithms for well placement optimization in petroleum fields.
- Author
-
Brum, Artur, Coelho, Guilherme, Santos, Antonio Alberto, and Schiozer, Denis José
- Subjects
- *
DISTRIBUTION (Probability theory) , *NET present value , *OIL fields , *RANDOM forest algorithms , *ARTIFICIAL intelligence - Abstract
Optimizing well placement is one of the primary challenges in oil field development. The number and positions of wells must be carefully considered, as it is directly related to the infrastructure cost and the profits over the field's life cycle. In this paper, we propose three estimation of distribution algorithms to optimize well placement with the objective of maximizing the net present value. The methods are guided by an elite set of solutions and are able to obtain multiple local optima in a single run. We also present an auxiliary regression model to preemptively discard candidate solutions with poor performance prediction, thus avoiding running computationally expensive simulations for unpromising candidates. The model is trained with the data obtained during the search process and does not require previous training. Our algorithms yielded a significant improvement compared to a state-of-the-art reference method from the literature, as evidenced by computational experiments with two benchmarks. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
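An estimation of distribution algorithm, as used in the entry above, maintains a probability model fit to an elite set of solutions and samples new candidates from it. A minimal univariate Gaussian EDA on a toy "NPV" objective follows; the objective, population sizes, and the single-variable search space are invented stand-ins for the authors' reservoir problem.

```python
import random
import statistics

def toy_npv(x):
    """Stand-in objective: peaks at x = 3 (a hypothetical well position)."""
    return -(x - 3.0) ** 2

def gaussian_eda(objective, mu=0.0, sigma=2.0, pop=50, elite=10,
                 iters=30, seed=7):
    """Sample a population, keep the elite set, refit the Gaussian, repeat."""
    rng = random.Random(seed)
    for _ in range(iters):
        candidates = [rng.gauss(mu, sigma) for _ in range(pop)]
        best = sorted(candidates, key=objective, reverse=True)[:elite]
        mu = statistics.fmean(best)
        sigma = max(statistics.stdev(best), 1e-3)  # floor avoids collapse
    return mu

best_position = gaussian_eda(toy_npv)  # converges near the optimum x = 3
```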
29. Comparative analysis of machine learning algorithms for heart disease prediction.
- Author
-
Gupta, Isha, Bajaj, Anu, and Sharma, Vikas
- Subjects
- *
MACHINE learning , *CONVOLUTIONAL neural networks , *LONG short-term memory , *FEATURE selection , *RANDOM forest algorithms , *DEEP learning , *ARRHYTHMIA - Abstract
Heart diseases are a major cause of death worldwide, highlighting the need for early detection. The electrocardiogram (ECG) records the heart's electrical activity using electrodes. Our research focuses on ECG data to diagnose heart disorders, particularly arrhythmias. We utilized the MIT-BIH arrhythmia dataset for a comparative analysis of various machine learning techniques, including random forest, k-nearest neighbors, and decision tree, along with deep learning algorithms such as long short-term memory and convolutional neural networks. This required employing various preprocessing methods, such as filtering and normalization, and feature selection techniques, such as chi-square and sequential feature selectors, to improve the performance of heart disease prediction. Hybrid machine and deep learning models are therefore proposed, and the results reveal that hybrid models perform better than conventional models. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
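The chi-square feature selection mentioned in the entry above scores a feature by how far its class-conditional counts deviate from independence. A hand-rolled sketch for binary features follows; the data are illustrative, not from the MIT-BIH dataset.

```python
def chi2_score(feature, labels):
    """Chi-square statistic between a categorical feature and a label:
    sum over contingency cells of (observed - expected)^2 / expected."""
    n = len(feature)
    stat = 0.0
    for f in sorted(set(feature)):
        for y in sorted(set(labels)):
            observed = sum(a == f and b == y for a, b in zip(feature, labels))
            expected = (feature.count(f) * labels.count(y)) / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Feature A matches the (hypothetical) arrhythmia label; B is uninformative.
label     = [1, 1, 1, 1, 0, 0, 0, 0]
feature_a = [1, 1, 1, 1, 0, 0, 0, 0]
feature_b = [1, 0, 1, 0, 1, 0, 1, 0]
print(chi2_score(feature_a, label), chi2_score(feature_b, label))  # 8.0 0.0
```

A selector would keep the highest-scoring features before training the classifiers.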
30. Predicting adult students' online learning persistence: A case study in South Korea using random forest analysis.
- Author
-
Nam, Na-Ra and Song, Sue-Yeon
- Subjects
- *
STUDENT engagement , *RANDOM forest algorithms , *SCHOOL attendance , *INSTRUCTIONAL systems , *SCHOOL dropout prevention - Abstract
This empirical study uses a random forest algorithm to examine the factors that influence learners' persistence in online learning at a prominent Korean institution. The data were collected from students who began their studies in Spring 2021, and encompassed a range of variables including individual attributes, academic engagement, academic achievement, course status, and satisfaction with the institution. The study identified several key predictors of student retention, including academic achievement and variables related to academic engagement, such as students' learning time, course completion rate, and number of logins to the online learning system. Students' number of submitted mid-term assignments and attendance at face-to-face classes also emerged as significant factors related to persistence. The predictive model utilised in this study can provide valuable insight, indicating when a learner is at risk of dropping out and thus enabling timely interventions that promote academic persistence and student success. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
31. Spatial and temporal correlation between soil and rice relative yield in small-scale paddy fields and management zones.
- Author
-
Zhang, Zhihao, He, Jiaoyang, Zhao, Yanxi, Fu, Zhaopeng, Wang, Weikang, Zhang, Jiayi, Liu, Xiaojun, Cao, Qiang, Zhu, Yan, Cao, Weixing, and Tian, Yongchao
- Subjects
- *
RICE farming , *RICE , *AGRICULTURE , *RANDOM forest algorithms , *PRECISION farming - Abstract
Investigating soil properties and yield variability in farming systems is crucial for delineating Management Zones (MZs). The objectives of the study were to investigate the spatiotemporal variability of soil properties, identify spatial and temporal soil yield-limiting factors, and delineate MZs based on these factors. This study was conducted at the Xinghua Rice Smart Farm (119.98°E, 33.08°N) in Jiangsu Province, China, and the experiment covered five consecutive years of soil and rice yield testing from 2017 to 2021, with 933 geo-referenced soil samples and 140 rice yield samples collected annually. Soil samples were analyzed for pH, soil organic matter (SOM), total nitrogen (TN), available phosphorus (AP), available potassium (AK), and apparent soil conductivity (ECa). Spatial and temporal variability of soil properties and relative yield (RY) were analyzed using statistical and geostatistical methods. Ordinary Kriging (OK) interpolation characterized these distributions, and the random forest (RF) algorithm identified key yield-limiting factors. Subsequently, the effectiveness of using all variables to delineate the MZs was compared against the approach of defining MZs based solely on the identified yield-limiting factors. The study also compared Fuzzy C-Means (FCM) and Spatial Fuzzy C-Means (sFCM) clustering to evaluate MZs and their temporal stability. Results showed that the coefficients of variation for soil properties ranged from low to medium (7.7-77.4%), with semivariogram analyses showing moderate to high spatial dependence for most properties. Temporally, soil nutrients and ECa exhibited a slow increase, whereas pH decreased, showing the highest temporal stability for pH and the lowest for AP. RF analysis identified SOM, TN, and ECa as primary influencers of the spatial variability of RY, and SOM, pH, and TN as main contributors to its temporal variability. 
The integration of yield-limiting factors with the sFCM method improves performance of MZ delineation, maintaining stability over the five-year period. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
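The FCM clustering used in the entry above alternates between recomputing cluster centers from fuzzy memberships and memberships from distances to the centers. A minimal 1-D version follows; the deterministic initialisation, fuzzifier `m=2`, and SOM values are illustrative assumptions (the paper's sFCM additionally weights memberships by spatial neighbours, which is not sketched here).

```python
def fuzzy_c_means(xs, k=2, m=2.0, iters=50):
    """Minimal 1-D Fuzzy C-Means with deterministic initialisation."""
    n = len(xs)
    # Alternate points between clusters for a reproducible start.
    u = [[1.0 if i % k == j else 0.0 for j in range(k)] for i in range(n)]
    centers = [0.0] * k
    for _ in range(iters):
        for j in range(k):                       # center update
            weights = [u[i][j] ** m for i in range(n)]
            centers[j] = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        for i in range(n):                       # membership update
            d = [abs(xs[i] - c) + 1e-9 for c in centers]
            for j in range(k):
                u[i][j] = 1.0 / sum((d[j] / d[t]) ** (2 / (m - 1))
                                    for t in range(k))
    return centers, u

# Hypothetical SOM values forming two management zones (around 1 and 5).
som = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
centers, u = fuzzy_c_means(som)
```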
32. Hybrid physically based and machine learning model to enhance high streamflow prediction.
- Author
-
López-Chacón, Sergio Ricardo, Salazar, Fernando, and Bladé, Ernest
- Subjects
- *
MACHINE learning , *RANDOM forest algorithms , *STANDARD deviations , *MACHINE performance , *STREAMFLOW - Abstract
Despite the significant performance of machine learning models for streamflow prediction, their precision for poorly represented data is reduced. This is a concern for flood mitigation purposes where high streamflow values are the most relevant but scarce. Consequently, this study proposes a methodology to create a hybrid model to mitigate the accuracy reduction of a standalone machine learning model in high streamflow values. The hybrid model combines a surrogate model that reproduces a physically based model with a model to estimate its residuals employing the random forest algorithm. The hybrid model reaches a root mean square error reduction of 23% and 33% in the respective study catchments for values over a 3-year return period compared to a standalone machine learning model. The percentage bias decreases by more than 70% from values over a 1.5-year return period. Moreover, the hybrid model has shown close predictions of values higher than the training set. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
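The hybrid scheme in the entry above keeps the physically based (surrogate) prediction and adds a data-driven estimate of its residual. In this sketch a 1-nearest-neighbour residual model stands in for the random forest, and the surrogate, driver values, and flows are all invented.

```python
def surrogate(x):
    """Stand-in physically based model: systematically underestimates
    high streamflow (a hypothetical bias, for illustration)."""
    return 0.8 * x

# Training events: (driver value, observed streamflow), invented.
train = [(10, 10.0), (50, 52.0), (100, 108.0), (200, 230.0)]
residuals = [(x, y - surrogate(x)) for x, y in train]

def hybrid(x):
    """Surrogate prediction plus the residual of the nearest training event
    (1-NN stand-in for the random forest residual model)."""
    _, r = min(residuals, key=lambda p: abs(p[0] - x))
    return surrogate(x) + r

# A high-flow test event: the hybrid correction shrinks the error.
observed_high = 225.0
base_error = abs(surrogate(195) - observed_high)
hybrid_error = abs(hybrid(195) - observed_high)
```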
33. A spatially explicit multi-hazard framework for assessing flood, landslide, wildfire, and drought susceptibilities.
- Author
-
Choubin, Bahram, Jaafari, Abolfazl, and Mafi-Gholami, Davood
- Subjects
- *
ENSEMBLE learning , *RAINFALL , *RANDOM forest algorithms , *SUSTAINABLE construction , *BUILDING design & construction , *LANDSLIDES , *LANDSLIDE hazard analysis , *HAZARD mitigation , *DROUGHT management - Abstract
Sustainable development goals require evaluating vulnerabilities and examining natural and climatic hazards for effective planning that reduces their impact on economic, social, and developmental efforts. Key hazards like floods, landslides, wildfires, and droughts have significantly affected terrestrial ecosystems and human societies, emphasizing the importance of comprehending these hazards. This study aimed to predict and spatially map multi-hazard susceptibility, identifying historical and potential risks to inform sustainable development and construction programs that mitigate risks and promote resilience. A 34-year drought magnitude map was generated using long-term data, and ensemble and individual machine learning techniques were used to produce maps of flood, landslide, and wildfire hazards in a northwest region of Iran. Results demonstrated that ensemble learning models outperformed individual models, with the top-performing models being the weighted average (WA) of the two best models, random forest, extreme gradient boosting, and the WA incorporating all models, with over 80% accuracy. The CART model performed best among individual models. Variable importance analysis revealed that slope and precipitation were the most crucial factors for identifying high-hazard landslide areas; distance from waterways, vegetation cover, and the topographic humidity index were the most crucial factors for identifying flood hazard areas; and vegetation, rainfall, and proximity to roads significantly impacted wildfire hazard. The multi-hazard map produced by our study indicated that about 30% of the study area was highly or very highly susceptible to floods, landslides, wildfires, and droughts, and hazard mitigation efforts should be directed primarily to these portions of the study area. 
Our study underscored the importance of integrating long-term data and machine learning techniques in multi-hazard prediction and mapping, ultimately guiding mitigation efforts and promoting resilience in the face of natural and climatic hazards. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
34. A novel approach to improve population mapping considering facility-based service capacity and land livability.
- Author
-
Ma, Yu, Zhou, Chen, and Li, Manchun
- Subjects
- *
REMOTE sensing , *CENSUS , *RANDOM forest algorithms , *CELL phones , *LAND use - Abstract
Social sensing data, including points-of-interest (POI) and mobile position, are important data sources for population mapping. However, existing studies using POI data disregard the heterogeneity in facility size among the same type of POI and urban–rural differences. Moreover, mobile position data face biased issues. This study presents a hybrid model that considers facility-based service capacity (FSC) and land livability to map fine-scale population distributions. Based on extracting the FSC index by integrating POI, mobile phone positioning (MPP), and road network data, the district-level census population was disaggregated into 100-m grids using a random forest model. Subsequently, a regression equation was developed from the land use data to correct the estimated residuals. The results showed that the hybrid model exhibited considerably smaller errors than those of the POI density-based method and direct mapping of MPP (RMSE = 5,393.31, 7,348.91, and 7,824.76, respectively), effectively reducing population misallocation in extreme-density regions. Compared to WorldPop and LandScan datasets and a comparative model, our method reduced the RMSE by 30–60%, proving the effectiveness of integrating various social sensing and remote sensing data for improving population mapping. This study improves on an existing method as a thought-provoking step toward advancement in fine-scale population mapping research. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
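The disaggregation step in the entry above spreads a district's census population over its grid cells in proportion to model-predicted weights, so that cell allocations sum back to the census total. The weights below are invented stand-ins for the random-forest output.

```python
def disaggregate(district_pop, cell_weights):
    """Allocate a district census total to grid cells in proportion
    to their predicted weights (allocations sum back to the total)."""
    total = sum(cell_weights)
    return [district_pop * w / total for w in cell_weights]

# Hypothetical RF-predicted weights for five 100-m cells in one district.
weights = [0.5, 2.0, 1.0, 4.0, 2.5]
cells = disaggregate(100_000, weights)
print(cells)  # [5000.0, 20000.0, 10000.0, 40000.0, 25000.0]
```

The paper additionally corrects the estimated residuals with a land-use regression, which is not sketched here.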
35. Determination of Bridge Elements' Weights Using the Random Forest Algorithm.
- Author
-
Abiona, Qozeem O. and Head, Monique H.
- Subjects
- *
RANDOM forest algorithms , *DECISION trees , *BRIDGE design & construction , *BRIDGE maintenance & repair , *BUDGET , *BRIDGE inspection , *BRIDGES - Abstract
Significant bridge inspection data has been collected over the years at the component and element level to improve management practices in the United States. A widely adopted systematic approach to correlate the weight or importance of the bridge elements to the overall bridge performance, which influences the maintenance, repair, and replacement (MRR) schedule and resource allocation for structures, does not exist given the existing data. Some transportation agencies use a cost-based approach to assign weights to bridge elements, which can be in terms of the loss accrued during downtime or the amount needed for the replacement of the element. However, this approach does not consider the bridge elements' structural relevance to the overall performance of the bridge. This study proposes a novel framework to synthesize component and element-level bridge data to showcase their relationship using the random forest algorithm, which is essentially an ensemble of decision trees to evaluate the importance of different elements relative to the overall condition of the bridge. The analysis focused on eight bridge design types predominant in Delaware, Maryland, Pennsylvania, Virginia, and West Virginia, and analyzed 104,699 bridge records consisting of the condition rating and element-level data from the National Bridge Inventory (NBI). The random forest algorithm showed that bridge elements' weight (or importance) is not constant as implied by the cost-based approach; rather, bridge elements' weight varies based on their relevance to the bridge's structural performance. The resultant bridge elements' weight, which is the element weight multiplied by the component weight, can be used to improve the existing Bridge Health Index (BHI) equation found in the Manual for Bridge Evaluation (MBE) using this data-driven approach. 
Given more available component and element-level bridge data, this formulation provides a framework for transportation personnel to determine which set of bridge elements to prioritize in their maintenance actions and ascertain whether the elements receiving the highest priority in the MRR schedule and budget allocation are also the same set of elements that bridge inspection reports regard as needing attention. Practical Applications: The United States bridge inventory is made up of several bridge design types with distinct deterioration characteristics based on their structural configuration, which must be considered when making decisions about maintenance and repair strategies. However, the method currently adopted by bridge owners to prioritize the repair of the many bridge parts (or elements) is largely dependent on the cost of repair and the economic loss during the downtime of such elements, as decided by experts, which introduces personal bias and does not account for the distinctions among the different bridge types in the inventory (Chase et al. 2016; Inkoom and Sobanjo 2018). Given nationwide efforts to collect bridge inspection data, it is essential to consider a data-driven approach that derives the bridge elements' importance (or weight) from historical condition state and condition rating data of bridge elements and components, and that separates the bridge inventory into design types when computing overall bridge health. This helps to capture in real time how the deterioration of one bridge part affects another part, which in turn helps to identify the bridge elements that most influence the overall condition of the bridge when prioritizing bridge repairs using the random forest algorithm. 
This paper showcases a data-driven approach within a novel framework used to assess the overall bridge health using random forest algorithms that track how the deterioration of small bridge elements affects the condition of the bridge components they are attached to, and the overall bridge condition, thus potentially improving the method for computing bridge element weights within the existing Bridge Health Index (BHI) formulation documented in the Manual for Bridge Evaluation (MBE). [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
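The data-driven weighting idea in the record above, deriving bridge-element weights from historical condition data via random-forest importances, can be sketched as follows. The element names, condition data, and model settings here are synthetic illustrations, not the study's dataset or tuned model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
elements = ["deck", "girder", "bearing", "joint"]  # hypothetical element set

# Synthetic condition states (0-4) for each element across 300 inspections.
X = rng.integers(0, 5, size=(300, len(elements))).astype(float)
# Overall condition rating driven mostly by deck and girder condition.
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2] + 0.1 * X[:, 3] + rng.normal(0, 0.1, 300)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
# Normalized importances serve as the element weights.
weights = dict(zip(elements, rf.feature_importances_))
```

Because the importances sum to one, they can be plugged directly into a weighted health-index formula.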
36. Analysis of high-molecular-weight proteins using MALDI-TOF MS and machine learning for the differentiation of clinically relevant Clostridioides difficile ribotypes.
- Author
-
Candela, Ana, Rodriguez-Temporal, David, Blázquez-Sánchez, Mario, Arroyo, Manuel J., Marín, Mercedes, Alcalá, Luis, Bou, Germán, Rodríguez-Sánchez, Belén, and Oviaño, Marina
- Subjects
SUPERVISED learning, CLOSTRIDIOIDES difficile, SUPPORT vector machines, K-nearest neighbor classification, RANDOM forest algorithms - Abstract
Purpose: Clostridioides difficile is the main cause of antibiotic-related diarrhea, and some ribotypes (RT), such as RT027, RT181 or RT078, are considered high-risk clones. A fast and reliable approach to C. difficile ribotyping is needed for appropriate clinical management. This study analyses high-molecular-weight proteins for C. difficile ribotyping with MALDI-TOF MS. Methods: Sixty-nine isolates representative of the most common ribotypes in Europe were analyzed in the 17,000–65,000 m/z region and classified into 4 categories (RT027, RT181, RT078 and 'Other RTs'). Five supervised Machine Learning algorithms were tested for this purpose: K-Nearest Neighbors, Support Vector Machine, Partial Least Squares-Discriminant Analysis, Random Forest (RF) and Light-Gradient Boosting Machine (GBM). Results: All algorithms yielded cross-validation results above 70%, with RF and Light-GBM performing best at 88% agreement. The area under the ROC curve for these two algorithms exceeded 0.9. RT078 was correctly classified with 100% accuracy, while isolates from the RT181 category could not be differentiated from RT027. Conclusions: This study shows that relevant C. difficile ribotypes can be rapidly discriminated using MALDI-TOF MS. This methodology reduces the time, cost and labor of current reference methods. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
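A minimal sketch of the kind of algorithm comparison described in the record above: several supervised classifiers scored by cross-validation. The data here are synthetic stand-ins for peak-intensity features; the real spectra, feature counts, and hyperparameters are not reproduced.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic 4-class data standing in for spectra of the four ribotype categories.
n_per, n_feat = 30, 20
centers = rng.normal(0, 3, size=(4, n_feat))
X = np.vstack([c + rng.normal(0, 1, size=(n_per, n_feat)) for c in centers])
y = np.repeat(np.arange(4), n_per)

clfs = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
}
# Mean 5-fold cross-validated accuracy per algorithm.
cv_acc = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in clfs.items()}
```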
37. Balance evaluation system using wearable IMU sensing.
- Author
-
Wang, Tiantian, Liu, Minghui, Bao, Benkun, Zhang, Senhao, Yang, Liuxin, Yang, Hongbo, Guo, Kai, and Meng, Dianhuai
- Subjects
RANDOM forest algorithms, WEARABLE technology, DEEP learning, TESTING equipment, MEDICAL rehabilitation - Abstract
The evaluation of balance and postural stability is of significant importance in both medical rehabilitation and daily life. However, the clinical method is hindered by the immobility and relatively high cost of force platforms. Wearable sensors, such as accelerometers, have emerged as an alternative that overcomes the limitations of traditional force platforms. The purpose of this study is therefore to use data from a low-cost, portable, small-sized IMU (specifically an accelerometer) to predict indicators derived from force platform devices. A miniaturized, portable acceleration test device was developed. Combined with the random forest algorithm, our classification method achieved accuracy, recall, precision, F1-score, and specificity above 95%. This study provides a more portable and highly accurate tool for assessing balance ability. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
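Before a random forest can classify balance ability, the raw accelerometer trace must be reduced to features. A simple illustration of hand-crafted sway features one might compute from a triaxial trace (the feature choices are ours, not the paper's):

```python
import numpy as np

def sway_features(acc):
    """Simple balance features from a triaxial accelerometer trace
    (rows = samples, columns = x/y/z axes)."""
    rms_per_axis = np.sqrt(np.mean(acc ** 2, axis=0))  # sway energy per axis
    magnitude = np.linalg.norm(acc, axis=1)            # resultant acceleration
    return {
        "rms_x": float(rms_per_axis[0]),
        "rms_y": float(rms_per_axis[1]),
        "rms_z": float(rms_per_axis[2]),
        "mean_magnitude": float(magnitude.mean()),
    }
```

Feature vectors like this, computed per recording window, would then be stacked into the classifier's training matrix.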
38. Machine Learning Versus Empirical Models to Predict Daily Global Solar Irradiation in an Average Year: Homogeneous Parallel Ensembles Prevailed.
- Author
-
De Souza, Keith
- Subjects
MACHINE learning, EXTREME learning machines, STANDARD deviations, ARTIFICIAL intelligence, SUPPORT vector machines, DECISION trees, RANDOM forest algorithms - Abstract
Accurate predictive daily global horizontal irradiation models are essential for diverse solar energy applications. Their long-term performances can be assessed using average years. This study scrutinized 70 machine learning and 44 empirical models using two disjoint 5-year average daily training and validation datasets, each comprising 365 records and ten features. The features included day number, minimum and maximum air temperature, air temperature amplitude, theoretical and observed sunshine hours, theoretical extraterrestrial horizontal irradiation, relative sunshine, cloud cover, and relative humidity. Fourteen machine learning algorithms, namely multiple linear regression, ridge regression, Lasso regression, elastic net regression, Huber regression, k-nearest neighbors, decision tree, support vector machine, multilayer perceptron, extreme learning machine, generalized regression neural network, extreme gradient boosting, gradient boosting machine, and light gradient boosting machine, were trained, validated, and instantiated as base learners in four strategically designed homogeneous parallel ensembles (variants of pasting, random subspace, bagging, and random patches), which were also scrutinized, producing 70 models. Specific hyperparameters of the algorithms were optimized. Validation showed that at least two of the ensembles outperformed their corresponding individual models. Huber-subspace ranked first with a root mean square error of 1.495 MJ/m2/day. The multilayer perceptron was most robust to the random perturbations of the ensembles, which suggests good tolerance to noise in ground-truth data. The best empirical model returned a validation root mean square error of 1.595 MJ/m2/day but was outperformed by 93% of the machine learning models, with the homogeneous parallel ensembles producing superior predictive accuracies. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
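The four homogeneous parallel ensemble types named in the abstract above differ only in whether they subsample rows, columns, or both, and whether sampling is with replacement. A sketch using scikit-learn's `BaggingRegressor`; the base learner and subsampling fractions are illustrative, not the study's tuned settings.

```python
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

def make_ensembles(seed=0):
    base = DecisionTreeRegressor(random_state=seed)
    common = dict(n_estimators=50, random_state=seed)
    return {
        # Pasting: subsample rows without replacement.
        "pasting": BaggingRegressor(base, max_samples=0.7, bootstrap=False, **common),
        # Bagging: subsample rows with replacement.
        "bagging": BaggingRegressor(base, max_samples=0.7, bootstrap=True, **common),
        # Random subspace: subsample features only.
        "subspace": BaggingRegressor(base, max_features=0.6, bootstrap=False, **common),
        # Random patches: subsample both rows and features.
        "patches": BaggingRegressor(base, max_samples=0.7, max_features=0.6, **common),
    }
```

Swapping `DecisionTreeRegressor` for any of the fourteen base learners reproduces the study's grid of 56 ensemble models.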
39. Experimental Verification for Machine-Learning Approaches in Compressive Strength Prediction of Alkali-Activated Concrete.
- Author
-
Morsy, Alaa M., Saleh, Sara A., and Shalan, Ali H.
- Subjects
LONG short-term memory ,RECURRENT neural networks ,IMPACT strength ,RANDOM forest algorithms ,ALKALINE solutions ,MACHINE learning - Abstract
This study presents a new tool, built within a machine-learning (ML) framework, for predicting the compressive strength of alkali-activated concrete (AAC) from its binder mineralogy. To achieve this challenging task, the authors collected 45 data sources from the literature to build a data set of 809 samples with nine effective features: binder content, alkaline-to-binder ratio, binder chemical composition (CaO, SiO2, Al2O3, and MgO contents), NaOH molarity and percentage in the alkaline solution, age, and compressive strength. To assess the accuracy of the prediction tool, the authors trained and evaluated the data set using the most relevant ML methods: Lasso regression, random forest regression, decision tree, AdaBoost, extreme gradient boosting (XGB), and long short-term memory with recurrent neural network (LSTM-RNN). An experimental program was also conducted to further validate the accuracy of the predictions. Overall, the XGB and LSTM-RNN methods significantly outperformed the other methods, recording R2 values of 0.93 and 0.96 for compressive strength prediction. Further analysis of the binder mineralogy showed that increasing the calcite content increased AAC compressive strength, whereas increasing silicates in the binder mineralogy decreased it due to the shortage of calcites. The Shapley additive explanations (SHAP) analysis revealed that calcite and silicate had the highest SHAP values for AAC compressive strength, whereas the Al2O3 and MgO percentages had only a minor impact. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
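The R2 values quoted in the abstract above are the coefficient of determination. As a quick reference, it can be computed directly; this is the generic formula, independent of the study's models:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

A model that always predicts the mean scores 0; a perfect model scores 1.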
40. Applicability of Machine Learning to Predict the Flexural Stresses in Jointed Plain Concrete Pavement.
- Author
-
Khichad, Jeetendra Singh, Vishwakarma, Rameshwar J., Singh, Saurabh, and Sain, Amit
- Subjects
CONCRETE pavements ,RANDOM forest algorithms ,PLAINS ,SENSITIVITY analysis - Abstract
The development of critical flexural stresses causes variations in the design thickness requirements of jointed plain concrete pavement (JPCP), and these stresses are influenced by various combinations of design parameters. It is important to study design parameters such as the maximum temperature difference, foundation strength, slab thickness, and vehicular load to determine the maximum flexural stresses. A total of 480 design conditions (2,880 data sets) each for single- and tandem-axle loading were analyzed, covering the full practical range of design-parameter combinations. Equations for the maximum flexural stresses in terms of the design parameters were derived using machine learning (ML) algorithms: multiple linear regression (MLR), support vector regression (SVR), and random forest (RF) with 3-fold cross-validation. These ML algorithms showed good correlations of R2 = 0.918, 0.927, and 0.929, respectively, for single-axle load and R2 = 0.915, 0.905, and 0.934 for tandem-axle load. The equations were used to find the critical flexural stresses for the thickness design of JPCP. The computed maximum flexural stresses were also compared with different design approaches, such as the simplified approach, stress charts, and regression equations for the thickness design of the JPCP slab. Sensitivity analysis was performed by adjusting each input variable by 10% while keeping the other input parameters constant. The radius of relative stiffness and the stress coefficient had the greatest influence on the maximum flexural stresses. This study should be helpful for predicting the effect of design parameters and precisely determining the maximum flexural stresses in JPCP. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
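The three-model, 3-fold cross-validation comparison described above can be sketched in a few lines. The data here are a synthetic stand-in (the real design-parameter dataset is not public in this record), and the models are left at default settings.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
# Stand-in columns: e.g. temperature difference, foundation strength,
# slab thickness, axle load (all scaled to [0, 1]).
X = rng.random((150, 4))
y = 3 * X[:, 0] + 2 * X[:, 2] + rng.normal(0, 0.05, 150)

models = {
    "MLR": LinearRegression(),
    "SVR": SVR(),
    "RF": RandomForestRegressor(random_state=1),
}
# Mean R^2 over 3 folds, matching the paper's validation scheme.
scores = {name: cross_val_score(m, X, y, cv=3, scoring="r2").mean()
          for name, m in models.items()}
```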
41. Exploring Bengali speech for gender classification: machine learning and deep learning approaches.
- Author
-
Arpita, Habiba Dewan, Al Ryan, Abdullah, Hossain, Md. Fahad, Rahman, Md. Sadekur, Sajjad, Md, and Islam Prova, Nuzhat Noor
- Subjects
MACHINE learning ,DEEP learning ,CONVOLUTIONAL neural networks ,SUPPORT vector machines ,RANDOM forest algorithms ,SPEECH perception - Abstract
Speech enables clear and powerful transmission of ideas. The human voice, rich in tone and emotion, holds unique beauty and significance in daily life. Vocal pitch varies by gender and is influenced by emotion and language. While people naturally perceive these nuances, machines often struggle to capture such subtle distinctions. This project aims to use various machine learning (ML) and deep learning (DL) techniques to reliably determine an individual's gender from a corpus of Bengali conversations. Our dataset comprises 3185 Bengali speeches: 1100 delivered by men, 1035 by women, and 1050 by people who identify as third gender. We employed six feature extraction techniques to examine the audio data: roll-off, spectral centroid, chroma-STFT, spectral bandwidth, zero crossing rate, and Mel-frequency cepstral coefficients (MFCC). Extreme gradient boosting (XGBoost), support vector machines (SVM), K-nearest neighbors (KNN), decision tree classifier (DTC), and random forest (RF) were the five ML algorithms employed to comprehensively analyze the dataset. For a full study, we also included a 1D convolutional neural network (CNN) from the DL domain. The 1D CNN performed extraordinarily well, exceeding the accuracy of all other algorithms at a stunning 99.37%. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
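Two of the audio features named in the abstract above have compact definitions. These are simplified NumPy versions (libraries such as librosa compute frame-wise variants with windowing; this whole-signal form is only an illustration):

```python
import numpy as np

def zero_crossing_rate(signal):
    """Fraction of consecutive samples whose sign differs."""
    signs = np.signbit(signal)
    return float(np.mean(signs[1:] != signs[:-1]))

def spectral_centroid(signal, sr):
    """Magnitude-weighted mean frequency of the signal's spectrum (Hz)."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))
```

For a pure tone, the zero crossing rate approaches twice the tone frequency divided by the sample rate, and the centroid sits at the tone frequency.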
42. Fault diagnosis of power transformer using random forest based combined classifier.
- Author
-
Prasojo, Rahman Azis, Sutjipto, Rachmat, Hanif, Muhammad Rafi, Dermawan, Chalvyn Rahmat, and Kurniawan, Indra
- Subjects
RANDOM forest algorithms ,INSULATING oils ,POWER transformers ,DIELECTRIC materials ,FAULT diagnosis - Abstract
In the power system, transformers are crucial electrical equipment that require an insulator or dielectric material, such as paper immersed in insulating oil, to prevent electrical contact between components. The dissolved gas analysis (DGA) test is important for diagnosing transformers and determining maintenance recommendations. The Duval triangle method (DTM) is commonly used to identify faults in transformers. The data used in this article are from DGA tests of power transformers in the East Java and Bali transmission main unit (UIT JBM). The DGA data were analyzed based on the IEEE C57.104-2019 standard and with the developed random forest (RF) classifier-based DTM, which allows easier software implementation and better accuracy. Fault identification in a six-transformer case study showed a low-thermal fault (T1, <300 °C) in transformer 1, where methane gas increased; stray gassing (S) in transformer 5, due to escalating hydrogen gas production; and overheating (O, <250 °C) in transformers 2 and 6, due to rising ethane gas production. Transformers 3 and 4 were found to be in normal condition. This fault identification enhances the accuracy of DGA-based maintenance recommendations. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
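The Duval triangle coordinates underlying the DTM are simply the relative percentages of three dissolved gases. A minimal helper for that step (the paper's RF classifier is then trained on such coordinates; that trained mapping is not reproduced here):

```python
def duval_percentages(ch4, c2h4, c2h2):
    """Relative %CH4, %C2H4, %C2H2 (ppm inputs): the Duval triangle coordinates."""
    total = ch4 + c2h4 + c2h2
    return tuple(100.0 * gas / total for gas in (ch4, c2h4, c2h2))
```

For example, a sample dominated by methane maps to a point near the CH4 vertex of the triangle, the region associated with low-temperature thermal faults.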
43. A method for improving gas sensor selectivity based on fast temperature modulation.
- Author
-
林凯滨, 林建华, 贾 建, 高晓光, and 何秀丽
- Subjects
GAS detectors ,PULSE modulation ,SUPPORT vector machines ,SENSOR arrays ,RANDOM forest algorithms - Abstract
Copyright of Journal of Test & Measurement Technology is the property of Publishing Center of North University of China and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2025
- Full Text
- View/download PDF
44. Evaluating machine learning models for predictive analytics of liver disease detection using healthcare big data.
- Author
-
Khaled, Osama Mohareb, Elsherif, Ahmed Zakareia, Salama, Ahmed, Herajy, Mostafa, and Elsedimy, Elsayed
- Subjects
K-nearest neighbor classification ,RANDOM forest algorithms ,LIVER diseases ,CIRRHOSIS of the liver ,EARLY diagnosis - Abstract
Liver diseases rank among the most prevalent health issues globally, causing significant morbidity and mortality. Early detection of liver diseases allows for timely intervention, which can prevent their progression to more severe stages such as cirrhosis or liver cancer. To this end, many machine learning models have been developed to predict liver diseases early among potential patients. However, each model has its accuracy and performance limitations. In this paper, we present a comprehensive comparison of three machine learning models that can be employed to enhance the prediction and management of liver diseases. We utilize a big data set of 32,000 records to evaluate the performance of each model. First, we implement a preprocessing technique to rectify missing or corrupt data in liver disease datasets, ensuring data integrity. Afterwards, we compare the performance of three models: k-nearest neighbors (KNN), Gaussian naive Bayes (Gaussian NB), and random forest (RF). We conclude that the RF algorithm demonstrates superior performance in our evaluation, excelling in both predictive accuracy and the ability to classify patients correctly regarding the presence of liver disease. Our results show that RF outperforms the other models on several performance metrics, including accuracy (97.3%), precision (97%), recall (96%), and F1-score (95%). [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
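The metrics reported above are all derived from confusion-matrix counts. As a generic reference (not tied to this study's data), they can be computed as:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of predicted positives, how many were real
    recall = tp / (tp + fn)      # of real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```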
45. Identifying freshness of various chilled pork cuts using rapid imaging analysis.
- Author
-
Cheng, Haoran, Li, Jinglei, Yang, Yulong, Zhou, Gang, Xu, Baocai, and Yang, Liu
- Subjects
MACHINE learning, IMAGE analysis, RANDOM forest algorithms, COLORIMETERS, DECISION trees - Abstract
BACKGROUND: Determining the freshness of chilled pork is of paramount importance to consumers worldwide. Established freshness indicators such as total viable count, total volatile basic nitrogen and pH are destructive and time‐consuming to measure. Color change in chilled pork is also associated with freshness, but traditional detection with handheld colorimeters is expensive, inconvenient and limited in accuracy. Although substantial progress has been made in pork preservation and freshness evaluation, existing methods often require costly equipment or specialized expertise, restricting their accessibility to general consumers and small‐scale traders. Developing a user‐friendly, rapid and economical method is therefore of particular importance. RESULTS: This study conducted image analysis of photographs of chilled pork stored at 4 °C for 7 days, captured with smartphone cameras. The analysis tracked color changes, which were then used to develop predictive models for freshness indicators. Compared to handheld colorimeters, smartphone image analysis demonstrated superior stability and accuracy in color data acquisition. Machine learning regression models, particularly the random forest and decision tree models, achieved prediction accuracies of more than 80% and 90%, respectively. CONCLUSION: Our study provides a feasible and practical non‐destructive approach to determining the freshness of chilled pork. © 2024 Society of Chemical Industry. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
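The color-to-freshness regression described above can be sketched with a tree model. The color trend here (redness falling, yellowness rising with storage time) is an assumed synthetic pattern, not the study's measurements.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
days = rng.uniform(0, 7, 200)  # storage day, the proxy freshness target

# Hypothetical mean-color features extracted from smartphone photos.
X = np.column_stack([
    200 - 8 * days + rng.normal(0, 2, 200),  # red channel fades
    120 + 5 * days + rng.normal(0, 2, 200),  # yellow channel grows
])

model = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, days)
```

In the real pipeline, the target would be a measured indicator such as total volatile basic nitrogen rather than storage day.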
46. Efficient diagnosis of diabetes mellitus using an improved ensemble method.
- Author
-
Olorunfemi, Blessing Oluwatobi, Ogunde, Adewale Opeoluwa, Almogren, Ahmad, Adeniyi, Abidemi Emmanuel, Ajagbe, Sunday Adeola, Bharany, Salil, Altameem, Ayman, Rehman, Ateeq Ur, Mehmood, Asif, and Hamam, Habib
- Subjects
RANDOM forest algorithms, FEATURE selection, ARTIFICIAL intelligence, REGRESSION trees, IMAGE processing, BOOSTING algorithms - Abstract
Diabetes is a growing health concern in developing countries, causing considerable mortality. While machine learning (ML) approaches have been widely used to improve early detection and treatment, several studies have reported low classification accuracies due to overfitting, underfitting, and data noise. This research employs parallel and sequential ensemble ML approaches paired with feature selection techniques to boost classification accuracy. The Pima Indians Diabetes dataset from the UCI ML Repository served as the data source. Preprocessing included cleaning the dataset by replacing missing values with column means and selecting highly correlated features using forward and backward selection. The dataset was split into training (70%) and testing (30%) parts. Classification was implemented in Python in Jupyter Notebook, across two design phases. The first phase used J48, Classification and Regression Tree (CART), and Decision Stump (DS) to create a random forest model. The second phase employed the same algorithms alongside sequential ensemble methods (XGBoost, AdaBoostM1, and Gradient Boosting) using an average voting algorithm for binary classification. Evaluation revealed that XGBoost, AdaBoostM1, and Gradient Boosting achieved classification accuracies of 100%, with F1 score, MCC, precision, recall, AUC-ROC, and AUC-PR all equal to 1.00, indicating reliable predictions of diabetes presence. Researchers and practitioners can leverage the predictive model developed in this work to make quick predictions of diabetes mellitus, which could save many lives. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
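The average voting step named above combines the member models' predicted probabilities and thresholds the mean. A minimal sketch of that combination rule (member models themselves omitted):

```python
import numpy as np

def average_vote(probabilities, threshold=0.5):
    """Soft voting: average per-model positive-class probabilities
    (rows = models, columns = samples), then threshold the mean."""
    mean_p = np.mean(probabilities, axis=0)
    return (mean_p >= threshold).astype(int)
```

For three models assigning a sample probabilities 0.9, 0.8 and 0.7, the mean is 0.8 and the ensemble votes positive.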
47. Detecting noncredible symptomology in ADHD evaluations using machine learning.
- Author
-
Finley, John-Christopher A., Phillips, Matthew S., Soble, Jason R., and Rodriguez, Violeta J.
- Subjects
MACHINE learning, RANDOM forest algorithms, TEST validity, ARTIFICIAL intelligence, ATTENTION-deficit hyperactivity disorder - Abstract
Introduction: Diagnostic evaluations for attention-deficit/hyperactivity disorder (ADHD) are becoming increasingly complicated by the number of adults who fabricate or exaggerate symptoms. Novel methods are needed to improve the assessment process required to detect these noncredible symptoms. The present study investigated whether unsupervised machine learning (ML) could serve as one such method and detect noncredible symptom reporting in adults undergoing ADHD evaluations. Method: Participants were 623 adults who underwent outpatient ADHD evaluations. Patients' scores from symptom validity tests embedded in two self-report questionnaires were examined in an unsupervised ML model. The model, called "sidClustering," is based on a clustering and random forest algorithm. It synthesized the raw scores (without cutoffs) from the symptom validity tests into an unspecified number of groups, which were then compared to predetermined ratings of credible versus noncredible symptom reporting; noncredible ratings were defined by either two or three or more symptom validity test elevations. Results: The model identified two groups that were significantly (p < .001) and meaningfully associated with the predetermined ratings of credible or noncredible symptom reporting, regardless of the number of elevations used to define noncredible reporting. The validity test assessing overreporting of various types of psychiatric symptoms was most influential in determining group membership, but symptom validity tests regarding ADHD-specific symptoms also contributed. Conclusion: These findings suggest that unsupervised ML can effectively identify noncredible symptom reporting using scores from multiple symptom validity tests without predetermined cutoffs. The ML-derived groups also support the use of two validity test elevations to identify noncredible symptom reporting. Collectively, these findings serve as a proof of concept that unsupervised ML can improve the process of detecting noncredible symptoms during ADHD evaluations. With additional research, unsupervised ML may become a useful supplementary tool for quickly and accurately detecting noncredible symptoms during these evaluations. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
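The general pattern above, clustering raw validity-test scores without cutoffs and then comparing the discovered groups to predetermined ratings, can be illustrated with ordinary k-means as a stand-in (sidClustering itself is a different, random-forest-based algorithm, and the data below are entirely synthetic):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic raw scores on three validity tests: one lower-scoring group and
# one elevated group (assumed structure, not the study's data).
credible = rng.normal(50, 5, size=(80, 3))
noncredible = rng.normal(75, 5, size=(40, 3))
scores = np.vstack([credible, noncredible])

# Cluster without any cutoff; the labels are then cross-tabulated against
# the predetermined credible/noncredible ratings.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
```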
48. Anxiety in aquatics: Leveraging machine learning models to predict adult zebrafish behavior.
- Author
-
Srivastava, Vartika, Muralidharan, Anagha, Swaminathan, Amrutha, and Poulose, Alwin
- Subjects
NAIVE Bayes classification, ANIMAL behavior, DRUG discovery, RANDOM forest algorithms, SUPPORT vector machines, MACHINE learning - Abstract
Accurate analysis of anxiety behaviors in animal models is pivotal for advancing neuroscience research and drug discovery. This study compares the potential of DeepLabCut, ZebraLab, and machine learning models for analyzing anxiety-related behaviors in adult zebrafish. Using a dataset of video recordings of unstressed and pre-stressed zebrafish, we extracted features such as total inactivity duration (immobility), time spent at the bottom, time spent at the top, and turn angles (large and small). We observed that the data obtained using DeepLabCut and ZebraLab were highly correlated. Using these data, we annotated behaviors as anxious or not anxious and trained several machine learning models, including Logistic Regression, Decision Tree, K-Nearest Neighbours (KNN), Random Forests, Naive Bayes Classifiers, and Support Vector Machines (SVMs). The effectiveness of these models was validated and tested on independent datasets. We found that some models, such as Decision Tree and Random Forests, performed excellently in differentiating anxious from non-anxious behavior, even in the control group, where the differences between subjects were more subtle. Our findings show that machine learning models can effectively and accurately analyze anxiety behaviors in zebrafish and provide a cost-effective method for analyzing animal behavior. • Immobility, bottom/top dwelling and turn angles are reliable features to identify anxiety in zebrafish. • DeepLabCut reliably extracts features of zebrafish behavior. • Machine learning models can accurately classify adult zebrafish as anxious and non-anxious. • Decision Tree and Random Forest models excel in identifying anxiety behaviors. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
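Classifying anxious versus non-anxious fish from features like immobility and bottom-dwelling time reduces to a small supervised problem. A sketch with a shallow decision tree; the feature values and group means here are invented, not the study's measurements.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical per-fish features: [immobility_s, bottom_time_s].
calm = rng.normal([20, 100], [5, 20], size=(60, 2))
anxious = rng.normal([60, 220], [5, 20], size=(60, 2))
X = np.vstack([calm, anxious])
y = np.array([0] * 60 + [1] * 60)  # 0 = not anxious, 1 = anxious

# A depth-2 tree is enough when the behavioral features separate well.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
```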
49. Causal effect estimation for competing risk data in randomized trial: adjusting covariates to gain efficiency.
- Author
-
Cho, Youngjoo, Zheng, Cheng, Qi, Lihong, Prentice, Ross L., and Zhang, Mei-Jie
- Subjects
LOW-fat diet, RANDOM forest algorithms, CAUSAL inference, COMPETING risks, DIET in disease, NONPARAMETRIC estimation - Abstract
The double-blinded randomized trial is considered the gold standard for estimating the average causal effect (ACE). The naive estimator, which adjusts for no covariates, is consistent. However, incorporating covariates that are strong predictors of the outcome can reduce the imbalance in covariate distributions between the treated and control groups and improve efficiency. Recent work has shown that, thanks to randomization, for linear regression an estimator based on a risk-consistent method (e.g., Random Forest) for the covariate adjustment can maintain the convergence rate even when a nonparametric model is assumed for the effect of covariates; moreover, such an adjusted estimator always yields an efficiency gain over the naive unadjusted estimator. In this paper, we extend this result to the competing risks data setting and show that, under similar assumptions, the adjusted estimator based on augmented inverse probability of censoring weighting (AIPCW) attains the same convergence rate and efficiency gain. Extensive simulations demonstrate the efficiency gain in finite samples. To illustrate the proposed method, we apply it to the Women's Health Initiative (WHI) dietary modification trial, studying the effect of a low-fat diet on cardiovascular disease (CVD) related mortality among participants with prior CVD. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
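The core idea above, gaining efficiency by regressing the outcome on covariates within each randomized arm, can be illustrated without the competing-risk machinery. This is a simplified regression-adjusted ATE estimator that ignores censoring entirely; it is a stand-in for intuition, not the paper's AIPCW estimator.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_ate(y, treat, X):
    """Covariate-adjusted average treatment effect: fit an outcome model in
    each arm, then average the difference of predictions over all subjects."""
    mu1 = LinearRegression().fit(X[treat == 1], y[treat == 1])
    mu0 = LinearRegression().fit(X[treat == 0], y[treat == 0])
    return float(np.mean(mu1.predict(X) - mu0.predict(X)))
```

Because treatment is randomized, the adjustment does not change what is being estimated; it only removes outcome variance explained by the covariates.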
50. Machine learning prediction method for assessing water quality impacts on sandstone reservoir permeability and its application in energy development.
- Author
-
Song, Xiankun, Liu, Yuetian, Song, Zhenyu, Wang, Jianzhong, Yang, Xiaowen, Li, Guanlin, and Fan, Pingtian
- Subjects
ENSEMBLE learning, RANDOM forest algorithms, PARTICLE swarm optimization, SUSPENDED solids, WATER quality - Abstract
Water quality is a key determinant of sandstone reservoir permeability, directly influencing subsurface reservoir management and energy extraction efficiency. However, existing research mainly focuses on static permeability, and has not fully elucidated how factors such as suspended solids, oil content, porosity, and initial permeability jointly affect permeability changes, as well as how to accurately predict the dynamic changes in permeability affected by water quality. To address this knowledge gap, we conducted a systematic investigation of these critical factors through experimental data analysis. Our findings indicate that permeability decreases rapidly when the dimensionless injection volume is less than 100, after which the rate of decline slows. High-permeability reservoirs exhibit robust resistance to permeability loss, retaining 34.65% of their permeability after 200 PV injections under Grade V water conditions. In contrast, low-permeability reservoirs are highly sensitive to water quality, retaining only 2.79% of their permeability under the same conditions. Grade I water significantly mitigates permeability loss across all reservoir types. To accurately predict permeability variations, we developed an ensemble learning model that incorporates injection volume, suspended solids, particle size, oil content, porosity, and the permeability ratio. Using a random forest algorithm, the model quantifies the relative importance of each factor in influencing reservoir permeability. The model was further optimized using Particle Swarm Optimization and denoised with a Gaussian filter, resulting in a highly accurate predictive performance with an R2 value of 0.9486. Application of the model to the Gudong Oilfield validated its ability to predict permeability changes under varying water quality conditions and analyze productivity differences through numerical simulations. 
This dynamic prediction model for water quality, injection volume, and permeability addresses the limitations of previous models in complex environments, providing critical methodological support for energy extraction and reservoir management. • Built an ensemble learning model for permeability prediction. • Quantified effects of injection volume, particles, and oil on permeability. • Ranked influencing factors via laboratory core flow tests. • Provided charts for six permeability changes under five water qualities. • Developed a time-varying ML method for predicting sandstone permeability. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
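One concrete step from the pipeline above is denoising the predicted permeability series with a Gaussian filter. A self-contained NumPy version of 1-D Gaussian smoothing (kernel width and radius are illustrative choices, not the paper's settings):

```python
import numpy as np

def gaussian_smooth(values, sigma=2.0, radius=6):
    """1-D Gaussian smoothing of a noisy series, e.g. predicted
    permeability ratios along increasing injection volume."""
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()  # normalize so smoothing preserves the mean level
    return np.convolve(values, kernel, mode="same")
```

Averaging each point with its Gaussian-weighted neighbors shrinks high-frequency noise while keeping the slow permeability-decline trend.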
Discovery Service for Jio Institute Digital Library