24,566 results for "RANDOM forest algorithms"
Search Results
2. Predicting the length-of-stay of pediatric patients using machine learning algorithms.
- Author
Boff Medeiros, Natália, Fogliatto, Flávio Sanson, Karla Rocha, Miriam, and Tortorella, Guilherme Luz
- Subjects
RANDOM forest algorithms, MACHINE learning, LENGTH of stay in hospitals, CHILDREN'S injuries, FOREST productivity, CHILD patients
- Abstract
The management of hospitals' resource capacity has a strong impact on the quality of care, and the length-of-stay (LOS) of patients is an indicator that reflects its efficiency and effectiveness. This study aims at predicting the LOS of pediatric patients (LOS-P) in hospitals to assist in decision-making regarding resource utilisation. LOS-P forecasting presents additional challenges to the analyst compared to other medical specialties since Pediatrics comprises several other subspecialties (e.g. pediatric oncology and traumatology). Pediatric patients within subspecialties compete for the same hospital resources, and aggregate LOS-P predictions are more useful for resource planning. However, aggregate pediatric LOS datasets are harder to model and result in lower forecasting accuracy. To address that problem, we propose a forecasting model based on Machine Learning algorithms. The method for LOS-P forecasting comprises five steps (data visualisation, data pre-processing, sample partitioning, model testing, and model definition through parameter setting and variable selection) and is tested using a dataset of hospitalisations of pediatric patients from a large Brazilian University hospital. Multiple linear regression, random forest, support vector regression, ridge regression, and partial least squares algorithms are applied and compared to determine the best forecasting model. Results indicate that all forecasting models yield satisfactory accuracy, with the best algorithms being random forest and support vector regressor. After refining the model through variable selection and using a Grid Search to find the best parameters, the random forest algorithm yielded an R² of 65.67%, with an average absolute error of 3.51 days. Highlights: Prediction of the length of stay of pediatric patients (LOS-P) in hospitals based on Machine Learning algorithms; Multiple linear regression, random forest, support vector regression, ridge regression, and partial least squares algorithms were applied and compared; Random forest algorithm yielded an R² of 65.67%, with an average absolute error of 3.51 days. [ABSTRACT FROM AUTHOR]
- Published
- 2025
3. Predicting Basketball Shot Outcome From Visuomotor Control Data Using Explainable Machine Learning.
- Author
Aitcheson-Huehn, Nikki, MacPherson, Ryan, Panchuk, Derek, and Kiefer, Adam W.
- Subjects
*VISUOMOTOR coordination, *RANDOM forest algorithms, *EYE tracking, *DECISION trees, *MACHINE learning
- Abstract
Quiet eye (QE), the visual fixation on a target before initiation of a critical action, is associated with improved performance. While QE is trainable, it is unclear whether QE can directly predict performance, which has implications for training interventions. This study predicted basketball shot outcome (make or miss) from visuomotor control variables using a decision tree classification approach. Twelve basketball athletes completed 200 shots from six on-court locations while wearing mobile eye-tracking glasses. Training and testing data sets were used for modeling eight predictors (shot location, arm extension time, and absolute and relative QE onset, offset, and duration) via standard and conditional inference decision trees and random forests. On average, the trees predicted over 66% of makes and over 50% of misses. The main predictor, relative QE duration, indicated success for durations over 18.4% (range: 14.5%–22.0%). Training to prolong QE duration beyond 18% may enhance shot success. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Uncovering Financial Constraints.
- Author
Linn, Matthew and Weagley, Daniel
- Subjects
BUSINESS enterprises, RANDOM forest algorithms, BUSINESS finance, INVESTORS, MARKET sentiment
- Abstract
We use a random forest model to classify firms' financial constraints using only financial variables. Our methodology expands the range of classified firms compared to text-based measures while maintaining similar levels of informativeness. We construct two versions of our constraint measures, one using many firm characteristics and the other using a small set of more primitive characteristics. Using our measures, we find that institutional investors hold a lower percentage of shares in equity-focused constrained firms, while retail investors show a preference for them. Equity issuance and investment of constrained firms also increases during periods of high investor sentiment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. Not seeing the wood for the trees: Influences on random forest accuracy.
- Author
Hand, Chris and Fitkov-Norris, Elena
- Subjects
RANDOM forest algorithms, RANDOM noise theory, WOOD, MACHINE learning, STATISTICAL sampling
- Abstract
Machine learning classifiers are increasingly widely used. This research note explores how a particular widely used classifier, the Random Forest, performs when faced with samples that are imbalanced and noisy. Both are known to affect accuracy, but whether their effects are independent has not been explored. Based on an experiment using synthetic data generated for the study, we find that the effects of noise and sample balance interact with each other; classification accuracy is worse when faced with both noisy data and sample imbalance. This has implications for the use of RF in market research, but also for how methods to address either sample imbalance or noise are assessed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. A novel approach for predicting Lockout/Tagout safety procedures for smart maintenance strategies.
- Author
Delpla, Victor, Chapron, Kevin, Kenné, Jean-Pierre, and Hof, Lucas A.
- Subjects
SMART cities, ARTIFICIAL neural networks, RANDOM forest algorithms, CONCRETE industry
- Abstract
This article presents an approach for predicting Lockout/Tagout (LOTO) procedure sheets, which are commonly used in the manufacturing industry to prevent premature equipment restart during maintenance. The prediction problem of energetic devices to lock from machine names is regarded as a multi-task classification problem. The dataset was obtained by processing LOTO sheets in Portable Document Format (PDF). The K-Nearest Neighbours (KNN), Random Forest (RF), and Deep Neural Network (DNN) algorithms were compared for this problem. The best prediction performance was achieved with the DNN method, with top-1 accuracies exceeding 63% and top-2 accuracies exceeding 90% for all devices. The sensitivity analysis conducted on the results indicates that the approach is robust and reliable, regardless of the industrial sector considered. In other words, the approach is not significantly affected by variations in the industry or its specific characteristics. These results suggest that the proposed approach can be used to assist workers in drafting LOTO sheets, and offers strong potential for concrete applications in safety management in the era of smart manufacturing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
7. Quality design based on kernel trick and Bayesian semiparametric model for multi-response processes with complex correlations.
- Author
Yang, Shijuan, Wang, Jianjun, Cheng, Xiaoying, Wu, Jiawei, and Liu, Jinpei
- Subjects
PRINCIPAL components analysis, EVOLUTIONARY algorithms, RANDOM forest algorithms, LEAST squares
- Abstract
Processes or products are typically complex systems with numerous interrelated procedures and interdependent components. This results in complex relationships between responses and input factors, as well as complex nonlinear correlations among multiple responses. If the two types of complex correlations in the quality design cannot be properly dealt with, it will affect the prediction accuracy of the response surface model, as well as the accuracy and reliability of the recommended optimal solutions. In this paper, we combine kernel trick-based kernel principal component analysis, spline-based Bayesian semiparametric additive model, and normal boundary intersection-based evolutionary algorithm to address these two types of complex correlations. The effectiveness of the proposed method in modeling and optimisation is validated through a simulation study and a case study. The results show that the proposed Bayesian semiparametric additive model can better describe the process relationships compared to least squares regression, random forest regression, and support vector basis regression, and the proposed multi-objective optimisation method performs well on several indicators mentioned in the paper. [ABSTRACT FROM AUTHOR]
- Published
- 2024
8. Sales forecasting of a food and beverage company using deep clustering frameworks.
- Author
Mitra, Rony, Saha, Priyam, and Kumar Tiwari, Manoj
- Subjects
SALES forecasting, FOOD industry, GAUSSIAN mixture models, RANDOM forest algorithms, RETAIL industry, HIERARCHICAL clustering (Cluster analysis)
- Abstract
The competition among Food & Beverage companies has substantially increased in today's age of digitization. Sales forecasting is one of their main challenges. Due to space limitations, employee shortages, and rising online demand, retail sales forecasting became extremely important for Food and Beverage companies. This research analyzed the sales data of a multinational Food & Beverage Company. It proposed a framework using Gaussian Mixture Model (GMM) clustering, Hierarchical Agglomerative Clustering (HAC), and Random Forest algorithm for forecasting sales. This model analyzes the impact of the weekends, holidays, promotional activities, customer sentiments, festivals, and socio-economic situations in sales data and is able to forecast sales ranging from one to 15 months. An investigation of the suggested model's performance compared to numerous cutting-edge sales forecasting techniques is carried out to show its efficacy. Here, we demonstrate that the proposed hybrid model surpasses current predicting and computing efficiency methods. The results of this study can help retail managers to allocate resources and manage inventories in well-informed ways. The findings suggest that combining many strategies may produce the most precise forecasts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. The optimization of evaporation rate in graphene-water system by machine learning algorithm.
- Author
Qiao, Degao, Yang, Ming, Gao, Yin, Hou, Jue, Zhang, Xingli, and Zhang, Hang
- Subjects
*RANDOM forest algorithms, *INTERFACIAL bonding, *PRODUCTION methods, *INSTRUCTIONAL systems, *PREDICTION models, *ERROR rates, *MACHINE learning, *PEARSON correlation (Statistics), *DATA extraction
- Abstract
Solar interfacial evaporation, as a novel practical freshwater production method, requires continuous research on how to improve evaporation rates to increase water production. In this study, sets of data were obtained from molecular dynamics simulation and the literature, in which the parameters included height, diameter, height–radius ratio, evaporation efficiency, and evaporation rate. Initially, the correlation between the four input parameters and the output, the evaporation rate, was examined through traditional pairwise plots and Pearson correlation analysis, revealing weak correlations. Subsequently, the accuracy and generalization performance of the evaporation rate prediction models established by neural network and random forest were compared, with the latter demonstrating superior performance and reliability, confirmed via random data extraction. Furthermore, the impact of different test-set proportions (10%, 20%, and 30% of the data) on model performance was explored, and the results indicated that model performance is better when the test set is 20% and that all the constructed models converge. Moreover, the mean absolute error and mean squared error of the evaporation rate prediction model for the three ratios were calculated to evaluate their performance. The relationship between the height–radius ratio and the optimal evaporation rate was then investigated using the enumeration method, and it was determined that the evaporation efficiency was optimal when the height–radius ratio was 6. Finally, the importance of height, diameter, height–radius ratio, and evaporation efficiency was calculated to optimize evaporator structure, increase evaporation rate, and facilitate the application of interfacial evaporation in solar desalination. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. Performance Prediction of Server Using Neural Network Algorithm Compared with Random Forest Algorithm Based on Option Posted by Players
- Author
Janardhan, K. P., Selvakumar, A., Ramesh, S., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Dutta, Soumi, editor, Bhattacharya, Abhishek, editor, Shahnaz, Celia, editor, and Chakrabarti, Satyajit, editor
- Published
- 2025
11. Effective prediction of used car prices using machine learning.
- Author
Bhatt, Nirav, Patel, Rudra, Patel, Divy, Patel, Prathamkumar, and Prajapati, Purvi
- Subjects
*USED car sales & prices, *AUTOMOBILE sales & prices, *USED cars, *RANDOM forest algorithms, *AUTOMOBILE industry
- Abstract
Cars have an important role in promoting mobility and economic engagement. However, precisely finding a fair price for a secondhand car remains difficult. In this paper, we apply machine learning to estimate used automobile prices. Using two Kaggle datasets, we examine the performance of three methods: linear regression, random forest, and gradient boosting. Our results show that Random Forest consistently beats the other techniques in both datasets, indicating its usefulness in predicting automobile pricing. This study provides important insights for buyers, sellers, and the automotive industry. [ABSTRACT FROM AUTHOR]
- Published
- 2025
12. Herb compounds screening as meningitis inhibitor candidates using neural network and random forest methods.
- Author
Aprilia, Riska, Faroby, Mohammad Hamim Zajuli Al, Kamali, Muhammad Adib, and Fauzi, Muhammad Dzulfikar
- Subjects
*RANDOM forest algorithms, *DRUG discovery, *DNA fingerprinting, *FEATURE extraction, *MYCOSES
- Abstract
Meningitis is an inflammation of the meninges, the protective lining of the brain and spinal cord, caused by bacterial, viral, or fungal infections. The disease is difficult to recognize because its initial symptoms resemble the flu, with fever and headache. Current prevention efforts focus on strengthening antibodies. Meanwhile, drug candidates for the treatment of this disease have not yet achieved optimal results in reducing mortality due to meningitis. This study aims to find and analyze herbal compound candidates that might be inhibitors of meningitis. Compound data was acquired from a validated open database. The data acquired are SMILES representations of the chemical bond structure of the compounds. In the data processing stage, compound features are extracted by applying the concept of molecular fingerprints. The results of feature extraction are used as datasets to build classification models by applying the Multilayer Perceptron (MLP) and Random Forest algorithms. The two models are compared, and the more robust model is selected as the prediction model for the herbal compound search. The MLP model has a better accuracy, 0.97, than the Random Forest model. Screening with the MLP model identified Symphytine, cis-Linalool oxide, and 3-O-Methylcalopocarpin as the compounds with the highest probability among thousands of herbal compounds. These candidate compounds can be used as recommendations for drug discovery to treat patients who contract meningitis. [ABSTRACT FROM AUTHOR]
- Published
- 2025
13. Improving stunting prediction in children: Evaluating ensemble algorithms with SMOTE and feature selection.
- Author
Byna, Agus, Anisa, Fadhiyah Noor, and Nurhaeni, Nurhaeni
- Subjects
*GROWTH of children, *FEATURE selection, *DEEP learning, *RANDOM forest algorithms, *DECISION trees
- Abstract
Stunting is a pressing issue for welfare and health in many developing countries, including Indonesia. Stunting occurs due to a lack, excess, or imbalance of the energy and nutrients that are important in child growth. This study applies Machine Learning, evaluating three ensemble algorithms on the Banjarmasin Demographic Health dataset to predict stunting in children under five. We applied the three algorithms with SMOTE and feature selection techniques to improve accuracy. The data used were 457 stunted children. Thirteen features were selected to be included in the twelve models. Decision Tree with SMOTE and Feature Selection was the most accurate model, with an accuracy score of 90% in 70% of testing in training data, while Random Forest with SMOTE was the worst-performing model for predicting stunting. Based on these findings, we consider the Decision Tree model with SMOTE and Feature Selection superior to the other 11 models used in this study for predicting stunting status in children under five in Banjarmasin. In future research, we will add more features and data and try different models, such as a combination of Machine Learning and Deep Learning models. [ABSTRACT FROM AUTHOR]
- Published
- 2025
14. Predicting Förster critical distance using machine learning techniques.
- Author
Mahato, Kapil Dev, Das, S. S. Gourab Kumar, Azad, Chandrashekhar, and Kumar, Uday
- Subjects
*STANDARD deviations, *REGRESSION trees, *RANDOM forest algorithms, *DECISION trees, *MACHINE learning
- Abstract
The efficiency of energy transfer between the donor (D) and acceptor (A) is a function of the Förster distance (R₀) in addition to other parameters, which can be determined experimentally at the cost of time and money. Can we have an intelligent approach that can estimate R₀ with a high degree of accuracy while costing minimal time and money without using a laboratory or experimental facility? For this purpose, we considered two well-established dyes, Rhodamine-6G (Rh-6G) and Rhodamine-B (Rh-B), as acceptors with other dyes as donors, and their related host (D/A)-guest (solvent/any solid) properties from the literature for predicting R₀ using the Machine Learning Regression (MLR) technique. For estimating R₀, we employed seven models, such as Linear Regression (LR), Decision tree regression (DTR), Random Forest Regression (RFR), AdaBoost regression (ABR), Extra Tree Regression (ETR), Gradient Boosting Regression (GBR), and XGBoost regression (XGBR). Out of seven, DTR outperformed other models in all four evaluation parameters: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²) values of 11.34, 130.52, 11.42, and 0.92, respectively. This 92% accuracy achieved with a small data set is the strength of the present study. [ABSTRACT FROM AUTHOR]
- Published
- 2025
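The four evaluation parameters named in the abstract of result 14 (MAE, MSE, RMSE, and R²) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `regression_metrics` and the sample values are assumptions made up for the example. Note that RMSE is the square root of MSE, which agrees with the reported pair 130.52 and 11.42.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n       # mean absolute error
    mse = sum(e * e for e in errors) / n        # mean squared error
    rmse = math.sqrt(mse)                       # root mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot                  # coefficient of determination
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical values, chosen only to make the arithmetic easy to follow.
metrics = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
print(metrics)
```

R² compares the model's squared error with that of always predicting the mean, so it is a goodness-of-fit measure rather than a classification accuracy.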
15. An empirical studies on online gender-based violence: Classification analysis utilizing XGBOOST.
- Author
Primandari, Arum Handini and Ermayani, Putri
- Subjects
*GENDER-based violence, *RANDOM forest algorithms, *DECISION trees, *CYBERBULLYING, *SOCIAL media
- Abstract
It is undeniable that the broad reach of the internet provides space for online gender-based violence (Kekerasan Berbasis Gender Online, KBGO). As with real-world violence, perpetrators of violence in the online world intend to harass their victims based on gender or sexuality. The kinds of online violence listed in Komnas Perempuan reports include cyber grooming, cyberbullying, cyber harassment, hacking, and infringement of privacy. In response to that phenomenon, this research builds an automated model to classify social media comments as either indicating online gender-based violence (GBV) or not. The 18,239 documents (statements) were taken from comments on celebrity or influencer accounts during 2021-2022. The model used for classification analysis is XGBoost (eXtreme Gradient Boosting), an ensemble decision tree model with a gradient boosting algorithm. XGBoost attempts to build a robust classifier from a number of weak classifiers. The main benefit of using XGBoost to train random forest ensembles is its speed; it is expected to be much faster than random forest itself. Documents went through a series of pre-processing steps, followed by text vectorization with TF-IDF. Because the classes in the sample are imbalanced, the SMOTE-ENN method is employed to balance them. Measurement metrics, including accuracy, F1-score, precision, and recall, show good XGBoost performance, above 90%. Among the models compared (random forest, AdaBoost, and XGBoost), XGBoost does not have the highest accuracy, but it is the fastest. XGBoost runs almost seven to eight times faster than random forest. [ABSTRACT FROM AUTHOR]
- Published
- 2025
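The measurement metrics listed in the abstract of result 15 (accuracy, precision, recall, and F1-score) all derive from the four confusion-matrix counts. The sketch below is illustrative only: the function name and the toy labels are assumptions, not data from the study. With imbalanced classes, precision and recall on the minority class are usually more informative than accuracy alone.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = len(pairs) - tp - fp - fn                               # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced sample: 3 positive comments out of 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(binary_classification_metrics(y_true, y_pred))
```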
16. Estimating COVID-19 using chest x-ray images through AI-driven diagnosis.
- Author
Sofia, R., Mahendran, K., and Devi, K. Nirmala
- Subjects
*MACHINE learning, *COVID-19 pandemic, *RANDOM forest algorithms, *SUPPORT vector machines, *X-ray imaging
- Abstract
The rapid global spread of COVID-19 has sparked a significant increase in testing efforts worldwide, marking it as a pandemic. This unprecedented situation has profoundly impacted daily life, public health, and the global economy. Traditional laboratory methods, like Polymerase Chain Reaction (PCR) testing, though considered the gold standard, are time-consuming and can yield false negatives. Consequently, there arose an urgent demand for swift and accurate diagnostic techniques to identify COVID-19 cases promptly and curb the pandemic's spread. Artificial intelligence (AI) has emerged as a potent tool in conjunction with radiographic imaging to aid in detecting COVID-19. This study proposes a classification approach for identifying infectious conditions in chest X-ray images. A dataset comprising X-ray images from healthy individuals, pneumonia cases including SARS, Streptococcus, Pneumococcus, and COVID-19 patients was compiled. Leveraging the Histogram of Oriented Gradients (HOG) technique for feature extraction, the study employed machine learning algorithms such as K-Nearest Neighbors (KNN), Random Forests, and Support Vector Machines (SVM) for classification. Results demonstrated classification accuracies of 98.14%, 96.29%, and 88.89% for KNN, Random Forests, and SVM, respectively. These findings underscore promising opportunities for utilizing image analysis in the detection of COVID-19 and other respiratory illnesses, providing a robust framework for future research and clinical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2025
17. Hyperparameter optimization in cardio vascular disease prediction.
- Author
Kuppusamy, Saraswathi, Thangavel, Renukadevi, Kumar, Deepan, Janani, and Yamunadevi
- Subjects
*SUPPORT vector machines, *HEART diseases, *RANDOM forest algorithms, *DECISION trees, *VASCULAR diseases
- Abstract
A condition affecting the heart and blood vessels is called cardiovascular disease (CVD). It is the main cause of death. Early detection can help prevent or lessen it, which lowers mortality. Various studies describe the application of machine learning algorithms to identify cardiac diseases. When the algorithm is applied to the dataset's records, a faster and more precise prediction of cardiovascular illness will enable the patient to receive therapy. Cardiologists can make judgments quickly with the aid of these predictions. The proposed study employs self-defined Decision Tree, Random Forest, Logistic Regression, and Support Vector Machine (SVM) models, together with grid search, to identify the presence of cardiovascular illness. We examine and assess their predictive performance. [ABSTRACT FROM AUTHOR]
- Published
- 2025
18. Enhancing predictive maintenance in the industrial sector: A comparative analysis of machine learning models.
- Author
Levin, Semen
- Subjects
*ARTIFICIAL neural networks, *MACHINE learning, *PLANT maintenance, *RANDOM forest algorithms, *EXPERTISE
- Abstract
This research evaluates the efficacy of machine learning (ML) models – Random Forest, Gradient Boosting Machine (GBM), and Deep Neural Networks (DNNs) – for predictive maintenance in the industrial sector, using a dataset reflective of centrifuge, pump, and compressor operations. It assesses these models based on accuracy, precision, recall, F1 score, and ROC AUC metrics, focusing on the GBM model's feature importance analysis, identifying vibration levels and operational hours as key predictors of equipment failure. The study demonstrates DNNs' superior performance, highlighting their potential to significantly enhance predictive maintenance through improved prediction accuracy and operational efficiency. Despite the advantages, the paper notes challenges in implementing these advanced models, including computational demands and the need for specialised expertise. By offering a comprehensive framework for applying ML to predictive maintenance and addressing gaps in existing literature, this research contributes valuable insights into developing more efficient and reliable industrial maintenance strategies, paving the way for future innovations in the field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
19. A machine learning based model for customer churn prediction in telecommunication.
- Author
Sehrawat, Neeshant, Yadav, Saneh Lata, and Dahiya, Mamta
- Subjects
*RANDOM forest algorithms, *LOGISTIC regression analysis, *CONSUMERS, *RESEARCH personnel, *PREDICTION models
- Abstract
Customer churn is one of the biggest problems for large firms. Companies are developing techniques to predict probable customer churn since it directly affects their revenue. In order to decrease customer turnover, it is crucial to identify the variables that contribute to this churn. This paper's key contribution is to showcase the importance of churn prediction in telecom, which helps telecom providers identify the consumers most likely to churn. The work described in this study applies machine learning methods to datasets to predict whether a customer is likely to churn. To evaluate the effectiveness of churn prediction models, researchers have focused on assessing the accuracy of various machine learning models. Another significant contribution is the use of hyper-parameter tuning to increase the efficiency of the best-performing model. Hyper-parameter tuning improved the accuracy of the model from 80.17% to 80.31%. The model is constructed and evaluated on the Python platform using a sizable dataset produced by converting massive raw data provided by Telco, a fictional telecommunications business. Five different machine learning algorithms were tested: Logistic Regression, Naive Bayes Classifier, Support Vector Classifier (SVC), Decision Tree Classifier, and Random Forest Classifier. Of these, Logistic Regression produced the best results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
20. Development of bio-medicinal plants and herbs classifier with random forest algorithm and QR code generator.
- Author
Sharma, Mansi, Srivastav, Gaurav, Puri, Chetan, and Khedkar, Sandip
- Subjects
*TWO-dimensional bar codes, *RANDOM forest algorithms, *COVID-19 pandemic, *CODE generators, *PLANT identification
- Abstract
Since ancient times, India has relied on Ayurveda, in which bio-medicinal plants are used to cure disease by making medicine from herbs. Bio-medicinal plants are medicinal herbs that have therapeutic properties and are used for various medicinal purposes. They can be classified into several types based on their uses and chemical components, such as alkaloid-rich, terpenoid-rich, and glycoside-rich plants, among many others. To develop the bio-medicinal plants and herbs classifier, we used the Random Forest classifier. Random Forest is a classifier that uses many decision trees trained on different subsets of the input dataset and averages their results in order to improve predictive accuracy. QR codes have also become increasingly popular in recent years, as they provide a quick and easy way to get information using a smartphone. QR codes are particularly useful for identifying medicinal plants and herbs and learning about them. With a QR code system for plants and herbs, gardeners, horticulturists, and other learners can rapidly access information about a particular plant or herb, including its planting location, species, origin, care instructions, and identification. The use of QR codes became more widespread during the COVID-19 pandemic, as they provide a contactless way to get information and make transactions. QR codes are becoming increasingly popular in the horticulture industry, providing a quick and convenient way for gardeners, horticulturists, and enthusiasts to access information about specific plants and herbs. This article explores the development of a QR code system for various medicinal plants and herbs, its benefits, and its potential applications in education and commercial plant sales. By using QR codes, individuals can expand their knowledge, promote sustainability, and enhance their overall experience with gardening. The article introduces the QR code system for plants and herbs and shows how quickly it provides information about them. The main use of the QR code system is to store large amounts of data in a small area, so that both time and space are used effectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
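The abstract of result 20 describes Random Forest as many decision trees trained on different subsets of the input data whose results are combined. The sketch below illustrates that idea under loud simplifications: each "tree" is a one-split stump, the combination is a majority vote (the classification counterpart of averaging), and all function names and data points are hypothetical. It is not the authors' classifier, and real random forests grow much deeper trees.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label (ties go to the first one seen)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, features):
    """Fit a one-split tree: pick the (feature, threshold) with fewest training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # this threshold does not separate anything
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:  # no usable split: always predict the majority label
        label = majority(labels)
        return (features[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_forest(rows, labels, n_trees=25, max_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        features = rng.sample(range(n_features), max_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Combine the individual predictions by majority vote."""
    return majority([predict_stump(stump, row) for stump in forest])


# Hypothetical two-feature data with two well-separated classes.
rows = [(1.0, 1.2), (1.5, 0.8), (0.9, 1.1), (5.0, 5.2), (5.5, 4.8), (4.9, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (0.5, 0.5)), predict_forest(forest, (6.0, 6.0)))
```

Training each stump on a different bootstrap sample and feature subset makes the individual errors less correlated, which is why the vote tends to be more stable than any single tree.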
21. Comparative analysis of machine learning algorithms on different diabetes datasets.
- Author
Kawarkhe, Madhuri and Kaur, Parminder
- Subjects
*MACHINE learning, *K-nearest neighbor classification, *RANDOM forest algorithms, *DECISION trees, *PATIENT monitoring
- Abstract
Diabetes is a disease caused by elevated blood glucose levels. If diabetes is not treated properly, it may lead to many health complications and may even cause death. Diabetes prevention is a major need in the near future. Recent trends in the healthcare system have provided a pathway for diagnosing disease, monitoring patients, and predicting individual health conditions. In this paper, we compared the Naive Bayes, Random Forest, Logistic Regression, AdaBoost, Decision Tree, and K-Nearest Neighbor machine learning algorithms for the prediction of diabetes. Experimentation is performed using three different diabetes datasets. The results show that Random Forest outperformed the other algorithms on all the datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
22. Enhancing credit card transaction fraud detection with random forest and robust scaling.
- Author
-
Wajgi, Rakhi, Agarkar, Himanshu, Patil, Rohan, Rao, Harshit, and Petkar, Nipun
- Subjects
- *
CREDIT card fraud , *MACHINE learning , *RANDOM forest algorithms , *FRAUD investigation , *CREDIT cards - Abstract
In an era where credit card fraud poses an ever-increasing threat to financial institutions and consumers, the precise detection of fraudulent transactions is paramount. This study delves into the realm of data science and machine learning to fortify the defenses against credit card fraud. We evaluate the performance of three distinct machine learning models—decision trees, random forests, and logistic regression—in classifying, predicting, and detecting fraudulent credit card transactions. Our findings reveal that the Random Forest model emerged as the standout performer, achieving an impressive accuracy rate of 99% and boasting an AUC (Area Under the Curve) of 98.5% in the identification and prediction of fraudulent credit card transactions. This remarkable accuracy, combined with superior precision, recall, and F1-score, positions Random Forest as the optimal choice for the critical task of credit card fraud detection. Furthermore, we emphasize the importance of employing the RobustScaler preprocessing technique, which contributed significantly to enhancing the robustness and overall performance of our machine learning models. The study underscores the applicability of Random Forest for precise and equitable categorization, particularly for the minority class, making it a compelling choice for real-world applications. As fraudsters continue to evolve their tactics, the use of advanced machine learning techniques, exemplified by Random Forest, becomes increasingly crucial in safeguarding the integrity of credit card transactions. This research offers valuable insights into the frontlines of fraud detection, providing a foundation for enhanced security in the payment ecosystem. [ABSTRACT FROM AUTHOR]
- Published
- 2024
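The RobustScaler preprocessing step named in the abstract above can be illustrated with a minimal sketch. By default, scikit-learn's `RobustScaler` centers each feature on its median and divides by its interquartile range (IQR); the pure-Python function below mimics that default for a single column. The abstract does not give the authors' settings, so the function name `robust_scale` and the transaction amounts are hypothetical.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR).

    Assumption: this mirrors the default median/IQR behaviour of a robust
    scaler; it is not the code used in the study.
    """
    # "inclusive" uses linear interpolation between the sorted data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    center = median(values)
    if iqr == 0:
        # A constant (or near-constant) column cannot be scaled; only center it.
        return [v - center for v in values]
    return [(v - center) / iqr for v in values]


# Hypothetical transaction amounts with one extreme outlier.
amounts = [10, 12, 11, 13, 12, 5000]
scaled = robust_scale(amounts)
for raw, value in zip(amounts, scaled):
    print(f"{raw:>6} -> {value:10.3f}")
```

Because the median and IQR are barely moved by the single extreme amount, the typical transactions stay on a comparable scale instead of being compressed toward zero, which is the usual motivation for robust scaling on skewed transaction data.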
23. Unleashing cricket's potential: The ultimate portal for prediction, analysis, and live scores.
- Author
-
Hambarde, Bhagyashree, Govardhan, Prashant, Parkhi, Priya, and Bodhe, Ketan D.
- Subjects
- *
RANDOM forest algorithms , *WEB-based user interfaces , *DECISION trees , *LOGISTIC regression analysis , *MACHINE learning - Abstract
Cricket stands as one of the world's most-watched sports, drawing an ever-growing audience eager to delve into its intricacies. With an array of data types (numerical, categorical, time-series, text, and ordinal) providing diverse insights, the game's outcome hinges on various game-changing factors. Enter Cricviz: a Django and Bootstrap-powered web application tailored to offer comprehensive Cricket Analysis, Predictions, and Live scores for all IPL seasons. For the prediction model, several machine learning techniques were implemented and compared. Accuracy values were 60.8%, 44.5%, 62%, 58.6%, 56.5% and 58% for Decision Tree, KNN, Random Forest, Logistic Regression, GaussianNB and Gradient Boosting, respectively. The top-performing model takes centre stage on our prediction page, enhancing user experience and reliability. For in-depth analysis and visualization, we use Microsoft Power BI, embedding interactive dashboards within our Analysis section. Dive into live match summaries presented over-by-over, ball-by-ball, accompanied by insightful commentary, available on the Live Score page. The paper aims to transform cricket engagement through data-driven insights and real-time updates. [ABSTRACT FROM AUTHOR]
- Published
- 2024
24. Predictive analysis of bullying victimization trajectory in a Chinese early adolescent cohort based on machine learning.
- Author
-
Wen, Xue, Tang, Ting, Wang, Xinhui, Tong, Yingying, Zhu, Dongxue, Wang, Fan, Ding, Han, Su, Puyu, and Wang, Gengfu
- Subjects
- *
MACHINE learning , *RANDOM forest algorithms , *MULTIPLE regression analysis , *LOGISTIC regression analysis , *SATISFACTION , *BULLYING , *CYBERBULLYING - Abstract
The development of bullying victimization among adolescents displays significant individual variability, with general, group-based interventions often proving insufficient for partial victims. This study aimed to conduct a machine learning-based predictive analysis of bullying victimization trajectories among Chinese early adolescents and to examine the underlying determinants. Data were collected from 1549 students who completed three assessments of bullying victimization from 2019 to 2021. Self-reported questionnaires were used to measure bullying victimization and its associated risk and protective factors. Trajectories were classified using the Group-based Trajectory Model (GBTM), while a Random Forest algorithm was employed to develop a predictive model. Associations between baseline characteristics and victimization trajectories were evaluated via multiple logistic regression analysis. The GBTM identified four distinct victimization trajectories, with the predictive model demonstrating adequate accuracy across these trajectories, ranging from 0.812 to 0.990. Predictors exhibited varying influences across different trajectory subgroups. Odds ratios (ORs) were notably higher in the persistent severe victimization group compared to the low victimization group (OR for adverse school experiences: 3.698 vs. 1.386; for age: 2.160 vs. 1.252; for irritability traits: 1.867 vs. 1.270). Adolescents reporting lower school satisfaction and higher borderline personality features showed a greater likelihood of persistent severe victimization, while those with lower peer satisfaction faced increased victimization over time. The machine learning-based predictive model facilitates the identification of adolescents across different victimization trajectory groups, offering insights for designing targeted interventions. The identified risk factors are instrumental in guiding effective intervention strategies. 
• The machine learning model showed desirable performance in predicting bullying victimization trajectories.
• The important predictors had different effects across trajectory subgroups.
• Physical aggression and hostility were significantly associated with the low victimization trajectory.
• Satisfaction with school and borderline personality features were associated with persistent severe victimization.
• The results may provide valuable insights for identifying at-risk groups and designing targeted intervention strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2025
25. Granular-ball-matrix-based incremental semi-supervised feature selection approach to high-dimensional variation using neighbourhood discernibility degree for ordered partially labelled dataset.
- Author
-
Xu, Weihua and Li, Jinlong
- Subjects
ARTIFICIAL intelligence ,RANDOM forest algorithms ,IMAGE processing ,MATRIX functions ,NEIGHBORHOODS - Abstract
In numerous real-world applications, data tends to be ordered and partially labelled, predominantly due to the constraints of labeling costs. The current methodologies for managing such data are inadequate, especially when confronted with high-dimensional datasets, which often require reprocessing from the start, resulting in significant inefficiencies. To tackle this, we introduce an incremental semi-supervised feature selection algorithm that is grounded in neighborhood discernibility and incorporates pseudolabel granular balls and matrix updating techniques. This novel approach evaluates the significance of features for labelled and unlabelled data independently, using neighborhood distinguishability to identify an optimal subset of features. To enhance computational efficiency, especially with large datasets, we adopt a pseudolabel granular-ball technique, which segments the dataset into more manageable samples prior to feature selection. For high-dimensional data, we employ matrices to store neighborhood information, with distance functions and matrix structures that are tailored for both low- and high-dimensional contexts. Furthermore, we present a matrix updating method designed to accommodate fluctuations in the number of features. Our experiments across 12 datasets, including 4 with over 2000 features, demonstrate that our algorithm not only outperforms existing methods in handling large samples and high-dimensional datasets but also achieves an average time reduction of more than sixfold compared to similar semi-supervised algorithms. Moreover, we observe an average improvement in accuracy of 1.4%, 0.6%, and 0.2% per dataset for SVM, KNN, and Random Forest classifiers, respectively, relative to the best-performing of the compared algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2025
26. Predicting the hub interactome of COVID-19 and oral squamous cell carcinoma: uncovering ALDH-mediated Wnt/β-catenin pathway activation via salivary inflammatory proteins.
- Author
-
Yadalam, Pradeep Kumar, Arumuganainar, Deepavalli, Natarajan, Prabhu Manickam, and Ardila, Carlos M.
- Subjects
- *
ARTIFICIAL neural networks , *MACHINE learning , *SALIVARY proteins , *SQUAMOUS cell carcinoma , *RANDOM forest algorithms , *BOOSTING algorithms - Abstract
Understanding shared pathways and mechanisms involved in the pathogenesis of diseases like oral squamous cell carcinoma (OSCC) and COVID-19 could lead to the development of novel therapeutic strategies and diagnostic biomarkers. This study aims to predict the interactome of OSCC and COVID-19 based on salivary inflammatory proteins. Datasets for OSCC and COVID-19 were obtained from https://www.salivaryproteome.org/differential-expression and selected for differential gene expression analysis. Differential gene expression analysis was performed using log transformation and a fold change of two. Hub proteins were identified using Cytoscape and Cytohubba, and machine learning algorithms including naïve Bayes, neural networks, gradient boosting, and random forest were used to predict hub genes. Top hub genes identified included ALDH1A1, MT-CO2, SERPINC1, FGB, and TF. The random forest model achieved an accuracy of 93% and a class accuracy of 84%. The naïve Bayes model had lower accuracy (63%) and class accuracy (66%), while the neural network model showed 55% accuracy and class accuracy, possibly due to data pre-processing issues. The gradient boosting model outperformed all other models, with an accuracy of 95% and a class accuracy of 95%. Salivary proteomic interactome analysis revealed novel hub proteins as potential common biomarkers. [ABSTRACT FROM AUTHOR]
- Published
- 2025
27. Machine-learning diagnostics of breast cancer using piRNA biomarkers.
- Author
-
Zhao, Amy R., Kouznetsova, Valentina L., Kesari, Santosh, and Tsigelny, Igor F.
- Subjects
- *
MACHINE learning , *BREAST cancer , *TUMOR markers , *LOGISTIC regression analysis , *RANDOM forest algorithms - Abstract
Background and objectives: Prior studies have shown that small non-coding RNAs (sncRNAs) are associated with cancer occurrence or development. Recently, a newly discovered class of small ncRNAs known as PIWI-interacting RNAs (piRNAs) have been found to play a vital role in physiological processes and cancer initiation. This study aims to utilize piRNAs as innovative, noninvasive diagnostic biomarkers for breast cancer. Our objective is to develop computational methods that leverage piRNA attributes for breast cancer prediction and its application in diagnostics. Methods: We created a set of piRNA sequence descriptors using information extracted from the piRNA sequences. To ensure accuracy, we found a path to convert non-standard piRNA to standard names to enable precise identification of these sequences. Using these descriptors, we applied machine-learning (ML) techniques in WEKA (Waikato Environment for Knowledge Analysis) to a dataset of piRNA to assess the predictive accuracy of the following classifiers: Logistic Regression model, Sequential Minimal Optimization (SMO), Random Forest classifier, and Logistic Model Tree (LMT). Furthermore, we performed Shapley additive explanations (SHAP) Analysis to understand which descriptors were the most relevant to the prediction accuracy. The ML models were then validated on an independent dataset to evaluate their effectiveness in predicting breast cancer. Results: The top three performing classifiers in WEKA were Logistic Regression, SMO, and LMT. The Logistic Regression model achieved an accuracy of 90.7% in predicting breast cancer, while SMO and LMT attained 89.7% and 85.65%, respectively. Conclusions: Our study demonstrates the effectiveness of using ML-based piRNA classifiers in diagnosing breast cancer and contributes to the growing body of evidence supporting piRNAs as biomarkers in cancer diagnosis. 
However, additional research is needed to validate these findings and further assess the clinical applicability of this approach. [ABSTRACT FROM AUTHOR]
- Published
- 2025
28. Development and validation of machine learning models for MASLD: based on multiple potential screening indicators.
- Author
-
Chen, Hao, Zhang, Jingjing, Chen, Xueqin, Luo, Ling, Dong, Wenjiao, Wang, Yongjie, Zhou, Jiyu, Chen, Canjin, Wang, Wenhao, Zhang, Wenbin, Zhang, Zhiyi, Cai, Yongguang, Kong, Danli, and Ding, Yuanlin
- Subjects
MACHINE learning ,INSULIN resistance ,PREDICTION models ,DEMOGRAPHIC characteristics ,RANDOM forest algorithms - Abstract
Background: Multifaceted factors play a crucial role in the prevention and treatment of metabolic dysfunction-associated steatotic liver disease (MASLD). This study aimed to utilize multifaceted indicators to construct MASLD risk prediction machine learning models and explore the core factors within these models. Methods: MASLD risk prediction models were constructed based on seven machine learning algorithms using all variables, insulin-related variables, demographic characteristics variables, and other indicators, respectively. Subsequently, the partial dependence plot (PDP) method and SHapley Additive exPlanations (SHAP) were utilized to explain the roles of important variables in the model and to filter out the optimal indicators for constructing the MASLD risk model. Results: Ranking feature importance in the Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) models constructed using all variables showed that homeostasis model assessment of insulin resistance (HOMA-IR) and triglyceride glucose-waist circumference (TyG-WC) were the two most important variables in both models. The MASLD risk prediction model constructed using the 10 most important variables was superior to the previous model. The PDP and SHAP methods were further utilized to screen the best indicators (including HOMA-IR, TyG-WC, age, aspartate aminotransferase (AST), and ethnicity) for constructing the model, and the mean area under the curve value of the models was 0.960. Conclusions: HOMA-IR and TyG-WC are core factors in predicting MASLD risk. Ultimately, our study constructed the optimal MASLD risk prediction model using HOMA-IR, TyG-WC, age, AST, and ethnicity. [ABSTRACT FROM AUTHOR]
- Published
- 2025
29. A feature selection and scoring scheme for dimensionality reduction in a machine learning task.
- Author
-
Emmoh, Philemon Uten, Eke, Christopher Ifeanyi, and Moses, Timothy
- Subjects
- *
MACHINE learning , *CHI-squared test , *DECISION trees , *K-nearest neighbor classification , *RANDOM forest algorithms - Abstract
The selection of important features is vital in machine learning tasks involving high-dimensional datasets with many features. It helps to reduce the dimensionality of a dataset and improve model performance. Most feature selection techniques place restrictions on the kind of dataset that can be used. This study proposes a feature selection technique based on the statistical lift measure to select important features from a dataset. The proposed technique is a generic approach that can be used in any binary classification problem. The technique successfully determined the most important feature subset and outperformed the existing techniques. It was tested on a lung cancer dataset and a happiness classification dataset. Its effectiveness in selecting important feature subsets was evaluated and compared with existing techniques, namely Chi-Square, Pearson Correlation and Information Gain. The proposed and existing techniques were evaluated on five machine learning models using four standard evaluation metrics: accuracy, precision, recall and F1-score. The experimental results of the proposed technique on the lung cancer dataset show that logistic regression, decision tree, adaboost, gradient boost and random forest produced a predictive accuracy of 0.919%, 0.935%, 0.919%, 0.935% and 0.935% respectively, while on the happiness classification dataset random forest, k-nearest neighbor, decision tree, gradient boost and cat boost produced a predictive accuracy of 0.758%, 0.689%, 0.724%, 0.655% and 0.689% respectively, which outperformed the existing techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2025
30. Unraveling clay-mineral genesis and climate change on Earth and Mars using machine learning-based VNIR spectral modeling.
- Author
-
Zhao, Lulu, Deng, Anbei, Hong, Hanlie, Zhao, Jiannan, Algeo, Thomas J., Liu, Fuxing, Luozhui, Nanmujia, and Fang, Qian
- Subjects
- *
CLAY minerals , *CHEMICAL weathering , *RANDOM forest algorithms , *X-ray diffraction , *SPACE exploration - Abstract
Clay minerals are common in martian geological units and are globally widespread on Earth. Understanding the origin, formation, and alteration of clay minerals is crucial for unraveling past environmental conditions on Earth and Mars, in which the composition and crystallinity of clay minerals serve as important surrogate indicators for addressing these issues. Here, 621 soil and sediment samples from five chronosequences representing different climatic zones of China were investigated using visible to near-infrared reflectance (VNIR) in combination with X-ray diffraction (XRD) analysis. The crystallinity of clay minerals (i.e., illite crystallinity, illite chemistry index, kaolinite crystallinity) and clay mineral alteration index (CMAI) were analyzed with conventional methods and then predicted through a spectral modeling approach. Our results show that kaolinite with a pedogenic or sedimentary origin is characterized by a broad crystallinity range and a poorly ordered structure, especially when generated in an intense weathering environment. Predictive models were constructed with data-mining methods, including partial least-squares regression (PLSR), random forest (RF), and Cubist algorithms. The predictive performance of the crystallinity and CMAI proxies is robust, with an overall accuracy of 78% and a residual prediction deviation (RPD) of 2.57. We also found that the model's accuracy in predicting clay-mineral-related proxies increased by 45% using random forest (RF) and Cubist compared to the PLSR models. We suggest that VNIR spectroscopy combined with RF and Cubist methods has the potential to be an alternative and broadly applicable tool for analyzing typical clay-mineral proxies, substituting for a series of common mineralogic analyses. Spectral modeling can reveal genetic and climatic information at both field and regional scales, which has profound implications for Mars missions and other space exploration programs. [ABSTRACT FROM AUTHOR]
- Published
- 2025
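The residual prediction deviation (RPD) reported in the abstract above is commonly computed as the standard deviation of the reference values divided by the root mean square error of the predictions. The sketch below assumes that common definition; the abstract does not state the exact formula the authors used, and the function names and sample values are hypothetical.

```python
from math import sqrt
from statistics import stdev


def rmse(observed, predicted):
    """Root mean square error between reference and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def rpd(observed, predicted):
    """Residual prediction deviation: sample SD of the observations / RMSE.

    Assumption: the widely used SD/RMSE form, with the sample (n - 1)
    standard deviation.
    """
    return stdev(observed) / rmse(observed, predicted)


# Hypothetical crystallinity index values and model predictions.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.5, 2.5, 3.5, 5.5]
print(f"RMSE = {rmse(observed, predicted):.3f}, RPD = {rpd(observed, predicted):.3f}")
```

A larger RPD means the prediction error is small relative to the natural spread of the measured property, which is why the ratio is often reported alongside accuracy in spectral modeling.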
31. Mapping rice lodging severity using dual-pol Sentinel-1 SAR data: polarimetric parameters, lodging sensitivity, and fuzzy classification modelling.
- Author
-
Wanga, Mo, Sun, Qing, Che, Xianghong, Chen, Li, Cui, Yunpeng, Liu, Juan, Wu, Jinming, Wang, Ting, and Li, Huan
- Subjects
- *
UNCERTAINTY (Information theory) , *SYNTHETIC aperture radar , *STOKES parameters , *FEATURE selection , *RANDOM forest algorithms - Abstract
Lodging is a major cause of crop yield loss worldwide. Remote sensing data provides unparalleled advantages for large-scale lodging monitoring. This study leverages the strengths of polarimetric synthetic aperture radar (PolSAR) data to detect changes in surface roughness and texture caused by rice lodging. Using Stokes parameters and a dual-polarization H-α decomposition, we extracted 14 polarimetric parameters from dual-polarization Sentinel-1 data to assess their sensitivity to rice lodging. A feature sensitivity metric was defined as the absolute value of Cohen's d for pre- and post-lodging samples. The results indicate that the Stokes parameter g2 was the most sensitive feature for discriminating rice lodging. Other top-ranked sensitive features include VH backscatter intensity ($\sigma^0_{\mathrm{VH}}$), the second eigenvalue (l2) of the coherence matrix $C_2$, and Normalized Shannon Entropy (NSE). The three least important features were Stokes parameter g1, VV backscatter intensity ($\sigma^0_{\mathrm{VV}}$), and the first eigenvalue (l1) of $C_2$. We further developed a recursive feature selection procedure based on the permutation importance of the features. Three types of classifiers (Random Forest, XGBoost, and Multilayer Perceptron) were tested for the binary classification of lodging states. The random forest classifier was identified as the most effective model for detecting severe rice lodging (Precision = 0.86 and Kappa = 0.67). The probability of severe rice lodging was mapped with a fuzzy classification strategy. This study confirms the feasibility of using polarimetric parameters from dual-pol SAR images to monitor rice lodging and offers a reference for PolSAR feature selection in related studies. [ABSTRACT FROM AUTHOR]
- Published
- 2025
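The feature sensitivity metric described in the abstract above, the absolute value of Cohen's d between pre- and post-lodging samples, can be sketched as follows. The pooled-standard-deviation form of Cohen's d is assumed here because the abstract does not say which variant was used; the feature values are made up for illustration and are not Sentinel-1 measurements.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def sensitivity(pre, post):
    """Feature sensitivity as defined in the abstract: |Cohen's d|."""
    return abs(cohens_d(pre, post))


# Hypothetical (pre-lodging, post-lodging) samples for two features.
samples = {
    "g2": ([0.10, 0.12, 0.11, 0.13], [0.30, 0.28, 0.33, 0.31]),
    "g1": ([0.50, 0.55, 0.45, 0.52], [0.51, 0.49, 0.56, 0.47]),
}

# Rank features from most to least sensitive.
ranking = sorted(samples, key=lambda name: sensitivity(*samples[name]), reverse=True)
for name in ranking:
    print(name, round(sensitivity(*samples[name]), 2))
```

Taking the absolute value makes the metric independent of whether lodging raises or lowers a parameter, so features can be ranked purely by how well they separate the two states.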
32. Determining the most accurate machine learning algorithms for medical diagnosis using the monk' problems database and statistical measurements.
- Author
-
Avuçlu, Emre
- Subjects
- *
MACHINE learning , *COMPUTER-aided diagnosis , *RANDOM forest algorithms , *DATABASES , *STATISTICAL learning - Abstract
Computer-aided diagnosis in the field of health, especially cancer diagnosis, is of vital importance. It helps specialist physicians to make the most accurate diagnosis. Research studies report that the number of wrong or late diagnoses increases each year and ultimately causes deaths in many parts of the world. For this reason, the candidate algorithms must be evaluated to determine which is the most accurate for making a correct diagnosis. In this study, the three Monk's problems datasets were used to determine the most accurate algorithm for medical diagnosis. The Monk's problems are among the classification problems commonly used for comparative studies. Training and testing were performed using five different Machine Learning Algorithms (MLAs): k-Nearest Neighbor (k-NN), Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), and Support Vector Machine (SVM). These machine learning algorithms are compared statistically in terms of performance. Two medical datasets were used to test the results (Breast Cancer Coimbra Data Set and Diabetic Retinopathy Debrecen Data Set). In the experimental tests, the highest accuracy rates obtained from the k-NN, DT, RF, NB, and SVM algorithms were 0.9758, 1, 1, 0.9180, and 0.9344, respectively. The best performance was obtained from RF for the first dataset, from DT for the second dataset, and from k-NN and RF for the third dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2025
33. Can Ingredients-Based Forecasting Be Learned? Disentangling a Random Forest's Severe Weather Predictions.
- Author
-
Mazurek, Alexandra C., Hill, Aaron J., Schumacher, Russ S., and McDaniel, Hanna J.
- Subjects
- *
MACHINE learning , *SEVERE storms , *ARTIFICIAL intelligence , *RANDOM forest algorithms , *TRUST , *WEATHER forecasting - Abstract
Machine learning (ML)–based models have been rapidly integrated into forecast practices across the weather forecasting community in recent years. While ML tools introduce additional data to forecasting operations, there is a need for explainability to be available alongside the model output such that the guidance can be transparent and trustworthy for the forecaster. This work makes use of the algorithm tree interpreter (TI) to disaggregate the contributions of meteorological features used in the Colorado State University Machine Learning Probabilities (CSU-MLP) system, a random forest–based ML tool that produces real-time probabilistic forecasts for severe weather using inputs from the Global Ensemble Forecast System. TI feature contributions are analyzed in time and space for CSU-MLP day 2 and day 3 individual hazard (tornado, wind, and hail) forecasts and day 4 aggregate severe forecasts over a 2-yr period. For individual forecast periods, this work demonstrates that feature contributions derived from TI can be interpreted in an ingredients-based sense, effectively making the CSU-MLP physically interpretable. When investigated in an aggregate sense, TI illustrates that the CSU-MLP system's predictions use meteorological inputs in ways that are consistent with the spatiotemporal patterns seen in meteorological fields that pertain to severe storm climatology. A discussion on how these insights could benefit forecast operations more broadly is also provided. Significance Statement: Machine learning tools are becoming more common in weather forecasting settings, and there is a need to provide information to meteorologists on how machine learning models make their predictions in real time. Severe weather forecasts made by an operational machine learning model are deconstructed into meteorological components in a way that offers physically insightful context to the model's predictions. 
The results show that the machine learning model uses the input meteorological fields to make predictions that resemble various aspects of severe storm climatology and environments. This work presents an avenue for using explainable artificial intelligence in operational weather forecasting by illustrating a method that could provide trust, transparency, and confidence in machine learning–based forecast guidance. [ABSTRACT FROM AUTHOR]
- Published
- 2025
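The tree interpreter idea mentioned in the abstract above rests on a simple identity: a decision tree's prediction equals the root-node mean (the bias) plus the change in node mean at every split along the decision path, with each change credited to the feature that was split on. The sketch below shows this on a single hand-built tree; it is not the CSU-MLP system, and the feature names `cape` and `shear`, the thresholds, and the node values are all hypothetical. For a random forest, the bias and contributions would be averaged over the trees.

```python
from collections import defaultdict

# Hypothetical toy tree. Every node stores the mean training target ("value");
# internal nodes also store a split feature, a threshold, and two children.
TREE = {
    "value": 0.20, "feature": "cape", "threshold": 1000.0,
    "left": {"value": 0.05},
    "right": {
        "value": 0.40, "feature": "shear", "threshold": 20.0,
        "left": {"value": 0.25},
        "right": {"value": 0.70},
    },
}


def interpret(tree, sample):
    """Return (prediction, bias, per-feature contributions) for one sample."""
    bias = tree["value"]
    contributions = defaultdict(float)
    node = tree
    while "feature" in node:  # walk down until a leaf is reached
        feature = node["feature"]
        branch = "left" if sample[feature] <= node["threshold"] else "right"
        child = node[branch]
        # Credit the change in node mean to the feature used for this split.
        contributions[feature] += child["value"] - node["value"]
        node = child
    return node["value"], bias, dict(contributions)


prediction, bias, contributions = interpret(TREE, {"cape": 2500.0, "shear": 30.0})
print("prediction:", prediction)
print("bias:", bias)
for feature, delta in contributions.items():
    print(f"{feature}: {delta:+.2f}")
```

Because the contributions sum with the bias to the prediction, a forecaster can read each term as how much a given input raised or lowered the forecast probability for that case.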
34. Assessing the impact of 2022 extreme drought on the Yangtze River basin using downscaled GRACE/GRACE-FO data obtained by partitioned random forest algorithm.
- Author
-
Cui, Lilu, Li, Yu, Zhong, Bo, An, Jiachuan, Meng, Jiacheng, Guo, Haoyang, and Xu, Chuang
- Subjects
- *
RANDOM forest algorithms , *STANDARD deviations , *WATERSHEDS , *LOW temperatures , *SPATIAL resolution - Abstract
The Gravity Recovery and Climate Experiment (GRACE) and GRACE Follow-On (GRACE-FO) data have been widely used to monitor and analyze extreme hydrological events globally. However, their coarse spatial resolution limits their application in small- and medium-scale regions. In this study, we proposed a partitioned random forest downscaling (PRFD) strategy to improve the spatial resolution of GRACE/GRACE-FO data and quantitatively assessed the downscaling performance using a closed-loop simulation experiment. Our enhanced approach improved the spatial resolution of GRACE/GRACE-FO data from 1° to 0.1°, and the downscaled data were used to characterize the 2022 extreme drought in the Yangtze River basin (YRB), with a particular focus on a smaller basin (i.e. the Wu River basin, WRB). Our findings show that the PRFD reduced the root mean square error by 39.29% compared to the traditional over RF downscaling (ORFD), and 27.8% of grid points showed significant accuracy improvements. The downscaled results provided a more detailed depiction of the 2022 extreme drought in the YRB, allowing for precise identification of drought onset, extent and severity, and a more accurate assessment of the drought impacts in the WRB. The extreme drought originated in the northern WRB, gradually extending southward across the basin, with more severe drought conditions in the north than in the south. High temperatures and low precipitation were the primary drivers, while elevated human water use also contributed. This study provides a valuable technique for downscaling GRACE/GRACE-FO data and understanding extreme drought in regional-scale areas. [ABSTRACT FROM AUTHOR]
- Published
- 2025
35. Identification of marine oil spill types based on multi-source remote sensing data using multi-scale dynamic convolution model.
- Author
-
Wang, Bei, Yang, Junfang, Liu, Shanwei, Sun, Caiyi, Ma, Yi, and Zhang, Jie
- Subjects
- *
CONVOLUTIONAL neural networks , *OIL spills , *REMOTE sensing , *SUPPORT vector machines , *RANDOM forest algorithms - Abstract
The identification of marine oil spill types is closely related to source tracing and pollution treatment. At present, most of the remote sensing monitoring of sea surface oil spill is conducted based on a single technical means, which has certain limitations. At the same time, the classification methods used have insufficient ability to extract hyperspectral image features, and the classification results cannot meet the needs of accurate monitoring. In this paper, a multi-scale dynamic convolution algorithm (MDCM) for oil types identification based on airborne hyperspectral data was constructed, and the ability to improve the identification of hyperspectral oil spill types by adding thermal infrared and SAR features was explored. The method application experiments were carried out based on the measured airborne hyperspectral, thermal infrared and SAR remote sensing data of different oil spill types obtained from outdoor oil spill simulation experiment. The experimental results show that: 1) The MDCM model exhibited good classification performance, with an identification accuracy of 88.95% for oil spill types based on hyperspectral image, which was 18.24%, 7.28% and 2.56% higher than those of Support Vector Machine, Random Forest and Convolutional Neural Network, respectively. 2) The combination of hyperspectral, thermal infrared, and SAR data can effectively improve the identification accuracy of oil spill types, with an average improvement of 7.6% compared to that of single hyperspectral features. The improvement effect of thermal infrared features was more significant, while the improvement effect of single SAR features on hyperspectral identification results was not significant. [ABSTRACT FROM AUTHOR]
- Published
- 2025
36. An approach for evaluating traffic safety of expressway weaving segments: Investigating risk patterns of lane-changing conflicts.
- Author
-
Ouyang, Pengying, Guo, Yanyong, Liu, Pan, Chen, Tianyi, and Yu, Hao
- Subjects
- *
LANE changing , *RANDOM forest algorithms , *WEAVING , *WEAVING patterns , *DATA analysis , *TRAFFIC safety - Abstract
Weaving segments, integral to expressway systems, face safety challenges due to frequent lane changes. While previous research has leveraged pre– and post–lane-changing (LC) data for risk estimation, few efforts have assessed time-series characteristics of LC conflict risks. This study proposes an approach to evaluate safety performance of weaving segments by investigating conflict risk patterns during the LC process using vehicle trajectory data. First, risk profiles are generated based on driving safety field theory for detected LC conflicts. These profiles are then categorized into distinct time-series patterns using k-shape clustering. Finally, determinants of these patterns are identified by enhanced random forest, which provides valuable insights into their relationship with risk severity. Analysis of trajectory data from 12 weaving segments in China reveals three main risk patterns (ascent, descent, and stabilization with a peak) for both rear-end and sideswipe conflicts. Ascending risk patterns signify more severe conflicts compared to descending ones. Increasing auxiliary lanes at weaving segments results in a more evenly distributed acceleration, potentially leading to a more severe LC conflict. The lowest risk is observed at the weaving segments with two auxiliary lanes. This study offers a novel perspective for mitigating LC crashes and enhancing traffic safety in weaving segments. [ABSTRACT FROM AUTHOR]
- Published
- 2025
37. Use of machine learning to identify prognostic variables for outcomes in chronic low back pain treatment: a retrospective analysis.
- Author
-
Cheema, Carolyn, Baldwin, Jonathan, Rodeghero, Jason, Werneke, Mark W, Mioduski, Jerry E, Jeffries, Lynn, Kucksdorf, Joseph, Shepherd, Mark, Dionne, Carol, and Randall, Ken
- Subjects
- *
RANDOM forest algorithms , *CHRONIC pain , *RESEARCH funding , *SCIENTIFIC observation , *TREATMENT effectiveness , *RETROSPECTIVE studies , *FUNCTIONAL status , *LONGITUDINAL method , *MACHINE learning , *LUMBAR pain , *PREDICTIVE validity - Abstract
Objectives: Most patients seen in physical therapy (PT) clinics for low back pain (LBP) are treated for chronic low back pain (CLBP), yet PT interventions suggest minimal effectiveness. The Cochrane Back Review Group proposed 'Holy Grail' questions, one being: 'What are the most important (preventable) predictors of chronicity' for patients with LBP? Subsequently, prognostic factors influencing outcomes for CLBP have been described, however results remain conflicting due to methodological weaknesses. Methods: This retrospective observational cohort study examined prognostic risk factors for PT outcomes in CLBP treatment using a sub-type of AI. Bootstrap random forest supervised machine learning analysis was employed to identify the outcomes-associated variables. Results: The top variables identified as predictive were: FOTO™ predicted functional status (FS) change score; FOTO™ predicted number of visits; initial FS score, age; history of jogging/walking, obesity, and previous treatments; provider education level; medication use; gender. Conclusion: This article presents how AI can be used to predict risk prognostic factors in healthcare research. Improving predictive accuracy helps clinicians predict outcomes and determine most appropriate plans of care and may impact research attrition rates. [ABSTRACT FROM AUTHOR]
- Published
- 2025
38. Predicting performance in exams and deep approach to learning in first year university students: a new look at academic success.
- Author
-
Bächtold, Manuel, Papet, Jacqueline, Barbe Asensio, Dominique, Mas, André, Borne, Sandra, and Ngoua Ondo, Appolinaire
- Subjects
- *
RANDOM forest algorithms , *LEARNING , *SOCIAL background , *INTRINSIC motivation , *DEEP learning , *REFLECTIVE learning - Abstract
This study calls for a broadening of the perspective on academic success. While passing exams is an essential objective of higher education, it should not overshadow another important objective which is the development of students' skills, such as becoming curious, autonomous and reflective in the learning process. This study used Academic Performance in Exams (APE) and Deep Approach to Learning (DAL) as measures related to these two objectives. The aim was to identify and compare the factors that may influence APE and DAL. The study was conducted on first-year students (2011) at a French university. It was based on a random forest algorithm and took into account a wide range of factors belonging to different dimensions: demographics, social background, educational background, context of the educational programme, behavioural engagement, social environment, psychological and cognitive characteristics. The results show that the most important factors in predicting APE are the educational programme undertaken, student's educational background and parents' occupation. DAL was not found to be an important factor in APE. Regarding the prediction of DAL, the results point to the predominant weight of intrinsic motivation and the important weight of elaborated epistemic beliefs. In contrast, demographics and behavioural engagement were found to have negligible weight in predicting both APE and DAL. These findings raise questions about the type of success that is valued in the first year of university and call for reflection on assessment methods. They also allow the identification of levers that teachers can activate to support first year students. [ABSTRACT FROM AUTHOR]
- Published
- 2025
39. Factors and Reasons Associated with Hesitating to Seek Care for Migraine: Results of the OVERCOME (US) Study.
- Author
-
Shapiro, Robert E., Muenzel, Eva Jolanda, Nicholson, Robert A., Zagar, Anthony J., L. Reed, Michael, Buse, Dawn C., Hutchinson, Susan, Ashina, Sait, Pearlman, Eric M., and Lipton, Richard B.
- Subjects
- *
SUPERVISED learning , *MEDICAL personnel , *MIGRAINE , *BURDEN of care , *RANDOM forest algorithms - Abstract
Introduction: Despite a variety of available treatment options for migraine, many people with migraine do not seek medical care, thereby reducing opportunities for diagnosis and effective treatment and potentially leading to missed opportunities to reduce the burden of disease. Understanding why people hesitate to seek care for migraine may help healthcare professionals and advocates address barriers and improve outcomes. The aim of this study, in a large adult population sample in the United States (US), was to identify factors associated with and reasons for hesitating to seek healthcare for migraine. Methods: The web-based OVERCOME (US) survey study identified adults with active migraine in a demographically representative US sample who answered questions about hesitating to seek care from a healthcare provider for migraine and reasons for hesitating. Supervised machine learning (random forest, least absolute shrinkage and selection operator) identified factors associated with hesitation; logistic regression models assessed association of factors on hesitation. Results: The study results show that of the 58,403 participants with active migraine who completed the OVERCOME (US) baseline survey and provided responses to the question on hesitating to seek care for migraine, 45.1% (n = 26,330/58,403) with migraine indicated that they had ever hesitated to seek care for migraine. Factors most associated with hesitating to seek care were hiding migraine (odds ratio [OR] = 2.69; 95% confidence interval [CI]: 2.50, 2.89), experiencing migraine-related stigma (OR = 2.13; 95% CI 1.95, 2.33), higher migraine-related disability (OR = 1.30; 95% CI 1.23, 1.38), and higher ictal cutaneous allodynia (OR = 1.26; 95% CI 1.19, 1.35). 
The most common reasons participants stated for hesitating included (1) 44.2% wanting to try and take care of migraine on their own, (2) 33.8% feeling that their migraine or headache would not be taken seriously, (3) 29.2% thinking that their migraine was not serious/painful enough, and (4) 27.4% not being able to afford it or not wanting to spend the money. The main limitations of the study include the requirement for respondents to have internet access, which may have introduced cohort bias, and the use of quota sampling rather than random sampling to create a demographically representative sample. Conclusions: Hesitating to seek migraine care is common and is most strongly associated with hiding the disease and migraine-related stigma. Those experiencing higher migraine-related burden are more hesitant to seek the care that might alleviate the burden. These findings suggest that migraine's social context (e.g., stigma) is a major determinant of hesitance to seek migraine care. [ABSTRACT FROM AUTHOR]
- Published
- 2025
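The odds ratios and confidence intervals quoted in the OVERCOME (US) abstract above come from logistic regression models. As a rough illustration of how such figures are read, the sketch below computes an unadjusted odds ratio with a Wald 95% interval from a 2x2 table and re-derives the reported 45.1% share. The function name and the 2x2 counts are hypothetical and are not taken from the study; only 26,330 and 58,403 come from the abstract.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Share of respondents who ever hesitated (counts from the abstract)
share_hesitated = 26330 / 58403
print(f"hesitated: {share_hesitated:.1%}")

# Hypothetical counts, for illustration only
or_value, ci_low, ci_high = odds_ratio_wald_ci(300, 200, 250, 450)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f}, {ci_high:.2f})")
```

An interval that excludes 1, as all four intervals quoted in the abstract do, indicates an association at the 5% level. A logistic regression additionally adjusts for other covariates, which this sketch does not.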
40. Inverse modeling of untethered electromagnetic actuators using machine learning.
- Author
-
Türkmen, Gökmen Atakan and Çetin, Levent
- Subjects
- *
ELECTROMAGNETIC actuators , *VECTOR fields , *MAGNETIC fields , *RANDOM forest algorithms , *MAXWELL equations - Abstract
Untethered electromagnetic actuation has become an appealing concept for developing applications in microscale motion control. Although actuator modeling is critical, there is a lack of inverse modeling methods for untethered electromagnetic actuators (EMA) for control design and implementation. Herein, we focused on a machine learning-based framework to obtain inverse models of untethered EMAs. The inverse model is defined as a model that takes a point in the workspace of the EMA, together with the magnetic field at that point, as input and gives the current(s) and position(s) of the electromagnets as output. To obtain the inverse model, the Maxwell equations are first solved numerically for the defined set of coil currents and electromagnet positions. Then, a classification problem is defined by treating the obtained magnetic field values as data and the corresponding input values (currents and positions) as labels. A Random Forest Classifier is trained to obtain an inverse model that matches the given magnetic field vector at a position with input values. The proposed approach is employed for three common structures: Single, Double, and Quadruple EMA. The performance test showed that the obtained inverse model is capable of giving the required magnetic field with accuracy of 1.43%. Moreover, the experimental study showed that the obtained inverse model is also capable of simulating the real-time behavior of EMA systems. [ABSTRACT FROM AUTHOR]
- Published
- 2025
41. A lightweight machine learning methods for malware classification.
- Author
-
Farfoura, Mahmoud E., Mashal, Ibrahim, Alkhatib, Ahmad, and Batyha, Radwan M.
- Subjects
- *
INFORMATION technology , *RANDOM forest algorithms , *COMPUTER systems , *MACHINE learning , *EDGE computing - Abstract
Today's Information Technology landscape is rapidly evolving. Cyber professionals are increasingly concerned about maintaining security and privacy. Research has shown that the emergence of new malware is on the rise. The realm of malware assault and defense is an endless circle. Antivirus firms are always striving to create signatures for hazardous malware, while attackers are constantly seeking to circumvent these signatures. Machine learning is incredibly successful at detecting malware. ML-based malware detection falls into two categories: feature extraction and malware classification. The proposed solutions are designed specifically for low-power embedded devices and edge computing systems. These methods allow for real-time malware detection without imposing a significant computing burden. This study provides an in-depth analysis of feature reduction and lightweight algorithms that enable the proposed method to work effectively and efficiently on any device, from PCs and IoT devices to servers. Extensive experiments were carried out on the BODMAS dataset to provide the best low-complexity method with an F1 score of more than 99%. [ABSTRACT FROM AUTHOR]
- Published
- 2025
42. Prediction of heart disease using random forest algorithm, support vector machine, and neural network.
- Author
-
Setiyadi, Didik, Henderi, Suryaningrat, Anrie, Swastika, Rulin, Saludin, Mutoffar, Muhamad Malik, and Yunianto, Imam
- Subjects
- *
RANDOM forest algorithms , *MEDICAL forecasting , *SUPPORT vector machines , *HEART diseases , *DIAGNOSTIC errors - Abstract
The heart is a vital organ responsible for pumping blood throughout the human body. Machine learning has become an increasingly important tool in medical forecasting, improving diagnostic accuracy and reducing human errors. This study focuses on detecting heart disease using machine learning algorithms. It aims to compare the performance of three key algorithms, random forest (RF), support vector machine (SVM), and neural networks (NN), in predicting heart disease. Using a patient dataset with both nominal and numeric attributes, record mining techniques were applied through Orange software. The target classes indicated the absence (0) or presence (1) of heart disorders. The evaluation was based on the prediction accuracy of each algorithm. Results show that SVM achieved the highest accuracy, with a rate of 85%, outperforming RF and NN. The findings suggest that the SVM algorithm is a reliable tool for heart disease prediction, helping reduce diagnostic errors and improve medical decision-making. [ABSTRACT FROM AUTHOR]
- Published
- 2025
43. Enhancing stormwater network overflow prediction: investigation of ensemble learning models.
- Author
-
Boughandjioua, Samira, Laouacheria, Fares, and Azizi, Nabiha
- Subjects
- *
ENSEMBLE learning , *FLOOD warning systems , *ARTIFICIAL intelligence , *RANDOM forest algorithms , *WATER depth - Abstract
This study addresses the critical issue of urban flooding caused by stormwater network overflow, necessitating unified and efficient management measures to handle increasing water volumes and the effects of climate change. The proposed approach aims to improve the precision and efficiency of overflow rate predictions by investigating advanced machine learning algorithms, specifically ensemble methods such as gradient boosting and random forest algorithms. The main contribution lies in introducing the SWN-ML approach, which integrates hydraulic simulations using MIKE+ with machine learning to predict average overflow rates for various rainfall durations and return periods. The MIKE+ model was calibrated against the only available observed data, the water depth at the outlet point during the storm event of February 4, 2019. The datasets used in the ML models consisted of input variables such as peak flow, max depth, length, slope, roughness, and diameter, with the average overflow rate as the output variable. Experimental results show that these methods are effective under a variety of scenarios, with the ensemble methods consistently outperforming classical machine learning models. For example, the models exhibit similar performance metrics with an MSE of 0.023, RMSE of 0.15, and MAE of 0.101 for a 2-h rainfall duration and a 10-year return period. Correlation analysis further confirms the strong correlation between ensemble method predictions and MIKE+ simulated models, with values ranging between 0.72 and 0.80, indicating their effectiveness in capturing stormwater network dynamics. These results validate the utility of ensemble learning models in predicting overflow rates in flood-prone urban areas. The study highlights the potential of ensemble learning models in forecasting overflow rates, offering valuable insights for the development of early warning systems and flood mitigation strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2025
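The stormwater abstract above reports MSE, RMSE, and MAE together. A minimal sketch of how the three relate is given below, assuming plain lists of observed and predicted overflow rates. The function name and the sample values are hypothetical; only the reported MSE of 0.023 comes from the abstract.

```python
import math


def regression_errors(observed, predicted):
    """Return (MSE, RMSE, MAE) for paired observations and predictions."""
    residuals = [p - o for o, p in zip(observed, predicted)]
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)  # RMSE is the square root of MSE
    mae = sum(abs(r) for r in residuals) / n
    return mse, rmse, mae


# Consistency check on the figures quoted in the abstract
reported_mse = 0.023
implied_rmse = math.sqrt(reported_mse)
print(f"RMSE implied by MSE = {reported_mse}: {implied_rmse:.3f}")

# Hypothetical overflow rates, for illustration only
observed = [0.10, 0.40, 0.35, 0.80]
predicted = [0.20, 0.30, 0.35, 0.60]
mse, rmse, mae = regression_errors(observed, predicted)
print(f"MSE={mse:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
```

The implied RMSE rounds to the 0.15 quoted in the abstract. RMSE is never smaller than MAE, and a large gap between the two points to a few large errors dominating the total.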
44. A Weighted Likelihood Ensemble Approach for Failure Prediction of Water Pipes.
- Author
-
Beig Zali, Ramiz, Latifi, Milad, Javadi, Akbar A., and Farmani, Raziyeh
- Subjects
- *
SUPPORT vector machines , *WATER distribution , *CLASSIFICATION algorithms , *RANDOM forest algorithms , *MACHINE learning - Abstract
This paper presents a novel weighted likelihood ensemble approach for predicting pipe failures in water distribution networks (WDNs). The proposed method leverages ensemble modeling, specifically stacking, to enhance prediction capability. The study utilizes a data set of water pipe failures from 2006 to 2017, segmented into different time intervals. Various classification algorithms, including logistic regression (LR), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGB), are employed to predict failures within these segments. These individual models are then combined to create ensemble models. The results show that the stacked models consistently outperform the models that use the training data set as a whole. Along with traditional evaluation metrics, practical assessments are conducted, considering different percentages of pipes for replacement. These evaluations align with tactical and strategic maintenance plans. Remarkably, the most significant improvements are observed in models with lower replacement percentages. The novel aspect of this approach lies in assigning weights to prediction results from different models, each utilizing distinct time segments of data. By developing a meta-model with linear regression based on weighted likelihoods of pipe failures, this method provides valuable insights for asset managers and decision makers. It aids in prioritizing pipe rehabilitation programs, with the potential for further refinement as new failure data becomes available. [ABSTRACT FROM AUTHOR]
- Published
- 2025
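The water-pipe abstract above combines failure likelihoods from models trained on different time segments and then evaluates them by the share of pipes selected for replacement. The sketch below shows one simplified reading of that idea, assuming fixed weights and a plain weighted average. The abstract describes a linear-regression meta-model instead, so this is a stand-in rather than the authors' method, and all names, weights, and probabilities are hypothetical.

```python
def weighted_likelihood(per_model_probs, weights):
    """Weighted average of failure probabilities, one list per model."""
    total_weight = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, pipe_probs)) / total_weight
        for pipe_probs in zip(*per_model_probs)  # regroup by pipe
    ]


def top_fraction(scores, fraction):
    """Indices of the highest-scoring pipes, covering the given fraction."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical per-pipe failure probabilities from three time-segment models
oldest_segment = [0.10, 0.60, 0.30, 0.20, 0.90]
middle_segment = [0.20, 0.50, 0.40, 0.10, 0.70]
recent_segment = [0.10, 0.80, 0.20, 0.30, 0.60]

# Assumed weights that favour the most recent segment
combined = weighted_likelihood(
    [oldest_segment, middle_segment, recent_segment], weights=[1, 2, 3]
)
selected = top_fraction(combined, fraction=0.4)
print(selected)
```

Ranking by combined likelihood and cutting at a replacement percentage mirrors the practical assessment the abstract mentions; in a fitted meta-model the weights would be learned from data instead of set by hand.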
45. Estimation of distribution algorithms for well placement optimization in petroleum fields.
- Author
-
Brum, Artur, Coelho, Guilherme, Santos, Antonio Alberto, and Schiozer, Denis José
- Subjects
- *
DISTRIBUTION (Probability theory) , *NET present value , *OIL fields , *RANDOM forest algorithms , *ARTIFICIAL intelligence - Abstract
Optimizing well placement is one of the primary challenges in oil field development. The number and positions of wells must be carefully considered, as it is directly related to the infrastructure cost and the profits over the field's life cycle. In this paper, we propose three estimation of distribution algorithms to optimize well placement with the objective of maximizing the net present value. The methods are guided by an elite set of solutions and are able to obtain multiple local optima in a single run. We also present an auxiliary regression model to preemptively discard candidate solutions with poor performance prediction, thus avoiding running computationally expensive simulations for unpromising candidates. The model is trained with the data obtained during the search process and does not require previous training. Our algorithms yielded a significant improvement compared to a state-of-the-art reference method from the literature, as evidenced by computational experiments with two benchmarks. [ABSTRACT FROM AUTHOR]
- Published
- 2025
46. Comparative analysis of machine learning algorithms for heart disease prediction.
- Author
-
Gupta, Isha, Bajaj, Anu, and Sharma, Vikas
- Subjects
- *
MACHINE learning , *CONVOLUTIONAL neural networks , *LONG short-term memory , *FEATURE selection , *RANDOM forest algorithms , *DEEP learning , *ARRHYTHMIA - Abstract
Heart diseases are a major cause of death worldwide, highlighting the need for early detection. The electrocardiogram (ECG) records the heart's electrical activity using electrodes. Our research focuses on the ECG data to diagnose heart disorders, particularly arrhythmias. We utilized the MIT-BIH arrhythmia dataset for comparative analysis of various machine learning techniques, including random forest, K-Nearest Neighbor, and Decision Tree, along with deep learning algorithms like Long short-term memory and Convolutional Neural Networks. This required employing various preprocessing methods like filtering and normalization and feature selection techniques such as chi-square and sequential feature selectors to improve the performance of heart disease prediction. Therefore, hybrid machine and deep learning models are proposed, and the results reveal that hybrid models perform better than conventional models. [ABSTRACT FROM AUTHOR]
- Published
- 2025
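The heart-disease abstract above lists chi-square among its feature selection techniques. As a hedged illustration, the sketch below computes the Pearson chi-square statistic for a contingency table of a discretised feature against the class label and ranks features by it. The feature names and counts are hypothetical and are not drawn from the MIT-BIH data.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if feature and label were independent
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical tables: rows = feature level, columns = class label
feature_tables = {
    "feature_a": [[30, 10], [10, 30]],  # strongly associated with the label
    "feature_b": [[20, 20], [20, 20]],  # independent of the label
}
scores = {name: chi_square_statistic(t) for name, t in feature_tables.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Features with larger statistics depart more from independence with the label, so a selector would keep the top-ranked ones.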
47. Predicting adult students' online learning persistence: A case study in South Korea using random forest analysis.
- Author
-
Nam, Na-Ra and Song, Sue-Yeon
- Subjects
- *
STUDENT engagement , *RANDOM forest algorithms , *SCHOOL attendance , *INSTRUCTIONAL systems , *SCHOOL dropout prevention - Abstract
This empirical study uses a random forest algorithm to examine the factors that influence learners' persistence in online learning at a prominent Korean institution. The data were collected from students who began their studies in Spring 2021, and encompassed a range of variables including individual attributes, academic engagement, academic achievement, course status, and satisfaction with the institution. The study identified several key predictors of student retention, including academic achievement and variables related to academic engagement, such as students' learning time, course completion rate, and number of logins to the online learning system. Students' number of submitted mid-term assignments and attendance at face-to-face classes also emerged as significant factors related to persistence. The predictive model utilised in this study can provide valuable insight, indicating when a learner is at risk of dropping out and thus enabling timely interventions that promote academic persistence and student success. [ABSTRACT FROM AUTHOR]
- Published
- 2025
48. Spatial and temporal correlation between soil and rice relative yield in small-scale paddy fields and management zones.
- Author
-
Zhang, Zhihao, He, Jiaoyang, Zhao, Yanxi, Fu, Zhaopeng, Wang, Weikang, Zhang, Jiayi, Liu, Xiaojun, Cao, Qiang, Zhu, Yan, Cao, Weixing, and Tian, Yongchao
- Subjects
- *
RICE farming , *RICE , *AGRICULTURE , *RANDOM forest algorithms , *PRECISION farming - Abstract
Investigating soil properties and yield variability in farming systems is crucial for delineating Management Zones (MZs). The objectives of this study were to investigate the spatiotemporal variability of soil properties, identify spatial and temporal yield-limiting factors of soil and delineate MZs based on these factors. This study was conducted at the Xinghua Rice Smart Farm (33.08°N, 119.98°E) in Jiangsu Province, China, and the experiment covered five consecutive years of soil and rice yield testing from 2017 to 2021, with 933 geo-referenced soil samples and 140 rice yield samples collected annually. Soil samples were analyzed for pH, soil organic matter (SOM), total nitrogen (TN), available phosphorus (AP), available potassium (AK), and apparent soil conductivity (ECa). Spatial and temporal variability of soil properties and RY were analyzed using statistical and geostatistical methods. Ordinary Kriging (OK) interpolation characterized these distributions, and the random forest (RF) algorithm identified key yield-limiting factors. Subsequently, the effectiveness of using all variables to delineate the MZs was compared against the approach of defining MZs based solely on the identified yield-limiting factors. The study also compared Fuzzy C Means (FCM) and Spatial Fuzzy C-Means (sFCM) clustering to evaluate MZs and their temporal stability. Results showed that the coefficients of variation for soil properties ranged from low to medium (7.7-77.4%), with semi-variational function analyses showing moderate to high spatial dependence for most properties. Temporally, soil nutrients and ECa exhibited a slow increase, whereas pH decreased, showing the highest temporal stability for pH and the lowest for AP. RF analysis identified SOM, TN, and ECa as primary influencers of spatial variability of RY, and SOM, pH, and TN as main contributors to its temporal variability. The integration of yield-limiting factors with the sFCM method improves the performance of MZ delineation, maintaining stability over the five-year period. [ABSTRACT FROM AUTHOR]
- Published
- 2025
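The paddy-field abstract above summarises soil variability through coefficients of variation (7.7-77.4%). A minimal sketch of that statistic follows, assuming the usual definition of sample standard deviation divided by the mean. The function name and the sample values are hypothetical and do not come from the study.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variation in percent: sample SD relative to the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical measurements of one soil property, for illustration only
sample_values = [2, 4, 4, 4, 5, 5, 7, 9]
cv_percent = coefficient_of_variation(sample_values)
print(f"CV = {cv_percent:.1f}%")
```

Because the statistic is unitless, it allows properties measured on different scales, such as pH and available phosphorus, to be compared directly.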
49. Hybrid physically based and machine learning model to enhance high streamflow prediction.
- Author
-
López-Chacón, Sergio Ricardo, Salazar, Fernando, and Bladé, Ernest
- Subjects
- *
MACHINE learning , *RANDOM forest algorithms , *STANDARD deviations , *MACHINE performance , *STREAMFLOW - Abstract
Despite the significant performance of machine learning models for streamflow prediction, their precision for poorly represented data is reduced. This is a concern for flood mitigation purposes where high streamflow values are the most relevant but scarce. Consequently, this study proposes a methodology to create a hybrid model to mitigate the accuracy reduction of a standalone machine learning model in high streamflow values. The hybrid model combines a surrogate model that reproduces a physically based model with a model to estimate its residuals employing the random forest algorithm. The hybrid model reaches a root mean square error reduction of 23% and 33% in the respective study catchments for values over a 3-year return period compared to a standalone machine learning model. The percentage bias decreases by more than 70% from values over a 1.5-year return period. Moreover, the hybrid model has shown close predictions of values higher than the training set. [ABSTRACT FROM AUTHOR]
- Published
- 2025
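The streamflow abstract above describes a hybrid in which a surrogate of a physically based model is corrected by a second model trained on its residuals. The sketch below illustrates that residual-correction structure only. The abstract names random forest for the residual model; here a one-nearest-neighbour lookup is swapped in so the example stays dependency-free, and the linear surrogate, class name, and data are all hypothetical.

```python
def surrogate(rainfall):
    """Hypothetical stand-in for the surrogate of a physically based model."""
    return 2.0 * rainfall


class NearestResidualModel:
    """One-nearest-neighbour residual model (stand-in for a random forest)."""

    def fit(self, inputs, residuals):
        self.inputs = list(inputs)
        self.residuals = list(residuals)
        return self

    def predict(self, x):
        # Return the residual seen at the closest training input
        nearest = min(range(len(self.inputs)), key=lambda i: abs(self.inputs[i] - x))
        return self.residuals[nearest]


# Hypothetical training data: rainfall input and observed streamflow
train_rainfall = [1.0, 2.0, 3.0, 4.0]
train_observed = [2.0, 4.1, 6.9, 10.0]

# Residuals are what the surrogate fails to explain
train_residuals = [obs - surrogate(x) for x, obs in zip(train_rainfall, train_observed)]
residual_model = NearestResidualModel().fit(train_rainfall, train_residuals)


def hybrid_predict(rainfall):
    """Surrogate prediction plus the learned residual correction."""
    return surrogate(rainfall) + residual_model.predict(rainfall)


print(surrogate(3.9), hybrid_predict(3.9))
```

In this toy data the surrogate underestimates the highest flows, and the residual term adds the missing amount back, which is the mechanism the abstract credits for the lower error at high return periods.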
50. A spatially explicit multi-hazard framework for assessing flood, landslide, wildfire, and drought susceptibilities.
- Author
-
Choubin, Bahram, Jaafari, Abolfazl, and Mafi-Gholami, Davood
- Subjects
- *
ENSEMBLE learning , *RAINFALL , *RANDOM forest algorithms , *SUSTAINABLE construction , *BUILDING design & construction , *LANDSLIDES , *LANDSLIDE hazard analysis , *HAZARD mitigation , *DROUGHT management - Abstract
Sustainable development goals require evaluating vulnerabilities and examining natural and climatic hazards for effective planning that reduces their impact on economic, social, and developmental efforts. Key hazards like floods, landslides, wildfires, and droughts have significantly affected terrestrial ecosystems and human societies, emphasizing the importance of comprehending these hazards. This study aimed to predict and spatially map multiple hazards, identifying historical and potential risks to inform sustainable development and construction programs that mitigate risks and promote resilience. A 34-year drought magnitude map was generated using long-term data, and ensemble and individual machine learning techniques were used to produce maps of flood, landslide, and wildfire hazards in a northwest region of Iran. Results demonstrated that ensemble learning models outperformed individual models, with the top-performing models being the weighted average (WA) of the two best models, random forest, extreme gradient boosting, WA models with over 80% accuracy, and WA incorporating all models, respectively. The CART model performed best among individual models. Variable importance analysis revealed that slope and precipitation were crucial factors for identifying high-hazard landslide areas; distance from waterways, vegetation cover, and topographic humidity index emerged as the most crucial factors for identifying flood hazard areas, while vegetation, rainfall, and proximity to roads significantly impacted wildfire hazard. The multi-hazard map produced by our study indicated that about 30% of the study area was highly and very highly susceptible to floods, landslides, wildfires, and droughts, and that hazard mitigation efforts should be primarily directed to these specific portions of the study area. Our study underscored the importance of integrating long-term data and machine learning techniques in multi-hazard prediction and mapping, ultimately guiding mitigation efforts and promoting resilience in the face of natural and climatic hazards. [ABSTRACT FROM AUTHOR]
- Published
- 2025