2,025 results on '"Cross validation"'
Search Results
2. Random forest with feature selection and K-fold cross validation for predicting the electrical and thermal efficiencies of air based photovoltaic-thermal systems
- Author
-
Ait tchakoucht, Taha, Elkari, Badr, Chaibi, Yassine, and Kousksou, Tarik
- Published
- 2024
- Full Text
- View/download PDF
3. Unveiling the potential of operating time in improving machine learning models’ performance for waste biomass gasification systems
- Author
-
Olca, Kadriye Deniz and Yücel, Özgün
- Published
- 2024
- Full Text
- View/download PDF
4. Prediction of leaf nitrogen in sugarcane (Saccharum spp.) by Vis-NIR-SWIR spectroradiometry
- Author
-
Fiorio, Peterson Ricardo, Silva, Carlos Augusto Alves Cardoso, Rizzo, Rodnei, Demattê, José Alexandre Melo, Luciano, Ana Cláudia dos Santos, and Silva, Marcelo Andrade da
- Published
- 2024
- Full Text
- View/download PDF
5. The development and validation of the Student Self-feedback Behavior Scale.
- Author
-
Yang, Yongle, Yan, Zi, Zhu, Jinyu, Guo, Wuyuan, Wu, Junsheng, and Huang, Bingjun
- Subjects
EXPLORATORY factor analysis, CONFIRMATORY factor analysis, CHINESE-speaking students, HIGH school students, STUDENT development
- Abstract
Though the importance and benefits of students' active role in the feedback process have been widely discussed in the literature, an instrument for measuring students' self-feedback behavior is still lacking. This paper reports the development and validation of the Self-feedback Behavior Scale (SfBS), which comprises three dimensions (seeking, processing, and using feedback). The SfBS items were constructed in line with the self-feedback behavioral model. One thousand two hundred fifty-two high school students (Grade 10 to Grade 12) in mainland China participated in this survey. The exploratory factor analysis revealed a three-factor model reaffirmed in the confirmatory factor analysis. The multi-group CFA supported the measurement invariance of the SfBS across gender. Using the SfBS can help researchers and teachers better understand students' self-feedback behavior and optimize benefits derived from the self-feedback process. [ABSTRACT FROM AUTHOR] more...
- Published
- 2025
- Full Text
- View/download PDF
6. Can Level-2 Firth’s Bias-reduced logistic regression be considered a robust approach for predicting landslide susceptibility?
- Author
-
Pradhan, Ananta Man Singh, Shrestha, Suchita, Lee, Ji-Sung, and Kim, Yun-Tae
- Abstract
The implementation of effective landslide mitigation strategies relies heavily on the availability of accurate and reliable landslide susceptibility maps. This study evaluates the adequacy of Level-2 Firth’s Bias-Reduced Logistic Regression (BLR) for predicting landslide susceptibility. The study was performed at Mount Seunghak, which lies in the southwestern part of Busan. A total of 57 multi-temporal landslides from 2006 to 2019 were identified and plotted in a geographic information system (GIS) environment. Although twelve spatial environmental variables were selected for the analysis, the topographic wetness index was removed to avoid a collinearity issue. The dataset was randomly divided into two non-overlapping sets: a training set (70%) and a test set (30%). To assess the performance of the model, two cross-validation methods, random cross-validation (RCV) and spatial cross-validation (SCV), were applied. The overall accuracy was examined using the area under the receiver operating characteristic curve (mean AUC of RCV = 0.965, mean AUC of SCV = 0.939). The true positives and true negatives were depicted correctly, which showed the excellent adequacy of the prediction of landslide occurrences. Among the eleven environmental variables, slope played a significant role in the landslide prediction. The susceptibility estimation component of the BLR model outperformed a standard logistic regression (LR) model, which we used as a benchmark. LR is the most widely used classifier in landslide research, making it a key point of comparison. In our discussion, we explored the strengths and weaknesses of the new modeling framework and its potential applicability in various domains. We highlighted both the specific considerations related to hazards and geomorphology and the broad implications of its application. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
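7. Análisis y predicción del desempeño docente por medio de encuestas estudiantiles. Búsqueda de relaciones desde la minería de datos. [Analysis and prediction of teaching performance through student surveys: a search for relationships using data mining.]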
- Author
-
Castrillón, Omar Danilo
- Subjects
DEPENDENCY (Psychology), STUDENT surveys, DEPENDENT variables, TEACHER educators, DATA mining
- Abstract
Copyright of Formación Universitaria is the property of Centro de Informacion Tecnologica (CIT) and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) more...
- Published
- 2024
- Full Text
- View/download PDF
8. Machine Learning-Based Scrap Steel Price Forecasting for the Northeast Chinese Market.
- Author
-
Jin, Bingzi and Xu, Xiaojie
- Subjects
PRICES, KRIGING, STANDARD deviations, STEEL prices, MARKET prices, STOCK price forecasting
- Abstract
Throughout history, governments and investors have relied on predictions of prices for a broad spectrum of commodities. Using time-series data covering 08/23/2013–04/15/2021, this study investigates the challenging problem of predicting scrap steel prices, which are issued daily for the northeast China market. Previous research has not sufficiently taken into account estimates for this significant commodity price measurement. In this instance, Gaussian process regression methods are created using Bayesian optimisation approaches and cross-validation processes, and the resulting price forecasts are constructed. This empirical prediction methodology provides reasonably accurate price estimates for the out-of-sample period from 09/17/2019 to 04/15/2021, with a root mean square error of 9.6951, mean absolute error of 5.4218, and correlation coefficient of 99.9122%. Governments and investors can arrive at informed decisions regarding regional scrap steel markets by using pricing research models. [ABSTRACT FROM AUTHOR] more...
- Published
- 2024
- Full Text
- View/download PDF
9. Music statistics: uncertain logistic regression models with applications in analyzing music.
- Author
-
Lu, Jue, Zhou, Lianlian, Zeng, Wenxing, and Li, Anshui
- Subjects
REGRESSION analysis, LOGISTIC regression analysis, RESEARCH personnel, DATA analysis, STATISTICS, AMBIGUITY
- Abstract
In the realm of data analysis, traditional statistical methods often struggle when faced with ambiguity and uncertainty inherent in real-world data. Uncertainty theory, developed to better model and interpret such data, offers a promising alternative to conventional techniques. In this paper, we establish logistic regression models to initiate music statistics based on uncertainty theory. In particular, we will classify the music into different types named Baroque, Classical, Romantic, and Impressionism based on four characteristics: harmonic complexity, rhythmic complexity, texture complexity, and formal structure, with the help of the uncertain logistic models proposed. This theoretical framework for music classification provides a nuanced understanding of how music is interpreted under conditions of ambiguity and variability. Compared with the probabilistic counterpart, our approach highlights the versatility of uncertainty theory and provides researchers one much more feasible method to analyze the often-subjective nature of music reception, as well as broadening the potential applications of uncertainty theory. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Predicting fungal infection sensitivity of sepals in harvested tomatoes using imaging spectroscopy and partial least squares discriminant analysis.
- Author
-
Bertotto, Mercedes, de Villiers, Hendrik AC, Chauhan, Aneesh, Hogeveen-van Echtelt, Esther, Mensink, Manon, Grbovic, Zeljana, Stefanovic, Dimitrije, Panic, Marko, and Brdar, Sanja
- Subjects
TOMATO diseases & pests, MYCOSES, TOMATO harvesting, TOMATO yields, TOMATO varieties, HYPERSPECTRAL imaging systems
- Abstract
Tomatoes (Solanum lycopersicum L.) are a widely grown and globally traded vegetable, essential for both local consumption and international trade. However, approximately 30% of harvested tomato yields are lost due to fungal decay during postharvest handling. Timely disease identification is crucial to prevent such losses, but certain tomato varieties exhibit higher susceptibility to fungal infections than others. Additionally, there are variations in susceptibility among individual sepals, with unknown underlying causes. Traditional methods for assessing fungal presence in plants have limitations, such as sample destruction and a focus on symptom detection rather than evaluating susceptibility to fungal infection. Hence, there is a need for an accurate, non-destructive method capable of predicting susceptibility to fungal infection. The use of hyperspectral imaging (HSI) with chemometrics presents a pioneering approach to address this need. In this study, three tomato cultivars ('Brioso,' 'Cappricia,' and 'Provine') were studied. Hyperspectral images were captured on day-1 of harvest, followed by controlled fungal growth conditions. Ground truth assessments were conducted by three experts on day-3 and day-4, averaging severity scores assigned per sepal. The methodology involved extracting spectra from HSI images and calibrating and validating models using partial least squares discriminant analysis (PLSDA), aiming to optimize model parameters for accurate predictions. The models were categorized into those developed using data from a single variety (intravariety) and those utilizing data from multiple varieties combined (global models). The best-performing intravariety model was established using the Cappricia variety, achieving a balanced accuracy of 0.84. Conversely, a global model combining Cappricia and Provine varieties achieved a balanced accuracy of 0.70. Overall, the results suggest that distinguishing between more and less susceptible sepals is feasible under controlled conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Numerical properties of solutions of LASSO regression.
- Author
-
Lakshmi, Mayur V. and Winkler, Joab R.
- Subjects
*CONSTRAINT satisfaction, *LEAST squares, *LINEAR systems, *EQUATIONS
- Abstract
The determination of a concise model of a linear system when there are fewer samples $m$ than predictors $n$ requires the solution of the equation $Ax = b$, where $A \in \mathbb{R}^{m \times n}$ and $\operatorname{rank} A = m$, such that the selected solution from the infinite number of solutions is sparse, that is, many of its components are zero. This leads to the minimisation with respect to $x$ of $f(x, \lambda) = \|Ax - b\|_2^2 + \lambda \|x\|_1$, where $\lambda$ is the regularisation parameter. This problem, which is called LASSO regression, yields a family of functions $x_{\text{lasso}}(\lambda)$ and it is necessary to determine the optimal value of $\lambda$, that is, the value of $\lambda$ that balances the fidelity of the model, $\|A x_{\text{lasso}}(\lambda) - b\| \approx 0$, and the satisfaction of the constraint that $x_{\text{lasso}}(\lambda)$ be sparse. The aim of this paper is an investigation of the numerical properties of $x_{\text{lasso}}(\lambda)$, and the main conclusion of this investigation is the incompatibility of sparsity and stability, that is, a sparse solution $x_{\text{lasso}}(\lambda)$ that preserves the fidelity of the model exists if the least squares (LS) solution $x_{\text{ls}} = A^{\dagger} b$ is unstable. Two methods, cross validation and the L-curve, for the computation of the optimal value of $\lambda$ are compared and it is shown that the L-curve yields significantly better results. This difference between stable and unstable solutions $x_{\text{ls}}$ of the LS problem manifests itself in the very different forms of the L-curve for these two solutions. The paper includes examples of stable and unstable solutions $x_{\text{ls}}$ that demonstrate the theory. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
12. Cubic spline estimation for non parametric uncertain differential equation.
- Author
-
Shi, Yuxin, Zhao, Jiangtao, and Sheng, Yuhong
- Subjects
*AIR quality indexes, *DIFFERENTIAL equations, *PARAMETER estimation, *SPLINES, *COMPARATIVE studies
- Abstract
In research on estimating the unknown parameters of an uncertain differential equation (UDE), the problem of parameter estimation with a known functional form is often studied. However, in practical situations, the functional form is often unknown. To deal with this problem, this article proposes the cubic spline method to approximate the autonomous UDE and performs non-parametric estimation on it. Cross validation is introduced to determine the number of terms ($J$) in the approximating cubic spline. In addition, uncertain hypothesis testing is used to verify the rationality of this method. Finally, some numerical examples are given. The method is then applied to a case study of the Beijing Air Quality Index, and a comparative analysis is given to verify the practicability and superiority of the cubic spline method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. Forecasts of thermal coal prices through Gaussian process regressions.
- Author
-
Jin, Bingzi and Xu, Xiaojie
- Abstract
Given thermal coal's significance as a tactical energy source, price projections for the commodity are crucial for investors and decision-makers alike. The goal of the current work is to determine whether Gaussian process regressions are useful for this forecast problem using a dataset of closing prices of thermal coal traded on the China Zhengzhou Commodity Exchange from January 4, 2016, to December 31, 2020. This is a significant financial index that has not received enough attention in the literature in terms of price forecasting. Our forecasting exercises make use of Bayesian optimizations and cross-validation. The price from January 02, 2020, to December 31, 2020 is successfully predicted by the generated models, with the out-of-sample relative root mean square error of 0.4210%. Gaussian process regressions are shown to be useful for the thermal coal price forecast problem. The outcomes of this projection might be used as independent technical forecasts or in conjunction with other forecasts for policy research that entails developing viewpoints on price patterns. [ABSTRACT FROM AUTHOR] more...
- Published
- 2024
- Full Text
- View/download PDF
14. Enhancing Indoor Localization Accuracy through Multiple Access Point Deployment.
- Author
-
Aziz, Toufiq and Insoo, Koo
- Subjects
WIRELESS LANs, STANDARD deviations, RADIO frequency, MACHINE learning, LOCALIZATION (Mathematics)
- Abstract
This study addresses the limitations of wireless local area networks in indoor localization by utilizing Extra-Trees Regression (ETR) to estimate locations based on received signal strength indicator (RSSI) values from a radio environment map (REM). We investigate how integrating numerous access points can enhance indoor localization accuracy. By constructing an extensive REM using RSSI data from various access points collected by a mobile robot in the intended interior setting, we evaluate several machine learning regression techniques. Our research pays special attention to an optimized ETR model, validated through 10-fold cross-validation and hyperparameter tuning. We quantitatively evaluate the efficiency of our suggested multi-access-point approach using root mean square error (RMSE) for REM evaluation and location error metrics for accurate localization. The results show that incorporating multiple access points significantly improves indoor localization accuracy, providing a substantial improvement over single-access-point systems when assessing interior radio frequency environments. [ABSTRACT FROM AUTHOR] more...
- Published
- 2024
- Full Text
- View/download PDF
15. Identifying the environmental drivers of corridors and predicting connectivity between seasonal ranges in multiple populations of Alpine ibex (Capra ibex) as tools for conserving migration.
- Author
-
Chauveau, Victor, Garel, Mathieu, Toïgo, Carole, Anderwald, Pia, Beurier, Mathieu, Bouche, Michel, Bunz, Yoann, Cagnacci, Francesca, Canut, Marie, Cavailhes, Jérôme, Champly, Ilka, Filli, Flurin, Frey‐Roos, Alfred, Gressmann, Gunther, Herfindal, Ivar, Jurgeit, Florian, Martinelli, Laura, Papet, Rodolphe, Petit, Elodie, and Ramanzin, Maurizio more...
- Subjects
*ENVIRONMENTALISM, *MIGRATORY animals, *FRAGMENTED landscapes, *SEASONS, *SPRING, *WILDLIFE reintroduction, *CORRIDORS (Ecology), *HOME range (Animal geography), *INTERNAL migration
- Abstract
Aim: Seasonal migrations, such as those of ungulates, are particularly threatened by habitat transformations and fragmentation, climate and other environmental changes caused by anthropogenic activities. Mountain ungulate migrations are neglected because they are relatively short, although traversing heterogeneous altitudinal gradients particularly exposed to anthropogenic threats. Detecting migration routes of these species and understanding their drivers are therefore of primary importance to predict connectivity and preserve ecosystem functions and services. The populations of Alpine ibex Capra ibex have all been reintroduced from the last remnant source population. Despite a general increase in abundance and overall distribution range, ibex populations are mostly disconnected but display intra‐population migrations. Therefore, its conservation is strictly linked to the interplay between external threats and related behavioural responses, including space use and migration. Location: Austria, France, Italy and Switzerland. Methods: By using 337 migratory tracks from 425 GPS‐collared individuals from 15 Alpine ibex populations distributed across their entire range, we (i) identified the environmental drivers of movement corridors in both spring and autumn and (ii) compared the ability of a connectivity modelling algorithm to predict migratory movements between seasonal ranges of the 15 populations, using either population‐specific or multipopulation datasets, and three validation procedures. Results: Steep, south‐facing, snow‐free slopes were selected while high elevation changes were avoided. This revealed the importance of favourable resources and an attempt to limit energy expenditures and perceived predation risk. The abilities of the modelling methods we compared to predict migratory connectivity from the results of those movement analyses were similar. 
Main Conclusions: The trade‐off between energy expenditure, food and cover was the major driver of migration routes and was overall consistent among populations. Based on these findings, we provided useful connectivity models to inform conservation of Alpine ibex and its habitats, and a framework for future research investigating connectivity in migratory species. [ABSTRACT FROM AUTHOR] more...
- Published
- 2024
- Full Text
- View/download PDF
16. Prediction and model evaluation for space–time data.
- Author
-
Watson, G. L., Reid, C. E., Jerrett, M., and Telesca, D.
- Subjects
*PREDICTION models, *CALIFORNIA wildfires, *SPACETIME, *AIR pollution, *INTERPOLATION
- Abstract
Evaluation metrics for prediction error, model selection and model averaging on space–time data are understudied and poorly understood. The absence of independent replication makes prediction ambiguous as a concept and renders evaluation procedures developed for independent data inappropriate for most space–time prediction problems. Motivated by air pollution data collected during California wildfires in 2008, this manuscript attempts a formalization of the true prediction error associated with spatial interpolation. We investigate a variety of cross-validation (CV) procedures employing both simulations and case studies to provide insight into the nature of the estimand targeted by alternative data partition strategies. Consistent with recent best practice, we find that location-based cross-validation is appropriate for estimating spatial interpolation error as in our analysis of the California wildfire data. Interestingly, commonly held notions of bias-variance trade-off of CV fold size do not trivially apply to dependent data, and we recommend leave-one-location-out (LOLO) CV as the preferred prediction error metric for spatial interpolation. [ABSTRACT FROM AUTHOR] more...
- Published
- 2024
- Full Text
- View/download PDF
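The leave-one-location-out (LOLO) partitioning recommended in the abstract of result 16 can be illustrated with a minimal sketch. This is not the authors' code: the function name `leave_one_location_out` and the toy location labels are hypothetical, and the sketch only shows the data-partition idea (every observation from one location is held out together), not model fitting or error estimation.

```python
from collections import defaultdict


def leave_one_location_out(locations):
    """Yield (held_out_location, train_indices, test_indices) per location.

    `locations` holds one location label per observation, so repeated
    measurements at the same site always land in the same fold.
    """
    by_location = defaultdict(list)
    for idx, loc in enumerate(locations):
        by_location[loc].append(idx)

    for loc in sorted(by_location):
        test_idx = by_location[loc]
        held_out = set(test_idx)
        train_idx = [i for i in range(len(locations)) if i not in held_out]
        yield loc, train_idx, test_idx


# Hypothetical toy data: six observations taken at three monitoring sites.
locations = ["A", "A", "B", "B", "B", "C"]
folds = list(leave_one_location_out(locations))
for loc, train_idx, test_idx in folds:
    print(loc, train_idx, test_idx)
```

Each fold tests on a site the model never saw during training, which is the property that distinguishes location-based splitting from random K-fold splitting on dependent data.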
17. Comparing simulated demand flexibility against actual performance in commercial office buildings
- Author
-
Yin, Rongxin, Liu, Jingjing, Piette, Mary Ann, Xie, Jiarong, Pritoni, Marco, Casillas, Armando, Yu, Lili, and Schwartz, Peter
- Subjects
Built Environment and Design, Architecture, Demand flexibility, Commercial office building, Cross validation, Control strategy, Global temperature adjustment, Field-testing, Prototype building model, Demand Flexibility, commercial office building, cross validation, control strategy, global temperature adjustment, prototype building model, Environmental Science and Management, Building, Building & Construction, Built environment and design, Engineering
- Abstract
Commercial building energy benchmarking has been used as a mechanism to evaluate energy use of a single building over time, relative to other similar buildings, or to simulations of a reference building conforming to various energy standards. Lack of empirical demand flexibility data and consistent flexibility metrics has limited the ability to compare demand flexibility performance with estimated demand flexibility in buildings. In this study, we collected demand response performance data for a total of 831 demand response events from 192 sites as a first step to build such a building demand flexibility dataset, and propose a standard core data schema to consolidate field data from different sources. We also performed parametric simulations of a control strategy called “global temperature adjustment” using commercial office prototype building models. We then compared the simulated demand flexibility performance against the actual data for offices with global temperature adjustment strategy implemented. During demand response events with an average outside air temperature of 34 °C (range 23 °C–42 °C), the measured demand decrease intensity of the demand flexibility metrics were 6.1 watts per square meter (W/m2), 10.0 W/m2, 11.1 W/m2, 7.1 W/m2, and 4.7 W/m2 for small, small–medium, medium, medium–large, and large office buildings, respectively. Compared to the measured data in medium- and large-size buildings, the simulated demand decrease intensity was 0.7 W/m2 (17%) lower on average. The discrepancy between simulated and measured peak demand intensities fell within one standard deviation of the mean measured data. The comparison results validate the credibility of simulations in capturing real building data for assessing the technical potential of building demand flexibility. more...
- Published
- 2023
18. The development and validation of the Student Self-feedback Behavior Scale
- Author
-
Yongle Yang, Zi Yan, Jinyu Zhu, Wuyuan Guo, Junsheng Wu, and Bingjun Huang
- Subjects
self-feedback behavior, scale development and validation, Chinese student, cross validation, measurement invariance, Psychology, BF1-990
- Abstract
Though the importance and benefits of students’ active role in the feedback process have been widely discussed in the literature, an instrument for measuring students’ self-feedback behavior is still lacking. This paper reports the development and validation of the Self-feedback Behavior Scale (SfBS), which comprises three dimensions (seeking, processing, and using feedback). The SfBS items were constructed in line with the self-feedback behavioral model. One thousand two hundred fifty-two high school students (Grade 10 to Grade 12) in mainland China participated in this survey. The exploratory factor analysis revealed a three-factor model reaffirmed in the confirmatory factor analysis. The multi-group CFA supported the measurement invariance of the SfBS across gender. Using the SfBS can help researchers and teachers better understand students’ self-feedback behavior and optimize benefits derived from the self-feedback process. more...
- Published
- 2025
- Full Text
- View/download PDF
19. Machine Learning-Based Scrap Steel Price Forecasting for the Northeast Chinese Market
- Author
-
Bingzi Jin and Xiaojie Xu
- Subjects
Regional scrap steel price, time-series forecast, Gaussian process regression, Bayesian optimization, cross validation, Economics as a science, HB71-74
- Abstract
Throughout history, governments and investors have relied on predictions of prices for a broad spectrum of commodities. Using time-series data covering 08/23/2013–04/15/2021, this study investigates the challenging problem of predicting scrap steel prices, which are issued daily for the northeast China market. Previous research has not sufficiently taken into account estimates for this significant commodity price measurement. In this instance, Gaussian process regression methods are created using Bayesian optimisation approaches and cross-validation processes, and the resulting price forecasts are constructed. This empirical prediction methodology provides reasonably accurate price estimates for the out-of-sample period from 09/17/2019 to 04/15/2021, with a root mean square error of 9.6951, mean absolute error of 5.4218, and correlation coefficient of 99.9122%. Governments and investors can arrive at informed decisions regarding regional scrap steel markets by using pricing research models. more...
- Published
- 2024
- Full Text
- View/download PDF
20. The illusion of success: Test set disproportion causes inflated accuracy in remote sensing mapping research
- Author
-
Yuanjun Xiao, Zhen Zhao, Jingfeng Huang, Ran Huang, Wei Weng, Gerui Liang, Chang Zhou, Qi Shao, and Qiyu Tian
- Subjects
Accuracy assessment, Test set, Sample size ratio, Biased accuracy, Accuracy adjustment, Cross validation, Physical geography, GB3-5030, Environmental sciences, GE1-350
- Abstract
In remote sensing mapping studies, selecting an appropriate test set to accurately evaluate the results is critical. An imprecise accuracy assessment can be misleading and fail to validate the applicability of mapping products. Commencing with the WHU-Hi-HanChuan dataset, this paper revealed the impact of sample size ratios in test sets on accuracy metrics by generating a series of test sets with varying ratios of positive and negative sample size to evaluate the same map. A rigorous approach for accuracy assessment was suggested, and an example of tea plantations mapping is used to demonstrate the process and analyse potential issues in traditional approaches. A scale factor (λ) was constructed to measure the discrepancy in sample size ratios between test sets and actual conditions. Accuracy adjustment formulas were developed and applied to adjust the accuracy of 42 previous maps based on the λ. Results showed a higher ratio of positive to negative sample size in test set led to inflated user’s accuracy (UA), F1-score (F1) and overall accuracy (OA), but had little impact on producer’s accuracy. When the ratio aligned with that in the target area, the UA, F1, and OA closely matched the true values, indicating the proportion of positive and negative samples in test set should be consistent with that in actual situation. The accuracies reported by the traditional approaches including test set sampling from labelled data and 5-fold cross validation were far from the true accuracy and could not reflect the performance of the map. Among 42 previous maps, nearly 60% of the maps had UAs overestimated by 10%, and 9.5% of the maps had UAs and F1s deviations of more than 25%. The conclusions of this study provide a clear caution for future mapping research and assist in producing and identifying truly excellent maps. more...
- Published
- 2024
- Full Text
- View/download PDF
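The effect described in the abstract of result 20, where a test set with too many positive samples inflates user's accuracy, follows from ordinary confusion-matrix arithmetic. The sketch below is an illustration under stated assumptions, not the paper's scale factor or adjustment formulas: the function name `accuracy_metrics`, the fixed sensitivity of 0.95 and specificity of 0.80, and the sample counts are all hypothetical.

```python
def accuracy_metrics(sensitivity, specificity, n_pos, n_neg):
    """Expected accuracy metrics for a classifier with fixed sensitivity and
    specificity, evaluated on a test set of n_pos positives and n_neg negatives."""
    tp = sensitivity * n_pos   # positives correctly mapped
    fn = n_pos - tp            # positives missed
    tn = specificity * n_neg   # negatives correctly mapped
    fp = n_neg - tn            # negatives mapped as positive
    return {
        "users_accuracy": tp / (tp + fp),      # precision
        "producers_accuracy": tp / (tp + fn),  # recall
        "overall_accuracy": (tp + tn) / (n_pos + n_neg),
    }


# Same hypothetical map quality, two different test-set compositions.
balanced = accuracy_metrics(0.95, 0.80, n_pos=500, n_neg=500)   # 1:1 test set
realistic = accuracy_metrics(0.95, 0.80, n_pos=100, n_neg=900)  # 1:9, as if the class were rare

print(balanced)   # user's accuracy is roughly 0.83 here
print(realistic)  # user's accuracy drops to roughly 0.35 here
```

Producer's accuracy is identical in both cases because it depends only on how the positive samples are classified, while user's accuracy and overall accuracy shift with the positive-to-negative ratio.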
21. An analysis on classification models for customer churn prediction
- Author
-
Kathi Chandra Mouli, Ch. V. Raghavendran, V. Y. Bharadwaj, G. Y. Vybhavi, C. Sravani, Khristina Maksudovna Vafaeva, Rajesh Deorari, and Laith Hussein
- Subjects
Customer churn, classification models, class imbalance, accuracy metrics, cross validation, hyper parameters, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
The rapid expansion of technical infrastructure has brought about transformative changes in business operations. A notable consequence of this digital evolution is the proliferation of subscription-based services. With an increasing array of options for goods and services, customer churn has emerged as a significant challenge, posing a threat to businesses across sectors. The direct impact on earnings has prompted businesses to proactively develop tools for predicting potential client turnover. Identifying the underlying factors contributing to churn is crucial for implementing effective retention strategies. Our research makes a pivotal contribution by presenting a churn prediction model designed to assist businesses in identifying clients at risk of churn. The proposed model leverages machine learning classification techniques, with the customer data undergoing thorough pre-processing phases prior to model application. We systematically evaluated ten classification techniques, including Logistic Regression, Support Vector Classifier, Kernel SVM, KNN, Gaussian Naïve Bayes, Decision Tree Classifier, Random Forest, ADA Boost, XGBoost, and Gradient Boost. The assessment encompassed various evaluation metrics, such as ROC AUC Mean, ROC AUC STD, Accuracy Mean, Accuracy STD, Accuracy, Precision, Recall, F1 Score, and F2 Score. Employing 10-fold cross-validation and hyper parameter tuning through GridSearchCV and RandomizedSearchCV, we identified Random Forest as the most effective classifier, achieving an 85% Area Under the Curve (AUC) for optimal results. more...
- Published
- 2024
- Full Text
- View/download PDF
22. Modelling soil prokaryotic traits across environments with the trait sequence database ampliconTraits and the R package MicEnvMod
- Author
-
Jonathan Donhauser, Anna Doménech-Pascual, Xingguo Han, Karen Jordaan, Jean-Baptiste Ramond, Aline Frossard, Anna M. Romaní, and Anders Priemé
- Subjects
Trait sequence database, DNA sequencing, Microbial community, Cross validation, Weighted ensemble model, Information technology, T58.5-58.64, Ecology, QH540-549.5
- Abstract
We present a comprehensive, customizable workflow for inferring prokaryotic phenotypic traits from marker gene sequences and modelling the relationships between these traits and environmental factors, thus overcoming the limited ecological interpretability of marker gene sequencing data. We created the trait sequence database ampliconTraits, constructed by cross-mapping species from a phenotypic trait database to the SILVA sequence database and formatted to enable seamless classification of environmental sequences using the SINAPS algorithm. The R package MicEnvMod enables modelling of trait – environment relationships, combining the strengths of different model types and integrating an approach to evaluate the models' predictive performance in a single framework. Traits could be accurately predicted even for sequences with low sequence identity (80 %) with the reference sequences, indicating that our approach is suitable to classify a wide range of environmental sequences. Validating our approach in a large trans-continental soil dataset, we showed that trait distributions were robust to classification settings such as the bootstrap cutoff for classification and the number of discrete intervals for continuous traits. Using functions from MicEnvMod, we revealed precipitation seasonality and land cover as the most important predictors of genome size. We found Pearson correlation coefficients between observed and predicted values up to 0.70 using repeated split sampling cross validation, corroborating the predictive ability of our models beyond the training data. Predicting genome size across the Iberian Peninsula, we found the largest genomes in the northern part. Potential limitations of our trait inference approach include dependence on the phylogenetic conservation of traits and limited database coverage of environmental prokaryotes. 
Overall, our approach enables robust inference of ecologically interpretable traits combined with environmental modelling, allowing traits to be harnessed as bioindicators of soil ecosystem functioning.
- Published
- 2024
- Full Text
- View/download PDF
23. Steel price index forecasts through machine learning for northwest China
- Author
-
Jin, Bingzi and Xu, Xiaojie
- Published
- 2024
- Full Text
- View/download PDF
24. Spatial Distribution of Soil pH Status in Forest Soils of Telangana using GIS-Based Geo-Statistical Models
- Author
-
Patel, Ruby and Panwar, Vijender Pal
- Published
- 2024
- Full Text
- View/download PDF
25. Applicability of smell agent optimization and Tasmanian devil optimization hybridized with ANFIS and SVR as reliable solutions in estimation of cooling load in buildings
- Author
-
Li, Shaoxu
- Published
- 2024
- Full Text
- View/download PDF
26. A compartmental model for smoking dynamics in Italy: a pipeline for inference, validation, and forecasting under hypothetical scenarios
- Author
-
Alessio Lachi, Cecilia Viscardi, Giulia Cereda, Giulia Carreras, and Michela Baccini
- Subjects
Compartmental models ,Smoking dynamics ,Tobacco control policies ,Global sensitivity analysis ,Parametric bootstrap ,Cross validation ,Medicine (General) ,R5-920 - Abstract
Abstract We propose a compartmental model for investigating smoking dynamics in an Italian region (Tuscany). Calibrating the model on local data from 1993 to 2019, we estimate the probabilities of starting and quitting smoking and the probability of smoking relapse. Then, we forecast the evolution of smoking prevalence until 2043 and assess the impact on mortality in terms of attributable deaths. We introduce elements of novelty with respect to previous studies in this field, including a formal definition of the equations governing the model dynamics and a flexible modelling of smoking probabilities based on cubic regression splines. We estimate model parameters by defining a two-step procedure and quantify the sampling variability via a parametric bootstrap. We propose the implementation of cross-validation on a rolling basis and variance-based Global Sensitivity Analysis to check the robustness of the results and support our findings. Our results suggest a decrease in smoking prevalence among males and stability among females, over the next two decades. We estimate that, in 2023, 18% of deaths among males and 8% among females are due to smoking. We test the use of the model in assessing the impact on smoking prevalence and mortality of different tobacco control policies, including the tobacco-free generation ban recently introduced in New Zealand. more...
- Published
- 2024
- Full Text
- View/download PDF
27. Heart Sound Processing for Early Diagnostic of Heart Abnormalities using Support Vector Machine
- Author
-
Sebastian Michael Paschalis, Duma Kristina Yanti Hutapea, and Karel Octavianus Bachri
- Subjects
support vector machine ,heart sound ,linear kernel ,cross validation ,heart disease ,early diagnostic ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 ,Information technology ,T58.5-58.64 - Abstract
This paper addresses the critical issue of cardiovascular disease (CVD), the leading cause of global mortality, emphasizing the imperative for effective and early detection to mitigate CVD-related deaths. The research problem underscores the urgency of developing advanced diagnostic tools to identify heart abnormalities promptly. The primary objective is to create a Support Vector Machine (SVM) algorithm for accurate classification of different heart conditions, namely Normal heart, Mitral Stenosis, and Mitral Regurgitation. To achieve this objective, the study utilizes a dataset of heart sounds available online using a 10-fold cross-validation method. The focus is on evaluating the efficacy of various kernel functions within the SVM framework for heart sound classification. The findings demonstrate that the linear kernel exhibits superior accuracy and robustness in effectively classifying heart conditions. Notably, the proposed classification method attains an impressive 96% accuracy, highlighting its potential as a reliable tool for early detection of cardiovascular diseases. This research contributes to the ongoing efforts to enhance diagnostic capabilities and ultimately reduce the global burden of CVD-related fatalities. more...
- Published
- 2024
- Full Text
- View/download PDF
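The 10-fold cross-validation mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration of how the folds are formed, not the authors' pipeline: the function name `k_fold_indices` and the sample count of 25 are hypothetical, and no shuffling or class stratification is applied, both of which a real heart-sound experiment would probably want.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for plain k-fold cross-validation.

    Hypothetical helper for illustration; samples are split in their
    original order, without shuffling or stratification.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Example: 25 samples split into 10 folds.
folds = list(k_fold_indices(25, k=10))
```

Each sample lands in exactly one test fold, so averaging a metric over the folds uses every sample for testing once and for training k - 1 times.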
28. A Deep Learning-based U-Net 3+ Technique for Segmentation Blood Cell.
- Author
-
ULUTAŞ, Hasan
- Subjects
*DEEP learning , *BLOOD cells , *BAYESIAN analysis , *ROBUST statistics , *IMAGE segmentation - Abstract
Segmentation and classification of blood cells are crucial for various medical applications, including disease diagnosis, treatment monitoring, and research purposes. This process allows for accurate identification and quantification of different cell types, aiding in the detection and understanding of various blood-related disorders. The proposed U-Net 3+ architecture incorporates structural modifications, including strengthened connections between convolutional layers, increased filter numbers, and integration of Bayesian optimization for hyperparameter tuning. The model's generalization capability is optimized through the dynamic adjustment of dropout rates and learning rates. Bayesian optimization facilitates the exploration of optimal hyperparameter combinations, allowing the model to adapt effectively to diverse datasets. Advanced training strategies, such as adaptive learning rate adjustment and early stopping, are employed to mitigate overfitting and enhance training efficiency. The proposed model exhibits exceptional performance across multiple folds, achieving low training and validation losses, high accuracy metrics, and robust segmentation indices. Evaluation metrics, including mean IoU (Jaccard Index), dice score, pixel accuracy, and precision, confirm the model's proficiency in accurately delineating blood cell boundaries. The study demonstrates the effectiveness of custom architectures and optimization techniques, achieving an average IoU (Jaccard Index) of 0.9324 and a dice score of 0.9667. The proposed U-Net 3+ model stands as a promising solution for accurate and reliable blood cell segmentation, demonstrating adaptability and robust performance across various datasets. This work sets the stage for future research in the domain of medical image segmentation, emphasizing the potential for continued advancements in precise and efficient segmentation methodologies. [ABSTRACT FROM AUTHOR] more...
- Published
- 2024
- Full Text
- View/download PDF
29. Random forest based quantile-oriented sensitivity analysis indices estimation.
- Author
-
Elie-Dit-Cosaque, Kévin and Maume-Deschamps, Véronique
- Subjects
*RANDOM forest algorithms , *SENSITIVITY analysis , *TREE size , *QUANTILE regression - Abstract
We propose a random forest based estimation procedure for Quantile-Oriented Sensitivity Analysis—QOSA. In order to be efficient, a cross-validation step on the leaf size of trees is required. Our full estimation procedure is tested on both simulated data and a real dataset. Our estimators use either the bootstrap samples or the original sample in the estimation. Also, they are either based on a quantile plug-in procedure (the R-estimators) or on a direct minimization (the Q-estimators). This leads to 8 different estimators which are compared on simulations. From these simulations, it seems that the estimation method based on a direct minimization is better than the one plugging the quantile. This is a significant result because the method with direct minimization requires only one sample and could therefore be preferred. [ABSTRACT FROM AUTHOR] more...
- Published
- 2024
- Full Text
- View/download PDF
30. A New Comparative Approach Based on Features of Subcomponents and Machine Learning Algorithms to Detect and Classify Power Quality Disturbances.
- Author
-
Akkaya, Sıtkı, Yüksek, Emre, and Akgün, Hasan Metehan
- Subjects
*POWER quality disturbances , *MACHINE learning , *COMPARATIVE method , *FEATURE extraction , *K-nearest neighbor classification - Abstract
Current measurement systems based on the IEEE-1159 standard have limitations and robustness problems under noisy and fast-changing conditions. Moreover, applying a different method for each Power Quality Disturbance (PQD) to every window is required but is time-consuming and not feasible. Therefore, various two-stage Detection and Classification (D&C) methods have been developed in many studies, after which the required measurement can be performed to define the disturbance. For this purpose, a new approach based on features of subcomponents with Machine Learning Algorithms (MLAs) to detect and classify PQDs is proposed. A 21-class dataset including single and multiple PQDs under different noise conditions was generated randomly. From this dataset, a set of features was extracted and a subset of these was selected. The selected features were then used to train and test several MLAs on a workstation. Results obtained from the compared MLAs and the other classification methods show that the best MLA with the related features is Random Forest with 96.97% accuracy, while LightGBM, k-Nearest Neighbors, and XGBoost achieve 96.85%, 96.73%, and 92.82% accuracy, respectively. Because the selected features, optimized parameters, and the related MLA were obtained by investigating the features of the PQDs over the whole parameter space, this approach offers the advantages of high accuracy and low D&C complexity and computing load. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Greedy Weighted Stacking of Machine Learning Models for Optimizing Dam Deformation Prediction.
- Author
-
Alocén, Patricia, Fernández-Centeno, Miguel Á., and Toledo, Miguel Á.
- Subjects
DAMS ,STACKING machines ,DAM safety ,CONCRETE dams ,ARTIFICIAL intelligence ,GREEDY algorithms ,MACHINE learning - Abstract
Dam safety monitoring is critical due to its social, environmental, and economic implications. Although conventional statistical approaches have been used for surveillance, advancements in technology, particularly in Artificial Intelligence (AI) and Machine Learning (ML), offer promising avenues for enhancing predictive capabilities. We investigate the application of ML algorithms, including Boosted Regression Trees (BRT), Random Forest (RF), and Neural Networks (NN), focussing on their combination by Stacking to improve prediction accuracy on concrete dam deformation using radial displacement data from three dams. The methodology involves training first-level models (experts) using those algorithms, and a second-level meta-learner that combines their predictions using BRT, a Linear Model (LM) and the Greedy Weighted Algorithm (GWA). A comparative analysis demonstrates the superiority of Stacking over traditional methods. The GWA emerged as the most suitable meta-learner, enhancing the optimal expert in all cases, with improvement rates reaching up to 16.12% over the optimal expert. Our study addresses critical questions regarding the GWA's expert weighting and its impact on prediction precision. The results indicate that the combination of accurate experts using the GWA improves model reliability by reducing error dispersion. However, variations in optimal weights over time necessitate robust error estimation using cross-validation by blocks. Furthermore, the assignment of weights to experts closely correlates with their precision: the more accurate a model is, the more weight that is assigned to it. The GWA improves on the optimal expert in most cases, including at extreme values of error, with improvement rates up to 41.74%. Our findings suggest that the proposed methodology significantly advances AI applications in infrastructure monitoring, with implications for dam safety. [ABSTRACT FROM AUTHOR] more...
- Published
- 2024
- Full Text
- View/download PDF
32. Optimal Latent Variables Number for the Reconstruction of Time Series with PLSR
- Author
-
Balsa, Carlos, Dupuis, Hugo, Breve, Murilo-M., Guivarch, Ronan, Rufino, José, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Garcia, Marcelo V., editor, Gordón-Gallegos, Carlos, editor, Salazar-Ramírez, Asier, editor, and Nuñez, Carlos, editor more...
- Published
- 2024
- Full Text
- View/download PDF
33. Credit Card Fraud Detection Based on Machine Learning Prediction
- Author
-
Yang, Ge, Fournier-Viger, Philippe, Series Editor, and Wang, Yulin, editor
- Published
- 2024
- Full Text
- View/download PDF
34. Cricket Forecast: Unraveling Future Matches’ Outcomes
- Author
-
Siddharth, Vemula Vivek, Vikranth, Pulukuri Shalem, Karthik, N., Vani, V., Rannenberg, Kai, Editor-in-Chief, Soares Barbosa, Luís, Editorial Board Member, Carette, Jacques, Editorial Board Member, Tatnall, Arthur, Editorial Board Member, Neuhold, Erich J., Editorial Board Member, Stiller, Burkhard, Editorial Board Member, Stettner, Lukasz, Editorial Board Member, Pries-Heje, Jan, Editorial Board Member, Kreps, David, Editorial Board Member, Rettberg, Achim, Editorial Board Member, Furnell, Steven, Editorial Board Member, Mercier-Laurent, Eunika, Editorial Board Member, Winckler, Marco, Editorial Board Member, Malaka, Rainer, Editorial Board Member, Owoc, Mieczyslaw Lech, editor, Varghese Sicily, Felix Enigo, editor, Rajaram, Kanchana, editor, and Balasundaram, Prabavathy, editor more...
- Published
- 2024
- Full Text
- View/download PDF
35. MetroPT Predictive Maintenance Using Logistic Regression and Random Forest with Isolation Forest Preprocessing
- Author
-
Sandhu, Jaspreet, Mahapatra, Bandana, Kulkarni, Sarang, Bhatt, Abhishek, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Pant, Millie, editor, Deep, Kusum, editor, and Nagar, Atulya, editor more...
- Published
- 2024
- Full Text
- View/download PDF
36. A Machine Learning Approach for Risk Prediction of Cardiovascular Disease
- Author
-
Panda, Shovna, Palei, Shantilata, Samartha, Mullapudi Venkata Sai, Jena, Biswajit, Saxena, Sanjay, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Kaur, Harkeerat, editor, Jakhetiya, Vinit, editor, Goyal, Puneet, editor, Khanna, Pritee, editor, Raman, Balasubramanian, editor, and Kumar, Sanjeev, editor more...
- Published
- 2024
- Full Text
- View/download PDF
37. Breast Cancer Detection: An Evaluation of Machine Learning, Ensemble Learning, and Deep Learning Algorithms
- Author
-
Rai, Deepak, Mishra, Tripti, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Chauhan, Naveen, editor, Yadav, Divakar, editor, Verma, Gyanendra K., editor, Soni, Badal, editor, and Lara, Jorge Morato, editor more...
- Published
- 2024
- Full Text
- View/download PDF
38. Android Malware Detection Using Artificial Intelligence
- Author
-
Masele, Rebecca Kipanga, Khennou, Fadoua, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Lopata, Audrius, editor, Gudonienė, Daina, editor, and Butkienė, Rita, editor more...
- Published
- 2024
- Full Text
- View/download PDF
39. Evaluation of four machine learning methods in predicting orthodontic extraction decision from clinical examination data and analysis of feature contribution
- Author
-
Jialiang Huang, Ian-Tong Chan, Zhixian Wang, Xiaoyi Ding, Ying Jin, Congchong Yang, and Yichen Pan
- Subjects
orthodontic treatment ,tooth extraction decision ,decision tree ,machine learning ,cross validation ,Biotechnology ,TP248.13-248.65 - Abstract
Introduction: The study aims to predict tooth extraction decisions using four machine learning methods and to analyze the feature contribution, so as to shed light on the basis on which experts plan tooth extractions and to provide a reference for orthodontic treatment planning. Methods: This study collected clinical information of 192 patients with malocclusion diagnoses and treatment plans. It used four machine learning strategies, namely decision tree, random forest, support vector machine (SVM) and multilayer perceptron (MLP), to predict orthodontic extraction decisions from clinical examination data acquired during the initial consultation, comprising Angle classification, skeletal classification, maxillary and mandibular crowding, overjet, overbite, upper and lower incisor inclination, vertical growth pattern, and lateral facial profile. Of the samples, 30% were randomly selected as the testing set. We used five-fold cross-validation to evaluate the generalization performance of the models and avoid over-fitting. The accuracy of the four models was calculated for the training set and the cross-validation set. The confusion matrix was plotted for the testing set, and 6 indicators were calculated to evaluate the performance of the models. For the decision tree and random forest models, we observed the feature contribution. Results: The accuracy of the four models on the training set ranges from 82% to 90%, and on the cross-validation set the decision tree and random forest had higher accuracy. In the confusion matrix analysis, the decision tree tops the four models with the highest accuracy, specificity, precision and F1-score, and the other three models tended to classify too many samples as extraction cases. In the feature contribution analysis, crowding, lateral facial profile, and lower incisor inclination ranked at the top in the decision tree model. Conclusion: Among the machine learning models that use only clinical data for tooth extraction prediction, the decision tree has the best overall performance. For tooth extraction decisions specifically, crowding, lateral facial profile, and lower incisor inclination make the greatest contribution.
- Published
- 2024
- Full Text
- View/download PDF
40. Prediction of one- and three-months yoga practices effect on chronic venous insufficiency based on machine learning classifiers
- Author
-
Xue Han and Nan Hu
- Subjects
Chronic venous insufficiency ,Cross validation ,Univariate selection ,Correlation matrix ,Classification methods ,Optimization algorithms ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
The rise of technology has heightened work demands, adversely impacting mental health and fitness. The COVID-19 pandemic exacerbates psychological stress, emphasizing the need for non-pharmacological interventions like yoga. Yoga positively influences the autonomic nervous system, benefiting cardio-respiratory health, metabolic efficiency, and conditions like Type-2 diabetes, Chronic Venous disease, and obesity. This study employs a dataset with 100 samples and 43 features related to Chronic Venous Insufficiency (CVI). Logistic and Random Forest classifiers are validated using K-fold cross-validation, with feature selection optimizing prediction accuracy. Hybrid models, enhanced with optimization algorithms, predict Venous Clinical Severity Score (VCSS) before, one, and three months after yoga practices. The Random Forest classifier, particularly RFGT, proves highly accurate in categorizing baseline severity and identifying Mild and Moderate CVI cases. RFGT demonstrated AUC score of 0.9072, 0.8714, 0.7709, and 0.7200 in Absent, Mild, Moderate, and Severe patient groups classification before yoga practices (VCSS-Pre). These values were 0.9158, 0.8644, 0.8142, and 0.6333 for VCSS-1 and reported as 0.9269, 0.8399, 0.7838, and 0.7500 for patients’ classification in VCSS-3. Predicting VCSS scores before yoga intervention assists in categorizing participants for personalized care and efficient resource allocation. The RFC-based models, notably RFGT, show high accuracy in identifying baseline severity, enabling early intervention for high-risk individuals. These models, especially RFGT, perform well in classifying Mild and Moderate CVI cases, informing lifestyle modifications. Predicting VCSS-1 scores evaluates the short-term impact of yoga practices, identifying individuals requiring additional support. RFGT aids in personalized recommendations based on specific factors, crucial for severe conditions. 
Predicting VCSS-3 scores assesses the sustained impact over three months, identifying intervention responders, particularly in Severe and Moderate groups. RFGT demonstrates optimal predictions, contributing to future interventions tailored to individual responses and improved outcomes. more...
- Published
- 2024
- Full Text
- View/download PDF
41. Convolutional Neural Network untuk Klasifikasi Batik Tenun Ikat Bandar Berdasarkan Fitur Warna dan Tekstur
- Author
-
Mohammad Atif Faiz Muthrofin, Danang Erwanto, and Iska Yanuartanti
- Subjects
tenun ikat ,cnn ,glcm ,ccm ,cross validation ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Tenun Ikat Bandar Kediri is a type of batik in the form of woven cloth whose texture is given patterns and motifs using a traditional wooden loom. The patterns and motifs of tenun ikat batik vary widely depending on the production house. Usually each production house has its own distinctive characteristics in its patterns and motifs. The large number of patterns and motifs makes it difficult for the public to recognize and learn the visual characteristics of Tenun Ikat, so a system that learns these patterns and motifs would be very helpful to the public. The classification system built in this study implements the Convolutional Neural Network (CNN) algorithm, with texture extraction using Gray Level Co-occurrence Matrix (GLCM) features and color extraction using Color Co-occurrence Matrix (CCM) features. This study uses a dataset of 125 images of 5 batik motifs from one tenun ikat production house, with a balanced proportion for each pattern. The results show that the average accuracy across tests reaches 0.94, indicating that the proposed method classifies well.
- Published
- 2024
- Full Text
- View/download PDF
42. Assessing risk factors for malnutrition among women in Bangladesh and forecasting malnutrition using machine learning approaches
- Author
-
Estiyak Ahmed Turjo and Md. Habibur Rahman
- Subjects
Malnutrition ,Machine learning ,Cross validation ,Bangladesh ,Nutrition. Foods and food supply ,TX341-641 ,Food processing and manufacture ,TP368-456 ,Medicine (General) ,R5-920 - Abstract
Background: This paper presents an in-depth examination of malnutrition in women in Bangladesh. Malnutrition in women is a major public health issue related to different diseases and has negative repercussions for children, such as premature birth, decreased infection resistance, and an increased risk of death. Moreover, malnutrition is a severe problem in Bangladesh. Data from the Bangladesh Demographic Health Survey (BDHS) conducted in 2017-18 was used to identify risk factors for malnourished women and to create a machine learning-based strategy to detect their nutritional status. Methods: A total of 17022 women participants are taken to conduct the research. All the participants are from different regions and different ages. A chi-square test with a five percent significance level is used to identify possible risk variables for malnutrition in women and six machine learning-based classifiers (Naïve Bayes, two types of Decision Tree, Logistic Regression, Random Forest, and Gradient Boosting Machine) were used to predict the malnutrition of women. The models are being evaluated using different parameters like accuracy, sensitivity, specificity, positive predictive value, negative predictive value, $F_1$ score, and area under the curve (AUC). Results: Descriptive data showed that 45% of the population studied were malnourished women, and the chi-square test illustrated that all fourteen variables are significantly associated with malnutrition in women and among them, age and wealth index had the most influence on their nutritional status, while water source had the least impact. Random Forest had an accuracy of 60% and 60.2% for training and test data sets, respectively. CART and Gradient Boosting Machine also had close accuracy like Random Forest but based on other performance metrics such as kappa and $F_1$ scores Random Forest got the highest rank among others.
Also, it had the highest accuracy and $F_1$ scores in k-fold validation along with the highest AUC (0.604). Conclusion: The Random Forest (RF) approach is a reasonably superior machine learning-based algorithm for forecasting women’s nutritional status in Bangladesh in comparison to other ML algorithms investigated in this work. The suggested approach will aid in forecasting which women are at high susceptibility to malnutrition, hence decreasing the strain on the healthcare system.
- Published
- 2024
- Full Text
- View/download PDF
43. Predicting open interest in thermal coal futures using machine learning
- Author
-
Jin, Bingzi and Xu, Xiaojie
- Published
- 2024
- Full Text
- View/download PDF
44. Forecasts of coking coal futures price indices through Gaussian process regressions
- Author
-
Jin, Bingzi and Xu, Xiaojie
- Published
- 2024
- Full Text
- View/download PDF
45. Yapay Sinir Ağları ve Derin Öğrenme Modeli Kullanılarak USD/TRY Döviz Kurunun Tahmin Edilmesi.
- Author
-
GÜMÜŞ, Ersin
- Abstract
The exchange rate is one of the most important economic indicators for many reasons, such as affecting the costs of inputs such as raw materials, energy and technological products, the convertibility of external debt, and the risks posed by exchange rate volatility on the economy. In the study, it is aimed to predict the end-of-month values of the USD/TRY exchange rate through the macroeconomic data published in the current month in line with the data disclosure calendar, using artificial neural networks and deep learning method. In the first stage of this study, in which monthly data covering the period 05:2006 - 08:2022 were used, the data were separated as training, validation and test sets, and different deep learning architectures were tried with different layers and neuron numbers and the most suitable model was determined. In the second stage, the consistency of the determined model was examined by using the cross-validation method and as a result of the findings, positive results were obtained for the consistency of the model. At the last stage, USD/TRY exchange rates for September 2022 and October 2022 were estimated with the deep learning model. It has been observed that the deep learning model can produce prediction values that are very close to the real values within certain error limits, and that the independent variables used have the power to predict the end-of-month level of the USD/TRY exchange rate. [ABSTRACT FROM AUTHOR] more...
- Published
- 2024
- Full Text
- View/download PDF
46. Using ARIMA Model to Forecast Production of kharif Sweet Potato in Odisha.
- Author
-
Pradhan, Jayashree and Dash, Abhiram
- Abstract
This article, published in the journal "Environment & Ecology," presents a study on forecasting the production of kharif sweet potato in Odisha, India using the ARIMA model. The researchers collected data from 1970-2020 and determined the best fit ARIMA models for the area, yield, and production of sweet potatoes. The results suggest that these variables will remain constant in the future. The study provides valuable insights for farmers and policymakers in the region. [Extracted from the article] more...
- Published
- 2024
- Full Text
- View/download PDF
47. Implementing Time Series Cross Validation to Evaluate the Forecasting Model Performance.
- Author
-
Winita Sulandari, Yudho Yudhanto, Sri Subanti, Etik Zukhronah, and Muhammad Zidni Subarkah
- Subjects
BOX-Jenkins forecasting ,FORECASTING ,STATISTICAL smoothing - Abstract
Theoretically, forecast error increases as the forecast horizon increases. This study assesses whether this statement holds in general. It applies time series cross-validation to evaluate forecasts up to seven steps ahead. As an illustration, we use Malaysia's hourly electricity load data. Each hour of the day is treated as a separate series, giving 24 daily series. Time series cross-validation with a window of 334 was applied to the 24 series, and each daily series was then modeled with the Autoregressive Integrated Moving Average (ARIMA), Neural Network Autoregressive (NNAR), ExponenTial Smoothing (ETS), Singular Spectrum Analysis (SSA), and General Regression Neural Network (GRNN) models. We then evaluate the performance of all models in terms of mean absolute percentage error (MAPE) from one to seven steps ahead. The experimental results show that the MAPEs obtained from the GRNN model tend to increase with the horizon, in line with the theory. However, MAPEs obtained from ETS increase up to three steps ahead and decrease after that. Among the five models, ARIMA, NNAR, and SSA produce reasonably stable MAPE values for one to seven steps ahead, with SSA having the most stable error of the three. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
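The time series cross-validation described in the abstract above can be illustrated with a rolling-origin split generator. This is a sketch under stated assumptions: the abstract gives only a window of 334 and horizons of one to seven steps, so the fixed-length sliding window, the step of one observation, the series length of 350, and the name `rolling_origin_splits` are all hypothetical choices, not the authors' configuration.

```python
def rolling_origin_splits(n_obs, window=334, horizon=7, step=1):
    """Yield (train_idx, test_idx) pairs for rolling-origin evaluation.

    Assumption for illustration: a fixed-length sliding training window.
    An expanding window would instead start every training range at 0.
    """
    if window < 1 or horizon < 1 or step < 1:
        raise ValueError("window, horizon and step must be positive")
    origin = window  # index of the first forecast target
    while origin + horizon <= n_obs:
        train_idx = list(range(origin - window, origin))
        # test_idx[h - 1] is the h-step-ahead target for this origin.
        test_idx = list(range(origin, origin + horizon))
        yield train_idx, test_idx
        origin += step


# Example: a series of 350 daily observations for one hour of the day.
splits = list(rolling_origin_splits(n_obs=350, window=334, horizon=7))
```

Because every test index lies strictly after its training window, no future observation leaks into model fitting. Collecting the errors at position h - 1 across all origins gives the h-step-ahead MAPE that the abstract compares between models.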
48. Analysis of EEG signals with the use of wavelet transform for accurate classification of Alzheimer Disease, Frontotemporal Dementia and healthy subjects using Machine Learning Models.
- Author
-
Parihar, Akanksha and Swami, Preety D.
- Subjects
MACHINE learning ,ALZHEIMER'S disease ,NOSOLOGY ,MULTIPLE Signal Classification ,FRONTOTEMPORAL dementia - Abstract
Dementia is a brain disorder that, if not prevented, takes the form of various types of diseases that have no cure yet. Accurate classification of the different types of dementia is required so that patients receive appropriate medication and the progression of the disease can be delayed. This study analyzes EEG signals to classify Alzheimer's disease (AD), frontotemporal dementia (FTD), and control normal (CN) subjects using machine learning (ML) algorithms. Each of the 19 channels of the EEG dataset is analyzed separately to perform the classification. A combination of the Hjorth parameters (Activity, Mobility, and Complexity) and the kurtosis of the data, extracted in the time-frequency domain for each EEG frequency band (Delta, Theta, Alpha, Beta, and Gamma), is supplied to the machine learning algorithms. The research focuses on classification between dementia classes (AD vs. FTD) as well as three-way (AD vs. FTD vs. CN) classification. It is validated on a public EEG dataset with 23 participants in each category. The best classification result is achieved using a random forest classifier and leave-one-subject-out (LOSO) cross validation. The three-way classification (AD vs. CN vs. FTD) achieved a best accuracy of 75.29%, whereas the binary classifications AD vs. CN, AD vs. FTD, and CN vs. FTD achieved best accuracies of 88.90%, 88.44%, and 84.10%, respectively. The proposed framework shows better results than existing work on dementia classification using machine learning. The results show that a combination of EEG frequency band features can be used to classify multiple dementia diseases with greater accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
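As a rough illustration of two techniques named in the abstract of result 48, the sketch below computes the Hjorth parameters of a signal and generates leave-one-subject-out (LOSO) splits. It is not the authors' code. The function names `hjorth_parameters` and `leave_one_subject_out` are hypothetical, the signal is assumed to be a plain 1-D list of samples that is not constant, and the wavelet decomposition, kurtosis feature and random forest classifier described in the abstract are left out.

```python
import math
from statistics import pvariance


def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) for a 1-D sequence of samples.

    Assumes the signal is not constant, otherwise the variances are zero
    and the ratios below are undefined.
    """
    # First and second discrete differences stand in for the derivatives.
    d1 = [b - a for a, b in zip(signal, signal[1:])]
    d2 = [b - a for a, b in zip(d1, d1[1:])]

    activity = pvariance(signal)                # variance of the signal
    var_d1 = pvariance(d1)
    var_d2 = pvariance(d2)
    mobility = math.sqrt(var_d1 / activity)     # sqrt(var(x') / var(x))
    # Complexity is the mobility of x' divided by the mobility of x.
    complexity = math.sqrt(var_d2 / var_d1) / mobility
    return activity, mobility, complexity


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx), one split per subject.

    subject_ids[i] is the subject that sample i came from, so every sample
    of the held-out subject lands in the test set and never in training.
    """
    for held_out in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_idx, test_idx


# Toy usage: five feature rows recorded from three hypothetical subjects.
subjects = ["s1", "s1", "s2", "s3", "s3"]
splits = list(leave_one_subject_out(subjects))
toy_signal = [math.sin(0.1 * i) for i in range(200)]
activity, mobility, complexity = hjorth_parameters(toy_signal)
```

Splitting by subject rather than by sample keeps segments from the same person out of both the training and test sets, which is the point of LOSO validation.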
49. Joint species distribution modeling with competition for space.
- Author
-
Kettunen, Juho, Mehtätalo, Lauri, Tuittila, Eeva‐Stiina, Korrensalo, Aino, and Vanhatalo, Jarno
- Subjects
SPECIES distribution ,COMPETITION (Biology) ,BIOTIC communities ,LATENT variables ,GROUND vegetation cover ,GROUND cover plants - Abstract
Joint species distribution models (JSDMs) are among the most important statistical tools in community ecology. However, existing JSDMs cannot model mutual exclusion between species. We tackle this deficiency in the context of modeling plant percentage cover data, where mutual exclusion arises from limited growing space and competition for light. We propose a hierarchical JSDM where latent Gaussian variable models describe species' niche preferences and a Dirichlet‐Multinomial distribution models the observation process and competition between species. We also propose a decision theoretic model comparison and validation approach to assess the goodness of JSDMs in four different types of predictive tasks. We apply our models and methods to a case study on modeling vegetation cover in a boreal peatland. Our results show that ignoring the interspecific interactions and competition reduces models' predictive performance and leads to biased estimates for total percentage cover. Models' relative predictive performance also depends on the predictive task, highlighting that model comparison and assessment should resemble the true predictive task. Our results also demonstrate that the proposed JSDM can be used to simultaneously infer interspecific correlations in niche preference as well as mutual competition for space, and through that provide novel insight into ecological research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
50. Integrated machine learning methods with oversampling technique for regional suitability prediction of waste-to-energy incineration projects.
- Author
-
Hou, Yali, Wang, Qunwei, Zhou, Kai, Zhang, Ling, and Tan, Tao
- Subjects
*MACHINE learning , *WASTE products as fuel , *WASTE management , *CLEAN energy , *SUSTAINABLE development , *INCINERATION , *REFUSE as fuel - Abstract
• A data-driven model is proposed to predict the suitability of incineration plants.
• Machine learning models integrated with oversampling boost model performance.
• The stacking technique aids in bolstering the model's generalization ability.
• The data-driven model is reusable and can predict new alternative locations.
China's tiered strategy to enhance county-level waste incineration for energy aligns with the sustainable development goals (SDGs), emphasizing the need for comprehensive assessments of waste-to-energy (WtE) plant suitability. Traditional assessment methodologies face challenges, particularly in suggesting innovative site alternatives, adapting to new data sets, and their dependence on strict assumptions. This study introduces enhancements in three pivotal dimensions. Methodologically, it leverages data-driven machine learning (ML) approaches to capture the complex relationships essential for site selection, reducing dependency on strict assumptions. In terms of predictive performance, the integration of oversampling with stacked ensemble models enhances the diversity and generalizability of ML models. The area under curve (AUC) scores from four ML models, enhanced by the oversampled dataset, demonstrated significant improvements compared to the original dataset. The stacking model excelled, achieving a score of 92%. It also led in overall Precision and Recall, reaching 85.2% and 85.08% respectively. Nevertheless, a noticeable discrepancy existed in Precision and Recall for positive classes. The stacking model topped Precision scores at 83.1%, followed by eXtreme Gradient Boosting (XGBoost) (82.61%). In terms of Recall, XGBoost recorded the lowest at 85.07%, while the other three classifiers all marked 88.06%. From an industry applicability standpoint, the stacking model provides innovative location alternatives and demonstrates adaptability in Hunan province, offering a reusable tool for WtE location. In conclusion, this study not only enhances the methodological aspects of WtE site selection but also provides practical and adaptable solutions, contributing positively to sustainable waste management practices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
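The abstract of result 50 does not say which oversampling technique was used. As a hedged illustration of the general idea only, the sketch below uses plain random oversampling, which duplicates minority-class rows until every class is as large as the biggest one. The function name `random_oversample` and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are balanced.

    Returns new (rows, labels) lists; the inputs are left unchanged.
    """
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    counts = Counter(labels)
    target = max(counts.values())      # size of the largest class

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement until this class reaches the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy usage: four "unsuitable" (0) sites and one "suitable" (1) site.
rows = [[0], [1], [2], [3], [4]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

When oversampling is combined with cross validation, it is usually applied to the training folds only, so that duplicated rows do not leak into the held-out fold.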