Descriptor: "regression tree" / Search Limiters: Peer Reviewed - Searchworks@Jio Institute Digital Library Search Results

1. Ensemble learning based approach for the prediction of monthly significant wave heights

Author: Chen, Jinzhou and Xue, Xinhua
Published: 2025
Full Text: View/download PDF

2. Forecasting the performance and emissions of a diesel engine powered by waste cooking biodiesel with carbon nano additives using tree-based, least square boost and Gaussian regression models

Author: Gad, M.S. and Alenany, Ahmed
Published: 2025
Full Text: View/download PDF

3. Using regression tree analysis to examine demographic and geographic characteristics of COVID-19 vaccination trends over time, United States, May 2021–April 2022, National Immunization Survey Adult COVID Module

Author: Earp, Morgan, Meng, Lu, Black, Carla L., Carter, Rosalind J., Lu, Peng-Jun, Singleton, James A., and Chorba, Terence
Published: 2024
Full Text: View/download PDF

4. Unraveling birth weight determinants: Integrating machine learning, spatial analysis, and district-level mapping

Author: Rubaiya, Mansur, Mohaimen, Alam, Md. Muhitul, and Rayhan, Md. Israt
Published: 2024
Full Text: View/download PDF

5. Methodological approaches for the assessment of bisphenol A exposure

Author: Costa, Sofia Almeida, Severo, Milton, Correia, Daniela, Carvalho, Catarina, Magalhães, Vânia, Vilela, Sofia, Cunha, Sara, Casal, Susana, Lopes, Carla, and Torres, Duarte
Published: 2023
Full Text: View/download PDF

6. Analysis and modeling of high-performance polymer electrolyte membrane electrolyzers by machine learning

Author: Günay, M. Erdem, Tapan, N. Alper, and Akkoç, Gizem
Published: 2022
Full Text: View/download PDF

7. A comprehensive evaluation of eco-productivity of the municipal solid waste service in Chile.

Author: Mocholi-Arce, Manuel, Sala-Garrido, Ramon, Molinos-Senante, Maria, and Maziotis, Alexandros
Abstract: Moving toward a circular economy requires improvement of the economic and environmental performance of municipalities in their provision of municipal solid waste (MSW) services. Understanding performance changes over years is fundamental to support decision-making. This study employs the Luenberger-Hicks-Moorsteen productivity indicator to evaluate eco-productivity change and its drivers in the MSW sector in Chile over the years 2015–2019. The further use of decision tree and linear regression analysis allows exploration of the interaction between operating characteristics and eco-productivity estimations. The results of the eco-productivity assessment show that, although the Chilean MSW sector was still facing a transitional period, from 2015 to 2019, eco-productivity increased 1.28% per year. Gains in eco-productivity were due to technical progress and small gains in efficiency, whereas scale effect had an adverse impact. Other factors such as waste spending per inhabitant and the amount of waste collected and recycled per inhabitant had a significant impact on the eco-productivity of Chilean municipalities. [ABSTRACT FROM AUTHOR]
Published: 2025
Full Text: View/download PDF

8. Thermal performance prediction of a V-trough solar water heater with a modified twisted tape using ANFIS, G.L.R., R.T. and SVM models of machine learning

Author: A. Saravanan, S. Rama Sree, M. Sreenivasa Reddy, Elumalai PV, Krishnasamy Karthik, Ashok Kumar Cheeli, and Nasim Hasan
Subjects: Solar water heater, Adaptive neuro-fuzzy inference system, Generalised linear regression, Regression tree, Machine learning, Medicine, Science
Abstract: Four distinct neural models were used to evaluate the efficiency of a V-trough solar water heater (VTSWH) equipped with square-cut twisted tape (SCTT) and V-cut twisted tape (VCTT) at two different twist ratios, 3 and 5. The objective of this study was the use of ANFIS (Adaptive Neuro-Fuzzy Inference System), G.L.R. (Generalised linear regression), R.T. (Regression tree), and SVM (Support Vector Machine). A total of 162 data sets were acquired for these models through a variety of trials. Outdoor experiments were done using a twist ratio of Y = 3 and Y = 5, using both SCTT and VCTT. The models included eight distinct variables: ambient temperature, water mass flow rate, water intake temperature, water exit temperature, absorber plate temperature, tube temperature, solar intensity, and twist ratio. The dependent variables in this study are the Nusselt number (Nu), friction factor (FF), and efficiency (η). 130 datasets were chosen for training purposes, while 32 were used for testing. Using the ANFIS, G.L.R., R.T., and SVM techniques, the correlation coefficient (R²) values for Nusselt number were 0.9990, 0.9961, 0.9562, and 0.9280; for friction factor, 0.9966, 0.9683, 0.9810, and 0.9560; and for efficiency, 0.9997, 0.9976, 0.9845, and 0.9614, respectively. Comparing all models shows that ANFIS is the most effective of the four strategies studied. The ANFIS model outperformed the other models regarding Nu, FF, and η, with RMSE values of 0.0805, 0.0.0004, and 0.4534. According to the above data, the VTSWH thermal performance predicted using the ANFIS approach has the highest accuracy.
Published: 2024
Full Text: View/download PDF

9. Tree-based analysis of longevity predictors and their ten-year changes: a 35-Year mortality follow-up

Author: Lily Nosraty, Jaakko Nevalainen, Jani Raitanen, and Linda Enroth
Subjects: Mortality, Relative measure of longevity, Machine learning, Regression tree, Realized probability of dying, Geriatrics, RC952-954.6
Abstract: Background: Prior studies on longevity often examine predictors in isolation and rely solely on baseline information, limiting our understanding of the most important predictors and their dynamic nature. In this study, we used an innovative regression tree model to explore the common characteristics of those who lived longer than their age and sex peers in 35-years follow-up. We identified different pathways leading to a long life, and examined how changes in characteristics over 10 years (from 1979 to 1989) affect the findings on longevity predictors. Methods: Data was obtained from the “Tampere Longitudinal Study on Ageing” (TamELSA) in Finland. Survey data was collected in 1979 from 1056 participants aged 60–89 years (49.8% men). In 1989, a second survey was conducted among 432 survivors from the 1979 cohort (40.2% men). Dates of death were provided by the Finnish Population Register until 2015. We employed an individual measure of longevity known as the realized probability of dying (RPD), which was calculated based on each participant’s age and sex, utilizing population life tables. RPD is based on a comparison of the survival time of each individual of a specific age and sex with the survival time of his/her peers in the total population. A regression tree analysis was used to examine individual-based longevity with RPD as an outcome. Results: This relative measure of longevity (RPD) provided a complex regression tree where the most important characteristics were self-rated health, years of education, history of smoking, and functional ability. We identified several pathways leading to a long life such as individuals with (1) good self-rated health (SRH), short smoking history, and higher education, (2) good SRH, short smoking history, lower education, and excellent mobility, and (3) poor SRH but able to perform less demanding functions, aged 75 or older, willing to do things, and sleeping difficulties. Changes in the characteristics over time did not change the main results. Conclusion: The simultaneous examination of a broad range of potential predictors revealed that longevity can be achieved under very different conditions and is achieved by heterogeneous groups of people.
Published: 2024
Full Text: View/download PDF

10. Classifying clinical phenotypes of functional recovery for acute traumatic spinal cord injury. An observational cohort study.

Author: Mputu Mputu, Pascal, Beauséjour, Marie, Richard-Denis, Andréane, Fallah, Nader, Noonan, Vanessa K., and Mac-Thiong, Jean-Marc
Subjects: *STATISTICAL models, *SENSES, *NEUROLOGIC examination, *HEALTH self-care, *WOUNDS & injuries, *MATHEMATICAL variables, *RANDOM forest algorithms, *RESEARCH funding, *DISABILITY evaluation, *SCIENTIFIC observation, *SEX distribution, *QUESTIONNAIRES, *MULTIPLE regression analysis, *SPINAL cord injuries, *FUNCTIONAL status, *REPORTING of diseases, *RETROSPECTIVE studies, *AGE distribution, *DISCHARGE planning, *DESCRIPTIVE statistics, *LONGITUDINAL method, *CONVALESCENCE, *EPIDEMIOLOGY, *DATA analysis software, *PHENOTYPES, *PHYSICAL mobility, *COMORBIDITY, *TIME, *HEALTH care teams, *NONPARAMETRIC statistics
Abstract: Purpose: Identify patient subgroups with different functional outcomes after SCI and study the association between functional status and initial ISNCSCI components. Methods: Using CART, we performed an observational cohort study on data from 675 patients enrolled in the Rick-Hansen Registry (RHSCIR) between 2014 and 2019. The outcome was the Spinal Cord Independence Measure (SCIM) and predictors included AIS, NLI, UEMS, LEMS, pinprick (PPSS), and light touch (LTSS) scores. A temporal validation was performed on data from 62 patients treated between 2020 and 2021 in one of the RHSCIR participating centers. Results: The final CART resulted in four subgroups with increasing totSCIM according to PPSS, LEMS, and UEMS: 1) PPSS < 27 (totSCIM = 28.4 ± 16.3); 2) PPSS ≥ 27, LEMS < 1.5, UEMS < 45 (totSCIM = 39.5 ± 19.0); 3) PPSS ≥ 27, LEMS < 1.5, UEMS ≥ 45 (totSCIM = 57.4 ± 13.8); 4) PPSS ≥ 27, LEMS ≥ 1.5 (totSCIM = 66.3 ± 21.7). The validation model performed similarly to the original model. The adjusted R-squared and F-test were respectively 0.556 and 62.2 (P-value < 0.001) in the development cohort and 0.520 and 31.9 (P-value < 0.001) in the validation cohort. Conclusion: Acknowledging the presence of four characteristic subgroups of patients with distinct phenotypes of functional recovery based on PPSS, LEMS, and UEMS could be used by clinicians early after tSCI to plan rehabilitation and establish realistic goals. An improved sensory function could be key for potentiating motor gains, as a PPSS ≥ 27 was a predictor of a good function. IMPLICATIONS FOR REHABILITATION: After a traumatic Spinal Cord Injury (SCI), early neurological examination using the International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI) is recommended to determine initial injury severity and prognosis. This study identified three initial ISNCSCI components defining four subgroups of SCI patients with different expectations in functional outcomes, namely the initial pinprick sensory score, the Lower Extremity Motor Score, and the Upper Extremity Motor Score. Clinicians could use these subgroups early after tSCI to plan rehabilitation and set realistic therapeutic goals regarding functional outcomes. In clinical practice, careful and accurate assessment of pinprick sensation early after the SCI is crucial when predicting function or stratifying patients based on the expected function. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
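
The four subgroups reported in the abstract of item 10 amount to a short decision rule. The following is a minimal Python sketch of that rule, assuming only the thresholds and mean totSCIM values quoted in the abstract; the names `scim_subgroup` and `MEAN_TOTSCIM` are hypothetical and do not come from the paper, and the sketch is an illustration, not a clinical tool.

```python
# Hypothetical helper: maps initial ISNCSCI scores to the subgroup number (1-4)
# listed in the abstract. Thresholds are copied from the abstract as quoted.
def scim_subgroup(ppss: float, lems: float, uems: float) -> int:
    if ppss < 27:
        return 1          # PPSS < 27
    if lems >= 1.5:
        return 4          # PPSS >= 27, LEMS >= 1.5
    # PPSS >= 27 and LEMS < 1.5: split on UEMS at 45
    return 2 if uems < 45 else 3


# Mean totSCIM per subgroup as reported in the abstract (standard deviations omitted).
MEAN_TOTSCIM = {1: 28.4, 2: 39.5, 3: 57.4, 4: 66.3}

if __name__ == "__main__":
    group = scim_subgroup(ppss=30, lems=0, uems=50)
    print(group, MEAN_TOTSCIM[group])
```

Writing the tree as nested conditions makes it plain that UEMS matters only for patients with PPSS ≥ 27 and LEMS < 1.5.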

11. Assessing the predictive performance of the Bagging algorithm for genomic selection.

Author: Ghafouri-Kesbi, Farhad
Subjects: *BOOTSTRAP aggregation (Algorithms), *GAMMA distributions, *SINGLE nucleotide polymorphisms, *REGRESSION trees, *RANDOM forest algorithms
Abstract: The aim of the present study was to compare the predictive performance of the Bagging algorithm with other decision tree-based methods, including Regression Tree (RT), Random Forest (RF) and Boosting in genomic selection. A genome including ten chromosomes for 1,000 individuals on which 10,000 single nucleotide polymorphisms (SNP) were evenly distributed was simulated. QTL effects were assigned to 10% of the polymorphic SNPs, with effects sampled from a gamma distribution. Predictive performance measures including accuracy of prediction, reliability and bias were used to compare the methods. Computing time and memory requirements of the studied methods were also measured. In all methods studied, the accuracy of genomic evaluation increased following increase in the heritability level from 0.10 to 0.50. While RT was the most efficient user of time and memory, it was not recommended for genomic selection due to its poor predictive performance. The obtained results showed that the predictive performance of Bagging was equal to RF and higher than RT and Boosting. However, it required significantly higher computational time and memory requirements. Considering the overall performance, Bagging was recommended for genomic selection, especially when due to the size and structure of the genomic data, the use of RF is limited. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

12. Overlapping coefficient in network-based semi-supervised clustering.

Author: Conversano, Claudio, Frigau, Luca, and Contu, Giulia
Subjects: *REGRESSION trees, *REGRESSION analysis, *MATRICES (Mathematics), *ALGORITHMS, *CLASSIFICATION
Abstract: Network-based Semi-Supervised Clustering (NeSSC) is a semi-supervised approach for clustering in the presence of an outcome variable. It uses a classification or regression model on resampled versions of the original data to produce a proximity matrix that indicates the magnitude of the similarity between pairs of observations measured with respect to the outcome. This matrix is transformed into a complex network on which a community detection algorithm is applied to search for underlying community structures which is a partition of the instances into highly homogeneous clusters to be evaluated in terms of the outcome. In this paper, we focus on the case the outcome variable to be used in NeSSC is numeric and propose an alternative selection criterion of the optimal partition based on a measure of overlapping between density curves as well as a penalization criterion which takes accounts for the number of clusters in a candidate partition. Next, we consider the performance of the proposed method for some artificial datasets and for 20 different real datasets and compare NeSSC with the other three popular methods of semi-supervised clustering with a numeric outcome. Results show that NeSSC with the overlapping criterion works particularly well when a reduced number of clusters are scattered localized. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

13. Tree-based analysis of longevity predictors and their ten-year changes: a 35-Year mortality follow-up.

Author: Nosraty, Lily, Nevalainen, Jaakko, Raitanen, Jani, and Enroth, Linda
Subjects: REGRESSION trees, LONGEVITY, REGRESSION analysis, LIFE tables, MACHINE learning
Abstract: Background: Prior studies on longevity often examine predictors in isolation and rely solely on baseline information, limiting our understanding of the most important predictors and their dynamic nature. In this study, we used an innovative regression tree model to explore the common characteristics of those who lived longer than their age and sex peers in 35-years follow-up. We identified different pathways leading to a long life, and examined how changes in characteristics over 10 years (from 1979 to 1989) affect the findings on longevity predictors. Methods: Data was obtained from the "Tampere Longitudinal Study on Ageing" (TamELSA) in Finland. Survey data was collected in 1979 from 1056 participants aged 60–89 years (49.8% men). In 1989, a second survey was conducted among 432 survivors from the 1979 cohort (40.2% men). Dates of death were provided by the Finnish Population Register until 2015. We employed an individual measure of longevity known as the realized probability of dying (RPD), which was calculated based on each participant's age and sex, utilizing population life tables. RPD is based on a comparison of the survival time of each individual of a specific age and sex with the survival time of his/her peers in the total population. A regression tree analysis was used to examine individual-based longevity with RPD as an outcome. Results: This relative measure of longevity (RPD) provided a complex regression tree where the most important characteristics were self-rated health, years of education, history of smoking, and functional ability. We identified several pathways leading to a long life such as individuals with (1) good self-rated health (SRH), short smoking history, and higher education, (2) good SRH, short smoking history, lower education, and excellent mobility, and (3) poor SRH but able to perform less demanding functions, aged 75 or older, willing to do things, and sleeping difficulties. Changes in the characteristics over time did not change the main results. Conclusion: The simultaneous examination of a broad range of potential predictors revealed that longevity can be achieved under very different conditions and is achieved by heterogeneous groups of people. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

14. Calculation of the mechanical properties of high‐performance concrete employing hybrid and ensemble‐hybrid techniques.

Author: Zhang, Leilei and Zhao, Yuwei
Subjects: *OPTIMIZATION algorithms, *METAHEURISTIC algorithms, *STRUCTURAL engineering, *DATABASES, *REGRESSION trees
Abstract: This study aims to execute machine learning methods to predict the mechanical properties containing TS and CS of HPC. They are essential parameters for the durability, workability, and efficiency of concrete structures in civil engineering. In this regard, obtaining the estimation of the mechanical properties of HPC is complex energy and time‐consuming. Due to this, an observed database was compiled, including 168 datasets for CS and 120 for TS. This database trained and validated two machine learning models: SVR and RT. The models combine the prediction outputs from the meta‐heuristic algorithms to build hybrid and ensemble‐hybrid models, which include dwarf mongoose optimization, PPSO, and moth flame optimization. According to the observed outputs, the ensemble models have great potential to be a recourse to deal with the overfitting problem of civil engineering, thus leading to the development of more supportable and less polluting concrete structures. This research significantly improves the efficiency and accuracy of predicting vital mechanical properties in high‐performance concrete by integrating machine learning and metaheuristic algorithms, offering promising avenues for enhanced concrete structure design and development. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

15. Machine learning prediction of methane, ethane, and propane solubility in pure water and electrolyte solutions: Implications for stray gas migration modeling.

Author: Kooti, Ghazal, Taherdangkoo, Reza, Chen, Chaofan, Sergeev, Nikita, Doulati Ardejani, Faramarz, Meng, Tao, and Butscher, Christoph
Subjects: *MACHINE learning, *OPTIMIZATION algorithms, *REGRESSION trees, *GAS migration, *HYDRAULIC fracturing, *ELECTROLYTE solutions, *SHALE gas
Abstract: Hydraulic fracturing is an effective technology for hydrocarbon extraction from unconventional shale and tight gas reservoirs. A potential risk of hydraulic fracturing is the upward migration of stray gas from the deep subsurface to shallow aquifers. The stray gas can dissolve in groundwater leading to chemical and biological reactions, which could negatively affect groundwater quality and contribute to atmospheric emissions. The knowledge of light hydrocarbon solubility in the aqueous environment is essential for the numerical modelling of flow and transport in the subsurface. Herein, we compiled a database containing 2129 experimental data of methane, ethane, and propane solubility in pure water and various electrolyte solutions over wide ranges of operating temperature and pressure. Two machine learning algorithms, namely regression tree (RT) and boosted regression tree (BRT) tuned with a Bayesian optimization algorithm (BO) were employed to determine the solubility of gases. The predictions were compared with the experimental data as well as four well-established thermodynamic models. Our analysis shows that the BRT-BO is sufficiently accurate, and the predicted values agree well with those obtained from the thermodynamic models. The coefficient of determination (R²) between experimental and predicted values is 0.99 and the mean squared error (MSE) is 9.97 × 10⁻⁸. The leverage statistical approach further confirmed the validity of the model developed. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

16. Estimation of the time of concentration of small watersheds located in Northeastern North America.

Author: Bolduc, Samuel, Mailhot, Alain, and Talbot, Guillaume
Subjects: *REGRESSION trees, *SQUARE root, *HYDROLOGY, *RIVER channels, *LAKES
Abstract: The time of concentration is an important concept in hydrology. It provides a characteristic hydrological response time (CHRT) useful in many applications. Estimation of the time of concentration is challenging because small watersheds (<100 km²) with sub-daily flow and precipitation records are uncommon. Many practitioners therefore use empirical equations developed from watersheds exposed to different climates and with different attributes. The main objective of this study is to develop an approach to estimate the CHRT from physiographic characteristics for small watersheds located in Ontario, Québec and the northeastern USA. Regression trees are used to identify the physiographic characteristics associated with CHRT. The fraction of lakes and wetlands was identified as the most significant attribute related to CHRT, followed by the ratio between the main watercourse length and the square root of the main watercourse slope. Uncertainties on estimated CHRT values based on regression tree are also provided. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

17. Relationships between agronomic traits and characterization of the white oat ideotype for cultivation with and without chemical fertilization

Author: Murilo Vieira Loro, Ivan Carvalho, Genesio Luiz Meggiolaro Junior, Leonardo Cesar Pradebon, Jaqueline Piesanti Sangiovo, João Pedro Dalla Roza, and Willyan Júnior Adorian Bandeira
Subjects: avena sativa, correlation, path analysis, regression tree, kohonen map, Special aspects of education, LC8-6691, Technology
Abstract: This paper aimed to characterize and verify whether the linear relationships between agronomic traits of white oat are different between crops with and without chemical fertilization; and identify the agronomic ideotype that enhances the agronomic performance of white oats. Two uniformity tests were carried out with and without chemical fertilization in the 2020 harvest. In each trial, on 285 plants, agronomic traits were measured. Pearson's linear correlation coefficients and the direct and indirect effects of the trial analysis were calculated. The regression tree algorithm and Kohonen map neural network were used to identify the agronomic ideotype. The linear relationships between agronomic characters of white oat are similar between crops with and without chemical fertilization. White oat genotypes with greater panicle grain weight can be selected indirectly by panicle weight, regardless of cultivation with or without fertilization. White oat genotypes measuring 114.57 cm in height, 97.41 cm in panicle insertion, 18.11 cm in panicle length, 1.31 g in panicle weight and 27.92 grains in the panicle characterize the agronomic ideotype that maximizes panicle grain weight.
Published: 2024
Full Text: View/download PDF

18. Prediction of body weight of mixed breeds of pigs in Nigeria through morpho-biometric traits using classification and regression tree models.

Author: Mallam, I., Yakubu, A., and Achi, N. P.
Subjects: *WEIGHT of swine, *BODY weight, *MARKET prices, *ANIMAL industry
Abstract: The study was conducted to predict the body weight of mixed breeds of pigs in Nigeria through morpho-biometric traits (body length, chest girth, height at withers, ear length, head length, foreleg length, and hind leg length) using classification and regression tree models. The data were produced using 500 randomly selected mixed breeds of pigs from various farms in five Local Government Areas of Kaduna State, North West Nigeria. The collected data were analysed using the Statistical Package for Social Sciences (SPSS, 2016). Body weight correlated well with morphometric characteristics except with foreleg length, which had a low correlation and no significant (P>0.05) difference. Two body dimensions were shown to be more effective in predicting the body weight of the mixed-breeds based on the significance of the independent variables: chest girth and body length. The largest dividing variable was determined to be chest girth, which explained roughly 88.60 % of the difference in body weight. The decision tree model revealed that pigs with chest girth or chest circumference greater than 76.00 cm are expected to have a higher body weight, which livestock producers and researchers could use to determine the feed amount, drug dose, and market price of an animal, as well as the management, selection, and genetic improvement of mixed breeds of pigs in Nigeria. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

19. A new approach for classification of stretch-shortening cycle: Beyond 250 ms of ground contact time.

Author: Ünver, Evrim, Konşuk Ünlü, Hande, Yıldız, Adalet E., and Cinemre, Şükrü Alpan
Subjects: *SKELETAL muscle physiology, *BIOMECHANICS, *PLYOMETRICS, *RESEARCH funding, *ACHILLES tendon, *MUSCLE strength, *JUMPING, *COMPARATIVE studies, *TIME, *REGRESSION analysis
Abstract: The stretch-shortening cycle (SSC) has been classified into fast (<250 ms) and slow (>250 ms) groups based on ground contact time (GCT) threshold values. However, there are gaps in the literature on how the 250 ms threshold value was found and which variables affect it. The purpose of this study is to validate the 250 ms threshold by investigating the factors affecting this threshold. For this purpose, force–time variables during a drop jump (DJ) with a force plate and achilles tendon (AT) muscle-tendon unit mechanical properties using shear-wave elastography in 46 recreationally active men were analysed. A regression tree analysis was conducted using R studio to classify GCT with correlated variables (p < 0.05). The new GCT threshold values (GCT < 188 ms, 188 ≤ GCT < 222 ms and GCT ≥ 222 ms) were found according to the lowest root mean square error of approximation value (0.1985) at reactive strength index. Comparisons of GCT groups showed significant differences in force, time, power variables and AT length (p < 0.05). AT length is the main variable differentiating GCT groups: Short AT results in a short GCT and long AT results in a long GCT. This study reveals that SSC can be classified into three groups using new GCT threshold values, offering a new perspective for SSC assessment. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
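
The three ground-contact-time bands reported in the abstract of item 19 can be expressed as a simple threshold lookup. This is a minimal Python sketch assuming only the 188 ms and 222 ms cut points quoted in the abstract; the function name `gct_group` and the numeric group labels 1 to 3 are hypothetical.

```python
from bisect import bisect_right

# Cut points in milliseconds, as quoted in the abstract.
GCT_THRESHOLDS_MS = [188, 222]


def gct_group(gct_ms: float) -> int:
    """Return 1 for GCT < 188, 2 for 188 <= GCT < 222, 3 for GCT >= 222."""
    # bisect_right counts how many thresholds are <= gct_ms, so a value
    # exactly on a cut point falls into the higher band.
    return bisect_right(GCT_THRESHOLDS_MS, gct_ms) + 1


if __name__ == "__main__":
    for value in (150, 188, 221.9, 222, 250):
        print(value, gct_group(value))
```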

20. Random forests regression for soft interval data.

Author: Gaona-Partida, Paul, Yeh, Chih-Ching, Sun, Yan, and Cutler, Adele
Abstract: Analyzing soft interval data for uncertainty quantification has attracted much attention recently. Within this context, regression methods for interval data have been extensively studied. As most existing works focus on linear models, it is important to note that many problems in practice are nonlinear in nature and the development of nonlinear regression tools for interval data is crucial. This paper proposes an interval-valued random forests model that defines the splitting criterion of variance reduction based on an L2 type metric in the space of compact intervals. The model simultaneously considers the centers and ranges of the interval data as well as their possible interactions. Unlike most linear models that require additional constraints to ensure mathematical coherences, the proposed random forests model estimates the regression function in a nonparametric way, and so the predicted interval length is naturally nonnegative without any constraints. Simulation studies show that the new method outperforms typical existing regression methods for various linear, semi-linear, and nonlinear data archetypes and under different error measures. To demonstrate the applicability, a real data example is presented where the price range data of the Dow Jones Industrial Average index and its component stocks are analyzed. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

21. The historical lepto-variance of the US stock returns

Author: Vassilis Polimenis
Subjects: total variance, regression tree, lepto-variance, macro-variance, lepto-ratio, lepto-regression, Finance, HG1-9999, Statistics, HA1-4737
Abstract: Regression trees (RT) involve sorting samples based on a particular feature and identifying the splitting point that yields the highest drop in variance from a parent node to its children. The optimal factor for reducing mean squared error (MSE) is the target variable itself. Consequently, employing the target variable as the basis for splitting sets an upper limit on the reduction of MSE and, equivalently, a lower limit on the residual MSE. Building upon this observation, we define lepto-regression as the process of constructing an RT of a target feature on itself. Lepto-variance pertains to the portion of variance that cannot be mitigated by any regression tree, providing a measure of inherent variance at a specific tree depth. This concept is valuable as it offers insights into the intrinsic structure of the dataset by establishing an upper boundary on the "resolving power" of RTs for a sample. The maximal variance that can be accounted for by RTs with depths up to k is termed the sample k-bit macro-variance. At each depth, the overall variance within a dataset is thus broken into lepto- and macro-variance. We perform 1- and 2-bit lepto-variance analysis for the entire US stock universe for a large historical period since 1926. We find that the optimal 1-bit split is a 30–70 balance. The two children subsets are centered roughly at −1% and 0.5%. The 1-bit macro-variance is almost 42% of the total US stock variability. The other 58% is structure beyond the resolving power of a 1-bit RT. The 2-bit lepto-variance equals 26.3% of the total, with 42% and 47% of the 1-bit lepto-variance of the left and right subtree, respectively.
Published: 2024
Full Text: View/download PDF
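
The abstract of item 21 describes lepto-regression as fitting a regression tree of the target on itself, so that the best depth-1 split gives the smallest residual variance any depth-1 tree could reach. The following is a minimal Python sketch of that 1-bit case, assuming population variance (division by n) and a single threshold split on the sorted target; the function name `one_bit_lepto_variance` and the returned tuple layout are hypothetical, and the paper's exact conventions may differ.

```python
def one_bit_lepto_variance(y):
    """Return (total_var, lepto_var, macro_var, left_fraction) for a depth-1 self-split."""
    ys = sorted(y)
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(ys)
    total_sq = sum(v * v for v in ys)
    total_var = total_sq / n - (total / n) ** 2

    best_sse, best_i = None, n
    left_sum = left_sq = 0.0
    for i in range(1, n):                 # left child = ys[:i], right child = ys[i:]
        v = ys[i - 1]
        left_sum += v
        left_sq += v * v
        if ys[i - 1] == ys[i]:
            continue                      # a threshold cannot separate tied values
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # within-node sum of squared errors of both children
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if best_sse is None or sse < best_sse:
            best_sse, best_i = sse, i

    # If all values are identical no split exists and nothing can be resolved.
    lepto_var = total_var if best_sse is None else best_sse / n
    return total_var, lepto_var, total_var - lepto_var, best_i / n


if __name__ == "__main__":
    print(one_bit_lepto_variance([1, 2, 3, 10]))
```

In this sketch the macro-variance is the share of total variance removed by the best split and the lepto-variance is what remains, so the two always add up to the total variance.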

22. Effect of environmental factors on conjugative transfer of antibiotic resistance genes in aquatic settings.

Author: Dadeh Amirfard, Katayoun, Moriyama, Momoko, Suzuki, Satoru, and Sano, Daisuke
Subjects: *HORIZONTAL gene transfer, *DRUG resistance in bacteria, *BACTERIAL transformation, *BACTERIAL conjugation, *REGRESSION trees
Abstract: Antimicrobial-resistance genes (ARGs) are spread among bacteria by horizontal gene transfer; however, the effect of environmental factors on the dynamics of the ARG in water environments has not been very well understood. In this systematic review, we employed the regression tree algorithm to identify the environmental factors that facilitate/inhibit the transfer of ARGs via conjugation in planktonic/biofilm-formed bacterial cells based on the results of past relevant research. Escherichia coli strains were the most studied genus for conjugation experiments as donor/recipient in the intra-genera category. Conversely, Pseudomonas spp., Acinetobacter spp., and Salmonella spp. were studied primarily as recipients across inter-genera bacteria. The conjugation efficiency (ce) was found to be highly dependent on the incubation period. Some antibiotics, such as nitrofurantoin (at ≥0.2 µg ml⁻¹) and kanamycin (at ≥9.5 mg l⁻¹) as well as metallic compounds like mercury (II) chloride (HgCl₂, ≥3 µmol l⁻¹), and vanadium (III) chloride (VCl₃, ≥50 µmol l⁻¹) had enhancing effect on conjugation. The highest ce value (−0.90 log₁₀) was achieved at 15°C–19°C, with linoleic acid concentrations <8 mg l⁻¹, a recognized conjugation inhibitor. Identifying critical environmental factors affecting ARG dissemination in aquatic environments will accelerate strategies to control their proliferation and combat antibiotic resistance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

23. Performance Comparison of Machine Learning Models for Concrete Compressive Strength Prediction.

Author: Sah, Amit Kumar and Hong, Yao-Ming
Subjects: *ARTIFICIAL neural networks, *COMPRESSIVE strength, *STANDARD deviations, *CONCRETE testing, *MACHINE performance, *REGRESSION trees, *MACHINE learning
Abstract: This study explores the prediction of concrete compressive strength using machine learning models, aiming to overcome the time-consuming and complex nature of conventional methods. Four models—an artificial neural network (ANN), a multiple linear regression, a support vector machine, and a regression tree—are employed and compared for performance, using evaluation metrics such as mean absolute deviation, root mean square error, coefficient of correlation, and mean absolute percentage error. After preprocessing 1030 samples, the dataset is split into two subsets: 70% for training and 30% for testing. The ANN model, further divided into training, validation (15%), and testing (15%), outperforms others in accuracy and efficiency. This outcome streamlines compressive strength determination in the construction industry, saving time and simplifying the process. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
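
Note on the abstract above: the four evaluation metrics it names can be written out as follows. This is a minimal sketch under stated assumptions, not the authors' code. The function names are hypothetical, "mean absolute deviation" is read here as the mean absolute difference between observed and predicted values, and MAPE is reported in percent.

```python
import math


def mean_absolute_deviation(actual, predicted):
    """Mean absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the mean squared prediction error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mean_absolute_percentage_error(actual, predicted):
    """Mean of |error| / |actual|, in percent; observed values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def correlation_coefficient(actual, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)
```

With observed strengths `[100, 200]` and predictions `[110, 180]`, the mean absolute deviation is 15 and the MAPE is 10%.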

24. PREDICTING BODY WEIGHT OF THREE CHICKEN GENOTYPES FROM LINEAR BODY MEASUREMENTS USING MARS AND CART DATA MINING ALGORITHMS.

Author: ASSAN, N., MPOFU, M., MUSASIRA, M., MOKOENA, K., TYASI, T. L., and MWAREYA, N.
Subjects: BODY weight, STANDARD deviations, LENGTH measurement, DATA mining, CHICKENS
Abstract: The aim of the current study was to predict the body weight from linear body measurements of Astrolope, Boschveld and indigenous Sacco genotypes using the Classification and Regression Tree (CART) and Multivariate Adaptive Regression Spline (MARS) algorithms. A total of 389 body weight (BW) records, including five continuous predictors, namely neck length (NL), body circumference (BC), shank length (SL), body length (BL) and shank circumference (SC), were used. The best model was selected based on goodness-of-fit criteria, namely standard deviation ratio (SDR), root mean square error (RMSE), coefficient of variation (CV), adjusted coefficient of determination (ARsq), coefficient of determination (Rsq) and Pearson's correlation coefficient (PC). The Rsq (%) values ranged from 59 (MARS) to 69 (CART). The lowest SDR was recorded by CART (0.56) and the highest by MARS (0.70). CART was selected as the best algorithm, with sex, genotype, SC, SL, BL, NL, and BC as influential predictors of BW. The heaviest body weight on females of genotype (Boschveld, Sacco) was recorded when BL was less than 43 cm and BL higher than 47 cm. The goodness-of-fit criteria suggest that the CART model outperformed the MARS model in predicting the body weight of the three genotypes. The findings will assist farmers in the prediction of body weight and selection of heavier chickens. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

25. Short-Term Load Forecasting Based on Optimized Random Forest and Optimal Feature Selection.

Author: Magalhães, Bianca, Bento, Pedro, Pombo, José, Calado, Maria do Rosário, and Mariano, Sílvio
Subjects: *RANDOM forest algorithms, *FEATURE selection, *REGRESSION trees, *FORECASTING, *ELECTRIC power consumption, *COST control
Abstract: Short-term load forecasting (STLF) plays a vital role in ensuring the safe, efficient, and economical operation of power systems. Accurate load forecasting provides numerous benefits for power suppliers, such as cost reduction, increased reliability, and informed decision-making. However, STLF is a complex task due to various factors, including non-linear trends, multiple seasonality, variable variance, and significant random interruptions in electricity demand time series. To address these challenges, advanced techniques and models are required. This study focuses on the development of an efficient short-term power load forecasting model using the random forest (RF) algorithm. RF combines regression trees through bagging and random subspace techniques to improve prediction accuracy and reduce model variability. The algorithm constructs a forest of trees using bootstrap samples and selects random feature subsets at each node to enhance diversity. Hyperparameters such as the number of trees, minimum sample leaf size, and maximum features for each split are tuned to optimize forecasting results. The proposed model was tested using historical hourly load data from four transformer substations supplying different campus areas of the University of Beira Interior, Portugal. The training data were from January 2018 to December 2021, while the data from 2022 were used for testing. The results demonstrate the effectiveness of the RF model in forecasting short-term hourly and one day ahead load and its potential to enhance decision-making processes in smart grid operations. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

26. Multiscale computation of different plan curvature forms to enhance the prediction of soil properties in a low-relief watershed.

Author: Khanifar, Javad and Khademalrasoul, Ataallah
Subjects: *CART algorithms, *DIGITAL soil mapping, *CURVATURE, *SOIL moisture, *DIGITAL elevation models
Abstract: This study focuses on the multiscale calculation of different plan curvature forms to enhance the modeling of soil penetration resistance and gravimetric soil water content utilizing the classification and regression trees algorithm in a low-relief watershed. To that end, three forms of plan curvature were derived using the Wood method from a two-meter digital elevation model on six neighborhood sizes. The results showed that the neighborhood size influenced the plan curvature values and there was little difference between the utilization of three forms of plan curvature in the landform determination. The modeling results indicated that the three forms of plan curvature on most neighborhood scales have different contributions to each other in modeling the spatial variability of each soil property. The neighborhood scale was a critical factor in soil modeling because it controls the smoothing rate of plan curvature. The overall results suggest that soil models with poor performance could be constructed if the plan curvature forms and the neighborhood size are not considered in the geomorphometric analysis. Therefore, it is recommended to use the procedure implemented in this study for digital soil mapping in various regions. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

27. Taxi-out time prediction at Mohammed V Casablanca Airport.

Author: Zbakh, Douae, El Gonnouni, Amina, Benkacem, Abderrahmane, Kasttet, Mohammed Said, and Lyhyaoui, Abdelouahid
Subjects: AIR travel, TRAFFIC estimation, STANDARD deviations, SUPPORT vector machines, MACHINE learning, REGRESSION trees, AIRPORTS
Abstract: Airports are vital for global connectivity. However, the increasing volume of air travel has presented significant challenges in airport management. Accurate predictions of taxi-out times (TXOT) offer potential to enhance airport performance, minimize delays, optimize airline schedules, and enhance customer satisfaction. This paper focuses on developing a machine learning model to forecast taxi-out times at Mohammed V Airport. Historical taxiing data from various airports will be analyzed to predict taxi-out times based on diverse runway-stand combinations and congestion levels. We used neural network (NN), support vector machine (SVM), and regression tree (RT) models to create a real-time model that forecasts TXOT and congestion levels for different runway-stand combinations. The results showed that the NN model outperformed the other forecasting models when their performances were compared using the mean absolute percentage error and root mean square error as accuracy measures. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

28. A Novel Machine Learning-Based Approach for Fault Detection and Location in Low-Voltage DC Microgrids.

Author: Salehimehr, Sirus, Miraftabzadeh, Seyed Mahdi, and Brenna, Morris
Abstract: DC microgrids have gained significant attention in recent years due to their potential to enhance energy efficiency, integrate renewable energy sources, and improve the resilience of power distribution systems. However, the reliable operation of DC microgrids relies on the early detection and location of faults to ensure an uninterrupted power supply. This paper aims to develop fast and reliable fault detection and location mechanisms for DC microgrids, thereby enhancing operational efficiency, minimizing environmental impact, and contributing to resource conservation and sustainability goals. The fault detection method is based on compressed sensing (CS) and Regression Tree (RT) techniques. Besides, an accurate fault location method using the feature matrix and long short-term memory (LSTM) model combination has been provided. To implement the proposed fault detection and location method, a DC microgrid equipped with photovoltaic (PV) panels, the vehicle-to-grid (V2G) charging station, and a hybrid energy storage system (ESS) are used. The simulation results represent the proposed methods' superiority over the recent studies. The fault occurrence in the studied DC microgrid is detected in 1 ms, and the proposed fault location method locates the fault with an accuracy of more than 93%. The presented techniques enhance DC microgrid reliability while conserving renewable resources, vital to promoting a greener and more sustainable power grid. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

29. 2D score-based estimation of heterogeneous treatment effects

Author: Ye Steven Siwei, Chen Yanzhen, and Padilla Oscar Hernan Madrid
Subjects: observational data, subgroup treatment effects, regression tree, matching, 62d20, 62g05, Mathematics, QA1-939, Probabilities. Mathematical statistics, QA273-280
Abstract: Statisticians show growing interest in estimating and analyzing heterogeneity in causal effects in observational studies. However, there usually exists a trade-off between accuracy and interpretability in developing a desirable estimator for treatment effects, especially when there are a large number of features in estimation. To address this issue, we propose a score-based framework for estimating the conditional average treatment effect (CATE) function in this article. The framework integrates two components: (i) leverage the joint use of propensity and prognostic scores in a matching algorithm to obtain a proxy of the heterogeneous treatment effects for each observation and (ii) utilize nonparametric regression trees to construct an estimator for the CATE function conditioning on the two scores. The method naturally stratifies treatment effects into subgroups over a 2d grid whose axes are the propensity and prognostic scores. We conduct benchmark experiments on multiple simulated data sets and demonstrate clear advantages of the proposed estimator over state-of-the-art methods. We also evaluate empirical performance in real-life settings, using two observational data sets from a clinical trial and a complex social survey, and interpret policy implications following the numerical results.
Published: 2023
Full Text: View/download PDF

30. Patterns and drivers of amphibian and reptile road mortality vary among species and across scales: Evidence from eastern Ontario, Canada

Author: Joshua D. Jones, Ori Urquhart, Evelyn Garrah, Ewen Eberhardt, and Ryan K. Danby
Subjects: Road ecology, Wildlife-vehicle collisions, Roadkill, Conservation biology, Herpetofauna, Regression tree, Ecology, QH540-549.5
Abstract: The mortality of wildlife on roadways is a major conservation concern worldwide. Amphibians and reptiles are especially vulnerable to vehicular collisions, and this is of particular concern in the Frontenac Arch Biosphere Reserve (Ontario, Canada) where several species are near their geographic limits of distribution and designated as species-at-risk. We completed regular surveys (n = 270) of two major highways in the Reserve, each slightly less than 40 km in length. All observations of wildlife-vehicle collisions were documented for two years on each road, including 18,278 frogs, turtles, and snakes. We used kernel density estimation to map relative magnitude of this mortality and built a suite of regression tree models to assess the influence of landcover and other habitat factors on roadkill at two scales (1 ha and 20 ha). Sample size was large enough to conduct species-level analyses for Chrysemys picta marginata (midland painted turtle) and Nerodia sipedon (northern watersnake). Spatial clustering of roadkill was evident on both roads and for all taxa. However, the extent of clustering varied between the two roadways due to differences in landcover pattern and clustering was more discrete for frogs and turtles than for snakes. For frogs, turtles, and northern watersnakes we found that elevated levels of mortality were positively associated with the amount of wetland and open water in adjacent areas as well as the proximity of water features. However, mortality locations for other species of snakes were more closely associated with upland habitat types. While some generalities emerge from our study, the variation also suggests that caution be exercised when attempting to extend results to different taxa and roadways, especially since these results may vary with scale. Nonetheless, scale-related differences can be informative for identifying the location of roadkill mitigation efforts and we illustrate how such an approach could be implemented for snakes that exhibit less discrete clustering of mortality.
Published: 2024
Full Text: View/download PDF

31. A regression tree method for longitudinal and clustered data with multivariate responses.

Author: Jing, Wenbo and Simonoff, Jeffrey S.
Subjects: *REGRESSION trees, *PANEL analysis, *LONGITUDINAL method, *MULTICASTING (Computer networks)
Abstract: In this paper, we propose a tree-based method called Multivariate RE-EM tree, which combines the regression tree and the linear mixed effects model for modeling multivariate response longitudinal or clustered data. The Multivariate RE-EM tree method estimates a population-level single tree structure that is driven by the multiple responses simultaneously and object-level random effects for each response variable, where correlation between the response variables and between the associated random effects are each allowed. Through simulation studies, we verify the advantage of the Multivariate RE-EM tree over the use of multiple univariate RE-EM trees and the Multivariate Regression Tree. We apply the Multivariate RE-EM tree to analyze a real data set that contains multidimensional nonfinancial characteristics of poverty of different countries as responses, and various potential causes of poverty as predictors. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

32. Explaining Central Government's Tax Revenue Categories through the Bradley-Terry Regression Trunk Model.

Author: Baldassarre, Alessio, D'Ambrosio, Antonio, and Conversano, Claudio
Subjects: *INTERNAL revenue, *INCOME tax, *PUBLIC finance, *REGRESSION trees, *LOG-linear models
Abstract: The Bradley-Terry Regression Trunk (BTRT) model combines the log-linear Bradley-Terry model, including subject-specific covariates, with a particular tree-based model, the so-called regression trunk. It aims to consider simultaneously the main effects and the interaction effects of covariates on data expressed as paired comparisons. We apply this model to financial data expressed as rankings and then transformed into paired comparisons. Tax revenues differentiated by category represent the statistical units of the analysis (i.e., taxes on income, social security contributions, taxes on property, and taxes on goods and services). We combine data from OECD, World Bank, and IMF databases for the year 2018 to investigate the effect size of socio-economic covariates and their interaction on the composition of tax revenues for a set of 100 countries worldwide. We also present a comparison with a more established method proposed in tax determinants literature and with two alternative models used for matched pairs. Finally, we discuss the implications of reported results for stakeholders and policymakers. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

33. The Bradley–Terry Regression Trunk approach for Modeling Preference Data with Small Trees.

Author: Baldassarre, Alessio, Dusseldorp, Elise, D'Ambrosio, Antonio, Rooij, Mark de, and Conversano, Claudio
Subjects: LOG-linear models, DATA modeling, JUDGES, REGRESSION analysis, TREES
Abstract: This paper introduces the Bradley–Terry regression trunk model, a novel probabilistic approach for the analysis of preference data expressed through paired comparison rankings. In some cases, it may be reasonable to assume that the preferences expressed by individuals depend on their characteristics. Within the framework of tree-based partitioning, we specify a tree-based model estimating the joint effects of subject-specific covariates over and above their main effects. We, therefore, combine a tree-based model and the log-linear Bradley-Terry model using the outcome of the comparisons as response variable. The proposed model provides a solution to discover interaction effects when no a-priori hypotheses are available. It produces a small tree, called trunk, that represents a fair compromise between a simple interpretation of the interaction effects and an easy to read partition of judges based on their characteristics and the preferences they have expressed. We present an application on a real dataset following two different approaches, and a simulation study to test the model's performance. Simulations showed that the quality of the model performance increases when the number of rankings and objects increases. In addition, the performance is considerably amplified when the judges' characteristics have a high impact on their choices. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

34. Neural Networks with Dependent Inputs.

Author: Boskabadi, Mostafa and Doostparast, Mahdi
Subjects: REGRESSION trees, NEUROPLASTICITY, WEIGHT training, MONTE Carlo method, DATA science
Abstract: Neural networks and decision tree algorithms are essential tools in machine learning and data science. They deal with patterns among inputs and provide predictions for targets. In this article, we use a hybrid approach in regression trees by incorporating possible dependencies among inputs and apply neural networks in terminal nodes. The proposed approach implements neural networks on the basis of dependency structures among inputs. We allow that the weights in training neural networks differ in various terminal nodes. In both regression and classification problems, the performance of the new approach is assessed by analyzing various real datasets and by conducting a Monte–Carlo simulation study. We show that the proposed approach provides more flexibility for neural networks when associations among inputs are observed. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

35. Decision Tree-Supported Analysis of Gallium Arsenide Growth Using the LEC Method.

Author: Tang, Xia, Chappa, Gagan Kumar, Vieira, Lucas, Holena, Martin, and Dropka, Natasha
Subjects: DECISION making, GALLIUM arsenide, COMPUTATIONAL fluid dynamics, DECISION trees, CRYSTAL growth
Abstract: In this study, an axisymmetric Czochralski furnace model for the LEC growth of gallium arsenide is presented. We produced 88 datasets through computational fluid dynamics simulations. Among the many parameters that affect crystal growth, a total of 13 input parameters were selected, including the geometry and material parameters of the hot zone (crucible, heaters, radiation shield, and crystal), as well as the process parameters (such as pulling and rotation rates, heating power, etc.). Voronkov criteria (v/Gn), interface deflection, and the average interface temperature gradient were selected as the output parameters. We carried out a correlation analysis between the variables and used decision trees to study the impact of the 13 input variables on the output variables. The results indicated that in the growth of gallium arsenide, the main factor affecting interface deflection and the average interface thermal gradients is the crucible rotation rate. For v/Gn, it is the pulling rate. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

36. Comparative analysis of regression algorithms for the prediction of NavIC differential corrections.

Author: Karthan, Madhu Krishna and Perumalla, Naveen Kumar
Subjects: *REGRESSION analysis, *ARTIFICIAL satellites in navigation, *COMPARATIVE studies, *FORECASTING, *CITIES & towns, *REGRESSION trees
Abstract: The Indian Regional Navigation Satellite System (IRNSS), or Navigation with Indian Constellation (NavIC), provides positioning, navigation and timing information services to various users in the Indian region. Standalone NavIC may not meet the position accuracy required for certain applications such as civil aviation. Differential NavIC, which makes use of differential corrections transmitted from a reference station, is used to improve the position accuracy of the rover receiver. However, if the satellite signals are temporarily lost due to an abruptly changing atmosphere or satellite health issues, or if the satellite signals are attenuated by city infrastructure in urban areas or by tree canopies, the accuracy of NavIC will be degraded. This article compares regression tree and bagging tree based differential corrections prediction algorithms with the actual differential corrections, by considering the NavIC satellite signal strength (C/No) and elevation angle (El), to improve the NavIC positioning accuracy. The improvement in the position accuracy is obtained by utilizing predicted differential corrections. The position accuracy of the rover using actual differential corrections (2DRMS – 3.09 m), regression tree predicted differential corrections (2DRMS – 5.96 m) and bagged tree predicted differential corrections (2DRMS – 3.06 m) are compared. Here, the rover accuracy using actual differential corrections and bagged tree predicted differential corrections are approximately equal. So, the position accuracy using bagged tree predicted differential corrections is higher than that using regression tree predicted differential corrections. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

37. Predicting and Mapping Dominant Height of Oriental Beech Stands Using Environmental Variables in Sinop, Northern Turkey.

Author: Yener, Ismet and Guvendi, Engin
Abstract: The dominant height of forest stands (SDH) is an essential indicator of site productivity in operational forest management. It refers to the capacity of a particular site to support stand growth. Sites with taller dominant trees are typically more productive and may be more suitable for certain management practices. The present study investigated the relationship between the dominant height of oriental beech stands and numerous environmental variables, including physiographic, climatic, and edaphic attributes. We developed models and generated maps of SDH using multilinear regression (MLR) and regression tree (RT) techniques based on environmental variables. With this aim, the total height, diameter at breast height, and age of sample trees were measured on 222 sample plots. Additionally, topsoil samples (0–20 cm) were collected from each plot to analyze the physical and chemical soil properties. The statistical results showed that latitude, elevation, mean annual maximum temperature, and several soil attributes (i.e., bulk density, field capacity, organic carbon, and pH) were significantly correlated with the SDH. The RT model outperformed the MLR model, explaining 57% of the variation in the SDH with an RMSE of 2.37 m. The maps generated by both models clearly indicated an increasing trend in the SDH from north to south, suggesting that elevation above sea level is a driving factor shaping forest canopy height. The assessments, models, and maps provided by this study can be used by forest planners and land managers, as there is no reliable data on site productivity in the studied region. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

38. Corruption, quality of institutions and growth

Author: Beyaert, Arielle, García-Solanes, José, and Lopez-Gomez, Laura
Published: 2023
Full Text: View/download PDF

39. Segmenting tourists by length of stay using regression tree models

Author: Jackman, Mahalia and Naitram, Simon
Published: 2023
Full Text: View/download PDF

40. Modelling the Symphyotrichum lanceolatum invasion in Slovakia, Central Europe

Author: Michalová, Martina, Hrabovský, Michal, Kubalová, Silvia, and Miháliková, Tatiana
Published: 2024
Full Text: View/download PDF

41. Corruption, quality of institutions and growth

Author: Arielle Beyaert, José García-Solanes, and Laura Lopez-Gomez
Subjects: Growth, Regression tree, Corruption, Institutional quality, Solow model, Economics as a science, HB71-74
Abstract: Purpose – This paper aims to apply regression-tree analysis to capture the nonlinear effects of corruption on economic growth. Using data of 103 countries for the period 1996–2017, the authors endogenously detect two distinct areas in corruption quality in which the members share the same model of economic growth. Design/methodology/approach – The authors apply regression tree analysis to capture the nonlinearity of the influences. This methodology allows us to split endogenously the whole sample of countries and characterize the different ways through which corruption impacts economic growth in each group of countries. Findings – The traditional determinants of economic growth have different impacts on countries depending on their level of corruption, which, in turn, confirms the parameter heterogeneity of the Solow model found in other strands of the literature. Originality/value – The authors apply a new approach to a worldwide sample obtaining novel results.
Published: 2023
Full Text: View/download PDF

42. RECLAIM: Renewable Energy Based Demand-Side Management Using Machine Learning Models

Author: Zohaib Asghar, Kamran Hafeez, Dilshad Sabir, Bilal Ijaz, Syed Sabir Hussain Bukhari, and Jongsuk Ro
Subjects: Artificial neural network, regression tree, linear regression, demand side management, machine learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Diesel generator sets (DGs) and battery storage systems (BSS) are the essential energy sources in modern high-rise buildings. In this paper, DG, BSS and photovoltaic (PV) systems are considered to minimize the grid power injection using a centralized Energy Management System (EMS). Machine Learning (ML) techniques are used to predict the performance of various regression models by comparing grid power and load curves. These include Artificial Neural Network (ANN), Wide Neural Network (WNN), Linear Regression (LR), Linear Regression Interaction (LR-I), Linear Regression Stepwise (LR-S), Regression Fine Tree (RF-T), Regression Coarse Tree (RC-T) and Gaussian Process Regression (GPR) based techniques. Demand Side Management (DSM) techniques such as peak shaving and valley filling are integrated with the ML technique in a hybrid energy source (HS) system. The comparative analysis of results depicts the effective reshaping of the grid profile without scheduling or disconnecting the loads. Matlab simulation software is used to validate the results.
Published: 2023
Full Text: View/download PDF

43. Prescriptive price optimization using optimal regression trees

Author: Shunnosuke Ikeda, Naoki Nishimura, Noriyoshi Sukegawa, and Yuichi Takano
Subjects: Price optimization, Demand forecasting, Regression tree, Mixed-integer optimization, Coordinate ascent, Mathematics, QA1-939
Abstract: This paper is concerned with prescriptive price optimization, which integrates machine learning models into price optimization to maximize future revenues or profits of multiple items. The prescriptive price optimization requires accurate demand forecasting models because the prediction accuracy of these models has a direct impact on price optimization aimed at increasing revenues and profits. The goal of this paper is to establish a novel framework of prescriptive price optimization using optimal regression trees, which can achieve high prediction accuracy without losing interpretability by means of mixed-integer optimization (MIO) techniques. We use the optimal regression trees for demand forecasting and then formulate the associated price optimization problem as a mixed-integer linear optimization (MILO) problem. We also develop a scalable heuristic algorithm based on the randomized coordinate ascent for efficient price optimization. Simulation results demonstrate the effectiveness of our method for price optimization and the computational efficiency of the heuristic algorithm.
Published: 2023
Full Text: View/download PDF

44. A novel LOF-based ensemble regression tree methodology.

Author: Öngelen, Gözde and İnkaya, Tülin
Subjects: *REGRESSION trees, *STANDARD deviations, *STATISTICAL hypothesis testing
Abstract: With the emergence of digitalization, numeric prediction has become a prominent problem in various fields including finance, engineering, industry, and medicine. Among several machine learning methods, the regression tree is a widely preferred method due to its simplicity, interpretability and robustness. Motivated by this, we introduce a novel ensemble regression tree based methodology, namely LOF-BRT+OR. The proposed methodology is an integrated solution approach with outlier removal, regression tree and ensemble learning. First, irregular data points are removed using local outlier factor (LOF), which measures the degree of being an outlier for each point. Next, a novel regression tree with LOF weighted node model is introduced. In the proposed node model, the weights of the points in the nodes are determined according to their surrounding neighborhood, as a function of LOF values and neighbor ranks. Finally, in order to increase the prediction performance, ensemble learning is adopted. In particular, bootstrap aggregation is used to generate multiple regression trees with LOF weighted node model. The experimental study shows that the proposed methodology yields the best root mean squared error (RMSE) values in five out of nine data sets. Also, the non-parametric tests demonstrate the statistical significance of the proposed approach over the benchmark methods. The proposed methodology can be applied to various prediction problems. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

45. A Hybrid Regression Model for Improving Prediction Accuracy.

Author: Poojari, Satyanarayana and B., Ismail
Subjects: *REGRESSION analysis, *REGRESSION trees, *MONTE Carlo method, *K-nearest neighbor classification, *FEATURE selection, *PREDICTION models
Abstract: Regression Tree (RT) and K-Nearest Neighbor (KNN) models play significant roles in machine learning. RT facilitates interpretable decision-making, aiding in the comprehension of complex data relationships, while KNN is valued for its simplicity, adaptability to non-linear data, and robustness to noise, making it a versatile tool across various applications. The primary drawback of RT is its tendency to assign the same predicted value (average value) to all tuples satisfying the same corresponding splitting criterion. KNN is sensitive to irrelevant or redundant features since all features contribute to similarity. This paper proposes a hybrid regression model based on RT and KNN, addressing the aforementioned issues. The model's performance is compared with KNN using 10 types of distance measures and further assessed against RT, K-Nearest Neighbor regression (KNN), and Support Vector Regression (SVR) through a Monte Carlo simulation study. Simulation results indicate that the hybrid model outperforms all other regression models, regardless of sample size, when observations follow normal distributions or t-distributions. The proposed model's effectiveness is demonstrated through a real-life application using data on global warming in Delhi. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

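46. Regression tree for predicting the fresh matter productivity of the aerial part of teosinte as a function of meteorological variables [Árvore de regressão para previsão da produtividade de matéria fresca da parte aérea de teosinto em função de variáveis meteorológicas].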

Author: Reis, Mikael B., Cargnelutti Filho, Alberto, Loro, Murilo V., Augusto Andretta, João, Ortiz, Vithória M., and Schuller, Bruno R.
Subjects: GLOBAL radiation, SOLAR radiation, CORN, HEAT radiation & absorption, REGRESSION analysis, REGRESSION trees, SOWING
Abstract: Copyright of Sigmae is the property of Universidade Federal de Alfenas (UNIFAL-MG) and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2023

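47. Prediction of fresh mass and dry mass of the aerial part of the teosinte plant as a function of morphological traits [Predição da massa fresca e massa seca da parte aérea da planta de teosinto em função de caracteres morfológicos].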

Author: Konrad, Marcelo, Cargnelutti Filho, Alberto, Loro, Murilo V., Reis, Mikael B., Ortiz, Vithoria M., Augusto Andretta, João, and Schuller, Bruno R.
Subjects: REGRESSION analysis, REGRESSION trees, AGRICULTURE, FORAGE plants, CULTIVATORS, SOWING
Abstract: Copyright of Sigmae is the property of Universidade Federal de Alfenas (UNIFAL-MG) and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2023

48. A comparative analysis of intelligent techniques to predict energy generated by a small wind turbine from atmospheric variables.

Author: Porras, Santiago, Jove, Esteban, Baruque, Bruno, and Calvo-Rolle, José Luis
Subjects: WIND turbines, WIND power, ENERGY consumption, CLEAN energy, COMPARATIVE studies, RENEWABLE energy sources
Abstract: The harmful consequences of fossil fuel use have resulted in the promotion of clean and renewable energies. During the past decades, green technologies have experienced strong development, with special attention paid to wind energy, which covers a significant share of electric energy demand. In this context, the main efforts are focused on the optimization of wind generator facilities, not only in the mechanical design but also in energy management. The present work deals with the prediction of the energy generated by a small wind turbine placed in a bioclimatic house located in the northwest region of Spain. This includes an analysis of the characteristics of the atmospheric variables registered during the turbine operation for a period of one year and an exploratory examination of a range of regression techniques in order to assess the suitability of using the registered information to predict the installation's power generation levels in the short term. The study detailed in this work shows that this objective is attainable with a good degree of accuracy. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

49. Measuring technical, environmental and eco-efficiency in municipal solid waste management in Chile

Author: Ramon Sala-Garrido, Manuel Mocholi-Arce, Maria Molinos-Senante, and Alexandros Maziotis
Subjects: eco-efficiency, undesirable outputs, non-radial data envelopment analysis, regression tree, circular economy, municipal solid waste, Engineering (General). Civil engineering (General), TA1-2040
Abstract: Moving towards a circular and sustainable economy requires improving the management of municipal solid waste (MSW), increasing recycling rates while minimising unsorted waste and operational costs. It is essential to evaluate the economic and environmental efficiency of MSW services. Previous studies focused on developed countries and employed radial parametric and non-parametric methods. By contrast, this study assessed the technical efficiency, environmental efficiency and eco-efficiency of several Chilean municipalities employing the non-radial range adjusted measure approach. A second stage of assessment was conducted to explore the influence of a set of environmental variables on efficiency scores. Results revealed that the evaluated Chilean municipalities performed poorly from a technical perspective since the average technical efficiency score was 0.484. By contrast, average environmental efficiency and eco-efficiency scores were 0.899 and 0.922, respectively. Nevertheless, the percentage of eco-efficient municipalities was lower than 1%. It was also found that tourism had a major and negative impact on all types of efficiency. By contrast, population density had a significant and positive impact on environmental efficiency. The assessment of three types of efficiency provides relevant information to policymakers to define specific strategies to improve MSW management according to sustainability and circular economic objectives.
Published: 2022
Full Text: View/download PDF

50. Winter Road Friction Estimations via Multi-Source Road Weather Data—A Case Study of Alberta, Canada

Author: Xueru Ding and Tae J. Kwon
Subjects: road friction estimation, road weather information systems (RWIS), regression tree, kriging interpolation, Engineering (General). Civil engineering (General), TA1-2040
Abstract: Road friction has long been recognized as one of the most effective winter road maintenance (WRM) performance measures. It allows WRM personnel to make more informed decisions to improve their services and helps road users make trip-related decisions. In this paper, a machine-learning-based methodological framework was developed to model road friction using inputs from mobile road weather information systems (RWIS) that collect spatially continuous road weather data and road grip. This study also attempts to estimate friction using data from stationary RWIS that are installed far from each other, thereby leaving large areas unmonitored. To fill in the spatial gaps, a kriging interpolator was developed to create a continuous friction map. Slippery road risk levels were classified to provide an overview of road conditions via a risk warning map. The proposed method was evaluated with a selected highway segment in Alberta, Canada. Results show that the models developed herein are highly accurate (93.3%) in estimating friction and identifying dangerous road segments via a color-coded risk map. Given its high performance, the developed model has the potential for large-scale implementation to facilitate more efficient WRM services while also improving the safety and mobility of the traveling public.
Published: 2022
Full Text: View/download PDF

823 results on '"regression tree"'
