255 results on '"generalized additive model"'
Search Results
2. Retrieval of water quality parameters based on IOA-ML models and their response to short-term hydrometeorological factors
- Author
- Hu, Wentong, Miao, Donghao, Zhang, Chi, He, Zixian, Gu, Wenquan, and Shao, Dongguo
- Published
- 2025
- Full Text
- View/download PDF
3. Application of a delta-generalized additive model to assess the impact of environmental changes on the spatial distribution of bigeye tuna (Thunnus obesus) in the Indian Ocean
- Author
- Lurkpranee, Supatcha and Kitakado, Toshihide
- Published
- 2025
- Full Text
- View/download PDF
4. Impact of coronavirus disease (COVID-19) on gaseous pollutants and particulate matter in a hot arid climate
- Author
- Albanai, Jasem A., Shehab, Maryam, Vatresia, Arie, Jasim, Marium, Al-Dashti, Hassan, and Yassin, Mohamed F.
- Published
- 2025
- Full Text
- View/download PDF
5. Relationships between site index modeling of crimean juniper and habitat factors
- Author
- Kuzugudenli, Emre and Ozkan, Kursad
- Published
- 2025
- Full Text
- View/download PDF
6. Estimating monthly NO2, O3, and SO2 concentrations via an ensemble three-stage procedure with downscaled satellite remote sensing data and ground measurements
- Author
- Chen, Chu-Chih, Wang, Yin-Ru, Wang, Fu-Cheng, Shiu, Yi-Shiang, Wu, Chang-Fu, and Lin, Tang-Huang
- Published
- 2024
- Full Text
- View/download PDF
7. Spatial distributions, driving factors, and threshold effects of soil organic carbon stocks in the Tibetan Plateau
- Author
- Sun, Zheng, Liu, Feng, Yang, Fei, Wang, Decai, and Zhang, Gan-Lin
- Published
- 2025
- Full Text
- View/download PDF
8. Optimizing potassium management for enhanced cotton yields in China's diverse agro-ecological regions
- Author
- Liang, Hongbang, Yin, Feihu, Zhang, Jinzhu, Zhang, Jihong, Zhao, Yue, Zhao, Tao, Li, Deyi, and Wang, Zhenhua
- Published
- 2025
- Full Text
- View/download PDF
9. Modelling height to crown base using non-parametric methods for mixed forests in China
- Author
- Zhou, Zeyu, Zhang, Huiru, Sharma, Ram P., Zhang, Xiaohong, Feng, Linyan, Du, Manyi, Zhang, Lianjin, Feng, Huanying, Hu, Xuefan, and Yu, Yang
- Published
- 2025
- Full Text
- View/download PDF
10. Reimagining habitat suitability modeling for Pacific saury (Cololabis saira) in the Northwest Pacific Ocean through acoustic data analysis from fishing vessels
- Author
- Xue, Minghua, Tong, Jianfeng, Ma, Wen, Zhu, Zhenhong, Wang, Weiqi, Lyu, Shuo, and Chen, Xinjun
- Published
- 2025
- Full Text
- View/download PDF
11. Precipitation is the most crucial factor determining the distribution of moso bamboo in Mainland China
- Author
- Shi, Peijian, Preisler, Haiganoush K., Quinn, Brady K., Zhao, Jie, Huang, Weiwei, Röll, Alexander, Cheng, Xiaofei, Li, Huarong, and Hölscher, Dirk
- Published
- 2020
- Full Text
- View/download PDF
12. Predicting Hydrological Drought Conditions of Boryeong Dam Inflow Using Climate Variability in South Korea.
- Author
- Noh, Seonhui, Felix, Micah Lourdes, Oh, Seungchan, and Jung, Kwansue
- Abstract
When a hydrological drought occurs due to a decrease in water storage, there is no choice but to supply limited water. Because this has a devastating impact on the community, it is necessary to identify causes and make predictions for emergency planning. The state of change in dam inflow can be used to confirm hydrological drought conditions using the Standardized Runoff Index (SRI), and meteorological drought and climate variability are used to identify causal relationships. Multiple Linear Regression (MLR) and Generalized Additive Model (GAM) models are developed to predict accumulated hydrological drought for 6, 12, and 24 months in the Boryeong Dam basin, and the Nash-Sutcliffe model efficiency coefficient (NSE) exceeded 0.4, satisfying the suitability criteria. The estimation ability is highest when predicting a 12-month annual drought, and reliability can be further increased by reflecting some climate fluctuations in a non-linear form. The droughts of 6 month and 24 month cumulative scales are significantly influenced by the Western Hemisphere Warm Pool (WHWP) extending from the eastern North Pacific to the North Atlantic and by the Nino 3.4 region in the tropical Pacific. Furthermore, it is anticipated that the drought conditions of the inflow volume to the Boryeong Dam will worsen with increasing sea surface temperatures in both regions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. High-dimensional sparse predictive modeling applied to varied correlation structure via the generalized additive model.
- Author
- Bondaug, Farlley G. and Tubo, Bernadette F.
- Subjects
- PRINCIPAL components analysis, DISCRETE choice models, PREDICTION models
- Abstract
This paper explores the characteristics of a two-step procedure (dimension reduction and function approximation) in discrete choice modeling with high-dimensional data. This study proposes the SS-GAM procedure, an extension of the Super Sparse Principal Component Analysis (SSPCA) in which the results are further processed with the Generalized Additive Model (GAM) in a classification problem. Moreover, the Orthogonal Sparse Principal Component Analysis with GAM (OS-GAM) is also proposed. For baseline comparison, the General Adaptive Sparse-PCA with GAM (GAS-GAM) is considered in this paper. The performance of these three sparse PCA methods is investigated under varied underlying correlation structures. In the simulation study, it is demonstrated that, across varied degrees of dimensionality and levels of correlation structure, SS-GAM performed better on average than OS-GAM and GAS-GAM in terms of predictive rate. It was observed that OS-GAM performed best when the data exhibit a low correlation structure. However, with a high correlation structure, OS-GAM and GAS-GAM obtained comparable results. Moreover, in terms of computational time, OS-GAM does not seem to be affected by increases in feature dimension. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Estimating test-day milk yields by modeling proportional daily yields: Going beyond linearity.
- Author
- Wu, Xiao-Lin, Wiggans, George R., Norman, H. Duane, Enzenauer, Heather A., Miles, Asha M., Van Tassell, Curtis P., Baldwin VI, Ransom L., Burchard, Javier, and Dürr, João
- Subjects
- *MILK yield, *INDEPENDENT variables, *CORRECTION factors, *PROPORTIONAL hazards models, *MILK contamination, *REGRESSION analysis, *MILKFAT
- Abstract
In the United States, lactation milk yields are not measured directly but are calculated from the test-day milk yields. Still, test-day milk yields are estimated from partial yields obtained from single milkings. Various methods have been proposed to estimate test-day milk yields, primarily to deal with unequal milking intervals dating back to the 1970s and 1980s. The Wiggans model is a de facto method for estimating test-day milk yields in the United States, which was initially proposed for cows milked 3 times daily, assuming a linear relationship between a proportional test-day milk yield and milking interval. However, the linearity assumption did not hold precisely in Holstein cows milked twice daily because of prolonged and uneven milking intervals. The present study reviewed and evaluated the nonlinear models that extended the Wiggans model for estimating daily or test-day milk yields. These nonlinear models, except step functions, demonstrated smaller errors and greater accuracies for estimated test-day milk yields compared with the conventional methods. The nonlinear models offered additional benefits. For example, the locally weighted regression model (e.g., locally estimated scatterplot smoothing) could utilize data information in scalable neighborhoods and weigh observations according to their distance in milking interval time. General additive models provide a flexible, unified framework to model nonlinear predictor variables additively. Another drawback of the conventional methods is a loss of accuracy caused by discretizing milking interval time into large bins while deriving multiplicative correction factors for estimating test-day milk yields. To overcome this problem, we proposed a general approach that allows milk yield correction factors to be derived for every possible milking interval time, resulting in more accurately estimated test-day milk yields. This approach can be applied to any model, including nonparametric models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
15. Meteorological and traffic effects on air pollutants using Bayesian networks and deep learning.
- Author
- Lin, Yuan-Chien, Lin, Yu-Ting, Chen, Cai-Rou, and Lai, Chun-Yeh
- Subjects
- *SHORT-term memory, *LONG-term memory, *RAINFALL, *AIR quality, *BAYESIAN analysis, *AIR pollutants, *AIR pollution
- Abstract
Traffic emissions have become the major air pollution source in urban areas. Therefore, understanding the highly non-stationary and complex impact of traffic factors on air quality is very important for building air quality prediction models. Using real-world air pollutant data from Taipei City, this study integrates diverse factors, including traffic flow, speed, rainfall patterns, and meteorological factors. We constructed a Bayesian network probability model based on rainfall events as a big data analysis framework to investigate traffic factor causal relationships and conditional probabilities for meteorological factors and air pollutant concentrations. A Generalized Additive Model (GAM) verified non-linear relationships between traffic factors and air pollutants. Consequently, we propose a long short-term memory (LSTM) model to predict airborne pollutant concentrations. This study proposes a new analysis procedure for air pollutants and meteorological variables that considers both rainfall amount and patterns. Results indicate improved air quality when controlling vehicle speed above 40 km/h and maintaining an average vehicle flow < 1200 vehicles per hour. This study also classified rainfall events into four types according to their characteristics. Wet deposition from varied rainfall types significantly affects air quality, with Type I rainfall events (long-duration heavy rain) having the most pronounced impact. An LSTM model incorporating GAM and Bayesian network outcomes yields excellent performance, achieving correlation R² > 0.9 and 0.8 for first and second order air pollutants, i.e., CO, NO, NO2, and NOx; and O3, PM10, and PM2.5, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
16. Short-term forecasting of COVID-19 using support vector regression: An application using Zimbabwean data.
- Author
- Shoko, Claris and Sigauke, Caston
- Abstract
• COVID-19 undermines progress toward SDG number 3 of good health and well-being.
• The COVID-19 pandemic has hit the SADC region hard, and Zimbabwe has not been left out.
• Statistical forecasting models play a vital role in predicting future pandemics and threats.
• Support vector regression with pairwise interactions provides robust forecasts.
• They can assist decision-makers in planning adequate policies.
This study aims to show that including pairwise hierarchical interactions of covariates and combining forecasts from individual models improves prediction accuracy. The least absolute shrinkage and selection operator via hierarchical pairwise interaction is used to select variables that are not correlated and have the greatest predictive power. Single forecast models (Gradient boosting method [GBM], Generalized additive models [GAMs], Support vector regression [SVR]) are used in the analysis. The best model was selected based on the mean absolute error (MAE), the best key performance indicator for skewed data. Forecasts from the 5 models were combined using linear quantile regression averaging (LQRA). Box and Whiskers plots are used to diagnose the overall performance of fitted models. Single forecast models (GBM, GAMs, and SVRs) show that including pairwise interactions improves forecast accuracy. The SVR model with interactions based on the radial basis kernel function is the best of the single forecast models, with the lowest MAE. Combining point forecasts from all the single forecast models using the LQRA approach further reduces the MAE. However, based on the Box and Whiskers plot, the SVR model with pairwise interactions has the smallest range. Based on the key performance indicators, combining predictions from several individual models improves forecast accuracy. However, overall, the SVM with pairwise hierarchical interactions outperforms all the other models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. Alignment of words and actions? Government environmental attention and enterprise digital transformation.
- Author
- Shi, Peihao and Huang, Qinghua
- Subjects
- DIGITAL transformation, ORGANIZATIONAL legitimacy, TECHNOLOGICAL innovations, HUMAN capital, EMPIRICAL research
- Abstract
This study employs a unique dataset of A-share listed companies from 2007 to 2021 to investigate the impact of government environmental attention on enterprise digital transformation (EDT) and its underlying mechanisms. Empirical findings document that in response to upholding organizational legitimacy, government environmental attention remarkably favors EDT. This core conclusion remains robust after addressing endogeneity concerns and alternative robustness tests. Mechanism analysis unveils that intensified government environmental attention propels companies to expedite green technology innovation, alleviate financing constraints, and enhance human capital quality, all accelerating EDT. Subsequent investigations indicate that heightened government environmental attention has a notably greater impact on EDT for larger plants and highly polluting industries. In regions with lower financial development, government environmental attention serves as a reliable signaling mechanism, motivating EDT. These findings guide plants in accelerating EDT for enhanced sustainable development while shedding light on the evolving mechanisms of government attention allocation and EDT, which offers insights for future research on the correlation between government actions and corporate environmental governance.
• Government environmental attention significantly favors enterprise digital transformation.
• Accelerating innovation, easing financing constraints, and raising human capital are key mechanisms in this context.
• The impact is greater for larger enterprises and highly polluting industries.
• The Generalized Additive Model (GAM) is employed in this paper.
[ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
18. The multiscale response of global cropland cropping intensity to urban expansion.
- Author
- Ma, Chen, Li, Manchun, and Jiang, Penghui
- Subjects
- *ANTHROPOGENIC effects on nature, *GLOBAL environmental change, *URBAN growth, *DOUBLE cropping, *AGRICULTURAL intensification
- Abstract
Urban expansion (UE) and multiple cropping (MC) are key factors in anthropogenic impacts on global environmental change. However, the multi-scale response patterns of UE and MC have not yet been revealed, and how urbanization affects cropland intensification is still not deeply explored. This study examines the spatial and temporal trends in global UE and MC and analyses the multi-scale response patterns of both. The GEE (Google Earth Engine) platform was used to count global cropland cropping intensity (CCI) and impervious surface rasters on an image-by-image basis, while GIS was employed for spatial analyses, and the generalized additive model (GAM) was applied to describe variable response trajectories. The global multiple cropping index (MCI) increased significantly (by 4.1%) over the period 2001–2019, with growth in double- and triple-cropped cropland dominating this change. Double cropping, as a widespread global farming strategy, has led to a shift towards the intensification of agriculture, with countries in the northern hemisphere contributing more. Global UE expanded significantly to twice the baseline level at an average annual growth rate of 2.42 × 10⁴ km² over the period 2001–2019, with the expansion of world-class urban agglomerations in China, the United States and Europe dominating this trend. The spatial clustering of MC and UE has continued to intensify, with high-intensity cropping strategies progressively clustering towards areas of significant UE, and this tendency has a clear decreasing urban-rural gradient effect. The global CCI grows significantly and non-linearly with UE, but an important inflection point in the growth trajectory has occurred under the influence of threshold effects. Developed countries tend to be more flexible in their cropping intensity strategies as they move forward with urbanization.
There is a clear increasing effect of the urban-rural gradient in the degree of non-linearity in the response of urbanization and cropping intensity. The results of the study contribute to the understanding of the complex spatial and temporal coupling mechanisms between UE and MC, and provide useful insights for the development of trade-offs between urbanization and maturity strategies.
• The multi-scale coupling of global cropland cropping intensity and urban expansion was investigated.
• High-intensity cropping strategies are gradually clustering towards urbanization hotspots.
• Global cropland cropping intensity increases non-linearly with urbanization and there is a clear inflection point in the trajectory.
• Urbanization in developed countries favours more flexible planting strategies and there is an incremental urban-rural gradient effect.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Interpretable machine learning tools to analyze PM2.5 sensor network data so as to quantify local source impacts and long-range transport.
- Author
- de Foy, Benjamin, Edwards, Ross, Joy, Khaled Shaifullah, Zaman, Shahid Uz, Salam, Abdus, and Schauer, James J.
- Subjects
- *MACHINE learning, *VERTICAL mixing (Earth sciences), *SENSOR networks, *HUMIDITY control, *ARTIFICIAL intelligence
- Abstract
Sensor networks provide spatially resolved information about the time variation of PM2.5 concentrations in urban areas around the world. With relatively simple improvements to the control of the temperature and humidity of incoming air, and with proper quality assurance and calibration protocols, a low-cost monitor was developed that provides measurements that were highly correlated with a reference PM2.5 monitor. These have sufficient resolution and reliability to quantify small differences within an urban area. Nevertheless, many sites report similar concentrations and it can be difficult to interpret the results or distinguish local from regional effects. Generalized Additive Models are an effective Machine Learning method to distinguish the impact of factors across very different scales. As a type of Interpretable Machine Learning / eXplainable Artificial Intelligence, they provide direct information on the link between specific factors and PM2.5 concentrations. GAM simulations were developed for sensors located around Dhaka, Bangladesh, for both the dry and the wet seasons. The simulations show that the largest contributor to high PM2.5 concentration variations across both urban and peri-urban sites is the boundary layer height, which represents the vertical mixing of the urban plume. Using Trajectory Cluster Concentration Impacts, the simulations showed that robust estimates of long-range transport could be obtained from measurements located within a polluted environment, and the model further showed that enhancements of more than 40 μg/m³ were associated with air transport from the Indo-Gangetic Plain in the dry season. Finally, interaction maps of the effect of horizontal wind speed and direction showed that these could be associated with up to ±20% variation in PM2.5 from site to site. Most of the enhancements are related to very calm winds and appear to be more strongly associated with road emissions than with point sources.
Overall, the sensor network shows that air is polluted throughout the Dhaka area and into the peripheries, and that a multipronged approach will be needed to improve air quality for its inhabitants.
• GAM simulations identify the impacts of local sources within the urban area.
• Monitoring sites on the periphery are impacted by industrial sources and brick kilns.
• Trajectory Cluster Concentration Impacts provide reliable estimates of long-range transport.
• Vertical mixing, representative of urban emissions, is the largest factor in the dry season.
• Network map shows strong similarities in hourly PM2.5 across the Dhaka metropolis.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. A Statistical Non-Parametric data analysis for COVID-19 incidence data.
- Author
- Minu, R.I. and Nagarajan, G.
- Subjects
- COVID-19 pandemic, STATISTICS, COVID-19, RURAL population, DATA analysis
- Abstract
The impact of COVID-19 on the global scale is tremendously drastic. Several lines of research are under way across the world simultaneously to understand and overcome this dire pandemic outbreak. This paper is purely a statistical study on a distinct set of datasets regarding COVID-19 in India. The motivation of this study is to provide an insight into the rapid growth of confirmed COVID-19 cases in India. The rapid growth of COVID-19 cases in India started in March 2020. The main objective of this paper is to provide a solid statistical model for policymakers to handle this kind of pandemic situation in the near future with nonlinear data. In this paper, the data were collected from 1st April to 29th November 2020. To come up with a solid statistical model, various nonlinear data such as confirmed COVID-19 cases, maximum temperature, minimum temperature, the total population (state-wise), the total area in km² (state-wise), and the total rural and urban population count (state-wise) have been analyzed. In this paper, six different Generalized Additive Models (GAMs) were identified after a thorough analysis of other researchers' (Xie and Zhu, 2020; Prata et al., 2020) findings. In all perspectives, the results were identified and analyzed. The GAM regarding total COVID-19 confirmed cases, total population, and the total rural population provides the best average fit, with an R² value of 0.934. As the population value is quite high, the authors condensed it using a logarithm to provide the best p-values of 0.000542 and 0.001407 for the relation between the total number of COVID-19 cases and the total population and total rural population, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
21. Differentially private and explainable boosting machine with enhanced utility.
- Author
- Baek, Incheol and Chung, Yon Dohn
- Subjects
- *MACHINE learning, *DATA privacy, *PRIVACY, *ARTIFICIAL intelligence
- Abstract
In this paper, we introduce DP-EBM*, an enhanced utility version of the Differentially Private Explainable Boosting Machine (DP-EBM). DP-EBM* offers predictions for both classification and regression tasks, providing inherent explanations for its predictions while ensuring the protection of sensitive individual information via Differential Privacy. DP-EBM* has two major improvements over DP-EBM. Firstly, we develop an error measure to assess the efficiency of using privacy budget, a crucial factor to accuracy, and optimize this measure. Secondly, we propose a feature pruning method, which eliminates less important features during the training process. Our experimental results demonstrate that DP-EBM* outperforms the state-of-the-art differentially private explainable models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Estimating rates of change to interpret quantitative wastewater surveillance of disease trends.
- Author
- Holcomb, David A., Christensen, Ariel, Hoffman, Kelly, Lee, Allison, Blackwood, A. Denene, Clerkin, Thomas, Gallard-Góngora, Javier, Harris, Angela, Kotlarz, Nadine, Mitasova, Helena, Reckling, Stacie, de los Reyes III, Francis L., Stewart, Jill R., Guidry, Virginia T., Noble, Rachel T., Serre, Marc L., Garcia, Tanya P., and Engel, Lawrence S.
- Published
- 2024
- Full Text
- View/download PDF
23. Relationships between water quality of a long-distance inter-basin water diversion project and air pollution emissions along the canal: Distributions, lag effects, and nonlinear responses.
- Author
- Nong, Xizhi, Luo, Kunting, Lin, Minzhi, Chen, Lihua, and Long, Di
- Subjects
- EMISSIONS (Air pollution), AIR quality, WATER pollution, WATER quality, EMISSION standards, WATER diversion, AIR pollution
- Abstract
Understanding and quantifying the influences and contributions of air pollution emissions on water quality variations is critically important for surface water quality protection and management. To address this, we created a five-year daily data matrix of six water quality indicators (permanganate index [CODMn], NH3-N, pH, turbidity, conductivity, and dissolved organic matter [DOM]) and six air pollution indicators (O3, CO, NO2, SO2, 2.5 μm particulate matter [PM2.5], and inhalable particles [PM10]) using data from seven national monitoring stations along the world's longest water-diversion project, the Middle Route of the South-to-North Water Diversion Project in China (MR-SNWD). Multivariate techniques (Mann–Kendall, Spearman's correlation, lag correlation, and Generalized Additive Models [GAMs]) were applied to examine the nonlinear relationships and lag effects of air pollution on water quality. Air pollution and water quality exhibited marked spatial heterogeneity along the MR-SNWD, with all water quality parameters meeting Class I or II national standards and the air pollution indicators exceeding those thresholds. Except for CODMn and DOM, the other water quality and air pollution indicators exhibited significant seasonal differences. Air pollution exhibited significant lag effects on water quality at the northern stations, with NO2, SO2, PM2.5, and PM10 being highly correlated with changes in pH, with an average lag of 17 d. Based on the GAMs, lag effects enhanced the significant nonlinear relationships between air pollution and water quality, increasing the average deviance explained for CODMn, NH3-N, and pH by 93%, 24%, and 41%, respectively. These findings provide a scientific basis for protecting water quality along the long-distance inter-basin water-diversion project under anthropogenic air pollution.
• Time lag effects of air pollution emissions on water quality variations were verified.
• Nonlinear water quality and air pollution emission relationships were identified.
• SO2 and NO2 were the major indicators explaining water quality variations.
• A single air pollution indicator has a limited impact on water quality.
• Air pollution impacts on water quality degradation should not be overestimated.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Pre-analytical stability and physiological fluctuations affect plasma steroid hormone outcomes: A real-world study.
- Author
- Zhong, Jian, Wang, Danchen, Xie, Shaowei, Li, Ming, Yin, Yicong, Yu, Jialei, Ma, Chaochao, Yu, SongLin, and Qiu, Ling
- Subjects
- *LIQUID chromatography-mass spectrometry, *STEROID hormones, *SEX hormones, *CORTISONE, *ADRENAL diseases
- Abstract
Since steroids are crucial for diagnosing endocrine disorders, the lack of research on factors that affect hormone levels makes interpreting the results difficult. Our study aims to assess the stability of the pre-analytical procedure and the impact of hormonal physiological fluctuations using real-world data. The datasets were created using 12,418 records from individuals whose steroid hormone measurements were taken in our laboratory between September 2019 and March 2024. Twenty-two steroid hormones in plasma were measured by a well-validated liquid chromatography tandem mass spectrometry method. After normalization transformation, outlier removal, and z-score normalization, generalized additive models were constructed to evaluate pre-analytical stability and age-, sex-, and sample time-dependent hormonal fluctuations. Most hormones exhibit significant variability with age, particularly steroid hormone precursors, sex hormones, and certain corticosteroids such as aldosterone, 18-hydroxycortisol, and 18-oxocortisol. Sex hormones varied between males and females. Levels of certain hormones, including cortisol, cortisone, 11-deoxycortisol, 18-hydroxycortisol, 18-oxocortisol, corticosterone, aldosterone, estrone, testosterone, dihydrotestosterone, dehydroepiandrosterone sulfate, 11-ketotestosterone, and 11-hydroxytestosterone, fluctuated with sampling time. Moreover, levels of pregnenolone and progesterone decreased within 1 hour of sampling, with pregnenolone becoming unstable with storage time at 4 degrees after centrifugation, while other hormone levels remained relatively stable for a short period of time without or after centrifugation of the sample. This is the first instance in which real-world data have been used to assess the pre-analytical stability of plasma hormones and to evaluate the impact of physiological factors on steroid hormones.
• Pre-analytical stability of 22 plasma steroid hormones was evaluated.
• Plasma steroids were stable whether centrifuged or not, except pregnenolone.
• Levels of steroid hormones varied with age and sex.
• Daytime rhythm of cortical hormones was observed using real-world data.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. FXAM: A unified and fast interpretable model for predictive analytics.
- Author
- Jiang, Yuanyuan, Ding, Rui, Qiao, Tianchi, Zhu, Yunan, Han, Shi, and Zhang, Dongmei
- Subjects
- *PREDICTION models, *MATHEMATICAL optimization, *MACHINE learning
- Abstract
Predictive analytics aims to build machine learning models to predict behavior patterns and use predictions to guide decision-making. Because predictive analytics involves humans, interpretable machine learning models are preferred. In the literature, the Generalized Additive Model (GAM) is a standard for interpretability. However, due to the one-to-many and many-to-one phenomena which appear commonly in real-world scenarios, existing GAMs have limitations in serving predictive analytics in terms of both accuracy and training efficiency. In this paper, we propose FXAM (Fast and eXplainable Additive Model), a unified and fast interpretable model for predictive analytics. FXAM extends GAM's modeling capability with a unified additive model for numerical, categorical, and temporal features. FXAM conducts a novel training procedure called Three-Stage Iteration (TSI). TSI corresponds to learning over numerical, categorical, and temporal features respectively. Each stage learns a local optimum by fixing the parameters of other stages. We design joint learning over categorical features and partial learning over temporal features to achieve high accuracy and training efficiency. We prove that TSI is guaranteed to converge to the global optimum. We further propose a set of optimization techniques to speed up FXAM's training algorithm to meet the needs of interactive analysis. Thorough evaluations conducted on diverse data sets verify that FXAM significantly outperforms existing GAMs in terms of training speed, and modeling categorical and temporal features. In terms of interpretability, we compare FXAM with the typical post-hoc approach XGBoost+SHAP on two real-world scenarios, which shows the superiority of FXAM's inherent interpretability for predictive analytics.
• A unified and fast interpretable model (FXAM) for predictive analytics is proposed.
• FXAM extends the modeling capability and efficiency of the Generalized Additive Model.
• FXAM conducts a novel training procedure called Three-Stage Iteration.
• Joint learning and partial learning help achieve high accuracy and efficiency.
• FXAM performs much better on synthetic data sets and 13 real datasets.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
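The idea of updating one block of an additive model while the other blocks are held fixed, as described for Three-Stage Iteration in the FXAM abstract above, can be illustrated with generic backfitting. The sketch below is not FXAM's algorithm: it uses only two blocks (a binned numerical feature and a categorical feature), plain group means as the per-block learner, and synthetic data. The names group_means and backfit are hypothetical.

```python
import math
import random
from collections import defaultdict

def group_means(keys, values):
    """Return the mean of `values` within each distinct key as a dict."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k, v in zip(keys, values):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

def backfit(x_bin, x_cat, y, n_iter=20):
    """Fit y ~ b0 + f[x_bin] + g[x_cat] by alternating block updates.

    Generic backfitting (an assumption for illustration, not FXAM's TSI):
    each stage refits one block on the residual left by the other block.
    """
    b0 = sum(y) / len(y)
    g = defaultdict(float)  # categorical block starts at zero
    f = {}
    for _ in range(n_iter):
        # Stage A: update the numerical (binned) block with g held fixed.
        f = group_means(x_bin, [yi - b0 - g[c] for yi, c in zip(y, x_cat)])
        # Stage B: update the categorical block with f held fixed.
        g = group_means(x_cat, [yi - b0 - f[b] for yi, b in zip(y, x_bin)])
    return b0, f, g

# Synthetic data: the true model is additive in a 5-level bin and a 3-level category.
random.seed(0)
bin_effect = [0.0, 0.5, 1.0, 0.5, 0.0]
cat_effect = {"a": -0.5, "b": 0.0, "c": 0.5}
x_bin, x_cat, y = [], [], []
for _ in range(400):
    b = int(random.random() * 5)
    c = random.choice("abc")
    x_bin.append(b)
    x_cat.append(c)
    y.append(2.0 + bin_effect[b] + cat_effect[c] + random.gauss(0, 0.1))

b0, f, g = backfit(x_bin, x_cat, y)
residuals = [yi - (b0 + f[b] + g[c]) for yi, b, c in zip(y, x_bin, x_cat)]
rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
print(round(rmse, 3))
```

Because each stage solves a least-squares problem for its own block, the residual error cannot increase from one stage to the next; on this toy problem the fit settles close to the noise level after a few passes.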
26. Interpretable diurnal impacts on extreme urban PM2.5 concentrations of soil temperature, soil water content, humidity and temperature inversion.
- Author
-
de Foy, Benjamin and Schauer, James J.
- Subjects
- *
SOIL moisture , *TEMPERATURE inversions , *SOIL temperature , *PARTICULATE matter , *ATMOSPHERIC boundary layer - Abstract
Inhabitants of megacities around the world are suffering from severely unhealthy concentrations of PM2.5 as a result of intense emissions from numerous sectors, often combined with adverse meteorological conditions. Machine learning algorithms can be applied to multiyear datasets of hourly measurements in order to identify the main drivers causing intense pollution. In particular, Generalized Additive Models (GAM) provide interpretable associations of the interplay between emissions and meteorology. GAM simulations were developed for five dry seasons in Kolkata and Dhaka. In the model, soil temperature was associated with 21% of PM2.5 variability. Instead of 2-m humidity, model performance was improved by including air humidity at 1000 hPa and soil water content individually, with each accounting for around 6% of PM2.5 variability. Boundary layer heights have a significant impact on daytime concentrations, but the GAM output showed that temperature inversion intensity better characterized the stability of the nocturnal boundary layer and had a larger contribution to the PM2.5 variability. The GAM model could also identify interactions between parameters and showed that boundary layer height, temperature inversion intensity and air humidity had impacts that varied by time of day. By using GAM factors for winds at 100 m above the surface in combination with the Trajectory Cluster Contribution Function, the model estimated that local winds were associated with around 6% of variability whereas long range transport was associated with around 9% of variability. The analysis shows that there are no silver bullets for improving air quality and that adverse meteorology is making the problem harder to solve. Nevertheless, the results show that sustained efforts at controlling both local and regional sources will yield cleaner air that is greatly needed to improve the health of people living in large metropolitan areas.
• Generalized Additive Models yield interpretable associations of meteorology and hourly PM2.5
• Temperature inversion intensity characterizes the stable nocturnal boundary layer
• Boundary layer height is a weaker predictor of PM2.5 than temperature inversion intensity
• To improve the machine learning model, include soil temperature and soil water content
• Humid air and drier soils both lead to higher PM2.5 concentrations
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Drivers of phytoplankton biomass and diversity in a macrotidal bay of the Amazon Mangrove Coast, a Ramsar site.
- Author
-
Cavalcanti, Lisana F., do N Feitosa, Fernando A., Cutrim, Marco V.J., Montes, Manuel de J.F., Lourenço, Caio B., Furtado, Jordana A., and dos S Sá, Ana Karoline D.
- Subjects
BIOMASS ,BIOLOGICAL productivity ,PHYTOPLANKTON ,SKELETONEMA costatum ,MANGROVE plants ,ECOHYDROLOGY ,MARINE biomass ,ECOSYSTEMS - Abstract
Biodiversity maintenance is a main goal in ecology. Hence, phytoplankton diversity and biomass were analyzed in a coastal bay (Cumã Bay) of the Amazon Macrotidal Mangrove Coast, which has been designated as an international hotspot for conservation (Ramsar site) with high biological productivity and diversity that provides crucial ecosystem services and elevated fish production. An ecohydrology-based approach was applied to identify the main factors that drive the patterns of phytoplankton diversity and biomass, considering spatio-temporal analyses of physical, chemical, and biological variables from May 2019 to June 2020. Phytoplankton dynamics were investigated using multivariate analyses, correlations, and generalized additive models. Seven indices were tested to select the most efficient biodiversity metric. The hydrological conditions of Cumã Bay were governed primarily by elevated precipitation and macrotidal dynamics, resulting in two different functional zones based on environmental variability: the freshwater influence zone and marine influence zone. Seasonally, the maximum freshwater discharge, low salinity and light availability promoted cell abundance and biomass increase, with blooms of Skeletonema costatum, which reduced the taxonomic diversity of the community in the rainy season. During the dry season, turbid waters resulting from macrotidal dynamics and wind speed limited light penetration and phytoplankton photosynthesis, leading to a higher uniformity in the species distribution. The Shannon index was the biodiversity metric most sensitive to environmental changes. This study found that deterministic processes governed the community, in which rainfall on the Amazon coast, along with wind speed, salinity, light availability and nutrients, were the main controlling factors for phytoplankton diversity and richness. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
28. Estimating the nonlinear association of online somatic cell count, lactate dehydrogenase, and electrical conductivity with milk yield.
- Author
-
Bonestroo, John, van der Voort, Mariska, Fall, Nils, Emanuelson, Ulf, Klaas, Ilka Christine, and Hogeveen, Henk
- Subjects
- *
MILK yield , *LACTATE dehydrogenase , *ELECTRIC conductivity , *SOMATIC cells , *DAIRY cattle , *MILKFAT , *MILK - Abstract
Reduction of milk yield is one of the principal components in the cost of mastitis. However, past research into the association between milk yield and mastitis indicators is limited. Past research has not been based on online or in-line daily measurements and has not fully explored nonlinearity and the thresholds at which milk yield starts to decrease. In dairy herds with automated milking systems equipped with sensors, mastitis indicators of individual cows are measured on an intraday frequency, which provides unprecedented avenues to explore such effects in detail. The aim of this observational study was primarily to investigate the nonlinear associations of lactate dehydrogenase (LDH), electrical conductivity (EC), and somatic cell count (SCC) with milk yield at various stages of lactation, parity, and mastitis chronicity status (i.e., whether the cow had SCC ≥200,000 SCC/mL for the last 28 d). We also investigated thresholds at which mastitis indicators (LDH, EC, and SCC) started to be negatively associated with milk yield. We used data from 21 automated milking system herds measuring EC and online SCC. Of these herds, 7 of the 21 additionally measured online LDH. We operationalized milk yield as milk synthesis rate in kilograms per hour. Applying a generalized additive model, we estimated the milk synthesis rate as a function of the 3 mastitis indicators for 3 different subgroups based on parity, stage of lactation, and mastitis chronicity. Partial dependence plots of the mastitis indicators were used to evaluate the milk synthesis rate to study if the milk synthesis rate was associated with mastitis indicators at a specific level. Results showed that milk synthesis rate decreased with increasing SCC, LDH, and EC, but in a nonlinear fashion. The thresholds at which milk synthesis rate started to decrease were 2.5 LnSCC (12,000 SCC/mL) to 3.75 LnSCC (43,000 SCC/mL), 0 to 1 LnLDH (1−2.7 U/L), and 5.0 to 6.0 mS/cm for EC. 
Additionally, another substantial decrease of milk synthesis rate was observed at thresholds of 5.625 LnSCC (277,000 SCC/mL) and 3 LnLDH (20 LDH U/L) but not for EC. Having chronic mastitis decreased milk synthesis rate in all models. The identified nonlinearities between mastitis indicators and milk synthesis rate should be incorporated in statistical models for more accurate estimations of milk loss due to mastitis. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
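The paired threshold values in the abstract above (for example 2.5 LnSCC alongside 12,000 SCC/mL) are consistent with LnSCC being the natural logarithm of SCC expressed in thousands of cells per mL, and LnLDH being the natural logarithm of LDH in U/L. That reading is inferred from the numbers rather than stated in the abstract; under that assumption, the back-transformation can be checked as follows (ln_to_raw is a hypothetical helper name).

```python
import math

def ln_to_raw(ln_value, scale=1.0):
    """Back-transform a natural-log value; `scale` converts units (assumed)."""
    return math.exp(ln_value) * scale

# Assumption: LnSCC = ln(SCC in 1,000 cells/mL), so scale by 1000 to get cells/mL.
scc_thresholds = {ln: round(ln_to_raw(ln, 1000), -3) for ln in (2.5, 3.75, 5.625)}

# Assumption: LnLDH = ln(LDH in U/L), so no extra scaling is needed.
ldh_thresholds = {ln: round(ln_to_raw(ln), 1) for ln in (0, 1, 3)}

print(scc_thresholds)
print(ldh_thresholds)
```

Rounded to the nearest thousand, the three SCC values reproduce the 12,000, 43,000, and 277,000 SCC/mL quoted in the abstract, and the LDH values reproduce 1, 2.7, and roughly 20 U/L.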
29. On forecasting the intraday Bitcoin price using ensemble of variational mode decomposition and generalized additive model.
- Author
-
Gyamerah, Samuel Asante
- Subjects
HILBERT-Huang transform ,BITCOIN ,STANDARD deviations ,FORECASTING ,STATISTICAL models - Abstract
High frequency Bitcoin price series are often non-linear and non-stationary and hence forecasting the price of Bitcoin directly or by transformation using statistical models is subject to large errors. This paper presents an ensemble model using variational mode decomposition (VMD) and Generalized additive model (GAM) to forecast intraday Bitcoin price. To evaluate the performance of the constructed model, it is compared with an ensemble of empirical mode decomposition (EMD) and GAM. The results showed that VMD-GAM model performed better than the EMD-GAM ensemble model in terms of three evaluation metrics (root mean square error, mean absolute percentage error, and bias) used. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
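The three evaluation metrics named in the abstract above (root mean square error, mean absolute percentage error, and bias) can be computed as in the sketch below. The abstract does not give formulas, so the definitions here are the common textbook ones, with bias taken as the mean of forecast minus actual; the sign convention and the toy price values are assumptions for illustration only.

```python
import math

def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((f - a) / a) for a, f in zip(actual, forecast)) / len(actual)

def bias(actual, forecast):
    """Mean signed error; positive means the forecast runs high (assumed convention)."""
    return sum(f - a for a, f in zip(actual, forecast)) / len(actual)

# Made-up values, not Bitcoin data.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

print(rmse(actual, forecast), mape(actual, forecast), bias(actual, forecast))
```

In this toy case the over- and under-forecasts cancel, so the bias is zero even though RMSE and MAPE are not, which is why the three metrics are usually reported together.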
30. Quantifying the impact of anthropogenic emissions and aquatic environmental impacts on sedimentary mercury variations in a typical urban river.
- Author
-
Tang, Yi, Liu, Yang, He, Yong, Zhang, Jiaodi, Guo, Huaming, and Liu, Wenxin
- Subjects
MERCURY ,ANTHROPOGENIC effects on nature ,POLYCYCLIC aromatic hydrocarbons ,ATMOSPHERIC deposition ,ENVIRONMENTAL health ,HUMAN ecology - Abstract
In urban and industrial regions, sedimentary mercury (Hg) serves as the crucial indicator for Hg pollution, posing potential risks to ecology and human health. The physicochemical processes of Hg in aquatic environments are influenced by various factors such as anthropogenic emissions and aquatic environmental impacts, making it challenging to quantify the drivers of total mercury (THg) variations. Here, we analyzed the spatiotemporal variations, quantified driving factors, and assessed accumulation risks of sedimentary THg from the mainstream of a typical urban river (Haihe River). THg in the urban region (37−3237 ng g−1) was significantly higher (t-test, p < 0.01) than in suburban (71−2317 ng g−1) and developing regions (156−916 ng g−1). The sedimentary THg in suburban and developing regions increased from 2003 to 2018, indicating the elevated atmospheric deposition of Hg. Together with the temperature, grain size of sediments, total organic carbon (TOC), the pH and salinity of water, 40 components of parent and substituted polycyclic aromatic hydrocarbons (PAHs) were first introduced to quantify the driver of sedimentary THg based on generalized additive model. Results showed that anthropogenic emissions, including three PAHs components (31%) and TOC (63%), accounted for 94% of sedimentary THg variations. The aquatic environmental impacts accounted for 5% of sedimentary THg variations. The geo-accumulation index of THg indicated moderate to heavy accumulation in the urban region. This study demonstrates that homologous pollutants such as PAHs can be used to trace sources and variations of Hg pollution, supporting their co-regulation as international conventions regulate pollutants.
• Sedimentary mercury posed the highest concentration in urban regions.
• The long-term trend indicated enhanced atmospheric Hg deposition.
• Organic matter mainly controlled the spatiotemporal variation of mercury.
• 94% of the sedimentary mercury was attributed to anthropogenic discharges.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. A gradient descent algorithm for SNN with time-varying weights for reliable multiclass interpretation.
- Author
-
Jeyasothy, Abeegithan, Ramasamy, Savitha, and Sundaram, Suresh
- Subjects
STIMULUS generalization ,MACHINE learning ,CLASSIFICATION algorithms ,ALGORITHMS ,TIME-varying networks - Abstract
Interpretation of the prediction is vital for mission critical tasks. Accurate interpretation relies upon the generalization accuracy of the model. In this paper, we propose a modified gradient descent learning algorithm to improve the generalization ability of a Spiking Neural Network with time-varying weights (SNN-t). This algorithm, referred to as GradST, can help improve the interpretation of multiclass classification problems. We have transformed the SNN-t to a Generalized Additive Model (GAM) to provide interpretation. The resultant Spiking Additive Model (SAM) has the generalization ability of SNN-t and the interpretable characteristics of GAM for multiclass problems. We also propose a post-processing method to enable better visualization of multiple shape functions of GAMs, towards better relative interpretation for multiclass classification problems. The post-processing method utilizes the properties of multiclass GAMs to visually modify the shape functions to establish the importance of the feature in a multiclass setting. We first evaluate the performance of SNN-t, trained with GradST and the SAM generated from it, on large public datasets. The SNN-t trained with GradST has better generalization accuracy than other SNN-t classifiers and consequently, the SAM generated from it has better generalization accuracy than other state-of-the-art multiclass GAMs. Improved accuracy in SAM implies more reliable interpretation. Then, we evaluate the proposed post-processing method for multiclass GAMs to provide relative interpretation. It is observed that relative interpretation of multiclass GAM is more meaningful and reliable.
• Modified gradient descent-based learning algorithm for a spiking neural classifier.
• Transformation of spiking neural classifier to an interpretable classifier.
• Improve generalization ability for reliable interpretation.
• Post-processing method for relative interpretation.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Nonlinear time effects of vegetation response to climate change: Evidence from Qilian Mountain National Park in China.
- Author
-
Li, Qiuran, Gao, Xiang, Li, Jie, Yan, An, Chang, Shuhang, Song, Xiaojiao, and Lo, Kevin
- Published
- 2024
- Full Text
- View/download PDF
33. Investigating the differences in driving mechanisms for phytoplankton community composition under various human disturbances in cold regions.
- Author
-
Zhang, Yongxin, Yu, Hongxian, Liu, Jiamin, and Guo, Yao
- Subjects
- *
ENVIRONMENTAL indicators , *PHYTOPLANKTON , *PRINCIPAL components analysis , *RANK correlation (Statistics) , *ELECTRIC conductivity ,COLD regions - Abstract
In river ecosystems, phytoplankton are essential components, and this study delves into their response to critical environmental indicators. We analyzed the spatiotemporal variations of phytoplankton diversity in cold regions, employing techniques such as Principal Component Analysis (PCA), Spearman correlation coefficient, and Generalized Additive Model (GAM) to unveil the critical environmental factors influencing phytoplankton diversity and habitat suitability under varying degrees of human interference. With 37 monitoring stations across two cold region rivers, phytoplankton and water bodies were monitored in spring, summer, and autumn, resulting in the collection and analysis of 111 phytoplankton samples and 13 environmental indicators at each station. By identifying critical water quality indicators and exploring nonlinear relationships, the study revealed intricate interactions between phytoplankton diversity and environmental factors. Our findings suggest that overall phytoplankton species diversity is lower in rivers with high human interference, with turbidity, NH4+-N, chlorophyll-a, and total phosphorus (TP) deemed critical factors. Conversely, chlorophyll-a, NH4+-N, and electrical conductivity (EC) were identified as important factors in rivers with lower human interference, indicating differences in phytoplankton driving mechanisms between the two rivers. These insights provide valuable references for the sustainable development and effective assessment of cold region river ecosystems in the Anthropocene.
• Under varying degrees of interference, differences exist in the composition of phytoplankton communities.
• Under different types of interference, key environmental factors exhibit similarities, yet differences persist.
• Building a GAM and adding interaction terms can improve model accuracy.
• The driving mechanisms of phytoplankton vary between rivers with different hydrological conditions.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Spatial constraints in cellular automata-based urban growth models: A systematic comparison of classifiers and input urban maps.
- Author
-
Bastos Moroz, Cassiano, Sieg, Tobias, and Thieken, Annegret H.
- Subjects
- *
METROPOLITAN areas , *HISTORICAL maps , *CITIES & towns , *RANDOM forest algorithms , *LOGISTIC regression analysis - Abstract
Spatial constraints are fundamental to integrating the spatial suitability to urbanization into Cellular Automata-based (CA) urban growth models, but there is a lack of consensus on the optimal methods for this purpose. This study compared the performance of three probabilistic classifiers to generate suitability surfaces for CA-based urban growth models: Logistic Regression using Generalized Linear Model (LR-GLM), Logistic Regression using Generalized Additive Model (LR-GAM), and Random Forest (RF). The study also evaluated the sensitivity of these classifiers to the input urban map adopted as a dependent variable. For this analysis, seven maps were tested: the historical urban map containing the entire extent of the urban footprint, and six additional maps containing only the recently urbanized areas over timeframes ranging from one year up to two decades. The comparison evaluated the goodness of fit of the suitability surfaces and the spatial accuracy of the urban growth simulations, using five large Brazilian cities as case study areas. The results revealed that the RF classifier significantly outperformed the LR-based classifiers. However, this advantage was more prominent when incorporating the new urban cells over the last one to two decades of growth as input urban maps. In addition, the sensitivity analysis of the input urban maps emphasized the benefits of calibrating the classifier using the recently urbanized cells rather than the historical urban extent. We consistently observed these results concerning classifiers and input urban maps across all five case study areas. Thus, the RF classifier combined with a training dataset containing the newly urbanized areas over at least the last 10 years systematically resulted in the suitability surfaces with the highest predictability among all tested scenarios.
• Highest predictability was achieved with the random forest classifier.
• Best calibration was obtained using the recently urbanized cells as input urban map.
• These research findings were consistent across all five case study areas.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. A spatial statistical approach to estimate bus stop demand using GIS-processed data.
- Author
-
Montero-Lamas, Yaiza, Fernández-Casal, Rubén, Varela-García, Francisco-Alberto, Orro, Alfonso, and Novales, Margarita
- Subjects
- *
PUBLIC transit , *BUS lines , *BUS stops , *URBAN planning , *CITIES & towns , *URBAN transit systems - Abstract
This study integrates the fields of geography, urban transit planning, and statistical learning to develop a sophisticated methodology for predicting bus demand at the stop level. It uses a Generalized Additive Model that captures non-linear relationships and incorporates spatial dependence, improving on traditional methods. It showcases a high predictive capacity with a pseudo R-squared of 0.79 during its validation, ensuring substantial explanatory power for new observations. A large number of variables, including land-use characteristics, socioeconomic factors, and transit supply, are analysed. These widely available predictors facilitate the transferability of the methodology to other urban areas. The transit supply predictor considers the number of annual trips per stop and area as well as the location of stops along the lines that serve them. GIS processing of the data allows the calculation of variables within the areas of influence of each stop, obtained by following the walkable street network. For the case study, the presence of universities, hospitals, and lodging areas, as well as inhabitants and ratio of bus trips, show a positive impact on bus demand. This geo-analysis process employs accurate disaggregated data, such as information on uses in each building, as well as methods for assigning socioeconomic information from local areas to residential buildings. This study highlights the complex relationship between the location of transit network stops, both along the bus line and in terms of geographical proximity, their transit supply, and their surrounding factors. The results indicate that there is spatial dependence for stops less than 1.15 km apart. The developed methodology provides reliable information to transit network planners for decision making. Specifically, this proposed methodology can contribute to designing new routes, optimizing stop locations, and estimating the impact of changes in the transit network or urban planning on bus demand. All these improvement measures promote sustainable urban mobility, consequently fostering environmental and social benefits.
• Integration of geography, transit planning, and statistics in bus demand prediction.
• Novel statistical approach with non-linear relationships and spatial dependence.
• Detailed GIS-based analysis, cleaning and processing of open data.
• Bus supply, socioeconomic data and land use analysis within stops' influence area.
• Consideration of actual walking distances from stops using walkable street networks.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Considering scale effects in water quality analysis to enhance the precision of influencing factor response analysis.
- Author
-
Wu, Haowei, Wu, Feng, Li, Zhihui, Gao, Xing, Wu, Xianhua, and Bao, Guangjing
- Subjects
- *
WATER quality , *FACTOR analysis , *WATER analysis , *QUALITY factor , *ELECTRIC conductivity , *WATERSHED management , *BUFFER zones (Ecosystem management) , *ELECTRICAL conductivity measurement - Abstract
• A framework of water quality factors analysis considering scale effect is developed.
• The optimal scales of influencing factors on water quality in YJRB are given.
• Optimal-scale non-linear effects of influencing factors on water quality are found.
• Thresholds for key factors that accelerate water quality degradation are clarified.
Accurately identifying the optimal spatial scales of effect analysis of influencing factors on water quality is crucial for effective water environment management. To address this, we propose a framework that consists of mixed scale division (watersheds, riparian buffers, and circular buffers), and conduct a case study in the Yuanjiang River Basin (YJRB). Scale effects and non-linear impacts of various influencing factors on water quality were identified. The case study results revealed that the variations of these water quality indicators were dominated by different influencing factors at various scales, such as by slope (SL) at the scale of riparian buffer (w) = 500 m and circular buffer (r) = 15 km (with a contribution percentage of 16.6 %), mean annual temperature (TE) at the watershed scale (23.4 %), annual precipitation (PR) at the scale of w = watershed and r = 15 km etc. The percentage of cultivated land (CL) > 28 % at the scale (w = 500 m, r = 50 km) will lead to an increase in total phosphorus (TP) and SL > 26° at the scale (w = 500 m, r = 15 km) will lead to an increase in pondus hydrogenii (pH). In contrast, POP > 30 person per unit area at the scale (w = 100 m, r = 1 km), SL > 24° at the scale (w = 500 m, r = 15 km), NDVI > 0.48 at the scale (w = 500 m, r = 5 km) and PR > 1300 mm at the scale (w = watershed, r = 15 km) will lead to a decrease in ammonia nitrogen (NH3-N), total nitrogen (TN), electrical conductivity (EC) and turbidity (NTU) respectively. Results indicated that the precision can be improved by regulating influencing factors within scale effects and considering the non-linear effects of factors on water quality. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. A unified framework of analyzing missing data and variable selection using regularized likelihood.
- Author
-
Bian, Yuan, Yi, Grace Y., and He, Wenqing
- Subjects
- *
MISSING data (Statistics) , *ELECTRONIC data processing - Abstract
Missing data arise commonly in applications, and research on this topic has received extensive attention in the past few decades. Various inference methods have been developed under different missing data mechanisms, including missing at random and missing not at random. The assessment of a feasible missing data mechanism is, however, difficult due to the lack of validation data. The problem is further complicated by the presence of spurious variables in covariates. Focusing on missingness in the response variable, a unified modeling scheme is proposed by utilizing the parametric generalized additive model to characterize various types of missing data processes. Taking the generalized linear model to facilitate the dependence of the response on the associated covariates, the concurrent estimation and variable selection procedures are developed using regularized likelihood, and the asymptotic properties for the resultant estimators are rigorously established. The proposed methods are appealing in their flexibility and generality; they circumvent the need of assuming a particular missing data mechanism that is required by most available methods. Empirical studies demonstrate that the proposed methods result in satisfactory performance in finite sample settings. Extensions to accommodating missingness in both the response and covariates are also discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Comparison of methods for predicting cow composite somatic cell counts.
- Author
-
Anglart, Dorota, Hallén-Sandgren, Charlotte, Emanuelson, Ulf, and Rönnegård, Lars
- Subjects
- *
MILK quality , *SOMATIC cells , *COWS , *HEALTH of cattle , *FORECASTING , *MACHINE learning , *MILKING - Abstract
One of the most common and reliable ways of monitoring udder health and milk quality in dairy herds is by monthly cow composite somatic cell counts (CMSCC). However, such sampling can be time consuming, and more automated sampling tools entail extra costs. Machine learning methods for prediction have been widely investigated in mastitis detection research, and CMSCC is normally used as a predictor or gold standard in such models. Predicted CMSCC between samplings could supply important information and be used as an input for udder health decision-support tools. To our knowledge, methods to predict CMSCC are lacking. Our aim was to find a method to predict CMSCC by using regularly recorded quarter milk data such as milk flow or conductivity. The milk data were collected at the quarter level for 8 wk when milking 372 Holstein-Friesian cows, resulting in a data set of 30,734 records with information on 87 variables. The cows were milked in an automatic milking rotary and sampled once weekly to obtain CMSCC values. The machine learning methods chosen for evaluation were the generalized additive model (GAM), random forest, and multilayer perceptron (MLP). For each method, 4 models with different predictor variable setups were evaluated: models based on 7-d lagged or 3-d lagged records before the CMSCC sampling and additionally for each setup but removing cow number as a predictor variable (which captures indirect information regarding cows' overall level of CMSCC based on previous samplings). The methods were evaluated by a 5-fold cross validation and predictions on future data using models with the 4 different variable setups. The results indicated that GAM was the superior model, although MLP was equally good when fewer data were used. Information regarding the cows' level of previous CMSCC was shown to be important for prediction, lowering prediction error in both GAM and MLP. We conclude that the use of GAM or MLP for CMSCC prediction is promising. 
[ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
39. Distinctive trajectories of the COVID-19 epidemic by age and gender: A retrospective modeling of the epidemic in South Korea.
- Author
-
Yu, Xinhua, Duan, Jiasong, Jiang, Yu, and Zhang, Hongmei
- Subjects
- *
COVID-19 , *NEGATIVE binomial distribution , *COVID-19 pandemic , *OLDER people , *EPIDEMICS - Abstract
• The epidemic dynamics of COVID-19 by age and gender groups in South Korea are different.
• The epidemic process of COVID-19 was driven by young people aged 20–39, with two peaks from February 19 to May 2, 2020.
• The epidemic among elderly people declined steadily after March 15, 2020, despite large fluctuations in daily new cases among young people.
• Elderly people can be effectively protected during the COVID-19 pandemic.
Elderly people had suffered a disproportionate burden of COVID-19. We hypothesized that males and females in different age groups might have different epidemic trajectories. Using publicly available data from South Korea, daily new COVID-19 cases were assessed using generalized additive models, assuming Poisson and negative binomial distributions. Epidemic dynamics by age and gender groups were explored using interactions between smoothed time terms and age and gender. A negative binomial distribution fitted the daily case counts best. The relationship between the dynamic patterns of daily new cases and age groups was statistically significant (p < 0.001), but this was not the case with gender groups. People aged 20–39 years led the epidemic processes in South Korean society with two peaks: one major peak around March 1 and a smaller peak around April 7, 2020. The epidemic process among people aged 60 or above trailed behind that of the younger age group, and with smaller magnitude. After March 15, there was a consistent decline of daily new cases among elderly people, despite large fluctuations in case counts among young adults. Although young people drove the COVID-19 epidemic throughout society, with multiple rebounds, elderly people could still be protected from infection after the peak of the epidemic. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
40. Projections for COVID-19 pandemic in India and effect of temperature and humidity.
- Author
-
Goswami, Kuldeep, Bharali, Sulaxana, and Hazarika, Jiten
- Abstract
Since COVID-19 has been declared a pandemic by the World Health Organization (WHO) and has spread throughout the world, investigation of this disease is essential. The objectives of this paper are to investigate the pattern in the occurrence of COVID-19, to check the influence of different meteorological factors on the incidence of COVID-19, and to predict the incidence of COVID-19. For trend analysis, Sen's Slope and the Mann-Kendall test have been used; a Generalized Additive Model (GAM) of regression has been used to check the influence of different meteorological factors on the incidence; and the Verhulst (Logistic) Population Model has been used to predict the frequency of COVID-19. A statistically significant linear trend was found for the daily confirmed cases of COVID-19. The regression analysis indicates that there is some influence of the interaction of average temperature (AT) and average relative humidity (ARH) on the incidence of COVID-19. However, this result is not consistent throughout the study area. The projections have been made up to 21st May, 2020. Trend and regression analysis give an idea of the incidence of COVID-19 in India, while the projections made by the Verhulst (Logistic) Population Model for the confirmed cases of the study area are encouraging, as the sample prediction is the same as the actual number of confirmed COVID-19 cases.
• Daily confirmed COVID-19 cases follow a linear trend in India.
• Average temperature and average relative humidity have some influence on the incidence of COVID-19.
• Verhulst (Logistic) Population Model gives promising prediction for daily confirmed cases of COVID-19.
[ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
41. Health effects of exposure to urban ambient particulate matter: A spatial-statistical study on 3rd-trimester pregnant women.
- Author
-
Yousefzadeh, Elham, Chamani, Atefeh, and Besalatpour, Aliasghar
- Subjects
MODIS (Spectroradiometer), PREGNANT women, PREGNANCY, PARTICULATE matter, UREA, GESTATIONAL age, BLOOD platelets
- Abstract
Pregnant women are highly vulnerable to environmental stressors such as ambient particulate matter (PM). Particularly during their 3rd trimester, their bodies undergo significant oxidative stresses. To further consolidate this dialogue into practice, the current study evaluated healthy pregnant women (n = 150 housewives; 18–40 years old; gestational age > 36 weeks) from the highly polluted city of Yazd, Iran, from September to November 2021. The aerosol optical depth (AOD) data retrieved from the Moderate Resolution Imaging Spectroradiometer (MODIS) were employed as influencing variables and validated using field-collected PM10 data (r = 0.62, p-value < 0.01). The links between blood platelet count, enzymes (SGOT, SGPT, LDH, bilirubin), metabolic products (urea and uric acid) and different combinations of AOD data were assessed using the Generalized Additive Model. The results showed a high temporal variability in AOD (0.94 ± 0.51) but a spatially stable distribution pattern. The mean AOD during the 3rd trimester, followed by that of the three-month peak, were identified as the most significant non-linear predictors, while the mean AOD during the 1st trimester and throughout the entire pregnancy showed no significant associations with any of the biomarkers. Considering the associations found between AOD variables and maternal oxidative stresses, urgent planning is required to improve the urban air quality for sensitive subpopulations. • Elevated ambient aerosol concentrations and advancing gestational age heighten maternal oxidative stress. • Maternal oxidative stress showed no sensitivity to aerosol concentration in early pregnancy. • Lactate dehydrogenase emerges as the most sensitive biomarker to aerosol concentration in pregnant women. • Aerosol concentration has no impact on blood bilirubin levels in pregnant women. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. Estimation of pore-water electrical conductivity in soilless tomatoes cultivation using an interpretable machine learning model.
- Author
-
Sodini, Mirko, Cacini, Sonia, Navarro, Alejandra, Traversari, Silvia, and Massa, Daniele
- Subjects
-
*MACHINE learning, *ELECTRIC conductivity, *TOMATO farming, *FERTIGATION, *HYDROPONICS, *TILLAGE, *TOMATOES
- Abstract
• Monitoring electrical conductivity at the root zone is crucial in soilless systems. • Three models to estimate the pore-water electrical conductivity (ECw) were tested. • The widely used Hilhorst equation performed poorly in soilless conditions. • The GAM and XGBoost significantly improved the estimation of ECw. • XGBoost was the best model, as also confirmed by the evaluation of Shapley values. Soilless culture is widely adopted for improving produce quality and yield and increasing input efficiency. Most of the benefits potentially achievable in soilless systems are possible through precise and continuous management and adjustment of plant nutrition. Under operational conditions, the electrical conductivity (EC) is the main parameter driving fertigation strategies, but its measurement in the drainage water may not be fully representative of the root zone in the growing medium. Low-cost sensors can now be adopted to measure bulk EC (ECb) in the substrate. The Hilhorst equation is commonly used to convert ECb into pore-water EC (ECw). This equation is widely calibrated for soil cultivation, but it does not perform properly for soilless substrates with high moisture content and water permittivity. In this work, two cultivation cycles of cherry tomato, managed in a closed-loop soilless system, were used to calibrate and validate two alternative models to the above equation (i.e., generalized additive model - GAM, and extreme gradient boost model - XGBoost). The models predicted ECw from the ECb recorded by substrate sensors. Plants were grown in rockwool using two different strategies for nutrient solution refill, achieving different ECw trends during cultivation. The Hilhorst equation confirmed its unsuitability for ECw prediction in soilless systems. ECw prediction through GAM was not satisfactory at low and high ECw values. XGBoost was the most suitable model for ECw estimation, particularly at extreme EC values. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. A generalized additive model (GAM) approach to principal component analysis of geographic data.
- Author
-
de Asís López, Francisco, Ordóñez, Celestino, and Roca-Pardiñas, Javier
- Abstract
Geographically Weighted Principal Component Analysis (GWPCA) is an extension of classical PCA to deal with the spatial heterogeneity of geographical data. This heterogeneity results in a variance–covariance matrix that is not stationary but changes with the geographical location. Despite its usefulness, this method presents some unsolved issues, such as finding an appropriate bandwidth (size of the vicinity) as a function of the retained components. In this work, we address the problem of calculating principal components for geographical data from a new perspective that overcomes this problem. Specifically, we propose a scale-location model which uses generalized additive models (GAMs) to calculate means for each variable and a correlation matrix that relates the variables, both depending on the spatial location. It should be noted that although we deal with geographic data, our methodology cannot be considered strictly spatial, since we assume that there is no spatial correlation structure in the error term. Our approach does not require calculating an optimal bandwidth as a function of the number of components retained in the analysis. Instead, the covariance matrix is estimated using smooth functions adapted to the data, so the smoothness can be different for each element of the matrix. The proposed methodology was tested with simulated data and compared with GWPCA. The proposed method gave a better representation of the data structure. Finally, we show the possibilities of our method in a problem with real data regarding air pollution and socioeconomic factors. • We solve PCA for geographical data using a location-scale model. • Variances and correlations are fitted using GAM. • No optimal bandwidth depending on the components retained needs to be calculated. • A simulation shows that our method outperforms GWPCA. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Effects of urbanization on winter wind chill conditions over China.
- Author
-
Lin, Lijie, Luo, Ming, Chan, Ting On, Ge, Erjia, Liu, Xiaoping, Zhao, Yongquan, and Liao, Weilin
- Abstract
Human-perceived wind chill describes the combined effects of wind velocity and low temperature, and is strongly related to human health and the natural environment. Although long-term trends in the air or ambient temperature over China under global warming have been well studied in the literature, the changes in human-perceived wind chill conditions, especially under possible urbanization effects, are still not completely known. This paper investigates the changes in wind chill over China and quantifies the associated urbanization effect by examining nearly 2000 meteorological stations during 1961–2014 using the generalized additive model (GAM). Results show that the winter wind chill temperature (WCT) in China exhibits a more prominent rising trend than the air temperature, i.e., 0.623 and 0.349 °C per decade, respectively. The wind speed (V) and wind chill days (WCD) decreased by 0.149 m/s and 1.970 days per decade, respectively. These trends become more substantial in densely populated and highly urbanized areas such as the North China Plain. The expansion of urban built-up area adds to the increase in WCT and to the decrease in WCD. On average, an increase from 0% to 100% in the urban fraction induced 0.290 ± 0.067 °C higher WCT (± denotes the 95% confidence interval), along with a reduction in V and WCD by 0.052 ± 0.014 m/s and 3.513 ± 0.387 days, respectively, whereas the presence of grassland and forest significantly diminishes the WCT and increases the WCD and surface V. It is expected that wind chill over China will tend to weaken under global warming and local urbanization in the near future. Our results have important implications for climate change mitigation, urban planning, landscape design, and air pollution abatement. • Wind chill temperature in China increased at +0.623 °C/10 yr during 1961–2014. • Effects of urbanization on wind chill are assessed by the generalized additive model. • Expansion of urban built-up area significantly increases wind chill temperature. • A change in built-up from 0% to 100% induces additional warming of 0.29 ± 0.067 °C. • Urban grassland and forest decrease wind chill temperature and increase wind speed. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
45. Effects of time-lagged meteorological variables on attributable risk of leishmaniasis in central region of Afghanistan.
- Author
-
Adegboye, Majeed A., Olumoh, Jamiu, Saffary, Timor, Elfaki, Faiz, and Adegboye, Oyelola A.
- Abstract
Leishmaniasis remains one of the world's most neglected vector-borne diseases, affecting predominantly poor communities, mainly in developing countries. Previous studies have shown that the distribution and dynamics of leishmaniasis infections are sensitive to environmental factors; however, there are no studies on the burden of leishmaniasis attributable to time-varying meteorological variables. This study used data from 3 major leishmaniasis-afflicted provinces of Afghanistan, between 2003 and 2009, to provide an empirical analysis of change in the heat/cold-leishmaniasis association. Non-linear and delayed exposure-lag-response relationships between meteorological variables and leishmaniasis were fitted with a distributed lag non-linear model applying a spline function which describes the dependency along the range of values with a lag of up to 12 months. We estimated the risk of leishmaniasis attributable to high and low temperature. The median monthly mean temperature and rainfall were 16.1 °C and 0.6 in., respectively. Seasonal variations of leishmaniasis were consistent between males and females; however, significant differences were observed among different age groups. Temperature effects were immediate and persistent (lag 0–12 months). The cumulative risks were highest at cold temperatures. The cumulative relative risks (logRR) for leishmaniasis were 6.16 (95% CI: 5.74–6.58) and 1.15 (95% CI: 1.32–1.31) associated with the 10th percentile temperature (2.16 °C) and the 90th percentile temperature (28.46 °C). The subgroup analysis showed increased risk for males as well as young and middle-aged people at cold temperatures; however, higher risk was observed for the elderly in heat. The overall leishmaniasis-temperature attributable fraction was estimated to be 7.6% (95% CI: 7.5%–7.7%) and mostly due to cold. Findings in this study highlight the non-linearity, delay of effects and magnitude of leishmaniasis risk associated with temperature. The disparity of risk between different subgroups can inform policy makers and assist in leishmaniasis control programs. • This study presents the burden of leishmaniasis attributable to time-varying meteorological variables using DLNM. • Temperature effects were immediate and persistent, and highest at cold temperatures. • Disparity of risk was observed in subgroup analysis. • Extreme cold temperatures exerted increased risk of leishmaniasis for males and younger people (aged <60 years). • Elderly people were the most sensitive to high temperature. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
46. Comparison between two GAMs in quantifying the spatial distribution of Hexagrammos otakii in Haizhou Bay, China.
- Author
-
Liu, Xiaoxiao, Wang, Jing, Zhang, Yunlei, Yu, Huaming, Xu, Binduo, Zhang, Chongliang, Ren, Yiping, and Xue, Ying
- Subjects
-
*SPECIES distribution, *MULTIPLE correspondence analysis (Statistics), *PREY availability, *MARINE organisms
- Abstract
Species distribution models (SDMs) can be used to quantify the relationships between species distribution and environmental variables. The predictive skill of SDMs depends on whether appropriate explanatory variables and intrinsic processes are included in the model. In addition to abiotic environmental variables, biotic variables could also have significant impacts on the spatial distribution of marine organisms. Correlations between some explanatory variables will cause multicollinearity, which could result in overfitting of models and erroneous projections/forecasts of species distribution. Application of dimension reduction techniques such as principal component analysis (PCA) could be used to retain important information and avoid collinearity. We compared the performance of the generalized additive model (GAM) and the PCA-based GAM in predicting the spatial distribution of Hexagrammos otakii in Haizhou Bay, incorporating abiotic and biotic variables in these models. Results showed that the PCA-based GAM was able to reduce the multicollinearity introduced by explanatory variables and improve the performance of GAMs, according to a cross-validation test and predicted species distribution. Incorporating prey abundance in PCA-based GAM could improve the predictive skill of SDMs. The method proposed in this study could be extended to other marine organisms to enhance our understanding of the ecological mechanisms underlying the distribution of target species. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
47. Holocene mercury accumulation calibrated by peat decomposition in a peat sequence from the Sanjiang Plain, northeast China.
- Author
-
Gao, Chuanyu, Zhang, Shaoqing, Li, Yunhui, Han, Dongxue, Liu, Hanxiang, and Wang, Guoping
- Subjects
-
*PEAT, *MERCURY, *PEAT soils, *LEAD in soils, *PLAINS, *PEATLANDS, *CHARCOAL
- Abstract
Peatlands are ideal archives that are widely used to reconstruct historical Hg accumulation around the world. However, decomposition of peat soils leads to Hg enrichment or depletion in peat profiles. To evaluate the impact of peat decomposition on historical Hg accumulation records, a 7800-year peat sequence from the Shenjiadian peatland (SJD-2, Sanjiang Plain, northeast China) was selected in this study. Based on the degree of peat humification and a generalized additive model (GAM), Hg accumulation rates (Hg ARs) were reconstructed and calibrated from the middle Holocene onward. The results showed that the Hg concentrations in the SJD-2 peat profile ranged from 11.9 to 55.3 ng g−1 and that the Hg AR ranged from 0.4 to 7.0 μg m−2 yr−1; these values for both parameters were lower than their corresponding values observed in other peatlands around the world. Peat decomposition led to Hg depletion in peat soils to some extent, and the GAM could be used to evaluate the impact of peat decomposition on historical Hg ARs in peat sequences based on the degree of peat humification and calibration of Hg ARs. Before the Industrial Revolution, anthropogenic Hg sources caused the calibrated Hg ARs in the SJD-2 peat profile to slightly increase around 1300 cal yr BP. Similar to other regions around the world, the calibrated Hg AR on the Sanjiang Plain also obviously increased from 3 to 8 μg m−2 yr−1 after global Hg emissions began to increase during the Industrial Revolution. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
48. How much can temporally stationary factors explain cellular automata-based simulations of past and future urban growth?
- Author
-
Feng, Yongjiu, Wang, Rong, Tong, Xiaohua, and Shafizadeh-Moghadam, Hossein
- Subjects
-
*URBAN growth, *CITIES & towns, *CELLULAR automata, *GROWTH factors
- Abstract
Driving factors are usually assumed to be temporally stationary in cellular automata (CA) based land use modeling, hence the persistence of their relationships. Therefore, major questions as to how much the temporally stationary factors explain past and future urban growth, and how long these factors can justify the projection of urban scenarios in the future, are worth further study. We selected seven explanatory driving factors to calibrate a DE-CA (differential evolution-based CA) model to simulate urban growth in Ningbo, China, during 2000–2015 and project nine scenarios of urban growth from 2015 to 2060. We evaluated the effects of factors on urban growth using generalized additive models (GAM) based on fitting statistics such as accumulative deviance explained (ADE). Our results show remarkable temporal change in factor effects on future urban growth: the ADE peaks at 34.7% in 2045 for the total projected urban growth since 2015, while that for every five years decreases continuously from 26.5% during 2000–2005 to 1.9% during 2050–2055, but slightly increases to 3.0% during 2055–2060. These results indicate that the stationary factors have weaker explanatory power for the new urban areas that are farther away from the existing built-up areas. The results suggest that a 30-year period in the future is most suitable to project the urban growth scenarios, where the new urban area approximates the initial urban area. The specific best period for scenario projection elsewhere can then be identified using the method presented in this study. • We calibrated a DE-CA model using temporally stationary factors. • We simulated and projected the urban growth in Ningbo from 2000 to 2060. • We evaluated the effects of factors on urban growth using generalized additive models. • The dominant factors that drive the future urban growth differ over time. • A 30-year period in the future is most suitable to project the urban growth scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
49. Isoscapes reveal patterns of δ13C and δ15N of pelagic forage fish and squid in the Northwest Pacific Ocean.
- Author
-
Ohshimo, Seiji, Madigan, Daniel J., Kodama, Taketoshi, Tanaka, Hiroshige, Komoto, Kaoru, Suyama, Satoshi, Ono, Tsuneo, and Yamakawa, Takashi
- Subjects
-
*FORAGE fishes, *PELAGIC fishes, *SQUIDS, *PREDATORY animals, *NITROGEN fixation, *CARBON fixation
- Abstract
• Isoscapes of δ13C and δ15N in fish and squid were estimated in the Northwest Pacific. • High and low δ13C values occurred in the sub-tropical and sub-arctic regions, respectively. • Higher values of δ15N were found in both tropical and temperate regions. • Lower values of δ15N were observed in sub-tropical and sub-arctic regions. • The isotopic heterogeneity can be used to understand the food web and the budgets. Isoscapes of stable isotope ratios of carbon (δ13C) and nitrogen (δ15N) in pelagic fish and squid were generated using 1967 measured values of forage fish and squid in the Northwest Pacific Ocean. We then used generalized additive models (GAMs) to assess the explanatory variables that best predicted regional fish and squid δ13C and δ15N values. A total of 522 squid and 1445 forage fish were analyzed for δ13C and δ15N. The explanatory variables used in GAM analyses were geographical parameters (longitude and latitude), body mass, season, category of organism (fish or squid), and vertical habitat. The resulting isoscapes of δ13C showed higher values in the sub-tropical region and lower values in the sub-arctic region. The isoscape patterns of δ15N were more complex; higher values were found in both tropical and temperate regions, and lower values were observed in sub-tropical and sub-arctic regions. These heterogeneities in isoscapes of δ13C and δ15N in the Northwest Pacific could be caused by productivity dynamics, ocean currents and upwelling, and the degree of atmospheric fixation of carbon and nitrogen. These new isoscapes can be used in conjunction with predator δ13C and δ15N values to evaluate movements and trophic dynamics of predatory marine animals, including tunas, billfish, sharks, seabirds, and marine mammals, in the Northwestern Pacific Ocean. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
50. A compositional approach for modelling SDG7 indicators: Case study applied to electricity access.
- Author
-
Marcillo-Delgado, J.C., Ortego, M.I., and Pérez-Foguet, A.
- Subjects
-
*RURAL families, *SUPPORT vector machines, *ELECTRICITY, *LOAD forecasting (Electric power systems), *REGRESSION analysis, *CASE studies, *DATA quality
- Abstract
Monitoring energy indicators has acquired a renewed interest with the 2030 Agenda for Sustainable Development, and specifically with goal 7 (SDG7), which seeks to guarantee universal access to energy. The predominant criteria to monitor SDG7 are given in a set of individual indicators. Along this line, the UN indicators proposed in the 47th session of the UN Statistical Commission are a practical starting point. A relevant characteristic of these indicators is that they can be expressed as proportions of a whole, i.e., they are compositions. Notably, directly applying traditional multivariate models to indicators that are proportions, without an intermediate process, can lead to spurious analysis. Here, we aim to assess the application of compositional data analysis (CoDa) to follow up on the temporal trend indicators of the energy sector in the context of SDG7, with a case study for the most affected areas addressing the problem of electricity access. Following CoDa methodology, we first use a log-ratio transformation to bring compositions to real space and then apply three multivariate methods: linear regression, generalized additive models and support vector machine. We also address other characteristic problems of the electricity access indicators, such as data quality, which was treated by considering models with interactions. In sum, CoDa facilitates a controlled management of the parts that make up population-based indicators, suggesting that modelling the evolution of compositions as individual components – even the standard splitting of country data into rural and urban "access to" indicators – should be avoided. • Three multivariate methods are applied to a compositional data description of electricity access indicators to model temporal trends. • Poor data quality of electricity access indicators is treated by considering models with interactions. • The approach facilitates a controlled management of the parts that make up population-based indicators. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF