32 results for "Bounded data"
Search Results
2. A model for bimodal rates and proportions.
- Author
- Vila, Roberto, Alfaia, Lucas, Menezes, André F.B., Çankaya, Mehmet N., and Bourguignon, Marcelo
- Subjects
- BETA distribution, REGRESSION analysis, DATA distribution
- Abstract
The beta model is the most important distribution for fitting data on the unit interval. However, the beta distribution is not suitable for modeling bimodal unit-interval data. In this paper, we propose a bimodal beta distribution constructed by using an approach based on the alpha-skew-normal model. We discuss several properties of this distribution, such as bimodality, real moments, entropies and identifiability. Furthermore, we propose a new regression model based on the proposed model and discuss residuals. Estimation is performed by maximum likelihood. A Monte Carlo experiment is conducted to evaluate the performance of these estimators in finite samples, with a discussion of the results. An application is provided to show the modelling competence of the proposed distribution when the data sets show bimodality. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. An Alternative to the Beta Regression Model with Applications to OECD Employment and Cancer Data
- Author
- Okorie, Idika E. and Afuecheta, Emmanuel
- Published
- 2024
- Full Text
- View/download PDF
4. Quasi-Cauchy Regression Modeling for Fractiles Based on Data Supported in the Unit Interval.
- Author
- de Oliveira, José Sérgio Casé, Ospina, Raydonal, Leiva, Víctor, Figueroa-Zúñiga, Jorge, and Castro, Cecilia
- Subjects
- QUANTILE regression, REGRESSION analysis, PROBABILITY density function, DATA modeling, HETEROSCEDASTICITY
- Abstract
A fractile is a location on a probability density function with the associated surface being a proportion of such a density function. The present study introduces a novel methodological approach to modeling data within the continuous unit interval using fractile or quantile regression. This approach has a unique advantage as it allows for a direct interpretation of the response variable in relation to the explanatory variables. The new approach provides robustness against outliers and permits heteroscedasticity to be modeled, making it a tool for analyzing datasets with diverse characteristics. Importantly, our approach does not require assumptions about the distribution of the response variable, offering increased flexibility and applicability across a variety of scenarios. Furthermore, the approach addresses and mitigates criticisms and limitations inherent to existing methodologies, thereby giving an improved framework for data modeling in the unit interval. We validate the effectiveness of the introduced approach with two empirical applications, which highlight its practical utility and superior performance in real-world data settings. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
5. Modelling count, bounded and skewed continuous outcomes in physical activity research: beyond linear regression models
- Author
- Muhammad Akram, Ester Cerin, Karen E. Lamb, and Simon R. White
- Subjects
- Count data, Skewed data, Bounded data, Physical activity, Linear regression model, Generalized linear model, Nutritional diseases. Deficiency diseases, RC620-627, Public aspects of medicine, RA1-1270
- Abstract
Background: Inference using standard linear regression models (LMs) relies on assumptions that are rarely satisfied in practice. Substantial departures, if not addressed, have serious impacts on any inference and conclusions, potentially rendering them invalid and misleading. Count, bounded and skewed outcomes, common in physical activity research, can substantially violate LM assumptions. A common approach to handle these is to transform the outcome and apply an LM. However, a transformation may not suffice. Methods: In this paper, we introduce the generalized linear model (GLM), a generalization of the LM, as an approach for the appropriate modelling of count and non-normally distributed (i.e., bounded and skewed) outcomes. Using data from a study of physical activity among older adults, we demonstrate appropriate methods to analyse count, bounded and skewed outcomes. Results: We show how fitting an LM when inappropriate, especially for the types of outcomes commonly encountered in physical activity research, substantially impacts the analysis, inference, and conclusions compared to a GLM. Conclusions: GLMs, which more appropriately model non-normally distributed response variables, should be considered more suitable approaches for managing count, bounded and skewed outcomes than simply relying on transformations. We recommend that physical activity researchers add the GLM to their statistical toolboxes and become aware of situations when GLMs are a better method than traditional approaches for modelling count, bounded and skewed outcomes.
- Published
- 2023
- Full Text
- View/download PDF
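The transform-versus-GLM distinction in this entry can be made concrete with a small simulation (an illustrative sketch, not the paper's analysis): for a skewed outcome, a linear model fitted to log-transformed values targets E[log Y], while a log-link GLM targets log E[Y], and the two quantities differ.

```python
import math
import random

random.seed(42)

# Skewed outcome: log-normal(0, 1), a stand-in for e.g. activity durations.
ys = [math.exp(random.gauss(0.0, 1.0)) for _ in range(200_000)]

# Transform-then-LM estimates E[log Y] (approximately 0 here).
mean_of_logs = sum(math.log(y) for y in ys) / len(ys)

# A log-link GLM instead models log E[Y] (approximately 0.5 here).
log_of_mean = math.log(sum(ys) / len(ys))

print(mean_of_logs, log_of_mean)  # the two targets clearly differ
```

This is Jensen's inequality at work: for a log-normal(0, 1) outcome, E[log Y] = 0 but log E[Y] = 0.5, so back-transformed coefficients from a log-LM answer a different question than GLM coefficients.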
6. The Unit-Gompertz Quantile Regression Model for the Bounded Responses.
- Author
- Mazucheli, Josmar, Alves, Bruna, and Korkmaz, Mustafa Ç.
- Subjects
- QUANTILE regression, REGRESSION analysis, MONTE Carlo method, DISTRIBUTION (Probability theory), RANDOM numbers, GENERATING functions
- Abstract
This paper proposes a regression model for continuous responses bounded to the unit interval, based on the unit-Gompertz distribution, as an alternative to quantile regression models based on the unit-Birnbaum-Saunders, unit-Weibull, L-Logistic, Kumaraswamy and Johnson SB distributions. Re-parameterizing the unit-Gompertz distribution as a function of its quantile allows us to model the effect of covariates across the entire response distribution, rather than only at the mean. Our proposal sometimes outperforms the other distributions available in the literature. These findings are supported by Monte Carlo simulations and an application using a real data set. An R package, including parameter estimation, model checking, and the density, cumulative distribution, quantile and random number generating functions of the unit-Gompertz distribution, is developed and can be readily used in applications. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
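The quantile reparameterization above rests on the unit-Gompertz having a closed-form quantile function. A hedged sketch follows, assuming the cdf F(y) = exp(-alpha * (y^(-beta) - 1)) on (0, 1) as in earlier unit-Gompertz work; `alpha` and `beta` are generic parameter names, not the paper's notation:

```python
import math

# Assumed unit-Gompertz cdf on (0, 1); alpha, beta > 0.
def ug_cdf(y: float, alpha: float, beta: float) -> float:
    return math.exp(-alpha * (y ** (-beta) - 1.0))

# Inverting the cdf gives the quantile function Q(tau) used to
# reparameterize the model in terms of a chosen quantile.
def ug_quantile(tau: float, alpha: float, beta: float) -> float:
    return (1.0 - math.log(tau) / alpha) ** (-1.0 / beta)

# Round trip: the tau-quantile maps back to probability tau.
q = ug_quantile(0.5, 2.0, 3.0)
assert abs(ug_cdf(q, 2.0, 3.0) - 0.5) < 1e-9
```

Solving Q(tau) = mu for alpha expresses the distribution in terms of the tau-th quantile mu, which is the kind of reparameterization the abstract describes.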
8. The Multivariate Skewed Log-Birnbaum–Saunders Distribution and Its Associated Regression Model.
- Author
- Martínez-Flórez, Guillermo, Vergara-Cardozo, Sandra, Tovar-Falón, Roger, and Rodriguez-Quevedo, Luisa
- Subjects
- SKEWNESS (Probability theory), REGRESSION analysis, MARGINAL distributions, FISHER information, MAXIMUM likelihood statistics, INFERENTIAL statistics
- Abstract
In this article, a multivariate extension of the unit-sinh-normal (USHN) distribution is presented. The new distribution, which is obtained from the conditionally specified distributions methodology, is absolutely continuous, and its marginal distributions are univariate USHN. The properties of the multivariate USHN distribution are studied in detail, and statistical inference is carried out from a classical approach using the maximum likelihood method. The new multivariate USHN distribution is suitable for modeling bounded data, especially in the (0, 1)^p region. In addition, the proposed distribution is extended to the case of the regression model and, for the latter, the Fisher information matrix is derived. The numerical results of a small simulation study and two applications with real data sets allow us to conclude that the proposed distribution, as well as its extension to regression models, is potentially useful for analyzing proportion, rate, or index data when it is of interest to model them jointly, allowing for the different degrees of correlation that may exist among the study variables. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
9. A resource occupancy ratio-oriented load balancing task scheduling mechanism for Flink.
- Author
- Dai, Qinglong, Qin, Guangjun, Li, Jianwu, Zhao, Jun, and Cai, Jifan
- Subjects
- DISTRIBUTED computing, SCHEDULING, QUALITY of service, ELECTRONIC data processing, QUADRATIC programming
- Abstract
Flink is regarded as a promising distributed data processing engine for unifying bounded and unbounded data. Unbalanced workloads across the multiple workers/task managers/servers in Flink cause congestion, which degrades quality of service (QoS); a balanced load distribution can efficiently improve QoS. Besides, existing works lag behind the current Flink version. To distribute workloads evenly across workers, a resource-oriented load balancing task scheduling (RoLBTS) mechanism for Flink is proposed. The capacities of CPU, memory, and bandwidth are taken into consideration. Based on the barrel principle, the memory and the bandwidth are selected to model the resource occupancy ratio of the physical node and of the physical link, respectively. Based on the modeled resource occupancy ratio, load-balancing resource usage in Flink is formulated as a quadratic programming problem. Based on self-recursive calling, a RoLBTS algorithm for scheduling task-needed resources is presented. Through numerical simulation, the superiority of our work is evaluated in terms of resource score, the number of possible scheduling solutions, and resource usage ratio. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
10. An Alternative Lambert-Type Distribution for Bounded Data.
- Author
- Varela, Héctor, Rojas, Mario A., Reyes, Jimmy, and Iriarte, Yuri A.
- Subjects
- DATA distribution, BETA distribution, MAXIMUM likelihood statistics, PARAMETER estimation, PERCENTILES, KURTOSIS
- Abstract
In this article, we propose a new two-parameter distribution for bounded data such as rates, proportions, or percentages. The density function of the proposed distribution, presenting monotonic, unimodal, and inverse-unimodal shapes, tends to a positive finite value at the lower end of its support, which can lead to a better fit of the lower empirical quantiles. We derive some of the main structural properties of the new distribution and describe its skewness and kurtosis. We discuss parameter estimation under the maximum likelihood method and develop a simulation study to evaluate the behavior of the estimators. Finally, we present two applications to real data providing evidence that the proposed distribution can perform better than the popular beta and Kumaraswamy distributions. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
11. Zero and One Inflated Item Response Theory Models for Bounded Continuous Data.
- Author
- Molenaar, Dylan, Cúri, Mariana, and Bazán, Jorge L.
- Subjects
- ITEM response theory, MODEL theory, STIMULUS & response (Psychology), DATA distribution
- Abstract
Bounded continuous data are encountered in many applications of item response theory, including the measurement of mood, personality, and response times and in the analyses of summed item scores. Although different item response theory models exist to analyze such bounded continuous data, most models assume the data to be in an open interval and cannot accommodate data in a closed interval. As a result, ad hoc transformations are needed to prevent scores on the bounds of the observed variables. To motivate the present study, we demonstrate in real and simulated data that this practice of fitting open interval models to closed interval data can majorly affect parameter estimates even in cases with only 5% of the responses on one of the bounds of the observed variables. To address this problem, we propose a zero and one inflated item response theory modeling framework for bounded continuous responses in the closed interval. We illustrate how four existing models for bounded responses from the literature can be accommodated in the framework. The resulting zero and one inflated item response theory models are studied in a simulation study and a real data application to investigate parameter recovery, model fit, and the consequences of fitting the incorrect distribution to the data. We find that neglecting the bounded nature of the data biases parameters and that misspecification of the exact distribution may affect the results depending on the data generating model. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
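The zero-and-one inflation idea in this entry can be sketched as a mixture: point masses at the closed-interval bounds plus a continuous density on the open interval. The Beta component below is purely illustrative; the paper's framework accommodates several bounded-response models, not just the Beta.

```python
import math

def beta_pdf(y: float, a: float, b: float) -> float:
    """Beta(a, b) density on (0, 1)."""
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return y ** (a - 1) * (1.0 - y) ** (b - 1) / B

def zoib_density(y: float, p0: float, p1: float, a: float, b: float) -> float:
    # Point masses at the bounds of the closed interval [0, 1] ...
    if y == 0.0:
        return p0
    if y == 1.0:
        return p1
    # ... plus a rescaled continuous component on the open interval.
    return (1.0 - p0 - p1) * beta_pdf(y, a, b)
```

The continuous part integrates to 1 - p0 - p1, so responses landing exactly on a bound are explained by the inflation probabilities rather than by an ad hoc transformation that pushes them off the bound.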
12. A case for beta regression in the natural sciences
- Author
- Emilie A. Geissinger, Celyn L. L. Khoo, Isabella C. Richmond, Sally J. M. Faulkner, and David C. Schneider
- Subjects
- angular transformation, arcsine square root, beta regression, bounded data, elemental composition, NMR, Ecology, QH540-549.5
- Abstract
Data in the natural sciences are often in the form of percentages or proportions that are continuous and bounded by 0 and 1. Statistical analysis assuming a normal error structure can produce biased and incorrect estimates when data are doubly bounded. Beta regression uses an error structure appropriate for such data. We conducted a literature review of percent and proportion data from 2004 to 2020 to determine the types of analyses used for (0, 1) bounded data. Our literature review showed that before 2012, angular transformations accounted for 93% of analyses of proportion or percent data. After 2012, angular transformation accounted for 52% of analyses and beta regression accounted for 14% of analyses. We compared a linear model with angular transformation with beta regression using data from two fields of the natural sciences that produce continuous, bounded data: biogeochemistry and ecological elemental composition. We found little difference in model diagnostics, likelihood ratios, and p‐values between the two models. However, we found substantially different coefficient estimates from the back‐calculated beta regression and angular transformation models. Beta regression provides reliable parameter estimates in natural science studies where effect sizes are considered as important as hypothesis testing.
- Published
- 2022
- Full Text
- View/download PDF
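The back-calculation issue this entry describes can be seen in miniature: averaging on the angular (arcsine square root) scale and transforming back does not recover the arithmetic mean proportion. This is an illustrative sketch with made-up proportions, not the paper's data.

```python
import math

props = [0.05, 0.10, 0.20, 0.40]

# Angular transformation used by the classic linear-model approach.
angular = [math.asin(math.sqrt(p)) for p in props]

mean_angular = sum(angular) / len(angular)
back_transformed = math.sin(mean_angular) ** 2   # back-calculated estimate
plain_mean = sum(props) / len(props)             # 0.1875 exactly

# The back-calculated value is pulled below the arithmetic mean,
# which is why back-transformed coefficients can mislead.
assert back_transformed < plain_mean
print(round(back_transformed, 4), plain_mean)
```

Because the transform is concave over most of (0, 1), the back-transformed mean is systematically shrunk; beta regression avoids the issue by modeling the mean on its natural scale through a link function.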
14. PERFORMANCE ANALYSIS IN THE PRESENCE OF BOUNDED, DISCRETE, AND FLEXIBLE MEASURES.
- Author
- KORDROSTAMI, SOHRAB and SAYYAD NOVEIRI, MONIREH JAHANI
- Subjects
- DATA envelopment analysis
- Abstract
In conventional data envelopment analysis (DEA) models, the relative efficiency of decision-making units (DMUs) is evaluated while all measures with certain input and/or output status are considered as continuous data without upper and/or lower bounds. However, there are occasions in real-world applications where the efficiency of firms must be assessed while bounded elements, discrete values, and flexible measures are present. For this purpose, the current study proposes DEA-based approaches to estimate the relative efficiency of DMUs where bounded factors, integer values, and flexible measures exist. To illustrate this, radial models based on two aspects, individual and aggregate, are introduced to measure the performance of entities and to handle the status of the flexible measure in the presence of bounded components and discrete data. Applications of the proposed approaches in the areas of quality management, highway maintenance patrols, and university performance measurement are given to clarify the issue and to show their practicability. It was found that the introduced procedure can determine practical projection points for bounded measures and integer values (from the individual DMU viewpoint) and can classify flexible measures along with evaluating the relative efficiency of DMUs. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
15. Multiple imputation strategies for a bounded outcome variable in a competing risks analysis.
- Author
- Curnow, Elinor, Hughes, Rachael A., Birnie, Kate, Crowther, Michael J., May, Margaret T., and Tilling, Kate
- Subjects
- COMPETING risks, HEMATOPOIETIC stem cell transplantation, RISK assessment, TIME perception, MISSING data (Statistics)
- Abstract
In patient follow‐up studies, events of interest may take place between periodic clinical assessments and so the exact time of onset is not observed. Such events are known as "bounded" or "interval‐censored." Methods for handling such events can be categorized as either (i) applying multiple imputation (MI) strategies or (ii) taking a full likelihood‐based (LB) approach. We focused on MI strategies, rather than LB methods, because of their flexibility. We evaluated MI strategies for bounded event times in a competing risks analysis, examining the extent to which interval boundaries, features of the data distribution and substantive analysis model are accounted for in the imputation model. Candidate imputation models were predictive mean matching (PMM); log‐normal regression with postimputation back‐transformation; normal regression with and without restrictions on the imputed values and Delord and Genin's method based on sampling from the cumulative incidence function. We used a simulation study to compare MI methods and one LB method when data were missing at random and missing not at random, also varying the proportion of missing data, and then applied the methods to a hematopoietic stem cell transplantation dataset. We found that cumulative incidence and median event time estimation were sensitive to model misspecification. In a competing risks analysis, we found that it is more important to account for features of the data distribution than to restrict imputed values based on interval boundaries or to ensure compatibility with the substantive analysis by sampling from the cumulative incidence function. We recommend MI by type 1 PMM. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
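Predictive mean matching, the strategy this entry recommends, can be sketched as follows (a generic illustration only; the paper's type-1 variant and the competing-risks details are not reproduced here): for each missing case, candidate donors are the observed cases whose predicted values lie closest, and one donor's observed value is sampled.

```python
import random

def pmm_impute(pred_missing, preds_obs, ys_obs, k=5, rng=None):
    """Donate an observed outcome from one of the k observed cases
    whose predicted values are nearest to the missing case's
    prediction (generic PMM sketch)."""
    rng = rng or random.Random(0)
    donors = sorted(zip(preds_obs, ys_obs),
                    key=lambda pair: abs(pair[0] - pred_missing))[:k]
    return rng.choice(donors)[1]

# With k=1 the single closest donor's observed value is returned.
imputed = pmm_impute(2.1, [1.0, 2.0, 3.0, 10.0],
                     [10.0, 20.0, 30.0, 100.0], k=1)
assert imputed == 20.0
```

Because the imputed value is always a real observed outcome, PMM automatically respects bounds and other distributional features of the data, which is consistent with the paper's finding that matching distributional features matters more than explicitly restricting imputed values to interval boundaries.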
16. Flexible quasi-beta regression models for continuous bounded data.
- Author
- Bonat, Wagner H, Petterle, Ricardo R, Hinde, John, and Demétrio, Clarice GB
- Subjects
- REGRESSION analysis, DATA distribution, PARAMETER estimation, BETA distribution, DATA
- Abstract
We propose a flexible class of regression models for continuous bounded data based on second-moment assumptions. The mean structure is modelled by means of a link function and a linear predictor, while the mean and variance relationship has the form φμ^p(1 − μ)^p, where μ, φ and p are the mean, dispersion and power parameters respectively. The models are fitted by using an estimating function approach where the quasi-score and Pearson estimating functions are employed for the estimation of the regression and dispersion parameters respectively. The flexible quasi-beta regression model can automatically adapt to the underlying bounded data distribution by the estimation of the power parameter. Furthermore, the model can easily handle data with exact zeroes and ones in a unified way and has the Bernoulli mean and variance relationship as a limiting case. The computational implementation of the proposed model is fast, relying on a simple Newton scoring algorithm. Simulation studies, using datasets generated from simplex and beta regression models show that the estimating function estimators are unbiased and consistent for the regression coefficients. We illustrate the flexibility of the quasi-beta regression model to deal with bounded data with two examples. We provide an R implementation and the datasets as supplementary materials. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
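The second-moment specification above is compact enough to state directly. This sketch covers only the variance function φμ^p(1 − μ)^p; the full estimating-function fit lives in the authors' R implementation.

```python
def quasi_beta_variance(mu: float, phi: float, p: float) -> float:
    """Variance function phi * mu^p * (1 - mu)^p of the flexible
    quasi-beta model; p = 1 recovers the scaled Bernoulli-type
    mean-variance relationship phi * mu * (1 - mu)."""
    return phi * mu ** p * (1.0 - mu) ** p

# Variance shrinks toward the bounds of the unit interval ...
assert quasi_beta_variance(0.5, 1.0, 1.0) > quasi_beta_variance(0.05, 1.0, 1.0)
# ... and p = 1 matches the Bernoulli-type form.
assert abs(quasi_beta_variance(0.3, 2.0, 1.0) - 2.0 * 0.3 * 0.7) < 1e-12
```

Estimating p from the data is what lets the model adapt between beta-like and Bernoulli-like mean-variance behaviour, including data with exact zeroes and ones.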
17. Complementary Beta Regression Model for Fitting Bounded Data
- Author
- Menezes, André F. B., Bourguignon, Marcelo, and Mazucheli, Josmar
- Published
- 2022
- Full Text
- View/download PDF
18. Bivariate unit-Birnbaum-Saunders distribution
- Author
- Rodríguez Quevedo, Luisa Paulina, Vergara Cardozo, Sandra, and Martínez Flórez, Guillermo
- Subjects
- Unit-Sinh-Normal distribution, Multivariate regression model, Statistical hypothesis testing, Conditionally specified, Bivariate unit-Birnbaum-Saunders distribution, Mathematical statistics, Multivariate log-Birnbaum-Saunders distribution, Bounded data, 519 - Probabilities and applied mathematics [510 - Mathematics]
- Abstract
The unit-Birnbaum-Saunders (UBS) distribution [Mazucheli et al., 2018a] has support on the interval (0, 1), which is why it is used successfully in the modeling of rates and indicators. Two new bivariate distributions are presented, the bivariate unit-Birnbaum-Saunders distribution (BVUBS) and the bivariate unit-Sinh-Normal Birnbaum-Saunders distribution (BVUSHN), together with the corresponding regression model for the case of covariates, using the concept of conditionally specified distributions. These distributions are capable of modeling rates and proportions on the unit square and provide a better fit to data compared with other distributions. Some general properties of the models, expected values, inference by maximum likelihood, and an application to real data are also presented. Alongside this master's thesis, the article The Multivariate Skewed Log-Birnbaum–Saunders Distribution and Its Associated Regression Model [Martínez-Flórez et al., 2023] is published, which focused on the multivariate extension of the unit-Sinh-Normal distribution, studying in detail the properties of the distribution and statistical inference; a simulation study associated with the regression model and two applications with real data are included, concluding that they are potentially useful for modeling proportion, rate, or index data. Master's thesis: Magíster en Ciencias - Estadística.
- Published
- 2023
19. Sidelining the Mean: The Relative Variability Index as a Generic Mean-Corrected Variability Measure for Bounded Variables.
- Author
- Mestdagh, Merijn, Pe, Madeline, Pestman, Wiebe, Verdonck, Stijn, Kuppens, Peter, and Tuerlinckx, Francis
- Abstract
Variability indices are a key measure of interest across diverse fields, in and outside psychology. A crucial problem for any research relying on variability measures however is that variability is severely confounded with the mean, especially when measurements are bounded, which is often the case in psychology (e.g., participants are asked "rate how happy you feel now between 0 and 100?"). While a number of solutions to this problem have been proposed, none of these are sufficient or generic. As a result, conclusions on the basis of research relying on variability measures may be unjustified. Here, we introduce a generic solution to this problem by proposing a relative variability index that is not confounded with the mean by taking into account the maximum possible variance given an observed mean. The proposed index is studied theoretically and we offer an analytical solution for the proposed index. Associated software tools (in R and MATLAB) have been developed to compute the relative index for measures of standard deviation, relative range, relative interquartile distance and relative root mean squared successive difference. In five data examples, we show how the relative variability index solves the problem of confound with the mean, and document how the use of the relative variability measure can lead to different conclusions, compared with when conventional variability measures are used. Among others, we show that the variability of negative emotions, a core feature of patients with borderline disorder, may be an effect solely driven by the mean of these negative emotions. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
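The core idea of this entry, dividing observed variability by the maximum variability attainable at the observed mean, can be sketched as below. This sketch uses the population-level maximum variance m(1 − m) for a [0, 1]-bounded variable (attained by a two-point distribution on the bounds); the authors derive exact finite-sample maxima and ship R and MATLAB tools, neither of which is reproduced here.

```python
def relative_sd(xs, lo=0.0, hi=1.0):
    """Observed standard deviation divided by the maximum standard
    deviation attainable for a [lo, hi]-bounded variable with the
    same mean. Population-level sketch; assumes the mean lies
    strictly inside the bounds."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    m01 = (mean - lo) / (hi - lo)            # mean rescaled to [0, 1]
    max_var = m01 * (1.0 - m01) * (hi - lo) ** 2
    return (var / max_var) ** 0.5

# All mass on the bounds attains the maximum: relative SD of 1.
assert abs(relative_sd([0.0, 1.0, 0.0, 1.0]) - 1.0) < 1e-12
# A low-mean sample is judged relative to what its mean permits.
print(relative_sd([0.0, 0.1, 0.2, 0.1]))
```

Dividing by the mean-dependent maximum is what removes the confound: a raw SD of 0.05 means something very different at a mean of 0.1 than at a mean of 0.5.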
20. Interval And Ordinal Data : How Standard Linear DEA Model Treats Imprecise Data
- Author
- Chen, Yao and Zhu, Joe; editors: Zhu, Joe and Cook, Wade D.
- Published
- 2007
- Full Text
- View/download PDF
21. TO LOGIT OR NOT TO LOGIT DATA IN THE UNIT INTERVAL: A SIMULATION STUDY
- Author
- Hamzat, Kayode Idris
- Subjects
- bounded data, logit, beta regression, quantile regression, Applied Statistics, simulation study, unit interval, Statistical Methodology, Statistical Theory
- Abstract
In this paper, we recommend a mechanism for determining whether or not to logit-transform data in the unit interval, based on quantile estimation of data between 0 and 1. Using a simulated dataset generated from a Beta regression model, the estimated quantiles for this model perform better than those based on linear quantile regression (LQR) with a logit transformation. Further, we investigate the performance of the quantile regression estimators based on the LQR and conclude that they are better than those based on the Beta regression when the distribution is contaminated with 10% uniform numbers between 0 and 1. The proposed recommendation is that we can use logit-transformation LQR if (1) we are dealing with quantile estimation in data between 0 and 1, (2) we ascertain that the data fit well to the contemplated bounded-data regressions (whether Beta regression or otherwise), and (3) the fit of the model is suspect.
- Published
- 2022
22. Kernel density estimation with bounded data.
- Author
-
Kang, Young-Jin, Noh, Yoojeong, and Lim, O-Kaung
- Subjects
- *
KERNEL functions , *THEORY of distributions (Functional analysis) , *STATISTICAL models , *COMPUTER-aided engineering , *ELECTRIC power , *ELECTRIC motors - Abstract
The uncertainties of input variables are quantified as probabilistic distribution functions using parametric or nonparametric statistical modeling methods for reliability analysis or reliability-based design optimization. However, parametric statistical modeling methods such as the goodness-of-fit test and the model selection method are inaccurate when the number of data points is very small or the input variables do not have parametric distributions. To deal with this problem, kernel density estimation with bounded data (KDE-bd) and KDE with estimated bounded data (KDE-ebd), which randomly generate bounded data within given input variable intervals for the given data and use them to construct density functions, are proposed in this study. Since KDE-bd and KDE-ebd use input variable intervals, they converge to the population distribution better than the original KDE does, especially for small numbers of given data points. KDE-bd can even deal with a problem that has a single data point with input variable bounds. To verify the proposed method, statistical simulation tests were carried out for various numbers of data points using multiple distribution types, and KDE-bd and KDE-ebd were then compared with the KDE. The results showed KDE-bd and KDE-ebd to be more accurate than the original KDE, especially when the number of data points is less than 10. They are also more robust than the original KDE regardless of the quality of the given data, and are therefore more useful even when the data for the input variables are insufficient. [ABSTRACT FROM AUTHOR]
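The boundary problem that KDE-bd and KDE-ebd target can be illustrated with the classical reflection correction, a simpler standard technique (not the authors' method): reflecting each data point about the known bounds keeps the estimated density's mass inside the support. A sketch, with a hypothetical `reflect_kde` helper:

```python
import numpy as np

def reflect_kde(x_grid, data, h, a=0.0, b=1.0):
    """Gaussian KDE with boundary reflection at both endpoints of [a, b].
    Reflecting each data point about a and b removes the boundary bias of
    the plain estimator, which would leak mass outside the support."""
    data = np.asarray(data, dtype=float)

    def gauss(u):
        return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

    pts = x_grid[:, None]
    k = (gauss((pts - data) / h)
         + gauss((pts - (2 * a - data)) / h)    # reflect about the lower bound
         + gauss((pts - (2 * b - data)) / h))   # reflect about the upper bound
    return k.sum(axis=1) / (len(data) * h)

rng = np.random.default_rng(1)
data = rng.beta(0.5, 0.5, size=200)   # mass piles up near both bounds
grid = np.linspace(0.0, 1.0, 201)
dens = reflect_kde(grid, data, h=0.05)

# Trapezoid rule: the density integrates to ~1 over [0, 1], which a plain
# Gaussian KDE would fail to do for boundary-heavy data.
mass = ((dens[:-1] + dens[1:]) / 2 * np.diff(grid)).sum()
print(round(mass, 3))
```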
- Published
- 2018
- Full Text
- View/download PDF
23. Improving estimation for beta regression models via EM-algorithm and related diagnostic tools.
- Author
-
Barreto-Souza, Wagner and Simas, Alexandre B.
- Subjects
- *
REGRESSION analysis , *PARAMETER estimation , *MONTE Carlo method , *ALGORITHMS , *STOCHASTIC processes - Abstract
In this paper we propose an alternative procedure for estimating the parameters of the beta regression model. This alternative estimation procedure is based on the EM-algorithm. For this, we took advantage of the stochastic representation of the beta random variable through the ratio of independent gamma random variables. We present a complete approach based on the EM-algorithm. More specifically, this approach includes point and interval estimation and diagnostic tools for detecting outlying observations. As will be illustrated in this paper, the EM-algorithm approach provides a better estimate of the precision parameter when compared to the direct maximum likelihood (ML) approach. We present the results of Monte Carlo simulations comparing the EM-algorithm and direct ML. Finally, two empirical examples illustrate the full EM-algorithm approach for the beta regression model. This paper contains a Supplementary Material. [ABSTRACT FROM AUTHOR]
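The stochastic representation underpinning the EM construction is easy to check numerically: if G1 ~ Gamma(a, 1) and G2 ~ Gamma(b, 1) are independent, then G1/(G1+G2) ~ Beta(a, b). A quick simulation (illustrative only):

```python
import numpy as np

# If G1 ~ Gamma(a, 1) and G2 ~ Gamma(b, 1) are independent, then
# G1 / (G1 + G2) ~ Beta(a, b); the EM approach exploits this by treating
# the gamma components as latent variables.
rng = np.random.default_rng(42)
a, b = 2.0, 5.0
g1 = rng.gamma(a, 1.0, size=200_000)
g2 = rng.gamma(b, 1.0, size=200_000)
z = g1 / (g1 + g2)

print(z.mean(), a / (a + b))                          # Beta mean: a/(a+b)
print(z.var(), a * b / ((a + b)**2 * (a + b + 1)))    # Beta variance
```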
- Published
- 2017
- Full Text
- View/download PDF
24. Bounded and discrete data and Likert scales in data envelopment analysis: application to regional energy efficiency in China.
- Author
-
Chen, Ya, Cook, Wade, Du, Juan, Hu, Hanhui, and Zhu, Joe
- Subjects
- *
ENERGY consumption , *DATA , *STATISTICS , *POWER resources - Abstract
In data envelopment analysis (DEA), it is usually assumed that all data are continuous and not restricted by upper and/or lower bounds. However, there are situations where data are discrete and/or bounded, and where projections arising from DEA models are required to fall within those bounds. Such situations can be found, for example, in cases where percentage data are present and where projected percentages must not exceed the requisite 100% limit. Other examples include Likert scale data. Using existing integer DEA approaches as a backdrop, the current paper presents models for dealing with bounded and discrete data. Our proposed models address the issue of constraining DEA projections to fall within imposed bounds. It is shown that Likert scale data can be modeled using the proposed approach. The proposed DEA models are used to evaluate the energy efficiency of 29 provinces in China. [ABSTRACT FROM AUTHOR]
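For readers unfamiliar with DEA, the classical input-oriented CCR envelopment model, which the paper's bounded and discrete models extend, is a small linear program per decision-making unit. A hedged sketch using `scipy.optimize.linprog` (without the paper's bound or integrality constraints; data are made up):

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """Input-oriented CCR efficiency of unit o.
    X: (n_units, n_inputs), Y: (n_units, n_outputs).
    Solves: min theta  s.t.  sum_j lam_j * x_j <= theta * x_o,
                             sum_j lam_j * y_j >= y_o,  lam >= 0."""
    n = X.shape[0]
    c = np.zeros(n + 1)
    c[0] = 1.0                                   # minimise theta
    rows, rhs = [], []
    for i in range(X.shape[1]):                  # input constraints
        rows.append(np.concatenate(([-X[o, i]], X[:, i])))
        rhs.append(0.0)
    for r in range(Y.shape[1]):                  # output constraints
        rows.append(np.concatenate(([0.0], -Y[:, r])))
        rhs.append(-Y[o, r])
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.x[0]

X = np.array([[2.0], [4.0], [8.0]])   # single input per unit
Y = np.array([[2.0], [4.0], [4.0]])   # single output per unit
effs = [round(ccr_efficiency(X, Y, o), 3) for o in range(3)]
print(effs)  # units 0 and 1 are efficient; unit 2 only uses half its input well
```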
- Published
- 2017
- Full Text
- View/download PDF
25. The Influence of Potential Infection on the Relationship between Temperature and Confirmed Cases of COVID-19 in China
- Author
-
Weiran Lin and Qiuqin He
- Subjects
Mainland China, bounded data, Geography Planning and Development, Population, Kernel density estimation, Management Monitoring Policy and Law, Renewable energy sources, Copula (probability theory), Statistics, Floating population, nonparametric, Proxy (statistics), Environmental effects of industries and plants, Renewable Energy Sustainability and the Environment, temperature, COVID-19, Density estimation, Environmental sciences, Kernel (statistics)
Considering the impact of the number of potential new coronavirus infections in each city, this paper explores the relationship between temperature and cumulative confirmed cases of COVID-19 in mainland China through a non-parametric method. In this paper, the floating population from Wuhan in each city is taken as a proxy variable for the number of potential new coronavirus infections. Firstly, to use the non-parametric method correctly, the symmetric Gaussian kernel and the asymmetric Gamma kernel are applied to estimate the density of cumulative confirmed cases of COVID-19 in China. The result confirms that the Gamma kernel provides a more reasonable density estimate of bounded data than the Gaussian kernel. Then, through the non-parametric method based on Gamma kernel estimation, this paper finds a positive relationship between Wuhan's mobile population and cumulative confirmed cases, while the relationship between temperature and cumulative confirmed cases is inconclusive in China once the number of potential new coronavirus infections in each city is taken into account. Compared with the weather, the potentially infected population plays a more critical role in spreading the virus. Therefore, prevention and control measures matter more than weather factors, and even in summer attention should still be paid to epidemic prevention and control.
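The asymmetric Gamma kernel referred to here is, in the standard formulation due to Chen (2000), a gamma density whose shape parameter adapts to the evaluation point, so no probability mass leaks below zero. A minimal sketch (the bandwidth choice and the exact kernel variant are assumptions):

```python
import numpy as np
from scipy.stats import gamma

def gamma_kde(x_grid, data, b):
    """Asymmetric Gamma-kernel density estimate (Chen, 2000) for data on
    [0, inf): at each evaluation point x the kernel is a Gamma(x/b + 1, b)
    density evaluated at the data, so the estimator has no boundary bias
    at zero, unlike a symmetric Gaussian kernel."""
    dens = np.empty_like(x_grid, dtype=float)
    for k, x in enumerate(x_grid):
        dens[k] = gamma.pdf(data, a=x / b + 1.0, scale=b).mean()
    return dens

rng = np.random.default_rng(7)
data = rng.exponential(scale=1.0, size=500)   # nonnegative, mode at 0
grid = np.linspace(0.0, 5.0, 101)
dens = gamma_kde(grid, data, b=0.2)
print(round(dens[0], 2))  # positive density right at the boundary
```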
- Published
- 2021
- Full Text
- View/download PDF
26. A non-parametric Bayesian model for bounded data.
- Author
-
Minh Nguyen, Thanh and Jonathan Wu, Q.M.
- Subjects
- *
BAYESIAN analysis , *DISTRIBUTION (Probability theory) , *NONPARAMETRIC estimation , *DENSITY functionals , *COMPUTER simulation , *PATTERN recognition systems - Abstract
The intensity distribution of the observed data in many practical problems is digitized and has bounded support. There has been growing research interest in model-based techniques for handling the non-Gaussian shape of observed data. However, in the existing models users must set the remaining parameters based on prior knowledge. Also, the distributions in the existing models are unbounded, which is not sufficiently flexible to fit different shapes of bounded-support data. In this paper, we present a non-parametric Bayesian model for modeling the probability density function of bounded data. The advantage of our method is that the number of parameters in the proposed model is variable and infinite, which makes the model conceptually simpler and more adaptable to the size of the data. We present numerical experiments in which we test the proposed model on various datasets, from simulated to real data. [ABSTRACT FROM AUTHOR]
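The "variable and infinite" number of parameters is characteristic of Dirichlet-process-style constructions, in which mixture weights are generated by stick-breaking and only a data-dependent handful are effectively nonzero. A generic sketch of that mechanism (not the paper's specific model):

```python
import numpy as np

# Truncated stick-breaking construction of Dirichlet-process mixture weights:
# v_k ~ Beta(1, alpha);  w_k = v_k * prod_{j<k} (1 - v_j).
# In principle there are infinitely many components, but only a handful of
# weights are non-negligible, so model complexity adapts to the data.
rng = np.random.default_rng(3)
alpha, K = 2.0, 100                 # concentration parameter, truncation level
v = rng.beta(1.0, alpha, size=K)
w = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))

print(round(w.sum(), 4))            # close to 1 for a deep enough truncation
print((w > 0.01).sum())             # only a few weights carry real mass
```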
- Published
- 2015
- Full Text
- View/download PDF
27. Can bayesian models play a role in dental caries epidemiology? Evidence from an application to the BELCAP data set.
- Author
-
Matranga, Domenica, Firenze, Alberto, and Vullo, Angela
- Subjects
- *
ACADEMIC medical centers , *CONFIDENCE intervals , *DENTAL caries , *EPIDEMIOLOGY , *REGRESSION analysis , *DATA analysis , *CHILDREN - Abstract
Objectives The aim of this study was to show the potential of Bayesian analysis in statistical modelling of dental caries data. Because of the bounded nature of the dmft (DMFT) index, zero-inflated binomial (ZIB) and beta-binomial (ZIBB) models were considered. The effects of incorporating available prior information about the parameters of the models were also shown. Methods The data set used in this study was the Belo Horizonte Caries Prevention (BELCAP) study (Böhning et al. (1999)), consisting of five variables collected among 797 Brazilian school children and designed to evaluate four programmes for reducing caries. Only the eight primary molar teeth were considered in the data set. A data augmentation algorithm was used for estimation. Firstly, noninformative priors were used to express our lack of knowledge about the regression parameters. Secondly, prior information about the probability of a structural zero dmft and the probability of being caries-affected in the subpopulation of susceptible children was incorporated. Results With noninformative priors, the best-fitting model was the ZIBB. Education (OR = 0.76, 95% CrI: 0.59, 0.99), all interventions (OR = 0.46, 95% CrI: 0.35, 0.62), rinsing (OR = 0.61, 95% CrI: 0.47, 0.80) and hygiene (OR = 0.65, 95% CrI: 0.49, 0.86) were demonstrated to be factors protecting children from being caries-affected. Being male increased the probability of being caries-diseased (OR = 1.19, 95% CrI: 1.01, 1.42). However, after incorporating informative priors, the ZIB models' estimates were not influenced, while the ZIBB models reduced deviance and confirmed the association with all interventions and rinsing only. Discussion In our application, Bayesian estimates showed similar accuracy and precision to likelihood-based estimates, although they offered many computational advantages and the possibility of expressing all forms of uncertainty in terms of probability.
The overdispersion parameter could explain why the introduction of prior information had significant effects on the parameters of the ZIBB model, while the ZIB estimates remained unchanged. Finally, the superior performance of the ZIBB model compared to the ZIB model was shown to stem from its ability to capture overdispersion in the data. [ABSTRACT FROM AUTHOR]
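A zero-inflated binomial of the kind used here mixes a structural-zero component with a Binomial(n, p) count; for the eight primary molars, n = 8. A sketch of the pmf (the parameter values below are arbitrary, not estimates from the study):

```python
import numpy as np
from scipy.stats import binom

def zib_pmf(k, n, pi0, p):
    """Zero-inflated binomial: with probability pi0 the count is a
    structural zero; otherwise it is Binomial(n, p).  For the dmft index
    over the eight primary molars, n = 8."""
    k = np.asarray(k)
    pmf = (1 - pi0) * binom.pmf(k, n, p)
    return np.where(k == 0, pi0 + pmf, pmf)

ks = np.arange(9)                      # dmft can take values 0..8
probs = zib_pmf(ks, n=8, pi0=0.3, p=0.4)
print(round(probs.sum(), 6))           # a proper pmf: sums to 1
print(round(probs[0], 3))              # zero probability inflated above the binomial's
```

The beta-binomial variant (ZIBB) replaces the fixed p with a Beta-distributed one, which is what lets it absorb the overdispersion discussed above.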
- Published
- 2013
- Full Text
- View/download PDF
28. Integrated framework of risk evaluation and risk allocation with bounded data
- Author
-
Bae, Young Min and Lee, Young Hoon
- Subjects
- *
DATA envelopment analysis , *RISK assessment , *DATA analysis , *RISK management in business , *HELICOPTERS , *MATHEMATICAL programming , *NUMERICAL analysis , *EXPERT systems - Abstract
Abstract: This paper presents an integrated framework for risk evaluation and risk allocation with bounded data in critical risk management. A risk evaluation framework using the Imprecise Data Envelopment Analysis (IDEA) method is proposed and applied to operations of Korean Army helicopters. The risks pertaining to pilots, missions and helicopters are evaluated based on bounded data, and pilots are appropriately allocated to missions and helicopters using goal programming with bounded risk scores. Using bounded data, two risk allocation models are developed, based on expected values and on lower/upper limit values, resulting in improved reliability of the solutions. Numerical experiments yield reasonable solutions and valuable information for risk management. [Copyright Elsevier]
- Published
- 2012
- Full Text
- View/download PDF
29. Adaptive density estimation on bounded domains
- Author
-
El Kolei, Salima, Klutchnikoff, Nicolas, and Bertin, Karine (ENSAI; IRMAR, Université de Rennes; Universidad de Valparaíso [Chile])
- Subjects
Sobolev–Slobodetskii classes, Statistics and Probability, Boundary bias, Kernel density estimation, Mathematics - Statistics Theory (math.ST), Multivariate kernel density estimation, Adaptive estimation, Bounded data, Oracle inequality, Estimator, Density estimation, Sobolev space, Bounded function, Statistics Probability and Uncertainty, 62G05, 62G20
We study the estimation, in Lp-norm, of density functions defined on [0, 1]^d. We construct a new family of kernel density estimators that do not suffer from the so-called boundary bias problem, and we propose a data-driven procedure based on the Goldenshluger and Lepski approach that jointly selects a kernel and a bandwidth. We derive two estimators that satisfy oracle-type inequalities. They are also proved to be adaptive over a scale of anisotropic or isotropic Sobolev–Slobodetskii classes (which are particular cases of the classical Besov or Sobolev classes). The main interest of the isotropic procedure is to obtain adaptive results without any restriction on the smoothness parameter.
- Published
- 2019
- Full Text
- View/download PDF
30. The Influence of Potential Infection on the Relationship between Temperature and Confirmed Cases of COVID-19 in China.
- Author
-
Lin, Weiran and He, Qiuqin
- Abstract
Considering the impact of the number of potential new coronavirus infections in each city, this paper explores the relationship between temperature and cumulative confirmed cases of COVID-19 in mainland China through a non-parametric method. In this paper, the floating population from Wuhan in each city is taken as a proxy variable for the number of potential new coronavirus infections. Firstly, to use the non-parametric method correctly, the symmetric Gaussian kernel and the asymmetric Gamma kernel are applied to estimate the density of cumulative confirmed cases of COVID-19 in China. The result confirms that the Gamma kernel provides a more reasonable density estimate of bounded data than the Gaussian kernel. Then, through the non-parametric method based on Gamma kernel estimation, this paper finds a positive relationship between Wuhan's mobile population and cumulative confirmed cases, while the relationship between temperature and cumulative confirmed cases is inconclusive in China once the number of potential new coronavirus infections in each city is taken into account. Compared with the weather, the potentially infected population plays a more critical role in spreading the virus. Therefore, prevention and control measures matter more than weather factors, and even in summer attention should still be paid to epidemic prevention and control. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
31. Principal component analysis with boundary constraints
- Author
-
Henk A.L. Kiers, Paolo Giordani, and Psychometrics and Statistics
- Subjects
alternating least squares procedure, least squares with inequality algorithm, Mathematical optimization, bounded data, principal component analysis, Applied Mathematics, non-negative least squares algorithm, Least squares, Analytical Chemistry, Chemometrics, Bounded function, three-mode factor analysis, Mathematics
Observed data often belong to specific intervals of values (for instance, percentages or proportions) or are higher (lower) than pre-specified values (for instance, chemical concentrations are higher than zero). The use of classical principal component analysis (PCA) may lead to extracting components such that the reconstructed data take unfeasible values. In order to cope with this problem, a constrained generalization of PCA is proposed. The new technique, called bounded principal component analysis (B-PCA), detects components such that the reconstructed data are constrained to belong to pre-specified bounds. This is done by implementing a row-wise alternating least squares (ALS) algorithm, which exploits the potentialities of the least squares with inequality (LSI) algorithm. The results of a simulation study and two applications to bounded data are discussed to evaluate how the method and the algorithm for solving it work in practice. Copyright (C) 2007 John Wiley & Sons, Ltd.
- Published
- 2007
- Full Text
- View/download PDF
32. Estimating the extensive margin of trade
- Author
-
Kehai Wei, Joao M C Santos Silva, and Silvana Tenreyro
- Subjects
Economics and Econometrics, Conditional expectation, Upper and lower bounds, bounded data, extensive margin of trade, number of sectors, Statistics, Economics, Estimation, Estimator, Estimation of trade models, HB Economic Theory, HF Commerce, HD Industries. Land use. Labor, Finance, jel:C13, jel:C25, jel:C51, jel:F11, jel:F14
Understanding and quantifying the determinants of the number of sectors or firms exporting in a given country is relevant for the assessment of trade policies. Estimation of models for the number of sectors, however, poses a challenge because the dependent variable has both a lower and an upper bound, implying that the partial effects of the explanatory variables on the conditional mean of the dependent variable cannot be constant and must approach zero as the conditional mean approaches its bounds. We argue that ignoring these bounds can lead to erroneous conclusions due to the model's misspecification, and propose a flexible specification that accounts for the doubly-bounded nature of the dependent variable. We empirically investigate the problem and the proposed solution, finding significant differences between estimates obtained with the proposed estimator and those obtained with standard approaches.
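A conditional mean with the property described, partial effects that vanish as the mean approaches either bound, can be illustrated with a doubly-bounded logistic form (an illustrative functional form, not necessarily the paper's exact specification):

```python
import numpy as np

# Doubly-bounded conditional mean: E[y | x] = L + (U - L) * logistic(beta * x).
# Its partial effect dE/dx = (U - L) * p * (1 - p) * beta necessarily goes to
# zero as the mean approaches either bound L or U.
L_, U_, beta = 0.0, 100.0, 1.5        # e.g. number of exporting sectors, 0..100

def cond_mean(x):
    p = 1.0 / (1.0 + np.exp(-beta * x))
    return L_ + (U_ - L_) * p

def partial_effect(x):
    p = 1.0 / (1.0 + np.exp(-beta * x))
    return (U_ - L_) * p * (1.0 - p) * beta

xs = np.array([-6.0, 0.0, 6.0])
print(np.round(cond_mean(xs), 2))      # near L, at the midpoint, near U
print(np.round(partial_effect(xs), 2)) # large in the middle, ~0 near the bounds
```

A constant-partial-effect linear model cannot reproduce this behaviour, which is the misspecification the paper warns against.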
- Published
- 2014