Assessing Point Forecast Bias Across Multiple Time Series: Measures and Visual Tools
- Author
- Andrey Davydenko and Paul Goodwin
- Subjects
mean percentage error (MPE), forecast evaluation setup, criteria for error measures, ease of implementation, RelAME, error metrics, OPc-diagram, absolute mean scaled error (AMSE), multiple time series, log accuracy ratio, absolute mean error (AME), rolling-origin evaluation, prediction-realization diagram, symmetric linear loss, geometric mean, binomial test, construct validity, forecasting, forecasting performance, statistical graphics, symmetric quadratic loss, data transformation, AvgRel-prefix, forecast density, AvgRelME, FEW-L1, FEW-L2, LnQ, RelAMdE, relative performance, ease of interpretation, forecasting competitions, Wilcoxon signed rank test, forecast accuracy, testing for bias, forecast evaluation workflow (FEW), target loss function, absolute median error (AMdE), mean error (ME), regression bias, error measures, optimal correction of forecasts, forecast bias, median error (MdE), pooled prediction-realization diagram (PPRD), AvgRelAME, forecast evaluation, median bias, AvgRelAMdE, Overestimation Percentage corrected (OPc), OPc-boxplot, ease of communication, relative mean error (RelME), AvgRelMAE, forecast value added (FVA), mean bias, MAD/MEAN ratio, AvgRelMSE, AvgRelRMSE, scale-independence, forecast evaluation framework, AvgRel-metrics, visualization, AvgRelMdE, point forecast evaluation setup (PFES), Overestimation Percentage (OP)
- Abstract
Cite as: Davydenko, A., & Goodwin, P. (2021). Assessing point forecast bias across multiple time series: Measures and visual tools. International Journal of Statistics and Probability, 10(5), 46-69. https://doi.org/10.5539/ijsp.v10n5p46

Note: This is the final version of the paper, which appeared in the International Journal of Statistics and Probability. The first draft was uploaded to Preprints.org on 11 May 2021: https://doi.org/10.20944/preprints202105.0261.v1

Abstract

Measuring bias is important as it helps identify flaws in quantitative forecasting methods or judgmental forecasts. It can, therefore, potentially help improve forecasts. Despite this, bias tends to be underrepresented in the literature: many studies focus solely on measuring accuracy. Methods for assessing bias in single series are relatively well known and well researched, but for datasets containing thousands of observations across multiple series, the methodology for measuring and reporting bias is less obvious. We compare alternative approaches against a number of criteria for the setting where rolling-origin point forecasts are available for different forecasting methods and for multiple horizons over multiple series. We focus on relatively simple, yet interpretable and easy-to-implement metrics and visualization tools that are likely to be applicable in practice. To study the statistical properties of alternative measures, we use theoretical concepts and simulation experiments based on artificial data with predetermined features. We describe the difference between mean and median bias, explain the connection between metrics for accuracy and bias, provide suitable bias measures depending on the loss function used to optimise forecasts, and suggest which accuracy measures should accompany bias indicators. We propose several new measures and give recommendations on how to evaluate forecast bias across multiple series.

Summary of Contributions

Research on metrics for forecast bias tends to be underrepresented in the literature: most studies have focused solely on measuring accuracy. At the same time, measuring and reporting bias is important as it helps detect flaws in forecasting methods and gives insights into how forecasting performance may be improved. This paper focuses on bias measurement and visualization techniques and on the relationship between bias, accuracy, and the loss function used to optimise forecasts. The paper makes the following contributions to the fields of applied statistical analysis and forecasting.

1) The point forecast evaluation setup (PFES) is defined for the situation where forecast performance needs to be evaluated for rolling-origin point forecasts across multiple time series and horizons. It is assumed that forecasts were optimised under linear or quadratic loss and that both the accuracy and the bias of point forecasts need to be evaluated.

2) Regarding the criteria for finding suitable error measures, the following new principles are proposed: construct validity (the extent to which a metric measures what it is intended to measure), ease of communication to the participants of the forecasting process (who may not be technical specialists), and ease of implementation.

3) Special experiments were conducted to evaluate well-known bias indicators. In particular, it was demonstrated that the mean percentage error (MPE) is not advisable because of the non-symmetric features of percentage errors and the outliers caused by low actual values (see the sketch below).
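As an illustration of the MPE problem noted in contribution 3, here is a minimal simulation sketch (ours, not the paper's code; it assumes errors are defined as e = actual - forecast and MPE = mean of 100·e/actual). The forecasts are unbiased by construction, yet the percentage errors blow up whenever an actual value falls close to zero:

```python
# Minimal sketch: how low actual values distort the MPE.
# Assumptions (ours, for illustration only): e = actual - forecast,
# MPE = mean(100 * e / actual). Forecasts are unbiased by construction.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

level = rng.normal(5.0, 2.0, size=n)           # true series level
actual = level + rng.normal(0.0, 3.0, size=n)  # noisy actuals, some near zero
forecast = level                               # unbiased point forecasts

e = actual - forecast
pe = 100.0 * e / actual                        # percentage errors

print(f"ME  = {e.mean():.3f}")                 # close to 0, as expected
print(f"MPE = {pe.mean():.1f}%")               # distorted by near-zero actuals
print(f"max |PE| = {np.abs(pe).max():.0f}%")   # extreme outlying ratios
```

With this setup the mean error stays near zero, while individual percentage errors explode for the observations whose actuals are close to zero, so the MPE is driven by a few extreme ratios rather than by any systematic bias.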
It was also shown that the Overestimation Percentage (OP), the LnQ, the Absolute Mean Scaled Error (AMScE), the Relative Mean Error (RelME), and the Average Relative Absolute Mean Error (AvgRelAME) have their own limitations. The experiments were based on normal and log-normal distributions in order to simulate time series resembling real-world datasets.

4) It was demonstrated that it is important to match the bias indicator with the target loss function. In particular, indicators of median bias should be used when evaluating the performance of forecasts optimised under linear loss, while indicators of mean bias should be used when the target loss function is quadratic.

5) To detect the presence of median bias, the Overestimation Percentage corrected (OPc) metric is proposed. It is calculated as OPc = OP + ZP/2, where OP is the percentage of cases where the actual value was overestimated and ZP is the percentage of zero errors (a sketch of the calculation follows this list). To visualise the distribution underlying the OPc, the OPc-boxplot visual tool is proposed; additionally, the OPc-diagram is proposed for reporting summary results. The OPc metric is shown to be robust, immune to outliers, and easy to implement and communicate. However, it only shows the frequency of cases of overestimation, not the magnitude of bias.

6) To indicate the magnitude and direction of median bias, the following additional metrics based on the geometric mean are proposed: the Average Relative Absolute Median Error (AvgRelAMdE) and the Average Relative Median Error (AvgRelMdE). The former measures bias in comparison with a benchmark method, while the latter reports bias in terms of the time series median. Enhanced boxplots for the AvgRel-metrics (AvgRel-boxplots) are introduced to aid visual analysis.

7) To report mean bias, the Average Relative Mean Error (AvgRelME) metric is proposed. The AvgRelME is derived from the LnQ, with Q replaced by (1 - RelME), in order to obtain a better proxy for the ME than the LnQ provides.

8) The term forecast evaluation workflow (FEW) is defined as a set of sequential activities aiming to ensure a comprehensive, informative, and reliable forecast evaluation.

9) Two alternative forecast evaluation workflows are proposed, giving step-by-step instructions for the evaluation and comparison of forecasts depending on the target loss function: FEW-L1 assumes a linear symmetric target loss function, while FEW-L2 assumes a quadratic symmetric target loss function. Both workflows use the new metrics and visual tools and aim to ensure comprehensive forecast evaluation and detailed interpretation of results in the context of the PFES. To visually detect outliers and data flaws, the workflows rely on the pooled prediction-realization diagram (PPRD), a special variant of the prediction-realization diagram that shows data across series on one plot.

Our suggested procedures are applicable both to academic researchers who are developing and evaluating new forecasting methods and to practitioners wishing to evaluate the current forecasting performance of their organisation. The framework presented allows the preparation of reports in accordance with principles and methodologies for carrying out data science projects.
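To make the new metrics concrete, here is a short, hedged Python sketch. The opc function follows the formula stated in contribution 5 (OPc = OP + ZP/2); the avg_rel_amde function is our illustrative reading of the AvgRelAMdE (per-series absolute median errors compared against a benchmark and pooled with an unweighted geometric mean) and should not be taken as the paper's reference implementation, which may define or weight the per-series statistic differently:

```python
# Hedged sketches of OPc (formula as stated in the summary) and an
# AvgRel-style geometric-mean metric. The AvgRelAMdE details below
# (|median error| per series, unweighted geometric mean) are our
# assumptions for illustration, not the paper's reference code.
import numpy as np

def opc(errors):
    """Overestimation Percentage corrected, OPc = OP + ZP/2 (in %).

    errors: array of forecast errors, taken here as e = actual - forecast,
    so e < 0 means the actual was overestimated. Values near 50%
    are consistent with the absence of median bias.
    """
    e = np.asarray(errors, dtype=float)
    op = 100.0 * np.mean(e < 0)   # OP: % of overestimated actuals
    zp = 100.0 * np.mean(e == 0)  # ZP: % of zero errors
    return op + zp / 2.0

def avg_rel_amde(method_errors, benchmark_errors):
    """Illustrative AvgRelAMdE: geometric mean, across series, of
    |median error of method| / |median error of benchmark|.

    Both arguments are lists of per-series error arrays; benchmark
    median errors are assumed nonzero. Under this reading, values
    below 1 indicate smaller median bias than the benchmark.
    """
    ratios = [abs(np.median(m)) / abs(np.median(b))
              for m, b in zip(method_errors, benchmark_errors)]
    return float(np.exp(np.mean(np.log(ratios))))

# Small check of the OPc arithmetic:
e = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(opc(e))  # OP = 40%, ZP = 20%, so OPc = 40 + 10 = 50.0
```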
- Published
- 2021