
Assessing Point Forecast Bias Across Multiple Time Series: Measures and Visual Tools

Authors :
Paul Goodwin
Andrey Davydenko
Publication Year :
2021
Publisher :
figshare, 2021.

Abstract

Cite as: Davydenko, A., & Goodwin, P. (2021). Assessing point forecast bias across multiple time series: Measures and visual tools. International Journal of Statistics and Probability, 10(5), 46-69. https://doi.org/10.5539/ijsp.v10n5p46

Note: This is the final version of the paper, which appeared in the International Journal of Statistics and Probability. The first draft of this paper was uploaded to Preprints.org on 11 May 2021: https://doi.org/10.20944/preprints202105.0261.v1

Abstract

Measuring bias is important as it helps identify flaws in quantitative forecasting methods or judgmental forecasts. It can, therefore, potentially help improve forecasts. Despite this, bias tends to be under-represented in the literature: many studies focus solely on measuring accuracy. Methods for assessing bias in single series are relatively well known and well researched, but for datasets containing thousands of observations for multiple series, the methodology for measuring and reporting bias is less obvious. We compare alternative approaches against a number of criteria when rolling-origin point forecasts are available for different forecasting methods and for multiple horizons over multiple series. We focus on relatively simple, yet interpretable and easy-to-implement metrics and visualisation tools that are likely to be applicable in practice. To study the statistical properties of alternative measures we use theoretical concepts and simulation experiments based on artificial data with predetermined features. We describe the difference between mean and median bias, describe the connection between metrics for accuracy and bias, provide suitable bias measures depending on the loss function used to optimise forecasts, and suggest which measures for accuracy should be used to accompany bias indicators. We propose several new measures and provide our recommendations on how to evaluate forecast bias across multiple series.

Summary of Contributions

Research on metrics for forecast bias tends to be under-represented in the literature: most studies have focused solely on measuring accuracy. At the same time, measuring and reporting bias is important as it helps detect flaws in forecasting methods and gain insights into how forecasting performance may be improved. This paper focuses on bias measurement and visualisation techniques and on the relationship between bias, accuracy, and the loss function used to optimise forecasts. The paper makes the following contributions to the fields of applied statistical analysis and forecasting.

1) The point forecast evaluation setup (PFES) was defined for situations where forecast performance must be evaluated for rolling-origin point forecasts across multiple time series and horizons. It is assumed that forecasts were optimised under linear or quadratic loss and that both the accuracy and the bias of point forecasts need to be evaluated.

2) Regarding the criteria for finding suitable error measures, the following new principles were proposed: construct validity (the extent to which a metric measures what it is intended to measure), ease of communication to the participants of the forecasting process (who may not be technical specialists), and ease of implementation.

3) Special experiments were conducted in order to evaluate well-known bias indicators. In particular, it was demonstrated that the mean percentage error (MPE) is not advisable due to the non-symmetric features of percentage errors and outliers caused by low actual values. It was also shown that the Overestimation Percentage (OP), the LnQ, the Absolute Mean Scaled Error (AMScE), the Relative Mean Error (RelME), and the Average Relative Absolute Mean Error (AvgRelAME) have their own limitations. The experiments were based on normal and log-normal distributions in order to simulate time series resembling real-world datasets.

4) It was demonstrated that it is important to match the bias indicator to the target loss function. In particular, indicators for median bias should be used when evaluating the performance of forecasts optimised under linear loss, while indicators for mean bias should be used when the target loss function is quadratic.

5) In order to detect the presence of median bias, the Overestimation Percentage corrected (OPc) metric was proposed. The OPc metric is calculated as OPc = OP + ZP/2, where OP is the percentage of cases in which the actual value was overestimated and ZP is the percentage of zero errors. To visualise the underlying distribution for the OPc, the OPc-boxplot visual tool was proposed. Additionally, the OPc-diagram was proposed for reporting summary results. This metric was shown to be robust, immune to outliers, and easy to implement and to communicate. The OPc, however, only shows the frequency of cases of overestimation and does not show the magnitude of bias.

6) To indicate the magnitude and the direction of median bias, the following additional metrics were proposed based on the use of the geometric mean: the Average Relative Absolute Median Error (AvgRelAMdE) and the Average Relative Median Error (AvgRelMdE). The former measures bias in comparison with a benchmark method, while the latter reports bias in terms of the time series median. Enhanced boxplots for the AvgRel-metrics (AvgRel-boxplots) were introduced to aid visual analysis. A sketch illustrating how the OPc and an AvgRel-style metric might be computed is given after this list.

7) To report mean bias, the Average Relative Mean Error (AvgRelME) metric was proposed. The AvgRelME was derived based on the LnQ, where Q was replaced with (1 - RelME) in order to obtain a better proxy for the ME compared to the LnQ.

8) The term forecast evaluation workflow (FEW) was defined as a set of sequential activities aiming to ensure a comprehensive, informative, and reliable forecast evaluation.

9) Two alternative forecast evaluation workflows (FEWs) were proposed, describing step-by-step instructions for the evaluation and comparison of forecasts depending on the target loss function. The proposed workflows were named FEW-L1 and FEW-L2. The FEW-L1 workflow assumes a linear symmetric target loss function, and the FEW-L2 workflow assumes a quadratic symmetric target loss function. The workflows use the new metrics and visual tools and aim to ensure comprehensive forecast evaluation and detailed interpretation of results in the context of the PFES. In order to visually detect outliers and data flaws, the workflows rely on the pooled prediction-realization diagram (PPRD), a special variant of the prediction-realization diagram showing data across series on one plot.

Our suggested procedures are applicable both to academic researchers who are developing and evaluating new forecasting methods and to practitioners wishing to evaluate the current forecasting performance of their organisation. The framework presented allows the preparation of reports in accordance with principles and methodologies for carrying out data science projects.
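As a rough illustration of how two of the indicators mentioned above could be computed, the sketch below implements the quoted OPc formula (OPc = OP + ZP/2) and a geometric-mean aggregation of per-series relative absolute median errors in the spirit of the AvgRelAMdE. This is not the authors' code: the function names, the reading of AMdE as the absolute value of the per-series median error, and the handling of edge cases are assumptions made purely for illustration.

```python
import numpy as np

def opc(actuals, forecasts):
    """Overestimation Percentage corrected: OPc = OP + ZP/2, where OP is the
    percentage of cases in which the actual value was overestimated
    (forecast > actual) and ZP is the percentage of zero errors."""
    actuals = np.asarray(actuals, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    n = len(actuals)
    op = 100.0 * np.sum(forecasts > actuals) / n   # over-forecast frequency, %
    zp = 100.0 * np.sum(forecasts == actuals) / n  # zero-error frequency, %
    return op + zp / 2.0

def avg_rel_amde(method_errors_by_series, benchmark_errors_by_series):
    """AvgRel-style aggregation (assumed form): geometric mean, across series,
    of |median error of the method| / |median error of the benchmark|.
    Zero benchmark median errors would need special handling in practice."""
    ratios = []
    for e, b in zip(method_errors_by_series, benchmark_errors_by_series):
        amde = np.abs(np.median(np.asarray(e, dtype=float)))
        amde_bench = np.abs(np.median(np.asarray(b, dtype=float)))
        ratios.append(amde / amde_bench)
    return float(np.exp(np.mean(np.log(ratios))))
```

Under this reading, an OPc close to 50% is consistent with the absence of median bias (the paper's keywords mention a binomial test for formal testing), while a value well above or below 50% points to systematic over- or underestimation; an AvgRelAMdE below 1 would indicate smaller absolute median errors than the benchmark.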

Subjects

mean percentage error (MPE)
forecast evaluation setup
criteria for error measures
ease of implementation
RelAME
error metrics
OPc-diagram
absolute mean scaled error (AMSE)
bepress|Social and Behavioral Sciences|Social Statistics
log accuracy ratio
bepress|Social and Behavioral Sciences|Environmental Studies
absolute mean error (AME)
bepress|Social and Behavioral Sciences|Other Social and Behavioral Sciences
rolling-origin evaluation
prediction-realization diagram
symmetric linear loss
geometric mean
Reporting bias
binomial test
bepress|Social and Behavioral Sciences|Economics
construct validity
forecasting
forecasting performance
statistical graphics
symmetric quadratic loss
data transformation
AvgRel-prefix
forecast density
SocArXiv|Social and Behavioral Sciences|Other Social and Behavioral Sciences
bepress|Social and Behavioral Sciences|Science and Technology Studies
AvgRelME
FEW-L2
FEW-L1
LnQ
RelAMdE
relative performance
ease of interpretation
SocArXiv|Social and Behavioral Sciences|Social Statistics
forecasting competitions
Wilcoxon signed rank test
SocArXiv|Social and Behavioral Sciences
forecast accuracy
testing for bias
forecast evaluation workflow (FEW)
target loss function
absolute median error (AMdE)
mean error (ME)
regression bias
error measures
optimal correction of forecasts
Forecast bias
SocArXiv|Social and Behavioral Sciences|Science and Technology Studies
median error (MdE)
pooled prediction-realization diagram (PPRD)
SocArXiv|Social and Behavioral Sciences|Economics
AvgRelAME
forecast evaluation
median bias
SocArXiv|Social and Behavioral Sciences|Economics|Econometrics
AvgRelAMdE
Overestimation Percentage corrected (OPc)
OPc-boxplot
ease of communication
relative mean error (RelME)
AvgRelMAE
forecast value added (FVA)
bepress|Social and Behavioral Sciences|Economics|Econometrics
mean bias
MAD/MEAN ratio
AvgRelMSE
AvgRelRMSE
scale-independence
forecast evaluation framework
AvgRel-metrics
Visualization
SocArXiv|Social and Behavioral Sciences|Environmental Studies
AvgRelMdE
forecast bias
bepress|Social and Behavioral Sciences
point forecast evaluation setup (PFES)
Overestimation Percentage (OP)

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....a2d95922b36eef22791777680f967ade
Full Text :
https://doi.org/10.6084/m9.figshare.16663234