TY - JOUR
T1 - Expert forecasting with and without uncertainty quantification and weighting
T2 - What do the data say?
AU - Cooke, Roger M.
AU - Marti, Deniz
AU - Mazzuchi, Thomas
PY - 2020
Y1 - 2020
N2 - Post-2006 expert judgment data has been extended to 530 experts assessing 580 calibration variables from their fields. New analysis shows that point predictions as medians of combined expert distributions outperform combined medians, and medians of performance-weighted combinations outperform medians of equally weighted combinations. Relative to the equal-weight combination of medians, using the medians of performance-weighted combinations yields a 65% improvement. Using the medians of equally weighted combinations yields a 46% improvement. The Random Expert Hypothesis underlying all performance-blind combination schemes, namely that differences in expert performance reflect random stressors and not persistent properties of the experts, is tested by randomly scrambling expert panels. Generating distributions for a full set of performance metrics, the hypotheses that the original panels' performance measures are drawn from distributions produced by random scrambling are rejected at significance levels ranging from E−6 to E−12. Random stressors cannot produce the variations in performance seen in the original panels. In- and out-of-sample validation results are updated.
AB - Post-2006 expert judgment data has been extended to 530 experts assessing 580 calibration variables from their fields. New analysis shows that point predictions as medians of combined expert distributions outperform combined medians, and medians of performance-weighted combinations outperform medians of equally weighted combinations. Relative to the equal-weight combination of medians, using the medians of performance-weighted combinations yields a 65% improvement. Using the medians of equally weighted combinations yields a 46% improvement. The Random Expert Hypothesis underlying all performance-blind combination schemes, namely that differences in expert performance reflect random stressors and not persistent properties of the experts, is tested by randomly scrambling expert panels. Generating distributions for a full set of performance metrics, the hypotheses that the original panels' performance measures are drawn from distributions produced by random scrambling are rejected at significance levels ranging from E−6 to E−12. Random stressors cannot produce the variations in performance seen in the original panels. In- and out-of-sample validation results are updated.
KW - Calibration
KW - Combining forecasts
KW - Evaluating forecasts
KW - Judgmental forecasting
KW - Panel data
KW - Simulation
UR - http://www.scopus.com/inward/record.url?scp=85088787888&partnerID=8YFLogxK
U2 - 10.1016/j.ijforecast.2020.06.007
DO - 10.1016/j.ijforecast.2020.06.007
M3 - Article
AN - SCOPUS:85088787888
SN - 0169-2070
VL - 37
SP - 378
EP - 387
JO - International Journal of Forecasting
JF - International Journal of Forecasting
IS - 1
ER -