Calibrating experts’ probabilistic assessments for improved probabilistic predictions

A.M. Hanea; G.F. Nane

doi:10.1016/j.ssci.2019.05.048

A.M. Hanea*, G.F. Nane

*Corresponding author for this work

Applied Probability

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Expert judgement is routinely required to inform critically important decisions. While expert judgement can be remarkably useful when data are absent, it can be easily influenced by contextual biases which can lead to poor judgements and subsequently poor decisions. Structured elicitation protocols aim to: (1) guard against biases and provide better (aggregated) judgements, and (2) subject expert judgements to the same level of scrutiny as is expected for empirical data. The latter ensures that if judgements are to be used as data, they are subject to the scientific principles of review, critical appraisal, and repeatability. Objectively evaluating the quality of expert data and validating expert judgements are other essential elements. Considerable research suggests that the performance of experts should be evaluated by scoring experts on questions related to the elicitation questions, whose answers are known a priori. Experts who can provide accurate, well-calibrated and informative judgements should receive more weight in a final aggregation of judgements. This is referred to as performance-weighting in the mathematical aggregation of multiple judgements. The weights depend on the chosen measures of performance. We are yet to understand the best methods to aggregate judgements, how well such aggregations perform out of sample, or the costs involved, as well as the benefits of the various approaches. In this paper we propose and explore a new measure of experts’ calibration. A sizeable data set containing predictions for outcomes of geopolitical events is used to investigate the properties of this calibration measure when compared to other, well established measures.

Original language	English
Pages (from-to)	763-771
Number of pages	9
Journal	Safety Science
Volume	118
DOIs	https://doi.org/10.1016/j.ssci.2019.05.048
Publication status	Published - 2019

Bibliographical note

Accepted Author Manuscript

Keywords

Calibration
Performance based weighting
Probabilistic predictions
Structured expert judgement

Access to Document

10.1016/j.ssci.2019.05.048

SRAESpecialIssueSafetyScienceHaneaNane
Accepted author manuscript, 581 KB
Licence: CC BY-NC-ND

Cite this

@article{78fecf1ab7854966ada923975d056071,
title = "Calibrating experts{\textquoteright} probabilistic assessments for improved probabilistic predictions",
abstract = "Expert judgement is routinely required to inform critically important decisions. While expert judgement can be remarkably useful when data are absent, it can be easily influenced by contextual biases which can lead to poor judgements and subsequently poor decisions. Structured elicitation protocols aim to: (1) guard against biases and provide better (aggregated) judgements, and (2) subject expert judgements to the same level of scrutiny as is expected for empirical data. The latter ensures that if judgements are to be used as data, they are subject to the scientific principles of review, critical appraisal, and repeatability. Objectively evaluating the quality of expert data and validating expert judgements are other essential elements. Considerable research suggests that the performance of experts should be evaluated by scoring experts on questions related to the elicitation questions, whose answers are known a priori. Experts who can provide accurate, well-calibrated and informative judgements should receive more weight in a final aggregation of judgements. This is referred to as performance-weighting in the mathematical aggregation of multiple judgements. The weights depend on the chosen measures of performance. We are yet to understand the best methods to aggregate judgements, how well such aggregations perform out of sample, or the costs involved, as well as the benefits of the various approaches. In this paper we propose and explore a new measure of experts{\textquoteright} calibration. A sizeable data set containing predictions for outcomes of geopolitical events is used to investigate the properties of this calibration measure when compared to other, well established measures.",
keywords = "Calibration, Performance based weighting, Probabilistic predictions, Structured expert judgement",
author = "A.M. Hanea and G.F. Nane",
note = "Accepted Author Manuscript",
year = "2019",
doi = "10.1016/j.ssci.2019.05.048",
language = "English",
volume = "118",
pages = "763--771",
journal = "Safety Science",
issn = "0925-7535",
publisher = "Elsevier",
}

TY  - JOUR
T1  - Calibrating experts’ probabilistic assessments for improved probabilistic predictions
AU  - Hanea, A.M.
AU  - Nane, G.F.
N1  - Accepted Author Manuscript
PY  - 2019
Y1  - 2019
AB  - Expert judgement is routinely required to inform critically important decisions. While expert judgement can be remarkably useful when data are absent, it can be easily influenced by contextual biases which can lead to poor judgements and subsequently poor decisions. Structured elicitation protocols aim to: (1) guard against biases and provide better (aggregated) judgements, and (2) subject expert judgements to the same level of scrutiny as is expected for empirical data. The latter ensures that if judgements are to be used as data, they are subject to the scientific principles of review, critical appraisal, and repeatability. Objectively evaluating the quality of expert data and validating expert judgements are other essential elements. Considerable research suggests that the performance of experts should be evaluated by scoring experts on questions related to the elicitation questions, whose answers are known a priori. Experts who can provide accurate, well-calibrated and informative judgements should receive more weight in a final aggregation of judgements. This is referred to as performance-weighting in the mathematical aggregation of multiple judgements. The weights depend on the chosen measures of performance. We are yet to understand the best methods to aggregate judgements, how well such aggregations perform out of sample, or the costs involved, as well as the benefits of the various approaches. In this paper we propose and explore a new measure of experts’ calibration. A sizeable data set containing predictions for outcomes of geopolitical events is used to investigate the properties of this calibration measure when compared to other, well established measures.
KW  - Calibration
KW  - Performance based weighting
KW  - Probabilistic predictions
KW  - Structured expert judgement
UR  - http://www.scopus.com/inward/record.url?scp=85067249865&partnerID=8YFLogxK
U2  - 10.1016/j.ssci.2019.05.048
DO  - 10.1016/j.ssci.2019.05.048
M3  - Article
AN  - SCOPUS:85067249865
SN  - 0925-7535
VL  - 118
SP  - 763
EP  - 771
JO  - Safety Science
JF  - Safety Science
ER  -
