A Bayesian model for quantifying errors in citizen science data: application to rainfall observations from Nepal

Jessica A. Eisma; Gerrit Schoups; Jeffrey C. Davids; Nick van de Giesen

doi:10.5194/hess-27-3565-2023

Jessica A. Eisma^*, Gerrit Schoups, Jeffrey C. Davids, Nick van de Giesen

^*Corresponding author for this work

Water Resources

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

High-quality citizen science data can be instrumental in advancing science toward new discoveries and a deeper understanding of under-observed phenomena. However, the error structure of citizen scientist (CS) data must be well-defined. Within a citizen science program, the errors in submitted observations vary, and their occurrence may depend on CS-specific characteristics. This study develops a graphical Bayesian inference model of error types in CS data. The model assumes that (1) each CS observation is subject to a specific error type, each with its own bias and noise, and (2) an observation's error type depends on the static error community of the CS, which in turn relates to characteristics of the CS submitting the observation. Given a set of CS observations and corresponding ground-truth values, the model can be calibrated for a specific application, yielding (i) number of error types and error communities, (ii) bias and noise for each error type, (iii) error distribution of each error community, and (iv) the single error community to which each CS belongs. The model, applied to Nepal CS rainfall observations, identifies five error types and sorts CSs into four static, model-inferred communities. In the case study, 73 % of CSs submitted data with errors in fewer than 5 % of their observations. The remaining CSs submitted data with unit, meniscus, unknown, and outlier errors. A CS's assigned community, coupled with model-inferred error probabilities, can identify observations that require verification and provides an opportunity for targeted re-training of CSs based on mistake tendencies.
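The two model assumptions in the abstract can be illustrated with a minimal generative sketch. Everything concrete below is an assumption made for illustration: the multiplicative-bias-plus-Gaussian-noise form, the parameter values, the community names, and the helper functions `simulate_observation` and `error_type_posterior` are hypothetical and are not taken from the paper. The sketch covers only four illustrative error types and two communities, not the five types and four communities reported for the case study.

```python
import math
import random

# Hypothetical error types: each has its own bias and noise (assumption 1).
# The bias * truth + Gaussian noise form and all values are illustrative only.
ERROR_TYPES = {
    "none":     {"bias": 1.0,  "noise_sd": 0.5},
    "unit":     {"bias": 0.1,  "noise_sd": 0.5},
    "meniscus": {"bias": 1.05, "noise_sd": 0.5},
    "outlier":  {"bias": 1.0,  "noise_sd": 20.0},
}

# Hypothetical error communities: each is a probability distribution over
# error types, and each CS belongs to exactly one community (assumption 2).
COMMUNITIES = {
    "mostly_correct": {"none": 0.97, "unit": 0.01, "meniscus": 0.01, "outlier": 0.01},
    "unit_prone":     {"none": 0.70, "unit": 0.25, "meniscus": 0.03, "outlier": 0.02},
}


def simulate_observation(true_mm, community, rng):
    """Draw an error type from the CS's community, then apply its bias and noise."""
    dist = COMMUNITIES[community]
    error_type = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
    params = ERROR_TYPES[error_type]
    observed = params["bias"] * true_mm + rng.gauss(0.0, params["noise_sd"])
    return error_type, max(observed, 0.0)  # rainfall depth cannot be negative


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def error_type_posterior(true_mm, observed_mm, community):
    """Bayes' rule: P(error type | observation, truth, community)."""
    weights = {
        name: COMMUNITIES[community][name]
        * gaussian_pdf(observed_mm, p["bias"] * true_mm, p["noise_sd"])
        for name, p in ERROR_TYPES.items()
    }
    total = sum(weights.values())  # nonzero here thanks to the wide outlier type
    return {name: w / total for name, w in weights.items()}


rng = random.Random(0)
sample = [simulate_observation(20.0, "unit_prone", rng) for _ in range(5)]
posterior = error_type_posterior(20.0, 2.0, "unit_prone")
```

In this toy setting, an observation whose posterior mass sits mostly on a non-"none" error type would be a candidate for the kind of verification the abstract mentions. The actual model additionally infers the number of error types and communities and each CS's community membership, which this sketch takes as given.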

Original language	English
Pages (from-to)	3565-3579
Number of pages	15
Journal	Hydrology and Earth System Sciences
Volume	27
Issue number	19
DOIs	https://doi.org/10.5194/hess-27-3565-2023
Publication status	Published - 2023

Access to Document

10.5194/hess-27-3565-2023 (Licence: CC BY)

A Bayesian model for quantifying errors in citizen science data: application to rainfall observations from Nepal (Final published version, 2.02 MB, Licence: CC BY)

Cite this

@article{2da7167fd61e483b9de38b25900727b6,

title = "A {Bayesian} model for quantifying errors in citizen science data: application to rainfall observations from {Nepal}",

abstract = "High-quality citizen science data can be instrumental in advancing science toward new discoveries and a deeper understanding of under-observed phenomena. However, the error structure of citizen scientist (CS) data must be well-defined. Within a citizen science program, the errors in submitted observations vary, and their occurrence may depend on CS-specific characteristics. This study develops a graphical Bayesian inference model of error types in CS data. The model assumes that (1) each CS observation is subject to a specific error type, each with its own bias and noise, and (2) an observation's error type depends on the static error community of the CS, which in turn relates to characteristics of the CS submitting the observation. Given a set of CS observations and corresponding ground-truth values, the model can be calibrated for a specific application, yielding (i) number of error types and error communities, (ii) bias and noise for each error type, (iii) error distribution of each error community, and (iv) the single error community to which each CS belongs. The model, applied to Nepal CS rainfall observations, identifies five error types and sorts CSs into four static, model-inferred communities. In the case study, 73 \% of CSs submitted data with errors in fewer than 5 \% of their observations. The remaining CSs submitted data with unit, meniscus, unknown, and outlier errors. A CS's assigned community, coupled with model-inferred error probabilities, can identify observations that require verification and provides an opportunity for targeted re-training of CSs based on mistake tendencies.",

author = "Eisma, {Jessica A.} and Schoups, Gerrit and Davids, {Jeffrey C.} and {van de Giesen}, Nick",

year = "2023",

doi = "10.5194/hess-27-3565-2023",

language = "English",

volume = "27",

pages = "3565--3579",

journal = "Hydrology and Earth System Sciences",

issn = "1027-5606",

publisher = "European Geosciences Union",

number = "19",

}

TY - JOUR

T1 - A Bayesian model for quantifying errors in citizen science data

T2 - application to rainfall observations from Nepal

AU - Eisma, Jessica A.

AU - Schoups, Gerrit

AU - Davids, Jeffrey C.

AU - van de Giesen, Nick

PY - 2023

Y1 - 2023

N2 - High-quality citizen science data can be instrumental in advancing science toward new discoveries and a deeper understanding of under-observed phenomena. However, the error structure of citizen scientist (CS) data must be well-defined. Within a citizen science program, the errors in submitted observations vary, and their occurrence may depend on CS-specific characteristics. This study develops a graphical Bayesian inference model of error types in CS data. The model assumes that (1) each CS observation is subject to a specific error type, each with its own bias and noise, and (2) an observation's error type depends on the static error community of the CS, which in turn relates to characteristics of the CS submitting the observation. Given a set of CS observations and corresponding ground-truth values, the model can be calibrated for a specific application, yielding (i) number of error types and error communities, (ii) bias and noise for each error type, (iii) error distribution of each error community, and (iv) the single error community to which each CS belongs. The model, applied to Nepal CS rainfall observations, identifies five error types and sorts CSs into four static, model-inferred communities. In the case study, 73 % of CSs submitted data with errors in fewer than 5 % of their observations. The remaining CSs submitted data with unit, meniscus, unknown, and outlier errors. A CS's assigned community, coupled with model-inferred error probabilities, can identify observations that require verification and provides an opportunity for targeted re-training of CSs based on mistake tendencies.

UR - http://www.scopus.com/inward/record.url?scp=85178224560&partnerID=8YFLogxK

U2 - 10.5194/hess-27-3565-2023

DO - 10.5194/hess-27-3565-2023

M3 - Article

SN - 1027-5606

VL - 27

SP - 3565

EP - 3579

JO - Hydrology and Earth System Sciences

JF - Hydrology and Earth System Sciences

IS - 19

ER -
