TY - GEN
T1 - Online Label Aggregation: A Variational Bayesian Approach
T2 - 2021 World Wide Web Conference, WWW 2021
AU - Hong, Chi
AU - Ghiassi, Amirmasoud
AU - Zhou, Yichi
AU - Birke, Robert
AU - Chen, Lydia Y.
PY - 2021
Y1 - 2021
AB - Noisily labeled data is the norm rather than a rarity for crowdsourced content. Aggregating the results from crowd workers is an effective way to distill noise and infer the correct labels. To ensure time relevance and overcome slow worker responses, online label aggregation is increasingly requested, calling for solutions that can incrementally infer the true label distribution from subsets of data items. In this paper, we propose a novel online label aggregation framework, BiLA, which employs a variational Bayesian inference method and a novel stochastic optimization scheme for incremental training. BiLA is flexible enough to accommodate any generating distribution of labels through exact computation of its posterior distribution. We also derive the convergence bound of the proposed optimizer. We compare BiLA with state-of-the-art methods based on minimax entropy, neural networks, and expectation maximization, on synthetic and real-world datasets. Our evaluation results in various online scenarios show that BiLA can effectively infer the true labels, with an error rate reduction of at least 10 and 1.5 percentage points on synthetic and real-world datasets, respectively.
KW - Convergence bound
KW - Label aggregation
KW - Online
KW - Stochastic optimizer
KW - Variational Bayesian inference
UR - http://www.scopus.com/inward/record.url?scp=85107964875&partnerID=8YFLogxK
U2 - 10.1145/3442381.3449933
DO - 10.1145/3442381.3449933
M3 - Conference contribution
AN - SCOPUS:85107964875
T3 - The Web Conference 2021 - Proceedings of the World Wide Web Conference, WWW 2021
SP - 1904
EP - 1915
BT - The Web Conference 2021 - Proceedings of the World Wide Web Conference, WWW 2021
PB - Association for Computing Machinery (ACM)
Y2 - 19 April 2021 through 23 April 2021
ER -