TY - GEN
T1 - TrustNet
T2 - 8th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, BDCAT 2021
AU - Ghiassi, Amirmasoud
AU - Birke, Robert
AU - Chen, Lydia Y.
PY - 2021
Y1 - 2021
N2 - Big Data systems make it possible to collect massive datasets to feed data-hungry deep learning. Labelling these ever-bigger datasets is increasingly challenging, and label errors affect even highly curated sets. This makes robustness to label noise a critical property for weakly-supervised classifiers. Related work on resilient deep networks tends to focus on a limited set of synthetic noise patterns and holds disparate views on their impact, e.g., robustness against symmetric vs. asymmetric noise patterns. In this paper, we first extend the theoretical analysis of test accuracy to any given noise pattern. Based on these insights, we design TrustNet, which first learns the pattern of noise corruption, be it symmetric or asymmetric, from a small set of trusted data. Then, TrustNet is trained via a robust loss function, which weights the given labels against the labels inferred from the learned noise pattern. The weight is adjusted based on model uncertainty across training epochs. We evaluate TrustNet on CIFAR-10 and CIFAR-100 with synthetic label noise and on large real-world data with label noise, i.e., Clothing1M. We compare against state-of-the-art methods, demonstrating the strong robustness of TrustNet under a diverse set of noise patterns.
AB - Big Data systems make it possible to collect massive datasets to feed data-hungry deep learning. Labelling these ever-bigger datasets is increasingly challenging, and label errors affect even highly curated sets. This makes robustness to label noise a critical property for weakly-supervised classifiers. Related work on resilient deep networks tends to focus on a limited set of synthetic noise patterns and holds disparate views on their impact, e.g., robustness against symmetric vs. asymmetric noise patterns. In this paper, we first extend the theoretical analysis of test accuracy to any given noise pattern. Based on these insights, we design TrustNet, which first learns the pattern of noise corruption, be it symmetric or asymmetric, from a small set of trusted data. Then, TrustNet is trained via a robust loss function, which weights the given labels against the labels inferred from the learned noise pattern. The weight is adjusted based on model uncertainty across training epochs. We evaluate TrustNet on CIFAR-10 and CIFAR-100 with synthetic label noise and on large real-world data with label noise, i.e., Clothing1M. We compare against state-of-the-art methods, demonstrating the strong robustness of TrustNet under a diverse set of noise patterns.
KW - deep neural networks
KW - noise estimation
KW - noise transition matrix
KW - noisy labels in big data
KW - robust loss function
UR - http://www.scopus.com/inward/record.url?scp=85123990470&partnerID=8YFLogxK
U2 - 10.1145/3492324.3494166
DO - 10.1145/3492324.3494166
M3 - Conference contribution
AN - SCOPUS:85123990470
T3 - ACM International Conference Proceeding Series
SP - 52
EP - 62
BT - Proceedings of the 8th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, BDCAT 2021
PB - Association for Computing Machinery (ACM)
Y2 - 6 December 2021 through 9 December 2021
ER -