TY - GEN
T1 - Robust (Deep) learning framework against dirty labels and beyond
AU - Ghiassi, Amirmasoud
AU - Younesian, Taraneh
AU - Zhao, Zhilong
AU - Birke, Robert
AU - Schiavoni, Valerio
AU - Chen, Lydia Y.
PY - 2019
Y1 - 2019
N2 - Data is generated with unprecedented speed, due to the flourishing of social media and open platforms. However, due to the lack of scrutiny, both clean and dirty data spread widely. For instance, a significant portion of images is tagged with corrupted, dirty class labels. Such dirty data sets are not only detrimental to learning outcomes, e.g., images misclassified into the wrong classes, but also costly: bad data is estimated to cost the U.S. up to a daunting 3 trillion dollars per year. In this paper, we address the following question: how can prevailing (deep) machine learning models be robustly trained given a non-negligible presence of data with corrupted labels? Dirty labels significantly increase the complexity of existing learning problems, as the ground truth of label quality is not easily assessed. Here, we advocate rigorously incorporating human experts into one learning framework in which artificial and human intelligence collaborate. To this end, we combine three strategies to enhance the robustness of deep and regular machine learning algorithms, namely, (i) data filtering through an additional quality model, (ii) data selection via actively learning from experts, and (iii) imitating the expert's correction process. We demonstrate the three strategies sequentially with examples and apply them to widely used benchmarks, such as CIFAR10 and CIFAR100. Our initial results show the effectiveness of the proposed strategies in combating dirty labels, e.g., the resulting classification accuracy can be up to 50% higher than that of state-of-the-art AI-only solutions. Finally, we extend the discussion of robust learning from trusted data to trusted execution environments.
AB - Data is generated with unprecedented speed, due to the flourishing of social media and open platforms. However, due to the lack of scrutiny, both clean and dirty data spread widely. For instance, a significant portion of images is tagged with corrupted, dirty class labels. Such dirty data sets are not only detrimental to learning outcomes, e.g., images misclassified into the wrong classes, but also costly: bad data is estimated to cost the U.S. up to a daunting 3 trillion dollars per year. In this paper, we address the following question: how can prevailing (deep) machine learning models be robustly trained given a non-negligible presence of data with corrupted labels? Dirty labels significantly increase the complexity of existing learning problems, as the ground truth of label quality is not easily assessed. Here, we advocate rigorously incorporating human experts into one learning framework in which artificial and human intelligence collaborate. To this end, we combine three strategies to enhance the robustness of deep and regular machine learning algorithms, namely, (i) data filtering through an additional quality model, (ii) data selection via actively learning from experts, and (iii) imitating the expert's correction process. We demonstrate the three strategies sequentially with examples and apply them to widely used benchmarks, such as CIFAR10 and CIFAR100. Our initial results show the effectiveness of the proposed strategies in combating dirty labels, e.g., the resulting classification accuracy can be up to 50% higher than that of state-of-the-art AI-only solutions. Finally, we extend the discussion of robust learning from trusted data to trusted execution environments.
KW - Active learning
KW - Adversarial learning
KW - Data filtering
KW - Deep neural networks
KW - Dirty labels
KW - Trusted execution
UR - http://www.scopus.com/inward/record.url?scp=85082242077&partnerID=8YFLogxK
U2 - 10.1109/TPS-ISA48467.2019.00038
DO - 10.1109/TPS-ISA48467.2019.00038
M3 - Conference contribution
AN - SCOPUS:85082242077
T3 - Proceedings - 1st IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications, TPS-ISA 2019
SP - 236
EP - 244
BT - Proceedings - 1st IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications, TPS-ISA 2019
PB - IEEE
T2 - 1st IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications, TPS-ISA 2019
Y2 - 12 December 2019 through 14 December 2019
ER -