TY - GEN
T1 - Robust (Deep) learning framework against dirty labels and beyond
AU - Ghiassi, Amirmasoud
AU - Younesian, Taraneh
AU - Zhao, Zhilong
AU - Birke, Robert
AU - Schiavoni, Valerio
AU - Chen, Lydia Y.
PY - 2019
Y1 - 2019
N2 - Data is generated with unprecedented speed, due to the flourishing of social media and open platforms. However, due to the lack of scrutiny, both clean and dirty data spread widely. For instance, a significant portion of images is tagged with corrupted, dirty class labels. Such dirty data sets are not only detrimental to learning outcomes, e.g., images misclassified into the wrong classes, but also costly: bad data is estimated to cost the U.S. up to a daunting 3 trillion dollars per year. In this paper, we address the following question: how can prevailing (deep) machine learning models be robustly trained given a non-negligible presence of data with corrupted labels? Dirty labels significantly increase the complexity of existing learning problems, as the ground truth of label quality is not easily assessed. Here, we advocate rigorously incorporating human experts into one learning framework in which artificial and human intelligence collaborate. To this end, we combine three strategies to enhance the robustness of deep and regular machine learning algorithms, namely, (i) data filtering through an additional quality model, (ii) data selection via actively learning from experts, and (iii) imitating the expert's correction process. We demonstrate the three strategies sequentially with examples and apply them to widely used benchmarks, such as CIFAR10 and CIFAR100. Our initial results show the effectiveness of the proposed strategies in combating dirty labels, e.g., the resulting classification accuracy can be up to 50% higher than that of state-of-the-art AI-only solutions. Finally, we extend the discussion of robust learning from trusted data to trusted execution environments.
AB - Data is generated with unprecedented speed, due to the flourishing of social media and open platforms. However, due to the lack of scrutiny, both clean and dirty data spread widely. For instance, a significant portion of images is tagged with corrupted, dirty class labels. Such dirty data sets are not only detrimental to learning outcomes, e.g., images misclassified into the wrong classes, but also costly: bad data is estimated to cost the U.S. up to a daunting 3 trillion dollars per year. In this paper, we address the following question: how can prevailing (deep) machine learning models be robustly trained given a non-negligible presence of data with corrupted labels? Dirty labels significantly increase the complexity of existing learning problems, as the ground truth of label quality is not easily assessed. Here, we advocate rigorously incorporating human experts into one learning framework in which artificial and human intelligence collaborate. To this end, we combine three strategies to enhance the robustness of deep and regular machine learning algorithms, namely, (i) data filtering through an additional quality model, (ii) data selection via actively learning from experts, and (iii) imitating the expert's correction process. We demonstrate the three strategies sequentially with examples and apply them to widely used benchmarks, such as CIFAR10 and CIFAR100. Our initial results show the effectiveness of the proposed strategies in combating dirty labels, e.g., the resulting classification accuracy can be up to 50% higher than that of state-of-the-art AI-only solutions. Finally, we extend the discussion of robust learning from trusted data to trusted execution environments.
KW - Active learning
KW - Adversarial learning
KW - Data filtering
KW - Deep neural networks
KW - Dirty labels
KW - Trusted execution
UR - http://www.scopus.com/inward/record.url?scp=85082242077&partnerID=8YFLogxK
U2 - 10.1109/TPS-ISA48467.2019.00038
DO - 10.1109/TPS-ISA48467.2019.00038
M3 - Conference contribution
AN - SCOPUS:85082242077
T3 - Proceedings - 1st IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications, TPS-ISA 2019
SP - 236
EP - 244
BT - Proceedings - 1st IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications, TPS-ISA 2019
PB - IEEE
T2 - 1st IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications, TPS-ISA 2019
Y2 - 12 December 2019 through 14 December 2019
ER -