Enhancing Robustness of On-line Learning Models on Highly Noisy Data

Zilong Zhao; Robert Birke; Rui Han; Bogdan Robu; Sara Bouchenak; Sonia Ben Mokhtar; Lydia Y. Chen

doi:10.1109/TDSC.2021.3063947

Data-Intensive Systems

Research output: Contribution to journal › Article › Scientific › peer-review

8 Citations (Scopus)

Abstract

Classification algorithms have been widely adopted to detect anomalies for various systems, e.g., IoT, cloud and face recognition, under the common assumption that the data source is clean. However, data collected from the wild can be unreliable due to careless annotations or malicious data transformation for incorrect anomaly detection. In this paper, we extend a two-layer on-line data selection framework: Robust Anomaly Detector (RAD) with a newly designed ensemble prediction where both layers contribute to the final anomaly detection decision. To adapt to the on-line nature of anomaly detection, we consider additional features of conflicting opinions of classifiers, repetitive cleaning, and oracle knowledge. We on-line learn from the incoming data streams and continuously cleanse the data, so as to adapt to the increasing learning capacity from the larger accumulated data set. Moreover, we explore the concept of oracle learning that provides additional information of true labels for difficult data points. The proposed RAD and its extensions are general and can be applied to different anomaly detection algorithms.

Original language	English
Article number	9369874
Pages (from-to)	2177 - 2192
Number of pages	16
Journal	IEEE Transactions on Dependable and Secure Computing
Volume	18
Issue number	5
DOIs	https://doi.org/10.1109/TDSC.2021.3063947
Publication status	Published - 2021

Keywords

Unreliable Data
Anomaly Detection
Failures
Attacks
Machine Learning

Cite this

@article{d31b020adfb6473d9955c61a8cd9a9f5,
  title = "Enhancing Robustness of On-line Learning Models on Highly Noisy Data",
  abstract = "Classification algorithms have been widely adopted to detect anomalies for various systems, e.g., IoT, cloud and face recognition, under the common assumption that the data source is clean. However, data collected from the wild can be unreliable due to careless annotations or malicious data transformation for incorrect anomaly detection. In this paper, we extend a two-layer on-line data selection framework: Robust Anomaly Detector (RAD) with a newly designed ensemble prediction where both layers contribute to the final anomaly detection decision. To adapt to the on-line nature of anomaly detection, we consider additional features of conflicting opinions of classifiers, repetitive cleaning, and oracle knowledge. We on-line learn from the incoming data streams and continuously cleanse the data, so as to adapt to the increasing learning capacity from the larger accumulated data set. Moreover, we explore the concept of oracle learning that provides additional information of true labels for difficult data points. The proposed RAD and its extensions are general and can be applied to different anomaly detection algorithms.",
  keywords = "Unreliable Data, Anomaly Detection, Failures, Attacks, Machine Learning",
  author = "Zhao, Zilong and Birke, Robert and Han, Rui and Robu, Bogdan and Bouchenak, Sara and {Ben Mokhtar}, Sonia and Chen, {Lydia Y.}",
  year = "2021",
  doi = "10.1109/TDSC.2021.3063947",
  language = "English",
  volume = "18",
  number = "5",
  pages = "2177--2192",
  journal = "IEEE Transactions on Dependable and Secure Computing",
  issn = "1545-5971",
  publisher = "Institute of Electrical and Electronics Engineers (IEEE)",
}

TY  - JOUR
T1  - Enhancing Robustness of On-line Learning Models on Highly Noisy Data
AU  - Zhao, Zilong
AU  - Birke, Robert
AU  - Han, Rui
AU  - Robu, Bogdan
AU  - Bouchenak, Sara
AU  - Ben Mokhtar, Sonia
AU  - Chen, Lydia Y.
PY  - 2021
Y1  - 2021
N2  - Classification algorithms have been widely adopted to detect anomalies for various systems, e.g., IoT, cloud and face recognition, under the common assumption that the data source is clean. However, data collected from the wild can be unreliable due to careless annotations or malicious data transformation for incorrect anomaly detection. In this paper, we extend a two-layer on-line data selection framework: Robust Anomaly Detector (RAD) with a newly designed ensemble prediction where both layers contribute to the final anomaly detection decision. To adapt to the on-line nature of anomaly detection, we consider additional features of conflicting opinions of classifiers, repetitive cleaning, and oracle knowledge. We on-line learn from the incoming data streams and continuously cleanse the data, so as to adapt to the increasing learning capacity from the larger accumulated data set. Moreover, we explore the concept of oracle learning that provides additional information of true labels for difficult data points. The proposed RAD and its extensions are general and can be applied to different anomaly detection algorithms.
AB  - Classification algorithms have been widely adopted to detect anomalies for various systems, e.g., IoT, cloud and face recognition, under the common assumption that the data source is clean. However, data collected from the wild can be unreliable due to careless annotations or malicious data transformation for incorrect anomaly detection. In this paper, we extend a two-layer on-line data selection framework: Robust Anomaly Detector (RAD) with a newly designed ensemble prediction where both layers contribute to the final anomaly detection decision. To adapt to the on-line nature of anomaly detection, we consider additional features of conflicting opinions of classifiers, repetitive cleaning, and oracle knowledge. We on-line learn from the incoming data streams and continuously cleanse the data, so as to adapt to the increasing learning capacity from the larger accumulated data set. Moreover, we explore the concept of oracle learning that provides additional information of true labels for difficult data points. The proposed RAD and its extensions are general and can be applied to different anomaly detection algorithms.
KW  - Unreliable Data
KW  - Anomaly Detection
KW  - Failures
KW  - Attacks
KW  - Machine Learning
UR  - http://www.scopus.com/inward/record.url?scp=85102278310&partnerID=8YFLogxK
U2  - 10.1109/TDSC.2021.3063947
DO  - 10.1109/TDSC.2021.3063947
M3  - Article
AN  - SCOPUS:85102278310
SN  - 1545-5971
VL  - 18
SP  - 2177
EP  - 2192
JO  - IEEE Transactions on Dependable and Secure Computing
JF  - IEEE Transactions on Dependable and Secure Computing
IS  - 5
M1  - 9369874
ER  - 
