Enhancing Robustness of On-line Learning Models on Highly Noisy Data

Zilong Zhao, Robert Birke, Rui Han, Bogdan Robu, Sara Bouchenak, Sonia Ben Mokhtar, Lydia Y. Chen

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Classification algorithms have been widely adopted to detect anomalies in various systems, e.g., IoT, cloud, and face recognition, under the common assumption that the data source is clean. However, data collected in the wild can be unreliable because of careless annotation or malicious data transformations intended to cause incorrect anomaly detection. In this paper, we extend a two-layer on-line data selection framework, Robust Anomaly Detector (RAD), with a newly designed ensemble prediction in which both layers contribute to the final anomaly detection decision. To adapt to the on-line nature of anomaly detection, we consider three additional features: conflicting opinions of classifiers, repetitive cleaning, and oracle knowledge. We learn on-line from the incoming data streams and continuously cleanse the data, so as to adapt to the increasing learning capacity that comes with the larger accumulated data set. Moreover, we explore the concept of oracle learning, which provides additional information on the true labels of difficult data points. The proposed RAD and its extensions are general and can be applied to different anomaly detection algorithms.

Original language: English
Article number: 9369874
Pages (from-to): 2177-2192
Number of pages: 16
Journal: IEEE Transactions on Dependable and Secure Computing
Volume: 18
Issue number: 5
Publication status: Published - 2021

Keywords

  • Unreliable Data
  • Anomaly Detection
  • Failures
  • Attacks
  • Machine Learning
