L1-depth revisited: A robust angle-based outlier factor in high-dimensional space

N Pham - Joint European Conference on Machine Learning and …, 2018 - Springer
Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2018, Springer
Abstract
Angle-based outlier detection (ABOD) has recently emerged as an effective method for detecting outliers in high dimensions. Instead of examining neighborhoods as proximity-based concepts do, ABOD uses the broadness of a point's angle spectrum as its outlier factor. Despite being a parameter-free and robust measure in high-dimensional space, the exact solution of ABOD suffers from a cubic cost O(n^3) in the data size n, and hence cannot be used on large-scale data sets.
In this work we present a conceptual relationship between the ABOD intuition and the L1-depth concept in statistics, one of the earliest methods used for detecting outliers. Building on this relationship, we propose to use L1-depth as a variant of angle-based outlier factors, since it requires only quadratic computational time, like proximity-based outlier factors. Empirically, L1-depth is competitive with (and often superior to) proximity-based and other proposed angle-based outlier factors for detecting high-dimensional outliers, in both efficiency and accuracy.
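To make the quadratic cost concrete, the following is a minimal sketch of an L1-depth outlier score. It assumes the standard statistical form of L1-depth (one minus the norm of the average unit vector pointing from the query point to the other points); the paper's exact normalization may differ. The function names `l1_depth` and `l1_depth_scores` and the toy data are illustrative, not taken from the paper or its repository.

```python
import math


def l1_depth(q, points):
    """Assumed form: 1 - || mean of unit vectors (x - q) / ||x - q|| ||."""
    dim = len(q)
    total = [0.0] * dim
    count = 0
    for x in points:
        diff = [xi - qi for xi, qi in zip(x, q)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            continue  # skip q itself and exact duplicates of q
        for j in range(dim):
            total[j] += diff[j] / norm
        count += 1
    if count == 0:
        return 1.0  # no other distinct point: treat q as maximally deep
    return 1.0 - math.sqrt(sum(t * t for t in total)) / count


def l1_depth_scores(points):
    """One O(n) pass per point, hence O(n^2) for the whole data set."""
    return [l1_depth(p, points) for p in points]


# Toy data: a unit square, its center, and one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
scores = l1_depth_scores(points)
# Low depth means "more outlying": the far point gets the lowest score.
most_outlying = scores.index(min(scores))
```

A point surrounded by data sees unit vectors that largely cancel, so its depth is close to 1; a point far outside the data sees nearly parallel unit vectors, so its depth is close to 0. This mirrors the angle-spectrum intuition behind ABOD.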
To avoid the quadratic computational time, we introduce a simple but efficient sampling method named SamDepth for estimating the L1-depth measure. We also present a theoretical analysis that guarantees the reliability of SamDepth. Experiments on many real-world high-dimensional data sets demonstrate that SamDepth with samples often achieves very competitive accuracy and runs several orders of magnitude faster than proximity-based and ABOD competitors. Data related to this paper are available at: https://www.dropbox.com/s/nk7nqmwmdsatizs/Datasets.zip. Code related to this paper is available at: https://github.com/NinhPham/Outlier.
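The sampling idea can be sketched as follows. This is not the authors' SamDepth implementation: the abstract does not specify the sampling scheme or the sample size, so uniform sampling without replacement, the parameter `sample_size`, and the function names are all assumptions made for illustration. The depth formula is the same assumed standard form of L1-depth (one minus the norm of the mean unit vector).

```python
import math
import random


def depth_from(q, refs):
    """Assumed L1-depth of q with respect to the reference points refs."""
    total = [0.0] * len(q)
    count = 0
    for x in refs:
        diff = [xi - qi for xi, qi in zip(x, q)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            continue  # skip q itself if it was drawn into the sample
        total = [t + d / norm for t, d in zip(total, diff)]
        count += 1
    if count == 0:
        return 1.0
    return 1.0 - math.sqrt(sum(t * t for t in total)) / count


def sampled_l1_depth(q, points, sample_size, rng):
    """Estimate the depth of q from a uniform sample (hypothetical scheme)."""
    if sample_size >= len(points):
        refs = points  # nothing to gain from sampling: use every point
    else:
        refs = rng.sample(points, sample_size)
    return depth_from(q, refs)


# Toy data: 500 points in the unit square plus one far-away point.
rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(500)]
far_point = (100.0, 100.0)
data.append(far_point)

estimate = sampled_l1_depth(far_point, data, sample_size=30, rng=rng)
exact = depth_from(far_point, data)
```

With `sample_size` fixed, each estimate costs time proportional to the sample rather than to n, which is where the speed-up over the quadratic exact computation comes from. How large the sample must be for a reliable estimate is the subject of the paper's theoretical analysis and is not reproduced here.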