Authors
Yuanyuan Wei, Julian Jang-Jaccard, Fariza Sabrina, Timothy McIntosh
Publication date
2021/10/20
Conference
2021 IEEE 15th International Conference on Big Data Science and Engineering (BigDataSE)
Pages
87-94
Publisher
IEEE
Description
Existing outlier detection algorithms differ in their sensitivity to noisy data such as extreme values. In this paper, we propose a novel cluster-based outlier detection algorithm named MSD-Kmeans, which combines the statistical Mean and Standard Deviation (MSD) method with the K-means clustering algorithm from machine learning to detect outliers more accurately while better controlling extreme values. MSD-Kmeans operates in two phases: (1) the MSD algorithm is applied to eliminate as much noisy data as possible, minimizing its interference with the clusters, and (2) the K-means algorithm is applied to obtain locally optimal clusters. We evaluate our algorithm and demonstrate its effectiveness in the context of detecting possible overcharging of taxi fares. We compare the performance indicators of MSD-Kmeans with those of other outlier detection algorithms from both statistical and machine learning-based …
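The two-phase idea in the description can be sketched as below. This is a minimal, hypothetical illustration, not the authors' implementation: the cut-off of k = 3 standard deviations, the plain Lloyd's K-means with seeded random initialization, the final rule that flags points lying more than three standard deviations above the mean distance-to-centroid, the function names (`msd_filter`, `kmeans`, `msd_kmeans`) and the toy taxi-trip data are all assumptions made for the example.

```python
import math
import random
from statistics import mean, pstdev

Point = tuple[float, ...]


def msd_filter(points: list[Point], k: float = 3.0) -> tuple[list[Point], list[Point]]:
    """Phase 1 (MSD): flag any point with a coordinate more than k population
    standard deviations from that coordinate's mean; keep the rest.
    k = 3.0 is an assumed default, not a value taken from the paper."""
    dims = range(len(points[0]))
    mus = [mean(p[d] for p in points) for d in dims]
    sds = [pstdev([p[d] for p in points]) for d in dims]
    kept, flagged = [], []
    for p in points:
        extreme = any(sds[d] > 0 and abs(p[d] - mus[d]) > k * sds[d] for d in dims)
        (flagged if extreme else kept).append(p)
    return kept, flagged


def kmeans(points: list[Point], n_clusters: int, n_iter: int = 100, seed: int = 0) -> list[Point]:
    """Phase 2: plain Lloyd's K-means with seeded random initial centroids.
    Converges to a locally optimal set of centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        clusters: list[list[Point]] = [[] for _ in range(n_clusters)]
        for p in points:
            nearest = min(range(n_clusters), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean (keep it if the cluster is empty).
        new = [tuple(mean(axis) for axis in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


def msd_kmeans(points: list[Point], n_clusters: int = 2,
               k_msd: float = 3.0, k_dist: float = 3.0) -> tuple[list[Point], list[Point]]:
    """Hypothetical end-to-end pipeline. The distance-to-centroid rule below is an
    assumption: the abstract does not say how outliers are read off the clusters."""
    kept, flagged = msd_filter(points, k_msd)
    centroids = kmeans(kept, n_clusters)
    dists = [min(math.dist(p, c) for c in centroids) for p in kept]
    mu, sd = mean(dists), pstdev(dists)
    flagged += [p for p, d in zip(kept, dists) if sd > 0 and d > mu + k_dist * sd]
    return centroids, flagged


# Toy (distance_km, fare) trips, invented for illustration only.
SHORT = [(2.0, 10.0), (2.1, 10.5), (1.9, 9.5), (2.2, 11.0), (1.8, 9.0),
         (2.0, 10.2), (2.1, 9.8), (1.9, 10.4), (2.0, 9.6), (2.2, 10.8)]
LONG = [(10.0, 30.0), (10.2, 30.5), (9.8, 29.5), (10.1, 31.0), (9.9, 29.0),
        (10.0, 30.2), (10.3, 30.8), (9.7, 29.2), (10.0, 29.8), (10.1, 30.4)]
trips = SHORT + LONG + [(5.0, 500.0)]  # last trip: a suspiciously high fare

if __name__ == "__main__":
    centroids, flagged = msd_kmeans(trips, n_clusters=2)
    print(flagged)  # [(5.0, 500.0)]
```

On this toy input the MSD phase removes the (5.0, 500.0) trip before clustering, so it cannot pull a centroid toward itself; K-means then settles on centroids at about (2.02, 10.08) and (10.01, 30.04). Running K-means on the raw data instead would let that single extreme fare distort the clusters, which is the interference the first phase is meant to minimize.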
Total citations
[Chart: citations per year, 2020 to 2024]