Outlier detection using clustering methods: a data cleaning application

A Loureiro, L Torgo, C Soares - … -based systems for the Public Sector, 2004 - academia.edu
Proceedings of KDNet Symposium on Knowledge-based systems for the Public …, 2004
Abstract
This paper describes a methodology for applying hierarchical clustering methods to outlier detection. The methodology is tested on the problem of cleaning Official Statistics data, where the goal is to detect erroneous foreign trade transactions in data collected by the Portuguese Institute of Statistics (INE). These transactions are a minority, but they still have an important impact on the statistics produced by the institute. Detecting these rare errors is a manual, time-consuming task. Our methodology saves a large amount of time by selecting a small subset of suspicious transactions for manual inspection that nevertheless includes most of the erroneous transactions. In this study we compare several alternative hierarchical clustering methodologies for this task. The results confirm the validity of hierarchical clustering techniques for this task. Moreover, our results clearly outperform previous approaches to the same data, identifying the same level of erroneous transactions with significantly less manual inspection.
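The abstract does not spell out the procedure, so the following is only a rough illustration of the general idea, not the authors' method. It runs a naive single-linkage agglomerative clustering over one-dimensional values and flags the members of very small clusters for manual inspection. The function names, the `max_gap` stopping distance, the `max_cluster_size` cutoff, and the sample prices are all hypothetical choices made for this sketch.

```python
from typing import List


def single_linkage_clusters(values: List[float], max_gap: float) -> List[List[int]]:
    """Naive agglomerative clustering with single linkage.

    Repeatedly merges the two closest clusters until the closest pair is
    farther apart than max_gap. Returns clusters as lists of indices.
    """
    clusters = [[i] for i in range(len(values))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Single linkage: distance between the closest pair of members.
        return min(abs(values[i] - values[j]) for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > max_gap:
            break  # remaining clusters are too far apart to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def flag_suspicious(values: List[float], max_gap: float,
                    max_cluster_size: int = 1) -> List[int]:
    """Return indices of values that end up in very small clusters."""
    clusters = single_linkage_clusters(values, max_gap)
    flagged = [i for c in clusters if len(c) <= max_cluster_size for i in c]
    return sorted(flagged)


# Hypothetical unit prices for one product; one entry looks like a typing error.
prices = [10.2, 9.8, 10.5, 10.1, 97.0, 9.9, 10.4]
suspicious = flag_suspicious(prices, max_gap=1.0)
print(suspicious)
```

Only the flagged indices would be passed on for manual inspection. In practice the stopping distance and size cutoff would need tuning against labelled errors, and a different linkage criterion may behave quite differently.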