Semantic adaptive microaggregation of categorical microdata

S Martínez, D Sánchez, A Valls - Computers & Security, 2012 - Elsevier
In the context of Statistical Disclosure Control, microaggregation is a privacy-preserving method that masks sensitive microdata prior to publication. It iteratively creates clusters of at least k elements and replaces the elements of each cluster with the cluster prototype, so that they become k-indistinguishable (anonymous). This data transformation causes a loss of information with respect to the original dataset, which affects the utility of the masked data, so the aim of microaggregation algorithms is to find the partition that minimises the information loss while ensuring a certain level of privacy. Most microaggregation methods, such as the MDAV algorithm, which is the focus of this paper, have been designed for numerical data. Extending them to support non-numerical (categorical) attributes is not straightforward because of the limitations on defining appropriate aggregation operators. Concretely, related works focused on the MDAV algorithm group data into clusters of constrained (or even fixed) size and/or incorporate only a basic categorical treatment of non-numerical data. This approach negatively affects the utility of the protected dataset because neither the distributional characteristics of the data nor their underlying semantics are properly considered. In this paper, we propose a set of modifications to the MDAV algorithm focused on categorical microdata. Our approach has been evaluated and compared with related works when protecting real datasets with textual attribute values. Results show that our method produces masked datasets that better minimise the information loss resulting from the data transformation.
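
To make the clustering and prototype replacement described above concrete, the following is a minimal Python sketch of a fixed-size, MDAV-style microaggregation over categorical records. It is not the method proposed in the paper: it illustrates the kind of baseline the abstract criticises, with a basic categorical treatment and no semantics. The function names (`mismatch`, `prototype`, `mdav_clusters`, `microaggregate`), the attribute-mismatch distance, the attribute-wise mode used as prototype, and the sample records are all assumptions made for illustration.

```python
from collections import Counter


def mismatch(a, b):
    """Assumed distance: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def prototype(records):
    """Assumed prototype: attribute-wise most frequent value (mode)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def farthest(target, records, indices):
    """Index (among `indices`) of the record most distant from `target`."""
    return max(indices, key=lambda i: mismatch(records[i], target))


def take_cluster(seed, records, remaining, k):
    """Remove and return the k remaining records nearest to records[seed]."""
    nearest = sorted(remaining, key=lambda i: mismatch(records[i], records[seed]))[:k]
    for i in nearest:
        remaining.remove(i)
    return nearest


def mdav_clusters(records, k):
    """Partition record indices into clusters of at least k elements."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    remaining = list(range(len(records)))
    clusters = []
    while len(remaining) >= 3 * k:
        centre = prototype([records[i] for i in remaining])
        r = farthest(centre, records, remaining)      # most distant from the centre
        s = farthest(records[r], records, remaining)  # most distant from r
        clusters.append(take_cluster(r, records, remaining, k))
        clusters.append(take_cluster(s, records, remaining, k))
    if len(remaining) >= 2 * k:
        centre = prototype([records[i] for i in remaining])
        r = farthest(centre, records, remaining)
        clusters.append(take_cluster(r, records, remaining, k))
    if remaining:
        clusters.append(list(remaining))  # final cluster holds k to 2k-1 records
    return clusters


def microaggregate(records, k):
    """Replace every record by the prototype of its cluster."""
    masked = list(records)
    for cluster in mdav_clusters(records, k):
        proto = prototype([records[i] for i in cluster])
        for i in cluster:
            masked[i] = proto
    return masked


# Hypothetical sample data: (occupation, diagnosis) pairs.
sample = [
    ("nurse", "flu"), ("doctor", "flu"), ("nurse", "cold"),
    ("teacher", "asthma"), ("teacher", "flu"), ("lawyer", "asthma"),
    ("doctor", "cold"),
]
masked_sample = microaggregate(sample, 2)
```

In this sketch every masked record is shared by at least k records, which is the k-indistinguishability property the abstract refers to. A semantically grounded approach would, under the abstract's description, replace the mismatch distance and the mode with operators that take the meaning of the values and the distribution of the data into account.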