A Privacy-Preserving Framework for Distance-Based Mining


As data collection and storage grow at an exponential rate, the privacy of sensitive individual information has become a critical concern. Many organizations need to protect the privacy of their data while still allowing useful patterns to be discovered from it.

Distance-based mining algorithms are widely used. However, commonly used privacy-preserving methods can degrade mining quality and are vulnerable to simple attacks that exploit correlations in the data. Existing work also provides no worst-case privacy guarantee. In this project, our research group is investigating a novel framework for preserving privacy in distance-based data mining. The methods developed in this research will apply to privatizing sensitive data, both numeric and categorical, as well as image data such as photographs and fingerprints, with practical implications in areas such as law enforcement and counterterrorism. The proposed methods can ensure privacy protection with provable guarantees while still allowing accurate mining of the privatized data with distance-based methods.



This material is based in part upon work supported by the National Science Foundation under Grant Number IIS-0713345. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.