Publication: A Data Masking Guideline for Optimizing Insights and Privacy under GDPR Compliance
Issued Date
2020-07-01
Other identifier(s)
2-s2.0-85089195691
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
ACM International Conference Proceeding Series. (2020)
Suggested Citation
Chitanut Tachepun, Sotarat Thammaboosadee. A Data Masking Guideline for Optimizing Insights and Privacy under GDPR Compliance. ACM International Conference Proceeding Series (2020). doi:10.1145/3406601.3406627. Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/57822
Title
A Data Masking Guideline for Optimizing Insights and Privacy under GDPR Compliance
Author(s)
Chitanut Tachepun, Sotarat Thammaboosadee
Abstract
© 2020 ACM. The General Data Protection Regulation (GDPR) has been enforced since May 2018 and has become a disruptive issue for every organization because of its severe penalties for data breaches and for unlawful use of personal data, e.g., processing without the data subject's consent. Pseudonymization and anonymization are therefore among the techniques employed to reduce the privacy risks arising from a data breach. Unfortunately, they also destroy patterns in the data that could otherwise be analyzed or monetized to gain useful insights through data analytics or data science approaches. This paper focuses on balancing privacy and insight so that data remain useful for analysis while complying with the GDPR. It proposes a guideline consisting of three techniques (tokenization, suppression, and generalization) that protect personal data according to risk scores calculated by two methods: data classification and data uniqueness. The criteria in the guideline are evaluated by comparing classification performance on the protected data against five original open datasets, using three data mining algorithms with hyperparameter tuning. The results show that data protected under the proposed guideline are adequately protected and show no significant difference in classification performance compared with the unprotected data.
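To make the three techniques named in the abstract concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not give the guideline's criteria or risk-score formulas, so the function names, the demo key, the 10-year age bands, and the definition of uniqueness as the share of distinct values in a column are all assumptions chosen for illustration.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would load a secret from a key store.
SECRET_KEY = b"demo-key-not-for-production"


def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Tokenization: replace a value with a keyed, deterministic token.

    The same input always maps to the same token, so joins and counts
    still work on the masked column.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def suppress(value: str) -> str:
    """Suppression: drop the value entirely and keep only a placeholder."""
    return "*"


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def uniqueness(values) -> float:
    """Assumed uniqueness measure: distinct values divided by total values.

    1.0 means every value is unique (high re-identification risk under
    this assumption); values near 0 mean heavy repetition.
    """
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


if __name__ == "__main__":
    emails = ["alice@example.com", "bob@example.com", "alice@example.com"]
    print([tokenize(e) for e in emails])   # first and third tokens match
    print(suppress("0812345678"))
    print(generalize_age(37))
    print(uniqueness(emails))
```

In this sketch, a column with uniqueness close to 1.0 (such as an email address) would be a candidate for tokenization or suppression, while a lower-uniqueness numeric column (such as age) would be a candidate for generalization. The actual thresholds and the data classification score are defined in the paper, not here.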