An empirical study of automated privacy requirements classification in issue reports
Issued Date
2023-11-01
ISSN
09288910
eISSN
15737535
Scopus ID
2-s2.0-85163735171
Journal Title
Automated Software Engineering
Volume
30
Issue
2
Rights Holder(s)
SCOPUS
Bibliographic Citation
Automated Software Engineering Vol.30 No.2 (2023)
Suggested Citation
Sangaroonsilp P., Choetkiertikul M., Dam H.K., Ghose A. An empirical study of automated privacy requirements classification in issue reports. Automated Software Engineering Vol.30 No.2 (2023). doi:10.1007/s10515-023-00387-9 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/87876
Author(s)
Sangaroonsilp P., Choetkiertikul M., Dam H.K., Ghose A.
Abstract
Data protection laws and regulations have recently emerged to protect the privacy and personal information of individuals. As privacy breaches and vulnerabilities rapidly increase, people have become more aware of, and more concerned about, their privacy. This has drawn significant attention from software development teams to addressing privacy concerns when developing software applications. As today’s software development adopts an agile, issue-driven approach, the issues in an issue tracking system become a centralised pool that gathers new requirements, modification requests and all the tasks of the software project. Hence, establishing an alignment between those issues and privacy requirements is an important step in developing privacy-aware software systems. This alignment also facilitates privacy compliance checking, which regulations may require of organisations. However, manually establishing those alignments is labour intensive and time consuming. In this paper, we explore a wide range of machine learning and natural language processing techniques that can automatically classify privacy requirements in issue reports. We employ six popular techniques, namely Bag-of-Words (BoW), N-gram Inverse Document Frequency (N-gram IDF), Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, Convolutional Neural Network (CNN) and Bidirectional Encoder Representations from Transformers (BERT), to classify privacy-related issue reports in the Google Chrome and Moodle projects. The evaluation showed that BoW, N-gram IDF, TF-IDF and Word2Vec are suitable for classifying privacy requirements in those issue reports, and that N-gram IDF is the best performer in both projects.
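To make the classification task in the abstract concrete, the following is a minimal sketch of one of the named feature techniques, TF-IDF, paired with a simple nearest-centroid cosine classifier. It is not the authors' pipeline: this record does not say which classifier, preprocessing or labels the paper uses, so the classifier choice, the smoothed IDF formula, the two labels (`privacy`, `other`), all function names and the four toy issue texts are assumptions made purely for illustration.

```python
import math
from collections import Counter

# Toy labelled issue texts. These are made up for illustration and are
# NOT taken from the Google Chrome or Moodle data sets.
TRAIN = [
    ("Allow users to delete their personal data from their profile", "privacy"),
    ("Ask for user consent before collecting location data", "privacy"),
    ("Button is misaligned on the settings page", "other"),
    ("Crash when opening a large file in the editor", "other"),
]

def tokenize(text):
    # Lowercase the text and split on any non-alphanumeric character.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()

def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(text, idf):
    # Term frequency times IDF, L2-normalised. Terms unseen in training are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def centroid(vectors):
    # Component-wise mean of a list of sparse vectors.
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}

def cosine(a, b):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train(labelled):
    # Fit IDF weights on all texts, then build one centroid per label.
    idf = fit_idf([text for text, _ in labelled])
    by_label = {}
    for text, label in labelled:
        by_label.setdefault(label, []).append(tfidf(text, idf))
    return idf, {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(text, model):
    # Assign the label whose centroid is most similar to the issue text.
    idf, centroids = model
    vec = tfidf(text, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

model = train(TRAIN)
print(predict("Users should be able to erase personal data", model))
print(predict("Editor crash on large file", model))
```

With only four training texts the result shows the mechanics and nothing more. A realistic study would need a labelled corpus of issue reports and a proper evaluation, as the abstract describes.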