An empirical study of automated privacy requirements classification in issue reports

dc.contributor.author: Sangaroonsilp P.
dc.contributor.author: Choetkiertikul M.
dc.contributor.author: Dam H.K.
dc.contributor.author: Ghose A.
dc.contributor.other: Mahidol University
dc.date.accessioned: 2023-07-17T18:02:01Z
dc.date.available: 2023-07-17T18:02:01Z
dc.date.issued: 2023-11-01
dc.description.abstract: Recent data protection laws and regulations have emerged to protect the privacy and personal information of individuals. As cases of privacy breaches and vulnerabilities rapidly increase, people are becoming more aware of and concerned about their privacy. These developments have drawn significant attention to the need for software development teams to address privacy concerns when developing software applications. As today’s software development adopts an agile, issue-driven approach, issues in an issue tracking system become a centralised pool that gathers new requirements, requests for modification and all the tasks of the software project. Hence, establishing an alignment between those issues and privacy requirements is an important step in developing privacy-aware software systems. This alignment also facilitates privacy compliance checking, which regulations may require of organisations. However, manually establishing those alignments is labour-intensive and time-consuming. In this paper, we explore a wide range of machine learning and natural language processing techniques that can automatically classify privacy requirements in issue reports. We employ six popular techniques, namely Bag-of-Words (BoW), N-gram Inverse Document Frequency (N-gram IDF), Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, Convolutional Neural Network (CNN) and Bidirectional Encoder Representations from Transformers (BERT), to classify privacy-related issue reports in the Google Chrome and Moodle projects. The evaluation showed that the BoW, N-gram IDF, TF-IDF and Word2Vec techniques are suitable for classifying privacy requirements in those issue reports. In addition, N-gram IDF is the best performer in both projects.
dc.identifier.citation: Automated Software Engineering Vol.30 No.2 (2023)
dc.identifier.doi: 10.1007/s10515-023-00387-9
dc.identifier.eissn: 15737535
dc.identifier.issn: 09288910
dc.identifier.scopus: 2-s2.0-85163735171
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/20.500.14594/87876
dc.rights.holder: SCOPUS
dc.subject: Computer Science
dc.title: An empirical study of automated privacy requirements classification in issue reports
dc.type: Article
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85163735171&origin=inward
oaire.citation.issue: 2
oaire.citation.title: Automated Software Engineering
oaire.citation.volume: 30
oairecerif.author.affiliation: Mahidol University
oairecerif.author.affiliation: University of Wollongong
