An empirical study of automated privacy requirements classification in issue reports

Sangaroonsilp P.; Choetkiertikul M.; Dam H.K.; Ghose A.

Issued Date

2023-11-01

Resource Type

Article

ISSN

0928-8910

eISSN

1573-7535

DOI

10.1007/s10515-023-00387-9

Scopus ID

2-s2.0-85163735171

Journal Title

Automated Software Engineering

Volume

30

Issue

2

Rights Holder(s)

SCOPUS

Bibliographic Citation

Automated Software Engineering Vol.30 No.2 (2023)

Suggested Citation

Sangaroonsilp P., Choetkiertikul M., Dam H.K., Ghose A. An empirical study of automated privacy requirements classification in issue reports. Automated Software Engineering Vol.30 No.2 (2023). doi:10.1007/s10515-023-00387-9 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/87876

Title

An empirical study of automated privacy requirements classification in issue reports

Author(s)

Sangaroonsilp P.
Choetkiertikul M.
Dam H.K.
Ghose A.

Author's Affiliation

Mahidol University
University of Wollongong

Other Contributor(s)

Mahidol University

Abstract

Data protection laws and regulations have recently been introduced to protect the privacy and personal information of individuals. As privacy breaches and vulnerabilities rapidly increase, people have become more aware of, and more concerned about, their privacy. These developments have drawn software development teams' attention to addressing privacy concerns when developing software applications. As today’s software development adopts an agile, issue-driven approach, the issues in an issue tracking system become a centralised pool that gathers new requirements, requests for modification and all the tasks of the software project. Hence, establishing an alignment between those issues and privacy requirements is an important step in developing privacy-aware software systems. This alignment also facilitates privacy compliance checking, which regulations may require of organisations. However, manually establishing those alignments is labour-intensive and time-consuming. In this paper, we explore a wide range of machine learning and natural language processing techniques that can automatically classify privacy requirements in issue reports. We employ six popular techniques, namely Bag-of-Words (BoW), N-gram Inverse Document Frequency (N-gram IDF), Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, Convolutional Neural Network (CNN) and Bidirectional Encoder Representations from Transformers (BERT), to classify privacy-related issue reports in the Google Chrome and Moodle projects. The evaluation showed that the BoW, N-gram IDF, TF-IDF and Word2Vec techniques are suitable for classifying privacy requirements in those issue reports. In addition, N-gram IDF is the best performer in both projects.
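
As a rough illustration of one of the feature techniques named in the abstract, the sketch below builds TF-IDF vectors from issue text and assigns a label by cosine similarity to per-class centroids. It is only a minimal sketch under stated assumptions: this record does not describe the authors' classifiers, preprocessing or datasets, so the nearest-centroid rule, the whitespace tokenizer, the function names and the four toy issue texts are all hypothetical stand-ins, not the paper's method or data.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lower-cased whitespace tokens; real pipelines usually do more.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency for every term seen in training.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def normalise(vec):
    # Scale a sparse vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {term: w / norm for term, w in vec.items()}


def tfidf(text, idf):
    # Term frequency times IDF, ignoring terms unseen during training.
    tf = Counter(t for t in tokenize(text) if t in idf)
    return normalise({term: count * idf[term] for term, count in tf.items()})


def train(docs, labels):
    # Hypothetical stand-in classifier: one averaged TF-IDF centroid per label.
    idf = fit_idf(docs)
    sums = {}
    for doc, label in zip(docs, labels):
        acc = sums.setdefault(label, Counter())
        for term, w in tfidf(doc, idf).items():
            acc[term] += w
    return idf, {label: normalise(acc) for label, acc in sums.items()}


def predict(text, model):
    # Pick the label whose centroid is most similar to the issue text.
    idf, centroids = model
    vec = tfidf(text, idf)
    def score(label):
        return sum(w * centroids[label].get(term, 0.0) for term, w in vec.items())
    return max(centroids, key=score)


# Toy, made-up issue summaries (not taken from Google Chrome or Moodle).
issues = [
    "allow users to delete their personal data",
    "ask for user consent before collecting location data",
    "fix crash when opening settings page",
    "improve rendering speed of the toolbar",
]
labels = ["privacy", "privacy", "other", "other"]
model = train(issues, labels)

print(predict("let users erase stored personal data", model))
print(predict("toolbar crash on settings page", model))
```

With only four training texts the predictions depend entirely on word overlap, so this shows the shape of a TF-IDF pipeline and says nothing about the accuracy reported in the study.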

Keyword(s)

Computer Science

URI

https://repository.li.mahidol.ac.th/handle/123456789/87876

Collections

Scopus 2023
