Code Clone Configuration as a Multi-Objective Search Problem
Issued Date
2024-10-24
Resource Type
ISSN
19493770
eISSN
19493789
Scopus ID
2-s2.0-85210572093
Journal Title
International Symposium on Empirical Software Engineering and Measurement
Start Page
503
End Page
509
Rights Holder(s)
SCOPUS
Bibliographic Citation
International Symposium on Empirical Software Engineering and Measurement (2024), 503-509
Suggested Citation
Sousa D., Paixao M., Ragkhitwetsagul C., Uchoa I. Code Clone Configuration as a Multi-Objective Search Problem. International Symposium on Empirical Software Engineering and Measurement (2024), 503-509. doi:10.1145/3674805.3690757 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/102311
Author(s)
Sousa D., Paixao M., Ragkhitwetsagul C., Uchoa I.
Abstract
Clone detection is an automated process for finding duplicated code within a project's code base or between online sources. Nowadays, the code cloning community advocates that developers must be aware of the clones they may have in their code bases. In modern clone detection, rank-based tools stand out as the ones able to handle the large code corpora that are necessary to identify online clones. However, such tools are sensitive to their parameters, which directly affect their clone detection abilities. Moreover, existing parameter optimization approaches for clone detectors are not meant for rank-based tools. To overcome this issue and facilitate empirical studies of code clones, we introduce Multi-objective Code Clone Configuration, a new approach based on multi-objective optimization to search for an optimal set of parameters for a rank-based clone detection tool. In our empirical evaluation, we ran three baseline search algorithms and NSGA-II to assess their performance in this new optimization problem. Additionally, we compared the optimized configurations with the default one. Our results show that NSGA-II achieved the best performance, finding better configurations than those of the baseline algorithms. Finally, the optimized configurations achieved improvements of 71.08% and 46.29% over the default configuration for our fitness functions.
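The abstract frames parameter tuning as a multi-objective search, so candidate configurations are compared by Pareto dominance rather than by a single score. The sketch below illustrates only that dominance relation, which NSGA-II uses in its non-dominated sorting step; it is not the authors' implementation. The configuration names, the choice of two objectives, the assumption that both are maximized, and all fitness values are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical: each configuration is scored by two fitness values,
# both assumed to be maximized. The record does not name the objectives.
Fitness = Tuple[float, float]


def dominates(a: Fitness, b: Fitness) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: Dict[str, Fitness]) -> List[str]:
    """Return the names of configurations that no other configuration dominates."""
    front = []
    for name, fit in scores.items():
        dominated = any(
            dominates(other_fit, fit)
            for other_name, other_fit in scores.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Illustrative values only; they are not results from the paper.
example_scores = {
    "default": (0.40, 0.50),
    "config_a": (0.70, 0.45),
    "config_b": (0.55, 0.60),
    "config_c": (0.50, 0.40),
}
front = pareto_front(example_scores)
print(front)
```

In this toy data, `config_a` and `config_b` each win on one objective, so neither dominates the other and both stay on the front. That is why a multi-objective search returns a set of trade-off configurations instead of a single best one.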