Browsing by Author "Ragkhitwetsagul C."
Now showing 1 - 20 of 20
A component recommendation model for issues in software projects (2022-01-01)
Kangwanwisit P.; Choetkiertikul M.; Ragkhitwetsagul C.; Sunetnanta T.; Maipradit R.; Hata H.; Matsumoto K.; Mahidol University
In modern software development projects, developer teams usually adopt an issue-driven approach to increase their productivity. The component of an issue report implicitly organizes the issues in a software project (e.g., defects, new feature requests, and tasks) into groups of issues with similar characteristics. A component is thus an important attribute to identify during issue triage, and assigning the correct component(s) to an issue is crucial for issue resolution. However, this is a challenging task, since large-scale projects contain a considerable number of components (e.g., almost one hundred components in the Bamboo project), and that number can grow significantly as the project evolves over time. In this paper, we propose an approach that uses textual feature extraction and machine learning techniques with Binary Relevance (BR) to develop a component recommendation model supporting the task of assigning component(s) to an issue. An empirical evaluation over 60,000 issue reports shows that our proposed models outperform the baseline benchmarks and other techniques, achieving on average 0.480 Precision@1, 0.616 Recall@3, 0.432 MAP, and 0.596 MRR.

A taxonomy for mining and classifying privacy requirements in issue reports (2023-05-01)
Sangaroonsilp P.; Dam H.K.; Choetkiertikul M.; Ragkhitwetsagul C.; Ghose A.; Mahidol University
Context: Digital and physical trails of user activities are collected over the use of software applications and systems. As software becomes ubiquitous, protecting user privacy has become challenging. With the increasing awareness of user privacy and the advent of privacy regulations and policies, there is an emerging need to implement software systems that enhance the protection of personal data processing. However, existing data protection and privacy regulations state key principles only at a high level, making it difficult for software engineers to design and implement privacy-aware systems. Objective: In this paper, we develop a taxonomy that provides a comprehensive set of privacy requirements based on four well-established personal data protection regulations and privacy frameworks: the General Data Protection Regulation (GDPR), ISO/IEC 29100, the Thailand Personal Data Protection Act (Thailand PDPA), and the Asia-Pacific Economic Cooperation (APEC) privacy framework. Methods: These requirements are extracted, refined, and classified (using the goal-based requirements analysis method) to a level at which they can be mapped to issue reports. We also performed a study of how two large open-source software projects (Google Chrome and Moodle) address the privacy requirements in our taxonomy by mining their issue reports. Results: The paper discusses how the collected issues were classified and presents the findings and insights generated from our study. Conclusion: Mining and classifying privacy requirements in issue reports can help organisations be aware of their state of compliance by identifying privacy requirements that have not been addressed in their software projects.
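As an editor's illustration of the kind of issue-report classification the abstract describes, a minimal keyword-matching sketch is shown below. The requirement names and keywords are hypothetical placeholders, not taken from the paper's taxonomy:

```python
# Hypothetical mapping from privacy requirements to trigger keywords.
# These labels and keywords are illustrative only, NOT the paper's taxonomy.
PRIVACY_KEYWORDS = {
    "consent": ["consent", "opt-in", "opt out", "permission"],
    "data erasure": ["delete my data", "erase", "right to be forgotten"],
    "data minimisation": ["collect only", "unnecessary data", "minimise"],
}

def classify_issue(text: str) -> list[str]:
    """Return the privacy requirements whose keywords appear in the issue text."""
    lowered = text.lower()
    return [req for req, words in PRIVACY_KEYWORDS.items()
            if any(w in lowered for w in words)]

issue = "Users cannot opt out of tracking; please add a consent dialog."
print(classify_issue(issue))  # ['consent']
```

A real pipeline of this kind would typically combine such lexical signals with supervised text classification over labeled issue reports.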
The taxonomy can also be traced back to the regulations, standards, and frameworks with which the software projects have not complied, based on the identified privacy requirements.

Adoption of automated software engineering tools and techniques in Thailand (2024-07-01)
Ragkhitwetsagul C.; Krinke J.; Choetkiertikul M.; Sunetnanta T.; Sarro F.; Mahidol University
Readiness for the adoption of Automated Software Engineering (ASE) tools and techniques can vary according to the size and maturity of software companies. ASE tools and techniques have been adopted by large or ultra-large software companies. However, little is known about the adoption of ASE tools and techniques in small and medium-sized software enterprises (SSMEs) in emerging countries, or about the challenges such companies face. We study the adoption of ASE tools and techniques for software measurement, static code analysis, continuous integration, and software testing, and the respective challenges faced by software developers in Thailand, a developing country with a growing software economy that, like other developing countries, mainly consists of SSMEs. Based on the answers of 103 Thai participants in an online survey, we found that Thai software developers are somewhat familiar with ASE tools and agree that adopting such tools would be beneficial. Most of the developers do not use software measurement or static code analysis tools, owing to a lack of knowledge or experience, but agree that their use would be useful. Continuous integration tools have been used, though with some difficulties. Lastly, although automated testing tools are adopted despite several serious challenges, many developers still test their software manually. We call for ASE tools to be made easier to use, in order to lower the barrier to their adoption in SSMEs in developing countries.

Autorepairability: A New Software Quality Characteristic (2024-01-01)
Lapvikai P.; Ragkhitwetsagul C.; Choetkiertikul M.; Higo Y.; Mahidol University
Research on automated program repair (APR) is currently very active. APR techniques have been applied to many bugs in open-source software, but the probability of a successful fix is not very high. We consider that not only should APR techniques be developed, but software systems should also be developed so that their bugs can be easily fixed by APR techniques. In this paper, we propose autorepairability, a new software quality characteristic that indicates how effective automated program repair techniques are for a specific code fragment, file, or project. We also present an approach to automatically measure autorepairability from the source code of a target project, along with experimental results on 1,282 Java method pairs. Autorepairability opens up many avenues of research, for example, on development processes that yield software systems with high autorepairability, and on refactorings that transform software with low autorepairability into software with high autorepairability.

BigCloneBench Considered Harmful for Machine Learning (2022-01-01)
Krinke J.; Ragkhitwetsagul C.; Mahidol University
BigCloneBench is a well-known large-scale dataset of clones, mainly targeted at evaluating the recall of clone detection tools. It has been beneficial for research on clone detection and for evaluating the performance of clone detection tools, for which it has become the standard. It has also been used in machine learning approaches to clone detection or code similarity detection.
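For context, a minimal token-based code-similarity measure of the kind such detectors and learned models approximate can be sketched as follows. This is an editor's sketch with a deliberately naive tokenizer, not code from the paper:

```python
# Jaccard similarity over token sets: a classic, simple code-similarity
# signal. Real clone detectors use far more sophisticated tokenisation
# and normalisation.
import re

def tokens(code: str) -> set[str]:
    # Split into identifiers, numbers, and single punctuation characters.
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code))

def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

x = "for (int i = 0; i < n; i++) sum += a[i];"
y = "for (int j = 0; j < n; j++) total += a[j];"
print(round(jaccard(x, y), 2))  # high similarity: only identifiers differ
```

Type-1/Type-2 clones score near 1.0 under such a measure, while the Weak-Type-3/Type-4 pairs the paper examines are exactly those where lexical similarity breaks down.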
However, the way BigCloneBench has been constructed makes it problematic to use as ground truth for learning code similarity. This paper highlights the features of BigCloneBench that affect the quality of the ground truth and discusses common misconceptions about the benchmark. For example, extending or replacing the ground truth without understanding the properties of BigCloneBench often rests on wrong assumptions, which can lead to invalid results. Moreover, a manual investigation of a sample of Weak-Type-3/Type-4 clone pairs revealed 86% of the pairs to be false positives, threatening the results of machine learning approaches that use BigCloneBench. We call for a halt to using BigCloneBench as the ground truth for learning code similarity.

Challenges in Adopting LLaMA: An Empirical Study of Discussions on Stack Overflow (2024-01-01)
Deeprom R.; Yang S.; Higo Y.; Choetkiertikul M.; Ragkhitwetsagul C.; Mahidol University
LLaMA (Large Language Model Meta AI) has quickly gained traction among developers due to its wide-ranging applications and its capability to be integrated into software projects. As interest in LLaMA grows, discussions around it have surged on platforms like Stack Overflow. The developer community, with its collaborative nature, serves as a valuable source for studying LLaMA's quality, emerging trends, and insights into its usage. Despite this growing attention, there has been no comprehensive study examining how the community interacts with and discusses LLaMA. This study addresses that gap by exploring conversations on Stack Overflow related to LLaMA and its quality, with the objective of identifying key themes and recurring patterns in these discussions. We systematically collected and analyzed 473 posts from Stack Overflow that contained the keyword "LLaMA" or were tagged accordingly. The analysis revealed that prominent topics of discussion include model configuration, error handling, and integration with other technologies. Furthermore, we identified frequently co-occurring tags, underscoring LLaMA's integration within the larger ecosystem of large language models and its interoperability with widely used frameworks, such as Python and Hugging Face Transformers. The findings highlight the complexity of working with LLaMA, especially in model configuration and fine-tuning, indicating a need for better resources, documentation, and community support. The study also suggests that future development should prioritize interoperability with popular machine-learning frameworks to improve the LLM's quality and to strengthen LLaMA's role in the AI ecosystem.

Code Clone Configuration as a Multi-Objective Search Problem (2024-10-24)
Sousa D.; Paixao M.; Ragkhitwetsagul C.; Uchoa I.; Mahidol University
Clone detection is an automated process for finding duplicated code within a project's code base or between online sources. Nowadays, the code cloning community advocates that developers must be aware of the clones they may have in their code bases. In modern clone detection, rank-based tools are the ones able to handle the large code corpora necessary for identifying online clones. However, such tools are sensitive to their parameters, which directly affect their clone detection abilities. Moreover, existing parameter optimization approaches for clone detectors are not designed for rank-based tools. To overcome this issue and facilitate empirical studies of code clones, we introduce Multi-objective Code Clone Configuration, a new approach based on multi-objective optimization that searches for an optimal set of parameters for a rank-based clone detection tool. In our empirical evaluation, we ran three baseline search algorithms and NSGA-II to assess their performance on this new optimization problem. Additionally, we compared the optimized configurations with the default one.
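At the core of NSGA-II is a Pareto-dominance test over candidate solutions. As an editor's sketch (not the paper's code), applied to hypothetical clone-detector configurations scored on two objectives to maximise (say, precision and recall):

```python
# Pareto dominance for maximisation problems, as used by NSGA-II's
# non-dominated sorting. The configurations and scores are made up.

def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if a is at least as good as b on every objective
    and strictly better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(scores: list[tuple[float, ...]]) -> list[tuple[float, ...]]:
    """Keep only configurations not dominated by any other."""
    return [s for s in scores
            if not any(dominates(o, s) for o in scores if o != s)]

configs = [(0.9, 0.4), (0.7, 0.7), (0.6, 0.6), (0.3, 0.9)]
print(pareto_front(configs))  # (0.6, 0.6) is dominated by (0.7, 0.7)
```

NSGA-II layers selection, crossover, and mutation on top of this test, evolving a population of configurations toward the Pareto front rather than a single best point.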
Our results show that NSGA-II achieved the best performance, finding better configurations than the baseline algorithms. Finally, the optimized configurations achieved improvements of 71.08% and 46.29% on our two fitness functions.

Detecting Malicious Android Game Applications on Third-Party Stores Using Machine Learning (2024-01-01)
Sanamontre T.; Visoottiviseth V.; Ragkhitwetsagul C.; Mahidol University
Android's flexibility in installing applications makes it one of the most popular mobile operating systems. Some Android users install applications from third-party stores even though the official application store, Google Play, is available to them. These third-party stores usually host modded versions and self-proclaimed original applications, which can in fact be repackaged applications. Applications on these third-party stores can introduce security risks because of their non-transparent alteration and uploading processes. In this research, we inspect 492 Android applications from ten third-party stores for repackaged applications, using information from the APK files and a token-based code clone detection technique. We also classify repackaged applications as benign or malicious and categorize malicious applications into twelve malware categories. For the malware classification, we use machine learning techniques, including Random Forest, Decision Tree, and XGBoost, with the CCCS-CIC-AndMal-2020 Android malware dataset. Finally, we compare the results with those of VirusTotal, a well-known malware scanning website.

Identifying Software Engineering Challenges in Software SMEs: A Case Study in Thailand (2022-01-01)
Ragkhitwetsagul C.; Krinke J.; Choetkiertikul M.; Sunetnanta T.; Sarro F.; Mahidol University
Small and medium-sized software enterprises (SSMEs) are a vital part of emerging markets. Due to their size, they are not capable of adopting advanced software engineering techniques or automated software engineering tools in the same way large and ultra-large companies are. We study the software engineering challenges faced by SSMEs in Thailand, an emerging market in software development, using semi-structured interviews with four SSMEs. After performing a thematic analysis of the interview transcripts, we found a number of common challenges, such as a lack of testing, code-related issues, and inaccurate effort estimation. We observed that, in order to introduce advanced automated software engineering tools and techniques, SSMEs need to adopt contemporary best practices in software engineering such as automated testing, continuous integration, and automated code review. Moreover, we suggest that software engineering research engage with SSMEs to enable them to improve their knowledge and adopt more advanced software engineering practices.

jscefr: A Framework to Evaluate the Code Proficiency for JavaScript (2024-01-01)
Ragkhitwetsagul C.; Kongwongsupak K.; Maneesawas T.; Puttiwarodom N.; Rojpaisarnkit R.; Choetkiertikul M.; Kula R.G.; Sunetnanta T.; Mahidol University
In this paper, we present jscefr (pronounced jes-cee-fer), a tool that detects the use of different elements of the JavaScript (JS) language, effectively measuring the level of proficiency required to comprehend and deal with a fragment of JavaScript code in software maintenance tasks. Based on the pycefr tool, jscefr incorporates JavaScript elements and the well-known Common European Framework of Reference for Languages (CEFR), and utilizes the official ECMAScript JavaScript documentation from the Mozilla Developer Network. jscefr categorizes JS code into six levels based on proficiency and can detect and classify 138 different JavaScript code constructs. To evaluate the tool, we apply it to three JavaScript projects from the NPM ecosystem, with interesting results.
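The CEFR-style classification idea can be illustrated with a tiny sketch: assign each detected construct a level and report the hardest one a snippet uses. The construct names and level assignments below are hypothetical, not jscefr's actual 138-construct mapping:

```python
# Illustrative CEFR-style levels for JavaScript constructs.
# The assignments are made up for demonstration purposes.
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

CONSTRUCT_LEVEL = {
    "console.log": "A1",
    "for": "A2",
    "arrow function": "B1",
    "async/await": "C1",
    "Proxy": "C2",
}

def snippet_level(constructs: list[str]) -> str:
    """A snippet's level is that of the hardest construct it uses."""
    found = [CONSTRUCT_LEVEL[c] for c in constructs if c in CONSTRUCT_LEVEL]
    return max(found, key=LEVELS.index) if found else "A1"

print(snippet_level(["console.log", "for", "async/await"]))  # C1
```

The real tool detects constructs by parsing source code rather than taking a pre-extracted list, but the aggregation principle is the same.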
A video demonstrating the tool's availability and usage is available at https://youtu.be/Ehh-Prq59Pc.

Microusity: A testing tool for Backends for Frontends (BFF) Microservice Systems (2023-01-01)
Rattanukul P.; Makaranond C.; Watanakulcharus P.; Ragkhitwetsagul C.; Nearunchorn T.; Visoottiviseth V.; Choetkiertikul M.; Sunetnanta T.; Mahidol University
Microservice software architecture is more scalable and efficient than its monolithic predecessor. Despite its increasing adoption, microservices may expose security concerns and issues distinct from those associated with monolithic designs. We propose Microusity, a tool that performs RESTful API testing on a specific microservice pattern called backends for frontends (BFF). We design a novel approach to tracing BFF requests using the port mapping between requests to the BFF and the sub-requests sent to the backend microservices. Furthermore, our tool can pinpoint which backend service caused an internal server error, which may point to unhandled errors or vulnerabilities. Microusity provides an error report and a graph visualization that reveal the source of the error and support developers in comprehending and debugging the errors. An evaluation with eight software practitioners shows that Microusity and its test reports are useful for investigating and understanding problems in BFF systems. The prototype tool and a video demo can be found at https://github.com/MUICT-SERU/MICROUSITY.

pycefr: Python Competency Level through Code Analysis (2022-01-01)
Robles G.; Kula R.G.; Ragkhitwetsagul C.; Sakulniwat T.; Matsumoto K.; Gonzalez-Barahona J.M.; Mahidol University
Python is known to be a versatile language, well suited to both beginners and advanced users. Some elements of the language are easier to understand than others: some are found in any kind of code, while others are used only by experienced programmers. The use of these elements leads to different ways of coding, depending on experience with the language and knowledge of its elements, general programming competence, programming skills, etc. In this paper, we present pycefr, a tool that detects the use of the different elements of the Python language, effectively measuring the level of Python proficiency required to comprehend and deal with a fragment of Python code. Following the well-known Common European Framework of Reference for Languages (CEFR), widely used for natural languages, pycefr categorizes Python code into six levels, depending on the proficiency required to create and understand it. We also discuss different use cases for pycefr: identifying code snippets that can be understood by developers with a certain proficiency, labeling code examples in online resources such as Stack Overflow and GitHub to match them to a certain level of competency, helping in the onboarding of new developers in Open Source Software projects, etc. A video shows the availability and usage of the tool: https://tinyurl.com/ypdt3fwe.

Reusing My Own Code: Preliminary Results for Competitive Coding in Jupyter Notebooks (2022-01-01)
Ritta N.; Settewong T.; Kula R.G.; Ragkhitwetsagul C.; Sunetnanta T.; Matsumoto K.; Mahidol University
The reuse of existing code is widely considered a popular software development practice that provides both benefits and drawbacks for all stakeholders involved. Prior work reports on how code reuse is a common practice in software development projects and in data science projects such as machine learning pipelines. Recently, there has been much work on code reuse in the context of competitive programming. However, although there is work on, for example, detecting plagiarism, no work studies how competitors reuse their own code.
In this paper, we present a preliminary study of the code reuse behavior in the Jupyter notebooks of three Grandmasters in Kaggle competitions, an online competition platform for data scientists, and report the types of code they often reuse. Grandmaster is the highest level reached in competitions (the levels being novice, expert, master, and grandmaster). We find that Grandmasters are less likely to reuse specialized code; instead, they tend to reuse common functions such as importing packages (e.g., importing the pandas library). They are most likely to reuse common abstractions such as importing packages, configurations, file I/O operations, showing data, plotting graphs, defining functions, and exploring files. This work opens up new research potential into recommending how developers can reuse their own code.

Sprint2Vec: a deep characterization of sprints in iterative software development (2024-01-01)
Choetkiertikul M.; Banyongrakkul P.; Ragkhitwetsagul C.; Tuarob S.; Dam H.K.; Sunetnanta T.; Mahidol University
Iterative approaches like Agile Scrum are commonly adopted to enhance the software development process. However, challenges such as schedule and budget overruns still persist in many software projects. Several approaches employ machine learning techniques, particularly classification, to facilitate decision-making in iterative software development. Existing approaches often concentrate on characterizing a sprint to predict productivity alone. We introduce Sprint2Vec, which leverages three aspects of sprint information (sprint attributes, issue attributes, and the developers involved in a sprint) to comprehensively characterize sprints for predicting both their productivity and quality outcomes. Our approach combines traditional feature extraction techniques with automated, deep learning-based unsupervised feature learning, utilizing methods such as Long Short-Term Memory (LSTM) networks to enhance the feature learning process. This enables us to learn features from unstructured data, such as textual descriptions of issues and sequences of developer activities. We evaluated our approach on two regression tasks: predicting the deliverability of a sprint (i.e., the amount of work delivered from the sprint) and its quality (i.e., the amount of delivered work that requires rework). The evaluation results on five well-known open-source projects (Apache, Atlassian, Jenkins, Spring, and Talendforge) demonstrate our approach's superior performance compared to baseline and alternative approaches.

Studying the association between Gitcoin's issues and resolving outcomes (2023-12-01)
Choetkiertikul M.; Puengmongkolchaikit A.; Chandra P.; Ragkhitwetsagul C.; Maipradit R.; Hata H.; Sunetnanta T.; Matsumoto K.; Mahidol University
The development of open-source software (OSS) projects is usually driven by collaboration among contributors and relies strongly on volunteering. Thus, allocating software practitioners (e.g., contributors) to a particular task is non-trivial and draws attention away from the development itself. A number of bug bounty platforms have therefore emerged to address this problem through bounty rewards. In particular, Gitcoin, a new bounty platform, introduces a bounty reward mechanism that allows individual issue owners (backers) to define a reward value using cryptocurrencies, rather than relying on crowdfunding mechanisms. Although a number of studies have investigated the phenomena on bounty platforms, those platforms rely on different bounty reward systems. Our study therefore investigates the association between Gitcoin bounties and their outcomes (i.e., success or non-success). We empirically study over 4,000 issues with Gitcoin bounties using statistical analysis and machine learning techniques. We also conducted a comparative study with the Bountysource platform to gain insights into the usage of both platforms.
Our study highlights the importance of factors such as the length of the project, the issue description, the type of bounty issue, and the bounty value, which are found to be highly correlated with the outcome of bounty issues. These findings can provide useful guidance to practitioners. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.

Towards Identifying Code Proficiency Through the Analysis of Python Textbooks (2024-01-01)
Rojpaisarnkit R.; Robles G.; Kula R.G.; Wang D.; Ragkhitwetsagul C.; Gonzalez-Barahona J.M.; Matsumoto K.; Mahidol University
Python, one of the most prevalent programming languages today, is widely utilized in various domains, including web development, data science, machine learning, and DevOps. Recent scholarly efforts have proposed a methodology to assess Python competence levels, similar to how proficiency in natural languages is evaluated. This method involves assigning levels of competence to Python constructs, for instance, placing simple 'print' statements at the most basic level and abstract base classes at the most advanced. The aim is to gauge the level of proficiency a developer must have to understand a piece of source code. This is particularly crucial for software maintenance and evolution tasks, such as debugging or adding new features; in a code review process, for example, this method could determine the competence level required of reviewers. However, categorizing Python constructs by proficiency level poses significant challenges. Prior attempts, which relied heavily on expert opinions and developer surveys, have led to considerable discrepancies. In response, this paper presents a new approach to identifying Python competency levels through the systematic analysis of introductory Python programming textbooks.
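Order-of-introduction comparisons of this kind are commonly quantified with a rank correlation such as Kendall's tau. An editor's sketch with hypothetical construct orderings (not data from the study):

```python
# Kendall's tau over two orderings of the same items:
# 1.0 = identical order, -1.0 = fully reversed.
from itertools import combinations

def kendall_tau(order_a: list[str], order_b: list[str]) -> float:
    rank_a = {item: i for i, item in enumerate(order_a)}
    rank_b = {item: i for i, item in enumerate(order_b)}
    pairs = list(combinations(order_a, 2))
    # +1 for each pair ordered the same way in both lists, -1 otherwise.
    concordant = sum(
        1 if (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) > 0 else -1
        for x, y in pairs
    )
    return concordant / len(pairs)

textbook = ["print", "if", "for", "list comprehension", "decorator"]
survey   = ["print", "for", "if", "decorator", "list comprehension"]
print(kendall_tau(textbook, survey))  # 0.6: mostly, but not fully, aligned
```

A low tau between a textbook ordering and an expert-survey ordering would signal exactly the kind of misalignment the abstract reports.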
By comparing the sequence in which Python constructs are introduced in these textbooks with the current state of the art, we uncovered notable discrepancies in the order in which the constructs are introduced. Our study underscores this misalignment, demonstrating that pinpointing proficiency levels is not trivial. The insights from the study are a pivotal step toward reinforcing the idea that textbooks are a valuable source for evaluating developers' proficiency, particularly in terms of their ability to undertake maintenance and evolution tasks.

Typhon: Automatic Recommendation of Relevant Code Cells in Jupyter Notebooks (2024-01-01)
Ragkhitwetsagul C.; Prasertpol V.; Ritta N.; Sae-Wong P.; Noraset T.; Choetkiertikul M.; Mahidol University
At present, code recommendation tools have gained importance for many software developers in various areas of expertise. Code recommendation tools enable better productivity and performance in developing software and make it easier for developers to find code examples and learn from them. This paper proposes Typhon, an approach that automatically recommends relevant code cells in Jupyter notebooks. Typhon tokenizes developers' markdown description cells and looks for the most similar code cells in a database using text similarity techniques such as the BM25 ranking function or CodeBERT, a machine learning approach. The algorithm then computes the similarity distance between the tokenized query and the markdown cells to return the most relevant code cells to the developer. We evaluated Typhon on Jupyter notebooks from Kaggle competitions and found that the approach can recommend code cells with moderate accuracy.
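For reference, the BM25 ranking function mentioned in the abstract has a standard (Okapi) form. The sketch below is textbook BM25 over a toy tokenised corpus, not Typhon's implementation:

```python
# Minimal Okapi BM25: score each tokenised document against a query.
# k1 and b are the usual free parameters (common defaults shown).
import math

def bm25_scores(query, docs, k1=1.5, b=0.75):
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    scores = []
    for doc in docs:
        score = 0.0
        for term in query:
            df = sum(1 for d in docs if term in d)          # document frequency
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # smoothed IDF
            tf = doc.count(term)                             # term frequency
            score += idf * tf * (k1 + 1) / (
                tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = [["plot", "histogram", "of", "ages"],
        ["load", "csv", "into", "dataframe"],
        ["train", "model", "on", "dataframe"]]
print(bm25_scores(["load", "dataframe"], docs))
```

In a Typhon-like setting, the "documents" would be the candidate code cells and the query would be the tokenised markdown description; the cells with the highest scores are the ones recommended.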
The approach and results in this paper can lead to further improvements in code cell recommendation in Jupyter notebooks.

V-Achilles: An Interactive Visualization of Transitive Security Vulnerabilities (2022-09-19)
Jarukitpipat V.; Chhun K.; Wanprasert W.; Ragkhitwetsagul C.; Choetkiertikul M.; Sunetnanta T.; Kula R.G.; Chinthanet B.; Ishio T.; Matsumoto K.; Mahidol University
A key threat to the usage of third-party dependencies is security vulnerabilities, which risk unwanted access to a user's application. As part of an ecosystem of dependencies, users of a library are exposed to both the direct and transitive dependencies adopted into their applications. Recent work provides tool support for updating vulnerable dependencies but rarely shows the complexity of transitive updates. In this paper, we introduce our solution to support vulnerability-driven updating in npm. V-Achilles is a prototype that visualizes, using dependency graphs, the dependencies affected by vulnerability attacks. In addition to the tool overview, we highlight three use cases that demonstrate the usefulness and application of our prototype with real-world npm packages. The prototype is available at https://github.com/MUICT-SERU/V-Achilles, with an accompanying video demonstration at https://www.youtube.com/watch?v=tspiZfhMNcs.

Virtual Reality for Software Engineering Presentations (2022-01-01)
Ragkhitwetsagul C.; Choetkiertikul M.; Hoonlor A.; Prachyabrued M.; Mahidol University
Due to the impact of the pandemic, applying online learning methods became an immediate response to tackle the difficulties in teaching and learning, including in software engineering courses. Online video meeting platforms (e.g., MS Teams, Webex) are popularly adopted as a medium between instructors and students for conducting online classes, and they have been extended with functions supporting remote teaching and learning activities, such as breakout rooms for group activities. However, maintaining student engagement is still a challenging problem in online learning. In particular, drawing students' attention and enhancing their experience during in-class activities (e.g., project presentations, group discussions) is critical to achieving the activities' objectives. Virtual Reality (VR) has been considered a potential answer to enhancing online teaching and learning. This study evaluates the benefits of adopting VR in software engineering class presentation activities. The evaluation results from three courses show that VR improves the online learning and presentation experience by offering visual attraction and presence to students.

Why Visualize Data When Coding? Preliminary Categories for Coding in Jupyter Notebooks (2022-01-01)
Settewong T.; Ritta N.; Kula R.G.; Ragkhitwetsagul C.; Sunetnanta T.; Matsumoto K.; Mahidol University
Data visualization has become a crucial component of data analytics, especially for data exploration, understanding, and analysis. Effective data visualization impacts decision-making and aids in discovering and understanding relationships. It brings benefits to data-intensive software development tasks, e.g., feature engineering in machine learning-based software projects. However, it is unknown how visualizations are used in competitive programming. This paper reports early results on which visualizations are prevalent in competitive programming. Grandmaster is the highest level reached in competitions (the levels being novice, expert, master, and grandmaster). Analyzing the visualizations of seven high-ranked competitors (i.e., Grandmasters) in Kaggle, we identify and present a catalog of visualizations used both to tell a story from the data and to explain the processes and pipelines behind their coding solutions. Our taxonomy comprises nine types drawn from over 821 visualizations in 68 Jupyter notebooks. Furthermore, most visualizations are for data analysis, with distribution (DA Distribution) and frequency (DA Frequency) being the most used. We envision that this catalog can be useful for better understanding the different situations in which to employ these visualizations.