“Paper, Meet Code”: A Deep Learning Approach to Linking Scholarly Articles with GitHub Repositories
Issued Date
2024-01-01
eISSN
21693536
Scopus ID
2-s2.0-85193211640
Journal Title
IEEE Access
Rights Holder(s)
SCOPUS
Bibliographic Citation
IEEE Access (2024)
Suggested Citation
Puangjaktha P., Choetkiertikul M., Tuarob S. “Paper, Meet Code”: A Deep Learning Approach to Linking Scholarly Articles with GitHub Repositories. IEEE Access (2024). doi:10.1109/ACCESS.2024.3399767 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/98446
Abstract
Computer scientists in various domains often release source code alongside their publications, most commonly in code repositories. Although a scholarly article and its official code repository frequently coexist, explicit references linking the two are often missing. Determining whether an article and a repository belong to the same research project has traditionally required manual inspection, which is time-consuming. This paper proposes a deep learning-based algorithm that automatically matches scholarly articles with their corresponding official code repositories. Our findings indicate that the most common linking information is the paper title and BibTeX entries, typically found in the repository’s README document. We used SPECTER to embed paper and repository metadata as vectors. Using these embeddings with the Light Gradient Boosting Machine (LGBM), our method achieved an F1 score of 0.94. Moreover, combining our best model with a rule-based approach improved performance by 5.31%. This study establishes a connection between academic papers and their associated official code repositories while minimizing reliance on explicit bibliographic information in repositories.
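The abstract notes that the paper title and BibTeX entries in a repository's README are the most common linking information, and that a rule-based step was combined with the learned model. The following is a minimal sketch of what such a rule might look like, not the authors' implementation. The function names, the normalization steps, the OR-style combination, and the 0.5 threshold are all assumptions made for illustration; `model_score` stands in for whatever probability a trained classifier (for example, LGBM over SPECTER embeddings) would produce.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Every run of non-alphanumeric characters (quotes, braces, newlines) becomes one space.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def title_in_readme(paper_title: str, readme: str) -> bool:
    """Hypothetical rule: the normalized title occurs as a whole-word phrase in the README."""
    title = normalize(paper_title)
    if not title:
        return False
    # Padding with spaces prevents matches that start or end inside a word.
    return f" {title} " in f" {normalize(readme)} "


def is_match(paper_title: str, readme: str, model_score: float,
             threshold: float = 0.5) -> bool:
    """Assumed combination: accept if the rule fires or the model score passes the threshold."""
    return title_in_readme(paper_title, readme) or model_score >= threshold


if __name__ == "__main__":
    readme = "# Demo repo\n@article{demo, title={Paper, Meet Code: A Deep Learning Approach}}"
    print(title_in_readme("“Paper, Meet Code”: A Deep Learning Approach", readme))
    print(is_match("An Unrelated Title", readme, model_score=0.2))
```

Because normalization discards punctuation and case, a title quoted inside a BibTeX `title={...}` field is caught by the same check as a title written in a README heading. A rule of this kind cannot help when the README never mentions the paper, which is the case the embedding-based model is meant to cover.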