Publication: Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents
dc.contributor.author | Iqra Safder | en_US |
dc.contributor.author | Saeed Ul Hassan | en_US |
dc.contributor.author | Anna Visvizi | en_US |
dc.contributor.author | Thanapon Noraset | en_US |
dc.contributor.author | Raheel Nawaz | en_US |
dc.contributor.author | Suppawong Tuarob | en_US |
dc.contributor.other | Information Technology University | en_US |
dc.contributor.other | American College of Greece | en_US |
dc.contributor.other | Manchester Metropolitan University | en_US |
dc.contributor.other | Mahidol University | en_US |
dc.date.accessioned | 2020-08-25T09:35:14Z | |
dc.date.available | 2020-08-25T09:35:14Z | |
dc.date.issued | 2020-11-01 | en_US |
dc.description.abstract | © 2020 Elsevier Ltd. Advances in search engines for traditional text documents have enabled the effective retrieval of massive textual information in a resource-efficient manner. However, such conventional search methodologies often suffer from poor retrieval accuracy, especially when documents exhibit unique properties that call for specialized and deeper semantic extraction. Recently, AlgorithmSeer, a search engine for algorithms, has been proposed; it extracts pseudo-codes and shallow textual metadata from scientific publications and treats them as traditional documents so that conventional search engine methodology can be applied. However, such a system fails to support user queries that seek algorithm-specific information, such as the datasets on which algorithms operate, the performance of algorithms, and their runtime complexity. In this paper, a set of enhancements to the previously proposed algorithm search engine is presented. Specifically, we propose a set of methods to automatically identify and extract algorithmic pseudo-codes and the sentences that convey related algorithmic metadata using a set of machine-learning techniques. In an experiment with over 93,000 text lines, we introduce 60 novel features, comprising content-based, font-style-based, and structure-based feature groups, to extract algorithmic pseudo-codes. Our proposed pseudo-code extraction method achieves a 93.32% F1-score, outperforming the state-of-the-art techniques by 28%. Additionally, we propose a method to extract algorithm-related sentences using deep neural networks and achieve an accuracy of 78.5%, outperforming a rule-based model and a support vector machine model by 28% and 16%, respectively. | en_US |
dc.identifier.citation | Information Processing and Management. Vol.57, No.6 (2020) | en_US |
dc.identifier.doi | 10.1016/j.ipm.2020.102269 | en_US |
dc.identifier.issn | 03064573 | en_US |
dc.identifier.other | 2-s2.0-85085523063 | en_US |
dc.identifier.uri | https://repository.li.mahidol.ac.th/handle/20.500.14594/57817 | |
dc.rights | Mahidol University | en_US |
dc.rights.holder | SCOPUS | en_US |
dc.source.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85085523063&origin=inward | en_US |
dc.subject | Computer Science | en_US |
dc.subject | Decision Sciences | en_US |
dc.subject | Engineering | en_US |
dc.subject | Social Sciences | en_US |
dc.title | Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents | en_US |
dc.type | Article | en_US |
dspace.entity.type | Publication | |
mu.datasource.scopus | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85085523063&origin=inward | en_US |