Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents

Iqra Safder; Saeed Ul Hassan; Anna Visvizi; Thanapon Noraset; Raheel Nawaz; Suppawong Tuarob


dc.contributor.author	Iqra Safder	en_US
dc.contributor.author	Saeed Ul Hassan	en_US
dc.contributor.author	Anna Visvizi	en_US
dc.contributor.author	Thanapon Noraset	en_US
dc.contributor.author	Raheel Nawaz	en_US
dc.contributor.author	Suppawong Tuarob	en_US
dc.contributor.other	Information Technology University	en_US
dc.contributor.other	American College of Greece	en_US
dc.contributor.other	Manchester Metropolitan University	en_US
dc.contributor.other	Mahidol University	en_US
dc.date.accessioned	2020-08-25T09:35:14Z
dc.date.available	2020-08-25T09:35:14Z
dc.date.issued	2020-11-01	en_US
dc.description.abstract	© 2020 Elsevier Ltd. Advances in search engines for traditional text documents have enabled the effective, resource-efficient retrieval of massive amounts of textual information. However, such conventional search methodologies often suffer from poor retrieval accuracy, especially when documents exhibit unique properties that call for specialized and deeper semantic extraction. Recently, AlgorithmSeer, a search engine for algorithms, has been proposed; it extracts pseudo-codes and shallow textual metadata from scientific publications and treats them as traditional documents so that conventional search engine methodology can be applied. However, such a system fails to support user queries that seek algorithm-specific information, such as the datasets on which algorithms operate, the performance of algorithms, and their runtime complexity. In this paper, a set of enhancements to the previously proposed algorithm search engine is presented. Specifically, we propose a set of methods to automatically identify and extract algorithmic pseudo-codes and the sentences that convey related algorithmic metadata using a set of machine-learning techniques. In an experiment with over 93,000 text lines, we introduce 60 novel features, comprising content-based, font-style-based, and structure-based feature groups, to extract algorithmic pseudo-codes. Our proposed pseudo-code extraction method achieves a 93.32% F1-score, outperforming the state-of-the-art techniques by 28%. Additionally, we propose a method to extract algorithm-related sentences using deep neural networks and achieve an accuracy of 78.5%, outperforming a rule-based model and a support vector machine model by 28% and 16%, respectively.	en_US
dc.identifier.citation	Information Processing and Management. Vol.57, No.6 (2020)	en_US
dc.identifier.doi	10.1016/j.ipm.2020.102269	en_US
dc.identifier.issn	03064573	en_US
dc.identifier.other	2-s2.0-85085523063	en_US
dc.identifier.uri	https://repository.li.mahidol.ac.th/handle/20.500.14594/57817
dc.rights	Mahidol University	en_US
dc.rights.holder	SCOPUS	en_US
dc.source.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85085523063&origin=inward	en_US
dc.subject	Computer Science	en_US
dc.subject	Decision Sciences	en_US
dc.subject	Engineering	en_US
dc.subject	Social Sciences	en_US
dc.title	Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents	en_US
dc.type	Article	en_US
dspace.entity.type	Publication
mu.datasource.scopus	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85085523063&origin=inward	en_US

Collections

Scopus 2020

