Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents

Iqra Safder; Saeed Ul Hassan; Anna Visvizi; Thanapon Noraset; Raheel Nawaz; Suppawong Tuarob


Issued Date

2020-11-01

Resource Type

Article

ISSN

03064573

DOI

10.1016/j.ipm.2020.102269

Other identifier(s)

2-s2.0-85085523063

Rights

Mahidol University

Rights Holder(s)

SCOPUS

Bibliographic Citation

Information Processing and Management. Vol.57, No.6 (2020)

Suggested Citation

Iqra Safder, Saeed Ul Hassan, Anna Visvizi, Thanapon Noraset, Raheel Nawaz, Suppawong Tuarob. Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents. Information Processing and Management. Vol.57, No.6 (2020). doi:10.1016/j.ipm.2020.102269 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/57817

Title

Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents

Author(s)

Iqra Safder
Saeed Ul Hassan
Anna Visvizi
Thanapon Noraset
Raheel Nawaz
Suppawong Tuarob

Other Contributor(s)

Information Technology University
American College of Greece
Manchester Metropolitan University
Mahidol University

Abstract

© 2020 Elsevier Ltd

Advances in search engines for traditional text documents have enabled effective, resource-efficient retrieval of massive amounts of textual information. However, such conventional search methodologies often suffer from poor retrieval accuracy, especially when documents exhibit unique properties that call for specialized, deeper semantic extraction. AlgorithmSeer, a recently proposed search engine for algorithms, extracts pseudo-codes and shallow textual metadata from scientific publications and treats them as traditional documents so that conventional search engine methodology can be applied. However, such a system cannot serve user queries that seek algorithm-specific information, such as the datasets on which algorithms operate, the performance of algorithms, and their runtime complexity. In this paper, we present a set of enhancements to the previously proposed algorithm search engine. Specifically, we propose a set of methods that use machine-learning techniques to automatically identify and extract algorithmic pseudo-codes and the sentences that convey related algorithmic metadata. To extract algorithmic pseudo-codes, we introduce 60 novel features, organized into content-based, font-style-based and structure-based feature groups, and evaluate them in an experiment with over 93,000 text lines. Our proposed pseudo-code extraction method achieves an F1-score of 93.32%, outperforming the state-of-the-art techniques by 28%. Additionally, we propose a method to extract algorithm-related sentences using deep neural networks, which achieves an accuracy of 78.5%, outperforming a rule-based model and a support vector machine model by 28% and 16%, respectively.
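The pseudo-code extraction step classifies individual text lines using three feature groups: content-based, font-style-based and structure-based. This record does not list the paper's 60 features, so the following is only a minimal sketch of what per-line features from each group might look like. It assumes a hypothetical `TextLine` record (text, font name, font size, left x-offset) produced by some PDF text extractor; the keyword list, feature names, the 40-character threshold and the default body font size and margin are all illustrative assumptions, not the authors' feature set.

```python
import re
from dataclasses import dataclass


# Hypothetical representation of one text line extracted from a PDF page.
@dataclass
class TextLine:
    text: str
    font_name: str
    font_size: float
    x_start: float  # left offset of the line on the page, in points


# Illustrative keyword list (an assumption, not the paper's lexicon).
PSEUDOCODE_KEYWORDS = {"if", "then", "else", "for", "while", "do", "return",
                       "end", "procedure", "function", "input", "output"}
PSEUDOCODE_SYMBOLS = set("=<>←+-*/()[]{}:;")


def content_features(line):
    """Content-based group: what the characters and words of the line look like."""
    tokens = re.findall(r"[A-Za-z]+", line.text.lower())
    n_chars = max(len(line.text), 1)
    return {
        "keyword_ratio": sum(t in PSEUDOCODE_KEYWORDS for t in tokens) / max(len(tokens), 1),
        "symbol_ratio": sum(c in PSEUDOCODE_SYMBOLS for c in line.text) / n_chars,
        # Pseudo-code listings are often line-numbered, e.g. "3: for ...".
        "starts_with_number": bool(re.match(r"\s*\d+[:.]?\s", line.text)),
    }


def font_style_features(line, body_font_size):
    """Font-style-based group, inferred here from the font name and size only."""
    name = line.font_name.lower()
    return {
        "is_bold": "bold" in name,
        "is_italic": "italic" in name or "oblique" in name,
        "is_monospace": any(m in name for m in ("courier", "mono", "typewriter")),
        "relative_font_size": line.font_size / body_font_size,
    }


def structure_features(lines, i, left_margin):
    """Structure-based group: layout of the line relative to the page and its neighbour."""
    line = lines[i]
    prev_x = lines[i - 1].x_start if i > 0 else line.x_start
    return {
        "indent": line.x_start - left_margin,
        "indent_change": line.x_start - prev_x,
        "is_short_line": len(line.text.strip()) < 40,
    }


def extract_features(lines, body_font_size=10.0, left_margin=72.0):
    """Combine the three groups into one feature dictionary per line."""
    rows = []
    for i, line in enumerate(lines):
        row = {}
        row.update(content_features(line))
        row.update(font_style_features(line, body_font_size))
        row.update(structure_features(lines, i, left_margin))
        rows.append(row)
    return rows


# Usage on three made-up lines: one prose line followed by two pseudo-code lines.
lines = [
    TextLine("We now describe the method in detail.", "Times-Roman", 10.0, 72.0),
    TextLine("1: for each v in V do", "Courier", 9.0, 80.0),
    TextLine("2:   d[v] ← ∞", "Courier", 9.0, 92.0),
]
features = extract_features(lines)
```

In a complete pipeline these dictionaries would be vectorized and passed to a trained line classifier; which classifier the authors used is not stated in this record, so that step is left out of the sketch.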

Keyword(s)

Computer Science
Decision Sciences
Engineering
Social Sciences

URI

https://repository.li.mahidol.ac.th/handle/123456789/57817

Collections

Scopus 2020
