Improving pseudo-code detection in ubiquitous scholarly data using ensemble machine learning

Suppawong Tuarob
Mahidol University
2018-12-21, 2019-03-14, 2017-02-21
20th International Computer Science and Engineering Conference: Smart Ubiquitous Computing and Knowledge, ICSEC 2016. (2017)
2-s2.0-85016200381
https://repository.li.mahidol.ac.th/handle/123456789/42401
Computer Science
Conference Paper
SCOPUS
10.1109/ICSEC.2016.7859944

© 2016 IEEE. New algorithms emerge constantly as computer science and other computation-related disciplines grow in sophistication and complexity. A majority of these algorithms are developed by professional researchers who publish their algorithmic advancements in scholarly articles, especially in the form of pseudo-codes. The ability to automatically collect, manage, and index these pseudo-codes could prove useful for computer scientists and software developers seeking cutting-edge algorithmic solutions to their problems. In an effort towards automatic retrieval of these pseudo-codes, a machine learning based approach that detects and extracts them from large-scale collections of scholarly documents has recently been proposed. In this paper, we extend the previous findings by investigating possible enhancements to the previously proposed classification methodology using ensemble learning techniques. The results illustrate that Random Forest is by far the most effective ensemble learning method, improving the classification performance by 13% over the best base classifier.
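The intuition behind ensemble learning can be illustrated with a minimal sketch of majority voting, the aggregation step that Random Forest classically applies across its decision trees. This is not the paper's implementation: the labels, the three base classifiers' predictions, and the function names `majority_vote` and `accuracy` are all hypothetical, chosen only to show how combining imperfect classifiers that err on different samples can outperform each of them.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` is a list of equal-length label lists, one per classifier.
    """
    combined = []
    for votes in zip(*predictions):
        # Pick the label that most classifiers agreed on for this sample.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, truth):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical toy labels: 1 = pseudo-code, 0 = not pseudo-code.
truth = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical base classifiers, each wrong on two different samples.
clf_a = [1, 0, 0, 1, 0, 1, 1, 0]
clf_b = [1, 1, 1, 1, 0, 0, 0, 0]
clf_c = [0, 0, 1, 1, 1, 0, 1, 0]

ensemble = majority_vote([clf_a, clf_b, clf_c])

for name, preds in [("A", clf_a), ("B", clf_b), ("C", clf_c), ("ensemble", ensemble)]:
    print(name, accuracy(preds, truth))
```

In this toy setup each base classifier is right on 6 of 8 samples, while the vote recovers every label because no two classifiers make the same mistake. Real classifiers make correlated errors, so gains in practice are smaller and depend on the diversity of the ensemble members.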