Publication: Semi-automated augmentation of pandas dataframes
Issued Date
2019-01-01
ISSN
1865-0937
1865-0929
Other identifier(s)
2-s2.0-85070014915
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
Communications in Computer and Information Science. Vol.1071, (2019), 70-79
Suggested Citation
Steven Lynden, Waran Taveekarn. Semi-automated augmentation of pandas dataframes. Communications in Computer and Information Science. Vol.1071, (2019), 70-79. doi:10.1007/978-981-32-9563-6_8 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/50680
Title
Semi-automated augmentation of pandas dataframes
Author(s)
Steven Lynden, Waran Taveekarn
Abstract
© 2019, Springer Nature Singapore Pte Ltd. Creative feature engineering is an important part of machine learning prediction tasks, and it can be facilitated by augmenting datasets with additional data that improves predictions. This paper presents an approach for semi-automatically augmenting existing datasets, represented as pandas dataframes, with data from open data sources. The approach aims (1) to automatically suggest data augmentation options given an existing set of features, and (2) to automatically augment the data when the user selects a suggestion. The paper demonstrates how well the approach aligns typical machine learning datasets with open data sources and suggests useful augmentation options, and it describes the design and implementation of a software tool that realizes the approach, available as open-source software.
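The abstract does not say how columns are aligned with open data sources, so the following is only a rough sketch of the two steps it names (suggest, then augment), not the authors' tool. It assumes a simple value-containment heuristic: a column is a candidate join key if most of its distinct, normalised values appear in a source table's key column. The functions `suggest_augmentations` and `augment`, the `min_overlap` threshold, and the `population` table are all hypothetical.

```python
import pandas as pd


def _normalise(s: pd.Series) -> pd.Series:
    # Compare values as trimmed, lower-cased strings; keep missing values missing.
    return s.astype(str).str.strip().str.lower().where(s.notna())


def suggest_augmentations(df, sources, min_overlap=0.5):
    """Hypothetical step (1): rank (column, source, key, overlap) candidates.

    `sources` maps a source name to (DataFrame, key column name).
    Overlap = fraction of the column's distinct values found in the source key.
    """
    suggestions = []
    for col in df.columns:
        values = set(_normalise(df[col]).dropna())
        if not values:
            continue
        for name, (src, key) in sources.items():
            keys = set(_normalise(src[key]).dropna())
            overlap = len(values & keys) / len(values)
            if overlap >= min_overlap:
                suggestions.append((col, name, key, overlap))
    # Best-aligned candidates first.
    return sorted(suggestions, key=lambda s: s[3], reverse=True)


def augment(df, sources, suggestion):
    """Hypothetical step (2): left-join the chosen source onto `df`."""
    col, name, key, _ = suggestion
    src, _ = sources[name]
    left = df.assign(_join=_normalise(df[col]))
    right = (
        src.assign(_join=_normalise(src[key]))
        .dropna(subset=["_join"])
        .drop(columns=[key])
    )
    # many_to_one raises if the source has duplicate keys, so rows are not multiplied.
    merged = left.merge(right, on="_join", how="left", validate="many_to_one")
    return merged.drop(columns=["_join"])


# Usage with made-up data (the `population` table stands in for an open data source).
train = pd.DataFrame({"country": ["Thailand", "Japan", "France"], "sales": [10, 20, 30]})
population = pd.DataFrame(
    {"Country": ["thailand", "japan", "france", "brazil"], "population_m": [70, 125, 68, 214]}
)
sources = {"population": (population, "Country")}

suggestions = suggest_augmentations(train, sources)
out = augment(train, sources, suggestions[0])
print(suggestions)
print(out)
```

Here only the `country` column aligns with the source (all three values match after normalisation), so it is the sole suggestion, and selecting it appends a `population_m` column. A left join keeps every original row even when a value has no match, which suits augmentation: unmatched rows simply receive missing values.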