Title: Semi-automated augmentation of pandas dataframes
Authors: Steven Lynden; Waran Taveekarn
Affiliations: National Institute of Advanced Industrial Science and Technology; Mahidol University
Dates: 2020-01-27; 2020-01-27; 2019-01-01
Citation: Communications in Computer and Information Science. Vol.1071, (2019), 70-79
ISSN: 18650937; 18650929
Scopus EID: 2-s2.0-85070014915
URI: https://repository.li.mahidol.ac.th/handle/20.500.14594/50680
DOI: 10.1007/978-981-32-9563-6_8
Rights: © 2019, Springer Nature Singapore Pte Ltd.
Institution: Mahidol University
Subjects: Computer Science; Mathematics
Type: Conference Paper
Source: SCOPUS

Abstract: Creative feature engineering is an important aspect within machine learning prediction tasks which can be facilitated by augmenting datasets with additional data to improve predictions. This paper presents an approach towards augmenting existing datasets represented as pandas dataframes with data from open data sources, semi-automatically, with the aims of (1) automatically suggesting data augmentation options given an existing set of features, and (2) automatically augmenting the data when a suggestion is selected by the user. This paper demonstrates the performance of the approach in terms of aligning typical machine learning datasets with open data sources, suggesting useful augmentation options, and the design and implementation of a software tool implementing the approach, available as open-source software.
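
Illustration: The two aims stated in the abstract (suggest augmentation options, then augment on selection) can be pictured with a minimal sketch. This is not the authors' tool or its API. The function names `suggest_augmentations` and `augment`, the `sources` dictionary, the value-overlap heuristic, and the toy data are all hypothetical assumptions made for illustration, and a small in-memory table stands in for an open data source.

```python
import pandas as pd


def suggest_augmentations(df, sources, min_overlap=0.5):
    """Hypothetical heuristic: propose a join when enough of a column's
    distinct values also appear in a candidate source's key column."""
    suggestions = []
    for col in df.columns:
        values = set(df[col].dropna().unique())
        if not values:
            continue
        for name, (table, key) in sources.items():
            keys = set(table[key].dropna().unique())
            overlap = len(values & keys) / len(values)
            if overlap >= min_overlap:
                suggestions.append(
                    {"column": col, "source": name, "key": key, "overlap": overlap}
                )
    # Highest-overlap candidates first.
    return sorted(suggestions, key=lambda s: s["overlap"], reverse=True)


def augment(df, suggestion, sources):
    """Apply a selected suggestion with a left join, so every original
    row is kept and unmatched rows receive NaN in the new columns."""
    table, key = sources[suggestion["source"]]
    col = suggestion["column"]
    # One row per key, with the key renamed to match the dataframe column.
    table = table.drop_duplicates(subset=key).rename(columns={key: col})
    return df.merge(table, how="left", on=col, suffixes=("", "_aug"))


# Toy data (values are made up, not real statistics).
df = pd.DataFrame({"country": ["Japan", "Thailand", "Peru"], "sales": [10, 20, 30]})
sources = {
    "population": (
        pd.DataFrame(
            {
                "country_name": ["Japan", "Thailand", "France"],
                "population_millions": [1.0, 2.0, 3.0],
            }
        ),
        "country_name",
    )
}

suggestions = suggest_augmentations(df, sources)
augmented = augment(df, suggestions[0], sources)  # the user "selects" the top suggestion
print(augmented)
```

The left join is a deliberate choice in this sketch: augmentation should add features without dropping rows from the original dataset, so values with no match in the source are left missing rather than filtered out.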