Publication:
DATA++: An Automated Tool for Intelligent Data Augmentation Using Wikidata

dc.contributor.author: Waran Taveekarn [en_US]
dc.contributor.author: Chatchanin Yimudom [en_US]
dc.contributor.author: Supisara Sukkanta [en_US]
dc.contributor.author: Steven Lynden [en_US]
dc.contributor.author: Wudhichart Sawangphol [en_US]
dc.contributor.author: Suppawong Tuarob [en_US]
dc.contributor.other: National Institute of Advanced Industrial Science and Technology [en_US]
dc.contributor.other: Mahidol University [en_US]
dc.date.accessioned: 2020-01-27T08:19:36Z
dc.date.available: 2020-01-27T08:19:36Z
dc.date.issued: 2019-07-01 [en_US]
dc.description.abstract: © 2019 IEEE. At present, technology strongly influences many people's lives, with artificial intelligence among its most influential elements. Creative feature engineering is an important part of machine learning methodology: it manipulates existing data, modifying its dimensions, so that the data can be used more effectively. Pulling useful information from external sources and combining it is cumbersome, however, since data engineers must find external data sources and process them manually. The ability to modify and enrich existing data automatically using external open data sources could therefore prove crucial to data engineers and scientists looking to enrich their datasets. In this paper, we propose a method that automatically augments a given structured dataset by inferring relevant dimensions from an external data source with respect to the target attribute. Specifically, our proposed algorithm first creates Bloom filters for every instance of data items. These filters are then used to retrieve relevant information from the linked open data source, which is later processed into additional columns in the target dataset. A case study of three real-world datasets, with Wikidata as the external data source, empirically validates the proposed method on both regression and classification tasks. The experimental results show that the datasets augmented by our proposed algorithm yield a correlation improvement of 23.11% on average for the regression task and an ROC improvement of 86.50% for the classification task. [en_US]
dc.identifier.citation: JCSSE 2019 - 16th International Joint Conference on Computer Science and Software Engineering: Knowledge Evolution Towards Singularity of Man-Machine Intelligence. (2019), 91-96 [en_US]
dc.identifier.doi: 10.1109/JCSSE.2019.8864152 [en_US]
dc.identifier.other: 2-s2.0-85074238695 [en_US]
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/20.500.14594/50629
dc.rights: Mahidol University [en_US]
dc.rights.holder: SCOPUS [en_US]
dc.source.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85074238695&origin=inward [en_US]
dc.subject: Computer Science [en_US]
dc.subject: Decision Sciences [en_US]
dc.title: DATA++: An Automated Tool for Intelligent Data Augmentation Using Wikidata [en_US]
dc.type: Conference Paper [en_US]
dspace.entity.type: Publication
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85074238695&origin=inward [en_US]
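
The abstract describes building Bloom filters over data items and using them to pull matching information from an external source into new columns. The record does not say how the filters are constructed or queried, so the following is only a minimal sketch under stated assumptions: a single filter over one key column (the abstract mentions filters per data item), an in-memory dictionary standing in for Wikidata, and hypothetical names (`BloomFilter`, `augment`, `external`) that do not come from the paper.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; bit positions come from double hashing a SHA-256 digest."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def augment(rows, key, external, new_column):
    """Return copies of rows with new_column filled from matching external records.

    Assumption: `external` maps an entity label to one attribute value and
    stands in for a linked open data source; this is not the paper's pipeline.
    """
    bf = BloomFilter()
    for row in rows:
        bf.add(row[key].lower())
    # Cheap pre-filter: keep only external records that may match a row.
    candidates = {
        label.lower(): value
        for label, value in external.items()
        if label.lower() in bf
    }
    # The exact dictionary lookup discards any Bloom-filter false positives.
    return [{**row, new_column: candidates.get(row[key].lower())} for row in rows]


# Illustrative data only, not taken from the paper's datasets.
rows = [{"city": "Bangkok"}, {"city": "Tokyo"}]
external = {"Bangkok": "Thailand", "Tokyo": "Japan", "Paris": "France"}
augmented = augment(rows, "city", external, "country")
```

The filter serves only as an inexpensive membership pre-check; the final exact lookup keeps a false positive from attaching a wrong value to a row.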
