Publication: Element matching across data-oriented XML sources using a multi-strategy clustering model
Issued Date
2004-03-01
ISSN
0169023X
Other identifier(s)
2-s2.0-1142288175
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
Data and Knowledge Engineering. Vol.48, No.3 (2004), 297-333
Suggested Citation
Charnyote Pluempitiwiriyawej, Joachim Hammer. Element matching across data-oriented XML sources using a multi-strategy clustering model. Data and Knowledge Engineering. Vol.48, No.3 (2004), 297-333. doi:10.1016/j.datak.2003.06.001 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/21294
Title
Element matching across data-oriented XML sources using a multi-strategy clustering model
Abstract
We describe a family of heuristics-based clustering strategies to support the merging of XML data from multiple sources. As part of this research, we have developed a comprehensive classification of the schematic and semantic conflicts that can occur when reconciling related XML data from multiple sources. Because element clustering is compute-intensive, especially when comparing large numbers of data elements that exhibit great representational diversity, performance is a critical, yet so far neglected, aspect of the merging process. We have developed five heuristics for clustering data in a multi-dimensional metric space. Equivalence of data elements within the individual clusters is determined using several distance functions that calculate the semantic distances among the elements. The research described in this article is conducted within the context of the Integration Wizard (IWIZ) project at the University of Florida. IWIZ enables users to access and retrieve information from multiple XML-based sources through a consistent, integrated view. The results of our qualitative analysis of the clustering heuristics have validated the feasibility of our approach as well as its superior performance when compared to other similarity search techniques. © 2002 Elsevier Science B.V. All rights reserved.
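The abstract does not spell out the five heuristics or the distance functions, so the following is only a minimal sketch of the general idea: several per-strategy distances between XML elements are treated as coordinates in a metric space, and elements whose combined distance falls below a threshold are placed in the same cluster. Everything here is an assumption made for illustration and not the authors' method: the sample elements, the two distance functions (`name_distance`, `type_distance`), the Euclidean combination, the single-link merging, and the threshold value.

```python
from difflib import SequenceMatcher
from math import sqrt

# Hypothetical element records: (tag name, sample text value).
ELEMENTS = [
    ("author", "J. Hammer"),
    ("authors", "C. Pluempitiwiriyawej"),
    ("price", "29.95"),
    ("cost", "31.50"),
]

def name_distance(a, b):
    """Lexical distance between tag names, in [0, 1]."""
    return 1.0 - SequenceMatcher(None, a[0].lower(), b[0].lower()).ratio()

def is_numeric(text):
    """True if the sample value parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def type_distance(a, b):
    """0.0 if both sample values are numeric or both are not, else 1.0."""
    return 0.0 if is_numeric(a[1]) == is_numeric(b[1]) else 1.0

def combined_distance(a, b):
    """Euclidean norm of the per-strategy distance vector, scaled to [0, 1]."""
    parts = [name_distance(a, b), type_distance(a, b)]
    return sqrt(sum(p * p for p in parts) / len(parts))

def cluster(elements, threshold):
    """Single-link clustering: merge any two elements closer than threshold."""
    parent = list(range(len(elements)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(elements)):
        for j in range(i + 1, len(elements)):
            if combined_distance(elements[i], elements[j]) < threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, element in enumerate(elements):
        groups.setdefault(find(i), []).append(element[0])
    return sorted(groups.values())

clusters = cluster(ELEMENTS, threshold=0.2)
print(clusters)
```

With these assumed inputs, `author` and `authors` end up in one cluster, while `price` and `cost` stay apart because the sketch compares names only lexically. Matching such synonyms would need an additional semantic distance (for example, one backed by a thesaurus), which is the kind of gap a multi-strategy model is meant to close.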