Finding Syntactic Similarities Between XML Documents
Description
Technical report TR05-16. We present a concise and accurate structural summary of XML documents and show that this summary can be used to effectively cluster documents that belong to a structurally similar class. We present efficient formulations of similarity between structural summaries that lead to better detection of documents conforming to the same DTD. Our formulation is based on the intuition that two documents are likely to have been generated by the same DTD if a large fraction of the paths in the two documents are the same or similar. Our experimental evaluation shows that this method groups documents generated by the same DTD with high accuracy, outperforming some previously proposed solutions based on tree comparison.
TRID-ID: TR05-16
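The path-based intuition described above can be illustrated with a small sketch. It does not reproduce the report's actual summary or similarity formulations. It assumes the structural summary is simply the set of distinct root-to-node tag paths, measures similarity as the Jaccard overlap of exact path matches (so it ignores the "similar but not identical" paths the abstract mentions), and groups documents with single-link clustering. The function names `path_summary`, `path_similarity`, and `cluster` and the threshold of 0.5 are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET


def path_summary(xml_text):
    """Hypothetical summary: the set of distinct root-to-node tag paths, e.g. 'book/author'."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        paths.add(path)  # repeated siblings collapse to one path
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths


def path_similarity(a, b):
    """Jaccard similarity: shared distinct paths divided by all distinct paths."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.5):
    """Single-link grouping (assumed): documents with similarity >= threshold are merged."""
    summaries = [path_summary(d) for d in docs]
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if path_similarity(summaries[i], summaries[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Usage: two structurally similar "book" documents and one unrelated "recipe".
book1 = "<book><title/><author/><year/></book>"
book2 = "<book><title/><author/><author/></book>"
recipe = "<recipe><name/><ingredient/><step/></recipe>"
print(path_similarity(path_summary(book1), path_summary(book2)))  # 0.75
print(cluster([book1, book2, recipe]))  # [[0, 1], [2]]
```

Because repeated siblings collapse into a single path, `book2` shares three of the four distinct paths in the union with `book1`, which is the kind of tolerance to repetition that a path-based summary offers over a direct tree comparison.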
Item Type
Report (http://purl.org/coar/resource_type/c_93fc)
Language
en
