Finding Syntactic Similarities Between XML Documents

Description

Technical report TR05-16. We present a concise and accurate structural summary of XML documents and show that this summary can be used to effectively cluster documents that belong to a structurally similar class. We present efficient formulations of similarity between structural summaries that lead to better detection of documents that conform to the same DTD. Our formulation is based on the intuition that two documents are likely to have been generated by the same DTD if a large fraction of the paths in the two documents are the same or similar. Our experimental evaluation shows that this method does an excellent job of grouping documents generated by the same DTD, outperforming some previously proposed solutions based on tree comparison.
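The path-based intuition in the description can be illustrated with a minimal sketch. This is not the report's formulation: the function names `root_to_node_paths` and `path_similarity` are hypothetical, the "structural summary" is assumed here to be simply the set of distinct root-to-element tag paths, and Jaccard overlap is swapped in as a stand-in similarity measure. It counts only identical paths, so it does not capture paths that are merely "similar".

```python
import xml.etree.ElementTree as ET


def root_to_node_paths(xml_text):
    """Return the set of distinct root-to-element tag paths in a document.

    Assumption: this path set serves as a simple structural summary;
    text content, attributes, and repetition counts are ignored.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def path_similarity(xml_a, xml_b):
    """Jaccard overlap of two path sets (an assumed stand-in measure)."""
    a = root_to_node_paths(xml_a)
    b = root_to_node_paths(xml_b)
    # Both sets contain at least the root path, so the union is never empty.
    return len(a & b) / len(a | b)


# Hypothetical sample documents, made up for illustration.
doc_a = "<book><title>A</title><author><name>X</name></author></book>"
doc_b = ("<book><title>B</title><author><name>Y</name></author>"
         "<author><name>Z</name></author></book>")
doc_c = "<order><item><sku>1</sku></item></order>"

same_class = path_similarity(doc_a, doc_b)   # repeated elements add no new paths
other_class = path_similarity(doc_a, doc_c)  # no shared paths
```

In this sketch, documents that differ only in content or in how often an element repeats share the same path set, while documents with unrelated vocabularies share none. A clustering step could then group documents whose pairwise score exceeds a chosen threshold; the threshold would be another assumption to tune.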

Item Type

http://purl.org/coar/resource_type/c_93fc

Language

en
