Fall 2025 theses and dissertations (non-restricted) will be available in ERA on November 17, 2025.

Speedup Clustering with Hierarchical Ranking

Loading...
Thumbnail Image

Date

Citation for Previous Publication

Link to Related Item

Abstract

Description

Technical report TR08-09. Many clustering algorithms in particular hierarchical clustering algorithms do not scale-up well for large data-sets especially when using an expensive distance function. In this paper, we propose a novel approach to perform approximate clustering with high accuracy. We introduce the concept of a pairwise hierarchical ranking to efficiently determine close neighbors for every data object. We also propose two techniques to significantly reduce the overhead of ranking: 1) a frontier search rather than a sequential scan in the naïve ranking to reduce the search space; 2) based on this exact search, an approximate frontier search for pairwise ranking that further reduces the runtime. Empirical results on synthetic and real-life data show a speedup of up to two orders of magnitude over OPTICS while maintaining a high accuracy and up to one order of magnitude over the previously proposed DATA BUBBLES method, which also tries to speedup OPTICS by trading accuracy for speed. | TRID-ID TR08-09

Item Type

http://purl.org/coar/resource_type/c_93fc

Alternative

Other License Text / Link

Language

en

Location

Time Period

Source