Linear memory implementation of HDBSCAN clustering based on SLINK.
By not building a distance matrix, memory usage is reduced to linear,
but at the cost of roughly double the runtime (unless indexes are used):
all kNN distances must first be computed (to obtain the core distances), and
the distances are then recomputed when building the spanning tree.
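The two passes can be sketched as follows. This is only an illustration under stated assumptions, not the ELKI code: the class and method names are hypothetical, the data is a plain double[][] with Euclidean distance, the search is brute force (no index), and the core distance is taken as the distance to the minPts-th nearest neighbor with the point itself counted.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Hypothetical sketch class; not part of ELKI.
final class CoreDistanceSketch {
  /** Euclidean distance between two points of equal dimensionality. */
  static double dist(double[] a, double[] b) {
    double sum = 0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /**
   * Pass 1: core distance of every point, using O(n + minPts) memory.
   * Assumption: the query point counts as its own first neighbor.
   */
  static double[] coreDistances(double[][] data, int minPts) {
    int n = data.length;
    double[] core = new double[n];
    for (int i = 0; i < n; i++) {
      // Max-heap holding the minPts smallest distances seen so far.
      PriorityQueue<Double> heap = new PriorityQueue<>(Collections.reverseOrder());
      for (int j = 0; j < n; j++) {
        heap.add(dist(data[i], data[j]));
        if (heap.size() > minPts) {
          heap.poll(); // discard the current largest
        }
      }
      core[i] = heap.peek(); // the minPts-th smallest distance
    }
    return core;
  }

  /**
   * Pass 2 helper: mutual reachability distance, recomputed on demand
   * instead of being read from a stored distance matrix.
   */
  static double mutualReachability(double[][] data, double[] core, int i, int j) {
    return Math.max(dist(data[i], data[j]), Math.max(core[i], core[j]));
  }
}
```

Each pairwise distance is evaluated once in the first pass and again whenever the second pass asks for a mutual reachability distance, which is where the roughly doubled runtime comes from.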
This version uses the SLINK algorithm to directly produce the pointer
representation expected by the extraction methods. The SLINK algorithm is
closely related to Prim's minimum spanning tree algorithm, but produces the
more compact pointer representation instead of an edge list.
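To illustrate the pointer representation, here is a minimal sketch of the classic SLINK update (Sibson, 1973) over an arbitrary distance function. It is not the ELKI implementation: the class name, the Dist interface, and the array-based output are assumptions made for this example. For HDBSCAN, the distance passed in would be the mutual reachability distance.

```java
// Hypothetical sketch class; not part of ELKI.
final class SlinkSketch {
  /** Distance between objects i and j, computed on demand. */
  interface Dist {
    double d(int i, int j);
  }

  /**
   * Fills the pointer representation: object i is merged at height
   * lambda[i] into a cluster whose highest-numbered member is pi[i].
   * Uses three arrays of length n, so memory is linear.
   */
  static void run(int n, Dist dist, int[] pi, double[] lambda) {
    double[] m = new double[n]; // scratch distances to the new object
    for (int k = 0; k < n; k++) {
      // Step 1: the new object initially points to itself.
      pi[k] = k;
      lambda[k] = Double.POSITIVE_INFINITY;
      // Step 2: distances from all earlier objects to the new one.
      for (int i = 0; i < k; i++) {
        m[i] = dist.d(i, k);
      }
      // Step 3: update merge heights and pointers.
      for (int i = 0; i < k; i++) {
        int p = pi[i];
        if (lambda[i] >= m[i]) {
          // i now merges earlier, with k; pass its old height on to p.
          m[p] = Math.min(m[p], lambda[i]);
          lambda[i] = m[i];
          pi[i] = k;
        } else {
          m[p] = Math.min(m[p], m[i]);
        }
      }
      // Step 4: relink objects whose target now merges no later than they do.
      for (int i = 0; i < k; i++) {
        if (lambda[i] >= lambda[pi[i]]) {
          pi[i] = k;
        }
      }
    }
  }
}
```

For the one-dimensional points 0, 7, 1, 3 (in this order) with absolute difference as distance, the result is pi = [2, 3, 3, 3] and lambda = [1, 4, 2, infinity]: two arrays of length n encode the whole hierarchy, whereas an edge list would store two endpoints and a weight per merge.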
This implementation does not include the cluster extraction
discussed as Step 4 of the reference. This functionality should, however, already be provided by
de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.extraction.HDBSCANHierarchyExtraction.
For this reason, we also do not include self-edges.
Reference:
R. J. G. B. Campello, D. Moulavi, J. Sander
Density-Based Clustering Based on Hierarchical Density Estimates
Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining (PAKDD)