Contribution to a conference proceedings FZJ-2015-06807

HPDBSCAN - Highly parallel DBSCAN


2015
ACM Press New York, New York, USA
ISBN: 978-1-4503-4006-9

Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments - MLHPC '15
Workshop on Machine Learning in High-Performance Computing Environments, subworkshop to Supercomputing 2015, MLHPC'15, Austin, Texas, 15 Nov 2015
ACM Press New York, New York, USA, 10 p. [10.1145/2834892.2834894]


Please use a persistent id in citations: doi:10.1145/2834892.2834894

Abstract: Clustering algorithms in the field of data mining are used to aggregate similar objects into common groups. One of the best-known of these algorithms is DBSCAN. Its distinct design enables the search for an a priori unknown number of arbitrarily shaped clusters while also filtering out noise. Due to its sequential formulation, parallelizing DBSCAN poses a challenge. In this paper we present a new parallel approach which we call HPDBSCAN. It employs three major techniques to break the sequentiality, enable workload balancing, and speed up neighborhood searches in distributed parallel processing environments: i) a computation split heuristic for domain decomposition, ii) a data index preprocessing step, and iii) a rule-based cluster merging scheme. As a proof of concept, we implemented HPDBSCAN as an OpenMP/MPI hybrid application. Using real-world data sets, such as a point cloud from the old town of Bremen, Germany, we demonstrate that our implementation is able to achieve a significant speed-up and scale-up in common HPC setups. Moreover, we compare our approach with previous attempts to parallelize DBSCAN, showing an order of magnitude improvement in terms of computation time and memory consumption.
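
Illustrative note: the sequential formulation that the abstract mentions can be seen in a minimal textbook-style DBSCAN sketch such as the one below. This is not the HPDBSCAN implementation described in the paper. The function names (region_query, dbscan), the parameter names (eps, min_pts) and the brute-force neighborhood search are assumptions made for illustration only. Each cluster grows by expanding from one core point to its neighbors in turn, which is the dependency that a parallel approach has to break.

```python
from math import dist

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself).

    Brute-force O(n) scan; an indexing step would normally replace this.
    """
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Provisionally noise; may be relabeled as a border point later.
            labels[i] = NOISE
            continue
        # points[i] is a core point: start a new cluster and expand it.
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbors)
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point, not expanded further
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                frontier.extend(j_neighbors)  # j is also a core point
    return labels


if __name__ == "__main__":
    # Two small square groups and one isolated point (made-up sample data).
    sample = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (50, 50)]
    print(dbscan(sample, eps=1.5, min_pts=3))
```

The number of clusters is not passed in; it emerges from the density parameters, and points that are neither core nor reachable from a core point keep the noise label.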


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 512 - Data-Intensive Science and Federated Computing (POF3-512)

Appears in the scientific report 2015

The record appears in these collections:
Document types > Events > Contributions to a conference proceedings
Workflow collections > Public records
Institute Collections > JSC
Publications database

 Record created 2015-11-25, last modified 2021-01-29

