001     276347
005     20210129220847.0
020 _ _ |a 978-1-4503-4006-9
024 7 _ |a 10.1145/2834892.2834894
|2 doi
024 7 _ |a altmetric:21827709
|2 altmetric
037 _ _ |a FZJ-2015-06807
041 _ _ |a English
100 1 _ |a Götz, Markus
|0 P:(DE-Juel1)162390
|b 0
|e Corresponding author
|u fzj
111 2 _ |a Workshop on Machine Learning in High-Performance Computing Environments, subworkshop to Supercomputing 2015
|g MLHPC'15
|c Austin
|d 2015-11-15 - 2015-11-15
|w Texas
245 _ _ |a HPDBSCAN - Highly parallel DBSCAN
260 _ _ |c 2015
|b ACM Press New York, New York, USA
295 1 0 |a Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments - MLHPC '15
300 _ _ |a 10p.
336 7 _ |a Contribution to a conference proceedings
|b contrib
|m contrib
|0 PUB:(DE-HGF)8
|s 1448542827_24254
|2 PUB:(DE-HGF)
336 7 _ |a Conference Paper
|0 33
|2 EndNote
336 7 _ |a CONFERENCE_PAPER
|2 ORCID
336 7 _ |a Output Types/Conference Paper
|2 DataCite
336 7 _ |a conferenceObject
|2 DRIVER
336 7 _ |a INPROCEEDINGS
|2 BibTeX
520 _ _ |a Clustering algorithms in the field of data mining are used to aggregate similar objects into common groups. One of the best-known of these algorithms is called DBSCAN. Its distinct design enables the search for an a priori unknown number of arbitrarily shaped clusters, and at the same time allows noise to be filtered out. Due to its sequential formulation, the parallelization of DBSCAN poses a challenge. In this paper we present a new parallel approach which we call HPDBSCAN. It employs three major techniques in order to break the sequentiality, enable workload balancing, and speed up neighborhood searches in distributed parallel processing environments: i) a computation split heuristic for domain decomposition, ii) a data index preprocessing step and iii) a rule-based cluster merging scheme. As a proof of concept we implemented HPDBSCAN as an OpenMP/MPI hybrid application. Using real-world data sets, such as a point cloud from the old town of Bremen, Germany, we demonstrate that our implementation is able to achieve a significant speed-up and scale-up in common HPC setups. Moreover, we compare our approach with previous attempts to parallelize DBSCAN, showing an order of magnitude improvement in terms of computation time and memory consumption.
536 _ _ |a 512 - Data-Intensive Science and Federated Computing (POF3-512)
|0 G:(DE-HGF)POF3-512
|c POF3-512
|f POF III
|x 0
588 _ _ |a Dataset connected to CrossRef Conference
700 1 _ |a Bodenstein, Christian
|0 P:(DE-Juel1)164357
|b 1
|u fzj
700 1 _ |a Riedel, Morris
|0 P:(DE-Juel1)132239
|b 2
|u fzj
773 _ _ |a 10.1145/2834892.2834894
|p 2
856 4 _ |u https://juser.fz-juelich.de/record/276347/files/a2-gotz.pdf
|y Restricted
856 4 _ |u https://juser.fz-juelich.de/record/276347/files/a2-gotz.gif?subformat=icon
|x icon
|y Restricted
856 4 _ |u https://juser.fz-juelich.de/record/276347/files/a2-gotz.jpg?subformat=icon-1440
|x icon-1440
|y Restricted
856 4 _ |u https://juser.fz-juelich.de/record/276347/files/a2-gotz.jpg?subformat=icon-180
|x icon-180
|y Restricted
856 4 _ |u https://juser.fz-juelich.de/record/276347/files/a2-gotz.jpg?subformat=icon-640
|x icon-640
|y Restricted
909 C O |o oai:juser.fz-juelich.de:276347
|p VDB
910 1 _ |a Forschungszentrum Jülich GmbH
|0 I:(DE-588b)5008462-8
|k FZJ
|b 0
|6 P:(DE-Juel1)162390
910 1 _ |a Forschungszentrum Jülich GmbH
|0 I:(DE-588b)5008462-8
|k FZJ
|b 1
|6 P:(DE-Juel1)164357
910 1 _ |a Forschungszentrum Jülich GmbH
|0 I:(DE-588b)5008462-8
|k FZJ
|b 2
|6 P:(DE-Juel1)132239
913 1 _ |a DE-HGF
|b Key Technologies
|1 G:(DE-HGF)POF3-510
|0 G:(DE-HGF)POF3-512
|2 G:(DE-HGF)POF3-500
|v Data-Intensive Science and Federated Computing
|x 0
|4 G:(DE-HGF)POF
|3 G:(DE-HGF)POF3
|l Supercomputing & Big Data
914 1 _ |y 2015
920 _ _ |l yes
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 _ _ |a contrib
980 _ _ |a VDB
980 _ _ |a I:(DE-Juel1)JSC-20090406
980 _ _ |a UNRESTRICTED

