000276347 001__ 276347
000276347 005__ 20210129220847.0
000276347 020__ $$a978-1-4503-4006-9
000276347 0247_ $$2doi$$a10.1145/2834892.2834894
000276347 0247_ $$2altmetric$$aaltmetric:21827709
000276347 037__ $$aFZJ-2015-06807
000276347 041__ $$aEnglish
000276347 1001_ $$0P:(DE-Juel1)162390$$aGötz, Markus$$b0$$eCorresponding author$$ufzj
000276347 1112_ $$aWorkshop on Machine Learning in High-Performance Computing Environments, subworkshop to Supercomputing 2015$$cAustin$$d2015-11-15 - 2015-11-15$$gMLHPC'15$$wTexas
000276347 245__ $$aHPDBSCAN - Highly parallel DBSCAN
000276347 260__ $$bACM Press New York, New York, USA$$c2015
000276347 29510 $$aProceedings of the Workshop on Machine Learning in High-Performance Computing Environments - MLHPC '15
000276347 300__ $$a10p.
000276347 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1448542827_24254
000276347 3367_ $$033$$2EndNote$$aConference Paper
000276347 3367_ $$2ORCID$$aCONFERENCE_PAPER
000276347 3367_ $$2DataCite$$aOutput Types/Conference Paper
000276347 3367_ $$2DRIVER$$aconferenceObject
000276347 3367_ $$2BibTeX$$aINPROCEEDINGS
000276347 520__ $$aClustering algorithms in the field of data mining are used to aggregate similar objects into common groups. One of the best-known of these algorithms is DBSCAN. Its distinct design enables the search for an a priori unknown number of arbitrarily shaped clusters and, at the same time, allows noise to be filtered out. Due to its sequential formulation, parallelizing DBSCAN is a challenge. In this paper we present a new parallel approach, which we call HPDBSCAN. It employs three major techniques to break the sequentiality, enable workload balancing, and speed up neighborhood searches in distributed parallel processing environments: i) a computation split heuristic for domain decomposition, ii) a data index preprocessing step, and iii) a rule-based cluster merging scheme. As a proof of concept, we implemented HPDBSCAN as an OpenMP/MPI hybrid application. Using real-world data sets, such as a point cloud from the old town of Bremen, Germany, we demonstrate that our implementation is able to achieve a significant speed-up and scale-up in common HPC setups. Moreover, we compare our approach with previous attempts to parallelize DBSCAN, showing an order of magnitude improvement in terms of computation time and memory consumption.
000276347 536__ $$0G:(DE-HGF)POF3-512$$a512 - Data-Intensive Science and Federated Computing (POF3-512)$$cPOF3-512$$fPOF III$$x0
000276347 588__ $$aDataset connected to CrossRef Conference
000276347 7001_ $$0P:(DE-Juel1)164357$$aBodenstein, Christian$$b1$$ufzj
000276347 7001_ $$0P:(DE-Juel1)132239$$aRiedel, Morris$$b2$$ufzj
000276347 773__ $$a10.1145/2834892.2834894$$p2
000276347 8564_ $$uhttps://juser.fz-juelich.de/record/276347/files/a2-gotz.pdf$$yRestricted
000276347 8564_ $$uhttps://juser.fz-juelich.de/record/276347/files/a2-gotz.gif?subformat=icon$$xicon$$yRestricted
000276347 8564_ $$uhttps://juser.fz-juelich.de/record/276347/files/a2-gotz.jpg?subformat=icon-1440$$xicon-1440$$yRestricted
000276347 8564_ $$uhttps://juser.fz-juelich.de/record/276347/files/a2-gotz.jpg?subformat=icon-180$$xicon-180$$yRestricted
000276347 8564_ $$uhttps://juser.fz-juelich.de/record/276347/files/a2-gotz.jpg?subformat=icon-640$$xicon-640$$yRestricted
000276347 909CO $$ooai:juser.fz-juelich.de:276347$$pVDB
000276347 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)162390$$aForschungszentrum Jülich GmbH$$b0$$kFZJ
000276347 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)164357$$aForschungszentrum Jülich GmbH$$b1$$kFZJ
000276347 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132239$$aForschungszentrum Jülich GmbH$$b2$$kFZJ
000276347 9131_ $$0G:(DE-HGF)POF3-512$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vData-Intensive Science and Federated Computing$$x0
000276347 9141_ $$y2015
000276347 920__ $$lyes
000276347 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000276347 980__ $$acontrib
000276347 980__ $$aVDB
000276347 980__ $$aI:(DE-Juel1)JSC-20090406
000276347 980__ $$aUNRESTRICTED