001     852519
005     20210129235129.0
024 7 _ |a 2128/19727
|2 Handle
037 _ _ |a FZJ-2018-05447
082 _ _ |a 550
100 1 _ |a Erlingsson, Ernir
|0 P:(DE-HGF)0
|b 0
|e Corresponding author
111 2 _ |a EGU General Assembly 2018
|c Wien
|d 2018-04-08 - 2018-04-13
|w Austria
245 _ _ |a Scaling DBSCAN towards exascale computing for clustering of big datasets
260 _ _ |a Katlenburg-Lindau
|c 2018
|b Soc.
336 7 _ |a article
|2 DRIVER
336 7 _ |a Output Types/Journal article
|2 DataCite
336 7 _ |a Journal Article
|b journal
|m journal
|0 PUB:(DE-HGF)16
|s 1538050369_16823
|2 PUB:(DE-HGF)
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a JOURNAL_ARTICLE
|2 ORCID
336 7 _ |a Journal Article
|0 0
|2 EndNote
520 _ _ |a Progress in sensor technology allows us to collect environmental data in more detail and with better resolution than ever before. One example is 3D laser scanners that generate 3D point-cloud datasets for land surveys. Clustering can then be performed on these datasets to identify objects such as buildings, trees, or rocks in the unstructured point clouds. Segmenting huge point clouds (of whole cities or even whole countries) into objects is a computationally expensive operation and therefore requires parallel processing. Density-based spatial clustering of applications with noise (DBSCAN) is a popular clustering algorithm, and HPDBSCAN is an efficient parallel implementation of it running on supercomputing clusters. Tomorrow's supercomputers will be able to provide exascale computing performance by exploiting specialised hardware accelerators; however, existing software needs to be adapted to make use of the best-fitting accelerators. To address this problem, we present a mapping of HPDBSCAN to a pre-exascale platform currently being developed by the European DEEP-EST project. It is based on the Modular Supercomputer Architecture (MSA), which provides a set of accelerator modules that we exploit in novel ways to enhance HPDBSCAN towards exascale performance. These MSA modules include: a Cluster Module (CM) with powerful multicore CPUs; the Extreme Scale Booster (ESB) module with manycore CPUs; the Network Attached Memory (NAM) module, which stores datasets and provides extremely fast access to them; and a fast interconnect fabric that speeds up inter-process message passing together with the Global Collective Engine (GCE), which includes a multi-purpose Field Programmable Gate Array (FPGA) for, e.g., summing up values transmitted in collected messages. HPDBSCAN exploits these accelerator modules as follows: the data to be clustered can be stored in the NAM; it is subsequently distributed and load-balanced to the CPU nodes of the CM, accelerated by the GCE; the parallel clustering itself is performed by the powerful CPUs of the CM, which also merge the obtained cluster IDs; the merged cluster IDs are stored in the NAM for further level-of-detail (LoD) studies, i.e. zooming in and out based on continuous, instead of fixed, levels of importance for each point, which can be regarded as an added dimension. The ESB module (supported by the GCE) is most suitable for calculating these continuous level of importance (cLoI) values and adding them to the dataset in the NAM. Based on the added cLoI data, the LoD studies can then be performed by re-clustering as described previously, i.e. distribution and load balancing of the cLoI-enriched dataset followed by parallel clustering. The described approach will allow HPDBSCAN-based clustering to scale towards exascale performance on tomorrow's hardware.
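For orientation, below is a minimal serial sketch of the underlying DBSCAN algorithm in Python. It is not the HPDBSCAN implementation described in the abstract (which uses MPI/OpenMP and a spatial grid index for neighbourhood queries); the parameter names eps and min_pts follow the standard algorithm, and the synthetic example data at the end is purely illustrative.

# Minimal serial DBSCAN sketch (not HPDBSCAN): labels points as noise (-1)
# or assigns a cluster id by expanding density-reachable neighbourhoods.
import numpy as np

NOISE = -1
UNVISITED = 0

def region_query(points, idx, eps):
    # All points within distance eps of points[idx] (brute-force neighbourhood query).
    dists = np.linalg.norm(points - points[idx], axis=1)
    return np.flatnonzero(dists <= eps)

def dbscan(points, eps, min_pts):
    # Returns one label per point: -1 = noise, 1..k = cluster id.
    labels = np.zeros(len(points), dtype=int)   # 0 means not yet assigned
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE                    # may later be reclaimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id           # border point of the current cluster
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:     # j is a core point: keep expanding
                seeds.extend(j_neighbours)
    return labels

# Illustrative usage on a small synthetic 3D "point cloud" with two dense blobs.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = np.vstack([rng.normal(0.0, 0.2, (100, 3)),
                       rng.normal(3.0, 0.2, (100, 3))])
    print(np.unique(dbscan(cloud, eps=0.5, min_pts=5), return_counts=True))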
536 _ _ |a 512 - Data-Intensive Science and Federated Computing (POF3-512)
|0 G:(DE-HGF)POF3-512
|c POF3-512
|f POF III
|x 0
536 _ _ |a DEEP-EST - DEEP - Extreme Scale Technologies (754304)
|0 G:(EU-Grant)754304
|c 754304
|f H2020-FETHPC-2016
|x 1
700 1 _ |a Neukirchen, Helmut
|0 P:(DE-HGF)0
|b 1
700 1 _ |a Cavallaro, Gabriele
|0 P:(DE-Juel1)171343
|b 2
|u fzj
700 1 _ |a Riedel, Morris
|0 P:(DE-Juel1)132239
|b 3
|u fzj
773 _ _ |0 PERI:(DE-600)2144416-X
|p EGU2018-16171
|t Geophysical research abstracts
|v 20
|y 2018
|x 1607-7962
856 4 _ |y OpenAccess
|u https://juser.fz-juelich.de/record/852519/files/EGU2018-16171.pdf
856 4 _ |y OpenAccess
|x pdfa
|u https://juser.fz-juelich.de/record/852519/files/EGU2018-16171.pdf?subformat=pdfa
909 C O |o oai:juser.fz-juelich.de:852519
|p openaire
|p open_access
|p driver
|p VDB
|p ec_fundedresources
|p dnbdelivery
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 2
|6 P:(DE-Juel1)171343
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 3
|6 P:(DE-Juel1)132239
913 1 _ |a DE-HGF
|b Key Technologies
|1 G:(DE-HGF)POF3-510
|0 G:(DE-HGF)POF3-512
|2 G:(DE-HGF)POF3-500
|v Data-Intensive Science and Federated Computing
|x 0
|4 G:(DE-HGF)POF
|3 G:(DE-HGF)POF3
|l Supercomputing & Big Data
914 1 _ |y 2018
915 _ _ |a No Peer Review
|0 StatID:(DE-HGF)0020
|2 StatID
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
915 _ _ |a Creative Commons Attribution CC BY 4.0
|0 LIC:(DE-HGF)CCBY4
|2 HGFVOC
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 _ _ |a journal
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a I:(DE-Juel1)JSC-20090406
980 1 _ |a FullTexts

