000852519 001__ 852519
000852519 005__ 20210129235129.0
000852519 0247_ $$2Handle$$a2128/19727
000852519 037__ $$aFZJ-2018-05447
000852519 082__ $$a550
000852519 1001_ $$0P:(DE-HGF)0$$aErlingsson, Ernir$$b0$$eCorresponding author
000852519 1112_ $$aEGU General Assembly 2018$$cWien$$d2018-04-08 - 2018-04-13$$wAustria
000852519 245__ $$aScaling DBSCAN towards exascale computing for clustering of big datasets
000852519 260__ $$aKatlenburg-Lindau$$bSoc.$$c2018
000852519 3367_ $$2DRIVER$$aarticle
000852519 3367_ $$2DataCite$$aOutput Types/Journal article
000852519 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1538050369_16823
000852519 3367_ $$2BibTeX$$aARTICLE
000852519 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000852519 3367_ $$00$$2EndNote$$aJournal Article
000852519 520__ $$aProgress in sensor technology allows us to collect environmental data in more detail and with better resolution than ever before. One example is 3D laser scanners that generate 3D point-cloud datasets for land surveys. Clustering can then be performed on these datasets to identify objects such as buildings, trees, or rocks in the unstructured point-clouds. Segmenting huge point-clouds (of whole cities or even whole countries) into objects is a computationally expensive operation and therefore requires parallel processing. Density-based spatial clustering of applications with noise (DBSCAN) is a popular clustering algorithm, and HPDBSCAN is an efficient parallel implementation of it running on supercomputing clusters. Tomorrow’s supercomputers will be able to provide exascale computing performance by exploiting specialised hardware accelerators; however, existing software needs to be adapted to make use of the best-fitting accelerators. To address this problem, we present a mapping of HPDBSCAN to a pre-exascale platform currently being developed by the European DEEP-EST project. It is based on the Modular Supercomputer Architecture (MSA), which provides a set of accelerator modules that we exploit in novel ways to enhance HPDBSCAN towards exascale performance. These MSA modules include: a Cluster Module (CM) with powerful multicore CPUs; the Extreme Scale Booster (ESB) module with manycore CPUs; the Network Attached Memory (NAM) module, which stores datasets and provides extremely fast access to them; and a fast interconnect fabric that speeds up inter-process message passing together with the Global Collective Engine (GCE), which includes a multi-purpose Field Programmable Gate Array (FPGA) for, e.g., summing up values transmitted in collected messages. HPDBSCAN exploits the above accelerator modules as follows: the data to be clustered can be stored in the NAM; it is subsequently distributed and load balanced, accelerated by the GCE, to the CPU nodes of the CM; the parallel clustering itself is performed by the powerful CPUs of the CM, which also merge the obtained cluster IDs; the merged cluster IDs are stored in the NAM for further level-of-detail (LoD) studies, i.e. zooming in and out based on continuous, instead of fixed, levels of importance for each point, which can be regarded as an added dimension. The ESB module (supported by the GCE) is most suitable to calculate these continuous level of importance (cLoI) values and add them to the dataset in the NAM. Based on the added cLoI data, the LoD studies can then be performed by re-clustering as described previously, i.e. distribution and load balancing of the cLoI value-enriched dataset followed by parallel clustering. The described approach will allow HPDBSCAN-based clusterings to scale towards exascale performance on tomorrow’s hardware.
000852519 536__ $$0G:(DE-HGF)POF3-512$$a512 - Data-Intensive Science and Federated Computing (POF3-512)$$cPOF3-512$$fPOF III$$x0
000852519 536__ $$0G:(EU-Grant)754304$$aDEEP-EST - DEEP - Extreme Scale Technologies (754304)$$c754304$$fH2020-FETHPC-2016$$x1
000852519 7001_ $$0P:(DE-HGF)0$$aNeukirchen, Helmut$$b1
000852519 7001_ $$0P:(DE-Juel1)171343$$aCavallaro, Gabriele$$b2$$ufzj
000852519 7001_ $$0P:(DE-Juel1)132239$$aRiedel, Morris$$b3$$ufzj
000852519 773__ $$0PERI:(DE-600)2144416-X$$pEGU2018-16171$$tGeophysical research abstracts$$v20$$x1607-7962$$y2018
000852519 8564_ $$uhttps://juser.fz-juelich.de/record/852519/files/EGU2018-16171.pdf$$yOpenAccess
000852519 8564_ $$uhttps://juser.fz-juelich.de/record/852519/files/EGU2018-16171.pdf?subformat=pdfa$$xpdfa$$yOpenAccess
000852519 909CO $$ooai:juser.fz-juelich.de:852519$$pdnbdelivery$$pec_fundedresources$$pVDB$$pdriver$$popen_access$$popenaire
000852519 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)171343$$aForschungszentrum Jülich$$b2$$kFZJ
000852519 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132239$$aForschungszentrum Jülich$$b3$$kFZJ
000852519 9131_ $$0G:(DE-HGF)POF3-512$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vData-Intensive Science and Federated Computing$$x0
000852519 9141_ $$y2018
000852519 915__ $$0StatID:(DE-HGF)0020$$2StatID$$aNo Peer Review
000852519 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000852519 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
000852519 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000852519 980__ $$ajournal
000852519 980__ $$aVDB
000852519 980__ $$aUNRESTRICTED
000852519 980__ $$aI:(DE-Juel1)JSC-20090406
000852519 9801_ $$aFullTexts