Journal Article FZJ-2024-06868

http://join2-wiki.gsi.de/foswiki/pub/Main/Artwork/join2_logo100x88.png
Vectorized Highly Parallel Density-Based Clustering for Applications With Noise

 ;  ;  ;  ;  ;  ;  ;

2024
IEEE New York, NY

IEEE access 12, 181679 - 181692 () [10.1109/ACCESS.2024.3507193]

This record in other databases:  

Please use a persistent id in citations: doi:  doi:

Abstract: Clustering in data mining involves grouping similar objects into categories based on their characteristics. As the volume of data continues to grow and advancements in high-performance computing evolve, a critical need has emerged for algorithms that can efficiently process these computations and exploit the various levels of parallelism offered by modern supercomputing systems. Exploiting Single Instruction Multiple Data (SIMD) instructions enhances parallelism at the instruction level and minimizes data movement within the memory hierarchy. To fully harness a processor’s SIMD capabilities and achieve optimal performance, adapting algorithms for better compatibility with vector operations is necessary. In this paper, we introduce a vectorized implementation of the Density-based Clustering for Applications with Noise (DBSCAN) algorithm suitable for the execution on both shared and distributed memory systems. By leveraging SIMD, we enhance the performance of distance computations. Our proposed Vectorized HPDBSCAN (VHPDBSCAN) demonstrates a performance improvement of up to two times over the state-of-the-art parallel version, Highly Parallel DBSCAN (HPDBSCAN), on the ARM-based A64FX processor on two different datasets with varying dimensions. We have parallelized computations which are essential for the efficient workload distribution. This has significantly enhanced the performance on higher dimensional datasets. Additionally, we evaluate VHPDBSCAN’s energy consumption on the A64FX and Intel Xeon processors. The results show that in both processors, due to the reduced runtime, the total energy consumption of the application is reduced by 50% on the A64FX Central Processing Unit (CPU) and by approximately 19% on the Intel Xeon 8368 CPU compared to HPDBSCAN.

Classification:

Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511) (POF4-511)
  2. 5122 - Future Computing & Big Data Systems (POF4-512) (POF4-512)
  3. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511) (POF4-511)
  4. EUPEX - EUROPEAN PILOT FOR EXASCALE (101033975) (101033975)
  5. EUROCC-2 (DEA02266) (DEA02266)

Appears in the scientific report 2024
Database coverage:
Medline ; Creative Commons Attribution CC BY 4.0 ; DOAJ ; OpenAccess ; Article Processing Charges ; Clarivate Analytics Master Journal List ; Current Contents - Electronics and Telecommunications Collection ; Current Contents - Engineering, Computing and Technology ; DOAJ Seal ; Essential Science Indicators ; Fees ; IF < 5 ; JCR ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection
Click to display QR Code for this record

The record appears in these collections:
Document types > Articles > Journal Article
Workflow collections > Public records
Workflow collections > Publication Charges
Institute Collections > JSC
Publications database
Open Access

 Record created 2024-12-10, last modified 2025-02-03


OpenAccess:
Download fulltext PDF
Rate this document:

Rate this document:
1
2
3
 
(Not yet reviewed)