Conference Presentation (After Call) FZJ-2023-05813

Scaling data-intensive analytics with Heat: a Python library for massively-parallel array computing and machine learning

2023

Helmholtz AI Conference, Hamburg, Germany, 12 Jun 2023 - 14 Jun 2023

Abstract: Manipulating and processing massive data sets is challenging. Most research communities lack a background in high-performance computing, and for them the standard approach is to break the data up and analyze it in smaller chunks. This process is inefficient and error-prone.

The Helmholtz Analytics Toolkit (Heat) library addresses this problem by providing memory-distributed and hardware-accelerated array manipulation, data analytics, and machine learning algorithms in Python. Heat is developed jointly by three institutions of the Helmholtz Association (KIT, FZJ, DLR). It:
  1. enables memory distribution of n-dimensional arrays;
  2. adopts PyTorch as its process-local compute engine, and therefore supports GPU acceleration;
  3. provides memory-distributed (i.e., multi-node, multi-GPU) array operations and algorithms, with asynchronous MPI communication optimized under the hood;
  4. wraps its functionality in NumPy- and scikit-learn-like APIs, so that existing applications can be ported with minimal changes.

In this presentation, we will provide an overview of the Heat library's features and capabilities and discuss its role in the ecosystem of distributed array computing and machine learning in Python. We will also highlight Heat's role as a platform for cross-disciplinary collaboration in data-intensive research, and address the technical and operational challenges of developing Heat.
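The core idea behind memory-distributed arrays can be illustrated without Heat or MPI. The sketch below is written in plain Python and is a simplified stand-in, not Heat's actual implementation. It partitions a 2-D array along axis 0 across a number of simulated processes. Each process then reduces only its local block, and the partial results are combined with a hand-written stand-in for an MPI Allreduce. The balanced chunking rule is an assumption: the first `n % nprocs` processes each receive one extra row. All function names (`chunk_bounds`, `distribute_rows`, `local_column_sums`, `allreduce_sum`, `distributed_column_mean`) are hypothetical illustrations. In Heat itself, the distribution axis is chosen through the `split` argument of array factories such as `ht.array`, and the local blocks are PyTorch tensors rather than Python lists.

```python
from typing import List, Sequence, Tuple

# A simulated process-local block: a list of rows.
Block = List[List[float]]


def chunk_bounds(n: int, nprocs: int, rank: int) -> Tuple[int, int]:
    """Return the half-open range [start, stop) of a length-n axis owned by `rank`.

    Assumed balanced block distribution (hypothetical, for illustration): the
    first n % nprocs ranks get one extra element, so local sizes differ by at
    most one. Ranks beyond the data receive an empty range.
    """
    base, remainder = divmod(n, nprocs)
    start = rank * base + min(rank, remainder)
    stop = start + base + (1 if rank < remainder else 0)
    return start, stop


def distribute_rows(matrix: Sequence[Sequence[float]], nprocs: int) -> List[Block]:
    """Split a 2-D array along axis 0 into one local block per simulated process."""
    blocks: List[Block] = []
    for rank in range(nprocs):
        start, stop = chunk_bounds(len(matrix), nprocs, rank)
        blocks.append([list(row) for row in matrix[start:stop]])
    return blocks


def local_column_sums(block: Block, ncols: int) -> List[float]:
    """Process-local work: sum each column over this process's rows only."""
    sums = [0.0] * ncols
    for row in block:
        for j, value in enumerate(row):
            sums[j] += value
    return sums


def allreduce_sum(partials: List[List[float]]) -> List[float]:
    """Stand-in for an MPI Allreduce with SUM: combine per-process vectors elementwise."""
    return [sum(column) for column in zip(*partials)]


def distributed_column_mean(matrix: Sequence[Sequence[float]], nprocs: int) -> List[float]:
    """Column means computed from row blocks, as if spread over `nprocs` processes."""
    ncols = len(matrix[0])
    blocks = distribute_rows(matrix, nprocs)
    # Each simulated process reduces only its own block...
    partial_sums = [local_column_sums(block, ncols) for block in blocks]
    partial_counts = [len(block) for block in blocks]
    # ...and only the small partial results are communicated and combined.
    total_sums = allreduce_sum(partial_sums)
    total_count = sum(partial_counts)
    return [s / total_count for s in total_sums]
```

For example, `distributed_column_mean([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]], 3)` places 2, 2 and 1 rows on the three simulated processes and returns `[4.0, 5.0]`, the same result as a serial column mean. Only the per-column partial sums and row counts cross process boundaries, and this is what allows the pattern to scale to arrays that do not fit into a single node's memory.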


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)
  2. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)
  3. SLNS - SimLab Neuroscience (Helmholtz-SLNS)

Appears in the scientific report 2023


 Record created 2023-12-21, last modified 2024-01-05

