TY - CONF
AU - Hoppe, Fabian
AU - Comito, Claudia
AU - Gutiérrez Hermosillo Muriedas, Juan Pedro
AU - Götz, Markus
AU - Hagemeier, Björn
AU - Knechtges, Philipp
AU - Krajsek, Kai
AU - Rüttgers, Alexander
AU - Streit, Achim
AU - Tarnawa, Michael
TI - Scaling data-intensive analytics with Heat: a Python library for massively-parallel array computing and machine learning
M1 - FZJ-2023-05813
PY - 2023
AB - Manipulating and processing massive data sets is challenging. For the vast majority of research communities, those without a background in high-performance computing, the standard approach involves breaking up the data and analyzing it in smaller chunks, an inefficient and error-prone process. The Helmholtz Analytics Toolkit (Heat) library offers a solution to this problem by providing memory-distributed and hardware-accelerated array manipulation, data analytics, and machine learning algorithms in Python. Developed collaboratively by three institutions of the Helmholtz Association (KIT, FZJ, DLR), Heat enables memory distribution of n-dimensional arrays; adopts PyTorch as its process-local compute engine (hence supporting GPU acceleration); provides memory-distributed (i.e., multi-node, multi-GPU) array operations and algorithms, optimizing asynchronous MPI communication under the hood; and wraps this functionality in a NumPy- or scikit-learn-like API so that existing applications can be ported with minimal changes. In this presentation, we will provide an overview of the Heat library's features and capabilities and discuss its role in the ecosystem of distributed array computing and machine learning in Python. Additionally, we will highlight Heat's role as a platform for cross-discipline collaboration in data-intensive research, and address technical and operational challenges in Heat development.
T2 - Helmholtz AI Conference
CY - 12 Jun 2023 - 14 Jun 2023, Hamburg (Germany)
Y2 - 12 Jun 2023 - 14 Jun 2023
M2 - Hamburg, Germany
LB - PUB:(DE-HGF)6
UR - https://juser.fz-juelich.de/record/1019998
ER -