%0 Conference Paper
%A Hoppe, Fabian
%A Comito, Claudia
%A Gutiérrez Hermosillo Muriedas, Juan Pedro
%A Götz, Markus
%A Hagemeier, Björn
%A Knechtges, Philipp
%A Krajsek, Kai
%A Rüttgers, Alexander
%A Streit, Achim
%A Tarnawa, Michael
%T Scaling data-intensive analytics with Heat: a Python library for massively-parallel array computing and machine learning
%M FZJ-2023-05813
%D 2023
%X Manipulating and processing massive data sets is challenging. For the vast majority of research communities, those without a background in high-performance computing, the standard approach involves breaking up and analyzing data in smaller chunks, an inefficient and error-prone process. The Helmholtz Analytics Toolkit (Heat) library offers a solution to this problem by providing memory-distributed and hardware-accelerated array manipulation, data analytics, and machine learning algorithms in Python. Developed collaboratively by three institutions of the Helmholtz Association (KIT, FZJ, DLR), Heat enables memory distribution of n-dimensional arrays; adopts PyTorch as its process-local compute engine (hence supporting GPU acceleration); provides memory-distributed (i.e., multi-node, multi-GPU) array operations and algorithms, optimizing asynchronous MPI communication under the hood; and wraps these functionalities in a NumPy- or scikit-learn-like API so that existing applications can be ported with minimal changes. In this presentation, we will provide an overview of the Heat library's features and capabilities and discuss its role in the ecosystem of distributed array computing and machine learning in Python. Additionally, we will highlight Heat's role as a platform for cross-discipline collaboration in data-intensive research, and address technical and operational challenges in Heat development.
%B Helmholtz AI Conference
%C 12 Jun 2023 - 14 Jun 2023, Hamburg (Germany)
Y2 12 Jun 2023 - 14 Jun 2023
M2 Hamburg, Germany
%F PUB:(DE-HGF)6
%9 Conference Presentation
%U https://juser.fz-juelich.de/record/1019998