%0 Conference Paper
%A Comito, Claudia
%A Hoppe, Fabian
%A Götz, Markus
%A Gutiérrez Hermosillo Muriedas, Juan Pedro
%A Hagemeier, Björn
%A Knechtges, Philipp
%A Krajsek, Kai
%A Rüttgers, Alexander
%A Streit, Achim
%A Tarnawa, Michael
%T Heat: accelerating massive data processing in Python
%M FZJ-2023-05811
%D 2023
%X Manipulating and processing massive data sets is challenging. In astrophysics, as in most research communities, the standard approach is to break the data up and analyze it in smaller chunks, a process that is both inefficient and error-prone. The problem is worse on GPUs, which have less memory available. Popular solutions for distributing NumPy/SciPy computations rely on task parallelism, which introduces significant runtime overhead, complicates implementation, and often limits GPU support to a single vendor. This poster presents an alternative based on data parallelism. The open-source library Heat [1, 2] builds on PyTorch and mpi4py to simplify porting NumPy/SciPy-based code to GPUs (CUDA and ROCm, including multi-GPU, multi-node clusters). Under the hood, Heat distributes massive, memory-intensive operations over multi-node resources via MPI communication. From the user's perspective, Heat fits seamlessly into the Python array ecosystem. Supported features: distributed (multi-GPU) I/O from shared memory; easy distribution of memory-intensive operations (e.g., matrix multiplication) in existing code; interoperability within the Python array ecosystem, with Heat as a backend for massive array manipulations, statistics, signal processing, machine learning, and more; transparent parallelism, i.e., prototype on your laptop and run the same code on an HPC cluster. I will also touch upon Heat's current implementation roadmap and possible paths to collaboration. [1] https://github.com/helmholtz-analytics/heat [2] M. Götz et al., "HeAT – a Distributed and GPU-accelerated Tensor Framework for Data Analytics," 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 2020, pp. 276-287, doi: 10.1109/BigData50022.2020.9378050.
%B CS & Physics Meet-Up by Lamarr & B3D
%C 29 Nov 2023 - 1 Dec 2023, TU Dortmund (Germany)
%F PUB:(DE-HGF)24
%9 Poster
%U https://juser.fz-juelich.de/record/1019996