001019997 001__ 1019997
001019997 005__ 20240226075235.0
001019997 037__ $$aFZJ-2023-05812
001019997 1001_ $$0P:(DE-HGF)0$$aHoppe, Fabian$$b0
001019997 1112_ $$aEuroSciPy$$cBasel$$d2023-08-14 - 2023-08-17$$wSwitzerland
001019997 245__ $$aThe Helmholtz Analytics Toolkit (Heat) and its role in the landscape of massively-parallel scientific Python
001019997 260__ $$c2023
001019997 3367_ $$033$$2EndNote$$aConference Paper
001019997 3367_ $$2DataCite$$aOther
001019997 3367_ $$2BibTeX$$aINPROCEEDINGS
001019997 3367_ $$2DRIVER$$aconferenceObject
001019997 3367_ $$2ORCID$$aLECTURE_SPEECH
001019997 3367_ $$0PUB:(DE-HGF)6$$2PUB:(DE-HGF)$$aConference Presentation$$bconf$$mconf$$s1704431563_25361$$xAfter Call
001019997 520__ $$aWhen it comes to making better use of massive data, machine learning methods are at the forefront of researchers’ awareness. Much less attention goes to the need for, and the complexity of, applying these techniques efficiently across large-scale, memory-distributed data volumes. In fact, these aspects, which are typical of handling massive data sets, pose major challenges to the vast majority of research communities, in particular to those without a background in high-performance computing (HPC). Often, the standard approach involves breaking up the data and analyzing it in smaller chunks; this can be inefficient and error-prone, and sometimes it is not appropriate at all because the context of the overall data set can get lost. The Helmholtz Analytics Toolkit (Heat) library offers a solution to this problem by providing memory-distributed and hardware-accelerated array manipulation, data analytics, and machine learning algorithms in Python. The main objective is to make memory-intensive data analysis possible across various fields of research, in particular for domain scientists who are not experts in traditional high-performance computing but nevertheless need to tackle data analytics problems beyond the capabilities of a single workstation. The development of this interdisciplinary, general-purpose, open-source scientific Python library started in 2018 and is based on a collaboration of three institutions of the Helmholtz Association: German Aerospace Center (DLR), Forschungszentrum Jülich (FZJ), and Karlsruhe Institute of Technology (KIT). The pillars of its development are (1) to enable memory distribution of n-dimensional arrays; (2) to adopt PyTorch as the process-local compute engine (hence supporting GPU acceleration); (3) to provide memory-distributed (i.e., multi-node, multi-GPU) array operations and algorithms, optimizing asynchronous MPI communication (based on mpi4py) under the hood; and (4) to wrap these functionalities in a NumPy- or scikit-learn-like API so that existing applications can be ported with minimal changes and non-experts in HPC can use the library. In this talk we will give an illustrative overview of the current features and capabilities of our library. Moreover, we will discuss its role in the existing ecosystem of distributed computing in Python, and we will address technical and operational challenges in its further development.
001019997 536__ $$0G:(DE-HGF)POF4-5111$$a5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001019997 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x1
001019997 536__ $$0G:(DE-Juel1)Helmholtz-SLNS$$aSLNS - SimLab Neuroscience (Helmholtz-SLNS)$$cHelmholtz-SLNS$$x2
001019997 7001_ $$0P:(DE-Juel1)174573$$aComito, Claudia$$b1
001019997 7001_ $$0P:(DE-HGF)0$$aGötz, Markus$$b2
001019997 7001_ $$0P:(DE-HGF)0$$aGutiérrez Hermosillo Murieda, Juan Pedro$$b3
001019997 7001_ $$0P:(DE-Juel1)132123$$aHagemeier, Björn$$b4
001019997 7001_ $$0P:(DE-HGF)0$$aKnechtges, Philipp$$b5
001019997 7001_ $$0P:(DE-Juel1)129347$$aKrajsek, Kai$$b6
001019997 7001_ $$0P:(DE-HGF)0$$aRüttgers, Alexander$$b7
001019997 7001_ $$0P:(DE-HGF)0$$aStreit, Achim$$b8
001019997 7001_ $$0P:(DE-Juel1)178977$$aTarnawa, Michael$$b9
001019997 8564_ $$uhttps://pretalx.com/euroscipy-2023/talk/STXCKT/
001019997 909CO $$ooai:juser.fz-juelich.de:1019997$$pVDB
001019997 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)174573$$aForschungszentrum Jülich$$b1$$kFZJ
001019997 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132123$$aForschungszentrum Jülich$$b4$$kFZJ
001019997 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)129347$$aForschungszentrum Jülich$$b6$$kFZJ
001019997 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)178977$$aForschungszentrum Jülich$$b9$$kFZJ
001019997 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5111$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001019997 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x1
001019997 9141_ $$y2023
001019997 920__ $$lyes
001019997 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
001019997 980__ $$aconf
001019997 980__ $$aVDB
001019997 980__ $$aI:(DE-Juel1)JSC-20090406
001019997 980__ $$aUNRESTRICTED