Contribution to a conference proceedings FZJ-2021-02866

Practice and Experience in using Parallel and Scalable Machine Learning with Heterogenous Modular Supercomputing Architectures


2021
IEEE

IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Portland, USA, 17 Jun 2021 - 21 Jun 2021. IEEE, pp. 76-85. [10.1109/IPDPSW52791.2021.00019]


Please use a persistent id in citations: doi:10.1109/IPDPSW52791.2021.00019

Abstract: We observe a continuously increasing use of Deep Learning (DL) as a specific type of Machine Learning (ML) for data-intensive problems (i.e., ’big data’) that requires powerful computing resources with equally increasing performance. Consequently, innovative heterogeneous High-Performance Computing (HPC) systems based on multi-core CPUs and many-core GPUs require an architectural design that addresses the requirements of end user communities that take advantage of ML and DL. Still, the workloads of end user communities in the simulation sciences (e.g., using numerical methods based on known physical laws) need to be equally supported in those architectures. This paper offers insights into the Modular Supercomputer Architecture (MSA) developed in the Dynamic Exascale Entry Platform (DEEP) series of projects to address the requirements of both simulation sciences and data-intensive sciences such as High Performance Data Analytics (HPDA). It shares insights into implementing the MSA at the Jülich Supercomputing Centre (JSC), which hosts Europe's No. 1 supercomputer, Jülich Wizard for European Leadership Science (JUWELS). We augment the technical findings with experience and lessons learned from two application community case studies (i.e., remote sensing and health sciences) using the MSA with JUWELS and the DEEP systems in practice. Thus, the paper provides details on specific MSA design elements that enable significant performance improvements of ML and DL algorithms. While this paper focuses on MSA-based HPC systems and application experience, we do not lose sight of advances in Cloud Computing (CC) and Quantum Computing (QC) relevant for ML and DL.


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)
  2. 5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)
  3. DEEP-EST - DEEP - Extreme Scale Technologies (754304)
  4. AISee - AI- and Simulation-Based Engineering at Exascale (951733)
  5. DEEP-SEA - DEEP – SOFTWARE FOR EXASCALE ARCHITECTURES (955606)
  6. EUROCC - National Competence Centres in the framework of EuroHPC (951732)

Appears in the scientific report 2021
Database coverage:
OpenAccess


 Record created 2021-07-06, last modified 2021-10-23
