Journal Article FZJ-2025-00807

Mantik: A Workflow Platform for the Development of Artificial Intelligence on High-Performance Computing Infrastructures


2024

The Journal of Open Source Software 9(98), 6136 [10.21105/joss.06136]

Please use a persistent id in citations: doi:10.21105/joss.06136

Abstract: The use of machine learning (ML) approaches is exponentially increasing, and for many scientific applications, high-performance computing (HPC) infrastructure is used to train large models. However, the tooling for easy deployment of models for training or inference on HPC infrastructures is not satisfactory; e.g., reproducibility, collaboration, and monitoring of ML models are not addressed in existing toolsets. With Mantik, we provide an open-source cloud platform that simplifies the development of and collaboration on ML models on HPC facilities, and enhances reproducibility by supporting data and code versioning as well as experiment tracking. Users are able to develop their applications in the environment they are most comfortable with: their local machine. Using their preferred IDE and the most recent software versions allows them to leverage the full potential of the software stack for their research. Mantik's remote file service allows for simple management and tracking of data in remote storage. As soon as an application is ready for training or inference, users can immediately submit it to an HPC cluster. During application development, users can train and/or evaluate their models on HPC clusters via the CLI on their local machine or via our browser-based Mantik cloud platform. The latter requires only an internet browser, so that, e.g., ML training from a phone becomes feasible. Once training or inference has begun, a user is able to monitor the application in real time on the Mantik cloud platform.

Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)
  2. Earth System Data Exploration (ESDE)
  3. MAELSTROM - MAchinE Learning for Scalable meTeoROlogy and cliMate (955513)

Appears in the scientific report 2024
Database coverage:
Medline ; Creative Commons Attribution CC BY 4.0 ; DOAJ ; OpenAccess ; DOAJ Seal

The record appears in these collections:
Document types > Articles > Journal Article
Workflow collections > Public records
Institute Collections > JSC
Publications database
Open Access

 Record created 2025-01-20, last modified 2025-02-03

