Poster (Other) FZJ-2025-03852

Modernizing Legacy Infrastructure Monitoring: Enhancing Performance with Prometheus and GitLab CI/CD



2025

5th conference for Research Software Engineering in Germany (deRSE25), Karlsruhe, Germany, 25 Feb 2025 - 1 Mar 2025

Abstract: Effective monitoring of (computing) infrastructure, especially in complex systems with various dependencies, is crucial for ensuring high availability and early detection of performance issues. This poster demonstrates the integration of Prometheus and GitLab CI/CD to modernize our existing infrastructure monitoring methods. As the number of infrastructure checks increases, our legacy monitoring system faces growing challenges such as performance bottlenecks, limited scalability, and maintenance difficulties. Prometheus, with its real-time monitoring and alerting capabilities, offers a scalable and flexible solution. It supports both horizontal and vertical scaling, efficient data storage, and a modular architecture that facilitates the seamless integration of various existing monitoring tools, such as specialized exporters. Using Prometheus as our backend involves setting up a containerized system, creating data sources and targets, and configuring (custom) metrics and alerts. The use of GitLab's CI/CD pipeline further automates the building, deployment, and testing processes. Additionally, Grafana, when used alongside Prometheus, provides a robust visualization tool to display statistics and reports, such as CPU and GPU usage or file quotas. This approach not only enhances efficiency and ensures timely alerts for potential issues but also keeps the monitoring system up to date and resilient. It also provides users with valuable statistics through a modern and flexible backend. Furthermore, containerizing the new monitoring system offers significant advantages, including portability, scalability, and modularization. The poster presents selected infrastructure systems, directly comparing the usability and performance of our legacy script-based monitoring system and the new Prometheus-based monitoring system.
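
Illustration (not part of the poster): the abstract mentions configuring custom metrics, for example for file quotas. As a minimal sketch of what such a metric can look like, the Python snippet below renders quota readings in the Prometheus text exposition format using only the standard library. The metric name `file_quota_used_ratio`, the `project` label, and the sample values are hypothetical and chosen for illustration only; the poster's actual exporters and metric names are not described in this record.

```python
# Hypothetical sketch: render file-quota readings in the Prometheus text
# exposition format, roughly what a ported legacy check script might emit.
from typing import Mapping


def escape_label(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_quota_metrics(usage_ratio: Mapping[str, float]) -> str:
    """Return one gauge sample per project, sorted by project name."""
    name = "file_quota_used_ratio"  # assumed metric name, not from the poster
    lines = [
        f"# HELP {name} Fraction of the file quota in use per project.",
        f"# TYPE {name} gauge",
    ]
    for project in sorted(usage_ratio):
        label = escape_label(project)
        lines.append(f'{name}{{project="{label}"}} {usage_ratio[project]}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample values are made up for demonstration.
    print(render_quota_metrics({"proj_a": 0.42, "proj_b": 0.97}), end="")
```

Text in this format could be served over HTTP as a scrape target or written to a file read by the node_exporter textfile collector; which of these, if either, the authors use is not stated here.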


Note: https://zenodo.org/records/14982650

Contributing Institute(s):
  1. Datenanalyse und Maschinenlernen (IAS-8)
Research Program(s):
  1. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)

Appears in the scientific report 2025

The record appears in these collections:
Document types > Presentations > Poster
Institute Collections > IAS > IAS-8
Workflow collections > Public records
Publications database

 Record created 2025-09-23, last modified 2025-09-24

