About the course
Building on your foundational knowledge of Prometheus and Grafana, this 2-day advanced workshop covers the capabilities needed to implement robust monitoring, alerting, and visualisation for production environments at scale. It is designed for participants who are already familiar with the concepts covered in the introductory workshop or who have equivalent hands-on experience with Prometheus and Grafana. The workshop focuses on advanced techniques and the operational considerations that matter in real-world deployments.
The workshop begins with a review of core PromQL concepts before exploring advanced querying techniques and optimising query performance with Recording Rules. You will learn how to automate target discovery using various Service Discovery methods, which is crucial for monitoring dynamic infrastructure. A significant focus is placed on comprehensive alerting, covering the definition of complex alert rules in Prometheus and the detailed configuration and management of notifications using Alertmanager, including routing, grouping, and silencing.
Participants will also enhance their Grafana skills by mastering advanced dashboarding techniques, including building dynamic dashboards using variables and templates, applying transformations, and using data linking for deeper analysis. The workshop concludes by covering essential operational aspects like sizing, data retention, backup, and an overview of high availability/scaling strategies. Key security considerations for the monitoring stack and an introduction to integrating with other observability tools (logs, traces) are also included, providing participants with the knowledge to build and maintain production-ready Prometheus and Grafana deployments.
Instructor-led online and in-house face-to-face options are available, either as part of a wider customised training programme or as a standalone workshop, on-site at your offices or at one of many flexible meeting spaces in the UK and around the world.
Learning outcomes
- Apply advanced PromQL techniques to perform complex data analysis and troubleshooting.
- Define and use Recording Rules to optimise query performance and simplify complex expressions.
- Implement Service Discovery to automatically manage monitoring targets in dynamic environments.
- Define and manage Prometheus Alerting Rules effectively for different scenarios.
- Configure and use Alertmanager for advanced alert routing, grouping, and notification management.
- Build advanced Grafana dashboards using variables, templates, transformations, and linking for enhanced interactivity and reusability.
- Utilise advanced Grafana features like the Explore view and dashboard import/export.
- Understand key operational aspects for managing Prometheus and Grafana, including sizing, retention, and backup.
- Understand basic security considerations for a production Prometheus and Grafana monitoring stack.
- Understand how Prometheus and Grafana fit into a wider observability strategy with logs and traces (overview).
Who should attend
This advanced 2-day workshop is designed for IT professionals, system administrators, DevOps engineers, Site Reliability Engineers (SREs), and architects who are already familiar with the fundamentals of Prometheus and Grafana (equivalent to the introductory workshop) and need to deepen their skills for production deployments, automation, alerting, and operational management. It is ideal for:
- Professionals who have completed the Introduction to Prometheus & Grafana workshop.
- Users who are currently working with Prometheus and Grafana but need to learn advanced querying, alerting, and configuration techniques.
- Teams looking to implement automated service discovery and robust alerting strategies for dynamic environments.
- Those responsible for the operational management, scaling, and security of Prometheus and Grafana in production.
Prerequisites
Participants must have:
- Prior completion of the Introduction to Prometheus & Grafana (2 Day Workshop) or equivalent hands-on experience. Equivalent experience means being comfortable with basic Prometheus installation and configuration, scraping targets, fundamental PromQL queries, basic Grafana installation, and building simple dashboards.
- Solid familiarity with Linux command-line environments.
Knowledge of Docker is recommended for the lab exercises.
Private and custom delivery
This advanced Prometheus & Grafana course is available for private or custom delivery for your team, either as an in-house face-to-face workshop at your location of choice or as online instructor-led training via MS Teams (or your own preferred platform).
Get in touch to find out how we can deliver tailored training that focuses on your project requirements and learning goals.
Course outline
Advanced PromQL & Recording Rules
- Review of PromQL Fundamentals: Quick recap of basic queries, labels, and aggregation.
- More Advanced PromQL Patterns: Working with rate, irate, delta, increase, histograms, and joining time series.
- Understanding Query Performance: Writing efficient PromQL queries for scale.
- Recording Rules: Understanding the purpose of pre-calculating frequently used expressions for performance and simplicity.
- Defining and Using Recording Rules: Configuring recording rules in Prometheus and querying the resulting new time series.
- Hands-On Lab: Writing more complex PromQL queries, creating and verifying recording rules.
Service Discovery
- The Challenge of Dynamic Environments: Why manual configuration doesn't scale.
- Overview of Service Discovery Methods: Introduction to various mechanisms Prometheus uses to automatically find monitoring targets.
- Configuring Common Service Discovery Methods: Implementing file-based discovery, and an overview or lab on cloud/orchestration-specific discovery (e.g., Kubernetes, EC2) if applicable.
- Relabelling: Using relabelling rules in scrape configurations to transform or filter discovered targets and their labels.
- Hands-On Lab: Implementing file-based service discovery. Optionally, configuring discovery for a dynamic environment based on the audience's likely use case.
Alerting with Prometheus & Alertmanager
- Review of Basic Alerting Rules: Quick recap of defining alert conditions in Prometheus.
- Alert States and Lifecycle: Understanding how an alert moves between the inactive, pending, and firing states.
- Introduction to Alertmanager: Overview of its role in managing alerts.
- Setting up and Configuring Alertmanager: Installation and detailed configuration of alertmanager.yml.
- Alert Routing: Defining rules to send alerts to different teams or channels.
- Alert Grouping, Inhibition, and Silences: Strategies for managing alert noise.
- Templating Alert Notifications: Customising the format and content of messages sent by Alertmanager.
- Hands-On Lab: Defining advanced alerting rules, setting up and configuring Alertmanager with multiple receivers, testing grouping and inhibition rules.
Advanced Grafana Dashboards
- Review of Basic Dashboard Building: Quick recap of creating dashboards and adding panels.
- Using Variables and Templating: Creating dynamic and reusable dashboards with template variables (e.g., for selecting jobs, instances, or environments).
- Advanced Panel Types & Configuration: Exploring visualisations like Heatmaps, Worldmaps, and using features like thresholds and repeated panels.
- Transformations: Applying data transformations within panels (e.g., sorting, filtering, calculations across series).
- Annotations: Adding markers to graphs for events (e.g., deployments).
- Data Links and Panel Links: Configuring links for drill-down and cross-referencing.
- Importing and Exporting Dashboards: Sharing and managing dashboards as JSON.
- Hands-On Lab: Creating a dynamic dashboard using template variables, configuring advanced panels, adding transformations and annotations, exporting a dashboard.
Operational Aspects, Security, and Beyond
- Prometheus Sizing and Capacity Planning Basics: Estimating resource needs.
- Data Retention Policies: Configuring how long metrics are stored.
- Basic Troubleshooting: Identifying common issues with Prometheus and Grafana.
- High Availability & Scaling Concepts: Overview of strategies for resilience and handling large loads (e.g., HA Prometheus, Thanos/Mimir overview).
- Integrating with other Observability Pillars: Overview of using Grafana with other data sources like Loki (logs) and Tempo (traces) for a unified view.
- Backup and Restoration: Basic strategies for backing up Prometheus data.
- Prometheus and Grafana Security Best Practices: Basic steps for securing your monitoring stack (authentication, TLS).
- Hands-On Lab: Configuring data retention, basic troubleshooting exercise, performing a simple backup/restore simulation.
Further resources
Prometheus Official Documentation: The comprehensive source for information on installing, configuring, and using Prometheus, including PromQL. https://prometheus.io/docs/
Grafana Official Documentation: The main resource for learning how to install, configure, and use Grafana to build dashboards and visualisations. https://grafana.com/docs/
Prometheus Community Forum: Get help, ask questions, and connect with other Prometheus users and contributors. https://community.prometheus.io/
Grafana Community Forum: Find answers, share knowledge, and interact with the wider Grafana user and development community. https://community.grafana.com/