Advanced Prometheus & Grafana Training Course

Scale, Alert, and Operate: Advanced Prometheus & Grafana Skills.

About this Advanced Prometheus & Grafana course

Building on the fundamentals of Prometheus and Grafana, this 2-day advanced workshop covers the capabilities needed to implement robust monitoring, alerting, and visualisation for production environments at scale. It is designed for participants who are already familiar with the concepts covered in the introductory workshop, or who have equivalent hands-on experience with Prometheus and Grafana. The focus is on advanced techniques and the operational considerations that matter in real-world deployments.

The workshop begins with a review of core PromQL concepts before exploring advanced querying techniques and optimising query performance with Recording Rules. You will learn how to automate target discovery using various Service Discovery methods, which is crucial for monitoring dynamic infrastructure. A significant focus is placed on comprehensive alerting, covering the definition of complex alert rules in Prometheus and the detailed configuration and management of notifications using Alertmanager, including routing, grouping, and silencing.

Participants will also enhance their Grafana skills by mastering advanced dashboarding techniques, including building dynamic dashboards using variables and templates, applying transformations, and using data linking for deeper analysis. The workshop concludes by covering essential operational aspects like sizing, data retention, backup, and an overview of high availability/scaling strategies. Key security considerations for the monitoring stack and an introduction to integrating with other observability tools (logs, traces) are also included, providing participants with the knowledge to build and maintain production-ready Prometheus and Grafana deployments.

Instructor-led online and in-house face-to-face options are available, either as part of a wider customised training programme or as a standalone workshop, on-site at your offices or at one of many flexible meeting spaces in the UK and around the world.

What will you learn on this Advanced Prometheus & Grafana training course?

By the end of this course, attendees will be able to:
- Apply advanced PromQL techniques to perform complex data analysis and troubleshooting.
- Define and use Recording Rules to optimise query performance and simplify complex expressions.
- Implement Service Discovery to automatically manage monitoring targets in dynamic environments.
- Define and manage Prometheus Alerting Rules effectively for different scenarios.
- Configure and use Alertmanager for advanced alert routing, grouping, and notification management.
- Build advanced Grafana dashboards using variables, templates, transformations, and linking for enhanced interactivity and reusability.
- Utilise advanced Grafana features like the Explore view and dashboard import/export.
- Understand key operational aspects for managing Prometheus and Grafana, including sizing, retention, and backup.
- Understand basic security considerations for a production Prometheus and Grafana monitoring stack.
- Understand, at an overview level, how Prometheus and Grafana fit into a wider observability strategy alongside logs and traces.

Who should attend?

This advanced 2-day workshop is designed for IT professionals, system administrators, DevOps engineers, Site Reliability Engineers (SREs), and architects who already know the fundamentals of Prometheus and Grafana (to the level of the introductory workshop) and need to deepen their skills in production deployment, automation, alerting, and operational management. It is ideal for:
- Professionals who have completed the Introduction to Prometheus & Grafana workshop.
- Users who are currently working with Prometheus and Grafana but need to learn advanced querying, alerting, and configuration techniques.
- Teams looking to implement automated service discovery and robust alerting strategies for dynamic environments.
- Those responsible for the operational management, scaling, and security of Prometheus and Grafana in production.

What are the prerequisite skills?

Participants must have:
- Prior completion of the Introduction to Prometheus & Grafana (2 Day Workshop), or equivalent hands-on experience: being comfortable with basic Prometheus installation, configuration, scraping targets, and fundamental PromQL queries, as well as basic Grafana installation and building simple dashboards.
- Solid familiarity with Linux command-line environments.
Knowledge of Docker is recommended for laboratory exercises.

What are the training delivery options for this course?

This advanced Prometheus & Grafana course is available for private / custom delivery for your team - as an in-house face-to-face workshop at your location of choice, or as online instructor-led training via MS Teams (or your own preferred platform).
Get in touch to find out how we can deliver tailored training which focuses on your project requirements and learning goals.

What is covered in the Advanced Prometheus & Grafana course?

Advanced PromQL & Recording Rules
- Review of PromQL Fundamentals: Quick recap of basic queries, labels, and aggregation.
- More Advanced PromQL Patterns: Working with rate, irate, delta, increase, histograms, and joining time series.
- Understanding Query Performance: Writing efficient PromQL queries for scale.
- Recording Rules: Understanding the purpose of pre-calculating frequently used expressions for performance and simplicity.
- Defining and Using Recording Rules: Configuring recording rules in Prometheus and querying the resulting new time series.
- Hands-On Lab: Writing more complex PromQL queries, creating and verifying recording rules.
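
As a rough illustration of the counter functions listed in this module, the sketch below models how an increase and a per-second rate can be derived from raw counter samples while tolerating counter resets. It is a simplified teaching model, not Prometheus's implementation: the real `rate()` and `increase()` also extrapolate to the edges of the query window, which is omitted here. The function names and sample data are hypothetical.

```python
from typing import Sequence, Tuple

# One sample: (unix timestamp in seconds, counter value). Hypothetical data shape.
Sample = Tuple[float, float]


def simple_increase(samples: Sequence[Sample]) -> float:
    """Sum the growth of a counter across samples, tolerating resets."""
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # The value dropped, so assume the counter restarted from zero
            # and count everything accumulated since the restart.
            total += curr
    return total


def simple_rate(samples: Sequence[Sample]) -> float:
    """Average per-second increase between the first and last sample."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    return simple_increase(samples) / elapsed if elapsed > 0 else 0.0


# The counter resets between the second and third sample.
samples = [(0.0, 100.0), (15.0, 130.0), (30.0, 10.0), (45.0, 40.0)]
print(simple_increase(samples), simple_rate(samples))
```

The reset handling is the reason these functions are meant for counters only: applied to a gauge, any drop in value would be misread as a restart.
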
Service Discovery
- The Challenge of Dynamic Environments: Why manual configuration doesn't scale.
- Overview of Service Discovery Methods: Introduction to various mechanisms Prometheus uses to automatically find monitoring targets.
- Configuring Common Service Discovery Methods: Implementing file-based discovery, and an overview or lab on cloud/orchestration-specific discovery (e.g., Kubernetes, EC2) if applicable.
- Relabelling: Using relabelling rules in scrape configurations to transform or filter discovered targets and their labels.
- Hands-On Lab: Implementing file-based service discovery. Optionally, configuring discovery for a dynamic environment based on the audience's likely use case.
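
To make file-based discovery more concrete, the sketch below builds the JSON structure that Prometheus reads from a file-based service discovery file: a list of target groups, each with a `targets` list and an optional `labels` map. The inventory, the `service` and `env` label names, and the helper function are assumptions made for illustration only.

```python
import json
from typing import Dict, List


def build_file_sd(groups: Dict[str, List[str]], env: str) -> List[dict]:
    """Turn a {service: [host:port, ...]} inventory into file_sd target groups."""
    return [
        {"targets": sorted(hosts), "labels": {"service": service, "env": env}}
        for service, hosts in sorted(groups.items())
    ]


# Hypothetical inventory, e.g. exported from a CMDB or a deployment tool.
inventory = {
    "node": ["10.0.0.2:9100", "10.0.0.1:9100"],
    "api": ["10.0.1.5:8080"],
}

target_groups = build_file_sd(inventory, env="staging")
print(json.dumps(target_groups, indent=2))
```

In practice the output would be written to a file matched by the `files` pattern of a `file_sd_configs` entry in the scrape configuration. Prometheus picks up changes to such files without a restart, and relabelling rules can then filter or rewrite the attached labels.
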
Alerting with Prometheus & Alertmanager
- Review of Basic Alerting Rules: Quick recap of defining alert conditions in Prometheus.
- Understanding Alert States and Lifecycle: How alerts move between the inactive, pending, and firing states.
- Introduction to Alertmanager: Overview of its role in managing alerts.
- Setting up and Configuring Alertmanager: Installation and detailed configuration of alertmanager.yml.
- Alert Routing: Defining rules to send alerts to different teams or channels.
- Alert Grouping, Inhibition, and Silences: Strategies for managing alert noise.
- Templating Alert Notifications: Customising the format and content of messages sent by Alertmanager.
- Hands-On Lab: Defining advanced alerting rules, setting up and configuring Alertmanager with multiple receivers, testing grouping and inhibition rules.
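
The idea behind alert grouping can be sketched in a few lines: alerts that share the same values for the `group_by` labels end up in the same notification. This is a deliberately simplified model with hypothetical alert data. Alertmanager itself also walks a routing tree and applies timers such as `group_wait` and `group_interval`, none of which is modelled here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_alerts(alerts: List[dict], group_by: List[str]) -> Dict[Tuple[str, ...], List[str]]:
    """Bucket alert names by the values of the chosen grouping labels."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for alert in alerts:
        labels = alert["labels"]
        # A missing label is treated as an empty value.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels["alertname"])
    return dict(groups)


# Hypothetical firing alerts.
alerts = [
    {"labels": {"alertname": "HighLatency", "cluster": "eu-1", "severity": "page"}},
    {"labels": {"alertname": "HighErrorRate", "cluster": "eu-1", "severity": "page"}},
    {"labels": {"alertname": "DiskFull", "cluster": "us-1", "severity": "ticket"}},
]

grouped = group_alerts(alerts, ["cluster"])
print(grouped)
```

Grouping by `cluster` alone yields two notifications for three alerts. Adding `alertname` to the grouping labels would produce three, which shows how the choice of labels trades notification noise against detail.
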
Advanced Grafana Dashboards
- Review of Basic Dashboard Building: Quick recap of creating dashboards and adding panels.
- Using Variables and Templating: Creating dynamic and reusable dashboards with template variables (e.g., for selecting jobs, instances, or environments).
- Advanced Panel Types & Configuration: Exploring visualisations like Heatmaps and Geomaps (the successor to the Worldmap panel), and using features like thresholds and repeating panels.
- Transformations: Applying data transformations within panels (e.g., sorting, filtering, calculations across series).
- Annotations: Adding markers to graphs for events (e.g., deployments).
- Data Links and Panel Links: Configuring links for drill-down and cross-referencing.
- Importing and Exporting Dashboards: Sharing and managing dashboards as JSON.
- Hands-On Lab: Creating a dynamic dashboard using template variables, configuring advanced panels, adding transformations and annotations, exporting a dashboard.
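
Because dashboards are shared and managed as JSON, a template variable and a panel that uses it can be sketched as a plain data structure. The fragment below is a minimal, incomplete illustration rather than a full dashboard schema. The dashboard title, the metric `http_requests_total`, and the variable name `job` are assumptions, and a real export contains many more fields.

```python
import json

# Minimal sketch of a dashboard model with one template variable and one panel.
dashboard = {
    "title": "Service overview (example)",
    "templating": {
        "list": [
            {
                "name": "job",
                "type": "query",
                # Populates the drop-down from the values of the "job" label.
                "query": "label_values(up, job)",
                "multi": True,
                "includeAll": True,
            }
        ]
    },
    "panels": [
        {
            "type": "timeseries",
            "title": "Request rate by instance",
            "targets": [
                {
                    # $job is replaced with the selection at query time; the
                    # regex matcher (=~) lets it work with several values.
                    "expr": 'sum by (instance) (rate(http_requests_total{job=~"$job"}[5m]))'
                }
            ],
        }
    ],
}

exported = json.dumps(dashboard, indent=2)
print(exported)
```

Keeping dashboards in this form makes it possible to review them in version control and re-import them into another Grafana instance.
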
Operational Aspects, Security, and Beyond
- Prometheus Sizing and Capacity Planning Basics: Estimating resource needs.
- Data Retention Policies: Configuring how long metrics are stored.
- Basic Troubleshooting: Identifying common issues with Prometheus and Grafana.
- High Availability & Scaling Concepts: Overview of strategies for resilience and handling large loads (e.g., HA Prometheus, Thanos/Mimir overview).
- Integrating with other Observability Pillars: Overview of using Grafana with other data sources like Loki (logs) and Tempo (traces) for a unified view.
- Backup and Restoration: Basic strategies for backing up Prometheus data.
- Prometheus and Grafana Security Best Practices: Basic steps for securing your monitoring stack (authentication, TLS).
- Hands-On Lab: Configuring data retention, basic troubleshooting exercise, performing a simple backup/restore simulation.
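
For capacity planning, the Prometheus documentation gives a rule of thumb: disk space is roughly retention time in seconds multiplied by ingested samples per second and by bytes per sample, with around 1 to 2 bytes per sample being typical. The sketch below applies that formula. The series count and scrape interval are made-up inputs, and the result is only a starting estimate, not a guarantee.

```python
def estimate_disk_bytes(
    retention_days: float,
    active_series: int,
    scrape_interval_s: float,
    bytes_per_sample: float = 2.0,
) -> float:
    """Rough disk estimate: retention seconds * samples per second * bytes per sample."""
    # Each active series produces one sample per scrape interval.
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical workload: 100,000 active series scraped every 15 seconds,
# kept for 15 days (the default retention period).
estimate = estimate_disk_bytes(retention_days=15, active_series=100_000, scrape_interval_s=15)
print(f"{estimate / 1e9:.1f} GB")
```

The formula shows the two levers available when disk usage grows: reduce the number of series (fewer targets or fewer labels) or lengthen the scrape interval.
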
Useful resources for Advanced Prometheus & Grafana

- Prometheus Official Documentation: The comprehensive source for information on installing, configuring, and using Prometheus, including PromQL. https://prometheus.io/docs/
- Grafana Official Documentation: The main resource for learning how to install, configure, and use Grafana to build dashboards and visualisations. https://grafana.com/docs/
- Prometheus Community Forum: Get help, ask questions, and connect with other Prometheus users and contributors. https://community.prometheus.io/
- Grafana Community Forum: Find answers, share knowledge, and interact with the wider Grafana user and development community. https://community.grafana.com/

Customise this Advanced Prometheus & Grafana training for your team

We can tailor this course to your team's specific needs, tools, and goals. Whether you need to adjust the syllabus, focus on particular topics, or align with your internal processes, we'll work with you to create the right programme.

Delivered at your offices or online
Tailored content and exercises
Flexible length and scheduling
Hands-on labs using your own environment

What people are saying...

Pacing was good with clear explanations. The instructor was friendly and very approachable.

SB, Software Engineer (Central Government Department)

Modern Spring Framework Development

Really appreciated that there was a topic I really wanted to learn and Simon covered this and added time in to do it.

HHF, Software Engineer

Advanced C# Programming Training Course

We broadly covered everything we needed to know while also being able to dive a little deeper into some specifics for our organisation.

AB, Software Engineer (Qvest)

Elastic Stack for Systems Monitoring Training Course

Kevin is a fab instructor, his delivery is great, his explanations are great, just fab

Anon, Software Developer (UK Broadcasting Organisation)

Python Training: a practical introduction to programming

Excellent material that will be instantly useful and provide benefit

MB, Software Engineer (UK GIS Mapping Organisation)

Advanced Python Training Course

Public Courses Dates and Rates

Standard duration: 2 days

Please get in touch for pricing and availability.

Course enquiry

Send us a no-obligation enquiry about this course

Related courses

Ansible Training Course

Location Custom On-site / On-Line Options

Duration 2 days

AWS CloudWatch Training Course: Essential Monitoring and Observability

Location Custom On-site / On-Line Options

Duration 2 days

AIOps Training Course

Location Custom On-site / On-Line Options

Duration 2 days
