About the course
This two-day, hands-on workshop focuses on leveraging Snowflake's native capabilities for end-to-end AI and Machine Learning workflows. Designed for data professionals, the course moves beyond basic SQL to immerse you in Snowpark, Snowflake's powerful development framework that allows you to write and execute Python, Scala, and Java code directly within the data cloud.
You will learn to manage large-scale data preparation using Snowpark DataFrames, train models using familiar Python libraries like scikit-learn, and deploy them as secure, scalable User-Defined Functions (UDFs) and Stored Procedures for real-time inference.
You will receive practical guidance to eliminate data movement and maximize performance for your ML initiatives.
Instructor-led online and in-house face-to-face options are available - as part of a wider customised training programme, or as a standalone workshop, on-site at your offices or at one of many flexible meeting spaces in the UK and around the world.
-
- Master Snowpark: Write and execute Python code natively within Snowflake using the Snowpark API to leverage its distributed computing power.
- Streamline Data Prep: Use Snowpark DataFrames to clean data, engineer features, and transform datasets at scale without exporting them from Snowflake.
- Build In-Database Logic: Develop scalable User-Defined Functions (UDFs) and Stored Procedures using Python to contain all ML logic within Snowflake.
- Train and Evaluate Models: Utilize external ML libraries (via Snowpark's package management) to train models directly on Snowflake data.
- Deploy for Inference: Deploy trained models as secure, highly performant UDFs for real-time batch and row-level scoring.
- Manage Dependencies: Handle third-party Python libraries and package management efficiently within the Snowpark environment.
-
This course is ideal for Data Scientists, ML Engineers, Data Analysts, and Database Developers who have hands-on experience with Python and SQL and want to leverage the speed and scalability of Snowflake for their predictive modeling projects.
-
To gain maximum benefit from this workshop, delegates should have attended our Python Programming and SQL training courses, or have equivalent experience:
- Proficiency in SQL.
- Working knowledge of Python (including pandas/scikit-learn concepts).
- Basic familiarity with the Snowflake environment (console, tables, warehouses).
We can customise this training to match your team's experience and needs - with more time and coverage of fundamentals for newer data analysts, for instance.
-
This Snowflake course is available for private / custom delivery for your team - as an in-house face-to-face workshop at your location of choice, or as online instructor-led training via MS Teams (or your own preferred platform).
Get in touch to find out how we can deliver tailored training which focuses on your project requirements and learning goals.
-
Introduction to Snowflake for ML
Review of Snowflake's architecture: Compute (Warehouses) and Storage.
The challenge of data movement in traditional ML pipelines.
Introducing Snowpark: The developer framework for in-database programming.
Setting up the Snowpark environment and Python client.
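As a taster for this module, here is a minimal sketch of creating a Snowpark session from Python; all connection values are placeholders to be replaced with your own account details:

```python
from snowflake.snowpark import Session

# Placeholder connection details - replace with your own account values.
connection_parameters = {
    "account": "<account_identifier>",
    "user": "<username>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}

# All subsequent DataFrame operations execute in this session's warehouse.
session = Session.builder.configs(connection_parameters).create()
print(session.sql("SELECT CURRENT_VERSION()").collect())
```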
Snowpark DataFrames
Moving beyond SQL: Using Snowpark DataFrames (the Pandas analogue).
Lazy evaluation and query optimization in Snowpark.
Loading data from Snowflake tables into DataFrames.
Hands-on Lab: Data exploration and basic statistical analysis using Snowpark.
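To illustrate the lazy-evaluation model covered in this module, a small sketch; the CUSTOMERS table and its columns are hypothetical examples:

```python
from snowflake.snowpark.functions import col, avg

# Referencing a table is lazy: no data is moved yet.
df = session.table("CUSTOMERS")

summary = (
    df.filter(col("LIFETIME_VALUE") > 0)
      .group_by("COUNTRY")
      .agg(avg(col("LIFETIME_VALUE")).alias("AVG_LTV"))
)

# show() compiles the whole pipeline into one SQL query and runs it in Snowflake.
summary.show()
df.describe().show()  # quick in-database summary statistics
```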
Advanced Data Transformation
Data cleaning and feature engineering techniques using Snowpark functions.
Handling categorical data and implementing one-hot encoding.
Implementing complex window functions and joins entirely within Snowpark.
Hands-on Lab: Preparing a feature set for an ML model using distributed Snowpark operations.
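A sketch of the style of distributed transformation taught here; the ORDERS table and its columns are illustrative:

```python
from snowflake.snowpark import Window
from snowflake.snowpark.functions import col, iff, sum as sum_

orders = session.table("ORDERS")

# Window function: running spend per customer, ordered by date.
w = Window.partition_by("CUSTOMER_ID").order_by("ORDER_DATE")

features = (
    orders.with_column("RUNNING_SPEND", sum_(col("AMOUNT")).over(w))
          # A simple one-hot style indicator column.
          .with_column("IS_WEEKEND", iff(col("DAY_OF_WEEK") >= 6, 1, 0))
)

# Persist the feature set without the data ever leaving Snowflake.
features.write.save_as_table("ORDER_FEATURES", mode="overwrite")
```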
User-Defined Functions (UDFs)
What UDFs are, and their role in embedding custom Python logic.
Creating Scalar UDFs for row-level transformations.
Creating Vectorized UDFs (using Pandas for batch processing) for performance.
Hands-on Lab: Deploying a simple Python UDF to perform a custom feature calculation.
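For example, a scalar UDF might look like the following sketch (names are illustrative; the vectorized equivalent uses snowflake.snowpark.functions.pandas_udf):

```python
from snowflake.snowpark.functions import udf, col
from snowflake.snowpark.types import FloatType

# Scalar UDF: executed row by row inside Snowflake's Python runtime.
@udf(name="log_amount", replace=True, session=session,
     input_types=[FloatType()], return_type=FloatType())
def log_amount(x: float) -> float:
    import math
    return math.log1p(x) if x is not None and x >= 0 else None

# Apply the UDF as part of an ordinary DataFrame expression.
df = session.table("ORDER_FEATURES").with_column("LOG_AMOUNT", log_amount(col("AMOUNT")))
df.show()
```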
Training Models with Snowpark
Managing third-party library dependencies (e.g., scikit-learn, joblib).
Integrating models: Loading the prepared Snowpark DataFrames into scikit-learn.
Training a classification or regression model (e.g., Logistic Regression) directly on Snowflake data.
Hands-on Lab: Training, evaluating, and persisting a model object in internal Snowflake stages.
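The training flow looks roughly like this sketch, where the feature table, label column, and the ml_models stage are assumptions carried over from the earlier examples:

```python
import joblib
from sklearn.linear_model import LogisticRegression

# Pull the prepared features to the client for training.
pdf = session.table("ORDER_FEATURES").to_pandas()
X = pdf[["RUNNING_SPEND", "IS_WEEKEND"]]
y = pdf["CHURNED"]  # hypothetical label column

model = LogisticRegression().fit(X, y)
joblib.dump(model, "model.joblib")

# Persist the model object to an internal Snowflake stage.
session.sql("CREATE STAGE IF NOT EXISTS ml_models").collect()
session.file.put("model.joblib", "@ml_models", auto_compress=False, overwrite=True)
```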
Model Deployment for Inference
The gold standard: Deploying the trained model as a Python UDF or Stored Procedure for scoring.
Writing a prediction UDF that loads the persisted model object from a stage.
Real-time and batch inference strategies.
Hands-on Lab: Creating a scoring UDF and running it across a large Snowflake table for batch predictions.
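A scoring UDF along these lines (stage, model, and column names are the illustrative ones used above) loads the persisted model and scores whole tables in a single query:

```python
from snowflake.snowpark.functions import udf, col
from snowflake.snowpark.types import FloatType

# Ship the staged model file and required libraries with the UDF.
session.add_import("@ml_models/model.joblib")
session.add_packages("scikit-learn", "joblib")

@udf(name="predict_churn", replace=True, session=session,
     input_types=[FloatType(), FloatType()], return_type=FloatType())
def predict_churn(running_spend: float, is_weekend: float) -> float:
    import os, sys, joblib
    # Snowflake exposes the UDF's import directory at run time.
    import_dir = sys._xoptions["snowflake_import_directory"]
    model = joblib.load(os.path.join(import_dir, "model.joblib"))  # cache in production
    return float(model.predict([[running_spend, is_weekend]])[0])

# Batch inference: score an entire table with one query.
scored = session.table("ORDER_FEATURES").with_column(
    "CHURN_PRED", predict_churn(col("RUNNING_SPEND"), col("IS_WEEKEND"))
)
scored.show()
```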
Stored Procedures and ML Pipeline Orchestration
Using Stored Procedures to orchestrate complex, multi-step ML pipelines (data prep, training, deployment).
Scheduling stored procedures for automated re-training.
Error handling and logging best practices within Snowpark.
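An orchestration sketch, with the pipeline body elided and the task and warehouse names as placeholders:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import sproc

# Stored procedures need the Snowpark package available server-side.
session.add_packages("snowflake-snowpark-python")

@sproc(name="retrain_pipeline", replace=True, session=session)
def retrain_pipeline(sp_session: Session) -> str:
    # Feature refresh, retraining, and model overwrite steps are elided;
    # each would reuse the patterns from the earlier sketches.
    sp_session.sql("CREATE STAGE IF NOT EXISTS ml_models").collect()
    return "retrain complete"

# A Snowflake Task can then run the procedure on a schedule (my_wh is a placeholder).
session.sql("""
    CREATE OR REPLACE TASK retrain_weekly
      WAREHOUSE = my_wh
      SCHEDULE = 'USING CRON 0 6 * * 1 UTC'
      AS CALL retrain_pipeline()
""").collect()
session.sql("ALTER TASK retrain_weekly RESUME").collect()  # tasks start suspended
```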
Governance and Best Practices
Introduction to Streamlit in Snowflake for building simple ML application frontends.
Cost management and optimizing warehouse size for ML workloads.
Data Governance: Securing UDFs and Stored Procedures with permissions and roles.
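Access control itself is plain Snowflake SQL; an illustrative sketch using the hypothetical objects from the earlier modules:

```python
# Role names and object signatures here are illustrative.
session.sql(
    "GRANT USAGE ON FUNCTION predict_churn(FLOAT, FLOAT) TO ROLE analyst_role"
).collect()
session.sql(
    "GRANT USAGE ON PROCEDURE retrain_pipeline() TO ROLE ml_engineer_role"
).collect()
```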
-
Core Snowpark and Python Resources
Snowpark for Python Official Documentation: The primary guide for developers to understand the API, features, and architecture of running Python code natively in Snowflake. https://docs.snowflake.com/en/developer-guide/snowpark/python/index
Snowpark Python API Reference: The detailed reference for Snowpark DataFrame methods and functions, essential for the data preparation labs. https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/index
Creating User-Defined Functions (UDFs) in Python: Specific documentation on how to write, package, and deploy Python UDFs and Stored Procedures for model inference. https://docs.snowflake.com/en/developer-guide/udf/python/udf-python
Managing Python Packages in Snowpark: A guide to how Snowpark leverages the Anaconda repository to manage third-party libraries like scikit-learn and pandas within the environment. https://docs.snowflake.com/en/developer-guide/snowpark/python/packaging-dependencies
Application and Ecosystem Tools
Streamlit in Snowflake Documentation: Instructions for using Streamlit, which is key for quickly building data application frontends directly on the Snowflake platform (Module 8). https://docs.snowflake.com/en/developer-guide/streamlit/about-streamlit
Snowflake Data Science and ML Blog: A valuable resource featuring real-world case studies, best practices, and new feature announcements for the ML ecosystem. https://www.snowflake.com/blog/category/data-science-machine-learning/
Development Environment
VS Code (Visual Studio Code): The recommended IDE for writing and testing Snowpark Python scripts before deploying UDFs. https://code.visualstudio.com/
Snowflake VS Code Extension: The official extension that allows users to write, run, and manage SQL and Snowpark code directly from their IDE. https://docs.snowflake.com/en/user-guide/tools-ides/vscode-ext