About the course
This two-day, hands-on workshop focuses on leveraging Snowflake's native capabilities for end-to-end AI and Machine Learning workflows. Designed for data professionals, the course moves beyond basic SQL to immerse you in Snowpark, Snowflake's powerful development framework that allows you to write and execute Python, Scala, and Java code directly within the data cloud.
You will learn to manage large-scale data preparation using Snowpark DataFrames, train models using familiar Python libraries like scikit-learn, and deploy them as secure, scalable User-Defined Functions (UDFs) and Stored Procedures for real-time inference.
You will receive practical guidance to eliminate data movement and maximize performance for your ML initiatives.
Instructor-led online and in-house face-to-face options are available - as part of a wider customised training programme, or as a standalone workshop, on-site at your offices or at one of many flexible meeting spaces in the UK and around the world.
-
- Master Snowpark: Write and execute Python code natively within Snowflake using the Snowpark API to leverage its distributed computing power.
- Streamline Data Prep: Use Snowpark DataFrames to clean data, engineer features, and transform datasets at scale without exporting them from Snowflake.
- Build In-Database Logic: Develop scalable User-Defined Functions (UDFs) and Stored Procedures using Python to contain all ML logic within Snowflake.
- Train and Evaluate Models: Utilize external ML libraries (via Snowpark's package management) to train models directly on Snowflake data.
- Deploy for Inference: Deploy trained models as secure, highly performant UDFs for real-time batch and row-level scoring.
- Manage Dependencies: Handle third-party Python libraries and package management efficiently within the Snowpark environment.
-
This course is ideal for Data Scientists, ML Engineers, Data Analysts, and Database Developers who have hands-on experience with Python and SQL and want to leverage the speed and scalability of Snowflake for their predictive modeling projects.
-
To gain maximum benefit from this workshop, delegates should have attended our Python Programming and SQL training courses, or have equivalent experience:
- Proficiency in SQL.
- Working knowledge of Python (including pandas/scikit-learn concepts).
- Basic familiarity with the Snowflake environment (console, tables, warehouses).
We can customise this training to match your team's experience and needs - with more time and coverage of fundamentals for newer data analysts, for instance.
-
This Snowflake course is available for private / custom delivery for your team - as an in-house face-to-face workshop at your location of choice, or as online instructor-led training via MS Teams (or your own preferred platform).
Get in touch to find out how we can deliver tailored training which focuses on your project requirements and learning goals.
-
Introduction to Snowflake for ML
Review of Snowflake's architecture: Compute (Warehouses) and Storage.
The challenge of data movement in traditional ML pipelines.
Introducing Snowpark: The developer framework for in-database programming.
Setting up the Snowpark environment and Python client.
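As a taster for this module, here is a minimal sketch of creating a Snowpark session from Python; all connection values are placeholders to be replaced with your own account details:

```python
from snowflake.snowpark import Session

# Placeholder connection details - replace with your own account values.
connection_parameters = {
    "account": "<account_identifier>",
    "user": "<username>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}

# All subsequent DataFrame operations execute in this session's warehouse.
session = Session.builder.configs(connection_parameters).create()
print(session.sql("SELECT CURRENT_VERSION()").collect())
```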
Snowpark DataFrames
Moving beyond SQL: Using Snowpark DataFrames (the Pandas analogue).
Lazy evaluation and query optimization in Snowpark.
Loading data from Snowflake tables into DataFrames.
Hands-on Lab: Data exploration and basic statistical analysis using Snowpark.
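To illustrate the lazy-evaluation model covered in this module, a small sketch; the CUSTOMERS table and its columns are hypothetical examples:

```python
from snowflake.snowpark.functions import col, avg

# Referencing a table is lazy: no data is moved yet.
df = session.table("CUSTOMERS")

summary = (
    df.filter(col("LIFETIME_VALUE") > 0)
      .group_by("COUNTRY")
      .agg(avg(col("LIFETIME_VALUE")).alias("AVG_LTV"))
)

# show() compiles the whole pipeline into one SQL query and runs it in Snowflake.
summary.show()
df.describe().show()  # quick in-database summary statistics
```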
Advanced Data Transformation
Data cleaning and feature engineering techniques using Snowpark functions.
Handling categorical data and implementing one-hot encoding.
Implementing complex window functions and joins entirely within Snowpark.
Hands-on Lab: Preparing a feature set for an ML model using distributed Snowpark operations.
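A sketch of the style of distributed transformation taught here; the ORDERS table and its columns are illustrative:

```python
from snowflake.snowpark import Window
from snowflake.snowpark.functions import col, iff, sum as sum_

orders = session.table("ORDERS")

# Window function: running spend per customer, ordered by date.
w = Window.partition_by("CUSTOMER_ID").order_by("ORDER_DATE")

features = (
    orders.with_column("RUNNING_SPEND", sum_(col("AMOUNT")).over(w))
          # A simple one-hot style indicator column.
          .with_column("IS_WEEKEND", iff(col("DAY_OF_WEEK") >= 6, 1, 0))
)

# Persist the feature set without the data ever leaving Snowflake.
features.write.save_as_table("ORDER_FEATURES", mode="overwrite")
```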
User-Defined Functions (UDFs)
What UDFs are, and their role in embedding custom Python logic.
Creating Scalar UDFs for row-level transformations.
Creating Vectorized UDFs (using Pandas for batch processing) for performance.
Hands-on Lab: Deploying a simple Python UDF to perform a custom feature calculation.
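For example, a scalar UDF might look like the following sketch (names are illustrative; the vectorized equivalent uses snowflake.snowpark.functions.pandas_udf):

```python
from snowflake.snowpark.functions import udf, col
from snowflake.snowpark.types import FloatType

# Scalar UDF: executed row by row inside Snowflake's Python runtime.
@udf(name="log_amount", replace=True, session=session,
     input_types=[FloatType()], return_type=FloatType())
def log_amount(x: float) -> float:
    import math
    return math.log1p(x) if x is not None and x >= 0 else None

# Apply the UDF as part of an ordinary DataFrame expression.
df = session.table("ORDER_FEATURES").with_column("LOG_AMOUNT", log_amount(col("AMOUNT")))
df.show()
```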
Training Models with Snowpark
Managing third-party library dependencies (e.g., scikit-learn, joblib).
Integrating models: Loading the prepared Snowpark DataFrames into scikit-learn.
Training a classification or regression model (e.g., Logistic Regression) directly on Snowflake data.
Hands-on Lab: Training, evaluating, and persisting a model object in internal Snowflake stages.
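The training flow looks roughly like this sketch, where the feature table, label column, and the ml_models stage are assumptions carried over from the earlier examples:

```python
import joblib
from sklearn.linear_model import LogisticRegression

# Pull the prepared features to the client for training.
pdf = session.table("ORDER_FEATURES").to_pandas()
X = pdf[["RUNNING_SPEND", "IS_WEEKEND"]]
y = pdf["CHURNED"]  # hypothetical label column

model = LogisticRegression().fit(X, y)
joblib.dump(model, "model.joblib")

# Persist the model object to an internal Snowflake stage.
session.sql("CREATE STAGE IF NOT EXISTS ml_models").collect()
session.file.put("model.joblib", "@ml_models", auto_compress=False, overwrite=True)
```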
Model Deployment for Inference
The gold standard: Deploying the trained model as a Python UDF or Stored Procedure for scoring.
Writing a prediction UDF that loads the persisted model object from a stage.
Real-time and batch inference strategies.
Hands-on Lab: Creating a scoring UDF and running it across a large Snowflake table for batch predictions.
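A scoring UDF along these lines (stage, model, and column names are the illustrative ones used above) loads the persisted model and scores whole tables in a single query:

```python
from snowflake.snowpark.functions import udf, col
from snowflake.snowpark.types import FloatType

# Ship the staged model file and required libraries with the UDF.
session.add_import("@ml_models/model.joblib")
session.add_packages("scikit-learn", "joblib")

@udf(name="predict_churn", replace=True, session=session,
     input_types=[FloatType(), FloatType()], return_type=FloatType())
def predict_churn(running_spend: float, is_weekend: float) -> float:
    import os, sys, joblib
    # Snowflake exposes the UDF's import directory at run time.
    import_dir = sys._xoptions["snowflake_import_directory"]
    model = joblib.load(os.path.join(import_dir, "model.joblib"))  # cache in production
    return float(model.predict([[running_spend, is_weekend]])[0])

# Batch inference: score an entire table with one query.
scored = session.table("ORDER_FEATURES").with_column(
    "CHURN_PRED", predict_churn(col("RUNNING_SPEND"), col("IS_WEEKEND"))
)
scored.show()
```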
Stored Procedures and ML Pipeline Orchestration
Using Stored Procedures to orchestrate complex, multi-step ML pipelines (data prep, training, deployment).
Scheduling stored procedures for automated re-training.
Error handling and logging best practices within Snowpark.
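An orchestration sketch, with the pipeline body elided and the task and warehouse names as placeholders:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import sproc

# Stored procedures need the Snowpark package available server-side.
session.add_packages("snowflake-snowpark-python")

@sproc(name="retrain_pipeline", replace=True, session=session)
def retrain_pipeline(sp_session: Session) -> str:
    # Feature refresh, retraining, and model overwrite steps are elided;
    # each would reuse the patterns from the earlier sketches.
    sp_session.sql("CREATE STAGE IF NOT EXISTS ml_models").collect()
    return "retrain complete"

# A Snowflake Task can then run the procedure on a schedule (my_wh is a placeholder).
session.sql("""
    CREATE OR REPLACE TASK retrain_weekly
      WAREHOUSE = my_wh
      SCHEDULE = 'USING CRON 0 6 * * 1 UTC'
      AS CALL retrain_pipeline()
""").collect()
session.sql("ALTER TASK retrain_weekly RESUME").collect()  # tasks start suspended
```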
Governance and Best Practices
Introduction to Streamlit in Snowflake for building simple ML application frontends.
Cost management and optimizing warehouse size for ML workloads.
Data Governance: Securing UDFs and Stored Procedures with permissions and roles.
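Access control itself is plain Snowflake SQL; an illustrative sketch using the hypothetical objects from the earlier modules:

```python
# Role names and object signatures here are illustrative.
session.sql(
    "GRANT USAGE ON FUNCTION predict_churn(FLOAT, FLOAT) TO ROLE analyst_role"
).collect()
session.sql(
    "GRANT USAGE ON PROCEDURE retrain_pipeline() TO ROLE ml_engineer_role"
).collect()
```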
-
Core Snowpark and Python Resources
Snowpark for Python Official Documentation: The primary guide for developers to understand the API, features, and architecture of running Python code natively in Snowflake. https://docs.snowflake.com/en/developer-guide/snowpark/python/index
Snowpark Python API Reference: The detailed reference for Snowpark DataFrame methods and functions, essential for the data preparation labs. https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/index
Creating User-Defined Functions (UDFs) in Python: Specific documentation on how to write, package, and deploy Python UDFs and Stored Procedures for model inference. https://docs.snowflake.com/en/developer-guide/udf/python/udf-python
Managing Python Packages in Snowpark: A guide to how Snowpark leverages the Anaconda repository to manage third-party libraries like scikit-learn and pandas within the environment. https://docs.snowflake.com/en/developer-guide/snowpark/python/packaging-dependencies
Application and Ecosystem Tools
Streamlit in Snowflake Documentation: Instructions for using Streamlit, which is key for quickly building data application frontends directly on the Snowflake platform (Module 8). https://docs.snowflake.com/en/developer-guide/streamlit/about-streamlit
Snowflake Data Science and ML Blog: A valuable resource featuring real-world case studies, best practices, and new feature announcements for the ML ecosystem. https://www.snowflake.com/blog/category/data-science-machine-learning/
Development Environment
VS Code (Visual Studio Code): The recommended IDE for writing and testing Snowpark Python scripts before deploying UDFs. https://code.visualstudio.com/
Snowflake VS Code Extension: The official extension that allows users to write, run, and manage SQL and Snowpark code directly from their IDE. https://docs.snowflake.com/en/user-guide/tools-ides/vscode-ext