About the course
Our instructor-led Web Scraping with Python training course will give you the skills to create automated scripts that pull in data from across the web, based on your criteria, so you can build valuable reports from relevant sources.
Some of the key use cases for Web Scraping include:
Competition monitoring: extracting details of products and services, like price, images and other content, observing changes over time.
Policy tracking: mining circulars from trade societies and other organisations, filtering specific keywords of interest.
Data gathering from multiple sources: collecting, aggregating and analysing data on a set of products or services (e.g. real estate) from multiple websites, to gain richer insights into the specific items.
Online reputation tracking: mining opinions about products or brands, from online reviews or blog posts.
Data collection for training Machine Learning systems.
You will benefit from extensive hands-on labs, delivered by an expert Data Science practitioner who will give you enough knowledge of Python to kick-start your project.
We're happy to offer this instructor-led web scraping training online, in person at our London training centre, or at your location of choice. Please get in touch to find out about flexible options to suit your team.
-
- Understand the concepts, diverse use cases, and ethical considerations of web scraping and web crawling.
- Utilise essential Python data structures, control flow, and file handling for practical web scraping tasks.
- Acquire web content programmatically using the Requests library for fetching static pages.
- Build structured and scalable web crawlers using the Scrapy framework.
- Automate browser interactions and extract data from dynamic, JavaScript-rendered web pages.
- Extract data from various web data formats, including HTML, XML, and JSON.
- Parse HTML content and extract specific data using the BeautifulSoup library and extract data from HTML tables using pandas.
- Perform essential data processing and cleaning steps, including handling missing data, duplicates, string manipulation, and pattern matching.
- Store extracted data in relational databases using the SQLAlchemy library and understand the advantages and disadvantages of alternative storage options (files, NoSQL).
-
This 3-day intensive hands-on training course is designed for developers, data analysts, data scientists, researchers, and anyone who needs to programmatically collect data from websites for analysis, research, or application development. It is ideal for:
Data Analysts and Scientists needing to source data from the web.
Software Developers building data-driven applications that require web data.
Researchers collecting data for analysis and study.
Business Analysts requiring competitive intelligence, market data, or other web-based information.
Professionals with some programming background and an interest in using Python for data collection tasks.
-
Participants should have:
An understanding of basic programming concepts (e.g., variables, functions, loops, conditionals).
Some prior experience with Python is helpful, but the course includes a refresher covering the necessary fundamentals.
Basic understanding of HTML structure and tags is beneficial, though not strictly required.
Familiarity with using a command-line interface (CLI).
We can customise the training to match your team's experience and needs - with more time and coverage of Python fundamentals for those new to the language, for instance.
-
Python Refresher
Data structures
Control flow statements
Working with files in different formats (CSV, JSON, ...)
Overview of Web Scraping
What is Web Scraping?
Web Crawling vs. Web Scraping
Use Cases of Web Scraping
Components of a Web Scraper
Alternatives to Web Scraping: Using Web APIs
Data Acquisition
Simple web client using Requests
Building a crawler using Scrapy
Simulating user clicks and browser interactions using Selenium
◦ Handling JavaScript/AJAX in dynamic web pages
◦ Automatic form submission
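As a taste of the data acquisition topics above, fetching a static page with the Requests library takes only a few lines. This is a minimal sketch, and the URL is just a stable placeholder page; a real scraper would target the site you care about and add retry and rate-limiting logic.

```python
import requests

# Fetch a static page; the URL is a placeholder example site
url = "https://example.com"
response = requests.get(url, timeout=10)
response.raise_for_status()          # raise an exception on HTTP errors (4xx/5xx)

print(response.status_code)          # 200 on success
print(response.headers["Content-Type"])
html = response.text                 # decoded response body, ready for parsing
```

The `timeout` argument matters in practice: without it, a hung server can stall your crawler indefinitely.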
Data Extraction
Data formats: HTML, XML, JSON
Extracting data from HTML tables using pandas
Ad-hoc parsing of HTML documents using BeautifulSoup
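The two extraction approaches above can be sketched side by side: BeautifulSoup for ad-hoc navigation of the HTML tree, and pandas for pulling a whole table in one call. The HTML snippet is an invented stand-in for a fetched page, and `pandas.read_html` assumes an HTML parser such as lxml is available.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# A small inline HTML document standing in for a fetched page
html = """
<html><body>
  <h1>Products</h1>
  <table id="prices">
    <tr><th>Item</th><th>Price</th></tr>
    <tr><td>Widget</td><td>9.99</td></tr>
    <tr><td>Gadget</td><td>19.99</td></tr>
  </table>
</body></html>
"""

# Ad-hoc parsing with BeautifulSoup: navigate the tree, pull out text
soup = BeautifulSoup(html, "html.parser")
print(soup.h1.get_text())                            # → Products
items = [td.get_text() for td in soup.select("table#prices td")]
print(items)                                         # → ['Widget', '9.99', 'Gadget', '19.99']

# The same table extracted in one call with pandas
df = pd.read_html(StringIO(html))[0]                 # read_html returns a list of DataFrames
print(df)
```

BeautifulSoup gives you fine-grained control; `read_html` is the quicker route when the data is already laid out as an HTML table.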
Data Processing and Cleaning
Preparing your data for downstream analysis and computation
Handling missing data and duplicate data
String manipulation and pattern matching
Overview of Natural Language Processing tools for dealing with text data
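A short sketch of the cleaning steps above, applied with pandas to an invented set of scraped rows with the usual problems: duplicates, missing values, and prices stored as messy strings.

```python
import pandas as pd

# Toy scraped data: a duplicate row, a missing row, prices as strings
df = pd.DataFrame({
    "product": ["Widget", "Widget", "Gadget", None],
    "price": ["£9.99", "£9.99", "£19.99 ", None],
})

df = df.drop_duplicates()            # remove exact duplicate rows
df = df.dropna()                     # drop rows with missing values

# String manipulation and pattern matching: strip whitespace,
# remove everything but digits and the decimal point, convert to float
df["price"] = (
    df["price"]
    .str.strip()
    .str.replace(r"[^\d.]", "", regex=True)
    .astype(float)
)
print(df)
```

The same pattern, deduplicate, handle missing values, normalise strings into proper types, covers most of the cleanup a scraper's raw output needs before analysis.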
Data Storage: Relational Databases
Connecting to SQL databases using SQLAlchemy
Inserting data into SQL databases
Reading data from SQL databases
Overview of alternatives to SQL databases: file formats, NoSQL databases
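A minimal sketch of the storage workflow above, using SQLAlchemy's 2.0-style Core API with an in-memory SQLite database standing in for a real server. The table and row contents are invented for illustration.

```python
import sqlalchemy as sa

# In-memory SQLite stands in for a real database server here
engine = sa.create_engine("sqlite:///:memory:")
metadata = sa.MetaData()

products = sa.Table(
    "products", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String, nullable=False),
    sa.Column("price", sa.Float),
)
metadata.create_all(engine)

# Insert scraped rows; engine.begin() commits on success
with engine.begin() as conn:
    conn.execute(sa.insert(products), [
        {"name": "Widget", "price": 9.99},
        {"name": "Gadget", "price": 19.99},
    ])

# Read the rows back, ordered by price
with engine.connect() as conn:
    rows = conn.execute(sa.select(products).order_by(products.c.price)).all()
    for row in rows:
        print(row.name, row.price)
```

Swapping the connection URL (e.g. to PostgreSQL or MySQL) is all it takes to point the same code at a production database, which is the main appeal of SQLAlchemy over raw driver calls.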
-
Python Documentation: The official and comprehensive documentation for the Python language itself.
Requests Documentation: Official documentation for the popular Requests HTTP library, excellent for simple data fetching.
Scrapy Documentation: Official documentation for the powerful open-source web scraping and web crawling framework, ideal for larger projects.
Selenium Documentation: Documentation for automating browser interactions, essential for scraping dynamic websites.
BeautifulSoup Documentation: Documentation for BeautifulSoup (bs4), a library for pulling data out of HTML and XML files.
Pandas Documentation: Comprehensive documentation for the pandas library, crucial for data manipulation, analysis, and extracting data from HTML tables.
SQLAlchemy Documentation: Official documentation for the Python SQL toolkit and Object Relational Mapper, useful for database interaction.