About the course
Our instructor-led Web Scraping with Python training course will give you the skills to create automated scripts that pull in data from across the web, based on your criteria, so you can build valuable reports from relevant sources.
Some of the key use cases for Web Scraping include:
Competition monitoring: extracting details of products and services, like price, images and other content, observing changes over time.
Policy tracking: mining circulars from trade societies and other organisations, filtering specific keywords of interest.
Data gathering from multiple sources: collecting, aggregating and analysing data on a set of products or services (e.g. real estate) from multiple websites, to gain richer insights into the specific items.
Online reputation tracking: mining opinions about products or brands, from online reviews or blog posts.
Data collection for training Machine Learning systems.
You will benefit from extensive hands-on labs, delivered by an expert Data Science practitioner who will give you enough knowledge of Python to kick-start your project.
We're happy to offer this instructor-led web scraping training online, in person at our London training centre, or at your location of choice. Please get in touch to find out about flexible options to suit your team.
-
- Understand the concepts, diverse use cases, and ethical considerations of web scraping and web crawling.
- Utilise essential Python data structures, control flow, and file handling for practical web scraping tasks.
- Acquire web content programmatically using the Requests library for fetching static pages.
- Build structured and scalable web crawlers using the Scrapy framework.
- Automate browser interactions and extract data from dynamic, JavaScript-rendered web pages.
- Extract data from various web data formats, including HTML, XML, and JSON.
- Parse HTML content and extract specific data using the BeautifulSoup library and extract data from HTML tables using pandas.
- Perform essential data processing and cleaning steps, including handling missing data, duplicates, string manipulation, and pattern matching.
- Store extracted data in relational databases using the SQLAlchemy library and understand the advantages and disadvantages of alternative storage options (files, NoSQL).
-
This 3-day intensive hands-on training course is designed for developers, data analysts, data scientists, researchers, and anyone who needs to programmatically collect data from websites for analysis, research, or application development. It is ideal for:
Data Analysts and Scientists needing to source data from the web.
Software Developers building data-driven applications that require web data.
Researchers collecting data for analysis and study.
Business Analysts requiring competitive intelligence, market data, or other web-based information.
Professionals with some programming background and an interest in using Python for data collection tasks.
-
Participants should have:
An understanding of basic programming concepts (e.g., variables, functions, loops, conditionals).
Some prior experience with Python is helpful, but the course includes a refresher covering the necessary fundamentals.
Basic understanding of HTML structure and tags is beneficial, though not strictly required.
Familiarity with using a command-line interface (CLI).
We can customise the training to match your team's experience and needs - with more time and coverage of Python fundamentals for those new to the language, for instance.
-
Python Refresher
Data structures
Control flow statements
Working with files in different formats (CSV, JSON, ...)
Overview of Web Scraping
What is Web Scraping?
Web Crawling vs. Web Scraping
Use Cases of Web Scraping
Components of a Web Scraper
Alternatives to Web Scraping: Using Web APIs
Data Acquisition
Simple web client using Requests
Building a crawler using Scrapy
Simulating user clicks and browser interactions using Selenium
◦ Handling JavaScript/AJAX in dynamic web pages
◦ Automatic form submission
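As a taste of the data acquisition topics above, fetching a static page with the Requests library takes only a few lines. This is a minimal sketch, and the URL is just a stable placeholder page; a real scraper would target the site you care about and add retry and rate-limiting logic.

```python
import requests

# Fetch a static page; the URL is a placeholder example site
url = "https://example.com"
response = requests.get(url, timeout=10)
response.raise_for_status()          # raise an exception on HTTP errors (4xx/5xx)

print(response.status_code)          # 200 on success
print(response.headers["Content-Type"])
html = response.text                 # decoded response body, ready for parsing
```

The `timeout` argument matters in practice: without it, a hung server can stall your crawler indefinitely.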
Data Extraction
Data formats: HTML, XML, JSON
Extracting data from HTML tables using pandas
Ad-hoc parsing of HTML documents using BeautifulSoup
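The two extraction approaches above can be sketched side by side: BeautifulSoup for ad-hoc navigation of the HTML tree, and pandas for pulling a whole table in one call. The HTML snippet is an invented stand-in for a fetched page, and `pandas.read_html` assumes an HTML parser such as lxml is available.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# A small inline HTML document standing in for a fetched page
html = """
<html><body>
  <h1>Products</h1>
  <table id="prices">
    <tr><th>Item</th><th>Price</th></tr>
    <tr><td>Widget</td><td>9.99</td></tr>
    <tr><td>Gadget</td><td>19.99</td></tr>
  </table>
</body></html>
"""

# Ad-hoc parsing with BeautifulSoup: navigate the tree, pull out text
soup = BeautifulSoup(html, "html.parser")
print(soup.h1.get_text())                            # → Products
items = [td.get_text() for td in soup.select("table#prices td")]
print(items)                                         # → ['Widget', '9.99', 'Gadget', '19.99']

# The same table extracted in one call with pandas
df = pd.read_html(StringIO(html))[0]                 # read_html returns a list of DataFrames
print(df)
```

BeautifulSoup gives you fine-grained control; `read_html` is the quicker route when the data is already laid out as an HTML table.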
Data Processing and Cleaning
Preparing your data for downstream analysis and computation
Handling missing data and duplicate data
String manipulation and pattern matching
Overview of Natural Language Processing tools for dealing with text data
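A short sketch of the cleaning steps above, applied with pandas to an invented set of scraped rows with the usual problems: duplicates, missing values, and prices stored as messy strings.

```python
import pandas as pd

# Toy scraped data: a duplicate row, a missing row, prices as strings
df = pd.DataFrame({
    "product": ["Widget", "Widget", "Gadget", None],
    "price": ["£9.99", "£9.99", "£19.99 ", None],
})

df = df.drop_duplicates()            # remove exact duplicate rows
df = df.dropna()                     # drop rows with missing values

# String manipulation and pattern matching: strip whitespace,
# remove everything but digits and the decimal point, convert to float
df["price"] = (
    df["price"]
    .str.strip()
    .str.replace(r"[^\d.]", "", regex=True)
    .astype(float)
)
print(df)
```

The same pattern, deduplicate, handle missing values, normalise strings into proper types, covers most of the cleanup a scraper's raw output needs before analysis.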
Data Storage: Relational Databases
Connecting to SQL databases using SQLAlchemy
Inserting data into SQL databases
Reading data from SQL databases
Overview of alternatives to SQL databases: file formats, NoSQL databases
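A minimal sketch of the storage workflow above, using SQLAlchemy's 2.0-style Core API with an in-memory SQLite database standing in for a real server. The table and row contents are invented for illustration.

```python
import sqlalchemy as sa

# In-memory SQLite stands in for a real database server here
engine = sa.create_engine("sqlite:///:memory:")
metadata = sa.MetaData()

products = sa.Table(
    "products", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String, nullable=False),
    sa.Column("price", sa.Float),
)
metadata.create_all(engine)

# Insert scraped rows; engine.begin() commits on success
with engine.begin() as conn:
    conn.execute(sa.insert(products), [
        {"name": "Widget", "price": 9.99},
        {"name": "Gadget", "price": 19.99},
    ])

# Read the rows back, ordered by price
with engine.connect() as conn:
    rows = conn.execute(sa.select(products).order_by(products.c.price)).all()
    for row in rows:
        print(row.name, row.price)
```

Swapping the connection URL (e.g. to PostgreSQL or MySQL) is all it takes to point the same code at a production database, which is the main appeal of SQLAlchemy over raw driver calls.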
-
Python Documentation: The official and comprehensive documentation for the Python language itself.
Requests Documentation: Official documentation for the popular Requests HTTP library, excellent for simple data fetching.
Scrapy Documentation: Official documentation for the powerful open-source web scraping and web crawling framework, ideal for larger projects.
Selenium Documentation: Documentation for automating browser interactions, essential for scraping dynamic websites.
BeautifulSoup Documentation: Documentation for BeautifulSoup (bs4), a library for pulling data out of HTML and XML files.
Pandas Documentation: Comprehensive documentation for the pandas library, crucial for data manipulation, analysis, and extracting data from HTML tables.
SQLAlchemy Documentation: Official documentation for the Python SQL toolkit and Object Relational Mapper, useful for database interaction.