About the course
Natural Language Processing (NLP) is a rapidly evolving field focused on enabling computers to understand, interpret, and generate human language. With the explosion of text data available from sources like social media, emails, and documents, NLP skills are increasingly vital across industries for tasks such as sentiment analysis, information extraction, topic discovery, and building conversational agents. This course provides a thorough introduction to Natural Language Processing using the powerful Python ecosystem, covering fundamental concepts, essential techniques, and an outlook on modern advancements driven by deep learning and transformer models.
The course begins with the foundations of NLP, exploring common applications and the rich Python ecosystem of libraries such as NLTK, spaCy, Gensim, and the cutting-edge Hugging Face Transformers. You will master the essential text processing techniques needed to prepare text data for analysis, including tokenisation, stemming, lemmatisation, and using Regular Expressions for pattern matching and cleaning. A significant focus is placed on converting unstructured text into numerical formats that machine learning algorithms can understand, covering traditional methods like Bag-of-Words and TF-IDF, and introducing modern word and document embeddings, including the conceptual role of contextual embeddings derived from Transformer models.
You will gain practical skills in core NLP tasks, including Text Classification (categorising documents using both classical ML approaches and modern techniques leveraging pre-trained Transformer models), Topic Modelling (discovering underlying themes using algorithms like LDA), and Information Extraction (specifically Named Entity Recognition using powerful libraries like spaCy and Hugging Face).
The course also provides an outlook on advanced NLP applications such as Text Summarisation and Natural Language Generation, demonstrating capabilities using pre-trained models and setting realistic expectations for developing solutions in these complex areas. Through dedicated hands-on labs integrated throughout, you will gain practical experience applying these techniques and libraries to real-world text data in Python.
Instructor-led online and in-house face-to-face options are available - as part of a wider customised training programme, or as a standalone workshop, on-site at your offices or at one of many flexible meeting spaces in the UK and around the world.
What you'll learn
- Understand core NLP concepts, common applications, and the landscape of the Python NLP ecosystem (NLTK, spaCy, Hugging Face Transformers...).
- Perform essential text processing and cleaning techniques (tokenisation, stemming, lemmatisation, regex) using Python libraries.
- Create numerical representations of text using methods such as Bag-of-Words and TF-IDF, and understand different types of word, document, and contextual embeddings.
- Implement Text Classification with both classical ML models (e.g., Naive Bayes and SVMs in scikit-learn) and pre-trained Transformer models (e.g., via Hugging Face).
- Perform Topic Modelling using algorithms like Latent Dirichlet Allocation (LDA) in Python.
- Perform Named Entity Recognition (NER) using libraries like spaCy and Hugging Face Transformers.
- Understand the concepts and applications of advanced NLP tasks such as Text Summarisation and Natural Language Generation.
- Use key Python NLP libraries including NLTK, spaCy, scikit-learn, Gensim, and Hugging Face Transformers for various NLP tasks.
- Apply NLP techniques to process, analyse, and build models for real-world text data through hands-on labs.
Who should attend
This course is designed for developers, data scientists, data analysts, and researchers who want to gain a practical introduction to Natural Language Processing using the Python programming language and its rich ecosystem of libraries. It is ideal for:
- Data professionals who work with, or anticipate working with, text data.
- Developers interested in building applications that involve processing or understanding human language.
- Individuals looking to add foundational and modern NLP skills to their repertoire.
Prerequisites
Participants should have attended our Python Programming and Machine Learning courses, or have equivalent experience:
- A working knowledge of the Python programming language; experience with libraries such as NumPy and Pandas is beneficial.
- Basic familiarity with machine learning concepts (e.g., features, training, evaluation metrics) is beneficial but not strictly required, as relevant concepts are introduced in context.
- No prior experience with Natural Language Processing is required.
Private and custom delivery
This NLP course is available for private / custom delivery for your team - as an in-house face-to-face workshop at your location of choice, or as online instructor-led training via MS Teams (or your own preferred platform).
Get in touch to find out how we can deliver tailored training which focuses on your project requirements and learning goals.
Course outline
Foundations - Text Processing and the NLP Ecosystem
Understanding Natural Language Processing: What is NLP? Key challenges in understanding and processing human language. Overview of diverse real-world NLP applications (e.g., sentiment analysis, chatbots, search).
The Python NLP Ecosystem: Exploring the landscape of popular Python libraries for NLP and their strengths (NLTK, spaCy, Gensim, scikit-learn, Hugging Face Transformers). Guidance on choosing the right tool for different tasks.
Working with Text: Essential techniques for preparing text data.
Tokenisation: Breaking text into meaningful units (words, sub-word tokens).
Text Pre-processing: Techniques like lowercasing, punctuation removal, handling noise, stemming (NLTK) and lemmatisation (spaCy).
Regular Expressions: Using RegEx for pattern matching, searching, and cleaning text.
Basic Text Analysis:
Word Frequencies and Distributions: Identifying common and rare words; understanding the concept of stop-words.
Introduction to Zipf's Law for context on word-distribution patterns (roughly, the n-th most frequent word in a corpus occurs about 1/n times as often as the most frequent word).
Mining simple word co-occurrences to identify basic relationships between words.
Hands-On Lab: Setting up the Python NLP environment, loading and cleaning text data, performing tokenisation and pre-processing steps using NLTK and spaCy, calculating word frequencies and identifying stop words.
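For a flavour of what this lab covers, here is a minimal sketch of the pre-processing pipeline. It assumes nltk and spacy are installed, along with the spaCy en_core_web_sm model (python -m spacy download en_core_web_sm); the sample sentence is invented for illustration.

```python
import re
from collections import Counter

import nltk
import spacy
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)  # needed by newer NLTK versions
nltk.download("stopwords", quiet=True)

text = "The cats are sitting on the mats. Visit https://example.com for more!"

# Regular-expression cleaning: strip URLs, then lowercase.
cleaned = re.sub(r"https?://\S+", "", text).lower()

# Tokenisation and stemming with NLTK.
tokens = word_tokenize(cleaned)
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens if t.isalpha()]

# Lemmatisation with spaCy.
nlp = spacy.load("en_core_web_sm")
lemmas = [tok.lemma_ for tok in nlp(cleaned) if tok.is_alpha]

# Word frequencies with stop-words removed.
stop_words = set(stopwords.words("english"))
freqs = Counter(t for t in tokens if t.isalpha() and t not in stop_words)
print(freqs.most_common(5))
```

Note how stemming ("sitting" -> "sit", but also "mats" -> "mat") and lemmatisation give related but not identical normalisations; the lab explores when each is appropriate.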
Text Representation - Converting Text to Data
The Need for Numerical Representation: Why text needs to be converted into numerical formats (vectors, matrices) for machine learning algorithms.
Traditional Methods:
N-grams: Representing sequences of words or characters.
Bag-of-Words (BoW): Simple frequency-based representation of documents.
TF-IDF (Term Frequency-Inverse Document Frequency): Weighing word importance in a document relative to a corpus.
Introduction to Word Embeddings: Dense vector representations that capture semantic relationships between words.
Concepts of classic word embeddings (Word2Vec, GloVe, FastText).
Loading and using pre-trained word embeddings.
Introduction to Document Embeddings: Representing entire documents as vectors (e.g., averaging word embeddings, or simple Doc2Vec concepts).
Modern Contextual Embeddings: Introduction to the concept of embeddings derived from Transformer models (e.g., BERT), where the vector representation of a word changes based on its surrounding context.
Hands-On Lab: Implementing BoW and TF-IDF representations using scikit-learn, working with pre-trained word embeddings, and a conceptual discussion with a brief demonstration of contextual embeddings.
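A minimal sketch of these representations, assuming scikit-learn and gensim are installed; the toy documents are invented for illustration, and the GloVe vectors are fetched via gensim's downloader (about 66 MB on first use).

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]

# Bag-of-Words: raw term counts per document.
bow = CountVectorizer()
X_bow = bow.fit_transform(docs)
print(bow.get_feature_names_out())
print(X_bow.toarray())

# TF-IDF: term counts re-weighted by how rare each term is across the corpus.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)
print(X_tfidf.toarray().round(2))

# Pre-trained word embeddings: dense vectors capturing semantic similarity.
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-50")
print(glove.most_similar("cat", topn=3))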
Text Classification
The Text Classification Problem: Training models to automatically assign predefined categories or labels to documents.
Common Applications: Topic Classification, Sentiment Analysis, Spam Detection, Intent Recognition.
Classical ML Approaches: Using scikit-learn for Text Classification pipelines with traditional representations (BoW, TF-IDF).
Algorithms: Naive Bayes, Support Vector Machines (SVMs), Logistic Regression.
Introduction to Deep Learning Approaches: Concepts of using simple feed-forward neural networks with word embeddings for classification.
Modern Approaches with Transformers: Using pre-trained Transformer models (e.g., from the Hugging Face transformers library) for Text Classification. Understanding the fine-tuning paradigm (briefly).
Model Evaluation: Assessing Classification Quality using appropriate metrics (Accuracy, Precision, Recall, F1-score, Confusion Matrix) and cross-validation.
Model Introspection: Basic techniques for understanding why a model made a specific classification decision.
Hands-On Lab: Implementing text classification using classical ML (scikit-learn pipeline), implementing text classification using a pre-trained Transformer model (Hugging Face transformers), and evaluating and comparing model performance.
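A sketch of both approaches from this module. The four-document training set is invented purely for illustration; the lab uses a proper labelled corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "great film, loved the acting",
    "what a fantastic, moving story",
    "terrible plot and wooden acting",
    "a dull, boring waste of time",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Classical approach: TF-IDF features feeding a Naive Bayes classifier.
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)
print(clf.predict(["an absolutely wonderful film"]))

# Modern approach: a pre-trained Transformer via the Hugging Face pipeline API
# (downloads a default sentiment-analysis model on first use).
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")
print(sentiment("an absolutely wonderful film"))
```

The classical pipeline trains in milliseconds on a CPU; the Transformer arrives already trained and typically generalises far better, at the cost of model size and inference time - a trade-off the lab examines with proper evaluation metrics.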
Topic Modelling
Topic Modelling: Discovering abstract "topics" or themes that occur in a collection of documents.
Understanding Probabilistic Topic Models.
Algorithm: Latent Dirichlet Allocation (LDA) - Concepts and practical implementation using Gensim or scikit-learn.
Interpreting the results of Topic Models: Identifying key words for each topic and assigning documents to topics.
Evaluating Topic Models (basic concepts).
Hands-On Lab: Implementing LDA on a collection of documents, exploring the resulting topics and dominant themes in documents.
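A minimal LDA sketch using Gensim; the handful of pre-tokenised toy documents is invented for illustration, and real topic models need far more data to produce stable topics.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    ["football", "goal", "match", "team"],
    ["election", "vote", "party", "government"],
    ["team", "league", "match", "season"],
    ["minister", "government", "policy", "vote"],
]

dictionary = Dictionary(docs)                   # token <-> id mapping
corpus = [dictionary.doc2bow(d) for d in docs]  # BoW representation per document

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               passes=10, random_state=42)

# Top words per topic, and the topic mixture of the first document.
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
print(lda.get_document_topics(corpus[0]))
```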
Information Extraction
Introduction to Information Extraction: Automatically extracting structured information from unstructured text.
Named Entity Recognition (NER): Identifying and classifying named entities in text (e.g., persons, organisations, locations, dates).
Techniques: Rule-based approaches (briefly) vs. Machine Learning/Deep Learning approaches.
Practical NER using libraries like spaCy and pre-trained models from Hugging Face Transformers.
(Optional/Brief): Introduction to other IE tasks like Relation Extraction (identifying relationships between entities) or Event Extraction (identifying mentions of events).
Hands-On Lab: Performing Named Entity Recognition on sample text using spaCy and/or Hugging Face Transformers, exploring the output and evaluating results.
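A sketch of both NER routes, assuming the spaCy en_core_web_sm model has been downloaded; the sample sentence is invented for illustration.

```python
import spacy

# Rule- and statistics-based NER with a pre-trained spaCy pipeline.
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening a new office in London in September, says Tim Cook.")
for ent in doc.ents:
    print(ent.text, ent.label_)

# The same task with a pre-trained Transformer (model download on first use);
# aggregation_strategy="simple" merges sub-word tokens into whole entities.
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Apple is opening a new office in London in September."))
```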
Introduction to Advanced NLP Applications - An Outlook
Overview of Advanced NLP Problems: Introduction to complex tasks that often build on deep learning and large models.
Text Summarisation:
Understanding the goal: Condensing a document or collection into a shorter version.
Approaches: Extractive Summarisation (selecting key sentences, e.g., TextRank concepts) vs. Abstractive Summarisation (generating new sentences).
Conceptual Overview: How sequence-to-sequence models and Transformers are used for summarisation.
Demonstration: Using a pre-trained summarisation model from Hugging Face Transformers.
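A minimal sketch of such a demonstration; the model name is one common choice rather than the only option, and the input paragraph is invented for illustration.

```python
from transformers import pipeline

# Abstractive summarisation with a pre-trained model (downloaded on first use).
summariser = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

article = (
    "Natural Language Processing enables computers to analyse and generate "
    "human language. Modern systems rely heavily on large pre-trained "
    "Transformer models, which are fine-tuned for tasks such as "
    "classification, entity recognition, and summarisation."
)
print(summariser(article, max_length=40, min_length=10, do_sample=False))
```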
Natural Language Generation (NLG):
Understanding the goal: Creating human-like text from structured data or as a continuation of existing text.
Techniques Overview: Simple methods (e.g., template-based, N-gram generation) vs. Deep Learning methods (RNNs, LSTMs, and especially Transformer-based Language Models).
Setting Realistic Expectations: Highlighting the complexity and current state of the art in generating coherent, relevant, and fluent text, and the significant role of large pre-trained models (LLMs).
Demonstration: Using a pre-trained language model from Hugging Face Transformers for text generation (e.g., prompting the model, exploring different decoding strategies).
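A sketch of prompted generation with GPT-2, a small freely available model chosen here for illustration; the sampling parameters show the decoding strategies mentioned above.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
outputs = generator(
    "Natural Language Processing is",
    max_new_tokens=30,       # length of the generated continuation
    do_sample=True,          # sample rather than greedy decoding
    top_p=0.9,               # nucleus sampling
    num_return_sequences=2,  # produce two alternative continuations
)
for out in outputs:
    print(out["generated_text"])
```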
(Optional/Brief): Introduction to Machine Translation concepts, Conversational AI / Chatbot basics.
Hands-On Demo/Brief Lab: Using libraries to perform summarisation and text generation with pre-trained models on sample data.
Summary and next steps
Review of key NLP concepts, techniques, and the Python ecosystem covered throughout the course.
Connecting the learned concepts and tools to real-world applications and participant interests.
Discussing next steps in your NLP journey: exploring specific libraries in more depth, diving deeper into Deep Learning for NLP theory, working with advanced architectures, fine-tuning large pre-trained models, and deploying NLP models (MLOps for NLP).
Q&A
Further reading
NLTK (Natural Language Toolkit) Documentation: Official documentation for the foundational NLTK library. https://www.nltk.org/
spaCy Documentation: Official documentation for the industrial-strength spaCy library. https://spacy.io/api/
Hugging Face Transformers Documentation: Official documentation for the library that provides access to thousands of pre-trained models. https://huggingface.co/docs/transformers/
Scikit-learn Documentation: Official documentation for the widely used machine learning library, relevant for classification and some NLP utilities. https://scikit-learn.org/stable/documentation.html
Gensim Documentation: Official documentation for the library focused on topic modelling and vector space modelling. https://radimrehurek.com/gensim/