Introduction
We have all been hearing a lot about Large Language Models (LLMs) over the last few years, but recently a new trend has appeared: Small Language Models (SLMs). Indeed, in 2025 and 2026 many companies (such as Microsoft and Google) have started to invest heavily in SLMs, rather than focusing solely on LLMs, which require extensive cloud-based infrastructure.
You might be forgiven for thinking "What?" and "Why?" given the terminology around LLMs and SLMs, not least because LLMs have been positioned as providing quite remarkable AI abilities precisely because of their very nature (i.e. that they are "Large").
Where then do "Small" Language Models come in? In what way are they small, and why might an organisation or individual choose to use a small version over a large version?
In this blog, we will look at SLMs and try to answer the above questions.
What is a Large Language Model?
Let’s start off by discussing what we mean by Large Language Models (LLMs), and what they are often used for.
According to Wikipedia:
A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the core capabilities of modern chatbots.
As suggested above, the most common (at least in the public domain) uses of LLMs are for chatbot-style tools such as OpenAI’s ChatGPT, Google's Gemini, Meta’s Llama or DeepSeek’s suite of tools.
A user can ask these systems questions and in most cases, they can give you surprisingly good answers very, very quickly.
These systems employ Natural Language Processing (NLP) to ‘understand’ your questions or prompts, and once they have generated their answers, they again use NLP to create understandable and meaningful outputs (or images if appropriate). Between these two stages, these systems utilise various AI-based tools to generate their responses, including an element of context awareness.
So what makes them ‘Large’? Well, the ‘large’ part comes from the number of parameters (the internal weights learned during training), which runs from many billions into the trillions. Using these parameters, the models build up internal representations that can be used for tasks such as generating responses, summarisation, translation and reasoning.
To train and manage these LLMs requires significant infrastructure and processing power. This can be seen by comparing the costs associated with GPT-1 and its successors (GPT, or Generative Pre-trained Transformer, is the underlying model family used by ChatGPT).
| GPT Version | Release | Parameters (billions) | Training Cost (petaflop-days) |
|---|---|---|---|
| GPT-1 | 2018 | 0.117 | 1 |
| GPT-2 | 2019 | 1.5 | 28 |
| GPT-3 | 2020 | 175 | 3,640 |
| GPT-4 | 2023 | approx. 1,760 | est. 230,000 |
As can be seen from the above, the size of the LLMs has grown significantly, and in turn, their cost has grown exponentially. There have also been associated environmental concerns due to the energy and cooling requirements of these systems (for example, see the Nature article ‘Reconciling the contrasting narratives on the environmental impact of large language models’).
Of course companies are not completely abandoning LLMs, but many are switching some tasks to SLMs because they are more practical for everyday business use.

What is a Small Language Model?
This brings us to what we mean when we describe something as being a Small Language Model. Here, "Small" is a relative term: it means that the associated language model has fewer parameters than a Large Language Model, typically only millions to a few billion. SLMs therefore require less computing power and are lighter-weight language models.
Since they are still a type of language model, they can answer questions, generate textual outputs, summarise information, and translate between languages, depending upon their target task. This points to the other difference: they tend to be focused on a specific task.
Why do we have Small Language Models?
The key reason is that they require fewer resources (less processing power and less memory). They are therefore cheaper to run.
However, this is not the only reason for SLMs. They also tend to run more quickly than LLMs and can run on less sophisticated hardware, including mobile devices.
Due to their lower hardware requirements and cheaper running costs, they are also better suited to being run locally within an organisation, rather than utilising cloud-based solutions requiring data to be sent off to the cloud (and thus outside of an organisation’s own infrastructure). There are therefore security benefits associated with SLMs.
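To put rough numbers on the memory side of this, here is a back-of-envelope sketch in Python. The 7-billion-parameter figure is purely an illustrative size, not a specific model:

```python
# Back-of-envelope estimate of the memory needed just to store model weights.
# The 7-billion-parameter count is a hypothetical, illustrative size.
params = 7_000_000_000

bytes_fp32 = params * 4    # 32-bit floats: 4 bytes per parameter
bytes_int4 = params * 0.5  # 4-bit integers: half a byte per parameter

print(f"fp32: {bytes_fp32 / 1e9:.1f} GB")  # fp32: 28.0 GB
print(f"int4: {bytes_int4 / 1e9:.1f} GB")  # int4: 3.5 GB
```

A model that will not fit in the memory of a phone or laptop at 32-bit precision can become feasible once stored at a lower precision, which is one of the compression techniques discussed later in this post.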
To summarise, the benefits of a Small Language Model include:
Cheaper
Faster
Lower hardware requirements
Can run locally
Less common to send data outside the organisation
How are Small Language Models Built?
There are several techniques used to generate SLMs; these include:
Knowledge Distillation: In this approach to creating an SLM, an LLM is used to teach a smaller model. For example, the LLM generates answers to questions, and the SLM is trained to reproduce those input-output pairs.
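As a toy sketch of the distillation idea (not a production training loop; the logits and temperature value below are made up for illustration), the student is trained to minimise the gap between its output distribution and the teacher's softened one:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into probabilities, optionally softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened outputs and the student's.
    Training the student means minimising this, i.e. mimicking the teacher."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# A student whose outputs agree with the teacher incurs a lower loss
# than one whose outputs disagree.
teacher = [3.0, 1.0, 0.2]
good_student = [2.9, 1.1, 0.3]
bad_student = [0.2, 1.0, 3.0]
print(distillation_loss(teacher, good_student)
      < distillation_loss(teacher, bad_student))  # True
```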
Pruning: In this approach, pruning removes weak internal connections, and engineers fine-tune the model. This makes the underlying neural network model simpler and smaller.
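A minimal sketch of magnitude-based pruning, one common variant. The weights and threshold below are made-up illustrative values; real systems prune tensors inside a trained network and then fine-tune:

```python
def magnitude_prune(weights, threshold=0.1):
    """Zero out connections whose absolute weight falls below the threshold.
    In practice the pruned model is then fine-tuned to recover accuracy."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

weights = [0.8, -0.03, 0.5, 0.02, -0.6, 0.01]
pruned = magnitude_prune(weights)
print(pruned)  # [0.8, 0.0, 0.5, 0.0, -0.6, 0.0]
```

The zeroed connections can then be stored and computed sparsely, which is where the size and speed savings come from.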
Quantization: Quantization reduces the precision of numbers used in the models. For example, in an LLM, numbers are typically represented using 32 bits. However, many SLMs use 8-bit or even 4-bit representations. This reduces the memory overheads of the model and reduces computation times.
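The round trip can be sketched in a few lines of Python. This is simple symmetric 8-bit quantization with made-up weight values; production systems use more refined schemes:

```python
def quantize_int8(values):
    """Map 32-bit floats onto 8-bit integers (-127..127) using a shared scale."""
    scale = max(abs(v) for v in values) / 127
    return [round(v / scale) for v in values], scale

def dequantize(q_values, scale):
    """Recover approximate floats from the 8-bit representation."""
    return [q * scale for q in q_values]

weights = [0.52, -1.27, 0.03, 0.98]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
print(q)       # [52, -127, 3, 98]
print(approx)  # close to the originals, at a quarter of the storage
```

Each weight now takes one byte instead of four, at the cost of a small rounding error that is usually acceptable in practice.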
Architecture Optimization: Work based on analysis of larger LLMs and how the models are actually used has helped engineers and developers refine these models to reduce the number of layers and apply other optimizations to simplify the architecture of SLMs.
Parameter Sharing: Rather than duplicating parameters across different layers of the model, weights and parameters are shared, which significantly reduces the overall size.
Perhaps a suitable analogy for how SLMs can be created that equal (or in some cases outperform) LLMs is this: an LLM is like a large library with a wide range of books and magazines on almost every subject, whereas an SLM is like a small specialist bookshop stocking only a carefully selected range of the best murder mysteries.
Examples of Small Language Model Systems
There are a range of SLM frameworks and libraries available. They are often derived from, and named after, a larger LLM; for example, GPT-5 and GPT-5 Mini. Here are a few of the most widely used:
GPT-5 Mini: A faster, smaller version of GPT-5 aimed at high-volume or specific, well-defined tasks.
GPT-5 Nano: Described as the fastest and cheapest version of GPT-5, suitable for classification and summarization tasks.
DistilBERT: a compressed version of BERT (Bidirectional Encoder Representations from Transformers).
TinyBERT: an optimized version of BERT for mobile devices.
Phi-2: a small language model from Microsoft.
Real World Uses of Small Language Models
This might well lead to the question, "Where are Small Language Models used?" Here are some typical uses for SLMs:
Phone AI assistants
Email spam detection
Chatbots inside apps
Smart keyboards / autocomplete
Offline AI tools
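As a flavour of the autocomplete use case, here is a deliberately tiny next-word predictor based on bigram counts. Real smart keyboards use neural SLMs rather than raw counts, but the prediction task has the same shape:

```python
from collections import Counter, defaultdict

def build_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    model = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        model[current][nxt] += 1
    return model

def suggest(model, word):
    """Suggest the most frequent follower of the given word, if any."""
    followers = model.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

corpus = "the cat sat on the mat and the cat slept"
model = build_bigram_model(corpus)
print(suggest(model, "the"))  # cat
print(suggest(model, "on"))   # the
```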
For example, some AI features in smartphones run a small model directly on the device instead of in the cloud. SLMs are particularly well suited to mobile devices because they:
Require less memory
Run on mobile chips
Respond instantly
Keep data private (i.e. data stays on the phone)
As an example, on iPhones, Apple Intelligence features SLMs that are run directly on the phone and can assist users with:
Writing aids, such as helping to fix grammar, change the tone of messages, or rewrite messages entirely
Summarising notifications
Providing voice assistance via Siri (such as "summarise this article")
Smart keyboard suggestions, such as suggesting full sentences rather than just individual words
Summary
Small Language Models are growing in importance and in their range of applications, even as the world is dominated by their big brothers, the Large Language Models. In many situations (such as on mobile devices), SLMs are the better solution, offering a range of benefits, both economic and performance-related.

