In the eighth annual Python Developers Survey (2024), JetBrains found that 80% of developers working in Data Science use pandas, whereas only 16% use Spark and 15% use Polars. This means any major update to pandas is big news for those using Python in the Data Science arena. On January 21st 2026 pandas 3.0 was released. In this blog we will look at what the new features in pandas 3.0 are and why you should consider moving to this new version of this highly influential library.

pandas 3.0
pandas 2.0 addressed a major performance black spot, an area in which Polars was gaining ground at the time. Back when pandas 2.0 was released there was a lot of talk about Polars replacing pandas altogether. As the survey figures in the introduction show, that hasn't happened; pandas has gone from strength to strength. This doesn't mean that there were no areas that could be improved, and pandas 3.0 is designed to address at least some of these.
The main highlights of the pandas 3.0 release are:
Dedicated pandas String data type
Consistent copy/view behaviour with Copy-on-Write
Support for pd.col() syntax to create expressions
Support for the Arrow PyCapsule Interface
We will look at each of these in turn with the first two above being the most significant.

Dedicated pandas String data type
In previous versions of pandas, there was something of an anomaly when it came to string-like data. For example, if you loaded data into a DataFrame and one or more of the columns contained data that looked like Python strings, pandas would identify that column as a column of (NumPy) objects. Now, technically in Python everything is an object, and that means that strings are objects. However, from a pandas point of view we want a DataFrame to know the type of data in a column, as this can be used to determine what behaviour is appropriate for that column. For other data types pandas can tell us what the dtype (data type) is, and thus what it is sensible to do with that data; an important aspect when doing data exploration tasks.
In addition, from a performance point of view, it meant that an application that held a lot of textual data and needed to perform a lot of text-oriented actions could easily deteriorate, as pandas had to do a lot of work under the hood with these ‘objects’. It also had implications for memory usage.
pandas 3.0 changes all this. It now has a string-oriented data type (str) which it can use to identify textual, string-like data when that data is first loaded into / added to a DataFrame.

The pandas 3.0 release notes state:
“a dedicated string data type is enabled by default (backed by PyArrow under the hood, if installed, otherwise falling back to being backed by NumPy object-dtype). This means that pandas will start inferring columns containing string data as the new str data type when creating pandas objects, such as in constructors or IO functions.”
For example, using a pandas 2.x version we would see the following:
ser = pd.Series(["John", "Tom"])
0 John
1 Tom
dtype: object
However, doing the same thing using pandas 3.0 gives us:
ser = pd.Series(["John", "Tom"])
0 John
1 Tom
dtype: str
This change may not look that significant; however, this new data type is not just a wrapper around what pandas previously did, it is a fundamental shift in the way that textual data is held internally to the DataFrame.
For example, the main effects of this change are that:
The string data type is inferred by default (instead of the NumPy object data type)
String columns can only hold strings (or missing values); it is now impossible to store a non-string value in a column intended to hold strings
If missing values are present, these are represented using NaN (actually np.nan) and follow the same missing value semantics as other default data types in pandas.
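These behaviours can be seen in a short sketch. Note that the dtype and the rejection of non-string values assume pandas 3.0; on a 2.x install the same code reports object and the guard below simply skips:

```python
import pandas as pd

# On pandas 3.0 string data is inferred as the dedicated "str" dtype;
# on older versions the same constructor reports "object".
ser = pd.Series(["John", "Tom", None])
print(ser.dtype)

# Missing values follow the usual NaN-based semantics.
print(ser.isna().tolist())  # [False, False, True]

# Under the str dtype, storing a non-string value is rejected.
if str(ser.dtype) == "str":
    try:
        ser.iloc[0] = 42
    except (TypeError, ValueError) as exc:
        print("rejected:", exc)
```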
Consistent copy/view behaviour with Copy-on-Write
pandas 3.0 brings a new “copy-on-write” behaviour for copies and views. These changes are intended to make the developer API more consistent, comprehensible and predictable, with fewer anomalies. The new API follows a clear set of rules; specifically, any subset or returned Series/DataFrame always behaves as a copy of the original. Thus the original is never modified! In previous versions of pandas, whether the original was modified or a copy produced depended on the operation being performed (and potentially argument values), which could be confusing and potentially error prone.
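As a minimal sketch of the rule in action (run under pandas 3.0; on an older version without copy-on-write the column selection below would be a view and the write could propagate back to the original):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Under Copy-on-Write, selecting a column behaves as a copy...
col = df["a"]
col.iloc[0] = 100  # ...so this write never propagates back to df

print(col.tolist())      # [100, 2, 3]
print(df["a"].tolist())  # [1, 2, 3] on pandas 3.0 - the original is untouched
```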
Support for pd.col() syntax to create expressions
The above two changes are the big news for pandas 3.0; however, this change will, we suspect, be just as important in the long run. Essentially, it brings in a cleaner API for column operations using pd.col().
Previously, where you would have had to use a lambda and potentially create temporary columns, you can now use pd.col instead. For example, prior to pandas 3.0 you might have written something like:
df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
df.assign(c = lambda df: df['a'] + df['b'])
Out[2]:
a b c
0 1 4 5
1 1 5 6
2 2 6 8
Now you can write:
df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
df.assign(c = pd.col('a') + pd.col('b'))
Out[3]:
a b c
0 1 4 5
1 1 5 6
2 2 6 8
This may not seem like much, but it simplifies the developer API and makes the whole process seem much more natural in pandas.
Support for the Arrow PyCapsule Interface
The Arrow C data interface can be used to move data between different DataFrame libraries through the Arrow format, and is designed to be zero-copy where possible. In Python, this interface is exposed through the Arrow PyCapsule Protocol.
DataFrame and Series both now support the Arrow PyCapsule Interface for both export and import of data. From some small tests, this appears to provide some of the observed performance benefits of pandas 3.0.
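A minimal sketch of what this enables, assuming pyarrow is installed (pyarrow's pa.table() can consume objects that expose the PyCapsule protocol, so the conversion can be zero-copy where possible):

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Tom"], "score": [1.5, 2.5]})

# The export side of the protocol is exposed via __arrow_c_stream__:
print(hasattr(df, "__arrow_c_stream__"))

try:
    import pyarrow as pa

    # Export: an Arrow-aware library consumes the DataFrame as Arrow data.
    tbl = pa.table(df)
    print(tbl.column_names)  # ['name', 'score']

    # Import: and Arrow data can be converted back into pandas.
    df2 = tbl.to_pandas()
    print(len(df2))          # 2
except ImportError:
    print("pyarrow not installed")
```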
pandas 3.0 Performance
So the other big question is: does this benefit the user in terms of overall performance? As a simple benchmark, we put together a short application that loaded a reasonable amount of data, performed some simple DataFrame operations and stored the resulting data to an Excel file.
When run using pandas 2.2 this short application took about 55 seconds. When it was upgraded to pandas 3.0, with no other changes, it took about 34 seconds.
Whilst this was not an extensive performance test by any means, it does illustrate the general trend being reported by those using pandas 3.0 – it can offer a significant performance improvement (although of course it depends on what you are actually doing).
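Our exact test isn't reproduced here, but a sketch of the kind of timing harness involved looks like this. The workload below is illustrative only (the real test loaded its data from file and wrote to Excel, which we skip here so the snippet is self-contained):

```python
import time

import numpy as np
import pandas as pd

# Build an illustrative string-heavy DataFrame.
rng = np.random.default_rng(0)
n = 200_000
df = pd.DataFrame({
    "name": rng.choice(["John", "Tom", "Alice", "Eve"], size=n),
    "value": rng.random(n),
})

start = time.perf_counter()
# Some simple, string-oriented DataFrame operations.
result = (
    df.assign(upper=df["name"].str.upper())
      .groupby("upper", as_index=False)["value"]
      .mean()
)
elapsed = time.perf_counter() - start

print(f"pandas {pd.__version__}: {len(result)} groups in {elapsed:.3f}s")
```

Running the same script under pandas 2.2 and pandas 3.0 and comparing the timings is all our benchmark did, just with more data and more steps.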

Setting up pandas 3.0
pandas 3.0 requires Python 3.11 or newer, so if you are, for example, on Python 3.10 you will need to upgrade your version of Python. To install pandas 3.0 you can use pip:
pip install --upgrade pandas
If you are using conda then you can use:
conda install -c conda-forge pandas
If you want to try out the new behaviour without upgrading to pandas 3.0, you can do so on pandas 2.3 by enabling two flags:
pd.options.future.infer_string = True
pd.options.mode.copy_on_write = True
To Upgrade or Not to Upgrade
That is the question! pandas 3.0 offers significant performance improvements, both for text-oriented applications and through its copy-on-write behaviour. However, there is a migration cost which will need to be taken into account. The change to a dedicated string type means there are potential breaking changes: for example, if your code checks the .dtype of a column and previously assumed it was object, it will now find str; similarly, code that checks for an exact missing value sentinel may need revisiting. The pandas team has provided a Migration Guide that should be reviewed before, and used during, any upgrade.
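For instance, a dtype check written against the old behaviour is brittle; a helper such as pandas.api.types.is_string_dtype, which is available on both 2.x and 3.0, is one way to make the check version-independent:

```python
import pandas as pd
from pandas.api.types import is_string_dtype

ser = pd.Series(["John", "Tom"])

# Brittle: True on pandas 2.x (object dtype), False on pandas 3.0 (str dtype).
print(ser.dtype == object)

# More robust across versions:
print(is_string_dtype(ser))  # True
```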