Microsoft’s Excel spreadsheet app is still king when it comes to simple, companywide, data analysis. While there are now a growing number of tools trying to compete with Excel for its crown, it is still the most common choice for business. We take a closer look at performing powerful data analytics techniques using Python and Excel in our latest blog.
24-09-2024
What is Excel?
In case you've managed to escape having to use a computer in the last 39 years (in which case, congratulations on finding this page!), Excel
is a spreadsheet
application, developed by Microsoft, and is a component of their Office product group of applications. This is a key factor of its ubiquity as a huge number of people have access to it through workplace or educational subscriptions. Another aspect of its success is its flexibility; it can be used for a wide range of purposes including accounting, charting, inventory tracking, trade data analysis, retail business intelligence reporting - or even just making big sortable lists of stuff. Most importantly it achieves all this without the user having to be a technical expert or software developer.
Data Analytics and Excel
Even though many companies are still performing their analysis using 'vanilla' Excel spreadsheets, there is an increasing trend towards alternative data analysis tools and techniques. One reason for this is that a programming language such as R or Python allows a wider range of techniques to be implemented and utilised than is possible in an Excel function.
While Excel already has a programming aspect to it allowing users to define functions and behaviours using Visual Basic for Applications (VBA), seasoned analysts will be the first to tell you that VBA can be quite ...challenging and maybe even a bit old-fashioned, compared to more frequently-updated and current languages such as Python, Java or C#.
For many years, Python programmers have used third-party libraires to access and manipulate Excel data. One of the most widely known is the OpenPyXL library – it is very widely used, stable open-source and well documented. It provides facilities for:
reading and writing Excel workbooks,
creating and modifying Excel worksheets,
creating and updating Excel formulas,
creating graphs (with support from additional OpenPyXL modules
Once data has been extracted from an Excel spreadsheet (or sheets) then it can be further analysed using additional third party libraires in Python such such as NumPy, Pandas, Polars, SciPy, Seaborn
and visualised using graphing frameworks such as Matplotlib.
These libraries extend the basic functionality of the language with facilities that support the processing and analysis of data, and in the case of Matplotlib. in the presentation of any results generated:
NumPy which provides sophisticated facilities for handling numbers.
SciPy which stands for Scientific Python.
Pandas is a data manipulation and analysis suite of modules. It provides facilities for reading data from a wide range of different formats, for managing and handling missing data, for reshaping data etc.
Polars Polars is a direct rival to Pandas in terms of its aims and objectives with the added focus of aiming to be faster and more efficient than Pandas.
Matplotlib is a sophisticated graphing library already discussed earlier in this book.
Seaborn is a statistical data visualisation library that is based on (builds on Matplotlib). It provides a higher level interface for creating and presenting statistical graphics.
scikit-learn provides a set of machine learning implementations that can be used with data science projects to create classifiers, regression predictors, clustering algorithms, dimensionality reduction libraries etc.
These libraries are hosted in two primary central repositories that can be access over the internet, the first is the PyPi (Python package Index) repository which is used by the pip tool to install third party libraries. The other is a repository (or channel) hosted by Anaconda.
Microsoft and Python
Python's dominance of the data analytics world has not been lost on Microsoft. In addition, Microsoft’s increasing interest in Python, including the fact that they now employ Python's creator Guido van Rossum who came out of retirement to join Microsoft in November 2020, shows that they are investing heavily in the future of Python. Part of this investment has been to consider where and how Python can be used within the Microsoft product family.
This lead to that announcement back in 2023 that it would support the Python programming language directly within the EXCEL spreadsheet software. As of January 2024, Python in Excel was available through the Microsoft 365 Insider Program. It will be rolling out to Excel for Windows first, starting with Version 2406 (Build 17726.20016) and then to other platforms at a later date. For more information on Python in Excel availability see this link.
The latest preview version of this was made available in August 2024.
“allows you to bring the capabilities of Python directly into the Excel grid. As a part of this feature, users with access to Excel for Windows will be able to add Python formulas directly into their workbooks without any installation required. These Python formulas will be run by Excel in a secure container on the Microsoft Cloud with enterprise-level security as a compliant Microsoft 365 connected experience.”
So what does this mean? It means that you can write Python code ‘in a cell’ in a grid. When the cell is ‘evaluated’ the python code will be sent to a Python runtime held within the cloud and the result returned to the Excel spreadsheet and displayed within the cell itself.
This is rather important as it means that it is not necessary to install Python locally on your machine. The Python runtime is managed / handled by Microsoft itself within its own cloud environment. It also means that even if you do have a local installation of Python on your machine, it will not be used. It also means that any modifications you have made locally will not affect the Python within Excel and thus cannot have any direct or indirect effect on it (such as incompatible versions of libraries or version of Python itself).
To make data analytics easier, the Python in Excel environment comes with a core set of Python libraires automatically. These are the libraries provided by Anaconda. Anaconda is a pre-built distribution of a widely used set of data analytics and machine learning libraries. Anaconda (well actually the underlying conda tool and the anaconda library dependency data) also manages dependencies between the libraries being used.
This means that as its just a Python with an Anaconda distribution; you will be able to use all the libraries available to any other Python environment including Pandas, NumPy etc.
How to Work with Python in Excel
First make sure you have a version of Excel that supports the Python feature.
To enable Python within a cell, enter =PY into an Excel cell and then choose PY from the function AutoComplete menu.
Alternatively Select the Formulas ribbon and then Select ‘Insert Python’.
Once Python is enabled within an Excel cell, then that cell will display a PY icon. The formula bar will display the same PY icon when the cell is selected. For example:
To read data from other cells within the spreadsheet you can use the special function xl(). This function accepts Excel objects such as cell names, ranges, tables, queries etc.
For example,
x = xl("A1")
will read the value held in cell A1 into a local python variable x.
The last expression evaluated within the Python code associated with a cell is the result displayed in that cell.
For example, a cell holding the code:
x = xl("A1")
y = xl("B1")
x + y
Will display the result of adding the value in A1 and B1 together, for example:
Summary
The introduction of Python into Excel as a directly supported programming language is very exciting. It opens up a whole host of opportunities to analytics and developers. In terms of flexibility and third party libraires Python is in a different league to the Visual Basic alternative. Of course, care needs to be taken regarding when, where and how to use Python within Excel; however that aside the future is bright for both Excel and Python.
What next?
If you want to take a deeper dive into Python, Data Analysis, or Excel, here are some of our instructor-led hands-on courses to bring your practical skills up to the next level:
To help raise awareness of challenges and vulnerabilities and ways to reduce risk, we've got a bumper crop of cyber security blog articles. We've also got a robust range of hands-on training courses covering security for non-technical staff and IT professionals
We use cookies on our website to provide you with the best user experience. If you're happy with this please continue to use the site as normal. For more information please see our Privacy Policy.