Python and Excel: a marriage made in heaven or in hell?

Microsoft has announced that that they will be adding Python to Excel. This will make it possible to integrate Python code within an Excel spreadsheet to perform Python based analytics within the same workflow. But is it as simple as that?

30-08-2023
Bcorp Logo


Python and Excel: a marriage made in heaven or in hell?


If you are a programmer or a data analyst, there is a fairly strong chance you're using Python for at least some of your work – it's arguably one of the most popular languages for such tasks. However, if you are more of a business analyst who prefers a more graphical way to analyse data, then Excel is probably your go-to tool. 

Until recently, if you wanted to analyse your Excel data with Python then you needed to export that data into a CSV file or use a third-party library to read the Excel file (such as openpxyl or Pandas). However, Microsoft has just announced that that they will be adding Python to Excel. This will make it possible to integrate code within an Excel spreadsheet to perform Python-based analytics within the same workflow.

Python for Data Analytics

Python in the Data Analytics world is huge – it is the most widely used programming language for data analytics by far; other rivals such as R are of course used but Python is the clear favourite. Although it should be noted that Python is not used on its own, usually additional third party libraries such as NumPy, Pandas, Polars, SciPy, Seaborn and MatPlotLib are used. These libraries extend the basic functionality of the language with facilities that support the processing and analysis of data and in the case of MatPlotLib in the presentation of any results generated:

  • NumPy which provides sophisticated facilities for handling numbers.
  • SciPy which stands for Scientific Python.
  • Pandas is a data manipulation and analysis suite of modules. It provides facilities for reading data from a wide range of different formats, for managing and handling missing data, for reshaping data etc.
  • Polars Polars is a direct rival to Pandas in terms of its aims and objectives with the added focus of aiming to be faster and more efficient than Pandas.
  • Matplotlib is a sophisticated graphing library already discussed earlier in this book.
  • Seaborn is a statistical data visualisation library that is based on (builds on Matplotlib). It provides a higher level interface for creating and presenting statistical graphics.
  • SciKitLearn provides a set of machine learning implementations that can be used with data science projects to create classifiers, regression predictors, clustering algorithms, dimensionality reduction libraries etc.

These libraries are hosted in two primary central repositories that can be access over the internet, the first is the PyPi (Python package Index) repository which is used by the pip tool to install third party libraries. The other is a repository (or channel) hosted by Anaconda.

Anaconda is a popular distribution which bundles the Python programming language and a wealth of useful tools used for data analysis, machine learning, and scientific computing / research.

Excel

Python and Excel: a marriage made in heaven or in hell?


Excel has been the mainstay of many organisations' basic data analytical operations for decades. For many Excel users it provides all the functionality and features they need in a familiar, easy-to-use environment. These users may not be familiar with programming per se but are often comfortable with creating equations and automation routines (e.g. using VBA) within Excel and can produce sophisticated results and graphs with ease.

Why Bring Python and Excel Together?

At this point you may be wondering why Microsoft wants to bring Python and Excel together? After all, at first sight they seem to offer similar features (certainly once Python is paired with the aforementioned libraries). One answer is that it allows for flexibility. It is now possible to run Python code within Excel and to perform data analytics or operations that may already be available in one or more libraries or custom modules. This allows the power of Python to be exploited within an Excel spreadsheet.

However, a possibly more compelling answer is that it allows the power of Python to be integrated within an existing Excel workflow. Many Excel users, certainly within the business analysis domain, will not be familiar with Python or how to execute Python scripts / programs. Instead, by embedding Python within Excel they can continue to work in a way which is familiar to them using existing and established workflows. The fact that part of the process is now being automated or analysed using Python may be completely hidden from them. This thus widens the applicability and usability of Python to such an audience within minimal fuss.

Python and Excel: a marriage made in heaven or in hell?


How does it work?

The big question here is how will this work? Microsoft have introduced a new function into Excel, the PY function. This function can be used to Python data to be exposed within the grid of an Excel spreadsheet. The python integration is part of Excel’s built-in connectors and Power Query.

Microsoft are also partnering with Anaconda to provide the Anaconda distribution of libraries so that they can be easily integrated into the Python code defined within an Excel spreadsheet.

Perhaps the most controversial aspect of the Python integration into Excel is that the Python calculations will run on a remote server within the Microsoft cloud, rather than locally using a local Python interpreter. This is probably for two reasons, firstly it simplifies the local installation avoiding the need to install Python and use virtual environments to handle multiple libraries in different projects or Excel files / spreadsheets. It also allows Microsoft more control over the execution of that Python code. As Microsoft 365 is something that combines naturally with the cloud this may be an obvious approach for them to adopt.

Why Not to Use Python in Excel?

Perhaps the final question we should ask is, "why shouldn't you want to use Python within Excel?" There are a few possible reasons, one being this remote cloud execution model; in some organisations this may not be allowed due to the sensitive nature of the data etc. It might also be considered inappropriate to embed such potentially powerful features within a simple spreadsheet with limited controls over the use of the data. Further, one might wonder at the additional complexity of integrating Python and Excel together, since you can do most of the things you want in either environment!

Summary

The introduction of Python within Excel opens up a huge range of features to a new world. There is no comparison between VBA and Python in terms of data analytics functionality. However, care probably needs to be taken to determine where, when, and why an organisation might want to use such features within their existing Excel based data analytics workflows.


Would you like to know more?

We've got lots of great Python training courses to choose from:

Share this post on:

We would love to hear from you

Get in touch

or call us on 020 3137 3920

Get in touch