Learning to Excel with Python

Jack Card looks at one library that is widely used within the Python ecosystem to read and write Excel files.

23-06-2021
Bcorp Logo
Learning to Excel with Python

If you've just found this post, check out our updated guide on How to Excel with Python in 2024!


Jack Card takes a look at a Python library that could make working with Excel files a great deal more productive.

Introduction

Microsoft’s Excel is still king when it comes to simple companywide data analysis. While there are a growing number of tools trying to compete with Excel for its crown, it is still the most common choice for business. This is in part due to its ease of access (its available anywhere that Microsoft Office 365 is available) and in part due to its flexibility; it can be used for a wide range of purposes including accounting, charting, inventory tracking, and for trade data analysis or retail business intelligence reporting. Most importantly it achieves all this without the user having to be a technical expert or software developer.

Even though many companies are still performing their analysis using Excel spreadsheets, there is an increasing trend towards alternative data analysis tools and techniques.

For most companies, this means either exporting Excel data into tools such as Tableau or requiring developers to extract the data from Excel and analyse it using a programming language such as Python or R.

For example, data held in Excel files can be accessed using the Python language. It is then possible to analyse this data using Python as well as in Python data analysis libraries such as NumPy, SciPy and Pandas and visualized using graphing frameworks such as Matplotlib. This idea is illustrated below, in which a Python program that utilizes these libraries reads data from an Excel file:

Learning to Excel with Python

Excel files

If you just want a lightweight Excel focussed library, then the OpenPyXL library may be your best option. It is a widely used, stable, open-source project that is also well documented.

The OpenPyXL library provides facilities for:

  • reading and writing Excel workbooks,
  • creating and modifying Excel worksheets,
  • creating and updating Excel formulas,
  • creating graphs (with support from additional OpenPyXL modules).

As OpenPyXL is not part of the standard Python distribution you will need to install the library yourself using a tool such as Anaconda or pip (e.g. pip install openpyxl). Alternatively, if you are using PyCharm you will be able to add the OpenPyXL library to your project.

Learning to Excel with Python


The openpyxl.Workbook Class

The key element in the OpenPyXL library is the Workbook class. This can be imported from the openpyxl module:

from openpyxl import Workbook

A new instance of the (in memory) Workbook can be created using the Workbook class. At this point the workbook is purely a structure within the Python program and must be saved before an actual Excel file is created.

wb = Workbook()

The openpyxl.WorkSheet objects

A workbook is always created with at least one worksheet. You can get hold of the currently active worksheet using the Workbook.active property:

ws = wb.active

Additional worksheets can be created using the Workbook.create_sheet() method:

ws = wb.create_sheet('Mysheet')

You can then access or update the title of the worksheet using the title property:

ws.title = 'New Title'

The background colour of the tab holding this title is white by default. However, you can change this providing an RGB colour code to the worksheet.sheet_properties.tabColor attribute:

ws.sheet_properties.tabColor = "1072BA"

Working with cells

Learning to Excel with Python

It is possible to access a specific cell of a worksheet using an indexer on the worksheet. Th evalue of the index is the cells label such as A1 or C5:

ws['A1'] = 42

or

cell = ws['A1']

The above statement returns a cell object; you can obtain the value of the cell using the value property, for example:

print(cell.value)

There is also the Worksheet.cell() method which provides access to cells using row and column notation:

d = ws.cell(row=4, column=2, value=10)

It is also possible to retrieve a range of rows or columns. This can be done using slicing:

cell_range = ws['A1':'C2']

It is also possible to obtain a range of columns or rows. In the following col_range and row_range hold a range of values:

col = ws['C']
col_range = ws['C:D']
row10 = ws[10]
row_range = ws[5:10]

As well as reading the data from a worksheet, it is also possible to modify the contents of the worksheet. For example, a row of values can also be added at the current position within the Excel file using append:

ws.append([1, 2, 3])

This will add a row to the Excel file containing the values 1, 2, and 3.

It is also possible to set the value of a cell to be an Excel formula such as:

ws['A3'] = '=SUM(A1, A2)'

A workbook is actually only a structure in memory; it must be saved to a file for permanent storage. These workbooks can be saved using the save() method which is a method that takes a filename and writes the Workbook out in Excel format:

workbook = Workbook()
...
workbook.save('balances.xlsx')

Sample Excel File Creation Application

The following simple application creates a Workbook with two worksheet and supplies a simple Excel formula that sums the values held into other cells:

from openpyxl import Workbook


def main():
    print('Starting Write Excel Example with openPyXL')
workbook = Workbook()

# Get the current active worksheet

    ws = workbook.active
    ws.title = 'my worksheet'
    ws.sheet_properties.tabColor = '1072BA'
    ws['A1'] = 42    
    ws['A2'] = 12
    ws['A3'] = '=SUM(A1, A2)'
    ws2 = workbook.create_sheet(title='my other sheet')
    ws2['A1'] = 3.42
    ws2.append([1, 2, 3])
    ws2.cell(column=2, row=1, value=15)
    workbook.save('sample.xlsx')
    print('Done Write Excel Example')
if __name__ == '__main__':
    main()

The Excel file generated from this can be viewed in Excel as shown below:

Learning to Excel with Python


Loading a Workbook from an Excel file

Of course, in many cases it is necessary not just to create Excel files for data export but also to import data from an existing Excel file. This can be done using the OpenPyXL load_workbook() function. This function opens the specified Excel file (in read only mode by default) and returns a Workbook object. An example is provided below:

from openpyxl import load_workbook


workbook = load_workbook(filename='sample.xlsx')

It is now possible to process the workbook as required by your application. This could involve extracting the data held and then analysing it using NumPy and SciPy etc.

The workbook object provides properties that allow access to the contents of the workbook including::

  • workbook.active returns the active worksheet object.
  • workbook.sheetnames returns the names (strings) of the worksheets in this workbook.
  • workbook.worksheets returns a list of worksheet objects.

The following sample application reads the Excel file created above:

from openpyxl import load_workbook


def main():
    print('Starting reading Excel file using openPyXL')
    workbook = load_workbook(filename='sample.xlsx')
    print(workbook.active)
    print(workbook.sheetnames)
    print(workbook.worksheets)
    print('-' * 10)
    ws = workbook['my worksheet']
    print(ws['A1'])
    print(ws['A1'].value)
    print(ws['A2'].value)
    print(ws['A3'].value)
    print('-' * 10)
    for sheet in workbook:
        print(sheet.title)
    print('-' * 10)
    cell_range = ws['A1':'A3']
    for cell in cell_range:
        print(cell[0].value)
    print('-' * 10)
    print('Finished reading Excel file using openPyXL')
    if __name__ == '__main__':
main()

The output of which is illustrated below:

Starting reading Excel file using openPyXL

<Worksheet "my worksheet">
['my worksheet', 'my other sheet']
[<Worksheet "my worksheet">, <Worksheet "my other sheet">]
----------
<Cell 'my worksheet'.A1>
42
12
=SUM(A1, A2)
----------
my worksheet
my other sheet
----------
42
12
=SUM(A1, A2)
----------

Finished reading Excel file using OpenPyXL

This blog has illustrated how data can be extracted from an Excel file for use within a Python program. Once this data has been extracted it can be processed or analysed as required using any of the libraries / modules available within the Python ecosystem.

Learn more about Python

If you found this article useful and interesting check out our range of instructor-led Python Courses:

Or take a look at some of our other Python blogs:

Share this post on:

We would love to hear from you

Get in touch

or call us on 020 3137 3920

Get in touch