Jack Card takes a look at a Python library that could make working with Excel files a great deal more productive.
Introduction
Microsoft’s Excel
is still king when it comes to simple companywide data analysis. While there are a growing number of tools trying to compete with Excel for its crown, it is still the most common choice for business. This is in part due to its ease of access (its available anywhere that Microsoft Office 365
is available) and in part due to its flexibility; it can be used for a wide range of purposes including accounting, charting, inventory tracking, and for trade data analysis or retail business intelligence reporting. Most importantly it achieves all this without the user having to be a technical expert or software developer.
Even though many companies are still performing their analysis using Excel spreadsheets, there is an increasing trend towards alternative data analysis tools and techniques.
For most companies, this means either exporting Excel data into tools such as Tableau or requiring developers to extract the data from Excel and analyse it using a programming language such as Python or R.
For example, data held in Excel files can be accessed using the Python language. It is then possible to analyse this data using Python as well as in Python data analysis libraries such as NumPy, SciPy and Pandas and visualized using graphing frameworks such as Matplotlib. This idea is illustrated below, in which a Python program that utilizes these libraries reads data from an Excel file:
Excel files
If you just want a lightweight Excel focussed library, then the OpenPyXL library may be your best option. It is a widely used, stable, open-source project that is also well documented.
The OpenPyXL library provides facilities for:
reading and writing Excel workbooks,
creating and modifying Excel worksheets,
creating and updating Excel formulas,
creating graphs (with support from additional OpenPyXL modules).
As OpenPyXL is not part of the standard Python distribution you will need to install the library yourself using a tool such as Anaconda or pip (e.g. pip install openpyxl). Alternatively, if you are using PyCharm you will be able to add the OpenPyXL library to your project.
The openpyxl.Workbook Class
The key element in the OpenPyXL library is the Workbook class. This can be imported from the openpyxl module:
from
openpyxl import Workbook
A new instance of the (in memory) Workbook can be created using the Workbook class. At this point the workbook is purely a structure within the Python program and must be saved before an actual Excel file is created.
wb = Workbook()
The openpyxl.WorkSheet objects
A workbook is always created with at least one worksheet. You can get hold of the currently active worksheet using the Workbook.active property:
ws = wb.active
Additional worksheets can be created using the Workbook.create_sheet() method:
ws = wb.create_sheet('Mysheet')
You can then access or update the title of the worksheet using the title property:
ws.title = 'New Title'
The background colour of the tab holding this title is white by default. However, you can change this providing an RGB colour code to the worksheet.sheet_properties.tabColor attribute:
ws.sheet_properties.tabColor = "1072BA"
Working with cells
It is possible to access a specific cell of a worksheet using an indexer on the worksheet. Th evalue of the index is the cells label such as A1 or C5:
ws['A1'] = 42
or
cell = ws['A1']
The above statement returns a cell object; you can obtain the value of the cell using the value property, for example:
print(cell.value)
There is also the Worksheet.cell() method which provides access to cells using row and column notation:
d = ws.cell(row=4, column=2, value=10)
It is also possible to retrieve a range of rows or columns. This can be done using slicing:
cell_range = ws['A1':'C2']
It is also possible to obtain a range of columns or rows. In the following col_range and row_range hold a range of values:
As well as reading the data from a worksheet, it is also possible to modify the contents of the worksheet. For example, a row of values can also be added at the current position within the Excel file using append:
ws.append([1, 2, 3])
This will add a row to the Excel file containing the values 1, 2, and 3.
It is also possible to set the value of a cell to be an Excel formula such as:
ws['A3'] = '=SUM(A1, A2)'
A workbook is actually only a structure in memory; it must be saved to a file for permanent storage. These workbooks can be saved using the save() method which is a method that takes a filename and writes the Workbook out in Excel format:
workbook = Workbook()
...
workbook.save('balances.xlsx')
Sample Excel File Creation Application
The following simple application creates a Workbook with two worksheet and supplies a simple Excel formula that sums the values held into other cells:
from
openpyxl import Workbook
def main():
print('Starting Write Excel Example with openPyXL')
workbook = Workbook()
The Excel file generated from this can be viewed in Excel as shown below:
Loading a Workbook from an Excel file
Of course, in many cases it is necessary not just to create Excel files for data export but also to import data from an existing Excel file. This can be done using the OpenPyXL load_workbook() function. This function opens the specified Excel file (in read only mode by default) and returns a Workbook object. An example is provided below:
from
openpyxl import load_workbook
workbook = load_workbook(filename='sample.xlsx')
It is now possible to process the workbook as required by your application. This could involve extracting the data held and then analysing it using NumPy and SciPy etc.
The workbook object provides properties that allow access to the contents of the workbook including::
workbook.active returns the active worksheet object.
workbook.sheetnames returns the names (strings) of the worksheets in this workbook.
workbook.worksheets returns a list of worksheet objects.
The following sample application reads the Excel file created above:
from
openpyxl import load_workbook
def main():
print('Starting reading Excel file using openPyXL')
workbook = load_workbook(filename='sample.xlsx')
print(workbook.active)
print(workbook.sheetnames)
print(workbook.worksheets)
print('-' * 10)
ws = workbook['my worksheet']
print(ws['A1'])
print(ws['A1'].value)
print(ws['A2'].value)
print(ws['A3'].value)
print('-' * 10)
for sheet in workbook:
print(sheet.title)
print('-' * 10)
cell_range = ws['A1':'A3']
for cell in cell_range:
print(cell[0].value)
print('-' * 10)
print('Finished reading Excel file using openPyXL')
if __name__ == '__main__':
main()
The output of which is illustrated below:
Starting reading Excel file using openPyXL
<Worksheet "my worksheet">
['my worksheet', 'my other sheet']
[<Worksheet "my worksheet">, <Worksheet "my other sheet">]
----------
<Cell 'my worksheet'.A1>
42
12
=SUM(A1, A2)
----------
my worksheet
my other sheet
----------
42
12
=SUM(A1, A2)
----------
Finished reading Excel file using OpenPyXL
This blog has illustrated how data can be extracted from an Excel file for use within a Python program. Once this data has been extracted it can be processed or analysed as required using any of the libraries / modules available within the Python ecosystem.
Learn more about Python
If you found this article useful and interesting check out our range of instructor-led Python Courses:
We maintain a robust range of hands-on training courses covering Coding, Data Science, DevOps, Security and more - available for on-site and online delivery as part of a full-stack training programme or as short standalone workshops. We would love to discuss your learning needs - get in touch for a no-obligation scoping chat.
We use cookies on our website to provide you with the best user experience. If you're happy with this please continue to use the site as normal. For more information please see our Privacy Policy.