Jack Card looks at one library that is widely used within the Python ecosystem to read and write Excel files.
Jack Card takes a look at a Python library that could make working with Excel files a great deal more productive.
Microsoft’s Excel is still king when it comes to simple companywide data analysis. While there are a growing number of tools trying to compete with Excel for its crown, it is still the most common choice for business. This is in part due to its ease of access (its available anywhere that Microsoft Office 365 is available) and in part due to its flexibility; it can be used for a wide range of purposes including accounting, charting, inventory tracking, and for trade data analysis or retail business intelligence reporting. Most importantly it achieves all this without the user having to be a technical expert or software developer.
Even though many companies are still performing their analysis using Excel spreadsheets, there is an increasing trend towards alternative data analysis tools and techniques.
For most companies, this means either exporting Excel data into tools such as Tableau or requiring developers to extract the data from Excel and analyse it using a programming language such as Python or R.
For example, data held in Excel files can be accessed using the Python language. It is then possible to analyse this data using Python as well as in Python data analysis libraries such as NumPy, SciPy and Pandas and visualized using graphing frameworks such as Matplotlib. This idea is illustrated below, in which a Python program that utilizes these libraries reads data from an Excel file:
If you just want a lightweight Excel focussed library, then the OpenPyXL library may be your best option. It is a widely used, stable, open-source project that is also well documented.
The OpenPyXL library provides facilities for:
As OpenPyXL is not part of the standard Python distribution you will need to install the library yourself using a tool such as Anaconda or pip (e.g. pip install openpyxl). Alternatively, if you are using PyCharm you will be able to add the OpenPyXL library to your project.
The key element in the OpenPyXL library is the Workbook class. This can be imported from the openpyxl module:
from openpyxl import Workbook
A new instance of the (in memory) Workbook can be created using the Workbook class. At this point the workbook is purely a structure within the Python program and must be saved before an actual Excel file is created.
wb = Workbook()
A workbook is always created with at least one worksheet. You can get hold of the currently active worksheet using the Workbook.active property:
ws = wb.active
Additional worksheets can be created using the Workbook.create_sheet() method:
ws = wb.create_sheet('Mysheet')
You can then access or update the title of the worksheet using the title property:
ws.title = 'New Title'
The background colour of the tab holding this title is white by default. However, you can change this providing an RGB colour code to the worksheet.sheet_properties.tabColor attribute:
ws.sheet_properties.tabColor = "1072BA"
It is possible to access a specific cell of a worksheet using an indexer on the worksheet. Th evalue of the index is the cells label such as A1 or C5:
ws['A1'] = 42
cell = ws['A1']
The above statement returns a cell object; you can obtain the value of the cell using the value property, for example:
There is also the Worksheet.cell() method which provides access to cells using row and column notation:
d = ws.cell(row=4, column=2, value=10)
It is also possible to retrieve a range of rows or columns. This can be done using slicing:
cell_range = ws['A1':'C2']
It is also possible to obtain a range of columns or rows. In the following col_range and row_range hold a range of values:
col = ws['C'] col_range = ws['C:D'] row10 = ws row_range = ws[5:10]
As well as reading the data from a worksheet, it is also possible to modify the contents of the worksheet. For example, a row of values can also be added at the current position within the Excel file using append:
ws.append([1, 2, 3])
This will add a row to the Excel file containing the values 1, 2, and 3.
It is also possible to set the value of a cell to be an Excel formula such as:
ws['A3'] = '=SUM(A1, A2)'
A workbook is actually only a structure in memory; it must be saved to a file for permanent storage. These workbooks can be saved using the save() method which is a method that takes a filename and writes the Workbook out in Excel format:
workbook = Workbook() ...
The following simple application creates a Workbook with two worksheet and supplies a simple Excel formula that sums the values held into other cells:
from openpyxl import Workbook
def main(): print('Starting Write Excel Example with openPyXL') workbook = Workbook()
# Get the current active worksheet
ws = workbook.active ws.title = 'my worksheet' ws.sheet_properties.tabColor = '1072BA' ws['A1'] = 42 ws['A2'] = 12 ws['A3'] = '=SUM(A1, A2)' ws2 = workbook.create_sheet(title='my other sheet') ws2['A1'] = 3.42 ws2.append([1, 2, 3]) ws2.cell(column=2, row=1, value=15) workbook.save('sample.xlsx') print('Done Write Excel Example') if __name__ == '__main__': main()
The Excel file generated from this can be viewed in Excel as shown below:
Of course, in many cases it is necessary not just to create Excel files for data export but also to import data from an existing Excel file. This can be done using the OpenPyXL load_workbook() function. This function opens the specified Excel file (in read only mode by default) and returns a Workbook object. An example is provided below:
from openpyxl import load_workbook
workbook = load_workbook(filename='sample.xlsx')
It is now possible to process the workbook as required by your application. This could involve extracting the data held and then analysing it using NumPy and SciPy etc.
The workbook object provides properties that allow access to the contents of the workbook including::
The following sample application reads the Excel file created above:
from openpyxl import load_workbook
def main(): print('Starting reading Excel file using openPyXL') workbook = load_workbook(filename='sample.xlsx') print(workbook.active) print(workbook.sheetnames) print(workbook.worksheets) print('-' * 10) ws = workbook['my worksheet'] print(ws['A1']) print(ws['A1'].value) print(ws['A2'].value) print(ws['A3'].value) print('-' * 10) for sheet in workbook: print(sheet.title) print('-' * 10) cell_range = ws['A1':'A3'] for cell in cell_range: print(cell.value) print('-' * 10) print('Finished reading Excel file using openPyXL') if __name__ == '__main__': main()
The output of which is illustrated below:
Starting reading Excel file using openPyXL
<Worksheet "my worksheet"> ['my worksheet', 'my other sheet'] [<Worksheet "my worksheet">, <Worksheet "my other sheet">] ---------- <Cell 'my worksheet'.A1> 42 12 =SUM(A1, A2) ---------- my worksheet my other sheet ---------- 42 12 =SUM(A1, A2) ----------
Finished reading Excel file using OpenPyXL
This blog has illustrated how data can be extracted from an Excel file for use within a Python program. Once this data has been extracted it can be processed or analysed as required using any of the libraries / modules available within the Python ecosystem.
If you found this article useful and interesting check out our range of instructor-led Python Courses:
Or take a look at some of our other Python blogs: