How to Search and Replace in Excel File using Python

Do you need to search and replace a list of values in a big Excel file with many sheets?

If so, I'll show you the steps to search in Excel file - list of words and replace them. In this article you can find the exact cell(with packages like xlrd, xlwt and openpyxl) and partial cell match. At the end a new Excel file is generated with the replaced values.

Let's check the example data:

A B C
4321 3210 2100
1 0 0
2 1 0
3 2 1
4 3 2

we are going to search for 0 and 1 - and replace them with False and True.

A B C
4321 3210 2100
True False False
2 True False
3 2 True
4 3 2

You can check also this video:
Easily extract information from Excel with Python and Pandas

Python and Excel - search and replace with xlrd and xlwt

In this example we are going to use two Python packages: xlrd and xlwt.

Step 1: Read Excel file

The first package reads the Excel file and a given sheet by name or index:

import xlwt
import xlrd

# read Excel file and sheet by name
workbook = xlrd.open_workbook('/home/vanx/Documents/example1.xlsx')
sheet = workbook.sheet_by_name('Test')
sheet2 = workbook.sheet_by_index(2)

Step 2: Create new Excel file

The second package - xlwt - will be used to write the data into new Excel file:

new_workbook = xlwt.Workbook()
new_sheet = new_workbook.add_sheet('Test')

Step 3: Search and replace a cell in xlsx file

The next step is to define replacement pairs like: {1:True, 0:False}:

replacement = {1:True, 0:False}

Step 4: Search and replace a cell in xlsx file

Finally the code iterates over the rows and columns is controlled by:

  • ncols - number of columns in the selected sheet
  • nrows - number of rows in the selected sheet

This is the final part of the code:

# iterate over the rows
for i in range(sheet.nrows):
    print(i)

    data = [sheet.cell_value(i, col) for col in range(sheet.ncols)]

    for index, value in enumerate(data):

        if value in replacement.keys():
            new_sheet.write(i, index, str(replacement.get(value)))
        else:
            new_sheet.write(i, index, value)

new_workbook.save('example.xls')

where new_workbook.save('example.xlsx') saves the data into file - example.xlsx

Python and Excel - search and replace with openpyxl

Another Python package exists and can be used - openpyxl. It can do the same as the previous example. Difference is that only one module is used for reading and writing new Excel file:

import openpyxl
from openpyxl.utils.cell import get_column_letter

wb = openpyxl.load_workbook('/home/vanx/Documents/example1.xlsx')
wb.sheetnames
sheet = wb["Test"]
number_rows = sheet.max_row
number_columns = sheet.max_column

replacement = {'1':True, '0':False}


for i in range(number_columns):
    for k in range(number_rows):
        cell = str(sheet[get_column_letter(i+1)+str(k+1)].value)
        for key in replacement.keys():
            if str(cell) == key:
                newCell = replacement.get(key)
                sheet[get_column_letter(i+1)+str(k+1)] = str(newCell)

wb.save('example1.xlsx')

The code above reads file: example1.xlsx from folder ~/Documents and then produces a new Excel file in the current working directory. In the file there are several sheets - we are interested in the one named - Test.

Getting the number of rows and columns is done by:

  • sheet.max_row
  • sheet.max_column

Finally we iterate and search for the values. In case of a match then we will replace the cell with the new values.

Python and Excel - partial match

The previous examples work with exact text matches for the cell value. If you need to perform partial match than you can try with Python code like:

if str(cell[0]) == key:

This will search only the first character of the cell if it's exactly the searched value.

Or if the cell contains the key:

 if key in str(cell[0]):

Regex can be used as well. The problem for such partial matches is performance - it might take resources and time for large Excel files.

Share Tweet Send
0 Comments
Loading...