I have a requirement to compare one Excel file with another and create a third Excel file containing True (where the cell values match) and False (where they do not), using Python.
Can someone please help with a piece of code and an explanation?
Much appreciated, thanks in advance.
If you could please specify what tools you plan on using that would be great. We can accomplish the task in python using the openpyxl library.
Assuming that you are using python 3 with openpyxl, and your files are located in directory "C:\Users\Me\files" and are called "file1.xlsx" and "file2.xlsx":
import os
import openpyxl
from openpyxl.utils import get_column_letter

path = 'C:\\Users\\Me\\files'

# open the Excel sheets
wb1 = openpyxl.load_workbook(os.path.join(path, 'file1.xlsx'))
ws1 = wb1.active
wb2 = openpyxl.load_workbook(os.path.join(path, 'file2.xlsx'))
ws2 = wb2.active

# create new workbook
wb3 = openpyxl.Workbook()
ws3 = wb3.active

# compare each element (openpyxl rows and columns are 1-based)
for row in range(1, ws1.max_row + 1):
    for column in range(1, ws1.max_column + 1):
        column_letter = get_column_letter(column)
        cell = column_letter + str(row)
        if ws1[cell].value == ws2[cell].value:
            ws3[cell].value = 'True'
        else:
            ws3[cell].value = 'False'

wb3.save(os.path.join(path, 'file3.xlsx'))
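The comparison itself does not depend on openpyxl, so it can be sketched and tested on plain lists of row tuples. This is only a minimal sketch: the function name compare_grids and the sample data are made up for illustration, and it assumes that a cell present in only one sheet should count as False. With openpyxl, row tuples like these could be obtained from ws.iter_rows(values_only=True).

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for cells that exist in only one grid


def compare_grids(rows_a, rows_b):
    """Return a grid of booleans: True where both grids hold equal values."""
    result = []
    for row_a, row_b in zip_longest(rows_a, rows_b, fillvalue=()):
        result.append([
            a is not _MISSING and b is not _MISSING and a == b
            for a, b in zip_longest(row_a, row_b, fillvalue=_MISSING)
        ])
    return result


# Hypothetical sample data standing in for two worksheets
sheet_a = [("id", "name"), (1, "Ann"), (2, "Bob")]
sheet_b = [("id", "name"), (1, "Ann"), (2, "Rob"), (3, "Cy")]
print(compare_grids(sheet_a, sheet_b))
```

Padding with a sentinel rather than None keeps an empty cell (None) in both sheets comparing as equal, while a row or column that exists in only one sheet is reported as a mismatch.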
I'm a beginner in Python and I'm developing a program that takes some data from one .xlsx file and puts it into another .xlsx file.
To do so I decided to use openpyxl. Here is the beginning of my code:
path1 = "sourceFile.xlsx"
path2 = "targetFile.xlsx"
sheet1 = openpyxl.load_workbook(path1, data_only=True)
sheet2 = openpyxl.load_workbook(path2)
As you can see, I use "data_only=True" to take only the data from my source file. My problem is that with this solution, "None" is returned for a few cells of the source file. When I remove the "data_only=True" parameter, the formula is returned, "=B28" in this case. That is not what I want, because cell B28 of the target file does not have the same value as cell B28 of the source file.
I have already searched for solutions but, surprisingly, found nothing. If you have any ideas, they are welcome!
If B28's value in the original file is different than the output file, then the issue is likely with the code you're using to copy the cells. When asked how you're extracting the cells, you gave code for extracting the value of a single cell. How are you extracting ALL the cells? For-loop? If you shared that code, we can further analyze this problem.
I'm including code which copies values from one file to another, you should be able to tweak this to your needs.
from openpyxl import load_workbook, Workbook

## VERSION 1: Output will have formulas from WB1
WB1 = load_workbook('int_column.xlsx')
WB1_WS1 = WB1['Sheet']
WB2 = Workbook()
WB2_WS1 = WB2.active  # get the active sheet, so you don't need to create then delete one
# copy rows
for x, row in enumerate(WB1_WS1.rows):
    if x < 100:  # only copy first 100 rows
        num_cells_in_row = len(row)
        for y in range(num_cells_in_row):
            WB2_WS1.cell(row=x + 1, column=y + 1).value = WB1_WS1.cell(row=x + 1, column=y + 1).value
WB2.save('copied.xlsx')

## VERSION 2: Output will have value displayed in cells in WB1
WB1 = load_workbook('int_column.xlsx', data_only=True)
WB1_WS1 = WB1['Sheet']
WB2 = Workbook()
WB2_WS1 = WB2.active  # get the active sheet, so you don't need to create then delete one
# copy rows
for x, row in enumerate(WB1_WS1.rows):
    if x < 100:  # only copy first 100 rows
        num_cells_in_row = len(row)
        for y in range(num_cells_in_row):
            WB2_WS1.cell(row=x + 1, column=y + 1).value = WB1_WS1.cell(row=x + 1, column=y + 1).value
WB2.save('copied.xlsx')
Please post more code if you need further assistance.
I have an assignment to do for my boring online class and I couldn't come up with a way to do it. I'm told to calculate a ratio from four columns with the formula ratio = weight / (height * length * width). But I'm bad at using Microsoft Excel and, ironically, we haven't learnt anything related to it. Then I remembered that there is a Python library that works with Excel sheets. So how can I easily calculate ratio = weight / (height * width * length) for every single row of this Excel sheet using openpyxl?
Though I've never used openpyxl library I tried to find a solution to your problem. If the spreadsheet you're working on looks like the one below then you should be able to work with this script.
Sample spreadsheet image
from openpyxl import load_workbook
# Modify filename and sheet name where the data is
workbook_filename = 'workbook.xlsx'
sheet_name = 'Sheet1'
wb = load_workbook(workbook_filename)
ws = wb[sheet_name]
# If the data is stored differently in your file, you have to modify
# this loop to suit your needs
for row in ws.iter_rows(min_row=2, max_row=3, max_col=5):
    row[4].value = row[0].value / (row[1].value * row[2].value * row[3].value)
wb.save('result.xlsx')
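Real sheets often contain empty or zero cells, which would make the division above fail. As a rough sketch, the calculation can be wrapped in a small helper that returns None when the ratio cannot be computed. The helper name ratio and the sample rows are invented for illustration, and the column order (weight, height, width, length) is an assumption about the spreadsheet.

```python
def ratio(weight, height, width, length):
    """Return weight / (height * width * length), or None if it cannot be computed."""
    values = (weight, height, width, length)
    if any(not isinstance(v, (int, float)) for v in values):
        return None  # empty cell (None) or text cell
    volume = height * width * length
    if volume == 0:
        return None  # avoid dividing by zero
    return weight / volume


# Hypothetical rows: (weight, height, width, length)
rows = [(10, 2, 1, 5), (8, None, 2, 2), (3, 0, 1, 1)]
print([ratio(*r) for r in rows])
```

Inside the openpyxl loop, the result could then be assigned to the fifth cell of each row, leaving the cell empty when the helper returns None.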
I would like to use OpenPyXL to search through a workbook, but I'm running into some issues that I'm hoping someone can help with.
Here are a few of the obstacles/to-dos:
I have an unknown number of sheets & cells
I want to search through the workbook and place the sheet names in an array
I want to cycle through each array item and search for cells containing a specific string
I have cells with UNC paths that reference an old server. I need to extract all the text after the server name within the UNC path, update the server name, and concatenate the remaining text back onto the server name
e.g. \\file-server\blah\blah\blah.xlsx; extract \\file-server\; replace with \\file-server1\; put the remaining blah\blah\blah.xlsx after the new name.
Save xlsx document
I'm new to Python, so would someone be able to point me in the right direction? Sample code is appreciated, because all I know how to do at this point is search through a known workbook, with known sheet names, and then print the data. I don't know how to include wildcards when iterating through worksheets & cells.
What I've done to show the contents of the cells:
from openpyxl import load_workbook, worksheet

def main():
    #read workbook to get data
    wb = load_workbook(filename = 'Book1_test.xlsx', use_iterators = True)
    ws = wb.get_sheet_by_name(name = 'Sheet1')
    #ws = wb.worksheets

    #Iterate through worksheet and print cell contents
    for row in ws.iter_rows():
        for cell in row:
            print cell.value

    #Iterate through workbook & print worksheets
    #for sheet in wb.worksheets:
    #    print sheet

if __name__ == '__main__':
    main()
-----------------------Update-------------------------
I'm able to search through the cells and extract the server name from the cell, but I'm not able to save the spreadsheet because I'm in read-only mode. When I try to switch to optimized_write=True I get the error:
AttributeError: 'ReadOnlyCell' object has no attribute 'upper'
Here's my code:
from openpyxl import load_workbook, worksheet, Workbook

def main():
    #read workbook to get data
    wb = load_workbook(filename = 'Book1_test.xlsx', use_iterators = True)
    ws = wb.get_sheet_by_name(name = 'Sheet1')
    #ws = wb.worksheets

    #Iterate through worksheet and print cell contents
    for row in ws.iter_rows():
        for cell in row:
            cellContent = str(cell.value)
            #Scans the first 14 characters of the string for the server name
            if cellContent[:14] == '\\\\file-server\\':
                #open workbook in write mode?
                wb = Workbook(optimized_write=True)
                ws = wb.create_sheet()
                #update cell content
                ws[cell] = '\\\\file-server1\\' + cellContent[14:]
                print cellContent[:14]

    #save workbooks
    wb.save('Book1_test.xlsx')

if __name__ == '__main__':
    main()
Does anyone know how to update cell contents?
Why don't you read the documentation? If you simply open the workbook with no flags you can edit it.
This is a duplicate of OpenPyXL + How can I search for content in a cell in Excel, and if the content matches the search criteria update the content?
I don't think you can update cell contents. You can open a file to read, or open a new file to write to.
I think you have to create a new workbook, and every cell that you read, if you choose to not modify it, write it out to your new workbook. In your sample code, you are overwriting wb (used to read) with the wb (used to write). Pull it out of the for loop, assign a different name to it.
You can update the content of a cell. You need to assign a value:
from openpyxl import load_workbook
workBook = load_workbook('example.xlsx')
sheet = workBook['sheet']  # get_sheet_by_name() is deprecated in current openpyxl
i, j = 1, 1  # 1-based row and column of the cell to update
a = sheet.cell(row=i, column=j)
a.value = 'new value'
and then save:
workBook.save('example.xlsx')
Writing a whole range of rows works along these lines (as an idea):
sheet = workBook.create_sheet(index=1, title='Hipster')  # new sheet named 'Hipster'
for counter in range(1, 11):
    sheet['A' + str(counter)] = 'Hola'
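The server rename described in the question is plain string handling and can be sketched separately from the workbook code. This is only an illustration: the helper name replace_server is made up, and it assumes the server name should be matched case-insensitively at the very start of the cell text. Each cell's value could be passed through it and the result assigned back to cell.value before saving.

```python
def replace_server(cell_value, old_server, new_server):
    """Swap the server name at the start of a UNC path; leave other values alone."""
    old_prefix = "\\\\" + old_server + "\\"
    if isinstance(cell_value, str) and cell_value.lower().startswith(old_prefix.lower()):
        # keep everything after the old prefix and put the new server in front
        return "\\\\" + new_server + "\\" + cell_value[len(old_prefix):]
    return cell_value  # numbers, None, and unrelated text pass through unchanged


print(replace_server(r"\\file-server\blah\blah\blah.xlsx", "file-server", "file-server1"))
```

Because the prefix includes the trailing backslash, a path that already points at \\file-server1\ is not rewritten a second time.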
I am able to write to a new xlsx workbook using
import xlsxwriter
def write_column(csvlist):
    workbook = xlsxwriter.Workbook("filename.xlsx", {'strings_to_numbers': True})
    worksheet = workbook.add_worksheet()
    row = 0
    col = 0
    for i in csvlist:
        worksheet.write(row, col, i)  # write() takes (row, col, value)
        row += 1
    workbook.close()
but couldn't find a way to write to an existing workbook.
Please help me to write/update cells in an existing workbook using xlsxwriter or any alternative.
Quote from xlsxwriter module documentation:
This module cannot be used to modify or write to an existing Excel
XLSX file.
If you want to modify existing xlsx workbook, consider using openpyxl module.
See also:
Modify an existing Excel file using Openpyxl in Python
Use openpyxl to edit a Excel2007 file (.xlsx) without changing its own styles?
You can use this code to open the test.xlsx file, modify cell A1, and then save it under a new name:
import openpyxl
xfile = openpyxl.load_workbook('test.xlsx')
sheet = xfile['Sheet1']  # get_sheet_by_name() is deprecated in current openpyxl
sheet['A1'] = 'hello world'
xfile.save('text2.xlsx')
Note that openpyxl does not have a large toolbox for manipulating and editing images. Xlsxwriter has methods for images, but on the other hand cannot import existing worksheets...
I have found that this works for rows...
I'm sure there's a way to do it for columns...
import openpyxl
oxl = openpyxl.load_workbook('File Location Here')
xl = oxl['SheetName']
x = 0
col = "A"
row = 1  # openpyxl rows are 1-based, so there is no row 0
while row <= 100:
    cell = col + str(row)  # e.g. 'A1'
    xl[cell] = x
    row = row + 1
    x = x + 1
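To step across columns instead of down rows, the cell address needs a column letter rather than a fixed "A". openpyxl ships get_column_letter in openpyxl.utils for this. The sketch below shows the same conversion in plain Python so the idea can be tried without a workbook. The names column_letter and row_addresses are made up for illustration.

```python
def column_letter(index):
    """Convert a 1-based column index to its Excel letter (1 -> 'A', 27 -> 'AA')."""
    if index < 1:
        raise ValueError("column index must be 1 or greater")
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def row_addresses(row, first_column, last_column):
    """List the cell addresses of one row, from first_column to last_column inclusive."""
    return [column_letter(c) + str(row) for c in range(first_column, last_column + 1)]


print(row_addresses(1, 1, 3))
```

Each address in the returned list could then be used as a worksheet key in the same way the loop above uses col + str(row).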
You can do this with xlwings as well:
import xlwings as xw
for book in xw.books:
    print(book)
If you have trouble writing to an existing Excel file because it has already been created, add a check like the one below:
import os
PATH = 'filename.xlsx'
if os.path.isfile(PATH):
    print("File exists and will be overwritten NOW")
else:
    print("The file is missing, a new one is created")
...
and here goes the part with the data you want to add
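The existence check can be tried safely in a temporary folder. This is a minimal sketch using only the standard library. The helper name describe_target and its return strings are invented for illustration, and the empty placeholder file is not a valid workbook.

```python
import os
import tempfile


def describe_target(path):
    """Say whether saving to `path` would overwrite a file or create a new one."""
    if os.path.isfile(path):
        return "exists, will be overwritten"
    return "missing, will be created"


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "filename.xlsx")
    print(describe_target(target))   # the file is not there yet
    open(target, "wb").close()       # create an empty placeholder file
    print(describe_target(target))   # now it exists
```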
How do I open a file that is an Excel file for reading in Python?
I've opened text files, for example, sometextfile.txt with the reading command. How do I do that for an Excel file?
Edit:
In the newer version of pandas, you can pass the sheet name as a parameter.
file_name = 'file.xlsx'  # placeholder: path to file + file name
sheet = 0  # placeholder: sheet name or sheet number or list of sheet numbers and names
import pandas as pd
df = pd.read_excel(io=file_name, sheet_name=sheet)
print(df.head(5)) # print first 5 rows of the dataframe
Check the docs for examples on how to pass sheet_name: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
Old version:
you can use pandas package as well....
When you are working with an excel file with multiple sheets, you can use:
import pandas as pd
xl = pd.ExcelFile(path + filename)
xl.sheet_names
>>> [u'Sheet1', u'Sheet2', u'Sheet3']
df = xl.parse("Sheet1")
df.head()
df.head() will print first 5 rows of your Excel file
If you're working with an Excel file with a single sheet, you can simply use:
import pandas as pd
df = pd.read_excel(path + filename)
print(df.head())
Try the xlrd library.
[Edit] - from what I can see from your comment, something like the snippet below might do the trick. I'm assuming here that you're just searching one column for the word 'john', but you could add more or make this into a more generic function.
from xlrd import open_workbook
book = open_workbook('simple.xls', on_demand=True)
for name in book.sheet_names():
    if name.endswith('2'):
        sheet = book.sheet_by_name(name)
        # Attempt to find a matching row (search the first column for 'john')
        rowIndex = -1
        for i, cell in enumerate(sheet.col(0)):
            if 'john' in str(cell.value):  # str() so numeric cells do not raise
                rowIndex = i
                break
        # If we found the row, print it
        if rowIndex != -1:
            cells = sheet.row(rowIndex)
            for cell in cells:
                print(cell.value)
        book.unload_sheet(name)
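The "more generic function" mentioned above could look something like the sketch below, which works on plain lists of cell values so it is independent of xlrd. The name find_first_row, its parameters, and the sample rows are all hypothetical.

```python
def find_first_row(rows, needle, column=0):
    """Return the index of the first row whose cell in `column` contains `needle`, or -1."""
    for index, row in enumerate(rows):
        # str() lets numeric or empty cells be searched without raising TypeError
        if column < len(row) and needle in str(row[column]):
            return index
    return -1


# Hypothetical sheet contents as lists of cell values
rows = [["name", "age"], ["mary", 31], ["john smith", 42]]
print(find_first_row(rows, "john"))
```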
This isn't as straightforward as opening a plain text file and will require some sort of external module since nothing is built-in to do this. Here are some options:
http://www.python-excel.org/
If possible, you may want to consider exporting the excel spreadsheet as a CSV file and then using the built-in python csv module to read it:
http://docs.python.org/library/csv.html
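As a small illustration of the CSV route, the sketch below reads a CSV export with the built-in csv module. The sample text stands in for a file exported from Excel, and the column names are made up. For a real file, open it with open(path, newline="") instead of using io.StringIO.

```python
import csv
import io

# Stand-in for a CSV file exported from a spreadsheet
exported = io.StringIO("name,age\njohn,42\nmary,31\n")

reader = csv.DictReader(exported)  # the first line supplies the column names
people = list(reader)

for person in people:
    print(person["name"], person["age"])
```

Note that the csv module returns every value as a string, so numbers need an explicit int() or float() conversion.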
There's the openpyxl package:
>>> from openpyxl import load_workbook
>>> wb2 = load_workbook('test.xlsx')
>>> print(wb2.sheetnames)
['Sheet2', 'New Title', 'Sheet1']
>>> worksheet1 = wb2['Sheet1'] # one way to load a worksheet
>>> worksheet2 = wb2.get_sheet_by_name('Sheet2') # another way to load a worksheet (deprecated in newer versions)
>>> print(worksheet1['D18'].value)
3
>>> for row in worksheet1.iter_rows():
...     print(row[0].value)
You can use xlpython package that requires xlrd only.
Find it here https://pypi.python.org/pypi/xlpython
and its documentation here https://github.com/morfat/xlpython
This may help:
This creates a node that takes a 2D list (a list of lists) and pushes its items into the Excel spreadsheet. Make sure the IN[] inputs are present or it will throw an exception.
This is a rewrite of the Revit Excel Dynamo node for Excel 2013, as the default prepackaged node kept breaking. I also have a similar read node. The Excel syntax in Python is touchy.
### Export Excel - intended to replace malfunctioning Excel node
import clr
clr.AddReferenceByName('Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c')
## AddReferenceGUID("{00020813-0000-0000-C000-000000000046}") ''Excel C:\Program Files\Microsoft Office\Office15\EXCEL.EXE
## Need to verify interop for version 2015 is 15 and node attachment for it.
from Microsoft.Office.Interop import *  ## Excel
import os.path

################################ Initialize file path and sheet ID
## Same functionality as the Excel node
strFileName = IN[0]  ## Filename
sheetName = IN[1]  ## Sheet
Data = IN[4]  ## Data
Overwrite = IN[5]  ## Check for auto-overwrite
RowOffset = 0
if IN[2] > 0:
    RowOffset = IN[2]  ## Row offset
ColOffset = 0
if IN[3] > 0:
    ColOffset = IN[3]  ## Column offset
XLVisible = False
if IN[6] != False:
    XLVisible = True  ## Excel visible for operation or not?

xlCellTypeLastCell = 11  ## special cells value constant

################################ Connect to Excel and open the workbook
xls = Excel.ApplicationClass()  ## Connect with application
xls.Visible = XLVisible  ## Visible yes/no
xls.DisplayAlerts = False  ## Alerts
if os.path.isfile(strFileName):
    wb = xls.Workbooks.Open(strFileName, False)  ## Open the file
else:
    wb = xls.Workbooks.Add()  ## Create the file; use () to call the method
    wb.SaveAs(strFileName)
wb.application.visible = XLVisible  ## Show Excel
try:
    ws = wb.Worksheets(sheetName)  ## Get the sheet in the workbook
except:
    ws = wb.sheets.add()  ## If it doesn't exist, add it; use () for object method
    ws.Name = sheetName

#################################
# lastRow for iterating rows
lastRow = ws.UsedRange.SpecialCells(xlCellTypeLastCell).Row
# lastCol for iterating columns
lastCol = ws.UsedRange.SpecialCells(xlCellTypeLastCell).Column

#######################################################################
out = []  ## Message gathering
c = 0
r = 0
val = ""
if Overwrite == False:  ## Look ahead for non-empty cells to throw an error
    for r, row in enumerate(Data):  ## Base 0: each row of data in the 2D array
        for c, col in enumerate(row):  ## Base 0: each column in each row is a cell with data
            # check the destination cell in the sheet, not the incoming data item
            existing = ws.Cells[r + 1 + RowOffset, c + 1 + ColOffset].Value2
            if existing is not None and existing != "":
                OUT = "ERROR- Cannot overwrite"
                raise ValueError("ERROR- Cannot overwrite")
                ## out.append(Data[0])  ## append message for error

############################################################################
for r, row in enumerate(Data):  ## Base 0: each row of data in the 2D array
    for c, col in enumerate(row):  ## Base 0: each column in each row is a cell with data
        ws.Cells[r + 1 + RowOffset, c + 1 + ColOffset].Value2 = col.__str__()

## run macro disabled for debugging Excel macro
## xls.Application.Run("Align_data_and_Highlight_Issues")
import pandas as pd
import os
directory = 'path/to/files/directory/'
files = os.listdir(directory)
i = 0  # index of the file you want; 0 is only a placeholder
desiredFile = files[i]
Ofile = os.path.join(directory, desiredFile)
xls_import = pd.read_excel(Ofile)  # use pd.read_csv(Ofile) instead if the file is a CSV
Now you can use the power of pandas DataFrames!
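This code worked for me with Python 3.5.2. It opens and saves a CSV file, which Excel can open. I am currently working on how to save data into the file, but this is the code:
import csv
excel = csv.writer(open("file1.csv", "w", newline=""))  # text mode; in Python 3, writing rows to a file opened with "wb" fails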
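For the missing step of saving data into the file, here is a minimal sketch with the csv module, assuming Python 3. The sample rows are made up, and a temporary folder is used so nothing is left on disk. In real use the path would simply be "file1.csv".

```python
import csv
import os
import tempfile

# Hypothetical data to save
rows = [["name", "age"], ["john", 42], ["mary", 31]]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "file1.csv")
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    # read the file back to check what was written
    with open(path, newline="") as handle:
        written = list(csv.reader(handle))

print(written)
```

Using with blocks closes the file automatically, which is what flushes the data to disk. Values read back are strings, so 42 comes back as "42".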