Using xlwt vs openpyxl - python

I need some help with openpyxl in PYTHON. I have been using xlwt quite successfully, but now I have some files (in MySQL Workbench) that contain more than 65,000 rows. I know I can create a CSV file, but XLSX is the preferred output. I am able to create a workbook using openpyxl, but I have not been successful placing the MySQL data into the table. The main portion of the program using xlwt is pretty straightforward (see below). I just cannot seem to figure out how to do the same thing using openpyxl. I've tried a number of different combinations and solutions. I just get stuck after the "for x in result:".
file_dest = "c:\home\test.xls"
result = dest.execute("select a, b, c, d from filea")
for x in result:
rw = rw + 1
sheet1 = book.add.sheet("Sheet 1")
row1 = sheet1.row(rw)
row1.write(1, x[0])
row1.write(1, x[1])
row1.write(1, x[2])
row1.write(1, x[3])
book.save(file_dest)

This is a good use case for using append():
Appends a group of values at the bottom of the current sheet.
If it’s a list: all values are added in order, starting from the first
column
import openpyxl
file_dest = "test.xlsx"
workbook = openpyxl.Workbook()
worksheet = workbook.get_active_sheet()
result = dest.execute("select a, b, c, d from filea")
for x in result:
worksheet.append(list(x))
workbook.save(file_dest)

A little example:
wb = Workbook(encoding='utf-8')
ws = wb.worksheets[0]
row = 2
ws.title = "Report"
ws.cell('A1').value = "Value"
ws.cell('B1').value = "Note"
for item in results:
ws.cell('A%d' % (row)).value = item[0]
ws.cell('B%d' % (row)).value = item[1]
row += 1
http://pythonhosted.org//openpyxl/

Related

python excel subtract with 2 worksheet

Is it possible to create a python script to automatic which is subtract cell value with 2 worksheet in one excel file?
I have checked some documents, and seem that use the method of pandas or openpyxl to do so. But I can't to do that. Do you have any suggestion to me? Many thanks.
Script:
from datetime import datetime
import pandas as pd
import openpyxl as xl;
currDateTime = datetime.now()
Sheet1 ="C:\\Users\\peter\\Downloads\\" + currDateTime.strftime('%Y%m%d') + "\\5250A" + "\\5250A.xlsx"
wb3 = xl.load_workbook(Sheet1)
ws3 = wb3.worksheets[0]
wb4 = xl.load_workbook(Sheet1)
ws4 = wb4.worksheets[1]
wb5 = xl.load_workbook(Sheet1)
ws5 = wb5.create_sheet("Done")
wb4.subtract(wb3)
wb5.save(str(Sheet1))
Expected Result:
Do so in excel coule be way easier I think. There could be a smarter way to write this code.
[NOTE] I just do the subsctraction cell by cell, so if there's any mismatch like same row but different dept.id or same col but different item will make errors. If you may meet this situation, you'll have a change some in the following code.
import openpyxl as xl
def get_row_values(worksheet):
"""
return data structure:
[
[A1, B1, C1, ...],
[A2, B2, C2, ...],
...
]
"""
result = []
for i in worksheet.rows:
row_data = []
for j in i:
row_data.append(j.value)
result.append(row_data)
return result
if __name__ == '__main__':
# load excel file
wb = xl.load_workbook('test1.xlsx')
ws1 = wb.worksheets[0]
ws2 = wb.worksheets[1]
# get data from the first 2 worksheets
ws1_rows = get_row_values(ws1)
ws2_rows = get_row_values(ws2)
# calculate and make a new sheet
ws_new = wb.create_sheet('Done')
# insert header
ws_new.append(ws1_rows[0])
for row in range(1, len(ws1_rows)):
# do the substract cell by cell
row_data = []
for column, value in enumerate(ws1_rows[row]):
if column == 0:
# insert first column
row_data.append(value)
else:
if ws1_rows[row][0] == ws2_rows[row][0]:
# process only when first column match
row_data.append(value - ws2_rows[row][column])
ws_new.append(row_data)
wb.save('test2.xlsx')
here's my sample excel file
first sheet:
second sheet:
generated sheet:

How can I concatenate multiple rows of excel data into one?

I'm currently facing an issue where I need to bring all of the data shown in the images below into one line only.
So using Python and Openpyxl, I tried to write a parsing script that reads the line and only copies when values are non-null or non-identical, into a new workbook.
I get out of range errors, and the code does not keep just the data I want. I've spent multiple hours on it, so I thought I would ask here to see if I can get unstuck.
I've read some documentation on Openpyxl and about making lists in python, tried a couple of videos on youtube, but none of them did exactly what I was trying to achieve.
import openpyxl
from openpyxl import Workbook
path = "sample.xlsx"
wb = openpyxl.load_workbook(path)
ws = wb.active
path2 = "output.xlsx"
wb2 = Workbook()
ws2 = wb2.active
listab = []
rows = ws.max_row
columns = ws.max_column
for i in range (1, rows+1):
listab.append([])
cellValue = " "
prevCell = " "
for c in range (1, rows+1):
for r in range(1, columns+1):
cellValue = ws.cell(row=r, column=c).value
if cellValue == prevCell:
listab[r-1].append(prevCell)
elif cellValue == "NULL":
listab[r-1].append(prevCell)
elif cellValue != prevCell:
listab[r-1].append(cellValue)
prevCell = cellValue
for r in range(1, rows+1):
for c in range (1, columns+1):
j = ws2.cell(row = r, column=c)
j.value = listab[r-1][c-1]
print(listab)
wb2.save("output.xlsx")
There should be one line with the below information:
ods_service_id | service_name| service_plan_name| CPU | RAM | NIC | DRIVE |
Personally I would go with pandas.
import pandas as pd
#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA",inplace=True) ## Replaced nan values with "NO DATA"
unique_ids = df_data.ods_service_ids.unique()
#Storing pd into a list
records_list = df_data.to_dict('records')
keys_to_check = ['service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']
processed = {}
#Go through unique ids
for key in unique_ids:
processed[key] = {}
#Get related records
matching_records = [y for y in records_list if y['ods_service_ids'] == key]
#Loop through records
for record in matching_records:
#For each key to check, save in dict if non null
processed[key]['ods_service_ids'] = key
for detail_key in keys_to_check:
if record[detail_key] != "NO DATA" :
processed[key][detail_key] = record[detail_key]
##Note : doesn't handle duplicate values for different keys so far
#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_ids','service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']]
#Export to Excel
df.to_excel("output.xlsx",sheet_name='Sheet_name_1', index=False)
The above should work but I wasn't really sure on how you wanted to save duplicated records for the same id. Do you look to store them as DRIVE_0, DRIVE_1, DRIVE_2 ?
EDIT:
df could be exported in a different way. Replaced below #export to Excel with the following :
df.to_excel("output.xlsx",sheet_name='Sheet_name_1')
EDIT 2:
with no input data it was hard to see any flows. Corrected the code above with fake data
To be honest, I think you've managed to get confused by data structures and come up with something far more complicated than you need.
One approach that would suit would be to use Python dictionaries for each service, updating them row by row.
wb = load_workbook("sample.xlsx")
ws = wb.active
objs = {}
headers = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
for row in ws.iter_rows(min_row=2, values_only=True):
if row[0] not in objs:
obj = {key:value for key, value in zip(headers, row)}
objs[obj['ods_service_id']] = obj
else:# update dict with non-None values
extra = {key:value for key, value in zip(headers[3:], row[3:]) if value != "NULL"}
obj.update(extra)
# write to new workbook
wb2 = Workbook()
ws2 = wb2.active
ws2.append(headers)
for row in objs.values(): # do they need sorting?
ws2.append([obj[key] for key in headers])
Note how you can do everything without using counters.

How to filter column data using openpyxl

I am trying to apply a filter to an existing Excel file, and export it to another Excel file. I would like to extract rows that only contain the value 16, then export the table to another excel file (as shown in the picture below).
I have tried reading the openpyxl documentation multiple times and googling for solutions but I still can't make my code work. I have also attached the code and files below
import openpyxl
# Is use to create a reference of the Excel to wb
wb1 = openpyxl.load_workbook('test_data.xlsx')
wb2 = openpyxl.load_workbook('test_data_2.xlsx')
# Refrence the workbook to the worksheets
sh1 = wb1["data_set_1"]
sh2 = wb2["Sheet1"]
sh1.auto_filter.ref = "A:A"
sh1.auto_filter.add_filter_column(0, ["16"])
sh1.auto_filter.add_sort_condition("B2:D6")
sh1_row_number = sh1.max_row
sh1_col_number = sh1.max_column
rangeSelected = []
for i in range(1, sh1_row_number+1, 1):
rowSelected = []
for j in range(1, sh1_col_number+1, 1):
rowSelected.append(sh1.cell(row = i, column = j))
rangeSelected.append(rowSelected)
del rowSelected
for i in range(1, sh1_row_number+1, 1):
for j in range(1, sh1_col_number+1, 1):
sh2.cell(row = i, column = j).value = rangeSelected[i-1][j-1].value
wb1.save("test_data.xlsx")
wb2.save("test_data_2.xlsx")
The pictures shows what should be the desire result
The auto filter doesn't actually filter the data, it is just for visualization.
You probably want to filter while looping through the workbook. Please note with this code I assume you have the table headers already in the second workbook. It does not overwrite the data, it appends to the table.
import openpyxl
# Is use to create a reference of the Excel to wb
wb1 = openpyxl.load_workbook('test_data.xlsx')
wb2 = openpyxl.load_workbook('test_data_2.xlsx')
# Refrence the workbook to the worksheets
sh1 = wb1["data_set_1"]
sh2 = wb2["data_set_1"] # use same sheet name, different workbook
for row in sh1.iter_rows():
if row[0].value == 16: # filter on first column with value 16
sh2.append((cell.value for cell in row))
wb1.save("test_data.xlsx")
wb2.save("test_data_2.xlsx")

how to read (or parse) EXCEL comments using python

I have several excel files that use lots of comments for saving information.
For example, one cell has value 2 and there is a comment attached to the cell saying
"2008:2#2009:4". it seems that value 2 is for the current year (2010) value. The comment keeps all previous year values separated by '#'. I would like to create a dictionary to keep all this info like {2008:2, 2009:4, 2010:2} but I don't know how to parse (or read) this comment attached to the cell. Python excel readin module has this function (reading in comment)?
You can do this without an Excel COM object using openpyxl:
from openpyxl import load_workbook
workbook = load_workbook('/tmp/data.xlsx')
first_sheet = workbook.get_sheet_names()[0]
worksheet = workbook.get_sheet_by_name(first_sheet)
for row in worksheet.iter_rows():
for cell in row:
if cell.comment:
print(cell.comment.text)
The parsing of the comments itself can be done the same as with Steven Rumbalski's answer.
(example adapted from here)
Normally for reading from Excel, I would suggest using xlrd, but xlrd does not support comments. So instead use the Excel COM object:
from win32com.client import Dispatch
xl = Dispatch("Excel.Application")
xl.Visible = True
wb = xl.Workbooks.Open("Book1.xls")
sh = wb.Sheets("Sheet1")
comment = sh.Cells(1,1).Comment.Text()
And here's how to parse the comment:
comment = "2008:2#2009:4"
d = {}
for item in comment.split('#'):
key, val = item.split(':')
d[key] = val
Often, Excel comments are on two lines with the first line noting who created the comment. If so your code would look more like this:
comment = """Steven:
2008:2#2009:4"""
_, comment = comment.split('\n')
d = {}
for item in comment.split('#'):
key, val = item.split(':')
d[key] = val
After running the last posted code here, can you store that information later in a word document?
from openpyxl import load_workbook
workbook = load_workbook('/tmp/data.xlsx')
first_sheet = workbook.get_sheet_names()[0]
worksheet = workbook.get_sheet_by_name(first_sheet)
for row in worksheet.iter_rows():
for cell in row:
if cell.comment:
print(cell.comment.text)

How can I open an Excel file in Python?

How do I open a file that is an Excel file for reading in Python?
I've opened text files, for example, sometextfile.txt with the reading command. How do I do that for an Excel file?
Edit:
In the newer version of pandas, you can pass the sheet name as a parameter.
file_name = # path to file + file name
sheet = # sheet name or sheet number or list of sheet numbers and names
import pandas as pd
df = pd.read_excel(io=file_name, sheet_name=sheet)
print(df.head(5)) # print first 5 rows of the dataframe
Check the docs for examples on how to pass sheet_name: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
Old version:
you can use pandas package as well....
When you are working with an excel file with multiple sheets, you can use:
import pandas as pd
xl = pd.ExcelFile(path + filename)
xl.sheet_names
>>> [u'Sheet1', u'Sheet2', u'Sheet3']
df = xl.parse("Sheet1")
df.head()
df.head() will print first 5 rows of your Excel file
If you're working with an Excel file with a single sheet, you can simply use:
import pandas as pd
df = pd.read_excel(path + filename)
print df.head()
Try the xlrd library.
[Edit] - from what I can see from your comment, something like the snippet below might do the trick. I'm assuming here that you're just searching one column for the word 'john', but you could add more or make this into a more generic function.
from xlrd import open_workbook
book = open_workbook('simple.xls',on_demand=True)
for name in book.sheet_names():
if name.endswith('2'):
sheet = book.sheet_by_name(name)
# Attempt to find a matching row (search the first column for 'john')
rowIndex = -1
for cell in sheet.col(0): #
if 'john' in cell.value:
break
# If we found the row, print it
if row != -1:
cells = sheet.row(row)
for cell in cells:
print cell.value
book.unload_sheet(name)
This isn't as straightforward as opening a plain text file and will require some sort of external module since nothing is built-in to do this. Here are some options:
http://www.python-excel.org/
If possible, you may want to consider exporting the excel spreadsheet as a CSV file and then using the built-in python csv module to read it:
http://docs.python.org/library/csv.html
There's the openpxyl package:
>>> from openpyxl import load_workbook
>>> wb2 = load_workbook('test.xlsx')
>>> print wb2.get_sheet_names()
['Sheet2', 'New Title', 'Sheet1']
>>> worksheet1 = wb2['Sheet1'] # one way to load a worksheet
>>> worksheet2 = wb2.get_sheet_by_name('Sheet2') # another way to load a worksheet
>>> print(worksheet1['D18'].value)
3
>>> for row in worksheet1.iter_rows():
>>> print row[0].value()
You can use xlpython package that requires xlrd only.
Find it here https://pypi.python.org/pypi/xlpython
and its documentation here https://github.com/morfat/xlpython
This may help:
This creates a node that takes a 2D List (list of list items) and pushes them into the excel spreadsheet. make sure the IN[]s are present or will throw and exception.
this is a re-write of the Revit excel dynamo node for excel 2013 as the default prepackaged node kept breaking. I also have a similar read node. The excel syntax in Python is touchy.
thnx #CodingNinja - updated : )
###Export Excel - intended to replace malfunctioning excel node
import clr
clr.AddReferenceByName('Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c')
##AddReferenceGUID("{00020813-0000-0000-C000-000000000046}") ''Excel C:\Program Files\Microsoft Office\Office15\EXCEL.EXE
##Need to Verify interop for version 2015 is 15 and node attachemnt for it.
from Microsoft.Office.Interop import * ##Excel
################################Initialize FP and Sheet ID
##Same functionality as the excel node
strFileName = IN[0] ##Filename
sheetName = IN[1] ##Sheet
RowOffset= IN[2] ##RowOffset
ColOffset= IN[3] ##COL OFfset
Data=IN[4] ##Data
Overwrite=IN[5] ##Check for auto-overwtite
XLVisible = False #IN[6] ##XL Visible for operation or not?
RowOffset=0
if IN[2]>0:
RowOffset=IN[2] ##RowOffset
ColOffset=0
if IN[3]>0:
ColOffset=IN[3] ##COL OFfset
if IN[6]<>False:
XLVisible = True #IN[6] ##XL Visible for operation or not?
################################Initialize FP and Sheet ID
xlCellTypeLastCell = 11 #####define special sells value constant
################################
xls = Excel.ApplicationClass() ####Connect with application
xls.Visible = XLVisible ##VISIBLE YES/NO
xls.DisplayAlerts = False ### ALerts
import os.path
if os.path.isfile(strFileName):
wb = xls.Workbooks.Open(strFileName, False) ####Open the file
else:
wb = xls.Workbooks.add# ####Open the file
wb.SaveAs(strFileName)
wb.application.visible = XLVisible ####Show Excel
try:
ws = wb.Worksheets(sheetName) ####Get the sheet in the WB base
except:
ws = wb.sheets.add() ####If it doesn't exist- add it. use () for object method
ws.Name = sheetName
#################################
#lastRow for iterating rows
lastRow=ws.UsedRange.SpecialCells(xlCellTypeLastCell).Row
#lastCol for iterating columns
lastCol=ws.UsedRange.SpecialCells(xlCellTypeLastCell).Column
#######################################################################
out=[] ###MESSAGE GATHERING
c=0
r=0
val=""
if Overwrite == False : ####Look ahead for non-empty cells to throw error
for r, row in enumerate(Data): ####BASE 0## EACH ROW OF DATA ENUMERATED in the 2D array #range( RowOffset, lastRow + RowOffset):
for c, col in enumerate (row): ####BASE 0## Each colmn in each row is a cell with data ### in range(ColOffset, lastCol + ColOffset):
if col.Value2 >"" :
OUT= "ERROR- Cannot overwrite"
raise ValueError("ERROR- Cannot overwrite")
##out.append(Data[0]) ##append mesage for error
############################################################################
for r, row in enumerate(Data): ####BASE 0## EACH ROW OF DATA ENUMERATED in the 2D array #range( RowOffset, lastRow + RowOffset):
for c, col in enumerate (row): ####BASE 0## Each colmn in each row is a cell with data ### in range(ColOffset, lastCol + ColOffset):
ws.Cells[r+1+RowOffset,c+1+ColOffset].Value2 = col.__str__()
##run macro disbled for debugging excel macro
##xls.Application.Run("Align_data_and_Highlight_Issues")
import pandas as pd
import os
files = os.listdir('path/to/files/directory/')
desiredFile = files[i]
filePath = 'path/to/files/directory/%s'
Ofile = filePath % desiredFile
xls_import = pd.read_csv(Ofile)
Now you can use the power of pandas DataFrames!
This code worked for me with Python 3.5.2. It opens and saves and excel. I am currently working on how to save data into the file but this is the code:
import csv
excel = csv.writer(open("file1.csv", "wb"))

Categories