Excel parser stuck on one row - python

So I was writing a quick script to loop through a bunch of sheets in an Excel file (22, to be exact), and what I wanted to do was the following:
Open the workbook and the sheet named "All", which contains a list of names, then loop through each name and do the following:
Loop through all the other 22 sheets in the same workbook and search each one for the name, which I knew was in the 'B' column.
If the name is found, take all the columns in that row containing the data for that name (columns A-H).
Then copy and paste them next to the original name (same row) in the 'All' sheet, leaving a bit of space between the original name and the copied data.
I wanted to do this for all 22 sheets and for the 200+ names listed in the 'All' sheet. My code is as follows:
import openpyxl, pprint
columns = ['A','B','C','D','E','F','G','H']
k = 10
x = 0
def colnum_string(n):
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string
print("Opening Workbook...")
wb = openpyxl.load_workbook('FileName.xlsx')
sheet_complete = wb.get_sheet_by_name("All")
row_count_all = sheet_complete.max_row
for row in range(4, row_count_all+1):
    k = 10
    cell = 'B' + str(row)
    print(cell)
    name = sheet_complete[cell].value
    for i in range(2, 23):
        sheet = wb.get_sheet_by_name(str(1995 + i))
        row_count = sheet.max_row
        for row2 in range(2, row_count+1):
            cell2 = 'B' + str(row2)
            name2 = sheet[cell].value
            if name == name2:
                x = x + 1
                for z in range(0,len(columns)):
                    k = k + 1
                    cell_data = sheet[columns[z] + str(row2)].value
                    cell_target = colnum_string(k) + str(row)
                    sheet_complete[cell_target] = cell_data
                wb.save('Scimago Country Ranking.xlsx')
                print("Completed " + str(x) + " Task(s)")
                break
The problem is that it keeps working with the first name only. It goes through all the names, but when it comes to copying and pasting the data, it just redoes the first name, so in the end I have all the names in the 'All' sheet and next to each one is the data for the first name repeated over and over. I can't see what's wrong with my code, but forgive me if it's a silly mistake, as I'm kind of a beginner with these Excel parsing scripts. The print statements were for testing.
P.S. I know I'm using a deprecated function and I will change that; I was just too lazy to do it since it seems to still work fine. If that's the problem, please let me know.
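A likely cause, judging only from the code above: the inner loop builds cell2 but then reads sheet[cell], so each yearly sheet is checked at the row number taken from the 'All' sheet instead of at row2. Below is a minimal sketch of the intended lookup, written against plain lists so it runs without a workbook. The function find_row_by_name and the sample data are hypothetical; in the script itself the equivalent change would be name2 = sheet[cell2].value.

```python
def find_row_by_name(rows, name, name_col=1):
    """Return the first row (a list of cell values) whose name column matches.

    rows     -- iterable of rows, each a list of values for columns A..H
    name_col -- 0-based index of the name column (1 corresponds to 'B')
    """
    for row in rows:
        # Compare against the row currently being scanned, which is what
        # sheet[cell2] (rather than sheet[cell]) would read in the script.
        if row[name_col] == name:
            return row
    return None


# Hypothetical stand-in for one yearly sheet: columns A..H per row.
year_sheet = [
    [1, "Alpha", 10, 20, 30, 40, 50, 60],
    [2, "Beta", 11, 21, 31, 41, 51, 61],
    [3, "Gamma", 12, 22, 32, 42, 52, 62],
]

match = find_row_by_name(year_sheet, "Beta")
print(match)
```

With the lookup keyed on the scanned row, each name picks up its own row of data rather than whatever happens to sit in the first row checked.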

Related

Finding and Saving Blank Row Location in Python OpenPyXL

I am new to Python and need some help understanding why my code continues to run past what I believed were the boundaries I had set for rows and columns:
from openpyxl import load_workbook
wb = load_workbook("x_test.xlsx")
ws = wb.active
emptycellcount = 0
maxrow=50
maxcol=20
for col in ws.iter_cols(min_row = 0, max_row = maxrow):
    for row in ws.iter_rows(min_col = 0, max_col = maxcol):
        emptycellcount=0
        for cell in row:
            if cell.value == None:
                emptycellcount += 1
                print("The presence of an empty row is at " + str(cell.row) + " " + str(cell.column) + " " + str(emptycellcount))
            elif cell.value != None:
                emptycellcount = 0
                emptyrow = cell.row
                print("not empty " + str(cell.row) + " " + str(cell.column) + " " + str(emptycellcount))
                break
Right now this code runs through the entire used range of my worksheet "x_test.xlsx", but I would like it to do 3 things:
only iterate through a maximum of 20 cells (columns) across 50 rows
stop running the code when an entirely empty row is found within the 50x20 cell range
save the location of the empty row to the variable "emptyrow" to be referenced again later
I have been working at this for a few days now and have yet to figure out a workable solution. Any help would be highly appreciated.
Thank you in advance!
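One possible approach, offered as a sketch rather than a definitive answer: the inner ws.iter_rows call in the code above limits only the columns, so it still walks every used row. A single iter_rows call can bound both axes (openpyxl numbers rows and columns from 1). The helper names first_empty_row and first_empty_row_in_sheet are hypothetical, and values_only assumes a reasonably recent openpyxl release.

```python
def first_empty_row(rows, start_row=1):
    """Return the 1-based number of the first row whose values are all None.

    rows -- iterable of tuples of cell values, e.g. from
            ws.iter_rows(..., values_only=True)
    Returns None when no entirely empty row is found.
    """
    for row_number, values in enumerate(rows, start=start_row):
        if all(value is None for value in values):
            return row_number
    return None


def first_empty_row_in_sheet(path, max_row=50, max_col=20):
    # Imported here so first_empty_row can be tried without openpyxl installed.
    from openpyxl import load_workbook

    ws = load_workbook(path).active
    # One call bounds both rows and columns: at most max_row x max_col cells.
    rows = ws.iter_rows(min_row=1, max_row=max_row,
                        min_col=1, max_col=max_col, values_only=True)
    return first_empty_row(rows)


# Hypothetical 4x3 block of values standing in for the worksheet.
sample = [
    ("a", None, "c"),
    (None, "b", None),
    (None, None, None),
    ("x", "y", "z"),
]
emptyrow = first_empty_row(sample)
print(emptyrow)
```

Returning from the helper as soon as an all-empty row is seen covers both "stop running" and "save the location": the caller keeps the result in emptyrow for later use.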

Openpyxl: We found a problem with some content

I am getting the error message 'We found a problem with some content' opening a file I generated with openpyxl. The file is being generated by concatenating different xlsx files and adding additional formulas in further cells.
The problem is caused by a Formula with an if-condition I am writing into a cell (the second for loop is causing the excel error message).
That's the code:
import openpyxl as op
import glob
# Search for all xlsx files in directory and assign them to variable allfiles
allfiles = glob.glob('*.xlsx')
print('Following files are going to be included into the inventory: ' + str(allfiles))
# Create a workbook with a sheet called 'Input'
risk_inventory = op.load_workbook('./Report/Risikoinventar.xlsx', data_only = False)
input_sheet = risk_inventory['Input']
risk_inventory.remove(input_sheet)
input_sheet = risk_inventory.create_sheet()
input_sheet.title = 'Input'
r_maxrow = input_sheet.max_row + 1
# There is more code here which is not related to the problem
for i in range (2,r_maxrow):
    if input_sheet.cell(row = i, column = 2).value == 'Top-Down':
        input_sheet.cell(row = i, column = 20).value = '=IF(ISTEXT(H{}),0,IF(H{}<=1000000,1,IF(H{}<=2000000,2,IF(H{}<=4000000,3,IF(H{}<=8000000,4,IF(H{}>8000000,5,0))))))'.format(i,i,i,i,i,i)
    elif input_sheet.cell(row = i, column = 2).value == 'Bottom-Up':
        input_sheet.cell(row = i, column = 20).value = '=IF(ISTEXT(H{}),0,IF(H{}<=1000000,1,IF(H{}<=2000000,2,IF(H{}<=4000000,3,IF(H{}<=8000000,4,IF(H{}>8000000,5,0))))))'.format(i,i,i,i,i,i)
for i in range (2,r_maxrow):
    if input_sheet.cell(row = i, column = 2).value == 'Top-Down':
        input_sheet.cell(row = i, column = 21).value = '=IF(K{}="Sehr gering",1,IF(K{}="Gering",2,IF(K{}="Mittel",3,IF(K{}="Hoc",3,IF(K{}="Sehr hoch",3,0))))))'.format(i,i,i,i,i,i)
    elif input_sheet.cell(row = i, column = 2).value == 'Bottom-Up':
        input_sheet.cell(row = i, column = 21).value = '=IF(K{}="Sehr gering",1,IF(K{}="Gering",2,IF(K{}="Mittel",3,IF(K{}="Hoc",3,IF(K{}="Sehr hoch",3,0))))))'.format(i,i,i,i,i,i)
So depending on what is in cell(row = i, column = 2), I want a specific formula in cell(row = i, column = 21). The first for loop works perfectly; the second for loop causes the error message in Excel, and the formulas are not written to the cells.
As you can probably already tell, I have only been coding in Python for a week and had never tried coding before…
Many thanks in advance!
I've been having the same issue, and it was due to an incorrectly written formula. I found what was wrong by clicking "View" instead of "Delete" when opening the file.
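One way to catch this kind of mistake before Excel does is to check that the parentheses in each formula string balance. Counting by hand, the column-21 formula in the question opens five IF( calls but ends with six closing parentheses, which may well be the malformed part. The helper name paren_balance is hypothetical; it is a rough sanity check that skips quoted text, not a formula parser.

```python
def paren_balance(formula):
    """Return (#opening - #closing) parentheses, skipping quoted text.

    0 means balanced; a negative result means too many ')'.
    """
    balance = 0
    in_quotes = False
    for ch in formula:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == '(':
                balance += 1
            elif ch == ')':
                balance -= 1
    return balance


# Column-21 formula as written in the question (row number 2 filled in).
as_written = ('=IF(K2="Sehr gering",1,IF(K2="Gering",2,IF(K2="Mittel",3,'
              'IF(K2="Hoc",3,IF(K2="Sehr hoch",3,0))))))')
# Same formula with the last closing parenthesis removed.
trimmed = as_written[:-1]

print(paren_balance(as_written))
print(paren_balance(trimmed))
```

Running a check like this on every formula before assigning it to a cell would flag the unbalanced string at write time instead of when the file is opened.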

Copying an entire column using OpenPyXL in Python 3

I'm trying to copy an entire column over using OpenPyXL. Google seems to offer a lot of examples using ranges, but not for an entire column.
I have a workbook with a single worksheet with a load of dates in column A and column JX (A contains monthly dates, JX contains quarterly dates). I want the monthly dates column (in A:A) to be copied over to each worksheet ending in 'M' in my target workbook, and the quarterly dates column (in JX:JX) to the worksheets ending in 'Q'.
However, for some reason the last nested for loop, for src, dst in zip(ws_base[monthRange], ws_target['A:A']): is only copying the first cell, and nothing else. It looks like I'm identifying the correct column with my monthRange and quarterRange strings, but Python isn't looping through the whole column despite the fact that I've got two ranges defined.
Does anyone have any ideas?
# Load the target workbook
targetwb = openpyxl.load_workbook('pythonOutput.xlsx')
# Load the source workbook
wb_base = openpyxl.load_workbook('Baseline_IFRS9_' + reportingMonth+'.xlsx')
# Go to row 9 and find "Geography:" to identify the relevant
# month and quarter date columns
sentinel = u"Geography:"
ws_base = wb_base.active
found = 0
dateColumns = []
for column in ws_base:
    for cell in column:
        if cell.value == sentinel:
            dateColumns.append(cell.column) #
            found + 1
            if found == 2:
                break
ColumnM = dateColumns[0]
ColumnQ = dateColumns[1]
print('Monthly col is ' + ColumnM)
print('Quarterly col is ' + ColumnQ)
IndexM = int(openpyxl.utils.column_index_from_string(str(ColumnM)))
IndexQ = int(openpyxl.utils.column_index_from_string(str(ColumnQ)))
print('Monthly col index is ' + str(IndexM))
print('Quarterly col index is ' + str(IndexQ))
print('Proceeding to paste into our new workbook...')
sheetLoop = targetwb.get_sheet_names()
for sheets in sheetLoop:
    if sheets.endswith('Q'):
        ws_target = targetwb[sheets]
        quarterRange = ColumnQ + ':' + ColumnQ
        print('Copying and pasting quarterly dates into: ' + sheets)
        for src, dst in zip(ws_base[quarterRange], ws_target['A:A']):
            dst.value = src.value
    elif sheets.endswith('M'):
        ws_target = targetwb[sheets]
        monthRange = ColumnM + ':' + ColumnM
        print('Copying and pasting monthly dates into: ' + sheets)
        for src, dst in zip(ws_base[monthRange], ws_target['A:A']):
            dst.value = src.value
targetwb.save('pythonOutput.xlsx')
Here's a simpler form of my problem.
import openpyxl
wb1 = openpyxl.load_workbook('pythonInput.xlsx')
ws1 = wb1.active
wb2 = openpyxl.load_workbook('pythonOutput.xlsx')
ws2 = wb2.active
for src, dst in zip(ws1['A:A'], ws2['B:B']):
    print( 'Printing from ' + str(src.column) + str(src.row) + ' to ' + str(dst.column) + str(dst.row))
    dst.value = src.value
wb2.save('test.xlsx')
So the problem here is that the for loop only prints from A1 to B1. Shouldn't it be looping down across rows..?
When you load a new XLSX in a spreadsheet editor, you see lots and lots of empty cells in a grid. However, these empty cells are actually omitted from the file, and they will be only written once they have a non-empty value. You can see for yourself: XLSX is essentially a bunch of ZIP-compressed XMLs, which can be opened with any archive manager.
In a similar fashion, new cells in OpenPyXL are only created when you access them. The ws2['B:B'] range only contains one cell, B1, and zip stops when the shortest iterator is exhausted.
With this in mind, you can iterate through the source range and use explicit coordinates to save the values in correct cells:
import openpyxl
wb1 = openpyxl.load_workbook('pythonInput.xlsx')
ws1 = wb1.active
wb2 = openpyxl.load_workbook('pythonOutput.xlsx')
ws2 = wb2.active
for cell in ws1['A:A']:
    print('Printing from ' + str(cell.column) + str(cell.row))
    ws2.cell(row=cell.row, column=2, value=cell.value)
wb2.save('test.xlsx')
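The zip behaviour described in this answer can be seen without a workbook. The sketch below uses plain lists as hypothetical stand-ins for the two column ranges: a source with four values and a target that, like a fresh column, holds a single cell.

```python
source_column = ["Jan", "Feb", "Mar", "Apr"]   # stands in for ws1['A:A']
target_column = [None]                         # a fresh column holds one cell

# zip stops as soon as the shorter input runs out, so only one pair appears.
pairs = list(zip(source_column, target_column))
print(pairs)

# Driving the loop from the source alone visits every value; the row number
# plays the role of the explicit coordinate passed to ws2.cell(...).
copied = {row: value for row, value in enumerate(source_column, start=1)}
print(copied)
```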

Quickly count non empty cells in large excel sheet

I'm trying to determine how much data is missing from a large excel sheet. The following code takes a prohibitive amount of time to complete. I've seen similar questions, but I'm not sure how to translate the answer to this case. Any help would be appreciated!
import openpyxl
wb = openpyxl.load_workbook('C://Users/Alec/Documents/Vertnet master list.xlsx', read_only = True)
sheet = wb.active
lat = 0
loc = 0
ele = 0
a = openpyxl.utils.cell.column_index_from_string('CF')
b = openpyxl.utils.cell.column_index_from_string('BU')
c = openpyxl.utils.cell.column_index_from_string('BX')
print('Workbook loaded')
for x in range(2, sheet.max_row):
    if sheet.cell(row = x, column = a).value:
        lat += 1
    if sheet.cell(row = x, column = b).value:
        loc += 1
    if sheet.cell(row = x, column = c).value:
        ele += 1
    print((x/sheet.max_row) * 100, '%')
print('Latitude: ', lat/sheet.max_row)
print('Location', loc/sheet.max_row)
print('Elevation', ele/sheet.max_row)
If you are simply trying to do the calc on a table on the sheet and not the entire sheet, you could make one adjustment to make it faster.
row = 1
Do Until IsEmpty(range("A1").offset(row,1).value)
    if range("B"&row).value: lat += 1
    if range("C"&row).value: loc += 1
    if range("D"&row).value: ele += 1
    row = row + 1
Loop
This would take you to the end of your defined table rather than the end of the whole sheet which is 90% of the reason it's taking you so long.
Hope this helps
Your problem is that, despite advice in the documentation to the contrary, you're using your own counters to access cells. In read-only mode, each use of ws.cell() forces the worksheet to reparse its XML source. Simply use ws.iter_rows(min_col=b, max_col=a) to get the cells in the columns you're interested in (of the three columns, BU is the leftmost and CF the rightmost, so the range has to run from b to a).
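A minimal sketch of that suggestion, under a few assumptions: the helper names count_non_empty and count_in_workbook are hypothetical, the first row is treated as a header as in the question, and values_only assumes a reasonably recent openpyxl release. The counting logic works on plain tuples, so it can be tried without a workbook.

```python
def count_non_empty(rows, offsets):
    """Count truthy values at the given 0-based offsets within each row.

    rows    -- iterable of tuples of cell values
    offsets -- dict mapping a label to an offset inside each tuple
    Returns (counts, total_rows).
    """
    counts = {label: 0 for label in offsets}
    total = 0
    for values in rows:
        total += 1
        for label, offset in offsets.items():
            # Same truthiness test as the question, so 0 and "" count as empty.
            if values[offset]:
                counts[label] += 1
    return counts, total


def count_in_workbook(path):
    # Imported here so count_non_empty can be tried without openpyxl installed.
    from openpyxl import load_workbook
    from openpyxl.utils import column_index_from_string

    first = column_index_from_string('BU')
    last = column_index_from_string('CF')
    ws = load_workbook(path, read_only=True).active
    # One streaming pass over BU..CF, skipping the header row.
    rows = ws.iter_rows(min_row=2, min_col=first, max_col=last,
                        values_only=True)
    # Offsets are relative to BU: BX is 3 columns to the right, CF is 11.
    return count_non_empty(rows, {'loc': 0, 'ele': 3, 'lat': 11})


# Hypothetical rows spanning BU..CF (12 columns each).
sample = [
    ("Ridge",) + (None,) * 2 + (1200,) + (None,) * 7 + (45.1,),
    (None,) * 12,
    ("Creek",) + (None,) * 10 + (44.9,),
]
counts, total = count_non_empty(sample, {'loc': 0, 'ele': 3, 'lat': 11})
print(counts, total)
```

Counting the rows during the same pass also avoids relying on sheet.max_row for the denominator.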

Faster search method on the first empty cell in a column using openpyxl PYTHON 3.5

I am having a problem finding the first empty cell in a certain column of a 40k-line .xlsx file. As the search goes farther, it becomes slower and slower. Is there a faster (ideally instant) way to find the first empty cell in a column?
wb = load_workbook(filename = dest_filename,read_only=True)
sheet_ranges1 = wb[name]
i = 1
x = 0
sam = 0
cc = 0
brgyst =Street+Brgy
entrylist = [TotalNoConfig,TotalNoChannel,Rsl,Mode,RslNo,Year,IssuedDate,Carrier,CaseNo,Site,brgyst,Municipality,Province,Region,Longitude1,Longitude2,Longitude3,Latitude1,Latitude2,Latitude3,ConvertedLong,ConvertedLat,License,Cos,NoS,CallSign,PTSVC,PTSVCCS,Tx,Rx] #The values to be inputted in the entire row after searching the last empty cell in column J
listX1 = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N', 'O','P','Q','T','U','V','R','X','Y','Z','AA','AB','AM','AN','AP','FL'] #The columns in the file
eter = 0
while(x != 1):
    cellS = 'J'+str(i) #until there is no empty cell
    if(sheet_ranges1[cellS].value is None): #if found empty cell, insert the values
        x=1
        book = load_workbook(filename = dest_filename)
        sheet = book[name]
        rangeof = int(len(entrylist))
        while(cc<rangeof):
            cells = listX1[cc]+str(i)
            sheet[cells]= entrylist[cc]
            cc=cc+1
    else:
        x=0
        sam = sam+1
        i=i+1
wb.save(dest_filename)
wb.close()
In read-only mode, every cell lookup causes the worksheet to be parsed again, so you should always use ws.iter_rows() for your work.
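A rough sketch of what that could look like for column J, assuming hypothetical helper names (first_empty_index, first_empty_row_in_column_j) and a reasonably recent openpyxl release for values_only. The search itself runs on any iterable of values, so it can be checked without a file.

```python
def first_empty_index(values, start=1):
    """Return the 1-based position of the first None in values.

    If every value is filled, return the position just past the last one,
    i.e. the row where new data could be appended.
    """
    position = start
    for value in values:
        if value is None:
            return position
        position += 1
    return position


def first_empty_row_in_column_j(path, sheet_name):
    # Imported here so first_empty_index can be tried without openpyxl installed.
    from openpyxl import load_workbook

    wb = load_workbook(path, read_only=True)
    ws = wb[sheet_name]
    # Column J is index 10; each row arrives as a one-element tuple.
    # The guard treats a row that comes back empty as an empty cell.
    column_j = (row[0] if row else None
                for row in ws.iter_rows(min_col=10, max_col=10,
                                        values_only=True))
    result = first_empty_index(column_j)
    wb.close()
    return result


print(first_empty_index(["a", "b", None, "d"]))
print(first_empty_index(["a", "b"]))
```

The worksheet is streamed once from top to bottom, so the cost no longer grows with each additional lookup the way repeated sheet_ranges1[cellS] access does.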
