How do I find the max row in Openpyxl

The max_row property returns a value higher than it should (the last row that has a value in it is row 7, but max_row returns 10), and if I iterate through a column to find the first row that has nothing in it, I get the same value as max_row.

This is easier to understand if you have worked with Excel files from Java.
Excel cells can be active or inactive. If you enter a value into a cell and then delete the value, the cell remains active.
max_row returns the row number of the last active cell. That is why you get 10 rather than 7: the sheet now has data only up to row 7, but it may once have had data up to row 10.
In Excel you can clear the cell manually (Editing -> Clear -> Clear All), which makes it inactive again. I'm not sure how to do the same from Python.
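One way to get a similar effect from Python is sketched below, assuming openpyxl 3.x: scan upward from max_row for the last row that still holds a value, then remove the rows beneath it with Worksheet.delete_rows(), which discards their cell objects so that max_row shrinks. The helper names last_data_row and trim_trailing_rows are hypothetical, and the sketch treats an empty string as an empty cell.

```python
from openpyxl import Workbook


def last_data_row(ws):
    """Return the highest row index holding a non-empty value, or 0 if the sheet is blank."""
    # Walk upwards from max_row so the scan stops at the first row that has data
    for row_idx in range(ws.max_row, 0, -1):
        values = next(ws.iter_rows(min_row=row_idx, max_row=row_idx, values_only=True))
        if any(v not in (None, "") for v in values):
            return row_idx
    return 0


def trim_trailing_rows(ws):
    """Delete the empty rows below the last data row so that max_row matches the data."""
    last = last_data_row(ws)
    if ws.max_row > last:
        ws.delete_rows(last + 1, ws.max_row - last)
    return ws.max_row


# Reproduce the problem: data in rows 1-7, plus a cell in row 10 whose value was removed
wb = Workbook()
ws = wb.active
for r in range(1, 8):
    ws.cell(row=r, column=1, value=r)
ws.cell(row=10, column=1, value="temp")
ws.cell(row=10, column=1).value = None  # the value is gone, but the cell object remains

print(ws.max_row)              # 10
print(last_data_row(ws))       # 7
print(trim_trailing_rows(ws))  # 7
```

Deleting rows changes the sheet, so on a real file you would save the workbook afterwards if the trimmed version should be kept.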

Related

How to delete specific rows in excel with openpyxl python if condition is met

Using openpyxl, I am writing a Python script that loops through the rows of data and finds rows in which some of the columns are empty; these rows will be deleted. The range of rows is 3 to 1800.
I am not exactly sure how to delete these rows; please see the code I have come up with so far.
What I am trying to do is iterate through the rows and check whether the values in columns 4 and 7 are None. If so, I want to store the row number in a suitable collection (I'd like advice on which one is best) and then write another loop that deletes those row numbers in reverse order, so that deleting one row doesn't shift the remaining rows and cause rows with data to be deleted by mistake.
I believe there may be an easier function for this, but I could not find an answer to this particular question.
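rows_to_delete = []  # a plain list is enough; row numbers are collected in ascending order
for row in worksheet.iter_rows(min_row=3, max_row=1800):
    # tuple positions are 0-based: row[4] is column E, row[7] is column H
    # (use row[3] and row[6] if columns D and G were meant)
    emp_id = row[4]
    full_name = row[7]
    if emp_id.value is None and full_name.value is None:
        print(emp_id.row)
        rows_to_delete.append(emp_id.row)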

Your iteration looks good.
openpyxl offers worksheet.delete_rows(idx, amount=1), where idx is the 1-based index of the first row to delete and amount is the number of rows to delete starting at that index.
So worksheet.delete_rows(3, 4) deletes rows 3, 4, 5 and 6, and worksheet.delete_rows(2, 1) deletes only row 2.
In your case, collect the row numbers (cell.row) rather than the row tuples, then call worksheet.delete_rows(row_number, 1) for each one, starting from the highest number so that earlier deletions don't shift the rows that are still to be deleted.
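Putting the question's loop and the delete_rows() advice together gives something like the sketch below. It assumes, as in the question's code, that the columns to check are given as 0-based positions in each row tuple (4 and 7, i.e. columns E and H); the function name delete_rows_where_empty is hypothetical.

```python
from openpyxl import Workbook


def delete_rows_where_empty(ws, min_row, max_row, positions):
    """Delete rows in [min_row, max_row] whose cells at the given 0-based positions are all empty.

    Returns the 1-based row numbers that were deleted.
    """
    to_delete = []
    # max_col makes sure each row tuple is long enough to index every position
    for row in ws.iter_rows(min_row=min_row, max_row=max_row, max_col=max(positions) + 1):
        if all(row[p].value is None for p in positions):
            to_delete.append(row[0].row)
    # Delete bottom-up: removing a row shifts every row below it up by one,
    # so going in reverse keeps the remaining row numbers valid
    for row_number in reversed(to_delete):
        ws.delete_rows(row_number, 1)
    return to_delete


# Usage with the question's sheet would be:
# delete_rows_where_empty(worksheet, 3, 1800, (4, 7))
```

A list is a natural collection here because the rows are visited in ascending order, so reversed() gives exactly the bottom-up order needed for deletion.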

Updating cells in a range with a for loop

I'm pretty new to coding so apologies if this is easy.
I have two columns in Google Sheets and I want to add a formula to a third column, something like this:
=(E3*90)+(F3*10), where the values in the columns are grades and 90 and 10 are fixed weightings.
I created a for loop that iterates through range(3, 90) so that it updates each cell in the column.
It writes a formula into every cell, but always the one from the last iteration: '=(E89*90)+(F89*10)'.
I managed to get this working by adding report.update_acell('E'+str(i), '=(E'+str(i)+'*90)+(F'+str(i)+'*10)') to the for loop, but this creates too many API calls and causes problems.
sh = client.open("grading")
report = sh.worksheet("Report")
weighted = report.range('G3:G89')
for cell in weighted:
    for i in range(3, 90):
        cell.value = '=(E'+str(i)+'*90)+(F'+str(i)+'*10)'
report.update_cells(weighted, value_input_option='USER_ENTERED')
What I'd like to see is every cell in the 'weighted' range be updated with a formula that looks at the two cells next to them and adds them into the formula so that a result is visible in weighted column.
For example:
row 3 should be =(E3*90)+(F3*10)
row 4 should be =(E4*90)+(F4*10) and so on until the range is completed.

I fixed this after a lot of trial and error. For anyone who is trying to do the same here is my solution:
sh = client.open("grading")
report = sh.worksheet("Report")
weighted = report.range('G3:G89')
# enumerate(..., 3) pairs the first cell (G3) with 3, the next one with 4, and so on
for i, cell in enumerate(weighted, 3):
    cell.value = '=(E'+str(i)+'*90)+(F'+str(i)+'*10)'
report.update_cells(weighted, value_input_option='USER_ENTERED')
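A variation on the fix above, sketched under assumptions: instead of counting with enumerate, it reads each cell's own row number from the row attribute that gspread's Cell objects carry, so the formula cannot drift out of step with the range's first row. fill_weighted_formulas is a hypothetical helper, and FakeCell is a hypothetical stand-in used only so the sketch runs without a live spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class FakeCell:
    """Hypothetical stand-in for gspread.Cell, which exposes .row, .col and .value."""
    row: int
    col: int
    value: str = ""


def fill_weighted_formulas(cells, weight_e=90, weight_f=10):
    """Give each cell a formula that weights columns E and F of that cell's own row."""
    for cell in cells:
        cell.value = f"=(E{cell.row}*{weight_e})+(F{cell.row}*{weight_f})"
    return cells


cells = fill_weighted_formulas([FakeCell(row=r, col=7) for r in range(3, 6)])
for cell in cells:
    print(cell.value)

# With gspread, assuming `report` is the worksheet from the question, this stays one batch call:
# report.update_cells(fill_weighted_formulas(report.range('G3:G89')), value_input_option='USER_ENTERED')
```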

Excel autofilter range: keep the range unchanged / avoid that rows are added automatically

I'm having trouble keeping the Excel autofilter limited to the range I set while displaying a sticky bottom row below that range.
This works well (I am using Python xlsxwriter):
worksheet.autofilter('A1:D111')
It results in a filter list range in Excel (Office 365) of $A$1:$D$111.
However, if I write a cell below the autofilter range with:
worksheet.write(111, 3, 'Total filtered selection', format_string) #adds string to Excel row 112
Then this row is also included in the filter range (the filter now ends at $D$112 for some reason; see picture).
Due to this the bottom row is not sticking to the bottom of the selection on changing the filter, which is what I wanted in order to show a total for the selection (using =SUBTOTAL(101, E1:E111) which only includes filtered rows as intended).
What am I doing wrong? Thanks!

By using the same range for a chart series the autofilter range remains unchanged:
As dwirony suggested, it seems to be standard behavior for Excel to add new data (rows) to the autofilter. This way the filter range is extended with my bottom row showing subtotals and the row is hidden when the filter is reapplied.
However, if you apply a chart series to the same range of cells as the autofilter range then the autofilter remains unchanged! I.e. my autofilter range was 'A1:D111' and changed to 'A1:D112' on adding content to row 112. However, if I create a chart series for range 'A1:D111' and add content to row 112 then the autofilter range will remain unchanged.
PS: I also tried to keep the range fixed by defining a named range (without using it), but this does not help; the filter still automatically adds new rows outside the named range. In XlsxWriter:
workbook.define_name('Filterrange', '={}!$A$1:$D$111'.format(worksheet.get_name()))
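The chart-series workaround described above can be sketched with XlsxWriter as follows. This is a minimal sketch under assumptions: the sheet name "Data", the header labels, the chart position and the function name build_filtered_report are hypothetical, and the answer does not say exactly which cells the series must reference, so this version assumes categories from column A and values from column D over the data rows. Whether Excel then leaves the filter range unchanged is the answerer's observation; the code itself cannot verify it.

```python
import io

import xlsxwriter


def build_filtered_report(rows, output):
    """Write rows A:D under an autofilter, a label row below them, and a chart over the data rows."""
    workbook = xlsxwriter.Workbook(output, {"in_memory": True})
    worksheet = workbook.add_worksheet("Data")
    worksheet.write_row(0, 0, ["Name", "Group", "Score", "Value"])  # header in row 1
    for r, values in enumerate(rows, start=1):
        worksheet.write_row(r, 0, values)
    last = len(rows)  # zero-based index of the last data row
    worksheet.autofilter(0, 0, last, 3)
    # The row directly below the filter range, which Excel tends to absorb into the filter
    worksheet.write(last + 1, 3, "Total filtered selection")
    # Workaround reported above: a chart series over the filtered cells.
    # Which cells the series must cover is an assumption (column A as categories, D as values).
    chart = workbook.add_chart({"type": "column"})
    chart.add_series({
        "categories": ["Data", 1, 0, last, 0],
        "values": ["Data", 1, 3, last, 3],
    })
    worksheet.insert_chart(1, 5, chart)
    workbook.close()


buffer = io.BytesIO()
build_filtered_report([["a", "x", 1, 10], ["b", "y", 2, 20]], buffer)
```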

Openpyxl max_row and max_column wrongly reports a larger figure

My question concerns a function that is part of a parsing script I'm developing. I am trying to write a Python function that finds the column number corresponding to a matched value in an Excel sheet. The workbook is created on the fly with openpyxl; from the third column onwards, its first row holds headers that each span four merged columns. In a subsequent function, I parse some content and add it to the columns under the matching headers. (Additional info: the content I'm parsing is BLAST+ output. I'm trying to create a summary spreadsheet with the hit names in each column, with subcolumns for hits, gaps, span and identity. The first two columns are the query contig and its length.)
I had initially written a similar function for xlrd, and it worked. But when I rewrite it for openpyxl, max_row and max_column return more rows and columns than are actually present. For instance, this pilot input has 20 rows, but max_row reports 82.
Note that I manually selected the empty rows and columns, right-clicked and deleted them, as advised elsewhere on this forum. This didn't change the result.
def find_column_number(x):
    col = 0
    print("maxrow =", hrsh.max_row)
    print("maxcol =", hrsh.max_column)
    # openpyxl rows and columns are 1-based, so the ranges must start at 1
    for rowz in range(1, hrsh.max_row + 1):
        print("now the row is", rowz)
        for colz in range(1, hrsh.max_column + 1):
            print("now the column is", colz)
            name = hrsh.cell(row=rowz, column=colz).value
            if name == x:
                col = colz
    return col
The issue with max_row and max_column has been discussed here: https://bitbucket.org/openpyxl/openpyxl/issues/514/cell-max_row-reports-higher-than-actual. I applied the suggestion there, but max_row is still wrong.
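# hrsh.rows is a generator in recent openpyxl versions, so make it a list before reversing
for row in reversed(list(hrsh.rows)):
    values = [cell.value for cell in row]
    if any(values):
        print("last row with data is {0}".format(row[0].row))
        maxrow = row[0].row
        break  # stop at the bottom-most row that has data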
I then tried the suggestion at https://www.reddit.com/r/learnpython/comments/3prmun/openpyxl_loop_through_and_find_value_of_the/ to get the column values. Once again, the script takes the empty columns into account and reports more columns than are actually present.
for currentRow in hrsh.rows:
    for currentCell in currentRow:
        print(currentCell.value)
Can you please help me resolve this error, or suggest another method to achieve my aim?

As noted in the bug report you linked to, a sheet's reported dimensions can include empty rows or columns. If max_row and max_column don't report what you want to see, you will need to write your own code to find the first completely empty row. The most efficient way would be to start from max_row and work backwards, but the following is probably sufficient:
for max_row, row in enumerate(ws, 1):
    if all(c.value is None for c in row):
        break
# after a break, max_row is the first completely empty row (the last row with data is max_row - 1);
# if no row is empty, the loop ends without a break and max_row is simply the last row
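For the question's actual goal (finding the column under a given header), it may be enough to scan only the header row, so an inflated max_row never comes into play. A minimal sketch, assuming openpyxl 3.x (where cell.column is an integer) and headers in row 1; find_header_column and the sample headers are hypothetical. With merged headers, only the top-left cell of each merged range holds the value, so the function returns the first column of the four-column span.

```python
from openpyxl import Workbook


def find_header_column(ws, header, header_row=1):
    """Return the 1-based column index of `header` in `header_row`, or None if it is absent."""
    # Only the header row is scanned, so an inflated max_row doesn't matter here
    first_row = next(ws.iter_rows(min_row=header_row, max_row=header_row))
    for cell in first_row:
        # In a merged range only the top-left cell holds the value; the others read as None
        if cell.value == header:
            return cell.column
    return None


# Example layout: two fixed columns, then hit names each merged across four columns
wb = Workbook()
ws = wb.active
ws["A1"] = "query"
ws["B1"] = "length"
ws["C1"] = "hitA"
ws.merge_cells("C1:F1")
ws["G1"] = "hitB"
ws.merge_cells("G1:J1")
print(find_header_column(ws, "hitB"))
```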

I can confirm the bug found by the OP, and I have found newer posts reporting that max_row is too large.
I could not find a way to fix it.
In my case, it appears when I set the value of every cell in a worksheet to None.
After this, the worksheet still reports the old dimensions.
Calling ws.calculate_dimension() does not change anything.
Closing and restarting Excel does not help either; openpyxl still reports the same wrong dimensions.
This is a problem because ws.append() writes to the row after the highest row the worksheet has recorded, and there is no option to change this. You end up with a worksheet that is blank at the top, with the appended data appearing somewhere further down.
The only remedy I found is to delete the entire rows by hand in Excel; openpyxl then shows the correct max_row.
I found that this is linked to the member ws._cells not being emptied, as it should be, after all cells are set to None. However, the user should not delete this dictionary, since it is a private member.
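A possible alternative to deleting rows by hand, sketched under assumptions: in recent openpyxl 3.x releases, as far as I know, the public Worksheet.delete_rows() method removes the leftover cell objects from ws._cells and also resets the row that ws.append() writes to. This sketch assumes every existing row should be discarded; reset_sheet is a hypothetical helper name.

```python
from openpyxl import Workbook


def reset_sheet(ws):
    """Remove every stored cell, including ones whose value is None, via the public delete_rows API."""
    ws.delete_rows(1, ws.max_row)


wb = Workbook()
ws = wb.active
for row in range(1, 6):
    ws.append([row, row * 2])
for row in ws.iter_rows():
    for cell in row:
        cell.value = None  # the sheet looks blank, but max_row still reports 5

reset_sheet(ws)
ws.append(["new", "data"])  # expected to land in row 1 again
print(ws.max_row, ws["A1"].value)
```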

I see the same behaviour with openpyxl 3.0.3 (the latest version at the time). I use an XLSX file (converted from an XLS file) as a template: I open it, add some data, then save it under a different name. I found that max_row is set to 49, and I don't know why.
However, after reading this line in the online documentation at https://openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.worksheet.html:
Do not create worksheets yourself, use openpyxl.workbook.Workbook.create_sheet() instead
I created my XLSX template directly from openpyxl, simply as follows:
import openpyxl
wb = openpyxl.Workbook()
wb.save(filename="template.xlsx")
It works fine now (max_row=1). Hope it helps.

When you use openpyxl's max_row to get the number of rows containing data, it sometimes counts empty rows as well. This is because max_row returns the highest row index stored for the sheet, not the number of rows that contain data.
Example: say an Excel or Google Sheets file is created with 10 rows of data, and then the values in 5 of those rows are cleared. max_row still returns 10, because the file's highest row index is still 10: the file originally contained 10 rows.
So, to count the rows that actually contain data in openpyxl:
import openpyxl

def get_maximum_rows(*, sheet_object):
    # count every row with at least one non-empty cell; this is a count, so it differs
    # from the last row index when there are blank rows in between
    rows = 0
    for row in sheet_object.iter_rows(values_only=True):
        if not all(value is None for value in row):
            rows += 1
    return rows

workbook = openpyxl.load_workbook("<filepath>")  # <filepath> is a placeholder for your file's path
sheet_object = workbook.active
max_rows = get_maximum_rows(sheet_object=sheet_object)

Today I ran into the same problem. I edited the .xlsx file that I'm using with openpyxl, deleting all values from the rightmost column, and found that max_column did not give the correct value. Then I deleted the columns whose values I had cleared (right-click on the column 'ID' and delete). Now it reports the correct value.

I used Dharman's approach and solved the problem.
I had an Excel file with more than 100k rows, from which I had deleted the duplicates.
At first, max_row reported the total number of rows before the deletion.
I used the workbook.save(filename='another_filename.xlsx') method to save the original Excel file as a new one.
Then I used openpyxl to open the new file (another_filename.xlsx), and max_row now reports the correct number.

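In general, max_row and max_column are recomputed from all stored cells every time you read them, so reading them repeatedly (for example inside a loop) can make your script slow. It may be better to detect a None while iterating and store the row or column index at that point.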

Here is how I find the max column and max row by simply looping through the Excel sheet. With this code, you can compare openpyxl's values with the results of the loop.
from openpyxl import load_workbook

wb = load_workbook("Test.xlsx")
sheet = wb["Sheet 1"]
# read openpyxl's values first: sheet.cell() below creates any cell it visits
print("openpyxl max_column " + str(sheet.max_column))
print("openpyxl max_row " + str(sheet.max_row))

def get_maximum_cols(limit=20000):
    # walk along row 2 until the first empty cell; the column before it is the last filled one
    for i in range(1, limit):
        if sheet.cell(row=2, column=i).value is None:
            return i - 1
    return limit - 1

def get_maximum_rows(limit=20000):
    # walk down column 2 until the first empty cell; the row before it is the last filled one
    for i in range(1, limit):
        if sheet.cell(row=i, column=2).value is None:
            return i - 1
    return limit - 1

max_cols = get_maximum_cols()
max_rows = get_maximum_rows()
print('max column ' + str(max_cols))
print('max row ' + str(max_rows))

Updating rows in a SQLite database using Python

I have the following problem:
I want to update an existing SQLite database row by row. What happens now is that the loop updates all existing rows with the last assigned value of dbdata.
I don't want that.
I want to update row 1 with the first assigned value of dbdata. Then the loop should move on, get the next value, and update the next row.
Obviously there is a problem with the logic, but I cannot get my head around it.
At the moment every iteration updates all rows, which leaves me with the last assigned value of dbdata in every row. I only want one row to be updated per iteration.
How do I tell Python to always "go one row down"? Can someone give me a hint? I am not looking for a complete solution here. My current code is as follows:
for row in dbconnector:
    print(row)
    dbdata = langid.classify("{}".format(row))
    print(dbdata)
    connector.execute('''update SOMEDB set test1=? , test2=?''', (dbdata[-2], dbdata[-1]))
I am working with a SQLite Database and Python 3.3.

All your rows end up with the last dbdata because your UPDATE isn't restricted to a single row, so on each iteration every row is set to whatever dbdata you just processed. To restrict the update, add a WHERE clause so that the only row affected is the one you want.

Solved it. Thanks for all the input!
n = 0
for row in dbconnector:
    print(row)
    dbdata = langid.classify("{}".format(row))
    print(dbdata)
    # advance the counter once per fetched row; this assumes rowids run 1, 2, 3, ... without gaps
    n += 1
    print(n)
    connector.execute('''update SOMEDB set test1=?, test2=? where rowid = ?''', (dbdata[-2], dbdata[-1], n))
That works! With every iteration, the rowid being updated changes.
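The WHERE-clause advice can also be followed without a separate counter: select each row's rowid together with its text and use that value in the UPDATE, so the right row is updated even if the rowids have gaps. A self-contained sketch using Python's built-in sqlite3 module and an in-memory database; the body column, the lowercase table name and fake_classify (a hypothetical stand-in for langid.classify, which returns a (language, score) tuple) are assumptions.

```python
import sqlite3


def fake_classify(text):
    """Hypothetical stand-in for langid.classify: returns a (language, score) tuple."""
    return ("en", float(len(text)))


def classify_rows(conn, classify=fake_classify):
    # Fetch rowid with the text first, so the table isn't updated while a SELECT over it is still running
    rows = conn.execute("SELECT rowid, body FROM somedb").fetchall()
    for rowid, body in rows:
        lang, score = classify(body)
        # The WHERE clause limits each UPDATE to the row the text came from
        conn.execute("UPDATE somedb SET test1 = ?, test2 = ? WHERE rowid = ?", (lang, score, rowid))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE somedb (body TEXT, test1 TEXT, test2 REAL)")
conn.executemany("INSERT INTO somedb (body) VALUES (?)", [("hi",), ("hello",), ("hey there",)])
classify_rows(conn)
```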

We Keep Coding
