Merging Specific Cells in an Excel Sheet with Python

I've been trying to merge cells that meet specific criteria with the cell next to it via a loop, but I'm not quite sure how to go about it.
For example, starting at row 7, if the cell has the word "Sample" in it, I want it to merge with the cell in the column next to it and I want to keep doing that until I get to the end of that row.
I'm currently using openpyxl for this.
Here is what I've tried (it does not work):
wb = load_workbook('Test.xlsx')
ws = wb.active
worksheet = wb['Example']
q_cells = []
for row_cells in worksheet.iter_rows(min_row=7):
    for cell in row_cells:
        if cell.value == 'Sample':
            q_cells.append(cell.coordinate)
for item in q_cells:
    worksheet.merge_cells(item:item+1)
wb.save('merging.xlsx')
I'm not quite sure how best to proceed with this code. Any help would be appreciated!

merge_cells takes either a range string (e.g. "A2:A8") or the four range boundaries as keyword arguments. From the docs:
>>> ws.merge_cells('A2:D2')
>>> ws.unmerge_cells('A2:D2')
>>>
>>> # or equivalently
>>> ws.merge_cells(start_row=2, start_column=1, end_row=4, end_column=4)
>>> ws.unmerge_cells(start_row=2, start_column=1, end_row=4, end_column=4)
Source: https://openpyxl.readthedocs.io/en/stable/usage.html
It sounds like you will want to find your first cell and your last cell, and merge as such (here I'm using f-strings):
ws.merge_cells(f'{first_cell.coordinate}:{last_cell.coordinate}')
openpyxl records merges as ranges of cell coordinates rather than as individual cells (the worksheet's merged_cells attribute is a 'MultiCellRange'). openpyxl will let you create overlapping merge ranges without raising an error, but Excel won't open the resulting file without a warning (and will probably remove the later merges). If you want to merge, you have to specify the whole range.
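For the loop described in the question, one possible approach is to collect the matching cells first and merge afterwards, so that a cell already absorbed into a merge is never merged a second time. This is only a sketch: find_merge_targets and merge_with_right_neighbour are hypothetical helper names, and it assumes an openpyxl version whose iter_rows accepts values_only=True.

```python
def find_merge_targets(rows_of_values, first_row, keyword="Sample"):
    """Return 1-based (row, column) pairs of keyword cells to merge rightward.

    rows_of_values is any iterable of row tuples/lists of plain cell values,
    and first_row is the sheet row number of the first row in that iterable.
    """
    targets = []
    for row_offset, values in enumerate(rows_of_values):
        col = 1
        while col <= len(values):
            if values[col - 1] == keyword:
                targets.append((first_row + row_offset, col))
                col += 2  # the right-hand neighbour is consumed by this merge
            else:
                col += 1
    return targets


def merge_with_right_neighbour(ws, first_row=7, keyword="Sample"):
    """Merge every keyword cell with the cell in the next column (assumed openpyxl worksheet)."""
    rows = ws.iter_rows(min_row=first_row, values_only=True)
    targets = find_merge_targets(rows, first_row, keyword)
    for row, col in targets:
        # Keyword form of merge_cells, as shown in the docs excerpt above.
        ws.merge_cells(start_row=row, start_column=col,
                       end_row=row, end_column=col + 1)
    return targets


# Possible usage, with the file and sheet names from the question:
#   from openpyxl import load_workbook
#   wb = load_workbook('Test.xlsx')
#   merge_with_right_neighbour(wb['Example'])
#   wb.save('merging.xlsx')
```

Skipping the neighbour column avoids the overlapping merge ranges mentioned above. Keep in mind that a merged range displays only its top-left value, so whatever was in the right-hand cell should not be expected to survive the merge.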

Related

How can I pull data from rows based on the presence of a specific character in a specific column with Google Sheets?

I'm sending API calls to Google sheets to retrieve information like so:
gc = gspread.authorize(credentials)
def grab_available_row(wks):
    str_list = list(filter(None, wks.col_values(17)))
    return str(len(str_list)+1)
wks = gc.open("test").worksheet("Logs")
grab_row = grab_available_row(wks)
try:
    GrabRequestTest = wks.acell("B{}".format(grab_row)).value
except:
    pass
try:
    print(GrabRequestTest)
    ctypes.windll.user32.MessageBoxW(0, "DONE!!!", "DONE!!!", 1)
    sys.exit()
except:
    pass
With this, I can retrieve information in any row if there is no value present in column #17. In other words, this essentially reads from the first available row without anything in column #17. If I put an X in column 17, it will read the row below it. This isn't exactly what I'm looking for.
I'd like to be able to print all values in a row where a specific character like X is present in column 17, and ignore all other rows. I'd then take the data from each row with X present in column 17 and use mail merge to generate a bunch of .docx files. I can easily figure out the second part. Anybody know how to accomplish the first part? (print values in a specific row where X is present in column 17)

From "I'd like to be able to print all values in a row where a specific character like X is present in column 17, and ignore all other rows", I believe your goal in this question is as follows.
You want to retrieve only the rows that have a specific value in column 17 (that is, column "Q").
You want to achieve this using gspread for Python.
In this case, how about the following modification?
Modified script:
gc = gspread.authorize(credentials)
wks = gc.open("test").worksheet("Logs")
search = "X" # Please set the search value you expect.
values = [r for r in wks.get_all_values() if r[16] == search]
print(values)
When this script is run, the rows whose column "Q" equals the value of search are retrieved as a 2-dimensional array (a list of lists).
Added:
From the following reply,
This is almost it! How can you print by column only? like, I only want the value from column 3 from the array.. print(values[4]) doesn't seem to work.
In this case, how about the following sample script?
Sample script:
gc = gspread.authorize(credentials)
wks = gc.open("test").worksheet("Logs")
search = "X" # Please set the search value you expect.
col = 3 # From your reply, the values of column "C" are retrieved.
values = [r[col - 1] for r in wks.get_all_values() if r[16] == search]
print(values)
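If some rows might be shorter than 17 columns, indexing r[16] directly would raise an IndexError. A slightly more defensive sketch of the same filtering is shown below; rows_matching and column_of_matches are hypothetical helper names, and the rows are assumed to be a plain list of lists such as the one returned by wks.get_all_values().

```python
def rows_matching(rows, match_col, search):
    """Return the rows whose 1-based column match_col equals search."""
    index = match_col - 1
    # Rows too short to have that column are skipped instead of raising IndexError.
    return [r for r in rows if len(r) > index and r[index] == search]


def column_of_matches(rows, match_col, search, want_col):
    """Return the 1-based column want_col from every matching row."""
    return [
        r[want_col - 1] if len(r) >= want_col else ""
        for r in rows_matching(rows, match_col, search)
    ]


# Possible usage with the sheet from the question:
#   all_rows = wks.get_all_values()
#   print(rows_matching(all_rows, 17, "X"))         # whole rows marked with X
#   print(column_of_matches(all_rows, 17, "X", 3))  # only column "C" of those rows
```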

Append data from list to excel using openpyxl

I am using openpyxl to work with Excel in Python.
I have a list and I want to write each of its values to an Excel file. My current code:
for y in myzoo:
    loo1 = str(y)
    c5a = my_sheet.cell(row=21, column=3)
    c5a.value = loo1
myzoo is the list (it's originally a pyodbc.Row).
I convert each entry to a string and then save it to the Excel file. The problem is that currently only the last value is saved, overwriting all the earlier ones. I want to do one of two things: save each value in the next empty cell of the row, or (less preferably) save all of the exported data into the one cell without deleting the earlier values. Thanks.

I think you can just do something like this:
column = 3 # start column
while myzoo:
    c5a = my_sheet.cell(row=21, column=column)
    if not c5a.value:
        # the cell is empty, so take the next value from the list
        c5a.value = str(myzoo.pop(0))
    column += 1
Note that this empties myzoo. If you need to preserve it, work on a copy instead (temp = myzoo.copy()).
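An alternative that leaves the list untouched is to compute the target column from each value's position. The sketch below is an illustration only: plan_row_cells and write_row are hypothetical names, and unlike the loop above it overwrites whatever is already in the target cells rather than skipping filled ones.

```python
def plan_row_cells(values, row=21, start_column=3):
    """Return (row, column, text) triples, one consecutive column per value."""
    return [
        (row, start_column + offset, str(value))
        for offset, value in enumerate(values)
    ]


def write_row(sheet, values, row=21, start_column=3):
    """Write the planned cells to a worksheet (assumed to be an openpyxl worksheet)."""
    for r, c, text in plan_row_cells(values, row, start_column):
        sheet.cell(row=r, column=c, value=text)


# Possible usage with the names from the question:
#   write_row(my_sheet, myzoo)  # fills row 21 from column 3 onward
```

Because the values are only read through enumerate, this should also work when myzoo is a row object rather than a real list, as long as it can be iterated.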

openpyxl iterate through rows and apply formula

I am trying to iterate through the rows of a particular column in an Excel worksheet, apply a formula, and save the output. I'm struggling to get my code right and am not sure where to go next.
My code so far:
import openpyxl
wb = openpyxl.load_workbook('test-in.xlsx')
sheet = wb.worksheets[2]
maxRow = sheet.max_row
for row in range(2, maxRow)
wb.save('test-out.xlsx')
So I'm not clear how to write my for loop to write the results of applying the =CLEAN(D2) formula, in column E. I can apply the formula to a single cell with:
sheet['I2'] = '=CLEAN(D2)'
However I'm not sure how I can incorporate this into my for loop!
Any help much appreciated...

Try this (max_row_num is your maxRow; in Python we usually do not use camelCase for variables):
for row_num in range(2, max_row_num + 1):  # + 1 so that the last row is included
    sheet['E{}'.format(row_num)] = '=CLEAN(D{})'.format(row_num)
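The same idea can be split into building the formulas and writing them, which makes the row range easy to check before touching the workbook. This is a sketch with hypothetical helper names (clean_formulas, apply_formulas); the column letters default to the D and E columns from the question.

```python
def clean_formulas(first_row, last_row, source_col="D", target_col="E"):
    """Map each target cell reference to a CLEAN formula for the same row."""
    return {
        f"{target_col}{row}": f"=CLEAN({source_col}{row})"
        for row in range(first_row, last_row + 1)  # last_row is included
    }


def apply_formulas(sheet, formulas):
    """Assign each formula string to its cell, e.g. sheet['E2'] = '=CLEAN(D2)'."""
    for ref, formula in formulas.items():
        sheet[ref] = formula


# Possible usage with the names from the question:
#   apply_formulas(sheet, clean_formulas(2, sheet.max_row))
#   wb.save('test-out.xlsx')
```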

This is covered in the documentation: http://openpyxl.readthedocs.io/en/latest/tutorial.html#accessing-many-cells

Openpyxl max_row and max_column wrongly reports a larger figure

My query is to do with a function that is part of a parsing script I'm developing. I am trying to write a Python function to find the column number corresponding to a matched value in Excel. The Excel file has been created on the fly with openpyxl, and its first row holds headers (from the 3rd column onward) that each span 4 columns merged into one. In my subsequent function, I am parsing some content to be added to the columns corresponding to the matching headers. (Additional info: the content I'm parsing is blast+ output. I'm trying to create a summary spreadsheet with the hit names in each column, with subcolumns for hits, gaps, span and identity. The first two columns are the query contigs and their lengths.)
I had initially written a similar function for xlrd and it worked. But when I try to rewrite it for openpyxl, I find that max_row and max_column wrongly return a larger number of rows and columns than are actually present. For instance, I have 20 rows for this pilot input, but max_row reports 82.
Note that I manually selected the empty rows and columns, right-clicked and deleted them, as advised elsewhere in this forum. This didn't change the error.
def find_column_number(x):
    col = 0
    print "maxrow = ", hrsh.max_row
    print "maxcol = ", hrsh.max_column
    for rowz in range(hrsh.max_row):
        print "now the row is ", rowz
        if(rowz > 0):
            pass
        for colz in range(hrsh.max_column):
            print "now the column is ", colz
            name = (hrsh.cell(row=rowz,column=colz).value)
            if(name == x):
                col = colz
    return col
The issue with max_row and max_col, has been discussed here https://bitbucket.org/openpyxl/openpyxl/issues/514/cell-max_row-reports-higher-than-actual I applied the suggestion here. But the max_row is still wrong.
for row in reversed(hrsh.rows):
    values = [cell.value for cell in row]
    if any(values):
        print("last row with data is {0}".format(row[0].row))
        maxrow = row[0].row
I then tried the suggestion at https://www.reddit.com/r/learnpython/comments/3prmun/openpyxl_loop_through_and_find_value_of_the/, and tried to get the column values. Once again, the script takes the empty columns into account and reports a higher number of columns than are actually present.
for currentRow in hrsh.rows:
    for currentCell in currentRow:
        print(currentCell.value)
Can you please help me resolve this error, or suggest another method to achieve my aim?

As noted in the bug report you linked to, there's a difference between a sheet's reported dimensions and whether these include empty rows or columns. If max_row and max_column are not reporting what you want to see, then you will need to write your own code to find the first completely empty row. The most efficient way, of course, would be to start from max_row and work backwards, but the following is probably sufficient:
for max_row, row in enumerate(ws, 1):
    if all(c.value is None for c in row):
        # max_row is now the index of the first completely empty row
        break
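The backwards scan mentioned above could look roughly like the sketch below. last_row_with_data is a hypothetical helper name; it works on plain row values, which an openpyxl worksheet is assumed to provide through iter_rows(values_only=True).

```python
def last_row_with_data(rows_of_values):
    """Return the 1-based index of the last row holding any non-None value.

    Returns 0 when every row is empty. The index is relative to the first
    row supplied, so it equals the sheet row number if iteration starts at row 1.
    """
    rows = list(rows_of_values)
    # Walk from the bottom up and stop at the first row that has data.
    for index in range(len(rows), 0, -1):
        if any(value is not None for value in rows[index - 1]):
            return index
    return 0


# Possible usage with the worksheet from the question:
#   real_max_row = last_row_with_data(hrsh.iter_rows(min_row=1, values_only=True))
```

Unlike the forward loop, this does not stop at a blank row in the middle of the data, which may or may not be what you want.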

I confirm the bug found by the OP. I found newer posts reporting max_row being too large.
This bug cannot be fixed.
In my case, it appears when I set the value of all cells in a worksheet to None.
After this operation, the worksheet still reports the old dimensions.
A call to ws.calculate_dimensions() does not change anything.
Closing and restarting Excel still has openpyxl report the same wrong dimensions.
This is a problem because ws.append() starts at ws.max_row, and there is no way to override this behaviour. You end up with a worksheet that is blank and then, somewhere further down, the data you appended appears.
The only remedy I found for this bug is to delete the entire rows by hand in Excel. openpyxl then shows the correct max_row.
I found out that this is linked to the member ws._cells not being empty as it should after setting all cells to None. However, the user cannot delete this dictionary as it is a private member.

I have the same behaviour with the latest version 3.0.3 of openpyxl. I use an XLSX file as a template (created from a XLS file), open it, add some data then save it with a different name. I find out that max_row is set to 49 and I don't know why.
However after reading in the online documentation https://openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.worksheet.html this line:
Do not create worksheets yourself, use
openpyxl.workbook.Workbook.create_sheet() instead
I created my XLSX template directly from openpyxl simply as follows:
wb = openpyxl.Workbook()
wb.save(filename="template.xlsx")
It works fine now (max_row=1). Hope it helps.

When you use openpyxl's max_row to get the number of rows containing data in a sheet, it sometimes counts empty rows as well. This is because max_row returns the highest row index of the sheet, not the count of rows containing data.
Example: say an Excel/Google Sheets file is created with 10 rows of data and 5 of those rows are then removed. max_row still returns 10, because the highest row index of the file is still 10, as the file initially contained 10 rows.
So, to get the number of rows containing data in openpyxl:
import openpyxl
def get_maximum_rows(*, sheet_object):
    rows = 0
    for row in sheet_object:
        # count the row only if at least one of its cells has a value
        if not all(col.value is None for col in row):
            rows += 1
    return rows
workbook = openpyxl.load_workbook(<filepath>)
sheet_object = workbook.active
max_rows = get_maximum_rows(sheet_object=sheet_object)

Today I encountered the same issue. I edited the .xlsx file that I'm using in openpyxl. I deleted all values from the rightmost column and found that max_column was not giving the actual last column. Then I deleted the columns whose cell values had previously been deleted (right-click on the column 'ID' and delete). Now it reports the correct value.

I used Dharman's approach and solved the problem.
I had an Excel file with more than 100k rows. I had deleted the duplications in this file.
At first, the max_row reported the total row number before the deletion.
I used the workbook.save(filename='another_filename.xlsx') method to save the original Excel file to a new one.
Then I used openpyxl to open the new file (another_filename.xlsx). max_row reports the correct number now.

In general, relying on max_row and max_column can make your script slow to run. It may be better to detect the first None yourself and store that row or column.

Here is how I find the max column and max row by simply looping through the Excel sheet. With this code, you can compare the values reported by openpyxl with the values found by the loop.
from openpyxl import load_workbook
wb = load_workbook("Test.xlsx")
sheet = wb["Sheet 1"]
print("Python defined max_column " + str(sheet.max_column))
print("Python defined max_row " + str(sheet.max_row))
def get_maximum_cols():
    max_col = None  # stays None if no empty cell is found in the scanned range
    for i in range(1, 20000):
        # stop at the first empty cell in row 2
        if sheet.cell(row=2, column=i).value is None:
            max_col = i
            break
    return max_col
def get_maximum_rows():
    max_row = None  # stays None if no empty cell is found in the scanned range
    for i in range(1, 20000):
        # stop at the first empty cell in column 2
        if sheet.cell(row=i, column=2).value is None:
            max_row = i
            break
    return max_row
max_cols = get_maximum_cols()
max_rows = get_maximum_rows()
print('max column ' + str(max_cols))
print('max row ' + str(max_rows))
wb.save("Test.xlsx")

Openpyxl Python - Vlookup Iterate through rows

I'm trying to automate a daily report we have, and I'm using a query to pull in data and writing it in Excel using openpyxl, and then doing a vlookup in openpyxl to match a cell value. Unfortunately I'm hung up on how to iterate through the rows to find the cell value to look up.
for row in ws['E5:E91']:
    for cell in row:
        cell.value = "=VLOOKUP(D5, 'POD data'!C1:D87, 2, FALSE)"
It works except I don't know how to change the D5 value to look up D6, D7, D8, etc. depending on the row I'm on. I'm honestly at a loss for how to best approach this. Obviously I don't feel like writing the formula out for every single row, and there's other columns I'd like to do this for once I get it.

Using your example, you can do:
for row in ws['E5:E91']:
    for cell in row:
        cell.value = "=VLOOKUP(D{0}, 'POD data'!C1:D87, 2, FALSE)".format(cell.row)
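Since the question mentions doing the same for other columns later, the formula text can be built by a small helper and reused. This is only a sketch: vlookup_formula and fill_vlookups are hypothetical names, and the $ signs in the lookup range are my addition (they only matter if the formulas are later copied or filled inside Excel).

```python
def vlookup_formula(row, lookup_col="D",
                    table="'POD data'!$C$1:$D$87", result_index=2):
    """Build the VLOOKUP text for one row, looking up the cell <lookup_col><row>."""
    return f"=VLOOKUP({lookup_col}{row}, {table}, {result_index}, FALSE)"


def fill_vlookups(ws, first_row, last_row, target_col="E", **formula_options):
    """Write one VLOOKUP per row into target_col (ws is assumed to be an openpyxl worksheet)."""
    for row in range(first_row, last_row + 1):
        ws[f"{target_col}{row}"] = vlookup_formula(row, **formula_options)


# Possible usage for the range in the question (E5:E91):
#   fill_vlookups(ws, 5, 91)
# and for another column pair, for example:
#   fill_vlookups(ws, 5, 91, target_col="G", lookup_col="F")
```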
