Openpyxl compare subsequent rows in column - python

I've been learning Python for the express purpose of creating a program that automates part of my job. I'm far enough along in the learning process to feel comfortable taking on a small part of the problem.
I want to create a function that combines two cells into one, keeping just one of their values (meaning I don't want to concatenate), if they are equal to each other. If they aren't, it will pass.
I don't know how to express this in a for loop. I really want to complete this project myself, but I need some jumping-off point. Any help is greatly appreciated.
I've created a virtual environment and have the following code. I understand indexing with for loops for lists, but don't know how it works with openpyxl. Again, I am very new to programming in general, but am excited to work on this problem. The issue I have now, which I haven't been able to find an answer to online, is how to refer to a cell's location and then refer to the subsequent cell's location.
from openpyxl import load_workbook
# load existing spreadsheet
wb = load_workbook('input.xlsx')
ws = wb.active
column_i = ws['I']
def same_name():
    for cell in column_i:
        # This is the part I can't work out: how do I refer to this cell
        # and then to the cell directly below it?
        if cell.value[cell] == cell.value[cell + 1]:
            pass

You can use iter_cols to loop through each column, then use merge_cells:
from openpyxl import load_workbook
def merge_subsequently(inpath, outpath):
    wb = load_workbook(inpath)
    ws = wb.active
    for col in ws.iter_cols():
        # pair every cell with the cell directly below it
        for _1st, _2nd in zip(col, col[1:]):
            if _1st.value == _2nd.value:
                ws.merge_cells(start_row=_1st.row, start_column=_1st.column,
                               end_row=_2nd.row, end_column=_2nd.column)
    wb.save(outpath)
p1 = "/Desktop/input.xlsx"
p2 = "/Desktop/output.xlsx"
merge_subsequently(p1, p2)
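One thing to watch with the pairwise loop: three or more equal values in a row produce a separate two-cell merge request per pair, and neighbouring empty cells compare equal too. A possible variation, offered only as a sketch, is to find each run of equal values first and merge it as one range. The names `equal_runs` and `merge_equal_runs` are hypothetical, and skipping empty cells is my assumption about what is wanted:

```python
from itertools import groupby


def equal_runs(values):
    """Yield (start, end) index pairs for runs of two or more equal, non-empty values."""
    index = 0
    for value, group in groupby(values):
        length = len(list(group))
        # Assumption: empty cells (None) should never be merged together.
        if value is not None and length > 1:
            yield index, index + length - 1
        index += length


def merge_equal_runs(inpath, outpath):
    # Imported here so equal_runs can be tried without openpyxl installed.
    from openpyxl import load_workbook

    wb = load_workbook(inpath)
    ws = wb.active
    for col in ws.iter_cols():
        values = [cell.value for cell in col]
        for start, end in equal_runs(values):
            first, last = col[start], col[end]
            # One merge per run, so ranges never overlap.
            ws.merge_cells(start_row=first.row, start_column=first.column,
                           end_row=last.row, end_column=last.column)
    wb.save(outpath)
```

It would be called the same way as above, e.g. `merge_equal_runs(p1, p2)`.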

Related

How to read outline levels using Python `openpyxl`?

My organization has a clean export for bills of materials (BOM). I would like to automatically parse the excel file to check the BOM for certain attributes.
At the moment, I'm using Python with openpyxl.
I can read the excel workbook and worksheet just fine, but I cannot seem to find the attribute that contains the "outline level" of each row (I fully concede that I may be using the wrong terminology... another term candidate might be "group").
When I look at my excel file using excel, I see this at the left of the screen:
I would like to extract the 1 2 3 4 5 from each of the rows and to tell what grouping they were in.
My initial code is:
from pathlib import Path
import openpyxl as xl
path = Path('<path-to-my-file>.xlsx')
wb = xl.load_workbook(filename=path)
sh = wb.worksheets[0]
# ... would like to put outline level reading code here
From reading other questions, I suspect that I need to look at the row_dimension.group method of the worksheet, but I can't seem to get a handle on the syntax or the exact attribute that I'm looking for.
Thanks for the post. I was struggling with the same problem and seeing your post gave me an idea!
I overcame it with the following code:
from pathlib import Path
import openpyxl as xl
path = Path('<path-to-my-file>.xlsx')
wb = xl.load_workbook(filename=path)
sh = wb.worksheets[0]
for row in sorted(sh.row_dimensions):
    dim = sh.row_dimensions[row]
    # outline_level is an alias of outlineLevel, so both hold the same value
    outline1 = dim.outlineLevel
    outline2 = dim.outline_level
    print(row, dim, outline1, outline2)
Maybe you can use the following code to gather individual row outline levels as integers. I use similar code, with a few more lines, to find the maximum outline level in a sheet.
for index in range(ws.min_row, ws.max_row + 1):  # + 1 so the last row is included
    row_level = ws.row_dimensions[index].outline_level + 1
Here row_level is the outline level; use it as required. Please double-check the +1: if I remember correctly, you need to add one to get the level number that Excel displays.
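To tell which group each row belongs to, the levels can be turned into a parent row per line. This is only a sketch under one assumption that may not hold for every export: each parent (summary) row sits above its children, as is common in a BOM. `parent_rows` and `read_outline_levels` are hypothetical helper names:

```python
def parent_rows(levels):
    """Map each row number to the nearest earlier row with a lower outline level.

    `levels` is a list of (row_number, outline_level) pairs in sheet order.
    Rows with no shallower row above them map to None.
    """
    parents = {}
    stack = []  # (row_number, level) of the ancestors that are still open
    for row, level in levels:
        # Close every group that is at the same depth or deeper than this row.
        while stack and stack[-1][1] >= level:
            stack.pop()
        parents[row] = stack[-1][0] if stack else None
        stack.append((row, level))
    return parents


def read_outline_levels(path):
    # Imported here so parent_rows can be tried without openpyxl installed.
    from openpyxl import load_workbook

    ws = load_workbook(path).worksheets[0]
    return [(r, ws.row_dimensions[r].outline_level or 0)
            for r in range(ws.min_row, ws.max_row + 1)]
```

For example, `parent_rows(read_outline_levels(path))` would give a dictionary from row number to parent row number.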

IDE does not suggest methods available on the selected object

I am very new to the Python world. Pardon any inane mistake(s).
What is the program trying to do?
I just wrote a piece of code which reads the data from an existing Excel file and prints a selected cell value.
Problem?
The IDE does not show all the suggestions: after the sheet (of type Worksheet) is loaded, typing "sheet." does not list the methods available on the sheet object.
Code
import pandas as pd
from openpyxl import load_workbook
test_f_path = '/Users/new_python_user/_Codes/_Personal/test_resources/Test_Update.xlsx'
sheet_name = 'Eight'
workbook = load_workbook(test_f_path)
sheet = workbook[sheet_name]
print(sheet.cell(1, 1).value) # <--- Here is problem: typing "sheet." does not provide suggestions
Pycharm Version
Pycharm Community 2019.3.3 for MAC
EDIT-1
I modified the program to cast the object, and then it worked. But this is not the way I thought Python worked. Maybe I am missing something. Please advise.
from openpyxl import load_workbook
from openpyxl.worksheet.worksheet import Worksheet # <--Imported here
test_f_path = '/Users/i852841/_Codes/_Personal/PyStockCrawler/test_resources/Test_Update.xlsx'
sheet_name = 'Eight'
workbook = load_workbook(test_f_path)
sheet = workbook[sheet_name]
sheet_casted = Worksheet(sheet) #<-- Cast here
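A note on the cast: Python has no casts, so `Worksheet(sheet)` calls the class constructor and likely builds a new worksheet object instead of re-typing the loaded one. A type annotation is usually enough to get completion. The following is a minimal sketch, assuming the IDE simply cannot infer what `workbook[sheet_name]` returns; `first_cell_value` is a hypothetical helper name:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only read by the type checker / IDE, never executed at runtime.
    from openpyxl.worksheet.worksheet import Worksheet


def first_cell_value(sheet: "Worksheet"):
    """Return the value of A1; the annotation is what drives completion on `sheet.`."""
    return sheet.cell(row=1, column=1).value
```

For a module-level variable the same idea would be `sheet: Worksheet = workbook[sheet_name]` with a normal import of `Worksheet`; the object itself stays the one that was loaded.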

Is there a way to save data in named Excel cells using Python?

I have used openpyxl for outputting values in Excel in my Python code. However, now I find myself in a situation where the cell locations in excel file may change based on the user. To avoid any problems with the program, I want to name the cells where the code can save the output to. Is there any way to have Python interact with named ranges in Excel?
For a workbook level defined name
import openpyxl
wb = openpyxl.load_workbook("c:/tmp/SO/namerange.xlsx")
ws = wb["Sheet1"]
mycell = wb.defined_names['mycell']
# destinations yields (sheet title, cell coordinate) pairs
for title, coord in mycell.destinations:
    ws = wb[title]
    ws[coord] = "Update"
wb.save('updated.xlsx')
print("{} {} updated".format(ws, coord))
I was able to find the parameters of the named range using defined_names. After that I just worked like it was a normal Excel cell.
from openpyxl import load_workbook
openWB=load_workbook('test.xlsx')
rangeDestination = openWB.defined_names['testCell']
print(rangeDestination)
sheetName=str(rangeDestination.attr_text).split('!')[0]
cellName = str(rangeDestination.attr_text).split('!')[1]
sheetToWrite=openWB[sheetName]
cellToWrite=sheetToWrite[cellName]
sheetToWrite[cellName]='TEST-A3'
print(sheetName)
print(cellName)
openWB.save('test.xlsx')
openWB.close()
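Splitting `attr_text` on "!" works for a simple name like Sheet1, but Excel wraps sheet names that contain spaces in single quotes, and those quotes would end up in `sheetName`. A small sketch of a more tolerant split follows; `split_reference` is a hypothetical helper, and stripping the `$` signs is my choice rather than something openpyxl requires:

```python
def split_reference(ref):
    """Split "Sheet1!$A$3" or "'My Sheet'!$A$3" into (sheet title, coordinate)."""
    # rpartition splits on the last "!", so a "!" inside a quoted name survives.
    sheet, _, coord = ref.rpartition("!")
    if len(sheet) >= 2 and sheet[0] == "'" and sheet[-1] == "'":
        # Quoted names double any embedded apostrophe; undo that as well.
        sheet = sheet[1:-1].replace("''", "'")
    return sheet, coord.replace("$", "")
```

With it, the two `split('!')` lines above could become `sheetName, cellName = split_reference(rangeDestination.attr_text)`. The `destinations` property used in the first answer is the alternative that avoids manual parsing altogether.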

Find words with underscores in excel worksheet by using Python

Is it possible to search/ parse through two columns in excel (let's say columns C & D) and find only the fields with underscores by using python?
Maybe a code like this? Not too sure..:
Import xl.range
Columns = workbook.get("C:D"))
Extract = re.findall(r'\(._?)\', str(Columns)
Please let me know if my code can be further improved on! :)
For those who need an answer, I solved it with this code:
import os
import re
from os.path import join
from openpyxl import load_workbook
dict_folder = "C:/...../abc"
fieldset = set()  # unique underscore-containing fields found so far
for file in os.listdir(dict_folder):
    if file.endswith(".xlsx"):
        wb1 = load_workbook(join(dict_folder, file), data_only=True)
        ws = wb1.active
        # ws["C":"D"] yields columns C and D, each as a tuple of cells
        for rowofcellobj in ws["C":"D"]:
            for cellobj in rowofcellobj:
                data = re.findall(r"\w+_.*?\w+", str(cellobj.value))
                if data:
                    fields = data[0]
                    fieldset.add(fields)
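The matching step can be tried on plain values before pointing it at a workbook. This sketch reuses the same pattern as the code above; `underscored_fields` is a hypothetical helper name, and like the original it keeps only the first match per cell:

```python
import re

# Same pattern as above: word characters, an underscore, then lazily
# anything up to further word characters.
UNDERSCORED = re.compile(r"\w+_.*?\w+")


def underscored_fields(values):
    """Return the set of first underscore-containing tokens, one per value at most."""
    found = set()
    for value in values:
        if value is None:  # openpyxl returns None for empty cells
            continue
        match = UNDERSCORED.search(str(value))
        if match:
            found.add(match.group(0))
    return found
```

With a worksheet at hand, something like `underscored_fields(cell.value for col in ws["C":"D"] for cell in col)` should give the same set as the loop above.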
Yes, it is indeed possible. The main library for that is pandas. With it installed (instructions here), after installing Python of course, you could do something along the lines of
import pandas as pd
# Reading the Excel worksheet into a pandas.DataFrame type object
sheet_path = 'C:\\Path\\to\\excel\\sheet.xlsx'
df = pd.read_excel(sheet_path)
# Keep the rows where column C or column D contains an underscore.
# 'C' and 'D' must be the header names of those columns in the sheet;
# na=False treats empty cells as non-matches.
underscored = df[(df['C'].str.contains('_', na=False)) | (df['D'].str.contains('_', na=False))]
And that'd do it for columns C and D within your worksheet.
pandas has extensive documentation, but for what you're after, the read_excel function documentation (which has examples) will suffice, along with some more material on Python itself, if needed.

Python: Write a dataframe to an already existing excel which contains a sheet with images

I have been working on this for too long now. I have an Excel file with one sheet (sheetname = 'abc') with images in it, and I want a Python script that writes a dataframe to a second, separate sheet (sheetname = 'def') in the same Excel file. Can anybody provide me with some example code? Every time I try to write the dataframe, the first sheet with the images gets emptied.
This is what I tried:
import numpy as np
import pandas as pd
from openpyxl import load_workbook
book = load_workbook('filename_of_file_with_pictures_in_it.xlsx')
writer = pd.ExcelWriter('filename_of_file_with_pictures_in_it.xlsx', engine = 'openpyxl')
writer.book = book
x1 = np.random.randn(100, 2)
df = pd.DataFrame(x1)
df.to_excel(writer, sheet_name = 'def')
writer.save()
book.close()
It saves the random numbers in the sheet with the name 'def', but the first sheet 'abc' now becomes empty.
What goes wrong here? Hopefully somebody can help me with this.
Interesting question! With openpyxl you can easily add values and keep the formulas, but you cannot retain the graphs. Even with the latest version (2.5.4), graphs do not stay. So, I decided to address the issue with xlwings:
import numpy as np
import xlwings as xw
wb = xw.Book(r"filename_of_file_with_pictures_in_it.xlsx")
sht = wb.sheets.add('SheetMod')
sht.range('A1').value = np.random.randn(100, 2)
wb.save(r"path_new_file.xlsx")
With this snippet I managed to insert the random set of values and save a new copy of the modified xlsx. When you run the commands, the Excel file will automatically open, showing you the new sheet without changing the existing ones (graphs and formulas included). Make sure you install all the dependencies needed to get xlwings running on your system. Hope this helps!
You'll need to use an Excel 'reader' like openpyxl or similar in combination with pandas for this. pandas' to_excel function is write-only, so it does not care what is already inside the file when it opens it.
