How to read outline levels using Python `openpyxl`?

My organization has a clean export for bills of materials (BOM). I would like to automatically parse the Excel file to check the BOM for certain attributes.
At the moment, I'm using Python with openpyxl.
I can read the Excel workbook and worksheet just fine, but I cannot find the attribute that contains the "outline level" of each row (I fully concede that I may be using the wrong terminology; another candidate term might be "group").
When I open my file in Excel, I see this at the left of the screen:
I would like to extract the 1 2 3 4 5 for each row and to tell which grouping each row is in.
My initial code is:
from pathlib import Path
import openpyxl as xl
path = Path('<path-to-my-file>.xlsx')
wb = xl.load_workbook(filename=path)
sh = wb.worksheets[0]
# ... would like to put outline level reading code here
From reading other questions, I suspect that I need to look at the row_dimensions.group method of the worksheet, but I can't get a handle on the syntax or the exact attribute that I'm looking for.

Thanks for the post. I was struggling with the same problem, and seeing your post gave me an idea!
I overcame it with the following code:
from pathlib import Path
import openpyxl as xl
path = Path('<path-to-my-file>.xlsx')
wb = xl.load_workbook(filename=path)
sh = wb.worksheets[0]
for row in sorted(sh.row_dimensions):
    # outlineLevel and outline_level are two names for the same value
    outline1 = sh.row_dimensions[row].outlineLevel
    outline2 = sh.row_dimensions[row].outline_level
    print(row, sh.row_dimensions[row], outline1, outline2)

Maybe you can use the following code to get each row's outline level as an integer. I use similar code, with a few more lines, to find the maximum outline level in a sheet.
for index in range(ws.min_row, ws.max_row + 1):
    row_level = ws.row_dimensions[index].outline_level + 1
Here, row_level is the outline level of the row; use it as required. Please double-check the +1, though: if I remember correctly, the stored value starts at 0 for ungrouped rows, so you need to add one to get the level number that Excel shows.
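For the "which grouping is each row in" part of the question, one possible approach is to turn the per-row outline levels into parent links with a stack. This is only a sketch, and it assumes the BOM is laid out so that each child row sits below its parent and has a higher outline level. The helper names `read_levels` and `parent_rows` are hypothetical; `read_levels` relies only on `ws.row_dimensions[...].outline_level`, the attribute used in the answers above.

```python
def read_levels(ws):
    """Return {row_number: outline_level} for every row of an openpyxl worksheet.

    Rows that are not grouped are reported as level 0.
    """
    return {
        row: ws.row_dimensions[row].outline_level or 0
        for row in range(ws.min_row, ws.max_row + 1)
    }


def parent_rows(levels):
    """Map each row to the nearest row above it with a smaller outline level.

    `levels` is a {row_number: outline_level} dict; top-level rows map to None.
    """
    parents = {}
    stack = []  # (row, level) pairs forming the current chain of ancestors
    for row in sorted(levels):
        level = levels[row]
        # Drop ancestors that are at the same depth as this row, or deeper.
        while stack and stack[-1][1] >= level:
            stack.pop()
        parents[row] = stack[-1][0] if stack else None
        stack.append((row, level))
    return parents


# Made-up levels for a small BOM (no workbook needed to try this out):
example = {1: 0, 2: 1, 3: 2, 4: 2, 5: 1, 6: 0}
print(parent_rows(example))
# {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: None}
```

With a real file, `parent_rows(read_levels(sh))` would give the parent row for every row in the sheet, under the layout assumption stated above.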

Related

Freeze Panes first two rows and column with openpyxl

I'm trying to freeze the first two rows and the first column with openpyxl. However, whenever I do so, Excel says that the file is corrupted and nothing is frozen.
Current code:
workbook = openpyxl.load_workbook(path)
worksheet = workbook[first_sheet]
freeze_panes = Pane(xSplit=2000, ySplit=3000, state="frozen", activePane="bottomRight")
worksheet.sheet_view.pane = freeze_panes
I took a look at the documentation, but it explains little about how to set the parameters.
Desired output:
Came across this answer, however, it fits a specific use case, hence, wanted to make a general question for future reference:
How to split Excel screen with Openpyxl?

To freeze the first two rows and the first column, use the sample code below; ws.freeze_panes works. Note that, as you would do in Excel, you select the cell just below the rows and just to the right of the columns you want to freeze: everything above and to the left of that cell is frozen. So, in your case, the cell should be B3. Hope this is what you are looking for.
import openpyxl
wb = openpyxl.load_workbook('Sample.xlsx')
ws = wb['Sheet1']
mycell = ws['B3']
ws.freeze_panes = mycell
wb.save('Sample.xlsx')
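Since the question asks for a general recipe, the rule above can be sketched as a small helper that computes the anchor cell from the number of rows and columns to freeze. The names `column_letter` and `freeze_cell` are hypothetical helpers written for this illustration. openpyxl also ships `openpyxl.utils.get_column_letter`, which could replace `column_letter` here.

```python
def column_letter(index):
    """Convert a 1-based column index to its Excel letter (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def freeze_cell(rows, cols):
    """Return the cell reference that freezes `rows` top rows and `cols` left columns.

    The anchor is the first cell that stays scrollable: one row below the
    frozen rows and one column to the right of the frozen columns.
    """
    return f"{column_letter(cols + 1)}{rows + 1}"


print(freeze_cell(2, 1))  # B3
print(freeze_cell(1, 0))  # A2
```

The result can be assigned directly, for example `ws.freeze_panes = freeze_cell(2, 1)`, since `freeze_panes` accepts a cell reference string as well as a cell object.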

Why does openpyxl keep corrupting my Excel files?

I'm trying to apply styles to cells in my excel files using the openpyxl library. If I try this (using an existing style and modifying it):
import openpyxl
wkbk = openpyxl.load_workbook('example.xlsx')
views_sheet = wkbk['Sheet']
cell_ = views_sheet.cell(row=4,column=3)
cell_.style = '20 % - Accent1'
bd = openpyxl.styles.Side(color=openpyxl.styles.colors.Color(theme=29))
cell_.border = openpyxl.styles.Border(left=bd, top=bd, right=bd, bottom=bd)
cell_.font = openpyxl.styles.Font(name='Calibri',size=11,bold=False,italic=False,vertAlign=None,underline='none',strike=False)
wkbk.save('example.xlsx')
When I open 'example.xlsx', I am told that my file is corrupted and needs to be repaired. I thought that maybe it isn't possible to write over an existing style, so I created a new named style "highlight" with the associated color:
highlight = openpyxl.styles.NamedStyle(name="highlight")
highlight.fill = openpyxl.styles.PatternFill(bgColor=openpyxl.styles.colors.Color(theme=30),fill_type='shaded',patternType='lightGray')
bd = openpyxl.styles.Side(color=openpyxl.styles.colors.Color(theme=29))
highlight.border = openpyxl.styles.Border(left=bd, top=bd, right=bd, bottom=bd)
highlight.font = openpyxl.styles.Font(name='Calibri',size=11,bold=False,italic=False,vertAlign=None,underline='none',strike=False)
wkbk.add_named_style(highlight)
cell_.style = 'highlight'
But then I keep getting a ValueError indicating that I need to provide a value for the parameter 'patternType' of class 'PatternFill'. This clearly does not make sense.
Maybe I'm doing this wrong (the documentation is hard to follow; I had to look up older analogous implementations and snippets). I would appreciate some help.
Thank you!

I realize this is very late, but I had a similar issue where openpyxl was corrupting my .xlsx files when I tried to write a Pandas dataframe to an Excel workbook. The issue turned out to be that the workbook had some other tabs with formulas in certain cells, and for some reason the formulas get corrupted when openpyxl runs. I don't understand why, but the "fix" is to remove the formulas (so, hardcode anything you can).

How to use Win32com to Find and Replace. My code wont execute Replace

New to Python--
My code won't execute Find and Replace after moving the sheet.
The goal is to bring in a new sheet with formulas, then find and replace the reference to the first book in those formulas. This will allow the formulas to be live in the second book.
Here is what I have so far. It returns "No Values were found", but they are there.
Any pointer in the right direction will help!
Various Functions
from win32com.client import Dispatch
path1 = (r'C:Full Path\Book1.xlsx')
path2 = (r'C:\Full Path\Book2.xlsx')
xl = Dispatch("Excel.Application")
xl.Visible = True
wb1 = xl.Workbooks.Open(Filename=path1)
wb2 = xl.Workbooks.Open(Filename=path2)
ws1 = wb1.Worksheets(1)
ws2 = wb2.Worksheets(1)
ws1.Copy(Before=wb2.Worksheets(1))
wb1.Close(SaveChanges=True)
#Cant get this part to work
ws2.Cells.Replace('C:Full Path\[Book1.xlsx]','')
Replace.Execute(ReplaceAll=1, Forward=True)
wb2.Close(SaveChanges=True)
xl.Quit()
I think the issue is letting excel know where to execute the Find and Replace.

Find and Replace are coupled in VBA. Do not be fooled by Python: this is a VBA problem, not a Python problem. Once you involve win32com, you have to use VBA methods and some of the VBA syntax.
Replace cannot work on its own. First, you need to use Find to mark the Range that you want to replace.
Usually it goes:
[object].Range.Find
    .[Text or Feature to find]
    .Replace.[what to replace with]
    .Replace.Execute
How to do this in Python depends on exactly what you want to find. I do not understand from your question what that would be.

Reading scientific numbers in xlrd

Pretty simple question but haven't been able to find a good answer.
In Excel, I am generating files that need to be automatically read. They are read by an ID number, but the format I get is setting it as text. When using xlrd, I get this format:
5.5112E+12
When I need it in this format:
5511195414392
What is the best way to achieve this? I would like to avoid using xlwt but if it is necessary I could use help on getting started in that process too

Give this a shot:
import decimal
decimalNotation = decimal.Decimal(scientificNotationValueFromExcel)
I made the following quick program to test it out. The Excel file it reads from has a single entry in the first cell.
from xlrd import open_workbook
import decimal
workbook = open_workbook('test.xlsx')
sheet = workbook.sheet_by_index(0)
value = sheet.cell_value(0, 0)
# Decimal prints the number in full, without an exponent
print(decimal.Decimal(value))
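If the ID is stored as a number and arrives in Python as a float, converting it to an integer before turning it into text may be enough. The sketch below assumes that situation; `id_to_text` is a hypothetical helper name. The second print shows why the already-rounded display string cannot be used as a source: the missing digits are not recoverable from it.

```python
from decimal import Decimal


def id_to_text(value):
    """Render a numeric cell value as a plain digit string, without an exponent."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    # Anything else (text, non-integral numbers) is passed through unchanged.
    return str(value)


print(id_to_text(5511195414392.0))  # 5511195414392
print(int(Decimal("5.5112E+12")))   # 5511200000000 (digits lost to rounding)
```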

I used the CSV module to figure this out, as it read the cells correctly.

Is there any way to edit an existing Excel file using Python preserving formulae?

I am trying to edit several excel files (.xls) without changing the rest of the sheet. The only thing close so far that I've found is the xlrd, xlwt, and xlutils modules. The problem with these is it seems that xlrd evaluates formulae when reading, then puts the answer as the value of the cell. Does anybody know of a way to preserve the formulae so I can then use xlwt to write to the file without losing them? I have most of my experience in Python and CLISP, but could pick up another language pretty quick if they have better support. Thanks for any help you can give!

I had the same problem, and eventually found the following module:
from openpyxl import load_workbook
def Write_Workbook():
    # path and Some_value are placeholders defined elsewhere
    wb = load_workbook(path)
    ws = wb["Sheet_name"]
    c = ws.cell(row=2, column=1)
    c.value = Some_value
    wb.save(path)
Doing this, my file was saved with all the previously inserted formulas preserved.
Hope this helps!

I've used the xlwt.Formula function before to be able to get hyperlinks into a cell. I imagine it will also work with other formulas.
Update: Here's a snippet I found in a project I used it in:
link = xlwt.Formula('HYPERLINK("%s";"View Details")' % url)
sheet.write(row, col, link)

As of now, xlrd doesn't read formulas. It's not that it evaluates them, it simply doesn't read them.
For now, your best bet is to programmatically control a running instance of Excel, either via pywin32 or Visual Basic or VBScript (or some other Microsoft-friendly language which has a COM interface). If you can't run Excel, then you may be able to do something analogous with OpenOffice.org instead.

We've just had this problem and the best we can do is to manually re-write the formulas as text, then convert them to proper formulas on output.
So open Excel and replace =SUM(C5:L5) with "=SUM(C5:L5)", including the quotes. If your formula contains a double quote, replace it with two double quotes to escape it, so = "a" & "b" becomes "= ""a"" & ""b"" ".
Then in your Python code, loop over every cell in the source and output sheets and do:
output_sheet.write(row, col, xlwt.ExcelFormula.Formula(source_cell[1:-1]))
We use this SO answer to make a copy of the source sheet to be the output sheet, which even preserves styles, and avoids overwriting the hand written text formulas from above.
