Getting values from a column in excel using openpyxl - python

I had intended to do this using the code in the answer here, in the last block of code. However, I get an error on the line for cell in ws.iter_rows(range_string=range_expr): saying that "Worksheet object has no attribute iter_rows". Any idea what I'm doing wrong here?

All I needed was to change the workbook declaration to the following: wb = load_workbook('path/doc.xls', use_iterators=True), adding the use_iterators parameter. Simple issue, simple solution :)
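
For readers on a newer openpyxl, here is a minimal sketch of reading one column. It assumes a release where iter_rows is available on ordinary worksheets and takes min_col/max_col arguments rather than range_string; the helper name column_values and the sample data are made up for illustration.

```python
from openpyxl import Workbook


def column_values(ws, col_index):
    """Return the values of one column (1-based index), top to bottom."""
    # Restricting min_col and max_col to the same index yields 1-cell rows.
    return [row[0].value
            for row in ws.iter_rows(min_col=col_index, max_col=col_index)]


# Build a small in-memory workbook so the sketch runs without a file.
wb = Workbook()
ws = wb.active
for name in ("name", "alice", "bob"):
    ws.append([name, len(name)])

names = column_values(ws, 1)
lengths = column_values(ws, 2)
print(names, lengths)
```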

Related

Broken Excel output: Openpyxl formula settings?

I am creating some Excel spreadsheets from pandas DataFrames using the pandas.ExcelWriter().
Issue:
For some string input, this creates broken .xlsx files that need to be repaired. (problem with some content --- removed formula, cf error msg below)
I assume this happens because Excel interprets the cell content not as a string, but a formula which it cannot parse, e.g. when a string value starts with "="
Question:
When using xlsxwriter as engine, I can solve this issue by setting the argument options = {"strings_to_formulas" : False }
Is there a similar argument for openpyxl?
Troubleshooting:
I found the data_only argument to Workbook, but it only seems to apply to reading files / I cannot get it to work with ExcelWriter().
Not all output values are strings / I'd like to avoid converting all output to str
Could not find an applicable question on here
Any hints are much appreciated, thanks!
Error messages:
We found a problem with some content in 'file.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes
The log after opening says:
[...] summary="Following is a list of removed records:">Removed Records: Formula from /xl/worksheets/sheet1.xml part [...]
Code
import pandas
excelout = pandas.ExcelWriter(output_file, engine = "openpyxl")
df.to_excel(excelout)
excelout.save()
Versions:
pandas #0.24.2
openpyxl #2.5.6
Excel 2016 for Mac (but replicates on Win)
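A possible workaround on the openpyxl side, sketched under assumptions: openpyxl tags a string that starts with "=" as a formula (data_type "f"), and the tag can be reset to "s" so the cell is written as text. The helper name force_text is hypothetical, and the sketch uses a plain openpyxl workbook; with pandas the same pass would have to run on the worksheet behind the ExcelWriter before saving, which is not shown here.

```python
from openpyxl import Workbook


def force_text(ws):
    """Mark every formula-typed cell in ws as a plain string.

    Returns the number of cells that were changed.
    """
    changed = 0
    for row in ws.iter_rows():
        for cell in row:
            if cell.data_type == "f":   # openpyxl guessed "formula"
                cell.data_type = "s"    # store it as text instead
                changed += 1
    return changed


wb = Workbook()
ws = wb.active
ws.append(["=not a formula", 42, "plain"])

first_type = ws["A1"].data_type   # type guessed from the leading "="
n_changed = force_text(ws)
print(first_type, n_changed, ws["A1"].data_type)
```

Note that this converts real formulas to text as well, so it only fits sheets that should contain no formulas at all.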
I've struggled with this issue too.
I have found a strange solution for formulas.
I had to replace all ; (semicolon) signs with , (comma) in the formulas.
When I opened the resulting xlsx file with Excel, the error did not occur and the formula in Excel had the usual ;.
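This matches the openpyxl documentation, which says formulas must use English function names and comma-separated arguments; Excel then displays them with the local separator. A small sketch of the replacement follows. The helper name to_file_syntax is made up, and the naive replace assumes no semicolon appears inside a string literal in the formula.

```python
from openpyxl import Workbook


def to_file_syntax(formula):
    """Swap ';' argument separators for ','.

    Naive: assumes there is no ';' inside a quoted string in the formula.
    """
    return formula.replace(";", ",")


wb = Workbook()
ws = wb.active
ws["A1"] = 1
ws["A2"] = 2
# Formula as it would be typed in a locale that uses ';' as separator.
ws["A3"] = to_file_syntax("=IF(A1>A2;A1;A2)")
print(ws["A3"].value)
```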
I spent FAR too long trying to figure out this error.
Turned out I had an extra bracket, so the formula wasn't valid.
I know 99% of people will read this and say "that's not the issue" and move on, but take your formula and paste it into Excel if you can (replacing dynamic values as best you can) and see if Excel accepts it.
If it accepts it, fine, move on and find whatever the other cause is, but if you find it doesn't like the formula, maybe I just saved you a couple of hours.
My command: f'''=IF(ISBLANK(E{row}),FALSE," "))'''
Tiny command, could not understand what was wrong with it. :facepalm:
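A quick pre-check for this particular mistake can be done in plain Python before the formula is written. The sketch below only counts brackets outside double-quoted strings; it is not a formula parser, and the function name check_parens is made up for illustration.

```python
def check_parens(formula):
    """Return None if brackets balance, otherwise a short description."""
    depth = 0
    in_string = False
    for pos, ch in enumerate(formula):
        if ch == '"':
            # A doubled quote ("") toggles twice, so the state is unchanged.
            in_string = not in_string
        elif in_string:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return f"unmatched ')' at position {pos}"
    if depth > 0:
        return f"{depth} unclosed '('"
    return None


row = 5
bad = f'=IF(ISBLANK(E{row}),FALSE," "))'   # the formula from the post
good = f'=IF(ISBLANK(E{row}),FALSE," ")'   # extra bracket removed
print(check_parens(bad))
print(check_parens(good))
```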

How to split Excel screen with Openpyxl?

I've been trying to use Openpyxl to split the Excel screen vertically (in Excel, the Split button in the View tab on the ribbon). I haven't found any guide on how to do this, but I have found this web page (https://openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.views.html) and I think the "ySplit" property may be the solution. Unfortunately I haven't been able to figure out how to use it properly. I've tried the following code:
wb = openpyxl.load_workbook('file.xlsx')
sh = wb.active
sh.sheet_view.pane.ySplit = 20
EDIT: But this code does not work: AttributeError: 'NoneType' object has no attribute 'ySplit'.
I've also tried some variations of the code above (with ySplit), but without success. If anybody could help me find a way to split the screen, it would be much appreciated.
Thanks in advance.
EDIT2: The solution was provided by stovfl in comments. The code should be:
sh.sheet_view.pane = openpyxl.worksheet.views.Pane(xSplit=20.0, ySplit=None, topLeftCell='C1', activePane='topLeft', state='split')
Question: How to split Excel screen with Openpyxl?
To display a worksheet split into panes, you have to create an openpyxl.worksheet.views.Pane object and assign it to myWorksheet.sheet_view.pane.
import openpyxl
from openpyxl.worksheet.views import Pane
wb = openpyxl.load_workbook('file.xlsx')
ws = wb.active
# sheet_view.pane is None until a Pane object is assigned
ws.sheet_view.pane = Pane(xSplit=20.0, ySplit=None,
                          topLeftCell='C1', activePane='topLeft', state='split')
wb.save('file.xlsx')
openPyXL - worksheet.views.Pane
class openpyxl.worksheet.views.Pane(xSplit=None, ySplit=None, topLeftCell=None, activePane='topLeft', state='split')
activePane
Value must be one of {‘topLeft’, ‘bottomRight’, ‘topRight’, ‘bottomLeft’}
state
Value must be one of {‘split’, ‘frozen’, ‘frozenSplit’}
topLeftCell
Values must be of type <class ‘str’>
xSplit
Values must be of type <class ‘float’>
ySplit
Values must be of type <class ‘float’>
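
To tie this back to the original AttributeError, here is a minimal sketch on a fresh in-memory workbook. It shows that sheet_view.pane starts out as None, which is why setting ySplit on it fails, and then assigns a Pane with ySplit set. The value 20.0 and the cell 'A10' are arbitrary examples; what the number means depends on state, and the units have not been verified here.

```python
from openpyxl import Workbook
from openpyxl.worksheet.views import Pane

wb = Workbook()
ws = wb.active

# On a new sheet no pane exists yet, so pane.ySplit = ... cannot work.
pane_before = ws.sheet_view.pane

# Create the pane explicitly, this time splitting along the y axis.
ws.sheet_view.pane = Pane(xSplit=None, ySplit=20.0,
                          topLeftCell="A10", activePane="bottomLeft",
                          state="split")

print(pane_before, ws.sheet_view.pane.ySplit, ws.sheet_view.pane.state)
```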

pd.read_excel does recognize the file but does not actually read it

I've been busy working on some code and one part of it is importing an excel file. I've been using the code below. Now, on one pc it works but on another it does not (I did change the paths though). Python does recognize the excel file and does not give an error when loading, but when I print the table it says:
Empty DataFrame
Columns: []
Index: []
Just to be sure, I checked the filepath which seems to be correct. I also checked the sheetname but that is all good too.
df = pd.read_excel(book_filepath, sheet_name='Potentie_alles')
description = df["#"].map(str)
This raises a KeyError for '#' (# is the header of the first column of the sheet).
Does anyone know how to fix this?
Kind regards,
iCookieMonster
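
One way to narrow this down, sketched under assumptions: open the same file with openpyxl and list each sheet's name and used dimensions, to check whether the sheet pandas is asked for really holds data in the copy on this machine. The helper name sheet_shapes is hypothetical, and the sample workbook is built in memory only so that the sketch runs without a file.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def sheet_shapes(source):
    """Map each sheet title to (max_row, max_column) as openpyxl sees it."""
    wb = load_workbook(source)
    return {ws.title: (ws.max_row, ws.max_column) for ws in wb.worksheets}


# Stand-in for the real file: one sheet with a header row and one data row.
sample = Workbook()
ws = sample.active
ws.title = "Potentie_alles"
ws.append(["#", "name"])
ws.append([1, "a"])

buffer = BytesIO()
sample.save(buffer)
buffer.seek(0)

shapes = sheet_shapes(buffer)   # pass a file path here for a real workbook
print(shapes)
```

If the sheet shows only a single row and column, the file itself is probably the problem (for example an unsaved or different copy) rather than the read_excel call.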

Openpyxl not removing sheets from created workbook correctly

So I ran into an issue with remove_sheet() in openpyxl that I can't find an answer to. When I run the following code:
import openpyxl
wb = openpyxl.Workbook()
ws = wb.create_sheet("Sheet2")
wb.get_sheet_names()
['Sheet','Sheet2']
wb.remove_sheet('Sheet')
I get the following error:
ValueError: list.remove(x): x not in list
It doesn't work, even if I try wb.remove_sheet(0) or wb.remove_sheet(1), I get the same error. Is there something I am missing?
If you use get_sheet_by_name you will get the following:
DeprecationWarning: Call to deprecated function get_sheet_by_name (Use
wb[sheetname]).
So the solution would be:
xlsx = Workbook()
xlsx.create_sheet('other name')
xlsx.remove(xlsx['Sheet'])
xlsx.save('some.xlsx')
remove_sheet() is given a sheet object, not the name of the sheet!
So for your code you could try
wb.remove(wb.get_sheet_by_name(sheet))
In the same vein, remove_sheet is also not given an index, because it operates on the actual sheet object.
Here's a good source of examples (though it isn't the same problem you're facing, it just happens to show how to properly call the remove_sheet method)!
Since the question was posted and answered, the Openpyxl library changed.
You should not use wb.remove(wb.get_sheet_by_name(sheet)) as indicated by @cosinepenguin, since get_sheet_by_name is now deprecated (you will get warnings when trying to use it); use wb.remove(wb[sheet]) instead.
In Python 3.7:
import openpyxl
wb = openpyxl.Workbook()
ws = wb.create_sheet("Sheet2")
n = wb.sheetnames  # sheetnames replaces get_sheet_names()
wb.remove(wb["Sheet"])
# or, equivalently, look the name up by index:
# wb.remove(wb[n[0]])
Here 0 is the index of "Sheet" in n; index 1 is "Sheet2".
you can visit this link for more info
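
Putting the answers above together, a minimal sketch that removes a sheet by name without the deprecated calls. The helper name remove_sheet_by_name is made up for illustration; it checks wb.sheetnames first so that a wrong name does not raise.

```python
from openpyxl import Workbook


def remove_sheet_by_name(wb, name):
    """Remove the sheet called name; return False if no such sheet exists."""
    if name not in wb.sheetnames:
        return False
    # wb[name] returns the worksheet object, which is what remove() expects.
    wb.remove(wb[name])
    return True


wb = Workbook()            # starts with one sheet called "Sheet"
wb.create_sheet("Sheet2")

removed = remove_sheet_by_name(wb, "Sheet")
missing = remove_sheet_by_name(wb, "NoSuchSheet")
print(removed, missing, wb.sheetnames)
```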

Is there any way to edit an existing Excel file using Python preserving formulae?

I am trying to edit several excel files (.xls) without changing the rest of the sheet. The only thing close so far that I've found is the xlrd, xlwt, and xlutils modules. The problem with these is it seems that xlrd evaluates formulae when reading, then puts the answer as the value of the cell. Does anybody know of a way to preserve the formulae so I can then use xlwt to write to the file without losing them? I have most of my experience in Python and CLISP, but could pick up another language pretty quick if they have better support. Thanks for any help you can give!
I had the same problem... and eventually found the following module:
from openpyxl import load_workbook

def Write_Workbook():
    # path and Some_value are placeholders defined elsewhere
    wb = load_workbook(path)
    ws = wb["Sheet_name"]  # replaces the deprecated get_sheet_by_name
    c = ws.cell(row=2, column=1)
    c.value = Some_value
    wb.save(path)
==> Doing this, my file got saved preserving all formulas inserted before.
Hope this helps!
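To check that claim, here is a small round-trip sketch: write a formula, reload the workbook, edit an unrelated cell, save again, and confirm the formula text is still there. It uses in-memory buffers instead of real files so it can run anywhere. Two caveats worth stating: openpyxl reads .xlsx rather than legacy .xls files, so the originals would need converting first, and it does not calculate formulas, so Excel recomputes the results when the file is opened.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Step 1: create a workbook containing a formula.
original = Workbook()
ws = original.active
ws["A1"] = 10
ws["A2"] = 20
ws["A3"] = "=SUM(A1:A2)"
first = BytesIO()
original.save(first)
first.seek(0)

# Step 2: reload it (data_only defaults to False, so formulas are kept),
# change an unrelated cell, and save again.
wb = load_workbook(first)
sheet = wb.active
sheet.cell(row=1, column=2).value = "edited"
second = BytesIO()
wb.save(second)
second.seek(0)

# Step 3: reload once more and look at the formula cell.
reloaded = load_workbook(second).active
formula_after = reloaded["A3"].value
print(formula_after, reloaded["B1"].value)
```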
I've used the xlwt.Formula function before to be able to get hyperlinks into a cell. I imagine it will also work with other formulas.
Update: Here's a snippet I found in a project I used it in:
link = xlwt.Formula('HYPERLINK("%s";"View Details")' % url)
sheet.write(row, col, link)
As of now, xlrd doesn't read formulas. It's not that it evaluates them, it simply doesn't read them.
For now, your best bet is to programmatically control a running instance of Excel, either via pywin32 or Visual Basic or VBScript (or some other Microsoft-friendly language which has a COM interface). If you can't run Excel, then you may be able to do something analogous with OpenOffice.org instead.
We've just had this problem and the best we can do is to manually re-write the formulas as text, then convert them to proper formulas on output.
So open Excel and replace =SUM(C5:L5) with "=SUM(C5:L5)" including the quotes. If you have a double quote in your formula, replace it with 2 double quotes, as this will escape it, so = "a" & "b" becomes "= ""a"" & ""b"" ".
Then in your Python code, loop over every cell in the source and output sheets and do:
output_sheet.write(row, col, xlwt.ExcelFormula.Formula(source_cell[1:-1]))
We use this SO answer to make a copy of the source sheet to be the output sheet, which even preserves styles, and avoids overwriting the hand written text formulas from above.
