Python appending dataframe to existing excel file and sheet - python

I have a question about appending a dataframe to an existing sheet in an existing file.
I tried to write the code myself:
writer = pd.ExcelWriter('existingFile.xlsx', engine='openpyxl', mode='a')
df.to_excel(writer, sheet_name="existingSheet", startrow=writer.sheets["existingSheet"].max_row, index=False, header=False)
and this causes an error:
ValueError: Sheet 'existingSheet' already exists and if_sheet_exists is set to 'error'.
I googled and found this function here:
Append existing excel sheet with new dataframe using python pandas
but even with this function I still get the same error, even though I thought this function was meant to prevent exactly this error.
Could you please help?
Thank you very much!

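I guess I am late to the party, but:
Add the keyword if_sheet_exists='replace' to ExcelWriter, like this:
pd.ExcelWriter('existingFile.xlsx', engine='openpyxl', mode='a', if_sheet_exists='replace')
That way you can keep using the latest version of pandas. Note that 'replace' clears the sheet before writing, so the new data does not end up below the old rows.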

Since pandas 1.4, if_sheet_exists also accepts 'overlay', which writes into the existing sheet without clearing its contents first.
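For the original question (putting new rows under the existing ones), 'overlay' combined with startrow looks like the most direct route. A minimal sketch, assuming pandas >= 1.4 with openpyxl installed; append_df_to_sheet is a hypothetical helper name and the file and sheet names are placeholders:

```python
import os
import tempfile

import pandas as pd


def append_df_to_sheet(path, df, sheet_name):
    """Hypothetical helper: write df below the last used row of an existing sheet."""
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # openpyxl's max_row is 1-based while to_excel's startrow is 0-based,
        # so max_row itself points at the first empty row.
        start = writer.book[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, startrow=start,
                    index=False, header=False)


# Demo: a sheet with a header and two rows, then two more rows appended.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "existingFile.xlsx")
    pd.DataFrame({"a": [1, 2], "b": [3, 4]}).to_excel(
        path, sheet_name="existingSheet", index=False)
    append_df_to_sheet(path, pd.DataFrame({"a": [5, 6], "b": [7, 8]}),
                       "existingSheet")
    result = pd.read_excel(path, sheet_name="existingSheet")

print(result)
```

openpyxl's max_row counts every row it considers used, which can include empty but styled rows, so on a heavily formatted sheet the new block may land lower than expected.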

It seems this function is broken in pandas 1.3.0. Just look at the documentation: when trying to append to an existing sheet, it will either raise an error, create a new sheet, or replace it.
if_sheet_exists : {'error', 'new', 'replace'}, default 'error'
    How to behave when trying to write to a sheet that already exists (append mode only).
    error: raise a ValueError.
    new: Create a new sheet, with a name determined by the engine.
    replace: Delete the contents of the sheet before writing to it.
    New in version 1.3.0.
The only solution is to downgrade pandas (1.2.3 is working for me)
pip install pandas==1.2.3
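The documented values are easy to check directly. A small sketch, assuming pandas >= 1.3 with openpyxl; the file and sheet names are placeholders. It shows that the default 'error' raises the ValueError from the question, and that 'replace' throws away the old rows rather than appending:

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "existingFile.xlsx")
    pd.DataFrame({"a": [1, 2, 3]}).to_excel(path, sheet_name="existingSheet", index=False)

    # Default if_sheet_exists='error': writing to an existing sheet raises ValueError.
    try:
        with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
            pd.DataFrame({"a": [9]}).to_excel(writer, sheet_name="existingSheet", index=False)
        error_raised = False
    except ValueError as exc:
        error_raised = True
        print(exc)

    # if_sheet_exists='replace': the old rows are deleted, only the new frame remains.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        pd.DataFrame({"a": [9]}).to_excel(writer, sheet_name="existingSheet", index=False)
    after_replace = pd.read_excel(path, sheet_name="existingSheet")

print(after_replace)
```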

Related

<ValueError: Sheet already exists and if_sheet_exists is set to 'error'> on one machine but not on another

I'm running some basic python code which generates a pandas DataFrame called df and then writes it to a pre-formatted Excel file using pandas ExcelWriter and openpyxl as its engine.
workbook = load_workbook('example.xlsx')
sheet = workbook['example_sheet']
writer = pd.ExcelWriter('example.xlsx', engine='openpyxl', mode='a')
writer.book = workbook
writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
sheet.cell(row=8, column=5).value = some.value
df.to_excel(writer, sheet_name='example_sheet', index=False, header=False, startrow=10, startcol=5)
The strange thing is that it works perfectly when running on my machine; however, on my colleague's machine it throws the mentioned error:
File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_openpyxl.py", line 437, in write_cells
    f"Sheet '{sheet_name}' already exists and "
ValueError: Sheet 'example_sheet' already exists and if_sheet_exists is set to 'error'.
I've tried working around it by explicitly setting
if_sheet_exists='replace'
however, now (as expected) it replaces the entire sheet and destroys the formatting that was applied before. On my machine, however, it would not replace the sheet, even though the writer was set to do so.
I'm not entirely sure where to look for differences in the machines so if anyone could throw me an idea I would be very thankful.
The last pandas version to behave as you expect was 1.2.5. Run pip uninstall pandas and then pip install pandas==1.2.5; it will then simply append the data to your previously formatted Excel template.
YMMV.
Edit:
I believe the issue was with openpyxl, which you may be able to revert separately if the above solution is not feasible. You might also try if_sheet_exists='overlay' (available since pandas 1.4) if it suits your needs.
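A sketch of the question's workflow using 'overlay' instead of downgrading, assuming pandas >= 1.4 and openpyxl. Rather than assigning to writer.book (deprecated in newer pandas), it edits the workbook pandas already loaded. The template built in the demo, fill_template and the label text are hypothetical stand-ins:

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font


def fill_template(path, df, sheet_name, label):
    """Hypothetical helper: write into a pre-formatted sheet without clearing it."""
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # writer.book is the openpyxl Workbook pandas loaded; edits are saved on close.
        writer.book[sheet_name].cell(row=8, column=5).value = label
        # startrow/startcol are 0-based, so this block starts at cell F11.
        df.to_excel(writer, sheet_name=sheet_name, index=False, header=False,
                    startrow=10, startcol=5)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.xlsx")
    # Build a stand-in template with one bold cell.
    wb = Workbook()
    wb.active.title = "example_sheet"
    wb.active["A1"] = "Report"
    wb.active["A1"].font = Font(bold=True)
    wb.save(path)

    fill_template(path, pd.DataFrame({"x": [1, 2]}), "example_sheet", "some value")

    ws = load_workbook(path)["example_sheet"]
    title_bold = ws["A1"].font.bold
    label = ws.cell(row=8, column=5).value
    first_value = ws.cell(row=11, column=6).value

print(title_bold, label, first_value)
```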

Copy dataframe to existing excel file (Pandas)

I'm trying to copy a dataframe to the first empty rows of an existing excel file.
I already know how to copy the dataframe to a new file, but my intention is to keep on adding information to the same file, each time on the first empty rows available.
I've tried something like this, but it hasn't worked (I'm still getting used to working with Pandas):
with pd.ExcelWriter('test.xlsx', mode='a') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
The Excel file I'm working with is in the folder of my IDE (Spyder).
Thanks in advance!
You could read in the existing data
df = pd.read_excel('test.xlsx')
add any rows to this dataframe, and then write it all back out, overwriting what was previously stored (index=False stops an extra index column from being added on every round trip)
df.to_excel("test.xlsx", index=False)
From the documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html) this will always overwrite the previous file, avoiding the chance of duplicating data. Note that the whole workbook is rewritten, so any other sheets in test.xlsx are lost.
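A minimal sketch of that read, append, rewrite cycle, assuming the data lives on a sheet named Sheet1 and new rows arrive as a DataFrame with the same columns; append_rows and the demo data are hypothetical. pd.concat does the "add any rows" step:

```python
import os
import tempfile

import pandas as pd


def append_rows(path, new_rows, sheet_name="Sheet1"):
    """Hypothetical helper: read the sheet, add rows at the end, rewrite the file."""
    existing = pd.read_excel(path, sheet_name=sheet_name)
    combined = pd.concat([existing, new_rows], ignore_index=True)
    # Writing to a path recreates the whole workbook; other sheets are not kept.
    combined.to_excel(path, sheet_name=sheet_name, index=False)
    return combined


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.xlsx")
    pd.DataFrame({"name": ["x"], "value": [1]}).to_excel(path, index=False)
    append_rows(path, pd.DataFrame({"name": ["y"], "value": [2]}))
    append_rows(path, pd.DataFrame({"name": ["z"], "value": [3]}))
    final = pd.read_excel(path)

print(final)
```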

pandas pd.read_excel() returning empty dictionary

I am a novice Python programmer and I am having an issue loading an xlsx workbook with the pd.read_excel() function. The pandas read_excel documentation says that specifying 'sheet_name = None' should return "All sheets as a dictionary of DataFrames", however I am getting an empty dictionary back:
template_workbook = pd.read_excel(template_path, sheet_name=None, index_col=None)
template_workbook
Returns:
OrderedDict()
When I try to print the worksheet names in the dictionary:
template_workbook.sheet_name
Returns:
AttributeError                            Traceback (most recent call last)
<ipython-input-67-e76a0b915981> in <module>()
----> 1 template_workbook.sheet_name
AttributeError: 'OrderedDict' object has no attribute 'sheet_name'
It is not clear to me why the worksheets are not being listed in the output dictionary. Any tips are greatly appreciated.
I have 26 tabs/sheets, and am trying to fill 23 using the tab names for indexing.
When you use read_excel with multiple sheets (for example sheet_name=None), pandas returns a dictionary:
Returns: DataFrame or Dict of DataFrames
If you have a dictionary, you can use the .keys() method to see the sheet (tab) names, as in:
print(template_workbook.keys())
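A short sketch of working with that dictionary, using a throwaway two-sheet workbook (the sheet names first and second are made up for the demo). Each key is a tab name and each value is that tab's DataFrame, so tabs can be indexed by name:

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "template.xlsx")
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        pd.DataFrame({"a": [1]}).to_excel(writer, sheet_name="first", index=False)
        pd.DataFrame({"b": [2, 3]}).to_excel(writer, sheet_name="second", index=False)

    template_workbook = pd.read_excel(path, sheet_name=None, index_col=None)

sheet_names = list(template_workbook.keys())
for name, frame in template_workbook.items():
    print(name, frame.shape)

# Index a tab by its name, as with any dict.
second_tab = template_workbook["second"]
```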
I found this post through Google as I ran into this same problem. Unfortunately, no errors were thrown, which is not very helpful, so I'm posting this answer to help the next person who finds it.
The read_excel function in pandas doesn't support ALL Excel functionality. This means that if you are using some advanced Excel features (such as named ranges), your data might not be parsed correctly when pandas tries to read your Excel data.
I tried to simplify my Excel file as much as possible which still didn't work, so I created a new Excel Workbook and copied my data in sheet by sheet. This ended up working for me.
So my advice is to keep your Excel file as simple as possible and you'll probably be able to import it with Pandas. If you send over your exact Excel file I'm happy to help debug (I know this is coming years after the question though).

Excel formatting in python without loading workbook

I am trying to format, in Python, an Excel document that I am creating in the same script. All of the answers I have found involve loading an existing workbook into Python and formatting from there. In my script, I currently write the entire unformatted Excel sheet, save the file, and then immediately reload the document into Python to format it. This is the only workaround I can find that gives me an active sheet.
writer=pd.ExcelWriter(file_name, engine='openpyxl')
writer.save()#saving my file
wb=load_workbook(file_name) #reloading file to format
ws=wb.active
ws.column_dimensions['A'].width=33
ws.column_dimensions['B'].width=16
wb.save(file_name)
This works for changing aspects such as column width, but I would like a way to format the page without saving and reloading. Is there a way to get around the need for an active sheet when no file has been written yet? I want a way to remove lines 2 and 3 (the save and the reload), however that may be done.
The object that pandas creates in ExcelWriter depends on the "engine" you give it. In this case you're passing "openpyxl", so ExcelWriter creates an openpyxl.Workbook() object. You can create a new Workbook in openpyxl yourself using Workbook(), like so:
https://openpyxl.readthedocs.io/en/default/tutorial.html#create-a-workbook
It is created with one active sheet. Basically:
import openpyxl
wb = openpyxl.Workbook()
ws=wb.active
ws.column_dimensions['A'].width=33
ws.column_dimensions['B'].width=16
wb.save(file_name)
...would do the job
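Staying inside pandas also seems to work: with engine='openpyxl', writer.sheets maps sheet names to openpyxl worksheets once to_excel has written them, so column widths can be set before the writer saves, with no save-and-reload round trip. A sketch assuming a reasonably recent pandas and openpyxl; the DataFrame and file name are placeholders:

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

df = pd.DataFrame({"description": ["first item"], "amount": [16]})

with tempfile.TemporaryDirectory() as tmp:
    file_name = os.path.join(tmp, "formatted.xlsx")
    with pd.ExcelWriter(file_name, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name="Sheet1", index=False)
        # The worksheet exists only after to_excel has written to it.
        ws = writer.sheets["Sheet1"]
        ws.column_dimensions["A"].width = 33
        ws.column_dimensions["B"].width = 16
    # Leaving the with-block saves the file once; no reload was needed to format.

    # Reload here only to check the result.
    check = load_workbook(file_name)["Sheet1"]
    width_a = check.column_dimensions["A"].width
    width_b = check.column_dimensions["B"].width

print(width_a, width_b)
```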
Your title is misleading: you're working in pandas and dumping to Excel. Pandas does allow some formatting for this but, because it tries to support different Python libraries (openpyxl, xlsxwriter and xlwt), there are restrictions on it.
For full control, openpyxl provides support for pandas DataFrame objects: http://openpyxl.readthedocs.io/en/latest/pandas.html
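openpyxl's pandas support includes openpyxl.utils.dataframe.dataframe_to_rows, which yields a DataFrame row by row so it can be fed to ws.append, and then the sheet can be formatted before anything is saved. A minimal sketch with placeholder data; the save call is commented out so nothing touches the disk:

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

wb = Workbook()
ws = wb.active
# header=True emits the column names first; index=False skips the index.
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)

# Format directly on the in-memory sheet.
ws.column_dimensions["A"].width = 33
ws.column_dimensions["B"].width = 16
# wb.save("output.xlsx")  # placeholder file name
```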

Delete excel row with Python

I'm doing some testing using python-excel modules. I can't seem to find a way to delete a row in an excel sheet using these modules and the internet hasn't offered up a solution. Is there a way to delete a row using one of the python-excel modules?
In my case, I want to open an excel sheet, read the first row, determine if it contains some valid data, if not, then delete it.
Any suggestions are welcome.
xlwt, as the module name suggests, provides Excel writer functionality (creation rather than modification).
xlrd, on the other hand, provides Excel reader functionality.
If your source Excel file is rather simple (no fancy graphs, pivot tables, etc.), you should proceed this way:
read the contents of the target Excel file with the xlrd module, and then use the xlwt module to create a new Excel file that contains only the rows you need.
If, however, you are running this on Windows, you might be able to manipulate Excel directly through Microsoft COM objects; see old book reference.
I was having the same issue but found a workaround:
1. Use a custom filter process (Reader > Filter1 > Filter2 > ... > Writer) to generate a copy of the source Excel file with a blank column inserted at the front. Let's call this file augmented.xls.
2. Read augmented.xls into an xlrd Book object, rb, using xlrd.open_workbook().
3. Use xlutils.copy.copy() to convert rb into an xlwt.Workbook object, wb.
4. In wb, set the value of the first column of each row to be deleted to "x" (or another marker value).
5. Save wb back to augmented.xls.
6. Use another custom filter process to generate the resulting Excel file from augmented.xls, omitting the rows with "x" in the first column and shifting all columns one to the left (equivalent to deleting the first column of markers).
Information and examples on defining a filter process can be found at http://www.simplistix.co.uk/presentations/python-excel.pdf
Hope this helps in some way.
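You can use the openpyxl library. A workbook opened with it can be both read and written. Then, with a few lines of code, you can do that:
from openpyxl import load_workbook
filename = "example.xlsx"  # hypothetical path
wb = load_workbook(filename)
ws = wb.active  # a property, not a method
first_row = ws[1]  # tuple of the cells in row 1
# Your validity check here; as a placeholder, an all-empty row counts as invalid
if all(cell.value is None for cell in first_row):
    ws.delete_rows(1, amount=1)
wb.save(filename)  # nothing is written to disk until you save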
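To see delete_rows in isolation, here is an in-memory sketch; the empty first row, the helper name and the validity rule are invented for the demo. Note that openpyxl does not adjust formulas, charts or other references when rows are deleted, so check those separately on real files:

```python
from openpyxl import Workbook


def drop_first_row_if_invalid(ws, is_valid):
    """Hypothetical helper: delete row 1 when is_valid rejects its values."""
    values = [cell.value for cell in ws[1]]
    if not is_valid(values):
        ws.delete_rows(1, amount=1)  # rows below shift up by one
        return True
    return False


wb = Workbook()
ws = wb.active
ws.append([None, None])  # an empty first row
ws.append(["id", "name"])
ws.append([1, "alice"])

# Example rule: a row is valid if at least one cell has a value.
removed = drop_first_row_if_invalid(ws, lambda vals: any(v is not None for v in vals))
print(removed, ws["A1"].value, ws.max_row)
```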
