Writer.save() with new name python - python

I get an error when I try to save an Excel file under another name.
This is part of my code:
precios_read = pd.read_excel('Precios_{}.xls'.format(auth2), sheet_name='Precios')
precios_read = precios_read.sort_values(by=['Espacio'], ascending=True)
book = load_workbook('Template_sugerencia.xlsx')
writer = pd.ExcelWriter('Template_sugerencia.xlsx', engine='openpyxl')
writer.book = book
precios_read.to_excel(writer, sheet_name='template', startcol=12, startrow=5, index=False, merge_cells = True)
Recom.to_excel(writer, sheet_name='template', startcol=0, startrow=5, index=False, merge_cells = True)
cliente = auth + '_' + ids
writer.save('{}.xls'.format(cliente))
The problem is in the last line: writer.save('{}.xls'.format(cliente)). If I call writer.save() with no arguments, everything is fine and the file is saved, but if I pass the file name I want, I get:
TypeError: save() takes exactly 1 argument (2 given)

ExcelWriter only takes the filename when it is created, e.g.:
writer = pd.ExcelWriter('Template_sugerencia.xlsx', engine='openpyxl')
writer.save() takes no arguments (the 1 argument in the error message is self). Calling it saves to the filename given when the writer was created.
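To see why the message counts two arguments, here is a toy sketch. ToyWriter and the file names are made up for illustration and are not pandas' real ExcelWriter; the only point is that the path is fixed at construction and that Python passes the instance itself as the first argument. The message shown in the question is the Python 2 wording; Python 3 phrases it slightly differently.

```python
class ToyWriter:
    """Hypothetical stand-in for an Excel writer; not pandas' real class."""

    def __init__(self, path):
        # The output path is fixed here, at construction time.
        self.path = path

    def save(self):
        # No path parameter: `self` is the only argument this method accepts.
        return "saved to " + self.path


writer = ToyWriter("cliente_123.xls")
result = writer.save()  # fine: Python passes `writer` as `self`

try:
    writer.save("other_name.xls")  # `self` plus the string = 2 arguments
except TypeError as exc:
    error_message = str(exc)
```

So with the real writer, the new name would have to go into pd.ExcelWriter(...) itself rather than into save().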

Related

Property 'sheets' of 'OpenpyxlWriter' object has no setter using pandas and openpyxl

This code used to open an xlsx file and write over it, but after updating from pandas 1.1.5 to 1.5.1 I got zipfile.BadZipFile: File is not a zip file.
Then I read that since pandas 1.2.0, pd.ExcelWriter(report_path, engine='openpyxl') creates a new file, and because that file is completely empty, openpyxl cannot load it.
Knowing that, I changed the code to this one, but now I'm getting AttributeError: property 'sheets' of 'OpenpyxlWriter' object has no setter. How should I handle this?
book = load_workbook('Resultados.xlsx')
writer = pd.ExcelWriter('Resultados.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
reader = pd.read_excel(r'Resultados.xlsx')
df = pd.DataFrame.from_dict(dict_)
df.to_excel(writer, index=False, header=False, startrow=len(reader) + 1)
writer.close()
TLDR
Use .update to modify writer.sheets
Rearrange the order of your script to get it working
# run before initializing the ExcelWriter
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
book = load_workbook("Resultados.xlsx")
# use `with` to avoid other exceptions
with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
    writer.book = book
    writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))
    df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
Details
Recreating your problem with some fake data
import numpy as np
from openpyxl import load_workbook
import pandas as pd
if __name__ == "__main__":
    # make some random data
    np.random.seed(0)
    df = pd.DataFrame(np.random.random(size=(5, 5)))
    # this makes an existing file
    with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
        df.to_excel(excel_writer=writer)
    # make new random data
    np.random.seed(1)
    df = pd.DataFrame(np.random.random(size=(5, 5)))
    # what you tried...
    book = load_workbook("Resultados.xlsx")
    writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    reader = pd.read_excel("Resultados.xlsx")
    # skipping this step as we defined `df` differently
    # df = pd.DataFrame.from_dict(dict_)
    df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
    writer.close()
We get the same error plus a FutureWarning
...\StackOverflow\answer.py:23: FutureWarning: Setting the `book` attribute is not part of the public API, usage can give unexpected or corrupted results and will be removed in a future version
writer.book = book
Traceback (most recent call last):
File "...\StackOverflow\answer.py", line 24, in <module>
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
AttributeError: can't set attribute 'sheets'
The AttributeError is raised because sheets is a property of the writer instance, and that property has no setter.
In short, the exception is raised because sheets cannot be reassigned the way you're trying. However, you can do this:
# use the `.update` method
writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))
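If properties are unfamiliar, here is a toy sketch of the difference between assigning to a property and calling a method on what it returns. ToyWriter is a made-up class, not pandas' OpenpyxlWriter, and the sheet value is a placeholder string.

```python
class ToyWriter:
    """Hypothetical stand-in; pandas' OpenpyxlWriter is more involved."""

    def __init__(self):
        self._sheets = {}

    @property
    def sheets(self):
        # Getter only: no @sheets.setter is defined, so assignment fails.
        return self._sheets


writer = ToyWriter()

try:
    writer.sheets = {"Sheet1": "worksheet object"}
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# Calling .update() only uses the getter, so no setter is needed.
writer.sheets.update({"Sheet1": "worksheet object"})
sheet_names = list(writer.sheets)
```

Whether such an update has a lasting effect depends on whether the property hands back the stored dict (as this toy does) or a fresh copy each time; which of the two pandas does may vary by version.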
That will move us past the AttributeError, but we'll hit a ValueError a couple of lines down:
reader = pd.read_excel("Resultados.xlsx")
Traceback (most recent call last):
File "...\StackOverflow\answer.py", line 26, in <module>
reader = pd.read_excel("Resultados.xlsx")
...
File "...\lib\site-packages\pandas\io\excel\_base.py", line 1656, in __init__
raise ValueError(
ValueError: Excel file format cannot be determined, you must specify an engine manually.
Do what the error message says and supply an argument to the engine parameter
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
And now we're back to your original zipfile.BadZipFile exception
Traceback (most recent call last):
File "...\StackOverflow\answer.py", line 26, in <module>
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
...
File "...\Local\Programs\Python\Python310\lib\zipfile.py", line 1334, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
After a bit of toying, I noticed that the Resultados.xlsx file could not be opened manually after running this line:
writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
So I reordered some of the steps in your code:
# run before initializing the ExcelWriter
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
book = load_workbook("Resultados.xlsx")
# the old way
# writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
    writer.book = book
    writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))
    df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
Try this:
filepath = r'Resultados.xlsx'
with pd.ExcelWriter(
        filepath,
        engine='openpyxl',
        mode='a',
        if_sheet_exists='overlay') as writer:
    reader = pd.read_excel(filepath)
    df.to_excel(
        writer,
        startrow=reader.shape[0] + 1,
        index=False,
        header=False)

ValueError: Sheet 'Sheet1' already exists and if_sheet_exists is set to 'error'

I am trying to create an Excel file with 3 columns: system date, time, and the value shown on a webpage at that time.
The intention is to build a dataframe of the 3 values every time the code runs and append it to an existing Excel workbook (with one existing sheet).
I am able to create the dataframe every time the code runs, but when I try to append it to the Excel file, it throws this error:
ValueError: Sheet 'Sheet1' already exists and if_sheet_exists is set to 'error'
Can you please suggest where I am going wrong?
# Importing Libraries
from datetime import datetime
import pandas as pd
import requests
from bs4 import BeautifulSoup
import openpyxl
# getting today's date and formatting it
now = datetime.now()
Date = now.strftime ("%d/%m/%Y")
Time = now.strftime ("%H:%M")
# GET request to scrape. 'Page' variable to assign contents
page = requests.get("https://www.traderscockpit.com/?pageView=live-nse-advance-decline-ratio-chart")
# Create BeautifulSoup object to parse content
soup = BeautifulSoup(page.content, 'html.parser')
adv = soup.select_one('a:-soup-contains("Advanced:")').next_sibling.strip()
dec = soup.select_one('a:-soup-contains("Declined:")').next_sibling.strip()
ADratio = round(int(adv)/int(dec), 2)
df = pd.DataFrame({tuple([Date, Time, ADratio])})
#Load workbook and read last used row
path = r'C:\Users\kashk\OneDrive\Documents\ADratios.xlsx'
writer = pd.ExcelWriter (path, engine='openpyxl', mode = 'a')
wb = openpyxl.load_workbook(path)
startrow = writer.sheets['Sheet1'].max_row
#Append data frame to existing table in existing sheet
df.to_excel (writer, sheet_name = 'Sheet1', index = False, header = False, startrow = startrow)
writer.save()
writer.close()
A fast and easy solution would be upgrading pandas to 1.4.0 or later, since it adds the if_sheet_exists='overlay' option:
pd.ExcelWriter(path, engine='openpyxl', mode='a', if_sheet_exists='overlay')
If you don't want to upgrade pandas, you can work around it by removing the sheet and rewriting it into the Excel file. (Not recommended if you have a lot of records, since it will be slow.)
path, sheet_name = 'ADratios.xlsx', 'Sheet1'
df.columns = ['Date','Time','ADratio']
with pd.ExcelWriter(path, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
    book = openpyxl.load_workbook(path, 'r')
    df_bak = pd.read_excel(path)
    writer.book = openpyxl.load_workbook(path)
    writer.book.remove(writer.book.worksheets[writer.book.sheetnames.index(sheet_name)])
    writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    pd.concat([df_bak, df], axis=0).to_excel(writer, sheet_name=sheet_name, index = False)

Not able to execute pandas read_excel function twice, when referring the object from AWS S3 bucket

I am trying to read multiple worksheets from the same Excel file stored in an S3 bucket.
The code works the first time, but when I call read_excel a second time to read another worksheet from the same file, I get an error. Although the "obj" object stays the same, the second read_excel call fails.
obj = S3_Client.get_object(Bucket=bucket_name, Key=str(XLSX_Keys.iloc[0,0]))
File1 = pd.read_excel(io.BytesIO(obj['Body'].read()) , sheet_name = "Sheet1",dtype=str, header= 4)
File2 = pd.read_excel(io.BytesIO(obj['Body'].read()) , sheet_name = "Sheet2",dtype=str, header= 4)
ValueError: File is not a recognized excel file
For now, I re-execute get_object before each read_excel call that reads another sheet from the same Excel file, like this:
obj = S3_Client.get_object(Bucket=bucket_name, Key=str(XLSX_Keys.iloc[0,0]))
File1 = pd.read_excel(io.BytesIO(obj['Body'].read()) , sheet_name = "Sheet1",dtype=str, header= 4)
obj = S3_Client.get_object(Bucket=bucket_name, Key=str(XLSX_Keys.iloc[0,0]))
File2 = pd.read_excel(io.BytesIO(obj['Body'].read()) , sheet_name = "Sheet2",dtype=str, header= 4)
obj = S3_Client.get_object(Bucket=bucket_name, Key=str(XLSX_Keys.iloc[0,0]))
File3 = pd.read_excel(io.BytesIO(obj['Body'].read()) , sheet_name = "Sheet3",dtype=str, header= 4)
This is of course very redundant, so I'm looking for some advice. Thanks!
Save what you read so it can be reused:
obj = S3_Client.get_object(Bucket=bucket_name, Key=str(XLSX_Keys.iloc[0, 0]))
content = obj['Body'].read()
File1 = pd.read_excel(io.BytesIO(content), sheet_name="Sheet1", dtype=str, header=4)
File2 = pd.read_excel(io.BytesIO(content), sheet_name="Sheet2", dtype=str, header=4)
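A small sketch of why the second read fails, using only the standard library. The io.BytesIO object here is a stand-in for obj['Body'], on the assumption that the real S3 body is a one-shot stream that is consumed by read(); the byte string is made-up filler, not a real workbook.

```python
import io

# Stand-in for obj['Body'] (assumption: the real body is a one-shot stream).
body = io.BytesIO(b"pretend these are the bytes of an .xlsx file")

first_read = body.read()   # consumes the whole stream
second_read = body.read()  # nothing left: returns b""

# Keeping the bytes lets each consumer get its own fresh buffer.
content = first_read
buffer_for_sheet1 = io.BytesIO(content)
buffer_for_sheet2 = io.BytesIO(content)
```

The empty second read is what read_excel was handed in the failing call, which would explain why it could not recognise the data as an Excel file.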

writing data to the same file from different functions

Currently I can write results from within each function to an individual file.
How would I write the results from the 2 functions to the same file?
I think I would need to pull writer = pd.ExcelWriter('All Results', engine='xlsxwriter'), with the new file name, out of the functions, but I don't know how to handle the writing of each df_Final...
Input:
ExcelName='....'
t1=pd.read_excel('.....')
t2=['.......']
def F1(Input_Data):
    writer = pd.ExcelWriter('F1_Results', engine='xlsxwriter')
    .
    .
    .
    df_Final.to_excel(writer, sheet_name=writeto[3],index=False, header=False)
    writer.save()
    return
def F2(Input_Data):
    writer = pd.ExcelWriter('F2_Results', engine='xlsxwriter')
    .
    .
    .
    df_Final.to_excel(writer, sheet_name=writeto[7],index=False, header=False)
    writer.save()
    return
Solution:
This helper function might help you out:
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.
    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
    Returns: None
    """
    from openpyxl import load_workbook
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    writer = pd.ExcelWriter(filename, engine='openpyxl')
    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError
    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass
    if startrow is None:
        startrow = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
    # save the workbook
    writer.save()
NOTE: for Pandas < 0.21.0, replace sheet_name with sheetname!
Usage examples:
append_df_to_excel('/home/data/test.xlsx', df)
append_df_to_excel('/home/data/test.xlsx', df, header=None, index=False)
append_df_to_excel('/home/data/test.xlsx', df, sheet_name='Sheet2', index=False)
append_df_to_excel('/home/data/test.xlsx', df, sheet_name='Sheet2', index=False, startrow=25)
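The helper above assigns to writer.book and writer.sheets and calls writer.save(), which may fail on newer pandas releases (the other threads on this page show some of those errors). Below is a rough sketch of the same idea using append mode, assuming pandas 1.4 or later with openpyxl installed. append_rows, the demo path and the demo data are all made up for illustration; this is not a drop-in replacement for the helper.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook


def append_rows(path, df, sheet_name="Sheet1"):
    """Append df below the existing rows of sheet_name (hypothetical helper)."""
    if not os.path.exists(path):
        # First call: create the workbook, header included.
        df.to_excel(path, sheet_name=sheet_name, index=False, engine="openpyxl")
        return
    # Find the first free row before opening the writer.
    book = load_workbook(path)
    startrow = book[sheet_name].max_row if sheet_name in book.sheetnames else 0
    book.close()
    # mode="a" keeps the existing file; "overlay" writes into the existing sheet.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    index=False, header=(startrow == 0))


# Demo in a temporary directory: two calls, the second one appends.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
append_rows(demo_path, pd.DataFrame({"a": [1, 2], "b": [3, 4]}))
append_rows(demo_path, pd.DataFrame({"a": [5], "b": [6]}))
result = pd.read_excel(demo_path, engine="openpyxl")
```

Reading the last row with openpyxl before opening the writer keeps the row lookup separate from the write, so the sketch does not rely on any non-public writer attributes.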
You can modify the function to receive filename as a parameter
def write_to_excel(filename, input_data):
    writer = pd.ExcelWriter(filename, engine='xlsxwriter')
    .
    .
    .
    df_Final.to_excel(writer, sheet_name=writeto[3],index=False, header=False)
    writer.save()
    return
#Then use like
write_to_excel("F2_Results", input_data)
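Another way to get both results into one file is to create a single writer outside the functions and pass it in, as the question itself suspected. This is only a sketch: f1, f2, the sheet names and the data are placeholders, and it uses the openpyxl engine instead of xlsxwriter purely as an assumption about what is installed.

```python
import os
import tempfile

import pandas as pd


def f1(writer):
    # Hypothetical stand-in for the first function's df_Final.
    df_final = pd.DataFrame({"x": [1, 2]})
    df_final.to_excel(writer, sheet_name="F1", index=False)


def f2(writer):
    # Hypothetical stand-in for the second function's df_Final.
    df_final = pd.DataFrame({"y": [3, 4]})
    df_final.to_excel(writer, sheet_name="F2", index=False)


out_path = os.path.join(tempfile.mkdtemp(), "All_Results.xlsx")
# One writer, created once; the file is written when the `with` block exits.
with pd.ExcelWriter(out_path, engine="openpyxl") as writer:
    f1(writer)
    f2(writer)

# Read the sheet names back to check that both functions wrote to the file.
with pd.ExcelFile(out_path, engine="openpyxl") as xls:
    written_sheets = list(xls.sheet_names)
```

The functions no longer create or save a writer themselves, so neither can overwrite the other's output.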

How to write at the same time to Different excel Using Python

I am trying to write to all of my files at the same time.
I have these files:
izzymonroe#mail.ru.xlsx,
lucky-frog#mail.ru.xlsx,
lucky-frog#mail.ru.xlsx,
izzymonroe#mail.ru.xlsx,
Yubodrova#ya.ru.xlsx,
lucky-frog#mail.ru.xlsx,
Ant.karpoff2011#yandex.ru.xlsx
9rooney9#list.ru.xlsx
and I want to write data to them. But how can I pass them to the function? (I also need to write the values per group, using groupby.)
df = pd.read_excel('group.xlsx')
def add_xlsx_sheet(df, sheet_name=u'Смартфоны полно', index=True, digits=1, path='9rooney9#list.ru.xlsx'):
    book = load_workbook(path)
    writer = ExcelWriter('9rooney9#list.ru.xlsx', engine='openpyxl')
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    if sheet_name in list(writer.sheets.keys()):
        sh = book.get_sheet_by_name(sheet_name)
        book.remove_sheet(sh)
    df.to_excel(writer, sheet_name=u'Смартфоны полно', startrow=0, startcol=0,
                float_format='%.{}f'.format(digits), index=index)
    writer.save()
It works for one file, but it writes all the data to that file. I need to write each group to the file whose name matches the id in the e-mail address.
How can I specify all the files in the function and then use something like:
df.groupby('member_id').apply(lambda g: g.to_excel(str(g.name) + '.xlsx', 'sheet2'))
The problem was solved with df.groupby('col_name').apply(lambda x: add_xlsx_sheet(x, x.name, path='{}.xlsx'.format(x.name)))
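For comparison, the same per-group idea can be written as an explicit loop over the groups, which some find easier to follow than apply with a lambda. The data, the output directory and the sheet name below are made up for the sketch, and it writes fresh files with the openpyxl engine instead of adding a sheet to existing workbooks as add_xlsx_sheet does.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: `member_id` decides which file a row belongs to.
df = pd.DataFrame({
    "member_id": ["izzymonroe", "lucky-frog", "izzymonroe"],
    "value": [10, 20, 30],
})

out_dir = tempfile.mkdtemp()
written = {}
for member_id, group in df.groupby("member_id"):
    # The group key becomes the file name, one workbook per member.
    path = os.path.join(out_dir, "{}.xlsx".format(member_id))
    group.to_excel(path, sheet_name="sheet2", index=False, engine="openpyxl")
    written[member_id] = path
```

Each iteration sees only the rows of one member, so nothing from another group can end up in the wrong file.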
