I have an error when I'm trying to save an excel file with another name:
This is part of my code:
precios_read = pd.read_excel('Precios_{}.xls'.format(auth2), sheet_name='Precios')
precios_read = precios_read.sort_values(by=['Espacio'], ascending=True)
book = load_workbook('Template_sugerencia.xlsx')
writer = pd.ExcelWriter('Template_sugerencia.xlsx', engine='openpyxl')
writer.book = book
precios_read.to_excel(writer, sheet_name='template', startcol=12, startrow=5, index=False, merge_cells = True)
Recom.to_excel(writer, sheet_name='template', startcol=0, startrow=5, index=False, merge_cells = True)
cliente = auth + '_' + ids
writer.save('{}.xls'.format(cliente))
The problem is in the last line : writer.save('{}.xls'.format(cliente)). If I do writer.save() only all is okey and the file was saved but if I add the name of the file I want I can't do it
TypeError: save() takes exactly 1 argument (2 given)
ExcelWriter only takes in the filename on create, e.g.:
writer = pd.ExcelWriter('Template_sugerencia.xslx', engine='openpyxl')
writer.save has no arguments (the 1 argument is self). Calling it will save to the earlier specified filename.
Related
This code used to get a xlsx file and write over it, but after updating from pandas 1.1.5 to 1.5.1 I got zipfile.badzipfile file is not a zip file
Then I read here that after pandas 1.2.0 the pd.ExcelWriter(report_path, engine='openpyxl') creates a new file but as this is a completely empty file, openpyxl cannot load it.
Knowing that, I changed the code to this one, but now I'm getting AttributeError: property 'sheets' of 'OpenpyxlWriter' object has no setter. How should I handle this?
book = load_workbook('Resultados.xlsx')
writer = pd.ExcelWriter('Resultados.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
reader = pd.read_excel(r'Resultados.xlsx')
df = pd.DataFrame.from_dict(dict_)
df.to_excel(writer, index=False, header=False, startrow=len(reader) + 1)
writer.close()
TLDR
Use .update to modify writer.sheets
Rearrange the order of your script to get it working
# run before initializing the ExcelWriter
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
book = load_workbook("Resultados.xlsx")
# use `with` to avoid other exceptions
with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
writer.book = book
writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))
df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
Details
Recreating your problem with some fake data
import numpy as np
from openpyxl import load_workbook
import pandas as pd
if __name__ == "__main__":
# make some random data
np.random.seed(0)
df = pd.DataFrame(np.random.random(size=(5, 5)))
# this makes an existing file
with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
df.to_excel(excel_writer=writer)
# make new random data
np.random.seed(1)
df = pd.DataFrame(np.random.random(size=(5, 5)))
# what you tried...
book = load_workbook("Resultados.xlsx")
writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
reader = pd.read_excel("Resultados.xlsx")
# skipping this step as we defined `df` differently
# df = pd.DataFrame.from_dict(dict_)
df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
writer.close()
We get the same error plus a FutureWarning
...\StackOverflow\answer.py:23: FutureWarning: Setting the `book` attribute is not part of the public API, usage can give unexpected or corrupted results and will be removed in a future version
writer.book = book
Traceback (most recent call last):
File "...\StackOverflow\answer.py", line 24, in <module>
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
AttributeError: can't set attribute 'sheets'
The AttributeError is because sheets is a property of the writer instance. If you're unfamiliar with it, here is a resource.
In shorter terms, the exception is raised because sheets cannot be modified in the way you're trying. However, you can do this:
# use the `.update` method
writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))
That will move us past the the AttributeError, but we'll hit a ValueError a couple lines down:
reader = pd.read_excel("Resultados.xlsx")
Traceback (most recent call last):
File "...\StackOverflow\answer.py", line 26, in <module>
reader = pd.read_excel("Resultados.xlsx")
...
File "...\lib\site-packages\pandas\io\excel\_base.py", line 1656, in __init__
raise ValueError(
ValueError: Excel file format cannot be determined, you must specify an engine manually.
Do what the error message says and supply an argument to the engine parameter
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
And now we're back to your original zipfile.BadZipFile exception
Traceback (most recent call last):
File "...\StackOverflow\answer.py", line 26, in <module>
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
...
File "...\Local\Programs\Python\Python310\lib\zipfile.py", line 1334, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
After a bit of toying, I noticed that the Resultados.xlsx file could not be opened manually after running this line:
writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
So I reordered some of the steps in your code:
# run before initializing the ExcelWriter
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
book = load_workbook("Resultados.xlsx")
# the old way
# writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
writer.book = book
writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))
df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
try this:
filepath = r'Resultados.xlsx'
with pd.ExcelWriter(
filepath,
engine='openpyxl',
mode='a',
if_sheet_exists='overlay') as writer:
reader = pd.read_excel(filepath)
df.to_excel(
writer,
startrow=reader.shape[0] + 1,
index=False,
header=False)
I am trying to create an excel file of 3 columns: System Date, Time, Value on a webpage at that time.
Intention is to create a dataframe of the 3 values, every time the code runs, and append the dataframe to existing excel workbook (with one existing sheet).
I am able to create dataframe every time code runs, but when I try to append it to an excel file, it throws error:
ValueError: Sheet 'Sheet1' already exists and if_sheet_exists is set to 'error'
Can you please suggest, where am I going wrong.
# Importing Libraries
from datetime import datetime
import pandas as pd
import requests
from bs4 import BeautifulSoup
import openpyxl
#getting today's date amd formatting it
now = datetime.now()
Date = now.strftime ("%d/%m/%Y")
Time = now.strftime ("%H:%M")
# GET request to scrape. 'Page' variable to assign contents
page = requests.get("https://www.traderscockpit.com/?pageView=live-nse-advance-decline-ratio-chart")
# Create BeautifulSoup object to parse content
soup = BeautifulSoup(page.content, 'html.parser')
adv = soup.select_one('a:-soup-contains("Advanced:")').next_sibling.strip()
dec = soup.select_one('a:-soup-contains("Declined:")').next_sibling.strip()
ADratio = round(int(adv)/int(dec), 2)
df = pd.DataFrame({tuple([Date, Time, ADratio])})
#Load workbook and read last used row
path = r'C:\Users\kashk\OneDrive\Documents\ADratios.xlsx'
writer = pd.ExcelWriter (path, engine='openpyxl', mode = 'a')
wb = openpyxl.load_workbook(path)
startrow = writer.sheets['Sheet1'].max_row
#Append data frame to existing table in existing sheet
df.to_excel (writer, sheet_name = 'Sheet1', index = False, header = False, startrow = startrow)
writer.save()
writer.close()
A fast and easy solution would be upgrading your pandas > 1.4.0 since it provides a if_sheet_exists = 'overlay' Source
pd.ExcelWriter(path, engine='openpyxl', mode='a', if_sheet_exists='overlay')
If you don't want to upgrade your pandas, there is a way to work around by removing and re-write the sheet into the excel file. (Not recommended if you have a lot of records since it will be slow).
path, sheet_name = 'ADratios.xlsx' , 'Sheet 1'
df.columns = ['Date','Time','ADratio']
with pd.ExcelWriter(path, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
book = openpyxl.load_workbook(path, 'r')
df_bak = pd.read_excel(path)
writer.book = openpyxl.load_workbook(path)
writer.book.remove(writer.book.worksheets[writer.book.sheetnames.index(sheet_name)])
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
pd.concat([df_bak, df], axis=0).to_excel(writer, sheet_name=sheet_name, index = False)
Trying to read multiple worksheets from the same excel file stored in the S3 bucket.
The code works fine for the first time, but while trying to execute the read_excel function for the second time to read another worksheet from the same excel file, getting an error. Though the "obj" object remains constant, the read_excel function doesn't work.
obj = S3_Client.get_object(Bucket=bucket_name, Key=str(XLSX_Keys.iloc[0,0]))
File1 = pd.read_excel(io.BytesIO(obj['Body'].read()) , sheet_name = "Sheet1",dtype=str, header= 4)
File2 = pd.read_excel(io.BytesIO(obj['Body'].read()) , sheet_name = "Sheet2",dtype=str, header= 4)
ValueError: File is not a recognized excel file
For now, re-executing the get_object function one more time before executing the read_excel function for reading the second sheet from the same excel file.
like this
obj = S3_Client.get_object(Bucket=bucket_name, Key=str(XLSX_Keys.iloc[0,0]))
File1 = pd.read_excel(io.BytesIO(obj['Body'].read()) , sheet_name = "Sheet1",dtype=str, header= 4)
obj = S3_Client.get_object(Bucket=bucket_name, Key=str(XLSX_Keys.iloc[0,0]))
File2 = pd.read_excel(io.BytesIO(obj['Body'].read()) , sheet_name = "Sheet2",dtype=str, header= 4)
obj = S3_Client.get_object(Bucket=bucket_name, Key=str(XLSX_Keys.iloc[0,0]))
File3 = pd.read_excel(io.BytesIO(obj['Body'].read()) , sheet_name = "Sheet3",dtype=str, header= 4)
This is of course very redundant coding, so looking for some advice. Thanks!!
Save what you read to be reused
obj = S3_Client.get_object(Bucket=bucket_name, Key=str(XLSX_Keys.iloc[0, 0]))
content = obj['Body'].read()
File1 = pd.read_excel(io.BytesIO(content), sheet_name="Sheet1", dtype=str, header=4)
File2 = pd.read_excel(io.BytesIO(content), sheet_name="Sheet2", dtype=str, header=4)
Currently I can write results from within each function to an individual file.
How would I write the results from the 2 functions to the same file?
I think I would need to pull out writer = pd.ExcelWriter('All Results', engine='xlsxwriter') with new file name outside of the function but I dont know how to handle the writing of each df_Final...
Input:
ExcelName='....'
t1=pd.read_excel('.....')
t2=['.......']
def F1(Input_Data):
writer = pd.ExcelWriter('F1_Results', engine='xlsxwriter')
.
.
.
df_Final.to_excel(writer, sheet_name=writeto[3],index=False, header=False)
writer.save()
return
def F2(Input_Data):
writer = pd.ExcelWriter('F2_Results', engine='xlsxwriter')
.
.
.
df_Final.to_excel(writer, sheet_name=writeto[7],index=False, header=False)
writer.save()
return
Solution:
This helper function might help you out:
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
truncate_sheet=False,
**to_excel_kwargs):
"""
Append a DataFrame [df] to existing Excel file [filename]
into [sheet_name] Sheet.
If [filename] doesn't exist, then this function will create it.
Parameters:
filename : File path or existing ExcelWriter
(Example: '/path/to/file.xlsx')
df : dataframe to save to workbook
sheet_name : Name of sheet which will contain DataFrame.
(default: 'Sheet1')
startrow : upper left cell row to dump data frame.
Per default (startrow=None) calculate the last row
in the existing DF and write to the next row...
truncate_sheet : truncate (remove and recreate) [sheet_name]
before writing DataFrame to Excel file
to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
[can be dictionary]
Returns: None
"""
from openpyxl import load_workbook
# ignore [engine] parameter if it was passed
if 'engine' in to_excel_kwargs:
to_excel_kwargs.pop('engine')
writer = pd.ExcelWriter(filename, engine='openpyxl')
# Python 2.x: define [FileNotFoundError] exception if it doesn't exist
try:
FileNotFoundError
except NameError:
FileNotFoundError = IOError
try:
# try to open an existing workbook
writer.book = load_workbook(filename)
# get the last row in the existing Excel sheet
# if it was not specified explicitly
if startrow is None and sheet_name in writer.book.sheetnames:
startrow = writer.book[sheet_name].max_row
# truncate sheet
if truncate_sheet and sheet_name in writer.book.sheetnames:
# index of [sheet_name] sheet
idx = writer.book.sheetnames.index(sheet_name)
# remove [sheet_name]
writer.book.remove(writer.book.worksheets[idx])
# create an empty sheet [sheet_name] using old index
writer.book.create_sheet(sheet_name, idx)
# copy existing sheets
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
except FileNotFoundError:
# file does not exist yet, we will create it
pass
if startrow is None:
startrow = 0
# write out the new sheet
df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
# save the workbook
writer.save()
NOTE: for Pandas < 0.21.0, replace sheet_name with sheetname!
Usage examples:
append_df_to_excel('/home/data/test.xlsx', df)
append_df_to_excel('/home/data/test.xlsx', df, header=None, index=False)
append_df_to_excel('/home/data/test.xlsx', df, sheet_name='Sheet2', index=False)
append_df_to_excel('/home/data/test.xlsx', df, sheet_name='Sheet2', index=False, startrow=25)
You can modify the function to receive filename as a parameter
def write_to_excel(filename, input_data):
writer = pd.ExcelWriter(filename, engine='xlsxwriter')
.
.
.
df_Final.to_excel(writer, sheet_name=writeto[3],index=False, header=False)
writer.save()
return
#Then use like
write_to_excel("F2_Results", input_data)
I try to write to all files, that I have at the same time.
I have some files
izzymonroe#mail.ru.xlsx,
lucky-frog#mail.ru.xlsx,
lucky-frog#mail.ru.xlsx,
izzymonroe#mail.ru.xlsx,
Yubodrova#ya.ru.xlsx,
lucky-frog#mail.ru.xlsx,
Ant.karpoff2011#yandex.ru.xlsx
9rooney9#list.ru.xlsx
and I want to write data to this. But how can I send it to function(and I need to write to file value with groupby)
df = pd.read_excel('group.xlsx')
def add_xlsx_sheet(df, sheet_name=u'Смартфоны полно', index=True, digits=1, path='9rooney9#list.ru.xlsx'):
book = load_workbook(path)
writer = ExcelWriter('9rooney9#list.ru.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
if sheet_name in list(writer.sheets.keys()):
sh = book.get_sheet_by_name(sheet_name)
book.remove_sheet(sh)
df.to_excel(writer, sheet_name=u'Смартфоны полно', startrow=0, startcol=0,
float_format='%.{}f'.format(digits), index=index)
writer.save()
It works to one file, but it write all data to this file. But I need to write group, where id in mail complies the name of file
How can I specify all file in function and next
df.groupby('member_id').apply(lambda g: g.to_excel(str(g.name) + '.xlsx', 'sheet2'))
The problem was solved with df.groupby('col_name').apply(lambda x: add_xlsx_sheet(x, x.name, path='{}.xlsx'.format(x.name)))