How to write to different Excel files at the same time using Python - python

I am trying to write to all of my files at the same time.
I have these files:
izzymonroe#mail.ru.xlsx,
lucky-frog#mail.ru.xlsx,
lucky-frog#mail.ru.xlsx,
izzymonroe#mail.ru.xlsx,
Yubodrova#ya.ru.xlsx,
lucky-frog#mail.ru.xlsx,
Ant.karpoff2011#yandex.ru.xlsx
9rooney9#list.ru.xlsx
and I want to write data to each of them. How can I pass the files to the function? (I also need to write the values from a groupby to each file.)
df = pd.read_excel('group.xlsx')
def add_xlsx_sheet(df, sheet_name=u'Смартфоны полно', index=True, digits=1, path='9rooney9#list.ru.xlsx'):
    book = load_workbook(path)
    writer = ExcelWriter('9rooney9#list.ru.xlsx', engine='openpyxl')
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    if sheet_name in list(writer.sheets.keys()):
        sh = book.get_sheet_by_name(sheet_name)
        book.remove_sheet(sh)
    df.to_excel(writer, sheet_name=u'Смартфоны полно', startrow=0, startcol=0,
                float_format='%.{}f'.format(digits), index=index)
    writer.save()
It works for one file, but it writes all the data to that file. I need to write each group to the file whose name matches the id (mail address) of that group.
How can I pass each file to the function and then call something like
df.groupby('member_id').apply(lambda g: g.to_excel(str(g.name) + '.xlsx', 'sheet2'))

The problem was solved with:
df.groupby('col_name').apply(lambda x: add_xlsx_sheet(x, x.name, path='{}.xlsx'.format(x.name)))
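The same idea can be written as an explicit loop, which may be easier to read than the lambda. This is only a minimal sketch: the helper name write_groups, the sample data, and the temporary output directory are assumptions for illustration, and a plain to_excel call stands in for add_xlsx_sheet.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_groups(df, key, out_dir, sheet_name="data"):
    """Write each group of `df` (grouped by `key`) to `<group name>.xlsx`."""
    out_dir = Path(out_dir)
    written = []
    for name, group in df.groupby(key):
        # One file per group; the file name is derived from the group key.
        path = out_dir / f"{name}.xlsx"
        group.to_excel(path, sheet_name=sheet_name, index=False)
        written.append(path)
    return written


# Hypothetical sample data; the real frame would come from group.xlsx.
df = pd.DataFrame({"member_id": ["a", "b", "a"], "value": [1, 2, 3]})
paths = write_groups(df, "member_id", tempfile.mkdtemp())
```

Each returned path points at a workbook that contains only the rows of one group.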

Related

Property 'sheets' of 'OpenpyxlWriter' object has no setter using pandas and openpyxl

This code used to open an xlsx file and write over it, but after updating from pandas 1.1.5 to 1.5.1 I get zipfile.BadZipFile: File is not a zip file.
Then I read here that since pandas 1.2.0, pd.ExcelWriter(report_path, engine='openpyxl') creates a new file, and because that file is completely empty, openpyxl cannot load it.
Knowing that, I changed the code to the version below, but now I get AttributeError: property 'sheets' of 'OpenpyxlWriter' object has no setter. How should I handle this?
book = load_workbook('Resultados.xlsx')
writer = pd.ExcelWriter('Resultados.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
reader = pd.read_excel(r'Resultados.xlsx')
df = pd.DataFrame.from_dict(dict_)
df.to_excel(writer, index=False, header=False, startrow=len(reader) + 1)
writer.close()
TLDR
Use .update to modify writer.sheets
Rearrange the order of your script to get it working
# run before initializing the ExcelWriter
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
book = load_workbook("Resultados.xlsx")
# use `with` to avoid other exceptions
with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
    writer.book = book
    writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))
    df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
Details
Recreating your problem with some fake data
import numpy as np
from openpyxl import load_workbook
import pandas as pd
if __name__ == "__main__":
    # make some random data
    np.random.seed(0)
    df = pd.DataFrame(np.random.random(size=(5, 5)))
    # this makes an existing file
    with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
        df.to_excel(excel_writer=writer)
    # make new random data
    np.random.seed(1)
    df = pd.DataFrame(np.random.random(size=(5, 5)))
    # what you tried...
    book = load_workbook("Resultados.xlsx")
    writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    reader = pd.read_excel("Resultados.xlsx")
    # skipping this step as we defined `df` differently
    # df = pd.DataFrame.from_dict(dict_)
    df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
    writer.close()
We get the same error plus a FutureWarning
...\StackOverflow\answer.py:23: FutureWarning: Setting the `book` attribute is not part of the public API, usage can give unexpected or corrupted results and will be removed in a future version
  writer.book = book
Traceback (most recent call last):
  File "...\StackOverflow\answer.py", line 24, in <module>
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
AttributeError: can't set attribute 'sheets'
The AttributeError is because sheets is a property of the writer instance. If you're unfamiliar with it, here is a resource.
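A toy illustration of that behaviour, independent of pandas: the class below is a hypothetical stand-in (not the real OpenpyxlWriter) that exposes a dict through a property with no setter. Assigning to the attribute fails, while mutating the returned dict in place works.

```python
class Writer:
    """Hypothetical stand-in for a writer with a read-only `sheets` property."""

    def __init__(self):
        self._sheets = {}

    @property
    def sheets(self):
        # No setter is defined, so `writer.sheets = ...` raises AttributeError.
        return self._sheets


w = Writer()

try:
    w.sheets = {"Sheet1": "ws"}  # rebinding the attribute is not allowed
    assigned = True
except AttributeError:
    assigned = False

# Mutating the dict that the property returns is allowed.
w.sheets.update({"Sheet1": "ws"})
```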
In shorter terms, the exception is raised because sheets cannot be modified in the way you're trying. However, you can do this:
# use the `.update` method
writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))
That will move us past the AttributeError, but we'll hit a ValueError a couple of lines down:
reader = pd.read_excel("Resultados.xlsx")
Traceback (most recent call last):
  File "...\StackOverflow\answer.py", line 26, in <module>
    reader = pd.read_excel("Resultados.xlsx")
  ...
  File "...\lib\site-packages\pandas\io\excel\_base.py", line 1656, in __init__
    raise ValueError(
ValueError: Excel file format cannot be determined, you must specify an engine manually.
Do what the error message says and supply an argument to the engine parameter
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
And now we're back to your original zipfile.BadZipFile exception
Traceback (most recent call last):
  File "...\StackOverflow\answer.py", line 26, in <module>
    reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
  ...
  File "...\Local\Programs\Python\Python310\lib\zipfile.py", line 1334, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
After a bit of toying, I noticed that the Resultados.xlsx file could not be opened manually after running this line:
writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
So I reordered some of the steps in your code:
# run before initializing the ExcelWriter
reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
book = load_workbook("Resultados.xlsx")
# the old way
# writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
    writer.book = book
    writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))
    df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
Try this:
filepath = r'Resultados.xlsx'
with pd.ExcelWriter(
        filepath,
        engine='openpyxl',
        mode='a',
        if_sheet_exists='overlay') as writer:
    reader = pd.read_excel(filepath)
    df.to_excel(
        writer,
        startrow=reader.shape[0] + 1,
        index=False,
        header=False)
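A self-contained sketch of the append approach, for trying it outside the original project. It assumes pandas 1.4 or newer (where if_sheet_exists='overlay' is available) and openpyxl; the file name demo.xlsx, the temporary directory, and the sample frames are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

# Create the "existing" file with two data rows on the default sheet.
pd.DataFrame({"a": [1, 2], "b": [3, 4]}).to_excel(path, index=False)

# Read the current contents first to find out how many rows are present.
existing = pd.read_excel(path)
extra = pd.DataFrame({"a": [5], "b": [6]})

# mode="a" keeps the workbook; "overlay" writes into the existing sheet
# instead of replacing it or creating a renamed copy.
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="overlay") as writer:
    # +1 skips the header row that is already in the sheet.
    extra.to_excel(writer, index=False, header=False,
                   startrow=len(existing) + 1)

result = pd.read_excel(path)
```

Here the existing rows are read before the writer is opened, so the row count does not depend on the state of the file while the writer holds it.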

Pandas ExcelWriter overwriting sheets using for loop

I have a nested for loop that takes data from a master file with multiple sheets and splits the data out by user for each sheet. I then want to write each user's data to their own file, with the same sheets as the master file.
Here is what I have so far:
tm_sheet_to_df_map = pd.read_excel(src_file_tm, sheet_name=None)
for key, value in sorted(tm_sheet_to_df_map.items(),reverse=True):
    tm_group = value.groupby('TM')
    for TM, group_df in tm_group:
        attachment = attachment_path_tm / f'{TM}' / f'Q221 New Accounts - {TM}.xlsx'
        attachment1 = os.makedirs(os.path.dirname(attachment), exist_ok=True)
        writer = ExcelWriter(attachment, engine = 'xlsxwriter')
        group_df.to_excel(writer, sheet_name =f'{key}', index=False)
        writer.save()
PROBLEM - The above script creates a new file for each user, but only writes the final dataframe to each file instead of adding all the sheets from the master file. Any ideas how to write each sheet to the individual files? I've tried moving writer.save() outside the loop with no luck.
You need an "append" mode for ExcelWriter:
try:
    # append mode will fail if file does not exist
    writer = ExcelWriter(attachment, engine = 'openpyxl', mode="a")
except FileNotFoundError:
    writer = ExcelWriter(attachment, engine = 'openpyxl')
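An alternative that avoids reopening files, sketched under assumptions: collect every user's frames across all sheets first, then open each user's workbook once and write all of its sheets. The function name split_by_user, the sample sheets, and the output directory are hypothetical; only the 'TM' column name comes from the question.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

import pandas as pd


def split_by_user(sheets, user_col, out_dir):
    """Write one workbook per user, each with the user's rows from every sheet."""
    # First pass: regroup as {user: {sheet_name: rows for that user}}.
    per_user = defaultdict(dict)
    for sheet_name, frame in sheets.items():
        for user, group in frame.groupby(user_col):
            per_user[user][sheet_name] = group

    # Second pass: a single writer per user, so no sheet overwrites another.
    paths = {}
    for user, user_sheets in per_user.items():
        path = Path(out_dir) / f"{user}.xlsx"
        with pd.ExcelWriter(path, engine="openpyxl") as writer:
            for sheet_name, group in user_sheets.items():
                group.to_excel(writer, sheet_name=sheet_name, index=False)
        paths[user] = path
    return paths


# Stand-in for pd.read_excel(src_file_tm, sheet_name=None).
sheets = {
    "Q1": pd.DataFrame({"TM": ["ann", "bob"], "accounts": [1, 2]}),
    "Q2": pd.DataFrame({"TM": ["ann", "bob"], "accounts": [3, 4]}),
}
paths = split_by_user(sheets, "TM", tempfile.mkdtemp())
book = pd.read_excel(paths["ann"], sheet_name=None)
```

Because each file is created exactly once, this also sidesteps the FileNotFoundError handling that append mode needs.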
ExcelWriter docs

writing data to the same file from different functions

Currently I can write results from within each function to an individual file.
How would I write the results from the 2 functions to the same file?
I think I would need to move writer = pd.ExcelWriter('All Results', engine='xlsxwriter'), with the new file name, outside of the functions, but I don't know how to handle the writing of each df_Final...
Input:
ExcelName='....'
t1=pd.read_excel('.....')
t2=['.......']
def F1(Input_Data):
    writer = pd.ExcelWriter('F1_Results', engine='xlsxwriter')
    .
    .
    .
    df_Final.to_excel(writer, sheet_name=writeto[3],index=False, header=False)
    writer.save()
    return
def F2(Input_Data):
    writer = pd.ExcelWriter('F2_Results', engine='xlsxwriter')
    .
    .
    .
    df_Final.to_excel(writer, sheet_name=writeto[7],index=False, header=False)
    writer.save()
    return
Solution:
This helper function might help you out:
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.
    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
    Returns: None
    """
    from openpyxl import load_workbook
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    writer = pd.ExcelWriter(filename, engine='openpyxl')
    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError
    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass
    if startrow is None:
        startrow = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
    # save the workbook
    writer.save()
NOTE: for Pandas < 0.21.0, replace sheet_name with sheetname!
Usage examples:
append_df_to_excel('/home/data/test.xlsx', df)
append_df_to_excel('/home/data/test.xlsx', df, header=None, index=False)
append_df_to_excel('/home/data/test.xlsx', df, sheet_name='Sheet2', index=False)
append_df_to_excel('/home/data/test.xlsx', df, sheet_name='Sheet2', index=False, startrow=25)
You can modify the function to receive the filename as a parameter:
def write_to_excel(filename, input_data):
    writer = pd.ExcelWriter(filename, engine='xlsxwriter')
    .
    .
    .
    df_Final.to_excel(writer, sheet_name=writeto[3],index=False, header=False)
    writer.save()
    return
# Then use it like this
write_to_excel("F2_Results", input_data)
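Another way to get both results into one file, sketched here under assumptions, is to create the writer once outside the functions and pass it in, so each function only adds its own sheet. The function names f1 and f2, the sheet names, the sample frames, and the file name all_results.xlsx are placeholders, not taken from the question.

```python
import os
import tempfile

import pandas as pd


def f1(writer):
    # Placeholder for the first computation; writes its result to its own sheet.
    df_final = pd.DataFrame({"x": [1, 2]})
    df_final.to_excel(writer, sheet_name="F1", index=False)


def f2(writer):
    # Placeholder for the second computation.
    df_final = pd.DataFrame({"y": [3]})
    df_final.to_excel(writer, sheet_name="F2", index=False)


path = os.path.join(tempfile.mkdtemp(), "all_results.xlsx")

# One writer shared by both functions; the file is saved when the block exits.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    f1(writer)
    f2(writer)

book = pd.read_excel(path, sheet_name=None)
```

The caller owns the writer, so neither function needs to know the file name or call save.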

Create Excel Tables from Dictionary of Dataframes

I have a dictionary of dataframes.
dd = {
    'table': pd.DataFrame({'Name':['Banana'], 'color':['Yellow'], 'type':'Fruit'}),
    'another_table':pd.DataFrame({'city':['Atlanta'],'state':['Georgia'], 'Country':['United States']}),
    'and_another_table':pd.DataFrame({'firstname':['John'], 'middlename':['Patrick'], 'lastnme':['Snow']}),
}
I would like to create an Excel file which contains Excel Table objects created from these dataframes. Each Table needs to be on a separate Tab/Sheet and Table names should match dataframe names.
Is this possible to do with Python?
So far I was only able to export data to Excel normally without converting to tables using xlsxwriter
writer = pd.ExcelWriter('Results.xlsx', engine='xlsxwriter')
for sheet, frame in dd.items():
    frame.to_excel(writer, sheet_name = sheet)
writer.save()
For writing multiple sheets from Pandas, use the openpyxl library. In addition, to prevent overwriting, set the workbook sheets before each update.
Try this code:
import pandas as pd
import openpyxl
dd = {
    'table': pd.DataFrame({'Name':['Banana'], 'color':['Yellow'], 'type':'Fruit'}),
    'another_table':pd.DataFrame({'city':['Atlanta'],'state':['Georgia'], 'Country':['United States']}),
    'and_another_table':pd.DataFrame({'firstname':['John'], 'middlename':['Patrick'], 'lastnme':['Snow']}),
}
filename = 'Results.xlsx'  # must exist
wb = openpyxl.load_workbook(filename)
writer = pd.ExcelWriter(filename, engine='openpyxl')
for sheet, frame in dd.items():
    writer.sheets = dict((ws.title, ws) for ws in wb.worksheets)  # need this to prevent overwrite
    frame.to_excel(writer, index=False, sheet_name = sheet)
writer.save()
# convert data to tables
wb = openpyxl.load_workbook(filename)
for ws in wb.worksheets:
    mxrow = ws.max_row
    mxcol = ws.max_column
    tab = openpyxl.worksheet.table.Table(displayName=ws.title, ref="A1:" + ws.cell(mxrow,mxcol).coordinate)
    ws.add_table(tab)
wb.save(filename)
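A possible single-pass variant, assuming a reasonably recent pandas and openpyxl: write each frame and attach the Table object to the worksheet while the writer is still open, so the file does not have to exist beforehand or be reopened. The temporary path and the style name are illustrative choices; table names must not contain spaces, which holds for the keys used here.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
from openpyxl.worksheet.table import Table, TableStyleInfo

dd = {
    "table": pd.DataFrame({"Name": ["Banana"], "color": ["Yellow"]}),
    "another_table": pd.DataFrame({"city": ["Atlanta"], "state": ["Georgia"]}),
}

path = os.path.join(tempfile.mkdtemp(), "Results.xlsx")

with pd.ExcelWriter(path, engine="openpyxl") as writer:
    for name, frame in dd.items():
        frame.to_excel(writer, sheet_name=name, index=False)
        ws = writer.sheets[name]  # openpyxl worksheet that was just written
        # The table spans the header row plus all data rows.
        ref = f"A1:{get_column_letter(ws.max_column)}{ws.max_row}"
        table = Table(displayName=name, ref=ref)
        table.tableStyleInfo = TableStyleInfo(name="TableStyleMedium9",
                                              showRowStripes=True)
        ws.add_table(table)

# Reload the file to list the tables that were stored on each sheet.
wb = load_workbook(path)
tables = {ws.title: list(ws.tables.keys()) for ws in wb.worksheets}
```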

AttributeError when trying to save pandas dataframe to existing excel sheet

I am trying to write a pandas data frame to an existing excel sheet on a new tab, but it gives me the following error:
AttributeError: 'NoneType' object has no attribute 'read'.
I've determined this is because pandas to_excel returns a NoneType object, which isn't allowing me to save the file with writer.save(). Does anyone know a workaround for this?
path = 'summary.xlsx'
book = load_workbook(path)
writer = pd.ExcelWriter(path, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name="results")
writer.save()
I had exactly the same issue.
I managed to work around it by removing the value in legacy_drawing from each sheet in the workbook.
path = 'summary.xlsx'
book = load_workbook(path)
writer = pd.ExcelWriter(path, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
for s in list(writer.sheets.keys()):
    writer.sheets[s].legacy_drawing = None
df.to_excel(writer, sheet_name="results")
writer.save()
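For comparison, a sketch of adding a new tab through ExcelWriter's append mode rather than assigning writer.book and writer.sheets by hand. This is an assumption-laden illustration: it uses a freshly created workbook in a temporary directory, made-up sample frames, and it has not been checked against a workbook that triggers the legacy_drawing error described above.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical existing workbook with a single sheet.
path = os.path.join(tempfile.mkdtemp(), "summary.xlsx")
pd.DataFrame({"a": [1]}).to_excel(path, sheet_name="summary", index=False)

# Hypothetical results to store on a new tab.
df = pd.DataFrame({"b": [2]})

# mode="a" loads the existing workbook and keeps its sheets.
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    df.to_excel(writer, sheet_name="results", index=False)

names = load_workbook(path).sheetnames
```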
