Appending sheets to Excel in Pandas

I am trying to add a new Excel sheet for every day.
I have parts of this working: os.path checks whether the file exists, and if it doesn't, the code creates a new one.
Otherwise it should append the data to a new sheet labeled with the date. It does create the sheet, but it overwrites any previous data.
It also sometimes throws this error when it gets to the for loop: zipfile.BadZipFile: File is not a zip file
As a side issue, I believe it is also overwriting data on the sheet when I need to append to it, but I think it's the same problem.
import os
import datetime as dt
import pandas as pd
import xlsxwriter
from openpyxl import load_workbook
filename = "exceltest.xlsx"
current = dt.datetime.now().strftime("%m-%d-%y")
def makeXL():
    df = pd.DataFrame({"fname":["First Name"], "lname":["Last Name"], "id":["ID"], "time":["Time"]})
    print(df)
    lastSheet = ""
    if os.path.exists(filename) == False:
        writer = pd.ExcelWriter(filename, engine='openpyxl')
        df.to_excel(writer, sheet_name=current, index=False)
        writer.close()
    else:
        for sheet in pd.read_excel(filename, engine='openpyxl').sheet_names:
            print("sheet",sheet)
            lastSheet = sheet
        if lastSheet != current:
            writer = pd.ExcelWriter(filename, engine='openpyxl')
            df.to_excel(writer, sheet_name=current, index=False, mode="a")
            writer.close()
makeXL()

Related

Pandas creates new excel sheet when trying to append to existing sheet

I have code where I want to read data from the current sheet, store it in df_old, append the current data to it using df = df_old.append(df), and then replace the data in the sheet with this new dataframe. However, what it does instead is create a new sheet with the exact same name, where it publishes the new dataframe. I tried adding if_sheet_exists="replace" as an argument to ExcelWriter, but this did not change anything. How can I force it to overwrite the data in the sheet with the current name?
df_old = pd.read_excel(r'C:\Users\XXX\Downloads\Digitalisation\mat_flow\reblend_v2.xlsx',sheet_name = ft_tags_final[i][j])
df = df_old.append(df)
with pd.ExcelWriter(r'C:\Users\XXX\Downloads\Digitalisation\mat_flow\reblend_v2.xlsx', engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    df.to_excel(writer, index=False, sheet_name = ft_tags_final[i][j])

I had the same issue and solved it by using write mode instead of append mode. I also used openpyxl instead of xlsxwriter.
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
from openpyxl import load_workbook
book = load_workbook('Wallet.xlsx')
writer = pd.ExcelWriter('Wallet.xlsx', engine='openpyxl')
writer.book = book  # older pandas only: pandas 2.0 no longer allows assigning writer.book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
# ^ These are the most important lines, because they give pandas the existing sheets
Data.to_excel(writer, sheet_name='Main', header=None, index=False, startcol=number, startrow=counter)
writer.close()  # saves the workbook

Xlsxwriter writer is writing its own sheets and deletes existing ones

I am writing dataframes to Excel. Maybe I am not doing it correctly.
When I use this code:
from datetime import datetime
import numpy as np
import pandas as pd
from openpyxl import load_workbook
start = datetime.now()
df = pd.read_excel(r"C:\Users\harsh\Google Drive\Oddsportal\Files\Oddsportal "
                   r"Data\Historical Worksheet\data.xlsx", sheet_name='x1')
df['run_time'] = start
df1 = pd.read_csv(r"C:\Users\harsh\Google Drive\Oddsportal\Files\Oddsportal "
                  r"Data\Pre-processed\oddsportal_upcoming_matches.csv")
df1['run_time'] = start
concat = [df, df1]
df_c = pd.concat(concat)
path = r"C:\Users\harsh\Google Drive\Oddsportal\Files\Oddsportal Data\Historical Worksheet\data.xlsx"
writer = pd.ExcelWriter(path, engine='xlsxwriter')
df.to_excel(writer, sheet_name='x1')
df1.to_excel(writer, sheet_name='x2')
df_c.to_excel(writer, sheet_name='upcoming_archive')
writer.save()
writer.close()
print(df_c.head())
The dataframes are written to their respective sheets, but all the other existing sheets get deleted.
How can I write only to the respective sheets without disturbing the other existing ones?

XlsxWriter is not meant to alter an existing xlsx file. The only saviour is openpyxl, which does the job but is hard to learn. I even wrote a simple Python script to fill the gap, writing a bunch of rows or columns to a sheet: openpyxl_writers.py

You just need to use append mode, set if_sheet_exists to 'replace', and use openpyxl as the engine (if_sheet_exists requires pandas 1.3 or later).
Replace:
writer = pd.ExcelWriter('test.xlsx')
With:
writer = pd.ExcelWriter('test.xlsx', mode='a', engine='openpyxl',
                        if_sheet_exists='replace')  # <- HERE
From the documentation:
mode : {'w', 'a'}, default 'w'
File mode to use (write or append).
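A minimal sketch of that call in context, assuming pandas 1.3 or later (where if_sheet_exists was added) and openpyxl installed. The helper name replace_sheet is hypothetical, chosen only for illustration:

```python
import pandas as pd

def replace_sheet(path, df, sheet_name):
    """Overwrite one sheet of an existing .xlsx file, leaving the other sheets intact."""
    # mode='a' opens the existing workbook instead of truncating it;
    # if_sheet_exists='replace' drops the old sheet of that name and writes a fresh one
    with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                        if_sheet_exists="replace") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
```

Every sheet other than sheet_name is left as it was; only the named sheet is rewritten.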

How do I use a loop to write data to multiple Excel sheets in 1 workbook [duplicate]

I want to use Excel files to store data processed with Python. My problem is that I can't add sheets to an existing Excel file. Here is sample code that reproduces the issue:
import pandas as pd
import numpy as np
path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"
x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)
x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)
writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df1.to_excel(writer, sheet_name = 'x1')
df2.to_excel(writer, sheet_name = 'x2')
writer.save()
writer.close()
This code saves two DataFrames to two sheets, named "x1" and "x2" respectively. If I create two new DataFrames and try to use the same code to add two new sheets, 'x3' and 'x4', the original data is lost.
import pandas as pd
import numpy as np
path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"
x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)
x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)
writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df3.to_excel(writer, sheet_name = 'x3')
df4.to_excel(writer, sheet_name = 'x4')
writer.save()
writer.close()
I want an excel file with four sheets: 'x1', 'x2', 'x3', 'x4'.
I know that 'xlsxwriter' is not the only "engine"; there is also 'openpyxl'. I have also seen that other people have written about this issue, but I still can't understand how to do it.
Here is code taken from this link:
import pandas
from openpyxl import load_workbook
book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
data_filtered.to_excel(writer, "Main", columns=['Diff1', 'Diff2'])
writer.save()
They say that it works, but it is hard to figure out how. I don't understand what "ws.title", "ws", and "dict" are in this context.
Which is the best way to save "x1" and "x2", then close the file, open it again and add "x3" and "x4"?

Thank you. I believe a complete example could help anyone else who has the same issue:
import pandas as pd
import numpy as np
path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"
x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)
x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)
writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df1.to_excel(writer, sheet_name = 'x1')
df2.to_excel(writer, sheet_name = 'x2')
writer.close()
Here I generate an Excel file; as far as I understand, it does not really matter whether it is generated with the "xlsxwriter" or the "openpyxl" engine.
When I want to write without losing the original data, then:
import pandas as pd
import numpy as np
from openpyxl import load_workbook
path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"
book = load_workbook(path)
writer = pd.ExcelWriter(path, engine = 'openpyxl')
writer.book = book  # older pandas only: pandas 2.0 no longer allows assigning writer.book
x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)
x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)
df3.to_excel(writer, sheet_name = 'x3')
df4.to_excel(writer, sheet_name = 'x4')
writer.close()
This code does the job!

To create a new file:
import numpy as np
import pandas as pd
x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)
with pd.ExcelWriter('sample.xlsx') as writer:
    df1.to_excel(writer, sheet_name='x1')
To append to the file, use the argument mode='a' in pd.ExcelWriter.
x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)
with pd.ExcelWriter('sample.xlsx', engine='openpyxl', mode='a') as writer:
    df2.to_excel(writer, sheet_name='x2')
The default is mode='w'.
See the documentation.

In the example you shared, you are loading the existing file into book and setting writer.book to book. In the line writer.sheets = dict((ws.title, ws) for ws in book.worksheets), you are accessing each sheet in the workbook as ws. The sheet title is ws.title, so you are creating a dictionary of {sheet_title: sheet} key-value pairs. This dictionary is then assigned to writer.sheets. Essentially, these steps just load the existing data from 'Masterfile.xlsx' and populate your writer with it.
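To see what that dictionary contains, here is a small sketch using an in-memory openpyxl Workbook (no file involved; the sheet names x1 and x2 are just examples):

```python
from openpyxl import Workbook

book = Workbook()
book.active.title = "x1"  # a new Workbook starts with one sheet; rename it
book.create_sheet("x2")

# Same expression as in the answer: map each sheet's title to the sheet object
sheets = dict((ws.title, ws) for ws in book.worksheets)
print(sheets)                      # {'x1': <Worksheet "x1">, 'x2': <Worksheet "x2">}
print(sheets["x2"] is book["x2"])  # True: the values are the worksheet objects themselves
```

An equivalent, more modern spelling is {ws.title: ws for ws in book.worksheets}.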
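Now let's say you already have a file with x1 and x2 as sheets. You can use the example code to load the file and then do something like this to add x3 and x4:
import pandas as pd
from openpyxl import load_workbook
path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"
book = load_workbook(path)
writer = pd.ExcelWriter(path, engine='openpyxl')
writer.book = book  # older pandas only: pandas 2.0 no longer allows assigning writer.book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df3.to_excel(writer, sheet_name='x3', index=False)
df4.to_excel(writer, sheet_name='x4', index=False)
writer.close()  # saves the workbook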
That should do what you are looking for.

A simple example of writing multiple DataFrames to Excel at once, and of appending data on a new sheet to an Excel file that was written earlier (and is now closed).
The first time you write to the workbook (writing "df1" and "df2" to "1st_sheet" and "2nd_sheet"):
import pandas as pd
from openpyxl import load_workbook
df1 = pd.DataFrame([[1],[1]], columns=['a'])
df2 = pd.DataFrame([[2],[2]], columns=['b'])
df3 = pd.DataFrame([[3],[3]], columns=['c'])
excel_dir = "my/excel/dir"
with pd.ExcelWriter(excel_dir, engine='xlsxwriter') as writer:
    df1.to_excel(writer, sheet_name='1st_sheet')
    df2.to_excel(writer, sheet_name='2nd_sheet')
# the with block saves the file on exit, so no writer.save() call is needed
Later, after the Excel file has been closed, you may want to "append" data to the same file on another sheet, say "df3" to a sheet named "3rd_sheet":
book = load_workbook(excel_dir)
with pd.ExcelWriter(excel_dir, engine='openpyxl') as writer:
    writer.book = book  # older pandas only: pandas 2.0 no longer allows assigning writer.book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    ## Your dataframe to append.
    df3.to_excel(writer, sheet_name='3rd_sheet')
Note that the file must be in .xlsx format, not .xls.

Every time you want to save a Pandas DataFrame to an Excel file, you can call this function (if_sheet_exists requires pandas 1.3 or later):
import os
import pandas as pd
def save_excel_sheet(df, filepath, sheetname, index=False):
    # Create the file if it does not exist
    if not os.path.exists(filepath):
        df.to_excel(filepath, sheet_name=sheetname, index=index)
    # Otherwise, add a sheet, overwriting any existing sheet with the same name
    else:
        with pd.ExcelWriter(filepath, engine='openpyxl', if_sheet_exists='replace', mode='a') as writer:
            df.to_excel(writer, sheet_name=sheetname, index=index)
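The function above replaces a sheet that already exists. If you instead need to add rows to the bottom of an existing sheet (the side issue in the first question), here is a sketch using if_sheet_exists='overlay'. It assumes pandas 1.4 or later, where 'overlay' was added, and the helper name append_rows is hypothetical:

```python
import os
import pandas as pd

def append_rows(df, filepath, sheet_name):
    """Append df's rows below the existing data in sheet_name (hypothetical helper)."""
    if not os.path.exists(filepath):
        # No workbook yet: create it, including the header row
        df.to_excel(filepath, sheet_name=sheet_name, index=False)
        return
    # 'overlay' writes into the existing sheet instead of replacing it
    with pd.ExcelWriter(filepath, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        if sheet_name in writer.sheets:
            # max_row is the 1-based number of the last used row and startrow is
            # 0-based, so startrow=max_row lands on the first empty row
            start = writer.sheets[sheet_name].max_row
            df.to_excel(writer, sheet_name=sheet_name, startrow=start,
                        header=False, index=False)
        else:
            # The workbook exists but this sheet does not: write it with a header
            df.to_excel(writer, sheet_name=sheet_name, index=False)
```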

I would strongly recommend you work directly with openpyxl since it now supports Pandas DataFrames.
This allows you to concentrate on the relevant Excel and Pandas code.

You can do it without ExcelWriter, using the tools in openpyxl.
This also makes it much easier to add fonts to the new sheet with openpyxl.styles.
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
#Location of original excel sheet
fileLocation =r'C:\workspace\data.xlsx'
#Location of new file which can be the same as original file
writeLocation=r'C:\workspace\dataNew.xlsx'
data = {'Name':['Tom','Paul','Jeremy'],'Age':[32,43,34],'Salary':[20000,34000,32000]}
#The dataframe you want to add
df = pd.DataFrame(data)
#Load existing sheet as it is
book = load_workbook(fileLocation)
#create a new sheet
sheet = book.create_sheet("Sheet Name")
#Load dataframe into new sheet
for row in dataframe_to_rows(df, index=False, header=True):
sheet.append(row)
#Save the modified excel at desired location
book.save(writeLocation)
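As a sketch of the font point above, the following variant of the same loop bolds the header row with openpyxl.styles.Font. The function name add_styled_sheet is hypothetical; it works the same on a workbook opened with load_workbook:

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font
from openpyxl.utils.dataframe import dataframe_to_rows

def add_styled_sheet(book, df, title):
    """Copy df into a new sheet of an openpyxl Workbook and bold its header row."""
    sheet = book.create_sheet(title)
    for row in dataframe_to_rows(df, index=False, header=True):
        sheet.append(row)
    # Row 1 holds the column names written by dataframe_to_rows(header=True)
    for cell in sheet[1]:
        cell.font = Font(bold=True)
    return sheet
```

Usage follows the answer: book = load_workbook(fileLocation), then add_styled_sheet(book, df, "Sheet Name"), then book.save(writeLocation).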

You can read the existing sheets you are interested in, for example 'x1' and 'x2', into memory and write them back before adding new sheets. Keep in mind that sheets in a file and sheets in memory are two different things: if you don't read them, they will be lost. This approach writes with 'xlsxwriter' only (reading .xlsx files still needs a reader such as openpyxl in current pandas).
import pandas as pd
import numpy as np
path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"
# begin <== read selected sheets and write them back
df1 = pd.read_excel(path, sheet_name='x1', index_col=0) # or sheet_name=0
df2 = pd.read_excel(path, sheet_name='x2', index_col=0) # or sheet_name=1
writer = pd.ExcelWriter(path, engine='xlsxwriter')
df1.to_excel(writer, sheet_name='x1')
df2.to_excel(writer, sheet_name='x2')
# end ==>
# now create more new sheets
x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)
x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)
df3.to_excel(writer, sheet_name='x3')
df4.to_excel(writer, sheet_name='x4')
writer.close()  # saves the file
If you want to preserve all existing sheets, you can replace the code between begin and end above with:
# read all existing sheets and write them back
writer = pd.ExcelWriter(path, engine='xlsxwriter')
xlsx = pd.ExcelFile(path)
for sheet in xlsx.sheet_names:
    df = xlsx.parse(sheet_name=sheet, index_col=0)
    df.to_excel(writer, sheet_name=sheet)
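The same read-everything-and-rewrite idea can be written more compactly with pd.read_excel(path, sheet_name=None), which returns a dict of all sheets. This is a sketch; rewrite_with_new_sheets is a hypothetical name, and because only cell values are read back, formatting, charts and formulas in the original file are not preserved. It uses whichever writer engine pandas picks by default:

```python
import pandas as pd

def rewrite_with_new_sheets(path, new_sheets):
    """Read every sheet in path, then rewrite the file with the old and new sheets."""
    # sheet_name=None returns {sheet name: DataFrame} for all sheets, in file order
    existing = pd.read_excel(path, sheet_name=None, index_col=0)
    # A new sheet whose name already exists replaces the old one
    existing.update(new_sheets)
    with pd.ExcelWriter(path) as writer:  # default mode 'w': the file is recreated
        for name, frame in existing.items():
            frame.to_excel(writer, sheet_name=name)
```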

Another fairly simple way to go about this is to write a function like this:
import logging
import pandas as pd
from openpyxl import load_workbook
def _write_frame_to_new_sheet(path_to_file=None, sheet_name='sheet', data_frame=None):
    book = None
    try:
        book = load_workbook(path_to_file)
    except Exception:
        logging.debug('Creating new workbook at %s', path_to_file)
    with pd.ExcelWriter(path_to_file, engine='openpyxl') as writer:
        if book is not None:
            writer.book = book  # older pandas only: pandas 2.0 no longer allows assigning writer.book
        data_frame.to_excel(writer, sheet_name=sheet_name, index=False)
The idea here is to load the workbook at path_to_file if it exists and then append data_frame as a new sheet named sheet_name. If the workbook does not exist, it is created. It seems that neither openpyxl nor xlsxwriter can append, so, as in the example by @Stefano above, you really have to load and then rewrite to append.

# This program reads an Excel workbook, extracts only the URL domain names, and writes them to a different sheet of the same workbook.
# Developer - Nilesh K
import pandas as pd
from openpyxl import load_workbook  # for writing to the existing workbook
df = pd.read_excel("urlsearch_test.xlsx")
# You can also use a full path like the one below.
# r"C:\Users\xyz\Desktop\Python\
l = []  # list of domain names collected in the loop
# begin
# Loop over every row, extracting the URL from the TEXT column. You can put your own logic here.
for index, row in df.iterrows():
    try:
        text = row['TEXT']  # string to search
        str_pos = text.index('http')  # position where the URL starts
        # first '/' after the "//" of the scheme, i.e. the end of the domain
        str_pos1 = text.index('/', text.index('//', str_pos) + 2)
        str_op = text[str_pos:str_pos1]  # substring: scheme plus domain name
        l.append(str_op)  # add the domain name to the list
    # Error handling: skip rows without a matching URL and continue.
    except ValueError:
        print('Error!')
print(l)
l = list(dict.fromkeys(l))  # keep distinct values in order; comment this line out to keep all values
df1 = pd.DataFrame(l, columns=['URL'])  # create a dataframe from the list
# end
# Write using openpyxl so the sheet goes into the same workbook
book = load_workbook('urlsearch_test.xlsx')
writer = pd.ExcelWriter('urlsearch_test.xlsx', engine='openpyxl')
writer.book = book  # older pandas only: pandas 2.0 no longer allows assigning writer.book
df1.to_excel(writer, sheet_name='Sheet3')
writer.close()  # saves the workbook
# The line below can be used to write to a different workbook without openpyxl
# df1.to_excel(r"C:\Users\xyz\Desktop\Python\urlsearch1_test.xlsx", index=False, sheet_name='sheet1')

If you want to add an empty sheet:
xw = pd.ExcelWriter(file_path, engine='xlsxwriter')
pd.DataFrame().to_excel(xw, sheet_name='sheet11')
To get the (empty) sheet object:
sheet = xw.sheets['sheet11']
xw.close()  # writes the file

import pandas as pd
import openpyxl
writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
data_df.to_excel(writer, sheet_name='sheet_name')
writer.close()  # saves the file

The following solution worked for me:
import os
import pandas as pd
# dataframe to save
df = pd.DataFrame({"A":[1,2], "B":[3,4]})
# path where you want to save
path = "./..../..../.../test.xlsx"
# if the workbook test.xlsx already exists, append a sheet named "sheet_2"
if os.path.isfile(path):
    # append mode needs the openpyxl engine (xlsxwriter cannot append)
    with pd.ExcelWriter(path, engine='openpyxl', mode='a') as writer:
        df.to_excel(writer, sheet_name="sheet_2")
else:
    # otherwise create the workbook and write to a sheet named "sheet_1"
    with pd.ExcelWriter(path) as writer:
        df.to_excel(writer, sheet_name="sheet_1")
Now, if you want to write multiple dataframes to different sheets, simply add a loop and keep changing the sheet_name.
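A sketch of that loop, assuming the frames are kept in a dict keyed by sheet name (write_frames is a hypothetical name). It opens a single writer for the whole loop, so the file is saved once, and sets engine='openpyxl' explicitly because append mode is not supported by xlsxwriter:

```python
import os
import pandas as pd

def write_frames(path, frames):
    """Write each DataFrame in frames ({sheet name: DataFrame}) to its own sheet."""
    # Create the workbook on the first call, append to it on later calls
    mode = "a" if os.path.isfile(path) else "w"
    with pd.ExcelWriter(path, engine="openpyxl", mode=mode) as writer:
        for sheet_name, df in frames.items():
            df.to_excel(writer, sheet_name=sheet_name)
```

Calling it a second time with new sheet names adds them to the same workbook. In pandas 1.3 and later, reusing an existing sheet name raises a ValueError unless you also pass if_sheet_exists.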

Insert worksheet at specified index in existing Excel file using Pandas

Is there a way to insert a worksheet at a specified index using Pandas? With the code below, when adding a dataframe as a new worksheet, it gets added after the last sheet in the existing Excel file. What if I want to insert it at say index 1?
import pandas as pd
from openpyxl import load_workbook
f = 'existing_file.xlsx'
df = pd.DataFrame({'cat':['A','B'], 'word': ['C','D']})
book = load_workbook(f)
writer = pd.ExcelWriter(f, engine = 'openpyxl')
writer.book = book
df.to_excel(writer, sheet_name = 'sheet')
writer.save()
writer.close()
Thank you.
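The thread has no answer to this question. One possible approach, sketched here under the assumption that working directly with openpyxl is acceptable (as in the openpyxl-only answer earlier), is openpyxl's Workbook.create_sheet, whose second argument is the 0-based position of the new sheet. The function name insert_frame_as_sheet is hypothetical:

```python
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

def insert_frame_as_sheet(path, df, sheet_name, position):
    """Insert df as a new sheet at a 0-based position in an existing .xlsx file."""
    book = load_workbook(path)
    # create_sheet(title, index) inserts the sheet at that index instead of at the end
    sheet = book.create_sheet(sheet_name, position)
    for row in dataframe_to_rows(df, index=False, header=True):
        sheet.append(row)
    book.save(path)
```

For the question's example, insert_frame_as_sheet('existing_file.xlsx', df, 'sheet', 1) would place the new sheet second.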

Adding a pandas.DataFrame to Existing Excel File

I have a web scraper which creates an excel file for this month's scrapes. I want to add today's scrape and every scrape for that month into that file as a new sheet every time it is run. My issue, however, has been that it only overwrites the existing sheet with a new sheet instead of adding it as a separate new sheet. I've tried to do it with xlrd, xlwt, pandas, and openpyxl.
Still brand new to Python so simplicity is appreciated!
Below is just the code dealing with writing the excel file.
# My relevant time variables
ts = time.time()
date_time = datetime.datetime.fromtimestamp(ts).strftime('%y-%m-%d %H_%M_%S')
HourMinuteSecond = datetime.datetime.fromtimestamp(ts).strftime('%H_%M_%S')
month = datetime.datetime.now().strftime('%m-%y')
# Creates a writer for this month and year
writer = pd.ExcelWriter(
    'C:\\Users\\G\\Desktop\\KickstarterLinks(%s).xlsx' % (month),
    engine='xlsxwriter')
# Creates dataframe from my data, d
df = pd.DataFrame(d)
# Writes to the excel file
df.to_excel(writer, sheet_name='%s' % (HourMinuteSecond))
writer.save()

Update:
This functionality has been added to pandas 0.24.0:
ExcelWriter now accepts mode as a keyword argument, enabling append to existing workbooks when using the openpyxl engine (GH3441)
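With pandas 0.24 or later, the pre-0.24 function further down can therefore be reduced to a single ExcelWriter in append mode. This is a sketch, not the original author's code: add_frame_to_workbook_append is a hypothetical name, and it assumes openpyxl is installed:

```python
import os
import pandas as pd

def add_frame_to_workbook_append(filename, tabname, dataframe, timestamp):
    """Write dataframe to a new tab, creating the workbook if needed (hypothetical name)."""
    # filename and tabname are strftime templates, as in the original function
    filename = timestamp.strftime(filename)
    sheet_name = timestamp.strftime(tabname)
    # Append to an existing workbook; otherwise create it with mode 'w'
    mode = "a" if os.path.exists(filename) else "w"
    with pd.ExcelWriter(filename, engine="openpyxl", mode=mode) as writer:
        dataframe.to_excel(writer, sheet_name=sheet_name)
```

For example, add_frame_to_workbook_append('./sample-%m-%y.xlsx', '%d_%H_%M', df, dt.datetime.now()) adds one tab per run to the current month's file.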
Previous version:
Pandas has an open feature request for this.
In the meantime, here is a function which adds a pandas.DataFrame to an existing workbook:
Code:
def add_frame_to_workbook(filename, tabname, dataframe, timestamp):
    """
    Save a dataframe to a workbook tab with the filename and tabname
    coded to timestamp
    :param filename: filename to create, can use strftime formatting
    :param tabname: tabname to create, can use strftime formatting
    :param dataframe: dataframe to save to workbook
    :param timestamp: timestamp associated with dataframe
    :return: None
    """
    filename = timestamp.strftime(filename)
    sheet_name = timestamp.strftime(tabname)
    # try to open an existing workbook *before* creating the writer,
    # because newer pandas versions truncate the file when the writer is created
    try:
        book = load_workbook(filename)
    except IOError:
        # file does not exist yet, we will create it
        book = None
    # create a writer for this month and year
    writer = pd.ExcelWriter(filename, engine='openpyxl')
    if book is not None:
        writer.book = book  # older pandas only: pandas 2.0 no longer allows assigning writer.book
        # copy existing sheets
        writer.sheets = dict(
            (ws.title, ws) for ws in book.worksheets)
    # write out the new sheet
    dataframe.to_excel(writer, sheet_name=sheet_name)
    # save the workbook
    writer.close()
Test Code:
import datetime as dt
import pandas as pd
from openpyxl import load_workbook
data = [x.strip().split() for x in """
Date Close
2016-10-18T13:44:59 2128.00
2016-10-18T13:59:59 2128.75
""".split('\n')[1:-1]]
df = pd.DataFrame(data=data[1:], columns=data[0])
name_template = './sample-%m-%y.xlsx'
tab_template = '%d_%H_%M'
now = dt.datetime.now()
in_an_hour = now + dt.timedelta(hours=1)
add_frame_to_workbook(name_template, tab_template, df, now)
add_frame_to_workbook(name_template, tab_template, df, in_an_hour)
(Source)
