How to append dataframe to excel

How to append dataframe to excel - python

I have a dataframe that I save to an Excel file at a certain location.
Currently I do it this way:
df.to_excel(r'C:\Users\user_name\Downloads\test.xlsx')
The issue I am facing is that when I write a new dataframe, it overwrites the old data. I want to append the new data instead. I tried several Stack Overflow answers, but nothing seems to work.

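You can first read the existing file with read_excel, append the new rows, and then write the result back with to_excel:
import pandas as pd
filename = r'C:\Users\user_name\Downloads\test.xlsx'
existing = pd.read_excel(filename)  # read_excel is a pandas function, not a DataFrame method
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
output = pd.concat([existing, df], ignore_index=True)
output.to_excel(filename, index=False)  # index=False avoids an extra index column on each round trip
To check if the file exists before reading, you can use:
import os
import pandas as pd
filename = r'C:\Users\user_name\Downloads\test.xlsx'
if os.path.exists(filename):
    existing = pd.read_excel(filename)
    output = pd.concat([existing, df], ignore_index=True)
else:
    output = df
output.to_excel(filename, index=False)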

One way to handle it is to read what is already in the Excel file, combine it with your dataframe, and then overwrite the file (in effect, regenerating it).
A similar question has a solution that uses ExcelWriter. Instead of overwriting the existing data, it sets startrow to the first empty row of the existing sheet.
The last used row, and therefore the start row, can be found with writer.sheets[sheetname].max_row.
append dataframe to excel with pandas
import pandas
from openpyxl import load_workbook
book = load_workbook('test.xlsx')
writer = pandas.ExcelWriter('test.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = {ws.title: ws for ws in book.worksheets}
for sheetname in writer.sheets:
    df1.to_excel(writer, sheet_name=sheetname, startrow=writer.sheets[sheetname].max_row, index=False, header=False)
writer.save()
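In newer pandas releases (2.0 and later), writer.save() and assigning to writer.book are no longer available, so the snippet above may fail there. A minimal sketch of the same startrow idea, assuming pandas 1.4 or later with openpyxl installed; the temporary path and the two small dataframes are made up for illustration:

```python
import os
import tempfile

import pandas as pd

# Hypothetical demo file in a temp directory; substitute your own path.
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")

first = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
second = pd.DataFrame({"a": [5], "b": [6]})

# The initial write creates the file with a header row.
first.to_excel(path, sheet_name="Sheet1", index=False)

# mode="a" keeps the existing workbook; if_sheet_exists="overlay" writes
# into the existing sheet instead of creating or replacing one.
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="overlay") as writer:
    start = writer.sheets["Sheet1"].max_row  # rows already used, header included
    second.to_excel(writer, sheet_name="Sheet1", startrow=start,
                    index=False, header=False)

combined = pd.read_excel(path, sheet_name="Sheet1")
print(combined)
```

The context manager saves the workbook on exit, so no explicit save call is needed.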

Related

pandas dataframe not posting correctly to excel

I have an Excel sheet "Calcs" with one column named "old". I'm trying to add a new column "new" with a fixed value of "1" to the existing sheet "Calcs". The code below results in two issues:
It does not update the existing sheet; instead it creates a new sheet called "Calcs1".
After the code has run, opening the Excel file gives this error (there is no such error when opening the file before running the code):
We found a problem with some content in 'test1.xlsx'. Do you want us
to try to recover as much as we can? if you trust the source of this
workbook, click Yes.
Appreciate any help
import pandas as pd
from openpyxl import load_workbook
file = r"C:\test1.xlsx"
df2 = pd.read_excel(file, sheet_name = 'Calcs')
df2["new"] = "1"
book = load_workbook(file)
writer = pd.ExcelWriter(file, engine = 'openpyxl')
writer.book = book
df2.to_excel(writer, sheet_name = 'Calcs')
writer.save()
writer.close()
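One possible way to get the intended result is to open the writer in append mode and ask pandas to replace the existing sheet. This is only a sketch, assuming pandas 1.3 or later with openpyxl; the temporary file stands in for C:\test1.xlsx and its contents are invented for the demo:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for test1.xlsx with a "Calcs" sheet and one column.
file = os.path.join(tempfile.mkdtemp(), "test1.xlsx")
pd.DataFrame({"old": [10, 20]}).to_excel(file, sheet_name="Calcs", index=False)

df2 = pd.read_excel(file, sheet_name="Calcs")
df2["new"] = "1"

# Append mode keeps the other sheets; "replace" swaps out the existing
# "Calcs" sheet rather than adding a second one named "Calcs1".
with pd.ExcelWriter(file, engine="openpyxl", mode="a",
                    if_sheet_exists="replace") as writer:
    df2.to_excel(writer, sheet_name="Calcs", index=False)

with pd.ExcelFile(file) as xls:
    sheet_names = xls.sheet_names
result = pd.read_excel(file, sheet_name="Calcs")
print(sheet_names, list(result.columns))
```

Because the with block saves once when it exits, the separate save() and close() calls are not needed here.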

Pandas dataframe to specific sheet in a excel file without losing formatting

I have a dataframe like as shown below
Date,cust,region,Abr,Number,
12/01/2010,Company_Name,Somecity,Chi,36,
12/02/2010,Company_Name,Someothercity,Nyc,156,
df = pd.read_clipboard(sep=',')
I would like to write this dataframe to a specific sheet (called temp_data) in the file output.xlsx
Therefore I tried the following:
import pandas
from openpyxl import load_workbook
book = load_workbook('output.xlsx')
writer = pandas.ExcelWriter('output.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
I also tried the below
path = 'output.xlsx'
with pd.ExcelWriter(path) as writer:
    writer.book = openpyxl.load_workbook(path)
    final_df.to_excel(writer, sheet_name='temp_data',startrow=10)
writer.save()
But I am not sure whether I am overcomplicating it. I get the error shown below, even though I verified in Task Manager that no Excel file or task is running:
BadZipFile: File is not a zip file
Moreover, I also lose the formatting of the output.xlsx file when I do manage to write the file based on the suggestions below. The file already has neatly formatted fonts, colors, etc., and I just need to put the data inside.
Is there any way to write the pandas dataframe to a specific sheet in an existing Excel file WITHOUT LOSING THE FORMATTING OF THE DESTINATION FILE?

You just need to use to_excel on the pandas dataframe.
Try the snippet below:
df1.to_excel("output.xlsx",sheet_name='Sheet_name')
If there is existing data, try the snippet below:
import pandas as pd
from openpyxl import load_workbook
# assumption: `reader` (undefined in the original snippet) holds the rows already in the file
reader = pd.read_excel('output.xlsx')
writer = pd.ExcelWriter('output.xlsx', engine='openpyxl')
# try to open an existing workbook
writer.book = load_workbook('output.xlsx')
df.to_excel(writer,index=False,header=False,startrow=len(reader)+1)
writer.save()
writer.close()
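Since the question is about keeping the destination file's formatting, another option is to skip to_excel and set cell values with openpyxl directly, which leaves the existing cell styles alone. The sketch below assumes openpyxl and pandas are installed; the workbook it builds is a made-up stand-in for output.xlsx. Note that openpyxl may still drop features it cannot read (for example some shapes or images) when it saves the file.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font
from openpyxl.utils.dataframe import dataframe_to_rows

# Build a small formatted workbook to stand in for output.xlsx (hypothetical).
path = os.path.join(tempfile.mkdtemp(), "output.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "temp_data"
ws["A1"].font = Font(bold=True)  # pre-existing formatting we want to keep
wb.save(path)

df = pd.DataFrame({"Date": ["12/01/2010", "12/02/2010"],
                   "cust": ["Company_Name", "Company_Name"],
                   "Number": [36, 156]})

# Reopen the workbook and write values cell by cell; only the values change.
wb = load_workbook(path)
ws = wb["temp_data"]
rows = dataframe_to_rows(df, index=False, header=True)
for r, row in enumerate(rows, start=1):        # first target row is row 1
    for c, value in enumerate(row, start=1):   # first target column is A
        ws.cell(row=r, column=c, value=value)
wb.save(path)

check = load_workbook(path)["temp_data"]
print(check["A1"].value, check["A1"].font.bold, check["C2"].value)
```

Changing the two start values moves the top-left corner of the pasted table.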

Are you restricted to using pandas or openpyxl?
If you're comfortable using other libraries, the easiest way is probably to use win32com to drive Excel as if you were a user manually copying and pasting the information over.
import pandas as pd
import io
import win32com.client as win32
import os
csv_text = """Date,cust,region,Abr,Number
12/01/2010,Company_Name,Somecity,Chi,36
12/02/2010,Company_Name,Someothercity,Nyc,156"""
df = pd.read_csv(io.StringIO(csv_text),sep = ',')
temp_path = r"C:\Users\[User]\Desktop\temp.xlsx" #temporary location where to write this dataframe
df.to_excel(temp_path,index = False) #temporarily write this file to excel, change the output path as needed
excel = win32.Dispatch("Excel.Application")
excel.Visible = True #Switch these attributes to False if you'd prefer Excel to be invisible while executing this script
excel.ScreenUpdating = True
temp_wb = excel.Workbooks.Open(temp_path)
temp_ws = temp_wb.Sheets("Sheet1")
output_path = r"C:\Users\[User]\Desktop\output.xlsx" #Path to your output excel file
output_wb = excel.Workbooks.Open(output_path)
output_ws = output_wb.Sheets("Output_sheet")
temp_ws.Range('A1').CurrentRegion.Copy(Destination = output_ws.Range('A1')) # Feel free to modify the Cell where you'd like the data to be copied to
input('Check that output looks like you expected\n') # Added pause here to make sure script doesn't overwrite your file before you've looked at the output
temp_wb.Close()
output_wb.Close(True) #Close output workbook and save changes
excel.Quit() #Close excel
os.remove(temp_path) #Delete temporary excel file
Let me know if this achieves what you were after.

I spent all day on this (and a co-worker of mine spent even longer). Thankfully, it seems to work for my purposes: pasting a dataframe into an Excel sheet without changing any of the Excel source formatting. It requires the pywin32 package, which "drives" Excel as if it were a user, using VBA.
import pandas as pd
from win32com import client
# Grab your source data any way you please - I'm defining it manually here:
df = pd.DataFrame([
['LOOK','','','','','','','',''],
['','MA!','','','','','','',''],
['','','I pasted','','','','','',''],
['','','','into','','','','',''],
['','','','','Excel','','','',''],
['','','','','','without','','',''],
['','','','','','','breaking','',''],
['','','','','','','','all the',''],
['','','','','','','','','FORMATTING!']
])
# Copy the df to clipboard, so we can later paste it as text.
df.to_clipboard(index=False, header=False)
excel_app = client.gencache.EnsureDispatch("Excel.Application") # Initialize instance
wb = excel_app.Workbooks.Open("Template.xlsx") # Load your (formatted) template workbook
ws = wb.Worksheets(1) # First worksheet becomes active - you could also refer to a sheet by name
ws.Range("A3").Select() # Only select a single cell using Excel nomenclature, otherwise this breaks
ws.PasteSpecial(Format='Unicode Text') # Paste as text
wb.SaveAs("Updated Template.xlsx") # Save our work
excel_app.Quit() # End the Excel instance
In general, when using the win32com approach, it's helpful to record yourself (with a macro) doing what you want to accomplish in Excel, and then read the generated macro code. Often this will give you excellent clues as to which commands you could invoke.

The solution to your problem exists here: How to save a new sheet in an existing excel file, using Pandas?
To add a new sheet from a df:
import pandas as pd
from openpyxl import load_workbook
import os
import numpy as np
os.chdir(r'C:\workdir')
path = 'output.xlsx'
book = load_workbook(path)
writer = pd.ExcelWriter(path, engine = 'openpyxl')
writer.book = book
### replace with your df ###
x = np.random.randn(100, 2)
df = pd.DataFrame(x)
df.to_excel(writer, sheet_name = 'x')
writer.save()
writer.close()

You can try xltpl.
Create a template file based on your output.xlsx file.
Render a file with your data.
from xltpl.writerx import BookWriterx
writer = BookWriterx('template.xlsx')
d = {'rows': df.values}
d['tpl_name'] = 'tpl_sheet'
d['sheet_name'] = 'temp_data'
writer.render_sheet(d)
d['tpl_name'] = 'other_sheet'
d['sheet_name'] = 'other'
writer.render_sheet(d)
writer.save('out.xls')
See examples.

Exporting dataframe to existing Excel sheet in workbook breaks pivot slicer connection in other sheets

My code works exactly as I would like: it takes the data from the df and inserts it into the desired Excel file while skipping the appropriate rows. However, when I call .save(), other sheets that reference the data (mostly through pivots) seem to break, even though they were not touched by the writer. If I insert the data into another Excel file, then copy and paste the exact same data where the Python code puts it, the corresponding sheets do not break and display the correct information. How do you stop other sheets from breaking when Python writes to the file?
filename_in = 'File Location In'
filename_out = 'File Location Out'
sheet_name = 'Detail'
pos_detail_data_df.to_excel(filename_in, sheet_name=sheet_name, header = False, index = False)
df = pd.read_excel(filename_in, sheet_name=sheet_name)
book = load_workbook(filename_in)
writer = pd.ExcelWriter(filename_out, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
writer.sheets
df.to_excel(writer, sheet_name, index=False, startrow = 2, header = False)
writer.save()
Edit:
The code was updated to reflect the assistance from below. However, now the process will simply remove everything from my filename_out and replace it with only the sheets from filename_in

I found an Excel file with a slicer so I took a look.
Sample file:
Site: https://www.contextures.com/excelpivottableslicers.html#download
Try:
import pandas as pd
from openpyxl import load_workbook
# sample Excel file with slicers.
# if required download and unzip and put in the folder with this script
sample_file = 'https://www.contextures.com/pivotsamples/regionsalesslicer.zip'
# set your filename_in, filename_out, and sheet_name
filename_in = 'regionsalesslicer.xlsx'
filename_out = 'regionsalesslicerUpdated.xlsx'
sheet_name = 'Sales Data'
# read in the Excel file with pd.read_excel rather than pd.ExcelFile
# just to play safe and avoid any BadZipFile: File is not a zip file errors
df = pd.read_excel(filename_in, sheet_name=sheet_name)
################## WHATEVER YOU WANT BELOW UNTIL LINE 37 ##################
# check the contents
print(df.head(2), '\n')
# make a change (or changes) to your df
# in the case just swap 'Carrot' for 'Orange' in the 'Product' column
df.loc[df['Product'] == 'Carrot', 'Product'] = 'Orange'
# check the contents after the change
print(df.head(2), '\n')
# as long as you have imported from the top two lines and read the file
# and not called ExcelWriter before this point all the other lines above
# are up to you.
################## WHATEVER YOU NEED ABOVE AFTER LINE 15 ##################
# from this point on try...
book = load_workbook(filename_in)
writer = pd.ExcelWriter(filename_out, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name, index=False)
writer.save()
In the resulting file (in the example code above we used filename_out = 'regionsalesslicerUpdated.xlsx'), the slicers still work.
Example:
Shows 'Orange'. Let's refresh the data...
Slicer/filter shows 'Orange'...
Exporting from pandas to Excel has not deleted any of the sheets etc...
We have successfully overwritten a dataframe to an existing sheet in Excel.

There is no way to do this if you are writing directly to the sheet, unless you would like to pay for xlwings. A better (and easier to manage) solution is to change the way you collect your data from Excel. It also won't break any dashboards or slicers you have. It will require some adjustments to your overall data pipeline and how you process it, but that is a one-time change that will pay dividends in the future.
Instead of writing directly to a sheet in the file, you can write to a separate file altogether.
df.to_excel(writer_path_to_seperate_sheet, sheet_name, index=False)
From Excel you can now import this file (and every other file that you may write to the folder in the future) via Power Query.
Select either the file with your data or, preferably, the folder that will contain your file and all future files. Click Combine and Transform.
Once you complete this step, you can adjust your data set to your liking and load it. It will be a table by default (perfect for pivot tables and anything else). When new files are written to the folder, you simply click Refresh on the table data set and voilà: all slicers and other dashboards and pivots are left unaffected.
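The Python side of that workflow could look roughly like the sketch below. The helper name export_for_power_query, the folder name, and the timestamped file naming are all assumptions made for illustration; the only requirement from the description above is that each export lands as its own file in the folder Power Query reads.

```python
import tempfile
from datetime import datetime
from pathlib import Path

import pandas as pd


def export_for_power_query(df, folder, prefix="export"):
    """Write df to a new timestamped workbook inside folder and return its path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    target = folder / f"{prefix}_{stamp}.xlsx"
    df.to_excel(target, sheet_name="Detail", index=False)
    return target


# Hypothetical drop folder; point Power Query's folder import at the real one.
drop_folder = Path(tempfile.mkdtemp()) / "power_query_input"
written = export_for_power_query(
    pd.DataFrame({"Product": ["Carrot"], "Qty": [3]}), drop_folder
)
print(written.name)
```

The workbook with the pivots and slicers is never opened by Python in this setup, which is why nothing in it can be damaged by the export.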

Copy Python list variable into existing xlsx excel file

I'm new to Python, so please excuse my ignorance. I have tried several different bits of code using openpyxl and pandas, but I can't get anything to work.
What I need is to copy the text of an existing list-variable in Python (which is an array of file paths), and paste this into an existing xlsx worksheet at a given cell.
For example, given the list-variable in Python ["apple", "orange", "grape"], I need cells A2, A3, and A4 of Sheet 1 to read the same.
Any help is much appreciated!
import pandas as pd
import os
folder = "C:\\Users\\user\\Documents\\temp"
x = []
for path in os.listdir(folder):
    if path.endswith(".png"):
        full_path = os.path.join(folder, path)
        x.append(full_path)
fn = r"C:\\Users\\user\\Documents\\test.xlsx"
df = pd.read_excel(fn, header = None)
df2 = pd.DataFrame(x)
writer = pd.ExcelWriter(fn)
df2.to_excel(writer, startcol=0, startrow=1, header=None, index=False)
writer.save()
What this gets me is the correct information but seemingly overwrites any existing data.
All other cell contents and sheets are now gone and the newly added data is in an otherwise blank Sheet1. I need to maintain the existing spreadsheet as-is, and only add this info starting at a given cell on a given sheet.
Have also tried the following with xlwings, which imports the correct data into the existing sheet while maintaining all other data, but replaces the first cell with a 0. How do I get rid of this extra 0 from the dataframe?
filename = "C:\\Users\\user\\Documents\\test.xlsx"
df = pd.DataFrame(x)
wb = xw.Book(filename)
ws = wb.sheets('Sheet1')
ws.range('A2').options(index=False).value = df
wb = xw.Book(filename)
EDIT: The above xlwings code appears to work if I replace DataFrame with Series.
SOLUTION:
import xlwings as xw
filename = "C:\\Users\\user\\Documents\\test.xlsx"
df = pd.Series(x)
wb = xw.Book(filename)
ws = wb.sheets('Sheet1')
ws.range('A2').options(index=False).value = df
wb = xw.Book(filename)

I need to maintain the existing spreadsheet as-is
Open the excel file in append mode, take a look at this answer for example (other answers in the same thread are useful as well).
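A minimal sketch of what append mode can look like for this question, assuming pandas 1.4 or later with openpyxl. The temporary workbook, its two sheets, and the "fruit" header are invented so the example can run on its own:

```python
import os
import tempfile

import pandas as pd

# Hypothetical existing workbook with two sheets that must survive the write.
filename = os.path.join(tempfile.mkdtemp(), "test.xlsx")
with pd.ExcelWriter(filename, engine="openpyxl") as writer:
    pd.DataFrame({"fruit": ["old"]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"keep": [1]}).to_excel(writer, sheet_name="Other", index=False)

x = ["apple", "orange", "grape"]

# mode="a" opens the existing file; "overlay" writes into Sheet1 in place.
# startrow and startcol are zero-based, so (1, 0) is cell A2.
with pd.ExcelWriter(filename, engine="openpyxl", mode="a",
                    if_sheet_exists="overlay") as writer:
    pd.Series(x).to_excel(writer, sheet_name="Sheet1", startrow=1,
                          startcol=0, header=False, index=False)

with pd.ExcelFile(filename) as xls:
    sheets = xls.sheet_names
column_a = pd.read_excel(filename, sheet_name="Sheet1")["fruit"].tolist()
print(sheets, column_a)
```

Cells outside the written range, and the other sheet, are left as they were.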

If you want to reference cells by their Excel coordinates (e.g. "A1"), you can use openpyxl:
from openpyxl import load_workbook
workbook = load_workbook(filename="sample.xlsx")
ll = ["apple", "orange", "grape"]
sheet = workbook.active
sheet["A2"] = ll[0]
sheet["A3"] = ll[1]
sheet["A4"] = ll[2]
workbook.save(filename="sample.xlsx")
Guess this solves your problem
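If the list can have any length, the three assignments can be turned into a loop. This is a sketch that assumes openpyxl is installed; the empty workbook created at the top is only a stand-in for an existing sample.xlsx:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical stand-in for an existing sample.xlsx.
filename = os.path.join(tempfile.mkdtemp(), "sample.xlsx")
Workbook().save(filename)

ll = ["apple", "orange", "grape"]

workbook = load_workbook(filename=filename)
sheet = workbook.active
# Write each item one row lower, starting at A2 (row 2, column 1).
for offset, item in enumerate(ll):
    sheet.cell(row=2 + offset, column=1, value=item)
workbook.save(filename=filename)

# Reload the file to confirm what was stored.
reloaded = load_workbook(filename=filename).active
written = [reloaded.cell(row=2 + i, column=1).value for i in range(len(ll))]
print(written)
```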

Depending on how your data is formatted, this may work, if you are satisfied appending to the next empty column in the sheet. With this proviso, the pandas library is really good for handling tabular data and can work with Excel files (using the openpyxl library).
It looks like you've installed pandas already but, for the benefit of others reading, make sure you've installed pandas and openpyxl first:
pip install pandas openpyxl
Then try:
import pandas as pd
df = pd.read_excel('filename.xlsx')
df['new_column_name'] = listname
df.to_excel('output_spreadsheet.xlsx')
I personally find that more readable than using openpyxl directly.

import csv to xlsx python

I'm trying to put some data from a CSV file into an existing Excel file.
My existing Excel file contains images, and xlrd cannot read images.
I tried to use xlsxwriter, but it cannot append to an existing xlsx file.
The only solution I've found is to use openpyxl.
import openpyxl
xfile = openpyxl.load_workbook('my_exist_file')
sheet = xfile.get_sheet_by_name('Sheet1')
with open("my_csv", 'rb') as f:
    reader = csv.reader(f)
    for r, row in enumerate(reader):
        for c, col in enumerate(row):
            -here is my problem-
How can I write the CSV data (which is a table) to a specific location in the existing xlsx file? I want my table to start at cell K2.
Thanks!

Reading the CSV
Use pandas.read_csv to extract the information:
import pandas as pd
df = pd.read_csv(my_filename)
Options you might need to specify:
sep: which separator is used
encoding: the file's text encoding
header: whether the first row is a row of labels
index_col: whether the first column is an index
Adding to an Excel worksheet
Inspired by: https://stackoverflow.com/a/20221655/1562285
Check the pandas to_excel documentation for other possible options.
from openpyxl import load_workbook
book = load_workbook(old_filename)
sheet_name = 'Sheet1'
with pd.ExcelWriter(new_filename, engine='openpyxl') as writer:
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    df.to_excel(writer, sheet_name=sheet_name, startrow=1, startcol=10, engine='openpyxl')
The startrow and startcol arguments say where in the worksheet you want to paste your data. Both are zero-based, so startrow=1 and startcol=10 place the top-left cell at K2.
This method might overwrite the previous content on this worksheet. If it does, you will have to loop over the columns and rows of the DataFrame and add them semi-manually to the worksheet.
Inserting images
If you have the images to insert somewhere externally you can use the code from the documentation
from openpyxl.drawing.image import Image
ws = book['sheet_name_for_images']
ws['A1'] = 'You should see three logos below'
img = Image('logo.png')
# add to worksheet and anchor next to cells
ws.add_image(img, 'A1')
I did not test this, and you might need to insert this code before the writer.sheets = ...

Use the worksheet's cell method to update a specific cell
sheet.cell(row=<row>, column=<col>, value=<val>)
It is usually a good idea to use keep_vba=True while loading workbook. Check the help page for more details.
Also check answer to this question.
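Putting the cell method together with the loop from the question, a sketch for placing the CSV table at K2 might look like this. It assumes openpyxl is installed; the CSV contents, the temporary files, and the default sheet name "Sheet" are stand-ins for the real data:

```python
import csv
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical input files created so the example can run on its own.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "my_csv.csv")
xlsx_path = os.path.join(workdir, "my_exist_file.xlsx")
with open(csv_path, "w", newline="") as f:
    f.write("name,qty\napple,3\norange,5\n")
Workbook().save(xlsx_path)  # a new workbook's default sheet is named "Sheet"

xfile = load_workbook(xlsx_path)
sheet = xfile["Sheet"]

START_ROW, START_COL = 2, 11  # K2: K is the 11th column, rows and columns are 1-based

# Text mode with newline="" is what the csv module expects on Python 3.
with open(csv_path, newline="") as f:
    reader = csv.reader(f)
    for r, row in enumerate(reader):
        for c, value in enumerate(row):
            sheet.cell(row=START_ROW + r, column=START_COL + c, value=value)
xfile.save(xlsx_path)

check = load_workbook(xlsx_path)["Sheet"]
print(check["K2"].value, check["L3"].value)
```

csv.reader yields every field as a string, so numeric columns would need converting before writing if Excel should treat them as numbers.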
