I have an existing Excel file called test.xlsx. Its first sheet, tables, contains a table that points to the data in the second sheet, data.
I want to keep the tables sheet and its table while overwriting the entire data sheet with the contents of the new_df dataframe.
My program:
import pandas as pd
import openpyxl
from openpyxl import load_workbook
file_to_update = r'C:\folder\test.xlsx'
book = load_workbook(file_to_update)
writer = pd.ExcelWriter(file_to_update, engine = 'openpyxl')
writer.book = book
new_df = pd.read_csv(r'C:\folder\other_csv.csv', sep = '|')
new_df.to_excel(writer, index = False)
writer.save()
writer.close()
After running this, the previously existing data is still there, and the contents of new_df end up in a new sheet, Sheet1, instead of in the data sheet.
The objective is to keep the existing tables sheet while overwriting the data sheet with the data from new_df.
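One way this could be approached, sketched under some assumptions: pandas 1.3 or later, where pd.ExcelWriter accepts mode="a" together with if_sheet_exists="replace". The workbook built at the top is only a stand-in for test.xlsx (a formula in the tables sheet plays the role of the table), and the small new_df replaces the CSV read. Whether pivot tables, charts or other objects in a real file survive the round trip through openpyxl is something to check on a copy first.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Build a small stand-in for test.xlsx with a "tables" and a "data" sheet.
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
wb = Workbook()
wb.active.title = "tables"
wb["tables"]["A1"] = "=SUM(data!A2:A100)"  # formula pointing at the data sheet
data_ws = wb.create_sheet("data")
data_ws.append(["old"])
data_ws.append([1])
wb.save(path)

# Stand-in for the DataFrame read from the CSV in the question.
new_df = pd.DataFrame({"a": [10, 20], "b": [30, 40]})

# mode="a" opens the existing workbook instead of starting a new one;
# if_sheet_exists="replace" drops the old "data" sheet and writes a fresh
# sheet with the same name. Other sheets are left alone.
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="replace") as writer:
    new_df.to_excel(writer, sheet_name="data", index=False)

result = load_workbook(path)
print(result.sheetnames)
```

Passing sheet_name explicitly matters here: without it, to_excel writes to its default sheet name, Sheet1.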
Related
I am having trouble updating an Excel sheet using pandas by writing new values into it. I already have an existing frame df1 that reads the values from MySheet1.xlsx, so this needs to either be a new dataframe or somehow copy and overwrite the existing one.
The spreadsheet is in this format:
I have a Python list: values_list = [12.34, 17.56, 12.45]. My goal is to insert the list values vertically under the Col_C header. My current code overwrites the entire sheet instead, laying the data out horizontally and not preserving the existing values.
df2 = pd.DataFrame({'Col_C': values_list})
writer = pd.ExcelWriter('excelfile.xlsx', engine='xlsxwriter')
df2.to_excel(writer, sheet_name='MySheet1')
workbook = writer.book
worksheet = writer.sheets['MySheet1']
How can I get this end result? Thank you!
Below I've provided a fully reproducible example of how you can modify an existing .xlsx workbook using pandas and the openpyxl module (see the openpyxl docs).
First, for demonstration purposes, I create a workbook called test.xlsx:
import pandas as pd

df = pd.DataFrame({'Col_A': [1, 2, 3, 4],
                   'Col_B': [5, 6, 7, 8],
                   'Col_C': [0, 0, 0, 0],
                   'Col_D': [13, 14, 15, 16]})

# The context manager saves and closes test.xlsx on exit
with pd.ExcelWriter('test.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, index=False)
This is the expected output at this point:
In this second part, we load the existing workbook ('test.xlsx') and modify the third column with different data.
from openpyxl import load_workbook
import pandas as pd

df_new = pd.DataFrame({'Col_C': [9, 10, 11, 12]})

wb = load_workbook('test.xlsx')
ws = wb['Sheet1']

# Row 1 holds the header, so DataFrame index 0 maps to worksheet row 2
for index, row in df_new.iterrows():
    cell = 'C%d' % (index + 2)
    ws[cell] = row['Col_C']

wb.save('test.xlsx')
This is the expected output at the end:
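A variation on the same idea, as a rough sketch: instead of hard-coding the column letter, look up the header in row 1 and write the list values underneath it with ws.cell. The workbook created at the top is only a stand-in for the real file, and the sheet name MySheet1 and the header Col_C are taken from the question.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Stand-in workbook: a header row plus three rows of existing values.
path = os.path.join(tempfile.mkdtemp(), "excelfile.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "MySheet1"
ws.append(["Col_A", "Col_B", "Col_C"])
ws.append([1, 4, 0])
ws.append([2, 5, 0])
ws.append([3, 6, 0])
wb.save(path)

values_list = [12.34, 17.56, 12.45]

wb = load_workbook(path)
ws = wb["MySheet1"]

# Find the 1-based column number of the "Col_C" header in row 1.
headers = [cell.value for cell in ws[1]]
col = headers.index("Col_C") + 1

# Write the values vertically, starting in row 2 (just below the header).
for offset, value in enumerate(values_list):
    ws.cell(row=offset + 2, column=col, value=value)

wb.save(path)

check = load_workbook(path)["MySheet1"]
```

Only the cells in that one column are touched, so the values in the other columns stay as they were.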
In my opinion, the easiest solution is to read the Excel file into a pandas DataFrame, modify it, and write it back out as an Excel file. For example:
Comments:
Import pandas as pd.
Read the Excel sheet into a pandas DataFrame.
Take your data, which could be in a list, and assign it to the column you want (just make sure the lengths are the same).
Save your DataFrame as an Excel file, either overwriting the old file or creating a new one.
Code:
import pandas as pd

ExcelDataInPandasDataFrame = pd.read_excel("./YourExcel.xlsx")
YourDataInAList = [12.34, 17.56, 12.45]
ExcelDataInPandasDataFrame["Col_C"] = YourDataInAList
ExcelDataInPandasDataFrame.to_excel("./YourNewExcel.xlsx", index=False)
import pandas
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Raw string so the backslashes in the Windows path are not read as escapes
mypath = r"C:\Users\egoyrat\Desktop\smt tracker\Swap Manual Tracking_v1 (12).xlsx"
wb = load_workbook(mypath, read_only=False)
wb_ws = wb['Main']
# now_append is the DataFrame to append (defined elsewhere)
for row in dataframe_to_rows(now_append, header=False, index=False):
    wb_ws.append(row)
wb.save(mypath)  # save workbook
wb.close()
I have done this, but it is not working properly: the data is appended, but not to the particular column I want.
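ws.append always starts writing in column A, which would explain the behaviour described. A possible workaround, sketched here with a made-up workbook and a hypothetical start_col, is to write each value with ws.cell so that both the row and the column are under your control. The sheet name Main and the variable now_append follow the snippet above; everything else is an assumption for the sake of the example.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Stand-in workbook with a "Main" sheet that already holds some rows.
path = os.path.join(tempfile.mkdtemp(), "tracker.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Main"
ws.append(["id", "name", "qty", "price"])
ws.append([1, "x", 5, 2.5])
wb.save(path)

now_append = pd.DataFrame({"qty": [7, 8], "price": [1.0, 2.0]})
start_col = 3  # hypothetical: begin writing in column C

wb = load_workbook(path)
ws = wb["Main"]

# Start on the first row after the existing data.
first_row = ws.max_row + 1
rows = dataframe_to_rows(now_append, index=False, header=False)
for r, row in enumerate(rows, start=first_row):
    for c, value in enumerate(row, start=start_col):
        ws.cell(row=r, column=c, value=value)

wb.save(path)

check = load_workbook(path)["Main"]
```

Columns to the left of start_col are left empty in the new rows, so they can be filled separately if needed.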
I have 7 Excel sheets in one workbook, and I am trying to copy the data from each sheet into my final sheet. The code below creates the final sheet, called 'Final List', but does not copy any of the data from the other sheets. I need a loop that goes through each sheet and copies its data into the final sheet, but I don't know how to write it.
Sheet 1 = North America, Sheet 2 = Japan, Sheet 3 = China, etc.
# Create the final list sheet
import openpyxl

wb = openpyxl.load_workbook(filepath)
ws2 = wb.create_sheet('Final List')  # this creates the final sheet
wb.save(filepath)

# Put data into the final list
wb = openpyxl.load_workbook(filepath)
sheet1 = wb['North America']
finalListSheet = wb['Final List']
wb.save(filepath)
A similar question was asked here: Python Loop through Excel sheets, place into one df
I have simplified it here. This method uses pandas:
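import pandas as pd

# sheet_name=None returns a dict of {sheet name: DataFrame}
sheets_dict = pd.read_excel(filepath, sheet_name=None)

frames = []
# Loop over the sheets
for name, sheet in sheets_dict.items():
    sheet['sheet'] = name  # record which sheet each row came from
    frames.append(sheet)

full_table = pd.concat(frames, ignore_index=True)
# You still need to save the DataFrame to your final sheet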
Here's another question about how to save dataframe (DF) in specific Excel sheet: Pandas Dataframe to excel sheet
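To tie the two steps together, here is a minimal end-to-end sketch: read every sheet, stack them, and write the result to a 'Final List' sheet in the same workbook. It assumes pandas 1.3 or later (for if_sheet_exists) and uses a small two-sheet workbook as a stand-in for the real file; the sheet names come from the question, the column names are made up.

```python
import os
import tempfile

import pandas as pd

# Stand-in workbook with two regional sheets.
path = os.path.join(tempfile.mkdtemp(), "regions.xlsx")
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"item": ["a", "b"], "qty": [1, 2]}).to_excel(
        writer, sheet_name="North America", index=False)
    pd.DataFrame({"item": ["c"], "qty": [3]}).to_excel(
        writer, sheet_name="Japan", index=False)

# Read all sheets at once, skipping an existing "Final List" so that
# re-running the script does not stack the output onto itself.
sheets_dict = pd.read_excel(path, sheet_name=None)
frames = [
    df.assign(sheet=name)  # remember which sheet each row came from
    for name, df in sheets_dict.items()
    if name != "Final List"
]
full_table = pd.concat(frames, ignore_index=True)

# Append mode keeps the regional sheets; "replace" overwrites a previous
# "Final List" if one is already there.
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="replace") as writer:
    full_table.to_excel(writer, sheet_name="Final List", index=False)

final = pd.read_excel(path, sheet_name="Final List")
```

This assumes all sheets share the same columns; where they differ, pd.concat fills the gaps with NaN.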
Is there a way to insert a worksheet at a specified index using pandas? With the code below, when I add a dataframe as a new worksheet, it gets added after the last sheet in the existing Excel file. What if I want to insert it at, say, index 1?
import pandas as pd
from openpyxl import load_workbook
f = 'existing_file.xlsx'
df = pd.DataFrame({'cat':['A','B'], 'word': ['C','D']})
book = load_workbook(f)
writer = pd.ExcelWriter(f, engine = 'openpyxl')
writer.book = book
df.to_excel(writer, sheet_name = 'sheet')
writer.save()
writer.close()
Thank you.
I have a pandas dataframe, and I want to open an existing Excel workbook containing formulas, copy the dataframe into a specific set of columns (let's say from column A to column H), and save the result as a new file with a different name.
The idea is to update an existing template, populate it with the dataframe in a specified set of columns, and then save a copy of the Excel file under a different name.
Any ideas?
What I have is:
import pandas
from openpyxl import load_workbook
book = load_workbook('Template.xlsx')
writer = pandas.ExcelWriter('Template.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer)
writer.save()
The below should work, assuming that you are happy to copy into column A. I don't see a way to write into the sheet starting in a different column (without overwriting anything).
It incorporates @MaxU's suggestion of copying the template workbook before writing to it (having just lost a few hours' work on my own template workbook to pd.to_excel).
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from shutil import copyfile

template_file = 'Template.xlsx'  # Has a header in row 1 already
output_file = 'Result.xlsx'  # What we are saving the template as

# Copy Template.xlsx as Result.xlsx
copyfile(template_file, output_file)

# Read in the data to be pasted into the template
df = pd.read_csv('my_data.csv')

# Load the workbook and access the sheet we'll paste into
wb = load_workbook(output_file)
ws = wb['Existing Result Sheet']

# Selecting a cell in the header row before writing makes append()
# start writing to the following line i.e. row 2
ws['A1']

# Write each row of the DataFrame
# In this case, I don't want to write the index (useless) or the header (already in the template)
for r in dataframe_to_rows(df, index=False, header=False):
    ws.append(r)

wb.save(output_file)
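Try this:
df.to_excel(writer, startrow=10, startcol=1, index=False, engine='openpyxl')
Pay attention to the startrow and startcol parameters: both are zero-based, so startrow=10 and startcol=1 place the top-left cell of the output at B11.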
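For completeness, a sketch of how startrow and startcol might be combined with the copy-the-template idea. It assumes pandas 1.4 or later, where if_sheet_exists="overlay" writes into an existing sheet without clearing it. The template built here (one sheet with a formula in J1) is a stand-in for the real Template.xlsx, and the DataFrame is made up.

```python
import os
import shutil
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

tmp = tempfile.mkdtemp()
template_file = os.path.join(tmp, "Template.xlsx")
output_file = os.path.join(tmp, "Result.xlsx")

# Stand-in template: one sheet with a formula outside the data area.
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws["J1"] = "=SUM(B2:B100)"
wb.save(template_file)

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# Work on a copy so the template itself is never modified.
shutil.copyfile(template_file, output_file)

# "overlay" keeps the existing cells of Sheet1 and only writes the cells
# covered by the DataFrame; startcol=1 begins in column B (zero-based).
with pd.ExcelWriter(output_file, engine="openpyxl", mode="a",
                    if_sheet_exists="overlay") as writer:
    df.to_excel(writer, sheet_name="Sheet1", startrow=0, startcol=1,
                index=False)

result = load_workbook(output_file)["Sheet1"]
template = load_workbook(template_file)["Sheet1"]
```

Cells outside the written range, such as the formula in J1, are kept as they were; cells inside it are overwritten.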