Write a pandas dataframe into an existing excel file [duplicate] - python

I am having trouble updating an Excel sheet using pandas by writing new values to it. I already have an existing frame df1 that reads the values from MySheet1.xlsx, so this needs to either be a new dataframe or somehow copy and overwrite the existing one.
The spreadsheet is in this format:
I have a python list: values_list = [12.34, 17.56, 12.45]. My goal is to insert the list values under Col_C header vertically. It is currently overwriting the entire dataframe horizontally, without preserving the current values.
df2 = pd.DataFrame({'Col_C': values_list})
writer = pd.ExcelWriter('excelfile.xlsx', engine='xlsxwriter')
df2.to_excel(writer, sheet_name='MySheet1')
workbook = writer.book
worksheet = writer.sheets['MySheet1']
How to get this end result? Thank you!

Below I've provided a fully reproducible example of how you can go about modifying an existing .xlsx workbook using pandas and the openpyxl module.
First, for demonstration purposes, I create a workbook called test.xlsx:
import pandas as pd
df = pd.DataFrame({'Col_A': [1, 2, 3, 4],
                   'Col_B': [5, 6, 7, 8],
                   'Col_C': [0, 0, 0, 0],
                   'Col_D': [13, 14, 15, 16]})
# Write the frame to a new workbook; the sheet is named 'Sheet1' by default.
# The with-block saves and closes the file on exit.
with pd.ExcelWriter('test.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, index=False)
This is the Expected output at this point:
In this second part, we load the existing workbook ('test.xlsx') and modify the third column with different data.
from openpyxl import load_workbook
import pandas as pd
df_new = pd.DataFrame({'Col_C': [9, 10, 11, 12]})
wb = load_workbook('test.xlsx')
ws = wb['Sheet1']
for index, row in df_new.iterrows():
    # Row 1 holds the header, so the value for index 0 goes into row 2.
    cell = 'C%d' % (index + 2)
    ws[cell] = row['Col_C']
wb.save('test.xlsx')
This is the Expected output at the end:

In my opinion, the easiest solution is to read the Excel file into a pandas DataFrame, modify it, and write it back out as an Excel file. For example:
Comments:
Import pandas as pd.
Read the Excel sheet into a pandas DataFrame called ExcelDataInPandasDataFrame.
Take your data, which could be in a list, and assign it to the column you want (just make sure the lengths are the same).
Save your DataFrame as an Excel file, either overwriting the old file or creating a new one.
Code:
import pandas as pd
ExcelDataInPandasDataFrame = pd.read_excel("./YourExcel.xlsx")
YourDataInAList = [12.34,17.56,12.45]
ExcelDataInPandasDataFrame["Col_C"] = YourDataInAList
ExcelDataInPandasDataFrame.to_excel("./YourNewExcel.xlsx", index=False)
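The "make sure the lengths are the same" caveat can be worked around. As a minimal sketch (the frame below is a made-up stand-in for whatever pd.read_excel returns), wrapping the list in a pd.Series lets pandas align the values on the index and leave the remaining rows as NaN instead of raising a ValueError:

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by pd.read_excel(...)
df = pd.DataFrame({"Col_A": [1, 2, 3, 4], "Col_C": [0, 0, 0, 0]})
values_list = [12.34, 17.56, 12.45]  # one value fewer than the frame has rows

# Assigning the raw list would raise ValueError because the lengths differ.
# A Series is aligned on the index, so the unmatched last row becomes NaN.
df["Col_C"] = pd.Series(values_list, index=df.index[:len(values_list)])
```

This only covers the case where the list is shorter than the frame; a longer list would still need to be trimmed or the frame extended first.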

Related

Can Pandas to_excel support hyperlink style now?

I can't find an answer (or one I know how to implement) when it comes to using the excel "hyperlink" style for a column when exporting using pd.to_excel.
I can find plenty of (OLD) answers on using xlsxwriter or openpyxl. But none using the current pandas functionality.
I think it might be possible now with the updates to the .style function? But I don't know how to implement the CSS2.2 rules to emulate the hyperlink style.
import pandas as pd
df = pd.DataFrame({'ID':1, 'link':['=HYPERLINK("http://www.someurl.com", "some website")']})
df.to_excel('test.xlsx')
The desired output is for the link column, to be the standard blue underlined text that then turns purple once you have clicked the link.
Is there a way to use the built-in Excel styling? Or would you have to pass various CSS properties through a dictionary using .style?
Here is one way to do it using xlsxwriter as the Excel engine:
import pandas as pd
df = pd.DataFrame({'ID': [1, 2],
'link':['=HYPERLINK("http://www.python.org", "some website")',
'=HYPERLINK("http://www.python.org", "some website")']})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter objects from the dataframe writer object.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Get the default URL format.
url_format = workbook.get_default_url_format()
# Apply it to the appropriate column, and widen the column.
worksheet.set_column(2, 2, 40, url_format)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
Output, note that the second link has been clicked and is a different color:
Note, it would be preferable to use the xlsxwriter worksheet.write_url() method, since that will look like a native Excel URL to the end user and also doesn't need the above trick of getting and applying the URL format. However, that method can't be used directly from a pandas dataframe (unlike the formula), so you would need to iterate through the link column of the dataframe and overwrite the formulas programmatically with actual links.
Something like this:
import pandas as pd
df = pd.DataFrame({'ID': [1, 2],
'link':['=HYPERLINK("http://www.python.org", "some website")',
'=HYPERLINK("http://www.python.org", "some website")']})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('test2.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the worksheet handle.
worksheet = writer.sheets['Sheet1']
# Widen the column for clarity
worksheet.set_column(2, 2, 40)
# Overwrite the urls
worksheet.write_url(1, 2, "http://www.python.org", None, "some website")
worksheet.write_url(2, 2, "http://www.python.org", None, "some website")
# Close the Pandas Excel writer and output the Excel file.
writer.close()
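The two hard-coded write_url() calls can be generalised into a loop over the dataframe. The following is only a sketch under some assumptions: the URL and its display text are kept in two separate columns (the column names 'url' and 'text' and the sample values are made up for illustration), and the file is written to a temporary directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: URL and display text live in separate columns.
df = pd.DataFrame({'ID': [1, 2],
                   'url': ['http://www.python.org', 'https://pandas.pydata.org'],
                   'text': ['Python', 'pandas']})

out_path = os.path.join(tempfile.mkdtemp(), 'links.xlsx')

with pd.ExcelWriter(out_path, engine='xlsxwriter') as writer:
    # Write the ID and the display text; the text cells are overwritten below.
    df[['ID', 'text']].to_excel(writer, sheet_name='Sheet1')
    worksheet = writer.sheets['Sheet1']

    link_col = 2  # column 0 is the index, 1 is ID, 2 is the text column
    worksheet.set_column(link_col, link_col, 40)

    # Data rows start at row 1 because row 0 holds the header.
    for row_num, (url, text) in enumerate(zip(df['url'], df['text']), start=1):
        worksheet.write_url(row_num, link_col, url, None, text)
```

Passing None as the format argument lets XlsxWriter apply its default hyperlink style, as in the snippet above.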

Xlsxwriter writer is writing its own sheets and deletes existing ones

I am writing dataframes to Excel. Maybe I am not doing it correctly.
When I use this code:
from datetime import datetime
import numpy as np
import pandas as pd
from openpyxl import load_workbook
start = datetime.now()
df = pd.read_excel(r"C:\Users\harsh\Google Drive\Oddsportal\Files\Oddsportal "
r"Data\Historical Worksheet\data.xlsx", sheet_name='x1')
df['run_time'] = start
df1 = pd.read_csv(r"C:\Users\harsh\Google Drive\Oddsportal\Files\Oddsportal "
r"Data\Pre-processed\oddsportal_upcoming_matches.csv")
df1['run_time'] = start
concat = [df, df1]
df_c = pd.concat(concat)
path = r"C:\Users\harsh\Google Drive\Oddsportal\Files\Oddsportal Data\Historical Worksheet\data.xlsx"
writer = pd.ExcelWriter(path, engine='xlsxwriter')
df.to_excel(writer, sheet_name='x1')
df1.to_excel(writer, sheet_name='x2')
df_c.to_excel(writer, sheet_name='upcoming_archive')
writer.save()
writer.close()
print(df_c.head())
The dataframes are written in their respective sheets and all the other existing sheets get deleted.
How can I write to only the respective sheets and not disturb the other existing ones?
xlsxwriter is not meant to alter an existing xlsx file. The only savior is openpyxl, which does the job but is hard to learn. I even wrote a simple Python script to fill the gap and write a bunch of rows or columns to a sheet: openpyxl_writers.py
You just need to use append mode (mode='a'), set if_sheet_exists to 'replace', and use openpyxl as the engine.
Replace:
writer = pd.ExcelWriter('test.xlsx')
By:
writer = pd.ExcelWriter('test.xlsx', mode='a', engine='openpyxl',
if_sheet_exists='replace') # <- HERE
From the documentation:
mode : {'w', 'a'}, default 'w'
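To see the effect end to end, here is a small self-contained sketch. The sheet names and data are invented, and it assumes a pandas version that supports if_sheet_exists (1.3 or later) with openpyxl installed. It first creates a two-sheet workbook, then reopens it in append mode and rewrites only one sheet:

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), 'test.xlsx')

# Set-up: a workbook with two sheets, standing in for the existing file.
with pd.ExcelWriter(path, engine='openpyxl') as writer:
    pd.DataFrame({'a': [1, 2]}).to_excel(writer, sheet_name='x1', index=False)
    pd.DataFrame({'b': [3, 4]}).to_excel(writer, sheet_name='keep_me', index=False)

# Re-open in append mode: only 'x1' is rewritten, 'keep_me' is left alone.
new_x1 = pd.DataFrame({'a': [10, 20, 30]})
with pd.ExcelWriter(path, mode='a', engine='openpyxl',
                    if_sheet_exists='replace') as writer:
    new_x1.to_excel(writer, sheet_name='x1', index=False)

# Read every sheet back as a dict of sheet name -> DataFrame.
sheets = pd.read_excel(path, sheet_name=None)
```

After the second block, sheets should still contain both 'x1' (with the new rows) and 'keep_me' (unchanged).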

Can I export a dataframe to excel as the very first sheet?

Running dataframe.to_excel() automatically saves the dataframe as the last sheet in the Excel file.
Is there a way to save a dataframe as the very first sheet, so that, when you open the spreadsheet, Excel shows it as the first on the left?
The only workaround I have found is to first export an empty dataframe to the tab with the name I want as first, then export the others, then export the real dataframe I want to the tab with the name I want. Example in the code below. Is there a more elegant way? More generically, is there a way to specifically choose the position of the sheet you are exporting to (first, third, etc)?
Of course, this arises because the dataframe I want first is the result of some calculations based on all the others, so I cannot export it first.
import pandas as pd
import numpy as np
writer = pd.ExcelWriter('My excel test.xlsx')
first_df = pd.DataFrame()
first_df['x'] = np.arange(0,100)
first_df['y'] = 2 * first_df['x']
other_df = pd.DataFrame()
other_df['z'] = np.arange(100,201)
pd.DataFrame().to_excel(writer,'this should be the 1st')
other_df.to_excel(writer,'other df')
first_df.to_excel(writer,'this should be the 1st')
writer.save()
writer.close()
It is possible to re-arrange the sheets after they have been created:
import pandas as pd
import numpy as np
writer = pd.ExcelWriter('My excel test.xlsx', engine='xlsxwriter')
first_df = pd.DataFrame()
first_df['x'] = np.arange(0,100)
first_df['y'] = 2 * first_df['x']
other_df = pd.DataFrame()
other_df['z'] = np.arange(100,201)
other_df.to_excel(writer, sheet_name='Sheet2')
first_df.to_excel(writer, sheet_name='Sheet1')
writer.close()
This will give you this output:
Add this before you save the workbook (worksheets_objs is an internal XlsxWriter list, so this assumes the xlsxwriter engine):
workbook = writer.book
workbook.worksheets_objs.sort(key=lambda x: x.name)

Insert worksheet at specified index in existing Excel file using Pandas

Is there a way to insert a worksheet at a specified index using Pandas? With the code below, when adding a dataframe as a new worksheet, it gets added after the last sheet in the existing Excel file. What if I want to insert it at say index 1?
import pandas as pd
from openpyxl import load_workbook
f = 'existing_file.xlsx'
df = pd.DataFrame({'cat':['A','B'], 'word': ['C','D']})
book = load_workbook(f)
writer = pd.ExcelWriter(f, engine = 'openpyxl')
writer.book = book
df.to_excel(writer, sheet_name = 'sheet')
writer.save()
writer.close()
Thank you.

Insert pandas DataFrame into existing excel worksheet with styling

I've seen answers as to how to add a pandas DataFrame into an existing worksheet using openpyxl as shown below:
from openpyxl import load_workbook, Workbook
import pandas as pd
df = pd.DataFrame(data=[["20-01-2018",4,9,16,25,36]],columns=["Date","A","B","C","D","E"])
path = 'filepath.xlsx'
writer = pd.ExcelWriter(path, engine='openpyxl')
writer.book = load_workbook(path)
writer.sheets = dict((ws.title,ws) for ws in writer.book.worksheets)
df.to_excel(writer,sheet_name="Sheet1", startrow=2,index=False, header=False)
writer.save()
However, I need to set a highlight color to the background data. Is there a way to do this without changing the dataframe into a list - trying to maintain the date format too.
Thanks
You can create a function to do the highlighting in the cells you desire
def highlight_style(column):
    # provide your criteria for highlighting the cells here;
    # return one CSS string per cell in the column
    return ['background-color: red'] * len(column)
And then apply your highlighting function to your dataframe...
styled = df.style.apply(highlight_style)
After this, when you write styled (rather than df) to Excel with styled.to_excel(...), it should work as you want =)
I sorted it thanks to help from Andre. You can export the results as such:
df.style.set_properties(**{'background-color':'red'}).to_excel(writer,sheet_name="Sheet1", startrow=2,index=False, header=False)
writer.close()
Thanks!
