I have data like the sample below in abc.xlsx:
date,qty,price,profitprice,sellprice
20200501,11,900,,20
Using Python, I want the output to be:
date,qty,price,profitprice,sellprice
20200501,11.00,900.00,,20.00
Can anyone help with this?
How can I read each column with its values, apply a number format, and save the result to an xlsx file?
Based on this answer by Akshit Khurana:
import pandas as pd
df = pd.read_excel("initial.xlsx")
writer = pd.ExcelWriter("formatted.xlsx", engine = "xlsxwriter")
df.to_excel(writer, index=False, header=True)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format1 = workbook.add_format({'num_format': '0.00'})
worksheet.set_column('B:E', None, format1) # Adds formatting to columns B-E (qty through sellprice)
writer.close() # writer.save() was removed in pandas 2.0
I believe the two other answers posted here do not work, for the same reason this question was asked.
You can use the dtype parameter of read_excel:
pd.read_excel('abc.xlsx', dtype={'profitprice': float, 'sellprice': float})
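Note that dtype only controls how the values are read into the DataFrame; the two-decimal display is a property of the Excel cells themselves. A minimal sketch of setting it with openpyxl follows. The sample row mirrors the question, while the temporary file path and the column range B to E are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Sample row from the question; profitprice is left empty.
df = pd.DataFrame({"date": [20200501], "qty": [11], "price": [900],
                   "profitprice": [None], "sellprice": [20]})

# Hypothetical output location, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "formatted.xlsx")
df.to_excel(path, index=False, engine="openpyxl")

wb = load_workbook(path)
ws = wb.active

# Walk columns B..E (2..5), skipping the header in row 1, and give every
# numeric cell a two-decimal display format. Empty cells are left alone.
for row in ws.iter_rows(min_row=2, min_col=2, max_col=5):
    for cell in row:
        if isinstance(cell.value, (int, float)):
            cell.number_format = "0.00"

wb.save(path)
```

The stored values are unchanged (11 stays 11); only the way Excel displays them changes.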
I am using xlsxwriter to generate a file with quite a few formulas. From there, I want to create a table on another sheet. Everything is pretty straightforward until I want to use data from a different sheet for the table.
The documentation only shows examples of already having the data you need, and then passing that to the .add_table as the 'data' parameter.
What I am trying to do is this: (Which is structured how the rest of xlsxwriter's formulas are.)
df = pd.DataFrame(stuff)
writer = pd.ExcelWriter('File.xlsx', engine = 'xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
workbook = writer.book
worksheet1 = writer.sheets['Sheet1']
worksheet2 = workbook.add_worksheet('Summary Page')
data = f"'Sheet1'!$A$1:$D${len(df)}"
worksheet2.add_table(f'A1:D{len(df)}', {'data':data})
workbook.close()
This approach adds the new sheet and creates a table of the correct size, but it then fills the first column with the characters of the 'data' string, one character per cell.
Is there a way to create a table referencing data from another sheet using xlsxwriter?
ExcelWriter is (obviously) for writing Excel files.
If you want to read data back from Excel after writing and saving it (did I understand you correctly?), use
ExcelFile.parse or read_excel to load the data into a DataFrame, then write it to Excel again with ExcelWriter. Unfortunately, xlsxwriter does not support appending, so you would have to load and rewrite all sheets. Alternatively, use openpyxl as the engine. The engine argument could be omitted (as said, it is the default), but it is spelled out in this minimal working example to make the point:
import pandas as pd
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
df.to_excel(writer, sheet_name='Sheet1')
writer.close() # writer.save() was removed in pandas 2.0
data = pd.read_excel('test.xlsx', usecols='A:B', sheet_name='Sheet1', index_col=0)
writer = pd.ExcelWriter('test.xlsx', engine='openpyxl', mode='a')
# shape our data here
data.to_excel(writer, sheet_name='Sheet2')
writer.close()
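On the original question of building a table from another sheet: as far as I know, the 'data' option of add_table expects a list of row lists rather than a range string, which would explain the one-character-per-cell result. One possible workaround, sketched below, is to create the table without data and then fill its cells with formulas that point at Sheet1. The sample DataFrame and the temporary path are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd
from xlsxwriter.utility import xl_rowcol_to_cell

# Hypothetical stand-in for the asker's data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
path = os.path.join(tempfile.mkdtemp(), "File.xlsx")

with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
    workbook = writer.book
    summary = workbook.add_worksheet("Summary Page")

    n_rows, n_cols = df.shape

    # Create an empty table covering the header row plus the data rows.
    # The column headers are passed explicitly as strings.
    summary.add_table(
        0, 0, n_rows, n_cols - 1,
        {"columns": [{"header": str(name)} for name in df.columns]},
    )

    # Fill the table body with formulas referencing the same cell on Sheet1.
    for r in range(1, n_rows + 1):
        for c in range(n_cols):
            summary.write_formula(r, c, f"='Sheet1'!{xl_rowcol_to_cell(r, c)}")
```

The table on the summary sheet then tracks Sheet1 when the workbook is recalculated in Excel; the values are not copied at write time.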
Is there any way to change the header format of my pandas DataFrame when it is written to an Excel file?
It may be unusual, but my header consists of dates and times, and I want the cell format in the Excel file to be a date format.
I tried something like this:
import pandas as pd
data = pd.DataFrame({'1899-12-30 00:00:00': [1.5,2.5,3.5,4.5,5.4]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
data.to_excel(writer, sheet_name='Sheet1',index=True)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
date_fmt = workbook.add_format({'num_format': 'dd.mm.yyyy hh:mm:ss'})
worksheet.set_row(0, 20, date_fmt)
writer.save()
but set_row appears to not change header formats. I also converted the dates to an excel serial date value, but that didn't help either.
There are a few things you will need to do to get this working.
The first is to avoid the Pandas default header, since Pandas applies a cell format to each header cell, and a cell format can't be overridden with set_row(). The best thing to do is to skip the default header and write your own (see the "Formatting of the Dataframe headers" section of the XlsxWriter docs).
Secondly, dates in Excel are formatted numbers so you will need to convert the string header into a number, or better to a datetime object (see the Working with Dates and Time section of the docs).
Finally '1899-12-30' isn't a valid date in Excel.
Here is a working example with some of these fixes:
import pandas as pd
from datetime import datetime
data = pd.DataFrame({'2020-09-18 12:30:00': [1.5, 2.5, 3.5, 4.5, 5.4]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
# Turn off the default header and skip one row to allow us to insert a user
# defined header.
data.to_excel(writer,
sheet_name='Sheet1', index=True,
startrow=1, header=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
date_fmt = workbook.add_format({'num_format': 'dd.mm.yyyy hh:mm:ss'})
# Convert the column headers to datetime objects and write them with the
# defined format.
for col_num, value in enumerate(data.columns.values):
    # Convert the date string to a datetime object.
    date_time = datetime.strptime(value, '%Y-%m-%d %H:%M:%S')
    # Make the column wider for clarity.
    worksheet.set_column(col_num + 1, col_num + 1, 20)
    # Write the date.
    worksheet.write(0, col_num + 1, date_time, date_fmt)
writer.close() # writer.save() was removed in pandas 2.0
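As background to the point that Excel dates are formatted numbers: in Excel's default 1900 date system a date is stored as a count of days, with the time of day as a fraction. The sketch below computes that number by hand; the helper name to_excel_serial is hypothetical, and the formula is only meant for dates from 1 March 1900 onward, because Excel treats 1900 as a leap year. XlsxWriter does this conversion for you when you pass a datetime.

```python
from datetime import datetime

# Reference point that makes the arithmetic match Excel's serial numbers
# for dates on or after 1900-03-01 (1900 date system).
EXCEL_EPOCH = datetime(1899, 12, 30)
FIRST_SAFE_DATE = datetime(1900, 3, 1)


def to_excel_serial(dt):
    """Return the Excel serial number for dt (hypothetical helper)."""
    if dt < FIRST_SAFE_DATE:
        # Earlier dates are affected by Excel's 1900 leap year quirk or
        # fall outside the range Excel can display at all.
        raise ValueError("date is before 1900-03-01")
    delta = dt - EXCEL_EPOCH
    return delta.days + delta.seconds / 86400


header = datetime.strptime('2020-09-18 12:30:00', '%Y-%m-%d %H:%M:%S')
print(to_excel_serial(header))  # about 44092.52: day 44092 plus 12.5/24
```

The '1899-12-30' header from the question falls before serial number 1 (1 January 1900), which is why Excel has no date to show for it.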
I'm saving pandas DataFrame to_excel using xlsxwriter. I've managed to format all of my data (set column width, font size etc) except for changing header's font and I can't find the way to do it. Here's my example:
import pandas as pd
data = pd.DataFrame({'test_data': [1,2,3,4,5]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
data.to_excel(writer, sheet_name='test', index=False)
workbook = writer.book
worksheet = writer.sheets['test']
font_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10})
header_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10, 'bold': True})
worksheet.set_column('A:A', None, font_fmt)
worksheet.set_row(0, None, header_fmt)
writer.save()
The penultimate line that tries to set format for the header does nothing.
I think you need to reset the default header style first; then you can change it:
pd.core.format.header_style = None
All together:
import pandas as pd
data = pd.DataFrame({'test_data': [1,2,3,4,5]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
pd.core.format.header_style = None
data.to_excel(writer, sheet_name='test', index=False)
workbook = writer.book
worksheet = writer.sheets['test']
font_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10})
header_fmt = workbook.add_format({'font_name': 'Arial', 'font_size': 10, 'bold': True})
worksheet.set_column('A:A', None, font_fmt)
worksheet.set_row(0, None, header_fmt)
writer.save()
Explanation by jmcnamara, thank you:
In Excel, a cell format overrides a row format, which in turn overrides a column format. The pd.core.format.header_style is converted to a format and applied to each cell in the header. As such, the default cannot be overridden by set_row(). Setting pd.core.format.header_style to None means that the header cells don't have a user-defined format, so it can be overridden by set_row().
EDIT: In version 0.18.1 you have to change
pd.core.format.header_style = None
to:
pd.formats.format.header_style = None
EDIT: from version 0.20 this changed again
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None
thanks krvkir.
EDIT: from version 0.24 this is now required
import pandas.io.formats.excel
pandas.io.formats.excel.ExcelFormatter.header_style = None
thanks Chris Vecchio.
An update for anyone who comes across this post and is using Pandas 0.20.1.
It seems the required code is now
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None
Apparently the excel submodule isn't imported automatically, so simply trying pandas.io.formats.excel.header_style = None alone will raise an AttributeError.
Another option for Pandas 0.25 (probably also 0.24). Likely not the best way to do it, but it worked for me.
import pandas.io.formats.excel
pandas.io.formats.excel.ExcelFormatter.header_style = None
For pandas 0.24:
The below doesn't work anymore:
import pandas.io.formats.excel
pandas.io.formats.excel.header_style = None
Instead, create a cell formatting object, and re-write the first row's content (your header) one cell at a time with the new cell formatting object.
Now, you are future proof.
Use the following pseudo code:
import pandas as pd
# [1] write df to excel as usual (path_output, df and sheet_name are yours)
writer = pd.ExcelWriter(path_output, engine='xlsxwriter')
df.to_excel(writer, sheet_name=sheet_name, index=False)
workbook = writer.book
worksheet = writer.sheets[sheet_name]
# [2] do formats for other rows and columns first
# [3] create a format object with your desired formatting for the header, let's name it: headercellformat
headercellformat = workbook.add_format({'bold': True})  # example formatting only
# [4] write to the 0th (header) row **one cell at a time**, with columnname and format
for columnnum, columnname in enumerate(list(df.columns)):
    worksheet.write(0, columnnum, columnname, headercellformat)
writer.close()
In pandas 0.20 the solution of the accepted answer changed again.
The format that should be set to None can be found at:
pandas.io.formats.excel.header_style
If you do not want to reset the header style for pandas globally, you can alternatively pass header=False to to_excel:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(3, 5),
columns=pd.date_range('2019-01-01', periods=5, freq='M'))
file_path='output.xlsx'
writer = pd.ExcelWriter(file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', header=False, index=False )
workbook = writer.book
fmt = workbook.add_format({'num_format': 'mm/yyyy', 'bold': True})
worksheet = writer.sheets['Sheet1']
worksheet.set_row(0, None, fmt)
writer.save()
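One caveat with header=False: pandas then writes no column labels at all, so the data starts in row 0 and set_row(0, ...) formats the first data row. A variant that keeps the date headers is sketched below, assuming you are happy to write them yourself. It shifts the data down with startrow=1 and writes each label as a real datetime with the format. The sample values and the temporary path are made up for illustration.

```python
import os
import tempfile
from datetime import datetime

import pandas as pd

# Hypothetical month-end labels and sample values.
month_ends = [datetime(2019, 1, 31), datetime(2019, 2, 28), datetime(2019, 3, 31)]
df = pd.DataFrame([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], columns=month_ends)
path = os.path.join(tempfile.mkdtemp(), "output.xlsx")

with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
    # Leave row 0 free for the hand-written header.
    df.to_excel(writer, sheet_name="Sheet1", header=False, index=False, startrow=1)
    workbook = writer.book
    worksheet = writer.sheets["Sheet1"]
    fmt = workbook.add_format({"num_format": "mm/yyyy", "bold": True})

    # Write each column label as a datetime so Excel treats it as a date.
    for col_num, month_end in enumerate(month_ends):
        worksheet.write_datetime(0, col_num, month_end, fmt)
```

Because the format is attached to the cells themselves, it does not depend on set_row() or on pandas' default header style.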
Unfortunately, add_format is not available anymore.
I've seen answers as to how to add a pandas DataFrame into an existing worksheet using openpyxl as shown below:
from openpyxl import load_workbook, Workbook
import pandas as pd
df = pd.DataFrame(data=[["20-01-2018",4,9,16,25,36]],columns=["Date","A","B","C","D","E"])
path = 'filepath.xlsx'
writer = pd.ExcelWriter(path, engine='openpyxl')
writer.book = load_workbook(path)
writer.sheets = dict((ws.title,ws) for ws in writer.book.worksheets)
df.to_excel(writer,sheet_name="Sheet1", startrow=2,index=False, header=False)
writer.save()
However, I need to set a background highlight color on the data. Is there a way to do this without converting the DataFrame into a list? I am also trying to preserve the date format.
Thanks
You can create a function to do the highlighting in the cells you desire
def highlight_style(col):
    # provide your criteria for highlighting the cells here
    # (apply passes one column at a time; return one CSS string per cell)
    return ['background-color: red'] * len(col)
And then apply your highlighting function to your dataframe...
df.style.apply(highlight_style)
After this, call to_excel on the returned Styler object (not on the original DataFrame) and the highlighting should carry over =)
I sorted it thanks to help from Andre. You can export the results as such:
df.style.set_properties(**{'background-color':'red'}).to_excel(writer,sheet_name="Sheet1", startrow=2,index=False, header=False)
writer.save()
Thanks!
I want to append a DataFrame to an Excel file.
This code works nearly as desired, but it does not append: the first run puts the DataFrame into the Excel file, and later runs do not add any rows. I have also heard that openpyxl is CPU-intensive, but I have not heard of many workarounds.
import pandas
from openpyxl import load_workbook
book = load_workbook('C:\\OCC.xlsx')
writer = pandas.ExcelWriter('C:\\OCC.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df1.to_excel(writer, index = False)
writer.save()
I want the data to be appended each time I run it, but this is not happening.
The output still looks like the original data:
A B C
H H H
After running it a second time, I want:
A B C
H H H
H H H
Apologies if this is obvious; I am new to Python and the examples I practised with did not work as I wanted.
The question is: how can I append data each time I run the script? I tried changing to xlsxwriter but got AttributeError: 'Workbook' object has no attribute 'add_format'
first of all, this post is the first piece of the solution, where you should specify startrow=:
Append existing excel sheet with new dataframe using python pandas
you might also consider header=False,
so it should look like:
df1.to_excel(writer, startrow=2, index=False, header=False)
if you want it to automatically get to the end of the sheet and append your df then use:
startrow = writer.sheets['Sheet1'].max_row
and if you want it to go over all of the sheets in the workbook:
for sheetname in writer.sheets:
    df1.to_excel(writer, sheet_name=sheetname, startrow=writer.sheets[sheetname].max_row, index=False, header=False)
btw: for the writer.sheets you could use dictionary comprehension (I think it's more clean, but that's up to you, it produces the same output):
writer.sheets = {ws.title: ws for ws in book.worksheets}
so full code will be:
import pandas
from openpyxl import load_workbook
book = load_workbook('test.xlsx')
writer = pandas.ExcelWriter('test.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = {ws.title: ws for ws in book.worksheets}
for sheetname in writer.sheets:
    df1.to_excel(writer, sheet_name=sheetname, startrow=writer.sheets[sheetname].max_row, index=False, header=False)
writer.close() # writer.save() was removed in pandas 2.0
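In recent pandas versions, assigning to writer.book is no longer possible, so the pattern above may fail there. A minimal sketch of the same idea for newer versions follows, assuming pandas 1.4 or later (for if_sheet_exists="overlay") with openpyxl installed. The helper name append_rows and the temporary path are hypothetical, and the sketch assumes the target sheet already exists once the file does.

```python
import os
import tempfile

import pandas as pd


def append_rows(path, df, sheet_name="Sheet1"):
    """Append df below the existing rows of sheet_name (hypothetical helper)."""
    if not os.path.exists(path):
        # First run: create the file, including the header row.
        df.to_excel(path, sheet_name=sheet_name, index=False, engine="openpyxl")
        return
    # Later runs: open in append mode and write into the existing sheet.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # max_row is the last used row (1-based), which equals the first
        # free row when used as a 0-based startrow.
        start_row = writer.sheets[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, startrow=start_row,
                    index=False, header=False)


path = os.path.join(tempfile.mkdtemp(), "OCC.xlsx")
df1 = pd.DataFrame({"A": ["H"], "B": ["H"], "C": ["H"]})

append_rows(path, df1)  # creates the file with a header and one row
append_rows(path, df1)  # adds a second row below the first
result = pd.read_excel(path)
```

After the two calls, result holds the header A, B, C and two rows of H, matching the output the question asks for.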
You can use the append_df_to_excel() helper function, which is defined in this answer:
Usage examples:
filename = r'C:\OCC.xlsx'
append_df_to_excel(filename, df)
append_df_to_excel(filename, df, header=None, index=False)
append_df_to_excel(filename, df, sheet_name='Sheet2', index=False)
append_df_to_excel(filename, df, sheet_name='Sheet2', index=False, startrow=25)
All examples here are quite complicated.
In the documentation, it is much easier:
def append_to_excel(fpath, df, sheet_name):
    with pd.ExcelWriter(fpath, mode="a") as f:
        df.to_excel(f, sheet_name=sheet_name)
append_to_excel(<your_excel_path>, <new_df>, <new_sheet_name>)
When using this on LibreOffice/OpenOffice excel files, I get the error:
KeyError: "There is no item named 'xl/drawings/drawing1.xml' in the archive"
which is a bug in openpyxl as mentioned here.
I read the Excel file into a DataFrame, concatenated it with the DataFrame I wanted to append, and wrote the result back. It worked for me.
def append_df_to_excel(df, excel_path):
    df_excel = pd.read_excel(excel_path)
    result = pd.concat([df_excel, df], ignore_index=True)
    result.to_excel(excel_path, index=False)
df = pd.DataFrame({"a":[11,22,33], "b":[55,66,77]})
append_df_to_excel(df, r"<path_to_dir>\<out_name>.xlsx")
If someone needs it, I found an easier way:
Convert DF to rows in a list
rows = your_df.values.tolist()
Load your workbook
workbook = load_workbook(filename=your_excel)
Pick your sheet
sheet = workbook[your_sheet]
Iterate over rows to append each:
for row in rows:
    sheet.append(row)
Save workbook when done
workbook.save(filename=your_excel)
Putting it all together:
from openpyxl import load_workbook
rows = your_df.values.tolist()
workbook = load_workbook(filename=your_excel)
sheet = workbook[your_sheet]
for row in rows:
    sheet.append(row)
workbook.save(filename=your_excel)
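import os
import pandas as pd

def append_to_excel(fpath, df):
    # Load the existing rows if the file is already there, else start empty
    if os.path.exists(fpath):
        x = pd.read_excel(fpath)
    else:
        x = pd.DataFrame()
    # Existing rows first, so the new rows end up appended at the bottom
    dfNew = pd.concat([x, df], ignore_index=True)
    dfNew.to_excel(fpath, index=False)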