Is there any way to change the header format of a pandas dataframe that is written to an Excel file?
It may be unusual, but my header is composed of dates and times, and I want the cell format in the Excel file to be a date format.
I tried something like this:
import pandas as pd
data = pd.DataFrame({'1899-12-30 00:00:00': [1.5,2.5,3.5,4.5,5.4]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
data.to_excel(writer, sheet_name='Sheet1',index=True)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
date_fmt = workbook.add_format({'num_format': 'dd.mm.yyyy hh:mm:ss'})
worksheet.set_row(0, 20, date_fmt)
writer.close()
but set_row() does not appear to change the header format. I also tried converting the dates to Excel serial date values, but that didn't help either.
There are a few things you will need to do to get this working.
The first is to avoid the Pandas default header, since Pandas writes it with a cell format, and a cell format can't be overridden with set_row(). The best thing to do is to skip the default header and write your own (see the Formatting of the Dataframe headers section of the XlsxWriter docs).
Secondly, dates in Excel are formatted numbers so you will need to convert the string header into a number, or better to a datetime object (see the Working with Dates and Time section of the docs).
Finally '1899-12-30' isn't a valid date in Excel.
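To make the point about dates being formatted numbers concrete, here is a minimal sketch of the conversion, using only the standard library. The helper name to_excel_serial is made up for illustration, and the sketch assumes Excel's default 1900 date system and dates on or after 1900-03-01 (Excel treats 1900 as a leap year, so earlier dates are off by one). XlsxWriter does this conversion itself when it is given a datetime object, so you would not normally need it.

```python
from datetime import datetime

# Day zero for this arithmetic in Excel's default 1900 date system
# (assumption: only used for dates on or after 1900-03-01).
EXCEL_EPOCH = datetime(1899, 12, 30)


def to_excel_serial(moment):
    """Return days since the epoch, with the time of day as a fraction."""
    delta = moment - EXCEL_EPOCH
    return delta.days + delta.seconds / 86400


# Parse the header string the same way as in the example in this answer.
header = datetime.strptime('2020-09-18 12:30:00', '%Y-%m-%d %H:%M:%S')
serial = to_excel_serial(header)
print(serial)
```

A number format such as 'dd.mm.yyyy hh:mm:ss' only changes how this number is displayed in the cell.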
Here is a working example with some of these fixes:
import pandas as pd
from datetime import datetime
data = pd.DataFrame({'2020-09-18 12:30:00': [1.5, 2.5, 3.5, 4.5, 5.4]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
# Turn off the default header and skip one row to allow us to insert a user
# defined header.
data.to_excel(writer,
              sheet_name='Sheet1', index=True,
              startrow=1, header=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
date_fmt = workbook.add_format({'num_format': 'dd.mm.yyyy hh:mm:ss'})
# Convert the column headers to datetime objects and write them with the
# defined format.
for col_num, value in enumerate(data.columns.values):
    # Convert the date string to a datetime object.
    date_time = datetime.strptime(value, '%Y-%m-%d %H:%M:%S')
    # Make the column wider for clarity.
    worksheet.set_column(col_num + 1, col_num + 1, 20)
    # Write the date. Column 0 holds the index, hence the + 1 offset.
    worksheet.write(0, col_num + 1, date_time, date_fmt)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
I have a dataframe that looks like this....
df2['date1'] = ""
df2['date2'] = '=IF(INDIRECT("A"&ROW())="","",INDIRECT("A"&ROW())+30)'
df2['date3'] = '=IF(INDIRECT("A"&ROW())="","",INDIRECT("A"&ROW())+35)'
I want date2 and date3 to be calculated in Excel using the Excel formulas. I create this dataframe in Python, then save the result to Excel. To save to Excel, I have tried:
writer = pd.ExcelWriter("test.xlsx",
                        engine='xlsxwriter',
                        datetime_format='mmm d yyyy hh:mm:ss',
                        date_format='mmmm dd yyyy')
# Convert the dataframe to an XlsxWriter Excel object.
df2.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects in order to set the column
# widths, to make the dates clearer.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Get the dimensions of the dataframe.
(max_row, max_col) = df2.shape
# Set the column widths, to make the dates clearer.
worksheet.set_column(1, max_col, 20)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
When I do this, I get an Excel sheet with empty columns. When I enter a date in date1, I get serial numbers back in date2 and date3, so I know my formulas are correct, and when I manually change the format to short date, I get the correct dates in mm/dd/yyyy format.
So my question is: how do I set up the format in Python so that I do not have to manually change the date format every time this Excel file refreshes?
The datetime_format and date_format options to ExcelWriter() don't work because the dataframe columns don't have a datetime-like data type.
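A quick way to check this is to look at the column dtypes. The following is only a sketch: it uses pandas' is_datetime64_any_dtype helper, and the real_dates dataframe is a made-up comparison case that is not part of your code.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Same shape of data as in the question: a serial number and a formula string.
df2 = pd.DataFrame({
    'date1': [44562],
    'date2': ['=IF(INDIRECT("B"&ROW())="","",INDIRECT("B"&ROW())+30)'],
})

# Hypothetical comparison: a column that really holds datetimes.
real_dates = pd.DataFrame({'date1': pd.to_datetime(['2022-01-01'])})

# Record, per column, whether pandas sees it as datetime-like.
checks = {name: is_datetime64_any_dtype(column) for name, column in df2.items()}
checks['real_dates.date1'] = is_datetime64_any_dtype(real_dates['date1'])

for name, is_datetime in checks.items():
    print(name, is_datetime)
```

Neither column of df2 is datetime-like (one holds integers, the other strings), so there is nothing for the ExcelWriter() date options to apply to.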
Instead you can use the xlsxwriter worksheet handle to set the column format.
Here is an adjusted version of your code to demonstrate:
import pandas as pd
# Create a sample dataframe.
df2 = pd.DataFrame({
    'date1': [44562],
    'date2': ['=IF(INDIRECT("B"&ROW())="","",INDIRECT("B"&ROW())+30)'],
    'date3': ['=IF(INDIRECT("B"&ROW())="","",INDIRECT("B"&ROW())+35)']})
writer = pd.ExcelWriter("test.xlsx", engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df2.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Create a suitable date format.
date_format = workbook.add_format({'num_format': 'yyyy-mm-dd'})
# Get the dimensions of the dataframe.
(max_row, max_col) = df2.shape
# Set the column widths and add a date format.
worksheet.set_column(1, max_col, 14, date_format)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
I cannot get my output XLSX to contain dates in a usable form, even though I have followed similar threads like:
https://xlsxwriter.readthedocs.io/example_pandas_datetime.html
Problem with Python Pandas data output to excel in date format
Here is a MWE:
import pandas as pd
import xlsxwriter
not_in1 = ['missing']
# generate data
df = pd.DataFrame({'date1': ['5/1/2022 00:33:22', '3/1/2022 00:33:22', 'missing'], 'date2': ['3/1/2022 00:33:22', 'missing', '6/2/2022 00:33:22']})
# format
df['date1'] = df['date1'].apply(lambda x: pd.to_datetime(x).strftime('%m/%d/%Y') if x not in not_in1 else x)
df['date2'] = df['date2'].apply(lambda x: pd.to_datetime(x).strftime('%m/%d/%Y') if x not in not_in1 else x)
# write
path = 'C:\\Users\\Andrew\\Desktop\\xd2.xlsx'
with pd.ExcelWriter(path, engine='xlsxwriter', date_format="mm dd yyyy", datetime_format="mm dd yyyy") as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    formatdict = {'num_format':'mm/dd/yyyy'}
    fmt = workbook.add_format(formatdict)
    worksheet.set_column('A:B', 20, fmt)
Here as an XLSX, Excel doesn't know what to do:
https://i.stack.imgur.com/PBrAi.png
Interestingly, if I save the XLSX sheet as a CSV, the dates work just fine.
https://i.stack.imgur.com/tROFc.png
Your lambda function converts the x argument into a string; you should keep it as a datetime instead. Currently you end up with a string in Excel (use Excel's TYPE function to see the difference between the .csv and .xlsx files).
Just remove the .strftime('%m/%d/%Y') and you'll be fine.
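As a rough sketch of that change (the helper name to_datetime_or_keep is made up, and only one column from your example is shown), the conversion could look like this:

```python
import pandas as pd

not_in1 = ['missing']

df = pd.DataFrame({'date1': ['5/1/2022 00:33:22', '3/1/2022 00:33:22', 'missing']})


def to_datetime_or_keep(value):
    """Parse date strings into Timestamps; leave placeholder values untouched."""
    if value in not_in1:
        return value
    # No .strftime() here, so the result stays a datetime instead of a string.
    return pd.to_datetime(value)


df['date1'] = df['date1'].apply(to_datetime_or_keep)

# The first two entries are Timestamps, the last one is still the string.
value_types = [type(value).__name__ for value in df['date1']]
print(value_types)
```

The column is still of object dtype because of the 'missing' placeholder, but the individual date values are now datetimes rather than text.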
I have data like the following in abc.xlsx:
date,qty,price,profitprice,sellprice
20200501,11,900,,20
Using Python, I want the output to be:
date,qty,price,profitprice,sellprice
20200501,11.00,900.00,,20.00
Can anyone help with this?
How can I read each column with its values, add a number format, and save to an xlsx file?
Based on this answer by Akshit Khurana:
import pandas as pd
df = pd.read_excel("initial.xlsx")
writer = pd.ExcelWriter("formatted.xlsx", engine = "xlsxwriter")
df.to_excel(writer, index=False, header=True)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format1 = workbook.add_format({'num_format': '0.00'})
worksheet.set_column('C:E', None, format1) # Adds formatting to columns C-E
writer.close()
I believe the two other answers posted here do not work for the same reason why this question was asked.
You can use the dtype parameter of read_excel:
pd.read_excel('abc.xlsx', dtype={'profitprice': float, 'sellprice': float})
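To show what the dtype parameter does, here is a small sketch. It uses read_csv on an in-memory copy of the sample data purely so that it runs without a file (read_csv is a stand-in here and takes the same dtype argument); the choice of which columns to convert is an assumption based on the desired output.

```python
import io

import pandas as pd

# In-memory stand-in for the sample data from the question.
csv_text = "date,qty,price,profitprice,sellprice\n20200501,11,900,,20\n"

float_columns = ['qty', 'price', 'profitprice', 'sellprice']
df = pd.read_csv(io.StringIO(csv_text),
                 dtype={name: float for name in float_columns})

# The listed columns are floats; the empty profitprice cell becomes NaN.
print(df.dtypes)
print(df)
```

Note that this changes the stored type only. Showing exactly two decimals in Excel still depends on a number format such as '0.00'.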
Question: I have a dataframe with column names and their respective values. But when I apply a format object to the column headings, the format is not applied.
Code:
import pandas as pd
root = "C:\\Users\\543904\\Desktop\\New folder\\"
dict = {'name':["aparna", "pankaj"],
        'degree': ["MBA", "BCA"],
        'score':[90, 40]}
df = pd.DataFrame(dict)
writer = pd.ExcelWriter(root + 'output', engine = "xlsxwriter")
df.to_excel(writer, sheet_name='df', index = False)
workbook = writer.book
worksheet = writer.sheets['df']
Format_Object = workbook.add_format({'text_wrap': True})
Format_Object.set_bold()
Format_Object.set_align('center')
Format_Object.set_align('top')
Format_Object.set_border(1)
Format_Object.set_bg_color('#0ef0ce')
worksheet.set_row(0, 20, Format_Object)
writer.close()
This is explained in the XlsxWriter docs on Working with Python Pandas and XlsxWriter:
Pandas writes the dataframe header with a default cell format. Since it is a cell format it cannot be overridden using set_row(). If you wish to use your own format for the headings then the best approach is to turn off the automatic header from Pandas and write your own. For example:
# Turn off the default header and skip one row to allow us to insert a
# user defined header.
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
header_format = workbook.add_format({
    'bold': True,
    'text_wrap': True,
    'valign': 'top',
    'fg_color': '#D7E4BC',
    'border': 1})
# Write the column headers with the defined format.
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num + 1, value, header_format)
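Adapted to the code in the question, this might look like the sketch below. The assumptions are mine: because the question uses index=False there is no index column, so the headers are written at col_num rather than col_num + 1; the format properties mirror the Format_Object calls in the question; and the workbook is written to an in-memory buffer only to keep the example self-contained.

```python
import io

import pandas as pd

df = pd.DataFrame({'name': ["aparna", "pankaj"],
                   'degree': ["MBA", "BCA"],
                   'score': [90, 40]})

# Write to memory instead of a file path to keep the sketch self-contained.
buffer = io.BytesIO()

with pd.ExcelWriter(buffer, engine='xlsxwriter') as writer:
    # Skip the default header and leave row 0 free for our own.
    df.to_excel(writer, sheet_name='df', index=False,
                startrow=1, header=False)

    workbook = writer.book
    worksheet = writer.sheets['df']

    # Same properties as the Format_Object in the question.
    header_format = workbook.add_format({
        'bold': True,
        'text_wrap': True,
        'align': 'center',
        'valign': 'top',
        'border': 1,
        'bg_color': '#0ef0ce'})

    # No index column was written, so the headers start at column 0.
    for col_num, value in enumerate(df.columns.values):
        worksheet.write(0, col_num, value, header_format)

xlsx_bytes = buffer.getvalue()
```

To get a file on disk, pass a path such as 'output.xlsx' to ExcelWriter instead of the buffer.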
I have data in seconds that I need to convert to H:MM:SS. When this data comes in it also has a date field in a separate column. I need to convert the seconds data into H:MM:SS but keep the date field as a date. I need the output to look like the desired output in Excel.
I've tried using ExcelWriter and setting the default date_format or datetime_format; however, this converts all datetime columns in the Excel file. Previous responses from jmcnamara indicate that this is because a cell format takes precedence over a column or row format.
Here is some sample code that I've gotten to work, but it's not very Pythonic. It involves saving the dataframe to Excel and then re-opening that same file.
# imports
import pandas as pd
import random
from openpyxl import load_workbook
from openpyxl.styles import NamedStyle
# generate data
numbers = (random.sample(range(500, 2000), 10))
df = pd.DataFrame(numbers)
df.rename(columns={df.columns[0]:'Time'}, inplace=True)
# convert to time
df['Timestamp'] = pd.to_timedelta(df['Time'], unit='s') + pd.Timestamp(0)
#df['Openpyxl Time'] = pd.to_timedelta(df['Time'], unit='s') + pd.Timestamp(0)
# write to file
writer = pd.ExcelWriter('test.xlsx', engine = 'xlsxwriter')
df.to_excel(writer, sheet_name= 'Sheet 1', index=False)
writer.close()
# load just created file
wb = load_workbook('test.xlsx')
ws = wb.active
# set format style
date_style = NamedStyle(name='datetime', number_format='h:mm:ss')
# simple way to format but also formats column header
for cell in ws['C']:
    cell.style = date_style
# more complex way to format, but does not format column header
# for row in ws.iter_rows('C{}:C{}'.format(ws.min_row+1, ws.max_row)):
#     for cell in row:
#         cell.style = date_style
wb.save('test.xlsx')
wb.close()
How do I rewrite this so that I don't have to re-open the Excel file to give different columns different datetime formats?
The desired output also can't be stored as a string in Excel. I need to be able to derive averages and sums from the timestamps.
Thanks!
Following the recommendation from Charlie Clark in the comments above, I used openpyxl's utils package to convert the pandas dataframe to an openpyxl workbook. Once it is converted to a workbook, I can still use the same code for the rest of the script.
# imports
import pandas as pd
import random
from openpyxl.styles import NamedStyle
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl import Workbook
# generate data
numbers = (random.sample(range(500, 2000), 10))
df = pd.DataFrame(numbers)
df.rename(columns={df.columns[0]: 'Time'}, inplace=True)
# convert to time
df['Timestamp'] = pd.to_timedelta(df['Time'], unit='s') + pd.Timestamp(0)
# create empty openpyxl workbook
wb = Workbook()
ws = wb.active
# convert pandas dataframe to openpyxl workbook
for r in dataframe_to_rows(df, index=False, header=True):
    ws.append(r)
# set format style in openpyxl
date_style = NamedStyle(name='datetime', number_format='h:mm:ss')
# simple way to format but also formats column header
for cell in ws['B']:
    cell.style = date_style
# more complex way to format, but does not format column header
# for row in ws.iter_rows(min_row=ws.min_row + 1, min_col=2, max_col=2):
#     for cell in row:
#         cell.style = date_style
# save workbook
wb.save('test.xlsx')
wb.close()
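As a side note on why the values stay usable for sums and averages: Excel stores a time as a fraction of a day, and 'h:mm:ss' is only the display format. The sketch below is an illustration using only the standard library, not part of the script above, and the helper names are made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_to_day_fraction(seconds):
    """Return the duration as a fraction of a day, the way Excel stores times."""
    return seconds / SECONDS_PER_DAY


def format_hms(seconds):
    """Render a number of seconds the way an 'h:mm:ss' number format would."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


durations = [1325, 43200]
fractions = [seconds_to_day_fraction(value) for value in durations]
labels = [format_hms(value) for value in durations]

for value, fraction, label in zip(durations, fractions, labels):
    print(value, fraction, label)
```

Because the stored values are plain numbers, functions such as SUM and AVERAGE keep working on the column.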