Pandas: Write dates to Excel such that they are useable as Dates - python

I cannot get my output XLSX file to contain dates in a usable form, even though I have followed similar threads such as:
https://xlsxwriter.readthedocs.io/example_pandas_datetime.html
Problem with Python Pandas data output to excel in date format
Here is a MWE:
import pandas as pd
import xlsxwriter
not_in1 = ['missing']
# generate data
df = pd.DataFrame({'date1': ['5/1/2022 00:33:22', '3/1/2022 00:33:22', 'missing'], 'date2': ['3/1/2022 00:33:22', 'missing', '6/2/2022 00:33:22']})
# format
df['date1'] = df['date1'].apply(lambda x: pd.to_datetime(x).strftime('%m/%d/%Y') if x not in not_in1 else x)
df['date2'] = df['date2'].apply(lambda x: pd.to_datetime(x).strftime('%m/%d/%Y') if x not in not_in1 else x)
# write
path = 'C:\\Users\\Andrew\\Desktop\\xd2.xlsx'
with pd.ExcelWriter(path, engine='xlsxwriter', date_format="mm dd yyyy", datetime_format="mm dd yyyy") as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    formatdict = {'num_format':'mm/dd/yyyy'}
    fmt = workbook.add_format(formatdict)
    worksheet.set_column('A:B', 20, fmt)
Here as an XLSX, Excel doesn't know what to do:
https://i.stack.imgur.com/PBrAi.png
Interestingly, if I save the XLSX sheet as a CSV, the dates work just fine.
https://i.stack.imgur.com/tROFc.png

Your lambda function converts x into a string with strftime(); you should keep it as a datetime instead. As it stands, the cells end up as strings in Excel (use Excel's TYPE() function to see the difference between the .csv and .xlsx files).
Just remove the .strftime('%m/%d/%Y') call and you'll be fine.
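A minimal sketch of that change, assuming the placeholder text is exactly 'missing' and the dates are month-first as in the question. The names parse_or_keep and write_dates are hypothetical helpers introduced for illustration; write_dates simply wraps the question's own ExcelWriter call and needs the xlsxwriter package installed.

```python
import pandas as pd

PLACEHOLDERS = {"missing"}  # assumed placeholder values, as in the question


def parse_or_keep(value):
    """Return a Timestamp for date strings; leave placeholder text untouched."""
    if value in PLACEHOLDERS:
        return value
    # No strftime() here: the cell must stay a datetime, not become a string.
    return pd.to_datetime(value, format="%m/%d/%Y %H:%M:%S")


def write_dates(df, path):
    """Write df with xlsxwriter; datetime cells are shown as mm/dd/yyyy."""
    with pd.ExcelWriter(path, engine="xlsxwriter",
                        datetime_format="mm/dd/yyyy") as writer:
        df.to_excel(writer, sheet_name="Sheet1", index=False)
        # Widen the columns so the dates are not displayed as ####.
        writer.sheets["Sheet1"].set_column("A:B", 20)


df = pd.DataFrame({
    "date1": ["5/1/2022 00:33:22", "3/1/2022 00:33:22", "missing"],
    "date2": ["3/1/2022 00:33:22", "missing", "6/2/2022 00:33:22"],
})
for col in ("date1", "date2"):
    df[col] = df[col].apply(parse_or_keep)
```

Calling write_dates(df, 'xd2.xlsx') should then give real Excel dates in the parsed cells, while the 'missing' cells stay as text.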

Related

Excel dates formats in pandas

I have a dataframe that looks like this....
df2['date1'] = ""
df2['date2'] = '=IF(INDIRECT("A"&ROW())="","",INDIRECT("A"&ROW())+30)'
df2['date3'] = '=IF(INDIRECT("A"&ROW())="","",INDIRECT("A"&ROW())+35)'
I want date2 and date3 to be calculated in Excel using the Excel formulas. I create this dataframe in Python, then save the result to Excel. To save to Excel, I have tried:
writer = pd.ExcelWriter("test.xlsx",
                        engine='xlsxwriter',
                        datetime_format='mmm d yyyy hh:mm:ss',
                        date_format='mmmm dd yyyy')
# Convert the dataframe to an XlsxWriter Excel object.
df2.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects. in order to set the column
# widths, to make the dates clearer.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Get the dimensions of the dataframe.
(max_row, max_col) = df2.shape
# Set the column widths, to make the dates clearer.
worksheet.set_column(1, max_col, 20)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
When I do this, I get an Excel sheet with empty columns. When I enter a date in date1, I get serial numbers back in date2 and date3, so I know my formulas are correct, and when I manually change the format to Short Date, I get the correct dates in mm/dd/yyyy format.
So my question is: how do I set up the format in Python so that I do not have to change the date format manually every time this Excel file refreshes?
The datetime_format and date_format options to ExcelWriter() don't work because the dataframe columns don't have a datetime-like data type.
Instead you can use the xlsxwriter worksheet handle to set the column format.
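One quick way to confirm the dtype point, as a small sketch: check each column with pandas' is_datetime64_any_dtype helper. The column names below are made up for illustration; a formula string is stored as a plain object column, so the writer's date options have nothing to act on.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical columns: one holding an Excel formula string, one holding real dates.
df = pd.DataFrame({
    "formula": ['=IF(INDIRECT("B"&ROW())="","",INDIRECT("B"&ROW())+30)'],
    "real_date": pd.to_datetime(["2022-01-01"]),
})

# True only for columns that pandas stores as datetimes.
datetime_like = {col: is_datetime64_any_dtype(df[col]) for col in df.columns}
print(datetime_like)  # {'formula': False, 'real_date': True}
```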
Here is an adjusted version of your code to demonstrate:
import pandas as pd
# Create a sample dataframe.
df2 = pd.DataFrame({
    'date1': [44562],
    'date2': ['=IF(INDIRECT("B"&ROW())="","",INDIRECT("B"&ROW())+30)'],
    'date3': ['=IF(INDIRECT("B"&ROW())="","",INDIRECT("B"&ROW())+35)']})
writer = pd.ExcelWriter("test.xlsx", engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df2.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Create a suitable date format.
date_format = workbook.add_format({'num_format': 'yyyy-mm-dd'})
# Get the dimensions of the dataframe.
(max_row, max_col) = df2.shape
# Set the column widths and add a date format.
worksheet.set_column(1, max_col, 14, date_format)
# Close the Pandas Excel writer and output the Excel file.
writer.close()

Changing header format python pandas to excel

Is there any way to change the header format of my pandas dataframe when it is written to an Excel file?
It may be unusual, but my header is composed of dates and times, and I want the cell format in the Excel file to be a date format.
I tried something like this:
import pandas as pd
data = pd.DataFrame({'1899-12-30 00:00:00': [1.5,2.5,3.5,4.5,5.4]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
data.to_excel(writer, sheet_name='Sheet1',index=True)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
date_fmt = workbook.add_format({'num_format': 'dd.mm.yyyy hh:mm:ss'})
worksheet.set_row(0, 20, date_fmt)
writer.close()
but set_row appears to not change header formats. I also converted the dates to an excel serial date value, but that didn't help either.
There are a few things you will need to do to get this working.
The first is to avoid the Pandas default header, since that sets a cell format which can't be overwritten with set_row(). The best thing to do is to skip the default header and write your own (see the Formatting of the Dataframe headers section of the XlsxWriter docs).
Secondly, dates in Excel are formatted numbers so you will need to convert the string header into a number, or better to a datetime object (see the Working with Dates and Time section of the docs).
Finally '1899-12-30' isn't a valid date in Excel.
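To make the "dates are formatted numbers" point concrete, here is a rough sketch of the conversion for Excel's default 1900 date system. The function name to_excel_serial is hypothetical, and the sketch deliberately rejects dates before 1900-03-01, where Excel's leap-year quirk makes the arithmetic differ. Under this scheme 1899-12-30 maps to serial 0, which is why it does not behave like a real calendar date.

```python
from datetime import datetime

# Day 0 of Excel's 1900 date system, shifted to absorb Excel's phantom 1900-02-29.
EXCEL_EPOCH = datetime(1899, 12, 30)


def to_excel_serial(moment):
    """Return the Excel serial number for a datetime on or after 1900-03-01."""
    if moment < datetime(1900, 3, 1):
        raise ValueError("this sketch only handles dates from 1900-03-01 onward")
    delta = moment - EXCEL_EPOCH
    # Whole days are the date part; the fraction of a day is the time part.
    return delta.days + delta.seconds / 86400


print(to_excel_serial(datetime(2022, 1, 1)))  # 44562.0
```

In practice, passing a datetime object to XlsxWriter is simpler, since the library performs this conversion itself.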
Here is a working example with some of these fixes:
import pandas as pd
from datetime import datetime
data = pd.DataFrame({'2020-09-18 12:30:00': [1.5, 2.5, 3.5, 4.5, 5.4]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
# Turn off the default header and skip one row to allow us to insert a user
# defined header.
data.to_excel(writer,
              sheet_name='Sheet1', index=True,
              startrow=1, header=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
date_fmt = workbook.add_format({'num_format': 'dd.mm.yyyy hh:mm:ss'})
# Convert the column headers to datetime objects and write them with the
# defined format.
for col_num, value in enumerate(data.columns.values):
    # Convert the date string to a datetime object.
    date_time = datetime.strptime(value, '%Y-%m-%d %H:%M:%S')
    # Make the column wider for clarity.
    worksheet.set_column(col_num + 1, col_num + 1, 20)
    # Write the date.
    worksheet.write(0, col_num + 1, date_time, date_fmt)
writer.close()

To add number format to xlsx file using python

I have data like the following in abc.xlsx:
date,qty,price,profitprice,sellprice
20200501,11,900,,20
Using Python, I want the output to be:
date,qty,price,profitprice,sellprice
20200501,11.00,900.00,,20.00
Can anyone help with this?
How can I read each column with its values, apply a number format, and save the result to an xlsx file?
Based on this answer by Akshit Khurana:
import pandas as pd
df = pd.read_excel("initial.xlsx")
writer = pd.ExcelWriter("formatted.xlsx", engine = "xlsxwriter")
df.to_excel(writer, index=False, header=True)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format1 = workbook.add_format({'num_format': '0.00'})
worksheet.set_column('B:E', None, format1) # Adds formatting to columns B-E (qty through sellprice)
writer.close()
I believe the two other answers posted here do not work for the same reason why this question was asked.
You can use the dtype parameter at read_excel:
pd.read_excel('abc.xlsx', dtype={'profitprice': float, 'sellprice': float})

Save to Excel strings with '='

I'm trying to save my output to an Excel file, but some of the values have '=' at the beginning of the string.
So while exporting, Excel converts them to formulas, and instead of strings, I have #NAME error in Excel.
I need to save only some columns as text, as I have dates and numerics in other columns, and they should be saved as is.
I've already tried to convert them with the .astype() function, but with no result.
import os
import pandas as pd

def create_excel(datadir, filename, data):
    df = col_type_converter(filename, pd.DataFrame(data))
    filepath = os.path.join(datadir, filename + '.xlsx')
    writer = pd.ExcelWriter(filepath, engine='xlsxwriter')
    df.to_excel(writer, index=False)
    writer.close()
    return filepath

def col_type_converter(name, dataframe):
    df = dataframe
    if name == 'flights':
        df['departure_station'] = df['departure_station'].astype(str)
        df['arrival_station'] = df['arrival_station'].astype(str)
        return df
    return df
When I'm importing from CSV using the built-in Excel importer, I can make it import values as text.
Is there any way to say to Pandas how I want to import columns?
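Never mind, you can pass XlsxWriter options through pandas:
writer = pd.ExcelWriter(filepath, engine='xlsxwriter', options={'strings_to_formulas': False})
In pandas 1.3 and later the options keyword is deprecated (and it was removed in pandas 2.0), so pass the options through engine_kwargs instead:
writer = pd.ExcelWriter(filepath, engine='xlsxwriter', engine_kwargs={'options': {'strings_to_formulas': False}})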
https://xlsxwriter.readthedocs.io/working_with_pandas.html#passing-xlsxwriter-constructor-options-to-pandas
https://xlsxwriter.readthedocs.io/worksheet.html#worksheetwrite

How to convert seconds into H:MM:SS and keep the date in a separate column

I have data in seconds that I need to convert to H:MM:SS. When this data comes in it also has a date field in a separate column. I need to convert the seconds data into H:MM:SS but keep the date field as a date. I need the output to look like the desired output in Excel.
Example desired output:
excel output
I've tried using ExcelWriter and setting the default date_format or datetime_format, but this converts all datetime columns in the Excel file. Previous responses from jmcnamara indicate that this is because a cell format takes precedence over a column or row format.
Here is some sample code that I've gotten to work, but it's not very Pythonic. It involves saving the dataframe to Excel and then re-opening that exact file.
# imports
import pandas as pd
import random
from openpyxl import load_workbook
from openpyxl.styles import NamedStyle
# generate data
numbers = (random.sample(range(500, 2000), 10))
df = pd.DataFrame(numbers)
df.rename(columns={df.columns[0]:'Time'}, inplace=True)
# convert to time
df['Timestamp'] = pd.to_timedelta(df['Time'], unit='s') + pd.Timestamp(0)
#df['Openpyxl Time'] = pd.to_timedelta(df['Time'], unit='s') + pd.Timestamp(0)
# write to file
writer = pd.ExcelWriter('test.xlsx', engine = 'xlsxwriter')
df.to_excel(writer, sheet_name= 'Sheet 1', index=False)
writer.close()
# load just created file
wb = load_workbook('test.xlsx')
ws = wb.active
# set format style
date_style = NamedStyle(name='datetime', number_format='h:mm:ss')
# simple way to format but also formats column header
# (with index=False, the Timestamp column lands in column B)
for cell in ws['B']:
    cell.style = date_style
# more complex way to format, but does not format column header
# for row in ws.iter_rows(min_row=2, min_col=2, max_col=2):
#     for cell in row:
#         cell.style = date_style
wb.save('test.xlsx')
wb.close()
How do I rewrite this so that I don't have to re-open the Excel file to give different columns different datetime formats?
The desired output also can't be stored as a string in Excel. I need to be able to derive averages and sums from the timestamps.
Thanks!
Following the recommendation from Charlie Clark in the comments above, I used openpyxl's utils package to convert the pandas dataframe to an openpyxl workbook. Once it is converted to a workbook, I can still use the same code for the rest of the script.
# imports
import pandas as pd
import random
from openpyxl.styles import NamedStyle
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl import Workbook
# generate data
numbers = (random.sample(range(500, 2000), 10))
df = pd.DataFrame(numbers)
df.rename(columns={df.columns[0]: 'Time'}, inplace=True)
# convert to time
df['Timestamp'] = pd.to_timedelta(df['Time'], unit='s') + pd.Timestamp(0)
# create empty openpyxl workbook
wb = Workbook()
ws = wb.active
# convert pandas dataframe to openpyxl workbook
for r in dataframe_to_rows(df, index=False, header=True):
    ws.append(r)
# set format style in openpyxl
date_style = NamedStyle(name='datetime', number_format='h:mm:ss')
# simple way to format but also formats column header
for cell in ws['B']:
    cell.style = date_style
# more complex way to format, but does not format column header
# for row in ws.iter_rows(min_row=2, min_col=2, max_col=2):
#     for cell in row:
#         cell.style = date_style
# save workbook
wb.save('test.xlsx')
wb.close()
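An alternative that avoids both re-opening the file and openpyxl styles, offered only as a sketch: store the seconds as a fraction of a day (Excel's unit for time) and attach an elapsed-time format to that one column through xlsxwriter. This is a different technique from the answer above. The helper names add_day_fraction and write_with_time_format and the column name 'Duration' are hypothetical, and the approach relies on pandas writing plain numbers without a cell format, so the column format can take effect.

```python
import pandas as pd

SECONDS_PER_DAY = 86400


def add_day_fraction(df, seconds_col="Time", out_col="Duration"):
    """Return a copy with the seconds expressed as a fraction of a day."""
    result = df.copy()
    result[out_col] = result[seconds_col] / SECONDS_PER_DAY
    return result


def write_with_time_format(df, path, time_col="Duration"):
    """Write df in one pass, formatting only time_col as elapsed h:mm:ss."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name="Sheet1", index=False)
        workbook = writer.book
        worksheet = writer.sheets["Sheet1"]
        # [h] keeps counting hours past 24 instead of wrapping around.
        time_fmt = workbook.add_format({"num_format": "[h]:mm:ss"})
        col_idx = df.columns.get_loc(time_col)
        worksheet.set_column(col_idx, col_idx, 12, time_fmt)


df = add_day_fraction(pd.DataFrame({"Time": [500, 3600, 86400]}))
```

Calling write_with_time_format(df, 'test.xlsx') should leave the Duration cells numeric, so SUM and AVERAGE keep working in Excel, while any real datetime column elsewhere in the frame can still be controlled separately through the writer's datetime_format option.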
