I have a dataframe that looks like this:
df2['date1'] = ""
df2['date2'] = '=IF(INDIRECT("A"&ROW())="","",INDIRECT("A"&ROW())+30)'
df2['date3'] = '=IF(INDIRECT("A"&ROW())="","",INDIRECT("A"&ROW())+35)'
I want date2 and date3 to be calculated in Excel using the Excel formulas. I create this dataframe in Python and then save the result to Excel. To save to Excel, I have tried:
writer = pd.ExcelWriter("test.xlsx",
engine='xlsxwriter',
datetime_format='mmm d yyyy hh:mm:ss',
date_format='mmmm dd yyyy')
# Convert the dataframe to an XlsxWriter Excel object.
df2.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects in order to set the column
# widths, to make the dates clearer.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Get the dimensions of the dataframe.
(max_row, max_col) = df2.shape
# Set the column widths, to make the dates clearer.
worksheet.set_column(1, max_col, 20)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
When I do this, I get an Excel sheet with empty columns. When I enter a date in date1, I get serial numbers back in date2 and date3, so I know my code is correct, and when I manually change the format to Short Date, I get the correct dates in mm/dd/yyyy format.
So my question is: how do I set up the format in Python so that I don't have to change the date format manually every time this Excel file refreshes?
The datetime_format and date_format options to ExcelWriter() don't work because the dataframe columns don't have a datetime-like data type.
Instead you can use the xlsxwriter worksheet handle to set the column format.
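Here is an adjusted version of your code to demonstrate. Note that to_excel() writes the dataframe index to column A by default, so date1 ends up in column B and the formulas below refer to "B" rather than "A":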
import pandas as pd
# Create a sample dataframe.
df2 = pd.DataFrame({
'date1': [44562],
'date2': ['=IF(INDIRECT("B"&ROW())="","",INDIRECT("B"&ROW())+30)'],
'date3': ['=IF(INDIRECT("B"&ROW())="","",INDIRECT("B"&ROW())+35)']})
writer = pd.ExcelWriter("test.xlsx", engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df2.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Create a suitable date format.
date_format = workbook.add_format({'num_format': 'yyyy-mm-dd'})
# Get the dimensions of the dataframe.
(max_row, max_col) = df2.shape
# Set the column widths and add a date format.
worksheet.set_column(1, max_col, 14, date_format)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
Related
I am using xlsxwriter to generate a file with quite a few formulas. From there, I want to create a table on another sheet. Everything is pretty straightforward until I want to use data from a different sheet for the table.
The documentation only shows examples where you already have the data you need and pass it to add_table() as the 'data' parameter.
What I am trying to do is the following, which is structured the same way as the rest of xlsxwriter's formulas:
df = pd.DataFrame(stuff)
writer = pd.ExcelWriter('File.xlsx', engine = 'xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
workbook = writer.book
worksheet1 = writer.sheets['Sheet1']
worksheet2 = workbook.add_worksheet('Summary Page')
data = f"'Sheet1'!$A$1:$D${len(df)}"
worksheet2.add_table(f'A1:D{len(df)}', {'data':data})
workbook.close()
This approach adds the new sheet and creates a table of the correct size, but it then fills the table with the text of the data string itself, one character per cell down the first column.
Is there a way to create a table referencing data from another sheet using xlsxwriter?
ExcelWriter is (obviously) for writing Excel files.
If you want to read data from Excel after writing and saving it (did I get that right?), use ExcelFile.parse or read_excel to load it into a dataframe and then write it to Excel again with ExcelWriter. Unfortunately, xlsxwriter does not support appending, so with that engine you have to load and write all sheets again. Alternatively, use openpyxl as the engine, since it supports appending with mode='a'. pandas only falls back to openpyxl by default when xlsxwriter is not installed, so it is passed explicitly in the minimal working example below:
import pandas as pd
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
df.to_excel(writer, sheet_name='Sheet1')
writer.close()
data = pd.read_excel('test.xlsx', usecols='A:B', sheet_name='Sheet1', index_col=0)
writer = pd.ExcelWriter('test.xlsx', engine='openpyxl', mode='a')
# shape our data here
data.to_excel(writer, sheet_name='Sheet2')
writer.close()
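Back to the original add_table() question: a sketch that stays inside xlsxwriter is shown below. add_table() expects its data option to be a list of row lists, which is why a plain string ended up split into one character per cell. Here each entry is instead a formula string pointing at the matching cell on Sheet1, relying on xlsxwriter writing strings that start with "=" as formulas (its default behavior). The sample dataframe stands in for the question's stuff, and Sheet1 is written with index=False so the data starts in column A; both are assumptions.

```python
import pandas as pd
from xlsxwriter.utility import xl_rowcol_to_cell

# Hypothetical sample data standing in for the question's `stuff`.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

with pd.ExcelWriter('File.xlsx', engine='xlsxwriter') as writer:
    # index=False so the dataframe columns start in column A of Sheet1.
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    workbook = writer.book
    worksheet2 = workbook.add_worksheet('Summary Page')

    n_rows, n_cols = df.shape
    # One formula per cell, each pointing at the matching cell on Sheet1.
    # Sheet1 data starts on row index 1 because row 0 holds the header.
    data = [
        ["='Sheet1'!" + xl_rowcol_to_cell(r + 1, c) for c in range(n_cols)]
        for r in range(n_rows)
    ]

    # Table range: one header row plus n_rows data rows, n_cols columns.
    worksheet2.add_table(0, 0, n_rows, n_cols - 1, {
        'data': data,
        'columns': [{'header': str(name)} for name in df.columns],
    })
```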
I cannot get my XLSX output to store dates in a usable form, even though I have followed familiar threads like:
https://xlsxwriter.readthedocs.io/example_pandas_datetime.html
Problem with Python Pandas data output to excel in date format
Here is a MWE:
import pandas as pd
import xlsxwriter
not_in1 = ['missing']
# generate data
df = pd.DataFrame({'date1': ['5/1/2022 00:33:22', '3/1/2022 00:33:22', 'missing'], 'date2': ['3/1/2022 00:33:22', 'missing', '6/2/2022 00:33:22']})
# format
df['date1'] = df['date1'].apply(lambda x: pd.to_datetime(x).strftime('%m/%d/%Y') if x not in not_in1 else x)
df['date2'] = df['date2'].apply(lambda x: pd.to_datetime(x).strftime('%m/%d/%Y') if x not in not_in1 else x)
# write
path = 'C:\\Users\\Andrew\\Desktop\\xd2.xlsx'
with pd.ExcelWriter(path, engine='xlsxwriter', date_format="mm dd yyyy", datetime_format="mm dd yyyy") as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    formatdict = {'num_format':'mm/dd/yyyy'}
    fmt = workbook.add_format(formatdict)
    worksheet.set_column('A:B', 20, fmt)
Here is the result as an XLSX file; Excel doesn't recognize the values as dates:
https://i.stack.imgur.com/PBrAi.png
Interestingly, if I save the XLSX sheet as a CSV, the dates work just fine.
https://i.stack.imgur.com/tROFc.png
Your lambda function converts the x argument into a string; you should keep it as a datetime instead. Currently you end up with text cells in Excel (use Excel's TYPE() function to see the difference between the .csv and .xlsx files: it returns 2 for text and 1 for numbers, which is how Excel stores dates).
Just remove the .strftime('%m/%d/%Y') call and you'll be fine.
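A sketch of the corrected MWE, assuming the same sample data: the values are parsed with pd.to_datetime() but not passed through strftime(), so pandas writes them as real Excel dates. Because pandas applies datetime_format as a cell format to each datetime value, and a cell format takes precedence over a column format, the display format is set through datetime_format (changed here to mm/dd/yyyy) rather than set_column(). The 'missing' entries are still written as text, and dates.xlsx is a placeholder for the original Windows path.

```python
import pandas as pd

not_in1 = ['missing']

# Same sample data as the MWE above.
df = pd.DataFrame({'date1': ['5/1/2022 00:33:22', '3/1/2022 00:33:22', 'missing'],
                   'date2': ['3/1/2022 00:33:22', 'missing', '6/2/2022 00:33:22']})

# Parse to Timestamps but do NOT call strftime(): a string becomes a text
# cell in Excel, while a Timestamp becomes a numeric date cell.
for col in ['date1', 'date2']:
    df[col] = df[col].apply(lambda x: pd.to_datetime(x) if x not in not_in1 else x)

# pandas uses datetime_format as the cell format for every datetime value.
with pd.ExcelWriter('dates.xlsx', engine='xlsxwriter',
                    datetime_format='mm/dd/yyyy') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    # Column width only; the number format comes from datetime_format.
    writer.sheets['Sheet1'].set_column('A:B', 20)
```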
Can I make pandas default to YYYY-MM-DD? I am getting YYYY-MM-DD 00:00:00, with the time at the end. Is there a way to make sure, by default, that the zeros don't appear when I export to Excel/CSV?
Updated per comment request:
I have a function that looks like this:
x1 = my_funct('Unemployment', '2004-01-04 2009-01-04', 'DK', 'Unemployment (Denmark)')
Then I create a df out of it:
df1 = pd.DataFrame(x1)
along with others:
# this concats the df horizontally
df_merged1 = pd.concat([df1, df0, df2, df0, df3, df0, df4], axis=1)
df_merged1.reset_index(inplace=True)
Then I export that to excel:
writer = pd.ExcelWriter('Test1.xlsx', engine='xlsxwriter')
df_merged1.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
header_format = workbook.add_format({
'bold': True,
'text_wrap': True,
'valign': 'top',
'fg_color': '#D7E4BC',
'border': 1})
# Write the column headers with the defined format.
for col_num, value in enumerate(df_merged1.columns.values):
    worksheet.write(0, col_num, value, header_format)
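# Avoid shadowing the built-in format().
center_format = workbook.add_format()
center_format.set_align('center')
center_format.set_align('vcenter')
worksheet.set_column(0, 11, 30, center_format)
# close() saves the file; a separate save() is not needed.
writer.close()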
The exported Excel file has multiple date columns, each showing an extra 00:00:00 at the end. Is it possible to have them show only YYYY-MM-DD?
Thanks
The solution is to create the ExcelWriter with a datetime_format (and/or date_format) argument.
Creating a tiny dataframe for testing:
import pandas as pd
from datetime import datetime
df = pd.DataFrame([datetime(2021, 3, 4, 20, 48, 5)])
This is the dataframe so far:
                    0
0 2021-03-04 20:48:05
Creating the Writer
writer = pd.ExcelWriter("example.xlsx", datetime_format='hh:mm:ss')
df.to_excel(writer, sheet_name="Sheet1")
writer.close()
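Note: I used hh:mm:ss, but it can be any Excel number format; for dates without the trailing 00:00:00, use datetime_format='yyyy-mm-dd'.
If you need more details, see the ExcelWriter documentation.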
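Applied to the YYYY-MM-DD question, a minimal sketch (with a hypothetical two-row dataframe) passes 'yyyy-mm-dd' as both date_format and datetime_format, since pandas picks one or the other depending on whether a value is a date or a datetime. For the CSV export the relevant option is to_csv()'s own date_format parameter, which takes a strftime-style string rather than an Excel number format.

```python
from datetime import datetime

import pandas as pd

# Hypothetical dates with a midnight time component, as in the question.
df = pd.DataFrame({'when': [datetime(2021, 3, 4), datetime(2021, 3, 5)]})

# Excel: Excel number-format strings, applied by pandas to each date cell.
with pd.ExcelWriter('example.xlsx', engine='xlsxwriter',
                    date_format='yyyy-mm-dd',
                    datetime_format='yyyy-mm-dd') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)

# CSV: to_csv() formats the values itself from a strftime-style string.
# Without a path it returns the CSV text; pass a filename to write a file.
csv_text = df.to_csv(index=False, date_format='%Y-%m-%d')
print(csv_text)
```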
Is there any way to change the header format of a pandas dataframe that is written to an Excel file?
Maybe it is unusual, but my header is composed of dates and times, and I want the cell format in the Excel file to be a date format.
I tried something like this:
import pandas as pd
data = pd.DataFrame({'1899-12-30 00:00:00': [1.5,2.5,3.5,4.5,5.4]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
data.to_excel(writer, sheet_name='Sheet1',index=True)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
date_fmt = workbook.add_format({'num_format': 'dd.mm.yyyy hh:mm:ss'})
worksheet.set_row(0, 20, date_fmt)
writer.close()
but set_row() does not appear to change the header format. I also converted the dates to Excel serial date values, but that didn't help either.
There are a few things you will need to do to get this working.
The first is to avoid the pandas default header, since pandas writes the header cells with their own cell format, and a cell format can't be overridden with set_row(). The best thing to do is to skip the default header and write your own (see the "Formatting of the Dataframe headers" section of the XlsxWriter docs).
Secondly, dates in Excel are formatted numbers, so you will need to convert the string header into a number or, better, into a datetime object (see the "Working with Dates and Time" section of the docs).
Finally, '1899-12-30' isn't a valid date in Excel, whose default 1900 date system starts at 1900-01-01.
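To make the "dates are formatted numbers" point concrete, here is a minimal sketch of the arithmetic behind Excel date serials, assuming the default 1900 date system and dates on or after 1900-03-01 (earlier serials are shifted by Excel's fictitious 1900-02-29). to_excel_serial is a hypothetical helper, not an xlsxwriter function; xlsxwriter does this conversion itself when you write a datetime. Under this scheme 2022-01-01 is serial 44562, the value used in the first answer above, and 1899-12-30 itself would map to serial 0, which Excel does not treat as a real date.

```python
from datetime import datetime

# Default 1900 date system: for dates on or after 1900-03-01, the serial
# number is the number of days since 1899-12-30, plus a fraction of a day.
EXCEL_EPOCH = datetime(1899, 12, 30)


def to_excel_serial(dt):
    """Hypothetical helper: convert a datetime to an Excel serial number."""
    delta = dt - EXCEL_EPOCH
    return delta.days + delta.seconds / 86400


print(to_excel_serial(datetime(2022, 1, 1)))            # 44562.0
print(to_excel_serial(datetime(2020, 9, 18, 12, 30)))   # about 44092.52
```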
Here is a working example with some of these fixes:
import pandas as pd
from datetime import datetime
data = pd.DataFrame({'2020-09-18 12:30:00': [1.5, 2.5, 3.5, 4.5, 5.4]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
# Turn off the default header and skip one row to allow us to insert a user
# defined header.
data.to_excel(writer,
sheet_name='Sheet1', index=True,
startrow=1, header=False)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a header format.
date_fmt = workbook.add_format({'num_format': 'dd.mm.yyyy hh:mm:ss'})
# Convert the column headers to datetime objects and write them with the
# defined format.
for col_num, value in enumerate(data.columns.values):
    # Convert the date string to a datetime object.
    date_time = datetime.strptime(value, '%Y-%m-%d %H:%M:%S')
    # Make the column wider for clarity.
    worksheet.set_column(col_num + 1, col_num + 1, 20)
    # Write the date.
    worksheet.write(0, col_num + 1, date_time, date_fmt)
writer.close()
I have data in seconds that I need to convert to H:MM:SS. When this data comes in it also has a date field in a separate column. I need to convert the seconds data into H:MM:SS but keep the date field as a date. I need the output to look like the desired output in Excel.
I've tried using ExcelWriter and setting a default date_format or datetime_format; however, this applies the same format to every datetime column in the Excel file. Previous responses from jmcnamara indicate that this is because a cell format takes precedence over a column or row format.
Here is some sample code that I've gotten to work, but it's not very Pythonic: it saves the dataframe to Excel and then re-opens that same file.
# imports
import pandas as pd
import random
from openpyxl import load_workbook
from openpyxl.styles import NamedStyle
# generate data
numbers = (random.sample(range(500, 2000), 10))
df = pd.DataFrame(numbers)
df.rename(columns={df.columns[0]:'Time'}, inplace=True)
# convert to time
df['Timestamp'] = pd.to_timedelta(df['Time'], unit='s') + pd.Timestamp(0)
#df['Openpyxl Time'] = pd.to_timedelta(df['Time'], unit='s') + pd.Timestamp(0)
# write to file
writer = pd.ExcelWriter('test.xlsx', engine = 'xlsxwriter')
df.to_excel(writer, sheet_name= 'Sheet 1', index=False)
writer.close()
# load just created file
wb = load_workbook('test.xlsx')
ws = wb.active
# set format style
date_style = NamedStyle(name='datetime', number_format='h:mm:ss')
# simple way to format but also formats column header
# (the Timestamp column is column B, since index=False)
for cell in ws['B']:
    cell.style = date_style
# more complex way to format, but does not format column header
# for row in ws['B{}:B{}'.format(ws.min_row + 1, ws.max_row)]:
#     for cell in row:
#         cell.style = date_style
wb.save('test.xlsx')
wb.close()
How do I rewrite this so that I don't have to re-open the Excel file to apply different datetime formats to different columns?
The values in the desired output also can't be stored as text in Excel; I need to be able to compute averages and sums from the timestamps.
Thanks!
Following Charlie Clark's recommendation in the comments above, I used openpyxl's utils package to convert the pandas dataframe into an openpyxl workbook. Once it is converted to a workbook, I can still use the same code for the rest of the script.
# imports
import pandas as pd
import random
from openpyxl.styles import NamedStyle
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl import Workbook
# generate data
numbers = (random.sample(range(500, 2000), 10))
df = pd.DataFrame(numbers)
df.rename(columns={df.columns[0]: 'Time'}, inplace=True)
# convert to time
df['Timestamp'] = pd.to_timedelta(df['Time'], unit='s') + pd.Timestamp(0)
# create empty openpyxl workbook
wb = Workbook()
ws = wb.active
# convert pandas dataframe to openpyxl workbook
for r in dataframe_to_rows(df, index=False, header=True):
    ws.append(r)
# set format style in openpyxl
date_style = NamedStyle(name='datetime', number_format='h:mm:ss')
# simple way to format but also formats column header
for cell in ws['B']:
    cell.style = date_style
# more complex way to format, but does not format column header
# for row in ws['B{}:B{}'.format(ws.min_row + 1, ws.max_row)]:
#     for cell in row:
#         cell.style = date_style
# save workbook
wb.save('test.xlsx')
wb.close()
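For the original goal of avoiding the second pass, a sketch that stays within pandas and xlsxwriter is to write the durations as fractions of a day (which is how Excel stores times) and give that column its own number format with set_column(). pandas writes plain floats without a cell format, so the column format applies to them, while date values get the ExcelWriter date_format as a cell format; each column therefore keeps its own format, the header cell keeps pandas' header style, and the times stay numeric for sums and averages. The Date column and sample values are hypothetical, and '[h]:mm:ss' is used instead of 'h:mm:ss' so that totals over 24 hours display as elapsed hours.

```python
import random
from datetime import date

import pandas as pd

# Hypothetical sample data: durations in seconds plus a separate date column.
seconds = random.sample(range(500, 2000), 10)
df = pd.DataFrame({'Date': [date(2022, 1, d + 1) for d in range(10)],
                   'Time': seconds})

# Excel stores times as fractions of a day, so divide seconds by 86400.
# The result is a plain float, which stays numeric (summable) in Excel.
df['Time'] = df['Time'] / 86400

with pd.ExcelWriter('test.xlsx', engine='xlsxwriter',
                    date_format='yyyy-mm-dd') as writer:
    df.to_excel(writer, sheet_name='Sheet 1', index=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet 1']
    # Floats are written without a cell format, so this column format applies.
    time_fmt = workbook.add_format({'num_format': '[h]:mm:ss'})
    worksheet.set_column('A:A', 12)
    worksheet.set_column('B:B', 12, time_fmt)
```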