openpyxl how to set cell format as Date instead of Custom - python

I am using openpyxl and pandas to generate an Excel file, and I need the dates formatted as Date in Excel. The dates in the exported file display correctly in dd/mm/yyyy format, but when I right-click a cell and open 'Format Cells', the category shown is Custom. Is there a way to change it to Date? Here is the code where I specify the date format:
writer = pd.ExcelWriter(dstfile, engine='openpyxl', date_format='dd/mm/yyyy')
I have also tried setting cell.number_format = 'dd/mm/yyyy', but Excel still shows the Custom format.

The answer can be found in the comments of Converting Data to Date Type When Writing with Openpyxl.
Ensure you are writing a datetime.datetime object to the cell, then set:
.number_format = 'mm/dd/yyyy;#' # notice the ';#'
e.g.,
import datetime
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws['A1'] = datetime.datetime(2021, 12, 25)
ws['A1'].number_format = 'yyyy-mm-dd;#'
wb.save(r'c:\data\test.xlsx')
n.b. these dates are still a bit 'funny' as they are not auto-magically grouped into months and years in pivot tables (if you like that sort of thing). In the pivot table, you can manually click on them and set the grouping though: https://support.microsoft.com/en-us/office/group-or-ungroup-data-in-a-pivottable-c9d1ddd0-6580-47d1-82bc-c84a5a340725
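Since the question writes through pandas rather than a bare Workbook, here is a minimal sketch of applying the same number format after to_excel. The sample data, the sheet name Sheet1, and the temporary path are made up for illustration, and it assumes the date column is the first column and already holds datetime values.

```python
import datetime
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical sample data; the column must hold real datetimes, not strings.
df = pd.DataFrame({"when": [datetime.datetime(2021, 12, 25),
                            datetime.datetime(2022, 1, 1)]})
path = os.path.join(tempfile.mkdtemp(), "dates.xlsx")

with pd.ExcelWriter(path, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
    ws = writer.sheets["Sheet1"]
    # Row 1 is the header, so the data starts at row 2 of column A.
    for (cell,) in ws.iter_rows(min_row=2, min_col=1, max_col=1):
        cell.number_format = "dd/mm/yyyy;#"

# Read the file back to confirm the format was stored on the cells.
check = load_workbook(path).active
print(check["A2"].number_format)
```

Whether Excel then lists the cell under Date or Custom still depends on Excel itself, so it is worth opening the result and checking.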

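If the dates are stored as strings in the data frame, you may have to convert them to datetime objects in Python first. One approach is to iterate over the cells after writing with ExcelWriter and convert each value:
from datetime import datetime
# cell is an openpyxl cell whose value is a date string such as '30/12/1999'
cell.value = datetime.strptime(cell.value, '%d/%m/%Y')
cell.number_format = 'dd/mm/yyyy'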
A better approach is to convert that column in the data frame before writing. You can use the to_datetime function in pandas for that.
See this answer for converting the whole column in the dataframe.
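A minimal sketch of the column conversion, assuming the strings are in dd/mm/yyyy form. The column name date and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical data frame whose dates arrived as dd/mm/yyyy strings.
df = pd.DataFrame({"date": ["30/12/1999", "01/02/2000"]})

# An explicit format avoids any day/month ambiguity during parsing.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

print(df.dtypes)
```

After this the column holds timestamps, so the Excel writer stores real dates instead of text.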

Related

Treat everything as raw string (even formulas) when reading into pandas from excel

I am handling text responses from surveys, and it is common to have responses that start with -, for example: -I am sad today.
Excel interprets such a cell as a formula and shows #NAME?
So when I import the Excel file into pandas using read_excel, the value shows up as NaN.
Is there any method to force the text to be kept as raw strings instead of being interpreted as formulas?
I wrote a VBA macro that sets the entire column to text and clicks through all the cells in the column, which is slow when there are more than ten thousand rows.
I was hoping to do this at the Python level instead. Any ideas?
I hope this works for you: use openpyxl to extract the Excel data and then convert it into a pandas DataFrame.
from openpyxl import load_workbook
import pandas as pd
# .active returns the active worksheet, so name it ws rather than wb
ws = load_workbook(filename='./formula_contains_raw.xlsx').active
# ws.values is a generator; materialize it once
rows = list(ws.values)
print(rows)
# The first row holds the column names, the rest is data
df = pd.DataFrame(rows[1:], columns=rows[0])
df.head()
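To see why this helps, here is a small self-contained round trip. It assumes openpyxl's default load_workbook behaviour (data_only=False), which returns the formula text rather than a cached result. The file name, the column name response, and the sample rows are made up for illustration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "survey.xlsx")

# Build a hypothetical survey sheet; the last cell is stored as a formula.
wb = Workbook()
ws = wb.active
ws.append(["response"])
ws.append(["hello"])
ws.append(["=-I am sad today"])
wb.save(path)

# With data_only left at its default, formula cells come back as text.
rows = list(load_workbook(path).active.values)
df = pd.DataFrame(rows[1:], columns=rows[0])

# Drop the leading "=" so only the respondent's text remains.
df["response"] = df["response"].str.replace(r"^=", "", regex=True)
print(df)
```

The last step removes any leading "=", so it would also alter a response that genuinely starts with one.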
It works for me using a CSV file instead of an Excel file.
In the CSV file (opened in Excel), I need to select the Formulas/Show Formulas option and then save the file.
pd.read_csv('draft.csv')
Output:
Col1
0 hello
1 =-hello

How to get rid of timestamps in front of Date, as Pandas adds time stamps to date columns after saving to excel

I am a student and I am learning pandas.
I created an Excel file named Student_Record.xlsx (using Microsoft Excel).
I wanted to create a new file from it using pandas:
import pandas as pd
df = pd.read_excel(r"C:\Users\sudarshan\Desktop\Student_Record.xlsx")
df.head()
df.to_excel(r"C:\Users\sudarshan\Desktop\Output.xlsx",index=False)
I read the file with pandas and saved it back to Excel under a different name (Output.xlsx).
When I open Output.xlsx in Microsoft Excel, the DOB and YOP columns have a timestamp attached to the dates.
How can I write only the date? I want the output file and its contents to look exactly like the original file.
Hope to get some help/support.
Thank you
Your DOB and Year of passing columns probably have a datetime dtype before they are saved to Excel, so they are written back in the full datetime representation.
If you want the contents to look exactly like the original file, in dd-mm-YYYY format, you can convert these two columns to strings before saving to Excel:
df['DOB'] = df['DOB'].dt.strftime('%d-%m-%Y')
df['Year of passing'] = df['Year of passing'].dt.strftime('%d-%m-%Y')
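Here is the same conversion on a small made-up data frame, so the effect is visible without the original file. The sample dates are hypothetical; the column names follow the answer above.

```python
import pandas as pd

# Hypothetical stand-in for the data read from Student_Record.xlsx.
df = pd.DataFrame({
    "DOB": pd.to_datetime(["1999-12-30", "2000-02-01"]),
    "Year of passing": pd.to_datetime(["2021-06-15", "2022-06-15"]),
})

# Format each datetime column as dd-mm-YYYY text.
for col in ["DOB", "Year of passing"]:
    df[col] = df[col].dt.strftime("%d-%m-%Y")

print(df)
```

Note that the columns are plain text after this step, so Excel will sort and filter them as strings rather than as dates.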

Using xlsxwriter to Align Left a Row

xlsxwriter has been pretty powerful and almost everything I want is working, but the following attempt to align left a single row doesn't seem to work.
stats = DataFrame(...)
xl_writer = ExcelWriter(r'U:\temp\test.xlsx')
stats.to_excel(xl_writer, 'Stats')
workbook = xl_writer.book
format_header = workbook.add_format({'align': 'left'})
stats_sheet = xl_writer.sheets['Stats']
stats_sheet.set_row(0, None, format_header)
See the XlsxWriter docs for Formatting of the Dataframe headers:
Pandas writes the dataframe header with a default cell format. Since it is a cell format it cannot be overridden using set_row(). If you wish to use your own format for the headings then the best approach is to turn off the automatic header from Pandas and write your own. For example...
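A minimal sketch of that approach, adapted to the names used in the question. The sample data and the temporary path are made up, and the column offset of 1 assumes the default single-level index is written to column A.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the stats data frame in the question.
stats = pd.DataFrame({"mean": [1.5, 2.5], "count": [10, 20]})
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")

with pd.ExcelWriter(path, engine="xlsxwriter") as xl_writer:
    # Skip pandas' own header and leave row 0 free for ours.
    stats.to_excel(xl_writer, sheet_name="Stats", startrow=1, header=False)

    workbook = xl_writer.book
    stats_sheet = xl_writer.sheets["Stats"]
    format_header = workbook.add_format({"align": "left", "bold": True})

    # Write each column name with the custom format; column 0 holds the index.
    for col_num, name in enumerate(stats.columns):
        stats_sheet.write(0, col_num + 1, name, format_header)
```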

Error on saving date format in file with pandas ExcelWriter

I'm trying to save an Excel file with a date format, and I get an error.
Here is my code:
import pandas as pd
from datetime import datetime, date
# df is a DataFrame with two columns: created_at (date format), name (number format)
writer = pd.ExcelWriter('graph_data.xlsx', engine='xlsxwriter', date_format='mm dd yyyy')
df.to_excel(writer, sheet_name='Name')
writer.close()  # writer.save() was removed in pandas 2.0
I obtain an Excel file like the following:
I can format the cells manually, but I would like to format them directly in the code.
From the docs:
If you require very controlled formatting of the dataframe output then you would probably be better off using Xlsxwriter directly with raw data taken from Pandas.
I suggest doing something like this:
workbook = writer.book
worksheet = writer.sheets['Name']
worksheet.set_column('A:A', 20)  # assuming the dates are in the first column
writer.close()  # writer.save() was removed in pandas 2.0
Complete example here
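For reference, a self-contained sketch along those lines. It assumes created_at holds pandas timestamps, in which case (as far as I can tell) pandas applies datetime_format rather than date_format, so both are set here. The sample data and the temporary path are made up.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data matching the two columns described in the question.
df = pd.DataFrame({
    "created_at": pd.to_datetime(["2016-10-04", "2016-10-05"]),
    "name": [1, 2],
})
path = os.path.join(tempfile.mkdtemp(), "graph_data.xlsx")

with pd.ExcelWriter(
    path,
    engine="xlsxwriter",
    datetime_format="mm dd yyyy",  # used for datetime/Timestamp values
    date_format="mm dd yyyy",      # used for plain datetime.date values
) as writer:
    df.to_excel(writer, sheet_name="Name", index=False)
    worksheet = writer.sheets["Name"]
    # Widen column A so the formatted dates are not shown as ####.
    worksheet.set_column("A:A", 20)
```

The context manager closes the writer on exit, so no explicit save or close call is needed.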

Python script to parse a big workbook

I have a very large Excel file and I need to automate a task I do every day: add rows to the bottom with the day's date, save a new workbook, crop the old rows, and save the result as a new file named with the day's date.
For example, today's file would contain only rows dated 04-10-2016, and the filename would be [sheetname]04102016H12, or [sheetname]04102016H16 if it is past 12 pm.
I've tried xlrd, doing this in VBA, and so on, but I can't get along with VBA and it is slow. So I'd rather use Python here: it is lightweight and does the job.
Anyway, so far I have done the following:
import xlsxwriter, datetime, xlrd
import pandas as pd
# Parsing main excel sheet to save the correct
with xlrd.open_workbook(r'D:/path/to/file/file.xlsx', on_demand=True) as xls:
    for sheet in xls.parse(xls.sheet_names([0])):
        dfs = pd.read_excel(xls, sheet, header = 1)
        now = datetime.date.today()
        df[df['Data'] != now]
        if datetime.time()<datetime.time(11,0,0,0):
            df.to_excel(r'W:\path\I\need'+str(sheet)+now+'H12.xlsx', index=False)
        else:
            df.to_excel(r'W:\path\I\need'+str(sheet)+now+'H16.xlsx', index=False)
Unfortunately, this does not separate the main file into as many files as worksheets the workbook contains. It outputs TypeError: 'list' object is not callable, regarding this in xls.parse(xls.sheet_names([0])).
Based on comments below I am updating my answer. Just do:
xls.sheet_names()[0]
However, if you want to loop through the sheets, then you may want all sheet names instead of just the first one.
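A rough sketch of looping over all sheets with pandas, writing one file per sheet. The function names split_workbook and suffix_for, the 12:00 cutoff (taken from the question text, not from its code), and the sample workbook are my assumptions; the date column name Data comes from the question.

```python
import datetime
import os
import tempfile

import pandas as pd


def suffix_for(now):
    """Return H12 before noon and H16 afterwards (assumed cutoff)."""
    return "H12" if now.time() < datetime.time(12, 0) else "H16"


def split_workbook(src, out_dir, now, date_col="Data"):
    """Write one file per sheet, keeping only rows dated like `now`."""
    written = []
    stamp = now.strftime("%d%m%Y")
    with pd.ExcelFile(src) as xls:
        # sheet_names is a list attribute here, so it is not called.
        for sheet in xls.sheet_names:
            df = xls.parse(sheet)
            same_day = pd.to_datetime(df[date_col]).dt.date == now.date()
            out = os.path.join(out_dir, f"{sheet}{stamp}{suffix_for(now)}.xlsx")
            df[same_day].to_excel(out, index=False)
            written.append(out)
    return written


# Build a hypothetical two-sheet workbook to try the function on.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "file.xlsx")
sample = pd.DataFrame({
    "Data": [datetime.datetime(2016, 10, 3), datetime.datetime(2016, 10, 4)],
    "value": [1, 2],
})
with pd.ExcelWriter(src) as writer:
    sample.to_excel(writer, sheet_name="A", index=False)
    sample.to_excel(writer, sheet_name="B", index=False)

files = split_workbook(src, tmp, datetime.datetime(2016, 10, 4, 9, 30))
print([os.path.basename(f) for f in files])
```

Passing the timestamp in as an argument, rather than calling today() inside the function, keeps the file naming easy to check.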
