Error on saving date format in file with pandas ExcelWriter - python

I'm trying to save an Excel file with a date format, but I get an error.
Here is my code:
import pandas as pd
from datetime import datetime, date
# df is a DataFrame with two columns: created_at (date format), name (number format)
writer = pd.ExcelWriter('graph_data.xlsx', engine='xlsxwriter', date_format='mm dd yyyy')
pd.DataFrame(df).to_excel(writer, sheet_name='Name')
writer.close()  # writer.save() in pandas versions before 2.0
I obtain an Excel file like the following:
I can format the cells manually, but I would like to format them directly in the code. How can I do that?

From the docs:
If you require very controlled formatting of the dataframe output then you would probably be better off using Xlsxwriter directly with raw data taken from Pandas.
So I suggest doing something like this:
workbook = writer.book
worksheet = writer.sheets['Name']
worksheet.set_column('A:A', 20)  # Assuming it is the first column
writer.close()  # writer.save() in pandas versions before 2.0
Complete example here
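A minimal end-to-end sketch of this suggestion, assuming xlsxwriter is installed and created_at is a datetime64 column. pandas applies datetime_format (not date_format) to datetime values, and a column that is too narrow is displayed as ##### in Excel, so the sketch sets both formats and widens the column. The sample data and the in-memory buffer are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({
    "created_at": pd.to_datetime(["2017-01-05", "2017-02-10"]),
    "name": [1, 2],
})

buffer = io.BytesIO()  # in-memory target; a file path works the same way
with pd.ExcelWriter(buffer, engine="xlsxwriter",
                    datetime_format="mm dd yyyy",
                    date_format="mm dd yyyy") as writer:
    df.to_excel(writer, sheet_name="Name", index=False)
    worksheet = writer.sheets["Name"]
    # created_at is column A because index=False; widen it so the
    # dates are not displayed as ##### in Excel.
    worksheet.set_column("A:A", 20)

xlsx_bytes = buffer.getvalue()
```

With the default index=True, the index is written to column A and created_at moves to column B, so the range passed to set_column would need to change accordingly.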

Related

Treat everything as raw string (even formulas) when reading into pandas from excel

I am handling text responses from surveys, and it is common to have responses that start with -, for example: -I am sad today.
Excel interprets such a cell as a formula and shows #NAME?
So when I import the Excel file into pandas using read_excel, it shows NaN.
Is there any way to force Excel to keep these values as raw strings instead of interpreting them as formulas?
I wrote a VBA macro that sets the entire column to text and clicks through every cell in the column, but that is slow when there are more than ten thousand rows.
I was hoping this could be done at the Python level instead. Any ideas?
I hope this works for you: use openpyxl to extract the Excel data and then convert it into a pandas DataFrame.
from openpyxl import load_workbook
import pandas as pd
# Take the active worksheet; with the default data_only=False, formula cells are read as their formula text
ws = load_workbook(filename='./formula_contains_raw.xlsx').active
rows = list(ws.values)
# The first row holds the column names; the remaining rows are the data
df = pd.DataFrame(rows[1:], columns=rows[0])
df.head()
It works for me using a CSV instead of an Excel file.
In the CSV file (opened in Excel), I need to select the option Formulas/Show Formulas, then save the file.
pd.read_csv('draft.csv')
Output:
Col1
0 hello
1 =-hello
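A small sketch of the same idea, assuming the survey responses are available as CSV text. Reading with dtype=str keeps every cell as a plain string, so a response beginning with - or = is never evaluated. The inline sample data is made up for illustration.

```python
import io
import pandas as pd

# Made-up CSV content standing in for draft.csv.
csv_text = "Col1\nhello\n=-hello\n-I am sad today\n"

# dtype=str keeps every column as text; keep_default_na=False stops
# pandas from turning strings such as "NA" into NaN.
df = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)
values = df["Col1"].tolist()
print(values)
```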

How to get rid of timestamps in front of Date, as Pandas adds timestamps to date columns after saving to Excel

I am a student and I am learning pandas.
I have created an Excel file named Student_Record.xlsx (using Microsoft Excel).
I wanted to create a new file using pandas:
import pandas as pd
df = pd.read_excel(r"C:\Users\sudarshan\Desktop\Student_Record.xlsx")
df.head()
df.to_excel(r"C:\Users\sudarshan\Desktop\Output.xlsx",index=False)
I opened the file in pandas and saved it back to Excel under a different name (file name = Output).
When I open the Output file in MS Excel, the DOB and YOP columns have a timestamp attached to the dates.
Please let me know how to write only the date. (I want the Output file and its contents to look exactly like the original file.)
Hope to get some help/support.
Thank you
Your DOB and Year of passing columns are probably of datetime type before they are saved to Excel. As a result, they are written with the full datetime representation, including the time part.
If you want the contents to look exactly like the original file, in dd-mm-YYYY format, you can convert these two columns to strings before saving to Excel:
df['DOB'] = df['DOB'].dt.strftime('%d-%m-%Y')
df['Year of passing'] = df['Year of passing'].dt.strftime('%d-%m-%Y')
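A self-contained sketch of this conversion, using hypothetical records in place of Student_Record.xlsx:

```python
import pandas as pd

# Hypothetical records; after read_excel, date cells typically
# arrive as datetime64 values.
df = pd.DataFrame({
    "Name": ["A", "B"],
    "DOB": pd.to_datetime(["2001-03-04", "1999-12-30"]),
})

# Convert to text so Excel shows exactly these characters,
# with no 00:00:00 part.
df["DOB"] = df["DOB"].dt.strftime("%d-%m-%Y")
dob_values = df["DOB"].tolist()
print(dob_values)
```

One trade-off of this approach is that the cells become text in Excel, so they no longer sort or filter as dates.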

Date format issue while writing data-frame to excel file using xlwings

I am trying to read an Excel file and write it to another Excel file using Python xlwings, but in the output file output1.xlsx, if the day value is less than 13, the day and month parts are swapped. If the day is 13 or greater, the date is the same as in the input file. The values in the DataFrame are correct, so the issue happens while writing the DataFrame to output1.xlsx with xlwings. With pandas.to_excel() the values are written to Excel in the correct format. Please help me write the DataFrame to Excel correctly with xlwings.
Issue:
Note: in the picture, the left side is the input data and the right side is the output. I have highlighted the wrong data in red; those are the dates whose day value is less than 13.
Expected output using xlwings:
Code:
import pandas as pd
import numpy as np
import xlwings as xw
df=pd.concat(pd.read_excel("input.xlsx",sheet_name=None,parse_dates=False,na_filter = False,dtype=str), ignore_index=True)
app = xw.App(visible=False)
# 1st output file using xlwings --> here the date values are wrong in the output file output1.xlsx
book = xw.Book("output1.xlsx")
sht = book.sheets("Sheet1")
sht.range('A1').options(index=False,dates=False).value=df
book.save()
book.close()
app.quit()
# 2nd output file using pandas to_excel() --> here the date values are correct
df.to_excel("output2.xlsx",index=False)
Files:
input files link

openpyxl how to set cell format as Date instead of Custom

I am using openpyxl and pandas to generate an Excel file, and need to have dates formatted as Date in Excel. The dates in the exported file are formatted correctly in dd/mm/yyyy format, but when I right-click on a cell and go to 'Format Cells' it shows Custom. Is there a way to change it to Date? Here is my code where I specify the date format.
writer = pd.ExcelWriter(dstfile, engine='openpyxl', date_format='dd/mm/yyyy')
I have also tried to set cell.number_format = 'dd/mm/yyyy', but I still get the Custom format in Excel.
The answer can be found in the comments of Converting Data to Date Type When Writing with Openpyxl.
Ensure you are writing a datetime.datetime object to the cell, then:
.number_format = 'mm/dd/yyyy;#' # notice the ';#'
e.g.,
import datetime
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws['A1'] = datetime.datetime(2021, 12, 25)
ws['A1'].number_format = 'yyyy-mm-dd;#'
wb.save(r'c:\data\test.xlsx')
n.b. these dates are still a bit 'funny' as they are not auto-magically grouped into months and years in pivot tables (if you like that sort of thing). In the pivot table, you can manually click on them and set the grouping though: https://support.microsoft.com/en-us/office/group-or-ungroup-data-in-a-pivottable-c9d1ddd0-6580-47d1-82bc-c84a5a340725
You might have to convert them to datetime objects in Python if they are saved as strings in the data frame. One approach is to iterate over the cells and do the conversion after using ExcelWriter:
from datetime import datetime
# cell is an openpyxl cell, e.g. ws['A2']
cell.value = datetime.strptime('30/12/1999', '%d/%m/%Y')
cell.number_format = 'dd/mm/yyyy'
A better approach is to convert that column in the data frame before writing. You can use the to_datetime function in pandas for that.
See this answer for converting the whole column in the dataframe.
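A minimal sketch of the column conversion with to_datetime, assuming the column holds day-first strings; the column name and values are hypothetical:

```python
import pandas as pd

# Hypothetical column of day-first date strings.
df = pd.DataFrame({"date": ["30/12/1999", "05/01/2000"]})

# An explicit format avoids day/month ambiguity for values
# such as 05/01/2000.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
first = df["date"].iloc[0]
second = df["date"].iloc[1]
```

Once the column holds real datetime values, the date_format and datetime_format arguments of ExcelWriter can take effect when the frame is written.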

fixing improper ID formatting

Background: The following code works to export a pandas df as an excel file:
import pandas as pd
import xlsxwriter
writer = pd.ExcelWriter('Excel_File.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.close()  # writer.save() in pandas versions before 2.0
Problem:
My ID column in the Excel file shows up as
8.96013E+17 instead of 896013350764773376
I tried to change it in Excel using Format Cells and the Zip Code format, but it still gives the wrong ID: 896013350764773000
Question: Using Excel or Python code, how do I keep my original ID, 896013350764773376?
Excel uses IEEE 754 doubles to represent numbers and keeps only 15 significant digits of precision. So you cannot represent an 18-digit ID as a number in Excel. You need to convert it to a string to keep all the digits.
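A rough illustration of the effect, using only the standard library. Rounding to 15 significant digits with decimal is an approximation of what Excel displays, not Excel's actual implementation:

```python
from decimal import Context

original_id = 896013350764773376

# Approximate model: keep 15 significant digits, as Excel does for numbers.
as_excel_number = int(Context(prec=15).create_decimal(original_id))

# Stored as text, every digit survives.
as_text = str(original_id)

print(as_excel_number)  # 896013350764773000
print(as_text)          # 896013350764773376
```

In pandas, the equivalent conversion would be something like df['ID'] = df['ID'].astype(str) before calling to_excel, assuming the column is named ID.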
