Does xlrd retrieve date variables correctly from Excel? - python

I was trying to read a multiple sheet Excel workbook into SPSS when I stumbled upon the following problem: when I read a date variable from Excel into Python with xlrd, it seems to add 2 days to the date. Or perhaps my conversion from the Excel format to a more human friendly representation is not correct. Could anybody tell me what's wrong in the code below?
import xlwt,datetime
wb=xlwt.Workbook()
ws=wb.add_sheet("date_1")
fmt = xlwt.easyxf(num_format_str='M/D/YY')
ws.write(0,0,datetime.datetime.now(),fmt)
wb.save(r"d:\temp\datetest.xls")
#Now open Excel file manually -> date is correct
import xlrd
wb=xlrd.open_workbook(r"d:\temp\datetest.xls")
ws=wb.sheets()[0]
Data = ws.row_values(0)[0]
print datetime.datetime(1900,1,1,0,0,0)+datetime.timedelta(days=Data)
#Now date is 2 days off

I'm pretty sure that xlrd is able to tell when the cell is formatted in Excel as a date, and make the conversion to Python date object on its own. It's not foolproof, though.
Your issue is probably caused by starting with datetime.datetime(1900,1,1,0,0,0) and adding the timedelta to it. You might want to try:
datetime.date(1899,12,31) + datetime.timedelta(days=Data)
This should avoid (a) the one day you're adding by starting at 1/1/1900 and (b) the one day you're adding (I'm guessing) by using a datetime object rather than a date, which may be pushing it over into the next day. This is just a guess, though.
Alternatively, if you already know that it's consistently two days, why don't you just do this?
print datetime.datetime(1900,1,1,0,0,0) + datetime.timedelta(days=Data - 2)

Nope. There are two things going on here.
1 - in Excel, "1" rather than "0" corresponds to January 1, 1900
2 - Excel includes Feb 29, 1900 (which never occurred), accounting for the second day of difference. This is done on purpose for backward compatibility reasons.
Taking these two points into account seems to solve all issues.
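A minimal sketch of those two corrections, written for Python 3 with only the standard library. The serial number 41274 is just an example value, and the shortcut only holds for serials of 61 or more (dates from 1900-03-01 onward), since smaller serials fall on or before Excel's fictitious Feb 29, 1900.

```python
from datetime import datetime, timedelta

serial = 41274  # example Excel serial number (1900 date system)

# Naive conversion: treats serial 0 as 1900-01-01 and ignores the phantom leap day.
naive = datetime(1900, 1, 1) + timedelta(days=serial)

# Correction 1: serial 1 (not 0) is 1900-01-01, so subtract one day.
# Correction 2: Excel counts a non-existent 1900-02-29, so subtract another.
corrected = datetime(1900, 1, 1) + timedelta(days=serial - 2)

print(naive.date())      # two days too late
print(corrected.date())  # the date Excel displays
```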

Earlier answers are only partially correct.
Extra info:
There are TWO Excel date systems: 1900 (Windows) and 1904 (Mac).
1900 system: earliest non-ambiguous datetime is 1900-03-01T00:00:00, represented as 61.0.
1904 system: earliest non-ambiguous datetime is 1904-01-02T00:00:00, represented as 1.0.
Which date system is in effect is available in xlrd from Book.datemode.
xlrd supplies a function called xldate_as_tuple that takes care of all of the above. This code:
print datum
print datetime.datetime(1900, 1, 1) + datetime.timedelta(days=datum)
print datetime.datetime(1900, 3, 1) + datetime.timedelta(days=datum - 61)
tup = xlrd.xldate_as_tuple(datum, wb.datemode)
print tup
print datetime.datetime(*tup)
produces:
41274.4703588
2013-01-02 11:17:19
2012-12-31 11:17:19
(2012, 12, 31, 11, 17, 19)
2012-12-31 11:17:19
when wb.datemode is 0 (1900).
This information is all contained in the documentation that is distributed with xlrd.
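For readers without xlrd at hand, the same idea can be sketched with the standard library alone (Python 3). This is not xlrd's implementation: excel_serial_to_datetime is a hypothetical helper, and its datemode argument merely mirrors the 0 (1900) / 1 (1904) convention described above. It rejects 1900-system serials below 61 rather than guessing.

```python
from datetime import datetime, timedelta

# Effective day-zero for each date system. For the 1900 system this is
# 1899-12-30, which absorbs both the 1-based count and the phantom 1900-02-29.
_EPOCHS = {0: datetime(1899, 12, 30), 1: datetime(1904, 1, 1)}

def excel_serial_to_datetime(serial, datemode=0):
    """Convert an Excel serial number to a datetime (hypothetical helper)."""
    if datemode == 0 and serial < 61:
        raise ValueError("ambiguous: 1900-system serials below 61")
    return _EPOCHS[datemode] + timedelta(days=serial)

# Float serials carry a few microseconds of rounding noise, so drop them for display.
print(excel_serial_to_datetime(41274.4703588).replace(microsecond=0))
```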

Related

Strange date format conversion

I have a db with dates in this format: 44378.097351.
If I put that in Excel and choose the date format, it says it is 01/07/2021 2:20:11, which is correct.
The problem is that now I need to do a plot with python with these dates and I extract the data directly from the db but I don't know how to convert it to an understandable date format.
Any suggestions?
Thank you
From Date systems in Excel, we can find how such a number was converted to a datetime.
The 1900 date system
In the 1900 date system, dates are calculated by using January 1, 1900, as a starting point. When you enter a date, it is converted into a serial number that represents the number of days elapsed since January 1, 1900. For example, if you enter July 5, 2011, Excel converts the date to the serial number 40729. This is the default date system in Excel for Windows, Excel 2016 for Mac, and Excel for Mac 2011. If you choose to convert the pasted data, Excel adjusts the underlying values, and the pasted dates match the dates that you copied.
from datetime import timedelta, datetime
dt = datetime(1900, 1, 1) + timedelta(44378.097351 - 2)
This format corresponds to the number of days since 01/01/1900.
you should check out this or this post to find an adequate solution ;)
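As a rough standard-library sketch of the conversion above (Python 3), assuming the database values are Excel serial numbers in the 1900 date system: adding the value as days to 1899-12-30 is equivalent to starting at 1900-01-01 and subtracting 2. The name serial_to_datetime is made up for this illustration.

```python
from datetime import datetime, timedelta

EXCEL_1900_ZERO = datetime(1899, 12, 30)  # same as datetime(1900, 1, 1) minus 2 days

def serial_to_datetime(value):
    """Convert one Excel-style serial number (days, with fraction) to a datetime."""
    return EXCEL_1900_ZERO + timedelta(days=value)

values = [44378.097351]  # e.g. a column fetched from the database
dates = [serial_to_datetime(v) for v in values]
print(dates[0].strftime("%d/%m/%Y %H:%M:%S"))
```

The resulting list of datetime objects should be usable directly as the x-axis values in most plotting libraries.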

Python convert from ordinal time with milliseconds [duplicate]

I just started moving from Matlab to Python 2.7 and I have some trouble reading my .mat-files. Time information is stored in Matlab's datenum format. For those who are not familiar with it:
A serial date number represents a calendar date as the number of days that has passed since a fixed base date. In MATLAB, serial date number 1 is January 1, 0000.
MATLAB also uses serial time to represent fractions of days beginning at midnight; for example, 6 p.m. equals 0.75 serial days. So the string '31-Oct-2003, 6:00 PM' in MATLAB is date number 731885.75.
(taken from the Matlab documentation)
I would like to convert this to Pythons time format and I found this tutorial. In short, the author states that
If you parse this using python's datetime.fromordinal(731965.04835648148) then the result might look reasonable [...]
(before any further conversions), which doesn't work for me, since datetime.fromordinal expects an integer:
>>> datetime.fromordinal(731965.04835648148)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: integer argument expected, got float
While I could just round them down for daily data, I actually need to import minutely time series. Does anyone have a solution for this problem? I would like to avoid reformatting my .mat files since there's a lot of them and my colleagues need to work with them as well.
If it helps, someone else asked for the other way round. Sadly, I'm too new to Python to really understand what is happening there.
/edit (2012-11-01): This has been fixed in the tutorial posted above.
You link to the solution, but it has a small issue. The solution is this:
python_datetime = datetime.fromordinal(int(matlab_datenum)) + timedelta(days=matlab_datenum%1) - timedelta(days = 366)
a longer explanation can be found here
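Wrapped in a function, the one-liner above can be checked against the example from the Matlab documentation quoted in the question ('31-Oct-2003, 6:00 PM' is date number 731885.75). This is a Python 3 sketch; the function name is chosen only for illustration.

```python
from datetime import datetime, timedelta

def matlab_datenum_to_datetime(matlab_datenum):
    """Convert a Matlab datenum (days, with fraction) to a Python datetime."""
    whole_days = datetime.fromordinal(int(matlab_datenum))  # integer part
    day_fraction = timedelta(days=matlab_datenum % 1)       # time of day
    # Matlab's day 1 is 1 Jan 0000, Python's is 1 Jan 0001: 366 days apart.
    return whole_days + day_fraction - timedelta(days=366)

print(matlab_datenum_to_datetime(731885.75))
```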
Using pandas, you can convert a whole array of datenum values with fractional parts:
import numpy as np
import pandas as pd
datenums = np.array([737125, 737124.8, 737124.6, 737124.4, 737124.2, 737124])
timestamps = pd.to_datetime(datenums-719529, unit='D')
The value 719529 is the datenum value of the Unix epoch start (1970-01-01), which is the default origin for pd.to_datetime().
I used the following Matlab code to set this up:
datenum('1970-01-01') % gives 719529
datenums = datenum('06-Mar-2018') - linspace(0,1,6) % test data
datestr(datenums) % human readable format
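The offset 719529 can also be cross-checked without Matlab, using only the standard library (Python 3). The sketch relies on the 366-day difference between Matlab's datenum and Python's proleptic Gregorian ordinal; the constant name is an assumption made for readability.

```python
from datetime import date, datetime, timedelta

MATLAB_TO_PYTHON_ORDINAL = 366  # Matlab day 1 is 1 Jan 0000; Python day 1 is 1 Jan 0001

# datenum of the Unix epoch, derived from Python's ordinal
unix_epoch_datenum = date(1970, 1, 1).toordinal() + MATLAB_TO_PYTHON_ORDINAL
print(unix_epoch_datenum)

# Same arithmetic as the pandas call, for a single whole-day value
datenum = 737125  # datenum('06-Mar-2018') in the Matlab snippet
timestamp = datetime(1970, 1, 1) + timedelta(days=datenum - unix_epoch_datenum)
print(timestamp)
```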
Just in case it's useful to others, here is a full example of loading time series data from a Matlab mat file, converting a vector of Matlab datenums to a list of datetime objects using carlosdc's answer (defined as a function), and then plotting as time series with Pandas:
from scipy.io import loadmat
import pandas as pd
import datetime as dt
import urllib
# In Matlab, I created this sample 20-day time series:
# t = datenum(2013,8,15,17,11,31) + [0:0.1:20];
# x = sin(t)
# y = cos(t)
# plot(t,x)
# datetick
# save sine.mat
urllib.urlretrieve('http://geoport.whoi.edu/data/sine.mat','sine.mat');
# If you don't use squeeze_me=True, then Pandas doesn't like
# the arrays in the dictionary, because they look like arrays
# of 1-element arrays. squeeze_me=True fixes that.
mat_dict = loadmat('sine.mat',squeeze_me=True)
# make a new dictionary with just dependent variables we want
# (we handle the time variable separately, below)
my_dict = { k: mat_dict[k] for k in ['x','y']}
def matlab2datetime(matlab_datenum):
    day = dt.datetime.fromordinal(int(matlab_datenum))
    dayfrac = dt.timedelta(days=matlab_datenum%1) - dt.timedelta(days = 366)
    return day + dayfrac
# convert Matlab variable "t" into list of python datetime objects
my_dict['date_time'] = [matlab2datetime(tval) for tval in mat_dict['t']]
# print df
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 201 entries, 2013-08-15 17:11:30.999997 to 2013-09-04 17:11:30.999997
Data columns (total 2 columns):
x 201 non-null values
y 201 non-null values
dtypes: float64(2)
# plot with Pandas
df = pd.DataFrame(my_dict)
df = df.set_index('date_time')
df.plot()
Here's a way to convert these using numpy.datetime64, rather than datetime.
origin = np.datetime64('0000-01-01', 'D') - np.timedelta64(1, 'D')
date = serdate * np.timedelta64(1, 'D') + origin
This works for serdate either a single integer or an integer array.
Just building on and adding to previous comments. The key is in the day counting as carried out by the method toordinal and constructor fromordinal in the class datetime and related subclasses. For example, from the Python Library Reference for 2.7, one reads that fromordinal
Return the date corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1. ValueError is raised unless 1 <= ordinal <= date.max.toordinal().
However, Matlab's count also includes year 0, which is one more (leap) year, so there are 366 additional days to take into account. (Year 0 counts as a leap year, just like 2016, which is exactly 504 four-year cycles later.)
These are two functions that I have been using for similar purposes:
import datetime
def datetime_pytom(d,t):
    '''
    Input
        d   Date as an instance of type datetime.date
        t   Time as an instance of type datetime.time
    Output
        The fractional day count since 0-Jan-0000 (proleptic ISO calendar)
        This is the 'datenum' datatype in matlab
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence an increase of 366 days, for year 0 AD was a leap year
    '''
    dd = d.toordinal() + 366
    tt = datetime.timedelta(hours=t.hour,minutes=t.minute,
                            seconds=t.second)
    tt = datetime.timedelta.total_seconds(tt) / 86400
    return dd + tt
def datetime_mtopy(datenum):
    '''
    Input
        The fractional day count according to datenum datatype in matlab
    Output
        The date and time as an instance of type datetime in python
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence a reduction of 366 days, for year 0 AD was a leap year
    '''
    ii = datetime.datetime.fromordinal(int(datenum) - 366)
    ff = datetime.timedelta(days=datenum%1)
    return ii + ff
Hope this helps and happy to be corrected.

Date change halfway through csv from YYYY-MM-DD to DD/MM/YY and after switch datetime no longer works

I have a csv of daily temperature data with 3 columns: dates, daily maximum temperatures, and daily minimum temperatures. I attached it here so you can see what I mean.
I am trying to break this data set into smaller datasets of 30 year periods. For the first few years of Old.csv the dates are entered in YYYY-MM-DD but then switch to DD/MM/YY in 1900. After this date format switches my code to split the years no longer works. Here is what I'm using:
df2 = pd.read_csv("Old.csv")
test = df2[
(pd.to_datetime(df2['Date']) >
pd.to_datetime('1897-01-01')) &
(pd.to_datetime(df2['Date']) <
pd.to_datetime('1899-12-31'))
]
and it works... BUT when I switch to 1900 and beyond it stops. So this one doesn't work:
test = df2[
(pd.to_datetime(df2['Date']) >
pd.to_datetime('1900-01-01')) &
(pd.to_datetime(df2['Date']) <
pd.to_datetime('1905-12-31'))
]
The above code gives me an empty data set, despite working pre 1900. I'm assuming this is some sort of a formatting issue but I thought that using ".to_datetime" would fix that. I also tried this:
df2['Date']=pd.to_datetime(df2['Date'])
to reformat the entire list before I ran the code above but it still didn't work. The other interesting thing is that I have a separate csv with dates consistently entered as MM/DD/YY and that one works with the code above. Could it be an issue with the turn of the century? Does anyone know how to fix this?
You're dealing with time/date data in different formats; for this you could use a more flexible parser, for instance dateutil.parser.
Example:
>>> from dateutil.parser import parse
>>> df
Date
0 1897-01-01
1 1899-12-31
2 01/01/00
>>> df.Date.apply(parse)
0 1897-01-01 00:00:00
1 1899-12-31 00:00:00
2 2000-01-01
Name: Date, dtype: datetime64[ns]
and use your function on the parsed data.
As remarked in the comment above, it's still not clear whether year "00" refers to year 1900 or 2000, but maybe you can infer that from the context of the csv file.
To change all years in the 'DD/MM/YY' format to 1900 dates you could define your own parse function
>>> def my_parse(d):
...     if d[-3]=='/':
...         d = d[:-3]+'/19'+d[-2:]
...     return parse(d)
>>> df.Date.apply(my_parse)
0 1897-01-01
1 1899-12-31
2 1900-01-01
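If dateutil is not available, a similar idea can be sketched with the standard library alone (Python 3). The sketch assumes, as in the question, that slash-separated dates are DD/MM/YY and that every two-digit year belongs to the 1900s; parse_mixed_date is a made-up name.

```python
from datetime import datetime

def parse_mixed_date(text):
    """Parse 'YYYY-MM-DD' or 'DD/MM/YY', reading two-digit years as 19YY."""
    text = text.strip()
    if "/" in text:
        day, month, two_digit_year = text.split("/")
        # Assumption: every two-digit year is in the 1900s.
        return datetime(1900 + int(two_digit_year), int(month), int(day))
    return datetime.strptime(text, "%Y-%m-%d")

for raw in ["1897-01-01", "1899-12-31", "01/01/00", "31/12/68"]:
    print(raw, "->", parse_mixed_date(raw).date())
```

With pandas, such a function could be applied to the whole column, for example via df2['Date'].apply(parse_mixed_date), before filtering by year.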
Python is reading 00 as 2000 instead of 1900. So I tried this to edit 00 to read as 1900:
df2.Date.dt.year.replace(2000, 1900, inplace=True)
But python returned an error that said dates are not directly editable. So I then changed them to a string and edited that way using:
df2['Date'] = df2['Date'].str.replace(r'00', '1900')
This works but now I need to find a way to loop through 1896-1968 without having to type that line out every time.

More Efficient way to code this?

I'm supposed to get the Excel date of Dec 1 2011 and the day of the week it falls on, and print it out in this format: The Excel date for Thursday, 2011-Dec-1 is 40878. I've been able to get both, but I don't think my method of getting the day using if statements is the best approach. This is my original script file, so please forgive the roughness. I've checked and I know my solutions are right. My only problem is finding a more efficient way to get the day; any suggestions on how to get the month into my final output are also welcome.
We haven't done the datetime module yet, so I can't experiment with that.
Here is my code:
Year=2011
Month=12
Day=1
Y2=Year-1900
en=int((14-Month)/12)
Y3=Y2-en
m2=Month+12*en
l=1+min(Y3,0)+int(Y3/4)-int(Y3/100)+int((Y3+300)/400)
d1=int(-1.63+(m2-1)*30.6)
import math
d2=math.floor(Day+Y3*365+l+d1) #d2 is the final excel date.
Day_Of_Week=((d2%7)-1)
print "%s"%(d2)
if Day_Of_Week==0:
    print "sun"
if Day_Of_Week==1:
    print "mon"
if Day_Of_Week==2:
    print "tue"
if Day_Of_Week==3:
    print "wed"
if Day_Of_Week==4:
    print "thur"
if Day_Of_Week==5:
    print "fri"
if Day_Of_Week==6:
    print "sat"
Any Help will be appreciated :)
How about:
days = ['sun', 'mon', 'tue', 'wed', 'thur', 'fri', 'sat']
print days[int(Day_Of_Week)]
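To make the lookup self-contained, here is a small Python 3 sketch. It assumes a 1900-system serial number of 61 or more and folds the "- 1" into the modulo so the index always lands in the range 0 to 6; weekday_name is a hypothetical helper.

```python
DAYS = ['sun', 'mon', 'tue', 'wed', 'thur', 'fri', 'sat']

def weekday_name(excel_date):
    """Return the day name for an Excel serial number (1900 date system)."""
    # Serial 1 maps to index 0, serial 2 to index 1, and so on, wrapping every 7 days.
    return DAYS[(int(excel_date) - 1) % 7]

print(weekday_name(40878))  # the serial for 2011-Dec-1 from the question
```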
Also take a look at this: How do I read a date in Excel format in Python?
"I'm supposed to get the excel date of Dec 1 2011": There is no such thing as "the Excel date". There are two date systems in use by Excel, one where the epoch is in 1900 [the default in Windows Excel] and the other using 1904 [the default in Excel for the Mac].
See the xlrd documentation; there's a section up front about dates, and check out the functions that have xldate in their names.
>>> import xlrd
>>> xlrd.xldate.xldate_from_date_tuple((2011,12, 1), 0) # Windows origin
40878.0
>>> xlrd.xldate.xldate_from_date_tuple((2011,12, 1), 1) # Mac origin
39416.0
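Without xlrd, the same two numbers can be reproduced with plain date arithmetic. This Python 3 sketch is not how xlrd computes them; date_to_excel_serial is a hypothetical helper that follows the 0/1 datemode convention used above and is only meant for dates from 1900-03-01 onward in the 1900 system.

```python
from datetime import date

# Effective day-zero of each Excel date system (0 = 1900, 1 = 1904).
_DAY_ZERO = {0: date(1899, 12, 30), 1: date(1904, 1, 1)}

def date_to_excel_serial(year, month, day, datemode=0):
    """Return the Excel serial number of a calendar date (hypothetical helper)."""
    return (date(year, month, day) - _DAY_ZERO[datemode]).days

print(date_to_excel_serial(2011, 12, 1, 0))  # Windows origin
print(date_to_excel_serial(2011, 12, 1, 1))  # Mac origin
```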
Thanks for all your help. I was able to do it in a better way, but couldn't post it until our assignments had been graded.
This is what I did:
from math import floor
def calcExcelDate(Year, Month,Day):
    Yr_Offset=Year-1900 #Determines year offset from starting point.
    Early_Mnth_Correctn=int((14-Month)/12)
    #Early month correction:makes the year have 14 months so the leap day is added at the end of the year
    DateCorrector=(Yr_Offset)-(Early_Mnth_Correctn) #Corrects Date
    MonthCorrector=Month+12*Early_Mnth_Correctn #Corrects Month
    Leapyr_Calc=1+min(DateCorrector,0)+int(DateCorrector/4)-int(DateCorrector/100)+int((DateCorrector+300)/400)
    #calculates no of leap years since starting point
    char=int(floor(-1.63+(MonthCorrector-1)*30.6))
    #determines the number of days preceding the given month in a non leap year.
    Excel_Date=(Day+DateCorrector*365+Leapyr_Calc+char )
    Days=["Monday","Tuesday","Wednesday","Thursday","Friday","saturday","sunday"]
    Months=["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
    Offset=2
    dayNo=(Excel_Date-Offset)%7
    dayOfWk=Days[dayNo]
    return "The excel date of %r %r-%r-%r is %r"%(dayOfWk,Day,Months[Month-1],Year,Excel_Date)

How to convert the integer date format into YYYYMMDD?

Python and Matlab quite often have integer date representations as follows:
733828.0
733829.0
733832.0
733833.0
733834.0
733835.0
733836.0
733839.0
733840.0
733841.0
These numbers correspond to some dates this year. Do you guys know which function can convert them back to YYYYMMDD format?
thanks a million!
The datetime.datetime class can help you here. The following works, if those values are treated as integer days (you don't specify what they are).
>>> from datetime import datetime
>>> dt = datetime.fromordinal(733828)
>>> dt
datetime.datetime(2010, 2, 25, 0, 0)
>>> dt.strftime('%Y%m%d')
'20100225'
You show the values as floats, and the above doesn't take floats. If you can give more detail about what the data is (and where it comes from) it will be possible to give a more complete answer.
Since a Python example has already been shown, here is the Matlab one:
>> datestr(733828, 'yyyymmdd')
ans =
20090224
Also, note that while looking similar these are actually different things in Matlab and Python:
Matlab
A serial date number represents the whole and fractional number of days
from a specific date and time, where datenum('Jan-1-0000 00:00:00') returns
the number 1. (The year 0000 is merely a reference point and is not intended
to be interpreted as a real year in time.)
Python, datetime.date.fromordinal
Return the date corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1.
So they would differ by 366 days, which is apparently the length of the year 0.
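The 366-day gap can be made concrete with a short Python 3 sketch that formats the same number under both interpretations. The helper name and its matlab flag are assumptions for illustration only; the fractional part is added as a time of day so float inputs are accepted.

```python
from datetime import datetime, timedelta

def day_number_to_yyyymmdd(value, matlab=False):
    """Format a day count as YYYYMMDD (hypothetical helper).

    matlab=False: value is a Python proleptic Gregorian ordinal.
    matlab=True:  value is a Matlab datenum, 366 days ahead of the ordinal.
    """
    ordinal = int(value) - (366 if matlab else 0)  # fromordinal needs an int
    moment = datetime.fromordinal(ordinal) + timedelta(days=value % 1)
    return moment.strftime('%Y%m%d')

print(day_number_to_yyyymmdd(733828.0))               # read as a Python ordinal
print(day_number_to_yyyymmdd(733828.0, matlab=True))  # read as a Matlab datenum
```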
Dates like 733828.0 are Rata Die dates, counted from January 1, 1 A.D. (and decimal fraction of days). They may be UTC or by your timezone.
Julian Dates, used mostly by astronomers, count the days (and decimal fraction of days) since January 1, 4713 BC Greenwich noon. Julian date is frequently confused with Ordinal date, which is the date count from January 1 of the current year (Feb 2 = ordinal day 33).
So datetime is calling these things ordinal dates, but I think this only makes sense locally, in the world of python.
Is 733828.0 a timestamp? If so, you can do the following:
import datetime as dt
dt.date.fromtimestamp(733828.0).strftime('%Y%m%d')
PS
I think Peter Hansen is right :)
I am not a native English speaker. Just trying to help. I don't quite know the difference between a timestamp and an ordinal :(
