Python convert from ordinal time with milliseconds [duplicate] - python

I just started moving from Matlab to Python 2.7 and I have some trouble reading my .mat-files. Time information is stored in Matlab's datenum format. For those who are not familiar with it:
A serial date number represents a calendar date as the number of days that has passed since a fixed base date. In MATLAB, serial date number 1 is January 1, 0000.
MATLAB also uses serial time to represent fractions of days beginning at midnight; for example, 6 p.m. equals 0.75 serial days. So the string '31-Oct-2003, 6:00 PM' in MATLAB is date number 731885.75.
(taken from the Matlab documentation)
I would like to convert this to Python's time format and I found this tutorial. In short, the author states that
If you parse this using python's datetime.fromordinal(731965.04835648148) then the result might look reasonable [...]
(before any further conversions), which doesn't work for me, since datetime.fromordinal expects an integer:
>>> datetime.fromordinal(731965.04835648148)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: integer argument expected, got float
While I could just round them down for daily data, I actually need to import minutely time series. Does anyone have a solution for this problem? I would like to avoid reformatting my .mat files since there's a lot of them and my colleagues need to work with them as well.
If it helps, someone else asked for the other way round. Sadly, I'm too new to Python to really understand what is happening there.
/edit (2012-11-01): This has been fixed in the tutorial posted above.

The solution you link to has a small issue. The corrected conversion is:
python_datetime = datetime.fromordinal(int(matlab_datenum)) + timedelta(days=matlab_datenum%1) - timedelta(days = 366)
a longer explanation can be found here
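As a quick check, the one-liner above can be wrapped in a function and run against the example from the MATLAB documentation quoted in the question. This is only a sketch; the function name `datenum_to_datetime` is made up for illustration and the result is a naive (timezone-unaware) datetime.

```python
from datetime import datetime, timedelta

def datenum_to_datetime(matlab_datenum):
    """Convert a MATLAB datenum (days, possibly fractional) to a naive datetime."""
    whole_days = int(matlab_datenum)               # calendar day
    day_fraction = matlab_datenum - whole_days     # time of day as a fraction of a day
    return (datetime.fromordinal(whole_days)
            + timedelta(days=day_fraction)
            - timedelta(days=366))                 # MATLAB counts from year 0, Python from year 1

print(datenum_to_datetime(731885.75))  # 2003-10-31 18:00:00
```

Because the fraction is passed to `timedelta`, the time of day survives down to microsecond resolution, which is enough for minutely data.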

Using pandas, you can convert a whole array of datenum values with fractional parts:
import numpy as np
import pandas as pd
datenums = np.array([737125, 737124.8, 737124.6, 737124.4, 737124.2, 737124])
timestamps = pd.to_datetime(datenums-719529, unit='D')
The value 719529 is the datenum value of the Unix epoch start (1970-01-01), which is the default origin for pd.to_datetime().
I used the following Matlab code to set this up:
datenum('1970-01-01') % gives 719529
datenums = datenum('06-Mar-2018') - linspace(0,1,6) % test data
datestr(datenums) % human readable format

Just in case it's useful to others, here is a full example of loading time series data from a Matlab mat file, converting a vector of Matlab datenums to a list of datetime objects using carlosdc's answer (defined as a function), and then plotting as time series with Pandas:
from scipy.io import loadmat
import pandas as pd
import datetime as dt
import urllib
# In Matlab, I created this sample 20-day time series:
# t = datenum(2013,8,15,17,11,31) + [0:0.1:20];
# x = sin(t)
# y = cos(t)
# plot(t,x)
# datetick
# save sine.mat
urllib.urlretrieve('http://geoport.whoi.edu/data/sine.mat','sine.mat');
# If you don't use squeeze_me=True, then Pandas doesn't like
# the arrays in the dictionary, because they look like arrays
# of 1-element arrays. squeeze_me=True fixes that.
mat_dict = loadmat('sine.mat',squeeze_me=True)
# make a new dictionary with just dependent variables we want
# (we handle the time variable separately, below)
my_dict = { k: mat_dict[k] for k in ['x','y']}
def matlab2datetime(matlab_datenum):
    day = dt.datetime.fromordinal(int(matlab_datenum))
    dayfrac = dt.timedelta(days=matlab_datenum%1) - dt.timedelta(days = 366)
    return day + dayfrac
# convert Matlab variable "t" into list of python datetime objects
my_dict['date_time'] = [matlab2datetime(tval) for tval in mat_dict['t']]
# build the DataFrame, index it by time, and plot with Pandas
df = pd.DataFrame(my_dict)
df = df.set_index('date_time')
df.plot()
# print df
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 201 entries, 2013-08-15 17:11:30.999997 to 2013-09-04 17:11:30.999997
Data columns (total 2 columns):
x 201 non-null values
y 201 non-null values
dtypes: float64(2)
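The `17:11:30.999997` in the output above is a floating-point artifact: the MATLAB value was created with 31 whole seconds. If millisecond resolution is all you need, one option is to round each converted datetime. The helper below is a sketch with a made-up name (`round_to_millisecond`), not part of any library.

```python
from datetime import datetime, timedelta

def round_to_millisecond(moment):
    """Round a datetime to the nearest millisecond (halves round up)."""
    rounded_us = (moment.microsecond + 500) // 1000 * 1000
    # Adding a timedelta lets a carry roll over into seconds, minutes, and so on.
    return moment + timedelta(microseconds=rounded_us - moment.microsecond)

print(round_to_millisecond(datetime(2013, 8, 15, 17, 11, 30, 999997)))
# 2013-08-15 17:11:31
```

It could be applied inside the list comprehension, for example `round_to_millisecond(matlab2datetime(tval))`.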

Here's a way to convert these using numpy.datetime64, rather than datetime.
import numpy as np
origin = np.datetime64('0000-01-01', 'D') - np.timedelta64(1, 'D')
date = serdate * np.timedelta64(1, 'D') + origin
This works when serdate is either a single integer or an integer array.

Just building on and adding to previous comments. The key is in the day counting as carried out by the method toordinal and constructor fromordinal in the class datetime and related subclasses. For example, from the Python Library Reference for 2.7, one reads that fromordinal
Return the date corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1. ValueError is raised unless 1 <= ordinal <= date.max.toordinal().
However, MATLAB starts counting one year earlier, at year 0, which is a leap year in the proleptic calendar, so 366 extra days need to be taken into account. (Year 0 is a leap year just like 2016, which lies exactly 504 four-year cycles later.)
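The 366-day offset can be checked with a few lines of arithmetic against values that appear elsewhere on this page (719529 for 1970-01-01 and 731885 for 31-Oct-2003). The constant name `MATLAB_OFFSET` is only illustrative.

```python
from datetime import date

# Python ordinal 1 is 0001-01-01, while MATLAB datenum 1 is 0000-01-01.
# Year 0 has 366 days, so datenum = Python ordinal + 366.
MATLAB_OFFSET = 366

print(date(1, 1, 1).toordinal())                        # 1
print(date(1970, 1, 1).toordinal() + MATLAB_OFFSET)     # 719529
print(date(2003, 10, 31).toordinal() + MATLAB_OFFSET)   # 731885
```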
These are two functions that I have been using for similar purposes:
import datetime
def datetime_pytom(d, t):
    '''
    Input
        d   Date as an instance of type datetime.date
        t   Time as an instance of type datetime.time
    Output
        The fractional day count since 0-Jan-0000 (proleptic ISO calendar)
        This is the 'datenum' datatype in matlab
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence an increase of 366 days, for year 0 AD was a leap year
    '''
    dd = d.toordinal() + 366
    tt = datetime.timedelta(hours=t.hour, minutes=t.minute,
                            seconds=t.second)
    tt = tt.total_seconds() / 86400
    return dd + tt
def datetime_mtopy(datenum):
    '''
    Input
        The fractional day count according to datenum datatype in matlab
    Output
        The date and time as an instance of type datetime in python
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence a reduction of 366 days, for year 0 AD was a leap year
    '''
    ii = datetime.datetime.fromordinal(int(datenum) - 366)
    ff = datetime.timedelta(days=datenum % 1)
    return ii + ff
Hope this helps and happy to be corrected.

Related

Why does pandas interpret Aug-30 as 1930-08, but not 2030-08? [duplicate]

I'm coming across something that is almost certainly a stupid mistake on my part, but I can't seem to figure out what's going on.
Essentially, I have a series of dates as strings in the format "%d-%b-%y", such as 26-Sep-05. When I go to convert them to datetime, the year is sometimes correct, but sometimes it is not.
E.g.:
dates = ['26-Sep-05', '26-Sep-05', '15-Jun-70', '5-Dec-94', '9-Jan-61', '8-Feb-55']
pd.to_datetime(dates, format="%d-%b-%y")
DatetimeIndex(['2005-09-26', '2005-09-26', '1970-06-15', '1994-12-05',
'2061-01-09', '2055-02-08'],
dtype='datetime64[ns]', freq=None)
The last two entries, which get returned as 2061 and 2055 for the years, are wrong. But this works fine for the 15-Jun-70 entry. What's going on here?
That seems to be the behavior of Python's datetime library. I ran a test to find the cutoff, which lies between 68 and 69:
>>> datetime.datetime.strptime('31-Dec-68', '%d-%b-%y').date()
datetime.date(2068, 12, 31)
>>> datetime.datetime.strptime('1-Jan-69', '%d-%b-%y').date()
datetime.date(1969, 1, 1)
Two digits year ambiguity
So it seems that anything with the %y year below 69 will be attributed a century of 2000, and 69 upwards get 1900
The %y two digits can only go from 00 to 99 which is going to be ambiguous if we start crossing centuries.
If there is no overlap, you could manually process it and annotate the century (kill the ambiguity)
I suggest you process your data manually and specify the century, e.g. you can decide that anything in your data that has the year between 17 and 68 is attributed to 1917 - 1968 (instead of 2017 - 2068).
If you have overlap then you can't process with insufficient year information, unless e.g. you have some ordered data and a reference
If you have overlap e.g. you have data from both 2016 and 1916 and both were logged as '16', that's ambiguous and there isn't sufficient information to parse this, unless the data is ordered by date in which case you can use heuristics to switch the century as you parse it.
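The ordered-data heuristic mentioned above could look roughly like this. It is a sketch under two assumptions: the records are in chronological order, and consecutive records are never a full century apart. The function name `assign_centuries` and the `first_century` parameter are made up for illustration.

```python
def assign_centuries(two_digit_years, first_century=1900):
    """Expand chronologically ordered 2-digit years into 4-digit years."""
    century = first_century
    previous = None
    full_years = []
    for yy in two_digit_years:
        # A drop in the 2-digit value means the data rolled over into the next century.
        if previous is not None and yy < previous:
            century += 100
        full_years.append(century + yy)
        previous = yy
    return full_years

print(assign_centuries([16, 55, 94, 5, 16]))
# [1916, 1955, 1994, 2005, 2016]
```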
from the docs
Year 2000 (Y2K) issues: Python depends on the platform’s C library,
which generally doesn’t have year 2000 issues, since all dates and
times are represented internally as seconds since the epoch. Function
strptime() can parse 2-digit years when given %y format code. When
2-digit years are parsed, they are converted according to the POSIX
and ISO C standards: values 69–99 are mapped to 1969–1999, and values
0–68 are mapped to 2000–2068.
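Given that fixed pivot, one way to impose your own cutoff is to parse with `%y` first and then pull back anything that lands after the latest year your data can plausibly contain. The sketch below uses only the standard library; `parse_two_digit_year` and its `latest_year` parameter are hypothetical names, the default of 2025 is an arbitrary choice, and `%b` is assumed to run under an English locale.

```python
from datetime import datetime, date

def parse_two_digit_year(text, fmt="%d-%b-%y", latest_year=2025):
    """Parse a date with a 2-digit year, moving years after latest_year back a century."""
    parsed = datetime.strptime(text, fmt)
    if parsed.year > latest_year:
        # Safe for 29 February as long as latest_year >= 2000,
        # because every year that gets shifted then stays within 1901-1968.
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed.date()

dates = ['26-Sep-05', '15-Jun-70', '9-Jan-61', '8-Feb-55']
print([parse_two_digit_year(d) for d in dates])
```

With the default cutoff, the last two entries come out as 1961 and 1955 instead of 2061 and 2055.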
For anyone looking for a quick and dirty code snippet to fix these cases, this worked for me:
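import pandas as pd
col = 'date'
df[col] = pd.to_datetime(df[col])
# compare against a Timestamp; recent pandas versions refuse to compare datetime64 with datetime.date
future = df[col] > pd.Timestamp('2050-01-01')
# 36525 days is exactly 100 years for any date between 1901 and 2099
df.loc[future, col] -= pd.Timedelta(days=36525)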
You may need to tune the threshold date closer to the present depending on the earliest dates in your data.
You can write a simple function to correct the wrongly parsed years, as shown below:
import datetime
def fix_date(x):
    # Dates parsed into a year after the cutoff are moved back one century;
    # adjust the cutoff year (1989 here) to suit your data.
    if x.year > 1989:
        year = x.year - 100
    else:
        year = x.year
    return datetime.date(year, x.month, x.day)
df['date_column'] = df['date_column'].apply(fix_date)
Hope this helps..
Another quick solution to the problem:-
import pandas as pd
import numpy as np
dates = pd.DataFrame(['26-Sep-05', '26-Sep-05', '15-Jun-70', '5-Dec-94', '9-Jan-61', '8-Feb-55'])
for i in dates:
    tempyear=pd.to_numeric(dates[i].str[-2:])
    dates["temp_year"]=np.where((tempyear>=44)&(tempyear<=99),tempyear+1900,tempyear+2000).astype(str)
    dates["temp_month"]=dates[i].str[:-2]
    dates["temp_flyr"]=dates["temp_month"]+dates["temp_year"]
    dates["pddt"]=pd.to_datetime(dates.temp_flyr.str.upper(), format='%d-%b-%Y', yearfirst=False)
    tempdrops=["temp_year","temp_month","temp_flyr",i]
    dates.drop(tempdrops, axis=1, inplace=True)
And the output is as follows, here I have converted the output to pandas datetime format from object using pd.to_datetime
pddt
0 2005-09-26
1 2005-09-26
2 1970-06-15
3 1994-12-05
4 1961-01-09
5 1955-02-08
As mentioned in some other answers this works best if there is no overlap between the dates of the two centuries.
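If you run into the same problem with a pandas DataFrame, compare each date against the current date (or against a particular cutoff year) and shift the later ones back a century with a lambda function similar to the ones below (assuming import datetime as dt and import pandas as pd). Note that subtracting 365*100 days would fall 25 days short of a century, so a calendar offset is used instead:
df["column"] = df["column"].apply(lambda x: x - pd.DateOffset(years=100) if x > dt.datetime.now() else x)
or
df["column"] = df["column"].apply(lambda x: x - pd.DateOffset(years=100) if x.year > 2022 else x)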

Converting a day decimal number array to unix timestamp in Python

I have an array of numbers (e.g. 279.341, 279.345, 279.348) which relate to dates and times in 2017 (it's supposed to be October 6th 2017). To be able to compare this data to another dataset, I need to convert that array into an array of UNIX timestamps.
I have successfully done something similar in matlab (code below) but don't know how to translate this to Python.
MatLab:
adcpTimeStr = datestr(adcp.adcp_day_num,'2017 mmm dd HH:MM:SS');
adcpTimeRaw = datetime(adcpTimeStr,'InputFormat','yyyy MMM dd HH:mm:ss');
adcpTimenumRaw = datenum(adcpTimeRaw)';
What would be a good way of converting the array into UNIX timestamps?
Assuming these numbers are fractional days of the year (UTC) and the year is 2017, in Python you would do:
from datetime import datetime, timedelta, timezone
year = datetime(2017,1,1, tzinfo=timezone.utc) # the starting point
doy = [279.341, 279.345, 279.348]
# add days to starting point as timedelta and call timestamp() method:
unix_t = [(year+timedelta(d)).timestamp() for d in doy]
# [1507363862.4, 1507364208.0, 1507364467.2]
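Note that adding 279.341 days to 1 January gives 7 October, because the snippet treats 1 January 00:00 as day 0. If the instrument counts 1 January as day 1 (which would make 279 fall on 6 October, as the question expects), subtract one day first. A sketch covering both conventions follows; the function name `doy_to_unix` and the `one_based` flag are made up, and which convention applies to your data is an assumption you need to verify.

```python
from datetime import datetime, timedelta, timezone

def doy_to_unix(doy, year=2017, one_based=True):
    """Convert a fractional day-of-year (UTC) to a UNIX timestamp."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    # With 1-based numbering, day 1.0 is 1 January 00:00, so shift by one day.
    offset = doy - 1 if one_based else doy
    return (start + timedelta(days=offset)).timestamp()

stamp = doy_to_unix(279.341)
print(datetime.fromtimestamp(stamp, tz=timezone.utc).date())  # 2017-10-06
```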

Improving the result of conversion of datenum to datetime

I have to convert a MATLAB datenum to a Python datetime (e.g. 2010-11-04 00:03:50.209589).
The datenum is represented in milliseconds and the dates must lie between 2010-11-04 00:00:00 and 2011-06-11 00:00:00.
My code is as follows:
matlab_datenum = 6.365057116950260162e+10
python_datetime = datetime.datetime.fromtimestamp(matlab_datenum / 1e3)
print (python_datetime)
The result is : 1972-01-07 16:42:51.169503
The result is wrong because the date must be from 2010-11-04 to 2011-06-11.
Do you have any idea how to correct the result ?
Thank you for your help
The datenum page in the Matlab documentation states:
The datenum function creates a numeric array that represents each point in time as the number of days from January 0, 0000.
Python's datetime module page states the following for fromtimestamp:
Return the local date corresponding to the POSIX timestamp
which is 00:00:00 1 January 1970
The two functions are counting from different start points and using different units (days and seconds), hence the discrepancy between your two dates.
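To make the difference in origin and units concrete: for a true datenum expressed in days, the POSIX timestamp is the number of days since the datenum of 1970-01-01 (719529) multiplied by 86400. The sketch below assumes the datenum is in days and refers to UTC. It does not try to guess how the millisecond value in the question was produced, and the names `MATLAB_UNIX_EPOCH` and `datenum_to_utc` are illustrative.

```python
from datetime import datetime, timezone

MATLAB_UNIX_EPOCH = 719529  # datenum('1970-01-01')

def datenum_to_utc(datenum_days):
    """Convert a MATLAB datenum in days to a timezone-aware UTC datetime."""
    posix_seconds = (datenum_days - MATLAB_UNIX_EPOCH) * 86400.0
    return datetime.fromtimestamp(posix_seconds, tz=timezone.utc)

print(datenum_to_utc(731885.75))  # 2003-10-31 18:00:00+00:00
```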

how to generate a range of yyyymm values? [duplicate]

I have two yyyymm values that will be input by a user:
yyyymm_1 = '201406'
yyyymm_2 = '201501'
I want to be able to iterate through this range in increasing month order:
for yyyy and mm in the range of yyyymm_1 to yyyymm_2
my_function( yyyy, mm )
How can this be done in python?
Update:
Ideally, the solution should be as simple as possible without requiring external libraries. I'm not looking for a generic date manipulation solution, but a solution to answer the specific question I have asked above.
I had seen lots of generic solutions before posting my question. However, being a python noob, couldn't see how to adapt them to my question:
Generate a list of datetimes between an interval
Iterating through a range of dates in Python
On that note, the other questions linked to from this page are much more generic. If you are looking to generate a range of yyyymm values, I urge you to look at the selected answer on this page.
Here's another rather simple variant, without even using datetime. Just split the date, calculate the 'total month', and iterate.
def to_month(yyyymm):
    y, m = int(yyyymm[:4]), int(yyyymm[4:])
    return y * 12 + m
def iter_months(start, end):
    for month in range(to_month(start), to_month(end) + 1):
        y, m = divmod(month - 1, 12)  # ugly fix to compensate
        yield y, m + 1                # for 12 % 12 == 0
for y, m in iter_months('201406', '201501'):
    print(y, m)
Output:
2014 6
2014 7
...
2014 12
2015 1
For output in the same yyyymm format, use print("%d%02d" % (y, m)).
You can do this using the builtin datetime module and the third party package dateutil.
The code first converts your strings to datetime.datetime objects using datetime.datetime.strptime. It then uses the relativedelta function from dateutil to create a period of one month that can be added to your datetimes.
Within the while loop you can either work with the datetime objects directly, or construct the month and year as strings using strftime, I've shown an example of both in print functions.
import datetime as dt
from dateutil.relativedelta import relativedelta
yyyymm_1 = '201406'
yyyymm_2 = '201501'
MONTH = relativedelta(months=+1)
fmt = '%Y%m'
date_1 = dt.datetime.strptime(yyyymm_1, fmt).date()
date_2 = dt.datetime.strptime(yyyymm_2, fmt).date()
d = date_1
while d <= date_2:
    print(d)
    print(d.strftime('%Y'), d.strftime('%m'))
    d += MONTH

In Python/Pandas how do I convert century-months to DateTimeIndex?

I am working with a dataset that encodes dates as the integer number of months since December 1899, so month 1 is January 1900 and month 1165 is January 1997. I would like to convert to a pandas DateTimeIndex. So far the best I've come up with is:
month0 = np.datetime64('1899-12-15')
one_month = np.timedelta64(30, 'D') + np.timedelta64(10.5, 'h')
birthdates = pandas.DatetimeIndex(month0 + one_month * resp.cmbirth)
The start date is the 15th of the month, and the timedelta is 30 days 10.5 hours, the average length of a calendar month. So the date within the month drifts by a day or two.
So this seems a little hacky and I wondered if there's a better way.
You can use built-in pandas date-time functionality.
import pandas as pd
import numpy as np
indexed_months = np.random.randint(0, 1166, size=100)  # upper bound is exclusive
month0 = pd.to_datetime('1899-12-01')
date_list = [month0 + pd.DateOffset(months=mnt) for mnt in indexed_months]
birthdates = pd.DatetimeIndex(date_list)
I've made an assumption that your resp.cmbirth object looks like an array of integers between 0 and 1165.
I'm not quite clear on why you want the bin edges of the indices to be offset from the start or end of the month. This can be done:
shifted_birthdates = birthdates.shift(15, freq='D')
and similarly for hours if you want. There is also useful info in the answers to this SO question and the related pandas github issue.
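Since a century-month is just a month counter, the year and month can also be recovered exactly with integer arithmetic, with no average month length and no drift. A standard-library sketch follows; the function name `century_month_to_date` is made up, and the day of the month defaults to the 1st as an arbitrary choice.

```python
from datetime import date

def century_month_to_date(cm, day=1):
    """Convert a century-month (1 = January 1900) to a date in that month."""
    years, month_index = divmod(cm - 1, 12)   # month_index is 0 for January
    return date(1900 + years, month_index + 1, day)

print(century_month_to_date(1))      # 1900-01-01
print(century_month_to_date(1165))   # 1997-01-01
```

The resulting list of dates can be passed to `pd.DatetimeIndex(...)` in the same way as `date_list` in the answer above.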
