Python datetime and pandas give different timestamps for the same date - python

from datetime import datetime
import pandas as pd
date="2020-02-07T16:05:16.000000000"
#Convert using datetime
t1=datetime.strptime(date[:-3],'%Y-%m-%dT%H:%M:%S.%f')
#Convert using Pandas
t2=pd.to_datetime(date)
#Subtract the dates
print(t1-t2)
#subtract the date timestamps
print(t1.timestamp()-t2.timestamp())
In this example, my understanding is that both datetime and pandas should use timezone naive dates. Can anyone explain why the difference between the dates is zero, but the difference between the timestamps is not zero? It's off by 5 hours for me, which is my time zone offset from GMT.

Naive datetime objects of Python's datetime.datetime class represent local time. This is kind of obvious from the docs but can be a brain-teaser to work with nevertheless. If you call the timestamp method on it, the returned POSIX timestamp refers to UTC (seconds since the epoch) as it should.
Coming from the Python datetime object, the behavior of a naive pandas.Timestamp can be counter-intuitive (and I think it's not so obvious). Derived the same way from a tz-naive string, it doesn't represent local time but UTC. You can verify that by localizing the datetime object to UTC:
from datetime import datetime, timezone
import pandas as pd
date = "2020-02-07T16:05:16.000000000"
t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f')
t2 = pd.to_datetime(date)
print(t1.replace(tzinfo=timezone.utc).timestamp() - t2.timestamp())
# 0.0
The other way around you can make the pandas.Timestamp timezone-aware, e.g.
t3 = pd.to_datetime(t1.astimezone())
# e.g. Timestamp('2020-02-07 16:05:16+0100', tz='Mitteleuropäische Zeit')
# now both t1 and t3 represent my local time:
print(t1.timestamp() - t3.timestamp())
# 0.0
My bottom line is that if you know that the timestamps you have represent a certain timezone, work with timezone-aware datetime, e.g. for UTC
import pytz # need to use pytz here since pandas uses that internally
t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f').replace(tzinfo=pytz.UTC)
t2 = pd.to_datetime(date, utc=True)
print(t1 == t2)
# True
print(t1-t2)
# 0 days 00:00:00
print(t1.timestamp()-t2.timestamp())
# 0.0

Related

datetime with fixedoffset to datetime in pandas

I have two datetime objects which I want to subtract - however they both need to be in same format
I tried to convert datetime64[ns, pytz.FixedOffset(-240)] (eastern time zone) however I run into errors. Other datetime object is datetime64[ns] which is already in est timezone
1) df['date'].strftime('%Y-%m-%d %H:%M:%S')
error: 'Series' object has no attribute 'strftime'
2) df['date'].replace(tzinfo=None)
error: replace() got an unexpected keyword argument 'tzinfo'
3) df['date'].dt_tz.replace(tzinfo=None)
error: 'Series' object has no attribute 'dt_tz'
In pandas, if you have mixed time zones or UTC offsets, you will get
TypeError: DatetimeArray subtraction must have the same timezones or no timezones
when trying to calculate a timedelta. The error basically tells you how to avoid it: convert everything to the same tz, for example:
import pandas as pd
df = pd.DataFrame({
'date0': pd.to_datetime(["2021-08-01 00:00 -04:00"]), # should be US/Eastern
'date1': pd.to_datetime(["2021-08-01 01:00"]) # should be US/Eastern as well
})
# date0 date1
# 0 2021-08-01 00:00:00-04:00 2021-08-01 01:00:00
# date0 already has a UTC offset but we can set a proper time zone:
df['date0'] = df['date0'].dt.tz_convert('America/New_York')
# date1 is naive, i.e. does not have a time zone, so we need to localize:
df['date1'] = df['date1'].dt.tz_localize('America/New_York')
# since both datetime columns now have the same time zone, we can calculate:
print(df['date1'] - df['date0'])
# 0 0 days 01:00:00
# dtype: timedelta64[ns]
Python's datetime isn't that picky, you can easily calculate timedelta from datetime objects with different time zones:
from datetime import datetime
from zoneinfo import ZoneInfo # Python 3.9
d0 = datetime(2021, 1, 1, tzinfo=ZoneInfo("UTC"))
d1 = datetime(2020, 12, 31, 20, tzinfo=ZoneInfo('America/New_York'))
print(d1-d0)
# 1:00:00
Keep in mind that Python's timedelta arithmetic is wall-time arithmetic; you can do weird stuff like this. So it's sometimes less obvious what's going on I'd say.
While #MrFuppes answer is detailed for generic case since one of my dataframe was already in tz format I had to take below steps which worked
Initial format
datetime64[ns, pytz.FixedOffset(-240)] (eastern time zone)
1) Step taken
pd.to_datetime((df['date']).dt.tz_convert('US/Eastern'))
Initial Format
datetime64[ns]
2) Step taken
pd.to_datetime((df['date1']).dt.tz_localize('US/Eastern'))
This two steps brought datetime in same format for me to perform arithmetic operations

Python convert timestamp to unix

I know these questions have been asked before but I'm struggling to convert a timestamp string to a unix time and figuring out whether the datetime objects are naive or aware
For example, to convert the time "2021-05-19 12:51:47" to unix:
>>> from datetime import datetime as dt
>>> dt_obj = dt.strptime("2021-05-19 12:51:47", "%Y-%m-%d %H:%M:%S")
>>> dt_obj
datetime.datetime(2021, 5, 19, 12, 51, 47)
is dt_obj naive or aware and how would you determine this? The methods on dt_obj such as timetz, tzinfo, and tzname don't seem to indicate anything - does that mean that dt_obj is naive?
Then to get unix:
>>> dt_obj.timestamp()
1621421507.0
However when I check 1621421507.0 on say https://www.unixtimestamp.com then it tells me that gmt for the above is Wed May 19 2021 10:51:47 GMT+0000, ie 2 hours behind the original timestamp?
since Python's datetime treats naive datetime as local time by default, you need to set the time zone (tzinfo attribute):
from datetime import datetime, timezone
# assuming "2021-05-19 12:51:47" represents UTC:
dt_obj = datetime.fromisoformat("2021-05-19 12:51:47").replace(tzinfo=timezone.utc)
Or, as #Wolf suggested, instead of setting the tzinfo attribute explicitly, you can also modify the input string by adding "+00:00" which is parsed to UTC;
dt_obj = datetime.fromisoformat("2021-05-19 12:51:47" + "+00:00")
In any case, the result
dt_obj.timestamp()
# 1621428707.0
now converts as expected on https://www.unixtimestamp.com/:
As long as you don't specify the timezone when calling strptime, you will produce naive datetime objects. You may pass time zone information via %z format specifier and +00:00 added to the textual date-time representation to get a timezone aware datetime object:
from datetime import datetime
dt_str = "2021-05-19 12:51:47"
print(dt_str)
dt_obj = datetime.strptime(dt_str+"+00:00", "%Y-%m-%d %H:%M:%S%z")
print(dt_obj)
print(dt_obj.timestamp())
The of above script is this:
2021-05-19 12:51:47
2021-05-19 12:51:47+00:00
1621428707.0
datetime.timestamp()
Naive datetime instances are assumed to represent local time and this method relies on the platform C mktime() function to perform the conversion.
So using this does automatically apply yours machine current timezone, following recipe is given to calculate timestamp from naive datetime without influence of timezone:
timestamp = (dt - datetime(1970, 1, 1)) / timedelta(seconds=1)

Convert python aware datetime to local timetuple

I have an aware datetime object:
dt = datetime.datetime.now(pytz.timezone('Asia/Ho_Chi_Minh'))
I'm using this to convert the object to timestamp and it currently runs fine:
int(time.mktime(dt.utctimetuple()))
But according to time docs, time.mktime requires local timetuple, not UTC timetuple. How do I get local timetuple from an aware datetime? Or is any other way to make timestamp instead of time.mktime?
I have read this question and it seems that I should use calendar.timegm(dt.utctimetuple()).
Converting datetime to unix timestamp
If you have an aware datetime you can convert it to a unix epoch timestamp by subtracting a datetime at UTC epoch like:
Code:
import datetime as dt
import pytz
def aware_to_epoch(aware):
# get a datetime that is equal to epoch in UTC
utc_at_epoch = pytz.timezone('UTC').localize(dt.datetime(1970, 1, 1))
# return the number of seconds since epoch
return (aware - utc_at_epoch).total_seconds()
aware = dt.datetime.now(pytz.timezone('Asia/Ho_Chi_Minh'))
print(aware_to_epoch(aware))
Results:
1521612302.341014
It seems that your are confused on what do you want.
Based on your comment, the answer is in python datetime documentation:
There is no method to obtain the POSIX timestamp directly from a naive datetime instance representing UTC time. If your application uses this convention and your system timezone is not set to UTC, you can obtain the POSIX timestamp by supplying tzinfo=timezone.utc:
timestamp = dt.replace(tzinfo=timezone.utc).timestamp()
or by calculating the timestamp directly:
timestamp = (dt - datetime(1970, 1, 1)) / timedelta(seconds=1)

Python: convert 'days since 1990' to datetime object

I have a time series that I have pulled from a netCDF file and I'm trying to convert them to a datetime format. The format of the time series is in 'days since 1990-01-01 00:00:00 +10' (+10 being GMT: +10)
time = nc_data.variables['time'][:]
time_idx = 0 # first timestamp
print time[time_idx]
9465.0
My desired output is a datetime object like so (also GMT +10):
"2015-12-01 00:00:00"
I have tried converting this using the time module without much success although I believe I may be using wrong (I'm still a novice in python and programming).
import time
time_datetime = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(time[time_idx]*24*60*60))
Any advice appreciated,
Cheers!
The datetime module's timedelta is probably what you're looking for.
For example:
from datetime import date, timedelta
days = 9465 # This may work for floats in general, but using integers
# is more precise (e.g. days = int(9465.0))
start = date(1990,1,1) # This is the "days since" part
delta = timedelta(days) # Create a time delta object from the number of days
offset = start + delta # Add the specified number of days to 1990
print(offset) # >>> 2015-12-01
print(type(offset)) # >>> <class 'datetime.date'>
You can then use and/or manipulate the offset object, or convert it to a string representation however you see fit.
You can use the same format as for this date object as you do for your time_datetime:
print(offset.strftime('%Y-%m-%d %H:%M:%S'))
Output:
2015-12-01 00:00:00
Instead of using a date object, you could use a datetime object instead if, for example, you were later going to add hours/minutes/seconds/timezone offsets to it.
The code would stay the same as above with the exception of two lines:
# Here, you're importing datetime instead of date
from datetime import datetime, timedelta
# Here, you're creating a datetime object instead of a date object
start = datetime(1990,1,1) # This is the "days since" part
Note: Although you don't state it, but the other answer suggests you might be looking for timezone aware datetimes. If that's the case, dateutil is the way to go in Python 2 as the other answer suggests. In Python 3, you'd want to use the datetime module's tzinfo.
netCDF num2date is the correct function to use here:
import netCDF4
ncfile = netCDF4.Dataset('./foo.nc', 'r')
time = ncfile.variables['time'] # do not cast to numpy array yet
time_convert = netCDF4.num2date(time[:], time.units, time.calendar)
This will convert number of days since 1900-01-01 (i.e. the units of time) to python datetime objects. If time does not have a calendar attribute, you'll need to specify the calendar, or use the default of standard.
We can do this in a couple steps. First, we are going to use the dateutil library to handle our work. It will make some of this easier.
The first step is to get a datetime object from your string (1990-01-01 00:00:00 +10). We'll do that with the following code:
from datetime import datetime
from dateutil.relativedelta import relativedelta
import dateutil.parser
days_since = '1990-01-01 00:00:00 +10'
days_since_dt = dateutil.parser.parse(days_since)
Now, our days_since_dt will look like this:
datetime.datetime(1990, 1, 1, 0, 0, tzinfo=tzoffset(None, 36000))
We'll use that in our next step, of determining the new date. We'll use relativedelta in dateutils to handle this math.
new_date = days_since_dt + relativedelta(days=9465.0)
This will result in your value in new_date having a value of:
datetime.datetime(2015, 12, 1, 0, 0, tzinfo=tzoffset(None, 36000))
This method ensures that the answer you receive continues to be in GMT+10.

Seconds since date (not epoch!) to date

I have a dataset file with a time variable in "seconds since 1981-01-01 00:00:00".
What I need is to convert this time into calendar date (YYYY-MM-DD HH:mm:ss).
I've seen a lot of different ways to do this for time since epoch (1970) (timestamp, calendar.timegm, etc) but I'm failing to do this with a different reference date.
I thought of doing something that is not pretty but it works:
time = 1054425600
st = (datetime.datetime(1981,1,1,0,0)-datetime.datetime(1970,1,1)).total_seconds()
datetime.datetime.fromtimestamp(st+time)
But any other way of doing this will be welcome!
You could try this:
from datetime import datetime, timedelta
t0 = datetime(1981, 1, 1)
seconds = 1000000
dt = t0 + timedelta(seconds=seconds)
print dt
# 1981-01-12 13:46:40
Here t0 is set to a datetime object representing the "epoch" of 1981-01-01. This is your reference datetime to which you can add a timedelta object initialised with an arbitrary number of seconds. The result is a datetime object representing the required date time.
N.B. this assumes UTC time

Categories