I have two datetime columns that I want to subtract, but both need to be in the same format first.
I tried to convert the one with dtype datetime64[ns, pytz.FixedOffset(-240)] (Eastern time zone) but ran into errors. The other column is datetime64[ns], with values that are already in Eastern time.
1) df['date'].strftime('%Y-%m-%d %H:%M:%S')
error: 'Series' object has no attribute 'strftime'
2) df['date'].replace(tzinfo=None)
error: replace() got an unexpected keyword argument 'tzinfo'
3) df['date'].dt_tz.replace(tzinfo=None)
error: 'Series' object has no attribute 'dt_tz'
In pandas, if you have mixed time zones or UTC offsets, you will get
TypeError: DatetimeArray subtraction must have the same timezones or no timezones
when trying to calculate a timedelta. The error basically tells you how to avoid it: convert everything to the same tz, for example:
import pandas as pd
df = pd.DataFrame({
'date0': pd.to_datetime(["2021-08-01 00:00 -04:00"]), # should be US/Eastern
'date1': pd.to_datetime(["2021-08-01 01:00"]) # should be US/Eastern as well
})
# date0 date1
# 0 2021-08-01 00:00:00-04:00 2021-08-01 01:00:00
# date0 already has a UTC offset but we can set a proper time zone:
df['date0'] = df['date0'].dt.tz_convert('America/New_York')
# date1 is naive, i.e. does not have a time zone, so we need to localize:
df['date1'] = df['date1'].dt.tz_localize('America/New_York')
# since both datetime columns now have the same time zone, we can calculate:
print(df['date1'] - df['date0'])
# 0 0 days 01:00:00
# dtype: timedelta64[ns]
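The three attempts in the question fail because a Series exposes datetime methods through the .dt accessor rather than directly. As a minimal sketch (the column name date is taken from the question, the sample value is made up for illustration), formatting as text or dropping the time zone could look like this:

```python
import pandas as pd

# hypothetical sample data: one tz-aware value with a fixed -04:00 offset
df = pd.DataFrame({"date": pd.to_datetime(["2021-08-01 00:00 -04:00"])})

# format as strings: note the .dt accessor
as_text = df["date"].dt.strftime("%Y-%m-%d %H:%M:%S")

# drop the time zone but keep the wall-clock time
wall_naive = df["date"].dt.tz_localize(None)

# convert to UTC first, then drop the time zone
utc_naive = df["date"].dt.tz_convert(None)

print(as_text[0])     # 2021-08-01 00:00:00
print(wall_naive[0])  # 2021-08-01 00:00:00
print(utc_naive[0])   # 2021-08-01 04:00:00
```

Dropping the time zone is only safe if both columns are known to refer to the same zone. Otherwise, making both columns tz-aware as shown above is the more robust route.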
Python's datetime isn't that picky, you can easily calculate timedelta from datetime objects with different time zones:
from datetime import datetime
from zoneinfo import ZoneInfo # Python 3.9+
d0 = datetime(2021, 1, 1, tzinfo=ZoneInfo("UTC"))
d1 = datetime(2020, 12, 31, 20, tzinfo=ZoneInfo('America/New_York'))
print(d1-d0)
# 1:00:00
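Keep in mind that Python's timedelta arithmetic is wall-time arithmetic: when both datetimes share the same tzinfo, the UTC offsets are ignored and only the clock readings are subtracted, which can give surprising results across DST transitions. So it's sometimes less obvious what's going on, I'd say.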
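To illustrate the wall-time behaviour, here is a small sketch that spans the 2021 spring DST transition in New York. The dates are chosen for illustration only, and time zone data must be available on the system for zoneinfo to work:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

tz = ZoneInfo("America/New_York")

# noon on the day before and on the day of the DST change (clocks skip 02:00 -> 03:00)
before = datetime(2021, 3, 13, 12, tzinfo=tz)
after = datetime(2021, 3, 14, 12, tzinfo=tz)

# same tzinfo on both sides: Python subtracts the clock readings
wall = after - before

# converting to UTC first gives the time that actually elapsed
elapsed = after.astimezone(timezone.utc) - before.astimezone(timezone.utc)

print(wall)     # 1 day, 0:00:00
print(elapsed)  # 23:00:00
```

pandas, in contrast, computes the elapsed (absolute) difference for tz-aware columns.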
While @MrFuppes' answer is detailed for the generic case, one of my columns was already tz-aware, so I had to take the steps below, which worked:
Initial format
datetime64[ns, pytz.FixedOffset(-240)] (eastern time zone)
1) Step taken
pd.to_datetime((df['date']).dt.tz_convert('US/Eastern'))
Initial Format
datetime64[ns]
2) Step taken
pd.to_datetime((df['date1']).dt.tz_localize('US/Eastern'))
These two steps brought both columns into the same format, so I could perform arithmetic on them.
Related
I want to convert datetimes (e.g. 2014-12-23 00:00:00) to Unix time. I tried it with the datetime function but it didn't work. The datetime stamps are in an array.
Zeit =np.array(Jahresgang1.ix[ :,'Zeitstempel'])
t = pd.to_datetime(Zeit, unit='s')
unixtime = pd.DataFrame(t)
print unixtime
Thanks a lot
I think you can subtract the date 1970-1-1 to create a timedelta and then access the attribute total_seconds:
In [130]:
s = pd.Series(pd.Timestamp(2012, 1, 1))  # pd.datetime was removed from pandas
s
Out[130]:
0 2012-01-01
dtype: datetime64[ns]
In [158]:
(s - pd.Timestamp(1970, 1, 1)).dt.total_seconds()
Out[158]:
0 1325376000
dtype: float64
To emphasize EdChum's first comment, you can directly get Unix time like this:
import pandas as pd
s = pd.to_datetime(["2014-12-23 00:00:00"])
unix = s.astype("int64")
print(unix)
# Int64Index([1419292800000000000], dtype='int64')
# (pandas 2.x prints Index([...], dtype='int64') instead)
or for a pd.Timestamp:
print(pd.to_datetime("2014-12-23 00:00:00").value)
# 1419292800000000000
Notes
the output precision is nanoseconds - if you want another, divide appropriately, e.g. by 10⁹ to get seconds, 10⁶ to get milliseconds etc.
this assumes the input date/time to be UTC, unless a time zone / UTC offset is specified
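A short sketch of both notes, using the date from the question. Timestamp.value is in nanoseconds, so integer division gives coarser units, and an explicit UTC offset in the input shifts the result accordingly (the +01:00 input is an assumed example, not from the question):

```python
import pandas as pd

ts = pd.Timestamp("2014-12-23 00:00:00")  # no offset given: treated as UTC

seconds = ts.value // 10**9       # nanoseconds -> seconds
milliseconds = ts.value // 10**6  # nanoseconds -> milliseconds

# same wall time, but with an explicit UTC offset of +01:00
ts_offset = pd.Timestamp("2014-12-23 00:00:00+01:00")
seconds_offset = ts_offset.value // 10**9

print(seconds)         # 1419292800
print(milliseconds)    # 1419292800000
print(seconds_offset)  # 1419289200, i.e. one hour earlier in UTC
```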
I know these questions have been asked before but I'm struggling to convert a timestamp string to a unix time and figuring out whether the datetime objects are naive or aware
For example, to convert the time "2021-05-19 12:51:47" to unix:
>>> from datetime import datetime as dt
>>> dt_obj = dt.strptime("2021-05-19 12:51:47", "%Y-%m-%d %H:%M:%S")
>>> dt_obj
datetime.datetime(2021, 5, 19, 12, 51, 47)
is dt_obj naive or aware and how would you determine this? The methods on dt_obj such as timetz, tzinfo, and tzname don't seem to indicate anything - does that mean that dt_obj is naive?
Then to get unix:
>>> dt_obj.timestamp()
1621421507.0
However when I check 1621421507.0 on say https://www.unixtimestamp.com then it tells me that gmt for the above is Wed May 19 2021 10:51:47 GMT+0000, ie 2 hours behind the original timestamp?
Since Python's datetime treats a naive datetime as local time by default, you need to set the time zone (tzinfo attribute):
from datetime import datetime, timezone
# assuming "2021-05-19 12:51:47" represents UTC:
dt_obj = datetime.fromisoformat("2021-05-19 12:51:47").replace(tzinfo=timezone.utc)
Or, as @Wolf suggested, instead of setting the tzinfo attribute explicitly, you can also modify the input string by adding "+00:00", which is parsed as UTC:
dt_obj = datetime.fromisoformat("2021-05-19 12:51:47" + "+00:00")
In any case, the result
dt_obj.timestamp()
# 1621428707.0
now converts as expected on https://www.unixtimestamp.com/.
As long as you don't specify the timezone when calling strptime, you will produce naive datetime objects. You may pass time zone information via %z format specifier and +00:00 added to the textual date-time representation to get a timezone aware datetime object:
from datetime import datetime
dt_str = "2021-05-19 12:51:47"
print(dt_str)
dt_obj = datetime.strptime(dt_str+"+00:00", "%Y-%m-%d %H:%M:%S%z")
print(dt_obj)
print(dt_obj.timestamp())
The output of the above script is:
2021-05-19 12:51:47
2021-05-19 12:51:47+00:00
1621428707.0
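As for how to tell whether a datetime is naive or aware: the Python documentation defines an object d as aware when d.tzinfo is not None and d.tzinfo.utcoffset(d) does not return None. A small sketch of that check (the helper name is_aware is made up for illustration):

```python
from datetime import datetime, timezone

def is_aware(d):
    """Return True if d is timezone-aware per the datetime docs' definition."""
    return d.tzinfo is not None and d.tzinfo.utcoffset(d) is not None

naive_dt = datetime.strptime("2021-05-19 12:51:47", "%Y-%m-%d %H:%M:%S")
aware_dt = naive_dt.replace(tzinfo=timezone.utc)

print(is_aware(naive_dt))  # False
print(is_aware(aware_dt))  # True
```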
datetime.timestamp()
Naive datetime instances are assumed to represent local time and this method relies on the platform C mktime() function to perform the conversion.
So calling it on a naive datetime automatically applies your machine's current time zone. The following recipe is given to calculate the timestamp from a naive datetime without the influence of the local time zone (i.e. treating the value as UTC):
timestamp = (dt - datetime(1970, 1, 1)) / timedelta(seconds=1)
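As a self-contained sketch of that recipe, using the timestamp string from the question and assuming it represents UTC, the result matches what you get from attaching UTC explicitly:

```python
from datetime import datetime, timedelta, timezone

naive = datetime(2021, 5, 19, 12, 51, 47)  # assumed to represent UTC

# recipe: seconds since the epoch, independent of the machine's time zone
ts_recipe = (naive - datetime(1970, 1, 1)) / timedelta(seconds=1)

# equivalent: make the datetime aware (UTC), then call timestamp()
ts_aware = naive.replace(tzinfo=timezone.utc).timestamp()

print(ts_recipe)  # 1621428707.0
print(ts_aware)   # 1621428707.0
```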
from datetime import datetime
import pandas as pd
date="2020-02-07T16:05:16.000000000"
#Convert using datetime
t1=datetime.strptime(date[:-3],'%Y-%m-%dT%H:%M:%S.%f')
#Convert using Pandas
t2=pd.to_datetime(date)
#Subtract the dates
print(t1-t2)
#subtract the date timestamps
print(t1.timestamp()-t2.timestamp())
In this example, my understanding is that both datetime and pandas should use timezone naive dates. Can anyone explain why the difference between the dates is zero, but the difference between the timestamps is not zero? It's off by 5 hours for me, which is my time zone offset from GMT.
Naive datetime objects of Python's datetime.datetime class represent local time. This is kind of obvious from the docs but can be a brain-teaser to work with nevertheless. If you call the timestamp method on it, the returned POSIX timestamp refers to UTC (seconds since the epoch) as it should.
Coming from the Python datetime object, the behavior of a naive pandas.Timestamp can be counter-intuitive (and I think it's not so obvious). Derived the same way from a tz-naive string, it doesn't represent local time but UTC. You can verify that by localizing the datetime object to UTC:
from datetime import datetime, timezone
import pandas as pd
date = "2020-02-07T16:05:16.000000000"
t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f')
t2 = pd.to_datetime(date)
print(t1.replace(tzinfo=timezone.utc).timestamp() - t2.timestamp())
# 0.0
The other way around you can make the pandas.Timestamp timezone-aware, e.g.
t3 = pd.to_datetime(t1.astimezone())
# e.g. Timestamp('2020-02-07 16:05:16+0100', tz='Mitteleuropäische Zeit')
# now both t1 and t3 represent my local time:
print(t1.timestamp() - t3.timestamp())
# 0.0
My bottom line is that if you know that the timestamps you have represent a certain timezone, work with timezone-aware datetime, e.g. for UTC
import pytz # need to use pytz here since pandas uses that internally
t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f').replace(tzinfo=pytz.UTC)
t2 = pd.to_datetime(date, utc=True)
print(t1 == t2)
# True
print(t1-t2)
# 0 days 00:00:00
print(t1.timestamp()-t2.timestamp())
# 0.0
I have been learning about ISO 8601, where the format is
"2018-07-06T07:00:00.000"
and so far I have managed to convert the ISO string to a more conventional timestamp:
etatime = str(datetime.datetime.strptime("2018-07-06T07:00:00.000", "%Y-%m-%dT%H:%M:%S.%f"))
which will give an output of:
2018-07-06 07:00:00
However, I noticed the time is 1 hour behind BST (British Summer Time), so one hour should be added.
My question: is it possible to go from (2018-07-06T07:00:00.000) to (2018-07-06 08:00:00 BST)?
Assumptions: the input represents a UTC timestamp, and you want to localise that to London time. You probably do not want to localise it to BST time, since BST is the DST variation of GMT, and an actual location like London will switch between BST and GMT depending on the time of year. You'll want to install the pytz module.
from datetime import datetime, timezone
import pytz
date = '2018-07-06T07:00:00.000'
utc_date = datetime.strptime(date, '%Y-%m-%dT%H:%M:%S.%f').replace(tzinfo=timezone.utc)
london_date = utc_date.astimezone(pytz.timezone('Europe/London'))
# london_date: datetime.datetime(2018, 7, 6, 8, 0, tzinfo=<DstTzInfo 'Europe/London' BST+1:00:00 DST>)
strptime gives you a naïve datetime object (without timezone information), .replace gives you an aware datetime object (with timezone information), which then enables you to simply convert that to a different timezone.
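On Python 3.9 and later, the same conversion can be sketched with the standard library's zoneinfo instead of pytz. This is an alternative and not part of the original answer, and it requires time zone data to be available on the system. The %Z directive then produces the "BST" suffix asked for in the question:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

london = ZoneInfo("Europe/London")
fmt_in = "%Y-%m-%dT%H:%M:%S.%f"
fmt_out = "%Y-%m-%d %H:%M:%S %Z"

# summer date: London is on BST (UTC+1)
summer_utc = datetime.strptime("2018-07-06T07:00:00.000", fmt_in).replace(tzinfo=timezone.utc)
summer_text = summer_utc.astimezone(london).strftime(fmt_out)

# winter date (made up for comparison): London is on GMT (UTC+0)
winter_utc = datetime.strptime("2018-01-06T07:00:00.000", fmt_in).replace(tzinfo=timezone.utc)
winter_text = winter_utc.astimezone(london).strftime(fmt_out)

print(summer_text)  # 2018-07-06 08:00:00 BST
print(winter_text)  # 2018-01-06 07:00:00 GMT
```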
One suggestion is to use the timedelta class from the datetime module (note that this adds a fixed hour regardless of whether DST is in effect):
from datetime import datetime, timedelta
etatime = datetime.strptime("2018-07-06T07:00:00.000", "%Y-%m-%dT%H:%M:%S.%f")
# Before adding one hour
print(etatime)
etatime = etatime + timedelta(hours=1)
# After adding one hour
print(etatime)
Output:
2018-07-06 07:00:00
2018-07-06 08:00:00
I have a time series that I have pulled from a netCDF file and I'm trying to convert them to a datetime format. The format of the time series is in 'days since 1990-01-01 00:00:00 +10' (+10 being GMT: +10)
time = nc_data.variables['time'][:]
time_idx = 0 # first timestamp
print time[time_idx]
9465.0
My desired output is a datetime object like so (also GMT +10):
"2015-12-01 00:00:00"
I have tried converting this using the time module without much success although I believe I may be using wrong (I'm still a novice in python and programming).
import time
time_datetime = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(time[time_idx]*24*60*60))
Any advice appreciated,
Cheers!
The datetime module's timedelta is probably what you're looking for.
For example:
from datetime import date, timedelta
days = 9465 # This may work for floats in general, but using integers
# is more precise (e.g. days = int(9465.0))
start = date(1990,1,1) # This is the "days since" part
delta = timedelta(days) # Create a time delta object from the number of days
offset = start + delta # Add the specified number of days to 1990
print(offset) # >>> 2015-12-01
print(type(offset)) # >>> <class 'datetime.date'>
You can then use and/or manipulate the offset object, or convert it to a string representation however you see fit.
You can use the same format string for this date object as you did for your time_datetime:
print(offset.strftime('%Y-%m-%d %H:%M:%S'))
Output:
2015-12-01 00:00:00
Instead of using a date object, you could use a datetime object instead if, for example, you were later going to add hours/minutes/seconds/timezone offsets to it.
The code would stay the same as above with the exception of two lines:
# Here, you're importing datetime instead of date
from datetime import datetime, timedelta
# Here, you're creating a datetime object instead of a date object
start = datetime(1990,1,1) # This is the "days since" part
Note: although you don't state it, the other answer suggests you might be looking for timezone-aware datetimes. If that's the case, dateutil is the way to go in Python 2, as the other answer suggests. In Python 3, you can use the datetime module's timezone class, a concrete tzinfo for fixed UTC offsets.
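A minimal sketch of the Python 3 variant, assuming the "+10" in the units string means a fixed UTC offset of +10 hours:

```python
from datetime import datetime, timedelta, timezone

# fixed UTC+10 offset, as in 'days since 1990-01-01 00:00:00 +10'
tz_plus10 = timezone(timedelta(hours=10))

start = datetime(1990, 1, 1, tzinfo=tz_plus10)  # the "days since" reference
result = start + timedelta(days=9465.0)         # value read from the file

print(result.isoformat())  # 2015-12-01T00:00:00+10:00
```

Since the offset is fixed, the result stays at +10:00 and no DST rules are applied.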
netCDF num2date is the correct function to use here:
import netCDF4
ncfile = netCDF4.Dataset('./foo.nc', 'r')
time = ncfile.variables['time'] # do not cast to numpy array yet
time_convert = netCDF4.num2date(time[:], time.units, time.calendar)
This will convert the number of days since 1990-01-01 (i.e. the units of time) to Python datetime objects. If time does not have a calendar attribute, you'll need to specify the calendar, or use the default of standard.
We can do this in a couple steps. First, we are going to use the dateutil library to handle our work. It will make some of this easier.
The first step is to get a datetime object from your string (1990-01-01 00:00:00 +10). We'll do that with the following code:
from datetime import datetime
from dateutil.relativedelta import relativedelta
import dateutil.parser
days_since = '1990-01-01 00:00:00 +10'
days_since_dt = dateutil.parser.parse(days_since)
Now, our days_since_dt will look like this:
datetime.datetime(1990, 1, 1, 0, 0, tzinfo=tzoffset(None, 36000))
We'll use that in our next step to determine the new date. We'll use relativedelta from dateutil to handle this math.
new_date = days_since_dt + relativedelta(days=9465.0)
This results in new_date having a value of:
datetime.datetime(2015, 12, 1, 0, 0, tzinfo=tzoffset(None, 36000))
This method ensures that the answer you receive continues to be in GMT+10.