I am trying to parse a Unix timestamp using pd.to_datetime() vs. dt.datetime.fromtimestamp(), but their outputs are different. Which one is correct?
import datetime as dt
import pandas as pd
ts = 1674853200000
print(pd.to_datetime(ts, unit='ms'))
print(dt.datetime.fromtimestamp(ts / 1e3))
>> 2023-01-27 21:00:00
>> 2023-01-27 13:00:00
In contrast to pandas (numpy) datetime, vanilla Python datetime defaults to local time if you do not specify a time zone or UTC (i.e. if you use naive datetime). Here's an illustration. If I reproduce your example in my Python environment, I get
from datetime import datetime, timezone
import pandas as pd
# ms since the Unix epoch, 1970-01-01 00:00 UTC
unix = 1674853200000
dt_py = datetime.fromtimestamp(unix/1e3)
dt_pd = pd.to_datetime(unix, unit="ms")
print(dt_py, dt_pd)
# 2023-01-27 22:00:00 # from fromtimestamp
# 2023-01-27 21:00:00 # from pd.to_datetime
Comparing the two datetime objects shows that the difference equals my local UTC offset:
# my UTC offset at that point in time:
print(dt_py.astimezone().utcoffset())
# 1:00:00
# difference between dt_py and dt_pd:
print(dt_py-dt_pd)
# 0 days 01:00:00
To get consistent results between pandas and vanilla Python, i.e. avoid the ambiguity, you can use aware datetime:
dt_py = datetime.fromtimestamp(unix/1e3, tz=timezone.utc)
dt_pd = pd.to_datetime(unix, unit="ms", utc=True)
print(dt_py, dt_pd)
# 2023-01-27 21:00:00+00:00
# 2023-01-27 21:00:00+00:00
print(dt_py-dt_pd)
# 0 days 00:00:00
Both are correct. The main difference is that pd.to_datetime() treats the Unix time as UTC and returns a naive Timestamp showing UTC wall time, while dt.datetime.fromtimestamp() without a tz argument returns a naive datetime in your machine's local time zone. pd.to_datetime() is also more flexible and can handle missing input data. Generally, the choice of which one to use depends on the requirements of your use case.
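To illustrate the point about missing input data, here is a minimal sketch. The Series `raw` and its values are made up for the example, and it assumes pandas is installed. pd.to_datetime() maps a missing value to NaT, whereas datetime.fromtimestamp() raises an error for NaN:

```python
from datetime import datetime, timezone
import math

import pandas as pd

# hypothetical input: one valid Unix time in ms and one missing value
raw = pd.Series([1674853200000, None])

# pandas converts the missing value to NaT instead of failing
parsed = pd.to_datetime(raw, unit="ms", utc=True)
print(parsed)

# vanilla Python cannot convert NaN to a datetime and raises instead
try:
    datetime.fromtimestamp(math.nan, tz=timezone.utc)
    stdlib_failed = False
except (ValueError, OverflowError, OSError):
    stdlib_failed = True
print(stdlib_failed)
```

Passing utc=True (pandas) and tz=timezone.utc (standard library) keeps both results aware, so neither depends on the machine's local time zone.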
Related
I have a variable in a df that looks like this
Datetime
10/27/2020 2:28:28 PM
8/2/2020 3:30:18 AM
6/15/2020 5:38:19 PM
How can I change it to this using python?
Date Time
10/27/2020 14:28:28
8/2/2020 3:30:18
6/15/2020 17:38:19
I understand how to separate date and time, but unsure of how to convert it to 24 hour time.
I think this is the code you want:
from dateutil.parser import parse
# keep the AM/PM marker so the hour is parsed as 24-hour time
dt = parse("10/27/2020 2:28:28 PM")
print(dt)
# 2020-10-27 14:28:28
# Create Date
date = f"{dt.month}/{dt.day}/{dt.year}"
# Create Time
time = f"{dt.hour}:{dt.minute:02d}:{dt.second:02d}"
You can use pd.to_datetime to convert a scalar, array-like, Series or DataFrame/dict-like to a pandas datetime object. Then, you can use the Series.dt accessor for datetime-like properties to obtain the time, which will already be in the desired format.
You can also use dt.strftime to format the output string; it supports the same format codes as the Python standard library.
df['Datetime'] = pd.to_datetime(df.Datetime)
df['Date'] = df.Datetime.dt.strftime('%m/%d/%Y')
df['Time'] = df.Datetime.dt.time
print(df)
             Datetime        Date      Time
0 2020-10-27 14:28:28  10/27/2020  14:28:28
1 2020-08-02 03:30:18  08/02/2020  03:30:18
2 2020-06-15 17:38:19  06/15/2020  17:38:19
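As a variation, here is a small sketch that passes an explicit format string, so pandas does not have to guess between month-first and day-first input. The DataFrame is rebuilt from the question's sample values, and the helper variable `parsed` is only there for illustration:

```python
import pandas as pd

# sample data copied from the question
df = pd.DataFrame({
    "Datetime": [
        "10/27/2020 2:28:28 PM",
        "8/2/2020 3:30:18 AM",
        "6/15/2020 5:38:19 PM",
    ]
})

# %I is the 12-hour clock hour, %p the AM/PM marker
parsed = pd.to_datetime(df["Datetime"], format="%m/%d/%Y %I:%M:%S %p")

# %H is the 24-hour clock hour
df["Date"] = parsed.dt.strftime("%m/%d/%Y")
df["Time"] = parsed.dt.strftime("%H:%M:%S")
print(df[["Date", "Time"]])
```

Using strftime for the Time column yields strings; use parsed.dt.time instead if you prefer datetime.time objects.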
I have two datetime objects which I want to subtract; however, they both need to be in the same format.
I tried to convert datetime64[ns, pytz.FixedOffset(-240)] (Eastern time zone), but I run into errors. The other datetime object is datetime64[ns], which is already in Eastern time.
1) df['date'].strftime('%Y-%m-%d %H:%M:%S')
error: 'Series' object has no attribute 'strftime'
2) df['date'].replace(tzinfo=None)
error: replace() got an unexpected keyword argument 'tzinfo'
3) df['date'].dt_tz.replace(tzinfo=None)
error: 'Series' object has no attribute 'dt_tz'
In pandas, if you have mixed time zones or UTC offsets, you will get
TypeError: DatetimeArray subtraction must have the same timezones or no timezones
when trying to calculate a timedelta. The error basically tells you how to avoid it: convert everything to the same tz, for example:
import pandas as pd
df = pd.DataFrame({
'date0': pd.to_datetime(["2021-08-01 00:00 -04:00"]), # should be US/Eastern
'date1': pd.to_datetime(["2021-08-01 01:00"]) # should be US/Eastern as well
})
# date0 date1
# 0 2021-08-01 00:00:00-04:00 2021-08-01 01:00:00
# date0 already has a UTC offset but we can set a proper time zone:
df['date0'] = df['date0'].dt.tz_convert('America/New_York')
# date1 is naive, i.e. does not have a time zone, so we need to localize:
df['date1'] = df['date1'].dt.tz_localize('America/New_York')
# since both datetime columns now have the same time zone, we can calculate:
print(df['date1'] - df['date0'])
# 0 0 days 01:00:00
# dtype: timedelta64[ns]
Python's datetime isn't that picky, you can easily calculate timedelta from datetime objects with different time zones:
from datetime import datetime
from zoneinfo import ZoneInfo # Python 3.9
d0 = datetime(2021, 1, 1, tzinfo=ZoneInfo("UTC"))
d1 = datetime(2020, 12, 31, 20, tzinfo=ZoneInfo('America/New_York'))
print(d1-d0)
# 1:00:00
Keep in mind that Python's timedelta arithmetic within a single time zone is wall-time arithmetic, which can give surprising results across DST transitions. So it's sometimes less obvious what's going on, I'd say.
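A short sketch of what wall-time arithmetic means in practice, using a DST transition in America/New_York (clocks went forward on 2021-03-14). The variable names are just for illustration:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

tz = ZoneInfo("America/New_York")

# midnight before the DST transition on 2021-03-14
before = datetime(2021, 3, 14, 0, 0, tzinfo=tz)
# adding 24 hours moves the wall clock by 24 hours
after = before + timedelta(hours=24)
print(after)
# 2021-03-15 00:00:00-04:00

# same tzinfo on both sides: the wall-clock difference is returned
print(after - before)
# 1 day, 0:00:00

# converted to UTC, only 23 hours actually elapsed
elapsed = after.astimezone(timezone.utc) - before.astimezone(timezone.utc)
print(elapsed)
# 23:00:00
```

If you need elapsed time rather than wall-clock time, converting both operands to UTC before subtracting is one way to get it.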
While @MrFuppes' answer is detailed for the generic case, one of my dataframe columns was already tz-aware, so I had to take the steps below, which worked.
Initial format
datetime64[ns, pytz.FixedOffset(-240)] (eastern time zone)
1) Step taken
pd.to_datetime((df['date']).dt.tz_convert('US/Eastern'))
Initial Format
datetime64[ns]
2) Step taken
pd.to_datetime((df['date1']).dt.tz_localize('US/Eastern'))
These two steps brought the datetimes into the same format so that I could perform arithmetic operations.
I have datetime values in local time zone, and I need to convert them into UTC. How can I do this conversion for historical records, considering daylight saving time in the past?
Local UTC
2018/07/20 09:00 ???
2018/12/31 11:00 ???
2019/01/17 13:00 ???
2020/08/15 18:00 ???
This is what I have so far:
from datetime import datetime
import pytz
without_timezone = datetime(2018, 7, 20, 9, 0, 0, 0)
timezone = pytz.timezone("Europe/Vienna")
with_timezone = timezone.localize(without_timezone)
with_timezone
So, I assigned Europe/Vienna to all records (I assume that this considers daylight saving time, right?)
Now I need to convert it into UTC...
Assuming Local contains date/time as observed locally, i.e. including DST active/inactive, you would convert to datetime object, set time zone, and convert to UTC.
Ex:
from datetime import datetime, timezone
from zoneinfo import ZoneInfo # Python 3.9
Local = ["2018/07/20 09:00", "2018/12/31 11:00", "2019/01/17 13:00", "2020/08/15 18:00"]
# to datetime object and set time zone
LocalZone = ZoneInfo("Europe/Vienna")
Local = [datetime.strptime(s, "%Y/%m/%d %H:%M").replace(tzinfo=LocalZone) for s in Local]
for dt in Local:
print(dt.isoformat(" "))
# 2018-07-20 09:00:00+02:00
# 2018-12-31 11:00:00+01:00
# 2019-01-17 13:00:00+01:00
# 2020-08-15 18:00:00+02:00
# to UTC
UTC = [dt.astimezone(timezone.utc) for dt in Local]
for dt in UTC:
print(dt.isoformat(" "))
# 2018-07-20 07:00:00+00:00
# 2018-12-31 10:00:00+00:00
# 2019-01-17 12:00:00+00:00
# 2020-08-15 16:00:00+00:00
Note: with Python 3.9, you don't need third party libraries for time zone handling in Python anymore. There is a deprecation shim for pytz.
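One caveat for historical records: when clocks are set back, a local wall time occurs twice. A minimal sketch of how the standard library's fold attribute disambiguates that case, using the end of DST in Vienna on 2018-10-28 (the variable names are made up for the example):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

vienna = ZoneInfo("Europe/Vienna")

# 02:30 on 2018-10-28 happened twice: clocks went back from 03:00 to 02:00
first = datetime(2018, 10, 28, 2, 30, tzinfo=vienna)           # fold=0, still CEST
second = datetime(2018, 10, 28, 2, 30, fold=1, tzinfo=vienna)  # fold=1, already CET

print(first.astimezone(timezone.utc))
# 2018-10-28 00:30:00+00:00
print(second.astimezone(timezone.utc))
# 2018-10-28 01:30:00+00:00
```

If the source data does not record which of the two occurrences is meant, the default fold=0 picks the first one.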
First, check the UTC offset of your time zone and convert accordingly. To check whether daylight saving time applies, you could write an if statement that checks the day and month, and convert accordingly. Does this help?
from datetime import datetime
import pandas as pd
date="2020-02-07T16:05:16.000000000"
#Convert using datetime
t1=datetime.strptime(date[:-3],'%Y-%m-%dT%H:%M:%S.%f')
#Convert using Pandas
t2=pd.to_datetime(date)
#Subtract the dates
print(t1-t2)
#subtract the date timestamps
print(t1.timestamp()-t2.timestamp())
In this example, my understanding is that both datetime and pandas should use timezone naive dates. Can anyone explain why the difference between the dates is zero, but the difference between the timestamps is not zero? It's off by 5 hours for me, which is my time zone offset from GMT.
Naive datetime objects of Python's datetime.datetime class represent local time. This is kind of obvious from the docs but can be a brain-teaser to work with nevertheless. If you call the timestamp method on it, the returned POSIX timestamp refers to UTC (seconds since the epoch) as it should.
Coming from the Python datetime object, the behavior of a naive pandas.Timestamp can be counter-intuitive (and I think it's not so obvious). Derived the same way from a tz-naive string, it doesn't represent local time but UTC. You can verify that by localizing the datetime object to UTC:
from datetime import datetime, timezone
import pandas as pd
date = "2020-02-07T16:05:16.000000000"
t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f')
t2 = pd.to_datetime(date)
print(t1.replace(tzinfo=timezone.utc).timestamp() - t2.timestamp())
# 0.0
The other way around you can make the pandas.Timestamp timezone-aware, e.g.
t3 = pd.to_datetime(t1.astimezone())
# e.g. Timestamp('2020-02-07 16:05:16+0100', tz='Mitteleuropäische Zeit')
# now both t1 and t3 represent my local time:
print(t1.timestamp() - t3.timestamp())
# 0.0
My bottom line is that if you know that the timestamps you have represent a certain timezone, work with timezone-aware datetime, e.g. for UTC
import pytz # need to use pytz here since pandas uses that internally
t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f').replace(tzinfo=pytz.UTC)
t2 = pd.to_datetime(date, utc=True)
print(t1 == t2)
# True
print(t1-t2)
# 0 days 00:00:00
print(t1.timestamp()-t2.timestamp())
# 0.0
I have been learning a bit about ISO 8601, where the format is
"2018-07-06T07:00:00.000"
What I have achieved so far is converting the ISO string to a more conventional timestamp:
etatime = str(datetime.datetime.strptime("2018-07-06T07:00:00.000", "%Y-%m-%dT%H:%M:%S.%f"))
which will give an output of:
2018-07-06 07:00:00
However, I noticed the time is one hour behind BST (British Summer Time), so one hour should be added.
My question: is it possible to go from (2018-07-06T07:00:00.000) to (2018-07-06 08:00:00 BST)?
Assumptions: the input represents a UTC timestamp, and you want to localise that to London time. You probably do not want to localise it to BST time, since BST is the DST variation of GMT, and an actual location like London will switch between BST and GMT depending on the time of year. You'll want to install the pytz module.
from datetime import datetime, timezone
import pytz
date = '2018-07-06T07:00:00.000'
utc_date = datetime.strptime(date, '%Y-%m-%dT%H:%M:%S.%f').replace(tzinfo=timezone.utc)
london_date = utc_date.astimezone(pytz.timezone('Europe/London'))
print(repr(london_date))
# datetime.datetime(2018, 7, 6, 8, 0, tzinfo=<DstTzInfo 'Europe/London' BST+1:00:00 DST>)
strptime gives you a naïve datetime object (without timezone information), .replace gives you an aware datetime object (with timezone information), which then enables you to simply convert that to a different timezone.
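For Python 3.9 and later, the same conversion can be sketched with the standard library's zoneinfo instead of pytz. This is an alternative to the code above, not a replacement for it; the `winter` example is added only to show the switch back to GMT:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

london = ZoneInfo("Europe/London")

date = "2018-07-06T07:00:00.000"
# assumption, as above: the input string represents UTC
utc_date = datetime.strptime(date, "%Y-%m-%dT%H:%M:%S.%f").replace(tzinfo=timezone.utc)
london_date = utc_date.astimezone(london)

# %Z prints the time zone abbreviation that applies on that date
print(london_date.strftime("%Y-%m-%d %H:%M:%S %Z"))
# 2018-07-06 08:00:00 BST

# in winter, the same zone yields GMT with no offset
winter = datetime(2018, 12, 6, 7, 0, tzinfo=timezone.utc).astimezone(london)
print(winter.strftime("%Y-%m-%d %H:%M:%S %Z"))
# 2018-12-06 07:00:00 GMT
```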
One suggestion is to use the timedelta class from the datetime module (note that a fixed one-hour offset is only correct while BST is in effect):
from datetime import datetime, timedelta
etatime = datetime.strptime("2018-07-06T07:00:00.000", "%Y-%m-%dT%H:%M:%S.%f")
# Before adding one hour
print(etatime)
etatime = etatime + timedelta(hours=1)
# After adding one hour
print(etatime)
Output:
2018-07-06 07:00:00
2018-07-06 08:00:00