Example
import pytz
b=pytz.timezone('Europe/Rome')
c=pytz.timezone('Europe/Berlin')
These two timezones have different names but represent the same thing. However:
b == c returns False
b.zone is different from c.zone
Is there any way to see that b is in reality equal to c?
The concrete problem is that I have to convert the timezone of a pandas data frame, but only if its zone differs from, say, c. The original timezone might be b, and in that case I do not want to convert, since converting b into c would be a waste of time (they represent the same time zone in the end).
Thanks for any help.
Update:
changed 'CET' into 'Europe/Rome' to make sure that the timezones are the same in the example, using the feedback from an answer
They do not represent the same thing.
"CET" is always UTC+01:00
"Europe/Berlin" alternates between CET (UTC+01:00) in the winter, and CEST (UTC+02:00) in the summer.
See also:
The timezone tag wiki - specifically, the section "Time Zone != Offset"
The dst tag wiki - covering daylight saving time.
With regards to the edit, Europe/Rome is a distinct time zone. It is not the same as Europe/Berlin, nor Europe/Zurich, nor Europe/Amsterdam. At least not for their entire histories.
If you compare their definitions (using the links in the prior paragraph), you'll see that these each aligned to the "EU" rule for CET/CEST at some point in their past. Rome and Berlin since 1980, Zurich since 1981, and Amsterdam since 1977. Before those dates, they differed significantly. Other time zones have different rules as well.
If you're interested in the history of these zones, I suggest reading through the europe file in the TZ data. The comments alone are quite interesting.
On the other hand, if you are only working with modern dates, where all zones are following the same rules and offsets, then you could consider these substitutable - at least as long as they don't change in the future.
Also, there are some time zones that are just aliases and are completely interchangeable. In the TZ data, they're called "links". For example, you can see here that Europe/Vatican and Europe/San_Marino are both linked to Europe/Rome, and are therefore equivalent.
It's a bit crude, but I could compare the offsets of both timezones at a given timestamp. This only shows that the zones agree at that one instant.
from datetime import datetime
today = datetime.today()
b.utcoffset(today) == c.utcoffset(today)
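A single timestamp says nothing about the rest of the year. Here is a minimal sketch that runs the same check over a range of dates. It assumes Python 3.9+ and uses the standard-library zoneinfo module in place of pytz. The helper name same_offsets and the 2020 date range are illustrative choices, not part of any library.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def same_offsets(tz_a, tz_b, start, end, step=timedelta(hours=1)):
    """Return True if both zones report the same UTC offset at every sampled instant."""
    instant = start
    while instant <= end:
        # Convert the same UTC instant into each zone and compare the offsets.
        if instant.astimezone(tz_a).utcoffset() != instant.astimezone(tz_b).utcoffset():
            return False
        instant += step
    return True


start = datetime(2020, 1, 1, tzinfo=timezone.utc)
end = datetime(2021, 1, 1, tzinfo=timezone.utc)

rome_berlin = same_offsets(ZoneInfo("Europe/Rome"), ZoneInfo("Europe/Berlin"), start, end)
rome_london = same_offsets(ZoneInfo("Europe/Rome"), ZoneInfo("Europe/London"), start, end)
print(rome_berlin, rome_london)
```

This only proves agreement over the sampled range. It says nothing about dates outside it, and historical dates may well differ.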
If the only reason for not wanting to convert is inefficiency, I would question whether this is really necessary. There is a good blog post by Wes McKinney on vectorized datetime conversion: http://wesmckinney.com/blog/?p=506. As an example, for a series with 1e6 rows:
In [1]: from pandas import *
In [2]: import numpy as np
In [3]: rng = date_range('3/6/2012', periods=1000000, freq='s', tz='US/Eastern')
In [4]: ts = Series(np.random.randn(len(rng)),rng)
In [5]: %timeit ts.tz_convert('utc')
100 loops, best of 3: 2.17 ms per loop
Related
I am trying to convert UTC data to local time in Mozambique. Local time in Mozambique is GMT+2, or Africa/Maputo. However, when using .tz_localize('UTC').tz_convert(X), where X is either 'GMT+2' or 'Africa/Maputo', I get different answers. As an example:
import pandas as pd
import numpy as np
np.random.seed(2019)
N = 1000
rng = pd.date_range('2019-01-01', freq='10Min', periods=N)
df = pd.DataFrame(np.random.rand(N, 3), columns=['temp','depth','acceleration'], index=rng)
print(df.tz_localize('UTC').tz_convert('Etc/GMT+2'))
print(df.tz_localize('UTC').tz_convert('Africa/Maputo'))
The code that solves my problem is: df.tz_localize('UTC').tz_convert('Africa/Maputo'). I therefore wonder whether I have misunderstood tz_convert('Etc/GMT+2'), and why the two approaches don't give the same answer. tz_convert('Etc/GMT-2') does the trick but is not intuitive, at least to me.
Thanks in advance.
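The Etc/GMT+N and Etc/GMT-N zones use the POSIX sign convention, which is the reverse of the usual one: Etc/GMT+2 is two hours behind UTC (UTC-02:00), and Etc/GMT-2 is two hours ahead (UTC+02:00). Perhaps these zones should be deprecated altogether, considering the following comment in the tz data: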
These entries are mostly present for historical reasons, so that
people in areas not otherwise covered by the tz files could "zic -l"
to a time zone that was right for their area. These days, the
tz files cover almost all the inhabited world, so there's little
need now for the entries that are not on UTC.
Your workaround is correct, and the best explanation of why can be found here. I would stick with tz_convert('Africa/Maputo').
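To see the inverted sign directly, here is a minimal sketch that prints the UTC offset of each zone at one instant. It assumes Python 3.9+ and uses the standard-library zoneinfo module rather than pandas. The helper name offset_hours is made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# One fixed instant; Africa/Maputo does not observe DST, so the date is arbitrary.
instant = datetime(2019, 1, 1, tzinfo=timezone.utc)


def offset_hours(zone_name):
    """UTC offset of the named zone at `instant`, in hours."""
    return instant.astimezone(ZoneInfo(zone_name)).utcoffset() / timedelta(hours=1)


for name in ("Etc/GMT+2", "Etc/GMT-2", "Africa/Maputo"):
    print(name, offset_hours(name))
```

Only Etc/GMT-2 and Africa/Maputo report an offset of +2 hours. Etc/GMT+2 reports -2.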
I encountered this issue but found the solution after a bit of research. I have posted my answer below to show my findings. If anyone has alternative suggestions please post them.
I needed to convert a datetime.datetime object to a Unix timestamp. I tried using datetime.timestamp, but the result was 1 hour behind what I expected. I was able to replicate the issue with the following.
from datetime import datetime
dt = datetime.utcfromtimestamp(1438956602.0)
dt now equals datetime.datetime(2015, 8, 7, 14, 10, 2)
Then:
dt_stamp = datetime.timestamp(dt)
Which gives dt_stamp = 1438953002.0 (which is different from our original timestamp). If we convert it back to datetime
datetime.utcfromtimestamp(dt_stamp)
We get:
datetime.datetime(2015, 8, 7, 13, 10, 2)
Which is an hour earlier than our original time.
For context, I am using Python 3 and am based in Britain, where we're currently on British Summer Time (1 hour ahead of UTC).
My solution can be found below. If you think I have missed anything from my explanation or there's a better solution, please post your own answer.
I ran into the same problem recently: in my case, some of the EDF recordings from one hospital in the UK had a one-hour bias, which is thought to be due to British Summer Time.
The following solved it in my case.
from datetime import datetime as dt
Please use
utc_time = dt.utcfromtimestamp(YOUR_TIME_STAMP)
INSTEAD of
local_time = dt.fromtimestamp(YOUR_TIME_STAMP)
The cause of this difference is explained in the datetime.timestamp documentation:
Naive datetime instances are assumed to represent local time and this method relies on the platform C mktime() function to perform the conversion. Since datetime supports wider range of values than mktime() on many platforms, this method may raise OverflowError for times far in the past or far in the future.
Because I am in UTC+1 (during British Summer Time), that is the timezone datetime.timestamp assumes for a naive datetime. My dt actually holds UTC values, so treating it as local time shifts the result by one hour; this is where the mistake comes in. The documentation recommends a few ways to deal with this. I went with the following.
from datetime import datetime, timezone
dt = datetime.utcfromtimestamp(1438956602.0)
dt_stamp = datetime.timestamp(dt.replace(tzinfo=timezone.utc))
By adding .replace(tzinfo=timezone.utc) to the end of dt it specifies that this is done in the UTC timezone. datetime.timestamp then knows to use the UTC time rather than whatever timezone my machine is running.
People in America or other parts of the world will encounter this issue whenever their machine is not set to UTC. If your naive datetime holds local time rather than UTC, set tzinfo to your own timezone instead. Note also that datetime.utcfromtimestamp returns a naive datetime holding UTC values, so unless tzinfo is attached, its result only round-trips through datetime.timestamp on a machine set to UTC.
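Here is a minimal sketch of the round trip, using only the standard library and the timestamp from the question. It builds an aware datetime directly with datetime.fromtimestamp(ts, tz=timezone.utc), which is one alternative to attaching tzinfo afterwards. The variable names are illustrative.

```python
from datetime import datetime, timezone

ts = 1438956602.0  # the timestamp from the question

# Aware datetime in UTC: no dependence on the machine's local timezone.
aware = datetime.fromtimestamp(ts, tz=timezone.utc)
round_trip = aware.timestamp()

# The approach from this answer: a naive datetime holding UTC values,
# tagged as UTC before it is converted back to a timestamp.
naive_utc = aware.replace(tzinfo=None)
fixed = naive_utc.replace(tzinfo=timezone.utc).timestamp()

print(aware, round_trip, fixed)
```

Both round_trip and fixed equal the original timestamp regardless of the machine's timezone. Calling naive_utc.timestamp() directly would only do so on a machine set to UTC.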
I think you need a so-called aware datetime object. Aware means it knows its offset from UTC:
datetime.fromtimestamp(timestamp, timezone(timedelta(hours=1)))
Try it with that line of code, where timestamp is your Unix timestamp.
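One caveat: a fixed offset of one hour matches British time only during summer time. Here is a minimal sketch comparing it with a named zone. It assumes Python 3.9+ and the standard-library zoneinfo module. The winter timestamp is an extra value added purely for illustration.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

summer_ts = 1438956602.0  # 2015-08-07 14:10:02 UTC, from the question
winter_ts = 1420070400.0  # 2015-01-01 00:00:00 UTC, illustrative

fixed = timezone(timedelta(hours=1))  # always UTC+01:00
london = ZoneInfo("Europe/London")    # follows GMT/BST transitions

summer_fixed = datetime.fromtimestamp(summer_ts, fixed)
summer_london = datetime.fromtimestamp(summer_ts, london)
winter_fixed = datetime.fromtimestamp(winter_ts, fixed)
winter_london = datetime.fromtimestamp(winter_ts, london)

print(summer_fixed, summer_london)
print(winter_fixed, winter_london)
```

In summer the two agree. In winter the fixed offset is an hour ahead of the actual local time.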
I have timezone information saved as "PST -8 0". I want to convert this into its equivalent timezone name, e.g. America/Los_Angeles.
Is there any library or API that would help with this conversion? I am looking for a Bash command or a Python API for this task.
I am not sure whether this conversion is possible in the first place, and would welcome your comments.
While there may be some specific cases where you can do what you are asking, in the general case you cannot, for several reasons:
Time zone abbreviations can be ambiguous. For example, there are 5 different meanings for "CST". See Wikipedia's list of time zone abbreviations.
Time zone abbreviations are not uniform and consistent. Wikipedia's list will probably vary from other lists of abbreviations you may find elsewhere. In some places of the world, there is no abbreviation in use, and other places argue about what the correct abbreviation should be.
Time zone offsets do not fully represent a time zone. Pacific time is not -8. It's -8 in the winter and -7 in the summer. It changes for daylight saving time.
Many different time zone identifiers in the tzdb share the same offsets at certain points in time, and sometimes even share the same abbreviations. For the example you gave, the result could just as easily be America/Vancouver instead of America/Los_Angeles. In some cases, the various zones that could match will be significantly different for different points in time. Refer to the list of tzdb zones on Wikipedia, which includes their offsets.
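The last two points can be illustrated with a minimal sketch that lists every zone at a given offset on a given date. It assumes Python 3.9+ and the standard-library zoneinfo module with system tz data available. The helper name zones_with_offset and the 2020 dates are illustrative.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo, available_timezones


def zones_with_offset(offset, instant):
    """Names of all available zones whose UTC offset at `instant` equals `offset`."""
    matches = []
    for name in sorted(available_timezones()):
        try:
            zone = ZoneInfo(name)
        except Exception:
            # Skip any key that cannot be loaded on this system.
            continue
        if instant.astimezone(zone).utcoffset() == offset:
            matches.append(name)
    return matches


winter = datetime(2020, 1, 15, tzinfo=timezone.utc)
summer = datetime(2020, 7, 15, tzinfo=timezone.utc)
minus_eight = timedelta(hours=-8)

winter_matches = zones_with_offset(minus_eight, winter)
summer_matches = zones_with_offset(minus_eight, summer)
print(len(winter_matches), len(summer_matches))
```

In winter both America/Los_Angeles and America/Vancouver match -8. In summer America/Los_Angeles no longer does, so the offset alone cannot identify the zone.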
Check out pytz for a Python API to the Olson timezone database.
As mentioned in the comments, the "PST" part is probably not reliable.
Using pytz you can bruteforce the problem, in a way. If we assume you want to resolve the numerical part of the input, and ignore whatever string precedes it, it could look something like this:
import pytz
import re
from datetime import datetime

def find_timezone_name(input):
    # Capture the signed hour part and the trailing number, e.g. "-8" and "0"
    match = re.match(r'.*([+-]\d+) (\d+)', input)
    if match:
        offset = int(match.group(1))+int(match.group(2))/10.0
    else:
        raise ValueError('Unexpected input format')
    # Offsets are evaluated at the current date, so the result depends on DST
    refdate = datetime.now()
    for tzname in pytz.common_timezones:
        tzinfo = pytz.timezone(tzname)
        tzoffset = tzinfo.utcoffset(refdate)
        if offset == tzoffset.total_seconds()/3600:
            return tzname
    return "Unknown"

print(find_timezone_name('PST -8 0'))
If you want to restrict the timezones to a specific list you can replace pytz.common_timezones with something else and/or apply some other type of logic if you have additional input data that would help your selection process.
Of course, you can adapt the regex to accommodate additional input variants.
Finally, make sure you take into consideration the points mentioned by Matt Johnson in his answer to this same question.
I found the following behaviour when normalizing Timestamps at daylight saving time boundaries in pandas 0.16.2:
import pandas as pd
original_midnight = pd.Timestamp('20121104', tz='US/Eastern')
original_midday = pd.Timestamp('20121104T120000', tz='US/Eastern')
str(pd.tslib.normalize_date(original_midday))
`Out[10]:'2012-11-04 00:00:00-05:00'`
str(original_midnight)
`Out[12]:'2012-11-04 00:00:00-04:00'`
I believe the normalized Timestamp should have the same UTC offset as original_midnight.
Is this a bug, or am I missing something?
The implementation simply truncates the time. It does not appear to manipulate the offset at all, so I will say no, it is not timezone aware.
Also, consider that this type of operation (in many languages) tends to gloss over the fact that not every local day has a midnight. For example, if the time zone is 'America/Sao_Paulo' (Brazil), and the date is on the spring-forward transition (such as 2015-10-18), the hour from 00:00 to 00:59 is skipped, meaning the start of the day is actually 01:00. If the function were to be updated to be timezone aware, it would have to adjust the time as well as the offset.
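The Sao Paulo case can be reproduced with a minimal sketch. It assumes Python 3.9+ and uses the standard-library zoneinfo module rather than pandas. Round-tripping the nonexistent wall-clock midnight through UTC yields the first instant that actually exists on that day.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

sao_paulo = ZoneInfo("America/Sao_Paulo")

# Wall-clock midnight on the spring-forward date; this local time was skipped.
wall_midnight = datetime(2015, 10, 18, 0, 0, tzinfo=sao_paulo)

# Going through UTC and back maps it onto a local time that really occurred.
actual = wall_midnight.astimezone(timezone.utc).astimezone(sao_paulo)

print(wall_midnight.isoformat())
print(actual.isoformat())
```

The round-tripped value lands at 01:00 with the summer-time offset, which is why a timezone-aware normalize would have to adjust the time as well as the offset.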
Is there a way to compare the size of two DateOffset objects?
>>> from pandas.core.datetools import *
>>> Hour(24) > Minute(5)
False
This works with timedelta, so I assumed that pandas would inherit that behavior - or is the time system made from scratch?
pandas DateOffsets do not inherit from timedelta. It's possible for some DateOffsets to be compared, but for offsets like MonthEnd, MonthStart, etc., the span of time to the next offset is non-uniform and depends on the starting date.
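The non-uniformity is easy to see with a minimal sketch that uses only the standard library, not pandas. The helper name days_to_month_end is made up for illustration and is not how pandas implements MonthEnd.

```python
import calendar
from datetime import date, timedelta


def days_to_month_end(d):
    """Time from date `d` to the last day of its month."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return timedelta(days=last_day - d.day)


# Fixed spans such as hours and minutes always compare the same way.
print(timedelta(hours=24) > timedelta(minutes=5))

# A "month end" span depends on where you start.
print(days_to_month_end(date(2021, 2, 1)))
print(days_to_month_end(date(2021, 3, 1)))
```

Since the two month-end spans differ, there is no single timedelta that a MonthEnd offset could be compared as.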
Please feel free to start a github issue on this at https://github.com/pydata/pandas, we can continue the discussion there and it'll serve as a reminder.
Thanks.