I have timezone information saved as "PST -8 0". I want to convert it into its equivalent timezone name, e.g. America/Los_Angeles.
Is there any library or API that would be useful for this conversion? I am looking for a Bash command or a Python API for this task.
I am not sure whether this conversion is possible in the first place, and I am seeking your comments on that as well.
While there may be some specific cases where you can do what you are asking, in the general case you cannot, for several reasons:
Time zone abbreviations can be ambiguous. For example, there are 5 different meanings for "CST". See Wikipedia's list of time zone abbreviations.
Time zone abbreviations are not uniform and consistent. Wikipedia's list will probably vary from other lists of abbreviations you may find elsewhere. In some places of the world, there is no abbreviation in use, and other places argue about what the correct abbreviation should be.
Time zone offsets do not fully represent a time zone. Pacific time is not -8. It's -8 in the winter and -7 in the summer. It changes for daylight saving time.
Many different time zone identifiers in the tzdb share the same offsets at certain points in time, and sometimes even share the same abbreviations. For the example you gave, the result could just as easily be America/Vancouver instead of America/Los_Angeles. In some cases, the various zones that could match will be significantly different for different points in time. Refer to the list of tzdb zones on Wikipedia, which includes their offsets.
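The ambiguity is easy to demonstrate with pytz: at any given instant, many zones report the same UTC offset. A sketch (the probe date below is an arbitrary winter instant, chosen so that US Pacific time is at UTC-8):

```python
import pytz
from datetime import datetime

# An arbitrary winter instant, when US Pacific time is at UTC-8
probe = datetime(2020, 1, 15, 12, 0)

# Collect every common zone whose offset at that instant is -8 hours
matches = [name for name in pytz.common_timezones
           if pytz.timezone(name).utcoffset(probe).total_seconds() == -8 * 3600]

# Both America/Los_Angeles and America/Vancouver end up in the list,
# so the offset alone cannot tell them apart.
```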
Check out pytz for a Python API to the Olson timezone database.
As mentioned in the comments, the "PST" part is probably not reliable.
Using pytz you can brute-force the problem, in a way. If we assume you want to resolve the numerical part of the input and ignore whatever string precedes it, it could look something like this:
import pytz
import re
from datetime import datetime

def find_timezone_name(tz_string):
    match = re.match(r'.*([+-]\d+) (\d+)', tz_string)
    if match:
        offset = int(match.group(1)) + int(match.group(2)) / 10.0
    else:
        raise ValueError('Unexpected input format')
    refdate = datetime.now()
    for tzname in pytz.common_timezones:
        tzinfo = pytz.timezone(tzname)
        tzoffset = tzinfo.utcoffset(refdate)
        if offset == tzoffset.total_seconds() / 3600:
            return tzname
    return "Unknown"

print(find_timezone_name('PST -8 0'))
If you want to restrict the timezones to a specific list you can replace pytz.common_timezones with something else and/or apply some other type of logic if you have additional input data that would help your selection process.
Of course, you can adapt the regex to accommodate additional input variants.
Finally, make sure you take into consideration the points mentioned by Matt Johnson in his answer to this same question.
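As a sketch of restricting the candidate set, returning all matches instead of the first also makes the inherent ambiguity visible (the America/ prefix filter below is just an illustrative assumption, not a recommendation):

```python
import pytz
from datetime import datetime

def find_timezone_names(offset_hours, candidates):
    """Return every candidate zone whose current UTC offset matches."""
    refdate = datetime.now()
    return [name for name in candidates
            if pytz.timezone(name).utcoffset(refdate).total_seconds() / 3600
            == offset_hours]

# Restrict the search space, e.g. to zones in the Americas
na_zones = [z for z in pytz.common_timezones if z.startswith('America/')]
```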
I need to convert a list of strs into a list where each element is either a datetime.datetime or a datetime.datetime plus a datetime.timedelta.
The input list contains strings in ISO 8601 (Wikipedia) format. The strings can be either a date or a time interval. The solution I came up with is the following:
import dateutil.parser

result = []
for str_ in input_list:
    if not is_time_interval_str(str_):
        result.append(dateutil.parser.parse(str_))
    else:
        result.append(parse_time_interval(str_))
What I am stuck with is the two functions is_time_interval_str and parse_time_interval. I have looked for python packages that implement parsing of the time intervals but I couldn't find any yet. I have checked
1. dateutil.parser
2. pyiso8601
3. isodate
4. arrow
5. ciso8601
6. iso8601utils (claims to support time intervals, but only some)
7. maya (offers the functionality, but the implementation is flawed)
8. pendulum (claims to support time intervals, but only some)
Some may be capable of parsing durations like PnYnMnDTnHnMnS, but none is able to parse time intervals such as <start>/<end>.
Packages 6, 7 and 8 also partially work with <start>/<end>, but none of them handles a partial <end> description (example: '2007-01-10/27').
I considered writing my own string parser, but it feels like such a fundamental thing as ISO 8601 parsing should be covered by one of the above packages.
Questions
Should I write my own string parser with something like map(dateutil.parser.parse, str_.split("/"))?
Is there any Python package capable of parsing ISO 8601 time intervals?
I've seen solutions that use database engines to convert str to datetime.datetime; would this be an option for time intervals?
I encountered this issue but found the solution after a bit of research. I have posted my answer below to show my findings. If anyone has alternative suggestions please post them.
I needed to convert a datetime.datetime object to a Unix timestamp. I tried using datetime.timestamp and found the result was 1 hour behind what I expected. I was able to replicate this issue with the following.
from datetime import datetime
dt = datetime.utcfromtimestamp(1438956602.0)
dt now equals datetime.datetime(2015, 8, 7, 14, 10, 2)
Then:
dt_stamp = datetime.timestamp(dt)
Which gives dt_stamp = 1438953002.0 (which is different from our original timestamp). If we convert it back to datetime
datetime.utcfromtimestamp(dt_stamp)
We get:
datetime.datetime(2015, 8, 7, 13, 10, 2)
Which is an hour earlier than our original time.
For context I am using Python 3 and based in Britain where we're currently using British summer time (1 hour ahead of UTC).
My solution can be found below. If you think I have missed anything from my explanation or there's a better solution, please post your own answer.
I ran into the same problem recently; in my case, part of an EDF recording from a hospital in the UK had a one-hour bias, which I believe is due to British Summer Time.
The following is the solution that worked for my case.
from datetime import datetime as dt
Please use
dt = dt.utcfromtimestamp(#YOUR_TIME_STAMP)
INSTEAD of
dt = dt.fromtimestamp(#YOUR_TIME_STAMP)
The cause for this difference is actually shown in the
datetime.timestamp documentation.
Naive datetime instances are assumed to represent local time and this method relies on the platform C mktime() function to perform the conversion. Since datetime supports wider range of values than mktime() on many platforms, this method may raise OverflowError for times far in the past or far in the future.
Because I am in UTC+1 (during British summer time) this is the timezone datetime.timestamp uses to calculate the timestamp. This is where the mistake comes in. The documentation recommends a few ways to deal with this. I went with the following.
from datetime import datetime, timezone
dt = datetime.utcfromtimestamp(1438956602.0)
dt_stamp = datetime.timestamp(dt.replace(tzinfo=timezone.utc))
By adding .replace(tzinfo=timezone.utc) to the end of dt it specifies that this is done in the UTC timezone. datetime.timestamp then knows to use the UTC time rather than whatever timezone my machine is running.
People in America or other parts of the world will encounter this issue if not using the UTC timezone. If this is the case you can set tzinfo to whatever your timezone is. Also note that datetime.utcfromtimestamp is also clearly designed for people using the UTC timezone.
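The round trip can be checked directly: with the .replace(tzinfo=timezone.utc) fix, the timestamp survives intact regardless of the machine's local timezone.

```python
from datetime import datetime, timezone

original = 1438956602.0
dt = datetime.utcfromtimestamp(original)  # naive datetime representing UTC

# Mark the naive datetime as UTC before converting back to a timestamp
dt_stamp = datetime.timestamp(dt.replace(tzinfo=timezone.utc))

# dt_stamp equals original, no matter what timezone the machine is in
```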
I think you need a so-called aware datetime object. Aware means it knows the UTC offset that applies to you:
from datetime import datetime, timezone, timedelta
datetime.fromtimestamp(timestamp, timezone(timedelta(hours=1)))
Try it with that line of code, where timestamp is your Unix timestamp.
I found the following behaviour when normalizing Timestamps at daylight saving time change boundaries in pandas 0.16.2:
import pandas as pd
original_midnight = pd.Timestamp('20121104', tz='US/Eastern')
original_midday = pd.Timestamp('20121104T120000', tz='US/Eastern')
str(pd.tslib.normalize_date(original_midday))
`Out[10]:'2012-11-04 00:00:00-05:00'`
str(original_midnight)
`Out[12]:'2012-11-04 00:00:00-04:00'`
I believe the normalized Timestamp should have the same timezone offset as original_midnight.
Is it a bug, or do I miss something?
The implementation simply truncates the time. It does not appear to manipulate the offset at all, so I will say no, it is not timezone aware.
Also, consider that this type of operation (in many languages) tends to gloss over the fact that not every local day has a midnight. For example, if the time zone is 'America/Sao_Paulo' (Brazil), and the date is on the spring-forward transition (such as 2015-10-18), the hour from 00:00 to 00:59 is skipped, meaning the start of the day is actually 01:00. If the function were to be updated to be timezone aware, it would have to adjust the time as well as the offset.
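The skipped midnight is easy to confirm with pytz (assuming your installed tz database still carries the 2015 Brazilian DST rules):

```python
import pytz
from datetime import datetime

tz = pytz.timezone('America/Sao_Paulo')

try:
    # 00:30 local time on 2015-10-18 falls inside the skipped hour,
    # so asking pytz to localize it strictly raises an error
    tz.localize(datetime(2015, 10, 18, 0, 30), is_dst=None)
    time_exists = True
except pytz.exceptions.NonExistentTimeError:
    time_exists = False
```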
Example
import pytz
b=pytz.timezone('Europe/Rome')
c=pytz.timezone('Europe/Berlin')
These two timezones have different names but represent the same thing; however,
b == c returns False
b.zone is different from c.zone
Is there any way to see that b is in reality equal to c?
The concrete problem is that I have to convert the timezone of a pandas data frame, but only if that zone is different from, let's say, c. The original timezone might be b, and in that case I do not want to convert, as it would be a waste of time to convert b into c (since in the end they represent the same time zone...).
Thanks for any help.
Update:
changed 'CET' into 'Europe/Rome' to make sure that the timezones are the same in the example, using the feedback from an answer
They do not represent the same thing.
"CET" is always UTC+01:00
"Europe/Berlin" alternates between CET (UTC+01:00) in the winter, and CEST (UTC+02:00) in the summer.
See also:
The timezone tag wiki - specifically, the section "Time Zone != Offset"
The dst tag wiki - covering daylight saving time.
With regards to the edit, Europe/Rome is a distinct time zone. It is not the same as Europe/Berlin, nor Europe/Zurich, nor Europe/Amsterdam. At least not for their entire histories.
If you compare their definitions (using the links in the prior paragraph), you'll see that these each aligned to the "EU" rule for CET/CEST at some point in their past. Rome and Berlin since 1980, Zurich since 1981, and Amsterdam since 1977. Before those dates, they differed significantly. Other time zones have different rules as well.
If you're interested in the history of these zones, I suggest reading through the europe file in the TZ data. The comments alone are quite interesting.
On the other hand, if you are only working with modern dates, where all zones are following the same rules and offsets, then you could consider these substitutable - at least as long as they don't change in the future.
Also, there are some time zones that are just aliases and are completely interchangeable. In the TZ data, they're called "links". For example, you can see here that Europe/Vatican and Europe/San_Marino are both linked to Europe/Rome, and are therefore equivalent.
It's kind of hacky, but I could compare the offsets of both timezones at a given timestamp.
from datetime import datetime
today = datetime.today()
b.utcoffset(today) == c.utcoffset(today)
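A single instant can still produce a false match (two zones may coincide today but diverge at other dates), so a slightly more robust, though still heuristic, variant samples several winter and summer dates. The year range below is an arbitrary choice:

```python
import pytz
from datetime import datetime

def same_offsets(tz_a, tz_b, years=range(2015, 2021)):
    """Heuristically compare two zones by sampling offsets on several dates.

    This only checks the sampled instants -- it is not a proof that the
    zones' full histories are identical.
    """
    for year in years:
        for month in (1, 7):  # one winter and one summer probe per year
            probe = datetime(year, month, 1, 12, 0)
            if tz_a.utcoffset(probe) != tz_b.utcoffset(probe):
                return False
    return True

rome = pytz.timezone('Europe/Rome')
berlin = pytz.timezone('Europe/Berlin')
london = pytz.timezone('Europe/London')
```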
If the only reason for not wanting to convert is because of inefficiency, I would question whether this is really necessary. There is a good blog post by Wes McKinney on vectorized datetime conversion http://wesmckinney.com/blog/?p=506. As an example, for a series with 1e6 rows
In [1]: from pandas import *
In [2]: import numpy as np
In [3]: rng = date_range('3/6/2012', periods=1000000, freq='s', tz='US/Eastern')
In [4]: ts = Series(np.random.randn(len(rng)),rng)
In [5]: %timeit ts.tz_convert('utc')
100 loops, best of 3: 2.17 ms per loop
I'm quite new to Python and don't know much about it, but I need to make a small script so that when someone inputs a date in any format, it converts it into yyyy-mm-dd format.
The script should be able to separate the elements of the entered date and identify patterns.
It might be easy and obvious to some, but making one myself is over my head.
Thanks in advance!
This is a difficult task to do yourself; you might want to take a look at dateutil which has a rather robust parse() method that you can use to try and parse arbitrarily formatted date strings.
You can do something like this (not tested):
import locale
import datetime
...
fmt = locale.nl_langinfo(locale.D_FMT)  # the locale's date format (POSIX only)
parsedDate = datetime.datetime.strptime(your_string, fmt)
print(datetime.datetime.strftime(parsedDate, "%Y-%m-%d"))
This assumes that the user will use its own local convention for dates.
You can use strftime for output (your format is "%Y-%m-%d").
For parsing input there's a corresponding function - strptime. But you won't be able to handle "any format". You have to know what you're getting in the first place. Otherwise you wouldn't be able to tell a difference between (for example) American and other dates. What does 01.02.03 mean for example? This could be:
yy.mm.dd
dd.mm.yy
mm.dd.yy
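Since the format cannot be inferred reliably, one pragmatic approach is to decide the priority up front and try explicit strptime formats in that order. A sketch (the format tuple below is an assumption you would tailor to your users):

```python
from datetime import datetime

def normalize_date(s, formats=('%Y-%m-%d', '%d.%m.%y', '%m.%d.%y')):
    """Try candidate formats in priority order; the first match wins.

    The order of `formats` encodes your assumption about which
    convention the input most likely follows.
    """
    for fmt in formats:
        try:
            return datetime.strptime(s, fmt).strftime('%Y-%m-%d')
        except ValueError:
            continue
    raise ValueError('Unrecognized date format: %r' % s)
```

With this priority, '01.02.03' parses as dd.mm.yy, i.e. 2003-02-01; reordering the tuple changes the outcome, which is exactly the ambiguity described above.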