The most recent zoneinfo database, as maintained by the Internet Assigned Numbers Authority (IANA), includes two new prefixes, 'posix' and 'right'. For example, where there used to be just Asia/Kolkata, the new database also contains posix/Asia/Kolkata and right/Asia/Kolkata.
This database is also known as the Olson database after its first developer, or the tz database.
What do these newly added prefixes mean, and what's their practical effect? Can any of them safely be filtered out of timezone-choice picklists presented to users?
Globalized web apps (such as WordPress) use these zoneinfo names for user-preference picklists. They also appear in MySQL's timezone support tables.
right/ (also known as leap, or zoneinfo-leaps) marks zones whose times include leap seconds. posix/ (also zoneinfo-posix, often just zoneinfo) marks zones using POSIX time, i.e. without leap seconds.
In theory you should not choose between them yourself; the system chooses for you, based on how time is stored on the system.
In any case, you should not expose raw zoneinfo names to users.
To quote Python documentation:
Note: These values are not designed to be exposed to end-users; for user-facing elements, applications should use something like CLDR (the Unicode Common Locale Data Repository) to get more user-friendly strings. See also the cautionary note on ZoneInfo.key.
Note also that some zone names are obsolete, and the names are plain ASCII, so they cannot represent city names correctly, and the anglicized version is not always the best choice.
The right/ prefix marks timezones taking leap seconds into account.
The posix/ prefix marks timezones using, well, POSIX time. Those timezones are, practically, the same as the unprefixed ones.
It's probably fine to filter out the prefixed names when presenting timezone choices to users (the way WordPress, for example, does). If you're an astronomer your parsecage may vary.
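A minimal sketch of that filtering, assuming you have a list of zone names (the sample list below is illustrative, not a complete zone dump, and dropping Etc/ as non-geographic is a design choice, not a requirement):

```python
# Sketch: filter the posix/ and right/ prefixes (and other non-geographic
# entries) out of a list of zone names before showing it to users.

def user_facing_zones(names):
    """Drop prefixed and special entries, keeping plain Area/Location names."""
    skip_prefixes = ("posix/", "right/", "Etc/")
    return sorted(n for n in names if not n.startswith(skip_prefixes))

names = [
    "Asia/Kolkata",
    "posix/Asia/Kolkata",
    "right/Asia/Kolkata",
    "Europe/Paris",
    "Etc/GMT+5",
]
print(user_facing_zones(names))  # ['Asia/Kolkata', 'Europe/Paris']
```

On Python 3.9+, `zoneinfo.available_timezones()` already excludes the posix/ and right/ entries for you.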
Related
There are several Python packages that implement the datetime.tzinfo interface, including pytz and dateutil. If someone hands me a timezone object and wants me to apply it to a datetime, the procedure is different depending on what kind of timezone object it is:
def apply_tz_to_datetime(dt: datetime.datetime, tz: datetime.tzinfo, ambiguous, nonexistent):
    if isinstance(tz, dateutil.tz._common._tzinfo):
        ...  # do dt.replace(tzinfo=tz, fold=...)
    elif isinstance(tz, pytz.tzinfo.BaseTzInfo):
        ...  # do tz.localize(dt, is_dst=...)
    # other cases here
(The dateutil.tz case is a lot more complicated than I've shown, because there are a lot of cases to consider for non-existent or ambiguous datetimes, but the gist is always to either call dt.replace(tzinfo=tz, fold=...) or raise an exception.)
Checking dateutil.tz._common._tzinfo seems like a no-no, though. Is there a better way?
I apologize for having to be the guy to say, "You shouldn't be doing that in the first place", but you indeed should not be trying to detect whether a time zone is a dateutil zone, you should instead just use it as a tzinfo object.
From your comments, it seems like the only reason you want to detect this is because pytz has a separate localization / normalization stage, but since pytz is the only library with this unusual interface, it should suffice to detect pytz zones.
As I mentioned in my comment on the dateutil issue, my recommendations are to either:
Not support pytz at all, if that is possible. It is effectively legacy software at this point, and if you have a new library you at least don't have any users who are already expecting to use it with pytz.
If that is not feasible, something like pytz-deprecation-shim might be a useful abstraction here. For a new library, I wouldn't recommend introducing shim zones that expose a pytz-like interface, but its helper functions (which don't require a dependency on pytz!) can be profitably used or re-implemented to either detect pytz zones or seamlessly upgrade them to their modern equivalents. You could also combine this with option 1 by detecting whether a zone is a pytz zone and throwing an error.
In any case, there is no particular reason to enumerate all the different time zone providers, since all except pytz use the standard interface.
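Since pytz is the only common provider that needs a separate localize() step, detection can be done by duck typing, without importing pytz at all. A hedged sketch (the function name and signature here are hypothetical, not from any library):

```python
import datetime

def attach_zone(dt: datetime.datetime, tz: datetime.tzinfo) -> datetime.datetime:
    """Attach tz to a naive datetime, special-casing pytz by duck typing.

    pytz zones are the only common tzinfo implementation that requires a
    separate localize() step, so we detect that method rather than the class.
    """
    if hasattr(tz, "localize"):   # pytz-style zone (no pytz import needed)
        return tz.localize(dt)
    return dt.replace(tzinfo=tz)  # every standard tzinfo, incl. dateutil

# With a stdlib zone the standard path is taken:
dt = attach_zone(datetime.datetime(2021, 3, 14, 2, 30), datetime.timezone.utc)
print(dt.isoformat())  # 2021-03-14T02:30:00+00:00
```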
It appears from the ratio of comments to answers (currently 9/0 = ∞) that there is no available answer to the surface-level question (how to determine whether something is a dateutil.tz-style timezone object). I'll open a feature request ticket with the maintainers of the library.
I'm working on a webapp that has a Javascript front-end that talks JSON with a python (flask) middle and a postgres backend. I'd like to safely represent timestamps without loss, ambiguity or bugs. Python itself doesn't generate any timestamps, but it translates them between the client and the DB, which is where bugs can happen.
JavaScript's lack of a 64-bit integer type means that a sensible count-since-epoch-everywhere scheme is lossy when stored as a JavaScript number, and so ISO datetime strings are the least unnatural format between the client and server. For example, from Python, we can generate:
>>> datetime.fromtimestamp(time.time(), pytz.utc).isoformat()
'2018-02-08T05:42:48.866188+00:00'
Which can be unambiguously interpreted in the whole stack without loss of precision, whether or not the timezone offset is non-zero (as it may be coming from the client).
However, between Python and the database, things get a little tricky. I'm concerned with preventing timezone-unaware datetimes creeping into Python-land and then into the database. For example, a client in China may send JSON to the Flask server in California with a string: '2018-02-07T21:46:33.250477' ... which we may parse as a timezone-unaware ISO datetime. Because this is an ambiguous time, the ONLY sensible thing to do is to reject it as an error. But where? I could manually write validation for each field received by Python, but with a large data model, it's easy to miss a field.
As I understand it, at the DB-schema level it doesn't matter too much whether columns are declared timestamp or timestamptz, provided the queries always (ALWAYS) come with TZ information; they're unambiguously converted to UTC for both types. However, as far as I know, it isn't possible for Postgres to prevent you from putting a timezone-less datetime or time string into timestamptz columns.
Two possibilities come to mind:
Could the flask JSON parser reliably detect iso dates without timezones and reject them?
I could leave the timestamps as strings all the way to the psycopg2 cursor.execute. Can psycopg2's SQL formatter reject timestamp strings that don't have a timezone?
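On the first possibility, the check itself is simple with the stdlib (Python 3.7+): parse the string and reject it if no offset came back. A sketch, with a hypothetical helper name, that could be wired into whatever deserialization layer sees every field:

```python
from datetime import datetime

def parse_aware_iso(value: str) -> datetime:
    """Parse an ISO 8601 string, rejecting any value without a UTC offset."""
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        raise ValueError(f"naive timestamp rejected: {value!r}")
    return dt

print(parse_aware_iso("2018-02-08T05:42:48.866188+00:00"))  # accepted
try:
    parse_aware_iso("2018-02-07T21:46:33.250477")           # no offset
except ValueError as e:
    print(e)
```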
Is there a correct way to format numbers by locale (getting the correct decimal separator) without modifying global state? This is for text generation server-side, so setlocale is not a good idea, and Babel does not yet support Python 3
Unfortunately, the locale module gets and sets global state. This is intrinsic to the design of locale.
The various workarounds include taking a lock around the setlocale calls or delegating formatting to a subprocess run as a service.
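Another option is to sidestep the locale module entirely and pass the separators in explicitly. This is only a sketch: the separators shown match what CLDR reports for, e.g., German, but hard-coding them (rather than looking them up per locale) is an assumption, and the function name is hypothetical.

```python
def format_decimal(value, *, thousands, decimal, places=2):
    """Format a number with explicit separators, touching no global state."""
    # Format with the "C"-style separators first, then swap them in one pass.
    s = f"{value:,.{places}f}"  # e.g. '1,234,567.89'
    return s.translate(str.maketrans({",": thousands, ".": decimal}))

print(format_decimal(1234567.89, thousands=".", decimal=","))  # 1.234.567,89
```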
I've read a bunch of posts on how flaky parsing time can be. I believe I have come up with a reliable way of converting an ISO8601-formatted timestamp here:
https://gist.github.com/3702066
The most important part being the astimezone(LOCALZONE) call when the date is parsed. This allows time.mktime() to do the right thing and appears to handle daylight saving time properly.
Are there obvious gotchas I've missed?
Your code does seem to work even for times that fall just before or just after daylight savings time transitions, but I am afraid it might still fail on those rare occasions when a location's timezone offset actually changes. I don't have an example to test with though.
So even if it does work (or almost always works), I think it's crazy to convert a UTC time string to a UTC timestamp in a manner that involves or passes through local time in any way. The local time zone should be irrelevant. It's an unwanted dependency. I'm not saying that you're crazy. You're just trying to work with the APIs you are given, and the C library's time APIs are badly designed.
Luckily, Python provides an alternative to mktime() that is what the C library should have provided: calendar.timegm(). With this function, I can rewrite your function like this:
parsed = parse_date(timestamp)
timetuple = parsed.timetuple()
return calendar.timegm(timetuple)
Because local time is not involved, this also removes the dependency on pytz and the nagging doubt that an obscure artifact of somebody's local timezone will cause an unwanted effect.
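The key property of calendar.timegm() is that it interprets a struct_time as UTC, so it inverts time.gmtime() exactly, no matter what the local timezone of the machine happens to be. A quick self-contained check:

```python
import calendar
import time

# timegm() treats the struct_time as UTC, so it is the exact inverse of
# gmtime(), regardless of the machine's local timezone settings.
epoch_seconds = 1000000000
roundtrip = calendar.timegm(time.gmtime(epoch_seconds))
print(roundtrip)  # 1000000000
```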
I am wondering what the most reliable way to generate a timestamp is using Python. I want this value to be put into a MySQL database, and for other programming languages and programs to be able to parse this information and use it.
I imagine it is either datetime, or the time module, but I can't figure out which I'd use in this circumstance, nor the method.
import datetime
print(datetime.datetime.now().strftime("%Y-%m-%d-%H%M"))
It should return a string with the format you want. Customize the string by taking a look at strftime(). This for example is the text format I used for a log filename.
For a database, your best bet is to store it in the database-native format, assuming its precision matches your needs. For a SQL database, the DATETIME type is appropriate.
EDIT: Or TIMESTAMP.
If it's just a simple timestamp that needs to be read by multiple programs, but which doesn't need to "mean" anything in SQL, and you don't care about different timezones for different users or anything like that, then seconds from the Unix epoch (start of 1970) is a simple, common standard, and is returned by time.time().
Python actually returns a float (at least on Linux), but if you only need accuracy to the second, store it as an integer.
If you want something that is more meaningful in SQL, then use a SQL type like DATETIME or TIMESTAMP. That lets you do more "meaningful" queries (like querying for a particular day) more easily (you can do them with seconds from the epoch too, but it requires messing around with conversions), but it also gets more complicated with timezones and converting into different formats in different languages.
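A minimal sketch of the seconds-from-epoch approach described above: truncate to an integer for storage, and convert back to an aware UTC datetime whenever a human-readable form is needed.

```python
import time
from datetime import datetime, timezone

ts = int(time.time())                          # integer seconds since the Unix epoch
dt = datetime.fromtimestamp(ts, timezone.utc)  # back to a readable UTC datetime
print(ts, dt.isoformat())

# The round trip is exact at one-second precision:
assert int(dt.timestamp()) == ts
```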