I need to parse a date/time string in a non-English language on a machine running with an en_US locale. The easy solution would be to call setlocale and then strptime.
The problem is that setlocale changes the locale program-wide. My program runs many threads, and they perform locale-sensitive operations.
So setlocale is not an option, because it changes the locale globally. How can I call strptime with a different locale?
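Python's locale module only offers the process-wide setlocale. One workaround that never touches locale state is to translate the locale-specific part yourself and give strptime only numeric fields (%d, %m, %Y, %H and %M do not depend on the locale). A minimal sketch, assuming input shaped like "5 mai 2010 14:30" (French). The month table and parse_localized are hypothetical; real month names would better come from CLDR data. If you are working in C, POSIX uselocale() changes the locale of the calling thread only and may be worth a look.

```python
from datetime import datetime

# Hypothetical month-name table for French input; in practice, load the names
# for the language you need from CLDR data rather than typing them by hand.
FRENCH_MONTHS = {
    "janvier": 1, "février": 2, "mars": 3, "avril": 4,
    "mai": 5, "juin": 6, "juillet": 7, "août": 8,
    "septembre": 9, "octobre": 10, "novembre": 11, "décembre": 12,
}


def parse_localized(text, months=FRENCH_MONTHS):
    """Parse 'D <month name> YYYY HH:MM' without touching the process locale."""
    day, month_name, rest = text.split(" ", 2)
    month = months[month_name.lower()]  # KeyError for an unknown month name
    # Only numeric directives remain, so strptime needs no locale-specific names.
    return datetime.strptime(f"{day} {month} {rest}", "%d %m %Y %H:%M")


print(parse_localized("5 mai 2010 14:30"))  # 2010-05-05 14:30:00
```

Because nothing global is modified, each thread can call this with its own month table.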
Related
The most recent zoneinfo database, as maintained by the Internet Assigned Numbers Authority, holds two new prefixes, 'posix' and 'right'. For example where there used to be just Asia/Kolkata the new database has added posix/Asia/Kolkata and right/Asia/Kolkata.
This database is also known as the Olson database after its first developer, or the tz database.
What do these newly added prefixes mean, and what's their practical effect? Can any of them safely be filtered out of timezone-choice picklists presented to users?
Globalized web apps (such as WordPress) use these zoneinfo names for user-preference picklists. The names also show up in MySQL's time zone support setup.
right (also called leap, or zoneinfo-leaps) holds zones whose time values include leap seconds. posix (also zoneinfo-posix, often just zoneinfo) holds zones using POSIX time, which does not count leap seconds.
In theory you should not have to choose; the system chooses for you, depending on how time is stored on that system.
Either way, you should not show zoneinfo names directly to users.
To quote the Python documentation:
Note
These values are not designed to be exposed to end-users; for user facing elements, applications should use something like CLDR (the Unicode Common Locale Data Repository) to get more user-friendly strings. See also the cautionary note on ZoneInfo.key.
Note: some zones are obsolete, the names are plain ASCII and so cannot represent city names correctly, and the English form of a name is not always the best one.
The right/ prefix marks timezones taking leap seconds into account.
The posix/ prefix marks timezones using, well, POSIX time. Those timezones are, practically, the same as the unprefixed ones.
It's probably fine to filter out the prefixed names when presenting timezone choices to users (the way WordPress, for example, does). If you're an astronomer your parsecage may vary.
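If your picklist is built from a raw list of zone names (for example, the Name column of MySQL's mysql.time_zone_name table), the filtering described above comes down to a prefix check. A minimal sketch; user_facing_zones is a hypothetical helper, and it only drops the posix/ and right/ duplicates, not obsolete or alias zones.

```python
DUPLICATE_PREFIXES = ("posix/", "right/")


def user_facing_zones(names):
    """Return zone names without the posix/ and right/ duplicates, sorted."""
    # str.startswith accepts a tuple, so one call checks both prefixes.
    return sorted(name for name in names if not name.startswith(DUPLICATE_PREFIXES))


zones = ["right/Asia/Kolkata", "Asia/Kolkata", "posix/Asia/Kolkata", "Europe/Paris"]
print(user_facing_zones(zones))  # ['Asia/Kolkata', 'Europe/Paris']
```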
Background
I've been unable to find a deeper explanation of where the time value used by Python really comes from. Most of the documentation says that it 'gets the value from CPython' but does not explain where it comes from beyond that point.
Some countries have recently changed or are looking to change their DST policies. This means that devices in those countries may auto-adjust the time zone and produce an incorrect GMT-based time until they are updated with the current policies.
That's not terrible on its own, because operating systems usually have a good update mechanism. However, how Python gets updated on those same systems is sometimes a very murky area. The OS may handle it, or Python may be bundled with the application. The OS may still be getting critical updates while Python is not.
The Problem / Question
We do not control how Python is updated alongside our code. The concern is that we may get an incorrect Unix timestamp from Python because its GMT calculation is out of date.
If Python delegates the GMT calculation to the OS, then we can rest easy. If it does not, we may have to force the value to come from the OS (e.g. by running date +%s).
So the question is: does Python 3 get its GMT time from the OS, or from its own calculation based on local time?
Python does not keep its own database of timezone information; it delegates to the OS and/or the pytz third-party package for all timezone-related calculations. UTC or local timestamps are fetched from the OS via the C gmtime_*() and localtime_*() functions from the platform's time.h.
For the full story in source, have a look at the Python version of the datetime module, the C version of the same, the time module, and the pytime.c C module.
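A quick way to see why stale DST rules cannot corrupt a Unix timestamp: time.time() reads the OS clock as seconds since the epoch, and converting to or from broken-down UTC (time.gmtime() and calendar.timegm()) applies no DST or UTC-offset rules. Only local-time conversions such as time.localtime() and time.mktime() depend on those rules. A minimal sketch; epoch_via_utc is a hypothetical helper. One caveat ties back to the zoneinfo discussion: under a right/ zone, time_t itself counts leap seconds, so the round trip below assumes ordinary zone data.

```python
import calendar
import time


def epoch_via_utc(ts):
    """Round-trip a Unix timestamp through broken-down UTC time."""
    utc_fields = time.gmtime(ts)  # struct_time in UTC; no DST or offset rules applied
    return calendar.timegm(utc_fields)  # pure calendar arithmetic back to seconds


now = int(time.time())  # seconds since the epoch, read from the OS clock
print(now == epoch_via_utc(now))  # True with ordinary (posix/ or unprefixed) zone data
```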
Is there a correct way to format numbers by locale (getting the correct decimal separator) without modifying global state? This is for server-side text generation, so setlocale is not a good idea, and Babel does not yet support Python 3.
Unfortunately, the locale module gets and sets global state. This is intrinsic to the design of locale.
The various workarounds include setting locks or calling a subprocess as a service.
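For the narrow case of decimal and grouping separators, another workaround is to skip the locale module entirely and apply explicit separators per request. Python's "," format option always emits commas and periods regardless of setlocale (unlike the "n" presentation type), so the result can be remapped safely. A minimal sketch; the SEPARATORS table and format_number are hypothetical, real separator data would come from CLDR, and it assumes groups of three digits, which not every locale uses.

```python
# Hypothetical separator table; a real one would be generated from CLDR data.
SEPARATORS = {
    "en_US": {"decimal": ".", "group": ","},
    "de_DE": {"decimal": ",", "group": "."},
}


def format_number(value, locale_name, decimals=2):
    """Format value with the given locale's separators, using no global state."""
    seps = SEPARATORS[locale_name]
    text = f"{value:,.{decimals}f}"  # e.g. "1,234,567.89"; ignores setlocale
    # translate() maps every character in one pass, so swapping "," and "."
    # cannot clobber a replacement that was already made.
    return text.translate(str.maketrans({",": seps["group"], ".": seps["decimal"]}))


print(format_number(1234567.891, "de_DE"))  # 1.234.567,89
```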
I've read a bunch of posts on how flaky parsing time can be. I believe I have come up with a reliable way of converting an ISO8601-formatted timestamp here:
https://gist.github.com/3702066
The most important part is the astimezone(LOCALZONE) call when the date is parsed. This lets time.mktime() do the right thing, and it appears to handle daylight saving time properly.
Are there obvious gotchas I've missed?
Your code does seem to work even for times that fall just before or just after daylight saving time transitions, but I am afraid it might still fail on those rare occasions when a location's timezone offset actually changes. I don't have an example to test with, though.
So even if it does work (or almost always works), I think it's crazy to convert a UTC time string to a UTC timestamp in a way that involves or passes through local time at all. The local time zone should be irrelevant. It's an unwanted dependency. I'm not saying that you're crazy. You're just trying to work with the APIs you are given, and the C library's time APIs are badly designed.
Luckily, Python provides an alternative to mktime() that is what the C library should have provided: calendar.timegm(). With this function, I can rewrite your function like this:
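import calendar

def to_timestamp(timestamp):  # illustrative name; parse_date is the gist's ISO 8601 parser
    parsed = parse_date(timestamp)
    # utctimetuple() converts an offset-aware result to UTC before timegm() sees it
    return calendar.timegm(parsed.utctimetuple())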
Because local time is not involved, this also removes the dependency on pytz and the nagging doubt that an obscure artifact of somebody's local timezone will cause an unwanted effect.
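A self-contained variant of the same idea, for stamps with whole seconds and either a Z or a ±HH:MM offset. It swaps the gist's parse_date (not shown here) for datetime.strptime with %z, which accepts Z and colon-separated offsets since Python 3.7; iso8601_to_epoch is a hypothetical name. Note utctimetuple() rather than timetuple(): for a stamp with a non-zero offset, timetuple() would hand timegm() the wall-clock fields of that offset, and the result would be off by the offset.

```python
import calendar
from datetime import datetime


def iso8601_to_epoch(stamp):
    """Convert 'YYYY-MM-DDTHH:MM:SS' plus 'Z' or '+HH:MM' to a Unix timestamp."""
    # %z accepts 'Z' as well as offsets like '+05:30' (Python 3.7+).
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")
    # utctimetuple() shifts the aware datetime to UTC; timegm() then does pure
    # calendar arithmetic, so the machine's local zone never enters the picture.
    return calendar.timegm(parsed.utctimetuple())


print(iso8601_to_epoch("2000-01-01T05:30:00+05:30"))  # 946684800
```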
I currently have a Python script set up that uses feedparser to read and parse a feed. However, I recently ran into a problem with the date parsing. The feed I am reading contains <modified>2010-05-05T24:17:54Z</modified>, which Python turns into the datetime object 2010-05-06 00:17:54. Notice the discrepancy: the feed entry was modified on the 5th of May, while Python reads it as the 6th.
So the question is why this is happening. Is the Atom feed (that is, whoever created the feed) wrong to put the time as 24:17:54, or is my Python script handling it wrongly?
And can I solve this?
There are some interesting special cases in the RFC here (https://www.rfc-editor.org/rfc/rfc3339); however, they are typically about the seconds field, such as 23:59:60 versus 23:59:59 to allow for leap seconds. It may be, though, that 24:17:54 is legal. My guess is that it's doing the "right thing". In all honesty, date/time handling gets really messy because of things like DST and local time zones. If the timestamp says 24:17:54, that might be the right thing after all.
I think "today at 24:17" is sensibly parsed as "tomorrow at 00:17". I'd say your script is handling the producer's bug well.
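Both halves of this can be seen in code: a strict parser refuses hour 24, while a lenient one that adds the time to midnight as a duration rolls 24:17:54 over into the next day, the same result reported in the question. A minimal sketch of that interpretation, not necessarily how feedparser implements it; both helper names are hypothetical, and the lenient one handles only the 'YYYY-MM-DDTHH:MM:SSZ' shape.

```python
from datetime import datetime, timedelta


def accepted_by_strptime(stamp):
    """True if strptime accepts the stamp as written (hours must be 00-23)."""
    try:
        datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        return True
    except ValueError:
        return False


def parse_lenient_utc(stamp):
    """Parse 'YYYY-MM-DDTHH:MM:SSZ', treating the time as an offset from midnight."""
    date_part, time_part = stamp.rstrip("Z").split("T")
    hour, minute, second = (int(field) for field in time_part.split(":"))
    midnight = datetime.strptime(date_part, "%Y-%m-%d")
    # Hour 24 becomes a 24-hour timedelta, which carries into the next day.
    # The result is naive but represents UTC, as the trailing Z indicates.
    return midnight + timedelta(hours=hour, minutes=minute, seconds=second)


stamp = "2010-05-05T24:17:54Z"
print(accepted_by_strptime(stamp))  # False
print(parse_lenient_utc(stamp))  # 2010-05-06 00:17:54
```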