I need to convert
FROM
a list of strs
TO
a list of
either a datetime.datetime
or a datetime.datetime plus a datetime.timedelta.
The input list contains strings in ISO 8601 format (see Wikipedia). Each string is either a date or a time interval. The solution I came up with is the following:
import dateutil.parser

result = []
for str_ in input_list:
    if not is_time_interval_str(str_):
        result.append(dateutil.parser.parse(str_))
    else:
        result.append(parse_time_interval(str_))
What I am stuck on is the two functions is_time_interval_str and parse_time_interval. I have looked for Python packages that parse time intervals but haven't found one yet. I have checked:
1. dateutil.parser
2. pyiso8601
3. isodate
4. arrow
5. ciso8601
6. iso8601utils (claims to support time intervals, but does only some)
7. maya (offers the functionality, but the implementation is flawed)
8. pendulum (claims to support time intervals, but does only some)
Some may be capable of parsing durations like PnYnMnDTnHnMnS but none is able to parse time intervals like for example <start>/<end>.
iso8601utils, maya and pendulum (6, 7 and 8 above) also work partially with <start>/<end>, but none of them handles a partial <end> description (for example '2007-01-10/27').
I considered writing my own string parser, but it feels like something as fundamental as ISO_8601 parsing should already be implemented by one of the above packages.
Questions
Should I write my own string parser with something like map(dateutil.parser.parse, str_.split("/"))?
Is there any python package capable of parsing time intervals of the ISO_8601?
I've seen solutions that use database engines for conversion of str to datetime.datetime, would this be an option for time intervals?
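A minimal sketch of the hand-written route, using only the standard library. The function names come from the question, but the bodies are an assumption about what is wanted: they handle <start>/<end> in extended format, including a partial <end> that inherits its leading fields from <start>, and they do not handle duration forms such as <start>/PnD. datetime.fromisoformat (Python 3.7+) stands in for dateutil here.

```python
from datetime import datetime


def is_time_interval_str(text):
    # ISO 8601 separates the two halves of an interval with a solidus.
    return "/" in text


def parse_time_interval(text):
    """Return (start, end - start) for a '<start>/<end>' string."""
    start_str, end_str = text.split("/", 1)
    if start_str.startswith("P") or end_str.startswith("P"):
        raise NotImplementedError("duration-based intervals are not covered by this sketch")
    start = datetime.fromisoformat(start_str)
    try:
        end = datetime.fromisoformat(end_str)
    except ValueError:
        # Partial end: borrow the missing leading characters from the start,
        # e.g. '2007-01-10' + '27' -> '2007-01-27'.
        prefix = start_str[: len(start_str) - len(end_str)]
        end = datetime.fromisoformat(prefix + end_str)
    return start, end - start


print(parse_time_interval("2007-01-10/27"))
```

The prefix trick only works when both halves use the same extended notation, so treat it as a starting point rather than a complete ISO 8601 implementation.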
Related
I know that there have been similar questions asked, but they seemed to have to do with the way datetime deals (or doesn't deal) with timezones.
The setup is a little complicated, and probably not relevant to the problem, but I thought it was important to include the code as is, so a little background:
I've got a dictionary of arrays. Each of these arrays represents an "attempt" by the same person, but taking place at different times. Ultimately I'm going to be looking for the earliest of these dates. This may be a bit of a roundabout solution, but I'm converting all of the dates to datetime objects, finding the earliest and then just using that index to pull out the first attempt:
Here's what the code looks like to set up that array of attempt datetimes:
for key in duplicates_set.keys():
    attempt_dates = [datetime.strptime(attempt['Attempt Date'], "%-m-%-d-%y %-H:%M:%S") for attempt in duplicates_set[key]]
Here's the format of what one of the original date strings looks like:
12-5-2016 3:27:58 PM
What I'm getting back is:
ValueError: '-' is a bad directive in format '%-m-%d-%y %-H:%M:%S'
I assume that's referring to the dashes placed before the 'm', 'd' and 'H' because they're non-zero-padded decimals. Why is it telling me that?
%-* -- to skip padding -- is a GNU libc extension. It's not part of POSIX strftime, and thus not guaranteed to be portable to systems where your time-formatting calls aren't eventually backed by GNU's strftime C library function.
The Python datetime module documentation explicitly specifies the format strings it supports, and this extension is not listed. Thus, while GNU date and GNU strftime() support it, Python's datetime does not guarantee it, and datetime.strptime in particular rejects it with the error above.
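For the sample string in the question, a format without the dash flags should work, since strptime accepts values without leading zeros for %m, %d and %I. This is a sketch that assumes the 12-hour clock with an AM/PM marker and a four-digit year, as in the example string; the variable names are illustrative.

```python
from datetime import datetime

raw = "12-5-2016 3:27:58 PM"

# %m, %d and %I match one- or two-digit values when parsing.
# %Y is the four-digit year; %p is the AM/PM marker (locale-dependent).
parsed = datetime.strptime(raw, "%m-%d-%Y %I:%M:%S %p")
print(parsed.isoformat())  # 2016-12-05T15:27:58
```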
I had the same issue with the date 1/9/21. According to https://strftime.org/ the correct format would have been "%-d/%-m/%y", which gave the bad directive error.
"%d-/%m-/%y" didn't work either.
What did work was "%d/%m/%y", because strptime accepts day and month values without leading zeros for %d and %m.
I have timezone information saved as "PST -8 0". I want to convert this into its equivalent timezone name, e.g. America/Los_Angeles.
Is there any library or API that would help with this conversion? I am looking for a Bash command or a Python API for this task.
I am not sure whether this conversion is possible in the first place, so I would appreciate your comments on that as well.
While there may be some specific cases where you can do what you are asking, in the general case you cannot, for several reasons:
Time zone abbreviations can be ambiguous. For example, there are 5 different meanings for "CST". See Wikipedia's list of time zone abbreviations.
Time zone abbreviations are not uniform and consistent. Wikipedia's list will probably vary from other lists of abbreviations you may find elsewhere. In some places of the world, there is no abbreviation in use, and other places argue about what the correct abbreviation should be.
Time zone offsets do not fully represent a time zone. Pacific time is not -8. It's -8 in the winter and -7 in the summer. It changes for daylight saving time.
Many different time zone identifiers in the tzdb share the same offsets at certain points in time, and sometimes even share the same abbreviations. For the example you gave, the result could just as easily be America/Vancouver instead of America/Los_Angeles. In some cases, the various zones that could match will be significantly different for different points in time. Refer to the list of tzdb zones on Wikipedia, which includes their offsets.
Check out pytz for a Python API to the Olson timezone database.
As mentioned in the comments, the "PST" part is probably not reliable.
Using pytz you can bruteforce the problem, in a way. If we assume you want to resolve the numerical part of the input, and ignore whatever string precedes it, it could look something like this:
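import math
import re
from datetime import datetime

import pytz


def find_timezone_name(text):
    # Expect a signed hour offset followed by a second number, e.g. "PST -8 0".
    match = re.match(r'.*([+-]\d+) (\d+)', text)
    if not match:
        raise ValueError('Unexpected input format')
    hours = int(match.group(1))
    # Assumption: the second number is minutes and carries the sign of the hours.
    offset = hours + math.copysign(int(match.group(2)), hours) / 60.0

    # Compare against each zone's offset as of right now (DST-dependent).
    refdate = datetime.now()
    for tzname in pytz.common_timezones:
        tzoffset = pytz.timezone(tzname).utcoffset(refdate)
        if offset == tzoffset.total_seconds() / 3600:
            return tzname  # first match only; other zones may share this offset
    return "Unknown"


print(find_timezone_name('PST -8 0'))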
If you want to restrict the timezones to a specific list you can replace pytz.common_timezones with something else and/or apply some other type of logic if you have additional input data that would help your selection process.
Of course, you can adapt the regex to accommodate additional input variants.
Finally, make sure you take into consideration the points mentioned by Matt Johnson in his answer to this same question.
I'm quite new to Python and don't know much about it, but I need to make a small script that takes a date entered in any format and converts it to yyyy-mm-dd format.
The script should be able to share elements of the entered date, and identify patterns.
It might be easy and obvious to some, but making one by myself is over my head.
Thanks in advance!
This is a difficult task to do yourself; you might want to take a look at dateutil which has a rather robust parse() method that you can use to try and parse arbitrarily formatted date strings.
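You can do something like this (not tested):
import locale
from datetime import datetime
...
# locale.D_FMT must be looked up with nl_langinfo (Unix only) to get the format string
locale.setlocale(locale.LC_TIME, "")
parsedDate = datetime.strptime(your_string, locale.nl_langinfo(locale.D_FMT))
print(parsedDate.strftime("%Y-%m-%d"))
This assumes that users follow their own locale's convention for dates.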
You can use strftime for output (your format is "%Y-%m-%d").
For parsing input there's a corresponding function, strptime. But you won't be able to handle "any format". You have to know what you're getting in the first place. Otherwise you can't tell the difference between (for example) American and other dates. What does 01.02.03 mean, for example? This could be:
yy.mm.dd
dd.mm.yy
mm.dd.yy
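One way to act on this is to decide up front which formats you accept and try them in order. The sketch below assumes a small, hypothetical list of formats (KNOWN_FORMATS) and a hypothetical helper name (normalize_date); the order of the list is what resolves ambiguous inputs, so it has to reflect what your users actually type.

```python
from datetime import datetime

# Hypothetical list of accepted input formats, tried in this order.
KNOWN_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")


def normalize_date(text, formats=KNOWN_FORMATS):
    """Return the date as yyyy-mm-dd, or raise ValueError if no format fits."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError("unrecognized date format: %r" % (text,))


print(normalize_date("31.12.2020"))  # 2020-12-31
```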
I find myself needing to specify a timespan in a python configuration file a lot.
Is there a way that I can specify a more human readable timeframe (similar to PostgreSQL's Interval syntax) in a python configuration file with stdlib? Or will this require a 3rd party lib?
Clarification: I'm not looking for anything in the ConfigParser.ConfigParser stdlib API specifically. I guess what I really need is a way to go from a human-readable date/time interval to a datetime.timedelta value.
I found a good answer to this in a somewhat related question. It turns out the humanfriendly library does that fairly well:
In [1]: import humanfriendly
In [2]: humanfriendly.parse_timespan('1w')
Out[2]: 604800.0
That's in seconds. To get a timedelta object, you can simply load that:
In [3]: from datetime import timedelta
In [4]: timedelta(seconds=humanfriendly.parse_timespan('1w'))
Out[4]: datetime.timedelta(7)
Since humanfriendly also supports converting the other way, you can also do full round trip, which would look like:
In [5]: humanfriendly.format_timespan(timedelta(seconds=humanfriendly.parse_timespan('1w')).total_seconds())
Out[5]: '1 week'
Note that format_timespan unfortunately does not accept timedelta objects, only a number of seconds.
I don't think there is a standard library module for that. I wrote one that does that. You can install it, or adapt it to your needs.
The module is called pycopia.timespec
It converts strings such as "1day 3min" to seconds, as a float. It's easy to get a datetime.timedelta from that.
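If a third-party dependency is not wanted, a small regex on top of datetime.timedelta may be enough for simple configuration values. This is a sketch under assumptions: the single-letter unit suffixes (w, d, h, m, s) and the name parse_timespan are chosen here for illustration and do not mirror the syntax of humanfriendly or pycopia.timespec.

```python
import re
from datetime import timedelta

# Assumed unit suffixes, mapped to timedelta keyword arguments.
UNIT_TO_KWARG = {"w": "weeks", "d": "days", "h": "hours", "m": "minutes", "s": "seconds"}
TOKEN = re.compile(r"(\d+(?:\.\d+)?)\s*([wdhms])")
WHOLE = re.compile(r"\s*(?:\d+(?:\.\d+)?\s*[wdhms]\s*)+")


def parse_timespan(text):
    """Convert a string such as '1w 3d 2h' into a datetime.timedelta."""
    text = text.lower()
    if not WHOLE.fullmatch(text):
        raise ValueError("cannot parse timespan: %r" % (text,))
    kwargs = {}
    for value, unit in TOKEN.findall(text):
        name = UNIT_TO_KWARG[unit]
        kwargs[name] = kwargs.get(name, 0.0) + float(value)
    return timedelta(**kwargs)


print(parse_timespan("1w"))
```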
I want to generate a fixed-length (say 10 characters) hash based on the current date and time. This hash will be appended to the names of files uploaded by my users. How can I do that in Python?
Batteries included:
Python 3
import hashlib
import time

# Keep one hash object: calling hashlib.sha1() again would start a fresh, empty hash
hash = hashlib.sha1()
hash.update(str(time.time()).encode("utf-8"))
print(hash.hexdigest())
print(hash.hexdigest()[:10])
Python2
import hashlib
import time
hash = hashlib.sha1()
hash.update(str(time.time()))
print hash.hexdigest()
print hash.hexdigest()[:10]
I think my comment is a reasonable answer so I am going to post it. The code uses the python time() function to get the number of seconds since the unix epoch:
import time
import datetime
ts = int(time.time()) # this removes the decimals
# convert the timestamp to a datetime object if you want to extract the date
d = datetime.datetime.fromtimestamp(ts)
The timestamp is currently a 10-digit integer that can easily be converted back to a datetime object for other uses. If you want to shrink the timestamp further, you could encode the number in hexadecimal or some other format, e.g.
hex(int(time.time()))
This reduces the length to 8 characters once you remove the 0x prefix.
EDIT:
In your comment you specified that you don't want people to figure out the original date so I would suggest doing something like:
hex(int(time.time() + 12345))[2:] #The [2:] removes the 0x prefix
Just choose a number and remember to subtract it when you extract the timestamp. Without knowing this number, a user would have a hard time inferring the real date from the code.
int(stamp,16) - 12345
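Put together, the round trip could look like the sketch below. SECRET_OFFSET and the two helper names are hypothetical, and the offset only obscures the date from a casual reader; it is not encryption.

```python
import time

SECRET_OFFSET = 12345  # hypothetical value; pick your own and keep it private


def encode_timestamp(ts):
    # Shift the Unix timestamp, then drop the '0x' prefix of its hex form.
    return hex(int(ts) + SECRET_OFFSET)[2:]


def decode_timestamp(stamp):
    # Reverse of encode_timestamp: parse the hex digits, then undo the shift.
    return int(stamp, 16) - SECRET_OFFSET


stamp = encode_timestamp(time.time())
print(stamp, decode_timestamp(stamp))
```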
import time
'{0:010x}'.format(int(time.time() * 256))[:10]
Check out strftime for python. You can format the date/time string any number of ways to get the 'look' you want.
What about changing the base of the current milliseconds since the epoch? For example, in JavaScript, changing to base 36:
Date.now().toString(36)
Results in :
"jv8pvlbg"
That should be unique down to the millisecond, preserve date order, and be shorter than 10 characters.
The only catch is that it is not secure, but in your case security is not important, right?
Sorry, I don't have the answer for Python, but it should be straightforward and should not require any library. My two cents.
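A possible Python counterpart of the base-36 idea, as a sketch using only the standard library. Python has no built-in equivalent of toString(36), so to_base36 is a hypothetical helper written here for illustration.

```python
import time

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(number):
    """Encode a non-negative integer in base 36 using digits and lowercase letters."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


# Milliseconds since the epoch, like Date.now() in JavaScript.
stamp = to_base36(int(time.time() * 1000))
print(stamp)
```

The built-in int(stamp, 36) converts the string back to the millisecond count.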