python convert string to datetime - python

I have a loop that processes a set of data, and one step converts an ordinary string to a datetime. Everything works fine, except that sometimes something weird happens. Here is what I know:
The parameters passed to the function are always exactly the same.
Those parameters are always the same type.
The first time I run it, it always gets through.
When it reaches the second element in the loop, in approximately 80% of cases it throws a ValueError (time data did not match format).
But after I run it again, everything is OK, and then it gets stuck on the next element ...
Because my function is pretty big and there is a lot going on in it, I decided to provide some sample code, which I wrote right here just for clarification:
import datetime

data = ['January 20 1999', 'March 4 2010', 'June 11 1819']
dformat = '%B %d %Y'
for item in data:
    out = datetime.datetime.strptime(item, dformat)
    print out
Although this sample clearly works, in my program it doesn't ... I have tried everything I could come up with but haven't succeeded yet, so I would be glad of any ideas. Thanks.
By the way, the error I always get looks like this:
ValueError: time data did not match format: data=March 4 2010 fmt=%B %d %Y

You probably have a different locale set. %B matches "March" only in locales that use English month names; in other locales the parse fails.
For example:
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'sv_SE.utf8')
'sv_SE.utf8'
>>> import datetime
>>>
>>> data = ['January 20 1999', 'March 4 2010', 'June 11 1819']
>>> for item in data:
...     print datetime.datetime.strptime(item, '%B %d %Y')
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.6/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data 'January 20 1999' does not match format '%B %d %Y'
Here you can see that even though the format does match, it claims it doesn't. That's because the month names don't match. Change the data to the Swedish month names, and it works again:
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'sv_SE.utf8')
'sv_SE.utf8'
>>> import datetime
>>>
>>> data = ['Januari 20 1999', 'Mars 4 2010', 'Juni 11 1819']
>>> for item in data:
...     print datetime.datetime.strptime(item, '%B %d %Y')
...
1999-01-20 00:00:00
2010-03-04 00:00:00
1819-06-11 00:00:00
(Note that the locale 'sv_SE.utf8' above might not work for you, because that specific locale has to be installed. To see which locales are installed on a Unix machine, run this command from the command line:
$ locale -a
C
en_AG
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_NG
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZW.utf8
POSIX
sv_FI.utf8
sv_SE.utf8
)
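To check which month names %B expects in the running process, a quick diagnostic along these lines may help. This is only a sketch: month_names is a hypothetical helper written for illustration, not something from the standard library.

```python
import datetime
import locale


def month_names():
    """Return the twelve full month names that %B produces in the active locale."""
    # month_names is a made-up helper name, used here only for illustration.
    return [datetime.date(2000, month, 1).strftime('%B') for month in range(1, 13)]


# Calling setlocale without a second argument only queries the current setting.
current = locale.setlocale(locale.LC_TIME)
names = month_names()
print(current)
print(names)
```

If the printed names are not the English ones, strptime with %B will reject 'March 4 2010' in that process.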

That's pretty weird, though: the locale usually doesn't change within a single run. However, if your program keeps doing this, you might want to call setlocale every time the code enters the loop (an ugly solution, I know).
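If resetting the locale does turn out to be necessary, one slightly less ugly variant is to scope the change to the parsing call and restore the previous setting afterwards. The following is a minimal sketch, not a drop-in fix: the names time_locale and parse_english_date are made up for illustration, and it assumes the input always uses English month names. Keep in mind that setlocale affects the whole process, so this is not safe if other threads format or parse dates at the same time.

```python
import contextlib
import datetime
import locale


@contextlib.contextmanager
def time_locale(name):
    """Temporarily switch LC_TIME to `name`, restoring the old value on exit."""
    saved = locale.setlocale(locale.LC_TIME)  # no second argument: query only
    try:
        locale.setlocale(locale.LC_TIME, name)
        yield
    finally:
        locale.setlocale(locale.LC_TIME, saved)


def parse_english_date(text, fmt='%B %d %Y'):
    # The "C" locale is always available and uses English month names.
    with time_locale('C'):
        return datetime.datetime.strptime(text, fmt)


data = ['January 20 1999', 'March 4 2010', 'June 11 1819']
parsed = [parse_english_date(item) for item in data]
```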

Related

Time data does not match format '%c'

This is very unexpected behavior...
I create a time string using the '%c' directive.
%c is the Locale’s appropriate date and time representation.
Then I try to parse the resulting time string, specifying the same '%c' as the string's format.
However this does not work as you can see from the error below. What am I missing?
I need to be able to store the time in a human-readable localized string, and then convert the string back into a struct_time so I can extract information from it.
(It is extremely important that the string be localized, and I of course don't want to write parsing algorithms for all locales around the world!)
# Ensure the locale is set.
import locale
locale.setlocale(locale.LC_ALL, '')
'en_US.UTF-8'
# 1. Create a localized time string using the '%c' directive.
import datetime
time_stamp = datetime.datetime.now().strftime('%c')
time_stamp
'Mon 21 Dec 2020 03:47:55 PM '
# 2. Try to parse the string using the same directive used to create it.
import time
time.strptime(time_stamp, '%c')
# 3. Unexpected error...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/_strptime.py", line 562, in _strptime_time
    tt = _strptime(data_string, format)[0]
  File "/usr/lib/python3.8/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data 'Mon 21 Dec 2020 03:47:55 PM ' does not match format '%c'
Your locale is probably not configuring .strftime("%c") the way you expect, and .strptime is objecting to the trailing %p (PM).
Use locale.nl_langinfo(locale.D_T_FMT) to build your format instead!
>>> locale.nl_langinfo(locale.D_T_FMT)
'%a %b %e %H:%M:%S %Y'
>>> locale.setlocale(locale.LC_ALL, '')
'en_US.UTF-8'
>>> locale.nl_langinfo(locale.D_T_FMT)
'%a %b %e %X %Y'
However, if you
.. know the exact structure of the output, filter exact matches with a regex and then parse
.. can control the format, don't bother to format it and directly use time.time()
.. or always work in UTC and format as ISO 8601, deriving a tz-aware object and reading back with a custom parser (refer to the Caution on .fromisoformat)
>>> datetime.datetime.now(tz=datetime.timezone.utc)
datetime.datetime(2020, 12, 22, 0, 4, 29, 537007, tzinfo=datetime.timezone.utc)
Alternatively, use pytz, which is much "smarter" about time zones than the built-in datetime library and properly supports a huge variety of them.
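For the UTC and ISO 8601 option above, a minimal sketch of the round trip might look like this. It assumes Python 3.7 or later (where datetime.fromisoformat exists) and that the stored string was produced by isoformat(); the variable names are illustrative only.

```python
import datetime

# Work in UTC and drop microseconds so the stored string stays compact.
now = datetime.datetime.now(tz=datetime.timezone.utc).replace(microsecond=0)

# Store this machine-readable form, not the localized one.
stored = now.isoformat()

# fromisoformat() reads back strings written by isoformat().
restored = datetime.datetime.fromisoformat(stored)

# Produce the localized, human-readable text only for display.
display = restored.strftime('%c')
print(stored)
print(display)
```

The idea is to keep the localized string for people and the ISO string for the program, so nothing ever has to parse '%c' output.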
Instead of using %c, you can specify how you want to format the date using %a, %b and other directives. For example:
import locale
locale.setlocale(locale.LC_ALL, 'en_US.utf-8')
import datetime
fmt = '%a %b %d %Y %H:%M:%S'
time_stamp = datetime.datetime.now().strftime(fmt)
print(time_stamp)
import time
print(time.strptime(time_stamp, fmt))
This produces the output you are looking for:
Mon Dec 21 2020 21:27:50
time.struct_time(tm_year=2020, tm_mon=12, tm_mday=21, tm_hour=21, tm_min=27, tm_sec=50, tm_wday=0, tm_yday=356, tm_isdst=-1)

Python Dateutil Parsing: Minimum number of components

The Python dateutil package can parse date(time)s without a format being specified. It always attempts to return a date, even when the input does not appear to be one (e.g. 12). What would be a Pythonic way to ensure that at least a day, month and year component are present in the input?
from dateutil import parser
dstr = '12'
dtime = parser.parse(dstr)
Returns
2019-06-12 00:00:00
One way you could do it is by splitting the input string on the likely date delimiters (e.g., ., -, :). So, this way you could input 2016.5.19 or 2016-5-19.
from dateutil import parser
import re

def date_parser(thestring):
    # Split on the likely date delimiters: '.', '-' and ':'
    pieces = re.split(r'\.|-|:', thestring)
    if len(pieces) < 3:
        raise Exception('Must have at least year, month and date passed')
    return parser.parse(thestring)

print('---')
thedate = date_parser('2019-6-12')
print(thedate)
print('---')
thedate = date_parser('12')
print(thedate)
This will output:
---
2019-06-12 00:00:00
---
Traceback (most recent call last):
  File "bob.py", line 18, in <module>
    thedate = date_parser('12')
  File "bob.py", line 9, in date_parser
    raise Exception('Must have at least year, month and date passed')
Exception: Must have at least year, month and date passed
So the first one passes, as there are 3 "pieces" to the date. The second one doesn't.
This will get dodgy depending on what is in the re.split, one will have to make sure all the right delimiters are in there.
You could remove the : in the delimiters if you want just typical date delimiters.
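If the accepted input shapes are known in advance, a stricter alternative is to skip free-form parsing and try a short list of explicit formats with the standard library's strptime, which only succeeds when day, month and year are all present. This swaps dateutil for a different technique, so treat it as a sketch under that assumption; ACCEPTED_FORMATS and parse_full_date are hypothetical names, and the format list is just an example.

```python
import datetime

# Example formats only; extend this tuple to match the real inputs.
# Note that %B (full month name) depends on the active locale.
ACCEPTED_FORMATS = ('%Y-%m-%d', '%Y.%m.%d', '%d %B %Y')


def parse_full_date(text):
    """Parse `text` only if it contains a complete day, month and year."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next format
    raise ValueError('expected a full day, month and year, got %r' % text)


print(parse_full_date('2019-6-12'))
print(parse_full_date('2016.5.19'))
```

Calling parse_full_date('12') raises ValueError instead of silently filling in the missing month and year.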

Reading log file last 5 minutes python 2.6

I am making a script in Python for reading the last 5 minutes of a log file. This is my code so far:
from datetime import datetime, timedelta
now = datetime.now()
before = timedelta(minutes=5)
now = now.replace(microsecond=0)
before = (now-before)
now = (now.strftime("%b %d %X"))
before = (before.strftime("%b %d %X"))
print(before)
print(now)
with open('user.log','r') as f:
    for line in f:
        if before in line:
            break
    for line in f:
        if now in line:
            break
        print (line.strip())
The output is Sep 03 11:47:25 Sep 03 11:52:25, which is the print that checks whether the times are correct. Nearly 100 lines in the log fall in that range, but the script prints nothing; if I take the ifs out, it prints all the lines, which shows that the problem is in the ifs...
Any ideas?
Here is an example of my log file content:
Sep 03 10:18:47 bni..........teagagfaesa.....
Sep 03 10:18:48 bni..........teagagfaesa.....2
I managed to find a Python even older than yours.
#!/usr/bin/env python
from __future__ import with_statement
from datetime import datetime, timedelta
before = timedelta(minutes=5)
now = datetime.now().replace(microsecond=0, year=1900)
before = (now-before)
with open('user.log','r') as f:
    for line in f:
        if datetime.strptime(line[0:15], '%b %d %X') < before:
            continue
        print line.strip()
The change compared to your code is that we convert each time stamp from the file into a datetime object; then we can trivially compare these properly machine-readable representations the way you'd expect. Without parsing the dates, comparing them as strings can't work except by chance: "Sep" sorts after "Aug", but "Sep" sorts after "Oct", too, so it seems to work if you run it in a suitable month and then breaks the next month!
The year=1900 hack is because strptime() defaults to year 1900 for inputs which don't have a year.
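For readers on Python 3, the same idea can be sketched without the year=1900 hack by putting the current year into the string before parsing. This is an illustrative variant rather than the answer's code: recent_lines is a hypothetical helper, it assumes every timestamp occupies the first 15 characters of the line, and it does not handle logs that span a year boundary.

```python
from datetime import datetime, timedelta


def recent_lines(lines, now, window=timedelta(minutes=5)):
    """Yield the lines whose leading timestamp lies within `window` before `now`."""
    cutoff = now - window
    for line in lines:
        try:
            # Prepend the year so the parsed value is a complete date.
            stamp = datetime.strptime(
                '%d %s' % (now.year, line[:15]), '%Y %b %d %H:%M:%S')
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if cutoff <= stamp <= now:
            yield line.rstrip('\n')


sample = [
    'Sep 03 11:40:00 old entry',
    'Sep 03 11:48:10 recent entry',
    'Sep 03 11:52:00 newest entry',
    'a line without a timestamp',
]
now = datetime(2015, 9, 3, 11, 52, 25)
result = list(recent_lines(sample, now))
print(result)
```

With a real file, the lines argument can simply be the open file object, and now would be datetime.now().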

Python: Parse one string into multiple variables?

I am pretty sure that there is a function for this, but I have been searching for a while without finding it, so I decided to simply ask SO instead.
I am writing a Python script that parses and analyzes text messages from an input file. Each line looks like this:
Oct 24, 2014, 19:20 - Lee White: Hello world!
or:
Apr 4, 19:20 - Lee White: Hello world!
If the year in the datetime is not mentioned, it means that the message was sent in the current year.
What I want to do, is parse this string into multiple variables. Ideally, I am looking for a function that takes an input string, a format string, and a couple of variables to store the output in:
foo(input, "MMM DD, YYYY, HH:MM - Sender: Text", &mon, &day, &year, &hour, &minutes, &sender, &text)
Does such a thing exist in Python?
This uses the remarkably useful dateutil library to make date parsing easier; you can install it with pip install python-dateutil or easy_install python-dateutil. Split the data on the : and the - to get the message and sender, then process the date text to get a datetime object whose attributes give you the required components, e.g.:
from dateutil.parser import parse
s = 'Apr 4, 19:20 - Lee White: Hello world!'
fst, _, msg = s.rpartition(': ')
date, _, name = fst.partition(' - ')
date = parse(date)
name, msg, date.year, date.month, date.day, date.hour, date.minute
# ('Lee White', 'Hello world!', 2015, 4, 4, 19, 20)
Method strptime() may be used:
import time
strn = 'Apr 4, 19:20 - Lee White: Hello world!'
try:
    date = time.strptime(strn.split(' - ')[0],'%b %d, %Y, %H:%M')
    year = date.tm_year
except ValueError:
    date = time.strptime(strn.split(' - ')[0],'%b %d, %H:%M')
    year = time.asctime().split()[-1]
sender = strn.split('- ')[1].split(':')[0]
text = strn.split(': ')[1]
date.tm_mon, date.tm_mday, year, date.tm_hour, date.tm_min, sender, text
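Another option that stays within the standard library is a regular expression with named groups, which covers both line shapes in one pattern and tolerates colons inside the message text. This is a sketch under the assumption that lines look exactly like the two examples in the question; LINE_RE, parse_message and the default_year parameter are illustrative names, not an existing API.

```python
import re
from datetime import datetime

# Matches "Oct 24, 2014, 19:20 - Sender: Text" and "Apr 4, 19:20 - Sender: Text".
LINE_RE = re.compile(
    r'^(?P<mon>[A-Z][a-z]{2}) (?P<day>\d{1,2}),'
    r'(?: (?P<year>\d{4}),)?'
    r' (?P<hour>\d{1,2}):(?P<minute>\d{2})'
    r' - (?P<sender>[^:]+): (?P<text>.*)$'
)


def parse_message(line, default_year=None):
    """Return (datetime, sender, text); a missing year falls back to default_year."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError('unrecognised line: %r' % line)
    # No year in the line means "current year", per the question.
    year = int(match.group('year') or default_year or datetime.now().year)
    stamp = datetime.strptime(
        '%d %s %s %s:%s' % (year, match.group('mon'), match.group('day'),
                            match.group('hour'), match.group('minute')),
        '%Y %b %d %H:%M')
    return stamp, match.group('sender'), match.group('text')


first = parse_message('Oct 24, 2014, 19:20 - Lee White: Hello world!')
second = parse_message('Apr 4, 19:20 - Lee White: Hello: world!', default_year=2015)
```

The individual components are then available as attributes of the returned datetime (year, month, day, hour, minute).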

Handling international dates in python

I have a date that is either in German, e.g.,
2. Okt. 2009
and also perhaps as
2. Oct. 2009
How do I convert this into an ISO datetime (or Python datetime)?
Solved by using this snippet:
for l in locale.locale_alias:
    worked = False
    try:
        locale.setlocale(locale.LC_TIME, l)
        worked = True
    except:
        worked = False
    if worked: print l
And then plugging the appropriate value in for the parameter l in setlocale.
Can parse using
import datetime
print datetime.datetime.strptime("09. Okt. 2009", "%d. %b. %Y")
http://docs.python.org/library/locale.html
The datetime module is already locale-aware.
It's something like the following
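import datetime
import locale

# date_string holds the text to parse, e.g. "2. Okt. 2009".
# German locale (names are platform-dependent; "de_DE.UTF-8" is an assumption)
loc = locale.setlocale(locale.LC_TIME, "de_DE.UTF-8")
try:
    # datetime.date has no strptime(); parse with datetime.datetime, then take the date
    date = datetime.datetime.strptime(date_string, "%d. %b. %Y").date()
except ValueError:
    # English locale (again, the exact name depends on the system)
    loc = locale.setlocale(locale.LC_TIME, "en_US.UTF-8")
    date = datetime.datetime.strptime(date_string, "%d. %b. %Y").date()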
A very minor point about your code snippet: I'm no Python expert, but I'd consider the whole "flag to check for success + silently swallowing all exceptions" approach to be bad style.
try/except/else does what you want in a cleaner way, I think:
for l in locale.locale_alias:
    try:
        locale.setlocale(locale.LC_TIME, l)
    except locale.Error: # the doc says setlocale should throw this on failure
        pass
    else:
        print l
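Putting the pieces of this thread together, a helper that tries several locales in turn and restores the original setting afterwards could be sketched as follows. The function name parse_in_locales and the candidate locale names are assumptions for illustration; which names actually work depends on what is installed on the machine, and setlocale is process-wide, so this is not thread-safe.

```python
import datetime
import locale


def parse_in_locales(text, fmt, locale_names):
    """Parse `text` under each LC_TIME locale in turn; return the first success."""
    saved = locale.setlocale(locale.LC_TIME)  # no second argument: query only
    try:
        for name in locale_names:
            try:
                locale.setlocale(locale.LC_TIME, name)
            except locale.Error:
                continue  # this locale is not installed here
            try:
                return datetime.datetime.strptime(text, fmt)
            except ValueError:
                continue  # the month name did not match in this locale
        raise ValueError('%r matched none of the candidate locales' % text)
    finally:
        # Always put the original locale back, even on success or error.
        locale.setlocale(locale.LC_TIME, saved)


# 'de_DE.UTF-8' is tried first if installed; 'C' always exists and is English.
parsed = parse_in_locales('2. Oct. 2009', '%d. %b. %Y', ['de_DE.UTF-8', 'C'])
print(parsed.date().isoformat())
```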
