Python: Parse one string into multiple variables? - python

I am pretty sure that there is a function for this, but I have been searching for a while without luck, so I decided to simply ask SO instead.
I am writing a Python script that parses and analyzes text messages from an input file. Each line looks like this:
Oct 24, 2014, 19:20 - Lee White: Hello world!
or:
Apr 4, 19:20 - Lee White: Hello world!
If the year in the datetime is not mentioned, it means that the message was sent in the current year.
What I want to do is parse this string into multiple variables. Ideally, I am looking for a function that takes an input string, a format string, and a couple of variables to store the output in:
foo(input, "MMM DD, YYYY, HH:MM - Sender: Text", &mon, &day, &year, &hour, &minutes, &sender, &text)
Does such a thing exist in Python?

This uses the remarkably useful dateutil library to make date parsing easier; you can install it with pip install python-dateutil (or easy_install python-dateutil). Split the data on the ': ' and the ' - ' to get the message and the sender, then parse the date text into a datetime object whose attributes give you the required components, e.g.:
from dateutil.parser import parse
s = 'Apr 4, 19:20 - Lee White: Hello world!'
fst, _, msg = s.rpartition(': ')
date, _, name = fst.partition(' - ')
date = parse(date)
name, msg, date.year, date.month, date.day, date.hour, date.minute
# ('Lee White', 'Hello world!', 2015, 4, 4, 19, 20)  (a missing year defaults to the current year, here 2015)

The strptime() function may be used:
import time
strn = 'Apr 4, 19:20 - Lee White: Hello world!'
# Split off the timestamp first, then separate sender and text
stamp, _, rest = strn.partition(' - ')
try:
    date = time.strptime(stamp, '%b %d, %Y, %H:%M')
    year = date.tm_year
except ValueError:
    # No year in the timestamp: the message was sent in the current year
    date = time.strptime(stamp, '%b %d, %H:%M')
    year = time.localtime().tm_year
sender, _, text = rest.partition(': ')
date.tm_mon, date.tm_mday, year, date.tm_hour, date.tm_min, sender, text
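If you would rather have something closer to the single foo(input, format, ...) call from the question, a regular expression with named groups is another option. The following is only a sketch using the standard library; the names parse_message, LINE_RE and MONTHS are made up for this example, and it assumes English three-letter month abbreviations exactly as in the two sample lines.

```python
import re
from datetime import datetime

# Month abbreviations as they appear in the sample lines (English, assumed).
MONTHS = {name: number for number, name in enumerate(
    ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
     'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], start=1)}

# Hypothetical pattern for "Oct 24, 2014, 19:20 - Lee White: Hello world!"
LINE_RE = re.compile(
    r'(?P<mon>[A-Z][a-z]{2}) (?P<day>\d{1,2}),'
    r'(?: (?P<year>\d{4}),)?'            # the year part is optional
    r' (?P<hour>\d{1,2}):(?P<minute>\d{2})'
    r' - (?P<sender>[^:]+): (?P<text>.*)'
)

def parse_message(line, default_year=None):
    """Return (datetime, sender, text) for one line of the input file."""
    m = LINE_RE.fullmatch(line)
    if m is None or m.group('mon') not in MONTHS:
        raise ValueError('unrecognized line: %r' % (line,))
    if m.group('year'):
        year = int(m.group('year'))
    elif default_year is not None:
        year = default_year
    else:
        # No year given: assume the message was sent in the current year
        year = datetime.now().year
    sent = datetime(year, MONTHS[m.group('mon')], int(m.group('day')),
                    int(m.group('hour')), int(m.group('minute')))
    return sent, m.group('sender'), m.group('text')
```

Usage would look like sent, sender, text = parse_message(line), after which sent.month, sent.day, sent.hour and so on are available. Unlike splitting on the last ': ', this keeps a message text that itself contains ': ' intact.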

Related

How to limit the input in Python for only dates?

I'm currently working on my project.
I want the user to be allowed to input only a date (e.g. January 2). If they enter anything other than a date, a message like "This is not a date, try again" should appear repeatedly until a real date is given. How do I do this?
My initial idea was to create a .txt file where I write all 365 dates and then somehow require that the user enter a string matching one of the entries in the file, or else try again.
I would really appreciate your help.
Use dateutil.parser to handle dates of arbitrary formats.
Code
import dateutil.parser
def valid_date(date_string):
    try:
        date = dateutil.parser.parse(date_string)
        return True
    except ValueError:
        return False
Test
for date in ['Somestring', 'Feb 20, 2021', 'Feb 20', 'Feb 30, 2021', 'January 25, 2011', '1/15/2020']:
    print(f'Valid date {date}: {valid_date(date)}')
Output
Valid date Somestring: False # detects non-date strings
Valid date Feb 20, 2021: True
Valid date Feb 20: True
Valid date Feb 30, 2021: False # Recognizes Feb 30 as invalid
Valid date January 25, 2011: True
Valid date 1/15/2020: True # Handles different formats
There is no need to store all possible valid dates in a file.
Use datetime.strptime() to parse a string (entered by the user) into a datetime object according to a specific format.
strptime will raise an exception if the input does not adhere to the pattern, so you can catch that exception and tell the user to try again.
Wrap it all in a while loop to make it work forever, until the user gets it right.
You can start with this:
from datetime import datetime
pattern = '%B %d, %Y' # e.g. January 2, 2021
inp = ''
date = None
while date is None:
    inp = input('Please enter a date: ')
    try:
        date = datetime.strptime(inp, pattern)
        break
    except ValueError:
        print(f'"{inp}" is not a valid date.')
        continue
For a full list of the %-codes that strptime supports, check out the Python docs.
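The question's example input ("January 2") has no year, which the pattern above would reject. One way around that, sketched below, is to append a fixed leap year before parsing so that "February 29" is accepted too. The function names parse_month_day and ask_for_date and the read parameter are invented for this example, and '%B' expects full English month names only under the default locale.

```python
from datetime import datetime

def parse_month_day(text, year=2000):
    """Return (month, day) for input like 'January 2', or None if invalid.

    A leap year is appended (an assumption of this sketch) so that
    'February 29' passes while 'February 30' does not.
    """
    try:
        parsed = datetime.strptime('%s %d' % (text.strip(), year), '%B %d %Y')
    except ValueError:
        return None
    return parsed.month, parsed.day

def ask_for_date(read=input):
    """Keep asking until the user enters a valid month and day."""
    while True:
        result = parse_month_day(read('Please enter a date: '))
        if result is not None:
            return result
        print('This is not a date, try again')
```

Passing the reader in as a parameter is just a convenience for testing; calling ask_for_date() with no arguments uses the built-in input().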
Here are several ways to verify a date. These are simple implementations without strict checks; choose one of the methods and supplement it with more detailed checks yourself.
Use date(year,month,day)
from datetime import date

def isValidDate(year, month, day):
    try:
        date(year, month, day)
    except (ValueError, TypeError):
        return False
    else:
        return True
Use date.fromisoformat()
from datetime import date

def isValidDate(datestr):
    try:
        date.fromisoformat(datestr)
    except (ValueError, TypeError):
        return False
    else:
        return True
Use strptime
from datetime import datetime

def check_date(i):
    # Accepted formats: full ISO date, or year followed by month
    valids = ['%Y-%m-%d', '%Y%m']
    for valid in valids:
        try:
            return datetime.strptime(i, valid)
        except ValueError:
            pass
    return False
Use regex
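import re

def check_date(text):
    # Checks only the YYYY-MM-DD shape, not whether the date exists
    reg = re.compile(r'^(\d{4})-(\d{2})-(\d{2})$')
    return reg.match(text) is not None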
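Note that the regex method only checks the shape of the string, so something like 2021-02-30 would pass. A possible combination of the regex and the date(year, month, day) methods is sketched below; the name is_valid_iso_date is made up for this example.

```python
import re
from datetime import date

# Shape check only: four digits, two digits, two digits
ISO_SHAPE = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

def is_valid_iso_date(text):
    """Return True only for YYYY-MM-DD strings that name a real date."""
    m = ISO_SHAPE.fullmatch(text)
    if m is None:
        return False
    year, month, day = map(int, m.groups())
    try:
        # date() rejects impossible combinations such as February 30
        date(year, month, day)
    except ValueError:
        return False
    return True
```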

Time data does not match format '%c'

This is very unexpected behavior...
I create a time string using the '%c' directive.
%c is the Locale’s appropriate date and time representation.
Then I try to parse the resulting time string, specifying the same '%c' as the string's format.
However this does not work as you can see from the error below. What am I missing?
I need to be able to store the time in a human-readable localized string, and then convert the string back into a struct_time so I can extract information from it.
(It is extremely important that the string be localized, and I of course don't want to write parsing algorithms for all locales around the world!)
# Ensure the locale is set.
import locale
locale.setlocale(locale.LC_ALL, '')
'en_US.UTF-8'
# 1. Create a localized time string using the '%c' directive.
import datetime
time_stamp = datetime.datetime.now().strftime('%c')
time_stamp
'Mon 21 Dec 2020 03:47:55 PM '
# 2. Try to parse the string using the same directive used to create it.
import time
time.strptime(time_stamp, '%c')
# 3. Unexpected error...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/_strptime.py", line 562, in _strptime_time
    tt = _strptime(data_string, format)[0]
  File "/usr/lib/python3.8/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data 'Mon 21 Dec 2020 03:47:55 PM ' does not match format '%c'
Your locale is probably not configuring .strftime("%c") the way you expect, and .strptime is objecting to the trailing %p (PM).
Use locale.nl_langinfo(locale.D_T_FMT) to build your format instead!
>>> locale.nl_langinfo(locale.D_T_FMT)
'%a %b %e %H:%M:%S %Y'
>>> locale.setlocale(locale.LC_ALL, '')
'en_US.UTF-8'
>>> locale.nl_langinfo(locale.D_T_FMT)
'%a %b %e %X %Y'
However, if you ...
... know the exact structure of the output, filter exact matches with a regex and then parse them
... can control the format, don't bother formatting it and use time.time() directly
... or always work in UTC and format as ISO 8601, deriving a tz-aware object and reading it back with a custom parser (refer to the Caution on .fromisoformat)
>>> datetime.datetime.now(tz=datetime.timezone.utc)
datetime.datetime(2020, 12, 22, 0, 4, 29, 537007, tzinfo=datetime.timezone.utc)
... or use pytz, which is much "smarter" than the datetime builtin lib about time zones (note that it deals with time zones rather than locales)
Instead of using %c, you can specify how you want to format the date using %a, %b and other directives. For example:
import locale
locale.setlocale(locale.LC_ALL, 'en_US.utf-8')
import datetime
fmt = '%a %b %d %Y %H:%M:%S'
time_stamp = datetime.datetime.now().strftime(fmt)
print(time_stamp)
import time
print(time.strptime(time_stamp, fmt))
This produces the output you are looking for:
Mon Dec 21 2020 21:27:50
time.struct_time(tm_year=2020, tm_mon=12, tm_mday=21, tm_hour=21, tm_min=27, tm_sec=50, tm_wday=0, tm_yday=356, tm_isdst=-1)
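To make the "work in UTC and format as ISO 8601" suggestion concrete: one possible arrangement is to store the ISO 8601 string, which parses back reliably, and to produce the localized '%c' string only when displaying it. This is a sketch, not the only way to do it; the function names to_storage, from_storage and for_display are invented here, and datetime.fromisoformat requires Python 3.7 or later.

```python
from datetime import datetime, timezone

def to_storage(moment):
    """Serialize an aware datetime as an ISO 8601 string in UTC."""
    return moment.astimezone(timezone.utc).isoformat()

def from_storage(text):
    """Read back a string produced by to_storage()."""
    return datetime.fromisoformat(text)

def for_display(moment):
    """Localized, human-readable form; for showing only, not for parsing."""
    return moment.astimezone().strftime('%c')

stamp = to_storage(datetime.now(tz=timezone.utc))
restored = from_storage(stamp)
print(stamp)
print(for_display(restored))
```

The stored value stays independent of the locale, so the round trip does not depend on what '%c' expands to on a given machine.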

Change the format of a QDate

I need to change the format of a QDate. Here is my code:
yday = (QtCore.QDate.currentDate().addDays(-1))
And I got this result...
PyQt4.QtCore.QDate(2015, 4, 2)
But I need the date in this format:
2015/04/03
A QDate can be converted to a string using its toString method:
>>> yday = QtCore.QDate.currentDate().addDays(-1)
>>> yday.toString()
'Thu Apr 2 2015'
>>> yday.toString(QtCore.Qt.ISODate)
'2015-04-02'
>>> yday.toString('yyyy/MM/dd')
'2015/04/02'
Note that this output is from Python3. If you're using Python2, by default, the output will be a QString - but it can be converted to a python string using unicode().
If you convert the QDate to a Python date first (PyQt provides toPyDate() for this), you can also use strftime():
yourdate.toPyDate().strftime('%Y/%m/%d')
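For comparison, if the date does not need to be a QDate at all, the same "yesterday as yyyy/MM/dd" result can be had with the standard library alone. This is a minimal sketch; the function name yesterday_as_text and its today parameter are made up for the example.

```python
from datetime import date, timedelta

def yesterday_as_text(today=None):
    """Format the day before `today` (default: the current date) as YYYY/MM/DD."""
    if today is None:
        today = date.today()
    return (today - timedelta(days=1)).strftime('%Y/%m/%d')

print(yesterday_as_text())
```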

python convert string to datetime

I have a loop in which I process a set of data, and one step is converting an ordinary string to a datetime. Everything works fine, except that sometimes a weird thing happens. Here is what I know:
the same parameters always enter the function
those parameters are always of the same type
the first time I run it, it always gets through
when it gets to the second element in the loop, in approximately 80% of cases it throws a ValueError (time data did not match format)
but after I run it again, everything is OK, and it gets stuck on the next element
Because my function is pretty big and there are many things happening in it, I decided to provide some sample code that I wrote right here, just for clarification:
import datetime
data = ['January 20 1999', 'March 4 2010', 'June 11 1819']
dformat = '%B %d %Y'
for item in data:
    out = datetime.datetime.strptime(item, dformat)
    print(out)
Although this clearly works, in my program it doesn't. I have tried everything I could come up with but haven't succeeded yet, so I would be glad for any ideas. Thanks.
BTW: the error I always get looks like this:
ValueError: time data did not match format: data=March 4 2010 fmt=%B %d %Y
You probably have a different locale set up. %B is March in locales that use English, but in other locales it will fail.
For example:
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'sv_SE.utf8')
'sv_SE.utf8'
>>> import datetime
>>>
>>> data = ['January 20 1999', 'March 4 2010', 'June 11 1819']
>>> for item in data:
...     print datetime.datetime.strptime(item, '%B %d %Y')
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.6/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data 'January 20 1999' does not match format '%B %d %Y'
Here you see that even though the format does match, it claims it doesn't. That's because the month names don't match. Change them to the Swedish month names, and it works again:
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'sv_SE.utf8')
'sv_SE.utf8'
>>> import datetime
>>>
>>> data = ['Januari 20 1999', 'Mars 4 2010', 'Juni 11 1819']
>>> for item in data:
...     print datetime.datetime.strptime(item, '%B %d %Y')
...
1999-01-20 00:00:00
2010-03-04 00:00:00
1819-06-11 00:00:00
(Note that the locale 'sv_SE.utf8' used above might not work for you, because that specific locale has to be installed. To see which locales are installed on a Unix machine, run this command from the command line:
$ locale -a
C
en_AG
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_NG
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZW.utf8
POSIX
sv_FI.utf8
sv_SE.utf8
)
Pretty weird, though. The locale usually doesn't change within a single run. However, if your program keeps doing this, you might want to call setlocale every time the code enters the loop (an ugly solution, I know).
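A somewhat tidier variant of "call setlocale every time" is to switch LC_TIME only around the parsing call and restore it afterwards. The sketch below uses the 'C' locale, which is always available and has English month names. The names time_locale and parse_english_date are made up for this example, and setlocale affects the whole process, so this is not safe if other threads format or parse dates at the same time.

```python
import locale
from contextlib import contextmanager
from datetime import datetime

@contextmanager
def time_locale(name):
    """Temporarily switch LC_TIME to `name`, restoring the old value on exit."""
    saved = locale.setlocale(locale.LC_TIME)  # no second argument: query only
    try:
        locale.setlocale(locale.LC_TIME, name)
        yield
    finally:
        locale.setlocale(locale.LC_TIME, saved)

def parse_english_date(text, fmt='%B %d %Y'):
    """Parse a date with English month names regardless of the active locale."""
    with time_locale('C'):
        return datetime.strptime(text, fmt)

for item in ['January 20 1999', 'March 4 2010', 'June 11 1819']:
    print(parse_english_date(item))
```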

Handling international dates in python

I have a date that may be written in German, e.g.
2. Okt. 2009
or perhaps in English, as
2. Oct. 2009
How do I convert this into an ISO datetime (or Python datetime)?
Solved by using this snippet to find the locales that can be set on the system:
import locale
for l in locale.locale_alias:
    worked = False
    try:
        locale.setlocale(locale.LC_TIME, l)
        worked = True
    except:
        worked = False
    if worked:
        print(l)
And then plugging in the appropriate value for the parameter l in setlocale.
The date can then be parsed using
import datetime
print(datetime.datetime.strptime("09. Okt. 2009", "%d. %b. %Y"))
http://docs.python.org/library/locale.html
The datetime module is already locale-aware.
It's something like the following
import datetime
import locale

text = "2. Okt. 2009"
# German locale
loc = locale.setlocale(locale.LC_TIME, ("de","de"))
try:
    date = datetime.datetime.strptime(text, "%d. %b. %Y").date()
except ValueError:
    # English locale
    loc = locale.setlocale(locale.LC_TIME, ("en","us"))
    date = datetime.datetime.strptime(text, "%d. %b. %Y").date()
Very minor point about your code snippet: I'm no Python expert but I'd consider the whole "flag to check for success + silently swallowing all exceptions" to be bad style.
try/except/else does what you want in a cleaner way, I think:
import locale
for l in locale.locale_alias:
    try:
        locale.setlocale(locale.LC_TIME, l)
    except locale.Error: # the doc says setlocale should throw this on failure
        pass
    else:
        print(l)
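If the German locale is not installed, or you would rather not touch the process-wide locale at all, a small lookup table can handle both spellings from the question. This is only a sketch: the table MONTH_ABBREVIATIONS, the pattern DATE_RE and the function parse_de_or_en are invented for this example, and the table covers just the German and English abbreviations assumed here, matched on their first three letters.

```python
import re
from datetime import date

# First three letters (lowercased) of German and English month names.
MONTH_ABBREVIATIONS = {
    'jan': 1, 'feb': 2, 'mär': 3, 'mrz': 3, 'mar': 3, 'apr': 4,
    'mai': 5, 'may': 5, 'jun': 6, 'jul': 7, 'aug': 8, 'sep': 9,
    'okt': 10, 'oct': 10, 'nov': 11, 'dez': 12, 'dec': 12,
}

# Day, dot, month word (optionally followed by a dot), four-digit year
DATE_RE = re.compile(r'(\d{1,2})\.\s*([^\W\d_]+)\.?\s*(\d{4})')

def parse_de_or_en(text):
    """Parse dates like '2. Okt. 2009' or '2. Oct. 2009' into a date."""
    m = DATE_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError('unrecognized date: %r' % (text,))
    day, month_name, year = m.groups()
    key = month_name[:3].lower()
    if key not in MONTH_ABBREVIATIONS:
        raise ValueError('unknown month: %r' % (month_name,))
    return date(int(year), MONTH_ABBREVIATIONS[key], int(day))

print(parse_de_or_en('2. Okt. 2009').isoformat())
```

Calling isoformat() on the result gives the ISO form asked for in the question.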
