Python Dateutil Parsing: Minimum number of components

The Python dateutil package can parse dates and datetimes without a format string. It tries to return a date even when the input does not look like one (e.g. 12). What would be a Pythonic way to ensure that at least a day, month and year component are present in the input?
from dateutil import parser
dstr = '12'
dtime = parser.parse(dstr)
This returns (the missing year and month are filled in from the current date):
2019-06-12 00:00:00

One way to do it is to split the input string on the likely date delimiters (e.g. ., -, :) and count the pieces. This way you can input 2016.5.19 or 2016-5-19.
from dateutil import parser
import re
def date_parser(thestring):
    # Split on '.', '-' or ':'; a full date should give at least 3 pieces.
    pieces = re.split(r'\.|-|:', thestring)
    if len(pieces) < 3:
        raise Exception('Must have at least year, month and date passed')
    return parser.parse(thestring)
print('---')
thedate = date_parser('2019-6-12')
print(thedate)
print('---')
thedate = date_parser('12')
print(thedate)
This will output:
---
2019-06-12 00:00:00
---
Traceback (most recent call last):
File "bob.py", line 18, in <module>
thedate = date_parser('12')
File "bob.py", line 9, in date_parser
raise Exception('Must have at least year, month and date passed')
Exception: Must have at least year, month and date passed
So the first call passes, as there are 3 "pieces" to the date. The second one doesn't.
This gets fragile depending on the pattern given to re.split; you have to make sure all the right delimiters are in there.
You could remove the : from the delimiters if you only want typical date delimiters. With it included, a bare time such as 10:30:00 also splits into 3 pieces and passes the check.
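An alternative sketch that avoids guessing delimiters relies on the fact that parser.parse() copies every field the input does not supply from its default argument. Parsing the same string with two different defaults and comparing year, month and day shows whether those components came from the input. The helper name has_full_date and the two sentinel dates are illustrative choices, not part of dateutil.

```python
from datetime import datetime
from dateutil import parser

# Two sentinel defaults that differ in year, month and day. Both months
# have 31 days, so a bare day such as '31' cannot overflow them.
DEFAULT_A = datetime(2000, 1, 1)
DEFAULT_B = datetime(2001, 3, 3)

def has_full_date(text):
    """Hypothetical helper: True if text supplies its own year, month and day."""
    try:
        a = parser.parse(text, default=DEFAULT_A)
        b = parser.parse(text, default=DEFAULT_B)
    except ValueError:
        # Unparseable input, or a partial date that is invalid in one of
        # the default years (e.g. 'Feb 29' against 2001).
        return False
    # Fields missing from the input are copied from the default,
    # so they differ between the two parses.
    return (a.year, a.month, a.day) == (b.year, b.month, b.day)

print(has_full_date('12'))          # False
print(has_full_date('2019-06-12'))  # True
```

date_parser() could then raise when has_full_date() returns False instead of counting delimiters.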

Related

Error: time data "b'YYYY/MM/DD" does not match format '%Y/%m/%d' but it does

I'm trying to parse dates from a text file, but running the script throws an incorrect data format error even though the format is correct.
The file is a .txt file with the following structure:
2018/02/15 05:00:13 - somestring - anotherstring
2018/02/15 05:00:14 - somestring - anotherstring
2018/02/15 05:00:15 - somestring - anotherstring
... etc
The script splits the file into lines and each line into fields, one of which is a date and time. I split the date and the time into two separate fields; the time converts fine, so the problem is in the date.
This is what I get on execution:
ValueError: time data "b'2018/02/15" does not match format '%Y/%m/%d'
I noticed it prints the string with a "b" in front of it, which, if I'm not mistaken, means it's a bytes literal. I've tried using decode("utf-8") on it, but that throws an exception because a string has no decode method.
#the file is in one long string as I get it from a 'cat' bash command via ssh
file = str(stdout.read()) #reads the cat into a long string
strings = file.split("\\n") #splits the string into lines
for string in strings:
    fields = string.split(" - ")
    if len(fields) >= 3:
        #dates.append(datetime.strptime(campos[0],"%Y/%m/%d %H:%M:%S")) #Wrong format
        datentime = fields[0].split()
        dates.append(datetime.strptime(datentime[0],"%Y/%m/%d")) #Wrong format
        print(datentime[1])
        dates.append(datetime.strptime(datentime[1],"%H:%M:%S")) #WORKS
I can't figure out why that is happening with the code you gave, so I can't offer a fix for the cause, but I tested it and this worked for me:
datetime.strptime(str(datentime[0])[2:-1], "%Y/%m/%d")
It strips the leading b' and the trailing ' from the string. If you still have problems with that, please post how you got that string; maybe there was an error along the way.
Use try and except:
import datetime
def convertDate(d):
    strptime = datetime.datetime.strptime
    try:
        return strptime(d, "%Y/%m/%d")
    except TypeError:
        # strptime() only accepts str, so decode bytes input first
        return strptime(d.decode("utf-8"), "%Y/%m/%d")
print(convertDate(b'2018/02/15'))
print(convertDate('2018/02/15'))
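The stray b' prefix most likely comes from calling str() on the bytes object returned by stdout.read(): in Python 3 that produces the repr of the bytes (b'2018/02/15 ...'), in which newlines also appear as a backslash followed by n, which is why the question's code had to split on "\\n". A minimal sketch that decodes instead, with a hard-coded bytes value standing in for stdout.read() (the SSH channel itself is assumed, not shown):

```python
from datetime import datetime

# Stand-in for stdout.read(): the SSH command output arrives as bytes.
raw = (b"2018/02/15 05:00:13 - somestring - anotherstring\n"
       b"2018/02/15 05:00:14 - somestring - anotherstring\n")

# str(raw) would give the repr "b'2018/02/15 ...'", so decode instead.
text = raw.decode("utf-8")

dates = []
for line in text.splitlines():
    fields = line.split(" - ")
    if len(fields) >= 3:
        # With clean text, date and time can be parsed in one call.
        dates.append(datetime.strptime(fields[0], "%Y/%m/%d %H:%M:%S"))

print(dates[0])  # 2018-02-15 05:00:13
```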

How to parse datetime with Z letter with no specified seconds after colon

I'm parsing the logs of a program and don't have access to its source code.
A log record contains an interesting event timestamp:
2018-11-02T06:25:03870000Z. It looks strange to me and I don't know how correct it is. But I think that a part like 03974200Z describes the seconds (%S) and I would like to gather as much information from this record as possible.
I'm trying to parse this example in Python 3.7 like this:
import datetime as dt
d = '2018-11-02T06:25:03870000Z'
dt.datetime.strptime(d, '%Y-%m-%dT%H:%M:%S')
It generates a predictable error:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_strptime.py", line 577, in _strptime_datetime
tt, fraction, gmtoff_fraction = _strptime(data_string, format)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_strptime.py", line 362, in _strptime
data_string[found.end():])
ValueError: unconverted data remains: 870000Z
Update:
I have a dirty solution for this, but is there a better approach than this?
sc = d.split(':')[-1][:2]
dd = d.split(':')
en = ':'.join(dd[:-1])
en += ':' + sc
>>> en
'2018-11-02T06:25:03'
Questions:
How do I parse such a datetime correctly (treating 03 in the example as the seconds)?
(Optional) Is this datetime in the log correct in terms of ISO 8601 or any other standard?
The Z specifies the Zulu time zone (UTC/GMT), and the seconds are given as whole seconds (03) followed by microseconds (870000), so you can parse the date fully using:
import datetime as dt
d = '2018-11-02T06:25:03870000Z'
dt.datetime.strptime(d, '%Y-%m-%dT%H:%M:%S%fZ')
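On Python 3.7 and later, %z also accepts a literal Z as UTC, so the Z can be parsed as a time zone instead of being matched as a plain character, which gives a timezone-aware result. A short sketch under that assumption; only the format string differs from the line above. On the optional question: ISO 8601 separates a fraction from the seconds with a decimal mark (e.g. 03.870000Z), so the log's format appears to be non-standard.

```python
import datetime as dt

d = '2018-11-02T06:25:03870000Z'

# %S consumes the two seconds digits, %f the following 1-6 digits as a
# fraction of a second, and %z (Python 3.7+) maps a literal 'Z' to UTC.
parsed = dt.datetime.strptime(d, '%Y-%m-%dT%H:%M:%S%f%z')

print(parsed)              # 2018-11-02 06:25:03.870000+00:00
print(parsed.isoformat())  # 2018-11-02T06:25:03.870000+00:00
```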
I would use:
import re
d = '2018-11-02T06:25:03870000Z'
date = re.findall(r'\d+', d)
This gives you a list of all runs of one or more consecutive digits, and you can do whatever you want with it, for example:
print("Y: %s, M: %s, D: %s, H: %s, m: %s, S: %s" % tuple(date))
Of course, you can then also cut the seconds down so that they have only two digits.
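If you go the re.findall route, here is a sketch of rebuilding a datetime from the pieces. It assumes, as the question suspects, that the first two digits of the last group are whole seconds and any remaining digits are a fraction of a second.

```python
import re
from datetime import datetime

d = '2018-11-02T06:25:03870000Z'
year, month, day, hour, minute, sec_digits = re.findall(r'\d+', d)

# Assumption: the first two digits are whole seconds, the rest a fraction.
seconds = int(sec_digits[:2])
fraction = sec_digits[2:]
# Scale the fraction to microseconds: pad to 6 digits, drop anything beyond.
microsecond = int(fraction.ljust(6, '0')[:6]) if fraction else 0

parsed = datetime(int(year), int(month), int(day),
                  int(hour), int(minute), seconds, microsecond)
print(parsed)  # 2018-11-02 06:25:03.870000
```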

MM/DD/YYYY Date to variable conversion m,d, and y

I am struggling to find the best way to convert the date given by the user as mm/dd/yyyy into 3 variables. I am unable to split it because I get an error saying it is a 'float'.
>>> date=3/2/2016
>>> date.split('/')
Traceback (most recent call last):
File "<pyshell#152>", line 1, in <module>
date.split('/')
AttributeError: 'float' object has no attribute 'split'
What do I need to add to make sure it doesn't evaluate the date as a division?
def main():
    date=input("Enter date mm/dd/yyyy: ")
I want the input date given as mm/dd/yyyy, and then a way to convert it into 3 variables: m=month, d=day, y=year.
What's the best way to do this?
Try str.split:
>>> test_date = "05/12/2016"
>>> month, day, year = test_date.split('/')
>>> print(f"Month = {month}, Day = {day}, Year = {year}")
Month = 05, Day = 12, Year = 2016
I wrote the following piece of code and it works fine.
>>> date='3/2/2016'
>>> new=date.split('/')
>>> new
['3', '2', '2016']
>>>
>>> m,d,year=new
>>> m
'3'
>>> d
'2'
>>> year
'2016'
>>>
As Jessica Smith already pointed out, date=3/2/2016 evaluates the expression and divides the numbers. It has to be a string to be split.
The error "'float' object has no attribute 'split'" suggests that type(date) == float in your example. That implies you are running Python 3 code with a Python 2 interpreter, where input() evaluates its input as a Python expression instead of returning it as a string.
To get the date as a string on Python 2, use raw_input() instead of input():
date_string = raw_input("Enter date mm/dd/yyyy: ")
To make it work on both Python 2 and 3, add at the top of your script:
try: # make input() and raw_input() to be synonyms
    input = raw_input
except NameError: # Python 3
    raw_input = input
If you need the old Python 2 input() behavior, you could call eval() explicitly.
To validate the input date, you could use datetime.strptime() and catch ValueError:
from datetime import datetime
try:
    d = datetime.strptime(date_string, '%m/%d/%Y')
except ValueError:
    print('wrong date string: {!r}'.format(date_string))
.strptime() guarantees that the input date is valid; otherwise ValueError is raised. On success, d.year, d.month and d.day work as expected.
Putting it all together (not tested):
#!/usr/bin/env python
from datetime import datetime
try: # make input() and raw_input() to be synonyms
    input = raw_input
except NameError: # Python 3
    raw_input = input
while True: # until a valid date is given
    date_string = raw_input("Enter date mm/dd/yyyy: ")
    try:
        d = datetime.strptime(date_string, '%m/%d/%Y')
    except ValueError: # invalid date
        print('wrong date string: {!r}'.format(date_string))
    else: # valid date
        break
# use the datetime object here
print("Year: {date.year}, Month: {date.month}, Day: {date.day}".format(date=d))
See Asking the user for input until they give a valid response.
You could use .split('/') instead of .strptime() if you must:
month, day, year = map(int, date_string.split('/'))
It doesn't validate whether the values form a valid date in the Gregorian calendar.
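To illustrate that difference, here is a sketch comparing the two approaches on an impossible date (Python 3; the function names split_date and parse_date are hypothetical):

```python
from datetime import datetime

def split_date(date_string):
    """Hypothetical helper: split mm/dd/yyyy into ints, no calendar check."""
    month, day, year = map(int, date_string.split('/'))
    return month, day, year

def parse_date(date_string):
    """Hypothetical helper: parse mm/dd/yyyy, ValueError on impossible dates."""
    d = datetime.strptime(date_string, '%m/%d/%Y')
    return d.month, d.day, d.year

print(split_date('02/30/2016'))  # (2, 30, 2016), although February 30 does not exist
try:
    parse_date('02/30/2016')
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```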
Try:
def main():
    month, day, year = [int(x) for x in raw_input("Enter date mm/dd/yyyy: ").split('/')]
    print "Month: {}\n".format(month), "Day: {}\n".format(day), "Year: {}".format(year)
main()
Output:
Enter date mm/dd/yyyy: 03/09/1987
Month: 3
Day: 9
Year: 1987

Python - read 10min from log file

I need a tool that reads the entries from the last 10 minutes of my log file and prints some text if certain words were logged.
Log file:
23.07.2014 09:22:11 INFO Logging.LogEvent 0 Failed login test#test.com
23.07.2014 09:29:02 INFO Logging.LogEvent 0 login test#test.com
23.07.2014 09:31:55 INFO Logging.LogEvent 0 login test#test.com
23.07.2014 09:44:14 INFO Logging.LogEvent 0 Failed login test#test.com
If any entry in the last 10 minutes contains Failed, print ALARM.
All I've done so far is find the 'Failed' matches, but I have no idea how to restrict the check to the last 10 minutes of my log file. Any ideas?
from sys import argv
from datetime import datetime, timedelta
with open('log_test.log', 'r') as f:
    for line in f:
        try:
            e = line.index("Failed")
        except ValueError: # "Failed" is not in this line
            pass
        else:
            print(line)
Your %d.%m.%Y format is less convenient than a year-first format such as %Y.%m.%d, which sorts correctly under plain string comparison.
We also do not know whether the log is big or whether it is sorted. If it is not sorted (which is common for multithreaded applications), you will have to parse each line and convert its timestamp into a datetime:
import datetime
def get_dt_from_line(s):
    # the timestamp 'dd.mm.yyyy HH:MM:SS' is the first 19 characters
    return datetime.datetime.strptime(s[:19], '%d.%m.%Y %H:%M:%S')
Then use it as a filter (for small files):
# TXT is assumed to hold the whole log file as one string
MAX_CHECK_TIMEDELTA = datetime.timedelta(minutes=10)
LOG_START_ANALYZE_DATETIME = (datetime.datetime.today() - MAX_CHECK_TIMEDELTA)
lines = [s for s in TXT.split('\n') if 'Failed' in s and get_dt_from_line(s) >= LOG_START_ANALYZE_DATETIME]
print('\n'.join(lines))
For big files you can read file line by line.
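A minimal sketch of the line-by-line approach for big files. It assumes every line starts with the 19-character dd.mm.yyyy HH:MM:SS timestamp shown in the question; the helper name recent_failures is hypothetical. Because it compares parsed datetimes rather than strings, it does not depend on the log being sorted.

```python
import io
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = '%d.%m.%Y %H:%M:%S'

def recent_failures(lines, now, window=timedelta(minutes=10)):
    """Hypothetical helper: yield 'Failed' lines stamped within `window` of `now`."""
    cutoff = now - window
    for line in lines:
        if 'Failed' not in line:
            continue
        # The timestamp 'dd.mm.yyyy HH:MM:SS' is the first 19 characters.
        stamp = datetime.strptime(line[:19], TIMESTAMP_FORMAT)
        if stamp >= cutoff:
            yield line.rstrip('\n')

# In-memory copy of the sample log; a real open() file object works the same way.
log = io.StringIO(
    "23.07.2014 09:22:11 INFO Logging.LogEvent 0 Failed login test#test.com\n"
    "23.07.2014 09:29:02 INFO Logging.LogEvent 0 login test#test.com\n"
    "23.07.2014 09:31:55 INFO Logging.LogEvent 0 login test#test.com\n"
    "23.07.2014 09:44:14 INFO Logging.LogEvent 0 Failed login test#test.com\n"
)
# A fixed "now" for reproducibility; use datetime.now() against a live log.
now = datetime(2014, 7, 23, 9, 50, 0)
hits = list(recent_failures(log, now))
if hits:
    print('ALARM')
```

Against a real file, the call would be recent_failures(f, datetime.now()) inside a with open('log_test.log') as f: block. It still scans the whole file, but only one line is held in memory at a time.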
If your log file is just for one day you can use string comparison instead of datetime comparison:
LOG_START_ANALYZE_DATETIME = (datetime.datetime.today() - datetime.timedelta(minutes=10)).strftime('%d.%m.%Y %H:%M:%S')
lines = [s for s in TXT.split('\n') if 'Failed' in s and s >= LOG_START_ANALYZE_DATETIME]
If I were you, I would go line by line: get the timestamp of the first line, then iterate until the difference between the first date and the current one is more than 10 minutes, while counting occurrences of the word "Failed".
You can probably extract the timestamp by splitting each line on spaces. But be careful: if your log format changes someday, your script will likely stop working.

Searching and sorting in Python

I am writing a Python script that searches for strings and is supposed to do different things when it encounters them.
import re, datetime
from datetime import *
f = open(raw_input('Name of file to search: '))
strToSearch = ''
for line in f:
    strToSearch += line
patFinder = re.compile(r'\d{2}\/\d{2}\/\d{4}\sA\d{3}\sB\d{3}')
findPat1 = re.findall(patFinder, strToSearch)
# search only dates
datFinder = re.compile(r'\d{2}\/\d{2}\/\d{4}')
findDat = re.findall(datFinder, strToSearch)
nowDate = date.today()
fileLst = open('cels.txt', 'w')
ntrdLst = open('not_ready.txt', 'w')
for i in findPat1:
    for Date in findDat:
        Date = datetime.strptime(Date, '%d/%m/%Y')
        Date = Date.date()
        endDate = Date + timedelta(days=731)
    if endDate < nowDate:
        fileLst.write(i)
    else:
        ntrdLst.write(i)
f.close()
fileLst.close()
ntrdLst.close()
toClose = raw_input('File was modified, press enter to close: ')
So basically it searches for strings with dates and numbers, and then for the same list but with only the dates. It converts the dates, adds 2 years to each and compares them: if the date is later than today's date, the string goes to ntrdLst; if not, to fileLst.
My problem is that it writes the same list (i) multiple times and doesn't do the sorting.
I am fairly new to Python and programming, so I am asking for your help. Thanks in advance.
Edit:
The normal output was this (without the date and if statement):
27/01/2009 A448 B448
22/10/2001 A434 B434
06/09/2007 A825 B825
06/09/2007 A434 B434
06/05/2010 A826 B826
What I would like is that if a line has a date after date.today(), say 27/01/2016, it is written to the other file. What I keep getting is the script printing this list 30 times, or ignoring the if statement.
(Sorry, the if was indeed indented inside the last loop; I made a mistake while pasting it here.)
You're computing endDate in a loop, once for each date... but not doing anything with it in the loop. So, after the loop is over, you have the very last endDate, and you use only that one to decide which file to write to.
I'm not sure what your logic is supposed to be, but I'm pretty sure you want to put the if statement with the writes inside the inner loop.
If you do that, then if you have, say, 100 pattern matches and 25 dates, you'll end up writing 2500 strings, some to one file and some to the other. Is that what you wanted?
SOLVED
I gave it a little (A LOT) of thought and put it all together in one piece. I knew there were too many for loops, but now I've got it. Thanks anyway to all of you who offered a helping hand. I'm leaving the code here for anyone with a similar problem.
nowDate = date.today()
# compile the pattern once, outside the loop
s = re.compile(r'(\d{2}\/\d{2}\/\d{4})\s(C\d{3}\sS\d{3})')
for line in sourceFile:
    s1 = re.search(s, line)
    if s1:
        date = s1.group(1)
        date = datetime.strptime(date, '%d/%m/%Y')
        date = date.date()
        endDate = date + timedelta(days=731)
        if endDate <= nowDate:
            fileLst.write(s1.group())
            fileLst.write('\n')
        else:
            print ('not ready: ', date.strftime('%d-%m-%Y'))
            ntrdLst.write(s1.group(1))
            ntrdLst.write('\n')
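The key change in the solved version is that each line is judged by its own date, captured with a group, instead of pairing every match with every date. A Python 3 sketch of the same per-line idea, using the A###/B### pattern from the question's sample output (the SOLVED code uses C###/S###) and a hypothetical helper name split_by_age:

```python
import re
from datetime import date, datetime, timedelta

# Pattern for the question's sample lines: dd/mm/yyyy A### B###.
PATTERN = re.compile(r'(\d{2}/\d{2}/\d{4})\s(A\d{3}\sB\d{3})')

def split_by_age(lines, today, min_age=timedelta(days=731)):
    """Hypothetical helper: return (ready, not_ready), deciding per line."""
    ready, not_ready = [], []
    for line in lines:
        match = PATTERN.search(line)
        if not match:
            continue
        # Each line is judged by its own date, not by a date from another line.
        line_date = datetime.strptime(match.group(1), '%d/%m/%Y').date()
        if line_date + min_age <= today:
            ready.append(match.group())
        else:
            not_ready.append(match.group())
    return ready, not_ready

sample = [
    '27/01/2009 A448 B448',
    '22/10/2001 A434 B434',
    '06/09/2007 A825 B825',
    '06/09/2007 A434 B434',
    '06/05/2010 A826 B826',
]
# A fixed "today" keeps the example reproducible; use date.today() for real runs.
ready, not_ready = split_by_age(sample, today=date(2011, 1, 1))
print(ready)
print(not_ready)
```

Writing ready and not_ready to cels.txt and not_ready.txt is then a plain loop over each list.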
