Searching and sorting in Python

I am writing a script in Python that searches for strings and is supposed to do different things depending on the strings it encounters.
import re, datetime
from datetime import *
f = open(raw_input('Name of file to search: '))
strToSearch = ''
for line in f:
    strToSearch += line
patFinder = re.compile(r'\d{2}\/\d{2}\/\d{4}\sA\d{3}\sB\d{3}')
findPat1 = re.findall(patFinder, strToSearch)
# search only dates
datFinder = re.compile(r'\d{2}\/\d{2}\/\d{4}')
findDat = re.findall(datFinder, strToSearch)
nowDate = date.today()
fileLst = open('cels.txt', 'w')
ntrdLst = open('not_ready.txt', 'w')
for i in findPat1:
    for Date in findDat:
        Date = datetime.strptime(Date, '%d/%m/%Y')
        Date = Date.date()
        endDate = Date + timedelta(days=731)
    if endDate < nowDate:
        fileLst.write(i)
    else:
        ntrdLst.write(i)
f.close()
fileLst.close()
ntrdLst.close()
toClose = raw_input('File was modified, press enter to close: ')
So basically it searches for strings containing a date and two codes, then builds a second list with only the dates, converts each date, adds 2 years to it and compares the result with today's date. If the resulting date is later than today, the entry should go to ntrdLst; if not, to fileLst.
My problem is that it writes the same list (i) multiple times and doesn't do the sorting.
I am fairly new to Python and programming, so I am asking for your help. Thanks in advance.
edit: -----------------
the normal output was this (without the date and if statement)
27/01/2009 A448 B448
22/10/2001 A434 B434
06/09/2007 A825 B825
06/09/2007 A434 B434
06/05/2010 A826 B826
What I would like is this: if I had a date that is after date.today(), say 27/01/2016, that entry should be written to another file. What I keep getting instead is the script printing this list about 30 times, or not taking the if statement into account.
(Sorry, the if was indeed indented inside the last loop; I made a mistake while pasting it here.)

You're computing endDate in a loop, once for each date... but not doing anything with it in the loop. So, after the loop is over, you have the very last endDate, and you use only that one to decide which file to write to.
I'm not sure what your logic is supposed to be, but I'm pretty sure you want to put the if statement with the writes inside the inner loop.
If you do that, then if you have, say, 100 pattern matches and 25 dates, you'll end up writing 2500 strings--some to one file, some to the other. Is that what you wanted?

SOLVED
I gave it a little (A LOT of) thought and put it all together in one piece. I knew there were too many for loops, but now I've got it. Thanks anyway to those who lent me a helping hand. I'm leaving the code here for anyone with a similar problem.
nowDate = date.today()
for line in sourceFile:
    s = re.compile(r'(\d{2}\/\d{2}\/\d{4})\s(C\d{3}\sS\d{3})')
    s1 = re.search(s, line)
    if s1:
        date = s1.group(1)
        date = datetime.strptime(date, '%d/%m/%Y')
        date = date.date()
        endDate = date + timedelta(days=731)
        if endDate <= nowDate:
            fileLst.write(s1.group())
            fileLst.write('\n')
        else:
            print ('not ready: ', date.strftime('%d-%m-%Y'))
            ntrdLst.write(s1.group(1))
            ntrdLst.write('\n')
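For readers on Python 3, the same single-pass idea can be sketched as a small function: match each line once, parse the date from that same match, and decide per record. This is only an illustration. The name split_records, the generalised pattern (any letter followed by three digits) and the fixed "today" used in the example are assumptions, not part of the original script.

```python
import re
from datetime import date, datetime, timedelta

# Assumed pattern: a dd/mm/yyyy date followed by two codes such as "A448 B448".
RECORD = re.compile(r'(\d{2}/\d{2}/\d{4})\s([A-Z]\d{3}\s[A-Z]\d{3})')

def split_records(lines, today, days=731):
    """Return (ready, not_ready) lists of matched records (hypothetical helper)."""
    ready, not_ready = [], []
    for line in lines:
        match = RECORD.search(line)
        if not match:
            continue
        # The date comes from the same match as the record, so each record
        # is compared against its own date and written exactly once.
        start = datetime.strptime(match.group(1), '%d/%m/%Y').date()
        end_date = start + timedelta(days=days)
        if end_date <= today:
            ready.append(match.group())
        else:
            not_ready.append(match.group())
    return ready, not_ready

sample = [
    '27/01/2009 A448 B448',
    '22/10/2001 A434 B434',
    '06/05/2010 A826 B826',
]
# A fixed date keeps the example reproducible; a real script would pass date.today().
ready, not_ready = split_records(sample, today=date(2012, 1, 1))
print(ready)
print(not_ready)
```

Returning two lists instead of writing inside the loop keeps the decision logic separate from the file handling, which makes it easier to check.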

Related

Error: time data "b'YYYY/MM/DD" does not match format '%Y/%m/%d' but it does

I'm trying to parse dates from a text file, but running the script raises an "incorrect data format" error even though the format is correct.
The file is a .txt file with the following structure
2018/02/15 05:00:13 - somestring - anotherstring
2018/02/15 05:00:14 - somestring - anotherstring
2018/02/15 05:00:15 - somestring - anotherstring
... etc
The script splits the file into lines, and each line into fields, one of which holds a date and a time. I split the date and the time into two separate fields; the time converts fine, so the problem is in the date.
This is what I get on execution:
ValueError: time data "b'2018/02/15" does not match format '%Y/%m/%d'
I noticed it prints the string with a "b" in front of it, which, if I'm not mistaken, means it's a bytes literal. I've tried calling decode("utf-8") on it, but that throws an exception because a str has no decode method.
#the file is in one long string as I get it from a 'cat' bash command via ssh
file = str(stdout.read()) #reads the cat into a long string
strings = file.split("\\n") #splits the string into lines
for string in strings:
    fields = string.split(" - ")
    if len(fields) >= 3:
        #dates.append(datetime.strptime(fields[0],"%Y/%m/%d %H:%M:%S")) #Wrong format
        datentime = fields[0].split()
        dates.append(datetime.strptime(datentime[0],"%Y/%m/%d")) #Wrong format
        print(datentime[1])
        dates.append(datetime.strptime(datentime[1],"%H:%M:%S")) #WORKS

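I can't figure out why that is happening to you with the code you gave, so I can't offer a fix for that, but I tried testing it and this worked for me:
datetime.strptime(str(datentime[0])[2:-1], "%Y/%m/%d")
It removes the b and the ' from the string. This assumes datentime[0] is a bytes object such as b'2018/02/15'; if it is already the str "b'2018/02/15" shown in the error message, there is no trailing quote to remove and the slice should be [2:] instead. If you still have problems with that, please post how you got that string; maybe there was some error on the way.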

Use try and except:
import datetime
def convertDate(d):
    strptime = datetime.datetime.strptime
    try:
        return strptime(d, "%Y/%m/%d")
    except TypeError:
        return strptime(d.decode("utf-8"), "%Y/%m/%d")
print(convertDate(b'2018/02/15'))
print(convertDate('2018/02/15'))
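The "b'" prefix in the error message is consistent with str() having been called on a bytes object: in Python 3 that returns the printable representation, prefix and quotes included, rather than the decoded text. The sketch below assumes stdout.read() returns bytes (common for SSH libraries, but not confirmed by the question) and uses a literal bytes value named raw as a stand-in.

```python
from datetime import datetime

# Stand-in for stdout.read(); assumed to be bytes, as SSH channels usually return.
raw = b"2018/02/15 05:00:13 - somestring - anotherstring\n"

# str() on bytes gives the repr, so the text now starts with b' and the
# newline becomes a literal backslash followed by n.
as_repr = str(raw)
print(as_repr[:12])

# Decoding gives the real text, which can be split into lines normally.
text = raw.decode("utf-8")
dates = []
for line in text.splitlines():
    fields = line.split(" - ")
    if len(fields) >= 3:
        dates.append(datetime.strptime(fields[0], "%Y/%m/%d %H:%M:%S"))
print(dates)
```

Decoding once, right after reading, also removes the need for the split("\\n") workaround, because the line breaks are real newline characters again.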

How to convert a downloaded string to datetime format?

I am trying to check if today's date < date downloaded from text file online. Here is my code :
import datetime
import requests
URL = "http://directlinktotextfile.com/text.txt"
result = requests.get(URL)
today = datetime.datetime.now().date()
Url_date = result.text
Url_date.strip()
Url_date = datetime.date(Url_date)
if today < Url_date :
    print "Today is less than future date"
    raw_input()
else:
    print "Today is greater than or = to future date"
    raw_input()
The downloaded text is just this: 2018,02,14. I use .strip() in case there are blank spaces or extra lines. I've printed result.text after strip() and it shows the correct value. Why can't I check whether today < Url_date? It works fine if I manually enter a date, as in datetime.date(2018,02,14), but it doesn't work with the downloaded string. Any suggestions?

You are passing a single string to datetime.date(), which expects three integers (year, month, day).
Url_list = []
Url_list = Url_date.split(",")
yr = int(Url_list[0])
mn = int(Url_list[1])
d = int(Url_list[2])
Now pass these integers to datetime.date
Url_date = datetime.date(yr, mn, d)

The arguments you pass to datetime.date(arg1, arg2, arg3) are not strings as a whole. When you pass it from url, what you are actually doing is
datetime.date("2018,2,14")
Note that you are passing only one string argument and not 3 different integers. You should split the date string using comma and then convert each into integers and then pass them as arguments to datetime.date.

Here is what your code is trying to do :
Url_date = datetime.date("2018,02,14")
But what you want is:
Url_date = datetime.date(2018,02,14)
So do:
Url_date.split(',') # Result: ['2018','02','14']
And then convert every string in the list to an integer.
It should be ok :)

Use strptime:
import datetime
today = datetime.datetime.now().date()
parsed = datetime.datetime.strptime("2018,02,14", "%Y,%m,%d").date()
print(today < parsed) # True only while today is before 2018-02-14
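Putting the pieces together, a minimal sketch might look like the following. The helper name parse_downloaded_date and the literal string standing in for result.text are assumptions for illustration. Note that str.strip() returns a new string, so its result has to be kept; calling it without using the return value leaves the original text unchanged.

```python
from datetime import date, datetime

def parse_downloaded_date(text):
    """Parse text such as '2018,02,14\\n' into a datetime.date (hypothetical helper)."""
    # strip() returns a new string; strptime would reject the trailing newline.
    cleaned = text.strip()
    return datetime.strptime(cleaned, "%Y,%m,%d").date()

downloaded = "2018,02,14\n"  # stand-in for result.text
future = parse_downloaded_date(downloaded)
print(future)
# Compared against a fixed date so the result does not depend on when this runs.
print(date(2018, 2, 1) < future)
```

In the real script the comparison would be date.today() < future.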

Python: How to add/subtract a number only to numeric characters in a string?

Say for example, I have the following strings and an input 4.0, which represents seconds:
John Time Made 11:05:20 in 2010
5.001 Kelly #1
6.005 Josh #8
And would like the following result:
John Time Made 11:05:24 in 2010 #Input 4.0 is added to the seconds of 11:05:20
1.001 Kelly #1 #4.0 is subtracted from the first number 5.001 = 1.001
2.005 Josh #8 #4.0 is subtracted from the first number 6.005 = 2.005
How can I recognize the hours:minutes:seconds in the first line, and #.### in the rest to add/subtract the input number?
Thank you in advance and will accept/upvote answer

This solution should work if your complete data has the same format as this particular sample you provided. You should have the data in the input.txt file.
val_to_add = 4
with open('input.txt') as fin:
    # processing first line
    first_line = fin.readline().strip()
    splitted = first_line.split(' ')
    # get hour, minute, second corresponding to time (11:05:20)
    time_values = splitted[3].split(':')
    # seconds is the last element
    seconds = int(time_values[-1])
    # add the value
    new_seconds = seconds + val_to_add
    # doing simple math to avoid having values >= 60 for minute and second
    # this part can probably be solved with datetime or some other lib, but it's not that complex, so I did it in a couple of lines
    seconds = new_seconds % 60 # if we get > 59 seconds we keep only the remainder as seconds and the rest goes to minutes
    new_minutes = int(time_values[1]) + new_seconds // 60 # if we have 60 s or more, add the minutes produced by the seconds
    minutes = new_minutes % 60 # similarly as for seconds
    hours = int(time_values[0]) + new_minutes // 60
    # convert back to strings so we can apply join (which operates only on strings) and pad 1-digit numbers with a leading zero
    time_values[0] = str(hours).rjust(2, '0')
    time_values[1] = str(minutes).rjust(2, '0')
    time_values[2] = str(seconds).rjust(2, '0')
    new_time_val = ':'.join(time_values) # join the values to follow the HH:MM:SS format
    splitted[3] = new_time_val # replace the old time with the new one (with the value added)
    first_line_modified = ' '.join(splitted) # just join the modified list
    print(first_line_modified)
    # processing other lines
    for line in fin:
        # take the first (0th) value, subtract val_to_add and round the result to 3 digits (to avoid too many decimal places)
        stripped = line.strip()
        splitted = stripped.split(' ')
        splitted[0] = str(round(float(splitted[0]) - val_to_add, 3))
        modified_line = ' '.join(splitted)
        print(modified_line)
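As the comment in the code above hints, the carry arithmetic can also be delegated to the datetime module. The following is a sketch of that alternative, not the answerer's code; the helper name add_seconds is made up for illustration.

```python
from datetime import datetime, timedelta

def add_seconds(hms, seconds):
    """Add seconds to an 'HH:MM:SS' string, carrying into minutes and hours."""
    # strptime attaches a dummy date (1900-01-01), which is only there so
    # that timedelta arithmetic can be applied to the time of day.
    moment = datetime.strptime(hms, "%H:%M:%S") + timedelta(seconds=seconds)
    return moment.strftime("%H:%M:%S")

print(add_seconds("11:05:20", 4))
print(add_seconds("11:59:58", 4))
```

One difference from the manual arithmetic: this version wraps around at midnight (23:59:58 plus 4 seconds gives 00:00:02), whereas the hand-written version would produce an hour value of 24.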

Although regex was discouraged in the comments, regex can be used to parse the time objects into datetime.time objects, perform the necessary calculations on them, then print them in the required format:
# datetime module for time calculations
import datetime
# regex module
import re
# seconds to add to time
myinp = 4
# List of data strings
# data = 'John Time Made 11:05:20 in 2010', '5.001 Kelly', '6.005 Josh'
with open('data.txt') as f:
    data = f.readlines()
new_data = []
# iterate through the list of data strings
for time in data:
    try:
        # First check for 'HH:MM:SS' time format in data string
        # regex taken from this question: http://stackoverflow.com/questions/8318236/regex-pattern-for-hhmmss-time-string
        match = re.findall(r"([0-1]?\d|2[0-3]):([0-5]?\d):([0-5]?\d)", time)
        # this regex returns a list of tuples of strings "[('HH', 'MM', 'SS')]",
        # which we join back together with ':' (colon) separators
        # Raises an IndexError when there is no match, which we catch in the `except` clause below
        t = ':'.join(match[0])
        # create a Datetime object from the first matched time in the list,
        # taken from this answer http://stackoverflow.com/questions/100210/what-is-the-standard-way-to-add-n-seconds-to-datetime-time-in-python
        orig = datetime.datetime(100,1,1,int(match[0][0]), int(match[0][1]), int(match[0][2]))
        # Add the number of seconds to the Datetime object,
        # taken from this answer: http://stackoverflow.com/questions/656297/python-time-timedelta-equivalent
        newtime = (orig + datetime.timedelta(0, myinp)).time()
        # replace the time in the original data string with the newtime and store it
        new_data.append(time.replace(t, str(newtime)))
    # on an IndexError (no HH:MM:SS match), look for float-formatted seconds instead
    except IndexError:
        # look for float-formatted seconds (s.xxx)
        # taken from this answer: http://stackoverflow.com/questions/4703390/how-to-extract-a-floating-number-from-a-string
        match = re.findall(r"\d+\.\d+", time)
        # create a Datetime object from the first matched number in the list,
        # specifying only seconds and microseconds; the three fractional digits are milliseconds, so multiply by 1000 to get microseconds
        orig = datetime.datetime(100,1,1,second=int(match[0].split('.')[0]),microsecond=int(match[0].split('.')[1])*1000)
        # Subtract the seconds from the Datetime object, similar to the time addition in the `try` clause above
        newtime = orig - datetime.timedelta(0, myinp)
        # format the newtime as `seconds` plus the microseconds converted to a fraction of a second
        newtime_fmt = newtime.second + newtime.microsecond/1000000.
        # Get the seconds value (first value (index 0)) from splitting the original string at the `space` between the `seconds` and `name` strings
        t = time.split(' ')[0]
        # replace the time in the original data string with the newtime and store it
        new_data.append(time.replace(t , str(newtime_fmt)))
with open('new_data.txt', 'w') as nf:
    for newline in new_data:
        nf.write(newline)
new_data.txt file contents should read as:
John Time Made 11:05:24 in 2010
1.001 Kelly
2.005 Josh

Print/write date to a file only if it's a different day. (Python)

I'm doing a maths quiz that asks people questions and at the end it would write their name and score to a file. I want there to be a date before the scores but the way I do it:
file1.write(time.strftime("%d/%m/%y" + "\n"))
file1.write("Name: ")
file1.write(Name + "\n")
file1.write("Score: ")
file1.write(str(Score)+ "\n")
file1.write("" + "\n")
file1.close()
This writes the date every time someone finishes the quiz. I want it to write the date only once, before the first person's results, unless it's a new day. For example, if today is 10/06/2015, it would print this date before the results of the first person who attempts the quiz and then not print the date again until a new day, like 11/06/2015. Thanks

Fast and dirty solution:
with open("results.txt", "r+") as f:
    if time.strftime("%d/%m/%y") not in f.read():
        f.write(time.strftime("%d/%m/%y" + "\n"))

1 - Get today's date by:
myToday = time.strftime('%x')
2 - Check whether the date has changed or not:
if time.strftime('%x') != myToday:
    myToday = time.strftime('%x') # change today var if it's a new date.
    file1.write(myToday+'\n')
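A variation on the first idea that also survives restarts is to look for today's date in the file itself before appending. The sketch below is only an illustration: the helper name append_result, its today parameter (there so the example can simulate two different days) and the temporary file are assumptions, not part of the original quiz code.

```python
import os
import tempfile
import time

def append_result(path, name, score, today=None):
    """Append one quiz result, writing the date header only once per day."""
    today = today or time.strftime("%d/%m/%y")
    # Read what is already in the file (nothing if it does not exist yet).
    existing = ""
    if os.path.exists(path):
        with open(path) as f:
            existing = f.read()
    with open(path, "a") as f:
        # Compare whole lines so the date is not confused with other text.
        if today not in existing.splitlines():
            f.write(today + "\n")
        f.write("Name: " + name + "\n")
        f.write("Score: " + str(score) + "\n\n")

# Usage with a throwaway file and two simulated days.
path = os.path.join(tempfile.mkdtemp(), "results.txt")
append_result(path, "Ann", 7, today="10/06/15")
append_result(path, "Bob", 9, today="10/06/15")
append_result(path, "Cy", 5, today="11/06/15")
with open(path) as f:
    contents = f.read()
print(contents)
```

Because the check reads the file each time, the date header stays unique per day even if the quiz program is closed and reopened between players.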

Python - read 10min from log file

I need a tool that reads the entries from the last 10 minutes of my log file and prints some text if certain words were logged.
log file:
23.07.2014 09:22:11 INFO Logging.LogEvent 0 Failed login test#test.com
23.07.2014 09:29:02 INFO Logging.LogEvent 0 login test#test.com
23.07.2014 09:31:55 INFO Logging.LogEvent 0 login test#test.com
23.07.2014 09:44:14 INFO Logging.LogEvent 0 Failed login test#test.com
If any entry during the last 10 minutes contains Failed, print ALARM.
All I have done so far is find the 'Failed' matches, but I have no idea how to check only the last 10 minutes of my log file. Any ideas?
from sys import argv
from datetime import datetime, timedelta
with open('log_test.log', 'r') as f:
    for line in f:
        try:
            e = line.index("Failed")
        except:
            pass
        else:
            print(line)

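Your format %d.%m.%Y is less convenient than a year-first format such as %Y:%m:%d, whose strings sort chronologically and can therefore be compared directly as strings.
We also do not know whether the log is big or whether it is sorted. If it is not sorted (which is common for multithreaded applications), you will have to analyze each line and convert its timestamp into a datetime:
import datetime
def get_dt_from_line(s):
    # the timestamp 'dd.mm.YYYY HH:MM:SS' is the first 19 characters of the line
    return datetime.datetime.strptime(s[:19], '%d.%m.%Y %H:%M:%S')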
Then use it as filter (for small files):
MAX_CHECK_TIMEDELTA = datetime.timedelta(minutes=10)
LOG_START_ANALYZE_DATETIME = (datetime.datetime.today() - MAX_CHECK_TIMEDELTA)
lines = [s for s in TXT.split('\n') if 'Failed' in s and get_dt_from_line(s) >= LOG_START_ANALYZE_DATETIME]
print('\n'.join(lines))
For big files you can read file line by line.
If your log file is just for one day you can use string comparison instead of datetime comparison:
LOG_START_ANALYZE_DATETIME = (datetime.datetime.today() - datetime.timedelta(minutes=10)).strftime('%d.%m.%Y %H:%M:%S')
lines = [s for s in TXT.split('\n') if 'Failed' in s and s >= LOG_START_ANALYZE_DATETIME]
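The line-by-line variant mentioned above could be sketched roughly as follows. The function name failed_recently, the in-memory sample list and the fixed "current time" are assumptions chosen to keep the example reproducible; in a real script the lines would come from the open log file and now would be datetime.now().

```python
from datetime import datetime, timedelta

def failed_recently(lines, now, window=timedelta(minutes=10)):
    """Return lines containing 'Failed' whose timestamp lies within `window` before `now`."""
    cutoff = now - window
    hits = []
    for line in lines:
        if "Failed" not in line:
            continue
        # The timestamp 'dd.mm.YYYY HH:MM:SS' occupies the first 19 characters.
        stamp = datetime.strptime(line[:19], "%d.%m.%Y %H:%M:%S")
        if cutoff <= stamp <= now:
            hits.append(line.rstrip("\n"))
    return hits

log = [
    "23.07.2014 09:22:11 INFO Logging.LogEvent 0 Failed login test#test.com",
    "23.07.2014 09:29:02 INFO Logging.LogEvent 0 login test#test.com",
    "23.07.2014 09:31:55 INFO Logging.LogEvent 0 login test#test.com",
    "23.07.2014 09:44:14 INFO Logging.LogEvent 0 Failed login test#test.com",
]
now = datetime(2014, 7, 23, 9, 50, 0)  # fixed "current time" for the example
recent = failed_recently(log, now)
if recent:
    print("ALARM")
```

Since the function only iterates over its input, an open file object can be passed directly, so large logs are not loaded into memory at once.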

If I were you, I would go through the file line by line, take the timestamp of the first line, and then iterate until the difference between that first timestamp and the current one exceeds 10 minutes, counting occurrences of the word "Failed" along the way.
You can probably get what you need by splitting each line on spaces. But be careful: if your log format changes someday, your script will likely stop working.
