What qualifies collection of strings to become a line? - python

Following code is taking every character and running the loop as many times. But when I save the same line in a text file and perform same operation, the loop is only run once for 1 line. It is bit confusing. Possible reason I can think off is that first method is running the loop by considering "a" as a list. Kindly correct me if I am wrong. Also let me know how to create a line in code itself rather first saving it in a file and then using it.
>>> a="In this world\n"
>>> i=0
>>> for lines in a:
... i=i+1
... print i
...
1
2
3
4
5
6
7
8
9
10
11
12
13

You're trying to loop over a, which is a string. Regardless of how many newlines you have in a string, when you loop over it, you're going to go character by character.
If you want to loop through a bunch of lines, you have to use a list:
lines = ["this is line 1", "this is another line", "etc"]
for line in lines:
print line
If you have a string containing a bunch of newlines and want to convert it to a list of lines, use the split method:
text = "This is line 1\nThis is another line\netc"
lines = text.split("\n")
for line in lines:
print line
The reason why you go line by line when reading from a file is because the people who implemented Python decided that it would be more useful if iterating over a file yielded a collection of lines instead of a collection of characters.
However, a file and a string are different things, and you should not necessarily expect that they work in the same way.

Just change the name of the variable when looping on the line:
i = 0
worldLine ="In this world\n"
for character in worldLine:
i=i+1
print i
count = 0
readFile = open('myFile','r')
for line in readFile:
count += 1
now it should be clear what's going on.
Keeping meaningful names will save you a lot of debugging time.
Considering doing the following:
i = 0
worldLine =["In this world\n"]
for character in worldLine:
i=i+1
print i
if you want to loop on a list of lines consisting of worldLine only.

Related

Find all strings in text file fitting either of two formats

So I know similar questions have been asked before, but every method I have tried is not working...
Here is the ask: I have a text file (which is a log file) that I am parsing for any occurrence of "app.task2". The following are the 2 scenarios that can occur (As they appear in the text file, independent of my code):
Scenario 1:
Mar 23 10:28:24 dasd[116] <Notice>: app.task2.refresh:556A2D:[
{name: ApplicationPolicy, policyWeight: 50.000, response: {Decision: Can Proceed, Score: 0.45}}
] sumScores:68.785000, denominator:96.410000, FinalDecision: Can Proceed FinalScore: 0.713463}
Scenario 2:
Mar 23 10:35:56 dasd[116] <Notice>: 'app.task2.refresh:C6C2FE' CurrentScore: 0.636967, ThresholdScore: 0.410015 DecisionToRun:1
The problem I am facing is that my current code below, I am not getting the entire log entry for the first case, and it is only pulling the first line in the log, not the remainder of the log entry, and it appears to be stopping at the new line escape character, which is occurring after ":[".
My Code:
all = []
with open(path_to_log) as f:
for line in f:
if "app.task2" in line:
all.append(line)
print all
How can I get the entire log entry for the first case? I tried stripping escape characters with no luck. From here I should be able to parse the list of results returned for what I truly need, but this will help! ty!
OF NOTE: I need to be able to locate these types of log entries (which will then give us either scenario 1 or scenario 2) by the string "app.task2". So this needs to be incorporated, like in my example...
Before adding the line to all, check if it ends with [. If it does, keep reading and merge the lines until you get to ].
import re
all = []
with open(path_to_log) as f:
for line in f:
if "app.task2" in line:
if re.search(r'\[\s*$', line): # start of multiline log message
for line2 in f:
line += line2
if re.search(r'^\s*\]', line2): # end of multiline log message
break
all.append(line)
print(all)
You are iterating over each each line individually which is why you only get the first line in scenario 1.
Either you can add a counter like this:
all = []
count = -1
with open(path_to_log) as f:
for line in f:
if count > 0:
all.append(line)
if count == 1:
tmp = all[-count:]
del all[-count:]
all.append("\n".join(tmp))
count -= 1
continue
if "app.task2" in line:
all.append(line)
if line.endswith('[\n'):
count = 3
print all
In this case i think Barmar solution would work just as good.
Or you can (preferably) when storing the log file have some distinct delimiter between each log entry and just split the log file by this delimiter.
I like #Barmar's solution with nested loops on the same file object, and may use that technique in the future. But prior to seeing I would have done it with a single loop, which may or may not be more readable:
all = []
keep = False
for line in open(path_to_log,"rt"):
if "app.task2" in line:
all.append(line)
keep = line.rstrip().endswith("[")
elif keep:
all.append(line)
keep = not line.lstrip().startswith("]")
print (all)
or, you can print it nicer with:
print(*all,sep='\n')

reading numbers from text into variables in python

assume i have a text file that only has the following format where each line has two numbers separated by a space:
2 4
66 99
11 67
1 3
this is my code which i tried doing:
with open('text_file.txt') as file:
lines = []
for i in file:
lines.append(i)
print(lines)
the problem with my code is that it keeps printing "\n" with the numbers and i don't know how to get around that
i need a way to read only the two numbers in the last line and store each number in a variable, so for the example above:
var1 = 1 , var2 = 3.
i need this for a multi-threading program , where i am testing a "race condition" between three threads where each thread reads the last two numbers in the text file and read the last line of numbers, and then print the thread_ip in the first ,and increment the second from the previous last line and print it.
im not asking for help for the whole assignment, just the reading/writing the text file part is what i can't seem to figure out.
with open("example_file", "r") as fin:
var1, var2 = [int(i) for i in fin.readline().split()]
To store the last two numbers in the last line:
with open("example_file", "r") as fin:
var1, var2 = [int(i) for i in fin.readlines()[-1].split()]
So both of the examples in the currently accepted answer from izhang work, but they come with a cost... In the first example, you have to needlessly walk the entire file to come to the last line. And in the second example, you have to read the entire file into memory. If dealing with large datasets, and/or running inside of constrained containers, this will absolutely become a problem.
The better solution is to read the file backwards and break after the first read line. This is not an easy task to do from scratch, but fortunately, there is a module for that.
from file_read_backwards import FileReadBackwards
with FileReadBackwards('text_file.txt') as f:
for l in f:
var1, var2 = l.split()
break
print(var1, var2)

Error with .readlines()[n]

I'm a beginner with Python.
I tried to solve the problem: "If we have a file containing <1000 lines, how to print only the odd-numbered lines? ". That's my code:
with open(r'C:\Users\Savina\Desktop\rosalind_ini5.txt')as f:
n=1
num_lines=sum(1 for line in f)
while n<num_lines:
if n/2!=0:
a=f.readlines()[n]
print(a)
break
n=n+2
where n is a counter and num_lines calculates how many lines the file contains.
But when I try to execute the code, it says:
"a=f.readlines()[n]
IndexError: list index out of range"
Why it doesn't recognize n as a counter?
You have the call to readlines into a loop, but this is not its intended use,
because readlines ingests the whole of the file at once, returning you a LIST
of newline terminated strings.
You may want to save such a list and operate on it
list_of_lines = open(filename).readlines() # no need for closing, python will do it for you
odd = 1
for line in list_of_lines:
if odd : print(line, end='')
odd = 1-odd
Two remarks:
odd is alternating between 1 (hence true when argument of an if) or 0 (hence false when argument of an if),
the optional argument end='' to the print function is required because each line in list_of_lines is terminated by a new line character, if you omit the optional argument the print function will output a SECOND new line character at the end of each line.
Coming back to your code, you can fix its behavior using a
f.seek(0)
before the loop to rewind the file to its beginning position and using the
f.readline() (look, it's NOT readline**S**) method inside the loop,
but rest assured that proceding like this is. let's say, a bit unconventional...
Eventually, it is possible to do everything you want with a one-liner
print(''.join(open(filename).readlines()[::2]))
that uses the slice notation for lists and the string method .join()
Well, I'd personally do it like this:
def print_odd_lines(some_file):
with open(some_file) as my_file:
for index, each_line in enumerate(my_file): # keep track of the index of each line
if index % 2 == 1: # check if index is odd
print(each_line) # if it does, print it
if __name__ == '__main__':
print_odd_lines('C:\Users\Savina\Desktop\rosalind_ini5.txt')
Be aware that this will leave a blank line instead of the even number. I'm sure you figure how to get rid of it.
This code will do exactly as you asked:
with open(r'C:\Users\Savina\Desktop\rosalind_ini5.txt')as f:
for i, line in enumerate(f.readlines()): # Iterate over each line and add an index (i) to it.
if i % 2 == 0: # i starts at 0 in python, so if i is even, the line is odd
print(line)
To explain what happens in your code:
A file can only be read through once. After that is has to be closed and reopened again.
You first iterate over the entire file in num_lines=sum(1 for line in f). Now the object f is empty.
If n is odd however, you call f.readlines(). This will go through all the lines again, but none are left in f. So every time n is odd, you go through the entire file. It is faster to go through it once (as in the solutions offered to your question).
As a fix, you need to type
f.close()
f = open(r'C:\Users\Savina\Desktop\rosalind_ini5.txt')
everytime after you read through the file, in order to get back to the start.
As a side note, you should look up modolus % for finding odd numbers.

Want to skip last and first 5 lines while reading file in python

If I want to see only data between line number 5 to what comes before last 5 rows in a file. While I was reading that particular file.
Code I have used as of now :
f = open("/home/auto/user/ip_file.txt")
lines = f.readlines()[5:] # this will start from line 5 but how to set end
for line in lines:
print("print line ", line )
Please suggest me I am newbie for python.
Any suggestions are most welcome too.
You could use a neat feature of slicing, you can count from the end with negative slice index, (see also this question):
lines = f.readlines()[5:-5]
just make sure there are more than 10 lines:
all_lines = f.readlines()
lines = [] if len(all_lines) <= 10 else all_lines[5:-5]
(this is called a ternary operator)

Python 3.4 - Capture block of text based on single string

I have searched far and wide and I hope someone can either point me to the link I missed or help me out with this logic.
We have a script the goes out and collects logs from various devices and places them in text files. Within these text files there is a time stamp and we need to collect the few lines of text before and after this time stamp.
I already have a script that matches the time stamps and removes them for certain reports (included below) but I cannot figure out how to match the time stamp and then capture the surrounding lines.
regex_time_stamp = re.compile('\d{2}:\d{2}:\d{2}|\d{1,2}y\d{1,2}w\d{1,2}d|\d{1,2}w\d{1,2}d|\d{1,2}d\d{1,2}h')
with open(filename, 'r') as f:
h = f.readlines()
for line in h:
if regex_time_stamp.search(line) is not None:
new_line = re.sub(regex_time_stamp, '', line)
pre_list.append(new_line)
else:
pre_list.append(line)
Any assistance would be greatly appreciated! Thanks for taking the time to read this.
The basic algorithm is to remember the three most recently read lines. When you match a header, read the next two lines and the combine it with the header and the last three lines that you've saved.
Alternately, since you're saving all of the lines in a list, simply keep track of which element is the current element, and when you find a header you can go back and get the previous two and next two elements.
Catch with duplicated lines
Agreed with the basic algorithm by #Bryan-Oakley and #TigerhawkT3, however there's a catch:
What if several lines match consecutively?
You could end up duplicating "context" lines by printing the last 2 lines of the first match, and then the last 2 lines of the second match... that would actually also contain the previous matched line.
The solution is to keep track of which line number was last printed, in order to print just enough lines before the current matched line.
Flexible context parameter
What if also you want to print 3 lines before and after instead of 2? Then you need to keep track of more lines.
What if you want only 1 ?
Then your number of lines to print needs to be a parameter and the algorithm needs to use it.
Sample input and output
Here's a file sample that contains the word MATCH instead of your timestamp, for clarity. The other lines contain NOT + line number
==
NOT 0
NOT 1
NOT 2
NOT 3
NOT 4
MATCH LINE 5
NOT 6
NOT 7
NOT 8
NOT 9
MATCH LINE 10
MATCH LINE 11
NOT 12
MATCH LINE 13
NOT 14
==
The output should be:
==
NOT 3
NOT 4
LINE 5
NOT 6
NOT 8
NOT 9
LINE 10
LINE 11
NOT 12
LINE 13
NOT 14
==
Solution
This solution iterates on the file and keeps track of:
what is the last line that was printed? This will take care of not duplicating "context" lines if matched lines come in sequence.
what is the last line that was matched? This will tell the program to print the current line if it is "close" to the last matched line. How close? This is determined by your "number of lines to print" parameter. Then we also set the last_line_printed variable to the current line index.
Here's a simplified algorithm in English:
When matching a line we will:
print the last N lines, from the last_line_printed variable to the current index
print the current line after stripping the timestamp
set the last_line_printed = last_line_matched = current line index
continue
When not matching a line we will:
print the current line if current_index < last_line_matched index + number_of_lines_to_print
Of course we're taking care of whether we're close to the beginning of file by checking limits
Not print but return an array
This solution doesn't print directly but returns an array with all the lines to print. That's just a bit classier.
I like to name my "return" variable result but that's just me. It makes it obvious what is the result variable during the whole algorithm.
Code
You can try this code with the input above, it'll print the same output.
def search_timestamps_context(filename, number_of_lines_to_print=2):
import re
result = []
regex_time_stamp = re.compile('\d{2}:\d{2}:\d{2}|\d{1,2}y\d{1,2}w\d{1,2}d|\d{1,2}w\d{1,2}d|\d{1,2}d\d{1,2}h')
# for my test
regex_time_stamp = re.compile('MATCH')
with open(filename, 'r') as f:
h = f.readlines()
# Remember which is the last line printed and matched
last_line_printed = -1
last_line_matched = -1
for idx, line in enumerate(h):
if regex_time_stamp.search(line) is not None:
# We want to print the last "number_of_lines_to_print" lines
# ...unless they were already printed
# We want to return last lines from idx - number_of_lines_to_print
# print ('** Matched ', line, idx, last_line_matched, last_line_printed)
if last_line_printed == -1:
lines_to_print = max(idx - number_of_lines_to_print, 0)
else:
# Unless we've already printed those lines because of a previous match, then we continue
lines_to_print = max(idx - number_of_lines_to_print, last_line_printed + 1)
for l in h[lines_to_print:idx]:
result.append(l)
# Now print the stripped line
new_line = re.sub(regex_time_stamp, '', line)
result.append(new_line)
# Update the last line printed
last_line_printed = last_line_matched = idx
else:
# If not a match, we still need to print the current line if we had a match N lines before
if last_line_matched != -1 and idx < last_line_matched + number_of_lines_to_print:
result.append(line)
last_line_printed = idx
return result
filename = 'test_match.txt'
lines = search_timestamps_context(filename, number_of_lines_to_print=2)
print (''.join(lines))
Improvements
The usage of getlines() is inefficient: we are reading the whole file before starting.
It would be more efficient to just iterate, but then we need to remember the last lines in case we need to print them. To achieve that, we would maintain a list of the last N lines, and not more.
That's an exercise left to the reader :)

Categories