How to open a text file and traverse through multiple lines concurrently? - python

I have a text file I want to open and do something to a line based on the next line.
For example if I have the following lines:
(a) A dog jump over the fire.
(1) A fire jump over a dog.
(b) A cat jump over the fire.
(c) A horse jump over a dog.
My code would be something like this:
with open("dog.txt") as f:
    lines = filter(None, (line.rstrip() for line in f))
    for value in lines:
        if value has letter enclosed in parenthesis
            do something
        then if next line has a number enclosed in parenthesis
            do something
EDIT: here is the solution I used.
import re
for i in range(1, len(lines)):  # start at 1 so lines[i-1] never wraps around to the last line
    if re.search(r'^\([a-z]', lines[i-1]):
        print lines[i-1]
        if re.search(r'\([0-9]', lines[i]):
            print lines[i]

Store the previous line and process it after reading the next:
file = open("file.txt")
previous = ""
for line in file:
    # Don't do anything for the first line, as there is no previous line.
    if previous != "":
        if previous[0] == "(":  # Or any other type of check you want to do.
            # Process the 'line' variable here.
            pass
    previous = line
file.close()

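You can use Python's iterator protocol and call next() on the file object. Note that this consumes the lines in non-overlapping pairs (1-2, 3-4, and so on):
with open('file.txt') as f:
    for line in f:
        prev_line = line  # the current line
        # next(f) advances the iterator, so the loop resumes on the line after it;
        # the None default avoids StopIteration when the file has an odd number of lines
        next_line = next(f, None)
        if next_line is not None:
            do_something(prev_line, next_line)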
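
If every line has to be compared with the line that follows it (overlapping pairs), one possible approach is to iterate over two offset copies of the same sequence. The sketch below is only an illustration: the `pairwise` helper, the two regular expressions, and the use of `io.StringIO` in place of a real file are assumptions, not part of the original answers.

```python
import io
import re
from itertools import tee

def pairwise(iterable):
    """Yield overlapping pairs: (item0, item1), (item1, item2), ..."""
    first, second = tee(iterable)
    next(second, None)  # advance the second copy by one item
    return zip(first, second)

# Hypothetical patterns: a letter or a number enclosed in parentheses at line start.
LETTER = re.compile(r'^\([a-z]\)')
NUMBER = re.compile(r'^\([0-9]+\)')

# io.StringIO stands in for open("dog.txt") so the sketch runs on its own.
sample = io.StringIO(
    "(a) A dog jump over the fire.\n"
    "(1) A fire jump over a dog.\n"
    "(b) A cat jump over the fire.\n"
    "(c) A horse jump over a dog.\n"
)
lines = [line.rstrip() for line in sample if line.strip()]

# Keep only the pairs where a lettered line is followed by a numbered line.
matches = [
    (current, following)
    for current, following in pairwise(lines)
    if LETTER.match(current) and NUMBER.match(following)
]
for current, following in matches:
    print(current, "->", following)
```

Because `pairwise` only looks one item ahead, it also works when fed the file object directly instead of a list.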

Related

How to loop through consecutive lines in python?

I have a text file that, for the sake of simplicity, contains:
cat
dog
goat
giraffe
walrus
elephant
How can I create a script that sets a variable (animal, in this case) to the first line of the text file, prints animal, and then repeats the whole thing with animal set to the next line (in this instance, dog)?
Here's what I've tried so far:
while True:
    with open('./text.txt','r') as f:
        for i in enumerate('./text.txt'):
            if i in lines:
                print(lines)
If you want to read the file one line at a time (which may be needed for large files):
with open('./text.txt','r') as f:
    line = f.readline()
    # this will stop when there is nothing left to read, as line will be ''
    # note that an 'empty' line will still have a line ending, i.e. '\n'
    while line:
        print(line.rstrip('\n'))  # strip the line ending so print() doesn't double it
        line = f.readline()
If you don't care about the size, you can read all lines at once with .readlines() and just loop over the returned values from that.
Use readlines to store each line in a list.
with open('file.txt','r') as f:
    animals = f.readlines()
for animal in animals:
    print(animal.strip())
You could try the following:
with open('./text.txt') as f:
    for animal in f.readlines():
        print(animal.strip())
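
As a further sketch of the same idea, a file object can be looped over directly, which reads one line at a time without calling readline() or readlines(). Here `io.StringIO` stands in for the real file, and the `animals` list and the line counter are added only for illustration.

```python
import io

# io.StringIO stands in for open('./text.txt') so the sketch is self-contained.
f = io.StringIO("cat\ndog\ngoat\ngiraffe\nwalrus\nelephant\n")

animals = []
# Iterating over the file object yields one line per loop pass.
for number, line in enumerate(f, start=1):
    animal = line.strip()  # drop the trailing newline
    animals.append(animal)
    print(number, animal)
```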

how to print only the first value in python

In my output file I have:
Energy=-0.111....
'other text'
Energy=-0.1223
Now I am trying to write a script where I open the output, I read the energy value and print it in another output.
Below is my code
with open('.out', 'rt') as f:
    data = f.readlines()
for line in data:
    if line.__contains__('Energy'):
        print(line)
My problem is that I want my script to print only the first energy value, Energy=-0.111...., but the output contains all of them. How can I correct my script? I want to understand how to print only the first value of Energy in one script and only the second one in another.
Since you need to extend the logic to print the first, second etc times 'Energy' lines are found, you can store them in a list and access whichever one you need to print.
with open('.out', 'rt') as f:
    data = f.readlines()
energy_lines = []
for line in data:
    if 'Energy' in line:
        energy_lines.append(line)
print(energy_lines[0])  # first line
print(energy_lines[1])  # second line
Why not just use "break" after printing?
for line in data:
    if line.__contains__('Energy'):
        print(line)
        break
First occurrence: add a break, as mentioned by John.
Nth occurrence: you could use a counter. For example, if I want the third occurrence:
data = f.readlines()
counter = 0  # initialise once, outside the loop, so it is not reset on every line
for line in data:
    if line.__contains__('Energy'):
        counter += 1
        if counter == 3:
            print(line)
            break
If the energy line is always the first line in your text file:
with open('.out', 'rt') as f:
    line = next(f)
    if 'Energy' in line:
        print(line)
If not, there is still no point in reading all lines into memory (this should consume less memory and a little less time):
num_occurrence = 1
match_count = 0
with open('.out', 'rt') as f:
    for line in f:
        if 'Energy' in line:
            match_count += 1
            if match_count == num_occurrence:
                print(line)
                break
I also changed
if line.__contains__('Energy'):
into
if 'Energy' in line:
since one should normally avoid calling magic (double-underscore) methods directly; the in operator is more readable.
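
The counter logic can also be wrapped in a small reusable function. The following is a minimal sketch, assuming a hypothetical helper named `nth_match` and made-up sample energy values; `io.StringIO` stands in for the real output file.

```python
import io
from itertools import islice

def nth_match(lines, keyword, n):
    """Return the n-th line (1-based) containing keyword, or None if there is none."""
    matches = (line for line in lines if keyword in line)
    # islice skips the first n-1 matches; next() stops reading after the n-th.
    return next(islice(matches, n - 1, None), None)

# Illustrative data only; the values are not taken from the question.
sample = "Energy=-1.0\nother text\nEnergy=-2.0\n"

first = nth_match(io.StringIO(sample), "Energy", 1)
second = nth_match(io.StringIO(sample), "Energy", 2)
third = nth_match(io.StringIO(sample), "Energy", 3)  # no third match

print(first, end="")
print(second, end="")
```

Since the generator is lazy, the file is only read up to the requested match, much like the break in the loops above.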

How to open a file in python, read the comments ("#"), find a word after the comments and select the word after it?

I have a function that loops through a file that looks like this:
"#" XDI/1.0 XDAC/1.4 Athena/0.9.25
"#" Column.4: pre_edge
Content
That is to say, what follows the "#" is a comment. My function aims to read each line and, if it starts with a specific word, select what comes after the ":".
For example, given these two lines, I would like to read through them and, if a line starts with "#" and contains the word "Column.4", store the word "pre_edge".
An example of my current approach follows:
with open(file, "r") as f:
    for line in f:
        if line.startswith ('#'):
            word = line.split(" Column.4:")[1]
        else:
            print("n")
I think my trouble is specifically this: after finding a line that starts with "#", how can I parse or search through it and save its content if it contains the desired word?
If the # comment contains the string Column.4: as stated above, you could parse it this way:
with open(filepath) as f:
    for line in f:
        if line.startswith('#'):
            # Here you process comment lines
            if 'Column.4' in line:
                first, remainder = line.split('Column.4: ')
                # Remainder contains everything after '# Column.4: '
                # So if you want to get first word ->
                word = remainder.split()[0]
        else:
            # Here you can process lines that are not comments
            pass
Note
Also, it is good practice to use for line in f: instead of f.readlines() (as mentioned in other answers), because this way you don't load all lines into memory but process them one by one.
You should start by reading the file into a list and then work through that instead:
file = 'test.txt'  # <- call file whatever you want
with open(file, "r") as f:
    txt = f.readlines()
for line in txt:
    if line.startswith ('"#"'):
        word = line.split(" Column.4: ")
        try:
            print(word[1])
        except IndexError:
            print(word)
    else:
        print("n")
Output:
>>> ['"#" XDI/1.0 XDAC/1.4 Athena/0.9.25\n']
>>> pre_edge
Used a try and except catch because the first line also starts with "#" and we can't split that with your current logic.
Also, as a side note, in the question you have the file with lines starting as "#" with the quotation marks so the startswith() function was altered as such.
with open('stuff.txt', 'r+') as f:
    data = f.readlines()
for line in data:
    words = line.split()
    if words and ('#' in words[0]) and ("Column.4:" in words):
        print(words[-1])
        # pre_edge
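
If several Column.N headers need to be collected at once, a regular expression is one option. This is only a sketch under assumptions: the `COLUMN_RE` pattern and the `read_columns` helper are hypothetical, the comment marker is assumed to be a plain # without quotation marks, and `io.StringIO` stands in for the real file.

```python
import io
import re

# Matches comment lines such as '# Column.4: pre_edge' and captures
# the column number and the first word after the colon.
COLUMN_RE = re.compile(r'^#\s*Column\.(\d+):\s*(\S+)')

def read_columns(lines):
    """Map column number -> name for every '# Column.N: name' comment line."""
    columns = {}
    for line in lines:
        match = COLUMN_RE.match(line)
        if match:
            columns[int(match.group(1))] = match.group(2)
    return columns

sample = io.StringIO(
    "# XDI/1.0 XDAC/1.4 Athena/0.9.25\n"
    "# Column.4: pre_edge\n"
    "Content\n"
)
columns = read_columns(sample)
print(columns.get(4))
```

Lines that are not comments, or comments without a Column.N label, are simply skipped, so no try/except is needed.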

how to skip certain line in text file and keep reading the next line in python?

I have been searching for this answer but did not quite get it.
I have a text file that looks like this
who are you????
who are you man?
who are you!!!!
who are you? man or woman?
I want to skip the line with man in it and print
who are you????
who are you!!!!
My code so far
f = open("test.txt", "r")
word = "man"
for line in f:
    if word in line:
        f.next()
    else:
        print line
This prints the first line only
who are you????
How should I troubleshoot this problem?
Thank you for your help.
It's not necessary to add an if else statement in for loop, so you can modify your code in this way:
f = open("test.txt", "r")
word = "man"
for line in f:
    if not word in line:
        print line
Furthermore, the issue in your code is that you call f.next() directly inside a for loop that is already iterating over the file. This is why, when a line contains the word "man", your code skips two lines: that line and the one after it.
If you want to preserve the if else statement because this is only an example of a more complex problem, you can use the following code:
f = open("test.txt", "r")
word = "man"
for line in f:
    if word in line:
        continue
    else:
        print line
Using continue, you skip one loop's iteration, and so you can reach your goal.
As Alex Fung suggests, it would be better to use with, so your code would become:
with open("test.txt", "r") as test_file:
    for line in test_file:
        if "man" not in line:
            print line
Problem
With your current code, when the current line contains "man" :
you don't print anything. That's correct.
you also skip the next line. That's your problem!
f.next() is already called implicitly by for line in f: at each iteration. So you actually call f.next() twice when "man" is found.
If the last line of your file contains a "man", Python will throw an exception because there's no next line.
You might have been looking for continue, which would also achieve the desired result but would be complex and unneeded. Note that it's called next in Perl and Ruby, which might be confusing.
Example
who are you???? # <- This line gets printed, because there's no "man" in it
who are you man? # word in line is True. Don't print anything. And skip next line
who are you!!!! # Line is skipped because of f.next()
who are you? man or woman? # word in line is True. Don't print anything.
# Try to skip next line, but there's no next line anymore.
# The script raises an exception : StopIteration
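
The behaviour traced above can be reproduced without a file. The sketch below is an illustration in Python 3 syntax (where the built-in next() replaces the f.next() method); it uses a list iterator in place of the file object, and the names `printed`, `error`, and `kept` are assumptions introduced here.

```python
# An iterator over a list behaves like a file object for this purpose.
lines = iter([
    "who are you????",
    "who are you man?",
    "who are you!!!!",
    "who are you? man or woman?",
])

printed = []
error = None
try:
    for line in lines:
        if "man" in line:
            next(lines)  # consumes the line the for loop would have seen next
        else:
            printed.append(line)
except StopIteration:
    # Raised by next() when the last line contains "man".
    error = "StopIteration"

# Filtering without touching the iterator keeps every line without "man".
kept = [
    line
    for line in ["who are you????", "who are you man?", "who are you!!!!"]
    if "man" not in line
]
print(printed, error, kept)
```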
Correct code
Don't forget to close the file. You can do this automatically with with :
word = "man"
with open("test.txt") as f:
    for line in f:
        if not word in line:
            print line,  # <- Note the comma to avoid double newlines
How about
f = open("test.txt", "r")
word = "man"
for line in f:
    if not word in line:
        print line

How to find a substring in a line and append from that line up to the next substring?

The test.txt would be
1
2
3
start
4
5
6
end
7
8
9
I would like the result to be
start
4
5
6
end
This is my code
file = open('test.txt','r')
line = file.readline()
start_keyword = 'start'
end_keyword = 'end'
lines = []
while line:
    line = file.readlines()
    for words_in_line in line:
        if start_keyword in words_in_line:
            lines.append(words_in_line)
file.close()
print lines
It returns
['start\n']
I have no idea what to add to the above code to achieve the result I want to get. I have been searching and changing the code around but I don't know how to get this to work as I want it to.
Use a flag. Try this:
file = open('test.txt','r')
start_keyword = 'start'
end_keyword = 'end'
in_range = False
entities = []
lines = file.readlines()
for line in lines:
    line = line.strip()
    if line == start_keyword:
        in_range = True
    elif line == end_keyword:
        in_range = False
    elif in_range:
        entities.append(line)
file.close()
# If you want to include the start/end tags
#entities = [start_keyword] + entities + [end_keyword]
print entities
About your code, notice that readlines already reads all lines in a file, so calling readline first doesn't seem to make much sense, unless you want to ignore the first line. Also use strip to remove EOL characters from the strings. Notice how your code doesn't do what you expect it to:
# Reads ALL lines in the file as a list
line = file.readlines()
# You are not iterating over words in a line, but over all lines one by one
for words_in_line in line:
    # If a given line contains 'start', append it. This is why you only get ['start\n']: it's the only line you are adding, as no other line contains that string
    if start_keyword in words_in_line:
        lines.append(words_in_line)
You need a state variable to decide whether you are storing the lines or not. Here is a simplistic example that will always store the line, and then will change its mind and discard it for the cases you don't want:
start_keyword = 'start'
end_keyword = 'end'
lines = []
reading = False
with open('test.txt', 'r') as f:
    for line in f:
        lines.append(line)
        if start_keyword in line:
            reading = True
        elif end_keyword in line:
            reading = False
        elif not reading:
            lines.pop()
print ''.join(lines)
If the file isn't too big (relative to how much RAM your computer has):
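start = 'start'
end = 'end'
with open('test.txt','r') as f:
    content = f.read()
begin = content.index(start)
# look for the end keyword only after the start keyword, and include it in the slice
result = content[begin:content.index(end, begin) + len(end)]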
You can then print it with print(result), create a list by using result.split(), and so on.
If there are multiple start/stop points, and/or the file is very large:
start = 'start'
end = 'end'
running = False
result = []
with open('test.txt','r') as f:
    for line in f:
        if start in line:
            running = True
            result.append(line)
        elif end in line:
            running = False
            result.append(line)
        elif running:
            result.append(line)
This leaves you with a list, which you can join(), print(), write to a file, and so on.
You can use a flag that is set to true when you encounter start_keyword. While the flag is set, you add lines to the lines list, and the flag is unset when end_keyword is encountered (but only after end_keyword has been written into the lines list).
Also use .strip() on words_in_line to remove the \n (and other leading and trailing whitespace) if you do not want it in the lines list; if you do want it, don't strip.
Example -
flag = False
for words_in_line in line:
    if start_keyword in words_in_line:
        flag = True
    if flag:
        lines.append(words_in_line.strip())
    if end_keyword in words_in_line:
        flag = False
Please note, this would add multiple start to end blocks into the lines list, I am guessing that is what you want.
A file object is its own iterator, so you don't need a while loop to read a file line by line; you can iterate over the file object itself. To catch the sections, start an inner loop when you encounter a line with start and break out of the inner loop when you hit end:
start, end = 'start', 'end'
with open("in.txt") as f:
    out = []
    for line in f:
        if start in line:
            out.append(line)
            for _line in f:
                out.append(_line)
                if end in _line:
                    break
Output:
['start\n', '4\n', '5\n', '6\n', 'end\n']
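
When a file contains several start/end sections and each one should be kept separate, the flag idea from the answers above can be wrapped in a generator. This is a sketch rather than a definitive implementation: `extract_blocks` is a hypothetical helper, it matches the keywords against the whole stripped line (not as substrings), and `io.StringIO` stands in for test.txt.

```python
import io

def extract_blocks(lines, start="start", end="end"):
    """Yield one list of stripped lines per start..end block, markers included."""
    block = None  # None means we are currently outside a block
    for line in lines:
        text = line.strip()
        if block is None:
            if text == start:
                block = [text]
        else:
            block.append(text)
            if text == end:
                yield block
                block = None

sample = io.StringIO("1\n2\n3\nstart\n4\n5\n6\nend\n7\nstart\n8\nend\n9\n")
blocks = list(extract_blocks(sample))
for block in blocks:
    print(block)
```

A block that is opened but never closed is silently dropped here; whether that is the desired behaviour depends on the data.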
