Python: Using readline() in "for line in file:" Loop

Let's say I have a text file that looks like:
a
b
start_flag
c
d
e
end_flag
f
g
I wish to iterate over this data line by line, but when I encounter a 'start_flag', I want to iterate until I reach an 'end_flag' and count the number of lines in between:
newline = ''
for line in f:
    count = 0
    if 'start_flag' in line:
        while 'end_flag' not in newline:
            count += 1
            newline = f.readline()
        print(str(count))
What is the expected behavior of this code? Will it iterate like:
a
b
start_flag
c
d
e
end_flag
f
g
Or:
a
b
start_flag
c
d
e
end_flag
c
d
e
end_flag
f
g

There shouldn't be any need to use readline(). Try it like this:
with open(path, 'r') as f:
    count = 0
    counting = False
    for line in f:
        if 'start_flag' in line:
            counting = True
        elif 'end_flag' in line:
            counting = False
            # do something with your count result
            count = 0  # reset it for the next start_flag
        elif counting:
            # only lines strictly between the two flags are counted
            count += 1
This handles it all with the if statements in the correct order, allowing you to just run sequentially through the file in one go. You could obviously add more operations into this, and do things with the results, for example appending them to a list if you expect to run into multiple start and end flags.
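On the question of iteration order: a file object is its own iterator, so lines consumed inside the loop are not seen again by the outer loop, which corresponds to the first of the two listings in the question. Below is a minimal sketch of that idea, using one shared iterator and an inner for loop instead of readline(). The function name count_between_flags and the io.StringIO stand-in for the file are illustrative assumptions, not part of the original code.

```python
import io


def count_between_flags(lines, start='start_flag', end='end_flag'):
    """Return one count per start/end pair: the lines strictly between them."""
    it = iter(lines)  # a single shared iterator, so no line is visited twice
    counts = []
    for line in it:
        if start in line:
            count = 0
            for inner in it:  # continues from the line after the start flag
                if end in inner:
                    break
                count += 1
            counts.append(count)
    return counts


# io.StringIO stands in for an open text file here.
sample = io.StringIO("a\nb\nstart_flag\nc\nd\ne\nend_flag\nf\ng\n")
print(count_between_flags(sample))  # [3]
```

Because the inner loop advances the same iterator, the outer loop resumes at f after end_flag, and c, d and e are never visited a second time.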

Use this:
enter = False
count = 0
for line in f:
    if 'start_flag' in line:
        enter = True
    elif 'end_flag' in line:
        print(count)
        count = 0
        enter = False
    elif enter:
        # count only the lines between the two flags
        count += 1

Related

how to add string in previous line to the end of next line in python

I want to append each line that starts with "AA" to the end of the next line, like this (the text has many lines):
input:
AA
1 A B C
2 D E F
AA
3 G H I
output:
1 A B C AA
2 D E F
3 G H I AA
You need to keep two lists: a temporary one to hold the strings that start with "AA", and another for the output. Fill them accordingly:
>>> text="""AA
1 A B C
2 D E F
AA
3 G H I"""
>>> output=[]
>>> temp=[]
>>> for line in text.splitlines():
        line = line.strip()  # remove surrounding whitespace
        if not line:
            continue  # ignore empty lines
        if line.startswith("AA"):
            temp.append(line)
        else:
            if temp:  # concatenate our temp list if there is anything there
                line += " " + " ".join(temp)
                temp.clear()
            output.append(line)
>>> print("\n".join(output))
1 A B C AA
2 D E F
3 G H I AA
>>>
import sys  # for reading from stdin (standard input)
checker = "AA"  # the marker line to look for
raw = sys.stdin.readlines()
# Paste the text from the clipboard, then send end-of-file (Ctrl+Z on Windows).
# If pasting takes too long, see the code further below,
# which builds the list of lines from a file instead.
lines = list(map(str.strip, raw))
result = []
pending = ''  # marker waiting to be appended to the next line
for line in lines:
    if line == checker:
        pending = line
    else:
        # append the pending marker (if any) to the end of this line
        result.append(line + ' ' + pending if pending else line)
        pending = ''
result_str = '\n'.join(result)
print(result_str)
I hope it will work for you.
If your raw text has more than 1,000 lines, I recommend using the open function instead. Here is an example,
assuming your raw text file is named raw.txt and is in the same folder as the Python file.
with open('raw.txt', 'r') as f:
    raw = f.readlines()  # read all lines, each still ending with the newline (\n)
stripped = map(str.strip, raw)  # strip the newline from every line
lines = list(stripped)  # a map object cannot be indexed or sliced, so convert it to a list
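For a large file the lines do not have to be loaded into a list at all. Here is a rough sketch of a streaming variant that works on any iterable of lines, including an open file. The generator name append_markers and the io.StringIO stand-in for raw.txt are hypothetical and chosen only for illustration.

```python
import io


def append_markers(lines, marker="AA"):
    """Yield each ordinary line with any preceding marker lines appended to it."""
    pending = []  # marker lines waiting for the next ordinary line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(marker):
            pending.append(line)
        else:
            yield " ".join([line] + pending)
            pending = []


# io.StringIO stands in for open('raw.txt') here.
sample = io.StringIO("AA\n1 A B C\n2 D E F\nAA\n3 G H I\n")
for out in append_markers(sample):
    print(out)
```

With a real file, the same loop would read `for out in append_markers(f):` inside a `with open(...) as f:` block, so only one line is held in memory at a time.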

Selecting line from file by using "startswith" and "next" commands

I have a file from which I want to create a list ("timestep") from the numbers which appear after each line "ITEM: TIMESTEP" so:
timestep = [253400, 253500, .. etc]
Here is the sample of the file I have:
ITEM: TIMESTEP
253400
ITEM: NUMBER OF ATOMS
378
ITEM: BOX BOUNDS pp pp pp
-2.6943709180241954e-01 5.6240920636804063e+01
-2.8194230631882372e-01 5.8851195163321044e+01
-2.7398090193568775e-01 5.7189372326936599e+01
ITEM: ATOMS id type q x y z
16865 3 0 28.8028 1.81293 26.876
16866 2 0 27.6753 2.22199 27.8362
16867 2 0 26.8715 1.04115 28.4178
16868 2 0 25.7503 1.42602 29.4002
16869 2 0 24.8716 0.25569 29.8897
16870 3 0 23.7129 0.593415 30.8357
16871 3 0 11.9253 -0.270359 31.7252
ITEM: TIMESTEP
253500
ITEM: NUMBER OF ATOMS
378
ITEM: BOX BOUNDS pp pp pp
-2.6943709180241954e-01 5.6240920636804063e+01
-2.8194230631882372e-01 5.8851195163321044e+01
-2.7398090193568775e-01 5.7189372326936599e+01
ITEM: ATOMS id type q x y z
16865 3 0 28.8028 1.81293 26.876
16866 2 0 27.6753 2.22199 27.8362
16867 2 0 26.8715 1.04115 28.4178
16868 2 0 25.7503 1.42602 29.4002
16869 2 0 24.8716 0.25569 29.8897
16870 3 0 23.7129 0.593415 30.8357
16871 3 0 11.9253 -0.270359 31.7252
To do this I tried to use the "startswith" and "next" commands together, but it didn't work. Is there another way to do it? Here is the code I am trying to use:
timestep = []
with open(file, 'r') as f:
    lines = f.readlines()
    for line in lines:
        line = line.split()
        if line[0].startswith("ITEM: TIMESTEP"):
            timestep.append(next(line))
print(timestep)
The logic is to decide whether or not to append the current line to timestep. So what you need is a flag variable that tells you to append the current line whenever it is True.
timestep = []
append_to_list = False # decision variable
with open(file, 'r') as f:
    lines = f.readlines()
    for line in lines:
        line = line.strip() # remove "\n" from line
        if line.startswith("ITEM"):
            # Update append_to_list
            if line == 'ITEM: TIMESTEP':
                append_to_list = True
            else:
                append_to_list = False
        else:
            # append to list if line doesn't start with "ITEM" and append_to_list is True
            if append_to_list:
                timestep.append(line)
print(timestep)
output:
['253400', '253500']
First - I don't like this, because it doesn't scale. You can only get the first immediately following line nicely, anything else will be just ugh...
But you asked, so ... for x in lines will create an iterator over lines and use that to keep the position. You don't have access to that iterator, so next will not be the next element you're expecting. But you can make your own iterator and use that:
lines_iter = iter(lines)
for line in lines_iter:
    if line.startswith("ITEM: TIMESTEP"):
        # next() advances the same iterator the for loop is using
        timestep.append(next(lines_iter).strip())
However, if you ever want to scale it... for is not a good way to iterate over a file like this. You want to know what is in the next/previous line. I would suggest using while:
timestep = []
with open('example.txt', 'r') as f:
    lines = f.readlines()
i = 0
while i < len(lines):
    if lines[i].startswith("ITEM: TIMESTEP"):
        i += 1
        # collect every line up to the next "ITEM: " header (or the end of the file)
        while i < len(lines) and not lines[i].startswith("ITEM: "):
            timestep.append(lines[i].strip())
            i += 1
    else:
        i += 1
This way you can extend it for different types of ITEMS of variable length.
So the problem with your code is subtle. You have a list lines which you iterate over, but you can't call next on a list.
Instead, turn it into an explicit iterator and you should be fine
timestep = []
with open(file, 'r') as f:
    lines = f.readlines()
    lines_iter = iter(lines)
    for line in lines_iter:
        line = line.strip() # removes the newline
        if line.startswith("ITEM: TIMESTEP"):
            # the second argument here prevents errors when ITEM: TIMESTEP
            # appears as the last line in the file
            timestep.append(next(lines_iter, None))
print(timestep)
I'm also not sure why you included line.split, which seems to be incorrect (in any case line.split()[0].startswith('ITEM: TIMESTEP') can never be true, since the split will separate ITEM: and TIMESTEP into separate elements of the resulting list.)
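Both points above can be checked in a few lines. This is only an illustrative sketch; the two-line sample list is made up to mirror the file in the question.

```python
line = "ITEM: TIMESTEP\n"

# split() breaks the line on whitespace, so the marker ends up in two pieces.
parts = line.split()
print(parts)                                   # ['ITEM:', 'TIMESTEP']
print(parts[0].startswith("ITEM: TIMESTEP"))   # False
print(line.startswith("ITEM: TIMESTEP"))       # True

# next() needs an iterator; a plain list is only an iterable.
lines = ["ITEM: TIMESTEP\n", "253400\n"]
try:
    next(lines)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Wrapping the list in iter() gives an object that next() accepts.
lines_iter = iter(lines)
first = next(lines_iter)
second = next(lines_iter)
print(repr(second))                            # '253400\n'
```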
For a more robust answer, consider grouping your data based on when the line begins with ITEM.
def process_file(f):
    ITEM_MARKER = 'ITEM: '
    item_title = '(none)'
    values = []
    for line in f:
        if line.startswith(ITEM_MARKER):
            if values:
                yield (item_title, values)
            item_title = line[len(ITEM_MARKER):].strip() # strip off the marker
            values = []
        else:
            values.append(line.strip())
    if values:
        yield (item_title, values)
This will let you pass in the whole file and will lazily produce a set of values for each ITEM: <whatever> group. Then you can aggregate in some reasonable way.
with open(file, 'r') as f:
    groups = process_file(f)
    aggregations = {}
    for name, values in groups:
        aggregations.setdefault(name, []).extend(values)
print(aggregations['TIMESTEP']) # this is what you want
You can use enumerate to help with index referencing. We can check to see if the string ITEM: TIMESTEP is in the previous line then add the integer to our timestep list.
timestep = []
with open('example.txt', 'r') as f:
    lines = f.readlines()
    for i, line in enumerate(lines):
        # skip i == 0, where lines[i-1] would wrap around to the last line
        if i > 0 and "ITEM: TIMESTEP" in lines[i-1]:
            timestep.append(int(line.strip()))
print(timestep)
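The "look at the previous line" idea can also be written without index arithmetic by pairing each line with its successor. A small sketch, assuming the lines are already in a list; the function name timesteps and the shortened sample are illustrative only.

```python
def timesteps(lines, header="ITEM: TIMESTEP"):
    """Return the integer found on the line that follows each header line."""
    # zip(lines, lines[1:]) yields (previous, current) pairs of adjacent lines.
    return [int(cur) for prev, cur in zip(lines, lines[1:])
            if prev.strip() == header]


sample = [
    "ITEM: TIMESTEP\n",
    "253400\n",
    "ITEM: NUMBER OF ATOMS\n",
    "378\n",
    "ITEM: TIMESTEP\n",
    "253500\n",
]
print(timesteps(sample))  # [253400, 253500]
```

Since the pairs start at the first and second lines, there is no wrap-around to the end of the list, and a header on the very last line is simply ignored.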

Python: How to increment the count when a variable repeats

I have a txt file which has following entries:
Rx = 34 // Counter gets incremented = 1, since the Rx was found for the first time
Rx = 2
Rx = 10
Tx = 2
Tx = 1
Rx = 3 // Counter gets incremented = 2, since the Rx was found for the first time after Tx
Rx = 41
Rx = 3
Rx = 19
I want to increment the count only when 'Rx' is found for the first time (at the start of the file, or for the first time after a Tx), and not for every Rx in the text file. My code is as follows:
import re
f = open("test.txt","r")
count = 0
for lines in f:
    m = re.search(r"Rx = \d{1,2}", lines)
    if m:
        count += 1
print(count)
But this is giving me the count of all the Rx's in the txt file. I want the output as 2 and not 7.
Please help me out !
import re
f = open("test.txt","r")
count = 0
for lines in f:
    m = re.search(r"Rx = \d{1,2}", lines)
    if m:
        count += 1
        if count >= 2:
            break
print(m.group(0))
Break out of the loop, since you only need to find the repeats.
import re
f = open("test.txt","r")
count = 0
for lines in f:
    m = re.search(r"Rx = \d{1,2}", lines)
    if m:
        count += 1
        if count >= 2:
            break
print(count)
The test if m: is true whenever re.search finds a match (that is, whenever m is not None), so count keeps incrementing for every Rx line. If you'd like to only get the first 2, you need to introduce some additional logic.
If you want to find the count for the Rx entries that are repeated at least once:
rx_count = {}
with open("test.txt","r") as f:
    for line in f:
        line = line.strip()  # so the last line compares equal even without a newline
        if line.startswith('Rx'):
            rx_count[line] = rx_count.get(line, 0) + 1
Now you have a counter dictionary in rx_count. Keep only the entries whose count is greater than 1, then sum those counts and print the total:
rx_count = {k: v for k, v in rx_count.items() if v > 1}
count = sum(rx_count.values())
print(count)
To do exactly what you want, you're going need to keep track of which strings you've already seen.
You can do this by using a set to keep track of which you have seen until there is a duplicate, and then only counting occurrences of that string.
This example would do that
import re
count = 0
matches = set()
with open("test.txt", "r") as f:
    for line in f:
        m = re.search(r"Rx = \d{1,2}", line)
        if not m:
            # Skip the rest if no match
            continue
        if m.group(0) not in matches:
            matches.add(m.group(0))
        else:
            # First string that we have seen twice
            first = m.group(0)
            count = 2
            break
    # Continue from where the first loop stopped
    for line in f:
        m = re.search(r"Rx = \d{1,2}", line)
        ## This or whatever check you want to do
        if m and m.group(0) == first:
            count += 1
print(count)
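If the question is read as "count each run of consecutive Rx lines once", which is what the comments in the sample data describe, one possible sketch is to remember whether the previous line was an Rx line. The function name count_rx_runs and the inline sample are assumptions for illustration; any line that does not match the pattern (Tx, blank, or anything else) is treated as ending the current run.

```python
import re

RX = re.compile(r"Rx = \d{1,2}")


def count_rx_runs(lines):
    """Count the runs of consecutive lines that match the Rx pattern."""
    runs = 0
    in_run = False  # True while the previous line was an Rx line
    for line in lines:
        is_rx = bool(RX.search(line))
        if is_rx and not in_run:
            runs += 1  # first Rx after a non-Rx line (or at the start)
        in_run = is_rx
    return runs


sample = ["Rx = 34", "Rx = 2", "Rx = 10", "Tx = 2", "Tx = 1",
          "Rx = 3", "Rx = 41", "Rx = 3", "Rx = 19"]
print(count_rx_runs(sample))  # 2
```

With a file, the same function can be called as count_rx_runs(f) inside a with open("test.txt") as f: block.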

Count the number of lines after each pattern

I have a file with lines, some lines have a particular pattern. The number of lines after each pattern differs and I want to count the number of lines after each pattern.
<pattern>
line 1
line 2
line 3
<pattern>
line 1
line 2
etc
my code:
for line in fp:
    c = 0
    if line.startswith("<"):
        header = line.split(" ")
    else:
        c = c+1
The code I have captures the pattern as well as the lines, but I don't know how to stop before the next pattern and start another count after the pattern.
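Save c into a list each time a new pattern starts, then reset c to 0. Note that c must be initialised before the loop, not inside it, and that the last count has to be saved after the loop ends.
a is the list of counts, one entry per pattern.
a = []
c = 0
seen_header = False
for line in fp:
    if line.startswith("<"):
        header = line.split(" ")
        if seen_header:
            a.append(c)  # save the count for the previous pattern
        seen_header = True
        c = 0
    else:
        c = c+1
if seen_header:
    a.append(c)  # save the count for the last pattern
To read the values you can loop over the list:
for i in range(0, len(a)):
    print("c%d is %d" % (i, a[i]))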

Code doesn't print the last sequence in a file

I have a file that looks like this:
<s0> 3
line1
line2
line3
<s1> 5
line1
line2
<s2> 4
etc. up to more than a thousand
Each sequence has a header like <s0> 3, which in this case states that three lines follow. In the example above, the number of lines below <s1> is two, so I have to correct the header to <s1> 2.
The code I have below picks out the sequence headers and the correct number of lines below them. But for some reason, it never gets the details of the last sequence. I know something is wrong but I don't know what. Can someone point me to what I am doing wrong?
import re
def call():
    with open('trial_perl.txt') as fp:
        docHeader = open("C:\path\header.txt","w")
        c = 0
        c1 = 0
        header = []
        k = -1
        for line in fp:
            if line.startswith("<s"):
                #header = line.split(" ")
                #print header[1]
                c = 0
            else:
                c1 = c + 1
                c += 1
            if c == 0 and c1>0:
                k +=1
                printing = c1
                if printing >= 0:
                    s = "<s%s>" % (k)
                    #print "%s %d" % (s, printing)
                    docHeader.write(s+" "+str(printing)+"\n")
call()
You have no sentinel at the end of the last sequence in your data, so your code needs to deal with the last sequence AFTER the loop is done.
If I may suggest some Python tricks to get to your results: you don't need the c/c1/k counter variables, as they make the code more difficult to read and maintain. Instead, populate a map of sequence header to sequence items and then use the map to do all your work
(this code works only if all sequence headers are unique; if you have duplicates, it won't work):
with open('trial_perl.txt') as fp:
    docHeader = open(r"C:\path\header.txt","w")
    data = {}
    for line in fp:
        line = line.strip()  # drop the trailing newline
        if line.startswith("<s"):
            current_sequence = line
            # create a list with the header as the key
            data[current_sequence] = []
        else:
            # add each line to the list we defined above
            data[current_sequence].append(line)
Your map is ready! It looks like this:
{"<s0> 3": ["line1", "line2", "line3"],
"<s1> 5": ["line1", "line2"]}
You can iterate it like this:
for header, lines in data.items():
    # header is the key, or "<s0> 3"
    # lines is the list of lines under that header ["line1", "line2", etc]
    num_of_lines = len(lines)
The main problem is that you neglect to check the value of c after you have read the last line. You probably had difficulty spotting this problem because of all the superfluous code. You don't have to increment k, since you can extract the value from the <s...> tag. And you don't have to have all three variables c, c1, and printing. A single count variable will do.
import re, sys
def call():
    with open('trial_perl.txt') as fp:
        docHeader = sys.stdout #open("C:\path\header.txt","w")
        count = 0
        id = None
        for line in fp:
            if line.startswith("<s"):
                if id is not None:
                    # write the corrected header of the previous sequence
                    docHeader.write('<s%d> %d\n' % (id, count))
                count = 0
                id = int(line[2:line.find('>')])
            else:
                count += 1
        # the last sequence has no header after it, so write it once the loop is done
        if id is not None:
            docHeader.write('<s%d> %d\n' % (id, count))
call()
Another approach uses groupby from itertools: take the maximum line number in each group, where a group corresponds to a header plus the lines that follow it in your file:
from itertools import groupby
def call():
    with open('stack.txt') as fp:
        header = [-1]
        lines = [0]
        for line in fp:
            if line.startswith("<s"):
                header.append(header[-1]+1)
                lines.append(0)
            else:
                header.append(header[-1])
                lines.append(lines[-1] +1)
    with open('result','w') as f:
        # each item is a (header index, running line count) pair
        for key, group in groupby(zip(header[1:],lines[1:]), lambda x: x[0]):
            f.write("<s%d> %d\n" % max(group))
call()
#<s0> 3
#<s1> 2
stack.txt is the file containing your data:
<s0> 3
line1
line2
line3
<s1> 5
line1
line2
