Read file then stop then continue until given line - python

Python 3.7 question.
I have a file that looks like this:
1
10 10 10
3
25 29 10
52 55 30
70 70 20
0
Here 1 means that one line follows, 3 means that three lines follow, and 0 marks the end of the file. How can I read this?
I've tried:
def read_each_course(filename):
    with open(filename, 'r') as f:
        lines = []
        content = f.readlines()
        lines += [x.rstrip() for x in content]
        for i in lines:
            while True:
                if str(i).count(" ") == 0:
                    lines_to_read = int(i)
                    break
            return lines_to_read, next(i)
but that doesn't work; I get
TypeError: 'str' object is not an iterator
from the next(i) call.
My idea was to get a list of lists like this:
[[1, [10, 10, 10]], [3, [25, 29, 10], [52, 55, 30], [70, 70, 20]]]
However, I am unsure whether that design is a good idea in general. Should it rather be a singly linked list? The three numbers are coordinates, and my ultimate goal only needs each item together with the next one: x2-x1, y2-y1, plus a penalty (additional cost) if a point is left out, where the total cost is the hypotenuse of the xy triangle. That part I can calculate.

This revision of the answer by RomainL simplifies the logic.
It makes use of iterators to parse the file.
def read_each_course(filename):
    result = []
    with open(filename) as f:
        it = iter(f)
        while True:
            count = int(next(it))
            if count == 0:  # Found the stop marker
                break
            current = [count]
            for _ in range(count):
                current.append([int(v) for v in next(it).strip().split()])
            result.append(current)
    return result
print(read_each_course("file2.txt"))
This prints the required output:
[[1, [10, 10, 10]], [3, [25, 29, 10], [52, 55, 30], [70, 70, 20]]]
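On the design question: a plain list of lists is enough, because "use the next item" only needs each point paired with its successor, which zip does without a linked list. A minimal sketch, assuming the third number of each row is the penalty and is not part of the distance (the penalty rule itself is left out, since the question doesn't define it precisely); leg_distances is a hypothetical helper name.

```python
import math
from typing import List

def leg_distances(points: List[List[int]]) -> List[float]:
    """Length of each leg: the hypotenuse of the (x2 - x1, y2 - y1) triangle."""
    # zip(points, points[1:]) pairs every point with the one after it,
    # so a plain list is enough; no linked list is needed.
    return [math.hypot(x2 - x1, y2 - y1)
            for (x1, y1, _), (x2, y2, _) in zip(points, points[1:])]

# One group as produced by read_each_course: [count, [x, y, penalty], ...]
course = [3, [25, 29, 10], [52, 55, 30], [70, 70, 20]]
count, points = course[0], course[1:]
print(leg_distances(points))
```

A group with n points yields n - 1 legs; a group with a single point yields an empty list.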

This code should do the trick. As for your design question, it seems opinion-based to me, so I will focus on the code.
In your code, lines is a list and i is an element of that list (a string), so you are calling next on a single element, not on an iterator over the list. I have to admit that I do not fully understand the logic of your code, so I cannot help much with it directly.
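def read_each_course(filename):
    result = []
    current = []
    with open(filename) as f_in:
        for line in f_in:  # loop over the file line by line
            spt = line.strip().split()  # split the line into fields
            if len(spt) == 1:  # one element: a new count, or the end marker
                if current:  # not the first one: store the previous group
                    result.append(current)
                    current = []
                if spt[0] == '0':  # end of file (compare as a string, spt holds strings)
                    break
                current.append(int(spt[0]))
            else:
                current.append(list(map(int, spt)))
    if current:  # keep the last group if the file has no trailing 0
        result.append(current)
    return result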
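Both answers rely on the same idea: the outer loop and the inner reads share one iterator, so reading the rows advances past them for the outer loop too. A minimal sketch of that idea as a generator that accepts any iterable of lines (an open file, or io.StringIO for testing), assuming the format shown in the question; iter_courses is a hypothetical name. If the input ends without a 0 line, the generator simply stops.

```python
import io
from typing import Iterable, Iterator, List

def iter_courses(lines: Iterable[str]) -> Iterator[list]:
    """Yield [count, [a, b, c], ...] groups until the 0 end marker."""
    it = iter(lines)
    for header in it:  # each header line holds the number of rows that follow
        count = int(header)
        if count == 0:  # end-of-data marker
            return
        # next(it) pulls the rows from the same iterator the for loop uses
        rows: List[List[int]] = [[int(v) for v in next(it).split()]
                                 for _ in range(count)]
        yield [count] + rows

sample = io.StringIO("1\n10 10 10\n3\n25 29 10\n52 55 30\n70 70 20\n0\n")
print(list(iter_courses(sample)))
```

With a real file you would write: with open("file2.txt") as f: courses = list(iter_courses(f)).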

Related

How to create this kind of loop?

I am having a bit of trouble figuring the following out:
I have a file with, for example, 100 lines; let's call it file A.
I also have another file with, for example, 100 lines; let's call it file B.
Now I need the first loop to read 10 lines from file A and do its thing, then go to the second loop, which reads 10 lines from file B and does its thing, then go back to the first loop to process lines 11-20 of file A, then back to the second loop for lines 11-20 of file B, and so on.
I need both loops to remember from which line to read.
How should I approach this?
Thanks!
EDIT:
Could something like this work?
a=0
b=10
x=0
y=10
for 1000 times:
    read a-b rows:
        do its thing
    a += 10
    b += 10
    read x-y rows:
        do its thing
    x += 10
    y += 10
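The pseudocode above can be made runnable without any a/b/x/y counters, because an open file object already remembers where it stopped reading. A minimal sketch under that assumption; process_a, process_b and alternate are hypothetical names, and io.StringIO stands in for the two real files (use open("A.txt") and open("B.txt") instead).

```python
import io
from itertools import islice
from typing import List, TextIO

def process_a(block: List[str]) -> None:
    print("A:", [line.strip() for line in block])  # placeholder for "do its thing"

def process_b(block: List[str]) -> None:
    print("B:", [line.strip() for line in block])  # placeholder for "do its thing"

def alternate(file_a: TextIO, file_b: TextIO, n: int = 10) -> int:
    """Alternate n-line blocks between two open files; return the number of rounds."""
    rounds = 0
    while True:
        # Each file object keeps its own position between calls to islice.
        block_a = list(islice(file_a, n))
        block_b = list(islice(file_b, n))
        if not block_a and not block_b:  # both files exhausted
            return rounds
        if block_a:
            process_a(block_a)
        if block_b:
            process_b(block_b)
        rounds += 1

file_a = io.StringIO("".join(f"a{i}\n" for i in range(25)))
file_b = io.StringIO("".join(f"b{i}\n" for i in range(25)))
print(alternate(file_a, file_b, 10))
```

The last block from each file may be shorter than n; the loop ends only when both files are used up.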
You can iterate over 10 lines at a time using this approach.
class File:
    def __init__(self, filename):
        self.f = open(filename, 'r')  # the open file object remembers its position
    def line(self):
        yield self.f.readline()
    def next(self, limit):
        # yield the next `limit` lines ('' once the end of the file is reached)
        for each in range(limit):
            yield self.f.readline()
    def lines(self, limit=10):
        return [x for x in self.next(limit=limit)]
file1 = File('C:\\Temp\\test.csv')
file2 = File('C:\\Temp\\test2.csv')
print(file1.lines(10))
print(file2.lines(10))
print(file1.lines(10))
print(file2.lines(10))
Now you can jump back and forth between files iterating over the next 10 lines.
Here is another solution using a generator and a context manager:
class SwitchFileReader():
    def __init__(self, file_paths, lines=10):
        self.file_paths = file_paths
        self.file_objects = []
        self.lines = 1 if lines < 1 else lines
    def __enter__(self):
        for file in self.file_paths:
            file_object = open(file, "r")
            self.file_objects.append(file_object)
        return self
    def __exit__(self, type, value, traceback):
        for file in self.file_objects:
            file.close()
    def __iter__(self):
        while True:
            next_lines = [
                [file.readline() for _ in range(self.lines)]
                for file in self.file_objects
            ]
            # readline() returns '' at end of file, so stop if any block is incomplete
            if any(not all(lines) for lines in next_lines):
                break
            for lines in next_lines:
                yield lines
file_a = r"D:\projects\playground\python\stackgis\data\TestA.txt"
file_b = r"D:\projects\playground\python\stackgis\data\TestB.txt"
with SwitchFileReader([file_a, file_b], 10) as file_changer:
    for next_lines in file_changer:
        print(next_lines, end="")  # do your thing
The iteration stops as soon as any of the files has fewer than the requested number of lines left.
For example, if file_a has 12 lines and file_b has 13 lines, lines 11 and 12 of file_a and lines 11 to 13 of file_b are ignored.
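If you would rather keep those leftover lines than drop them, one option is to stop only once every file is exhausted. A sketch of that variant, using itertools.islice instead of repeated readline() calls; round_robin_blocks is a hypothetical name, and io.StringIO stands in for the real files.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO

def round_robin_blocks(files: List[TextIO], n: int = 10) -> Iterator[List[str]]:
    """Yield n-line blocks from each file in turn, including short final blocks."""
    while True:
        blocks = [list(islice(f, n)) for f in files]
        if not any(blocks):  # every file is exhausted
            return
        for block in blocks:
            if block:  # skip files that have already run out
                yield block

a = io.StringIO("".join(f"a{i}\n" for i in range(12)))
b = io.StringIO("".join(f"b{i}\n" for i in range(13)))
print([len(block) for block in round_robin_blocks([a, b], 10)])
```

With the 12-line and 13-line example, the last two blocks hold the 2 and 3 leftover lines instead of being discarded.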
For simplicity I'm going to work with lists; you can read each file into a list first.
Let's split the problem. We need to:
1. group each list into chunks of N items (10 in your case);
2. loop over the chunks of both lists at the same time.
Grouping
Here is an answer: https://stackoverflow.com/a/4998460/2681662
def group_by_each(lst, N):
    return [lst[n:n+N] for n in range(0, len(lst), N)]
Looping over two lists at the same time:
You can use zip for this.
lst1 = list(range(100))  # <- Your data
lst2 = list(range(100, 200))  # <-- Your second data
def group_by_each(lst, N):
    return [lst[n:n+N] for n in range(0, len(lst), N)]
for ten1, ten2 in zip(group_by_each(lst1, 10), group_by_each(lst2, 10)):
    print(ten1)
    print(ten2)
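group_by_each needs the whole list in memory. If you'd rather not read both files into lists first, the same grouping can be done lazily with itertools.islice, which also works directly on open file objects. A sketch; chunks is a hypothetical helper name (Python 3.12 and later also provide itertools.batched, which yields tuples instead of lists).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def chunks(iterable: Iterable[T], n: int) -> Iterator[List[T]]:
    """Lazily yield lists of up to n items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))  # take the next n items (fewer at the end)
        if not chunk:
            return
        yield chunk

lst1 = range(100)
lst2 = range(100, 200)
for ten1, ten2 in zip(chunks(lst1, 10), chunks(lst2, 10)):
    print(ten1[0], ten2[0])  # first item of each pair of chunks
```

As with the list version, zip stops at the shorter of the two sequences of chunks.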
When you iterate over a file object, it yields lines in the associated file.
You just need a single loop that grabs the next ten lines from both files each iteration. In this example, the loop will end as soon as either file is exhausted:
from itertools import islice
lines_per_iter = 10
file_a = open("file_a.txt", "r")
file_b = open("file_b.txt", "r")
# the := (walrus) operator requires Python 3.8 or later
while (a := list(islice(file_a, lines_per_iter))) and (b := list(islice(file_b, lines_per_iter))):
    print(f"Next {lines_per_iter} lines from A: {a}")
    print(f"Next {lines_per_iter} lines from B: {b}")
file_a.close()
file_b.close()
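The := operator used in that loop needs Python 3.8 or later. On older versions the same loop can be written with an explicit break. A sketch, with io.StringIO standing in for the two files so it runs on its own; the line counts and the rounds counter are only for illustration.

```python
import io
from itertools import islice

lines_per_iter = 10
# Stand-ins for open("file_a.txt") / open("file_b.txt")
file_a = io.StringIO("".join(f"a{i}\n" for i in range(30)))
file_b = io.StringIO("".join(f"b{i}\n" for i in range(25)))

rounds = 0
while True:
    a = list(islice(file_a, lines_per_iter))
    b = list(islice(file_b, lines_per_iter))
    if not a or not b:  # stop as soon as either file is exhausted
        break
    print(f"Next {lines_per_iter} lines from A: {a}")
    print(f"Next {lines_per_iter} lines from B: {b}")
    rounds += 1
```

Unlike the walrus version, this reads from B even when A is already empty, which makes no difference to what gets printed.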
Ok, thank you for all the answers, I found a working solution to my project like this:
a = 0
b = 10
x = 0
y = 10
while True:
    for list1 in range(a, b):
        pass  # read line list1 from file A
    a += 10
    b += 10
    for list2 in range(x, y):
        pass  # read line list2 from file B
    if y == 100:
        break
    x += 10
    y += 10
I know it's been a long time since this question was asked, but I'd still like to answer it my own way for future reference. I'm not sure this is the best way to do it, but it can read multiple files in alternation, which is pretty cool.
from itertools import islice, chain
from pprint import pprint
def simread(files, nlines_segments, nlines_contents):
    lines = [[] for i in range(len(files))]
    total_lines = sum(nlines_contents)
    current_index = 0
    while len(tuple(chain(*lines))) < total_lines:
        if len(lines[current_index]) < nlines_contents[current_index]:
            lines[current_index].extend(islice(
                files[current_index],
                nlines_segments[current_index],
            ))
        current_index += 1
        if current_index == len(files):
            current_index = 0
    return lines
with open('A.txt') as A, open('B.txt') as B:
    lines = simread(
        [A, B],  # files
        [10, 10],  # lines to read at a time from each file
        [100, 100],  # number of lines in each file
    )  # returns two lists containing the lines in files A and B
pprint(lines)
You can even add another file C (with any number of lines, even a thousand) like so:
with open('A.txt') as A, open('B.txt') as B, open('C.txt') as C:
    lines = simread(
        [A, B, C],  # files
        [10, 10, 100],  # lines to read at a time from each file
        [100, 100, 1000],  # number of lines in each file
    )  # returns three lists containing the lines in files A, B and C
pprint(lines)
The values in nlines_segments can also be changed, like so:
with open('A.txt') as A, open('B.txt') as B, open('C.txt') as C:
    lines = simread(
        [A, B, C],  # files
        [5, 20, 125],  # lines to read at a time from each file
        [100, 100, 1000],  # number of lines in each file
    )  # returns three lists containing the lines in files A, B and C
pprint(lines)
This would read file A five lines at a time, file B twenty lines at a time, and file C 125 lines at a time.
NOTE: The values provided in nlines_segments all have to be factors of their corresponding values in nlines_contents, which should all be the exact number of lines in the files they correspond to.
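simread needs the exact line count of every file, and its while loop never ends if a count is higher than the real number of lines. A sketch of a variant that instead stops reading a file once islice returns nothing; simread_until_eof is a hypothetical name and is not part of the original answer, and io.StringIO stands in for the real files.

```python
import io
from itertools import islice
from typing import List, TextIO

def simread_until_eof(files: List[TextIO], nlines_segments: List[int]) -> List[List[str]]:
    """Like simread, but stops when every file is exhausted instead of needing line counts."""
    lines: List[List[str]] = [[] for _ in files]
    active = set(range(len(files)))  # indices of files that may still have lines
    while active:
        for i in sorted(active):  # sorted() copies, so discarding below is safe
            segment = list(islice(files[i], nlines_segments[i]))
            if segment:
                lines[i].extend(segment)
            else:
                active.discard(i)  # end of file reached: stop reading this one
    return lines

a = io.StringIO("".join(f"a{i}\n" for i in range(25)))
b = io.StringIO("".join(f"b{i}\n" for i in range(7)))
print([len(x) for x in simread_until_eof([a, b], [10, 3])])
```

Because islice simply returns fewer items at the end of a file, the segment sizes here do not need to divide the file lengths.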
I hope this helps!
There are already plenty of answers, but I felt like answering this in a simple way.
with open('fileA.txt', 'r') as a:
    a_lines = a.readlines()
    a_prog = 0
with open('fileB.txt', 'r') as b:
    b_lines = b.readlines()
    b_prog = 0
for i in range(10):
    temp = []
    for line in range(a_prog, a_prog + 10):
        temp.append(a_lines[line].strip())
    a_prog += 10
    # temp is the full 10-line block.
    # Do something...
    temp = []
    for line in range(b_prog, b_prog + 10):
        temp.append(b_lines[line].strip())
    b_prog += 10
    # temp is the full 10-line block.
    # Do something...

How can I change this function so that it returns a list of the number of even digits in a file?

from typing import List, TextIO
def evens(number_file: TextIO) -> List[int]:
    lst = []
    line = number_file.readline().strip()
    while line != '':
        evens = 0
        line = number_file.readline().strip()
        while line.isdigit():
            evens = evens + int(line)
            line = number_file.readline().strip()
        lst.append(evens)
    return lst
In this example, the file 'numbers.txt' looks like this:
START
1
2
END
START
3
END
START
4
5
6
END
Each line is either an int, 'START' or 'END'.
I want to make a function that returns the number of evens in each section, so when the code runs on this file, it should return the list [1, 0, 2]. Could someone please help me?
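import numpy as np
def main():
    line = [1, 4, 5, 6, 8, 8, 9, 10]
    evens = np.array([], dtype=int)  # np.array() needs an argument; start empty
    for l in line:
        if (l % 2) == 0:
            # np.append returns a new array; it does not modify evens in place
            evens = np.append(evens, l)
    return evens
if __name__ == '__main__':
    print(main())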
Without your txt file and the code to read it in, this is the best I can do.
When writing new code, it's a good idea to start small and then add capabilities bit by bit. e.g., write code that works without worrying about sections first, then update to work with sections.
But anyway, here's some revised code that should work:
from typing import List, TextIO
def evens(number_file: TextIO) -> List[int]:
    lst = []
    line = number_file.readline().strip()
    while line != '':
        evens = 0
        # ignore all text lines (readline() returns '' at end of file)
        while line != '' and not line.isdigit():
            line = number_file.readline().strip()
        if line == '':  # no more sections
            break
        # process number lines
        while line.isdigit():
            if int(line) % 2 == 0:
                evens = evens + 1
            line = number_file.readline().strip()
        lst.append(evens)
    return lst
And here's a version that may be simpler (for loops in Python can often make your life easier, e.g. when processing every row of a file):
from typing import List, TextIO
def evens(number_file: TextIO) -> List[int]:
    lst = []
    for row in number_file:
        line = row.strip()
        if line == 'START':
            # new section, start new count
            lst.append(0)
        elif line.isdigit() and int(line) % 2 == 0:
            # update current count (last item in lst)
            lst[-1] += 1
    return lst
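To check the for-loop version against the sample file without creating numbers.txt, you can pass it an io.StringIO, which iterates line by line like an open text file. The function is repeated so the snippet runs on its own; the sample string is the numbers.txt content from the question.

```python
import io
from typing import List, TextIO

def evens(number_file: TextIO) -> List[int]:
    lst = []
    for row in number_file:
        line = row.strip()
        if line == 'START':
            lst.append(0)  # new section: start a new count
        elif line.isdigit() and int(line) % 2 == 0:
            lst[-1] += 1  # even number: update the current section's count
    return lst

sample = io.StringIO("START\n1\n2\nEND\nSTART\n3\nEND\nSTART\n4\n5\n6\nEND\n")
print(evens(sample))  # [1, 0, 2]
```

Note that a number appearing before the first START would raise an IndexError, since lst is still empty at that point.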

Selecting line from file by using "startswith" and "next" commands

I have a file from which I want to create a list ("timestep") from the numbers which appear after each line "ITEM: TIMESTEP" so:
timestep = [253400, 253500, .. etc]
Here is a sample of the file I have:
ITEM: TIMESTEP
253400
ITEM: NUMBER OF ATOMS
378
ITEM: BOX BOUNDS pp pp pp
-2.6943709180241954e-01 5.6240920636804063e+01
-2.8194230631882372e-01 5.8851195163321044e+01
-2.7398090193568775e-01 5.7189372326936599e+01
ITEM: ATOMS id type q x y z
16865 3 0 28.8028 1.81293 26.876
16866 2 0 27.6753 2.22199 27.8362
16867 2 0 26.8715 1.04115 28.4178
16868 2 0 25.7503 1.42602 29.4002
16869 2 0 24.8716 0.25569 29.8897
16870 3 0 23.7129 0.593415 30.8357
16871 3 0 11.9253 -0.270359 31.7252
ITEM: TIMESTEP
253500
ITEM: NUMBER OF ATOMS
378
ITEM: BOX BOUNDS pp pp pp
-2.6943709180241954e-01 5.6240920636804063e+01
-2.8194230631882372e-01 5.8851195163321044e+01
-2.7398090193568775e-01 5.7189372326936599e+01
ITEM: ATOMS id type q x y z
16865 3 0 28.8028 1.81293 26.876
16866 2 0 27.6753 2.22199 27.8362
16867 2 0 26.8715 1.04115 28.4178
16868 2 0 25.7503 1.42602 29.4002
16869 2 0 24.8716 0.25569 29.8897
16870 3 0 23.7129 0.593415 30.8357
16871 3 0 11.9253 -0.270359 31.7252
To do this I tried to use "startswith" and "next" together, but it didn't work. Is there another way to do it? Here is the code I'm trying to use:
timestep = []
with open(file, 'r') as f:
    lines = f.readlines()
    for line in lines:
        line = line.split()
        if line[0].startswith("ITEM: TIMESTEP"):
            timestep.append(next(line))
print(timestep)
The logic is to decide whether or not to append the current line to timestep. So what you need is a variable that tells you to append the current line whenever it is True.
timestep = []
append_to_list = False  # decision variable
with open(file, 'r') as f:
    lines = f.readlines()
    for line in lines:
        line = line.strip()  # remove "\n" from line
        if line.startswith("ITEM"):
            # Update append_to_list
            if line == 'ITEM: TIMESTEP':
                append_to_list = True
            else:
                append_to_list = False
        else:
            # append to list if line doesn't start with "ITEM" and append_to_list is True
            if append_to_list:
                timestep.append(line)
print(timestep)
output:
['253400', '253500']
First: I don't like this approach, because it doesn't scale. You can only get the line immediately following nicely; anything beyond that gets ugly.
But since you asked: for x in lines creates an iterator over lines and uses it to keep track of the position. You don't have access to that iterator, so you cannot call next on it. But you can make your own iterator and use that:
lines_iter = iter(lines)
for line in lines_iter:
    # whatever was here
    timestep.append(next(lines_iter))
However, if you ever want to scale it, a for loop is not a good way to iterate over a file like this, because you want to know what is in the next or previous line. I would suggest a while loop with an index:
timestep = []
with open('example.txt', 'r') as f:
    lines = f.readlines()
i = 0
while i < len(lines):
    if lines[i].startswith("ITEM: TIMESTEP"):
        i += 1
        # collect every line until the next "ITEM: " header (or the end of the file)
        while i < len(lines) and not lines[i].startswith("ITEM: "):
            timestep.append(lines[i].strip())
            i += 1
    else:
        i += 1
This way you can extend it to other ITEM types with a variable number of lines.
So the problem with your code is subtle. You have a list lines which you iterate over, but you can't call next on a list.
Instead, turn it into an explicit iterator and you should be fine
timestep = []
with open(file, 'r') as f:
    lines = f.readlines()
lines_iter = iter(lines)
for line in lines_iter:
    line = line.strip()  # removes the newline
    if line.startswith("ITEM: TIMESTEP"):
        timestep.append(next(lines_iter, None))  # the second argument here prevents errors
                                                 # when ITEM: TIMESTEP appears as the
                                                 # last line in the file
print(timestep)
I'm also not sure why you included line.split, which seems to be incorrect (in any case line.split()[0].startswith('ITEM: TIMESTEP') can never be true, since the split will separate ITEM: and TIMESTEP into separate elements of the resulting list.)
For a more robust answer, consider grouping your data based on when the line begins with ITEM.
def process_file(f):
    ITEM_MARKER = 'ITEM: '
    item_title = '(none)'
    values = []
    for line in f:
        if line.startswith(ITEM_MARKER):
            if values:
                yield (item_title, values)
            item_title = line[len(ITEM_MARKER):].strip()  # strip off the marker
            values = []
        else:
            values.append(line.strip())
    if values:
        yield (item_title, values)
This will let you pass in the whole file and will lazily produce a set of values for each ITEM: <whatever> group. Then you can aggregate in some reasonable way.
with open(file, 'r') as f:
    groups = process_file(f)
    aggregations = {}
    for name, values in groups:
        aggregations.setdefault(name, []).extend(values)
print(aggregations['TIMESTEP'])  # this is what you want
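A quick way to try process_file is on a shortened copy of the sample wrapped in io.StringIO (only the TIMESTEP and NUMBER OF ATOMS items are kept here, as an assumption for brevity). The collected values are strings, so converting them with int() gives the integer list from the question. The generator is repeated so the snippet runs on its own.

```python
import io

def process_file(f):
    ITEM_MARKER = 'ITEM: '
    item_title = '(none)'
    values = []
    for line in f:
        if line.startswith(ITEM_MARKER):
            if values:
                yield (item_title, values)
            item_title = line[len(ITEM_MARKER):].strip()  # strip off the marker
            values = []
        else:
            values.append(line.strip())
    if values:
        yield (item_title, values)

sample = io.StringIO(
    "ITEM: TIMESTEP\n253400\nITEM: NUMBER OF ATOMS\n378\n"
    "ITEM: TIMESTEP\n253500\nITEM: NUMBER OF ATOMS\n378\n"
)
aggregations = {}
for name, values in process_file(sample):
    aggregations.setdefault(name, []).extend(values)
timestep = [int(v) for v in aggregations['TIMESTEP']]
print(timestep)  # [253400, 253500]
```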
You can use enumerate to help with index referencing. We can check whether the string ITEM: TIMESTEP is in the previous line and, if so, add the integer to our timestep list.
timestep = []
with open('example.txt', 'r') as f:
    lines = f.readlines()
for i, line in enumerate(lines):
    # i > 0 avoids lines[-1], which would wrap around to the last line of the file
    if i > 0 and "ITEM: TIMESTEP" in lines[i-1]:
        timestep.append(int(line.strip()))
print(timestep)

Python Nested loop Executes only once for file iteration

# if(len(results) != 0)
fr = open("new_file.txt", "r")
fr1 = open("results.txt", "w")
for j in range(len(line_list)):
    for i, line in enumerate(fr):
        if i == line_list[j]:  # find the line in the file
            fr1.write(FAILURE_STRING + line)  # mark the failure
        else:
            fr1.write(line)
fr.close()
fr1.close()
In the above example my j loop executes only once. I am trying to mark the failures in the results file. Even though my line_list has about 7 elements (the line numbers I am supposed to mark as failures in case of a mismatch), it marks a failure for only one element. If I move the j loop inside, it marks all the failures, but then there are duplicates in the results file (each line appears as many times as there are elements in line_list).
If I understood correctly, you have a list of lines that do not match the ones in a file (new_file.txt), and you want to add an error string to those lines. For that, you can read the file with fr.readlines() and loop over it once, which gives something like this:
line_list = [0, 2, 2, 4]  # Example list of line numbers to mark
FAILURE_STRING = "NO"
fr = open("new_file.txt", "r")
fr1 = open("results.txt", "w")
for i, line in enumerate(fr.readlines()):
    if i in line_list:  # line_list[i] would fail once i exceeds len(line_list)
        fr1.write(FAILURE_STRING + line)
    else:
        fr1.write(line)
fr.close()
fr1.close()
open returns a file object, which is an iterator (like a generator), and you can only iterate over it once unless you rewind it with fr.seek(0).
You have two options:
Reverse the for loops so you only iterate over the file once.
for i, line in enumerate(fr):
    # check every j, but write each line only once
    if any(i == line_list[j] for j in range(len(line_list))):  # find the line in the file
        fr1.write(FAILURE_STRING + line)  # mark the failure
    else:
        fr1.write(line)
Convert your file into a list, which (unlike the file object) can be iterated over many times:
fr = [i for i in fr]
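The one-pass behaviour is easy to see with an in-memory file: io.StringIO supports the same line iteration and seek() as a real text file, and here it stands in for open("new_file.txt"). The second loop gets nothing until the position is reset, which is also why the j loop in the question only marks one failure: after the first j, the inner loop has no lines left to read.

```python
import io

fr = io.StringIO("line 0\nline 1\nline 2\n")  # stands in for open("new_file.txt")

first_pass = [line for line in fr]
second_pass = [line for line in fr]  # the iterator is already exhausted
fr.seek(0)                           # rewind to the start
third_pass = [line for line in fr]

print(len(first_pass), len(second_pass), len(third_pass))  # 3 0 3
```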
Thank you for all your answers. @NightShadeQueen, the second point in your answer helped me get there.
The following is the solution which worked:
if len(results) != 0:
    fr1 = open("results.txt", "w")
    fr = open("new_file.txt", "r")
    fr = [i for i in fr]
    for i in range(len(fr)):
        if i in line_list:
            fr1.write(FAILURE_STRING + fr[i])
        else:
            fr1.write(fr[i])
    fr1.close()
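The same result can be had in a single pass without loading the file into a list, by looking each line index up in a set. A sketch under that approach; mark_failures is a hypothetical name, the FAILURE_STRING value is a placeholder (the question doesn't show the real one), and io.StringIO stands in for both files.

```python
import io
from typing import Iterable, TextIO

FAILURE_STRING = "FAILURE: "  # hypothetical marker text

def mark_failures(src: TextIO, dst: TextIO, line_list: Iterable[int]) -> None:
    """Copy src to dst, prefixing the lines whose index is in line_list."""
    failures = set(line_list)  # set membership is a constant-time check per line
    for i, line in enumerate(src):  # one pass over the file, no nested loop
        dst.write(FAILURE_STRING + line if i in failures else line)

src = io.StringIO("a\nb\nc\n")
dst = io.StringIO()
mark_failures(src, dst, [0, 2])
print(dst.getvalue())
```

With real files you would call it inside with open("new_file.txt") as src, open("results.txt", "w") as dst:, so both files are closed even if an error occurs.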

Reason for two similar codes giving different result and different approaches to this task

The task is:
def sum_numbers_in_file(filename):
    """
    Return the sum of the numbers in the given file (which only contains
    integers separated by whitespace).
    >>> sum_numbers_in_file("numbers.txt")
    19138
    """
This is my first version:
rtotal = 0
myfile = open(filename, "r")
num = myfile.readline()
num_list = []
while num:
    number_line = ""
    number_line += (num[:-1])
    num_list.append(number_line.split(" "))
    num = myfile.readline()
for item in num_list:
    for item2 in item:
        if item2 != '':
            rtotal += int(item2)
return rtotal
This is my second version:
f = open(filename)
m = f.readline()
n = sum([sum([int(x) for x in line.split()]) for line in f])
f.close()
return n
However, the first one returns 19138 and the second one 18138.
numbers.txt contains the following:
1000
15000
2000
1138
Because m = f.readline() already reads one line from f, and then you do the operation only on the remaining lines. If you delete that statement, the two outputs will be the same (I think :)).
I'd say that m = f.readline() in the second snippet skips the first line (which contains 1000), that's why you get a wrong result.
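For completeness, a sketch of the second version with the readline() call removed and the file closed by a with block. The temporary-file demo is only there so the snippet runs on its own; it writes the numbers.txt content from the question.

```python
import os
import tempfile

def sum_numbers_in_file(filename):
    """Return the sum of the whitespace-separated integers in the file."""
    with open(filename) as f:  # no readline() beforehand, so no line is skipped
        # split() with no argument splits on any whitespace, including newlines
        return sum(int(x) for x in f.read().split())

# Demo: write the sample numbers.txt content to a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1000\n15000\n2000\n1138\n")
print(sum_numbers_in_file(tmp.name))  # 19138
os.remove(tmp.name)
```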
As requested, here is another approach to the task:
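import re
def sum_numbers_in_file(filename):
    # naming this function "sum" would shadow the built-in sum() it calls
    with open(filename) as f:
        # -?\d+ also picks up negative integers
        return sum(int(x.group()) for x in re.finditer(r'-?\d+', f.read()))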
As the other answers say, you are skipping the first line because of f.readline(). A shorter approach would be:
n = sum(int(line) for line in open("numbers.txt") if line[:1].isnumeric())  # int() ignores the trailing newline; assumes one number per line
