Loop over the list many times - Python

Let's say I have a file source.txt containing a few rows.
I want to print rows over and over until I break the program manually.
file_source = 'source.txt'
source = open(file_source, 'r')
while 1:
    for line in source:
        print line
source.close()
The easiest solution is to put the open and close inside the while loop, but my feeling is that's not the best solution.
Can you suggest something better?
How can I loop over the variable source many times?
Regards

I wasn't sure this would work, but it appears you can just seek to the beginning of the file and then continue iterating:
file_source = 'source.txt'
source = open(file_source, 'r')
while 1:
    for line in source:
        print line
    source.seek(0)
source.close()
And obviously if the file is small you could simply read the whole thing into a list in memory and iterate over that instead.

You can read the lines first and save them into a list, so the file is closed after reading. Then you can proceed with your infinite loop:
lines = []
with open(file_source, 'rb') as f:
    lines = f.readlines()

while 1:
    for line in lines:
        print line
But this is not advised if your file is very large, since everything in the file will be read into memory:
file.readlines([sizehint]):
Read until EOF using readline() and return a list containing the lines thus read.

Python refuses to iterate through lines in a file more than once [duplicate]

I am writing a program that requires me to iterate through each line of a file multiple times:
loops = 0
file = open("somefile.txt")
while loops < 5:
    for line in file:
        print(line)
    loops = loops + 1
For the sake of brevity, I am assuming that I always need to loop through a file and print each line 5 times. That code has the same issue as the longer version I have implemented in my program: the file is only iterated through one time. After that, the print(line) line does nothing. Why is this?
It's because the file = open("somefile.txt") line occurs only once, before the loop. This creates one cursor pointing to one location in the file, so when you reach the end of the first loop, the cursor is at the end of the file. Move it into the loop:
loops = 0
while loops < 5:
    file = open("somefile.txt")
    for line in file:
        print(line)
    loops = loops + 1
    file.close()
for loop in range(5):
    with open('somefile.txt') as fin:
        for line in fin:
            print(line)
This will re-open the file five times. You could seek() to beginning instead, if you like.
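For example, a minimal sketch of the seek() variant (same somefile.txt, rewinding instead of re-opening) might look like this:
with open('somefile.txt') as fin:
    for loop in range(5):
        for line in fin:
            print(line)
        fin.seek(0)  # rewind so the next pass sees the lines again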
for line in file reads each line once. If you want to start over from the first line, you could for example close and reopen the file.
Python file objects are iterators. Like other iterators, they can only be iterated on once before becoming exhausted. Trying to iterate again results in the iterator raising StopIteration (the signal it has nothing left to yield) immediately.
That said, file objects do let you cheat a bit. Unlike most other iterators, you can rewind them using their seek method. Then you can iterate their contents again.
Another option would be to reopen the file each time you need to iterate on it. This is simple enough, but (ignoring the OS's disk cache) it might be a bit wasteful to read the file repeatedly.
A final option would be to read the whole contents of the file into a list at the start of the program and then do the iteration over the list instead of over the file directly. This is probably the most efficient option as long as the file is small enough that fitting its whole contents in memory at one time is not a problem.
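A minimal sketch of the exhaustion and the seek() rewind, using the somefile.txt from the question:
f = open("somefile.txt")
first_pass = list(f)    # consumes the iterator; f is now exhausted
second_pass = list(f)   # empty list: nothing left to yield
f.seek(0)               # rewind the underlying file object
third_pass = list(f)    # the lines are available again
f.close()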
When you iterate once, the file pointer ends up at the end of the file, so try using file.seek(0) instead of opening the file again and again in the loop:
with open('a.txt', 'r+') as f:
    for i in range(0, 5):
        for line in f:
            print(line)
        f.seek(0)
File objects behave like generators when you iterate through them: each line is yielded only once. If you want to iterate over the file multiple times line by line, you might want to convert the file to something like a list first.
lines = open("somefile.txt").read().splitlines()
for line in lines:
    print(line)

parse a file, appending each line at the end and removing the line from the top

I am trying to move each line to the bottom of the file; this is what the file looks like:
daodaos 12391039
idiejda 94093420
jfijdsf 10903213
....
#completed
So at the end of the parsing, I am planning to have all the entries that are at the top end up under the actual string that says #completed.
The problem is that I am not sure how I can do this in one pass. I know that I can read every single line of the file, close it, and then re-open it in write mode, searching for that line, removing it from the file and adding it to the end; but it feels incredibly inefficient.
Is there a way in one pass, to process the current line; then in the same for loop, delete the line and append it at the end of the file?
file = open('myfile.txt', 'a')
for items in file:
    # process items line
    # append items line to the end of the file
    # remove items line from the file
I suggest keeping it simple: read and write back.
with open('myfile.txt') as f:
    lines = f.readlines()

with open('myfile.txt', 'w') as f:
    newlines = []
    for idx, line in enumerate(lines):
        # do your stuff, check if completed, rearrange the list
        if line.startswith('#completed'):
            newlines = lines[idx:] + lines[:idx]
            break
    f.write(''.join(newlines))  # write back the rearranged lines
Below is another version I could think of, if you insist on modifying the file while reading it:
with open('myfile.txt', 'r+') as f:
    newlines = ''
    line = True
    while line:
        line = f.readline()
        if line.startswith('#completed'):
            # line += f.read()  # uncomment this line if you are interested in the lines after #completed
            f.truncate()
            f.seek(0)
            f.write(line + newlines)
            break
        else:
            newlines += line
Not really.
Your main problem here is that you're iterating on the file at the same time you want to change it. This will Do Bad Things (tm) to your processing, unless you plan to micro-manage the file position pointer.
You do have that power: the seek method lets you move to a given file location, expressed in bytes. seek(0) moves to the start of the file; seek(0, 2) moves to the end. The problem you face is that your for loop trusts that this pointer indicates the next line to read.
One distinct problem is that you can't just remove a line from the middle of the file; something exists in those bytes. Think of it as lines of text on a page, written in pencil. You can erase line 4, but this does not cause lines 5-end to magically float up half a centimeter; they're still in the same physical location.
How to Do It ... sort of
Read all of the lines into a list. You can easily change a list the way you want. When you hit the end, then write the list back to the file -- or use your magic seek and append powers to alter only a little of it.
I recommend you do this the simple way: read the whole file and store it in a variable, move the lines above #completed to another variable, and then rewrite your file.
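A minimal sketch of that simple approach, assuming the marker line starts with '#completed' as in the question:
with open('myfile.txt') as f:
    lines = f.readlines()

# find the marker and rotate everything above it to the end
idx = next((i for i, line in enumerate(lines) if line.startswith('#completed')), 0)

with open('myfile.txt', 'w') as f:
    f.writelines(lines[idx:] + lines[:idx])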

Python huge file reading [duplicate]

I need to read a big data file (~200GB), line by line, using a Python script.
I have tried the regular line-by-line methods; however, those methods use a large amount of memory. I want to be able to read the file chunk by chunk.
Is there a better way to load a large file line by line, say
a) by explicitly mentioning the maximum number of lines the file could load at any one time in memory? Or
b) by loading it by chunks of size, say, 1024 bytes, provided the last line of the said chunk loads completely without being truncated?
Instead of reading it all at once, try reading it line by line:
with open("myFile.txt") as f:
for line in f:
#Do stuff with your line
Or, if you want to read N lines in at a time:
with open("myFile.txt") as myfile:
head = [next(myfile) for x in xrange(N)]
print head
To handle the StopIteration error that comes from hitting the end of the file, it's a simple try/catch (although there are plenty of ways).
try:
    head = [next(myfile) for x in xrange(N)]
except StopIteration:
    rest_of_lines = [line for line in myfile]
Or you can read those last lines in however you want.
To iterate over the lines of a file, do not use readlines. Instead, iterate over the file itself (you will find versions using xreadlines; it is deprecated and simply returns the file object itself):
with open(the_path, 'r') as the_file:
    for line in the_file:
        # Do stuff with the line
To read multiple lines at a time, you can use next on the file (it is an iterator), but you need to catch StopIteration, which indicates that there is no data left:
with open(the_path, 'r') as the_file:
    while True:
        the_lines = []
        done = False
        for i in range(number_of_lines):  # Use xrange on Python 2
            try:
                the_lines.append(next(the_file))
            except StopIteration:
                done = True  # Reached end of file
                break
        # Do stuff with the lines
        if done:
            break  # No data left
Of course, you can also load the file in chunks of a specified byte count:
with open(the_path, 'r') as the_file:
    while True:
        data = the_file.read(the_byte_count)
        if len(data) == 0:
            # All data is gone
            break
        # Do stuff with the data chunk
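For part (b) of the original question, a rough sketch (the helper name read_line_chunks is made up here) that reads fixed-size chunks but only hands back complete lines:
def read_line_chunks(path, chunk_bytes=1024):
    """Yield blocks of complete lines, each roughly chunk_bytes long."""
    with open(path, 'r') as f:
        leftover = ''
        while True:
            data = f.read(chunk_bytes)
            if not data:
                if leftover:
                    yield leftover       # final line without a trailing newline
                break
            data = leftover + data
            lines = data.split('\n')
            leftover = lines.pop()       # possibly truncated last line, kept for the next chunk
            if lines:
                yield '\n'.join(lines) + '\n'

for block in read_line_chunks('myFile.txt', 1024):
    for line in block.splitlines():
        pass  # each line here is complete, never truncated mid-line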

While Loop Not Performing Main Function

I'm trying to write a Python script that uses a particular external application belonging to the company I work for. I can generally figure things out for myself when it comes to programming and scripting, but this time I am truly lost!
I can't seem to figure out why the while loop won't function as it is meant to. It doesn't give any errors, which doesn't help me. It just seems to skip past the important part of the code in the centre of the loop, and then goes on to increment the "count" like it should afterwards!
f = open('C:/tmp/tmp1.txt', 'w')  # Create a temporary text file
f.write("TEXTFILE\nTEXTFILE\nTEXTFILE\nTEXTFILE\nTEXTFILE\nTEXTFILE\n")  # Put some simple text in there
f.close()  # Close the file

count = 0  # Insert the line number from the text file you want to begin with (first line starts with 0)
num_lines = sum(1 for line1 in open('C:/tmp/tmp1.txt'))  # Get the number of lines from the text file

f = open('C:/tmp/tmp2.txt', 'w')  # Create a new text file
f.close()  # Close it

while (count < num_lines):  # Keep the loop within the starting line and total number of lines from the first text file
    with open('C:/tmp/tmp1.txt', 'r') as f:  # Open the first text file
        line2 = f.readlines()  # Read these lines for later input
        for line2[count] in f:  # For each line from the chosen starting line until the last line of the first text file,...
            with open('C:/tmp/tmp2.txt', 'a') as g:  # ...with the second text file open for appending strings,...
                g.write("hello\n")  # ...write 'hello\n' each time while "count" < "num_lines"
    count = count + 1  # Increment the "count"
I think everything works up until: "for line2[count] in f:"
The real code I'm working on is somewhat more complicated, and the application I'm using isn't exactly for sharing, so I have simplified the code to give silly outputs instead just to fix the problem.
I'm not looking for alternative code, I'm just looking for a reason why the loop isn't working so I can try to fix it myself.
All answers will be appreciated; thanks to everyone in advance!
Cormac
Some comments:
num_lines = sum(1 for line1 in open('C:/tmp/tmp1.txt'))
Why? What's wrong with len(open(filename, 'rb').readlines())?
while (count < num_lines):
    ...
    count = count + 1
This is bad style, you could use:
for i in range(num_lines):
    ...
Note that I named your index i, which is universally recognized, and that I used range and a for loop.
Now, your problem, like I said in the comment, is that f is a file (that is, a stream of bytes with a location pointer) and you've already read all the lines from it. So when you do for line2[count] in f:, it will try reading a line into line2[count] (this is a bit weird, actually; you almost never use a list member as a for-loop target, but apparently you can), see that there's no line to read, and never execute what's inside the loop.
Anyway, you want to read a file, line by line, starting from a given line number? Here's a better way to do that:
from itertools import islice

start_line = 0  # change this
filename = "foobar"  # also this

with open(filename, 'rb') as f:
    for line in islice(f, start_line, None):
        print(line)
I realize you don't want alternative code, but your code really is needlessly complicated.
If you want to iterate over the lines in the file f, I suggest replacing your "for" line with
for line in line2:
    # do something with "line"...
You put the lines in an array called line2, so use that array! Using line2[count] as a loop variable doesn't make sense to me.
You seem to misunderstand how the for line in f loop works. It iterates over the file and calls readline until there are no lines left to read. But at the moment you start the loop, all the lines have already been read (via f.readlines()) and the file's current position is at the end. You can achieve what you want by calling f.seek(0), but that doesn't seem to be a good decision anyway, since you're going to read the file again and that's slow IO.
Instead you want to do something like:
for line in line2[count:]:  # iterate over the lines already read, starting with the `count` line
    do_smth_with(line)

Read file Again Python

I would like to read a file again and again when it reaches the end.
The file contains only numbers separated by commas.
I use Python, and I read in the docs that file.seek(0) can be used for this, but it doesn't work for me.
This is my script:
self.users = []
self.index = -1
infile = open(filename, "r")
for line in infile.readlines():
    if line != None:
        self.users.append(String.split((line), ','))
    else:
        infile.seek(0)
        infile.read()
infile.close()
self.index = self._index + 1
return self.users[self.index]
Thank you for your help
infile.read() will read in the whole of the file and then throw away the result. Why are you doing it?
When you call infile.readlines you have already read in the whole file. Then your loop iterates over the result, which is just a Python list. Moving to the start of the file will have no effect on that.
If your code did in fact move to the start of the file after reaching the end, it would simply loop for ever until it ran out of memory (because of the endlessly growing users list).
You could get the behaviour you're asking for by storing the result of readlines() in a variable and then putting the whole for line in all_lines: loop inside another while True:. (Or closing, re-opening and re-reading every time, if (a) you are worried that the file might be changed by another program or (b) you want to avoid reading it all in in a single gulp. For (b) you would replace for line in infile.readlines(): with for line in infile:. For (a), note that trying to read a file while something else might be writing to it is likely to be a bad idea no matter how you do it.)
I strongly suspect that the behaviour you're asking for is not what you really want. What's the goal you're trying to achieve by making your program keep reading the file over and over?
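A minimal sketch of the first suggestion (read the lines once, then loop over the in-memory list; numbers.txt is a placeholder name):
with open("numbers.txt") as infile:     # hypothetical file of comma-separated numbers
    all_lines = infile.readlines()      # read the file once; it can be closed afterwards

while True:                             # loop over the same in-memory list for ever
    for line in all_lines:
        fields = line.split(',')
        # ... do something with fields ...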
The else branch will never be reached, because the for loop will iterate over all the lines of the file and then exit.
If you want the seek operation to be executed, you will have to put it outside the for loop:
self.users = []
self.index = -1
infile = open(filename, "r")
while True:
    for line in infile.readlines():
        self.users.append(String.split((line), ','))
    infile.seek(0)
infile.close()
self.index = self._index + 1
return self.users[self.index]
The problem is that if you loop forever you will exhaust the memory. If you want to read the file only twice, then copy and paste the for loop; otherwise, decide on an exit condition and use a break statement.
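For example, a sketch that stops after two passes (the file name numbers.txt and the pass count are arbitrary choices here, and plain line.split is used instead of String.split):
users = []
infile = open("numbers.txt", "r")   # hypothetical file of comma-separated numbers
passes = 0
while True:
    for line in infile.readlines():
        users.append(line.split(','))
    passes += 1
    if passes == 2:                  # exit condition: stop after the second pass
        break
    infile.seek(0)
infile.close()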
readlines is already reading the entire file contents into an in-memory list, which you are free to iterate over again and again!
To re-read the file do:
infile = file('whatever')
while True:
    content = infile.readlines()
    # do something with list 'content'
    # re-read the file - why? I do not know
    infile.seek(0)
infile.close()
You can use itertools.cycle() here.
Here's an example :
import itertools

f = open(filename)
lines = f.readlines()
f.close()

for line in itertools.cycle(lines):
    print line,
