Reading in a file, one chunk at a time [duplicate] - python

This question already has answers here:
Read multiple block of file between start and stop flags
(4 answers)
Closed 6 years ago.
I have a VERY large file formatted like this:
(mydelimiter)
line
line
(mydelimiter)
line
line
(mydelimiter)
Since the file is so large, I can't read it all into memory at once. So I would like to read one chunk between "(mydelimiter)" markers at a time, perform some operations on it, then read in the next chunk.
This is the code I have so far:
with open(infile, 'r') as f:
    chunk = []
    for line in f:
        chunk.append(line)
Now, I'm not sure how to tell Python "keep appending lines UNTIL you hit another line with '(mydelimiter)' in it", then save the line where it stopped and start there on the next iteration of the for loop.
Note: it's also not possible to read in a certain number of lines at a time since each chunk is variable length.

Aren't you perhaps overthinking this? Something as simple as the following code can do the trick for you:
with open(infile, 'r') as f:
    chunk = []
    for line in f:
        if line.strip() == '(mydelimiter)':
            if chunk:
                call_something(chunk)
            chunk = []
        else:
            chunk.append(line)
    if chunk:
        call_something(chunk)  # don't forget the final chunk
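If you'd rather keep the reading separate from the processing, a generator works well here too. A minimal sketch reusing the question's infile and call_something names; it skips the empty chunk before the first delimiter:
def chunks(path, delimiter='(mydelimiter)'):
    chunk = []
    with open(path) as f:
        for line in f:
            if line.strip() == delimiter:
                if chunk:  # skip the empty chunk before the first delimiter
                    yield chunk
                chunk = []
            else:
                chunk.append(line)
    if chunk:
        yield chunk  # emit whatever follows the last delimiter

for chunk in chunks(infile):
    call_something(chunk)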


How to read each line in a file backwards using Python [duplicate]

This question already has answers here:
How do I reverse a string in Python?
(19 answers)
Closed 3 years ago.
I'm trying to read a file (example below), line by line, backwards using Python.
abcd 23ad gh1 n d
gjds 23iu bsddfs ND31 NG
Note: I'm not trying to read the file from the end to the beginning; I want to read each line starting from the end, i.e. d for line 1 and NG for line 2.
I know that
with open(fileName) as f:
    for line in f:
reads each line from left to right; I want to read it from right to left.
Try this:
with open(fileName, 'r') as f:
    for line in f:
        for item in line.split()[::-1]:
            print(item)
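If "backwards" means character by character rather than word by word (an assumption on my part, since the expected output in the question is whole words), slicing the stripped line does it:
with open(fileName) as f:
    for line in f:
        print(line.rstrip('\n')[::-1])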
If your file is not too big, you can read lines in reverse easily
with open(fileName) as f:
    for line in reversed(f.readlines()):
        # do something
        pass
Otherwise, I believe you'd have to use seek.

How to cleverly read big file in chunks? [duplicate]

This question already has answers here:
How should I read a file line-by-line in Python?
(3 answers)
Closed 3 years ago.
I have a very big file (~10GB) and I want to read it in its entirety. In order to achieve this, I cut it into chunks. However, I am having trouble cutting the big file into exploitable pieces: I want thousands of lines grouped together without any line being split in the middle. I found a function here on SO that I have adapted a bit:
def readPieces(file):
    while True:
        data = file.read(4096).strip()
        if not data:
            break
        yield data

with open('bigfile.txt', 'r') as f:
    for chunk in readPieces(f):
        print(chunk)
I can specify the number of bytes I want to read (here 4096, i.e. 4KB), but when I do so my lines get cut in the middle, and if I remove the size argument the whole big file is read at once, which brings the process to a halt. How can I do this?
Also, the lines in my file are not of equal size.
The following code reads the file line by line; the previous line gets garbage collected:
with open('bigfile.txt') as file:
    for line in file:
        print(line)
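If you really do want multi-line chunks rather than single lines, one common approach is to read a fixed amount at a time and carry the trailing partial line over into the next chunk. A sketch (read_in_chunks is a name I made up; adjust the size to taste):
def read_in_chunks(file_obj, chunk_size=4096):
    leftover = ''
    while True:
        data = file_obj.read(chunk_size)
        if not data:
            if leftover:
                yield leftover  # whatever remained after the last newline
            return
        data = leftover + data
        last_newline = data.rfind('\n')
        if last_newline == -1:
            leftover = data  # no complete line yet, keep accumulating
        else:
            yield data[:last_newline + 1]   # only whole lines
            leftover = data[last_newline + 1:]

with open('bigfile.txt') as f:
    for chunk in read_in_chunks(f):
        print(chunk)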

Python refuses to iterate through lines in a file more than once [duplicate]

This question already has answers here:
Iterating on a file doesn't work the second time [duplicate]
(4 answers)
Closed 4 years ago.
I am writing a program that requires me to iterate through each line of a file multiple times:
loops = 0
file = open("somefile.txt")
while loops < 5:
    for line in file:
        print(line)
    loops = loops + 1
For the sake of brevity, I am assuming that I always need to loop through the file and print each line 5 times. That code has the same issue as the longer version I have implemented in my program: the file is only iterated through once. After that, the print(line) line does nothing. Why is this?
It's because the file = open("somefile.txt") line occurs only once, before the loop. This creates one cursor pointing to one location in the file, so when you reach the end of the first loop, the cursor is at the end of the file. Move it into the loop:
loops = 0
while loops < 5:
    file = open("somefile.txt")
    for line in file:
        print(line)
    loops = loops + 1
    file.close()
for loop in range(5):
    with open('somefile.txt') as fin:
        for line in fin:
            print(line)
This will re-open the file five times. You could seek() to the beginning instead, if you like.
for line in file reads each line once. If you want to start over from the first line, you could for example close and reopen the file.
Python file objects are iterators. Like other iterators, they can only be iterated on once before becoming exhausted. Trying to iterate again results in the iterator raising StopIteration (the signal it has nothing left to yield) immediately.
That said, file objects do let you cheat a bit. Unlike most other iterators, you can rewind them using their seek method. Then you can iterate their contents again.
Another option would be to reopen the file each time you need to iterate on it. This is simple enough, but (ignoring the OS's disk cache) it might be a bit wasteful to read the file repeatedly.
A final option would be to read the whole contents of the file into a list at the start of the program and then do the iteration over the list instead of over the file directly. This is probably the most efficient option as long as the file is small enough that fitting its whole contents in memory at one time is not a problem.
Once you have iterated through the file, the file pointer is at the end of the file, so use file.seek(0) to rewind instead of opening the file again and again in the loop:
with open('a.txt', 'r+') as f:
    for i in range(5):
        for line in f:
            print(line)
        f.seek(0)
File objects are iterators and, like generators, are exhausted after one pass. If you want to iterate over the file's lines multiple times, convert them to something like a list first.
with open("somefile.txt") as f:
    lines = f.read().splitlines()
for line in lines:
    print(line)

Python huge file reading [duplicate]

This question already has answers here:
What is the idiomatic way to iterate over a binary file?
(5 answers)
Closed 8 years ago.
I need to read a big data file (~200GB) line by line using a Python script.
I have tried the regular line-by-line methods; however, those methods use a large amount of memory. I want to be able to read the file chunk by chunk.
Is there a better way to load a large file line by line, say
a) by explicitly setting the maximum number of lines the file could hold in memory at any one time? Or
b) by loading it in chunks of, say, 1024 bytes, provided the last line of each chunk loads completely without being truncated?
Instead of reading it all at once, try reading it line by line:
with open("myFile.txt") as f:
for line in f:
#Do stuff with your line
Or, if you want to read in N lines at a time:
with open("myFile.txt") as myfile:
head = [next(myfile) for x in xrange(N)]
print head
To handle the StopIteration exception that comes from hitting the end of the file, a simple try/except works (although there are plenty of ways):
try:
    head = [next(myfile) for x in range(N)]  # use xrange on Python 2
except StopIteration:
    rest_of_lines = [line for line in myfile]
Or you can read those last lines in however you want.
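For what it's worth, itertools.islice gives a tidier way to pull N lines at a time without writing the try/except yourself. A sketch, assuming N is defined as above:
from itertools import islice

with open("myFile.txt") as myfile:
    while True:
        batch = list(islice(myfile, N))  # up to N lines, empty at EOF
        if not batch:
            break
        # process the batch of up to N lines
        print(batch)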
To iterate over the lines of a file, do not use readlines. Instead, iterate over the file itself (you may find versions using xreadlines; it is deprecated and simply returns the file object itself):
with open(the_path, 'r') as the_file:
    for line in the_file:
        # Do stuff with the line
        pass
To read multiple lines at a time, you can use next on the file (it is an iterator), but you need to catch StopIteration, which indicates that there is no data left:
with open(the_path, 'r') as the_file:
    while True:
        the_lines = []
        done = False
        for i in range(number_of_lines):  # use xrange on Python 2
            try:
                the_lines.append(next(the_file))
            except StopIteration:
                done = True  # reached end of file
                break
        # Do stuff with the lines
        if done:
            break  # no data left
Of course, you can also load the file in chunks of a specified byte count:
with open(the_path, 'r') as the_file:
    while True:
        data = the_file.read(the_byte_count)
        if len(data) == 0:
            # All data is gone
            break
        # Do stuff with the data chunk
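The duplicate target's idiomatic spelling of that byte-count loop uses the two-argument form of iter() with a sentinel. A sketch for a binary file (opening in 'rb' mode is my assumption here):
from functools import partial

with open(the_path, 'rb') as the_file:
    # iter(callable, sentinel) calls the_file.read(the_byte_count)
    # repeatedly until it returns b'' (end of file)
    for data in iter(partial(the_file.read, the_byte_count), b''):
        # Do stuff with the data chunk
        print(len(data))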

Search through file until no more data [duplicate]

This question already has an answer here:
Closed 11 years ago.
Possible Duplicate:
Reading huge data from files and calling them
I do not know how to search through a file. I have a file with around 50 lines of data in this format: (1.000 2.000 3.000). I want to take one line, do a conversion (which I already have), then go to the next line and do the same, until the end of the file; basically, process it line by line until there are no more lines.
with open('filename') as f:
    for line in f:
        line = line.rstrip()
        # do the conversion (that you already know how to do)
Here:
with open('filename') as f: opens the file (and automatically closes it at the end);
for line in f: reads every line of the file into line;
line = line.rstrip() removes any trailing whitespaces and the newline character from line.
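Putting it together with the question's data format, a minimal sketch that parses the three numbers on each line (the actual conversion is whatever you already have; the print is a placeholder):
with open('filename') as f:
    for line in f:
        a, b, c = (float(x) for x in line.split())
        # do the conversion with a, b, c
        print(a, b, c)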
