Back to the top of file? - python

I have a program where I'm reading numbers from a file and adding those numbers to different lists in my program. Now, I need to jump back to the top of the file and read from the top again. Does anyone know if there is a command that does that or if its even possible?

You can use seek(0) to start from the beginning all over again.
Actually, when you read from a file, it continuously updates the offset to the current bytes. seek() provides you with the ability to set the offset at any position.
At the beginning, offset is located at 0. So, f.seek(0) will set the offset at the beginning of the file.
with open('filename','r') as f:
f.read(100) # Read first 100 byte
f.seek(0) # set the offset at the beginning
f.read(50) # Read first 50 byte again.

There are couple of ways to achieve this. Simplest one is using seek(0). 0 represents start of the file.
You can even use tell() to store any position of the file and reuse it:
with open('file.txt', 'r') as f:
first_position = f.tell()
f.read() # your read
f.seek(first_position) # it will take you to the previous position you marked.

Related

Is there a buffer when one uses the 'readline()' method? Can i access previously 'read/accessed' lines of txt? [duplicate]

For an exercise I'm doing, I'm trying to read the contents of a given file twice using the read() method. Strangely, when I call it the second time, it doesn't seem to return the file content as a string?
Here's the code
f = f.open()
# get the year
match = re.search(r'Popularity in (\d+)', f.read())
if match:
print match.group(1)
# get all the names
matches = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', f.read())
if matches:
# matches is always None
Of course I know that this is not the most efficient or best way, this is not the point here. The point is, why can't I call read() twice? Do I have to reset the file handle? Or close / reopen the file in order to do that?
Calling read() reads through the entire file and leaves the read cursor at the end of the file (with nothing more to read). If you are looking to read a certain number of lines at a time you could use readline(), readlines() or iterate through lines with for line in handle:.
To answer your question directly, once a file has been read, with read() you can use seek(0) to return the read cursor to the start of the file (docs are here). If you know the file isn't going to be too large, you can also save the read() output to a variable, using it in your findall expressions.
Ps. Don't forget to close the file after you are done with it.
As other answers suggested, you should use seek().
I'll just write an example:
>>> a = open('file.txt')
>>> a.read()
#output
>>> a.seek(0)
>>> a.read()
#same output
Everyone who has answered this question so far is absolutely right - read() moves through the file, so after you've called it, you can't call it again.
What I'll add is that in your particular case, you don't need to seek back to the start or reopen the file, you can just store the text that you've read in a local variable, and use it twice, or as many times as you like, in your program:
f = f.open()
text = f.read() # read the file into a local variable
# get the year
match = re.search(r'Popularity in (\d+)', text)
if match:
print match.group(1)
# get all the names
matches = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', text)
if matches:
# matches will now not always be None
The read pointer moves to after the last read byte/character. Use the seek() method to rewind the read pointer to the beginning.
Every open file has an associated position.
When you read() you read from that position.
For example read(10) reads the first 10 bytes from a newly opened file, then another read(10) reads the next 10 bytes.
read() without arguments reads all of the contents of the file, leaving the file position at the end of the file. Next time you call read() there is nothing to read.
You can use seek to move the file position. Or probably better in your case would be to do one read() and keep the result for both searches.
read() consumes. So, you could reset the file, or seek to the start before re-reading. Or, if it suites your task, you can use read(n) to consume only n bytes.
I always find the read method something of a walk down a dark alley. You go down a bit and stop but if you are not counting your steps you are not sure how far along you are. Seek gives the solution by repositioning, the other option is Tell which returns the position along the file. May be the Python file api can combine read and seek into a read_from(position,bytes) to make it simpler - till that happens you should read this page.

Why does inlining f.read() cause this assertion? [duplicate]

For an exercise I'm doing, I'm trying to read the contents of a given file twice using the read() method. Strangely, when I call it the second time, it doesn't seem to return the file content as a string?
Here's the code
f = f.open()
# get the year
match = re.search(r'Popularity in (\d+)', f.read())
if match:
print match.group(1)
# get all the names
matches = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', f.read())
if matches:
# matches is always None
Of course I know that this is not the most efficient or best way, this is not the point here. The point is, why can't I call read() twice? Do I have to reset the file handle? Or close / reopen the file in order to do that?
Calling read() reads through the entire file and leaves the read cursor at the end of the file (with nothing more to read). If you are looking to read a certain number of lines at a time you could use readline(), readlines() or iterate through lines with for line in handle:.
To answer your question directly, once a file has been read, with read() you can use seek(0) to return the read cursor to the start of the file (docs are here). If you know the file isn't going to be too large, you can also save the read() output to a variable, using it in your findall expressions.
Ps. Don't forget to close the file after you are done with it.
As other answers suggested, you should use seek().
I'll just write an example:
>>> a = open('file.txt')
>>> a.read()
#output
>>> a.seek(0)
>>> a.read()
#same output
Everyone who has answered this question so far is absolutely right - read() moves through the file, so after you've called it, you can't call it again.
What I'll add is that in your particular case, you don't need to seek back to the start or reopen the file, you can just store the text that you've read in a local variable, and use it twice, or as many times as you like, in your program:
f = f.open()
text = f.read() # read the file into a local variable
# get the year
match = re.search(r'Popularity in (\d+)', text)
if match:
print match.group(1)
# get all the names
matches = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', text)
if matches:
# matches will now not always be None
The read pointer moves to after the last read byte/character. Use the seek() method to rewind the read pointer to the beginning.
Every open file has an associated position.
When you read() you read from that position.
For example read(10) reads the first 10 bytes from a newly opened file, then another read(10) reads the next 10 bytes.
read() without arguments reads all of the contents of the file, leaving the file position at the end of the file. Next time you call read() there is nothing to read.
You can use seek to move the file position. Or probably better in your case would be to do one read() and keep the result for both searches.
read() consumes. So, you could reset the file, or seek to the start before re-reading. Or, if it suites your task, you can use read(n) to consume only n bytes.
I always find the read method something of a walk down a dark alley. You go down a bit and stop but if you are not counting your steps you are not sure how far along you are. Seek gives the solution by repositioning, the other option is Tell which returns the position along the file. May be the Python file api can combine read and seek into a read_from(position,bytes) to make it simpler - till that happens you should read this page.

python - Trying to do a program to replace a given line, by the same line but all CAPS

Trying to do a college exercise where I'm supposed to replace a given line in a file, by the same line but written in all caps. The problem is we can only write in the same file, and in that exact line, we can't write in the rest of the file.
This is the code I have so far, but I can't figure out how to go to the line I want
def upper(n):
count=0
with open("upper.txt", "r+") as file:
lines = file.readlines()
file.seek(0)
for line in file.readlines():
if count == n:
pos = file.tell()
line1 = str(line.upper())
count += 1
file.seek(pos)
file.write(line1)
Help appreciated!
The problem lies in that your readlines already has read the entire file, and so the position of the "file cursor" is always at the end of the file. In theory, a simple fix should be:
Initialize pos to 0.
Read a single line.
If the current line counter indicates this is the one you want, set the position to pos again, update that line, and exit.
Update pos to point to the end of this line (so it points to the start of the next line).
Loop until satisfied.
In code, that would be this:
def upper(n):
count=0
with open("text.txt", "r+") as file:
pos = 0
for line in file.readlines():
if count == n:
line1 = line.upper()
break
pos = file.tell()
count += 1
file.seek(pos)
file.write(line1)
upper(5)
However! There is a snag. File operations are heavily buffered, and the for loop on readlines does not read one line at a time. Instead, for efficiency, it reads as much as possible, but it only "returns" the next line to your program. On a next run through your loop, it simply checks if it already had read enough of your text file to return the following line, and if not, it fills its internal buffer again. So, even while tell() will correctly be updated to the external file position – the value you see –, it does not reflect the "cursor" position of what you are processing at the time.
One way to circumvent this is to physically mimic what readlines does: read a single byte at a time, determine whether you have read an entire line (then this byte would be \n), and update your position and status based on this.
However, a more proper way of updating a file is to read it into memory in its entirety, change it, and write it back to disk. Changing part of an existing file with "r+" is usually recommended to use binary mode (where the position of each byte is known beforehand); admittedly, in theory your method should have worked as well, but as you see the file buffering defeats this.
Reading, changing, and writing the file entirely is as simple as this:
def better_upper(n):
count=0
with open("text.txt", "r") as file:
lines = file.readlines()
lines[n] = lines[n].upper()
with open("text.txt", "w") as file:
file.writelines(lines)
better_upper(5)
(Where the only caveat is that it always overwrites the original file. That is: if something unexpected goes wrong, it will probably erase text.txt. If you want a belt-and-suspenders approach, write to a new file, then check if it got written correctly. If it did, delete the old file and rename the new one. Left as an exercise to the reader.)

seek() function?

Please excuse my confusion here but I have read the documentation regarding the seek() function in python (after having to use it) and although it helped me I am still a bit confused on the actual meaning of what it does, any explanations are much appreciated, thank you.
Regarding seek() there's not too much to worry about.
First of all, it is useful when operating over an open file.
It's important to note that its syntax is as follows:
fp.seek(offset, from_what)
where fp is the file pointer you're working with; offset means how many positions you will move; from_what defines your point of reference:
0: means your reference point is the beginning of the file
1: means your reference point is the current file position
2: means your reference point is the end of the file
if omitted, from_what defaults to 0.
Never forget that when managing files, there'll always be a position inside that file where you are currently working on. When just open, that position is the beginning of the file, but as you work with it, you may advance.
seek will be useful to you when you need to walk along that open file, just as a path you are traveling into.
When you open a file, the system points to the beginning of the file. Any read or write you do will happen from the beginning. A seek() operation moves that pointer to some other part of the file so you can read or write at that place.
So, if you want to read the whole file but skip the first 20 bytes, open the file, seek(20) to move to where you want to start reading, then continue with reading the file.
Or say you want to read every 10th byte, you could write a loop that does seek(9, 1) (moves 9 bytes forward relative to the current positions), read(1) (reads one byte), repeat.
The seek function expect's an offset in bytes.
Ascii File Example:
So if you have a text file with the following content:
simple.txt
abc
You can jump 1 byte to skip over the first character as following:
fp = open('simple.txt', 'r')
fp.seek(1)
print fp.readline()
>>> bc
Binary file example gathering width :
fp = open('afile.png', 'rb')
fp.seek(16)
print 'width: {0}'.format(struct.unpack('>i', fp.read(4))[0])
print 'height: ', struct.unpack('>i', fp.read(4))[0]
Note: Once you call read you are changing the position of the
read-head, which act's like seek.
For strings, forget about using WHENCE: use f.seek(0) to position at beginning of file and f.seek(len(f)+1) to position at the end of file. Use open(file, "r+") to read/write anywhere in a file. If you use "a+" you'll only be able to write (append) at the end of the file regardless of where you position the cursor.

Why can't I call read() twice on an open file?

For an exercise I'm doing, I'm trying to read the contents of a given file twice using the read() method. Strangely, when I call it the second time, it doesn't seem to return the file content as a string?
Here's the code
f = f.open()
# get the year
match = re.search(r'Popularity in (\d+)', f.read())
if match:
print match.group(1)
# get all the names
matches = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', f.read())
if matches:
# matches is always None
Of course I know that this is not the most efficient or best way, this is not the point here. The point is, why can't I call read() twice? Do I have to reset the file handle? Or close / reopen the file in order to do that?
Calling read() reads through the entire file and leaves the read cursor at the end of the file (with nothing more to read). If you are looking to read a certain number of lines at a time you could use readline(), readlines() or iterate through lines with for line in handle:.
To answer your question directly, once a file has been read, with read() you can use seek(0) to return the read cursor to the start of the file (docs are here). If you know the file isn't going to be too large, you can also save the read() output to a variable, using it in your findall expressions.
Ps. Don't forget to close the file after you are done with it.
As other answers suggested, you should use seek().
I'll just write an example:
>>> a = open('file.txt')
>>> a.read()
#output
>>> a.seek(0)
>>> a.read()
#same output
Everyone who has answered this question so far is absolutely right - read() moves through the file, so after you've called it, you can't call it again.
What I'll add is that in your particular case, you don't need to seek back to the start or reopen the file, you can just store the text that you've read in a local variable, and use it twice, or as many times as you like, in your program:
f = f.open()
text = f.read() # read the file into a local variable
# get the year
match = re.search(r'Popularity in (\d+)', text)
if match:
print match.group(1)
# get all the names
matches = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', text)
if matches:
# matches will now not always be None
The read pointer moves to after the last read byte/character. Use the seek() method to rewind the read pointer to the beginning.
Every open file has an associated position.
When you read() you read from that position.
For example read(10) reads the first 10 bytes from a newly opened file, then another read(10) reads the next 10 bytes.
read() without arguments reads all of the contents of the file, leaving the file position at the end of the file. Next time you call read() there is nothing to read.
You can use seek to move the file position. Or probably better in your case would be to do one read() and keep the result for both searches.
read() consumes. So, you could reset the file, or seek to the start before re-reading. Or, if it suites your task, you can use read(n) to consume only n bytes.
I always find the read method something of a walk down a dark alley. You go down a bit and stop but if you are not counting your steps you are not sure how far along you are. Seek gives the solution by repositioning, the other option is Tell which returns the position along the file. May be the Python file api can combine read and seek into a read_from(position,bytes) to make it simpler - till that happens you should read this page.

Categories