How To Only Read One Part Of A Line In Python - python

I am working on making a program that will read a txt file that is full of binary, then turn the binary into ASCII, and print the outcome. I need to read only parts of the lines, so say I had 00100001 and 00100110 on the same line, how do I make my program only read 0010000, and ignore 00100110?

read() takes optional argument which is a size of string to read. So you may use it as follows:
with open('numbers.txt') as f:
f.read(8) # read up to 8 chars from first line
f.readline() # skip to next line
Certainly not a full answer (both problem definition and input file format are not precised), but it may be a good place to start.
Still, personally I'd read file line by line and simply perform some unified operation on each line.

Related

Modify only a specific part of a file

I have a python script that is supposed to read a file. The issue is that that file is very large so for efficiency I decided that my script should only read from line 650000 and onward, since previous line does not contain relevant information.
Is there any way to only modify lines 650000 till eof, so for example, if i read() this file only those specific lines would appear?
Files are not line-oriented, they are blocks of bytes.
There's no way, short of reading the data in, to figure out how many bytes make up those first 650,000 lines, so you'd have to do that just in order to skip them.
Starting modifying a file at a certain offset is possible, but that offset will be in bytes which is the addressing unit used by files.
Skipping lines can be done easily enough:
with open("myfile.txt", "w+t") as f:
for i in xrange(650000):
f.readline() # Read a line and throw it away
f.write("hello")
This will truncate the file so that there will be no data after the hello (but 650,000 lines before it, of course).

Why does readline() put the file pointer at the end of the file in Python?

I have a code, that is supposed to write a text file, and then replace some text in one line with something else.
def read(text):
file = open('File.txt', 'r+') #open the file for reading and writing
x = file.readline() # read the first line of the file
file.seek(0, 0) #put the pointer back to the begining
file.readline() #skip one line
file.write(text) #write the text
file.close() #close the file
read('ABC')
At the beginning it's fine. It reads the first line and sets pointer to the beginning of the file. But then when it's supposed to read one line and put the pointer at the second line, it puts it at the end of the file. If I assign that to a variable, it only reads one line, but it still sets the pointer at the end of the file.
Apparently readline() doesn't work as I thought it was, so please tell me how I could read some lines of the text and the write something to the specific line.
Writing, by default, always takes place at the end of the file. Calling file.readline() doesn't change this behaviour, especially since readline() calls can use a buffer to read in larger blocks.
You could override by using file.seek() explicitly to go to the end of a line; you just read the line, you know the length, seek to that point:
x = file.readline()
file.seek(len(x), 0)
file.write(text) #write the text
Note that you cannot insert lines, or easily replace lines. A file is a stream of individual bytes, not lines, so if you write in a line of 10 characters (including the newline) you can only replace that line with 10 other characters. Longer or shorter lines won't work here; you are just going to replace fewer or more characters in the file and either partially replace a line or overwrite (part of) the next line.

Not understanding read command in Python

I'm trying to understand what's going on with my read function. I'm simply doing a readline of a text document I created in canopy. For some reason it only gives me w for whatever value I put in. I'm new to the world of Python so I'm sure its an easy answer! Thanks for your help!
import os
my_file = open(os.path.expanduser("~/Desktop/Python Files/Test Text.txt"),'r')
print my_file.readline(3)
my_file.close()
My Text document is below
w
o
r
d
s
my_file.readline(3) reads up to 3 bytes from the first line.
The first line contains a w and an end-of-line character.
If you want to read up to the first 3 bytes regardless of the line, use my_file.read(3). Note that end-of-line characters are included in the count.
If you want to print the first 3 lines, you could use
import os
with open(os.path.expanduser("~/Desktop/Python Files/Test Text.txt"),'r') as my_file:
for i, line in enumerate(my_file):
if i >= 3: break
print(line)
or
import itertools as IT
with open(os.path.expanduser("~/Desktop/Python Files/Test Text.txt"),'r') as my_file:
for line in IT.islice(my_file, 3):
print(line)
For short files you could instead use
with open(os.path.expanduser("~/Desktop/Python Files/Test Text.txt"),'r') as my_file:
lines = my_file.readlines()
for line in lines[:3]:
print(line)
but note that my_file.readlines() returns a list of all the lines in the
file. Since this can be very memory-intensive if the file is huge, and since it
is usually possible to process a file line-by-line (which is much less
memory-intensive), generally the first two methods of reading a file are
preferred over the third.
'readline([size]) -> next line from the file, as a string.Retain newline. A non-negative size argument limits the maximum number of bytes to return (an incomplete line may be returned then).Return an empty string at EOF.
readline reads next line and so on.The argument size is for how many bytes should it read from the corresponding line.
Using f.readline does not give random access to the file. I think you want to read the third (or maybe fourth if you're zero-indexing) line. The argument that you're passing to f.readline is a maximum byte count to read, rather than a specific line to read.

Python: read a line and write back to that same line

I am using python to make a template updater for html. I read a line and compare it with the template file to see if there are any changes that needs to be updated. Then I want to write any changes (if there are any) back to the same line I just read from.
Reading the file, my file pointer is positioned now on the next line after a readline(). Is there anyway I can write back to the same line without having to open two file handles for reading and writing?
Here is a code snippet of what I want to do:
cLine = fp.readline()
if cLine != templateLine:
# Here is where I would like to write back to the line I read from
# in cLine
Updating lines in place in text file - very difficult
Many questions in SO are trying to read the file and update it at once.
While this is technically possible, it is very difficult.
(text) files are not organized on disk by lines, but by bytes.
The problem is, that read number of bytes on old lines is very often different from new one, and this mess up the resulting file.
Update by creating a new file
While it sounds inefficient, it is the most effective way from programming point of view.
Just read from file on one side, write to another file on the other side, close the files and copy the content from newly created over the old one.
Or create the file in memory and finally do the writing over the old one after you close the old one.
At the OS level the things are a bit different from how it looks from Python - from Python a file looks almost like a list of strings, with each string having arbitrary length, so it seems to be easy to swap a line for something else without affecting the rest of the lines:
l = ["Hello", "world"]
l[0] = "Good bye"
In reality, though, any file is just a stream of bytes, with strings following each other without any "padding". So you can only overwrite the data in-place if the resulting string has exactly the same length as the source string - otherwise it'll simply overwrite the following lines.
If that is the case (your processing guarantees not to change the length of strings), you can "rewind" the file to the start of the line and overwrite the line with new data. The below script converts all lines in file to uppercase in-place:
def eof(f):
cur_loc = f.tell()
f.seek(0,2)
eof_loc = f.tell()
f.seek(cur_loc, 0)
if cur_loc >= eof_loc:
return True
return False
with open('testfile.txt', 'r+t') as fp:
while True:
last_pos = fp.tell()
line = fp.readline()
new_line = line.upper()
fp.seek(last_pos)
fp.write(new_line)
print "Read %s, Wrote %s" % (line, new_line)
if eof(fp):
break
Somewhat related: Undo a Python file readline() operation so file pointer is back in original state
This approach is only justified when your output lines are guaranteed to have the same length, and when, say, the file you're working with is really huge so you have to modify it in place.
In all other cases it would be much easier and more performant to just build the output in memory and write it back at once. Another option is to write to a temporary file, then delete the original and rename the temporary file so it replaces the original file.

file pointer down then over

The Task
I am writing a program in python that running a SAP2000 program by importing a new .s2k file each time into the Sap2000 program, and then a new file is generated from the results of the previous run by the means of exporting the data.
The file is about 1,500 lines containing arbitrary words and numbers. (For a better understanding, see this: http://pastebin.com/8ptYacJz, which is the file I am dealing with.)
I'm required to replace one number in the file.
That number is somewhere in the middle of line 800.
The Question
Does anyone know an efficient way to move down to the middle of line 800 in a file, in order to replace one number?
What I've Tried
Regular expressions did not work, because there can be more then one instance of the same number.
So I came up with the solution of templating the file and writing a new file each time with the number to be changed as a template parameter.
This solution does work but the person insists that I can move the file pointer down to line 800, then over to the middle of the line to replace the number.
Here is the only code I have for the problem that takes the file buffer to a line then back up to the beginning when I try to seek over.
import sys
import os
#open file
f = open("output.$2k")
#this will go to line 883 in text file
count = 0;
while count < 883:
line = f.readline()
count = count+1
#this would seek over to middle of file DOESN'T WORK
f.seek(0,0)
line = f.readline()
print(line)
f.close()
Yes and no. Consider:
f=open('output.$2k','r+')
f.seek(300)
f.write('\n')
f.close()
This script just changes the 300th character in your ascii file to a newline. Now the tricky part is that there is no way to know the length of a line in an ascii file short of reading until you get to a newline. So, locating the particular character in the file at the middle of the 800th line is non-trivial. However, if you can make guarantees (due to the way the file was written) about the line length, you can calculate the position without any problem. Also note that replacing 1 with 100 won't work here. You need to replace 1 character with 1 character.
And just for all the other *NIX users in the world ... please don't put $ in your filename. That's just a nightmare...
OK, i'm not a professional programmer, but my (stupid) approach would be: If it's always line 800, read the file line by line while tracking the line numbers. Write then directly to a new file. Read line 800, change it, write it. Then write the rest. Dumb and not elegant but it should work-unless i miss something which i probably do. And there goes my meager reputation :D
No. Read in the line, manipulate it, then write it out to the new file you've previously opened for writing (and have been writing the other lines to, unmodified).
A first thing:
#this would seek over to middle of file DOESN'T WORK
f.seek(0,0)
this is not true. This seeks to the beginning of the file.
To your actual question:
Does anyone know an efficient way to move down to the middle of line 800 in a file, in order to replace one number?
In general, no. You'd need to rewrite the file. For example like this:
# open the file in read-and-update mode
with open("file", 'r+') as f:
# read all lines
lines = f.readlines()
# update 800'th line
my_line = lines[799].split()
my_line[5] = "%s" % my_number # TODO: put in index of number and updated number
lines[799] = " ".join(my_line)
# truncate and rewrite file
f.truncate(0)
f.writelines(lines)
You can do it, if the starting position of the number in the file is predictable (e.g. number_starting_pos = 1234 from the beginning of the file) and the size of the string representation is also predictable (e.g. 20).
Then you could rewrite the number and make sure you fill up the padding with whitespace again to overwrite any content of the previous entry.
Similar to this:
with open("file", 'r+') as f:
# seek to the number starting position
f.seek(number_starting_pos, 0)
# update number field, assuming width (20), arbitrary space-padding allowed
my_number_string = "%19s " % my_number
# make sure the string is indeed exactly of the specific size (it may be longer)
assert len(my_number_string) == 20, "file writing would fail! aborting!"
f.write(my_number_string)
For this to work, you'd need to have a look at the docs of your SAP-thingy, and see if whitespace indeed not matters.
However, both approaches are based on a lot of assumptions. Depending on your use case it may easily break your code, e.g. if a line is inserted or even a characters is inserted before the number field.

Categories