I'd like to modify some characters of a file in-place, without having to copy the entire content of the file in another, or overwrite the existing one. However, it doesn't seem possible to just replace a character by another:
>>> f = open("foo", "a+") # file does not exist
>>> f.write("a")
1
>>> f.seek(0)
0
>>> f.write("b")
1
>>> f.seek(0)
0
>>> f.read()
'ab'
Here I'd have expected "a" to be replaced by "b", so that the content of the file would be just "b", but this is not the case. Is there a way to do this?
That's because of the mode you're using: in append mode, the file pointer is moved to the end of the file before every write. Open the file in w+ mode instead:
f = open("foo", "w+") # file does not exist
f.write("samething")
f.seek(1)
f.write("o")
f.seek(0)
print f.read() # prints "something"
If you want to do that on an existing file without truncating it, you should open it in r+ mode for reading and writing.
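A minimal sketch of the r+ approach on an existing file, written for Python 3. The helper name overwrite_at and the demo file are illustrative assumptions, not part of the answer above. Binary mode is used so that seek offsets are plain byte positions.

```python
import os
import tempfile


def overwrite_at(path, offset, data):
    """Overwrite len(data) bytes starting at byte `offset`, leaving the rest intact."""
    # "r+b" opens an existing file for reading and writing without truncating it.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


# Demo on a throwaway file (hypothetical name "foo" inside a temp directory).
demo_path = os.path.join(tempfile.mkdtemp(), "foo")
with open(demo_path, "wb") as f:
    f.write(b"samething")

overwrite_at(demo_path, 1, b"o")

with open(demo_path, "rb") as f:
    print(f.read())  # b'something'
```

Note that writing past the end of the existing data extends the file; nothing before or after the overwritten bytes is touched.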
Truncate the file using file.truncate first:
>>> f = open("foo", "a+")
>>> f.write('a')
>>> f.truncate(0) #truncates the file to 0 bytes
>>> f.write('b')
>>> f.seek(0)
>>> f.read()
'b'
Otherwise, open the file in w+ mode as suggested by @Guillaume.
import fileinput
for line in fileinput.input('abc', inplace=True):
    line = line.replace('t', 'ed')
    print line,
This doesn't replace character by character; instead, it scans each line, replaces the required characters, and writes the modified line back to the file.
For example:
file 'abc' contains:
i want
to replace
character
After executing, output would be:
i waned
edo replace
characeder
Will it help you? Hope so..
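The snippet above uses Python 2's print statement. A rough Python 3 equivalent might look like the sketch below; the helper name replace_in_file and the temporary demo file are assumptions made for illustration.

```python
import fileinput
import os
import tempfile


def replace_in_file(path, old, new):
    """Rewrite `path` in place, replacing `old` with `new` on every line."""
    with fileinput.input(path, inplace=True) as lines:
        for line in lines:
            # While inplace=True is active, print() writes into the file.
            # end="" avoids doubling the newline that `line` already carries.
            print(line.replace(old, new), end="")


# Demo on a throwaway copy of the 'abc' file from the example.
demo_path = os.path.join(tempfile.mkdtemp(), "abc")
with open(demo_path, "w") as f:
    f.write("i want\nto replace\ncharacter\n")

replace_in_file(demo_path, "t", "ed")

with open(demo_path) as f:
    result = f.read()
```

Because the whole file is rewritten line by line, this suits line-oriented edits better than patching a single byte.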
I believe that you may be able to modify the example from this answer.
https://stackoverflow.com/a/290494/1669208
import fileinput
for line in fileinput.input("test.txt", inplace=True):
    print line.replace(char1, char2),
I have seen these two ways to process a file:
file = open("file.txt")
for line in file:
    # do something

file = open("file.txt")
contents = file.read()
for line in contents:
    # do something
I know that in the first case, the file will act like a list, so the for loop iterates over the file as if it were a list. What exactly happens in the second case, where we read the file and then iterate over the contents? What are the consequences of taking each approach, and how should I choose between them?
In the first one you are iterating over the file, line by line. In this scenario, the entire file data is not read into the memory at once; instead, only the current line is read into memory. This is useful for handling very large files, and good for robustness if you don't know if the file is going to be large or not.
In the second one, file.read() returns the complete file data as a string. When you are iterating over it, you are actually iterating over the file's data character by character. This reads the complete file data into memory.
Here's an example to show this behavior.
a.txt file contains
Hello
Bye
Code:
>>> f = open('a.txt','r')
>>> for l in f:
...     print(l)
...
Hello
Bye
>>> f = open('a.txt','r')
>>> r = f.read()
>>> print(repr(r))
'Hello\nBye'
>>> for c in r:
...     print(c)
...
H
e
l
l
o
B
y
e
The second case reads in the contents of the file into one big string. If you iterate over a string, you get each character in turn. If you want to get each line in turn, you can do this:
for line in contents.split('\n'):
    # do something
Or you can read in the contents as a list of lines using readlines() instead of read().
with open('file.txt', 'r') as fin:
    lines = fin.readlines()
for line in lines:
    # do something
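One detail worth checking when choosing between these options: if the file ends with a newline, split('\n') yields a trailing empty string, while splitlines() and readlines() do not. A small sketch, using an in-memory string in place of a real file (the sample text is made up):

```python
import io

contents = "Hello\nBye\n"  # stands in for file.read() on a file ending in a newline

# split('\n') keeps an empty string after the final newline.
print(contents.split("\n"))               # ['Hello', 'Bye', '']

# splitlines() drops line endings and adds no trailing empty item.
print(contents.splitlines())              # ['Hello', 'Bye']

# readlines() keeps the line ending on each item.
print(io.StringIO(contents).readlines())  # ['Hello\n', 'Bye\n']

# Iterating the string itself yields single characters, not lines.
print(list(contents)[:3])                 # ['H', 'e', 'l']
```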
I want to insert a line into the middle of a text file in Python, so I tried
with open(erroredFilepath, 'r+t') as erroredFile:
    fileContents = erroredFile.read()
    if 'insert_here' in fileContents:
        insertString.join(fileContents.rsplit('insert_here'))
        erroredFile.truncate()
        erroredFile.write(insertString)
However, insertString got written at the end of the file. Why?
As an aside, I tried to simplify things by just using strings instead of files.
'123456789'.join('qwertyuiop'.split('y'))
gives
'qwert123456789uiop'
What happened to the 'y'?
If you want to write in the middle of the file use the fileinput module.
import fileinput
for line in fileinput.input(erroredFilepath, inplace=True):
    print("something", end="")
From the docs:
if the keyword argument inplace=True is passed to fileinput.input() or to the FileInput constructor, the file is moved to a backup file and standard output is directed to the input file (if a file of the same name as the backup file already exists, it will be replaced silently).
Whatever you print will go in the file. So you have to read and print every line and modify whichever you want to replace. Also, when printing existing lines, use end="" as it will prevent print from adding an extra newline.
Although OS-level details of files vary, in general, when you have a file open in r+ mode and do some read or write operation, the "current position" is left after the last read or write.
When you did:
fileContents = erroredFile.read()
the stream erroredFile was read to the end, so the current position is now "at the end".
The truncate function defaults to using the current position as the size to which to truncate. Assume the file is 100 bytes long, so that the current position "at the end" is byte 100. Then:
erroredFile.truncate()
means "make the file 100 bytes long"—which it already is.
The current position remains at the end of the file, so the subsequent write appends.
Presumably you wanted to seek back to the beginning of the file, and/or use truncate(0) (note that just truncate(0) will, at least on Unix-like systems, leave the seek position at the end of the file so that the next write leaves a hole where the original data used to be). You could also be slightly more clever: if you're inserting, just overwrite-and-extend in place (no truncate is required at all).
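The "overwrite-and-extend in place" idea can be sketched as follows, assuming Python 3 and binary mode so that seek offsets are byte positions. The function name insert_after_marker and the sample file contents are illustrative, not taken from the question.

```python
import os
import tempfile


def insert_after_marker(path, marker, insertion):
    """Insert `insertion` right after the first `marker` in the file (all bytes)."""
    with open(path, "r+b") as f:
        data = f.read()                 # position is now at end of file
        pos = data.find(marker)
        if pos == -1:
            return False                # marker not found; file left untouched
        pos += len(marker)
        f.seek(pos)                     # move back to just after the marker
        # Rewrite the tail with the insertion in front of it. The file only
        # grows, so no truncate() is needed.
        f.write(insertion + data[pos:])
    return True


demo_path = os.path.join(tempfile.mkdtemp(), "errored.txt")
with open(demo_path, "wb") as f:
    f.write(b"head\ninsert_here\ntail\n")

insert_after_marker(demo_path, b"insert_here\n", b"new line\n")

with open(demo_path, "rb") as f:
    print(f.read())  # b'head\ninsert_here\nnew line\ntail\n'
```

This still reads the whole file into memory once; it only avoids rewriting the part before the insertion point.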
(Joel Hinz already answered the second question, I see.)
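For reference, a short sketch of what the string experiment in the question is doing: split() removes the separator, and join() then places the joining string where the separator was, so the pair behaves like a replace. Note also that join() returns a new string, which the question's file code discards. Only the strings from the question are used here.

```python
parts = "qwertyuiop".split("y")
print(parts)                                     # ['qwert', 'uiop']

# join() places the string between the parts, i.e. where 'y' used to be.
joined = "123456789".join(parts)
print(joined)                                    # qwert123456789uiop

# The same result, written as a replace.
print("qwertyuiop".replace("y", "123456789"))    # qwert123456789uiop

# To insert after 'y' while keeping it, put 'y' back into the replacement.
kept = "qwertyuiop".replace("y", "y123456789")
print(kept)                                      # qwerty123456789uiop
```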
test.txt
a
b
c
d
e
1. Read into a list then overwrite
def match_then_insert(filename, match, content):
    lines = open(filename).read().splitlines()
    index = lines.index(match)
    lines.insert(index, content)
    open(filename, mode='w').write('\n'.join(lines))

match_then_insert('test.txt', match='c', content='123')
Result
a
b
123
c
d
e
2. FileInput
from fileinput import FileInput

def match_then_insert(filename, match, content):
    for line in FileInput(filename, inplace=True):
        if match in line:
            line = content + '\n' + line
        print(line, end='')  # Redirect to the original file

match_then_insert('test.txt', match='c', content='123')
3. seek
def match_then_insert(filename, match, content):
    with open(filename, mode='rb+') as f:
        while True:
            line = f.readline()
            if not line:  # readline() returns b'' at EOF: no match found
                break
            line_str = line.decode().splitlines()[0]
            if line_str == match:
                f.seek(-len(line), 1)  # back to the start of the matching line
                rest = f.read()        # save everything from the match onward
                f.seek(-len(rest), 1)
                f.truncate()
                content = content + '\n'
                f.write(content.encode())
                f.write(rest)
                break

match_then_insert('test.txt', match='c', content='123')
Compare
Method                          | Time/s
Read into a list then overwrite | 54.42
FileInput                       | 121.59
seek                            | 3.53
from timeit import timeit
from fileinput import FileInput

def init_txt():
    open('test.txt', mode='w').write('\n'.join(['a', 'b', 'c', 'd', 'e']))

def f1(filename='test.txt', match='c', content='123'):
    lines = open(filename).read().splitlines()
    index = lines.index(match)
    lines.insert(index, content)
    open(filename, mode='w').write('\n'.join(lines))

def f2(filename='test.txt', match='c', content='123'):
    for line in FileInput(filename, inplace=True):
        if match in line:
            line = content + '\n' + line
        print(line, end='')

def f3(filename='test.txt', match='c', content='123'):
    with open(filename, mode='rb+') as f:
        while True:
            line = f.readline()
            if not line:  # readline() returns b'' at EOF: no match found
                break
            line_str = line.decode().splitlines()[0]
            if line_str == match:
                f.seek(-len(line), 1)
                rest = f.read()
                f.seek(-len(rest), 1)
                f.truncate()
                content = content + '\n'
                f.write(content.encode())
                f.write(rest)
                break

init_txt()
print(timeit(f1, number=1000))
init_txt()
print(timeit(f2, number=1000))
init_txt()
print(timeit(f3, number=1000))
Not a Python answer but it may widen your horizon. Use sed:
$ cat input.txt
foo
bar
baz
INSERT HERE
qux
quux
$ sed '/INSERT HERE/anew stuff' < input.txt
foo
bar
baz
INSERT HERE
new stuff
qux
quux
The command a will append the text on a new line. If you want to insert the text before the match, use the command i:
$ sed '/INSERT HERE/inew stuff' < input.txt
foo
bar
baz
new stuff
INSERT HERE
qux
quux
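For readers who want to stay in Python, the effect of sed's a and i commands can be approximated on a list of lines as in the sketch below. The function name insert_line is hypothetical, and it matches plain substrings, whereas sed matches regular expressions.

```python
def insert_line(lines, match, new_line, before=False):
    """Return a copy of `lines` with `new_line` added next to each line containing `match`."""
    result = []
    for line in lines:
        if match in line and before:
            result.append(new_line)     # like sed's `i` command
        result.append(line)
        if match in line and not before:
            result.append(new_line)     # like sed's `a` command
    return result


lines = ["foo", "bar", "baz", "INSERT HERE", "qux", "quux"]
print(insert_line(lines, "INSERT HERE", "new stuff"))
print(insert_line(lines, "INSERT HERE", "new stuff", before=True))
```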
Why not try a two-step solution? In the first step, you read the file and fix the string; in the second step, you rewrite the file. It's probably not the most efficient algorithm, but I think it works.
with open(erroredFilepath, 'r') as erroredFile:
    fileContents = erroredFile.read()
# str.replace returns a new string, so the result must be assigned
fileContents = fileContents.replace('insert_here', 'insert_string')
with open(erroredFilepath, 'w') as fixingFile:
    fixingFile.write(fileContents)
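A possible variation on the two-step approach above, sketched for Python 3: write the new contents to a temporary file in the same directory and then rename it over the original, so that an interruption mid-write does not leave a half-written file. This swaps in a different technique from the answer; the name rewrite_file and the demo data are assumptions.

```python
import os
import tempfile


def rewrite_file(path, old, new):
    """Read `path`, replace `old` with `new`, and swap the result into place."""
    with open(path, "r") as src:
        contents = src.read().replace(old, new)

    # Write to a temporary file next to the original, then rename it over
    # the original in one step.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as dst:
        dst.write(contents)
    os.replace(tmp_path, path)


demo_path = os.path.join(tempfile.mkdtemp(), "errored.txt")
with open(demo_path, "w") as f:
    f.write("a\ninsert_here\nb\n")

rewrite_file(demo_path, "insert_here", "insert_string")

with open(demo_path) as f:
    print(f.read(), end="")
```

One caveat: the temporary file is created with restrictive permissions, so the original file's mode is not preserved by this sketch.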
The following simple code reads a CSV file and returns the number of lines of the file. As you can see in the output, the file has 501 lines.
>>> import codecs
>>> f = codecs.open("tmp.csv", "r", "utf_8")
>>> print len(f.readlines())
501
But if I insert a readline() call before readlines(), the latter does not reach the end of the file.
>>> import codecs
>>> f = codecs.open("tmp.csv", "r", "utf_8")
>>> f.readline()
>>> print len(f.readlines())
1
Is there any basic mistake in my code? How can I mix readline() and readlines()? (actually I don't need to mix these two functions in my real program, but I am just curious...)
You can download the file at
https://dl.dropboxusercontent.com/u/16653989/tmp/tmp.csv
This has something to do with the codecs module, because when you do the same thing with the regular Python open() function, it works as expected:
>>> f = open('tmp.csv')
>>> f.readline()
>>> print len(f.readlines())
500
I have a text file with first line of unicode characters and all other lines in ASCII.
I try to read the first line as one variable, and all other lines as another. However, when I use the following code:
# -*- coding: utf-8 -*-
import codecs
import os
filename = '1.txt'
f = codecs.open(filename, 'r3', encoding='utf-8')
print f
names_f = f.readline().split(' ')
data_f = f.readlines()
print len(names_f)
print len(data_f)
f.close()
print 'And now for something completely differerent:'
g = open(filename, 'r')
names_g = g.readline().split(' ')
print g
data_g = g.readlines()
print len(names_g)
print len(data_g)
g.close()
I get the following output:
<open file '1.txt', mode 'rb' at 0x01235230>
28
7
And now for something completely differerent:
<open file '1.txt', mode 'r' at 0x017875A0>
28
77
If I don't use readlines(), the whole file is read, not just the first 7 lines, with both codecs.open() and open().
Why does this happen?
And why does codecs.open() open the file in binary mode even though an 'r' mode was passed?
Upd: This is original file: http://www1.datafilehost.com/d/0792d687
Because you used .readline() first, the codecs.open() file has filled a linebuffer; the subsequent call to .readlines() returns only the buffered lines.
If you call .readlines() again, the rest of the lines are returned:
>>> f = codecs.open(filename, 'r3', encoding='utf-8')
>>> line = f.readline()
>>> len(f.readlines())
7
>>> len(f.readlines())
71
The work-around is to not mix .readline() and .readlines():
f = codecs.open(filename, 'r3', encoding='utf-8')
data_f = f.readlines()
names_f = data_f.pop(0).split(' ') # take the first line.
This behaviour is really a bug; the Python devs are aware of it, see issue 8260.
The other option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function and is a lot more robust and versatile than the codecs module.
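A minimal sketch of the io.open() option, mixing readline() and readlines() on the same file object. The demo file is generated here with made-up contents (one non-ASCII header line followed by 77 ASCII lines), since the original file is not available.

```python
import io
import os
import tempfile

# Build a throwaway file: one non-ASCII header line plus 77 ASCII lines.
demo_path = os.path.join(tempfile.mkdtemp(), "1.txt")
with io.open(demo_path, "w", encoding="utf-8") as f:
    f.write(u"h\u00e9llo w\u00f6rld\n")
    for i in range(77):
        f.write(u"row %d\n" % i)

with io.open(demo_path, "r", encoding="utf-8") as f:
    names_f = f.readline().split(" ")   # first line only
    data_f = f.readlines()              # every remaining line

print(len(names_f))  # 2
print(len(data_f))   # 77
```

In Python 3 the built-in open() is the same function as io.open(), so the io prefix is only needed on Python 2.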
Python learner. So please excuse me.
I am following: http://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
I want to read a file; here is my file:
# cat test
line1 word1
line2 word2
line3 word3
line4 word4
and here is my code:
>>> f = open ('test')
>>> for line in f:
...     print f
...
<open file 'test', mode 'r' at 0xb7729180>
<open file 'test', mode 'r' at 0xb7729180>
<open file 'test', mode 'r' at 0xb7729180>
<open file 'test', mode 'r' at 0xb7729180>
How and why am I getting the above output? I was hoping that it would print the file line by line.
What am I missing here? Looking at the link mentioned above, my syntax seems to be OK, but the output is not.
Thanks.
During the iteration, you are printing f instead of the line variable.
>>> f = open ('test')
>>> for line in f:
...     print line
You are printing the file handle; replace print f with print line:
f = open ('test')
for line in f:
    print line
Or:
for line in open('test'):
    print line
You can also read the lines with readlines(), which returns them as a list of strings that you can then loop over with a for statement.
for line in f:
This means 'iterate through the contents of f, assigning each item to the variable named line', not the converse. f won't change: it's what you are getting the data from, not the variable you are assigning it to.
That said, the other answer I saw, from someone whose name starts with 'D', is correct: you want print(line), not print(f). I write that with parentheses because it's a good habit to get into; it makes your code compatible with both Python 2.x and Python 3.x.
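A small Python 3 sketch of the corrected loop. An io.StringIO object stands in for open('test') so the example runs without a file on disk; the list named seen is only there to make the result easy to inspect.

```python
import io

# io.StringIO stands in for open('test') so the sketch runs without a file.
f = io.StringIO("line1 word1\nline2 word2\nline3 word3\nline4 word4\n")

seen = []
for line in f:
    # `line` is the data; `f` is the file object being iterated.
    seen.append(line)
    # Each line already ends with '\n', so suppress print's own newline.
    print(line, end="")
```

Without end="", each line would be followed by a blank line, because print adds its own newline after the one already in the string.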