Related
I want to loop over the contents of a text file and do a search and replace on some lines and write the result back to the file. I could first load the whole file in memory and then write it back, but that probably is not the best way to do it.
What is the best way to do this, within the following code?
f = open(file)
for line in f:
if line.contains('foo'):
newline = line.replace('foo', 'bar')
# how to write this newline back to the file
The shortest way would probably be to use the fileinput module. For example, the following adds line numbers to a file, in-place:
import fileinput
for line in fileinput.input("test.txt", inplace=True):
print('{} {}'.format(fileinput.filelineno(), line), end='') # for Python 3
# print "%d: %s" % (fileinput.filelineno(), line), # for Python 2
What happens here is:
The original file is moved to a backup file
The standard output is redirected to the original file within the loop
Thus any print statements write back into the original file
fileinput has more bells and whistles. For example, it can be used to automatically operate on all files in sys.args[1:], without your having to iterate over them explicitly. Starting with Python 3.2 it also provides a convenient context manager for use in a with statement.
While fileinput is great for throwaway scripts, I would be wary of using it in real code because admittedly it's not very readable or familiar. In real (production) code it's worthwhile to spend just a few more lines of code to make the process explicit and thus make the code readable.
There are two options:
The file is not overly large, and you can just read it wholly to memory. Then close the file, reopen it in writing mode and write the modified contents back.
The file is too large to be stored in memory; you can move it over to a temporary file and open that, reading it line by line, writing back into the original file. Note that this requires twice the storage.
I guess something like this should do it. It basically writes the content to a new file and replaces the old file with the new file:
from tempfile import mkstemp
from shutil import move, copymode
from os import fdopen, remove
def replace(file_path, pattern, subst):
#Create temp file
fh, abs_path = mkstemp()
with fdopen(fh,'w') as new_file:
with open(file_path) as old_file:
for line in old_file:
new_file.write(line.replace(pattern, subst))
#Copy the file permissions from the old file to the new file
copymode(file_path, abs_path)
#Remove original file
remove(file_path)
#Move new file
move(abs_path, file_path)
Here's another example that was tested, and will match search & replace patterns:
import fileinput
import sys
def replaceAll(file,searchExp,replaceExp):
for line in fileinput.input(file, inplace=1):
if searchExp in line:
line = line.replace(searchExp,replaceExp)
sys.stdout.write(line)
Example use:
replaceAll("/fooBar.txt","Hello\sWorld!$","Goodbye\sWorld.")
This should work: (inplace editing)
import fileinput
# Does a list of files, and
# redirects STDOUT to the file in question
for line in fileinput.input(files, inplace = 1):
print line.replace("foo", "bar"),
Based on the answer by Thomas Watnedal.
However, this does not answer the line-to-line part of the original question exactly. The function can still replace on a line-to-line basis
This implementation replaces the file contents without using temporary files, as a consequence file permissions remain unchanged.
Also re.sub instead of replace, allows regex replacement instead of plain text replacement only.
Reading the file as a single string instead of line by line allows for multiline match and replacement.
import re
def replace(file, pattern, subst):
# Read contents from file as a single string
file_handle = open(file, 'r')
file_string = file_handle.read()
file_handle.close()
# Use RE package to allow for replacement (also allowing for (multiline) REGEX)
file_string = (re.sub(pattern, subst, file_string))
# Write contents to file.
# Using mode 'w' truncates the file.
file_handle = open(file, 'w')
file_handle.write(file_string)
file_handle.close()
As lassevk suggests, write out the new file as you go, here is some example code:
fin = open("a.txt")
fout = open("b.txt", "wt")
for line in fin:
fout.write( line.replace('foo', 'bar') )
fin.close()
fout.close()
If you're wanting a generic function that replaces any text with some other text, this is likely the best way to go, particularly if you're a fan of regex's:
import re
def replace( filePath, text, subs, flags=0 ):
with open( filePath, "r+" ) as file:
fileContents = file.read()
textPattern = re.compile( re.escape( text ), flags )
fileContents = textPattern.sub( subs, fileContents )
file.seek( 0 )
file.truncate()
file.write( fileContents )
A more pythonic way would be to use context managers like the code below:
from tempfile import mkstemp
from shutil import move
from os import remove
def replace(source_file_path, pattern, substring):
fh, target_file_path = mkstemp()
with open(target_file_path, 'w') as target_file:
with open(source_file_path, 'r') as source_file:
for line in source_file:
target_file.write(line.replace(pattern, substring))
remove(source_file_path)
move(target_file_path, source_file_path)
You can find the full snippet here.
fileinput is quite straightforward as mentioned on previous answers:
import fileinput
def replace_in_file(file_path, search_text, new_text):
with fileinput.input(file_path, inplace=True) as file:
for line in file:
new_line = line.replace(search_text, new_text)
print(new_line, end='')
Explanation:
fileinput can accept multiple files, but I prefer to close each single file as soon as it is being processed. So placed single file_path in with statement.
print statement does not print anything when inplace=True, because STDOUT is being forwarded to the original file.
end='' in print statement is to eliminate intermediate blank new lines.
You can used it as follows:
file_path = '/path/to/my/file'
replace_in_file(file_path, 'old-text', 'new-text')
Create a new file, copy lines from the old to the new, and do the replacing before you write the lines to the new file.
Expanding on #Kiran's answer, which I agree is more succinct and Pythonic, this adds codecs to support the reading and writing of UTF-8:
import codecs
from tempfile import mkstemp
from shutil import move
from os import remove
def replace(source_file_path, pattern, substring):
fh, target_file_path = mkstemp()
with codecs.open(target_file_path, 'w', 'utf-8') as target_file:
with codecs.open(source_file_path, 'r', 'utf-8') as source_file:
for line in source_file:
target_file.write(line.replace(pattern, substring))
remove(source_file_path)
move(target_file_path, source_file_path)
Using hamishmcn's answer as a template I was able to search for a line in a file that match my regex and replacing it with empty string.
import re
fin = open("in.txt", 'r') # in file
fout = open("out.txt", 'w') # out file
for line in fin:
p = re.compile('[-][0-9]*[.][0-9]*[,]|[-][0-9]*[,]') # pattern
newline = p.sub('',line) # replace matching strings with empty string
print newline
fout.write(newline)
fin.close()
fout.close()
if you remove the indent at the like below, it will search and replace in multiple line.
See below for example.
def replace(file, pattern, subst):
#Create temp file
fh, abs_path = mkstemp()
print fh, abs_path
new_file = open(abs_path,'w')
old_file = open(file)
for line in old_file:
new_file.write(line.replace(pattern, subst))
#close temp file
new_file.close()
close(fh)
old_file.close()
#Remove original file
remove(file)
#Move new file
move(abs_path, file)
I am trying to edit a file as follows in python 2.7.10 and running into below error, can anyone provide guidance on what the issue is on how to edit files?
import fileinput,re
filename = 'epivers.h'
text_to_search = re.compile("#define EPI_VERSION_STR \"(\d+\.\d+) (TOB) (r(\d+) ASSRT)\"")
replacement_text = "#define EPI_VERSION_STR \"9.130.27.50.1.2.3 (r749679 ASSRT)\""
with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
for line in file:
print(line.replace(text_to_search, replacement_text))
file.close()
Error:-
Traceback (most recent call last):
File "pythonfiledit.py", line 5, in <module>
with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
AttributeError: FileInput instance has no attribute '__exit__'
UPDATE:
import fileinput,re
import os
import shutil
import sys
import tempfile
filename = 'epivers.h'
text_to_search = re.compile("#define EPI_VERSION_STR \"(\d+\.\d+) (TOB) (r(\d+) ASSRT)\"")
replacement_text = "#define EPI_VERSION_STR \"9.130.27.50.1.2.3 (r749679 ASSRT)\""
with open(filename) as src, tempfile.NamedTemporaryFile(
'w', dir=os.path.dirname(filename), delete=False) as dst:
# Discard first line
for line in src:
if text_to_search.search(line):
# Save the new first line
line = text_to_search .sub(replacement_text,line)
dst.write(line + '\n')
dst.write(line)
# remove old version
os.unlink(filename)
# rename new version
os.rename(dst.name,filename)
I am trying to match line define EPI_VERSION_STR "9.130.27 (TOB) (r749679 ASSRT)"
If r is a compiled regular expression and line is a line of text, the way to apply the regex is
r.match(line)
to find a match at the beginning of line, or
r.search(line)
to find a match anywhere. In your particular case, you simply need
line = r.sub(replacement, line)
though in addition, you'll need to add a backslash before the round parentheses in your regex in order to match them literally (except in a few places where you apparently put in grouping parentheses around the \d+ for no particular reason; maybe just take those out).
Your example input string contains three digits, and the replacement string contains six digits, so \d+\.\d+ will never match either of those. I'm guessing you want something like \d+(?:\.\d+)+ or perhaps very bluntly [\d.]+ if the periods can be adjacent.
Furthermore, a single backslash in a string will be interpreted by Python, before it gets passed to the regex engine. You'll want to use raw strings around regexes, nearly always. For improved legibility, perhaps also prefer single quotes or triple double quotes over regular double quotes, so you don't have to backslash the double quotes within the regex.
Finally, your usage of fileinput is wrong. You can't use it as a context manager. Just loop over the lines which fileinput.input() returns.
import fileinput, re
filename = 'epivers.h'
text_to_search = re.compile(r'#define EPI_VERSION_STR "\d+(?:\.\d+)+ \(TOB\) \(r\d+ ASSRT\)"')
replacement_text = '#define EPI_VERSION_STR "9.130.27.50.1.2.3 (r749679 ASSRT)"'
for line in fileinput.input(filename, inplace=True, backup='.bak'):
print(text_to_search.sub(replacement_text, line))
In your first attempt, line.replace() was a good start, but it doesn't accept a regex argument (and of course, you don't close() a file you opened with with ...). In your second attempt, you are checking whether the line is identical to the regex, which of course it isn't (just like the string "two" isn't equivalent to the numeric constant 2).
Read the file, use re.sub to substitute, then write the new contents back:
with open(filename) as f:
text = f.read()
new_text = re.sub(r'#define EPI_VERSION_STR "\d+\(?:.\d+\)+ \(TOB\) \(r\d+ ASSRT\)"',
'#define EPI_VERSION_STR "9.130.27.50.1.2.3 (r749679 ASSRT)"',
text)
with open(filename, 'w') as f:
f.write(new_text)
there are multiple files in directory with extension .txt, .dox, .qcr etc.
i need to list out txt files, search & replace the text from each txt files only.
need to search the $$\d ...where \d stands for the digit 1,2,3.....100.
need to replace with xxx.
please let me know the python script for this .
thanks in advance .
-Shrinivas
#created following script, it works for single txt files, but it is not working for txt files more than one lies in directory.
-----
def replaceAll(file,searchExp,replaceExp):
for line in fileinput.input(file, inplace=1):
if searchExp in line:
line = line.replace(searchExp,replaceExp)
sys.stdout.write(line)
#following code is not working, i expect to list out the files start #with "um_*.txt", open the file & replace the "$$\d" with replaceAll function.
for um_file in glob.glob('*.txt'):
t = open(um_file, 'r')
replaceAll("t.read","$$\d","xxx")
t.close()
fileinput.input(...) is supposed to process a bunch of files, and must be ended with a corresponding fileinput.close(). So you can either process all in one single call:
def replaceAll(file,searchExp,replaceExp):
for line in fileinput.input(file, inplace=True):
if searchExp in line:
line = line.replace(searchExp,replaceExp)
dummy = sys.stdout.write(line) # to avoid a possible output of the size
fileinput.close() # to orderly close everythin
replaceAll(glob.glob('*.txt'), "$$\d","xxx")
or consistently close fileinput after processing each file, but it rather ignores the main fileinput feature.
Try out this.
import re
def replaceAll(file,searchExp,replaceExp):
for line in file.readlines():
try:
line = line.replace(re.findall(searchExp,line)[0],replaceExp)
except:
pass
sys.stdout.write(line)
#following code is not working, i expect to list out the files start #with "um_*.txt", open the file & replace the "$$\d" with replaceAll function.
for um_file in glob.glob('*.txt'):
t = open(um_file, 'r')
replaceAll(t,"\d+","xxx")
t.close()
Here we are sending file handler to the replaceAll function rather than a string.
You can try this:
import os
import re
the_files = [i for i in os.listdir("foldername") if i.endswith("txt")]
for file in the_files:
new_data = re.sub("\d+", "xxx", open(file).read())
final_file = open(file, 'w')
final_file.write(new_data)
final_file.close()
Instead on reading each and every line cant we just search for the string in the file and replace it... i am trying but unable to get any idea how to do thth?
file = open(C:\\path.txt,"r+")
lines = file.readlines()
replaceDone=0
file.seek(0)
newString="set:: windows32\n"
for l in lines:
if (re.search("(address_num)",l,re.I) replaceDone==0:
try:
file.write(l.replace(l,newString))
replaceDone=1
except IOError:
file.close()
Here's an example you can adapt that replaces every sequence of '(address_num)' with 'set:: windows32' for a file:
import fileinput
import re
for line in fileinput.input('/home/jon/data.txt', inplace=True):
print re.sub('address_num', 'set:: windows32', line, flags=re.I),
This is not very memory efficient but I guess it is what you are looking for:
import re
text = open(file_path, 'r').read()
open(file_path, 'w').write(re.sub(old_string, new_string, text))
Read the whole file, replace and write back the whole file.
I want to loop over the contents of a text file and do a search and replace on some lines and write the result back to the file. I could first load the whole file in memory and then write it back, but that probably is not the best way to do it.
What is the best way to do this, within the following code?
f = open(file)
for line in f:
if line.contains('foo'):
newline = line.replace('foo', 'bar')
# how to write this newline back to the file
The shortest way would probably be to use the fileinput module. For example, the following adds line numbers to a file, in-place:
import fileinput
for line in fileinput.input("test.txt", inplace=True):
print('{} {}'.format(fileinput.filelineno(), line), end='') # for Python 3
# print "%d: %s" % (fileinput.filelineno(), line), # for Python 2
What happens here is:
The original file is moved to a backup file
The standard output is redirected to the original file within the loop
Thus any print statements write back into the original file
fileinput has more bells and whistles. For example, it can be used to automatically operate on all files in sys.args[1:], without your having to iterate over them explicitly. Starting with Python 3.2 it also provides a convenient context manager for use in a with statement.
While fileinput is great for throwaway scripts, I would be wary of using it in real code because admittedly it's not very readable or familiar. In real (production) code it's worthwhile to spend just a few more lines of code to make the process explicit and thus make the code readable.
There are two options:
The file is not overly large, and you can just read it wholly to memory. Then close the file, reopen it in writing mode and write the modified contents back.
The file is too large to be stored in memory; you can move it over to a temporary file and open that, reading it line by line, writing back into the original file. Note that this requires twice the storage.
I guess something like this should do it. It basically writes the content to a new file and replaces the old file with the new file:
from tempfile import mkstemp
from shutil import move, copymode
from os import fdopen, remove
def replace(file_path, pattern, subst):
#Create temp file
fh, abs_path = mkstemp()
with fdopen(fh,'w') as new_file:
with open(file_path) as old_file:
for line in old_file:
new_file.write(line.replace(pattern, subst))
#Copy the file permissions from the old file to the new file
copymode(file_path, abs_path)
#Remove original file
remove(file_path)
#Move new file
move(abs_path, file_path)
Here's another example that was tested, and will match search & replace patterns:
import fileinput
import sys
def replaceAll(file,searchExp,replaceExp):
for line in fileinput.input(file, inplace=1):
if searchExp in line:
line = line.replace(searchExp,replaceExp)
sys.stdout.write(line)
Example use:
replaceAll("/fooBar.txt","Hello\sWorld!$","Goodbye\sWorld.")
This should work: (inplace editing)
import fileinput
# Does a list of files, and
# redirects STDOUT to the file in question
for line in fileinput.input(files, inplace = 1):
print line.replace("foo", "bar"),
Based on the answer by Thomas Watnedal.
However, this does not answer the line-to-line part of the original question exactly. The function can still replace on a line-to-line basis
This implementation replaces the file contents without using temporary files, as a consequence file permissions remain unchanged.
Also re.sub instead of replace, allows regex replacement instead of plain text replacement only.
Reading the file as a single string instead of line by line allows for multiline match and replacement.
import re
def replace(file, pattern, subst):
# Read contents from file as a single string
file_handle = open(file, 'r')
file_string = file_handle.read()
file_handle.close()
# Use RE package to allow for replacement (also allowing for (multiline) REGEX)
file_string = (re.sub(pattern, subst, file_string))
# Write contents to file.
# Using mode 'w' truncates the file.
file_handle = open(file, 'w')
file_handle.write(file_string)
file_handle.close()
As lassevk suggests, write out the new file as you go, here is some example code:
fin = open("a.txt")
fout = open("b.txt", "wt")
for line in fin:
fout.write( line.replace('foo', 'bar') )
fin.close()
fout.close()
If you're wanting a generic function that replaces any text with some other text, this is likely the best way to go, particularly if you're a fan of regex's:
import re
def replace( filePath, text, subs, flags=0 ):
with open( filePath, "r+" ) as file:
fileContents = file.read()
textPattern = re.compile( re.escape( text ), flags )
fileContents = textPattern.sub( subs, fileContents )
file.seek( 0 )
file.truncate()
file.write( fileContents )
A more pythonic way would be to use context managers like the code below:
from tempfile import mkstemp
from shutil import move
from os import remove
def replace(source_file_path, pattern, substring):
fh, target_file_path = mkstemp()
with open(target_file_path, 'w') as target_file:
with open(source_file_path, 'r') as source_file:
for line in source_file:
target_file.write(line.replace(pattern, substring))
remove(source_file_path)
move(target_file_path, source_file_path)
You can find the full snippet here.
fileinput is quite straightforward as mentioned on previous answers:
import fileinput
def replace_in_file(file_path, search_text, new_text):
with fileinput.input(file_path, inplace=True) as file:
for line in file:
new_line = line.replace(search_text, new_text)
print(new_line, end='')
Explanation:
fileinput can accept multiple files, but I prefer to close each single file as soon as it is being processed. So placed single file_path in with statement.
print statement does not print anything when inplace=True, because STDOUT is being forwarded to the original file.
end='' in print statement is to eliminate intermediate blank new lines.
You can used it as follows:
file_path = '/path/to/my/file'
replace_in_file(file_path, 'old-text', 'new-text')
Create a new file, copy lines from the old to the new, and do the replacing before you write the lines to the new file.
Expanding on #Kiran's answer, which I agree is more succinct and Pythonic, this adds codecs to support the reading and writing of UTF-8:
import codecs
from tempfile import mkstemp
from shutil import move
from os import remove
def replace(source_file_path, pattern, substring):
fh, target_file_path = mkstemp()
with codecs.open(target_file_path, 'w', 'utf-8') as target_file:
with codecs.open(source_file_path, 'r', 'utf-8') as source_file:
for line in source_file:
target_file.write(line.replace(pattern, substring))
remove(source_file_path)
move(target_file_path, source_file_path)
Using hamishmcn's answer as a template I was able to search for a line in a file that match my regex and replacing it with empty string.
import re
fin = open("in.txt", 'r') # in file
fout = open("out.txt", 'w') # out file
for line in fin:
p = re.compile('[-][0-9]*[.][0-9]*[,]|[-][0-9]*[,]') # pattern
newline = p.sub('',line) # replace matching strings with empty string
print newline
fout.write(newline)
fin.close()
fout.close()
if you remove the indent at the like below, it will search and replace in multiple line.
See below for example.
def replace(file, pattern, subst):
#Create temp file
fh, abs_path = mkstemp()
print fh, abs_path
new_file = open(abs_path,'w')
old_file = open(file)
for line in old_file:
new_file.write(line.replace(pattern, subst))
#close temp file
new_file.close()
close(fh)
old_file.close()
#Remove original file
remove(file)
#Move new file
move(abs_path, file)