Read in file - change contents - write out to same file - python

I have to read in a file, change sections of the text here and there, and then write out to the same file.
Currently I do:
f = open(file)
file_str = f.read()  # read it in as a single string, not line by line
f.close()
#
# do_actions_on_file_str
#
f = open(file, 'w') # to clear the file
f.write(file_str)
f.close()
But I would imagine that there is a more pythonic approach that yields the same result.
Suggestions?

That looks straightforward and clear already. Any suggestion depends on how big the files are: if they are not really huge, that looks fine. If they are really large, you could process in chunks, as sketched below.
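For really large files, here is a minimal sketch of chunk-wise processing, assuming a process() transform and file path of your own; it writes to a temporary file and then swaps it over the original, though note that a pattern spanning a chunk boundary would be missed:
import os
import tempfile

def process(chunk):  # hypothetical transform; substitute your own
    return chunk.replace('old', 'new')

filename = 'big_file.txt'  # hypothetical path
CHUNK_SIZE = 1 << 20  # read 1 MiB at a time

with open(filename) as src, tempfile.NamedTemporaryFile(
        'w', dir=os.path.dirname(os.path.abspath(filename)),
        delete=False) as tmp:
    while True:
        chunk = src.read(CHUNK_SIZE)
        if not chunk:
            break
        tmp.write(process(chunk))
os.replace(tmp.name, filename)  # atomically swap the processed copy in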
But you could use a context manager to avoid the explicit closes:
with open(filename) as f:
    file_str = f.read()
# do stuff with file_str
with open(filename, "w") as f:
    f.write(file_str)

If you work line by line, you can use fileinput in inplace mode; with inplace=True, standard output is redirected to the file, so whatever you print replaces the original line:
import fileinput

for line in fileinput.input(myfile, inplace=True):
    print(process(line), end='')  # process() is your transformation; line keeps its trailing newline
If you need to process all the text at once, then your code can be optimized a bit using with, which takes care of closing the file:
with open(myfile) as f:
    file_str = f.read()
#
# do_actions_on_file_str
#
with open(myfile, 'w') as f:
    f.write(file_str)

Related

Manipulating text in a text file in the case of threading

I am using this code to pull the first line from a text file in threaded mode, before deleting it from the file:
with open(r'C:\datanames\names.txt', 'r') as fin:
    name = fin.readline()
with open(r'C:\datanames\names.txt', 'r') as fin:
    data = fin.read().splitlines(True)
with open(r'C:\datanames\names.txt', 'w') as fout:
    fout.writelines(data[1:])
But it often makes me lose data.
Is there a more efficient and practical way to do this in such a situation? (threading)
I see no reason to use threading for this. It's very straightforward.
To remove the first line from a file do this:
FILENAME = 'foo.txt'

with open(FILENAME, 'r+') as file:
    lines = file.readlines()
    file.seek(0)
    file.writelines(lines[1:])
    file.truncate()
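If several threads really do pop lines from the same file concurrently, the read-modify-write above is not atomic, which would explain the lost data. A minimal sketch that serializes access with a lock (the lock object and the pop_first_line helper are illustrative, not from the original post):
import threading

FILENAME = 'foo.txt'
lock = threading.Lock()  # share this one lock across all worker threads

def pop_first_line():
    """Atomically take and remove the first line; returns None when the file is empty."""
    with lock:  # only one thread may do the read-modify-write at a time
        with open(FILENAME, 'r+') as file:
            lines = file.readlines()
            if not lines:
                return None
            file.seek(0)
            file.writelines(lines[1:])
            file.truncate()
            return lines[0].rstrip('\n')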

os.write() appends to the file instead of overwriting, but O_APPEND isn't used [duplicate]

I have the following code:
import re
#open the xml file for reading:
file = open('path/test.xml','r+')
#convert to string:
data = file.read()
file.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
file.close()
where I'd like to replace the old content that's in the file with the new content. However, when I execute my code, the file "test.xml" is appended to, i.e. I have the old content followed by the new "replaced" content. What can I do in order to delete the old stuff and only keep the new?
You need to seek to the beginning of the file before writing, and then use file.truncate() if you want to do an in-place replace:
import re

myfile = "path/test.xml"

with open(myfile, "r+") as f:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
    f.truncate()
The other way is to read the file and then open it again with open(myfile, 'w'):
with open(myfile, "r") as f:
data = f.read()
with open(myfile, "w") as f:
f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
Neither truncate nor open(..., 'w') will change the inode number of the file (I tested twice, once with Ubuntu 12.04 NFS and once with ext4), as the quick check below illustrates.
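A small sketch to verify the inode claim yourself (os.stat() reports the inode as st_ino):
import os

myfile = "path/test.xml"
before = os.stat(myfile).st_ino
with open(myfile, "w") as f:
    f.write("new content")
after = os.stat(myfile).st_ino
assert before == after  # same inode: the file was truncated in place, not replaced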
By the way, this is not really related to Python. The interpreter calls the corresponding low-level API. The method truncate() works the same in the C programming language: see http://man7.org/linux/man-pages/man2/truncate.2.html
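For the curious, the same seek/write/truncate dance can be spelled out with the low-level os calls that the file object wraps (a sketch, not from the original answers):
import os
import re

fd = os.open("path/test.xml", os.O_RDWR)  # no O_APPEND: writes go wherever the offset is
data = os.read(fd, os.fstat(fd).st_size).decode()  # reading moves the offset to the end
new = re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",
             r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data)
os.lseek(fd, 0, os.SEEK_SET)  # rewind, the low-level f.seek(0)
os.write(fd, new.encode())
os.ftruncate(fd, len(new.encode()))  # cut off leftovers, the low-level f.truncate()
os.close(fd)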
file = 'path/test.xml'

with open(file, 'w') as filetowrite:
    filetowrite.write('new content')
Open the file in 'w' mode and you will be able to replace its current text, saving the file with the new contents. Note that 'w' truncates the file as soon as it is opened, so read any old contents you need before opening for writing.
Using truncate(), the solution could be:
import re

# open the xml file for reading and writing:
with open('path/test.xml', 'r+') as f:
    # convert to string:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
    f.truncate()
import os  # must import this library

if os.path.exists('TwitterDB.csv'):
    os.remove('TwitterDB.csv')  # this deletes the file
else:
    print("The file does not exist")  # add this to prevent errors
I had a similar problem: instead of overwriting my existing file using the different 'modes', I just deleted the file before using it again, so that it would be as if I were appending to a new file on each run of my code.
See How to Replace String in File, which works in a simple way and is an answer that uses replace:
fin = open("data.txt", "rt")
fout = open("out.txt", "wt")

for line in fin:
    fout.write(line.replace('pyton', 'python'))

fin.close()
fout.close()
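Since the question is about writing back to the same file, you can then move the new file over the original; a small addition that is not part of the linked answer (os.replace is atomic when both paths are on the same filesystem):
import os

os.replace("out.txt", "data.txt")  # swap the corrected copy in over the original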
In my case the following code did the trick:
import json

# w+ mode creates the file if it does not exist and overwrites the existing content
with open("output.json", "w+") as outfile:
    json.dump(result_plot, outfile)
Using the Python 3 pathlib library:
import re
import shutil
from pathlib import Path

shutil.copy2("/tmp/test.xml", "/tmp/test.xml.bak")  # create backup
filepath = Path("/tmp/test.xml")
content = filepath.read_text()
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))
A similar method, using a different approach to backups: rename the original aside and read from the renamed copy. Note that rename() moves the file, so after it the old path no longer exists and you must read from the backup path:
import re
from pathlib import Path

filepath = Path("/tmp/test.xml")
backup = filepath.with_suffix('.bak')
filepath.rename(backup)  # move the original aside as the backup
content = backup.read_text()
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))

Replace newlines with a space in all files in a directory - Python

I have about 4000 txt files in a directory. I'd like to replace newlines with spaces in each file using a for loop. Actually, the script works for that purpose, but when I save the file, it doesn't get saved, or it gets saved with the newlines again. Here is my script:
import glob

path = "path_to_files/*.txt"

for file in glob.glob(path):
    with open(file, "r+") as f:
        data = f.read().replace('\n', ' ')
        f.write(data)
As I said, I'm able to replace the newlines with a space, but in the end it doesn't get saved. I also don't get any errors.
To further elaborate on my comment ("It's almost always a bad idea to open a file in the 'r+' mode, because of the way the current position is handled: after f.read() the position is at the end of the file, so a subsequent f.write() appends. Open a file for reading, read the data, replace the newlines, open the same file for writing, write the data"):
for file in glob.glob(path):
    with open(file) as f:
        data = f.read().replace('\n', ' ')
    with open(file, "w") as f:
        f.write(data)
You need to reset the file position to 0 with seek, and then cut off the leftover with truncate, after you finish writing the replacement string.
import glob

path = "path_to_files/*.txt"

for file in glob.glob(path):
    with open(file, "r+") as f:
        data = f.read().replace('\n', ' ')
        f.seek(0)
        f.write(data)
        f.truncate()

Reading and writing to a file

I have an XML file that contains an illegal character, I am iterating through the file, removing the character from all of the lines and storing the lines in a list. I now want to write those same lines back into the file and overwrite what is already there.
I tried this:
file = open(filename, "r+")
#do stuff
This only appends the results to the end of the file; I would like to overwrite the existing file.
And this:
file = open(filename, "r")
#read from the file
file.close()
file = open(filename, "w")
#write to file
file.close()
This gives me a Bad File Descriptor error.
How can I read and write to the same file?
Thanks
You could write the edited lines list back with the writelines function:
with open(filename, "r") as f:
lines = f.readlines()
#edit lines here
with open(filename, "w") as f:
f.writelines(lines)
The reason you're appending to the end of the file the whole time is that you need to seek to the beginning of the file to write your lines out.
with open(filename, "r+") as file:
lines = file.readlines()
lines = [line.replace(bad_character, '') for line in lines]
file.seek(0)
file.writelines(lines)
file.truncate() # Will get rid of any excess characters left at the end of the file due to the length of your new file being shorter than the old one, as you've removed characters.
(Decided to just use the context manager syntax myself.)

Remove lines from a text file which do not contain a certain string with python

I am trying to form a quotes file of a specific user name in a log file. How do I remove every line that does not contain the specific user name in it? Or how do I write all the lines which contain this user name to a new file?
with open('input.txt', 'r') as rfp:
    with open('output.txt', 'w') as wfp:
        for line in rfp:
            if ilikethis(line):
                wfp.write(line)
with open(logfile) as f_in:
    lines = [l for l in f_in if username in l]
with open(outfile, 'w') as f_out:
    f_out.writelines(lines)
Or, if you don't want to store all the lines in memory, use a generator expression; note that the writing must happen while the input file is still open, because the generator reads lazily:
with open(logfile) as f_in:
    lines = (l for l in f_in if username in l)
    with open(outfile, 'w') as f_out:
        f_out.writelines(lines)
I sort of like the first one better, but for a large file it might drag.
Something along this line should suffice:
newfile = open(newfilename, 'w')
for line in open(filename, 'r'):  # open() replaces the Python 2 file() builtin
    if name in line:
        newfile.write(line)
newfile.close()
See: http://docs.python.org/tutorial/inputoutput.html#methods-of-file-objects
f.readlines() returns a list containing all the lines of data in the file.
An alternative approach to reading lines is to loop over the file object. This is memory efficient, fast, and leads to simpler code:
>>> for line in f:
...     print(line)
You can also check out the use of the with keyword. The advantage is that the file is properly closed after its suite finishes:
>>> with open(filename, 'r') as f:
...     read_data = f.read()
>>> f.closed
True
I know you asked for Python, but if you're on Unix this is a job for grep.
grep name file
If you're not on Unix, well... the answer above does the trick :)
