String replacement on a whole text file in Python 3.x?

How can I replace a string with another string within a given text file? Do I just loop through readline() and run the replacement while saving out to a new file? Or is there a better way?
I'm thinking that I could read the whole thing into memory, but I'm looking for a more elegant solution...
Thanks in advance

fileinput is the module from the Python standard library that supports "what looks like in-place updating of text files" as well as various other related tasks.
import fileinput

for line in fileinput.input(['thefile.txt'], inplace=True):
    print(line.replace('old stuff', 'shiny new stuff'), end='')
This code is all you need for the specific task you mentioned -- it deals with all of the issues (writing to a different file, removing the old one when done, and replacing it with the new one). You can also add a further parameter such as backup='.bk' to automatically preserve the old file as (in this case) thefile.txt.bk, as well as process multiple files, take the filenames to process from the command line, and so on -- do read the docs, they're quite good (and so is the module I'm suggesting!-).
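For instance, a minimal sketch of the backup variant (same hypothetical filename as above):
import fileinput

for line in fileinput.input(['thefile.txt'], inplace=True, backup='.bk'):
    print(line.replace('old stuff', 'shiny new stuff'), end='')
After this runs, thefile.txt holds the edited text and thefile.txt.bk the original.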

If the file can be read into memory at once, I'd say that
old = myfile.read()
new = old.replace("find this", "replace by this")
output.write(new)
is at least as readable as
for line in myfile:
    output.write(line.replace("find this", "replace by this"))
and it might be a little faster, but in the end it probably doesn't really matter.
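A complete version of the whole-file approach, with placeholder input and output filenames, might look like:
with open("input.txt") as myfile:
    old = myfile.read()
new = old.replace("find this", "replace by this")
with open("output.txt", "w") as output:
    output.write(new)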

Related

How can I read four specific lines of a file without reading the whole file in python?

I need to read 4 specific lines of a file in Python. I don't want to read the whole file and then pick four lines out of it (for the sake of memory). Does anyone know how to do that?
Thanks!
P.S. I used the following code, but apparently it reads the whole file and then takes four lines out of it.
a = open("file", "r")
b = a.readlines()[c:d]
You have to read at least up to the lines you are interested in. You can use itertools.islice to grab a slice:
import itertools

interesting_lines = list(itertools.islice(a, c, d))
but it still reads up to those lines.
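Putting that together, a minimal runnable sketch (the bounds c and d are hypothetical; islice treats them as 0-based start and end line indices):
import itertools

c, d = 10, 14  # hypothetical: grab lines 10 through 13
with open("file") as a:
    interesting_lines = list(itertools.islice(a, c, d))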
Files, at least on Macs and Windows and Linux and other UNIXy systems, are just streams of bytes; there's no concept of "line" in the file structure, just bytes that happen to represent newline characters. So the only way to find the Nth line in the file is to start at the beginning and read until you've found (N-1) newlines. You don't have to store all the content you scan through, but you do have to read it.
Then you have to read and store from that point until you find 4 more newlines.
You can do this in Python, but it's not clear to me that it's a win compared to using the straightforward approach that reads more than it needs to; feels like premature optimization to me.

How to copy a JSON file in another JSON file, with Python

I want to copy the contents of a JSON file in another JSON file, with Python
Any ideas?
Thank you :)
Given the lack of research effort, I normally wouldn't answer, but given the poor suggestions in comments, I'll bite and give a better option.
Now, this largely depends on what you mean: do you wish to overwrite the contents of one file with another, or insert one into the other? The latter can be done like so:
import json

with open("from.json", "r") as src, open("to.json", "r") as dst:
    to_insert = json.load(src)
    destination = json.load(dst)
    destination.append(to_insert)  # The exact nature of this line varies. See below.
with open("to.json", "w") as dst:
    json.dump(destination, dst)
(Note that from is a reserved word in Python, so it can't be used as a variable name; I've used src and dst instead. Also note the argument order: json.dump takes the object first, then the file.)
This uses Python's json module, which allows us to do this very easily.
We open the two files for reading, then open the destination file again in write mode to truncate it and write to it.
The marked line depends on the JSON data structure; here I am appending it to the root list element (which might not exist), but you may want to place it at a particular dict key, or some such.
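For example, if the root of to.json is a dict rather than a list, the marked line might instead become something like this (the key name is purely hypothetical):
destination["imported"] = to_insert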
In the case of replacing the contents, it becomes easier:
with open("from.json", "r") as src, open("to.json", "w") as dst:
    dst.write(src.read())
Here we literally just read the data out of one file and write it into the other file.
Of course, you may wish to check the data is JSON, in which case, you can use the JSON methods as in the first solution, which will throw exceptions on invalid data.
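A sketch of that validating variant, which parses the source before writing anything:
with open("from.json", "r") as src:
    data = json.load(src)  # raises ValueError if from.json isn't valid JSON
with open("to.json", "w") as dst:
    json.dump(data, dst)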
Another, arguably better, solution to this could also be shutil's copy methods, which would avoid actually reading or writing the file contents manually.
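For instance, a one-line sketch using shutil:
import shutil

shutil.copyfile("from.json", "to.json")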
Using the with statement gives us the benefit of automatically closing our files - even if exceptions occur. It's best to always use them where we can.
Note that in versions of Python before 2.7, the with statement cannot handle multiple context managers at once; instead you will need to nest them:
with open("from.json", "r") as src:
    with open("to.json", "r+") as dst:
        ...

How can I alter a file and write only the changes to disk - basically, sed (python)?

Let's say I have a file /etc/conf1
it's contents are along the lines of
option = banana
name = monkey
operation = eat
and let's say I want to replace "monkey" with "ostrich". How can I do that without reading the file to memory, altering it and then just writing it all back? Basically, how can I modify the file "in place"?
You can't. "ostrich" is one letter more than "monkey", so you'll have to rewrite the file at least from that point onwards. File systems do not support "shifting" file contents upwards or downwards.
If it's just a small file, there's no reason to bother with even this, and you might as well rewrite the whole file.
If it's a really large file, you'll need to reconsider the internal design of the file's contents, for example, with a block-based approach.
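For the small-file case, a minimal sketch of the rewrite-everything approach:
with open("/etc/conf1") as f:
    text = f.read()
with open("/etc/conf1", "w") as f:
    f.write(text.replace("monkey", "ostrich"))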
You should look at the fileinput module:
http://docs.python.org/library/fileinput.html
There's an option to perform inplace editing via the input method:
http://docs.python.org/library/fileinput.html#fileinput.input
UPDATE - example code:
import fileinput
import re
import sys

for line in fileinput.input(inplace=True):
    sys.stdout.write(re.sub(r'monkey', 'ostrich', line))
Using sys.stdout.write so as not to add any extra newlines in.
It depends on what you mean by "in place". How can you do it if you want to replace monkey with supercalifragilisticexpialidocious? Do you want to overwrite the remaining file? If not, you are going to have to read ahead and shift subsequent contents of the file forwards.
CPU instructions operate on data which come from memory.
The portion of the file you wish to read must be resident in memory before you can read it; before you write anything to disk, that information must be in memory.
The whole file doesn't have to be there at once, but to do a search-replace on an entire file, every character of the file will pass through RAM at some point.
What you're probably looking for is something like the mmap() system call. The above fileinput module sounds like a plausible thing to use.
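A sketch of a true in-place overwrite via mmap; this only works when the replacement has exactly the same length as the original, so "weasel" here is a hypothetical six-letter stand-in (unlike "ostrich"):
import mmap

with open("/etc/conf1", "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)
    idx = mm.find(b"monkey")
    if idx != -1:
        mm[idx:idx + len(b"monkey")] = b"weasel"  # same length, so nothing shifts
    mm.close()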
In-place modifications are only easy if you don't alter the size of the file or only append to it. The following example replaces the first byte of the file with an "a" character:
import os

fd = os.open("...", os.O_WRONLY | os.O_CREAT)
os.write(fd, b"a")  # os.write takes bytes in Python 3
os.close(fd)
Note that Python's file objects don't support this; you have to use the low-level functions. For appending, open the file with the open() function in "a" mode.
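A sketch of appending with a regular file object, as that note suggests (the appended line is hypothetical):
with open("/etc/conf1", "a") as f:
    f.write("comment = appended\n")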
sed -i.bak 's/monkey/ostrich/' file

Reading a csv file in Python with different line terminator

I have a file in CSV format where the delimiter is the ASCII unit separator ^_ and the line terminator is the ASCII record separator ^^ (obviously, since these are nonprinting characters, I've just used one of the standard ways of writing them here). I've written plenty of code that reads and writes CSV files, so my issue isn't with Python's csv module per se. The problem is that the csv module doesn't support reading (but it does support writing) line terminators other than a carriage return or line feed, at least as of Python 2.6 where I just tested it. The documentation says that this is because it's hard coded, which I take to mean it's done in the C code that underlies the module, since I didn't see anything in the csv.py file that I could change.
Does anyone know a way around this limitation (patch, another CSV module, etc.)? I really need to read in a file where I can't use carriage returns or new lines as the line terminator because those characters will appear in some of the fields, and I'd like to avoid writing my own custom reader code if possible, even though that would be rather simple to meet my needs.
Why not supply a custom iterable to the csv.reader function? Here is a naive implementation which reads the entire contents of the CSV file into memory at once (which may or may not be desirable, depending on the size of the file):
import csv

def records(path):
    with open(path) as f:
        contents = f.read()
    # split on the ASCII record separator (0x1E), written as ^^ in the question
    return (record for record in contents.split('\x1e'))

csv.reader(records('input.csv'), delimiter='\x1f')
I think that should work.
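Hypothetical usage, printing each parsed row:
for row in csv.reader(records('input.csv'), delimiter='\x1f'):
    print(row)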

Python Script to find instances of a set of strings in a set of files

I have a file which I use to centralize all strings used in my application. Let's call it Strings.txt:
TITLE="Title"
T_AND_C="Accept my terms and conditions please"
START_BUTTON="Start"
BACK_BUTTON="Back"
...
This helps me with i18n, but the issue is that my application is now a lot larger and has evolved. As such, a lot of these strings are probably not used anymore. I want to eliminate the ones that have gone and tidy up the file.
I want to write a Python script; using regular expressions I can get all of the string aliases, but how can I search all files in a Java package hierarchy for an instance of a string? If there is a reason to use perl or bash instead then let me know, but I'd prefer to stick to one scripting language.
Please ask for clarification if this doesn't make sense, hopefully this is straightforward, I just haven't used python much.
Thanks in advance,
Gav
Assuming the files are of reasonable size (as source files will be) so you can easily read them in memory, and that you're looking for the parts in quotes right of the = signs:
import collections
import os

files_by_str = collections.defaultdict(list)
thestrings = []
with open('Strings.txt') as f:
    for line in f:
        text = line.split('=', 1)[1]
        text = text.strip().replace('"', '')
        thestrings.append(text)

for root, dirs, files in os.walk('/top/dir/of/interest'):
    for name in files:
        path = os.path.join(root, name)
        with open(path) as f:
            data = f.read()
        for text in thestrings:
            if text in data:
                files_by_str[text].append(path)  # no break: record every text this file contains
This gives you a dict with the texts (those that are present in 1+ files, only), as keys, and lists of the paths to the files containing them as values. If you care only about a yes/no answer to the question "is this text present somewhere", and don't care where, you can save some memory by keeping only a set instead of the defaultdict; but I think that often knowing what files contained each text will be useful, so I suggest this more complete version.
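A sketch of that set-based variant (reusing thestrings and the walk from the code above): it records only which texts appear somewhere, not where:
found = set()
for root, dirs, files in os.walk('/top/dir/of/interest'):
    for name in files:
        with open(os.path.join(root, name)) as f:
            data = f.read()
        for text in thestrings:
            if text in data:
                found.add(text)
unused = [text for text in thestrings if text not in found]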
To parse your Strings.txt you don't need regular expressions:
all_strings = [line.partition('=')[0] for line in open('Strings.txt')]
To parse your source you could use the dumbest regex:
re.search(r'\bTITLE\b', source)  # for each string in all_strings
To walk the source directory you could use os.walk.
A successful re.search would mean that you need to remove that string from all_strings: you'll be left with the strings that need to be removed from Strings.txt.
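Tying those three steps together, a hedged sketch (the source root 'src' and the .java filter are assumptions):
import os
import re

all_strings = [line.partition('=')[0].strip() for line in open('Strings.txt')]
unused = set(all_strings)
for root, dirs, files in os.walk('src'):
    for name in files:
        if not name.endswith('.java'):
            continue
        with open(os.path.join(root, name)) as f:
            source = f.read()
        for s in list(unused):
            if re.search(r'\b%s\b' % re.escape(s), source):
                unused.discard(s)
# whatever remains in 'unused' can be deleted from Strings.txt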
You might consider using ack.
% ack --java 'search_string'
This will search under the current directory.
You should consider using YAML: easy to use, human readable.
You are re-inventing gettext, the standard for translating programs in the Free Software sphere (even outside python).
Gettext works, in principle, with large files of strings like these :-). Helper programs exist to merge newly marked strings from the source into all translated versions, to flag unused strings, and so on. Perhaps you should take a look at it.
