Replace all regex matches in a file - python

Consider a basic regex like a(.+?)a. How can one replace all occurrences of that regex in a file with the content of the first group?

You can use the re module for regular expressions in Python and the fileinput module to replace text in a file in place.
Example:
import fileinput
import re
fn = "test.txt" # your filename
r = re.compile('a(.+?)a')
for line in fileinput.input(fn, inplace=True):
    match = r.match(line)
    print match.group() if match else line.replace('\n', '')
Before:
hello this
aShouldBeAMatch!!!!! and this should be gone
you know
After:
hello this
aShouldBeAMa
you know
Note: this works because the argument inplace=True causes the input file to be moved to a backup file and standard output to be redirected to the input file, as documented under Optional in-place filtering.
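The example above matches only at the start of each line and prints the whole match. To replace every occurrence with the content of the first group, as the question asks, re.sub with a \1 backreference is one option. The following is a minimal sketch rather than the answer's own code: it uses Python 3 syntax, reads the whole file into memory instead of using fileinput, and the helper names replace_all and replace_in_file are made up for illustration.

```python
import re

PATTERN = re.compile(r'a(.+?)a')

def replace_all(text):
    # Replace each non-overlapping match with the content of group 1.
    return PATTERN.sub(r'\1', text)

def replace_in_file(path):
    # Hypothetical helper: read the whole file, substitute, write it back.
    with open(path) as f:
        text = f.read()
    with open(path, 'w') as f:
        f.write(replace_all(text))
```

Because . does not match a newline by default, a match never spans two lines.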

You can use Notepad++ version 6.0 or later, which supports PCRE regular expressions.
You can then search for your regex a(.+?)a and replace with $1

sed
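Are you limited to using Python tools? Because sed works very well. Note that sed has no lazy quantifiers, so [^a]+ approximates .+? here, and -E enables the unescaped group syntax.
$ sed -i -E 's/a([^a]+)a/\1/g' <filename>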
Vim
In a Vim window, give the following search-and-replace ex command:
:%s/\va(.{-1,})a/\1/g
Note that many regex characters must be escaped in Vim; \v sets "very magic" mode, which removes the need for escaping. Vim writes the lazy quantifier as {-1,} rather than +?. The same command with "magic" (the default) is :%s/a\(.\{-1,}\)a/\1/g
Python
If you're looking to do this in Python, BigYellowCactus' answer is excellent (use the re module for regex, and fileinput to modify the file).

Related

Sed one liner not working in python subprocess

I am trying to incorporate this sed command to remove the last comma in a JSON file.
sed -i -e '1h;1!H;$!d;${s/.*//;x};s/\(.*\),/\1 /' file.json
When I run this on the command line, it works fine. When I try to run it as a subprocess like so, it doesn't work.
Popen("sed -e '1h;1!H;$!d;${s/.*//;x};s/\(.*\),/\1 /' file.json",shell=True).wait()
What am I doing wrong?
It doesn't work because when you write \1 in a regular (non-raw) string, Python interprets it as the character \x01, so sed receives a different, invalid expression.
That is already better:
check_call(["sed","-i","-e",r"1h;1!H;$!d;${s/.*//;x};s/\(.*\),/\1 /","file.json"])
because passing the arguments as a real list and the regex as a raw string has a better chance of working. And check_call is what you need to just run a process without caring about its output.
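The difference between the three spellings can be checked directly in an interpreter. A small illustration (Python 3 syntax; the variable names are arbitrary):

```python
plain = "\1"     # one character: the control character \x01
raw = r"\1"      # two characters: a backslash followed by "1"
escaped = "\\1"  # the same two characters, with the backslash escaped

print(repr(plain), len(plain))
print(repr(raw), len(raw))
print(raw == escaped)
```

Only the raw and escaped forms hand sed the backreference it expects.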
But I would do even better: since python is good at processing files, given your rather simple problem, I would create a fully portable version, no need for sed:
# read the file
with open("file.json") as f:
    contents = f.read().rstrip().rstrip(",")  # strip last newline/space then strip last comma
# write back the file
with open("file.json","w") as f:
    f.write(contents)
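The snippet above only strips a comma at the very end of the file, whereas the sed script appears to replace the last comma anywhere in the file with a space. If that is the behavior needed, str.rpartition gives a close equivalent. This is a sketch under that assumption; drop_last_comma is a hypothetical helper name.

```python
def drop_last_comma(text):
    # Split around the last comma; sep is empty when there is no comma.
    head, sep, tail = text.rpartition(",")
    # Mirror the sed script by putting a space where the comma was.
    return head + " " + tail if sep else text
```

It would be applied to the result of f.read() before writing the file back.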
In general, you might try the following solutions:
Pass the raw string, as was mentioned
Escape the '\' character.
This code also does what you need:
Popen("sed -e '1h;1!H;$!d;${s/.*//;x};s/\(.*\),/\\1 /' file.json", shell=True).wait()
or
try:
    check_call(["sed", "-i", "-e", "1h;1!H;$!d;${s/.*//;x};s/\(.*\),/\\1 /", "file.json"])
except:
    pass # or handle the error

Python one-liner to replace one word with another word in text file

I'm trying to use Python (through a Linux terminal) to replace the word 'example' in following line of a text file text_file.txt:
abcdefgh example uvwxyz
What I want is:
abcdefgh replaced_example uvwxyz
Can I do this with a one-liner in Python?
EDIT:
I have a perl one-liner perl -p -i -e 's#example#replaced_example#' text_file.txt but I want to do it in Python too
You can do it:
python -c 'print open("text_file.txt").read().replace("example","replaced_example")'
But it's rather clunky. Python's syntax isn't designed to make nice one-liners (although frequently it works out that way). Python values clarity above everything else, which is one reason you need to import things to get at its most powerful tools. Because of that, it doesn't lend itself to simple scripts typed at the command line.
I would rather use a tool that is designed for this sort of thing -- e.g. sed:
sed -e 's/example/replaced_example/g' text_file.txt
Incidentally the fileinput module supports inplace modification just like sed -i
-bash-3.2$ python -c '
import fileinput
for line in fileinput.input("text_file.txt", inplace=True):
    print line.replace("example","replaced_example"),
'
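For anything beyond a quick shell command, the same fileinput idea reads more easily as a small function. A minimal sketch in Python 3 syntax; replace_word_in_place is a hypothetical name, not part of the fileinput module:

```python
import fileinput

def replace_word_in_place(path, old, new):
    # With inplace=True, whatever is printed replaces the file's content.
    with fileinput.input(path, inplace=True) as f:
        for line in f:
            # end="" because each line already ends with its own newline
            print(line.replace(old, new), end="")
```

Calling replace_word_in_place("text_file.txt", "example", "replaced_example") would then do what the one-liner does.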

Python replacement of Ruby's grep?

abc=123
dabc=123
abc=456
dabc=789
aabd=123
From the above file I need to find lines beginning with abc= (leading whitespace doesn't matter).
In Ruby I would put this in an array and do
matches = input.grep(/^\s*abc=.*/).map(&:strip)
I'm a total newbie in Python; even calling myself a fresh Python developer would be too much.
Maybe there is a better "Python way" of doing this without even grepping?
The Python version I have available on the platform where I need to solve the problem is 2.6.
Using Ruby is not an option here.
with open("myfile.txt") as myfile:
    matches = [line.rstrip() for line in myfile if line.lstrip().startswith("abc=")]
In Python you would typically use a list comprehension whose if clause does what you'd accomplish with Ruby's grep:
import sys, re
matches = [line.strip() for line in sys.stdin
           if re.match(r'^\s*abc=.*', line)]
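If the pattern changes often, the list comprehension can be wrapped in a small reusable function. This is only a sketch; grep here is a hypothetical helper, not a standard library function, and the sample lines are adapted from the question with leading whitespace added to one of them.

```python
import re

def grep(pattern, lines):
    # Keep the lines where pattern matches at the start, stripped of whitespace.
    regex = re.compile(pattern)
    return [line.strip() for line in lines if regex.match(line)]

sample = ["abc=123\n", "dabc=123\n", "  abc=456\n", "dabc=789\n", "aabd=123\n"]
print(grep(r'\s*abc=', sample))  # ['abc=123', 'abc=456']
```

Since re.match only matches at the beginning of the string, the pattern needs no leading ^.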

python system call

I'm having a difficult time understanding how to get Python to call a system function...
the_file = ('logs/consolidated.log.gz')
webstuff = subprocess.Popen(['/usr/bin/zgrep', '/meatsauce/', the_file ],stdout=subprocess.PIPE) % dpt_search
for line in webstuff.stdout:
    print line
Trying to get python to build another file with my search string.
Thanks!
I recommend the PyMotW Subprocess page from Doug Hellmann who (quoted) "Reads the docs so you don't have to"
Apart from that:
f = open('sourcefile')
for line in f:
    if 'pattern' in line:
        # mind the , at the end,
        # since there's no stripping involved
        # and print adds a newline without it
        print line,
If you need to match regular expressions, see the documentation for the re module in the Python Standard Library, and also the PyMotW Regular Expression page.
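Since the log in the question is gzip-compressed, the plain file loop would need the standard gzip module to read it. A minimal sketch in Python 3 syntax that writes matching lines to another file; grep_gz and its parameter names are hypothetical, and it does a plain substring test rather than a regex match:

```python
import gzip

def grep_gz(pattern, src, dst):
    # Copy every line of the gzip-compressed file src that contains
    # the plain substring pattern into the text file dst.
    count = 0
    with gzip.open(src, "rt") as infile, open(dst, "w") as outfile:
        for line in infile:
            if pattern in line:
                outfile.write(line)
                count += 1
    return count  # number of lines written
```

This avoids the subprocess call to zgrep entirely.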

Replace part of string using python regular expression

I have the following lines (many, many):
...
gfnfgnfgnf: 5656756734
arvervfdsa: 1343453563
particular: 4685685685
erveveersd: 3453454545
verveversf: 7896789567
..
What I'd like to do is to find the line 'particular' (whatever number is after ':')
and replace this number with '111222333'. How can I do that using Python regular expressions?
for line in input:
    key, val = line.split(':')
    if key == 'particular':
        val = '111222333'
I'm not sure regex would be of any value in this specific case. My guess is they'd be slower. That said, it can be done. Here's one way:
for line in input:
    line = re.sub(r'^particular: .*', 'particular: 111222333', line)
There are subtleties involved in this, and this is almost certainly not what you'd want in production code. You need to check all of the re module constants to make sure the regex is acting the way you expect, etc. You might be surprised at the flexibility you find in dealing with problems like this in Python if you try not to use re (of course, this isn't to say re isn't useful) ;-)
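One such subtlety shows up when the replacement keeps part of the match through a group: a reference written as \1 followed directly by digits is not read as group 1, so \g<1> is the safer spelling. A minimal sketch; set_particular is a hypothetical helper, and the \s* parts are an assumption that spacing around the colon may vary.

```python
import re

def set_particular(line, new_value="111222333"):
    # \g<1> is used instead of \1 so the group reference does not run
    # into the digits of the replacement value.
    return re.sub(r'^(particular\s*:\s*)\d+', r'\g<1>' + new_value, line)
```

Lines that do not start with particular are returned unchanged.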
Sure you need a regular expression?
other_number = '111222333'
some_text, some_number = line.split(': ')
new_line = ': '.join([some_text, other_number])
#!/usr/bin/env python
import re
text = '''gfnfgnfgnf: 5656756734
arvervfdsa: 1343453563
particular: 4685685685
erveveersd: 3453454545
verveversf: 7896789567'''
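# change only the 'particular' line; \g<1> keeps the group reference
# from merging with the digits that follow it
print(re.sub(r'(?m)^(particular: )[0-9]+', r'\g<1>111222333', text))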
import re
text = """gfnfgnfgnf: 5656756734
arvervfdsa: 1343453563
particular: 4685685685
erveveersd: 3453454545
verveversf: 7896789567"""
entries = re.split("\n+", text)
for i, entry in enumerate(entries):
    if entry.startswith("particular"):
        # store the result back; rebinding entry alone would discard it
        entries[i] = re.sub(r'[0-9]+', '111222333', entry)
or with sed:
sed -e 's/^particular: [0-9].*$/particular: 111222333/g' file
An important point here is that if you have a lot of lines, you want to process them one by one. That is, instead of reading all the lines in, replacing them, and writing them out again, you should read in a line at a time and write out a line at a time. (This would be inefficient if you were actually reading a line at a time from the disk; however, Python's IO is competent and will buffer the file for you.)
with open(...) as infile, open(...) as outfile:
    for line in infile:
        if line.startswith("particular"):
            outfile.write("particular: 111222333\n")
        else:
            outfile.write(line)
This will be speed- and memory-efficient.
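The same line-at-a-time loop can be packaged as a function so the key and the new value are not hard-coded. A sketch only; replace_value and its parameters are hypothetical names, and it assumes every line has the form key: value.

```python
def replace_value(src, dst, key="particular", new_value="111222333"):
    # Stream src to dst one line at a time, rewriting only the line
    # whose key matches; all other lines are copied unchanged.
    with open(src) as infile, open(dst, "w") as outfile:
        for line in infile:
            if line.startswith(key + ":"):
                outfile.write("%s: %s\n" % (key, new_value))
            else:
                outfile.write(line)
```

Writing to a separate output file keeps the original intact until the result has been checked.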
Your sed example forces me to say neat!
python -c "import re, sys; print ''.join(re.sub(r'^(particular:) \d+', r'\1 111222333', l) for l in open(sys.argv[1]))" file
