Replace part of string using python regular expression - python

I have the following lines (many, many):
...
gfnfgnfgnf: 5656756734
arvervfdsa: 1343453563
particular: 4685685685
erveveersd: 3453454545
verveversf: 7896789567
...
What I'd like to do is find the line 'particular' (whatever number follows the ':')
and replace that number with '111222333'. How can I do that using Python regular expressions?

output = []
for line in input:
    key, val = line.split(':', 1)
    if key == 'particular':
        val = ' 111222333\n'
    output.append(key + ':' + val)
I'm not sure a regex adds any value in this specific case; my guess is that it would be slower. That said, it can be done. Here's one way:
import re
output = []
for line in input:
    output.append(re.sub(r'^particular: .*', 'particular: 111222333', line))
There are subtleties involved in this, and this is almost certainly not what you'd want in production code. You need to check the re module's flags (re.MULTILINE, re.IGNORECASE, etc.) to make sure the regex is acting the way you expect. You might be surprised at how flexibly you can deal with problems like this in Python if you try not to use re (of course, this isn't to say re isn't useful) ;-)
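A slightly sturdier per-line version of the regex approach, as a sketch: it assumes every line has the form key: digits (as in the question), and the function name replace_particular is hypothetical. Capturing the prefix in a group means the key and its spacing are written back exactly as they were.

```python
import re

# Assumes lines look like "<key>: <digits>", as in the question.
# Group 1 captures "particular:" plus any spaces so they are written back unchanged.
PARTICULAR = re.compile(r'^(particular:\s*)\d+')

def replace_particular(line, new_value='111222333'):
    """Replace the number after 'particular:'; all other lines pass through untouched."""
    # A replacement function avoids having to escape backslashes in new_value.
    return PARTICULAR.sub(lambda m: m.group(1) + new_value, line)

lines = ["arvervfdsa: 1343453563\n", "particular: 4685685685\n"]
new_lines = [replace_particular(l) for l in lines]
```

Because of the ^ anchor, a line such as "notparticular: 5" is left alone.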

Are you sure you need a regular expression?
other_number = '111222333'
some_text, some_number = line.split(': ')
new_line = ': '.join([some_text, other_number])
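A regex-free sketch along the same lines. str.partition splits only at the first colon, so it never fails on unpacking, whereas line.split(': ') raises ValueError when the separator appears more than once or not at all. The function name and its defaults are illustrative, not from the answer.

```python
def replace_value(line, key='particular', new_value='111222333'):
    """Replace the value after 'key:' without a regex; other lines are returned unchanged."""
    head, sep, _ = line.partition(':')  # split at the first ':' only
    if sep and head.strip() == key:
        # keep the original line ending (the last line of a file may have none)
        ending = '\n' if line.endswith('\n') else ''
        return head + ': ' + new_value + ending
    return line
```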

#!/usr/bin/env python
import re
text = '''gfnfgnfgnf: 5656756734
arvervfdsa: 1343453563
particular: 4685685685
erveveersd: 3453454545
verveversf: 7896789567'''
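# re.M makes ^ match at the start of every line, so only the 'particular' number changes
print(re.sub(r'^(particular: )[0-9]+', r'\g<1>111222333', text, flags=re.M))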

input = """gfnfgnfgnf: 5656756734
arvervfdsa: 1343453563
particular: 4685685685
erveveersd: 3453454545
verveversf: 7896789567"""
import re
entries = re.split("\n+", input)
result = []
for entry in entries:
    if entry.startswith("particular"):
        entry = re.sub(r'[0-9]+', r'111222333', entry)
    result.append(entry)
print("\n".join(result))
Or with sed:
sed -e 's/^particular: [0-9].*$/particular: 111222333/g' file

An important point here is that if you have a lot of lines, you want to process them one by one. That is, instead of reading all the lines in, replacing them, and writing them out again, you should read in a line at a time and write out a line at a time. (This would be inefficient if you were actually reading a line at a time from the disk; however, Python's IO is competent and will buffer the file for you.)
with open(...) as infile, open(..., "w") as outfile:
    for line in infile:
        if line.startswith("particular"):
            outfile.write("particular: 111222333\n")
        else:
            outfile.write(line)
This will be speed- and memory-efficient.
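To try the streaming loop without touching real files, here is a sketch that accepts any two file-like objects; io.StringIO stands in for the open(...) handles, and rewrite_stream is a hypothetical name. It matches on 'particular:' rather than just 'particular', so a key such as 'particulars' would not be caught.

```python
import io

def rewrite_stream(infile, outfile, key='particular', new_value='111222333'):
    """Copy infile to outfile line by line, replacing the value on lines that start with 'key:'."""
    prefix = key + ':'
    for line in infile:  # iterating a file object yields one (buffered) line at a time
        if line.startswith(prefix):
            outfile.write('%s %s\n' % (prefix, new_value))
        else:
            outfile.write(line)

# In-memory stand-ins; real code would pass open(...) handles instead.
src = io.StringIO("arvervfdsa: 1343453563\nparticular: 4685685685\n")
dst = io.StringIO()
rewrite_stream(src, dst)
```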

Your sed example forces me to say neat!
python -c "import re, sys; sys.stdout.write(''.join(re.sub(r'^(particular:) \d+', r'\1 111222333', l) for l in open(sys.argv[1])))" file

Related

Find TM superscript in python 2 using regex

My text file includes "SSS™" as one of its words and I am trying to find it using a regular expression. My problem is finding the ™ superscript. My code is:
import re
path='G:\python_code\A.txt'
f_general=open(path, 'r')
special=re.findall(r'\U2122',f_general.read())
print(special)
but it doesn't print anything. How can I fix it?
It may have to do with the encoding of your file. Try this:
import re
path = r"g:\python_code\A.txt"
f_general=open(path, "r", encoding="UTF-16")
data = f_general.read()
special=re.findall(chr(8482), data)
print(special)
print(chr(8482))
Note that I'm using the decimal code point of the trademark sign (8482 = U+2122). This is the site I use:
https://www.ascii.cl/htmlcodes.htm
So, open the file in Notepad, do a Save As, choose the Unicode encoding, and this should all work. Working with extended ASCII can be a hassle. I am using Python 3.6; in Python 2, open() has no encoding argument and chr() only accepts values below 256, so use the update below there.
Note that when chr(8482) is printed in your command line, it will probably show up as just a T; at least, that is what I get on Windows.
Update:
Try this for Python 2; it should capture the word before the trademark sign:
import re
with open(r"g:\python_code\A.txt", "rb") as f:
    data = f.read().decode("UTF-16")
# chr(8482) fails in Python 2; u"\u2122" is the trademark sign in both 2 and 3
regex = re.compile(r"\S+" + u"\u2122", re.UNICODE)
match = regex.search(data)
if match:
    print(match.group(0))
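For Python 3, a sketch of the same search. It assumes the file really is UTF-16 (as the Notepad "Unicode" save above suggests), and find_trademarked is a hypothetical helper. Reading bytes and decoding explicitly keeps the encoding decision in one visible place; \u2122 in the pattern is the ™ sign the question's \U2122 was aiming for, since in a Python 3 regex the \U escape needs eight hex digits (\U00002122).

```python
import re

# \S+ matches the run of non-whitespace characters directly before the ™ sign (U+2122).
TRADEMARKED = re.compile(r'\S+\u2122')

def find_trademarked(raw_bytes, encoding='utf-16'):
    """Decode raw file bytes and return every word that ends in ™."""
    text = raw_bytes.decode(encoding)
    return TRADEMARKED.findall(text)

sample = "Buy SSS\u2122 today".encode('utf-16')  # stand-in for open(path, 'rb').read()
words = find_trademarked(sample)
```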

Notepad ++ Python code to add a value to numbers matching a pattern

I have a notepad++ file with contents like below
{s:11:"wpseo_title";s:42:"Web Designing training institutes in Kochi";}i:357;a:1:
{s:11:"wpseo_title";s:32:"CSS training institutes in Kochi";}i:358;a:1:
{s:11:"wpseo_title";s:34:"HTML5 training institutes in Kochi";}i:359;a:1:
{s:11:"wpseo_title";s:39:"JavaScript training institutes in Kochi";}i:360;a:1:
{s:11:"wpseo_title";s:32:"XML training institutes in Kochi";}}}
I need a way to search for the phrase ";s:42:" and increment the number part of the phrase by 1. In this case, 42 will become 43.
I just need to get it done. I don't care whether it is through a Python script like this one,
Notepad++ Regular Expression add up numbers,
or any other method.
Please help me. I am new to Python and similar languages.
Perl one-liner version:
perl -ne 's/(?<=;s:)(\d+)(?=:)/$1+1/ge; print' data.txt
Using your link as an example, the regex match should be:
def calculate(linenumber, match):
    # match.group(1) is the number between ';s:' and ':'; write it back incremented by 1
    editor.pyreplace(match.group(0), ';s:%d:' % (int(match.group(1)) + 1), 0, 0, linenumber, linenumber)

editor.pysearch(r';s:([0-9]+):', calculate)
I think. I've never actually done this before!
import re
import shutil

target_file = "some_file.txt"
with open("tmp_out.txt", "w") as f_out:
    with open(target_file) as f_in:
        for line in f_in:
            # the callback turns the captured number into an int, adds 1, and rebuilds ';s:<n>:'
            f_out.write(re.sub(r";s:(\d+):", lambda match: ";s:%d:" % (int(match.group(1)) + 1), line))
shutil.move("tmp_out.txt", target_file)
Something like that, I think.
Or, even better:
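import fileinput
import re
import sys
# with inplace=True, standard output is redirected into the file being edited
for line in fileinput.input(target_file, inplace=True):
    sys.stdout.write(re.sub(r";s:(\d+):", lambda match: ";s:%d:" % (int(match.group(1)) + 1), line))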
A one-liner is hard to achieve in Python, but you can do something similar with the callback feature of re.sub:
import re
repl = lambda e: e.group(1) + str(int(e.group(2)) + 1)
with open("in.txt") as fin:
    with open("out.txt", "w") as fout:
        # the lookahead (?=:) checks for the trailing ':' without consuming it, so it is kept
        fout.write(re.sub(r"(;s:)(\d+)(?=:)", repl, fin.read()))
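The Perl one-liner's lookbehind/lookahead idea also works with Python's re, because the lookbehind (?<=;s:) has a fixed width. A sketch run on one line of the question's data; increment_numbers is a hypothetical name.

```python
import re

# Only the digits are matched; ';s:' (lookbehind) and ':' (lookahead) stay untouched,
# the same idea as the Perl one-liner's (?<=;s:)(\d+)(?=:) with the /e flag.
LENGTH = re.compile(r'(?<=;s:)(\d+)(?=:)')

def increment_numbers(text):
    """Add 1 to every number that sits between ';s:' and ':'."""
    return LENGTH.sub(lambda m: str(int(m.group(1)) + 1), text)

line = '{s:11:"wpseo_title";s:42:"Web Designing training institutes in Kochi";}i:357;a:1:'
new_line = increment_numbers(line)
```

The leading {s:11: is not preceded by ';', so it is left unchanged.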

how to filter out lines between two timestamps in python

I have the following issue: I have a log file that I want to read line by line, but to reduce the number of lines I want to keep only the lines between two timestamps.
Example in awk:
Find everything between two patterns: pattern1 = 2012-10-23 14, pattern2 = 2012-10-23 16
awk '/2012-10-23 14/{P=1;next}/2012-10-23 16/{exit} P' server.log
Or with egrep and one pattern:
egrep "2012-10-23 (1[4-6]:[0-5][0-9])" server.log
The above awk line would give me only the lines between those two timestamps.
How can I do it in Python, without executing any system command (awk, grep, ...), using only Python regular expressions?
Thanks in advance.
A one-to-one translation of your awk code:
p = 0
with open('yourFile') as f:
    lines = f.read().splitlines()
for l in lines:
    if l.startswith('2012-10-23 14'):
        p = 1
    elif l.startswith('2012-10-23 16'):
        p = 0
        break
    if p:
        print(l)
This starts the output when the first line starting with 2012-10-23 14 is matched and stops printing at the first line starting with 2012-10-23 16 (mirroring your awk code).
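A regex-based sketch that follows the awk program's semantics more literally than the startswith loop: the patterns may match anywhere in the line, every line matching the start pattern is itself skipped (awk's next), and the first line matching the stop pattern ends everything (awk's exit). The function name between is hypothetical.

```python
import re

def between(lines, start=r'2012-10-23 14', stop=r'2012-10-23 16'):
    """Yield lines the way awk '/start/{P=1;next}/stop/{exit} P' prints them."""
    start_rx, stop_rx = re.compile(start), re.compile(stop)
    printing = False
    for line in lines:
        if start_rx.search(line):  # awk: /start/{P=1;next}
            printing = True
            continue
        if stop_rx.search(line):   # awk: /stop/{exit}
            return
        if printing:               # awk: P (print while P is set)
            yield line

log = ["2012-10-23 13:59 a", "2012-10-23 14:00 b", "2012-10-23 15:00 c",
       "note d", "2012-10-23 16:00 e", "2012-10-23 15:30 f"]
selected = list(between(log))
```

With a real file, pass the open file object instead of the list, so lines are read lazily.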
I think @Kent's post will work only if we assume that the timestamp is at the beginning of your line. With your awk/egrep code you are asking for something more generic.
The following code works:
regardless of where in the line the searched pattern is located
regardless of whether the lines in the log are properly sorted (though they very likely are ;-) )
as a non-blocking generator that yields results as they are processed, without unnecessary memory allocation
with a more generic construction, in case you want to make further modifications.
import re

def log_lines(yourFile, regexp):
    rxp = re.compile(regexp)
    with open(yourFile) as f:
        for line in f:  # iterate lazily; f.readlines() would load the whole file into memory
            if rxp.search(line):
                yield line

for line in log_lines("yourFile", "2012-10-23 1[4-6]"):
    print(line.rstrip('\n'))
Stay with Python, it is addictive ;-)

Python replacement of Rubys grep?

abc=123
dabc=123
abc=456
dabc=789
aabd=123
From the above file I need to find lines beginning with abc= (leading whitespace doesn't matter).
In Ruby I would put this in an array and do:
matches = input.grep(/^\s*abc=.*/).map(&:strip)
I'm a total noob in Python; even calling myself a fresh Python developer would be too much.
Maybe there is a better "Python way" of doing this without even grepping?
The Python version available on the platform where I need to solve the problem is 2.6.
There is no way to use Ruby there at the moment.
with open("myfile.txt") as myfile:
    matches = [line.strip() for line in myfile if line.lstrip().startswith("abc=")]
In Python you would typically use a list comprehension whose if clause does what you'd accomplish with Ruby's grep:
import sys, re
matches = [line.strip() for line in sys.stdin
           if re.match(r'^\s*abc=.*', line)]
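Ruby's grep with a Regexp keeps the elements the pattern matches anywhere, which corresponds to re.search rather than re.match. A small helper in that spirit, as a sketch (grep here is a hypothetical name, not a standard library function); it uses nothing newer than Python 2.6.

```python
import re

def grep(pattern, lines):
    """Rough analogue of Ruby's Enumerable#grep with a regex: keep the lines the pattern matches."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]

data = ["abc=123\n", "dabc=123\n", "  abc=456\n", "dabc=789\n", "aabd=123\n"]
# like .map(&:strip) in the Ruby version
matches = [m.strip() for m in grep(r'^\s*abc=', data)]
```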

Using grep in python

There is a file (query.txt) which has some keywords/phrases that are to be matched against other files using grep. The last three lines of the following code work perfectly, but when the same command is used inside the while loop it goes into an infinite loop or something (i.e., it doesn't respond).
import os
f=open('query.txt','r')
b=f.readline()
while b:
    cmd='grep %s my2.txt'%b #my2 is the file in which we are looking for b
    os.system(cmd)
    b=f.readline()
f.close()
a='He is'
cmd='grep %s my2.txt'%a
os.system(cmd)
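First of all, you are not iterating over the file properly. You can simply use for b in f: without the .readline() stuff. Note also that readline() keeps the trailing '\n': the newline ends the shell command early, so my2.txt is never passed to grep, and for a one-word query grep has no file argument and waits for input on the terminal, which looks like a hang.
Then your code will blow up in your face as soon as a query contains any characters that have a special meaning in the shell. Use subprocess.call with an argument list instead of os.system().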
Here's a fixed version:
import subprocess
with open('query.txt', 'r') as f:
    for line in f:
        line = line.rstrip()  # remove trailing whitespace such as '\n'
        subprocess.call(['/bin/grep', line, 'my2.txt'])
However, you can improve your code even more by not calling grep at all.
Read my2.txt into a string instead and then use the re module to perform the search. If you do not need a regex at all, you can even simply use if line in my2_content.
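A sketch of that idea: my2.txt is read once into a string (content), and each query pulls out the whole lines that contain it, much like running grep once per query. grep_queries is a hypothetical name; re.escape makes each query match literally.

```python
import re

def grep_queries(queries, content):
    """For each query, return the lines of content that contain it (like one grep per query)."""
    results = {}
    for query in queries:
        query = query.rstrip('\n')
        if not query:
            continue  # skip blank lines in query.txt
        # re.M makes ^ and $ match at every line boundary; '.' never crosses a newline
        pattern = re.compile(r'^.*' + re.escape(query) + r'.*$', re.M)
        results[query] = pattern.findall(content)
    return results

content = "One two\ntwo two\ntwo three\n"  # stand-in for open('my2.txt').read()
found = grep_queries(["two two\n", "three\n", "\n"], content)
```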
Your code scans the whole my2.txt file for each query in query.txt.
You want to:
read all queries into a list
iterate once over all lines of the text file and check each line against all queries.
Try this code:
with open('query.txt', 'r') as f:
    queries = [l.strip() for l in f if l.strip()]
with open('my2.txt', 'r') as f:
    for line in f:
        for query in queries:
            if query in line:
                print('%s %s' % (query, line.rstrip('\n')))
This isn't actually a good way to use Python, but if you have to do something like that, then do it correctly:
from __future__ import with_statement
import subprocess

def grep_lines(filename, query_filename):
    with open(query_filename, "rb") as myfile:
        for line in myfile:
            subprocess.call(["/bin/grep", line.strip(), filename])

grep_lines("my2.txt", "query.txt")
And hope that your file doesn't contain any characters which have special meanings in regular expressions =)
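If the queries should be taken literally, re.escape removes that worry. This sketch compiles every query into one alternation, roughly what grep -F -f query.txt does; build_matcher is a hypothetical name, and an empty query list is made to match nothing, since an empty pattern would otherwise match every line.

```python
import re

def build_matcher(queries):
    """Compile the queries into one regex that matches any of them literally."""
    literals = [re.escape(q.strip()) for q in queries if q.strip()]
    if not literals:
        return re.compile(r'(?!)')  # (?!) can never succeed, so nothing matches
    return re.compile('|'.join(literals))

matcher = build_matcher(["two two\n", "three\n", "a.b\n"])
lines = ["One two", "two two", "two three", "axb"]
kept = [l for l in lines if matcher.search(l)]
```

"axb" is not kept because the escaped "a.b" only matches a literal dot.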
Also, you might be able to do this with grep alone:
grep -f query.txt my2.txt
It works like this:
~ $ cat my2.txt
One two
two two
two three
~ $ cat query.txt
two two
three
~ $ python bar.py
two two
two three
$ grep -wFf query.txt my2.txt > out.txt
This matches all the keywords in query.txt against my2.txt and saves the output in out.txt (-f reads the patterns from query.txt, -F treats them as fixed strings, -w matches whole words only).
Read man grep for a description of all the possible arguments.
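An approximate Python counterpart of grep -wFf, as a sketch: the queries are escaped so they match literally, and lookarounds require that no word character (letter, digit, underscore) sits directly before or after the match. whole_word_matcher is a hypothetical name, and grep's exact notion of a word character may differ for non-ASCII text.

```python
import re

def whole_word_matcher(queries):
    """Approximate grep -wFf: literal queries that must appear as whole words."""
    literals = [re.escape(q.strip()) for q in queries if q.strip()]
    if not literals:
        return re.compile(r'(?!)')  # no queries: match nothing
    # (?<!\w) and (?!\w): no word character directly before or after the match
    return re.compile(r'(?<!\w)(?:' + '|'.join(literals) + r')(?!\w)')

matcher = whole_word_matcher(["two\n"])
# Writing matches to out.txt could then be: out.writelines(l for l in infile if matcher.search(l))
```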
