String Operation on captured group in re Python

I have a string:
str1 = "abc = def"
I want to convert it to:
str2 = "abc = #Abc#"
I am trying this:
re.sub("(\w+) = (\w+)",r"\1 = %s" % ("#"+str(r"\1").title()+"#"),str1)
but it returns (without the string operation applied):
"abc = #abc#"
Why is .title() not working?
How can I apply a string operation to a captured group in Python?

You can see what's going on with the help of a little function:
import re
str1 = "abc = def"
def fun(m):
    print("In fun(): " + m)
    return m
str2 = re.sub(r"(\w+) = (\w+)",
r"\1 = %s" % ("#" + fun(r"\1") + "#"),
# ^^^^^^^^^^
str1)
Which yields
In fun(): \1
So what you are actually doing is calling .title() on the literal string \1 (not on the captured text!), and title-casing \1 obviously leaves it as \1. The % formatting and the .title() call are evaluated before re.sub() even runs. The \1 is replaced with the captured content only later, inside re.sub().
Go with a lambda function, as proposed by @Rakesh.

Try using a lambda.
Example:
import re
str1 = "abc = def"
print(re.sub(r"(?P<one>\w+) = (\w+)", lambda match: '{0} = #{1}#'.format(match.group('one'), match.group('one').title()), str1))
Output:
abc = #Abc#
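The same idea also works with a named function instead of a lambda. That can be easier to read once the replacement logic grows. Below is a minimal sketch; the name `title_second` is only illustrative. re.sub() calls the function once per match, so it receives the real captured text rather than the literal string r"\1".

```python
import re


def title_second(match):
    # re.sub calls this once per match, so match.group(1) is the real
    # captured text ("abc"), not the literal backreference string r"\1".
    first = match.group(1)
    return "{} = #{}#".format(first, first.title())


str1 = "abc = def"
str2 = re.sub(r"(\w+) = (\w+)", title_second, str1)
print(str2)  # abc = #Abc#
```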

Related

nextLine().split("\\s+") converted to python

Can someone explain to me what
nextLine().split("\\s+")
does, and how I would convert it to Python?
Thanks. I want to use it, but it's in Java.

split takes a string argument, which Java interprets as a regular expression, and uses every match of it as a delimiter. Here the regex is simply \s+. The extra backslash in "\\s+" escapes the backslash inside the Java string literal. \s denotes any whitespace character and + means "one or more". So for the string "Hello world ! ." the output is ["Hello", "world", "!", "."].
In Python, you need to use the re library for this functionality:
re.split(r"\s+", input_str)
Or, for this specific case (as @Kurt pointed out), input_str.split() will do the trick.
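One difference is worth knowing. `re.split()` keeps the empty strings produced by leading or trailing whitespace, while `str.split()` with no argument drops them. As I understand Java's documented `String.split(regex)` behaviour, Java keeps a leading empty string but removes trailing ones. Below is a rough emulation under that assumption; `java_split` is a hypothetical helper, not a standard function.

```python
import re


def java_split(s, pattern=r"\s+"):
    """Rough emulation of Java's String.split(regex) with the default limit of 0."""
    parts = re.split(pattern, s)
    if len(parts) == 1:
        # No match: Java returns the original string as the only element.
        return parts
    # Java drops trailing empty strings but keeps a leading one.
    while parts and parts[-1] == "":
        parts.pop()
    return parts


text = "  Hello world ! .  "
print(re.split(r"\s+", text))  # ['', 'Hello', 'world', '!', '.', '']
print(text.split())            # ['Hello', 'world', '!', '.']
print(java_split(text))        # ['', 'Hello', 'world', '!', '.']
```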

nextLine() reads the next line of input (for example, user input when the Scanner wraps System.in). split("\\s+") then splits that line into elements, using the regex \s+ (written "\\s+" in a Java string literal) as the delimiter.
The equivalent in Python uses the re module:
import re
s = input()
sub_s = re.split(r"\s+", s)
# hello and welcome everyone
# ['hello', 'and', 'welcome', 'everyone']

Code in Java:
import java.util.*;
public class MyClass {
    public static void main(String args[]) {
        String s = "Hello my Wonderful\nWorld!";
        // nextLine()
        Scanner scanner = new Scanner(s);
        System.out.println("'" + scanner.nextLine() + "'");
        System.out.println("'" + scanner.nextLine() + "'");
        scanner.close();
        // nextLine().split("\\s+")
        scanner = new Scanner(s);
        String str[] = scanner.nextLine().split("\\s+");
        System.out.println("*" + str[2] + "*");
        scanner.close();
    }
}
Python:
s = "Hello my Wonderful\nWorld!"
o = s.split("\n")
print("'" + o[0] + "'")
print("'" + o[1] + "'")
'''
Alternatively, use str.find():
i = s.find('\n')
print(s[:i])
print(s[i+1:])
or a generator that yields one line at a time:
def get_lines(text):
    start = 0
    sub = '\n'
    while True:
        end = text.find(sub, start)
        if end == -1:
            yield text[start:]
            return
        yield text[start:end]
        start = end + 1
i = get_lines(s)
print("'" + next(i) + "'")
print("'" + next(i) + "'")
'''
o = s.split()
print("*" + o[2] + "*")
Output:
'Hello my Wonderful'
'World!'
*Wonderful*
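Note that `s.split()` splits on every whitespace character, including the newline. It gives the same `o[2]` as Java only because the first line happens to have three words. To mirror `scanner.nextLine().split("\\s+")` more literally, you can read one line at a time. The sketch below uses `io.StringIO` as a stand-in for the Scanner's input; with real user input, `input()` already returns one line without its newline. `next_line` is a hypothetical helper.

```python
import io
import re


def next_line(reader):
    # Like Scanner.nextLine(): return the next line without its line separator.
    line = reader.readline()
    if line == "":
        # Java throws NoSuchElementException here; EOFError is a stand-in.
        raise EOFError("No line found")
    return line.rstrip("\r\n")


s = "Hello my Wonderful\nWorld!"
reader = io.StringIO(s)
words = re.split(r"\s+", next_line(reader))  # only the first line is split
print("*" + words[2] + "*")                  # *Wonderful*
print("'" + next_line(reader) + "'")         # 'World!'
```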

Removing words with special characters "\" and "/"

While analysing tweets, I run into "words" that contain \ or / (possibly more than once in a single "word"). I would like to remove such words completely, but I can't quite get it right.
This is what I tried:
sen = 'this is \re\store and b\\fre'
sen1 = 'this i\s /re/store and b//fre/'
slash_back = r'(?:[\w_]+\\[\w_]+)'
slash_fwd = r'(?:[\w_]+/+[\w_]+)'
slash_all = r'(?<!\S)[a-z-]+(?=[,.!?:;]?(?!\S))'
strt = re.sub(slash_back,"",sen)
strt1 = re.sub(slash_fwd,"",sen1)
strt2 = re.sub(slash_all,"",sen1)
print strt
print strt1
print strt2
I would like to get:
this is and
this i\s and
this and
However, I receive:
and
this i\s / and /
i\s /re/store b//fre/
Note: in this scenario a "word" is a string delimited by spaces or punctuation marks (as in regular text).

How's this? I added some punctuation examples:
import re
sen = r'this is \re\store and b\\fre'
sen1 = r'this i\s /re/store and b//fre/'
sen2 = r'this is \re\store, and b\\fre!'
sen3 = r'this i\s /re/store, and b//fre/!'
slash_back = r'\s*(?:[\w_]*\\(?:[\w_]*\\)*[\w_]*)'
slash_fwd = r'\s*(?:[\w_]*/(?:[\w_]*/)*[\w_]*)'
slash_all = r'\s*(?:[\w_]*[/\\](?:[\w_]*[/\\])*[\w_]*)'
strt = re.sub(slash_back,"",sen)
strt1 = re.sub(slash_fwd,"",sen1)
strt2 = re.sub(slash_all,"",sen1)
strt3 = re.sub(slash_back,"",sen2)
strt4 = re.sub(slash_fwd,"",sen3)
strt5 = re.sub(slash_all,"",sen3)
print(strt)
print(strt1)
print(strt2)
print(strt3)
print(strt4)
print(strt5)
Output:
this is and
this i\s and
this and
this is, and!
this i\s, and!
this, and!

One way to do it without re is with str.join and a generator expression. Raw strings keep the backslashes literal, so \r isn't read as a carriage return:
sen = r'this is \re\store and b\\fre'
sen1 = r'this i\s /re/store and b//fre/'
remove_back = lambda s: ' '.join(i for i in s.split() if '\\' not in i)
remove_forward = lambda s: ' '.join(i for i in s.split() if '/' not in i)
>>> print(remove_back(sen))
this is and
>>> print(remove_forward(sen1))
this i\s and
>>> print(remove_back(remove_forward(sen1)))
this and
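To drop words containing either character in one pass, the same generator expression generalizes to a set of characters. Below is a small sketch; `remove_words_with` is a hypothetical name. Like the lambdas above, it treats a "word" as a whitespace-separated token, so any punctuation attached to such a word is removed along with it.

```python
def remove_words_with(s, chars="\\/"):
    # Keep only the whitespace-separated tokens that contain none of `chars`.
    return " ".join(w for w in s.split() if not any(c in w for c in chars))


sen = r'this is \re\store and b\\fre'
sen1 = r'this i\s /re/store and b//fre/'
print(remove_words_with(sen))         # this is and
print(remove_words_with(sen1))        # this and
print(remove_words_with(sen1, "/"))   # this i\s and
```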

Python/Pyparsing - Multiline quotes

I'm trying to use pyparsing to match a multiline string that can be continued across lines the same way Python strings can:
Test = "This is a long " \
"string"
I can't find a way to make pyparsing recognize this. Here is what I've tried so far:
import pyparsing as pp
src1 = '''
Test("This is a long string")
'''
src2 = '''
Test("This is a long " \
"string")
'''
_lp = pp.Suppress('(')
_rp = pp.Suppress(')')
_str = pp.QuotedString('"', multiline=True, unquoteResults=False)
func = pp.Word(pp.alphas)
function = func + _lp + _str + _rp
print src1
print function.parseString(src1)
print '-------------------------'
print src2
print function.parseString(src2)

The problem is that having a multi-line quoted string doesn't do what you think. A multiline quoted string is literally that -- a string with newlines inside:
import pyparsing as pp
src0 = '''
"Hello
World
Goodbye and go"
'''
pat = pp.QuotedString('"', multiline=True)
print(pat.parseString(src0))
The output of parsing this string would be ['Hello\n World\n Goodbye and go'].
As far as I know, if you want a string that's similar to how Python's strings behave, you have to define it yourself:
import pyparsing as pp
src1 = '''
Test("This is a long string")
'''
src2 = '''
Test("This is a long"
"string")
'''
src3 = '''
Test("This is a long" \\
"string")
'''
_lp = pp.Suppress('(')
_rp = pp.Suppress(')')
_str = pp.QuotedString('"')
_slash = pp.Suppress(pp.Optional("\\"))
_multiline_str = pp.Combine(pp.OneOrMore(_str + _slash), adjacent=False)
func = pp.Word(pp.alphas)
function = func + _lp + _multiline_str + _rp
print(src1)
print(function.parseString(src1))
print('-------------------------')
print(src2)
print(function.parseString(src2))
print('-------------------------')
print(src3)
print(function.parseString(src3))
This produces the following output:
Test("This is a long string")
['Test', 'This is a long string']
-------------------------
Test("This is a long"
"string")
['Test', 'This is a longstring']
-------------------------
Test("This is a long" \
"string")
['Test', 'This is a longstring']
Note: The Combine class merges the various quoted strings into a single unit so that they appear as a single string in the output list. The backslash is suppressed so that it isn't included in the combined output string.
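If you would rather not depend on pyparsing, the same idea can be sketched with the standard `re` module: match one or more quoted strings separated by whitespace and an optional backslash, then combine them. This swaps in a different technique from the one the answer uses. It assumes the strings contain no escaped quotes, and `parse_call` is a hypothetical helper.

```python
import re

SEGMENT = r'"[^"]*"'
# name( "..." [\] "..." ... ): segments may be separated by whitespace
# (including newlines) and an optional backslash.
CALL = re.compile(r'(\w+)\(\s*(' + SEGMENT + r'(?:\s*\\?\s*' + SEGMENT + r')*)\s*\)')


def parse_call(src):
    m = CALL.search(src)
    if m is None:
        raise ValueError("no call found in %r" % src)
    # Concatenate the contents of every quoted segment, like pp.Combine.
    text = "".join(re.findall(r'"([^"]*)"', m.group(2)))
    return [m.group(1), text]


src3 = 'Test("This is a long" \\\n"string")'
print(parse_call(src3))  # ['Test', 'This is a longstring']
```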

Search Patterns replacement using lambda

I need to write the before and after of each search-and-replace to a file. I have written the code below. I used a function to write to the output file, and it works fine. But I have around 20 such replacement patterns, and I feel this isn't good code because I would need to create a function for every replacement. Is there another way to implement this?
import re
Report_file = open("report.txt", "w")
st = '''<TimeLog>
<InTime='10Azx'>1056789</InTime>
<OutTime='14crg'>1056867</OutTime>
<PsTime='32lxn'>1056935</PsTime>
<ClrTime='09zvf'>1057689</ClrTime>
</TimeLog>'''
def tcnv(str):
    Report_file.write("Previous TS: " + str + "\n\n")
    v1 = re.search(r"(?i)<clrtime='(\d+\w+)'>", str)
    val1 = v1.group(1)
    v2 = re.search(r"(?i)(<clrtime='(\d+\w+)'>(.*?)</clrtime>)", str)
    val2 = v2.group(3)
    soutval = "<Clzone><clnvl='" + val1 + "'>" + val2 + "</clnvl></Clzone>"
    Report_file.write("New TS: " + soutval + "\n")
    return soutval
st = re.sub(r"(?i)(<clrtime='(\d+\w+)'>(.*?)</clrtime>)", lambda m: tcnv(m.group(1)), st)
st = re.sub(r"(?i)<intime='(\d+\w+)'>(.*?)</intime>", "<Izone><Invl='\\1'>\\2</Invl></Izone>", st)
st = re.sub(r"(?i)<outtime='(\d+\w+)'>(.*?)</outtime>", "<Ozone><onvl='\\1'>\\2</onnvl></Ozone>", st)
st = re.sub(r"(?i)<pstime='(\d+\w+)'>(.*?)</pstime>", "<Pszone><psnvl='\\1'>\\2</psnvl

I don't see why you used the re.IGNORECASE flag in the form (?i), so I don't use it in the following solution. Instead, the pattern is written with uppercase letters where necessary, according to your sample.
Note that you should use the with statement to open files. It is far better, because the file is closed automatically even if an error occurs:
with open('filename.txt', 'rb') as f:
    ch = f.read()
The answer
import re
st = '''<InTime='10Azx'>1056789</InTime>
<OutTime='14crg'>1056867</OutTime>
<PsTime='32lxn'>1056935</PsTime>
<ClrTime='09zvf'>1057689</ClrTime>
'''
d = dict(zip(('InTime','OutTime','PsTime','ClrTime'),
(('Izone><Invl','/Invl></Izone'),
('Ozone><onvl','/onnvl></Ozone'),
('Pszone><psnvl','/psnvl></Pszone'),
('Clzone><clnvl','/clnvl></Clzone'))
)
)
def ripl(ma, d=d):
    return "<{}='{}'>{}<{}>".format(d[ma.group(1)][0],
                                    ma.group(2),
                                    ma.group(3),
                                    d[ma.group(1)][1])
st2 = re.sub(r"<(InTime|OutTime|PsTime|ClrTime)='(\d+\w+)'>(.*?)</\1>",
             ripl, st)
print('%s\n\n%s\n' % (st, st2))
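The question also asks for every before/after pair to be written to the report. One way to combine that with the dictionary idea is to build a replacement function that logs as it replaces. With this approach, adding a pattern means adding one dictionary entry rather than a new function. Below is a sketch; `ZONES` and `make_replacer` are hypothetical names, and `io.StringIO` stands in for the report file. It also assumes the `onvl`/`onnvl` mismatch in the question is a typo, so the closing tags always match the opening ones.

```python
import io
import re

# Tag -> (zone element, value element), taken from the question's templates.
# Assumption: the onvl/onnvl mismatch in the question is a typo.
ZONES = {
    "InTime": ("Izone", "Invl"),
    "OutTime": ("Ozone", "onvl"),
    "PsTime": ("Pszone", "psnvl"),
    "ClrTime": ("Clzone", "clnvl"),
}
# The alternation is built from the dictionary keys, so a new tag only needs a new entry.
PATTERN = re.compile(r"<(" + "|".join(map(re.escape, ZONES)) + r")='(\d+\w+)'>(.*?)</\1>")


def make_replacer(report):
    # Build a replacement function that also logs each before/after pair
    # to `report` (any object with a write() method, e.g. an open file).
    def replace(m):
        zone, val = ZONES[m.group(1)]
        new = "<{0}><{1}='{2}'>{3}</{1}></{0}>".format(zone, val, m.group(2), m.group(3))
        report.write("Previous TS: " + m.group(0) + "\n")
        report.write("New TS: " + new + "\n\n")
        return new
    return replace


st = "<InTime='10Azx'>1056789</InTime>\n<ClrTime='09zvf'>1057689</ClrTime>"
report = io.StringIO()  # stands in for open("report.txt", "w")
st2 = PATTERN.sub(make_replacer(report), st)
print(st2)
```

With a real file, run the substitution inside `with open("report.txt", "w") as report:` so the file is closed when you are done.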

Replace a pattern in python

How do I replace the pattern in this string?
decoded_str=" Name(++info++)Age(++info++)Adress of the emp(++info++)"
The first "(++info++)" needs to be replaced with (++info a++).
The second "(++info++)" needs to be replaced with (++info b++).
The third "(++info++)" needs to be replaced with (++info c++).
If there are more occurrences, they should be replaced in the same way.

This should be simple enough:
for character in range(ord('a'), ord('z') + 1):
    if "(++info++)" not in decoded_str:
        break
    decoded_str = decoded_str.replace("(++info++)", "(++info {0}++)".format(chr(character)), 1)
print(decoded_str)
It has the added benefit of stopping at 'z'. If you want to wrap around:
import itertools
for character in itertools.cycle(range(ord('a'), ord('z') + 1)):
    if "(++info++)" not in decoded_str:
        break
    decoded_str = decoded_str.replace("(++info++)", "(++info {0}++)".format(chr(character)), 1)
print(decoded_str)
And just for fun, a one-liner, and O(n):
decoded_str = "".join(x + "(++info {0}++)".format(chr(y)) for x, y in zip(decoded_str.split("(++info++)"), range(ord('a'), ord('z') + 1)))[:-len("(++info a++)")]

import string
decoded_str = " Name(++info++)Age(++info++)Adress of the emp(++info++)"
s = decoded_str.replace('++info++', '++info %s++')
print(s % tuple(string.ascii_lowercase[:s.count('%s')]))

Here is a rather ugly yet pragmatic solution:
import string
decoded_str = " Name(++info++)Age(++info++)Adress of the emp(++info++)"
letters = string.ascii_lowercase
token = "(++info++)"
rep_token = "(++info %s++)"
i = 0
while token in decoded_str:
    decoded_str = decoded_str.replace(token, rep_token % letters[i], 1)
    i += 1
print(decoded_str)

>>> import re
>>> rx = re.compile(r'\(\+\+info\+\+\)')
>>> s = "Name(++info++)Age(++info++)Adress of the emp(++info++)"
>>> atoz = iter("abcdefghijklmnopqrstuvwxyz")
>>> rx.sub(lambda m: '(++info ' + next(atoz) + '++)', s)
'Name(++info a++)Age(++info b++)Adress of the emp(++info c++)'

Here's a quick hack to do it:
import string
decoded_str = " Name(++info++)Age(++info++)Adress of the emp(++info++)"
def doit(s):
    allTheLetters = string.ascii_lowercase
    i = 0
    s2 = s.replace("++info++", "++info " + allTheLetters[i] + "++", 1)
    while s2 != s:
        s = s2
        i = i + 1
        s2 = s.replace("++info++", "++info " + allTheLetters[i] + "++", 1)
    return s
print(doit(decoded_str))
Note that performance is probably poor, because every replace() call scans the string again from the start.

import re, string
decoded_str=" Name(++info++)Age(++info++)Adress of the emp(++info++)"
sub_func=('(++info %s++)'%c for c in '.'+string.ascii_lowercase).send
sub_func(None)
print(re.sub(r'\(\+\+info\+\+\)', sub_func, decoded_str))

import string
decoded_str = " Name(++info++)Age(++info++)Adress of the emp(++info++)"
parts = iter(decoded_str.split("(++info++)"))
first_part = next(parts)
tags = iter(string.ascii_lowercase)
encoded_str = first_part + "".join("(++info %s++)%s" % x for x in zip(tags, parts))
print(encoded_str)

decoded_str=" Name(++info++)Age(++info++)Adress of the emp(++info++)"
import re
for i, f in enumerate(re.findall(r"\(\+\+info\+\+\)", decoded_str)):
    decoded_str = re.sub(r"\(\+\+info\+\+\)", "(++info %s++)" % chr(97 + i), decoded_str, count=1)
print(decoded_str)
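The answers above only have 26 letters to work with. Depending on the approach, they stop, wrap around, or fail once the letters run out. If you need more, one option is to assume the labels continue as aa, ab, ... after z; the question doesn't specify a scheme. The sketch below generates such labels with `itertools.product` and feeds them to `re.sub` through a callable. `labels` and `number_tokens` are hypothetical helpers.

```python
import itertools
import re
import string


def labels():
    # a, b, ..., z, aa, ab, ..., zz, aaa, ... (an assumed labelling scheme)
    for size in itertools.count(1):
        for combo in itertools.product(string.ascii_lowercase, repeat=size):
            yield "".join(combo)


def number_tokens(text, token="(++info++)"):
    gen = labels()
    # re.escape keeps the parentheses and plus signs literal.
    return re.sub(re.escape(token),
                  lambda m: "(++info " + next(gen) + "++)",
                  text)


decoded_str = " Name(++info++)Age(++info++)Adress of the emp(++info++)"
print(number_tokens(decoded_str))
```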
