How to delete () using the re module in Python

I am having trouble processing XML text.
I want to delete the outer () from my text, as follows:
from <b>(apa-bhari(n))</b> to <b>apa-bhari(n)</b>
I wrote the following code:
name= re.sub('<b>\((.+)\)</b>','<b>\1</b>',name)
But it only returns
<b></b>
I do not understand escape sequences and backreferences. Please tell me the solution.

You need to use raw strings, or escape the backslashes:
name = re.sub(r'<b>\((.+)\)</b>', r'<b>\1</b>', name)

You need to escape backslashes in Python strings when they are followed by a digit; the following assertions all hold:
assert '\1' == '\x01'
assert len('\\1') == 2
assert '\)' == '\\)'
So, your code would be
name = re.sub('<b>\\((.+)\\)</b>','<b>\\1</b>',name)
Alternatively, use raw string literals:
name = re.sub(r'<b>\((.+)\)</b>', r'<b>\1</b>',name)

Try:
name= re.sub('<b>\((.+)\)</b>','<b>\\1</b>',name)
Or, if you don't want unreadable code with \\ everywhere you use a backslash, don't escape backslashes manually; put an r before the string instead. For example, r"my\String" is the same as "my\\String". (A raw string cannot end in a single backslash, though: r"myString\" is a syntax error.)
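To see the difference concretely, here is a small sketch that runs the raw-string fix, the doubled-backslash fix, and the original bug on the sample string from the question. The variable names are illustrative only.

```python
import re

name = "<b>(apa-bhari(n))</b>"

# Raw strings: the regex engine receives \( \) and the backreference \1 intact.
fixed_raw = re.sub(r"<b>\((.+)\)</b>", r"<b>\1</b>", name)

# The same call with every backslash doubled by hand instead of using r"...".
fixed_escaped = re.sub("<b>\\((.+)\\)</b>", "<b>\\1</b>", name)

# The original bug: in a normal string, "\1" is an octal escape for chr(1),
# so the replacement holds a control character instead of a backreference.
assert "<b>\1</b>" == "<b>\x01</b>"
buggy = re.sub(r"<b>\((.+)\)</b>", "<b>\1</b>", name)

print(fixed_raw)      # <b>apa-bhari(n)</b>
print(fixed_escaped)  # <b>apa-bhari(n)</b>
print(repr(buggy))    # '<b>\x01</b>'
```

The control character chr(1) is invisible when printed, which is why the question's output looked like an empty <b></b>.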

Related

python re.sub not replacing string

I am trying to replace a string in a file, as below, but somehow it's not being replaced:
my_string = 'TABLE "_$deleted$73$0"'  # the numbers inside can be anything; I want to change it to the one below
replace_string = 'TABLE "I01"'
o = "TABLE \"_$[a-z]+*$*\""
n = "TABLE \"I01\""
re.sub (o, n, file)
It's not replacing that string. Please help.
Regards
Kannan
Use raw strings (prefix the string literal with r) to bypass Python's escape sequences. This only bypasses Python's escape sequences; the regex's special symbols still need to be escaped. Also, a single-quoted literal is cleaner, since your pattern contains double quotation marks.
There were several issues with your pattern. First, both $ symbols must be prefixed with \, because an unescaped $ is an end-of-string anchor. Second, [a-z]+* has a + followed by a *, which doesn't make sense (Python rejects it with a "multiple repeat" error); you don't need the *. Based on the one example match you gave, the pattern seems to be:
o = r'TABLE "_\$[a-z]+\$\d+\$\d+"'
Let me know if something is not clear. Hope this helps.
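Here is a minimal sketch of the corrected call. It assumes the file's contents have already been read into a string (the `text` variable below is a hypothetical stand-in), and it keeps the result, since re.sub returns a new string rather than changing its argument in place. It also checks that the question's original pattern is not a valid regex at all.

```python
import re

# Hypothetical stand-in for the file contents, e.g. text = open(path).read().
text = 'TABLE "_$deleted$73$0"'

# Raw string: each \$ reaches the regex engine and matches a literal "$".
o = r'TABLE "_\$[a-z]+\$\d+\$\d+"'
n = 'TABLE "I01"'

# re.sub returns a new string; it does not modify `text` in place.
new_text = re.sub(o, n, text)
print(new_text)  # TABLE "I01"

# The question's original pattern does not compile: "+*" stacks two quantifiers.
try:
    re.compile("TABLE \"_$[a-z]+*$*\"")
    original_compiles = True
except re.error:
    original_compiles = False
print(original_compiles)  # False
```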

How to fix “<string> DeprecationWarning: invalid escape sequence” in Python?

I am using pytest-3 with Python 3:
import re

def check_email(email):
    regex = '^\w+([\.-]?\w+)*#\w+([\.-]?\w+)*(\.\w{2,3})+$'  # <- this line
    if(re.search(regex,email)):
        return True
    else:
        return False
The regex assignment (marked above) is what triggers the warning.
Use a raw string
regex = r'^\w+([\.-]?\w+)*#\w+([\.-]?\w+)*(\.\w{2,3})+$'
Note the r prefix. This ensures that the \ are not interpreted as possible escape sequences and instead just as plain \.
Since \ is meaningful in regular expressions, it is a good habit in Python to always use raw strings for regular expressions (see the re documentation).
Writing a sequence like "\w" in a normal string literal makes Python try to interpret it as an escape sequence, just as '\n' becomes a newline character. Since \w is not a recognized escape, Python keeps the backslash as-is but emits this warning (a DeprecationWarning, or a SyntaxWarning since Python 3.12). You can write "\\w", for example, to get a literal "\w".
Or you can prepend the whole string with an r, like r'your \string'.
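A short sketch of both points, assuming CPython 3.6 or later (where the warning was introduced): raw and escaped literals compare equal, and compiling a snippet that writes '\w' in a normal string literal produces the warning, while the raw version does not. The compiled snippet is a made-up one-liner, not the question's code, and `warnings_for` is a hypothetical helper written for this example.

```python
import re
import warnings

# A raw string keeps every backslash, so these literals are identical.
assert r"\w" == "\\w" and len(r"\w") == 2
# "\n" is a real escape sequence (one newline); r"\n" is two characters.
assert len("\n") == 1 and r"\n" == "\\n"


def warnings_for(source):
    """Compile a source snippet and return the warning classes it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [w.category for w in caught]


# Snippet text: regex = '\w+'   (normal literal with an unknown escape)
plain = warnings_for("regex = '\\w+'")
# Snippet text: regex = r'\w+'  (raw literal, no escape processing)
raw = warnings_for("regex = r'\\w+'")

print(plain)  # DeprecationWarning on Python 3.6-3.11, SyntaxWarning on 3.12+
print(raw)    # []

# The raw pattern behaves as intended in the regex engine.
print(re.findall(r"\w+", "foo-bar baz"))  # ['foo', 'bar', 'baz']
```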

Python regular expression with string in it

I would like to match a string with something like:
re.match(r'<some_match_symbols><my_match><some_other_match_symbols>', mystring)
where <my_match> is a string I would like it to find. The problem is that it may differ from time to time, and it is stored in a variable. Is it possible to put a variable into a regex?
Nothing prevents you from simply doing this:
re.match('<some_match_symbols>' + '<my_match>' + '<some_other_match_symbols>', mystring)
Regular expressions are nothing more than strings containing some special characters specific to regular expression syntax. But they are still strings, so you can do whatever you are used to doing with strings.
The r'…' syntax, by the way, is raw string syntax, which basically just prevents any escape sequences inside the string from being evaluated. So r'\n' is the same as '\\n', a string containing a backslash and an n, while '\n' contains a line break.
import re
url = "www.dupe.com"
expression = re.compile('<p>%s</p>' % url)
result = expression.match("<p>www.dupe.com</p>BBB")
if result:
    print(result.start(), result.end())  # prints: 0 19
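The r'' prefix only affects string literals typed into the source code. A pattern assembled at runtime from variables is an ordinary string, and re.compile (or re.match) accepts it directly.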
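One caveat when splicing a variable into a pattern: any regex metacharacters in the variable keep their special meaning, so each . in url matches any character. A sketch of the standard fix, re.escape, using the same url (the look-alike string is made up for illustration):

```python
import re

url = "www.dupe.com"

# re.escape backslash-escapes the dots, so each one matches only a literal ".".
strict = re.compile("<p>%s</p>" % re.escape(url))
# Without escaping, each "." in url is a wildcard matching any character.
loose = re.compile("<p>%s</p>" % url)

good = "<p>www.dupe.com</p>BBB"
lookalike = "<p>wwwXdupeYcom</p>"  # made-up string that should not match

print(strict.match(good).span())           # (0, 19)
print(loose.match(lookalike) is not None)  # True: a false positive
print(strict.match(lookalike) is None)     # True: correctly rejected
```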

Unbalanced parenthesis python

I have the following code:
def commandType(self):
import re
print self.cmds[self.counter]
if re.match("#",self.cmds[self.counter]):
return Parser.A_COMMAND
elif re.match('(',self.cmds[self.counter]):
return Parser.L_COMMAND
else:
return Parser.C_COMMAND
and on this line: elif re.match('(',self.cmds[self.counter]):
I'm getting an error.
What am I doing wrong?
Parentheses have special meaning in regular expressions. You can escape the paren, but you really don't need a regex at all for this problem:
def commandType(self):
    print(self.cmds[self.counter])
    # re.match only matches at the start of the string, so str.startswith
    # is the plain-string equivalent
    if self.cmds[self.counter].startswith('#'):
        return Parser.A_COMMAND
    elif self.cmds[self.counter].startswith('('):
        return Parser.L_COMMAND
    else:
        return Parser.C_COMMAND
The parentheses ( and ) are used for grouping and scoping in regular expressions. You have to escape them (and any other special characters) with a backslash, e.g. r'\('.
The language of regular expressions gives a special meaning to ( (it's used for starting a group). If you want to match a literal left-parenthesis, you need to escape it with a backslash: elif re.match(r'\(', ....
(Why r'...' rather than just '...'? Because in ordinary strings, backslashes are also used for escaping control characters and suchlike, and you need to write \\ to get a single backslash into the string. So you could instead write elif re.match('\\(', .... It's better to get into the habit of using r'...' strings for regular expressions -- it's less error-prone.)
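The fixes from these answers can be compared in a few lines. This is a sketch; "(LOOP)" is a hypothetical label line standing in for self.cmds[self.counter].

```python
import re

line = "(LOOP)"  # hypothetical label line standing in for self.cmds[self.counter]

# An unescaped "(" opens a group that is never closed, so the pattern is invalid.
try:
    re.compile("(")
    bare_paren_ok = True
except re.error:
    bare_paren_ok = False
print(bare_paren_ok)  # False

# Escaped with a raw string, "(" is matched literally.
print(bool(re.match(r"\(", line)))  # True
# Without the r prefix, the backslash itself must be doubled.
print(bool(re.match("\\(", line)))  # True
# No regex at all: re.match only looks at the start, like str.startswith.
print(line.startswith("("))         # True
```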

Using regex in python

I have the following problem.
I want to escape all special characters in a python string.
>>> str='eFEx-x?k=;-'
>>> re.sub("([^a-zA-Z0-9])",r'\\1', str)
'eFEx\\1x\\1k\\1\\1\\1'
>>> str='eFEx-x?k=;-'
>>> re.sub("([^a-zA-Z0-9])",r'\1', str)
'eFEx-x?k=;-'
>>> re.sub("([^a-zA-Z0-9])",r'\\\1', str)
I can't seem to win here. \1 refers to the matched special character, and I want to add a \ before it. But r'\\1' inserts a literal \1 instead of the character, and r'\1' just puts the character back unchanged.
Use r'\\\1'. That's a backslash (escaped, so denoted \\) followed by \1.
To verify that this works, try:
str = 'eFEx-x?k=;-'
print(re.sub("([^a-zA-Z0-9])",r'\\\1', str))
This prints:
eFEx\-x\?k\=\;\-
which I think is what you want. Don't be confused when the interactive interpreter shows 'eFEx\\-x\\?k\\=\\;\\-': the interpreter displays the repr of the result, which doubles each backslash. print shows the actual string, with single backslashes.
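Why don't you use re.escape()?
>>> str = 'eFEx-x?k=;-'
>>> re.escape(str)
'eFEx\\-x\\?k\\=\\;\\-'
That output is from Python 3.6 and earlier. Since Python 3.7, re.escape only escapes characters that are special in regular expressions, so it returns 'eFEx\\-x\\?k=;\\-' and leaves = and ; alone.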
Try adding another backslash:
s = 'eFEx-x?k=;-'
print(re.sub("([^a-zA-Z0-9])",r'\\\1', s))
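To put the two approaches side by side, here is a sketch that builds the escaped string with r'\\\1' and with re.escape(). The exact re.escape output shown assumes Python 3.7 or later.

```python
import re

s = "eFEx-x?k=;-"

# r"\\\1" in a replacement: "\\" is a literal backslash, "\1" is group 1.
manual = re.sub(r"([^a-zA-Z0-9])", r"\\\1", s)
print(manual)        # eFEx\-x\?k\=\;\-
print(repr(manual))  # 'eFEx\\-x\\?k\\=\\;\\-'

# On Python 3.7+, re.escape escapes only regex metacharacters,
# so "=" and ";" are left as they are.
print(re.escape(s))  # eFEx\-x\?k=;\-
```

The manual substitution escapes every non-alphanumeric character, while re.escape escapes only what the regex syntax requires, which makes it the simpler choice when the goal is to embed arbitrary text in a pattern.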
