Unbalanced parenthesis error in Python regex

I have the following code:
def commandType(self):
    import re
    print(self.cmds[self.counter])
    if re.match("#", self.cmds[self.counter]):
        return Parser.A_COMMAND
    elif re.match('(', self.cmds[self.counter]):
        return Parser.L_COMMAND
    else:
        return Parser.C_COMMAND
I'm getting an error on this line:
elif re.match('(',self.cmds[self.counter]):
What am I doing wrong?

Parentheses have special meaning in regular expressions. You can escape the parenthesis, but you really don't need a regex at all for this problem:
def commandType(self):
    print(self.cmds[self.counter])
    # str.startswith checks the start of the string, just as re.match does
    if self.cmds[self.counter].startswith('#'):
        return Parser.A_COMMAND
    elif self.cmds[self.counter].startswith('('):
        return Parser.L_COMMAND
    else:
        return Parser.C_COMMAND

The parentheses '(' and ')' are used for grouping in regular expressions. You have to escape them (and any other metacharacters) with a backslash, e.g. r'\('.

The language of regular expressions gives a special meaning to ( (it's used for starting a group). If you want to match a literal left-parenthesis, you need to escape it with a backslash: elif re.match(r'\(', ....
(Why r'...' rather than just '...'? Because in ordinary strings, backslashes are also used for escaping control characters and suchlike, so you need to write \\ to get a single backslash into the string. So you could instead write elif re.match('\\(', .... It's better to get into the habit of using r'...' strings for regular expressions, though; they are less error-prone.)
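A minimal, self-contained sketch of the escaping fix. The function classify is a hypothetical stand-in for the question's commandType method, and its string labels replace the Parser constants, which the question does not show. It also shows that compiling a bare '(' raises re.error, and that re.escape can insert the backslash for you.

```python
import re

def classify(line):
    # Hypothetical stand-in for commandType(): plain labels instead of Parser constants.
    if re.match(r'\(', line):  # r'\(' is an escaped, literal "("
        return "L_COMMAND"
    return "OTHER"

# An unescaped "(" opens a group that is never closed, so compiling it fails.
try:
    re.compile('(')
except re.error as exc:
    print("re.error:", exc)

print(classify("(LOOP)"))  # L_COMMAND
print(classify("D=M"))     # OTHER
print(re.escape('('))      # \(
```

re.escape is handy when the character to match comes from data rather than being typed into the pattern by hand.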

Related

regex re.search is not returning the match

I tried this code:
import re

x = re.search("f?e?males?\b", "russian male")
if x:
    print("YES! We have a match!")
else:
    print("No match")
But it is printing "No match".
I'm trying to apply it to a DataFrame: if the string contains "male", it has to return another value.
But the regex is not working. Do you know why? I don't want to search only for "male", because I also want to match female, females, males, etc.
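Use the r prefix when writing the patterns, i.e. r'f?e?males?\b'.
Without the prefix, Python turns "\b" into a backspace character (\x08) before the regex engine ever sees it, so the pattern searches for a literal backspace instead of a word boundary. More information can be found in the top answer to "Python regex - r prefix".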
Add an r in front of the regex: x = re.search(r"f?e?males?\b", "russian male"), because your regex contains a backslash (\).
See Regular expression operations:
Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal. Also, please note that any invalid escape sequences in Python’s usage of the backslash in string literals now generate a DeprecationWarning and in the future this will become a SyntaxError. This behaviour will happen even if it is a valid escape sequence for a regular expression.
The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
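A short sketch of what the quoted documentation describes, using the pattern from the question. In an ordinary string, "\b" is a valid escape for the backspace character, so Python does not even warn; the pattern silently looks for a backspace that is not in the text.

```python
import re

plain = "f?e?males?\b"  # "\b" is a valid string escape: the backspace character \x08
raw = r"f?e?males?\b"   # a backslash followed by "b": the regex word boundary

print(len("\b"), len(r"\b"))                   # 1 2
print(re.search(plain, "russian male"))        # None: there is no backspace in the text
print(re.search(raw, "russian male").group())  # male
```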
The problem seems to be the \b part of your regex. I think you want a negative lookahead here: x = re.search(r"f?e?males?(?!\S)", "russian male")
This matches "russian male", "russian male ", "russian males" but not "russian maley" or "russian male!"
Oh, and as the other 2 answers pointed out: you need the r in front of your regex :)
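A sketch comparing the two patterns on the sample inputs listed above; the helper name compare is illustrative. Only "russian male!" should differ: \b treats the "!" as a word boundary, while (?!\S) rejects any non-whitespace character right after the match.

```python
import re

WORD_BOUNDARY = re.compile(r"f?e?males?\b")
NO_NON_SPACE_AFTER = re.compile(r"f?e?males?(?!\S)")

def compare(text):
    # Returns (matches with \b, matches with the (?!\S) lookahead)
    return bool(WORD_BOUNDARY.search(text)), bool(NO_NON_SPACE_AFTER.search(text))

for sample in ["russian male", "russian male ", "russian males",
               "russian maley", "russian male!"]:
    print(repr(sample), compare(sample))
```

Which one is right depends on whether text like "male," or "male." in your DataFrame should count as a match.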

How to fix “<string> DeprecationWarning: invalid escape sequence” in Python?

I am using pytest-3/python3
import re

def check_email(email):
    regex = '^\w+([\.-]?\w+)*#\w+([\.-]?\w+)*(\.\w{2,3})+$'
    if re.search(regex, email):
        return True
    else:
        return False
The regex = ... line is the one that triggers the warning.
Use a raw string
regex = r'^\w+([\.-]?\w+)*#\w+([\.-]?\w+)*(\.\w{2,3})+$'
Note the r prefix. It ensures that each \ is not interpreted as the start of an escape sequence and is kept as a plain \ character.
Because \ is meaningful in regular expressions, it is a good habit in Python to always use raw strings for regular expressions. (See the re documentation.)
A backslash in an ordinary string literal starts an escape sequence, much like '\n' is a newline character, so it is not guaranteed to appear literally in the string. You can write "\\w", for example, to get a literal "\w".
Or you can prepend the whole string with an r, as in r'your \string'.
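A sketch that makes the warning visible without running pytest: it compiles two tiny snippets of source with the built-in compile and records any warnings. The helper escape_warnings is hypothetical. Python 3.6 through 3.11 report an invalid escape such as "\w" as DeprecationWarning, while 3.12 and later report SyntaxWarning, so the exact category depends on your interpreter.

```python
import warnings

# Two tiny snippets of Python source, held as raw strings so this file itself is warning-free.
BAD = r"pattern = '^\w+$'"    # ordinary literal: "\w" is an invalid escape sequence
GOOD = r"pattern = r'^\w+$'"  # raw literal: the backslash is kept as-is

def escape_warnings(source):
    """Compile a snippet and return the names of the warning categories it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<demo>", "exec")
    return [w.category.__name__ for w in caught]

print(escape_warnings(BAD))   # DeprecationWarning or SyntaxWarning, depending on version
print(escape_warnings(GOOD))  # []

# Doubling the backslash in an ordinary string gives the same two characters as the raw form.
print(r"\w" == "\\w")  # True
```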

python regex: to match space character or end of string

I want to match space chars or end of string in a text.
import re
uname='abc'
assert re.findall('#%s\s*$' % uname, '#'+uname)
assert re.findall('#%s\s*$' % uname, '#'+uname+' '+'aa')
assert not re.findall('#%s\s*$' % uname, '#'+uname+'aa')
The pattern is not right.
How should I write it in Python?
\s*$ is incorrect: this matches "zero or more spaces followed by the end of the string", rather than "one or more spaces or the end of the string".
For this situation, I would use
(?:\s+|$) (inside a raw string, as others have mentioned).
The (?:...) part is a non-capturing group; it just isolates that subexpression so that the | alternation applies to exactly that fragment and nothing more.
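A sketch applying the (?:\s+|$) suggestion to the three assertions from the question, using re.search. Wrapping uname in re.escape is an addition of this sketch, not part of the answer: it keeps a user name containing characters such as . or + from being read as regex syntax.

```python
import re

uname = 'abc'
# re.escape is an extra safeguard added here; the answer itself does not use it.
pattern = re.compile(r'#%s(?:\s+|$)' % re.escape(uname))

assert pattern.search('#' + uname)                # end of string right after the name
assert pattern.search('#' + uname + ' ' + 'aa')   # whitespace right after the name
assert not pattern.search('#' + uname + 'aa')     # another character follows: no match
print("all three assertions pass")
```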
Try this:
assert re.findall('#%s\\s*$' % uname, '#'+uname)
You must escape the \ character if you don't use raw strings.
It's a bit confusing, but it stems from the fact that \ is a metacharacter for both the Python interpreter and the re module.
Use raw strings.
assert re.findall(r'#%s\s*$' % uname, '#'+uname)
Otherwise the use of \ as a special character in regular strings conflicts with its use as a special character in regular expressions.
But this assertion can never fail. Of course a string consisting of nothing but "#" plus the contents of the variable uname is going to match a regular expression of "#" plus uname, plus optional (always empty) whitespace, and then the end of the string. It's a tautology. I suspect you are trying to check for something else?

Python regular expression with string in it

I would like to match a string with something like:
re.match(r'<some_match_symbols><my_match><some_other_match_symbols>', mystring)
where my_match is a string I would like it to find. The problem is that it may differ from time to time, and it is stored in a variable. Is it possible to put a variable into a regex?
Nothing prevents you from simply doing this:
re.match('<some_match_symbols>' + '<my_match>' + '<some_other_match_symbols>', mystring)
Regular expressions are nothing more than strings containing some special characters that are specific to regular expression syntax. But they are still strings, so you can do whatever you are used to doing with strings.
By the way, r'…' is raw string syntax, which just prevents escape sequences inside the string from being evaluated. So r'\n' is the same as '\\n', a string containing a backslash and an n, while '\n' contains a line break.
import re

url = "www.dupe.com"
expression = re.compile('<p>%s</p>' % url)
result = expression.match("<p>www.dupe.com</p>BBB")
if result:
    print(result.start(), result.end())
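The r'' notation only applies to string literals (constants) written in your source code. To build a pattern from a variable, insert the variable into the pattern string and compile it with the re module.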
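A sketch of the same example with re.escape added (an addition, not part of the answer above). Without it, each "." in url is a metacharacter that matches any character, so the unescaped pattern also accepts text such as "wwwXdupeYcom".

```python
import re

url = "www.dupe.com"
loose = re.compile("<p>%s</p>" % url)                    # "." matches any character
expression = re.compile(r"<p>%s</p>" % re.escape(url))   # "." matches only a literal dot

result = expression.match("<p>www.dupe.com</p>BBB")
if result:
    print(result.start(), result.end())  # 0 19

print(bool(loose.match("<p>wwwXdupeYcom</p>")))       # True
print(bool(expression.match("<p>wwwXdupeYcom</p>")))  # False
```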

How to delete () using re module in Python

I'm having trouble processing XML text.
I want to delete () from my text, as follows:
from <b>(apa-bhari(n))</b> to <b>apa-bhari(n)</b>
I wrote the following code:
name= re.sub('<b>\((.+)\)</b>','<b>\1</b>',name)
But it only returns
<b></b>
I do not understand escape sequences and backreferences. Please tell me the solution.
You need to use raw strings, or escape the backslashes:
name = re.sub(r'<b>\((.+)\)</b>', r'<b>\1</b>', name)
You need to escape backslashes in Python strings if followed by a number; the following expressions are all true:
assert '\1' == '\x01'
assert len('\\1') == 2
assert '\)' == '\\)'
So, your code would be
name = re.sub('<b>\\((.+)\\)</b>','<b>\\1</b>',name)
Alternatively, use raw strings for both the pattern and the replacement:
name = re.sub(r'<b>\((.+)\)</b>', r'<b>\1</b>',name)
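A runnable sketch of the difference, using the input from the question. Without the r prefix, '\1' in the replacement is the single character chr(1), not a backreference, which is why the output looked like an empty <b></b>.

```python
import re

name = "<b>(apa-bhari(n))</b>"

# Ordinary replacement string: '\1' becomes chr(1) before re.sub ever sees it.
broken = re.sub('<b>\\((.+)\\)</b>', '<b>\1</b>', name)
# Raw strings: r'\1' reaches re.sub as backslash + "1", a backreference to group 1.
fixed = re.sub(r'<b>\((.+)\)</b>', r'<b>\1</b>', name)

print(repr(broken))  # '<b>\x01</b>'
print(fixed)         # <b>apa-bhari(n)</b>
```

The greedy (.+) still stops at the last ")" before "</b>", so the inner "(n)" survives.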
Try:
name= re.sub('<b>\((.+)\)</b>','<b>\\1</b>',name)
or, if you do not want unreadable code with \\ everywhere you use a backslash, do not escape backslashes manually; add an r before the string instead. For example, r"my\String" is the same as "my\\String".
