How to fix “<string> DeprecationWarning: invalid escape sequence” in Python? - python

I am using pytest-3 with Python 3:
import re
def check_email(email):
    regex = '^\w+([\.-]?\w+)*#\w+([\.-]?\w+)*(\.\w{2,3})+$'
    if re.search(regex, email):
        return True
    else:
        return False
The line that assigns regex is the one that triggers the warning.

Use a raw string
regex = r'^\w+([\.-]?\w+)*#\w+([\.-]?\w+)*(\.\w{2,3})+$'
Note the r prefix. It ensures that each \ is kept as a plain backslash instead of being interpreted as the start of an escape sequence.
As \ is meaningful in regular expressions, it is a good habit in Python to always use raw strings for regular expressions. (See the re documentation)
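As a rough illustration of the difference, the sketch below compiles two small literals from source text and records any warnings. The helper name escape_warnings and the shortened pattern are made up for this example, and the warning category depends on the Python version (DeprecationWarning on older 3.x releases, SyntaxWarning on newer ones).

```python
import warnings

def escape_warnings(source):
    """Compile `source` and return the categories of any warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        compile(source, "<string>", "eval")
    return [w.category for w in caught]

# The source text of a normal literal containing \w ...
plain = escape_warnings(r"'^\w+$'")
# ... and the same literal written with the r prefix.
raw = escape_warnings(r"r'^\w+$'")

print("normal literal warned:", bool(plain))
print("raw literal warned:", bool(raw))
```

Only the normal literal should produce a warning; the raw literal compiles silently.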

In a normal string literal, a backslash starts an escape sequence, much like '\n' is a newline character. An unrecognized sequence such as "\w" is currently kept as it is, but it triggers this warning. You can write "\\w" to get a literal "\w".
Or you can prepend the whole string with an r, like r'your \string'.
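A quick check that the two spellings give the same string (the sample text is invented for illustration):

```python
import re

doubled = "\\w+"   # backslash escaped by hand
raw = r"\w+"       # raw string, backslash kept as written

same_string = doubled == raw
first_word = re.match(raw, "hello world").group()
print(same_string, len(raw), first_word)
```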

Related

Python lint issue : invalid escape sequence '\/'

This is the line of my Python code that gives me an invalid escape sequence '\/' lint issue.
pattern = 'gs:\/\/([a-z0-9-]+)\/(.+)$' # for regex matching
It reports that error for every backslash I used here.
Any idea how to resolve this?
There are two issues here:
1. Since this is not a raw string, the backslashes are string escapes, not regexp escapes. Since \/ is not a valid string escape sequence, you get that warning. Use a raw string so that the backslashes will be ignored by the string parser and passed to the regexp engine. See What exactly is a "raw string regex" and how can you use it?
2. In some languages / is part of the regular expression syntax (it's the delimiter around the regexp), so it needs to be escaped. But Python doesn't use / this way, so there's no need to escape it in the first place.
Use this:
pattern = r'gs://([a-z0-9-]+)/(.+)$' # for regex matching
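As a small sanity check, the unescaped pattern can be tried against a sample URL. The bucket and object names below are hypothetical.

```python
import re

pattern = r'gs://([a-z0-9-]+)/(.+)$'  # for regex matching

# Hypothetical URL, used only to exercise the pattern.
match = re.match(pattern, 'gs://my-bucket/path/to/file.txt')
bucket, blob = match.groups()
print(bucket, blob)
```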

Python - Should I be using string prefix r when looking for a period (full stop or .) using regex?

I would like to know why I get the same result with and without the string prefix "r" when looking for a period (full stop) using Python regex.
After reading a number of sources (links below) multiple times and experimenting in code, only to find the same result (again, see below), I am still unsure of:
What is the difference between using the string prefix "r" and not using it when looking for a period using regex?
Which way is considered the correct way of finding a period in a string using Python regex: with the string prefix "r" or without it?
>>> re.compile("\.").sub("!", "blah.")
'blah!'
>>> re.compile(r"\.").sub("!", "blah.")
'blah!'
>>> re.compile(r"\.").search("blah.").group()
'.'
>>> re.compile("\.").search("blah.").group()
'.'
Sources I have looked at:
Python docs: string literals
http://docs.python.org/2/reference/lexical_analysis.html#string-literals
Regular expression to replace "escaped" characters with their originals
Python regex - r prefix
r prefix is for raw strings
http://forums.udacity.com/questions/7000217/r-prefix-is-for-raw-strings
The raw string notation is just that, a notation to specify a string value. The notation results in different string values when it comes to backslash escapes recognized by the normal string notation. Because regular expressions also attach meaning to the backslash character, raw string notation is quite handy as it avoids having to use excessive escaping.
Quoting from the Python Regular Expression HOWTO:
The solution is to use Python’s raw string notation for regular expressions; backslashes are not handled in any special way in a string literal prefixed with 'r', so r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Regular expressions will often be written in Python code using this raw string notation.
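The \. combination has no special meaning in regular Python strings, so there is no difference at all between the resulting values of '\.' and r'\.'. Both work, although newer Python 3 versions warn about the unrecognized escape in '\.', so the raw form is preferable: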
>>> len('\.')
2
>>> len(r'\.')
2
Raw strings only make a difference when the backslash + other characters do have special meaning in regular string notation:
>>> '\b'
'\x08'
>>> r'\b'
'\\b'
>>> len('\b')
1
>>> len(r'\b')
2
The \b combination has special meaning; in a regular string it is interpreted as the backspace character. But regular expressions see \b as a word boundary anchor, so you'd have to use \\b in your Python string every time you wanted to use this in a regular expression. Using r'\b' instead makes it much easier to read and write your expressions.
The regular expression functions are passed string values; the result of Python interpreting your string literal. The functions do not know if you used raw or normal string literal syntax.
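A short sketch of the \b case described above, using a made-up sample sentence. It shows that the normal literal passes backspace characters to the regex engine, while the raw literal passes a word-boundary pattern.

```python
import re

text = "a cat sat"  # sample sentence, invented for this example

# "\b" is a backspace character here, so the pattern looks for
# backspace + "cat" + backspace, which the text does not contain.
with_backspace = re.findall("\bcat\b", text)

# r"\b" keeps the backslash, so the regex engine sees a word boundary.
with_boundary = re.findall(r"\bcat\b", text)

print(with_backspace, with_boundary)
```

The doubled form "\\bcat\\b" produces exactly the same string as r"\bcat\b", so the functions cannot tell the two apart.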

Python regular expression with string in it

I would like to match a string with something like:
re.match(r'<some_match_symbols><my_match><some_other_match_symbols>', mystring)
where my_match is a string I would like it to find. The problem is that it may differ from time to time, and it is stored in a variable. Would it be possible to add a variable to a regex?
Nothing prevents you from simply doing this:
re.match('<some_match_symbols>' + '<my_match>' + '<some_other_match_symbols>', mystring)
Regular expressions are nothing more than strings containing some special characters specific to the regular expression syntax. But they are still strings, so you can do whatever you are used to doing with strings.
The r'…' syntax is, by the way, a raw string syntax, which just prevents any escape sequences inside the string from being evaluated. So r'\n' will be the same as '\\n', a string containing a backslash and an n, while '\n' contains a line break.
import re
url = "www.dupe.com"
# Build the pattern from the variable, then compile it.
expression = re.compile('<p>%s</p>' % url)
result = expression.match("<p>www.dupe.com</p>BBB")
if result:
    print(result.start(), result.end())
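The r'' notation only applies to string literals written in the source code. To build a pattern from a variable, format the variable into the pattern string and compile the result with the re library.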
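One caveat when inserting a variable into a pattern: any regex metacharacters in it (such as the dots in the URL) keep their special meaning. A minimal sketch, assuming the variable should be matched literally, is to pass it through re.escape first. The second test string is invented to show the difference.

```python
import re

url = "www.dupe.com"

# Without escaping, each "." in the variable matches any character.
loose = re.compile('<p>%s</p>' % url)
# re.escape() backslash-escapes the metacharacters in the variable.
strict = re.compile('<p>%s</p>' % re.escape(url))

lookalike = "<p>wwwXdupeXcom</p>"  # hypothetical near-miss input
print(bool(loose.match(lookalike)), bool(strict.match(lookalike)))

result = strict.match("<p>www.dupe.com</p>BBB")
print(result.start(), result.end())
```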

Unbalanced parenthesis python

I have the following code:
def commandType(self):
    import re
    print(self.cmds[self.counter])
    if re.match("#",self.cmds[self.counter]):
        return Parser.A_COMMAND
    elif re.match('(',self.cmds[self.counter]):
        return Parser.L_COMMAND
    else:
        return Parser.C_COMMAND
and on this line: elif re.match('(',self.cmds[self.counter]):
I'm getting an error.
What am I doing wrong?
Parentheses have special meaning in regular expressions. You can escape the paren but you really do not need a regex at all for this problem:
def commandType(self):
    print(self.cmds[self.counter])
    # Plain substring tests; no regex needed.
    if '#' in self.cmds[self.counter]:
        return Parser.A_COMMAND
    elif '(' in self.cmds[self.counter]:
        return Parser.L_COMMAND
    else:
        return Parser.C_COMMAND
The parentheses '(' and ')' are used for grouping in regexps. To match them literally you have to escape them (and any other special characters) with a backslash, e.g. r'\('.
The language of regular expressions gives a special meaning to ( (it's used for starting a group). If you want to match a literal left-parenthesis, you need to escape it with a backslash: elif re.match(r'\(', ....
(Why r'...' rather than just '...'? Because in ordinary strings, backslashes are also used for escaping control characters and suchlike, and you need to write \\ to get a single backslash into the string. So you could instead write elif re.match('\\(', .... It's better to get into the habit of using r'...' strings for regular expressions -- it's less error-prone.)
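The points above can be sketched as follows. The sample line "(LOOP)" is a hypothetical input. str.startswith is shown as the non-regex option that, like re.match, only looks at the start of the string.

```python
import re

line = "(LOOP)"  # hypothetical input line

# An unescaped "(" opens a group that is never closed, so it fails to compile.
try:
    re.compile('(')
    unescaped_ok = True
except re.error:
    unescaped_ok = False

# Escaped with a raw string, it matches a literal parenthesis.
escaped_match = re.match(r'\(', line) is not None

# No regex at all: check the first character directly.
no_regex = line.startswith('(')

print(unescaped_ok, escaped_match, no_regex)
```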

Raw string in Python regular expression using Windows folder path

Trying to use backslashes in raw strings with this regular expression:
import re
print(re.sub(r'^[a-zA-Z]:\\.+(\\Data.+)', r'D:\folder\1', r'C:\Some\Path\Data\File.txt'))
Expected output:
D:\folder\Data\File.txt
However \f is being interpreted. Is there any way to make this work without converting to forward slashes?
re.sub interprets escape sequences in the replacement string (docs). Adding an extra backslash before the \f to escape the backslash seems to do the trick:
import re
print(re.sub(r'^[a-zA-Z]:\\.+(\\Data.+)', r'D:\\folder\1', r'C:\Some\Path\Data\File.txt'))
If your replacement string is dynamic, you can always use another regexp to escape backslashes, or use str.encode('unicode-escape').
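To avoid escape processing in the replacement altogether, you can pass a function (for example a lambda) instead. Its return value is inserted literally, so the group has to be added explicitly:
print(re.sub(r'^[a-zA-Z]:\\.+(\\Data.+)', lambda m: r'D:\folder' + m.group(1), r'C:\Some\Path\Data\File.txt'))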
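For comparison, here is a small sketch that runs three replacement styles side by side on the path from the question. The variable names are chosen for illustration only.

```python
import re

pattern = r'^[a-zA-Z]:\\.+(\\Data.+)'
path = r'C:\Some\Path\Data\File.txt'

# Single backslash: re.sub reads \f in the template as a form feed.
broken = re.sub(pattern, r'D:\folder\1', path)

# Doubled backslash: the template yields one literal backslash.
escaped = re.sub(pattern, r'D:\\folder\1', path)

# Function replacement: the return value is used as-is, so the
# captured group is appended by hand.
via_function = re.sub(pattern, lambda m: r'D:\folder' + m.group(1), path)

print(repr(broken))
print(escaped)
print(via_function)
```

The last two calls should both print the expected D:\folder\Data\File.txt.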
