This line of my Python code gives me an "invalid escape sequence '\/'" lint warning:
pattern = 'gs:\/\/([a-z0-9-]+)\/(.+)$' # for regex matching
It gives me that error for every backslash I used here.
Any idea how to resolve this?
There are two issues here:
Since this is not a raw string, the backslashes are string escapes, not regexp escapes. Since \/ is not a valid string escape sequence, you get that warning. Use a raw string so that the backslashes will be ignored by the string parser and passed to the regexp engine. See What exactly is a "raw string regex" and how can you use it?
In some languages / is part of the regular expression syntax (it's the delimiter around the regexp), so they need to be escaped. But Python doesn't use / this way, so there's no need to escape them in the first place.
Use this:
pattern = r'gs://([a-z0-9-]+)/(.+)$' # for regex matching
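A minimal sketch of the raw-string pattern in use. The bucket and object names in the sample URI are made up for illustration. It also shows that the escaped version still works as a regex, because re treats \/ as a plain '/'. Only the string-literal layer was complaining.

```python
import re

# '/' has no special meaning in Python regexes, so it needs no escaping.
# The r prefix would pass any backslashes straight to the regex engine.
pattern = r'gs://([a-z0-9-]+)/(.+)$'

# Hypothetical Cloud Storage URI, used only for this example
match = re.match(pattern, 'gs://my-bucket/path/to/file.csv')
bucket, blob = match.groups()
print(bucket, blob)

# Escaping '/' is harmless to the regex engine (r'\/' still matches '/'),
# but it is unnecessary noise.
assert re.match(r'gs:\/\/([a-z0-9-]+)\/(.+)$', 'gs://my-bucket/path/to/file.csv')
```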
Related
I am writing Google BigQuery wrappers in Python. One of the queries contains a regex, and the Python code is treating it as a syntax error.
Here is the regex
WHEN tier2 CONTAINS '-' THEN REGEXP_EXTRACT(tier2,'(.*)\s-')
the error is Invalid string literal: '(.*)\s-'>
The error is caused by the \ in the regex.
Any suggestions for getting around it?
You need to escape the backslash by preceding it with another backslash.
The backslash \ is an escape character, so you need to escape it for it to be treated as a normal character.
Try
'(.*)\\s-'
Based on your comments, it looks like the above is exactly what you are already sending to BigQuery, so in this case you need to escape each of the two backslashes:
'(.*)\\\\s-'
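The two escaping layers can be sketched as follows, assuming a simplified query against a hypothetical table my_table. Python's string parser removes one level of backslashes. BigQuery's string-literal parser then removes another, which is simulated here with a plain replace as an approximation. The raw-string spelling yields the same query text with half the backslashes.

```python
import re

# Normal Python string: each '\\' becomes one '\', so the query text
# sent to BigQuery contains '(.*)\\s-' (two backslashes).
query = "SELECT REGEXP_EXTRACT(tier2, '(.*)\\\\s-') FROM my_table"

# The same query text written as a raw string: two backslashes suffice.
query_raw = r"SELECT REGEXP_EXTRACT(tier2, '(.*)\\s-') FROM my_table"
assert query == query_raw

# BigQuery then unescapes its own string literal ('\\' -> '\'),
# leaving the regex (.*)\s- . Simulated here with a simple replace.
sql_literal = re.search(r"'(.*)'", query).group(1)
regex = sql_literal.replace("\\\\", "\\")

# Applying that regex to a sample value (made up for illustration)
result = re.search(regex, "abc -def").group(1)
print(regex, result)
```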
I want to catch any warning that contains the string 'value'.
From this question, I found this example for catching a specific message:
warnings.filterwarnings('error', 'Unknown table .*')
The docs say about the message parameter:
message is a string containing a regular expression that the warning message must match (the match is compiled to always be case-insensitive).
I have the following code, but no errors are raised; instead I just get the warnings, which I cannot catch.
warnings.filterwarnings('error', message='\bvalue\b')
What am I missing? As far as I know, that regex should work for matching the 'value' string.
Python's regular expression syntax is documented here, and the first thing it says is:
Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.
The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
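The documentation's two examples can be checked directly. This is a small sketch; the sample text C:\temp is made up for illustration.

```python
import re

# "\n" is one character (a newline); r"\n" is two characters: '\' and 'n'
assert len("\n") == 1
assert len(r"\n") == 2 and r"\n" == "\\n"

# The text below is C:\temp (one real backslash)
text = "C:\\temp"

# To match one literal backslash, the regex must be \\ .
# In a normal string literal that takes four backslashes; in a raw string, two.
found_normal = re.search("\\\\", text).start()
found_raw = re.search(r"\\", text).start()
assert "\\\\" == r"\\"
print(found_normal, found_raw)
```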
In Python, as in many languages, the string '\b' corresponds to the ASCII backspace character.
You need to escape your backslash characters, or else use Python's raw-string prefix r:
warnings.filterwarnings('error', message='\\bvalue\\b')
warnings.filterwarnings('error', message=r'\bvalue\b')
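A sketch of the fix end to end, with a made-up warning text. One extra detail may also matter for "contains": warnings.filterwarnings matches the message regex against the start of the warning text (re.match, case-insensitive). A leading .* is therefore needed to find 'value' anywhere in the message.

```python
import warnings

# In a normal string, '\b' is the backspace character (U+0008), not a word boundary
assert '\bvalue\b' == '\x08value\x08'
# The raw string keeps the backslashes, so the regex engine sees \b (word boundary)
assert r'\bvalue\b' == '\\bvalue\\b'

caught = None
with warnings.catch_warnings():  # restores the previous filters on exit
    # The message regex is matched at the START of the warning text,
    # so prefix it with .* to match 'value' anywhere.
    warnings.filterwarnings('error', message=r'.*\bvalue\b')
    try:
        warnings.warn("this value is deprecated")  # hypothetical warning text
    except UserWarning as exc:
        caught = str(exc)

print(caught)
```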
Why is pep8 complaining about the following line in my code?
import re
re.compile("\d{3}")
The warning I receive:
ID:W1401 Anomalous backslash in string: '\d'. String constant might be missing an r prefix.
Can you explain what is the meaning of the message? What do I need to change in the code so that the warning W1401 is passed?
The code passes the tests and runs as expected. Moreover, \d{3} is a valid regex.
"\d" is the same as "\\d" because there is no string escape sequence \d, but that is not clear to a reader of the code.
But consider \t: "\t" represents a tab character, while r"\t" represents a literal \ followed by t.
So use a raw string when you mean a literal \ followed by d:
re.compile(r"\d{3}")
or escape the backslash explicitly:
re.compile("\\d{3}")
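A quick check of the points above, assuming a made-up sample string. r"\d" and "\\d" are the same two characters, "\t" is not, and both spellings of the pattern compile to the same regex.

```python
import re

# r"\d" and "\\d" are the same two-character string: a backslash followed by 'd'
assert r"\d" == "\\d"
assert len(r"\d") == 2

# "\t" is a single tab character; r"\t" is a backslash followed by 't'
assert "\t" == "\x09"
assert len(r"\t") == 2

# Both spellings produce the same pattern text, hence the same regex
assert re.compile(r"\d{3}").pattern == re.compile("\\d{3}").pattern

# Sample input made up for illustration
digits = re.compile(r"\d{3}").findall("call 555 0123")
print(digits)
```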
Python is unable to parse '\d' as a string escape sequence; that's why it produces a warning.
The two characters are then passed to the regex parser unchanged, where \d works fine as a regex escape sequence.
I am trying to match (using regex in python):
http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg
in the following string:
http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'
My code has something like this:
temp="http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'"
dummy=str(re.compile(r'.com'',,''(.*?)'',,''Model Photo').search(str(temp)).group(1))
I do not think the "dummy" line is correct, and I am unsure how to "escape" the single and double quotes in the regex passed to re.compile.
I tried googling the problem, but I couldn't find anything relevant.
Would appreciate any guidance on this.
Thanks.
The easiest way to deal with strings in Python that contain escape characters and quotes is to triple double-quote the string (""") and prefix it with r. For example:
my_str = r"""This string would "really "suck"" to write if I didn't
know how to tell Python to parse it as "raw" text with the 'r' character and
triple " quotes. Especially since I want \n to show up as a backslash followed
by n. I don't want \0 to be the null byte either!"""
The r means "take escape characters as literal". The triple double-quotes (""") prevent single-quotes, double-quotes, and double double-quotes from prematurely ending the string.
EDIT: I expanded the example to include things like \0 and \n. In a normal string (not a raw string) a \ (the escape character) signifies that the next character has special meaning. For example \n means "the newline character". If you literally wanted the character \ followed by n in your string you would have to write \\n, or just use a raw string instead, as I show in the example above.
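A shorter sketch of the same idea, using a made-up sentence. The raw, triple-quoted string holds both quote types, and \n and \0 stay as a backslash followed by a character rather than becoming a newline or a NUL byte.

```python
# Raw + triple double-quotes: both quote types allowed, backslashes kept as-is
s = r"""He said "don't" and wrote \n and \0"""

assert '"' in s and "'" in s
# Backslash followed by 'n' / '0', not a newline / NUL byte
assert "\\n" in s and "\\0" in s
assert "\n" not in s and "\0" not in s
print(s)
```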
You can also read about string literals in the Python documentation here:
For beginners: http://docs.python.org/tutorial/introduction.html#strings
Complex explanation: http://docs.python.org/reference/lexical_analysis.html#string-literals
Try triple quotes:
import re
tmp = r""".*http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg.*"""
s = """http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'"""
x = re.match(tmp, s)
if x is not None:
    print(x.group())
Also, you were missing the .* at the beginning and the end of the pattern; I added that too.
If you use double quotes (which mean the same thing as single quotes in Python), you don't have to escape anything at all (in this case). You can even write the string literal without the leading r, since there are no backslashes in it:
re.compile(".com','(.*?)','Model Photo")
Commas don't need to be escaped, and single quotes don't need to be escaped if you use double quotes to create the string:
>>> dummy=re.compile(r".com','(.*?)','Model Photo").search(temp).group(1)
>>> print(dummy)
http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg
Note that I also removed some unnecessary str() calls, and for future reference if you do ever need to escape single or double quotes (say your string contains both), use a backslash like this:
'.com\',\'(.*?)\',\'Model Photo'
As mykhal pointed out in comments, this doesn't work very nicely with regex because you can no longer use the raw string (r'...') literal. A better solution would be to use triple quoted strings as other answers suggested.
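A sketch comparing the spellings discussed above. All three produce the identical pattern string, and the triple-quoted raw form needs neither quote escapes nor backslash doubling. The '.' in '.com' still matches any character, exactly as in the original patterns.

```python
import re

temp = "http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'"

# Three spellings of the same pattern string
double_quoted = ".com','(.*?)','Model Photo"
escaped = '.com\',\'(.*?)\',\'Model Photo'
triple_raw = r""".com','(.*?)','Model Photo"""
assert double_quoted == escaped == triple_raw

url = re.search(triple_raw, temp).group(1)
print(url)
```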
Why can't I use this in Python:
r"c:\"
When a string must contain the same quote character with which it starts, escaping that character is the only available workaround, so the design alternative was either to make raw-string literals unable to contain their leading quote character, or to keep the "backslash escapes" convention, even in raw-string literals, just for quote characters.
Since raw-string literals were designed for handy representation of regular expression patterns (not for DOS / Windows paths!-), and in RE patterns a trailing backslash is never necessary, the design decision was easy (based on the real use case for raw-string literals).
Use "c:/" or "c:\\". Raw string literals are meant for disabling escape sequences, not for including literal backslashes; they do work that way, except in this exact case.
It's a known case, I think; better to use "c:\\" for that.
From the documentation:
... a raw string cannot end in a single backslash (since the backslash would escape the following quote character).
Even with raw strings, \" causes the " not to be interpreted as the end of the string (though the backslash gets into your string), so r"foo\"bar" would be a legal string. This is convenient enough when writing regex but not great for writing paths.
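A small sketch of both behaviours. It shows that r"foo\"bar" keeps its backslash, and that r"c:\" is rejected by the parser (checked via compile() so the example itself stays valid Python). It also shows two working ways to write a string ending in one backslash.

```python
# r"foo\"bar" is legal: the backslash keeps the quote from ending the string,
# but the backslash itself stays in the value.
s = r"foo\"bar"
assert s == 'foo\\"bar' and len(s) == 8

# A raw string cannot end with a single backslash: r"c:\" never terminates.
try:
    compile('r"c:\\"', "<example>", "eval")
except SyntaxError:
    raw_trailing_backslash_ok = False
else:
    raw_trailing_backslash_ok = True

# Working alternatives for a string ending in one backslash
assert "c:\\" == r"c:" + "\\"
print(s, raw_trailing_backslash_ok)
```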
This is not a big deal as most of the time you should be using os.path and other modules to deal with your paths.
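A sketch of letting path modules handle the separators, with made-up path components. On Windows, os.path is the ntpath module. ntpath and pathlib.PureWindowsPath are used directly here so the example runs on any OS.

```python
import ntpath                       # Windows path rules, importable on any OS
from pathlib import PureWindowsPath

# Let the library insert separators instead of hand-writing backslashes
joined = ntpath.join("c:\\", "Users", "report.txt")
p = PureWindowsPath("c:/") / "Users" / "report.txt"

assert str(p) == joined
print(joined)
```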
Found in the Design and History FAQ, http://docs.python.org/faq/design.html#why-can-t-raw-strings-r-strings-end-with-a-backslash:
Raw strings were designed to ease creating input for processors (chiefly regular expression engines) that want to do their own backslash escape processing. Such processors consider an unmatched trailing backslash to be an error anyway, so raw strings disallow that. In return, they allow you to pass on the string quote character by escaping it with a backslash. These rules work well when r-strings are used for their intended purpose.