python raw string notation throwing error with trailing slash - python

I'm trying to set a path to a string variable in python using raw string notation and am getting an error with the trailing slash:
datapath = r'C:\path\to\my\data\'
gives me an "EOL while scanning string literal" error
I thought raw string notation was supposed to make everything in the string literal. Can someone explain this to me?
Thanks

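Raw strings have one exception, and it concerns the closing quote. In
r'C:\path\to\my\data\'
the tokenizer still pairs the final backslash with the ' that follows it, so that quote is consumed as part of the string instead of ending it, and parsing continues to the end of the line.
Note that r'C:\path\to\my\data\\' is not a fix: in a raw string both backslashes are kept, so the value would end in two backslashes. As frustrating as it is, you have to either use a regular string with every backslash doubled, 'C:\\path\\to\\my\\data\\', or add the trailing backslash separately, r'C:\path\to\my\data' '\\'.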
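As a minimal sketch of the workarounds, the three spellings below all produce the same path with a single trailing backslash. The use of ntpath (the Windows flavour of os.path, importable on any platform) is an assumption made here so the example behaves the same everywhere; on Windows, os.path.join would do the same thing.

```python
import ntpath  # Windows path rules, importable on any OS

# Option 1: a regular (non-raw) string with every backslash doubled.
escaped = 'C:\\path\\to\\my\\data\\'

# Option 2: a raw string for the body plus an escaped trailing backslash.
# Adjacent string literals are concatenated at compile time.
raw_plus_tail = r'C:\path\to\my\data' '\\'

# Option 3: joining with an empty last component appends the separator.
joined = ntpath.join(r'C:\path\to\my\data', '')

print(escaped)                             # C:\path\to\my\data\
print(escaped == raw_plus_tail == joined)  # True
```

By contrast, r'C:\path\to\my\data\\' is one character longer, because a raw string keeps both trailing backslashes.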

The documentation defines a string literal in this way:
stringliteral ::= [stringprefix](shortstring | longstring)
You're using the r stringprefix.
Then we have these definitions for characters in the strings:
shortstringchar ::= <any source character except "\" or newline or the quote>
longstringchar ::= <any source character except "\">
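where you will notice that a bare backslash is not one of the characters allowed in a shortstring or a longstring. A backslash can only appear as part of stringescapeseq ::= "\" <any source character>, so the tokenizer always pairs it with the character that follows, including a quote. The r prefix changes how that pair is interpreted afterwards, not how the end of the literal is found.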
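The effect of that grammar can be checked with a small sketch. The helper name compiles is hypothetical, introduced only for this illustration; it hands source text to the built-in compile() and reports whether the compiler accepts it.

```python
# Inside a raw string, a backslash-quote pair keeps both characters.
pair = r"\""
print(len(pair))  # 2

def compiles(source):
    """Return True if the Python compiler accepts `source` (hypothetical helper)."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False

# The source text here is:  datapath = r'C:\path\'
bad_source = "datapath = r'C:\\path\\'"
# The source text here is:  datapath = r'C:\path' '\\'
good_source = "datapath = r'C:\\path' '\\\\'"

print(compiles(bad_source))   # False: the literal is never terminated
print(compiles(good_source))  # True
```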

Related

Python lint issue : invalid escape sequence '\/'

This line of my Python code triggers an "invalid escape sequence '\/'" lint warning:
pattern = 'gs:\/\/([a-z0-9-]+)\/(.+)$' # for regex matching
The warning is reported for every backslash used here.
Any idea how to resolve this?
There are two issues here:
1. Since this is not a raw string, the backslashes are string escapes, not regexp escapes. Since \/ is not a valid string escape sequence, you get that warning. Use a raw string so that the backslashes are left alone by the string parser and passed to the regexp engine. See What exactly is a "raw string regex" and how can you use it?
2. In some languages / is part of the regular expression syntax (it's the delimiter around the regexp), so it needs to be escaped there. Python doesn't use / this way, so there's no need to escape it in the first place.
Use this:
pattern = r'gs://([a-z0-9-]+)/(.+)$' # for regex matching
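A quick check of the corrected pattern, as a sketch: the URL below is a made-up example, not taken from the question. It also shows that the escaped slashes, once written in a raw string, match exactly the same text, which is why they are unnecessary.

```python
import re

pattern = r'gs://([a-z0-9-]+)/(.+)$'
url = 'gs://my-bucket/path/to/file.txt'  # hypothetical example URL

match = re.match(pattern, url)
bucket, key = match.groups()
print(bucket)  # my-bucket
print(key)     # path/to/file.txt

# The regex engine accepts \/ as a literal slash, so this is equivalent.
escaped_pattern = r'gs:\/\/([a-z0-9-]+)\/(.+)$'
print(re.match(escaped_pattern, url).groups() == match.groups())  # True
```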

Why do I only have to escape the last backslash in a string literal?

I have the following Python code:
localExtractpath = "D:\Python\From 0 to 1\Excel\"
if os.path.exists(localZipPath):
    print("Cool! '" + localZipPath + "' exists...proceeding...")
This gives me the error:
File "", line 2
localExtractpath = "D:\Python\From 0 to 1\Excel\"
^
SyntaxError: EOL while scanning string literal
When I escape the last \ in the string, the code works. Why do I only have to escape the last \?
Why do I only have to escape the last \?
Because the last \ is the only one followed by a symbol (") with which it forms an escape sequence: \" (which removes the quote's role as a string terminator).
If \ and the symbol(s) after it don't form an escape sequence, the backslash is kept as is, i.e. as the backslash symbol itself.
(In your case none of \P, \F and \E is a recognized escape sequence, so the \ is interpreted literally. Recent Python versions emit a warning for such unrecognized escapes, so it is better not to rely on this.)
An (unsolicited) solution:
Use forward slashes (/) instead of backslashes (\)
(Windows file APIs generally accept them, too):
localExtractpath = "D:/Python/From 0 to 1/Excel/"
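To illustrate that the forward-slash spelling names the same location as the fully escaped one, here is a small sketch using pathlib.PureWindowsPath. That class is an assumption chosen for this example because it applies Windows path rules on any operating system without touching the file system.

```python
from pathlib import PureWindowsPath

escaped = "D:\\Python\\From 0 to 1\\Excel\\"  # every backslash doubled
forward = "D:/Python/From 0 to 1/Excel/"      # forward slashes, no escaping

# Both spellings parse to the same Windows path.
print(PureWindowsPath(escaped) == PureWindowsPath(forward))  # True

# Converting back to str uses backslashes and drops the trailing separator.
print(str(PureWindowsPath(forward)))  # D:\Python\From 0 to 1\Excel
```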
The last backslash in "D:\Python\From 0 to 1\Excel\" is escaping your ending quotation mark, so in the eyes of the interpreter, your string is unterminated. In fact, you have to escape all your backslashes if you want to use the literal backslash in your string:
"D:\\Python\\From 0 to 1\\Excel\\"
Other answers are right: you should escape all your backslashes and, even better, you should use forward slash for path elements (you can even take a look at the pathlib library).
But to answer specifically your question on why the issue lies only in the last one and not in the previous backslashes, you should take a look at the definition of string literals.
You will see that there is a (short) list of characters for which the backslash makes something particular. For the rest, the backslash is taken as itself.
For instance "\n" is interpreted not as a string with two characters (\ and n) but as a string with only a single line-feed character.
That is not the case with "\P", "\F" or "\E", which are two characters each since they don't have a specific meaning.
\" and \' are special in that they let you insert a " or ' character, respectively, into a string literal delimited by that same character.
For example, 'single: \', double "' and "single: ', double \"" are two ways to write the string single: ', double ".
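The points above can be verified with a short sketch. The unrecognized escape is evaluated through eval() with warnings silenced; that detour is an assumption of this example, used only so the demonstration itself does not trigger the warning that newer Python versions emit for such escapes.

```python
import warnings

newline = "\n"       # recognized escape: one line-feed character
raw_newline = r"\n"  # raw string: a backslash and the letter n

# "\P" is not a recognized escape, so the backslash is kept as is.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    unrecognized = eval(r'"\P"')

print(len(newline), len(raw_newline), len(unrecognized))  # 1 2 2

# Two spellings of the same string containing both kinds of quote.
a = 'single: \', double "'
b = "single: ', double \""
print(a == b)  # True
print(a)       # single: ', double "
```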

Python - How to catch specific mysql warnings?

I want to catch any warning that contains the string 'value'.
From this question, I see this example to catch a specific message:
warnings.filterwarnings('error', 'Unknown table .*')
The docs say about the message parameter:
message is a string containing a regular expression that the warning message must match (the match is compiled to always be case-insensitive).
I have the following code but no errors are thrown and instead I'm just getting the warnings which I cannot catch.
warnings.filterwarnings('error', message='\bvalue\b')
What am I missing? As far as I know, that regex should work for matching the 'value' string.
Python's regular expression syntax is documented here, and the first thing it says is:
Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.
The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
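In Python, as in many languages, the string '\b' is the ASCII backspace character, so your pattern contains two backspaces rather than two regex word boundaries.
You need to escape your backslash characters, or else use Python's raw string prefix r (it is a string-literal feature, not something specific to regexes):
warnings.filterwarnings('error', message='\\bvalue\\b')
warnings.filterwarnings('error', message=r'\bvalue\b')
Also note that the filter's regex is matched against the start of the warning message, so to catch a warning that contains 'value' anywhere in its text you need a leading .* as well, e.g. r'.*\bvalue\b'.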
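The difference between these patterns can be sketched as follows. The helper raises_for and the warning text are hypothetical, made up for this illustration; the filter is installed inside warnings.catch_warnings() so the global filter list is left untouched. The sketch also shows that the message regex is applied from the start of the warning text, so a pattern meant to match a word in the middle needs a leading .*.

```python
import warnings

# In a regular string literal, \b is the backspace character.
print('\b' == '\x08')  # True

def raises_for(pattern, text):
    """Return True if a warning with `text` is turned into an exception
    by an 'error' filter using `pattern` (hypothetical helper)."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence anything that is not matched
        warnings.filterwarnings('error', message=pattern)
        try:
            warnings.warn(text)
        except UserWarning:
            return True
        return False

text = "Incorrect integer value: 'abc'"  # hypothetical warning message

print(raises_for('\bvalue\b', text))      # False: pattern contains backspaces
print(raises_for(r'\bvalue\b', text))     # False: message does not start with "value"
print(raises_for(r'.*\bvalue\b', text))   # True
print(raises_for(r'\bvalue\b', "value out of range"))  # True
```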

pep8 warning on regex string in Python, Eclipse

Why is pep8 complaining about the following string in my code?
import re
re.compile("\d{3}")
The warning I receive:
ID:W1401 Anomalous backslash in string: '\d'. String constant might be missing an r prefix.
Can you explain what the message means? What do I need to change in the code so that warning W1401 goes away?
The code passes the tests and runs as expected. Moreover \d{3} is a valid regex.
"\d" is the same as "\\d" because there is no string escape sequence for d. But that is not obvious to a reader of the code.
Consider \t, by contrast: "\t" represents a tab character, while r"\t" represents a literal \ followed by the character t.
So use raw string when you mean literal \ and d:
re.compile(r"\d{3}")
or escape backslash explicitly:
re.compile("\\d{3}")
Python cannot parse '\d' as a string escape sequence, which is why the warning is produced.
The backslash and the d are then passed on to the regex parser unchanged, where \d works fine as a regex escape sequence.
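A brief sketch of the equivalences described above; the sample text "abc123def" is made up for the example.

```python
import re

# The explicitly escaped form and the raw form are the same string.
print("\\d{3}" == r"\d{3}")  # True

# \t is a recognized string escape, so the two forms differ.
print(len("\t"), len(r"\t"))  # 1 2

# The regex engine receives the backslash and interprets \d as "a digit".
print(re.findall(r"\d{3}", "abc123def"))  # ['123']
```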

How to escape “\” characters in python

I am very new to regular expressions and am trying to match the "\" character using Python.
Normally I can escape "\" like this:
print ("\\");
print ("i am \\nit");
Output:
\
i am \nit
But when I use the same in a regex, it doesn't work as I expected:
print (re.findall(r'\\',"i am \\nit"));
It returns this output:
['\\']
Can someone please explain why?
EDIT: The problem is actually how print works with lists and strings. Printing a list shows the representation (repr) of each string in it, not the string itself, and the representation of a string containing just a backslash is '\\'. So findall is finding the single backslash correctly, but print isn't displaying it the way you'd expect. Try:
>>> print(re.findall(r'\\',"i am \\nit")[0])
\
(The following is my original answer, it can be ignored (it's entirely irrelevant), I'd misinterpreted the question initially. But it seems to have been upvoted a bit, so I'll leave it here.)
The r prefix on a string means the string is in "raw" mode, that is, \ are not treated as special characters (it doesn't have anything to do with "regex").
However, r'\' doesn't work, as you can't end a raw string with a backslash, it's stated in the docs:
Even in a raw string, string quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character).
But you actually can use a non-raw string to get a single backslash: "\\".
can someone please explain why
Because re.findall found one match, and the match text consisted of a backslash. It gave you a list with one element, which is a string, which has one character, which is a backslash.
That is written ['\\'] because '\\' is how you write "a string with one backslash", just as you had to do when you wrote print ("\\") in the example code.
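The distinction between a string and its representation can be checked directly, as in this minimal sketch built on the call from the question:

```python
import re

matches = re.findall(r'\\', "i am \\nit")

print(matches)                        # ['\\']  (the list shows each item's repr)
print(matches[0])                     # prints a single backslash
print(len(matches), len(matches[0]))  # 1 1
print(repr(matches[0]))               # '\\'
```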
Note that you're using two different kinds of string literal here -- there's the regular string "a string" and the raw string r"a raw string". Regular string literals observe backslash escaping, so to actually put a backslash in the string, you need to escape it too. Raw string literals treat backslashes like any other character, so you're more limited in which characters you can actually put in the string (no specials that need an escape code) but it's easier to enter things like regular expressions, because you don't need to double up backslashes if you need to add a backslash to have meaning inside the string, not just when creating the string.
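Backslashes in raw strings do not need to be escaped (and cannot be: r'\\' is two backslashes). The one restriction is that a raw string cannot end in an odd number of backslashes, because the last one would pair with the closing quote.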
