Why can't I end a raw string with a backslash? [duplicate] - python

This question already has answers here:
Why can't Python's raw string literals end with a single backslash?
(14 answers)
Closed 6 years ago.
I am confused here: raw strings seem to convert every \ to \\, but when a \ appears at the end of the string, it raises an error.
>>> r'so\m\e \te\xt'
'so\\m\\e \\te\\xt'
>>> r'so\m\e \te\xt\'
SyntaxError: EOL while scanning string literal
Update:
This is now covered in Python FAQs as well: Why can’t raw strings (r-strings) end with a backslash?

You still need \ to escape ' or " in raw strings, since otherwise the Python interpreter doesn't know where the string stops. In your example, you're escaping the closing '.
Otherwise:
r'it wouldn\'t be possible to store this string'
r'since it'd produce a syntax error without the escape'
Look at the syntax highlighting to see what I mean.
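To see what the escaped quote does to the value, here is a minimal sketch comparing the raw and non-raw spellings of the same literal (the variable names are only illustrative):

```python
# In a raw string, \' keeps the literal open, but the backslash stays in the value.
raw = r'it wouldn\'t be possible to store this string'
cooked = 'it wouldn\'t be possible to store this string'

print(raw)     # it wouldn\'t be possible to store this string
print(cooked)  # it wouldn't be possible to store this string

# The raw version is exactly one character (the backslash) longer.
print(len(raw) - len(cooked))  # 1
```

So the backslash is needed to get the quote into the raw string at all, but you cannot get the quote without the backslash.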

Raw strings can't end in single backslashes because of how the parser works (there is no actual escaping going on, though). The workaround is to add the backslash as a non-raw string literal afterwards:
>>> print(r'foo\')
  File "<stdin>", line 1
    print(r'foo\')
                 ^
SyntaxError: EOL while scanning string literal
>>> print(r'foo''\\')
foo\
Not pretty, but it works. You can add a plus sign to make it clearer what is happening, but it's not necessary:
>>> print(r'foo' + '\\')
foo\
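As a quick sanity check, the different spellings of the workaround should all produce the same four-character string. A small sketch (variable names are illustrative):

```python
# Three ways to spell the same string: f, o, o, backslash.
a = r'foo' '\\'      # adjacent literals are joined into one string
b = r'foo' + '\\'    # explicit concatenation
c = 'foo\\'          # ordinary literal with an escaped backslash

print(a == b == c)  # True
print(len(a))       # 4
print(a)            # foo\
```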

Python strings are processed in two steps:
First the tokenizer looks for the closing quote. It recognizes backslashes when it does this, but doesn't interpret them - it just looks for a sequence of string elements followed by the closing quote mark, where "string elements" are either (a character that's not a backslash, closing quote or a newline - except newlines are allowed in triple-quotes), or (a backslash, followed by any single character).
Then the contents of the string are interpreted (backslash escapes are processed) depending on what kind of string it is. The r flag before a string literal only affects this step.
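The first step can be approximated with a regular expression. This is only a rough model of the rule described above for single-quoted, single-line literals, not CPython's actual tokenizer; `STRING_BODY` and `is_complete_literal` are hypothetical names made up for this sketch:

```python
import re

# Zero or more "string elements" followed by the closing quote.
# An element is either a backslash followed by any one character,
# or any character that is not a backslash, a single quote, or a newline.
STRING_BODY = re.compile(r"""'(?:\\.|[^\\'\n])*'""", re.DOTALL)

def is_complete_literal(source):
    """Return True if source (without any prefix) is one complete literal."""
    return STRING_BODY.fullmatch(source) is not None

print(is_complete_literal(r"'foo\\'"))     # True: the two backslashes pair up
print(is_complete_literal(r"'foo\'"))      # False: \' is consumed as one element
print(is_complete_literal(r"'foo\'bar'"))  # True: the final quote closes the literal
```

Note that nothing in this model depends on an r prefix, which matches the point that the prefix only affects the second step.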

Quote from https://docs.python.org/3.4/reference/lexical_analysis.html#literals:
Even in a raw literal, quotes can be escaped with a backslash, but the
backslash remains in the result; for example, r"\"" is a valid string
literal consisting of two characters: a backslash and a double quote;
r"\" is not a valid string literal (even a raw string cannot end in an
odd number of backslashes). Specifically, a raw literal cannot end in
a single backslash (since the backslash would escape the following
quote character). Note also that a single backslash followed by a
newline is interpreted as those two characters as part of the literal,
not as a line continuation.
So in a raw string, backslashes are not treated specially, except when they precede " or '. Therefore, r'\' and r"\" are not valid string literals: the closing quote is escaped, so the literal never terminates. In this case it makes no difference whether the r prefix is present, i.e. r'\' fails in the same way as '\', and r"\" fails in the same way as "\".
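Both claims from the quote can be checked from code. The sketch below uses the built-in compile() to test source text without breaking the example itself; the helper name `compiles` is just an illustration:

```python
# r"\"" is valid: the backslash keeps the quote from closing the literal,
# and both characters end up in the value.
s = r"\""
print(len(s))  # 2
print(s)       # \"

def compiles(source):
    """Return True if source compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# Build the source text with chr(92) so this example itself stays valid.
backslash = chr(92)
print(compiles("r'" + backslash + "'"))  # False: r'\' never closes
print(compiles("'" + backslash + "'"))   # False: '\' fails the same way
```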

Related

How does Python interpret backslash in string? [duplicate]

This question already has answers here:
Why do 3 backslashes equal 4 in a Python string?
(5 answers)
Closed 7 months ago.
I noticed the pattern, but how does the backslash in a string actually work in theory?
'##2_#]&*^%$\]'
output: '##2_#]&*^%$\\]'
'##2_#]&*^%$\\]'
output: '##2_#]&*^%$\\]'
'##2_#]&*^%$\\\]'
output: '##2_#]&*^%$\\\\]'
The backslash \ character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character. String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for backslash escape sequences.
Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.
In strict compatibility with Standard C, up to three octal digits are accepted, but an unlimited number of hex digits is taken to be part of the hex escape (and then the lower 8 bits of the resulting hex number are used in 8-bit implementations).
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.)
When an 'r' or 'R' prefix is present, backslashes are still used to quote the following character, but all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase 'n'. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.
From your follow-up comment:
What puzzled me is in my example, it doesn't escape. Single backslash produces double backslashes. Double backslashes produce Double backslashes. Triple backslashes produce quadruple backslashes.....
To be clear: your first output is a string with one backslash in it. Python displays two backslashes in its representation of the string.
When you input the string with a single backslash, Python does not treat the sequence \] in the input as any special escape sequence, and therefore the \ is turned into an actual backslash in the actual string, and the ] into a closing square bracket. Quoting from the documentation linked by Klaus D.:
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.)
When you input the string with a double backslash, the sequence \\ is an escape sequence for a single backslash, and then the ] is just a ].
Either way, when Python displays the string back to you, it uses \\ for the single actual backslash, because it does not look ahead to determine that a single backslash would work - the backslash always gets escaped.
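The difference between what the string contains and how it is displayed is easy to check with len(), print() and repr(). A minimal sketch:

```python
s = '\\'  # one escape sequence in the source, one character in the string

print(len(s))        # 1
print(s)             # \
print(repr(s))       # '\\'
print(len(repr(s)))  # 4: two quotes plus the escaped backslash
```

The interactive prompt shows repr(s), which is why a single backslash appears doubled there.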
To go into a little more detail: Python doesn't care about how you specified the string in the first place - it has a specific "normalized" form that depends only on what the string actually contains. We can see this by playing around with the different ways to quote a string:
>>> 'foo'
'foo'
>>> "foo"
'foo'
>>> r'foo'
'foo'
>>> """foo"""
'foo'
The normalized form will use double quotes if that avoids escape sequences for single quotes:
>>> '\'\'\''
"'''"
But it will switch back to single quotes if the string contains both types of quote:
>>> '\'"'
'\'"'
>>> "'\"'
'\'"'
(Exercise: how many characters are actually in this string, and what are they? How many backslashes does the string contain?)
It contains two characters - a single-quote and a double-quote - and no backslashes.
For the first pattern
'##2_#]&*^%$\]'
the \ is not escaped, so one more \ is added in the output to escape it.
For the second pattern
'##2_#]&*^%$\\]'
the \ is already escaped in the pattern, so no new \ appears in the output.
For the third pattern
'##2_#]&*^%$\\\]'
the first \ escapes the second \, and the third \ is escaped by adding one more \ in the output. So the output shows four \ characters.
Hope it helps.
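To count how many backslashes each of the three strings really contains, one option is to evaluate the literals with ast.literal_eval. This is only a sketch: the source text is held in raw strings, and warnings are silenced because newer Python versions warn about unrecognized escapes such as \].

```python
import ast
import warnings

# Source text of the three literals from the question.
sources = [
    r"'##2_#]&*^%$\]'",
    r"'##2_#]&*^%$\\]'",
    r"'##2_#]&*^%$\\\]'",
]

backslash_counts = []
with warnings.catch_warnings():
    # Ignore the invalid-escape warning that \] triggers on newer versions.
    warnings.simplefilter("ignore")
    for src in sources:
        value = ast.literal_eval(src)
        backslash_counts.append(value.count("\\"))
        print(src, "->", repr(value))

print(backslash_counts)  # [1, 1, 2]
```

The first two literals produce the same string with one backslash; the third produces a string with two, which repr displays as four.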

How to include escape char in Python string

I'm trying to build a string dynamically with the following code
output = "".join(["network", "\", "account"])
The escaped result should be something like network\account
How can I do this in Python 3 without running into this error?
File "<stdin>", line 1
"".join(["network", "\", "account"])
^
SyntaxError: invalid syntax
Raw strings are another way (someone already posted an answer using an escape character).
Precede the quotes with an r:
r'network\account'
Edit:
I realise that this doesn't actually work with your example using a single backslash,
I had posted:
output = "".join(["network", r"\", "account"])
but according to the Python docs:
Even in a raw literal, quotes can be escaped with a backslash, but the
backslash remains in the result; for example, r"\"" is a valid string
literal consisting of two characters: a backslash and a double quote;
r"\" is not a valid string literal (even a raw string cannot end in an
odd number of backslashes). Specifically, a raw literal cannot end in
a single backslash (since the backslash would escape the following
quote character).
https://docs.python.org/3/reference/lexical_analysis.html?highlight=raw%20strings
Escape the backslash:
output = "".join(["network", "\\", "account"])
In Python strings, the backslash "\" is a special character, also called the "escape" character, so write '\\' to get a literal backslash:
output = "".join(["network", "\\", "account"])
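Since the backslash is really acting as a separator here, another way to write this (a small sketch, with `parts` as an illustrative name) is to join on the escaped backslash directly:

```python
parts = ["network", "account"]

# Use the backslash as the separator instead of as a list element.
output = "\\".join(parts)
print(output)     # network\account
print(len("\\"))  # 1: the escape sequence is a single character

# Same result as the approach with an escaped backslash in the list.
print(output == "".join(["network", "\\", "account"]))  # True
```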

Escape sequence char as a list string [duplicate]

When I write print('\') or print("\") or print("'\'"), Python doesn't print the backslash \ symbol. Instead it errors for the first two and prints '' for the third. What should I do to print a backslash?
This question is about producing a string that has a single backslash in it. This is particularly tricky because it cannot be done with raw strings. For the related question about why such a string is represented with two backslashes, see Why do backslashes appear twice?. For including literal backslashes in other strings, see using backslash in python (not to escape).
You need to escape your backslash by preceding it with, yes, another backslash:
print("\\")
And for versions prior to Python 3:
print "\\"
The \ character is called an escape character, which interprets the character following it differently. For example, n by itself is simply a letter, but when you precede it with a backslash, it becomes \n, which is the newline character.
As you can probably guess, \ also needs to be escaped so it doesn't function like an escape character. You have to... escape the escape, essentially.
See the Python 3 documentation for string literals.
A hacky way of printing a backslash that doesn't involve escaping is to pass its character code to chr:
>>> print(chr(92))
\
print(fr"\{''}")
or how about this
print(r"\ "[0])
For completeness: A backslash can also be escaped as a hex sequence: "\x5c"; or a short Unicode sequence: "\u005c"; or a long Unicode sequence: "\U0000005c". All of these will produce a string with a single backslash, which Python will happily report back to you in its canonical representation - '\\'.
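A short sketch that collects these spellings in one place and checks that they all produce the same one-character string (the dictionary and its keys are just for illustration; the named escape \N{REVERSE SOLIDUS} is one more spelling, using the Unicode name of the character):

```python
candidates = {
    "escaped": "\\",
    "chr": chr(92),
    "hex": "\x5c",
    "short unicode": "\u005c",
    "long unicode": "\U0000005c",
    "named": "\N{REVERSE SOLIDUS}",
    "raw slice": r"\ "[0],
}

for label, value in candidates.items():
    print(label, repr(value), len(value))

# Every spelling yields the same single character.
print(len(set(candidates.values())))  # 1
```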

In python SyntaxError: EOL while scanning string literal [duplicate]

This question already has answers here:
Why can't Python's raw string literals end with a single backslash?
(13 answers)
Closed last year.
I'm using python 3.5.1. When I was trying this
print(r'\t\\\')
I got the error: SyntaxError: EOL while scanning string literal.
But this one worked well
print(r'\t\\')
Can anyone please explain this?
See the 3.5 docs on String and Bytes literals:
Even in a raw literal, quotes can be escaped with a backslash, but the backslash remains in the result; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw literal cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the literal, not as a line continuation.
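The odd/even rule from the quote can be checked with compile(). The sketch below builds the source text of a raw literal with a given number of trailing backslashes; `raw_literal_compiles` is a hypothetical helper name:

```python
def raw_literal_compiles(trailing_backslashes):
    """Try to compile r'x' followed by the given number of backslashes."""
    source = "r'x" + chr(92) * trailing_backslashes + "'"
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

for n in range(5):
    print(n, raw_literal_compiles(n))
# 0 True
# 1 False
# 2 True
# 3 False
# 4 True
```

This is why r'\t\\' works (two trailing backslashes) while r'\t\\\' does not (three).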

Confusion escaping single quotes in a single-quoted raw string literal

The following works as expected:
>>> print re.sub('(\w)"(\W)', r"\1''\2", 'The "raw string literal" is a special case of a "string literal".')
The "raw string literal'' is a special case of a "string literal''.
Since I wanted to use single quotes in the replacement expression (is that the correct terminology?), I quoted it using double quotes.
But then for my edification I tried using single quotes in the replacement expression and can't understand the results:
>>> print re.sub('(\w)"(\W)', r'\1\'\'\2', 'The "raw string literal" is a special case of a "string literal".')
The "raw string literal\'\' is a special case of a "string literal\'\'.
Shouldn't the two forms produce exactly the same output?
So, my questions are:
How do I escape a single quote in a single-quoted raw string?
How do I escape a double quote in a double-quoted raw string?
Why is it that in the first parameter to re.sub() I didn't have to use raw string, but in the second parameter I have to. Both seem like string representations of regexes to this Python noob.
If it makes a difference, I am using Python 2.7.5 on Mac OS X (10.9, Mavericks).
No, they should not. A raw string literal does let you escape quotes, but the backslashes will be included:
>>> r"\'"
"\\'"
where Python echoes the resulting string as a string literal with the backslash escaped.
This is explicitly documented behaviour of the raw string literal syntax:
When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase 'n'. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes).
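The difference between the two replacement strings can be checked directly. This is a sketch in Python 3 syntax (the question used Python 2), and the variable names are illustrative:

```python
import re

text = 'The "raw string literal" is a special case of a "string literal".'
pattern = r'(\w)"(\W)'

# The two replacement strings are not the same string.
print(r"\1''\2" == r'\1\'\'\2')            # False
print(len(r"\1''\2"), len(r'\1\'\'\2'))    # 6 8

double_quoted = re.sub(pattern, r"\1''\2", text)
single_quoted = re.sub(pattern, r'\1\'\'\2', text)

print(double_quoted)  # The "raw string literal'' is a special case of a "string literal''.
print(single_quoted)  # The "raw string literal\'\' is a special case of a "string literal\'\'.
```

The two extra characters in the single-quoted version are the backslashes, and re.sub copies them into the result.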
If you didn't use a raw string literal for the second parameter, Python would interpret the \digit combination as octal byte values:
>>> '\0'
'\x00'
You can construct the same string without raw string literals by doubling the backslashes:
>>> '\\1\'\'\\2'
"\\1''\\2"
To answer the questions of the OP:
How do I escape a single quote in a single-quoted raw string?
That is not possible, except if you have the special case where the single quote is preceded by a backslash (as Martijn pointed out).
How do I escape a double quote in a double-quoted raw string?
See above.
Why is it that in the first parameter to re.sub() I didn't have to use raw string, but in the second parameter I have to. Both seem like string representations of regexes to this Python noob.
Completing Martijn's answer (which only covered the second parameter): because the first parameter is not a raw string, Python tries to interpret each backslash together with the character that follows it as an escape sequence. However, because \w and \W do not form valid escape sequences, each backslash is kept as a literal character:
>>> '(\w)"(\W)'
'(\\w)"(\\W)'
>>> '(\t)"(\W)'
'(\t)"(\\W)'
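The same comparison can be made programmatically. In this sketch the non-raw literals are evaluated with ast.literal_eval, with warnings silenced because newer Python versions warn about unrecognized escapes such as \w; the variable names are illustrative:

```python
import ast
import warnings

with warnings.catch_warnings():
    # Ignore the invalid-escape warning for \w and \W on newer versions.
    warnings.simplefilter("ignore")
    plain = ast.literal_eval(r"""'(\w)"(\W)'""")
    with_tab = ast.literal_eval(r"""'(\t)"(\W)'""")

raw = r'(\w)"(\W)'

print(plain == raw)          # True: \w and \W are left unchanged
print("\t" in with_tab)      # True: \t became a real tab character
print(with_tab.count("\\"))  # 1: only the backslash of \W survives
```

Using a raw string for the pattern avoids relying on which sequences happen to be unrecognized.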
