What is the purpose of backward-slash b in python? I ran print "\"foo\bar" in the Python interpreter and got this result:
>>> print "\"foo\bar"
"foar
See the string literal documentation:
\b ASCII Backspace (BS)
It produces a backspace character. Your terminal backspaced over the second o when printing that character.
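One way to confirm this without depending on how the terminal renders the character is to inspect the string itself. This is a minimal sketch in Python 3 syntax; the variable name s is only for illustration.

```python
s = "\"foo\bar"

# The literal holds 7 characters: ", f, o, o, backspace, a, r.
print(len(s))

# repr() shows the backspace as \x08 instead of sending it to the terminal.
print(repr(s))

# \b is the character with code point 8.
print(s[4] == chr(8))
```

The terminal, not Python, is what makes the second o disappear; the string still contains all seven characters.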
The \b is a backspace character:
\b ASCII Backspace (BS)
If you want to print the string \foo\bar do this:
>>> print r"\foo\bar"
\foo\bar
This uses the raw strings available in Python:
String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences
Could you tell me why '?\\\?'=='?\\\\?' gives True? That drives me crazy and I can't find a reasonable answer...
>>> list('?\\\?')
['?', '\\', '\\', '?']
>>> list('?\\\\?')
['?', '\\', '\\', '?']
Basically, because Python is slightly lenient in backslash processing. Quoting from https://docs.python.org/2.0/ref/strings.html :
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string.
(Emphasis in the original)
Therefore, in Python, it isn't that three backslashes are equal to four; it's that when you follow a backslash with a character like ?, the two together come through as two characters, because \? is not a recognized escape sequence.
This is because a backslash acts as an escape character for the character(s) immediately following it, if the combination represents a valid escape sequence. The dozen or so escape sequences are listed in the string literal documentation. They include the obvious ones such as newline \n, horizontal tab \t, carriage return \r and more obscure ones such as named Unicode characters using \N{...}, e.g. \N{WAVY DASH}, which represents Unicode character \u3030. The key point though is that if the escape sequence is not known, the character sequence is left in the string as is.
Part of the problem might also be that the Python interpreter output is misleading you. This is because the backslashes are escaped when displayed. However, if you print those strings, you will see the extra backslashes disappear.
>>> '?\\\?'
'?\\\\?'
>>> print('?\\\?')
?\\?
>>> '?\\\?' == '?\\?' # I don't know why you think this is True???
False
>>> '?\\\?' == r'?\\?' # but if you use a raw string for '?\\?'
True
>>> '?\\\\?' == '?\\\?' # this is the same string... see below
True
For your specific examples, in the first case '?\\\?', the first \ escapes the second backslash leaving a single backslash, but the third backslash remains as a backslash because \? is not a valid escape sequence. Hence the resulting string is ?\\?.
For the second case '?\\\\?', the first backslash escapes the second, and the third backslash escapes the fourth which results in the string ?\\?.
So that's why three backslashes is the same as four:
>>> '?\\\?' == '?\\\\?'
True
If you want to create a string with 3 backslashes you can escape each backslash:
>>> '?\\\\\\?'
'?\\\\\\?'
>>> print('?\\\\\\?')
?\\\?
or you might find "raw" strings more understandable:
>>> r'?\\\?'
'?\\\\\\?'
>>> print(r'?\\\?')
?\\\?
This turns off escape sequence processing for the string literal. See String Literals for more details.
Because \x in a character string, when x is not one of the special backslashable characters like n, r, t, 0, etc, evaluates to a string with a backslash and then an x.
>>> '\?'
'\\?'
From the python lexical analysis page under string literals at:
https://docs.python.org/2/reference/lexical_analysis.html
There is a table that lists all the recognized escape sequences.
\\ is an escape sequence that evaluates to a single \
\? is not an escape sequence, so it stays as the two characters \?
so '\\\\?' is \\ followed by \\ followed by ?, which gives two backslashes and a ? (two escaped \)
and '\\\?' is \\ followed by \?, which also gives two backslashes and a ? (one escaped \ and one literal \)
Also, note that Python does not distinguish between single and double quotes surrounding a string literal, unlike some other languages.
So 'String' and "String" are exactly the same thing in Python; the quote style does not affect the interpretation of escape sequences.
mhawke's answer pretty much covers it, I just want to restate it in a more concise form and with minimal examples that illustrate this behaviour.
One thing to add is that escape processing moves from left to right. For \n, it first finds the backslash, then looks for a character to escape, finds n, and escapes it. For \\n, it finds the first backslash, finds the second and escapes it, then finds n and sees it as a literal n. For \?, it finds the backslash and looks for a character to escape, finds ?, which cannot be escaped, and so treats \ as a literal backslash.
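The left-to-right scan can be imitated with a small, deliberately simplified decoder. This is only a sketch of the idea, not how CPython implements it: the function decode_simple_escapes and the KNOWN table are hypothetical names, and only a handful of escapes are handled. Raw strings are used for the inputs so that the decoder, not Python, sees every backslash.

```python
# Hypothetical subset of escape sequences (Python itself recognizes more).
KNOWN = {"\\": "\\", "n": "\n", "t": "\t", "b": "\b", "a": "\a"}


def decode_simple_escapes(source):
    """Scan left to right, resolving known escapes and keeping unknown ones."""
    out = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\\" and i + 1 < len(source):
            nxt = source[i + 1]
            if nxt in KNOWN:
                # Recognized escape: the pair collapses into one character.
                out.append(KNOWN[nxt])
            else:
                # Unrecognized escape: backslash and character both survive.
                out.append("\\" + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


three = decode_simple_escapes(r"?\\\?")   # source text with three backslashes
four = decode_simple_escapes(r"?\\\\?")   # source text with four backslashes
print(three == four)
print(len(three))
```

Both inputs decode to the same four-character string (?, two backslashes, ?), which mirrors why the two literals compare equal.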
As mhawke noted, the key here is that the interactive interpreter escapes the backslash when displaying a string. I'm guessing the reason for that is to ensure that text strings copied from the interpreter into a code editor are valid Python strings. However, in this case this allowance for convenience causes confusion.
>>> print('\?') # \? is not a valid escape code so backslash is left as-is
\?
>>> print('\\?') # \\ is a valid escape code, resulting in a single backslash
\?
>>> '\?' # same as first example except that interactive interpreter escapes the backslash
'\\?'
>>> '\\?' # same as second example, backslash is again escaped
'\\?'
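The difference between what the interpreter echoes and what print shows comes down to repr() versus str(). A minimal sketch, with s as an illustrative name:

```python
s = '\\?'  # one backslash followed by a question mark

# The string really has two characters.
print(len(s))

# print() writes the characters themselves.
print(s)

# repr() produces a valid string literal, so the backslash is doubled
# and quotes are added; this is what the interactive prompt echoes.
print(repr(s))
print(len(repr(s)))
```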
I am trying to iterate over a string, find a character in it, and delete it.
For example, my string is "HowAre\youDoing" and I want the string "HowAreyouDoing" back (without the character '\'). My loop is:
for c in string:
    if c == '\':
The point is that '\' is a special character and it doesn't allow me to do it this way. Does anybody know how I can proceed?
Thanks
In Python, as in most programming languages, the backslash character is used to introduce a special character, like \n for newline or \t for tab (and several more).
If you initialize a string in Python with \y, the backslash is kept as a literal character, since \y is not a valid escape sequence and Python assumes that you want the actual character \. The interpreter then displays that backslash escaped as \\:
>>> s = "HowAre\youDoing"
>>> s
'HowAre\\youDoing'
So, to replace it in your case, just do
>>> s.replace("\\", "")
'HowAreyouDoing'
If you'd like to replace special characters like the aforementioned, you would need to specify the respective special character with an unescaped "\":
>>> s = "HowAre\nyouDoing"
>>> s
'HowAre\nyouDoing'
>>> s.replace("\n", "")
'HowAreyouDoing'
You should escape the backslash character:
for c in string:
    if c == '\\':
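Filled out, the loop from the question might look like the sketch below. The names s, kept and result are illustrative, and the body of the loop is an assumption about what the asker intended (dropping the backslash and keeping everything else).

```python
s = "HowAre\\youDoing"  # the escaped form avoids relying on \y being unrecognized

kept = []
for c in s:
    if c == "\\":
        # Skip the backslash itself.
        continue
    kept.append(c)

result = "".join(kept)
print(result)

# str.replace does the same thing in a single call.
print(result == s.replace("\\", ""))
```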
I had two answers and some comments, and another question was mentioned, but none of them gave the REASON: why does Python make these changes? That '\b' becomes '\x08' is just the result, but why?
Cheers.
I am trying to add the path "F:\big data\Python_coding\diveintopython-5.4\py"
to sys.path so that the code under it can be imported directly.
After running sys.path.append('F:\big data\Python_coding\diveintopython-5.4\py'),
I found this path inside sys.path: 'F:\x08ig data\Python_coding\diveintopython-5.4\py'
I then tested with the following code: mypath1='F:\big data\bython_coding\aiveintopython-5.4\ry'
mypath1 is now: 'F:\x08ig data\x08ython_coding\x07iveintopython-5.4\ry'
All the '\b' changed into '\x08' and '\a' changed into '\x07'.
I searched for a while but still cannot find the reason. Could you please take a look? Any feedback or help will be appreciated.
Many thanks.
Your strings are being escaped. Check out the docs on string literals:
The backslash (\) character is used to escape characters that
otherwise have a special meaning, such as newline, backslash itself,
or the quote character. String literals may optionally be prefixed
with a letter 'r' or 'R'; such strings are called raw strings and use
different rules for backslash escape sequences.
This is a historical usage dating from the early 60s. It allows you to enter characters that you're not otherwise able to enter from a standard keyboard. For example, if you type into the Python interpreter:
print "\xDC"
...you'll get Ü. In your case, you have \b, which represents backspace and which Python displays in the \xhh form, where hh is the character's hexadecimal value (08 here). \a is the escape sequence for the ASCII bell: try print "\a" with your sound on and you should hear a beep.
The escape sequences \a and \b are equivalent to \x07 and \x08:
>>> '\a'
'\x07'
>>> '\b'
'\x08'
You should escape \ itself to represent backslash literally:
>>> '\\a'
'\\a'
>>> '\\b'
'\\b'
or use raw string literals:
>>> r'\a'
'\\a'
>>> r'\b'
'\\b'
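Applied to the start of the path from the question, the three spellings compare as follows. This is a small sketch; the variable names are illustrative, and only the "F:\big data" prefix is used to keep it short.

```python
mangled = 'F:\big data'    # \b is interpreted as a backspace character
escaped = 'F:\\big data'   # backslash escaped explicitly
raw = r'F:\big data'       # raw string literal, backslash kept as-is

# The mangled path contains \x08 where the backslash and b were intended.
print(repr(mangled))

# The escaped and raw spellings are the same string.
print(escaped == raw)

# The correct path is one character longer (backslash plus b, not backspace).
print(len(mangled), len(raw))
```

Either the escaped or the raw form can then be passed to sys.path.append.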
I am reading through http://docs.python.org/2/library/re.html. According to it, the "r" in Python's re.compile(r' pattern flags') refers to the raw string notation:
The solution is to use Python’s raw string notation for regular
expression patterns; backslashes are not handled in any special way in
a string literal prefixed with 'r'. So r"\n" is a two-character string
containing '\' and 'n', while "\n" is a one-character string
containing a newline. Usually patterns will be expressed in Python
code using this raw string notation.
Would it be fair to say then that:
re.compile(r pattern) means that "pattern" is a regex, while re.compile(pattern) means that "pattern" is an exact match?
As @PauloBu stated, the r string prefix is not specifically related to regexes, but to strings generally in Python.
Normal strings use the backslash character as an escape character for special characters (like newlines):
>>> print('this is \n a test')
this is
 a test
The r prefix tells the interpreter not to do this:
>>> print(r'this is \n a test')
this is \n a test
>>>
This is important in regular expressions, as you need the backslash to make it to the re module intact - in particular, \b matches the empty string specifically at the start and end of a word. re expects the two-character string \b, but in a normal string literal '\b' is converted to the ASCII backspace character, so you need to either explicitly escape the backslash ('\\b') or tell Python it is a raw string (r'\b').
>>> import re
>>> re.findall('\b', 'test') # the backslash gets consumed by the python string interpreter
[]
>>> re.findall('\\b', 'test') # backslash is explicitly escaped and is passed through to re module
['', '']
>>> re.findall(r'\b', 'test') # often this syntax is easier
['', '']
No. As the documentation pasted in explains, the r prefix on a string indicates that the string is a raw string.
Because of the collisions between Python's escaping of characters and regex escaping, both of which use the backslash \ character, raw strings provide a way to tell Python that you want an unescaped string.
Examine the following:
>>> "\n"
'\n'
>>> r"\n"
'\\n'
>>> print "\n"
>>> print r"\n"
\n
Prefixing with an r merely tells Python that backslashes \ should be treated literally and not as escape characters.
This is helpful when, for example, you are searching on a word boundary. The regex for this is \b; however, to capture this in a Python string, I'd need to use "\\b" as the pattern. Instead, I can use the raw string r"\b" to pattern match on.
This becomes especially handy when trying to find a literal backslash in regex. To match a backslash in regex I need to use the pattern \\. Escaping this in Python means I need to escape each backslash, so the pattern becomes "\\\\", or the much simpler r"\\".
As you can guess in longer and more complex regexes, the extra slashes can get confusing, so raw strings are generally considered the way to go.
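The two spellings of the backslash-matching pattern can be checked directly. A minimal sketch; the sample text and variable names are made up for illustration.

```python
import re

text = r"C:\temp"          # hypothetical sample containing one backslash

escaped_pattern = "\\\\"   # four characters in source, two in the string
raw_pattern = r"\\"        # two characters in source, two in the string

# Both literals produce the same two-character regex: \\
print(escaped_pattern == raw_pattern)
print(len(raw_pattern))

# The regex \\ matches a single literal backslash.
matches = re.findall(raw_pattern, text)
print(matches)

# re.escape builds the same pattern from a literal backslash.
print(re.escape("\\") == raw_pattern)
```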
No. Not everything in regex syntax needs to be preceded by \, so ., *, +, etc. still have special meaning in a pattern.
The r'' prefix is often used as a convenience for regexes that do need a lot of \, as it prevents the clutter of doubling up the \.
When I write print('\') or print("\") or print("'\'"), Python doesn't print the backslash \ symbol. Instead it errors for the first two and prints '' for the third. What should I do to print a backslash?
This question is about producing a string that has a single backslash in it. This is particularly tricky because it cannot be done with raw strings. For the related question about why such a string is represented with two backslashes, see Why do backslashes appear twice?. For including literal backslashes in other strings, see using backslash in python (not to escape).
You need to escape your backslash by preceding it with, yes, another backslash:
print("\\")
And for versions prior to Python 3:
print "\\"
The \ character is called an escape character; it causes the character following it to be interpreted differently. For example, n by itself is simply a letter, but when you precede it with a backslash, it becomes \n, which is the newline character.
As you can probably guess, \ also needs to be escaped so it doesn't function like an escape character. You have to... escape the escape, essentially.
See the Python 3 documentation for string literals.
A hacky way of printing a backslash that doesn't involve escaping is to pass its character code to chr:
>>> print(chr(92))
\
print(fr"\{''}")
or how about this
print(r"\ "[0])
For completeness: A backslash can also be escaped as a hex sequence: "\x5c"; or a short Unicode sequence: "\u005c"; or a long Unicode sequence: "\U0000005c". All of these will produce a string with a single backslash, which Python will happily report back to you in its canonical representation - '\\'.
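The spellings mentioned in these answers can be compared in one go. A short sketch in Python 3; the list name candidates is illustrative.

```python
candidates = [
    "\\",          # escaped backslash
    chr(92),       # character code
    "\x5c",        # hex escape
    "\u005c",      # short Unicode escape
    "\U0000005c",  # long Unicode escape
    r"\ "[0],      # first character of a raw string
]

# Every spelling yields the same one-character string.
print(all(c == "\\" for c in candidates))
print({len(c) for c in candidates})

# The canonical representation doubles the backslash.
print(repr(candidates[0]))
```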