How do I know how many backslashes to add in Python?

Why do I need to type five backslashes on the left if I want to print three of them in Python? How do I count the backslashes?
# [ ] print "\\\WARNING!///"
print('"\\\\\Warning///"')

You can use a "raw string" by adding the r prefix:
print(r'"\\\Warning///"')
This turns off the backslash's "escape" behavior, which Python otherwise uses to write special characters.
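As a quick check of the raw-string suggestion, here is a minimal sketch (assuming Python 3; the variable names are only illustrative) that compares the fully escaped spelling with the raw one and counts the backslashes that end up in the string:

```python
# Two spellings of the same text: a quote, three backslashes, Warning///, a quote.
escaped = '"\\\\\\Warning///"'   # six typed backslashes -> three in the string
raw = r'"\\\Warning///"'         # raw string: three typed -> three in the string

print(escaped)                   # "\\\Warning///"
print(escaped == raw)            # True
print(raw.count('\\'))           # 3
```

Neither spelling relies on an unrecognized escape such as \W, so neither should trigger the warning that newer Python versions emit for the five-backslash form.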

A backslash usually starts an escape sequence, so to print a single \ you need to write \\.

Unlike standard C, any unrecognized escape sequences are left unchanged in Python. Recognized ones, such as \t, are still converted:
>>> print('\test')     # '\t' evaluates to a tab, followed by 'est'
        est
>>> print('\\test')    # '\\' evaluates to a literal '\', followed by 'test'
\test
>>> print('\\\test')   # '\\' evaluates to a literal '\', then '\t' evaluates to a tab, followed by 'est'
\       est
>>> print('\\\\test')  # each '\\' evaluates to a literal '\', followed by 'test'
\\test

According to https://docs.python.org/2.0/ref/strings.html:
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string.
Since \W is not a valid escape sequence, \W is printed as it is. On the other hand, \\ is printed as \.
So \\\\\W is printed as \\\W.
However, as of Python 3.6, according to "String and Bytes literals" in the language reference:
Changed in version 3.6: Unrecognized escape sequences produce a DeprecationWarning. In some future version of Python they will be a SyntaxError.
So your code might raise a SyntaxError in a future version of Python.

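\ is used for special characters like '\n', '\t' etc. To print n backslashes you can type 2n of them, or 2n-1 when the character after the last backslash does not form a valid escape sequence with it (as with \w below).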
>>> print('\warning') # 1 \
\warning
>>> print('\\warning') # 2 \
\warning
>>> print('\\\warning') # 3 \
\\warning
>>> print('\\\\warning') # 4 \
\\warning
>>> print('\\\\\warning') # 5 \
\\\warning
>>> print('\\\\\\warning') # 6 \
\\\warning
>>>

Strictly speaking, you should type 2n backslashes to represent n of them. In a strict grammar, the backslash is reserved as an escape character and has a special meaning, i.e., it does not represent "backslash" by itself. Having taken it out of the character set to give it a special meaning, how do we represent a plain backslash? The answer is to represent it the same way a newline character is represented, namely '\\' stands for a backslash.
The reason you get 3 backslashes printed when you type 5 is this: the first 4 are interpreted in pairs as described above, and when the interpreter reaches the fifth one, it finds that there is no definition for '\W', so it treats the fifth backslash as an ordinary character instead of an escape character. This is a convenience feature of the interpreter and might not hold in other versions of it or in other languages (especially stricter ones).
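The 2n rule can be checked directly. The sketch below (assuming Python 3; n and text are illustrative names) builds a string with n backslashes and compares what print, len and repr report:

```python
n = 3
text = '\\' * n + 'Warning'      # '\\' is one backslash character

print(text)                      # \\\Warning
print(len(text))                 # 10: three backslashes plus seven letters
print(repr(text))                # '\\\\\\Warning': repr doubles each backslash
```

The repr form is the "2n" spelling: it is what you would type in source code to get n backslashes without depending on unrecognized escapes.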

Related

Confused about the backslash in Python

I understand that to match a literal backslash, it must be escaped in the regular expression. With raw string notation, this means r"\\". Without raw string notation, one must use "\\\\".
When I saw the code string = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", string), I was wondering the meaning of a backslash in \' and \`, since it also works well as ' and `, like string = re.sub(r"[^A-Za-z0-9(),!?'`]", " ", string). Is there any need to add the backslash here?
I tried some examples in Python:
str1 = "\'s"
print(str1)
str2 = "'s"
print(str2)
The result is same as 's. I think this might be the reason why in previous code, they use \'\` in string = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", string). I was wondering is there any difference between "\'s" and "'s" ?
string = 'adequately describe co-writer/director peter jackson\'s expanded vision of j . r . r . tolkien\'s middle-earth .'
re.match(r"\\", string)
re.match returns nothing, which shows there is no backslash in the string. However, I do see backslashes in it. Is the backslash in \' actually not a backslash?
In Python, those are escape sequences, because the characters can have a meaning to the code other than how they appear on screen (for example, a single quote can delimit a string). You can see all of the Python string literals here, but the reason no backslashes were found in that string is that each \' is an escaped single quote, which is stored as a plain '. Although the backslash is not always necessary, it is still valid syntax because it is sometimes needed.
Check out https://docs.python.org/2.0/ref/strings.html for a better explanation.
The problem with your second example is that string isn't a raw string, so the \' is interpreted as '. If you change it to a raw string, the backslashes are kept and the search finds them:
>>> not_raw = 'adequately describe co-writer/director peter jackson\'s expanded vision of j . r . r . tolkien\'s middle-earth .'
>>> res1 = re.search(r'\\',not_raw)
>>> type(res1)
<type 'NoneType'>
>>> raw = r'adequately describe co-writer/director peter jackson\'s expanded vision of j . r . r . tolkien\'s middle-earth .'
>>> res2 = re.search(r'\\',raw)
>>> type(res2)
<type '_sre.SRE_Match'>
For an explanation of re.match vs re.search: What is the difference between Python's re.search and re.match?
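The transcript above is from Python 2. A shortened Python 3 sketch of the same comparison (the sample sentence is abbreviated here for illustration) might look like this:

```python
import re

not_raw = 'peter jackson\'s vision'   # \' is just a quote: no backslash is stored
raw = r'peter jackson\'s vision'      # raw string: the backslash stays in the string

print(re.search(r'\\', not_raw))              # None
print(re.search(r'\\', raw) is not None)      # True
print(len(raw) - len(not_raw))                # 1
```

The one-character difference in length is the backslash that the raw string keeps.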

Regarding the regex in search module with and without raw text

I am doing the following in python2.7
>>> a='hello team 123'
>>> b=re.search('hello team [0-9]+',a)
>>>
>>> b
<_sre.SRE_Match object at 0x00000000022995E0>
>>> b=re.search(r'hello team [0-9]+',a)
>>> b
<_sre.SRE_Match object at 0x0000000002299578>
>>>
As you can see, in one case I am using a raw string, while in the other I am not.
From one of the posts on SO, I learnt:
The r means that the string is to be treated as a raw string, which means all escape codes will be ignored.
For an example:
'\n' will be treated as a newline character, while r'\n' will be treated as the characters \ followed by n
Then, why is my example working for both cases i.e with r and without r?
Is it because none of my example uses \ ?
You are not using any special characters in your string, so r'' and '' will do the same thing.
In hello team [0-9]+ nothing needs to be escaped. It will be passed to the regex engine as it is. If you use backslashes in your Python string, then you need to escape them so that they reach the regex engine.
There are two levels of escaping involved in a regex. The first level is the Python string literal and the second level is the regex engine.
So for example:
'\\\\' --> Python(string translation) ---> '\\' ---> Regex Engine(translation) ---> '\'
In order to avoid Python string translation you use raw strings.
r'\\' --> Python(string translation) ---> '\\' ---> Regex Engine(translation) ---> '\'
>>> print repr('\\')
'\\'
>>> print repr(r'\\')
'\\\\'
>>> print str('\\')
\
>>> print str(r'\\')
\\
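To see both levels at once in Python 3, here is a small sketch (the sample text and variable names are assumptions for illustration). Both pattern spellings produce the same two-character pattern, which the regex engine then reads as "one literal backslash":

```python
import re

text = 'C:\\temp'                 # the string C:\temp, containing one backslash

plain_pattern = '\\\\'            # Python turns this into two characters: \\
raw_pattern = r'\\'               # raw string: already two characters: \\

print(plain_pattern == raw_pattern)            # True
print(len(raw_pattern))                        # 2
print(re.search(raw_pattern, text).group())    # \
```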

How is the string """\\"Axis of Awesome\\\"""" interpreted in Python

I read about Python's EOL error and found an explanation of it. The author gives an example of a correct string, but I cannot figure out how the string """\\"Axis of Awesome\\\"""" works. Can someone explain how this string is interpreted? Thanks.
==================================================
the answer:
I thought about it a lot and finally figured it out. The explanation applies __repr__ to the string and outputs \\"Axis of Awesome\\", whereas Hyperboreus's explanation uses __str__, and the result is \"Axis of Awesome\". They are the same string.
"""\\"Axis of Awesome\\\""""
is parsed as
""" \\ " Axis of Awesome \\ \" """
1. """ : Start of string literal
2. \\ : Escaped literal backslash
3. " : Literal quotation mark
4. Axis of Awesome : Literal text
5. \\ : Escaped literal backslash
6. \" : Escaped literal quotation mark
7. """ : End of string literal
If you were to print this out, you'd get:
\"Axis of Awesome\"
The example you linked to has one fewer backslash at the end, and is instead parsed like this:
""" \\ " Axis of Awesome \\ """ "
1. """ : Start of string literal
2. \\ : Escaped literal backslash
3. " : Literal quotation mark
4. Axis of Awesome : Literal text
5. \\ : Escaped literal backslash
6. """ : End of string literal
7. " : Start of new singly-quoted string literal; since there's no close quote before the end-of-line, this is a syntax error
It looks like you are confused by the triple quotes ( """ ). They delimit a multiline string. So, """\"Axis of Awesome\\""" is actually \"Axis of Awesome\\".
s = """ my name is
abc
and I am not a programmer ..."""
So, your actual string is in between the """. If you want to store this string, you have to do this:
>>> a = ' \\"Axis of Awesome\\" '
>>> a
' \\"Axis of Awesome\\" '
>>>
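The parse described in this thread can be confirmed by asking Python directly. This is a minimal Python 3 sketch; s is an illustrative name:

```python
s = """\\"Axis of Awesome\\\""""

print(s)         # \"Axis of Awesome\"
print(len(s))    # 19
print(repr(s))   # '\\"Axis of Awesome\\"'
```

print shows the actual characters (the __str__ view), while repr shows a source-code spelling with each backslash doubled, which is why the two explanations looked different.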

Can't get single \ in python

I'm trying to learn python, and I'm pretty new at it, and I can't figure this one part out.
Basically, what I'm doing now is something that takes the source code of a webpage, and takes out everything that isn't words.
Webpages have a lot of \n and \t, and I want something that will find \ and delete everything between it and the next ' '.
def removebackslash(source):
    while(source.find('\') != -1):
        startback = source.find('\')
        endback = source[startback:].find(' ') + startback + 1
        source = source[0:startback] + source[endback:]
    return source
is what I have. It doesn't work like this, because the \' doesn't close the string, but when I change \ to \\, it interprets the string as \\. I can't figure out anything that is interpreted as '\'.
\ is an escape character; it either gives characters a special meaning or takes said special meaning away. Right now, it's escaping the closing single quote and treating it as a literal single quote. You need to escape it with itself to insert a literal backslash:
def removebackslash(source):
    while(source.find('\\') != -1):
        startback = source.find('\\')
        endback = source[startback:].find(' ') + startback + 1
        source = source[0:startback] + source[endback:]
    return source
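One caveat with the loop above: if a backslash has no space after it, find(' ') returns -1, the slice leaves source unchanged, and the loop never ends. A possible variant that handles this case is sketched below (Python 3 assumed; the function name and the choice to drop the rest of the string are assumptions, not part of the original answer):

```python
def remove_backslash_tokens(source):
    """Drop each backslash and everything up to and including the next space."""
    while True:
        start = source.find('\\')
        if start == -1:
            return source
        space = source.find(' ', start)
        if space == -1:
            # No space after the backslash: drop the rest of the string.
            return source[:start]
        source = source[:start] + source[space + 1:]


print(remove_backslash_tokens('abc \\n def'))   # abc def
print(remove_backslash_tokens('tail \\t'))      # tail (followed by a space)
```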
Try using replace:
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
So in your case:
my_text = my_text.replace('\n', '')
my_text = my_text.replace('\t', '')
As others have said, you need to use '\\'. The reason you think this isn't working is because when you get the results, they look like they begin with two backslashes. But they don't begin with two backslashes, it's just that Python shows two backslashes. If it didn't, you couldn't tell the difference between a newline (represented as \n) and a backslash followed by the letter n (represented as \\n).
There are two ways to convince yourself of what's really going on. One is to use print on the result, which causes it to expand the escapes:
>>> x = "here is a backslash \\ and here comes a newline \n this is on the next line"
>>> x
u'here is a backslash \\ and here comes a newline \n this is on the next line'
>>> print x
here is a backslash \ and here comes a newline
this is on the next line
>>> startback = x.find('\\')
>>> x[startback:]
u'\\ and here comes a newline \n this is on the next line'
>>> print x[startback:]
\ and here comes a newline
this is on the next line
Another way is to use len to verify the length of the string:
>>> x = "Backslash \\ !"
>>> startback = x.find('\\')
>>> x[startback:]
u'\\ !'
>>> print x[startback:]
\ !
>>> len(x[startback:])
3
Notice that len(x[startback:]) is 3. The string contains three characters: backslash, space, and exclamation point. You can see what's going on even more simply by just looking at a string that contains only a backslash:
>>> x = "\\"
>>> x
u'\\'
>>> print x
\
>>> len(x)
1
x only looks like it starts with two backslashes when you evaluate it at the interactive prompt (or otherwise use its __repr__ method). When you actually print it, you can see it's only one backslash, and when you look at its length, you can see it's only one character long.
So what this means is you need to escape the backslash in your find, and you need to recognize that the backslashes displayed in the output may also be doubled.
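The transcripts above come from Python 2 (note the print statement and the u'' prefix). The same check in Python 3 could be sketched as follows:

```python
x = "\\"

print(repr(x))   # '\\'  (what the interactive prompt shows)
print(x)         # \
print(len(x))    # 1
```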
The SO auto-format shows your problem. Since \ is used to escape characters, it's escaping the end quotes. Try changing that line to (note the use of double quotes):
while(source.find("\\") != -1):
Read more about escape characters in the docs.
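I don't think anyone's mentioned this yet, but if you don't want to deal with escaping characters, you can often use a raw string. Note, however, that a raw string cannot end in a single backslash, so r'\' is itself a syntax error, and for a lone backslash you still need the escaped form:
source.find('\\')
Adding the letter r before a string tells Python not to interpret escape sequences and to keep the backslashes as you type them, which helps for strings such as r'\n' or regex patterns.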
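A raw string literal cannot end in a single backslash, because the backslash still keeps the closing quote from ending the literal. The sketch below (Python 3 assumed; names are illustrative) uses compile to show that r'\' is rejected, without breaking the script itself:

```python
source_text = "r'\\'"     # the four characters r'\'
try:
    compile(source_text, "<demo>", "eval")
    compiled = True
except SyntaxError:
    compiled = False

print(compiled)                  # False
print(len('\\'), len(r'\\'))     # 1 2
```

So for a lone backslash, '\\' is the spelling to use; r'\\' is a two-character string.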

Handling backreferences to capturing groups in re.sub replacement pattern

I want to take the string 0.71331, 52.25378 and return 0.71331,52.25378 - i.e. just look for a digit, a comma, a space and a digit, and strip out the space.
This is my current code:
coords = '0.71331, 52.25378'
coord_re = re.sub("(\d), (\d)", "\1,\2", coords)
print coord_re
But this gives me 0.7133,2.25378. What am I doing wrong?
You should be using raw strings for regex, try the following:
coord_re = re.sub(r"(\d), (\d)", r"\1,\2", coords)
With your current code, the backslashes in your replacement string are escaping the digits, so you are replacing every match with the equivalent of chr(1) + "," + chr(2):
>>> '\1,\2'
'\x01,\x02'
>>> print '\1,\2'
,
>>> print r'\1,\2' # this is what you actually want
\1,\2
Any time you want to leave the backslash in the string, use the r prefix, or escape each backslash (\\1,\\2).
Python interprets the \1 as a character with ASCII value 1, and passes that to sub.
Use raw strings, in which Python doesn't interpret the \.
coord_re = re.sub(r"(\d), (\d)", r"\1,\2", coords)
This is covered right in the beginning of the re documentation, should you need more info.
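For reference, a Python 3 version of the fix might look like the sketch below; it also shows why the non-raw replacement string fails (fixed is an illustrative name):

```python
import re

coords = '0.71331, 52.25378'

# Raw strings keep \1 and \2 as backreferences for the regex engine.
fixed = re.sub(r"(\d), (\d)", r"\1,\2", coords)
print(fixed)                          # 0.71331,52.25378

# Without the r prefix, Python itself converts \1 and \2 to control characters.
print('\1,\2' == '\x01,\x02')         # True
print(len('\1,\2'), len(r'\1,\2'))    # 3 5
```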
