In the REPL, I get the following output for 2 different scenarios
1st Scenario
>>> 'This is a \" and a \' in a string'
'This is a " and a \' in a string'
2nd Scenario
>>> a = 'This is a \" and a \' in a string'
>>> print(a)
This is a " and a ' in a string
In scenario 1, the second backslash is printed even though it is used as an escape character, but in scenario 2 it is not. Why does this happen in scenario 1?
In scenario 1 the REPL echoes the repr() of the value, which is written the way a string literal would be: it is wrapped in single quotes, so the single quote inside the text has to be shown as \'. That backslash belongs to the display, not to the string. In scenario 2, print() writes the characters of the string itself; the two outermost quotes are treated as delimiters, not as part of the text, so nothing needs to be escaped.
To also print the surrounding quotes in scenario 2, you would need to add escaped quotes at the appropriate positions, like so:
a = '\'This is a \" and a \' in a string\''
print(a)
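A minimal sketch of the difference, assuming Python 3: repr() produces the form the REPL echoes, while str() gives what print() writes. The variable names are only illustrative.

```python
a = 'This is a \" and a \' in a string'

# The stored value contains no backslash at all; both escapes were
# consumed when the literal was parsed.
has_backslash = '\\' in a

# repr() is what the REPL echoes: a valid literal, outer quotes included.
shown_by_repl = repr(a)

# str() is what print() writes: the characters of the string itself.
shown_by_print = str(a)

print(shown_by_repl)
print(shown_by_print)
```

The backslash only appears in the repr() form, because that form has to stay a valid single-quoted literal.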
I am doing the following in python2.7
>>> a='hello team 123'
>>> b=re.search('hello team [0-9]+',a)
>>>
>>> b
<_sre.SRE_Match object at 0x00000000022995E0>
>>> b=re.search(r'hello team [0-9]+',a)
>>> b
<_sre.SRE_Match object at 0x0000000002299578>
>>>
As you can see, in one case I am using a raw string, while in the other I am using a regular string.
From one of the posts on SO, I learnt:
The r means that the string is to be treated as a raw string, which means all escape codes will be ignored.
For an example:
'\n' will be treated as a newline character, while r'\n' will be treated as the characters \ followed by n
Then, why is my example working for both cases i.e with r and without r?
Is it because none of my example uses \ ?
Also please look at the attached screenshot
You are not using any special characters in your string, so r'' and '' will do the same thing.
In hello team [0-9]+ nothing needs to be escaped. It will be passed to the regex engine as it is. If you use special characters in your Python string then you need to escape them to pass them to the regex engine.
There are two levels of escaping involved in a regex. The first level is the Python string literal and the second level is the regex engine.
So for example:
'\\\\' --> Python(string translation) ---> '\\' ---> Regex Engine(translation) ---> '\'
In order to avoid Python string translation you use raw strings.
r'\\' --> Python(string translation) ---> '\\' ---> Regex Engine(translation) ---> '\'
>>> print repr('\\')
'\\'
>>> print repr(r'\\')
'\\\\'
>>> print str('\\')
\
>>> print str(r'\\')
\\
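The two levels can be checked directly. This is a small sketch, assuming Python 3; the sample text C:\temp is made up for illustration.

```python
import re

text = 'C:\\temp'           # the value is C:\temp, with one backslash

plain_pattern = '\\\\'      # Python collapses this to two characters: \\
raw_pattern = r'\\'         # raw string: the same two characters as typed

# Both spellings hand the identical pattern to the regex engine.
same_pattern = plain_pattern == raw_pattern
pattern_length = len(raw_pattern)

# The regex engine then reads \\ as "match one literal backslash".
match = re.search(raw_pattern, text)
matched_text = match.group(0)
```

Here matched_text is a single backslash, even though the pattern needed two characters and the non-raw literal needed four.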
I have a string as
a = "hello i am stackoverflow.com user +-"
Now I want to escape the special characters in the string, except the quotation marks and white space. So my expected output is:
a = "hello i am stackoverflow\.com user \+\-"
What I did so far is find all the special characters in a string except whitespace and double quote using
re.findall(r'[^\w" ]',a)
Now that I have found all the required special characters, I want to update the string. I even tried re.sub, but it replaces the special characters instead of escaping them. Is there any way I can do it?
Use re.escape.
>>> a = "hello i am stackoverflow.com user +-"
>>> print(re.sub(r'\\(?=[\s"])', r'', re.escape(a)))
hello i am stackoverflow\.com user \+\-
re.escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
r'\\(?=[\s"])' matches every backslash that sits just before a space or a double quote. Replacing the matched backslashes with an empty string will give you the desired output.
OR
>>> a = 'hello i am stackoverflow.com user "+-'
>>> print(re.sub(r'((?![\s"])\W)', r'\\\1', a))
hello i am stackoverflow\.com user "\+\-
((?![\s"])\W) captures every non-word character that is not a space or a double quote. Replacing each matched character with a backslash plus the contents of group 1 will give you the desired output.
It seems like you could use backreferences with re.sub to achieve your desired output:
import re
a = "hello i am stackoverflow.com user +-"
print re.sub(r'([^\w" ])', r'\\\1', a) # hello i am stackoverflow\.com user \+\-
The replacement pattern r'\\\1' is just \\, which means a literal backslash, followed by \1, which means capture group 1: the text captured by the parentheses in the first argument.
In other words, it will escape everything except:
alphanumeric characters
underscore
double quotes
space
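The same backreference approach, written as a Python 3 sketch. The helper name escape_specials is hypothetical and only wraps the re.sub call shown above.

```python
import re

def escape_specials(text):
    """Put a backslash before every character that is not a word
    character, a double quote, or a space."""
    # Group 1 captures the special character; \\ in the replacement
    # template is a literal backslash and \1 re-inserts the capture.
    return re.sub(r'([^\w" ])', r'\\\1', text)

escaped = escape_specials('hello i am stackoverflow.com user +-')
quoted = escape_specials('say "hi" now')

print(escaped)
print(quoted)
```

The second call shows that double quotes and spaces pass through unchanged.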
I read some material on Python's EOL error and found an explanation of it. The author gives an example of a correct string, but I cannot figure out how the string """\\"Axis of Awesome\\\"""" works. Can someone explain how this string is parsed? Thanks.
==================================================
The answer:
I thought about it a lot and finally figured it out. The explanation shows the __repr__ of the string, which outputs \\"Axis of Awesome\\", whereas Hyperboreus's explanation shows the __str__ of the string, which gives \"Axis of Awesome\". They are the same string.
"""\\"Axis of Awesome\\\""""
is parsed as
"""  \\  "  Axis of Awesome  \\  \"  """
1    2   3  4                5   6   7
1. Start of string literal
2. Escaped literal backslash
3. Literal quotation mark
4. Literal text
5. Escaped literal backslash
6. Escaped literal quotation mark
7. End of string literal
If you were to print this out, you'd get:
\"Axis of Awesome\"
The example you linked to has one fewer backslash at the end, and is instead parsed like this:
"""  \\  "  Axis of Awesome  \\  """  "
1    2   3  4                5   6    7
1. Start of string literal
2. Escaped literal backslash
3. Literal quotation mark
4. Literal text
5. Escaped literal backslash
6. End of string literal
7. Start of new singly-quoted string literal; since there's no close quote before the end-of-line, this is a syntax error
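The first parse can be confirmed by inspecting the resulting value. A short sketch, assuming Python 3:

```python
s = """\\"Axis of Awesome\\\""""

# Each \\ pair became one backslash and the \" became one quote,
# so the value is: \"Axis of Awesome\"
length = len(s)
first_two = s[:2]
last_two = s[-2:]

print(s)        # the characters of the string
print(repr(s))  # the literal-style form, with backslashes doubled
```

Both ends of the value are a backslash followed by a quotation mark, which matches steps 2-3 and 5-6 of the breakdown.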
It looks like you are confused by the triple quotes ("""). They delimit a multiline string. So, in """\"Axis of Awesome\\""" the text between the delimiters is \"Axis of Awesome\\".
s = """ my name is
abc
and I am not a programmer ..."""
So, your actual string is in between the """. If you want to store this string, you have to do this:
>>> a = ' \\"Axis of Awesome\\" '
>>> a
' \\"Axis of Awesome\\" '
>>>
I'm trying to learn python, and I'm pretty new at it, and I can't figure this one part out.
Basically, what I'm doing now is something that takes the source code of a webpage, and takes out everything that isn't words.
Webpages have a lot of \n and \t, and I want something that will find \ and delete everything between it and the next ' '.
def removebackslash(source):
    while(source.find('\') != -1):
        startback = source.find('\')
        endback = source[startback:].find(' ') + startback + 1
        source = source[0:startback] + source[endback:]
    return source
is what I have. It doesn't work like this, because the \' doesn't close the string, but when I change \ to \\, it seems to interpret the string as \\. I can't figure out anything that is interpreted as a single \.
\ is an escape character; it either gives characters a special meaning or takes said special meaning away. Right now, it's escaping the closing single quote and treating it as a literal single quote. You need to escape it with itself to insert a literal backslash:
def removebackslash(source):
    while(source.find('\\') != -1):
        startback = source.find('\\')
        endback = source[startback:].find(' ') + startback + 1
        source = source[0:startback] + source[endback:]
    return source
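One thing to watch in the loop above: if a backslash is not followed by any space, find(' ') returns -1, nothing is removed, and the loop never ends. A possible alternative, as a sketch assuming Python 3, does the same deletion with one regular expression. The name remove_backslash_runs is hypothetical.

```python
import re

def remove_backslash_runs(source):
    """Delete each backslash together with everything up to and
    including the next space (or up to the end of the string)."""
    # \\     one literal backslash
    # [^ ]*  any run of non-space characters
    #  ?     the following space, if there is one
    return re.sub(r'\\[^ ]* ?', '', source)

cleaned = remove_backslash_runs('keep \\n\\t this and \\x that')
trailing = remove_backslash_runs('ends with \\n')

print(cleaned)
```

The second call covers the case with no space after the backslash, which is the one that would hang the loop version.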
Try using replace:
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
So in your case:
my_text = my_text.replace('\n', '')
my_text = my_text.replace('\t', '')
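Note that '\n' in these calls is a real newline character, not a backslash followed by n. Which one a page source contains depends on how it was obtained, so the sketch below (Python 3, with a made-up sample string) shows both cases side by side.

```python
# One real newline, one real tab, and one literal backslash + n.
page = 'first line\nsecond\tcolumn and a literal \\n marker'

# Removes the control characters; the literal \n text is untouched.
no_control_chars = page.replace('\n', '').replace('\t', '')

# Removes the two-character sequence backslash + n; the real
# newline and tab are untouched.
no_literal_marker = page.replace('\\n', '')
```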
As others have said, you need to use '\\'. The reason you think this isn't working is because when you get the results, they look like they begin with two backslashes. But they don't begin with two backslashes, it's just that Python shows two backslashes. If it didn't, you couldn't tell the difference between a newline (represented as \n) and a backslash followed by the letter n (represented as \\n).
There are two ways to convince yourself of what's really going on. One is to use print on the result, which causes it to expand the escapes:
>>> x = "here is a backslash \\ and here comes a newline \n this is on the next line"
>>> x
u'here is a backslash \\ and here comes a newline \n this is on the next line'
>>> print x
here is a backslash \ and here comes a newline
this is on the next line
>>> startback = x.find('\\')
>>> x[startback:]
u'\\ and here comes a newline \n this is on the next line'
>>> print x[startback:]
\ and here comes a newline
this is on the next line
Another way is to use len to verify the length of the string:
>>> x = "Backslash \\ !"
>>> startback = x.find('\\')
>>> x[startback:]
u'\\ !'
>>> print x[startback:]
\ !
>>> len(x[startback:])
3
Notice that len(x[startback:]) is 3. The string contains three characters: backslash, space, and exclamation point. You can see what's going on even more simply by just looking at a string that contains only a backslash:
>>> x = "\\"
>>> x
u'\\'
>>> print x
\
>>> len(x)
1
x only looks like it starts with two backslashes when you evaluate it at the interactive prompt (or otherwise use its __repr__ method). When you actually print it, you can see it's only one backslash, and when you look at its length, you can see it's only one character long.
So what this means is you need to escape the backslash in your find, and you need to recognize that the backslashes displayed in the output may also be doubled.
The SO auto-format shows your problem. Since \ is used to escape characters, it's escaping the end quotes. Try changing that line to (note the use of double quotes):
while(source.find("\\") != -1):
Read more about escape characters in the docs.
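I don't think anyone's mentioned this yet, but if you don't want to deal with having to escape characters you can often use a raw string, for example:
source.find(r'\n')
Adding the letter r before the string tells Python not to interpret backslash escapes and keeps the string exactly as you type it. One caveat: a raw string cannot end in an odd number of backslashes, so r'\' is itself a syntax error, and a lone backslash still has to be written as '\\'.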
I want to replace a pattern with a string. The string is given in a variable. It might, of course, contain '\1', and it should not be interpreted as a backreference, but simply as \1.
How can I achieve that?
The previous answer using re.escape() would escape too much, and you would get undesirable backslashes in the replacement and the replaced string.
It seems like in Python only the backslash needs escaping in the replacement string, thus something like this could be sufficient:
replacement = replacement.replace("\\", "\\\\")
Example:
import re
x = r'hai! \1 <ops> $1 \' \x \\'
print "want to see: "
print x
print "getting: "
print re.sub(".(.).", x, "###")
print "over escaped: "
print re.sub(".(.).", re.escape(x), "###")
print "could work: "
print re.sub(".(.).", x.replace("\\", "\\\\"), "###")
Output:
want to see:
hai! \1 <ops> $1 \' \x \\
getting:
hai! # <ops> $1 \' \x \
over escaped:
hai\!\ \1\ \<ops\>\ \$1\ \\'\ \x\ \\
could work:
hai! \1 <ops> $1 \' \x \\
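A Python 3 sketch of the "could work" case, next to one alternative that is not from the answer above: re.sub also accepts a function as the replacement, and the string that function returns is inserted without any template processing. As far as I know, recent Python 3 versions also raise an error for unknown letter escapes such as \x in a replacement template, so the unescaped "getting" call is left out here.

```python
import re

replacement = r'hai! \1 <ops> $1 \' \x \\'

# Option from the answer: double every backslash so the template
# processing in re.sub turns each pair back into one backslash.
escaped_template = re.sub('.(.).', replacement.replace('\\', '\\\\'), '###')

# Alternative: a function replacement is used literally.
via_function = re.sub('.(.).', lambda match: replacement, '###')

print(escaped_template)
print(via_function)
```

Both calls return the replacement text unchanged, with \1 still present as two literal characters.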
Because of the comments I thought about this for quite a while and tried it out. That helped me a lot to improve my understanding of escaping, so I changed my answer almost completely in the hope that it is useful for later readers.
NullUserException gave you just the short version; I will try to explain it a bit more. Thanks to the critical reviews of Qtax and Duncan, this answer is hopefully now correct and helpful.
The backslash has a special meaning: it's the escape character in strings. That means the backslash and the following character form an escape sequence that is translated to something else when something is done with the string. This "something" already happens when the string is created. So if you want to use \ literally you need to escape it, and the escape character for that is the backslash itself.
To start, here are some examples to show what happens. I additionally print the ASCII codes of the characters in the string to make it clearer.
s = "A\1\nB"
print s
print [x for x in s]
print [hex(ord(x)) for x in s]
is printing
A
B
['A', '\x01', '\n', 'B']
['0x41', '0x1', '0xa', '0x42']
So while I typed \ and 1 in the code, s does not contain those two characters; it contains the ASCII character 0x01, which is "Start of Heading". The same goes for \n: it is translated to 0x0a, the linefeed character.
Since this behaviour is not always wanted, raw strings can be used, where the escape sequences are ignored.
s = r"A\1\nB"
print s
print [x for x in s]
print [hex(ord(x)) for x in s]
I just added the r before the string and the result is now
A\1\nB
['A', '\\', '1', '\\', 'n', 'B']
['0x41', '0x5c', '0x31', '0x5c', '0x6e', '0x42']
All characters are printed as I typed them.
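The two listings above use the Python 2 print statement. The same comparison as a Python 3 sketch, with illustrative variable names:

```python
plain = "A\1\nB"    # \1 is an octal escape, \n is a linefeed
raw = r"A\1\nB"     # raw string: backslashes are kept as typed

plain_codes = [hex(ord(ch)) for ch in plain]
raw_codes = [hex(ord(ch)) for ch in raw]

print(len(plain), plain_codes)
print(len(raw), raw_codes)
```

The plain string has four characters and the raw string has six, which is the difference the hex listings above show.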
This is the situation we have. Now there is the next thing.
Sometimes a string should be passed to a regex to be found literally, so every character that has a special meaning within a regex (e.g. +*$[.) needs to be escaped. There is a special function, re.escape, that does this job.
But for this question this is the wrong function, because the string should not be used within a regex, but as the replacement string for re.sub.
So new situation:
A raw string including escape sequences should be used as the replacement string for re.sub. re.sub will also handle the escape sequences, but with a small, but important, difference to the handling before: \n is still translated to 0x0a, the linefeed character, but the translation of \1 has changed now! It will be replaced by the content of capturing group 1 of the regex in re.sub.
s = r"A\1\nB"
print re.sub(r"(Replace)" ,s , "1 Replace 2")
And the result is
1 AReplace
B 2
The \1 has been replaced with the content of the capturing group and \n with the LineFeed character.
The important point is that you have to understand this behaviour. In my opinion you now have two possibilities (and I am not going to judge which one is the correct one):
The creator is unsure about the string behaviour, and if he inputs \n then he wants a newline. In this case use this to escape only the \ that are followed by a digit.
OnlyDigits = re.sub(r"(Replace)" ,re.sub(r"(\\)(?=\d)", r"\\\\", s) , "1 Replace 2")
print OnlyDigits
print [x for x in OnlyDigits]
print [hex(ord(x)) for x in OnlyDigits]
Output:
1 A\1
B 2
['1', ' ', 'A', '\\', '1', '\n', 'B', ' ', '2']
['0x31', '0x20', '0x41', '0x5c', '0x31', '0xa', '0x42', '0x20', '0x32']
The creator knows exactly what he is doing, and if he had wanted a newline, he would have typed \0xa. In this case escape all backslashes:
All = re.sub(r"(Replace)" ,re.sub(r"(\\)", r"\\\\", s) , "1 Replace 2")
print All
print [x for x in All]
print [hex(ord(x)) for x in All]
Output:
1 A\1\nB 2
['1', ' ', 'A', '\\', '1', '\\', 'n', 'B', ' ', '2']
['0x31', '0x20', '0x41', '0x5c', '0x31', '0x5c', '0x6e', '0x42', '0x20', '0x32']
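For reference, both options as a Python 3 sketch. The variable names are illustrative, and str.replace stands in for the inner re.sub of the second option, which should be equivalent for this purpose.

```python
import re

s = r"A\1\nB"

# Option 1: escape only backslashes followed by a digit, so \1 stays
# literal while \n still becomes a linefeed in the result.
only_digits = re.sub(r"(Replace)", re.sub(r"\\(?=\d)", r"\\\\", s), "1 Replace 2")

# Option 2: escape every backslash, so nothing in the replacement
# is interpreted by re.sub.
escape_all = re.sub(r"(Replace)", s.replace("\\", "\\\\"), "1 Replace 2")

print(repr(only_digits))
print(repr(escape_all))
```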