How is the string """\\"Axis of Awesome\\\"""" interpreted in Python?

I read some information about Python's EOL error and found an explanation of it. The author gives an example of a correct string, but I cannot figure out how the string """\\"Axis of Awesome\\\"""" works. Can someone explain how the string is interpreted? Thanks.
==================================================
The answer:
I thought about it a lot and finally figured it out. The explanation calls __repr__ on the string, which outputs \\"Axis of Awesome\\", whereas Hyperboreus's explanation calls __str__ (which is what print uses), so the result there is \"Axis of Awesome\". Both are displays of the same string.

"""\\"Axis of Awesome\\\""""
is parsed as
""" \\ " Axis of Awesome \\ \" """
1. """ : start of string literal
2. \\ : escaped literal backslash
3. " : literal quotation mark
4. Axis of Awesome : literal text
5. \\ : escaped literal backslash
6. \" : escaped literal quotation mark
7. """ : end of string literal
If you were to print this out, you'd get:
\"Axis of Awesome\"
The example you linked to has one fewer backslash at the end, and is instead parsed like this:
""" \\ " Axis of Awesome \\ """ "
1. """ : start of string literal
2. \\ : escaped literal backslash
3. " : literal quotation mark
4. Axis of Awesome : literal text
5. \\ : escaped literal backslash
6. """ : end of string literal
7. " : start of a new single-quoted string literal; since there's no closing quote before the end of the line, this is a syntax error (the EOL error)
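To check the two parses above, here is a minimal sketch that hands both spellings to Python's own parser. The variable names are illustrative; the source text is kept in raw strings so the backslashes reach the parser untouched. ast.literal_eval evaluates the literal that works, and compile() is expected to reject the one with one fewer backslash:

```python
import ast

# Source text of the two literals, held in raw strings so nothing is pre-escaped.
good_src = r'"""\\"Axis of Awesome\\\""""'
bad_src = r'"""\\"Axis of Awesome\\""""'  # one fewer backslash at the end

value = ast.literal_eval(good_src)
print(value)        # \"Axis of Awesome\"
print(repr(value))  # '\\"Axis of Awesome\\"'
print(len(value))   # 19

try:
    compile(bad_src, "<example>", "eval")
    bad_is_valid = True
except SyntaxError as exc:
    # The stray trailing quote opens a string that never closes.
    bad_is_valid = False
    print("SyntaxError:", exc.msg)
```

The exact SyntaxError message differs between Python versions, so it is printed rather than hard-coded.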

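Looks like you are confused by the triple quotes ("""). They delimit a triple-quoted string literal, which can span multiple lines. So in """\\"Axis of Awesome\\\"""", the text between the triple quotes is \\"Axis of Awesome\\\", which Python reads as \"Axis of Awesome\".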
s = """ my name is
abc
and I am not a programmer ..."""
So your actual string is whatever lies between the """ delimiters. To store this string in an ordinary quoted literal, you can write:
>>> a = ' \\"Axis of Awesome\\" '
>>> a
' \\"Axis of Awesome\\" '
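As a quick check, here is a minimal sketch (leaving out the padding spaces used above) showing that three different spellings produce the same string; the variable names are only illustrative:

```python
triple = """\\"Axis of Awesome\\\""""  # the literal from the question
single = '\\"Axis of Awesome\\"'        # ordinary quotes, backslashes doubled
raw = r'\"Axis of Awesome\"'            # raw string, backslashes kept as typed

print(triple == single == raw)  # True
print(len(triple))              # 19
```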
>>>

Related

Python REPL output is different in 2 scenarios

In the REPL, I get the following output for 2 different scenarios
1st Scenario
>>> 'This is a \" and a \' in a string'
'This is a " and a \' in a string'
2nd Scenario
>>> a = 'This is a \" and a \' in a string'
>>> print(a)
This is a " and a ' in a string
In scenario 1, the second backslash is printed even though it is used as an escape character, but in scenario 2 it acts as an escape. I was wondering why this happens in scenario 1.
Scenario 1 shows the repr() of the value: the REPL echoes an expression as a string literal that would recreate it, so it wraps the text in single quotes (which are not part of the string) and writes the inner single quote as \' to keep that literal valid. Scenario 2 uses print(), which outputs the characters of the string itself, without delimiters or escapes.
To make scenario 2 show the surrounding quotes as well, you can add escaped quotes at the appropriate positions (this reproduces the quotes, but not the \ that repr() adds), like so:
a = '\'This is a \" and a \' in a string\''
print(a)
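To see both displays side by side, here is a minimal sketch that calls repr() explicitly, which is what the REPL does when it echoes a value:

```python
a = 'This is a \" and a \' in a string'

print(repr(a))  # REPL-style display (scenario 1)
# -> 'This is a " and a \' in a string'
print(a)        # print() display (scenario 2)
# -> This is a " and a ' in a string
```

The string itself contains no backslash at all; the \' only appears because repr() has to produce a valid single-quoted literal for a value that contains both kinds of quote.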

How do I know how many backslashes to add in Python?

Why do I need to type 5 backslashes on the left if I want to show three of them in Python? How should I count the backslashes?
# [ ] print "\\\WARNING!///"
print('"\\\\\Warning///"')
You can use a "raw string" by adding r:
print(r'"\\\Warning///"')
This avoids the backslash's "escape" behaviour, which Python uses to represent special characters.
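A minimal sketch comparing the raw form with a normal literal that doubles every backslash, so no unrecognized escape is involved; both should hold the same characters:

```python
raw = r'"\\\Warning///"'          # raw string: three backslashes typed, three kept
doubled = '"\\\\\\Warning///"'    # normal string: three escaped pairs

print(raw)               # "\\\Warning///"
print(raw == doubled)    # True
print(raw.count("\\"))   # 3
```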
Most of the time a backslash is taken as the start of an escape sequence, so to print a single \ you need to write \\.
Unlike standard C, Python leaves unrecognized escape sequences unchanged.
>>> print('\test')
	est       # '\t' evaluates to a tab, followed by 'est'
>>> print('\\test')
\test       # '\\' evaluates to a literal '\', followed by 'test'
>>> print('\\\test')
\	est      # '\\' evaluates to a literal '\', then '\t' evaluates to a tab, then 'est'
>>> print('\\\\test')
\\test      # each '\\' evaluates to a literal '\', followed by 'test'
According to https://docs.python.org/2.0/ref/strings.html:
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string.
Since \W is not a valid escape sequence, \W is printed as is. On the other hand, \\ is printed as \.
So \\\\\W is printed as \\\W.
However, as of Python 3.6, according to "String and Bytes literals":
Changed in version 3.6: Unrecognized escape sequences produce a DeprecationWarning. In some future version of Python they will be a SyntaxError.
So your code might raise a SyntaxError in a future Python version. (Since Python 3.12, such sequences already produce a SyntaxWarning.)
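To see that warning without triggering it in your own file, here is a minimal sketch that builds the five-backslash source text programmatically and compiles it while recording warnings. The expected category is an assumption based on the version notes above: DeprecationWarning on 3.6 to 3.11, SyntaxWarning on 3.12 and later.

```python
import warnings

# Source text print('\\\\\Warning'): 5 backslashes, the last one forming \W.
src = "print('" + "\\" * 5 + "Warning')"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # do not suppress repeated warnings
    compile(src, "<example>", "exec")

for w in caught:
    print(w.category.__name__, w.message)
```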
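\ is used for special characters like '\n', '\t', etc. To print n backslashes, type 2n of them, or 2n-1 when the last backslash is followed by a character that does not form a valid escape sequence (as below, where \w is not an escape).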
>>> print('\warning') # 1 \
\warning
>>> print('\\warning') # 2 \
\warning
>>> print('\\\warning') # 3 \
\\warning
>>> print('\\\\warning') # 4 \
\\warning
>>> print('\\\\\warning') # 5 \
\\\warning
>>> print('\\\\\\warning') # 6 \
\\\warning
>>>
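One way to see the 2n rule is to let repr() write the literal for you. A minimal sketch that builds strings of n real backslashes and shows that repr() spells each one as two:

```python
for n in range(1, 4):
    s = "\\" * n  # n actual backslash characters
    print(n, len(s), s, repr(s))

# 1 1 \ '\\'
# 2 2 \\ '\\\\'
# 3 3 \\\ '\\\\\\'
```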
Technically, you should type 2n backslashes to represent n of them. In a strict grammar, the backslash is reserved as an escape character and has a special meaning, i.e., it does not represent "backslash". Since we took it out of our character set to give it a special meaning, how do we represent a plain backslash? The answer is to represent it the same way the newline character is represented, namely '\\' stands for a backslash.
The reason you get 3 backslashes printed when you type 5 is this: the first 4 are interpreted as described above, and when the interpreter reaches the fifth one, it finds that '\W' is not a defined escape sequence, so it treats the fifth backslash as an ordinary character instead of an escape character. This is a convenience of the interpreter that may not hold in other versions (it has been deprecated since Python 3.6) or in other, stricter languages.

Match literal string '\$'

I'm trying to match the literal string '\$'. I'm escaping both '\' and '$' with a backslash. Why isn't it working when I escape the backslash in the pattern? If I use a dot instead, it works.
import re
print re.match('\$','\$')
print re.match('\\\$','\$')
print re.match('.\$','\$')
Output:
None
None
<_sre.SRE_Match object at 0x7fb89cef7b90>
Can someone explain what's happening internally?
You should use the re.escape() function for this:
escape(string)
Return string with all non-alphanumerics backslashed; this is useful
if you want to match an arbitrary literal string that may have regular
expression metacharacters in it.
For example:
import re
val = re.escape('\$') # val is the 4-character string \\\$
print re.match(val,'\$')
It outputs:
<_sre.SRE_Match object; span=(0, 2), match='\\$'>
This is equivalent to what @TigerhawkT3 mentioned in his answer.
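The same idea in Python 3 syntax, as a minimal sketch; the subject string is spelled with a doubled backslash so no unrecognized escape is involved:

```python
import re

s = "\\$"               # two characters: a backslash and a dollar sign
pattern = re.escape(s)  # escapes both regex metacharacters

print(pattern)          # \\\$
print(len(pattern))     # 4
m = re.match(pattern, s)
print(m.group() == s)   # True
```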
Unfortunately, you need more backslashes. You need to escape them to indicate that they're literals in the string and get them into the expression, and then further escape them to indicate that they're literals instead of regex special characters. This is why raw strings are often used for regular expressions: the backslashes don't explode.
>>> import re
>>> print re.match('\$','\$')
None
>>> print re.match('\\\$','\$')
None
>>> print re.match('.\$','\$')
<_sre.SRE_Match object at 0x01E1F800>
>>> print re.match('\\\\\$','\$')
<_sre.SRE_Match object at 0x01E1F800>
>>> print re.match(r'\\\$','\$')
<_sre.SRE_Match object at 0x01E1F800>
r'string' is a raw string literal.
Try marking your regex string as raw.
Here are the same regexes with and without the r prefix:
print( re.match(r'\\\$', '\$'))
<_sre.SRE_Match object; span=(0, 2), match='\\$'>
print( re.match('\\\$', '\$'))
None
(This is Python 3, hence the print() calls.)
In a (non-raw) string literal, backslash is special: it tells the Python interpreter to handle the following character specially. For example, "\n" is a string of length 1 containing the newline character, and "\\$" is a string of two characters: a backslash and a dollar sign. "\$" is not a recognized escape sequence, so Python leaves the backslash in place and it is the same two-character string as "\\$" (newer Python versions warn about this).
In regular expressions, the backslash also means the following character is to be handled specially, but in general the special meaning is different. In a regular expression, $ matches the end of the string (or of a line in MULTILINE mode), \$ matches a dollar sign, \\ matches a single backslash, and \\$ matches a backslash at the end.
So when you do re.match('\$', s), Python constructs the two-character string \$ and passes it to re.match, where it means "a literal dollar sign"; that fails because s starts with a backslash. With re.match('\\$', s), Python makes exactly the same string \$ (length 2), so it fails the same way.
To see what's actually being passed to re.match, just print it. For example:
pat = '\\$'
print "pat :" + pat + ":"
m = re.match(pat, s)
People usually use raw string literals to avoid the double-meaning of backslashes.
pat = r'\$' # same 2-character string as above
Thanks for the above answers. I am adding this answer because the answers above lack a short summary.
The backslash \ needs to be escaped both in the Python string and in the regex engine.
A Python string literal turns 2 backslashes (\\) into 1 (\), and the regex engine needs 2 (\\) to match 1 (\).
So to give the regex engine the 2 backslashes (\\) it needs to match 1 (\), we have to type 4 (\\\\) in the Python string.
\\\\ --> Python(string translation) ---> \\ ---> Regex Engine(translation) ---> \
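A minimal sketch of that two-stage translation; the sample text is made up for illustration:

```python
import re

text = "C:\\temp"        # one real backslash between ":" and "t"
pattern = "\\\\"         # four backslashes typed -> a 2-character string
raw_pattern = r"\\"      # the same 2-character string, written raw

print(len(pattern), pattern == raw_pattern)  # 2 True
match = re.search(pattern, text)
print(match.start(), match.group())          # 2 \
```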
The pattern with . works because . matches any character except a newline, so it matches the backslash, and \$ then matches the dollar sign.

How to check if \n is in a string

I want to remove \n from a string if it is in a string.
I have tried:
slashn = str(chr(92)) + "n"
if slashn in newString:
    newerString = newString.replace(slashn, '')
    print(newerString)
else:
    print(newString)
Assume that newString is a word with \n at the end of it, e.g. text\n.
I have also tried the same code with slashn set to "\\"+"n".
Use str.replace() but with raw string literals:
newString = r"new\nline"
newerString = newString.replace(r"\n", "")
If you put a r right before the quotes enclosing a string literal, it becomes a raw string literal that does not treat any backslash characters as special escape sequences.
Example to clarify raw string literals (output is behind the #> comments):
# Normal string literal: single backslash escapes the 'n' and makes it a new-line character.
print("new\nline")
#> new
#> line
# Normal string literal: first backslash escapes the second backslash and makes it a
# literal backslash. The 'n' won't be escaped and stays a literal 'n'.
print("new\\nline")
#> new\nline
# Raw string literal: All characters are taken literally, the backslash does not have any
# special meaning and therefore does not escape anything.
print(r"new\nline")
#> new\nline
# Raw string literal: All characters are taken literally, no backslash has any
# special meaning and therefore they do not escape anything.
print(r"new\\nline")
#> new\\nline
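Which replacement you need depends on whether the string holds a real newline character or a backslash followed by n. A minimal sketch telling the two apart (variable names are illustrative):

```python
text_with_newline = "text\n"        # ends with a newline character (1 char)
text_with_backslash_n = r"text\n"   # ends with a backslash and an "n" (2 chars)

print(len(text_with_newline), len(text_with_backslash_n))         # 5 6
print("\n" in text_with_newline, "\n" in text_with_backslash_n)   # True False
print(r"\n" in text_with_backslash_n)                             # True
print(text_with_backslash_n.replace(r"\n", ""))                   # text
```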
You can use the string's strip() method, or strip('\n'). strip is a built-in string method.
Example:
>>>
>>>
>>> """vivek
...
... """
'vivek\n\n'
>>>
>>> """vivek
...
... """.strip()
'vivek'
>>>
>>> """vivek
...
... \n"""
'vivek\n\n\n'
>>>
>>>
>>> """vivek
...
... \n""".strip()
'vivek'
>>>
You can look up the built-in strip method with the help command, like this:
>>>
>>> help(''.strip)
Help on built-in function strip:
strip(...)
S.strip([chars]) -> string or unicode
Return a copy of the string S with leading and trailing
whitespace removed.
If chars is given and not None, remove characters in chars instead.
If chars is unicode, S will be converted to unicode before stripping
>>>
Use
string_here.rstrip('\n')
to remove the trailing newline.
Try with strip()
your_string.strip("\n") # removes \n before and after the string
If you want to remove the newline from the ends of a string, I'd use .strip(). If no arguments are given then it will remove whitespace characters, this includes newlines (\n).
Using .strip():
if newString.endswith('\n'):  # test whether the last character is a newline
    newerString = newString.strip()
    print(newerString)
else:
    print(newString)
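The strip variants mentioned in these answers differ in what they remove. A minimal sketch on a made-up string:

```python
s = "\n  text\n\n"

print(repr(s.strip()))       # 'text'      (all surrounding whitespace removed)
print(repr(s.strip("\n")))   # '  text'    (only newlines removed, spaces kept)
print(repr(s.rstrip("\n")))  # '\n  text'  (only trailing newlines removed)
```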
Another .strip() example (Using Python 2.7.9)
Also, the newline character can simply be written as "\n".
Text = "test.\nNext line."
print(Text)
Output:
test.
Next line.
Double and single quotes behave the same here; only a raw string such as r"test.\nNext line." would keep the backslash and the n as two literal characters.

Can't get single \ in python

I'm trying to learn python, and I'm pretty new at it, and I can't figure this one part out.
Basically, what I'm doing now is something that takes the source code of a webpage, and takes out everything that isn't words.
Webpages have a lot of \n and \t, and I want something that will find \ and delete everything between it and the next ' '.
def removebackslash(source):
    while(source.find('\') != -1):
        startback = source.find('\')
        endback = source[startback:].find(' ') + startback + 1
        source = source[0:startback] + source[endback:]
    return source
is what I have. It doesn't work like this, because the \' doesn't close the string, but when I change \ to \\, it interprets the string as \\. I can't figure out how to write a string that is interpreted as '\'.
\ is an escape character; it either gives characters a special meaning or takes said special meaning away. Right now, it's escaping the closing single quote and treating it as a literal single quote. You need to escape it with itself to insert a literal backslash:
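def removebackslash(source):
    while source.find('\\') != -1:
        startback = source.find('\\')
        space = source.find(' ', startback)
        # With no space after the backslash, remove to the end; otherwise the loop never ends.
        endback = len(source) if space == -1 else space + 1
        source = source[0:startback] + source[endback:]
    return source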
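If the goal is just to delete each backslash together with the non-space characters after it and one following space, a regular expression can do it in one call. This is a swapped-in alternative, not the answer's approach; the pattern and sample text are illustrative:

```python
import re

# Raw string: "\n" here is a backslash plus "n", not a newline.
source = r"one \n two \t three"

# \\ matches one literal backslash, [^ ]* the non-space characters after it,
# and " ?" an optional single space.
cleaned = re.sub(r"\\[^ ]* ?", "", source)
print(cleaned)  # one two three
```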
Try using replace:
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
So in your case:
my_text = my_text.replace('\n', '')
my_text = my_text.replace('\t', '')
As others have said, you need to use '\\'. The reason you think this isn't working is because when you get the results, they look like they begin with two backslashes. But they don't begin with two backslashes, it's just that Python shows two backslashes. If it didn't, you couldn't tell the difference between a newline (represented as \n) and a backslash followed by the letter n (represented as \\n).
There are two ways to convince yourself of what's really going on. One is to use print on the result, which causes it to expand the escapes:
>>> x = "here is a backslash \\ and here comes a newline \n this is on the next line"
>>> x
u'here is a backslash \\ and here comes a newline \n this is on the next line'
>>> print x
here is a backslash \ and here comes a newline
this is on the next line
>>> startback = x.find('\\')
>>> x[startback:]
u'\\ and here comes a newline \n this is on the next line'
>>> print x[startback:]
\ and here comes a newline
this is on the next line
Another way is to use len to verify the length of the string:
>>> x = "Backslash \\ !"
>>> startback = x.find('\\')
>>> x[startback:]
u'\\ !'
>>> print x[startback:]
\ !
>>> len(x[startback:])
3
Notice that len(x[startback:]) is 3. The string contains three characters: backslash, space, and exclamation point. You can see what's going on even more simply by just looking at a string that contains only a backslash:
>>> x = "\\"
>>> x
u'\\'
>>> print x
\
>>> len(x)
1
x only looks like it starts with two backslashes when you evaluate it at the interactive prompt (or otherwise use its __repr__ method). When you actually print it, you can see it's only one backslash, and when you look at its length, you can see it's only one character long.
So what this means is you need to escape the backslash in your find, and you need to recognize that the backslashes displayed in the output may also be doubled.
The SO auto-format shows your problem. Since \ is used to escape characters, it's escaping the end quotes. Try changing that line to the following (the doubled backslash is what matters; double quotes behave the same as single quotes here):
while(source.find("\\") != -1):
Read more about escape characters in the docs.
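I don't think anyone's mentioned this yet, but if you don't want to deal with escaping characters, you can use a raw string. Adding the letter r before the string tells Python not to interpret backslash escapes, so backslashes stay exactly as you type them.
One catch: a raw string cannot end with an odd number of backslashes, so source.find(r'\') is itself a syntax error. Raw strings help when the backslash is followed by something, e.g. source.find(r'\n') looks for a literal backslash followed by n; for a lone backslash you still need '\\'.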
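A minimal sketch of that catch: it compiles the source text r'\' to show the parser rejects it, then checks the raw and escaped spellings that do work:

```python
# Source text r'\' (a raw string meant to hold one backslash).
try:
    compile("r'\\'", "<example>", "eval")
    lone_raw_backslash_ok = True
except SyntaxError:
    lone_raw_backslash_ok = False

print(lone_raw_backslash_ok)  # False
print(r"\n" == "\\n")         # True: two characters, a backslash and n
print("\\" == chr(92))        # True: the usual way to write a single backslash
```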
