python equal strings are not equal with new line character in it - python

I have a password. It contains newline (lets for now omit why) character and is:
"h2sdf\ndfGd"
This password is in dict my_dict. When I just print values of dict I get "\" instead of "" - "h2sdf\ndfGd"! Don't understand why.
When I get it and use it to authenticate to web server, it says that Authentication fails. When I try to compare:
my_dict["password"] == "h2sdf\ndfGd"
it returns False.
But when I try just print(my_dict["password"]) I get h2sdf\ndfGd which is identical, but for python it is not. Why? I am lost.

Check this:
>>> print("h2sdf\ndfGd")
h2sdf
dfGd
>>> print("h2sdf\\ndfGd")
h2sdf\ndfGd
You simply have to escape \n with a double \ backslash, to prevent it to become a newline.

Characters like tabs, newlines, which cannot be represented in a string are described using an escape sequence with a backslash.
In order to indicate that the backslash is not part of an escape sequence(\n, \t, ...), it must itself be escaped using another backslash: \\.
my_dict["password"] == "h2sdf\\ndfGd"
If you don't want to have to escape all your \, you can use a raw string instead.
Raw strings are prefixed with r or R, and treat backslashes \ as literal characters.
my_dict["password"] == r"h2sdf\ndfGd"

Related

How to write a regular expression with a list of quotes, using raw string literal? [duplicate]

Technically, any odd number of backslashes, as described in the documentation.
>>> r'\'
File "<stdin>", line 1
r'\'
^
SyntaxError: EOL while scanning string literal
>>> r'\\'
'\\\\'
>>> r'\\\'
File "<stdin>", line 1
r'\\\'
^
SyntaxError: EOL while scanning string literal
It seems like the parser could just treat backslashes in raw strings as regular characters (isn't that what raw strings are all about?), but I'm probably missing something obvious.
The whole misconception about python's raw strings is that most of people think that backslash (within a raw string) is just a regular character as all others. It is NOT. The key to understand is this python's tutorial sequence:
When an 'r' or 'R' prefix is present, a character following a
backslash is included in the string without change, and all
backslashes are left in the string
So any character following a backslash is part of raw string. Once parser enters a raw string (non Unicode one) and encounters a backslash it knows there are 2 characters (a backslash and a char following it).
This way:
r'abc\d' comprises a, b, c, \, d
r'abc\'d' comprises a, b, c, \, ', d
r'abc\'' comprises a, b, c, \, '
and:
r'abc\' comprises a, b, c, \, ' but there is no terminating quote now.
Last case shows that according to documentation now a parser cannot find closing quote as the last quote you see above is part of the string i.e. backslash cannot be last here as it will 'devour' string closing char.
The reason is explained in the part of that section which I highlighted in bold:
String quotes can be escaped with a
backslash, but the backslash remains
in the string; for example, r"\"" is a
valid string literal consisting of two
characters: a backslash and a double
quote; r"\" is not a valid string
literal (even a raw string cannot end
in an odd number of backslashes).
Specifically, a raw string cannot end
in a single backslash (since the
backslash would escape the following
quote character). Note also that a
single backslash followed by a newline
is interpreted as those two characters
as part of the string, not as a line
continuation.
So raw strings are not 100% raw, there is still some rudimentary backslash-processing.
That's the way it is! I see it as one of those small defects in python!
I don't think there's a good reason for it, but it's definitely not parsing; it's really easy to parse raw strings with \ as a last character.
The catch is, if you allow \ to be the last character in a raw string then you won't be able to put " inside a raw string. It seems python went with allowing " instead of allowing \ as the last character.
However, this shouldn't cause any trouble.
If you're worried about not being able to easily write windows folder pathes such as c:\mypath\ then worry not, for, you can represent them as r"C:\mypath", and, if you need to append a subdirectory name, don't do it with string concatenation, for it's not the right way to do it anyway! use os.path.join
>>> import os
>>> os.path.join(r"C:\mypath", "subfolder")
'C:\\mypath\\subfolder'
In order for you to end a raw string with a slash I suggest you can use this trick:
>>> print r"c:\test"'\\'
test\
It uses the implicit concatenation of string literals in Python and concatenates one string delimited with double quotes with another that is delimited by single quotes. Ugly, but works.
Another trick is to use chr(92) as it evaluates to "\".
I recently had to clean a string of backslashes and the following did the trick:
CleanString = DirtyString.replace(chr(92),'')
I realize that this does not take care of the "why" but the thread attracts many people looking for a solution to an immediate problem.
Since \" is allowed inside the raw string. Then it can't be used to identify the end of the string literal.
Why not stop parsing the string literal when you encounter the first "?
If that was the case, then \" wouldn't be allowed inside the string literal. But it is.
The reason for why r'\' is syntactical incorrect is that although the string expression is raw the used quotes (single or double) always have to be escape since they would mark the end of the quote otherwise. So if you want to express a single quote inside single quoted string, there is no other way than using \'. Same applies for double quotes.
But you could use:
'\\'
Another user who has since deleted their answer (not sure if they'd like to be credited) suggested that the Python language designers may be able to simplify the parser design by using the same parsing rules and expanding escaped characters to raw form as an afterthought (if the literal was marked as raw).
I thought it was an interesting idea and am including it as community wiki for posterity.
Naive raw strings
The naive idea of a raw string is
If I put an r in front of a pair of quotes,
I can put whatever I want between the quotes
and it will mean itself.
Unfortunately, this does not work, because if the whatever
happens to contain a quote, the raw string would end at that point.
It is simply impossible that I can put "whatever I want"
between fixed delimiters, because some of it could look like
the terminating delimiter -- no matter what that delimiter is.
Real-world raw strings (variant 1)
One possible approach to this problem would be to say
If I put an r in front of a pair of quotes,
I can put whatever I want between the quotes
as long as it does not contain a quote
and it will mean itself.
This restriction sounds harsh, until one recognizes that
Python's large offering of quotes can accommodate most situations
with this rule. The following are all valid Python quotes:
'
"
'''
"""
With this many possibilities for the delimiter, almost anything
can be made to work.
About the only exception would be if the string
literal is supposed to contain a complete list of all allowed
Python quotes.
Real-world raw strings (variant 2, as in Python)
Python, however, takes a different route using
an extended version of the above rule.
It effectively states
If I put an r in front of a pair of quotes,
I can put whatever I want between the quotes
as long as it does not contain a quote
and it will mean itself.
If I insist on including a quote, even that is allowed,
but I have to put a backslash before it.
So the Python approach is, in a sense, even more liberal
than variant 1 above -- but it has the side effect of
"mis"interpreting the closing quote as part of the string
if the last intended character of the string is a backslash.
Variant 2 is not helpful:
If I want the quote in my string,
but not the backslash, the allowed version of my string literal
will not be what I need.
However, given the three different other kinds of quotes I have
at my disposal, I will probably just pick one of those and my
problem will be solved -- so this is not problematic case.
The problematic case is this one:
If I want my string to end with a backslash, I am at a loss.
I need to resort to concatenating a non-raw string literal
containing the backslash.
Conclusion
After writing this, I go with several of the other posters
that variant 1 would have been easier to understand and to accept
and therefore more pythonic. That's life!
Comming from C it pretty clear to me that a single \ works as escape character allowing you to put special characters such as newlines, tabs and quotes into strings.
That does indeed disallow \ as last character since it will escape the " and make the parser choke. But as pointed out earlier \ is legal.
some tips :
1) if you need to manipulate backslash for path then standard python module os.path is your friend. for example :
os.path.normpath('c:/folder1/')
2) if you want to build strings with backslash in it BUT without backslash at the END of your string then raw string is your friend (use 'r' prefix before your literal string). for example :
r'\one \two \three'
3) if you need to prefix a string in a variable X with a backslash then you can do this :
X='dummy'
bs=r'\ ' # don't forget the space after backslash or you will get EOL error
X2=bs[0]+X # X2 now contains \dummy
4) if you need to create a string with a backslash at the end then combine tip 2 and 3 :
voice_name='upper'
lilypond_display=r'\DisplayLilyMusic \ ' # don't forget the space at the end
lilypond_statement=lilypond_display[:-1]+voice_name
now lilypond_statement contains "\DisplayLilyMusic \upper"
long live python ! :)
n3on
Despite its role, even a raw string cannot end in a single
backslash, because the backslash escapes the following quote
character—you still must escape the surrounding quote character to
embed it in the string. That is, r"...\" is not a valid string
literal—a raw string cannot end in an odd number of backslashes.
If you need to end a raw string with a single backslash, you can use
two and slice off the second.
I encountered this problem and found a partial solution which is good for some cases. Despite python not being able to end a string with a single backslash, it can be serialized and saved in a text file with a single backslash at the end. Therefore if what you need is saving a text with a single backslash on you computer, it is possible:
x = 'a string\\'
x
'a string\\'
# Now save it in a text file and it will appear with a single backslash:
with open("my_file.txt", 'w') as h:
h.write(x)
BTW it is not working with json if you dump it using python's json library.
Finally, I work with Spyder, and I noticed that if I open the variable in spider's text editor by double clicking on its name in the variable explorer, it is presented with a single backslash and can be copied to the clipboard that way (it's not very helpful for most needs but maybe for some..).

Print only a single slash using print r"\" (Python) [duplicate]

Technically, any odd number of backslashes, as described in the documentation.
>>> r'\'
File "<stdin>", line 1
r'\'
^
SyntaxError: EOL while scanning string literal
>>> r'\\'
'\\\\'
>>> r'\\\'
File "<stdin>", line 1
r'\\\'
^
SyntaxError: EOL while scanning string literal
It seems like the parser could just treat backslashes in raw strings as regular characters (isn't that what raw strings are all about?), but I'm probably missing something obvious.
The whole misconception about python's raw strings is that most of people think that backslash (within a raw string) is just a regular character as all others. It is NOT. The key to understand is this python's tutorial sequence:
When an 'r' or 'R' prefix is present, a character following a
backslash is included in the string without change, and all
backslashes are left in the string
So any character following a backslash is part of raw string. Once parser enters a raw string (non Unicode one) and encounters a backslash it knows there are 2 characters (a backslash and a char following it).
This way:
r'abc\d' comprises a, b, c, \, d
r'abc\'d' comprises a, b, c, \, ', d
r'abc\'' comprises a, b, c, \, '
and:
r'abc\' comprises a, b, c, \, ' but there is no terminating quote now.
Last case shows that according to documentation now a parser cannot find closing quote as the last quote you see above is part of the string i.e. backslash cannot be last here as it will 'devour' string closing char.
The reason is explained in the part of that section which I highlighted in bold:
String quotes can be escaped with a
backslash, but the backslash remains
in the string; for example, r"\"" is a
valid string literal consisting of two
characters: a backslash and a double
quote; r"\" is not a valid string
literal (even a raw string cannot end
in an odd number of backslashes).
Specifically, a raw string cannot end
in a single backslash (since the
backslash would escape the following
quote character). Note also that a
single backslash followed by a newline
is interpreted as those two characters
as part of the string, not as a line
continuation.
So raw strings are not 100% raw, there is still some rudimentary backslash-processing.
That's the way it is! I see it as one of those small defects in python!
I don't think there's a good reason for it, but it's definitely not parsing; it's really easy to parse raw strings with \ as a last character.
The catch is, if you allow \ to be the last character in a raw string then you won't be able to put " inside a raw string. It seems python went with allowing " instead of allowing \ as the last character.
However, this shouldn't cause any trouble.
If you're worried about not being able to easily write windows folder pathes such as c:\mypath\ then worry not, for, you can represent them as r"C:\mypath", and, if you need to append a subdirectory name, don't do it with string concatenation, for it's not the right way to do it anyway! use os.path.join
>>> import os
>>> os.path.join(r"C:\mypath", "subfolder")
'C:\\mypath\\subfolder'
In order for you to end a raw string with a slash I suggest you can use this trick:
>>> print r"c:\test"'\\'
test\
It uses the implicit concatenation of string literals in Python and concatenates one string delimited with double quotes with another that is delimited by single quotes. Ugly, but works.
Another trick is to use chr(92) as it evaluates to "\".
I recently had to clean a string of backslashes and the following did the trick:
CleanString = DirtyString.replace(chr(92),'')
I realize that this does not take care of the "why" but the thread attracts many people looking for a solution to an immediate problem.
Since \" is allowed inside the raw string. Then it can't be used to identify the end of the string literal.
Why not stop parsing the string literal when you encounter the first "?
If that was the case, then \" wouldn't be allowed inside the string literal. But it is.
The reason for why r'\' is syntactical incorrect is that although the string expression is raw the used quotes (single or double) always have to be escape since they would mark the end of the quote otherwise. So if you want to express a single quote inside single quoted string, there is no other way than using \'. Same applies for double quotes.
But you could use:
'\\'
Another user who has since deleted their answer (not sure if they'd like to be credited) suggested that the Python language designers may be able to simplify the parser design by using the same parsing rules and expanding escaped characters to raw form as an afterthought (if the literal was marked as raw).
I thought it was an interesting idea and am including it as community wiki for posterity.
Naive raw strings
The naive idea of a raw string is
If I put an r in front of a pair of quotes,
I can put whatever I want between the quotes
and it will mean itself.
Unfortunately, this does not work, because if the whatever
happens to contain a quote, the raw string would end at that point.
It is simply impossible that I can put "whatever I want"
between fixed delimiters, because some of it could look like
the terminating delimiter -- no matter what that delimiter is.
Real-world raw strings (variant 1)
One possible approach to this problem would be to say
If I put an r in front of a pair of quotes,
I can put whatever I want between the quotes
as long as it does not contain a quote
and it will mean itself.
This restriction sounds harsh, until one recognizes that
Python's large offering of quotes can accommodate most situations
with this rule. The following are all valid Python quotes:
'
"
'''
"""
With this many possibilities for the delimiter, almost anything
can be made to work.
About the only exception would be if the string
literal is supposed to contain a complete list of all allowed
Python quotes.
Real-world raw strings (variant 2, as in Python)
Python, however, takes a different route using
an extended version of the above rule.
It effectively states
If I put an r in front of a pair of quotes,
I can put whatever I want between the quotes
as long as it does not contain a quote
and it will mean itself.
If I insist on including a quote, even that is allowed,
but I have to put a backslash before it.
So the Python approach is, in a sense, even more liberal
than variant 1 above -- but it has the side effect of
"mis"interpreting the closing quote as part of the string
if the last intended character of the string is a backslash.
Variant 2 is not helpful:
If I want the quote in my string,
but not the backslash, the allowed version of my string literal
will not be what I need.
However, given the three different other kinds of quotes I have
at my disposal, I will probably just pick one of those and my
problem will be solved -- so this is not problematic case.
The problematic case is this one:
If I want my string to end with a backslash, I am at a loss.
I need to resort to concatenating a non-raw string literal
containing the backslash.
Conclusion
After writing this, I go with several of the other posters
that variant 1 would have been easier to understand and to accept
and therefore more pythonic. That's life!
Comming from C it pretty clear to me that a single \ works as escape character allowing you to put special characters such as newlines, tabs and quotes into strings.
That does indeed disallow \ as last character since it will escape the " and make the parser choke. But as pointed out earlier \ is legal.
some tips :
1) if you need to manipulate backslash for path then standard python module os.path is your friend. for example :
os.path.normpath('c:/folder1/')
2) if you want to build strings with backslash in it BUT without backslash at the END of your string then raw string is your friend (use 'r' prefix before your literal string). for example :
r'\one \two \three'
3) if you need to prefix a string in a variable X with a backslash then you can do this :
X='dummy'
bs=r'\ ' # don't forget the space after backslash or you will get EOL error
X2=bs[0]+X # X2 now contains \dummy
4) if you need to create a string with a backslash at the end then combine tip 2 and 3 :
voice_name='upper'
lilypond_display=r'\DisplayLilyMusic \ ' # don't forget the space at the end
lilypond_statement=lilypond_display[:-1]+voice_name
now lilypond_statement contains "\DisplayLilyMusic \upper"
long live python ! :)
n3on
Despite its role, even a raw string cannot end in a single
backslash, because the backslash escapes the following quote
character—you still must escape the surrounding quote character to
embed it in the string. That is, r"...\" is not a valid string
literal—a raw string cannot end in an odd number of backslashes.
If you need to end a raw string with a single backslash, you can use
two and slice off the second.
I encountered this problem and found a partial solution which is good for some cases. Despite python not being able to end a string with a single backslash, it can be serialized and saved in a text file with a single backslash at the end. Therefore if what you need is saving a text with a single backslash on you computer, it is possible:
x = 'a string\\'
x
'a string\\'
# Now save it in a text file and it will appear with a single backslash:
with open("my_file.txt", 'w') as h:
h.write(x)
BTW it is not working with json if you dump it using python's json library.
Finally, I work with Spyder, and I noticed that if I open the variable in spider's text editor by double clicking on its name in the variable explorer, it is presented with a single backslash and can be copied to the clipboard that way (it's not very helpful for most needs but maybe for some..).

Replacing a string by a string with backslashes

I am creating a program that automatically generates my reports in LaTeX, where I have to escape special LaTeX characters. Basically, whenever I read $ or _ or %, etc, I have to replace it by \$, \_ and \%, respectively.
I naively tried to do mystring.replace('$','\$'), yet it adds a double backslash, as shown below:
my_text_to_parse = "$x^2+2\cdot x + 2 = 0$"
my_text_to_parse.replace('$','\$')
#=> "\\$x^2+2\cdot x + 2 = 0\\$"
Is there any way to avoid doubling escape characters?
You're seeing the double backslash because you're getting the representation of the string, not the output. In the representation, it prints an backslash because \ is a protected character and therefore must be escaped. This is because it is used in special characters (e.g. \t, \n) and usage might be confused.. When the string is actually printed or saved, those double backslashes should be printed properly as a single backslash.
For example, compare
print('\')
# SyntaxError: EOL while scanning string literal
to
print('\\')
# \
In the first string, the second quotation mark is being escaped by the backslash. This shows why you generally can't use raw backslashes in strings. In the second string, the second backslash is being escaped by the first. The two backslashes get interpreted as a single one.
print(repr('\\'))
# '\\'
But the representation of the second string still shows both backslashes. This behavior is the same as other special characters such as \n, where it can be a bit easier to see the issue. Just as \n is the special character that means line break, \\ is the special character that means single backslash.
print('hi\nmom')
# hi
# mom
print(repr('hi\nmom'))
# 'hi\nmom'
To actually answer your question, the way you're doing it should work properly, but you probably don't want to do it quite that way. This is because creating a string with '\$' doesn't make this escaping issue clear. It seems like it is a special character \$ in the same way that \n is a special character, but because there is no character defined like that, the python interpreter is smart enough to replace the single backslash with a double backslash. But you generally don't want to rely on that behavior.
A better way to do it is to explicitly escape the backslash with another one or to use a raw string, where no escaping is allowed. All of these will give the same result.
s = '$x^2+2\\cdot x + 2 = 0$'
print(s.replace('$', '\$')) # Technically works, but not as clear
# \$x^2+2\cdot x + 2 = 0\$
print(s.replace('$', '\\$')) # Escaping the backslash
# \$x^2+2\cdot x + 2 = 0\$
print(s.replace('$', r'\$')) # Using a raw string
# \$x^2+2\cdot x + 2 = 0\$
print re.sub(r"\$","\$",x)
You can try re.sub.It will give the expected result.

How can I print a single backslash?

When I write print('\') or print("\") or print("'\'"), Python doesn't print the backslash \ symbol. Instead it errors for the first two and prints '' for the third. What should I do to print a backslash?
This question is about producing a string that has a single backslash in it. This is particularly tricky because it cannot be done with raw strings. For the related question about why such a string is represented with two backslashes, see Why do backslashes appear twice?. For including literal backslashes in other strings, see using backslash in python (not to escape).
You need to escape your backslash by preceding it with, yes, another backslash:
print("\\")
And for versions prior to Python 3:
print "\\"
The \ character is called an escape character, which interprets the character following it differently. For example, n by itself is simply a letter, but when you precede it with a backslash, it becomes \n, which is the newline character.
As you can probably guess, \ also needs to be escaped so it doesn't function like an escape character. You have to... escape the escape, essentially.
See the Python 3 documentation for string literals.
A hacky way of printing a backslash that doesn't involve escaping is to pass its character code to chr:
>>> print(chr(92))
\
print(fr"\{''}")
or how about this
print(r"\ "[0])
For completeness: A backslash can also be escaped as a hex sequence: "\x5c"; or a short Unicode sequence: "\u005c"; or a long Unicode sequence: "\U0000005c". All of these will produce a string with a single backslash, which Python will happily report back to you in its canonical representation - '\\'.

How to escape “\” characters in python

i am very new to regular expression and trying get "\" character using python
normally i can escape "\" like this
print ("\\");
print ("i am \\nit");
output
\
i am \nit
but when i use the same in regX it didn't work as i thought
print (re.findall(r'\\',"i am \\nit"));
and return me output
['\\']
can someone please explain why
EDIT: The problem is actually how print works with lists & strings. It prints the representation of the string, not the string itself, the representation of a string containing just a backslash is '\\'. So findall is actually finding the single backslash correctly, but print isn't printing it as you'd expect. Try:
>>> print(re.findall(r'\\',"i am \\nit")[0])
\
(The following is my original answer, it can be ignored (it's entirely irrelevant), I'd misinterpreted the question initially. But it seems to have been upvoted a bit, so I'll leave it here.)
The r prefix on a string means the string is in "raw" mode, that is, \ are not treated as special characters (it doesn't have anything to do with "regex").
However, r'\' doesn't work, as you can't end a raw string with a backslash, it's stated in the docs:
Even in a raw string, string quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character).
But you actually can use a non-raw string to get a single backslash: "\\".
can someone please explain why
Because re.findall found one match, and the match text consisted of a backslash. It gave you a list with one element, which is a string, which has one character, which is a backslash.
That is written ['\\'] because '\\' is how you write "a string with one backslash" - just like you had to do when you wrote the example code print "\\".
Note that you're using two different kinds of string literal here -- there's the regular string "a string" and the raw string r"a raw string". Regular string literals observe backslash escaping, so to actually put a backslash in the string, you need to escape it too. Raw string literals treat backslashes like any other character, so you're more limited in which characters you can actually put in the string (no specials that need an escape code) but it's easier to enter things like regular expressions, because you don't need to double up backslashes if you need to add a backslash to have meaning inside the string, not just when creating the string.
It is unnecessary to escape backslashes in raw strings, unless the backslash immediately precedes the closing quote.

Categories