What is the right way to encode a string with backslashes? [duplicate] - python

This question already has answers here:
How to fix "<string> DeprecationWarning: invalid escape sequence" in Python?
(2 answers)
Closed 4 years ago.
In the given example: "\info\more info\nName"
how would I turn this into bytes?
I tried using unicode-escape, but that didn't seem to work :(
data = "\info\more info\nName"
dataV2 = str.encode(data)
FinalData = dataV2.decode('unicode-escape').encode('utf_8')
print(FinalData)
This is where I should get b'\info\more info\nName',
but something unexpected happens and I get DeprecationWarnings in my terminal.
I'm assuming that it's because the backslashes cause an invalid escape sequence, but I need them for this project.

Backslashes before characters indicate an attempt to escape the character that follows to make it into a special character of some sort. You get the DeprecationWarning because Python is (finally) going to make unrecognized escapes an error, rather than silently treating them as a literal backslash followed by the character.
To fix, either double your backslashes (not sure if you intended a newline; if so, leave the backslash before the n single):
data = "\\info\\more info\\nName"
or, if you want all the backslashes to be literal backslashes (the \n shouldn't be a newline), then you can use a raw string by prefixing with r:
data = r"\info\more info\nName"
which disables backslash escape processing for everything except the quote character itself.
Note that if you just let data echo in the interactive interpreter, it will show the backslashes as doubled (because it implicitly uses the repr of the str, which is what you'd type to reproduce it). To avoid that, print the str to see what it would actually look like:
>>> "\\info\\more info\\nName" # repr produced by simply evaluating it, which shows backslashes doubled, but there's really only one each time
'\\info\\more info\\nName'
>>> print("\\info\\more info\\nName") # print shows the "real" contents
\info\more info\nName
>>> print("\\info\\more info\nName") # With new line left in place
\info\more info
Name
>>> print(r"\info\more info\nName") # Same as first option, but raw string means no doubling backslashes
\info\more info\nName
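For the original goal of turning the text into bytes, a minimal sketch follows. It assumes every backslash is meant literally and that UTF-8 is the wanted encoding; the variable names are only illustrative.

```python
# Raw string: every backslash is a literal character, so "\n" here is two characters.
data = r"\info\more info\nName"

# Encode the text to bytes; no unicode-escape round trip is needed.
final_data = data.encode("utf-8")

# The bytes repr doubles each backslash, just like the str repr does.
print(final_data)       # b'\\info\\more info\\nName'
print(len(final_data))  # 21
```

The doubled backslashes in the printed bytes are only the representation; the length shows there is one byte per backslash.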

You can escape a backslash with another backslash.
data = "\\info\\more info\nName"
You could also use a raw string for the parts that don't need escapes.
data = r"\info\more info""\nName"
Note that raw strings don't work if the final character is a backslash.
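A quick check, assuming the newline before Name is intended, that the two spellings above build the same string. Adjacent string literals are joined when the code is compiled, which is what lets a raw and a regular literal sit side by side.

```python
escaped = "\\info\\more info\nName"
mixed = r"\info\more info" "\nName"  # raw part followed by a regular part

print(escaped == mixed)   # True
print(mixed.count("\\"))  # 2
print(mixed.count("\n"))  # 1
```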

Related

using os.path.join, but it's always doubling the '\' even putting it as rawstring [duplicate]

When I create a string containing backslashes, they get duplicated:
>>> my_string = "why\does\it\happen?"
>>> my_string
'why\\does\\it\\happen?'
Why?
What you are seeing is the representation of my_string created by its __repr__() method. If you print it, you can see that you've actually got single backslashes, just as you intended:
>>> print(my_string)
why\does\it\happen?
The string below has three characters in it, not four:
>>> 'a\\b'
'a\\b'
>>> len('a\\b')
3
You can get the standard representation of a string (or any other object) with the repr() built-in function:
>>> print(repr(my_string))
'why\\does\\it\\happen?'
Python represents backslashes in strings as \\ because the backslash is an escape character - for instance, \n represents a newline, and \t represents a tab.
This can sometimes get you into trouble:
>>> print("this\text\is\not\what\it\seems")
this ext\is
ot\what\it\seems
Because of this, there needs to be a way to tell Python you really want the two characters \n rather than a newline, and you do that by escaping the backslash itself, with another one:
>>> print("this\\text\is\what\you\\need")
this\text\is\what\you\need
When Python returns the representation of a string, it plays safe, escaping all backslashes (even if they wouldn't otherwise be part of an escape sequence), and that's what you're seeing. However, the string itself contains only single backslashes.
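The difference between a string's value and its representation can be checked directly. This sketch reuses my_string from the question, but writes it with doubled backslashes so it does not rely on unrecognized escapes.

```python
my_string = "why\\does\\it\\happen?"

shown = repr(my_string)  # what the interactive prompt echoes
print(my_string)         # why\does\it\happen?
print(shown)             # 'why\\does\\it\\happen?'

print(my_string.count("\\"))  # 3 backslashes in the actual value
print(shown.count("\\"))      # 6 backslashes in the representation
print(len(my_string))         # 19
```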
More information about Python's string literals can be found at: String and Bytes literals in the Python documentation.
As Zero Piraeus's answer explains, using single backslashes like this (outside of raw string literals) is a bad idea.
But there's an additional problem: in the future, it will be an error to use an undefined escape sequence like \d, instead of meaning a literal backslash followed by a d. So, instead of just getting lucky that your string happened to use \d instead of \t so it did what you probably wanted, it will definitely not do what you want.
As of 3.6, it already raises a DeprecationWarning, although most people don't see those; Python 3.12 upgraded it to a SyntaxWarning, which is displayed by default. It will become a SyntaxError in some future version.
In many other languages, including C, using a backslash that doesn't start an escape sequence means the backslash is ignored.
In a few languages, including Python, a backslash that doesn't start an escape sequence is a literal backslash.
In some languages, to avoid confusion about whether the language is C-like or Python-like, and to avoid the problem with \Foo working but \foo not working, a backslash that doesn't start an escape sequence is illegal.
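One way to see what your own interpreter does with an unrecognized escape is to compile a small piece of source text and record the warnings. This is only a sketch: invalid_escape_report is a hypothetical helper name, and the reported category depends on the Python version.

```python
import warnings

def invalid_escape_report(source):
    """Compile source text and list what the compiler says about it.

    Hypothetical helper for illustration; returns warning category names,
    or ["SyntaxError"] on interpreters that reject the source outright.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        try:
            compile(source, "<example>", "exec")
        except SyntaxError:
            return ["SyntaxError"]
    return [w.category.__name__ for w in caught]

# The source text below contains a single backslash followed by d.
bad_source = 'path = "why\\does"'
# Here the backslash is doubled in the source text itself.
good_source = 'path = "why\\\\does"'

print(invalid_escape_report(bad_source))   # category name depends on the Python version
print(invalid_escape_report(good_source))  # []
```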

Why print returns \\, not a escape character \ in Python [duplicate]

This question already has answers here:
How to fix "<string> DeprecationWarning: invalid escape sequence" in Python?
(2 answers)
Closed 7 months ago.
The code below prints the emoji 😂:
print('\U0001F602')
print('{}'.format('\U0001F602'))
However, if I use \ like below, it prints \U0001F602:
print('\{}'.format('U0001F602'))
Why does print('\{}'.format()) return \\, not an escape character, which is \?
I have been checking this and searched in Google, but couldn't find the proper answer.
Referring to String and Bytes literals, when Python sees a backslash in a string literal while compiling the program, it looks at the next character to see how the following characters are to be escaped. In the first case the following character is U, so Python knows it's a Unicode escape. In the final case, it sees {, realizes there is no such escape, and just emits the backslash and that { character.
In print('\{}'.format('U0001F602')) there are two different string literals, '\{}' and 'U0001F602'. That the first string will be parsed at runtime by .format doesn't make the result a string literal at all; it's a composite value.
>>> print('\{}'.format('U0001F602'))
\U0001F602
This is because .format only fills in the value inside the curly braces of '\{}'; it does not re-read the result as a string literal.
And it prints a single \, not a doubled one.
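If the goal is to build the character from its code point at runtime, a sketch along these lines may help. It assumes the hex digits arrive as a plain string; code_point and the other names are illustrative.

```python
code_point = "0001F602"

# Formatting only produces text: a backslash, a U, and eight hex digits.
text = "\\U{}".format(code_point)
print(len(text))  # 10

# Convert the hex digits to an integer and ask for that character.
emoji = chr(int(code_point, 16))
print(emoji == "\U0001F602")  # True

# Alternatively, have the unicode_escape codec interpret the escape text.
decoded = text.encode("ascii").decode("unicode_escape")
print(decoded == emoji)  # True
```

chr is the more direct route; the codec round trip is shown only because it mirrors what the compiler does with the literal.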

Why is the string automatically being changed? Is it because of backslash \ in the string? [duplicate]

This question already has answers here:
How can I put an actual backslash in a string literal (not use it for an escape sequence)?
(4 answers)
Closed 2 years ago.
Say I assign variable
x = '\\dnassmb1\biloadfiles_dev\Workday'
print(x)
Output:
'\\dnassmb1\x08iloadfiles_dev\\Workday'
I would like to know why it's changing to "x08.." specifically and how to avoid that automatic change and use string as it is. Thank you!
You are doing it wrong. The backslash has a special meaning in Python string literals.
Backslashes are used to put special characters inside the string.
If you want to get the above string printed:
x = '\\\\dnassmb1\\biloadfiles_dev\\Workday'
print(x)
As you can see, I am using an extra backslash everywhere I want a backslash to be printed. This is because the first backslash indicates that whatever comes after it is just part of the string and has no special meaning.
Use raw strings:
x = r'\\dnassmb1\biloadfiles_dev\Workday'
This will prevent python from treating your backslashes as escape sequences. See string and byte literals in the Python documentation for a full treatment of string parsing.
It's important to pay close attention to the difference between representation and value here. Just because a string appears to have four backslashes in it, doesn't mean that those backslashes are in the value of the string. Consider:
>>> x = '\\dnassmb1\biloadfiles_dev\Workday' # regular string
>>> y = r'\\dnassmb1\biloadfiles_dev\Workday' # raw string
>>> print(x); print(y)
\dnassmbiloadfiles_dev\Workday
\\dnassmb1\biloadfiles_dev\Workday
Here, x and y are both just strings, once Python has parsed them. But even though the parts inside the quotes are the same, the characters stored in the two strings are different. In y's case, you see exactly the number of backslashes you put in.
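The x08 in the question comes from \b being the escape for the backspace character. A small sketch, with illustrative variable names, comparing the fully doubled spelling with the raw one:

```python
doubled = "\\\\dnassmb1\\biloadfiles_dev\\Workday"
raw = r"\\dnassmb1\biloadfiles_dev\Workday"

print(doubled == raw)    # True
print(raw.count("\\"))   # 4

# In a regular literal, \b is one character: backspace (0x08).
print("\b" == "\x08")          # True
print(len("\b"), len(r"\b"))   # 1 2
```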

How to assign \ character to a variable in Python? [duplicate]

This question already has an answer here:
Why does printing a tuple (list, dict, etc.) in Python double the backslashes?
(1 answer)
Closed 5 years ago.
I am trying to assign
user = 'corp\adam'
Using Python, I am unable to create the user variable as desired.
Desired Output:
user
'corp\adam'
I don't want to print the variable. I need to store it.
In Python (and commonly in other programming languages too) the backslash character is used to denote special characters that could not be typed directly into a string. This is known as an escape sequence. To specify a literal backslash, use it twice:
user = 'corp\\adam'
Try a double backslash, i.e. corp\\adam. The first backslash denotes that the second one has to be evaluated like a normal character, not as another escape character.
The backslash character acts as an escape when the next character is an ASCII or Python special escape character. Either escape the backslash with another backslash:
'corp\\adam'
or use a raw string (where a backslash only escapes the quote character; it doesn't even escape itself):
r'corp\adam'
You can use r'corp\adam'; that way you tell Python it's a "raw" string and the backslash will not be used to escape other characters.
From the docs:
Both string and bytes literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and treat backslashes as literal characters.
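In this particular case the unescaped spelling fails quietly, because \a is a recognized escape (the ASCII bell character). A short sketch showing the stored values; the variable names are illustrative:

```python
broken = "corp\adam"   # \a is the bell character, so the backslash and the first a are gone
print(broken == "corp\x07dam")  # True
print(len(broken))              # 8

user = r"corp\adam"    # raw string keeps the backslash
print(user == "corp\\adam")     # True
print(len(user))                # 9

# The stored value has one backslash; only the repr shows two.
print(repr(user))               # 'corp\\adam'
```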

Why can't I end a raw string with a backslash? [duplicate]

This question already has answers here:
Why can't Python's raw string literals end with a single backslash?
(14 answers)
Closed 6 years ago.
I am confused here: raw strings convert every \ to \\, but when a \ appears at the end, it raises an error.
>>> r'so\m\e \te\xt'
'so\\m\\e \\te\\xt'
>>> r'so\m\e \te\xt\'
SyntaxError: EOL while scanning string literal
Update:
This is now covered in Python FAQs as well: Why can’t raw strings (r-strings) end with a backslash?
You still need \ to escape ' or " in raw strings, since otherwise the python interpreter doesn't know where the string stops. In your example, you're escaping the closing '.
Otherwise:
r'it wouldn\'t be possible to store this string'
r'since it'd produce a syntax error without the escape'
Look at the syntax highlighting to see what I mean.
Raw strings can't end in single backslashes because of how the parser works (there is no actual escaping going on, though). The workaround is to add the backslash as a non-raw string literal afterwards:
>>> print(r'foo\')
File "<stdin>", line 1
print(r'foo\')
^
SyntaxError: EOL while scanning string literal
>>> print(r'foo''\\')
foo\
Not pretty, but it works. You can add plus to make it clearer what is happening, but it's not necessary:
>>> print(r'foo' + '\\')
foo\
Python strings are processed in two steps:
First the tokenizer looks for the closing quote. It recognizes backslashes when it does this, but doesn't interpret them - it just looks for a sequence of string elements followed by the closing quote mark, where "string elements" are either (a character that's not a backslash, closing quote or a newline - except newlines are allowed in triple-quotes), or (a backslash, followed by any single character).
Then the contents of the string are interpreted (backslash escapes are processed) depending on what kind of string it is. The r flag before a string literal only affects this step.
Quote from https://docs.python.org/3.4/reference/lexical_analysis.html#literals:
Even in a raw literal, quotes can be escaped with a backslash, but the
backslash remains in the result; for example, r"\"" is a valid string
literal consisting of two characters: a backslash and a double quote;
r"\" is not a valid string literal (even a raw string cannot end in an
odd number of backslashes). Specifically, a raw literal cannot end in
a single backslash (since the backslash would escape the following
quote character). Note also that a single backslash followed by a
newline is interpreted as those two characters as part of the literal,
not as a line continuation.
So in a raw string, backslashes are not treated specially, except when preceding " or '. Therefore, r'\' or r"\" is not a valid string literal, because the closing quote is escaped, which leaves the literal unterminated. In this case it makes no difference whether the r prefix is present, i.e. r'\' is equivalent to '\' and r"\" is equivalent to "\".
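These rules can be checked with a short sketch. The compiles helper and the C:\temp path are hypothetical, used here only for illustration.

```python
# A backslash before a quote keeps the literal open, and in a raw string it stays in the value.
quoted = r"\""
print(len(quoted))       # 2
print(quoted == '\\"')   # True

# An even number of trailing backslashes is fine.
print(len(r"\\"))        # 2

# Workaround for a single trailing backslash: append it from a regular literal.
folder = r"C:\temp" "\\"
print(folder)            # C:\temp\

def compiles(source):
    """Return True if the source text is a valid expression (hypothetical helper)."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

# The source text r'foo\' never closes its quote, so compiling it fails.
print(compiles("r'foo\\'"))       # False
print(compiles("r'foo' '\\\\'"))  # True
```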
