Replace a character with backslash bug - Python

This feels like a bug to me. I am unable to replace a character in a string with a single backslash:
>>> st = "a&b"
>>> st.replace('&','\\')
'a\\b'
I know that '\' isn't a legitimate string because the \ escapes the last '.
However, I don't want the result to be 'a\\b'; I want it to be 'a\b'. How is this possible?

You are looking at the string's representation (its repr()), which is itself a valid Python string literal.
The \\ is just one backslash, displayed in escaped form so that the value reads as a valid Python string literal. You can copy and paste that representation back into Python and it will produce the same value.
Use print st.replace('&','\\') to see the actual value, or check the length of the result:
>>> st = "a&b"
>>> print st.replace('&','\\')
a\b
>>> len(st.replace('&','\\'))
3

using os.path.join, but it's always doubling the '\' even putting it as rawstring [duplicate]

When I create a string containing backslashes, they get duplicated:
>>> my_string = "why\does\it\happen?"
>>> my_string
'why\\does\\it\\happen?'
Why?
What you are seeing is the representation of my_string created by its __repr__() method. If you print it, you can see that you've actually got single backslashes, just as you intended:
>>> print(my_string)
why\does\it\happen?
The string below has three characters in it, not four:
>>> 'a\\b'
'a\\b'
>>> len('a\\b')
3
You can get the standard representation of a string (or any other object) with the repr() built-in function:
>>> print(repr(my_string))
'why\\does\\it\\happen?'
Python represents backslashes in strings as \\ because the backslash is an escape character - for instance, \n represents a newline, and \t represents a tab.
This can sometimes get you into trouble:
>>> print("this\text\is\not\what\it\seems")
this ext\is
ot\what\it\seems
Because of this, there needs to be a way to tell Python you really want the two characters \n rather than a newline, and you do that by escaping the backslash itself, with another one:
>>> print("this\\text\is\what\you\\need")
this\text\is\what\you\need
When Python returns the representation of a string, it plays safe, escaping all backslashes (even if they wouldn't otherwise be part of an escape sequence), and that's what you're seeing. However, the string itself contains only single backslashes.
More information about Python's string literals can be found at: String and Bytes literals in the Python documentation.
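As a small sketch of the point above (written for Python 3; the variable names are only illustrative), a raw string literal and a literal with doubled backslashes produce the same value, and only repr() shows the doubling:

```python
# Two ways to write the same value: doubled backslashes or a raw literal.
escaped = "why\\does\\it\\happen?"
raw = r"why\does\it\happen?"

print(escaped == raw)        # True: both contain single backslashes
print(escaped)               # why\does\it\happen?
print(repr(escaped))         # 'why\\does\\it\\happen?'
print(escaped.count("\\"))   # 3
```

Raw literals are usually the easier choice for Windows paths and regular expressions, with the caveat that a raw literal cannot end in a single backslash.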
As Zero Piraeus's answer explains, using single backslashes like this (outside of raw string literals) is a bad idea.
But there's an additional problem: in the future, it will be an error to use an undefined escape sequence like \d, instead of meaning a literal backslash followed by a d. So, instead of just getting lucky that your string happened to use \d instead of \t so it did what you probably wanted, it will definitely not do what you want.
As of 3.6, it already raises a DeprecationWarning, although most people don't see those; since 3.12 it is a SyntaxWarning, which is shown by default. It will become a SyntaxError in some future version.
In many other languages, including C, using a backslash that doesn't start an escape sequence means the backslash is ignored.
In a few languages, including Python, a backslash that doesn't start an escape sequence is a literal backslash.
In some languages, to avoid confusion about whether the language is C-like or Python-like, and to avoid the problem with \Foo working but \foo not working, a backslash that doesn't start an escape sequence is illegal.
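One way to check whether a literal relies on an undefined escape sequence is to compile its source text and record the warnings. The following is a minimal sketch, assuming Python 3.6 or later; the helper name invalid_escape_warnings is made up for this example, and the exact warning category depends on the Python version.

```python
import warnings

def invalid_escape_warnings(literal_source):
    """Compile source text and return the warning categories it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            compile(literal_source, "<example>", "eval")
        except SyntaxError:
            # A future Python may reject the literal outright.
            return [SyntaxError]
    return [w.category for w in caught]

# The source text below contains the two characters \d inside a normal literal.
print(invalid_escape_warnings(r'"C:\dir"'))
# A raw literal or a doubled backslash triggers no warning.
print(invalid_escape_warnings(r'r"C:\dir"'))
print(invalid_escape_warnings(r'"C:\\dir"'))
```

Running the interpreter with -W error is another way to surface these warnings across a whole program.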

convert a string to its codepoint in python

There are characters like '‌' that are not visible, so I can't copy and paste them. I want to convert any character to its code point, like '\u200D'.
another example is: 'abc' => '\u0061\u0062\u0063'
Allow me to rephrase your question. The header convert a string to its codepoint in python clearly did not get through to everyone, mostly, I think, because we can't imagine what you want it for.
What you want is a string containing a representation of Unicode escapes.
You can do that this way:
print(''.join("\\u{:04x}".format(b) for b in b'abc'))
\u0061\u0062\u0063
If you display that printed value as a string literal you will see doubled backslashes, because backslashes have to be escaped in a Python string. So it will look like this:
'\\u0061\\u0062\\u0063'
The reason for that is that if you simply put unescaped backslashes in your string literal, like this:
a = "\u0061\u0062\u0063"
when you display a at the prompt you will get:
>>> a
'abc'
'\u0061\u0062\u0063'.encode('utf-8') will encode the text to UTF-8 bytes (here b'abc'); it does not produce the escaped form.
Edit:
Since Python interprets the escape sequences when it reads the literal, you can't see the escaped form, but you can write a function that generates it.
def get_string_unicode(string_to_convert):
    res = ''
    for letter in string_to_convert:
        # hex() returns e.g. '0x61'; drop the '0x' prefix and pad to four digits
        res += '\\u' + (hex(ord(letter))[2:]).zfill(4)
    return res
Result:
>>> get_string_unicode('abc')
'\\u0061\\u0062\\u0063'
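A variant of the same idea, as a sketch for Python 3: it also covers code points above U+FFFF, which need the eight-digit \U form. The function name to_unicode_escapes is hypothetical and not part of any library.

```python
def to_unicode_escapes(text):
    """Return text with every character written as a \\uXXXX or \\UXXXXXXXX escape."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code <= 0xFFFF:
            parts.append("\\u{:04x}".format(code))
        else:
            # Code points above U+FFFF need the eight-digit form.
            parts.append("\\U{:08x}".format(code))
    return "".join(parts)

escaped = to_unicode_escapes("abc")
print(escaped)                       # \u0061\u0062\u0063
print(to_unicode_escapes("\u200d"))  # \u200d

# Round trip: the escaped text is pure ASCII, so it can be decoded back.
print(escaped.encode("ascii").decode("unicode_escape"))  # abc
```

Because the function reads the characters from the string itself, invisible characters such as zero-width joiners never have to be copied and pasted by hand.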

Printing a literal string python in octal

Hi, I am having trouble trying to print a literal string in the proper format.
For starters, I have an object with a string parameter that is used for metadata, such that it looks like:
obj {
metadata: <str>
}
The object is being returned as a protocol response and we have the object to use as such.
print obj gives:
metadata: "\n)\n\022foobar"
When I print obj.metadata, Python treats the value as a string and converts the escapes to line breaks and the corresponding ASCII values, as expected.
When I tried
print repr(obj.metadata)
"\n)\n\x12foobar"
Unfortunately, Python prints the literal but converts the escaped characters from octal to hex. Is there a way I can print the Python literal with the escaped characters in octal, or convert the string so that the values are printed as they are in the object?
Thanks for the help
The extremely bad solution I have so far is
print str(obj).rstrip('\"').lstrip('metadata: \"')
to get the correct answer, but I am assuming there must be a smarter way.
TLDR:
x = "\n)\n\022foobar"
print x
)
foobar
print repr(x)
'\n)\n\x12foobar'
How do I get x to print the way it was assigned?
Please try this:
print('\\n)\\n\\022foobar')
or
print(r'\n)\n\022foobar')
The escape character '\' changes how the character following it is interpreted; for example, \n means a new line.
Doubling the backslash ('\\') or prefixing the literal with the letter r (a raw string) turns that interpretation off. C handles the doubled backslash the same way.
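The answer above hard-codes the literal. For a string that already exists at run time, one possible approach is to build the escaped form yourself. This is only a sketch for Python 3: octal_repr is a made-up helper, and its choice of which characters get named escapes and which get octal ones is an assumption about the desired format, not the behaviour of any protocol library.

```python
def octal_repr(text):
    """Return a double-quoted literal for text, using octal escapes for control bytes."""
    named = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    parts = []
    for ch in text:
        code = ord(ch)
        if ch in named:
            parts.append(named[ch])
        elif code < 0x20 or 0x7F <= code <= 0xFF:
            # Remaining control and high bytes become three-digit octal escapes.
            parts.append("\\{:03o}".format(code))
        else:
            parts.append(ch)
    return '"' + "".join(parts) + '"'

x = "\n)\n\022foobar"
print(octal_repr(x))   # "\n)\n\022foobar"
```

The result is itself a valid Python literal, so evaluating it gives back the original string.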

Python - How to print one backslash in a string within a dictionary?

I have a dictionary with some strings; in one of the strings there are two backslashes. I want to replace them with a single backslash.
These are the backslashes: IfNotExist\\u003dtrue
Configurations = {
"javax.jdo.option.ConnectionUserName": "test",
"javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
"javax.jdo.option.ConnectionPassword": "sxxxsasdsasad",
"javax.jdo.option.ConnectionURL": "jdbc:mysql://hive-metastore.cr.eu-west-1.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist\\u003dtrue"
}
print (Configurations)
When I print it, it keeps showing the two backslashes. I know that the way to escape a backslash is to use \\; this works in a regular string, but it does not seem to work in a dictionary.
Any ideas?
The problem comes from the encoding.
In fact, \u003d is the Unicode escape for =.
The backslash is escaped by another backslash, which is a good thing.
You may need to do one of the following:
Replace \u003d with =.
Read it as Unicode: written with a single backslash, "\u003d" (u"\u003d" in Python 2) is the single character =.
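Both options can be sketched in a few lines of Python 3. The url value here is a shortened, illustrative stand-in for the connection URL in the question, and the second option assumes the rest of the text is plain ASCII.

```python
url = "hive?createDatabaseIfNotExist\\u003dtrue"   # one real backslash in the value

# Option 1: replace the escape text with the character it stands for.
replaced = url.replace("\\u003d", "=")

# Option 2: decode every \uXXXX escape (assumes the rest of the text is ASCII).
decoded = url.encode("ascii").decode("unicode_escape")

print(replaced)   # hive?createDatabaseIfNotExist=true
print(decoded)    # hive?createDatabaseIfNotExist=true
```

Whether either is appropriate depends on what the consumer of the URL expects; some configuration formats want the escaped text left as it is.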
Printing the dictionary shows you a representation of the dictionary object. It doesn't necessarily show you a nice representation of everything inside it. To do that you need to do:
for value in Configurations.values():
    print(value)
When you print your dictionary using print (Configurations), it prints the repr() of the dictionary, which in turn uses the repr() of every key and value.
You will get
{'javax.jdo.option.ConnectionDriverName': 'org.mariadb.jdbc.Driver', 'javax.jdo.option.ConnectionUserName': 'test', 'javax.jdo.option.ConnectionPassword': 'sxxxsasdsasad', 'javax.jdo.option.ConnectionURL': 'jdbc:mysql://hive-metastore.cr.eu-west-1.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist\\u003dtrue'}
You need to print the value itself, with
print (Configurations["javax.jdo.option.ConnectionURL"])
or
print (str(Configurations["javax.jdo.option.ConnectionURL"]))
Note: the str() call is optional here, because the value is already a string.
Then the output will be
jdbc:mysql://hive-metastore.cr.eu-west-1.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist\u003dtrue
For more detail check Python Documentation - Fancier Output Formatting
The str() function is meant to return representations of values which
are fairly human-readable, while repr() is meant to generate
representations which can be read by the interpreter (or will force a
SyntaxError if there is no equivalent syntax).
If you want to represent that string by using a single backslash instead of a double backslash, then you need the str() representation, not the repr(). When you print a dictionary, you always get the repr() of the included strings.
You can print the str() by formatting the dictionary yourself, like so:
print("{" +
      ', '.join("'{key}': '{value}'".format(key=key, value=value)
                for key, value in Configurations.items()) +
      "}")
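The difference between printing the dictionary and printing one of its values can be seen directly. This is a small sketch for Python 3, with a one-entry dictionary named config standing in for Configurations and a shortened URL.

```python
config = {"url": "hive?createDatabaseIfNotExist\\u003dtrue"}

print(config)             # {'url': 'hive?createDatabaseIfNotExist\\u003dtrue'}
print(config["url"])      # hive?createDatabaseIfNotExist\u003dtrue

# Print each entry with the plain value rather than its repr().
for key, value in config.items():
    print("{}: {}".format(key, value))
```

The stored value contains one backslash in both cases; only the dictionary's repr() doubles it on display.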
Depending on how you print your string, Python will show two backslashes where the string actually has only one in it. This is Python's way of indicating that the backslash is an actual backslash and not part of an escape sequence; the representation shows '\n' for a newline, for example.
Try writing the string to a file and then opening the file in an editor.
(Linux..)
>>> f = open('/tmp/somefile.txt', 'w')
>>> f.write(sometextwithbackslashes)
>>> f.close()
>>> exit()
$ vi /tmp/somefile.txt
