Hi, I am having trouble printing a string literal in the proper format.
For starters, I have an object with a string parameter that is used for metadata, such that it looks like:
obj {
metadata: <str>
}
The object is being returned as a protocol response and we have the object to use as such.
print obj gives:
metadata: "\n)\n\022foobar"
When I print obj.metadata, Python treats the value as a string and converts the escapes to line breaks and the corresponding ASCII characters, as expected.
When I tried
print repr(obj.metadata)
"\n)\n\x12foobar"
Unfortunately, Python prints the literal but converts the escaped characters from octal to hex. Is there a way I can print the Python literal with the escaped characters in octal, or convert the string so that the values are printed as they are in the object?
Thanks for the help
The extremely bad solution I have so far is
print str(obj).rstrip('\"').lstrip('metadata: \"')
to get the correct answer, but I am assuming there must be a smarter way.
TLDR:
x = "\n)\n\022foobar"
print x
)
foobar
print repr(x)
'\n)\n\x12foobar'
How do I get x to print the way it was assigned?
Please try this:
print('\\n)\\n\\022foobar')
or
print(r'\n)\n\022foobar')
The escape character '\' changes how the character following it is interpreted; for example, \n stands for a newline.
Doubling the backslash ('\\') or prefixing the literal with the letter r (a raw string) turns that interpretation off. Doubling the backslash works the same way in C.
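Neither raw strings nor doubled backslashes help when the value arrives at runtime, as obj.metadata does. A minimal sketch of one way to get the octal form is to build the escaped text by hand, assuming the string holds only byte-sized characters (code points below 256). The helper name octal_escape and its table of named escapes are made up for this illustration; this is not what the protocol library itself does.

```python
def octal_escape(text):
    # Hypothetical helper: show control characters as escapes,
    # using three-digit octal (like \022) instead of hex (like \x12).
    # Assumption: every character has a code point below 256.
    named = {'\n': '\\n', '\t': '\\t', '\r': '\\r', '\\': '\\\\', '"': '\\"'}
    out = []
    for ch in text:
        if ch in named:
            out.append(named[ch])
        elif ' ' <= ch <= '~':
            # Printable ASCII is passed through unchanged.
            out.append(ch)
        else:
            out.append('\\%03o' % ord(ch))
    return ''.join(out)

x = "\n)\n\022foobar"
print(octal_escape(x))  # \n)\n\022foobar
```

Unlike the str(obj) plus strip approach, this works on the value itself and does not depend on how the enclosing object is printed.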
There are characters like '' that are not visible, so I can't copy and paste them. I want to convert any character to its code point escape, like '\u200D'.
Another example: 'abc' => '\u0061\u0062\u0063'
Allow me to rephrase your question. The header convert a string to its codepoint in python clearly did not get through to everyone, mostly, I think, because we can't imagine what you want it for.
What you want is a string containing a representation of Unicode escapes.
You can do that this way:
print(''.join("\\u{:04x}".format(b) for b in b'abc'))
\u0061\u0062\u0063
If you display that printed value as a string literal you will see doubled backslashes, because backslashes have to be escaped in a Python string. So it will look like this:
'\\u0061\\u0062\\u0063'
The reason for that is that if you simply put unescaped backslashes in your string literal, like this:
a = "\u0061\u0062\u0063"
when you display a at the prompt you will get:
>>> a
'abc'
'\u0061\u0062\u0063'.encode('utf-8') will encode the text to UTF-8 bytes (b'abc'); it does not give you the escape sequences back.
Edit:
Since Python interprets the escapes in a literal as soon as it is evaluated, you can't see the escaped form directly, but you can write a function that generates it.
def get_string_unicode(string_to_convert):
    res = ''
    for letter in string_to_convert:
        # hex() gives e.g. '0x61'; drop the '0x' and pad to four digits
        res += '\\u' + (hex(ord(letter))[2:]).zfill(4)
    return res
Result:
>>> get_string_unicode('abc')
'\\u0061\\u0062\\u0063'
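One caveat with four-digit padding: characters outside the Basic Multilingual Plane have code points above 0xFFFF, and Python writes those as \U followed by eight hex digits. A minimal sketch that covers both cases, with the hypothetical name to_unicode_escapes, might look like this:

```python
def to_unicode_escapes(text):
    # Hypothetical helper: escape every character as \uXXXX,
    # or as \UXXXXXXXX when the code point does not fit in four hex digits.
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            parts.append('\\u{:04x}'.format(cp))
        else:
            parts.append('\\U{:08x}'.format(cp))
    return ''.join(parts)

print(to_unicode_escapes('abc'))     # \u0061\u0062\u0063
print(to_unicode_escapes('\u200d'))  # \u200d
```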
I'm a Python newbie and I'm trying to make one script that writes some strings in a file if there's a difference. Problem is that original string has some characters in \uNNNN Unicode format and I cannot convert the new string to the same Unicode format.
The original string I'm trying to compare: \u00A1 ATENCI\u00D3N! \u25C4
New string is received as: ¡ ATENCIÓN! ◄
And this is the code:
str = u'¡ ATENCIÓN! ◄'
print(str)
str1 = str.encode('unicode_escape')
print (str1)
str2 = str1.decode()
print (str2)
And the result is:
¡ ATENCIÓN! ◄
b'\\xa1 ATENCI\\xd3N! \\u25c4'
\xa1 ATENCI\xd3N! \u25c4
So, how can I get \xa1 ATENCI\xd3N! \u25c4 converted to \u00A1 ATENCI\u00D3N! \u25C4 as this is the only Unicode format I can save?
Note: Cases of characters in strings also need to be the same for comparison.
The issue is, according to the docs (read down a little bit, between the escape sequences tables), the \u, \U, and \N Unicode escape sequences are only recognized in string literals. That means that once the literal is evaluated in memory, such as in a variable assignment:
s = "\u00A1 ATENCI\u00D3N! \u25C4"
any attempt to encode it with str.encode('unicode_escape') converts it to a bytes object that uses \x where it can:
b'\\xa1 ATENCI\\xd3N! \\u25c4'
Using
b'\\xa1 ATENCI\\xd3N! \\u25c4'.decode("unicode_escape")
will convert it back to '¡ ATENCIÓN! ◄'. This uses the actual (intended) representation of the characters, and not the \uXXXX escape sequences of the original string s.
So, what you should do is not mess around with encoding and decoding things. Observe:
print("\u00A1 ATENCI\u00D3N! \u25C4" == '¡ ATENCIÓN! ◄')
True
That's all the comparison you need to do.
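If the file really does have to contain the escaped text (uppercase hex, always four digits), one option is to build that form yourself instead of relying on unicode_escape. This is only a sketch: the name escape_non_ascii is made up here, and it assumes every non-ASCII character lies in the Basic Multilingual Plane (code point at most 0xFFFF).

```python
def escape_non_ascii(text):
    # Hypothetical helper: keep ASCII as-is and write every other character
    # as \uXXXX with uppercase hex digits.
    # Assumption: no code points above 0xFFFF appear in the input.
    return ''.join(
        ch if ord(ch) < 128 else '\\u{:04X}'.format(ord(ch))
        for ch in text
    )

new_string = '¡ ATENCIÓN! ◄'
print(escape_non_ascii(new_string))  # \u00A1 ATENCI\u00D3N! \u25C4
```

The result can then be compared character for character with the escaped text read from the file.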
For further reading, you may be interested in:
How to work with surrogate pairs in Python?
Encodings and Unicode from the Python docs.
I have a dataset containing some poorly parsed text that includes a lot of Unicode characters (like 'a', '{', 'Ⅷ', '♞', ...) that have been improperly converted to Unicode.
All of the backslashes are escaped, so every unicode escape sequence was interpreted as a \ next to a u instead of a single character, \u.
More specifically, I have strings that look like this:
>>> '\\u00e9'
'\\u00e9'
And I want them to look like this:
>>> '\u00e9'
'é'
How can I convert the first string to the second?
Here is one way to accomplish this without importing another module.
input_string = '\\u00e9'
print(input_string.encode('latin-1').decode('unicode-escape'))
# output
é
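Note that encode('latin-1') raises UnicodeEncodeError if the string also contains real characters outside Latin-1, such as '♞'. A sketch of an alternative for such mixed text is to replace only the literal \uXXXX sequences with a regular expression. The function name unescape_u is hypothetical, and the sketch does not combine surrogate pairs.

```python
import re

def unescape_u(text):
    # Hypothetical helper: replace each literal backslash-u-XXXX sequence
    # with the character it names, leaving all other text untouched.
    return re.sub(
        r'\\u([0-9a-fA-F]{4})',
        lambda match: chr(int(match.group(1), 16)),
        text,
    )

print(unescape_u('\\u00e9'))       # é
print(unescape_u('caf\\u00e9 ♞'))  # café ♞
```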
First, strip the \u prefix so that only the hexadecimal digits remain, then turn those digits into bytes:
classmethod fromhex(string)
This bytes class method returns a bytes object, decoding the given string object. The string must contain two hexadecimal digits per byte, with ASCII whitespace being ignored.
https://docs.python.org/3/library/stdtypes.html#bytes.fromhex
Next, decode those bytes to text. The four hex digits of a \u escape are the code point written as a big-endian 16-bit value, so decode with UTF-16-BE:
bytes.decode(encoding="utf-8", errors="strict")
https://docs.python.org/3/library/stdtypes.html#bytes.decode
So, for a string holding a single escape, it would look something like this:
char = '\\u00e9'
print(bytes.fromhex(char[2:]).decode('utf-16-be'))
I have a dictionary with some strings, in one of the string there are two backslashes. I want to replace them with a single backslash.
These are the backslashes: IfNotExist\\u003dtrue
Configurations = {
"javax.jdo.option.ConnectionUserName": "test",
"javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
"javax.jdo.option.ConnectionPassword": "sxxxsasdsasad",
"javax.jdo.option.ConnectionURL": "jdbc:mysql://hive-metastore.cr.eu-west-1.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist\\u003dtrue"
}
print (Configurations)
When I print it, it keeps showing the two backslashes. I know that the way to escape a backslash is using \. This works in a regular string, but it does not seem to work in a dictionary.
Any ideas?
The problem comes from the encoding.
In fact, \u003d is the Unicode escape for =.
The backslash is escaped by another backslash which is a good thing.
You may need to either:
Replace \u003d with =, or
Have it read as a Unicode escape: with a single backslash in a Unicode literal, such as u"hi \u003d", Python turns it into = for you.
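As a rough illustration of the first option, a plain str.replace on the stored value is enough. The host name below is a placeholder, not the one from the question.

```python
# Placeholder URL; only the trailing query parameter matters here.
url = "jdbc:mysql://example.invalid:3306/hive?createDatabaseIfNotExist\\u003dtrue"

# "\\u003d" in source code is the six characters \ u 0 0 3 d in memory.
fixed = url.replace("\\u003d", "=")
print(fixed)  # jdbc:mysql://example.invalid:3306/hive?createDatabaseIfNotExist=true
```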
Printing the dictionary shows you a representation of the dictionary object. It doesn't necessarily show you a nice representation of everything inside it. To do that you need to do:
for value in Configurations.values():
    print(value)
When you print out your dictionary using
print (Configurations), it will print out the repr() value of the dictionary
You will get
{'javax.jdo.option.ConnectionDriverName': 'org.mariadb.jdbc.Driver', 'javax.jdo.option.ConnectionUserName': 'test', 'javax.jdo.option.ConnectionPassword': 'sxxxsasdsasad', 'javax.jdo.option.ConnectionURL': 'jdbc:mysql://hive-metastore.cr.eu-west-1.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist\\u003dtrue'}
You need to print out your dictionary with
print (Configurations["javax.jdo.option.ConnectionURL"])
or
print (str(Configurations["javax.jdo.option.ConnectionURL"]))
Note: str() is added
Then the output will be
jdbc:mysql://hive-metastore.cr.eu-west-1.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist\u003dtrue
For more detail check Python Documentation - Fancier Output Formatting
The str() function is meant to return representations of values which
are fairly human-readable, while repr() is meant to generate
representations which can be read by the interpreter (or will force a
SyntaxError if there is no equivalent syntax).
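A short sketch of that difference, using just the tail of the connection URL, shows that the string holds a single backslash and only its repr() doubles it:

```python
value = "createDatabaseIfNotExist\\u003dtrue"

print(repr(value))        # 'createDatabaseIfNotExist\\u003dtrue'
print(str(value))         # createDatabaseIfNotExist\u003dtrue
print(value.count("\\"))  # 1
```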
If you want to represent that string by using a single backslash instead of a double backslash, then you need the str() representation, not the repr(). When you print a dictionary, you always get the repr() of the included strings.
You can print the str() by formatting the dictionary yourself, like so:
print ( "{" +
', '.join("'{key}': '{value}'".format(key=key, value=value)
for key, value in Configurations.items()) +
"}")
Depending on how you display your string, Python will show two backslashes where the string actually has only one. This is Python's way of indicating that the backslash is an actual backslash and not part of an escape sequence; the same repr() form also shows '\n' for a newline, for example.
Try writing the string to a file and then opening the file in an editor.
(Linux..)
> f = open('/tmp/somefile.txt', 'w')
> f.write(sometextwithbackslashes)
> f.close()
> ^D
$ vi /tmp/somefile.txt
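The same check can be done without leaving Python. This sketch writes a three-character string to a file in the system temp directory and reads it back; the file name and sample text are arbitrary.

```python
import os
import tempfile

text = "a\\b"  # three characters: a, one backslash, b
path = os.path.join(tempfile.gettempdir(), "somefile.txt")

# Write the string, then read the file back to see what was really stored.
with open(path, "w") as f:
    f.write(text)

with open(path) as f:
    stored = f.read()

print(stored)       # a\b
print(len(stored))  # 3
```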
This feels like a bug to me. I am unable to replace a character in a string with a single backslash:
>>>st = "a&b"
>>>st.replace('&','\\')
'a\\b'
I know that '\' isn't a legitimate string because the \ escapes the last '.
However, I don't want the result to be 'a\\b'; I want it to be 'a\b'. How is this possible?
You are looking at the string representation, which is itself a valid Python string literal.
The \\ is itself just one backslash, but it is displayed escaped to make the value a valid Python string literal. You can copy and paste that string back into Python and it'll produce the same value.
Use print st.replace('&','\\') to see the actual value, or check the length of the resulting value:
>>> st = "a&b"
>>> print st.replace('&','\\')
a\b
>>> len(st.replace('&','\\'))
3