When I create a string containing backslashes, they get duplicated:
Note: I need a \ in the request because I am calling a third-party API that expects \ in some of its keys.
I followed the answer to "Why do backslashes appear twice?", but it only works for a plain string, not for a dict.
mystr = {"str": "why\does\it\happen?"}
print(mystr)
output:
{'str': 'why\\does\\it\\happen?'}
mystr isn't a str, it's a dict. When you print a dict, it prints the repr of each string inside it, rather than the string itself.
>>> mydict = {"str": "why\does\it\happen?"}
>>> print(mydict)
{'str': 'why\\does\\it\\happen?'}
>>> print(repr(mydict['str']))
'why\\does\\it\\happen?'
>>> print(mydict['str'])
why\does\it\happen?
Note that the repr() includes elements other than the string contents:
The quotes around it (indicating that it's a string)
The contents use backslash-escapes to disambiguate the individual characters. This extends to other "special" characters as well; for example, if this were a multiline string, the repr would show linebreaks as \n within a single line of text. Actual backslash characters are always rendered as \\ so that they can be distinguished from backslashes that are part of other escape sequences.
The key thing to understand is that these extra elements are just the way that the dict is rendered when it is printed. The actual contents of the string inside the dict do not have "doubled backslashes", as you can see when you print mydict['str'].
If you are using this dict to call an API, you should not be using str(mydict) or anything similar; if it's a Python API, you should be able to use mydict itself, and if it's a web API, it should be using something like JSON encoding (json.dumps(mydict)).
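As a rough sketch of the web API case (Python 3 assumed; the name body is just for illustration), json.dumps() produces the text that would go over the wire. That text does contain \\, but this is JSON's own escaping, and the receiving side decodes it back to single backslashes:

```python
import json

# Raw string so each backslash is a single literal character.
mydict = {"str": r"why\does\it\happen?"}

body = json.dumps(mydict)   # JSON text to send as the request body
print(body)                 # {"str": "why\\does\\it\\happen?"}

decoded = json.loads(body)  # what the receiving side reconstructs
print(decoded["str"])       # why\does\it\happen?
```

If the third-party API parses JSON in the usual way, it should see exactly the single backslashes that are in the original string.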
I think that to build the printed string of the dict, Python calls the __repr__ method of each object inside it (for the values) instead of __str__, as you might expect when printing the dict.
That would make sense, since a dict can contain any type of object, not just strings, and __repr__ can be found everywhere (it's included in the base object in Python), whereas __str__ needs to be written.
But it's only a guess, not a definitive answer.
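One way to check this guess is with a small throwaway class (Tag is a made-up name for illustration) whose __str__ and __repr__ return different text. Printing the object directly uses __str__, while printing a container that holds it shows the __repr__ of the element:

```python
class Tag:
    """Hypothetical class, only here to tell str() and repr() apart."""

    def __str__(self):
        return "str form"

    def __repr__(self):
        return "repr form"


t = Tag()
print(t)         # str form
print([t])       # [repr form]
print({"k": t})  # {'k': repr form}
```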
I'm using Python to call an API that returns the last name of some soccer players. One of the players has a "ć" in his name.
When I call the endpoint, the name prints out with the unicode attached to it:
>>> last_name = (json.dumps(response["response"][2]["player"]["lastname"]))
>>> print(last_name)
"Mitrovi\u0107"
>>> print(type(last_name))
<class 'str'>
If I copy and paste that output into a string literal of its own, like so:
>>> print("Mitrovi\u0107")
Mitrović
>>> print(type("Mitrovi\u0107"))
<class 'str'>
Then it prints just fine?
What is wrong with the API endpoint call and the string that comes from it?
Well, you serialise the string with json.dumps() before printing it, that's why you get a different output.
Compare the following:
>>> print("Mitrović")
Mitrović
and
>>> print(json.dumps("Mitrović"))
"Mitrovi\u0107"
The second command adds double quotes to the output and escapes non-ASCII chars, because that's how strings are encoded in JSON. So it's possible that response["response"][2]["player"]["lastname"] contains exactly what you want, but maybe you fooled yourself by wrapping it in json.dumps() before printing.
Note: don't confuse Python string literals and JSON serialisation of strings. They share some common features, but they aren't the same (e.g. JSON strings can't be single-quoted), and they serve different purposes (the first are for writing strings in source code, the second are for encoding data to send across the network).
Another note: You can avoid most of the escaping with ensure_ascii=False in the json.dumps() call:
>>> print(json.dumps("Mitrović", ensure_ascii=False))
"Mitrović"
Count the number of characters in your string and I'll bet you'll notice that the result of json.dumps() is 13 characters (not counting the surrounding double quotes):
"M-i-t-r-o-v-i-\-u-0-1-0-7", or "Mitrovi\\u0107"
When you copy "Mitrovi\u0107" into Python source, you get 8 characters, because '\u0107' is a single Unicode character.
That would suggest the endpoint is not sending properly json-escaped unicode, or somewhere in your doc you're reading it as ascii first. Carefully look at exactly what you're receiving.
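The character counts can be checked directly. This is a small sketch (Python 3 assumed) that uses a literal in place of the API response; it also shows that json.loads() turns the escaped text back into the original 8-character string:

```python
import json

name = "Mitrovi\u0107"     # 8 characters, the last one is ć
dumped = json.dumps(name)  # JSON text: double quotes plus a \u0107 escape

print(len(name))                   # 8
print(len(dumped))                 # 15 (13 plus the two quotes)
print(json.loads(dumped) == name)  # True
```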
I understand that the goal of repr is to be unambiguous, but the behavior of repr really confused me.
repr('"1"')
Out[84]: '\'"1"\''
repr("'1'")
Out[85]: '"\'1\'"'
Based on the above code, I think repr just puts '' around the string.
But when I try this:
repr('1')
Out[82]: "'1'"
repr("1")
Out[83]: "'1'"
repr puts "" around the string, and repr("1") and repr('1') are the same.
Why?
There are three levels of quotes going on here!
The quotes inside the string you're passing (only present in your first example).
The quotes in the string produced by repr. Keep in mind that repr tries to return a string representation that would work as Python code, so if you pass it a string, it will add quotes around the string.
The quotes added by your Python interpreter upon printing the output. These are probably what confuses you. Probably your interpreter is calling repr again, in order to give you an idea of the type of object being returned. Otherwise, the string 1 and the number 1 would look identical.
To get rid of this extra level of quoting, so you can see the exact string produced by repr, use print(repr(...)) instead.
The Python REPL (and IPython in your case) prints out the repr() of the output value, so your input is getting repred twice.
To avoid this, print it out instead.
>>> repr('1') # what you're doing
"'1'"
>>> print(repr('1')) # if you print it out
'1'
>>> print(repr(repr('1'))) # what really happens in the first line
"'1'"
The original (outer) quotes may not be preserved since the object being repred has no idea what they originally were.
From documentation:
repr(object): Return a string containing a printable representation of
an object.
So for strings (and most built-in types) it returns a string that, when given to Python, can be used to recreate that object.
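A quick way to see this round trip for strings is to feed the repr back through ast.literal_eval, which evaluates a Python literal. This is only a sketch for string values, not a claim about every object type:

```python
import ast

samples = ['1', '"1"', "'1'"]

# Each repr is valid Python source that evaluates back to the same string.
round_trips = [ast.literal_eval(repr(s)) == s for s in samples]
print(round_trips)  # [True, True, True]
```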
Your first example:
repr('"1"') # string <"1"> passed as an argument
Out[84]: '\'"1"\'' # to create your string you need to type '"1"'.
# The outer quotes are just interpreter formatting
Your second example:
repr("'1'") # you pass a string <'1'>
Out[85]: '"\'1\'"' # to recreate it you have to type "'1'" or '\'1\'',
# depending on the type of quotes you use (<'> and <"> are the same in Python)
Last,
repr('1') # you pass <1> as a string
Out[82]: "'1'" # to make that string in python you type '1', right?
repr("1") # you pass the same <1> as string
Out[83]: "'1'" # to recreate it you can type either '1' or "1", does not matter. Hence the output.
Both the interpreter and repr choose the surrounding quotes (' or ") depending on the content, to minimize escaping, which is why the output differs.
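The quote choice can be seen by printing the repr of a few strings (print avoids the second level of repr added by the interactive prompt). This sketch reflects the behavior of CPython's str repr: single quotes by default, double quotes when the string contains a single quote but no double quote, and an escaped single quote when it contains both:

```python
plain = repr("1")
has_double = repr('say "hi"')
has_single = repr("it's")
has_both = repr('both \' and "')

print(plain)       # '1'
print(has_double)  # 'say "hi"'
print(has_single)  # "it's"
print(has_both)    # 'both \' and "'
```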
In a dictionary, I have the following value containing equals signs:
{"appVersion":"o0u5jeWA6TwlJacNFnjiTA=="}
To be explicit, I need to replace each = with its Unicode escape '\u003d' (basically the reverse of what json.loads() does). How can I assign that value to a variable without it being stored with two escapes (\\u003d)?
I've tried different approaches, including encode/decode, repr(), unichr(61), etc., and even after a lot of searching I couldn't find anything that does this. Every approach gives me the following final result (or the original value):
'o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d'
Thanks in advance for your attention.
EDIT
When I debug the code, the variable's value is shown with two escapes. The program takes this value and uses it in later steps, extra escape included. I'm using this code to build JSON with json.dumps(), and the result returned is a unicode string with two escapes.
I need to find a way to store the value in the variable with just one escape.
I don't know if it makes a difference, but I'm doing this for a custom Burp plugin that manipulates selected requests.
The extra backslash is not actually added. When the interpreter echoes a string containing \, it uses repr(), which doubles the backslash to show that it is a literal backslash rather than part of an escape such as \t or \n.
Note that in Python 3 a plain '\u003d' literal is just the character =, so the examples below use raw strings (r'...'), which keep the backslash in both Python 2 and Python 3:
>>> t['appVersion'] = t["appVersion"].replace('=', r'\u003d')
>>> t['appVersion']
'o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d'
>>> print(t['appVersion'])
o0u5jeWA6TwlJacNFnjiTA\u003d\u003d
>>> t['appVersion'] == r'o0u5jeWA6TwlJacNFnjiTA\u003d\u003d'
True
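If the end goal is JSON text in which = appears as \u003d, one possible approach (a sketch only, assuming Python 3 and assuming the receiver really wants the escaped form on the wire) is to serialize first and then replace = in the serialized text. Doing the replacement before json.dumps() is what yields \\u003d, because json.dumps() escapes the literal backslash:

```python
import json

payload = {"appVersion": "o0u5jeWA6TwlJacNFnjiTA=="}

# Replacing before dumping: the backslash itself gets escaped by json.dumps().
before = json.dumps({"appVersion": payload["appVersion"].replace("=", r"\u003d")})
print(before)  # {"appVersion": "o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d"}

# Replacing after dumping: the JSON text carries a single-backslash escape.
after = json.dumps(payload).replace("=", r"\u003d")
print(after)   # {"appVersion": "o0u5jeWA6TwlJacNFnjiTA\u003d\u003d"}

# A JSON parser decodes \u003d back to "=".
print(json.loads(after) == payload)  # True
```

The replacement on the serialized text is safe here because = can only occur inside JSON strings, where \u003d is a valid escape.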
I have a dictionary with some strings, and in one of the strings there are two backslashes. I want to replace them with a single backslash.
These are the backslashes: IfNotExist\\u003dtrue
Configurations = {
"javax.jdo.option.ConnectionUserName": "test",
"javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
"javax.jdo.option.ConnectionPassword": "sxxxsasdsasad",
"javax.jdo.option.ConnectionURL": "jdbc:mysql://hive-metastore.cr.eu-west-1.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist\\u003dtrue"
}
print (Configurations)
When I print it, it keeps showing the two backslashes. I know that the way to escape a backslash is to use \. This works in a regular string, but it does not work in a dictionary.
Any ideas?
The problem comes from the encoding.
In fact, \u003d is the Unicode escape for =.
The backslash is escaped by another backslash, which is a good thing.
You may need to:
Replace \u003d with =
Read it as Unicode, in which case you should prefix the string with u; something like u"hi \\u003d" may be OK
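The first option could look something like this. It is a sketch with a shortened, made-up host name (example.invalid), and it assumes the database driver expects a plain = in the URL, which the question does not confirm:

```python
# "\\u003d" in source is a literal backslash followed by u003d.
url = "jdbc:mysql://example.invalid:3306/hive?createDatabaseIfNotExist\\u003dtrue"

fixed = url.replace("\\u003d", "=")
print(fixed)  # jdbc:mysql://example.invalid:3306/hive?createDatabaseIfNotExist=true
```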
Printing the dictionary shows you a representation of the dictionary object. It doesn't necessarily show you a nice representation of everything inside it. To do that you need to do:
for value in Configurations.values():
    print(value)
When you print out your dictionary using
print (Configurations), it will print out the repr() value of the dictionary
You will get
{'javax.jdo.option.ConnectionDriverName': 'org.mariadb.jdbc.Driver', 'javax.jdo.option.ConnectionUserName': 'test', 'javax.jdo.option.ConnectionPassword': 'sxxxsasdsasad', 'javax.jdo.option.ConnectionURL': 'jdbc:mysql://hive-metastore.cr.eu-west-1.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist\\u003dtrue'}
To see the string itself, print the value rather than the whole dictionary:
print (Configurations["javax.jdo.option.ConnectionURL"])
or
print (str(Configurations["javax.jdo.option.ConnectionURL"]))
Note: str() is added here only to be explicit; print() already uses the str() form of its argument.
Then the output will be
jdbc:mysql://hive-metastore.cr.eu-west-1.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist\u003dtrue
For more detail check Python Documentation - Fancier Output Formatting
The str() function is meant to return representations of values which
are fairly human-readable, while repr() is meant to generate
representations which can be read by the interpreter (or will force a
SyntaxError if there is no equivalent syntax).
If you want to represent that string by using a single backslash instead of a double backslash, then you need the str() representation, not the repr(). When you print a dictionary, you always get the repr() of the included strings.
You can print the str() by formatting the dictionary yourself, like so:
print("{" +
      ', '.join("'{key}': '{value}'".format(key=key, value=value)
                for key, value in Configurations.items()) +
      "}")
Depending on how you print your string, Python will show two backslashes where the string actually has only one. This is Python's way of indicating that the backslash is an actual backslash and not part of an escape sequence; the repr of a string shows a newline as '\n', for example.
Try writing the string to a file and then opening the file in an editor.
(On Linux)
>>> with open('/tmp/somefile.txt', 'w') as f:
...     f.write(sometextwithbackslashes)
...
$ vi /tmp/somefile.txt
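The same check can be done without leaving Python. This sketch (Python 3 assumed, using a temporary directory instead of /tmp/somefile.txt, with a sample string standing in for your text) writes the string out, reads the raw bytes back, and counts the backslashes that actually reached the file:

```python
import os
import tempfile

text = "why\\does\\it\\happen?"  # three real backslashes

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "somefile.txt")
    with open(path, "w") as f:
        f.write(text)
    with open(path, "rb") as f:
        raw = f.read()

print(raw.count(b"\\"))  # 3
```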
I come from a C background and started learning Python a few days ago. My question is: what is the end-of-string notation in Python, like \0 in C? Is there anything like that in Python?
There isn't one. Python strings store the length of the string independent from the string contents.
There is nothing like that in Python. A string is simply a string. The following:
test = "Hello, world!"
is simply a string of 13 characters. It's a self-contained object and it knows how many characters it contains, so there is no need for an end-of-string notation.
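A small illustration for readers coming from C: because the length is stored with the object, a NUL character is just another character and does not end the string.

```python
s = "abc\0xyz"  # NUL in the middle

print(len(s))        # 7
print(s[3] == "\0")  # True
print(s[4:])         # xyz
```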
Python's string management is internally a little more complex than that. A string is a sequence type, so from a Python coder's point of view it is more like an array of characters than anything else. (And so it has no terminating character, just a length.)
If you must know: internally, in CPython, the character buffer of a string is null-terminated. But the string object stores a couple of other properties as well (e.g. the hash of the string, for use as a key in dictionaries).
For more detailed info (especially for C coders) see here: http://www.laurentluce.com/posts/python-string-objects-implementation/