Behavior of Python's repr method

I understand that the goal of repr is to be unambiguous, but the behavior of repr really confused me.
repr('"1"')
Out[84]: '\'"1"\''
repr("'1'")
Out[85]: '"\'1\'"'
Based on the above code, I thought repr just puts '' around the string.
But when I try this:
repr('1')
Out[82]: "'1'"
repr("1")
Out[83]: "'1'"
repr puts "" around these strings, and repr("1") and repr('1') give the same result.
Why?

There are three levels of quotes going on here!
The quotes inside the string you're passing (only present in your first example).
The quotes in the string produced by repr. Keep in mind that repr tries to return a string representation that would work as Python code, so if you pass it a string, it will add quotes around the string.
The quotes added by your Python interpreter when it displays the output. These are probably what confuse you. The interactive interpreter calls repr again on the result it echoes, to give you an idea of the type of object being returned. Otherwise, the string 1 and the number 1 would look identical.
To get rid of this extra level of quoting, so you can see the exact string produced by repr, use print(repr(...)) instead.
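A small Python 3 sketch of the three levels, using the string from the first example (the names s and r are only for illustration):

```python
s = '"1"'        # level 1: the string's own text is "1", double quotes included
r = repr(s)      # level 2: repr() wraps it in quotes so it reads as a literal

print(s)         # "1"
print(r)         # '"1"'
print(repr(r))   # level 3, what the prompt echoes for r: '\'"1"\''

# Each level can be undone with eval(), because repr() of a str is a valid literal.
assert eval(r) == s
assert eval(repr(r)) == r
```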

The Python REPL (and IPython, in your case) displays the repr() of each output value, so your input is effectively repr'd twice.
To avoid this, print it instead.
>>> repr('1') # what you're doing
"'1'"
>>> print(repr('1')) # if you print it out
'1'
>>> print(repr(repr('1'))) # what really happens in the first line
"'1'"
The original (outer) quotes you typed may not be preserved, since the string being repr'd has no record of which quotes were used to create it.

From documentation:
repr(object): Return a string containing a printable representation of
an object.
For built-in types such as str, that string is usually a Python expression which, when evaluated, recreates the object.
Your first example:
repr('"1"') # string <"1"> passed as an argument
Out[84]: '\'"1"\'' # to create your string, you would type '"1"'.
# The outer quotes are just interpreter formatting
Your second example:
repr("'1'") # you pass a string <'1'>
Out[85]: '"\'1\'"' # to recreate it you have to type "'1'" or '\'1\'',
# depending on which quotes you use (<'> and <"> are equivalent in Python)
Last,
repr('1') # you pass <1> as a string
Out[82]: "'1'" # to make that string in python you type '1', right?
repr("1") # you pass the same <1> as string
Out[83]: "'1'" # to recreate it you can type either '1' or "1", does not matter. Hence the output.
Both the interpreter and repr choose ' or " as the surrounding quotes depending on the content, to minimize escaping; that's why the output differs.
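To see the quote choice directly, here is a short Python 3 sketch (the sample strings are arbitrary). For str, CPython's repr() uses single quotes unless the text contains a single quote and no double quote, in which case it switches to double quotes; if both kinds appear, it keeps single quotes and escapes the inner ones.

```python
samples = ['1', "1", '"1"', "'1'", 'it\'s "x"']

for s in samples:
    r = repr(s)
    print(r)
    # Whatever quotes repr() picked, the result evaluates back to the same string.
    assert eval(r) == s
```

Run as a script, this prints '1', '1', '"1"', "'1'" and 'it\'s "x"' on separate lines; note that '1' and "1" give identical output, because they are the same string.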

Related

Why do backslashes appear twice in dict? [duplicate]

This question already has an answer here:
Why does printing a tuple (list, dict, etc.) in Python double the backslashes?
When I create a string containing backslashes, they get duplicated:
NOTE: I want to add \ in the request because I want to call a third-party API, and it expects \ in some of its keys.
I have taken reference from the answer to "Why do backslashes appear twice?", but it only works for a string, not for a dict.
mystr = {"str": "why\does\it\happen?"}
print(mystr)
output:
{'str': 'why\\does\\it\\happen?'}
mystr isn't a str, it's a dict. When you print a dict, it prints the repr of each string inside it, rather than the string itself.
>>> mydict = {"str": "why\does\it\happen?"}
>>> print(mydict)
{'str': 'why\\does\\it\\happen?'}
>>> print(repr(mydict['str']))
'why\\does\\it\\happen?'
>>> print(mydict['str'])
why\does\it\happen?
Note that the repr() includes elements other than the string contents:
The quotes around it (indicating that it's a string)
The contents use backslash-escapes to disambiguate the individual characters. This extends to other "special" characters as well; for example, if this were a multiline string, the repr would show linebreaks as \n within a single line of text. Actual backslash characters are always rendered as \\ so that they can be distinguished from backslashes that are part of other escape sequences.
The key thing to understand is that these extra elements are just the way that the dict is rendered when it is printed. The actual contents of the string inside the dict do not have "doubled backslashes", as you can see when you print mydict['str'].
If you are using this dict to call an API, you should not be using str(mydict) or anything similar; if it's a Python API, you should be able to use mydict itself, and if it's a web API, it should be using something like JSON encoding (json.dumps(mydict)).
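A minimal Python 3 sketch of the difference between the string, its repr() and a JSON encoding of the dict. It uses a raw string literal r"..." (an adjustment to the question's code) so that \d, \i and \h are plain backslashes; recent Python 3 versions warn about those invalid escape sequences in a normal literal. Note that json.dumps() doubles the backslashes too, because JSON has the same escaping rule; the receiving side decodes them back to single backslashes.

```python
import json

mydict = {"str": r"why\does\it\happen?"}  # raw literal: each \ is one character
value = mydict["str"]

print(value)               # why\does\it\happen?
print(value.count("\\"))   # 3: the string holds single backslashes
print(mydict)              # {'str': 'why\\does\\it\\happen?'} (repr of the value)

payload = json.dumps(mydict)
print(payload)             # {"str": "why\\does\\it\\happen?"} (JSON escaping)

# Decoding the JSON gives back exactly the original single-backslash string.
assert json.loads(payload) == mydict
```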
I think that to build the printed string of the dict, Python calls the __repr__ method of the objects inside it (for the values) instead of __str__, which is what you might expect when printing.
It would make sense, since a dict can contain any type of object, not just strings, and every object has a __repr__ method (it is defined on the base object class in Python), whereas __str__ by default just falls back to __repr__.
But it's only a guess, not a definitive answer.

Python - How can I convert a special character to the unicode representation?

In a dictionary, I have the following value containing equals signs:
{"appVersion":"o0u5jeWA6TwlJacNFnjiTA=="}
To be explicit, I need to replace each = with the Unicode escape '\u003d' (basically the reverse of json.loads()). How can I store this value in a variable without it ending up with two escapes (\\u003d)?
I've tried different ways, including encode/decode, repr(), unichr(61), etc., and even after searching a lot I couldn't find anything that does this. All of them give me the following final result (or the original value):
'o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d'
Thanks in advance for your attention.
EDIT
When I debug the code, the variable's value shows two escapes. The program takes this value and uses it in the following steps, extra escape included. I'm building a JSON document with json.dumps(), and the result is a unicode string with two escapes.
Printing the final result after building the JSON shows the same thing. I need to find a way to store the value in the variable with just one escape.
I don't know if it makes a difference, but I'm doing this in a custom Burp plugin, manipulating some selected requests.
The extra backslash is not actually added. When the interpreter echoes a string containing \, it shows the repr(), which doubles the backslash to indicate that it's a literal backslash and not part of an escape such as \t or \n:
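I hope this helps (the raw strings r'...' matter in Python 3, where '\u003d' without the r prefix is simply '='):
>>> t = {"appVersion": "o0u5jeWA6TwlJacNFnjiTA=="}
>>> t['appVersion'] = t['appVersion'].replace('=', r'\u003d')  # r'' keeps \u003d as six characters
>>> t['appVersion']
'o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d'
>>> print(t['appVersion'])
o0u5jeWA6TwlJacNFnjiTA\u003d\u003d
>>> t['appVersion'] == r'o0u5jeWA6TwlJacNFnjiTA\u003d\u003d'
True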
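If the real goal is JSON text in which each = is written as the escape \u003d (so that json.loads() turns it back into =), one option, sketched here in Python 3 under that assumption, is to escape after serializing rather than before. This is not something json.dumps() does on its own; it relies on the fact that = can only appear inside string literals in JSON text, so the replacement cannot break the structure.

```python
import json

t = {"appVersion": "o0u5jeWA6TwlJacNFnjiTA=="}

# Escaping before serializing: json.dumps() must then escape the backslash,
# so the JSON text really contains a doubled backslash and decodes to literal text.
too_early = json.dumps({"appVersion": t["appVersion"].replace("=", "\\u003d")})
print(too_early)   # {"appVersion": "o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d"}

# Escaping after serializing: in JSON text '=' only occurs inside string
# literals, so turning it into the escape \u003d keeps the document valid.
payload = json.dumps(t).replace("=", "\\u003d")
print(payload)     # {"appVersion": "o0u5jeWA6TwlJacNFnjiTA\u003d\u003d"}

assert json.loads(payload) == t   # the escape decodes back to '='
```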

Python - How to print one backslash in a string within a dictionary?

I have a dictionary with some strings; one of the strings contains two backslashes. I want to replace them with a single backslash.
These are the backslashes: IfNotExist\\u003dtrue
Configurations = {
    "javax.jdo.option.ConnectionUserName": "test",
    "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
    "javax.jdo.option.ConnectionPassword": "sxxxsasdsasad",
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://hive-metastore.cr.eu-west-1.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist\\u003dtrue"
}
print(Configurations)
When I print it, it keeps showing two backslashes. I know that the way to escape a backslash is to use another \, and this works for a regular string, but it does not work in a dictionary.
Any ideas?
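The problem comes from the escaping.
In fact, \u003d is the Unicode escape for =.
The backslash in your literal is escaped by another backslash, which is a good thing: the string really contains one backslash, and printing the dict just shows it doubled.
You may need to:
Replace \u003d with =, or
Write it as a real Unicode escape with a single backslash, e.g. u"hi \u003d", which Python turns into "hi =" (in Python 3 the u prefix is optional)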
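A Python 3 sketch of turning the stored text \u003d back into =, applied to the URL from the question. The first option is a plain replace; the second uses the standard unicode_escape codec and is only safe for pure ASCII input, which is an assumption about this data.

```python
url = ("jdbc:mysql://hive-metastore.cr.eu-west-1.rds.amazonaws.com:3306/"
       "hive?createDatabaseIfNotExist\\u003dtrue")

# Option 1: replace the literal six-character sequence \u003d with "=".
fixed = url.replace("\\u003d", "=")

# Option 2: let Python interpret every \uXXXX escape in the text.
# Assumption: url is pure ASCII; encode("ascii") would raise otherwise.
decoded = url.encode("ascii").decode("unicode_escape")

print(fixed)   # ...hive?createDatabaseIfNotExist=true
assert fixed == decoded
```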
Printing the dictionary shows you a representation of the dictionary object. It doesn't necessarily show you a nice representation of everything inside it. To do that you need to do:
for value in Configurations.values():
    print(value)
When you print your dictionary using print(Configurations), it prints the repr() of the dictionary.
You will get
{'javax.jdo.option.ConnectionDriverName': 'org.mariadb.jdbc.Driver', 'javax.jdo.option.ConnectionUserName': 'test', 'javax.jdo.option.ConnectionPassword': 'sxxxsasdsasad', 'javax.jdo.option.ConnectionURL': 'jdbc:mysql://hive-metastore.cr.eu-west-1.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist\\u003dtrue'}
Instead, print the value itself with
print (Configurations["javax.jdo.option.ConnectionURL"])
or
print (str(Configurations["javax.jdo.option.ConnectionURL"]))
Note: str() is added
Then the output will be
jdbc:mysql://hive-metastore.cr.eu-west-1.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist\u003dtrue
For more detail check Python Documentation - Fancier Output Formatting
The str() function is meant to return representations of values which
are fairly human-readable, while repr() is meant to generate
representations which can be read by the interpreter (or will force a
SyntaxError if there is no equivalent syntax).
If you want to represent that string by using a single backslash instead of a double backslash, then you need the str() representation, not the repr(). When you print a dictionary, you always get the repr() of the included strings.
You can print the str() by formatting the dictionary yourself, like so:
print("{" +
      ", ".join("'{key}': '{value}'".format(key=key, value=value)
                for key, value in Configurations.items()) +
      "}")
Depending on how you print your string, Python will show two backslashes where the string actually has only one. This is Python's way of indicating that the backslash is an actual backslash and not part of an escape sequence; the repr() of a string shows '\n' for a newline, for example, so a real backslash has to be displayed differently.
Try writing the string to a file and then opening the file in an editor.
(Linux)
>>> f = open('/tmp/somefile.txt', 'w')
>>> f.write(sometextwithbackslashes)
>>> f.close()
(press Ctrl-D to leave the interpreter)
$ vi /tmp/somefile.txt
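A scripted variant of the same check, as a Python 3 sketch: it uses a temporary file instead of /tmp/somefile.txt and reads the raw bytes back instead of opening an editor (the sample string is an assumption).

```python
import os
import tempfile

text = r"createDatabaseIfNotExist\u003dtrue"  # raw literal: one real backslash

# Write the string, then read the raw bytes back from disk.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(text)
    path = f.name
try:
    with open(path, "rb") as f:
        data = f.read()
finally:
    os.remove(path)

print(data)                 # a bytes repr, so the backslash shows doubled again
print(data.count(b"\\"))    # 1: the file holds a single backslash
```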

How to return a string without quotes Python 3

I've got to write a single-input module that can convert decimals to Bukiyip (some ancient language with a counting base of 3 or 4). For the purpose of the assignment, we only need to work with base 3.
I've written some code that does this, but it returns my Bukiyip number with quotes, leaving me with an answer such as '110' for 12.
Please help me understand how to work around this? I'm new to Python and keen to learn so explanations will be really appreciated.
def bukiyip_to_decimal(num):
    convert_to_string = "012"
    if num < 3:
        return convert_to_string[num]  # a single base-3 digit
    else:
        # convert the higher-order digits, then append the lowest one
        return bukiyip_to_decimal(num//3) + convert_to_string[num%3]
I've also tried the following, but get errors.
    else:
        result = bukiyip_to_decimal(num//3) + convert_to_string[num%3]
        print(int(result))
You are either echoing the return value in your interpreter, including the result in a container (such as a list, dictionary, set or tuple), or directly producing the repr() output for your result.
Your function (rightly) returns a string. When echoing in the interpreter or using the repr() function you are given a debugging-friendly representation, which for strings means Python will format the value in a way you can copy and paste right back into Python to reproduce the value. That means that the quotes are included.
Just print the value itself:
>>> result = bukiyip_to_decimal(12)
>>> result
'110'
>>> print(result)
110
or use it in other output:
>>> print('The Bukiyip representation for 12 is {}'.format(result))
The Bukiyip representation for 12 is 110
int() doesn't work? The quotes are not decoration, you see. They are part of the string literal representation. "hello" is a string; it is not hello with quotes. A bare hello is an identifier, a name. So you don't want to strip quotes from a string, which doesn't make any sense. What you want is an int.
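Putting the answers together, a Python 3 sketch: the function keeps returning a str (the name is kept from the question, even though it converts decimal to Bukiyip), and the caller decides how to show it. The earlier attempt with print(int(result)) also has a separate bug: the else branch no longer returns anything, so for inputs of 9 and above the recursive call yields None and None + str raises a TypeError. int(result, 3) is included only as a check that the digits really are base 3.

```python
def bukiyip_to_decimal(num):
    """Return the base-3 (Bukiyip) digits of a non-negative int as a str."""
    digits = "012"
    if num < 3:
        return digits[num]
    # Recurse on the higher-order digits, then append the lowest digit.
    return bukiyip_to_decimal(num // 3) + digits[num % 3]


result = bukiyip_to_decimal(12)
print(result)            # 110   (print shows the text, no quotes)
print(repr(result))      # '110' (what the interactive prompt echoes)
print(int(result, 3))    # 12    (parse the digits back as base 3)
```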

How to print tuples of unicode strings in original language (not u'foo' form)

I have a list of tuples of unicode objects:
>>> t = [('亀',), ('犬',)]
Printing this out, I get:
>>> print t
[('\xe4\xba\x80',), ('\xe7\x8a\xac',)]
which I guess is a list of the UTF-8 byte representations of those strings?
but what I want to see printed out is, surprise:
[('亀',), ('犬',)]
but I'm having an inordinate amount of trouble getting the bytes back into a human-readable form.
What do you want to see it printed out on? Because if it's the console, it's not at all guaranteed your console can display those characters. This is why Python's ‘repr()’ representation of objects goes for the safe option of \-escapes, which you will always be able to see on-screen and type in easily.
As a prerequisite you should be using Unicode strings (u''). And, as mentioned by Matthew, if you want to be able to write u'亀' directly in source you need to make sure Python can read the file's encoding. For occasional use of non-ASCII characters it is best to stick with the escaped version u'\u4e80', but when you have a lot of East Asian text you want to be able to read, “# coding=utf-8” is definitely the way to go.
print '[%s]' % ', '.join([', '.join('(%s,)' % ', '.join(ti) for ti in t)])
That would print the characters unwrapped by quotes. Really you'd want:
def reprunicode(u):
    return repr(u).decode('raw_unicode_escape')
print u'[%s]' % u', '.join([u'(%s,)' % reprunicode(ti[0]) for ti in t])
This would work, but if the console didn't support Unicode (and this is especially troublesome on Windows), you'll get a big old UnicodeError.
In any case, this rarely matters because the repr() of an object, which is what you're seeing here, doesn't usually make it to the public user interface of an application; it's really for the coder only.
However, you'll be pleased to know that Python 3.0 behaves exactly as you want:
plain '' strings without the ‘u’ prefix are now Unicode strings
repr() shows most Unicode characters verbatim
Unicode in the Windows console is better supported (you can still get UnicodeError on Unix if your environment isn't UTF-8)
Python 3.0 is a little bit new and not so well-supported by libraries, but it might well suit your needs better.
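For comparison, a minimal Python 3 sketch (it assumes a console that can display these characters): repr() of a list now keeps printable non-ASCII characters as they are, and the built-in ascii() gives back the escaped form if you need it.

```python
t = [('亀',), ('犬',)]   # in Python 3 these are already Unicode str objects

print(t)          # [('亀',), ('犬',)]
print(ascii(t))   # [('\u4e80',), ('\u72ac',)]
```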
First, there's a slight misunderstanding in your post. If you define a list like this:
>>> t = [('亀',), ('犬',)]
...those are not unicodes you define, but strs. If you want to have unicode types, you have to add a u before the character:
>>> t = [(u'亀',), (u'犬',)]
But let's assume you actually want strs, not unicodes. The main problem is that the __str__ method of a list (or a tuple) is practically equal to its __repr__ method (which returns a string that, when evaluated, would create exactly the same object). Because the __repr__ method should be encoding-independent, strings are represented in the safest way possible, i.e. each character outside the ASCII range is shown as a hex escape (\xe4, for example).
Unfortunately, as far as I know, there's no library method for printing a list that is locale-aware. You could use an almost-general-purpose function like this:
def collection_str(collection):
    if isinstance(collection, list):
        brackets = '[%s]'
        single_add = ''
    elif isinstance(collection, tuple):
        brackets = '(%s)'
        single_add = ','
    else:
        return str(collection)
    items = ', '.join([collection_str(x) for x in collection])
    if len(collection) == 1:
        items += single_add
    return brackets % items
>>> print collection_str(t)
[(亀,), (犬,)]
Note that this won't work for all possible collections (sets and dictionaries, for example), but it's easy to extend it to handle those.
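A Python 3 sketch of that extension (a hypothetical rewrite, not the answerer's code): it adds dict and set handling and otherwise keeps the original idea of using str() for the leaf items. Set ordering is arbitrary, so only the list, tuple and dict cases print in a predictable order.

```python
def collection_str(collection):
    """Render lists, tuples, sets and dicts like repr(), but with str() leaves."""
    if isinstance(collection, dict):
        items = ", ".join(
            f"{collection_str(k)}: {collection_str(v)}" for k, v in collection.items()
        )
        return "{" + items + "}"
    if isinstance(collection, (list, tuple, set)):
        items = ", ".join(collection_str(x) for x in collection)
        if isinstance(collection, tuple):
            # A one-element tuple keeps its trailing comma, as in repr().
            return "(" + items + ("," if len(collection) == 1 else "") + ")"
        if isinstance(collection, set):
            return "{" + items + "}" if collection else "set()"
        return "[" + items + "]"
    return str(collection)


print(collection_str([('亀',), ('犬',)]))         # [(亀,), (犬,)]
print(collection_str({"animals": ['亀', '犬']}))  # {animals: [亀, 犬]}
```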
In Python 2, source code files are ASCII by default, so you must use the \u escape sequences unless you declare an encoding. See PEP 263.
#!/usr/bin/python
# coding=utf-8
t = [u'亀', u'犬']
print t
When you pass a list to print, Python converts the object into a string using Python's rules for string conversions. The output of such conversions is designed for eval(), which is why you see those \u sequences. Here's a hack to get around that, based on bobince's solution. The console must accept Unicode or this will throw an exception.
t = [(u'亀',), (u'犬',)]
print repr(t).decode('raw_unicode_escape')
So this appears to do what I want:
print '[%s]' % ', '.join([', '.join('(%s,)' % ', '.join(ti) for ti in t)])
>>> t = [('亀',), ('犬',)]
>>> print t
[('\xe4\xba\x80',), ('\xe7\x8a\xac',)]
>>> print '[%s]' % ', '.join([', '.join('(%s,)' % ', '.join(ti) for ti in t)])
[(亀,), (犬,)]
Surely there's a better way to do it.
(but the other two answers so far don't result in the original string being printed as desired).
Try:
import codecs, sys
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
It seems people are missing what the asker wants here. When I print unicode from a tuple, I just want to get rid of the 'u', '[', '(' and the quotes. What we want is a function like the one below.
After scouring the Net, this seems to be the cleanest way to get atomic displayable data.
If the data is not in a tuple or list, I don't think this problem exists.
import re

def Plain(self, U_String):
    P_String = str(U_String)
    m = re.search(r"^\(u?'(.*)',\)$", P_String)
    if m:  # typical one-element tuple holding a unicode string
        P_String = m.group(1).decode("utf8")
    return P_String
