This question already has answers here:
memory location in unicode strings
(2 answers)
Closed 8 years ago.
Why do Unicode string literals show different ids? I was expecting the same behavior as for plain string literals.
>>> p = 'abcd'
>>> q = 'abcd'
>>> id(p) == id(q)
True
>>> p = u'abcd'
>>> q = u'abcd'
>>> id(p) == id(q)
False
Please provide some pointers on this.
For the same reason two dicts with the same contents would have different ids: they are distinct objects. I suspect that the non-Unicode string literals being the same object is something of an optimization.
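The dict comparison can be checked directly. This is a minimal sketch; the names d1 and d2 are only for illustration and do not come from the question.

```python
# Two dicts built separately: equal contents, but distinct objects.
d1 = {"a": 1}
d2 = {"a": 1}

same_contents = d1 == d2   # compares keys and values
same_object = d1 is d2     # compares identity, i.e. id(d1) == id(d2)

print(same_contents, same_object)
```

Equality looks at the contents, while id() and is look at which object you have. For strings the interpreter is free to reuse one object for equal values, so identity can go either way.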
Related
This question already has answers here:
What are the rules for cpython's string interning?
(2 answers)
Python string interning
(2 answers)
Why and where python interned strings when executing `a = 'python'` while the source code does not show that?
(1 answer)
Closed last year.
When you assign the same string literal to two variables, Python allocates only one string. This is reasonable, since strings are immutable objects in Python.
>>> a = "Hello"
>>> b = "Hello"
>>> id(a)
4311984752
>>> id(b)
4311984752
>>> a is b
True
But the strange part is this: when the string contains a special character (like !), Python allocates two strings with exactly the same content.
>>> a = "hi!"
>>> b = "hi!"
>>> id(a)
4328663024
>>> id(b)
4317237616
>>> a is b
False
I read about this strange behaviour from here: https://python-course.eu/python-tutorial/data-types-and-variables.php
But that guide didn't explain why Python does this seemingly unnecessary duplicate allocation.
My question is: what is the rationale behind Python allocating duplicate strings when the string contains a special character?
This question already has answers here:
'is' operator behaves differently when comparing strings with spaces
(6 answers)
About the changing id of an immutable string
(5 answers)
Closed 8 years ago.
>>> s1 = "spam"
>>> s2 = "spam"
>>> s1 is s2
True
>>> q = 'asdalksdjfla;ksdjf;laksdjfals;kdfjasl;fjasdf'
>>> r = 'asdalksdjfla;ksdjf;laksdjfals;kdfjasl;fjasdf'
>>> q is r
False
How many characters must a string have before s1 is s2 gives False? Where is the limit? That is, how long does a string have to be before Python starts making separate copies of it?
String interning is implementation-specific and shouldn't be relied upon. Use equality testing (==) if you want to check whether two strings have the same content.
If you want, for some bizarre reason, to force the comparison to be true then use the intern function:
>>> a = intern('12345678012345678901234567890qazwsxedcrfvtgbyhnujmikolp')
>>> b = intern('12345678012345678901234567890qazwsxedcrfvtgbyhnujmikolp')
>>> a is b
True
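The session above is Python 2, where intern is a builtin. In Python 3 the same function lives in the sys module as sys.intern. A minimal sketch of the same idea, assuming Python 3; the strings are built at runtime on purpose so the compiler has no chance to merge them as constants:

```python
import sys

# Build the strings at runtime so they are not shared compile-time constants.
a = "".join(["hi", "!"])
b = "".join(["hi", "!"])

# sys.intern returns the canonical copy held in the interpreter's intern table.
ia = sys.intern(a)
ib = sys.intern(b)

print(a == b)    # equal by value
print(ia is ib)  # both names refer to the single interned copy
```

Even here, == remains the right way to compare string contents; interning only matters as an optimization.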
Here is a piece of comment about interned string from CPython 2.5.0 source file (stringobject.h)
/* ... ... This is generally restricted to strings that **"look like" Python identifiers**, although the intern() builtin can be used to force interning of any string ... ... */
Accordingly, strings that contain only underscores, digits, or letters will be interned. In your example, q and r contain ;, so they are not interned.
This question already has answers here:
'is' operator behaves differently when comparing strings with spaces
(6 answers)
Closed 8 months ago.
>>> a = "zzzzqqqqasdfasdf1234"
>>> b = "zzzzqqqqasdfasdf1234"
>>> id(a)
4402117560
>>> id(b)
4402117560
but
>>> c = "!##$"
>>> d = "!##$"
>>> id(c) == id(d)
False
>>> id(a) == id(b)
True
Why do I get the same id() result only for some strings?
Edit: I replaced "ascii string" with just "string". Thanks for the feedback.
It's not about ASCII vs. non-ASCII (your "non-ASCII" is still ASCII, it's just punctuation, not alphanumeric). CPython, as an implementation detail, interns string constants that contain only "name characters". "Name characters" in this case means the same thing as the regex escape \w: Alphanumeric, plus underscore.
Note: This can change at any time, and should never be relied on, it's just an optimization they happen to use.
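To make the "name characters" rule concrete, here is a rough sketch of such a check. The helper looks_like_name is hypothetical and only approximates the rule described above with an ASCII-only \w match; it is not CPython's actual code, and whether a given string really ends up interned can depend on other factors and on the interpreter version.

```python
import re

# Hypothetical helper: approximates the "only name characters" rule.
# re.ASCII restricts \w to letters, digits, and underscore (an assumption).
_NAME_CHARS = re.compile(r"\w*", re.ASCII)

def looks_like_name(s):
    """Return True if s consists solely of ASCII letters, digits, or '_'."""
    return _NAME_CHARS.fullmatch(s) is not None

for candidate in ("zzzzqqqqasdfasdf1234", "!##$", "hi!"):
    print(repr(candidate), looks_like_name(candidate))
```

Under this approximation the first string from the question qualifies, while "!##$" and "hi!" do not, which matches the id() results reported above.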
At a guess, this choice was made to optimize code that uses getattr and setattr, dicts keyed by a handful of string literals, and so on. With interning, the dictionary lookups involved often end up doing pointer comparisons and avoid comparing the string contents at all: when two strings are both interned, they are by definition either the same object or not equal, so their data never has to be read.
This question already has answers here:
Why does comparing strings using either '==' or 'is' sometimes produce a different result?
(15 answers)
Closed 8 years ago.
I have some JSON code that works like this:
>>> j = '{"a":5}'
>>> js = json.loads(j)
>>> for key in js:
... key
... type(key)
... key is unicode('a')
...
u'a'
<type 'unicode'>
False
In my opinion the last value of the output should be True. Please help me find my mistake.
Why are you using is here? You just want to compare:
key == unicode('a')
or even better:
key == u'a'
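The same loop with an equality test, as a minimal Python 3 sketch (the question uses Python 2, where the keys are unicode objects; in Python 3 they are plain str). The variable matches is only for illustration:

```python
import json

js = json.loads('{"a": 5}')

for key in js:
    # Compare by value; whether key happens to be the same object as
    # some other 'a' string is an implementation detail.
    matches = key == "a"
    print(key, type(key).__name__, matches)
```

Strings produced by a parser at runtime are not guaranteed to be the same objects as literals in your source, which is why the identity test in the question can come back False.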