This question already has answers here:
compare object to empty tuple with the 'is' operator in Python 2.x
(4 answers)
Closed 6 years ago.
I want to make my code more (memory-)efficient. Right now we have a lot of functions that take an iterable as parameter like:
def foo(para, meter, iterable):
    # ...
    pass
and sometimes we have to provide it an empty list to do its work properly: foo(14,25,[]). The problem is that a new list is constructed each time: it has to be allocated on the heap, and an empty list takes 64 bytes of memory (on my own machine, tested with sys.getsizeof([])), whereas the empty tuple takes a (potentially one-time) 48 bytes.
I was therefore wondering whether the empty tuple is a constant. Since tuples are immutable, one can easily make the tuple with length 0 (so ()) a constant in the program. This would decrease the "construction time" (well there is none since it only would set a reference to the constant) and reduce the amount of memory allocated.
My question is whether there are guarantees, for any popular Python interpreter, that the empty tuple is indeed a constant, such that () requires neither construction time nor additional memory.
Testing it with id(..) seems to support the theory that there is indeed only one zero-tuple:
>>> id(())
140290183798856
>>> a = ()
>>> id(a)
140290183798856
but it could be possible that at runtime the Python interpreter forks the tuple for some reason.
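For instance, here is a sketch of what I would like to do (using the hypothetical foo from above, with () as a default argument; being immutable, it also avoids the mutable-default pitfall):

```python
def foo(para, meter, iterable=()):
    # Iterating over the shared empty tuple is a no-op, exactly like
    # iterating over a fresh empty list, but nothing new is allocated.
    return [para + meter + x for x in iterable]

print(foo(14, 25))         # [] - the shared () default was used
print(foo(14, 25, [1]))    # [40] - callers can still pass any iterable
```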
In CPython, the empty tuple is a singleton. Only one copy is created, ever, then reused whenever you write () or call tuple() on an empty iterable.
The PyTuple_New() function essentially does this:
if (size == 0 && free_list[0]) {
    op = free_list[0];
    Py_INCREF(op);
    // ...
    return (PyObject *) op;
}
So if the requested tuple size is 0 (empty) and free_list[0] exists (the cached empty tuple singleton), just reuse it.
See How is tuple implemented in CPython? for more details on free_list; CPython will also re-use already-created tuple instances up to length 20.
This is an implementation detail. Other implementations (Jython, IronPython, PyPy) do not have to do the same.
Related
This question already has an answer here:
What's with the integer cache maintained by the interpreter?
(1 answer)
Closed 2 years ago.
I came across this phrase:
"Python keeps an array of ints between -5 and 256. When you create
an int in that range, you get a reference to a pre-existing object"
You can verify with this code:
def check(n, d):
    a = b = n
    a -= d
    b -= d
    return a is b
Now, check(500,10) returns False. But check(500,300) returns True. Why would the Python compiler do such a thing? Isn't it a perfect recipe for bugs?
CPython (the reference interpreter) does it because it saves a lot of memory (and a small amount of execution time) to have the most commonly used ints served from a cache. Incrementing a number uses a shared temporary, not a unique value in each place you do the increment. Iterating a bytes or bytearray object can go much faster by directly pulling the cached entries. It's not a language guarantee though, so never write code like this (which relies on it).
It's not a bug factory because:
Relying on object identity tests for ints is a terrible idea in the first place; you should always be using == to compare ints, and
ints are immutable; it's impossible to modify the cached entries without writing intentionally evil ctypes or C extension modules. Normal Python code can't trigger bugs due to this cache.
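A short sketch of the cache boundary (CPython-specific; int() on strings is used so the compiler can't merge equal literal constants):

```python
# CPython caches the ints in [-5, 256]. Build the values with int() so
# the compiler can't fold equal literals into one shared constant.
a, b = int('256'), int('256')
c, d = int('257'), int('257')

print(a is b)   # True: both names refer to the cached 256
print(c is d)   # False on CPython: 257 is outside the cache
print(c == d)   # True: value comparison, which is what you should use
```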
This question already has answers here:
How can two Python objects have same id but 'is' operator returns False?
(2 answers)
Why is the id of a Python class not unique when called quickly?
(6 answers)
Unnamed Python objects have the same id
(2 answers)
Closed 4 years ago.
Note that this question might be (is?) specific to CPython.
Say you have some list, and check copies of the list for identity against each other:
>>> a=list(range(10))
>>> b,c=a[:],a[:]
>>> b is c
False
>>> id(b), id(c)
(3157888272304, 3157888272256)
No great shakes there. But if we do this in a more ephemeral way, things might seem a bit weird at first:
>>> a[:] is a[:]
False # <- two ephemeral copies not the same object (duh)
>>> id(a[:]),id(a[:])
(3157888272544, 3157888272544) # <- but two other ephemerals share the same id..? hmm....
...until we recognize what is probably going on here. I have not confirmed it by looking at the CPython implementation (I can barely read C, so it would be a waste of time, to be honest), but it at least seems obvious that even though two objects have the same id, CPython is smart enough to know that they aren't the same object.
Assuming this is correct, my question is: what criteria is CPython using to determine that the two ephemeral objects are not the same object, given that they have the same id (presumably for efficiency reasons; see below)? Is it perhaps looking at the time each was marked for garbage collection? The time it was created? Or something else...?
My theory on why they have the same id is that, likely, CPython knows an ephemeral copy of the list was already made and is waiting to be garbage collected, and it just efficiently re-uses the same memory location. It would be great if an answer could clarify/confirm this as well.
Two immutable objects sharing the same address would, as you suspect, be indistinguishable from each other.
The thing is that when you do a[:] is a[:], the two objects are not at the same address: in order for the identity operator is to compare both objects, both operands have to exist, so there is still a reference to the object on the left-hand side when the native code for is actually runs.
On the other hand, when you do id(a[:]),id(a[:]) the object inside the parentheses on the first call is left without any references as soon as the id function call is done, and is destroyed, freeing the memory block to be used by the second a[:].
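A small sketch of both situations (the id reuse in the last line is likely but not guaranteed, even on CPython):

```python
a = list(range(10))

# Overlapping lifetimes: both copies are alive, so their ids must differ.
b, c = a[:], a[:]
print(b is c, id(b) == id(c))   # False False

# Non-overlapping lifetimes: the first copy is freed before the second
# is allocated, so CPython will often (not always) recycle the address.
print(id(a[:]) == id(a[:]))     # frequently True on CPython
```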
As far as I was aware, using [], {} or () to instantiate objects returns a new instance of list, dict or tuple respectively: a new object with a new identity.
This was pretty clear to me until I actually tested it and I noticed that () is () actually returns True instead of the expected False:
>>> () is (), [] is [], {} is {}
(True, False, False)
As expected, this behavior also manifests when creating objects with tuple(), list() and dict() respectively:
>>> tuple() is tuple(), list() is list(), dict() is dict()
(True, False, False)
The only relevant piece of information I could find in the docs for tuple() states:
[...] For example, tuple('abc') returns ('a', 'b', 'c') and tuple([1, 2, 3]) returns (1, 2, 3). If no argument is given, the constructor creates a new empty tuple, ().
Suffice it to say, this isn't sufficient for answering my question.
So, why do empty tuples have the same identity whilst others like lists or dictionaries do not?
In short:
Python internally creates a C list of tuple objects whose first element contains the empty tuple. Every time tuple() or () is used, Python will return the existing object contained in the aforementioned C list and not create a new one.
Such mechanism does not exist for dict or list objects which are, on the contrary, recreated from scratch every time.
This is most likely related to the fact that immutable objects (like tuples) cannot be altered and, as such, are guaranteed not to change during execution. This is further solidified when considering that frozenset() is frozenset() returns True; like (), an empty frozenset is considered a singleton in the implementation of CPython. With mutable objects, such guarantees are not in place and, as such, there's no incentive to cache their zero-element instances (i.e. their contents could change while the identity remained the same).
Take note: this isn't something one should depend on, i.e. one shouldn't consider empty tuples to be singletons. No such guarantees are explicitly made in the documentation, so one should assume it is implementation dependent.
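In practice, emptiness should be tested by value or truthiness rather than identity; a minimal sketch (is_empty is an illustrative helper, not part of any library):

```python
def is_empty(seq):
    # Portable emptiness test: empty sequences are falsy by definition,
    # so this works on every Python implementation.
    return not seq

print(is_empty(()), is_empty((1, 2)))   # True False
print(() == ())                         # True everywhere: value equality
```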
How it is done:
In the most common case, the implementation of CPython is compiled with two macros PyTuple_MAXFREELIST and PyTuple_MAXSAVESIZE set to positive integers. The positive value for these macros results in the creation of an array of tuple objects with size PyTuple_MAXSAVESIZE.
When PyTuple_New is called with the parameter size == 0 it makes sure to add a new empty tuple to the list if it doesn't already exist:
if (size == 0) {
    free_list[0] = op;
    ++numfree[0];
    Py_INCREF(op); /* extra INCREF so that this is never freed */
}
Then, if a new empty tuple is requested, the one that is located in the first position of this list is going to get returned instead of a new instance:
if (size == 0 && free_list[0]) {
    op = free_list[0];
    Py_INCREF(op);
    /* rest snipped for brevity.. */
An additional incentive for this optimization is the fact that function calls construct a tuple to hold the positional arguments. This can be seen in the load_args function in ceval.c:
static PyObject *
load_args(PyObject ***pp_stack, int na)
{
    PyObject *args = PyTuple_New(na);
    /* rest snipped for brevity.. */
which is called via do_call in the same file. If the number of arguments na is zero, an empty tuple is going to be returned.
In essence, this is an operation performed frequently, so it makes sense not to reconstruct an empty tuple every single time.
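The reuse can be observed from pure Python (again, CPython behaviour, not a language guarantee):

```python
t = (1, 2, 3)
empty = ()   # binds the (CPython-cached) empty tuple

print(tuple() is empty)     # True: the constructor hands back the singleton
print(tuple([]) is empty)   # True: converting an empty list does too
print(t[3:] is empty)       # True: slicing a tuple down to nothing as well
```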
Further reading:
A couple more answers shed light on CPython's caching behaviour with immutables:
For integers, another answer that digs in the source can be found here.
For strings, a handful of answers can be found here, here and here.
This question already has an answer here:
CPython memory allocation
(1 answer)
Closed 6 years ago.
I was reading this question: Cannot return int array because I ran into the same problem.
It seems that data structures declared locally within a function cannot be returned, in this case an array (even though C can obviously return a locally declared scalar variable).
However Python doesn't suffer from the same problem; as far as I can remember, it's possible to declare an array within a function and to return that array without having to pass it as an argument.
What is the difference "under the hood"? Is Python using pointers implicitly (using malloc within the function)?
For the record, Python's built-in mutable sequence type is called a list, not an array, but it behaves similarly (it's just dynamically resizable, like C++'s std::vector).
In any event, you're correct that all Python objects are implicitly dynamically allocated; only the references (roughly, pointers) to them are on the "stack" (that said, the Python interpreter stack and the C level stack are not the same thing to start with). Comparable C code would dynamically allocate the array and return a pointer to it (with the caller freeing it when done; different Python interpreters handle this differently, but the list would be garbage collected when no longer referenced in one way or another).
Python has no real concept of "stack arrays" (it always returns a single object, though that object could be a tuple to simulate multiple return values), so returns are always ultimately a single "pointer" value (the reference to the returned object).
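A minimal sketch of the Python side (make_squares is a hypothetical example function): the local list is heap-allocated, so returning it is safe:

```python
def make_squares(n):
    # 'result' lives on the heap; returning it just returns a reference,
    # so the list outlives the function's call frame. A C version would
    # malloc the array and return a pointer instead.
    result = []
    for i in range(n):
        result.append(i * i)
    return result

squares = make_squares(4)
print(squares)   # [0, 1, 4, 9]
```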
It seems that data structures declared locally within a function cannot be returned, in this case an array (even though C can obviously return a locally declared scalar variable).
You already have a good Python answer; I wanted to look at the C side a little more closely.
Yes, a C function returns a value. That value may be of a primitive C type, or a struct or union type. Or, it may be of a pointer type.
C's syntax makes arrays and pointers seem very similar, but arrays are special. The name of an array evaluates to the address of its first element; it can't be anything else. In particular, an array name does not refer to the whole array (except as the operand of the sizeof operator). Because any other use of an array name yields the address of the first element, attempting to return an array returns only that address.
Because it's a C function, that address is returned by value: namely, a value of a pointer type. So, when we say,
char *s = strdup("hello");
s is of a pointer type whose value is not "hello", but the address of the first element of the array that strdup allocates.
Python doesn't suffer from the same problem
When Y is a property of X, Y is a problem only if that property is, in the eyes of the beholder, undesirable. You can be sure the way C treats arrays is not accidental, and is often convenient.
This question already has answers here:
Why does id({}) == id({}) and id([]) == id([]) in CPython?
(2 answers)
Closed 7 years ago.
Here is some simple Python code. What's the difference between Case 1 and Case 2: why am I getting False in the first case and True in the second? Why are the ids equal in Case 2? Also, does dir(object) call object.__dir__() internally? If so, shouldn't the return objects/results of the two calls be the same?
class Hello:
    def __init__(self):
        self.a1 = "a1"
hello = Hello()
print(hello)
# Case 1
var1 = dir(hello)
var2 = hello.__dir__()
print(id(var1), id(var2), id(var1) == id(var2))
# Case 2
print(id(dir(hello)), id(hello.__dir__()), id(dir(hello)) == id(hello.__dir__()))
print(dir(hello) == hello.__dir__())
Output
<__main__.Hello object at 0x7f320828c320>
139852862206472 139852862013960 False
139852862014024 139852862014024 True
False
It's just a coincidence that you're ever getting True. (Well, not a coincidence, since the implementation of CPython makes it very likely… but it's not something the language requires.)
In case 1, you have two different lists in var1 and var2. They're both alive at the same time, so they can't have the same id.
In case 2, you again have two different lists, but this time you aren't storing them anywhere; as soon as you call id on one, you release it, which means it can get garbage collected* before you get the other one,** which means it can end up reusing the same id.***
Notice that the docs for id say:
This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
If you actually want to test whether two expressions refer to the same object, use is, don't compare their ids.
Your edited question also asks:
Also, does dir(object) call object.__dir__() internally?
According to dir:
If the object has a method named __dir__(), this method will be called and must return the list of attributes.
And the data model section on __dir__ says:
Called when dir() is called on the object. A sequence must be returned. dir() converts the returned sequence to a list and sorts it.
Then you say:
If so the return object of two calls should be the same.
Well, it depends on what you mean by "the same". It should return an equal value (since nothing has changed), but it's not going to be the identical value, which is what you're trying to test for. (If it isn't obvious why dir gives you a new list each time, it should still be clear that it must do so from the fact that "dir() converts the returned sequence to a list and sorts it"…)
* Because CPython uses reference counting as its primary garbage collection mechanism, "can be collected" generally means "will be collected immediately". This isn't true for most other Python implementations.
** If the order in which parts of your expression get evaluated isn't clear to you from reading the docs, you can try dis.dis('id(dir(hello)) == id(hello.__dir__())') to see the actual bytecodes in order.
*** In CPython, the id is just the address of the PyObject struct that represents the object; if one PyObject gets freed and another one of the same type gets allocated immediately after, it will usually get the same address.
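To make the contrast concrete, a small sketch reusing the Hello class from the question:

```python
class Hello:
    def __init__(self):
        self.a1 = "a1"

hello = Hello()

# Held references overlap in lifetime, so the objects are distinct:
v1, v2 = dir(hello), hello.__dir__()
print(v1 is v2)           # False: two separate list objects

# But the *data* matches once both are sorted, since dir() sorts:
print(sorted(v2) == v1)   # True
```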