I've been trying to learn how CPython is implemented behind the scenes. It's great that Python is high level, but I don't like treating it like a black box.
With that in mind, how are tuples implemented? I've had a look at the source (tupleobject.c), but it's going over my head.
I see that PyTuple_MAXSAVESIZE = 20 and PyTuple_MAXFREELIST = 2000, what is saving and the "free list"? (Will there be a performance difference between tuples of length 20/21 or 2000/2001? What enforces the maximum tuple length?)
As a caveat, everything in this answer is based on what I've gleaned from looking over the implementation you linked.
It seems that the standard implementation of a tuple is simply as an array. However, there are a bunch of optimizations in place to speed things up.
First, if you try to make an empty tuple, CPython instead will hand back a canonical object representing the empty tuple. As a result, it can save on a bunch of allocations that are just allocating a single object.
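A quick way to see the shared empty tuple from Python itself. This is only a minimal sketch: object identity here is a CPython implementation detail rather than a language guarantee, and the variable names are mine.

```python
# Build two empty tuples at runtime, in two different ways.
a = tuple()
b = tuple([])

# In CPython both names refer to the same canonical empty tuple object,
# so no new allocation happened for the second one.
print(a is b)
print(id(a) == id(b))
```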
Next, to avoid allocating a bunch of small objects, CPython recycles memory for many small tuples. There is a fixed constant (PyTuple_MAXSAVESIZE) such that all tuples shorter than this length are eligible to have their space reclaimed. Whenever a tuple shorter than this constant is deallocated, there is a chance that the memory associated with it will not be freed and will instead be stored in a "free list" (more on that in the next paragraph) based on its size. That way, if you ever need to allocate a tuple of size n and one has previously been allocated and is no longer in use, CPython can just recycle the old array.
The free list itself is implemented as an array of size PyTuple_MAXSAVESIZE storing pointers to unused tuples, where the nth element of the array points either to NULL (if no extra tuples of size n are available) or to a reclaimed tuple of size n. If there are multiple different tuples of size n that could be reused, they are chained together in a sort of linked list by having each tuple's zeroth entry point to the next tuple that can be reused. (Since there is only one tuple of length zero ever allocated, there is never a risk of reading a nonexistent zeroth element). In this way, the allocator can store some number of tuples of each size for reuse. To ensure that this doesn't use too much memory, there is a second constant PyTuple_MAXFREELIST that controls the maximum length of any of these linked lists within any bucket. There is then a secondary array of length PyTuple_MAXSAVESIZE that stores the length of the linked lists for tuples of each given length so that this upper limit isn't exceeded.
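To make the bucket-and-chain idea concrete, here is a toy model in pure Python. It is not CPython's code: the class ToyTupleAllocator and its method names are hypothetical, Python lists stand in for raw tuple memory, and only the two constants mirror the names from tupleobject.c.

```python
MAXSAVESIZE = 20     # mirrors PyTuple_MAXSAVESIZE
MAXFREELIST = 2000   # mirrors PyTuple_MAXFREELIST


class ToyTupleAllocator:
    """Toy model of a per-size free list (hypothetical, for illustration)."""

    def __init__(self):
        # free_list[n] is the head of a chain of reusable blocks of size n.
        self.free_list = [None] * MAXSAVESIZE
        # numfree[n] is the current length of that chain.
        self.numfree = [0] * MAXSAVESIZE
        self.fresh_allocations = 0

    def new(self, size):
        if 0 < size < MAXSAVESIZE and self.free_list[size] is not None:
            # Pop the head of the chain; slot 0 holds the link to the next block.
            block = self.free_list[size]
            self.free_list[size] = block[0]
            self.numfree[size] -= 1
            block[0] = None
            return block
        # Nothing to recycle: stand-in for a real memory allocation.
        self.fresh_allocations += 1
        return [None] * size

    def dealloc(self, block):
        size = len(block)
        if 0 < size < MAXSAVESIZE and self.numfree[size] < MAXFREELIST:
            # Clear the contents, then push the block onto the chain,
            # reusing slot 0 as the "next" pointer.
            for i in range(size):
                block[i] = None
            block[0] = self.free_list[size]
            self.free_list[size] = block
            self.numfree[size] += 1
        # Otherwise the block is simply dropped (memory would be freed).


alloc = ToyTupleAllocator()

small = alloc.new(3)
alloc.dealloc(small)
recycled = alloc.new(3)          # comes back from the free list

large = alloc.new(20)            # size 20 is not below MAXSAVESIZE
alloc.dealloc(large)
not_recycled = alloc.new(20)     # has to be allocated again

print(recycled is small, not_recycled is large, alloc.fresh_allocations)
```

In this model the size-3 block is handed back on the second request, while the size-20 block is never cached, so three fresh allocations happen in total.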
All in all, it's a very clever implementation!
Because in the course of normal operations Python will create and destroy a lot of small tuples, Python keeps an internal cache of small tuples for that purpose. This helps cut down on a lot of memory allocation and deallocation churn. For the same reasons small integers from -5 to 256 are interned (made into singletons).
The PyTuple_MAXSAVESIZE definition controls the maximum size of tuples that qualify for this optimization, and the PyTuple_MAXFREELIST definition controls how many of these tuples Python keeps around in memory. When a tuple of length < PyTuple_MAXSAVESIZE is discarded, it is added to the free list if there is still room for one (in tupledealloc), to be re-used when Python creates a new small tuple (in PyTuple_New).
Python is being a little clever about how it stores these; for each tuple length > 0, it'll reuse the first element of each cached tuple to chain up to PyTuple_MAXFREELIST tuples together into a linked list. So each element in the free_list array is a linked list of Python tuple objects, and all tuples in such a linked list are of the same size. The only exception is the empty tuple (length 0); only one of these is ever needed, so it is a singleton.
So, yes, for tuples of length PyTuple_MAXSAVESIZE or more, Python is guaranteed to have to allocate memory separately for a new C structure, and that could affect performance if you create and discard such tuples a lot.
If you want to understand Python C internals, I do recommend you study the Python C API; it'll make it easier to understand the various structures Python uses to define objects, functions and methods in C.
I have read the answer to this question as well as the related questions about different objects sharing the same id (which is explained by the Python docs on id). However, in those questions, I notice that the contents of the objects are the same (thus the memory sizes are the same, too). I experimented with lists of different sizes and contents in both the IPython shell and a .py file with CPython, and got the "same id" result, too:
print(id([1]), id([1,2,3]), id([1,2,3,4,1,1,1,1,1,1,1,1,1,1,1,1]))
# Result: 2067494928320 2067494928320 2067494928320
The result doesn't change no matter how many elements I add to the list or how big or small the numbers are.
So I have a question here: when an id is given, does the list size have any effect on whether the id can be reused or not? I thought that it could because according to the docs above,
CPython implementation detail: This is the address of the object in memory.
and if the address does not have enough space for the list, then a new id should be given. But I'm quite surprised about the result above.
Make a list and add some items to it; the id remains the same:
In [21]: alist = []
In [22]: id(alist)
Out[22]: 139981602327808
In [23]: for i in range(29): alist.append(i)
In [24]: id(alist)
Out[24]: 139981602327808
But the memory use for this list occurs in several parts. There's some sort of storage for the list instance itself (that's what the id references). Python is written in C, but all items are objects (as in C++).
The list also has a data buffer; think of it as a C array with a fixed size. It holds pointers to objects elsewhere in memory. That buffer has space for the current references plus some sort of growth space. As you add items to the list, their references are inserted in the growth space. When that fills up, the list gets a new buffer, with more growth space. List append is relatively fast, with periodic slowdowns as it copies references to the new buffer. All that occurs under the covers so that the Python programmer doesn't notice.
I suspect that in my example alist got a new buffer several times, but I don't think there's any way to track or measure that.
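For what it's worth, in CPython sys.getsizeof on a list includes the space reserved for its pointer buffer, so the reported size jumps whenever the buffer is enlarged. A rough sketch, assuming CPython (other implementations may not support getsizeof at all); the names list_id and sizes are mine:

```python
import sys

alist = []
list_id = id(alist)

# Record the reported size each time it changes while appending.
sizes = [sys.getsizeof(alist)]
for i in range(29):
    alist.append(i)
    size = sys.getsizeof(alist)
    if size != sizes[-1]:
        sizes.append(size)

# The list object itself never moves, so its id is stable...
print(id(alist) == list_id)
# ...while the reported size grows in a handful of steps, not 29 of them.
print(sizes)
```

The exact numbers depend on the Python version and platform, so I wouldn't rely on them; the point is only that the size changes in steps while the id stays put.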
Storage for the objects referenced by the list is another matter. CPython creates small integer objects (up to 256) at the start, so my list (and yours) will have references to those unique integer objects. It also maintains some sort of cache of strings. But other things, such as larger numbers, other lists, dicts, and custom class objects, are created as needed. Identically valued objects might well have different ids.
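A small sketch of the difference between equal values and identical objects. The list part is guaranteed by the language (two objects alive at the same time never share an id); the small-integer part is a CPython caching detail and shouldn't be relied on.

```python
# Two lists with the same contents, both alive at the same time.
a = [1, 2, 3]
b = [1, 2, 3]
print(a == b)            # equal contents
print(a is b)            # but two separate objects
print(id(a) == id(b))    # so their ids differ while both exist

# Small integers built at runtime; CPython normally hands back
# the same cached object for values in its small-int range.
x = int("100")
y = int("100")
print(x is y)
```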
So while the data buffer of the list is contiguous, the items referenced in the buffer are not.
By and large, that memory management is unimportant to programmers. It's only when data structures get very large that we need to worry. And that seems to occur more with object classes like numpy arrays, which have a very different memory use.
In the chapter on Arrays in the book Elements of Programming Interviews in Python, it is mentioned that "Filling an array from the front is slow, so see if it's possible to write values from the back."
What could be the possible reason for that?
Python lists, at least in CPython, the standard Python implementation, are actually implemented from a data structure perspective as arrays, not lists.
However, these are dynamically allocated and resized, so appending to the end of a Python-list is actually possible. It takes a somewhat variable amount of time to do so: CPython tries to allocate additional space when items are being appended beyond what is actually necessary, such that it doesn't need to allocate more space for every append operation. At best, appending, if space has already been allocated, is O(1), and since it is an array, indexing is also O(1).
What will take a long time, however, is adding something to the beginning of a list, as this would require shifting all the array values, and is O(n), just as popping the first element is.
Python language designers have decided to call these arrays lists instead of arrays, contradicting standard terminology, in part, I assume, because the dynamic resizing makes them different from standard, fixed-size arrays.
Unless I'm mistaken, collections.deque implements a doubly-linked list, with the corresponding O(1) appends/pops on either side, and so on.
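To illustrate the three ways of building the same sequence discussed above, here is a minimal sketch. The sample values are arbitrary, and the complexity notes in the comments restate the reasoning above rather than measure anything.

```python
from collections import deque

values = [3, 1, 4, 1, 5]

# Filling from the front: each insert at index 0 shifts every
# existing element one slot to the right, so it costs O(n) per step.
front = []
for v in values:
    front.insert(0, v)

# Filling from the back: append is amortized O(1); one reverse at
# the end gives the same order as the front-filled list.
back = []
for v in values:
    back.append(v)
back.reverse()

# deque supports cheap insertion at the left end directly.
dq = deque()
for v in values:
    dq.appendleft(v)

print(front, back, list(dq))
```

All three end up holding the same items in the same order; only the cost of getting there differs.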
I have a quite large list (>1K elements) of objects of the same type in my Python program. The list is never modified: no elements are added, removed or changed. Is there any downside to putting the objects into a tuple instead of a list?
On the one hand, tuples are immutable, so that matches my requirements. On the other hand, using such a large tuple just feels wrong. In my mind, tuples have always been for small collections. It's a double, a triple, a quadruple... Not a two-thousand-and-fifty-seven-tuple.
Is my fear of large tuples somehow justified? Is it bad for performance, unpythonic, or otherwise bad practice?
In CPython, go ahead. Under the covers, the only real difference between the storage of lists and tuples is that the C-level array holding the tuple elements is allocated in the tuple object, while a list object contains a pointer to a C-level array holding the list elements, which is allocated separately from the list object. The list implementation needs to do that because the list may grow, and so the memory containing the C-level vector may need to change its base address. A tuple can't change size, so the memory for it is allocated directly in the tuple object.
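One visible consequence of that layout in CPython is that a tuple reports a slightly smaller size than a list with the same elements. A minimal sketch, assuming CPython's sys.getsizeof; the exact byte counts vary by version and platform, so none are shown here.

```python
import sys

items = list(range(1000))
as_tuple = tuple(items)

# getsizeof counts the container itself (header plus pointer storage),
# not the integer objects it refers to.
list_size = sys.getsizeof(items)
tuple_size = sys.getsizeof(as_tuple)

print(list_size, tuple_size, tuple_size < list_size)
```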
I've created tuples with millions of elements, and yet I lived to type about it ;-)
Obscure
In CPython, there can even be "a reason" to prefer giant tuples: the cyclic garbage collection scheme exempts a tuple from periodic scanning if it only contains immutable objects. Then the tuple can never be part of a cycle, so cyclic gc can ignore it. The same optimization cannot be used for lists; just because a list contains only immutable objects during one run of cyclic gc says nothing about whether that will still be the case during the next run.
This is almost never highly significant, but it can save a percent or so in a long-running program, and the benefit of exempting giant tuples grows the bigger they are.
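If you want to poke at this yourself, gc.is_tracked reports whether the cyclic collector is currently tracking an object. A sketch, assuming CPython; whether and when the tuple gets untracked is an implementation detail that may differ between versions, so I'm not showing expected output for it.

```python
import gc

big_tuple = tuple(range(100_000))   # contains only ints
big_list = list(range(100_000))

# Give the collector a chance to examine the tuple.
gc.collect()

tuple_tracked = gc.is_tracked(big_tuple)
list_tracked = gc.is_tracked(big_list)   # lists are always tracked

print(tuple_tracked, list_tracked)
```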
Yes, it is OK.
However, depending on the operations you're doing, you might want to consider using the set function in Python. This will convert your input iterable (tuple, list, or other) to a set. Sets are nice for a few reasons, but especially because you get a collection of unique items with constant-time membership tests on average. Note that the items must be hashable and that a set does not preserve order.
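A minimal sketch of that conversion, using made-up sample data:

```python
records = ("apple", "pear", "apple", "plum")

# Converting to a set drops duplicates and allows fast membership tests.
unique = set(records)

print("pear" in unique)
print(len(records), len(unique))
```

If the collection should stay immutable like the original tuple, frozenset works the same way.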
There's nothing "un-pythonic" about holding large data sets in memory, though.
If I am using CPython or Jython (Python 2.7) and keep adding new elements to a list ([]), will there be a memory reallocation issue like the one Java's ArrayList has? (Since a Java ArrayList requires contiguous memory, when the pre-allocated space is full it needs to allocate a new, larger contiguous block and move the existing elements into it.)
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/ArrayList.java#ArrayList.ensureCapacity%28int%29
regards,
Lin
The basic story, at least for the main Python, is that a list contains pointers to objects elsewhere in memory. The list is created with a certain free space (eg. for 8 pointers). When that fills up, it allocates more memory, and so on. Whether it moves the pointers from one block of memory to another, is a detail that most users ignore. In practice we just append/extend a list as needed and don't worry about memory use.
Why does creating a list from a list make it larger?
I assume jython uses the same approach, but you'd have to dig into its code to see how that translates to Java.
I mostly answer numpy questions. This is a numerical package that creates fixed sized multidimensional arrays. If a user needs to build such an array incrementally, we often recommend that they start with a list and append values. At the end they create the array. Appending to a list is much cheaper than rebuilding an array multiple times.
Internally, Python lists are arrays of pointers, as mentioned by hpaulj.
The next question, then, is how you can extend an array in C, as explained in the answer, which says this can be done using the realloc function in C.
This led me to look into the behavior of realloc, whose documentation mentions:
The function may move the memory block to a new location (whose address is returned by the function).
From this, my understanding is that the array is extended in place if contiguous memory is available; otherwise the memory block (containing the array, not the list object) is copied to a newly allocated, larger memory block.
This is my understanding; corrections are welcome if I am wrong.
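To picture the "grow, and copy if needed" behaviour, here is a toy growable array in pure Python. It is only a model: the class ToyDynamicArray is hypothetical, the doubling growth factor is my assumption (CPython's real growth pattern is different), and it always copies, whereas realloc may be able to extend the block in place.

```python
class ToyDynamicArray:
    """Toy model of an array of references that grows on demand."""

    def __init__(self):
        self._capacity = 4                      # assumed starting capacity
        self._buffer = [None] * self._capacity
        self._length = 0
        self.reallocations = 0

    def append(self, item):
        if self._length == self._capacity:
            # Buffer is full: get a bigger one and copy the references over.
            self._capacity *= 2                 # assumed growth factor
            new_buffer = [None] * self._capacity
            new_buffer[:self._length] = self._buffer[:self._length]
            self._buffer = new_buffer
            self.reallocations += 1
        self._buffer[self._length] = item
        self._length += 1

    def to_list(self):
        return self._buffer[:self._length]


arr = ToyDynamicArray()
for i in range(100):
    arr.append(i)

print(arr.reallocations)
```

With these assumptions, 100 appends trigger only a few buffer replacements, which is why appending stays cheap on average.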