Python 2 vs Python 3 - Size of an integer [duplicate]

I want to check the size of int data type in python:
import sys
sys.getsizeof(int)
It comes out to be "436", which doesn't make sense to me.
Anyway, I want to know how many bytes (2,4,..?) int will take on my machine.

The short answer
You're getting the size of the class, not of an instance of the class. Call int to get the size of an instance:
>>> sys.getsizeof(int())
24
If that size still seems a little bit large, remember that a Python int is very different from an int in (for example) C. In Python, an int is a fully-fledged object. This means there's extra overhead.
Every Python object contains at least a refcount and a reference to the object's type in addition to other storage; on a 64-bit machine, that takes up 16 bytes! The int internals (as determined by the standard CPython implementation) have also changed over time, so that the amount of additional storage taken depends on your version.
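To make the class-versus-instance distinction concrete, here is a minimal sketch that measures both. The variable names are only illustrative, and the exact numbers depend on your CPython version and platform, so none are shown here.

```python
import sys

# Size of the type object itself (what sys.getsizeof(int) measures).
class_size = sys.getsizeof(int)

# Size of one instance: int() returns the integer 0.
instance_size = sys.getsizeof(int())

print("type object:", class_size, "bytes")
print("instance 0: ", instance_size, "bytes")
```

The type object is much larger than any small integer instance because it carries the whole method table for the class.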
Some details about int objects in Python 2 and 3
Here's the situation in Python 2. (Some of this is adapted from a blog post by Laurent Luce). Integer objects are represented as blocks of memory with the following structure:
typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;
PyObject_HEAD is a macro defining the storage for the refcount and the object type. It's described in some detail by the documentation, and the code can be seen in this answer.
The memory is allocated in large blocks so that there's not an allocation bottleneck for every new integer. The structure for the block looks like this:
struct _intblock {
    struct _intblock *next;
    PyIntObject objects[N_INTOBJECTS];
};
typedef struct _intblock PyIntBlock;
These are all empty at first. Then, each time a new integer is created, Python uses the memory pointed at by next and increments next to point to the next free integer object in the block.
I'm not entirely sure how this changes once you exceed the storage capacity of an ordinary integer, but once you do so, the size of an int gets larger. On my machine, in Python 2:
>>> sys.getsizeof(0)
24
>>> sys.getsizeof(1)
24
>>> sys.getsizeof(2 ** 62)
24
>>> sys.getsizeof(2 ** 63)
36
In Python 3, I think the general picture is the same, but the size of integers increases in a more piecemeal way:
>>> sys.getsizeof(0)
24
>>> sys.getsizeof(1)
28
>>> sys.getsizeof(2 ** 30 - 1)
28
>>> sys.getsizeof(2 ** 30)
32
>>> sys.getsizeof(2 ** 60 - 1)
32
>>> sys.getsizeof(2 ** 60)
36
These results are, of course, all hardware-dependent! YMMV.
The variability in integer size in Python 3 is a hint that they may behave more like variable-length types (like lists). And indeed, this turns out to be true. Here's the definition of the C struct for int objects in Python 3:
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
The comments that accompany this definition summarize Python 3's representation of integers. Zero is represented not by a stored value, but by an object with size zero (which is why sys.getsizeof(0) is 24 bytes while sys.getsizeof(1) is 28). Negative numbers are represented by objects with a negative size attribute! So weird.
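The digit-by-digit growth can be checked without hard-coding any sizes. The sketch below assumes CPython 3, where sys.int_info reports the digit width; the helper name size_at_digits is made up for illustration.

```python
import sys

bits = sys.int_info.bits_per_digit     # 30 on most 64-bit builds, otherwise 15
digit_bytes = sys.int_info.sizeof_digit

def size_at_digits(k):
    """Return getsizeof for the largest k-digit int and the smallest (k+1)-digit int."""
    boundary = 2 ** (k * bits)
    return sys.getsizeof(boundary - 1), sys.getsizeof(boundary)

# One (below, above) pair for each of the first three digit boundaries.
steps = [size_at_digits(k) for k in (1, 2, 3)]
for below, above in steps:
    print(below, "->", above)

# The sign lives in the size field, so a negative number costs no extra bytes.
same_size_when_negative = sys.getsizeof(-1) == sys.getsizeof(1)
```

Each boundary crossing should add exactly one digit's worth of bytes, which is the "piecemeal" growth seen in the session above.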

Related

Python C-API int128 support

In Python one can handle very large integers (for instance uuid.uuid4().int.bit_length() gives 128), but the largest int data structure the C-API documentation offers is long long, and it is a 64-bit int.
I would love to be able to get a C int128 from a PyLong, but it seems there is no tooling for this. PyLong_AsLongLong, for instance, cannot handle Python integers outside the range of long long.
Is there some documentation I missed, and it is actually possible?
Is it currently not possible, but some workaround exists? (I would love to use the tooling available in the Python C-API for long long with int128, for instance a PyLong_AsInt128AndOverflow function).
Is it a planned feature in a forthcoming Python release?

There are a couple of different ways you can access the level of precision you want.
On some platforms long long may be wider than 64 bits, although 64 bits is by far the most common width. Notice that the article you link says "at least 64 bits". It's worth checking sizeof(long long) in case there's nothing further to do.
Assuming that is not what you are working with, you'll have to look closer at the raw PyLongObject, which is actually a typedef of the private _longobject structure.
The raw bits are accessible through the ob_digit field, with the length given by ob_size. The data type of the digits and the actual number of bits they hold are given by the typedef digit and the macro PYLONG_BITS_IN_DIGIT. The latter must be smaller than 8 * sizeof(digit), larger than 8, and a multiple of 5 (so 30 or 15, depending on how your build was done).
Luckily for you, there is an "undocumented" method in the C API that will copy the bytes of the number for you: _PyLong_AsByteArray. The comment in longobject.h reads:
/* _PyLong_AsByteArray: Convert the least-significant 8*n bits of long
v to a base-256 integer, stored in array bytes. Normally return 0,
return -1 on error.
If little_endian is 1/true, store the MSB at bytes[n-1] and the LSB at
bytes[0]; else (little_endian is 0/false) store the MSB at bytes[0] and
the LSB at bytes[n-1].
If is_signed is 0/false, it's an error if v < 0; else (v >= 0) n bytes
are filled and there's nothing special about bit 0x80 of the MSB.
If is_signed is 1/true, bytes is filled with the 2's-complement
representation of v's value. Bit 0x80 of the MSB is the sign bit.
Error returns (-1):
+ is_signed is 0 and v < 0. TypeError is set in this case, and bytes
isn't altered.
+ n isn't big enough to hold the full mathematical value of v. For
example, if is_signed is 0 and there are more digits in the v than
fit in n; or if is_signed is 1, v < 0, and n is just 1 bit shy of
being large enough to hold a sign bit. OverflowError is set in this
case, but bytes holds the least-significant n bytes of the true value.
*/
You should be able to get a UUID with something like
PyLongObject *mylong;
unsigned char myuuid[16];
_PyLong_AsByteArray(mylong, myuuid, sizeof(myuuid), 1, 0);
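For comparison, the same conversion can be sketched at the Python level with the documented int.to_bytes and int.from_bytes methods. This is not the C API, only an illustration of the byte layout being requested (16 bytes, little-endian, unsigned); the variable names are arbitrary.

```python
import uuid

value = uuid.uuid4().int    # a Python int of up to 128 bits

# Little-endian, unsigned, 16 bytes: the layout the C call above asks for.
raw = value.to_bytes(16, byteorder="little", signed=False)

# Round trip back to a Python int.
restored = int.from_bytes(raw, byteorder="little", signed=False)

# A value that needs 129 bits does not fit in 16 bytes.
try:
    (2 ** 128).to_bytes(16, byteorder="little", signed=False)
    overflowed = False
except OverflowError:
    overflowed = True
```

As with the C function, a value too large for the buffer is reported as an overflow rather than being silently truncated.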

Why do tuples take less space in memory than lists?

A tuple takes less memory space in Python:
>>> a = (1,2,3)
>>> a.__sizeof__()
48
whereas a list takes more memory space:
>>> b = [1,2,3]
>>> b.__sizeof__()
64
What happens internally on the Python memory management?

I assume you're using 64-bit CPython (I got the same results on my 64-bit CPython 2.7). The numbers could differ in other Python implementations or on a 32-bit Python.
Regardless of the implementation, lists are variable-sized while tuples are fixed-size.
So a tuple can store its elements directly inside the struct, whereas a list needs a layer of indirection (it stores a pointer to the elements). This layer of indirection is a pointer, which on 64-bit systems is 64 bits, hence 8 bytes.
But there's another thing that lists do: they over-allocate. Otherwise list.append would always be an O(n) operation; over-allocating makes it amortized O(1) (much faster!). But now the list has to keep track of both the allocated size and the filled size (tuples only need to store one size, because allocated and filled size are always identical). That means each list has to store another "size", which on 64-bit systems is a 64-bit integer, again 8 bytes.
So lists need at least 16 bytes more memory than tuples. Why did I say "at least"? Because of the over-allocation. Over-allocation means it allocates more space than needed. However, the amount of over-allocation depends on "how" you create the list and the append/deletion history:
>>> l = [1,2,3]
>>> l.__sizeof__()
64
>>> l.append(4) # triggers re-allocation (with over-allocation), because the original list is full
>>> l.__sizeof__()
96
>>> l = []
>>> l.__sizeof__()
40
>>> l.append(1) # re-allocation with over-allocation
>>> l.__sizeof__()
72
>>> l.append(2) # no re-alloc
>>> l.append(3) # no re-alloc
>>> l.__sizeof__()
72
>>> l.append(4) # still has room, so no over-allocation needed (yet)
>>> l.__sizeof__()
72
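To see where re-allocations happen on your own interpreter, one can record every append that changes __sizeof__. This is a small sketch with illustrative names; the exact lengths and sizes vary between CPython versions, so none are listed here.

```python
lst = []
previous = lst.__sizeof__()
resize_points = []    # (length after append, new size in bytes)

for n in range(1, 65):
    lst.append(n)
    current = lst.__sizeof__()
    if current != previous:
        # The size only changes when the list had to re-allocate.
        resize_points.append((len(lst), current))
        previous = current

for length, size in resize_points:
    print(f"len={length:3d}  __sizeof__={size}")
```

Only a handful of the 64 appends should show up, which is the point of over-allocating.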
Images
I decided to create some images to accompany the explanation above. Maybe these are helpful.
This is how it (schematically) is stored in memory in your example. I highlighted the differences with red (free-hand) circles:
That's actually just an approximation because int objects are also Python objects and CPython even reuses small integers, so a probably more accurate representation (although not as readable) of the objects in memory would be:
Useful links:
tuple struct in CPython repository for Python 2.7
list struct in CPython repository for Python 2.7
int struct in CPython repository for Python 2.7
Note that __sizeof__ doesn't really return the "correct" size! It only returns the size of the stored values. However when you use sys.getsizeof the result is different:
>>> import sys
>>> l = [1,2,3]
>>> t = (1, 2, 3)
>>> sys.getsizeof(l)
88
>>> sys.getsizeof(t)
72
There are 24 "extra" bytes. These are real, that's the garbage collector overhead that isn't accounted for in the __sizeof__ method. That's because you're generally not supposed to use magic methods directly - use the functions that know how to handle them, in this case: sys.getsizeof (which actually adds the GC overhead to the value returned from __sizeof__).
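The overhead can be measured directly as the difference between the two calls. A minimal sketch, assuming a standard CPython build; the helper name gc_overhead is made up, and the number of bytes differs between versions (the 24 above is from the interpreter used in this answer).

```python
import sys

def gc_overhead(obj):
    """Bytes that sys.getsizeof adds on top of obj.__sizeof__()."""
    return sys.getsizeof(obj) - obj.__sizeof__()

overhead_list = gc_overhead([1, 2, 3])
overhead_tuple = gc_overhead((1, 2, 3))
overhead_int = gc_overhead(12345)    # int is not a GC-managed type

print(overhead_list, overhead_tuple, overhead_int)
```

Lists and tuples should report the same overhead, while an int, which the garbage collector does not manage, should report none.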

I'll take a deeper dive into the CPython codebase so we can see how the sizes are actually calculated. In your specific example, no over-allocations have been performed, so I won't touch on that.
I'm going to use 64-bit values here, as you are.
The size for lists is calculated from the following function, list_sizeof:
static PyObject *
list_sizeof(PyListObject *self)
{
    Py_ssize_t res;
    res = _PyObject_SIZE(Py_TYPE(self)) + self->allocated * sizeof(void*);
    return PyInt_FromSsize_t(res);
}
Here Py_TYPE(self) is a macro that grabs the ob_type of self (returning PyList_Type) while _PyObject_SIZE is another macro that grabs tp_basicsize from that type. tp_basicsize is calculated as sizeof(PyListObject) where PyListObject is the instance struct.
The PyListObject structure has three fields:
PyObject_VAR_HEAD # 24 bytes
PyObject **ob_item; # 8 bytes
Py_ssize_t allocated; # 8 bytes
These have comments (which I trimmed) explaining what they are; follow the link above to read them. PyObject_VAR_HEAD expands into three 8-byte fields (ob_refcnt, ob_type and ob_size), so a 24-byte contribution.
So for now res is:
sizeof(PyListObject) + self->allocated * sizeof(void*)
or:
40 + self->allocated * sizeof(void*)
If the list instance has elements that are allocated, the second part calculates their contribution. self->allocated, as its name implies, holds the number of allocated elements.
Without any elements, the size of lists is calculated to be:
>>> [].__sizeof__()
40
i.e. the size of the instance struct.
tuple objects don't define a tuple_sizeof function. Instead, they use object_sizeof to calculate their size:
static PyObject *
object_sizeof(PyObject *self, PyObject *args)
{
    Py_ssize_t res, isize;
    res = 0;
    isize = self->ob_type->tp_itemsize;
    if (isize > 0)
        res = Py_SIZE(self) * isize;
    res += self->ob_type->tp_basicsize;
    return PyInt_FromSsize_t(res);
}
This, as for lists, grabs the tp_basicsize and, if the object has a non-zero tp_itemsize (meaning it has variable-length instances), it multiplies the number of items in the tuple (which it gets via Py_SIZE) with tp_itemsize.
tp_basicsize again uses sizeof(PyTupleObject) where the PyTupleObject struct contains:
PyObject_VAR_HEAD # 24 bytes
PyObject *ob_item[1]; # 8 bytes
So, without any elements (that is, Py_SIZE returns 0) the size of empty tuples is equal to sizeof(PyTupleObject):
>>> ().__sizeof__()
24
huh? Well, here's an oddity which I haven't found an explanation for, the tp_basicsize of tuples is actually calculated as follows:
sizeof(PyTupleObject) - sizeof(PyObject *)
why an additional 8 bytes is removed from tp_basicsize is something I haven't been able to find out. (See MSeifert's comment for a possible explanation)
But this is basically the difference in your specific example. Lists also keep around a number of allocated elements, which helps determine when to over-allocate again.
Now, when additional elements are added, lists do indeed perform this over-allocation in order to achieve O(1) appends. This results in greater sizes, as MSeifert covers nicely in his answer.
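The tp_basicsize and tp_itemsize values discussed above are exposed in Python as __basicsize__ and __itemsize__, so the arithmetic can be checked without reading the C source. A sketch with illustrative variable names; the concrete byte counts depend on the CPython version and platform.

```python
list_base = list.__basicsize__      # tp_basicsize of list
tuple_base = tuple.__basicsize__    # tp_basicsize of tuple
tuple_item = tuple.__itemsize__     # bytes per stored element pointer

t = (1, 2, 3)
empty_list_size = [].__sizeof__()
empty_tuple_size = ().__sizeof__()
tuple_size = t.__sizeof__()

# object_sizeof: tp_basicsize + number of items * tp_itemsize
predicted_tuple_size = tuple_base + len(t) * tuple_item

print(list_base, tuple_base, tuple_item, tuple_size, predicted_tuple_size)
```

The empty list and empty tuple should each report exactly their type's basic size, and the three-element tuple should match the prediction.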

MSeifert's answer covers it broadly; to keep it simple you can think of:
tuple is immutable. Once set, you can't change it. So you know in advance how much memory you need to allocate for that object.
list is mutable. You can add or remove items to or from it. It has to know its current size. It resizes as needed.
There are no free meals - these capabilities come with a cost. Hence the overhead in memory for lists.

The size of a tuple is fixed in advance: at tuple initialization the interpreter allocates exactly enough space for the contained data, and because a tuple is immutable (can't be modified) that size never changes. A list, by contrast, is a mutable object, which implies dynamic allocation of memory. To avoid allocating new space each time you append to or modify the list (allocating enough space for the changed data and copying the data over), it allocates additional space up front for future changes such as appends.
That pretty much sums it up.

Why do ints require three times as much memory in Python?

On a 64-bit system an integer in Python takes 24 bytes. This is 3 times the memory that would be needed in e.g. C for a 64-bit integer. Now, I know this is because Python integers are objects. But what is the extra memory used for? I have my guesses, but it would be nice to know for sure.

Remember that the Python int type does not have a limited range like C int has; the only limit is the available memory.
Memory goes to storing the value, the current size of the integer storage (the storage size is variable to support arbitrary sizes), and the standard Python object bookkeeping (a reference to the object's type and a reference count).
You can look up the longintrepr.h source (the Python 3 int type was traditionally known as the long type in Python 2); it makes effective use of the PyVarObject C type to track integer size:
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
The ob_digit array stores 'digits' of either 15 or 30 bits wide (depending on your platform); so on my 64-bit OS X system, an integer up to 2 ** 30 - 1 uses 1 'digit':
>>> sys.getsizeof((1 << 30) - 1)
28
but if you use 2 30-bit digits in the number an additional 4 bytes are needed, etc:
>>> sys.getsizeof(1 << 30)
32
>>> sys.getsizeof(1 << 60)
36
>>> sys.getsizeof(1 << 90)
40
The base 24 bytes then are the PyObject_VAR_HEAD structure, holding the object size, the reference count and the type pointer (each 8 bytes / 64 bits on my 64-bit OS X platform).
On Python 2, integers <= sys.maxint but >= -sys.maxint - 1 are stored using a simpler structure storing just the single value:
typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;
because this uses PyObject instead of PyVarObject there is no ob_size field in the struct and the memory size is limited to just 24 bytes; 8 for the long value, 8 for the reference count and 8 for the type object pointer.

From longintrepr.h, we see that a Python 'int' object is defined with this C structure:
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
Digit is a 32-bit unsigned value. The bulk of the space is taken by the variable size object header. From object.h, we can find its definition:
typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;
typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;
We can see that we are using a Py_ssize_t, 64 bits assuming a 64-bit system, to store the count of "digits" in the value. This is possibly wasteful. We can also see that the general object header has a 64-bit reference count and a pointer to the object type, which also takes 64 bits of storage. The reference count is necessary for Python to know when to deallocate the object, and the pointer to the object type is necessary to know that we have an int and not, say, a string, as C structures have no way to test the type of an object from an arbitrary pointer.
_PyObject_HEAD_EXTRA is defined to nothing on most builds of python, but can be used to store a linked list of all Python objects on the heap if the build enables that option, using another two pointers of 64-bits each.
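The header arithmetic can be reproduced with ctypes, which reports the C field sizes for the running interpreter. This sketch assumes a standard release build of CPython (no debug or free-threaded extras in the header); the variable names are illustrative only.

```python
import ctypes

refcount_bytes = ctypes.sizeof(ctypes.c_ssize_t)     # Py_ssize_t ob_refcnt
type_ptr_bytes = ctypes.sizeof(ctypes.c_void_p)      # struct _typeobject *ob_type
size_field_bytes = ctypes.sizeof(ctypes.c_ssize_t)   # Py_ssize_t ob_size

plain_header = refcount_bytes + type_ptr_bytes       # PyObject
var_header = plain_header + size_field_bytes         # PyVarObject

print("PyObject header:   ", plain_header, "bytes")
print("PyVarObject header:", var_header, "bytes")
print("object.__basicsize__:", object.__basicsize__)
print("int.__basicsize__:   ", int.__basicsize__)
```

On such a build the two computed headers should line up with object.__basicsize__ and int.__basicsize__ respectively, which is where the 24-byte base of an int comes from on a 64-bit system.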

How to use Python to build an int array and manipulate it as efficiently as C?

From http://www.cs.bell-labs.com/cm/cs/pearls/sol01.html:
The C code looks like this:
#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000
int a[1 + N/BITSPERWORD];
void set(int i) { a[i>>SHIFT] |= (1<<(i & MASK)); }
void clr(int i) { a[i>>SHIFT] &= ~(1<<(i & MASK)); }
int test(int i){ return a[i>>SHIFT] & (1<<(i & MASK)); }
I found ctypes, BitArrays and numpy, but I'm not sure whether they could be as efficient as the C code above.
For example, if I write code like this:
from ctypes import c_int
a=[c_int(9)]*1024*1024
Would the space used be 1M Bytes, or much more?
Does anyone know some good libraries that can do the same things in Python?

Numpy or ctypes are both good choices. But are you sure your Python code really needs to be as efficient as C, and are you sure this code is a performance hotspot?
The best thing to do is to use the Python profiler to ensure that this code really does need to be as efficient as C. If it truly does, then it will probably be easiest to just keep the code in C and link to it using something like ctypes or SWIG.
Edit: To answer your updated question, a numpy array of size N with element size M will contain N*M bytes of contiguous memory, plus a header and some bytes for views.
Here are a couple of related links:
Python memory usage of numpy arrays
Memory usage of numpy-arrays

You can also check the built-in array module:
>>> import array
>>> help(array)
Help on built-in module array:
NAME
array
FILE
(built-in)
DESCRIPTION
This module defines an object type which can efficiently represent
an array of basic values: characters, integers, floating point
numbers. Arrays are sequence types and behave very much like lists,
except that the type of objects stored in them is constrained. The
type is specified at object creation time by using a type code, which
is a single character. The following type codes are defined:
Type code   C Type              Minimum size in bytes
'b'         signed integer      1
'B'         unsigned integer    1
'u'         Unicode character   2 (see note)
'h'         signed integer      2
'H'         unsigned integer    2
'i'         signed integer      2
'I'         unsigned integer    2
'l'         signed integer      4
'L'         unsigned integer    4
'f'         floating point      4
'd'         floating point      8
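As a rough Python counterpart to the C bit-vector in the question, here is a sketch built on the array module. The class name BitVector and its method names are made up for illustration, and no claim is made that it matches the speed of the C version; it only shows that the storage stays compact.

```python
from array import array

BITS = array("I").itemsize * 8    # bits per word; 32 on common platforms

class BitVector:
    """Fixed-size bit set backed by a compact array of unsigned ints."""

    def __init__(self, n):
        # One word per BITS flags, all initially zero.
        self.words = array("I", [0]) * (1 + n // BITS)

    def set(self, i):
        self.words[i // BITS] |= 1 << (i % BITS)

    def clr(self, i):
        self.words[i // BITS] &= ~(1 << (i % BITS))

    def test(self, i):
        return bool(self.words[i // BITS] & (1 << (i % BITS)))

bits = BitVector(10_000_000)
bits.set(1234567)
storage_bytes = len(bits.words) * bits.words.itemsize
```

Ten million flags fit in a little over a megabyte this way, because the array stores raw machine words rather than one Python object per element.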

This:
a=[c_int()]
makes a list which contains a reference to a c_int object.
Multiplying the list merely duplicates the references, so:
a = [c_int()] * 1024 * 1024
actually creates a list of 1024 * 1024 references to the same single c_int object.
If you want an array of 1024 * 1024 c_int values, create the array type and then instantiate it:
a = (c_int * (1024 * 1024))()
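The difference between the two approaches can be checked directly. Note that c_int * n is an array type, and calling that type creates the actual array. The names below are illustrative.

```python
from ctypes import c_int, sizeof

# List repetition: four references to one and the same c_int object.
refs = [c_int(9)] * 4
all_same_object = all(item is refs[0] for item in refs)

# A real ctypes array: build the array type, then call it to get an instance.
IntArray = c_int * (1024 * 1024)
arr = IntArray()              # zero-initialised, contiguous storage
arr[0] = 9

array_bytes = sizeof(arr)     # element count times sizeof(c_int)
print(all_same_object, array_bytes)
```

The ctypes array occupies exactly one c_int per element, whereas the list stores a pointer per slot on top of the single shared c_int object.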

Size of list in memory

I just experimented with the size of Python data structures in memory. I wrote the following snippet:
import sys
lst1=[]
lst1.append(1)
lst2=[1]
print(sys.getsizeof(lst1), sys.getsizeof(lst2))
I tested the code on the following configurations:
Windows 7 64bit, Python3.1: the output is: 52 40 so lst1 has 52 bytes and lst2 has 40 bytes.
Ubuntu 11.4 32bit with Python3.2: output is 48 32
Ubuntu 11.4 32bit Python2.7: 48 36
Can anyone explain to me why the two sizes differ although both are lists containing a 1?
In the python documentation for the getsizeof function I found the following: ...adds an additional garbage collector overhead if the object is managed by the garbage collector. Could this be the case in my little example?

Here's a fuller interactive session that will help me explain what's going on (Python 2.6 on Windows XP 32-bit, but it doesn't matter really):
>>> import sys
>>> sys.getsizeof([])
36
>>> sys.getsizeof([1])
40
>>> lst = []
>>> lst.append(1)
>>> sys.getsizeof(lst)
52
>>>
Note that the empty list is a bit smaller than the one with [1] in it. When an element is appended, however, it grows much larger.
The reason for this is the implementation details in Objects/listobject.c, in the source of CPython.
Empty list
When an empty list [] is created, no space for elements is allocated - this can be seen in PyList_New. 36 bytes is the amount of space required for the list data structure itself on a 32-bit machine.
List with one element
When a list with a single element [1] is created, space for one element is allocated in addition to the memory required by the list data structure itself. Again, this can be found in PyList_New. Given size as argument, it computes:
nbytes = size * sizeof(PyObject *);
And then has:
if (size <= 0)
    op->ob_item = NULL;
else {
    op->ob_item = (PyObject **) PyMem_MALLOC(nbytes);
    if (op->ob_item == NULL) {
        Py_DECREF(op);
        return PyErr_NoMemory();
    }
    memset(op->ob_item, 0, nbytes);
}
Py_SIZE(op) = size;
op->allocated = size;
So we see that with size = 1, space for one pointer is allocated. 4 bytes (on my 32-bit box).
Appending to an empty list
When calling append on an empty list, here's what happens:
PyList_Append calls app1
app1 asks for the list's size (and gets 0 as an answer)
app1 then calls list_resize with size+1 (1 in our case)
list_resize has an interesting allocation strategy, summarized in this comment from its source.
Here it is:
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
/* check for integer overflow */
if (new_allocated > PY_SIZE_MAX - newsize) {
    PyErr_NoMemory();
    return -1;
} else {
    new_allocated += newsize;
}
Let's do some math
Let's see how the numbers I quoted in the session in the beginning of my article are reached.
So 36 bytes is the size required by the list data structure itself on 32-bit. With a single element, space is allocated for one pointer, so that's 4 extra bytes - total 40 bytes. OK so far.
When app1 is called on an empty list, it calls list_resize with size=1. According to the over-allocation algorithm of list_resize, the next largest available size after 1 is 4, so space for 4 pointers will be allocated. 4 * 4 = 16 bytes, and 36 + 16 = 52.
Indeed, everything makes sense :-)

sorry, previous comment was a bit curt.
what's happening is that you're looking at how lists are allocated (and i think maybe you just wanted to see how big things were - in that case, use sys.getsizeof())
when something is added to a list, one of two things can happen:
the extra item fits in spare space
extra space is needed, so a new list is made, and the contents copied across, and the extra thing added.
since (2) is expensive (copying things, even pointers, takes time proportional to the number of things to be copied, so grows as lists get large) we want to do it infrequently. so instead of just adding a little more space, we add a whole chunk. typically the size of the amount added is similar to what is already in use - that way the maths works out that the average cost of allocating memory, spread out over many uses, is only proportional to the list size.
so what you are seeing is related to this behaviour. i don't know the exact details, but i wouldn't be surprised if [] or [1] (or both) are special cases, where only enough memory is allocated (to save memory in these common cases), and then appending does the "grab a new chunk" described above that adds more.
but i don't know the exact details - this is just how dynamic arrays work in general. the exact implementation of lists in python will be finely tuned so that it is optimal for typical python programs. so all i am really saying is that you can't trust the size of a list to tell you exactly how much it contains - it may contain extra space, and the amount of extra free space is difficult to judge or predict.
ps a neat alternative to this is to make lists as (value, pointer) pairs, where each pointer points to the next tuple. in this way you can grow lists incrementally, although the total memory used is higher. that is a linked list (what python uses is more like a vector or a dynamic array).
[update] see Eli's excellent answer. they explain that both [] and [1] are allocated exactly, but that appending to [] allocates an extra chunk. the comment in the code is what i am saying above (this is called "over-allocation" and the amount is proportional to what we have so that the average ("amortised") cost is proportional to size).
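The amortised-cost argument can be illustrated with a toy model that counts how many existing elements would be copied under two growth policies. This is a simplified simulation, not how CPython actually resizes (a real realloc may not copy at all); the function name copies_needed and both policies are assumptions made for the example.

```python
def copies_needed(n_appends, grow):
    """Count element copies in a toy dynamic array using the given growth policy."""
    capacity = 0
    size = 0
    copies = 0
    for _ in range(n_appends):
        if size == capacity:
            capacity = grow(capacity)
            copies += size    # every existing element moves to the new block
        size += 1
    return copies

# Policy 1: grow by exactly one slot each time the array is full.
exact = copies_needed(1000, lambda c: c + 1)

# Policy 2: double the capacity each time the array is full.
doubling = copies_needed(1000, lambda c: max(1, 2 * c))

print(exact, doubling)    # 499500 1023
```

Growing by one slot costs on the order of n squared copies in total, while proportional growth keeps the total proportional to n, which is what "amortised" refers to.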

Here's a quick demonstration of the list growth pattern. Changing the third argument in range() will change the output so it doesn't look like the comments in listobject.c, but the result when simply appending one element seems to be perfectly accurate.
allocated = 0
for newsize in range(0, 100, 1):
    # Re-allocate only when the requested size no longer fits.
    if allocated < newsize:
        new_allocated = (newsize >> 3) + (3 if newsize < 9 else 6)
        allocated = newsize + new_allocated
    print(newsize, allocated)

The formula for the number of allocated element slots, given size = sys.getsizeof(lst), depends on the system architecture:
(size-36)/4 for 32-bit machines and
(size-64)/8 for 64-bit machines
36, 64 - size of an empty list on each machine
4, 8 - size of a single element slot (a pointer) on each machine
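Rather than hard-coding 36/64 and 4/8, the same formula can be written so that it measures both constants on the running interpreter. A minimal sketch; the helper name allocated_slots is hypothetical, and the result is an estimate that relies on CPython's list layout.

```python
import struct
import sys

POINTER_BYTES = struct.calcsize("P")    # 4 on 32-bit builds, 8 on 64-bit builds
EMPTY_LIST_BYTES = sys.getsizeof([])    # measured instead of hard-coded

def allocated_slots(lst):
    """Estimate how many element slots a list currently has room for."""
    return (sys.getsizeof(lst) - EMPTY_LIST_BYTES) // POINTER_BYTES

grown = []
grown.append(1)    # appending to an empty list triggers over-allocation
print(len(grown), allocated_slots(grown))
```

After a single append the estimate is typically larger than the length, reflecting the spare slots reserved by over-allocation.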
