I am trying to compare the sizes of data types in Python with sys.getsizeof(). However, for integers and floats it returns the same value - 24 (not the customary 4 or 8 bytes). Also, the size of an array declared with array.array() containing 4 integer elements is reported as 72 (not 96), and with 4 float elements as 88 (not 96). What is going on?
import array, sys
arr1 = array.array('d', [1,2,3,4])
arr2 = array.array('i', [1,2,3,4])
print sys.getsizeof(arr1[1]), sys.getsizeof(arr2[1]) # 24, 24
print sys.getsizeof(arr1), sys.getsizeof(arr2) # 88, 72
The function sys.getsizeof() returns the amount of space the Python object takes, not the amount of space you would need to represent the data in that object in the memory of the underlying system.
Python objects have overhead to cover reference counting (for garbage collection) and other implementation-related stuff. In addition, an array is not a naive sequence of floats or ints; the data structure has a fair amount of stuff under the hood that keeps track of datatype, number of elements and so on. That's where the 'd' or 'i' lives, for example.
To get the answers I think you are expecting, try
print (arr1.itemsize * len(arr1))
print (arr2.itemsize * len(arr2))
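To make the overhead concrete, here is a small sketch (Python 3 syntax; the exact overhead figure is implementation-specific and will vary by build):

```python
import array
import sys

arr = array.array('d', [1.0, 2.0, 3.0, 4.0])

raw = arr.itemsize * len(arr)   # 8 bytes per C double * 4 elements = 32
total = sys.getsizeof(arr)      # raw storage plus the array object's header

print(raw)           # 32
print(total - raw)   # per-object bookkeeping overhead (varies by build)
```

The difference between the two numbers is the fixed cost of the array object itself; it does not grow as you append more elements, which is why larger arrays amortize it away.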
Related
Any solution consuming less than O(Bit Length) time is welcome. I need to process around 100 million large integers.
answer = [0 for i in xrange(100)]

def pluginBits(val):
    global answer
    for j in xrange(len(answer)):
        if val <= 0:
            break
        answer[j] += (val & 1)
        val >>= 1
A speedier way to do this would be to use '{:b}'.format(someval) to convert from integer to a string of '1's and '0's. Python still needs to do similar work to perform this conversion, but doing it at the C layer in the interpreter internals involves significantly less overhead for larger values.
For conversion to actual list of integer 1s and 0s, you could do something like:
# Done once at top level to make translation table:
import string
bitstr_to_intval = string.maketrans(b'01', b'\x00\x01')
# Done for each value to convert:
bits = bytearray('{:b}'.format(origint).translate(bitstr_to_intval))
Since bytearray is a mutable sequence of values in range(256) that iterates the actual int values, you don't need to convert to list; it should be usable in 99% of the places the list would be used, using less memory and running faster.
This does generate the values in the reverse of the order your code produces (that is, bits[-1] here is the same as your answer[0], bits[-2] is your answer[1], etc.), and it's unpadded, but since you're summing bits, the padding isn't needed, and reversing the result is a trivial reversing slice (add [::-1] to the end). Summing the bits from each input can be made much faster by making answer a numpy array (that allows a bulk element-wise addition at the C layer), and putting it all together gets:
import string
import numpy

bitstr_to_intval = string.maketrans(b'01', b'\x00\x01')
answer = numpy.zeros(100, numpy.uint64)

def pluginBits(val):
    bits = bytearray('{:b}'.format(val).translate(bitstr_to_intval))[::-1]
    answer[:len(bits)] += bits
In local tests, this definition of pluginBits takes a little under one-seventh the time to sum the bits at each position for 10,000 random input integers of 100 bits each, and gets the same results.
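As an aside, string.maketrans only exists in Python 2; under Python 3 the same trick can be sketched with bytes.maketrans (a hedged translation of the core conversion above, without the NumPy summation):

```python
# Python 3 translation of the string.maketrans trick:
# map ASCII '0'/'1' to the byte values 0 and 1
bitstr_to_intval = bytes.maketrans(b'01', b'\x00\x01')

def bits_lsb_first(val):
    # format(val, 'b') is most-significant-bit first; the [::-1] puts
    # bit 0 at index 0, matching the original answer[] layout
    return bytearray(format(val, 'b').encode('ascii').translate(bitstr_to_intval))[::-1]

print(list(bits_lsb_first(11)))  # 11 is 0b1011 -> [1, 1, 0, 1]
```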
Python can indeed left shift a bit by large integers
1L << 100
# 1267650600228229401496703205376L
But NumPy seemingly has a problem:
a = np.array([1,2,100])
output = np.left_shift(1L,a)
print output
# array([ 2, 4, 68719476736])
Is there a way to overcome this using NumPy's left_shift operation? Individually accessing the array elements gives the same incorrect result:
1L << a[2]
# 68719476736
Python long values aren't the same type as the integers held in a. Specifically, Python long values are not limited to 32 bits or 64 bits but instead can take up an arbitrary amount of memory.
On the other hand, NumPy will create a as an array of int32 or int64 integer values. When you left-shift this array, you get back an array of the same datatype. This isn't enough memory to hold 1 << 100 and the result of the left-shift overflows the memory it's been allocated in the array, producing the incorrect result.
To hold integers that large, you'll have to specify a to have the object datatype. For example:
>>> np.left_shift(1, a.astype(object))
array([2, 4, 1267650600228229401496703205376L], dtype=object)
object arrays can hold a mix of different types, including the unlimited-size Python long/integer values. However, a lot of the performance benefits of homogeneous datatype NumPy arrays will be lost when using object arrays.
Given the array a you create, the elements of output are going to be integers. Unlike Python itself, integers in a NumPy ndarray are of fixed size, with a defined maximum and minimum value.
numpy is arriving at the value given (2 ** 36) by reducing the shift modulo the length of the integer, 64 bits (100 mod 64 == 36).
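A quick arithmetic check of that claim, in pure Python with no NumPy needed:

```python
# on 64-bit integers the shift count effectively wraps modulo 64,
# so shifting by 100 behaves like shifting by 36
print(100 % 64)         # 36
print(1 << (100 % 64))  # 68719476736, the "wrong" value NumPy returned
```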
This is an issue with 64-bit Python too, except that the integer at which this becomes a problem is larger. If you run a[2] = 1L << a[2] you will get this traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long
As the traceback says, the Python int is too large to convert to a C long, so you need to change the array's type (which is backed by a C data structure) to a Python object:
>>> a = np.array([1,2,100],dtype='O')
>>> a[2]=1L << a[2]
>>> a
array([1, 2, 1267650600228229401496703205376L], dtype=object)
I'm working on a project that involves accessing data from a large list that's kept in memory. Because the list is quite voluminous (millions of lines) I keep an eye on how much memory is being used. I use OS X so I keep Activity Monitor open as I create these lists.
I've noticed that the amount of memory used by a list can vary wildly depending on how it is constructed but I can't seem to figure out why.
Now for some example code:
(I am using Python 2.7.4 on OSX 10.8.3)
The first function below creates a list and fills it with all the same three random numbers.
The second function below creates a list and fills it with all different random numbers.
import random
import sys
def make_table1(size):
    list1 = size * [(float(), float(), float())]  # initialize the list
    line = (random.random(),
            random.random(),
            random.random())
    for count in xrange(0, size):  # Now fill it
        list1[count] = line
    return list1

def make_table2(size):
    list1 = size * [(float(), float(), float())]  # initialize the list
    for count in xrange(0, size):  # Now fill it
        list1[count] = (random.random(),
                        random.random(),
                        random.random())
    return list1
(First let me say that I realize the code above could have been written much more efficiently. It's written this way to keep the two examples as similar as possible.)
Now I create some lists using these functions:
In [2]: thing1 = make_table1(6000000)
In [3]: sys.getsizeof(thing1)
Out[3]: 48000072
At this point my memory used jumps by about 46 MB, which is what I would expect from the information given above.
Now for the next function:
In [4]: thing2 = make_table2(6000000)
In [5]: sys.getsizeof(thing2)
Out[5]: 48000072
As you can see, the memory taken up by the two lists is the same. They are exactly the same length so that's to be expected. What I didn't expect is that my memory used as given by Activity Monitor jumps to over 1 GB!
I understand there is going to be some overhead but 20x as much? 1 GB for a 46MB list?
Seriously?
Okay, on to diagnostics...
The first thing I tried is to collect any garbage:
In [5]: import gc
In [6]: gc.collect()
Out[6]: 0
It made zero difference to the amount of memory used.
Next I used guppy to see where the memory is going:
In [7]: from guppy import hpy
In [8]: hpy().heap()
Out[8]:
Partition of a set of 24217689 objects. Total size = 1039012560 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 6054789 25 484821768 47 484821768 47 tuple
1 18008261 74 432198264 42 917020032 88 float
2 2267 0 96847576 9 1013867608 98 list
3 99032 0 11392880 1 1025260488 99 str
4 585 0 1963224 0 1027223712 99 dict of module
5 1712 0 1799552 0 1029023264 99 dict (no owner)
6 13606 0 1741568 0 1030764832 99 types.CodeType
7 13355 0 1602600 0 1032367432 99 function
8 1494 0 1348088 0 1033715520 99 type
9 1494 0 1300752 0 1035016272 100 dict of type
<691 more rows. Type e.g. '_.more' to view.>
okay, my memory is taken up by:
462 MB of tuple (huh?)
412 MB of float (what?)
92 MB of list (Okay, this one makes sense. 2*46MB = 92)
My lists are preallocated so I don't think that there is over-allocation going on.
Questions:
Why is the amount of memory used by these two very similar lists so different?
Is there a different way to populate a list that doesn't have so much overhead?
Is there a way to free up all that memory?
Note: Please don't suggest storing on the disk or using array.array or numpy or pandas data structures. Those are all great options but this question isn't about them. This question is about plain old lists.
I have tried similar code with Python 3.3 and the result is the same.
Here is someone with a similar problem. It contains some hints but it's not the same question.
Thank you all!
Both functions make a list of 6000000 references.
sizeof(thelist) ≅ sizeof(reference_to_a_python_object) * 6000000
First list contains 6000000 references to the same one tuple of three floats.
Second list contains references to 6000000 different tuples containing 18000000 different floats.
As you can see, a float takes 24 bytes and a triple takes 80 bytes (using your build of python). No, there's no way around that except numpy.
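A back-of-the-envelope sketch of that arithmetic, assuming the sizes from your build (8-byte list slots, 24-byte floats, 80-byte three-tuples):

```python
n = 6000000
ptr, flt, tup3 = 8, 24, 80  # bytes: list slot (pointer), float object, 3-tuple

shared = n * ptr + (tup3 + 3 * flt)      # every slot points at one shared tuple
unique = n * ptr + n * (tup3 + 3 * flt)  # a fresh tuple plus 3 floats per slot

print(shared // 2**20)  # ~45 MB, matching make_table1
print(unique // 2**20)  # ~915 MB, right in the 1 GB territory of make_table2
```

The 48 MB reported by sys.getsizeof() is only the first term (the list of pointers); the second term is what Activity Monitor sees.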
To turn the lists into collectible garbage, you need to get rid of any references to them:
del thing1
del thing2
This question already has answers here:
In-memory size of a Python structure
I am writing Python code to do some big number calculation, and have serious concern about the memory used in the calculation.
Thus, I want to count every bit of each variable.
For example, I have a variable x, which is a big number, and want to count the number of bits for representing x.
The following code is obviously useless:
x=2**1000
len(x)
Thus, I turn to use the following code:
x=2**1000
len(repr(x))
The variable x (in decimal) is:
10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376
but the above code returns 303
The above decimal sequence is of length 302, so I believe that 303 is related to the string length only (in Python 2, repr() of a long appends a trailing 'L', giving 303 characters).
So, here comes my original question:
How can I know the memory size of variable x?
One more thing; in C/C++ language, if I define
int z=1;
This means that there are 4 bytes = 32 bits allocated for z, and the bits are arranged as 00..001 (31 zeros and one 1).
Here, my variable x is huge, I don't know whether it follows the same memory allocation rule?
Use sys.getsizeof to get the size of an object, in bytes.
>>> from sys import getsizeof
>>> a = 42
>>> getsizeof(a)
12
>>> a = 2**1000
>>> getsizeof(a)
146
>>>
Note that the size and layout of an object is purely implementation-specific. CPython, for example, may use totally different internal data structures than IronPython. So the size of an object may vary from implementation to implementation.
Regarding the internal structure of a Python long, check sys.int_info (or sys.long_info for Python 2.7).
>>> import sys
>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)
Python either stores 30 bits into 4 bytes (most 64-bit systems) or 15 bits into 2 bytes (most 32-bit systems). Comparing the actual memory usage with calculated values, I get
>>> import math, sys
>>> a=0
>>> sys.getsizeof(a)
24
>>> a=2**100
>>> sys.getsizeof(a)
40
>>> a=2**1000
>>> sys.getsizeof(a)
160
>>> 24+4*math.ceil(100/30)
40
>>> 24+4*math.ceil(1000/30)
160
There are 24 bytes of overhead for 0 since no bits are stored. The memory requirements for larger values match the calculated values.
If your numbers are so large that you are concerned about the 6.25% unused bits, you should probably look at the gmpy2 library. The internal representation uses all available bits and computations are significantly faster for large values (say, greater than 100 digits).
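If the goal is the number of bits needed to represent x (rather than the object's total memory footprint), int.bit_length(), available since Python 2.7 and 3.1, answers that directly:

```python
x = 2 ** 1000
print(x.bit_length())    # 1001: a 1 followed by 1000 zero bits
print((0).bit_length())  # 0: zero needs no bits, matching the base-size case above
```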
I just experimented with the size of python data structures in memory. I wrote the following snippet:
import sys
lst1=[]
lst1.append(1)
lst2=[1]
print(sys.getsizeof(lst1), sys.getsizeof(lst2))
I tested the code on the following configurations:
Windows 7 64bit, Python3.1: the output is: 52 40 so lst1 has 52 bytes and lst2 has 40 bytes.
Ubuntu 11.4 32bit with Python3.2: output is 48 32
Ubuntu 11.4 32bit Python2.7: 48 36
Can anyone explain to me why the two sizes differ although both are lists containing a 1?
In the python documentation for the getsizeof function I found the following: ...adds an additional garbage collector overhead if the object is managed by the garbage collector. Could this be the case in my little example?
Here's a fuller interactive session that will help me explain what's going on (Python 2.6 on Windows XP 32-bit, but it doesn't matter really):
>>> import sys
>>> sys.getsizeof([])
36
>>> sys.getsizeof([1])
40
>>> lst = []
>>> lst.append(1)
>>> sys.getsizeof(lst)
52
>>>
Note that the empty list is a bit smaller than the one with [1] in it. When an element is appended, however, it grows much larger.
The reason for this is the implementation details in Objects/listobject.c, in the source of CPython.
Empty list
When an empty list [] is created, no space for elements is allocated - this can be seen in PyList_New. 36 bytes is the amount of space required for the list data structure itself on a 32-bit machine.
List with one element
When a list with a single element [1] is created, space for one element is allocated in addition to the memory required by the list data structure itself. Again, this can be found in PyList_New. Given size as argument, it computes:
nbytes = size * sizeof(PyObject *);
And then has:
if (size <= 0)
    op->ob_item = NULL;
else {
    op->ob_item = (PyObject **) PyMem_MALLOC(nbytes);
    if (op->ob_item == NULL) {
        Py_DECREF(op);
        return PyErr_NoMemory();
    }
    memset(op->ob_item, 0, nbytes);
}
Py_SIZE(op) = size;
op->allocated = size;
So we see that with size = 1, space for one pointer is allocated. 4 bytes (on my 32-bit box).
Appending to an empty list
When calling append on an empty list, here's what happens:
PyList_Append calls app1
app1 asks for the list's size (and gets 0 as an answer)
app1 then calls list_resize with size+1 (1 in our case)
list_resize has an interesting allocation strategy, summarized in this comment from its source.
Here it is:
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);

/* check for integer overflow */
if (new_allocated > PY_SIZE_MAX - newsize) {
    PyErr_NoMemory();
    return -1;
} else {
    new_allocated += newsize;
}
Let's do some math
Let's see how the numbers I quoted in the session in the beginning of my article are reached.
So 36 bytes is the size required by the list data structure itself on 32-bit. With a single element, space is allocated for one pointer, so that's 4 extra bytes - total 40 bytes. OK so far.
When app1 is called on an empty list, it calls list_resize with size=1. According to the over-allocation algorithm of list_resize, the next largest available size after 1 is 4, so place for 4 pointers will be allocated. 4 * 4 = 16 bytes, and 36 + 16 = 52.
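The over-allocation for that first append can be reproduced directly from the C expression quoted above:

```python
newsize = 1  # appending the first element to an empty list
new_allocated = (newsize >> 3) + (3 if newsize < 9 else 6)
print(newsize + new_allocated)  # 4 slots; 4 pointers * 4 bytes = 16 extra bytes
```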
Indeed, everything makes sense :-)
sorry, previous comment was a bit curt.
what's happening is that you're looking at how lists are allocated (and i think maybe you just wanted to see how big things were - in that case, use sys.getsizeof())
when something is added to a list, one of two things can happen:
the extra item fits in spare space
extra space is needed, so a new list is made, and the contents copied across, and the extra thing added.
since (2) is expensive (copying things, even pointers, takes time proportional to the number of things to be copied, so grows as lists get large) we want to do it infrequently. so instead of just adding a little more space, we add a whole chunk. typically the size of the amount added is similar to what is already in use - that way the maths works out that the average cost of allocating memory, spread out over many uses, is only proportional to the list size.
so what you are seeing is related to this behaviour. i don't know the exact details, but i wouldn't be surprised if [] or [1] (or both) are special cases, where only enough memory is allocated (to save memory in these common cases), and then appending does the "grab a new chunk" described above that adds more.
but i don't know the exact details - this is just how dynamic arrays work in general. the exact implementation of lists in python will be finely tuned so that it is optimal for typical python programs. so all i am really saying is that you can't trust the size of a list to tell you exactly how much it contains - it may contain extra space, and the amount of extra free space is difficult to judge or predict.
ps a neat alternative to this is to make lists as (value, pointer) pairs, where each pointer points to the next tuple. in this way you can grow lists incrementally, although the total memory used is higher. that is a linked list (what python uses is more like a vector or a dynamic array).
[update] see Eli's excellent answer. they explain that both [] and [1] are allocated exactly, but that appending to [] allocates an extra chunk. the comment in the code is what i am saying above (this is called "over-allocation" and the amount is proportional to what we have, so that the average ("amortised") cost is proportional to size).
Here's a quick demonstration of the list growth pattern. Changing the third argument in range() will change the output so it doesn't look like the comments in listobject.c, but the result when simply appending one element seem to be perfectly accurate.
allocated = 0
for newsize in range(0, 100, 1):
    if allocated < newsize:
        new_allocated = (newsize >> 3) + (3 if newsize < 9 else 6)
        allocated = newsize + new_allocated
        print newsize, allocated
The formula changes based on the system architecture:

(size - 36) / 4 for 32-bit machines
(size - 64) / 8 for 64-bit machines

where 36 and 64 are the sizes of an empty list, and 4 and 8 are the sizes of a single element (a pointer) in the list, depending on the machine.