In-memory size of a Python structure - python

Is there a reference for the memory size of Python data stucture on 32- and 64-bit platforms?
If not, this would be nice to have it on SO. The more exhaustive the better! So how many bytes are used by the following Python structures (depending on the len and the content type when relevant)?
int
float
reference
str
unicode string
tuple
list
dict
set
array.array
numpy.array
deque
new-style classes object
old-style classes object
... and everything I am forgetting!
(For containers that keep only references to other objects, we obviously do not want to count the size of the item themselves, since it might be shared.)
Furthermore, is there a way to get the memory used by an object at runtime (recursively or not)?

The recommendation from an earlier question on this was to use sys.getsizeof(), quoting:
>>> import sys
>>> x = 2
>>> sys.getsizeof(x)
14
>>> sys.getsizeof(sys.getsizeof)
32
>>> sys.getsizeof('this')
38
>>> sys.getsizeof('this also')
48
You could take this approach:
>>> import sys
>>> import decimal
>>>
>>> d = {
... "int": 0,
... "float": 0.0,
... "dict": dict(),
... "set": set(),
... "tuple": tuple(),
... "list": list(),
... "str": "a",
... "unicode": u"a",
... "decimal": decimal.Decimal(0),
... "object": object(),
... }
>>> for k, v in sorted(d.iteritems()):
... print k, sys.getsizeof(v)
...
decimal 40
dict 140
float 16
int 12
list 36
object 8
set 116
str 25
tuple 28
unicode 28
2012-09-30
python 2.7 (linux, 32-bit):
decimal 36
dict 136
float 16
int 12
list 32
object 8
set 112
str 22
tuple 24
unicode 32
python 3.3 (linux, 32-bit)
decimal 52
dict 144
float 16
int 14
list 32
object 8
set 112
str 26
tuple 24
unicode 26
2016-08-01
OSX, Python 2.7.10 (default, Oct 23 2015, 19:19:21) [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
decimal 80
dict 280
float 24
int 24
list 72
object 16
set 232
str 38
tuple 56
unicode 52

These answers all collect shallow size information. I suspect that visitors to this question will end up here looking to answer the question, "How big is this complex object in memory?"
There's a great answer here: https://goshippo.com/blog/measure-real-size-any-python-object/
The punchline:
import sys
def get_size(obj, seen=None):
"""Recursively finds size of objects"""
size = sys.getsizeof(obj)
if seen is None:
seen = set()
obj_id = id(obj)
if obj_id in seen:
return 0
# Important mark as seen *before* entering recursion to gracefully handle
# self-referential objects
seen.add(obj_id)
if isinstance(obj, dict):
size += sum([get_size(v, seen) for v in obj.values()])
size += sum([get_size(k, seen) for k in obj.keys()])
elif hasattr(obj, '__dict__'):
size += get_size(obj.__dict__, seen)
elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
size += sum([get_size(i, seen) for i in obj])
return size
Used like so:
In [1]: get_size(1)
Out[1]: 24
In [2]: get_size([1])
Out[2]: 104
In [3]: get_size([[1]])
Out[3]: 184
If you want to know Python's memory model more deeply, there's a great article here that has a similar "total size" snippet of code as part of a longer explanation: https://code.tutsplus.com/tutorials/understand-how-much-memory-your-python-objects-use--cms-25609

I've been happily using pympler for such tasks. It's compatible with many versions of Python -- the asizeof module in particular goes back to 2.2!
For example, using hughdbrown's example but with from pympler import asizeof at the start and print asizeof.asizeof(v) at the end, I see (system Python 2.5 on MacOSX 10.5):
$ python pymp.py
set 120
unicode 32
tuple 32
int 16
decimal 152
float 16
list 40
object 0
dict 144
str 32
Clearly there is some approximation here, but I've found it very useful for footprint analysis and tuning.

Try memory profiler.
memory profiler
Line # Mem usage Increment Line Contents
==============================================
3 #profile
4 5.97 MB 0.00 MB def my_func():
5 13.61 MB 7.64 MB a = [1] * (10 ** 6)
6 166.20 MB 152.59 MB b = [2] * (2 * 10 ** 7)
7 13.61 MB -152.59 MB del b
8 13.61 MB 0.00 MB return a

Also you can use guppy module.
>>> from guppy import hpy; hp=hpy()
>>> hp.heap()
Partition of a set of 25853 objects. Total size = 3320992 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 11731 45 929072 28 929072 28 str
1 5832 23 469760 14 1398832 42 tuple
2 324 1 277728 8 1676560 50 dict (no owner)
3 70 0 216976 7 1893536 57 dict of module
4 199 1 210856 6 2104392 63 dict of type
5 1627 6 208256 6 2312648 70 types.CodeType
6 1592 6 191040 6 2503688 75 function
7 199 1 177008 5 2680696 81 type
8 124 0 135328 4 2816024 85 dict of class
9 1045 4 83600 3 2899624 87 __builtin__.wrapper_descriptor
<90 more rows. Type e.g. '_.more' to view.>
And:
>>> hp.iso(1, [1], "1", (1,), {1:1}, None)
Partition of a set of 6 objects. Total size = 560 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 1 17 280 50 280 50 dict (no owner)
1 1 17 136 24 416 74 list
2 1 17 64 11 480 86 tuple
3 1 17 40 7 520 93 str
4 1 17 24 4 544 97 int
5 1 17 16 3 560 100 types.NoneType

When you use the dir([object]) built-in function, you can get the __sizeof__ of the built-in function.
>>> a = -1
>>> a.__sizeof__()
24

One can also make use of the tracemalloc module from the Python standard library. It seems to work well for objects whose class is implemented in C (unlike Pympler, for instance).

Related

Finding a memory leak in a very big project

I have a pretty big multithreading Python project that apparently has a memory leak somewhere. A DoctorThread shows me these (shortened) results:
Partition of a set of 418 objects. Total size = 96792 bytes.
Index Count % Size % Cumulative % Referrers by Kind (class / dict of class)
0 43 10 22792 24 22792 24 guppy.etc.Glue.Interface
1 66 16 18480 19 41272 43 dict of guppy.etc.Glue.Owner
2 25 6 18344 19 59616 62 dict of guppy.etc.Glue.Share
3 8 2 8384 9 68000 70 guppy.etc.Glue.Share
4 86 21 6696 7 74696 77 dict (no owner)
5 22 5 6160 6 80856 84 guppy.etc.Glue.Owner
6 37 9 2608 3 83464 86 dict (no owner), dict of guppy.etc.Glue.Interface
7 28 7 2464 3 85928 89 guppy.heapy.heapyc.HeapView
8 11 3 1840 2 87768 91 <Nothing>
9 2 0 1112 1 88880 92 __builtin__.cell
<24 more rows. Type e.g. '_.more' to view.>
Partition of a set of 23178 objects. Total size = 1604224 bytes.
Index Count % Size % Cumulative % Referrers by Kind (class / dict of class)
0 11135 48 801440 50 801440 50 list
1 11153 48 602408 38 1403848 88 tuple
[...]
<95 more rows. Type e.g. '_.more' to view.>
Partition of a set of 45140 objects. Total size = 2987568 bytes.
Index Count % Size % Cumulative % Referrers by Kind (class / dict of class)
0 22114 49 1591936 53 1591936 53 list
1 22133 49 1195328 40 2787264 93 tuple
[...]
<95 more rows. Type e.g. '_.more' to view.>
Partition of a set of 66115 objects. Total size = 4337720 bytes.
Index Count % Size % Cumulative % Referrers by Kind (class / dict of class)
0 32524 49 2341216 54 2341216 54 list
1 32513 49 1755848 40 4097064 94 tuple
[...]
<104 more rows. Type e.g. '_.more' to view.>
Partition of a set of 88355 objects. Total size = 5739128 bytes.
Index Count % Size % Cumulative % Referrers by Kind (class / dict of class)
0 43644 49 3141856 55 3141856 55 list
1 43633 49 2356328 41 5498184 96 tuple
[...]
<104 more rows. Type e.g. '_.more' to view.>
Partition of a set of 110380 objects. Total size = 7097992 bytes.
Index Count % Size % Cumulative % Referrers by Kind (class / dict of class)
0 54734 50 3940576 56 3940576 56 list
1 54753 50 2956808 42 6897384 97 tuple
[...]
<97 more rows. Type e.g. '_.more' to view.>
As you can see, the number of list and tuple referrers increases gradually. And it never stops increasing. These two entries are the only ones that increase constantly.
The DoctorThread class looks like this:
class DoctorThread(threading.Thread):
def __init__(self):
super(DoctorThread, self).__init__()
self.daemon = True
self.hp = guppy.hpy()
def run(self):
time.sleep(5)
logging.info("Doctor Thread started - taking heap snapshots")
before_heap = self.hp.heap()
while not PippinNetwork.is_shutdown():
gc.collect()
leftover = self.hp.heap() - before_heap
print(leftover.byrcs)
time.sleep(2.0)
Memory consumption is equally increasing. How can I find the culprit of this leak?
Update: solved.
In hindsight it was a rampant list.append((object, object)). The guppy results could have been interpreted like that.
In case of memory leak in big source codes you can use 'pympler' in python to track for the reference in future work.
from pympler import muppy
from pympler import summary
all_objects = muppy.get_objects()
sum1 = summary.summarize(all_objects)
summary.print_(sum1)
you can filter objects as well. Only include those where you have doubt then you can see which object or variable is causing memory leak then see the logic behind that.
I am suggesting this method cause in this way you don't have to cut down the logic (cause your project would have the huge code)
In case of clarification, you can ask in comment.

How can I print all rows using python's guppy

I am using python's guppy in order to see heap usage in a python program. I do:
h = hpy
hp = h.heap()
print hp
and this is the produced output:
Partition of a set of 339777 objects. Total size = 51680288 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 137974 41 17732032 34 17732032 34 str
1 93077 27 8342072 16 26074104 50 tuple
2 992 0 3428864 7 29502968 57 dict of module
3 23606 7 3021568 6 32524536 63 types.CodeType
4 23577 7 2829240 5 35353776 68 function
5 2815 1 2541648 5 37895424 73 type
6 2815 1 2513128 5 40408552 78 dict of type
7 2112 1 2067840 4 42476392 82 dict (no owner)
8 4495 1 1729792 3 44206184 86 unicode
9 4026 1 671376 1 44877560 87 list
<972 more rows. Type e.g. '_.more' to view.>
How can I print all rows?
Use all Method which is used to show all lines.
import decimal
from guppy import hpy
d = {
"int": 0,
"float": 0.0,
"dict": dict(),
"set": set(),
"tuple": tuple(),
"list": list(),
"str": "a",
"unicode": u"a",
"decimal": decimal.Decimal(0),
"object": object(),
}
hp = hpy()
heap = hp.heap()
print(heap.all)
I used code taken from this model and with the documentation I could get clear in my own mind what the various entities are.
The end result is that the following prints out the report for the whole of the heap, the same sort of report that you only get 10 lines at a time of normally:
h = hpy()
identity_set = h.heap()
stats = identity_set.stat
print()
print("Index Count Size Cumulative Size Object Name")
for row in stats.get_rows():
print("%5d %5d %8d %8d %30s"%(row.index, row.count, row.size, row.cumulsize, row.name))

How do I determine the memory usage of a python type?

Working with large datasets means worrying about memory usage. Is there a bulit-in function, neat hack or widely available package to determine the memory usage of a given type?
In the current case I am wondering how many bytes of memory a single pandas.timedelta Object requires, in order to determine how many of them I can reasonably load into local memory. A general method to determine the memory requirements of any type would be preferable, though.
this can be done by using python memory profiler
>>> from guppy import hpy; h=hpy()
>>> h.heap()
Partition of a set of 48477 objects. Total size = 3265516 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 25773 53 1612820 49 1612820 49 str
1 11699 24 483960 15 2096780 64 tuple
2 174 0 241584 7 2338364 72 dict of module
3 3478 7 222592 7 2560956 78 types.CodeType
4 3296 7 184576 6 2745532 84 function
5 401 1 175112 5 2920644 89 dict of class
6 108 0 81888 3 3002532 92 dict (no owner)
7 114 0 79632 2 3082164 94 dict of type
8 117 0 51336 2 3133500 96 type
9 667 1 24012 1 3157512 97 __builtin__.wrapper_descriptor
<76 more rows. Type e.g. '_.more' to view.>
>>> h.iso(1,[],{})
Partition of a set of 3 objects. Total size = 176 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 1 33 136 77 136 77 dict (no owner)
1 1 33 28 16 164 93 list
2 1 33 12 7 176 100 int
>>> x=[]
>>> h.iso(x).sp
0: h.Root.i0_modules['__main__'].__dict__['x']
>>>

Simple way to get current memory usage from Guppy

tl/dr: how do I get the current memory usage of my python program using Guppy? Is there a simple command?
I'm trying to track memory usage in a python program using guppy. This is my first usage of guppy, so I'm not very sure of how it behaves. What I want is
to be able to plot the total usage as "time" progresses in a simulation. This is a basic bit of code for what I can do:
from guppy import hpy
import networkx as nx
h = hpy()
L=[1,2,3]
h.heap()
> Partition of a set of 89849 objects. Total size = 12530016 bytes.
> Index Count % Size % Cumulative % Kind (class / dict of class)
> 0 40337 45 3638400 29 3638400 29 str
> 1 21681 24 1874216 15 5512616 44 tuple
> 2 1435 2 1262344 10 6774960 54 dict (no owner)
But I would like to just know what the current size is (the 12530016 bytes). So I'd like to be able to call something like h.total() to get the total size. I'd be shocked if this doesn't exist as a simple command, but so far, looking through the documentation I haven't found it. It's probably documented, just not where I'm looking.
x = h.heap()
x.size
returns the total size. For example:
from guppy import hpy
import networkx as nx
h = hpy()
num_nodes = 1000
num_edges = 5000
G = nx.gnm_random_graph(num_nodes, num_edges)
x = h.heap()
print(x.size)
prints
19820968
which is consistent with the Total size reported by
print(x)
# Partition of a set of 118369 objects. Total size = 19820904 bytes.
# Index Count % Size % Cumulative % Kind (class / dict of class)
# 0 51057 43 6905536 35 6905536 35 str
# 1 7726 7 3683536 19 10589072 53 dict (no owner)
# 2 28416 24 2523064 13 13112136 66 tuple
# 3 516 0 1641312 8 14753448 74 dict of module
# 4 7446 6 953088 5 15706536 79 types.CodeType
# 5 6950 6 834000 4 16540536 83 function
# 6 584 0 628160 3 17168696 87 dict of type
# 7 584 0 523144 3 17691840 89 type
# 8 169 0 461696 2 18153536 92 unicode
# 9 174 0 181584 1 18335120 93 dict of class
# <235 more rows. Type e.g. '_.more' to view.>

Way to find the total Memory Used by a program in Python Kivy

Is there a way yo find total memory used by a process and whole program in Python Kivy .
I.e. some way by which i can find out :
total memory used by the program?
objects active and using how much memory ?
Thanks!
Heapy is a memory profiler for Python. Use it like this:
>>> from guppy import hpy
>>> h = hpy()
>>> h.heap()
The output will be something like this:
Partition of a set of 1449133 objects. Total size = 102766644 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 985931 68 46300932 45 46300932 45 str
1 24681 2 22311624 22 68612556 67 dict of pkgcore.ebuild.ebuild_src.package
2 49391 3 21311864 21 89924420 88 dict (no owner)
3 115974 8 3776948 4 93701368 91 tuple
4 152181 11 3043616 3 96744984 94 long
5 36009 2 1584396 2 98329380 96 weakref.KeyedRef
6 11328 1 1540608 1 99869988 97 dict of pkgcore.ebuild.ebuild_src.ThrowAwayNameSpace
7 24702 2 889272 1 100759260 98 types.MethodType
8 11424 1 851840 1 101611100 99 list
9 24681 2 691068 1 102302168 100 pkgcore.ebuild.ebuild_src.package
<54 more rows. Type e.g. '_.more' to view.>
This is "basically a snapshot of what's reachable in ram". I didn't do much Kivy development so I never got around to profiling, but I think this should work.
See:
Getting started with Heapy
How to use guppy/heapy for tracking down memory usage

Categories