I'm trying to get the size of a Numba typed dictionary in bytes:
from numba import njit
from numba.typed import Dict
from sys import getsizeof as gso
@njit
def get_dict(n):
    d = {0: 0}
    for i in range(1, n):
        d[i] = i
    return d

print(gso(get_dict(10)))
print(gso(get_dict(10000)))
In both cases the getsizeof function returns 64 bytes. Obviously, the size of the dictionary must depend on its length.
If I convert the typed dictionary into a native Python dictionary using dict(), it works and returns 376 and 295016:
print(gso(dict(get_dict(10))))
print(gso(dict(get_dict(10000))))
How can I measure it?
Currently (numba 0.46.0) what you intended to do is very likely not possible with a DictType.
sys.getsizeof on containers is tricky at best and very misleading in the worst case. The problem is that getsizeof needs to reduce possibly very complicated implementation details to one integer number.
The first problem is that calling sys.getsizeof on a container typically only reports the size of the container, not the contents of the container - or in the case of an opaque container it only returns the size of the wrapper. What you encountered was the latter - DictType is just a wrapper for an opaque numba struct that's defined (probably) in C. So the 64 you're seeing is actually correct; that's the size of the wrapper. You can probably access the wrapped type, but given that it's hidden behind private attributes it's not intended for non-numba code, so I won't go down that path - mostly because any answer relying on those kinds of implementation details may be outdated at any time.
However, sys.getsizeof requires a deep understanding of the implementation details to be interpreted correctly. So even for a plain dict it's not obvious what the number represents. It's certainly the calculated memory in bytes of the container (without the contents), but it could also be a key-sharing dictionary (in your case it's not a key-sharing dictionary), where the number would be accurate but, since parts of a key-sharing dictionary are shared, it's probably not the number you are looking for. As mentioned, it normally also doesn't account for the contents of the container, but that's an implementation detail as well; for example numpy.array includes the size of the contents, while list, set, etc. don't. That's because a numpy array doesn't have "real contents" - at least it doesn't have Python objects as contents.
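To illustrate the container-versus-contents point, here's a minimal sketch (the exact numbers depend on the Python version and platform):

import sys
import numpy as np

# A list only counts its own object (essentially an array of pointers),
# not the objects it points to.
lst = list(range(10_000))
print(sys.getsizeof(lst))                     # the list "shell" only
print(sum(sys.getsizeof(x) for x in lst))     # the int objects it refers to

# A numpy array reports its data buffer as part of its size, because it
# stores raw values instead of references to Python objects.
arr = np.arange(10_000)
print(sys.getsizeof(arr))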
So even if the DictType wrapper reported the size of the underlying dictionary, you would still need to know whether it stores the contents inside the dictionary, as pointers to Python objects, or (even more convoluted) as objects defined in another language (like C or C++) to interpret the results correctly.
So my advice would be to not use sys.getsizeof, except out of curiosity or academic interest - and then only if you're willing to dig through all the implementation details (which may or may not change at any time) in order to interpret the results correctly. If you're really interested in memory consumption, you are often better off using a tool that tracks the memory usage of your program as a whole. That still has a lot of pitfalls (memory reuse, unused memory allocations) and requires a significant amount of knowledge about how memory is used to interpret correctly (virtual memory, shared memory, and how memory is allocated), but it often yields a more realistic view of how much memory your program actually uses.
import gc
import numba as nb
import psutil
@nb.njit
def get_dict(n):
    d = {0: 0}
    for i in range(1, n):
        d[i] = i
    return d
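# call once up front so numba's JIT compilation (which itself allocates
# memory) happens before the baseline measurement below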
get_dict(1)
gc.collect()
print(psutil.Process().memory_info())
d = get_dict(100_000)
gc.collect()
print(psutil.Process().memory_info())
Which gives on my computer:
pmem(rss=120696832, vms=100913152, num_page_faults=34254, peak_wset=120700928,
wset=120696832, peak_paged_pool=724280, paged_pool=724280, peak_nonpaged_pool=1255376,
nonpaged_pool=110224, pagefile=100913152, peak_pagefile=100913152, private=100913152)
pmem(rss=126820352, vms=107073536, num_page_faults=36717, peak_wset=129449984,
wset=126820352, peak_paged_pool=724280, paged_pool=724280, peak_nonpaged_pool=1255376,
nonpaged_pool=110216, pagefile=107073536, peak_pagefile=109703168, private=107073536)
Which shows that the program needed 6 123 520 bytes more memory (using "rss", the resident set size) after the call than it had allocated before.
And similar for a plain Python dictionary:
import gc
import psutil
gc.collect()
print(psutil.Process().memory_info())
d = {i: i for i in range(100_000)}
gc.collect()
print(psutil.Process().memory_info())
del d
gc.collect()
Which gave a difference of 8 552 448 bytes on my computer.
Note that these numbers represent the complete process, so treat them with caution. For example, for small values (get_dict(10)) they return 4096 on my Windows computer, because that's the page size of Windows. The program there actually allocated more space than the dictionary needs because of OS restrictions.
However, even with these pitfalls and restrictions, this is still significantly more accurate if you're interested in the memory requirements of your program.
If you still (out of curiosity) want to know how much memory the DictType theoretically needs, you should probably ask the numba developers to enhance numba by implementing __sizeof__ for their Python wrappers, so that the numbers are more representative. You could, for example, open an issue on their issue tracker or ask on their mailing list.
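As an illustration of why __sizeof__ would help: sys.getsizeof delegates to an object's __sizeof__ (plus GC overhead), so a wrapper can report the memory it holds on the other side of the fence. A minimal sketch - Wrapper and _payload_bytes are made up here, this is not numba's API:

import sys

class Wrapper:
    def __init__(self, payload_bytes):
        # pretend this is the size of memory held by an opaque C struct
        self._payload_bytes = payload_bytes

    def __sizeof__(self):
        # report the Python wrapper itself plus the wrapped allocation
        return object.__sizeof__(self) + self._payload_bytes

print(sys.getsizeof(Wrapper(0)))        # just the wrapper
print(sys.getsizeof(Wrapper(100_000)))  # wrapper + reported payload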
An alternative may be to use other third-party tools, for example pympler, but I haven't used these myself, so I don't know if they work in this case.
I'm looking for an example that purposely makes a memory leak in Python.
It should be as short and simple as possible and ideally not use non-standard dependencies (that could simply do the memory leak in C code) or multi-threading/processing.
I've seen memory leaks achieved before but only when bad things were being done to libraries such as matplotlib. Also, there are many questions about how to find and fix memory leaks in Python, but they all seem to be big programs with lots of external dependencies.
The reason for asking this is to find out how good Python's GC really is. I know it detects reference cycles. However, can it be tricked? Is there some way to leak memory? It may be impossible to solve the most restrictive version of this problem. In that case, I'm very happy to see a rigorous argument for why. Ideally, the answer should refer to the actual implementation and not just state that "an ideal garbage collector would be ideal and disallow memory leaks".
For nitpicking purposes: An ideal solution to the problem would be a program like this:
# Use Python version at least v3.10
# May use imports.
# Bonus points for only standard library.
# If the problem is unsolvable otherwise (please argue that it is),
# then you may use e.g. Numpy, Scipy, Pandas. Minus points for Matplotlib.
def memleak():
    # do whatever you want but only within this function
    # No global variables!
    # Bonus points for no observable side-effects (besides memory use)
    ...

for _ in range(100):
    memleak()
The function must return and be called multiple times. Goals, in order of bonus points (higher number = more bonus points):
1. The program keeps using more memory until it crashes.
2. After calling the function multiple times (e.g. the 100 specified above), the program may continue doing other (normal) things such that the memory leaked during the function is never freed.
3. Like 2, but the memory cannot be freed, even by calling gc manually and similar means.
One way to "trick" CPython's garbage collector into leaking memory is by invalidating an object's reference count. We can do this by creating an extraneous strong reference that never gets deleted.
To create a new strong reference, we need to invoke Py_IncRef (or Py_NewRef) from Python's C API. This can be done via ctypes.pythonapi:
import ctypes
import sys
# Create C API callable
inc_ref = ctypes.pythonapi.Py_IncRef
inc_ref.argtypes = [ctypes.py_object]
inc_ref.restype = None
# Create an arbitrary object.
obj = object()
# Print the number of references to obj.
# This should be 2:
# - one for the global variable 'obj'
# - one for the argument inside of 'sys.getrefcount'
print(sys.getrefcount(obj))
# Create a new strong reference.
inc_ref(obj)
# Print the number of references to obj.
# This should be 3 now.
print(sys.getrefcount(obj))
outputs
2
3
Concretely, you can write your memleak function as
import ctypes

def memleak():
    # Create C API callable
    inc_ref = ctypes.pythonapi.Py_IncRef
    inc_ref.argtypes = [ctypes.py_object]
    inc_ref.restype = None

    # Allocate a large object
    obj = list(range(10_000_000))

    # Increment its ref count
    inc_ref(obj)

    # obj will have a dangling reference after this function exits

memleak()  # leaks memory
An object with a dangling strong reference will never be freed by reference counting, and won't be detected as an unreachable object by the optional garbage collector. Running gc manually via
gc.collect()
will have no effect.
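For completeness: since nothing in the interpreter will ever drop that extra reference for you, one way to undo the leak is to decrement the count by hand with the matching Py_DecRef call. A small sketch:

import ctypes
import sys

inc_ref = ctypes.pythonapi.Py_IncRef
inc_ref.argtypes = [ctypes.py_object]
inc_ref.restype = None

dec_ref = ctypes.pythonapi.Py_DecRef
dec_ref.argtypes = [ctypes.py_object]
dec_ref.restype = None

obj = object()
inc_ref(obj)                 # artificially raise the reference count
print(sys.getrefcount(obj))  # 3
dec_ref(obj)                 # undo it; normal reference counting applies again
print(sys.getrefcount(obj))  # 2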
The goal is to simulate a high-radiation environment.
Normally, code like the following:
a = 5
print(a)
print(a)
would print:
5
5
I want to be able to change the underlying byte representation of a randomly during runtime (according to some predefined function that takes a seed). In that case, the following code:
a = RandomlyChangingInteger(5)
print(a)
print(a)
could result in:
4
2
One way this can be done for languages like C and C++ is to insert extra instructions that could potentially modify a, before every usage of a in the compiled code.
Something like BITFLIPS (which uses valgrind) is what I'm thinking about.
Is this even possible in Python?
You can do it, sort of. The built-in int is immutable, therefore you cannot modify its value. You can, however, create a custom class that emulates an int:
import random

class RandomlyChangingInteger(object):
    def __int__(self):
        return random.randint(0, 10)

    def __str__(self):
        return str(self.__int__())
then
a = RandomlyChangingInteger()
print(a)
print(a)
should print something like
4
5
Note that you can't use this class to do math as it stands. You must implement other int methods (such as __add__, __mul__, etc.) first.
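For example, a sketch of what a couple of those extra methods could look like (far from a complete int emulation; every operation just draws a fresh random value):

import random

class RandomlyChangingInteger(object):
    def __int__(self):
        return random.randint(0, 10)

    # __index__ lets the object be used where a real int is required
    # (indexing, range(), etc.)
    def __index__(self):
        return self.__int__()

    def __str__(self):
        return str(self.__int__())

    def __add__(self, other):
        return self.__int__() + int(other)

    __radd__ = __add__

    def __mul__(self, other):
        return self.__int__() * int(other)

    __rmul__ = __mul__

a = RandomlyChangingInteger()
print(a + 1)   # a different (random) result on each use
print(a * 2)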
You're trying to simulate radiation-induced bitflips, but your expectations of what that would do are way off target. Radiation effects are much more likely to crash a Python program than they are to change an object's value to another valid value. This makes simulating radiation effects not very useful.
The CPython implementation relies on so many pointers and so much indirection that after a few bit flips in your data, at least one of them is almost certain to hit something that causes a crash. Perhaps corrupting an object's type pointer, causing a bad memory access the next time you try to do almost anything with the object, or perhaps corrupting a reference count, causing an object to be freed while still in use. Maybe corrupting the length of an int (Python ints are variable-width), causing Python to try to read past the end of the allocation.
A C array of ints might just be a giant block of numerical data, in which random bit corruption could be detected or managed; a Python list of ints, by contrast, is mostly pointers and other metadata.
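To make that concrete, here's a small comparison (exact numbers vary by platform and Python version) between a plain list of ints and a stdlib array.array, which stores raw machine integers rather than object pointers:

import sys
from array import array

n = 10_000
as_list = list(range(n))
as_array = array('q', range(n))   # raw signed 64-bit ints, no per-element objects

# the list holds pointers, and every element is a separate Python int object
print(sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list))
# the array holds the numerical data itself
print(sys.getsizeof(as_array))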
If you really want to simulate random bit flips, the best way to go would likely be to rebuild CPython with a tool like the BITFLIPS thing you linked.
In C, a statement like x = x + 1 changes the content of the memory that is allocated for x. But in Python, since a variable can have different types, the x on the left and right side of = may refer to objects of different types, which means they may refer to different pieces of memory. If so, after x changes its reference from the old memory to the new memory, the old memory can be reclaimed by the garbage collection mechanism. If that is the case, the following code may trigger the garbage collection process many times and thus be very inefficient:
for i in range(1000000000):
    i = i + 1
Is my guess correct?
Update:
I need to correct the typo in the code to make the question clearer:
x = 0
for i in range(1000000000):
    x = x + 1
@SvenMarnach, do you mean the integers 0, 1, 2, ..., 999999999 (which the label x once referred to) all exist in memory if garbage collection is not activated?
id can be used to track the 'allocation' of memory to objects. It should be used with caution, but here I think it's illuminating. id is a bit like a C pointer - that is, somehow related to 'where' the object is located in memory.
In [18]: for i in range(0,1000,100):
...: print(i,id(i))
...: i = i+1
...: print(i,id(i))
...:
0 10914464
1 10914496
100 10917664
101 10917696
200 10920864
201 10920896
300 140186080959760
301 140185597404720
400 140186080959760
401 140185597404720
...
900 140186080959760
901 140185597404720
In [19]: id(1)
Out[19]: 10914496
Small integers (from -5 to 256) are cached - that is, integer 1, once created, is 'reused'.
In [20]: id(202)
Out[20]: 10920928 # same id as in the loop
In [21]: id(302)
Out[21]: 140185451618128 # different id
In [22]: id(901)
Out[22]: 140185597404208
In [23]: id(i)
Out[23]: 140185597404720 # = 901, but different id
In this loop, the first few iterations create or reuse small integers. But it appears that when creating larger integers, it is 'reusing' memory. It may not be full blown garbage collection, but the code is somehow optimized to avoid unnecessary memory use.
Generally, Python programmers don't focus on those details. Write clean, reliable Python code. In this example, modifying an iteration variable in the loop is poor practice (even if it is just an example).
You are mostly correct, though I think a few clarifications may help.
First, the concept of variables in C and in Python is rather different. In C, a variable generally references a fixed location in memory, as you stated yourself. In Python, a variable is just a label that can be attached to any object. An object could have multiple such labels, or none at all, and labels can be freely moved between objects. An assignment in C copies a new value to a memory location, while an assignment in Python attaches a new label to an object.
Integers are also very different in the two languages. In C, an integer has a fixed size and stores an integer value in a format native to the hardware. In Python, integers have arbitrary precision. They are stored as an array of "digits" (usually 30-bit integers in CPython) together with a Python type header storing type information. Bigger integers will occupy more memory than smaller integers.
Moreover, integer objects in Python are immutable – they can't be changed once created. This means every arithmetic operation creates a new integer object. So the loop in your code indeed creates a new integer object in each iteration.
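You can see this by checking the object identity before and after an increment; a quick sketch (using a value large enough not to come from the small-integer cache):

x = 10**20
print(id(x))   # identity of the original int object
x = x + 1      # builds a brand-new int object and rebinds the name x to it
print(id(x))   # a different identity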
However, this isn't the only overhead. It also creates a new integer object for i in each iteration, which is dropped at the end of the loop body. And the arithmetic operation is dynamic – Python needs to look up the type of x and its __add__() method in each iteration to figure out how to add objects of this type. And function call overhead in Python is rather high.
Garbage collection and memory allocation on the other hand are rather fast in CPython. Garbage collection for integers relies completely on reference counting (no reference cycles possible here), which is fast. And for allocation, CPython uses an arena allocator for small objects that can quickly reuse memory slots without calling the system allocator.
So in summary, yes, compared to the same code in C, this code will run awfully slow in Python. A modern C compiler would simply compute the result of this loop at compile time and load the result to a register, so it would finish basically immediately. If raw speed for integer arithmetic is what you want, don't write that code in Python.
I am trying to write a Python module which checks the consistency of the MAC addresses stored in the HW memory. The scale could go up to 80K MAC addresses. But when I make multiple calls to get a list of MAC addresses through a Python method, the memory does not get freed up and eventually I am running out of memory.
An example of what I am doing is:
import resource
import copy
def get_list():
    list1 = None
    list1 = []
    for j in range(1, 10):
        for i in range(0, 1000000):
            list1.append('abcdefg')
        print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000)
    return list1

for i in range(0, 5):
    x = get_list()
On executing the script, I get:
45805
53805
61804
69804
77803
85803
93802
101801
109805
118075
126074
134074
142073
150073
158072
166072
174071
182075
190361
198361
206360
214360
222359
230359
238358
246358
254361
262365
270364
278364
286363
294363
302362
310362
318361
326365
334368
342368
350367
358367
366366
374366
382365
390365
398368
i.e. the memory usage reported keeps going up.
Is it that I am looking at the memory usage in a wrong way?
And if not, is there a way to not have the memory usage go up between function calls in a loop? (In my case with MAC addresses, I do not request the same list of MAC addresses again. I get the list from a different section of the HW memory, i.e. all the calls to get MAC addresses are valid, but after each call the data obtained is useless and can be discarded.)
Python is a managed language. Memory is, generally speaking, the concern of the implementation rather than the average developer. The system is designed to reclaim memory that you are no longer using automatically.
If you are using CPython, an object will be destroyed when its reference count reaches zero, or when the cyclic garbage collector finds and collects it. If you want to reclaim the memory belonging to an object, you need to ensure that no references to it remain, or at least that it is not reachable from any stack frame's variables. That is to say, it should not be possible to refer to the data you want reclaimed, either directly or through some expression such as foo.bar[42], from any currently executing function.
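One way to observe this reclamation directly is with a weak reference, which lets you check whether an object is still alive without keeping it alive (Payload here is just a made-up stand-in for a big chunk of data):

import weakref

class Payload:
    def __init__(self):
        self.items = ['abcdefg'] * 1_000_000

data = Payload()
probe = weakref.ref(data)   # does not count as a (strong) reference

del data                    # remove the only strong reference
print(probe() is None)      # True: the object was reclaimed by reference counting alone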
If you are using another implementation, such as PyPy, the rules may vary. In particular, reference counting is not required by the Python language standard, so objects may not go away until the next garbage collection run (and then you may have to wait for the right generation to be collected).
For older versions of Python (prior to Python 3.4), you also need to worry about reference cycles which involve finalizers (__del__() methods). The old garbage collector cannot collect such cycles, so they will (basically) get leaked. Most built-in types do not have finalizers, are not capable of participating in reference cycles, or both, but this is a legitimate concern if you are creating your own classes.
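For illustration, this is the kind of cycle that was problematic on those older versions (on Python 3.4+, PEP 442 lets the collector handle it and gc.garbage stays empty):

import gc

class Node:
    def __init__(self):
        self.other = None
    def __del__(self):
        pass   # the mere presence of a finalizer blocked cycle collection pre-3.4

a, b = Node(), Node()
a.other, b.other = b, a   # reference cycle
del a, b

gc.collect()
print(gc.garbage)         # [] on modern Python; the uncollectable objects on pre-3.4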
For your use case, you should empty or replace the list when you no longer need its contents (with e.g. list1 = [] or del list1[:]), or return from the function which created it (assuming it's a local variable, rather than a global variable or some other such thing). If you find that you are still running out of memory after that, you should either switch to a lower-overhead language like C or invest in more memory. For more complicated cases, you can use the gc module to test and evaluate how the garbage collector is interacting with your program.
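For the loop in the question, that can be as simple as dropping the name before the next batch is fetched, so the previous list does not stay alive while the new one is being built (check_consistency is hypothetical, standing in for whatever you do with the data):

for i in range(0, 5):
    x = get_list()
    check_consistency(x)   # hypothetical processing step
    del x                  # drop the last reference; the list can be freed right away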
Try this: it might not always free the memory, as it may still be in use.
See if it works
gc.collect()
The question arose when answering another SO question (there).
When I iterate several times over a Python set (without changing it between calls), can I assume it will always return elements in the same order? And if not, what is the rationale for changing the order? Is it deterministic, or random? Or implementation defined?
And when I call the same Python program repeatedly (not random, not input dependent), will I get the same ordering for sets?
The underlying question is whether Python set iteration order depends only on the algorithm used to implement sets, or also on the execution context.
There's no formal guarantee about the stability of sets. However, in the CPython implementation, as long as nothing changes the set, the items will be produced in the same order. Sets are implemented as open-addressing hashtables (with a prime probe), so inserting or removing items can completely change the order (in particular, when that triggers a resize, which reorganizes how the items are laid out in memory.) You can also have two identical sets that nonetheless produce the items in different order, for example:
>>> s1 = {-1, -2}
>>> s2 = {-2, -1}
>>> s1 == s2
True
>>> list(s1), list(s2)
([-1, -2], [-2, -1])
Unless you're very certain you have the same set and nothing touched it in between the two iterations, it's best not to rely on it staying the same. Making seemingly irrelevant changes to, say, functions you call in between could produce very hard-to-find bugs.
A set or frozenset is inherently an unordered collection. Internally, sets are based on a hash table, and the order of keys depends both on the insertion order and on the hash algorithm. In CPython (aka standard Python), integers less than the machine word size (32 bit or 64 bit) hash to themselves, but text strings, bytes strings, and datetime objects hash to integers that vary randomly; you can control that by setting the PYTHONHASHSEED environment variable.
From the __hash__ docs:
Note
By default, the __hash__() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.
This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.
Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).
See also PYTHONHASHSEED.
The results of hashing objects of other classes depend on the details of the class's __hash__ method.
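For instance, a small sketch of the difference between value-based hashing and the default identity-based hashing (the Point class is made up for illustration):

class Point:
    def __init__(self, x):
        self.x = x
    def __hash__(self):
        return hash(self.x)        # ints hash to themselves, so this is deterministic
    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x
    def __repr__(self):
        return f"Point({self.x})"

# With value-based hashes the iteration order should be reproducible from run
# to run (on the same build); a class without __hash__/__eq__ falls back to
# id()-based hashing, so the order can change between runs of the program.
print(list({Point(1), Point(2), Point(300)}))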
The upshot of all this is that you can have two sets containing identical strings but when you convert them to lists they can compare unequal. Or they may not. ;) Here's some code that demonstrates this. On some runs, it will just loop, not printing anything, but on other runs it will quickly find a set that uses a different order to the original.
from random import seed, shuffle
seed(42)
data = list('abcdefgh')
a = frozenset(data)
la = list(a)
print(''.join(la), a)
while True:
    shuffle(data)
    lb = list(frozenset(data))
    if lb != la:
        print(''.join(data), ''.join(lb))
        break
typical output
dachbgef frozenset({'d', 'a', 'c', 'h', 'b', 'g', 'e', 'f'})
deghcfab dahcbgef
And when I call the same Python program repeatedly (not random, not input dependent), will I get the same ordering for sets?
I can answer this part of the question now after a quick experiment. Using the following code:
class Foo(object):
    def __init__(self, val):
        self.val = val
    def __repr__(self):
        return str(self.val)

x = set()
for y in range(500):
    x.add(Foo(y))
print list(x)[-10:]
I can trigger the behaviour that I was asking about in the other question. If I run this repeatedly then the output changes, but not on every run. It seems to be "weakly random" in that it changes slowly. This is certainly implementation dependent, so I should say that I'm running the MacPorts Python 2.6 on Snow Leopard. While the program will output the same answer for long runs of time, doing something that affects the system entropy pool (writing to the disk mostly works) will sometimes kick it into a different output.
The class Foo is just a simple int wrapper as experiments show that this doesn't happen with sets of ints. I think that the problem is caused by the lack of __eq__ and __hash__ members for the object, although I would dearly love to know the underlying explanation / ways to avoid it. Also useful would be some way to reproduce / repeat a "bad" run. Does anyone know what seed it uses, or how I could set that seed?
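One way to avoid the address-dependent ordering is to hash by value instead of relying on the default id()-based hash; a sketch (written for Python 3):

class Foo(object):
    def __init__(self, val):
        self.val = val
    def __repr__(self):
        return str(self.val)
    def __hash__(self):
        return hash(self.val)
    def __eq__(self, other):
        return isinstance(other, Foo) and self.val == other.val

x = set()
for y in range(500):
    x.add(Foo(y))
print(list(x)[-10:])   # order now follows the integer hashes, not object addresses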
It’s definitely implementation defined. The specification of a set says only that
Being an unordered collection, sets do not record element position or order of insertion.
Why not use OrderedDict to create your own OrderedSet class?
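A minimal sketch of what such a class could look like (on Python 3.7+ a plain dict preserves insertion order as well, so OrderedDict is mostly a matter of taste here):

from collections import OrderedDict

class OrderedSet:
    # a set-like container that remembers insertion order
    def __init__(self, iterable=()):
        self._data = OrderedDict.fromkeys(iterable)

    def add(self, item):
        self._data[item] = None

    def discard(self, item):
        self._data.pop(item, None)

    def __contains__(self, item):
        return item in self._data

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

s = OrderedSet(['b', 'a', 'c', 'a'])
print(list(s))   # ['b', 'a', 'c'] - duplicates removed, insertion order kept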
The answer is simply a NO.
Python set ordering is NOT stable.
I did a simple experiment to show this.
The code:
import random
random.seed(1)

x = []

class aaa(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

for i in range(5):
    x.append(aaa(random.choice('asf'), random.randint(1, 4000)))

for j in x:
    print(j.a, j.b)

print('====')

for j in set(x):
    print(j.a, j.b)
Run this twice, and you will get this:
First time result:
a 2332
a 1045
a 2030
s 1935
f 1555
====
a 2030
a 2332
f 1555
a 1045
s 1935
Process finished with exit code 0
Second time result:
a 2332
a 1045
a 2030
s 1935
f 1555
====
s 1935
a 2332
a 1045
f 1555
a 2030
Process finished with exit code 0
The reason is explained in comments in this answer.
However, there are some ways to make it stable:
set PYTHONHASHSEED to 0, see details here, here and here.
Use OrderedDict instead.
As pointed out, this is strictly an implementation detail.
But as long as you don’t change the structure between calls, there should be no reason for a read-only operation (= iteration) to change with time: no sane implementation does that. Even randomized (= non-deterministic) data structures that can be used to implement sets (e.g. skip lists) don’t change the reading order when no changes occur.
So, being rational, you can safely rely on this behaviour.
(I’m aware that certain GCs may reorder memory in a background thread but even this reordering will not be noticeable on the level of data structures, unless a bug occurs.)
The definition of a set is unordered, unique elements ("Unordered collections of unique elements"). You should care only about the interface, not the implementation. If you want an ordered enumeration, you should probably put it into a list and sort it.
There are many different implementations of Python. Don't rely on undocumented behaviour, as your code could break on different Python implementations.