What is the purpose of deepcopy's second parameter, memo? - python

from copy import*
print c
#{'a': 'aaa'}
print b
print c
# print {'a': 'aaa', 10310992: 3, 10310980: 4, 10311016: 1, 11588784: [1, 2, 3, 4, [1, 2, 3, 4]], 11566456: [1, 2, 3, 4], 10311004: 2}
Why does c print that?
Please try to use the code, rather than text, because my English is not very good, thank you
in django.utils.tree.py
def __deepcopy__(self, memodict):
    """
    Utility method used by copy.deepcopy().
    """
    obj = Node(connector=self.connector, negated=self.negated)
    obj.__class__ = self.__class__
    obj.children = deepcopy(self.children, memodict)
    obj.subtree_parents = deepcopy(self.subtree_parents, memodict)
    return obj
import copy
memo = {}
x1 = range(5)
y1 = copy.deepcopy(x1, memo)
y2=copy.deepcopy(x2, memo)
print memo
print id(y1),id(y2),id(y3)
print y1,y2,y3
print memo
print :
{10310992: 3, 10310980: 4, 10311016: 1, 11588784: [0, 1, 2, 3, 4, [0, 1, 2, 3, 4]], 10311028: 0, 11566456: [0, 1, 2, 3, 4], 10311004: 2}
{11572448: [6, 7, 8], 10310992: 3, 10310980: 4, 10311016: 1, 11572368: [2, 3, 4, 11], 10310956: 6, 10310896: 11, 10310944: 7, 11588784: [0, 1, 2, 3, 4, [0, 1, 2, 3, 4], 6, 7, 8, [6, 7, 8], 11, [2, 3, 4, 11]], 10311028: 0, 11566456: [0, 1, 2, 3, 4], 10310932: 8, 10311004: 2}
11572408 11581280 11580960
['www', 1, 2, 3, 4] [6, 7, 8] [2, 3, 4, 11]
{11572448: [6, 7, 8], 10310992: 3, 10310980: 4, 10311016: 1, 11572368: [2, 3, 4, 11], 10310956: 6, 10310896: 11, 10310944: 7, 11588784: [0, 1, 2, 3, 4, [0, 1, 2, 3, 4], 6, 7, 8, [6, 7, 8], 11, [2, 3, 4, 11]], 10311028: 0, 11566456: ['www', 1, 2, 3, 4], 10310932: 8, 10311004: 2}

No one above gave a good example of how to use it.
Here's what I do:
def __deepcopy__(self, memo):
    copy = type(self)()
    memo[id(self)] = copy
    copy._member1 = self._member1
    copy._member2 = deepcopy(self._member2, memo)
    return copy
Where member1 is an object not requiring deepcopy (like a string or integer), and member2 is one that does, like another custom type or a list or dict.
I've used the above code on highly tangled object graphs and it works very well.
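To make the pattern concrete, here is a minimal sketch on a two-node cycle. The Node class and its name / neighbors attributes are hypothetical, invented only for this illustration. Registering the copy in memo before recursing is what lets deepcopy resolve the back-reference instead of recursing forever.

```python
from copy import deepcopy


class Node:
    """Hypothetical graph node used only for this illustration."""

    def __init__(self, name=""):
        self.name = name
        self.neighbors = []

    def __deepcopy__(self, memo):
        clone = type(self)()
        # Register the clone before copying members, so that any
        # reference back to self found during recursion maps to clone.
        memo[id(self)] = clone
        clone.name = self.name  # immutable, no deep copy needed
        clone.neighbors = deepcopy(self.neighbors, memo)
        return clone


a = Node("a")
b = Node("b")
a.neighbors.append(b)
b.neighbors.append(a)  # cycle: a -> b -> a

a2 = deepcopy(a)
b2 = a2.neighbors[0]
print(b2.neighbors[0] is a2)  # the cycle is reproduced among the copies
```

The copies form their own cycle and do not point back at the originals.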
If you also want to make your classes pickleable (for file save / load), there is no analogous memo parameter for __getstate__ / __setstate__. The pickle system keeps track of already referenced objects by itself, so you don't need to worry about it.
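A small sketch of that behaviour, using only the standard pickle module. The names shared and graph are made up for the example. A shared list and a self-referencing dict both survive a round trip with their identity relationships intact.

```python
import pickle

shared = [1, 2, 3]
graph = {"first": shared, "second": shared}
graph["self"] = graph  # the dict refers to itself

restored = pickle.loads(pickle.dumps(graph))

# Both keys still point at one list object, and the self-reference
# points at the restored dict rather than at the original.
print(restored["first"] is restored["second"])
print(restored["self"] is restored)
```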
The above works on PyQt5 classes that you inherit from (as well as pickling - for instance I can deepcopy or pickle a custom QMainWindow, QWidget, QGraphicsItem, etc.)
If there is some initialization code in your constructor that creates new objects, for instance a CustomWidget(QWidget) that creates a new CustomScene(QGraphicsScene), but you'd like to pickle or copy the scene from one CustomWidget to a new one, then one way is to make a new=True parameter in your __init__ and say:
def __init__(..., new=True):
    if new:
        self._scene = CustomScene()

def __deepcopy__(self, memo):
    copy = type(self)(..., new=False)
    copy._scene = deepcopy(self._scene, memo)
    return copy
That ensures you don't create a CustomScene (or some big class that does a lot of initializing) twice! You also should use the same setting (new=False) in your __setstate__ method, eg.:
def __setstate__(self, data):
    self.__init__(...., new=False)
    self._member1 = data['member 1']
There are other ways to get around the above, but this is the one I converged to and use frequently.
Why did I talk about pickling as well? Because you will typically want both in any application, and you maintain them at the same time. If you add a member to your class, you add it to the setstate, getstate, and deepcopy code. I would make it a rule that for any new class you make, you create the above three methods if you plan on doing copy / paste and file save / load in your app. The alternative is JSON and doing the saving / loading yourself, but then there's a lot more work for you to do, including memoization.
So to support all the above, you need __deepcopy__, __setstate__, and __getstate__ methods and to import deepcopy:
from copy import deepcopy
When you write your pickle loader / saver functions (where you call pickle.load() / pickle.dump() to load / save your object hierarchy / graph), do import _pickle as pickle for the best speed (_pickle is a faster C implementation that is usually compatible with your app's requirements). In Python 3, the standard pickle module already uses this C implementation automatically when it is available.

It's the memo dict, where id-to-object correspondence is kept to reconstruct complex object graphs perfectly. Hard to "use the code", but, let's try:
>>> import copy
>>> memo = {}
>>> x = range(5)
>>> y = copy.deepcopy(x, memo)
>>> memo
{399680: [0, 1, 2, 3, 4], 16790896: 3, 16790884: 4, 16790920: 1,
438608: [0, 1, 2, 3, 4, [0, 1, 2, 3, 4]], 16790932: 0, 16790908: 2}
>>> id(x)
399680
>>> for j in x: print j, id(j)
0 16790932
1 16790920
2 16790908
3 16790896
4 16790884
so as you see the IDs are exactly right. Also:
>>> for k, v in memo.items(): print k, id(v)
399680 435264
16790896 16790896
16790884 16790884
16790920 16790920
438608 435464
16790932 16790932
16790908 16790908
you see the identity for the (immutable) integers.
So here's a graph:
>>> z = [x, x]
>>> t = copy.deepcopy(z, memo)
>>> print id(t[0]), id(t[1]), id(y)
435264 435264 435264
so you see all the subcopies are the same objects as y (since we reused the memo).

You can read more by checking the Python online documentation:
The deepcopy() function is recursive, and it will work its way down through a deeply nested object. It uses a dictionary to detect objects it has seen before, to detect an infinite loop. You should just ignore this dictionary.
class A(object):
    def __init__(self, *args):
        self.lst = args

class B(object):
    def __init__(self):
        self.x = self

def my_deepcopy(arg):
    try:
        obj = type(arg)()  # get new, empty instance of type arg
        for key in arg.__dict__:
            obj.__dict__[key] = my_deepcopy(arg.__dict__[key])
        return obj
    except AttributeError:
        return type(arg)(arg)  # return new instance of a simple type such as str

a = A(1, 2, 3)
b = B()
b.x is b  # evaluates to True
c = my_deepcopy(a)  # works fine
c = my_deepcopy(b)  # stack overflow, recurses forever

from copy import deepcopy
c = deepcopy(b)  # this works because of the second, hidden, dict argument
Just ignore the second, hidden, dict argument. Do not try to use it.
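To see what that hidden dictionary buys you, here is a sketch of the same toy my_deepcopy with a memo added. This is an illustration of the idea only, not how copy.deepcopy is actually written. The memo parameter and the early lookup are the additions.

```python
class B:
    def __init__(self):
        self.x = self  # self-referencing attribute


def my_deepcopy(arg, memo=None):
    if memo is None:
        memo = {}
    # If this exact object was already copied, reuse that copy.
    if id(arg) in memo:
        return memo[id(arg)]
    try:
        attrs = arg.__dict__
    except AttributeError:
        return type(arg)(arg)  # simple value such as int or str
    obj = type(arg)()
    memo[id(arg)] = obj  # record the copy before recursing
    for key, value in attrs.items():
        obj.__dict__[key] = my_deepcopy(value, memo)
    return obj


b = B()
c = my_deepcopy(b)  # terminates: the second visit to b hits the memo
print(c is not b, c.x is c)
```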

Here's a quick illustration I used for explaining this to myself:
import copy
a = [1,2,3]
memo = {}
b = copy.deepcopy(a,memo)
# now memo = {139907464678864: [1, 2, 3], 9357408: 1, 9357440: 2, 9357472: 3, 28258000: [1, 2, 3, [1, 2, 3]]}
key = 139907464678864
print(id(a) == key) #True
print(id(b) == key) #False
print(id(a) == id(memo[key])) #False
print(id(b) == id(memo[key])) #True
in other words:
memo[id_of_initial_object] = copy_of_initial_object


How is sorted(key=lambda x:) implemented behind the scene?

An example:
names = ["George Washington", "John Adams", "Thomas Jefferson", "James Madison"]
sorted(names, key=lambda name: name.split()[-1].lower())
I know key is used to compare different names, but it can have two different implementations:
First compute all keys for each name, and bind the key and name together in some way, and sort them.
Compute the key each time when a comparison happens
The problem with the first approach is that it has to define another data structure to bind the key and data. The problem with the second approach is that the key might be computed multiple times, that is, name.split()[-1].lower() will be executed many times, which is very time-consuming.
I am just wondering in which way Python implemented sorted().
The key function is executed just once per value, to produce a (keyvalue, value) pair; this is then used to sort and later on just the values are returned in the sorted order. This is sometimes called a Schwartzian transform.
You can test this yourself; you could count how often the function is called, for example:
>>> def keyfunc(value):
...     keyfunc.count += 1
...     return value
...
>>> keyfunc.count = 0
>>> sorted([0, 8, 1, 6, 4, 5, 3, 7, 9, 2], key=keyfunc)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> keyfunc.count
10
or you could collect all the values that are being passed in; you'll see that they follow the original input order:
>>> def keyfunc(value):
...     keyfunc.arguments.append(value)
...     return value
...
>>> keyfunc.arguments = []
>>> sorted([0, 8, 1, 6, 4, 5, 3, 7, 9, 2], key=keyfunc)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> keyfunc.arguments
[0, 8, 1, 6, 4, 5, 3, 7, 9, 2]
If you want to read the CPython source code, the relevant function is called listsort(), and the keyfunc is used in the following loop (saved_ob_item is the input array), which is executed before sorting takes place:
for (i = 0; i < saved_ob_size ; i++) {
    keys[i] = PyObject_CallFunctionObjArgs(keyfunc, saved_ob_item[i],
                                           NULL);
    if (keys[i] == NULL) {
        for (i=i-1 ; i>=0 ; i--)
            Py_DECREF(keys[i]);
        if (saved_ob_size >= MERGESTATE_TEMP_SIZE/2)
            PyMem_FREE(keys);
        goto keyfunc_fail;
    }
}
lo.keys = keys;
lo.values = saved_ob_item;
so in the end, you have two arrays, one with keys and one with the original values. All sort operations act on the two arrays in parallel, sorting the values in lo.keys and moving the elements in lo.values in tandem.
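The same "compute every key once, then sort" idea can be sketched in pure Python. This is an illustration of the decorate-sort-undecorate pattern, not CPython's actual code (which keeps two parallel arrays rather than tuples). The helper names sorted_with_key and last_name are made up for the example.

```python
def sorted_with_key(items, key):
    # Decorate: call key exactly once per item. The index breaks ties,
    # which keeps the sort stable and avoids comparing the items themselves.
    decorated = [(key(item), index, item) for index, item in enumerate(items)]
    decorated.sort()
    # Undecorate: return only the original values, now in key order.
    return [item for _, _, item in decorated]


calls = []


def last_name(name):
    calls.append(name)  # record every invocation
    return name.split()[-1].lower()


names = ["George Washington", "John Adams", "Thomas Jefferson", "James Madison"]
result = sorted_with_key(names, last_name)
print(result)
print(len(calls))  # one key computation per name
```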

built-in max heap API in Python

The default heapq is a min-heap implementation. Is there an option for a max heap? Thanks.
I tried the solution using _heapify_max for a max heap, but how do I handle dynamically pushing/popping elements? It seems _heapify_max can only be used at initialization time.
import heapq
def heapsort(iterable):
    h = []
    for value in iterable:
        heapq.heappush(h, value)
    return [heapq.heappop(h) for i in range(len(h))]

if __name__ == "__main__":
    print heapsort([1, 3, 5, 7, 9, 2, 4, 6, 8, 0])
Edit: I tried _heapify_max, but it does not seem to work for dynamically pushed/popped elements. I tried both methods and they produce the same output: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].
def heapsort(iterable):
    h = []
    for value in iterable:
        heapq.heappush(h, value)
    return [heapq.heappop(h) for i in range(len(h))]

def heapsort2(iterable):
    h = []
    for value in iterable:
        heapq.heappush(h, value)
    return [heapq.heappop(h) for i in range(len(h))]

if __name__ == "__main__":
    print heapsort([1, 3, 5, 7, 9, 2, 4, 6, 8, 0])
    print heapsort2([1, 3, 5, 7, 9, 2, 4, 6, 8, 0])
Thanks in advance,
In the past I have simply used sortedcontainers's SortedList for this, as:
> a = SortedList()
> a.add(3)
> a.add(2)
> a.add(1)
> a.pop()
It's not a heap, but it's fast and works directly as required.
If you absolutely need it to be a heap, you could make a general negation class to hold your items.
class Neg():
    def __init__(self, x):
        self.x = x

    def __cmp__(self, other):
        return -cmp(self.x, other.x)

def maxheappush(heap, item):
    heapq.heappush(heap, Neg(item))

def maxheappop(heap):
    return heapq.heappop(heap).x
But that will be using a little more memory.
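The wrapper above relies on __cmp__ and cmp(), which only exist in Python 2. A sketch of the same idea for Python 3 follows, assuming that defining __lt__ is sufficient because heapq compares items with <. The class and function names mirror the ones above.

```python
import heapq


class Neg:
    """Wrapper that reverses the ordering of the wrapped value."""

    def __init__(self, x):
        self.x = x

    def __lt__(self, other):
        # Reversed comparison: the larger value counts as "smaller",
        # so the min-heap keeps the maximum at the root.
        return self.x > other.x


def maxheappush(heap, item):
    heapq.heappush(heap, Neg(item))


def maxheappop(heap):
    return heapq.heappop(heap).x


heap = []
for value in [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]:
    maxheappush(heap, value)
ordered = [maxheappop(heap) for _ in range(len(heap))]
print(ordered)
```

Pushes and pops can be interleaved freely, which is the dynamic behaviour the question asks for.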
There is a _heappop_max function in the latest cpython source that you may find useful:
def _heappop_max(heap):
    """Maxheap version of a heappop."""
    lastelt = heap.pop()  # raises appropriate IndexError if heap is empty
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        heapq._siftup_max(heap, 0)
        return returnitem
    return lastelt
If you change the heappush logic using heapq._siftdown_max you should get the desired output:
def _heappush_max(heap, item):
    heap.append(item)  # add the new item at the end, then sift it into place
    heapq._siftdown_max(heap, 0, len(heap)-1)

def _heappop_max(heap):
    """Maxheap version of a heappop."""
    lastelt = heap.pop()  # raises appropriate IndexError if heap is empty
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        heapq._siftup_max(heap, 0)
        return returnitem
    return lastelt

def heapsort2(iterable):
    h = []
    for value in iterable:
        _heappush_max(h, value)
    return [_heappop_max(h) for i in range(len(h))]
In [14]: heapsort2([1,3,6,2,7,9,0,4,5,8])
Out[14]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
In [15]: heapsort2([7, 8, 9, 6, 4, 2, 3, 5, 1, 0])
Out[15]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
In [16]: heapsort2([19,13,15,17,11,10,14,20,18])
Out[16]: [20, 19, 18, 17, 15, 14, 13, 11, 10]
In [17]: heapsort2(["foo","bar","foobar","baz"])
Out[17]: ['foobar', 'foo', 'baz', 'bar']

Unittest implementation Python property

I have a class with the following property clusters:
import numpy as np

class ClustererKmeans(object):
    def __init__(self):
        self.clustering = np.array([0, 0, 1, 1, 3, 3, 3, 4, 5, 5])

    @property
    def clusters(self):
        assert self.clustering is not None, 'A clustering shall be set before obtaining clusters'
        return np.unique(self.clustering)
I now want to write a unittest for this simple property. I start off with:
from unittest import TestCase, main
from unittest.mock import Mock

class Test_clusters(TestCase):
    def test_gw_01(self):
        sut = Mock()
        sut.clustering = np.array([0, 0, 1, 1, 3, 3, 3, 4, 5, 5])
        r = ClustererKmeans.clusters(sut)
        e = np.array([0, 1, 3, 4, 5])
        # The following line checks to see if the two numpy arrays r and e are equal,
        # and gives a detailed error message if they are not.
        TestUtils.equal_np_matrix(self, r, e, 'clusters')

if __name__ == "__main__":
    main()
However, this does not run.
TypeError: 'property' object is not callable
I next change the line r = ClustererKmeans.clusters(sut) to the following:
r = sut.clusters
But again, I get an unexpected error.
AssertionError: False is not true : r shall be a <class 'numpy.ndarray'> (is now a <class 'unittest.mock.Mock'>)
Is there an easy way to test the implementation of a property in Python using the unittest framework?
To call property directly you can replace in your original code ClustererKmeans.clusters(sut) by ClustererKmeans.clusters.__get__(sut).
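A minimal sketch of that direct call, kept free of numpy so it runs with the standard library alone. The Clusterer class is a hypothetical stand-in for ClustererKmeans and uses sorted(set(...)) in place of np.unique. Both property.__get__ and the property's fget attribute reach the underlying getter.

```python
import unittest
from unittest.mock import Mock


class Clusterer:
    """Hypothetical stand-in for ClustererKmeans, without numpy."""

    def __init__(self):
        self.clustering = None

    @property
    def clusters(self):
        assert self.clustering is not None, 'A clustering shall be set before obtaining clusters'
        return sorted(set(self.clustering))


class TestClusters(unittest.TestCase):
    def test_getter_on_mock(self):
        sut = Mock()
        sut.clustering = [0, 0, 1, 1, 3, 3, 3, 4, 5, 5]
        # Accessed on the class, clusters is the property object itself,
        # so the getter can be invoked explicitly on any stand-in object.
        self.assertEqual(Clusterer.clusters.__get__(sut), [0, 1, 3, 4, 5])
        self.assertEqual(Clusterer.clusters.fget(sut), [0, 1, 3, 4, 5])

    def test_unset_clustering_raises(self):
        with self.assertRaises(AssertionError):
            Clusterer().clusters
```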
Even though I'm a mocking enthusiast, IMHO this case is not a good example for applying it. Mocks are useful for removing dependencies on other classes and resources. In your case ClustererKmeans has an empty constructor and there isn't any dependency to break. You can do it like this:
class Test_clusters(TestCase):
    def test_gw_01(self):
        sut = ClustererKmeans()
        sut.clustering = np.array([0, 0, 1, 1, 3, 3, 3, 4, 5, 5])
        np.testing.assert_array_equal(np.array([0, 1, 3, 4, 5]), sut.clusters)
If you do want to use mocking, you can patch the attribute on a ClustererKmeans() object by using unittest.mock.patch.object:
def test_gw_01(self):
    sut = ClustererKmeans()
    with patch.object(sut, "clustering", new=np.array([0, 0, 1, 1, 3, 3, 3, 4, 5, 5])):
        e = np.array([0, 1, 3, 4, 5])
        np.testing.assert_array_equal(e, sut.clusters)
...but why use patch when Python gives you a simple and direct way to do it?
Another way to use the mock framework is to trust numpy.unique and check only that the property does the right work:
@patch("numpy.unique")
def test_gw_01(self, mock_unique):
    sut = ClustererKmeans()
    sut.clustering = Mock()
    v = sut.clusters
    #Check is called ....
    mock_unique.assert_called_with(sut.clustering)
    #.... and return
    self.assertIs(v, mock_unique.return_value)
    #Moreover we can test the exception
    sut.clustering = None
    self.assertRaises(Exception, lambda s:s.clusters, sut)
I apologize for any errors; I haven't tested the code. If you notify me, I will fix them as soon as possible.

Iterating over parts of the Stern-Brocot tree in Python

My goal is to iterate over the pairs [a,b] a coprime to b and a+b<=n. For example, if n=8, I want to iterate over [1, 2], [2, 3], [3, 4], [3, 5], [1, 3], [2, 5], [1, 4], [1, 5], [1, 6], [1, 7].
My first thought was a recursive function using the Stern-Brocot tree:
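def Stern_Brocot(n,a=0,b=1,c=1,d=1):
    if a+b+c+d > n:
        return 0
    x = Stern_Brocot(n,a+c,b+d,c,d)
    y = Stern_Brocot(n,a,b,a+c,b+d)
    if x == 0 and y == 0:
        return [a+c,b+d]
    if x == 0:
        return [a+c]+[b+d]+y
    if y == 0:
        return [a+c]+[b+d]+x
    return [a+c]+[b+d]+x+y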
As expected,
>>> Stern_Brocot(8)
[1, 2, 2, 3, 3, 4, 3, 5, 1, 3, 2, 5, 1, 4, 1, 5, 1, 6, 1, 7]
And for n<=995, it works well. But suddenly at n>=996, it gives this error:
Traceback (most recent call last):
File "<pyshell#17>", line 1, in <module>
File "C:\Users\Pim\Documents\C Programmeren en Numerieke Wisk\Python\PE\PE127.py", line 35, in Stern_Brocot
File "C:\Users\Pim\Documents\C Programmeren en Numerieke Wisk\Python\PE\PE127.py", line 35, in Stern_Brocot
RuntimeError: maximum recursion depth exceeded in comparison
And since I want n to equal 120000, this approach won't work.
So my question is: what would be a good approach to iterate over parts of the Stern_Brocot tree? (if there's another way to iterate over coprime integers, that'd be good as well).
Here's a non-recursive implementation:
def Stern_Brocot(n):
    states = [(0, 1, 1, 1)]
    result = []
    while len(states) != 0:
        a, b, c, d = states.pop()
        if a + b + c + d <= n:
            result.append((a+c, b+d))
            states.append((a, b, a+c, b+d))
            states.append((a+c, b+d, c, d))
    return result
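Since the stated goal is to iterate over the pairs, the same explicit-stack traversal can also be sketched as a generator, which avoids building the whole result list for large n. The name coprime_pairs is an assumption made for this example.

```python
from math import gcd


def coprime_pairs(n):
    """Yield pairs (p, q) with p coprime to q and p + q <= n, lazily."""
    stack = [(0, 1, 1, 1)]
    while stack:
        a, b, c, d = stack.pop()
        p, q = a + c, b + d  # mediant of a/b and c/d
        if p + q <= n:
            yield p, q
            stack.append((a, b, p, q))  # subtree between a/b and p/q
            stack.append((p, q, c, d))  # subtree between p/q and c/d


pairs = list(coprime_pairs(8))
print(pairs)
```

Because the traversal uses its own stack instead of recursion, the interpreter's recursion limit does not come into play.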
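Before defining Stern_Brocot, add sys.setrecursionlimit(120000). This raises the interpreter's recursion limit to 120000. Be aware that a limit this high can crash the interpreter by overflowing the C stack instead of raising a RuntimeError.
So, instead, you can do this: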
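import sys
sys.setrecursionlimit(120000)
def Stern_Brocot(n,a=0,b=1,c=1,d=1):
    if a+b+c+d > n:
        return 0
    x = Stern_Brocot(n,a+c,b+d,c,d)
    y = Stern_Brocot(n,a,b,a+c,b+d)
    if x == 0 and y == 0:
        return [a+c,b+d]
    if x == 0:
        return [a+c]+[b+d]+y
    if y == 0:
        return [a+c]+[b+d]+x
    return [a+c]+[b+d]+x+y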

Is there a better way to do an "unravel" function in python?

I was faced with the problem of executing n concurrent events that all return iterators over the results they acquired. However, there was an optional limit parameter that says, basically, to consolidate all the iterators and return up to limit results.
So, for example: I execute 2,000 url requests on 8 threads but just want the first 100 results, but not all 100 from the same potential thread.
Thus, unravel:
import itertools
def unravel(*iterables, with_limit = None):
make_iter = {a:iter(i) for a,i in enumerate(iterables)}
if not isinstance(with_limit, int):
with_limit = -1
resize = False
while True:
for iid, take_from in make_iter.items():
if with_limit == 0:
raise StopIteration
yield next(take_from)
except StopIteration:
resize = iid
with_limit -= 1
if resize:
resize = False
if len(make_iter.keys()) > 1:
else: raise StopIteration
>>> a = [1,2,3,4,5]
>>> b = [6,7,8,9,10]
>>> c = [1,3,5,7]
>>> d = [2,4,6,8]
>>> print([e for e in unravel(c, d)])
[1, 2, 3, 4, 5, 6, 7, 8]
>>> print([e for e in unravel(c, d, with_limit = 3)])
[1, 2, 3]
>>> print([e for e in unravel(a, b, with_limit = 6)])
[1, 6, 2, 7, 3, 8]
>>> print([e for e in unravel(a, b, with_limit = 100)])
[1, 6, 2, 7, 3, 8, 4, 9, 5, 10]
Does something like this already exist, or is this a decent implementation?
Inspired by #abernert 's suggestion, this is what I went with. Thanks everybody!
def unravel(*iterables, limit = None):
    yield from itertools.islice(
        filter(None, itertools.chain.from_iterable(
            itertools.zip_longest(*iterables)
        )), limit)
>>> a = [x for x in range(10)]
>>> b = [x for x in range(5)]
>>> c = [x for x in range(0, 20, 2)]
>>> d = [x for x in range(1, 30, 2)]
>>> print(list(unravel(a, b)))
[1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
>>> print(list(unravel(a, b, limit = 3)))
[1, 1, 2]
>>> print(list(unravel(a, b, c, d, limit = 20)))
[1, 1, 1, 2, 3, 2, 2, 4, 5, 3, 3, 6, 7, 4, 4, 8, 9, 5, 10, 11]
What you're doing here is almost just zip.
You want a flat iterable, rather than an iterable of sub-iterables, but chain fixes that.
And you want to take only the first N values, but islice fixes that.
So, if the lengths are all equal:
>>> list(chain.from_iterable(zip(a, b)))
[1, 6, 2, 7, 3, 8, 4, 9, 5, 10]
>>> list(islice(chain.from_iterable(zip(a, b)), 7))
[1, 6, 2, 7, 3, 8, 4]
But if the lengths aren't equal, that will stop as soon as the first iterable finishes, which you don't want. And the only alternative in the stdlib is zip_longest, which fills in missing values with None.
You can pretty easily write a zip_longest_skipping (which is effectively the round_robin in Peter's answer), but you can also just zip_longest and filter out the results:
>>> list(filter(None, chain.from_iterable(zip_longest(a, b, c, d))))
[1, 6, 1, 2, 2, 7, 3, 4, 3, 8, 5, 6, 4, 9, 7, 8, 5, 10]
(Obviously this doesn't work if your values can be falsy, such as 0, empty strings or None, because filter(None, ...) drops every falsy value, but when they're all positive integers it works fine. To handle that case, do sentinel=object(), pass it to zip_longest as the fillvalue, then filter on x is not sentinel.)
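A sketch of that sentinel variant, with islice applied for the limit. The function name unravel and the limit parameter follow the question; the sample lists are made up to include falsy values.

```python
from itertools import chain, islice, zip_longest


def unravel(*iterables, limit=None):
    sentinel = object()  # unique marker that cannot collide with real data
    merged = chain.from_iterable(zip_longest(*iterables, fillvalue=sentinel))
    # Drop only the padding, so 0, "" and None from the inputs survive.
    kept = (x for x in merged if x is not sentinel)
    return islice(kept, limit)


a = [0, 1, 2]
b = [None, "", 5, 6]
everything = list(unravel(a, b))
first_three = list(unravel(a, b, limit=3))
print(everything)
print(first_three)
```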
From the itertools example recipes:
from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))
Use itertools.islice to enforce your with_limit, eg:
print([e for e in itertools.islice(roundrobin(c, d), 3)])
>>> list(roundrobin(a, b, c, d))
[1, 6, 1, 2, 2, 7, 3, 4, 3, 8, 5, 6, 4, 9, 7, 8, 5, 10]
For what you're actually trying to do, there's probably a much better solution.
I execute 2,000 url requests on 8 threads but just want the first 100 results, but not all 100 from the same potential thread.
OK, so why are the results in 8 separate iterables? There's no good reason for that. Instead of giving each thread its own queue (or global list and lock, or whatever you're using) and then trying to zip them together, why not have them all share a queue in the first place?
In fact, that's the default way that almost any thread pool is designed (including multiprocessing.Pool and concurrent.futures.Executor in the stdlib). Look at the main example for concurrent.futures.ThreadPoolExecutor:
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
That's almost exactly your use case—spamming a bunch of URL downloads out over 5 different threads and gathering the results as they come in—without your problem even arising.
Of course it's missing with_limit, but you can just wrap that as_completed iterable in islice to handle that, and you're done.
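A rough sketch of that islice wrapping, with no network access involved. fake_fetch is a hypothetical stand-in for a real download function, and first_results is a name invented for the example.

```python
import concurrent.futures
from itertools import islice


def fake_fetch(n):
    # Hypothetical stand-in for a URL download.
    return n * n


def first_results(jobs, limit, max_workers=8):
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fake_fetch, job) for job in jobs]
        # as_completed yields futures in the order they finish,
        # regardless of which worker thread ran them.
        done = concurrent.futures.as_completed(futures)
        return [future.result() for future in islice(done, limit)]


results = first_results(range(50), limit=10)
print(len(results))
```

Note that leaving the with block still waits for the remaining submitted jobs to finish. If that matters, the pending futures would need to be cancelled explicitly.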
This uses a generator and izip_longest to pull one item at a time from multiple iterators:
from itertools import izip_longest

def unravel(cap, *iters):
    counter = 0
    for slice in izip_longest(*iters):
        for entry in [s for s in slice if s is not None]:
            yield entry
            counter += 1
            if counter >= cap:
                return  # stop the whole generator, not just the inner loop
