Poor Python Multiprocessing Performance - python

I attempted to speed up my Python program using the multiprocessing module, but I found it was quite slow.
A toy example is as follows:
import time
from multiprocessing import Pool, Manager

class A:
    def __init__(self, i):
        self.i = i

    def score(self, x):
        return self.i - x

class B:
    def __init__(self):
        self.i_list = list(range(1000))
        self.A_list = []

    def run_1(self):
        for i in self.i_list:
            self.x = i
            map(self.compute, self.A_list)  # map version
            self.A_list.append(A(i))

    def run_2(self):
        p = Pool()
        for i in self.i_list:
            self.x = i
            p.map(self.compute, self.A_list)  # multicore version
            self.A_list.append(A(i))

    def compute(self, some_A):
        return some_A.score(self.x)

if __name__ == "__main__":
    st = time.time()
    foo = B()
    foo.run_1()
    print("Map: ", time.time() - st)
    st = time.time()
    foo = B()
    foo.run_2()
    print("MultiCore: ", time.time() - st)
The output on my computer (Windows 10, Python 3.5) is:
Map: 0.0009996891021728516
MultiCore: 19.34994912147522
Similar results can be observed on a Linux machine (CentOS 7, Python 3.6).
I guess it is caused by the pickling/unpickling of objects between processes? I tried to use the Manager class but failed to get it to work.
Any help would be appreciated.

Wow, that's impressive (and slow!).
Yes, this is because the objects have to be shipped to the worker processes for every call, which is costly.
So I played with it a bit and gained a lot of performance by making the compute method static. Basically, you no longer need to share the B object instance. It is still very slow, but better.
import time
from multiprocessing import Pool, Manager

class A:
    def __init__(self, i):
        self.i = i

    def score(self, x):
        return self.i - x

# module-level state read by the static worker function
x = 0

def static_compute(some_A):
    res = some_A.score(x)
    return res

class B:
    def __init__(self):
        self.i_list = list(range(1000))
        self.A_list = []

    def run_1(self):
        for i in self.i_list:
            x = i
            map(self.compute, self.A_list)  # map version
            self.A_list.append(A(i))

    def run_2(self):
        p = Pool(4)
        for i in self.i_list:
            x = i
            p.map(static_compute, self.A_list)  # multicore version
            self.A_list.append(A(i))
The other reason it is slow, to me, is the fixed cost of using Pool. You're actually launching Pool.map 1000 times. If there is a fixed cost associated with dispatching each of those jobs, that would make the overall strategy slow. Maybe you should test that with a longer A_list (longer than i_list, which requires a different algorithm).
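To illustrate that fixed per-call cost, here is a small, self-contained sketch (the work function, chunk sizes, and worker count are made up for illustration); it compares issuing 1000 small Pool.map calls against one Pool.map over the same data:
import time
from multiprocessing import Pool

def work(n):
    return n * n

if __name__ == "__main__":
    chunks = [list(range(100)) for _ in range(1000)]

    with Pool(4) as p:
        # one p.map per chunk: pays the dispatch/pickling cost 1000 times
        st = time.time()
        for chunk in chunks:
            p.map(work, chunk)
        print("1000 separate maps:", time.time() - st)

        # a single p.map over the flattened data: pays it once
        flat = [n for chunk in chunks for n in chunk]
        st = time.time()
        p.map(work, flat)
        print("one big map:       ", time.time() - st)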

The reasoning behind this is:
the map call is performed by the main process.
* When foo.run_1() is called, the main process is mapping for itself, much like telling yourself what to do.
* When foo.run_2() is called, the main process is mapping across the maximum number of worker processes of that PC. If your maximum is 6 processes, then the main process is dispatching work to 6 workers, much like organizing 6 people to tell you something.
Side note:
if you use:
p.imap(self.compute, self.A_list)
the results are yielded in the same order as A_list.
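A tiny sketch of that ordering behaviour (square is a made-up worker function): imap yields results in the same order as the input iterable, even if workers finish out of order:
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool(4) as p:
        # results come back in input order: [0, 1, 4, 9, ...]
        print(list(p.imap(square, range(10))))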


How to pass arguments to class after initialized?

I'm trying to create threads to run a class method. However, when I try to pass one class to another, it tries to initialize the class and never gets threaded.
I'm taking a list of tuples and trying to pass that list to the cfThread class, along with the class method that I want to use. From there, I'd like to create a separate thread to run the class's method and act on one of the tuples from the list. The REPLACEME is a placeholder because the class is looking for a tuple but I don't have one to pass to it yet. My end goal is to be able to pass a target (class / function) to a thread class that can create its own queue and manage the threads without having to do it manually.
Below is a simple example to hopefully do a better job of explaining what I'm trying to do.
#!/bin/python3.10
import concurrent.futures

class math:
    def __init__(self, num) -> None:
        self.num = num

    def add(self):
        return self.num[0] + self.num[1]

    def sub(self):
        return self.num[0] - self.num[1]

    def mult(self):
        return self.num[0] * self.num[1]

class cfThread:
    def __init__(self, target, args):
        self.target = target
        self.args = args

    def run(self):
        results = []
        with concurrent.futures.ThreadPoolExecutor(10) as execute:
            threads = []
            for num in self.args:
                result = execute.submit(self.target, num)
                threads.append(result)
            for result in concurrent.futures.as_completed(threads):
                results.append(result)
        return results

if __name__ == '__main__':
    numbers = [(1,2),(3,4),(5,6)]
    results = cfThread(target=math(REPLACEME).add(), args=numbers).run()
    print(results)
target has to be a callable; you want to wrap your call to add in a lambda expression:
results = cfThread(target=lambda x: math(x).add(), args=numbers).run()
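For completeness, here is a minimal sketch of how that can fit together, based on the code in the question plus the lambda above; the future.result() call is the other piece you would typically add, since as_completed yields Future objects rather than values:
import concurrent.futures

class math:
    def __init__(self, num) -> None:
        self.num = num

    def add(self):
        return self.num[0] + self.num[1]

class cfThread:
    def __init__(self, target, args):
        self.target = target
        self.args = args

    def run(self):
        results = []
        with concurrent.futures.ThreadPoolExecutor(10) as execute:
            futures = [execute.submit(self.target, num) for num in self.args]
            for future in concurrent.futures.as_completed(futures):
                results.append(future.result())  # unwrap each Future into its value
        return results

if __name__ == '__main__':
    numbers = [(1, 2), (3, 4), (5, 6)]
    results = cfThread(target=lambda x: math(x).add(), args=numbers).run()
    print(results)  # e.g. [3, 7, 11], in completion order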

Using multiprocessing module in class

I have the following program and I want to use the multiprocessing module. It uses external files: I call the PSO class from another file, costfunc is a function from another file, and the other args are just variables.
Swarm is a list containing as many objects as the value of ps, and each object has multiple attributes which need to be updated at every iteration.
Following Hannu's answer, I implemented multiprocessing.Pool and it is working; however, it is taking much more time than running sequentially.
I would appreciate it if you could tell me the reasons why this happens and how I can make it run faster.
# IMPORT PACKAGES -----------------------------------------------------------+
import random
import multiprocessing
from functools import partial

import numpy as np

# IMPORT FILES --------------------------------------------------------------+
from Reducer import initial

# Particle Class ------------------------------------------------------------+
class Particle:
    def __init__(self, D, bounds_x, bounds_v):
        self.Position_i = []        # particle position
        self.Velocity_i = []        # particle velocity
        self.Cost_i = -1            # cost individual
        self.Position_Best_i = []   # best position individual
        self.Cost_Best_i = -1       # best cost individual
        self.Constraint_Best_i = [] # best cost individual constraints
        self.Constraint_i = []      # constraints individual
        self.Penalty_i = -1         # constraints individual
        x0, v0 = initial(D, bounds_x, bounds_v)
        for i in range(0, D):
            self.Velocity_i.append(v0[i])
            self.Position_i.append(x0[i])

    # evaluate current fitness
    def evaluate(self, costFunc, i):
        self.Cost_i, self.Constraint_i, self.Penalty_i = costFunc(self.Position_i, i)
        # check to see if the current position is an individual best
        if self.Cost_i < self.Cost_Best_i or self.Cost_Best_i == -1:
            self.Position_Best_i = self.Position_i
            self.Cost_Best_i = self.Cost_i
            self.Constraint_Best_i = self.Constraint_i
            self.Penalty_Best_i = self.Penalty_i
        return self

def proxy(gg, costf, i):
    return gg.evaluate(costf, i)

# Swarm Class ---------------------------------------------------------------+
class PSO():
    def __init__(self, costFunc, bounds_x, bounds_v, ps, D, maxiter):
        self.Cost_Best_g = -1       # Best Cost for Group
        self.Position_Best_g = []   # Best Position for Group
        self.Constraint_Best_g = []
        self.Penalty_Best_g = -1
        # Establish Swarm
        Swarm = []
        for i in range(0, ps):
            Swarm.append(Particle(D, bounds_x, bounds_v))
        # Begin optimization Loop
        i = 1
        self.Evol = []
        while i <= maxiter:
            pool = multiprocessing.Pool(processes=4)
            results = pool.map_async(partial(proxy, costf=costFunc, i=i), Swarm)
            pool.close()
            pool.join()
            Swarm = results.get()
            for j in range(0, ps):
                if Swarm[j].Cost_i < self.Cost_Best_g or self.Cost_Best_g == -1:
                    self.Position_Best_g = list(Swarm[j].Position_i)
                    self.Cost_Best_g = float(Swarm[j].Cost_i)
                    self.Constraint_Best_g = list(Swarm[j].Constraint_i)
                    self.Penalty_Best_g = float(Swarm[j].Penalty_i)
            self.Evol.append(self.Cost_Best_g)
            i += 1
You need a proxy function to do the function call, and as you need to deliver arguments to the function, you will need partial as well. Consider this:
from time import sleep
from multiprocessing import Pool
from functools import partial
class Foo:
def __init__(self, a):
self.a = a
self.b = None
def evaluate(self, CostFunction, i):
xyzzy = CostFunction(i)
sleep(0.01)
self.b = self.a*xyzzy
return self
def CostFunc(i):
return i*i
def proxy(gg, costf, i):
return gg.evaluate(costf, i)
def main():
Swarm = []
for i in range(0,10):
nc = Foo(i)
Swarm.append(nc)
p = Pool()
for i in range(100,102):
results = p.map_async(partial(proxy, costf=CostFunc, i=i), Swarm)
p.close()
p.join()
Swarm = []
for a in results.get():
Swarm.append(a)
for s in Swarm:
print (s.b)
main()
This creates a Swarm list of objects, and within each of these objects is evaluate, which is the function you need to call. Then we have the parameters (CostFunc and an integer, as in your code).
We now use Pool.map_async to map your Swarm list to your pool. This gives each worker one instance of Foo from your Swarm list, and we have a proxy function that actually calls evaluate().
However, as map_async only sends an object from the iterable to the function, instead of using proxy directly as the target function for the pool, we use partial to create the target function and pass the "fixed" arguments along with it.
And as you apparently want to get the modified objects back, this requires another trick. If you modify the target object in a Pool process, it just modifies the local copy, which is thrown away as soon as the processing completes. There is no way for the subprocess to modify the main process's memory directly (or vice versa); the processes do not share memory.
Instead, after modifying the object, we return self. When your pool has completed its work, we discard the old Swarm and reassemble it from the result objects.
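A side note on the speed question above: creating a new Pool on every iteration of the while loop pays the process start-up cost maxiter times. A hedged sketch of the usual fix, reusing the Foo/CostFunc/proxy example from this answer, is to create the Pool once and reuse it:
from time import sleep
from multiprocessing import Pool
from functools import partial

class Foo:
    def __init__(self, a):
        self.a = a
        self.b = None

    def evaluate(self, CostFunction, i):
        self.b = self.a * CostFunction(i)
        sleep(0.01)
        return self

def CostFunc(i):
    return i * i

def proxy(gg, costf, i):
    return gg.evaluate(costf, i)

def main():
    Swarm = [Foo(i) for i in range(10)]
    p = Pool()                      # create the pool once, outside the loop
    for i in range(100, 102):
        # the same worker processes are reused on every iteration
        Swarm = p.map(partial(proxy, costf=CostFunc, i=i), Swarm)
    p.close()
    p.join()
    for s in Swarm:
        print(s.b)

if __name__ == '__main__':
    main()
Even then, if costFunc is as cheap as the toy CostFunc here, pickling each object back and forth can still dominate, so the parallel version only wins when the per-particle work is genuinely expensive.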

python Priority Queue implementation

I'm having trouble creating an insert function with the following parameters. The insert function should take in a priority queue and an element, and insert it using the priority rules:
The priority queue will take a series of tasks and order them based on their importance. Each task has an integer priority from 10 (highest priority) to 1 (lowest priority). If two tasks have the same priority, the order should be based on the order they were inserted into the priority queue (earlier first).
So, as of right now, I've created the following code to initialize some of the things needed...
class Tasks():
    __slots__ = ('name', 'priority')

    def __init__(bval, myName, myPriority):
        bval.name = myName
        bval.priority = myPriority

class PriorityQueue():
    __slots__ = ('queue', 'element')

    def __init__(aval, queue, element):
        aval.queue = queue
        aval.element = element
The code I'm trying to write is insert(element, queue), which should insert the element using the priority rules. myPriority is an integer from 1 to 10.
Similarly, can I do the following to ensure that I create a priority from 1 to 10...
def __init__(bval, myPriority=10):
    bval.priority = myPriority
    bval.pq = [[] for priority in range(bval.priority)]
so that I can replace myPriority in the insert task with bval.pq
Why are you trying to re-invent the wheel?
from Queue import PriorityQueue
http://docs.python.org/2/library/queue.html?highlight=priorityqueue#Queue.PriorityQueue
The lowest valued entries are retrieved first (the lowest valued entry is the one returned by sorted(list(entries))[0]). A typical pattern for entries is a tuple in the form:
(priority_number, data).
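A minimal sketch of that pattern (using the Python 3 spelling of the import; in Python 2 it is from Queue import PriorityQueue as above):
from queue import PriorityQueue

q = PriorityQueue()
q.put((2, 'medium priority task'))
q.put((1, 'high priority task'))
q.put((3, 'low priority task'))

while not q.empty():
    priority_number, data = q.get()   # lowest priority number comes out first
    print(priority_number, data)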
I use such a module to communicate between the UI and a background polling thread.
READ_LOOP = 5
LOW_PRI = 3
MED_PRI = 2
HI_PRI = 1
X_HI_PRI = 0
and then something like this:
CoreGUI.TX_queue.put((X_HI_PRI,'STOP',[]))
Note that there is a Queue. If you are okay with it being synchronized, I would use that.
Otherwise, you should use a heap to maintain your queue. See Python documentation with an example of that.
From a great book "Modern Python Standard Library Cookbook" by Alessandro Molina
Heaps are a perfect match for everything that has priorities, such as
a priority queue:
import time
import heapq

class PriorityQueue:
    def __init__(self):
        self._q = []

    def add(self, value, priority=0):
        heapq.heappush(self._q, (priority, time.time(), value))

    def pop(self):
        return heapq.heappop(self._q)[-1]
Example:
>>> def f1(): print('hello')
>>> def f2(): print('world')
>>>
>>> pq = PriorityQueue()
>>> pq.add(f2, priority=1)
>>> pq.add(f1, priority=0)
>>> pq.pop()()
hello
>>> pq.pop()()
world
A deque (from collections import deque) is the Python implementation of a single queue. You can add items to one end and remove them from the other. If you have a deque for each priority level, you can add to the priority level you want.
Together, it looks a bit like this:
from collections import deque

class PriorityQueue:
    def __init__(self, priorities=10):
        self.subqueues = [deque() for _ in range(priorities)]

    def enqueue(self, priority, value):
        self.subqueues[priority].append(value)

    def dequeue(self):
        for queue in self.subqueues:
            try:
                return queue.popleft()
            except IndexError:
                continue
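A quick usage sketch, continuing with the class above (note that with this layout, lower indices are dequeued first, so priority 10 from the question would map to index 0 if it should be served first):
pq = PriorityQueue()
pq.enqueue(0, 'urgent task')          # index 0 is served first
pq.enqueue(5, 'routine task')
pq.enqueue(0, 'another urgent task')

print(pq.dequeue())  # 'urgent task'
print(pq.dequeue())  # 'another urgent task'  (FIFO within the same priority)
print(pq.dequeue())  # 'routine task'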

how to make a lot of parameters available to the entire system?

I have objects from various classes that work together to perform a certain task. The task requires a lot of parameters, provided by the user (through a configuration file). The parameters are used deep inside the system.
I have a choice of having the controller object read the configuration file and then allocate the parameters as appropriate to the next layer of objects, and so on in each layer. But only the objects themselves know which parameters they need, so the controller object would need to learn a lot of detail about every other object.
The other choice is to bundle all the parameters into a collection and pass the whole collection into every function call (equivalently, create a global object that stores them and is accessible to everyone). This looks and feels ugly, and would cause a variety of minor technical issues (e.g., I can't allow two objects to use parameters with the same name, etc.).
What to do?
I have used the "global collection" alternative in the past.
If you are concerned with naming: how would you handle this in your config file? The way I see it, your global collection is a data structure representing the same information you have in your config file, so if you have a way of resolving or avoiding name clashes in your cfg-file, you can do the same in your global collection.
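As an illustration, here is a minimal sketch of such a global collection, using the config file's section names as namespaces to avoid clashes; the file name and keys are hypothetical:
# config.py -- a module-level, read-once settings store (hypothetical file/keys)
import configparser

_settings = {}

def load(path="settings.ini"):
    parser = configparser.ConfigParser()
    parser.read(path)
    # keep the section name as a namespace: "solver.tolerance", "io.output_dir", ...
    for section in parser.sections():
        for key, value in parser[section].items():
            _settings["%s.%s" % (section, key)] = value

def get(name, default=None):
    return _settings.get(name, default)

# deep inside the system, any object can ask for exactly what it needs:
#   from config import get
#   tol = float(get("solver.tolerance", 1e-6))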
I hope you don't feel like I'm thread-jacking you - what you're asking about is similar to what I was thinking about in terms of property aggregation to avoid the models you want to avoid.
I also nicked a bit of the declarative vibe that Elixir has turned me onto.
I'd be curious what the Python gurus of stack overflow think of it and what better alternatives there might be. I don't like big kwargs and if I can avoid big constructors I prefer to.
#!/usr/bin/python
import inspect
from itertools import chain, ifilter
from pprint import pprint
from abc import ABCMeta

class Property(object):
    def __init__(self, value=None):
        self._x = value

    def __repr__(self):
        return str(self._x)

    def getx(self):
        return self._x

    def setx(self, value):
        self._x = value

    def delx(self):
        del self._x

    value = property(getx, setx, delx, "I'm the property.")

class BaseClass(object):
    unique_baseclass_thing = Property()

    def get_prop_tree(self):
        mro = self.__class__.__mro__
        r = []
        for i in xrange(0, len(mro) - 1):
            child_prop_names = set(dir(mro[i]))
            parent_prop_names = set(dir(mro[i + 1]))
            l_k = list(chain(child_prop_names - parent_prop_names))
            l_n = [(x, getattr(mro[i], x, None)) for x in l_k]
            l_p = list(ifilter(lambda y: y[1].__class__ == Property, l_n))
            r.append((mro[i], dict(l_p)))
        return r

    def get_prop_list(self):
        return list(chain(*[x[1].items() for x in reversed(self.get_prop_tree())]))

class SubClass(BaseClass):
    unique_subclass_thing = Property(1)

class SubSubClass(SubClass):
    unique_subsubclass_thing_one = Property("blah")
    unique_subsubclass_thing_two = Property("foo")
if __name__ == '__main__':
    a = SubSubClass()
    for b in a.get_prop_tree():
        print '---------------'
        print b[0].__name__
        for prop in b[1].keys():
            print "\t", prop, "=", b[1][prop].value
        print
    for prop in a.get_prop_list():
        print prop[0], prop[1].value
When you run it..
---------------
SubSubClass
unique_subsubclass_thing_one = blah
unique_subsubclass_thing_two = foo
---------------
SubClass
unique_subclass_thing = 1
---------------
BaseClass
unique_baseclass_thing = None
unique_baseclass_thing None
unique_subclass_thing 1
unique_subsubclass_thing_one blah
unique_subsubclass_thing_two foo

Dictionary vs Object - which is more efficient and why?

What is more efficient in Python in terms of memory usage and CPU consumption - Dictionary or Object?
Background:
I have to load a huge amount of data into Python. I created an object that is just a field container. Creating 4M instances and putting them into a dictionary took about 10 minutes and ~6GB of memory. After the dictionary is ready, accessing it is a blink of an eye.
Example:
To check the performance I wrote two simple programs that do the same thing - one using objects, the other a dictionary:
Object (execution time ~18sec):
class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []

all = {}
for i in range(1000000):
    all[i] = Obj(i)
Dictionary (execution time ~12sec):
all = {}
for i in range(1000000):
    o = {}
    o['i'] = i
    o['l'] = []
    all[i] = o
Question:
Am I doing something wrong, or is a dictionary just faster than an object? If a dictionary does indeed perform better, can somebody explain why?
Have you tried using __slots__?
From the documentation:
By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.
The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.
So does this save time as well as memory?
Comparing the three approaches on my computer:
test_slots.py:
class Obj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
        self.i = i
        self.l = []

all = {}
for i in range(1000000):
    all[i] = Obj(i)
test_obj.py:
class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []

all = {}
for i in range(1000000):
    all[i] = Obj(i)
test_dict.py:
all = {}
for i in range(1000000):
    o = {}
    o['i'] = i
    o['l'] = []
    all[i] = o
test_namedtuple.py (supported in 2.6):
import collections

Obj = collections.namedtuple('Obj', 'i l')

all = {}
for i in range(1000000):
    all[i] = Obj(i, [])
Run benchmark (using CPython 2.5):
$ lshw | grep product | head -n 1
product: Intel(R) Pentium(R) M processor 1.60GHz
$ python --version
Python 2.5
$ time python test_obj.py && time python test_dict.py && time python test_slots.py
real 0m27.398s (using 'normal' object)
real 0m16.747s (using __dict__)
real 0m11.777s (using __slots__)
Using CPython 2.6.2, including the named tuple test:
$ python --version
Python 2.6.2
$ time python test_obj.py && time python test_dict.py && time python test_slots.py && time python test_namedtuple.py
real 0m27.197s (using 'normal' object)
real 0m17.657s (using __dict__)
real 0m12.249s (using __slots__)
real 0m12.262s (using namedtuple)
So yes (not really a surprise), using __slots__ is a performance optimization. Using a named tuple has similar performance to __slots__.
Attribute access in an object uses dictionary access behind the scenes - so by using attribute access you are adding extra overhead. Plus in the object case, you are incurring additional overhead because of e.g. additional memory allocations and code execution (e.g. of the __init__ method).
In your code, if o is an Obj instance, o.attr is equivalent to o.__dict__['attr'] with a small amount of extra overhead.
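A small sketch of that overhead using the timeit module (the exact numbers depend on your machine and Python version, so treat them as relative):
import timeit

setup = """
class Obj(object):
    def __init__(self, i):
        self.i = i

o = Obj(42)
d = {'i': 42}
"""

print(timeit.timeit('o.i', setup=setup))     # attribute access (dict lookup behind the scenes)
print(timeit.timeit("d['i']", setup=setup))  # plain dictionary access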
Have you considered using a namedtuple? (link for python 2.4/2.5)
It's the new standard way of representing structured data that gives you the performance of a tuple and the convenience of a class.
Its only downside compared with dictionaries is that (like tuples) it doesn't give you the ability to change attributes after creation.
Here is a copy of @hughdbrown's answer for Python 3.6.1; I've made the count 5x larger and added some code to test the memory footprint of the Python process at the end of each run.
Before the downvoters have at it, be advised that this method of counting the size of objects is not accurate.
from datetime import datetime
import os
import psutil

process = psutil.Process(os.getpid())
ITER_COUNT = 1000 * 1000 * 5

RESULT = None

def makeL(i):
    # Use this line to negate the effect of the strings on the test
    # return "Python is smart and will only create one string with this line"
    # Use this if you want to see the difference with 5 million unique strings
    return "This is a sample string %s" % i

def timeit(method):
    def timed(*args, **kw):
        global RESULT
        s = datetime.now()
        RESULT = method(*args, **kw)
        e = datetime.now()
        sizeMb = process.memory_info().rss / 1024 / 1024
        sizeMbStr = "{0:,}".format(round(sizeMb, 2))
        print('Time Taken = %s, \t%s, \tSize = %s' % (e - s, method.__name__, sizeMbStr))
    return timed

class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = makeL(i)

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
        self.i = i
        self.l = makeL(i)

from collections import namedtuple
NT = namedtuple("NT", ["i", 'l'])

@timeit
def profile_dict_of_nt():
    return [NT(i=i, l=makeL(i)) for i in range(ITER_COUNT)]

@timeit
def profile_list_of_nt():
    return dict((i, NT(i=i, l=makeL(i))) for i in range(ITER_COUNT))

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': makeL(i)}) for i in range(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': makeL(i)} for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_slot():
    return dict((i, SlotObj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_slot():
    return [SlotObj(i) for i in range(ITER_COUNT)]

profile_dict_of_nt()
profile_list_of_nt()
profile_dict_of_dict()
profile_list_of_dict()
profile_dict_of_obj()
profile_list_of_obj()
profile_dict_of_slot()
profile_list_of_slot()
And these are my results
Time Taken = 0:00:07.018720, profile_dict_of_nt, Size = 951.83
Time Taken = 0:00:07.716197, profile_list_of_nt, Size = 1,084.75
Time Taken = 0:00:03.237139, profile_dict_of_dict, Size = 1,926.29
Time Taken = 0:00:02.770469, profile_list_of_dict, Size = 1,778.58
Time Taken = 0:00:07.961045, profile_dict_of_obj, Size = 1,537.64
Time Taken = 0:00:05.899573, profile_list_of_obj, Size = 1,458.05
Time Taken = 0:00:06.567684, profile_dict_of_slot, Size = 1,035.65
Time Taken = 0:00:04.925101, profile_list_of_slot, Size = 887.49
My conclusion is:
Slots have the best memory footprint and are reasonable on speed.
dicts are the fastest, but use the most memory.
from datetime import datetime

ITER_COUNT = 1000 * 1000

def timeit(method):
    def timed(*args, **kw):
        s = datetime.now()
        result = method(*args, **kw)
        e = datetime.now()
        print method.__name__, '(%r, %r)' % (args, kw), e - s
        return result
    return timed

class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
        self.i = i
        self.l = []

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': []}) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': []} for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_slotobj():
    return dict((i, SlotObj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_slotobj():
    return [SlotObj(i) for i in xrange(ITER_COUNT)]

if __name__ == '__main__':
    profile_dict_of_dict()
    profile_list_of_dict()
    profile_dict_of_obj()
    profile_list_of_obj()
    profile_dict_of_slotobj()
    profile_list_of_slotobj()
Results:
hbrown@hbrown-lpt:~$ python ~/Dropbox/src/StackOverflow/1336791.py
profile_dict_of_dict ((), {}) 0:00:08.228094
profile_list_of_dict ((), {}) 0:00:06.040870
profile_dict_of_obj ((), {}) 0:00:11.481681
profile_list_of_obj ((), {}) 0:00:10.893125
profile_dict_of_slotobj ((), {}) 0:00:06.381897
profile_list_of_slotobj ((), {}) 0:00:05.860749
There is no question.
You have data, with no other attributes (no methods, nothing). Hence you have a data container (in this case, a dictionary).
I usually prefer to think in terms of data modeling. If there is some huge performance issue, then I can give up something in the abstraction, but only with very good reasons.
Programming is all about managing complexity, and maintaining the correct abstraction is very often one of the most useful ways to achieve such a result.
About the reasons an object is slower, I think your measurement is not correct.
You are performing too few assignments inside the for loop, and therefore what you see there is the different time needed to instantiate a dict (intrinsic object) versus a "custom" object. Although from the language perspective they are the same, they have quite different implementations.
After that, the assignment time should be almost the same for both, as in the end members are maintained inside a dictionary.
Here are my test runs of the very nice script by @Jarrod-Chesney.
For comparison, I also ran it against Python 2 with "range" replaced by "xrange".
Out of curiosity, I also added similar tests with OrderedDict (ordict) for comparison.
Python 3.6.9:
Time Taken = 0:00:04.971369, profile_dict_of_nt, Size = 944.27
Time Taken = 0:00:05.743104, profile_list_of_nt, Size = 1,066.93
Time Taken = 0:00:02.524507, profile_dict_of_dict, Size = 1,920.35
Time Taken = 0:00:02.123801, profile_list_of_dict, Size = 1,760.9
Time Taken = 0:00:05.374294, profile_dict_of_obj, Size = 1,532.12
Time Taken = 0:00:04.517245, profile_list_of_obj, Size = 1,441.04
Time Taken = 0:00:04.590298, profile_dict_of_slot, Size = 1,030.09
Time Taken = 0:00:04.197425, profile_list_of_slot, Size = 870.67
Time Taken = 0:00:08.833653, profile_ordict_of_ordict, Size = 3,045.52
Time Taken = 0:00:11.539006, profile_list_of_ordict, Size = 2,722.34
Time Taken = 0:00:06.428105, profile_ordict_of_obj, Size = 1,799.29
Time Taken = 0:00:05.559248, profile_ordict_of_slot, Size = 1,257.75
Python 2.7.15+:
Time Taken = 0:00:05.193900, profile_dict_of_nt, Size = 906.0
Time Taken = 0:00:05.860978, profile_list_of_nt, Size = 1,177.0
Time Taken = 0:00:02.370905, profile_dict_of_dict, Size = 2,228.0
Time Taken = 0:00:02.100117, profile_list_of_dict, Size = 2,036.0
Time Taken = 0:00:08.353666, profile_dict_of_obj, Size = 2,493.0
Time Taken = 0:00:07.441747, profile_list_of_obj, Size = 2,337.0
Time Taken = 0:00:06.118018, profile_dict_of_slot, Size = 1,117.0
Time Taken = 0:00:04.654888, profile_list_of_slot, Size = 964.0
Time Taken = 0:00:59.576874, profile_ordict_of_ordict, Size = 7,427.0
Time Taken = 0:10:25.679784, profile_list_of_ordict, Size = 11,305.0
Time Taken = 0:05:47.289230, profile_ordict_of_obj, Size = 11,477.0
Time Taken = 0:00:51.485756, profile_ordict_of_slot, Size = 11,193.0
So, on both major versions, the conclusions of @Jarrod-Chesney still look good.
There is yet another way to reduce memory usage, with the help of the recordclass library, if the data structure isn't supposed to contain reference cycles.
Let's compare two classes:
class DataItem:
    __slots__ = ('name', 'age', 'address')
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address
and
$ pip install recordclass
>>> from recordclass import make_dataclass
>>> DataItem2 = make_dataclass('DataItem', 'name age address')
>>> inst = DataItem('Mike', 10, 'Cherry Street 15')
>>> inst2 = DataItem2('Mike', 10, 'Cherry Street 15')
>>> print(inst2)
DataItem(name='Mike', age=10, address='Cherry Street 15')
>>> import sys
>>> print(sys.getsizeof(inst), sys.getsizeof(inst2))
64 40
This is possible because dataobject-based subclasses don't support cyclic garbage collection, which is not needed in such cases.
