Dictionary vs Object - which is more efficient and why? - python

What is more efficient in Python in terms of memory usage and CPU consumption - Dictionary or Object?
Background:
I have to load a huge amount of data into Python. I created an object that is just a field container. Creating 4M instances and putting them into a dictionary took about 10 minutes and ~6GB of memory. Once the dictionary is ready, accessing it is a blink of an eye.
Example:
To check the performance I wrote two simple programs that do the same - one is using objects, other dictionary:
Object (execution time ~18sec):
class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []

all = {}
for i in range(1000000):
    all[i] = Obj(i)
Dictionary (execution time ~12sec):
all = {}
for i in range(1000000):
    o = {}
    o['i'] = i
    o['l'] = []
    all[i] = o
Question:
Am I doing something wrong, or is a dictionary just faster than an object? If the dictionary indeed performs better, can somebody explain why?

Have you tried using __slots__?
From the documentation:
By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.
The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.
So does this save time as well as memory?
Comparing the three approaches on my computer:
test_slots.py:
class Obj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
        self.i = i
        self.l = []

all = {}
for i in range(1000000):
    all[i] = Obj(i)
test_obj.py:
class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []

all = {}
for i in range(1000000):
    all[i] = Obj(i)
test_dict.py:
all = {}
for i in range(1000000):
    o = {}
    o['i'] = i
    o['l'] = []
    all[i] = o
test_namedtuple.py (supported in 2.6):
import collections

Obj = collections.namedtuple('Obj', 'i l')

all = {}
for i in range(1000000):
    all[i] = Obj(i, [])
Run benchmark (using CPython 2.5):
$ lshw | grep product | head -n 1
product: Intel(R) Pentium(R) M processor 1.60GHz
$ python --version
Python 2.5
$ time python test_obj.py && time python test_dict.py && time python test_slots.py
real 0m27.398s (using 'normal' object)
real 0m16.747s (using __dict__)
real 0m11.777s (using __slots__)
Using CPython 2.6.2, including the named tuple test:
$ python --version
Python 2.6.2
$ time python test_obj.py && time python test_dict.py && time python test_slots.py && time python test_namedtuple.py
real 0m27.197s (using 'normal' object)
real 0m17.657s (using __dict__)
real 0m12.249s (using __slots__)
real 0m12.262s (using namedtuple)
So yes (not really a surprise), using __slots__ is a performance optimization. Using a named tuple has similar performance to __slots__.

Attribute access on an object uses dictionary access behind the scenes - so by using attribute access you are adding extra overhead. In the object case you also incur additional overhead from, for example, extra memory allocations and code execution (such as the __init__ method).
In your code, if o is an Obj instance, o.attr is equivalent to o.__dict__['attr'] with a small amount of extra overhead.
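For illustration, a minimal sketch (using the Obj class from the question, reduced to one field) showing that regular instance attributes are backed by the instance's __dict__:

class Obj(object):
    def __init__(self, i):
        self.i = i

o = Obj(42)
print(o.i)              # 42, via the normal attribute lookup machinery
print(o.__dict__['i'])  # 42, the same value fetched from the dict directly
o.__dict__['i'] = 99    # writes to __dict__ show up as attributes
print(o.i)              # 99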

Have you considered using a namedtuple? (link for python 2.4/2.5)
It's the new standard way of representing structured data that gives you the performance of a tuple and the convenience of a class.
Its only downside compared with dictionaries is that (like tuples) it doesn't give you the ability to change attributes after creation.
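A minimal sketch of that trade-off (the Point class and its fields are made up for illustration); _replace works around the immutability by building a new instance:

from collections import namedtuple

Point = namedtuple('Point', 'x y')
p = Point(1, 2)
print(p.x, p.y)       # attribute convenience with tuple performance
# p.x = 5             # would raise AttributeError: can't set attribute
p2 = p._replace(x=5)  # returns a new instance instead of mutating
print(p2)             # Point(x=5, y=2)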

Here is a copy of @hughdbrown's answer for Python 3.6.1. I've made the count 5x larger and added some code to test the memory footprint of the Python process at the end of each run.
Before the downvoters have at it, be advised that this method of counting the size of objects is not accurate.
from datetime import datetime
import os
import psutil

process = psutil.Process(os.getpid())
ITER_COUNT = 1000 * 1000 * 5
RESULT = None

def makeL(i):
    # Use this line to negate the effect of the strings on the test
    # return "Python is smart and will only create one string with this line"
    # Use this if you want to see the difference with 5 million unique strings
    return "This is a sample string %s" % i

def timeit(method):
    def timed(*args, **kw):
        global RESULT
        s = datetime.now()
        RESULT = method(*args, **kw)
        e = datetime.now()
        sizeMb = process.memory_info().rss / 1024 / 1024
        sizeMbStr = "{0:,}".format(round(sizeMb, 2))
        print('Time Taken = %s, \t%s, \tSize = %s' % (e - s, method.__name__, sizeMbStr))
    return timed
class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = makeL(i)

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
        self.i = i
        self.l = makeL(i)

from collections import namedtuple
NT = namedtuple("NT", ["i", 'l'])
@timeit
def profile_dict_of_nt():
    return dict((i, NT(i=i, l=makeL(i))) for i in range(ITER_COUNT))

@timeit
def profile_list_of_nt():
    return [NT(i=i, l=makeL(i)) for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': makeL(i)}) for i in range(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': makeL(i)} for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_slot():
    return dict((i, SlotObj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_slot():
    return [SlotObj(i) for i in range(ITER_COUNT)]
profile_dict_of_nt()
profile_list_of_nt()
profile_dict_of_dict()
profile_list_of_dict()
profile_dict_of_obj()
profile_list_of_obj()
profile_dict_of_slot()
profile_list_of_slot()
And these are my results:
Time Taken = 0:00:07.018720, profile_dict_of_nt, Size = 951.83
Time Taken = 0:00:07.716197, profile_list_of_nt, Size = 1,084.75
Time Taken = 0:00:03.237139, profile_dict_of_dict, Size = 1,926.29
Time Taken = 0:00:02.770469, profile_list_of_dict, Size = 1,778.58
Time Taken = 0:00:07.961045, profile_dict_of_obj, Size = 1,537.64
Time Taken = 0:00:05.899573, profile_list_of_obj, Size = 1,458.05
Time Taken = 0:00:06.567684, profile_dict_of_slot, Size = 1,035.65
Time Taken = 0:00:04.925101, profile_list_of_slot, Size = 887.49
My conclusion is:
Slots have the best memory footprint and are reasonable on speed.
dicts are the fastest, but use the most memory.
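As a rough cross-check, a hedged per-instance sketch (class names made up, CPython-specific): sys.getsizeof reports only the object itself, so the separately allocated __dict__ has to be added by hand.

import sys

class Plain(object):
    def __init__(self):
        self.i = 0

class Slotted(object):
    __slots__ = ('i',)
    def __init__(self):
        self.i = 0

p, s = Plain(), Slotted()
# plain instance plus its attribute dict vs. a slotted instance alone
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))
print(sys.getsizeof(s))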

from datetime import datetime

ITER_COUNT = 1000 * 1000

def timeit(method):
    def timed(*args, **kw):
        s = datetime.now()
        result = method(*args, **kw)
        e = datetime.now()
        print method.__name__, '(%r, %r)' % (args, kw), e - s
        return result
    return timed

class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
        self.i = i
        self.l = []

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': []}) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': []} for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_slotobj():
    return dict((i, SlotObj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_slotobj():
    return [SlotObj(i) for i in xrange(ITER_COUNT)]

if __name__ == '__main__':
    profile_dict_of_dict()
    profile_list_of_dict()
    profile_dict_of_obj()
    profile_list_of_obj()
    profile_dict_of_slotobj()
    profile_list_of_slotobj()
Results:
hbrown@hbrown-lpt:~$ python ~/Dropbox/src/StackOverflow/1336791.py
profile_dict_of_dict ((), {}) 0:00:08.228094
profile_list_of_dict ((), {}) 0:00:06.040870
profile_dict_of_obj ((), {}) 0:00:11.481681
profile_list_of_obj ((), {}) 0:00:10.893125
profile_dict_of_slotobj ((), {}) 0:00:06.381897
profile_list_of_slotobj ((), {}) 0:00:05.860749

There is no question.
You have data, with no other attributes (no methods, nothing). Hence you have a data container (in this case, a dictionary).
I usually prefer to think in terms of data modeling. If there is some huge performance issue, then I can give up something in the abstraction, but only with very good reasons.
Programming is all about managing complexity, and maintaining the correct abstraction is very often one of the most useful ways to achieve that result.
As for the reasons an object is slower, I think your measurement is not correct.
You are performing too few assignments inside the for loop, so what you see is mostly the different time necessary to instantiate a dict (an intrinsic object) and a "custom" object. Although from the language perspective they are the same, they have quite different implementations.
After that, the assignment time should be almost the same for both, as in the end members are maintained inside a dictionary.
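A quick hedged check of that claim with the standard timeit module (numbers will vary by machine): once the instances exist, an attribute store and a dict item store should be roughly comparable.

import timeit

setup = """
class Obj(object):
    def __init__(self):
        self.i = 0

o = Obj()
d = {'i': 0}
"""
print(timeit.timeit('o.i = 1', setup=setup))     # attribute assignment
print(timeit.timeit("d['i'] = 1", setup=setup))  # dict item assignment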

Here are my test runs of the very nice script by @Jarrod-Chesney.
For comparison, I also ran it against Python 2 with "range" replaced by "xrange".
Out of curiosity, I also added similar tests with OrderedDict (ordict) for comparison.
Python 3.6.9:
Time Taken = 0:00:04.971369, profile_dict_of_nt, Size = 944.27
Time Taken = 0:00:05.743104, profile_list_of_nt, Size = 1,066.93
Time Taken = 0:00:02.524507, profile_dict_of_dict, Size = 1,920.35
Time Taken = 0:00:02.123801, profile_list_of_dict, Size = 1,760.9
Time Taken = 0:00:05.374294, profile_dict_of_obj, Size = 1,532.12
Time Taken = 0:00:04.517245, profile_list_of_obj, Size = 1,441.04
Time Taken = 0:00:04.590298, profile_dict_of_slot, Size = 1,030.09
Time Taken = 0:00:04.197425, profile_list_of_slot, Size = 870.67
Time Taken = 0:00:08.833653, profile_ordict_of_ordict, Size = 3,045.52
Time Taken = 0:00:11.539006, profile_list_of_ordict, Size = 2,722.34
Time Taken = 0:00:06.428105, profile_ordict_of_obj, Size = 1,799.29
Time Taken = 0:00:05.559248, profile_ordict_of_slot, Size = 1,257.75
Python 2.7.15+:
Time Taken = 0:00:05.193900, profile_dict_of_nt, Size = 906.0
Time Taken = 0:00:05.860978, profile_list_of_nt, Size = 1,177.0
Time Taken = 0:00:02.370905, profile_dict_of_dict, Size = 2,228.0
Time Taken = 0:00:02.100117, profile_list_of_dict, Size = 2,036.0
Time Taken = 0:00:08.353666, profile_dict_of_obj, Size = 2,493.0
Time Taken = 0:00:07.441747, profile_list_of_obj, Size = 2,337.0
Time Taken = 0:00:06.118018, profile_dict_of_slot, Size = 1,117.0
Time Taken = 0:00:04.654888, profile_list_of_slot, Size = 964.0
Time Taken = 0:00:59.576874, profile_ordict_of_ordict, Size = 7,427.0
Time Taken = 0:10:25.679784, profile_list_of_ordict, Size = 11,305.0
Time Taken = 0:05:47.289230, profile_ordict_of_obj, Size = 11,477.0
Time Taken = 0:00:51.485756, profile_ordict_of_slot, Size = 11,193.0
So, on both major versions, the conclusions of @Jarrod-Chesney still look good.

There is yet another way to reduce memory usage, with the help of the recordclass library, if the data structure isn't supposed to contain reference cycles.
Let's compare two classes:
class DataItem:
    __slots__ = ('name', 'age', 'address')
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address
and
$ pip install recordclass
>>> from recordclass import make_dataclass
>>> DataItem2 = make_dataclass('DataItem', 'name age address')
>>> inst = DataItem('Mike', 10, 'Cherry Street 15')
>>> inst2 = DataItem2('Mike', 10, 'Cherry Street 15')
>>> print(inst2)
DataItem(name='Mike', age=10, address='Cherry Street 15')
>>> import sys
>>> print(sys.getsizeof(inst), sys.getsizeof(inst2))
64 40
This is possible because dataobject-based subclasses don't support cyclic garbage collection, which isn't needed in such cases.

Related

Poor Python Multiprocessing Performance

I attempted to speed up my Python program using the multiprocessing module, but I found it quite slow. A toy example is as follows:
import time
from multiprocessing import Pool, Manager

class A:
    def __init__(self, i):
        self.i = i
    def score(self, x):
        return self.i - x

class B:
    def __init__(self):
        self.i_list = list(range(1000))
        self.A_list = []
    def run_1(self):
        for i in self.i_list:
            self.x = i
            map(self.compute, self.A_list)  # map version
            self.A_list.append(A(i))
    def run_2(self):
        p = Pool()
        for i in self.i_list:
            self.x = i
            p.map(self.compute, self.A_list)  # multicore version
            self.A_list.append(A(i))
    def compute(self, some_A):
        return some_A.score(self.x)

if __name__ == "__main__":
    st = time.time()
    foo = B()
    foo.run_1()
    print("Map: ", time.time()-st)
    st = time.time()
    foo = B()
    foo.run_2()
    print("MultiCore: ", time.time()-st)
The outcome on my computer (Windows 10, Python 3.5) is:
Map: 0.0009996891021728516
MultiCore: 19.34994912147522
Similar results can be observed on a Linux machine (CentOS 7, Python 3.6).
I guess it was caused by the pickling/unpickling of objects between processes? I tried to use the Manager module but failed to get it to work.
Any help will be appreciated.
Wow, that's impressive (and slow!).
Yes, this is because the objects must be accessed concurrently by workers, which is costly. (Note also that in Python 3 the built-in map is lazy, so run_1 as written never actually calls compute, which is why the "Map" baseline looks nearly instantaneous.)
So I played a little bit and managed to gain a lot of performance by making the compute method static. Basically, you don't need to share the B object instance anymore. Still very slow, but better.
import time
from multiprocessing import Pool, Manager

class A:
    def __init__(self, i):
        self.i = i
    def score(self, x):
        return self.i - x

x = 0

def static_compute(some_A):
    res = some_A.score(x)
    return res

class B:
    def __init__(self):
        self.i_list = list(range(1000))
        self.A_list = []
    def run_1(self):
        for i in self.i_list:
            x = i
            map(self.compute, self.A_list)  # map version
            self.A_list.append(A(i))
    def run_2(self):
        p = Pool(4)
        for i in self.i_list:
            x = i
            p.map(static_compute, self.A_list)  # multicore version
            self.A_list.append(A(i))
The other reason it is slow, to me, is the fixed cost of using Pool. You're actually launching Pool.map 1000 times. If there is a fixed cost associated with dispatching those tasks, that would make the overall strategy slow. Maybe you should test that with a longer A_list (longer than i_list, which requires a different algorithm).
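A hedged sketch of that direction (Python 3 syntax; the work function is made up): create the Pool once and amortize its startup cost over a single large map call.

from multiprocessing import Pool

def work(item):
    return item * item

if __name__ == '__main__':
    with Pool() as p:
        results = p.map(work, range(1000))  # one dispatch for all items
    print(sum(results))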
The reasoning behind this is:
when foo.run_1() is called, the map call is performed by the main process itself - the main process is mapping for itself, much like telling yourself what to do.
when foo.run_2() is called, the main process maps the work across as many worker processes as the machine supports. If your maximum is 6 processes, the main process is coordinating 6 workers, much like organizing 6 people to report back to you.
Side note:
if you use:
p.imap(self.compute, self.A_list)
the results are yielded in the same order as A_list.
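A small hedged sketch of that side note (the square function is made up): Pool.imap yields results lazily but in input order.

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    with Pool(4) as p:
        print(list(p.imap(square, range(10))))  # [0, 1, 4, ..., 81], in order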

Using multiprocessing module in class

I have the following program and I want to use the multiprocessing module. It uses external files: I call the PSO class from another file, costfunc is a function from another file, and the other args are just variables.
Swarm is a list containing as many objects as the value of ps, and each object has multiple attributes which need to be updated at every iteration.
Following Hannu's suggestion, I implemented multiprocessing.Pool and it is working; however, it is taking much more time than running sequentially.
I would appreciate it if you could tell me why this is happening and how I can make it run faster.
# IMPORT PACKAGES -----------------------------------------------------------+
import random
import multiprocessing
from functools import partial

import numpy as np

# IMPORT FILES --------------------------------------------------------------+
from Reducer import initial

# Particle Class ------------------------------------------------------------+
class Particle:
    def __init__(self, D, bounds_x, bounds_v):
        self.Position_i = []          # particle position
        self.Velocity_i = []          # particle velocity
        self.Cost_i = -1              # cost individual
        self.Position_Best_i = []     # best position individual
        self.Cost_Best_i = -1         # best cost individual
        self.Constraint_Best_i = []   # best cost individual constraints
        self.Constraint_i = []        # constraints individual
        self.Penalty_i = -1           # constraints individual
        x0, v0 = initial(D, bounds_x, bounds_v)
        for i in range(0, D):
            self.Velocity_i.append(v0[i])
            self.Position_i.append(x0[i])

    # evaluate current fitness
    def evaluate(self, costFunc, i):
        self.Cost_i, self.Constraint_i, self.Penalty_i = costFunc(self.Position_i, i)
        # check to see if the current position is an individual best
        if self.Cost_i < self.Cost_Best_i or self.Cost_Best_i == -1:
            self.Position_Best_i = self.Position_i
            self.Cost_Best_i = self.Cost_i
            self.Constraint_Best_i = self.Constraint_i
            self.Penalty_Best_i = self.Penalty_i
        return self

def proxy(gg, costf, i):
    print(gg.evaluate(costf, i))

# Swarm Class ---------------------------------------------------------------+
class PSO():
    def __init__(self, costFunc, bounds_x, bounds_v, ps, D, maxiter):
        self.Cost_Best_g = -1         # Best Cost for Group
        self.Position_Best_g = []     # Best Position for Group
        self.Constraint_Best_g = []
        self.Penalty_Best_g = -1
        # Establish Swarm
        Swarm = []
        for i in range(0, ps):
            Swarm.append(Particle(D, bounds_x, bounds_v))
        # Begin optimization Loop
        i = 1
        self.Evol = []
        while i <= maxiter:
            pool = multiprocessing.Pool(processes=4)
            results = pool.map_async(partial(proxy, costf=costFunc, i=i), Swarm)
            pool.close()
            pool.join()
            Swarm = results.get()
            for j in range(ps):
                if Swarm[j].Cost_i < self.Cost_Best_g or self.Cost_Best_g == -1:
                    self.Position_Best_g = list(Swarm[j].Position_i)
                    self.Cost_Best_g = float(Swarm[j].Cost_i)
                    self.Constraint_Best_g = list(Swarm[j].Constraint_i)
                    self.Penalty_Best_g = float(Swarm[j].Penalty_i)
            self.Evol.append(self.Cost_Best_g)
            i += 1
You need a proxy function to do the function call, and as you need to deliver arguments to the function, you will need partial as well. Consider this:
from time import sleep
from multiprocessing import Pool
from functools import partial

class Foo:
    def __init__(self, a):
        self.a = a
        self.b = None
    def evaluate(self, CostFunction, i):
        xyzzy = CostFunction(i)
        sleep(0.01)
        self.b = self.a * xyzzy
        return self

def CostFunc(i):
    return i * i

def proxy(gg, costf, i):
    return gg.evaluate(costf, i)

def main():
    Swarm = []
    for i in range(0, 10):
        nc = Foo(i)
        Swarm.append(nc)
    p = Pool()
    for i in range(100, 102):
        results = p.map_async(partial(proxy, costf=CostFunc, i=i), Swarm)
    p.close()
    p.join()
    Swarm = []
    for a in results.get():
        Swarm.append(a)
    for s in Swarm:
        print(s.b)

main()
This creates a Swarm list of objects, and within each of these objects is evaluate, the method you need to call. Then we have the parameters (CostFunc and an integer, as in your code).
We will now use Pool.map_async to map your Swarm list to your pool. This gives each worker one instance of Foo from your Swarm list, and we have a proxy function that actually calls evaluate().
However, as map_async only sends an object from the iterable to the function, instead of using proxy as the target function to the pool, we use partial to create the target function and pass the "fixed" arguments.
And as you apparently want to get the modified objects back, this requires another trick. If you modify the target object in a Pool process, it just modifies its local copy and throws it away as soon as the processing completes. There would be no way for the subprocess to modify main process memory anyway (or vice versa); that would cause a segmentation fault.
Instead, after modifying the object, we return self. When your pool has completed its work, we discard the old Swarm and reassemble it from the result objects.

Python27: random() after a setstate() doesn't produce the same random number

I have been subclassing Python's random number generator to make a generator that doesn't repeat results (it's going to be used to generate unique ids for a simulator), and I was just testing to see if it was consistent in its behavior after it has been loaded from a previous state.
Before people ask:
It's a singleton class
No, there's nothing else that should be using that instance (a tear-down sees to that)
Yes, I tested it without the singleton instance to check
And yes, when I create this subclass I do call a new instance (super(nrRand,self).__init__())
And yes, according to another post I should get consistent results, see: Rolling back the random number generator in python?
Below is my test code:
def test_stateSavingConsitantcy(self):
    start = int(self.r.random())
    for i in xrange(start):
        self.r.random()
    state = self.r.getstate()
    next = self.r.random()
    self.r.setstate(state)
    nnext = self.r.random()
    self.assertEqual(next, nnext, "Number generation not constant got {0} expecting {1}".format(nnext, next))
Any help that can be provided would be greatly appreciated.
EDIT:
Here is my subclass as requested
class Singleton(type):
    _instances = {}
    def __call__(self, *args, **kwargs):
        if self not in self._instances:
            self._instances[self] = super(Singleton, self).__call__(*args, **kwargs)
        return self._instances[self]

class nrRand(Random):
    __metaclass__ = Singleton
    '''
    classdocs
    '''
    def __init__(self):
        '''
        Constructor
        '''
        super(nrRand, self).__init__()
        self.previous = []

    def random(self):
        n = super(nrRand, self).random()
        while n in self.previous:
            n = super(nrRand, self).random()
        self.previous.append(n)
        return n

    def seed(self, x):
        if x is None:
            x = long(time.time() * 1000)
        self.previous = []
        count = x
        nSeed = 0
        while count < 0:
            nSeed = super(nrRand, self).random()
            count -= 1
        super(nrRand, self).seed(nSeed)
        while nSeed < 0:
            super(nrRand, self).seed(nSeed)
            count -= 1

    def getstate(self):
        return (self.previous, super(nrRand, self).getstate())

    def setstate(self, state):
        self.previous = state[0]
        super(nrRand, self).setstate(state[1])
getstate and setstate only manipulate the state the Random class knows about; neither method knows that you also need to roll back the set of previously-generated numbers. You're rolling back the state inherited from Random, but then the object sees that it's already produced the next number and skips it. If you want getstate and setstate to work properly, you'll have to override them to set the state of the set of already-generated numbers.
UPDATE:
def getstate(self):
    return (self.previous, super(nrRand, self).getstate())
This shouldn't directly use self.previous. Since you don't make a copy, you're returning the actual object used to keep track of what numbers have been produced. When the RNG produces a new number, the state returned by getstate reflects the new number. You need to copy self.previous, like so:
def getstate(self):
    return (self.previous[:], super(nrRand, self).getstate())
I also recommend making a copy in setstate:
def setstate(self, state):
    previous, parent_state = state
    self.previous = previous[:]
    super(nrRand, self).setstate(parent_state)
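A hedged usage sketch, assuming the corrected getstate/setstate above are applied to the nrRand class from the question: saving, drawing, restoring, and drawing again should now yield the same number.

rng = nrRand()
state = rng.getstate()  # snapshot now includes a copy of self.previous
first = rng.random()
rng.setstate(state)     # roll back both the RNG and the history
assert rng.random() == first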

How do you get access to the dictionary under traits.api.Dict()?

Here is an example of failure from a shell.
>>> from traits.api import Dict
>>> d=Dict()
>>> d['Foo']='BAR'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'Dict' object does not support item assignment
I have been searching all over the web, and there is no indication of how to use Dict.
I am trying to write a simple app that displays the contents of a Python dictionary. This link (Defining view elements from dictionary elements in TraitsUI) was moderately helpful, except that the dictionary gets updated on some poll_interval, and if I use the solution there (wrapping a normal Python dict in a class derived from HasTraits) the display does not update when the underlying dictionary gets updated.
Here are the relevant parts of what I have right now. The last class can pretty much be ignored; the only reason I included it is to help understand how I intend to use the Dict.
pyNetObjDisplay.run_ext() gets called once per loop from the base class's run() method
class DictContainer(HasTraits):
    _dict = {}
    def __getattr__(self, key):
        return self._dict[key]
    def __getitem__(self, key):
        return self._dict[key]
    def __setitem__(self, key, value):
        self._dict[key] = value
    def __delitem__(self, key):
        del self._dict[key]
    def __str__(self):
        return self._dict.__str__()
    def __repr__(self):
        return self._dict.__repr__()
    def has_key(self, key):
        return self._dict.has_key(key)

class displayWindow(HasTraits):
    _remote_data = Instance(DictContainer)
    _messages = Str('', desc='Field to display messages to the user.', label='Messages', multi_line=True)
    def __remote_data_default(self):
        tempDict = DictContainer()
        tempDict._dict = Dict
        #tempDict['FOO'] = 'BAR'
        sys.stderr.write('SETTING DEFAULT DICTIONARY:\t%s\n' % tempDict)
        return tempDict
    def __messages_default(self):
        tempStr = Str()
        tempStr = ''
        return tempStr
    def traits_view(self):
        return View(
            Item('object._remote_data', editor=ValueEditor()),
            Item('object._messages'),
            resizable=True
        )

class pyNetObjDisplay(pyNetObject.pyNetObjPubClient):
    '''A derived pyNetObjPubClient that stores remote data in a dictionary and displays it using traitsui.'''
    def __init__(self, hostname='localhost', port=54322, service='pyNetObject', poll_int=10.0):
        self._display = displayWindow()
        self.poll_int = poll_int
        super(pyNetObjDisplay, self).__init__(hostname, port, service)
        self._ui_running = False
        self._ui_pid = 0
        ### For Testing Only, REMOVE THESE LINES ###
        self.connect()
        self.ns_subscribe(service, 'FOO', poll_int)
        self.ns_subscribe(service, 'BAR', poll_int)
        self.ns_subscribe(service, 'BAZ', poll_int)
        ############################################

    def run_ext(self):
        if not self._ui_running:
            self._ui_running = True
            self._ui_pid = os.fork()
            if not self._ui_pid:
                time.sleep(1.25 * self.poll_int)
                self._display.configure_traits()
        for ((service, namespace, key), value) in self._object_buffer:
            sys.stderr.write('TEST:\t' + str(self._display._remote_data) + '\n')
            if not self._display._remote_data.has_key(service):
                self._display._remote_data[service] = {}
            if not self._display._remote_data[service].has_key(namespace):
                #self._remote_data[service][namespace] = {}
                self._display._remote_data[service][namespace] = {}
            self._display._remote_data[service][namespace][key] = value
            msg = 'Got Published ((service, namespace, key), value) pair:\t((%s, %s, %s), %s)\n' % (service, namespace, key, value)
            sys.stderr.write(msg)
            self._display._messages += msg
        sys.stderr.write('REMOTE DATA:\n' + str(self._display._remote_data))
        self._object_buffer = []
I think your basic problem has to do with notification issues for traits that live outside the model object, and not with "how to access those objects" per se [edit: actually, no, this is not your problem at all! But it is what I thought you were trying to do when I read your question with my bias towards problems I have seen before; in any case, my suggested solution will still work]. I have run into this sort of problem recently because of how I decided to design my program (with code describing a GUI separated modularly from the very complex sets of data it can contain). You may have found my other questions, as you found the first one.
Having lots of data live in a complex data hierarchy away from the GUI is not the design that traitsui has in mind for your application, and it causes all kinds of problems with notifications. A flatter design, where GUI information is integrated into the different parts of your program more directly, is the design solution.
I think various workarounds might be possible for this in general (I have used some, for instance in enabled_when listening outside the model object) that don't involve dictionaries. I'm not sure what the most design-friendly solution to your problem with dictionaries is, but one thing that works and doesn't interfere much with your design (though it is still a "somewhat annoying" solution) is to make everything in the dictionary a HasTraits, and thus tag it as listenable. Like so:
from traits.api import *
from traitsui.api import *
from traitsui.ui_editors.array_view_editor import ArrayViewEditor
import numpy as np

class DContainer(HasTraits):
    _dict = Dict
    def __getattr__(self, k):
        if k in self._dict:
            return self._dict[k]

class DItem(HasTraits):
    _item = Any
    def __init__(self, item):
        super(DItem, self).__init__()
        self._item = item
    def setitem(self, val):
        self._item = val
    def getitem(self):
        return self._item
    def traits_view(self):
        return View(Item('_item', editor=ArrayViewEditor()))

class LargeApplication(HasTraits):
    d = Instance(DContainer)
    stupid_listener = Any
    bn = Button('CLICKME')
    def _d_default(self):
        d = DContainer()
        d._dict = {'a_stat': DItem(np.random.random((10, 1))),
                   'b_stat': DItem(np.random.random((10, 10)))}
        return d
    def traits_view(self):
        v = View(
            Item('object.d.a_stat', editor=InstanceEditor(), style='custom'),
            Item('bn'),
            height=500, width=500)
        return v
    def _bn_fired(self):
        self.d.a_stat.setitem(np.random.random((10, 1)))

LargeApplication().configure_traits()
Okay, I found the answer (kind of) in this question: Traits List not reporting items added or removed
When including Dict or List objects as attributes in a class, one should NOT do it this way:
class Foo(HasTraits):
    def __init__(self):
        ### This will not work as expected!
        self.bar = Dict(desc='Description.', label='Name', value={})
Instead do this:
class Foo(HasTraits):
    def __init__(self):
        self.add_trait('bar', Dict(desc='Description.', label='Name', value={}))
Now the following will work:
>>> f = Foo()
>>> f.bar['baz']='boo'
>>> f.bar['baz']
'boo'
Unfortunately, for some reason the GUI generated with configure_traits() does not update its view when the underlying data changes. Here is some test code that demonstrates the problem:
import os
import time
import sys
from traits.api import HasTraits, Str, Dict
from traitsui.api import View, Item, ValueEditor

class displayWindow(HasTraits):
    def __init__(self, **traits):
        super(displayWindow, self).__init__(**traits)
        self.add_trait('_remote_data', Dict(desc='Dictionary to store remote data in.', label='Data', value={}))
        self.add_trait('_messages', Str(desc='Field to display messages to the user.', label='Messages', multi_line=True, value=''))
    def traits_view(self):
        return View(
            Item('object._remote_data', editor=ValueEditor()),
            Item('object._messages'),
            resizable=True
        )

class testObj(object):
    def __init__(self):
        super(testObj, self).__init__()
        self._display = displayWindow()
        self._ui_pid = 0
    def run(self):
        ### Run the GUI in the background
        self._ui_pid = os.fork()
        if not self._ui_pid:
            self._display.configure_traits()
        i = 0
        while True:
            self._display._remote_data[str(i)] = i
            msg = 'Added (key,value):\t("%s", %s)\n' % (str(i), i, )
            self._display._messages += msg
            sys.stderr.write(msg)
            time.sleep(5.0)
            i += 1

if __name__ == '__main__':
    f = testObj()
    f.run()

Segmentation fault 11, python hash with lists, hashing 1 million objects

When I try to create and hash objects from a file containing one million songs, I get a weird segmentation error after about 12000 successful hashes.
Does anyone have any idea why this:
Segmentation fault: 11
happens when I run the program?
I have these classes for hashing the objects:
class Node():
    def __init__(self, key, value=None):
        self.key = key
        self.value = value
    def __str__(self):
        return str(self.key) + " : " + str(self.value)

class Hashtable():
    def __init__(self, hashsize, hashlist=[None]):
        self.hashsize = hashsize * 2
        self.hashlist = hashlist * (self.hashsize)
    def __str__(self):
        return self.hashlist
    def hash_num(self, name):
        result = 0
        name_list = list(name)
        for letter in name_list:
            result = (result * self.hashsize + ord(letter)) % self.hashsize
        return result
    def check(self, num):
        if self.hashlist[num] != None:
            num = (num + 11**2) % self.hashsize  # check this a lot!
            chk_num = self.check(num)            # this too
            return chk_num                       # learn this
        else:
            return num
    def check_atom(self, num, name):
        if self.hashlist[num].key == name:
            return num
        else:
            num = (num + 11**2) % self.hashsize
            chk_num = self.check_atom(num, name)  # read here
            return chk_num                        # read this
    def put(self, name, new_atom):
        node = Node(name)
        node.value = new_atom
        num = self.hash_num(name)
        chk_num = self.check(num)
        print(chk_num)
        self.hashlist[chk_num] = node
    def get(self, name):
        num = self.hash_num(name)
        chk_num = self.check_atom(num, name)
        atom = self.hashlist[chk_num]
        return atom.value
And I call upon the function in this code:
from time import *
from hashlist import *
import sys

sys.setrecursionlimit(1000000000)

def lasfil(filnamn, h):
    with open(filnamn, "r", encoding="utf-8") as fil:
        for rad in fil:
            data = rad.split("<SEP>")
            artist = data[2].strip()
            song = data[3].strip()
            h.put(artist, song)

def hitta(artist, h):
    try:
        start = time()
        print(h.get(artist))
        stop = time()
        tidhash = stop - start
        return tidhash
    except AttributeError:
        pass

h = Hashtable(1000000)
lasfil("write.txt", h)
The reason you're getting a segmentation fault is this line:
sys.setrecursionlimit(1000000000)
I assume you added it because you received a RuntimeError: maximum recursion depth exceeded. Raising the recursion limit doesn't allocate any more memory for the call stack, it just defers the aforementioned exception. If you set it too high, the interpreter runs out of stack space and accesses memory that doesn't belong to it, causing random errors (likely segfaults, but in theory anything is possible).
The real solution is to not use unbounded recursion. For things like balanced search trees, where the recursion depth is limited to a few dozen levels, it's okay, but you can't replace long loops with recursion.
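For instance, a hedged sketch of how the recursive probe could be rewritten as a loop (a drop-in replacement for Hashtable.check from the post, same probing step, no recursion-limit change needed; check_atom can be rewritten the same way):

def check(self, num):
    # probe iteratively instead of recursing, one step per occupied slot
    while self.hashlist[num] is not None:
        num = (num + 11**2) % self.hashsize
    return num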
Also, unless this is an exercise in creating hash tables, you should just use the built-in dict. If it is an exercise in creating hash tables, consider this a hint that something about your hash table sucks: it indicates a probe length of at least 1000, more likely several thousand. It should be only a few dozen at most, ideally in the single digits.
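And a hedged sketch of the built-in alternative, reusing the lasfil loop from the question with a plain dict instead of the custom table (file format taken from the question; a dict keyed by artist keeps one song per artist):

def lasfil(filnamn, h):
    with open(filnamn, "r", encoding="utf-8") as fil:
        for rad in fil:
            data = rad.split("<SEP>")
            h[data[2].strip()] = data[3].strip()  # artist -> song

h = {}
lasfil("write.txt", h)
# h.get(artist) then replaces the custom Hashtable.get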
