Understanding specific python dictionary creation using for i loop - python

I just started reading through Dave Peticolas's tutorial on Twisted (http://krondo.com/blog/?p=1247), and while quickly going through his early examples using Python sockets, I came across a line of code that I can't wrap my head around. The code is on his GitHub, specifically https://github.com/jdavisp3/twisted-intro/blob/master/async-client/get-poetry.py, but the context doesn't really matter.
Here is the line:
sock2task = dict([(s, i + 1) for i, s in enumerate(sockets)])
Where sockets is a list of sockets.
This line will create a dictionary in the form of
{<sock3 object>: 3, <sock3 object>: 2, <sock3 object>: 1}
however, I just don't understand how.
Trying to get an equivalent statement, I came up with
sock2task = dict(enumerate(sockets, start=1))
however this results in
{1: <sock3 object>, 2: <sock3 object>, 3: <sock3 object>}
which has the keys and values swapped, and is in reverse.
So how does it work? In the full code, neither s nor i is defined.
Thanks, Matt

Your line in question initializes a dictionary using a list comprehension. To break it down:
A dict can be initialized like this:
d = dict([(key0, value0), ...])  # make a dictionary out of a list of tuples
The list comprehension in the book is made up of the following components:
1.
# "for every index i and corresponding entry s in sockets"
for i, s in enumerate(sockets)
2.
# a tuple of the socket s and its index + 1: `i + 1`
(s, i + 1)
3.
# "Make a list in which for every index i
# and corresponding entry s in sockets there is a tuple (s, i + 1)"
[(s, i + 1) for i, s in enumerate(sockets)]
And so:
# "Convert this whole thing into a dictionary!"
dict([(s, i + 1) for i, s in enumerate(sockets)])
An equivalent code would be:
sock2task = {}
for index, socket in enumerate(sockets):
    sock2task[socket] = index + 1
The printed dictionary starts at 3 only by coincidence: dictionaries in Python 2 (and in Python 3 before 3.7) do not preserve insertion order.
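To check that the comprehension and the explicit loop really build the same mapping, here is a minimal sketch. It assumes plain strings as hypothetical stand-ins for the socket objects; any hashable object works as a dictionary key in the same way.

```python
# Hypothetical stand-ins for the socket objects.
sockets = ["sock_a", "sock_b", "sock_c"]

# The line from the tutorial: a list of (socket, task_number) pairs turned into a dict.
from_list = dict([(s, i + 1) for i, s in enumerate(sockets)])

# The equivalent explicit loop.
from_loop = {}
for index, sock in enumerate(sockets):
    from_loop[sock] = index + 1

print(from_list == from_loop)  # True
print(from_list)  # {'sock_a': 1, 'sock_b': 2, 'sock_c': 3} (insertion order on Python 3.7+)
```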
I hope it is clearer now.

You have to dive into list comprehensions to get the necessary background.
That said, you can create a dictionary from a list of pairs, where the first element of each pair becomes the key and the second becomes the value. The following code follows the same idea, but it may help you understand how it works.
result = {}
for index, s in enumerate(sockets):
    result[s] = index + 1
Your code gets the inverse result because you did not swap the values returned by enumerate (as is done in my example and in the original code).
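If you want to keep the enumerate(sockets, start=1) idea from the question, you only need to swap each pair before handing it to dict. A minimal sketch, again with hypothetical string stand-ins for the sockets:

```python
sockets = ["sock_a", "sock_b", "sock_c"]

# dict(enumerate(...)) builds {index: socket}; yielding (socket, index) instead
# puts the socket in the key position and the 1-based index in the value position.
sock2task = dict((s, i) for i, s in enumerate(sockets, start=1))

print(sock2task)  # {'sock_a': 1, 'sock_b': 2, 'sock_c': 3}
```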

sock2task = dict([(s, i + 1) for i, s in enumerate(sockets)])
This is a list comprehension plus tuple unpacking, converted to a dict.
A list comprehension takes the form [f(i) for i in iterable]. In this case, the iterable is enumerate(sockets), which yields two-element tuples of the form (index, item).
Tuples can be unpacked, i.e. i, j = (0, 1) assigns 0 to i and 1 to j.
So, basically, i and s are created by the "for i, s in enumerate(sockets)" part of the comprehension.
(Incidentally, in Python 2.7+ you can also use a dictionary comprehension: sock2task = {s: i + 1 for i, s in enumerate(sockets)})
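The pieces can be inspected one at a time. This sketch (with hypothetical string stand-ins for the sockets) shows the tuples that enumerate yields, unpacks one of them, and builds the same mapping with a dictionary comprehension:

```python
sockets = ["sock_a", "sock_b", "sock_c"]

# enumerate yields (index, item) tuples.
pairs = list(enumerate(sockets))
print(pairs)  # [(0, 'sock_a'), (1, 'sock_b'), (2, 'sock_c')]

# Tuple unpacking: the same thing the comprehension does for each pair.
i, s = pairs[0]
print(i, s)  # 0 sock_a

# Dictionary comprehension equivalent of dict([(s, i + 1) for i, s in ...]).
sock2task = {s: i + 1 for i, s in enumerate(sockets)}
print(sock2task)  # {'sock_a': 1, 'sock_b': 2, 'sock_c': 3}
```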

Related

The following code gives the output i = 0, 1, 2, 3, 4 for some reason. Can anyone explain how this is happening? [duplicate]

Let's say we have a Python dictionary d, and we're iterating over it like so:
for k, v in d.iteritems():
    del d[f(k)] # remove some item
    d[g(k)] = v # add a new item
(f and g are just some black-box transformations.)
In other words, we try to add/remove items to d while iterating over it using iteritems.
Is this well defined? Could you provide some references to support your answer?
See also How to avoid "RuntimeError: dictionary changed size during iteration" error? for the separate question of how to avoid the problem.
Alex Martelli weighs in on this here.
It may not be safe to change the container (e.g. a dict) while looping over it.
So del d[f(k)] may not be safe. As you know, the workaround is to use d.copy().items() (to loop over an independent copy of the container) instead of d.iteritems() or, in Python 3, d.items() (which use the same underlying container).
It is okay to modify the value stored under an existing key of the dict, but inserting values under new keys (e.g. d[g(k)] = v) may not work.
It is explicitly mentioned on the Python doc page (for Python 2.7) that
Using iteritems() while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries.
Similarly for Python 3.
The same holds for iter(d), d.iterkeys() and d.itervalues(), and I'll go as far as saying that it does for for k, v in d.items(): (I can't remember exactly what for does, but I would not be surprised if the implementation called iter(d)).
You cannot do that, at least with d.iteritems(). I tried it, and Python fails with
RuntimeError: dictionary changed size during iteration
If you instead use d.items(), then it works.
In Python 3, d.items() is a view into the dictionary, like d.iteritems() in Python 2. To do this in Python 3, instead use d.copy().items(). This will similarly allow us to iterate over a copy of the dictionary in order to avoid modifying the data structure we are iterating over.
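A minimal Python 3 sketch of both behaviours described above: adding a key while iterating over the live items() view raises RuntimeError, while iterating over d.copy().items() lets you delete and add keys safely. The key transformation k + 10 is an arbitrary example.

```python
d = {1: "a", 2: "b", 3: "c"}

# Adding a key while iterating over the live view fails on the next step.
caught = None
try:
    for k, v in d.items():
        d[k + 10] = v
except RuntimeError as exc:
    caught = str(exc)
    print("RuntimeError:", caught)

# Iterating over an independent copy lets us change the original freely.
d = {1: "a", 2: "b", 3: "c"}
for k, v in d.copy().items():
    del d[k]
    d[k + 10] = v + "x"

print(d)  # {11: 'ax', 12: 'bx', 13: 'cx'}
```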
I have a large dictionary containing NumPy arrays, so the dict.copy().keys() approach suggested by @murgatroid99 was not feasible (though it worked). Instead, I just converted the keys view to a list, and it worked fine (in Python 3.4):
for item in list(dict_d.keys()):
    temp = dict_d.pop(item)
    dict_d['some_key'] = 1 # Some value
I realize this doesn't dive into the philosophical realm of Python's inner workings like the answers above, but it does provide a practical solution to the stated problem.
The following code shows that this is not well defined:
def f(x):
    return x
def g(x):
    return x+1
def h(x):
    return x+10
try:
    d = {1:"a", 2:"b", 3:"c"}
    for k, v in d.iteritems():
        del d[f(k)]
        d[g(k)] = v+"x"
    print d
except Exception as e:
    print "Exception:", e
try:
    d = {1:"a", 2:"b", 3:"c"}
    for k, v in d.iteritems():
        del d[f(k)]
        d[h(k)] = v+"x"
    print d
except Exception as e:
    print "Exception:", e
The first example calls g(k), and throws an exception (dictionary changed size during iteration).
The second example calls h(k) and throws no exception, but outputs:
{21: 'axx', 22: 'bxx', 23: 'cxx'}
Which, looking at the code, seems wrong - I would have expected something like:
{11: 'ax', 12: 'bx', 13: 'cx'}
In Python 3 you should just:
prefix = 'item_'
t = {'f1': 'ffw', 'f2': 'fca'}
t2 = dict()
for k,v in t.items():
    t2[k] = prefix + v
or use:
t2 = t.copy()
You should never modify the original dictionary while iterating over it: it leads to confusion as well as potential bugs or RuntimeErrors. Build a new dictionary (or iterate over a copy) instead.
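The same idea can be written as a dict comprehension that reads the original and builds a new dictionary, so the dictionary being iterated over is never modified. A minimal sketch reusing t and prefix from the answer above:

```python
prefix = 'item_'
t = {'f1': 'ffw', 'f2': 'fca'}

# Read from t, write into a brand-new dict; t itself is never mutated.
t2 = {k: prefix + v for k, v in t.items()}

print(t2)  # {'f1': 'item_ffw', 'f2': 'item_fca'}
print(t)   # {'f1': 'ffw', 'f2': 'fca'}
```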
This question asks about using an iterator (and, funnily enough, the Python 2 .iteritems iterator is no longer supported in Python 3) to delete or add items, and it must have "No" as its only right answer, as you can find in the accepted answer. Yet most searchers are looking for a solution and will not care how it is done technically, be it an iterator or a recursion, and there is a solution for the problem:
You cannot loop-change a dict without using an additional (recursive) function.
This question should therefore be linked to a question that has a working solution:
How can I remove a key:value pair wherever the chosen key occurs in a deeply nested dictionary? (= "delete")
Also helpful as it shows how to change the items of a dict on the run: How can I replace a key:value pair by its value wherever the chosen key occurs in a deeply nested dictionary? (= "replace").
By the same recursive methods, you will also be able to add items, as the question asks.
Since my request to link this question was declined, here is a copy of the solution that can delete items from a dict. See How can I remove a key:value pair wherever the chosen key occurs in a deeply nested dictionary? (= "delete") for examples / credits / notes.
import copy
def find_remove(this_dict, target_key, bln_overwrite_dict=False):
    if not bln_overwrite_dict:
        this_dict = copy.deepcopy(this_dict)
    for key in this_dict:
        # if the current value is a dict, dive into it
        if isinstance(this_dict[key], dict):
            if target_key in this_dict[key]:
                this_dict[key].pop(target_key)
            this_dict[key] = find_remove(this_dict[key], target_key)
    return this_dict
dict_nested_new = find_remove(nested_dict, "sub_key2a")
The trick
The trick is to find out in advance whether target_key is among the next children (= this_dict[key] = the values of the current dict iteration) before you reach the child level recursively. Only then can you still delete a key:value pair of the child level while iterating over a dictionary. Once you have reached the same level as the key to be deleted and then try to delete it from there, you would get the error:
RuntimeError: dictionary changed size during iteration
The recursive solution makes any change only on the next values' sub-level and therefore avoids the error.
I had the same problem and used the following procedure to solve it.
A Python list can be iterated over even if you modify it during the iteration,
so the following code prints 1 forever:
lst = [1]
for i in lst:
    lst.append(1)
    print 1
So by using a list and a dict together, you can solve this problem (f and g are the black-box transformations from the question):
d_dict = {}            # the dictionary to process
d_list = list(d_dict)  # iterate over a list of keys, which can safely grow
for k in d_list:
    v = d_dict[k]
    if v != -1:
        d_dict[f(k)] = -1  # rather than deleting it, mark it with -1 (or another value) so it is not considered further (deleted)
        d_dict[g(k)] = v   # add a new item
        d_list.append(g(k))
Today I had a similar use-case, but instead of simply materializing the keys on the dictionary at the beginning of the loop, I wanted changes to the dict to affect the iteration of the dict, which was an ordered dict.
I ended up building the following routine, which can also be found in jaraco.itertools:
def _mutable_iter(dict):
    """
    Iterate over items in the dict, yielding the first one, but allowing
    it to be mutated during the process.
    >>> d = dict(a=1)
    >>> it = _mutable_iter(d)
    >>> next(it)
    ('a', 1)
    >>> d
    {}
    >>> d.update(b=2)
    >>> list(it)
    [('b', 2)]
    """
    while dict:
        prev_key = next(iter(dict))
        yield prev_key, dict.pop(prev_key)
The docstring illustrates the usage. This function could be used in place of d.iteritems() above to have the desired effect.

Python - Updating value in one dictionary is updating value in all dictionaries

I have a list of dictionaries called lod. All dictionaries have the same keys but different values. I am trying to update one specific value in the list of values for the same key in all the dictionaries.
I am attempting to do it with the following for loop:
for i in range(len(lod)):
    a=lod[i][key][:]
    a[p]=a[p]+lov[i]
    lod[i][key]=a
What's happening is that each dictionary is getting updated len(lod) times, so lod[0][key][p] is supposed to have lov[0] added to it, but instead it gets lov[0]+lov[1]+... added to it.
What am I doing wrong?
Here is how I declared the list of dicts:
lod = [{} for _ in range(len(dataul))]
for j in range(len(dataul)):
    for i in datakl:
        rrdict[str.split(i,',')[0]]=list(str.split(i,',')[1:len(str.split(i,','))])
    lod[j]=rrdict
The problem is in how you created the list of dictionaries. You probably did something like this:
list_of_dicts = [{}] * 20
That's actually the same dict 20 times. Try doing something like this:
list_of_dicts = [{} for _ in range(20)]
Without seeing how you actually created it, this is only an example solution to an example problem.
To know for sure, print this:
[id(x) for x in list_of_dicts]
If you defined it in the * 20 method, the id is the same for each dict. In the list comprehension method, the id is unique.
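A small sketch that makes the difference visible: with [{}] * 3 all three slots refer to one dict, so a change through one slot shows up in all of them, while the list comprehension creates three independent dicts.

```python
shared = [{}] * 3                  # three references to ONE dict
separate = [{} for _ in range(3)]  # three distinct dicts

shared[0]["x"] = 1
separate[0]["x"] = 1

print(shared)    # [{'x': 1}, {'x': 1}, {'x': 1}]
print(separate)  # [{'x': 1}, {}, {}]

# Counting distinct ids confirms how many dict objects each list holds.
print(len({id(d) for d in shared}), len({id(d) for d in separate}))  # 1 3
```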
This is where the trouble starts: lod[j] = rrdict. lod itself is created properly with different dictionaries. Unfortunately, afterwards every reference to the original dictionaries in the list gets overwritten with a reference to rrdict, so in the end the list contains only references to one single dictionary. Here is a more Pythonic and readable way to solve your problem:
lod = [{} for _ in range(len(dataul))]
for rrdict in lod:
    for line in datakl:
        splt = line.split(',')
        rrdict[splt[0]] = splt[1:]
You created the list of dictionaries correctly, as per the other answer.
However, when you update the individual dictionaries, you overwrite every entry in the list with the same dictionary.
Removing noise from your code snippet:
lod = [{} for _ in range(whatever)]
for j in range(whatever):
    # rrdict = lod[j] # Uncomment this as a possible fix.
    for i in range(whatever):
        rrdict[somekey] = somevalue
    lod[j] = rrdict
Assignment on the last line throws away the empty dict that was in lod[j] and inserts a reference to the object represented by rrdict.
Not sure what your code does, but see the commented-out line: it might be the fix you are looking for.
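A small reproduction of the bug and of the commented-out fix. It assumes rrdict was created once, before the loop (the question does not show this, so it is an assumption), and uses made-up comma-separated lines for datakl.

```python
datakl = ["a,1,2", "b,3,4"]  # hypothetical input lines
n = 3

# Buggy pattern: one dict object is reused for every slot of the list.
rrdict = {}
lod_bad = [{} for _ in range(n)]
for j in range(n):
    for line in datakl:
        parts = line.split(",")
        rrdict[parts[0]] = parts[1:]
    lod_bad[j] = rrdict  # every slot now points at the same dict

# Fix: fill the dict that already lives in lod[j] instead of replacing it.
lod_good = [{} for _ in range(n)]
for j in range(n):
    rrdict = lod_good[j]
    for line in datakl:
        parts = line.split(",")
        rrdict[parts[0]] = parts[1:]

print(lod_bad[0] is lod_bad[1])    # True
print(lod_good[0] is lod_good[1])  # False
```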

Python large list manipulation

I have a Python list like the one below:
DEMO_LIST = [
[{'unweighted_criket_data': [-46.14554728131345, 2.997789122813151, -23.66171024766996]},
{'weighted_criket_index_input': [-6.275794430258629, 0.4076993207025885, -3.2179925936831144]},
{'manual_weighted_cricket_data': [-11.536386820328362, 0.7494472807032877, -5.91542756191749]},
{'average_weighted_cricket_data': [-8.906090625293496, 0.5785733007029381, -4.566710077800302]}],
[{'unweighted_football_data': [-7.586729834820534, 3.9521665714843675, 5.702038461085529]},
{'weighted_football_data': [-3.512655913521907, 1.8298531225972623, 2.6400438074826]},
{'manual_weighted_football_data': [-1.8966824587051334, 0.9880416428710919, 1.4255096152713822]},
{'average_weighted_football_data': [-2.70466918611352, 1.4089473827341772, 2.0327767113769912]}],
[{'unweighted_rugby_data': [199.99999999999915, 53.91020408163265, -199.9999999999995]},
{'weighted_rugby_data': [3.3999999999999857, 0.9164734693877551, -3.3999999999999915]},
{'manual_rugby_data': [49.99999999999979, 13.477551020408162, -49.99999999999987]},
{'average_weighted_rugby_data': [26.699999999999886, 7.197012244897959, -26.699999999999932]}],
[{'unweighted_swimming_data': [2.1979283454982053, 14.079951031527246, -2.7585499298828777]},
{'weighted_swimming_data': [0.8462024130168091, 5.42078114713799, -1.062041723004908]},
{'manual_weighted_swimming_data': [0.5494820863745513, 3.5199877578818115, -0.6896374824707194]},
{'average_weighted_swimming_data': [0.6978422496956802, 4.470384452509901, -0.8758396027378137]}]]
I want to manipulate the list items and do some basic math operations, like getting each data type's list (for example, taking the first element of all the unweighted data and summing them, etc.).
Currently I am doing it like this.
The current solution is a very basic one. I want to write it so that if the list grows, it calculates the results automatically. Right now there are four lists, but there could be 5 or 8; the final result should be the sum of all the first elements of the unweighted values, for example:
right now I am doing result_u1/4, result_u2/4, result_u3/4
I want something like result_u0/4, result_u1/4, ..., result_n4/4 # n is the number of lists inside DEMO_LIST
Any idea how I can do that?
(sorry for the beginner question)
You can implement your own list class that adds each new item's value to a running summary in append, and subtracts it again in remove:
class MyList(list):
    def __init__(self):
        self.summary = 0
        list.__init__(self)
    def append(self, item):
        self.summary += item['sample_value']  # items are dicts, as in the usage below
        list.append(self, item)
    def remove(self, item):
        self.summary -= item['sample_value']
        list.remove(self, item)
And a simple usage:
my_list = MyList()
print my_list.summary # Outputs 0
my_list.append({'sample_value': 10})
print my_list.summary # Outputs 10
In Python, whenever you start counting how many of something there are inside an iterable (a string, a list, a set, a collection of any of these) in order to loop over it, it's a sign that your code can be revised.
Code that works for 3 of something can work for 300, 3000, or 3 million of the same thing without any changes.
In your case, your logic is - "For every X inside DEMO_LIST, do something"
This translated into Python is:
for i in DEMO_LIST:
    # do something with i
This snippet will run through a DEMO_LIST of any size, and on each pass i is the next item inside DEMO_LIST. In your case, that is one of the lists that contain your dictionaries.
Further expanding on that, you can say:
for i in DEMO_LIST:
    for k in i:
        # now k is each dictionary inside the inner list i
Expanding this to do a practical example; a sum of all unweighted_criket_data:
all_unweighted_cricket_data = []
for i in DEMO_LIST:
    for k in i:
        if 'unweighted_criket_data' in k:  # key spelled as in DEMO_LIST
            for data in k['unweighted_criket_data']:
                all_unweighted_cricket_data.append(data)
sum_of_data = sum(all_unweighted_cricket_data)
There are various "shortcuts" to do the same, but you can appreciate those once you understand the "expanded" version of what the shortcut is trying to do.
Remember, there is nothing wrong with writing it out the 'long way', especially when you are not sure of the best way to do something. Once you are comfortable with the logic, you can use shortcuts like list comprehensions.
Start by replacing this:
for i in range(0,len(data_list)-1):
    result_u1+=data_list[i][0].values()[0][0]
    result_u2+=data_list[i][0].values()[0][1]
    result_u3+=data_list[i][0].values()[0][2]
print "UNWEIGHTED",result_u1/4,result_u2/4,result_u3/4
With this:
sz = len(data_list[0][0].values()[0])  # i is not defined yet here, so use the first sport
result_u = [0] * sz
for i in range(0,len(data_list)-1):
    for j in range(0,sz):
        result_u[j] += data_list[i][0].values()[0][j]
print "UNWEIGHTED", [x/len(data_list) for x in result_u]
Apply similar changes elsewhere. This assumes that your data really is "rectangular", that is to say every corresponding inner list has the same number of values.
A slightly more "Pythonic"[*] version of:
for j in range(0,sz):
    result_u[j] += data_list[i][0].values()[0][j]
is:
for j, dataval in enumerate(data_list[i][0].values()[0]):
    result_u[j] += dataval
There are some problems with your code, though:
values()[0] might give you any of the values in the dictionary, since dictionaries are unordered. Maybe it happens to give you the unweighted data, maybe not.
I'm confused why you're looping on the range 0 to len(data_list)-1: if you want to include all the sports you need 0 to len(data_list), because the second parameter to range, the upper limit, is excluded.
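Putting those points together, here is a minimal Python 3 sketch that averages element-wise over however many sports the list contains. It assumes, as the question's code does, that the first dictionary in each inner list holds the unweighted data and has exactly one key; column_averages is a hypothetical helper name, and the demo data is made up.

```python
def column_averages(data_list, position=0):
    """Element-wise average of the single list stored in the dict at `position`
    of every inner list (hypothetical helper; assumes rectangular data)."""
    # next(iter(...)) replaces the Python 2 idiom values()[0].
    rows = [next(iter(sport[position].values())) for sport in data_list]
    # zip(*rows) groups all first elements together, then all second elements, etc.
    return [sum(column) / len(rows) for column in zip(*rows)]


demo = [
    [{'unweighted_a': [1.0, 2.0, 3.0]}, {'weighted_a': [0.1, 0.2, 0.3]}],
    [{'unweighted_b': [3.0, 4.0, 5.0]}, {'weighted_b': [0.3, 0.4, 0.5]}],
]
print(column_averages(demo))  # [2.0, 3.0, 4.0]
```

Because the loop runs over data_list itself, adding a fifth or eighth sport needs no code change, and the divisor is len(rows) rather than a hard-coded 4.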
You could perhaps consider reformatting your data more like this:
DEMO_LIST = {
'cricket' : {
'unweighted' : [1,2,3],
'weighted' : [4,5,6],
'manual' : [7,8,9],
'average' : [10,11,12],
},
'rugby' : ...
}
Once you have the same keys in each sport's dictionary, you can replace values()[0] with ['unweighted'], so you'll always get the right dictionary entry. And once you have a whole lot of dictionaries all with the same keys, you can replace them with a class or a named tuple, to define/enforce that those are the values that must always be present:
import collections
Sport = collections.namedtuple('Sport', 'unweighted weighted manual average')
DEMO_LIST = {
'cricket' : Sport(
unweighted = [1,2,3],
weighted = [4,5,6],
manual = [7,8,9],
average = [10,11,12],
),
'rugby' : ...
}
Now you can replace ['unweighted'] with .unweighted.
[*] The word "Pythonic" officially means something like, "done in the style of a Python programmer, taking advantage of any useful Python features to produce the best idiomatic Python code". In practice it usually means "I prefer this, and I'm a Python programmer, therefore this is the correct way to write Python". It's an argument by authority if you're Guido van Rossum, or by appeal to nebulous authority if you're not. In almost all circumstances it can be replaced with "good IMO" without changing the sense of the sentence ;-)

How to compare an element of a tuple (int) to determine if it exists in a list

I have the two following lists:
# List of tuples representing the index of resources and their unique properties
# Format of (ID,Name,Prefix)
resource_types=[('0','Group','0'),('1','User','1'),('2','Filter','2'),('3','Agent','3'),('4','Asset','4'),('5','Rule','5'),('6','KBase','6'),('7','Case','7'),('8','Note','8'),('9','Report','9'),('10','ArchivedReport',':'),('11','Scheduled Task',';'),('12','Profile','<'),('13','User Shared Accessible Group','='),('14','User Accessible Group','>'),('15','Database Table Schema','?'),('16','Unassigned Resources Group','#'),('17','File','A'),('18','Snapshot','B'),('19','Data Monitor','C'),('20','Viewer Configuration','D'),('21','Instrument','E'),('22','Dashboard','F'),('23','Destination','G'),('24','Active List','H'),('25','Virtual Root','I'),('26','Vulnerability','J'),('27','Search Group','K'),('28','Pattern','L'),('29','Zone','M'),('30','Asset Range','N'),('31','Asset Category','O'),('32','Partition','P'),('33','Active Channel','Q'),('34','Stage','R'),('35','Customer','S'),('36','Field','T'),('37','Field Set','U'),('38','Scanned Report','V'),('39','Location','W'),('40','Network','X'),('41','Focused Report','Y'),('42','Escalation Level','Z'),('43','Query','['),('44','Report Template ','\\'),('45','Session List',']'),('46','Trend','^'),('47','Package','_'),('48','RESERVED','`'),('49','PROJECT_TEMPLATE','a'),('50','Attachments','b'),('51','Query Viewer','c'),('52','Use Case','d'),('53','Integration Configuration','e'),('54','Integration Command','f'),('55','Integration Target','g'),('56','Actor','h'),('57','Category Model','i'),('58','Permission','j')]
# This is a list of resource ID's that we do not want to reference directly, ever.
unwanted_resource_types=[0,1,3,10,11,12,13,14,15,16,18,20,21,23,25,27,28,32,35,38,41,47,48,49,50,57,58]
I'm attempting to compare the two in order to build a third list containing the 'Name' of each unique resource type that currently exists in unwanted_resource_types. e.g. The final result list should be:
result = ['Group','User','Agent','ArchivedReport','ScheduledTask','...','...']
I've tried the following that (I thought) should work:
result = []
for res in resource_types:
    if res[0] in unwanted_resource_types:
        result.append(res[1])
and when that failed to populate result I also tried:
result = []
for res in resource_types:
    for type in unwanted_resource_types:
        if res[0] == type:
            result.append(res[1])
also to no avail. Is there something I'm missing? I believe this would be the right place to use a list comprehension, but that's still in my grey basket of understanding fully (the Python docs are a bit too succinct for me in this case).
I'm also open to completely rethinking this problem, but I do need to retain the list of tuples as it's used elsewhere in the script. Thank you for any assistance you may provide.
Your resource types are using strings, and your unwanted resources are using ints, so you'll need to do some conversion to make it work.
Try this:
result = []
for res in resource_types:
    if int(res[0]) in unwanted_resource_types:
        result.append(res[1])
or using a list comprehension:
result = [item[1] for item in resource_types if int(item[0]) in unwanted_resource_types]
The numbers in resource_types are numbers contained within strings, whereas the numbers in unwanted_resource_types are plain numbers, so your comparison is failing. This should work:
result = []
for res in resource_types:
    if int( res[0] ) in unwanted_resource_types:
        result.append(res[1])
The problem is that your triples contain strings and your unwanted resources contain numbers. Change the data to
resource_types=[(0,'Group','0'), ...
or use int() to convert the strings to ints before comparison, and it should work. Your result can be computed with a list comprehension as in
result=[rt[1] for rt in resource_types if int(rt[0]) in unwanted_resource_types]
If you change ('0', ...) into (0, ... you can leave out the int() call.
Additionally, you may change the unwanted_resource_types variable into a set, like
unwanted_resource_types=set([0,1,3, ... ])
to improve speed (if speed is an issue, else it's unimportant).
The one-liner:
result = map(lambda x: dict(map(lambda a: (int(a[0]), a[1]), resource_types))[x], unwanted_resource_types)
without any explicit loop does the job.
Ok - you don't want to use this in production code - but it's fun. ;-)
Comment:
The inner dict(map(lambda a: (int(a[0]), a[1]), resource_types)) creates a dictionary from the input data:
{0: 'Group', 1: 'User', 2: 'Filter', 3: 'Agent', ...
The outer map chooses the names from the dictionary.
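For production code, the same lookup can be written readably: build the {id: name} dictionary once (converting the string IDs to int at that point) and then pick the names in the order of the unwanted list. This sketch uses a shortened resource_types for illustration and assumes every tuple has the (ID, Name, Prefix) shape; the "if i in names_by_id" guard skips unknown IDs, where the one-liner above would raise KeyError.

```python
resource_types = [('0', 'Group', '0'), ('1', 'User', '1'), ('2', 'Filter', '2'),
                  ('3', 'Agent', '3'), ('10', 'ArchivedReport', ':')]
unwanted_resource_types = [0, 1, 3, 10]

# Build the lookup table once, converting the string IDs to int.
names_by_id = {int(res_id): name for res_id, name, _prefix in resource_types}

# Pick the names in the order of the unwanted list, skipping unknown IDs.
result = [names_by_id[i] for i in unwanted_resource_types if i in names_by_id]
print(result)  # ['Group', 'User', 'Agent', 'ArchivedReport']
```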

fast data comparison in python

I want to compare a large set of data in the form of 2 dictionaries of varying lengths.
(edit)
post = {0: [0.96180319786071777, 0.37529754638671875],
10: [0.20612385869026184, 0.17849941551685333],
20: [0.20612400770187378, 0.17510984838008881],...}
pre = {0: [0.96180319786071777, 0.37529754638671875],
1: [0.20612385869026184, 0.17849941551685333],
2: [0.20612400770187378, 0.17510984838008881],
5065: [0.80861318111419678, 0.76381617784500122],...}
The answer we need to get is 5065: [0.80861318111419678, 0.76381617784500122]. This is based on the fact that we are only comparing the values and not the indices at all.
I am using this key value pair only to remember the sequence of data. The data type can be replaced with a list/set if need be. I need to find out the key:value (index and value) pairs of the elements that are not in common to the dictionaries.
The code that I am using is very simple:
new = {}
found = []
for i in range(0, len(post)):
    found= []
    for j in range(0, len(pre)):
        if post[i] not in pre.values():
            if post[i] not in new:
                new[i] = post[i]
                found.append(j)
                break
    if found:
        for f in found: pre.pop(f)
new contains the elements I need.
The problem I am facing is that this process is too slow. It sometimes takes over an hour to process. The data can be much larger at times. I need it to be faster.
Is there an efficient way of doing what I am trying to achieve? I would prefer not to depend on external packages apart from those bundled with Python 2.5 (64-bit) unless absolutely necessary.
Thank you all.
This is basically what sets are designed for (computing differences in sets of items). The only gotcha is that the things you put into a set need to be hashable, and lists aren't. However, tuples are, so if you convert to that, you can put those into a set:
post_set = set(tuple(x) for x in post.itervalues())
pre_set = set(tuple(x) for x in pre.itervalues())
items_in_only_one_set = post_set ^ pre_set
For more about sets: http://docs.python.org/library/stdtypes.html#set
To get the original indices after you've computed the differences, what you'd probably want is to generate reverse lookup tables:
post_indices = dict((tuple(v),k) for k,v in post.iteritems())
pre_indices = dict((tuple(v),k) for k,v in pre.iteritems())
Then you can just take a given tuple and look up its index via the dictionaries:
index = post_indices.get(a_tuple, pre_indices.get(a_tuple))
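On Python 3, where itervalues() and iteritems() no longer exist, the two steps above can be combined roughly as follows. This is a sketch: unmatched is a hypothetical helper name, and it assumes keys from post and pre do not collide for differing values (if they did, one entry in the merged result would overwrite the other) and that each value appears under only one key per dict.

```python
post = {0: [0.96180319786071777, 0.37529754638671875],
        10: [0.20612385869026184, 0.17849941551685333],
        20: [0.20612400770187378, 0.17510984838008881]}
pre = {0: [0.96180319786071777, 0.37529754638671875],
       1: [0.20612385869026184, 0.17849941551685333],
       2: [0.20612400770187378, 0.17510984838008881],
       5065: [0.80861318111419678, 0.76381617784500122]}


def unmatched(post, pre):
    # Lists are not hashable, so convert each value to a tuple for the sets.
    post_set = {tuple(v) for v in post.values()}
    pre_set = {tuple(v) for v in pre.values()}
    only_in_one = post_set ^ pre_set  # symmetric difference

    # Reverse lookup tables: value tuple -> original key.
    post_indices = {tuple(v): k for k, v in post.items()}
    pre_indices = {tuple(v): k for k, v in pre.items()}

    result = {}
    for t in only_in_one:
        key = post_indices.get(t, pre_indices.get(t))
        result[key] = list(t)
    return result


print(unmatched(post, pre))  # only key 5065 remains
```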
Your problem is likely the nested for loops combined with use of range(), which creates a new list each time which can be slow. You will probably get some automatic speedups by iterating pre and post directly, and avoid doing so in a nested fashion.
post = {0: [0.96180319786071777, 0.37529754638671875],
10: [0.20612385869026184, 0.17849941551685333],
20: [0.20612400770187378, 0.17510984838008881]}
pre = {0: [0.96180319786071777, 0.37529754638671875],
1: [0.20612385869026184, 0.17849941551685333],
2: [0.20612400770187378, 0.17510984838008881],
5065: [0.80861318111419678, 0.76381617784500122]}
# Create sets of values, independent of dict key, for O(1) lookup
post_set=set(map(tuple, post.values()))
pre_set=set(map(tuple, pre.values()))
# Iterate through each structure only once, filtering out items that are found in
# the sets we created earlier, and update new_diff
from itertools import ifilterfalse
new_diff=dict(ifilterfalse(lambda x: tuple(x[1]) in pre_set, post.items()))
new_diff.update(ifilterfalse(lambda x: tuple(x[1]) in post_set, pre.items()))
new_diff is now a dict such that each value is not found in both post and pre, with the original index preserved.
>>> print new_diff
{5065: [0.80861318111419678, 0.76381617784500122]}
