Over counting pairs in python loop - python

I have a list of dictionaries where each dict is of the form:
{'A': a,'B': b}
I want to iterate through the list and for every (a,b) pair, find the pair(s), (b,a), if it exists.
For example if for a given entry of the list A = 13 and B = 14, then the original pair would be (13,14). I would want to search the entire list of dicts to find the pair (14,13). If (14,13) occurred multiple times I would like to record that too.
For every original (a,b) pair in the list, I would like to count whether the complement (b,a) appears and, if so, how many times. To do this I use two for loops and a counter that is incremented whenever a complement pair is found.
pairs_found = 0
for i, val in enumerate( list_of_dicts ):
    for j, vol in enumerate( list_of_dicts ):
        if val['A'] == vol['B']:
            if vol['A'] == val['B']:
                pairs_found += 1
This generates a pairs_found greater than the length of list_of_dicts. I realize this is because the same pairs will be over-counted. I am not sure how I can overcome this degeneracy?
Edit for Clarity
list_of_dicts = []
list_of_dicts.append({'A': 14, 'B': 23})
list_of_dicts.append({'A': 235, 'B': 98})
list_of_dicts.append({'A': 686, 'B': 999})
list_of_dicts.append({'A': 128, 'B': 123})
....
Let's say that the list has around 100000 entries. Somewhere in that list, there will be one or more entries of the form {'A': 23, 'B': 14}. If so, I would like a counter to increase its value by one. I would like to do this for every entry in the list.

Here is what I suggest:
Use tuples to represent your pairs and use them as dict/set keys.
Build a set of the unique inverted pairs you'll look for.
Use a dict to store the number of times each pair appears inverted.
Then the code should look like this:
# Create a set of unique inverted pairs
inverted_pairs_set = {(d['B'], d['A']) for d in list_of_dicts}
# Create a counter for original pairs
pairs_counter_dict = {(ip[1], ip[0]): 0 for ip in inverted_pairs_set}
# Create list of pairs
pairs_list = [(d['A'], d['B']) for d in list_of_dicts]
# For each pair, increment the count of the original pair it is the inverse of
for p in pairs_list:
    if p in inverted_pairs_set:
        pairs_counter_dict[(p[1], p[0])] += 1

You can create a counter dictionary that contains the values of the 'A' and 'B' keys in all your dictionaries:
complements_cnt = {(dct['A'], dct['B']): 0 for dct in list_of_dicts}
Then all you need is to iterate over your dictionaries again and increment the value for the "complements":
for dct in list_of_dicts:
    try:
        complements_cnt[(dct['B'], dct['A'])] += 1
    except KeyError:  # in case there is no complement there is nothing to increase
        pass
For example with such a list_of_dicts:
list_of_dicts = [{'A': 1, 'B': 2}, {'A': 2, 'B': 1}, {'A': 1, 'B': 2}]
This gives:
{(1, 2): 1, (2, 1): 2}
Which basically says that the {'A': 1, 'B': 2} has one complement (the second) and {'A': 2, 'B': 1} has two (the first and the last).
The solution is O(n) which should be quite fast even for 100000 dictionaries.
Note: This is quite similar to @debzsud's answer. I hadn't seen it before I posted mine, though. :(
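If a single number is wanted, as in the question, the per-pair counts can be reduced to totals. The following is only a sketch on the example list from this answer; the names entries_with_complement and total_matches are made up for illustration and do not appear in the original answer.

```python
list_of_dicts = [{'A': 1, 'B': 2}, {'A': 2, 'B': 1}, {'A': 1, 'B': 2}]

# Per-pair complement counts, built the same way as in the answer.
complements_cnt = {(dct['A'], dct['B']): 0 for dct in list_of_dicts}
for dct in list_of_dicts:
    key = (dct['B'], dct['A'])
    if key in complements_cnt:
        complements_cnt[key] += 1

# Number of list entries that have at least one complement in the list.
entries_with_complement = sum(
    1 for dct in list_of_dicts if complements_cnt[(dct['A'], dct['B'])] > 0
)

# Total number of (entry, complement) matches; this is the figure
# the two nested loops in the question end up with.
total_matches = sum(complements_cnt[(dct['A'], dct['B'])] for dct in list_of_dicts)

print(entries_with_complement, total_matches)
```

Which of the two totals is the right one depends on whether each entry should count once or once per matching complement.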

I am still not 100% sure what it is you want to do but here is my guess:
pairs_found = 0
for i, dict1 in enumerate(list_of_dicts):
    for dict2 in list_of_dicts[i+1:]:
        if dict1['A'] == dict2['B'] and dict1['B'] == dict2['A']:
            pairs_found += 1
Note the slicing on the second for loop. This avoids checking pairs that have already been checked before (comparing D1 with D2 is enough; no need to compare D2 to D1).
This roughly halves the number of comparisons, but the loop is still O(n**2), so there is probably room for improvement.
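The same "compare each unordered pair once" idea can also be written with itertools.combinations. This is a sketch on a small made-up list, not part of the original answer, and it is still quadratic in the length of the list:

```python
from itertools import combinations

list_of_dicts = [{'A': 1, 'B': 2}, {'A': 2, 'B': 1}, {'A': 1, 'B': 2}]

# combinations() yields every unordered pair of entries exactly once,
# which is the same set of comparisons as the sliced inner loop.
pairs_found = sum(
    1
    for d1, d2 in combinations(list_of_dicts, 2)
    if d1['A'] == d2['B'] and d1['B'] == d2['A']
)
print(pairs_found)
```

Here entries 0 and 1 match, and entries 1 and 2 match, so each complementary pairing is counted once rather than twice.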

You could first create a list with the values of each dictionary as tuples:
example_dict = [{"A": 1, "B": 2}, {"A": 4, "B": 3}, {"A": 5, "B": 1}, {"A": 2, "B": 1}]
dict_values = [tuple(x.values()) for x in example_dict]
Then create a second list with the number of occurrences of each element inverted:
occurrences = [dict_values.count(x[::-1]) for x in dict_values]
Finally, create a dict with dict_values as keys and occurrences as values:
dict(zip(dict_values, occurrences))
Output:
{(1, 2): 1, (2, 1): 1, (4, 3): 0, (5, 1): 0}
For each key, you have the number of inverted keys. You can also create the dictionary on the fly:
occurrences = {x: dict_values.count(x[::-1]) for x in dict_values}
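Since list.count scans the whole list on every call, this approach is quadratic, which may be slow for around 100000 entries. One possible alternative is to tally the pairs once with collections.Counter. The sketch below is an assumption about how that could look, not part of the original answer; it also reads the 'A' and 'B' keys explicitly instead of relying on the order of values().

```python
from collections import Counter

example_dict = [{"A": 1, "B": 2}, {"A": 4, "B": 3}, {"A": 5, "B": 1}, {"A": 2, "B": 1}]

# Tally how often each (A, B) pair occurs in a single pass.
pair_counts = Counter((d["A"], d["B"]) for d in example_dict)

# For each distinct pair, look up how often its reverse occurs.
# A Counter returns 0 for keys it has not seen, without inserting them.
occurrences = {pair: pair_counts[pair[::-1]] for pair in pair_counts}
print(occurrences)
```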

Related

Get value of dictionaries into separate lists

I am trying to get array by first key.
The names of the keys are always the same and the number of elements is the same.
[{'a': 1, 'b':41, 'c':324}, {'a': 1, 'b':12, 'c':65}, {'a': 2, 'b':36, 'c':12}]
expected output:
[{'b':41, 'c':324}, {'b':12, 'c':65}]
[{'b':36, 'c':12}]
Make a new dictionary that uses the values of the a keys as its keys.
newdict = {}
for d in data:
    newdict.setdefault(d['a'], []).append({'b': d['b'], 'c': d['c']})
result = list(newdict.values())
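The same grouping can be sketched with collections.defaultdict, which avoids spelling out the remaining keys. This is an illustrative variant, assuming data is the list from the question; the name groups is made up here.

```python
from collections import defaultdict

data = [{'a': 1, 'b': 41, 'c': 324}, {'a': 1, 'b': 12, 'c': 65}, {'a': 2, 'b': 36, 'c': 12}]

groups = defaultdict(list)
for d in data:
    # Keep every key except the grouping key 'a'.
    rest = {k: v for k, v in d.items() if k != 'a'}
    groups[d['a']].append(rest)

result = list(groups.values())
print(result)
```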

make list of dictionaries overwriting one key entry from a list using iterators

I have the horrible feeling this will be a duplicate, I tried my best to find the answer already.
I have a dictionary and a list, and I want to create a list of dictionaries, using the list to overwrite one of the key values, like this:
d={"a":1,"b":10}
c=[3,4,5]
arg=[]
for i in c:
    e=d.copy()
    e["a"]=i
    arg.append(e)
this gives the desired result
arg
[{'a': 3, 'b': 10}, {'a': 4, 'b': 10}, {'a': 5, 'b': 10}]
but the code is ugly, especially with the copy command, and instead of one list I have 4 or 5 in my real example which leads to a huge nested loop. I feel sure there is a neater way with an iterator like
arg=[d *with* d[a]=i for i in c]
where I'm not sure what to put in the place of the "with".
Again, apologies if this is already answered.
IIUC, you could do:
d={"a":1,"b":10}
c=[3,4,5]
res = [{ **d, "a" : ci } for ci in c]
print(res)
Output
[{'a': 3, 'b': 10}, {'a': 4, 'b': 10}, {'a': 5, 'b': 10}]
The part:
"a" : ci
rewrites the value at the key "a" and **d unpacks the dictionary.
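The question also mentions having 4 or 5 lists, which leads to deeply nested loops. If every combination of values is wanted, the unpacking approach might be combined with itertools.product. The following is a sketch under that assumption; the overrides mapping and its values are hypothetical and not from the question.

```python
from itertools import product

d = {"a": 1, "b": 10}
# Hypothetical overrides: each key maps to the list of values it should take.
overrides = {"a": [3, 4], "b": [10, 20]}

keys = list(overrides)
res = [
    # Start from d, then overwrite the overridden keys for this combination.
    {**d, **dict(zip(keys, values))}
    for values in product(*(overrides[k] for k in keys))
]
print(res)
```

product replaces the nested loops, so adding another list only means adding another entry to overrides.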
I would do it this way:
arg=[d.copy() for i in range(len(c))]
for i in range(len(arg)):
    arg[i]['a']=c[i]
This code first creates a list of copies of d, one per element of c, and then updates 'a' in each dictionary with the respective item of c.
You could do it using a dictionary comprehension within a list comprehension, checking for key == 'a':
d = {"a":1,"b":10}
c = [3,4,5]
l = [{k: num if k == 'a' else v for k,v in d.items()} for num in c]
Python 3.9 added the | operator, which creates a new dictionary with updated values and leaves the old dictionary unchanged:
new_dict = old_dict | dict_with_updates
With list comprehension it will be
arg = [ d | {"a": i} for i in c]
Full example
d = {"a": 1, "b": 10}
c = [3, 4, 5]
arg = [ d | {"a": i} for i in c]
print(arg)
BTW: There is also |= to update existing dictionary
old_dict |= dict_with_updates
Doc: What’s New In Python 3.9

Denormalize/flatten list of nested objects into dot separated key value pairs

It would have been simpler if my nested objects were dictionaries, but they are lists of dictionaries.
Example:
all_objs1 = [{
    'a': 1,
    'b': [{'ba': 2, 'bb': 3}, {'ba': 21, 'bb': 31}],
    'c': 4
}, {
    'a': 11,
    'b': [{'ba': 22, 'bb': 33, 'bc': [{'h': 1, 'e': 2}]}],
    'c': 44
}]
I expect output in following format:
[
{'a': 1, 'b.ba': 2, 'b.bb': 3, 'c': 4},
{'a': 1, 'b.ba': 21, 'b.bb': 31, 'c': 4},
{'a': 11, 'b.ba': 22, 'b.bb': 33, 'bc.h': 1, 'bc.e': 2, 'c': 44},
]
Basically, number of flattened objects generated will be equal to (obj * depth)
With my current code:
def flatten(obj, flattened_obj, last_key=''):
    for k,v in obj.iteritems():
        if not isinstance(v, list):
            flattened_obj.update({last_key+k : v})
        else:
            last_key += k + '.'
            for nest_obj in v:
                flatten(nest_obj, flattened_obj, last_key)
            last_key = remove_last_key(last_key)

def remove_last_key(key_path):
    second_dot = key_path[:-1].rfind('.')
    if second_dot > 0:
        return key_path[:second_dot+1]
    return key_path
Output:
[
{'a': 1, 'b.bb': 31, 'c': 4, 'b.ba': 21},
{'a': 11, 'b.bc.e': 2, 'c': 44, 'b.bc.h': 1, 'b.bb': 33, 'b.ba': 22}
]
I am able to flatten the object (not accurate though), but I am not able to create a new object at each nested object.
I can not use pandas library as my app is deployed on app engine.
code.py:
from itertools import product
from pprint import pprint as pp


all_objs = [{
    "a": 1,
    "b": [{"ba": 2, "bb": 3}, {"ba": 21, "bb": 31}],
    "c": 4,
    #"d": [{"da": 2}, {"da": 5}],
}, {
    "a": 11,
    "b": [{"ba": 22, "bb": 33, "bc": [{"h": 1, "e": 2}]}],
    "c": 44,
}]


def flatten_dict(obj, parent_key=None):
    base_dict = dict()
    complex_items = list()
    very_complex_items = list()
    for key, val in obj.items():
        new_key = ".".join((parent_key, key)) if parent_key is not None else key
        if isinstance(val, list):
            if len(val) > 1:
                very_complex_items.append((key, val))
            else:
                complex_items.append((key, val))
        else:
            base_dict[new_key] = val
    if not complex_items and not very_complex_items:
        return [base_dict]
    base_dicts = list()
    partial_dicts = list()
    for key, val in complex_items:
        # Recompute the full path for this key; reusing new_key from the loop
        # above would pick up whichever key happened to be iterated last.
        new_key = ".".join((parent_key, key)) if parent_key is not None else key
        partial_dicts.append(flatten_dict(val[0], parent_key=new_key))
    for product_tuple in product(*tuple(partial_dicts)):
        new_base_dict = base_dict.copy()
        for new_dict in product_tuple:
            new_base_dict.update(new_dict)
        base_dicts.append(new_base_dict)
    if not very_complex_items:
        return base_dicts
    ret = list()
    very_complex_keys = [item[0] for item in very_complex_items]
    very_complex_vals = tuple([item[1] for item in very_complex_items])
    for product_tuple in product(*very_complex_vals):
        for base_dict in base_dicts:
            new_dict = base_dict.copy()
            new_items = zip(very_complex_keys, product_tuple)
            for key, val in new_items:
                new_key = ".".join((parent_key, key)) if parent_key is not None else key
                new_dict.update(flatten_dict(val, parent_key=new_key)[0])
            ret.append(new_dict)
    return ret


def main():
    flatten = list()
    for obj in all_objs:
        flatten.extend(flatten_dict(obj))
    pp(flatten)


if __name__ == "__main__":
    main()
Notes:
As expected, recursion is used
It's general: it also works for the case that I mentioned in my 2nd comment (one input dict having more than one key whose value is a list with more than one element), which can be tested by uncommenting the "d" key in all_objs. Also, theoretically it should support any depth
flatten_dict: takes an input dictionary and outputs a list of dictionaries (as the input dictionary might yield more than one output dictionary):
Every key having a "simple" (not list) value, goes into the output dictionar(y/ies) unchanged
At this point, a base output dictionary is complete (if the input dictionary generates more than one output dictionary, all of them will have the base dictionary's keys/values; if it only generates one output dictionary, then that will be the base one)
Next, the keys with "problematic" values - that may generate more than one output dictionary - (if any) are processed:
Keys having a list with a single element ("problematic") - each might generate more than one output dictionary:
Each of the values will be flattened (might yield more than one output dictionary); the corresponding key will be used in the process
Then, the cartesian product will be computed on all the flatten dictionary lists (for current input, there will only be one list with one element)
Now, each product item needs to be in a distinct output dictionary, so the base dictionary is duplicated and updated with the keys / values of every element in the product item (for current input, there will be only one element per product item)
The new dictionary is appended to a list
At this point a list of base dictionaries (might only be one) is complete, if no values consisting of lists with more than one element are present, this is the return list, else everything below has to be done for each base dictionary in the list
Keys having a list with more than one element ("very problematic") - each will generate more than one output dictionary:
First, the cartesian product will be computed against all the values (lists with more than one element). In the current case, since there is only one such list, each product item will only contain an element from that list
Then, for each product item element, its key will need to be established based on the lists order (for the current input, the product item will only contain one element, and also, there will only be one key)
Again, each product item needs to be in a distinct output dictionary, so the base dictionary is duplicated and updated with the keys / values, of the flattened product item
The new dictionary is appended to the output dictionaries list
Works with Python 3 and Python 2
Might be slow (especially for big input objects), as performance was not the goal. Also since it was built bottom-up (adding functionality as new cases were handled), it is pretty twisted (RO: intortocheated :) ), there might be a simpler implementation that I missed.
Output:
c:\Work\Dev\StackOverflow\q046341856>c:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe code.py
[{'a': 1, 'b.ba': 2, 'b.bb': 3, 'c': 4},
{'a': 1, 'b.ba': 21, 'b.bb': 31, 'c': 4},
{'a': 11, 'b.ba': 22, 'b.bb': 33, 'b.bc.e': 2, 'b.bc.h': 1, 'c': 44}]
#EDIT0:
Made it more general (although it's not visible for the current input): values containing only one element can yield more than one output dictionary (when flattened); addressed that case (before, I was only considering the 1st output dictionary, simply ignoring the rest)
Corrected a logical error that was masked by tuple unpacking combined with the cartesian product: the if not complex_items ... part
#EDIT1:
Modified the code to match a requirement change: the key in the flattened dictionary must have the full nesting path in the input dictionary
Use this code to get your desired output. It builds the output through recursive calls.
import json
from copy import deepcopy

def flatten(final_list, all_obj, temp_dct, last_key):
    for dct in all_obj:
        deep_temp_dct = deepcopy(temp_dct)
        for k, v in dct.items():
            if isinstance(v, list):
                final_list, deep_temp_dct = flatten(final_list, v, deep_temp_dct, k)
            else:
                prefix = ""
                if last_key:
                    prefix = last_key + "."
                key = prefix + k
                deep_temp_dct[key] = v
        if deep_temp_dct not in final_list:
            final_list.append(deep_temp_dct)
    return final_list, deep_temp_dct

final_list, _ = flatten([], all_objs1, {}, "")
print(json.dumps(final_list, indent=4))
Let me know if it works for you.
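A more compact variant of the same idea (recurse into each list element, then take the cartesian product across list-valued keys) might look like the sketch below. It is not the code from either answer: the function name flatten_obj and its prefix parameter are made up for illustration, it assumes every list contains only dictionaries, and it uses the full nesting path for keys. Under this sketch an object containing an empty list yields no output rows.

```python
from itertools import product

all_objs1 = [{
    'a': 1,
    'b': [{'ba': 2, 'bb': 3}, {'ba': 21, 'bb': 31}],
    'c': 4
}, {
    'a': 11,
    'b': [{'ba': 22, 'bb': 33, 'bc': [{'h': 1, 'e': 2}]}],
    'c': 44
}]


def flatten_obj(obj, prefix=""):
    """Return the list of flat dicts generated by one nested dict."""
    scalars = {}
    expansions = []  # one list of alternative flat dicts per list-valued key
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, list):
            alternatives = []
            for child in value:
                alternatives.extend(flatten_obj(child, path + "."))
            expansions.append(alternatives)
        else:
            scalars[path] = value
    rows = []
    # With no list-valued keys, product() yields one empty combination,
    # so the scalars alone form the single output row.
    for combo in product(*expansions):
        merged = dict(scalars)
        for part in combo:
            merged.update(part)
        rows.append(merged)
    return rows


rows = [row for obj in all_objs1 for row in flatten_obj(obj)]
print(rows)
```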

Removing dictionaries from a list on the basis of duplicate value of key

I am new to Python. Suppose I have the following list of dictionaries:
mydictList= [{'a':1,'b':2,'c':3},{'a':2,'b':2,'c':4},{'a':2,'b':3,'c':4}]
From the above list, I want to remove dictionaries that have the same value for key b, keeping the first one. So the resultant list should be:
mydictList = [{'a':1,'b':2,'c':3},{'a':2,'b':3,'c':4}]
You can build a new dictionary keyed on the value of b, iterating over mydictList in reverse (so that the first dictionary with each b value wins), and then take only the values of that dictionary, like this
>>> {item['b'] : item for item in reversed(mydictList)}.values()
[{'a': 1, 'c': 3, 'b': 2}, {'a': 2, 'c': 4, 'b': 3}]
If you are using Python 3.x, you might want to use list function over the dictionary values, like this
>>> list({item['b'] : item for item in reversed(mydictList)}.values())
Note: This solution may not maintain the order of the dictionaries.
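If the original order matters, one possible variant is to iterate forwards and keep only the first dictionary seen for each b value. This is a sketch that assumes Python 3.7 or later, where plain dicts preserve insertion order; the name first_by_b is made up for illustration.

```python
mydictList = [{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 2, 'c': 4}, {'a': 2, 'b': 3, 'c': 4}]

first_by_b = {}
for item in mydictList:
    # setdefault stores the item only if this 'b' value has not been seen yet.
    first_by_b.setdefault(item['b'], item)

result = list(first_by_b.values())
print(result)
```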
First, sort the list by b-values (Python's sorting algorithm is stable, so dictionaries with identical b values will retain their relative order).
from operator import itemgetter
tmp1 = sorted(mydictList, key=itemgetter('b'))
Next, use itertools.groupby to create subiterators that iterate over dictionaries with the same b value.
import itertools
tmp2 = itertools.groupby(tmp1, key=itemgetter('b'))
Finally, create a new list that contains only the first element of each subiterator:
# Each x is a tuple (some-b-value, iterator-over-dicts-with-b-equal-some-b-value)
newdictList = [ next(x[1]) for x in tmp2 ]
Putting it all together:
from itertools import groupby
from operator import itemgetter
by_b = itemgetter('b')
newdictList = [ next(x[1]) for x in groupby(sorted(mydictList, key=by_b), key=by_b) ]
A very straight forward approach can go something like this:
mydictList= [{'a':1,'b':2,'c':3},{'a':2,'b':2,'c':4},{'a':2,'b':3,'c':4}]
b_set = set()
new_list = []
for d in mydictList:
    if d['b'] not in b_set:
        new_list.append(d)
        b_set.add(d['b'])
Result:
>>> new_list
[{'a': 1, 'c': 3, 'b': 2}, {'a': 2, 'c': 4, 'b': 3}]

How to get the index with the key in a dictionary?

I have the key of a python dictionary and I want to get the corresponding index in the dictionary. Suppose I have the following dictionary,
d = { 'a': 10, 'b': 20, 'c': 30}
Is there a combination of python functions so that I can get the index value of 1, given the key value 'b'?
d.??('b')
I know it can be achieved with a loop or lambda (with a loop embedded). Just thought there should be a more straightforward way.
Use OrderedDicts: http://docs.python.org/2/library/collections.html#collections.OrderedDict
>>> from collections import OrderedDict
>>> x = OrderedDict((("a", "1"), ("c", '3'), ("b", "2")))
>>> x["d"] = 4
>>> x.keys().index("d")
3
>>> x.keys().index("c")
1
For those using Python 3
>>> list(x.keys()).index("c")
1
Dictionaries in Python versions before 3.7 have no guaranteed order (CPython 3.6 preserves insertion order only as an implementation detail). You could use a list of tuples as your data structure instead.
d = { 'a': 10, 'b': 20, 'c': 30}
newd = [('a',10), ('b',20), ('c',30)]
Then this code could be used to find the locations of keys with a specific value
>>> locations = [i for i, t in enumerate(newd) if t[0]=='b']
>>> locations
[1]
You can simply convert the dictionary to a list and then look up the index of the key you are looking for.
DictTest = {
    '4000': {},
    '4001': {},
    '4002': {},
    '4003': {},
    '5000': {},
}
print(list(DictTest).index('4000'))
No, there is no straightforward way because Python dictionaries do not have a set ordering.
From the documentation:
Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions.
In other words, the 'index' of b depends entirely on what was inserted into and deleted from the mapping before:
>>> map={}
>>> map['b']=1
>>> map
{'b': 1}
>>> map['a']=1
>>> map
{'a': 1, 'b': 1}
>>> map['c']=1
>>> map
{'a': 1, 'c': 1, 'b': 1}
As of Python 2.7, you could use the collections.OrderedDict() type instead, if insertion order is important to your application.
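In Python 3.7 and later, plain dicts preserve insertion order, so a key's position in that order can be looked up directly. A small sketch using the dictionary from the question; the names index_of_b and positions are made up for illustration.

```python
d = {'a': 10, 'b': 20, 'c': 30}

# One-off lookup: position of a key in insertion order (linear scan).
index_of_b = list(d).index('b')

# Many lookups: build the key -> position mapping once.
positions = {key: i for i, key in enumerate(d)}

print(index_of_b, positions['c'])
```

The position reflects insertion order only; deleting and re-inserting a key moves it to the end.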
#Creating dictionary
animals = {"Cat" : "Pat", "Dog" : "Pat", "Tiger" : "Wild"}
#Convert dictionary to list (array)
keys = list(animals)
#Printing 1st dictionary key by index
print(keys[0])
#Done :)
