Unifying python dicts? - python

Does anyone have a fair algorithm for unifying (almost) arbitrary dicts? That is, given the dicts
a = {1: 1, 2: 2, 3: [1,2,3]}
b = {4: 4, 3: [5], 5: {'a': 0, 'b': {}}}
c = {3: [{'A': '0'}], 5: {'b': {'B': 1}}}
unify (a, b, c)
yields
{1: 1,
2: 2,
3: [1, 2, 3, 5, {'A': '0'}],
4: 4,
5: {'a': 0, 'b': {'B': 1}}
}
I keep wanting a generic solution. I wind up searching for one a couple of times a year and not finding it (no, Google, unify from unification and unify from union are not the same word!), and I keep putting off writing one myself. I know very well that programming Prolog leads to an odd perspective on life, but hey, how can one have a recursive dict/key/value store and not have unification?
I have in the past needed ordering, hence lists, and back then I wound up not going for a generic version but hardcoding. This time around I don't actually need unification of sets/lists at all, and the fall back is to once again hardcode as I know what the keys can be ahead of time. But: If there already were a generic solution out there, I wouldn't have to reinvent the wheel again and again. It's just wrong to have to do that.
The really pythonic solution would probably start with a __unify__-method on all things that can be unified, it's that basic.
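For illustration, here is a minimal sketch of what such a generic unify could look like. The merge policy is an assumption, not something the question pins down: nested dicts are unified recursively, lists are concatenated in order with already-present items skipped, equal scalars unify with each other, and unequal scalars raise an error. The helper name _unify_values is hypothetical.

```python
import copy

def unify(*dicts):
    """Merge dicts left to right, recursing into nested dicts and lists."""
    result = {}
    for d in dicts:
        for key, value in d.items():
            if key in result:
                result[key] = _unify_values(result[key], value)
            else:
                # Copy so the inputs are never mutated through the result.
                result[key] = copy.deepcopy(value)
    return result

def _unify_values(left, right):
    """Hypothetical helper: unify two values stored under the same key."""
    if isinstance(left, dict) and isinstance(right, dict):
        return unify(left, right)
    if isinstance(left, list) and isinstance(right, list):
        # Assumed policy: keep order, skip items that are already present.
        return left + [copy.deepcopy(item) for item in right if item not in left]
    if left == right:
        return left
    # Assumed policy: conflicting values fail, as in Prolog-style unification.
    raise ValueError("cannot unify %r with %r" % (left, right))

a = {1: 1, 2: 2, 3: [1, 2, 3]}
b = {4: 4, 3: [5], 5: {'a': 0, 'b': {}}}
c = {3: [{'A': '0'}], 5: {'b': {'B': 1}}}
unified = unify(a, b, c)
```

With the three example dicts, this policy reproduces the result shown in the question. A different policy (for example, "last value wins" on conflicts) would only change the last branch of _unify_values.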

If you are stuck as to how to iterate through a dictionary, using a for loop iterates through the keys:
>>> for i in {1: "abc"}: print i
1
As the comments say, please specify what problems you're facing rather than asking SO to write the code for you.

Like zodiac mentioned, it's difficult to answer without a direct problem; however, I'll try for a solution.
# Merge Lists of Dictionaries Functions
def merge_lists(l1, l2, key):
    merged = {}
    for item in l1 + l2:
        if item[key] not in merged:
            merged[item[key]] = item
    return [val for (_, val) in merged.items()]
tell me how this works
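To see what merge_lists does in practice, here is a small usage sketch. The sample records and the "id" key are made up for illustration. Note that this function deduplicates whole records by one key (the first record seen for a key wins); it does not recurse into nested values, so it is not a unification in the sense of the question.

```python
def merge_lists(l1, l2, key):
    merged = {}
    for item in l1 + l2:
        # Keep only the first record seen for each key value.
        if item[key] not in merged:
            merged[item[key]] = item
    return list(merged.values())

# Hypothetical sample data.
users_a = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
users_b = [{"id": 2, "name": "robert"}, {"id": 3, "name": "cy"}]

merged = merge_lists(users_a, users_b, "id")
names_by_id = {user["id"]: user["name"] for user in merged}
```

The record with id 2 from users_b is dropped because users_a already supplied one.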

Related

Efficient way to add a new key-value to a nested dictionary in Python?

I have a very large nested dictionary and below I am showing a sample of it.
tmp_dict = {1: {'A': 1, 'B': 2},
2: {'A': 0, 'B': 0}}
The question is whether there is a better or more efficient way to add a new key-value pair to my existing nested dict. I am currently looping through the keys to do so. Here is an example:
>>> for k in tmp_dict.keys():
...     tmp_dict[k].update({'C':1})
A simple method would be like so:
for key in tmp_dict:
    tmp_dict[key]['C']=1
Or, you could use dictionary comprehension, as sushanth suggested
tmp_dict = {k: {**v, 'C': 1} for k, v in tmp_dict.items()}
You can read more about the asterisks (and why this works) here.
In terms of complexity, they are all O(N) in the number of outer keys. The dict comprehension additionally copies every inner dict, so it costs O(N*K) for inner dicts of size K, not O(N^2). So, your solution should have a relatively quick run time anyway.
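A short sketch of the practical difference between the two approaches, using the sample dict from the question (the key 'D' is only there to tell the two steps apart): the loop mutates the existing inner dicts in place, while the comprehension builds new inner dicts and leaves the old ones untouched.

```python
tmp_dict = {1: {'A': 1, 'B': 2},
            2: {'A': 0, 'B': 0}}
inner_before = tmp_dict[1]

# In-place loop: each inner dict gains a key, no copies are made.
for key in tmp_dict:
    tmp_dict[key]['C'] = 1
same_object_after_loop = tmp_dict[1] is inner_before

# Comprehension: every inner dict is copied via {**v, ...}.
rebuilt = {k: {**v, 'D': 1} for k, v in tmp_dict.items()}
same_object_after_rebuild = rebuilt[1] is inner_before
```

If other code holds references to the inner dicts, only the in-place loop makes the new key visible to it.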

How to reduce the sets in dict values using comprehension?

I have
x = {'a':set([1]) , 'b':set([2]), 'c':set([3]) }
It is guaranteed that there is only one element in the set. I need to convert this to
{'a': 1, 'c': 3, 'b': 2}
Following works:
x1 = {k:x[k].pop() for k in x.keys()}
or
x1 = {k:next(iter(x[k])) for k in x.keys()}
but I don't like the first one, because pop() modifies the original collection. I need help with the following:
How can I use unpacking, as mentioned here, within a comprehension?
Is there any way I can use functools.reduce for this?
What would be a better or more Pythonic way of doing this overall?
If you want to do this with an unpacking, that'd be
{k: item for k, [item] in x.items()}
In my opinion, the most readable option would be to use next and iter. Unpacking might also not be of much use since it is more of an assignment operation. (See user2357112's answer)
How about simply:
>>> {k: next(iter(v)) for k, v in x.items()}
{'a': 1, 'c': 3, 'b': 2}
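The two non-mutating options can be compared side by side in a short sketch, written for Python 3 (where items() replaces iteritems()). One difference worth noting: unpacking raises ValueError if a set does not hold exactly one element, whereas next(iter(v)) silently takes an arbitrary element.

```python
x = {'a': {1}, 'b': {2}, 'c': {3}}

# Option 1: take the first (and only) element of each set.
by_next = {k: next(iter(v)) for k, v in x.items()}

# Option 2: unpack each one-element set in the loop target.
by_unpacking = {k: item for k, (item,) in x.items()}

# Unpacking doubles as a check that each set has exactly one element.
try:
    {k: item for k, (item,) in {'a': {1, 2}}.items()}
    unpacking_rejected_bad_input = False
except ValueError:
    unpacking_rejected_bad_input = True
```

Neither option modifies x, unlike the pop() version.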

Write a statement without calling any function?

To optimize a single line of my code, I am trying to write a particular statement without calling any function or method. While thinking about this, I wondered whether it is even possible in my case. I searched for information about this, but it seems to be rarely discussed, and in my current work I must keep the code intact except for the section being optimized.
Hope you could give me a hand. Any help is welcome.
This is my current progress.
def count_chars(s):
    '''(str) -> dict of {str: int}
    Return a dictionary where the keys are the characters in s and the values
    are how many times those characters appear in s.
    >>> count_chars('abracadabra')
    {'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1}
    '''
    d = {}
    for c in s:
        if not (c in d):
            # This is the line it is assumed to be modified without calling function or method
        else:
            d[c] = d[c] + 1
    return d
How about this, as mentioned in the comments, it does implicitly use functions, but I think it may be the sort of thing you are looking for?
s='abcab'
chars={}
for char in s:
    if char not in chars:
        chars[char]=0
    chars[char]+=1
Result
{'a': 2, 'b': 2, 'c': 1}
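Applied to the original count_chars, one possible reading is that the missing line is a plain item assignment. The sketch below assumes that reading. It contains no explicit function or method call in the loop, although, as noted above, operators such as `in` and item assignment still dispatch to special methods behind the scenes.

```python
def count_chars(s):
    """Return a dict mapping each character in s to its number of occurrences."""
    d = {}
    for c in s:
        if c not in d:
            # First sighting: a plain assignment, no explicit call.
            d[c] = 1
        else:
            d[c] = d[c] + 1
    return d

counts = count_chars('abracadabra')
```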

Dict Comprehension Error

I am trying to create a simple dictionary of each letter with a number afterward (from 1-26), like this: {'a': 1, 'b': 2, 'c': 3, ...}.
I wanted to try using a dictionary comprehension to do this, so I did:
from string import lowercase
d = {s:i for s in lowercase for i in range(1, 27)}
However, this results in: {'a': 26, 'b': 26, 'c': 26, ...}. I think this happens because it's iterating over every value in lowercase, assigning it to 1, then 2, then 3 (for every value) ending at 26. There are only 26 keys because since it's a dictionary, it won't have two keys of the same letter (so it overwrites all of them to 26 at the end). I am not sure how to fix this, so if I could get guidance on how to actually do this, that would be great.
I got it to work using dict() and zip(): dict(zip(lowercase, range(1, 27))). However, I want to know how to do this using a dictionary comprehension. Thanks!
With enumerate:
{s: i for i, s in enumerate(lowercase, 1)}
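A small sketch that contrasts the two comprehensions. It uses string.ascii_lowercase, which exists in both Python 2 and 3, instead of the Python 2-only lowercase. The nested form visits all 26 x 26 combinations and keeps only the last value written for each key, which is why every letter ends up at 26.

```python
from string import ascii_lowercase

# Two "for" clauses form a nested loop: each letter is assigned
# 1, 2, ..., 26 in turn, and only the final 26 survives.
nested = {s: i for s in ascii_lowercase for i in range(1, 27)}

# enumerate pairs each letter with exactly one number, starting at 1.
paired = {s: i for i, s in enumerate(ascii_lowercase, 1)}
```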

python quickest way to merge dictionaries based on key match

I have 2 lists of dictionaries. List A is 34,000 long, list B is 650,000 long. I am essentially inserting all the List B dicts into the List A dicts based on a key match. Currently, I am doing the obvious, but it's taking forever (seriously, like a day). There must be a quicker way!
for a in listA:
    a['things'] = []
    for b in listB:
        if a['ID'] == b['ID']:
            a['things'].append(b)
from collections import defaultdict
dictB = defaultdict(list)
for b in listB:
    dictB[b['ID']].append(b)
for a in listA:
    a['things'] = []
    for b in dictB[a['ID']]:
        a['things'].append(b)
This will turn your algorithm from O(n*m) to O(m)+O(n), where n=len(listA), m=len(listB).
Basically, it avoids looping through each dict in listB for each dict in listA by 'precalculating' which dicts from listB match each 'ID'.
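The indexing idea can be tried on a tiny, made-up data set. This sketch differs from the code above in one detail, which is my own choice rather than part of the answer: it reads the index with .get(), because indexing a defaultdict with a missing ID would insert an empty list for that ID as a side effect.

```python
from collections import defaultdict

# Hypothetical sample data; 'v' is an invented payload field.
listA = [{'ID': 1}, {'ID': 2}, {'ID': 3}]
listB = [{'ID': 1, 'v': 'x'}, {'ID': 1, 'v': 'y'}, {'ID': 3, 'v': 'z'}]

# One pass over listB: group its dicts by ID.
dictB = defaultdict(list)
for b in listB:
    dictB[b['ID']].append(b)

# One pass over listA: attach the precomputed group for each ID.
for a in listA:
    # .get() avoids creating empty entries in dictB for unmatched IDs.
    a['things'] = list(dictB.get(a['ID'], []))
```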
Here's an approach that may help. I'll leave it to you to fill in the details.
Your code is slow because it is an O(n^2) algorithm, comparing every A against every B.
If you sort each of listA and listB by id first (these are O(n log n) operations), then you can iterate easily through the sorted versions of A and B (this will be in linear time).
This approach is common when you have to do external merges on very large data sets. Mihai's answer is better for internal merging, where you simply index everything by id (in memory). If you have the memory to hold these additional structures, and dictionary lookup is constant time, that approach will likely be faster, not to mention simpler. :)
By way of example let's say A had the following ids after sorting
acfgjp
and B had these ids, again after sorting
aaaabbbbcccddeeeefffggiikknnnnppppqqqrrr
The idea is, strangely enough, to keep indexes into A and B (I know that does not sound very Pythonic). At first you are looking at a in A and a in B. So you walk through B adding all the a's to your "things" array for a. Once you exhaust the a's in B, you move up one in A, to c. But the next item in B is b, which is less than c, so you have to skip the b's. Then you arrive at a c in B, so you can start adding into "things" for c. Continue in this fashion until both lists are exhausted. Just one pass. :)
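The walk described above can be sketched as follows. This is a minimal illustration, not a tuned implementation, and it assumes that IDs are unique within A and that all IDs are mutually comparable. The function name merge_sorted and its key parameter are hypothetical.

```python
def merge_sorted(list_a, list_b, key='ID'):
    """Attach matching dicts from list_b to each dict in list_a in one pass."""
    sorted_a = sorted(list_a, key=lambda d: d[key])
    sorted_b = sorted(list_b, key=lambda d: d[key])
    j = 0  # index into sorted_b; it only ever moves forward
    for a in sorted_a:
        a['things'] = []
        # Skip B entries whose ID is smaller than the current A ID.
        while j < len(sorted_b) and sorted_b[j][key] < a[key]:
            j += 1
        # Collect the run of B entries whose ID matches.
        while j < len(sorted_b) and sorted_b[j][key] == a[key]:
            a['things'].append(sorted_b[j])
            j += 1
    return sorted_a

# Hypothetical sample data using letter IDs as in the example above.
list_a = [{'ID': i} for i in 'fca']
list_b = [{'ID': i} for i in 'faqbacdf']
result = merge_sorted(list_a, list_b)
counts = {a['ID']: len(a['things']) for a in result}
```

After sorting, each list is traversed once, so the merge step itself is linear in len(A) + len(B).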
I'd convert ListA and ListB into dictionaries instead, dictionaries with ID as the key. Then it is a simple matter to append data using python's quick dictionary lookups:
from collections import defaultdict
class thingdict(dict):
    def __init__(self, *args, **kwargs):
        things = []
        super(thingdict,self).__init__(*args, things=things, **kwargs)
A = defaultdict(thingdict)
A[1] = defaultdict(list)
A[2] = defaultdict(list, things=[6]) # with some dummy data
A[3] = defaultdict(list, things=[7])
B = {1: 5, 2: 6, 3: 7, 4: 8, 5: 9}
for k, v in B.items():
    # print k,v
    A[k]['things'].append(v)
print A
print B
This returns:
defaultdict(<class '__main__.thingdict'>, {
1: defaultdict(<type 'list'>, {'things': [5]}),
2: defaultdict(<type 'list'>, {'things': [6, 6]}),
3: defaultdict(<type 'list'>, {'things': [7, 7]}),
4: {'things': [8]},
5: {'things': [9]}
})
{1: 5, 2: 6, 3: 7, 4: 8, 5: 9}
