Python Collections Counter for a List of Dictionaries - python

I have a dynamically growing list of dictionaries, and I would like to add together the values that share a key. Here's an example:
{"something" : [{"one":"200"}, {"three":"400"}, {"one":"100"}, {"two":"800"} ... ]}
I'd like to be able to add together the dictionaries inside the list. So, in this case for the key "something", the result would be:
{"one": 300, "three": 400, "two": 800}
or something to that effect. I'm familiar with the Python's collection counter, but since the "something" list contains dicts, it will not work (unless I'm missing something). The dict is also being dynamically created, so I can't build the list without the dicts. EG:
Counter({'b':3, 'c':4, 'd':5, 'b':2})
Would normally work, but as soon as I try to add an element, the previous value will be overwritten. I've noticed other questions such as these:
Is there any pythonic way to combine two dicts (adding values for keys that appear in both)?
Python count of items in a dictionary of lists
But again, the objects within the list are dicts.

I think this does what you want, but I'm not sure because I don't know what "The dict is also being dynamically created, so I can't build the list without the dicts" means. Still:
from collections import Counter

input = {
    "something": [{"one": "200"}, {"three": "400"}, {"one": "100"}, {"two": "800"}],
    "foo": [{"a": 100, "b": 200}, {"a": 300, "b": 400}],
}

def counterize(x):
    # Convert the values to ints so that Counters can be summed
    return Counter({k: int(v) for k, v in x.items()})

counts = {
    k: sum((counterize(x) for x in v), Counter())
    for k, v in input.items()
}
Result:
{
'foo': Counter({'b': 600, 'a': 400}),
'something': Counter({'two': 800, 'three': 400, 'one': 300})
}
I expect using sum with Counter is inefficient (in the same way that using sum with strings is so inefficient that Guido banned it), but I might be wrong. Anyway, if you have performance problems, you could write a function that creates a Counter and repeatedly calls += or update on it:
def makeints(x):
    return {k: int(v) for k, v in x.items()}

def total(seq):
    result = Counter()
    for s in seq:
        result.update(s)  # adds counts in place instead of building a new Counter
    return result

counts = {k: total(makeints(x) for x in v) for k, v in input.items()}
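If the list really does keep growing, as the question suggests, it may be simpler to keep running totals and fold each new dict in as it arrives, rather than re-summing the whole list. This is only a sketch under that assumption; `running` and `add_entry` are hypothetical names that do not appear in the original code:

```python
from collections import Counter, defaultdict

# Running totals per top-level key (hypothetical structure for illustration)
running = defaultdict(Counter)

def add_entry(key, entry):
    """Fold one small dict such as {"one": "200"} into the totals for key."""
    running[key].update({k: int(v) for k, v in entry.items()})

# Simulate entries arriving one at a time
for entry in [{"one": "200"}, {"three": "400"}, {"one": "100"}, {"two": "800"}]:
    add_entry("something", entry)

print(dict(running["something"]))
```

Counter.update adds to existing counts rather than overwriting them, which is the behaviour the question asks for.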

One way would be to do it as follows:
from collections import defaultdict
d = {"something":
     [{"one": "200"}, {"three": "400"}, {"one": "100"}, {"two": "800"}]}
dd = defaultdict(list)
# First, group the values from the original data structure by key,
# converting strings to ints.
for inner_dict in d['something']:
    for k, v in inner_dict.items():
        dd[k].append(int(v))
# Second, create the output dictionary by summing the grouped elements
# from the first step.
out_dict = {k: sum(v) for k, v in dd.items()}
print(out_dict)
# {'two': 800, 'one': 300, 'three': 400}
Here I don't use Counter, but defaultdict. It's a two-step approach.
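The two steps can also be collapsed into one pass by accumulating directly into a defaultdict(int). This is a sketch of the same idea on the same sample data, not a replacement for the answer above; the name `totals` is made up for illustration:

```python
from collections import defaultdict

d = {"something": [{"one": "200"}, {"three": "400"}, {"one": "100"}, {"two": "800"}]}

totals = defaultdict(int)  # missing keys start at 0
for inner_dict in d["something"]:
    for k, v in inner_dict.items():
        totals[k] += int(v)  # convert the string value and add it to the running sum

print(dict(totals))
```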

Related

How to find and append dictionary values which are present in a list

I have a list which has unique sorted values
arr = ['Adam', 'Ben', 'Chris', 'Dean', 'Flower']
I have a dictionary which has values as such
dict = {
'abc': {'Dean': 1, 'Adam':0, 'Chris':1},
'def': {'Flower':0, 'Ben':1, 'Dean':0}
}
From looking at values from arr I need to have each item and if the value isn't present in subsequent smaller dict that should be assigned a value -1
Result
dict = {
'abc': {'Adam':0, 'Ben':-1, 'Chris':1, 'Dean': 1, 'Flower':-1},
'def': {'Adam':-1, 'Ben':1, 'Chris':-1, 'Dean': 0, 'Flower':0}
}
how can I achieve this using list and dict comprehensions in python
dd = {
    key: {k: value.get(k, -1) for k in arr}
    for key, value in dd.items()
}
{k: value.get(k, -1) for k in arr} will make sure that your keys are in the same order as you defined in the arr list.
A side note on the order of keys in dictionary.
Dictionaries preserve insertion order. Note that updating a key does
not affect the order. Keys added after deletion are inserted at the
end.
Changed in version 3.7: Dictionary order is guaranteed to be insertion
order. This behavior was an implementation detail of CPython from 3.6.
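As a quick check, here is the same comprehension applied to the sample data from the question. It is a sketch that assumes Python 3.7 or later for the ordering guarantee; the names `dct` and `filled` are chosen here for illustration:

```python
arr = ['Adam', 'Ben', 'Chris', 'Dean', 'Flower']
dct = {
    'abc': {'Dean': 1, 'Adam': 0, 'Chris': 1},
    'def': {'Flower': 0, 'Ben': 1, 'Dean': 0},
}

# Build a new inner dict per key, walking arr so the keys follow arr's order
filled = {
    key: {name: value.get(name, -1) for name in arr}
    for key, value in dct.items()
}

print(filled)
```

Unlike the setdefault approach, this builds new dictionaries and leaves the originals untouched.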
Please do not name a variable dict; rename it to dct or something similar, since dict is the name of Python's built-in dictionary type and your variable would shadow it.
As for your question: just iterate through your dct and add the missing keys using setdefault:
arr = ['Adam', 'Ben', 'Chris', 'Dean', 'Flower']
dct = {
'abc': {'Dean': 1, 'Adam':0, 'Chris':1},
'def': {'Flower':0, 'Ben':1, 'Dean':0}
}
def add_dict_keys(dct, arr):
    for key in arr:
        dct.setdefault(key, -1)
    return dct

for k, v in dct.items():
    add_dict_keys(v, arr)
print(dct)  # has updated values

Counting distinct dictionary values

I have this dictionary (key,list)
index={'chair':['one','two','two','two'],'table':['two','three','three']}
and i want this
#1. number of times each value occurs in each key. ordered descending
indexCalc={'chair':{'two':3,'one':1}, 'table':{'three':2,'two':1}}
#2. value for maximum amount for each key
indexMax={'chair':3,'table':2}
#3. we divide each value in #1 by value in #2
indexCalcMax={'chair':{'two':3/3,'one':1/3}, 'table':{'three':2/2,'two':1/2}}
I think I should use lambda expressions, but can't come up with any idea how i can do that. Any help?
First, define your values as lists correctly:
index = {'chair': ['one','two','two','two'], 'table': ['two','three','three']}
Then use collections.Counter with dictionary comprehensions:
from collections import Counter
# number of times each value occurs in each key
res1 = {k: Counter(v) for k, v in index.items()}
# value for maximum amount for each key
res2 = {k: v.most_common()[0][1] for k, v in res1.items()}
# we divide each value in #1 by value in #2
res3 = {k: {m: n / res2[k] for m, n in v.items()} for k, v in res1.items()}
index={'chair':{'one','two','two','two'},'table':{'two','three','three'}}
Problem: {} creates a set here, and a set discards duplicates, so you should convert each value into a list.
Now coming to your solution:
from collections import Counter
index={'chair': ['one','two','two','two'],'table':['two','three','three']}
updated_index = {'chair': dict(Counter(index['chair'])), 'table': dict(Counter(index['table']))}
updated_index_2 = {'chair': Counter(index['chair']).most_common()[0][1], 'table': Counter(index['table']).most_common()[0][1]}
print(updated_index)
print(updated_index_2)
You can use Counter from Python's collections library to find the counts without writing any lambda function. The output is:
{'chair': {'one': 1, 'two': 3}, 'table': {'two': 1, 'three': 2}}
{'chair': 3, 'table': 2}
Firstly, you have a mistake in how you created the index dict. You should have lists as the elements for each dictionary, you currently have sets. Sets are automatically deduplicated, so you will not be able to get a proper count from there.
You should correct index to be:
index={'chair':['one','two','two','two'],'table':['two','three','three']}
You can use Counter from the collections module, which is a subclass of dict, to generate what you want for each entry in indexCalc. A Counter maps each distinct item in a collection to the number of times it occurs.
indexCalc = {k: Counter(v) for k, v in index.items()}
indexCalc looks like this:
{'chair': Counter({'two': 3, 'one': 1}), 'table': Counter({'three': 2, 'two': 1})}
We can easily find the maximum count in each sub-dictionary:
indexMax = {k: max(indexCalc[k].values()) for k in indexCalc}
indexMax looks like this:
{'chair': 3, 'table': 2}
You can create indexCalcMax with the following comprehension, which is a little ugly:
indexCalcMax = {k: {val: indexCalc[k][val] / indexMax[k] for val in indexCalc[k]} for k in indexCalc}
which is a dict-comprehension translation of this loop:
indexCalcMax = {}
for k in indexCalc:
    tmp = {}
    for val in indexCalc[k]:
        tmp[val] = indexCalc[k][val] / float(indexMax[k])
    indexCalcMax[k] = tmp
I know this is suboptimal, but I had to do it as a thought exercise:
indexCalc = {
    k: {key: len([el for el in index[k] if el == key]) for key in set(index[k])}
    for k in index
}
Not exactly lambda, as suggested, but comprehensions... Don't use this code in production :) This answer is only partial, you can use the analogy and come up with the other two structures that you require.
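For completeness, all three structures can be produced together with Counter. This is a sketch, assuming Python 3 (true division) and that every list is non-empty, since max() raises on an empty sequence; the function name `summarize` is hypothetical:

```python
from collections import Counter

index = {'chair': ['one', 'two', 'two', 'two'], 'table': ['two', 'three', 'three']}

def summarize(index):
    # 1. occurrences of each value under each key
    calc = {k: Counter(v) for k, v in index.items()}
    # 2. highest count under each key
    peak = {k: max(c.values()) for k, c in calc.items()}
    # 3. each count divided by the highest count for its key
    ratio = {k: {w: n / peak[k] for w, n in c.items()} for k, c in calc.items()}
    return calc, peak, ratio

indexCalc, indexMax, indexCalcMax = summarize(index)
print(indexCalc)
print(indexMax)
print(indexCalcMax)
```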

Creating Dictionaries from Lists inside of Dictionaries

I'm quite new to Python and I have been stumped by a seemingly simple task.
In part of my program, I would like to create Secondary Dictionaries from the values inside of lists, of which they are values of a Primary Dictionary.
I would also like to default those values to 0
For the sake of simplicity, the Primary Dictionary looks something like this:
primaryDict = {'list_a':['apple', 'orange'], 'list_b':['car', 'bus']}
What I would like my result to be is something like:
{'list_a':[{'apple':0}, {'orange':0}], 'list_b':[{'car':0}, {'bus':0}]}
I understand the process should be to iterate through each list in the primaryDict, then iterate through the items in the list and then assign them as Dictionaries.
I've tried many variations of "for" loops all looking similar to:
for listKey in primaryDict:
    for word in listKey:
        {word:0 for word in listKey}
I've also tried some methods of combining Dictionary and List comprehension,
but when I try to index and print the Dictionaries with, for example:
print(primaryDict['list_a']['apple'])
I get the "TypeError: list indices must be integers or slices, not str", which I interpret that my 'apple' is not actually a Dictionary, but still a string in a list. I tested that by replacing 'apple' with 0 and it just returns 'apple', proving it true.
I would like help with regards to:
-Whether or not the values in my list are assigned as Dictionaries with value '0'
or
-Whether the mistake is in my indexing (in the loop or the print function), and what I am mistaken with
or
-Everything I've done won't get me the desired outcome and I should attempt a different approach
Thanks
Here is a dict comprehension that works:
{k: [{v: 0} for v in vs] for k, vs in primaryDict.items()}
There are two problems with your current code. First, you are trying to iterate over listKey, which is a string. This produces a sequence of characters.
Second, you should use something like
[{word: 0} for word in words]
in place of
{word:0 for word in listKey}
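The difference between iterating over the key and iterating over the list it refers to can be seen in a small sketch; the names `key`, `chars`, and `fixed` are chosen here for illustration only:

```python
primaryDict = {'list_a': ['apple', 'orange'], 'list_b': ['car', 'bus']}

key = 'list_a'
# Iterating over the key itself walks the characters of the string 'list_a'
chars = [c for c in key]
# Indexing the dictionary with the key gives the list, whose items are the words
fixed = [{word: 0} for word in primaryDict[key]]

print(chars)
print(fixed)
```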
You are close. The main issue is the way you iterate your dictionary, and the fact you do not append or assign your sub-dictionaries to any variable.
This is one solution using only for loops and list.append.
d = {}
for k, v in primaryDict.items():
    d[k] = []
    for w in v:
        d[k].append({w: 0})
{'list_a': [{'apple': 0}, {'orange': 0}],
'list_b': [{'car': 0}, {'bus': 0}]}
A more Pythonic solution is to use a dictionary comprehension with a nested list comprehension.
d = {k: [{w: 0} for w in v] for k, v in primaryDict.items()}
If you are using your dictionary for counting, which seems to be the implication, an even more Pythonic solution is to use collections.Counter:
from collections import Counter
d = {k: Counter(dict.fromkeys(v, 0)) for k, v in primaryDict.items()}
{'list_a': Counter({'apple': 0, 'orange': 0}),
'list_b': Counter({'bus': 0, 'car': 0})}
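collections.Counter has specific benefits relative to normal dictionaries: looking up a missing key returns 0 instead of raising KeyError, most_common() returns the items sorted by count, and counters can be added to or subtracted from each other.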
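A short sketch of how a Counter behaves differently from a plain dict, using one of the word lists from the question; the increments and the 'banana' key are invented purely for illustration:

```python
from collections import Counter

c = Counter(dict.fromkeys(['apple', 'orange'], 0))

c['apple'] += 2                  # ordinary item assignment still works
c.update(['orange', 'apple'])    # update() adds counts from an iterable
missing = c['banana']            # a missing key reads as 0 and is not inserted
top = c.most_common(1)           # items sorted by count, highest first
combined = c + Counter(apple=1)  # counters support addition

print(c, missing, top, combined)
```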
You can get the data structure that you desire via:
primaryDict = {'list_a':['apple', 'orange'], 'list_b':['car', 'bus']}
for k, v in primaryDict.items():
    primaryDict[k] = [{e: 0} for e in v]
# primaryDict
{'list_b': [{'car': 0}, {'bus': 0}], 'list_a': [{'apple': 0}, {'orange': 0}]}
But the correct nested access would be:
print(primaryDict['list_a'][0]['apple']) # note the 0
If you actually want primaryDict['list_a']['apple'] to work, do instead
for k, v in primaryDict.items():
    primaryDict[k] = {e: 0 for e in v}
# primaryDict
{'list_b': {'car': 0, 'bus': 0}, 'list_a': {'orange': 0, 'apple': 0}}
primaryDict = {'list_a':['apple', 'orange'], 'list_b':['car', 'bus']}
for listKey in primaryDict:
    primaryDict[listKey] = [{word: 0} for word in primaryDict[listKey]]
print(primaryDict)
Output:
{'list_a':[{'apple':0}, {'orange':0}], 'list_b':[{'car':0}, {'bus':0}]}
Hope this helps!
@qqc1037, I checked and updated your code to make it work. I have described the problems with your code in the comments. Finally, I have also added one more example using a list comprehension, map() and a lambda function.
import json

primaryDict = {'list_a': ['apple', 'orange'], 'list_b': ['car', 'bus']}
secondaryDict = {}
for listKey in primaryDict:
    new_list = []  # You did not define any temporary list
    for word in primaryDict[listKey]:  # You forgot to use the key that refers to the list
        new_list.append({word: 0})  # Here you forgot to append to the list
    secondaryDict.update({listKey: new_list})  # Finally, you forgot to update the secondary dictionary

# Pretty-print the dictionary
print(json.dumps(secondaryDict, indent=4))
"""
{
"list_a": [
{
"apple": 0
},
{
"orange": 0
}
],
"list_b": [
{
"car": 0
},
{
"bus": 0
}
]
}
"""
Another example: Using list comprehension, map(), lambda function
# Using Python 3.5.2
import json
primaryDict = {'list_a':['apple', 'orange'], 'list_b':['car', 'bus']}
secondaryDict = dict(map(lambda key: (key, [{item:0} for item in primaryDict[key]]), list(primaryDict) ))
# Pretty printing secondary dictionary
print(json.dumps(secondaryDict, indent=4))
"""
{
"list_a": [
{
"apple": 0
},
{
"orange": 0
}
],
"list_b": [
{
"car": 0
},
{
"bus": 0
}
]
}
"""

copy part of one dict to a new dict based on a list of keys

Sample:
d = {
"test": 1,
"sample": 2,
"example": 3,
"product": 4,
"software": 5,
"demo": 6,
}
filter_keys = ["test","sample","example","demo"]
I want to create a new dict that contains only those items from the first dict whose keys appear in the list. In other words, I want:
d2 = {
"test": 1,
"sample": 2,
"example": 3,
"demo": 6,
}
I could do it with a loop:
d2 = {}
for k in d.keys():
    if (k in filter_keys):
        d2[k] = d[k]
But this seems awfully "un-Pythonic". I'm also guessing that if you had a huge dict, say 5,000 items or so, the constant adding of new items to the new dict would be slow compared to a more direct way.
Also, you'd want to be able to handle errors. If the list contains something that's not a key in the dict, it should just be ignored. Or maybe it gets added to the new dict but with a value of None.
Is there a better way to accomplish this?
A straight-forward way to do this is with the "dictionary comprehension":
filtered_dict = {key: value for key, value in d.items() if key in filter_keys}
Note that if the condition appears at the end of the comprehension, it filters execution of the loop statement. Depending on whether the numbers of keys in the dictionary is greater than the number of keys you want to filter on, this revision could be more efficient:
filtered_dict = {key: d[key] for key in filter_keys if key in d}
Checking for membership in the dictionary (key in d) is significantly faster than checking for membership in the filter key list (key in filter_keys). But which ends up faster depends on the size of the filter key list (and, to a lesser extent, the size of the dictionary).
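If you prefer to iterate over the dictionary, one way to avoid the slow list membership test is to convert the key list to a set once. This is a sketch of that variation, not part of the original answer; the name `wanted` is made up here:

```python
d = {
    "test": 1,
    "sample": 2,
    "example": 3,
    "product": 4,
    "software": 5,
    "demo": 6,
}
filter_keys = ["test", "sample", "example", "demo", "not_in_d"]

wanted = set(filter_keys)  # one-time conversion; set lookups are O(1) on average
filtered_dict = {key: value for key, value in d.items() if key in wanted}

print(filtered_dict)
```

Keys in the filter list that are missing from the dictionary, like "not_in_d" above, are simply ignored.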
A relatively Pythonic way to do it without a dictionary comprehension is with the dict constructor:
filtered_dict = dict([(key, value) for key, value in d.items() if key in filter_keys])
Note that this is essentially equivalent to the dictionary comprehension, but may be clearer if you aren't familiar with dictionary comprehension syntax.
Dictionary comprehension is one way to do it:
new_d = {k: v for k, v in d.items() if k in l}
Demo:
>>> d = {
... "test": 1,
... "sample": 2,
... "example": 3,
... "product": 4,
... "software": 5,
... "demo": 6,
... }
>>>
>>> l = ["test","sample","example","demo"]
>>> new_d = {k: v for k, v in d.items() if k in l}
>>> new_d
{'sample': 2, 'demo': 6, 'test': 1, 'example': 3}
For optimal performance, you should iterate over the keys in the list and check if they are in the dict rather than the other way around:
d2 = {}
for k in list_of_keys:
    if k in d:
        d2[k] = d[k]
The benefit here is that the dict.__contains__ (in) on a dict is O(1) whereas for the list it's O(N). For big lists, that's a HUGE benefit (O(N) algorithm vs. O(N^2)).
We can be a little more succinct by expressing the above loop with an equivalent dict-comprehension:
d2 = {k: d[k] for k in list_of_keys if k in d}
This will likely be marginally faster than the loop, but probably not enough to ever worry about. That said, most Python programmers would prefer this version as it is more succinct and very common.
As per your last part of the question:
Or maybe it gets added to the new dict but with a value of None.
l = ["test","sample","example","demo","badkey"]
d = {
"test": 1,
"sample": 2,
"example": 3,
"product": 4,
"software": 5,
"demo": 6,
}
print {k: d.get(k) for k in l}
{'test': 1, 'sample': 2, 'badkey': None, 'example': 3, 'demo': 6}
You can pass a default return value to dict.get. It is None by default, but you could use d.get(k, "No_match") or whatever other value you want.

combining a list of dictionaries with another dictionary

I have a list with a set amount of dictionaries inside which I have to compare to one other dictionary.
They have the following form (there is no specific form or pattern for keys and values, these are randomly chosen examples):
list1 = [
{'X1': 'Q587', 'X2': 'Q67G7', ...},
{'AB1': 'P5K7', 'CB2': 'P678', ...},
{'B1': 'P6H78', 'C2': 'BAA5', ...}]
dict1 = {
'X1': set([B00001,B00020,B00010]),
'AB1': set([B00001,B00007,B00003]),
'C2': set([B00001,B00002,B00003]), ...
}
What I want to have now is a new dictionary which has as keys: the values of the dictionaries in list1. and as values the values of dict1. And this only when the keys intersect in compared dictionaries.
I have done this in the following way:
nDicts = len(list1)
resultDict = {}
for key in range(0,nDicts):
    for x in list1[key].keys():
        if x in dict1.keys():
            resultDict.update({list1[key][x]:dict1[x]})
print resultDict
The desired output should be of the form:
resultDict = {
'Q587': set([B00001,B00020,B00010]),
'P5K7': set([B00001,B00007,B00003]),
'BAA5': set([B00001,B00002,B00003]), ...
}
This works but since the amount of data is so high this takes forever.
Is there a better way to do this?
EDIT: I have changed the input values a little, the only ones that matter are the keys which intersect between the dictionaries within list1 and those within dict1.
The keys method in Python 2.x makes a list with a copy of all of the keys, and you're doing this not only for each dict in list1 (probably not a big deal, but it's hard to know for sure without knowing your data), but also doing it for dict1 over and over again.
On top of that, doing an in test on a list takes a long time, because it has to check each value in the list until it finds a match, but doing an in test on a dictionary is nearly instant, because it just has to look up the hash value.
Both keys() calls are actually completely unnecessary: iterating a dict gives you the keys in order (an unspecified order, but the same is true for calling keys()), and in-checking a dict searches the same keys you'd get with keys(). So, just removing them does the same thing, but simpler, faster, and with less memory used. So:
for key in range(0,nDicts):
    for x in list1[key]:
        if x in dict1:
            resultDict={list1[key][x]:dict1[x]}
            print resultDict
There are also ways you can simplify this that probably won't help performance that much, but are still worth doing.
You can iterate directly over list1 instead of building a huge list of all the indices and iterating that.
for list1_dict in list1:
    for x in list1_dict:
        if x in dict1:
            resultDict = {list1_dict[x]: dict1[x]}
            print resultDict
And you can get the keys and values in a single step:
for list1_dict in list1:
    for k, v in list1_dict.iteritems():
        if k in dict1:
            resultDict = {v: dict1[k]}
            print resultDict
Also, if you expect most of the values to be found, it will take about twice as long to first check for the value and then look it up as it would to just try to look it up and handle failure. (This is not true if most of the values will not be found, however.) So:
for list1_dict in list1:
    for k, v in list1_dict.iteritems():
        try:
            resultDict = {v: dict1[k]}
            print resultDict
        except KeyError:
            pass
You can simplify and optimize your operation with set intersections; as of Python 2.7 dictionaries can represent keys as sets using the dict.viewkeys() method, or dict.keys() in Python 3:
resultDict = {}
for d in list1:
    for sharedkey in d.viewkeys() & dict1:
        resultDict[d[sharedkey]] = dict1[sharedkey]
This can be turned into a dict comprehension even:
resultDict = {d[sharedkey]: dict1[sharedkey]
              for d in list1 for sharedkey in d.viewkeys() & dict1}
I am assuming here you wanted one resulting dictionary, not a new dictionary per shared key.
Demo on your sample input:
>>> list1 = [
... {'X1': 'AAA1', 'X2': 'BAA5'},
... {'AB1': 'AAA1', 'CB2': 'BAA5'},
... {'B1': 'AAA1', 'C2': 'BAA5'},
... ]
>>> dict1 = {
... 'X1': set(['B00001', 'B00002', 'B00003']),
... 'AB1': set(['B00001', 'B00002', 'B00003']),
... }
>>> {d[sharedkey]: dict1[sharedkey]
... for d in list1 for sharedkey in d.viewkeys() & dict1}
{'AAA1': set(['B00001', 'B00002', 'B00003'])}
Note that both X1 and AB1 are shared with dictionaries in list1, but in both cases, the resulting key is AAA1. Only one of these wins (the last match), but since both values in dict1 are exactly the same anyway that doesn't make any odds in this case.
If you wanted separate dictionaries per dictionary in list1, simply move the for d in list1: loop out:
for d in list1:
    resultDict = {d[sharedkey]: dict1[sharedkey] for sharedkey in d.viewkeys() & dict1}
    if resultDict:  # can be empty
        print resultDict
If you really wanted one dictionary per shared key, move another loop out:
for d in list1:
    for sharedkey in d.viewkeys() & dict1:
        resultDict = {d[sharedkey]: dict1[sharedkey]}
        print resultDict
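On Python 3, viewkeys() no longer exists and keys() itself returns the set-like view, so the single-dictionary version can be sketched as below. The data is the question's sample with the elided entries left out and the set members written as strings, which is an assumption about the real data:

```python
list1 = [
    {'X1': 'Q587', 'X2': 'Q67G7'},
    {'AB1': 'P5K7', 'CB2': 'P678'},
    {'B1': 'P6H78', 'C2': 'BAA5'},
]
dict1 = {
    'X1': {'B00001', 'B00020', 'B00010'},
    'AB1': {'B00001', 'B00007', 'B00003'},
    'C2': {'B00001', 'B00002', 'B00003'},
}

# d.keys() & dict1.keys() is the set of keys present in both dictionaries
resultDict = {
    d[sharedkey]: dict1[sharedkey]
    for d in list1
    for sharedkey in d.keys() & dict1.keys()
}

print(resultDict)
```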
#!/usr/bin/env python
list1 = [
{'X1': 'AAA1', 'X2': 'BAA5'},
{'AB1': 'AAA1', 'CB2': 'BAA5'},
{'B1': 'AAA1', 'C2': 'BAA5'}
]
dict1 = {
'X1': set(['B00001','B00002','B00003']),
'AB1': set(['B00001','B00002','B00003'])
}
g = ( k.iteritems() for k in list1)
ite = ((a,b) for i in g for a,b in i if dict1.has_key(a))
d = dict(ite)
print d
