sum up values of dictionaries - python

I have a dictionary such as below:
grocery={
'James': {'Brocolli': 3, 'Carrot': 3, 'Cherry': 5},
'Jill': {'Apples': 2, 'Carrot': 4, 'Tomatoes': 8},
'Sunny': {'Apples': 5, 'Carrot': 2, 'Cherry': 2, 'Chicken': 3, 'Tomatoes': 6}
}
food={}
for a,b in grocery.items():
    for i,j in b.items():
        food[i]+=(b.get(i,0))
I am trying to calculate the total of each food item, and it is not working as expected.
For example, I would like to count the total of Carrot, the total of Apples, and so on.
The above code gives me the following error:
  File "dictionary1.py", line 6, in <module>
    food[i]+=(b.get(i,0))
KeyError: 'Cherry'
How do I sum up the total of each item?

Simply do
from collections import defaultdict
food = defaultdict(int)  # default value of 0 for every nonexistent key
...and your code should work :)
PS. You get the error because you are trying to add to keys that have not been initialized. Don't assume that nonexistent keys start at 0.
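For completeness, here is a minimal sketch of the whole loop with the defaultdict in place. The variable names shopper, basket, item and quantity are my own choices for illustration; the data is the grocery dictionary from the question.

```python
from collections import defaultdict

grocery = {
    'James': {'Brocolli': 3, 'Carrot': 3, 'Cherry': 5},
    'Jill': {'Apples': 2, 'Carrot': 4, 'Tomatoes': 8},
    'Sunny': {'Apples': 5, 'Carrot': 2, 'Cherry': 2, 'Chicken': 3, 'Tomatoes': 6},
}

food = defaultdict(int)  # missing keys start at int() == 0
for shopper, basket in grocery.items():
    for item, quantity in basket.items():
        food[item] += quantity  # no KeyError: the key is created on first use

print(dict(food))
```

Converting with dict(food) at the end is optional; it only matters if later code should raise KeyError for unknown items again.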

Your food dictionary is empty and has no keys at the start; you can't add a value to something that isn't there yet.
Instead of +=, get the current value or a default, using dict.get() again:
food[i] = food.get(i, 0) + b.get(i,0)
You don't really need to use b.get() here, as you already have the values of b in the variable j:
food[i] = food.get(i, 0) + j
You could also use a collections.defaultdict() object to make keys 'automatically' exist when you try to access them, with a default value:
from collections import defaultdict
food = defaultdict(int) # insert int() == 0 when a key is not there yet
and in the inner loop then use food[i] += j.
I strongly recommend using better names for your variables. When you don't need the keys, as in the outer for loop, iterate over dict.values() rather than dict.items():
food = {}
for shopping in grocery.values():
    for name, quantity in shopping.items():
        food[name] = food.get(name, 0) + quantity
Another option is to use a dedicated counting and summing dictionary subclass, called collections.Counter(). This class directly supports summing your groceries in a single line:
from collections import Counter
food = sum(map(Counter, grocery.values()), Counter())
map(Counter, ...) creates Counter objects for each of your input dictionaries, and sum() adds up all those objects (the extra Counter() argument 'primes' the function to use an empty Counter() as a starting value rather than an integer 0).
Demo of the latter:
>>> from collections import Counter
>>> sum(map(Counter, grocery.values()), Counter())
Counter({'Tomatoes': 14, 'Carrot': 9, 'Cherry': 7, 'Apples': 7, 'Brocolli': 3, 'Chicken': 3})
A Counter is still a dictionary, just one with extra functionality. You can always go back to a dictionary by passing the Counter to dict():
>>> food = sum(map(Counter, grocery.values()), Counter())
>>> dict(food)
{'Brocolli': 3, 'Carrot': 9, 'Cherry': 7, 'Apples': 7, 'Tomatoes': 14, 'Chicken': 3}

You get the error, because in the beginning the keys, i.e. 'Apples', 'Tomatoes', ..., do not exist in food. You can correct this with a try-except block:
grocery={
    "Jill":{"Apples":2, "Tomatoes":8,"Carrot":4},
    "James":{"Carrot":3,"Brocolli":3,"Cherry":5},
    "Sunny":{"Chicken":3,"Apples":5,"Carrot":2,"Tomatoes":6,"Cherry":2}
}
food={}
for a,b in grocery.items():
    for i,j in b.items():
        try:
            food[i] += j
        except KeyError:
            food[i] = j
Also, you can get rid of the b.get(i,0) statement, because you already iterate through b and only get values (j) that actually exist in b.

Related

How to find multiple maximums in a dictionary

Trying to analyse some strings and compute the number of times they come up. This data is stored in a dictionary. If I were to use the max function only the first highest number encountered would be printed.
count = {"cow": 4, "moo": 4, "sheep": 1}
print(max(count.keys(), key=lambda x: count[x]))
cow
This yields cow as the max. How would I get both "cow" and "moo" printed?
count = {"cow": 4, "moo": 4, "sheep": 1}
cow, moo
Why not keep it simple?
mx = max(count.values())
print([k for k, v in count.items() if v == mx])
# ['cow', 'moo']
The bracketed expression in the second line is a list comprehension, essentially shorthand for a for loop that runs over an iterable and builds a new list as it goes. A subtlety here is that there are two loop variables (k and v), which are assigned together by tuple unpacking, since .items() yields (key, value) pairs one after the other. In summary, the list comprehension is roughly equivalent to:
result = []
for k, v in count.items():
    if v == mx:
        result.append(k)
The list comprehension is usually somewhat faster and is also easier to read once you are used to it.
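If you need this in more than one place, the two lines can be wrapped in a small helper. This is only a sketch: the name keys_with_max is made up for illustration, and the empty-dict check is my addition, because max() raises ValueError on an empty sequence.

```python
def keys_with_max(counts):
    """Return every key whose value equals the largest value in counts."""
    if not counts:
        return []  # max() would raise ValueError on an empty dict
    highest = max(counts.values())
    return [key for key, value in counts.items() if value == highest]


count = {"cow": 4, "moo": 4, "sheep": 1}
print(keys_with_max(count))
```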
Just group the counts with a defaultdict, and take the maximum:
from collections import defaultdict
count = {"cow": 4, "moo": 4, "sheep": 1}
d = defaultdict(list)
for animal, cnt in count.items():
    d[cnt].append(animal)
print(dict(d))
# {4: ['cow', 'moo'], 1: ['sheep']}
print(max(d.items())[1])
# ['cow', 'moo']

list comprehension to build a nested dictionary from a list of tuples

I have data (counts) indexed by user_id and analysis_type_id obtained from a database. It's a list of 3-tuple. Sample data:
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]
where the first item of each tuple is the count, the second the analysis_type_id, and the last the user_id.
I'd like to place that into a dictionary, so i can retrieve the counts quickly: given a user_id and analysis_type_id. It would have to be a two-level dictionary. Is there any better structure?
To construct the two-level dictionary "by hand", I would code:
dict = {4:{1:4,5:3,10:2},5:{10:2}}
Where user_id is the first dict key level, analysis_type_id is the second (sub-) key, and the count is the value inside the dict.
How would I create the "double-depth" in dict keys through list comprehension?
Or do I need to resort to a nested for-loop, where I first iterate through unique user_id values, then find matching analysis_type_id and fill in the counts ... one-at-a-time into the dict?
Two-Tuple Keys
I would suggest abandoning the idea of nesting dictionaries and simply using two-tuples as the keys directly. Like so:
d = { (user_id, analysis_type_id): count for count, analysis_type_id, user_id in counts}
The dictionary is a hash table. In Python, each two-tuple has a single hash value (not two hash values), so a pair is found with one lookup on that hash. This avoids the second lookup that a nested dictionary needs (first the user_id, then the analysis_type_id), which can make it faster, though how much depends on your data and Python version.
However, beware of premature optimization. Unless you're doing millions of lookups, the increase in performance of the flat dict is unlikely to matter. The real reason to favor the two-tuple here is that its syntax and readability are far superior to the other solutions. That assumes that the vast majority of the time you will want to access items based on a pair of values, and not groups of items based on a single value.
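To make the trade-off concrete, here is a small sketch using the sample data from the question. The pair lookup is a single indexing operation, while collecting everything for one user means scanning all keys. The names pair_count, missing and user_4 are only for illustration.

```python
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]
d = {(user_id, analysis_type_id): count
     for count, analysis_type_id, user_id in counts}

# Looking up a pair is one indexing operation.
pair_count = d[(4, 5)]

# dict.get() avoids a KeyError for pairs that were never counted.
missing = d.get((9, 9), 0)

# Getting all counts for one user requires scanning every key.
user_4 = {a_id: count for (u_id, a_id), count in d.items() if u_id == 4}

print(pair_count, missing, user_4)
```

If you expect to ask for "everything for user 4" often, the nested dictionary from the question is the better fit.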
Consider Using a namedtuple
It may be convenient to create a named tuple for storing those keys. Do that this way:
from collections import namedtuple
IdPair = namedtuple("IdPair", "user_id, analysis_type_id")
Then use it in your dictionary comprehension:
d = { IdPair(user_id, analysis_type_id): count for count, analysis_type_id, user_id in counts}
And access a count you're interested in like this:
somepair = IdPair(user_id = 4, analysis_type_id = 1)
d[somepair]
The reason this is sometimes useful is you can do things like this:
user_id = somepair.user_id # very nice syntax
Some Other Useful Options
One downside of the above solution is the case in which your lookup fails. In that case, you will only get a traceback like the following:
>>> d[IdPair(0,0)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: IdPair(user_id=0, analysis_type_id=0)
This isn't very helpful; was it the user_id that was unmatched, or the analysis_type_id, or both?
You can create a better tool for yourself by creating your own dict type that gives you a nice traceback with more information. It might look something like this:
class CountsDict(dict):
    """A dict for storing IdPair keys and count values as integers.
    Provides more detailed traceback information than a regular dict.
    """
    def __getitem__(self, k):
        try:
            return super().__getitem__(k)
        except KeyError as exc:
            raise self._handle_bad_key(k, exc) from exc
    def _handle_bad_key(self, k, exc):
        """Provides a custom exception when a bad key is given."""
        try:
            user_id, analysis_type_id = k
        except (TypeError, ValueError):
            # k is not a two-tuple, so fall back to the original KeyError
            return exc
        has_u_id = next((True for u_id, _ in self if u_id==user_id), False)
        has_at_id = next((True for _, at_id in self if at_id==analysis_type_id), False)
        exc_lookup = {(False, False):KeyError(f"CountsDict missing pair: {k}"),
                      (True, False):KeyError(f"CountsDict missing analysis_type_id: "
                                             f"{analysis_type_id}"),
                      (False, True):KeyError(f"CountsDict missing user_id: {user_id}"),
                      # both ids exist, but never together
                      (True, True):KeyError(f"CountsDict missing pair: {k}")}
        # look up by what was found, not by the ids themselves
        return exc_lookup[(has_u_id, has_at_id)]
Use it just like a regular dict.
However, it may make MORE sense to simply add new pairs to your dict (with a count of zero) when you try to access a missing pair. If this is the case, I'd use a defaultdict and have it set the count to zero (using the default value of int as the factory function) when a missing key is accessed. Like so:
from collections import defaultdict
my_dict = defaultdict(int,
    (((user_id, analysis_type_id), count) for count, analysis_type_id, user_id in counts))
Now if you attempt to access a key that is missing, the count will be set to zero. However, one problem with this method is that ANY missing key will be added with a count of zero, even one that is not a two-tuple:
value = my_dict["I'm not a two tuple, sucka!!!!"] # <-- will be added to my_dict
To prevent this, we go back to the idea of making a CountsDict, except in this case, your special dict will be a subclass of defaultdict. However, unlike a regular defaultdict, it will check to make sure the key is a valid kind before it is added. And as a bonus, we can make sure ANY two tuple that is added as a key becomes an IdPair.
from collections import defaultdict
class CountsDict(defaultdict):
    """A dict for storing IdPair keys and count values as integers.
    Missing two-tuple keys are converted to an IdPair. Invalid keys raise a KeyError.
    """
    def __getitem__(self, k):
        try:
            user_id, analysis_type_id = k
        except (TypeError, ValueError):
            raise KeyError(f"The provided key {k!r} is not a valid key.")
        else:
            # convert two tuple to an IdPair if it was not already
            k = IdPair(user_id, analysis_type_id)
        return super().__getitem__(k)
Use it just like the regular defaultdict:
my_dict = CountsDict(int,
    (((user_id, analysis_type_id), count) for count, analysis_type_id, user_id in counts))
NOTE: In the above I have not made it so that two tuple keys are converted to IdPairs upon instance creation (because __setitem__ is not utilized during instance creation). To create this functionality, we would also need to implement an override of the __init__ method.
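One possible way to fill that gap is sketched below. It is not the only design: here __init__ takes an iterable of (key, count) pairs (the parameter name items and the helper _as_pair are my own, hypothetical choices) and routes every pair through an overridden __setitem__, so keys are stored as IdPair from the start.

```python
from collections import defaultdict, namedtuple

IdPair = namedtuple("IdPair", "user_id, analysis_type_id")


class CountsDict(defaultdict):
    """defaultdict whose keys are always stored as IdPair instances."""

    def __init__(self, default_factory=int, items=()):
        # items is assumed to be an iterable of ((user_id, analysis_type_id), count)
        super().__init__(default_factory)
        for key, count in items:
            self[key] = count  # goes through __setitem__ below

    @staticmethod
    def _as_pair(k):
        """Convert a two-item key to an IdPair, or raise KeyError."""
        try:
            user_id, analysis_type_id = k
        except (TypeError, ValueError):
            raise KeyError(f"The provided key {k!r} is not a valid key.") from None
        return IdPair(user_id, analysis_type_id)

    def __getitem__(self, k):
        return super().__getitem__(self._as_pair(k))

    def __setitem__(self, k, value):
        super().__setitem__(self._as_pair(k), value)


counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]
my_dict = CountsDict(int, (((u, a), c) for c, a, u in counts))
print(my_dict[(4, 1)])
```

Methods such as update() and setdefault() are not overridden in this sketch, so keys added through them are not converted.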
Wrap Up
Out of all of these, the more useful option depends entirely on your use case.
The most readable solution uses a defaultdict, which saves you nested loops and clumsy checks for whether keys already exist:
from collections import defaultdict
dct = defaultdict(dict) # do not shadow the built-in 'dict'
for x, y, z in counts:
    dct[z][y] = x
dct
# defaultdict(dict, {4: {1: 4, 5: 3, 10: 2}, 5: {10: 2}})
If you really want a one-liner comprehension you can use itertools.groupby and this clunkiness:
from itertools import groupby
dct = {k: {y: x for x, y, _ in g} for k, g in groupby(sorted(counts, key=lambda c: c[2]), key=lambda c: c[2])}
If your initial data is already sorted by user_id, you can save yourself the sorting.
This is a good use for the defaultdict object. You can create a defaultdict whose elements are always dicts. Then you can just stuff the counts into the right dicts, like this:
from collections import defaultdict
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]
dct = defaultdict(dict)
for count, analysis_type_id, user_id in counts:
    dct[user_id][analysis_type_id]=count
dct
# defaultdict(dict, {4: {1: 4, 5: 3, 10: 2}, 5: {10: 2}})
# if you want a 'normal' dict, you can finish with this:
dct = dict(dct)
Or you can just use standard dicts with setdefault:
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]
dct = dict()
for count, analysis_type_id, user_id in counts:
    dct.setdefault(user_id, dict())
    dct[user_id][analysis_type_id]=count
dct
# {4: {1: 4, 5: 3, 10: 2}, 5: {10: 2}}
I don't think you can do this neatly with a list comprehension, but there's no need to be afraid of a for-loop for this kind of thing.
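Once the nested dict is built, a lookup for a pair that was never counted raises KeyError. A small sketch of a forgiving lookup follows; the helper name lookup and the default of 0 are assumptions of mine, not part of the question.

```python
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]

dct = {}
for count, analysis_type_id, user_id in counts:
    # setdefault returns the inner dict, creating it on first use
    dct.setdefault(user_id, {})[analysis_type_id] = count


def lookup(nested, user_id, analysis_type_id, default=0):
    """Return the stored count, or default when either level is missing."""
    return nested.get(user_id, {}).get(analysis_type_id, default)


print(lookup(dct, 4, 5))
print(lookup(dct, 9, 9))
```

Unlike a defaultdict, this does not insert anything into dct when the pair is missing.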
You could use the following logic. There is no need to import any package; a nested comprehension is enough. Note that it rescans counts once for every tuple, so it does quadratic work.
counts = [(4, 1, 4), (3, 5, 4), (2, 10, 4), (2, 10, 5)]
dct = {x[2]:{y[1]:y[0] for y in counts if x[2] == y[2]} for x in counts }
# output will be {4: {1: 4, 5: 3, 10: 2}, 5: {10: 2}}
You can use a comprehension with nested loops and a condition to select elements:
nest_lists = [("a","aa","aaa","aaaa"), ("b","bb","bbb","bbbb"), ("c","cc","ccc","cccc"), ("d","dd","ddd","dddd")]
# create dict with tuples
line_dict = {str(nest_list[0]) : nest_list[1:] for nest_list in nest_lists for elem in nest_list if elem== nest_list[0]}
print(line_dict)
# create dict with list
line_dict1 = {str(nest_list[0]) : list(nest_list[1:]) for nest_list in nest_lists for elem in nest_list if elem== nest_list[0]}
print(line_dict1)
Output:
{'a': ('aa', 'aaa', 'aaaa'), 'b': ('bb', 'bbb', 'bbbb'), 'c': ('cc', 'ccc', 'cccc'), 'd': ('dd', 'ddd', 'dddd')}
{'a': ['aa', 'aaa', 'aaaa'], 'b': ['bb', 'bbb', 'bbbb'], 'c': ['cc', 'ccc', 'cccc'], 'd': ['dd', 'ddd', 'dddd']}

How do I represent a dictionary from a list assuming that every other number by its side is its value?

I have a list that looks like this,
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
In this list, each number after the word represents the value of the word. I want to represent this list in a dictionary such that the value of each repeated word gets added. I want the dictionary to be like this:
dict = {'hello':'2', 'go':'14', 'sit':'6','line':'3','play':'0'}
In the list 'go' occurs twice with two different values so we add the number that occur just after the word, similarly for other words.
This is my approach, it does not seem to work.
import csv
with open('teest.txt', 'rb') as input:
    count = {}
    my_file = input.read()
    listt = my_file.split()
    i = i + 2
    for i in range(len(listt)-1):
        if listt[i] in count:
            count[listt[i]] = count[listt[i]] + listt[i+1]
        else:
            count[listt[i]] = listt[i+1]
Counting occurrences of unique keys is usually possible with defaultdict.
import collections as ct
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
dd = ct.defaultdict(int)
iterable = iter(lista)
for word in iterable:
    dd[word] += int(next(iterable))
dd
# defaultdict(int, {'go': 14, 'hello': 2, 'line': 3, 'play': 0, 'sit': 6})
Here we initialize the defaultdict so that missing keys start at int() == 0. We make an iterator over the list with iter(), which lets us call next() on it. Since the words and values occur in consecutive pairs in the list, the for loop takes a word and next() immediately takes the value that follows it, which keeps the two in sync. We add each value to the word's entry in the defaultdict, which keeps a running total.
Convert the integers to strings if this is required:
{k: str(v) for k, v in dd.items()}
# {'go': '14', 'hello': '2', 'line': '3', 'play': '0', 'sit': '6'}
An alternate tool may be the Counter (see @DexJ's answer), which is related to this type of defaultdict. In fact, Counter() can substitute defaultdict(int) here and return the same result.
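As a sketch of that substitution, the loop stays exactly the same and only the container changes. The names totals and pairs are mine.

```python
from collections import Counter

lista = ['hello', '2', 'go', '5', 'sit', '4', 'line', '3',
         'sit', '2', 'go', '9', 'play', '0']

totals = Counter()  # missing keys count as 0, like defaultdict(int)
pairs = iter(lista)
for word in pairs:
    # next() consumes the number that directly follows the word
    totals[word] += int(next(pairs))

print(dict(totals))
```

One difference worth knowing: reading a missing key from a Counter returns 0 without storing the key, whereas defaultdict(int) inserts it.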
You can "stride" through the list 2 items at a time using range(). The optional 3rd argument of range() lets you define the step.
range(start, stop[, step])
Using this, we can create a range of indexes that skip ahead 2 at a time, for the entire length of your list. We can then ask the list what "name" is at that index lista[i] and what "value" is after it lista[i + 1].
new_dict = {}
for i in range(0, len(lista), 2):
    name = lista[i]
    value = lista[i + 1]
    # the name already exists
    # convert their values to numbers, add them, then convert back to a string
    if name in new_dict:
        new_dict[name] = str( int(new_dict[name]) + int(value) )
    # the name doesn't exist
    # simply append it with the value
    else:
        new_dict[name] = value
As explained by @Soviut, you can use range() with a step of 2 to reach each word directly. Since the values in your list are stored as strings, I have converted them to integers.
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
data = {}
for i in range(0, len(lista), 2): # step through the indexes 2 at a time, i.e. 0,2,4,...
    if lista[i] in data.keys(): # check whether the word is already a key in the dictionary
        data[lista[i]] = int(data[lista[i]]) + int(lista[i+1])
    else:
        data[lista[i]] = int(lista[i+1])
print(data)
Output
{'hello': 2, 'go': 14, 'sit': 6, 'line': 3, 'play': 0}
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
dictionary = {}
for keyword, value in zip(*[iter(lista)]*2): # iterate two at a time
    if keyword in dictionary: # if the key is present, add to the existing sum
        dictionary[keyword] = dictionary[keyword] + int(value)
    else: # if not present, set the value for the first time
        dictionary[keyword] = int(value)
print(dictionary)
Output:
{'hello': 2, 'go': 14, 'sit': 6, 'line': 3, 'play': 0}
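The zip(*[iter(lista)]*2) idiom can look cryptic, so here is a sketch of what it does when written out. The list [iter(lista)]*2 holds the same iterator object twice, so zip() pulls two consecutive items from it for every pair. The names it, pairs and totals are mine.

```python
lista = ['hello', '2', 'go', '5', 'sit', '4', 'line', '3',
         'sit', '2', 'go', '9', 'play', '0']

it = iter(lista)
# zip(it, it) is what zip(*[iter(lista)]*2) expands to:
# both arguments are the SAME iterator, so items come out in pairs.
pairs = list(zip(it, it))

totals = {}
for word, value in pairs:
    totals[word] = totals.get(word, 0) + int(value)

print(pairs[:2])
print(totals)
```

If the list had an odd number of items, zip() would silently drop the last word.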
Another solution using iter(), itertools.zip_longest() and itertools.groupby() functions:
import itertools
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
it = iter(lista)
d = {k: sum(int(_[1]) for _ in g)
for k,g in itertools.groupby(sorted(itertools.zip_longest(it, it)), key=lambda x: x[0])}
print(d)
The output:
{'line': 3, 'sit': 6, 'hello': 2, 'play': 0, 'go': 14}
You can use range(start, end, step) to compute the end of each pair, slice the list into pairs, and then use Counter() from collections to sum the values of duplicate keys.
Here yourdict will be {'hello': 2, 'go': 14, 'sit': 6, 'line': 3, 'play': 0}
from collections import Counter
counter_obj = Counter()
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
items, start = [], 0
for end in range(2,len(lista)+2,2):
    items.append(lista[start:end])  # slice out one [word, value] pair
    start = end
for item in items:
    counter_obj[item[0]] += int(item[1])
yourdict = dict(counter_obj)
print(yourdict)

Adding Values that Have the Same Key

I know this is simple, but I've been searching for about an hour and was unable to find a good answer. I know there has to be something more elegant than iterating through the keys to look for matches and then adding the values.
What I have is:
test_dict = [{'Bob':2}, {'Jane':1}, {'Marco':1}, {'Suzy':2}, {'Bob':1},{'Mark':3}, {'Ellen':1}, {'Suzy':1}]
What I want to do is add the values together when the keys match (in this case Bob and Suzy). This would eliminate the duplicate keys by adding their values together. It would look like:
test_dict = [{'Bob':3}, {'Jane':1}, {'Marco':1}, {'Suzy':3},{'Mark':3}, {'Ellen':1}]
What I have tried is:
from collections import Counter
final = Counter(test_dict)
As well as other collections module items. I'd really prefer not to loop through the dictionary to compare each key for a match and then add the values together. This seems like a really inefficient idea but I can't think of (or find) anything else.
You were on the right track with Counter:
>>> sum((Counter(d) for d in test_dict), Counter())
Counter({'Bob': 3, 'Ellen': 1, 'Jane': 1, 'Marco': 1, 'Mark': 3, 'Suzy': 3})
To do the same with an explicit loop would be like:
>>> counter = Counter()
>>> for dict_ in test_dict:
...     counter.update(**dict_)
...
>>> counter
Counter({'Bob': 3, 'Ellen': 1, 'Jane': 1, 'Marco': 1, 'Mark': 3, 'Suzy': 3})
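A caveat on the loop: unpacking with ** only works when every key is a string. Counter.update() also accepts a mapping directly and adds its counts, which works for any hashable key. The sketch below uses that form and then rebuilds the list of single-key dicts from the question; the name merged is my own.

```python
from collections import Counter

test_dict = [{'Bob': 2}, {'Jane': 1}, {'Marco': 1}, {'Suzy': 2},
             {'Bob': 1}, {'Mark': 3}, {'Ellen': 1}, {'Suzy': 1}]

counter = Counter()
for d in test_dict:
    counter.update(d)  # adds the counts from d; keys need not be strings

# Back to the list-of-single-key-dicts shape from the question.
merged = [{name: total} for name, total in counter.items()]
print(merged)
```

On Python 3.7 and later, the names come out in the order they first appeared.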
You can also use a defaultdict: extract each key and value from the dicts in test_dict and add them to the defaultdict, which takes care of instantiating keys that do not yet exist.
from collections import defaultdict
dd = defaultdict(int)
for d in test_dict:
    for key, value in d.items():  # d.keys()[0] only works in Python 2
        dd[key] += value
>>> dd
defaultdict(int,
{'Bob': 3,
'Ellen': 1,
'Jane': 1,
'Marco': 1,
'Mark': 3,
'Suzy': 3})

Iterate a list through a dictionary

I have a list with the same values as the keys of a dictionary. I want to write a code that does something to the values of the dictionary (e.g. increases them by one) as many times as their key appears in the list.
So e.g.
listy=['dgdg','thth','zuh','zuh','thth','dgdg']
dicty = {'dgdg':1, 'thth':2, 'zuh':5}
I tried this code:
def functy (listx,dictx):
    for i in range (0, len(listx)):
        for k,v in dictx:
            if listx[i]==k:
                v=v+1
            else:
                pass
functy(listy, dicty)
But it raises this error:
Traceback (most recent call last):
File "C:\Python34\8.py", line 12, in <module>
functy(listy, dicty)
File "C:\Python34\8.py", line 6, in functy
for k,v in dictx:
ValueError: too many values to unpack (expected 2)
Could you tell me why it doesn't work and how I can make it?
Iterating over a dict directly yields only its keys, as if you had iterated over dict.keys().
Because you want both the key and its value it should be
for k,v in dictx.items():
which yields (key, value) tuples:
>>> a={1:2,2:3,3:4}
>>> a.items()
dict_items([(1, 2), (2, 3), (3, 4)])
In Python 2, items() returned a list and iteritems() returned an iterator instead. Python 3, which your traceback shows you are using, has only items(), and it returns a view of the dict.
However, you should index the dict directly by key; otherwise your assignment v=v+1 only rebinds the local name v and is not persisted to the dict:
def functy (listx,dictx):
    for item in listx:
        if item in dictx:
            dictx[item]+=1
>>> listy=['dgdg','thth','zuh','zuh','thth','dgdg']
>>> dicty = {'dgdg':1, 'thth':2, 'zuh':5}
>>> print(dicty)
{'dgdg': 1, 'thth': 2, 'zuh': 5}
>>> functy(listy, dicty)
>>> print(dicty)
{'dgdg': 3, 'thth': 4, 'zuh': 7}
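To see why assigning to v has no effect, here is a short sketch. The loop variable is just a name bound to the value, so rebinding it leaves the dict untouched, while assigning through the key changes it. The name unchanged is mine.

```python
dicty = {'dgdg': 1, 'thth': 2, 'zuh': 5}

for k, v in dicty.items():
    v = v + 1  # rebinds the local name v only; dicty is not modified

unchanged = dict(dicty)  # snapshot: still the original values

for k in dicty:
    dicty[k] += 1  # assigns through the key, so the dict is updated

print(unchanged)
print(dicty)
```

Updating the values of existing keys while iterating is fine; only adding or removing keys during iteration raises an error.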
You're missing the point of having a dictionary, which is that you can index it directly by key instead of iterating over it:
def functy(listx, dictx):
    for item in listx:
        if item in dictx:
            dictx[item] += 1
It looks like you're trying to use a dictionary as a counter. If that's the case, why not use collections.Counter from the standard library?
from collections import Counter
dicty = Counter({'dgdg':1, 'thth':2, 'zuh':5})
dicty += Counter(['dgdg','thth','zuh','zuh','thth','dgdg'])
# dicty is now Counter({'zuh': 7, 'thth': 4, 'dgdg': 3})
I suggest you use collections.Counter, which is a dict subclass for counting hashable objects.
>>> import collections
>>> count_y = collections.Counter(dicty) # convert dicty into a Counter
>>> count_y.update(item for item in listy if item in count_y)
>>> count_y
Counter({'zuh': 7, 'thth': 4, 'dgdg': 3})
You can iterate a dictionary like this:
for k in dictx:
    v = dictx[k]
Use dictx.items() instead of dictx. When you iterate over dictx itself, you receive only the keys.
listy=['dgdg','thth','zuh','zuh','thth','dgdg']
dicty = {'dgdg':1, 'thth':2, 'zuh':5}
# items() missed and also dicty not updated in the original script
def functy (listx,dictx):
    for i in range (0, len(listx)):
        for k,v in dictx.items():
            if listx[i]==k:
                dictx[k] += 1
            else:
                pass
functy(listy, dicty)
print(dicty)
{'dgdg': 3, 'thth': 4, 'zuh': 7}
