store multiple values for one key in dictionary - python

I have a list of data, which has 2 values:
a 12
a 11
a 5
a 12
a 11
I would like to use a dictionary so that I end up with a list of values for each key. Column 1 may contain other entries, such as 'b', so I want to arrange the data with column 1 as the key and column 2 as the data for each key:
[a:12,11,5]
How do I achieve this? From what I have read, if two values have the same key, the last one overrides the previous one, so only one value is kept for that key in the dictionary.
d = {}
for line in results:
    templist = line.split(' ')
    thekey = templist[0]
    thevalue = templist[1]
    # test for the key (not the value) before appending
    if thekey in d:
        d[thekey].append(thevalue)
    else:
        d[thekey] = [thevalue]
Am I approaching the problem the wrong way?

Python dicts can have only one value for a key, so you cannot assign multiple values in the fashion you are trying to.
Instead, store the multiple values in a list, so that the list becomes the one value corresponding to the key:
d = {}
d["a"] = []
d["a"].append(1)
d["a"].append(2)
>>> print d
{'a': [1, 2]}
You can use a defaultdict to simplify this; it initialises a missing key with an empty list, as below:
from collections import defaultdict
d = defaultdict(list)
d["a"].append(1)
d["a"].append(2)
>>> print d
defaultdict(<type 'list'>, {'a': [1, 2]})
If you don't want repeat values for a key, you can use a set instead of list. But do note that a set is unordered.
from collections import defaultdict
d = defaultdict(set)
d["a"].add(1)
d["a"].add(2)
d["a"].add(1)
>>> print d
defaultdict(<type 'set'>, {'a': set([1, 2])})
If you need to maintain order, either use sorted at runtime, or use a list with an if clause to check for values:
from collections import defaultdict
d = defaultdict(list)
for item in (1, 2, 1, 2, 3, 4, 1, 2):
    if item not in d["a"]:
        d["a"].append(item)
>>> print d
defaultdict(<type 'list'>, {'a': [1, 2, 3, 4]})
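Applied to the data from the question, the defaultdict approach might look like the sketch below. The question does not show how `results` is built, so the list here is an assumption, the `b 7` row is a made-up entry to show a second key, and converting the second column with `int()` is also an assumption about the data.

```python
from collections import defaultdict

# Assumed input: one "key value" string per line; "b 7" is hypothetical.
results = ["a 12", "a 11", "a 5", "a 12", "a 11", "b 7"]

d = defaultdict(list)
for line in results:
    key, value = line.split()   # split on any run of whitespace
    d[key].append(int(value))   # int() is an assumption about the data

print(dict(d))  # {'a': [12, 11, 5, 12, 11], 'b': [7]}
```

Duplicates are kept in arrival order here; swapping `list` for `set` (and `append` for `add`) would drop them, as described above.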


How to sum values in multidimensional dictionary?

Normally I would use sum(dict['A'].values()) in order to sum all the values in a dictionary with the key "A". However in this case it is not all the values of the "main" key I want to sum, but rather all the values where the "secondary/sub-key" has a specific name. Let me show a simplified example below:
dict = {'A':{'val1':3,'val2':5},'B':{'val1':2,'val2':6}}
sum1 = dict['A']['val1']+dict['B']['val1']
sum2 = dict['A']['val2']+dict['B']['val2']
The example above is fairly easy since it's only a 2*2 dimension, and thus I can fairly easily add the values directly. But this method is not practical when the dictionary gets larger. So I wonder what is the most efficient solution.
To sum values for a single subkey you could use sum() with a generator expression:
>>> d = {'A': {'val1': 3,'val2': 5}, 'B': {'val1': 2, 'val2': 6}}
>>> sum(x['val1'] for x in d.values())
5
To sum values for all subkeys you can use collections.Counter:
>>> from collections import Counter
>>> counter = sum(map(Counter, d.values()), Counter())
>>> dict(counter)
{'val2': 11, 'val1': 5}
You can iterate through the keys of your dict and retrieve the values to add them to a variable.
So you would start by declaring the sum variables where you will store the sums, and iterate through every key in your dict to add the corresponding values to the sum variables. You could also easily add more values in the future by adding val3 and sum3, val4 and sum4, etc. Here is an example:
my_dict = {'A':{'val1':3,'val2':5},'B':{'val1':2,'val2':6}}
# initiate variables to store the sums
sum1, sum2 = (0,) * 2
# iterate through the keys of your dict and increment the sum variables
for key in my_dict:
    sum1 += my_dict[key]['val1']
    sum2 += my_dict[key]['val2']
print(sum1)
print(sum2)
This prints:
5
11
Also, as @jpp mentioned, never name a variable after a built-in class, so don't name your dictionary dict; you can name it my_dict, for example.
You can store the sum of every element inside each inner dict in a new dictionary:
d = {'A':{'val1':3,'val2':5},'B':{'val1':2,'val2':7}}
sums = {}
total = 0  # avoid naming this sum, which would shadow the built-in
for element in d:
    for key in d[element]:
        total += d[element][key]
    sums[element] = total
    total = 0
print(sums['A']) # 8
print(sums['B']) # 9
Use collections.Counter:
>>> from collections import Counter
>>> d = {'A':{'val1':3,'val2':5},'B':{'val1':2,'val2':6}}
>>> sum((Counter(d[x]) for x in d), Counter())
Counter({'val2': 11, 'val1': 5})
Also note that you shouldn't name your dictionary dict. It shadows the built-in dict type.
I suggest this readable solution, which iterates over each key/value pair to update a new dictionary with the sums of the values, whatever the size of the input dictionaries. I also renamed the input dictionary to d instead of dict, because dict is the name of the Python built-in class:
from collections import defaultdict
d = {'A':{'val1':3,'val2':5},'B':{'val1':2,'val2':6}}
sumDict = defaultdict(int)
for v1 in d.values():
    for k2, v2 in v1.items():
        sumDict[k2] += v2
print(dict(sumDict)) # {'val1': 5, 'val2': 11}
print(sumDict['val1']) # 5
print(sumDict['val2']) # 11
You can find the union of relevant keys. Then use a dictionary comprehension to construct a dictionary mapping these keys to their sums:
d = {'A':{'val1':3,'val2':5},'B':{'val1':2,'val2':6}}
sum_keys = set().union(*d.values())
sums = {k: sum(d[i].get(k, 0) for i in d) for k in sum_keys}
print(sums)
{'val1': 5, 'val2': 11}

Get rid of NaN as key in python dictionary

I have a dictionary in which one key is a NaN, and I want to delete the key and its corresponding value. I have tried this:
from math import isnan
clean_dict = filter(lambda k: not isnan(k), dict_HMDB_ID)
but clean_dict is not a dictionary. I want to output it from Python, but I get ''.
filter doesn't return a dictionary. It returns a list in Python 2 and a filter object in Python 3.
You can use dict.pop:
d = {'a': 1, 'b': 2}
print(d)
# {'a': 1, 'b': 2}
d.pop('b')
print(d)
# {'a': 1}
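And in your specific case, pass the NaN key to pop. Be aware that NaN never compares equal to itself, so this only succeeds when the argument is the very same object that is stored as the key; a newly created float('NaN') raises KeyError:
dict_HMDB_ID.pop(float('NaN'))
For the sake of completeness, it can also be done with a dictionary comprehension, which iterates over every key but does not depend on object identity:
import math
clean_dict = {k: v for k, v in dict_HMDB_ID.items() if not math.isnan(k)}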
If you insist on using filter here (you really shouldn't) you will need to:
1. pass it dict_HMDB_ID.items() so it will keep the original values
2. provide a custom function because it will now operate on (key, value) tuples
3. transform the returned filter object (it will contain an iterator with (key, value) tuples) back to a dictionary
import math
dict_HMDB_ID = {1: 'a', float('Nan'): 'b'}
clean_dict = dict(filter(lambda tup: not math.isnan(tup[0]), dict_HMDB_ID.items()))
print(clean_dict)
# {1: 'a'}
I should probably mention that the first approach (.pop) directly modifies the dict_HMDB_ID while the other two create a new dictionary. If you wish to use .pop and create a new dictionary (leaving dict_HMDB_ID as it is) you can create a new dictionary with dict:
d = {'a': 1, 'b': 2}
new_d = dict(d)
new_d.pop('b')
print(d)
# {'a': 1, 'b': 2}
print(new_d)
# {'a': 1}
You could do:
from math import nan
dict_HMDB_ID.pop(nan)
clean_dict = dict_HMDB_ID
or the other way around if you want to preserve dict_HMDB_ID:
from math import nan
clean_dict = dict(dict_HMDB_ID)
clean_dict.pop(nan)
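One caveat for the pop-based approaches above: a dictionary lookup first checks object identity and then equality, and NaN is never equal to itself, so `pop` only finds the key when it is given the very same NaN object. The sketch below illustrates this with a made-up dictionary; the names `nan_key` and `popped_with_new_nan` are only for illustration, and the `isinstance` guard is an added assumption so that non-float keys (such as strings) do not make `math.isnan` raise.

```python
import math

nan_key = float("nan")          # the NaN object actually used as the key
d = {1: "a", nan_key: "b"}

# A different NaN object is neither identical nor equal to the key,
# so the lookup fails with KeyError.
try:
    d.pop(float("nan"))
    popped_with_new_nan = True
except KeyError:
    popped_with_new_nan = False

# Filtering with math.isnan does not depend on object identity.
# The isinstance check skips keys that are not floats.
clean = {k: v for k, v in d.items()
         if not (isinstance(k, float) and math.isnan(k))}

print(popped_with_new_nan)  # False
print(clean)                # {1: 'a'}
```

If the NaN key came from another library or from parsing a file, it is unlikely to be the same object as `math.nan`, which is why the comprehension is the safer choice in that situation.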

Sum array values with specified index in Python

I have arrays like this:
['[camera_positive,3]', '[lens_positive,1]', '[camera_positive,2]', '[lens_positive,1]', '[lens_positive,1]', '[camera_positive,1]']
How can I sum all the values at index [1] that share the same string at index [0]?
Example:
camera_positive = 3 + 2 + 1 = 6
lens_positive = 1 + 1 + 1 = 3
You could use set to extract the unique keys and then use a generator expression to compute the sum for each key:
data = [['camera_positive', 3],
        ['lens_positive', 1],
        ['camera_positive', 2],
        ['lens_positive', 1],
        ['lens_positive', 1],
        ['camera_positive', 1]]
keys = set(key for key, value in data)
for key1 in keys:
    total = sum(value for key2, value in data if key1 == key2)
    print("key='{}', sum={}".format(key1, total))
This gives:
key='camera_positive', sum=6
key='lens_positive', sum=3
I'm assuming that you have a list of lists, not a list of strings as shown in the question. Otherwise you'll have to do some parsing. That said, I would solve this problem by creating a dictionary and then iterating over the pairs, adding each value to the dictionary as you go.
A defaultdict lets this work without raising a KeyError, because with int as the factory a missing key starts at 0. You can read up on defaultdict here: https://docs.python.org/3.3/library/collections.html#collections.defaultdict
Let me know if that helps!
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> d
defaultdict(<class 'int'>, {})
>>> lst=[['a',1], ['b', 2], ['a',4]]
>>> for k, v in lst:
...     d[k] += v
...
>>> d
defaultdict(<class 'int'>, {'a': 5, 'b': 2})
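If the data really is a list of strings exactly as shown in the question, the parsing step could be sketched as follows. It assumes every item has the form `[name,number]` with a single integer after the last comma; the names `raw` and `totals` are only for illustration.

```python
from collections import defaultdict

raw = ['[camera_positive,3]', '[lens_positive,1]', '[camera_positive,2]',
       '[lens_positive,1]', '[lens_positive,1]', '[camera_positive,1]']

totals = defaultdict(int)
for item in raw:
    # Strip the surrounding brackets, then split on the last comma.
    name, count = item.strip("[]").rsplit(",", 1)
    totals[name.strip()] += int(count)

print(dict(totals))  # {'camera_positive': 6, 'lens_positive': 3}
```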
You could group the entries by their first index using groupby with lambda x: x[0] or operator.itemgetter(0) as key.
This is maybe a bit less code than what Nick Brady showed. However, you would need to sort the list by that same key first, so it might be slower than his approach.
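A minimal sketch of the groupby idea, assuming the data is a list of [key, value] pairs as in the earlier answer (the names `data_sorted` and `sums` are only for illustration):

```python
from itertools import groupby
from operator import itemgetter

data = [['camera_positive', 3], ['lens_positive', 1], ['camera_positive', 2],
        ['lens_positive', 1], ['lens_positive', 1], ['camera_positive', 1]]

# groupby only merges adjacent items, so sort by the key first.
data_sorted = sorted(data, key=itemgetter(0))
sums = {key: sum(value for _, value in group)
        for key, group in groupby(data_sorted, key=itemgetter(0))}

print(sums)  # {'camera_positive': 6, 'lens_positive': 3}
```

The sort makes this O(n log n), whereas the defaultdict loop is a single pass over the data.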

How to address a dictionary in a list of ordered dicts by unique key value?

(Using Python 2.7) The list, for example:
L = [
{'ID': 1, 'val': ['eggs']},
{'ID': 2, 'val': ['bacon']},
{'ID': 6, 'val': ['sausage']},
{'ID': 9, 'val': ['spam']}
]
This does what I want:
def getdict(records, dict_ID):
    # linear scan: return the first record whose ID matches
    for rec in records:
        if rec['ID'] == dict_ID:
            return rec
print(getdict(L, 6))
but is there a way to address that dictionary directly, without iterating over the list until you find it?
The use case: reading a file of records (ordered dicts). Different key values from records with a re-occurring ID must be merged with the record with the first occurrence of that ID.
ID numbers may occur in other key values, so if rec['ID'] in list would produce false positives.
While reading records (and adding them to the list of ordered dicts), I maintain a set of unique IDs and only call getdict if a newly read ID is already in there. Even so, it's a lot of iterations, and I wonder if there isn't a better way.
"The use case: reading a file of records (ordered dicts). Different key values from records with a re-occurring ID must be merged with the record with the first occurrence of that ID."
You need to use a defaultdict for this:
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> d['a'].append(1)
>>> d['a'].append(2)
>>> d['b'].append(3)
>>> d['c'].append(4)
>>> d['b'].append(5)
>>> print(d['a'])
[1, 2]
>>> print(d)
defaultdict(<type 'list'>, {'a': [1, 2], 'c': [4], 'b': [3, 5]})
If you want to store other objects, for example a dictionary, just pass that as the callable:
>>> d = defaultdict(dict)
>>> d['a']['values'] = []
>>> d['b']['values'] = []
>>> d['a']['values'].append('a')
>>> d['a']['values'].append('b')
>>> print(d)
defaultdict(<type 'dict'>, {'a': {'values': ['a', 'b']}, 'b': {'values': []}})
Maybe I'm missing something, but couldn't you use a single dictionary?
L = {
1 : 'eggs',
2 : 'bacon',
6 : 'sausage',
9 : 'spam'
}
Then you can do L.get(ID). This will either return the value (eggs, etc) or None if the ID isn't in the dict.
You seem to be doing an inverse dictionary lookup, that is a lookup by value instead of a key. Inverse dictionary lookup - Python has some pointers on how to do this efficiently.
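One way to avoid the repeated scan, sketched under assumptions: keep a plain dict that maps each ID to the record already stored in the list, and consult it while reading. The question does not say how records should be merged, so extending the 'val' list is an assumed rule, and `incoming`, `records`, and `index` are hypothetical names.

```python
records = []   # keeps records in first-occurrence order
index = {}     # maps ID -> the record object stored in `records`

# Hypothetical input; the last record repeats ID 2.
incoming = [
    {'ID': 1, 'val': ['eggs']},
    {'ID': 2, 'val': ['bacon']},
    {'ID': 6, 'val': ['sausage']},
    {'ID': 2, 'val': ['spam']},
]

for rec in incoming:
    first = index.get(rec['ID'])
    if first is None:
        # First time this ID is seen: store it and remember where it is.
        records.append(rec)
        index[rec['ID']] = rec
    else:
        # Assumed merge rule: extend the first record's 'val' list.
        first['val'].extend(rec['val'])

print(index[2])      # {'ID': 2, 'val': ['bacon', 'spam']}
print(len(records))  # 3
```

Because `index` holds references to the same dict objects as `records`, a merge through the index is visible in the list as well, and each lookup takes constant time on average.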

converting list to dict and averaging the values of duplicates in python

I have a list:
list = [(a,1),(b,2),(a,3)]
I want to convert it to a dict so that when there is a duplicate key (e.g. (a,1) and (a,3)), the values are averaged. The dict will then have just one key:value pair for that key, which in this case would be a:2.
from collections import defaultdict
l = [('a',1),('b',2),('a',3)]
d = defaultdict(list)
for pair in l:
    d[pair[0]].append(pair[1])  # add each number to the list under the correct key
for (k, v) in d.items():
    d[k] = sum(d[k])/len(d[k])  # replace the list for key k with the sum of its elements divided by its length
So
print(d)
>>> defaultdict(<type 'list'>, {'a': 2, 'b': 2})
Or even nicer and producing a plain dictionary in the end:
from collections import defaultdict
l = [('a',1),('b',2),('a',3)]
temp_d = defaultdict(list)
for pair in l:
    temp_d[pair[0]].append(pair[1])
#CHANGES HERE
final = dict((k,sum(v)/len(v)) for k,v in temp_d.items())
print(final)
>>>
{'a': 2, 'b': 2}
Note that if you are using 2.x (as you are), you will need to adjust the following to force float division:
(k,sum(v)/float(len(v)))
OR
sum(d[k])/float(len(d[k]))
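As an alternative sketch that avoids storing every value, you can keep a running sum and count per key. This assumes Python 3, where `/` is always true division; the names `pairs`, `totals`, and `averages` are only for illustration.

```python
from collections import defaultdict

pairs = [('a', 1), ('b', 2), ('a', 3)]

totals = defaultdict(lambda: [0, 0])   # key -> [running sum, count]
for key, value in pairs:
    totals[key][0] += value
    totals[key][1] += 1

# True division in Python 3, so the averages are floats.
averages = {key: s / n for key, (s, n) in totals.items()}
print(averages)  # {'a': 2.0, 'b': 2.0}
```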
