Finding the max repeated value in dictionary keys python - python

I have a dictionary in Python:
d = {(2,4):40,(1,2,4):8}
The keys are tuples, and each value is the count for every element of its tuple.
I need a second dictionary that gives the total count for each element across all the tuples:
out={2:48,4:48,1:8}
This example is small, but my real dictionary is very large, so time complexity matters.
Can someone help me out?

You can do this in a single pass, just iterate over the keys and add the corresponding value. You can use a collections.Counter or whatever dict/dict-like container you prefer:
>>> origin = {(2,4):40,(1,2,4):8}
>>> from collections import Counter
>>> counts = Counter()
>>> for k, v in origin.items():  # python 2 use .iteritems()
...     for x in k:
...         counts[x] += v
...
>>> counts
Counter({2: 48, 4: 48, 1: 8})

One can utilize the capability of multiple Counters to be handily summed to create some neat one-liners, but their performance can't compete with juanpa's explicit loop approach (timings for the original dict):
from collections import Counter
from operator import add
from functools import reduce
# 1
out = sum((Counter({x: v for x in k}) for k, v in d.items()), Counter())
# timeit: 16.2
# 2
out = reduce(add, (Counter({x: v for x in k}) for k, v in d.items()))
# timeit: 10.8
# 3
# juanpa's approach
# timeit: 3.7
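The timings above depend on the machine and Python version, so they are best re-measured locally. A minimal sketch of such a comparison follows; the function names (`loop_counter`, `sum_counters`, `reduce_counters`) and the iteration count are made up for illustration and are not from the answers.

```python
from collections import Counter
from functools import reduce
from operator import add
from timeit import timeit

d = {(2, 4): 40, (1, 2, 4): 8}

def loop_counter(d):
    # explicit loop: one pass over every element of every key
    counts = Counter()
    for k, v in d.items():
        for x in k:
            counts[x] += v
    return counts

def sum_counters(d):
    # builds one Counter per key, then adds them up starting from an empty Counter
    return sum((Counter({x: v for x in k}) for k, v in d.items()), Counter())

def reduce_counters(d):
    # same idea with reduce; no initial value, so it needs a non-empty dict
    return reduce(add, (Counter({x: v for x in k}) for k, v in d.items()))

for fn in (loop_counter, sum_counters, reduce_counters):
    # hypothetical iteration count; raise it for more stable numbers
    print(fn.__name__, timeit(lambda: fn(d), number=10_000))
```

Two behavioural differences are worth keeping in mind: adding Counters with `+` drops any element whose total is zero or negative, and `reduce` without an initial value raises `TypeError` on an empty dictionary. The explicit loop has neither restriction.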

Dict = {(2,4):40,(1,2,4):8}
Out={}
for k,v in Dict.items():
    for i in k:
        if i in Out:
            Out[i] += v
        else:
            Out[i] = v
print(Out)
{2: 48, 4: 48, 1: 8}

Related

How to get sum of all the keys in python list of tuples?

I am very new to python and I was wondering how to get the following dictionary from the list of tuples?
Question
x = [('A',1),('B',2),('C',3),('A',10),('B',10)]
required_dict = {'A': 11,'B': 12, 'C': 3}
Easiest way IMO is using defaultdict and a single for loop:
from collections import defaultdict
required_dict = defaultdict(int)
for k, v in x:
    required_dict[k] += v
You could also do this in a single line with a nested comprehension, but this is less efficient because it involves iterating over x repeatedly instead of doing it in a single pass:
required_dict = {k: sum(v for k1, v in x if k1 == k) for k, v in x}
Another comprehension-based solution that doesn't involve redundant iteration would be to use groupby in order to iterate only within each group of identical keys:
from itertools import groupby
required_dict = {
    k: sum(v for _, v in g)
    for k, g in groupby(sorted(x), key=lambda t: t[0])
}
These three approaches are respectively:
O(n) (single iteration)
O(n^2) (re-iteration for each element)
O(nlogn) (a full sort followed by a single iteration)
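The sort in the groupby version is not optional: `itertools.groupby` only groups consecutive items with equal keys. A small sketch of what happens without it, using the question's data (the variable names `unsorted_result` and `sorted_result` are illustrative only):

```python
from itertools import groupby
from operator import itemgetter

x = [('A', 1), ('B', 2), ('C', 3), ('A', 10), ('B', 10)]
first = itemgetter(0)

# without sorting, 'A' and 'B' each form two separate runs,
# and the later run overwrites the earlier one in the dict
unsorted_result = {k: sum(v for _, v in g) for k, g in groupby(x, key=first)}

# sorting by key first puts equal keys next to each other
sorted_result = {k: sum(v for _, v in g)
                 for k, g in groupby(sorted(x, key=first), key=first)}

print(unsorted_result)  # {'A': 10, 'B': 10, 'C': 3}
print(sorted_result)    # {'A': 11, 'B': 12, 'C': 3}
```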

Find minimum non zero value in dictionary (Python)

I have a dictionary and I would like to get the key whose value is the minimum nonzero.
E.g. given the input:
{1:0, 2:1, 3:2}
It would return 2.
You can do it in one iteration.
d = {1:0, 2:1, 3:2}
# Save the minimum value and the key that belongs to it as we go
min_val = None
result = None
for k, v in d.items():
    if v and (min_val is None or v < min_val):
        min_val = v
        result = k
print(result)
Some assumptions:
Negative values are taken into account
If several keys share the minimum, the first one encountered is returned
If it helps, min_val holds the minimum value
You can use the fact 0 is considered False to filter out 0 values. Then use next with a generator expression:
d = {1:0, 2:1, 3:2}
val = min(filter(None, d.values()))
res = next(k for k, v in d.items() if v == val) # 2
This returns only one key even if several keys share the minimum value (1 in this example). For multiple matches, you can use a list comprehension:
res = [k for k, v in d.items() if v == val]
Note your literal ask for "minimum non-zero" will include negative values.
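As an alternative sketch (not taken from any of the answers here), the filter and the minimum can be combined in a single `min` call by ranking the surviving keys with `key=d.get`. The function name `min_nonzero_key` and the choice to return `None` when every value is zero are assumptions made for illustration.

```python
def min_nonzero_key(d):
    """Return the key with the smallest non-zero value, or None if there is none."""
    # the generator skips falsy values (0, 0.0); key=d.get ranks the remaining keys by value
    return min((k for k, v in d.items() if v), key=d.get, default=None)

print(min_nonzero_key({1: 0, 2: 1, 3: 2}))  # 2
print(min_nonzero_key({1: 0}))              # None
```

When several keys share the minimum, `min` returns the first one in iteration order, and negative values are still considered, as in the answers above.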
Performance note
The above solution makes two passes but is still O(n), and no solution can have lower complexity than that. A one-pass O(n) solution is possible, as shown by @Maor, but it isn't necessarily more efficient:
# Python 3.6.0
%timeit jpp(d) # 43.9 ms per loop
%timeit mao(d) # 98.8 ms per loop
%timeit jon(d) # 183 ms per loop
%timeit reu(d) # 303 ms per loop
Code used for benchmarking:
from operator import itemgetter
from random import randint
n = 10**6
d = {i: randint(0, 9) for i in range(n)}
def jpp(d):
    val = min(filter(None, d.values()))
    return next(k for k, v in d.items() if v == val)
def mao(d):
    min_val = None
    result = None
    for k, v in d.items():
        if v and (min_val is None or v < min_val):
            min_val = v
            result = k
    return result
def jon(d):
    # as posted, this returns the smallest key that has a non-zero value
    return min({i for i in d if d[i] != 0})
def reu(d):
    no_zeros = {k: v for k, v in d.items() if v != 0}
    key, val = min(no_zeros.items(), key=itemgetter(1))
    return key
Assuming the dict is named a:
from operator import itemgetter
a = {1:0, 2:1, 3:2}
# remove zeros
no_zeros = {k: v for k, v in a.items() if v != 0} # can use `if v`
# find minimal key and value (by value)
key, val = min(no_zeros.items(), key=itemgetter(1))
# key = 2, val = 1
print(min((k for k in dictionary if dictionary[k] != 0), key=dictionary.get))
This skips the keys whose value is zero and returns the remaining key with the smallest value. It does an extra dictionary lookup per key on top of the iteration, so it is likely slower than Maor Refaeli's solution.
Solution
some_dict = {1:0, 2:1, 3:2}
compare = []
for k, v in some_dict.items():
    if k != 0:
        compare.append(k)
x = min(compare)
print(x)
I just appended all the non-zero keys to a list (compare) and then applied min(compare).
Note that this filters on the keys rather than the values: x is 1, the smallest non-zero key, and its value is 0, so it is not the key with the minimum non-zero value:
>>> print(some_dict[x])
0

compare a list with values in dictionary

I have a dictionary that contains lists of values, and a list:
dict1={'first':['hi','nice'], 'second':['night','moon']}
list1= [ 'nice','moon','hi']
I want to compare the values in the dictionary with list1 and, for each key, count how many of its values appear in the list.
The output should look like this:
first 2
second 1
here is my code:
count = 0
for list_item in list1:
    for dict_v in dict1.values():
        if list_item.split() == dict_v:
            count+= 1
            print(dict.keys,count)
any help? Thanks in advance
I would make a set out of list1 for the O(1) lookup time and access to the intersection method. Then employ a dict comprehension.
>>> dict1={'first':['hi','nice'], 'second':['night','moon']}
>>> list1= [ 'nice','moon','hi']
>>>
>>> set1 = set(list1)
>>> {k:len(set1.intersection(v)) for k, v in dict1.items()}
{'first': 2, 'second': 1}
intersection accepts any iterable argument, so creating sets from the values of dict1 is not necessary.
You can use the following dict comprehension:
{k: sum(1 for i in l if i in list1) for k, l in dict1.items()}
Given your sample input, this returns:
{'first': 2, 'second': 1}
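The set intersection and the per-item count give the same result for the sample input, but they differ once a value list contains duplicates. A small sketch with a modified input ('hi' is repeated under 'first', which is not in the original question) shows the difference; `wanted`, `distinct`, and `occurrences` are illustrative names.

```python
dict1 = {'first': ['hi', 'nice', 'hi'], 'second': ['night', 'moon']}
list1 = ['nice', 'moon', 'hi']
wanted = set(list1)  # O(1) membership tests

# distinct matches: duplicates in the value list collapse
distinct = {k: len(wanted.intersection(v)) for k, v in dict1.items()}

# every occurrence counts: 'hi' appears twice under 'first'
occurrences = {k: sum(item in wanted for item in v) for k, v in dict1.items()}

print(distinct)     # {'first': 2, 'second': 1}
print(occurrences)  # {'first': 3, 'second': 1}
```

Which of the two is right depends on whether repeated values should be counted once or every time.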
You can get the intersection of your list and the values of dict1 using sets:
for key in dict1.keys():
    count = len(set(dict1[key]) & set(list1))
    print("{0}: {1}".format(key,count))
While brevity can be great, I thought it would be good to also provide an example that is as close to the OP's original code as possible:
# notice conversion to set for O(1) lookup
# instead of O(n) lookup where n is the size of the list of desired items
dict1={'first':['hi','nice'], 'second':['night','moon']}
set1= set([ 'nice','moon','hi'])
for key, values in dict1.items():
    counter = 0
    for val in values:
        if val in set1:
            counter += 1
    print(key, counter)
Using collections.Counter
from collections import Counter
c = Counter(k for k in dict1 for i in list1 if i in dict1[k])
# Counter({'first': 2, 'second': 1})
The simplest and most basic approach would be:
dict1={'first':['hi','nice'], 'second':['night','moon']}
list1= [ 'nice','moon','hi']
listkeys=list(dict1.keys())
listvalues=list(dict1.values())
for i in range(0,len(listvalues)):
    ctr=0
    for j in range(0,len(listvalues[i])):
        for k in range(0,len(list1)):
            if list1[k]==listvalues[i][j]:
                ctr+=1
    print(listkeys[i],ctr)
Hope it helps.

Combine python dictionaries that share values and keys

I am doing some entity matching based on string edit distance and my results are a dictionary with keys (query string) and values [list of similar strings] based on some scoring criteria.
for example:
results = {
'ben' : ['benj', 'benjamin', 'benyamin'],
'benj': ['ben', 'beny', 'benjamin'],
'benjamin': ['benyamin'],
'benyamin': ['benjamin'],
'carl': ['karl'],
'karl': ['carl'],
}
Each value also has a corresponding dictionary item, for which it is the key (e.g. 'carl' and 'karl').
I need to combine the elements that have shared values, choosing one value as the new key (let's say the longest string). In the above example I would hope to get:
results = {
'benjamin': ['ben', 'benj', 'benyamin', 'beny', 'benjamin', 'benyamin'],
'carl': ['carl','karl']
}
I have tried iterating through the dictionary using the keys, but I can't wrap my head around how to iterate and compare through each dictionary item and its list of values (or single value).
This is one solution using collections.defaultdict and sets.
The desired output is very similar to what you have, and can be easily manipulated to align.
from collections import defaultdict
results = {
'ben' : ['benj', 'benjamin', 'benyamin'],
'benj': ['ben', 'beny', 'benjamin'],
'benjamin': 'benyamin',
'benyamin': 'benjamin',
'carl': 'karl',
'karl': 'carl',
}
d = defaultdict(set)
for i, (k, v) in enumerate(results.items()):
    w = {k} | (set(v) if isinstance(v, list) else {v})
    for m, n in d.items():
        if not n.isdisjoint(w):
            d[m].update(w)
            break
    else:
        d[i] = w
result = {max(v, key=len): v for k, v in d.items()}
# {'benjamin': {'ben', 'benj', 'benjamin', 'beny', 'benyamin'},
# 'carl': {'carl', 'karl'}}
Credit to @IMCoins for the idea of manipulating v to w in the second loop.
Explanation
There are 3 main steps:
Convert values into a consistent set format, including keys and values from original dictionary.
Cycle through this dictionary and add values to a new dictionary. If there is an intersection with some key [i.e. sets are not disjoint], then use that key. Otherwise, add to new key determined via enumeration.
Create result dictionary in a final transformation by mapping max length key to values.
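One limitation of the steps above: each entry is merged into the first existing group it overlaps, so an entry that bridges two groups already created leaves them separate. If that case matters, a union-find (disjoint-set) structure handles it. The sketch below is a different technique from the answer's, offered only as an illustration; `merge_groups` is a made-up name, every value is assumed to be a list of strings as in the question, and ties for the longest name are broken alphabetically.

```python
from collections import defaultdict

def merge_groups(results):
    parent = {}  # maps each name to its parent in the union-find forest

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for key, values in results.items():
        find(key)  # register the key even if its list is empty
        for value in values:
            union(key, value)

    # collect every name under the root of its group
    groups = defaultdict(set)
    for name in parent:
        groups[find(name)].add(name)

    # label each group by its longest member (alphabetical order breaks ties)
    return {max(sorted(g), key=len): g for g in groups.values()}

results = {
    'ben': ['benj', 'benjamin', 'benyamin'],
    'benj': ['ben', 'beny', 'benjamin'],
    'benjamin': ['benyamin'],
    'benyamin': ['benjamin'],
    'carl': ['karl'],
    'karl': ['carl'],
}
merged = merge_groups(results)
print(merged)
```

For the sample data this yields one group labelled 'benjamin' and one labelled 'carl'. Unlike the question's desired output, each group is a set, so it contains no repeated names.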
EDIT : Even though performance was not the question here, I took the liberty to perform some tests between jpp's answer, and mine... here is the full script. My script performs the tests in 17.79 seconds, and his in 23.5 seconds.
import timeit
results = {
'ben' : ['benj', 'benjamin', 'benyamin'],
'benj': ['ben', 'beny', 'benjamin'],
'benjamin': ['benyamin'],
'benyamin': ['benjamin'],
'carl': ['karl'],
'karl': ['carl'],
}
def imcoins(results):
    new_dict = {}
    # .iteritems() for python 2
    for k, v in results.items():
        flag = False
        # Checking if key exists...
        if k not in new_dict.keys():
            # But then, we also need to check its values.
            for item in v:
                if item in new_dict.keys():
                    # If we update, set the flag to True, so we don't create a new value.
                    new_dict[item].update(v)
                    flag = True
            if flag == False:
                new_dict[k] = set(v)
    # Now, to sort our newly created dict...
    sorted_dict = {}
    for k, v in new_dict.items():
        max_string = max(v)
        if len(max_string) > len(k):
            sorted_dict[max(v, key=len)] = set(v)
        else:
            sorted_dict[k] = v
    return sorted_dict
def jpp(results):
    from collections import defaultdict
    res = {i: {k} | (set(v) if isinstance(v, list) else {v})
           for i, (k, v) in enumerate(results.items())}
    d = defaultdict(set)
    for i, (k, v) in enumerate(res.items()):
        for m, n in d.items():
            if n & v:
                d[m].update(v)
                break
        else:
            d[i] = v
    result = {max(v, key=len): v for k, v in d.items()}
    return result
iterations = 1000000
time1 = timeit.timeit(stmt='imcoins(results)', setup='from __main__ import imcoins, results', number=iterations)
time2 = timeit.timeit(stmt='jpp(results)', setup='from __main__ import jpp, results', number=iterations)
print(time1) # Outputs : 17.7903265883
print(time2) # Outputs : 23.5605850732
If I move the import from his function to global scope, it gives...
imcoins : 13.4129249463 seconds
jpp : 21.8191823393 seconds

Python: get key with the least value from a dictionary BUT multiple minimum values

I'm trying to do the same as
Get the key corresponding to the minimum value within a dictionary, where we want to get the key corresponding to the minimum value in a dictionary.
The best way appears to be:
min(d, key=d.get)
BUT I want to apply this on a dictionary with multiple minimum values:
d = {'a' : 1, 'b' : 2, 'c' : 1}
Note that the answer from the above would be:
>>> min(d, key=d.get)
'a'
However, I need both keys that have the minimum value, namely a and c.
What would be the best approach?
(Ultimately I want to pick one of the two at random, but I don't think this is relevant).
One simple option is to first determine the minimum value, and then select all keys mapping to that minimum:
min_value = min(d.itervalues())
min_keys = [k for k in d if d[k] == min_value]
For Python 3 use d.values() instead of d.itervalues().
This needs two passes through the dictionary, but should be one of the fastest options to do this anyway.
Using reservoir sampling, you can implement a single pass approach that selects one of the items at random:
import random
it = iter(d.items())  # Python 2: d.iteritems()
min_key, min_value = next(it)
num_mins = 1
for k, v in it:
    if v < min_value:
        num_mins = 1
        min_key, min_value = k, v
    elif v == min_value:
        num_mins += 1
        # keep the new key with probability 1/num_mins
        if random.randrange(num_mins) == 0:
            min_key = k
After writing down this code, I think this option is of rather theoretical interest… :)
EDITED: Now using setdefault as suggested :)
I don't know if that helps you, but you could build a reverse dictionary with the values as keys and lists of the original keys as values.
d = {'a' : 1, 'b' : 2, 'c' : 1}
d2 = {}
for k, v in d.items():  # Python 2: d.iteritems()
    d2.setdefault(v, []).append(k)
print(d2[min(d2)])
It will print this:
['a', 'c']
However, I think the other solutions are more compact and probably more elegant...
min_keys = [k for k in d if all(d[m] >= d[k] for m in d)]
or, slightly optimized
min_keys = [k for k, x in d.items() if not any(y < x for y in d.values())]
It's not as efficient as other solutions, but demonstrates the beauty of python (well, to me at least).
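import random
def get_rand_min(d):
    min_val = min(d.values())
    # a list, because random.choice needs a sequence and Python 3's filter returns an iterator
    min_keys = [k for k in d if d[k] == min_val]
    return random.choice(min_keys)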
You can use heapq.nsmallest to get the N smallest members of the dict, then filter out all that are not equal to the lowest one. This assumes you know an upper bound N on how many members can share the minimum. Something like:
from heapq import nsmallest
from operator import itemgetter
# get the N smallest (key, value) pairs, ordered by value
smallestN = nsmallest(N, myDict.items(), key=itemgetter(1))
# keep only the ones whose value equals the smallest one
smallest = [x for x in smallestN if x[1] == smallestN[0][1]]
minValue,minKey = min((v,k) for k,v in d.items())
Due to your semantics you need to go through the entire dictionary at least once. This will retrieve exactly 1 minimum element.
If you want all the minimum items in O(log(N)) query time, you can insert your elements into a priority queue as you generate them (if you can). The priority queue must have O(1) insertion time and O(log(N)) extract-min time. (This will be as bad as sorting if all your elements have the same value, but otherwise may work quite well.)
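For illustration, here is a sketch of the priority-queue idea using the standard library's `heapq`. It is a binary heap, so building it with `heapify` is O(n) and each push or pop is O(log n); it does not offer the O(1) insertion mentioned above. The variable names are illustrative.

```python
import heapq

d = {'a': 1, 'b': 2, 'c': 1}

# store (value, key) pairs so the heap orders by value first
heap = [(v, k) for k, v in d.items()]
heapq.heapify(heap)

# pop entries for as long as they carry the minimum value
min_value = heap[0][0]
min_keys = []
while heap and heap[0][0] == min_value:
    min_keys.append(heapq.heappop(heap)[1])

print(min_value, min_keys)  # 1 ['a', 'c']
```

Because ties on the value fall back to comparing the keys, the keys must be mutually comparable for this to work.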
One pass solution would be:
>>> result = [100000, []]
>>> for key, val in d.items():
...     if val < result[0]:
...         result[1] = [key]; result[0]=val;
...     elif val == result[0]:
...         result[1].append(key)
...
>>> result
[1, ['a', 'c']]
Here's another way to do it in one pass:
d = {'foo': 2, 'a' : 1, 'b' : 2, 'c' : 1, 'z': 99, 'x': 1}
current_min = next(iter(d.values()))  # Python 2: d[d.keys()[0]]
min_keys = []
for k, v in d.items():
    if v < current_min:
        current_min = v
        min_keys = [k]
    elif v == current_min:
        min_keys.append(k)
print(min_keys)
['a', 'c', 'x']
This works:
d = {'a' :1, 'b' : 2, 'c' : 1}
min_value = min(d.values())
result = [x[0] for x in d.items() if x[1] == min_value]
Hmpf. After fixing up the code to work, I ended up with @Sven Marnach's answer, so, disregard this ;)
