Finding the dictionary keys whose values are numerically highest - python

Given a Python dict of the form:
dict = {'Alice': 2341, 'Beth': 9102, 'Cecil': 3258, ......}
Is there an easy way to print the first x keys with the highest numeric values? That is, say:
Beth 9102
Cecil 3258
Currently this is my attempt:
max = 0
max_word = ""
for key, value in w.word_counts.iteritems():
if value > max:
if key not in stop_words:
max = value
max_word = key
print max_word

I'd simply sort the items by the second value and then pick the first K elements :
d_items = sorted(d.items(), key=lambda x: -x[1])
print d_items[:2]
[('Beth', 9102), ('Cecil', 3258)]
The complexity of this approach is O(N log N + K), not that different from optimal O(N + K log K) (using QuickSelect and sorting just the first K elements).

Using collections.Counter.most_common:
>>> from collections import Counter
>>> d = {'Alice': 2341, 'Beth': 9102, 'Cecil': 3258}
>>> c = Counter(d)
>>> c.most_common(2)
[('Beth', 9102), ('Cecil', 3258)]
It uses sorted (O(n*log n)), or heapq.nlargest(k) that might be faster than sorted if k << n, or max() if k==1.

>>> (sorted(dict.items(), key=lambda x:x[1]))[:2]
[('Alice', 2341), ('Cecil', 3258)]

items = sorted(w.word_counts.items(), lambda x, y: cmp(x[1], y[1]), None, True)
items[:5]
Replace 5 with the number of elements you want to get.

d = {'Alice': 2341, 'Beth': 9102, 'Cecil': 3258}
vs = sorted(d, key=d.get,reverse=True)
l = [(x,d.get(x)) for x in vs[0:2]]
n [4]: l
Out[4]: [('Beth', 9102), ('Cecil', 3258)]

Convert dict to list of tuples [(2341, 'Alice'), ...] then sort it (without key=lambda ...).

Related

Taking two values from two list (Random Order) of tuples and multiplying

I have two lists and they are lists of tuples.
For example
List1 = [('zaidan', 0.0013568521031207597),('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279)]
List2 = [('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279), ('zaidan', 0.0013568521031207597)]
If the items were in the same order I could use the following code to multiply the two values:
val = [(t1, v1*v2) for (t1, v1), (t2, v2) in zip(tf,idf)]
But my issue is the order of one the lists outputs randomly so the code doesn't work. So essentially I need to see if the word in one list matches the word in the other and then multiply to get an output in a similar way as the list of tuples.
This question excellently demonstrates the advantages of the dictionary data structure and how your problem could benefit from it. So first, we convert your list of tuples to dictionaries (dict-calls) and then you "combine" the two dicts as per your requirement to get the desired result.
lst1 = [('zaidan', 0.0013568521031207597),('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279)]
lst2 = [('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279), ('zaidan', 0.0013568521031207597)]
dct1 = dict(lst1)
dct2 = dict(lst2)
res = {k: v * dct2.get(k, 1) for k, v in dct1.items()}.items()
which produces:
dict_items([('zaidan', 1.8410476297432288e-06), ('zimmerman', 1.8410476297432288e-06), ('ypa', 1.656942866768906e-05)])
And if the dict_item data type is confusing, you can always cast it to a vanilla-list.
res = list(res)
print(res)
# [('zaidan', 1.8410476297432288e-06), ('zimmerman', 1.8410476297432288e-06), ('ypa', 1.656942866768906e-05)]
i would tell you the easiest solution if your data are the same.
just sort it :
ls1 = sorted(ls1, key=lambda tup: tup[0])
ls2 = sorted(ls2, key=lambda tup: tup[0])
val = [(t1, v1*v2) for (t1, v1), (t2, v2) in zip(ls1,ls2)]
If, for any reason, you do not want to use dictionary (although it is a superior solution) but want to do this with lists and tuples, what you are looking for is looping through the lists and checking for equality:
x = [('zaidan', 0.0013568521031207597),('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279)]
y = [('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279), ('zaidan', 0.0013568521031207597)]
z = []
for item in x:
for _item in y:
if item[0] == _item[0]
z.append((item[0], item[1]*_item[1]))
At the end, z will be a list of tuples with the original string at the 0 index and the result of multiplication at the 1 index.

Multiple dynamic sort values in lambda

I can sort an array by mapping. But how can i build the lambda for a unknown count of dynamic maps in s?
l = ['1','2','3']
s = [{ '3':'a', '2':'a', '1':'c'},{ '3':'z', '2':'a', '1':'b'},{ '1':34, '2':123, '3':1000}]
sorted(l, key=lambda x: (s[0][x], s[1][x], s[2][x]))
yes, why hardcoding this when you can iterate on s and create the key using list comprehension:
sorted(l, key=lambda x: [p[x] for p in s])
(note that now key is a list, no longer a tuple but that doesn't matter since they share the same ordering)
You can sort in an arbitrary number of parameters using comparator like so:
from functools import cmp_to_key
l = ['1','2','3']
s = [{ '3':'a', '2':'a', '1':'c'},{ '3':'z', '2':'a', '1':'b'},{ '1':34, '2':123, '3':1000}]
def compare(x,y):
for dc in s:
if dc[x] < dc[y]:
return -1
elif dc[x] > dc[y]:
return 1
return 0
sorted(l, key= cmp_to_key(compare))
There's no need to use lambda function for this one.

how to apply a groupby on list of tuples in python?

In my function I will create different tuples and add to an empty list :
tup = (pattern,matchedsen)
matchedtuples.append(tup)
The patterns have format of regular expressions. I am looking for apply groupby() on matchedtuples in following way:
For example :
matchedtuples = [(p1, s1) , (p1,s2) , (p2, s5)]
And I am looking for this result:
result = [ (p1,(s1,s2)) , (p2, s5)]
So, in this way I will have groups of sentences with the same pattern. How can I do this?
My answer for your question will work for any input structure you will use and print the same output as you gave. And i will use only groupby from itertools module:
# Let's suppose your input is something like this
a = [("p1", "s1"), ("p1", "s2"), ("p2", "s5")]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
b = tuple(values)
if len(b) >= 2:
result.append((key, tuple(j[1] for j in b)))
else:
result.append(tuple(j for j in b)[0])
print(result)
Output:
[('p1', ('s1', 's2')), ('p2', 's5')]
The same solution work if you add more values to your input:
# When you add more values to your input
a = [("p1", "s1"), ("p1", "s2"), ("p2", "s5"), ("p2", "s6"), ("p3", "s7")]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
b = tuple(values)
if len(b) >= 2:
result.append((key, tuple(j[1] for j in b)))
else:
result.append(tuple(j for j in b)[0])
print(result)
Output:
[('p1', ('s1', 's2')), ('p2', ('s5', 's6')), ('p3', 's7')]
Now, if you modify your input structure:
# Let's suppose your modified input is something like this
a = [(["p1"], ["s1"]), (["p1"], ["s2"]), (["p2"], ["s5"])]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
b = tuple(values)
if len(b) >= 2:
result.append((key, tuple(j[1] for j in b)))
else:
result.append(tuple(j for j in b)[0])
print(result)
Output:
[(['p1'], (['s1'], ['s2'])), (['p2'], ['s5'])]
Also, the same solution work if you add more values to your new input structure:
# When you add more values to your new input
a = [(["p1"], ["s1"]), (["p1"], ["s2"]), (["p2"], ["s5"]), (["p2"], ["s6"]), (["p3"], ["s7"])]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
b = tuple(values)
if len(b) >= 2:
result.append((key, tuple(j[1] for j in b)))
else:
result.append(tuple(j for j in b)[0])
print(result)
Output:
[(['p1'], (['s1'], ['s2'])), (['p2'], (['s5'], ['s6'])), (['p3'], ['s7'])]
Ps: Test this code and if it breaks with any other kind of inputs please let me know.
If you require the output you present, you'll need to manually loop through the grouping of matchedtuples and build your list.
First, of course, if the matchedtuples list isn't sorted, sort it with itemgetter:
from operator import itemgetter as itmg
li = sorted(matchedtuples, key=itmg(0))
Then, loop through the result supplied by groupby and append to the list r based on the size of the group:
r = []
for i, j in groupby(matchedtuples, key=itmg(0)):
j = list(j)
ap = (i, j[0][1]) if len(j) == 1 else (i, tuple(s[1] for s in j))
r.append(ap)

Finding the minimum value for different variables

If i am doing some math functions for different variables for example:
a = x - y
b = x**2 - y**2
c = (x-y)**2
d = x + y
How can i find the minimum value out of all the variables. For example:
a = 4
b = 7
c = 3
d = 10
So the minimum value is 3 for c. How can i let my program do this.
What have i thought so far:
make a list
append a,b,c,d in the list
sort the list
print list[0] as it will be the smallest value.
The problem is if i append a,b,c,d to a list i have to do something like:
lst.append((a,b,c,d))
This makes the list to be -
[(4,7,3,10)]
making all the values relating to one index only ( lst[0] )
If possible is there any substitute to do this or any way possible as to how can i find the minimum!
LNG - PYTHON
Thank you
You can find the index of the smallest item like this
>>> L = [4,7,3,10]
>>> min(range(len(L)), key=L.__getitem__)
2
Now you know the index, you can get the actual item too. eg: L[2]
Another way which finds the answer in the form(index, item)
>>> min(enumerate(L), key=lambda x:x[1])
(2, 3)
I think you may be going the wrong way to solving your problem, but it's possible to pull values of variable from the local namespace if you know their names. eg.
>>> a = 4
>>> b = 7
>>> c = 3
>>> d = 10
>>> min(enumerate(['a', 'b', 'c', 'd']), key=lambda x, ns=locals(): ns[x[1]])
(2, 'c')
a better way is to use a dict, so you are not filling your working namespace with these "junk" variables
>>> D = {}
>>> D['a'] = 4
>>> D['b'] = 7
>>> D['c'] = 3
>>> D['d'] = 10
>>> min(D, key=D.get)
'c'
>>> min(D.items(), key=lambda x:x[1])
('c', 3)
You can see that when the correct data structure is used, the amount of code required is much less.
If you store the numbers in an list you can use a reduce having a O(n) complexity due the list is not sorted.
numbers = [999, 1111, 222, -1111]
minimum = reduce(lambda mn, candidate: candidate if candidate < mn else mn, numbers[1:], numbers[0])
pack as dictionary, find min value and then find keys that have matching values (possibly more than one minimum)
D = dict(a = 4, b = 7, c = 3, d = 10)
min_val = min(D.values())
for k,v in D.items():
if v == min_val: print(k)
The buiit-in function min will do the trick. In your example, min(a,b,c,d) will yield 3.

Finding index of values in a list dynamically

I am having two lists as follows:
list_1
['A-1','A-1','A-1','A-2','A-2','A-3']
list_2
['iPad','iPod','iPhone','Windows','X-box','Kindle']
I would like to split the list_2 based on the index values in list_1. For instance,
list_a1
['iPad','iPod','iPhone']
list_a2
['Windows','X-box']
list_a3
['Kindle']
I know index method, but it needs the value to be matched to be passed along with. In this case, I would like to dynamically find the indexes of the values in list_1 with the same value. Is this possible? Any tips/hints would be deeply appreciated.
Thanks.
There are a few ways to do this.
I'd do it by using zip and groupby.
First:
>>> list(zip(list_1, list_2))
[('A-1', 'iPad'),
('A-1', 'iPod'),
('A-1', 'iPhone'),
('A-2', 'Windows'),
('A-2', 'X-box'),
('A-3', 'Kindle')]
Now:
>>> import itertools, operator
>>> [(key, list(group)) for key, group in
... itertools.groupby(zip(list_1, list_2), operator.itemgetter(0))]
[('A-1', [('A-1', 'iPad'), ('A-1', 'iPod'), ('A-1', 'iPhone')]),
('A-2', [('A-2', 'Windows'), ('A-2', 'X-box')]),
('A-3', [('A-3', 'Kindle')])]
So, you just want each group, ignoring the key, and you only want the second element of each element in the group. You can get the second element of each group with another comprehension, or just by unzipping:
>>> [list(zip(*group))[1] for key, group in
... itertools.groupby(zip(list_1, list_2), operator.itemgetter(0))]
[('iPad', 'iPod', 'iPhone'), ('Windows', 'X-box'), ('Kindle',)]
I would personally find this more readable as a sequence of separate iterator transformations than as one long expression. Taken to the extreme:
>>> ziplists = zip(list_1, list_2)
>>> pairs = itertools.groupby(ziplists, operator.itemgetter(0))
>>> groups = (group for key, group in pairs)
>>> values = (zip(*group)[1] for group in groups)
>>> [list(value) for value in values]
… but a happy medium of maybe 2 or 3 lines is usually better than either extreme.
Usually I'm the one rushing to a groupby solution ;^) but here I'll go the other way and manually insert into an OrderedDict:
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
from collections import OrderedDict
d = OrderedDict()
for code, product in zip(list_1, list_2):
d.setdefault(code, []).append(product)
produces a d looking like
>>> d
OrderedDict([('A-1', ['iPad', 'iPod', 'iPhone']),
('A-2', ['Windows', 'X-box']), ('A-3', ['Kindle'])])
with easy access:
>>> d["A-2"]
['Windows', 'X-box']
and we can get the list-of-lists in list_1 order using .values():
>>> d.values()
[['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
If you've noticed that no one is telling you how to make a bunch of independent lists with names like list_a1 and so on-- that's because that's a bad idea. You want to keep the data together in something which you can (at a minimum) iterate over easily, and both dictionaries and list of lists qualify.
Maybe something like this?
#!/usr/local/cpython-3.3/bin/python
import pprint
import collections
def main():
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
result = collections.defaultdict(list)
for list_1_element, list_2_element in zip(list_1, list_2):
result[list_1_element].append(list_2_element)
pprint.pprint(result)
main()
Using itertools.izip_longest and itertools.groupby:
>>> from itertools import groupby, izip_longest
>>> inds = [next(g)[0] for k, g in groupby(enumerate(list_1), key=lambda x:x[1])]
First group items of list_1 and find the starting index of each group:
>>> inds
[0, 3, 5]
Now use slicing and izip_longest as we need pairs list_2[0:3], list_2[3:5], list_2[5:]:
>>> [list_2[x:y] for x, y in izip_longest(inds, inds[1:])]
[['iPad', 'iPod', 'iPhone'], ['Windows', 'X-box'], ['Kindle']]
To get a list of dicts you can something like:
>>> inds = [next(g) for k, g in groupby(enumerate(list_1), key=lambda x:x[1])]
>>> {k: list_2[ind1: ind2[0]] for (ind1, k), ind2 in
zip_longest(inds, inds[1:], fillvalue=[None])}
{'A-1': ['iPad', 'iPod', 'iPhone'], 'A-3': ['Kindle'], 'A-2': ['Windows', 'X-box']}
You could do this if you want simple code, it's not pretty, but gets the job done.
list_1 = ['A-1','A-1','A-1','A-2','A-2','A-3']
list_2 = ['iPad','iPod','iPhone','Windows','X-box','Kindle']
list_1a = []
list_1b = []
list_1c = []
place = 0
for i in list_1[::1]:
if list_1[place] == 'A-1':
list_1a.append(list_2[place])
elif list_1[place] == 'A-2':
list_1b.append(list_2[place])
else:
list_1c.append(list_2[place])
place += 1

Categories