Python: Ranking a Dictionary and Returning the Rank

I have a python dictionary:
x = {'a':10.1,'b':2,'c':5}
How do I go about ranking and returning the rank value? Like getting back:
res = {'a':1,'c':2,'b':3}
Thanks
Edit:
I am not trying to sort, as that can be done with Python's sorted function. I was thinking more about getting the rank values from highest to lowest, i.e. replacing each dictionary value with its position after sorting, where 1 means highest and 3 means lowest.

If I understand correctly, you can simply use sorted to get the ordering, and then enumerate to number them:
>>> x = {'a':10.1, 'b':2, 'c':5}
>>> sorted(x, key=x.get, reverse=True)
['a', 'c', 'b']
>>> {key: rank for rank, key in enumerate(sorted(x, key=x.get, reverse=True), 1)}
{'b': 3, 'c': 2, 'a': 1}
Note that this assumes that the ranks are unambiguous. If you have ties, the rank order among the tied keys will be arbitrary. It's easy to handle that too using similar methods, for example if you wanted all the tied keys to have the same rank. We have
>>> x = {'a':10.1, 'b':2, 'c': 5, 'd': 5}
>>> {key: rank for rank, key in enumerate(sorted(x, key=x.get, reverse=True), 1)}
{'a': 1, 'b': 4, 'd': 3, 'c': 2}
but
>>> r = {key: rank for rank, key in enumerate(sorted(set(x.values()), reverse=True), 1)}
>>> {k: r[v] for k,v in x.items()}
{'a': 1, 'b': 3, 'd': 2, 'c': 2}

Using scipy.stats.rankdata:
In [55]: from scipy.stats import rankdata
In [56]: x = {'a':10.1, 'b':2, 'c': 5, 'd': 5}
In [57]: dict(zip(x.keys(), rankdata([-i for i in x.values()], method='min')))
Out[57]: {'a': 1, 'b': 4, 'c': 2, 'd': 2}
In [58]: dict(zip(x.keys(), rankdata([-i for i in x.values()], method='max')))
Out[58]: {'a': 1, 'b': 4, 'c': 3, 'd': 3}
@beta, @DSM: scipy.stats.rankdata also has other tie-handling methods ('average', 'dense' and 'ordinal') that may be more appropriate for what you want to do with ties.
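If you want the same tie handling as rankdata(..., method='min') on negated values, but without installing SciPy, a plain-Python sketch could look like this. The function name rank_desc_min is hypothetical, and the values are assumed to be hashable numbers:

```python
def rank_desc_min(d):
    """Rank values from highest (1) to lowest; tied values share the smallest rank."""
    ordered = sorted(d.values(), reverse=True)
    first_pos = {}
    for pos, value in enumerate(ordered, 1):
        # setdefault keeps the first (i.e. smallest) position seen for each value
        first_pos.setdefault(value, pos)
    return {key: first_pos[value] for key, value in d.items()}


x = {'a': 10.1, 'b': 2, 'c': 5, 'd': 5}
print(rank_desc_min(x))  # {'a': 1, 'b': 4, 'c': 2, 'd': 2}
```

Because both tied keys look up the same entry in first_pos, 'c' and 'd' get rank 2 and the next value skips to rank 4, matching Out[57] above.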

First sort by value in the dict, then assign ranks. Make sure you sort in reverse (highest first), and then recreate the dict with the ranks.
Adapted from the previous answer:
import operator

x = {'a': 10.1, 'b': 2, 'c': 5}
# sort the (key, value) pairs by value, highest first
sorted_x = sorted(x.items(), key=operator.itemgetter(1), reverse=True)
out_dict = {}
for idx, (key, _) in enumerate(sorted_x):
    out_dict[key] = idx + 1  # ranks start at 1
print(out_dict)

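One way would be to repeatedly find the key with the largest value, remove it, and record its rank in a new dictionary. Note that this empties the dictionary it works on and takes O(n²) time:
x = {'a': 10.1, 'b': 2, 'c': 5}
my_dict = dict(x)  # work on a copy, because the loop empties my_dict
i = 1
new_dict = {}
while len(my_dict) > 0:
    my_biggest_key = max(my_dict, key=my_dict.get)
    new_dict[my_biggest_key] = i
    my_dict.pop(my_biggest_key)  # remove it so the next max() finds the runner-up
    i += 1
print(new_dict)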

Rank the keys by value, then store them in rank order with an OrderedDict:
In [23]: from collections import OrderedDict
In [24]: from operator import itemgetter
In [25]: mydict = dict((j, i) for i, j in enumerate(sorted(x, key=x.get, reverse=True), 1))
In [26]: sorted_dict = sorted(mydict.items(), key=itemgetter(1))
In [27]: sorted_dict
Out[27]: [('a', 1), ('c', 2), ('b', 3)]
In [28]: OrderedDict(sorted_dict)
Out[28]: OrderedDict([('a', 1), ('c', 2), ('b', 3)])

You could do it like this:
>>> x = {'a':10.1,'b':2,'c':5}
>>> m = {}
>>> k = 0
>>> for i in dict(sorted(x.items(), key=lambda k: k[1], reverse=True)):
...     k += 1
...     m[i] = k
...
>>> m
{'a': 1, 'c': 2, 'b': 3}

A fairly compact, if slightly dense, one-liner, where d is your dictionary:
{key[0]: 1 + value for value, key in enumerate(
    sorted(d.items(),
           key=lambda x: x[1],
           reverse=True))}
Let me walk you through it.
We use enumerate to number the elements, starting from zero. enumerate(d.items()) yields tuples that contain an integer followed by a (key, value) pair from the original dictionary.
Before enumerating, we sort the items so that they run from highest to lowest value.
In the comprehension, the name value is bound to the enumeration index (so 'a', the largest, gets 0 before normalization), and the name key is bound to the (key, value) tuple from the dictionary. That is why the loop variables are written in swapped order, value, key.
The actual dictionary key is still inside that tuple, which looks like ('a', 10.1), so key[0] extracts just 'a'.
Finally, we make the ranking 1-based instead of zero-based by adding 1 to value.
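To see what each stage of the one-liner produces, here is the same computation split into steps. The intermediate names sorted_items and numbered are just illustrative, d is the question's example dictionary, and the code is written for Python 3, so it uses d.items():

```python
d = {'a': 10.1, 'b': 2, 'c': 5}

# Step 1: sort the (key, value) pairs from highest to lowest value
sorted_items = sorted(d.items(), key=lambda x: x[1], reverse=True)
print(sorted_items)  # [('a', 10.1), ('c', 5), ('b', 2)]

# Step 2: enumerate pairs each zero-based index with a (key, value) tuple
numbered = list(enumerate(sorted_items))
print(numbered)  # [(0, ('a', 10.1)), (1, ('c', 5)), (2, ('b', 2))]

# Step 3: 'value' is the index and 'key' is the tuple; keep key[0], shift to 1-based
ranks = {key[0]: 1 + value for value, key in numbered}
print(ranks)  # {'a': 1, 'c': 2, 'b': 3}
```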

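Using pandas (Series.rank ranks in ascending order by default, so pass ascending=False to give the highest value rank 1; method='min' gives tied values the same, smallest rank):
import pandas as pd
x = {'a':10.1,'b':2,'c':5}
res = pd.Series(x).rank(ascending=False, method='min').astype(int).to_dict()
# {'a': 1, 'b': 3, 'c': 2}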

Related

Python3 dictionary: remove duplicate values in alphabetical order

Let's say I have the following dictionary:
full_dic = {
'aa': 1,
'ac': 1,
'ab': 1,
'ba': 2,
...
}
I normally use standard dictionary comprehension to remove dupes like:
t = {val : key for (key, val) in full_dic.items()}
cleaned_dic = {val : key for (key, val) in t.items()}
Calling print(cleaned_dic) outputs {'ab': 1,'ba': 2, ...}
With this code, the key that remains seems to always be the final one in the list, but I'm not sure that's even guaranteed as dictionaries are unordered. Instead, I'd like to find a way to ensure that the key I keep is the first alphabetically.
So, regardless of the 'order' the dictionary is in, I want the output to be:
>> {'aa': 1,'ba': 2, ...}
Where 'aa' comes first alphabetically.
I ran some timer tests on 3 answers below and got the following (dictionary was created with random key/value pairs):
dict length: 10
# of loops: 100000
HoliSimo (OrderedDict): 0.0000098405 seconds
Ricardo: 0.0000115448 seconds
Mark (itertools.groupby): 0.0000111745 seconds
dict length: 1000000
# of loops: 10
HoliSimo (OrderedDict): 6.1724137300 seconds
Ricardo: 3.3102091300 seconds
Mark (itertools.groupby): 6.1338266200 seconds
We can see that for small dictionaries the OrderedDict approach is fastest, but for large dictionaries Ricardo's answer below is roughly twice as fast.
Sort the items in reverse alphabetical order of key, so that the alphabetically first key for each value is inserted last and wins when the dict is inverted; then invert back and sort by key:
t = {val : key for (key, val) in dict(sorted(full_dic.items(), key=lambda x: x[0].lower(), reverse=True)).items()}
cleaned_dic = {val : key for (key, val) in t.items()}
dict(sorted(cleaned_dic.items(), key=lambda x: x[0].lower()))
# {'aa': 1, 'ba': 2}
Seems like you can do this with a single sort and itertools.groupby. First sort the items by value, then key. Pass this to groupby and take the first item of each group to pass to the dict constructor:
from itertools import groupby
full_dic = {
'aa': 1,
'ac': 1,
'xx': 2,
'ab': 1,
'ba': 2,
}
groups = groupby(sorted(full_dic.items(), key=lambda p: (p[1], p[0])), key=lambda x: x[1])
dict(next(g) for k, g in groups)
# {'aa': 1, 'ba': 2}
You should use the OrderedDict class.
import collections
full_dic = {
'aa': 1,
'ac': 1,
'ab': 1
}
od = collections.OrderedDict(sorted(full_dic.items()))
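This way you are sure to have a dictionary sorted by key (original code: StackOverflow).
And then:
result = {}
for key, value in od.items():
    # keep only the first (alphabetically smallest) key seen for each value
    if value not in result.values():
        result[key] = value
Checking value not in result.values() scans all the values collected so far on every iteration, so this is O(n²). A faster alternative keeps an inverted dict, whose membership test is a constant-time lookup:
inverted_dict = {}
for k, v in od.items():
    if v not in inverted_dict:
        inverted_dict[v] = k
res = {v: k for k, v in inverted_dict.items()}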
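If you want to avoid sorting altogether, a single pass can remember the smallest key seen for each value. This is a minimal sketch, not one of the timed answers above; the function name keep_first_alphabetical is hypothetical, and it compares keys with plain case-sensitive string order (unlike Ricardo's .lower()):

```python
def keep_first_alphabetical(full_dic):
    """For each value, keep only the alphabetically smallest key."""
    best = {}  # value -> smallest key seen so far
    for key, value in full_dic.items():
        if value not in best or key < best[value]:
            best[value] = key
    # Invert back to key -> value
    return {key: value for value, key in best.items()}


full_dic = {'aa': 1, 'ac': 1, 'xx': 2, 'ab': 1, 'ba': 2}
print(keep_first_alphabetical(full_dic))  # {'aa': 1, 'ba': 2}
```

Each item is visited once with constant-time dict operations, so this is O(n) rather than the O(n log n) of the sort-based answers.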

Over counting pairs in python loop

I have a list of dictionaries where each dict is of the form:
{'A': a,'B': b}
I want to iterate through the list and for every (a,b) pair, find the pair(s), (b,a), if it exists.
For example if for a given entry of the list A = 13 and B = 14, then the original pair would be (13,14). I would want to search the entire list of dicts to find the pair (14,13). If (14,13) occurred multiple times I would like to record that too.
For every original (a,b) pair in the list, I would like to count how many times its complement (b,a) appears. To do this I have two for loops and a counter that is incremented when a complement pair is found:
pairs_found = 0
for i, val in enumerate(list_of_dicts):
    for j, vol in enumerate(list_of_dicts):
        if val['A'] == vol['B']:
            if vol['A'] == val['B']:
                pairs_found += 1
This produces a pairs_found greater than the length of list_of_dicts. I realize this is because the same pairs are counted more than once, but I am not sure how to avoid this double counting.
Edit for Clarity
list_of_dicts = [
    {'A': 14, 'B': 23},
    {'A': 235, 'B': 98},
    {'A': 686, 'B': 999},
    {'A': 128, 'B': 123},
    # ...
]
Let's say that the list has around 100000 entries. Somewhere in that list there will be one or more entries of the form {'A': 23, 'B': 14}. If so, I would like a counter to increase its value by one. I would like to do this for every entry in the list.
Here is what I suggest:
Use tuple to represent your pairs and use them as dict/set keys.
Build a set of unique inverted pairs you'll look for.
Use a dict to store the number of time a pair appears inverted
Then the code should look like this:
# Create a set of unique inverted pairs
inverted_pairs_set = {(d['B'], d['A']) for d in list_of_dicts}
# Create a counter for original pairs
pairs_counter_dict = {(ip[1], ip[0]): 0 for ip in inverted_pairs_set}
# Create list of pairs
pairs_list = [(d['A'], d['B']) for d in list_of_dicts]
# Count, for each original pair, how many times its inverted pair appears
for p in pairs_list:
    if p in inverted_pairs_set:
        pairs_counter_dict[(p[1], p[0])] += 1
You can create a counter dictionary that contains the values of the 'A' and 'B' keys in all your dictionaries:
complements_cnt = {(dct['A'], dct['B']): 0 for dct in list_of_dicts}
Then all you need is to iterate over your dictionaries again and increment the value for the "complements":
for dct in list_of_dicts:
    try:
        complements_cnt[(dct['B'], dct['A'])] += 1
    except KeyError:  # in case there is no complement there is nothing to increase
        pass
For example with such a list_of_dicts:
list_of_dicts = [{'A': 1, 'B': 2}, {'A': 2, 'B': 1}, {'A': 1, 'B': 2}]
This gives:
{(1, 2): 1, (2, 1): 2}
Which basically says that the {'A': 1, 'B': 2} has one complement (the second) and {'A': 2, 'B': 1} has two (the first and the last).
The solution is O(n) which should be quite fast even for 100000 dictionaries.
Note: This is quite similar to @debzsud's answer. I hadn't seen it before I posted mine, though. :(
I am still not 100% sure what it is you want to do, but here is my guess:
pairs_found = 0
for i, dict1 in enumerate(list_of_dicts):
    # only compare with the entries after dict1
    for dict2 in list_of_dicts[i+1:]:
        if dict1['A'] == dict2['B'] and dict1['B'] == dict2['A']:
            pairs_found += 1
Note the slicing in the second for loop. This avoids checking pairs that have already been checked (comparing D1 with D2 is enough; there is no need to also compare D2 with D1).
This halves the number of comparisons, but it is still O(n**2), so there is probably room for improvement.
You could first create a list with the values of each dictionary as tuples (this relies on every dict having its 'A' key before its 'B' key, because dict values follow insertion order):
example_dict = [{"A": 1, "B": 2}, {"A": 4, "B": 3}, {"A": 5, "B": 1}, {"A": 2, "B": 1}]
dict_values = [tuple(x.values()) for x in example_dict]
Then create a second list with the number of occurrences of each element inverted:
occurrences = [dict_values.count(x[::-1]) for x in dict_values]
Finally, create a dict with dict_values as keys and occurrences as values:
dict(zip(dict_values, occurrences))
Output:
{(1, 2): 1, (4, 3): 0, (5, 1): 0, (2, 1): 1}
For each key, you have the number of inverted keys. You can also create the dictionary on the fly (note that list.count scans the whole list, so either version is O(n**2)):
occurrences = {x: dict_values.count(x[::-1]) for x in dict_values}
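Counting each distinct pair once with collections.Counter gives the same per-pair result in O(n) and also explains the over-counting in the question. This is a sketch; the function names complement_counts and total_matches are hypothetical:

```python
from collections import Counter


def complement_counts(list_of_dicts):
    """Map each distinct (A, B) pair to how many entries are its complement (B, A)."""
    pair_counts = Counter((d['A'], d['B']) for d in list_of_dicts)
    # Indexing a Counter with a missing key returns 0 without inserting it
    return {(a, b): pair_counts[(b, a)] for (a, b) in pair_counts}


def total_matches(list_of_dicts):
    """Same total as the question's double loop over every (i, j) entry pair."""
    pair_counts = Counter((d['A'], d['B']) for d in list_of_dicts)
    return sum(count * pair_counts[(b, a)] for (a, b), count in pair_counts.items())


list_of_dicts = [{'A': 1, 'B': 2}, {'A': 2, 'B': 1}, {'A': 1, 'B': 2}]
print(complement_counts(list_of_dicts))  # {(1, 2): 1, (2, 1): 2}
print(total_matches(list_of_dicts))      # 4
```

The double loop visits every ordered pair of entries, so each matching pair of entries is counted twice. When no entry has A == B, halving total_matches gives the number of matching pairs (2 here); an entry with A == B also matches itself in the double loop.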

Getting the difference (in values) between two dictionaries in python

Let's say you are given 2 dictionaries, A and B, with keys that can be the same but integer values that will be different. How can you compare the 2 dictionaries so that, for every key present in both, you get the difference (e.g. if x is the value for that key in A and y is the value in B, the result should be x - y), preferably as a new dictionary?
Ideally you'd also be able to compare the gain in percent (how much the values changed percentage-wise between the 2 dictionaries which are snapshots of numbers at a specific time).
Given two dictionaries, A and B which may/may not have the same keys, you can do this:
A = {'a':5, 't':4, 'd':2}
B = {'s':11, 'a':4, 'd': 0}
C = {x: A[x] - B[x] for x in A if x in B}
Which only subtracts the keys that are the same in both dictionaries.
You could use a dict comprehension to loop through the keys, then subtract the corresponding values from each original dict.
>>> a = {'a': 5, 'b': 3, 'c': 12}
>>> b = {'a': 1, 'b': 7, 'c': 19}
>>> {k: b[k] - a[k] for k in a}
{'a': -4, 'b': 4, 'c': 7}
This assumes both dict have the exact same keys. Otherwise you'd have to think about what behavior you expect if there are keys in one dict but not the other (maybe some default value?)
Otherwise if you want to evaluate only shared keys, you can use the set intersection of the keys
>>> {k: b[k] - a[k] for k in a.keys() & b.keys()}
{'a': -4, 'b': 4, 'c': 7}
def difference_dict(Dict_A, Dict_B):
    output_dict = {}
    for key in Dict_A.keys():
        if key in Dict_B.keys():
            # abs() keeps only the size of the difference, not its sign
            output_dict[key] = abs(Dict_A[key] - Dict_B[key])
    return output_dict
>>> Dict_A = {'a': 4, 'b': 3, 'c':7}
>>> Dict_B = {'a': 3, 'c': 23, 'd': 2}
>>> Diff = difference_dict(Dict_A, Dict_B)
>>> Diff
{'a': 1, 'c': 16}
If you wanted to fit that all onto one line, it would be...
def difference_dict(Dict_A, Dict_B):
    output_dict = {key: abs(Dict_A[key] - Dict_B[key]) for key in Dict_A.keys() if key in Dict_B.keys()}
    return output_dict
If you want to get the difference of matching keys into a new dictionary, you could do something like the following:
new_dict = {}
for key in A:
    if key in B:
        new_dict[key] = A[key] - B[key]
...which we can fit into one line:
new_dict = {key: A[key] - B[key] for key in A if key in B}
Here is a Python package for this case, dictdiffer:
https://dictdiffer.readthedocs.io/en/latest/
from dictdiffer import diff
print(list(diff(a, b)))
would do the trick.
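None of the answers above covers the percentage gain the question also asks about. A minimal sketch, assuming A is the older snapshot and B the newer one, and skipping keys whose old value is 0 because a change from 0 has no defined percentage (the function name percent_change is hypothetical):

```python
def percent_change(old, new):
    """Percentage change from old to new for keys present in both dicts."""
    return {
        key: (new[key] - old[key]) / old[key] * 100
        for key in old.keys() & new.keys()  # shared keys only
        if old[key] != 0  # avoid division by zero
    }


A = {'x': 50, 'y': 200, 'z': 0}
B = {'x': 75, 'y': 150, 'z': 5, 'w': 1}
# The set intersection has no fixed order, so sort for stable output
print(sorted(percent_change(A, B).items()))  # [('x', 50.0), ('y', -25.0)]
```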

Python: Iterate alphabetically over OrderedDict

In a script I have an OrderedDict groups that gets fed key/value pairs alphabetically.
In another part of the script, I'm checking against files that have the same name as the key, like so:
for (key, value) in groups.items():
    file = open(key, 'r')
    # do stuff
Stuff happens just fine, part of which is printing a status line for each file, but how can I get Python to iterate through groups alphabetically, or at least numerically as they are ordered (since they are being entered in alphabetical order anyways)?
The whole point of an OrderedDict is that you can iterate through it normally in the order that keys were entered:
>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> d[1] = 2
>>> d[0] = 3
>>> d[9] = 2
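>>> for k, v in d.items():
...     print(k, v)
...
1 2
0 3
9 2
Before Python 3.7, make sure you don't feed OrderedDict(...) a regular dictionary to initialize it, or it starts off unordered. (Since 3.7, regular dicts preserve insertion order, so this only matters on older versions.)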
If all you want to do is iterate through a dictionary in order of the keys, you can use a regular dictionary and sorted():
>>> d = dict(s=5,g=4,a=6,j=10)
>>> d
{'g': 4, 's': 5, 'j': 10, 'a': 6}
>>> for k in sorted(d):
...     print(k, ':', d[k])
...
a : 6
g : 4
j : 10
s : 5
>>>
(pardon the python3 print())
If you really want to stick with the ordered dict, then read the documentation which shows an example of reordering an OrderedDict:
>>> # regular unsorted dictionary
>>> d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}
>>> # dictionary sorted by key
>>> OrderedDict(sorted(d.items(), key=lambda t: t[0]))
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
If you really entered them into an OrderedDict alphabetically in the first place, then I'm not sure why you're having trouble.

Return first N key:value pairs from dict

Consider the following dictionary, d:
d = {'a': 3, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
I want to return the first N key:value pairs from d (N <= 4 in this case). What is the most efficient method of doing this?
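Before Python 3.7, there's no such thing as the "first n" keys, because a dict doesn't remember which keys were inserted first. (Since Python 3.7, dicts preserve insertion order, so the code below returns the first n inserted pairs.)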
You can get any n key-value pairs though:
n_items = take(n, d.items())
This uses the implementation of take from the itertools recipes:
from itertools import islice

def take(n, iterable):
    """Return the first n items of the iterable as a list."""
    return list(islice(iterable, n))
For Python 2, where d.iteritems() avoids building an intermediate list:
n_items = take(n, d.iteritems())
A very efficient way to retrieve anything is to combine list or dictionary comprehensions with slicing. If you don't need the items in any particular order (you just want n arbitrary pairs), you can use a dictionary comprehension like this:
# Python 2
first2pairs = {k: mydict[k] for k in mydict.keys()[:2]}
# Python 3
first2pairs = {k: mydict[k] for k in list(mydict)[:2]}
A comprehension like this is generally faster than the equivalent "for x in y" loop. Also, by slicing the list of keys first, you only look up the values you actually need when you build the new dictionary.
If you don't need the keys (only the values), you can slice the values instead:
# Python 2
first2vals = mydict.values()[:2]
# Python 3
first2vals = list(mydict.values())[:2]
If you need the values sorted based on their keys, it's not much more trouble:
first2vals = [mydict[k] for k in sorted(mydict.keys())[:2]]
or if you need the keys as well:
first2pairs = {k: mydict[k] for k in sorted(mydict.keys())[:2]}
To get the top N elements from your python dictionary one can use the following line of code:
list(dictionaryName.items())[:N]
In your case you can change it to:
list(d.items())[:4]
Before Python 3.7, Python's dicts were not ordered, so it was meaningless to ask for the "first N" keys.
The collections.OrderedDict class is available if that's what you need. You could efficiently get its first four elements as:
import itertools
import collections
d = collections.OrderedDict((('foo', 'bar'), (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')))
x = itertools.islice(d.items(), 0, 4)
for key, value in x:
    print(key, value)
itertools.islice allows you to lazily take a slice of elements from any iterator. If you want the result to be reusable you'd need to convert it to a list or something, like so:
x = list(itertools.islice(d.items(), 0, 4))
foo = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 'f':6}
iterator = iter(foo.items())
for i in range(3):
    print(next(iterator))
Basically, turn the view (dict_items) into an iterator, and then iterate it with next().
In Python 3, this will do the trick:
{A:N for (A,N) in [x for x in d.items()][:4]}
{'a': 3, 'b': 2, 'c': 3, 'd': 4}
You can get the dictionary items by calling .items() on the dictionary, then convert that to a list and take the first N items as you would from any list.
The code below prints the first 3 items of the dictionary object.
For example:
d = {'a': 3, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
first_three_items = list(d.items())[:3]
print(first_three_items)
Outputs:
[('a', 3), ('b', 2), ('c', 3)]
On Python 3.8 you can also use take from the third-party more_itertools package:
import more_itertools
d = {'a': 3, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
first_n = more_itertools.take(3, d.items())
print(len(first_n))
print(first_n)
Whose output is:
3
[('a', 3), ('b', 2), ('c', 3)]
After pip install more-itertools of course.
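I did not see this one here. It is not ordered on Python versions before 3.7, but it is the simplest syntactically if you just need to take some elements from a dictionary (on Python 3, d.items() must be converted to a list before slicing):
n = 2
{key: value for key, value in list(d.items())[0:n]}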
Where d is your dictionary and n is the number of items to print:
for idx, (k, v) in enumerate(d.items()):
    if idx == n:
        break
    print(k, v)
Converting your dictionary to a list can be slow: your dictionary may be large, and you don't need to convert all of it just to print the first few items.
See PEP 0265 on sorting dictionaries. Then use the aforementioned iterable code.
If you need more efficient access to the key-value pairs in sorted order, use a different data structure: one that maintains sorted order along with the key-value associations.
For example:
import bisect
kvlist = [('a', 1), ('b', 2), ('c', 3), ('e', 5)]
bisect.insort_left(kvlist, ('d', 4))  # insert while keeping the list sorted
print(kvlist)  # [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)]
Just adding an answer using zip:
{k: d[k] for k, _ in zip(d, range(n))}
This returns the first n pairs on Python 3.7+ (where dicts preserve insertion order), although it still iterates over every item:
d_new = {k:v for i, (k, v) in enumerate(d.items()) if i < n}
This depends on what is 'most efficient' in your case.
If you just want an arbitrary sample of a huge dictionary foo, use foo.iteritems() (Python 2; foo.items() on Python 3) and take as many values from it as you need; it's a lazy operation that avoids creating an explicit list of keys or items.
If you need to sort the keys first, there's no way around building an explicit list of keys, e.g. keys = foo.keys(); keys.sort() on Python 2, or sorted(foo) on either version. Then slice or iterate through the first N keys.
BTW why do you care about the 'efficient' way? Did you profile your program? If you did not, use the obvious and easy to understand way first. Chances are it will do pretty well without becoming a bottleneck.
For Python 3, to select the first n pairs:
n=4
firstNpairs = {k: Diction[k] for k in list(Diction.keys())[:n]}
This might not be very elegant, but works for me:
d = {'a': 3, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
x = 0
for key, val in d.items():
    if x == 2:
        break
    else:
        x += 1
        # Do something with the first two key-value pairs
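You can approach this a number of ways. If order is important you can do this (note that pop removes each key from d):
for key in sorted(d.keys())[:4]:
    item = d.pop(key)
If order isn't a concern you can do this (popitem also removes the items from d; since Python 3.7 it returns pairs in last-in, first-out order):
for i in range(4):
    item = d.popitem()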
To pick the top N key-value pairs by value, first sort the dictionary by value (dict(...) keeps this order on Python 3.7+):
import operator
d = {'a': 3, 'b': 2, 'c': 3, 'd': 4}
d = dict(sorted(d.items(), key=operator.itemgetter(1), reverse=True))
# itemgetter(0) = sort by keys, itemgetter(1) = sort by values
Now we can retrieve the top 'N' elements with a function like this:
def return_top(elements, dictionary_element):
    '''Takes the number of elements 'N' needed and the dictionary; returns its first N items.'''
    topers = {}
    for h, i in enumerate(dictionary_element):
        if h < elements:
            topers.update({i: dictionary_element[i]})
    return topers
To get the top 2 elements, use it like this:
d = {'a': 3, 'b': 2, 'c': 3, 'd': 4}
d=dict(sorted(d.items(),key=operator.itemgetter(1),reverse=True))
d=return_top(2,d)
print(d)
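If you only need the N largest values, sorting the whole dictionary is not strictly necessary. The sketch below swaps in the standard library's heapq.nlargest, a different technique from the answer above; the heapq documentation describes it as equivalent to sorted(iterable, key=key, reverse=True)[:n], so ties keep their original order:

```python
import heapq
from operator import itemgetter

d = {'a': 3, 'b': 2, 'c': 3, 'd': 4}
# nlargest keeps only n candidates in a heap instead of sorting every item
top2 = dict(heapq.nlargest(2, d.items(), key=itemgetter(1)))
print(top2)  # {'d': 4, 'a': 3}
```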
Consider a dict:
d = {'a': 3, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
from itertools import islice
n = 3
list(islice(d.items(),n))
islice will do the trick :)
Hope it helps!
I have tried a few of the answers above and noticed that some of them are version dependent and do not work in Python 3.7 (for example, slicing d.items() directly raises a TypeError, because dict views do not support indexing).
I also note that dictionaries preserve insertion order: this was an implementation detail of CPython 3.6 and has been guaranteed by the language since 3.7.
Despite dictionaries being ordered, some of the operations you would expect to work on ordered structures, such as slicing, don't.
The answer to the OP's question that worked best for me:
itr = iter(dic.items())
lst = [next(itr) for i in range(3)]
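One caveat with the next()-based version: if the dictionary has fewer than three items, next(itr) raises StopIteration. A sketch that avoids this with itertools.islice, which simply stops early (the helper name first_n_items is hypothetical; the pairs come back in insertion order on Python 3.7+):

```python
from itertools import islice


def first_n_items(dic, n):
    """Return up to n (key, value) pairs from dic in insertion order."""
    # islice yields fewer items if dic is shorter than n instead of raising
    return list(islice(dic.items(), n))


dic = {'a': 3, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
print(first_n_items(dic, 3))       # [('a', 3), ('b', 2), ('c', 3)]
print(first_n_items({'a': 1}, 3))  # [('a', 1)]
```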
from random import uniform

def GetNFirstItems(self):
    # Example dict with 10 random values
    self.dict = {f'Item{i + 1}': round(uniform(20.40, 50.50), 2) for i in range(10)}
    self.get_items = int(input())  # number of items to print
    for self.index, self.item in zip(range(len(self.dict)), self.dict.items()):
        if self.index == self.get_items:
            break
        else:
            print(self.item, ",", end="")
An unusual approach, but it runs in linear time and stops as soon as the requested number of items has been printed.
I like this one because no new list needs to be created, and it's a one-liner that does exactly what you want. It relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 (and was already true of CPython 3.6). Note that it still iterates over every item, even after the first n:
n = 4
new_d = {kv[0]: kv[1] for i, kv in enumerate(d.items()) if i < n}
