Sorting Counter entries by value, then by key

I was trying to sort a few values in a list using Python's Counter from the collections module, but it gives a weird result:
>>> diff=["aaa","aa","a"]
>>> c=Counter(diff)
>>> sorted(c.items(), key = lambda x:x[1] , reverse=True)
[('aa', 1), ('a', 1), ('aaa', 1)]
>>> c.items()
[('aa', 1), ('a', 1), ('aaa', 1)]
The output is strange: it seems to have shuffled 'aa' to the first place, then 'a', with 'aaa' last.
Ideally, it should have been 'a', then 'aa', then 'aaa'.
What is the reason behind this, and how would you fix it?
Edit:
Most people understood the question incorrectly, so I am adding some clarification. The goal is to sort the words in a list by their number of occurrences.
Let's say the list is diff = ["this", "this", "world", "cool", "is", "cool", "cool"]. The final output of my code above would be cool, then this, then is, then world, which is correct.
The problem arises when you supply words that all occur the same number of times. With the input diff = ["aaa", "aa", "a"], I expected the output to be a, then aa, then aaa. But Python's algorithm could never know that, since every word occurs exactly once.
But if that is the case, why didn't Python print aaa, then aa, then a (i.e. the same order as the input), giving it the benefit of the doubt? Python's sort actually swapped the items. Why?

Counter is a subclass of dict, and in the Python version used here a dict is an unordered collection (dicts, and therefore Counter, preserve insertion order only from Python 3.7 on).
To get the sorting order you want, you can update your code like this:
sorted(c.items(), key=lambda x: (x[1], -len(x[0])), reverse=True)
This gives:
[('a', 1), ('aa', 1), ('aaa', 1)]

sorted does a stable sort. That means for ties, the order of items will be the same as the order they appear in the original input. Since your Counter is unordered, the input to sorted is in some undefined order. If you want you can sort by the key, and then the value:
sorted(sorted(c.items(), key=lambda x:x[0], reverse=True), key = lambda x:x[1] , reverse=True)
Or (probably better) have your sort function return a tuple as the sort key:
sorted(c.items(), key=lambda x:(x[1], x[0]), reverse=True)
An (even better!) version uses operator.itemgetter (after from operator import itemgetter):
sorted(c.items(), key=itemgetter(1,0), reverse=True)
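Note that with reverse=True the tuple key reverses the tie-break as well, so words with equal counts come out in descending key order. A minimal sketch of one way to get descending counts with ascending keys is to negate the count instead of reversing the whole sort. The helper name by_count_then_key is made up for illustration:

```python
from collections import Counter


def by_count_then_key(counter):
    # Negating the count sorts counts in descending order, while the
    # key (second tuple element) breaks ties in ascending order.
    return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))


print(by_count_then_key(Counter(["aaa", "aa", "a"])))
# [('a', 1), ('aa', 1), ('aaa', 1)]

words = ["this", "this", "world", "cool", "is", "cool", "cool"]
print(by_count_then_key(Counter(words)))
# [('cool', 3), ('this', 2), ('is', 1), ('world', 1)]
```

Negation only works for numeric sort fields; for non-numeric fields, the two-pass stable sort shown above is the usual fallback.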

Here's one way you can ensure your ordering remains unchanged.
As previously mentioned, dictionaries are not guaranteed to be ordered. The result will be a sorted list of tuples.
from collections import Counter
diff = ["aaa", "aa", "a"]
c = Counter(diff)
sorted(c.items(), key=lambda x: diff.index(x[0]))
# [('aaa', 1), ('aa', 1), ('a', 1)]
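Calling diff.index for every item rescans the list each time. As a sketch of the same idea, the first position of each word can be recorded once in a dictionary and combined with the count in a tuple key. The names by_count_keep_input_order and first_seen are illustrative, not from the answer:

```python
from collections import Counter


def by_count_keep_input_order(words):
    counts = Counter(words)
    # Record the first position of each word once, instead of calling
    # words.index() for every item during the sort.
    first_seen = {}
    for position, word in enumerate(words):
        first_seen.setdefault(word, position)
    # Highest count first; ties keep the order of first appearance.
    return sorted(counts.items(), key=lambda kv: (-kv[1], first_seen[kv[0]]))


print(by_count_keep_input_order(["aaa", "aa", "a"]))
# [('aaa', 1), ('aa', 1), ('a', 1)]
```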

Related

How to reorder an OrderedDict according to a list of values in Python 2.7?

I am trying to find out how to reorder my OrderedDict with a list of values.
This is my code:
import collections
dicti = {'a':1, 'b':2, 'c':3}
ordered_dict = collections.OrderedDict(sorted(dicti.items(), key=lambda t: t[0]))
desired_dict_order = [3,2,1]
print 'ordered_dict before reordering =', ordered_dict
for key in desired_dict_order:
    ordered_dict[key] = ordered_dict.popitem(key)
print 'ordered_dict after reordering =', ordered_dict
I got this idea from another thread, which also answers this question, but I am doing this in Python 2.7 and it doesn't seem to work: the output is this weird result, which I do not understand:
ordered_dict before reordering = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
ordered_dict after reordering = OrderedDict([('a', 1), ('b', 2), (1, (2, (3, ('c', 3))))])
What is it doing and how can I achieve it in the right way?
EDIT: I would like to reorder based on the indices of the OrderedDict, so that in the end I can reorder the dict even if the keys and values are arbitrary and don't necessarily enumerate sequentially (like 1, 2, 3).

Look at your code:
desired_dict_order = [3,2,1] # this is a list of numbers - values in your dict, not keys!
print 'ordered_dict before reordering =', ordered_dict
for key in desired_dict_order: # …but here you're using them as keys, which is wrong
    ordered_dict[key] = ordered_dict.popitem(key)
An easy way to do the reordering is:
aux_dict = { v: k for k, v in ordered_dict.items() } # reversed dict - keys and values swapped!
ordered_dict = collections.OrderedDict( [ (aux_dict[v], v) for v in desired_dict_order ] )
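The question's EDIT asks for reordering by position rather than by value. One possible sketch, assuming the desired order is given as zero-based indices into the current item order (reorder_by_positions is a hypothetical helper, not part of collections):

```python
from collections import OrderedDict


def reorder_by_positions(ordered, positions):
    # Snapshot the current (key, value) pairs so they can be indexed.
    items = list(ordered.items())
    # Build a new OrderedDict by picking the pairs in the requested order.
    return OrderedDict(items[i] for i in positions)


ordered_dict = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
reordered = reorder_by_positions(ordered_dict, [2, 1, 0])
print(list(reordered.items()))
# [('c', 3), ('b', 2), ('a', 1)]
```

Because it works on positions only, this does not care what the keys or values are.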

ordered_dict = collections.OrderedDict(sorted(dicti.items(), key=lambda t: t[1],reverse=True))
Will this produce what you want?
You are already sorting on the key, so why not sort directly on the value in reverse?
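As for what the loop in the question was doing: OrderedDict.popitem takes a single optional argument, last, not a key. Any truthy argument therefore pops the most recently added item, and the loop stores that popped (key, value) tuple under the new key, which nests the tuples. The sketch below reproduces the effect:

```python
from collections import OrderedDict

od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
for key in [3, 2, 1]:
    # popitem's only parameter is `last`; a truthy argument means
    # "pop the most recently added item", not "pop this key".
    od[key] = od.popitem(key)

print(list(od.items()))
# [('a', 1), ('b', 2), (1, (2, (3, ('c', 3))))]
```

To remove a specific key and get its value back, pop(key) is the method to use.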

pythonic way of sorting a log lexicographically

I'm a newbie to Python and I'm trying to solve a problem. Let's assume I'm getting a log file where each line has an identifier followed by space-separated words. I need to sort the log based on the words (identifiers can be omitted). However, if the words match, I need to sort based on the identifier. So I'm building a dictionary with the identifier as the key and the words as the value. For simplicity, I'm using the sample below. How can I sort a dictionary by value and then by key if the values match? Below is an example.
>>> a_dict = {'aa1':'n','ba2' : 'a','aa2':'a'}
>>> a_dict
{'ba2': 'a', 'aa1': 'n', 'aa2': 'a'}
If I sort the given dictionary by value, it becomes this.
>>> b_tuple = sorted(a_dict.items(),key = lambda x: x[1])
>>> b_tuple
[('ba2', 'a'), ('aa2', 'a'), ('aa1', 'n')]
However the expected output should look like this
[('aa2', 'a'), ('ba2','a'), ('aa1', 'n')]
The reason is that if the values are the same, the entries have to be sorted by key. Any suggestions as to how this can be done?

The key function in your example only sorts by value, as you've noticed. If you also want to sort by key, then you can return the value and key (in that order) as a tuple:
>>> sorted(a_dict.items(), key=lambda x: (x[1], x[0]))
[('aa2', 'a'), ('ba2', 'a'), ('aa1', 'n')]
The confusing part is that your data looks like ('aa2', 'a'), for example, but it is being sorted as ('a', 'aa2') because of (x[1], x[0]).
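Applied to the log lines described in the question, the same tuple key can be built directly from each line without going through a dictionary. This is only a sketch: the sample lines and the helper name sort_log are assumptions, and each line is assumed to be an identifier, one space, then the words:

```python
def sort_log(lines):
    def sort_key(line):
        # Split off the identifier; everything after the first space
        # is treated as the words.
        identifier, _, words = line.partition(" ")
        # Sort by the words first, then by the identifier on ties.
        return (words, identifier)

    return sorted(lines, key=sort_key)


print(sort_log(["aa1 n", "ba2 a", "aa2 a"]))
# ['aa2 a', 'ba2 a', 'aa1 n']
```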

You can use an OrderedDict from the collections module to store your sorted values:
from collections import OrderedDict
a_dict = {'aa1':'n','ba2' : 'a','aa2':'a'}
sorted_by_value_then_key = sorted(a_dict.items(), key=lambda t: (t[1], t[0]))
sort_dict = OrderedDict(sorted_by_value_then_key)
EDIT: I had mixed up key and value and written (t[0], t[1]). In the key function, t[0] gives the key and t[1] gives the value. The sorted function will use the tuple (value, key) and order the items alphanumerically.

Python - map values to index [duplicate]

I am new to Python, and I am familiar with implementations of Multimaps in other languages. Does Python have such a data structure built-in, or available in a commonly-used library?
To illustrate what I mean by "multimap":
a = multidict()
a[1] = 'a'
a[1] = 'b'
a[2] = 'c'
print(a[1]) # prints: ['a', 'b']
print(a[2]) # prints: ['c']

Such a thing is not present in the standard library. You can use a defaultdict though:
>>> from collections import defaultdict
>>> md = defaultdict(list)
>>> md[1].append('a')
>>> md[1].append('b')
>>> md[2].append('c')
>>> md[1]
['a', 'b']
>>> md[2]
['c']
(Instead of list you may want to use set, in which case you'd call .add instead of .append.)
As an aside: look at these two lines you wrote:
a[1] = 'a'
a[1] = 'b'
This seems to indicate that you want the expression a[1] to be equal to two distinct values. This is not possible with dictionaries because their keys are unique and each of them is associated with a single value. What you can do, however, is extract all values inside the list associated with a given key, one by one. You can use iter followed by successive calls to next for that. Or you can just use two loops:
>>> for k, v in md.items():
...     for w in v:
...         print("md[%d] = '%s'" % (k, w))
...
md[1] = 'a'
md[1] = 'b'
md[2] = 'c'

Just for future visitors: there is now a Python implementation of Multimap, available via PyPI.

Stephan202 has the right answer, use defaultdict. But if you want something with the interface of C++ STL multimap and much worse performance, you can do this:
multimap = []
multimap.append( (3,'a') )
multimap.append( (2,'x') )
multimap.append( (3,'b') )
multimap.sort()
Now when you iterate through multimap, you'll get pairs like you would in a std::multimap. Unfortunately, that means your loop code will start to look as ugly as C++.
def multimap_iter(multimap, minkey, maxkey=None):
    maxkey = minkey if (maxkey is None) else maxkey
    for k, v in multimap:
        if k < minkey: continue
        if k > maxkey: break
        yield k, v
# this will print 'a','b'
for k, v in multimap_iter(multimap, 3, 3):
    print(v)
In summary, defaultdict is really cool and leverages the power of python and you should use it.
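If the sorted-list approach is used anyway, the linear scan can be shortened with the standard bisect module, since the list is already sorted. A rough sketch, where equal_range is a hypothetical helper named after the C++ member function:

```python
import bisect


def equal_range(pairs, key):
    """Return all (key, value) pairs for `key` from a sorted list of tuples."""
    # (key,) sorts before every (key, value) tuple, so bisect_left lands
    # on the first pair with this key, if there is one.
    start = bisect.bisect_left(pairs, (key,))
    end = start
    while end < len(pairs) and pairs[end][0] == key:
        end += 1
    return pairs[start:end]


multimap = sorted([(3, 'a'), (2, 'x'), (3, 'b')])
print(equal_range(multimap, 3))
# [(3, 'a'), (3, 'b')]
```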

You can take a list of tuples and then sort them as if they were a multimap.
listAsMultimap=[]
Let's append some elements (tuples):
listAsMultimap.append((1,'a'))
listAsMultimap.append((2,'c'))
listAsMultimap.append((3,'d'))
listAsMultimap.append((2,'b'))
listAsMultimap.append((5,'e'))
listAsMultimap.append((4,'d'))
Now sort it.
listAsMultimap=sorted(listAsMultimap)
After printing it you will get:
[(1, 'a'), (2, 'b'), (2, 'c'), (3, 'd'), (4, 'd'), (5, 'e')]
That means it is working as a Multimap!
Please note that, as in a multimap, values are also sorted in ascending order when the keys are the same (for key=2, 'b' comes before 'c' although we didn't append them in this order).
If you want to get them in descending order just change the sorted() function like this:
listAsMultimap=sorted(listAsMultimap,reverse=True)
And after you will get output like this:
[(5, 'e'), (4, 'd'), (3, 'd'), (2, 'c'), (2, 'b'), (1, 'a')]
Similarly here values are in descending order if the keys are the same.

The standard way to write this in Python is with a dict whose elements are each a list or set. As stephan202 says, you can somewhat automate this with a defaultdict, but you don't have to.
In other words I would translate your code to
a = dict()
a[1] = ['a', 'b']
a[2] = ['c']
print(a[1]) # prints: ['a', 'b']
print(a[2]) # prints: ['c']
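When the values arrive one at a time, a plain dict can still collect them with dict.setdefault, which is one way to do without defaultdict. A small sketch with made-up input pairs:

```python
a = {}
for key, value in [(1, 'a'), (1, 'b'), (2, 'c')]:
    # setdefault returns the existing list for the key, or stores and
    # returns a new empty list if the key is not present yet.
    a.setdefault(key, []).append(value)

print(a[1])  # prints: ['a', 'b']
print(a[2])  # prints: ['c']
```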

Or subclass dict:
class Multimap(dict):
    def __setitem__(self, key, value):
        if key not in self:
            dict.__setitem__(self, key, [value]) # call super method to avoid recursion
        else:
            self[key].append(value)

There is no multi-map in the Python standard libs currently.
WebOb has a MultiDict class used to represent HTML form values, and it is used by a few Python Web frameworks, so the implementation is battle tested.
Werkzeug also has a MultiDict class, and for the same reason.

Check if something in a dictionary is the same as the max value in that dictionary?

How can I check if something in a dictionary is the same as the max in that dictionary? In other words, how do I get all the entries with the max value instead of only the one at the lowest position?
I have this code which returns the max variable name and value:
d = {'g_dirt4': g_dirt4, 'g_destiny2': g_destiny2, 'g_southpark': g_southpark, 'g_codww2': g_codww2, 'g_bfront2': g_bfront2, 'g_reddead2': g_reddead2, 'g_fifa18': g_fifa18, 'g_motogp17': g_motogp17, 'g_elderscrolls': g_elderscrolls, 'g_crashbandicoot': g_crashbandicoot}
print("g_dirt4", g_dirt4, "g_destiny2", g_destiny2, "g_southpark", g_southpark, "g_codww2", g_codww2, "g_bfront2", g_bfront2, "g_reddead2", g_reddead2, "g_fifa18", g_fifa18, "g_motogp17", g_motogp17, "g_elderscrolls", g_elderscrolls, "g_crashbandicoot", g_crashbandicoot)
print (max(d.items(), key=lambda x: x[1]))
Now it prints the variable with the highest value plus the value itself, but what if there are two or three variables with the same max value? I would like to print all of the max values.
Edit:
The user has to fill in a form, which adds values to the variables in the dictionary. When the user is done, there will be one, two or more variables with the highest value. For example, the code gives me this:
2017-06-08 15:05:43 g_dirt4 9 g_destiny2 8 g_southpark 5 g_codww2 8 g_bfront2 8 g_reddead2 7 g_fifa18 8 g_motogp17 9 g_elderscrolls 5 g_crashbandicoot 6
2017-06-08 15:05:43 ('g_dirt4', 9)
Now it tells me that g_dirt4 has the highest value of 9, but if you look at g_motogp17, it also has 9; it doesn't get printed because it comes later in the dictionary. So how do I print them both? And what if three variables share the same max value?

Given a dictionary
d = {'too': 2, 'one': 1, 'two': 2, 'won': 1, 'to': 2}
the following command:
result = [(n,v) for n,v in d.items() if v == max(d.values())]
yields: [('too', 2), ('two', 2), ('to', 2)]
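The comprehension above calls max(d.values()) once per item. A slightly cheaper variant, shown here only as a sketch, computes the maximum once and reuses it (the name highest is illustrative):

```python
d = {'too': 2, 'one': 1, 'two': 2, 'won': 1, 'to': 2}

# Find the maximum value once, then keep every item that matches it.
highest = max(d.values())
result = [(name, value) for name, value in d.items() if value == highest]

print(sorted(result))
# [('to', 2), ('too', 2), ('two', 2)]
```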

Let me introduce you to a more complicated but more powerful answer. If you sort your dictionary items, you can use itertools.groupby for some powerful results:
import itertools
foo = {"one": 1, "two": 2, "three": 3, "tres": 3, "dos": 2, "troi": 3}
sorted_kvp = sorted(foo.items(), key=lambda kvp: -kvp[1])
grouped = itertools.groupby(sorted_kvp, key=lambda kvp: kvp[1])
The sorted line takes the key/value pairs of dictionary items and sorts them based on the value. I put a - in front so that the values will end up being sorted descending. The results of that line are:
>>> print(sorted_kvp)
[('tres', 3), ('troi', 3), ('three', 3), ('two', 2), ('dos', 2), ('one', 1)]
Note, as the comments above said, the order of the keys (in this case 'tres', 'troi', and 'three', and then 'two' and 'dos') is arbitrary, since the order of the keys in the dictionary is arbitrary.
The itertools.groupby line makes groups out of the runs of data. The lambda tells it to look at kvp[1] for each key-value pair, i.e. the value.
At the moment, you're only interested in the max, so you can then do this:
max_value, key_grouper = next(grouped)
print("{}: {}".format(max_value, list(key_grouper)))
And get the following results:
3: [('tres', 3), ('troi', 3), ('three', 3)]
But if you wanted all the information sorted, instead, with this method, that's just as easy:
for value, grouper in grouped:
    print("{}: {}".format(value, list(grouper)))
produces:
3: [('tres', 3), ('troi', 3), ('three', 3)]
2: [('two', 2), ('dos', 2)]
1: [('one', 1)]
One last note: you can use next or you can use the for loop, but using both will give you different results. grouped is an iterator, and calling next on it moves it to its next result (and the for loop consumes the entire iterator, so a subsequent next(grouped) would cause a StopIteration exception).

You could do something like this:
max_value = (max(d.items(), key=lambda x: x[1]))[1]
max_list = [max_value]
for key, value in d.items():
    if value == max_value:
        max_list.append((key, value))
print(max_list)
This will get the maximum value, then loop through all the keys and values in your dictionary and add all the ones matching that max value to a list. Then you print the list and it should print all of them.

Returning Key matching Value -- Adding Constraint

Searching a Python dictionary by value first to get a key as output makes sense to me. But what if we want to add another constraint to the search?
For instance, here I am searching a dictionary (multi-dimensional) for the lowest value, then returning the key of that lowest value:
minValue[id] = min(data[id].items(), key=lambda x: x[1])
Since this method only returns one key that matches that value, while there may be multiple, I want to add another constraint.
Is there an elegant way to add: return key that contains overall minimum value AND has the longest length of those matching ?

I think a specific example would help clarify what the dictionary looks like, since Python doesn't directly provide a multi-dimensional dict.
I assume that it looks something like this: data = {'a': 1, 'b': 2, 'b': 3} (note that a real dict cannot hold this: the duplicate key 'b' would keep only its last value), so that when you do min(data[id].items(), key=lambda x: x[1]) you want it to return ('a', 1), and checking for the longest match would give, perhaps, [('b', 2), ('b', 3)].
If that is what you mean, then the easiest way is to use a defaultdict with a set:
>>> from collections import defaultdict
>>> data = defaultdict(set)
>>> data['a'].add(1)
>>> data['b'].add(2)
>>> data['b'].add(3)
>>> min(data.items(), key=lambda x: min(x[1]))
('a', {1})
>>> max(data.items(), key=lambda x: len(x[1]))
('b', {2, 3})

Well, you could add the length to the key function:
>>> data = {'a': 1, 'aa': 1, 'b': 2, 'c': 3}
>>> min(data.items(), key=lambda x: x[1])
('a', 1)
>>> min(data.items(), key=lambda x: (x[1], -len(x[0])))
('aa', 1)
But what if there are two with the same value and the same length? You're back to the same problem of not knowing what the output will be. I'd probably build a list of the matching key-value pairs and then sort them or something, but the right thing to do would probably depend upon what the keys actually mean.
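The "build a list of the matching pairs and then sort them" idea could look roughly like this. The data and the final alphabetical tie-break are assumptions chosen only to make the result deterministic:

```python
data = {'a': 1, 'aa': 1, 'bb': 1, 'c': 3}

# Collect every key that shares the overall minimum value.
lowest = min(data.values())
candidates = [key for key, value in data.items() if value == lowest]

# Longest keys first; equal lengths fall back to alphabetical order.
candidates.sort(key=lambda key: (-len(key), key))

print(candidates)
# ['aa', 'bb', 'a']
```

The first element is then the single "best" key, while the full list keeps the remaining candidates visible.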
