Sorting based on frequency and alphabetical order - python

I am trying to sort by frequency and break ties in alphabetical order.
After counting frequencies, I have a list of (string, count) tuples.
E.g. tmp = [("xyz", 1), ("foo", 2), ("bar", 2)]
I then sort as sorted(tmp, reverse=True)
This gives me [("foo", 2 ) , ("bar", 2), ("xyz", 1)]
How can I sort alphabetically in ascending order when the frequencies are the same? I am trying to figure out the comparator function.
Expected output: [("bar", 2), ("foo", 2), ("xyz", 1)]

You have to sort by multiple keys.
sorted(tmp, key=lambda x: (-x[1], x[0]))
Source: Sort a list by multiple attributes?.
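A minimal end-to-end sketch of that approach, assuming the input is a plain list of words (the `words` list and the `sort_by_frequency` name are made up for illustration):

```python
from collections import Counter

def sort_by_frequency(words):
    """Return (word, count) pairs: highest count first, ties in alphabetical order."""
    counts = Counter(words)
    # Negating the count sorts it in descending order while the word stays ascending.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

words = ["xyz", "foo", "bar", "foo", "bar"]  # hypothetical input
print(sort_by_frequency(words))
# [('bar', 2), ('foo', 2), ('xyz', 1)]
```

Negation only works for numeric keys; for non-numeric keys that need mixed directions, two stable sorting passes are a common alternative.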

Use this code:
from operator import itemgetter
tmp = [('xyz',1), ('foo', 2 ) , ('bar', 2)]
print(sorted(tmp, key=itemgetter(0,1)))
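This avoids a Python-level function call. Note, however, that itemgetter(0, 1) sorts by name first and then by count, so it matches the expected output only for this particular data; it does not sort by descending frequency in general.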

Related

Sort list of objects based on multiple attributes of the objects? [duplicate]

I have a list of lists:
[[12, 'tall', 'blue', 1],
[2, 'short', 'red', 9],
[4, 'tall', 'blue', 13]]
If I wanted to sort by one element, say the tall/short element, I could do it via s = sorted(s, key = itemgetter(1)).
If I wanted to sort by both tall/short and colour, I could do the sort twice, once for each element, but is there a quicker way?
A key can be a function that returns a tuple:
s = sorted(s, key = lambda x: (x[1], x[2]))
Or you can achieve the same using itemgetter (which is faster and avoids a Python function call):
import operator
s = sorted(s, key = operator.itemgetter(1, 2))
And notice that here you can use sort instead of using sorted and then reassigning:
s.sort(key = operator.itemgetter(1, 2))
I'm not sure if this is the most pythonic method ...
I had a list of tuples that needed sorting 1st by descending integer values and 2nd alphabetically. This required reversing the integer sort but not the alphabetical sort. Here was my solution: (on the fly in an exam btw, I was not even aware you could 'nest' sorted functions)
a = [('Al', 2),('Bill', 1),('Carol', 2), ('Abel', 3), ('Zeke', 2), ('Chris', 1)]
b = sorted(sorted(a, key = lambda x : x[0]), key = lambda x : x[1], reverse = True)
print(b)
[('Abel', 3), ('Al', 2), ('Carol', 2), ('Zeke', 2), ('Bill', 1), ('Chris', 1)]
Several years late to the party, but I wanted to sort on two criteria and also use reverse=True. In case someone else wants to know how, you can wrap your criteria (functions) in parentheses to form a tuple:
s = sorted(my_list, key=lambda i: ( criteria_1(i), criteria_2(i) ), reverse=True)
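One thing worth spelling out is that reverse=True reverses the whole tuple ordering, not just the first criterion. The sketch below contrasts that with a two-pass stable sort for mixed directions; the `people` sample data is made up for illustration.

```python
from operator import itemgetter

people = [('Al', 2), ('Bill', 1), ('Carol', 2), ('Abel', 3)]  # sample data

# reverse=True flips every criterion: count descending AND name descending.
both_desc = sorted(people, key=lambda p: (p[1], p[0]), reverse=True)
print(both_desc)
# [('Abel', 3), ('Carol', 2), ('Al', 2), ('Bill', 1)]

# Mixed directions with two stable passes: secondary key first, primary key last.
mixed = sorted(people, key=itemgetter(0))               # name ascending
mixed = sorted(mixed, key=itemgetter(1), reverse=True)  # count descending, ties keep name order
print(mixed)
# [('Abel', 3), ('Al', 2), ('Carol', 2), ('Bill', 1)]
```

The second variant relies on Python's sort being stable, which also holds when reverse=True is used.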
It appears you could use a list instead of a tuple.
This becomes more important I think when you are grabbing attributes instead of 'magic indexes' of a list/tuple.
In my case I wanted to sort by multiple attributes of a class, where the incoming keys were strings. I needed different sorting in different places, and I wanted a common default sort for the parent class that clients were interacting with; only having to override the 'sorting keys' when I really 'needed to', but also in a way that I could store them as lists that the class could share
So first I defined a helper method
def attr_sort(self, attrs=['someAttributeString']):
    '''helper to sort by the attributes named by strings of attrs in order'''
    return lambda k: [ getattr(k, attr) for attr in attrs ]
then to use it
# would be defined elsewhere, but shown here for conciseness
self.SortListA = ['attrA', 'attrB']
self.SortListB = ['attrC', 'attrA']
records = .... #list of my objects to sort
records.sort(key=self.attr_sort(attrs=self.SortListA))
# perhaps later nearby or in another function
more_records = .... #another list
more_records.sort(key=self.attr_sort(attrs=self.SortListB))
This uses the generated lambda function to sort the list by object.attrA and then object.attrB, assuming the object has an attribute for each of the string names provided. The second case sorts by object.attrC, then object.attrA.
This also lets you expose sorting choices so they can be shared by a consumer or a unit test, or lets callers tell you how they want sorting done for some operation in your API by giving you only a list of names, without coupling them to your back-end implementation.
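The same idea can be sketched with operator.attrgetter from the standard library, which builds the key function from attribute names directly. The `Record` class and the `attr_sort_key` helper below are hypothetical stand-ins, not part of the answer above.

```python
from operator import attrgetter

class Record:
    """Hypothetical record type with the attribute names used above."""
    def __init__(self, attrA, attrB, attrC):
        self.attrA = attrA
        self.attrB = attrB
        self.attrC = attrC

def attr_sort_key(attrs):
    # attrgetter with several names returns a tuple of the attribute values;
    # with exactly one name it returns the bare value, which also works as a key.
    return attrgetter(*attrs)

records = [Record(2, 'b', 9), Record(1, 'z', 5), Record(2, 'a', 7)]
records.sort(key=attr_sort_key(['attrA', 'attrB']))
print([(r.attrA, r.attrB) for r in records])
# [(1, 'z'), (2, 'a'), (2, 'b')]
```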
Convert the list of lists into a list of tuples, then sort the tuples by multiple fields.
data=[[12, 'tall', 'blue', 1],[2, 'short', 'red', 9],[4, 'tall', 'blue', 13]]
data=[tuple(x) for x in data]
result = sorted(data, key = lambda x: (x[1], x[2]))
print(result)
output:
[(2, 'short', 'red', 9), (12, 'tall', 'blue', 1), (4, 'tall', 'blue', 13)]
Here's one way: rewrite your sort function to take a list of comparison functions, each of which compares one of the attributes you want to test. For each pair of items, try the comparison functions in order; as soon as one returns a non-zero value, stop and return that value.
You call it by passing a lambda that calls a function that loops over a list of lambdas.
Its advantage is that it makes a single pass through the data, rather than sorting an already sorted list as other methods do. Another is that it sorts in place, whereas sorted returns a new list.
I used it to write a rank function that ranks a list of objects, where each object belongs to a group and has a score, but you can add any list of attributes.
Note the un-lambda-like, rather hackish use of a lambda to call a setter.
The rank part won't work for a list of lists, but the sort will.
#First, here's a pure list version
#(Python 3: cmp() no longer exists, so define it and wrap comparators with cmp_to_key)
from functools import cmp_to_key
def cmp(a, b):
    return (a > b) - (a < b)
my_sortLambdaLst = [lambda x,y:cmp(x[0], y[0]), lambda x,y:cmp(x[1], y[1])]
def multi_attribute_sort(x,y):
    r = 0
    for l in my_sortLambdaLst:
        r = l(x,y)
        if r!=0: return r #keep looping till you see a difference
    return r
Lst = [(4, 2.0), (4, 0.01), (4, 0.9), (4, 0.999),(4, 0.2), (1, 2.0), (1, 0.01), (1, 0.9), (1, 0.999), (1, 0.2) ]
Lst.sort(key=cmp_to_key(lambda x,y:multi_attribute_sort(x,y))) #The Lambda of the Lambda
for rec in Lst: print(str(rec))
Here's a way to rank a list of objects
class probe:
    def __init__(self, group, score):
        self.group = group
        self.score = score
        self.rank =-1
    def set_rank(self, r):
        self.rank = r
    def __str__(self):
        return '\t'.join([str(self.group), str(self.score), str(self.rank)])
def RankLst(inLst, group_lambda= lambda x:x.group, sortLambdaLst = [lambda x,y:cmp(x.group, y.group), lambda x,y:cmp(x.score, y.score)], SetRank_Lambda = lambda x, rank:x.set_rank(rank)):
    #Inner function is the only way (I could think of) to pass the sortLambdaLst into a sort function
    def multi_attribute_sort(x,y):
        r = 0
        for l in sortLambdaLst:
            r = l(x,y)
            if r!=0: return r #keep looping till you see a difference
        return r
    inLst.sort(key=cmp_to_key(lambda x,y:multi_attribute_sort(x,y)))
    #Now Rank your probes
    rank = 0
    last_group = group_lambda(inLst[0])
    for i in range(len(inLst)):
        rec = inLst[i]
        group = group_lambda(rec)
        if last_group == group:
            rank+=1
        else:
            rank=1
            last_group = group
        SetRank_Lambda(inLst[i], rank) #This is pure evil!! The lambda purists are gnashing their teeth
Lst = [probe(4, 2.0), probe(4, 0.01), probe(4, 0.9), probe(4, 0.999), probe(4, 0.2), probe(1, 2.0), probe(1, 0.01), probe(1, 0.9), probe(1, 0.999), probe(1, 0.2) ]
RankLst(Lst, group_lambda= lambda x:x.group, sortLambdaLst = [lambda x,y:cmp(x.group, y.group), lambda x,y:cmp(x.score, y.score)], SetRank_Lambda = lambda x, rank:x.set_rank(rank))
print('\t'.join(['group', 'score', 'rank']))
for r in Lst: print(r)
There is a < operator between lists, e.g.:
[12, 'tall', 'blue', 1] < [4, 'tall', 'blue', 13]
will give
False
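A short sketch of what that comparison does: lists (like tuples) compare element by element from the left, so the first differing position decides the result. The `rows` name is just the question's sample data.

```python
rows = [[12, 'tall', 'blue', 1],
        [2, 'short', 'red', 9],
        [4, 'tall', 'blue', 13]]

# Decided by the first elements alone: 12 < 4 is False.
first_vs_third = rows[0] < rows[2]
print(first_vs_third)  # False

# When one list is a prefix of the other, the shorter one sorts first.
prefix_case = [4, 'tall'] < [4, 'tall', 'blue']
print(prefix_case)  # True

# sorted() without a key therefore orders the rows by their first element here.
ordered = sorted(rows)
print(ordered)
# [[2, 'short', 'red', 9], [4, 'tall', 'blue', 13], [12, 'tall', 'blue', 1]]
```

This is the same ordering that tuple keys rely on in the answers above.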

Arrange elements with same count in alphabetical order

Python's collections.Counter.most_common(n) method returns the top n elements with their counts. However, if the counts for two elements are the same, how can I return the result sorted in alphabetical order?
For example, for a string like BBBAAACCD and n = 2 (that is, the elements with the two highest counts), I want the result to be:
[('A', 3), ('B', 3), ('C', 2)]
and NOT:
[('B', 3), ('A', 3), ('C', 2)]
Notice that although A and B have the same frequency, A comes before B in the resulting list since it comes first in alphabetical order.
How can I achieve that?
Although this question is already a bit old, I'd like to suggest a very simple solution that just involves sorting the input of Counter() before creating the Counter object itself. If you then call most_common(n), you will get the top n entries with ties in alphabetical order.
from collections import Counter
char_counter = Counter(sorted('ccccbbbbdaef'))
for char in char_counter.most_common(3):
print(*char)
resulting in the output:
b 4
c 4
a 1
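A small sketch of why this works, assuming Python 3.7 or later, where most_common is documented to order elements with equal counts by first appearance. Sorting the input makes first appearance coincide with alphabetical order.

```python
from collections import Counter

text = 'BBBAAACCD'

# Ties follow insertion order: 'B' was seen before 'A'.
unsorted_input = Counter(text).most_common(2)
print(unsorted_input)  # [('B', 3), ('A', 3)]

# After sorting the input, 'A' is inserted first, so it wins the tie.
sorted_input = Counter(sorted(text)).most_common(2)
print(sorted_input)    # [('A', 3), ('B', 3)]
```

Because this depends on tie-breaking by insertion order, an explicit sort key is the safer choice when the code has to run on older interpreters.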
There are two issues here:
The top n should be determined by distinct count values, so that every element tied on one of the n highest counts is included.
Elements with equal counts should be ordered alphabetically.
None of the solutions thus far address the first issue. You can use a heap queue with the itertools unique_everseen recipe (also available in 3rd party libraries such as toolz.unique) to calculate the nth largest count.
Then use sorted with a custom key.
from collections import Counter
from heapq import nlargest
from toolz import unique
x = 'BBBAAACCD'
c = Counter(x)
n = 2
nth_largest = nlargest(n, unique(c.values()))[-1]
def sort_key(x):
    return -x[1], x[0]
gen = ((k, v) for k, v in c.items() if v >= nth_largest)
res = sorted(gen, key=sort_key)
[('A', 3), ('B', 3), ('C', 2)]
I would first sort your output list in alphabetical order and then sort again by number of occurrences; because the sort is stable, ties keep their alphabetical order:
from collections import Counter
alphabetic_sorted = sorted(Counter('BBBAAACCD').most_common(), key=lambda tup: tup[0])
final_sorted = sorted(alphabetic_sorted, key=lambda tup: tup[1], reverse=True)
print(final_sorted[:3])
Output:
[('A', 3), ('B', 3), ('C', 2)]
I would go for:
sorted(Counter('AAABBBCCD').most_common(), key=lambda t: (-t[1], t[0]))
This sorts by count descending (the order most_common() already returns, which should help performance) and then by name ascending within each group of equal counts.
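When only the first n entries are needed, the same key can be handed to heapq.nsmallest instead of sorting everything. This is a sketch rather than a drop-in replacement, and the `top_n` name is made up for illustration.

```python
import heapq
from collections import Counter

def top_n(text, n):
    """Return the n (char, count) pairs with the highest counts, ties alphabetical."""
    counts = Counter(text)
    # The smallest (-count, char) keys are the largest counts, then the earliest letters.
    return heapq.nsmallest(n, counts.items(), key=lambda t: (-t[1], t[0]))

print(top_n('AAABBBCCD', 3))
# [('A', 3), ('B', 3), ('C', 2)]
```

For small inputs the difference from sorted(...)[:n] is negligible; the heap variant mainly pays off when n is much smaller than the number of distinct elements.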
This is one of the problems I got in an interview exam and failed to solve. I came home, slept for a while, and the solution came to mind.
from collections import Counter
def bags(items):
    cnt = Counter(items)
    print(cnt)
    order = sorted(cnt.most_common(2), key=lambda i:( i[1],i[0]), reverse=True)
    print(order)
    return order[0][0]
print(bags(['a','b','c','a','b']))
If you are okay with not using Counter:
s = "BBBAAACCD"
p = [(i,s.count(i)) for i in sorted(set(s))]
from collections import Counter
s = 'qqweertyuiopasdfghjklzxcvbnm'
s_list = list(s)
elements = Counter(s_list).most_common()
print(elements)
alphabet_sort = sorted(elements, key=lambda x: x[0])
print(alphabet_sort)
num_sort = sorted(alphabet_sort, key=lambda x: x[1], reverse=True)
print(num_sort)
If you need only the first few entries, take a slice:
print(num_sort[:3])
from collections import Counter
print(sorted(Counter('AAABBBCCD').most_common(3)))
This question seems to be a duplicate
How to sort Counter by value? - python

Spark select top values in RDD

The original dataset is:
# (numbersofrating,title,avg_rating)
newRDD =[(3,'monster',4),(4,'minions 3D',5),....]
I want to select the top N avg_ratings in newRDD. I used the following code, but it raises an error.
selectnewRDD = (newRDD.map(x, key =lambda x: x[2]).sortBy(......))
TypeError: map() takes no keyword arguments
The expected data should be:
# (numbersofrating,title,avg_rating)
selectnewRDD =[(4,'minions 3D',5),(3,'monster',4)....]
You can use either top or takeOrdered with key argument:
newRDD.top(2, key=lambda x: x[2])
or
newRDD.takeOrdered(2, key=lambda x: -x[2])
Note that top takes elements in descending order and takeOrdered in ascending order, so the key function is different in the two cases.
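As a rough local analogy in plain Python (this is not Spark code), heapq.nlargest and heapq.nsmallest show the same relationship between a descending selection and an ascending selection with a negated key. The third tuple in `ratings` is made-up sample data.

```python
import heapq

# (number_of_ratings, title, avg_rating); the last entry is hypothetical
ratings = [(3, 'monster', 4), (4, 'minions 3D', 5), (10, 'up', 3)]

# Descending selection with the plain key, analogous to top(2, key=...).
largest = heapq.nlargest(2, ratings, key=lambda x: x[2])

# Ascending selection with a negated key, analogous to takeOrdered(2, key=...).
smallest_negated = heapq.nsmallest(2, ratings, key=lambda x: -x[2])

print(largest)
print(smallest_negated)
# Both print [(4, 'minions 3D', 5), (3, 'monster', 4)]
```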
Have you tried using top? Given that you want the top avg ratings (and it is the third item in the tuple), you'll need to assign it to the key using a lambda function.
# items = (number_of_ratings, title, avg_rating)
newRDD = sc.parallelize([(3, 'monster', 4), (4, 'minions 3D', 5)])
top_n = 10
>>> newRDD.top(top_n, key=lambda items: items[2])
[(4, 'minions 3D', 5), (3, 'monster', 4)]

Sorting a dictionary by value then by key [duplicate]

This seems like it has to be a dupe but my SO-searching-fu is poor today...
Say I have a dictionary of integer keys and values. How can I sort the dictionary by value descending, then by key descending (for equal values)?
Input:
{12:2, 9:1, 14:2}
{100:1, 90:4, 99:3, 92:1, 101:1}
Output:
[(14,2), (12,2), (9,1)] # output from print
[(90,4), (99,3), (101,1), (100,1), (92,1)]
In [62]: y={100:1, 90:4, 99:3, 92:1, 101:1}
In [63]: sorted(y.items(), key=lambda x: (x[1],x[0]), reverse=True)
Out[63]: [(90, 4), (99, 3), (101, 1), (100, 1), (92, 1)]
The key=lambda x: (x[1],x[0]) tells sorted that for each item x in y.items(), use (x[1],x[0]) as the proxy value to be sorted. Since x is of the form (key,value), (x[1],x[0]) yields (value,key). This causes sorted to sort by value first, then by key for tie-breakers.
reverse=True tells sorted to present the result in descending, rather than ascending order.
See this wiki page for a great tutorial on sorting in Python.
PS. I tried using key=reversed instead, but reversed(x) returns an iterator, which does not compare as needed here.
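One way to get the "reversed" key without hitting the iterator problem is to slice the tuple, since slicing a tuple returns a new tuple that compares normally. A minimal sketch using the same dictionary:

```python
y = {100: 1, 90: 4, 99: 3, 92: 1, 101: 1}

# kv[::-1] turns (key, value) into (value, key), a plain tuple that sorts as expected.
by_value_then_key = sorted(y.items(), key=lambda kv: kv[::-1], reverse=True)
print(by_value_then_key)
# [(90, 4), (99, 3), (101, 1), (100, 1), (92, 1)]
```

Wrapping the iterator as tuple(reversed(kv)) would work just as well; the slice is simply shorter.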
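Maybe this is more explicit (Python 2 only: tuple-unpacking lambda parameters and the cmp argument of sorted were removed in Python 3):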
>>> y = {100:1, 90:4, 99:3, 92:1, 101:1}
>>> reverse_comparison = lambda (a1, a2), (b1, b2):cmp((b2, b1), (a2, a1))
>>> sorted(y.items(), cmp=reverse_comparison)
[(90, 4), (99, 3), (101, 1), (100, 1), (92, 1)]
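Try this (on Python 3, where sorted no longer accepts a comparison function directly, wrap it with functools.cmp_to_key):
>>> from functools import cmp_to_key
>>> d={100:1, 90:4, 99:3, 92:1, 101:1}
>>> sorted(d.items(), key=cmp_to_key(lambda a,b:b[1]-a[1] or b[0]-a[0]))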
