sort list of tuples with multiple criteria - python

I have a list of tuples of k elements. I'd like to sort with respect to element 0, then element 1 and so on and so forth. I googled but I still can't quite figure out how to do it. Would it be something like this?
list.sort(key = lambda x : (x[0], x[1], ...., x[k-1]))
In particular, I'd like to sort using different criteria, for example, descending on element 0, ascending on element 1 and so on.

Since python's sort is stable for versions after 2.2 (or perhaps 2.3), the easiest implementation I can think of is a serial repetition of sort using a series of index, reverse_value tuples:
# Specify the index, and whether reverse should be True/False
sort_spec = ((0, True), (1, False), (2, False), (3, True))
# Sort repeatedly from last tuple to the first, to have final output be
# sorted by first tuple, and ties sorted by second tuple etc
for index, reverse_value in sort_spec[::-1]:
    list_of_tuples.sort(key = lambda x: x[index], reverse=reverse_value)
This makes multiple passes, so it has a higher constant-factor cost than a single sort, but it is still O(n log n) asymptotically.
If the sort order for indices is truly 0, 1... n-1, n for a list of n-sized tuples as shown in your example, then all you need is a sequence of True and False to denote whether you want reverse or not, and you can use enumerate to add the index.
sort_spec = (True, False, False, True)
for index, reverse_value in list(enumerate(sort_spec))[::-1]:
    list_of_tuples.sort(key = lambda x: x[index], reverse=reverse_value)
The original code, by contrast, allows the flexibility of sorting by the indices in any order.
Incidentally, this "sequence of sorts" method is recommended in the Python Sorting HOWTO with minor modifications.
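As a rough illustration of the sequence-of-sorts approach described above, here is a minimal sketch wrapped in a helper. The function name multisort and the sample data are assumptions made for this example, not part of the question.

```python
def multisort(rows, sort_spec):
    """Sort rows in place by (index, reverse) pairs, most important first."""
    # Apply the least important criterion first; because the sort is stable,
    # its ordering survives among rows that tie on the more important criteria.
    for index, reverse_value in reversed(sort_spec):
        rows.sort(key=lambda row: row[index], reverse=reverse_value)
    return rows

# Hypothetical sample data
list_of_tuples = [(1, "b", 10), (2, "a", 5), (1, "a", 7), (2, "b", 5)]

# Descending on element 0, then ascending on element 1
result = multisort(list_of_tuples, [(0, True), (1, False)])
print(result)  # [(2, 'a', 5), (2, 'b', 5), (1, 'a', 7), (1, 'b', 10)]
```

Note that reverse=True still keeps tied rows in their existing relative order, which is what makes the earlier passes act as tie-breakers.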
Edit
If you didn't have the requirement to sort ascending by some indices and descending by others, then
from operator import itemgetter
list_of_tuples.sort(key = itemgetter(1, 3, 5))
will sort by index 1, then ties will be sorted by index 3, and further ties by index 5. However, changing the ascending/descending order of each index is non-trivial in one-pass.
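One possible way to get mixed ascending/descending order in a single pass is a comparison function adapted with functools.cmp_to_key. This is only a sketch: make_comparator and the sample data are hypothetical names introduced for this example, and it is likely slower than key-based sorting because the comparison runs in Python code.

```python
from functools import cmp_to_key

def make_comparator(sort_spec):
    """Build a comparison function from (index, reverse) pairs."""
    def compare(a, b):
        for index, reverse_value in sort_spec:
            if a[index] == b[index]:
                continue  # tie on this criterion; fall through to the next
            result = -1 if a[index] < b[index] else 1
            return -result if reverse_value else result
        return 0  # equal on every listed criterion
    return compare

# Hypothetical sample data
list_of_tuples = [(1, "b", 10), (2, "a", 5), (1, "a", 7), (2, "b", 5)]

# Descending on element 0, then ascending on element 1, in one sort call
spec = [(0, True), (1, False)]
one_pass = sorted(list_of_tuples, key=cmp_to_key(make_comparator(spec)))
print(one_pass)  # [(2, 'a', 5), (2, 'b', 5), (1, 'a', 7), (1, 'b', 10)]
```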

list.sort(key = lambda x : (x[0], x[1], ...., x[k-1]))
This is actually using the tuple as its own sort key. In other words, the same thing as calling sort() with no argument.
If I assume that you simplified the question, and the actual elements are actually not in the same order you want to sort by (for instance, the last value has the most precedence), you can use the same technique, but reorder the parts of the key based on precedence:
list.sort(key = lambda x : (x[k-1], x[1], ...., x[0]))
In general, this is a very handy trick, even in other languages like C++ (if you're using libraries): when you want to sort a list of objects by several members with varying precedence, you can construct a sort key by making a tuple containing all the relevant members, in the order of precedence.
Final trick (this one is off topic, but it may help you at some point): When using a library that doesn't support the idea of "sort by" keys, you can usually get the same effect by building a list that contains the sort key. So, instead of sorting a list of Obj, you would construct and then sort a list of tuples: (ObjSortKey, Obj). Also, just inserting the objects into a sorted set will work, if the sort key is unique. (The sort key would be the index, in that case.)
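The "list of (ObjSortKey, Obj) tuples" idea can be sketched in Python as follows. The Obj class and its fields are hypothetical. The sketch also adds the original position as a middle element, which is an assumption beyond the text above: it keeps the objects themselves from being compared when two sort keys tie.

```python
from dataclasses import dataclass

@dataclass
class Obj:  # hypothetical record type, not orderable on its own
    name: str
    priority: int

objs = [Obj("b", 2), Obj("a", 2), Obj("c", 1)]

# Decorate: (sort key, original position, object)
decorated = [((o.priority, o.name), i, o) for i, o in enumerate(objs)]
decorated.sort()

# Undecorate: keep only the objects, now in sorted order
result = [o for _, _, o in decorated]
print([o.name for o in result])  # ['c', 'a', 'b']
```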

So I am assuming you want to sort tuple_0 ascending, then tuple_1 descending, and so on. A bit verbose but this is what you might be looking for:
for i in range(len(list_of_tuples)):
    # Even positions: sort that tuple's elements ascending; odd positions: descending
    if i % 2 == 0:
        list_of_tuples[i] = tuple(sorted(list_of_tuples[i]))
    else:
        list_of_tuples[i] = tuple(sorted(list_of_tuples[i], reverse=True))
print(list_of_tuples)

Related

How do I sort lists by two different values within that list?

I have a list that is of the following form:
my_list= [['A',(3,4)],['A2',(6,11)],['U1',(2,9)],['P9',(1,9)], ['X',(10,4)]...]
I need to sort the letter/number combinations based on the list that corresponds with them, (1,2,3,4) for example.
The second number in the list needs to be in descending order. Then, the first number in that list needs to be in descending order. The last number takes priority over the first number.
These numbers correspond to the location of these values on an image. I am attempting to sort these by the way they appear on the image (top to bottom, left to right).
The correct order for the above list would be:
['A2',(6,11)], ['U1',(2,9)], ['P9',(1,9)], ['X',(10,4)], ['A',(3,4)]
To be frank, I do not know where to start with this. Could someone please explain how to properly write this in Python?
You can pass a key function to list.sort to specify what to sort by.
my_list.sort(key=lambda x: (-x[1][1], -x[1][0]))
In general: to sort by multiple keys, if you have a stable sort (and Python's sort is stable), then you can do it in steps from the least important key to the primary key. In your case by the first number descending and then by the second number also descending:
s0 = [['A',(3,4)],['A2',(6,11)],['U1',(2,9)],['P9',(1,9)], ['X',(10,4)]]
s1 = sorted(s0, key = lambda x: x[1][0], reverse=True)
print(s1) # intermediate result
s2 = sorted(s1, key = lambda x: x[1][1], reverse=True)
print(s2) # final result
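As a quick check, the single-pass tuple key and the two stable passes can be compared on the question's data. This is a small sketch; the names one_pass and two_pass are introduced here for illustration.

```python
s0 = [['A', (3, 4)], ['A2', (6, 11)], ['U1', (2, 9)], ['P9', (1, 9)], ['X', (10, 4)]]

# Single pass: second number descending, then first number descending
one_pass = sorted(s0, key=lambda x: (-x[1][1], -x[1][0]))

# Two stable passes: least important key first, primary key last
by_first = sorted(s0, key=lambda x: x[1][0], reverse=True)
two_pass = sorted(by_first, key=lambda x: x[1][1], reverse=True)

print(one_pass == two_pass)  # True
print(one_pass)
# [['A2', (6, 11)], ['U1', (2, 9)], ['P9', (1, 9)], ['X', (10, 4)], ['A', (3, 4)]]
```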

How can I sort a list by the second item descending and first one ascending?

I have a list like this:
list_results=[('Horror', 2), ('Romance', 2), ('Comedy', 2), ('History', 2), ('Adventure', 1), ('Action', 3)]
I wish to sort the number in descending order and if numbers were the same, according to the name in ascending order.
I tried the following code:
sortlist=sorted(list_results,key=lambda x:(x[1],x[0]))
and the reverse, but I couldn't figure out how to do it.
The answer that I'm looking for is:
[('Action', 3), ('Comedy', 2), ('History', 2), ('Horror', 2), ('Romance', 2), ('Adventure', 1)]
First sort the list by the first item, then by the second item:
list_results = sorted(list_results, key=lambda x:x[0])
list_results = sorted(list_results, key=lambda x:x[1], reverse=True)
or better yet without copying:
import operator
list_results.sort(key=operator.itemgetter(0))
list_results.sort(key=operator.itemgetter(1), reverse=True)
Python's sort algorithm is Timsort. It is a stable algorithm meaning if 2 values are the same, they'll stay in their original order.
If you sort alphabetically first, and then by the priority, the list will be sorted according to the alphabet, then re-sorted according to priority with alphabet being secondary.
You want to sort according to two criteria, with one criterion acting as a tie-breaker for the other. Since python's sorted and list.sort are guaranteed to be stable sorts, one solution is to sort the list twice: first sort it by the tie-breaker, then sort it by the main criterion. This is @Bharel's answer.
Another possibility is to sort only once, using a tuple as the key. Python's sorted and list.sort both offer a reverse= True or False argument to specify a sort in increasing or decreasing order; but in your case, we want to sort in decreasing order with respect to the first criterion, and increasing order with respect to the second criterion. The reverse keyword is not helpful because it is all-or-nothing: it will not allow us to choose which criterion to reverse.
Since the first criterion is numeric (an integer), a simple trick to sort in reverse order is to negate it with a minus sign:
sortlist = sorted(list_results, key=lambda x:(-x[1], x[0]))
Note the -x[1] instead of just x[1].
Here are two arguments in favour of sorting once by a tuple, rather than twice:
When sorting according to (-x[1], x[0]), it is immediately clear that -x[1] is the main criterion, and x[0] is only a tie-breaker. By contrast, if you sort twice, someone reading your code needs to take a second to understand that the last sort is the most important, and the previous sort is only there as a tie-breaker relying on sorted being a stable sort.
If the list is long, sorting once with a tuple key is probably faster than sorting twice with simple keys. This is especially true because the second key is a string; comparing strings is slower than comparing integers. If you use tuples, the strings will only be compared for two items that are tied on the first key; but if you sort twice, about n log(n) string comparisons will be performed in the first sort.
If your list is small, it probably doesn't matter which version is faster (unless you're repeatedly sorting lots of small lists...), so it's a matter of preference and readability.
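To check the speed claim on your own machine, a rough timing harness along these lines might help. Everything here is an assumption made for illustration: the random test data, the list size, and the helper names sort_once and sort_twice. No timings are quoted because they depend on the data and the Python version.

```python
import random
import string
import timeit

random.seed(0)
# Hypothetical data: random 8-letter names paired with small integer counts
data = [
    ("".join(random.choices(string.ascii_lowercase, k=8)), random.randint(1, 5))
    for _ in range(20_000)
]

def sort_once(items):
    # One pass with a tuple key: count descending, name ascending
    return sorted(items, key=lambda x: (-x[1], x[0]))

def sort_twice(items):
    # Two stable passes: tie-breaker first, main criterion last
    by_name = sorted(items, key=lambda x: x[0])
    return sorted(by_name, key=lambda x: x[1], reverse=True)

# Both approaches should produce the same ordering
assert sort_once(data) == sort_twice(data)

print("once: ", timeit.timeit(lambda: sort_once(data), number=5))
print("twice:", timeit.timeit(lambda: sort_twice(data), number=5))
```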

numpy.unique has the problem with frozensets

Just run the code:
import numpy as np
a = [frozenset({1,2}),frozenset({3,4}),frozenset({1,2})]
print(set(a)) # out: {frozenset({3, 4}), frozenset({1, 2})}
print(np.unique(a)) # out: [frozenset({1, 2}), frozenset({3, 4}), frozenset({1, 2})]
The first out is correct, the second is not.
The problem exactly is here:
a[0]==a[-1] # out: True
But set from np.unique has 3 elements, not 2.
I used to use np.unique to work with duplicates (for example with return_index=True and other options). What would you advise using instead of np.unique for these purposes?
numpy.unique operates by sorting, then collapsing runs of identical elements. Per the doc string:
Returns the sorted unique elements of an array.
The "sorted" part implies it's using a sort-collapse-adjacent technique (similar to what the *NIX sort | uniq pipeline accomplishes).
The problem is that while frozenset does define __lt__ (the overload for <, which most Python sorting algorithms use as their basic building block), it's not using it for the purposes of a total ordering like numbers and sequences use it. It's overloaded to test "is a proper subset of" (not including direct equality). So frozenset({1,2}) < frozenset({3,4}) is False, and so is frozenset({3,4}) > frozenset({1,2}).
Because the expected sort invariant is broken, sorting sequences of set-like objects produces implementation-specific and largely useless results. Uniquifying strategies based on sorting will typically fail under those conditions; one possible result is that it will find the sequence to be sorted in order or reverse order already (since each element is "less than" both the prior and subsequent elements); if it determines it to be in order, nothing changes, if it's in reverse order, it swaps the element order (but in this case that's indistinguishable from preserving order). Then it removes adjacent duplicates (since post-sort, all duplicates should be grouped together), finds none (the duplicates aren't adjacent), and returns the original data.
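The subset semantics of the comparison operators can be seen directly with a few built-in comparisons. This short sketch uses only frozenset itself; the variable names are chosen for illustration.

```python
a = frozenset({1, 2})
b = frozenset({3, 4})

# Neither set is a proper subset of the other, and they are not equal,
# so no ordering between them can be established with <
print(a < b, b < a, a == b)  # False False False

# < is True only for a proper subset
print(frozenset({1}) < a)    # True
print(a < a, a <= a)         # False True
```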
For frozensets, you probably want to use hash based uniquification, e.g. via set or (to preserve original order of appearance on Python 3.7+), dict.fromkeys; the latter would be simply:
a = [frozenset({1,2}),frozenset({3,4}),frozenset({1,2})]
uniqa = list(dict.fromkeys(a)) # Works on CPython/PyPy 3.6 as implementation detail, and on 3.7+ everywhere
It's also possible to use sort-based uniquification, but numpy.unique doesn't seem to support a key function, so it's easier to stick to Python built-in tools:
from itertools import groupby # With no key argument, can be used much like uniq command line tool
a = [frozenset({1,2}),frozenset({3,4}),frozenset({1,2})]
uniqa = [k for k, _ in groupby(sorted(a, key=sorted))]
That last line is a little dense, so I'll break it up:
sorted(a, key=sorted) - Returns a new list based on a where each element is sorted based on the sorted list form of the element (so the < comparison actually does put like with like)
groupby(...) returns an iterator of key/group-iterator pairs. With no key argument to groupby, it just means each key is a unique value, and the group-iterator produces that value as many times as it was seen.
[k for k, _ in ...] Since we don't care how many times each duplicate value was seen, we ignore the group-iterator (assigning to _ means "ignored" by convention) and have the list comprehension produce only the keys (the unique values).
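Since the question mentions return_index, here is a hedged sketch of how first-occurrence indices and counts could be collected with plain dictionaries. It is only roughly analogous to what np.unique reports: results come out in order of first appearance rather than sorted, and the names first_index, counts, and unique_items are made up for this example.

```python
a = [frozenset({1, 2}), frozenset({3, 4}), frozenset({1, 2})]

first_index = {}  # item -> index of its first occurrence
counts = {}       # item -> number of occurrences
for i, item in enumerate(a):
    first_index.setdefault(item, i)  # only the first index is recorded
    counts[item] = counts.get(item, 0) + 1

unique_items = list(first_index)  # dicts preserve insertion order on 3.7+
print(unique_items)                       # [frozenset({1, 2}), frozenset({3, 4})]
print([first_index[u] for u in unique_items])  # [0, 1]
print([counts[u] for u in unique_items])       # [2, 1]
```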

How to order a list of tuples by the integer value of a certain index of each, in Python?

Given a list of tuples e.g.
[('a','b','4'),('c','d','9'),('e','f','2')]
The third element of each tuple will always be the string value of an integer.
I want to write each tuple as a row of a csv using csv.writerow().
Before I do, I want to reorder the tuples (ideally by overwriting the existing list or creating a new one) such that they get written in descending order of the integer value of that third element of each e.g.
c,d,9
a,b,4
e,f,2
I'm trying to imagine some sort of multiple if/else combo in a list comprehension, but surely there's got to be a simpler way?
The sorted function (or the list method sort) takes optional arguments reverse to allow you to sort in decreasing order, and key to allow you to specify what to sort by.
l = [('a','b','4'),('c','d','9'),('e','f','2')]
l.sort(key=lambda x: int(x[2]), reverse=True)
gives you the list in the order you want.
In my answer I use sys.stdout as an example but you may use a file instead
>>> import sys, csv
>>> items = [('a','b','4'),('c','d','9'),('e','f','2')]
>>> w = csv.writer(sys.stdout)
>>> w.writerows(sorted(items, key=lambda x: int(x[2]), reverse=True))
c,d,9
a,b,4
e,f,2
This works in both Python 2 and Python 3:
x = [('a','b','4'),('c','d','9'),('e','f','2')]
x.sort(key=lambda x: int(x[2]), reverse=True)
key is a function applied to each item in the list and returns the key to be used as the basis for sorting.
Another option uses itemgetter, which is slightly faster than lambda x: x[2] for small lists and considerably faster for larger lists.
from operator import itemgetter
l = [('a','b','4'),('c','d','9'),('e','f','2')]
l.sort(key=itemgetter(2), reverse=True)  # compares the strings themselves, not their integer values
The Python Sorting Mini-HOWTO has many useful tricks and is worth reading.
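One caveat worth illustrating: a key that returns the raw string orders values lexicographically, which only matches numeric order when all the numbers have the same number of digits. The sample rows below are hypothetical and include a two-digit value to show the difference.

```python
from operator import itemgetter

# Hypothetical rows; '10' has more digits than the other values
rows = [('a', 'b', '10'), ('c', 'd', '9'), ('e', 'f', '2')]

by_string = sorted(rows, key=itemgetter(2), reverse=True)
by_int = sorted(rows, key=lambda x: int(x[2]), reverse=True)

print(by_string)  # [('c', 'd', '9'), ('e', 'f', '2'), ('a', 'b', '10')]
print(by_int)     # [('a', 'b', '10'), ('c', 'd', '9'), ('e', 'f', '2')]
```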
Slightly different, albeit less efficient solution:
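def sort_tuple_list(l):
    # Distinct integer values, largest first; using a set avoids appending
    # a tuple more than once when several tuples share the same value
    sorted_ints = sorted({int(i[2]) for i in l}, reverse=True)
    sorted_tuple_list = []
    for i in sorted_ints:
        for tup in l:
            if int(tup[2]) == i:
                sorted_tuple_list.append(tup)
    return sorted_tuple_list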
Returns a list sorted according to the original question's specifications.
You can then simply write each row of this returned list to your csv file.

Select maximum value in a list and the attributes related with that value

I am looking for a way to select the maximum value in a list of numbers in order to get the attributes associated with it.
data
[(14549.020163184512, 58.9615170298556),
(18235.00848249135, 39.73350448334156),
(12577.353023695543, 37.6940001866714)]
I wish to extract (18235.00848249135, 39.73350448334156) in order to have 39.73350448334156. The previous list (data) is built up from an empty list, data=[]. Is a list the best format for storing data in a loop?
You can get it by :
max(data)[1]
since tuples will be compared by the first element by default.
max(data)[1]
Tuples are compared by their first elements, then by their second elements if the first are equal. So max(data) returns the tuple with the largest first element.
[1] returns then the second element from the "maximal" object.
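If you would rather state the criterion explicitly than rely on tuple comparison, max also accepts a key function. A minimal sketch on the question's data follows; the names best, value, and attribute are chosen here for illustration.

```python
from operator import itemgetter

data = [
    (14549.020163184512, 58.9615170298556),
    (18235.00848249135, 39.73350448334156),
    (12577.353023695543, 37.6940001866714),
]

# Compare on the first element only; the second element is never consulted
best = max(data, key=itemgetter(0))
value, attribute = best
print(attribute)  # 39.73350448334156
```

With key=itemgetter(0), ties on the first element resolve to the earliest such tuple instead of falling back to the second element.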
Hmm, it seems easy or what?)
max(a)[1] ?
You can actually sort on any attribute of the list. You can use itemgetter. Another way to sort is to use an explicit comparison function (useful when you would otherwise need multiple levels of itemgetter, so the code below is more readable).
from functools import cmp_to_key  # Python 3 removed cmp() and the cmp= argument
dist = [(1, {'a': 1}), (7, {'a': 99}), (-1, {'a': 99})]
def cmp(x, y):
    return (x > y) - (x < y)  # stand-in for Python 2's built-in cmp
def my_cmp(x, y):
    tmp = cmp(x[1]['a'], y[1]['a'])
    if tmp == 0:
        return -cmp(x[0], y[0])
    return tmp
dist.sort(key=cmp_to_key(my_cmp))  # sorts ascending on key 'a' of the second item, then descending on the first item
