Sorting by multiple conditions in Python

I am new to programming and am currently writing a league table in Python. I would like to sort my league first by points; if two teams have the same points, by goal difference; and if they also have the same goal difference, by name.
The first condition is pretty easy and works as follows:
table.sort(reverse=True, key=Team.getPoints)
How do I add the two other conditions?

Have the key function return a tuple, with items in decreasing order of priority:
table.sort(reverse=True, key=lambda team: (Team.getPoints(team),
                                           Team.getGoalDifference(team),
                                           Team.getName(team)))
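One thing to watch with reverse=True is that it reverses every element of the tuple, so tied teams would come out in descending name order too. A sketch of one way around that, assuming points and goal difference are numeric: negate them and drop reverse. The Team class and the sample teams below are hypothetical stand-ins that use plain attributes instead of getter methods.

```python
class Team:
    # Hypothetical minimal stand-in for the asker's Team class.
    def __init__(self, name, points, goal_difference):
        self.name = name
        self.points = points
        self.goal_difference = goal_difference


table = [
    Team("Lions", 10, 3),
    Team("Bears", 10, 5),
    Team("Apes", 10, 3),
    Team("Wolves", 12, -1),
]

# Negating the numeric fields sorts them in descending order,
# while the name (last tuple item) stays in ascending order.
table.sort(key=lambda t: (-t.points, -t.goal_difference, t.name))

names = [t.name for t in table]
print(names)
```

Negation only works for numeric keys; for mixed directions on non-numeric keys, the multi-pass approach is the simpler option.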
Alternatively, you could remember a fact from algorithms 101: .sort() is a stable sort, so it doesn't change the relative order of items in a list that compare as equal. This means you can sort three times, in increasing order of priority:
table.sort(reverse=True, key=Team.getName)
table.sort(reverse=True, key=Team.getGoalDifference)
table.sort(reverse=True, key=Team.getPoints)
This will be slower, but allows you to easily specify whether each step should be done in reverse or not. This can be done without multiple sorting passes using cmp_to_key(), but the comparator function would be nontrivial, something like:
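import functools

def team_cmp(t1, t2):
    # A comparator must test the highest-priority key first
    # and only fall through to the next key on a tie.
    for key_func, reverse in [(Team.getPoints, True),
                              (Team.getGoalDifference, True),
                              (Team.getName, True)]:
        a, b = key_func(t1), key_func(t2)
        result = (a > b) - (a < b)  # stands in for cmp(), which Python 3 removed
        if reverse:
            result = -result
        if result:
            return result
    return 0

table.sort(key=functools.cmp_to_key(team_cmp))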
(Disclaimer: the above is written from memory, untested.) Emphasis is on "without multiple passes", which does not necessarily imply "faster". The overhead from the comparator function and cmp_to_key(), both of which are implemented in Python (as opposed to list.sort() and operator.itemgetter(), which should be part of the C core) is likely to be significant.
As an aside, you don't need to create dummy functions to pass to the key parameters. You can access the attribute directly, using:
table.sort(key=lambda t: t.points)
or the attrgetter wrapper from the operator module (from operator import attrgetter):
table.sort(key=attrgetter('points'))

Sort the list by name first, then sort again by score difference. Python's sort is stable, meaning it will preserve order of elements that compare equal.
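The multi-pass idea can be sketched as follows. The Team namedtuple and the sample data are made up for illustration; the point is that each pass can pick its own direction, and that the lowest-priority key is sorted first.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical record type standing in for the asker's Team class.
Team = namedtuple("Team", "name points goal_difference")

league = [
    Team("Lions", 10, 3),
    Team("Bears", 10, 5),
    Team("Apes", 10, 3),
    Team("Wolves", 12, -1),
]

# Sort by the least important key first and the most important key last.
# Each later pass keeps the earlier order for elements that tie.
league.sort(key=attrgetter("name"))                           # ascending
league.sort(key=attrgetter("goal_difference"), reverse=True)  # descending
league.sort(key=attrgetter("points"), reverse=True)           # descending

ranking = [team.name for team in league]
print(ranking)
```

Passing reverse=True does not break stability: elements with equal keys still keep their previous relative order.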

Python's sorting algorithm is Timsort, which, as ACEfanatic02 points out, is stable: elements that compare equal keep their relative order. This link has a nice visual explanation of how it works.

Related

Why does a set display in same order if sets are unordered?

I'm taking a first look at the python language from Python wikibook.
For sets the following is mentioned:
We can also have a loop move over each of the items in a set. However, since sets are unordered, it is undefined which order the iteration will follow.
and the code example given is :
s = set("blerg")
for letter in s:
    print(letter)
Output:
r b e l g
When I run the program I get the results in the same order, no matter how many times I run. If sets are unordered and order of iteration is undefined, why is it returning the set in the same order? And what is the basis of the order?
They are not randomly ordered, they are arbitrarily ordered. It means you should not count on the order of insertions being maintained as the actual internal implementation details determine the order instead.
The order depends on the insertion and deletion history of the set.
In CPython, sets use a hash table, where inserted values are slotted into a sparse table based on the value returned from the hash() function, modulo the table size and a collision handling algorithm. Listing the set contents then returns the values as ordered in this table.
If you want to go into the nitty-gritty technical details then look at Why is the order in dictionaries and sets arbitrary?; sets are, at their core, dictionaries where the keys are the set values and there are no associated dictionary values. The actual implementation is a little more complicated, as always, but that answer will suffice to get you most of the way there. Then look at the C source code for set for the rest of those details.
Compare this to lists, which do have a fixed order that you can influence; you can move items around in the list and the new ordering would be maintained for you.
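A small sketch of what this means in practice. It deliberately makes no claim about the actual iteration order, since that is an implementation detail; the values 0 and 8 are arbitrary and chosen only for illustration.

```python
# Two sets with the same members but different insertion histories.
first = set()
first.add(0)
first.add(8)

second = set()
second.add(8)
second.add(0)

# Equality only looks at membership, never at iteration order.
same_members = first == second

# Iteration order is arbitrary and may differ between the two sets,
# so request an explicit order whenever a predictable one is needed.
ordered = sorted(first)

print(same_members, ordered)
```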

Python 3 sorting: Custom comparer removed in favor of key - why?

In Python 2.4, you can pass a custom comparer to sort.
Let's take the list -
list=[5,1,2,3,6,0,7,1,4]
To sort with the even numbers first, and then odds, we can do the following -
evenfirst=lambda x,y:1 if x%2>y%2 else -1 if y%2>x%2 else x-y
list.sort(cmp=evenfirst)
list == [0, 2, 4, 6, 1, 1, 3, 5, 7] # True
In Python 3, you can only pass key (which is also supported in Python 2.4).
Of course, the same sorting can be achieved in Python 3 with the right key:
list.sort(key=lambda x:[x%2,x])
I am curious about the decision of not supporting custom comparers anymore, especially when it seems something that could be implemented easily enough.
Is it true that in all, or most of the cases, a desired sort order has a natural key?
In the example above for example, such a key exists - and actually the code becomes more succinct using it. Is it always the case?
(I am aware of this recipe for converting comparer to key, but ideally, one should not have to take such workarounds if it could be built into the language.)
Performance.
The cmp function was called every time the sorting algorithm needed a comparison between two elements.
In contrast, the key object can be cached. That is, the sorting algorithm only needs to get the key once for each element and then compare the keys. It doesn't need to get a new key for every comparison.
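The difference in call counts can be observed directly. This is a rough sketch using the even-first ordering from the question; the counter names and helper functions are invented for the demonstration, and the exact number of comparator calls depends on the sorting algorithm.

```python
import functools

data = [5, 1, 2, 3, 6, 0, 7, 1, 4]
key_calls = 0
cmp_calls = 0


def counting_key(x):
    # Called once per element; the sort reuses the returned key.
    global key_calls
    key_calls += 1
    return (x % 2, x)


def counting_cmp(x, y):
    # Called once per comparison the sort performs.
    global cmp_calls
    cmp_calls += 1
    kx, ky = (x % 2, x), (y % 2, y)
    return (kx > ky) - (kx < ky)


by_key = sorted(data, key=counting_key)
by_cmp = sorted(data, key=functools.cmp_to_key(counting_cmp))

print(by_key, key_calls, cmp_calls)
```

Both calls produce the same ordering, but the key function runs exactly len(data) times, while the comparator runs once for every comparison.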
Sorting by keys is well-defined, meaning the result doesn't depend on which (stable) sorting algorithm you use. There's no pathological key function. You might suggest random.random(), but that simply shuffles the list.
Sorting with a compare function, by contrast, is well-defined only if the function is transitive and antisymmetric, which Python can neither test nor prove. What happens if you sort with a nonsense compare function such as lambda x, y: 1? You can't say; the result depends on the algorithm. Some algorithms might not even terminate.

Look up python dict value by expression

I have a dict that has unix epoch timestamps for keys, like so:
lookup_dict = {
    1357899: {},  # some dict of data
    1357910: {},  # some other dict of data
}
Except, you know, millions and millions and millions of entries. I'd like to subset this dict, over and over again. Ideally, I'd love to be able to write something like I can in R, like:
lookup_value = 1357900
dict_subset = lookup_dict[key >= lookup_value]
# dict_subset now contains {1357910: {}}
But I confess, I can't find any actual proof that this is something Python can do without having, one way or the other, to iterate over every row. If I understand Python correctly (and I might not), key lookup of the form key in dict uses binary search, and is thus very fast; any way to do a binary search, on dict keys?
To do this without iterating, you're going to need the keys in sorted order. Then you just need to do a binary search for the first one >= lookup_value, instead of checking each one for >= lookup_value.
If you're willing to use a third-party library, there are plenty out there. The first two that spring to mind are bintrees (which uses a red-black tree, like C++, Java, etc.) and blist (which uses a B+Tree). For example, with bintrees, it's as simple as this:
dict_subset = lookup_dict[lookup_value:]
And this will be as efficient as you'd hope—basically, it adds a single O(log N) search on top of whatever the cost of using that subset. (Of course usually what you want to do with that subset is iterate the whole thing, which ends up being O(N) anyway… but maybe you're doing something different, or maybe the subset is only 10 keys out of 1000000.)
Of course there is a tradeoff. Random access to a tree-based mapping is O(log N) instead of "usually O(1)". Also, your keys obviously need to be fully ordered, instead of hashable (and that's a lot harder to detect automatically and raise nice error messages on).
If you want to build this yourself, you can. You don't even necessarily need a tree; just a sorted list of keys alongside a dict. You can maintain the list with the bisect module in the stdlib, as JonClements suggested. You may want to wrap up bisect to make a sorted list object—or, better, get one of the recipes on ActiveState or PyPI to do it for you. You can then wrap the sorted list and the dict together into a single object, so you don't accidentally update one without updating the other. And then you can extend the interface to be as nice as bintrees, if you want.
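A minimal sketch of the sorted-list-plus-dict idea, assuming the keys change rarely enough that keeping a sorted list alongside the dict is affordable. The helper name subset_from and the sample data are made up for illustration.

```python
import bisect

lookup_dict = {
    1357899: {"a": 1},
    1357910: {"b": 2},
    1357905: {"c": 3},
}

# Keep the keys in sorted order next to the dict.
# If keys are added later, bisect.insort(sorted_keys, new_key) keeps it sorted.
sorted_keys = sorted(lookup_dict)


def subset_from(lookup_value):
    """Return the entries whose key is >= lookup_value."""
    # Binary search for the first key >= lookup_value: O(log N).
    start = bisect.bisect_left(sorted_keys, lookup_value)
    # Copying the matching entries costs O(k) for k matches.
    return {key: lookup_dict[key] for key in sorted_keys[start:]}


dict_subset = subset_from(1357900)
print(dict_subset)
```

Only the search is logarithmic; building the subset still costs time proportional to the number of matching keys.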
The following code will work:
some_time_to_filter_for = 1357900  # a Unix time, e.g. lookup_value from the question
# Create a new sub-dictionary
sub_dict = {key: val for key, val in lookup_dict.items()
            if key >= some_time_to_filter_for}
This iterates over every key in your dictionary and copies each entry whose key is greater than or equal to the filter time into a new dictionary. Note that it still visits every entry, so each query is O(N).

array vs hash key search

So I'm a longtime perl scripter who's been getting used to python since I changed jobs a few months back. Often in perl, if I had a list of values that I needed to check a variable against (simply to see if there is a match in the list), I found it easier to generate hashes to check against, instead of putting the values into an array, like so:
$checklist{'val1'} = undef;
$checklist{'val2'} = undef;
...
if (exists $checklist{$value_to_check}) { ... }
Obviously this wastes some memory because of the need for a useless right-hand value, but IMO it is more efficient and easier to code than looping through an array.
Now in Python, the code for this is exactly the same whether you're searching a list or a dictionary:
if value_to_check in checklist_which_can_be_list_or_dict:
    <code>
So my real question here is: in perl, the hash method was preferred for speed of processing vs. iterating through an array, but is this true in python? Given the code is the same, I'm wondering if python does list iteration better? Should I still use the dictionary method for larger lists?
Dictionaries are hashes. An in test on a list has to walk through the elements one by one, comparing each against the value, while an in test on a dictionary uses hashing to see if the key exists. Python just doesn't make you write the loop over the list explicitly.
Python also has a set datatype. It's basically a hash/dictionary without the right-hand values. If what you want is to be able to build up a collection of things, then test whether something is already in that collection, and you don't care about the order of the things or whether a thing is in the collection multiple times, then a set is exactly what you want!
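A short sketch of the three options side by side, reusing the placeholder values from the question. All three give the same answer; only the lookup strategy differs.

```python
# Build the collection once; membership tests on a set use hashing.
checklist = {"val1", "val2"}

value_to_check = "val2"
found = value_to_check in checklist

# The same test on a list gives the same answer but scans the elements.
checklist_as_list = ["val1", "val2"]
found_in_list = value_to_check in checklist_as_list

# Closest analogue of the Perl idiom: a dict whose values are unused (None).
checklist_as_dict = dict.fromkeys(["val1", "val2"])
found_in_dict = value_to_check in checklist_as_dict

print(found, found_in_list, found_in_dict)
```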

How to sort a list of inter-linked tuples?

lst = [(u'course', u'session'), (u'instructor', u'session'), (u'session', u'trainee'), (u'person', u'trainee'), (u'person', u'instructor'), (u'course', u'instructor')]
I have the above list of tuples, and I need to sort it with the following logic.
In each tuple, the 2nd element depends on the 1st element, e.g. (course, session) -> session depends on course, and so on.
I want a sorted list based on dependency priority: independent or less dependent objects come first, so the output should look like this:
lst = [course, person, instructor, session, trainee]
You're looking for what's called a topological sort. The wikipedia page shows the classic Kahn and depth-first-search algorithms for it; Python examples are here (a bit dated, but should still run fine), on pypi (stable and reusable -- you can also read the code online here) and here (Tarjan's algorithm, that kind-of also deals with cycles in the dependencies specified), just to name a few.
Conceptually, what you need to do is create a directed acyclic graph with edges determined by the contents of your list, and then do a topological sort on the graph. The algorithm to do this doesn't exist in Python's standard library (at least, not that I can think of off the top of my head), but you can find plenty of third-party implementations online, such as http://www.bitformation.com/art/python_toposort.html
The function at that website takes a list of all the strings, items, and another list of the pairs between strings, partial_order. Your lst should be passed as the second argument. To generate the first argument, you can use itertools.chain.from_iterable(lst), so the overall function call would be
import itertools
lst = ...
ordering = topological_sort(itertools.chain.from_iterable(lst), lst)
Or you could modify the function from the website to only take one argument, and to create the nodes in the graph directly from the values in your lst.
EDIT: Using the topsort module Alex Martelli linked to, you could just pass lst directly.
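For readers on Python 3.9 or later: the standard library has since gained a graphlib module with a TopologicalSorter class, so a third-party package is no longer strictly required. A sketch using the list from the question (with the u prefixes dropped). The order among items that do not depend on each other is not guaranteed; only that every prerequisite comes before its dependents.

```python
from graphlib import TopologicalSorter  # available from Python 3.9

lst = [
    ("course", "session"),
    ("instructor", "session"),
    ("session", "trainee"),
    ("person", "trainee"),
    ("person", "instructor"),
    ("course", "instructor"),
]

sorter = TopologicalSorter()
for prerequisite, dependent in lst:
    # add(node, *predecessors): the dependent node comes after its prerequisite.
    sorter.add(dependent, prerequisite)

# static_order() yields prerequisites before the nodes that depend on them.
ordering = list(sorter.static_order())
print(ordering)
```

If the pairs contain a cycle, static_order() raises graphlib.CycleError instead of returning an ordering.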
