Explanation of custom key (helper) sorting function - python

I encountered the following example in a tutorial I watched recently.
We want to sort these numbers:
numbers = [8, 3, 1, 2, 5, 4, 7, 6]
prioritising the ones belonging to the following group:
group = {2, 3, 5, 7}
So the helper (sorting key) function implemented by the author was the following:
def helper(x):
    if x in group:
        return (0, x)
    return (1, x)
and it sorts by calling
numbers.sort(key=helper)
I can't seem to get my head around this return (0, x) vs. return (1, x), which is most likely something easy to explain (but perhaps I am missing something about how the sorting helper function works).

What that key function does is, instead of comparing
[8, 3, 1, 2, 5, 4, 7, 6]
it compares
[(1, 8), (0, 3), (1, 1), (0, 2), (0, 5), (1, 4), (0, 7), (1, 6)]
Tuples are sorted lexicographically, meaning that the first elements are compared first. If they differ, the comparison stops. If they're the same, the second elements are compared.
This has the effect of bringing all numbers in group to the front (in numerical order) followed by the rest of the numbers (also in numerical order).
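As a small sketch of the above, the keys can be computed explicitly to see what the sort actually compares. The names keys and result are only for illustration; sorted() is used instead of list.sort() so that the original list stays untouched.

```python
group = {2, 3, 5, 7}
numbers = [8, 3, 1, 2, 5, 4, 7, 6]

def helper(x):
    # Members of the group get a leading 0, everything else a leading 1.
    if x in group:
        return (0, x)
    return (1, x)

# The tuples that the sort compares instead of the raw numbers.
keys = [helper(x) for x in numbers]
print(keys)

# Tuples compare element by element, so every (0, ...) key
# sorts before every (1, ...) key.
print((0, 7) < (1, 1))

# Sorting by the key puts the group first, each part in numerical order.
result = sorted(numbers, key=helper)
print(result)
```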

Well, (0, x) is smaller than (1, x). In short, Python will first compare the first element, if they are the same, then the second, then the third...
Is it clear enough? I mean, in your example, all elements in that group will be considered as smaller than the elements which are not in that group.

When the line numbers.sort(key=helper) executes, Python iterates over each element of the list numbers.
While iterating, it calls the helper function once for each element.
If this element is a part of the group, it returns (0, element).
If it isn't a part of the group, it returns (1, element).
Now, while sorting, the elements to be sorted are [(0,x), (1,x), (0,x)...] not the actual elements.
It compares two tuples in the list and checks whether one is >, < or = to the other.
While comparing two tuples, it first compares the values at the 0th index of each tuple.
If those are equal, it compares the values at the 1st index, and so on.
This results in the following output:
>>> numbers
[2, 3, 5, 7, 1, 4, 6, 8]
If the first item of each tuple had been a character instead, the tuples would have been ordered by the characters' Unicode code points (which are the same as the ASCII values for ASCII characters).

Related

Finding the number of elements in the first index of the array given (Python)

I'm currently working on an HC single linkage method in Python, and when I'm clustering each element I'm storing the current clusters as:
array1=[(0, 1), (3, 5), 2, 4]
elements1=len(array1[0])
array2=[((3, 5), 4), (0, 1), 2]
elements2=len(array2[0])
What I want to do is find the number of elements at the zeroth index of each array. For the first one, elements1, I get the correct answer of 2, but for elements2 I also get 2 when it should be 3. I can see the problem, but I can't seem to work around it.
In array2, the first index holds a tuple of size 2: its first element is another tuple, (3, 5), and its second element is a single int, 4.
If you want len(array2[0]) to return 3, you have to unpack all the values present at that index.
That can be done by the following code:
length = len(array2[0])
if isinstance(array2[0][0], tuple):
    # replacing one with the size of the tuple
    length -= 1
    length += len(array2[0][0])
You can do this in a loop to check for all instances.
I tried to find a method to unpack the values you have, but that did not work, this was the next best solution.
Let me know if it works!
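One way to generalise the check to any depth of nesting is a small recursive counter. This is only a sketch: count_leaves is a hypothetical helper name, and it assumes that clusters are always built from tuples and that anything that is not a tuple counts as a single element.

```python
def count_leaves(item):
    """Count the non-tuple values inside an arbitrarily nested tuple."""
    if isinstance(item, tuple):
        # A tuple contributes the total of whatever its children contain.
        return sum(count_leaves(child) for child in item)
    # Anything else (an int here) is a single element.
    return 1

array1 = [(0, 1), (3, 5), 2, 4]
array2 = [((3, 5), 4), (0, 1), 2]

elements1 = count_leaves(array1[0])
elements2 = count_leaves(array2[0])
print(elements1, elements2)
```

The same call also works on entries that are plain ints, such as array1[2], where it returns 1.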

Is there a pattern to itertools.permutations

When I am iterating over an itertools.permutations, I would like to know at what indexes specific combinations of numbers would show up, without slowly iterating over the whole thing.
For example:
When I have a list, foo, which equals list(itertools.permutations(range(10))), I would like to know at which indexes the first character will be a zero, and the seventeenth a three. A simple way to do this would be to check every combination and see whether it fits my requirement.
import itertools
n = 10
foo = list(itertools.permutations(range(n)))
solutions = []
for i, permutation in enumerate(foo):
    # note: index 16 only exists when n >= 17
    if permutation[0] == 0 and permutation[16] == 3:
        solutions.append(i)
However, as n gets larger, this becomes incredibly slow, and very memory inefficient.
Is there some pattern that I could use so that, instead of creating a long list, I could simply say that if (a*i+b)%c == 0 then I know that it will fit my pattern?
EDIT: in reality I will have many conditions, some of which also involve more than 2 positions, so I hope that by combining those conditions I can limit the number of possibilities to the point where this becomes doable. Also, the 100 might have been a bit big; I am expecting n to not get larger than 20.
You need a mapping between permutations of the non-fixed elements and the corresponding permutations with the fixed cells inserted. For example, if you count permutations over the list [0, 1, 2, 3, 4] and require the value 1 at index 0 and the value 2 at index 3, the permutation (0, 4, 3) will be mapped to (1, 0, 4, 2, 3). Tuples are not friendly for this because they are immutable, but lists have an insert method, which is pretty useful here. That's why I convert them to lists and then back to tuples.
import itertools
def item_padding(item, cells):
    # returns the padded item, e.g. (0, 4, 3) -> (1, 0, 4, 2, 3)
    listed_item = list(item)
    # insert in ascending index order so earlier inserts don't shift later ones
    for idx in sorted(cells):
        listed_item.insert(idx, cells[idx])
    return tuple(listed_item)
array = range(5)
cells = {0: 1, 3: 2}  # indexes and their fixed values
remaining_items = set(array) - set(cells.values())
print(list(map(lambda x: item_padding(x, cells), itertools.permutations(remaining_items))))
Output:
[(1, 0, 3, 2, 4), (1, 0, 4, 2, 3), (1, 3, 0, 2, 4), (1, 3, 4, 2, 0), (1, 4, 0, 2, 3), (1, 4, 3, 2, 0)]
To sum up, list conversions are quite slow as well as iterations. Despite that, I think this algorithm is a conceptually good example that reveals what can be done here. Use numpy instead if you really need to optimise it.
Update:
It takes about 6 seconds on my laptop if array is range(12) (with 3628800 permutations). That's three times longer than returning the tuples without padding.
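If the position of a padded tuple inside the full itertools.permutations output is also needed, it can be computed directly rather than by searching. The following is a sketch, not part of the answer above: permutation_rank is a hypothetical helper, and it assumes the elements are distinct and that the iterable passed to itertools.permutations is sorted, in which case the tuples are emitted in lexicographic order.

```python
import itertools
from math import factorial

def permutation_rank(perm):
    """Index of perm within itertools.permutations(sorted(perm))."""
    remaining = sorted(perm)
    rank = 0
    for position, value in enumerate(perm):
        # Number of still-unused values that are smaller than this one.
        smaller = remaining.index(value)
        # Each of them accounts for a full block of permutations of the rest.
        rank += smaller * factorial(len(perm) - 1 - position)
        remaining.remove(value)
    return rank

# Check against the brute-force enumeration for a small case.
all_perms = list(itertools.permutations(range(5)))
target = (1, 0, 4, 2, 3)
index = permutation_rank(target)
print(index, all_perms[index] == target)
```

Combined with the padding approach, this gives the index of every permutation that satisfies the fixed cells without building the full list.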

Find the points with the steepest slope python

I have a list of float points such as [x1,x2,x3,x4,....xn] that are plotted as a line graph. I would like to find the set of points where the slope is the steepest.
Right now, I'm calculating the difference between a set of points in a loop and using the max() function to determine the maximum point.
Any other elegant way of doing this?
Assuming points is the list of your values, you can calculate the differences in a single line using:
max_slope = max([abs(z - x) for x, z in zip(points[:-1], points[1:])])
But what you gain in compactness, you probably lose in readability.
What happens in this list comprehension is the following:
Two lists are created based on the original one, namely points[:-1] and points[1:]. points[:-1] starts from the beginning of the original list and goes to the second-to-last item (inclusive). points[1:] starts from the second item and goes all the way to the last item (inclusive again).
Example
example_list = [1, 2, 3, 4, 5]
ex_a = example_list[:-1] # [1, 2, 3, 4]
ex_b = example_list[1:] # [2, 3, 4, 5]
Then you zip the two lists, creating an object from which you can draw x, z pairs to calculate your differences. Note that zip does not create a list in Python 3, so you need to pass its return value to list() if you want to see the pairs.
Like:
example_list = [1, 2, 3, 4, 5]
ex_a = example_list[:-1] # [1, 2, 3, 4]
ex_b = example_list[1:] # [2, 3, 4, 5]
print(list(zip(ex_a, ex_b))) # [(1, 2), (2, 3), (3, 4), (4, 5)]
Finally, you calculate the differences using the created pairs, store the results in a list and get the maximum value.
If the location of the max slope is also interesting you can get the index from the created list by using the .index() method. In that case, though, it would probably be better to save the list created by the comprehension and not just use it.
Numpy has a number of tools for working with arrays. For example, you could:
import numpy as np
xx = np.array([x1, x2, x3, x4, ...]) # your list of values goes in there
print(np.argmax(xx[:-1] - xx[1:])) # for all python versions
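To also get the location of the steepest segment, the list of differences can be kept and searched by magnitude. This is a sketch under two assumptions: the points are evenly spaced (so a difference stands in for the slope), and "steepest" means the largest absolute change, whether rising or falling. The sample values are made up for illustration.

```python
points = [1.0, 2.5, 2.0, 6.0, 5.5]  # made-up sample data

# Difference between each point and the next one (later minus earlier).
diffs = [b - a for a, b in zip(points[:-1], points[1:])]

# Index of the segment with the largest absolute change.
steepest = max(range(len(diffs)), key=lambda i: abs(diffs[i]))

# The segment runs from points[steepest] to points[steepest + 1].
print(steepest, points[steepest], points[steepest + 1], diffs[steepest])
```

Dropping abs() would instead pick the steepest rise only.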

Bisect keep indexes of inserted items

I am using the bisect module to keep a list sorted while inserting numbers.
Let's say I am going to insert three numbers, 9, 2, 5, in this order.
The final state of this list would obviously be [2, 5, 9]. However, is there any way to find the original insertion index of each number in the sorted list? For this list it would be [1, 2, 0]. In other words, I need the list of indexes [0, 1, 2] as it looks after the sort, which bisect performs at each insertion; that's why I could not find a way. I could just sort with the key feature of the sorted function, but I don't want to increase the complexity. So my question is: is this achievable with the bisect module?
Here is the code I use,
import bisect
lst = []
bisect.insort(lst, 9)
bisect.insort(lst, 2)
bisect.insort(lst, 5)
print(lst)
Edit: Another example would be that I am going to insert the numbers 4, 7, 1, 2, 9 into some empty list. (Let's first assume, without bisect, that I already have the numbers in the list.)
[4, 7, 1, 2, 9]
# indexes [0, 1, 2, 3, 4], typical enumeration
after sorting,
[1, 2, 4, 7, 9]
# now the index list [2, 3, 0, 1, 4]
Can it be done with bisect without increasing complexity?
Note: The order of the insertion is not arbitrary. It is known; that's why I try to use indexes with bisect.
insort has no idea in what order the items were inserted. You'll have to add that logic yourself. One way to do so could be to insert 2-tuples consisting of the value and the index:
bisect.insort(lst, (9, 0))
bisect.insort(lst, (2, 1))
bisect.insort(lst, (5, 2))
You would need to keep track of the index yourself as you're adding objects, but as sequences are sorted first by the first item, then by the next, etc., this will still sort properly without any extra effort.
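As a sketch of how this plays out for the second example in the question, enumerate can supply the insertion index, and the two halves of each tuple can be split apart afterwards. The names sorted_values and original_indexes are only for illustration.

```python
import bisect

values = [4, 7, 1, 2, 9]  # insertion order from the question
lst = []

for index, value in enumerate(values):
    # Store the value together with the position at which it was inserted.
    bisect.insort(lst, (value, index))

# Split the sorted (value, index) pairs back into two lists.
sorted_values = [value for value, _ in lst]
original_indexes = [index for _, index in lst]

print(sorted_values)
print(original_indexes)
```

Equal values are ordered by their insertion index, since the second tuple item breaks the tie.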

Making a sequence of tuples unique by a specific element

So I have a tuple of tuples
a = ((1, 2), (7, 2), (5, 2), (3, 4), (8, 4))
I would like to remove all the tuples from 'a' which have a common second element, except for one (any one of them).
For the example above I would like the new output a = ((1,2),(3,4))
In other words I would like to eliminate tuples which are considered duplicate elements in the second position of the tuple.
I would like to know the most efficient way to achieve this, and also like to know if I can do the same with lists instead of tuples?
You could create a dictionary from your elements, with whatever you wanted to be unique as the key, then extracting the values. This works for anything where the 'unique' sub-element is hashable. Integers are hashable:
def unique_by_key(elements, key=None):
    if key is None:
        # no key: the whole element must be unique
        key = lambda e: e
    # list() so that a list is returned on Python 3 as well
    return list({key(el): el for el in elements}.values())
This function is pretty generic; it can be used to extract 'unique' elements by any trait, as long as whatever the key callable returns can be used as a key in a dictionary. On Python 3.7 and later the result follows the order in which each key first appears; on older versions the order is not guaranteed. Currently the last element per key wins.
With the above function you can use an operator.itemgetter() object or a lambda to extract your second value from each element. This then works for both a sequence of tuples and a sequence of lists:
from operator import itemgetter
unique_by_second_element = unique_by_key(a, key=itemgetter(1))
Demo:
>>> from operator import itemgetter
>>> a = ((1, 2), (7, 2), (5, 2), (3, 4), (8, 4))
>>> unique_by_key(a, key=itemgetter(1))
[(5, 2), (8, 4)]
>>> b = [[1, 2], [7, 2], [5, 2], [3, 4], [8, 4]]
>>> unique_by_key(b, key=itemgetter(1))
[[5, 2], [8, 4]]
Note that the function always returns a list; you can always convert that back by calling tuple() on the result.
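If the first element per key should win instead, which matches the ((1, 2), (3, 4)) output asked for in the question, a variant along these lines might do. It is a sketch: unique_by_key_first is a hypothetical name, and the output order relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7.

```python
from operator import itemgetter

def unique_by_key_first(elements, key):
    """Keep the first element seen for each key value."""
    seen = {}
    for el in elements:
        # setdefault only stores el if this key has not been seen yet.
        seen.setdefault(key(el), el)
    return list(seen.values())

a = ((1, 2), (7, 2), (5, 2), (3, 4), (8, 4))
first_wins = tuple(unique_by_key_first(a, key=itemgetter(1)))
print(first_wins)

b = [[1, 2], [7, 2], [5, 2], [3, 4], [8, 4]]
first_wins_lists = unique_by_key_first(b, key=itemgetter(1))
print(first_wins_lists)
```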
