Understanding heapq sorting algorithm

I am reading the book "Beginning Python: From Novice to Professional" by Magnus Lie Hetland (Third Edition) and came across heaps.
There he describes the ordering of a heap list: "the order of
the elements is important (even though it may look a bit haphazard.."
As I understand him, the heap algorithm has two rules for ordering the elements:
1) The element at position i is greater than the element at position i//2
If the first rule does not hold, then:
2) The element at position i is lower than the elements at positions 2*i and 2*i+1
I ran some code to check whether these rules hold all the time:
from heapq import *
from random import shuffle
data = list(range(10))
heap = []
shuffle(data)
for i in data:
    heappush(heap, i)
print(heap)
temp = False
#From p.240
#The order of the elements isn’t as arbitrary as it seems. They aren’t in
#strictly sorted order, but there is one
#guarantee made: the element at position i is always greater than the one
#in position i // 2 (or, conversely,
#it’s smaller than the elements at positions 2 * i and 2 * i + 1). This is
#the basis for the underlying heap
#algorithm. This is called the heap property.
for i in heap:
    print('___________')
    if heap[i] > heap[i//2]:
        print('First if: {}>{}'.format(heap[i],heap[i//2]))
        temp = True
        try:
            if heap[i] < heap[2*i]:
                print('Second if: {}<{}'.format(heap[i],heap[i*2]))
                temp = True
        except IndexError:
            pass
        try:
            if heap[i] < heap[2*i+1]:
                print('Third if: {}<{}'.format(heap[i],heap[i*2+1]))
                temp = True
        except IndexError:
            pass
    else:
        try:
            if heap[i] < heap[2*i]:
                print('Second if: {}<{}'.format(heap[i],heap[i*2]))
                temp = True
        except IndexError:
            pass
        try:
            if heap[i] < heap[2*i+1]:
                print('Third if: {}<{}'.format(heap[i],heap[i*2+1]))
                temp = True
        except IndexError:
            pass
    if not temp:
        print('No requirement was made')
    temp = False
    print('___________')
As expected, some inputs satisfied the rules and some did not, such as:
[0, 1, 2, 3, 5, 8, 7, 9, 4, 6]
[0, 3, 1, 5, 4, 6, 2, 7, 8, 9]
My question is: are there more ordering rules that apply when neither of these rules holds?

As mentioned in the comments, the rule you quoted is stated for arrays with 1-based indices. Python lists are 0-based, and thus
if a child is at heap[i], its parent in a Python heap is at heap[(i - 1) // 2], not at heap[i // 2]. Conversely, if a parent is at heap[j], then its children are at heap[j * 2 + 1] and heap[j * 2 + 2].
This is easy to see if you actually take the time to draw the heap:
   Example 1             Example 2           Python Index         1-based Index
       0                     0                     0                     1
   1       2             3       1             1       2             2       3
 3   5   8   7         5   4   6   2         3   4   5   6         4   5   6   7
9 4 6                 7 8 9                 7 8 9                 8 9 A
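
The 0-based parent rule can be checked mechanically. The following is a minimal sketch, not part of the original answer; the helper name `violations` is made up for illustration. It reports every child that is smaller than its parent at index (child - 1) // 2.

```python
import heapq
import random


def violations(heap):
    """Return (child_index, parent_index) pairs that break the min-heap property."""
    bad = []
    for child in range(1, len(heap)):
        parent = (child - 1) // 2  # 0-based parent of `child`
        if heap[parent] > heap[child]:
            bad.append((child, parent))
    return bad


# The two lists from the question satisfy the property with 0-based indices.
print(violations([0, 1, 2, 3, 5, 8, 7, 9, 4, 6]))  # []
print(violations([0, 3, 1, 5, 4, 6, 2, 7, 8, 9]))  # []

# A heap built with heappush from shuffled data should pass as well.
data = list(range(10))
random.shuffle(data)
heap = []
for value in data:
    heapq.heappush(heap, value)
print(heap, violations(heap))
```

Note that the loop runs over indices, not over the values stored in the heap, which is the other difference from the checking code in the question.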

Related

Python Ruler Sequence Generator

I have been struggling for a long time to figure how to define a generator function of a ruler sequence in Python, that follows the rules that the first number of the sequence (starting with 1) shows up once, the next two numbers will show up twice, next three numbers will show up three times, etc.
So what I am trying to get is 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7 etc.
I understand that the way to do this is to have two separate count generators (itertools.count(1)) and then for every number in one generator yield number from the other generator:
def rul():
    num = itertools.count(1)
    repeator = itertools.count(1)
    for x in range(next(repeator)):
        yield from num
But if I hit next() on this function, I get back just the regular 1,2,3,4.. sequence...
Any help on this would be appreciated.

How about regular old Python with no itertools?
def ruler():
    counter = 1
    n = 1
    while True:
        for i in range(counter):
            for j in range(counter):
                yield n
            n += 1
        counter += 1
In my humble opinion this is the clearest and most straightforward solution for these types of situations.

How about itertools.repeat?
import itertools
def ruler():
    num = rep_count = 0
    while True:
        rep_count += 1
        for i in range(rep_count):
            num += 1
            yield from itertools.repeat(num, rep_count)

You can obtain such a generator without writing your own function using count() and repeat() from itertools:
from itertools import repeat,count
i = count(1,1)
rul = (n for r in count(1,1) for _ in range(r) for n in repeat(next(i),r))
for n in rul: print(n, end = " ")
# 1 2 2 3 3 4 4 4 5 5 5 6 6 6 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 11 11 ...

If you want to go all in on itertools, you'll need count, repeat, and chain.
You can group the numbers in your sequence as follows, with
each group corresponding to a single instance of repeat:
1 # repeat(1, 1)
2 2 # repeat(2, 2)
3 3 # repeat(3, 2)
4 4 4 # repeat(4, 3)
5 5 5 # repeat(5, 3)
6 6 6 # repeat(6, 3)
7 7 7 7 # repeat(7, 4)
...
So we can define ruler_numbers = chain.from_iterable(map(repeat, col1, col2)), as long as we can define col1 and col2 appropriately.
col1 is easy: it's just count(1).
col2 is not much more complicated; we can group them similarly to the original sequence:
1 # repeat(1, 1)
2 2 # repeat(2, 2)
3 3 3 # repeat(3, 3)
4 4 4 4 # repeat(4, 4)
...
which we can also generate using chain.from_iterable and map:
chain.from_iterable(map(repeat, count(1), count(1))).
In the end, we get our final result in our best attempt at writing Lisp in Python :)
from itertools import chain, repeat, count
ruler_numbers = chain.from_iterable(
    map(repeat,
        count(1),
        chain.from_iterable(
            map(repeat,
                count(1),
                count(1)))))
or if you want to clean it up a bit with a helper function:
def concatmap(f, *xs):
    return chain.from_iterable(map(f, *xs))
ruler_numbers = concatmap(repeat,
                          count(1),
                          concatmap(repeat,
                                    count(1),
                                    count(1)))
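
Because the result is an infinite iterator, it is easiest to inspect a finite prefix with itertools.islice. This is a small sketch rather than part of the answer; wrapping the expression in a function (here called `ruler_numbers()`, an assumption for illustration) gives a fresh iterator on every call.

```python
from itertools import chain, count, islice, repeat


def concatmap(f, *xs):
    return chain.from_iterable(map(f, *xs))


def ruler_numbers():
    """Build a fresh ruler iterator, following the construction above."""
    repeat_counts = concatmap(repeat, count(1), count(1))  # 1, 2, 2, 3, 3, 3, ...
    return concatmap(repeat, count(1), repeat_counts)


first_terms = list(islice(ruler_numbers(), 18))
print(first_terms)
# [1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7]
```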

Finding number from list that respects conditions

I need to code a script that chooses a number from a user input (list) depending on two conditions:
Is a multiple of 3
Is the smallest of all numbers
Here is what I've done so far
def result(a, b):
    if a % 3 == 0 and a < b:
        print(a)
a = int(input())
r = list(map(int, input().split()))
result(a, r)
The problem is I need to create a loop that keeps verifying these conditions for the (x) number of inputs.

It looks like you want a to be values within r rather than its own input. Here's an example of iterating through r and checking which numbers are multiples of 3, and of finding the minimum of all the numbers (not necessarily only those which are multiples of 3):
r = list(map(int, input().split()))
for a in r:
    if a % 3 == 0:
        print(f"Multiple of 3: {a}")
print(f"Smallest of numbers: {min(r)}")
1 2 3 4 5 6 7 8 9 0
Multiple of 3: 3
Multiple of 3: 6
Multiple of 3: 9
Multiple of 3: 0
Smallest of numbers: 0

Doing this in one line, or with a generator, can save memory, since a generator expression yields values lazily instead of building an intermediate list:
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# The following is a generator
# Also: you need to decide if you want 0 to be included
all_threes = (x for x in my_list if x%3==0)
min_number = min(my_list)
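
If the goal is to combine the two conditions, that is, to pick the smallest number among the multiples of 3, the generator can be passed straight to min(). This is a sketch under that reading of the question; the function name `smallest_multiple_of_three` is made up for illustration, and `default=None` covers the case where no number qualifies.

```python
def smallest_multiple_of_three(numbers):
    """Return the smallest value divisible by 3, or None if there is none."""
    return min((x for x in numbers if x % 3 == 0), default=None)


print(smallest_multiple_of_three([7, 9, 4, 6, 12]))  # 6
print(smallest_multiple_of_three([1, 2, 4]))         # None
print(smallest_multiple_of_three([5, 3, 0, 9]))      # 0, because 0 % 3 == 0
```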

Sorting a random array using permutation

I tried to sort an array by permuting it with itself
(the array contains all the numbers in the range from 0 to its length - 1).
To test it I used random.shuffle, but it had some unexpected results:
import random
import numpy as np
a = np.array(range(10))
random.shuffle(a)
a = a[a]
a = a[a]
print(a)
# not a sorted array
# [9 5 2 3 1 7 6 8 0 4]
a = np.array([2,1,4,7,6,5,0,3,8,9])
a = a[a]
a = a[a]
print(a)
# [0 1 2 3 4 5 6 7 8 9]
So for some reason the permutation returns the sorted array, as expected, for the second example of an unsorted array, but the shuffled array doesn't behave the same way.
Does anyone know why? Or, if there is an easier way to sort using a permutation or something similar, that would be great.

TL;DR
There is no reason to expect a = a[a] to sort the array. In most cases it won't. In case of a coincidence it might.
What is the operation c = b[a]? or Applying a permutation
When you use an array a obtained by shuffling range(n) as an index array for an array b of the same size n, you are applying a permutation, in the mathematical sense, to the elements of b. For instance:
a = [2,0,1]
b = np.array(['Alice','Bob','Charlie'])
print(b[a])
# ['Charlie' 'Alice' 'Bob']
In this example, array a represents the permutation (2 0 1), which is a cycle of length 3. Since the length of the cycle is 3, if you apply it three times, you will end up where you started:
a = [2,0,1]
b = np.array(['Alice','Bob','Charlie'])
c = b
for i in range(3):
    c = c[a]
    print(c)
# ['Charlie' 'Alice' 'Bob']
# ['Bob' 'Charlie' 'Alice']
# ['Alice' 'Bob' 'Charlie']
Note that I used strings for the elements of b to avoid confusing them with indices. Of course, I could have used numbers from range(n):
a = [2,0,1]
b = np.array([0,1,2])
c = b
for i in range(3):
    c = c[a]
    print(c)
# [2 0 1]
# [1 2 0]
# [0 1 2]
You might see an interesting, but unsurprising fact: the first line is equal to a; in other words, the first result of applying a to b is equal to a itself. This is because b was initialised to [0 1 2], which represents the identity permutation id; thus, the permutations that we find by repeatedly applying a to b are:
id == a^0
a
a^2
a^3 == id
Can we always go back where we started? or The rank of a permutation
It is a well-known result of algebra that if you apply the same permutation again and again, you will eventually end up on the identity permutation. In algebraic notation: for every permutation a, there exists an integer k >= 1 such that a^k == id.
Can we guess the value of k?
The minimum value of k is usually called the order of the permutation; this answer calls it the rank.
If a is a cycle, then the minimum possible k is the length of the cycle. In our previous example, a was a cycle of length 3, so it took three applications of a before we found the identity permutation again.
How about a cycle of length 2? A cycle of length 2 is just "swapping two elements". For instance, swapping elements 0 and 1:
a = [1,0,2]
b = np.array([0,1,2])
c = b
for i in range(2):
    c = c[a]
    print(c)
# [1 0 2]
# [0 1 2]
We swap 0 and 1, then we swap them back.
How about two disjoint cycles? Let's try a cycle of length 3 on the first three elements, simultaneously with swapping the last two elements:
a = [2,0,1,3,4,5,7,6]
b = np.array([0,1,2,3,4,5,6,7])
c = b
for i in range(6):
    c = c[a]
    print(c)
# [2 0 1 3 4 5 7 6]
# [1 2 0 3 4 5 6 7]
# [0 1 2 3 4 5 7 6]
# [2 0 1 3 4 5 6 7]
# [1 2 0 3 4 5 7 6]
# [0 1 2 3 4 5 6 7]
As you can see by carefully examining the intermediary results, there is a period of length 3 on the first three elements, and a period of length 2 on the last two elements. The overall period is the least common multiple of the two periods, which is 6.
What is k in general? A well-known theorem of algebra states: every permutation can be written as a product of disjoint cycles. The rank of a cycle is the length of the cycle. The rank of a product of disjoint cycles is the least common multiple of the ranks of cycles.
A coincidence in your code: sorting [2,1,4,7,6,5,0,3,8,9]
Let us go back to your python code.
a = np.array([2,1,4,7,6,5,0,3,8,9])
a = a[a]
a = a[a]
print(a)
# [0 1 2 3 4 5 6 7 8 9]
How many times did you apply permutation a? Note that because of the assignment a =, array a changed between the first and the second lines a = a[a]. Let us dispel some confusion by using a different variable name for every different value. Your code is equivalent to:
a = np.array([2,1,4,7,6,5,0,3,8,9])
a2 = a[a]
a4 = a2[a2]
print(a4)
Or equivalently:
a = np.array([2,1,4,7,6,5,0,3,8,9])
a4 = (a[a])[a[a]]
This last line looks a little bit complicated. However, a cool result of algebra is that composition of permutations is associative. You already knew that addition and multiplication were associative: x+(y+z) == (x+y)+z and x(yz) == (xy)z. Well, it turns out that composition of permutations is associative as well! In terms of numpy's integer-array indexing, this means that:
a[b[c]] == (a[b])[c]
Thus your python code is equivalent to:
a = np.array([2,1,4,7,6,5,0,3,8,9])
a4 = ((a[a])[a])[a]
print(a4)
Or without the unneeded parentheses:
a = np.array([2,1,4,7,6,5,0,3,8,9])
a4 = a[a][a][a]
print(a4)
Since a4 is the identity permutation, this tells us that the rank of a divides 4. Thus the rank of a is 1, 2 or 4. This tells us that a can be written as a product of disjoint swaps and length-4 cycles. The only permutation of rank 1 is the identity itself. Permutations of rank 2 are products of disjoint swaps, and we can see that this is not the case for a. Thus the rank of a must be exactly 4.
You can find the cycles by choosing an element, and following its orbit: what values is that element successively transformed into? Here we see that:
0 is transformed into 2; 2 is transformed into 4; 4 is transformed into 6; 6 is transformed into 0;
1 remains untouched;
3 becomes 7; 7 becomes 3;
5 is untouched; 8 and 9 are untouched.
Conclusion: Your numpy array represents the permutation (0 -> 2 -> 4 -> 6 -> 0)(3 <-> 7), and its rank is the least common multiple of 4 and 2, lcm(4,2) == 4.
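
The orbit-following procedure can be written down directly. The sketch below is not from the original answer: it uses plain Python lists rather than numpy arrays, and the helper names `cycles` and `order` are made up for illustration. It splits a permutation of range(n) into disjoint cycles and takes the least common multiple of their lengths.

```python
from functools import reduce
from math import gcd


def cycles(perm):
    """Split a permutation of range(n) into its disjoint cycles."""
    seen = set()
    result = []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle = []
        current = start
        while current not in seen:
            seen.add(current)
            cycle.append(current)
            current = perm[current]  # follow the orbit of `start`
        result.append(cycle)
    return result


def order(perm):
    """Least k >= 1 such that applying perm k times gives the identity."""
    lengths = (len(c) for c in cycles(perm))
    return reduce(lambda x, y: x * y // gcd(x, y), lengths, 1)  # running lcm


a = [2, 1, 4, 7, 6, 5, 0, 3, 8, 9]
print([c for c in cycles(a) if len(c) > 1])  # [[0, 2, 4, 6], [3, 7]]
print(order(a))                              # 4
```

Fixed points show up as cycles of length 1, which is why they are filtered out before printing.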

It took some time, but I figured out a way to do it.
I did not find this feature in numpy, but pandas has it.
By using df.reindex I can sort a data frame by its index:
import pandas as pd
import numpy as np
train_df = pd.DataFrame(range(10))
train_df = train_df.reindex(np.random.permutation(train_df.index))
print(train_df) # randomly ordered data frame containing all values from 0 to 9
train_df = train_df.reindex(range(10))
print(train_df) # sorted data frame
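
For the original question, a permutation can also be undone by indexing it with its inverse permutation. Below is a minimal pure-Python sketch, not taken from either answer; the helper name `inverse` is made up for illustration. For a numpy array that is a permutation of range(n), np.argsort(a) computes the same inverse, so a[np.argsort(a)] is sorted.

```python
import random


def inverse(perm):
    """Return the permutation that undoes `perm` (a permutation of range(n))."""
    inv = [0] * len(perm)
    for position, value in enumerate(perm):
        inv[value] = position  # value `value` currently sits at `position`
    return inv


a = list(range(10))
random.shuffle(a)
inv = inverse(a)
restored = [a[i] for i in inv]  # corresponds to a[inv] with a numpy array
print(restored)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```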

Finding the max element in an array, sorted ascending first and then descending

I tried an online challenge which had a question as follows:
You are given an array which increases at first and then starts decreasing.
For example: 2 3 4 5 6 7 8 6 4 2 0 -2.
Find the maximum element of this array.
Following is my code using binary search. It gives the correct answer in O(log(n)), but I don't know whether there is a better solution or not.
Can anyone help me with that?
a = map(int, raw_input().split())
def BS(lo, hi):
    mid = lo + (hi-lo)/2
    if a[mid] >= a[mid+1]:
        if a[mid] > a[mid-1]:
            return mid
        else:
            return BS(lo, mid)
    else:
        return BS(mid, hi)
print a[BS(0, len(a)-1)]

An optimised variant, twice as fast in most cases:
# ® Видул Николаев Петров
a = [2, 3, 4, 5, 6, 7, 8, 10, 12, 24, 48, 12, 6, 5, 0, -1]
def calc(a):
    if len(a) <= 2:
        return a[0] if a[0] > a[1] else a[1]
    l2 = len(a) / 2
    if a[l2 + 1] <= a[l2] and a[l2] >= a[l2 - 1]:
        return a[l2]
    if a[l2] > a[l2 + 1]:
        return calc(a[:l2+1])
    else:
        return calc(a[l2:])
print calc(a) # 48

I tried your code with the input 2 3 4 5 5 8. The answer should be 8, but it returns 5. I am posting an image with a few more test cases.
I think you cannot run a binary search on an unsorted array.
The code also raises a long list of exceptions for sorted arrays.

Why don't you use the built-in max() function?
max(lst) returns the maximum value of a list, although it scans every element, so it runs in O(n) rather than O(log(n)).
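
For reference, here is an iterative Python 3 sketch of the binary search idea from the question. It is not the asker's code; the function name `peak` is made up for illustration. It assumes a non-empty list that strictly increases and then strictly decreases, so it makes no promise for inputs with equal neighbours, where max() is the safe choice.

```python
def peak(values):
    """Return the maximum of a list that strictly increases, then strictly decreases."""
    lo, hi = 0, len(values) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if values[mid] < values[mid + 1]:
            lo = mid + 1  # still climbing: the peak lies to the right of mid
        else:
            hi = mid      # descending: the peak is at mid or to its left
    return values[lo]


print(peak([2, 3, 4, 5, 6, 7, 8, 6, 4, 2, 0, -2]))  # 8
print(peak([1, 2, 3]))                              # 3
print(peak([3, 2, 1]))                              # 3
```

Since mid is always strictly less than hi inside the loop, values[mid + 1] never goes out of range, and no list slices are copied.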

Constructing Lists

I'm new to Python and I came across the following query. Can anyone explain why the following:
[ n**2 for n in range(1, 6)]
gives:
[1, 4, 9, 16, 25]

It is called a list comprehension. What is happening is similar to the following:
results = []
for n in range(1, 6):
    results.append(n**2)
It therefore iterates through the values 1, 2, 3, 4, 5 (range(1, 6) stops before 6) and squares each value. The result of the squaring is then added to the results list, and you get back the result you see (which is equivalent to 1**2, 2**2, 3**2, etc., where the **2 means 'raised to the second power').
This structure (populating a list with values based on some other criteria) is a common one in Python, so the list comprehension provides a shorthand syntax for allowing you to do so.

Breaking it down into manageable chunks in the interpreter:
>>> range(1, 6)
[1, 2, 3, 4, 5]
>>> 2 ** 2 # `x ** 2` means `x * x`
4
>>> 3 ** 2
9
>>> for n in range(1, 6):
...   print n
1
2
3
4
5
>>> for n in range(1, 6):
...   print n ** 2
1
4
9
16
25
>>> [n ** 2 for n in range(1, 6)]
[1, 4, 9, 16, 25]

So that's a list comprehension.
You can break it down into 3 parts, separated by the words 'for' and 'in',
e.g.
[ 1 for 2 in 3 ]
Probably reading it backwards is easiest:
3 - This is the list of input into the whole operation
2 - This is the single item from the big list
1 - This is the operation to do on that item
Parts 1 and 2 are run multiple times, once for each item in the list that part 3 gives us. The outputs of part 1, collected over all those runs, form the output of the whole operation.
So in your example:
3 - Generates a list: [1, 2, 3, 4, 5] -- Range runs from the first param to one before the second param
2 - 'n' represents a single number in that list
1 - Generates a new list of n**2 (n to the power of 2)
So an equivalent code would be:
result = []
for n in range(1, 6):
    result.append(n**2)
Finally breaking it all out:
input = [1, 2, 3, 4, 5]
output = []
v = input[0] # value is 1
o = v**2 # 1 to the power of two is 1
output.append(o)
v = input[1] # value is 2
o = v**2 # 2 to the power of two = (2*2) = 4
output.append(o)
v = input[2] # value is 3
o = v**2 # 3 to the power of two is = (3*3) = 9
output.append(o)
v = input[3] # value is 4
o = v**2 # 4 to the power of two is = (4*4) = 16
output.append(o)
v = input[4] # value is 5
o = v**2 # 5 to the power of two is = (5*5) = 25
output.append(o)
