Quicksort and recursion in Python

I'm trying to implement quicksort in Python using two main functions, partition and quicksort. The partition function is designed to return two arrays, containing the elements smaller and bigger than the pivot p; quicksort is then called on both of them separately. So quicksort works like this:
def quicksort(array):
    pivot = 0  # pivot choice is irrelevant
    left, right = partition(array, pivot)
    quicksort(left)
    quicksort(right)
    return left + right
But from my understanding, it should be possible to design partition to return just a single index delimiting the smaller and bigger parts, and to redesign quicksort as follows:
def quicksort(array):
    pivot = 0  # pivot choice is irrelevant
    i = partition(array, pivot)
    quicksort(array[:i-1])
    quicksort(array[i:])
    return array
But this implementation returns a partially sorted array:
original array [5, 4, 2, 1, 6, 7, 3, 8, 9]
sorted array [3, 4, 2, 1, 5, 7, 6, 8, 9]
What am I missing here?

Without seeing your code it's hard to be sure, but one possible error is the i-1:
>>> [1,2,3,4][:2]
[1, 2]
>>> [1,2,3,4][2:]
[3, 4]
(although you may be simply skipping the pivot?)
Also, slices are new lists, not views:
>>> l = [1,2,3,4]
>>> l[2:][0] = 'three'
>>> l
[1, 2, 3, 4]
which is unfortunate (the typical functional-style quicksort, which isn't really quicksort at all because it creates a pile of new arrays, annoys me too...)
You can work around the second problem by passing the entire list plus lo/hi indices:

def quicksort(data, lo=0, hi=None):
    if hi is None: hi = len(data)
    ....
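For what it's worth, here is a minimal runnable sketch of that in-place approach (my own illustration, using a Lomuto-style partition with the first element as pivot, not the OP's partition function):

def partition(data, lo, hi):
    # Lomuto-style partition using data[lo] as the pivot;
    # returns the pivot's final index
    pivot = data[lo]
    i = lo
    for j in range(lo + 1, hi):
        if data[j] < pivot:
            i += 1
            data[i], data[j] = data[j], data[i]
    data[lo], data[i] = data[i], data[lo]
    return i

def quicksort(data, lo=0, hi=None):
    if hi is None:
        hi = len(data)
    if hi - lo > 1:
        p = partition(data, lo, hi)
        quicksort(data, lo, p)      # sort the part before the pivot
        quicksort(data, p + 1, hi)  # sort the part after the pivot

array = [5, 4, 2, 1, 6, 7, 3, 8, 9]
quicksort(array)
print(array)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]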

quicksort(array[:i-1]) doesn't actually call quicksort on the first partition of the array, it calls quicksort on a copy of the first partition of the array. Thus, your code is partitioning the array in place, then creating copies of the halves and trying to sort them (but never doing anything with the resulting arrays), so your recursive calls have no effect.
If you want to do it like this, you'll have to avoid making copies of the list with slicing, and instead pass around the whole list as well as the ranges you want your functions to apply to.

I had the same problem; my quicksort was returning partially sorted lists. I found the problem was that I wasn't returning the pivot in its own array. When I create an array for the pivot, the recursion works properly.
i.e., instead of:

return left, right

my partition function returns:

return left, pivotval, right
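To make that concrete, here is a minimal sketch of this three-way split (my own reconstruction, not the answerer's actual code):

def partition(array, pivotval):
    # split into elements smaller than, equal to, and bigger than the pivot
    left = [x for x in array if x < pivotval]
    mid = [x for x in array if x == pivotval]
    right = [x for x in array if x > pivotval]
    return left, mid, right

def quicksort(array):
    if len(array) <= 1:
        return array
    left, mid, right = partition(array, array[0])
    return quicksort(left) + mid + quicksort(right)

print(quicksort([5, 4, 2, 1, 6, 7, 3, 8, 9]))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]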

Related

Why are the sort and sorted functions showing different results? [duplicate]

I am trying to sort a list by frequency of its elements.
>>> a = [5, 5, 4, 4, 4, 1, 2, 2]
>>> a.sort(key = a.count)
>>> a
[5, 5, 4, 4, 4, 1, 2, 2]
a is unchanged. However:
>>> sorted(a, key = a.count)
[1, 5, 5, 2, 2, 4, 4, 4]
Why does this method not work for .sort()?
What you see is the result of a certain CPython implementation detail of list.sort. Try this again, but create a copy of a first:
a.sort(key=a.copy().count)
a
# [1, 5, 5, 2, 2, 4, 4, 4]
.sort modifies a internally, so a.count is going to produce unpredictable results. This is documented as an implementation detail.
What the copy call does is create a copy of a and use that copy's count method as the key. You can see what happens with some debug statements:
def count(x):
    print(a)
    return a.count(x)

a.sort(key=count)
[]
[]
[]
...
a turns up as an empty list when accessed inside .sort, and [].count(anything) will be 0. This explains why the output is the same as the input: the keys are all the same (0).
OTOH, sorted creates a new list, so it doesn't have this problem.
If you really want to sort by frequency counts, the idiomatic method is to use a Counter:
from collections import Counter
a.sort(key=Counter(a).get)
a
# [1, 5, 5, 2, 2, 4, 4, 4]
It doesn't work with the list.sort method because CPython temporarily "empties the list" (the other answer already demonstrates this). This is mentioned in the documentation as an implementation detail:
CPython implementation detail: While a list is being sorted, the effect of attempting to mutate, or even inspect, the list is undefined. The C implementation of Python makes the list appear empty for the duration, and raises ValueError if it can detect that the list has been mutated during a sort.
The source code contains a similar comment with a bit more explanation:
/* The list is temporarily made empty, so that mutations performed
* by comparison functions can't affect the slice of memory we're
* sorting (allowing mutations during sorting is a core-dump
* factory, since ob_item may change).
*/
The explanation isn't straightforward, but the problem is that the key function and the comparisons could change the list instance during sorting, which is very likely to result in undefined behaviour of the C code (and may crash the interpreter). To prevent that, the list is emptied during the sorting, so that even if someone changes the instance it won't crash the interpreter.
This doesn't happen with sorted because sorted copies the list and simply sorts the copy. The copy is still emptied during the sorting but there's no way to access it, so it isn't visible.
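A small experiment makes the mutation check visible (this is CPython-specific behaviour, so the exact message may vary between versions). The key function below appends to the list while it is being sorted, and the sort detects it:

a = [3, 1, 2]
try:
    # mutate the list from inside the key function
    a.sort(key=lambda x: a.append(99) or x)
except ValueError as e:
    print(e)  # list modified during sort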
However, you really shouldn't sort like this to get a frequency sort, because you call the key function once per item, and list.count iterates over the whole list each time, so you effectively traverse the whole list for each element (that's O(n**2) complexity). A better way is to calculate the frequency once for each element (which can be done in O(n)) and then just look it up in the key.
However, since the collections module provides a Counter class that also supports most_common, you could really just use that:
>>> from collections import Counter
>>> [item for item, count in reversed(Counter(a).most_common()) for _ in range(count)]
[1, 2, 2, 5, 5, 4, 4, 4]
This may change the order of elements with equal counts, but since you're doing a frequency count that shouldn't matter too much.

Slicing with :-1 and None - What does each statement mean?

I came across a code snippet where I could not understand two of the statements, though I could see the end result of each.
I will create a variable before giving the statements:
train = np.random.random((10,100))
One of them read as :
train = train[:-1, 1:-1]
What does this slicing mean? How do I read it? I know that -1 in slicing counts from the back, but I still cannot understand this.
Another statement read as follows:
la = [0.2**(7-j) for j in range(1,t+1)]
np.array(la)[:,None]
What does slicing with None as in [:,None] mean?
For the above two statements, along with how each statement is read, it will be helpful to have an alternative method along, so that I understand it better.
One of Python's strengths is its uniform application of straightforward principles. Numpy indexing, like all indexing in Python, passes a single argument to the indexed object's (i.e., the array's) __getitem__ method, and numpy arrays were one of the primary justifications for the slicing mechanism (or at least one of its very early uses).
When I'm trying to understand new behaviours I like to start with a concrete and comprehensible example, so rather than 10x100 random values I'll start with a one-dimensional 4-element vector and work up to 3x4, which should be big enough to understand what's going on.
simple = np.array([1, 2, 3, 4])
train = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])
The interpreter shows these as
array([1, 2, 3, 4])
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
The expression simple[x] is equivalent to (which is to say the interpreter ends up executing) simple.__getitem__(x) under the hood - note this call takes a single argument.
The numpy array's __getitem__ method implements indexing with an integer very simply: it selects a single element from the first dimension. So simple[1] is 2, and train[1] is array([5, 6, 7, 8]).
When __getitem__ receives a tuple as an argument (which is how Python's syntax interprets expressions like array[x, y, z]) it applies each element of the tuple as an index to successive dimensions of the indexed object. So result = train[1, 2] is equivalent (conceptually - the code is more complex in implementation) to
temp = train[1] # i.e. train.__getitem__(1)
result = temp[2] # i.e. temp.__getitem__(2)
and sure enough we find that result comes out at 7. You could think of array[x, y, z] as equivalent to array[x][y][z].
Now we can add slicing to the mix. Expressions containing a colon can be regarded as slice literals (I haven't seen a better name for them), and the interpreter creates slice objects for them. As the documentation notes, a slice object is mostly a container for three values, start, stop and step, and it's up to each object's __getitem__ method how it interprets them. You might find this question helpful to understand slicing further.
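You can see the slice objects the interpreter builds with a tiny helper class (my own illustration, not part of the original answer) whose __getitem__ simply returns its argument:

class Show:
    def __getitem__(self, item):
        return item

s = Show()
print(s[1])          # 1
print(s[1:3])        # slice(1, 3, None)
print(s[:-1, 1:-1])  # (slice(None, -1, None), slice(1, -1, None))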
With what you now know, you should be able to understand the answer to your first question.
result = train[:-1, 1:-1]
will call train.__getitem__ with a two-element tuple of slices. This is equivalent to
temp = train[:-1]
result = temp[..., 1:-1]
The first statement can be read as "set temp to all but the last row of train", and the second as "set result to all but the first and last columns of temp". train[:-1] is
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

and applying the [1:-1] subscripting to the second dimension of that array gives

array([[2, 3],
       [6, 7]])
The ellipsis on the first dimension of the temp subscript says "pass everything," so the subscript expression [...] can be considered equivalent to [:]. As far as the None values are concerned, a slice has a maximum of three data points: start, stop and step. A None value for any of these gives the default value, which is 0 for start, the length of the indexed object for stop, and 1 for step. So x[None:None:None] is equivalent to x[0:len(x):1], which is equivalent to x[::].
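Note, though, that None used as an index (rather than as a component inside a slice) means something different in numpy: it is np.newaxis, and it inserts a new axis of length 1. That is what [:, None] does in your second snippet. A quick check (t = 7 is an assumed value; the original snippet doesn't define t):

import numpy as np

t = 7  # assumed value for illustration
la = [0.2 ** (7 - j) for j in range(1, t + 1)]
arr = np.array(la)
print(arr.shape)           # (7,)
print(arr[:, None].shape)  # (7, 1) - a column vector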
With this knowledge under your belt you should stand a bit more chance of understanding what's going on.

Efficient way to rotate a 2D array of integers of size n in Python

I want to rotate (anti-clockwise) a 2D nxn array of integers and the 2D array is stored as a list of lists.
For example:
a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
After rotation, the output should look like:
b = [[3, 6, 9],
     [2, 5, 8],
     [1, 4, 7]]
I have written a function which performs the above rotation:
def rotate_clockwise(matrix):
    transposed_matrix = zip(*matrix)  # transpose the matrix
    return list(map(list, reversed(transposed_matrix)))  # reverse the transposed matrix
The function works well and the code looks pretty Pythonic to me.
However, I am having trouble understanding the space and time complexity of my solution.
Can someone explain both the complexities of the constructs I have used, namely zip(*matrix), reversed(list), map(list, iterator) and list(iterator)?
How can I make this code more efficient?
Also, what would be the most efficient way to rotate a 2D matrix?
NOTE: As mentioned by @Colonder in the comments, there might be a similar question to this. However, this question focuses more on discussing the space and time complexities of the problem.
The most efficient is probably using numpy for this:
>>> import numpy as np
>>> na = np.array(a)
>>> np.rot90(na)
array([[3, 6, 9],
       [2, 5, 8],
       [1, 4, 7]])
About the efficiency of your current approach: if the matrix is an n×n matrix, then zip will work in O(n²), reversed will work in O(n) (since it is shallow), and the list function will work in O(n), but we do that n times since it is done inside a map(..), so map(list, ..) will work in O(n²). Finally, the outer list will again work in O(n). There is, however, no way to rotate explicitly in less than O(n²), since we need to move O(n²) items.
In terms of space complexity, zip, map, etc. work in an iterative way, but reversed forces zip to be fully enumerated. Each tuple from zip requires O(n) memory, so the total amount of memory allocated is O(n²). Next, the map(list, ..) again works iteratively, and each tuple is converted to a list, which again requires O(n); we do this n times, so it produces O(n²) memory complexity as well.
In numpy, if you do not rotate in place, this will require O(n²) as well: that is a lower bound, since the new matrix requires O(n²) memory. If you do rotate in place, however, the memory complexity can be reduced to O(1).
I might be late, but if it helps, this is w.r.t. Python 3.x.
Can someone explain both the complexities of the constructs I have used, namely zip(*matrix), reversed(list), map(list, iterator) and list(iterator)?
zip() -> O(1) time and space; it returns an iterator (a zip object, something like <zip object at 0x104f18480>). But in your case you have to convert it to a list to access the items, which makes it O(n) time and space.
reversed() -> returns an iterator (<list_reverseiterator object at 0x100ffd1c0>); consuming it takes O(n/2) = O(n) time and O(n) space.
list() -> iterates over n items and stores n items, so O(n) time and space.
map() -> iterates over m items n times, so O(n*m) time and O(n+m) space.
So your code will run in O(n*m) time and O(n+m) space.
def rotate_clockwise(matrix):
    transposed_matrix = list(zip(*matrix))  # O(n) time and O(n) space
    return list(map(list, reversed(transposed_matrix)))  # O(n*m) time and O(n+m) space
You could also use something like

def rotate_clockwise(matrix):
    return [list(element) for element in zip(*reversed(matrix))]

def rotate_anticlockwise(matrix):
    # zip() returns an iterator in Python 3, so it must be
    # materialised as a list before reversed() can be applied
    return [list(element) for element in reversed(list(zip(*matrix)))]
or with a shallow copy (inefficient for larger inputs):

def rotate_clockwise(matrix):
    return [list(x) for x in zip(*matrix[::-1])]

def rotate_anticlockwise(matrix):
    return [list(x) for x in list(zip(*matrix))[::-1]]
All of these are very Pythonic ways of approaching the question, but they do take extra space, which should be all right in most cases. However, from an interview perspective, as per your comment asking for an in-place version, you could try something like:
def transpose(mat):
    rows = len(mat)
    cols = len(mat[0])
    for row in range(rows):
        for col in range(row + 1, cols):
            mat[row][col], mat[col][row] = mat[col][row], mat[row][col]

def flip(mat):
    for row in mat:
        l, r = 0, len(row) - 1
        while l < r:
            row[l], row[r] = row[r], row[l]
            l += 1
            r -= 1

# Clockwise  - call transpose then flip
# Anti-clock - call flip then transpose
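A quick sanity check of the in-place version (my own driver code, assuming the two helpers above):

mat = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
transpose(mat)
flip(mat)
print(mat)  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]] - rotated clockwise, in place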
If this is for your work, go with the Pythonic code. In most cases, using built-in functions or imported libraries is way better than writing your own solution, because they are optimised in CPython and are generally faster across scenarios, which is hard to achieve with hand-rolled code. It saves a lot of time and effort.

Remove duplicates from one Python list, prune other lists based on it

I have a problem that's easy enough to do in an ugly way, but I'm wondering if there's a more Pythonic way of doing it.
Say I have three lists, A, B and C.
A = [1, 1, 2, 3, 4, 4, 5, 5, 3]
B = [1, 2, 3, 4, 5, 6, 7, 8, 9]
C = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# The actual data isn't important.
I need to remove all duplicates from list A, but when a duplicate entry is deleted, I would like the corresponding indexes removed from B and C:
A = [1, 2, 3, 4, 5]
B = [1, 3, 4, 5, 7]
C = [1, 3, 4, 5, 7]
This is easy enough to do with longer code by moving everything to new lists:
new_A = []
new_B = []
new_C = []

for i in range(len(A)):
    if A[i] not in new_A:
        new_A.append(A[i])
        new_B.append(B[i])
        new_C.append(C[i])
But is there a more elegant and efficient (and less repetitive) way of doing this? This could get cumbersome if the number of lists grows, which it might.
Zip the three lists together, uniquify based on the first element, then unzip:
from operator import itemgetter
from more_itertools import unique_everseen
abc = zip(a, b, c)
abc_unique = unique_everseen(abc, key=itemgetter(0))
a, b, c = zip(*abc_unique)
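Applied to the data in the question (a runnable version of the same idea; zip(*abc_unique) yields tuples, so this converts back to lists):

from operator import itemgetter
from more_itertools import unique_everseen

A = [1, 1, 2, 3, 4, 4, 5, 5, 3]
B = [1, 2, 3, 4, 5, 6, 7, 8, 9]
C = [1, 2, 3, 4, 5, 6, 7, 8, 9]

abc_unique = unique_everseen(zip(A, B, C), key=itemgetter(0))
A, B, C = (list(t) for t in zip(*abc_unique))
print(A)  # [1, 2, 3, 4, 5]
print(B)  # [1, 3, 4, 5, 7]
print(C)  # [1, 3, 4, 5, 7]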
This is a very common pattern. Whenever you want to do anything in lock step over a bunch of lists (or other iterables), you zip them together and loop over the result.
Also, if you go from 3 lists to 42 of them ("This could get cumbersome if the number of lists grows, which it might."), this is trivial to extend:
abc = zip(*list_of_lists)
abc_unique = unique_everseen(abc, key=itemgetter(0))
list_of_lists = zip(*abc_unique)
Once you get the hang of zip, the "uniquify" is the only hard part, so let me explain it.
Your existing code checks whether each element has been seen by searching for each one in new_A. Since new_A is a list, this means that if you have N elements, M of them unique, on average you're going to be doing M/2 comparisons for each of those N elements. Plug in some big numbers, and NM/2 gets pretty big—e.g., 1 million values, a half of them unique, and you're doing 250 billion comparisons.
To avoid that quadratic time, you use a set. A set can test an element for membership in constant, rather than linear, time. So, instead of 250 billion comparisons, that's 1 million hash lookups.
If you don't need to maintain order or decorate-process-undecorate the values, just copy the list to a set and you're done. If you need to decorate, you can use a dict instead of a set (with the key as the dict keys, and everything else hidden in the values). To preserve order, you could use an OrderedDict, but at that point it's easier to just use a list and a set side by side. For example, the smallest change to your code that works is:
new_A_set = set()
new_A = []
new_B = []
new_C = []

for i in range(len(A)):
    if A[i] not in new_A_set:
        new_A_set.add(A[i])
        new_A.append(A[i])
        new_B.append(B[i])
        new_C.append(C[i])
But this can be generalized—and should be, especially if you're planning to expand from 3 lists to a whole lot of them.
The recipes in the itertools documentation include a function called unique_everseen that generalizes exactly what we want. You can copy and paste it into your code, write a simplified version yourself, or pip install more-itertools and use someone else's implementation (as I did above).
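For reference, a simplified version of that recipe looks roughly like this (a sketch of the idea, not the exact more-itertools code):

def unique_everseen(iterable, key=None):
    # yield elements in order, skipping any whose key has been seen before
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element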
PadraicCunningham asks:
how efficient is zip(*unique_everseen(zip(a, b, c), key=itemgetter(0)))?
If there are N elements, M unique, it's O(N) time and O(M) space.
In fact, it's effectively doing the same work as the 10-line version above. In both cases, the only work that's not obviously trivial inside the loop is key in seen and seen.add(key), and since both operations are amortized constant time for set, that means the whole thing is O(N) time. In practice, for N=1000000, M=100000 the two versions are about 278ms and 297ms (I forget which is which) compared to minutes for the quadratic version. You could probably micro-optimize that down to 250ms or so—but it's hard to imagine a case where you'd need that, but wouldn't benefit from running it in PyPy instead of CPython, or writing it in Cython or C, or numpy-izing it, or getting a faster computer, or parallelizing it.
As for space, the explicit version makes it pretty obvious. Like any conceivable non-mutating algorithm, we've got the three new_Foo lists around at the same time as the original lists, and we've also added new_A_set of the same size. Since all of those are length M, that's 4M space. We could cut that in half by doing one pass to get indices, then doing the same thing mu 無's answer does:
indices = set(index for index, value in unique_everseen(enumerate(a), key=itemgetter(1)))
a = [a[index] for index in sorted(indices)]
b = [b[index] for index in sorted(indices)]
c = [c[index] for index in sorted(indices)]
But there's no way to go lower than that; you have to have at least a set and a list of length M alive to uniquify a list of length N in linear time.
If you really need to save space, you can mutate all three lists in-place. But this is a lot more complicated, and a bit slower (although still linear*).
Also, it's worth noting another advantage of the zip version: it works on any iterables. You can feed it three lazy iterators, and it won't have to instantiate them eagerly. I don't think it's doable in 2M space, but it's not too hard in 3M:
indices, a = zip(*unique_everseen(enumerate(a), key=itemgetter(1)))
indices = set(indices)
b = [value for index, value in enumerate(b) if index in indices]
c = [value for index, value in enumerate(c) if index in indices]
* Note that just del c[i] will make it quadratic, because deleting from the middle of a list takes linear time. Fortunately, that linear time is a giant memmove that's orders of magnitude faster than the equivalent number of Python assignments, so if N isn't too big you can get away with it—in fact, at N=100000, M=10000 it's twice as fast as the immutable version… But if N might be too big, you have to instead replace each duplicate element with a sentinel, then loop over the list in a second pass so you can shift each element only once, which is instead 50% slower than the immutable version.
How about this: basically, get a set of all unique elements of A, then get their indices, and create the new lists based on those indices.
new_A = list(set(A))
indices_to_copy = [A.index(element) for element in new_A]
new_B = [B[index] for index in indices_to_copy]
new_C = [C[index] for index in indices_to_copy]
You can write a function for the second statement, for reuse:
def get_new_list(original_list, indices):
    return [original_list[idx] for idx in indices]

Inserting and removing into/from sorted list in Python

I have a sorted list of integers, L, and I have a value X that I wish to insert into the list such that L's order is maintained. Similarly, I wish to quickly find and remove the first instance of X.
Questions:
How do I use the bisect module to do the first part, if possible?
Is L.remove(X) going to be the most efficient way to do the second part? Does Python detect that the list has been sorted and automatically use a logarithmic removal process?
Example code attempts:
i = bisect_left(L, y)
L.pop(i)  # works

del L[bisect_left(L, i)]  # doesn't work if I use this instead of pop
You use the bisect.insort() function:
bisect.insort(L, X)
L.remove(X) will scan the whole list until it finds X. Use del L[bisect.bisect_left(L, X)] instead (provided that X is indeed in L).
Note that removing from the middle of a list is still going to incur a cost as the elements from that position onwards all have to be shifted left one step. A binary tree might be a better solution if that is going to be a performance bottleneck.
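Putting both parts together (a small self-contained example; the guard matters because bisect_left only tells you where X would be inserted, not whether it is present):

import bisect

L = [1, 3, 4, 4, 7]
bisect.insort(L, 5)             # keeps L sorted: [1, 3, 4, 4, 5, 7]

i = bisect.bisect_left(L, 4)
if i < len(L) and L[i] == 4:    # guard against X not being in L
    del L[i]                    # removes the first 4 only
print(L)                        # [1, 3, 4, 5, 7]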
You could use Raymond Hettinger's IndexableSkiplist. It performs three operations in O(log n) time:
insert value
remove value
lookup value by rank
import skiplist
import random

random.seed(2013)
N = 10
skip = skiplist.IndexableSkiplist(N)
data = list(range(N))  # list() is needed on Python 3, where range is lazy
random.shuffle(data)

for num in data:
    skip.insert(num)

print(list(skip))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

for num in data[:N//2]:
    skip.remove(num)

print(list(skip))
# [0, 3, 4, 6, 9]
