Python: In-place merge sort implementation issue

I am implementing an in-place merge sort algorithm in Python 3. The code takes an input array and calls itself recursively (with the split array as input) if the length of the input array is more than one. After that, it joins the two sorted arrays. Here is the code:
def merge_sort(array):
    """
    Input : list of values
    Note :
        It divides input array in two halves, calls itself for the two halves and then merges the two sorted halves.
    Returns : sorted list of values
    """
    def join_sorted_arrays(array1, array2):
        """
        Input : 2 sorted arrays.
        Returns : New sorted array
        """
        new_array = [] # this array will contain values from both input arrays.
        j = 0 # Index to keep track where we have reached in second array
        n2 = len(array2)
        for i, element in enumerate(array1):
            # We will compare current element in array1 to current element in array2, if element in array2 is smaller, append it
            # to new array and look at next element in array2. Keep doing this until either array2 is exhausted or an element of
            # array2 greater than current element of array1 is found.
            while j < n2 and element > array2[j]:
                new_array.append(array2[j])
                j += 1
            new_array.append(element)
        # If there are any remaining values in array2, that are bigger than last element in array1, then append those to
        # new array.
        for i in range(j,n2):
            new_array.append(array2[i])
        return new_array
    n = len(array)
    if n == 1:
        return array
    else:
        # print('array1 = {0}, array2 = {1}'.format(array[:int(n/2)], array[int(n/2):]))
        array[:int(n/2)] = merge_sort(array[:int(n/2)])
        array[int(n/2):] = merge_sort(array[int(n/2):])
        # print('array before joining : ',array)
        array = join_sorted_arrays(array[:int(n/2)],array[int(n/2):])
        # print('array after joining : ',array)
        return array
Now if the code is tested,
a = [2,1,4,3,1,2,3,4,2,7,8,10,3,4]
merge_sort(a)
print(a)
out : [1, 1, 2, 2, 3, 3, 4, 2, 3, 4, 4, 7, 8, 10]
If you uncomment the print statements in the function above, you will notice that a already equals the output shown here just before the last call of join_sorted_arrays. After that call, the array a should be sorted. To my surprise, if I do the following instead, the output is correct.
a = [2,1,4,3,1,2,3,4,2,7,8,10,3,4]
a = merge_sort(a)
print(a)
out : [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 7, 8, 10]
I need some help to understand why this is happening.
I am a beginner, so any other comments about coding practices etc. are also welcome.

When you reassign array as the output of join_sorted_arrays() with
array = join_sorted_arrays(array[:int(n/2)],array[int(n/2):])
you're not updating the value of a anymore.
Because you pass a in as the argument array, it's understandable to expect every assignment to array inside the function to update the original list (a). But array = join_sorted_arrays(...) does something different: it rebinds the local name array inside merge_sort() to the new list returned by join_sorted_arrays(), and the list that a refers to is left untouched. Returning array from the function returns that new, sorted list.
The list that a refers to was being modified in place (by the slice assignments) up until that last statement, which is why print(a) shows a changed but not fully sorted list after merge_sort(a). You only get the final, sorted output from the return value of merge_sort().
It might be clearer if you look at:
b = merge_sort(a)
print(a) # [1, 1, 2, 2, 3, 3, 4, 2, 3, 4, 4, 7, 8, 10]
print(b) # [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 7, 8, 10]
Note that Python isn't a pass-by-reference language, and the details of what it actually is can be a little weird to suss out at first. I'm always going back to read on how it works when I get tripped up. There are plenty of SO posts on the topic, which may be of some use to you here.
For example, this one and this one.
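If the goal is for merge_sort(a) to leave a itself sorted, one option is to write the merged result back into the existing list with slice assignment instead of rebinding the local name. The following is a minimal sketch under that assumption; the names merge_sort_in_place and join_sorted are made up for illustration and are not from the question. It still allocates temporary lists, so it is "in place" only in the sense that the caller's list object ends up sorted.

```python
def join_sorted(left, right):
    """Merge two sorted lists into a new sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these two slices is non-empty.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort_in_place(array):
    """Sort `array` so that the caller's list object itself is updated."""
    n = len(array)
    if n <= 1:  # also covers the empty list
        return array
    mid = n // 2
    left = merge_sort_in_place(array[:mid])   # sorts a copy of the left half
    right = merge_sort_in_place(array[mid:])  # sorts a copy of the right half
    # Slice assignment replaces the contents of the existing list object,
    # whereas `array = ...` would only rebind the local name.
    array[:] = join_sorted(left, right)
    return array


a = [2, 1, 4, 3, 1, 2, 3, 4, 2, 7, 8, 10, 3, 4]
merge_sort_in_place(a)
print(a)
```

With this version, print(a) shows the sorted list even when the return value is ignored.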

Related

Python ~ First Covering Prefix in Array

I recently did an online interview for a job, although my expertise is in networks and cyber security.
I came across this question:
Write a function which takes an array of integers and returns the
first covering prefix of that array. The "first covering prefix" of an
array, A, of length N is the smallest index P such that 0 <= P <= N
and each element in A also appears in the list of elements A[0]
through A[P]. For example, the first covering prefix of the following
array: A = [5, 3, 19, 7, 3, 5, 7, 3] is 3, because the elements from
A[0] to A[3] (equal to [5, 3, 19, 7]) contains all values that occur
in array A.
Although I am not a programmer (I chose python3 for the interview),
I would like someone to explain the logic behind this.
I just want to learn; it's been bugging me for a day now.
You can iterate over all elements and, whenever an element has not been seen yet (use a set to keep track efficiently), update P:
A = [5, 3, 19, 7, 3, 5, 7, 3]
S = set()
P = 0 # you could set -1/None as default to account for empty lists?
for i, item in enumerate(A): # iterate elements together with indices
    if item not in S: # if we haven't seen this element yet
        P = i # update P as the current index
        S.add(item) # add the element to the set
print(P)
output: 3
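The same idea can be wrapped in a function, which is what the interview question asks for. This is only a sketch: the function name first_covering_prefix and the choice to return None for an empty list are assumptions, since the question does not say what an empty input should produce.

```python
def first_covering_prefix(values):
    """Return the smallest index P such that values[0..P] contains every
    distinct value of `values`, or None for an empty list (an assumption)."""
    seen = set()
    prefix_end = None
    for index, item in enumerate(values):
        if item not in seen:
            # A value appearing for the first time pushes the prefix end here.
            seen.add(item)
            prefix_end = index
    return prefix_end


print(first_covering_prefix([5, 3, 19, 7, 3, 5, 7, 3]))  # 3
```

Each element is visited once and set membership is constant time on average, so the whole scan is linear in the length of the list.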

Delete some elements from numpy array

One interesting question:
I would like to delete some elements from a numpy array. In the simplified example below, it works as long as I don't delete the last element, but it fails if I try to delete the last element.
Below code works fine:
import numpy as np
values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,4,1]:
    values = np.delete(values,i)
print(values)
The output is:
[0 1 2 3 4 5]
[0 2 4]
If we only change 4 to 5, then it will fail:
import numpy as np
values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,5,1]:
    values = np.delete(values,i)
print(values)
The error message:
IndexError: index 5 is out of bounds for axis 0 with size 5
Why does this error only happen when deleting the last element? What is the correct way to do such tasks?
Keep in mind that np.delete(arr, ind) deletes the element at index ind, NOT the element with that value.
This means that as you delete things, the array gets shorter. So you start with
values = np.array([0,1,2,3,4,5])
values = np.delete(values, 3)
# values is now [0 1 2 4 5]: the element at index 3 is gone, so only 5 elements remain
# the next call tries to delete the element at index 5, but the valid indices are now only 0-4
values = np.delete(values, 5)
One of the ways you can solve the problem is to sort the indices that you want to delete in descending order (if you really want to delete from the array).
inds_to_delete = sorted([3,1,5], reverse=True) # [5,3,1]
# then delete in order of largest to smallest ind
Or:
inds_to_keep = np.array([0,2,4])
values = values[inds_to_keep]
A probably faster way (because you don't need to delete every single value but all at once) is using a boolean mask:
values = np.array([0,1,2,3,4,5])
tobedeleted = np.array([False, True, False, True, False, True])
# So index 3, 5 and 1 are True so they will be deleted.
values_deleted = values[~tobedeleted]
#that just gives you what you want.
This is the approach recommended in the numpy reference for np.delete.
To answer your question: you delete one element, so the array gets shorter and index 5 no longer exists, because the element formerly at index 5 is now at index 4. Delete in descending order if you want to use np.delete in a loop.
If you really want to delete with np.delete use the shorthand:
np.delete(values, [3,5,1])
If you want to delete where the values are (not the index) you have to alter the procedure a bit. If you want to delete all values 5 in your array you can use:
values[values != 5]
or with multiple values to delete:
to_delete = (values == 5) | (values == 3) | (values == 1)
values[~to_delete]
All of these give you the desired result; I'm not sure what your data really looks like, so I can't say for sure which will be the most appropriate.
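As a quick sanity check, the index-based approaches above can be run side by side on the example from the question. This is a small sketch; the variable names one_by_one, all_at_once and masked are only for illustration.

```python
import numpy as np

values = np.array([0, 1, 2, 3, 4, 5])
indices = [3, 5, 1]  # positions to remove, in arbitrary order

# 1) One element at a time, largest index first, so that earlier deletions
#    never shift the positions that are still waiting to be deleted.
one_by_one = values.copy()
for i in sorted(indices, reverse=True):
    one_by_one = np.delete(one_by_one, i)

# 2) All positions in a single call; the indices refer to the original array.
all_at_once = np.delete(values, indices)

# 3) Boolean mask built from the index list, then inverted to keep the rest.
mask = np.zeros(values.shape, dtype=bool)
mask[indices] = True
masked = values[~mask]

print(one_by_one, all_at_once, masked)
```

All three variants leave the elements at positions 0, 2 and 4 of the original array.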
The problem is that you have already deleted items from values, so when you try to delete the item at index 5 there is no longer a value at that index; it is now at index 4.
If you sort the list of indices to delete and iterate over them from large to small, that should work around this issue.
import numpy as np
values = np.array([0,1,2,3,4,5])
print(values)
for i in [5,3,1]: # iterate in descending order
    values = np.delete(values,i)
print(values)
If you want to remove the elements at indices 3, 4 and 1, just do np.delete(values,[3,4,1]).
If in the first case you want to delete the fourth item (index 3), then the fifth of the remaining items, and finally the second of the remaining items, then because of the order of the operations you actually delete the second, fourth and sixth items of the initial array. It is therefore logical that the second case fails.
You can compute the shifts (in the example, the fifth becomes the sixth) in this way:
def multidelete(values,todelete):
    todelete=np.array(todelete)
    shift=np.triu((todelete>=todelete[:,None]),1).sum(0)
    return np.delete(values,todelete+shift)
Some tests:
In [91]: multidelete([0, 1, 2, 3, 4, 5],[3,4,1])
Out[91]: array([0, 2, 4])
In [92]: multidelete([0, 1, 2, 3, 4, 5],[1,1,1])
Out[92]: array([0, 4, 5])
N.B. In older NumPy versions, np.delete did not complain and did nothing if the bad indices were given in a list: np.delete(values,[8]) simply returned values. Recent NumPy versions raise an IndexError in that case.
Passing a boolean array directly to np.delete was deprecated at the time of writing. You can use np.where() instead, like this (note that this deletes by value rather than by index):
values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,5,1]:
    values = np.delete(values,np.where(values==i))
    # values = np.delete(values,values==i) # still works with warning
print(values)
I know this question is old, but for future reference (as I ran into a similar problem):
Instead of writing a for loop, a solution is to filter the array by value with numpy's isin function. Like so,
>>> import numpy as np
>>> # np.isin(element, test_elements, assume_unique=False, invert=False)
>>> arr = np.array([1, 4, 7, 10, 5, 10])
>>> ~np.isin(arr, [4, 10])
array([ True, False, True, False, True, False])
>>> arr = arr[ ~np.isin(arr, [4, 10]) ]
>>> arr
array([1, 7, 5])
So for this particular case we can write:
values = np.array([0,1,2,3,4,5])
torem = [3,4,1]
values = values[ ~np.isin(values, torem) ]
which outputs: array([0, 2, 5])
Here's how you can do it without any loop or explicit indexing, using numpy.setdiff1d (note that it works on values, not indices, and returns the sorted, unique values):
>>> import numpy as np
>>> array_1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> array_1
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> remove_these = np.array([1,3,5,7,9])
>>> remove_these
array([1, 3, 5, 7, 9])
>>> np.setdiff1d(array_1, remove_these)
array([ 2, 4, 6, 8, 10])

Adding new rows to an array dynamically

I want to initialize an empty list and keep adding new rows to it. For example:
myarray=[]
Now, at each iteration, I want to add a new row which I compute during that iteration. For example:
for i in range(5):
    calc=[i,i+1,i+4,i+5]
After computing calc I want to add this row to myarray. Therefore after the 1st iteration myarray would be 1x4, after the 2nd iteration it would be 2x4, etc. I tried numpy.concatenate. It simply adds to the same row, i.e. I get 1x4 and then 1x8. I tried vstack as well, but since myarray is initially [] it gives the error "all the input array dimensions except for the concatenation axis must match exactly".
It looks like you need a multi-dimensional array:
calc = [[0, 1, 4, 5]]
for i in range(1, 5):
    calc.append([i, i+1, i+4, i+5])
This will yield the following array:
calc = [[0, 1, 4, 5], [1, 2, 5, 6], [2, 3, 6, 7], [3, 4, 7, 8], [4, 5, 8, 9]]
To access the various rows of calc you can index it like this:
calc[0] returns [0,1,4,5]
calc[1] returns [1,2,5,6]
I'm pretty sure this works, unless I'm misunderstanding:
mylist = [] #I'm using a list, not an array
for i in range(5):
    calc=[i,i+1,i+4,i+5]
    mylist.append(calc) #You're appending a list into another list, making a nested list
Now, a little more general knowledge. Append vs. Concatenate.
You want to append if you want to add into a list. In this case, you're adding a list into another list. You want to concatenate if you want to 'merge' two lists together to make a single list - which is why your implementation was not making a nested list.
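To make the difference concrete, here is a small sketch using plain Python lists. list.extend stands in for concatenation; the names rows and flat are only for illustration.

```python
rows = []  # built with append: one nested list per iteration
flat = []  # built with extend (concatenation): one long flat list

for i in range(5):
    calc = [i, i + 1, i + 4, i + 5]
    rows.append(calc)  # adds calc as a single new row
    flat.extend(calc)  # merges the four values of calc into flat

print(len(rows), len(rows[0]))  # 5 4
print(len(flat))                # 20
print(rows[1])                  # [1, 2, 5, 6]
```

If a numpy array is needed at the end, the nested list can be converted once after the loop with np.array(rows), which avoids the empty-array problem with vstack.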

Get the index of values retrieved from random.sample(list,2)?

I have a file which contains a number of lists. I want to access the index of the values retrieved from each of these lists. I use the random function as shown below. It retrieves the values perfectly well, but I need to get the index of the values obtained.
for i in range(M):
    print(krr[i])
    print(krr[i].index(random.sample(krr[i],2)))
    nrr[i]=random.sample(krr[i],2)
    outf13.write(str(nrr[i]))
    outf13.write("\n")
I got ValueError saying the two values retrieved are not in the list even though they exist...
To retrieve the index of the randomly selected value in your list you could use enumerate, which yields the index and the value of each item of an iterable as a tuple:
import random
l = list(range(10)) # example list (list() is needed in Python 3 so that it can be shuffled)
random.shuffle(l) # we shuffle the list
print(l) # outputs e.g. [4, 1, 5, 0, 6, 7, 9, 2, 8, 3]
index_value = random.sample(list(enumerate(l)), 2)
print(index_value) # outputs e.g. [(4, 6), (6, 9)]
Here the value 6 at index 4 and the value 9 at index 6 were selected - of course each run will return something different.
Also, in your code you draw a first sample of krr[i] inside the print call and then sample again on the next line, assigning that to nrr[i]. Those two calls produce different samples. The ValueError itself comes from krr[i].index(...), which is given the whole two-element sample list and looks for that list as a single item of krr[i].
EDIT after OP's comment
The most explicit way to then separate the values from the indexes is:
indexes = []
values = []
for idx, val in index_value:
    indexes.append(idx)
    values.append(val)
print(indexes) # [4, 6]
print(values) # [6, 9]
Note that indexes and values are in the same order as index_value.
If you need to reproduce the results, you can seed the random generator, for instance with random.seed(123). This way, every time you run the code you get the same random result.
In this case, the accepted solution offered by bvidal would look like this:
import random
l = list(range(10)) # example list (please notice the explicit call to 'list')
random.seed(123)
random.shuffle(l) # shuffle the list
print(l) # outputs [8, 7, 5, 9, 2, 3, 6, 1, 4, 0]
index_value = random.sample(list(enumerate(l)), 2)
print(index_value) # outputs [(8, 4), (9, 0)]
Another approach is to use random.sample from the standard library to draw a list of random indices and then use those indices to pick the elements from the list. A simple way to access the elements is to convert the list to a numpy array:
import numpy as np
import random
l = [1, -5, 4, 2, 7, 4, 8, 0, 9, 3]
print(l) # prints the list
random.seed(1234) # seed the random generator for reproducing the results
random_indices = random.sample(range(len(l)), 2) # get 2 random indices
print(random_indices) # prints the indices
a = np.asarray(l) # convert to array
print(list(a[random_indices])) # prints the elements
The output of the code is:
[1, -5, 4, 2, 7, 4, 8, 0, 9, 3]
[7, 1]
[0, -5]
You could try using enumerate() on your list objects.
According to the Python official documentation
enumerate() : Return an enumerate object. sequence must be a sequence, an iterator,
or some other object which supports iteration. The next() method of
the iterator returned by enumerate() returns a tuple containing a
count (from start which defaults to 0) and the values obtained from
iterating over sequence
A simple example is this :
my_list=['a','b','c']
for index, element in enumerate(my_list):
    print(index, element)
# 0 a
# 1 b
# 2 c
Don't know if I understood the question though.
You are getting the random sample twice, which results in two different random samples.
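One way to avoid sampling twice is to draw the indices once and look the values up from them. This is only a sketch: sample_with_indices and krr_row are hypothetical names, and krr_row stands in for one of the lists krr[i] from the question.

```python
import random


def sample_with_indices(items, k):
    """Draw k distinct positions once and return (index, value) pairs."""
    indices = random.sample(range(len(items)), k)
    return [(i, items[i]) for i in indices]


krr_row = [10, 20, 30, 40, 50]  # hypothetical stand-in for krr[i]
pairs = sample_with_indices(krr_row, 2)
for index, value in pairs:
    print(index, value)  # the value is always the element at that index
```

Because the indices are drawn directly, this also behaves sensibly when the list contains duplicate values, where list.index would always report the first occurrence.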

Renumbering a 1D mesh in Python

First of all, I couldn't find the answer in other questions.
I have a numpy array of integers called ELEM. The array has three columns that hold the element number, node 1 and node 2. This is a one-dimensional mesh. What I need to do is renumber the nodes. I have the old and new node numbering tables, so the algorithm should replace every node value in the ELEM array according to these tables.
The code should look like this
old_num = np.array([2, 1, 3, 6, 5, 9, 8, 4, 7])
new_num = np.arange(1,10)
ELEM = np.array([ [1, 1, 3], [2, 3, 6], [3, 1, 3], [4, 5, 6]])
From here, every integer in the second and third columns of the ELEM array should be replaced by the corresponding integer from the new_num table.
If you're doing a lot of these, it makes sense to encode the renumbering in a dictionary for fast lookup.
lookup_table = dict( zip( old_num, new_num ) ) # create your translation dict
vect_lookup = np.vectorize( lookup_table.get ) # create a function to do the translation
ELEM[:, 1:] = vect_lookup( ELEM[:, 1:] ) # Reassign the elements you want to change
np.vectorize is just there to make things nicer syntactically. All it does is allow us to map over the values of the array with our lookup_table.get function
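An alternative that avoids calling a Python function per element is to use an integer array as the lookup table and index it with the node columns. This is a sketch under the assumption that node numbers are small non-negative integers, as in the question; the names lut and renumbered are made up for illustration.

```python
import numpy as np

old_num = np.array([2, 1, 3, 6, 5, 9, 8, 4, 7])
new_num = np.arange(1, 10)
ELEM = np.array([[1, 1, 3], [2, 3, 6], [3, 1, 3], [4, 5, 6]])

# lut[old] == new; slot 0 is unused because node numbers start at 1.
lut = np.zeros(old_num.max() + 1, dtype=ELEM.dtype)
lut[old_num] = new_num

renumbered = ELEM.copy()
# Fancy indexing applies the lookup to both node columns at once.
renumbered[:, 1:] = lut[ELEM[:, 1:]]
print(renumbered)
```

For very large or sparse node numbers the dictionary approach is likely the better fit, since the lookup array here needs one slot per possible node number.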
I couldn't quite work out exactly what your problem is, but I have tried to help as far as I understood it.
I think you need to replace, for example, 2 with 1, or 7 with 9, right? In that case, you can create a dictionary of the numbers that are to be replaced. The dictionary in the code below is for that purpose. It could also be done with tuples or lists, but for this kind of lookup a dictionary is the better choice. Afterwards, just replace each element by looking it up in the dictionary.
The code below is very basic and relatively easy to understand. There are certainly more pythonic ways to do this, but if you are new to Python, it is probably the most approachable one.
import numpy as np
# Data you provided
old_num = np.array([2, 1, 3, 6, 5, 9, 8, 4, 7])
new_num = np.arange(1,10)
ELEM = np.array([ [1, 1, 3], [2, 3, 6], [3, 1, 3], [4, 5, 6]])
# Create a dictionary for the elements to be replaced
# (named mapping rather than dict, to avoid shadowing the built-in type)
mapping = {}
for i_num in range(len(old_num)):
    num = old_num[i_num]
    mapping[num] = new_num[i_num]
# Replace the node numbers in the second and third columns
for element in ELEM:
    element[1] = mapping[element[1]]
    element[2] = mapping[element[2]]
print(ELEM)
