Can't understand np.partition - python

I tried understanding the function but the output seems weird.
array = np.array([9, 2, 7, 4, 6, 3, 8, 1, 5])
print(np.partition(array, kth=0))
That's the output I'm getting for some reason:
[1 2 7 4 6 3 8 9 5]
Expected to get:
[X X X X X X X X 9]
What am I missing?

From the description in the numpy.partition documentation, [1 2 7 4 6 3 8 9 5] is a correct partition of your input array for kth=0:
Creates a copy of the array with its elements rearranged in such a way that the value of the element in k-th position is in the position it would be in a sorted array. All elements smaller than the k-th element are moved before this element and all equal or greater are moved behind it. The ordering of the elements in the two partitions is undefined.
I think the confusion is in the ambiguous description of kth. That's the index after the partition operation, not before. So, in your example, kth=0 doesn't refer to the value at index 0 of the input array (9), it refers to whatever value would be at index 0 of a sorted array of the same values, in this case 1.
Any arrangement matching [1 X X X X X X X X] is valid.
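A small sketch of that guarantee, using the array from the question. The variable names `first`, `last` and `mid` are only for illustration; the exact order of the other elements can differ between NumPy versions, so only the guaranteed positions are checked.

```python
import numpy as np

array = np.array([9, 2, 7, 4, 6, 3, 8, 1, 5])

# kth=0: index 0 holds the value a full sort would put there (the minimum).
first = np.partition(array, kth=0)
print(first[0])   # 1

# To get the maximum in the last slot, ask for the last index instead.
last = np.partition(array, kth=len(array) - 1)
print(last[-1])   # 9

# kth=4: index 4 holds the median; smaller values come before it,
# equal or greater values after it, each side in no particular order.
mid = np.partition(array, kth=4)
print(mid[4])     # 5
assert np.all(mid[:4] <= mid[4]) and np.all(mid[5:] >= mid[4])
```

So the expected `[X X X X X X X X 9]` corresponds to `kth=8`, not `kth=0`.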

Related

Is np.argpartition giving me the wrong results?

Take the following code:
import numpy as np
one_dim = np.array([2, 3, 1, 5, 4])
partitioned = np.argpartition(one_dim, 0)
print(f'Unpartitioned array: {one_dim}')
print(f'Partitioned array index: {partitioned}')
print(f'Partitioned array: {one_dim[partitioned]}')
The following output results:
Unpartitioned array: [2 3 1 5 4]
Partitioned array index: [2 1 0 3 4]
Partitioned array: [1 3 2 5 4]
The output for the partitioned array should be [1 2 3 5 4]. How is three on the left side of two? It seems to me the function is making an error or am I missing something?
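The second argument is the index that will hold its sorted value after partitioning. Here index 0 of the partitioned array (value 1) is in its sorted position and everything to its right is greater than or equal to it, so the result is correct. The order of the remaining elements is undefined, which is why 3 can appear before 2.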
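One way to convince yourself is to check only what the function promises. This is a rough sketch using the array from the question; `k` and `idx` are illustrative names.

```python
import numpy as np

one_dim = np.array([2, 3, 1, 5, 4])
k = 0
idx = np.argpartition(one_dim, k)
partitioned = one_dim[idx]

# The only guarantee: element k sits where a full sort would put it,
# nothing before it is larger and nothing after it is smaller.
assert partitioned[k] == np.sort(one_dim)[k]
assert np.all(partitioned[:k] <= partitioned[k])
assert np.all(partitioned[k:] >= partitioned[k])

# If a fully ordered result is what you need, use argsort instead.
print(one_dim[np.argsort(one_dim)])   # [1 2 3 4 5]
```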

How to compare numpy arrays in terms of similarity

I am given two numpy arrays: one of dimensions i x m and the other of dimensions j x m. What I want to do is loop through FirstArray and compare each of its elements with each of the elements of SecondArray. When I say 'compare', I mean: I want to compute the Euclidean distance between the elements of FirstArray and SecondArray. Then, I want to store the index of the element of SecondArray that is closest to the corresponding element of FirstArray, and I also want to store the index of the element of SecondArray that is second closest to the element of FirstArray.
In code this would look somewhat similar to this:
smallest = None
idx = 0
for i in range(0, FirstArrayRows):
    for j in range(0, SecondArrayRows):
        EuclideanDistance = np.sqrt(np.sum(np.square(FirstArray[i,:] - SecondArray[j,:])))
        if smallest is None or EuclideanDistance < smallest:
            smallest = EuclideanDistance
            idx_second = idx
            idx = j
    Closest[i] = idx
    SecondClosest[i] = idx_second
And I think this works. However, there are two cases when this code fails to give the correct index for the second closest element of SecondArray:
when the element of SecondArray that is closest to the element of FirstArray is at j = 0.
when the element of SecondArray that is closest to the element of FirstArray is at j = 1.
So I wonder: Is there a better way of implementing this?
I know there is. Maybe someone can help me see it?
You could use numpy's broadcasting to your advantage. Compute the Euclidean distance with all elements of the second array in a single operation. Then, you can find the two smallest distances using argpartition.
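import numpy as np
i, j, m = 3, 4, 5
a = np.random.choice(10,(i,m))
b = np.random.choice(10,(j,m))
print('First array:\n',a)
print('Second array:\n',b)
closest, second_closest = np.zeros(i), np.zeros(i)
for i in range(a.shape[0]):
    # distance from row i of a to every row of b, via broadcasting
    dist = np.sqrt(((a[i,:] - b)**2).sum(axis=1))
    # kth=[0, 1] puts both the smallest and the second smallest in sorted
    # position; with kth=2 the first two indices could come back swapped
    closest[i], second_closest[i] = np.argpartition(dist, [0, 1])[:2]
print('Closest:', closest)
print('Second Closest:', second_closest)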
Output:
First array:
[[3 9 0 2 2]
[1 2 9 9 7]
[4 0 6 6 4]]
Second array:
[[9 9 2 2 3]
[9 9 0 2 3]
[1 1 6 7 7]
[5 7 0 4 4]]
Closest: [3. 2. 2.]
Second Closest: [1. 3. 3.]
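If you want to drop the remaining Python loop as well, broadcasting can build the whole i x j distance matrix at once. This is only a sketch: `two_nearest` is a made-up helper name, it assumes b has at least two rows, and it materialises an i x j x m intermediate array, which may be too large for big inputs. The data is the sample shown in the output above.

```python
import numpy as np

def two_nearest(a, b):
    # diff[p, q, :] = a[p] - b[q], shape (i, j, m)
    diff = a[:, None, :] - b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))        # shape (i, j)
    # kth=[0, 1] places the smallest and second smallest distance of
    # every row in columns 0 and 1, in that order.
    order = np.argpartition(dist, [0, 1], axis=1)
    return order[:, 0], order[:, 1]

a = np.array([[3, 9, 0, 2, 2],
              [1, 2, 9, 9, 7],
              [4, 0, 6, 6, 4]])
b = np.array([[9, 9, 2, 2, 3],
              [9, 9, 0, 2, 3],
              [1, 1, 6, 7, 7],
              [5, 7, 0, 4, 4]])

closest, second_closest = two_nearest(a, b)
print(closest)          # [3 2 2]
print(second_closest)   # [1 3 3]
```

The results are integer index arrays, so they can be used directly to index b.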

How to reorder coordinate values using the euclidian distance in Python?

I want to reorder the coordinate values based on the Euclidean distance.
For example I have coordinates:
1 2
2 1
1 3
1 9
6 9
3 5
6 8
4 5
7 9
I have computed the Euclidean distance from the first coordinate to every other coordinate with the following code:
import math

with open("../data comparision project/testfile.txt") as f:
    # split the text file into a list of [x, y] string pairs
    my_list = [line.strip().split(' ') for line in f]

# empty list to store distances
euclidean_distance_list = []
plot1 = my_list[0]
for plot2 in my_list:
    euclidean_distance = math.sqrt((float(plot1[0]) - float(plot2[0]))**2 + (float(plot1[1]) - float(plot2[1]))**2)
    # store each distance so the list is populated before sorting
    euclidean_distance_list.append(euclidean_distance)

# out of the for loop
sorted_list = sorted(euclidean_distance_list)
print(sorted_list)
This generates the following output:
[0.0, 1.0, 1.4142135623730951, 3.605551275463989, 4.242640687119285, 7.0, 7.810249675906654, 8.602325267042627, 9.219544457292887]
Now I want to reorder the original coordinate values based on these distances such that they will be:
1 2
1 3
1 9
2 1
3 5
4 5
6 8
6 9
7 9
Can anyone help me with the Python code? I have calculated the distances but am unable to get a list with the sorted coordinate values.
You want to sort the list using a custom sort key.
Check out the optional key argument to the sort function. You can supply a function that maps each element to the value it should be ordered by.
https://docs.python.org/3/howto/sorting.html
To fill in a bit more detail - supposing that you already wrote the function:
def euclidean_distance(a, b):
    # does the math and gives the distance between coordinates a and b.
    # If you got the values some other way - better reorganize the code
    # first so that you have a function like this :)
    return math.hypot(a[0] - b[0], a[1] - b[1])  # one possible body for 2-D points; needs "import math"
We can use functools.partial to make a function for distances from a given point:
distance_from_a = functools.partial(euclidean_distance, points[0])
and then the rest of the logic is built into Python's native sorting functionality:
sorted(points, key=distance_from_a)
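Put together, it could look roughly like this. The `points` list is typed in by hand from the question instead of being read from the file, and the body of `euclidean_distance` is just one possible implementation for 2-D points.

```python
import functools
import math

def euclidean_distance(a, b):
    # Distance between two 2-D points given as (x, y) pairs.
    return math.hypot(a[0] - b[0], a[1] - b[1])

points = [(1, 2), (2, 1), (1, 3), (1, 9), (6, 9),
          (3, 5), (6, 8), (4, 5), (7, 9)]

# Fix the first argument so the key function takes a single point.
distance_from_first = functools.partial(euclidean_distance, points[0])

ordered = sorted(points, key=distance_from_first)
print(ordered)
# [(1, 2), (1, 3), (2, 1), (3, 5), (4, 5), (1, 9), (6, 8), (6, 9), (7, 9)]
```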
You can perform a custom sort by doing something like this assuming you are using numpy:
import numpy as np
def euclidian_distance(a, b):
    return np.linalg.norm(a - b)

coords = np.array([[1,2],
                   [2,1],
                   [1,3],
                   [1,9],
                   [6,9],
                   [3,5],
                   [6,8],
                   [4,5],
                   [7,9]])
coords = sorted(coords, key=lambda point: euclidian_distance(point, coords[0]))
print(np.array(coords)) # sorted() returns a list of rows; convert back to an array for readable output
Output:
[[1 2]
[1 3]
[2 1]
[3 5]
[4 5]
[1 9]
[6 8]
[6 9]
[7 9]]
The output above differs from the OP's because the OP's example output is not actually ordered by distance as they described.
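If the points are already in a NumPy array, the same ordering can probably be had without a Python-level key function by sorting the indices. This is a sketch of that alternative, not the approach used in the answer above; `distances` and `order` are illustrative names.

```python
import numpy as np

coords = np.array([[1, 2], [2, 1], [1, 3], [1, 9], [6, 9],
                   [3, 5], [6, 8], [4, 5], [7, 9]])

# Distance of every row to the first row, computed in one call.
distances = np.linalg.norm(coords - coords[0], axis=1)

# argsort gives the row order from nearest to farthest.
order = np.argsort(distances)
sorted_coords = coords[order]
print(sorted_coords)
```

The result stays a 2-D array, so no conversion is needed for printing.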

How does the following code work for Matrix Multiplication in Python

I'm trying to do a matrix multiplication in Python, and I have found the following code that I am trying to understand. I know how to multiply matrices by hand; I want to understand how the code performs the same operation. By that I mean, for example, that the first element of BA (row 1, column 1) is calculated as 1*1 + 3*3 + 3*3 + 1*1, and so on.
from numpy import array
A = array([[1, 4, 5], [3, 2, 9], [3, 6, 2], [1, 6, 8]])
B = A.T
BA = array([[0, 0, 0], [0, 0, 0], [0, 0, 0]])
for i in range(len(B)):
    for j in range(len(A[0])):
        for k in range(len(A)):
            BA[i][j] += B[i][k] * A[k][j]
I know that the length command for a list returns how many elements there are in that list. I am not sure how it works here since B is a matrix, I assume it returns how many rows there are.
range of len(B) would be (0,3) corresponding to row 1,2 and 3.
for i in range would correspond to i=0, i=1, i= 2
next confusing thing is for j in range len(A[0])
The first element of A is the first row, the length here would thus correspond how many elements there are in the first element of A.
Basically I have a basic understanding of what range and len etc. produce for this example, but I would like to get a better understanding of each value of i, j, k as a result of these, as well as of the last line, which I really don't understand.
BA[i][j] += B[i][k] * A[k][j]
Please explain as basic as possible because I am new to programming and so at this point nothing is trivial to me. Thank you for your time to help others :)
Here is the actual result from your code:
   B       *     A     =      BA
1 3 3 1        1 4 5       20 34  46
4 2 6 6        3 2 9       34 92  98
5 9 2 8        3 6 2       46 98 174
               1 6 8
Assuming i = 0 and j = 0, let's calculate BA[0][0], which is the first element of matrix BA.
BA[0][0] += B[0][k] * A[k][0]
B[0][k] walks along row 0 of matrix B as k iterates over the rows of A; the number of rows of A is the same as the number of columns of B.
A[k][0] walks down column 0 of matrix A.
The loop for k in range(len(A)): will reproduce:
B[0][0]*A[0][0] + B[0][1]*A[1][0] + B[0][2]*A[2][0] + B[0][3]*A[3][0]
Resulting in:
1×1 + 3×3 + 3×3 + 1×1 = 20
This is the value of BA[0][0] produced by your code.
The following nested loops iterate over all columns of A (as j) for every row of B (as i) in order to perform the multiplication for all (row) x (column) pairs:
for i in range(len(B)):
    for j in range(len(A[0])):
Consider here the array as just a more convenient representation of the list you gave to the function.
A is built on top of the list [[ 1, 4, 5 ],[ 3, 2, 9], [ 3,6, 2], [ 1,6, 8]], a list of length 4.
range(start, stop) returns a range object, a lazy sequence of integers from start up to but not including stop. If not provided, start defaults to 0.
B has 3 rows (it is the transpose of the 4 x 3 matrix A), so range(len(B)) is the same as range(0, 3), which produces the integers 0, 1 and 2 when "asked" by the for loop. i will successively be 0, 1 and 2.
A[0] returns the first row of A, which has a length of 3, so in the same way j will successively be 0, 1 and 2 (in this case), and k will successively be 0, 1, 2 and 3, as A has a length of 4.
BA[i] returns the row of index i, which can in turn be indexed by j.
So BA[i][j] is the element in row i and column j, which we increment by the product of the element at row i, index k of matrix B and the element at row k, index j of matrix A.
In your code sample a matrix is represented as a list of sublists, where each sublist is a row.
So the outermost loop goes over the rows of B:
for i in range(len(B)):
A[0] is the first row of A, and the number of elements in it is the number of A's columns.
So the second loop goes over the columns of A:
for j in range(len(A[0])):
The innermost loop simply sums the products of the elements in the i-th row of B and the j-th column of A.
BA[i][j] += B[i][k] * A[k][j]
This adds the product to BA[i][j]. += adds its right argument to its left one.
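To see that the three loops really compute the matrix product, you can run them next to NumPy's own `@` operator. This is a sketch using the matrices from the question; `np.zeros` with `dtype=int` is used here only as a shorter way to create the empty result.

```python
import numpy as np

A = np.array([[1, 4, 5], [3, 2, 9], [3, 6, 2], [1, 6, 8]])   # 4 x 3
B = A.T                                                      # 3 x 4

# Result has as many rows as B and as many columns as A: 3 x 3.
BA = np.zeros((B.shape[0], A.shape[1]), dtype=int)

for i in range(B.shape[0]):          # rows of B: 0, 1, 2
    for j in range(A.shape[1]):      # columns of A: 0, 1, 2
        for k in range(A.shape[0]):  # shared dimension: 0, 1, 2, 3
            BA[i, j] += B[i, k] * A[k, j]

print(BA)
# [[ 20  34  46]
#  [ 34  92  98]
#  [ 46  98 174]]
print(np.array_equal(BA, B @ A))     # True
```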

replace element by element different arrays

I have an array :
a = np.array([1,2,3,4,5,6,7,8])
The array may be reshaped to a = np.array([[1,2,3,4],[5,6,7,8]]), whatever is more convenient.
Now, I have an array :
b = np.array([[11,22,33,44], [55,66,77,88]])
I want to replace each of these elements with the corresponding element from a.
The a array will always hold as many elements as b has.
So, array b will be :
[1,2,3,4], [5,6,7,8]
Note that I must keep each b subarray's shape as (4,); I don't want to change it. So idx will take values from 0 to 3. I want to make a fit to every four values.
I am struggling with reshape, split, mask, etc., and I can't figure out a way to do it.
import numpy as np
#a = np.array([[1,2,3,4],[5,6,7,8]])
a = np.array([1,2,3,4,5,6,7,8])
b = np.array([[11,22,33,44], [55,66,77,88]])
for arr in b:
    for idx, x in enumerate(arr):
        # pseudocode: replace every arr[idx] with the corresponding value from a
        pass
For your current case, what you want is probably:
b, c = list(a.reshape(2, -1))
This isn't the cleanest, but it is a one-liner. Turn your 1D array into a 2D array with the first dimension as 2 using reshape(2, -1); list then splits it along the first dimension so you can directly assign the pieces to b, c.
You can also do it with the specialty function numpy.split
b, c = np.split(a, 2)
EDIT: Based on accepted solution, vectorized way to do this is
b = a.reshape(b.shape)
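Note that `b = a.reshape(b.shape)` rebinds the name b to a reshaped version of a. If the existing b array itself has to be overwritten while keeping its shape, a slice assignment should do it. A small sketch with the arrays from the question:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
b = np.array([[11, 22, 33, 44], [55, 66, 77, 88]])

# Copy the values of a into the existing b, keeping b's shape (2, 4).
b[...] = a.reshape(b.shape)
print(b)
# [[1 2 3 4]
#  [5 6 7 8]]

# b still owns its data, so later changes to a do not affect it.
a[0] = 100
print(b[0, 0])   # 1
```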
The following worked for me:
i = 0
for arr in b:
    for idx, x in enumerate(arr):
        arr[idx] = a[i]
        print(arr[idx])
        i += 1
Output (arr[idx]): 1 2 3 4 5 6 7 8
If you type print(b) it'll output [[1 2 3 4] [5 6 7 8]]
b = a[:len(a)//2]
c = a[len(a)//2:]
Well, I'm quite new to Python but this worked for me:
for i in range(0, len(a)//2):
    b[i] = a[i]
for i in range(len(a)//2, len(a)):
    c[i - len(a)//2] = a[i]  # offset by half the length (4 here) so c is filled from index 0
Printing the 3 arrays gives the following output:
[1 2 3 4 5 6 7 8]
[1 2 3 4]
[5 6 7 8]
But I would go for Daniel's solution (the split one): 1 liner, using numpy API, ...
