Distance with array of different sizes - python

I have an array with dimensions as such:
pos = np.array([[ 1.72, 2.56],
[ 0.24, 5.67],
[ -1.24, 5.45],
[ -3.17, -0.23],
[ 1.17, -1.23],
[ 1.12, 1.08]])
and I want to find the distance from each row of the array to a reference point, which would be
ref = np.array([1.22, 1.18])
I would thus have an array with 6 elements as an answer, but I'm really confused about how to approach this with only numpy. I've tried many ways, yet the size of the ref array presents a challenge. Thanks for the help.
The expected answer is an array with 6 elements. The elements are approximately:
[ 1.468, 4.596, 4.928 , 4.611, 2.410, 0.141 ]

Using numpy and assuming Euclidean metric:
import numpy as np
np.linalg.norm(pos - ref, axis=1)
If you need a Python list (instead of numpy array), add .tolist() to the previous line:
np.linalg.norm(pos - ref, axis=1).tolist()
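To see why the different sizes are not a problem, here is a minimal sketch that spells out what the one-liner does. It assumes the pos and ref arrays from the question; the names diff, dist_manual and dist_norm are only illustrative.

```python
import numpy as np

pos = np.array([[ 1.72,  2.56],
                [ 0.24,  5.67],
                [-1.24,  5.45],
                [-3.17, -0.23],
                [ 1.17, -1.23],
                [ 1.12,  1.08]])
ref = np.array([1.22, 1.18])

# Broadcasting stretches ref (shape (2,)) across every row of pos (shape (6, 2)).
diff = pos - ref

# Euclidean distance written out by hand: square, sum over the columns, take the root.
dist_manual = np.sqrt((diff ** 2).sum(axis=1))

# The same computation through np.linalg.norm, as in the answer above.
dist_norm = np.linalg.norm(diff, axis=1)

print(dist_norm.round(3))
```

Both versions return one distance per row of pos, so the result has 6 elements.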

Related

Issue with numpy matrix multiplication

I'm trying to multiply two matrices of dimensions (17,2) by transposing one of the matrices
Here is example p1
p1 = [[ 0.15520622 -0.92034567]
[ 0.43294367 -1.05921439]
[ 0.7569707 -1.15179354]
[ 1.08099772 -1.15179354]
[ 1.35873517 -0.96663524]
[-1.51121847 -0.64260822]
[-1.32606018 -0.87405609]
[-1.00203315 -0.96663524]
[-0.67800613 -0.96663524]
[-0.3539791 -0.87405609]
[ 0.89583942 1.02381648]
[ 0.66439155 1.3478435 ]
[ 0.3866541 1.48671223]
[ 0.15520622 1.5330018 ]
[-0.07624165 1.5330018 ]
[-0.3539791 1.44042265]
[-0.58542698 1.20897478]]
here is another example matrix p2
p2 = [[ 0.20932473 -0.90029958]
[ 0.53753779 -1.03849455]
[ 0.88302521 -1.10759204]
[ 1.24578701 -1.02122018]
[ 1.47035383 -0.77937898]
[-1.46628927 -0.69300713]
[-1.29354556 -0.9521227 ]
[-0.96533251 -1.03849455]
[-0.63711946 -1.00394581]
[-0.3089064 -0.90029958]
[ 0.86575084 1.06897874]
[ 0.55481216 1.37991742]
[ 0.26114785 1.50083802]
[ 0.03658102 1.51811239]
[-0.1879858 1.50083802]
[-0.46437574 1.37991742]
[-0.74076568 1.08625311]]
I'm trying to multiply them using numpy
import numpy
print(p1.T * p2)
But I'm getting the following error
operands could not be broadcast together with shapes (2,17) (17,2)
This is the expected matrix multiplication output
[[11.58117944 2.21072324]
[-0.51754442 22.28728876]]
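Where exactly am I going wrong?
Matrix multiplication is done with np.dot(p1.T, p2), because for NumPy arrays
A * B means element-wise multiplication, which requires the two shapes to be broadcastable.
So you should use np.dot (or, equivalently for 2-D arrays, the @ operator):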
p1.T.dot(p2)
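A small sketch of the difference, using short stand-in arrays instead of the 17-row data from the question (a and b are made up for illustration; only their (n, 2) layout matches p1 and p2):

```python
import numpy as np

# Stand-in arrays with the same layout as p1 and p2: one 2-D point per row.
a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # shape (3, 2)
b = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # shape (3, 2)

# Matrix product: (2, 3) times (3, 2) gives a (2, 2) result.
prod_dot = np.dot(a.T, b)
prod_at = a.T @ b            # same product, written with the @ operator

# Element-wise product only works when the shapes are broadcastable.
elementwise = a * b          # fine: (3, 2) with (3, 2)
try:
    a.T * b                  # (2, 3) with (3, 2) cannot be broadcast
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

print(prod_dot)
```

With the real p1 and p2 the same call shape applies: (2, 17) times (17, 2) gives the expected (2, 2) matrix.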
Sorry for the vague question. Initially, I was getting the p1 and p2 values from a numpy matrix. I later stored them in a JSON file as lists (for optimization) using the .tolist() method, and was reading them back as numpy arrays using the numpy.array() method, which is apparently where it went wrong. I changed my code to read the data using the numpy.matrix() method instead, which seems to solve the issue. Hope this helps someone.
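A likely explanation, sketched under the assumption that the data made the round trip described above: numpy.matrix overloads * as the matrix product, while a plain array read back with numpy.array() treats * as element-wise. The stand-in arrays a and b below are made up for illustration.

```python
import json
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Round trip through JSON: the lists come back as plain ndarrays.
a_back = np.array(json.loads(json.dumps(a.tolist())))
b_back = np.array(json.loads(json.dumps(b.tolist())))

# With ndarrays, ask for the matrix product explicitly.
via_array = a_back.T @ b_back

# With np.matrix, * already means matrix product, which is why the switch "fixed" it.
via_matrix = np.matrix(a_back).T * np.matrix(b_back)

print(type(a_back).__name__, via_array.shape)
```

Keeping plain arrays and using @ or np.dot gives the same numbers without changing the container type.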

Avoid using for loop. Python 3

I have an array of shape (3,2):
import numpy as np
arr = np.array([[0.,0.],[0.25,-0.125],[0.5,-0.125]])
I was trying to build a matrix (matrix) of dimensions (6,2), with the results of the outer product of the elements i,i of arr and arr.T. At the moment I am using a for loop such as:
size = np.shape(arr)
matrix = np.zeros((size[0]*size[1],size[1]))
for i in range(np.shape(arr)[0]):
    prod = np.outer(arr[i],arr[i].T)
    matrix[size[1]*i:size[1]+size[1]*i,:] = prod
Resulting:
matrix =array([[ 0. , 0. ],
[ 0. , 0. ],
[ 0.0625 , -0.03125 ],
[-0.03125 , 0.015625],
[ 0.25 , -0.0625 ],
[-0.0625 , 0.015625]])
Is there any way to build this matrix without using a for loop (e.g. broadcasting)?
Extend the arrays to 3D with None/np.newaxis, keeping the first axis aligned while letting the second axis get pair-wise multiplied. Then perform the multiplication, leveraging broadcasting, and reshape to 2D -
matrix = (arr[:,None,:]*arr[:,:,None]).reshape(-1,arr.shape[1])
We can also use np.einsum -
matrix = np.einsum('ij,ik->ijk',arr,arr).reshape(-1,arr.shape[1])
The einsum string representation might be more intuitive, as it lets us visualize three things:
Axes that are aligned (axis=0 here).
Axes that are getting summed up (none here).
Axes that are kept i.e. element-wise multiplied (axis=1 here).
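As a quick check, here is a sketch that compares both vectorized versions against the loop from the question. The names n, m, expected, via_broadcast and via_einsum are only illustrative.

```python
import numpy as np

arr = np.array([[0., 0.], [0.25, -0.125], [0.5, -0.125]])
n, m = arr.shape

# Reference result: the explicit loop from the question.
expected = np.zeros((n * m, m))
for i in range(n):
    expected[m * i:m * (i + 1), :] = np.outer(arr[i], arr[i])

# Broadcasting: (n, m, 1) * (n, 1, m) -> (n, m, m), one outer product per row.
via_broadcast = (arr[:, :, None] * arr[:, None, :]).reshape(-1, m)

# einsum: i is aligned, j and k are both kept, nothing is summed.
via_einsum = np.einsum('ij,ik->ijk', arr, arr).reshape(-1, m)

print(np.allclose(via_broadcast, expected), np.allclose(via_einsum, expected))
```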

Probability functions convolution in python

There are N distributions which take on integer values 0,... with associated probabilities. As an example, I assume 3 variables, each given as rows of [value, prob]:
import numpy as np
x = np.array([ [0,0.3],[1,0.2],[3,0.5] ])
y = np.array([ [10,0.2],[11,0.4],[13,0.1],[14,0.3] ])
z = np.array([ [21,0.3],[23,0.7] ])
As there are N variables I convolve first x+y, then I add z, and so on.
Unfortunately numpy.convolve() takes 1-d arrays as input variables, so it does not suit this case directly. I tried padding the variables so that they all take the values 0,1,2,...,23 (if a value is not known then Pr=0)... I feel like there is another, much better solution.
Does anyone have a suggestion for making it more efficient? Thanks in advance.
I don't see a built-in method for this in SciPy; there's a way to define custom discrete random variables, but those don't support addition. Here is an approach using pandas, assuming import pandas as pd and x,y,z as in your example:
values = np.add.outer(x[:,0], y[:,0]).flatten()
probs = np.multiply.outer(x[:,1], y[:,1]).flatten()
df = pd.DataFrame({'values': values, 'probs': probs})
conv = df.groupby('values').sum()
result = conv.reset_index().values
The output is
array([[ 10. , 0.06],
[ 11. , 0.16],
[ 12. , 0.08],
[ 13. , 0.13],
[ 14. , 0.31],
[ 15. , 0.06],
[ 16. , 0.05],
[ 17. , 0.15]])
With more than two variables, you don't have to go back and forth between numpy and pandas: the additional variables can be included at the beginning.
values = np.add.outer(np.add.outer(x[:,0], y[:,0]), z[:,0]).flatten()
probs = np.multiply.outer(np.multiply.outer(x[:,1], y[:,1]), z[:,1]).flatten()
Aside: it would be better to keep values and probabilities in separate numpy arrays, if they have different intrinsic data types (integers vs reals).
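If pandas is not available, the same grouping can be sketched with NumPy alone, using np.unique to find the distinct sums and np.bincount to add up their probabilities. The helper name add_discrete is hypothetical, and independence of the variables is assumed, as in the answer above.

```python
import numpy as np

def add_discrete(a, b):
    """Distribution of the sum of two independent discrete variables.

    a and b are (k, 2) arrays of [value, prob] rows (layout taken from the question).
    """
    values = np.add.outer(a[:, 0], b[:, 0]).ravel()
    probs = np.multiply.outer(a[:, 1], b[:, 1]).ravel()
    # Group equal sums: inverse maps each entry to the index of its unique value.
    uniq, inverse = np.unique(values, return_inverse=True)
    summed = np.bincount(inverse.ravel(), weights=probs)
    return np.column_stack([uniq, summed])

x = np.array([[0, 0.3], [1, 0.2], [3, 0.5]])
y = np.array([[10, 0.2], [11, 0.4], [13, 0.1], [14, 0.3]])
z = np.array([[21, 0.3], [23, 0.7]])

xy = add_discrete(x, y)      # distribution of x + y
xyz = add_discrete(xy, z)    # then add z, and so on for more variables
print(xy)
```

Applying the helper pairwise keeps the intermediate arrays small, since duplicate sums are merged after every step.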

Reshaping numpy array from list

I have the following problem with shape of ndarray:
out.shape = (20,)
reference.shape = (20,0)
norm = [out[i] / np.sum(out[i]) for i in range(len(out))]
# norm is a list now so I convert it to ndarray:
norm_array = np.array((norm))
norm_array.shape = (20,30)
# error: operands could not be broadcast together with shapes (20,30) (20,)
diff = np.fabs(norm_array - reference)
How can I change the shape of norm_array from (20,30) into (20,), or of reference to (20,30), so I can subtract them?
EDIT: Can someone explain to me why they have different shapes, if I can access single elements of both with norm_array[0][0] and reference[0][0]?
I am not sure what you are trying to do exactly, but here is some information on numpy arrays.
A 1-d numpy array is a row vector with a shape that is a single-valued tuple:
>>> np.array([1,2,3]).shape
(3,)
You can create multidimensional arrays by passing in nested lists. Each sub-list is a 1-d row vector of length 1, and there are 3 of them.
>>> np.array([[1],[2],[3]]).shape
(3,1)
Here is the weird part. You can create the same array, but leave the lists empty. You end up with 3 row vectors of length 0.
>>> np.array([[],[],[]]).shape
(3,0)
This is what you have for your reference array: an array with structure but no values. This brings me back to my original point:
You can't subtract an empty array.
If I make 2 arrays with the shapes you describe, I get an error
In [1856]: norm_array=np.ones((20,30))
In [1857]: reference=np.ones((20,0))
In [1858]: norm_array-reference
...
ValueError: operands could not be broadcast together with shapes (20,30) (20,0)
But that message is different from yours. If I change the shape of reference, the error messages match.
In [1859]: reference=np.ones((20,))
In [1860]: norm_array-reference
...
ValueError: operands could not be broadcast together with shapes (20,30) (20,)
So your (20,0) is wrong. I don't know if you mistyped something or not.
But if I make reference 2d with 1 in the last dimension, broadcasting works, producing a difference that matches (20,30) in shape:
In [1861]: reference=np.ones((20,1))
In [1862]: norm_array-reference
If reference = np.zeros((20,)), then I could use reference[:,None] to add that singleton last dimension.
If reference is (20,), you can't do reference[0][0]. reference[0][0] only works with 2d arrays with at least 1 in the last dim. reference[0,0] is the preferred way of indexing a single element of a 2d array.
So far this is normal array dimensions and broadcasting; something you'll learn with use.
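Putting the shape rules together, a minimal sketch, assuming reference really is 1-d with shape (20,) as the error message suggests (the flag name failed is only illustrative):

```python
import numpy as np

norm_array = np.ones((20, 30))
reference = np.zeros(20)             # shape (20,)

# Broadcasting aligns trailing axes: 30 against 20 does not match, so this fails.
try:
    norm_array - reference
    failed = False
except ValueError:
    failed = True

# Adding a trailing singleton axis gives (20, 1), which broadcasts against (20, 30).
diff = np.fabs(norm_array - reference[:, None])

print(failed, reference[:, None].shape, diff.shape)
```

Each row of norm_array is then compared against the single reference value for that row.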
===============
I'm puzzled about the shape of out. If it is (20,), how does norm_array end up as (20,30)? out must consist of 20 arrays or lists, each of which has 30 elements.
If out were a 2d array, we could normalize without iteration. Take a small example:
In [1869]: out=np.arange(12).reshape(3,4)
First, with the list comprehension:
In [1872]: [out[i]/np.sum(out[i]) for i in range(out.shape[0])]
Out[1872]:
[array([ 0. , 0.16666667, 0.33333333, 0.5 ]),
array([ 0.18181818, 0.22727273, 0.27272727, 0.31818182]),
array([ 0.21052632, 0.23684211, 0.26315789, 0.28947368])]
In [1873]: np.array(_) # and to array
Out[1873]:
array([[ 0. , 0.16666667, 0.33333333, 0.5 ],
[ 0.18181818, 0.22727273, 0.27272727, 0.31818182],
[ 0.21052632, 0.23684211, 0.26315789, 0.28947368]])
Instead, take row sums, and tell it to keep the result 2d for ease of further use
In [1876]: out.sum(axis=1,keepdims=True)
Out[1876]:
array([[ 6],
[22],
[38]])
now divide
In [1877]: out/out.sum(axis=1,keepdims=True)
Out[1877]:
array([[ 0. , 0.16666667, 0.33333333, 0.5 ],
[ 0.18181818, 0.22727273, 0.27272727, 0.31818182],
[ 0.21052632, 0.23684211, 0.26315789, 0.28947368]])
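If out really is a 1-d object array holding 20 separate arrays, one possible route is to stack it into a regular 2d array first and then normalize as above. This is a guess at the structure of out, not something the question confirms; the random data is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reconstruction of `out`: shape (20,), each element a 30-value array.
out = np.empty(20, dtype=object)
for i in range(20):
    out[i] = rng.random(30) + 0.1    # offset keeps every row sum away from zero

# Turn the 20 separate arrays into one regular (20, 30) array.
stacked = np.stack(list(out))

# Row-wise normalization without a Python loop.
norm_array = stacked / stacked.sum(axis=1, keepdims=True)

print(out.shape, stacked.shape, norm_array.sum(axis=1)[:3])
```

This only works when all 20 inner arrays have the same length.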

numpy get std between datasets

I have a dataset array A. A is n×2. It can be plotted on the x and y axes.
A[:,1] gets me all of the y values and A[:,0] gets me all the x values.
Now, I have a few other dataset arrays that are similar to A. X values are the same for these similar arrays. How do I calculate the standard deviation of the datasets? There should be a std value for each X. In the end my result std should have a length of n.
I can do this the manual way with loops but I'm not sure how to do this using NumPy in a pythonic and simple manner.
here are some sample data:
A=[[0,2.54],[1,254.5],[2,-43]]
B=[[0,3.34],[1,154.5],[2,-93]]
std_Array=[std(2.54,3.34),std(254.5,154.5),std(-43,-93)]
Suppose your arrays are all the same shape and they are in a list. Then to get the standard deviation of the first column of each you can do
arrays = [np.random.rand(10, 2) for _ in range(8)]
np.dstack(arrays).std(axis=0)[0]
This stacks the 2-D arrays into a 3-D array and then takes the std along the first axis, giving a 2 x 8 result (8 being the number of arrays). The first row of the result holds the std. devs. of the 8 sets of x-values.
If you post some sample data perhaps we could help more.
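The choice of axis decides which standard deviation you get. A short sketch with random placeholder data (the names within, across and y_std are only illustrative): axis=0 works inside each array, while axis=2 compares the arrays with each other, which gives one value per x.

```python
import numpy as np

rng = np.random.default_rng(0)
arrays = [rng.random((10, 2)) for _ in range(8)]   # 8 datasets, each n x 2 with n = 10

stacked = np.dstack(arrays)          # shape (10, 2, 8)

# Std over the rows of each array: one value per column and per dataset.
within = stacked.std(axis=0)         # shape (2, 8)

# Std across the 8 datasets: one value per row and per column.
across = stacked.std(axis=2)         # shape (10, 2)
y_std = across[:, 1]                 # length n, one std per x position

print(stacked.shape, within.shape, across.shape)
```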
Is this pythonic enough?
std_Array = numpy.std((A,B), axis = 0)[:,1]
li_arr = [np.array(x)[: , 1] for x in [A , B]]
This produces a list of numpy arrays holding only the column you want (here the y values). The result is
[array([ 2.54, 254.5 , -43. ]), array([ 3.34, 154.5 , -93. ])]
then you stack the values using column_stack
arr = np.column_stack(li_arr)
The result of the stacking is
array([[ 2.54, 3.34],
[ 254.5 , 154.5 ],
[ -43. , -93. ]])
and then finally
np.std(arr , axis = 1)
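For the sample data in the question, the two approaches above agree. The sketch below checks that and also shows the ddof argument of np.std, in case the sample standard deviation is wanted rather than the population one (which of the two the asker needs is not stated, so that part is an assumption).

```python
import numpy as np

A = [[0, 2.54], [1, 254.5], [2, -43]]
B = [[0, 3.34], [1, 154.5], [2, -93]]

# Stack the datasets along a new first axis: shape (2, 3, 2).
stacked = np.stack([A, B])

# Std across datasets for every x, keeping only the y column.
std_pop = stacked.std(axis=0)[:, 1]              # population std (ddof=0, the default)
std_sample = stacked.std(axis=0, ddof=1)[:, 1]   # sample std

# Same result through column_stack, as in the answer above.
arr = np.column_stack([np.array(d)[:, 1] for d in (A, B)])
via_column_stack = np.std(arr, axis=1)

print(std_pop, std_sample)
```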
