Numpy broadcasting 2d-1d problems - python

I have three numpy arrays:
X1.shape = (500,)
X2.shape = (5000,)
Y.shape = (5000,500)
I can run Y - X1 without a problem.
But Y - X2 results in:
ValueError: operands could not be broadcast together with shapes (5000,500) (5000,)
If I change to Y - X2[:,None], this seems to work, while Y - X1[:,None] gives the error:
ValueError: operands could not be broadcast together with shapes (5000,500) (500,1)
Please clarify!
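A minimal sketch of the broadcasting rule behind these errors, using smaller stand-in arrays (the sizes 5 and 50 are substitutes for 500 and 5000, and the variable names are only illustrative). NumPy compares shapes starting from the trailing axis, so a 1-D array of length 5 lines up with the columns of a (50, 5) array, while a 1-D array of length 50 needs an explicit trailing axis to line up with the rows.

```python
import numpy as np

# Stand-ins for the question's shapes: (5,), (50,) and (50, 5).
short_vec = np.arange(5.0)
long_vec = np.arange(50.0)
Y = np.ones((50, 5))

# (50, 5) vs (5,): trailing axes match, so this broadcasts across rows.
per_column = Y - short_vec
print(per_column.shape)  # (50, 5)

# (50, 5) vs (50,): trailing axes are 5 and 50, so this fails.
try:
    Y - long_vec
    failed = False
except ValueError:
    failed = True

# Adding a trailing axis gives shape (50, 1), which broadcasts across columns.
per_row = Y - long_vec[:, None]
print(per_row.shape)  # (50, 5)
```

In short, the 1-D array whose length matches the last axis of Y can be used directly, and the one whose length matches the first axis needs [:, None].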

Related

invert array using SVD function in numpy

I want to get the inverse of the R array of shape (3,2) using the svd method
R = [[190.93095651 189.30517758]
[187.01785506 185.38861727]
[183.29225361 181.47205695]]
I tried the following
u, s, vh = np.linalg.svd(R, full_matrices=True)
vh_1 = np.transpose(vh)
u_1 = np.transpose(u)
s_1 = np.transpose(s)
Rv = (u_1 * s_1) *vh_1
The resulting matrix Rv has shape (2, 2, 3); I expected a (2, 3) array instead.
I want to proceed with the Rv array and multiply it by the (2,1) array A
A = [-0.20434669 -0.20225446]
print(np.dot(np.transpose(Rv),A))
I expect a (3,1) array as a result. However, I got a (3,2) array instead.
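A (3,2) matrix has no true inverse, so the sketch below assumes the goal is the Moore-Penrose pseudo-inverse. It uses the reduced SVD (full_matrices=False) and matrix products (@) rather than element-wise *, which is a likely source of the unexpected shapes. The name R_pinv is only illustrative.

```python
import numpy as np

R = np.array([[190.93095651, 189.30517758],
              [187.01785506, 185.38861727],
              [183.29225361, 181.47205695]])

# Reduced SVD: u is (3, 2), s is (2,), vh is (2, 2).
u, s, vh = np.linalg.svd(R, full_matrices=False)

# Pseudo-inverse: V @ diag(1/s) @ U^T, built with matrix products.
R_pinv = vh.T @ np.diag(1.0 / s) @ u.T
print(R_pinv.shape)  # (2, 3)

# Multiplying the transpose by a (2, 1) column gives a (3, 1) result.
A = np.array([-0.20434669, -0.20225446]).reshape(2, 1)
result = R_pinv.T @ A
print(result.shape)  # (3, 1)
```

np.linalg.pinv(R) computes the same pseudo-inverse directly and can serve as a cross-check.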

np.concatenate doesn't allow sequential concatenation

I have been trying to concatenate two 1D arrays using np.concatenate but it doesn't work as expected. Can someone please let me know where I'm making a mistake?
My code is as follows:
x = np.array([1.13793103, 0.24137931, 0.48275862, 1.24137931, 1.00000000, 1.89655172])
y = np.array([0.03666667, 0.00888889, 0.01555556, 0.04 , 0.03222222, 0.06111111])
z = np.concatenate((x,y), axis=0)
print(z)
array([1.13793103, 0.24137931, 0.48275862, ... 0.04, 0.03222222, 0.06111111])
print(f'{type(x)} {type(y)} {type(z)}')
<class 'numpy.ndarray'> <class 'numpy.ndarray'> <class 'numpy.ndarray'>
print(f'{x.shape} {y.shape} {z.shape}')
(6,) (6,) (12,)
So instead of adding y as a new row, it joins the two arrays end to end, which isn't what I intended. I am looking for something like this:
array([[1.13793103, 0.24137931, 0.48275862, 1.24137931, 1.00000000, 1.89655172],
[0.03666667, 0.00888889, 0.01555556, 0.04 , 0.03222222, 0.06111111]])
You can use np.concatenate to concatenate along some axis if that dimension exists in the arrays that you want to concatenate:
x = np.array([1,2,3])
y = np.array([4,5,6])
Here, x and y have shape (3,), so they have only one axis.
This means you can only concatenate along that axis (i.e. axis=0):
z = np.concatenate((x,y))
z.shape
out : (6,)
Concatenating along axis=1 will throw an error:
z = np.concatenate((x,y), axis=1)
AxisError: axis 1 is out of bounds for array of dimension 1
You can make np.concatenate work if you reshape x and y:
x, y = x.reshape(-1,1), y.reshape(-1,1)
Now both have shape (3,1) and can be concatenated along axis 1:
z = np.concatenate((x,y),axis=1)
z.shape
(3,2)
Alternatively, you can reshape to (1,3) and concatenate along axis 0:
z = np.concatenate((x.reshape(1,-1),y.reshape(1,-1)),axis=0)
z.shape
(2,3)
Or you can use np.vstack, which does not require the reshaping.
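For the arrays in the question, a short sketch of the stacking helpers, which add the new axis for you. The names rows, also_rows and cols are only illustrative.

```python
import numpy as np

x = np.array([1.13793103, 0.24137931, 0.48275862, 1.24137931, 1.0, 1.89655172])
y = np.array([0.03666667, 0.00888889, 0.01555556, 0.04, 0.03222222, 0.06111111])

# Each 1-D array becomes one row of a 2-D result.
rows = np.vstack((x, y))
print(rows.shape)  # (2, 6)

# np.stack with its default axis=0 gives the same layout.
also_rows = np.stack((x, y))

# Each 1-D array becomes one column instead.
cols = np.column_stack((x, y))
print(cols.shape)  # (6, 2)
```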

Multiply each row of numpy array with matrix

What is the most Pythonic way to multiply each row (axis=2) of a NumPy array by a matrix? For example, I am working with images read as NumPy arrays of shape (480, 512, 3), and I want to multiply each img[i,j] by a 3x3 matrix. I don't want to use for loops for this. This is what I tried, but it gives an error:
A = np.array([
[.412453, .35758, .180423],
[.212671, .71516, .072169],
[.019334, .119193, .950227]
])
lin_XYZ = lambda x: np.dot(A, x[::-1])
#lin_XYZ = np.vectorize(lin_XYZ)
tmp_img = lin_XYZ(tmp_img[:,:])
File ".\proj1a.py", line 24, in color2luv
tmp_img = lin_XYZ(tmp_img[:,:])
File ".\proj1a.py", line 22, in <lambda>
lin_XYZ = lambda x: np.dot(A, x)
ValueError: shapes (3,3) and (480,512,3) not aligned: 3 (dim 1) != 512 (dim 1)
So A is (3,3) and x is (480, 512, 3), and what you want is a dot over the size-3 dimension. The key thing to remember with dot(A,B) is that it pairs the last dim of A with the 2nd to last of B. (That's what the error is complaining about: 3 (dim 1) != 512 (dim 1).)
x.dot(A)
x.dot(A.T)
would meet that requirement.
A.dot(x.transpose(0,2,1)) # (3,3) with (480,3,512)
would also work, though the resulting array may need further transposing - assuming you want the 3 to be last.
You can also pair dimensions with einsum or tensordot:
np.einsum('ij,kli->klj', A, x)
x[::-1] flips x on its first dimension, the 480 one. The shape remains the same. Did you want the transpose?
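A small sketch to check which variant does what, assuming the goal is out[i, j] = A @ img[i, j] for every pixel. The tiny random image and the names img, out and out_einsum are stand-ins, not the asker's data. Under that assumption the matching forms are img @ A.T and the einsum subscripts 'ij,klj->kli'.

```python
import numpy as np

A = np.array([
    [.412453, .35758, .180423],
    [.212671, .71516, .072169],
    [.019334, .119193, .950227],
])

# Small stand-in for the (480, 512, 3) image.
rng = np.random.default_rng(0)
img = rng.random((4, 5, 3))

# Matrix product over the last axis: out[i, j] == A @ img[i, j].
out = img @ A.T

# The same contraction written with explicit index pairing.
out_einsum = np.einsum('ij,klj->kli', A, img)

print(out.shape)  # (4, 5, 3)
```

If the x[::-1] in the question was meant to reverse the channel order (an assumption, for example BGR to RGB), that would be img[..., ::-1] rather than img[::-1].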

Input dimensions for distance function for nearest neighbors

In the context of unsupervised nearest neighbors with scikit-learn, I have implemented my own distance function to deal with my uncertain points (i.e. a point is represented as a normal distribution):
def my_mahalanobis_distance(x, y):
    '''
    x: array of shape (4,) x[0]: mu_x_1, x[1]: mu_x_2,
                           x[2]: cov_x_11, x[3]: cov_x_22
    y: array of shape (4,) y[0]: mu_y_1, y[1]: mu_y_2,
                           y[2]: cov_y_11, y[3]: cov_y_22
    '''
    # the covariance terms are the last two entries, per the docstring
    cov_inv = np.linalg.inv(np.diag(x[2:])+np.diag(y[2:]))
    return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)
However, when I set my nearest neighbors:
nnbrs = NearestNeighbors(n_neighbors=1, metric='pyfunc', func=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)
where X is a (N, 4) (n_samples, n_features) array, if I print x and y in my my_mahalanobis_distance, I get shapes of (10,) instead of (4,) as I would expect.
Example:
I add the following line to my_mahalanobis_distance:
print(x.shape)
Then in my main:
n_features = 4
n_samples = 10
# generate X array:
X = np.random.rand(n_samples, n_features)
nnbrs = NearestNeighbors(n_neighbors=1, metric='pyfunc', func=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)
The result is:
(10,)
ValueError: shapes (2,) and (8,8) not aligned: 2 (dim 0) != 8 (dim 0)
I perfectly understand the error, but I do not understand why my x.shape is (10,) while my number of features is 4 in X.
I am using Python 2.7.10 and scikit-learn 0.16.1.
EDIT:
Replacing return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv) with return 1, just for testing, prints:
(10,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
So only the first call to my_mahalanobis_distance is wrong. Looking at the x and y values in this first call, my observations are:
x and y are identical;
if I run my code multiple times, x and y are still identical, but their values have changed compared to the previous run;
these values seem to come from a numpy.random function.
I would conclude that this first call is a piece of debugging code that has not been removed.
This is not an answer, yet too long for a comment. I can not reproduce the error.
Using:
Python 3.5.2 and
Sklearn 0.18.1
with the code:
from sklearn.neighbors import NearestNeighbors
import numpy as np
import scipy as sp
n_features = 4
n_samples = 10
# generate X array:
X = np.random.rand(n_samples, n_features)
def my_mahalanobis_distance(x, y):
    cov_inv = np.linalg.inv(np.diag(x[:2])+np.diag(y[:2]))
    print(x.shape)
    return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)
nnbrs = NearestNeighbors(n_neighbors=1, metric=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)
The output is
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
I customized my_mahalanobis_distance to handle this issue:
import warnings

def my_mahalanobis_distance(x, y):
    '''
    x: array of shape (4,) x[0]: mu_x_1, x[1]: mu_x_2,
                           x[2]: cov_x_11, x[3]: cov_x_22
    y: array of shape (4,) y[0]: mu_y_1, y[1]: mu_y_2,
                           y[2]: cov_y_11, y[3]: cov_y_22
    '''
    if (x.size, y.size) == (4, 4):
        return sp.spatial.distance.mahalanobis(x[:2], y[:2],
                                               np.linalg.inv(np.diag(x[2:])
                                                             + np.diag(y[2:])))
    # to handle the buggy first call when calling NearestNeighbors.fit()
    else:
        warnings.warn('x and y are respectively of size %i and %i' % (x.size, y.size))
        return sp.spatial.distance.euclidean(x, y)
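To sanity-check the distance itself, independently of how scikit-learn calls it, here is a NumPy-only sketch. It assumes the (mu_1, mu_2, var_1, var_2) layout from the docstring and diagonal covariances, in which case the Mahalanobis distance reduces to a variance-weighted Euclidean distance. The names uncertain_point_distance and nearest_neighbor_indices are hypothetical helpers, not part of any library, and the brute-force search is only for checking small inputs.

```python
import numpy as np

def uncertain_point_distance(x, y):
    """Mahalanobis distance for points laid out as (mu_1, mu_2, var_1, var_2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != (4,) or y.shape != (4,):
        raise ValueError("expected two arrays of shape (4,)")
    diff = x[:2] - y[:2]
    # The inverse of diag(a) + diag(b) is diag(1 / (a + b)).
    summed_var = x[2:] + y[2:]
    return float(np.sqrt(np.sum(diff ** 2 / summed_var)))

def nearest_neighbor_indices(X, metric):
    """Brute-force index of the nearest other row, for each row of X."""
    n = len(X)
    dist = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(n):
            if i != j:
                dist[i, j] = metric(X[i], X[j])
    return dist.argmin(axis=1)

rng = np.random.default_rng(0)
X = rng.random((10, 4)) + 0.1  # offset keeps the variance columns positive
idx = nearest_neighbor_indices(X, uncertain_point_distance)
print(idx.shape)  # (10,)
```

Raising on an unexpected shape, rather than silently falling back to another metric, makes a stray call with a (10,) array visible immediately. Whether that is preferable to the warning-based fallback above depends on whether the library in use tolerates an exception at that point.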

Raising an array to different values

I'm planning on plotting y^n vs x for different values of n. Here is my sample code:
import numpy as np
x=np.arange(1,5)
y=np.arange(2,9,2)
exponent=np.linspace(1,8,50)
z=y**exponent
With this, I got the following error:
ValueError: operands could not be broadcast together with shapes (4,) (50,)
My idea is that for each value of n, I get an array containing the values of y raised to n. For instance:
y1= [] #an array where y**1
y2= [] #an array where y**1.5
y3= [] #an array where y**2
etc. I don't know how I can get those 50 arrays for y**n. Is there an easier way to do it? Thank you.
You can use broadcasting (explained in the NumPy docs) and create a new axis:
z = y**exponent[:,np.newaxis]
In other words, instead of
>>> y = np.arange(2,9,2)
>>> exponent = np.linspace(1, 8, 50)
>>> z = y**exponent
Traceback (most recent call last):
File "<ipython-input-40-2fe7ff9626ed>", line 1, in <module>
z = y**exponent
ValueError: operands could not be broadcast together with shapes (4,) (50,)
You can use array[:,np.newaxis] (or array[:,None], the same thing, but newaxis is more explicit about your intent) to give the array an extra dimension of size 1:
>>> exponent.shape
(50,)
>>> exponent[:,np.newaxis].shape
(50, 1)
and so
>>> z = y**exponent[:,np.newaxis]
>>> z.shape
(50, 4)
>>> z[0]
array([ 2., 4., 6., 8.])
>>> z[1]
array([ 2.20817903, 4.87605462, 7.75025005, 10.76720154])
>>> z[0]**exponent[1]
array([ 2.20817903, 4.87605462, 7.75025005, 10.76720154])
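As a quick check that each row of z really is y raised to one exponent, a short sketch along the same lines. The dictionary curves is only an illustrative way to get the "one array per n" layout the question describes.

```python
import numpy as np

x = np.arange(1, 5)
y = np.arange(2, 9, 2)
exponent = np.linspace(1, 8, 50)

# One row per exponent: z[i] == y ** exponent[i].
z = y ** exponent[:, np.newaxis]
print(z.shape)  # (50, 4)

# Pair each exponent with its row, e.g. for labelling curves.
curves = {float(n): row for n, row in zip(exponent, z)}
print(len(curves))  # 50
```

Since z.T has shape (4, 50), passing x and z.T to a plotting call such as matplotlib's plt.plot(x, z.T) should draw one line per exponent without a loop.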
