I have three numpy arrays:
X1.shape = (500,)
X2.shape = (5000,)
Y.shape = (5000,500)
I can run Y - X1 without a problem.
But Y - X2 results in:
ValueError: operands could not be broadcast together with shapes (5000,500) (5000,)
If I change to Y - X2[:,None] this seems to work, while Y - X1[:,None] gives the error:
ValueError: operands could not be broadcast together with shapes (5000,500) (500,1)
Please clarify!
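A minimal sketch of the broadcasting rule, using arrays of zeros with the shapes from the question (the values are placeholders, only the shapes matter). NumPy lines shapes up from the last axis backwards, and each pair of axes must either match or have one of them equal to 1:

```python
import numpy as np

X1 = np.zeros(500)
X2 = np.zeros(5000)
Y = np.zeros((5000, 500))

# (5000, 500) vs (500,): the trailing axes match (500 and 500),
# so X1 is subtracted from every row of Y.
row_result = Y - X1

# (5000, 500) vs (5000,): the trailing axes are 500 and 5000, so this fails.
try:
    Y - X2
    failed = False
except ValueError:
    failed = True

# X2[:, None] has shape (5000, 1): axis 0 matches and the length-1 axis
# is stretched across the 500 columns of Y.
col_result = Y - X2[:, None]

print(row_result.shape, col_result.shape, failed)
```

By the same rule, X1[:, None] has shape (500, 1), and 500 does not match 5000 on the first axis, which is why that combination raises the error.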
I want to get the (pseudo-)inverse of the array R, of shape (3,2), using the SVD method:
R = [[190.93095651 189.30517758]
[187.01785506 185.38861727]
[183.29225361 181.47205695]]
I tried the following:
u, s, vh = np.linalg.svd(R, full_matrices=True)
vh_1 = np.transpose(vh)
u_1 = np.transpose(u)
s_1 = np.transpose(s)
Rv = (u_1 * s_1) *vh_1
The shape of the resulting matrix Rv is (2, 2, 3); I expected to get a (2,3) array instead.
I want to proceed with the Rv array and multiply it with the (2,1) array A:
A = [-0.20434669 -0.20225446]
print(np.dot(np.transpose(Rv),A))
I expected a (3,1) array as a result. However, I got a (3,2) array instead.
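One possible way to get the expected shapes, as a sketch rather than the definitive fix: use the reduced SVD (full_matrices=False) and matrix products (@) instead of the elementwise *. The names R_pinv and result are mine, and I am assuming A is meant as a column vector:

```python
import numpy as np

R = np.array([[190.93095651, 189.30517758],
              [187.01785506, 185.38861727],
              [183.29225361, 181.47205695]])
A = np.array([-0.20434669, -0.20225446])

# Reduced SVD: u is (3, 2), s is (2,), vh is (2, 2)
u, s, vh = np.linalg.svd(R, full_matrices=False)

# Pseudo-inverse: V @ diag(1/s) @ U^T, built with matrix products
R_pinv = vh.T @ np.diag(1.0 / s) @ u.T

# Treat A as a (2, 1) column so the product comes out as (3, 1)
result = R_pinv.T @ A.reshape(2, 1)

print(R_pinv.shape, result.shape)
```

np.linalg.pinv(R) should give the same matrix as R_pinv, so it is a convenient cross-check.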
I have been trying to concatenate two 1D arrays using np.concatenate but it doesn't work as expected. Can someone please let me know where I'm making a mistake?
My code is as follows:
x = np.array([1.13793103, 0.24137931, 0.48275862, 1.24137931, 1.00000000, 1.89655172])
y = np.array([0.03666667, 0.00888889, 0.01555556, 0.04 , 0.03222222, 0.06111111])
z = np.concatenate((x,y), axis=0)
print(z)
array([1.13793103, 0.24137931, 0.48275862, ... 0.04, 0.03222222, 0.06111111])
print(f'{type(x)} {type(y)} {type(z)}')
<class 'numpy.ndarray'> <class 'numpy.ndarray'> <class 'numpy.ndarray'>
print(f'{x.shape} {y.shape} {z.shape}')
(6,) (6,) (12,)
So instead of adding y as a new row, it joins the two arrays end to end, which isn't my intention. I am looking for something like this:
array([[1.13793103, 0.24137931, 0.48275862, 1.24137931, 1.00000000, 1.89655172],
       [0.03666667, 0.00888889, 0.01555556, 0.04 , 0.03222222, 0.06111111]])
You can use np.concatenate to concatenate along some axis if that dimension exists in the arrays that you want to concatenate:
x = np.array([1,2,3])
y = np.array([4,5,6])
Here, x and y both have shape (3,), so there is only one axis.
This means you can only concatenate along that axis (i.e. axis=0):
z = np.concatenate((x,y))
z.shape
out : (6,)
Concatenating along axis=1 will throw an error:
z = np.concatenate((x,y), axis=1)
AxisError: axis 1 is out of bounds for array of dimension 1
You can make np.concatenate work if you reshape x and y:
x, y = x.reshape(-1,1), y.reshape(-1,1)
Now both have shape (3,1) and can be concatenated along axis 1:
z = np.concatenate((x, y), axis=1)
z.shape
(3,2)
Alternatively, you can reshape to (1,3) and concatenate along axis 0:
z = np.concatenate((x.reshape(1,-1),y.reshape(1,-1)),axis=0)
z.shape
(2,3)
Or use np.vstack, which does not require the reshaping.
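For completeness, a short sketch of the stacking helpers on the same x and y. np.stack and np.column_stack are not mentioned above; I am adding them as alternatives that also take the 1-D arrays directly:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

rows = np.vstack((x, y))        # each input becomes a row
also_rows = np.stack((x, y))    # joins along a new axis 0
cols = np.column_stack((x, y))  # each input becomes a column

print(rows.shape, also_rows.shape, cols.shape)
```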
What is the most pythonic way to multiply each row (along axis=2) of a np array with a matrix? For example, I am working with images read as np arrays of shape (480, 512, 3), and I want to multiply each img[i,j] with a 3x3 matrix. I don't want to use for loops for this. This is what I tried, but it gives an error:
A = np.array([
    [.412453, .35758, .180423],
    [.212671, .71516, .072169],
    [.019334, .119193, .950227]
])
lin_XYZ = lambda x: np.dot(A, x[::-1])
#lin_XYZ = np.vectorize(lin_XYZ)
tmp_img = lin_XYZ(tmp_img[:,:])
File ".\proj1a.py", line 24, in color2luv
tmp_img = lin_XYZ(tmp_img[:,:])
File ".\proj1a.py", line 22, in <lambda>
lin_XYZ = lambda x: np.dot(A, x)
ValueError: shapes (3,3) and (480,512,3) not aligned: 3 (dim 1) != 512 (dim 1)
So A is (3,3) and x is (480, 512, 3), and what you want is a dot over the size-3 dimension. The key thing to remember with dot(A,B) is: the last dim of A pairs with the 2nd to last dim of B. (That's what the error is complaining about: 3 (dim 1) != 512 (dim 1).)
x.dot(A)
or
x.dot(A.T)
would meet that requirement. Of the two, x.dot(A.T) is the one that gives A.dot(x[i,j]) at each pixel.
A.dot(x.transpose(0,2,1)) # (3,3) with (480,3,512)
would also work, though the resulting array may need further transposing - assuming you want the 3 to be last.
You can also pair dimensions with einsum or tensordot:
np.einsum('ij,kli->klj', A, x)
x[::-1] flips x on its first dimension, the 480 one. The shape remains the same. Did you want the transpose?
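A small check of the pairing, assuming the goal is A.dot(img[i,j]) at every pixel. The (4, 5, 3) array is a stand-in I made up for the real (480, 512, 3) image:

```python
import numpy as np

A = np.array([[.412453, .35758, .180423],
              [.212671, .71516, .072169],
              [.019334, .119193, .950227]])

# Small stand-in for the image, filled with arbitrary values
img = np.arange(4 * 5 * 3, dtype=float).reshape(4, 5, 3)

# Contract the last axis of img with the last axis of A
out = img @ A.T

# Same contraction written with einsum: out[k, l, i] = sum_j A[i, j] * img[k, l, j]
out_einsum = np.einsum('ij,klj->kli', A, img)

print(out.shape)
```

Note that the index string here differs from the one above: 'ij,kli->klj' contracts with the first axis of A, which corresponds to x.dot(A) rather than x.dot(A.T).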
In the context of unsupervised nearest neighbors with scikit-learn, I have implemented my own distance function to deal with my uncertain points (i.e. a point is represented as a normal distribution):
def my_mahalanobis_distance(x, y):
    '''
    x: array of shape (4,) x[0]: mu_x_1, x[1]: mu_x_2,
                           x[2]: cov_x_11, x[3]: cov_x_22
    y: array of shape (4,) y[0]: mu_y_1, y[1]: mu_y_2,
                           y[2]: cov_y_11, y[3]: cov_y_22
    '''
    # the covariance entries are the last two components of each point
    cov_inv = np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:]))
    return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)
However, when I set my nearest neighbors:
nnbrs = NearestNeighbors(n_neighbors=1, metric='pyfunc', func=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)
where X is a (N, 4) (n_samples, n_features) array, if I print the shapes of x and y inside my_mahalanobis_distance, I get (10,) instead of (4,) as I would expect.
Example:
I add the following line to my_mahalanobis_distance:
print(x.shape)
Then in my main:
n_features = 4
n_samples = 10
# generate X array:
X = np.random.rand(n_samples, n_features)
nnbrs = NearestNeighbors(n_neighbors=1, metric='pyfunc', func=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)
The result is:
(10,)
ValueError: shapes (2,) and (8,8) not aligned: 2 (dim 0) != 8 (dim 0)
I perfectly understand the error, but I do not understand why my x.shape is (10,) while my number of features is 4 in X.
I am using Python 2.7.10 and scikit-learn 0.16.1.
EDIT:
Replacing return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv) with return 1, just for testing, gives:
(10,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
So only the first call to my_mahalanobis_distance is wrong. Looking at the x and y values at this first iteration, my observations are:
x and y are identical
if I run my code multiple times, x and y are still identical, but their values have changed compared to the previous run.
these values seem to come from a numpy.random function.
I would conclude that such a first call is a debugging piece of code which has not been removed.
This is not an answer, yet too long for a comment. I can not reproduce the error.
Using:
Python 3.5.2 and
Sklearn 0.18.1
with the code:
from sklearn.neighbors import NearestNeighbors
import numpy as np
import scipy as sp
n_features = 4
n_samples = 10
# generate X array:
X = np.random.rand(n_samples, n_features)
def my_mahalanobis_distance(x, y):
    cov_inv = np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:]))
    print(x.shape)
    return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)
nnbrs = NearestNeighbors(n_neighbors=1, metric=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)
The output is
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
I customized my_mahalanobis_distance to handle this issue:
import warnings
import numpy as np
import scipy as sp
import scipy.spatial.distance  # makes sure sp.spatial.distance is loaded

def my_mahalanobis_distance(x, y):
    '''
    x: array of shape (4,) x[0]: mu_x_1, x[1]: mu_x_2,
                           x[2]: cov_x_11, x[3]: cov_x_22
    y: array of shape (4,) y[0]: mu_y_1, y[1]: mu_y_2,
                           y[2]: cov_y_11, y[3]: cov_y_22
    '''
    if (x.size, y.size) == (4, 4):
        return sp.spatial.distance.mahalanobis(x[:2], y[:2],
                                               np.linalg.inv(np.diag(x[2:])
                                                             + np.diag(y[2:])))
    # to handle the buggy first call when calling NearestNeighbors.fit()
    else:
        warnings.warn('x and y are respectively of size %i and %i' % (x.size, y.size))
        return sp.spatial.distance.euclidean(x, y)
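Since both covariance matrices are diagonal, the inverse can be avoided altogether. Here is a NumPy-only sketch of the same idea; the name diag_mahalanobis is hypothetical, and the Euclidean fallback for unexpected sizes simply mirrors the workaround above:

```python
import numpy as np

def diag_mahalanobis(x, y):
    """Mahalanobis distance for points laid out as [mu_1, mu_2, var_1, var_2]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != (4,) or y.shape != (4,):
        # Unexpected input size: fall back to the plain Euclidean distance
        return float(np.linalg.norm(x - y))
    diff = x[:2] - y[:2]
    var_sum = x[2:] + y[2:]
    # inv(diag(v)) is diag(1 / v), so dividing elementwise is enough
    return float(np.sqrt(np.sum(diff ** 2 / var_sum)))

a = np.array([0.0, 0.0, 1.0, 1.0])
b = np.array([3.0, 4.0, 1.0, 1.0])
print(diag_mahalanobis(a, b))
```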
I'm planning on plotting y^n vs x for different values of n. Here is my sample code:
import numpy as np
x = np.arange(1,5)
y = np.arange(2,9,2)
exponent=np.linspace(1,8,50)
z=y**exponent
With this, I got the following error:
ValueError: operands could not be broadcast together with shapes (4,) (50,)
My idea is that for each value of n, I will get an array where that array contains the new values of y that is now raised to n. For instance:
y1= [] #an array where y**1
y2= [] #an array where y**1.5
y3= [] #an array where y**2
etc. I don't know how I can get those 50 arrays for y**n. Is there an easier way to do it? Thank you.
You can use "broadcasting" (explained in the NumPy docs) and create a new axis:
z = y**exponent[:,np.newaxis]
In other words, instead of
>>> y = np.arange(2,9,2)
>>> exponent = np.linspace(1, 8, 50)
>>> z = y**exponent
Traceback (most recent call last):
File "<ipython-input-40-2fe7ff9626ed>", line 1, in <module>
z = y**exponent
ValueError: operands could not be broadcast together with shapes (4,) (50,)
You can use array[:,np.newaxis] (or array[:,None], the same thing, but newaxis is more explicit about your intent) to give the array an extra dimension of size 1:
>>> exponent.shape
(50,)
>>> exponent[:,np.newaxis].shape
(50, 1)
and so
>>> z = y**exponent[:,np.newaxis]
>>> z.shape
(50, 4)
>>> z[0]
array([ 2., 4., 6., 8.])
>>> z[1]
array([ 2.20817903, 4.87605462, 7.75025005, 10.76720154])
>>> z[0]**exponent[1]
array([ 2.20817903, 4.87605462, 7.75025005, 10.76720154])
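To get back to the "50 arrays" idea from the question, each row of z can be paired with its exponent. A sketch, where the dictionary curves is just one hypothetical way to hold them:

```python
import numpy as np

x = np.arange(1, 5)
y = np.arange(2, 9, 2)
exponent = np.linspace(1, 8, 50)

# z[i] holds y ** exponent[i], one value per entry of x
z = y ** exponent[:, np.newaxis]

# Map each exponent n to its array y ** n
curves = {float(n): row for n, row in zip(exponent, z)}

# Assuming matplotlib is available, plt.plot(x, z.T) would draw
# all 50 curves against x in a single call.
print(z.shape, len(curves))
```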