What is the most pythonic way to multiply each row(axis=2) of a np array with a matrix. For example, I am working with images read as np array of shape (480, 512, 3), I want to multiply each img[i,j] with a 3x3 matrix. I don't want to use for loops for this. This is what I tried but it gives an error
A = np.array([
[.412453, .35758, .180423],
[.212671, .71516, .072169],
[.019334, .119193, .950227]
])
lin_XYZ = lambda x: np.dot(A, x[::-1])
#lin_XYZ = np.vectorize(lin_XYZ)
tmp_img = lin_XYZ(tmp_img[:,:])
File ".\proj1a.py", line 24, in color2luv
tmp_img = lin_XYZ(tmp_img[:,:])
File ".\proj1a.py", line 22, in <lambda>
lin_XYZ = lambda x: np.dot(A, x)
ValueError: shapes (3,3) and (480,512,3) not aligned: 3 (dim 1) != 512 (dim 1)
So A is (3,3) and x is (480, 512, 3), and you what is a dot on the size 3 dimension. The key thing to remember with dot(A,B) is, last dim of A with 2nd to the last of B. (That's what the error is complaining about 3 (dim 1) != 512 (dim 1))
x.dot(A)
x.dot(A.T)
would meet that requirement.
A.dot(x.transpose(0,2,1)) # (3,3) with (480,3,512)
would also work, though the resulting array may need further transposing - assuming you want the 3 to be last.
You can also pair dimensions with einsum or tensordot:
np.einsum('ij,kli->klj', A, x)
x[::-1] flips x on its first dimenion, the 480 one. Shape remains the same. Did you want the transpose?
Related
I'm trying to translate the as_strided function of NumPy to a function in Python when I translate ahead the number of strides to the number of variables according to the type of the variable (for float32 I divide the stride by 4, etc).
The code I implemented:
def as_strided(x, shape, strides):
x = x.flatten()
size = 1
for value in shape:
size *= value
arr = np.zeros(size, dtype=np.float32)
curr = 0
for i in range(shape[0]):
for j in range(shape[1]):
for k in range(shape[2]):
arr[curr] = x[i * strides[0] + j * strides[1] + k * strides[2]]
curr = curr + 1
return np.reshape(arr, shape)
In order to test the function I wrote 2 auxiliary functions:
def sliding_window(x, shape, strides):
f_mine = as_strided(x, shape, [stride // 4 for stride in strides])
f_np = np.lib.stride_tricks.as_strided(x, shape=shape, strides=strides).copy()
check_strides(x.flatten(), f_mine)
check_strides(x.flatten(), f_np)
return f_mine, f_np
def check_strides(original, strided):
s1 = int(np.where(original == strided[1][0][0])[0])
s2 = int(np.where(original == strided[0][1][0])[0])
s3 = int(np.where(original == strided[0][0][1])[0])
print([s1, s2, s3])
return [s1, s2, s3]
In the main code, I selected some shape and strides values and ran 2 cases:
Uploaded a .npy file that includes a matrix in float32 - variable x.
Created random matrix of the same size and type as variable x - variable y.
When I check the strides of the resulting matrices I get a strange phenomenon.
For case 1 - the final resulted strides obtained using the NumPy function are different from the required stride (and from my implementation).
For case 2 - the outputs are identical.
The main code:
shape = (30, 818, 300)
strides = (4, 120, 120)
# case 1
x = np.load('x.npy')
s_mine, s_np = sliding_window(x, shape, strides)
print(np.array_equal(s_mine, s_np))
#case 2
y = np.random.randn(x.shape[0], x.shape[1]).astype(np.float32)
s_mine, s_np = sliding_window(y, shape, strides)
print(np.array_equal(s_mine, s_np))
Here you can find the x.npy file that causes the desired stride change in the numpy function. I'd be happy if anyone could explain to me why this is happening.
I downloaded x.npy and loaded it. And ran as_strided on y. I haven't looked at your code.
Normally when playing with as_strided I like to look at the arrays, but in this case they are large enough that I'll focus more making sense the strides and shape.
In [39]: x.shape, x.strides
Out[39]: ((30, 1117), (4, 120))
In [40]: y.shape, y.strides
Out[40]: ((30, 1117), (4468, 4))
I wondered where you got the
shape = (30, 818, 300)
strides = (4, 120, 120)
OK the 30 is shared, but the 4 is only for x. And with those strides x looks like it's F ordered, may be even a transpose of a (1117,30) array. Your y, which was constructed with random, has the typical strides for C ordered array, 4 bytes for the inner, trailing dimension, and 4*1117 for the leading dimension.
A n-dimensional array which is initialised as
features=np.empty(shape=(100,5,2), dtype=float)
and I am trying to add 3D array into it as
features[i,:] = features_next
features_next has shape (2,5,2).
However, it shows error,
ValueError: could not broadcast input array from shape (2,5,2) into shape (5,2).
here is the piece of code :
features=np.empty(shape=((historical*2),5,2), dtype=float)
i = 0
while i < 50:
state = self.getDictState(state_index)
asks = state['asks']
bids = state['bids']
features_next = self.getNormalisedFeature(
bids=bids,
asks=asks,
state_index=state_index,
qty=qty,
price=price,
size=size,
normalize=normalize,
levels=levels
)
'''if i == 0:
features = np.array(features_next)
else:
features = np.vstack((features, features_next))'''
features[i,:] = features_next
state_index = (state_index - 1)
return features
Note : I am trying to replace commented 'if condition' with features[i,:] = features_next to make the code execution bit faster.
Its pretty simple, just one point to make features[i,:] has shape (5, 2) and feature_next has shape (2, 5, 2). I want to say that both are compatible shapes. But broadcasting is done on a smaller shape over a larger shape. SO there is error since you are doing revere. Also look up on broadcasting on numpy docs.
Next, this one I think will do
This does not directly solve your problem, but has some ideas about what you can try, like you can reshape your array before going in loop using features.shape = (50, 2, 5, 2).
import numpy as np
features=np.empty(shape=(100,5,2), dtype=float)
features_next = np.random.random((2, 5, 2))
features.shape = ((50, 2, 5, 2))
features[:] = features_next
features.shape = (100, 5, 2)
I am new on Python and I don't know exactly how to perform multiplication between arrays of different shape.
I have two different arrays w and b such that:
W.shape = [32, 5, 20]
b.shape = [5,]
and I want to multiply
W[:, i, :]*b[i]
for each i from 0 to 4.
How can I do that? Thanks in advance.
You could add a new axis to b so it is multiplied accross W's inner arrays' rows, i.e the second axis:
W * b[:,None]
What you want to do is called Broadcasting. In numpy, you can multiply this way, but only if the shapes match according to some restrictions:
Starting from the right, every component of each arrays' shape must be the equal, 1, or not exist
so right now you have:
W.shape = (32, 5, 20)
b.shape = (5,)
since 20 and 5 don't match, they cant' be broadcast.
If you were to have:
W.shape = (32, 5, 20)
b.shape = (5, 1 )
20 would match with 1 (1 is always ok) and the 5's would match, and you can then multiply them.
To get b's shape to (5, 1), you can either do .reshape(5, 1) (or, more robustly, .reshape(-1, 1)) or fancy index with [:, None]
Thus either of these work:
W * b[:,None] #yatu's answer
W * b.reshape(-1, 1)
I am trying to update the weights in a neural network with this line:
self.l1weights[0] = self.l1weights[0] + self.learning_rate * l1error
And this results in a value error:
ValueError: could not broadcast input array from shape (7,7) into shape (7)
Printing the learning_rate*error and the weights returns something like this:
[[-0.00657573]
[-0.01430752]
[-0.01739463]
[-0.00038115]
[-0.01563393]
[-0.02060908]
[-0.01559269]]
[ 4.17022005e-01 7.20324493e-01 1.14374817e-04 3.02332573e-01
1.46755891e-01 9.23385948e-02 1.86260211e-01]
It is clear the weights are initialized as a vector of length 7 in this example and the error is initialized as a 7x1 matrix. I would expect addition to return a 7x1 matrix or a vector as well, but instead it generates a 7x7 matrix like this:
[[ 4.10446271e-01 7.13748760e-01 -6.46135890e-03 2.95756839e-01
1.40180157e-01 8.57628611e-02 1.79684478e-01]
[ 4.02714481e-01 7.06016970e-01 -1.41931487e-02 2.88025049e-01
1.32448367e-01 7.80310713e-02 1.71952688e-01]
[ 3.99627379e-01 7.02929868e-01 -1.72802505e-02 2.84937947e-01
1.29361266e-01 7.49439695e-02 1.68865586e-01]
[ 4.16640855e-01 7.19943343e-01 -2.66775370e-04 3.01951422e-01
1.46374741e-01 9.19574446e-02 1.85879061e-01]
[ 4.01388075e-01 7.04690564e-01 -1.55195551e-02 2.86698643e-01
1.31121961e-01 7.67046648e-02 1.70626281e-01]
[ 3.96412924e-01 6.99715412e-01 -2.04947062e-02 2.81723492e-01
1.26146810e-01 7.17295137e-02 1.65651130e-01]
[ 4.01429313e-01 7.04731801e-01 -1.54783174e-02 2.86739880e-01
1.31163199e-01 7.67459026e-02 1.70667519e-01]]
Numpy.sum also returns the same 7x7 matrix. Is there a way to solve this without explicit reshaping? Output size is variable and this is an issue specific to when the output size is one.
When adding (7,) array (named a) with (1, 7) array (named b), broadcasting happens and generates (7, 7) array. If your just want to do element-by-element addition, keep them in the same shape.
a + b.flatten() gives (7,). flatten makes all the dimensions collapse into one. This keeps the result as a row.
a.reshape(-1, 1) + b gives (1, 7). -1 in reshape requires numpy to decide how many elements are there given other dimensions. This keeps the result as a column.
a = np.arange(7) # row
b = a.reshape(-1, 1) # column
print((a + b).shape) # (7, 7)
print((a + b.flatten()).shape) # (7,)
print((a.reshape(-1, 1) + b).shape) # (7, 1)
In your case, a and b would be self.l1weights[0] and self.learning_rate * l1error respectively.
In the context of unsupervised nearest neighbors with scikit-learn, I have implemented my own distance function to deal with my uncertain points (i.e. a point is represented as a normal distribution):
def my_mahalanobis_distance(x, y):
'''
x: array of shape (4,) x[0]: mu_x_1, x[1]: mu_x_2,
x[2]: cov_x_11, x[3]: cov_x_22
y: array of shape (4,) y[0]: mu_ y_1, y[1]: mu_y_2,
y[2]: cov_y_11, y[3]: cov_y_22
'''
cov_inv = np.linalg.inv(np.diag(x[:2])+np.diag(y[:2]))
return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)
However, when I set my nearest neighbors:
nnbrs = NearestNeighbors(n_neighbors=1, metric='pyfunc', func=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)
where X is a (N, 4) (n_samples, n_features) array, if I print x and y in my my_mahalanobis_distance, I get shapes of (10,) instead of (4,) as I would expect.
Example:
I add the following line to my_mahalanobis_distance:
print(x.shape)
Then in my main:
n_features = 4
n_samples = 10
# generate X array:
X = np.random.rand(n_samples, n_features)
nnbrs = NearestNeighbors(n_neighbors=1, metric='pyfunc', func=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)
The result is:
(10,)
ValueError: shapes (2,) and (8,8) not aligned: 2 (dim 0) != 8 (dim 0)
I perfectly understand the error, but I do not understand why my x.shape is (10,) while my number of features is 4 in X.
I am using Python 2.7.10 and scikit-learn 0.16.1.
EDIT:
replacing return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv) by return 1 just for testing return:
(10,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
So only the first call to my_mahalanobis_distance is wrong. Looking at the x and y values at this first iteration, my observations are:
x and y are identical
if I run my code multiple times, x and y are still identical but their values have change compared to the previous run.
these values seem coming from a numpy.random function.
I would conclude that such a first call is a debugging piece of code which has not been removed.
This is not an answer, yet too long for a comment. I can not reproduce the error.
Using:
Python 3.5.2 and
Sklearn 0.18.1
with the code:
from sklearn.neighbors import NearestNeighbors
import numpy as np
import scipy as sp
n_features = 4
n_samples = 10
# generate X array:
X = np.random.rand(n_samples, n_features)
def my_mahalanobis_distance(x, y):
cov_inv = np.linalg.inv(np.diag(x[:2])+np.diag(y[:2]))
print(x.shape)
return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)
n_features = 4
n_samples = 10
# generate X array:
X = np.random.rand(n_samples, n_features)
nnbrs = NearestNeighbors(n_neighbors=1, metric=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)
The output is
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
I customed my my_mahalanobis_distance to handle this issue:
def my_mahalanobis_distance(x, y):
'''
x: array of shape (4,) x[0]: mu_x_1, x[1]: mu_x_2,
x[2]: cov_x_11, x[3]: cov_x_22
y: array of shape (4,) y[0]: mu_ y_1, y[1]: mu_y_2,
y[2]: cov_y_11, y[3]: cov_y_22
'''
if (x.size, y.size) == (4, 4):
return sp.spatial.distance.mahalanobis(x[:2], y[:2],
np.linalg.inv(np.diag(x[2:])
+ np.diag(y[2:])))
# to handle the buggy first call when calling NearestNeighbors.fit()
else:
warnings.warn('x and y are respectively of size %i and %i' % (x.size, y.size))
return sp.spatial.distance.euclidean(x, y)