Suppose we are given a two-dimensional matrix A of dtype=uint8 with N rows and M columns, and a uint8 vector x of size N. We need to bitwise-XOR each row of A, i.e. A[i], with the corresponding element of x, i.e. x[i].
Currently I am doing this as follows, but I suspect there are more efficient ways of doing it with NumPy's vectorization capabilities:
for i in range(A.shape[0]):
    A[i, :] = np.bitwise_xor(A[i, :], x[i])
This is the row-wise XOR. Besides this, the XOR also needs to be applied column-wise.
Thanks in advance.
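For reference, a broadcasting sketch of both directions (variable names here are illustrative; for the column-wise case the vector must have length M, one element per column):

```python
import numpy as np

# Small illustrative inputs: A is N x M, x has one uint8 per row.
N, M = 3, 4
A = np.arange(N * M, dtype=np.uint8).reshape(N, M)
x = np.array([1, 2, 3], dtype=np.uint8)
y = np.array([4, 5, 6, 7], dtype=np.uint8)   # one uint8 per column

# Row-wise: reshape x to a column vector so x[i] broadcasts across row i.
row_xored = A ^ x[:, None]

# Column-wise: a length-M vector already broadcasts across the rows,
# so y[j] is XORed into every element of column j.
col_xored = A ^ y
```

The in-place forms `A ^= x[:, None]` and `A ^= y` avoid allocating a new array.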
I have two NxM arrays in numpy, a and b. I would like to perform a vectorized operation that does the following:
c = np.zeros(N)
for i in range(N):
    for j in range(M):
        c[i] += a[i, j] * b[i, j]
Stated in a more mathematical way, I have two matrices A and B, and I want to compute the diagonal of the product A.dot(B.T) (being imprecise with matrix transposition, etc.). I've been trying to accomplish something like this with the tensordot function, but haven't had much success. This operation is going to be performed many times, so I would like it to be efficient, i.e. without literally calculating the full matrix product and just taking the diagonal from that.
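Two vectorized equivalents of the loop above (a sketch; both avoid forming the full N x N product):

```python
import numpy as np

N, M = 4, 5
rng = np.random.default_rng(0)
a = rng.standard_normal((N, M))
b = rng.standard_normal((N, M))

# Elementwise product summed over the columns gives the row-wise dot products.
c_sum = (a * b).sum(axis=1)

# einsum: the repeated output index i is kept, so only j is summed over.
c_einsum = np.einsum('ij,ij->i', a, b)
```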
I'm currently trying to fill a matrix K where each entry in the matrix is just a function applied to two entries of an array x.
At the moment I'm using the most obvious method of running through rows and columns one at a time using a double for-loop:
K = np.zeros((x.shape[0], x.shape[0]), dtype=np.float32)
for i in range(x.shape[0]):
    for j in range(x.shape[0]):
        K[i, j] = f(x[i], x[j])
While this works fine, the resulting matrix is 10,000 by 10,000 and takes very long to calculate. I was wondering if there is a more efficient way to do this built into NumPy?
EDIT: The function in question here is a gaussian kernel:
def gaussian(a, b, sigma):
    vec = a - b
    return np.exp(-np.dot(vec, vec) / (2 * sigma**2))
where I set sigma in advance before calculating the matrix.
The array x is an array of shape (10000, 8). So the scalar product in the gaussian is between two vectors of dimension 8.
You can use a single for loop together with broadcasting. This requires changing the implementation of the gaussian function to accept 2D inputs:
def gaussian(a, b, sigma):
    vec = a - b
    return np.exp(-np.sum(vec**2, axis=-1) / (2 * sigma**2))

K = np.zeros((x.shape[0], x.shape[0]), dtype=np.float32)
for i in range(x.shape[0]):
    K[i] = gaussian(x[i:i+1], x, sigma)
Theoretically you could accomplish this without any for loop, again by using broadcasting, but here an intermediate array of shape (len(x), len(x), x.shape[1]) will be created, which might run out of memory for your array sizes:
K = gaussian(x[None, :, :], x[:, None, :], sigma)
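As an alternative (a sketch, not taken from the answer above): the pairwise squared distances can also be computed without the 3D intermediate via the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, at the cost of some floating-point rounding:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 8))   # small stand-in for the (10000, 8) array
sigma = 1.5

# ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2 xi.xj, computed for all pairs at once.
sq_norms = np.sum(x**2, axis=1)
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
np.maximum(sq_dists, 0.0, out=sq_dists)   # clip tiny negatives from rounding

K = np.exp(-sq_dists / (2 * sigma**2))
```

The largest array here is N x N, the size of K itself, rather than N x N x 8.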
I have a function which currently multiplies a matrix in scipy.sparse.csr_matrix form by a vector. I call this function many times with different values, and I would like the matrix-vector multiplication to be as efficient as possible. The matrix is an N x N matrix but contains only about m x N non-zero elements, where m << N. The non-zero elements are currently scattered randomly about the matrix. I could perform row operations to get this matrix into a form where all the elements appear on only m + 2 diagonals, and then use scipy.sparse.dia_matrix instead of scipy.sparse.csr_matrix. That would take quite a bit of work, so I was wondering if anyone knows whether it would even improve the computational efficiency?
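One hedged way to find out before committing to the row operations is to benchmark the two formats on a synthetic banded matrix of comparable size and density (the construction below is illustrative, not the questioner's actual matrix):

```python
import time

import numpy as np
import scipy.sparse as sp

N, m = 2000, 5
rng = np.random.default_rng(0)

# Synthetic matrix whose non-zeros lie on m + 2 diagonals, as in the question.
offsets = list(range(-((m + 2) // 2), (m + 2) - (m + 2) // 2))
data = rng.standard_normal((len(offsets), N))
A_dia = sp.dia_matrix((data, offsets), shape=(N, N))
A_csr = A_dia.tocsr()
v = rng.standard_normal(N)

def bench(A, reps=200):
    # Time repeated sparse matrix-vector products.
    t0 = time.perf_counter()
    for _ in range(reps):
        A @ v
    return time.perf_counter() - t0

t_csr, t_dia = bench(A_csr), bench(A_dia)
```

Both formats must produce the same product; whichever time is smaller on your hardware and your actual sparsity pattern answers the question empirically.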
I've got some numpy 2d arrays:
x, of shape (N, T)
W, of shape (V, D)
they are described as the following:
"Minibatches of size N where each sequence has length T. We assume a vocabulary of V words, assigning each to a vector of dimension D."(This is a question from cs231 A3.)
I want an output array of shape(N, T, D), where i can match the N elements to the desired vectors.
First I came out with the solution using a loop to run through all the elements in the first row of x:
for n in range(N):
    out[n, :, :] = W[x[n, :]]
Then I go on to experiment with the second solution:
out = W[x]
Both solutions gave me the right answer, but why does the second solution work? Why can I index a 2D array with another 2D array and get a 3D result?
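The second form works because of NumPy's integer (advanced) indexing: when the 2D array W is indexed with an integer array x, each element of x selects a whole row of W, so the result has x's shape followed by W's trailing dimension, i.e. (N, T, D). A small sketch with made-up numbers:

```python
import numpy as np

V, D = 5, 3   # vocabulary size, embedding dimension
N, T = 2, 4   # minibatch size, sequence length
W = np.arange(V * D).reshape(V, D)
x = np.array([[0, 2, 4, 1],
              [3, 3, 0, 2]])

out = W[x]    # shape (N, T, D); out[n, t] is the row W[x[n, t]]
```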
Let X be an M x N matrix. Denote by xi the i-th column of X. I want to create a 3-dimensional N x M x M array consisting of the M x M matrices xi.dot(xi.T).
How can I do it most elegantly with numpy? Is it possible to do this using only matrix operations, without loops?
One approach with broadcasting -
X.T[:,:,None]*X.T[:,None]
Another with broadcasting and swapping axes afterwards -
(X[:,None,:]*X).swapaxes(0,2)
Another with broadcasting and a multi-dimensional transpose afterwards -
(X[:,None,:]*X).T
Another approach with np.einsum, which might be more intuitive thinking in terms of the iterators involved if you are translating from a loopy code -
np.einsum('ij,kj->jik',X,X)
The basic idea in all of these approaches is to broadcast the columns of X against each other for elementwise multiplication while keeping the column index aligned, so that each column produces its own M x M outer product. We achieve this by inserting new axes (None) to extend X into two 3D views that broadcast against one another.
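A quick sketch verifying that the broadcasting and einsum forms all match an explicit loop of per-column outer products:

```python
import numpy as np

M, N = 3, 4
rng = np.random.default_rng(0)
X = rng.standard_normal((M, N))

# Reference: stack the outer product of each column with itself.
ref = np.stack([np.outer(X[:, i], X[:, i]) for i in range(N)])  # (N, M, M)

out_a = X.T[:, :, None] * X.T[:, None]
out_b = (X[:, None, :] * X).swapaxes(0, 2)
out_c = (X[:, None, :] * X).T
out_d = np.einsum('ij,kj->jik', X, X)
```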