Matrix product of two tensors [duplicate] - python

This question already has answers here:
How do I multiply matrices in PyTorch?
(4 answers)
Closed 3 years ago.
How would you calculate the matrix product of two tensors in PyTorch?
x = torch.Tensor([[1, 2, 3], [1, 2, 3]]).view(-1, 2)  # shape (3, 2)
y = torch.Tensor([[2, 1]]).view(2, -1)                # shape (2, 1)
I am confused between these options.

You can use one of the options from the code below:
In [188]: torch.einsum("ij, jk -> ik", x, y)
Out[188]:
tensor([[4.],
[7.],
[7.]])
In [189]: x.mm(y)
Out[189]:
tensor([[4.],
[7.],
[7.]])
In [193]: x @ y
Out[193]:
tensor([[4.],
[7.],
[7.]])
In [194]: torch.matmul(x, y)
Out[194]:
tensor([[4.],
[7.],
[7.]])
As you can see, all of these approaches yield the same result.
x * y is a Hadamard product (element-wise multiplication) and will not work in this case. torch.dot() would fail as well because it expects 1-D tensors. torch.sum(x * y) would just give a single scalar value, which is also wrong since you want matrix multiplication, not an inner product.
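For completeness, here is a minimal sketch of those failure modes (the 1-D tensors passed to torch.dot at the end are illustrative, not from the question):
import torch

x = torch.Tensor([[1, 2, 3], [1, 2, 3]]).view(-1, 2)  # shape (3, 2)
y = torch.Tensor([[2, 1]]).view(2, -1)                # shape (2, 1)

x @ y              # matrix product, shape (3, 1)
# x * y            # RuntimeError: (3, 2) and (2, 1) do not broadcast
# torch.dot(x, y)  # RuntimeError: torch.dot expects 1-D tensors

torch.dot(torch.tensor([1.0, 2.0]), torch.tensor([2.0, 1.0]))  # tensor(4.)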

Related

Fast way to do consecutive one-to-all calculations on Numpy arrays without a for-loop?

I'm working on an optimization problem, but to avoid getting into the details, I'm going to provide a simple example of a bug that's been giving me headaches for a few days.
Say I have a 2D numpy array with observed x-y coordinates:
import numpy as np
from scipy.spatial import distance

x = np.array([[1, 2], [2, 3], [4, 5], [5, 6]])
I also have a list of x-y coordinates to compare to these points (y):
y = np.array([[11, 13], [12, 14]])
I have a function that takes the sum of Manhattan distances between a value of x and all of the values in y:
def find_sum(ref_row, comp_rows):
    modeled_counts = []
    y = ref_row * len(comp_rows)
    res = list(map(distance.cityblock, ref_row, comp_rows))
    modeled_counts.append(sum(res))
    return sum(modeled_counts)
Essentially, for each item in x, I would like to find the sum of the Manhattan distances between that (x,y) pair and every (x,y) pair in y.
I've tried this out with the following line of code:
z = list(map(find_sum, x, y))
However, z has length 2 (like y), not 4 like x. Is there a way to ensure that z is the result of consecutive one-to-all calculations? That is, I'd like to calculate the sum of all of the Manhattan distances between x[0] and every point in y, and so on, so the length of z equals the length of x.
Is there a simple way to do this without a for loop? My data is rather large (~ 4 million rows), so I'd really appreciate fast solutions. I'm fairly new to Python programming, so any explanations about why the solution works and is fast would be appreciated as well, but definitely isn't required!
Thanks!
This solution implements the distance in numpy, as I think it is a good example of broadcasting, which is a very useful thing to know if you need to use arrays and matrices.
By definition of the Manhattan distance, you need to evaluate the sum of the absolute values of the differences between each column. However, the first column of x, x[:, 0], has shape (4,) and the first column of y, y[:, 0], has shape (2,), so they are not compatible for subtraction: broadcasting compares shapes starting from the trailing dimensions, and two dimensions are compatible when they are equal or one of them is 1. Sadly, neither is true for your columns.
However, you can add a new dimension of value 1 using np.newaxis, so
x[:, 0]
is array([1, 2, 4, 5]), but
x[:, 0, np.newaxis]
is
array([[1],
[2],
[4],
[5]])
and its shape is (4, 1). Now, a matrix of shape (4, 1) minus an array of shape (2,) results in a matrix of shape (4, 2), by NumPy's broadcasting rules:
  4 x 1
      2
= 4 x 2
You can obtain the differences for each column:
first_column_difference = x[:, 0, np.newaxis] - y[:, 0]
second_column_difference = x[:, 1, np.newaxis] - y[:, 1]
and evaluate the sum of their absolute values:
np.abs(first_column_difference) + np.abs(second_column_difference)
which results in a (4, 2) matrix. Now, you want to sum the values for each row, so that you have 4 values:
np.sum(np.abs(first_column_difference) + np.abs(second_column_difference), axis=1)
which results in array([44, 40, 32, 28]). The rule is simple: the axis parameter eliminates that dimension from the result, therefore using axis=1 for a (4, 2) matrix generates 4 values -- if you use axis=0, it will generate 2 values.
So, this will solve your problem:
x = np.array([[1, 2], [2, 3], [4, 5], [5, 6]])
y = np.array([[11, 13], [12, 14]])
first_column_difference = x[:, 0, np.newaxis] - y[:, 0]
second_column_difference = x[:, 1, np.newaxis] - y[:, 1]
z = np.abs(first_column_difference) + np.abs(second_column_difference)
print(np.sum(z, axis=1))
You can also skip the intermediate steps for each column and evaluate everything at once (it is a little bit harder to understand, so I prefer the method described above to explain what is happening):
print(np.abs(x[:, np.newaxis] - y).sum(axis=(1, 2)))
It is the general case for an n-dimensional Manhattan distance: if x is (u, n) and y is (v, n), it generates u rows by broadcasting (u, 1, n) with (v, n) into (u, v, n), then applying sum to eliminate the second and third axes.
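As a sanity check, this broadcasting one-liner can be compared against scipy's cdist; a minimal sketch (the random shapes here are arbitrary):
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
x = rng.integers(0, 10, (4, 3))   # (u, n)
y = rng.integers(0, 10, (2, 3))   # (v, n)

broadcast_sums = np.abs(x[:, np.newaxis] - y).sum(axis=(1, 2))
cdist_sums = cdist(x, y, "cityblock").sum(axis=1)
assert np.allclose(broadcast_sums, cdist_sums)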
Here is how you can do it using NumPy broadcasting, with a simplified explanation.
Adjust Shape For Broadcasting
import numpy as np
start_points = np.array([[1,2], [2,3], [4,5], [5,6]])
dest_points = np.array([[11,13], [12, 14]])
## using np.newaxis as an index adds a new dimension at that position
## : means take all the elements along that dimension
start_points = start_points[np.newaxis, :, :]
dest_points = dest_points[:, np.newaxis, :]
## Now let's check the shape of the point arrays
print('start_points.shape: ', start_points.shape) # (1, 4, 2)
print('dest_points.shape', dest_points.shape) # (2, 1, 2)
Let's try to understand:
- the last element of the shape represents the x and y of a point, so its size is 2
- we can think of start_points as having 1 row and 4 columns of points
- we can think of dest_points as having 2 rows and 1 column of points
So start_points and dest_points are tables of points of size (1x4) and (2x1). We clearly see that the sizes are not compatible. What happens if we perform an arithmetic operation between them? Here is where a smart part of NumPy, called broadcasting, comes in:
- it repeats the rows of start_points to match dest_points, making a (2x4) table
- it repeats the columns of dest_points to match start_points, making a (2x4) table
The result is the arithmetic operation applied between every pair of elements of start_points and dest_points.
Calculate the distance
diff_x_y = start_points - dest_points
print(diff_x_y.shape) # (2, 4, 2)
abs_diff_x_y = np.abs(start_points - dest_points)
man_distance = np.sum(abs_diff_x_y, axis=2)
print('man_distance:\n', man_distance)
sum_distance = np.sum(man_distance, axis=0)
print('sum_distance:\n', sum_distance)
Oneliner
start_points = np.array([[1,2], [2,3], [4,5], [5,6]])
dest_points = np.array([[11,13], [12, 14]])
np.sum(np.abs(start_points[np.newaxis, :, :] - dest_points[:, np.newaxis, :]), axis=(0,2))
There is a more detailed explanation of broadcasting in the NumPy documentation if you want to understand it further.
With so many rows you can make substantial savings by using a smart algorithm. Let us for simplicity assume there is just one dimension; once we have established the algorithm, getting back to the general case is a simple matter of summing over coordinates.
The naive algorithm is O(mn) where m,n are the sizes of sets X,Y. Our algorithm is O((m+n)log(m+n)) so it scales much better.
We first sort the union of X and Y by coordinate and then form the cumulative sum over Y. Next, for each x in X we find the number YbefX of y in Y to its left and use it to look up the corresponding cumulative-sum entry YbefXval. The summed distance to all y to the left of x is YbefX times the coordinate of x, minus YbefXval; the summed distance to all y to the right is the sum of all y coordinates, minus YbefXval, minus (n - YbefX) times the coordinate of x.
Where does the saving come from? Sorting the coordinates enables us to recycle the summations we have done before, instead of starting from scratch each time. This uses the fact that, up to a sign, we always sum the same y coordinates, and going from left to right the signs flip one by one.
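To make the bookkeeping concrete, here is a minimal 1-D sketch of the same idea (the helper name summed_l1_1d is made up for illustration); the full k-dimensional implementation follows below:
import numpy as np

def summed_l1_1d(x, y):
    # for every x_i, the sum of |x_i - y_j| over all j, in O((m+n) log(m+n))
    y_sorted = np.sort(y)
    prefix = np.concatenate([[0.0], np.cumsum(y_sorted)])  # prefix sums of y
    n = len(y)
    k = np.searchsorted(y_sorted, x)               # YbefX: y's left of each x
    left = k * x - prefix[k]                       # summed distances to the left
    right = (prefix[n] - prefix[k]) - (n - k) * x  # summed distances to the right
    return left + right

x = np.array([1.0, 5.0, 9.0])
y = np.array([2.0, 4.0, 8.0])
print(summed_l1_1d(x, y))                  # [11.  7. 13.]
print(np.abs(x[:, None] - y).sum(axis=1))  # brute-force check, same values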
Code:
import numpy as np
from scipy.spatial.distance import cdist
from timeit import timeit

def pp(X, Y):
    (m, k), (n, k) = X.shape, Y.shape
    XY = np.concatenate([X.T, Y.T], 1)  # per coordinate: all x's, then all y's
    idx = XY.argsort(1)
    Xmsk = idx < m                      # sorted positions that came from X
    Ymsk = ~Xmsk
    Xidx = np.arange(k)[:, None], idx[Xmsk].reshape(k, m)
    Yidx = np.arange(k)[:, None], idx[Ymsk].reshape(k, n)
    YbefX = Ymsk.cumsum(1)[Xmsk].reshape(k, m)   # number of y's left of each x
    YbefXval = XY[Yidx].cumsum(1)[np.arange(k)[:, None], YbefX - 1]
    YbefXval[YbefX == 0] = 0                     # empty sum when no y lies to the left
    XY[Xidx] = ((2 * YbefX - n) * XY[Xidx]) - 2 * YbefXval + Y.sum(0)[:, None]
    return XY[:, :m].sum(0)

def summed_cdist(X, Y):
    return cdist(X, Y, "minkowski", p=1).sum(1)

# demo
m, n, k = 1000, 500, 10
X, Y = np.random.randn(m, k), np.random.randn(n, k)
print("same result:", np.allclose(pp(X, Y), summed_cdist(X, Y)))
print("sort       :", timeit(lambda: pp(X, Y), number=1000), "ms")
print("scipy cdist:", timeit(lambda: summed_cdist(X, Y), number=100) * 10, "ms")
Sample run, comparing the smart algo "sort" to the naive algo implemented using the cdist library function:
same result: True
sort : 1.4447695480193943 ms
scipy cdist: 36.41934019047767 ms

why won't these matrices broadcast together? ValueError: operands could not be broadcast together with shapes (5,2) (2,1)

I'm trying to set up backpropagation for my neural network using numpy, but for some reason, when I set up the gradient descent equation for the matrix that holds my output weights, two of the matrices, (2,5) and (5,1), in the gradient descent equation are not broadcasting together. Am I doing this wrong?
I've tried to dissect the equation into different parts to see if there is anything else that might be causing this, but so far I've pinpointed it down to the entire matrix in the numerator and the entire matrix in the denominator (the gradient descent equation is a fraction). I've also thought that it might be happening between the original output weights and the gradient descent equation, but that is also false, because the matrix for the output weights is (5,2), not (2,5). I've also tried functions other than numpy.divide, like using numpy.dot to multiply the first equation by the second to the power of -1.
dissected code
self.outputWeights = self.outputWeights - l * (
    # numerator
    -numpy.divide(
        (2 * (numpy.dot(y.reshape(self.outputs, 1),
                        (1 + numpy.power(e, -n - b))).reshape(self.neurons, self.outputs)
              - w)).reshape(self.outputs, self.neurons),
        # denominator
        (numpy.power(1 + numpy.power(e, -n - b), 2)).reshape(self.neurons, 1)))
actual code
n = self.HIDDEN[self.layers]
b = self.bias[self.layers]
w = self.outputWeights
self.outputWeights = self.outputWeights - l * ( -numpy.divide((2 * (numpy.dot(y.reshape(self.outputs, 1), (1+numpy.power(e, -n-b))).reshape(self.neurons, self.outputs)-w)).reshape(self.outputs, self.neurons), (numpy.power(1+ numpy.power(e, -n-b), 2)).reshape(self.neurons, 1)))
I expected that, because the columns of the first matrix and the rows of the second matrix are the same size, there wouldn't be a problem.
With a matrix product, dot, the rule is that the last dim of A pairs with the 2nd-to-last dim of B:
In [136]: x=np.arange(10).reshape(5,2); y=np.arange(2)[:,None]
In [137]: x.shape, y.shape
Out[137]: ((5, 2), (2, 1))
In [138]: x.dot(y)
Out[138]:
array([[1],
[3],
[5],
[7],
[9]])
In [139]: _.shape
Out[139]: (5, 1)
The inner 2's match, and the result is (5,1).
But with elementwise operations, such as * (multiply), divide, and sum, those dimensions don't broadcast:
In [140]: x*y
---------------------------------------------------------------------------
ValueError: operands could not be broadcast together with shapes (5,2) (2,1)
A transpose of y works:
In [141]: x*y.T
Out[141]:
array([[0, 1],
[0, 3],
[0, 5],
[0, 7],
[0, 9]])
That's because y.T has shape (1,2). By broadcasting rules that can pair with (5,2) to produce a (5,2) array. The size 1 dimension can be expanded to match the 5 of x.
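If in doubt, the compatibility check can be done up front with np.broadcast_shapes (available since NumPy 1.20); a small sketch:
import numpy as np

print(np.broadcast_shapes((5, 2), (1, 2)))  # (5, 2) -- compatible
try:
    np.broadcast_shapes((5, 2), (2, 1))
except ValueError as err:
    print(err)  # explains that (5, 2) and (2, 1) are incompatible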

numpy vectorized assignment of sequences [duplicate]

This question already has answers here:
Vectorized NumPy linspace for multiple start and stop values
(4 answers)
Closed 6 years ago.
Is there a vectorized assignment of elements to sequences in numpy, like in this discussion?
for instance:
xx = np.array([1,2], dtype=object)
expanded = np.arange(xx, xx+2)  # desired syntax; np.arange does not accept array endpoints
instead of loops:
xx = np.array([1,2], dtype=object)
expanded = np.array([np.arange(x, x+2) for x in xx]).flatten()
This would be for mapping a scalar heuristic to the neighboring cells in a matrix that determined it (e.g. the range of cells that had the peak overlap from a correlation() operation).
Like this?
>>> xx = np.array([3,8,19])
>>> (xx[:,None]+np.arange(2)[None,:]).flatten()
array([ 3, 4, 8, 9, 19, 20])
The xx[:,None] operation turns the length-n vector into an n x 1 matrix, and np.arange(2)[None,:] creates a 1 x 2 matrix containing [0, 1]. Adding them together using array broadcasting gives an n x 2 matrix, which is then flattened into a length-2n vector.
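The same pattern generalizes to windows of any length (the window length w here is a made-up parameter for illustration):
import numpy as np

xx = np.array([3, 8, 19])
w = 4  # hypothetical window length
expanded = (xx[:, None] + np.arange(w)).ravel()
print(expanded)  # [ 3  4  5  6  8  9 10 11 19 20 21 22]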

For a np.array([1, 2, 3]) why is the shape (3,) instead of (3,1)? [duplicate]

This question already has answers here:
Difference between numpy.array shape (R, 1) and (R,)
(8 answers)
Closed 6 years ago.
I noticed that for a rank 1 array with 3 elements numpy returns (3,) for the shape. I know that this tuple represents the size of the array along each dimension, but why isn't it (3,1)?
import numpy as np
a = np.array([1, 2, 3]) # Create a rank 1 array
print a.shape # Prints "(3,)"
b = np.array([[1,2,3],[4,5,6]]) # Create a rank 2 array
print b.shape # Prints "(2, 3)"
In a nutshell, this is because it's a one-dimensional array (hence the one-element shape tuple). Perhaps the following will help clear things up:
>>> np.array([1, 2, 3]).shape
(3,)
>>> np.array([[1, 2, 3]]).shape
(1, 3)
>>> np.array([[1], [2], [3]]).shape
(3, 1)
We can even go three dimensions (and higher):
>>> np.array([[[1]], [[2]], [[3]]]).shape
(3, 1, 1)
You could equally ask, "Why is the shape not (3,1,1,1,1,1,1)?" They are equivalent, after all.
NumPy often chooses to collapse singular dimensions, or treat them as optional, such as during broadcasting. This is powerful because a 3-vector has exactly the same number of elements, in the same relative orientation, as a 3x1x1x1x1x1x1 array.
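If you do need the (3, 1) form, here is a minimal sketch of moving between the two shapes:
import numpy as np

a = np.array([1, 2, 3])   # shape (3,)
col = a[:, np.newaxis]    # shape (3, 1): a column vector
row = a[np.newaxis, :]    # shape (1, 3): a row vector
flat = col.squeeze()      # shape (3,): squeeze drops the size-1 dimensions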

Sum a matrix along an axis given a weight vector using numpy [duplicate]

This question already has an answer here:
Numpy Array summing with weights
(1 answer)
Closed 6 years ago.
Does numpy provide a built-in way to sum a matrix along an axis given a corresponding weight vector? My goal is to get z as output:
q = np.array([[1, 2, 3], [10, 20, 30]])
w = [.3, .4]
z = (q[0] * w[0]) + (q[1] * w[1])
print z
>> [ 4.3 8.6 12.9]
If not, is there an efficient way to perform this operation taking advantage of broadcasting using numpy?
If you turn w into a numpy array with shape (2, 1) then you can broadcast the multiplication over the rows of q. One way to do the reshaping would be to index w with np.newaxis (or equivalently, with None):
q = np.array([[1, 2, 3], [10, 20, 30]])
w = np.array([.3, .4])
print((w[:, None] * q).sum(0))
# [ 4.3 8.6 12.9]
A faster and cleaner way would be to perform matrix-vector multiplication using np.dot:
print(w.dot(q))
# [ 4.3 8.6 12.9]
This seems to do the trick:
>>> np.sum(q * w[:, np.newaxis], axis=0)
array([ 4.3, 8.6, 12.9])
The trick is to realize that in order to multiply q by w, we need to insert a new axis into w. NumPy can then expand along that axis as necessary to match the shape of q via the normal broadcasting rules. Once the multiplication has been done, you just need to sum along the correct axis and Bob's your uncle.
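For reference, np.einsum expresses the same weighted sum explicitly (shown only as an equivalent formulation, not as the answer's method):
import numpy as np

q = np.array([[1, 2, 3], [10, 20, 30]])
w = np.array([.3, .4])
print(np.einsum('i,ij->j', w, q))  # [ 4.3  8.6 12.9]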
