TL;DR
I want to replicate the functionality of numpy.matmul in theano. What's the best way to do this?
Too Short; Didn't Understand
Looking at theano.tensor.dot and theano.tensor.tensordot, I'm not seeing an easy way to do a straightforward batch matrix multiplication. i.e. treat the last two dimensions of N dimensional tensors as matrices, and multiply them. Do I need to resort to some goofy usage of theano.tensor.batched_dot? Or *shudder* loop them myself without broadcasting!?
The current pull requests don't support broadcasting, so I came up with this for now. I may clean it up, add a little more functionality, and submit my own PR as a temporary solution. Until then, I hope this helps someone!
I included the test to show it replicates numpy.matmul, given that the input complies with my more strict (temporary) assertions.
Also, .scan stops iterating the sequences at argmin(*sequencelengths) iterations. So, I believe that mismatched array shapes won't raise any exceptions.
import theano as th
import theano.tensor as tt
import numpy as np
def matmul(a: tt.TensorType, b: tt.TensorType, _left=False):
"""Replicates the functionality of numpy.matmul, except that
the two tensors must have the same number of dimensions, and their ndim must exceed 1."""
# TODO ensure that broadcastability is maintained if both a and b are broadcastable on a dim.
assert a.ndim == b.ndim # TODO support broadcasting for differing ndims.
ndim = a.ndim
assert ndim >= 2
# If we should left multiply, just swap references.
if _left:
tmp = a
a = b
b = tmp
# If a and b are 2 dimensional, compute their matrix product.
if ndim == 2:
return tt.dot(a, b)
# If they are larger...
else:
# If a is broadcastable but b is not.
if a.broadcastable[0] and not b.broadcastable[0]:
# Scan b, but hold a steady.
# Because b will be passed in as a, we need to left multiply to maintain
# matrix orientation.
output, _ = th.scan(matmul, sequences=[b], non_sequences=[a[0], 1])
# If b is broadcastable but a is not.
elif b.broadcastable[0] and not a.broadcastable[0]:
# Scan a, but hold b steady.
output, _ = th.scan(matmul, sequences=[a], non_sequences=[b[0]])
# If neither dimension is broadcastable or they both are.
else:
# Scan through the sequences, assuming the shape for this dimension is equal.
output, _ = th.scan(matmul, sequences=[a, b])
return output
def matmul_test() -> bool:
vlist = []
flist = []
ndlist = []
for i in range(2, 30):
dims = int(np.random.random() * 4 + 2)
# Create a tuple of tensors with potentially different broadcastability.
vs = tuple(
tt.TensorVariable(
tt.TensorType('float64',
tuple((p < .3) for p in np.random.ranf(dims-2))
# Make full matrices
+ (False, False)
)
)
for _ in range(2)
)
vs = tuple(tt.swapaxes(v, -2, -1) if j % 2 == 0 else v for j, v in enumerate(vs))
f = th.function([*vs], [matmul(*vs)])
# Create the default shape for the test ndarrays
defshape = tuple(int(np.random.random() * 5 + 1) for _ in range(dims))
# Create a test array matching the broadcastability of each v, for each v.
nds = tuple(
np.random.ranf(
tuple(s if not v.broadcastable[j] else 1 for j, s in enumerate(defshape))
)
for v in vs
)
nds = tuple(np.swapaxes(nd, -2, -1) if j % 2 == 0 else nd for j, nd in enumerate(nds))
ndlist.append(nds)
vlist.append(vs)
flist.append(f)
for i in range(len(ndlist)):
assert np.allclose(flist[i](*ndlist[i]), np.matmul(*ndlist[i]))
return True
if __name__ == "__main__":
print("matmul_test -> " + str(matmul_test()))
Related
The answer for three matrices was given in this question, but I'm not sure how to apply this logic to an arbitrary amount of pairwise connected matrices:
f(i, j, k, l, ...) = min(A(i, j), B(i,k), C(i,l), D(j,k), E(j,l), F(k,l), ...)
Where A,B,... are matrices and i,j,... are indices that range up to the respective dimensions of the matrices. If we consider n indices, there are n(n-1)/2 pairs and thus matrices. I would like to find (i,j,k,...) such that f(i,j,k,l,...) is maximized. I am currently doing that as follows:
import numpy as np
import itertools
# i j k l ...
dimensions = [50,50,50,50]
n_dims = len(dimensions)
pairs = list(itertools.combinations(range(n_dims), 2))
# Construct the matrices A(i,j), B(i,k), ...
matrices = [];
for pair in pairs:
matrices.append(np.random.rand(dimensions[pair[0]], dimensions[pair[1]]))
# All the different i,j,k,l... combinations
combinations = itertools.product(*list(map(np.arange,dimensions)))
combinations = np.asarray(list(combinations))
# Find the maximum minimum
vals = []
for i in range(len(pairs)):
pair = pairs[i]
matrix = matrices[i]
vals.append(matrix[combinations[:,pair[0]], combinations[:,pair[1]]])
f = np.min(vals,axis=0)
best_indices = combinations[np.argmax(f)]
print(best_indices, np.max(f))
[5 17 17 18] 0.932985854758534
This is faster than iterating over all (i, j, k, l, ...), but a lot of time is spent constructing the combinations and vals matrices. Is there an alternative way to do this where (1) the speed of numpy's matrix computation can be preserved and (2) I don't have to construct the memory-intensive vals matrices?
Here is a generalisation of the 3D solution. I assume there are other (better?) ways of organising the recursion but this works well enough. It does a 6D example (product of dims 9x10^6) in <10 ms
Sample run, note that occasionally the indices returned by the two methods do not match. This is because they are not always unique, sometimes different index combinations yield the same maximum of minima. Also note that in the very end we do a single run of a huge 6D 9x10^12 example. Brute force is no longer viable on that, the smart method takes about 10 seconds.
trial 1
results identical True
results compatible True
brute force 276.8830654968042 ms
branch cut 9.971900499658659 ms
trial 2
results identical True
results compatible True
brute force 273.444719001418 ms
branch cut 9.236706099909497 ms
trial 3
results identical True
results compatible True
brute force 274.2998780013295 ms
branch cut 7.31226220013923 ms
trial 4
results identical True
results compatible True
brute force 273.0268925006385 ms
branch cut 6.956217200058745 ms
HUGE (100, 150, 200, 100, 150, 200) 9000000000000
branch cut 10246.754082996631 ms
Code:
import numpy as np
import itertools as it
import functools as ft
def bf(dims,pairs):
dims,pairs = np.array(dims),np.array(pairs,object)
n,m = len(dims),len(pairs)
IDX = np.empty((m,n),object)
Y,X = np.triu_indices(n,1)
IDX[np.arange(m),Y] = slice(None)
IDX[np.arange(m),X] = slice(None)
idx = np.unravel_index(
ft.reduce(np.minimum,(p[(*i,)] for p,i in zip(pairs,IDX))).argmax(),dims)
return ft.reduce(np.minimum,(
p[I] for p,I in zip(pairs,it.combinations(idx,2)))),idx
def cut(dims,pairs,offs=None):
n = len(dims)
if n<3:
if n==2:
A = pairs[0] if offs is None else np.minimum(
pairs[0],np.minimum.outer(offs[0],offs[1]))
idx = np.unravel_index(A.argmax(),dims)
return A[idx],idx
else:
idx = offs[0].argmax()
return offs[0][idx],(idx,)
gmx = min(map(np.min,pairs))
gidx = n * (0,)
A = pairs[0] if offs is None else np.minimum(
pairs[0],np.minimum.outer(offs[0],offs[1]))
Y,X = np.unravel_index(A.argsort(axis=None)[::-1],dims[:2])
for y,x in zip(Y,X):
if A[y,x] <= gmx:
return gmx,gidx
coffs = [np.minimum(p1[y],p2[x])
for p1,p2 in zip(pairs[1:n-1],pairs[n-1:])]
if not offs is None:
coffs = [*map(np.minimum,coffs,offs[2:])]
cmx,cidx = cut(dims[2:],pairs[2*n-3:],coffs)
if cmx >= A[y,x]:
return A[y,x],(y,x,*cidx)
if gmx < cmx:
gmx = min(A[y,x],cmx)
gidx = y,x,*cidx
return gmx,gidx
from timeit import timeit
IDX = 10,15,20,10,15,20
for rep in range(4):
print("trial",rep+1)
pairs = [np.random.rand(i,j) for i,j in it.combinations(IDX,2)]
print("results identical",cut(IDX,pairs)==bf(IDX,pairs))
print("results compatible",cut(IDX,pairs)[1]==bf(IDX,pairs)[1])
print("brute force",timeit(lambda:bf(IDX,pairs),number=2)*500,"ms")
print("branch cut",timeit(lambda:cut(IDX,pairs),number=10)*100,"ms")
IDX = 100,150,200,100,150,200
pairs = [np.random.rand(i,j) for i,j in it.combinations(IDX,2)]
print("HUGE",IDX,np.prod(IDX))
print("branch cut",timeit(lambda:cut(IDX,pairs),number=1)*1000,"ms")
In the following code I have been able to:
Implement Gaussian elimination with no pivoting for a general square linear system.
I have tested it by solving Ax=b, where A is a random 100x100 matrix and b is a random 100x1 vector.
I have compared my solution against the solution obtained using numpy.linalg.solve
However in the final task I need to compute the infinity norm of the difference between the two solutions. I know the infinity norm is the greatest absolute row sum of a matrix. But how can I do this to compute the infinity norm of the difference between the two solutions, my solution and the numpy.linalg.solve. Looking for some help with this!
import numpy as np
def GENP(A, b):
'''
Gaussian elimination with no pivoting.
% input: A is an n x n nonsingular matrix
% b is an n x 1 vector
% output: x is the solution of Ax=b.
% post-condition: A and b have been modified.
'''
n = len(A)
if b.size != n:
raise ValueError("Invalid argument: incompatible sizes between A & b.", b.size, n)
for pivot_row in range(n-1):
for row in range(pivot_row+1, n):
multiplier = A[row][pivot_row]/A[pivot_row][pivot_row]
#the only one in this column since the rest are zero
A[row][pivot_row] = multiplier
for col in range(pivot_row + 1, n):
A[row][col] = A[row][col] - multiplier*A[pivot_row][col]
#Equation solution column
b[row] = b[row] - multiplier*b[pivot_row]
x = np.zeros(n)
k = n-1
x[k] = b[k]/A[k,k]
while k >= 0:
x[k] = (b[k] - np.dot(A[k,k+1:],x[k+1:]))/A[k,k]
k = k-1
return x
if __name__ == "__main__":
A = np.round(np.random.rand(100, 100)*10)
b = np.round(np.random.rand(100)*10)
print (GENP(np.copy(A), np.copy(b)))
for example this code gives the following output for task 1 listed above:
[-6.61537666 0.95704368 1.30101768 -3.69577873 -2.51427519 -4.56927017
-1.61201589 2.88242622 1.67836096 2.18145556 2.60831672 0.08055869
-2.39347903 2.19672137 -0.91609732 -1.17994959 -3.87309152 -2.53330865
5.97476318 3.74687301 5.38585146 -2.71597978 2.0034079 -0.35045844
0.43988439 -2.2623829 -1.82137544 3.20545721 -4.98871738 -6.94378666
-6.5076601 3.28448129 3.42318453 -1.63900434 4.70352047 -4.12289961
-0.79514656 3.09744616 2.96397264 2.60408589 2.38707091 8.72909353
-1.33584905 1.30879264 -0.28008339 0.93560728 -1.40591226 1.31004142
-1.43422946 0.41875924 3.28412668 3.82169545 1.96675247 2.76094378
-0.90069455 1.3641636 -0.60520103 3.4814196 -1.43076816 5.01222382
0.19160657 2.23163261 2.42183726 -0.52941262 -7.35597457 -3.41685057
-0.24359225 -5.33856181 -1.41741354 -0.35654736 -1.71158503 -2.24469314
-3.26453092 1.0932765 1.58333208 0.15567584 0.02793548 1.59561909
0.31732915 -1.00695954 3.41663177 -4.06869021 3.74388762 -0.82868155
1.49789582 -1.63559124 0.2741194 -1.11709237 1.97177449 0.66410154
0.48397714 -1.96241854 0.34975886 1.3317751 2.25763568 -6.80055066
-0.65903682 -1.07105965 -0.40211347 -0.30507635]
then for task two my code gives the following:
my_solution = GENP(np.copy(A), np.copy(b))
numpy_solution = np.linalg.solve(A, b)
print(numpy_solution)
resulting in:
[-6.61537666 0.95704368 1.30101768 -3.69577873 -2.51427519 -4.56927017
-1.61201589 2.88242622 1.67836096 2.18145556 2.60831672 0.08055869
-2.39347903 2.19672137 -0.91609732 -1.17994959 -3.87309152 -2.53330865
5.97476318 3.74687301 5.38585146 -2.71597978 2.0034079 -0.35045844
0.43988439 -2.2623829 -1.82137544 3.20545721 -4.98871738 -6.94378666
-6.5076601 3.28448129 3.42318453 -1.63900434 4.70352047 -4.12289961
-0.79514656 3.09744616 2.96397264 2.60408589 2.38707091 8.72909353
-1.33584905 1.30879264 -0.28008339 0.93560728 -1.40591226 1.31004142
-1.43422946 0.41875924 3.28412668 3.82169545 1.96675247 2.76094378
-0.90069455 1.3641636 -0.60520103 3.4814196 -1.43076816 5.01222382
0.19160657 2.23163261 2.42183726 -0.52941262 -7.35597457 -3.41685057
-0.24359225 -5.33856181 -1.41741354 -0.35654736 -1.71158503 -2.24469314
-3.26453092 1.0932765 1.58333208 0.15567584 0.02793548 1.59561909
0.31732915 -1.00695954 3.41663177 -4.06869021 3.74388762 -0.82868155
1.49789582 -1.63559124 0.2741194 -1.11709237 1.97177449 0.66410154
0.48397714 -1.96241854 0.34975886 1.3317751 2.25763568 -6.80055066
-0.65903682 -1.07105965 -0.40211347 -0.30507635]
finally for task 3:
if np.allclose(my_solution, numpy_solution):
print("These solutions agree")
else:
print("These solutions do not agree")
resulting in:
These solutions agree
If what you want is only the infinity norm for matrix,
it generally should look something like this:
def inf_norm(matrix):
return max(abs(row.sum()) for row in matrix)
But since your my_solution and numpy_solution are just 1-D vectors, you
may either to reshape them (I assume 100x1 which is what you have in your
example) for use with above function:
alternative 1:
def inf_norm(matrix):
return max(abs(row.sum()) for row in matrix)
diff = my_solution - numpy_solution
inf_norm_result = inf_norm(diff.reshape((100, 1))
alternative 2:
Or if you know they will always be 1-D vectors, you can omit the sum
(because the rows will all have length 1) and compute it directly:
abs(my_solution - numpy_solution).max()
alternative 3:
or as it is written in numpy.linalg.norm (see below) documentation:
max(sum(abs(my_solution - numpy_solution), axis=1))
alternative 4:
or use the numpy.linalg.norm() (see: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.linalg.norm.html):
np.linalg.norm(my_solution - numpy_solution, np.inf)
So I have a problem that might be super duper simple.
I have these numpy ndarrays that I allocated and want to assign values to them via indices returned as lists. It might be easier if I showed you some example code. The questionable code I have is at the bottom, and in my testing (before actually taking this to scale) I keep getting syntax errors :'(
EDIT: edited to make it easier to troubleshoot and put some example code at the bottoms
import numpy as np
def do_stuff(index, mask):
# this is where the calculations are made
magic = sum(mask)
return index, magic
def foo(full_index, comparison_dims, *xargs):
# I have this function executed in Parallel since I'm using a machine with 36 nodes per core, and can access upto 16 cores for each script #blessed
# figure out how many dimensions there are, and how big they are
parent_dims = []
parent_diffs = []
for j in xargs:
parent_dims += [len(j)]
parent_diffs += [j[1] - j[0]] # this is used to find a mask
index = [] # this is where the individual dimension indices will be stored
dim_n = 0
# loop through the dimensions
while dim_n < len(parent_dims):
dim_index = full_index % parent_dims[dim_n]
index += [dim_index]
if dim_n == 0:
mask = (comparison_dims[dim_n] > xargs[dim_n][dim_index] - parent_diffs[dim_n]/2) * \
(comparison_dims[dim_n] <= xargs[dim_n][dim_index] +parent_diffs[dim_n] / 2)
else:
mask *= (comparison_dims[dim_n] > xargs[dim_n][dim_index] - parent_diffs[dim_n]/2) * \
(comparison_dims[dim_n] <=xargs[dim_n][dim_index] + parent_diffs[dim_n] / 2)
full_index //= parent_dims[dim_n]
dim_n += 1
return do_stuff(index, mask)
def bar(comparison_dims, *xargs):
if len(xargs) == comparison_dims.shape[0]:
pass
elif len(comparison_dims.shape) == 2:
pass
else:
raise ValueError("silly person, you failed")
from joblib import Parallel, delayed
dims = []
for j in xargs:
dims += [len(j)]
myArray = np.empty(tuple(dims))
results = Parallel(n_jobs=1)(
delayed(foo)(
index, comparison_dims, *xargs)
for index in range(np.prod(dims))
)
# LOOK HERE, HELP HERE!
for index_list, result in results:
# I thought this would work, but oh golly was I was wrong, index_list here is a list of ints, and result is a value
# for example index, result = [0,3,7], 45.4
# so in execution, that would yield: myArray[0,3,7] = 45.4
# instead it yields SyntaxError because I don't know what I'm doing XD
myArray[*index_list] = result
return myArray
Any ideas how I can make that work. What do I need to do?
I'm not the sharpest tool in the shed, but I think with your help we might be able to figure this out!
A quick example to troubleshoot this problem would be:
compareDims = np.array([np.random.rand(1000), np.random.rand(1000)])
dim0 = np.arange(0,1,1./20)
dim1 = np.arange(0,1,1./30)
myArray = bar(compareDims, dim0, dim1)
To index a numpy array with an arbitrary list of multidimensional indices. you actually need to use a tuple:
for index_list, result in results:
myArray[tuple(index_list)] = result
This might be weird to you people, but I happen to have this weird goal to achieve, code goes as follows.
# A is a numpy array, dtype=int32,
# and each element is actually an ID(int), the ID range might be wide,
# but the actually existing values are quite fewer than the dense range,
A = array([[379621, 552965, 192509],
[509849, 252786, 710979],
[379621, 718598, 591201],
[509849, 35700, 951719]])
# and I need to map these sparse ID to dense ones,
# my idea is to have a dict, mapping actual_sparse_ID -> dense_ID
M = {}
# so I iterate this numpy array, and check if this sparse ID has a dense one or not
for i in np.nditer(A, op_flags=['readwrite']):
if i not in M:
M[i] = len(M) # sparse ID got a dense one
i[...] = M[i] # replace sparse one with the dense ID
My goal could be achieved with np.unique(A, return_inverse=True), and the return_inverse result is what I want.
However, the numpy array I have is too huge to fully load into memory, so I cannot run np.unique over the whole data, and this is why I came up with this dict-mapping idea...
Is this the right way to go? Any possible improvement?
I will make an attempt to provide an alternative way of doing this by using numpy.unique() on sub-arrays. This solution is not fully tested. I also did not do any side-by-side performance evaluation since your solution is not fully working for me.
Let's say we have an array c that we split into two smaller arrays. Let's create some test data, for example:
>>> a = np.array([[1,1,2,3,4],[1,2,6,6,2],[8,0,1,1,4]])
>>> b = np.array([[11,2,-1,12,6],[12,2,6,11,2],[7,0,3,1,3]])
>>> c = np.vstack([a, b])
>>> print(c)
[[ 1 1 2 3 4]
[ 1 2 6 6 2]
[ 8 0 1 1 4]
[11 2 -1 12 6]
[12 2 6 11 2]
[ 7 0 3 1 3]]
Here we assume that c is the large array and a and b are sub-arrays. Of course, one could build c first and then extract sub-arrays.
Next step is to run numpy.unique() on the two sub-arrays:
>>> ua, ia = np.unique(a, return_inverse=True)
>>> ub, ib = np.unique(b, return_inverse=True)
>>> uc, ic = np.unique(c, return_inverse=True) # this is for future reference
Now, here is an algorithm for combining the results from subarrays:
def merge_unique(ua, ia, ub, ib):
# make copies *if* changing inputs is undesirable:
ua = ua.copy()
ia = ia.copy()
ub = ub.copy()
ib = ib.copy()
# find differences between unique values in the two arrays:
diffab = np.setdiff1d(ua, ub, assume_unique=True)
diffba = np.setdiff1d(ub, ua, assume_unique=True)
# find indices in ua, ub where to insert "other" unique values:
ssa = np.searchsorted(ua, diffba)
ssb = np.searchsorted(ub, diffab)
# throw away values that are too large:
ssa = ssa[np.where(ssa < len(ua))]
ssb = ssb[np.where(ssb < len(ub))]
# increment indices past previously computed "insert" positions:
for v in ssa[::-1]:
ia[ia >= v] += 1
for v in ssb[::-1]:
ib[ib >= v] += 1
# combine results:
uc = np.union1d(ua, ub) # or use ssa, ssb, diffba, diffab to update ua, ub
ic = np.concatenate([ia, ib])
return uc, ic
Now, let's run this function on the results of numpy.unique() from sub-arrays and then compare merged indices and unique values with the reference results uc and ic:
>>> uc2, ic2 = merge_unique(ua, ia, ub, ib)
>>> np.all(uc2 == uc)
True
>>> np.all(ic2 == ic)
True
Splitting into more than two sub-arrays can be handled with little additional work - simply keep accumulating "unique" values and indices, like this:
uacc, iacc = np.unique(subarr1, return_inverse=True)
ui, ii = np.unique(subarr2, return_inverse=True)
uacc, iacc = merge_unique(uacc, iacc, ui, ii)
ui, ii = np.unique(subarr3, return_inverse=True)
uacc, iacc = merge_unique(uacc, iacc, ui, ii)
ui, ii = np.unique(subarr4, return_inverse=True)
uacc, iacc = merge_unique(uacc, iacc, ui, ii)
................................ (etc.)
This question focuses on numpy.
I have a set of matrices which all share the same number of columns and have different number of rows. Let's call them A, B, C, D, etc and let their dimensions be IaxK IbxK, IcxK, etc
What I want is to efficiently compute the IaxIbxIc... tensor P defined as follow:
P(ia,ib,ic,id,ie,...)=\sum_k A(ia,k)B(ib,k)C(ic,k)...
So if I have two factors, I end up with simple matrix product.
Of course I can compute this "by hand" through outer products, something like:
def parafac(factors,components=None):
ndims = len(factors)
ncomponents = factors[0].shape[1]
total_result=array([])
if components is None:
components=range(ncomponents)
for k in components:
#for each component (to save memory)
result = array([])
for dim in range(ndims-1,-1,-1):
#Augments model with next dimension
current_dim_slice=[slice(None,None,None)]
current_dim_slice.extend([None]*(ndims-dim-1))
current_dim_slice.append(k)
if result.size:
result = factors[dim].__getitem__(tuple(current_dim_slice))*result[None,...]
else:
result = factors[dim].__getitem__(tuple(current_dim_slice))
if total_result.size:
total_result+=result
else:
total_result=result
return total_result
Still, I would like something much more computationally efficient, like relying on builtin numpy functions, but I cannot find relevant functions, can someone help me ?
Cheers, thanks
Thank you all very much for your answers, I've spent the day on this and I eventually found the solution, so I post it here for the record
This solution requires numpy 1.6 and makes use of einsum, which is
powerful voodoo magic
basically, if you had factor=[A,B,C,D] with A,B,C and D matrices with
the same number of columns, then you would compute the parafac model using:
import numpy
P=numpy.einsum('az,bz,cz,dz->abcd',A,B,C,D)
so, one line!
In the general case, I end up with this:
def parafac(factors):
ndims = len(factors)
request=''
for temp_dim in range(ndims):
request+=string.lowercase[temp_dim]+'z,'
request=request[:-1]+'->'+string.lowercase[:ndims]
return einsum(request,*factors)
Having in mind that outer product is Kronecker product in disguise your problem should be solved by this simple functions:
def outer(vectors):
shape=[v.shape[0] for v in vectors]
return reduce(np.kron, vectors).reshape(shape)
def cp2Tensor(l,A):
terms=[]
for r in xrange(A[0].shape[1]):
term=l[r]*outer([A[n][:,r] for n in xrange(len(A))])
terms.append(term)
return sum(terms)
cp2Tensor takes list of real numbers and list of matrices.
Edited after comment by Jaime.
Ok, so the following works. First a worked out example of what's going on...
a = np.random.rand(5, 8)
b = np.random.rand(4, 8)
c = np.random.rand(3, 8)
ret = np.ones(5,4,3,8)
ret *= a.reshape(5,1,1,8)
ret *= b.reshape(1,4,1,8)
ret *= c.reshape(1,1,3,8)
ret = ret.sum(axis=-1)
And a full function
def tensor(elems) :
cols = elems[0].shape[-1]
n_elems = len(elems)
ret = np.ones(tuple([j.shape[0] for j in elems] + [cols]))
for j,el in enumerate(elems) :
ret *= el.reshape((1,) * j + (el.shape[0],) +
(1,) * (len(elems) - j - 1) + (cols,))
return ret.sum(axis=-1)