Suppose we have
an n-dimensional numpy.array A
a numpy.array B with dtype=int and shape of (n, m)
How do I index A by B so that the result is an array of shape (m,), with values taken from the positions indicated by the columns of B?
For example, consider this code that does what I want when B is a python list:
>>> a = np.arange(27).reshape(3,3,3)
>>> a[[0, 1, 2], [0, 0, 0], [1, 1, 2]]
array([ 1, 10, 20]) # the result we're after
>>> bl = [[0, 1, 2], [0, 0, 0], [1, 1, 2]]
>>> a[bl]
array([ 1, 10, 20]) # also works when indexing with a python list
>>> a[bl].shape
(3,)
However, when B is a numpy array, the result is different:
>>> b = np.array(bl)
>>> a[b].shape
(3, 3, 3, 3)
Now, I can get the desired result by casting B into a tuple, but surely that cannot be the proper/idiomatic way to do it?
>>> a[tuple(b)]
array([ 1, 10, 20])
Is there a numpy function to achieve the same without casting B to a tuple?
One alternative would be converting to linear indices and then index with np.take or index into its flattened version -
np.take(a,np.ravel_multi_index(b, a.shape))
a.flat[np.ravel_multi_index(b, a.shape)]
Custom np.ravel_multi_index for performance boost
We could implement a custom version to simulate the behaviour of np.ravel_multi_index to boost the performance, like so -
def ravel_index(b, shp):
return np.concatenate((np.asarray(shp[1:])[::-1].cumprod()[::-1],[1])).dot(b)
Using it, the desired output would be found in two ways -
np.take(a,ravel_index(b, a.shape))
a.flat[ravel_index(b, a.shape)]
Benchmarking
Additionall incorporating tuple based method from the question and map based one from #Kanak's post.
Case #1 : dims = 3
In [23]: a = np.random.randint(0,9,([20]*3))
In [24]: b = np.random.randint(0,20,(a.ndim,1000000))
In [25]: %timeit a[tuple(b)]
...: %timeit a[map(np.ravel, b)]
...: %timeit np.take(a,np.ravel_multi_index(b, a.shape))
...: %timeit a.flat[np.ravel_multi_index(b, a.shape)]
...: %timeit np.take(a,ravel_index(b, a.shape))
...: %timeit a.flat[ravel_index(b, a.shape)]
100 loops, best of 3: 6.56 ms per loop
100 loops, best of 3: 6.58 ms per loop
100 loops, best of 3: 6.95 ms per loop
100 loops, best of 3: 9.17 ms per loop
100 loops, best of 3: 6.31 ms per loop
100 loops, best of 3: 8.52 ms per loop
Case #2 : dims = 6
In [29]: a = np.random.randint(0,9,([10]*6))
In [30]: b = np.random.randint(0,10,(a.ndim,1000000))
In [31]: %timeit a[tuple(b)]
...: %timeit a[map(np.ravel, b)]
...: %timeit np.take(a,np.ravel_multi_index(b, a.shape))
...: %timeit a.flat[np.ravel_multi_index(b, a.shape)]
...: %timeit np.take(a,ravel_index(b, a.shape))
...: %timeit a.flat[ravel_index(b, a.shape)]
10 loops, best of 3: 40.9 ms per loop
10 loops, best of 3: 40 ms per loop
10 loops, best of 3: 20 ms per loop
10 loops, best of 3: 29.9 ms per loop
100 loops, best of 3: 15.7 ms per loop
10 loops, best of 3: 25.8 ms per loop
Case #3 : dims = 10
In [32]: a = np.random.randint(0,9,([4]*10))
In [33]: b = np.random.randint(0,4,(a.ndim,1000000))
In [34]: %timeit a[tuple(b)]
...: %timeit a[map(np.ravel, b)]
...: %timeit np.take(a,np.ravel_multi_index(b, a.shape))
...: %timeit a.flat[np.ravel_multi_index(b, a.shape)]
...: %timeit np.take(a,ravel_index(b, a.shape))
...: %timeit a.flat[ravel_index(b, a.shape)]
10 loops, best of 3: 60.7 ms per loop
10 loops, best of 3: 60.1 ms per loop
10 loops, best of 3: 27.8 ms per loop
10 loops, best of 3: 38 ms per loop
100 loops, best of 3: 18.7 ms per loop
10 loops, best of 3: 29.3 ms per loop
So, it makes sense to look for alternatives when working with higher-dimensional inputs and with large data.
Another alternative that fits your need involves the use of np.ravel
>>> a[map(np.ravel, b)]
array([ 1, 10, 20])
However not fully numpy-based.
Performance-concerns.
Updated following the comments below.
Be that as it may, your approach is better than mine, but not better than any of #Divakar's.
import numpy as np
import timeit
a = np.arange(27).reshape(3,3,3)
bl = [[0, 1, 2], [0, 0, 0], [1, 1, 2]]
b = np.array(bl)
imps = "from __main__ import np,a,b"
reps = 100000
tup_cas_t = timeit.Timer("a[tuple(b)]", imps).timeit(reps)
map_rav_t = timeit.Timer("a[map(np.ravel, b)]", imps).timeit(reps)
fla_rp1_t = timeit.Timer("np.take(a,np.ravel_multi_index(b, a.shape))", imps).timeit(reps)
fla_rp2_t = timeit.Timer("a.flat[np.ravel_multi_index(b, a.shape)]", imps).timeit(reps)
print tup_cas_t/map_rav_t ## 0.505382211881
print tup_cas_t/fla_rp1_t ## 1.18185817386
print tup_cas_t/fla_rp2_t ## 1.71288705886
Are you looking for numpy.ndarray.tolist() ?
>>> a = np.arange(27).reshape(3,3,3)
>>> bl = [[0, 1, 2], [0, 0, 0], [1, 1, 2]]
>>> b = np.array(bl)
>>> a[b.tolist()]
array([ 1, 10, 20])
Or for arrays indexing arrays which is quite similar to list indexing :
>>> a[np.array([0, 1, 2]), np.array([0, 0, 0]), np.array([1, 1, 2])]
array([ 1, 10, 20])
However as you can from the previous link, indexing an array a with an array b directly means you are indexing the first index of a only with your whole b array which can lead to confusing output.
Related
I have the following function which is returning an array calculating the nearest neighbor:
def p_batch(U,X,Y):
return [nearest(u,X,Y) for u in U]
I would like to replace the for loop using numpy. I've been looking into numpy.vectorize() as this seems to be the right approach, but I can't get it to work. This is what I've tried so far:
def n_batch(U,X,Y):
vbatch = np.vectorize(nearest)
return vbatch(U,X,Y)
Can anyone give me a hint where I went wrong?
Edit:
Implementation of nearest:
def nearest(u,X,Y):
return Y[np.argmin(np.sqrt(np.sum(np.square(np.subtract(u,X)),axis=1)))]
Function for U,X,Y (with M=20,N=100,d=50):
U = numpy.random.mtrand.RandomState(123).uniform(0,1,[M,d])
X = numpy.random.mtrand.RandomState(456).uniform(0,1,[N,d])
Y = numpy.random.mtrand.RandomState(789).randint(0,2,[N])
Approach #1
You could use Scipy's cdist to generate all those euclidean distances and then simply use argmin and index into Y -
from scipy.spatial.distance import cdist
out = Y[cdist(U,X).argmin(1)]
Sample run -
In [76]: M,N,d = 5,6,3
...: U = np.random.mtrand.RandomState(123).uniform(0,1,[M,d])
...: X = np.random.mtrand.RandomState(456).uniform(0,1,[N,d])
...: Y = np.random.mtrand.RandomState(789).randint(0,2,[N])
...:
# Using a loop comprehension to verify values
In [77]: [nearest(U[i], X,Y) for i in range(len(U))]
Out[77]: [1, 0, 0, 1, 1]
In [78]: Y[cdist(U,X).argmin(1)]
Out[78]: array([1, 0, 0, 1, 1])
Approach #2
Another way with sklearn.metrics.pairwise_distances_argmin_min to give us those argmin indices directly -
from sklearn.metrics import pairwise
Y[pairwise.pairwise_distances_argmin(U,X)]
Runtime test with M=20,N=100,d=50 -
In [90]: M,N,d = 20,100,50
...: U = np.random.mtrand.RandomState(123).uniform(0,1,[M,d])
...: X = np.random.mtrand.RandomState(456).uniform(0,1,[N,d])
...: Y = np.random.mtrand.RandomState(789).randint(0,2,[N])
...:
Testing between cdist and pairwise_distances_argmin -
In [91]: %timeit cdist(U,X).argmin(1)
10000 loops, best of 3: 55.2 µs per loop
In [92]: %timeit pairwise.pairwise_distances_argmin(U,X)
10000 loops, best of 3: 90.6 µs per loop
Timings against loopy version -
In [93]: %timeit [nearest(U[i], X,Y) for i in range(len(U))]
1000 loops, best of 3: 298 µs per loop
In [94]: %timeit Y[cdist(U,X).argmin(1)]
10000 loops, best of 3: 55.6 µs per loop
In [95]: %timeit Y[pairwise.pairwise_distances_argmin(U,X)]
10000 loops, best of 3: 91.1 µs per loop
In [96]: 298.0/55.6 # Speedup with cdist over loopy one
Out[96]: 5.359712230215827
I want to generate a cyclic sequence of numbers like: [A B C A B C] with arbitrary length N I tried:
import numpy as np
def cyclic(N):
x = np.array([1.0,2.0,3.0]) # The main sequence
y = np.tile(x,N//3) # Repeats the sequence N//3 times
return y
but the problem with my code is if i enter any integer which ain't dividable by three then the results would have smaller length (N) than I excpected. I know this is very newbish question but i really got stuck
You can just use numpy.resize
x = np.array([1.0, 2.0, 3.0])
y = np.resize(x, 13)
y
Out[332]: array([ 1., 2., 3., 1., 2., 3., 1., 2., 3., 1., 2., 3., 1.])
WARNING: This is answer does not extend to 2D, as resize flattens the array before repeating it.
Approach #1 : Here'e one approach to handle generic sequences using modulus to generate those cyclic indices -
def cyclic_seq(x, N):
return np.take(x, np.mod(np.arange(N),len(x)))
Approach #2 : For performance, here's another method that tiles to the multiple of the max number of intervals and then making use of slicing to select the first N elements -
def cyclic_seq_v2(x, N):
return np.tile(x,(N+N-1)//len(x))[:N]
Sample runs -
In [81]: cyclic_seq([6,9,2,1,7],14)
Out[81]: array([6, 9, 2, 1, 7, 6, 9, 2, 1, 7, 6, 9, 2, 1])
In [82]: cyclic_seq_v2([6,9,2,1,7],14)
Out[82]: array([6, 9, 2, 1, 7, 6, 9, 2, 1, 7, 6, 9, 2, 1])
Runtime test
In [327]: x = np.random.randint(0,9,(3))
In [328]: %timeit np.resize(x, 10000) # #Daniel Forsman's solution
...: %timeit list(itertools.islice(itertools.cycle(x),10000)) # #Chris soln
...: %timeit cyclic_seq(x,10000) # Approach #1 from this post
...: %timeit cyclic_seq_v2(x,10000) # Approach #2 from this post
...:
1000 loops, best of 3: 296 µs per loop
10000 loops, best of 3: 185 µs per loop
10000 loops, best of 3: 120 µs per loop
10000 loops, best of 3: 28.7 µs per loop
In [329]: x = np.random.randint(0,9,(30))
In [330]: %timeit np.resize(x, 10000) # #Daniel Forsman's solution
...: %timeit list(itertools.islice(itertools.cycle(x),10000)) # #Chris soln
...: %timeit cyclic_seq(x,10000) # Approach #1 from this post
...: %timeit cyclic_seq_v2(x,10000) # Approach #2 from this post
...:
10000 loops, best of 3: 38.8 µs per loop
10000 loops, best of 3: 101 µs per loop
10000 loops, best of 3: 115 µs per loop
100000 loops, best of 3: 13.2 µs per loop
In [331]: %timeit np.resize(x, 100000) # #Daniel Forsman's solution
...: %timeit list(itertools.islice(itertools.cycle(x),100000)) # #Chris soln
...: %timeit cyclic_seq(x,100000) # Approach #1 from this post
...: %timeit cyclic_seq_v2(x,100000) # Approach #2 from this post
...:
1000 loops, best of 3: 297 µs per loop
1000 loops, best of 3: 942 µs per loop
1000 loops, best of 3: 1.13 ms per loop
10000 loops, best of 3: 88.3 µs per loop
On performance, approach #2 seems to be working quite well.
First over-length it (using math.ceil) then resize it after tile
import numpy as np
import math
def cyclic(N):
x = np.array([1.0,2.0,3.0]) # The main sequence
y = np.tile(x, math.ceil(N / 3.0))
y = np.resize(y, N)
return y
After taking Daniel Forsman's suggestion, it can be simplified as
import numpy as np
def cyclic(N):
x = np.array([1.0,2.0,3.0]) # The main sequence
y = np.resize(x, N)
return y
because np.resize automatically tiles the response in 1D
You can use itertools.cycle, an infinite iterator, for this:
>>> import itertools
>>> it = itertools.cycle([1,2,3])
>>> next(it)
1
>>> next(it)
2
>>> next(it)
3
>>> next(it)
1
You get a specific length of sequence (N), combine it with itertools.islice:
>>> list(itertools.islice(itertools.cycle([1,2,3]),11))
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]
EDIT: as you can see in Divakar's benchmark, this approach is generally intermediate in terms of speed compared to other answers. I recommend when this solution when you want an iterator returned rather than a list or numpy array.
You can use itertools cycle for that.
In [3]: from itertools import cycle
In [4]: for x in cycle(['A','B','C']):
...: print(x)
...:
C
A
B
C
A
B
C
A
B
C
A
B
C
A
B
C
A
B
Edit:
If you want to implement it with out loops, you are going to need recursive functions. Solutions based on itertools cycle and the like are just hiding the loops behind the imported function.
In [5]: def repeater(arr, n):
...: yield arr[0]
...: yield arr[1]
...: yield arr[2]
...: if n == 0:
...: yield StopIteration
...: else:
...: yield from repeater(arr, n-1)
...:
I want to create a toy training set from the XOR function:
xor = [[0, 0, 0],
[0, 1, 1],
[1, 0, 1],
[1, 1, 0]]
input_x = np.random.choice(a=xor, size=200)
However, this is giving me
{ValueError} 'a' must be 1-dimensoinal
But, if I add e.g. a number to this list:
xor = [[0, 0, 0],
[0, 1, 1],
[1, 0, 1],
[1, 1, 0],
1337] # With this it will work
input_x = np.random.choice(a=xor, size=200)
it starts to work. Why is this the case and how can I make this work without having to add another primitive to the xor list?
In case of an array I would do the following:
xor = np.array([[0,0,0],
[0,1,1],
[1,0,1],
[1,1,0]])
rnd_indices = np.random.choice(len(xor), size=200)
xor_data = xor[rnd_indices]
If you want a random list from xor, you should probably be doing this.
xor[np.random.choice(len(xor),1)]
You can use the random package instead:
import random
input_x = [random.choice(xor) for _ in range(200)]
Interesting!! Seems that numpy implicitly converts the input to np.array first. so, for your first input
np.array(xor).shape == (4, 3)
while for the second value
np.array(xor).shape == (5, )
so, the second value is seen by numpy as 1d!!!
So, to pick a random row, just pick a random index, and then the corresponding row
ind = np.choice(len(xor))
random_row = xor[ind, :]
With focus on performance, we could use the decimal number equivalents of those four numbers, feed those to np.random.choice() to generate 200 such numbers randomly chosen and finally get their binary equivalents with bit-shift operation.
Thus, an implementation would be -
def bitshift_approach(N):
nums = np.random.choice(np.array([0,3,5,6]),size=(N))
return ((nums & (1 << np.arange(3))[:,None])!=0).T.astype(int)
Another approach would be using very similar to what others have suggested to use np.random.choice(len(xor) to generate the row indices and then use row-indexing to select rows off xor. A slight modification to that would be to use np.take to select those rows. With such repeated indices, as is the case here, this should be performant.
Thus, an alternative approach would be -
np.take(xor,np.random.choice(len(xor), size=N))
Runtime test -
In [42]: N = 200
In [43]: %timeit xor[np.random.choice(np.arange(len(xor)), size=N)]
...: %timeit xor[np.random.choice(len(xor), size=N)]
...: %timeit bitshift_approach(N)
...: %timeit np.take(xor,np.random.choice(len(xor), size=N))
...:
10000 loops, best of 3: 43.3 µs per loop
10000 loops, best of 3: 38.3 µs per loop
10000 loops, best of 3: 59.4 µs per loop
10000 loops, best of 3: 35 µs per loop
In [44]: N = 1000
In [45]: %timeit xor[np.random.choice(np.arange(len(xor)), size=N)]
...: %timeit xor[np.random.choice(len(xor), size=N)]
...: %timeit bitshift_approach(N)
...: %timeit np.take(xor,np.random.choice(len(xor), size=N))
...:
10000 loops, best of 3: 69.5 µs per loop
10000 loops, best of 3: 64.7 µs per loop
10000 loops, best of 3: 77.7 µs per loop
10000 loops, best of 3: 38.7 µs per loop
In [46]: N = 10000
In [47]: %timeit xor[np.random.choice(np.arange(len(xor)), size=N)]
...: %timeit xor[np.random.choice(len(xor), size=N)]
...: %timeit bitshift_approach(N)
...: %timeit np.take(xor,np.random.choice(len(xor), size=N))
...:
1000 loops, best of 3: 363 µs per loop
1000 loops, best of 3: 351 µs per loop
1000 loops, best of 3: 225 µs per loop
10000 loops, best of 3: 134 µs per loop
You can use random.choice() directly and just run it 200 times to get 200 sample
Since np.random.choice() requires values to be in 1 d shape like ["1","2","3"]
and can't work with a list of lists or list of tuples only list of scaler values
I am new to NumPy and am trying to use it in my code for some tables.
I have a list of coordinates that looks like this:
coordinates = [["2 0"], ["0 1"], ["3 4"]]
and want to write it like this:
coordinatesNumpy = np.array([[2, 0], [0, 1], [3, 4]])
In regular Python that's easy to do but how do you do it with NumPy? Should I just make the table with regular Python functions for lists and then convert the 2d table to np.array or does NumPy have methods for splitting and stuff?
I tried some things but they all give me an error. The latest thing I tried:
flowers = np.array([np.array([int(coordinate[0]), int(coordinate[2])]) for coordinate in coordinates])
How could I do something like this with NumPy?
Take a look at numpy.fromstring:
coordinates_numpy = np.array([np.fromstring(i, dtype=int, sep=' ')
for j in coordinates for i in j])
List comprehension with pure Python
This works:
>>> flowers = np.array([[int(x) for x in coordinate[0].split()]
for coordinate in coordinates])
>>> flowers
array([[2, 0],
[0, 1],
[3, 4]])
I am not aware of any NumPy function that would do this in one step.
Performance
Let's check how fast things are.
For your example data, the pure Python version is the fastest:
%timeit np.array([np.fromstring(i, dtype=int, sep=' ') for j in coordinates for i in j])
100000 loops, best of 3: 18.4 µs per loop
%timeit np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in coordinates])
10000 loops, best of 3: 19 µs per loop
%timeit np.array([[int(x) for x in coordinate[0].split()] for coordinate in coordinates])
100000 loops, best of 3: 12.1 µs per loop
Make the data bigger:
long_coords = coordinates * 1000
But still, the pure Python version is the fastest:
%timeit np.array([np.fromstring(i, dtype=int, sep=' ') for j in long_coords for i in j])
100 loops, best of 3: 12.2 ms per loop
%timeit np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in long_coords])
100 loops, best of 3: 14.2 ms per loop
%timeit np.array([[int(x) for x in coordinate[0].split()] for coordinate in long_coords])
100 loops, best of 3: 7.54 ms per loop
Consistent results for even larger data:
very_long_coords = coordinates * 10000
%timeit np.array([np.fromstring(i, dtype=int, sep=' ') for j in very_long_coords for i in j])
10 loops, best of 3: 125 ms per loop
%timeit np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in very_long_coords])
10 loops, best of 3: 140 ms per loop
%timeit np.array([[int(x) for x in coordinate[0].split()] for coordinate in very_long_coords])
10 loops, best of 3: 73.5 ms per loop
Assuming C as the input list, two approaches could be suggested to solve it.
Approach #1 : Using one level of list comprehension with np.fromstring -
np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in C])
Approach #2 : Vectorized approach using padding with np.core.defchararray.add and then getting the separated numerals -
np.fromstring(np.core.defchararray.add(C," "),dtype=int,sep=" ").reshape(len(C),-1)
Sample runs -
In [82]: C = [['2 0'], ['0 1'], ['3 4']]
In [83]: np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in C])
Out[83]:
array([[2, 0],
[0, 1],
[3, 4]])
In [84]: np.fromstring(np.core.defchararray.add(C, " "),dtype=int,sep=" ").reshape(len(C),-1)
Out[84]:
array([[2, 0],
[0, 1],
[3, 4]])
Benchmarking
Borrowing the benchmarking code from #Mike Müller's solution, here are the runtimes for the long_coords and very_long_coords cases -
In [78]: coordinates = [["2 0"], ["0 1"], ["3 4"]]
...: long_coords = coordinates * 1000
...: %timeit np.array([np.fromstring(i, dtype=int, sep=' ') for j in long_coords for i in j])
...: %timeit np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in long_coords])
...: %timeit np.array([[int(x) for x in coordinate[0].split()] for coordinate in long_coords])
...: %timeit np.fromstring(np.core.defchararray.add(long_coords, " "), dtype=int,sep=" ").reshape(len(long_coords),-1)
...:
100 loops, best of 3: 7.27 ms per loop
100 loops, best of 3: 9.52 ms per loop
100 loops, best of 3: 6.84 ms per loop
100 loops, best of 3: 2.73 ms per loop
In [79]: coordinates = [["2 0"], ["0 1"], ["3 4"]]
...: very_long_coords = coordinates * 10000
...: %timeit np.array([np.fromstring(i, dtype=int, sep=' ') for j in very_long_coords for i in j])
...: %timeit np.array([np.fromstring(item[0], dtype=int, sep=' ').tolist() for item in very_long_coords])
...: %timeit np.array([[int(x) for x in coordinate[0].split()] for coordinate in very_long_coords])
...: %timeit np.fromstring(np.core.defchararray.add(very_long_coords, " "), dtype=int,sep=" ").reshape(len(very_long_coords),-1)
...:
10 loops, best of 3: 80.7 ms per loop
10 loops, best of 3: 103 ms per loop
10 loops, best of 3: 71 ms per loop
10 loops, best of 3: 27.2 ms per loop
Say I have two arrays a and b,
a.shape = (5,2,3)
b.shape = (2,3)
then c = a * b will give me an array c of shape (5,2,3) with c[i,j,k] = a[i,j,k]*b[j,k].
Now the situation is,
a.shape = (5,2,3)
b.shape = (2,3,8)
and I want c to have a shape (5,2,3,8) with c[i,j,k,l] = a[i,j,k]*b[j,k,l].
How to do this efficiently? My a and b are actually quite large.
This should work:
a[..., numpy.newaxis] * b[numpy.newaxis, ...]
Usage:
In : a = numpy.random.randn(5,2,3)
In : b = numpy.random.randn(2,3,8)
In : c = a[..., numpy.newaxis]*b[numpy.newaxis, ...]
In : c.shape
Out: (5, 2, 3, 8)
Ref: Array Broadcasting in numpy
Edit: Updated reference URL
I think the following should work:
import numpy as np
a = np.random.normal(size=(5,2,3))
b = np.random.normal(size=(2,3,8))
c = np.einsum('ijk,jkl->ijkl',a,b)
and:
In [5]: c.shape
Out[5]: (5, 2, 3, 8)
In [6]: a[0,0,1]*b[0,1,2]
Out[6]: -0.041308376453821738
In [7]: c[0,0,1,2]
Out[7]: -0.041308376453821738
np.einsum can be a bit tricky to use, but is quite powerful for these sort of indexing problems:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html
Also note that this requires numpy >= v1.6.0
I'm not sure about efficiency for your particular problem, but if it doesn't perform as well as needed, definitely look into using Cython with explicit for loops, and possibly parallelize it using prange
UPDATE
In [18]: %timeit np.einsum('ijk,jkl->ijkl',a,b)
100000 loops, best of 3: 4.78 us per loop
In [19]: %timeit a[..., np.newaxis]*b[np.newaxis, ...]
100000 loops, best of 3: 12.2 us per loop
In [20]: a = np.random.normal(size=(50,20,30))
In [21]: b = np.random.normal(size=(20,30,80))
In [22]: %timeit np.einsum('ijk,jkl->ijkl',a,b)
100 loops, best of 3: 16.6 ms per loop
In [23]: %timeit a[..., np.newaxis]*b[np.newaxis, ...]
100 loops, best of 3: 16.6 ms per loop
In [2]: a = np.random.normal(size=(500,20,30))
In [3]: b = np.random.normal(size=(20,30,800))
In [4]: %timeit np.einsum('ijk,jkl->ijkl',a,b)
1 loops, best of 3: 3.31 s per loop
In [5]: %timeit a[..., np.newaxis]*b[np.newaxis, ...]
1 loops, best of 3: 2.6 s per loop