I tried the code below to multiply every float element of matrix a that is less than one by an integer, but it doesn't work. On the other hand, it seems to work properly for a matrix whose elements are not floats, i.e. if you define a = np.arange(9).reshape(3,3), it works.
import numpy as np
a = np.linspace(0,1,9).reshape(3,3)
print(a)
print('new matrix')
for x in np.nditer(a, op_flags = ['readwrite']):
    if x in range(0,1):
        x[...] = 100*x
print(a)
In [130]: a = np.linspace(0,1,9).reshape(3,3)
In [131]: a
Out[131]:
array([[0. , 0.125, 0.25 ],
[0.375, 0.5 , 0.625],
[0.75 , 0.875, 1. ]])
I don't usually recommend using nditer to iterate through an array. It's hard to use right, and rarely, if ever, improves speed. I'm not sure who or what is prompting people to use it. Its docs could use a stronger speed disclaimer.
Anyway, let's examine what's happening.
In [136]: for x in np.nditer(a, op_flags = ['readwrite']):
     ...:     print(type(x), x, x.shape)
     ...:     if x in range(0,1):
     ...:         x[...] = 100*x
     ...:         print('mul')
     ...:
<class 'numpy.ndarray'> 0.0 ()
mul
<class 'numpy.ndarray'> 0.125 ()
<class 'numpy.ndarray'> 0.25 ()
<class 'numpy.ndarray'> 0.375 ()
<class 'numpy.ndarray'> 0.5 ()
<class 'numpy.ndarray'> 0.625 ()
<class 'numpy.ndarray'> 0.75 ()
<class 'numpy.ndarray'> 0.875 ()
<class 'numpy.ndarray'> 1.0 ()
nditer runs through every element of the array (not rows), producing a 0d view each time (shape ()). range(0,1) contains only the integer 0, so only the element equal to 0 passes the test and gets multiplied by 100. For every other element, x in range(0,1) is False.
So the iteration is working as coded, if not as you intended.
a = np.arange(9).reshape(3,3) doesn't change anything. Only the 0 is in range(0,1).
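To see why the membership test only ever matches 0, it may help to check it outside the loop. This is a minimal sketch; the variable names and the sample value 0.5 are only for illustration:

```python
# range(0, 1) holds exactly one integer, 0, so `in` is an equality test against 0
members = list(range(0, 1))
zero_matches = 0.0 in range(0, 1)    # 0.0 == 0, so this matches
half_matches = 0.5 in range(0, 1)    # 0.5 equals no member of the range
half_in_interval = 0 <= 0.5 < 1      # an interval test needs comparisons instead

print(members, zero_matches, half_matches, half_in_interval)
```

In other words, `range` describes a set of integers, not a continuous interval, which is why a comparison such as `x < 1` is the test the loop needs.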
===
Change the if test:
In [146]: a = np.linspace(0,1,9).reshape(3,3)
In [147]: a
Out[147]:
array([[0. , 0.125, 0.25 ],
[0.375, 0.5 , 0.625],
[0.75 , 0.875, 1. ]])
In [148]: for x in np.nditer(a, op_flags = ['readwrite']):
     ...:     if x<1:
     ...:         x[...] = 100*x
     ...:         print('mul')
     ...:
mul
...
mul
In [149]: a
Out[149]:
array([[ 0. , 12.5, 25. ],
[37.5, 50. , 62.5],
[75. , 87.5, 1. ]])
An alternative to nditer is a flat iteration. In some ways that's messier, since it requires an enumerate if we want to modify the original values:
In [150]: a = np.linspace(0,1,9).reshape(3,3)
In [151]: for i,v in enumerate(a.flat):
     ...:     if v<1:
     ...:         a.flat[i] *= 100
     ...:
In [152]: a
Out[152]:
array([[ 0. , 12.5, 25. ],
[37.5, 50. , 62.5],
[75. , 87.5, 1. ]])
But despite some claims in the nditer docs, nditer isn't faster than this flat iteration:
In [153]: %%timeit a=np.linspace(0,1,9).reshape(3,3)
     ...: for i,v in enumerate(a.flat):
     ...:     if v<1:
     ...:         a.flat[i] *= 100
5.4 µs ± 186 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [154]: %%timeit a=np.linspace(0,1,9).reshape(3,3)
     ...: for x in np.nditer(a, op_flags = ['readwrite']):
     ...:     if x<1:
     ...:         x[...] = 100*x
34.4 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
===
But normally you shouldn't be iterating on an array. A whole-array, vectorized, approach is:
In [157]: mask = a<1
In [158]: mask
Out[158]:
array([[ True, True, True],
[ True, True, True],
[ True, True, False]])
In [159]: a[mask] *= 100
In [160]: a
Out[160]:
array([[ 0. , 12.5, 25. ],
[37.5, 50. , 62.5],
[75. , 87.5, 1. ]])
In [161]: %%timeit a=np.linspace(0,1,9).reshape(3,3)
...: a[a<1] *= 100
12.5 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Ouch! That's slower than the flat enumerate, at least for this small example. For a much larger a the whole-array version will do much better.
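The same masked update can also be written with the `where` and `out` arguments that NumPy ufuncs accept. This is only a sketch of an alternative spelling, not a claim about speed; `expected` is a name introduced here for comparison:

```python
import numpy as np

a = np.linspace(0, 1, 9).reshape(3, 3)

# np.where builds a new array and leaves `a` untouched
expected = np.where(a < 1, a * 100, a)

# in-place variant: only positions where the mask is True are written
np.multiply(a, 100, out=a, where=a < 1)
print(a)
```

Because `out=a` is given, positions where the mask is False keep their existing values, so the final 1.0 survives unchanged.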
You can apply the range "manually" in a boolean filter to multiply only the elements you are targeting:
import numpy as np
a = np.linspace(0,1,9).reshape(3,3)
print(a)
print('new matrix')
a[(a>0) & (a<1)] *= 100
print(a)
[[0. 0.125 0.25 ]
[0.375 0.5 0.625]
[0.75 0.875 1. ]]
new matrix
[[ 0. 12.5 25. ]
[37.5 50. 62.5]
[75. 87.5 1. ]]
Note that your linear space only ever generates one value outside the range (the last one), so you might as well multiply everything and reassign the last value to 1.
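That shortcut could look like the sketch below. It assumes the array really comes from np.linspace(0, 1, n), so that the only element equal to 1 is the last one; `masked` is a name introduced here just to compare against the boolean-filter version:

```python
import numpy as np

a = np.linspace(0, 1, 9).reshape(3, 3)
a *= 100        # scale every element; 0 stays 0
a[-1, -1] = 1   # restore the single endpoint that linspace places at 1

# reference result from the boolean filter, for comparison
masked = np.linspace(0, 1, 9).reshape(3, 3)
masked[(masked > 0) & (masked < 1)] *= 100
print(np.allclose(a, masked))
```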
I have seen "Python get column vector from array of tuples", which I expected to answer my question, but it doesn't.
So, I've prepared an example based on an example in that post, which shows what I want to do, and where I get stuck:
import numpy as np
# based on https://stackoverflow.com/a/48716125/6197439
# arr is a numpy array of tuple "pairs" of floats
oarr = [(0.109, 0.5), (0.109, 0.55), (0.109, 0.6), (0.2, 0.4), (0.3, 0.5)]
arr = np.array(oarr)
print("arr type: {} shape: {} dt {}".format(
    type(arr), arr.shape, arr.dtype)) # arr type: <class 'numpy.ndarray'> shape: (5, 2) dt float64
print("slice arr[:, 1]: {}".format(arr[:, 1])) # slice arr[:, 1]: [0.5 0.55 0.6 0.4 0.5 ]
print("slice arr[0, :]: {}".format(arr[0, :])) # slice arr[0, :]: [0.109 0.5 ]
print("arr len: {}".format(len(arr))) # arr len: 5
# arr2, instead, becomes a numpy array of tuple "pairs",
# with first element tuple of string and float, and second element float
# arr2 can still be sliced by numpy fine:
oarr2 = []
for ix in range(len(arr)):
    oarr2.append( ( (str(oarr[ix][0]), oarr[ix][0]), oarr[ix][1] ) )
arr2 = np.array( oarr2, dtype=object )
print("arr2 type: {} shape: {} dt {}".format(
    type(arr2), arr2.shape, arr2.dtype)) # arr2 type: <class 'numpy.ndarray'> shape: (5, 2) dt object
print("slice arr2[:, 1]: {}".format(arr2[:, 1])) # slice arr2[:, 1]: [0.5 0.55 0.6 0.4 0.5]
print("slice arr2[0, :]: {}".format(arr2[0, :])) # slice arr2[0, :]: [('0.109', 0.109) 0.5]
print("arr2 len: {}".format(len(arr2))) # arr2 len: 5
# arr2fc is where we attempt to extract the tuples in arr2 "first column",
# using numpy slicing syntax.
# arr2fc is now a numpy array of objects, as previously,
# but these objects (tuple pairs of string and float),
# are now *not* considered objects with lengths, (see .shape below)
# so extracting e.g. the first column (the string element)
# of the tuple, with numpy slicing syntax, fails:
arr2fc = arr2[:, 0]
print(arr2fc) # [('0.109', 0.109) ('0.109', 0.109) ('0.109', 0.109) ('0.2', 0.2) ('0.3', 0.3)]
print("arr2fc type: {} shape: {} dt {}".format(
    type(arr2fc), arr2fc.shape, arr2fc.dtype)) # arr2fc type: <class 'numpy.ndarray'> shape: (5,) dt object
print("slice arr2fc[:, 1]: {}".format(arr2fc[:, 1])) # IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
Basically, I'd like to extract the "columns" formed by tuples in arr2fc as separate numpy arrays; so from the column formed by first (the string) element of this tuple, I'd like to get numpy array of object (here string):
[ '0.109', '0.109', '0.109', '0.2', '0.3' ]
... and from the column formed by second (the float) element of this tuple, I'd like to get numpy array of float:
[ 0.109, 0.109, 0.109, 0.2, 0.3 ]
Sure, I can always do a Python loop, then iterate and populate an empty Python list, then convert that to numpy array -- however, is there something like a numpy slicing syntax, that would enable me to extract these "columns" with a one-liner, avoiding Python loops?
For that you might want to use np.vectorize. It wraps a function so that it can be applied to an input array and produce a new array or a tuple of arrays. For your example that could look like:
vectorized_split = np.vectorize(lambda x: (x[0],x[1]))
string_array,float_array = vectorized_split(arr2fc)
It is important to note that this will not give you any numpy vectorization performance gains, as it just runs a for loop under the hood. However, when you cannot make use of numpy vectorization like in this case, it gives you at least a compact codebase.
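If the dtypes of the two result arrays matter, np.vectorize also accepts an `otypes` argument. The sketch below rebuilds a 1d object array of tuples by hand (the `pairs` list and the element-wise fill are assumptions made to keep the example self-contained) and asks for an object array and a float array:

```python
import numpy as np

pairs = [('0.109', 0.109), ('0.109', 0.109), ('0.109', 0.109), ('0.2', 0.2), ('0.3', 0.3)]

# build a 1d object array whose elements are the tuples themselves
arr2fc = np.empty(len(pairs), dtype=object)
for i, pair in enumerate(pairs):
    arr2fc[i] = pair

# otypes fixes the dtype of each returned array instead of inferring it from the first call
vectorized_split = np.vectorize(lambda x: (x[0], x[1]), otypes=[object, float])
string_array, float_array = vectorized_split(arr2fc)
print(string_array.dtype, float_array.dtype)
```

This is still a Python-level loop under the hood; `otypes` only controls what the outputs look like.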
Your code as displayed in ipython:
In [178]: oarr = [(0.109, 0.5), (0.109, 0.55), (0.109, 0.6), (0.2, 0.4), (0.3,0.5)]
...: arr = np.array(oarr)
In [179]: oarr
Out[179]: [(0.109, 0.5), (0.109, 0.55), (0.109, 0.6), (0.2, 0.4), (0.3, 0.5)]
In [180]: arr
Out[180]:
array([[0.109, 0.5 ],
[0.109, 0.55 ],
[0.109, 0.6 ],
[0.2 , 0.4 ],
[0.3 , 0.5 ]])
So starting with a list of tuples, we get a 2d array, with float dtype. A list of lists would work the same way.
Your next array:
In [181]: oarr2 = []
     ...: for ix in range(len(arr)):
     ...:     oarr2.append( ( (str(oarr[ix][0]), oarr[ix][0]), oarr[ix][1] ) )
     ...: arr2 = np.array( oarr2, dtype=object )
In [182]: oarr2
Out[182]:
[(('0.109', 0.109), 0.5),
(('0.109', 0.109), 0.55),
(('0.109', 0.109), 0.6),
(('0.2', 0.2), 0.4),
(('0.3', 0.3), 0.5)]
In [183]: arr2
Out[183]:
array([[('0.109', 0.109), 0.5],
[('0.109', 0.109), 0.55],
[('0.109', 0.109), 0.6],
[('0.2', 0.2), 0.4],
[('0.3', 0.3), 0.5]], dtype=object)
Again a 2d array, (5,2), but with a tuple as one element in each row.
Selecting a column:
In [184]: arr2fc = arr2[:, 0]
In [185]: arr2fc
Out[185]:
array([('0.109', 0.109), ('0.109', 0.109), ('0.109', 0.109), ('0.2', 0.2),
('0.3', 0.3)], dtype=object)
In [186]: _.shape
Out[186]: (5,)
A 1d array of objects - each a tuple.
Converting it back to list, we can make a 2d array and again index a column:
In [187]: arr2fc.tolist()
Out[187]:
[('0.109', 0.109),
('0.109', 0.109),
('0.109', 0.109),
('0.2', 0.2),
('0.3', 0.3)]
In [188]: np.array(arr2fc.tolist(),object)
Out[188]:
array([['0.109', 0.109],
['0.109', 0.109],
['0.109', 0.109],
['0.2', 0.2],
['0.3', 0.3]], dtype=object)
In [189]: _[:,1]
Out[189]: array([0.109, 0.109, 0.109, 0.2, 0.3], dtype=object)
or with a list comprehension:
In [190]: [x[1] for x in arr2fc]
Out[190]: [0.109, 0.109, 0.109, 0.2, 0.3]
Multidimensional indexing only works on the dimensions shown by the shape. It does not "reach through" and index the objects, even if they are, by themselves, indexable.
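Since indexing cannot reach into the tuples, another compact option is to transpose them with zip. This is a sketch rather than a NumPy slicing feature: it still iterates in Python, and the hand-built `arr2fc` below is an assumption to keep the example self-contained:

```python
import numpy as np

pairs = [('0.109', 0.109), ('0.109', 0.109), ('0.109', 0.109), ('0.2', 0.2), ('0.3', 0.3)]
arr2fc = np.empty(len(pairs), dtype=object)
for i, pair in enumerate(pairs):
    arr2fc[i] = pair   # each element is one (str, float) tuple

# zip(*...) turns five 2-tuples into two 5-tuples, one per "column"
strings, floats = zip(*arr2fc)
string_array = np.array(strings, dtype=object)
float_array = np.array(floats, dtype=float)
print(string_array, float_array)
```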
Some comparative times:
In [194]: timeit string_array,float_array = vectorized_split(arr2fc)
31.5 µs ± 277 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [195]: timeit [x[1] for x in arr2fc]
1.57 µs ± 1.07 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [196]: timeit np.array(arr2fc.tolist(),object)[:,1]
3.77 µs ± 65 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Here the "vectorize" method is much slower. For large arrays, "vectorize" speeds are closer to the list comprehension speeds.
In a situation like the one below, how do I vstack the two matrices?
import numpy as np
a = np.array([[3,3,3],[3,3,3],[3,3,3]])
b = np.array([[2,2],[2,2],[2,2]])
a = np.vstack([a, b])
Output:
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 2
The output I would like would look like this:
a = array([[[3, 3, 3],
[3, 3, 3],
[3, 3, 3]],
[[2, 2],
[2, 2],
[2, 2]]])
My goal is then to loop over the content of the stacked matrices, index each matrix and call a function on a specific row.
for matrix in a:
    row = matrix[1]
    print(row)
Output:
[3, 3, 3]
[2, 2]
Be careful with those "Numpy is faster" claims. If you already have arrays, and make full use of array methods, numpy is indeed faster. But if you start with lists, or have to use Python level iteration (as you do in Pack...), the numpy version might well be slower.
Just doing a time test on the Pack step:
In [12]: timeit Pack_Matrices_with_NaN([a,b,c],5)
221 µs ± 9.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Compare that with fetching the first row of each array with a simple list comprehension:
In [13]: [row[1] for row in [a,b,c]]
Out[13]: [array([3., 3., 3.]), array([2., 2.]), array([4., 4., 4., 4.])]
In [14]: timeit [row[1] for row in [a,b,c]]
808 ns ± 2.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
200 µs compared to less than 1 µs!
And timing your Unpack:
In [21]: [Unpack_Matrix_with_NaN(packed_matrices.reshape(3,3,5),i)[1,:] for i in range(3)]
...:
Out[21]: [array([3., 3., 3.]), array([2., 2.]), array([4., 4., 4., 4.])]
In [22]: timeit [Unpack_Matrix_with_NaN(packed_matrices.reshape(3,3,5),i)[1,:] for i in range(3)]
199 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
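For the loop described in the question, the list-based approach timed above needs no stacking at all. A minimal sketch, where `matrices` and `rows` are names introduced here:

```python
import numpy as np

a = np.array([[3, 3, 3], [3, 3, 3], [3, 3, 3]])
b = np.array([[2, 2], [2, 2], [2, 2]])

matrices = [a, b]                  # a plain list: the shapes may differ
rows = [m[1] for m in matrices]    # row 1 of each matrix
for row in rows:
    print(row)
```

Each element stays an ordinary 2d array, so any per-row function can be applied inside the loop without padding or unpacking.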
I was able to solve this using only NumPy. As NumPy is significantly faster than Python lists (https://towardsdatascience.com/how-fast-numpy-really-is-e9111df44347), I wanted to share my answer as it might be useful to others.
I started with adding np.NaN to make the two arrays the same shape.
import numpy as np
a = np.array([[3,3,3],[3,3,3],[3,3,3]]).astype(float)
b = np.array([[2,2],[2,2],[2,2]]).astype(float)
# Extend each vector in array with Nan to reach same shape
b = np.insert(b, 2, np.nan, axis=1)
# Now vstack the arrays
a = np.vstack([[a], [b]])
print(a)
Output:
[[[ 3. 3. 3.]
[ 3. 3. 3.]
[ 3. 3. 3.]]
[[ 2. 2. nan]
[ 2. 2. nan]
[ 2. 2. nan]]]
Then I wrote a function to unpack each array in a, and remove the nan.
def Unpack_Matrix_with_NaN(Matrix_with_nan, matrix_of_interest):
    for first_row in Matrix_with_nan[matrix_of_interest,:1]:
        # find shape of matrix row without nan
        first_row_without_nan = first_row[~np.isnan(first_row)]
        shape = first_row_without_nan.shape[0]
        matrix_without_nan = np.arange(shape)
    for row in Matrix_with_nan[matrix_of_interest]:
        row_without_nan = row[~np.isnan(row)]
        matrix_without_nan = np.vstack([matrix_without_nan, row_without_nan])
    # Remove vector specifying shape
    matrix_without_nan = matrix_without_nan[1:]
    return matrix_without_nan
I could then loop through the matrices, find my desired row, and print it.
Matrix_with_nan = a
for matrix in range(len(Matrix_with_nan)):
    matrix_of_interest = Unpack_Matrix_with_NaN(a, matrix)
    row = matrix_of_interest[1]
    print(row)
Output:
[3. 3. 3.]
[2. 2.]
I also made a function to pack matrices when more than one nan needs to be added per row:
import numpy as np
a = np.array([[3,3,3],[3,3,3],[3,3,3]]).astype(float)
b = np.array([[2,2],[2,2],[2,2]]).astype(float)
c = np.array([[4,4,4,4],[4,4,4,4],[4,4,4,4]]).astype(float)
# Extend each vector in array with Nan to reach same shape
def Pack_Matrices_with_NaN(List_of_matrices, Matrix_size):
    Matrix_with_nan = np.arange(Matrix_size)
    for array in List_of_matrices:
        start_position = len(array[0])
        for x in range(start_position,Matrix_size):
            array = np.insert(array, (x), np.nan, axis=1)
        Matrix_with_nan = np.vstack([Matrix_with_nan, array])
    Matrix_with_nan = Matrix_with_nan[1:]
    return Matrix_with_nan
arrays = [a,b,c]
packed_matrices = Pack_Matrices_with_NaN(arrays, 5)
print(packed_matrices)
Output:
[[ 3. 3. 3. nan nan]
[ 3. 3. 3. nan nan]
[ 3. 3. 3. nan nan]
[ 2. 2. nan nan nan]
[ 2. 2. nan nan nan]
[ 2. 2. nan nan nan]
[ 4. 4. 4. 4. nan]
[ 4. 4. 4. 4. nan]
[ 4. 4. 4. 4. nan]]
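The same padding can probably be done with fewer intermediate copies by preallocating a NaN-filled block and slicing into it. The helpers `pack_with_nan` and `unpack` below are hypothetical names, not part of the answer above, and the sketch assumes that every matrix has the same number of rows and that the real data contains no all-NaN columns:

```python
import numpy as np

def pack_with_nan(matrices, width):
    """Hypothetical helper: stack 2d arrays into (n, rows, width), padded with NaN."""
    rows = matrices[0].shape[0]
    packed = np.full((len(matrices), rows, width), np.nan)
    for i, m in enumerate(matrices):
        packed[i, :, :m.shape[1]] = m   # copy the data, leave the padding as NaN
    return packed

def unpack(packed, i):
    """Hypothetical helper: return matrix i without its all-NaN padding columns."""
    m = packed[i]
    return m[:, ~np.isnan(m).all(axis=0)]

a = np.full((3, 3), 3.0)
b = np.full((3, 2), 2.0)
c = np.full((3, 4), 4.0)

packed = pack_with_nan([a, b, c], 5)
print(packed.shape)
print(unpack(packed, 1)[1])
```

Unlike the version above, this keeps the result three-dimensional, so `packed[i]` selects one padded matrix directly.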
This is what I currently have:
import numpy as np
data = [0.2, 0.6, 0.3, 0.5]
vecs = np.reshape([np.arange(len(data)),data], (2, -1)).transpose()
vecs
array([[ 0. , 0.2],
[ 1. , 0.6],
[ 2. , 0.3],
[ 3. , 0.5]])
This gives me the correct data as I want it, but it seems complex. Am I missing a trick?
You can simplify with np.stack along axis=1, which removes the need for a reshape and transpose:
data = np.array([0.2, 0.6, 0.3, 0.5])
np.stack([np.arange(len(data)), data], axis=1)
array([[0. , 0.2],
[1. , 0.6],
[2. , 0.3],
[3. , 0.5]])
Timings -
a = np.random.random(10000)
%timeit np.stack([np.arange(len(a)), a], axis=1)
# 26.3 µs ± 1.54 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.array([*enumerate(a)])
# 4.51 ms ± 156 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
You can try enumerate:
>>> np.array([*enumerate(data)])
array([[0. , 0.2],
[1. , 0.6],
[2. , 0.3],
[3. , 0.5]])
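Another spelling that may read more directly is np.column_stack, which places 1d inputs side by side as columns. A minimal sketch using the data from the question:

```python
import numpy as np

data = [0.2, 0.6, 0.3, 0.5]

# each 1d input becomes one column of the 2d result
vecs = np.column_stack((np.arange(len(data)), data))
print(vecs)
```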
I would like to add float coordinates to a numpy array by splitting the intensity based on the centre of mass of the coordinate to neighbouring pixels.
As an example with integers:
import numpy as np
arr = np.zeros((5, 5), dtype=float)
coord = [2, 2]
arr[coord[0], coord[1]] = 1
arr
>>> array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])
However I would like to distribute the intensity across neighbouring pixels when coord is float data, eg. coord = [2.2, 1.7].
I have considered using a gaussian, eg:
grid = np.meshgrid(*[np.arange(i) for i in arr.shape], indexing='ij')
out = np.exp(-np.dstack([(grid[i]-c)**2 for i, c in enumerate(coord)]).sum(axis=-1) / 0.5**2)
which gives good results, but becomes slow for 3d data and thousands of points.
Any advice or ideas would be appreciated, thanks.
Based on @rpoleski's suggestion, take a local region and apply weighting by distance. This is a good idea, although the implementation I have does not maintain the original centre of mass of the coordinates. For example:
from scipy.ndimage import center_of_mass
coord = [2.2, 1.7]
# get region coords
grid = np.meshgrid(*[range(2) for i in coord], indexing='ij')
# difference Euclidean distance between coords and coord
delta = np.linalg.norm(np.dstack([g-(c%1) for g, c, in zip(grid, coord)]), axis=-1)
value = 3 # pixel value of original coord
# create final array by 1/delta, ie. closer is weighted more
# normalise by sum of 1/delta
out = value * (1/delta) / (1/delta).sum()
out.sum()
>>> 3.0 # as expected
# but
center_of_mass(out)
>>> (0.34, 0.63) # should be (0.2, 0.7) in this case, ie. from coord
Any ideas?
Here is a simple (and hence most probably fast enough) solution that keeps the center of mass and has sum = 1:
import numpy as np
from scipy.ndimage import center_of_mass

arr = np.zeros((5, 5), dtype=float)
coord = [2.2, 1.7]
# the four integer pixels surrounding coord
indexes = np.array([[x, y] for x in [int(coord[0]), int(coord[0])+1] for y in [int(coord[1]), int(coord[1])+1]])
# weight each pixel by the inverse product of its per-axis distances to coord
values = [1. / (abs(coord[0]-index[0]) * abs(coord[1]-index[1])) for index in indexes]
sum_values = sum(values)
for (value, index) in zip(values, indexes):
    arr[index[0], index[1]] = value / sum_values
print(arr)
print(center_of_mass(arr))
which results in:
[[0. 0. 0. 0. 0. ]
[0. 0. 0. 0. 0. ]
[0. 0.24 0.56 0. 0. ]
[0. 0.06 0.14 0. 0. ]
[0. 0. 0. 0. 0. ]]
(2.2, 1.7)
Note: I'm using taxicab distances - they're good for center of mass calculations.
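For what it's worth, the normalised inverse-distance-product weights used above appear to coincide with ordinary bilinear interpolation weights, which can be written without any division. The function `splat_bilinear` is a hypothetical name, and the sketch assumes the coordinate lies at least one pixel away from the upper edges of the array:

```python
import numpy as np

def splat_bilinear(arr, coord, value=1.0):
    """Hypothetical helper: spread `value` over the 2x2 pixels around a float coord."""
    i0, j0 = int(np.floor(coord[0])), int(np.floor(coord[1]))
    fi, fj = coord[0] - i0, coord[1] - j0        # fractional parts along each axis
    # the nearer pixel gets the larger share: (1 - f) for the lower, f for the upper
    weights = np.outer([1 - fi, fi], [1 - fj, fj])
    arr[i0:i0 + 2, j0:j0 + 2] += value * weights
    return arr

arr = splat_bilinear(np.zeros((5, 5)), (2.2, 1.7))

# centre of mass computed with plain NumPy
rows, cols = np.indices(arr.shape)
com = ((rows * arr).sum() / arr.sum(), (cols * arr).sum() / arr.sum())
print(arr.round(2))
print(com)
```

A side effect of this form is that exact integer coordinates need no special case: the fractional part is 0, so the whole value lands on a single pixel.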
For anyone needing this functionality, and thanks to @rpoleski, I came up with this, which uses Numba to speed up the calculation.
import numba
import numpy as np

@numba.njit
def _add_floats_to_array_2d(coords, arr, values):
    """
    Distribute float values around neighbouring pixels in array whilst maintaining center of mass.
    Uses Manhattan (taxicab) distances for center of mass calculation.
    This function uses numba to speed up the calculation but is limited to exactly 2D.

    Parameters
    ----------
    coords: (N, ndim) ndarray
        Floats to distribute into array.
    arr: ndim ndarray
        Floats will be distributed into this array.
        Array is modified in place.
    values: (N,) arraylike
        The total value of each coord to distribute into arr.
    """
    # offsets of the 2x2 neighbourhood, ordered so that indices_local[j, k] == [j, k]
    # and each weight lands on the pixel it was computed for
    indices_local = np.array([[[0, 0], [0, 1]], [[1, 0], [1, 1]]])
    for i, c in enumerate(coords):
        temp_abs = np.abs(indices_local - np.remainder(c, 1))
        temp = 1.0 / (temp_abs[..., 0] * temp_abs[..., 1])
        # handle perfect integers
        for j in range(temp.shape[0]):
            for k in range(temp.shape[1]):
                if np.isinf(temp[j, k]):
                    temp[j, k] = 0
        arr[int(c[0]) : int(c[0]) + 2, int(c[1]) : int(c[1]) + 2] += (
            values[i] * temp / temp.sum()
        )
Some testing:
arr = np.zeros((256, 256))
coords = np.random.rand(10000, 2) * arr.shape[0]
values = np.ones(len(coords))
%timeit arr[tuple(coords.astype(int).T)] = values
>>> 106 µs ± 4.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit _add_floats_to_array_2d(coords, arr, values)
>>> 13.5 ms ± 546 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In fact, it is better to compare against an unbuffered function such as np.add.at, because the first test overwrites any previous values instead of accumulating them:
%timeit np.add.at(arr, tuple(coords.astype(int).T), values)
>>> 1.23 ms ± 178 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
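Since np.add.at accumulates correctly for repeated indices, the distribution step itself can also be sketched without Numba by calling it once per corner of the 2x2 neighbourhood. The function name `add_floats_bilinear` is hypothetical, the weights are the bilinear form (an assumption, not the code above), and the coordinates are assumed to lie in [0, n-1) so that the +1 neighbour stays in bounds:

```python
import numpy as np

def add_floats_bilinear(coords, arr, values):
    """Hypothetical helper: accumulate each value into the 4 pixels around its coord."""
    base = np.floor(coords).astype(int)   # lower corner of each neighbourhood
    frac = coords - base                  # fractional offsets in [0, 1)
    for di in (0, 1):
        wi = frac[:, 0] if di else 1.0 - frac[:, 0]
        for dj in (0, 1):
            wj = frac[:, 1] if dj else 1.0 - frac[:, 1]
            # unbuffered add, so points sharing a pixel are all counted
            np.add.at(arr, (base[:, 0] + di, base[:, 1] + dj), values * wi * wj)

rng = np.random.default_rng(0)
arr = np.zeros((16, 16))
coords = rng.random((1000, 2)) * (arr.shape[0] - 1)   # keep the +1 neighbour in bounds
values = np.ones(len(coords))
add_floats_bilinear(coords, arr, values)

rows, cols = np.indices(arr.shape)
com = np.array([(rows * arr).sum(), (cols * arr).sum()]) / arr.sum()
print(arr.sum(), com, coords.mean(axis=0))
```

With unit values, the centre of mass of the filled array should match the mean of the input coordinates, which gives a quick sanity check on any implementation.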
For example, I have a matrix A of shape (3,2,2), e.g.
[
[[1,1],[1,1]],
[[2,2],[2,2]],
[[3,3],[3,3]]
]
and matrix B of shape (2,2), e.g.
[[1, 1], [0,1]]
I would like to achieve c of shape (3,2,2) like:
c = np.zeros((3,2,2))
for i in range(len(A)):
c[i] = np.dot(B, A[i,:,:])
which gives
[[[2. 2.]
[1. 1.]]
[[4. 4.]
[2. 2.]]
[[6. 6.]
[3. 3.]]]
What is the most efficient way to achieve this?
Thanks.
Use np.tensordot and then swap axes. So, use one of these -
np.tensordot(B,A,axes=((1),(1))).swapaxes(0,1)
np.tensordot(A,B,axes=((1),(1))).swapaxes(1,2)
We can reshape A to 2D after swapping axes, use 2D matrix multiplication with np.dot and reshape and swap axes to maybe gain marginal performance boost.
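The reshape route described above might look like the following sketch. The shape names m, k, r and the random test data are assumptions for illustration; whether it actually beats tensordot would need timing on real data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 2, 3))   # (m, k, r): m matrices of shape (k, r)
B = rng.random((4, 2))      # (n, k)

m, k, r = A.shape
# bring the contracted axis to the front and flatten the rest: (k, m*r)
A2d = A.swapaxes(0, 1).reshape(k, m * r)
# one 2d product, then restore (n, m, r) and move m back to the front
c = B.dot(A2d).reshape(-1, m, r).swapaxes(0, 1)

# reference result from the original loop
expected = np.stack([B.dot(A[i]) for i in range(m)])
print(np.allclose(c, expected))
```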
Timings -
# Original approach
def orgapp(A,B):
    m = A.shape[0]
    n = B.shape[0]
    r = A.shape[2]
    c = np.zeros((m,n,r))
    for i in range(len(A)):
        c[i] = np.dot(B, A[i,:,:])
    return c
In [91]: n = 10000
...: A = np.random.rand(n,2,2)
...: B = np.random.rand(2,2)
In [92]: %timeit orgapp(A,B)
100 loops, best of 3: 12.2 ms per loop
In [93]: %timeit np.tensordot(B,A,axes=((1),(1))).swapaxes(0,1)
1000 loops, best of 3: 191 µs per loop
In [94]: %timeit np.tensordot(A,B,axes=((1),(1))).swapaxes(1,2)
1000 loops, best of 3: 208 µs per loop
# @Bitwise's solution
In [95]: %timeit np.flip(np.dot(A,B).transpose((0,2,1)),1)
1000 loops, best of 3: 697 µs per loop
Another solution:
np.flip(np.dot(A,B).transpose((0,2,1)),1)
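Two further spellings of the same product are np.matmul, which broadcasts B against the leading axis of A, and np.einsum. The sketch below checks both against the original loop on random data (an assumption for illustration). It also checks the flip/transpose expression on a small hand-picked input, because that expression seems to reproduce the loop only for special inputs such as the example in the question:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 2, 2))
B = rng.random((2, 2))

# reference result from the original loop
expected = np.stack([B.dot(A[i]) for i in range(len(A))])

c_matmul = np.matmul(B, A)                   # B is broadcast against each A[i]
c_einsum = np.einsum('jk,ikl->ijl', B, A)    # c[i, j, l] = sum_k B[j, k] * A[i, k, l]
print(np.allclose(c_matmul, expected), np.allclose(c_einsum, expected))

# small check of the flip/transpose expression: identity A, non-symmetric B
A_eye = np.eye(2)[None]                      # shape (1, 2, 2)
B_gen = np.array([[1.0, 2.0], [3.0, 4.0]])
flipped = np.flip(np.dot(A_eye, B_gen).transpose((0, 2, 1)), 1)
flip_matches = np.allclose(flipped[0], B_gen.dot(A_eye[0]))
print(flip_matches)
```

On this input the flip/transpose result is [[2, 4], [1, 3]] while B.dot(A[0]) is [[1, 2], [3, 4]], so it is worth verifying against the loop before relying on it.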