Get product of two one dimensional array in python - python

Something very simple in Matlab, but I can't get it in Python. How to get the following:
x=np.array([1,2,3])
y=np.array([4,5,6,7])
z=x.T*y
z=
[[4,5,6,7],
[8,10,12,14],
[12,15,18,21]]
As in
x [4][5][6][7]
[1]
[2]
[3]

In scientific python that would be an outer product np.outer(x,y)
See http://docs.scipy.org/doc/numpy/reference/generated/numpy.outer.html:
import numpy;
>>> x=numpy.array([1,2,3])
>>> y=numpy.array([4,5,6,7])
>>> numpy.outer(x,y)
array([[ 4, 5, 6, 7],
[ 8, 10, 12, 14],
[12, 15, 18, 21]])

In MATLAB, size(x) is (1,3). So x' is (3,1). Multiply that by y, which is (1,4), produces (3,4) shape.
In numpy, x.shape is (3,). x.T is the same. So to get the same outer product, you need to expand the dimensions of x and y. One way is with reshape.
z = x.reshape(3,1)* y.reshape(1,4)
numpy also lets you do this with a newaxis indexing (None also works). It also automatically adds beginning newaxis if that is needed. So this also does the job:
z = x[:,np.newaxis]*y
np.outer does exactly this (with a minor embelishment): a.ravel()[:, newaxis]*b.ravel()[newaxis,:].
There's another tool in numpy
z = np.einsum('i,j->ij',x,y)
It is based on an indexing notation that is popular in physics, and is especially useful in writing more complicated inner (dot) products.

Using list comprehension:
x = [1, 2, 3]
y = [4, 5, 6, 7]
z = [[i * j for j in y] for i in x]

Related

Numpy convolving along an axis for 2 2D-arrays

I have 2 2D-arrays. I am trying to convolve along the axis 1. np.convolve doesn't provide the axis argument. The answer here, convolves 1 2D-array with a 1D array using np.apply_along_axis. But it cannot be directly applied to my use case. The question here doesn't have an answer.
MWE is as follows.
import numpy as np
a = np.random.randint(0, 5, (2, 5))
"""
a=
array([[4, 2, 0, 4, 3],
[2, 2, 2, 3, 1]])
"""
b = np.random.randint(0, 5, (2, 2))
"""
b=
array([[4, 3],
[4, 0]])
"""
# What I want
c = np.convolve(a, b, axis=1) # axis is not supported as an argument
"""
c=
array([[16, 20, 6, 16, 24, 9],
[ 8, 8, 8, 12, 4, 0]])
"""
I know I can do it using np.fft.fft, but it seems like an unnecessary step to get a simple thing done. Is there a simple way to do this? Thanks.
Why not just do a list comprehension with zip?
>>> np.array([np.convolve(x, y) for x, y in zip(a, b)])
array([[16, 20, 6, 16, 24, 9],
[ 8, 8, 8, 12, 4, 0]])
>>>
Or with scipy.signal.convolve2d:
>>> from scipy.signal import convolve2d
>>> convolve2d(a, b)[[0, 2]]
array([[16, 20, 6, 16, 24, 9],
[ 8, 8, 8, 12, 4, 0]])
>>>
One possibility could be to manually go the way to the Fourier spectrum, and back:
n = np.max([a.shape, b.shape]) + 1
np.abs(np.fft.ifft(np.fft.fft(a, n=n) * np.fft.fft(b, n=n))).astype(int)
# array([[16, 20, 6, 16, 24, 9],
# [ 8, 8, 8, 12, 4, 0]])
Would it be considered too ugly to loop over the orthogonal dimension? That would not add much overhead unless the main dimension is very short. Creating the output array ahead of time ensures that no memory needs to be copied about.
def convolvesecond(a, b):
N1, L1 = a.shape
N2, L2 = b.shape
if N1 != N2:
raise ValueError("Not compatible")
c = np.zeros((N1, L1 + L2 - 1), dtype=a.dtype)
for n in range(N1):
c[n,:] = np.convolve(a[n,:], b[n,:], 'full')
return c
For the generic case (convolving along the k-th axis of a pair of multidimensional arrays), I would resort to a pair of helper functions I always keep on hand to convert multidimensional problems to the basic 2d case:
def semiflatten(x, d=0):
'''SEMIFLATTEN - Permute and reshape an array to convenient matrix form
y, s = SEMIFLATTEN(x, d) permutes and reshapes the arbitrary array X so
that input dimension D (default: 0) becomes the second dimension of the
output, and all other dimensions (if any) are combined into the first
dimension of the output. The output is always 2-D, even if the input is
only 1-D.
If D<0, dimensions are counted from the end.
Return value S can be used to invert the operation using SEMIUNFLATTEN.
This is useful to facilitate looping over arrays with unknown shape.'''
x = np.array(x)
shp = x.shape
ndims = x.ndim
if d<0:
d = ndims + d
perm = list(range(ndims))
perm.pop(d)
perm.append(d)
y = np.transpose(x, perm)
# Y has the original D-th axis last, preceded by the other axes, in order
rest = np.array(shp, int)[perm[:-1]]
y = np.reshape(y, [np.prod(rest), y.shape[-1]])
return y, (d, rest)
def semiunflatten(y, s):
'''SEMIUNFLATTEN - Reverse the operation of SEMIFLATTEN
x = SEMIUNFLATTEN(y, s), where Y, S are as returned from SEMIFLATTEN,
reverses the reshaping and permutation.'''
d, rest = s
x = np.reshape(y, np.append(rest, y.shape[-1]))
perm = list(range(x.ndim))
perm.pop()
perm.insert(d, x.ndim-1)
x = np.transpose(x, perm)
return x
(Note that reshape and transpose do not create copies, so these functions are extremely fast.)
With those, the generic form can be written as:
def convolvealong(a, b, axis=-1):
a, S1 = semiflatten(a, axis)
b, S2 = semiflatten(b, axis)
c = convolvesecond(a, b)
return semiunflatten(c, S1)

count overlap between two numpy arrays [duplicate]

I want to get the intersecting (common) rows across two 2D numpy arrays. E.g., if the following arrays are passed as inputs:
array([[1, 4],
[2, 5],
[3, 6]])
array([[1, 4],
[3, 6],
[7, 8]])
the output should be:
array([[1, 4],
[3, 6])
I know how to do this with loops. I'm looking at a Pythonic/Numpy way to do this.
For short arrays, using sets is probably the clearest and most readable way to do it.
Another way is to use numpy.intersect1d. You'll have to trick it into treating the rows as a single value, though... This makes things a bit less readable...
import numpy as np
A = np.array([[1,4],[2,5],[3,6]])
B = np.array([[1,4],[3,6],[7,8]])
nrows, ncols = A.shape
dtype={'names':['f{}'.format(i) for i in range(ncols)],
'formats':ncols * [A.dtype]}
C = np.intersect1d(A.view(dtype), B.view(dtype))
# This last bit is optional if you're okay with "C" being a structured array...
C = C.view(A.dtype).reshape(-1, ncols)
For large arrays, this should be considerably faster than using sets.
You could use Python's sets:
>>> import numpy as np
>>> A = np.array([[1,4],[2,5],[3,6]])
>>> B = np.array([[1,4],[3,6],[7,8]])
>>> aset = set([tuple(x) for x in A])
>>> bset = set([tuple(x) for x in B])
>>> np.array([x for x in aset & bset])
array([[1, 4],
[3, 6]])
As Rob Cowie points out, this can be done more concisely as
np.array([x for x in set(tuple(x) for x in A) & set(tuple(x) for x in B)])
There's probably a way to do this without all the going back and forth from arrays to tuples, but it's not coming to me right now.
I could not understand why there is no suggested pure numpy way to get this working. So I found one, that uses numpy broadcast. The basic idea is to transform one of the arrays to 3d by axes swapping. Let's construct 2 arrays:
a=np.random.randint(10, size=(5, 3))
b=np.zeros_like(a)
b[:4,:]=a[np.random.randint(a.shape[0], size=4), :]
With my run it gave:
a=array([[5, 6, 3],
[8, 1, 0],
[2, 1, 4],
[8, 0, 6],
[6, 7, 6]])
b=array([[2, 1, 4],
[2, 1, 4],
[6, 7, 6],
[5, 6, 3],
[0, 0, 0]])
The steps are (arrays can be interchanged) :
#a is nxm and b is kxm
c = np.swapaxes(a[:,:,None],1,2)==b #transform a to nx1xm
# c has nxkxm dimensions due to comparison broadcast
# each nxixj slice holds comparison matrix between a[j,:] and b[i,:]
# Decrease dimension to nxk with product:
c = np.prod(c,axis=2)
#To get around duplicates://
# Calculate cumulative sum in k-th dimension
c= c*np.cumsum(c,axis=0)
# compare with 1, so that to get only one 'True' statement by row
c=c==1
#//
# sum in k-th dimension, so that a nx1 vector is produced
c=np.sum(c,axis=1).astype(bool)
# The intersection between a and b is a[c]
result=a[c]
In a function with 2 lines for used memory reduction (correct me if wrong):
def array_row_intersection(a,b):
tmp=np.prod(np.swapaxes(a[:,:,None],1,2)==b,axis=2)
return a[np.sum(np.cumsum(tmp,axis=0)*tmp==1,axis=1).astype(bool)]
which gave result for my example:
result=array([[5, 6, 3],
[2, 1, 4],
[6, 7, 6]])
This is faster than set solutions, as it makes use only of simple numpy operations, while it reduces constantly dimensions, and is ideal for two big matrices. I guess I might have made mistakes in my comments, as I got the answer by experimentation and instinct. The equivalent for column intersection can either be found by transposing the arrays or by changing the steps a little. Also, if duplicates are wanted, then the steps inside "//" have to be skipped. The function can be edited to return only the boolean array of the indices, which came handy to me ,while trying to get different arrays indices with the same vector. Benchmark for the voted answer and mine (number of elements in each dimension plays role on what to choose):
Code:
def voted_answer(A,B):
nrows, ncols = A.shape
dtype={'names':['f{}'.format(i) for i in range(ncols)],
'formats':ncols * [A.dtype]}
C = np.intersect1d(A.view(dtype), B.view(dtype))
return C.view(A.dtype).reshape(-1, ncols)
a_small=np.random.randint(10, size=(10, 10))
b_small=np.zeros_like(a_small)
b_small=a_small[np.random.randint(a_small.shape[0],size=[a_small.shape[0]]),:]
a_big_row=np.random.randint(10, size=(10, 1000))
b_big_row=a_big_row[np.random.randint(a_big_row.shape[0],size=[a_big_row.shape[0]]),:]
a_big_col=np.random.randint(10, size=(1000, 10))
b_big_col=a_big_col[np.random.randint(a_big_col.shape[0],size=[a_big_col.shape[0]]),:]
a_big_all=np.random.randint(10, size=(100,100))
b_big_all=a_big_all[np.random.randint(a_big_all.shape[0],size=[a_big_all.shape[0]]),:]
print 'Small arrays:'
print '\t Voted answer:',timeit.timeit(lambda:voted_answer(a_small,b_small),number=100)/100
print '\t Proposed answer:',timeit.timeit(lambda:array_row_intersection(a_small,b_small),number=100)/100
print 'Big column arrays:'
print '\t Voted answer:',timeit.timeit(lambda:voted_answer(a_big_col,b_big_col),number=100)/100
print '\t Proposed answer:',timeit.timeit(lambda:array_row_intersection(a_big_col,b_big_col),number=100)/100
print 'Big row arrays:'
print '\t Voted answer:',timeit.timeit(lambda:voted_answer(a_big_row,b_big_row),number=100)/100
print '\t Proposed answer:',timeit.timeit(lambda:array_row_intersection(a_big_row,b_big_row),number=100)/100
print 'Big arrays:'
print '\t Voted answer:',timeit.timeit(lambda:voted_answer(a_big_all,b_big_all),number=100)/100
print '\t Proposed answer:',timeit.timeit(lambda:array_row_intersection(a_big_all,b_big_all),number=100)/100
with results:
Small arrays:
Voted answer: 7.47108459473e-05
Proposed answer: 2.47001647949e-05
Big column arrays:
Voted answer: 0.00198730945587
Proposed answer: 0.0560171294212
Big row arrays:
Voted answer: 0.00500325918198
Proposed answer: 0.000308241844177
Big arrays:
Voted answer: 0.000864889621735
Proposed answer: 0.00257176160812
Following verdict is that if you have to compare 2 big 2d arrays of 2d points then use voted answer. If you have big matrices in all dimensions, voted answer is the best one by all means. So, it depends on what you choose each time.
Numpy broadcasting
We can create a boolean mask using broadcasting which can be then used to filter the rows in array A which are also present in array B
A = np.array([[1,4],[2,5],[3,6]])
B = np.array([[1,4],[3,6],[7,8]])
m = (A[:, None] == B).all(-1).any(1)
>>> A[m]
array([[1, 4],
[3, 6]])
Another way to achieve this using structured array:
>>> a = np.array([[3, 1, 2], [5, 8, 9], [7, 4, 3]])
>>> b = np.array([[2, 3, 0], [3, 1, 2], [7, 4, 3]])
>>> av = a.view([('', a.dtype)] * a.shape[1]).ravel()
>>> bv = b.view([('', b.dtype)] * b.shape[1]).ravel()
>>> np.intersect1d(av, bv).view(a.dtype).reshape(-1, a.shape[1])
array([[3, 1, 2],
[7, 4, 3]])
Just for clarity, the structured view looks like this:
>>> a.view([('', a.dtype)] * a.shape[1])
array([[(3, 1, 2)],
[(5, 8, 9)],
[(7, 4, 3)]],
dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')])
np.array(set(map(tuple, b)).difference(set(map(tuple, a))))
This could also work
Without Index
Visit https://gist.github.com/RashidLadj/971c7235ce796836853fcf55b4876f3c
def intersect2D(Array_A, Array_B):
"""
Find row intersection between 2D numpy arrays, a and b.
"""
# ''' Using Tuple ''' #
intersectionList = list(set([tuple(x) for x in Array_A for y in Array_B if(tuple(x) == tuple(y))]))
print ("intersectionList = \n",intersectionList)
# ''' Using Numpy function "array_equal" ''' #
""" This method is valid for an ndarray """
intersectionList = list(set([tuple(x) for x in Array_A for y in Array_B if(np.array_equal(x, y))]))
print ("intersectionList = \n",intersectionList)
# ''' Using set and bitwise and '''
intersectionList = [list(y) for y in (set([tuple(x) for x in Array_A]) & set([tuple(x) for x in Array_B]))]
print ("intersectionList = \n",intersectionList)
return intersectionList
With Index
Visit https://gist.github.com/RashidLadj/bac71f3d3380064de2f9abe0ae43c19e
def intersect2D(Array_A, Array_B):
"""
Find row intersection between 2D numpy arrays, a and b.
Returns another numpy array with shared rows and index of items in A & B arrays
"""
# [[IDX], [IDY], [value]] where Equal
# ''' Using Tuple ''' #
IndexEqual = np.asarray([(i, j, x) for i,x in enumerate(Array_A) for j, y in enumerate (Array_B) if(tuple(x) == tuple(y))]).T
# ''' Using Numpy array_equal ''' #
IndexEqual = np.asarray([(i, j, x) for i,x in enumerate(Array_A) for j, y in enumerate (Array_B) if(np.array_equal(x, y))]).T
idx, idy, intersectionList = (IndexEqual[0], IndexEqual[1], IndexEqual[2]) if len(IndexEqual) != 0 else ([], [], [])
return intersectionList, idx, idy
A = np.array([[1,4],[2,5],[3,6]])
B = np.array([[1,4],[3,6],[7,8]])
def matching_rows(A,B):
matches=[i for i in range(B.shape[0]) if np.any(np.all(A==B[i],axis=1))]
if len(matches)==0:
return B[matches]
return np.unique(B[matches],axis=0)
>>> matching_rows(A,B)
array([[1, 4],
[3, 6]])
This of course assumes the rows are all the same length.
import numpy as np
A=np.array([[1, 4],
[2, 5],
[3, 6]])
B=np.array([[1, 4],
[3, 6],
[7, 8]])
intersetingRows=[(B==irow).all(axis=1).any() for irow in A]
print(A[intersetingRows])

Numpy multiplying different shapes

have two arrays that are like this
x = [a,b]
y = [p,q,r]
I need to multiply this together to a product c which should be like this,
c = [a*p, a*q, a*r, b*p, b*q, b*r]
However x*y gives the following error,
ValueError: operands could not be broadcast together with shapes (2,) (3,)
I can do something like this,
for i in range(len(x)):
for t in range(len(y)):
c.append(x[i] * y[t]
But really the length of my x and y is quite large so what's the most efficient way to make such a multiplication without the looping.
You can use NumPy broadcasting for pairwise elementwise multiplication between x and y and then flatten with .ravel(), like so -
(x[:,None]*y).ravel()
Or use outer product and then flatten -
np.outer(x,y).ravel()
Use Numpy dot...
>>> import numpy as np
>>> a=np.arange(1,3)# [1,2]
>>> b=np.arange(1,4)# [1,2,3]
>>> np.dot(a[:,None],b[None])
array([[1, 2, 3],
[2, 4, 6]])
>>> np.dot(a[:,None],b[None]).ravel()
array([1, 2, 3, 2, 4, 6])
>>>

Element-wise effecient multiplication of arrays of matrices

Suppose array_1 and array_2 are two arrays of matrices of the same sizes. Is there any vectorised way of multiplying element-wise, the elements of these two arrays(which their elements' multiplication is well defined)?
The dummy code:
def mat_multiply(array_1,array_2):
size=np.shape(array_1)[0]
result=np.array([])
for i in range(size):
result=np.append(result,np.dot(array_1[i],array_2[i]),axis=0)
return np.reshape(result,(size,2))
example input:
a=[[[1,2],[3,4]],[[1,2],[3,4]]]
b=[[1,3],[4,5]]
output:
[[ 7. 15.]
[ 14. 32.]]
Contrary to your first sentence, a and b are not the same size. But let's focus on your example.
So you want this - 2 dot products, one for each row of a and b
np.array([np.dot(x,y) for x,y in zip(a,b)])
or to avoid appending
X = np.zeros((2,2))
for i in range(2):
X[i,...] = np.dot(a[i],b[i])
the dot product can be expressed with einsum (matrix index notation) as
[np.einsum('ij,j->i',x,y) for x,y in zip(a,b)]
so the next step is to index that first dimension:
np.einsum('kij,kj->ki',a,b)
I'm quite familiar with einsum, but it still took a bit of trial and error to figure out what you want. Now that the problem is clear I can compute it in several other ways
A, B = np.array(a), np.array(b)
np.multiply(A,B[:,np.newaxis,:]).sum(axis=2)
(A*B[:,None,:]).sum(2)
np.dot(A,B.T)[0,...]
np.tensordot(b,a,(-1,-1))[:,0,:]
I find it helpful to work with arrays that have different sizes. For example if A were (2,3,4) and B (2,4), it would be more obvious the dot sum has to be on the last dimension.
Another numpy iteration tool is np.nditer. einsum uses this (in C).
http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
it = np.nditer([A, B, None],flags=['external_loop'],
op_axes=[[0,1,2], [0,-1,1], None])
for x,y,w in it:
# x, y are shape (2,)
w[...] = np.dot(x,y)
it.operands[2][...,0]
Avoiding that [...,0] step, requires a more elaborate setup.
C = np.zeros((2,2))
it = np.nditer([A, B, C],flags=['external_loop','reduce_ok'],
op_axes=[[0,1,2], [0,-1,1], [0,1,-1]],
op_flags=[['readonly'],['readonly'],['readwrite']])
for x,y,w in it:
w[...] = np.dot(x,y)
# w[...] += x*y
print C
# array([[ 7., 15.],[ 14., 32.]])
There's one more option that #hpaulj left out in his extensive and comprehensive list of options:
>>> a = np.array(a)
>>> b = np.array(b)
>>> from numpy.core.umath_tests import matrix_multiply
>>> matrix_multiply.signature
'(m,n),(n,p)->(m,p)'
>>> matrix_multiply(a, b[..., np.newaxis])
array([[[ 7],
[15]],
[[14],
[32]]])
>>> matrix_multiply(a, b[..., np.newaxis]).shape
(2L, 2L, 1L)
>>> np.squeeze(matrix_multiply(a, b[..., np.newaxis]), axis=-1)
array([[ 7, 15],
[14, 32]])
The nice thing about matrix_multiply is that, it being a gufunc, it will work not only with 1D arrays of matrices, but also with broadcastable arrays. As an example, if instead of multiplying the first matrix with the first vector, and the second matrix with the second vector, you wanted to compute all possible multiplications, you could simply do:
>>> a = np.arange(8).reshape(2, 2, 2) # to have different matrices
>>> np.squeeze(matrix_multiply(a[...,np.newaxis, :, :],
... b[..., np.newaxis]), axis=-1)
array([[[ 3, 11],
[ 5, 23]],
[[19, 27],
[41, 59]]])

numpy ndarray slicing and iteration

I'm trying to slice and iterate over a multidimensional array at the same time. I have a solution that's functional, but it's kind of ugly, and I bet there's a slick way to do the iteration and slicing that I don't know about. Here's the code:
import numpy as np
x = np.arange(64).reshape(4,4,4)
y = [x[i:i+2,j:j+2,k:k+2] for i in range(0,4,2)
for j in range(0,4,2)
for k in range(0,4,2)]
y = np.array(y)
z = np.array([np.min(u) for u in y]).reshape(y.shape[1:])
Your last reshape doesn't work, because y has no shape defined. Without it you get:
>>> x = np.arange(64).reshape(4,4,4)
>>> y = [x[i:i+2,j:j+2,k:k+2] for i in range(0,4,2)
... for j in range(0,4,2)
... for k in range(0,4,2)]
>>> z = np.array([np.min(u) for u in y])
>>> z
array([ 0, 2, 8, 10, 32, 34, 40, 42])
But despite that, what you probably want is reshaping your array to 6 dimensions, which gets you the same result as above:
>>> xx = x.reshape(2, 2, 2, 2, 2, 2)
>>> zz = xx.min(axis=-1).min(axis=-2).min(axis=-3)
>>> zz
array([[[ 0, 2],
[ 8, 10]],
[[32, 34],
[40, 42]]])
>>> zz.ravel()
array([ 0, 2, 8, 10, 32, 34, 40, 42])
It's hard to tell exactly what you want in the last mean, but you can use stride_tricks to get a "slicker" way. It's rather tricky.
import numpy.lib.stride_tricks
# This returns a view with custom strides, x2[i,j,k] matches y[4*i+2*j+k]
x2 = numpy.lib.stride_tricks(
x, shape=(2,2,2,2,2,2),
strides=(numpy.array([32,8,2,16,4,1])*x.dtype.itemsize))
z2 = z2.min(axis=-1).min(axis=-2).min(axis=-3)
Still, I can't say this is much more readable. (Or efficient, as each min call will make temporaries.)
Note, my answer differs from Jaime's because I tried to match your elements of y. You can tell if you replace the min with max.

Categories