Python numpy array multiplication [duplicate] - python

This question already has an answer here:
How to calculate the outer product of two matrices A and B per rows faster in python (numpy)?
(1 answer)
Closed 6 years ago.
If I have two arrays X (with n rows and k columns) and Y (with n rows and q columns), how do I multiply the two in vectorized form to obtain an array Z with the following characteristics:
Z[0] = X[:,0].dot(Y)
Z[1] = X[:,1].dot(Y)
Z[2] = X[:,2].dot(Y)
...
Z[k-1] = X[:,k-1].dot(Y)
In loop form:
Z = np.empty((X.shape[1], Y.shape[1]))
for c in range(X.shape[1]):
    Z[c] = X[:,c].dot(Y)

From your description, and almost no thinking:
Z=np.einsum('nk,nq->kq',X,Y)
I could also write it with np.dot, with a transpose or two. np.dot sums products over the last axis of its first argument and the second-to-last axis of its second, so transposing X puts the shared n axis in the right place:
Z = np.dot(X.T, Y)
In [566]: n,k,q=2,3,4
In [567]: X=np.arange(n*k).reshape(n,k)
In [568]: Y=np.arange(n*q).reshape(n,q)
In [569]: Z=np.einsum('nk,nq->kq',X,Y)
In [570]: Z
Out[570]:
array([[12, 15, 18, 21],
[16, 21, 26, 31],
[20, 27, 34, 41]])
In [571]: Z1=np.empty((k,q))
In [572]: Z1=np.array([X[:,c].dot(Y) for c in range(k)])
In [573]: Z1
Out[573]:
array([[12, 15, 18, 21],
[16, 21, 26, 31],
[20, 27, 34, 41]])
In [574]: X.T.dot(Y)
Out[574]:
array([[12, 15, 18, 21],
[16, 21, 26, 31],
[20, 27, 34, 41]])
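As a self-contained sanity check, the sketch below compares the loop, np.einsum, np.dot and the @ operator on random integer data. The sizes and the seed are arbitrary choices for illustration, not from the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducible test data
n, k, q = 6, 3, 4
X = rng.integers(0, 10, size=(n, k))
Y = rng.integers(0, 10, size=(n, q))

# Loop version from the question: row c of Z is X[:, c] dotted with Y.
Z_loop = np.empty((k, q), dtype=X.dtype)
for c in range(k):
    Z_loop[c] = X[:, c].dot(Y)

# Vectorized versions: each one sums over the shared n axis.
Z_einsum = np.einsum('nk,nq->kq', X, Y)
Z_dot = np.dot(X.T, Y)
Z_matmul = X.T @ Y

print(Z_loop.shape)  # (3, 4)
print(np.array_equal(Z_loop, Z_einsum),
      np.array_equal(Z_loop, Z_dot),
      np.array_equal(Z_loop, Z_matmul))  # True True True
```

X.T @ Y is the same matrix product as np.dot(X.T, Y) for 2D inputs; which spelling to use is mostly a matter of taste.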

Related

How can I extract a set of 2D slices from a larger 2D numpy array?

If I have a large 2D numpy array and two arrays that correspond to the x and y indices I want to extract, it's easy enough:
h = np.arange(49).reshape(7,7)
# h = [[0, 1, 2, 3, 4, 5, 6],
# [7, 8, 9, 10, 11, 12, 13],
# [14, 15, 16, 17, 18, 19, 20],
# [21, 22, 23, 24, 25, 26, 27],
# [28, 29, 30, 31, 32, 33, 34],
# [35, 36, 37, 38, 39, 40, 41],
# [42, 43, 44, 45, 46, 47, 48]]
x_indices = np.array([1,3,4])
y_indices = np.array([2,3,5])
reduced_h = h[x_indices, y_indices]
#reduced_h = [ 9, 24, 33]
However, I would like to, for each x,y pair cut out a square (denoted by 'a' - the number of indices in each direction from the centre) surrounding this 'coordinate' and return an array of these little 2D arrays.
For example, for h, x,y_indices as above and a=1:
reduced_h = [[[1,2,3],[8,9,10],[15,16,17]], [[16,17,18],[23,24,25],[30,31,32]], [[25,26,27],[32,33,34],[39,40,41]]]
i.e. one 3x3 array for each x-y index pair, corresponding to the 3x3 square of elements centred on that index. In general, this should return a numpy array of shape (len(x_indices), 2a+1, 2a+1).
By analogy to reduced_h[0] = h[x_indices[0]-a : x_indices[0]+a+1, y_indices[0]-a : y_indices[0]+a+1] = h[1-1:1+2, 2-1:2+2] = h[0:3, 1:4], my first try was the following:
h[x_indices-a : x_indices+a+1, y_indices-a : y_indices+a+1]
However, perhaps unsurprisingly, this fails: slice bounds must be scalars, not arrays.
So the obvious next thing to try is to create this slice manually. np.arange seems to struggle with this but linspace works:
a=1
xrange = np.linspace(x_indices-a, x_indices+a, 2*a+1, dtype=int)
# xrange = [ [0, 2, 3], [1, 3, 4], [2, 4, 5] ]
yrange = np.linspace(y_indices-a, y_indices+a, 2*a+1, dtype=int)
Now I can try h[xrange, yrange], but this unsurprisingly indexes element-wise, so I get only one (2a+1)x(2a+1) array (the same shape as xrange and yrange). Is there a way to take, for every index pair, the right slices from these ranges (without loops)? Or is there a way to make the broadcasting work from the start without having to set up linspace explicitly? Thanks
You can index np.lib.stride_tricks.sliding_window_view (available since NumPy 1.20) using your x and y indices:
import numpy as np
h = np.arange(49).reshape(7,7)
x_indices = np.array([1,3,4])
y_indices = np.array([2,3,5])
a = 1
window = (2*a+1, 2*a+1)
out = np.lib.stride_tricks.sliding_window_view(h, window)[x_indices-a, y_indices-a]
out:
array([[[ 1, 2, 3],
[ 8, 9, 10],
[15, 16, 17]],
[[16, 17, 18],
[23, 24, 25],
[30, 31, 32]],
[[25, 26, 27],
[32, 33, 34],
[39, 40, 41]]])
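The broadcasting the question asks about also works directly with fancy indexing: build a column of row offsets and a row of column offsets, add them to the centre indices, and let NumPy broadcast the two index arrays to (len(x_indices), 2a+1, 2a+1). A minimal sketch using the question's h, x_indices and y_indices; it does no bounds checking, so negative indices silently wrap around and indices past the far edge raise IndexError.

```python
import numpy as np

h = np.arange(49).reshape(7, 7)
x_indices = np.array([1, 3, 4])
y_indices = np.array([2, 3, 5])
a = 1

offsets = np.arange(-a, a + 1)                             # [-1, 0, 1] for a=1
rows = x_indices[:, None, None] + offsets[None, :, None]   # shape (3, 2a+1, 1)
cols = y_indices[:, None, None] + offsets[None, None, :]   # shape (3, 1, 2a+1)
out = h[rows, cols]                                        # broadcasts to (3, 2a+1, 2a+1)

print(out.shape)  # (3, 3, 3)
print(out[0])     # the 3x3 block centred on h[1, 2]
```

Unlike sliding_window_view, this builds (and copies) only the requested windows, which may matter when there are few coordinates in a very large array.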
Note that you may need to pad h first to handle windows around your coordinates that reach "outside" h.
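A minimal sketch of that padding step, assuming constant padding with a hypothetical fill value of -1 and hypothetical coordinates that sit on the border of h. After padding by a on every side, each coordinate moves by +a, so the window centred on (x, y) in h starts at (x, y) in the padded array.

```python
import numpy as np

h = np.arange(49).reshape(7, 7)
x_indices = np.array([0, 3, 6])  # hypothetical coordinates, two on the border
y_indices = np.array([0, 3, 6])
a = 1
fill = -1                        # hypothetical fill value for out-of-bounds cells

padded = np.pad(h, a, mode='constant', constant_values=fill)
windows = np.lib.stride_tricks.sliding_window_view(padded, (2 * a + 1, 2 * a + 1))
# Window start in padded = (x + a) - a = x, so the original indices work as-is.
out = windows[x_indices, y_indices]

print(out[0])  # centred on h[0, 0]; the top row and left column are fill values
```

Other np.pad modes (for example 'edge' or 'reflect') can be swapped in if a sentinel value is not appropriate.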

How to vectorize a numpy for loop that has a multiple indexed access

unigram_scores is an array of shape (N, M, 100), and seq is a 1D array of size M, where M may be up to 10000.
I would like to remove the for loop and vectorize the calculation:
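import numpy as np
batch_size, seq_len, num_labels = unigram_scores.shape
broadcast = np.broadcast_to(seq, (batch_size, seq_len))
n_seq = np.empty((seq_len, batch_size), dtype=unigram_scores.dtype)  # one row per position i
for i in range(0, broadcast.shape[1]):
    n_seq[i] = unigram_scores[np.arange(batch_size), i, broadcast[:,i]]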
Edit:
The answer by @hpaulj worked perfectly and also has the advantage of not needing any extra dependency,
but the speedup was much smaller than I expected,
so I ended up installing numba:
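import numpy as np
from numba import njit, prange

@njit(parallel=True)
def calculate_unigram_probability(unigram_scores, seq):
    batch_size, seq_len, num_labels = unigram_scores.shape
    n_seq = np.empty((seq_len, batch_size), dtype=unigram_scores.dtype)
    # numba's nopython mode allows only one advanced (array) index per
    # indexing expression, so gather element by element; broadcast[:, i]
    # was just seq[i] repeated, so seq is used directly.
    for i in prange(seq_len):
        for b in range(batch_size):
            n_seq[i, b] = unigram_scores[b, i, seq[i]]
    return n_seq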
This is also taking a bit too long. Currently I am trying to move it from the CPU to CUDA, which should bring the speedup I am hoping for.
In [129]: N,M = 5,3
In [130]: unigram=np.arange(N*M*4).reshape(N,M,4)
In [131]: seq = np.arange(M)
In [132]: b_seq = np.broadcast_to(seq, (N,M))
For a single i:
In [133]: i=0; unigram[np.arange(N),i,b_seq[:,i]]
Out[133]: array([ 0, 12, 24, 36, 48])
For all i in the range:
In [136]: i=np.arange(M)[:,None]
In [137]: unigram[np.arange(N),i,b_seq[:,i]]
Out[137]:
array([[[ 0, 12, 24, 36, 48],
[ 5, 17, 29, 41, 53],
[10, 22, 34, 46, 58]],
...
[[ 0, 12, 24, 36, 48],
[ 5, 17, 29, 41, 53],
[10, 22, 34, 46, 58]]])
That gives a (5,3,5) array. This (5,3) result is probably more useful:
In [141]: i=np.arange(M); unigram[np.arange(N)[:,None],i,b_seq[:,i]]
Out[141]:
array([[ 0, 5, 10],
[12, 17, 22],
[24, 29, 34],
[36, 41, 46],
[48, 53, 58]])
We don't need to index b_seq: unigram[np.arange(N)[:,None],i,b_seq]
Or even let the indexing broadcast seq itself:
unigram[np.arange(N)[:,None],i,seq]
and with the help of ix_:
In [145]: I,J=np.ix_(np.arange(N), np.arange(M))
In [146]: unigram[I,J,seq]
To get a visual idea of what this indexing does, look at unigram. It pulls 'diagonals' from successive blocks/batches:
In [147]: unigram
Out[147]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]],
...
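One detail worth checking: the question's loop fills n_seq row by row, so n_seq has shape (M, N), while the vectorized gather has shape (N, M). A small self-contained sketch, assuming hypothetical sizes (L=4 labels instead of 100) and a hypothetical seq that is not simply arange(M), confirms the two agree up to a transpose:

```python
import numpy as np

N, M, L = 5, 3, 4                          # small sizes; the question has L=100
unigram_scores = np.arange(N * M * L).reshape(N, M, L)
seq = np.array([2, 0, 3])                  # hypothetical label per position, length M

# Loop from the question, with n_seq allocated explicitly: shape (M, N).
broadcast = np.broadcast_to(seq, (N, M))
n_seq = np.empty((M, N), dtype=unigram_scores.dtype)
for i in range(M):
    n_seq[i] = unigram_scores[np.arange(N), i, broadcast[:, i]]

# Vectorized gather: vec[n, m] = unigram_scores[n, m, seq[m]], shape (N, M).
I, J = np.ix_(np.arange(N), np.arange(M))
vec = unigram_scores[I, J, seq]

print(np.array_equal(n_seq, vec.T))  # True
```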
You can use x.flatten() to reshape a 3D array into a 1D array (x must be a numpy array).
In your case:
broadcast = broadcast.flatten()
This will transform an array of shape (N*M*1000) into a one-dimensional array.

How to fill a matrix in Python using iteration over rows and columns

So I have an array v of 5 integers and another array u of 10 integers.
I have a 5 by 10 matrix P that I would want to fill so that (P)ij = v[i] + u[j]
I tried:
P = np.empty((len(asset_grid),len(asset_grid)))
for i in range(asset_grid):
    for j in range(asset_grid):
        P[i,j] = asset_grid[i] + asset_grid[j]
but it gives me an error
TypeError: only integer arrays with one element can be converted to an index
How can I do this in Python? I apologize if my approach is too naive; I am used to Matlab and am slowly learning Python. Any help is appreciated.
Broadcasting is what you want here. For small arrays such as yours it makes little difference, but with larger arrays it makes a significant one:
>>> arr1 = np.arange(5)
>>> arr2 = np.arange(10,20)
>>> arr1[:,None] + arr2
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
[14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])
Generally with numpy you want to avoid iteration over rows and columns and use vectorized/broadcasted operations. This is where speed improvements actually come from.
So, elaborating based on your comment:
Say P_ij is ith element of x raised to the 4th power minus jth element of y raised to 2nd power
In general, NumPy supports most arithmetic operations you would want in a vectorized way, using the usual Python operators:
>>> arr1[:, None]**4 - arr2**2
array([[-100, -121, -144, -169, -196, -225, -256, -289, -324, -361],
[ -99, -120, -143, -168, -195, -224, -255, -288, -323, -360],
[ -84, -105, -128, -153, -180, -209, -240, -273, -308, -345],
[ -19, -40, -63, -88, -115, -144, -175, -208, -243, -280],
[ 156, 135, 112, 87, 60, 31, 0, -33, -68, -105]])
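For comparison, here is a sketch of the question's nested loop written so it runs, next to the broadcast version and np.add.outer. The error in the question comes from range(asset_grid): range() needs an integer such as len(asset_grid), not the array itself. The arrays v and u are hypothetical stand-ins for the question's data.

```python
import numpy as np

v = np.arange(5)          # 5 values (hypothetical data)
u = np.arange(10, 20)     # 10 values (hypothetical data)

# Loop version: iterate over index ranges, not over the arrays themselves.
P_loop = np.empty((len(v), len(u)), dtype=v.dtype)
for i in range(len(v)):
    for j in range(len(u)):
        P_loop[i, j] = v[i] + u[j]

P_broadcast = v[:, None] + u   # (5, 1) + (10,) -> (5, 10)
P_outer = np.add.outer(v, u)   # same result via the ufunc's outer method

print(np.array_equal(P_loop, P_broadcast), np.array_equal(P_loop, P_outer))  # True True
```

The same outer pattern exists for other ufuncs, e.g. np.subtract.outer(v**4, u**2) for the power example above.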

choose rows from two matrices

I am trying to solve the following problem. I have two matrices A and B and I want to create a new matrix C which consists of the rows of the matrices A and B depending on some condition which is encoded in the array v, i.e. if the i'th entry of v is a one then I want the i'th row of C to be the i'th row of B and if it is a zero then it should be the i'th row of A. I came up with the following solution
C = np.choose(v,[A.T,B.T]).T
but it is too slow. One obvious inefficiency is the two transposes, but since np.choose does not take an axis argument I don't know how to get rid of them. Any ideas for a fast solution to this problem?
For example, let
A = np.arange(20).reshape([4,5])
and
B = 10 - A
Then one could imagine that one wants C to take, for each i, whichever of the two i'th rows has the larger sum. So we let
v = np.sum(A,axis=1)<np.sum(B,axis=1)
and then C is the matrix
C = np.choose(v,[A.T,B.T]).T
which is
array([[10, 9, 8, 7, 6],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
This seems like a good setup for np.where, which does the choosing based on the mask/binary input data -
C = np.where(v[:,None],B,A)
The v[:,None] part turns v into a column of shape (n,1), which broadcasts against A and B so that the choice is made row by row, i.e. along axis=0 for the two 2D arrays.
Sample run -
In [58]: A
Out[58]:
array([[82, 78, 57],
[14, 97, 32],
[72, 11, 49],
[98, 34, 41],
[89, 71, 52],
[34, 51, 55],
[26, 92, 59]])
In [59]: B
Out[59]:
array([[55, 67, 50],
[49, 64, 21],
[34, 18, 72],
[24, 61, 65],
[56, 59, 23],
[44, 77, 13],
[56, 55, 58]])
In [62]: v
Out[62]: array([1, 0, 0, 0, 0, 1, 1])
In [63]: np.where(v[:,None],B,A)
Out[63]:
array([[55, 67, 50],
[14, 97, 32],
[72, 11, 49],
[98, 34, 41],
[89, 71, 52],
[44, 77, 13],
[56, 55, 58]])
If v doesn't strictly consist of 0s and 1s only, use v[:,None]==1 as the first argument with np.where.
Another approach would be with boolean-indexing -
C = A.copy()
mask = v==1
C[mask] = B[mask]
Note : If v is already a boolean array, skip the comparison against 1 for the mask creation.
Runtime test -
In [77]: A = np.random.randint(11,99,(10000,3))
In [78]: B = np.random.randint(11,99,(10000,3))
In [79]: v = np.random.rand(A.shape[0])>0.5
In [82]: def choose_rows_copy(A, B, v):
...: C = A.copy()
...: C[v] = B[v]
...: return C
...:
In [83]: %timeit np.where(v[:,None],B,A)
10000 loops, best of 3: 107 µs per loop
In [84]: %timeit choose_rows_copy(A, B, v)
1000 loops, best of 3: 226 µs per loop
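A small self-contained check that the three approaches (np.where, boolean-mask assignment, and np.choose with its choices passed as one list) produce the same rows. The random data and seed are arbitrary; v matches the sample run above.

```python
import numpy as np

rng = np.random.default_rng(1)                 # arbitrary seed
A = rng.integers(11, 99, size=(7, 3))
B = rng.integers(11, 99, size=(7, 3))
v = np.array([1, 0, 0, 0, 0, 1, 1])            # 1 -> row from B, 0 -> row from A

C_where = np.where(v[:, None] == 1, B, A)      # (7, 1) mask broadcasts across columns

C_mask = A.copy()
mask = v == 1
C_mask[mask] = B[mask]                         # boolean index selects whole rows

C_choose = np.choose(v, [A.T, B.T]).T          # choices go in as a single sequence

print(np.array_equal(C_where, C_mask), np.array_equal(C_where, C_choose))  # True True
```

Note that np.choose's third positional parameter is out, which is why the choices must be wrapped in a list.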

Python: negative numbers, when multiplying a matrix by a vector with numpy

A = numpy.matrix([[36, 34, 26],
[18, 44, 1],
[11, 31, 41]])
X1 = numpy.matrix([[46231154], [26619349], [37498603]])
I need to multiply a matrix by a vector. I tried:
>>>A*X1
matrix([[ -750624208],
[ 2040910731],
[-1423782060]])
>>> numpy.dot(A,X1)
matrix([[ -750624208],
[ 2040910731],
[-1423782060]])
Why negative numbers? It works fine with smaller numbers, for example:
A = numpy.matrix([[36, 34, 26],
[18, 44, 1],
[11, 31, 41]])
X1 = numpy.matrix([[8], [6], [6]])
>>> A*X1
matrix([[648],
[414],
[520]])
I believe you're on a 32-bit system and seeing an integer overflow. (On Windows, NumPy versions before 2.0 also default to 32-bit integers even on 64-bit machines.) Try defining the matrix and vector with the keyword argument dtype=numpy.int64, and see if you get a more meaningful answer.
On my 64-bit machine, I get the following output:
In [1]: import numpy
In [2]: A = numpy.matrix([[36, 34, 26],
...: [18, 44, 1],
...: [11, 31, 41]])
In [3]:
In [3]: X1 = numpy.matrix([[46231154], [26619349], [37498603]])
In [4]: A*X1
Out[4]:
matrix([[3544343088],
[2040910731],
[2871185236]])
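A minimal sketch that reproduces the effect by forcing the integer width explicitly. It uses plain np.array and the @ operator rather than numpy.matrix (the NumPy docs no longer recommend the matrix class); the int32 result is expected to wrap around silently, since NumPy array arithmetic does not raise on integer overflow.

```python
import numpy as np

A = np.array([[36, 34, 26],
              [18, 44, 1],
              [11, 31, 41]])
x = np.array([46231154, 26619349, 37498603])

r32 = A.astype(np.int32) @ x.astype(np.int32)  # entries above int32 max wrap around
r64 = A.astype(np.int64) @ x.astype(np.int64)

print(r64)                     # [3544343088 2040910731 2871185236]
print(np.iinfo(np.int32).max)  # 2147483647
print(r32)                     # first and last entries differ from r64
```

Only the middle entry (2040910731) fits below the int32 maximum, which matches the question's output where it is the only positive value.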
