Adding matrix rows to columns in numpy - python

Say I have two 3D matrices/tensors with dimensions:
[10, 3, 1000]
[10, 4, 1000]
How do I add every combination of rows (the vectors along the third dimension) from the two tensors so that the result has shape:
[10, 3, 4, 1000]
So each row, if you will, in the second x third dimensions of one tensor is added to each row of the other, in every combination. Sorry if this is not clear; I'm having a hard time articulating this...
Is there some kind of clever way to do this with numpy or pytorch (perfectly happy with a numpy solution, though I'm trying to use this in a pytorch context so a torch tensor manipulation would be even better) that doesn't involve me writing a bunch of nested for loops?
Nested loops example:
import numpy as np
x = np.random.randint(50, size=(32, 16, 512))
y = np.random.randint(50, size=(32, 21, 512))
scores = np.zeros(shape=(x.shape[0], x.shape[1], y.shape[1], 512))
for b in range(x.shape[0]):
    for i in range(x.shape[1]):
        for j in range(y.shape[1]):
            scores[b, i, j, :] = y[b, j, :] + x[b, i, :]

Does it work for you?
import torch
x1 = torch.rand(5, 3, 6)
y1 = torch.rand(5, 4, 6)
dim1, dim2 = x1.size()[0:2], y1.size()[-2:]
x2 = x1.unsqueeze(2).expand(*dim1, *dim2)
y2 = y1.unsqueeze(1).expand(*dim1, *dim2)
result = x2 + y2
print(x1[0, 1, :])
print(y1[0, 2, :])
print(result[0, 1, 2, :])
Output:
0.2884
0.5253
0.1463
0.4632
0.8944
0.6218
[torch.FloatTensor of size 6]
0.5654
0.0536
0.9355
0.1405
0.9233
0.1738
[torch.FloatTensor of size 6]
0.8538
0.5789
1.0818
0.6037
1.8177
0.7955
[torch.FloatTensor of size 6]
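For the NumPy side of the question, the same result can be had with plain broadcasting instead of explicit `expand` calls. The following is a minimal sketch, not the answerer's code: the small shapes and the names `scores` and `expected` are chosen here for illustration, and the loop is kept only as a reference to check against.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(50, size=(4, 3, 8))   # (batch, n_x, features)
y = rng.integers(50, size=(4, 5, 8))   # (batch, n_y, features)

# Insert a length-1 axis in each operand; broadcasting then expands
# both operands to the common shape (4, 3, 5, 8).
scores = x[:, :, None, :] + y[:, None, :, :]

# Reference result from the explicit nested loops in the question.
expected = np.zeros((4, 3, 5, 8), dtype=scores.dtype)
for b in range(4):
    for i in range(3):
        for j in range(5):
            expected[b, i, j, :] = x[b, i, :] + y[b, j, :]

print(scores.shape, np.array_equal(scores, expected))
```

The same indexing expression should also work on torch tensors, since they accept `None` as an index and follow the same broadcasting rules.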


Replicate a tensor operation using `torch.tensordot()` and `torch.stack()`

I have a tensor operation that I would like to replicate using a combination of torch.stack() and torch.tensordot() to generalize it further on a larger program. In summary, I want to replicate the tensor V_1 using said operations into another tensor called V_2.
import torch
N, t, J = 4, 2, 3
K_f , K_r = 1, 1
R = 5
K = K_f + K_r
id = torch.arange(N).repeat(t).sort()
X = torch.randn(N*t, K , J)
Y = torch.randn(N*t, 1)
D = torch.randn(N, K_r , R)
Draw = D.repeat_interleave(t,0)
beta = torch.randn(2*K_r + K_f, 1)
beta_R = (beta[0:K_r,0] + beta[K_r:2*K_r,0] * Draw ).repeat(1,J,1)
print("shape beta_R:", beta_R.shape)
beta_F = beta[2*K_r:2*K_r + K_f,0].repeat(N*t, J, R)
print("shape beta_F:", beta_F.shape)
XX_0 =X[:,0,:].unsqueeze(2).repeat(1,1,R)
print("shape XX_0:", XX_0.shape)
XX_1 =X[:,1,:].unsqueeze(2).repeat(1,1,R)
print("shape XX_1:", XX_1.shape)
V_1 = XX_0 * beta_R + XX_1 * beta_F
print("shape V_1:",V_1.shape)
#shape beta_R: torch.Size([8, 3, 5])
#shape beta_F: torch.Size([8, 3, 5])
#shape XX_0: torch.Size([8, 3, 5])
#shape XX_1: torch.Size([8, 3, 5])
#shape V_1: torch.Size([8, 3, 5])
Now I want to do the same but stacking my tensors (using torch.stack()) and applying a generalized version of the dot-product (using torch.tensordot()), but I am a bit confused with the dims argument which is not doing what I expected.
#%% Replicating using stacking and tensordot
stack_XX = torch.stack((XX_0, XX_1), 0)
print("shape stack_XX:",stack_XX.shape)
stack_beta = torch.stack((beta_R, beta_F), 0)
print("shape stack_beta:", stack_beta.shape)
# dot product between stack_XX and stack_beta along the first dimension
V_2 = torch.tensordot(stack_XX, stack_beta, dims=([0], [0]))
print("shape V_2:",V_2.shape)
# check if the two are equal
torch.all(V_1.eq(V_2))
#shape stack_XX: torch.Size([2, 8, 3, 5])
#shape stack_beta: torch.Size([2, 8, 3, 5])
#shape V_2: torch.Size([8, 3, 5, 8, 3, 5])
#tensor(False)
So I am basically trying to get tensor(True) when running torch.all(V_1.eq(V_2)).
Maybe this?
torch.einsum( 'abcd,abcd->bcd', stack_XX, stack_beta)
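To see why the einsum call gives the wanted shape while tensordot does not, here is a small sketch using NumPy's `einsum` and `tensordot`, which take the same subscript string and an equivalent `axes` argument. The random arrays stand in for `stack_XX` and `stack_beta` and are an assumption made for illustration; only the shapes match the question.

```python
import numpy as np

rng = np.random.default_rng(0)
stack_XX = rng.standard_normal((2, 8, 3, 5))
stack_beta = rng.standard_normal((2, 8, 3, 5))

# Reference: multiply element-wise, then add the two stacked slices.
V_1 = stack_XX[0] * stack_beta[0] + stack_XX[1] * stack_beta[1]

# einsum: index 'a' is summed because it is absent from the output,
# while b, c, d stay aligned between the two operands.
V_2 = np.einsum('abcd,abcd->bcd', stack_XX, stack_beta)

# tensordot also contracts axis 0, but it pairs every remaining index
# of one operand with every remaining index of the other (outer product).
V_outer = np.tensordot(stack_XX, stack_beta, axes=([0], [0]))

print(V_2.shape, V_outer.shape, np.allclose(V_1, V_2))
```

Because the two routes add floating-point numbers in a different order, comparing with a tolerance (`np.allclose`, or `torch.allclose` in PyTorch) is safer than exact equality.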

selecting random elements from each column of numpy array

I have an n row, m column numpy array, and would like to create a new k x m array by selecting k random elements from each column of the array. I wrote the following python function to do this, but would like to implement something more efficient and faster:
import random
import numpy as np
def sample_array_cols(MyMatrix, nelements):
    vmat = []
    TempMat = MyMatrix.T
    for v in TempMat:
        v = np.ndarray.tolist(v)
        subv = random.sample(v, nelements)
        vmat = vmat + [subv]
    return(np.array(vmat).T)
One question is whether there's a way to loop over each column without transposing the array (and then transposing back). More importantly, is there some way to map the random sample onto each column that would be faster than having a for loop over all columns? I don't have that much experience with numpy objects, but I would guess that there should be something analogous to apply/mapply in R that would work?
One alternative is to randomly generate the indices first, and then use take_along_axis to map them to the original array (note that randint samples with replacement, so a column may pick the same row more than once):
arr = np.random.randn(1000, 5000) # arbitrary
k = 10 # arbitrary
n, m = arr.shape
idx = np.random.randint(0, n, (k, m))
new = np.take_along_axis(arr, idx, axis=0)
Output (shape):
In [215]: new.shape
Out[215]: (10, 5000) # (k x m)
To sample each column without replacement, just like your original solution:
import numpy as np
matrix = np.arange(4*3).reshape(4,3)
matrix
Output
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
k = 2
np.take_along_axis(matrix, np.random.rand(*matrix.shape).argsort(axis=0)[:k], axis=0)
Output
array([[ 9, 1, 2],
[ 3, 4, 11]])
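Another way to sample without replacement, assuming NumPy 1.20 or newer, is `Generator.permuted`, which shuffles along an axis independently for each column. This is a sketch of an alternative rather than what the answer above uses; `rng` and `sample` are names introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
matrix = np.arange(4 * 3).reshape(4, 3)
k = 2

# permuted shuffles each column independently along axis 0 (unlike
# rng.permutation / rng.shuffle, which move whole rows together).
# Keeping the first k rows gives k distinct elements per column.
sample = rng.permuted(matrix, axis=0)[:k]

print(sample.shape)
```

Like the argsort approach, this shuffles the whole array before slicing, so it is convenient rather than optimal when k is much smaller than the number of rows.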
I would:
Pre-allocate the result array, and fill in columns, and
Use numpy integer-array indexing
import numpy
def sample_array_cols(matrix, n_result):
    (n, m) = matrix.shape
    # allocate an (n_result x m) array; numpy.array([n_result, m]) would only build a 2-element vector
    vmat = numpy.empty((n_result, m), dtype=matrix.dtype)
    for c in range(m):
        # randint draws the row indices with replacement
        random_indices = numpy.random.randint(0, n, n_result)
        vmat[:, c] = matrix[random_indices, c]
    return vmat
Not quite fully vectorized, but better than building up a list, and the code scans just like your description.

Numpy convolving along an axis for 2 2D-arrays

I have 2 2D-arrays. I am trying to convolve along the axis 1. np.convolve doesn't provide the axis argument. The answer here, convolves 1 2D-array with a 1D array using np.apply_along_axis. But it cannot be directly applied to my use case. The question here doesn't have an answer.
MWE is as follows.
import numpy as np
a = np.random.randint(0, 5, (2, 5))
"""
a=
array([[4, 2, 0, 4, 3],
[2, 2, 2, 3, 1]])
"""
b = np.random.randint(0, 5, (2, 2))
"""
b=
array([[4, 3],
[4, 0]])
"""
# What I want
c = np.convolve(a, b, axis=1) # axis is not supported as an argument
"""
c=
array([[16, 20, 6, 16, 24, 9],
[ 8, 8, 8, 12, 4, 0]])
"""
I know I can do it using np.fft.fft, but it seems like an unnecessary step to get a simple thing done. Is there a simple way to do this? Thanks.
Why not just do a list comprehension with zip?
>>> np.array([np.convolve(x, y) for x, y in zip(a, b)])
array([[16, 20, 6, 16, 24, 9],
[ 8, 8, 8, 12, 4, 0]])
>>>
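Or with scipy.signal.convolve2d (the row selection [[0, 2]] relies on the inputs having exactly two rows; the middle row of the 2D result mixes both rows):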
>>> from scipy.signal import convolve2d
>>> convolve2d(a, b)[[0, 2]]
array([[16, 20, 6, 16, 24, 9],
[ 8, 8, 8, 12, 4, 0]])
>>>
One possibility could be to manually go the way to the Fourier spectrum, and back:
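n = a.shape[1] + b.shape[1] - 1  # length of the full linear convolution
# take the real part and round; abs would drop signs and a bare astype(int) truncates
np.rint(np.fft.ifft(np.fft.fft(a, n=n) * np.fft.fft(b, n=n)).real).astype(int)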
# array([[16, 20, 6, 16, 24, 9],
# [ 8, 8, 8, 12, 4, 0]])
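The FFT route can be wrapped in a small helper that works along any axis. The sketch below is an assumption-laden illustration: `fft_convolve_along` is a hypothetical name, it uses the real-input transforms `np.fft.rfft`/`np.fft.irfft`, and it assumes both arrays have the same shape on every axis except the one being convolved.

```python
import numpy as np

def fft_convolve_along(a, b, axis=-1):
    """Full linear convolution of a and b along one axis via the FFT."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Zero-pad both inputs to the length of the full convolution.
    n = a.shape[axis] + b.shape[axis] - 1
    fa = np.fft.rfft(a, n=n, axis=axis)
    fb = np.fft.rfft(b, n=n, axis=axis)
    out = np.fft.irfft(fa * fb, n=n, axis=axis)
    # For integer inputs, round away floating-point noise.
    if np.issubdtype(a.dtype, np.integer) and np.issubdtype(b.dtype, np.integer):
        out = np.rint(out).astype(np.result_type(a, b))
    return out

a = np.array([[4, 2, 0, 4, 3],
              [2, 2, 2, 3, 1]])
b = np.array([[4, 3],
              [4, 0]])
c = fft_convolve_along(a, b, axis=1)
print(c)
```

For short rows a direct `np.convolve` per row is usually just as fast; the FFT version tends to pay off only when the convolved axis is long.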
Would it be considered too ugly to loop over the orthogonal dimension? That would not add much overhead unless the main dimension is very short. Creating the output array ahead of time ensures that no memory needs to be copied about.
def convolvesecond(a, b):
    N1, L1 = a.shape
    N2, L2 = b.shape
    if N1 != N2:
        raise ValueError("Not compatible")
    c = np.zeros((N1, L1 + L2 - 1), dtype=a.dtype)
    for n in range(N1):
        c[n,:] = np.convolve(a[n,:], b[n,:], 'full')
    return c
For the generic case (convolving along the k-th axis of a pair of multidimensional arrays), I would resort to a pair of helper functions I always keep on hand to convert multidimensional problems to the basic 2d case:
def semiflatten(x, d=0):
    '''SEMIFLATTEN - Permute and reshape an array to convenient matrix form
    y, s = SEMIFLATTEN(x, d) permutes and reshapes the arbitrary array X so
    that input dimension D (default: 0) becomes the second dimension of the
    output, and all other dimensions (if any) are combined into the first
    dimension of the output. The output is always 2-D, even if the input is
    only 1-D.
    If D<0, dimensions are counted from the end.
    Return value S can be used to invert the operation using SEMIUNFLATTEN.
    This is useful to facilitate looping over arrays with unknown shape.'''
    x = np.array(x)
    shp = x.shape
    ndims = x.ndim
    if d<0:
        d = ndims + d
    perm = list(range(ndims))
    perm.pop(d)
    perm.append(d)
    y = np.transpose(x, perm)
    # Y has the original D-th axis last, preceded by the other axes, in order
    rest = np.array(shp, int)[perm[:-1]]
    y = np.reshape(y, [np.prod(rest), y.shape[-1]])
    return y, (d, rest)
def semiunflatten(y, s):
    '''SEMIUNFLATTEN - Reverse the operation of SEMIFLATTEN
    x = SEMIUNFLATTEN(y, s), where Y, S are as returned from SEMIFLATTEN,
    reverses the reshaping and permutation.'''
    d, rest = s
    x = np.reshape(y, np.append(rest, y.shape[-1]))
    perm = list(range(x.ndim))
    perm.pop()
    perm.insert(d, x.ndim-1)
    x = np.transpose(x, perm)
    return x
(Note that transpose returns a view, and reshape only copies when the memory layout requires it, so these functions are cheap compared with the convolution itself.)
With those, the generic form can be written as:
def convolvealong(a, b, axis=-1):
    a, S1 = semiflatten(a, axis)
    b, S2 = semiflatten(b, axis)
    c = convolvesecond(a, b)
    return semiunflatten(c, S1)

How to broadcast or vectorize a linear interpolation of a 2D array that uses scipy.ndimage map_coordinates?

I have recently hit a roadblock when it comes to performance. I know how to manually loop and do the interpolation from the origin cell to all the other cells by brute-forcing/looping over each row and column of a 2D array.
However, when I process a 2D array of shape, say, (3000, 3000), the linear spacing and the interpolation come to a standstill and severely hurt performance.
I am looking for a way to optimize this loop. I am aware of vectorization and broadcasting, but I am not sure how to apply them in this situation.
I will explain it with code and figures.
import numpy as np
from scipy.ndimage import map_coordinates
m = np.array([
[10,10,10,10,10,10],
[9,9,9,10,9,9],
[9,8,9,10,8,9],
[9,7,8,0,8,9],
[8,7,7,8,8,9],
[5,6,7,7,6,7]])
origin_row = 3
origin_col = 3
m_max = np.zeros(m.shape)
m_dist = np.zeros(m.shape)
rows, cols = m.shape
for col in range(cols):
    for row in range(rows):
        # Get spacing linear interpolation
        x_plot = np.linspace(col, origin_col, 5)
        y_plot = np.linspace(row, origin_row, 5)
        # grab the interpolated line
        interpolated_line = map_coordinates(m,
                                            np.vstack((y_plot,
                                                       x_plot)),
                                            order=1, mode='nearest')
        m_max[row][col] = max(interpolated_line)
        m_dist[row][col] = np.argmax(interpolated_line)
print(m)
print(m_max)
print(m_dist)
As you can see, this is very brute force. I have managed to broadcast all the code around this part, but I am stuck on this part.
Here is an illustration of what I am trying to achieve; I will go through the first iteration.
1.) the input array
2.) the first loop from 0,0 to origin (3,3)
3.) this will return [10 9 9 8 0] and the max will be 10 and the index will be 0
5.) here is the output for the sample array I used
Here is an update of the performance based on the accepted answer.
To speed up the code, you could first create x_plot and y_plot outside of the loops instead of recreating them on every iteration:
#this would be outside of the loops
num = 5
lin_col = np.array([np.linspace(i, origin_col, num) for i in range(cols)])
lin_row = np.array([np.linspace(i, origin_row, num) for i in range(rows)])
Then you could access them in each loop with x_plot = lin_col[col] and y_plot = lin_row[row].
Second, you can avoid both loops by calling map_coordinates once on the stacked coordinates of every (row, col) pair instead of on one vstack per pair. To do so, you can create all the combinations of x_plot and y_plot by using np.tile and np.ravel such as:
arr_vs = np.vstack(( np.tile( lin_row, cols).ravel(),
np.tile( lin_col.ravel(), rows)))
Note that ravel is applied at a different point in each case (after tiling for the rows, before tiling for the columns) so that every combination is produced. Now you can use map_coordinates with this arr_vs and reshape the result with the number of rows, cols and num to get each interpolated_line in the last axis of a 3D-array:
arr_map = map_coordinates(m, arr_vs, order=1, mode='nearest').reshape(rows,cols,num)
Finally, you can use np.max and np.argmax on the last axis of arr_map to get the results m_max and m_dist. So all the code would be:
import numpy as np
from scipy.ndimage import map_coordinates
m = np.array([
[10,10,10,10,10,10],
[9,9,9,10,9,9],
[9,8,9,10,8,9],
[9,7,8,0,8,9],
[8,7,7,8,8,9],
[5,6,7,7,6,7]])
origin_row = 3
origin_col = 3
rows, cols = m.shape
num = 5
lin_col = np.array([np.linspace(i, origin_col, num) for i in range(cols)])
lin_row = np.array([np.linspace(i, origin_row, num) for i in range(rows)])
arr_vs = np.vstack(( np.tile( lin_row, cols).ravel(),
np.tile( lin_col.ravel(), rows)))
arr_map = map_coordinates(m, arr_vs, order=1, mode='nearest').reshape(rows,cols,num)
m_max = np.max( arr_map, axis=-1)
m_dist = np.argmax( arr_map, axis=-1)
print (m_max)
print (m_dist)
and you get, as expected:
#m_max
array([[10, 10, 10, 10, 10, 10],
[ 9, 9, 10, 10, 9, 9],
[ 9, 9, 9, 10, 8, 9],
[ 9, 8, 8, 0, 8, 9],
[ 8, 8, 7, 8, 8, 9],
[ 7, 7, 8, 8, 8, 8]])
#m_dist
array([[0, 0, 0, 0, 0, 0],
[0, 0, 2, 0, 0, 0],
[0, 2, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 2, 0, 0, 0, 0],
[1, 1, 2, 1, 2, 1]])
EDIT: lin_col and lin_row are related, so you can compute them faster:
if cols >= rows:
    arr = np.arange(cols)[:,None]
    lin_col = arr + (origin_col-arr)/(num-1.)*np.arange(num)
    lin_row = lin_col[:rows] + np.linspace(0, origin_row - origin_col, num)[None,:]
else:
    arr = np.arange(rows)[:,None]
    lin_row = arr + (origin_row-arr)/(num-1.)*np.arange(num)
    lin_col = lin_row[:cols] + np.linspace(0, origin_col - origin_row, num)[None,:]
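The per-index list comprehensions can also be replaced by a single `np.linspace` call, since `linspace` accepts array-valued endpoints (assuming NumPy 1.16 or newer). A minimal sketch using the shapes from the example; the `_ref` names are introduced here only for comparison:

```python
import numpy as np

rows, cols = 6, 6
origin_row, origin_col = 3, 3
num = 5

# Reference: one linspace call per start index, as in the answer.
lin_col_ref = np.array([np.linspace(i, origin_col, num) for i in range(cols)])
lin_row_ref = np.array([np.linspace(i, origin_row, num) for i in range(rows)])

# Array-valued start: one call builds every line at once;
# axis=-1 puts the num samples on the last axis, giving shape (cols, num).
lin_col = np.linspace(np.arange(cols), origin_col, num, axis=-1)
lin_row = np.linspace(np.arange(rows), origin_row, num, axis=-1)

print(lin_col.shape, np.allclose(lin_col, lin_col_ref))
```

This form does not depend on whether cols or rows is larger, so the if/else branch is not needed.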
Here is a sort-of-vectorized approach. It is not very optimized and there may be one or two off-by-one index errors, but it may give you ideas.
Two examples: a monochrome 384x512 test pattern and a "real" 3-channel 768x1024 image. Both are uint8.
This takes half a minute on my machine.
For larger images one would require more RAM than I have (8GB), or one would have to break the image down into smaller chunks.
And the code:
import numpy as np
def rays(img, ctr):
    M, N, *d = img.shape
    aidx = 2*(slice(None),) + (img.ndim-2)*(None,)
    m, n = ctr
    out = np.empty_like(img)
    offsI = np.empty(img.shape, np.uint16)
    offsJ = np.empty(img.shape, np.uint16)
    img4, out4, I4, J4 = ((x[m:, n:], x[m:, n::-1], x[m::-1, n:], x[m::-1, n::-1]) for x in (img, out, offsI, offsJ))
    for i, o, y, x in zip(img4, out4, I4, J4):
        for _ in range(2):
            M, N, *d = i.shape
            widths = np.arange(1, M+1, dtype=np.uint16).clip(None, N)
            I = np.arange(M, dtype=np.uint16).repeat(widths)
            J = np.ones_like(I)
            J[0] = 0
            J[widths[:-1].cumsum()] -= widths[:-1]
            J = J.cumsum(dtype=np.uint16)
            ii = np.arange(1, 2*M-1, dtype=np.uint16) // 2
            II = ii.clip(None, I[:, None])
            jj = np.arange(2*M-2, dtype=np.uint32) // 2 * 2 + 1
            jj[0] = 0
            JJ = ((1 + jj) * J[:, None] // (2*(I+1))[:, None]).astype(np.uint16).clip(None, J[:, None])
            idx = i[II, JJ].argmax(axis=1)
            II, JJ = (np.take_along_axis(ZZ[aidx] , idx[:, None], 1)[:, 0] for ZZ in (II, JJ))
            y[I, J], x[I, J] = II, JJ
            SH = II, JJ, *np.ogrid[tuple(map(slice, img.shape))][2:]
            o[I, J] = i[SH]
            i, o = i.swapaxes(0, 1), o.swapaxes(0, 1)
            y, x = x.swapaxes(0, 1), y.swapaxes(0, 1)
    return out, offsI, offsJ
from scipy.misc import face  # in recent SciPy releases this lives at scipy.datasets.face
f = face()
fr, *fidx = rays(f, (200, 400))
s = np.uint8((np.arange(384)[:, None] % 41 < 2)&(np.arange(512) % 41 < 2))
s = 255*s + 128*s[::-1, ::-1] + 64*s[::-1] + 32*s[:, ::-1]
sr, *sidx = rays(s, (200, 400))
from PIL import Image  # Pillow; the bare `import Image` only worked with the old PIL
Image.fromarray(f).show()
Image.fromarray(fr).show()
Image.fromarray(s).show()
Image.fromarray(sr).show()

Numpy reshape from (m,w,l) to (w,m,l) dimension

I'm working with financial time series data and am a bit confused by the numpy reshape function. My goal is to calculate log-returns for the adj_close parameter.
inputs = np.array([df_historical_data[key][-W:], axis = 1).values for key in stock_list])
inputs.shape  # (8, 820, 5)
prices = inputs[:, :, 0]
prices.shape  # (8, 820)
prices[:,0]
array([ 4.17000004e+02, 4.68800000e+00, 8.47889000e-03,
3.18835850e+00, 3.58412583e+00, 8.35364850e-01,
5.54610005e-04, 3.33600003e-05])  # close prices of 8 stocks on day 0
However, for my program I need the shape of my inputs to be (820, 8, 5), so I decided to reshape my numpy array:
inputs = np.array([df_historical_data[key][-W:], axis = 1).values for key in stock_list]).reshape(820, 8, 5)
inputs.shape  # (820, 8, 5)
prices = inputs[:, :, 0]
prices.shape  # (820, 8)
prices[0]
array([ 417.00000354, 436.5100001 , 441.00000442, 440. ,
416.10000178, 409.45245 , 422.999999 , 432.48000001])
# close prices of 1 stock for 8 days,
# but it should be the same as in the example above
It seems that I didn't reshape my array properly.
In any case, I can't understand why this strange behaviour occurs.
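What you need is transpose, not reshape. Reshape keeps the elements in the same flat order and only regroups them, whereas transpose permutes the axes.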
Let's assume we have an array as follows:
import numpy as np
m, w, l = 2, 3, 4
array1 = np.array([[['m%d w%d l%d' % (mi, wi, li) for li in range(l)] for wi in range(w)] for mi in range(m)])
print(array1.shape)
print(array1)
Reshape is probably not what you want, but here is how you can do it:
array2 = array1.reshape(w, m, l)
print(array2.shape)
print(array2)
Here is how transpose is done:
# originally
# 0, 1, 2
# m, w, l
# -------
# transposed
array3 = array1.transpose(1, 0, 2)
# w, m, l
print(array3.shape)
print(array3)
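The difference is easy to check numerically. The sketch below uses a small integer array in place of the string-labelled one; the names `arr`, `reshaped` and `transposed` are chosen here for illustration.

```python
import numpy as np

m, w, l = 2, 3, 4
arr = np.arange(m * w * l).reshape(m, w, l)

reshaped = arr.reshape(w, m, l)      # same flat order, only regrouped
transposed = arr.transpose(1, 0, 2)  # axes 0 and 1 swapped

# transpose keeps each element's identity: transposed[j, i, k] == arr[i, j, k]
ok_transpose = all(
    transposed[j, i, k] == arr[i, j, k]
    for i in range(m) for j in range(w) for k in range(l)
)

# reshape does not: reshaped[0] is arr[0, 0] followed by arr[0, 1],
# while transposed[0] is arr[0, 0] followed by arr[1, 0].
print(reshaped[0])
print(transposed[0])
print(ok_transpose)
```

In the financial example this is why `prices[0]` after the reshape showed eight consecutive days of one stock rather than one day of eight stocks.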
