Vectorizing for loop using splicing in NumPy - python

I have this for loop:
import numpy as np
blockSize = 5
ds = np.arange(20)
ds = np.reshape(ds, (1, len(ds)))
counts = np.zeros((1, len(ds[0]) // blockSize))
for i in range(len(counts[0])):
    counts[0, i] = np.floor(np.sum(ds[0, i*blockSize:i*blockSize+blockSize]))
I am trying to vectorize it, doing something like this:
countIndices = np.arange(len(counts[0]))
counts[0, countIndices] = np.floor(np.sum(ds[0, countIndices*blockSize:countIndices*blockSize + blockSize]))
However, this does not work and gives this error:
counts[0, countIndices] = np.floor(np.sum(ds[0, countIndices*blockSize:countIndices*blockSize + blockSize]))
TypeError: only integer scalar arrays can be converted to a scalar index
I know that something like this works:
counts[0, countIndices] = np.floor(ds[0, countIndices*blockSize]
    + ds[0, countIndices*blockSize + 1] +
    ... ds[0, countIndices*blockSize + blockSize - 1])
The issue is that for large values of blockSize (and blockSize is very large in my actual code), this is not feasible to write out. I am not sure how to accomplish what I want. Any help is greatly appreciated.

You don't need to do floor if you store the result to an integer array. You can also create a fake new axis of size blockSize to fully vectorize your operation.
block_size = 5
ds = np.arange(80.0).reshape(4, -1) # Shape (4, 20)
counts = np.empty((ds.shape[0], ds.shape[1] // block_size), dtype=int)
To introduce the fake dimension and sum:
ds.reshape(ds.shape[0], -1, block_size).sum(axis=-1, out=counts)
Reshaping a contiguous array does not copy the data, so the operation ds.reshape(ds.shape[0], -1, block_size) is extremely cheap in both time and space.
You can use -1 for one of the reshape dimensions to avoid computing/writing out long division expressions.
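As a quick sanity check, the reshape-and-sum idea can be compared against the loop from the question. This is only a minimal sketch: it assumes the array length is an exact multiple of block_size, and it applies np.floor explicitly instead of relying on an integer output array.

```python
import numpy as np

block_size = 5
ds = np.arange(20.0).reshape(1, -1)  # shape (1, 20), as in the question

# Loop version from the question, used here as the reference result
expected = np.zeros((1, ds.shape[1] // block_size))
for i in range(expected.shape[1]):
    expected[0, i] = np.floor(ds[0, i * block_size:(i + 1) * block_size].sum())

# Vectorized version: split the last axis into (n_blocks, block_size)
# and sum over the new trailing axis
counts = np.floor(ds.reshape(ds.shape[0], -1, block_size).sum(axis=-1))

print(counts)  # [[10. 35. 60. 85.]]
```

The same expression works unchanged for several rows, since only the last axis is split.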

Related

Wrong output datatype (results out of range) on equation with numpy

Got 2 arrays:
print(arr1.shape)
print(arr1.dtype)
print(arr2.shape)
print(arr2.dtype)
output
(500, 500)
uint8
(500, 500)
uint8
How do I correctly subtract one from the other so that the output datatype fits the real result?
Doing this:
sub = arr1 - arr2
gives me incorrect results, because sub.dtype is uint8, so it can't fit negative values (when arr2 value is bigger than arr1).
What is the best way to deal with that?
PS. This is only a basic example, the real equation is much more complicated, but fails on this step.
A very general approach would be to just cast your arrays to whatever output type you're planning on using:
arr1 = arr1.astype(float)
arr2 = arr2.astype(float)
sub = arr1 - arr2
Another, more subtle, approach is to perform only the casts that you need. An operation like res = a * b / np.exp((c - h)) is actually
temp1 = np.multiply(a, b)
temp2 = np.subtract(c, h)
temp3 = np.exp(temp2)
res = np.true_divide(temp1, temp3)
All of these temporary arrays would produce correct results with unsigned integer inputs except temp2. You could therefore write the offending step explicitly:
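# dtype=float makes the subtraction itself run in floating point;
# out= alone would still subtract in uint8 and only cast the wrapped result
res = a * b / np.exp(np.subtract(c, h, dtype=float))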
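To make the wrap-around concrete, here is a small sketch with made-up two-element arrays (the values are illustrative, not from the question). It shows the uint8 result next to two ways of widening the computation; which one is preferable depends on the rest of the equation.

```python
import numpy as np

arr1 = np.array([[5, 200]], dtype=np.uint8)
arr2 = np.array([[10, 100]], dtype=np.uint8)

# Plain subtraction stays in uint8, so 5 - 10 wraps around to 251
wrapped = arr1 - arr2

# Ask the ufunc to compute in float64
as_float = np.subtract(arr1, arr2, dtype=float)

# Or cast one operand to a signed type wide enough for every uint8 difference
as_int16 = arr1.astype(np.int16) - arr2

print(wrapped, as_float, as_int16)
```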

Slicing 2D numpy array periodically

I have a numpy array of 300x300 where I want to keep all elements periodically. Specifically, for both axes I want to keep the first 5 elements, then discard 15, keep 5, discard 15, etc. This should result in an array of 75x75 elements. How can this be done?
You can create a 1D mask that implements the keep/discard pattern, repeat it to the full length of an axis, and then apply it to both axes of the array. Here is an example.
import numpy as np
size = 300
array = np.arange(size).reshape((size, 1)) * np.arange(size).reshape((1, size))
mask = np.concatenate((np.ones(5), np.zeros(15))).astype(bool)
period = len(mask)
mask = np.repeat(mask.reshape((1, period)), repeats=size // period, axis=0)
mask = np.concatenate(mask, axis=0)
result = array[mask][:, mask]
print(result.shape)
You can view the array as series of 20x20 blocks, of which you want to keep the upper-left 5x5 portion. Let's say you have
keep = 5
discard = 15
This only works if
assert all(s % (keep + discard) == 0 for s in arr.shape)
First compute the shape of the view and use it:
block = keep + discard
shape1 = (arr.shape[0] // block, block, arr.shape[1] // block, block)
view = arr.reshape(shape1)[:, :keep, :, :keep]
The following operation will create a copy of the data because the view creates a non-contiguous buffer:
shape2 = (shape1[0] * keep, shape1[2] * keep)
result = view.reshape(shape2)
You can compute shape1 and shape2 in a more general manner with something like
shape1 = tuple(
np.stack((np.array(arr.shape) // block,
np.full(arr.ndim, block)), -1).ravel())
shape2 = tuple(np.array(shape1[::2]) * keep)
I would recommend packaging this into a function.
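One possible way to package the reshape-based approach is sketched below. The function name keep_periodic is hypothetical, and the sketch is limited to 2D arrays whose dimensions are exact multiples of keep + discard.

```python
import numpy as np

def keep_periodic(arr, keep, discard):
    """Keep the first `keep` of every `keep + discard` elements along both axes."""
    block = keep + discard
    rows, cols = arr.shape
    if rows % block or cols % block:
        raise ValueError("each dimension must be a multiple of keep + discard")
    # View the array as a grid of block x block tiles, keep the upper-left corner of each
    view = arr.reshape(rows // block, block, cols // block, block)[:, :keep, :, :keep]
    # This reshape copies, because the sliced view is no longer contiguous
    return view.reshape(rows // block * keep, cols // block * keep)

arr = np.arange(300 * 300).reshape(300, 300)
result = keep_periodic(arr, keep=5, discard=15)
print(result.shape)  # (75, 75)
```

The result matches what a boolean mask built from the index modulo 20 would select, which is a convenient cross-check.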
Here is my first thought of a solution. Will update later if I think of one with fewer lines. This should work even if the input is not square:
output = []
for i in range(len(arr)):
    tmp = []
    if i % (15+5) < 5:  # keep first 5, then discard next 15
        for j in range(len(arr[i])):
            if j % (15+5) < 5:  # keep first 5, then discard next 15
                tmp.append(arr[i,j])
        output.append(tmp)  # only append rows that are kept
Update:
Building off of Yang's answer, here is another way which uses np.tile, which repeats an array a given number of times along each axis. This relies on the input array being square in dimension.
import numpy as np
# Define one instance of the keep/discard box
keep, discard = 5, 15
mask = np.concatenate([np.ones(keep), np.zeros(discard)])
mask_2d = mask.reshape((keep+discard,1)) * mask.reshape((1,keep+discard))
# Tile it out -- overshoot, then trim to match size
count = len(arr)//len(mask_2d) + 1
tiled = np.tile(mask_2d, [count,count]).astype('bool')
tiled = tiled[:len(arr), :len(arr)]
# Apply the mask to the input array
dim = sum(tiled[0])
output = arr[tiled].reshape((dim,dim))
Another option using meshgrid and a modulo:
# MyArray = 300x300 numpy array
r = np.r_[0:300] # Indices from 0 to 299
xv, yv = np.meshgrid(r, r) # x and y grid
mask = ((xv%20)<5) & ((yv%20)<5) # We create the boolean mask
result = MyArray[mask].reshape((75,75)) # We apply the mask and reshape the final output

Implement ConvND in Tensorflow

So I need a ND convolutional layer that also supports complex numbers. So I decided to code it myself.
I tested this code on numpy alone and it worked. Tested with several channels, 2D and 1D and complex. However, I have problems when I do it on TF.
This is my code so far:
def call(self, inputs):
    with tf.name_scope("ComplexConvolution_" + str(self.layer_number)) as scope:
        inputs = self._verify_inputs(inputs)  # Check inputs are of expected shape and format
        inputs = self.apply_padding(inputs)  # Add zeros if needed
        output_np = np.zeros(  # I use np because tf does not support the assignment
            (inputs.shape[0],) +  # Per each image
            self.output_size,  # Image out size
            dtype=self.input_dtype  # To support complex numbers
        )
        img_index = 0
        for image in inputs:
            for filter_index in range(self.filters):
                for i in range(int(np.prod(self.output_size[:-1]))):  # for each element in the output
                    index = np.unravel_index(i, self.output_size[:-1])
                    start_index = tuple([a * b for a, b in zip(index, self.stride_shape)])
                    end_index = tuple([a+b for a, b in zip(start_index, self.kernel_shape)])
                    # set_trace()
                    sector_slice = tuple(
                        [slice(start_index[ind], end_index[ind]) for ind in range(len(start_index))]
                    )
                    sector = image[sector_slice]
                    new_value = tf.reduce_sum(sector * self.kernels[filter_index]) + self.bias[filter_index]
                    # I use Tied Bias https://datascience.stackexchange.com/a/37748/75968
                    output_np[img_index][index][filter_index] = new_value  # The complicated line
            img_index += 1
        output = apply_activation(self.activation, output_np)
    return output
input_size is a tuple of shape (dim1, dim2, ..., dim3, channels). A 2D RGB conv, for example, will have input_size (32, 32, 3), and inputs will have shape (None, 32, 32, 3).
The output size is calculated from an equation I found in this paper: A guide to convolution arithmetic for deep learning
out_list = []
for i in range(len(self.input_size) - 1):  # -1 because the number of input channels is irrelevant
    out_list.append(int(np.floor((self.input_size[i] + 2 * self.padding_shape[i] - self.kernel_shape[i]) / self.stride_shape[i]) + 1))
out_list.append(self.filters)
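The output-size formula above, floor((input + 2 * padding - kernel) / stride) + 1 per spatial dimension, can be tried out on its own. The sketch below is a standalone version; the function name conv_output_size and its argument order are hypothetical and not part of the layer in the question.

```python
import math

def conv_output_size(input_size, kernel_shape, stride_shape, padding_shape, filters):
    """Spatial output size per dimension, followed by the number of filters."""
    spatial = [
        math.floor((i + 2 * p - k) / s) + 1
        # input_size[:-1] drops the channel count, which does not affect the output size
        for i, k, s, p in zip(input_size[:-1], kernel_shape, stride_shape, padding_shape)
    ]
    return tuple(spatial) + (filters,)

# 32x32 RGB input, 3x3 kernel, stride 1, no padding
print(conv_output_size((32, 32, 3), (3, 3), (1, 1), (0, 0), filters=8))  # (30, 30, 8)
```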
Basically, I use np.zeros because if I use tf.zeros I cannot assign the new_value and I get:
TypeError: 'Tensor' object does not support item assignment
However, in this current state I am getting:
NotImplementedError: Cannot convert a symbolic Tensor (placeholder_1:0) to a numpy array.
On that same assignment. I don't see an easy fix, I think I should change the strategy of the code completely.
In the end, I did it in a very inefficient way based on this comment (also commented here), but at least it works:
new_value = tf.reduce_sum(sector * self.kernels[filter_index]) + self.bias[filter_index]
indices = (img_index,) + index + (filter_index,)
mask = tf.Variable(tf.fill(output_np.shape, 1))
mask = mask[indices].assign(0)
mask = tf.cast(mask, dtype=self.input_dtype)
output_np = output_np * mask + (1 - mask) * new_value
I say inefficient because I create a whole new array for each assignment. My code is taking ages to compute for the moment so I will keep looking for improvements and post here if I get something better.

Custom conv2d operation Pytorch

I have tried a custom Conv2d function which has to work similar to nn.Conv2d but the multiplication and addition used inside nn.Conv2d are replaced with mymult(num1,num2) and myadd(num1,num2).
As per insight from very helpful forums 1,2, what I can do is try unfolding it and then do matrix multiplication. The @ part in the code below can be done using loops with mymult() and myadd(), as I believe this @ is doing matmul.
import torch
import torch.nn as nn
import torch.nn.functional as F

def convcheck():
    torch.manual_seed(123)
    batch_size = 2
    channels = 2
    h, w = 2, 2
    image = torch.randn(batch_size, channels, h, w)  # input image
    out_channels = 3
    kh, kw = 1, 1  # kernel size
    dh, dw = 1, 1  # stride
    size = int((h-kh+2*0)/dh+1)  # include padding in place of zero
    conv = nn.Conv2d(in_channels=channels, out_channels=out_channels, kernel_size=kw, padding=0, stride=dh, bias=False)
    out = conv(image)
    #print('out', out)
    #print('out.size()', out.size())
    #print('')
    filt = conv.weight.data
    imageunfold = F.unfold(image, kernel_size=kh, padding=0, stride=dh)
    print("Unfolded image", "\n", imageunfold, "\n", imageunfold.shape)
    kernels_flat = filt.view(out_channels, -1)
    print("Kernel Flat=", "\n", kernels_flat, "\n", kernels_flat.shape)
    res = kernels_flat @ imageunfold  # I have to replace this operation with mymult() and myadd()
    print(res, "\n", res.shape)
    #print(res.size(2),"\n",res.shape)
    res = res.view(-1, out_channels, size, size)
    #print("Same answer as builtin function",res)
res = kernels_flat @ imageunfold can be replaced with the loops below, although there may be a more efficient implementation, which is what I am looking for help with.
for m_batch in range(len(imageunfold)):
    # iterate through rows of X
    # iterate through columns of Y
    for j in range(imageunfold.size(2)):
        # iterate through rows of Y
        for k in range(imageunfold.size(1)):
            #print(result[m_batch][i][j]," +=", kernels_flat[i][k], "*", imageunfold[m_batch][k][j])
            result[m_batch][i][j] += kernels_flat[i][k] * imageunfold[m_batch][k][j]
Can someone please help me vectorize these three loops for faster execution.
The problem was with the dimensions: with kernels_flat[dim0_1, dim1_1] and imageunfold[batch, dim0_2, dim1_2], the result should have shape [batch, dim0_1, dim1_2].
res = kernels_flat @ imageunfold can be replaced with this, although there may be a more efficient implementation.
for m_batch in range(len(imageunfold)):
    # iterate through rows of X
    # iterate through columns of Y
    for j in range(imageunfold.size(2)):
        # iterate through rows of Y
        for k in range(imageunfold.size(1)):
            #print(result[m_batch][i][j]," +=", kernels_flat[i][k], "*", imageunfold[m_batch][k][j])
            result[m_batch][i][j] += kernels_flat[i][k] * imageunfold[m_batch][k][j]
Your code for the matrix multiplication is missing a loop for iterating over the filters.
In the code below I fixed your implementation.
I am currently also looking for optimizations on the code. In my use case, the individual results of the multiplications (without performing addition) need to be accessible after computation. I will post here in case I find a faster solution than this.
for batch_image in range(imageunfold.shape[0]):
    for i in range(kernels_flat.shape[0]):
        for j in range(imageunfold.shape[2]):
            for k in range(kernels_flat.shape[1]):
                res_c[batch_image][i][j] += kernels_flat[i][k] * imageunfold[batch_image][k][j]
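If the custom multiply and add can be expressed elementwise, the four loops can be replaced by broadcasting, which also keeps every individual product available before the sum. The sketch below uses NumPy with small made-up shapes so that it runs without PyTorch; the None-indexing and broadcasting it relies on behave the same way on torch tensors. It uses ordinary * and sum as stand-ins for mymult() and myadd(), which is an assumption about those functions.

```python
import numpy as np

rng = np.random.default_rng(0)
kernels_flat = rng.standard_normal((3, 2))    # (out_channels, in_channels * kh * kw)
imageunfold = rng.standard_normal((2, 2, 4))  # (batch, in_channels * kh * kw, positions)

# Reference: the four nested loops
res_loop = np.zeros((2, 3, 4))
for b in range(imageunfold.shape[0]):
    for i in range(kernels_flat.shape[0]):
        for j in range(imageunfold.shape[2]):
            for k in range(kernels_flat.shape[1]):
                res_loop[b, i, j] += kernels_flat[i, k] * imageunfold[b, k, j]

# All individual products at once, shape (batch, out_channels, k, positions)
products = kernels_flat[None, :, :, None] * imageunfold[:, None, :, :]

# Summing over the k axis gives the same result as the loops
res_vec = products.sum(axis=2)
print(products.shape, res_vec.shape)  # (2, 3, 2, 4) (2, 3, 4)
```

The products array grows with batch * out_channels * k * positions, so for large layers it may need to be computed in chunks.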

Reverse stacking operation on NumPy array

I have a line of code that efficiently reshapes a NumPy array from a 400x8x8 array to a 160x160 array. I need to reverse the process but can't figure out the reverse of that line.
I can already do this process but it is very inefficient and requires nested loops which I would like to avoid for performance purposes.
Here is the code that I currently have to reverse the process (160x160 > 400x8x8):
previousRow = 0
for rowBlock in range(noBlocksOn1Axis):
    previousRow = rowBlock * blockSize
    previousColumn = 0
    for columnBlock in range(noBlocksOn1Axis):
        previousColumn = columnBlock * blockSize
        block = arrayY[previousRow:previousRow+blockSize,
                       previousColumn:previousColumn + blockSize]
        blocksList.append(block)
And here is the line of code that reshapes 400x8x8 > 160x160:
xy = np.zeros((160,160), dtype = np.uint8)
xy = np.vstack([np.hstack(overDone[20*i:20+20*i]) for i in
                range(overDone.shape[0]//20)])
So any ideas of how I can perform this line of code in reverse?
Thanks :D
Reshape, swap-axes (or transpose axes) and reshape to get overDone back -
xy.reshape(20,8,20,8).swapaxes(1,2).reshape(400,8,8)
More info on intuition behind nd-to-nd array transformation.
Make it generic to handle generic shapes -
m,n = xy.shape
M,N = 20,20 # block size used to get xy
overDone_ = xy.reshape(M,m//M,N,n//N).swapaxes(1,2).reshape(-1,m//M,n//N)
Sample run -
# Original input
In [21]: overDone = np.random.rand(400,8,8)
# Perform forward step to get xy
In [22]: xy = np.vstack([np.hstack(overDone[20*i:20+20*i]) for i in range(overDone.shape[0]//20)])
# Use proposed approach to get back overDone
In [23]: out = xy.reshape(20,8,20,8).swapaxes(1,2).reshape(400,8,8)
# Verify output to be same as overDone
In [42]: np.array_equal(out,overDone)
Out[42]: True
Bonus :
We could use those same vectorized reshape+permute-axes steps to create xy for the forward process -
xy = overDone.reshape(20,20,8,8).swapaxes(1,2).reshape(160,160)
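The forward and reverse transformations can be checked against each other in a few lines. This is a small round-trip sketch; it uses np.arange data instead of random values so that every element is distinct, and it passes a list (not a generator) to np.vstack.

```python
import numpy as np

overDone = np.arange(400 * 8 * 8).reshape(400, 8, 8)

# Forward step from the question: 20 rows of 20 horizontally stacked 8x8 blocks
xy_stacked = np.vstack([np.hstack(overDone[20 * i:20 * i + 20]) for i in range(20)])

# Same forward step with reshape + swapaxes
xy = overDone.reshape(20, 20, 8, 8).swapaxes(1, 2).reshape(160, 160)

# Reverse step: back to 400 blocks of 8x8
back = xy.reshape(20, 8, 20, 8).swapaxes(1, 2).reshape(400, 8, 8)

print(np.array_equal(xy, xy_stacked), np.array_equal(back, overDone))  # True True
```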
What's wrong with numpy.reshape?
my_array_3d = my_array.reshape((400, 8, 8))
my_array_2d = my_array.reshape((160, 160))
