Conv1d custom dilation value based on list (non-constant dilation) - python

Let's say I have a tensor [1,2,3,4,5,6] and I want to apply a kernel of [0,1] to it using conv1d. The slight twist: I want the dilation value to not be constant. In other words, the dilation might be [0,1,2], where each entry is the dilation used at the corresponding output step.
This would yield the following steps (with a stride of 1 and ignoring padding):
Step 1: [*0,2*,3,4,5,6]
Step 2: [0,*0*,3,*4*,5,6]
Step 3: [0,0,*0*,4,5,*6*]
Would I have to modify nn.Conv1d to make this possible? If so, how do I add this functionality without a for loop?
Would I have to write custom CUDA code to make it run at a speed comparable to what dilated convolutions normally achieve?
Edit:
Looks like writing a CUDA function might be necessary, but I'm still unsure.
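For reference, here is a minimal sketch (my own, not from the thread) of how the per-step dilation in the example above could be emulated without a Python loop: build an index matrix whose rows hold the positions each kernel tap reads at each output step, gather those values with advanced indexing, and take a dot product with the kernel. All names below are invented for illustration.

import torch

x = torch.tensor([1., 2., 3., 4., 5., 6.])   # input signal
kernel = torch.tensor([0., 1.])              # 2-tap kernel
dilations = torch.tensor([0, 1, 2])          # per-step gap between the two taps

steps = torch.arange(len(dilations))         # output positions (stride 1)
first_tap = steps                            # index read by the first tap
second_tap = steps + dilations + 1           # index read by the second tap
idx = torch.stack([first_tap, second_tap], dim=1)   # shape (n_steps, n_taps)

windows = x[idx]                             # gather all taps at once, shape (n_steps, 2)
out = windows @ kernel                       # one dot product per output step
print(out)                                   # tensor([2., 4., 6.]), matching the steps above

This keeps everything as tensor operations on the GPU, although for large inputs a hand-written CUDA kernel may still be faster.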

Related

Counting the number of components/objects in an array

I am wondering what is a good way to count the number of components in a matrix. Let's say that ones constitute a component and zeros constitute the background. So in the array below there are 4 components:
a = np.array([
[1,1,0,0,0,0,0,0],
[1,1,1,0,0,0,1,0],
[0,1,0,0,0,1,1,1],
[0,0,0,0,0,1,1,1],
[0,1,0,0,0,0,1,0],
[0,0,0,1,1,0,0,0],
[0,0,1,1,0,0,0,0]])
I know that I can do this with Scipy like this:
from scipy.ndimage import label
labeled_array, num_features = label(a)
where num_features will give me the correct answer of 4.
How would I implement this myself? I am asking what the correct technique would be. Preferably I want to implement this with matrix operations (e.g. a Numpy solution). So no for-loops where I am checking every value individually.
I am asking this because in the end I want to implement the solution in Tensorflow such that the whole thing is differentiable. This way I can add the number of components as a term in my loss function for an image segmentation problem, which is my end goal.
I thought of using morphological erosion to shrink the components in the matrix until only single 1s remain. Then I could just take the sum of the array to count the number of components. Unfortunately, if I erode an isolated one [[0,0,0],[0,1,0],[0,0,0]] it is removed entirely ([[0,0,0],[0,0,0],[0,0,0]]), and if I repeat the erosion it continues until there are no components left. I also thought I could use skeletonization, which guarantees that a centre point always remains, but I am not sure whether that is the right technique or how I would implement it.
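As a quick illustration of the failure mode described above (a sketch added here, not part of the original question), eroding an isolated pixel with scipy removes it completely:

import numpy as np
from scipy.ndimage import binary_erosion

isolated = np.array([[0, 0, 0],
                     [0, 1, 0],
                     [0, 0, 0]])
# The lone pixel has no neighbours, so a single erosion step wipes it out.
print(binary_erosion(isolated).astype(int))   # all zeros, the component is gone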
I was wondering if anyone has any ideas on this problem or knows how to solve it. Any input would be greatly appreciated!
This can be done in linear time using the flood-fill algorithm. Here is an example:
count = 0
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        if a[i, j] == 1:
            floodFill(a, i, j)   # zero out the whole component containing (i, j)
            count += 1
While this solution uses loops, you can use Numba or Cython to mitigate the cost of the function calls and of the loops themselves; the resulting code should be very fast.
There is no flood-fill function in Numpy, but several image packages provide one, for example OpenCV or scikit-image. You can also write your own implementation, as long as you use Cython or Numba to speed it up.
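Here is a rough sketch (mine, not the answerer's code) of what such a Numba-compiled flood fill could look like, using an explicit stack instead of recursion:

import numpy as np
from numba import njit

@njit
def flood_fill(a, i, j):
    # Iterative 4-connected flood fill that zeroes out one component.
    stack = [(i, j)]
    while len(stack) > 0:
        y, x = stack.pop()
        if 0 <= y < a.shape[0] and 0 <= x < a.shape[1] and a[y, x] == 1:
            a[y, x] = 0
            stack.append((y + 1, x))
            stack.append((y - 1, x))
            stack.append((y, x + 1))
            stack.append((y, x - 1))

@njit
def count_components(a):
    a = a.copy()                 # keep the caller's array intact
    count = 0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            if a[i, j] == 1:
                flood_fill(a, i, j)
                count += 1
    return count

Applied to the example array above, count_components(a) returns 4, matching scipy.ndimage.label.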
Actually, this operation is called segmentation (connected-component labelling), and AFAIK it can be done directly in OpenCV in one pass. Once applied, you can count the labelled regions with a Numpy operation such as count = len(np.unique(segmentedResult)) - 1 (minus one so the background label is not counted).
I highly doubt one can write efficient code here using only matrix-based Numpy functions. The morphological-erosion approach would take quadratic time, assuming it could be made to work correctly. A good scalar algorithm can sometimes outperform an inefficient vectorized one.

"Array" detection in Tensorflow

Can Tensorflow handle inputs of varying shapes (sizes)?
The project
I'm developing an image/shape recognizer which captures an array of {x:#, y:#} positions.
For example, a circle might look like this
[{"x":0.38,"y":0.32},{"x":0.33,"y":0.35},{"x":0.31,"y":0.4},{"x":0.31,"y":0.46},{"x":0.34,"y":0.51},{"x":0.39,"y":0.52},{"x":0.44,"y":0.51},{"x":0.47,"y":0.47},{"x":0.49,"y":0.42},{"x":0.47,"y":0.37},{"x":0.42,"y":0.34},{"x":0.37,"y":0.33}]
and a square like this
[{"x":0.15,"y":0.19},{"x":0.15,"y":0.25},{"x":0.15,"y":0.31},{"x":0.15,"y":0.37},{"x":0.14,"y":0.42},{"x":0.14,"y":0.48},{"x":0.14,"y":0.53},{"x":0.14,"y":0.59},{"x":0.14,"y":0.64},{"x":0.2,"y":0.64},{"x":0.26,"y":0.64},{"x":0.31,"y":0.65},{"x":0.37,"y":0.65},{"x":0.43,"y":0.65},{"x":0.49,"y":0.65},{"x":0.54,"y":0.65},{"x":0.6,"y":0.65},{"x":0.65,"y":0.65},{"x":0.67,"y":0.6},{"x":0.68,"y":0.55},{"x":0.68,"y":0.5},{"x":0.68,"y":0.44},{"x":0.68,"y":0.38},{"x":0.68,"y":0.32},{"x":0.67,"y":0.27},{"x":0.67,"y":0.22},{"x":0.66,"y":0.17},{"x":0.61,"y":0.15},{"x":0.56,"y":0.13},{"x":0.51,"y":0.13},{"x":0.45,"y":0.13},{"x":0.39,"y":0.13},{"x":0.33,"y":0.13},{"x":0.27,"y":0.13},{"x":0.22,"y":0.14},{"x":0.17,"y":0.15}]
Because the length of these shapes can vary, I was wondering how Tensorflow would handle it... as I understand it, the input shape always needs to be the same length, right?
Yes, the shape should be the same. But in your case you can make sure that, within a batch, all the arrays have the same number of elements by padding the shorter ones with dummy elements.
Just make sure that the shapes within a batch match.
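A small sketch of that padding idea (the names and the fixed length are invented for illustration): pad every point list in the batch to the length of the longest one with dummy (0, 0) points.

import numpy as np
import tensorflow as tf

circle = np.array([[0.38, 0.32], [0.33, 0.35], [0.31, 0.40]])            # 3 points
square = np.array([[0.15, 0.19], [0.15, 0.25], [0.15, 0.31],
                   [0.15, 0.37], [0.14, 0.42]])                          # 5 points

def pad(points, max_len):
    out = np.zeros((max_len, 2), dtype=np.float32)   # dummy (0, 0) points
    out[:len(points)] = points
    return out

max_len = max(len(circle), len(square))
batch = tf.constant(np.stack([pad(circle, max_len), pad(square, max_len)]))
print(batch.shape)                                   # (2, 5, 2): one fixed shape per batch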

How does the groups parameter in torch.nn.conv* influence the convolution process?

I want to convolve a multichannel tensor with the same single-channel weight.
I could repeat the weight along the channel dimension, but I thought there might be another way.
I thought the groups parameter might do the job. However, I don't understand the documentation.
That's why I want to ask: how does the groups parameter influence the convolution process?
Just some minor tips, since I have never used it myself.
The groups parameter splits the input and output channels into independent groups, each convolved with its own set of kernels.
So if you set groups=2, the channels are processed as two separate halves that do not mix.
The definition of conv2d in PyTorch states that groups is 1 by default.
If you increase groups up to the number of input channels, you get a depth-wise convolution, where each input channel gets its own kernels.
The constraint is that both the in and out channel counts must be divisible by the group number.
In Tensorflow you can read the documentation of SeparableConv2D, whose depth-wise stage corresponds to the groups > 1 case.
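For the original question (convolving every channel with the same single-channel weight), one possible sketch is to set groups equal to the number of channels and repeat the single-channel weight along the output-channel dimension; the sizes below are invented for illustration:

import torch
import torch.nn as nn

C = 3
x = torch.randn(1, C, 8, 8)                          # (batch, channels, H, W)

# Depth-wise conv: groups=C means each channel is convolved independently.
conv = nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C, bias=False)

single_kernel = torch.randn(1, 1, 3, 3)              # one single-channel weight
with torch.no_grad():
    conv.weight.copy_(single_kernel.repeat(C, 1, 1, 1))  # weight shape is (C, 1, 3, 3)

y = conv(x)                                          # every channel sees the same kernel
print(y.shape)                                       # torch.Size([1, 3, 8, 8])

The weight still has to be repeated, but because of groups=C each copy only ever touches its own channel.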

Tensorflow Multiple Graph & Patch Concern

I have a situation where I am using two different networks, one network to tell me if there is important information in a given patch, and a second network to tell me where the important information in the patch is using segmentation.
If I operate them in the same TF Graph / Session, I end up having to use tf.where or tf.cond to tell me which patches I actually want to use, but my optimizer is creating gradients for each condition for the whole net, or at least that is my working theory.
This is using segmentation_logit = tf.where(is_useful_patch, coarse_log, negative_log)
negative_log is a tensor of 0s with the same shape as the coarse logit.
If I am using 192 (128x128) patches, the optimizer attempts to create a gradient tensor with over 100 million entries (e.g. of shape [192,222,129,128]), which nukes my GPU RAM and causes a crash.
So, short of actually defining two different sessions, graphs, savers, restorers and tensorboard writers, is there a better way to go about this, a better way to calculate gradients, or a way to combine multiple graphs in the same session?
Thanks in advance!
I'm supposing that, as the result of the first network, you get a 192-element is_useful_patch vector with values between 0 and 1 (probabilities).
First of all, forget tf.cond or tf.where. I suggest taking a smaller number, like 16 or so (or whatever works best in your experience for how many useful patches there normally are), and taking the indices of the best 16 patches with tf.nn.top_k, like this:
values, idx_best_patches = tf.nn.top_k(is_useful_patch, k=16,
                                       sorted=False, name='idx_best_patches')
Then collect the best patches with tf.gather_nd, like this:
best_patches = tf.gather_nd(patches, idx_best_patches, name='best_patches')  # the indices may need an extra trailing dimension, see the caveat below
This will collect your best 16 patches, and you then continue into the segmenter with only those 16 instead of 192, cutting the segmenter's memory requirement to roughly 1/12. This is the core of the idea.
If there are fewer than 16 patches with useful information, you can mask some of the output. Also, I have no idea how your patches are structured, so make sure you review the tf.gather_nd parameters for correctness, it can be tricky.
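Put together, a self-contained sketch of the idea could look like the following (the shapes are invented, and tf.gather is used here because the indices returned by top_k are a plain 1-D tensor indexing along axis 0):

import tensorflow as tf

patches = tf.random.normal([192, 128, 128, 3])        # 192 candidate patches
is_useful_patch = tf.random.uniform([192])             # usefulness scores from network 1

values, idx_best_patches = tf.nn.top_k(is_useful_patch, k=16, sorted=False)
best_patches = tf.gather(patches, idx_best_patches)    # keep only the 16 best patches

print(best_patches.shape)                               # (16, 128, 128, 3)
# Only these 16 patches are fed to the segmentation network, so the optimizer
# never builds gradients for the other 176.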

How do you do ROI-Pooling on Areas smaller than the target size?

I am currently trying to get the Faster R-CNN network from here to work on Windows with Tensorflow. For that, I wanted to re-implement the ROI-pooling layer, since it is not working on Windows (at least not for me; if you have any tips on porting to Windows with Tensorflow, I would highly appreciate your comments!). According to this website, you take your proposed ROI from your feature map and max-pool its content to a fixed output size. This fixed output is needed for the following fully connected layers, since they only accept a fixed-size input.
The problem now is the following:
After conv5_3, the last convolutional layer before ROI pooling, the box that results from the region proposal network is mostly 5x5 pixels in size. This is totally fine, since the objects I want to detect usually have dimensions of 80x80 pixels in the original image (the downsampling factor due to pooling is 16). However, I now have to max-pool an area of 5x5 pixels and ENLARGE it to 7x7, the target size for the ROI pooling. My first try, simply doing interpolation, did not work. Padding with zeros did not work either. I always seem to get the same scores for my classes.
Is there anything I am doing wrong? I do not want to change the dimensions of any layer and I know that my trained network in general works because I have the reference implementation running in Linux on my dataset.
Thank you very much for your time and effort :)
There is now an official TF implementation of Faster R-CNN, and other object detection algorithms, in their Object Detection API; you should probably check it out.
If you still want to code it yourself, I wondered exactly the same thing as you and could not find an answer about how you're supposed to do it. My three guesses would be:
interpolation, but it changes the feature values, so it destroys some information...
Resizing to 35x35 just by copying each cell 7 times (per axis) and then max-pooling back to 7x7 (you don't have to actually do the resizing and then the pooling; for instance in 1D it basically reduces to output[i] = max(input[floor(i*5/7)], input[ceil(i*5/7)]), with a similar max over 4 elements in 2D; be careful, I might have forgotten some +1/-1 or something). I see at least two problems: some values are over-represented, being copied more often than others; but worse, some (small) values will not be copied into the output at all! (Which you should avoid, given that you can store more information in the output than in the input.)
Making sure every input feature value is copied at least once into the output, at the best possible place (basically copy input[i] to output[j] with j = floor((i+1)*7/5) - 1). For the remaining spots, either leave a 0 or do interpolation. I would think this solution is the best, maybe with interpolation, but I'm really not sure at all.
It looks like smallcorgi's implementation uses my 2nd solution (without actually resizing, just using max pooling), since it's the same implementation as for the case where the input is bigger than the output.
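For what it's worth, here is a tiny numpy sketch (added here, not part of the answer) of the 1D formula from the 2nd guess, with the upper index clamped to stay in bounds, which is the kind of +1/-1 detail mentioned above:

import numpy as np

def roi_pool_1d_upsample(inp, out_size):
    # output[i] = max(input[floor(i*n/out)], input[ceil(i*n/out)]), with hi clamped.
    n = len(inp)
    out = np.empty(out_size, dtype=inp.dtype)
    for i in range(out_size):
        lo = int(np.floor(i * n / out_size))
        hi = min(int(np.ceil(i * n / out_size)), n - 1)
        out[i] = max(inp[lo], inp[hi])
    return out

print(roi_pool_1d_upsample(np.array([3., 1., 4., 1., 5.]), 7))
# [3. 3. 4. 4. 4. 5. 5.] -- note that the small value 1 never appears in the output on its own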
I know it's late, but I'm posting this answer because it might help others. I have written code that explains how ROI pooling works for different height and width conditions, for both the pool and the region.
You can find the code on GitHub:
https://github.com/Parsa33033/RoiPooling
