Random crop and bounding boxes in tensorflow - python

I want to add data augmentation for the WiderFace dataset, and I would like to know: how can I randomly crop an image and keep only the bounding boxes of faces whose center lies inside the crop, using tensorflow?
I have already tried to implement a solution, but I use TFRecords and the TfExampleDecoder, and the shape of the input image is set to [None, None, 3] during decoding, so there is no way to get the shape of the image and do it myself.

You can get the shape, but only at runtime: the shape is only defined when you call sess.run and actually feed in the data.
So do the random crop manually in tensorflow. Essentially, you want to reimplement tf.random_crop yourself so you can apply the same manipulations to the bounding boxes.
First, get the shape: tf.shape(your_tensor)[0] gives the first dimension. (The static your_tensor.shape stays None; tf.shape is evaluated at runtime and resolves to the actual value once data is fed in.) Then compute random crop offsets using tf.random_uniform or whatever method you like. Finally, perform the crop with tf.slice.
If you want to choose whether to perform the crop at all, you can use tf.cond.
With those components, you should be able to implement what you want using only tensorflow constructs. Try it out, and if you get stuck along the way, post your code and the error you run into.
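Putting those pieces together, here is a minimal sketch in TF1 graph mode. The function and variable names are illustrative, and it assumes float boxes in absolute [y1, x1, y2, x2] coordinates and an image at least as large as the crop:
import tensorflow as tf

def random_crop_keep_centers(image, boxes, crop_h, crop_w):
    # Dynamic shape: resolved at runtime, unlike image.shape
    shape = tf.shape(image)
    off_y = tf.random_uniform([], 0, shape[0] - crop_h + 1, dtype=tf.int32)
    off_x = tf.random_uniform([], 0, shape[1] - crop_w + 1, dtype=tf.int32)
    crop = tf.slice(image, [off_y, off_x, 0], [crop_h, crop_w, 3])

    # Keep only boxes whose center falls inside the crop window
    cy = (boxes[:, 0] + boxes[:, 2]) / 2.0
    cx = (boxes[:, 1] + boxes[:, 3]) / 2.0
    y0 = tf.cast(off_y, tf.float32)
    x0 = tf.cast(off_x, tf.float32)
    inside = (cy >= y0) & (cy < y0 + crop_h) & (cx >= x0) & (cx < x0 + crop_w)
    kept = tf.boolean_mask(boxes, inside)

    # Shift surviving boxes into crop coordinates and clip to its edges
    kept = kept - tf.stack([y0, x0, y0, x0])
    kept = tf.maximum(kept, 0.0)
    kept = tf.minimum(kept, tf.stack(
        [float(crop_h), float(crop_w), float(crop_h), float(crop_w)]))
    return crop, kept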

Related

How can I make the annotation follow the rotation or random crop of the photo in Python?

I want to build datasets from my images for an object detection model.
However, I want to use random crop and rotation on the images to create more data, and I am worried about whether the annotations can be updated to match the original images' annotations.
What I mean is that I want the annotations to change automatically when I apply random crop and rotation.
If you have any idea, please tell me.
Thank you
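One common approach, sketched below under assumptions not stated in the question (OpenCV, point annotations given as (x, y) pairs, names illustrative): build the rotation matrix once and apply the same affine transform to both the image and its annotation points.
import cv2
import numpy as np

def rotate_image_and_points(image, points, angle):
    # Rotate the image about its center, then push the annotation
    # points through the identical affine transform.
    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h))
    pts = np.hstack([np.asarray(points, dtype=np.float64),
                     np.ones((len(points), 1))])
    new_pts = pts @ M.T  # shape (N, 2): transformed (x, y) pairs
    return rotated, new_pts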

Data augmentation after image segmentation

Let's assume I have a small dataset. I want to implement data augmentation. First I apply image segmentation (after which the image is binary), and then I apply data augmentation. Is this a good approach?
For image augmentation in segmentation and instance segmentation, you either leave the positions of the objects in the image unchanged (by manipulating colors, for example) or you modify those positions by applying translations and rotations.
So yes, this approach works, but you have to take into consideration the type of data you have and what you are trying to achieve. Data augmentation isn't a ready-to-go process with good results everywhere.
In case you have:
Semantic segmentation: each pixel at row i and column j is labeled with its enclosing object. This means you have your main image I and a label image L of the same size, linking every pixel to its object label. In this case, your data augmentation is applied to both I and L, producing a pair of identically transformed images (see the sketch after this list).
Instance segmentation: here a mask is generated for every instance in the original image, and the augmentation is applied to all of them, including the original; the transformed masks then become the new instances.
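For the semantic case, the key point is simply that I and L go through the exact same transform; a minimal sketch with numpy (names illustrative):
import numpy as np

def augment_pair(image, label, k=1):
    # Apply the identical transform to image I and label map L
    # so each pixel keeps its object label (here: rotate by k*90 degrees).
    return np.rot90(image, k), np.rot90(label, k)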
EDIT:
Take a look at CLoDSA (Classification, Localization, Detection and Segmentation Augmentor); it may help you implement your idea.
In case your dataset is small, you should add data augmentation during training. It is important to transform the original images and the targets (masks) in the same way!
For example, if an image is rotated 90 degrees, then its mask should also be rotated 90 degrees. Since you are using the Keras library, you should check whether ImageDataGenerator also transforms the target images (masks) along with the inputs. If it doesn't, you can implement the augmentations yourself. This repository shows how to do it with OpenCV:
https://github.com/kochlisGit/random-data-augmentations
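For reference, a common Keras pattern (a sketch, assuming images and masks are numpy arrays of the same length) is to drive two ImageDataGenerator instances with the same seed, so images and masks receive identical transforms:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

data_gen_args = dict(rotation_range=90.0, horizontal_flip=True)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)

# The same seed makes both generators draw identical random transforms
seed = 1
image_generator = image_datagen.flow(images, batch_size=32, seed=seed)
mask_generator = mask_datagen.flow(masks, batch_size=32, seed=seed)
train_generator = zip(image_generator, mask_generator)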

clever image augmentation - random zoom out

I'm building a CNN to identify facial keypoints. I want to make the net more robust, so I thought about applying some zoom-out transforms, because most pictures have the keypoints in roughly the same locations, so the net doesn't learn much.
My approach:
I want augmented images to keep the original image size, so I apply MaxPool2d and then random (not equal) padding until the original size is reached.
First question:
Will it work with simple average padding or zero padding? I'm sure it would be even better if I made the padding look more like a background, but is there a simple way to do that?
Second question:
The keypoints are the target vector; they come as a row vector of 30. I'm getting confused by the logic needed to transform them into the smaller space.
Generally, if an original point was at (x=5, y=7), it transforms to (x=2, y=3). I'm not sure about this, but so far my manual checks say it's correct. But what do I do if two keypoints land in the same new pixel? I can't feed the network fewer target values.
That's it. I'd be happy to hear your thoughts.
I suggest using torchvision.transforms.RandomResizedCrop as part of your Compose statement. It will give you random zooms AND resize the resulting images to some standard size. This avoids the issues in both your questions.
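A minimal sketch of that suggestion (the output size and scale range are illustrative, pil_image stands in for your input, and the keypoint targets would still need to be remapped separately):
import torchvision.transforms as T

output_size = 96  # illustrative; use your network's input size
transform = T.Compose([
    # Crop a random region covering 50-100% of the image, then
    # resize it back to output_size (a random zoom, in effect)
    T.RandomResizedCrop(output_size, scale=(0.5, 1.0)),
    T.ToTensor(),
])
augmented = transform(pil_image)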

Getting started with denoising elements of a 200x200 numpy array

I have a 200x200 numpy array containing a shape, which I can see when I graph it using matplotlib's imshow() function. However, there is also a lot of noise in that picture. I am trying to use OpenCV to emphasize the shape and denoise the image, but it keeps throwing error messages that I don't understand. What should I do to get started on the denoising problem? The shape is visible to me, but extra noise was added on top of the image using np.random.randint(), and I want to reduce that noise.
Here are some of the image denoising techniques available in OpenCV, with tutorials for each.
Blurring out the noise
The most basic approach is applying a blur to average out the random noise. This has the negative effect that edges in the image will not be as sharp as before; depending on your application, this might be fine. Depending on the amount of noise, you can change the size of the filter k: a larger value produces a blurrier image with less noise.
import cv2 as cv

# Larger k averages over a bigger neighbourhood: less noise, blurrier edges
k = 5
filtered_image = cv.blur(img, (k, k))
Advanced denoising
Alternatively, you can use more advanced techniques such as Non-local Means Denoising. This applies averaging across similar patches in the image. This technique has a few more parameters to tune to your specific application which you can read about here. (There are different versions of this function for greyscale and colour images, as well as for image sequences).
luminosity_filter_strength = 10
colour_filter_strength = 10
template_window_size = 7
search_window_size = 21
# The second argument is the optional output array (dst), hence None
filtered_image = cv.fastNlMeansDenoisingColored(img, None,
                                                luminosity_filter_strength,
                                                colour_filter_strength,
                                                template_window_size,
                                                search_window_size)
I solved the problem using scikit-image. It has a very accessible documentation page for newcomers, and the error messages are a lot easier to understand. For my problem I used scikit-image's restoration module, which has a lot of denoising functions, much like OpenCV, but the examples and the easy-to-understand error messages really helped. Playing around with bilateral filters and Non-local Means Denoising solved the problem for me.
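For reference, a hedged sketch of those two scikit-image filters (the parameter values are illustrative and assume the 200x200 array has been scaled to floats in [0, 1]):
import numpy as np
from skimage.restoration import denoise_bilateral, denoise_nl_means

noisy = img.astype(np.float64) / img.max()  # scale to [0, 1]
# Edge-preserving bilateral filter
smoothed = denoise_bilateral(noisy, sigma_color=0.1, sigma_spatial=2)
# Non-local means: averages similar patches across the image
smoothed = denoise_nl_means(noisy, h=0.1, patch_size=7, patch_distance=11)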

cv2 CascadeClassifier parameters

Can someone give me an example of a fully configured classifier? I'm talking about the parameters; I just don't understand this example:
cv2.CascadeClassifier.detectMultiScale(image, rejectLevels, levelWeights[, scaleFactor[, minNeighbors[, flags[, minSize[, maxSize[, outputRejectLevels]]]]]]) → objects
I am detecting my face, but I need to set its min and max size. To do that, it looks like you have to set rejectLevels, levelWeights, etc.
I'm using the cv2 module.
For this problem, you first have to create a collection file with bounding boxes on positive images, then create a list of negative images. Next, you create OpenCV samples in order to train your cascade. Once you have finished that, you can simply use the following code to detect faces with your trained cascade:
import cv2

# load the trained cascade from the XML file
cascade = cv2.CascadeClassifier("cascade.xml")
# detect objects; returns a list of (x, y, w, h) rectangles
rects = cascade.detectMultiScale(img)
Then you can iterate over your rect list.
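On the min/max size part of the question: detectMultiScale takes minSize and maxSize keyword arguments directly, so rejectLevels and levelWeights are not needed for that. A sketch with illustrative values:
rects = cascade.detectMultiScale(
    img,
    scaleFactor=1.1,     # image pyramid step between scales
    minNeighbors=5,      # overlapping detections required to accept
    minSize=(30, 30),    # smallest face to accept, in pixels
    maxSize=(300, 300))  # largest face to accept, in pixels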
Please have a look at this reference:
