I'm doing object detection for text in images and want to use YOLO to draw a bounding box around the text.
How should I do data augmentation for this? Also, how does it differ from augmentation in ordinary image recognition (contrast adjustment, gamma conversion, smoothing, noise, inversion, scaling, etc.)?
If you have any useful website links, please share them :)
If you mean what you should use: this is just a regular object detection task, so the common augmentations, like flips or crops, work fine.
As for the difference, if you mean what the output images will look like, have a look at this repo: https://github.com/albumentations-team/albumentations
But if you mean the difference in model performance, there is probably no general answer; you can only try several options and see which works best.
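As a rough sketch of what such a pipeline could look like with albumentations (the transform choices, file names, and the pascal_voc box format are assumptions you would adapt to your own labels):

import albumentations as A
import cv2

# Assumed setup: boxes in pascal_voc format (x_min, y_min, x_max, y_max)
transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),
        A.GaussNoise(p=0.3),
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)

image = cv2.imread("page.jpg")         # hypothetical input image
bboxes = [[50, 60, 220, 120]]          # one made-up text region
labels = ["text"]

augmented = transform(image=image, bboxes=bboxes, labels=labels)
aug_image, aug_bboxes = augmented["image"], augmented["bboxes"]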
I made a tif image based on a 3D model of a wood sheet. (x, y, z) represents a point in 3D space; I simply map (x, y) to a pixel position in the image and z to the greyscale value of that pixel. It worked as I had imagined. Then I ran into a low-resolution problem when I tried to print it: the tif image gets badly pixelated as soon as it is zoomed out. My research suggested that I need to increase the resolution of the image, so I tried a few super-resolution algorithms found online, including this one: https://learnopencv.com/super-resolution-in-opencv/
The final image did get a lot bigger (10+ times larger in either dimension), but the same problem persists: it gets pixelated as soon as it is zoomed out, just about the same as the original image.
It looks like the quality of an image depends not only on its resolution but also on something else. By quality I mean how clear the wood texture is in the image, and how sharp/clear the texture remains when I enlarge it. Can anyone shed some light on this? Thank you.
original tif
The algorithm-generated tif is too large to include here (32 MB).
Gigapixel enhanced tif
Update - here is a recently achieved result with a GAN-based solution:
It has restored/invented some of the wood grain detail, but the models need to be retrained.
In short, it is possible to do this via deep learning reconstruction like the Super Resolution package you referred to, but you should understand what something like this is trying to do and whether it is fit for purpose.
Generic algorithms like the Super Resolution models are trained on a variety of images to "guess" at details that are not present in the original image, typically using generative training methods, e.g. using low- and high-resolution versions of the same image as training data.
Using a contrived example, let's say you are trying to up-res a picture of someone's face (CSI zoom-and-enhance style!). From the algorithm's perspective, if a black circle is always present inside a white blob of a certain shape (i.e. a pupil in an eye), then the next time the algorithm sees the same shape it will guess that there should be a black circle and fill in a black pupil. However, this does not mean that there are details in the original photo that suggest a black pupil.
In your case, you are trying to do a very specific type of up-resing, and algorithms trained on generic data will probably not be good for this type of work. They will be trying to "guess" what detail should be filled in, but based on a very generic and diverse set of source data.
If this is a long-term project, you should look to train your algorithm on your specific use case, which will definitely yield much better results. Otherwise, simple algorithms like smoothing will help make your image less "blocky", but they will not be able to "guess" details that aren't present.
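For reference, a minimal sketch of how the OpenCV dnn_superres module from the linked tutorial is typically used (it needs opencv-contrib-python; the model file, scale, and image paths are placeholders, and for the wood-grain case you would ideally retrain on your own data):

import cv2
from cv2 import dnn_superres

# Create the super-resolution object and load a pretrained model (path is a placeholder)
sr = dnn_superres.DnnSuperResImpl_create()
sr.readModel("EDSR_x4.pb")
sr.setModel("edsr", 4)          # model name and upscale factor must match the file

image = cv2.imread("woodsheet.tif")      # hypothetical input
upscaled = sr.upsample(image)            # 4x larger, but detail is "guessed", not recovered
cv2.imwrite("woodsheet_x4.tif", upscaled)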
Let's assume I have a small dataset. I want to implement data augmentation. First I apply image segmentation (after this, the image will be a binary image) and then apply data augmentation. Is this a good approach?
For image augmentation in semantic and instance segmentation, you have to either leave the positions of the objects in the image unchanged (by manipulating colors, for example) or modify those positions by applying translations and rotations to the image and its labels together.
So yes, this approach works, but you have to take into account the type of data you have and what you are trying to achieve. Data augmentation isn't a ready-to-go process that gives good results everywhere.
In case you have:
Semantic segmentation: each pixel of your image, at row i and column j, is labeled with its enclosing object. This means having your main image I and a label image L of the same size, linking every pixel to its object label. In this case, your data augmentation is applied to both I and L, giving a pair of identically transformed images (a small sketch follows this list).
Instance segmentation: here we generate a mask for every instance of the original image, and the augmentation is applied to all of them, including the original; from these transformed masks we get our new instances.
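A minimal sketch of the semantic-segmentation case, using albumentations to apply the same random transform to the image and its label mask (the library choice and file names are assumptions):

import albumentations as A
import cv2

# The same random flip/rotation is applied to both the image I and the label mask L
transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=30, p=0.5),
])

image = cv2.imread("image.png")                       # I
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)   # L, same size as I

augmented = transform(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]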
EDIT:
Take a look at CLoDSA (Classification, Localization, Detection and Segmentation Augmentor) it may help you implement your idea.
In case your dataset is small, you should add data augmentation during training. It is important to change the original images and the targets (masks) in the same way!
For example, if an image is rotated 90 degrees, then its mask should also be rotated 90 degrees. Since you are using the Keras library, you should check whether the ImageDataGenerator also changes the target images (masks) along with the inputs. If it doesn't, you can implement the augmentations yourself. This repository shows how it is done with OpenCV:
https://github.com/kochlisGit/random-data-augmentations
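One common workaround, if you stay with Keras' ImageDataGenerator, is to build two generators with identical settings and the same random seed, one for the images and one for the masks, and zip them together. A rough sketch (the arrays and parameters below are placeholders):

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

x_train = np.random.rand(16, 128, 128, 3)                                # placeholder images
y_train = np.random.randint(0, 2, (16, 128, 128, 1)).astype("float32")   # placeholder masks

# Identical augmentation settings for images and masks
data_gen_args = dict(rotation_range=90, horizontal_flip=True, vertical_flip=True)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)

seed = 1  # the same seed keeps both generators applying identical transforms
image_generator = image_datagen.flow(x_train, batch_size=8, seed=seed)
mask_generator = mask_datagen.flow(y_train, batch_size=8, seed=seed)

train_generator = zip(image_generator, mask_generator)
# model.fit(train_generator, steps_per_epoch=len(x_train) // 8, epochs=...)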
I am trying to detect plants in photos. I've already labeled the photos containing plants (with labelImg), but I don't understand how to train the model with background-only photos, so that when there is no plant the model can tell me so.
Do I need to set the labeled box to the size of the whole image?
P.S. I'm new to ML, so don't be rude, please :)
I recently had a problem where all my training images were zoomed in on the object. This meant that the training images all had very little background information. Since object detection models use space outside bounding boxes as negative examples of these objects, this meant that the model had no background knowledge. So the model knew what objects were, but didn't know what they were not.
So I disagree with @Rika, since background images are sometimes useful. In my case, introducing background images worked.
As I already said, object detection models use non-labeled space in an image as negative examples of a certain object. So you have to save annotation files without bounding boxes for the background images. In the software you use here (labelImg), you can use "Verify Image" to make it save the annotation file of the image without boxes. That way it saves a file saying the image should be included in training but contains no bounding box information, and the model uses it as negative examples.
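If you end up training a YOLO-style model where each image has a .txt label file, a background image simply gets an empty label file. A tiny sketch (the folder layout is hypothetical):

from pathlib import Path

backgrounds = Path("dataset/images/backgrounds")   # hypothetical folder of background photos
labels_dir = Path("dataset/labels")
labels_dir.mkdir(parents=True, exist_ok=True)

for img in backgrounds.glob("*.jpg"):
    # An empty .txt file means: this image is a pure negative, it contains no objects
    (labels_dir / f"{img.stem}.txt").touch()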
In your case, you don't need to do anything special in that regard. Just take the detection data you created and train your network with it. When it comes to testing, you usually set a confidence threshold for the bounding boxes, because you may get lots of them and only want the ones with the highest confidence.
Then you keep/show the ones with the highest confidence, and there you go: you have your detection result and can do whatever you want with it, like cropping the detections using the bounding box coordinates you get.
If there are no plants, your network will likely produce bboxes with confidence below your threshold, and you simply ignore them.
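The thresholding itself is just a filter; assuming your detector returns (x, y, w, h, confidence) tuples (this format is hypothetical, adapt it to whatever your framework outputs):

CONF_THRESHOLD = 0.5  # tune this on a validation set

def keep_confident(detections, threshold=CONF_THRESHOLD):
    """Keep detections whose confidence (last element) is above the threshold."""
    return [d for d in detections if d[4] >= threshold]

# For a background-only photo the filtered list will usually come back empty
plants = keep_confident([(12, 40, 80, 80, 0.91), (200, 10, 30, 30, 0.07)])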
Is there any simple, already implemented method for human silhouette extraction in OpenCV? It is fine if the method only works on video.
Here is a sample frame:
For images like these, OpenCV's HOG (Histogram of Oriented Gradients) detector works very well. An example can be found here. The example is in Python, but it is not hard to create a C++ version if you want. The trained parameters are already included, so you can use it immediately.
If you are interested in deep learning based approaches, both SSD (Single Shot MultiBox Detector) and YOLO (You Only Look Once) can detect persons.
All these methods can only extract a bounding box. For extracting the exact silhouette, you will need to combine the results with image differencing or background subtraction.
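A rough sketch of combining OpenCV's built-in HOG people detector with MOG2 background subtraction on a video (the video path and parameters are placeholders):

import cv2

# Pedestrian detector with OpenCV's pretrained HOG + linear SVM parameters
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Background subtractor used to extract the silhouette inside each detection
bg_sub = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)

cap = cv2.VideoCapture("walking.mp4")   # placeholder video file
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = bg_sub.apply(frame)                        # mask of moving pixels
    rects, _ = hog.detectMultiScale(frame, winStride=(8, 8))
    for (x, y, w, h) in rects:
        silhouette = fg_mask[y:y + h, x:x + w]           # silhouette within the person box
cap.release()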
I am trying to detect a vehicle in an image (actually a sequence of frames in a video). I am new to OpenCV and Python and work under Windows 7.
Is there a way to get the horizontal and vertical edges of an image and then sum the resulting images into respective vectors?
Is there Python code or a function available for this?
I looked at this and this but could not figure out how to do it.
You may use the following image for illustration.
EDIT
I was inspired by the idea presented in the following paper (sorry if you do not have access).
Betke, M.; Haritaoglu, E. & Davis, L. S.: Real-time multiple vehicle detection and tracking from a moving vehicle. Machine Vision and Applications, Springer-Verlag, 2000, 12, 69-83.
I would take a look at the squares example for OpenCV, posted here. It uses Canny and then does a contour find to return the sides of each square. You should be able to modify this code to get the horizontal and vertical lines you are looking for. Here is a link to the documentation for the Python call of Canny; it is rather helpful for all-around edge detection. In about an hour I can get home and give you a working example of what you want.
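A condensed sketch of the Canny + contour part of that example (the image path and thresholds are arbitrary placeholders):

import cv2

image = cv2.imread("vehicle.jpg")          # placeholder image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)           # thresholds are arbitrary; tune them

# Contours approximate the outlines formed by the detected edges (OpenCV 4.x signature)
contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)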
Do some reading on Sobel filters.
http://en.wikipedia.org/wiki/Sobel_operator
You can basically get vertical and horizontal gradients at each pixel.
Here is the OpenCV function for it.
http://docs.opencv.org/modules/imgproc/doc/filtering.html?highlight=sobel#sobel
Once you have these filtered images, you can collect statistics column- and row-wise and decide whether there is an edge and where it is located.
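A minimal sketch of that idea with OpenCV's Sobel and NumPy sums (the image path is a placeholder and the thresholding step is left out):

import cv2
import numpy as np

gray = cv2.imread("vehicle.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder image

# Horizontal gradient (responds to vertical edges) and vertical gradient (horizontal edges)
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)

# Collapse the gradient magnitudes into per-column and per-row vectors
col_profile = np.abs(gx).sum(axis=0)   # peaks suggest strong vertical structure
row_profile = np.abs(gy).sum(axis=1)   # peaks suggest strong horizontal structure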
Typically, geometric approaches to object detection are not hugely successful, as the appearance model you assume can quite easily be violated by occlusion, noise, or orientation changes.
Machine learning approaches typically work much better in my opinion and would probably provide a more robust solution to your problem. Since you appear to be working with OpenCV, you could take a look at cascade classifiers, for which OpenCV provides both Haar-wavelet and local binary pattern feature based variants.
The link I have provided is a tutorial with very complete steps explaining how to create a classifier with several prewritten utilities. Basically, you create a directory with 'positive' images of cars and a directory with 'negative' images of typical backgrounds. The utility opencv_createsamples can be used to create training images warped to simulate different orientations and average intensities from a small set of images. You then use the utility opencv_traincascade, setting a few command line parameters to select different training options, and it outputs a trained classifier for you.
Detection can be performed using either the C++ or the Python interface with this trained classifier.
For instance, using Python you can load the classifier and perform detection on an image, getting back a selection of bounding rectangles, with:
import cv2

# Load the image and the trained cascade classifier (paths are placeholders)
image = cv2.imread('path/to/image')
cc = cv2.CascadeClassifier('path/to/classifierfile')
objs = cc.detectMultiScale(image)  # list of (x, y, w, h) bounding rectangles
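As a small follow-up sketch (not from the original answer), the returned rectangles are (x, y, w, h) tuples, so you can draw them on the image directly:

# Continues the snippet above: draw each detection and save the result
for (x, y, w, h) in objs:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite('detections.jpg', image)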