I am trying to build a convolutional autoencoder that can remove pen marks such as circles, underlines, etc. from official documents.
I have the original, clean soft copy of the document and the xeroxed copy with the pen marks.
The xeroxed copy obviously does not line up exactly with the original document; it ends up slightly skewed, shifted, etc. during scanning.
Additionally, since the document is large (2360 x 1650), I had to split the image into four parts of size (587, 412) to feed into my model.
What I would like to know is: would the above issues cause any problems while training, and is there any way to rectify them?
Any help would be highly appreciated.
Thanks
EDIT:
As you can (hopefully!) see, the noisy image is slightly different because of skew or translation introduced while xeroxing.
I don't think it will cause any problems. But if it does, you can always load the image, resize it to the desired shape, and then feed it into the model.
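If you go that route, here is a minimal sketch of the load-and-resize step, assuming OpenCV is available; the file names and the target shape are placeholders for your own, not part of your setup:

```python
import cv2
import numpy as np

def load_for_model(path, target_size=(587, 412)):
    # target_size is (width, height) for cv2.resize; adjust to what your network expects
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, target_size, interpolation=cv2.INTER_AREA)
    img = img.astype(np.float32) / 255.0           # scale pixels to [0, 1]
    return img[np.newaxis, ..., np.newaxis]        # add batch and channel dimensions

clean = load_for_model("clean_page.png")           # hypothetical file names
noisy = load_for_model("xeroxed_page.png")
```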
Hello, I'm currently working on a project where I have to use instance segmentation on different parts of seedlings (the top part and the stem).
Example image:
https://imgur.com/kWAZBed
I have to be able to calculate the angle of the hook for every seedling.
I've heard that the Mask-RCNN instance segmentation method might not be good for biological images, so should I go with U-Net semantic segmentation instead? The problem with U-Net is that every seed and root just gets categorized into two classes, whereas I need to calculate the angle for each of them individually.
Some input would be appreciated.
You should start with whichever network is easiest for you to get off the ground and see if it's good enough. If not, try another model and see if that one is.
You can only go so far in choosing a network architecture for a new image use case. Sometimes you just have to try a few on the new type of image data and see which performs best.
Because your time is valuable, I would recommend starting with the simplest/fastest model for you to use, and trying a "trickier" one only if the first one wasn't good enough.
I must add that it's kind of difficult to understand all of the nuances of your requirements from just the one image you posted...
Good luck.
I want to identify three different objects from a satellite wind image. The problem is that the three of them are somewhat similar. I tried to identify them using template matching but it didn't work. The three objects are as follows.
Here the direction of the object is not important, but the type of head on the line is. Can you suggest a way to proceed?
Assuming your image consists of only pure black and pure white pixels,
you can find the contours and then the bounding rectangle or minAreaRect for each of them.
https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html?highlight=minarearect#minarearect
Then iterate over the contours, treating their rectangles as separate images, and classify each of those images. You may use template matching here too.
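As a rough sketch of that idea (assuming OpenCV 4.x and a binarised input; the file name is a placeholder):

```python
import cv2

img = cv2.imread("wind_symbols.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file name
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)

# OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x returns an extra image first
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

crops = []
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)        # or cv2.minAreaRect(cnt) for a rotated box
    crops.append(img[y:y + h, x:x + w])       # treat each crop as a separate image

# each element of `crops` can now be classified, e.g. with template matching
```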
Good luck!
Have you thought about machine learning?
For example, a small CNN used for digit recognition could be "retrained" using a small set of your images; Keras also has a data augmentation feature to help ensure a robust classifier is trained.
There is a very good blog post by Yash Katariya at https://yashk2810.github.io/Applying-Convolutional-Neural-Network-on-the-MNIST-dataset/, in which the MNIST data set is loaded and a network is trained; it goes through all of the stages you'd need in order to use ML for your problem.
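As a hedged sketch of that retraining idea, assuming Keras/TensorFlow 2.x and that the three symbol types have already been cropped into small fixed-size images sorted into class folders (the image size, directory path and hyperparameters below are all placeholders):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# a small digit-recognition-style CNN, repurposed for three symbol classes
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(64, 64, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Keras' built-in augmentation: random rotations/shifts help the classifier
# stay robust when only a small set of labelled crops is available
augment = ImageDataGenerator(rotation_range=180, width_shift_range=0.1,
                             height_shift_range=0.1, rescale=1.0 / 255)
train_gen = augment.flow_from_directory("symbols/train", target_size=(64, 64),
                                        color_mode="grayscale",
                                        class_mode="categorical")
model.fit(train_gen, epochs=20)
```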
You mention you've tried template matching; however, you also mention that rotation is not important, which to me implies that an object could appear rotated, and that would cause failures for TM.
You could look into LBP (Local Binary Patterns), or maybe OpenCV's Haar Classifier (however it's sensitive to rotation).
Other than the items I have suggested, there is a great tutorial at https://gogul09.github.io/software/image-classification-python which uses image features and machine learning; you may benefit from looking at it and applying it to this problem.
I hope that, while not actually answering your question directly, I have given you a set of tools that will solve it with some time invested and some reading.
The requirement of my task is to find, for a given input image, the most similar output image using a CNN. There are about half a million images to be handled, so it would not be realistic to label each image. Beyond that, the images are candlestick charts of all stocks, so it is also very hard to classify them.
My basic idea is to extract the key features of the images with a CNN and compare those features using a hashing algorithm to find the most similar one. But the idea is not complete, and how to do it in Python is also a big challenge. Therefore, can anyone help me with this issue? Thank you very much! If possible, could you point me to any related articles or code?
Mind reading this
https://towardsdatascience.com/find-similar-images-using-autoencoders-315f374029ea
This uses autoencoders to find similar images.
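If you want something to experiment with before training an autoencoder, here is a rough sketch of the same "extract features, then look up neighbours" idea, using a pretrained CNN as the feature extractor; the model choice, file paths and cosine-similarity lookup are my assumptions, not part of that article:

```python
import numpy as np
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing import image

# pretrained network with the classification head removed; global average
# pooling turns each image into a single feature vector
encoder = MobileNetV2(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    img = image.load_img(path, target_size=(224, 224))
    arr = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return encoder.predict(arr, verbose=0)[0]

# build an index over all chart images once, then query with cosine similarity
paths = ["charts/0001.png", "charts/0002.png"]          # placeholder list
index = np.stack([embed(p) for p in paths])
index /= np.linalg.norm(index, axis=1, keepdims=True)

query = embed("charts/query.png")
query /= np.linalg.norm(query)
best = np.argsort(index @ query)[::-1][:5]              # indices of the 5 closest charts
```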
Please do post the final code that you end up using so that I can understand it better as well.
I am currently trying to get the Faster R-CNN network from here to work on Windows with TensorFlow. For that, I wanted to re-implement the ROI-Pooling layer, since it is not working on Windows (at least not for me; if you have any tips on porting to Windows with TensorFlow, I would highly appreciate your comments!). According to this website, what you do is take your proposed ROI from your feature map and max pool its content to a fixed output size. This fixed output is needed for the following fully connected layers, since they only accept a fixed-size input.
The problem now is the following:
After conv5_3, the last convolutional layer before roi pooling, the box that results from the region proposal network is mostly 5x5 pixels in size. This is totally fine, since the objects I want to detect usually have dimensions of 80x80 pixels in the original image (downsampling factor due to pooling is 16). However, I now have to max pool an area of 5x5 pixels and ENLARGE it to 7x7, the target size for the ROI-Pooling. My first try by simply doing interpolation did not work. Also, padding with zeros did not work. I always seem to get the same scores for my classes.
Is there anything I am doing wrong? I do not want to change the dimensions of any layer and I know that my trained network in general works because I have the reference implementation running in Linux on my dataset.
Thank you very much for your time and effort :)
There is now an official TF implementation of Faster-RCNN, and other object detection algorithms, in their Object Detection API, you should probably check it out.
If you still want to code it yourself, I wondered exactly the same thing as you and could not find an answer about how you're supposed to do it. My three guesses would be:
1. Interpolation, but it changes the feature values, so it destroys some information...
2. Resizing to 35x35 by simply copying each cell 7 times and then max-pooling back to 7x7. You don't actually have to do the resize and then the pooling; in 1D it basically reduces to output[i] = max(input[floor(i*5/7)], input[ceil(i*5/7)]), with a similar max over 4 elements in 2D (be careful, I might have forgotten some +1/-1 somewhere). I see at least two problems: some values are over-represented, being copied more often than others; worse, some (small) values will not be copied into the output at all, which you should avoid given that you can store more information in the output than in the input.
3. Making sure all input feature values are copied at least once into the output, at the best possible place (basically copy input[i] to output[j] with j = floor((i+1)*7/5) - 1). For the remaining spots, either leave a 0 or interpolate. I would think this solution is the best, maybe with interpolation, but I'm really not sure at all.
It looks like smallcorgi's implementation uses my 2nd solution (without actually resizing, just using max pooling), since it's the same implementation as for the case where the input is bigger than the output.
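If it helps, here is a minimal NumPy sketch of the second guess: binning the region into a 7x7 grid with max pooling, which still works when the region is smaller than the output because some input cells simply fall into more than one bin. It is purely illustrative, not the reference implementation.

```python
import numpy as np

def roi_max_pool(region, out_h=7, out_w=7):
    in_h, in_w = region.shape
    out = np.empty((out_h, out_w), dtype=region.dtype)
    for i in range(out_h):
        # input rows covered by output row i (always at least one row per bin)
        r0 = int(np.floor(i * in_h / out_h))
        r1 = max(r0 + 1, int(np.ceil((i + 1) * in_h / out_h)))
        for j in range(out_w):
            c0 = int(np.floor(j * in_w / out_w))
            c1 = max(c0 + 1, int(np.ceil((j + 1) * in_w / out_w)))
            out[i, j] = region[r0:r1, c0:c1].max()
    return out

# a 5x5 region is "enlarged" to 7x7: small inputs reuse cells across bins
print(roi_max_pool(np.arange(25).reshape(5, 5)).shape)   # (7, 7)
```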
I know it's late, but I'm posting this answer because it might help others. I have written code that shows how ROI pooling works for different height and width combinations of both the pooled output and the region.
you can see the link of the code in github:
https://github.com/Parsa33033/RoiPooling
This is a fairly straightforward question, but I am new to the field. Using this tutorial I have a great way of detecting certain patterns or features. However, the images I'm testing are large and often the feature I'm looking for only occupies a small fraction of the image. When I run it on the entire picture the classification is bad, though when the image is zoomed in and cropped the classification is good.
I've considered writing a script that breaks an image into many smaller images and runs the test on all of them (time isn't a huge concern). However, this still seems inefficient and not ideal. I'm wondering about suggestions for the best, but also easiest-to-implement, solution for this.
I'm using Python.
This may seem to be a simple question, which it is, but the answer is not so simple. Localisation is a difficult task and requires much more leg work than classifying an entire image. There are a number of different tools and models that people have experimented with. Some models include R-CNN which looks at many regions in a manner not too dissimilar to what you suggested. Alternatively you could look at a model such as YOLO or TensorBox.
There is no one answer to this, and this gets asked a lot! For example: Does Convolutional Neural Network possess localization abilities on images?
The term you want to be looking for in research papers is "Localization". If you are looking for a dirty solution (that's not time sensitive) then sliding windows is definitely a first step. I hope that this gets you going in your project and you can progress from there.
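For a quick start, here is a bare-bones sliding-window sketch; `classify` stands in for whatever model the tutorial produced, and the window size, stride and threshold are placeholder values you would tune for your data.

```python
import numpy as np

def sliding_window_detect(img, classify, win=128, stride=64, threshold=0.9):
    hits = []
    h, w = img.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = img[y:y + win, x:x + win]
            score = classify(patch)            # probability the feature is present
            if score >= threshold:
                hits.append((x, y, win, win, score))
    return hits                                # candidate boxes to inspect or merge
```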