Let's say I have a set of images of passports. I am working on a project where I have to identify the name on each passport and eventually convert it into text.
For the very first part, labeling (or classification, I think; I'm a beginner here) where the name is on each passport, how would I go about that?
What techniques / software can I use to accomplish this?
An answer in great detail, or any links, would be great. I'm trying to figure out exactly how this is done so I can begin coding.
I know training a model is possibly involved, but I'm just not sure.
I'm using Python if that matters.
thanks
There are two routes you can take: one where you have labeled data (or are willing to label data yourself), and one where you don't.
Let's start with the latter. Say you have an image of a passport. You want to detect where the text in the image is, and what that text says. You can achieve this using a library called pytesseract, a Python wrapper around the Tesseract OCR engine. It works well out of the box because Tesseract has already been trained on a lot of text images, so it is good at detecting text in most images.
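To make the label-free route concrete, here is a minimal sketch of how it could look. It assumes `pytesseract`, Pillow and the Tesseract binary are installed; the function names `words_with_boxes` and `ocr_passport` and the confidence cut-off of 60 are my own choices for illustration, not part of the library.

```python
def words_with_boxes(data, min_conf=60.0):
    """Keep confidently recognised words from an image_to_data-style dict.

    `data` is the dict returned by pytesseract.image_to_data(...,
    output_type=Output.DICT): parallel lists keyed by 'text', 'conf',
    'left', 'top', 'width' and 'height'.
    """
    words = []
    for text, conf, left, top, width, height in zip(
        data["text"], data["conf"], data["left"],
        data["top"], data["width"], data["height"],
    ):
        # Rows that describe layout blocks rather than words have empty
        # text and a confidence of -1, so they are dropped here.
        if text.strip() and float(conf) >= min_conf:
            words.append((text, (left, top, width, height)))
    return words


def ocr_passport(path, min_conf=60.0):
    """Run Tesseract on one image file and return (word, box) pairs."""
    # Imported lazily so the helper above works without Tesseract installed.
    import pytesseract
    from PIL import Image
    from pytesseract import Output

    data = pytesseract.image_to_data(Image.open(path), output_type=Output.DICT)
    return words_with_boxes(data, min_conf)
```

Calling `ocr_passport("passport.png")` (a hypothetical file name) gives you each word together with its bounding box. Deciding which of those words is the name is still up to you, for example by looking at where the box sits on the page.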
If you have labels, you might be able to improve on what you get from pytesseract, but this is a lot harder. If you want to learn it anyway, I would recommend learning TensorFlow and using "transfer learning" to improve your model.
Hello, I'm currently working on a project where I have to use instance segmentation on different parts of seedlings (the top part and the stem).
Example image:
https://imgur.com/kWAZBed
I have to be able to calculate the angle of the hook for every seedling.
I've heard that the Mask R-CNN instance segmentation method might not be good for biological images, so should I go with U-Net semantic segmentation instead? The problem with U-Net is that every seed and root gets assigned to one of two classes, whereas I need to calculate the angle for each individual seedling.
Some input would be appreciated.
You should start with whichever network is easiest for you to get off the ground and see if it's good enough. If not, try another model and see if it's good enough.
You can only go so far in choosing a network architecture for a new image use case. Sometimes you just have to try a few on the new type of image data and see which performs best.
Because your time is valuable, I would recommend starting with the simplest/fastest model for you to use, and try a "trickier" one, only if the first one wasn't good enough.
I must add that it's kind of difficult to understand all of the nuances of your requirements just from the one image you posted...
Good luck.
I am trying to detect electrical symbols in an electrical schematic.
I think two approaches could be used here:
The classical way with OpenCV: I tried to recognise shapes with OpenCV and Python, but some symbols are too complex.
The deep learning way: I tried Mask R-CNN with a handmade dataset of symbols, but nothing was really successful.
Here is a really simple example of what I would like to do:
I think it would be easy to make a dataset of symbols, but every symbol would have the same form and the context of the image would not be represented.
How do you think I could handle this problem?
QATM (Quality-Aware Template Matching for Deep Learning) might be what you are looking for.
Original paper: https://arxiv.org/abs/1903.07254
The following GitHub repository contains an example with an electrical schematic:
https://github.com/kamata1729/QATM_pytorch
Since the components of an electrical schematic always look the same, I would first try template matching with OpenCV. I guess you will have to cut out the components and make rotated copies to find all of them. It would also help to have higher-resolution images.
The next idea would be to convolve the image with a kernel that is basically the component you expect to find in the image.
The last idea, which should give you more reliable results but is pretty much overkill, is to use Google's image recognition, which you can use from Python and train on your own images.
https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_template_matching/py_template_matching.html
This is a fairly straightforward question, but I am new to the field. Using this tutorial I have a great way of detecting certain patterns or features. However, the images I'm testing are large, and often the feature I'm looking for occupies only a small fraction of the image. When I run it on the entire picture the classification is bad, but when I zoom in and crop, the classification is good.
I've considered writing a script that breaks an image into many smaller images and runs the test on all of them (time isn't a huge concern). However, this still seems inefficient and not ideal. I'm looking for suggestions for the best, but also easiest to implement, solution for this.
I'm using Python.
This may seem to be a simple question, which it is, but the answer is not so simple. Localization is a difficult task and requires much more legwork than classifying an entire image. There are a number of different tools and models that people have experimented with. Some models, such as R-CNN, look at many regions in a manner not too dissimilar to what you suggested. Alternatively, you could look at a model such as YOLO or TensorBox.
There is no one answer to this, and this gets asked a lot! For example: Does Convolutional Neural Network possess localization abilities on images?
The term you want to be looking for in research papers is "Localization". If you are looking for a dirty solution (that's not time sensitive) then sliding windows is definitely a first step. I hope that this gets you going in your project and you can progress from there.
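For the sliding-window starting point, a minimal sketch could look like this. `score_fn` is a hypothetical stand-in for "crop this box out of the image and run your existing classifier on it"; the window size and stride are values you would have to tune.

```python
def sliding_windows(width, height, win, stride):
    """Yield (left, top, right, bottom) boxes that tile the whole image."""
    xs = list(range(0, max(width - win, 0) + 1, stride))
    ys = list(range(0, max(height - win, 0) + 1, stride))
    # Add one extra window flush with the right/bottom edge if the
    # stride did not land there exactly.
    if xs[-1] + win < width:
        xs.append(width - win)
    if ys[-1] + win < height:
        ys.append(height - win)
    for top in ys:
        for left in xs:
            yield (left, top, min(left + win, width), min(top + win, height))


def best_window(width, height, win, stride, score_fn):
    """Return (box, score) for the window that score_fn rates highest."""
    scored = ((box, score_fn(box)) for box in sliding_windows(width, height, win, stride))
    return max(scored, key=lambda pair: pair[1])
```

A stride smaller than the window gives overlapping crops, which costs more classifier calls but makes it less likely that the feature is cut in half by a window border.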
Suppose I have an image of a car taken with my mobile camera and another image of the same car downloaded from the internet.
(For simplicity please assume that both the images contain the same side view projection of the same car.)
How can I detect that both the images are representing the same object i.e. the car, in this case, using OpenCV?
I've tried template matching, feature matching (ORB) etc but those are not working and are not providing satisfactory results.
SIFT feature matching might produce better results than ORB. However, the main problem here is that you have only one image of each type (from the mobile camera and from the Internet). If you have a large number of images of this car model, then you can train a machine learning system using those images. Later you can submit one image of the car to the machine learning system and there is a much higher chance of the machine learning system recognizing it.
From a machine learning point of view, using only one image as the master and matching another with it is analogous to teaching a child the letter "A" using only one handwritten letter "A", and expecting him/her to recognize any handwritten letter "A" written by anyone.
Think about how you can mathematically describe the car's features so that every car is different. Maybe every car has a different size of wheels? Maybe the distance from the door handle to bottom of the side window is a unique characteristic for every car? Maybe every car's proportion of front side window's to rear side window's width is an individual feature of that car?
You probably can't answer yes with 100% confidence to any of these questions. But what you can do is combine those into a multidimensional feature vector and perform classification.
Now, the crucial part here is that since you're describing features manually, you need to do excellent work and test every step of the way. For example, you need to design features that are scale and perspective invariant. Here, I'd recommend reading up on how face detection was designed to fulfill that requirement.
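To illustrate the feature-vector idea, here is a toy sketch. The three features are ratios (so they do not change when the image is scaled), and every number in it is made up purely for illustration; `REFERENCE` and `nearest_car` are hypothetical names, and nearest-neighbour matching is just one simple way to do the classification step.

```python
import math

# Made-up example vectors, one per known car:
# (wheel diameter / wheelbase,
#  handle-to-window distance / body height,
#  front window width / rear window width)
REFERENCE = {
    "car_a": (0.24, 0.31, 1.10),
    "car_b": (0.27, 0.36, 0.85),
}


def nearest_car(query, reference=REFERENCE):
    """Return (name, distance) of the reference vector closest to query."""
    scored = ((name, math.dist(query, vec)) for name, vec in reference.items())
    return min(scored, key=lambda pair: pair[1])
```

With the made-up values above, `nearest_car((0.25, 0.30, 1.05))` picks `car_a`. In practice you would also want a maximum distance beyond which you answer "unknown car" instead of forcing a match.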
Will Machine Learning be a better solution? That depends greatly on two things: first, what kind of data you are planning to throw at the algorithm; second, how well you can control the process.
What most people don't realize today is that Machine Learning is not some magical solution to every problem. It is a tool, and like every tool it needs proper handling to provide results. If I were to give you advice, I'd say you will not handle it very well yet.
My suggestion: get acquainted with basic feature extraction and general image processing algorithms. Edge detection (Canny, Sobel), contour finding, shape description, Hough transform, morphological operations, masking, etc. Without those at your fingertips, I'd say in that particular case, even Machine Learning will not save you.
I'm sorry: there is no shortcut here. You need to do your homework in order to make that one work. But don't let that scare you. It's a great project. Good luck!
Now I want to train on my own image data in Caffe using SegNet.
But as a first step, we need to label our own images like these:
I have searched GitHub but cannot find anything. So my question is: does anyone know which tool can make semantic label images?
Check out a tool called sloth: https://github.com/cvhciKIT/sloth, which is an open-source tool written in Python with PyQt for creating ground truth computer vision datasets for a wide array of applications, such as semantically creating data like you have above.
If you don't like sloth, you can use any image editing software, like GIMP where you would make one layer per label and use polygons and flood fill of different hues to create your data. You would then merge all of the layers together to make a final image that you would use for your purposes.
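If you export each label layer as its own mask, the final merge step can also be done in code. This is only a sketch under an assumption: that your training setup wants a single-channel image in which each pixel holds a class index (check what your SegNet data layer actually expects). `merge_masks` is a hypothetical helper name.

```python
import numpy as np


def merge_masks(masks, background=0):
    """Combine per-class masks into one single-channel label image.

    `masks` is a list of (class_id, mask) pairs, where each mask is a 2-D
    array that is truthy wherever that class is present. Later entries
    overwrite earlier ones where masks overlap.
    """
    shape = masks[0][1].shape
    labels = np.full(shape, background, dtype=np.uint8)
    for class_id, mask in masks:
        labels[np.asarray(mask, dtype=bool)] = class_id
    return labels
```

The order of the list matters: put large background-like classes (road, sky) first and small foreground objects last, so the small objects end up on top where they overlap.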
However, as user Miki mentioned (see discussion thread below), creating new datasets from the beginning will take a considerable amount of effort. It is highly advisable that you don't create this on your own as you need a lot of data to ensure your algorithms are performing correctly. You'll need the help of other (hopefully willing) PhD students, preferably those you know personally or work with you in your lab or workplace to help manually curate this data for you.
If this isn't an option, you can use crowdsourcing platforms like Amazon Mechanical Turk, where you describe the task at hand and pay willing individuals a small amount per image. This would be something to consider if you can't find many people to help you.
All in all, this will take a considerable amount of effort, not only in terms of time but also in terms of people if you want to create a large dataset within a short span of time. I would recommend you simply use established datasets, such as the one you have referenced from Cambridge, or LabelMe by Antonio Torralba, which Miki suggested. LabelMe is not only a toolbox for annotating images from its own dataset; it also allows you to do the same for your own images.
Good luck!
As answered by @rayryeng, a tool called sloth is great for finishing these tasks in a simple way. However, if I have more than 20 objects waiting to be classified, sloth is not an ideal tool. So I developed a simple tool called IsLabel that solves this problem with a few algorithms.
And the result looks like this; using IsLabel took me just 40 s:
INPUT:
OUTPUT:
I know it's not perfect, but it works fine for me.
I would recommend using https://www.labelbox.io/. They open sourced a lot of their code and have a hosting platform to manage the whole labeling process end to end.
Here is an example of segmentation
And you can export labels with a mask.