My question is about how to create a labeled image dataset for machine learning.
I have always worked with ready-made datasets, so I am having difficulty with how to label an image dataset myself (like we do in cat vs. dog classification).
I have to do labeling as well as image segmentation. After searching on the internet, I found some manual labeling tools such as LabelMe and LabelBox. LabelMe is good, but it returns its output in the form of XML files.
Now my concern is: how do I feed XML files into the neural network? I am not at all good at image processing tasks, so I would appreciate an alternative suggestion.
Edit: I have scanned copies of degree certificates and normal documents, and I have to make a classifier which will classify degree certificates as 1 and non-degree certificates as 0. So my labels would be like:
Degree_certificate -> y(1)
Non_degree_cert -> y(0)
You don't feed XML files to the neural network. You process them with an XML parser, and use that to extract the label. See the question How do I parse XML in Python? for advice on how this works.
Image data sets can come in a variety of starting states. Sometimes, for instance, images are in folders which represent their class. If you like to work with this approach, then rather than read the XML file directly every time you train, use it to create a data set in the form that you like or are used to. The reason you find many nice ready-prepared data sets online is because other people have done exactly this. It is worth doing, as you don't then need to repeat all the transformations from raw data just to start training a model.
For example, collect your XML data from LabelMe, then use a short script to read each XML file, extract the label you entered previously using ElementTree, and copy the image to the correct folder. You will end up with a data set consisting of two folders with positive and negative matching images, ready to process with your favourite CNN image-processing package.
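A minimal sketch of such a script is below. The folder names are placeholders, and it assumes one LabelMe-style XML file per image with the image name in a <filename> tag and your label in an <object><name> tag; adjust the tag names to whatever your files actually contain.

import os
import shutil
import xml.etree.ElementTree as ET

xml_dir = "annotations"   # folder of LabelMe XML files
img_dir = "images"        # folder containing the corresponding images
out_dirs = {
    "Degree_certificate": "dataset/degree_certificate",   # y = 1
    "Non_degree_cert": "dataset/non_degree_cert",         # y = 0
}
for d in out_dirs.values():
    os.makedirs(d, exist_ok=True)

for xml_name in os.listdir(xml_dir):
    if not xml_name.endswith(".xml"):
        continue
    root = ET.parse(os.path.join(xml_dir, xml_name)).getroot()
    filename = root.findtext("filename")   # which image the annotation belongs to
    label = root.findtext("object/name")   # the label string you entered in LabelMe
    if filename and label in out_dirs:
        shutil.copy(os.path.join(img_dir, filename),
                    os.path.join(out_dirs[label], filename))

Once the folders exist, any standard loader that infers classes from directory names (for example Keras's flow_from_directory) can consume the data set directly.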
Related
I am working on a Python project and I am trying to build a neural network model to extract specific information common across PDFs with different layouts. In simpler terms: for the time being, I have a total of 61 PDF forms with 61 different layouts which all have the common field 'Post Code'. The 'Post Code' field box is located in different page coordinates and I have to build and train a model which is able to load any of the PDFs and extract the Post Code field.
I can build a Python model which can extract the Post Code from only one PDF at a time, which means that if I had to iterate the process over 61 forms it would take too much time. I wouldn't have any problems if the PDFs were all with the same layout. I am struggling to find a way to make the process more efficient when it comes to PDFs with different layouts.
From what I have seen on Stack Overflow, most problems are about information extraction from a single document or from multiple documents with the same layout. I would not want to branch out to a language other than Python, and I would appreciate it if anyone has found a way to perform specific information extraction from PDFs with different layouts.
Let's say I have a set of images of passports. I am working on a project where I have to identify the name on each passport and eventually transform that object into text.
For the very first part, labeling (or classification? I'm a beginner here) where the name is on each passport, how would I go about that?
What techniques / software can I use to accomplish this?
Any detail or links would be great. I'm trying to figure out exactly how this is done so I can begin coding.
I know training a model is possibly involved, but I'm just not sure.
I'm using Python if that matters.
thanks
There are two routes you can take: one where you have labeled data (or want to label the data yourself), and one where you don't.
Let's start with the latter. Say you have an image of a passport. You want to detect where the text in the image is, and what that text says. You can achieve this with a library called pytesseract, a Python wrapper around the Tesseract OCR engine. It works well because Tesseract has been trained on a lot of images, so it's good at detecting text in most images.
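A minimal sketch of that (the file name is a placeholder, and pytesseract needs the Tesseract binary installed on your machine):

from PIL import Image
import pytesseract

image = Image.open("passport_sample.jpg")   # placeholder file name

# Full text found on the page
print(pytesseract.image_to_string(image))

# Word-level boxes, useful for locating where the name sits on the page
data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
for word, x, y, w, h in zip(data["text"], data["left"], data["top"],
                            data["width"], data["height"]):
    if word.strip():
        print(word, (x, y, w, h))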
If you have labels, you might be able to improve on the results you get from pytesseract, but this is a lot harder. If you want to learn it anyway, I would recommend learning TensorFlow and using "transfer learning" to improve your model.
I have millions of images, and I am able to use OCR with pytesseract to perform decent text extraction, but it takes too long to process all of the images.
Thus I would like to determine whether an image contains text at all, and if it doesn't, I wouldn't have to perform OCR on it. Ideally this method would have high recall.
I was thinking about building an SVM or some other machine learning model to help detect this, but I was hoping someone knew of a way to quickly determine whether an image contains text or not.
Unfortunately, there is no way to tell if an image has text in it without performing OCR of some kind on it.
You could build a machine learning model that handles this; however, keep in mind it would still need to process the image as well.
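If you do go down the route you mentioned of training an SVM yourself, a rough sketch might look like the following. The feature choice (edge density over a coarse grid) and the small hand-labelled sample are just illustrative assumptions, not a recommendation of the best features.

import cv2
import numpy as np
from sklearn.svm import SVC

def edge_density_features(path, size=(128, 128), grid=8):
    # Cheap features: fraction of edge pixels in each cell of a coarse grid
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, size)
    edges = cv2.Canny(img, 100, 200)
    cells = [np.mean(block) / 255.0
             for row in np.array_split(edges, grid, axis=0)
             for block in np.array_split(row, grid, axis=1)]
    return np.array(cells)

# A small hand-labelled sample you would need to provide yourself
sample_paths = ["with_text_01.png", "no_text_01.png"]   # placeholder file names
sample_labels = [1, 0]                                   # 1 = contains text

X = np.array([edge_density_features(p) for p in sample_paths])
clf = SVC(probability=True).fit(X, np.array(sample_labels))

def probably_has_text(path, threshold=0.3):
    # Low threshold biases towards recall: only skip OCR when fairly sure
    return clf.predict_proba([edge_density_features(path)])[0, 1] > threshold

Only images that probably_has_text flags would then be passed on to pytesseract.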
I currently have a binary image classifier. This is for my undergraduate dissertation, so any help would be very much appreciated.
The model performs well when I am loading and using it as the '.model' file within the kernel. What I want is to freeze the graph and deploy the classifier as a portable solution, e.g. a local Flask web server.
Whilst following tutorials on freezing the graph, it is apparent that I must define output nodes. I am struggling to select which nodes I should be passing at this stage. I have printed out the list of available nodes using
[print(n.name) for n in tf.get_default_graph().as_graph_def().node]
This presents me with a huge list of options. Using my 'targets/Y' as an output node, the graph does not freeze any variables and 'frozen_model.pb' has a size of 1kb. When I select the 'FullyConnected_1/Softmax' as the output nodes, a number of around 60/70 variables get saved and the file size is larger, around what I would expect (30mb or so). When I use this frozen graph to classify a test image, I get basically a 50/50 split on the confidence of my two available classes (a clueless classifier): [[0.49903905 0.5009609 ]] - to be exact.
What I extrapolate from this, given the result and where 'FullyConnected_1/Softmax' sits in the list of available nodes, is that the graph is being frozen at a stage prior to any training or optimising at all.
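For reference, the freezing step from the tutorials I am following looks roughly like this (TF 1.x style; the checkpoint path is a placeholder and 'FullyConnected_1/Softmax' is the node name from the list above):

import tensorflow as tf
from tensorflow.python.framework import graph_util

with tf.Session() as sess:
    # Restore the trained model from its checkpoint (path is a placeholder)
    saver = tf.train.import_meta_graph("checkpoints/my_model.model.meta")
    saver.restore(sess, "checkpoints/my_model.model")

    # Fold every variable reachable from the chosen output node into constants
    frozen = graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ["FullyConnected_1/Softmax"])

    with tf.gfile.GFile("frozen_model.pb", "wb") as f:
        f.write(frozen.SerializeToString())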
I am sure you will want more information, but I don't want to post absolutely everything and clutter the post. Could somebody please help me with this? I will provide more code and information as needed.
Thank you very much!
My task requires finding, for a given input image, the most similar image using a CNN. There are about half a million images to handle, so it would not be realistic to label each one. Beyond that, the images are candlestick charts for all stocks, so it is also very hard to classify them all.
My basic idea is to extract the key features of the images with a CNN and compare those features using a hashing algorithm to find the most similar one. But the idea is incomplete, and how to do it in Python is also a big challenge. Therefore, can anyone help me with this issue? If possible, could you point me to any related articles or code? Thank you very much!
Would you mind reading this?
https://towardsdatascience.com/find-similar-images-using-autoencoders-315f374029ea
This uses autoencoders to find similar images.
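A rough sketch of that idea in Keras is below. The layer sizes and the 64x64 grayscale input are placeholder choices, and images is assumed to be your chart images as a float array in [0, 1] with shape (n, 64, 64, 1).

import numpy as np
from tensorflow.keras import layers, Model
from sklearn.neighbors import NearestNeighbors

# Your candlestick charts, resized to 64x64 grayscale and scaled to [0, 1]
images = np.load("charts_64x64.npy")   # placeholder path, shape (n, 64, 64, 1)

inp = layers.Input(shape=(64, 64, 1))
x = layers.Conv2D(16, 3, activation="relu", padding="same")(inp)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
encoded = layers.MaxPooling2D(2)(x)            # 16x16x8 bottleneck

x = layers.Conv2D(8, 3, activation="relu", padding="same")(encoded)
x = layers.UpSampling2D(2)(x)
x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D(2)(x)
out = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

autoencoder = Model(inp, out)
encoder = Model(inp, encoded)                  # feature extractor only
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.fit(images, images, epochs=10, batch_size=128)

# Encode everything once, then query for the most similar charts
features = encoder.predict(images).reshape(len(images), -1)
nn = NearestNeighbors(n_neighbors=5).fit(features)
distances, indices = nn.kneighbors(features[:1])   # neighbours of the first chart

With half a million images you may want to swap the brute-force neighbour search for an approximate method or a hashing scheme over these features, as you suggested.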
Please do post the final code that you end up applying, so that I can understand it better as well.