I'm learning neural networks by using TensorFlow to build an OCR system for printed documents.
Would you mind giving me some advice on which neural network architecture is good for recognizing characters?
I'm confused because I'm a newbie and there are a lot of neural network designs.
I found MNIST classifiers, but their architectures only deal with digits.
I don't know whether those architectures can work with characters or not.
Thank you!
As you correctly point out, recognizing documents is a different problem from recognizing single characters, and a complete system will take time to implement from scratch. First, there is the problem of preprocessing: you need to find where the text is, perhaps slightly rotate (deskew) it, and so on. That can be done with heuristics and a library like OpenCV. You'll also have to detect things like page numbers, headers/footers, and tables/figures.
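For the deskewing step, here is a minimal OpenCV sketch, assuming `page` is a grayscale NumPy array with dark text on a light background; note that the angle convention of `cv2.minAreaRect` changed around OpenCV 4.5, so verify the sign handling on your version:

```python
import cv2
import numpy as np

def deskew(page):
    """Estimate the dominant text skew from the ink pixels and rotate the page upright."""
    # invert and threshold so text pixels become foreground
    thresh = cv2.threshold(cv2.bitwise_not(page), 0, 255,
                           cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]               # angle of the tightest box
    angle = -(90 + angle) if angle < -45 else -angle  # pre-4.5 angle convention
    h, w = page.shape[:2]
    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
    return cv2.warpAffine(page, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)
```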
Then, in some cases, you could take the "easy" route and use heuristics to segment the text into characters. That works for block characters, but not cursive scripts.
If the segmentation is given, and you don't have to guess it, you have to solve multiple related problems: each one is like MNIST, but they are related in that the decisions are not independent. You can look up MEMM (Maximum-Entropy Markov Models) vs. HMM (Hidden Markov Models), as well as Hidden Conditional Random Fields and Segmental Conditional Random Fields, and study the differences between them. You can also read about seq2seq.
So if you're making it simple for yourself, you can essentially run MNIST-style classifiers multiple times, once the segmentation is revealed (via some heuristic in OpenCV). On top of that, you have to run a dynamic program which finds the best final sequence based on the score of each decision, together with a "language model" that assigns likelihoods to letters occurring close to each other.
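A minimal sketch of that "easy" route, assuming `char_model` is a classifier you've trained on 28x28 character crops (e.g. on EMNIST, the letters extension of MNIST) and `bigram_logp` is a (K, K) matrix of letter-bigram log-probabilities estimated from text; both names are placeholders:

```python
import cv2
import numpy as np

def segment_characters(binary_line):
    """Crude heuristic segmentation: one box per connected component, left to right."""
    contours, _ = cv2.findContours(binary_line, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = sorted((cv2.boundingRect(c) for c in contours), key=lambda b: b[0])
    crops = [cv2.resize(binary_line[y:y + h, x:x + w], (28, 28))
             for x, y, w, h in boxes]
    return np.array(crops, dtype="float32")[..., None] / 255.0

def viterbi(log_probs, bigram_logp):
    """Dynamic program: best label sequence given per-character log-scores (T, K)
    and a (K, K) matrix of bigram log-probabilities (the "language model")."""
    T, K = log_probs.shape
    score, back = log_probs[0].copy(), np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + bigram_logp + log_probs[t][None, :]
        back[t], score = cand.argmax(axis=0), cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# crops = segment_characters(line_img)
# labels = viterbi(np.log(char_model.predict(crops) + 1e-9), bigram_logp)
```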
If you're starting from scratch, it's not an easy thing. It may take months for you to get a basic understanding. Happy hacking!
I have a database full of items and I've been tasked with classifying them (they could be books, stationery, etc.). The options are either to go through 100k records manually and figure out what they are, or to automate the task.
The codes for each type of item follow some kind of pattern, so I'm hoping to use machine learning to solve this (I do not want to use regular expressions). Though I'm quite good at Python, my ML knowledge only goes as far as random forests and logistic regression.
Is this at all possible? The data looks like this:
Item   code                   type
1      4S2BDANC5L3247151      book
2      1N4AL3AP1JC236284      book
3      3R4BTTGC3L3237430      book
4      KNMAT2MT1KP546287      book
5      97806773062273208      pen
6      07356196706378892      Pen
7      97807345361169253      pen
8      01008130715194136      chair
9      01076305063010CCE44    chair
etc.
I'm happy to look up and learn whatever is necessary; I just don't know where to start.
Thanks!
I understand that you have 100k examples. You could use RNN, LSTM, or attention-based deep learning methods, because those models can track the patterns in the codes. Classical machine learning models can also solve this problem. In the end, your problem involves a specific type of pattern for each class, so the classes should be separable.
1) You need to start by finding an embedding to represent your codes. I guess you can use the ASCII codes of the numbers and letters. To make all vectors the same length, use padding, then normalize the values to lie between 0 and 1.
2) Then my advice is to start with an SVM using a one-vs-all strategy for multi-class classification (see the sketch at the end of this answer). After that you can try XGBoost, which is a powerful ML model, or you can start with even more basic ML models. The idea is to start simple and work up to more complicated models.
3) If ML models are not enough for the task, move on to basic RNN models.
I don't know your data distribution among classes or the number of classes, but if the data is balanced and each class has enough examples, I guess you can easily automate this task.
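A minimal sketch of steps 1 and 2, with made-up codes standing in for your 100k records (scikit-learn's LinearSVC uses the one-vs-rest strategy by default):

```python
import numpy as np
from sklearn.svm import LinearSVC

def encode(codes, max_len=20):
    """Pad/truncate each code to max_len characters and map to scaled ASCII values."""
    X = np.zeros((len(codes), max_len), dtype="float32")
    for i, code in enumerate(codes):
        vals = [ord(c) for c in code[:max_len]]
        X[i, :len(vals)] = vals
    return X / 127.0  # printable ASCII, scaled into [0, 1]

# tiny stand-in for your labelled records
codes = ["4S2BDANC5L3247151", "97806773062273208", "01008130715194136"]
labels = ["book", "pen", "chair"]

clf = LinearSVC().fit(encode(codes), labels)
print(clf.predict(encode(["1N4AL3AP1JC236284"])))  # hopefully "book"
```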
Hello, I'm currently working on a project where I have to use instance segmentation on different parts of seedlings (the top part and the stem).
Example image:
https://imgur.com/kWAZBed
I have to be able to calculate the angle of the hook for every seedling.
I've heard that the Mask-RCNN instance segmentation method might not be good for biological images, so should I go with U-Net semantic segmentation instead? The problem with U-Net is that every seed and root gets lumped into two classes, whereas I need to calculate the angle for each instance.
Some input would be appreciated.
You should start with whichever network is easiest for you to get off the ground and see if it's good enough. If not, try another model and see if it's good enough.
You can only go so far in choosing a network architecture for a new image use case. Sometimes you just have to try a few on the new type of image data and see which performs best.
Because your time is valuable, I would recommend starting with the simplest/fastest model for you to use, and trying a "trickier" one only if the first one isn't good enough.
I must add that it's kind of difficult to understand all of the nuances of your requirements just from the one image you posted...
Good luck!
I would like to understand machine learning techniques better. I have read and watched a bunch of things on Python, sklearn, and supervised feed-forward nets, but I am still struggling to see how I can apply all this to my project and where to start. Maybe it is a little too ambitious yet.
I have an algorithm which generates nice patterns as binary inputs in a CSV file. The goal is to predict the next row.
The simplified logic of this algorithm is that the prediction for the next line (the top line being the most recent one) would be 0,0,1,1,1,0, and the one after that would either become 0,0,0,1,1,0 or go back to its previous step, 0,1,1,1,0. However, as you can see, the model is slightly more complex and noisy, which is why I would like to introduce some machine learning here. I am aware that to get a reliable prediction I will need to introduce other relevant inputs afterwards.
Would someone please help me get started and stand on my feet here?
I don't like throwing this out here without being able to provide a single piece of code, but I am slightly confused about where to start.
Should I pass each (line-1) as an input vector, with the associated output being the top line? Should I build the array manually from my whole dataset?
I guess I have to use the sigmoid function, and Python seems the most common way to go, but for the synapses (weights), I understand I also need to provide a constant; should this be 1?
Finally, assuming you want this to run continuously, what would be required?
Please share with me any readings or simpler tasks that could help me increase my knowledge of all this.
Many thanks.
This is a fairly straightforward question, but I am new to the field. Using this tutorial, I have a great way of detecting certain patterns or features. However, the images I'm testing are large, and often the feature I'm looking for occupies only a small fraction of the image. When I run it on the entire picture the classification is bad, though when the image is zoomed in and cropped the classification is good.
I've considered writing a script that breaks an image into many smaller images and runs the test on each of them (time isn't a huge concern). However, this still seems inefficient and less than ideal. I'm wondering about suggestions for the best, but also easiest to implement, solution.
I'm using Python.
This may seem to be a simple question, which it is, but the answer is not so simple. Localisation is a difficult task and requires much more legwork than classifying an entire image. There are a number of different tools and models that people have experimented with. Some models include R-CNN, which looks at many regions in a manner not too dissimilar to what you suggested. Alternatively, you could look at a model such as YOLO or TensorBox.
There is no one answer to this, and this gets asked a lot! For example: Does Convolutional Neural Network possess localization abilities on images?
The term you want to look for in research papers is "localization". If you are after a quick-and-dirty solution (and it's not time sensitive), then sliding windows is definitely a first step; a minimal sketch follows. I hope this gets you going in your project and you can progress from there.
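A minimal sliding-window sketch, assuming `classify(patch)` is the classifier you already have from the tutorial and that it returns a confidence score (both the name and the score semantics are placeholders):

```python
import numpy as np

def sliding_windows(image, window=64, stride=32):
    """Yield (x, y, patch) for every window position over the image."""
    H, W = image.shape[:2]
    for y in range(0, H - window + 1, stride):
        for x in range(0, W - window + 1, stride):
            yield x, y, image[y:y + window, x:x + window]

# find the window your classifier is most confident about:
# best_x, best_y, best_patch = max(sliding_windows(img), key=lambda t: classify(t[2]))
```

Running this at a couple of window sizes approximates a crude image pyramid, which helps when the feature's scale is unknown.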
As you may have heard, there is an online font recognition service called WhatTheFont.
I'm curious about the tech behind this tool. I think we can basically separate it into two parts:
Generate images from font files of various formats (see http://www.fileinfo.com/filetypes/font for a list of font file extensions).
Compare the submitted image with all generated images.
I would appreciate it if you could share some advice or Python code to implement the two steps above.
As the OP states, there are two parts (and probably also a third part):
Use PIL to generate images from fonts.
Use an image analysis toolkit, like OpenCV (which has Python bindings), to compare different shapes. There are a variety of standard techniques for comparing objects to see whether they're similar. For example, scale-invariant moments work fairly well and are part of the OpenCV toolkit.
Most of the standard tools in #2 are designed to look for similar but not necessarily identical shapes, but for font comparison this might not be what you want, since the differences between fonts can come down to very fine details. For fine-detail analysis, try comparing the x and y profiles of a perimeter path around each letter, appropriately normalized, of course. (This, or a more mathematically complicated variant of it, has been used with good success in font analysis.)
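A sketch of steps #1 and #2 together: render a glyph from a font file with PIL, then compare two glyph images with cv2.matchShapes, which is based on Hu's scale-invariant moments (a lower score means more similar). The font paths are placeholders:

```python
import numpy as np
import cv2
from PIL import Image, ImageDraw, ImageFont

def render_glyph(font_path, char="A", size=64):
    """Draw one character from a .ttf/.otf file onto a white canvas."""
    font = ImageFont.truetype(font_path, size)
    img = Image.new("L", (size * 2, size * 2), 255)
    ImageDraw.Draw(img).text((size // 2, size // 2), char, font=font, fill=0)
    return np.array(img)

def shape_distance(glyph_a, glyph_b):
    """Hu-moment comparison of two binarized glyph images."""
    _, a = cv2.threshold(glyph_a, 128, 255, cv2.THRESH_BINARY_INV)
    _, b = cv2.threshold(glyph_b, 128, 255, cv2.THRESH_BINARY_INV)
    return cv2.matchShapes(a, b, cv2.CONTOURS_MATCH_I1, 0.0)

# d = shape_distance(render_glyph("Arial.ttf"), render_glyph("Times.ttf"))
```

As noted above, moments capture coarse shape, so expect this to narrow candidates down rather than pick the exact font.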
I can't offer Python code, but here are two possible approaches.
"Eigen-characters." In face recognition, given a large training set of normalized facial images, you can use principal component analysis (PCA) to obtain a set of "eigenfaces" which, when the training faces are projected upon this subspace, exhibit the greatest variance. The "coordinates" of the input test faces with respect to the space of eigenfaces can be used as the feature vector for classification. The same thing can be done with textual characters, i.e., many versions of the character 'A'.
Dynamic Time Warping (DTW). This technique is sometimes used for handwritten character recognition. The idea is that the trajectory taken by the tip of a pencil (i.e., dx/dt, dy/dt) is similar for similar characters, and DTW provides invariance to some of the variation across instances of a single person's writing. Similarly, the outline of a character can be treated as a trajectory, and that trajectory then becomes the feature vector for each font set. I guess the DTW part is less necessary for font recognition, because a machine creates the characters rather than a human, but it may still be useful for resolving spatial ambiguities.
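A minimal DTW sketch: the distance between two outline trajectories, each given as an (n, 2) array of (x, y) points along a character's perimeter:

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two 2-D point sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local point distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# trajectories sampled at different rates still match closely:
t = np.linspace(0, 2 * np.pi, 60)
circle = np.c_[np.cos(t), np.sin(t)]
print(dtw(circle, circle[::2]))  # small distance despite different lengths
```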
This question is a little old, so here goes an updated answer.
You should take a look at the paper DeepFont: Identify Your Font from An Image. Basically, it's a neural network trained on tons of images. It was presented commercially in this video.
Unfortunately, there is no official code available. However, there is an independent implementation available here. You'll need to train it yourself, since the weights are not provided, but the code is really easy to follow. Also note that this implementation covers only a few fonts.
There is also a link to the dataset and a repo to generate more data.
Hope it helps.