Simple object recognition - python

===SOLVED===
Thanks for your suggestions and comments. By working from the flood_fill algorithm given in the book Beginning Python Visualization (Chapter 9 - Image Processing) I have implemented what I wanted: I can count the objects, get an enclosing rectangle for each object (and therefore its height and width), and finally construct NumPy arrays or matrices for each of them.
Although it is not an optimized approach, it does what I want. The source code (lab2.py) and the PNG file (lab2-particles.png) that I use have been put under http://code.google.com/p/ccnworks/source/browse/#svn/trunk/AtSc450.
You need NumPy and PIL installed, and matplotlib to see the histogram. The core of the code lies within the objfind function, where the main recursive object search occurs.
One further update:
SciPy's ndimage.label() does exactly what I want, too.
Cheers to David Warde-Farley and Zachary Pincus from the NumPy and SciPy mailing lists for pointing me right at it :)
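For anyone finding this later, a rough sketch of that ndimage-based route (illustrative only, not the exact lab2.py code) looks something like this:
import numpy as np
from scipy import ndimage
from PIL import Image

# Load the particle image and threshold it into a binary mask
# (the threshold value here is only an assumption for illustration).
img = np.asarray(Image.open('lab2-particles.png').convert('L'))
binary = img < 128                       # particle shadows are darker than the background

# Label connected groups of pixels; 'labeled' holds one integer id per object
labeled, n_objects = ndimage.label(binary)
print(n_objects, 'objects found')

# find_objects gives one pair of bounding slices per object, so each object
# can be cut out as its own small 0/1 NumPy array
for sl in ndimage.find_objects(labeled):
    obj = binary[sl].astype(int)
    height, width = obj.shape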
=============
Hello,
I have an image that contains the shadows of ice particles measured by a particle spectrometer. I want to be able to identify each object, so that I can later classify and use them further in my calculations.
In essence, what I want to do is implement something like a fuzzy selection tool, so that I can simply select each entity.
How could I easily solve this problem? (Preferably using Python)
Thanks.
NOTE: In my question I am referring to each specific group of connected pixels as an object or entity. My intention is to extract them and create NumPy array representations as shown below. (Here I am using the top-left object: where a pixel exists use 1, otherwise use 0. This object's shape is 3 by 3, corresponding to a height of 3 pixels and a width of 3 pixels. These are projections of real ice particles onto a 2D domain; assuming they are spherical, the equivalent radius is (height+width)/2, and later some scalings --from pixels to actual sizes-- and volume calculations will follow.)
import numpy as np
np.array([[1,1,1], [1,1,1], [0,0,1]])
array([[1, 1, 1],
       [1, 1, 1],
       [0, 0, 1]])
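As a small worked example of that size bookkeeping (the pixel-to-size scale factor here is only a placeholder):
import numpy as np

obj = np.array([[1, 1, 1], [1, 1, 1], [0, 0, 1]])
height, width = obj.shape                  # 3 by 3
r_pixels = (height + width) / 2.0          # equivalent radius = (height+width)/2 = 3 pixels
pixel_size = 1.0                           # placeholder: real-world size of one pixel
r = r_pixels * pixel_size
volume = 4.0 / 3.0 * np.pi * r ** 3        # sphere volume from the equivalent radius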
Here is a section from the image which I am going to use.
screenshot http://img43.imageshack.us/img43/2327/particles.png

1. Scan every square (e.g. from the top-left, left-to-right, top-to-bottom).
2. When you hit a blue square then:
a. Record this square as the location of a new object
b. Find all the other contiguous blue squares (e.g. by looking at the neighbours of this square, and the neighbours of those neighbours, etc.) and mark them as being part of the same object (see the sketch below)
3. Continue to scan.
4. When you find another blue square, test to see whether it's part of a known object before going to step 2; alternatively, in step 2b, erase any square after you've associated it with an object.
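Here's a minimal sketch of that scan-and-flood-fill idea (illustrative only, assuming the image has already been thresholded into a 2D list of 0/1 values called grid):
def label_objects(grid):
    """Label 4-connected groups of 1-pixels; returns one list of (y, x) pixels per object."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    objects = []
    for y in range(h):
        for x in range(w):
            if grid[y][x] and not seen[y][x]:
                # New object found: flood-fill from here with an explicit stack
                stack, pixels = [(y, x)], []
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                objects.append(pixels)
    return objects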

I used to do this kind of analysis on micrographs and eventually put everything I needed into an image-processing and analysis package written in C, driven via Tcl. (It worked with 512 x 512 images only, which explains why 512 crops up so often. Images with pixels of various sizes could be allocated, but most of the work was done with 8-bit pixels, which explains the business of 0xff and the maximum meaningful count of 254 on an image.)
Briefly, the 'zz' at the beginning of the Tcl commands sends the remainder of the line to the package's parser, which calls the appropriate C routine with the given arguments. Right after the 'zz' is an argument that indicates the input and output of the command. (There can be multiple inputs but only a single output.) 'r' indicates a 512 x 512 x 8-bit image. The third word is the name of the command to be invoked; 'graphs' marks up an image as described in the text below. So, 'zz rr graphs' means 'Call the ZZ parser; input an r image to the graphs command and get back an r image.' The rest of the Tcl command line specifies which of the pre-allocated images to use. (The 'g' image is an ROI, i.e., region-of-interest, image; almost all ZZ ops are done under ROI control.) So, 'r1 r1 g8' means 'Use r1 as input, use r1 as output (that is, mark up the input image itself), and do the operation wherever the corresponding pixel on image g8 --- that is, r8, used as an ROI --- is >0.'
I don't think it is available online anywhere, but if you want to pick through the source code or even compile the whole shebang, I'll be happy to send it to you. Here's an excerpt from the manual (but I think I see some errors in the manual at this late date --- that's embarrassing ...):
Example 6. Counting features.
Problem
Counting is a common task. The items counted are called “features”, and it is usually necessary to prepare images carefully so that features correspond in a one-to-one way with the real objects to be counted. Here, however, we ignore image preparation and consider, instead, the mechanics of counting. The first counting exercise is to find out how many features are in the images in the directory ./cells.
Approach
First, let us define “feature”. A feature is the largest group of “set” (non-zero) pixels all of which can be reached by travelling from one set pixel to another along north-south-east-west (up-down-right-left) routes, starting from a given set pixel. The zz command that detects and marks such features on an image is “zz rr graphs R:src R:dest G:ROI”, so called because the mathematical term for such a feature is a “graph”. If all the pixels on an image are set, then there is only a single graph on the image, but it contains 262144 (512 * 512) pixels. If pixels are set and clear (equal to zero) in a checkerboard pattern, then there will be 131072 (512 * 512 / 2) graphs, but each will contain only one pixel.
Briefly explained, “zz rr graphs” starts in the upper-left corner of an image and scans each
succeeding row left to right until it finds a set pixel, then finds all the set pixels attached to that through north, south, east, or west borders (“4-connected”). It then sets all pixels in that graph to 1 (0x01). After finding and marking graph 1, it starts scanning again at the pixel after the one where it first discovered graph 1, this time ignoring any pixels that already belong to a graph. The first 254 graphs that it finds will be marked uniquely; all graphs found after that, however, will be marked with the value 255 (0xff)
and so cannot be distinguished from each other. The key to being able to count any number of graphs accurately is to process each image in stages, that is, find the number of graphs on an image and, if the number is greater than 254, erase the 254 graphs just found, repeating the process until 254 or fewer graphs are found. The Tcl language provides the means to set up control of this operation.
Let us begin to build the commands needed. Before the processing loop, we declare and zero a variable to hold the total number of features in the image series. Within the processing loop, we begin by reading the image file into an R image and detecting and marking the graphs.
zz ur to $inDir/$img r1
zz rr graphs r1 r1 g8
Next, we zero some variables to keep track of the counts, then use the “ra max” command to find out whether more than 254 graphs were detected.
set nGraphs [ zz ra max r1 a1 g1 ]
If nGraphs does equal 255, then the 254 accurately counted graphs should be added to the total, the graphs from 1 through 254 should be erased, and the count repeated for as many times as it takes to reduce the number of graphs below 255.
while {$nGraphs == 255} {
    # 254 graphs were counted accurately, so add them to the running total
    # (sumGraphs is declared and zeroed once, before the processing loop).
    incr sumGraphs 254
    # Erase graphs 1 through 254 (every pixel marked below 255), keeping only
    # the overflow graphs marked 255. The threshold of 255 is assumed here;
    # the manual shows 155, which does not match the text above.
    zz rbr lt r1 255 r1 g1 0 255
    # Re-label and re-count whatever remains.
    zz rr graphs r1 r1 g8
    set nGraphs [ zz ra max r1 a1 g8 ]
}
When the “while” loop exits, the variable nGraphs must hold a number less than 255, that is, a number of accurately counted graphs; this is added to the rising total of the number of features in the image series.
incr sumGraphs $nGraphs
After the processing loop, print out the total number of features found in the series.
puts "Total number of features in $inDir \
images $beginImg through $endImg is $sumGraphs."

Looking at the image you provided, all you need to do next is to apply a simple region growing algorithm.
If I were using MATLAB, I would use the bwlabel/bwboundaries functions. I believe there's an equivalent function somewhere in NumPy/SciPy, or use OpenCV with Python wrappers as suggested by @kwatford.

OpenCV has a Python interface that you might find useful.

Connected component analysis may be what you are looking for.

Related

Tesseract OCR fails to detect varying font size and letters that are not horizontally aligned

I am trying to detect the text on these price labels, which is always clearly preprocessed. Although Tesseract can easily read the text written above the price, it fails to detect the price values. I am using the Python bindings (pytesseract), but it also fails when run from the CLI. Most of the time it tries to recognize the price part as one or two characters.
Sample 1:
tesseract D:\tesseract\tesseract_test_images\test.png output
And the output of the sample image is this.
je Beutel
13
However, if I crop and stretch the price so the digits look separated and are the same font size, the output is just fine.
Processed image (cropped and resized price):
je Beutel
1,89
How do get OCR tesseract to work as I intended, as I will be going over a lot of similar images?
Edit: Added more price tags:
sample5 sample6 sample7
The problem is that the image you are using is small. When Tesseract processes the image it considers '8', '9' and ',' as a single letter and thus predicts it as '3', or it may consider '8' and ',' as one letter and '9' as a different letter, and so produces the wrong output. The image shown below explains it.
A simple solution is to increase the image size by a factor of 2 or 3 (or even more, depending on the size of your original image) before passing it to Tesseract, so that it detects each letter individually, as shown below. (Here I increased the size by a factor of 2.)
Below is a simple Python script that should do the job:
import pytesseract
import cv2
img = cv2.imread('dKC6k.png')
img = cv2.resize(img, None, fx=2, fy=2)  # upscale by a factor of 2 in both directions
data = pytesseract.image_to_string(img)
print(data)
Detected text:
je Beutel
89
1.
Now you can simply extract the required data from the text and format it as per your requirement.
data = data.replace('\n\n', '\n')
data = data.split('\n')
dollars = data[2].strip(',').strip('.')
cents = data[1]
print('{}.{}'.format(dollars, cents))
Desired Format:
1.89
The problem is that the Tesseract engine was not trained to read this kind of text topology.
You can:
train your own model; in particular you'll need to provide images with variations of topology (position of characters). You can actually use the same image and shuffle the positions of the characters.
reorganize the image into clusters of text and use Tesseract; in particular, I would take the cents part and move it to the right of the comma, in which case you can use Tesseract out of the box. A few relevant criteria would be the height of the clusters (to differentiate cents from integers) and the position of the clusters (read from left to right).
In general, computer vision algorithms (including CNNs) give you tools to obtain a higher-level representation of an image (features or descriptors), but they do not create the logic or algorithm needed to process those intermediate results in a particular way.
In your case that would be:
"if the height of those letters is smaller, it's cents",
"if the height and vertical position are the same, it's the same number, either to the left or the right of the comma".
The thing is that it's difficult to reach that through training, and at the same time it's extremely simple to write as an algorithm for a human. Sorry for not giving you an actual implementation; my text is the pseudo code.
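Purely as an illustration of those two rules (not a tested solution; the boxes are assumed to come from whatever text-region detection you use, e.g. OpenCV contours):
# Hypothetical helper: split text clusters into integer part and cents
# using only their bounding boxes (x, y, w, h), following the rules above.
def split_price(boxes):
    boxes = sorted(boxes, key=lambda b: b[0])               # left-to-right reading order
    max_h = max(h for (_, _, _, h) in boxes)
    integers = [b for b in boxes if b[3] >= 0.8 * max_h]    # full-height digits
    cents = [b for b in boxes if b[3] < 0.8 * max_h]        # smaller digits -> cents
    return integers, cents

# Example with made-up boxes: a tall '1' followed by two smaller digits '8' and '9'
boxes = [(10, 20, 15, 40), (30, 20, 12, 25), (44, 20, 12, 25)]
print(split_price(boxes))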
TrainingTesseract2
TrainingTesseract4
Joint Unsupervised Learning of Deep Representations and Image Clusters

How to properly configure coloring for mandelbrot set using python?

Looking for Python training exercises, I decided to draw the Mandelbrot set using a script. Drawing it wasn't too complicated, so I decided to use color, and I discovered the smooth coloring algorithm. Using this question I was able to render something really beautiful and similar to this one.
To achieve that I set up a color gradient palette in three "steps": from dark blue to light blue, then light blue to yellow, and finally yellow to dark brown. The overall image is perfect.
The problem comes when I try to zoom in. Let's take the example of this area. At this level of zoom, my script doesn't draw dark blue anymore. I think I miscoded something, because wherever you see dark blue on the Wikipedia image, I get dark brown (a color near the end of my palette). When I first thought about this, I told myself that if the pattern goes back to the original one, then it should use the same colors, because the escape time should be the same.
So, was this coloring configured in the palette, or is there something about escape time that I didn't understand?
Here is the code I use for the coloring :
def color_pixel(n, z):
    smoothcolor = n + 1 - math.log(math.log(abs(z)))/math.log(2)
    f = smoothcolor/iterate_max
    i = int(f*500)
    color = palette[i]
    return color
500 is the number of colors in my palette (len(palette)-1).
z is the value of z when it escaped above 10.
I use 100 as the max iterations, but I get the same results with a higher value.
Thanks!
My colouring method is to use a rotating palette in three sections. First blue cross-fades to green without using red, then green to red without using blue, and finally red to (almost) blue with no green, where the next iteration level wraps back to pure blue at the bottom of the array by using a modulus of the iterations.
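A minimal sketch of what such a three-section palette could look like (illustrative only, not the code actually used):
def make_palette(section=256):
    # blue -> green (no red), green -> red (no blue), red -> (almost) blue (no green)
    palette = []
    for i in range(section):
        palette.append((0, i, 255 - i))      # blue to green
    for i in range(section):
        palette.append((i, 255 - i, 0))      # green to red
    for i in range(section):
        palette.append((255 - i, 0, i))      # red back towards blue
    return palette

palette = make_palette()
n = 1234                                     # example iteration count for a pixel
color = palette[n % len(palette)]            # the modulus wraps back round to blue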
However, when I made a supposedly smooth real-time zoom (by storing the data with a doubling scale and then in-betweening 16 frames by interpolation for playback), I found that in the neighbourhood of the M-set, where the contours look chaotic, the effect was messy, as the colours tend to dance around. There I used a different scheme, bending the colours towards a gray scale.
My final colouring method was to use the rotating palette for pixels having one or more neighbours of the same depth, but tending towards mid-gray depending on how many neighbours were different. Bear in mind, though, that the requirements for a moving image are different from those for a static image, and sharp detail is not necessarily desirable.
At deep zooms the number of iterations needed to extract the detail can be 1000 or more. I solved the problem laterally. I do not brute-force the map calculations. I developed a curve-stitching method that follows the contour of an iteration level, and then fills the region. In the smoothly changing areas that means large areas do not have to be iterated. Similarly for the M-Set itself where the function has not escaped - I avoid iterating there as far as possible by again trying to follow round its edge and then filling. This method can suffer from nipping off some detail, but the speed gain is enormous. In the chaotic region near the edge of the M-Set my method was to just iterate at every pixel.
I'm also looking into this now as well (the coloring scheme). Since the image was made using Ultra Fractal 3, I looked into that program, poked around, and finally found the details, which are slightly different from what you and the wiki are doing. It's written in some custom scripting language, but hopefully you can understand what it's doing. Here's the code:
Smooth(OUTSIDE) {
;
; This coloring method provides smooth iteration
; colors for Mandelbrot and other z^2 formula types
; (Phoenix, Julia). Results on other types may be
; unpredictable, but might be interesting.
;
; Thanks to F. Slijkerman for some tweaks.
; Thanks to Linas Vepstas for the math.
;
; Written by Damien M. Jones
;
init:
complex il = 1/log(#power) ; Inverse log (power).
float lp = log(log(#bailout)) ; log(log bailout).
final:
#index = 0.05 * real(#numiter + il*lp - il*log(log(cabs(#z))))
default:
title = "Smooth (Mandelbrot)"
helpfile = "Uf*.chm"
helptopic = "Html/coloring/standard/smooth.html"
$IFDEF VER50
rating = recommended
$ENDIF
param power
caption = "Exponent"
default = (2,0)
hint = "This should be set to match the exponent of the \
formula you are using. For Mandelbrot, this is usually 2."
endparam
param bailout
caption = "Bail-out value"
default = 128.0
min = 1
hint = "This should be set to match the bail-out value in \
the Formula tab. This formula works best with bail-out \
values higher than 100."
endparam
}
My math isn't good enough to know how to compute the log of a complex number, so I'm stuck at the moment in going further with this, but I thought I'd share what I've found on the topic.
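For what it's worth, Python's cmath module does provide the complex logarithm (cmath.log(w) returns ln|w| + i*arg(w)), so the final: block above translates fairly directly. The following is only a sketch of that translation, not verified against Ultra Fractal's output:
import cmath

def smooth_index(n, z, power=2 + 0j, bailout=128.0):
    # n is the iteration count, z the value of z at escape (both assumed given)
    il = 1.0 / cmath.log(power)               # inverse log of the exponent (may be complex)
    lp = cmath.log(cmath.log(bailout))        # log(log(bailout))
    # cabs(#z) is |z|; abs() already handles complex z in Python
    index = n + il * lp - il * cmath.log(cmath.log(abs(z)))
    return 0.05 * index.real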

Cases where Morphological Opening and Closing yields the same results?

I would like to know if there are any examples or cases where Opening and Closing morphology operations on a single image produce the same results.
As an example, let's say we have an image X, and we have done opening operation to produce Y. Similarly, we have done a closing operation on the original X to produce the same Y. I would like to know if there are examples for these type of images X. Programming examples in Python or MATLAB are also appreciated.
Yes, there are. As one small example, consider a binary image consisting of a bunch of squares that are disconnected and distinct. Provided that you specify a structuring element that is square, and choose it to be smaller than the smallest square in the image, doing either operation will give you the same results.
If you did an opening on this image and a closing on this image, you will produce the same results. Remember, an opening is an erosion followed by a dilation where a closing is a dilation followed by an erosion. In terms of analyzing the shapes, erosion slightly shrinks the area of the image while dilation slightly enlarges it.
By doing an erosion followed by a dilation (opening), you're shrinking the object and then growing it again. This will bring the image back to where it was before, provided that you choose the structuring element as we discussed before. Similarly, if you did a dilation followed by an erosion (closing), you're growing the object and then shrinking it again, also bringing the image back to where it was before... following that same guideline, of course.
If you were to choose a structuring element where it is larger than the smallest object, doing an opening will remove this object from the image, and so you won't get the original image back. Also, you need to make sure that the objects are well far away from each other, and that the size of the structuring element does not overlap any of the objects as you slide over and do the morphology operations. The reason why is because if you were to do a closing, you would join these two objects together and so that won't get you the same results either!
Here's an example image that I generated that is binary:
To generate this image in MATLAB, you can do:
A = false(200,200);
A(30:60,30:60) = true;
A(90:110,90:110) = true;
A(10:30, 135:155) = true;
A(150:180,100:120) = true;
In Python, you can do this with numpy:
import numpy as np
A = np.zeros((200,200), dtype='uint8')
A[29:60,29:60] = 255
A[89:110,89:110] = 255
A[9:30, 134:155] = 255
A[149:180, 99:120] = 255
The reason why I had to create the array as uint8 in numpy is because when we want to show this image, I'm going to use OpenCV and it requires that the image be at least a uint8 type.
Now, let's choose a 5 x 5 square structuring element, and let's perform a closing and an opening with this image. We will display the results in a single figure going from left to right:
se = strel('square', 5);
A_close = imclose(A, se);
A_open = imopen(A, se);
figure;
subplot(1,3,1);
imshow(A);
title('Original');
subplot(1,3,2);
imshow(A_close);
title('Closed');
subplot(1,3,3);
imshow(A_open);
title('Open');
This is the result:
It certainly looks the same! To really show the difference, let's subtract the closed and opened result from the original image. You should get a blank image in the end if they're both equal to the original image.
figure;
subplot(1,2,1);
imshow(abs(double(A) - double(A_close)));
subplot(1,2,2);
imshow(abs(double(A) - double(A_open)));
Bear in mind that I converted the images to double to facilitate subtraction, and I used abs to ensure that negative differences are reflected. This is what I get:
As you can see, both results are totally blank, meaning they're exact copies of the original image after each result.
The equivalent code in Python for the first part is the following:
import cv2
se = np.ones((5,5), dtype='uint8')
A_close = cv2.morphologyEx(A, cv2.MORPH_CLOSE, se)
A_open = cv2.morphologyEx(A, cv2.MORPH_OPEN, se)
cv2.imshow('Original', A)
cv2.imshow('Close', A_close)
cv2.imshow('Open', A_open)
cv2.waitKey(0)
cv2.destroyAllWindows()
Here's what I get:
You'll need to install the OpenCV package for this Python code. I displayed all of the images as three separate figures, then left the windows there until you choose any one of them and push a key. Once you do this, all of the windows will close. If you want to show the subtraction stuff, this is the code in Python:
A_close_diff = A - A_close
A_open_diff = A - A_open
cv2.imshow('Close Diff', A_close_diff)
cv2.imshow('Open Diff', A_open_diff)
cv2.waitKey(0)
cv2.destroyAllWindows()
I didn't name the figures in MATLAB because what we're showing is obvious, but for OpenCV you need to name the windows, so I gave each one a name that describes what it shows. I also didn't need to take the absolute value, because in numpy, arithmetic operations that overflow or underflow simply wrap around, while in MATLAB the values get clipped. That's why in MATLAB I needed to convert to double and take the absolute value: imshow doesn't display negative intensities, and if we had a situation where we did 0 - 1, the output would be 0 and you wouldn't be able to show that this location has a difference. With Python, doing 0 - 1 for uint8 results in 255, so we can certainly see a difference here... so there's no need for any of the abs and casting stuff that we did in MATLAB. Here's what I get:
In general, you can reproduce what I did with any kind of shape and any size shape, so long as you choose a structuring element that mimics the properties of the shape that is in your image, and you choose a structuring element that is smaller than the smallest shape seen in that image. I'm sure there are many more examples that don't have to follow these specific guidelines, but this is the best example that I can think of at this moment.
This should hopefully get you started.
Good luck!
Yes, there are such images. One of the properties of opening (mentioned in the wiki article, for example) is that it is an anti-extensive operation, i.e. if Y is the opening of X, then Y ⊆ X. Similarly, closing is an extensive operation (i.e. X ⊆ Y). So if opening and closing the same X both produce the same Y, then Y ⊆ X ⊆ Y, hence X = Y. In other words, any image invariant to both opening and closing will satisfy your requirement (and, as just shown, only such images will).
Concrete examples depend on the structuring element used when performing the erosion or dilation. For example, if it is a square n x n matrix with all elements equal to 1, then any rectangle with both height and width greater than n (and located far enough, i.e. at least n/2 pixels, from the image edges) will satisfy this requirement.
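A quick sketch checking that with OpenCV (one 40 x 40 rectangle and a 5 x 5 all-ones structuring element; the sizes are arbitrary):
import numpy as np
import cv2

X = np.zeros((200, 200), dtype='uint8')
X[80:120, 80:120] = 255                      # one rectangle, well away from the edges

se = np.ones((5, 5), dtype='uint8')          # n x n square of ones, n = 5 < 40
opened = cv2.morphologyEx(X, cv2.MORPH_OPEN, se)
closed = cv2.morphologyEx(X, cv2.MORPH_CLOSE, se)

# Both results should be identical to the original, i.e. X = Y for both operations
print(np.array_equal(opened, X), np.array_equal(closed, X))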

Detect the size of a QR Code in Python using OpenCV and Zbar

I have code that takes an image from the webcam, scans it for QR codes using zBar and returns the value of the code and an image with the QR code highlighted (based off http://sourceforge.net/p/qrtracker/wiki/Home/). How can I also make it tell me the size of the code (as a pixel value or % of the screen)?
Additional question: is there a way to detect how skewed it is (e.g. rotation in Z about the Y-axis)?
Regarding the size of Code
zBar provides a method to do this in terms of pixel values (once you know the size in pixel values, you can express it as a % of the image).
I would like to extend the code here: http://sourceforge.net/apps/mediawiki/zbar/index.php?title=HOWTO:_Scan_images_using_the_API
The code above finds a QR code in an image, prints its data, etc. Now, extending the last few lines of that code:
import math

scanner.scan(image)
for x in image:                                       # x is one detected symbol
    [a, b, c, d] = x.location                         # the four corners of the QR code, in order
    w = math.sqrt((a[0]-b[0])**2 + (a[1]-b[1])**2)    # distance between two adjacent corners
    h = math.sqrt((b[0]-c[0])**2 + (b[1]-c[1])**2)
    Area = w*h                                        # size in pixels; divide by the image area for a %
Skewness of QRCode
I think you want to transform it into a pre-defined shape (like a square, rectangle, etc). If so, you can define the corners of a pre-defined shape, say ((100,100), (300,100), (300,300), (100,300)), then find the perspective transform and apply it. An example in OpenCV is provided here: http://docs.opencv.org/trunk/doc/py_tutorials/py_imgproc/py_geometric_transformations/py_geometric_transformations.html#perspective-transformation
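Roughly, that could look like this (a sketch only; the file name and the example corner coordinates are hypothetical):
import numpy as np
import cv2

img = cv2.imread('qr.png')                          # hypothetical input image
# a, b, c, d would be the four corners from x.location above; example values:
a, b, c, d = (410, 120), (720, 130), (715, 440), (405, 430)

src = np.float32([a, b, c, d])
dst = np.float32([[100, 100], [300, 100], [300, 300], [100, 300]])   # pre-defined square

M = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(img, M, (400, 400))    # un-skewed, front-on view of the code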

Drawing fast lines in pygame

I'm trying to draw fast lines using pygame that aren't rendered directly to the screen. I've got a Python list as large as the number of pixels for the desired resolution, in which I store integer values corresponding to the number of times that pixel was hit by the line algorithm. Using this, a 2D heat map is built up: rather than drawing a flat pixel value, pixel values are incremented based on the number of times a line runs through them, and "hot" pixels get brighter colours.
The reason for doing it this way is that we don't know in advance how many of these lines are going to get drawn, and what the maximum number of times any given pixel is going to be hit. Since we'd like to scale the output so that each rendering has the correct maximum and minimum RGB values, we can't just draw to the screen.
Is there a better way to draw these lines than a relatively naive Bresenham's algorithm? Here's the critical part of the drawLine function:
# before the loop, to save repeated multiplications
xm = []
for i in range(resolution[0]):
    xm.append(i * resolution[0])

# inside of drawLine, index into the f list, of size resolution[0] * resolution[1]
for x in range(x0, x1 + 1):
    if steep:
        idx = y + xm[x]
        f[idx] += 1
    else:
        idx = x + xm[y]
        f[idx] += 1
The end result is scaled and drawn to the screen based on the maximum value inside of f. For example, if the maximum value is 1000, then you can assume the RGB value of each of the pixels is (f[i] * 255) / 1000.
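For concreteness, the final scaling step might look something like this (my own sketch; resolution and the layout of f follow the snippet above):
import numpy as np

resolution = (640, 480)                       # (width, height), for example
f = [0] * (resolution[0] * resolution[1])     # hit counts, filled in by drawLine
# ... lines get drawn here ...
arr = np.asarray(f).reshape(resolution[1], resolution[0])   # f[idx] with idx = x + y*width
fmax = max(arr.max(), 1)
pixels = arr * 255 // fmax                    # e.g. with a max of 1000, f[i] maps to (f[i]*255)/1000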
The profile information says that runtime is dominated by the index lookups into f. I've used previous questions here to prove that these basic lists are faster than numpy arrays or arrays in Python, but for drawing lines like this, it still seems like there's room to improve.
What's a good and fast method for drawing an unknown number of lines to the screen, knowing that you'll be scaling the output in the end to render to the screen? Is there a good way to get rid of the index overhead?
Try Cython or something similar. (If you do, I would be interested in knowing if/how much that helped)
Cython is a programming language to simplify writing C and C++ extension modules for the CPython Python runtime. Strictly speaking, Cython syntax is a superset of Python syntax additionally supporting: direct calling of C functions, or C++ functions/methods, from Cython code; strong typing of Cython variables, classes, and class attributes as C types. Cython compiles to C or C++ code rather than Python, and the result is used as a Python extension module or as a stand-alone application embedding the CPython runtime.
(http://en.wikipedia.org/wiki/Cython)
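As a rough, untested sketch of what the inner loop above might look like in Cython (hypothetical module name; f would need to be a typed buffer such as a numpy int32 array rather than a plain list):
# lines.pyx (hypothetical file name)
# cython: boundscheck=False, wraparound=False
def draw_line_span(int[:] f, int x0, int x1, int y, int width, bint steep):
    # Same loop as the Python version, but with C-typed locals so the
    # index arithmetic and the increment compile down to plain C.
    cdef int x, idx
    for x in range(x0, x1 + 1):
        idx = y + x * width if steep else x + y * width
        f[idx] += 1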
