I'm attempting to do some image analysis using OpenCV in python, but I think the images themselves are going to be quite tricky, and I've never done anything like this before so I want to sound out my logic and maybe get some ideas/practical code to achieve what I want to do, before I invest a lot of time going down the wrong path.
This thread comes pretty close to what I want to achieve, and in my opinion, uses an image that should be even harder to analyse than mine. I'd be interested in the size of those coloured blobs though, rather than their distance from the top left. I've also been following this code, though I'm not especially interested in a reference object (the dimensions in pixels alone would be enough for now and can be converted afterwards).
Here's the input image:
What you're looking at are ice crystals, and I want to find the average size of each. The boundaries of each are reasonably well defined, so conceptually this is my approach, and would like to hear any suggestions or comments if this is the wrong way to go:
The image in RGB is imported and converted to 8-bit grayscale (32-bit would be better based on my testing in ImageJ, but I haven't figured out how to do that in OpenCV yet).
The image is optionally Gaussian blurred to remove noise.
A Canny edge detector picks up the lines
Morphological transforms (erosion + dilation) are done to attempt to close the boundaries a bit further.
At this point it seems like I have a choice to make. I could either binarize the image and measure blobs above a threshold (i.e. max-value pixels if the blobs are white), or continue with the edge detection by closing and filling contours more fully. Contours seem complicated, though, looking at that tutorial, and although I can get the code to run on my images, it doesn't detect the crystals properly (unsurprisingly). I'm also not sure whether I should apply the morphological transforms before binarizing too.
Assuming I can get all that to work, I'm thinking a reasonable measure would be the longest axis of the minimum enclosing box or ellipse (see the sketch below).
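To make this concrete, here's a rough, untested sketch of the pipeline I have in mind; the filename, kernel sizes and Canny thresholds are placeholders I haven't tuned:
import cv2
import numpy as np
img = cv2.imread('crystals.png')                       # placeholder filename
# 1. Convert to grayscale (keep a float32 copy if more precision is wanted)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray32 = gray.astype(np.float32)
# 2. Optional Gaussian blur to suppress noise
blur = cv2.GaussianBlur(gray, (5, 5), 0)
# 3. Canny edge detection (thresholds are guesses)
edges = cv2.Canny(blur, 50, 150)
# 4. Morphological closing to join broken boundaries
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
# 5. Measure each blob by the longest side of its minimum-area rectangle
#    ([-2] grabs the contour list in both OpenCV 3.x and 4.x)
contours = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
sizes = [max(cv2.minAreaRect(c)[1]) for c in contours]
# Alternatively, the major axis of a fitted ellipse (needs >= 5 points):
# sizes = [max(cv2.fitEllipse(c)[1]) for c in contours if len(c) >= 5]
print('mean crystal size (px):', np.mean(sizes) if sizes else 'no blobs found')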
I haven't quite ironed out all the thresholds yet, and consequently some of the crystals are missed, but since they're being averaged, this isn't presenting a massive problem at the moment.
The script stores the processed images as it goes along, so I'd also like the final output image similar to the 'labelled blobs' image in the linked SO thread, but with each blob annotated with its dimensions maybe.
Here's what an (incomplete) idealised output would look like, each crystal is identified, annotated and measured (pretty sure I can tackle the measurement when I get that far).
Abridged the images and previous code attempts as they are making the thread overly long and are no longer that relevant.
Edit III:
As per the comments, the watershed algorithm looks to be very close to achieving what I'm after. The problem here though is that it's very difficult to assign the marker regions that the algorithm requires (http://docs.opencv.org/3.2.0/d3/db4/tutorial_py_watershed.html).
I don't think this is something that can be solved with thresholds during the binarization process, as the apparent colour of the grains varies much more than in the toy example in that thread.
Edit IV:
Here are a couple of the other test images I've played with. It fares much better than I expected with the smaller crystals, and there's obviously a lot of finessing that could be done with the thresholds that I haven't tried yet.
Here's the first one; top left to bottom right correspond to the images output in Alex's steps below.
And here's a second one with bigger crystals.
You'll notice these tend to be more homogeneous in colour, but with harder-to-discern edges. Something I found a little surprising is that the edge flood-filling is a little overzealous with some of the images; I would have thought this would be particularly the case for the image with the very tiny crystals, but actually it appears to have more of an effect on the larger ones. There is probably a lot of room to improve the quality of the input images from our actual microscopy, but the more 'slack' the programming can take up from the system, the easier our lives will be!
As I mentioned in the comments, watershed looks to be an ok approach for this problem. But as you replied, defining the foreground and the background for the markers is the hard part! My idea was to use the morphological gradient to get good edges along the ice crystals and work from there; the morphological gradient seems to work great.
import numpy as np
import cv2
img = cv2.imread('image.png')
blur = cv2.GaussianBlur(img, (7, 7), 2)
h, w = img.shape[:2]
# Morphological gradient
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
gradient = cv2.morphologyEx(blur, cv2.MORPH_GRADIENT, kernel)
cv2.imshow('Morphological gradient', gradient)
cv2.waitKey()
From here, I binarized the gradient using some thresholding. There's probably a cleaner way to do this...but this happens to work better than the dozen other ideas I tried.
# Binarize gradient
lowerb = np.array([0, 0, 0])
upperb = np.array([15, 15, 15])
binary = cv2.inRange(gradient, lowerb, upperb)
cv2.imshow('Binarized gradient', binary)
cv2.waitKey()
Now we have a couple of issues with this. It needs some cleaning up as it's messy, and furthermore, the ice crystals that touch the edge of the image are showing up, but we don't know where those crystals actually end, so we should ignore them. To remove those from the mask, I looped through the pixels on the edge and used floodFill() to remove them from the binary image. Don't get confused here about the order of rows and columns; the if statements specify rows and columns of the image matrix, while the input to floodFill() expects points (i.e. x, y form, which is the opposite of row, col).
# Flood fill from the edges to remove edge crystals
for row in range(h):
    if binary[row, 0] == 255:
        cv2.floodFill(binary, None, (0, row), 0)
    if binary[row, w-1] == 255:
        cv2.floodFill(binary, None, (w-1, row), 0)
for col in range(w):
    if binary[0, col] == 255:
        cv2.floodFill(binary, None, (col, 0), 0)
    if binary[h-1, col] == 255:
        cv2.floodFill(binary, None, (col, h-1), 0)
cv2.imshow('Filled binary gradient', binary)
cv2.waitKey()
Great! Now just to clean this up with some opening and closing...
# Cleaning up mask
foreground = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
foreground = cv2.morphologyEx(foreground, cv2.MORPH_CLOSE, kernel)
cv2.imshow('Cleaned up crystal foreground mask', foreground)
cv2.waitKey()
So this image was labeled as "foreground" because it contains the sure foreground of the objects we want to segment. Now we need to create a sure background for the objects. I did this in the naïve way, which is just to grow the foreground a bunch, so that the objects are probably all contained within that grown region. However, you could probably use the original mask or even the gradient in a different way to get a better definition. Still, this works OK, but is not very robust.
# Creating background and unknown mask for labeling
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (17, 17))
background = cv2.dilate(foreground, kernel, iterations=3)
unknown = cv2.subtract(background, foreground)
cv2.imshow('Background', background)
cv2.waitKey()
So all the black there is "sure background" for the watershed. Also I created the unknown matrix, which is the area between foreground and background, so that we can pre-label the markers that get passed to watershed as "hey, these pixels are definitely in the foreground, these others are definitely background, and I'm not sure about these ones between." Now all that's left to do is run the watershed! First, you label the foreground image with connected components, identify the unknown and background portions, and pass them in:
# Watershed
markers = cv2.connectedComponents(foreground)[1]
markers += 1 # Add one to all labels so that background is 1, not 0
markers[unknown==255] = 0 # mark the region of unknown with zero
markers = cv2.watershed(img, markers)
You'll notice that I ran watershed() on img. You might experiment with running it on a blurred version of the image (maybe median blurring; I tried this and got slightly smoother boundaries for the crystals) or other preprocessed versions of the image that define better boundaries.
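For instance, a variant of the watershed call above using a median-blurred copy could look like this (the kernel size is just a guess, and this reuses foreground and unknown from above):
# Variant: run watershed on a median-blurred copy of the image.
markers_alt = cv2.connectedComponents(foreground)[1] + 1
markers_alt[unknown == 255] = 0
markers_alt = cv2.watershed(cv2.medianBlur(img, 5), markers_alt)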
It takes a little work to visualize the markers, as they're all small numbers in a single-channel integer image. So what I did was assign them a hue between 0 and 179, set that inside an HSV image, then convert to BGR to display the markers:
# Assign the markers a hue between 0 and 179
hue_markers = np.uint8(179*np.float32(markers)/np.max(markers))
blank_channel = 255*np.ones((h, w), dtype=np.uint8)
marker_img = cv2.merge([hue_markers, blank_channel, blank_channel])
marker_img = cv2.cvtColor(marker_img, cv2.COLOR_HSV2BGR)
cv2.imshow('Colored markers', marker_img)
cv2.waitKey()
And finally, overlay the markers onto the original image to check how they look.
# Label the original image with the watershed markers
labeled_img = img.copy()
labeled_img[markers>1] = marker_img[markers>1] # 1 is background color
labeled_img = cv2.addWeighted(img, 0.5, labeled_img, 0.5, 0)
cv2.imshow('watershed_result.png', labeled_img)
cv2.waitKey()
Well, that's the pipeline in its entirety. You should be able to copy/paste each section in a row and get the same results. The weakest parts of this pipeline are binarizing the gradient and defining the sure background for the watershed. The distance transform might be useful in binarizing the gradient somehow, but I haven't gotten there yet. Either way, this was a cool problem; I would be interested to see any changes you make to this pipeline or how it fares on other ice-crystal images.
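If you want to go from the markers to actual crystal sizes, a rough (untested) extension could look like this, measuring each labelled region's minimum-area rectangle and annotating it:
# Measure each watershed region and annotate it with its longest axis.
# Labels <= 1 are boundaries/background; everything else is a crystal.
annotated = img.copy()
sizes = []
for label in np.unique(markers):
    if label <= 1:
        continue
    mask = np.uint8(markers == label) * 255
    # [-2] grabs the contour list in both OpenCV 3.x and 4.x
    cnts = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    if not cnts:
        continue
    (cx, cy), (rw, rh), angle = cv2.minAreaRect(max(cnts, key=cv2.contourArea))
    sizes.append(max(rw, rh))
    cv2.putText(annotated, '%.0fpx' % max(rw, rh), (int(cx), int(cy)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 0, 255), 1)
print('mean crystal size (px):', np.mean(sizes))
cv2.imshow('Annotated crystal sizes', annotated)
cv2.waitKey()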
Related
EDIT: This is a deeper explanation of a question I asked earlier, which is still not solved for me.
I'm currently trying to write some code that can extract data from some uncommon graphs in a book. I scanned the pages of the book, and by using OpenCV I would like to detect some features of the graphs in order to convert them into usable data. In the left graph I'm looking for the height of the "triangles", and in the right graph the distance from the center to the points where the dotted lines intersect with the gray area. In both cases I would like to convert these values into numeric data for further usage.
For the left graph, I thought of detecting all the individual colors and computing the area of each sector by counting the number of pixels in that color. When I have the area of these sectors, I can easily calculate their heights using basic math. The following code snippet shows how far I've gotten with identifying different colors. However, I can't manage to make this work accurately. It always seems to detect some colors of other sectors as well, or not detect all pixels of one sector. I think it has something to do with the boundaries I'm using, but I can't quite figure out how to make them work. Does someone know how I can determine these values?
import numpy as np
import cv2
img = cv2.imread('images/test2.jpg')
lower = np.array([0,0,100])
upper = np.array([50,56,150])
mask = cv2.inRange(img, lower, upper)
output = cv2.bitwise_and(img, img, mask = mask)
cv2.imshow('img', img)
cv2.imshow('mask', mask)
cv2.imshow('output', output)
cv2.waitKey(0)
cv2.destroyAllWindows()
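For completeness, once the mask is accurate, my plan for the sector area is simply to count the mask pixels, something like:
# Count the non-zero pixels of the color mask to get the sector area.
area_px = cv2.countNonZero(mask)
print('sector area in pixels:', area_px)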
For the right graph, I still have no idea how to extract data from it. I thought of identifying the center by detecting all the dotted lines, and then by detecting the intersections of these dotted lines with the gray area, I could measure the distance between the center and these intersections. However I couldn't yet figure out how to do this properly, since it sounds quite complex. The following code snippet shows how far I've gotten with the line detection. Also in this case the detection is far from accurate. Does someone have an idea how to tackle this problem?
import numpy as np
import cv2
# Reading the image
img = cv2.imread('test2.jpg')
# Convert the image to grayscale
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# Apply edge detection
edges = cv2.Canny(gray,50,150,apertureSize = 3)
# Line detection
lines = cv2.HoughLinesP(edges,1,np.pi/180,100,minLineLength=50,maxLineGap=20)
for line in lines:
    x1, y1, x2, y2 = line[0]
    cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)
cv2.imwrite('linesDetected.jpg',img)
For the left image, using your approach, try looking at the RGB histogram; the sector colors should show up as significant peaks if you want to use the relative area of the segments.
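As a rough, untested illustration (reusing the file name from your snippet), the per-channel histograms could be inspected like this:
import cv2
import numpy as np
# The dominant sector colors should show up as strong histogram peaks.
img = cv2.imread('images/test2.jpg')
for i, name in enumerate(('blue', 'green', 'red')):
    hist = cv2.calcHist([img], [i], None, [256], [0, 256]).ravel()
    peaks = np.argsort(hist)[-5:]          # the 5 most frequent intensities
    print(name, 'channel peaks at intensities:', sorted(int(p) for p in peaks))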
Another alternative could be to use the Hough Circle Transform, which should work on circle segments. See also here.
For the right image ... let me think ...
You could create an "empty" diagram with no data inside. You know the locations of the circle segments ("cake pieces"). Then you could identify the area where the data is (the dark regions), either by using a grey threshold, an RGB threshold, or findContours, or by looking into Watershed / Distance Transform.
In the end the idea is to make a boolean overlay between the cleared image and the data that was found. Then you can identify what share of each circle segment is covered, or, knowing the center, find the farthest point from the center, as sketched below.
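A minimal sketch of that overlay idea, assuming you already have binary masks for one circle segment and for the dark data area (the file names and centre coordinates are placeholders):
import cv2
import numpy as np
# Placeholder inputs: a 0/255 mask of one circle segment from the empty
# diagram, a 0/255 mask of the dark data area, and the diagram centre.
segment_mask = cv2.imread('segment_mask.png', cv2.IMREAD_GRAYSCALE)
data_mask = cv2.imread('data_mask.png', cv2.IMREAD_GRAYSCALE)
cx, cy = 250, 250
# Boolean overlay: which part of this segment is covered by data?
overlap = cv2.bitwise_and(segment_mask, data_mask)
coverage = cv2.countNonZero(overlap) / float(cv2.countNonZero(segment_mask))
print('share of segment covered by data:', coverage)
# Or, knowing the centre, find the farthest covered point from it.
ys, xs = np.nonzero(overlap)
print('farthest data point from centre (px):', np.hypot(xs - cx, ys - cy).max())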
I am working with frames from a video. The video is overlaid with several semi-transparent boxes, and my goal is to find the coordinates of these boxes. These boxes are the only fixed points in the video: the camera is moving, color intensity changes, and there is no fixed reference. The problem is that the boxes are semi-transparent, so they also change with the video, albeit not as much. It seems that neither background subtraction nor tracking has the right tools for this problem.
Nevertheless, I've tried the background subtractors that come with cv2, as well as some home-brewed methods using differences between frames and thresholding. Unfortunately, these don't work due to the box transparency.
For reference, here is what the mean difference between the first 50 frames looks like:
And here is what cv2 background subtractor KNN returns:
I've experimented with thresholds, number of frames taken into account, various contouring algorithms, blurring/sharpening/etc. I've also tried techniques from document layout analysis.
I wonder if maybe there is something I'm missing due to not knowing the right keyword. I don't expect anyone here to give me the perfect solution, but any pointers as to where to look/what approach to try, are appreciated. I'm not bound to cv2 either, anything that works in python will do.
If you take a sample of random frames as elements of an array and calculate the FFT, the semi-transparent boxes will have a very strong signal while the rest of the pixels behave as noise, so noise removal will filter away everything except the semi-transparent boxes. You can add the results of your other methods as additional frames for the FFT.
You are trying to find something that does not change over the entire video, so do not use consecutive frames; or, if you are forced to use consecutive frames, shuffle them randomly.
To gain speed, you could take only one color channel from each frame and pick that channel randomly. That way the colors become noise and cancel each other out.
If the FFT is too expensive, just averaging random frames should filter the noise.
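A minimal sketch of that averaging variant, sampling random frames and a random color channel per frame (the video path and sample count are placeholders):
import cv2
import numpy as np
cap = cv2.VideoCapture('video.mp4')                  # placeholder path
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
rng = np.random.default_rng()
samples = []
for idx in rng.choice(n_frames, size=min(50, n_frames), replace=False):
    cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
    ok, frame = cap.read()
    if not ok:
        continue
    # One random color channel per frame, so colors cancel out as noise.
    samples.append(frame[:, :, rng.integers(3)].astype(np.float32))
mean_img = np.mean(samples, axis=0).astype(np.uint8)
cv2.imshow('Mean of random frames', mean_img)
cv2.waitKey(0)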
OK, here is a first step: you can run Canny on that image, and from the Canny edges you can find contours:
import cv2
import random as rng
image = cv2.imread(r"c:\stackoverflow\interface.png")  # raw string so the backslashes are not treated as escapes
edges = cv2.Canny(image, 100, 240)
contoursext, hierarchy = cv2.findContours(
edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# cv2.RETR_EXTERNAL would work better if the image were not framed.
for i in range(len(contoursext)):
    color = (rng.randint(0, 256), rng.randint(0, 256), rng.randint(0, 256))
    cv2.drawContours(image, contoursext, i, color, 1, cv2.LINE_8, hierarchy, 0)
# Show in a window
cv2.imshow("Canny", edges)
cv2.imshow("Contour", image)
cv2.waitKey(0)
Then you can test whether a contour (or a combination of two contours) is a rectangle, for example... which would probably detect most of the rectangle overlays...
Alternatively, you can try detecting lines in the Canny output and checking whether they form something similar to rectangles.
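For the rectangle test, a rough (untested) check based on approxPolyDP could look like this, reusing contoursext from the snippet above:
# A contour is a rectangle candidate if its polygonal approximation
# has four vertices and is convex. The epsilon fraction is a guess.
def looks_like_rectangle(contour, eps_frac=0.02):
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, eps_frac * peri, True)
    return len(approx) == 4 and cv2.isContourConvex(approx)

rect_candidates = [c for c in contoursext if looks_like_rectangle(c)]
print('rectangle-like contours found:', len(rect_candidates))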
I need some help with an OpenCV project I'm working on. I'm taking images from a computer game (in this case, Fortnite), and I would like to extract different elements from them, eg. timer value, quantities of materials, health and shield etc.
Currently I perform a series of image preprocessing functions until I get a binary image, followed by locating the contours in the image and then sending those contours to a machine learning algorithm (K-Nearest-Neighbours).
I am able to succeed in a lot of cases, but there are some images where I don't manage to find some of the contours, and therefore I don't find all the data.
An important thing to note is that I use the same preprocessing pipeline for all images, because I'm looking for as robust a solution as I can manage.
I would like to know what I can do to improve the performance of my program:
Is KNN a good model for this sort of task, or are there other models that might give me better results?
Is there any way to recognise characters without locating contours?
How can I make my preprocessing pipeline as robust as possible, given the fact that there is a lot of variance in the background across all images?
My goal is to process the images as fast as possible, starting out at a minimum of 2 images per second.
Thanks in advance for any help or advice you can give me!
Here is an example image before preprocessing
Here is the image after preprocessing, in this example I cannot find the contour for the 4 on the right side.
Quite simply, enlarging the image might help, since it increases the dark border of the number.
I threw together some code that does that. The result could be improved, but my point here is to show that the 4 can now be detected as a contour. To increase efficiency I only selected contours within a certain size.
Also, since it is part of the HUD, that usually means that the location on screen is always the same. If so, you can get a great performance increase by only selecting the area with the values (described here), as I have done manually.
Finally, since the numbers have a consistent shape, you could try matchShapes as an alternative to kNN to recognize the numbers. I don't know how they compare in performance though, so you'll have to try that out yourself (a rough sketch of the idea follows after the code below).
Result:
Code:
import numpy as np
import cv2
# load image
img = cv2.imread("fn2.JPG")
# enlarge image
img = cv2.resize(img,None,fx=4, fy=4, interpolation = cv2.INTER_CUBIC)
# convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # imread returns BGR, so convert with BGR2GRAY
# create mask using threshold
ret,mask = cv2.threshold(gray,200,255,cv2.THRESH_BINARY)
# find contours in mask
im, contours, hierarchy = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 3.x; in 4.x findContours returns only (contours, hierarchy)
# draw contour on image
for cnt in contours:
    if cv2.contourArea(cnt) < 3000 and cv2.contourArea(cnt) > 200:
        cv2.drawContours(img, [cnt], 0, (255, 0, 0), 2)
#show images
cv2.imshow("Mask", mask)
cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
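As for the matchShapes alternative mentioned above, I haven't benchmarked it against kNN, but a comparison against stored digit templates might look roughly like this (the template files are hypothetical, one binary image per digit):
import cv2

# Hypothetical pre-saved binary template images, one per digit (0-9).
templates = {}
for digit in range(10):
    tmpl = cv2.imread('digit_{}.png'.format(digit), cv2.IMREAD_GRAYSCALE)
    # [-2] grabs the contour list in both OpenCV 3.x and 4.x
    cnts = cv2.findContours(tmpl, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    templates[digit] = max(cnts, key=cv2.contourArea)

def classify(contour):
    # Lower matchShapes score means more similar shapes.
    scores = {d: cv2.matchShapes(contour, t, cv2.CONTOURS_MATCH_I1, 0.0)
              for d, t in templates.items()}
    return min(scores, key=scores.get)

# classify(cnt) could then be called for each contour found in the HUD mask.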
When humans see markers suggesting the form of a shape, they immediately perceive the shape itself, as in https://en.wikipedia.org/wiki/Illusory_contours. I'm trying to accomplish something similar in OpenCV in order to detect the shape of a hand in a depth image with very heavy noise. In this question, assume that skin-color-based detection is not working (it is actually the best I've achieved so far, but it is not robust under changing light conditions, shadows or skin colors; also, various flat, colorful paper shapes are on the table, confusing color-based approaches, which is why I'm attempting to use the depth camera instead).
Here's a sample image of the live footage that is already pre-processed for better contrast and with background gradient removed:
I want to isolate the exact shape of the hand from the rest of the picture. For a human eye this is a trivial thing to do. So here are a few attempts I did:
Here's the result with canny edge detection applied. The problem here is that the black shape inside the hand is larger than the actual hand, causing the detected hand to overshoot in size. Also, the lines are not connected and I fail at detecting contours.
Update: Combining Canny and a morphological closing (4x4 px ellipse) makes contour detection possible with the following result. It is still waaay too noisy.
Update 2: The result can be slightly enhanced by drawing that contour to an empty mask, saving that in a buffer and re-detecting yet another contour on a merge of three buffered images. The line that combines the buffered images is hand_img = np.array(np.minimum(255, np.multiply.reduce(self.buf)), np.uint8), which is then morphed once again (closing) and finally contour-detected. The results are slightly less horrible than in the picture above, but laggy instead.
Alternatively, I tried to use an existing CNN (https://github.com/victordibia/handtracking) for detecting the approximate position of the hand's center (this step works) and then flood-fill from there. In order to detect contours, the result is put through an Otsu threshold and then the largest contour is taken, resulting in the following picture (ignore the black rectangles on the left). The problem is that some of the noise is flooded as well and the results are mediocre:
Finally, I tried background removers such as MOG2 or GMG. They are confused by the enormous amount of fast-moving noise. Also they cut off the fingertips (which are crucial for this project). Finally, they don't see enough details in the hand (8 bit plus further color reduction via equalizeHist yield a very poor grayscale resolution) to reliably detect small movements.
It's ridiculous how simple it is for a human to see the exact precise shape of the hand in the first picture and how incredibly hard it is for the computer to draw a shape.
What would be your recommended method to achieve an exact hand segmentation?
After two days of desperate testing, the solution was to VERY carefully apply thresholding to a well-preprocessed image.
Here are the steps:
Remove as much noise as you possibly can. In my case, denoising was done using Intel's pyrealsense2 (I'm using an Intel RealSense depth camera and the algorithms were written for that camera family, thus they work very well). I used rs.temporal_filter() and directly after rs.hole_filling_filter() on every frame.
Capture the very first frame. Besides capturing the exact distance to the table (for later thresholding), this step also saves a still picture that is blurred by a 100x100 px kernel. Since the camera is never mounted perfectly but slightly tilted, there's an ugly grayscale gradient going over the picture and making operations impossible. This still picture is then subtracted from every single later frame, eliminating the gradient. BTW: this gradient removal step is already incorporated in the screenshots shown in the question above
Now the picture is almost noise-free. Do not use equalizeHist. It does not simply increase the overall contrast evenly but instead emphasizes the remaining noise far too much. This was the main mistake I made in almost all my experiments. Instead, apply a threshold (binary, with a fixed value) directly. The margin is extremely thin; setting it at 104 instead of 205 makes a huge difference.
Invert the colors (unless you used BINARY_INV in the previous step), find contours, take the largest one and write it to a mask (a rough code sketch of steps 3 and 4 follows below).
Voilà!
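For reference, a rough sketch of steps 3 and 4 in code (the filename is a placeholder; frame is assumed to be the gradient-corrected, denoised 8-bit grayscale image described above):
import cv2
import numpy as np
frame = cv2.imread('preprocessed_frame.png', cv2.IMREAD_GRAYSCALE)  # placeholder
# Step 3: fixed binary threshold. THRESH_BINARY_INV also covers the
# color inversion mentioned in step 4.
_, binary = cv2.threshold(frame, 104, 255, cv2.THRESH_BINARY_INV)
# Step 4: take the largest contour and draw it into a mask.
# [-2] grabs the contour list in both OpenCV 3.x and 4.x.
contours = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
hand = max(contours, key=cv2.contourArea)
mask = np.zeros_like(frame)
cv2.drawContours(mask, [hand], -1, 255, thickness=cv2.FILLED)
cv2.imshow('Hand mask', mask)
cv2.waitKey(0)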
I have this image:
I am trying to put the background into focus in order to perform edge detection on the image. What methods are available to me (in either the spatial or the frequency domain)?
What I tried is the following:
# equ is the histogram-equalized grayscale image loaded earlier
kernel = np.array([[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]])  # sharpening kernel
im = cv2.filter2D(equ, -1, kernel)
This outputs this image:
I also played around with the centre value but with no positive result.
I also tried this:
import numpy as np
from scipy.signal import convolve2d
from skimage import restoration
import matplotlib.pyplot as plt
psf = np.ones((5, 5)) / 25
equ = convolve2d(equ, psf, 'same')
deconvolved = restoration.wiener(equ, psf, 1, clip=False)
plt.imshow(deconvolved, cmap='gray')
With no appreciable changes to the image.
Any help on the matter is greatly appreciated!
EDIT:
Here is the code that I took from here:
psf = np.ones((5, 5)) / 25
equ = convolve2d(equ, psf, 'same')
deconvolved, _ = restoration.unsupervised_wiener(equ, psf)
plt.imshow(deconvolved, cmap='gray')
and here is the output:
Deblurring images is (unfortunately) quite difficult. The reason is that blurring removes noise, so there are several (noisy) images that will all yield the same image when you blur them. This means that there is no simple way for the computer to "choose" which of the noisy images is the right one when you deblur it. Because of this, deblurring will often yield noisy images.
Now then, you might ask how photographers do this in reality. Well, they do not actually deblur images; they sharpen them (which is slightly different). When you sharpen an image, you increase the contrast near borders to emphasise them (this is why you sometimes see a halo around borders in images that have been too heavily sharpened).
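If sharpening turns out to be enough for your edge detection, a simple unsharp-mask sketch (the filename and weights are placeholders) would be:
import cv2
img = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE)        # placeholder filename
# Unsharp masking: subtract a blurred copy to boost contrast near edges.
blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=3)
sharpened = cv2.addWeighted(img, 1.5, blurred, -0.5, 0)
cv2.imshow('Sharpened', sharpened)
cv2.waitKey(0)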
In your case, you want to deblur it (and there is no convolution kernel that will allow you to do this). To do it well, you need to know what process blurred the image in the first place (that is, if you don't want to spend thousands of dollars on special software or don't have a master's in mathematics or astronomy).
If you still want to do this, I'd recommend searching for deconvolution, and if you don't know the blurring process, blind deconvolution. There are some (crude) functions for it in skimage, which might be of help (http://scikit-image.org/docs/stable/auto_examples/filters/plot_restoration.html#sphx-glr-auto-examples-filters-plot-restoration-py).
Lastly, the final link from Jax Briggs seems helpful, but I would not cross my fingers for magical results.