I'm stuck with a problem where I want to identify different coffee beans in a mix.
I created a neural network that can identify individual beans, but in practice I want to develop an algorithm that can detect these beans in a bigger batch. It is not necessary to identify all the beans in the picture; being able to identify 10-15 beans in a bigger batch would be enough.
The problem is that I'm able to segment the beans when there is just one layer of beans against a contrasting background, but it gets really hard when there are multiple layers of beans below this first layer.
I tried the distance transform and the watershed algorithm from OpenCV and, as mentioned, this worked for single beans and for some small overlap between beans (just as in this example). The picture below shows the results: results of single layer segmentation
My code was based on the example mentioned before:
import cv2
import numpy as np
from matplotlib import pyplot as plt
from scipy.ndimage import label
from scipy.ndimage import morphology
# load the image as normal and grayscale
img_path = "FINAL/segmentation/IMG_6699.JPG"
img = cv2.imread(img_path, 0)   # grayscale copy
img0 = cv2.imread(img_path)     # colour copy for watershed/drawing
# preprocess the image
img = cv2.medianBlur(img, 5)
ret, th1 = cv2.threshold(img, 80, 255, cv2.THRESH_BINARY_INV)
kernel = np.ones((5,5),np.uint8)
opening = cv2.morphologyEx(th1, cv2.MORPH_OPEN, kernel)
dilation = cv2.dilate(opening, None, iterations=2)
erosion = cv2.erode(dilation,kernel,iterations = 50)
border_nonseg = dilation - cv2.erode(dilation, None, iterations = 1)
#distance transform
#dt = morphology.distance_transform_bf(dilation, metric='chessboard')
dt = cv2.distanceTransform(dilation, 2, 5)
dt = ((dt - dt.min()) / (dt.max() - dt.min()) * 255).astype(np.uint8)
hier, dt1 = cv2.threshold(dt, 170, 255, cv2.THRESH_BINARY)
# label the centers found by the distance transform
lbl, ncc = label(dt1)
lbl = lbl * (255/ncc)
# Completing the markers now.
lbl[border_nonseg == 255] = 255
lbl = lbl.astype(np.int32)
# Watershed algorithm
cv2.watershed(img0, lbl)
lbl[lbl == -1] = 0
lbl = lbl.astype(np.uint8)
result = 255 - lbl
lbl_cont = result
# Draw the borders
result[result != 255] = 0
result = cv2.dilate(result, None, iterations=1)
img0[result == 255] = (255, 0, 0)
cv2.imwrite("output.png", img0)
contours, _ = cv2.findContours(result, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
titles = ['border (non-seg)', 'distance transform', 'watershed labels', 'result']
images = [border_nonseg, dt, lbl_cont, img0]
plt.figure(figsize=(20,20))
for i in range(4):
    plt.subplot(2, 2, i + 1), plt.imshow(images[i], 'gray')
    plt.title(titles[i])
    plt.xticks([]), plt.yticks([])
plt.show()
But the problem starts with a picture like this (which is a real situation): multi layer segmentation and harder multi layer segmentation
I don't think I can reuse the code above; I need a different approach. The contrast between the first and second layer is just too small: because the beans are so small they don't cast much shade, which would otherwise give a nice contrast, and the colour of the beans is also quite dark, which doesn't make it easier.
So do you have any suggestions for different approaches to tackle this problem, or maybe adjustments to the current code that would solve my problem?
I'm very curious to hear different opinions on this!
If I've understood correctly, you just need some image segmentation method to separate the beans in a big picture into many small pictures with just one bean each, so you can feed them to your NN to train/test it.
With the kind of pictures that you showed, the segmentation is almost an identification by itself. I mean, you would almost need a trained NN to identify the beans in the picture to then separate them and feed them into your non-trained NN.
For these kinds of problems, I believe there are some (unsupervised) NN architectures that are trained to extract the relevant features for you. I think autoencoders were one of the options, but I'm not sure right now.
The other approach is to use some kind of more general pattern recognition:
a) Shape-Based: They try to match a contour model over the gradient image
b) Correlation-Based: They try to match a sample image over your original image using grayscale correlations
These methods use pyramidal search to increase speed, but you may also want to try different pyramid levels of the model for each pyramid level of the image being analysed, to cope with different zoom levels of the grains; this is equivalent to a matching method with scaling (a minimal multi-scale matching sketch follows after these options). You will also need several models of your beans (different perspectives of a single bean) to increase the number of results per image.
You could also try c) a region-expansion method that expands a seed region to the neighbouring pixels under some condition based on colour smoothness, or d) a border finder combined with some contour-closing algorithms; but I fear both could cause you many problems given the variability of your images.
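For option (b), here is a minimal multi-scale correlation-matching sketch with OpenCV; the file names are placeholders, and a real setup would loop over several bean templates as mentioned above:

import cv2
import numpy as np

# placeholder paths: a photo of the batch and a crop of a single bean
scene = cv2.imread("batch.jpg", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("bean_template.jpg", cv2.IMREAD_GRAYSCALE)

best = None
for scale in np.linspace(0.5, 1.5, 11):  # crude stand-in for a pyramid search
    t = cv2.resize(template, None, fx=scale, fy=scale)
    if t.shape[0] >= scene.shape[0] or t.shape[1] >= scene.shape[1]:
        continue
    res = cv2.matchTemplate(scene, t, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(res)
    if best is None or max_val > best[0]:
        best = (max_val, max_loc, t.shape)

score, (x, y), (th, tw) = best
print(f"best match score {score:.2f} at ({x}, {y}), template size {tw}x{th}")

In practice you would keep every location above a score threshold (with some overlap suppression) rather than just the single best match.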
I have an image with some random components; the border of each component has some blurred pixels, i.e. ...see screenshot
So, using OpenCV and Python, I want this image to be really sharp, i.e. no blurry pixels on the edges, as shown in the image below
To a certain error tolerance, this behaviour can be achieved by using the k-means algorithm. This algorithm can be used to build clusters of pixels having a similar colour. The OpenCV k-means implementation provides a lot of parameters to tweak to get the desired results. You may use the following code snippet to start with:
import cv2
import numpy as np
img = cv2.imread("/path/to/img.png")
Z = img.reshape((-1, 3))
Z = np.float32(Z)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 2
ret, label, center = cv2.kmeans(Z, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
center = np.uint8(center)
res = center[label.flatten()]
res2 = res.reshape((img.shape))
cv2.imwrite("./output.png", res2)
The above code will generate the following results for the sample image:
But this code may not work very well with the larger image (even after passing K=5). This is because we are depending on k-means to pick the seeds by random sampling. There is the option of passing in the seed colours to look for, which in your case would be the BGR values for yellow, light green, dark green, brown and blue. After supplying these BGR values as seeds, you can get better results.
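A hedged sketch of that seeding idea: the BGR seed values below are my own guesses, and the initial labelling is built by assigning each pixel to its nearest seed colour before calling cv2.kmeans with cv2.KMEANS_USE_INITIAL_LABELS:

import cv2
import numpy as np

img = cv2.imread("/path/to/img.png")
Z = np.float32(img.reshape((-1, 3)))

# guessed BGR seeds: yellow, light green, dark green, brown, blue
seeds = np.float32([[0, 255, 255], [144, 238, 144], [0, 100, 0],
                    [42, 42, 165], [255, 0, 0]])
K = len(seeds)

# assign every pixel to its nearest seed colour as the initial labelling
dists = np.linalg.norm(Z[:, None, :] - seeds[None, :, :], axis=2)
init_labels = np.int32(dists.argmin(axis=1)).reshape(-1, 1)

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret, label, center = cv2.kmeans(Z, K, init_labels, criteria, 1,
                                cv2.KMEANS_USE_INITIAL_LABELS)

res = np.uint8(center)[label.flatten()].reshape(img.shape)
cv2.imwrite("./output_seeded.png", res)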
I'm working on a project which requires detection of people and due to the complexity of the system, I decided to use movement detection.
I faced some problems and upon asking on stack overflow, this answer seemed the best.
So I implemented the algorithm in the following steps:
Apply saliency to the input video
Apply k-means clustering
Apply background subtraction
Apply morphological transformations
Here is the code
import cv2
import time
import numpy as np
cap=cv2.VideoCapture(0)
#i wanted to try different background subtractors to get the best result.
fgbg=cv2.createBackgroundSubtractorMOG2()
fgbg1 = cv2.bgsegm.createBackgroundSubtractorMOG()
h = cap.get(4)
w = cap.get(3)
frameArea = h*w
areaTH = frameArea/150
while(cap.isOpened()):
    #time.sleep(0.05)
    _, frame = cap.read()
    cv2.imshow("frame", frame)
    image = frame

    ################ Implementing saliency ########################
    saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
    (success, saliencyMap) = saliency.computeSaliency(image)
    saliencyMap = (saliencyMap * 255).astype("uint8")
    #cv2.imshow("Image", image)
    #cv2.imshow("Output", saliencyMap)
    saliency = cv2.saliency.StaticSaliencyFineGrained_create()
    (success, saliencyMap) = saliency.computeSaliency(image)
    saliencyMap = (saliencyMap * 255).astype("uint8")
    threshMap = cv2.threshold(saliencyMap.astype("uint8"), 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    # show the images
    #cv2.imshow("Image", image)
    #cv2.imshow("saliency", saliencyMap)
    #cv2.imshow("Thresh", threshMap)
    kouts = saliencyMap
    #cv2.imshow("kouts", kouts)

    ############## Implementing k-means clustering #######################
    clusters = 12
    z = kouts.reshape((-1, 3))
    # convert to np.float32
    z = np.float32(z)
    # define criteria and accuracy
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 5, 1.0)
    # apply k-means
    ret, label, center = cv2.kmeans(z, clusters, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
    # convert the float32 centers back to uint8 and rebuild the image
    center = np.uint8(center)
    res = center[label.flatten()]
    kouts = res.reshape((kouts.shape))
    cv2.imshow('clustered image', kouts)

    ############ Applying background subtraction #######################
    fgmask = fgbg.apply(kouts)
    fgmask1 = fgbg1.apply(kouts)
    cv2.imshow('fg', fgmask)
    cv2.imshow('fgmask1', fgmask1)
    # as I said earlier, I wanted to find the best background subtractor

    ######################### Morphological transformation #####################
    # Below I tried various techniques to get the best possible result
    kernel = np.ones((5, 5), np.uint8)
    erosion = cv2.erode(fgmask1, kernel, iterations=1)
    cv2.imshow('erosion', erosion)
    dilation = cv2.dilate(fgmask1, kernel, iterations=1)
    cv2.imshow('dilation', dilation)
    gradient = cv2.morphologyEx(fgmask1, cv2.MORPH_GRADIENT, kernel)
    cv2.imshow("gradient", gradient)
    opening = cv2.morphologyEx(fgmask1, cv2.MORPH_OPEN, kernel)
    closing = cv2.morphologyEx(fgmask1, cv2.MORPH_CLOSE, kernel)
    cv2.imshow('opening', opening)
    cv2.imshow('closing', closing)

    ######### Detection of contours ##################
    contours0, hierarchy = cv2.findContours(erosion, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for cnt in contours0:
        area = cv2.contourArea(cnt)
        if area > areaTH and area < frameArea * 0.50:
            M = cv2.moments(cnt)
            x, y, f, g = cv2.boundingRect(cnt)
            img = cv2.rectangle(frame, (x, y), (x + f, y + g), (0, 255, 0), 2)
    cv2.imshow('Original', frame)
    k = cv2.waitKey(1) & 0xff
    if k == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
I tried this algorithm on this video, but there was still a lot of noise in the output. I previously thought the problem might be the quality of the video, but when I used cv2.VideoCapture(0) the problem still persisted; the code doesn't seem to remove the noise, and the situations I'm working in sometimes have a lot of noise.
Can you give me any suggestions, tell me where I went wrong, or suggest a different approach to the problem?
Thanks in advance.
I spent some time trying to see if anything more could be done with noise reduction, but I believe you have already tried many of the known techniques in OpenCV. My suggestion is to approach your problem with neural networks, as they will be more accurate at detecting the objects.
I created a Colab notebook, to illustrate this:
https://colab.research.google.com/drive/1rBrcu46sfo0F7fsQf4BC9hKoXTk_wNBk?usp=sharing
Even with this simple approach it's possible to detect objects such as persons and clothing. You can set a criterion that only considers the top 10 items, since a bus entrance limits how many people can enter at the same time.
This is not a final solution because I am using a general purpose detector. This can be improved in your application by training the network with your video inputs. Labeling will be required but I believe this will give you the most accurate results.
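As a rough stand-in for such a general-purpose detector (not necessarily what the notebook runs), torchvision's pretrained Faster R-CNN produces the same kind of top-N output; the frame path and score threshold are placeholders:

import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# requires torchvision >= 0.13 for the weights argument
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
img = Image.open("frame.jpg").convert("RGB")  # placeholder frame from your video

with torch.no_grad():
    out = model([to_tensor(img)])[0]

# keep at most the 10 most confident detections, as suggested above
detections = [(b, s, l) for b, s, l in zip(out["boxes"], out["scores"], out["labels"])
              if s > 0.5][:10]
for box, score, label in detections:
    print(int(label), float(score), [round(v, 1) for v in box.tolist()])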
There is also the challenge of tracking the people who are already inside the bus versus those entering. For that, you can track the bounding rectangles. There is an excellent example using dlib: https://www.pyimagesearch.com/2018/10/22/object-tracking-with-dlib/ (a minimal sketch follows).
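A minimal sketch of the dlib correlation tracker from that post; the video path and the initial box are placeholders, and in practice the box would come from your detector:

import cv2
import dlib

cap = cv2.VideoCapture("bus_entrance.mp4")  # placeholder video path
ok, frame = cap.read()
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

# placeholder initial box (x, y, w, h) from your detector
x, y, w, h = 100, 120, 80, 160
tracker = dlib.correlation_tracker()
tracker.start_track(rgb, dlib.rectangle(x, y, x + w, y + h))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    tracker.update(rgb)
    p = tracker.get_position()
    cv2.rectangle(frame, (int(p.left()), int(p.top())),
                  (int(p.right()), int(p.bottom())), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()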
I am trying to threshold images with challenging noise.
The numbers on the side are the dimensions. I have tried various standard methods:
ret,thresh1 = cv2.threshold(img,95,255,0)
#cv2.THRESH_BINARY
thresh2 = cv2.adaptiveThreshold(img,255,cv2.ADAPTIVE_THRESH_MEAN_C,cv2.THRESH_BINARY,7,0.5)
thresh3 = cv2.adaptiveThreshold(img,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,cv2.THRESH_BINARY,3,1.5)
# Otsu's thresholding after Gaussian filtering
blur = cv2.GaussianBlur(img,(3,3),0)
ret3,th3 = cv2.threshold(blur,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
I want to segment the "lighter" portion inside the darker grey zone (or vice versa). I have played with various kernel sizes and constant values, but nothing is giving me a good segmentation. Any ideas what else I can try, or how to improve the results? Some sample results I get using the code are shown below.
I'm using the Google Vision API to extract the text from some pictures, however, I have been trying to improve the accuracy (confidence) of the results with no luck.
Every time I change the image from the original, I lose accuracy in detecting some characters.
I have isolated the issue to the use of multiple colours for different words: words in red, for example, give incorrect results more often than the other words.
Example:
some variations on the image from gray scale or b&w
What ideas can I try to make this work better, specifically changing the colors of text to a uniform color or just black on a white background since most algorithms expect that?
Some ideas I have already tried, along with some thresholding:
dimg = ImageOps.grayscale(im)
cimg = ImageOps.invert(dimg)
contrast = ImageEnhance.Contrast(dimg)
eimg = contrast.enhance(1)
sharp = ImageEnhance.Sharpness(dimg)
eimg = sharp.enhance(1)
I can only offer a butcher's solution, potentially a nightmare to maintain.
In my own, very limited scenario, it worked like a charm where several other OCR engines either failed or had unacceptable running times.
My prerequisites:
I knew exactly in which area of the screen the text was going to go.
I knew exactly which fonts and colors were going to be used.
the text was semitransparent, so the underlying image interfered, and it was a variable image to boot.
I could not reliably detect text changes, so I could not average frames to reduce the interference.
What I did:
- I measured the kerning width of each character. I only had A-Za-z0-9 and a bunch of punctuation characters to worry about.
- The program would start at position (0,0), measure the average color to determine the color, then access the whole set of bitmaps generated from characters in all available fonts in that color. Then it would determine which rectangle was closest to the corresponding rectangle on the screen, and advance to the next one.
(Months later, needing more performance, I added a varying probability matrix to test the most likely characters first.)
In the end, the resulting C program was able to read the subtitles out of the video stream with 100% accuracy in real time.
You have tried almost every standard step. I would advise you to try some of PIL's built-in filters, like the sharpness filter. Apply sharpness and contrast to the RGB image, then binarise it. Perhaps use Image.split() and Image.merge() to binarise each colour channel separately and then bring them back together (a minimal sketch follows below).
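A minimal sketch of that split/merge idea; the file names and the threshold are placeholders to tune:

from PIL import Image

im = Image.open("input.png").convert("RGB")
threshold = 128  # tune per channel if needed

# binarise each colour channel separately, then recombine
channels = [c.point(lambda p: 255 if p > threshold else 0) for c in im.split()]
Image.merge("RGB", channels).convert("L").save("binarised.png")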
Or convert your image to YUV and then use just the Y channel for further processing.
Also, if you do not have a monochrome background, consider performing some background subtraction.
When detecting scanned text, Tesseract likes frames removed, so try to remove as much non-character space from the image as possible (you might need to keep the picture size, though, so replace the removed areas with white). Tesseract also likes straight lines, so some deskewing might be in order if your text is captured at an angle. Tesseract also sometimes gives better results if you resize the image to twice its original size.
I suspect that Google Vision uses Tesseract, or portions of it, but I have no idea what other preprocessing it does for you. So some of the advice here might already be implemented, and doing it again would be unnecessary and repetitive.
You will need to pre-process the image more than once, and use a bitwise_or operation to combine the results. To extract the colors, you could use
import cv2
import numpy as np

image = cv2.imread("text.png")  # load your image here (path is a placeholder)

boundaries = [  # BGR colorspace for OpenCV, *not* RGB
    ([15, 15, 100], [50, 60, 200]),   # red
    ([85, 30, 2], [220, 90, 50]),     # blue
    ([25, 145, 190], [65, 175, 250]), # yellow
]

for (low, high) in boundaries:
    low = np.array(low, dtype="uint8")
    high = np.array(high, dtype="uint8")
    # find the colors within the specified boundaries and apply the mask
    mask = cv2.inRange(image, low, high)
    bitWise = cv2.bitwise_and(image, image, mask=mask)
    # now here is the image masked with the specific color boundary...
Once you have the masked image, you can do another bitwise_or operation on your to-be "final" image, essentially adding this mask to it (a sketch of this combining step follows).
This specific implementation requires OpenCV, but the same principle applies to other image packages.
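A sketch of the combining step described above; the input path is a placeholder and the colour ranges are the same guesses as in the snippet:

import cv2
import numpy as np

image = cv2.imread("text.png")  # placeholder input path
boundaries = [([15, 15, 100], [50, 60, 200]),    # red (BGR)
              ([85, 30, 2], [220, 90, 50]),      # blue (BGR)
              ([25, 145, 190], [65, 175, 250])]  # yellow (BGR)

combined_mask = np.zeros(image.shape[:2], dtype="uint8")
for low, high in boundaries:
    mask = cv2.inRange(image, np.array(low, "uint8"), np.array(high, "uint8"))
    combined_mask = cv2.bitwise_or(combined_mask, mask)

# keep only the pixels that fell inside any of the colour ranges
result = cv2.bitwise_and(image, image, mask=combined_mask)
cv2.imwrite("combined.png", result)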
I need a little more context on this.
How many calls are you going to do to the Google Vision API? If you are doing this throughout a whole stream, you'd probably need to get a paid subscription.
What are you going to do with this data? How accurate does the OCR need to be?
Assuming you get this snapshot from someone else's Twitch stream, and are therefore dealing with the streamer's video compression and network connectivity, you're going to get a pretty blurry snapshot, so OCR is going to be pretty tough.
The image is far too blurry because of video compression, so even preprocessing the image to improve quality may not get the image quality high enough for accurate OCR. If you are set on OCR, one approach you could try:
Binarize the image to get the non-red text in white and background black as in your binarized image:
from PIL import Image
def binarize_image(im, threshold):
    """Binarize an image."""
    image = im.convert('L')  # convert image to monochrome
    bin_im = image.point(lambda p: p > threshold and 255)
    return bin_im
im = Image.open("game_text.JPG")
binarized = binarize_image(im, 100)
Extract only the red text values with a filter, then binarize it:
import cv2
import numpy as np
from matplotlib import pyplot as plt

# cv2 works on BGR arrays, so convert the PIL image first
im_bgr = cv2.cvtColor(np.array(im), cv2.COLOR_RGB2BGR)

lower = np.array([15, 15, 100], dtype="uint8")
upper = np.array([50, 60, 200], dtype="uint8")
mask = cv2.inRange(im_bgr, lower, upper)
red_binarized = cv2.bitwise_and(im_bgr, im_bgr, mask=mask)
plt.imshow(cv2.cvtColor(red_binarized, cv2.COLOR_BGR2RGB))
plt.show()
However, even with this filtering, it still doesn't extract red well.
Add the images obtained in (1.) and (2.):
# convert the PIL result to an array and match the channel count before adding
combined_image = cv2.cvtColor(np.array(binarized), cv2.COLOR_GRAY2BGR) + red_binarized
Do OCR on (3.)
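For step 4, a hedged sketch using pytesseract on the combined image; pytesseract is my assumption here, and any OCR engine (including the Vision API itself) could take its place:

import numpy as np
import pytesseract
from PIL import Image

# `combined_image` is the array built in step 3
text = pytesseract.image_to_string(Image.fromarray(np.uint8(combined_image)))
print(text)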
This is not a full solution but it may drive to something better.
By converting your data from BGR (or RGB) to CIE-Lab you can process a grayscale image as the weighted sum of the colour channels a* and b*.
This grayscale image will enhance colour regions of the text.
By adapting the threshold, you can segment the coloured words of your original image from this grayscale image, and get the other words by thresholding the L channel.
A bitwise and operation should be enough to merge the two segmentation images.
If you can get an image with better contrast, a very last step could be a filling based on the contours.
For that, take a look at RETR_FLOODFILL in the function cv2.findContours.
Any other hole-filling function from another package may also fit that purpose.
Here is some code that shows the first part of my idea.
import cv2
import numpy as np
from matplotlib import pyplot as plt
I = cv2.UMat(cv2.imread('/home/smile/QSKN.png',cv2.IMREAD_ANYCOLOR))
Lab = cv2.cvtColor(I,cv2.COLOR_BGR2Lab)
L,a,b = cv2.split(Lab)
Ig = cv2.addWeighted(cv2.UMat(a),0.5,cv2.UMat(b),0.5,0,dtype=cv2.CV_32F)
Ig = cv2.normalize(Ig,None,0.,255.,cv2.NORM_MINMAX,cv2.CV_8U)
#k = np.ones((3,3),np.float32)
#k[2,2] = 0
#k*=-1
#
#Ig = cv2.filter2D(Ig,cv2.CV_32F,k)
#Ig = cv2.absdiff(Ig,0)
#Ig = cv2.normalize(Ig,None,0.,255.,cv2.NORM_MINMAX,cv2.CV_8U)
_, Ib = cv2.threshold(Ig,0.,255.,cv2.THRESH_OTSU)
_, Lb = cv2.threshold(cv2.UMat(L),0.,255.,cv2.THRESH_OTSU)
_, ax = plt.subplots(2,2)
ax[0,0].imshow(Ig.get(),cmap='gray')
ax[0,1].imshow(L,cmap='gray')
ax[1,0].imshow(Ib.get(),cmap='gray')
ax[1,1].imshow(Lb.get(),cmap='gray')
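And a rough sketch of the hole-filling step mentioned above, using floodFill from the image border rather than RETR_FLOODFILL; the mask path is a placeholder:

import cv2
import numpy as np

binary = cv2.imread("segmentation_mask.png", cv2.IMREAD_GRAYSCALE)  # 0/255 mask
flood = binary.copy()
h, w = binary.shape
ff_mask = np.zeros((h + 2, w + 2), np.uint8)

cv2.floodFill(flood, ff_mask, (0, 0), 255)   # fill everything reachable from the border
holes = cv2.bitwise_not(flood)               # whatever was not reached = enclosed holes
filled = cv2.bitwise_or(binary, holes)
cv2.imwrite("filled.png", filled)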
import numpy as np
from skimage.morphology import selem
from skimage.filters import rank, threshold_otsu
from skimage.util import img_as_float
from PIL import ImageGrab
import matplotlib.pyplot as plt
def preprocessing(image, strelem, s0=30, s1=30, p0=.3, p1=1.):
    image = rank.mean_bilateral(image, strelem, s0=s0, s1=s1)
    condition = (lambda x: x > threshold_otsu(x))(rank.maximum(image, strelem))
    normalize_image = rank.autolevel_percentile(image, strelem, p0=p0, p1=p1)
    return np.where(condition, normalize_image, 0)
#Grab image from clipboard
image = np.array(ImageGrab.grabclipboard())
sel = selem.disk(4)
a = sum([img_as_float(preprocessing(image[:, :, x], sel, p0=0.3)) for x in range(3)])/3
fig, ax = plt.subplots(1, 2, sharey=True, sharex=True)
ax[0].imshow(image)
ax[1].imshow(rank.autolevel_percentile(a, sel, p0=.4))
This is my code for clearing text from noise and creating uniform brightness for characters.
With minor modifications, I used it to solve your problem.
I am trying to make a program that can identify a road in a scene, and I proceeded to use morphological filtering and the watershed algorithm. However, the program produces either mediocre or bad results. It seems to do okay (though not good enough) if the road takes up most of the scene. However, in other pictures it turns out that the sky gets segmented instead (watershed with the clouds).
I tried to see if I could perform more image processing to improve the results, but this is the best I have so far, and I don't know how to move forward to improve my program.
How can I improve my program?
Code:
import numpy as np
import cv2
from matplotlib import pyplot as plt
import imutils
def invert_img(img):
    img = (255 - img)
    return img
#img = cv2.imread('images/coins_clustered.jpg')
img = cv2.imread('images/road_4.jpg')
img = imutils.resize(img, height = 300)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
thresh = invert_img(thresh)
# noise removal
kernel = np.ones((3,3), np.uint8)
opening = cv2.morphologyEx(thresh,cv2.MORPH_OPEN,kernel, iterations = 4)
# sure background area
sure_bg = cv2.dilate(opening,kernel,iterations=3)
#sure_bg = cv2.morphologyEx(sure_bg, cv2.MORPH_TOPHAT, kernel)
# Finding sure foreground area
dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
ret, sure_fg = cv2.threshold(dist_transform,0.7*dist_transform.max(),255,0)
# Finding unknown region
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg,sure_fg)
# Marker labelling
ret, markers = cv2.connectedComponents(sure_fg)
# Add one to all labels so that sure background is not 0, but 1
markers = markers+1
# Now, mark the region of unknown with zero
markers[unknown==255] = 0
'''
imgray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
imgray = cv2.GaussianBlur(imgray, (5, 5), 0)
img = cv2.Canny(imgray,200,500)
'''
markers = cv2.watershed(img,markers)
img[markers == -1] = [255,0,0]
cv2.imshow('background',sure_bg)
cv2.imshow('foreground',sure_fg)
cv2.imshow('threshold',thresh)
cv2.imshow('result',img)
cv2.waitKey(0)
For a start, segmentation problems are hard. The more general you want the solution to be, the harder it gets. Road segmentation is a well-known problem, and I'm sure you can find many papers that tackle this issue from various directions.
Something that helps me get ideas for computer vision problems is to think about what makes it so easy for me to detect something and so hard for the computer.
For example, let's look at the road in your images. What makes it unique from the background?
Distinct gray color.
Always has two white shoulder lines
Always in the bottom section of the image
Always has a separation line in the middle (yellow/white)
Pretty smooth
Wider at the bottom, vanishing into the horizon
Now that we have found some unique features, we need to find ways to quantify them, so they become as obvious to the algorithm as they are to us.
Work on the RGB (or, even better, HSV) image; don't convert it to gray at the beginning and lose all the colour data. Look for gray areas!
Again, let's find white regions (inside the gray ones). You can try edge detection in the specific orientation of the shoulder lines. You are looking for lines that span about half the height of the image, etc.
Let's delete the upper half of the image. It's unlikely you will ever have a road there, and you will get rid of a lot of noise in your algorithm.
See 2...
Let's check the local standard deviation, or some other smoothness feature.
If we find some shape, let's check whether it fits what we expect.
I know these are just ideas and I don't claim they are easy to implement, but if you want to improve your algorithm you must give it more "knowledge", just as you have.
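As a rough illustration of points 1 and 3 above (look for low-saturation "gray" pixels, and only in the lower half of the frame); the thresholds are guesses to tune and the path is the one from the question:

import cv2
import numpy as np

img = cv2.imread("images/road_4.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# point 3: only consider the lower half of the frame
roi = hsv[hsv.shape[0] // 2:, :]

# point 1: "gray" = low saturation, mid-range value (thresholds are guesses)
mask = cv2.inRange(roi, (0, 0, 60), (180, 60, 200))
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((9, 9), np.uint8))

cv2.imshow("road candidate", mask)
cv2.waitKey(0)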
Exploit some domain knowledge; in other words, make some simplifying assumptions. Even basic things like "the camera's not upside down" and "the pavement has a uniform hue" will improve the common case.
If you can treat crossroads as a special case, then finding the edges of the roadway may be a simpler and more useful task than finding the roadway itself.