I'm working on a project in which we need to segment the car windows in images taken from inside the car.
I'm working with OpenCV, but this is not mandatory. (Either Python or C++ is fine.)
So far I have some (not so good) results. I have followed this sequence:
1) Apply cv2.grabCut() in ROIs where windows might be.
import cv2
import numpy as np
from matplotlib import pyplot as plt
#Read Image
img = cv2.imread("test1.png",-1)
#Grab Cut iterations
itera = 30
p1 = True
if p1:  # This takes quite long, so the flag helps when debugging
    gb_mask_w1 = np.zeros(img.shape[:2], np.uint8)  # output mask
    bgdModel = np.zeros((1, 65), np.float64)
    fgdModel = np.zeros((1, 65), np.float64)
    rect = (1, 12, 390, 845)  # ROI of window 1
    # grabCut call reconstructed from the variables defined above
    cv2.grabCut(img, gb_mask_w1, rect, bgdModel, fgdModel, itera, cv2.GC_INIT_WITH_RECT)
    # 3-pixels (probable foreground) are put to 0 and all other
    # pixels are put to 1.
    mask_w1 = np.where((gb_mask_w1 == 3), 0, 1).astype('uint8')
    gb_mask_w2 = np.zeros(img.shape[:2], np.uint8)  # output mask
    rect_1 = (1500, 12, 323, 820)  # ROI of window 2
    cv2.grabCut(img, gb_mask_w2, rect_1, bgdModel, fgdModel, itera, cv2.GC_INIT_WITH_RECT)
    # 3-pixels (probable foreground) are put to 0 and all other
    # pixels are put to 1.
    mask_w2 = np.where((gb_mask_w2 == 3), 0, 1).astype('uint8')
2) Erode on the foreground pixels to have clean(er) edges
# Morphological Operations
kernel = np.ones((10,10),np.uint8)
eroded_w1 = cv2.erode(mask_w1,kernel,iterations = 1)
eroded_w2 = cv2.erode(mask_w2,kernel,iterations = 1)
3) Apply cv2.fastNlMeansDenoising() to avoid artifacts
# DeNoise artifacts
mask_w1_porc = cv2.fastNlMeansDenoising(eroded_w1)
mask_w2_porc = cv2.fastNlMeansDenoising(eroded_w2)
4) Just apply resulting mask & plot
img_treated = img.copy()
# Green background
bskg = np.zeros(img.shape[:3],np.uint8)
bskg[:] = (0, 177, 64)
#Apply mask
img_treated[mask_w1_porc==0] = bskg[mask_w1_porc==0]
img_treated[mask_w2_porc==0] = bskg[mask_w2_porc==0]
fig, ax = plt.subplots(2, 2)
ax[0,0].imshow(cv2.cvtColor(img,cv2.COLOR_BGR2RGB ))
ax[1,1].imshow(cv2.cvtColor(img_treated,cv2.COLOR_BGR2RGB ))
plt.show()
Often only a segment of the window is actually detected.
Sometimes parts of the car are detected as window.
It is very slow (~20 seconds per window).
Is something working?
At least an important section of the windows is always detected
It's robust to occlusion
What do I want?
I'm wondering what I can add to this pipeline to get a better window detection. Is there a filter that would make cv2.grabCut()'s work easier?
Maybe there is a faster way?
I'm avoiding machine learning or AI approaches because we work with a not so powerful computer, but I'm open to those ideas.
I'm adding the plot so that you can see the results (original image, both masks and output):
I'm also posting an image with occlusion (which doesn't seem to fail, but occlusion is the reason to detect windows instead of having fixed masks).
For my next university project I will have to teach a Convolutional Neural Network how to denoise a picture of a face, so I started digging the web for datasets of faces. I stumbled upon this dataset (CelebA) with 200k+ pictures of people and ran into the first problem: there are too many pictures to do basic computation on them.
I should:
Open each image and make a numpy array out of it (dlib.load_rgb_image is fine)
Find a face in it, then use the 5-point shape predictor to find the eyes
Rotate the picture so that the eyes are on a straight horizontal line
Crop the face and resize it to 256x256 (I could choose 64x64, but it's not a huge time saver)
Make a copy and add artificial noise to it
Save them both to two different folders
On a PC that the university gave me I can process about 40 images per minute, around 57k images every 24 hours.
To speed things up I have tried threads, one thread per picture, but the gain is only about 2-3 more images per minute.
This is the code I'm running:
### Out of the threads, before running them ###
import math
import cv2
import dlib
import skimage.util
def img_crop(img, bounding_box):
    # some code using cv2.copyMakeBorder to crop the image
    ...
MODEL_5_LANDMARK = "5_point.dat"
shape_preditor = dlib.shape_predictor(MODEL_5_LANDMARK)
detector = dlib.get_frontal_face_detector()
### Inside each thread ###
img_in = dlib.load_rgb_image("img_in.jpg")
dets = detector(img_in, 1)
shape = shape_preditor(img_in, dets[0])
points = []
for i in range(0, shape.num_parts):
    point = shape.part(i)
    points.append((point.x, point.y))
eye_sx = points[1]
eye_dx = points[3]
dy = eye_dx[1] - eye_sx[1]
dx = eye_dx[0] - eye_sx[0]
angle = math.degrees(math.atan2(dy, dx))
center = (dets[0].center().x, dets[0].center().y)
h, w, _ = img_in.shape
M = cv2.getRotationMatrix2D(center, angle + 180, 1)
img_in = cv2.warpAffine(img_in, M, (w, h))
dets = detector(img_in, 1)
bbox = (dets[0].left(), dets[0].top(), dets[0].right(), dets[0].bottom())
img_out = cv2.resize(img_crop(img_in, bbox), (256, 256))
img_out = cv2.cvtColor(img_out, cv2.COLOR_BGR2RGB)
img_noisy = skimage.util.random_noise(img_out, ....)
cv2.imwrite('out.jpg', img_out)
cv2.imwrite('out_noise.jpg', img_noisy)
I'm using Python 3.6. How can I speed things up?
Another problem will be loading all 200k images into memory as a numpy array: in my initial testing, 12k images took around 80 seconds, with a final shape of (12000, 256, 256, 3). Is there a faster way to achieve this?
First of all, forgive me, because I am familiar with C++ only. Please find below my suggestions to speed up the dlib functions, and convert them to your Python version if they are helpful.
Color does not matter to dlib, so convert the input image to grayscale before proceeding to save time.
I saw that you call the function below twice. What is the purpose? It could double the processing time. If you need the new landmarks after alignment, try rotating the landmark points directly instead of re-detecting (see "How to rotate points").
dets = detector(img_in, 1)
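A minimal sketch of rotating the points directly, assuming the rotation is the one built by cv2.getRotationMatrix2D(center, angle, scale) as in the question. The helper name rotate_points is made up for this example; it applies the same 2x3 affine matrix to a list of (x, y) points in plain Python, so the landmarks can be moved into the rotated image without a second detector call.

```python
import math


def rotate_points(points, center, angle_deg, scale=1.0):
    """Map (x, y) points through the affine matrix that
    cv2.getRotationMatrix2D(center, angle_deg, scale) would produce."""
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    cx, cy = center
    # Translation terms that keep `center` fixed under the rotation
    tx = (1 - a) * cx - b * cy
    ty = b * cx + (1 - a) * cy
    return [(a * x + b * y + tx, -b * x + a * y + ty) for x, y in points]


# Hypothetical usage with the variables from the question:
# new_points = rotate_points(points, center, angle + 180)
```

The same helper could be applied to the four corners of the detection rectangle. The rotated corners are no longer axis-aligned, so one option is to take their min/max as the new crop box.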
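Because you only want to detect one face per image, avoid enlarging the image before detection. In the Python API the second argument of detector() is the number of times the image is upsampled before the search (1 in your code); this helps find small faces but costs time, so 0 is the fastest setting. In C++ the related knob is the pyramid_down<N> template parameter of the scanner, where you can test values from 1 to 6.
dets = detector(img_in, 0)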
Turn on AVX instructions when compiling dlib.
Note: more details can be found on the dlib GitHub page.
I need to alpha-blend 2 images that are not the same size. I've managed to get them to composite by resizing to the same size, so I've got part of the logic:
import cv2 as cv
def combine_two_color_images_composited(foreground_image, background_image):
    foreground = cv.resize(foreground_image, (400,400), interpolation=cv.INTER_CUBIC).copy()
    background = cv.resize(background_image, (400,400), interpolation=cv.INTER_CUBIC).copy()
    alpha = 0.5
    # do composite of foreground onto the background
    cv.addWeighted(foreground, alpha, background, 1 - alpha, 0, background)
    cv.imshow('composited image', background)
I'm wondering if I need to make a mask that is the same size as the larger image and then use that with my first image. If so, I don't know how to do masking in OpenCV yet. This is but a tiny portion of my project, so it's not something I've been able to spend a ton of time researching.
I have searched all over but the code I'm finding does things like 'adds' the images together (side by side).
To combine the two images you can make use of numpy slicing to select the portion of the background image where you want to blend the foreground, then insert the newly blended portion in your background again.
import cv2 as cv
def combine_two_color_images(image1, image2):
    foreground, background = image1.copy(), image2.copy()
    foreground_height = foreground.shape[0]
    foreground_width = foreground.shape[1]
    alpha = 0.5
    # do composite on the upper-left corner of the background image.
    blended_portion = cv.addWeighted(foreground,
                                     alpha,
                                     background[:foreground_height,:foreground_width,:],
                                     1 - alpha,
                                     0)
    background[:foreground_height,:foreground_width,:] = blended_portion
    cv.imshow('composited image', background)
    cv.waitKey(0)
To place the foreground at a specified location you use numpy indexing as before. Numpy indexing is very powerful and you will find it useful on many occasions. I linked the documentation above; it is really worth a look.
def combine_two_color_images_with_anchor(image1, image2, anchor_y, anchor_x):
    foreground, background = image1.copy(), image2.copy()
    # Check if the foreground is inbound with the new coordinates and raise an error if out of bounds
    background_height = background.shape[0]
    background_width = background.shape[1]
    foreground_height = foreground.shape[0]
    foreground_width = foreground.shape[1]
    if foreground_height+anchor_y > background_height or foreground_width+anchor_x > background_width:
        raise ValueError("The foreground image exceeds the background boundaries at this location")
    alpha = 0.5
    # do composite at specified location
    start_y = anchor_y
    start_x = anchor_x
    end_y = anchor_y+foreground_height
    end_x = anchor_x+foreground_width
    blended_portion = cv.addWeighted(foreground,
                                     alpha,
                                     background[start_y:end_y, start_x:end_x,:],
                                     1 - alpha,
                                     0)
    background[start_y:end_y, start_x:end_x,:] = blended_portion
    cv.imshow('composited image', background)
    cv.waitKey(0)
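If raising an error is too strict, one possible variation is to clip the foreground to the part that actually overlaps the background. The sketch below does the blend with plain numpy instead of cv.addWeighted. blend_at is a hypothetical helper name, and rounding to the nearest integer is an assumption about how the result should be quantised.

```python
import numpy as np


def blend_at(background, foreground, anchor_y, anchor_x, alpha=0.5):
    """Alpha-blend `foreground` onto a copy of `background` with its
    top-left corner at (anchor_y, anchor_x), clipping to the overlap."""
    out = background.copy()
    bh, bw = background.shape[:2]
    fh, fw = foreground.shape[:2]
    # Overlapping rectangle in background coordinates
    y0, x0 = max(anchor_y, 0), max(anchor_x, 0)
    y1, x1 = min(anchor_y + fh, bh), min(anchor_x + fw, bw)
    if y0 >= y1 or x0 >= x1:
        return out  # no overlap, nothing to blend
    # Matching part of the foreground, converted to float for the arithmetic
    fg = foreground[y0 - anchor_y:y1 - anchor_y,
                    x0 - anchor_x:x1 - anchor_x].astype(np.float32)
    bg = out[y0:y1, x0:x1].astype(np.float32)
    blended = alpha * fg + (1 - alpha) * bg
    out[y0:y1, x0:x1] = np.clip(np.rint(blended), 0, 255).astype(background.dtype)
    return out
```

Negative anchors are handled the same way, so a foreground hanging over the top or left edge is cropped rather than rejected.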
I have two images, one with only the background and the other with the background plus a detectable object (in my case it's a car). Below are the images
I am trying to remove the background so that only the car remains in the resulting image. Following is the code with which I am trying to get the desired result
import numpy as np
import cv2
original_image = cv2.imread('IMG1.jpg', cv2.IMREAD_COLOR)
gray_original = cv2.cvtColor(original_image, cv2.COLOR_BGR2GRAY)
background_image = cv2.imread('IMG2.jpg', cv2.IMREAD_COLOR)
gray_background = cv2.cvtColor(background_image, cv2.COLOR_BGR2GRAY)
foreground = np.absolute(gray_original - gray_background)
foreground[foreground > 0] = 255
cv2.imshow('Original Image', foreground)
The resulting image by subtracting the two images is
Here is the problem: the expected resulting image should show the car only.
Also, if you look closely at the two images, you'll see that they are not exactly the same: the camera moved a little, so the background is slightly shifted. My question is how I can subtract the background given these two images. I do not want to use grabCut or the backgroundSubtractorMOG algorithm right now, because I do not yet know what's going on inside those algorithms.
What I am trying to do is to get the following resulting image
Also, if possible, please guide me towards a general way of doing this, not only for this specific case: I have a background in one image and background + object in the second image. What is the best way of handling this? Sorry for such a long question.
I solved your problem using the OpenCV's watershed algorithm. You can find the theory and examples of watershed here.
First I selected several points (markers) to dictate where is the object I want to keep, and where is the background. This step is manual, and can vary a lot from image to image. Also, it requires some repetition until you get the desired result. I suggest using a tool to get the pixel coordinates.
Then I created an empty integer array of zeros, with the size of the car image. And then I assigned some values (1:background, [255,192,128,64]:car_parts) to pixels at marker positions.
NOTE: When I downloaded your image I had to crop it to get the part with the car. After cropping, the image has a size of 400x601. This may not match the size of the image you have, in which case the markers will be off.
Afterwards I used the watershed algorithm. The 1st input is your image and 2nd input is the marker image (zero everywhere except at marker positions). The result is shown in the image below.
I set all pixels with value greater than 1 to 255 (the car), and the rest (background) to zero. Then I dilated the obtained image with a 3x3 kernel to avoid losing information on the outline of the car. Finally, I used the dilated image as a mask for the original image, using the cv2.bitwise_and() function, and the result lies in the following image:
Here is my code:
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load the image
img = cv2.imread("/path/to/image.png", 3)
# Create a blank image of zeros (same dimension as img)
# It should be grayscale (1 color channel)
marker = np.zeros_like(img[:,:,0]).astype(np.int32)
# This step is manual. The goal is to find the points
# which create the result we want. I suggest using a
# tool to get the pixel coordinates.
# Dictate the background and set the markers to 1
marker[204][95] = 1
marker[240][137] = 1
marker[245][444] = 1
marker[260][427] = 1
marker[257][378] = 1
marker[217][466] = 1
# Dictate the area of interest
# I used different values for each part of the car (for visibility)
marker[235][370] = 255 # car body
marker[135][294] = 64 # rooftop
marker[190][454] = 64 # rear light
marker[167][458] = 64 # rear wing
marker[205][103] = 128 # front bumper
# rear bumper
marker[225][456] = 128
marker[224][461] = 128
marker[216][461] = 128
# front wheel
marker[225][189] = 192
marker[240][147] = 192
# rear wheel
marker[258][409] = 192
marker[257][391] = 192
marker[254][421] = 192
# Now we have set the markers, we use the watershed
# algorithm to generate a marked image
marked = cv2.watershed(img, marker)
# Plot this one. If it does what we want, proceed;
# otherwise edit your markers and repeat
plt.imshow(marked, cmap='gray')
# Make the background black, and what we want to keep white
marked[marked == 1] = 0
marked[marked > 1] = 255
# Use a kernel to dilate the image, to not lose any detail on the outline
# I used a kernel of 3x3 pixels
kernel = np.ones((3,3),np.uint8)
dilation = cv2.dilate(marked.astype(np.float32), kernel, iterations = 1)
# Plot again to check whether the dilation is according to our needs
# If not, repeat by using a smaller/bigger kernel, or more/less iterations
plt.imshow(dilation, cmap='gray')
# Now apply the mask we created on the initial image
final_img = cv2.bitwise_and(img, img, mask=dilation.astype(np.uint8))
# cv2.imread reads the image as BGR, but matplotlib uses RGB
# BGR to RGB so we can plot the image with accurate colors
b, g, r = cv2.split(final_img)
final_img = cv2.merge([r, g, b])
# Plot the final result
plt.imshow(final_img)
plt.show()
If you have a lot of images you will probably need to create a tool to annotate the markers graphically, or even an algorithm to find markers automatically.
The problem is that you're subtracting arrays of unsigned 8-bit integers. A negative result cannot be represented in that type, so it wraps around to a large positive value.
To demonstrate:
>>> import numpy as np
>>> import cv2
>>> a = np.array([[10,10]],dtype=np.uint8)
>>> b = np.array([[11,11]],dtype=np.uint8)
>>> a - b
array([[255, 255]], dtype=uint8)
Since you're using OpenCV, the simplest way to achieve your goal is to use cv2.absdiff().
>>> cv2.absdiff(a,b)
array([[1, 1]], dtype=uint8)
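If you prefer to stay in numpy, a rough equivalent is to widen the arrays to a signed type before subtracting. The sketch below also replaces the `> 0` test from the question with a tolerance, since a slightly moved camera and sensor noise make almost every pixel differ a little. difference_mask is a made-up name and the default threshold of 25 is an arbitrary assumption that would need tuning.

```python
import numpy as np


def difference_mask(image, background, threshold=25):
    """Return a uint8 mask: 255 where the two grayscale images differ
    by more than `threshold`, 0 elsewhere."""
    # int16 can hold every value in -255..255, so nothing wraps around
    diff = np.abs(image.astype(np.int16) - background.astype(np.int16))
    return np.where(diff > threshold, 255, 0).astype(np.uint8)


# Hypothetical usage with the arrays from the question:
# foreground = difference_mask(gray_original, gray_background)
```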
I recommend using OpenCV's grabcut algorithm. You first draw a few lines on the foreground and background, and keep doing this until your foreground is sufficiently separated from the background. It is covered here: https://docs.opencv.org/trunk/d8/d83/tutorial_py_grabcut.html
as well as in this video: https://www.youtube.com/watch?v=kAwxLTDDAwU
I am attempting to write a program that will automatically locate a protein in an image. This will ultimately be used to differentiate between two proteins of different heights that are present.
The white area on top of the background is a membrane in which the proteins sit and the white blobs that are present are the proteins. The proteins have two lobes hence they appear in pairs (actually one protein).
I have been writing a script in Fiji (Jython) to try and locate the proteins so we can work out the height from the local background. So far this involves applying an adaptive histogram equalisation and then subtracting the background with a rolling ball of radius 10 pixels. After that I apply a kernel of sorts, 10 pixels by 10 pixels, which works out the average of the 5 centre pixels and divides it by the average of the pixels on the 4 edges of the kernel to get a ratio. If the ratio is above a certain value then it is a candidate.
The output I got was this image, which apart from some wrapping and sensitivity (ratio=2.0) issues seems to be OK. My questions are:
Is this a reasonable approach or is there an obviously better way of doing this?
Can you suggest a way on from here? I am a little stuck now and not really sure how to proceed.
code if necessary: http://pastebin.com/D45LNJCu
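For readers who do not want to open the pastebin, the centre-to-edge ratio test described above can be sketched roughly as follows in numpy. This is not the original Jython script: the function names are invented, the "centre pixels" are assumed to be a small square block in the middle of the window, and windows are only evaluated where they fit entirely inside the image, which is one way to sidestep the wrapping issue.

```python
import numpy as np


def centre_border_ratio(window, centre=2):
    """Mean of a centre x centre block in the middle of `window`,
    divided by the mean of the one-pixel border of the window."""
    h, w = window.shape
    top, left = h // 2 - centre // 2, w // 2 - centre // 2
    core = window[top:top + centre, left:left + centre]
    border = np.concatenate([window[0, :], window[-1, :],
                             window[1:-1, 0], window[1:-1, -1]])
    border_mean = border.mean()
    if border_mean == 0:
        # Border is completely dark (e.g. after background subtraction)
        return float("inf") if core.mean() > 0 else 0.0
    return core.mean() / border_mean


def find_candidates(image, size=10, threshold=2.0):
    """Return (row, col) window centres whose ratio exceeds `threshold`."""
    image = np.asarray(image, dtype=float)
    hits = []
    # Only windows that lie fully inside the image are tested
    for y in range(image.shape[0] - size + 1):
        for x in range(image.shape[1] - size + 1):
            if centre_border_ratio(image[y:y + size, x:x + size]) > threshold:
                hits.append((y + size // 2, x + size // 2))
    return hits
```

The double loop is slow on large images; it is written this way only to keep the logic readable.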
How about starting off a bit simpler: use the Harris-point approach and detect local maxima. E.g.:
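import numpy as np
from PIL import Image
from scipy import ndimage
import matplotlib.pyplot as plt
roi = 2.5
peak_threshold = 120
im = Image.open('Q766c.png')
# Work on a writable 2-D grayscale array rather than the PIL image itself
image = np.array(im.convert('L'))
# Integer window size, so it can also be used for slicing below
size = int(2 * roi + 1)
image_max = ndimage.maximum_filter(image, size=size, mode='constant')
mask = (image == image_max)
image *= mask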
# Remove the image borders
image[:size] = 0
image[-size:] = 0
image[:, :size] = 0
image[:, -size:] = 0
# Find peaks
image_t = (image > peak_threshold) * 1
# get coordinates of peaks
f = np.transpose(image_t.nonzero())
# Show
img = plt.imshow(np.asarray(im))
plt.plot(f[:, 1], f[:, 0], 'o', markeredgewidth=0.45, markeredgecolor='b', markerfacecolor='None')
plt.savefig('local_max.png', format='png', bbox_inches='tight')
Which gives this:
ImageJ's "Find Maxima" command does something similar.
Here is the Jython code:
from ij import ImagePlus, IJ, Prefs
from ij.plugin import RGBStackMerge
from ij.process import ImageProcessor, ImageConverter
from ij.plugin.filter import Binary, MaximumFinder
from jarray import array
# define background is black (0)
Prefs.blackBackground = True
# find maxima
#imp = IJ.getImage()
imp = ImagePlus('http://i.stack.imgur.com/Q766c.png')
ip = imp.getProcessor()
segip = MaximumFinder().findMaxima( ip, 10, 200, MaximumFinder.SINGLE_POINTS , False, False)
# display detection result
binner = Binary()
binner.setup("dilate", None)
binner.run(segip)  # apply the dilation that was set up above
segimp = ImagePlus("seg", segip)
mergeimp = RGBStackMerge.mergeChannels(array([segimp, imp, None, None, None, None, None], ImagePlus), True)
mergeimp.show()
EDIT: Updated the code to allow processing PNG image (RGB), and directly loading image from this thread. See comments for more details.
I wrote a little script to transform pictures of chalkboards into a form that I can print off and mark up.
I take an image like this:
Auto-crop it, and binarize it. Here's the output of the script:
I would like to remove the largest connected black regions from the image. Is there a simple way to do this?
I was thinking of eroding the image to eliminate the text and then subtracting the eroded image from the original binarized image, but I can't help thinking that there's a more appropriate method.
Sure, you can just get the connected components (of a certain size) with findContours or floodFill and erase them, leaving some smear. However, if you would like to do it right, think about why you have the black area in the first place.
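For completeness, the "erase the big components" route could look roughly like the sketch below. It uses scipy.ndimage.label rather than findContours or floodFill, simply because that keeps the example short. The function name and the max_area parameter are made up, and the image is assumed to be a 2-D uint8 array with 0 for black and 255 for white.

```python
import numpy as np
from scipy import ndimage


def remove_large_black_regions(binary, max_area):
    """Paint every connected black region larger than `max_area`
    pixels white and return the cleaned copy."""
    black = binary == 0
    # Default structuring element: 4-connectivity in 2-D
    labels, count = ndimage.label(black)
    # areas[k] is the pixel count of label k; label 0 is the white part
    areas = np.bincount(labels.ravel())
    too_big = areas > max_area
    too_big[0] = False  # never touch the non-black pixels
    cleaned = binary.copy()
    cleaned[too_big[labels]] = 255
    return cleaned
```

Text strokes touching a large blob are erased with it, which is the "smear" mentioned above.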
You did not use adaptive thresholding (locally adaptive) and this made your output sensitive to shading. Try not to get the black region in the first place by running something like this:
Mat img = imread("desk.jpg", 0);
Mat img2, dst;
pyrDown(img, img2);
adaptiveThreshold(255-img2, dst, 255, ADAPTIVE_THRESH_MEAN_C,
                  THRESH_BINARY, 9, 10);
imwrite("adaptiveT.png", dst);
imshow("dst", dst);
waitKey(0);
In the future, you may want to read about adaptive thresholds and how to sample colors locally. I personally found it useful to sample the binary colors orthogonally to the image gradient (that is, on both sides of it). This way the samples of white and black are of equal size, which is a big deal since there is typically much more background color, which biases the estimation. Using SWT and MSER may give you even more ideas about text segmentation.
I tried this:
import numpy as np
import cv2
im = cv2.imread('image.png')
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
grayout = 255*np.ones((im.shape[0],im.shape[1],1), np.uint8)
blur = cv2.GaussianBlur(gray,(5,5),1)
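# The raw 1, 1 arguments were ADAPTIVE_THRESH_GAUSSIAN_C and THRESH_BINARY_INV
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
# [-2:] works whether findContours returns 2 or 3 values
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2:]
wcnt = 0
for item in contours:
    area = cv2.contourArea(item)
    print(wcnt, area)
    [x, y, w, h] = cv2.boundingRect(item)
    if area > 10 and area < 200:
        roi = gray[y:y+h, x:x+w]
        # Count the fully black pixels inside the bounding box
        cntd = 0
        for i in range(x, x+w):
            for j in range(y, y+h):
                if gray[j, i] == 0:
                    cntd = cntd + 1
        density = cntd / (float(h*w))
        # Keep only boxes that are not mostly black
        if density < 0.5:
            for i in range(x, x+w):
                for j in range(y, y+h):
                    grayout[j, i] = gray[j, i]
    wcnt = wcnt + 1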
You have to balance two things: removing the black spots and not losing the contents of what is on the board. The output I got is this:
Here is a Python numpy implementation (using my own mahotas package) of the method from the top answer (almost the same, I think):
import mahotas as mh
import numpy as np
Imported mahotas & numpy with standard abbreviations
im = mh.imread('7Esco.jpg', as_grey=1)
Load the image & convert to gray
im2 = im[::2,::2]
im2 = mh.gaussian_filter(im2, 1.4)
Downsample and blur (for speed and noise removal).
im2 = 255 - im2
Invert the image
mean_filtered = mh.convolve(im2.astype(float), np.ones((9,9))/81.)
Mean filtering is implemented "by hand" with a convolution.
imc = im2 > mean_filtered - 4
You might need to adjust the number 4 here, but it worked well for this image.
mh.imsave('binarized.png', (imc*255).astype(np.uint8))
Convert to 8 bits and save in PNG format.