For my next university project I will have to teach a Convolutional Neural Network how to denoise a picture of a face, so I started digging the web for datasets of faces. I stumbled upon the CelebA dataset, with 200k+ pictures of people, and immediately hit the first problem: there are too many pictures to do basic computation on them.
I should:
Open each image and make a numpy array out of it (dlib.load_rgb_image is fine)
Find a face in it, then use the 5-point shape predictor to find the eyes and align them
Rotate the picture so that the eyes lie on a horizontal line
Crop the face and resize it to 256x256 (I could choose 64x64, but it's not a huge time saver)
Make a copy and add artificial noise to it
Save them both to two different folders
On a PC the university gave me, I can process about 40 images per minute, around 57k images every 24 hours.
To speed things up I have tried threads, one thread per picture, but the speedup is only about 2-3 more images per minute.
This is the code I'm running:
import math
import cv2
import dlib
import skimage.util

### Outside the threads, before starting them ###
def img_crop(img, bounding_box):
    # some code using cv2.copyMakeBorder to crop the image
    ...

MODEL_5_LANDMARK = "5_point.dat"
shape_preditor = dlib.shape_predictor(MODEL_5_LANDMARK)
detector = dlib.get_frontal_face_detector()

### Inside each thread ###
img_in = dlib.load_rgb_image("img_in.jpg")
dets = detector(img_in, 1)
shape = shape_preditor(img_in, dets[0])

points = []
for i in range(0, shape.num_parts):
    point = shape.part(i)
    points.append((point.x, point.y))

eye_sx = points[1]
eye_dx = points[3]

dy = eye_dx[1] - eye_sx[1]
dx = eye_dx[0] - eye_sx[0]
angle = math.degrees(math.atan2(dy, dx))

center = (dets[0].center().x, dets[0].center().y)
h, w, _ = img_in.shape
M = cv2.getRotationMatrix2D(center, angle + 180, 1)
img_in = cv2.warpAffine(img_in, M, (w, h))

dets = detector(img_in, 1)
bbox = (dets[0].left(), dets[0].top(), dets[0].right(), dets[0].bottom())
img_out = cv2.resize(img_crop(img_in, bbox), (256, 256))
img_out = cv2.cvtColor(img_out, cv2.COLOR_BGR2RGB)

img_noisy = skimage.util.random_noise(img_out, ....)
cv2.imwrite('out.jpg', img_out)
cv2.imwrite('out_noise.jpg', img_noisy)
I am using Python 3.6; how can I speed things up?
Another problem will be loading all 200k images into memory as a numpy array; from my initial testing, 12k images take around 80 seconds and have a final shape of (12000, 256, 256, 3). Is there a faster way to achieve this?
First of all, forgive me, because I am only familiar with C++. Please find below my suggestions to speed up the dlib functions; convert them to your Python version if they are helpful.
Color does not matter to dlib. Hence, convert the input image to grayscale before proceeding to save time.
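A minimal sketch of that suggestion, reusing the variable names from the question's code (dlib accepts single-channel 8-bit images):
import cv2
# Convert once, then run both detection and the shape predictor on the gray image.
gray = cv2.cvtColor(img_in, cv2.COLOR_RGB2GRAY)
dets = detector(gray, 1)
shape = shape_preditor(gray, dets[0])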
I saw you call the function below twice; what is the purpose? It could double the processing time. If you need the new landmarks after alignment, try to rotate the landmark points directly instead of re-detecting. How to rotate points
dets = detector(img_in, 1)
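For example, a rough sketch of rotating the existing points instead (this reuses points and M from the question's code and is untested):
import numpy as np
import cv2
# Push the original landmark points through the same affine matrix M
# that was passed to cv2.warpAffine, instead of running the detector again.
pts = np.array(points, dtype=np.float32).reshape(-1, 1, 2)
rotated_pts = cv2.transform(pts, M).reshape(-1, 2)
# The same trick works for the corners of dets[0]: transform the four
# corners and take their min/max to get an axis-aligned crop box.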
Because you only want to detect one face per image, try setting pyramid_down to 6 (by default it is 1, which zooms out on the image to detect more faces). You can test values from 1 to 6:
dets = detector(img_in, 6)
Turn on AVX instructions.
Note: more details can be found on the Dlib GitHub.
I have two images from a video, frameA and frameB. Assuming the video is panning slowly, one can imagine that frameA and frameB have significant overlap. We can then create a panorama from the video footage.
I have tried using opencv2.stitcher, SURF/ORB detectors with BF matching, and a few vanilla approaches. None of them are producing the results that I need [for some reason]. The main problem I am seeing is that SURF/ORB identifies regions of interest that are too "small" and matches them incorrectly.
Example: I am in a desert with a single cactus in my view. I am panning across it.
SURF/ORB detects regions of interest such as the EDGES of my cactus against the sky/land and is unable to match them (not sure why) in the next frame. The things it does detect, it does not match up well, and when you use homography it matches, say, the middle of the cactus with the top part of the cactus in the next frame... and the result gets warped.
Is there a way to do the following?
Enforce only rotation and translation between 2 frames? Note that there is "new" information in subsequent frames, so they can never overlap 100%.
Find the best rotation and translation, with the base assumption that there is a best match? (I am panning very, very slowly and can guarantee high overlap.)
Ignore minor fluctuations. If my feature detectors were "large" enough, they would say "cactus in frame 1" matches "cactus in frame 2", translate by X,Y and maybe rotate by Z.
My attempt at a solution is to take the entire picture, do an "overlapping" sweep, and find the difference. Where I have a minimum, I have the proper X,Y shift. This, however, has two problems:
It's slow. Way too slow.
It can't do rotation without being even slower, due to the increase in search space.
import time

import cv2
import numpy as np

# load image 1
image1 = cv2.imread('img1.png')
print(image1.shape)
img1 = cv2.cvtColor(image1, cv2.COLOR_BGR2GRAY)
nw1, nh1 = img1.shape
nw15, nh15 = int(nw1/2), int(nh1/2)

# load image 2
image2 = cv2.imread('img2.png')
img2 = cv2.cvtColor(image2, cv2.COLOR_BGR2GRAY)
nw2, nh2 = img2.shape
nw25, nh25 = int(nw2/2), int(nh2/2)

# generate base canvas; note that img1 could be top left of img2, or img2 could be top left of img1,
# so the search space of this is very large
nw, nh = nw1+nw2*2, nh1+nh2*2
cnw, cnh = int(nw/2), int(nh/2)  # get the center point for later calculations

base_image1 = np.ones((nw, nh), np.uint8)*255  # make the background white
base_image1[cnw-nw15: cnw+nw15, cnh-nh15: cnh+nh15] = img1  # set the first image in the center

# create the image we want to "sweep over"; we "pre-allocate" since creating new ones is expensive
sweep_image = np.zeros((nw, nh), np.uint8)  # keep at 0 for BLACK

stime = time.time()
total_blend = []

# sweep over my search space!
for x_s in np.arange(20, 80):        # limit search space so it finishes this year
    for y_s in np.arange(300, 500):  # limit search space so it finishes this year
        w1, w2 = cnw-nw25+x_s, cnw+nw25+x_s  # get the width slice to set our sweep image
        h1, h2 = cnh-nh25+y_s, cnh+nh25+y_s  # get the height slice to set our sweep image
        sweep_image[w1: w2, h1: h2] = img2   # set the image
        diff = cv2.absdiff(base_image1, sweep_image)  # calculate the difference
        total_blend.append([x_s, y_s, np.sum(diff)])  # store the transformation and coordinates
        sweep_image[w1: w2, h1: h2] = 0      # reset back to zero
        cv2.imshow('diff', diff)
        cv2.waitKey(0)

print(time.time() - stime)

# convert to array
total_blend = np.array(total_blend)
mymin = np.min(total_blend[:, 2])
print(total_blend[total_blend[:, 2] == mymin])  # get the best coordinates for translation
Examples below:
Example 1: note the giant white borders, due to making sure the images are the same size across the ENTIRE search space. This is an OK-ish match, but notice how the dark regions aren't very dark.
Example 2: again with large white borders, but notice how the dark regions are actually black. This is close to the minimum.
All help and thoughts are appreciated. Is there a way to dictate the "size" of feature detectors? Is there a faster way to sweep? Maybe some RMSE and numpy eigenvalues - this is linear algebra after all...?
I am using Python 3 with OpenCV (cv2).
So far I have gone with creating my own keypoints, similar to a dense feature detector. Unlike SIFT/corners/ORB or any of those that find small features, a dense feature detector can be thought of as taking keypoints on a grid across the entire image.
(More here)
https://subscription.packtpub.com/book/application-development/9781785283932/10/ch10lvl1sec81/what-is-a-dense-feature-detector
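For illustration, here is a minimal sketch of such a grid (the step and size values are arbitrary, and dense_keypoints is just a helper made up for this example):
import cv2

def dense_keypoints(gray, step=20, size=20):
    # One keypoint every `step` pixels across the whole image.
    h, w = gray.shape[:2]
    return [cv2.KeyPoint(float(x), float(y), size)
            for y in range(step, h - step, step)
            for x in range(step, w - step, step)]

gray = cv2.cvtColor(cv2.imread('img1.png'), cv2.COLOR_BGR2GRAY)
kps = dense_keypoints(gray)
orb = cv2.ORB_create()
kps, descs = orb.compute(gray, kps)   # describe the fixed grid instead of detected corners
Matching these grid descriptors between frames then constrains the correspondences to cover the whole image rather than a few small edge regions.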
I'm working on a project which requires detection of people, and due to the complexity of the system I decided to use movement detection.
I faced some problems, and upon asking on Stack Overflow, this answer seemed the best.
So I implemented the algorithm in the following steps:
Apply saliency detection to the input video
Apply K-means clustering
Apply background subtraction
Apply morphological transformations
Here is the code:
import cv2
import time
import numpy as np

cap = cv2.VideoCapture(0)

# I wanted to try different background subtractors to get the best result.
fgbg = cv2.createBackgroundSubtractorMOG2()
fgbg1 = cv2.bgsegm.createBackgroundSubtractorMOG()

h = cap.get(4)
w = cap.get(3)
frameArea = h*w
areaTH = frameArea/150

while(cap.isOpened()):
    #time.sleep(0.05)
    _, frame = cap.read()
    cv2.imshow("frame", frame)
    image = frame

    ################ Implementing saliency ########################
    saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
    (success, saliencyMap) = saliency.computeSaliency(image)
    saliencyMap = (saliencyMap * 255).astype("uint8")
    #cv2.imshow("Image", image)
    #cv2.imshow("Output", saliencyMap)

    saliency = cv2.saliency.StaticSaliencyFineGrained_create()
    (success, saliencyMap) = saliency.computeSaliency(image)
    saliencyMap = (saliencyMap * 255).astype("uint8")
    threshMap = cv2.threshold(saliencyMap.astype("uint8"), 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

    # show the images
    #cv2.imshow("Image", image)
    #cv2.imshow("saliency", saliencyMap)
    #cv2.imshow("Thresh", threshMap)

    kouts = saliencyMap
    #cv2.imshow("kouts", kouts)

    ############## Implementing k-means clustering #######################
    clusters = 12
    z = kouts.reshape((-1, 3))
    # convert to np.float32
    z = np.float32(z)
    # define criteria and accuracy
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 5, 1.0)
    # apply k-means
    ret, label, center = cv2.kmeans(z, clusters, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
    # convert the float32 data back to uint8 and rebuild the image
    center = np.uint8(center)
    res = center[label.flatten()]
    kouts = res.reshape((kouts.shape))
    cv2.imshow('clustered image', kouts)

    ############ Applying background subtraction #######################
    fgmask = fgbg.apply(kouts)
    fgmask1 = fgbg1.apply(kouts)
    cv2.imshow('fg', fgmask)
    cv2.imshow('fgmask1', fgmask1)
    # as I said earlier, I wanted to find the best background subtractor

    ######################### Morphological transformation #####################
    # Below I tried various techniques to get the best possible result
    kernel = np.ones((5, 5), np.uint8)
    erosion = cv2.erode(fgmask1, kernel, iterations=1)
    cv2.imshow('erosion', erosion)
    dilation = cv2.dilate(fgmask1, kernel, iterations=1)
    cv2.imshow('dilation', dilation)
    gradient = cv2.morphologyEx(fgmask1, cv2.MORPH_GRADIENT, kernel)
    cv2.imshow("gradient", gradient)
    opening = cv2.morphologyEx(fgmask1, cv2.MORPH_OPEN, kernel)
    closing = cv2.morphologyEx(fgmask1, cv2.MORPH_CLOSE, kernel)
    cv2.imshow('opening', opening)
    cv2.imshow('closing', closing)

    ######### Detection of contours ##################
    contours0, hierarchy = cv2.findContours(erosion, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for cnt in contours0:
        area = cv2.contourArea(cnt)
        if area > areaTH and area < frameArea*0.50:
            M = cv2.moments(cnt)
            x, y, f, g = cv2.boundingRect(cnt)
            img = cv2.rectangle(frame, (x, y), (x+f, y+g), (0, 255, 0), 2)
    cv2.imshow('Original', frame)

    k = cv2.waitKey(1) & 0xff
    if k == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
I tried this algorithm on this video, but there was still a lot of noise in the output. I previously thought the problem might be the quality of the video, but the problem persists even with cv2.VideoCapture(0): the code doesn't seem to remove the noise, and the situation I'm working in sometimes has high noise.
Please tell me any suggestions, where I went wrong, or a different approach to the problem.
Thanks in advance.
I spent some time trying to see if something could be done with noise reduction, but I believe you have already tried many of the known techniques in OpenCV. My opinion is to approach your problem using neural networks, as they will be more accurate at detecting the objects.
I created a Colab notebook, to illustrate this:
https://colab.research.google.com/drive/1rBrcu46sfo0F7fsQf4BC9hKoXTk_wNBk?usp=sharing
Even with this simple approach, it's possible to detect objects such as persons and clothing. You can set a criterion that only considers the top 10 items, since a bus entrance limits how many people can enter at the same time.
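As a sketch of that filtering step (detections here is assumed to be a list of (label, score, box) tuples, in whatever form your detector actually returns them):
people = [d for d in detections if d[0] == 'person']
top_people = sorted(people, key=lambda d: d[1], reverse=True)[:10]   # keep the 10 highest-scoring people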
This is not a final solution because I am using a general-purpose detector. It can be improved in your application by training the network with your video inputs. Labeling will be required, but I believe this will give you the most accurate results.
There is also the challenge of tracking the people who are already inside the bus versus the ones entering. For that you can track the rectangles; there is an excellent example using dlib: https://www.pyimagesearch.com/2018/10/22/object-tracking-with-dlib/
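For reference, the core of that dlib tracker is roughly the following (dlib expects RGB frames, so OpenCV's BGR frames need a cvtColor first; x, y, w, h are assumed to come from your detector):
import dlib
tracker = dlib.correlation_tracker()
tracker.start_track(rgb_frame, dlib.rectangle(x, y, x + w, y + h))   # initialise on the first frame
# ... then on every new frame:
tracker.update(rgb_frame)
pos = tracker.get_position()   # a rectangle with left()/top()/right()/bottom()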
I'm working on a project that lets users take photos of handwritten formulas and send them to my server. I want to keep only the symbols related to mathematics, not the sheet grid.
Sample photo:
(1) Original RGB photo
(2) Blurred Grayscale
(3) After applying Adaptive Threshold
NOTE: I expect my algorithm to deal with a sheet grid of any color.
Any code snippets will be greatly appreciated.
Thanks in advance.
This is a challenging problem to generalize without knowing exactly what kind of paper/lines and ink combination to expect, and what exactly the output will be used for. I thought I'd attempt it and maybe learn something.
I see two ways to approach this problem:
The clever way: identify the grid (its color, orientation, and size) to find the regions of the image it occupies, in order to ignore it. There are major caveats here that would need to be addressed, e.g. the page may not be photographed flat and squared (warp, distortion, and rotation have to be accounted for). There will also be lines that we don't want removed.
The simple way: apply general image manipulations, knowing little about the problem other than the assumptions that the pen is always darker than the grid and that the output is to be binary (black pen / white page).
I like the second one better because it is easier to implement and generalizes better.
We first notice that the "white" of the page is actually a non-uniform shade of grey (if we convert to grayscale). OpenCV's adaptive thresholding deals with this nicely, and almost gets us there.
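For reference, that built-in call is roughly the following (the block size and constant are just starting points and would need tuning; the filename is a placeholder):
import cv2
gray = cv2.imread('formula.jpg', cv2.IMREAD_GRAYSCALE)
binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 25, 10)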
My code below treats the image in 50x50 pixel blocks to address the non-uniformity of lighting. In each block, we subtract the median before applying a threshold. It's a simple solution, but maybe what you need. I haven't tested it on many images, and the threshold and pre- and post-processing may need tweaking. It will not work if input images vary significantly, or if the grid is too dark relative to the ink.
import cv2
import numpy
import sys

BLOCK_SIZE = 50
THRESHOLD = 25

def preprocess(image):
    image = cv2.medianBlur(image, 3)
    image = cv2.GaussianBlur(image, (3, 3), 0)
    return 255 - image

def postprocess(image):
    image = cv2.medianBlur(image, 5)
    # image = cv2.medianBlur(image, 5)
    # kernel = numpy.ones((3,3), numpy.uint8)
    # image = cv2.morphologyEx(image, cv2.MORPH_OPEN, kernel)
    return image

def get_block_index(image_shape, yx, block_size):
    y = numpy.arange(max(0, yx[0]-block_size), min(image_shape[0], yx[0]+block_size))
    x = numpy.arange(max(0, yx[1]-block_size), min(image_shape[1], yx[1]+block_size))
    return numpy.meshgrid(y, x)

def adaptive_median_threshold(img_in):
    med = numpy.median(img_in)
    img_out = numpy.zeros_like(img_in)
    img_out[img_in - med < THRESHOLD] = 255
    return img_out

def block_image_process(image, block_size):
    out_image = numpy.zeros_like(image)
    for row in range(0, image.shape[0], block_size):
        for col in range(0, image.shape[1], block_size):
            idx = (row, col)
            block_idx = get_block_index(image.shape, idx, block_size)
            out_image[block_idx] = adaptive_median_threshold(image[block_idx])
    return out_image

def process_image_file(filename):
    image_in = cv2.cvtColor(cv2.imread(filename), cv2.COLOR_BGR2GRAY)
    image_in = preprocess(image_in)
    image_out = block_image_process(image_in, BLOCK_SIZE)
    image_out = postprocess(image_out)
    cv2.imwrite('bin_' + filename, image_out)

if __name__ == "__main__":
    process_image_file(sys.argv[1])
OpenCV has a tutorial dealing with removing a grid from an image:
"Extract horizontal and vertical lines by using morphological operations", OpenCV documentation: https://docs.opencv.org/master/dd/dd7/tutorial_morph_lines_detection.html
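A rough sketch of that tutorial's idea, adapted to this problem (the kernel lengths are guesses that depend on the grid spacing, and the filename is a placeholder):
import cv2
gray = cv2.imread('formula.jpg', cv2.IMREAD_GRAYSCALE)
binary = cv2.adaptiveThreshold(~gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 15, -2)
h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))
horizontal = cv2.morphologyEx(binary, cv2.MORPH_OPEN, h_kernel)   # keeps only long horizontal runs
vertical = cv2.morphologyEx(binary, cv2.MORPH_OPEN, v_kernel)     # keeps only long vertical runs
grid = cv2.bitwise_or(horizontal, vertical)
no_grid = cv2.bitwise_and(binary, cv2.bitwise_not(grid))          # drop the detected grid lines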
This is a pretty difficult task. I also had this problem and I discovered that the solution can't be 100% accurate. BTW, just a few days ago I saw this link. Maybe it could help.
I read this blog post where the author uses a laser and a webcam to estimate the distance of a piece of cardboard from the webcam.
I had another idea about that. I don't want to calculate the distance from the webcam.
I want to check whether an object is approaching the webcam. My algorithm would be something like:
Detect the object in the webcam feed.
If the object is approaching the webcam it'll grow larger and larger in the video feed.
Use this data for further calculations.
Since I want to detect random objects, I am using the findContours() method to find the contours in the video feed. Using that, I will at least have the outlines of the objects in the video feed. The source code is:
import numpy as np
import cv2

vid = cv2.VideoCapture(0)

ans, instant = vid.read()
average = np.float32(instant)
cv2.accumulateWeighted(instant, average, 0.01)
background = cv2.convertScaleAbs(average)

while(1):
    _, f = vid.read()
    imgray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
    ret, thresh = cv2.threshold(imgray, 127, 255, 0)
    diff = cv2.absdiff(f, background)

    cv2.imshow("input", f)
    cv2.imshow("Difference", diff)

    if cv2.waitKey(5) == 27:
        break

cv2.destroyAllWindows()
The output is:
I am stuck here. I have the contours stored in an array. What do I do with them when the size increases? How do I proceed?
One trouble here is recognising and differentiating the moving objects from other stuff in the video feed. An approach might be to let the camera 'learn' what the background looks like with no object. Then you can constantly compare its input against this background. One way to get the background is to use a running average.
Any difference greater than a small threshold means there is a moving object. If you constantly display this difference, you basically have a motion tracker. The size of the objects is simply the sum of all the non-zero (thresholded) pixels, or their bounding rectangles. You can track this size and use it to guess whether the object is moving closer or further. Morphological operations can help group the contours into one cohesive object.
Since it will be tracking ANY movement, if there are two objects, they will be counted together. Here is where you can use the contours to find and track individual objects, e.g. using the contour bounds or centroids. You could also possibly separate them by colour.
Here are some results using this strategy (the grey blob is my hand):
It actually did a fairly good job of guessing which way my hand was moving.
Code:
import cv2
import numpy as np

AVERAGE_ALPHA = 0.2           # 0-1 where 0 never adapts, and 1 instantly adapts
MOVEMENT_THRESHOLD = 30       # Lower values pick up more movement
REDUCED_SIZE = (400, 600)
MORPH_KERNEL = np.ones((10, 10), np.uint8)

def reduce_image(input_image):
    """Make the image easier to deal with."""
    reduced = cv2.resize(input_image, REDUCED_SIZE)
    reduced = cv2.cvtColor(reduced, cv2.COLOR_BGR2GRAY)
    return reduced

# Initialise
vid = cv2.VideoCapture(0)
average = None
old_sizes = np.zeros(20)
size_update_index = 0

while (True):
    got_frame, frame = vid.read()
    if got_frame:
        # Reduce image
        reduced = reduce_image(frame)
        if average is None: average = np.float32(reduced)

        # Get background
        cv2.accumulateWeighted(reduced, average, AVERAGE_ALPHA)
        background = cv2.convertScaleAbs(average)

        # Get thresholded difference image
        movement = cv2.absdiff(reduced, background)
        _, threshold = cv2.threshold(movement, MOVEMENT_THRESHOLD, 255, cv2.THRESH_BINARY)

        # Apply morphology to help find object
        dilated = cv2.dilate(threshold, MORPH_KERNEL, iterations=10)
        closed = cv2.morphologyEx(dilated, cv2.MORPH_CLOSE, MORPH_KERNEL)

        # Get contours
        contours, _ = cv2.findContours(closed, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        cv2.drawContours(closed, contours, -1, (150, 150, 150), -1)

        # Find biggest bounding rectangle
        areas = [cv2.contourArea(c) for c in contours]
        if (areas != list()):
            max_index = np.argmax(areas)
            max_cont = contours[max_index]
            x, y, w, h = cv2.boundingRect(max_cont)
            cv2.rectangle(closed, (x, y), (x+w, y+h), (255, 255, 255), 5)

            # Guess movement direction
            size = w*h
            if size > old_sizes.mean():
                print("Towards")
            else:
                print("Away")

            # Update object size
            old_sizes[size_update_index] = size
            size_update_index += 1
            if (size_update_index) >= len(old_sizes): size_update_index = 0

        # Display image
        cv2.imshow('RaptorVision', closed)
        cv2.waitKey(1)  # let the HighGUI window refresh
Obviously this needs more work in terms of identifying, selecting and tracking the objects etc (at the moment it does horribly if there is something else moving in the background). There are also many parameters to vary and tweak (the ones set are what worked well for my system). I'll leave that up to you though.
Some links:
background extraction
motion tracking
If you want to get a bit more high-tech with the background removal, have a look here:
wallflower
Detect the object in the webcam feed.
If the object is approaching the webcam it'll grow larger and larger in the video feed.
Use this data for further calculations.
Good idea.
If you want to use the contour detection approach, you could do it the following way:
You have a series of images I1, I2, ..., In.
Do a contour detection on each one, giving C1, C2, ..., Cn (a contour is a set of points in OpenCV).
Take a large enough sample S_i ⊆ C_i of points from image i (and likewise for image i+1).
For each point in your sample, find the nearest point on the contour of image i+1. This gives you trajectories for all your points.
Check whether these trajectories point mostly outwards (the tricky part ;).
If they point outwards for a sufficient number of frames, your contour got bigger.
Alternatively, you could try to prune the points that are not part of the correct contour and work with a covering rectangle. It's very easy to check the size that way, but I don't know how easy it will be to choose the "correct" points.
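A rough Python sketch of the trajectory idea from the steps above (it assumes you already have a binary mask per frame, and the sampling step is arbitrary):
import cv2
import numpy as np

def largest_contour_points(mask):
    # Largest external contour as an (N, 2) float array of points.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    c = max(contours, key=cv2.contourArea)
    return c.reshape(-1, 2).astype(np.float32)

def fraction_moving_outwards(c1, c2, step=10):
    # Match every `step`-th point of contour c1 to its nearest neighbour on c2
    # and check whether the displacement points away from c1's centroid.
    centroid = c1.mean(axis=0)
    sample = c1[::step]
    outward = 0
    for p in sample:
        q = c2[np.argmin(np.linalg.norm(c2 - p, axis=1))]   # nearest point on the next contour
        if np.dot(q - p, p - centroid) > 0:                 # moving away from the centroid
            outward += 1
    return outward / float(len(sample))

# If this fraction stays well above 0.5 over several consecutive frames,
# the contour is growing, i.e. the object is probably approaching.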
I'm using PIL to scale images that range anywhere from 600px wide to 2400px wide down to around 200px wide. I've already incorporated Image.ANTIALIAS and set quality=95 to try and get the highest quality image possible.
However, the scaled-down images still have pretty poor quality compared to the originals.
Here's the code that I'm using:
import urllib
import cStringIO
from PIL import Image, ImageOps

# Open the original image
fp = urllib.urlopen(image_path)
img = cStringIO.StringIO(fp.read())
im = Image.open(img)
im = im.convert('RGB')

# Resize the image
resized_image = ImageOps.fit(im, size, Image.ANTIALIAS)

# Save the image
resized_image_object = cStringIO.StringIO()
resized_image.save(resized_image_object, image_type, quality=95)
What's the best way to scale an image along these ratios while preserving as much of the image quality as possible?
I should note that my primary goal is get the maximum quality image possible. I'm not really concerned with how efficient the process is time wise.
If you can't get results with the native resize options in PIL, you can manually calculate the resize pixel values by running them through your own resizing function. There are three main algorithms (that I know of) for resizing images:
Nearest Neighbor
Bilinear Interpolation
Bicubic Interpolation
The last one will produce the highest quality image at the longest calculation time. To do this, imagine the pixel layout of the smaller image, then scale it up to match the larger image and think about where the new pixel locations fall over the old ones. Then, for each new pixel, take a weighted average of the 16 nearest pixels (a 4x4 neighbourhood around it) and use that as its new value.
The resulting values for each of the pixels in the small image will be a smooth but clear resized version of the large image.
For further reading look here: Wikipedia - Bicubic interpolation
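As a minimal sketch with PIL's built-in bicubic filter (the filenames and the 200px target are placeholders):
from PIL import Image

im = Image.open('original.jpg').convert('RGB')
w, h = im.size
target_w = 200
small = im.resize((target_w, h * target_w // w), Image.BICUBIC)   # keep the aspect ratio
small.save('resized.jpg', quality=95)
In practice it is worth comparing this against Image.ANTIALIAS, which the question already uses, since that is also a high-quality filter for downscaling.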
Try a different approach. I'm not sure if this will help, but I did something similar a while back:
https://stackoverflow.com/a/13211834/1339024
It may be that the original image at the URL path is not that great quality to begin with. But if you want, try my script. I made it to shrink images in a given directory, but this portion could be of use:
from PIL import Image

parentDir = "Some\\Path"
width = 200
height = 200
cdpi = 75
cquality = 95

a = Image.open(parentDir+'\\'+imgfile)  # Change this to your url type
iw, ih = a.size

if iw > width or ih > height:
    pcw = width/float(iw)
    pch = height/float(ih)
    if pcw <= pch:
        LPC = pcw
    else:
        LPC = pch
    if 'gif' in imgfile:
        a = a.convert("RGB")  # ,dither=Image.NONE)
        a = a.resize((int(iw*LPC), int(ih*LPC)), Image.ANTIALIAS)
        a = a.convert("P", dither=Image.NONE, palette=Image.ADAPTIVE)
        a.save(outputDir+"\\"+imgfile, dpi=(cdpi, cdpi), quality=cquality)
    else:
        a = a.resize((int(iw*LPC), int(ih*LPC)), Image.ANTIALIAS)
        a.save(outputDir+"\\"+imgfile, dpi=(cdpi, cdpi), quality=cquality)