I am working with face crops of different sizes, which are generated at inference time from a video. The number of face crops per frame can vary. Let's say I have n = 3 face crops of sizes (127, 321, 3), (119, 258, 3) and (135, 127, 3). I need to resize them in a single go to a new size (new_w, new_h, 3). The resizing will generally be down-sampling, but what if some of the crops require up-sampling?
I am currently using OpenCV's resize function in a loop, but I need to do this in real time (< 10 ms). I tried converting the images to NumPy arrays and resizing them as arrays, but that distorts the images. I have also tried creating a process pool, but it takes up too many resources, which I can't afford since I also have a deep learning network loaded in memory.
import cv2
import numpy as np

new_w, new_h = 112, 112  # new shape we need to resize images to
crops_batch = [[...], [...], [...]]  # contains 3 images
resized_crops = np.empty((len(crops_batch), 3, new_w, new_h), dtype=np.uint8)  # (n, channels, 112, 112), assuming uint8 crops
for i in range(len(crops_batch)):
    temp_img = crops_batch[i]
    resized_img = cv2.resize(temp_img, (new_w, new_h))  # dsize is (width, height)
    resized_img = np.transpose(resized_img, (2, 0, 1))  # HWC -> CHW
    resized_crops[i, :, :, :] = resized_img
My target is to get a resized_crops array of shape (n, 3, new_w, new_h) using batching, since I will have to process hundreds of crops per second.
**Edit:** What if all the face crops are of the same size already? How can we do the resizing in batch then?
**Edit 2:** I am open to a solution in C++ that can do batched resizing.
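For reference, a minimal sketch of one batched alternative (my own suggestion, not confirmed by the question): cv2.dnn.blobFromImages resizes a whole list of images, of equal or different sizes, to one common size in a single call and returns a float32 NCHW blob, which matches the (n, 3, new_h, new_w) layout above. The same function exists in the C++ API as cv::dnn::blobFromImages. Whether it meets the < 10 ms budget would still need measuring, since it also iterates over the images internally.

import cv2

# crops_batch is assumed to be a list of uint8 BGR images of arbitrary sizes
new_w, new_h = 112, 112
blob = cv2.dnn.blobFromImages(crops_batch, scalefactor=1.0,
                              size=(new_w, new_h), swapRB=False, crop=False)
print(blob.shape)  # (n, 3, new_h, new_w), dtype float32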
When using cv2.HOGDescriptor().detectMultiScale, what is the starting size of the detection window? Is it the same size as my training data?
For example, if my training data are all 64 x 128 images, does the detection window start at 64 x 128?
And how is the scaling factor used? For example, if I want to detect humans in an image of size 640 x 512 and I set scale=1.05, how is this 1.05 used?
The detection window is always 64 x 128 by default. To accommodate multiple scales, the image is progressively rescaled to create an image pyramid while the 64 x 128 detection window stays the same. This achieves the effect of searching for humans with larger search windows while keeping the actual window size fixed. The image pyramid is constructed by progressively decreasing the image size by the scale factor until the 64 x 128 window can no longer fit inside the rescaled image. Therefore, if your search images already consist of 64 x 128 images, there will only be one scale.
This leads to your next question: if scale=1.05, an image pyramid is produced by progressively resizing the input image's rows and columns to rows / (scale ** i) and cols / (scale ** i) for i = 0, 1, 2, .... For each image in the pyramid, the 64 x 128 search window is used to look for the object of interest.
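As a rough illustration (not OpenCV's internal code), the pyramid levels for a 640 x 512 image with scale=1.05 can be enumerated like this:

scale = 1.05
win_w, win_h = 64, 128
cols, rows = 640, 512  # image width and height

i = 0
while True:
    level_cols = int(cols / scale ** i)
    level_rows = int(rows / scale ** i)
    if level_cols < win_w or level_rows < win_h:
        break  # the 64 x 128 window no longer fits at this level
    print("level {}: {} x {}".format(i, level_cols, level_rows))
    i += 1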
I am training a Convolutional Neural Network (in tensorflow-gpu) to segment histology slides.
My problem is that the prediction method is extremely slow. The architecture of the neural network is set up to receive a 75 x 75 RGB pixel array as an input and classify the central pixel. In other words, for each 75 x 75 window of pixels the neural net receives, it only classifies 1 pixel (at the window's centre).
I've set up the neural network in this way so that it can be scaled up and applied to any size of image. Each 'window' exists purely to contextualise its central pixel, which the neural network classifies. The prediction method loops through every pixel in the input image and uses its corresponding 75 x 75 RGB window to classify it.
My current method of generating the 75 x 75 windows is written in Python, slow and unnecessarily serial (it uses for-loops).
Does a parallelised method exist that can convert an image into a set of RGB windows?
For example, it would convert a 400 x 700 x 3 image into a matrix of size 280,000 x 75 x 75 x 3. This is because there are 280,000 pixels in the input image (400 x 700 = 280,000), so there should be 280,000 windows, each centred on one of the input's pixels. As each window has dimensions 75 x 75 x 3 and there are 280,000 windows, the method's output size would be 280,000 x 75 x 75 x 3.
Ideally, I imagine such a method would utilise any available GPUs, due to their advantages in image-processing and parallelised jobs.
Thank you for reading, all suggestions are welcome. :)
I managed to find the perfect function, skimage.util.view_as_windows().
Here is how I used this function, in an example:
First open the image and set your preferred 'window' size:
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from skimage.util import view_as_windows as windows
#lets use a window size of 75 pixels
window_size=75
#load image
data = np.asarray(Image.open("N:\\Insigneo Research Project\\MyScripts\\PoC_CNN\\MATLABLabelling\\split image\\12-1550A-001_01_01.png"))
print(data.shape)
img = plt.imshow(data, interpolation='nearest')
plt.show()
Output: (742, 486, 3), (Image in link below)
Output of above cell: A loaded example image
Then, you must use the window size to pad each of the RGB channels, so that every pixel can have its own centred window (the padding allows the edge pixels to be centred):
#assertion ensures window dimensions are odd, ensuring a 'central' pixel exists.
assert window_size%2==1
buffer = int((window_size-1)/2)
padded_image_data = [0]*3
for i in range(3):
    padded_image_data[i] = np.pad(data[:,:,i], (buffer,buffer), 'symmetric')
padded_image_data = np.dstack((padded_image_data[0],padded_image_data[1],padded_image_data[2]))
img = plt.imshow(padded_image_data, interpolation='nearest')
plt.show()
Output: (Image in link below)
Output image of above cell: Symmetrically padded version of example image.
And then finally, apply the view_as_windows (shortened to 'windows') function to the padded RGB image:
large_window_matrix = windows(padded_image_data, ((window_size,window_size,3))).reshape(data.shape[0]*data.shape[1],window_size,window_size,3)
print(large_window_matrix.shape)
img = plt.imshow(large_window_matrix[1], interpolation='nearest')
plt.show()
Output: (360612, 75, 75, 3), (Image in link below)
Output of above cell: The first 75x75 RGB window from the padded example image. This window is centred on the upper-left-most pixel.
There you have it! The 'large_window_matrix' is the large matrix holding all the RGB windows :)
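One caveat worth adding (my own note, not part of the original answer): the .reshape() above forces every window to be materialised, which for this image is roughly 360,612 x 75 x 75 x 3 bytes, i.e. several gigabytes. A sketch of a workaround, assuming a hypothetical model object whose predict method accepts an (N, 75, 75, 3) batch and returns one value per window, is to classify one row of windows at a time straight from the strided view:

window_view = windows(padded_image_data, (window_size, window_size, 3))
rows, cols = window_view.shape[:2]
predictions = np.empty((rows, cols), dtype=np.float32)
for r in range(rows):
    # copies only one row of windows (a few MB) instead of the whole matrix
    row_batch = window_view[r].reshape(cols, window_size, window_size, 3)
    predictions[r] = np.asarray(model.predict(row_batch)).ravel()  # hypothetical model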
I am trying to perform pixel classification for image segmentation using machine learning methods such as SVM, RandomForest, etc.
I managed to get an acceptable result by using the grayscale and RGB values of the image and associating each pixel with its ground truth. To avoid posting the full code, here is how I made the feature and label arrays when using the full image:
import cv2
import numpy as np

img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
feature_img = np.zeros((img.shape[0], img.shape[1], 4))  # container array: the first three channels hold the BGR values, the last holds the grayscale
feature_img[:, :, :3] = img
feature_img[:, :, 3] = img_gray
features = feature_img.reshape(feature_img.shape[0] * feature_img.shape[1], feature_img.shape[2])
gt_features = gt_img.reshape(gt_img.shape[0] * gt_img.shape[1], 1)
For an image of size 512 x 512, the above gives features of shape [262144, 4] and an accompanying gt_features of shape [262144, 1].
This gives me the X and y for sklearn.svm.SVC and, as mentioned above, it works well, but the result is very noisy. Since SVM works well with higher-dimensional data, I intend to explore that by splitting the image into windows.
Based on the above code, I wanted to split my image of size [512, 1024] into blocks of size [64, 64] and use these for training the SVM.
Following the above format, I wrote the code below to split my image into blocks and then .reshape() it into the required format for the classifier, but it is not working as expected:
win_size = 64
feature_img = blockshaped(img_gray, win_size, win_size)
feature_label = blockshaped(gt_img, win_size, win_size)
# above returns arrays of shape [128, 64, 64]
features = feature_img.reshape(feature_img.shape[1] * feature_img.shape[2], feature_img.shape[0])
# features is of shape [4096, 128]
label_ = feature_label.reshape(feature_label.shape[0] * feature_label.shape[1] * feature_label.shape[2], 1)
# this, as expected returns ``[524288, 1]``
The function blockshaped is from the answer provided here: Slice 2d array into smaller 2d arrays
The reason I want to increase the dimensionality of my feature data is that SVM is known to work well with higher-dimensional data, and I also want to see whether a block- or patch-based approach helps the segmentation result.
How would I go about arranging my data, which I have broken down into windows, into a form that can be used to train a classifier?
I've been thinking about your question for 5 hours and read some books to find the answer!
Your approach is completely wrong if you are doing segmentation!
When we use machine learning methods for segmentation, we do not change the position of any pixel at all.
This holds not only for SVM but also for neural networks: when approaching segmentation we avoid pooling methods, and even in CNNs we use 'same' padding to avoid shifting pixels.
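To make that concrete, here is a minimal sketch (my own illustration, not code from this answer) of how window-based features can be arranged while keeping a one-to-one correspondence between feature rows and pixel labels: take a small window around every pixel rather than splitting the image into disjoint blocks. The window size of 9 and the use of skimage's view_as_windows are assumptions; img_gray and gt_img are the arrays from the question.

import numpy as np
from skimage.util import view_as_windows

win = 9                                # assumed window size; must be odd so each window has a central pixel
pad = win // 2
padded = np.pad(img_gray, pad, mode='symmetric')
per_pixel_windows = view_as_windows(padded, (win, win))  # shape (H, W, win, win)
features = per_pixel_windows.reshape(-1, win * win)      # one row of win*win values per pixel
labels = gt_img.reshape(-1, 1)                           # one label per pixel, positions unchanged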
I have a numpy array batch initialized as follows:
batch = np.zeros((50, 60, 1920, 1080, 3))
It's supposed to be an array of 50 different 60 FPS videos of dimension 1920x1080, and the 3 represents the three channels: red, green and blue. Each video is exactly 1 second long.
I iterate through all videos in my video folder and perform image processing on each frame of every video. Then I write the transformed video into the batch array. How do I properly index the batch array to save each video in a way that conforms with the dimensions of the batch array?
So far I have tried the following:
batch[:batches_produced, :idx, :] = frame[:]
where batches_produced is the current batch item index, idx is the current frame's index and frame is the actual frame of dimension (1920x1080x3).
When I run print(batch_data[1,2,:,:,:].shape), it throws:
IndexError: index 1 is out of bounds for axis 0 with size 1.
Needless to say, this isn't working at all. I have spent most of my day trying to figure this out.
Any help would be greatly appreciated!
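A minimal sketch of what the indexing presumably should look like (this is my reading of the intent, not an accepted answer): assign one frame at a time with plain integer indices rather than slice ranges. The dtype is also an assumption; np.zeros defaults to float64, which for this shape needs roughly 150 GB, whereas uint8 needs about 19 GB.

import numpy as np

batch = np.zeros((50, 60, 1920, 1080, 3), dtype=np.uint8)

for batches_produced in range(batch.shape[0]):      # index of the current video
    for idx in range(batch.shape[1]):               # index of the current frame
        frame = np.zeros((1920, 1080, 3), dtype=np.uint8)  # placeholder for the processed frame
        batch[batches_produced, idx] = frame

print(batch[1, 2].shape)  # (1920, 1080, 3)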
For my next university project I will have to teach a Convolutional Neural Network how to denoise a picture of a face, so I started digging the web for datasets of faces. I stumbled upon this dataset (CelebA) with 200k+ pictures of people, and I found the first few problems: there are too many pictures to do basic computation on them.
I should:
Open each image and make a numpy array out of it (dlib.load_rgb_image is fine)
Find a face in it, then use the 5-point shape predictor to find the eyes and align them
Rotate the picture so that the eyes are on a straight horizontal line
Crop the face and resize it to 256x256 (I could choose 64x64 but it's not a huge time saver)
Make a copy and add artificial noise to it
Save them both to two different folders
On a PC that the university gave me I can do about 40 images per minute, around 57k images every 24 hours.
To speed things up I have tried threads, one thread per picture, but the speedup is only about 2-3 more images per minute.
This is the code I'm running:
### Out of the threads, before running them ###
import math

import cv2
import dlib
import skimage.util

def img_crop(img, bounding_box):
    # some code using cv2.copyMakeBorder to crop the image
    ...

MODEL_5_LANDMARK = "5_point.dat"
shape_predictor = dlib.shape_predictor(MODEL_5_LANDMARK)
detector = dlib.get_frontal_face_detector()

### Inside each thread ###
img_in = dlib.load_rgb_image("img_in.jpg")
dets = detector(img_in, 1)
shape = shape_predictor(img_in, dets[0])

points = []
for i in range(0, shape.num_parts):
    point = shape.part(i)
    points.append((point.x, point.y))

eye_sx = points[1]
eye_dx = points[3]
dy = eye_dx[1] - eye_sx[1]
dx = eye_dx[0] - eye_sx[0]
angle = math.degrees(math.atan2(dy, dx))

center = (dets[0].center().x, dets[0].center().y)
h, w, _ = img_in.shape
M = cv2.getRotationMatrix2D(center, angle + 180, 1)
img_in = cv2.warpAffine(img_in, M, (w, h))

dets = detector(img_in, 1)
bbox = (dets[0].left(), dets[0].top(), dets[0].right(), dets[0].bottom())
img_out = cv2.resize(img_crop(img_in, bbox), (256, 256))
img_out = cv2.cvtColor(img_out, cv2.COLOR_BGR2RGB)

img_noisy = skimage.util.random_noise(img_out, ....)
cv2.imwrite('out.jpg', img_out)
cv2.imwrite('out_noise.jpg', img_noisy)
My programming language is Python 3.6; how can I speed things up?
Another problem will be loading the whole 200k images into memory as a numpy array: from my initial testing, 12k images take around 80 seconds to load, with a final shape of (12000, 256, 256, 3). Is there a faster way to achieve this?
First of all, forgive me because I am familiar with C++ only. Please find below my suggestions to speed up the dlib functions, and convert them to your Python version if that is helpful.
Color does not matter to dlib, so change the input image to grayscale before proceeding to save time.
I saw you call the function below twice; what is the purpose? It could double the processing time. If you need to get the new landmarks after alignment, try to rotate the landmark points directly instead of re-detecting. How to rotate points
dets = detector(img_in, 1)
Because you just want to detect only 1 face per image, try setting pyramid_down to 6 (by default it is 1, which zooms the image out to detect more faces). You can test values from 1 to 6:
dets = detector(img_in, 6)
Turn on AVX instructions.
Note: more details can be found here: Dlib Github
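A Python sketch of the first two suggestions (my own illustration, not code from this answer; it reuses detector, shape_predictor, img_in, M, w and h from the question's code): run the detector on a grayscale copy, and instead of calling the detector a second time after warping, rotate the corners of the original detection box with the same affine matrix M.

import cv2
import numpy as np

def rotate_bbox(det, M):
    # transform all four corners of the dlib rectangle with the 2x3 affine
    # matrix M and take the axis-aligned bounds of the rotated corners
    l, t, r, b = det.left(), det.top(), det.right(), det.bottom()
    corners = np.array([[l, t], [r, t], [r, b], [l, b]], dtype=np.float64)
    rotated = cv2.transform(corners.reshape(-1, 1, 2), M).reshape(-1, 2)
    x0, y0 = rotated.min(axis=0)
    x1, y1 = rotated.max(axis=0)
    return int(x0), int(y0), int(x1), int(y1)

gray = cv2.cvtColor(img_in, cv2.COLOR_RGB2GRAY)  # dlib accepts grayscale input
dets = detector(gray, 1)
shape = shape_predictor(gray, dets[0])
# ... compute angle and M exactly as in the question, then:
img_rot = cv2.warpAffine(img_in, M, (w, h))
bbox = rotate_bbox(dets[0], M)                   # no second detector() call needed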