When using cv2.HOGDescriptor().detectMultiScale, what is the starting size of the detection window used? Is it the same size as my training data?
For example, if my training data are all 64 x 128 images, does the detection window start at 64 x 128?
And how is the scaling factor used? For example, if I want to detect humans in an image of size 640 x 512 and I set scale=1.05, how is this 1.05 used?
The detection window is always 64 x 128 by default. To accommodate multiple scales, the image is progressively downscaled to create an image pyramid while the 64 x 128 detection window stays the same; shrinking the image while keeping the window fixed has the same effect as searching the original image with progressively larger windows. The pyramid is built by repeatedly dividing the image size by the scale factor until the 64 x 128 window no longer fits inside the rescaled image. Therefore, if your search images already consist of 64 x 128 images, there will only be one scale.
This leads to your next question: if scale=1.05, the pyramid is produced by progressively resizing the input image rows and columns to rows / (scale ** i) and cols / (scale ** i) for i = 0, 1, 2, .... For each image in the pyramid, the 64 x 128 window is slid over it to look for the object of interest.
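As a rough illustration, here is a minimal sketch (my own, not OpenCV's exact internals) that enumerates the pyramid levels for your 640 x 512 image with scale=1.05 and a 64 x 128 window:

# enumerate the image pyramid levels searched by the 64 x 128 window
rows, cols = 512, 640
win_w, win_h = 64, 128
scale = 1.05

i = 0
while int(cols / scale ** i) >= win_w and int(rows / scale ** i) >= win_h:
    print(f"level {i}: {int(cols / scale ** i)} x {int(rows / scale ** i)}")
    i += 1

For 640 x 512 this yields 29 levels (i = 0 to 28), the last being roughly 163 x 130, after which the window no longer fits vertically.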
I want to load a Keras model using saved training weights and make predictions on RGB images.
My model generates a binary value for each pixel. For the input, the model should load the w x w pixels around the center pixel, then apply a special filter (e.g. crop the window circularly and rotate the circle; just assume some filter that uses all pixels in that patch) and then make the prediction.
# Circular filter added to each window
import numpy as np

def circular_mask(width=5):
    # boolean mask: True inside the circle inscribed in a width x width patch
    radius = (width - 1) / 2
    Y, X = np.ogrid[:width, :width]
    distance = np.sqrt((Y - radius) ** 2 + (X - radius) ** 2)
    return distance <= radius
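For example (a hypothetical patch, just to show the call), the mask can be applied by element-wise multiplication:

# zero out every pixel outside the inscribed circle of a 5 x 5 patch
patch = np.arange(25, dtype=float).reshape(5, 5)
masked = patch * circular_mask(5)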
I can prepare the dataset myself, generate a huge numpy array and call model.predict() on that huge matrix. But when the image is large, the data doesn't fit in the GPU's memory. My question is: is there a better way to automate sliding-window prediction over the image, with my special_filter applied to every window?
I am training a Convolutional Neural Network (in tensorflow-gpu) to segment histology slides.
My problem is that the prediction method is extremely slow. The architecture of the neural network is set up to receive a 75 x 75 RGB pixel array as an input and classify the central pixel. In other words, for each 75 x 75 window of pixels the neural net receives, it only classifies 1 pixel (at the window's centre).
I've set up the neural network in this way so that it can be scaled up and applied to any size of image. Each 'window' exists purely to contextualise its central pixel, which the neural network classifies. The prediction method loops through every pixel in the input image and uses its corresponding 75 x 75 RGB window to classify it.
My current method of generating the 75 x 75 windows is written in Python, slow, and unnecessarily serial (it uses for-loops).
Does a parallelised method exist that can convert an image into a set of RGB windows?
For example, it would convert a 400 x 700 x 3 image into a matrix of size 280'000 x 75 x 75 x 3. This is because there are 280'000 pixels in the input image (400 x 700 = 280'000), so there should be 280'000 windows, with each of the input's pixels at their centre. As each window has the dimensions 75 x 75 x 3 and there are 280'000 windows, the method's output size would be 280'000 x 75 x 75 x 3.
Ideally, I imagine such a method would utilise any available GPUs, due to their advantages in image-processing and parallelised jobs.
Thank you for reading, all suggestions are welcome. :)
I managed to find the perfect function, skimage.util.view_as_windows().
Here is how I used this function, in an example:
First open the image and set your preferred 'window' size:
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from skimage.util import view_as_windows as windows
#lets use a window size of 75 pixels
window_size=75
#load image
data = np.asarray(Image.open("N:\\Insigneo Research Project\\MyScripts\\PoC_CNN\\MATLABLabelling\\split image\\12-1550A-001_01_01.png"))
print(data.shape)
img = plt.imshow(data, interpolation='nearest')
plt.show()
Output: (742, 486, 3) [image: the loaded example image]
Then, you must use the window size to pad each of the RGB channels, to allow all pixels to have their own centralised window (padding allows the edge pixels to be centralised):
#assertion ensures window dimensions are odd, ensuring a 'central' pixel exists
assert window_size % 2 == 1
buffer = int((window_size - 1) / 2)
padded_image_data = [0] * 3
for i in range(3):
    padded_image_data[i] = np.pad(data[:, :, i], (buffer, buffer), 'symmetric')
padded_image_data = np.dstack((padded_image_data[0], padded_image_data[1], padded_image_data[2]))
img = plt.imshow(padded_image_data, interpolation='nearest')
plt.show()
Output: [image: symmetrically padded version of the example image]
And then finally, apply the view_as_windows (shortened to 'windows') function to the padded RGB image:
large_window_matrix = windows(padded_image_data, (window_size, window_size, 3))
large_window_matrix = large_window_matrix.reshape(data.shape[0] * data.shape[1],
                                                  window_size, window_size, 3)
print(large_window_matrix.shape)
img = plt.imshow(large_window_matrix[0], interpolation='nearest')
plt.show()
Output: (360612, 75, 75, 3) [image: the first 75 x 75 RGB window from the padded image, centred on the most upper-left pixel]
There you have it! The 'large_window_matrix' is the large matrix holding all the RGB windows. :)
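As an aside (an assumption of mine, not part of the answer above: it requires NumPy >= 1.20), numpy.lib.stride_tricks.sliding_window_view can build the same matrix without skimage:

from numpy.lib.stride_tricks import sliding_window_view

# same windows as view_as_windows; the reshape still copies the data,
# so memory use matches the skimage version
views = sliding_window_view(padded_image_data, (window_size, window_size, 3))
large_window_matrix = views.reshape(-1, window_size, window_size, 3)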
The best mosaic code I've found is on this page:
https://github.com/codebox/mosaic
However, the code doesn't work well on my Windows computer, and I also think it is more complicated than it needs to be. Here are the requirements I posted on reddit:
1) The main photo already has a reduced number of colours (8).
2) I already have an image associated with each colour that needs to be replaced (e.g. image number 1 replaces black pixels, number 2 replaces green pixels, ...).
3) I need to enlarge the photo by the small photos' size (9 x 9 small photos will produce an image 81 times bigger), which should push the pixels "2n" points away from each other; but instead of producing an n x n same-coloured area around every single pixel (this is how I believe enlarging works in general, correct me if I'm wrong), it should just colour the empty space with an unrecognised colour that is not associated with any small photo (let's call that colour C).
4) Then all it needs to do is run through all non-C-coloured pixels and put an image centred on each such pixel, which creates the mosaic.
Since I'm pretty new to Python (especially graphics) and need this just for one use, could someone help me create that code? I think the code I was inspired by is too complicated. Two things I don't need:
1) "approximation" - if the enlargement is smaller than needed for 100% quality (e.g. the pictures are 9 x 9 but each side of the original photo can only be 3 times larger), the program has to merge some pixels of different colours together, leading to quality loss.
2) colour association - my palette of pictures is small, and so is the palette of colours; I can do it manually.
For those who didn't get what I mean, here is my idea: https://ibb.co/9GNhqBx
I had a quick go using pyvips:
#!/usr/bin/python3

import sys
import os
import pyvips

if len(sys.argv) != 4:
    print("usage: tile-directory input-image output-image")
    sys.exit(1)

# the size of each tile ... 16x16 for us
tile_size = 16

# load all the tile images, forcing them to the tile size
print(f"loading tiles from {sys.argv[1]} ...")
for root, dirs, files in os.walk(sys.argv[1]):
    tiles = [pyvips.Image.thumbnail(os.path.join(root, name), tile_size,
                                    height=tile_size, size="force")
             for name in files]

# drop any alpha
tiles = [image.flatten() if image.hasalpha() else image
         for image in tiles]

# copy the tiles to memory, since we'll be using them many times
tiles = [image.copy_memory() for image in tiles]

# calculate the average rgb for an image, eg. image -> [12, 13, 128]
def avg_rgb(image):
    m = image.stats()
    return [m(4, i)[0] for i in range(1, 4)]

# find the avg rgb for each tile
tile_colours = [avg_rgb(image) for image in tiles]

# load the main image ... we can do this in streaming mode, since we only
# make a single pass over the image
main = pyvips.Image.new_from_file(sys.argv[2], access="sequential")

# find the abs of an image, treating each pixel as a vector
def pyth(image):
    return sum([band ** 2 for band in image.bandsplit()]) ** 0.5

# calculate a distance map from the main image to each tile colour
distance = [pyth(main - colour) for colour in tile_colours]

# make a distance index -- hide the tile index in the bottom 16 bits of the
# distance measure
index = [(distance[i] << 16) + i for i in range(len(distance))]

# find the minimum distance for each pixel and mask out the bottom 16 bits to
# get the tile index for each pixel
index = index[0].bandrank(index[1:], index=0) & 0xffff

# replicate each tile image to make a set of layers, and zoom the index to
# make an index matching the output size
layers = [tile.replicate(main.width, main.height) for tile in tiles]
index = index.zoom(tile_size, tile_size)

# now for each layer, select pixels matching the index
final = pyvips.Image.black(main.width * tile_size, main.height * tile_size)
for i in range(len(layers)):
    final = (index == i).ifthenelse(layers[i], final)

print(f"writing {sys.argv[3]} ...")
final.write_to_file(sys.argv[3])
I hope it's easy to read. I can run it like this:
$ ./mosaic3.py smallpic/ mainpic/Use\ this.jpg x.png
loading tiles from smallpic/ ...
writing x.png ...
$
It takes about 5s on this 2015 laptop and makes this image:
I had to shrink it for upload, but here's a detail (bottom left of the first H):
Here's a google drive link to the mosaic, perhaps it'll work: https://drive.google.com/file/d/1J3ofrLUhkuvALKN1xamWqfW4sUksIKQl/view?usp=sharing
And here's this code on github: https://github.com/jcupitt/mosaic
I am working with face crops of different sizes, which are generated at inference time from a video. The number of face crops per frame can vary. Let's say I have n = 3 face crops of sizes (127, 321, 3), (119, 258, 3), (135, 127, 3). I need to resize them in a single go to a new size (new_w, new_h, 3). The resizing will generally be down-sampling, but what if some of the crops require up-sampling?
I am currently using OpenCV's resize function in a loop, but I need to do this in real time (< 10 ms). I tried converting the images to numpy arrays and resizing them as numpy arrays, but that distorts the images. I have also tried creating a process pool, but it takes too many resources, which I can't afford since I also have a deep learning network loaded in memory.
import cv2
import numpy as np

new_w, new_h = 112, 112  # new shape we need to resize images to
crops_batch = [[...], [...], [...]]  # contains 3 images
resized_crops = np.empty((len(crops_batch), 3, new_h, new_w), dtype=np.uint8)

for i in range(len(crops_batch)):
    temp_img = crops_batch[i]
    resized_img = cv2.resize(temp_img, (new_w, new_h))  # dsize is (width, height)
    resized_img = np.transpose(resized_img, (2, 0, 1))  # (channels, height, width)
    resized_crops[i, :, :, :] = resized_img
My target is to get a resized_crops array of shape (n, 3, new_h, new_w) using batching, since I will have to process hundreds of crops per second.
**Edit:** What if all the face crops are already the same size? How can we do the resizing in a batch then? (A sketch follows below.)
**Edit 2:** I am open to a solution in C++ that can help batch-resize.
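For the same-size case in the first edit, here is a minimal sketch of one batched approach (assuming PyTorch is acceptable; torch, F.interpolate, and the variable names are my additions, not part of the question):

import numpy as np
import torch
import torch.nn.functional as F

# if every crop already has the same (h, w, 3) shape, stack them and
# resize the whole batch in one call instead of looping in Python
crops = np.stack(crops_batch).astype(np.float32)     # (n, h, w, 3)
batch = torch.from_numpy(crops).permute(0, 3, 1, 2)  # (n, 3, h, w)
resized = F.interpolate(batch, size=(new_h, new_w),
                        mode='bilinear', align_corners=False)
resized_crops = resized.numpy()                      # (n, 3, new_h, new_w)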
I want to create a thumbnail, and the width has to be either fixed or no bigger than 200 pixels (the height can be anything).
The images are either .jpg or .png or .gif
I am using python.
The reason it has to be fixed is so that it fits inside an HTML table cell.
To keep the proportions the same, you need to multiply both the width and the height by the same scaling factor. Calculate each independently to fit inside your space, then choose the smaller of the two. You say you don't care about the height, but you might want to set a bound on it anyway in case someone feeds you a really skinny image.
In the code below, I've added two additional constraints: the resulting thumbnail width and height will always be >= 1, and the scaling factor will always be <= 1 (so that the thumbnail isn't larger than the original).
scale_x = max_width / image_width
scale_y = max_height / image_height
scale = min(scale_x, scale_y, 1)
thumb_width = max(round(image_width * scale), 1)
thumb_height = max(round(image_height * scale), 1)
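Wrapped up as a function (the name thumbnail_size is mine) with a worked example:

def thumbnail_size(image_width, image_height, max_width=200, max_height=200):
    # smallest scale that fits both bounds, never enlarging
    scale = min(max_width / image_width, max_height / image_height, 1)
    return max(round(image_width * scale), 1), max(round(image_height * scale), 1)

print(thumbnail_size(1000, 400))  # (200, 80): scale = 0.2 wins
print(thumbnail_size(100, 50))    # (100, 50): already small, scale capped at 1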
Look at PyMagick, the Python interface for the ImageMagick libraries. It's fairly simple to resize an image, retaining proportion, while limiting the longest side.
edit: when I say fairly simple, I mean you can describe your resize in terms of the longest acceptable values for each side, and ImageMagick will preserve the proportions automatically.
I support the suggestion of using PIL; however, the calculation is actually much simpler:

from PIL import Image as PILImage

imageObj = PILImage.open(image_filename, 'r')
iwidth, iheight = imageObj.size  # pixels
size_proportion = iheight / iwidth  # make sure your "limiter" is the denominator
newheight = int(size_proportion * 200)

# resize to (200, newheight) and save; PIL size tuples are (width, height)
thumb = imageObj.resize((200, newheight))
thumb.save(output_filename)  # output_filename: wherever you want the thumbnail
Alternatively, just call out to subprocess and use ImageMagick or GraphicsMagick (I use the latter). These libraries give you very good scaling algorithms, are written in a lower-level language, and are heavily optimized. One extra nice thing IM and GM do is mass processing of images. Another nice thing is that in some modes you don't need to give GraphicsMagick the exact size; just give maximums, and it will scale the picture down based on whichever constraint exceeds your given maximums. Check it out.
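For example, a minimal sketch calling ImageMagick's convert via subprocess (the filenames are placeholders; the "200x" geometry fixes the width at 200 px and lets the height follow the aspect ratio):

import subprocess

# "200x" = width 200, height computed automatically to keep proportions
subprocess.run(["convert", "input.jpg", "-resize", "200x", "thumb.jpg"],
               check=True)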