Why is my image convolution function so slow? - python

I wasn't sure if I should post this on the machine learning board or this one, but I chose this one since my problem has more to do with optimization. I am trying to build a YOLO model from scratch in python, but each convolution operation takes 10 seconds. Clearly I am doing something wrong, as YOLO is supposed to be super fast (it can produce results in real time). I don't need the network to run in real time, but it will be a nightmare to train if it takes several hours to run on one image. How could I optimize the code below? Apparently there is a lot of room for improvement.
Here is my convolution function:
def convolve(image, filter, stride, modifier):
    new_image = np.zeros([image.shape[0], _round((image.shape[1]-filter.shape[1])/stride)+1, _round((image.shape[2]-filter.shape[2])/stride)+1], float)

    #convolve
    for channel in range(0, image.shape[0]):
        filterPositionX = 0
        filterPositionY = 0
        while filterPositionX < image.shape[1]-filter.shape[1]+1:
            while filterPositionY < image.shape[2]-filter.shape[2]+1:
                sum = 0
                for i in range(0, filter.shape[1]):
                    for j in range(0, filter.shape[2]):
                        if filterPositionX+i < image.shape[1] and filterPositionY+j < image.shape[2]:
                            sum += image[channel][filterPositionX+i][filterPositionY+j]*filter[channel][i][j]
                new_image[channel][int(filterPositionX/stride)][int(filterPositionY/stride)] = sum*modifier
                filterPositionY += stride
            filterPositionX += stride
            filterPositionY = 0

    #condense
    condensed_new_image = np.zeros([new_image.shape[1], new_image.shape[2]], float)
    for i in range(0, new_image.shape[1]):
        for j in range(0, new_image.shape[2]):
            sum = 0
            for channel in range(0, new_image.shape[0]):
                sum += new_image[channel][i][j]
            condensed_new_image[i][j] = sum

    condensed_new_image = np.clip(condensed_new_image, 0, 255)
    return condensed_new_image
Running the function on a 448x448 grayscale image with a 7x7 filter and a stride of 2 takes about 10 seconds. My computer has an i7 processor.

Why is it slow: because the time complexity of the function as coded is O(c·n·n·k·k), where the image is n×n with c channels and the filter is k×k (the stride only divides this by a constant factor).
How to make it faster: avoid Python loops and use matrix operations (vectorize). Vectorized matrix operations run in optimized compiled code and can be parallelized.

Because it relies heavily on plain Python code; i.e., your operations are performed element by element. You should vectorise them. Please take a look at this guide: https://wiseodd.github.io/techblog/2016/07/16/convnet-conv-layer/
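For illustration, here is a minimal sketch of what a vectorized version of the question's function could look like. It assumes NumPy 1.20+ for sliding_window_view, and convolve_fast is just a hypothetical name mirroring the original signature; the nested loops move into compiled code, which is where the speedup comes from.
import numpy as np

def convolve_fast(image, kern, stride, modifier):
    # image: (channels, H, W); kern: (channels, k1, k2)
    c, k1, k2 = kern.shape
    # every k1 x k2 window of every channel: shape (c, H-k1+1, W-k2+1, k1, k2)
    windows = np.lib.stride_tricks.sliding_window_view(image, (k1, k2), axis=(1, 2))
    windows = windows[:, ::stride, ::stride]  # keep only the strided positions
    # multiply each window by its channel's kernel and sum over the window
    per_channel = np.einsum('chwij,cij->chw', windows, kern) * modifier
    # condense: sum over channels and clip, as the original function does
    return np.clip(per_channel.sum(axis=0), 0, 255)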

Related

Fast way to iterate over sparse numpy array but only where elements are 1

I have an array representing a light source in space made as such:
source2D = np.zeros((256, 256))
with any number of the pixels set to 1. An example I have is a point source, which is generated by:
source2D[126:128, 126:128] = 1
And I am running a Monte Carlo simulation which shoots a ray from each part of the array where the value is 1. Currently I am iterating over the entire array, but I would save a lot of time by only picking out the elements where the array is 1 and iterating over them. I should add that this function should accept a generic 256x256 array where any elements could be set to 1, so cropping the array is not an option. What is the fastest way to do this? I am also using TensorFlow, so an implementation using that would also be an option.
Right now my code looks something like this:
pc = 0
while pc < 1000000:
    pc += 1
    # Randomize x and y as coordinates on source
    x = np.random.randint(0, source2D.shape[0])  # 0 to 255 for this example
    y = np.random.randint(0, source2D.shape[1])  # 0 to 255 for this example
    # Shoot raycast from x,y to point on detector
Solved with hpaulj's comment:
source2D = np.zeros((256, 256)) # Testing with point source
source2D[126:128, 126:128] = 1
nonzero_entries = np.nonzero(source2D)
i = np.random.choice(nonzero_entries[0])
j = np.random.choice(nonzero_entries[1])
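One caveat worth noting: np.nonzero returns paired row/column arrays, so for a non-rectangular source, drawing i and j independently can yield coordinates where the array is actually 0. A small sketch that keeps the pairs together (the point-source example above is rectangular, so both versions agree there):
rows, cols = np.nonzero(source2D)   # paired (row, col) coordinates of the 1-pixels
idx = np.random.randint(len(rows))  # pick one pixel uniformly at random
i, j = rows[idx], cols[idx]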

How do I reshape an image to NxNx3 blocks and perform operations on their channels separately

I am trying to get a better understanding of numpy reshaping and transpose operations so that I can perform tasks on each local area of a color image (as opposed to the image as a whole). I can do these by creating slices and looping over slices, but I would prefer not having to create python loops. I have come up with some examples that should help me understand the parts that I have been having trouble with. I ordered them from easiest to most difficult. The last one is ultimately the one that I want to solve.
img = np.random.randint(low=0, high=256, size=(6,6,3), dtype=np.uint8)
img_mean = np.mean(img) #mean of the whole image, one value.
channel_means = np.mean(img, axis=(0,1)) #mean of each channel, three values.
binarized_img = np.where(img > img_mean, np.uint8(255), np.uint8(0)) #all values changed to either 0 or 255. Shape of image remains 6,6,3.
binarized_channels = #I would like to be able to do the same as above, but by using a different mean for each channel and without using python loops.
three_by_three_block_means = #I want to reshape the array into four 3x3x3 blocks and get each block's mean (should be 4 different means).
three_by_three_block_channel_means = #Same as above, but this time I want the mean of each channel of each block (should be 12 different means).
#I also want to be able to change the block's size arbitrarily, i.e. from 3x3x3 blocks to 2x2x3 blocks when needed.
binarized_blocks = #same as binarized_img, but done separately for each block based on their means instead of the mean of the whole image.
binarized_block_channels = #same as binarized_blocks, but done separately for each channel in each block.
If someone could show me how to complete these examples using only numpy (no python loops), I could learn from them and use them to accomplish the (similar) tasks that I frequently have trouble with.
The solution to your problem is strided convolution: use scipy.signal.convolve to compute the block means.
import numpy as np
from scipy import signal
img = np.random.randint(low=0, high=256, size=(6,6,3), dtype=np.uint8)
img_mean = np.mean(img) #mean of the whole image, one value.
channel_means = np.mean(img, axis=(0,1)) #mean of each channel, three values.
binarized_img = np.where(img > img_mean, np.uint8(255), np.uint8(0)) #all values changed to either 0 or 255. Shape of image remains 6,6,3.
I would like to be able to do the same as above, but by using a
different mean for each channel and without using python loops.
binarized_channels = np.where(img > channel_means, np.uint8(255), np.uint8(0))
I want to reshape the array into four 3x3x3 blocks and get each
block's mean (should be 4 different means).
Define a mean kernel (all ones divided by the sum of the kernel) of arbitrary shape, and perform a valid convolution of the image. Since scipy does not offer a stride argument we have to do this manually with [::s,::s].
s = 3
kernel = np.ones((s,s,s))/s**3
three_by_three_block_means = signal.convolve(img, kernel, 'valid')[::s,::s] # shape: (2, 2, 1)
Same as above, but this time I want the mean of each channel of each
block (should be 12 different means).
kernel = np.ones((s,s,1))/s**2
three_by_three_block_channel_means = signal.convolve(img, kernel, 'valid')[::s,::s] # shape: (2, 2, 3)
I also want to be able to change the block's size arbitrarily, i.e.
from 3x3x3 blocks to 2x2x3 blocks when needed.
Simply change the size of the kernel.
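For example, a sketch of 2x2x3 block means under the same setup:
s = 2
kernel = np.ones((s,s,3))/(s*s*3)  # 2x2x3 mean kernel
two_by_two_block_means = signal.convolve(img, kernel, 'valid')[::s,::s] # shape: (3, 3, 1)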
Same as binarized_img, but done separately for each block based on
their means instead of the mean of the whole image.
binarized_blocks = np.where(three_by_three_block_means > img_mean, np.uint8(255), np.uint8(0))
Same as binarized_blocks, but done separately for each channel in each
block.
binarized_block_channels = np.where(three_by_three_block_channel_means > channel_means, np.uint8(255), np.uint8(0))
Hope that solves your problem. Let me know if something is unclear.
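Since the question originally asked about reshape and transpose: for comparison, here is a loop-free sketch of the same block means using only reshaping. It assumes the image height and width are divisible by the block size s.
s = 3
h, w, c = img.shape
# carve the image into (h//s) x (w//s) non-overlapping s x s blocks
blocks = img.reshape(h // s, s, w // s, s, c)
block_channel_means = blocks.mean(axis=(1, 3))  # shape: (2, 2, 3)
block_means = blocks.mean(axis=(1, 3, 4))       # shape: (2, 2)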

3D Perlin Noise Normalize function in C#

I've spent the past week or two making a personalized Perlin noise generator (notice I said personalized because I don't want to use other generators), but I'm not a super-skilled programmer, and it's really slow. To speed it up, I've been looking into C#, because it's close to python and java, which are my two best languages, and it's not C. Problem is, I programmed the entire generator in python, which is not my strongest language, and had I programmed it in java I would've had an easier time converting it to C#.
Now I'm trying to translate my generator directly from python to C#, which I can do pretty easily for the most part, but I'm a little iffy on some stuff that my instructor coded for me. Namely, this normalize function:
# np is numpy
def normalize(img):
    img_copy = img*1.0            # copy and promote to float
    img_copy -= np.min(img_copy)  # shift so the minimum becomes 0
    img_copy /= np.max(img_copy)  # scale so the maximum becomes 1
    img_copy *= 255.9999          # stretch to [0, 256) so the cast covers the full 0..255 range
    return np.uint8(img_copy)     # truncate to 8-bit integers
I don't know if C# can do this kind of near-instantaneous whole-array arithmetic without excessive for-looping, and I also don't know much about NumSharp, which is what I'd use instead of numpy.
How would I write this function in C#, and how do I use the NumSharp equivalents of the numpy functions zeros(), max(), min() and the cv2 function resize?
P.S. I have the program on repl.it if you need more context.
https://repl.it/#JoshuaFavorite/PerlinNoiseGenerator#main.py
Edit: apparently it isn't clear that my python program is fully functioning and I don't need any help with that, I need help with C#. Specifically instantaneous matrix multiplication, matrix statistics and such things that are so easily done with python.
Sorry, I do not know C#, but here is one way to generate Perlin noise in Python OpenCV:
- Start with a black image
- Iterate generating a noise image at different dimensions according to power law
- Resize it
- Attenuate it
- Add it to the previous iteration
- Scale to range 0 to 255 as integer
- Save the results
import cv2
import skimage.exposure
import numpy as np
from numpy.random import default_rng

rng = default_rng()

# define arguments
wd = 500        # width of output
ht = 500        # height of output
base = 2        # integer>1; typically 2 to 4; frequency=base^(octave level)
startlevel = 1  # starting octave level; integer>0
endlevel = 6    # ending octave level; nominally 5 or 6
atten = 0       # persistence=1/atten; amp=persist^(j-1); atten=0 -> amp=1/j; j=1,2...; atten=2 is good, also

# compute larger of wd and ht
max_dim = max(wd, ht)

# compute start dim as base^(startlevel)
start_dim = base**startlevel

# compute end dim as base^(endlevel)
end_dim = base**endlevel

# create zero-frequency black base to which to add octaves
result = np.zeros((max_dim, max_dim), dtype=np.float32)

# process octaves
j = 1
for i in range(startlevel, endlevel):
    if atten == 0:
        amp = 1/j
    else:
        amp = 1/(atten**(j-1))
    print("Processing Octave Level:", i, " and Amplitude:", amp)

    # create noise image, attenuate it, combine with zero-frequency initial level
    dim = base**i
    noise = rng.integers(0, 255, (dim, dim), np.uint8, True).astype(np.float32)

    # resize to max_dim and attenuate
    noise = amp * cv2.resize(noise, (max_dim, max_dim), interpolation=cv2.INTER_CUBIC).astype(np.float32)

    # add to result
    result = cv2.add(result, noise)

    # stop if end_dim for the given endlevel is larger than max_dim
    if end_dim > max_dim:
        break

    # increment
    j += 1

# scale to range 0 to 255, and crop to desired dimensions
result = skimage.exposure.rescale_intensity(result, in_range='image', out_range=(0, 255)).clip(0, 255).astype(np.uint8)
result = result[0:ht, 0:wd]

# save result
cv2.imwrite('perlin.jpg', result)

# show results
cv2.imshow('perlin', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
Example results (output images not reproduced here): atten 0, startlevel 1, endlevel 6; atten 0, startlevel 1, endlevel 5; atten 2, startlevel 1, endlevel 6.

Which is the fastest method to calculate mean squared error on a large image dataset?

I'm trying to calculate the mean squared error on an image dataset (CIFAR-10). I have a numpy array of dimension 5*10000*32*32*3, which is, in words, 5 batches of 10000 images, each with dimensions of 32*32*3. These images belong to 10 categories. I have calculated the average image of each class, and now I'm trying to calculate the mean squared error of each of the 50000 images with respect to the 10 average images. Here is the code:
for i in range(0, 5):
    for j in range(0, 10000):
        min_diff, min_class = float('inf'), 0
        for avg in class_avg:  # class_avg comprises the 10 average images
            temp = mse(avg[1], images[i][j])
            if temp < min_diff:
                min_diff = temp
                min_class = avg[0]
        train_pred[i][j] = min_class
Problem: is there any way to make this faster? Any numpy magic? Thank you.
You can use expand_dims and tile.
There are many ways of expanding the dimensions of an array; I will use one of them, something like [:,None,:], which adds a new axis in the middle.
Below is an example of how you can combine the two methods to fulfill your task:
test = np.ones((5,100,32,32,3)) # batches of images
average = np.ones((10,32,32,3)) # the 10 images
average = average[None,None,...] # reshape to (1,1,10,32,32,3)
test = test[:,:,None,...] # insert an axis
test = np.tile(test,(1,1,10,1,1,1)) # reshape to (5,100,10,32,32,3)
print(test.shape,average.shape)
mse = ((test-average)**2).mean(axis=(3,4,5))
class_idx = np.argmin(mse,axis=-1)
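As an aside, the np.tile step is not strictly necessary: starting from the original test of shape (5,100,32,32,3), broadcasting expands it implicitly during the subtraction, which avoids materializing the tiled copy (though the squared-difference temporary is still large):
# same result without np.tile; average is already (1,1,10,32,32,3)
mse = ((test[:, :, None, ...] - average) ** 2).mean(axis=(3, 4, 5))
class_idx = np.argmin(mse, axis=-1)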
UPDATE
The purpose of using expand_dims and tile is to avoid using a for-loop. However, the np.tile operation will create 10 replicates of the original array, which will definitely hurt performance if the array is large. To avoid using np.tile, you can try the code below:
labels = np.empty((5,100,10))
average = np.ones((10,32,32,3))
average = average[None,...]
test = np.ones((5,100,32,32,3))
for ind in range(10):
    labels[...,ind] = ((test-average[:,ind,...])**2).mean(axis=(2,3,4))
labels = np.argmin(labels,axis=-1)

Python (numpy) crashes system with large number of array elements

I'm trying to build a basic character recognition model using the many classifiers that scikit provides. The dataset being used is a standard handwritten set of alphanumeric samples (Chars74K image dataset taken from this source: EnglishHnd.tgz).
There are 55 samples of each character (62 alphanumeric characters in all), each being 900x1200 pixels. I'm flattening each image (first converting it to grayscale) into a 1x1080000 array (each element representing a feature).
for sample in sample_images:  # sample_images is the list of the .png files
    img = imread(sample)
    img_gray = rgb2gray(img)
    if n == 0 and m == 0:  # n and m are global variables
        n, m = np.shape(img_gray)
    img_gray = np.reshape(img_gray, n*m)
    img_gray = np.append(img_gray, sample_id)  # sample_id stores the label of the training sample
    if len(samples) == 0:  # samples is the final numpy ndarray
        samples = np.append(samples, img_gray)
        samples = np.reshape(samples, [1, n*m + 1])
    else:
        samples = np.append(samples, [img_gray], axis=0)
So the final data structure should have 55x62 arrays, where each array is 1080000 elements in capacity. Only the final structure is being stored (the scope of the intermediate matrices is local).
The amount of data being stored to learn the model is pretty large (I guess), because the program isn't really progressing beyond a point, and crashed my system to the extent that the BIOS had to be repaired!
Up to this point, the program is only gathering the data to send to the classifier; the classification hasn't even been introduced into the code yet.
Any suggestions as to what can be done to handle the data more efficiently?
Note: I'm using numpy to store the final structure of flattened matrices.
Also, the system has 8 GB of RAM.
This seems like a case of stack overflow. You have 3,682,800,000 array elements, if I understand your question. What is the element type? If it is one byte, that is about 3 gigabytes of data, easily enough to fill up your stack size (usually about 1 megabyte). Even with one bit per element, you are still at about 460 MB. Try using heap memory (up to 8 GB on your machine).
I was encouraged to post this as a solution, although the comments above are probably more enlightening.
The issue with the user's program is twofold, but really it's just overwhelming the stack.
Much more common, especially with image processing in things like computer graphics or computer vision, is to process the images one at a time. This could work well with sklearn, where you could just be updating your model as you read in each image (see the sketch at the end of this answer).
You could use this bit of code, found in this Stack Overflow article:
import os

rootdir = '/path/to/my/pictures'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if file[-3:] == 'png':  # or whatever your file type is / some check
            # do your training here
            img = imread(os.path.join(subdir, file))  # os.walk yields names relative to subdir
            img_gray = rgb2gray(img)
            if n == 0 and m == 0:  # n and m are global variables
                n, m = np.shape(img_gray)
            img_gray = np.reshape(img_gray, n*m)
            # sample_id stores the label of the training sample
            img_gray = np.append(img_gray, sample_id)
            # samples is the final numpy ndarray
            if len(samples) == 0:
                samples = np.append(samples, img_gray)
                samples = np.reshape(samples, [1, n*m + 1])
            else:
                samples = np.append(samples, [img_gray], axis=0)
This is more pseudocode than working code, but the general flow should give the right idea. Let me know if there's anything else I can do! Also check out OpenCV if you're interested in some cool deep learning algorithms. There's a bunch of cool stuff there, and images make for great sample data.
Hope this helps.
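To make the "update the model as you read each image" idea concrete, here is a minimal sketch assuming an sklearn estimator that supports incremental learning (SGDClassifier with partial_fit); read_samples is a hypothetical generator yielding one grayscale image and its label at a time:
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier()
all_classes = np.arange(62)  # all 62 alphanumeric labels must be declared up front

for img_gray, sample_id in read_samples():  # hypothetical generator, one image at a time
    features = img_gray.reshape(1, -1)      # flatten to a single (1, n*m) row
    clf.partial_fit(features, [sample_id], classes=all_classes)
With this approach, memory use stays at one flattened image at a time instead of the full 3410 x 1080001 matrix.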
