Python OpenCV speedup for multiprocessing

Python OpenCV speedup for multiprocessing - python

I am trying to run my image processing algorithm on a live feed from the webcam.
I want this to run in a parallel process from the multiprocessing module, how can i implement this?
This is my current code without parallel coding:
from cv2 import VideoCapture , imshow , waitKey ,imwrite
import numpy as np
from time import time
def greenify (x):
return some_value
skip = 4
video = VideoCapture(0)
video.set(3,640/skip)
video.set(4,480/skip)
total = 0
top_N = 100
while True:
image = video.read()[1]
if waitKey(1) == 27:
break
arr = array([list(map(greenify,j)) for j in image])
result = unravel_index(argpartition(arr,arr.size-top_N,axis=None)[-top_N:], arr.shape)
centre = skip*np.median(result[0]) , skip*np.median(result[1])
imshow('Feed', image)
print('Time taken:',total)
video.release()

I have modified your code, basically, you make it a function, then you call it in parallel. call bob.start() wherever you want in the code, and within a few miliseconds, the parallel code will run
import numpy as np
from cv2 import VideoCapture
from multiprocessing import Process, Manager
import multiprocessing as mp
def getcors():
skip = 4
top_N = 100
video = VideoCapture(0)
video.set(3,640/skip)
video.set(4,480/skip)
while True:
frame = video.read()[1]
arr = np.array([list(map(greenify,j)) for j in frame])
result = np.unravel_index(np.argpartition(arr,arr.size-top_N,axis=None)[-top_N:], arr.shape)
centre = skip * np.median(result[1]) , skip*np.median(result[0])
bob = Process(target = getcors)

Related

How Could I increase the speed

I am using below code for an image processing related study. The code works fine as functionality but it is too slow that one step takes up to 10 seconds.
I need faster process speed to reach at the aim.
import numpy
import glob, os
import cv2
import os
input = cv2.imread(path)
def nothing(x): # for trackbar
pass
windowName = "Image"
cv2.namedWindow(windowName)
cv2.createTrackbar("coef", windowName, 0, 25000, nothing)
condition = True
while (condition):
coef = cv2.getTrackbarPos("coef", windowName)
temp_img = input
row = temp_img.shape[0]
col = temp_img.shape[1]
print(coef)
red = []
green = []
for i in range(row):
for y in range(col):
# temp_img[i][y][0] = 0
temp_img[i][y][1] = temp_img[i][y][1]* (coef / 100)
temp_img[i][y][1] = temp_img[i][y][2] * (1 - (coef / 100))
# relative_diff = value_g - value_r
# temp =cv2.resize(temp,(1000,800))
cv2.imshow(windowName, temp_img)
# cv2.imwrite("output2.jpg", temp)
print("fin")
# cv2.waitKey(0)
if cv2.waitKey(30) >= 0:
condition = False
cv2.destroyAllWindows()
Is there anybody have an idea having faster result on the aim?

It's not entirely clear to me what object temp_img is exactly, but if it behaves like a numpy array, you could replace your loop by
temp_img[:,:,0] = temp_img[:,:,1]*(coef/100)
temp_img[:,:,1] = temp_img[:,:,2]*(1-coef/1000)
which should result in a significant speed up if your array is large. The implementation of such operations on arrays are optimised very well, whereas python loops are generally quite slow.
Edit based on comments:
Since you're working with large images and have some expensive operations that need an unscaled version but only need to be executed once, your code could get the following kind of structure
import... #do all your imports
def expensive_operations(image, *args, **kwargs):
#do all your expensive operations like object detection
def scale_image(image, scale):
#create a scaled version of image
def cheap_operations(scaled_image, windowName):
#perform cheap operations, e.g.
coef = cv2.getTrackbarPos("coef", windowName)
temp_img = np.copy(scaled_image)
temp_img[:,:,1] = temp_img[:,:,1]* (coef / 100)
temp_img[:,:,2] = temp_img[:,:,2] * (1 - (coef / 100))
cv2.imshow(windowName, temp_img)
input = cv2.imread(path)
windowName = "Image"
cv2.namedWindow(windowName)
cv2.createTrackbar("coef", windowName, 0, 25000, nothing)
condition = True
expensive_results = expensive_operations(input) #possibly with some more args and keyword args
scaled_image = scale_image(input)
while condition:
cheap_operations(scaled_image, windowName)
if cv2.waitKey(30) >= 0:
condition = False
cv2.destroyAllWindows()

I do this kind of thing in nip2. It's an image processing spreadsheet that can manipulate huge images quickly. It has no problems doing this kind of operation on any size image at 60fps.
I made you an example workspace: http://www.rollthepotato.net/~john/coeff.ws
Here's what it looks like working on a 1gb starfield image:
You can drag the slider to change coeff. The processed image updates instantly as you drag. You can zoom and pan around the processed image to check details and adjust coeff.
The underlying image processing library is libvips, which has a Python binding, pyvips. In pyvips, your program would be:
import pyvips
def adjust(image, coeff):
return image * [1, coeff / 100, 1 - coeff / 100]
Though that's without the GUI elements, of course.

How to merge images as transparent layers?

I am working on video editor for raspberry pi, and I have a problem with speed of placing image over image. Currently, using imagemagick it takes up to 10 seconds just to place one image over another, using 1080x1920 png images, on raspberry pi, and that's way too much. With the number of images time goes up as well. Any ideas on how to speed it up?
Imagemagick code:
composite -blend 90 img1.png img2.png new.png
Video editor with yet slow opacity support here
--------EDIT--------
slightly faster way:
import numpy as np
from PIL import Image
size_X, size_Y = 1920, 1080# put images resolution, else output may look wierd
image1 = np.resize(np.asarray(Image.open('img1.png').convert('RGB')), (size_X, size_Y, 3))
image2 = np.resize(np.asarray(Image.open('img2.png').convert('RGB')), (size_X, size_Y, 3))
output = image1*transparency+image2*(1-transparency)
Image.fromarray(np.uint8(output)).save('output.png')

My Raspberry Pi is unavailable at the moment - all I am saying is that there was some smoke involved and I do software, not hardware! As a result, I have only tested this on a Mac. It uses Numba.
First I used your Numpy code on these 2 images:
and
Then I implemented the same thing using Numba. The Numba version runs 5.5x faster on my iMac. As the Raspberry Pi has 4 cores, you could try experimenting with:
#jit(nopython=True,parallel=True)
def method2(image1,image2,transparency):
...
Here is the code:
#!/usr/bin/env python3
import numpy as np
from PIL import Image
import numba
from numba import jit
def method1(image1,image2,transparency):
result = image1*transparency+image2*(1-transparency)
return result
#jit(nopython=True)
def method2(image1,image2,transparency):
h, w, c = image1.shape
for y in range(h):
for x in range(w):
for z in range(c):
image1[y][x][z] = image1[y][x][z] * transparency + (image2[y][x][z]*(1-transparency))
return image1
i1 = np.array(Image.open('image1.jpg').convert('RGB'))
i2 = np.array(Image.open('image2.jpg').convert('RGB'))
res = method1(i1,i2,0.4)
res = method2(i1,i2,0.4)
Image.fromarray(np.uint8(res)).save('result.png')
The result is:
Other thoughts... I did the composite in-place, overwriting the input image1 to try and save cache space. That may help or hinder - please experiment. I may not have processed the pixels in the optimal order - please experiment.

Just as another option, I tried in pyvips (full disclosure: I'm the pyvips maintainer, so I'm not very neutral):
#!/usr/bin/python3
import sys
import time
import pyvips
start = time.time()
a = pyvips.Image.new_from_file(sys.argv[1], access="sequential")
b = pyvips.Image.new_from_file(sys.argv[2], access="sequential")
out = a * 0.2 + b * 0.8
out.write_to_file(sys.argv[3])
print("pyvips took {} milliseconds".format(1000 * (time.time() - start)))
pyvips is a "pipeline" image processing library, so that code will execute the load, processing and save all in parallel.
On this two core, four thread i5 laptop using Mark's two test images I see:
$ ./overlay-vips.py blobs.jpg ships.jpg x.jpg
took 39.156198501586914 milliseconds
So 39ms for two jpg loads, processing and one jpg save.
You can time just the blend part by copying the source images and the result to memory, like this:
a = pyvips.Image.new_from_file(sys.argv[1]).copy_memory()
b = pyvips.Image.new_from_file(sys.argv[2]).copy_memory()
start = time.time()
out = (a * 0.2 + b * 0.8).copy_memory()
print("pyvips between memory buffers took {} milliseconds"
.format(1000 * (time.time() - start)))
I see:
$ ./overlay-vips.py blobs.jpg ships.jpg x.jpg
pyvips between memory buffers took 15.432596206665039 milliseconds
numpy is about 60ms on this same test.
I tried a slight variant of Mark's nice numba example:
#!/usr/bin/python3
import sys
import time
import numpy as np
from PIL import Image
import numba
from numba import jit, prange
#jit(nopython=True, parallel=True)
def method2(image1, image2, transparency):
h, w, c = image1.shape
for y in prange(h):
for x in range(w):
for z in range(c):
image1[y][x][z] = image1[y][x][z] * transparency \
+ (image2[y][x][z] * (1 - transparency))
return image1
# run once to force a compile
i1 = np.array(Image.open(sys.argv[1]).convert('RGB'))
i2 = np.array(Image.open(sys.argv[2]).convert('RGB'))
res = method2(i1, i2, 0.2)
# run again and time it
i1 = np.array(Image.open(sys.argv[1]).convert('RGB'))
i2 = np.array(Image.open(sys.argv[2]).convert('RGB'))
start = time.time()
res = method2(i1, i2, 0.2)
print("numba took {} milliseconds".format(1000 * (time.time() - start)))
Image.fromarray(np.uint8(res)).save(sys.argv[3])
And I see:
$ ./overlay-numba.py blobs.jpg ships.jpg x.jpg
numba took 8.110523223876953 milliseconds
So on this laptop, numba is about 2x faster than pyvips.
If you time load and save as well, it's quite a bit slower:
$ ./overlay-numba.py blobs.jpg ships.jpg x.jpg
numba plus load and save took 272.8157043457031 milliseconds
But that seems unfair, since almost all that time is in PIL load and save.

Python 3.7 : multiprocessing a for loop with shared variables

first a bit of context :
I'm trying to write down a python script to convert Image in greyscale (.tif) to a .jpeg with the so called ''jet'' colormap. I managed to do it with a for loop but it's a bit long for one image (millions of pixels to treat !), so I would like to use multiprocessing.
My problem here is that to convert each grey pixel into a coloured one I have to use two variables (the minimum value of light intensity ''min_img'' and an vector ''dx_cm'' to go from the initial grey scale to a 256 scale, corresponding to the jet colormap).
So to pass the information of ''min_img'' and ''dx_cm'' to the processes I try to use multiprocessing.Value() but in return I get the error :
RuntimeError: Synchronized objects should only be shared between processes through inheritance
I tried many different things from different sources and no matter the version of my code I'm struggling with that error. So I'm sorry if my code isn't clean, I would be very grateful if someone could help me with that.
My non-working code :
import multiprocessing
from PIL import Image
from matplotlib import cm
def fun(gr_list,dx,minp):
dx_cmp = dx.value
min_imgp = minp.value
rgb_res=list()
for i in range(len(gr_list)):
rgb_res.extend(cm.jet(round(((gr_list[i]-min_imgp)/dx_cmp)-1))[0:-1])
return rgb_res
if __name__ == '__main__':
RGB_list=list()
n = multiprocessing.cpu_count()
img = Image.open(r'some_path_to_a.tif')
Img_grey=list(img.getdata())
dx_cm = multiprocessing.Value('d',(max(Img_grey)-min(Img_grey))/256)
min_img = multiprocessing.Value('d',min(Img_grey))
with multiprocessing.Pool(n) as p:
RGB_list = list(p.map(fun, (Img_grey,dx_cm,min_img)))
res = Image.frombytes("RGB", (img.size[0], img.size[1]), bytes([int(0.5 + 255*i) for i in RGB_list]))
res.save('rgb_file.jpg')
PS : Here is an example of the the initial for loop that I would like to parallelize :
from PIL import Image
from matplotlib import cm
if __name__ == '__main__':
img = Image.open(r'some_path_to_a.tif')
Img_grey = list(img.getdata())
dx_cm = (max(Img_grey)-min(Img_grey))/256
min_img = min(Img_grey)
Img_rgb = list()
for i in range(len(Img_grey)):
Img_rgb.extend(cm.jet(round(((Img_grey[i]-min_img)/dx_cm)-1))[0:-1])
res = Image.frombytes("RGB", (img.size[0], img.size[1]), bytes([int(0.5 + 255*i) for i in Img_rgb]))
res.save('rgb_file.jpg')

Your fun method is looping over some list, but in this case it will receive a "part", an item from your list, so it should return only the result of its processing.
I have changed the working code to run with multiprocessing.
As the fun method returns a list, the p.map will return a list of lists (a list of results) and that need to be flatten, were done with list extends method before.
Tried with process pool and thread pool multiprocessing, in my scenario there wasn't any performance gains.
Process multiprocessing:
from PIL import Image
from matplotlib import cm
import multiprocessing
def fun(d):
part, dx_cm, min_img = d
return cm.jet(round(((part-min_img)/dx_cm)-1))[0:-1]
if __name__ == '__main__':
img = Image.open(r'a.tif')
Img_grey = list(img.getdata())
def Gen(img_data):
dx_cm = (max(img_data)-min(img_data))/256
min_img = min(img_data)
for part in img_data:
yield part, dx_cm, min_img
n = multiprocessing.cpu_count()
with multiprocessing.Pool(n) as p:
Img_rgb = [item for sublist in p.map(fun, Gen(Img_grey)) for item in sublist]
res = Image.frombytes("RGB", (img.size[0], img.size[1]), bytes([int(0.5 + 255*i) for i in Img_rgb]))
res.save('b.jpg')
Thread multiprocessing:
from PIL import Image
from matplotlib import cm
import multiprocessing
from multiprocessing.pool import ThreadPool
if __name__ == '__main__':
img = Image.open(r'a.tif')
Img_grey = list(img.getdata())
dx_cm = (max(Img_grey)-min(Img_grey))/256
min_img = min(Img_grey)
def fun(part):
return cm.jet(round(((part-min_img)/dx_cm)-1))[0:-1]
n = multiprocessing.cpu_count()
with ThreadPool(n) as p:
Img_rgb = [item for sublist in p.map(fun, Img_grey) for item in sublist]
res = Image.frombytes("RGB", (img.size[0], img.size[1]), bytes([int(0.5 + 255*i) for i in Img_rgb]))
res.save('b.jpg')

So it seems that the computational burden isn't big enough for multiprocessing to be helpful.
Nevertheless, for those coming across this topic interested in the image processing part of my question, I found another much quicker way (15 to 20 x than previous method) to do the same thing without a for loop :
from matplotlib import cm
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import numpy as np
from PIL import Image
cm_jet = cm.get_cmap('jet')
img_src = Image.open(r'path to your grey image')
img_src.mode='I'
Img_grey = list(img_src.getdata())
max_img = max(Img_grey)
min_img = min(Img_grey)
rgb_array=np.uint8(cm_jet(((np.array(img_src)-min_img)/(max_img-min_img)))*255)
ax = plt.subplot(111)
im = ax.imshow(rgb_array, cmap='jet')
divider = make_axes_locatable(ax)
cax_plot = divider.append_axes("right", size="5%", pad=0.05)
cbar=plt.colorbar(im, cax=cax_plot, ticks=[0,63.75,127.5,191.25,255])
dx_plot=(max_img-min_img)/255
cbar.ax.set_yticklabels([str(min_img),str(round(min_img+63.75*dx_plot)),str(round(min_img+127.5*dx_plot)),str(round(min_img+191.25*dx_plot)), str(max_img)])
ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)
plt.savefig('test_jet.jpg', quality=95, dpi=1000)

how to use multi-threading for optimizing face detection?

I have a code which uses a list of image URLs from a CSV file and then performs face detection on those images after which it loads some models and does predictions on those images.
I did some load tests and found that the get_face function in the code takes a major chunk of the time required to produce the results and the extra time is taken by the pickle file created for predictions.
Question: Is there a possibility that by running these processes in threads, time can be reduced and also how this can be done in a multi threading way?
Here is the code example:
from __future__ import division
import numpy as np
from multiprocessing import Process, Queue, Pool
import os
import pickle
import pandas as pd
import dlib
from skimage import io
from skimage.transform import resize
df = pd.read_csv('/home/instaurls.csv')
detector = dlib.get_frontal_face_detector()
img_width, img_height = 139, 139
confidence = 0.8
def get_face():
output = None
data1 = []
for row in df.itertuples():
img = io.imread(row[1])
dets = detector(img, 1)
for i, d in enumerate(dets):
img = img[d.top():d.bottom(), d.left():d.right()]
img = resize(img, (img_width, img_height))
output = np.expand_dims(img, axis=0)
break
data1.append(output)
data1 = np.concatenate(data1)
return data1
get_face()
csv sample
data
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/17883193_940000882769400_8455736118338387968_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/22427207_1737576603205281_7879421442167668736_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/12976287_1720757518213286_1180118177_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/16788491_748497378632253_566270225134125056_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/21819738_128551217878233_9151523109507956736_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/14295447_318848895135407_524281974_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/18160229_445050155844926_2783054824017494016_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/17883193_940000882769400_8455736118338387968_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/22427207_1737576603205281_7879421442167668736_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/12976287_1720757518213286_1180118177_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/16788491_748497378632253_566270225134125056_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/21819738_128551217878233_9151523109507956736_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/14295447_318848895135407_524281974_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/18160229_445050155844926_2783054824017494016_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg

Here is how you could try to do it in parallel:
from __future__ import division
import numpy as np
from multiprocessing import Process, Queue, Pool
import os
import pickle
import pandas as pd
import dlib
from skimage import io
from skimage.transform import resize
from csv import DictReader
df = DictReader(open('/home/instaurls.csv')) # DictReader is iterable
detector = dlib.get_frontal_face_detector()
img_width, img_height = 139, 139
confidence = 0.8
def get_face(row):
"""
Here row is dictionary where keys are CSV header names
and values are values from current CSV row.
"""
output = None
img = io.imread(row[1]) # row[1] has to be changed to row['data']?
dets = detector(img, 1)
for i, d in enumerate(dets):
img = img[d.top():d.bottom(), d.left():d.right()]
img = resize(img, (img_width, img_height))
output = np.expand_dims(img, axis=0)
break
return output
if __name__ == '__main__':
pool = Pool() # default to number CPU cores
data = list(pool.imap(get_face, df))
print np.concatenate(data)
Pay attention to get_face and argument that it has. Also, to what it returns. This is what I meant when I said smaller chunks of work. Now get_face processes one row from CSV.
When you run this script, pool will be a reference to a instance of a Pool and you then call get_face for each row/tuple in df.itertuples().
After everything is done, data holds processing data and then you do np.concatenate on it.

Speed up Image Feature Extraction

It's the program of import multiple images and extract feature.
The problem is that it's too slow
I think it's because there's so many for loop.
For example
for q in range(0, height-32 , 32):
for w in range(0 , width-32 ,32):
for j in range(0,64,8):
for n in range(0,64,8):
How can I change my code to speed up?
import numpy as np
from scipy.fftpack import dct
from PIL import Image
import glob
import matplotlib.pyplot as plt
def image_open(path):
image_list = []
#for filename in glob.glob('path/*.jpg'):
for filename in glob.glob(path+'/*.jpg'):
im=Image.open(filename)
image_list.append(im)
return image_list
path = 'C:\\Users\\LG\\PycharmProjects\\photo'
images = image_open(path)
for i in range(0, len(images)):
box3 = (0,0,256,256)
a = images[i].crop(box3)
(y,cb,cr) = a.split()
width , height = y.size
y.show()
for q in range(0, height-32 , 32):
for w in range(0 , width-32 ,32):
for j in range(0,64,8):
for n in range(0,64,8):
print(w/32)

Not sure why you are using so many loops so I’d suggest you try to improve that as best as you can first.
After that, look into threads. Threading
If the number of images being consumed is low, you could run the operation on each image on a separate thread.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python OpenCV speedup for multiprocessing - python

Related

How Could I increase the speed

How to merge images as transparent layers?

Python 3.7 : multiprocessing a for loop with shared variables

how to use multi-threading for optimizing face detection?

Speed up Image Feature Extraction

Categories

Resources