OpenCV Python: Fastest way to multiply pixel values

I'm trying to change the pixel values of an image.
I have factors r, g and b which will be used to multiply the pixel values of this image.
import cv2
import numpy as np
from matplotlib import pyplot as plt
import time
im = cv2.imread("boat.jpg")
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
im = cv2.resize(im, (4096,4096))
r_factor = 1.10
g_factor = 0.90
b_factor = 1.15
start = time.time()
im[...,0] = cv2.multiply(im[...,0], r_factor)
im[...,1] = cv2.multiply(im[...,1], g_factor)
im[...,2] = cv2.multiply(im[...,2], b_factor)
end = time.time()
This process takes a long time on large images. Is there a faster way to multiply the pixel values?

If I do this on my system, I get 568 ms:
import cv2
import numpy as np
# Factors as in the question
r_factor, g_factor, b_factor = 1.10, 0.90, 1.15
# Known start image
im = np.full((4096,4096,3), [10,20,30], np.uint8)
In [49]: %%timeit
...: im[...,0] = cv2.multiply(im[...,0], r_factor)
...: im[...,1] = cv2.multiply(im[...,1], g_factor)
...: im[...,2] = cv2.multiply(im[...,2], b_factor)
...:
...:
568 ms ± 16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If I do it like this, it takes 394 ms:
In [42]: %timeit res = cv2.multiply(im,(r_factor, g_factor,b_factor,0))
394 ms ± 12.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
You may get faster results doing it in-place, i.e. by specifying dst=im in the call. If I specify the type of the result, it comes out 5x faster at 63 ms - there must be something SIMD going on under the covers:
%timeit _ = cv2.multiply(im,(r_factor, g_factor,b_factor,0), dst=im, dtype=1)
63 ms ± 79.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
If you are really keen on making it even faster, look at some answers tagged with [numba].
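For reference, here is a minimal Numba sketch of the same per-channel scaling (assuming Numba is installed; the function name scale_channels and the truncating clamp are my own choices, not taken from any particular answer, and the rounding behaviour differs slightly from cv2.multiply):
import numpy as np
import numba as nb

@nb.njit(parallel=True, cache=True)
def scale_channels(im, r, g, b):
    # Scale each channel of an HxWx3 uint8 image by its own factor,
    # clamping at 255 (this truncates, whereas cv2.multiply rounds).
    out = np.empty_like(im)
    for i in nb.prange(im.shape[0]):
        for j in range(im.shape[1]):
            out[i, j, 0] = min(255, int(im[i, j, 0] * r))
            out[i, j, 1] = min(255, int(im[i, j, 1] * g))
            out[i, j, 2] = min(255, int(im[i, j, 2] * b))
    return out

im = np.full((4096, 4096, 3), [10, 20, 30], np.uint8)
result = scale_channels(im, 1.10, 0.90, 1.15)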

Related

Vectorization a code to make it faster than this

I have a little bit of code which I'll have to vectorize to make it faster. I'm not very experienced with Python, but I think the for loop is not very efficient.
Is there any way to reduce the time?
import numpy as np
import time
start = time.time()
N = 10000000 #9 seconds
#N = 100000000 #93 seconds
alpha = np.linspace(0.00000000000001, np.pi/2, N)
tmp = 2.47*np.sin(alpha)
for i in range(N):
    if abs(tmp[i]) > 1.0:
        tmp[i] = 1.0*np.sign(tmp[i])
beta = np.arcsin(tmp)
end = time.time()
print("Executed time: ",round(end-start,1),"Seconds")
I read about some numpy functions, but I haven't found a solution for this.
Clip the array:
tmp = np.clip(2.47 * np.sin(alpha), -1.0, 1.0)
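For completeness, a sketch of the full loop-free script built around that one-liner (same N and constants as the question):
import numpy as np

N = 10000000
alpha = np.linspace(0.00000000000001, np.pi/2, N)
tmp = np.clip(2.47*np.sin(alpha), -1.0, 1.0)  # replaces the for loop entirely
beta = np.arcsin(tmp)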
Instead of using a loop with a condition, you can select the values with a boolean mask. Here is an example:
N = 10000000
alpha = np.linspace(0.00000000000001, np.pi/2, N)
tmp = 2.47*np.sin(alpha)
indices = np.abs(tmp) > 1.0
tmp[indices] = np.sign(tmp[indices])
beta = np.arcsin(tmp)
Results on my setup:
before: 5.66 s ± 30.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each),
after: 182 ms ± 877 µs per loop (mean ± std. dev. of 7 runs, 10 loops each).

Create random integer ndarray sampled from different span per element

I want to generate an ndarray a full of random integers which are sampled from different ranges according to another array span. For example:
import numpy as np
span = [5,6,7,8,9]
def get_a(span, count):
    a = np.stack([np.random.choice(i, count) for i in span], axis=0)
    return a
get_a(span,2)
Is there a fast way to do get_a?
Yes. Yours:
import timeit
import numpy as np
span = np.arange(1,100)
def get_a(span, count):
    a = np.stack([np.random.choice(i, count) for i in span], axis=0)
    return a
%timeit get_a(span,2)
2.32 ms ± 254 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
My solution is hundreds of times faster for largish arrays:
def get_b(span, count):
    b = (np.random.rand(len(span), count)*span[:,None]).astype(int)
    return b
%timeit get_b(span,2)
6.91 µs ± 267 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
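As an alternative, on NumPy 1.17+ the Generator API can draw all rows in one call, since the high bound of integers broadcasts against the output shape; a sketch (the rng setup and broadcasting pattern are my own, not part of the answer above):
import numpy as np

span = np.arange(1, 100)
count = 2
rng = np.random.default_rng()

# Row i is drawn from [0, span[i]); span[:, None] broadcasts against the (len(span), count) shape.
c = rng.integers(0, span[:, None], size=(len(span), count))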

Single precision rfft

I'm looking for a single precision rfft to accelerate computation; scipy.fftpack.rfft does this, but returns a real array that packs the real and imaginary components into the same axis, requiring a post-processing step. I implemented the function below to obtain the standard complex array, but NumPy's rfft ends up being faster for 2D inputs (though slower for 1D). Memory is also a concern: float64 runs out of memory (OOM).
Does scipy or another library have a single precision rfft implementation that returns the standard complex array? (If not, can the code below be made faster?)
import numpy as np
from numpy.fft import rfft
from scipy.fftpack import rfft as srfft
def rfft_sp(x):  # assumes len(x) is even
    xf = np.zeros((len(x)//2 + 1, x.shape[1]), dtype='complex64')
    h = srfft(x, axis=0)
    xf[0] = h[0]
    xf[1:] = h[1::2]
    xf[:1].imag = 0
    xf[-1:].imag = 0
    xf[1:-1].imag = h[2::2]
    return xf
x = np.random.randn(500, 100000).astype('float32')
%timeit rfft_sp(x)
%timeit rfft(x, axis=0)
>>> 565 ms ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> 517 ms ± 22.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
On the machine on which I tested, using scipy.fft.rfft and casting to complex64 is faster than your implementation:
import numpy as np
from numpy.fft import rfft
from scipy.fft import rfft as srfft
from scipy.fftpack import rfft as srfft2
def rfft_sp(x):  # assumes len(x) is even
    xf = np.zeros((len(x)//2 + 1, x.shape[1]), dtype='complex64')
    h = srfft2(x, axis=0)
    xf[0] = h[0]
    xf[1:] = h[1::2]
    xf[:1].imag = 0
    xf[-1:].imag = 0
    xf[1:-1].imag = h[2::2]
    return xf

def rfft_cast(x):
    h = srfft(x, axis=0)
    return h.astype('complex64')
x = np.random.randn(500, 100000).astype('float32')
%timeit rfft(x, axis=0)
%timeit rfft_sp(x)
%timeit rfft_cast(x)
produces:
1.81 s ± 144 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.89 s ± 7.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.24 s ± 9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
scipy.fft works with single precision.
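To illustrate that last point, a quick dtype check (smaller array, just for the demonstration; as far as I know scipy.fft keeps float32 input in single precision, whereas numpy.fft upcasts to double):
import numpy as np
from numpy.fft import rfft
from scipy.fft import rfft as srfft

x32 = np.random.randn(500, 1000).astype('float32')
print(srfft(x32, axis=0).dtype)  # complex64: single precision is preserved
print(rfft(x32, axis=0).dtype)   # complex128: numpy.fft upcasts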

Fastest way to average sign-normalized segments of data with NumPy?

What would be the fastest way to collect segments of data from a NumPy array at every point in a dataset, normalize them based on the sign (+ve/-ve) at the start of the segment, and average all segments together?
At present I have:
import numpy as np
x0 = np.random.normal(0,1,5000) # Dataset to be analysed
l0 = 100 # Length of segment to be averaged
def average_seg(x, l):
    return np.mean([x[i:i+l]*np.sign(x[i]) for i in range(len(x)-l)], axis=0)
av_seg = average_seg(x0,l0)
Timing for this is as follows:
%timeit average_seg(x0,l0)
22.2 ms ± 362 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
This does the job, but is there a faster way to do this?
The above code suffers when the length of x0 is large, and when the value of l0 is large. We're looking at looping through this code several million times, so even incremental improvements will help!
We can leverage 1D convolution -
np.convolve(x,np.sign(x[:-l+1][::-1]),'valid')/(len(x)-l+1)
The idea is to do the windowed summations with convolution and with a flipped kernel as per the convolution definition.
Timings -
In [150]: x = np.random.normal(0,1,5000) # Dataset to be analysed
...: l = 100 # Length of segment to be averaged
In [151]: %timeit average_seg(x,l)
17.2 ms ± 689 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [152]: %timeit np.convolve(x,np.sign(x[:-l+1][::-1]),'valid')/(len(x)-l+1)
149 µs ± 3.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [153]: av_seg = average_seg(x,l)
...: out = np.convolve(x,np.sign(x[:-l+1][::-1]),'valid')/(len(x)-l+1)
...: print(np.allclose(out, av_seg))
True
100x+ speedup!
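If x0 and l0 grow much larger, an FFT-based convolution may scale better than the direct np.convolve; a sketch of the same computation with scipy.signal.fftconvolve (same arguments, just a different backend, so treat any performance gain as something to benchmark yourself):
import numpy as np
from scipy.signal import fftconvolve

x = np.random.normal(0, 1, 5000)   # same test data as above
l = 100
kernel = np.sign(x[:-l+1][::-1])
out = fftconvolve(x, kernel, mode='valid') / (len(x)-l+1)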

Efficient calculation of vector between sets of 3D points

I'm coding a particular version of raytracing in Python, and I'm trying to calculate the vectors between points on different planes.
I'm working with sets of point light sources, simulating a non-point light source. Each source generates one ray for each pixel on the "camera" plane. I managed to compute the vector for each of those rays by iterating over the pixels with a for loop:
for sensor_point in sensor_points:
    sp_min_ro = sensor_point - rayorigins  # Vectors between the points
    normalv = normalize(sp_min_ro)  # Normalized vectors between the points
Where sensor_points is a large numpy array with the [x,y,z] coordinates of the different pixel positions, and rayorigins is a numpy array with the [x,y,z] coordinates of the different point sources.
This for loop approach works, but is extremely slow. I tried to remove the for loop and directly calculate sp_min_ro = sensor_points - rayorigins with the whole arrays, but numpy can't broadcast them:
ValueError: operands could not be broadcast together with shapes (1002001,3) (36,3)
Is there a way to accelerate the process of finding the vectors between all the points?
Edit: Adding the normalize function definition I have been using, because it is also giving problems:
def normalize(v):
    norm = np.linalg.norm(v, axis=1)
    return v / norm[:,None]
When I try to pass the new (1002001, 36, 3) array from @aganders3's solution, it fails, I suppose because of the axis?
Numpy solution
import numpy as np
sensor_points=np.random.randn(1002001,3)#.astype(np.float32)
rayorigins=np.random.rand(36,3)#.astype(np.float32)
sp_min_ro = sensor_points[:, np.newaxis, :] - rayorigins
norm=np.linalg.norm(sp_min_ro,axis=2)
sp_min_ro/=norm[:,:,np.newaxis]
Timings
np.float64: 1.76 s ± 26.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.float32: 1.42 s ± 9.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Numba solution
import numba as nb
@nb.njit(fastmath=True, error_model="numpy", parallel=True)
def normalized_vec(sensor_points, rayorigins):
    res = np.empty((sensor_points.shape[0], rayorigins.shape[0], 3), dtype=sensor_points.dtype)
    for i in nb.prange(sensor_points.shape[0]):
        for j in range(rayorigins.shape[0]):
            vec_x = sensor_points[i,0] - rayorigins[j,0]
            vec_y = sensor_points[i,1] - rayorigins[j,1]
            vec_z = sensor_points[i,2] - rayorigins[j,2]
            dist = np.sqrt(vec_x**2 + vec_y**2 + vec_z**2)
            res[i,j,0] = vec_x/dist
            res[i,j,1] = vec_y/dist
            res[i,j,2] = vec_z/dist
    return res
Timings
%timeit res=normalized_vec(sensor_points,rayorigins)
np.float64: 208 ms ± 4.41 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
np.float32: 104 ms ± 515 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Numba solution with preallocated memory
Memory allocation could be very costly. This example should show, why it is sometimes a good idea to avoid large temporary arrays if possible.
@nb.njit(fastmath=True, error_model="numpy", parallel=True)
def normalized_vec(sensor_points, rayorigins, res):
    for i in nb.prange(sensor_points.shape[0]):
        for j in range(rayorigins.shape[0]):
            vec_x = sensor_points[i,0] - rayorigins[j,0]
            vec_y = sensor_points[i,1] - rayorigins[j,1]
            vec_z = sensor_points[i,2] - rayorigins[j,2]
            dist = np.sqrt(vec_x**2 + vec_y**2 + vec_z**2)
            res[i,j,0] = vec_x/dist
            res[i,j,1] = vec_y/dist
            res[i,j,2] = vec_z/dist
    return res
Timings
res=np.empty((sensor_points.shape[0],rayorigins.shape[0],3),dtype=sensor_points.dtype)
%timeit normalized_vec(sensor_points,rayorigins,res)
np.float64: 66.6 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
np.float32: 33.8 ms ± 375 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Check out the rules for NumPy broadcasting. I think adding a new axis in the middle of your sensor_points array will work:
>>> sp_min_ro = sensor_points[:, np.newaxis, :] - rayorigins
>>> sp_min_ro.shape
(1002001, 36, 3)
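Regarding the edit about normalize failing on the (1002001, 36, 3) result: once the broadcasting is in place, the norm has to be taken over the last axis. A sketch of an adjusted helper (my own variant, not from either answer):
import numpy as np

sensor_points = np.random.randn(1002001, 3)
rayorigins = np.random.rand(36, 3)

def normalize(v):
    # Norm over the last (coordinate) axis; keepdims lets the division broadcast.
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / norm

sp_min_ro = sensor_points[:, np.newaxis, :] - rayorigins  # shape (1002001, 36, 3)
normalv = normalize(sp_min_ro)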
