I am in the process of attempting to implement a script in MATLAB over to Python. I have the following:
% I = im2double(imread('images\image.tif'));
% IC(:,:,1) = imresize(squeeze(I(:,:,1)),[N1 N2]);
% IC(:,:,2) = imresize(squeeze(I(:,:,2)),[N1 N2]);
% IC(:,:,3) = imresize(squeeze(I(:,:,3)),[N1 N2]);
and
[xi yi imv1] = find(squeeze(imagee(:,:,1))+0.1);
imv1 = imv1 - 0.1;
wd1 = (imv1*ones(1,length(imv1)) - ones(length(imv1),1)*imv1').^2;
I understand I can load the image with open cv, i.e.
I= cv2.imread('image',1)
And that I can use np.nonzero for MATLAB find,
index=(imagee+0.1).nonzero()
np.outer for the wd1 calculation:
timv1=np.transpose(imv1)
wd1 = np.abs(np.outer(imv1,holder1) - np.outer(holder1,timv1))
as well as np.squeeze for MATLAB's squeeze function.
However, how can i write these functions in a compact form to produce maximum efficiency and speed when the Python script is finalized?
Related
Reading through ArrayFire documentation, I noticed that the library supports batched operations when using 2D convolution. Therefore, I need to apply N filters to an image using the C++ API.
For easy testing, I decided to create a simple Python script to assert the convolution results. However, I couldn't get proper results when using >1 filters and comparing them to OpenCV's 2D convolution separately. Following is my Python script:
import arrayfire as af
import cv2
import numpy as np
np.random.seed(1)
np.set_printoptions(precision=3)
af.set_backend('cuda')
n_kernels = 2
image = np.random.randn(512,512).astype(np.float32)
kernels_list = [np.random.randn(7,7).astype(np.float32) for _ in range(n_kernels)]
conv_cv_list = [cv2.filter2D(image, -1, cv2.flip(kernel,-1), borderType=cv2.BORDER_CONSTANT) for kernel in kernels_list]
image_gpu = af.array.Array(image.ctypes.data, image.shape, image.dtype.char)
kernels = np.stack(kernels_list, axis=-1) if n_kernels > 1 else kernels_list[0]
kernels_gpu = af.array.Array(kernels.ctypes.data, kernels.shape, kernels.dtype.char)
conv_af_gpu = af.convolve2(image_gpu, kernels_gpu)
conv_af = conv_af_gpu.to_ndarray()
if n_kernels == 1:
conv_af = conv_af[..., None]
for kernel_idx in range(n_kernels):
print("CV conv:", conv_cv_list[kernel_idx][0, 0])
print("AF conv", conv_af[0, 0, kernel_idx])
That said, I would like to know how properly use ArrayFire batched support.
ArrayFire is column-major, whereas OpenCV and NumPy are row-major. This can cause issues. We provide an interop function to handle this for you, as follows.
import arrayfire as af
import cv2
import numpy as np
np.random.seed(1)
np.set_printoptions(precision=3)
af.set_backend('cuda')
n_kernels = 2
image = np.random.randn(512,512).astype(np.float32)
kernels_list = [np.random.randn(7,7).astype(np.float32) for _ in range(n_kernels)]
conv_cv_list = [cv2.filter2D(image, -1, cv2.flip(kernel,-1), borderType=cv2.BORDER_CONSTANT) for kernel in kernels_list]
image_gpu = af.interop.from_ndarray(image) # CHECK OUT THIS
kernels = np.stack(kernels_list, axis=-1) if n_kernels > 1 else kernels_list[0]
kernels_gpu = af.interop.from_ndarray(kernels) # CHECK OUT THIS
conv_af_gpu = af.convolve2(image_gpu, kernels_gpu)
conv_af = conv_af_gpu.to_ndarray()
if n_kernels == 1:
conv_af = conv_af[..., None]
for kernel_idx in range(n_kernels):
print("CV conv:", conv_cv_list[kernel_idx][0, 0])
print("AF conv", conv_af[0, 0, kernel_idx])
We are working on a newer version of ArrayFire Python which will address this issue.
Good luck!
I have to convert these MATLAB lines to python. I don't know how exacly to do this, what I know is that I should use csr_matrix from scipy.sparse.
A = sparse(Nx,Nx);
A = A + sparse([1 Nx],[1 Nx],[1 1],Nx,Nx);
A = A + sparse(2:Nx-1,2:Nx-1,ones(Nx-2,1)*(1+2*r),Nx,Nx);
A = A - sparse(2:Nx-1,1:Nx-2,ones(Nx-2,1)*r,Nx,Nx);
A = A - sparse(2:Nx-1,3:Nx,ones(Nx-2,1)*r,Nx,Nx);
I tried something like this:
A = sp.csr_matrix((Nx,Nx), dtype=np.float)
A = A + sp.csr_matrix((z,(x,y)), shape=(Nx,Nx), dtype=np.float)
But I don't know what to put in this x,y,z positions:
Nx=1000 and r=99.8001
I want to use cuda streams in order to speed up small calculations on the GPU. My test so far consists of the following:
import cupy as xp
import time
x = xp.random.randn(10003, 20000) + 1j * xp.random.randn(10003, 20000)
y = xp.zeros_like(x)
nStreams = 16
streams = [xp.cuda.stream.Stream() for ii in range(nStreams)]
f = xp.fft.fft(x[:,:200])
t = time.time()
for ii in range(int(x.shape[1]/100)):
ss = streams[ii % nStreams]
with ss:
y[:,ii*200:(ii+1)*200] = xp.fft.fft(x[:,ii*200:(ii+1)*200], axis=0)
for ii,ss in enumerate(streams):
ss.synchronize()
print(time.time()-t)
t = time.time()
for ii in range(int(x.shape[1]/100)):
y[:,ii*200:(ii+1)*200] = xp.fft.fft(x[:,ii*200:(ii+1)*200], axis=0)
xp.cuda.Stream.null.synchronize()
print(time.time()-t)
produces
[user#pc snippets]$ intelpython3 strm.py
0.019365549087524414
0.018717050552368164
which I have trouble believing that I do everything correctly. Additionally, the situation becomes even more severe when replacing the FFT-calls with calls to xp.sum, which yields
[user#pc snippets]$ intelpython3 strm.py
0.002195596694946289
0.001004934310913086
What is the rationale behind cupy streams? How do I use them to my advantage?
I am working on video editor for raspberry pi, and I have a problem with speed of placing image over image. Currently, using imagemagick it takes up to 10 seconds just to place one image over another, using 1080x1920 png images, on raspberry pi, and that's way too much. With the number of images time goes up as well. Any ideas on how to speed it up?
Imagemagick code:
composite -blend 90 img1.png img2.png new.png
Video editor with yet slow opacity support here
--------EDIT--------
slightly faster way:
import numpy as np
from PIL import Image
size_X, size_Y = 1920, 1080# put images resolution, else output may look wierd
image1 = np.resize(np.asarray(Image.open('img1.png').convert('RGB')), (size_X, size_Y, 3))
image2 = np.resize(np.asarray(Image.open('img2.png').convert('RGB')), (size_X, size_Y, 3))
output = image1*transparency+image2*(1-transparency)
Image.fromarray(np.uint8(output)).save('output.png')
My Raspberry Pi is unavailable at the moment - all I am saying is that there was some smoke involved and I do software, not hardware! As a result, I have only tested this on a Mac. It uses Numba.
First I used your Numpy code on these 2 images:
and
Then I implemented the same thing using Numba. The Numba version runs 5.5x faster on my iMac. As the Raspberry Pi has 4 cores, you could try experimenting with:
#jit(nopython=True,parallel=True)
def method2(image1,image2,transparency):
...
Here is the code:
#!/usr/bin/env python3
import numpy as np
from PIL import Image
import numba
from numba import jit
def method1(image1,image2,transparency):
result = image1*transparency+image2*(1-transparency)
return result
#jit(nopython=True)
def method2(image1,image2,transparency):
h, w, c = image1.shape
for y in range(h):
for x in range(w):
for z in range(c):
image1[y][x][z] = image1[y][x][z] * transparency + (image2[y][x][z]*(1-transparency))
return image1
i1 = np.array(Image.open('image1.jpg').convert('RGB'))
i2 = np.array(Image.open('image2.jpg').convert('RGB'))
res = method1(i1,i2,0.4)
res = method2(i1,i2,0.4)
Image.fromarray(np.uint8(res)).save('result.png')
The result is:
Other thoughts... I did the composite in-place, overwriting the input image1 to try and save cache space. That may help or hinder - please experiment. I may not have processed the pixels in the optimal order - please experiment.
Just as another option, I tried in pyvips (full disclosure: I'm the pyvips maintainer, so I'm not very neutral):
#!/usr/bin/python3
import sys
import time
import pyvips
start = time.time()
a = pyvips.Image.new_from_file(sys.argv[1], access="sequential")
b = pyvips.Image.new_from_file(sys.argv[2], access="sequential")
out = a * 0.2 + b * 0.8
out.write_to_file(sys.argv[3])
print("pyvips took {} milliseconds".format(1000 * (time.time() - start)))
pyvips is a "pipeline" image processing library, so that code will execute the load, processing and save all in parallel.
On this two core, four thread i5 laptop using Mark's two test images I see:
$ ./overlay-vips.py blobs.jpg ships.jpg x.jpg
took 39.156198501586914 milliseconds
So 39ms for two jpg loads, processing and one jpg save.
You can time just the blend part by copying the source images and the result to memory, like this:
a = pyvips.Image.new_from_file(sys.argv[1]).copy_memory()
b = pyvips.Image.new_from_file(sys.argv[2]).copy_memory()
start = time.time()
out = (a * 0.2 + b * 0.8).copy_memory()
print("pyvips between memory buffers took {} milliseconds"
.format(1000 * (time.time() - start)))
I see:
$ ./overlay-vips.py blobs.jpg ships.jpg x.jpg
pyvips between memory buffers took 15.432596206665039 milliseconds
numpy is about 60ms on this same test.
I tried a slight variant of Mark's nice numba example:
#!/usr/bin/python3
import sys
import time
import numpy as np
from PIL import Image
import numba
from numba import jit, prange
#jit(nopython=True, parallel=True)
def method2(image1, image2, transparency):
h, w, c = image1.shape
for y in prange(h):
for x in range(w):
for z in range(c):
image1[y][x][z] = image1[y][x][z] * transparency \
+ (image2[y][x][z] * (1 - transparency))
return image1
# run once to force a compile
i1 = np.array(Image.open(sys.argv[1]).convert('RGB'))
i2 = np.array(Image.open(sys.argv[2]).convert('RGB'))
res = method2(i1, i2, 0.2)
# run again and time it
i1 = np.array(Image.open(sys.argv[1]).convert('RGB'))
i2 = np.array(Image.open(sys.argv[2]).convert('RGB'))
start = time.time()
res = method2(i1, i2, 0.2)
print("numba took {} milliseconds".format(1000 * (time.time() - start)))
Image.fromarray(np.uint8(res)).save(sys.argv[3])
And I see:
$ ./overlay-numba.py blobs.jpg ships.jpg x.jpg
numba took 8.110523223876953 milliseconds
So on this laptop, numba is about 2x faster than pyvips.
If you time load and save as well, it's quite a bit slower:
$ ./overlay-numba.py blobs.jpg ships.jpg x.jpg
numba plus load and save took 272.8157043457031 milliseconds
But that seems unfair, since almost all that time is in PIL load and save.
I'm new to signal processing (and numpy, scipy, and matlab for that matter). I'm trying to estimate vowel formants with LPC in Python by adapting this matlab code:
http://www.mathworks.com/help/signal/ug/formant-estimation-with-lpc-coefficients.html
Here is my code so far:
#!/usr/bin/env python
import sys
import numpy
import wave
import math
from scipy.signal import lfilter, hamming
from scikits.talkbox import lpc
"""
Estimate formants using LPC.
"""
def get_formants(file_path):
# Read from file.
spf = wave.open(file_path, 'r') # http://www.linguistics.ucla.edu/people/hayes/103/Charts/VChart/ae.wav
# Get file as numpy array.
x = spf.readframes(-1)
x = numpy.fromstring(x, 'Int16')
# Get Hamming window.
N = len(x)
w = numpy.hamming(N)
# Apply window and high pass filter.
x1 = x * w
x1 = lfilter([1., -0.63], 1, x1)
# Get LPC.
A, e, k = lpc(x1, 8)
# Get roots.
rts = numpy.roots(A)
rts = [r for r in rts if numpy.imag(r) >= 0]
# Get angles.
angz = numpy.arctan2(numpy.imag(rts), numpy.real(rts))
# Get frequencies.
Fs = spf.getframerate()
frqs = sorted(angz * (Fs / (2 * math.pi)))
return frqs
print get_formants(sys.argv[1])
Using this file as input, my script returns this list:
[682.18960189917243, 1886.3054773107765, 3518.8326108511073, 6524.8112723782951]
I didn't even get to the last steps where they filter the frequencies by bandwidth because the frequencies in the list aren't right. According to Praat, I should get something like this (this is the formant listing for the middle of the vowel):
Time_s F1_Hz F2_Hz F3_Hz F4_Hz
0.164969 731.914588 1737.980346 2115.510104 3191.775838
What am I doing wrong?
Thanks very much
UPDATE:
I changed this
x1 = lfilter([1., -0.63], 1, x1)
to
x1 = lfilter([1], [1., 0.63], x1)
as per Warren Weckesser's suggestion and am now getting
[631.44354635609318, 1815.8629524985781, 3421.8288991389031, 6667.5030877036006]
I feel like I'm missing something since F3 is very off.
UPDATE 2:
I realized that the order being passed to scikits.talkbox.lpc was off due to a difference in sampling frequency. Changed it to:
Fs = spf.getframerate()
ncoeff = 2 + Fs / 1000
A, e, k = lpc(x1, ncoeff)
Now I'm getting:
[257.86573127888488, 774.59006835496086, 1769.4624576002402, 2386.7093679399809, 3282.387975973973, 4413.0428174593926, 6060.8150432549655, 6503.3090645887842, 7266.5069407315023]
Much closer to Praat's estimation!
The problem had to do with the order being passed to the lpc function. 2 + fs / 1000 where fs is the sampling frequency is the rule of thumb according to:
http://www.phon.ucl.ac.uk/courses/spsci/matlab/lect10.html
I have not been able to get the results you expect, but I do notice two things which might cause some differences:
Your code uses [1, -0.63] where the MATLAB code from the link you provided has [1 0.63].
Your processing is being applied to the entire x vector at once instead of smaller segments of it (see where the MATLAB code does this: x = mtlb(I0:Iend); ).
Hope that helps.
There are at least two problems:
According to the link, the "pre-emphasis filter is a highpass all-pole (AR(1)) filter". The signs of the coefficients given there are correct: [1, 0.63]. If you use [1, -0.63], you get a lowpass filter.
You have the first two arguments to scipy.signal.lfilter reversed.
So, try changing this:
x1 = lfilter([1., -0.63], 1, x1)
to this:
x1 = lfilter([1.], [1., 0.63], x1)
I haven't tried running your code yet, so I don't know if those are the only problems.