I'm in the process of learning some ML concepts using OpenCV, and I have a piece of python code that I was given to translate into c++. I have a very basic knowledge of python, and I've run into some syntax that I can't seem to find the meaning for.
I have a variable being passed into a method (whole method not shown) that is coming from the result of cv2.imread(), so an image. In c++, it's of type Mat:
def preprocess_image(img, side=96):
    min_side = min(img.shape[0], img.shape[1])
    img = img[:min_side, :min_side * 2]
I have a couple questions:
What does the syntax ":min_side" do?
What is that line doing in terms of the image?
I am assuming the input image is a matrix. In Python, cv2.imread() returns the image as a NumPy array.
1. What does the syntax ":min_side" do?
It slices the list/array or, in this case, the matrix.
2. What is that line doing in terms of the image?
It crops the 2D array (the matrix/image).
A simple example of slicing:
import numpy as np
x = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
x
out:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
Slicing this matrix (image):
x[:2, :3]
Output after slicing (the first two rows and the first three columns):
array([[0, 1, 2],
       [3, 4, 5]])
A good source to read more about it would be straight from the source: https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html
The line:
img = img[:min_side, :min_side * 2]
is cropping the image so that the result is min_side pixels high and at most min_side * 2 pixels wide (a slice that runs past the edge of the array is simply cut off at the edge). The colon before a variable name is Python's slicing syntax. Observe:
arr = [1, 2, 3, 4, 5, 6]
length = 4
print(arr[:length])
Output:
[1, 2, 3, 4]
:min_side is shorthand for 0:min_side, i.e. it slices the object from the start up to, but not including, index min_side. For example:
f = [2, 4, 5, 6, 8, 9]
f[:3] # returns [2,4,5]
img = img[:min_side, :min_side * 2] produces a crop of the image (which is a numpy array) from 0 to min_side along the height and from 0 to min_side * 2 along the width. The resulting image therefore has height min_side and width min_side * 2.
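To see the crop in terms of shapes, here is a small sketch with dummy arrays standing in for the result of cv2.imread() (the sizes are made up for illustration). Note that a slice running past the edge of the array is simply clipped, so the result is only min_side * 2 wide if the image is at least that wide.

```python
import numpy as np

def crop(img):
    # Same two lines as in preprocess_image from the question
    min_side = min(img.shape[0], img.shape[1])
    return img[:min_side, :min_side * 2]

# Dummy "images" in OpenCV layout: (height, width, channels)
wide = np.zeros((100, 300, 3), dtype=np.uint8)
narrow = np.zeros((100, 150, 3), dtype=np.uint8)
tall = np.zeros((300, 100, 3), dtype=np.uint8)

print(crop(wide).shape)    # (100, 200, 3)
print(crop(narrow).shape)  # (100, 150, 3): the slice is clipped at the edge
print(crop(tall).shape)    # (100, 100, 3)
```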
I have a batch of 20 flattened tensors representing 256x256 images.
>>> imgs.shape
(20, 65536)
Each image was split into 32x32 patches (a total of 64 patches per image). I have calculated a score for each patch and got a tensor of shape (20, 64).
I would like to multiply each pixel with the corresponding patch score.
imgs * score yields an error and score.repeat(1,1,64) didn't repeat the scores in a way that preserves the score of each pixel.
How can this be achieved?
EDIT:
A simple example can be using
import torch
img_size = 4
patch_size = 2
img = torch.rand((2,img_size,img_size)) # (2,4,4)
score = torch.tensor([[1,2,3,4],[5,6,7,8]]) # (2,4)
And trying to achieve
score = [[1,1,3,3],[2,2,4,4],[5,5,6,6],[7,7,8,8]]
I would suggest reshaping your scores array to preserve information about how it relates to the original image, then using repeat_interleave() twice.
Example:
import torch

img_size = 4
patch_size = 2
patches_per_axis = img_size // patch_size
num_images = 2
img = torch.rand((num_images, img_size, img_size))  # (2,4,4)
score = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]])  # (2,4)

def expand_scores(scores):
    # Unflatten scores
    scores = scores.reshape((num_images, patches_per_axis, patches_per_axis))
    # Repeat scores to match dimensions of image, in vertical direction
    scores = scores.repeat_interleave(repeats=patch_size, dim=1)
    # Repeat scores to match dimensions of image, in horizontal direction
    scores = scores.repeat_interleave(repeats=patch_size, dim=2)
    # Optional: use reshape() to re-flatten scores. If you do that here, you'll need to do it to the image tensor too.
    return scores
(I added two constants at the top to your example, num_images, and patches_per_axis. In your original example, these would be set to 20 and 8, respectively.)
When you call expand_scores(), you'll get the following output:
tensor([[[1, 1, 2, 2],
         [1, 1, 2, 2],
         [3, 3, 4, 4],
         [3, 3, 4, 4]],
        [[5, 5, 6, 6],
         [5, 5, 6, 6],
         [7, 7, 8, 8],
         [7, 7, 8, 8]]])
You can multiply that by the pixel values:
expand_scores(score) * img
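For the flattened batch from the question, the same result can also be obtained without materialising the repeated scores, by reshaping so that every patch gets its own pair of axes and letting broadcasting do the rest. The following is a minimal sketch in NumPy rather than PyTorch (torch tensors follow the same reshape and broadcasting rules); the function name apply_patch_scores is made up here, and it assumes patches are numbered row by row.

```python
import numpy as np

def apply_patch_scores(imgs_flat, scores, img_size, patch_size):
    """Multiply every pixel by the score of the patch it belongs to.

    imgs_flat: (N, img_size * img_size) flattened images
    scores:    (N, patches_per_axis ** 2), patches numbered row by row (assumed)
    """
    n = imgs_flat.shape[0]
    p = img_size // patch_size  # patches per axis
    # Split each image axis into (patch index, offset inside the patch)
    imgs = imgs_flat.reshape(n, p, patch_size, p, patch_size)
    # One score per (patch row, patch column); the size-1 axes broadcast over the offsets
    s = scores.reshape(n, p, 1, p, 1)
    return (imgs * s).reshape(n, img_size * img_size)

# Small example from the question: two 4x4 images, 2x2 patches
scores = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
ones = np.ones((2, 16))
print(apply_patch_scores(ones, scores, img_size=4, patch_size=2).reshape(2, 4, 4))
```

With an all-ones image the printed result is just the expanded score pattern. For the original data the call would be apply_patch_scores(imgs, score, img_size=256, patch_size=32).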
So I have lots of data in a single, flat array that is grouped into irregularly sized chunks. The sizes of these chunks are given in another array. What I need to do is rearrange the chunks based on a third index array (think fancy indexing)
These chunks are always >= 3 long, usually 4, but technically unbounded, so it's not feasible to pad up to a max length and mask. Also, due to technical reasons I only have access to numpy, so nothing like scipy or pandas.
Just to be easier to read, the data in this example is easily grouped. In the real data, the numbers can be anything and do not follow this pattern.
[EDIT] Updated with less confusing data
data = np.array([1,2,3,4, 11,12,13, 21,22,23,24, 31,32,33,34, 41,42,43, 51,52,53,54])
chunkSizes = np.array([4, 3, 4, 4, 3, 4])
newOrder = np.array([0, 5, 4, 5, 2, 1])
The expected output in this case would be
np.array([1,2,3,4, 51,52,53,54, 41,42,43, 51,52,53,54, 21,22,23,24, 11,12,13])
Since the real data can be millions long, I'm hoping for some kind of numpy magic that can do this without python loops.
Approach #1
Here's a vectorized one based on creating a regular array and masking -
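def chunk_rearrange(data, chunkSizes, newOrder):
    # Boolean mask of shape (n_chunks, max_chunk_size): True where a slot holds data
    m = chunkSizes[:,None] > np.arange(chunkSizes.max())
    # Scatter the flat data into a regular 2D array, one chunk per row
    d1 = np.empty(m.shape, dtype=data.dtype)
    d1[m] = data
    # Reorder the rows, then keep only the valid slots of each row
    return d1[newOrder][m[newOrder]]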
Output for the sample as originally posted, before the edit (data = [0,0,0,0, 1,1,1, 2,2,2,2, 3,3,3,3, 4,4,4, 5,5,5,5]) -
In [4]: chunk_rearrange(data, chunkSizes, newOrder)
Out[4]: array([0, 0, 0, 0, 5, 5, 5, 5, 4, 4, 4, 5, 5, 5, 5, 2, 2, 2, 2, 1, 1, 1])
Approach #2
Another vectorized one based on cumsum and with smaller footprint for those very-ragged chunksizes -
def chunk_rearrange_cumsum(data, chunkSizes, newOrder):
    # New chunk lengths
    newlens = chunkSizes[newOrder]
    # Setup ID array that will hold specific values at those interval starts,
    # such that a final cumsum would lead us to the indices which when indexed
    # by the input array gives us the re-arranged o/p.
    # It is sized to the output, which can differ in length from data
    # when newOrder repeats or drops chunks.
    idar = np.ones(newlens.sum(), dtype=int)
    # Original chunk intervals
    c = np.r_[0,chunkSizes[:-1].cumsum()]
    # Indices from original order that form the interval starts in new arrangement
    d1 = c[newOrder]
    # Starts of chunks in new arrangement where those from d1 are to be assigned
    c2 = np.r_[0,newlens[:-1].cumsum()]
    # Offset required for the starts in new arrangement for final cumsum to work
    diffs = np.diff(d1)+1-np.diff(c2)
    idar[c2[1:]] = diffs
    idar[0] = d1[0]
    # Final cumsum and indexing leads to desired new arrangement
    out = data[idar.cumsum()]
    return out
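The index arithmetic in the cumsum version can be hard to follow. As a cross-check, here is a sketch of the same idea written with np.repeat instead: build, for every output position, the index of the source element, then do a single fancy-indexing pass. This is an alternative formulation, not the code from the answer; the name chunk_rearrange_repeat is made up.

```python
import numpy as np

def chunk_rearrange_repeat(data, chunkSizes, newOrder):
    # Start offset of each chunk in the original array
    starts = np.r_[0, chunkSizes[:-1].cumsum()]
    # Lengths and start offsets of the chunks in the new arrangement
    newlens = chunkSizes[newOrder]
    newstarts = np.r_[0, newlens[:-1].cumsum()]
    # For output position k inside new chunk m, the source index is
    # starts[newOrder[m]] + (k - newstarts[m])
    shift = np.repeat(starts[newOrder] - newstarts, newlens)
    return data[shift + np.arange(newlens.sum())]

data = np.array([1,2,3,4, 11,12,13, 21,22,23,24, 31,32,33,34, 41,42,43, 51,52,53,54])
chunkSizes = np.array([4, 3, 4, 4, 3, 4])
newOrder = np.array([0, 5, 4, 5, 2, 1])
print(chunk_rearrange_repeat(data, chunkSizes, newOrder))
```

On the sample from the question this reproduces the expected output, and it also handles a newOrder whose total length differs from that of data.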
You can use np.split to create views into your data array corresponding to the chunkSizes, if you build up the indices with np.cumsum. You can then reorder the views according to the newOrder indices using fancy indexing. This should be reasonably efficient since the data is only copied to the new array when you call np.concatenate on the reordered views:
import numpy as np
data = np.array([0,0,0,0, 1,1,1, 2,2,2,2, 3,3,3,3, 4,4,4, 5,5,5,5])
chunkSizes = np.array([4, 3, 4, 4, 3, 4])
newOrder = np.array([0, 5, 4, 5, 2, 1])
cumIndices = np.cumsum(chunkSizes)
# dtype=object is required because the chunks have different lengths (ragged)
splitArray = np.array(np.split(data, cumIndices[:-1]), dtype=object)
targetArray = np.concatenate(splitArray[newOrder])
# >>> targetArray
# array([0, 0, 0, 0, 5, 5, 5, 5, 4, 4, 4, 5, 5, 5, 5, 2, 2, 2, 2, 1, 1, 1])
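The claim that np.split does not copy can be checked with np.shares_memory. A small sketch using the same sample data; it reorders the chunks with a list comprehension (a loop over chunks, not over elements), which sidesteps the ragged object array altogether.

```python
import numpy as np

data = np.array([0,0,0,0, 1,1,1, 2,2,2,2, 3,3,3,3, 4,4,4, 5,5,5,5])
chunkSizes = np.array([4, 3, 4, 4, 3, 4])
newOrder = [0, 5, 4, 5, 2, 1]

chunks = np.split(data, np.cumsum(chunkSizes)[:-1])  # list of 6 sub-arrays

# Every chunk is a view onto the original buffer
print(all(np.shares_memory(c, data) for c in chunks))  # True

# The copy happens only here
target = np.concatenate([chunks[i] for i in newOrder])
print(np.shares_memory(target, data))  # False
```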
I would like to find a reshape function that can transform arrays of different dimensions into arrays of the same dimension. Let me explain:
import numpy as np
a = np.array([[[1,2,3,3],[1,2,3,3]],[[1,2,3,3],[1,2,3,3]]])
b = np.array([[[1,2,3,3],[1,2,3,3]],[[1,2,3,3],[1,2,3,3]],[[1,2,3,3],[1,2,3,4]]])
c = np.array([[[1,2,3,3],[1,2,3,3]]])
I would like to make the shapes of b and c equal to the shape of a. However, np.reshape throws an error because, as explained here (Numpy resize or Numpy reshape), it requires the total number of elements to stay the same.
I would like some version of that function that adds zeros at the start of the first dimension if the array is smaller, or removes entries from the start if it is bigger. For my example, the results would look like this:
b = np.array([[[1,2,3,3],[1,2,3,3]],[[1,2,3,3],[1,2,3,4]]])
c = np.array([[[0,0,0,0],[0,0,0,0]],[[1,2,3,3],[1,2,3,3]]])
Do I need to write my own function to do that?
This is similar to the other answer's align() function, but it also works if the lower dimensions don't match:
def custom_reshape(a, b):
    result = np.zeros_like(a).ravel()
    result[-min(a.size, b.size):] = b.ravel()[-min(a.size, b.size):]
    return result.reshape(a.shape)

custom_reshape(a,b)
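As a quick check, here is what this flattened approach gives for the arrays from the question, plus one made-up array d whose lower dimensions differ from those of a. The function is repeated so that the snippet runs on its own.

```python
import numpy as np

def custom_reshape(a, b):
    result = np.zeros_like(a).ravel()
    n = min(a.size, b.size)
    result[-n:] = b.ravel()[-n:]
    return result.reshape(a.shape)

a = np.array([[[1,2,3,3],[1,2,3,3]],[[1,2,3,3],[1,2,3,3]]])
c = np.array([[[1,2,3,3],[1,2,3,3]]])
d = np.arange(1, 7).reshape(2, 3)  # hypothetical: lower dimensions do not match a

print(custom_reshape(a, c))
# [[[0 0 0 0]
#   [0 0 0 0]]
#
#  [[1 2 3 3]
#   [1 2 3 3]]]

print(custom_reshape(a, d))
# [[[0 0 0 0]
#   [0 0 0 0]]
#
#  [[0 0 1 2]
#   [3 4 5 6]]]
```

Because everything is flattened first, the values of d are packed into the tail of the result regardless of their original row structure.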
I would write a function like this:
def align(a,b):
    out = np.zeros_like(a)
    x = min(a.shape[0], b.shape[0])
    out[-x:] = b[-x:]
    return out
Output:
align(a,b)
# array([[[1, 2, 3, 3],
#         [1, 2, 3, 3]],
#        [[1, 2, 3, 3],
#         [1, 2, 3, 4]]])
align(a,c)
# array([[[0, 0, 0, 0],
#         [0, 0, 0, 0]],
#        [[1, 2, 3, 3],
#         [1, 2, 3, 3]]])
I'm a newbie in Python. I have a MATLAB script like the one below, and I want to rewrite it, including its 3D matrix, in Python 3.x. How can I do it?
nl=length(res);
ndat=length(per);
phi=atan(1)*4;
amu=phi*4e-7;
for i=1:ndat
    for j=1:nl
        z=sqrt(phi*amu*res(j)/per(i));
        zz(j)=complex(z,z);
        exp0=exp((-2)*zz(j)/res(j)*thi(j));
        exp1=complex(1,0)+exp0;
        exp2=complex(1,0)-exp0;
        %matrix 3D
        ldi(1,1,j)=exp1;
        ldi(1,2,j)=zz(j)*exp2
        ldi(2,1,j)=exp2/zz(j);
        ldi(2,2,j)=exp1;
    end
end
Below you'll find a self-contained implementation of your code, with a few key differences:
Python indexing starts from 0 instead of 1
Python indexing uses square brackets instead of round ones
Mathematical functions must be imported from libraries (here math and cmath)
Good luck!
import math
import cmath

# Data
res = [1, 4, 1, 2, 3]
per = [5, 5, 1, 1, 0.5, 0.6]
thi = [1, 2, 3, 4, 5, 6]

nl = len(res)
ndat = len(per)

# 2 x 2 x nl nested list standing in for the MATLAB 3D matrix
ldi = [[[0] * nl for _ in range(2)] for _ in range(2)]
zz = [0] * nl

phi = math.atan(1) * 4  # pi
amu = phi * 4e-7

for i in range(ndat):
    for j in range(nl):
        z = math.sqrt(phi * amu * res[j] / per[i])
        zz[j] = complex(z, z)
        exp0 = cmath.exp((-2) * zz[j] / res[j] * thi[j])
        exp1 = complex(1, 0) + exp0
        exp2 = complex(1, 0) - exp0
        # matrix 3D
        ldi[0][0][j] = exp1
        ldi[0][1][j] = zz[j] * exp2
        ldi[1][0][j] = exp2 / zz[j]
        ldi[1][1][j] = exp1
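If NumPy is available, the same matrices can be held in a real 3D array of shape (2, 2, nl), which is closer to the MATLAB original. The sketch below is one possible vectorised version for a single period; the function name layer_matrices and the sample values are assumptions made for illustration, not part of the original script.

```python
import numpy as np

def layer_matrices(res, thi, period):
    """Return a complex array ldi of shape (2, 2, nl) for one period."""
    res = np.asarray(res, dtype=float)
    thi = np.asarray(thi, dtype=float)
    phi = np.pi              # atan(1) * 4 in the MATLAB script
    amu = phi * 4e-7
    z = np.sqrt(phi * amu * res / period)
    zz = z + 1j * z          # complex(z, z)
    exp0 = np.exp(-2 * zz / res * thi)
    exp1 = 1 + exp0
    exp2 = 1 - exp0
    ldi = np.empty((2, 2, res.size), dtype=complex)
    ldi[0, 0] = exp1         # MATLAB ldi(1,1,j)
    ldi[0, 1] = zz * exp2    # MATLAB ldi(1,2,j)
    ldi[1, 0] = exp2 / zz    # MATLAB ldi(2,1,j)
    ldi[1, 1] = exp1         # MATLAB ldi(2,2,j)
    return ldi

# Made-up sample values, one thickness per layer
res = [1, 4, 1, 2, 3]
thi = [1, 2, 3, 4, 5]
ldi = layer_matrices(res, thi, period=0.6)
print(ldi.shape)  # (2, 2, 5)
```

Calling this once per entry of per reproduces the outer loop. Note that in the loop version each pass over i overwrites ldi, so only the matrices for the last period remain at the end.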
I'm trying to register two images that are a rotated and translated version of one another using opencv. Generally speaking, the procedure is (pseudo code):
a. IF1 = FFT2(I1); IF2 = FFT2(I2)
b. R_translation = (IF1).*(IF2_conjugate)
c. R_translation = R_translation./abs(R_translation)
d. r_translation = IFFT2(R_translation)
where the maximum of r_translation corresponds to the translation. Moving on to calculate the rotation, the abs value removes the translation part,
e. IF1_abs = abs(IF1); IF2_abs = abs(IF2)
Converting to Linear-Polar coordinates,
f. IF1_abs_pol = LINPOL(IF1_abs); IF2_abs_pol = LINPOL(IF2_abs)
g. IFF1 = FFT2(IF1_abs_pol); IFF2 = FFT2(IF2_abs_pol)
h. R_rot = (IFF1).*(IFF2_conjugate)
i. R_rot = R_rot./abs(R_rot)
j. r_rot = IFFT2(R_rot)
where the maximum of r_rot corresponds to the rotation. While cv2.phaseCorrelate returns the expected results for translation alone, for rotation it returns odd results. So I tried the following.
I took two 5x5 numpy arrays that are rotated versions of one another, like so:
import numpy
import cv2
a = numpy.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
a = a.astype('float')/a.astype('float').max()
b = numpy.array([[5, 5, 5, 5, 5], [4, 4, 4, 4, 4], [3, 3, 3, 3, 3], [2, 2, 2, 2, 2], [1, 1, 1, 1, 1]])
b = b.astype('float') / b.astype('float').max()
First I calculated the phase correlation myself:
center_x = numpy.floor(a.shape[0] / 2.0)  # the x center of rotation (= x center of image)
center_y = numpy.floor(a.shape[1] / 2.0)  # the y center of rotation (= y center of image)
Mvalue = a.shape[1] / numpy.sqrt(
    ((a.shape[0] / 2.0) ** 2.0) + ((a.shape[1] / 2.0) ** 2.0))  # rotation radius
Calculating the FFT, taking the absolute value (which discards any translation difference), switching to linear-polar coordinates, and normalizing:
a_polar = cv2.linearPolar(numpy.abs(numpy.fft.fft2(a)), (center_x, center_y), Mvalue, cv2.WARP_FILL_OUTLIERS)
b_polar = cv2.linearPolar(numpy.abs(numpy.fft.fft2(b)), (center_x, center_y), Mvalue, cv2.WARP_FILL_OUTLIERS)
a_polar = a_polar/a_polar.max()
b_polar = b_polar / b_polar.max()
Another FFT step, multiplying point wise, and IFFT back:
aff = numpy.fft.fft2(a_polar)
bff = numpy.fft.fft2(b_polar)
R = aff * numpy.conjugate(bff)
R = R / numpy.absolute(R)
r = numpy.fft.ifft2(R).real
r = r/r.max()
yields:
[Figure: Phase correlation for rotation, b with respect to a]
According to cv2.linearPolar(), the rows span the angle (in this case with a step size of 360/5 = 72 degrees) and the columns span the radius (from 0 to the maximum radius given in Mvalue). The maximum is evident at the last row (corresponding to a shift of approximately -90 degrees). So far so good.
The second method is using cv2.phaseCorrelate() directly,
r_direct = cv2.phaseCorrelate(a_polar, b_polar)
which yields:
[Figure: Phase correlation for rotation, b with respect to a, direct method]
The first tuple is the X,Y correlation coefficient (in pixels?) and the third number is the fit grade. When it is close to unity, the correlation coefficient represents the data better (the blob around the maximum is more distinct).
Other than the fact that the result is not distinct enough (why?), the result is confusing...
Generally, the first FFT step in this 5x5 example was not necessary. If rotation is the only difference, one can switch to linear-polar coordinates immediately and use cv2.phaseCorrelate. In that case, the result is also confusing.
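For reference, steps a to d for the translation part can be written in a few lines of plain NumPy. This is only a sketch on a synthetic image with a circular shift made with np.roll (the image, the sizes and the function name are made up). With the product taken as IF1 times the conjugate of IF2, as in step b, the peak sits at minus the shift, modulo the image size.

```python
import numpy as np

def phase_correlation_shift(i1, i2):
    """Estimate the circular (row, col) shift that maps i1 onto i2."""
    f1 = np.fft.fft2(i1)
    f2 = np.fft.fft2(i2)
    cross = f1 * np.conjugate(f2)               # step b
    cross /= np.maximum(np.abs(cross), 1e-12)   # step c, guarded against division by zero
    r = np.fft.ifft2(cross).real                # step d
    peak = np.unravel_index(np.argmax(r), r.shape)
    # The peak is located at -shift modulo the image size
    return tuple(int(-p % n) for p, n in zip(peak, r.shape))

rng = np.random.default_rng(0)
img = rng.random((32, 32))
shifted = np.roll(img, shift=(3, 5), axis=(0, 1))
print(phase_correlation_shift(img, shifted))  # (3, 5)
```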
Any help would be appreciated :)
Thanks!
David