How does image rotation work in this function (Python)?

I can't understand the code starting from # LIST DESTINATION PIXEL INDICES. I don't get why x and y
are defined the way they are: why don't they start from (0, 0) instead of DIM//2? From my understanding, this function transforms the image by "inverse" transforming the destination pixel coordinates and copying the corresponding original pixels into the destination positions.
get_mat(*args) returns a 3x3 matrix that applies various transformations; pretend it is a rotation matrix.
Take IMAGE_SIZE[0] as 224.
This is part of Chris Deotte's Kaggle notebook https://www.kaggle.com/cdeotte/rotation-augmentation-gpu-tpu-0-96
import tensorflow as tf
import tensorflow.keras.backend as K

# IMAGE_SIZE and get_mat are defined elsewhere in the notebook

def transform(image, label):
    # input image - one image of size [dim,dim,3], not a batch of [b,dim,dim,3]
    # output - image randomly rotated, sheared, zoomed, and shifted
    DIM = IMAGE_SIZE[0]
    XDIM = DIM % 2  # fix for size 331

    rot = 15. * tf.random.normal([1], dtype='float32')
    shr = 5. * tf.random.normal([1], dtype='float32')
    h_zoom = 1.0 + tf.random.normal([1], dtype='float32') / 10.
    w_zoom = 1.0 + tf.random.normal([1], dtype='float32') / 10.
    h_shift = 16. * tf.random.normal([1], dtype='float32')
    w_shift = 16. * tf.random.normal([1], dtype='float32')

    # GET TRANSFORMATION MATRIX
    m = get_mat(rot, shr, h_zoom, w_zoom, h_shift, w_shift)

    # LIST DESTINATION PIXEL INDICES
    x = tf.repeat(tf.range(DIM // 2, -DIM // 2, -1), DIM)
    y = tf.tile(tf.range(-DIM // 2, DIM // 2), [DIM])
    z = tf.ones([DIM * DIM], dtype='int32')
    idx = tf.stack([x, y, z])

    # ROTATE DESTINATION PIXELS ONTO ORIGIN PIXELS
    idx2 = K.dot(m, tf.cast(idx, dtype='float32'))
    idx2 = K.cast(idx2, dtype='int32')
    idx2 = K.clip(idx2, -DIM // 2 + XDIM + 1, DIM // 2)

    # FIND ORIGIN PIXEL VALUES
    idx3 = tf.stack([DIM // 2 - idx2[0,], DIM // 2 - 1 + idx2[1,]])
    d = tf.gather_nd(image, tf.transpose(idx3))

    return tf.reshape(d, [DIM, DIM, 3]), label

This is an affine transform of an image done the simple way (without multisampling): for each destination pixel, find the corresponding pixel in the source array and take its value.
The destination pixel coordinates are put into the list idx, which is then transformed with the matrix m; the results are cast to integers and clipped to the image size.
idx3 is assembled from the reverse-offset components (x, y) of the source coordinates and used to index the original image.

Why aren't they starting from (0, 0) instead of DIM//2?

(0, 0) is the top-left corner of the image data array. When thinking about a matrix transform you usually place (0, 0) at the center of the image, hence the offset: rotation, shear, and zoom matrices act around the origin, so putting the origin at the image center makes the image transform around its middle instead of its corner.
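To make the centering concrete, here is a tiny NumPy sketch of the same inverse-mapping idea (an illustration only; the plain rotation matrix below stands in for the notebook's get_mat):

import numpy as np

DIM = 224
image = np.zeros((DIM, DIM, 3), dtype=np.float32)  # stand-in for a real image

# Centered destination coordinates, just like the notebook's x and y:
# x runs DIM//2 ... -DIM//2+1 (top to bottom), y runs -DIM//2 ... DIM//2-1 (left to right)
x = np.repeat(np.arange(DIM // 2, -DIM // 2, -1), DIM)
y = np.tile(np.arange(-DIM // 2, DIM // 2), DIM)

# A rotation matrix rotates around (0, 0); with centered coordinates,
# (0, 0) is the middle of the image, so the image turns about its center.
theta = np.deg2rad(15.0)
m = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
src = np.rint(m @ np.stack([x, y])).astype(int)

# Clip, then map the centered coordinates back to array indices, as idx3 does
src = np.clip(src, -DIM // 2 + 1, DIM // 2)
rows = DIM // 2 - src[0]        # x = DIM//2 maps to row 0 (top edge)
cols = DIM // 2 - 1 + src[1]    # y = -DIM//2+1 maps to col 0 (left edge)
rotated = image[rows, cols].reshape(DIM, DIM, 3)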

Related

Shapely polygon to array for each pixel, not just boundaries

The shape of a shapely polygon can easily be converted to an array of points by using
x,y = polygon.exterior.xy
However, this returns only the actual points.
How can I convert a shapely polygon to an array, with a 0 for each pixel outside of the shape, and a 1 for each pixel inside the shape?
Thus, for example, a polygon with width=100, height=100 should result in a (100, 100) array.
I could do this by getting the exterior points, and then looping through each pixel and seeing if it's inside/on the shape, or outside. But I think there should be an easier method?
I don't know about your exact requirements, but the fastest way to get the general result should be this:

from shapely.geometry import MultiPoint

# shape is the Shapely polygon
width = int(shape.bounds[2] - shape.bounds[0])
height = int(shape.bounds[3] - shape.bounds[1])
points = MultiPoint([(x, y) for x in range(width) for y in range(height)])
zeroes = points.difference(shape)
ones = points.intersection(shape)
This could answer my question, but I'm wondering if there is a faster way.
from shapely.geometry import Point

data = []
width = int(shape.bounds[2] - shape.bounds[0])
height = int(shape.bounds[3] - shape.bounds[1])
for y in range(height):
    row = []
    for x in range(width):
        val = 1 if shape.convex_hull.contains(Point(x, y)) else 0
        row.append(val)
    data.append(row)
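A much faster, fully vectorized alternative (a sketch of my own, assuming matplotlib is available and the polygon has no interior holes) is to test all pixel centers in a single call with matplotlib.path.Path.contains_points:

import numpy as np
from matplotlib.path import Path

# shape is the Shapely polygon from the question
width = int(shape.bounds[2] - shape.bounds[0])
height = int(shape.bounds[3] - shape.bounds[1])

# Build one (N, 2) array of all pixel centers and test them at once,
# then reshape the boolean result into a (height, width) array of 0/1.
xs, ys = np.meshgrid(np.arange(width), np.arange(height))
pixels = np.column_stack([xs.ravel(), ys.ravel()])
path = Path(np.asarray(shape.exterior.coords))
mask = path.contains_points(pixels).reshape(height, width).astype(int)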

How to extract first component of FFT from 4D Image

I have a 4D (2D + slices along the z axis + time frames) gray-scale image of the heart beating at different moments.
I would like to take the Fourier transform along the time axis (for each slice separately) and analyze the fundamental harmonic (also called the H1 component, where H stands for harmonic), so I can determine the pixel regions (ROI) that show the strongest response at the cardiac frequency.
I'm using Python for this purpose, and I tried to do it with the following code, but I'm not sure this is the correct way, because I don't know how to determine the cutoff frequency to keep only the fundamental harmonic.
This is the link to the image I'm dealing with.
import nibabel as nib
import numpy as np
import matplotlib.pyplot as plt

img = nib.load('patient057_4d.nii.gz').get_fdata()  # 4D array: (rows, cols, slices, frames)
f = np.fft.fft2(img, axes=(0, 1))  # 2D FFT over the spatial axes of every slice/frame
# Move the DC component of the FFT output to the center of the spectrum
fshift = np.fft.fftshift(f, axes=(0, 1))
fshift_orig = fshift.copy()
# logarithmic transformation
magnitude_spectrum = 20 * np.log(np.abs(fshift))
# Create mask
rows, cols = img.shape[:2]
crow, ccol = rows // 2, cols // 2
# Use mask to remove low frequency components
dist1 = 20
dist2 = 10
fshift[crow - dist1:crow + dist1, ccol - dist1:ccol + dist1] = 0
#fshift[crow-dist2:crow+dist2, ccol-dist2:ccol+dist2] = fshift_orig[crow-dist2:crow+dist2, ccol-dist2:ccol+dist2]
# logarithmic transformation
magnitude_spectrum1 = 20 * np.log(np.abs(fshift))
f_ishift = np.fft.ifftshift(fshift, axes=(0, 1))
# inverse Fourier transform
img_back = np.fft.ifft2(f_ishift, axes=(0, 1))
# get rid of the imaginary part by taking abs
img_back = np.abs(img_back)

plt.figure(num='Im_Back')
plt.imshow(np.abs(fshift[:, :, 2, 2]).astype('uint8'), cmap='gray')
plt.show()
The solution was to take a 3D Fourier transform for each slice separately, then choose only the second component of the transform (bin 1, the fundamental harmonic) and transform it back to the spatial domain, and that's it.
The benefit of this is that it detects whether something is moving along the third axis (time, in my case).
import numpy.fft as FFT

for sl in range(img.shape[2]):
    # -----Fourier--H1-----------------------------------------
    # ff1[:, :, 1] is the H1 component; index 0 would be the DC component
    ff1 = FFT.fftn(img[:, :, sl, :])
    fh = np.absolute(FFT.ifftn(ff1[:, :, 1]))
    # -----Fourier--H1-----------------------------------------
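Since the spatial FFT and its inverse cancel out, an equivalent shortcut (my sketch, assuming the same (rows, cols, slices, frames) layout as above) is to take the FFT along the time axis only and keep frequency bin 1:

import numpy as np

# FFT along the time axis only; bin 0 is the DC component, bin 1 is the
# fundamental harmonic H1. This yields the per-pixel H1 strength for all
# slices at once, without the spatial transform/inverse round trip.
ff = np.fft.fft(img, axis=-1)
h1 = np.abs(ff[..., 1])   # shape (rows, cols, slices)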

3d coordinates x,y,z to 3d numpy array

I have a 3d mask which is an ellipsoid. I have extracted the coordinates of the mask using np.argwhere. The coordinates can be assigned as x, y, z as in the example code. My question is how can I get my mask back (in the form of 3d numpy or boolean array of the same shape) from the coordinates x, y, z ?
import numpy as np
import scipy
import skimage
from skimage import draw
mask = skimage.draw.ellipsoid(10,12,18)
print(mask.shape)
coord = np.argwhere(mask)
x = coord[:,0]
y = coord[:,1]
z = coord[:,2]
The above code gives me boolean mask of the shape (23, 27, 39) and now I want to construct the same mask of exactly same shape using x, y, z coordinates. How can it be done?
I would like to modify the question above a bit. If I rotate my coordinates using a quaternion, which gives me a new set of coordinates x1, y1, z1, how can I then construct my boolean mask of shape (23, 27, 39), like the original mask?
import quaternion
angle1 = 90
rotation = np.exp(quaternion.quaternion(0,0, 1) * angle1*(np.pi/180) / 2)
coord_rotd = quaternion.rotate_vectors(rotation, coord)
x1 = coord_rotd[:,0]
y1 = coord_rotd[:,1]
z1 = coord_rotd[:,2]
You can use x, y and z directly to reconstruct your mask. First, create a new array with the same shape as your mask, pre-filled with zeros (i.e. False). Next, set each coordinate given by x, y and z to True:
new_mask = np.zeros_like(mask)
new_mask[x,y,z] = True
# Check if mask and new_mask is the same
np.allclose(mask, new_mask)
# True
If you are asking whether you can reconstruct your mask knowing only x, y and z, that is not possible, because you lose the information about what is not filled. Just imagine your ellipsoid sitting in a corner of a huge cube: knowing only what the ellipsoid looks like, how would you know how large the cube is?
Regarding your second question:
You have to fix your coordinates, because the rotation can move them outside the scene. So I defined a function that takes care of this:
def fixCoordinates(coord, shape):
    # move to the positive edge: remove negative indices
    # (you can also add +1 here to keep a margin around your ellipse)
    coord -= coord.min(0)
    # clip coordinates that fall outside the scene, one axis at a time
    for i, s in enumerate(shape):
        coord[coord[:, i] >= s, i] = s - 1
    # return the coordinates with an integer dtype
    return coord.astype(int)
And if you modify your code slightly, you can use the same strategy as before:
# your code
import quaternion

angle1 = 90
rotation = np.exp(quaternion.quaternion(0, 0, 1) * angle1 * (np.pi / 180) / 2)
coord_rotd = quaternion.rotate_vectors(rotation, coord)

# Create the new mask
new_mask2 = np.zeros_like(new_mask)
# Fix the coordinates
coord_rotd = fixCoordinates(coord_rotd, mask.shape)
x1 = coord_rotd[:, 0]
y1 = coord_rotd[:, 1]
z1 = coord_rotd[:, 2]
# create the new mask, just as before
new_mask2[x1, y1, z1] = True
Given your example rotation, you can now plot both masks (which have the same shape) side by side.
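For instance, a minimal plotting sketch of my own, comparing the central z-slice of each mask:

import matplotlib.pyplot as plt

sl = mask.shape[2] // 2                 # central slice along z
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(new_mask[:, :, sl], cmap='gray')
ax1.set_title('new_mask')
ax2.imshow(new_mask2[:, :, sl], cmap='gray')
ax2.set_title('new_mask2')
plt.show()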
If you know the shape of your old mask, try this:
new_mask = np.full(old_mask_shape, True) # Fill new_mask with True everywhere
new_mask[x,y,z] = False # Set False for the ellipsoid part alone
Note:
old_mask_shape should be the same as shape of the image on which you intend to apply the mask.
If you want a True mask rather than a False one (i.e. the ellipsoid part True and everything else False), just interchange True and False in the two lines of code above.

How to shift an image array with subpixel precision in Python?

I am trying to shift a 2D array representing an image with subpixel precision, using 2D FFTs and the Fourier shift theorem. It works well when the shift value is an integer (pixel precision), but I get a lot of artifacts when the shift is a fraction of a pixel.
The code is below:
import numpy as np
from scipy.fftpack import fftfreq

def shift_fft(input_array, shift):
    shift_rows, shift_cols = shift
    nr, nc = input_array.shape
    Nr, Nc = fftfreq(nr), fftfreq(nc)
    Nc, Nr = np.meshgrid(Nc, Nr)
    fft_inputarray = np.fft.fft2(input_array)
    fourier_shift = np.exp(1j * 2 * np.pi * ((shift_rows * Nr) + (shift_cols * Nc)))
    output_array = np.fft.ifft2(fft_inputarray * fourier_shift)
    return np.real(output_array)
Thus, shift_fft(input_array, [2, 0]) works, but shift_fft(input_array, [2.4, 0]) produces artifacts. What am I doing wrong?
For example, consider the 128x128 image of Lena. If I want to shift by 10.4 pixels in each direction, I get a wobbling modulation of the image.
[Before and after images omitted: the shifted image shows the wobbling artifacts.]
You can try using scipy.ndimage.shift. It shifts pixels similarly to numpy.roll, but also allows fractional shift values, using interpolation.
For a colored image, make sure to provide a shift of 0 for the 3rd axis (channels).
import scipy.ndimage
scipy.ndimage.shift(input_array, (2.4, 0))
By default it'll set the background to black, but you can adjust the mode to have it wrap around or have a custom color.
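If you specifically want the Fourier-domain approach from the question, scipy also provides scipy.ndimage.fourier_shift, which builds the shift kernel for you. A minimal sketch (the function name shift_fft_scipy is my own):

import numpy as np
from scipy.ndimage import fourier_shift

def shift_fft_scipy(input_array, shift):
    # Apply the Fourier shift theorem with scipy's ready-made kernel,
    # then drop the (numerically tiny) imaginary part.
    f = np.fft.fft2(input_array)
    return np.real(np.fft.ifft2(fourier_shift(f, shift)))

# e.g. shift_fft_scipy(image, (10.4, 10.4)) for a 2D grayscale array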
For some reason I found scipy.ndimage.shift to be very slow, especially for n-dimensional images. For shifting along a specific axis, with integer and non-integer shifts, I created a simple function:
import numpy as np

def shift_img_along_axis(img, axis=0, shift=1, constant_values=0):
    """Shift an array along a specific axis. Each new value is a weighted mean of the
    two original pixels it falls between, weighted by the distances to them.
    NOTE: at the image border, where not enough original pixels are available,
    values are averaged with the additional constant_values.
    constant_values: value assumed for pixels with no counterpart in the original img.
    RETURNS: the shifted image.
    A.Mau."""
    intshift = int(shift)
    remain0 = abs(shift - int(shift))
    remain1 = 1 - remain0  # if shift is an integer: remain1=1 and remain0=0
    npad = int(np.ceil(abs(shift)))  # ceil relative to 0 (0.5 => 1 and -0.5 => -1)
    pad_arg = [(0, 0)] * img.ndim
    pad_arg[axis] = (npad, npad)
    bigger_image = np.pad(img, pad_arg, 'constant', constant_values=constant_values)
    part1 = remain1 * bigger_image.take(np.arange(npad + intshift, npad + intshift + img.shape[axis]), axis)
    if remain0 == 0:
        shifted = part1
    else:
        if shift > 0:
            part0 = remain0 * bigger_image.take(np.arange(npad + intshift + 1, npad + intshift + 1 + img.shape[axis]), axis)
        else:
            part0 = remain0 * bigger_image.take(np.arange(npad + intshift - 1, npad + intshift - 1 + img.shape[axis]), axis)
        shifted = part0 + part1
    return shifted
A quick example :
np.random.seed(1)
img = np.random.uniform(0,10,(3,4)).astype('int')
print( img )
shift = 1.5
shifted = shift_img_along_axis( img, axis=1, shift=shift )
print( shifted )
Image print :
[[4 7 0 3]
[1 0 1 3]
[3 5 4 6]]
Shifted image:
[[3.5 1.5 1.5 0. ]
[0.5 2. 1.5 0. ]
[4.5 5. 3. 0. ]]
With our shift of 1.5, the first value in the shifted image is the mean of 7 and 0, and so on. If a value is missing in the original image, an additional value of 0 is used instead.
If you want a result similar to np.roll (where the image wraps around to the other side), you would have to modify the function a bit; see the sketch below.
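For example, one minimal change (my sketch, not part of the original function) is to replace the constant padding with wrap-around padding, so that pixels shifted past one edge re-enter on the other:

# Hypothetical tweak inside shift_img_along_axis: replace
#   bigger_image = np.pad(img, pad_arg, 'constant', constant_values=constant_values)
# with wrap-around padding, so out-of-range reads come from the opposite edge:
bigger_image = np.pad(img, pad_arg, 'wrap')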

Correct method and Python package that can find width of an image's feature

The input is a spectrum with colorful (sorry) vertical lines on a black background. Given the approximate x coordinate of that band (as marked by X), I want to find the width of that band.
I am unfamiliar with image processing. Please point me to the correct image processing method, and to a Python package that can do it.
I am thinking of PIL; OpenCV gave me the impression of being overkill for this particular application.
What if I want to make this an expert system that can classify them in the future?
I'll give a complete minimal working example (as suggested by sega_sai). I don't have access to your original image, but you'll see it doesn't really matter! The peak distributions found by the code below are:
Mean values at: 26.2840960523 80.8255092125
import numpy as np
from PIL import Image
from scipy.optimize import leastsq

# Load the picture with PIL, process if needed
pic = np.asarray(Image.open("band2.png"))

# Average the pixel values along the vertical axis
pic_avg = pic.mean(axis=2)
projection = pic_avg.sum(axis=0)

# Set the min value to zero for a nice fit
projection /= projection.mean()
projection -= projection.min()

# Fit function, two Gaussians, adjust as needed
def fitfunc(p, x):
    return p[0] * np.exp(-(x - p[1])**2 / (2.0 * p[2]**2)) + \
           p[3] * np.exp(-(x - p[4])**2 / (2.0 * p[5]**2))

errfunc = lambda p, x, y: fitfunc(p, x) - y

# Use scipy to fit, p0 is the initial guess
p0 = np.array([0., 20., 1., 0., 75., 10.])
X = np.arange(len(projection))
p1, success = leastsq(errfunc, p0, args=(X, projection))
Y = fitfunc(p1, X)

# Output the result
print("Mean values at:", p1[1], p1[4])

# Plot the result
from pylab import *
subplot(211)
imshow(pic)
subplot(223)
plot(projection)
subplot(224)
plot(X, Y, 'r', lw=5)
show()
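To turn the fitted parameters into an actual band width (my addition, assuming the two-Gaussian model above): p1[2] and p1[5] are the fitted standard deviations, and for a Gaussian the full width at half maximum is 2*sqrt(2*ln 2), about 2.355, standard deviations:

# FWHM of a Gaussian = 2*sqrt(2*ln(2)) * sigma
fwhm1 = 2 * np.sqrt(2 * np.log(2)) * abs(p1[2])
fwhm2 = 2 * np.sqrt(2 * np.log(2)) * abs(p1[5])
print("Band widths (FWHM):", fwhm1, fwhm2)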
Below is a simple thresholding method to find the lines and their widths; it should work quite reliably for any number of lines. The yellow and black image below was processed using this script; the red/black plot illustrates the found lines, using parameters threshold = 0.3, min_line_width = 5.
The script averages the rows of an image and then determines the start and end position of each line based on a threshold (which you can set between 0 and 1) and a minimum line width (in pixels). By using a threshold and a minimum line width you can easily filter your input images to get the lines out of them. The first function, find_lines, returns all the lines in an image as a list of tuples containing the start, end, center, and width of each line. The second function, find_closest_band_width, is called with the specified x_position and returns the width of the line closest to that position (measuring the distance to each line's center). As the lines are saturated (255 cut-off per channel), their cross-sections are not far from a uniform distribution, so I don't believe fitting any kind of distribution would help much; it just complicates things unnecessarily.
from PIL import Image, ImageStat

def find_lines(image_file, threshold, min_line_width):
    im = Image.open(image_file)
    width, height = im.size
    hist = []
    lines = []
    start = end = 0
    for x in range(width):
        column = im.crop((x, 0, x + 1, height))
        stat = ImageStat.Stat(column)
        ## normalises by 2 * 255 as in your example the colour is yellow
        ## if your images start using white lines change this to 3 * 255
        hist.append(sum(stat.sum) / (height * 2 * 255))
    for index, value in enumerate(hist):
        if value > threshold and end >= start:
            start = index
        if value < threshold and end < start:
            if index - start < min_line_width:
                start = 0
            else:
                end = index
                center = start + (end - start) / 2.0
                width = end - start
                lines.append((start, end, center, width))
    return lines

def find_closest_band_width(x_position, lines):
    distances = [(value[2] - x_position) ** 2 for value in lines]
    index = distances.index(min(distances))
    return lines[index][3]

## set your threshold and min_line_width for finding lines
lines = find_lines("8IxWA_sample.png", 0.7, 4)
## sets x_position to the 59th pixel
print('width of nearest line:', find_closest_band_width(59, lines))
I don't think you need anything fancy for your particular task.
I would just use PIL + scipy; that should be enough.
You essentially need to take your image, make a 1D projection of it,
and then fit a Gaussian or something similar to that projection. The approximate location of the band should be used as the first guess for the fitter.
