I am trying to shift a 2D array representing an image with subpixel precision using 2D FFTs and the Fourier shift theorem. It works well when the shift value is an integer (pixel precision), but I get a lot of artifacts when the shift value is not an integer, i.e., a fraction of a pixel.
The code is below:
import numpy as np
from scipy.fftpack import fftfreq

def shift_fft(input_array, shift):
    shift_rows, shift_cols = shift
    nr, nc = input_array.shape
    Nr, Nc = fftfreq(nr), fftfreq(nc)
    Nc, Nr = np.meshgrid(Nc, Nr)
    fft_inputarray = np.fft.fft2(input_array)
    fourier_shift = np.exp(1j*2*np.pi*((shift_rows*Nr)+(shift_cols*Nc)))
    output_array = np.fft.ifft2(fft_inputarray*fourier_shift)
    return np.real(output_array)
Thus, shift_fft(input_array, [2, 0]) works, but shift_fft(input_array, [2.4, 0]) produces artifacts. What am I doing wrong?
For example, consider the 128x128 Lena image. If I shift it by 10.4 pixels in each direction, I get a wobbling modulation of the image.
[Before and after images omitted; the shifted image shows the wobbling artifacts.]
You can try using scipy.ndimage.shift. It shifts pixels similarly to numpy.roll, but also supports fractional shift values via interpolation.
For a colored image, make sure to provide a shift of 0 for the 3rd axis (channels).
import scipy.ndimage
scipy.ndimage.shift(input_array, (2.4, 0))
By default it fills the background with black, but you can adjust the mode to have it wrap around or use a custom fill color.
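As a quick sketch of those options (mode and cval are scipy.ndimage.shift's documented parameters; the array here is just a toy example):
import numpy as np
import scipy.ndimage

img = np.arange(25, dtype=float).reshape(5, 5)

# Fractional shift; out-of-bounds areas are filled with cval (default 0.0, i.e. black)
shifted = scipy.ndimage.shift(img, (2.4, 0), mode='constant', cval=0.0)

# Same shift, but wrapping around the edges, similar to numpy.roll
wrapped = scipy.ndimage.shift(img, (2.4, 0), mode='wrap')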
For some reason I found scipy.ndimage.shift to be very slow, especially for n-dimensional images. For shifting along a specific axis, with integer and non-integer shifts, I created a simple function:
import numpy as np

def shift_img_along_axis(img, axis=0, shift=1, constant_values=0):
    """ Shift array along a specific axis. Each new value is a weighted mean of
    the two original pixels it falls between, weighted by the distance to each.
    NOTE: at the border of the image, where not enough original pixels are
    available, the data is averaged with additional constant_values.
    constant_values: value assigned to pixels with no counterpart in the original img
    RETURNS: shifted image.
    A.Mau. """
    intshift = int(shift)
    remain0 = abs(shift - int(shift))
    remain1 = 1 - remain0  # if shift is an integer: remain1=1 and remain0=0
    npad = int(np.ceil(abs(shift)))  # ceil relative to 0 (0.5 => 1 and -0.5 => -1)
    pad_arg = [(0, 0)] * img.ndim
    pad_arg[axis] = (npad, npad)
    bigger_image = np.pad(img, pad_arg, 'constant', constant_values=constant_values)
    part1 = remain1 * bigger_image.take(np.arange(npad+intshift, npad+intshift+img.shape[axis]), axis)
    if remain0 == 0:
        shifted = part1
    else:
        if shift > 0:
            part0 = remain0 * bigger_image.take(np.arange(npad+intshift+1, npad+intshift+1+img.shape[axis]), axis)
        else:
            part0 = remain0 * bigger_image.take(np.arange(npad+intshift-1, npad+intshift-1+img.shape[axis]), axis)
        shifted = part0 + part1
    return shifted
A quick example:
np.random.seed(1)
img = np.random.uniform(0,10,(3,4)).astype('int')
print( img )
shift = 1.5
shifted = shift_img_along_axis( img, axis=1, shift=shift )
print( shifted )
Printed image:
[[4 7 0 3]
[1 0 1 3]
[3 5 4 6]]
Shifted image:
[[3.5 1.5 1.5 0. ]
[0.5 2. 1.5 0. ]
[4.5 5. 3. 0. ]]
With our shift of 1.5, the first value in the shifted image is the mean of 7 and 0, and so on. If a value is missing from the original image, an additional value of 0 is used.
If you want a result similar to np.roll (where the image wraps around to the other side), you would have to modify it a bit (see the sketch below)!
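For reference, here is a minimal sketch of that roll-like variant (not part of the original function): it linearly interpolates between the two integer np.roll results that bracket the shift, using the same sign convention as shift_img_along_axis above.
import numpy as np

def roll_img_along_axis(img, axis=0, shift=1.0):
    # Fractional np.roll: interpolate between the two bracketing integer rolls
    intshift = int(np.floor(shift))
    frac = shift - intshift  # fractional remainder in [0, 1)
    part1 = np.roll(img, -intshift, axis=axis).astype(float)
    part0 = np.roll(img, -(intshift + 1), axis=axis).astype(float)
    return (1 - frac) * part1 + frac * part0
With the 3x4 example above and shift=1.5, the first two columns match shift_img_along_axis, while the last two wrap around instead of being averaged with zeros.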
I have a set of data like this:
I want to recognize these two kinds of corner shapes. Is there any way? I wrote a snippet, but it doesn't work well.
With it I am trying to find descending corners (from top to bottom); to find ascending corners I "rotate" the dataset and apply the same algorithm used for the descending corners.
...
n_rows = len(matrix)
edge_closing_left_total = 0
j = 0
for i in range(n_rows):
    edge_closing_left = True
    current_value = matrix[i][j]
    if current_value >= 230:
        current_row = i
        start = j + 1
        end = j + 1 + 3
        for k in range(start, end):
            if current_row + 2 < n_rows:
                if (matrix[current_row][k] < 230 and matrix[current_row+1][k] < 230) or \
                   (matrix[current_row+1][k] < 230 and matrix[current_row+2][k] <= 230):
                    current_row += 1
                else:
                    edge_closing_left = False
                    break
        if edge_closing_left:
            edge_closing_left_total += 1
return edge_closing_left_total
Here is the CSV dataset file.
So you need to do template/pattern matching, which can be achieved with correlation. The following code demonstrates how to do it using scipy:
import pandas as pd
from scipy.signal import correlate2d
img = pd.read_csv('matrix.csv', header=None).to_numpy()
norm = img - img.mean() # subtract mean to normalize
edge = norm[26:36, :] # the edge template, adjust if needed
corr = correlate2d(norm, edge, mode='valid')
auto_corr = correlate2d(edge, edge, mode='valid')
corr /= auto_corr # normalize correlation so that 1 means perfect correlation
corr_cutoff = .9 # 1 is pixel-perfect match
print(f'Found edge template {(corr > corr_cutoff).sum()} times in image when taking a similarity cut-off of {corr_cutoff}')
# Found edge template 2 times in image when taking a similarity cut-off of 0.9
The original image (transposed to make it wider than high):
The edge template looks like this:
The correlation map looks like this (note: this is actually only 1 px wide but blown up for visualization):
The thresholded correlation map with cut-off 0.9 (giving 2 results as expected, choose threshold as needed):
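If you also want the positions of the matches (not shown in the answer above), a small follow-up using the corr array from the snippet:
import numpy as np

# Row/column offsets (in the 'valid' correlation frame) of each match
match_positions = np.argwhere(corr > corr_cutoff)
print(match_positions)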
I've been trying to write code that labels a binary matrix, i.e. a function that finds all connected components in an image and assigns a unique label to all points in the same component. The problem is that I found a function, imbinarize(), that creates a binary image, and I want to know how to do this without that function (because I don't know how to implement it myself).
EDIT: I realized that binarizing the image isn't needed, because it is assumed that all images passed as arguments are already binarized. So I changed my code. However, the code is still not working, and I think the problem is in one of the loops, but I can't understand why.
import numpy as np
%matplotlib inline
from matplotlib import pyplot as plt

def connected_components(image):
    M = image * 1
    # write your code here
    (row, column) = M.shape  # shape of the matrix
    # Second step
    L = 2
    # Third step
    q = []
    # Fourth step
    # Look for ones starting at pixel (0, 0), going from left to right and top-down
    for i in np.arange(row):
        for j in np.arange(column):
            if M[i][j] == 1:
                M[i][j] = L
                q.append(M[i-1][j])
                q.append(M[i+1][j])
                q.append(M[i][j-1])
                q.append(M[i][j+1])
                # Fifth step
                while len(q) != 0:  # same as saying 'while q is not empty'
                    if q[0] == 1:
                        M[0] = L
                        q.append(M[i-1][j])
                        q.append(M[i+1][j])
                        q.append(M[i][j-1])
                        q.append(M[i][j+1])
                # Sixth step
                L = L + 1
    # Seventh step: goes back to the beginning of the for-cycle
    return labels
imbinarize() in its simplest form thresholds an image so that any intensity beyond a certain threshold is assigned a binary 1 / True, and a binary 0 / False otherwise. It is actually more sophisticated than this, as it uses some image morphology for noise removal as well as adaptive thresholds to find the optimal value separating foreground and background. As I see this post as mostly about validating the connected-components algorithm you've created, I'm going to assume the basic algorithm is fine and treat the actual thresholding algorithm as out of scope for your needs.
Once you read in the image with matplotlib, it will most likely have three channels, so you'll need to convert it to grayscale first and then threshold. We can make this adaptive based on the number of channels present.
Therefore, let's define a function to threshold the image for us. You'll need to play around with the threshold until you get good results. Also note that plt.imread reads in float32 values, so the threshold is defined over the range [0, 1]. 0.5 is a good start:
def binarize(im, threshold=0.5):
    if len(im.shape) == 3:
        gray = 0.299*im[...,0] + 0.587*im[...,1] + 0.114*im[...,2]
    else:
        gray = im
    return (gray >= threshold).astype(np.uint8)
This checks whether the input image is RGB and, if so, converts it to grayscale using the standard ITU-R BT.601 luma weights (0.299, 0.587, 0.114). Once we have the grayscale image, we simply return a new image where everything meeting the threshold and beyond gets assigned an integer 1 and everything else an integer 0. I've converted the result to an integer type because your connected-components algorithm assumes a 0/1 labelling.
You can then replace your code with:
# First step
Image = plt.imread(image)  # reads the image given as the argument
M = binarize(Image)  # converts the image to a binary matrix
(row, column) = M.shape  # shape of the matrix
Minor Note
In your test code, you supply a test image directly, whereas your actual code performs an imread operation. imread expects a string, so passing the actual array will produce an error. If you want to accommodate both an array and a string, check whether the input is a string or an array:
if type(image) is str:
    Image = plt.imread(image)  # reads the image at the given path
else:
    Image = image
M = binarize(Image)  # converts the image to a binary matrix
(row, column) = M.shape  # shape of the matrix
I can't understand the code starting from # LIST DESTINATION PIXEL INDICES. I don't get why x and y are the way they are: why don't they start from 0,0 instead of DIM//2? From my understanding, this function transforms the image by "inverse" transforming the destination pixel coordinates and copying the corresponding original pixels into the destination positions.
get_mat(*args) returns a 3x3 matrix that applies various transformations; pretend it is a rotation matrix.
Take IMAGE_SIZE[0] as 224.
This is part of Chris Deotte's Kaggle notebook: https://www.kaggle.com/cdeotte/rotation-augmentation-gpu-tpu-0-96
def transform(image, label):
    # input image - one image of size [dim,dim,3], not a batch of [b,dim,dim,3]
    # output - image randomly rotated, sheared, zoomed, and shifted
    DIM = IMAGE_SIZE[0]
    XDIM = DIM % 2  # fix for size 331

    rot = 15. * tf.random.normal([1], dtype='float32')
    shr = 5. * tf.random.normal([1], dtype='float32')
    h_zoom = 1.0 + tf.random.normal([1], dtype='float32') / 10.
    w_zoom = 1.0 + tf.random.normal([1], dtype='float32') / 10.
    h_shift = 16. * tf.random.normal([1], dtype='float32')
    w_shift = 16. * tf.random.normal([1], dtype='float32')

    # GET TRANSFORMATION MATRIX
    m = get_mat(rot, shr, h_zoom, w_zoom, h_shift, w_shift)

    # LIST DESTINATION PIXEL INDICES
    x = tf.repeat(tf.range(DIM//2, -DIM//2, -1), DIM)
    y = tf.tile(tf.range(-DIM//2, DIM//2), [DIM])
    z = tf.ones([DIM*DIM], dtype='int32')
    idx = tf.stack([x, y, z])

    # ROTATE DESTINATION PIXELS ONTO ORIGIN PIXELS
    idx2 = K.dot(m, tf.cast(idx, dtype='float32'))
    idx2 = K.cast(idx2, dtype='int32')
    idx2 = K.clip(idx2, -DIM//2+XDIM+1, DIM//2)

    # FIND ORIGIN PIXEL VALUES
    idx3 = tf.stack([DIM//2-idx2[0,], DIM//2-1+idx2[1,]])
    d = tf.gather_nd(image, tf.transpose(idx3))

    return tf.reshape(d, [DIM, DIM, 3]), label
This looks like an affine transform of an image done the simple way (without multisampling): for each destination pixel, find the corresponding pixel in the source array and take its value.
The destination pixel coordinates are put in a list idx, which is then transformed with matrix m; the results are cast to integers and clipped to the image size.
idx3 is assembled from the reverse-offset components (x, y) of the source coordinates and used to index the original image.
Why aren't they starting from 0,0 instead of DIM//2?
(0,0) is the top-left corner of the image data array. When working with a matrix transform, you usually take the point (0,0) to lie at the center of the image, hence the offset.
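To make the centered grid concrete, here is a minimal numpy sketch of the same x/y construction as in the notebook, using a tiny DIM of 4 instead of 224 so the grids are easy to print:
import numpy as np

DIM = 4  # tiny stand-in for IMAGE_SIZE[0]
x = np.repeat(np.arange(DIM//2, -DIM//2, -1), DIM)  # row offsets: +DIM/2 at the top, decreasing downward
y = np.tile(np.arange(-DIM//2, DIM//2), DIM)        # column offsets: -DIM/2 at the left, increasing rightward

print(x.reshape(DIM, DIM))
# [[ 2  2  2  2]
#  [ 1  1  1  1]
#  [ 0  0  0  0]
#  [-1 -1 -1 -1]]
print(y.reshape(DIM, DIM))
# [[-2 -1  0  1]
#  [-2 -1  0  1]
#  [-2 -1  0  1]
#  [-2 -1  0  1]]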
I have a hybrid image that was created by superimposing the low frequencies of one image with the high frequencies of another. I'm trying to separate (de-hybridize) this image by passing it through a low-pass filter to extract the low frequencies (one of the two images), and then subtracting that from the original image to yield the other image (high frequencies).
**Problem:** When I extract the low frequencies, the values are all higher than in the original image, so when I subtract the low-pass result from the original image, what's left is a bunch of negative values.
Does anyone know why my low pass filter is yielding higher frequency values than the original image?
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from numpy.fft import fft2, ifft2, fftshift, ifftshift

# Make Gaussian filter
def makeGaussianFilter(numRows, numCols, sigma, highPass=True):
    centerI = int(numRows/2) + 1 if numRows % 2 == 1 else int(numRows/2)
    centerJ = int(numCols/2) + 1 if numCols % 2 == 1 else int(numCols/2)
    def gaussian(i, j):
        coefficient = np.exp(-1.0 * ((i - centerI)**2 + (j - centerJ)**2) / (2 * sigma**2))
        return 1 - coefficient if highPass else coefficient
    return np.array([[gaussian(i, j) for j in range(numCols)] for i in range(numRows)])

# Filter discrete Fourier transform
def filterDFT(imageMatrix, filterMatrix):
    shiftedDFT = fftshift(fft2(imageMatrix))
    filteredDFT = shiftedDFT * filterMatrix
    return ifft2(ifftshift(filteredDFT))

# Low-pass filter
def lowPass(imageMatrix, sigma):
    n, m = imageMatrix.shape
    return filterDFT(imageMatrix, makeGaussianFilter(n, m, sigma, highPass=False))

# Read in einsteinandwho.png and convert to a format that plt.imshow can display
im3 = mpimg.imread('einsteinandwho.png')
rows = im3.shape[0]
cols = im3.shape[1]
img3 = np.ones((rows, cols, 4))
for i in range(rows):
    for j in range(cols):
        img3[i][j][0:3] = im3[i][j]
        img3[j][j][3] = 1

# Extract low frequencies and convert to a format that plt.imshow can display
lowPassed = np.real(lowPass(im3, 10))
low = np.ones((rows, cols, 4))
for i in range(rows):
    for j in range(cols):
        low[i][j][0:3] = lowPassed[i][j]
        low[j][j][3] = 1

# Remove low frequencies from image
output = img3[:,:,0:3] - low[:,:,0:3]
Does anyone know why my low pass filter is yielding higher frequency values than the original image?
Notice the difference between pixel values and frequency values: you are seeing higher pixel values, not higher frequency values!
When I run your code I see the high-frequency component having both negative and positive pixel values, not all negative values. It is expected for this image to have a zero mean. The zero frequency component (also called DC component) is the one that sets the mean pixel value. By subtracting a low-pass filtered image, you are setting the zero frequency to 0, and thus setting the mean pixel value to 0 (the low-pass filtered image contains all of the power of the zero frequency).
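Since a zero-mean image has negative pixel values by construction, one common way to display the high-frequency component (a sketch, assuming the img3 and low arrays from the question) is to rescale it into [0, 1] first:
import numpy as np
import matplotlib.pyplot as plt

high = img3[:,:,0:3] - low[:,:,0:3]  # zero-mean high-pass component
display = (high - high.min()) / (high.max() - high.min())  # rescale into [0, 1]
plt.imshow(display)
plt.show()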
I have a large numpy array and labeled it with the connected-component labeling in scipy. Now I want to create subsets of this array, where only the biggest or smallest patches (by number of pixels) are left.
Both extrema can of course occur several times.
import numpy
from scipy import ndimage
....
# Loaded my image file here. Too big to paste.
....
s = ndimage.generate_binary_structure(2, 2)  # iterate structure
labeled_array, numpatches = ndimage.label(array, s)  # labeling
# get the area (nr. of pixels) of each labeled patch
sizes = ndimage.sum(array, labeled_array, range(1, numpatches+1))
# To get the indices of all the min/max patches. Is this the correct label id?
map = numpy.where(sizes == sizes.max())
mip = numpy.where(sizes == sizes.min())
# This here doesn't work! I want to create a copy of the array and fill only the cells
# inside the largest, respectively the smallest, labeled patches with values
feature = numpy.zeros_like(array, dtype=int)
feature[labeled_array == map] = 1
Can someone give me a hint on how to move on?
Here is the full code:
import numpy as np
import matplotlib.pyplot as pl
from scipy import ndimage

array = np.zeros((100, 100), dtype=np.uint8)
x = np.random.randint(0, 100, 2000)
y = np.random.randint(0, 100, 2000)
array[x, y] = 1
pl.imshow(array, cmap="gray", interpolation="nearest")

s = ndimage.generate_binary_structure(2, 2)  # iterate structure
labeled_array, numpatches = ndimage.label(array, s)  # labeling
sizes = ndimage.sum(array, labeled_array, range(1, numpatches+1))

# To get the label ids of all the min/max patches (label 1 has size sizes[0])
map = np.where(sizes == sizes.max())[0] + 1
mip = np.where(sizes == sizes.min())[0] + 1

# Fill only the cells inside the largest, respectively the smallest, labeled patches
max_index = np.zeros(numpatches + 1, np.uint8)
max_index[map] = 1
max_feature = max_index[labeled_array]
min_index = np.zeros(numpatches + 1, np.uint8)
min_index[mip] = 1
min_feature = min_index[labeled_array]
Notes:
- numpy.where returns a tuple.
- The size of label 1 is sizes[0], so you need to add 1 to the result of numpy.where.
- To get a mask array covering multiple labels, you can use labeled_array as the index into a per-label mask array.
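To illustrate that last indexing trick with a toy example (not from the original answer):
import numpy as np

labeled = np.array([[0, 1, 1],
                    [2, 2, 0]])
mask = np.array([0, 1, 0], np.uint8)  # per-label mask: keep only label 1
print(mask[labeled])
# [[0 1 1]
#  [0 0 0]]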
[Result images omitted.]
First you need a labeled mask. Given a mask with only 0 (background) and 1 (foreground):
labeled_mask, cc_num = ndimage.label(mask)
then find the largest connected component:
largest_cc_mask = (labeled_mask == (np.bincount(labeled_mask.flat)[1:].argmax() + 1))
You can deduce how to find the smallest object by using argmin() instead (see the sketch below).
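For completeness, a sketch of that argmin() variant, assuming the labeled_mask from above (the smallest component here means the foreground label with the fewest pixels):
import numpy as np

counts = np.bincount(labeled_mask.flat)[1:]  # pixel counts of labels 1..cc_num
smallest_cc_mask = (labeled_mask == (counts.argmin() + 1))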