I want to compare the similarity between one reference window (patch) and all other windows (patches) taken from an image. My code is given below.
Can anyone please help me evaluate the similarity between 'ref' (the reference window) and all 10000 windows in the variable 'test'? Thank you.
Detailed explanation:
I tried a for loop, but it is time-consuming. I tried the built-in function "ssim", but it says the dimensions of the tensors do not match. Please suggest a method to do this as batch processing.
# Imports needed by this snippet.
import numpy as np
import tensorflow as tf
from PIL import Image

# Read grayscale image from file.
Im = Image.open("cameraman.png")
# Resize it to the desired shape (h, w).
Im = Im.resize((100, 100))
# Expand dimensions to get the shape [batch, height, width, channel].
Im = np.expand_dims(Im, axis=0)
Im = np.expand_dims(Im, axis=0)
x = tf.convert_to_tensor(Im)
x = tf.reshape(x, [1, 100, 100, 1])  # the required image shape as a tensor
# Break one image into windows of 11x11 (overlapping)
wsize=11
ws=50 # Index of centre window (this window is reference window)
#Extract windows of 11 x 11 around each pixel
p1=tf.extract_image_patches(x,sizes=[1,wsize,wsize,1],strides=[1,1,1,1],rates=[1,1,1,1],padding="SAME")
patches_shape = tf.shape(p1)
test = tf.reshape(p1, [tf.reduce_prod(patches_shape[0:3]), 11, 11])  # [num_patches, h, w]
print(test.shape)  # test has shape [10000, 11, 11]
ref = test[5000]  # the reference window, shape [11, 11]
ref = tf.reshape(ref, [1, 11, 11])
print(ref.shape)
The following statement fails with a size mismatch:
ssim1 = tf.image.ssim(ref, test, max_val=255, filter_size=11, filter_sigma=1.5, k1=0.01, k2=0.03)
**ValueError: Shapes (1, 11, 11) and (10000, 11, 11) are incompatible.**
I expect the distance between each of these windows and the reference to be printed.
You need to align the first dimension. You can either iterate over your batch of 10000 patches or broadcast your reference patch. From a performance perspective, it is recommended to iterate over them using tf.map_fn().
Furthermore, you need to expand the last dimension, because tf.image.ssim expects a third-order tensor.
Here is a working example, tested with TF 2.0 and eager execution:
arr1 = tf.convert_to_tensor(np.random.random([10000, 11, 11, 1]), dtype=tf.dtypes.float32)
arr2 = tf.convert_to_tensor(np.random.random([1, 11, 11, 1]), dtype=tf.dtypes.float32)
result_tensor = tf.map_fn(lambda x: tf.image.ssim(arr2[0], x, max_val=1.0), arr1)
The result tensor has the shape [10000]; each entry is the SSIM between the reference patch and one test patch. To get the mean, call tf.reduce_mean.
However, please reconsider the 11x11 filter size for an 11x11 patch, and provide a runnable example next time.
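For completeness, here is how the same map_fn approach could be wired to the 'test' and 'ref' tensors from the question. This is only a sketch, assuming the patches hold pixel values in [0, 255] and are cast to float:

# Sketch: apply the reference patch against every extracted window.
test4d = tf.expand_dims(tf.cast(test, tf.float32), axis=-1)  # [10000, 11, 11, 1]
ref3d = tf.reshape(tf.cast(ref, tf.float32), [11, 11, 1])    # [11, 11, 1]
ssim_all = tf.map_fn(
    lambda patch: tf.image.ssim(ref3d, patch, max_val=255.0),
    test4d)  # shape [10000]: SSIM of the reference against each window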
I have a tensor T of shape (8, 5, 300), where 8 is the batch size, 5 is the number of documents in each batch, and 300 is the encoding dimension of each document. If I reshape the tensor as follows, do the properties of my tensor remain the same?
T = T.reshape(5, 300, 8)
T.shape
>> torch.Size([5, 300, 8])
So, does this new tensor have the same properties as the original one? By the properties, I mean: can I still say that this is a tensor with batch size 8, with 5 documents per batch, and a 300-dimensional encoding for each document?
Does this affect the training of the model? If reshaping the tensor messes up the data points, then there is no point in training. For example, if the reshape above produced a batch of 5 samples, with 300 documents of size 8 each, it would be useless, since I have neither 300 documents nor batches of 5 samples.
I need to reshape it like this because my model produces an intermediate output of shape [8, 5, 300], and the next layer accepts input of shape [5, 300, 8].
NO
You need to understand the difference between reshape/view and permute.
reshape and view only change the "shape" of the tensor, without re-ordering the elements. Therefore:
orig = torch.rand((8, 5, 300))
resh = orig.reshape(5, 300, 8)
orig[0, 0, :] != resh[0, :, 0]
If you want to change the order of the elements as well, you need to permute it:
perm = orig.permute(1, 2, 0)
orig[0, 0, :] == perm[0, :, 0]
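A quick runnable check with random data (a minimal sketch) makes the difference concrete:

import torch

orig = torch.rand((8, 5, 300))

# reshape keeps the flat memory order, so the slices no longer correspond
resh = orig.reshape(5, 300, 8)
print(torch.equal(orig[0, 0, :], resh[0, :, 0]))  # False

# permute re-orders the axes, so the corresponding slices match exactly
perm = orig.permute(1, 2, 0)
print(torch.equal(orig[0, 0, :], perm[0, :, 0]))  # True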
NOOO!
I made a similar mistake once.
Imagine converting a 2-D tensor (a matrix) into a 1-D tensor (an array) and then applying a transform to it. This would create serious issues in your code, because the new tensor has the characteristics of an array, not of the original matrix.
I hope you get my point.
I am analyzing some image datasets using Keras. I am stuck because I have two different dimensions of images. Please see the snapshot: features has 14637 images of dimension (10, 10, 3), and features2 has dimension (10, 10, 100).
Is there any way that I can merge/concatenate these two datasets?
If features and features2 contain the features of the same batch of images, that is, features[i] corresponds to the same image as features2[i] for each i, then it makes sense to group the features into a single array using the numpy function concatenate():
newArray = np.concatenate((features, features2), axis=3)
Where 3 is the axis along which the arrays will be concatenated. In this case, you'll end up with a new array having dimension (14637, 10, 10, 103).
However, if they refer to completely different batches of images and you would like to merge them along the first axis, so that the 14637 images of features2 come after the 14637 images of features, then there is no way to end up with a single array, since numpy arrays are structured as matrices, not as lists of arbitrary objects.
For instance, if you try to execute:
> a = np.array([[0, 1, 2]])  # shape = (1, 3)
> b = np.array([[0, 1]])     # shape = (1, 2)
> c = np.concatenate((a, b), axis=0)
Then, you'll get:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
since you are concatenating along axis = 0 but axis 1's dimensions differ.
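As a quick sanity check of the working case, here is a minimal sketch with random stand-ins for the two arrays from the question:

import numpy as np

features = np.random.rand(14637, 10, 10, 3)     # stand-in for the first dataset
features2 = np.random.rand(14637, 10, 10, 100)  # stand-in for the second

newArray = np.concatenate((features, features2), axis=3)
print(newArray.shape)  # (14637, 10, 10, 103)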
If dealing with numpy arrays, you should be able to use concatenate method and specify the axis, along which the data should be merged. Basically: np.concatenate((array_a, array_b), axis=2)
I think it would be better if you use a class to keep the two feature sets together, one instance per image:
class YourClass:
    def __init__(self, array_1, array_2):
        self.array_1 = array_1
        self.array_2 = array_2

final_array = []
for x in range(len(your_previous_one_array)):
    final_array.append(YourClass(your_previous_one_array[x],
                                 your_previous_two_array[x]))
I want to make a dynamic loss function in TensorFlow. I want to calculate the energy of a signal's FFT, more specifically only a window of size 3 around the most dominant peak. I am unable to implement it in TF, as it throws errors about strides, e.g.: InvalidArgumentError (see above for traceback): Expected begin, end, and strides to be 1D equal size tensors, but got shapes [1,64], [1,64], and [1] instead.
My code is this:
self.spec = tf.fft(self.signal)
self.spec_mag = tf.complex_abs(self.spec[:,1:33])
self.argm = tf.cast(tf.argmax(self.spec_mag, 1), dtype=tf.int32)
self.frac = tf.reduce_sum(self.spec_mag[self.argm-1:self.argm+2], 1)
Since I am computing in batches of 64 and the data dimension is also 64, the shape of self.signal is (64, 64). I wish to calculate only the AC components of the FFT. As the signal is real-valued, half of the spectrum does the job. Hence, the shape of self.spec_mag is (64, 32).
The max of this FFT magnitude is located at self.argm, which has shape (64,).
Now I want to calculate the energy of 3 elements around the max peak via: self.spec_mag[self.argm-1:self.argm+2].
However when I run the code and try to obtain the value of self.frac, I get thrown with multiple errors.
It seems like you were missing an index when accessing argm. Here is the fixed version for the (1, 64) case:
import tensorflow as tf
import numpy as np

sess = tf.InteractiveSession()  # needed so that .eval() below has a default session
x = np.random.rand(1, 64)
xt = tf.constant(value=x, dtype=tf.complex64)
signal = xt
print('signal', signal.shape)
print('signal', signal.eval())
spec = tf.fft(signal)
print('spec', spec.shape)
print('spec', spec.eval())
spec_mag = tf.abs(spec[:,1:33])
print('spec_mag', spec_mag.shape)
print('spec_mag', spec_mag.eval())
argm = tf.cast(tf.argmax(spec_mag, 1), dtype=tf.int32)
print('argm', argm.shape)
print('argm', argm.eval())
frac = tf.reduce_sum(spec_mag[0][(argm[0]-1):(argm[0]+2)], 0)
print('frac', frac.shape)
print('frac', frac.eval())
And here is the expanded version for shape (batch, m, n):
import tensorflow as tf
import numpy as np

sess = tf.InteractiveSession()  # needed so that .eval() below has a default session
x = np.random.rand(1, 1, 64)
xt = tf.constant(value=x, dtype=tf.complex64)
signal = xt
print('signal', signal.shape)
print('signal', signal.eval())
spec = tf.fft(signal)
print('spec', spec.shape)
print('spec', spec.eval())
spec_mag = tf.abs(spec[:, :, 1:33])
print('spec_mag', spec_mag.shape)
print('spec_mag', spec_mag.eval())
argm = tf.cast(tf.argmax(spec_mag, 2), dtype=tf.int32)
print('argm', argm.shape)
print('argm', argm.eval())
frac = tf.reduce_sum(spec_mag[0][0][(argm[0][0]-1):(argm[0][0]+2)], 0)
print('frac', frac.shape)
print('frac', frac.eval())
You may need to fix the function names, since I edited this code on a newer version of TensorFlow.
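For reference, a sketch of the same first steps with the TF 2.x names (the FFT ops moved into tf.signal, and tf.abs now handles complex inputs):

import tensorflow as tf  # TF 2.x

x = tf.complex(tf.random.normal([1, 64]), tf.zeros([1, 64]))
spec = tf.signal.fft(x)  # was tf.fft in TF 1.x
spec_mag = tf.abs(spec)  # was tf.complex_abs in early TF versions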
TensorFlow indexing uses tf.Tensor.__getitem__:
This operation extracts the specified region from the tensor. The notation is similar to NumPy with the restriction that currently only support basic indexing. That means that using a tensor as input is not currently allowed
So using tf.slice and tf.strided_slice is out of the question as well.
Whereas in tf.gather indices defines slices into the first dimension of Tensor, in tf.gather_nd, indices defines slices into the first N dimensions of the Tensor, where N = indices.shape[-1]
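A tiny illustration of tf.gather_nd with made-up values (a sketch):

import tensorflow as tf

params = tf.constant([[10, 11, 12],
                      [20, 21, 22]])
# Each row of 'indices' is a full (row, col) coordinate into 'params'.
indices = tf.constant([[0, 2], [1, 0]])
picked = tf.gather_nd(params, indices)  # -> [12, 20]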
Since you wanted the 3 values around the max, I manually extract the first, second and third element using a list comprehension, followed by a tf.stack. Note that the gather has to read from spec_mag (not spec), since argm indexes into the cropped magnitude spectrum:
import tensorflow as tf
signal = tf.placeholder(shape=(64, 64), dtype=tf.complex64)
spec = tf.fft(signal)
spec_mag = tf.abs(spec[:,1:33])
argm = tf.cast(tf.argmax(spec_mag, 1), dtype=tf.int32)
# For each offset i, pick spec_mag[row, argm[row] + i] for every row.
frac = tf.stack([tf.gather_nd(spec_mag, tf.transpose(tf.stack(
    [tf.range(64), argm + i]))) for i in [-1, 0, 1]])  # shape (3, 64)
frac = tf.reduce_sum(frac, 0)  # 3-bin energy per sample, shape (64,)
This will fail for the corner case where argm is the first or last element in the row, but it is easy to resolve, e.g. by clamping argm as sketched below.
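One possible way to handle that corner case (an untested sketch): clamp argm so that the offsets never leave the valid column range [0, 31] of spec_mag:

# Keep argm in [1, 30] so that argm-1 >= 0 and argm+1 <= 31.
argm_safe = tf.clip_by_value(argm, 1, 30)
frac = tf.stack([tf.gather_nd(spec_mag, tf.transpose(tf.stack(
    [tf.range(64), argm_safe + i]))) for i in [-1, 0, 1]])
frac = tf.reduce_sum(frac, 0)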
I am trying to teach myself to build a CNN that takes more than one image as an input. Since the dataset I created to test this is large and in the long run I hope to solve a problem involving a very large dataset, I am using a generator to read images into arrays which I am passing to Keras Model's fit_generator function.
When I run my generator in isolation it works fine, and produces outputs of the appropriate shape. It yields a tuple containing two entries, the first of which has shape (4, 100, 100, 1) and the second of which has shape (4, ).
Reading about multiple input Keras CNNs has given me the impression that this is the right format for a generator for a 4 input CNN that is identifying which of the 4 inputs contains an image.
However, when I run the code I get:
"ValueError: Error when checking input: expected input_121 to have 4 dimensions, but got array with shape (100, 100, 1)"
I've been searching for a solution for some time now and I suspect that the problem lies in getting my (100, 100, 1) shape arrays to be sent to the Inputs as (None, 100, 100, 1) shape arrays.
But when I tried to modify the output of my generator, I got an error about having dimension 5. That error makes sense, because the output of the generator should have the form X, y = [X1, X2, X3, X4], [a, b, c, d], where each Xn has shape (100, 100, 1) and a, b, c, d are numbers.
Here is the code:
https://gist.github.com/anonymous/d283494aee982fbc30f3b52f2a6f422c
Thanks in advance!
You are creating a list of arrays with the wrong dimensions in your generator.
If you reshape each individual image to the 4 dimensions (n_samples, x_size, y_size, n_bands), your model will work. In your case, reshape each image to (1, 100, 100, 1).
At the end, stack them with np.vstack. The generator will then yield an array of shape (4, 100, 100, 1).
Check if this adapted code works
import os
import numpy as np
from keras.preprocessing.image import load_img, img_to_array

def input_generator(folder, directories):
    Streams = []
    for i in range(len(directories)):
        Streams.append(os.listdir(folder + "/" + directories[i]))
        for j in range(len(Streams[i])):
            Streams[i][j] = "Stream" + str(i + 1) + "/" + Streams[i][j]
        Streams[i].sort()
    length = len(Streams[0])
    index = 0
    while True:
        X = []
        y = np.zeros(4)
        for Stream in Streams:
            image = load_img(folder + '/' + Stream[index], grayscale=True)
            array = img_to_array(image).reshape((1, 100, 100, 1))
            X.append(array)
            # character 15 of the path is assumed to encode the label digit
            y[int(Stream[index][15]) - 1] = 1
        index += 1
        index = index % length
        yield np.vstack(X), y
I have a three dimensional numpy array of images (CIFAR-10 dataset). The image array shape is like below:
a = np.random.rand(32, 32, 3)
Before I do any deep learning, I want to normalize the data to get better results. With a 1D array, I know we can do min-max normalization like this:
v = np.random.rand(6)
(v - v.min())/(v.max() - v.min())
Out[68]:
array([ 0.89502294, 0. , 1. , 0.65069468, 0.63657915,
0.08932196])
However, when it comes to a 3D array, I am totally lost. Specifically, I have the following questions:
Along which axis do we take the min and max?
How do we implement this with the 3D array?
I appreciate your help!
EDIT:
It turns out I need to work with a 4D Numpy array with shape (202, 32, 32, 3), so the first dimension would be the index for the image, and the last 3 dimensions are the actual image. It'll be great if someone can provide me with the code to normalize such a 4D array. Thanks!
EDIT 2:
Thanks to @Eric's code below, I've figured it out:
x_min = x.min(axis=(1, 2), keepdims=True)  # per-image, per-channel min, shape (202, 1, 1, 3)
x_max = x.max(axis=(1, 2), keepdims=True)
x = (x - x_min)/(x_max - x_min)
Assuming you're working with image data of shape (W, H, 3), you should probably normalize over each channel (axis=2) separately, as mentioned in the other answer.
You can do this with:
# keepdims makes the result shape (1, 1, 3) instead of (3,). This doesn't matter here, but
# would matter if you wanted to normalize over a different axis.
v_min = v.min(axis=(0, 1), keepdims=True)
v_max = v.max(axis=(0, 1), keepdims=True)
(v - v_min)/(v_max - v_min)
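For example, with a random stand-in image you can verify that every channel ends up spanning [0, 1]:

import numpy as np

v = np.random.rand(32, 32, 3)  # hypothetical (W, H, 3) image
v_min = v.min(axis=(0, 1), keepdims=True)  # shape (1, 1, 3)
v_max = v.max(axis=(0, 1), keepdims=True)
v_norm = (v - v_min) / (v_max - v_min)
print(v_norm.min(axis=(0, 1)), v_norm.max(axis=(0, 1)))  # [0. 0. 0.] [1. 1. 1.]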
Along which axis do we take the min and max?
To answer this we probably need more information about your data, but in general, when discussing 3-channel images for example, we would normalize using the per-channel min and max. This means that we would perform the normalization 3 times - once per channel.
Here's an example:
import numpy

img = numpy.random.randint(0, 100, size=(10, 10, 3))  # generating some random numbers
img = img.astype(numpy.float32)  # converting the array of ints to floats
img_a = img[:, :, 0]
img_b = img[:, :, 1]
img_c = img[:, :, 2] # Extracting single channels from 3 channel image
# The above code could also be replaced with cv2.split(img), which returns 3 numpy arrays (using OpenCV)
# normalizing per channel data:
img_a = (img_a - numpy.min(img_a)) / (numpy.max(img_a) - numpy.min(img_a))
img_b = (img_b - numpy.min(img_b)) / (numpy.max(img_b) - numpy.min(img_b))
img_c = (img_c - numpy.min(img_c)) / (numpy.max(img_c) - numpy.min(img_c))
# putting the 3 channels back together:
img_norm = numpy.empty((10, 10, 3), dtype=numpy.float32)
img_norm[:, :, 0] = img_a
img_norm[:, :, 1] = img_b
img_norm[:, :, 2] = img_c
Edit: It just occurred to me that once you have single-channel data (a 32x32 image, for instance) you could also use sklearn. Note, however, that sklearn.preprocessing.normalize performs row-wise L2 normalization by default, which is not the same as the min-max scaling above:
from sklearn.preprocessing import normalize
img_a_norm = normalize(img_a)
How do we work with the 3D array?
Well, this is a bit of a big question. If you need functions like array-wise min and max, I would use the NumPy versions. Single channels, for instance, are extracted by slicing along the last axis, as you can see in my example above.
Also, please refer to NumPy's documentation of ndarray (https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html)
to learn more. It really is an amazing set of tools for n-dimensional arrays.
There are different approaches here. You can either decide to normalize over the whole batch of images or normalize per single image. To do that you can either use the mean of a single image or use the mean of the whole batch of images or use a fixed mean from another dataset - e.g. you can use the ImageNet mean value.
If you want to do the same as Tensorflow's tf.image.per_image_standardization you should normalize per single image with the mean of this image. So you loop through all images and do the normalization for all axes in a single image like this:
import math
import numpy as np
from PIL import Image

# open images
image_1 = Image.open("your_image_1.jpg")
image_2 = Image.open("your_image_2.jpg")
images = [image_1, image_2]
images = np.array(images)

standardized_images = []

# standardize images
for image in images:
    mean = image.mean()
    stddev = image.std()
    adjusted_stddev = max(stddev, 1.0 / math.sqrt(image.size))
    standardized_image = (image - mean) / adjusted_stddev
    standardized_images.append(standardized_image)

standardized_images = np.array(standardized_images)
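The loop can also be avoided entirely. Here is a vectorized sketch, assuming the images are already stacked into a single float array (the shape below is just an example):

import math
import numpy as np

images = np.random.rand(2, 100, 100, 3)  # hypothetical stacked batch

axes = tuple(range(1, images.ndim))  # every axis except the batch axis
mean = images.mean(axis=axes, keepdims=True)
stddev = images.std(axis=axes, keepdims=True)
n_elements = np.prod(images.shape[1:])  # pixels per image
adjusted_stddev = np.maximum(stddev, 1.0 / math.sqrt(n_elements))
standardized_images = (images - mean) / adjusted_stddev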