I am trying to teach myself to build a CNN that takes more than one image as an input. Since the dataset I created to test this is large and in the long run I hope to solve a problem involving a very large dataset, I am using a generator to read images into arrays which I am passing to Keras Model's fit_generator function.
When I run my generator in isolation it works fine, and produces outputs of the appropriate shape. It yields a tuple containing two entries, the first of which has shape (4, 100, 100, 1) and the second of which has shape (4, ).
Reading about multiple input Keras CNNs has given me the impression that this is the right format for a generator for a 4 input CNN that is identifying which of the 4 inputs contains an image.
However, when I run the code I get:
"ValueError: Error when checking input: expected input_121 to have 4 dimensions, but got array with shape (100, 100, 1)"
I've been searching for a solution for some time now, and I suspect the problem lies in getting my (100, 100, 1)-shaped arrays sent to the Input layers as (None, 100, 100, 1)-shaped batches.
But when I tried modifying the output of my generator, I got an error about having dimension 5, which makes sense, because the output of the generator should have the form X, y = [X1, X2, X3, X4], [a, b, c, d], where each Xn has shape (100, 100, 1) and a/b/c/d are numbers.
Here is the code:
https://gist.github.com/anonymous/d283494aee982fbc30f3b52f2a6f422c
Thanks in advance!
You are creating a list of arrays in your generator with the wrong dimensions.
Reshape each individual image to four dimensions, (n_samples, x_size, y_size, n_bands), and your model will work. In your case, reshape each image to (1, 100, 100, 1).
At the end, stack them with np.vstack; the generator will then yield an array of shape (4, 100, 100, 1).
Check if this adapted code works:
import os
import numpy as np
from keras.preprocessing.image import load_img, img_to_array

def input_generator(folder, directories):
    # Build one sorted list of file paths per stream directory.
    Streams = []
    for i in range(len(directories)):
        Streams.append(os.listdir(folder + "/" + directories[i]))
        for j in range(len(Streams[i])):
            Streams[i][j] = "Stream" + str(i + 1) + "/" + Streams[i][j]
        Streams[i].sort()
    length = len(Streams[0])
    index = 0
    while True:
        X = []
        y = np.zeros(4)
        for Stream in Streams:
            image = load_img(folder + '/' + Stream[index], grayscale=True)
            # Give each (100, 100, 1) image an explicit batch dimension.
            array = img_to_array(image).reshape((1, 100, 100, 1))
            X.append(array)
            # Character 15 of the file name encodes which stream holds the target.
            y[int(Stream[index][15]) - 1] = 1
        index = (index + 1) % length
        # Stack the four (1, 100, 100, 1) arrays into one (4, 100, 100, 1) array.
        yield np.vstack(X), y
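For completeness, a hypothetical way to wire this generator into training; the folder name, stream directory names, and steps_per_epoch below are placeholders, and the model itself comes from the linked gist:

# Hypothetical usage; all names and counts here are placeholder assumptions.
gen = input_generator("data", ["Stream1", "Stream2", "Stream3", "Stream4"])
model.fit_generator(gen, steps_per_epoch=1000, epochs=10)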
I want to compare the similarity between one reference window (patch) and all other windows (patches) taken from an image. My code is given below.
Can anyone please help me evaluate the similarity between 'ref' (the reference window) and all other 10000 windows given by the variable 'test'? Thank you.
Detailed explanation:
I tried using a for loop, but it is time-consuming. I also tried the built-in function tf.image.ssim, but it says the dimensions of the tensors do not match. Please suggest a method for doing this as batch processing.
from PIL import Image
import numpy as np
import tensorflow as tf

# Read grayscale image from file.
Im = Image.open("cameraman.png")
# Resize it to the desired shape (h, w).
Im = Im.resize((100, 100))
# Expand dimensions to get the shape [no. of batches, height, width, channel].
Im = np.expand_dims(Im, axis=0)
Im = np.expand_dims(Im, axis=0)
x = tf.convert_to_tensor(Im)
x = tf.reshape(x, [1, 100, 100, 1])  # this is the required image shape in a tensor

# Break one image into overlapping windows of 11x11.
wsize = 11
ws = 50  # index of the centre window (this window is the reference window)

# Extract windows of 11 x 11 around each pixel.
p1 = tf.extract_image_patches(x, sizes=[1, wsize, wsize, 1], strides=[1, 1, 1, 1], rates=[1, 1, 1, 1], padding="SAME")
patches_shape = tf.shape(p1)
test = tf.reshape(p1, [tf.reduce_prod(patches_shape[0:3]), 11, 11])  # returns [#window_patches, h, w]
print(test.shape)  # test has shape [10000, 11, 11]

ref = test[5000]  # this is the reference window, of shape [11, 11]
ref = tf.reshape(ref, [1, 11, 11])
print(ref.shape)
The following statement fails with a size mismatch:
ssim1 = tf.image.ssim(ref, test, max_val=255, filter_size=11,filter_sigma=1.5, k1=0.01, k2=0.03)
ValueError: Shapes (1, 11, 11) and (10000, 11, 11) are incompatible.
I expect the distance between each of these windows and the reference to be printed.
You need to align the first dimensions. You can either iterate over your batch of 10000 patches or broadcast your reference patch. From a performance perspective, iterating over them with tf.map_fn() is recommended.
Furthermore, you need to expand the last dimension, because tf.image.ssim expects at least a third-order tensor (height, width, channels).
Here is a working example tested with tf 2.0 and eager execution:
import numpy as np
import tensorflow as tf

arr1 = tf.convert_to_tensor(np.random.random([10000, 11, 11, 1]), dtype=tf.dtypes.float32)
arr2 = tf.convert_to_tensor(np.random.random([1, 11, 11, 1]), dtype=tf.dtypes.float32)
result_tensor = tf.map_fn(lambda x: tf.image.ssim(arr2[0], x, 1), arr1)  # arr2[0] has shape [11, 11, 1]
The result tensor has shape [10000], one SSIM value per patch. To get the mean, call tf.reduce_mean.
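For instance:

mean_ssim = tf.reduce_mean(result_tensor)  # scalar mean SSIM over all 10000 patches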
However, please reconsider the 11x11 filter size for an 11x11 patch (it leaves exactly one filter window per patch), and please provide a working example next time.
I am trying to combine three separate component arrays, Y, U, and V, into a single YUV array. The input arrays are 64x64, and the result will then be resized to 224x224. How should I do this correctly?
This is for my input data before training a CNN. The following is how I do it now:
orgY = np.loadtxt(path)
orgU = np.loadtxt(path)
orgV = np.loadtxt(path)
originalYUV = np.dstack((orgY, orgU, orgV))
originalYUV = array(originalYUV).reshape(1, 224, 224, 3)
When I run it, I get: ValueError: cannot reshape array of size 1 into shape (1,224,224,3). How should I fix it? I am afraid the code already goes wrong at the np.dstack step, as I cannot even call width = originalYUV.shape[0].
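A minimal sketch of one possible fix, assuming each np.loadtxt call returns a (64, 64) array (the file paths below are placeholders): reshape cannot turn 64x64 pixels into 224x224, so the stacked image has to be resampled, for example with cv2.resize, before the batch dimension is added.

import numpy as np
import cv2  # assumption: OpenCV is available for the resize step

orgY = np.loadtxt("y.txt")  # placeholder paths; each expected to be (64, 64)
orgU = np.loadtxt("u.txt")
orgV = np.loadtxt("v.txt")
originalYUV = np.dstack((orgY, orgU, orgV))        # (64, 64, 3)
originalYUV = cv2.resize(originalYUV, (224, 224))  # (224, 224, 3), actual resampling
originalYUV = originalYUV.reshape(1, 224, 224, 3)  # add the batch dimension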
I am trying to extract features from .wav files by using MFCC's of the sound files. I am getting an error when I try to convert my list of MFCC's to a numpy array. I am quite sure that this error is occurring because the list contains MFCC values with different shapes (But am unsure of how to solve the issue).
I have looked at two other Stack Overflow posts; however, they don't solve my problem because they are too specific to a particular task:
ValueError: could not broadcast input array from shape (128,128,3) into shape (128,128)
Value Error: could not broadcast input array from shape (857,3) into shape (857)
Full Error Message:
Traceback (most recent call last):
  File "/..../.../...../Batch_MFCC_Data.py", line 68, in <module>
    X = np.array(MFCCs)
ValueError: could not broadcast input array from shape (20,590) into shape (20)
Code Example:
import glob
import numpy as np
from sklearn.preprocessing import OneHotEncoder

all_wav_paths = glob.glob('directory_of_wav_files/**/*.wav', recursive=True)
np.random.shuffle(all_wav_paths)

MFCCs = []   # list to hold all MFCCs
labels = []  # list to hold all labels

for i, wav_path in enumerate(all_wav_paths):
    individual_MFCC = MFCC_from_wav(wav_path)
    # MFCC_from_wav() -> returns the MFCC coefficients
    label = get_class(wav_path)
    # get_class() -> returns the label of the wav file, either 0 or 1
    # add features and label to the lists
    MFCCs.append(individual_MFCC)
    labels.append(label)

# Must convert the training data to a numpy array for
# train_test_split and saving to the local drive
X = np.array(MFCCs)  # THIS LINE CRASHES WITH THE ABOVE ERROR

# binary encode labels (OneHotEncoder expects a 2-D column of labels)
onehot_encoder = OneHotEncoder(sparse=False)
Y = onehot_encoder.fit_transform(np.array(labels).reshape(-1, 1))

# create train/test data
from sklearn.model_selection import train_test_split  # cross_validation was removed in newer scikit-learn
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# saving data to local drive
np.save("LABEL_SAVE_PATH", Y)
np.save("TRAINING_DATA_SAVE_PATH", X)
Here is a snapshot of the shapes of the MFCCs (from the .wav files) in the MFCCs list. It contains arrays with the following shapes:
...More above...
(20, 423) #shape of returned MFCC from one of the .wav files
(20, 457)
(20, 1757)
(20, 345)
(20, 835)
(20, 345)
(20, 687)
(20, 774)
(20, 597)
(20, 719)
(20, 1195)
(20, 433)
(20, 728)
(20, 939)
(20, 345)
(20, 1112)
(20, 345)
(20, 591)
(20, 936)
(20, 1161)
....More below....
As you can see, the MFCCs in the MFCCs list don't all have the same shape, because the recordings don't all have the same duration. Is this why I can't convert the list to a numpy array? If so, how do I fix it so that every MFCC has the same shape?
Any code snippets for accomplishing this and advice would be greatly appreciated!
Thanks!
Use the following logic to downsample the arrays to min_shape, i.e. truncate the larger arrays to min_shape:
min_shape = (20, 345)
MFCCs = [arr1, arr2, arr3, ...]

for idx, arr in enumerate(MFCCs):
    MFCCs[idx] = arr[:, :min_shape[1]]  # keep only the first min_shape[1] columns

batch_arr = np.array(MFCCs)
You can then stack these arrays into a batch array, as in the minimal example below:
In [33]: a1 = np.random.randn(2, 3)
In [34]: a2 = np.random.randn(2, 5)
In [35]: a3 = np.random.randn(2, 10)
In [36]: MFCCs = [a1, a2, a3]
In [37]: min_shape = (2, 2)
In [38]: for idx, arr in enumerate(MFCCs):
...: MFCCs[idx] = arr[:, :min_shape[1]]
...:
In [42]: batch_arr = np.array(MFCCs)
In [43]: batch_arr.shape
Out[43]: (3, 2, 2)
Now for the second strategy, to upsample the smaller arrays to max_shape, follow similar logic, but fill the missing values with either zeros or nan values, as you prefer (see the sketch below).
Then, again, you can stack the arrays into a batch array of shape (num_arrays, dim1, dim2); so, in your case, the shape would be (num_wav_files, 20, max_column).
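A minimal sketch of that padding strategy, assuming zero-filling and toy stand-in arrays; np.pad fills the missing columns on the right:

import numpy as np

MFCCs = [np.random.randn(20, 345), np.random.randn(20, 457)]  # toy stand-ins
max_cols = max(arr.shape[1] for arr in MFCCs)  # the widest array sets the target width

for idx, arr in enumerate(MFCCs):
    pad_width = ((0, 0), (0, max_cols - arr.shape[1]))  # pad columns only
    MFCCs[idx] = np.pad(arr, pad_width, mode='constant', constant_values=0)

batch_arr = np.array(MFCCs)  # shape: (num_arrays, 20, max_cols)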
I already have an array with shape (1, 224, 224), a single channel image. I want to change that to (1, 1, 224, 224). I have been trying
newarr.shape
#(1,224,224)
arr = np.array([])
np.append(arr, newarr, 1)
I always get this:
IndexError: axis 1 out of bounds [0, 1). If I leave the axis out, the array gets flattened. What am I doing wrong?
A dimension of 1 is arbitrary, so it sounds like you simply want to reshape the array. This can be accomplished by:
newarr.shape = (1, 1, 224, 224)
or
newarr = newarr[None]
The only way to do an insert into a higher dimensional array is
bigger_arr = np.zeros((1, 1, 224, 224))
bigger_arr[0,...] = arr
In other words, make a target array of the right size, and assign values.
np.append is a booby trap. Avoid it.
Occasionally that's a useful way of thinking of this. But it's simpler, and quicker, to think of this as a reshape problem.
bigger_arr = arr.reshape(1, 1, 224, 224)
bigger_arr = arr[np.newaxis, ...]
arr.shape = (1, 1, 224, 224)  # a picky in-place change
bigger_arr = np.expand_dims(arr, 0)
This last one effectively does
a.reshape(a.shape[:axis] + (1,) + a.shape[axis:])
which gives an idea of how to deal with dimensions programmatically.
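A quick toy illustration of that programmatic form:

import numpy as np

arr = np.zeros((1, 224, 224))
axis = 0  # position at which to insert the new length-1 dimension
bigger_arr = arr.reshape(arr.shape[:axis] + (1,) + arr.shape[axis:])
print(bigger_arr.shape)  # (1, 1, 224, 224)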
I'm attempting to fill an h5py dataset with a series of numpy arrays that I generate in sequence so my memory can handle it.
The h5py array is initialised so that the first dimension can have any magnitude,
f.create_dataset('x-data', (1, maxlen, 50), maxshape=(None, maxlen, 50))
After generating each numpy array X, I am using
f['x-data'][alen:alen + len(data),:,:] = X
where, for example, for the first array, alen=0 and len(data)=10056. I then increment alen so the next array starts where the last one ended.
print f['x-data'][alen:alen + len(data),:,:].shape, alen, len(data)
(1L, 60L, 50L) 0 10056
Does anyone know why the 0:10056 slice is coming back with a first dimension of only 1?
I replicated your example, but on a much smaller scale. I had to do a resize each time I added elements, e.g.
f['x-data'].resize(50, axis=0)
The first time I tried to add a block I got an error:
TypeError: Can't broadcast (10, 20, 10) -> (1, 20, 10)
But subsequent times, when I'd outgrown the allocated space, it failed silently. No error, it just didn't end up storing the new values.
This is with h5py version 2.2.1.
I found the answer from a helpful person on the user group.
The maxshape=(None, ...) feature does not mean that the dataset resizes automatically: it must be resized explicitly each time new input is added. So the first dimension must be increased before adding new data:
x.resize((x.shape[0] + X.shape[0], X.shape[1], X.shape[2]))
y.resize((y.shape[0] + Y.shape[0], Y.shape[1], Y.shape[2]))
The dataset then adds the values correctly.
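Putting it together, a minimal sketch of the resize-then-assign pattern; the file name, maxlen value, and chunk source below are placeholder assumptions:

import numpy as np
import h5py

maxlen = 60  # placeholder for the real sequence length

with h5py.File("data.h5", "w") as f:
    dset = f.create_dataset('x-data', (0, maxlen, 50), maxshape=(None, maxlen, 50))
    alen = 0
    for _ in range(3):  # stand-in for the real chunk generator
        X = np.random.randn(10, maxlen, 50)  # placeholder chunk
        dset.resize(alen + X.shape[0], axis=0)  # grow the first dimension first
        dset[alen:alen + X.shape[0], :, :] = X  # then assign into the new space
        alen += X.shape[0]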