I'm trying to train a CNN for a speech emotion recognition task using spectrograms as input. I've reshaped the spectrograms to (num_frequency_bins, num_time_frames, 1), which I thought would be sufficient, but when I fit the model to the data, which is stored in a TensorFlow dataset, I got the following error:
Input 0 of layer "sequential_12" is incompatible with the layer: expected shape=(None, 257, 1001, 1), found shape=(257, 1001, 1)
I tried reshaping the spectrograms to have the shape (1, num_frequency_bins, num_time_frames, 1), but that produced an error when creating the Sequential model:
ValueError: Exception encountered when calling layer "resizing_14" (type Resizing).
'images' must have either 3 or 4 dimensions.
Call arguments received:
• inputs=tf.Tensor(shape=(None, 1, 257, 1001, 1), dtype=float32)
So I passed in the shape as (num_frequency_bins, num_time_frames, 1) when creating the model, and then fitted the model to the training data with the 4-dimensional data, but that raised this error:
InvalidArgumentError: slice index 0 of dimension 0 out of bounds. [Op:StridedSlice] name: strided_slice/
So I'm kind of at a loss now. I genuinely have no idea what to do and how I can go about fixing this. I've read around but haven't come across anything useful. Would really appreciate any help.
Here's some of the code for context.
dataset = [[specgram_files[i], labels[i]] for i in range(len(specgram_files))]
specgram_files_and_labels_dataset = tf.data.Dataset.from_tensor_slices((specgram_files, labels))
def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features of a particular sound file
    # item() returns numpy array of size 1 as a suitable python scalar.
    # data.item() then returns the bytes string stored in the numpy array.
    # decode() is then called on the bytes string to decode it from a bytes string to a regular string
    # so that it can be passed as a parameter in np.load()
    data = np.load(data.item().decode())
    # Shape of data is now (1, rows, columns)
    # Needs to be reshaped to (rows, columns, 1):
    data = np.reshape(data, (data.shape[0], data.shape[1], 1))
    return data.astype(np.float32)
specgram_dataset = specgram_files_and_labels_dataset.map(
    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
    num_parallel_calls=tf.data.AUTOTUNE)
num_files = len(train_df)
num_train = int(0.8 * num_files)
num_val = int(0.1 * num_files)
num_test = int(0.1 * num_files)
specgram_dataset.shuffle(buffer_size=1000)
specgram_train_ds = specgram_dataset.take(num_train)
specgram_test_ds = specgram_dataset.skip(num_train)
specgram_val_ds = specgram_test_ds.take(num_val)
specgram_test_ds = specgram_test_ds.skip(num_val)
batch_size = 32
specgram_train_ds.batch(batch_size)
specgram_val_ds.batch(batch_size)
specgram_train_ds = specgram_train_ds.cache().prefetch(tf.data.AUTOTUNE)
specgram_val_ds = specgram_val_ds.cache().prefetch(tf.data.AUTOTUNE)
for specgram, label in specgram_train_ds.take(1):
    input_shape = specgram.shape
num_emotions = len(train_df["emotion"].unique())
model = models.Sequential([
    layers.Input(shape=input_shape),
    # downsampling the input.
    layers.Resizing(32, 128),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="softmax"),
    layers.Dense(num_emotions)
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.01),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=["accuracy"]
)
EPOCHS = 10
model.fit(
    specgram_train_ds,
    validation_data=specgram_val_ds,
    epochs=EPOCHS,
    callbacks=tf.keras.callbacks.EarlyStopping(verbose=1, patience=2)
)
Assuming you know your input_shape, I would recommend first hard-coding it into your model:
model = models.Sequential([
    # per-sample shape, without the batch dimension
    layers.Input(shape=(257, 1001, 1)),
    # downsampling the input.
    layers.Resizing(32, 128),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="softmax"),
    layers.Dense(num_emotions)
])
Also, tf.data.Dataset.batch does not modify the dataset in place. It returns a new dataset, so you need to assign the result to a variable:
batch_size = 32
specgram_train_ds = specgram_train_ds.batch(batch_size)
specgram_val_ds = specgram_val_ds.batch(batch_size)
Afterwards, make sure that specgram_train_ds really does have the correct shape:
specgrams, _ = next(iter(specgram_train_ds.take(1)))
assert specgrams.shape == (32, 257, 1001, 1)
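To see how the shapes in the question relate to each other, here is a small NumPy-only sketch of the shape arithmetic. It is not TensorFlow code: it only assumes that batching stacks elements along a new leading axis, and the batch size of 4 and the zero-filled array are placeholders.

```python
import numpy as np

# One spectrogram, shaped (num_frequency_bins, num_time_frames, channels).
specgram = np.zeros((257, 1001, 1), dtype=np.float32)

# Batching stacks elements along a new leading axis.
# A batch size of 4 is used here only to keep the example small.
batch = np.stack([specgram] * 4)
print(batch.shape)  # (4, 257, 1001, 1), compatible with (None, 257, 1001, 1)

# Adding a leading 1 by hand and then batching produces a 5-D shape,
# the same rank that the Resizing layer complained about.
expanded = specgram[np.newaxis, ...]
double_batched = np.stack([expanded] * 4)
print(double_batched.shape)  # (4, 1, 257, 1001, 1)
```

In other words, the per-sample shape (257, 1001, 1) is the right thing to give the model; the missing leading dimension is supplied by batching the dataset.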
Related
This is the code in question:
batch_size = 1
epochs = 1
begin_from_row = 3
rows_to_train = 1000
data = data.loc[begin_from_row:rows_to_train, :]
data['Close_next'] = data['Close'].shift(-1)
data = data.dropna()
output_data = data['Close_next']
input_data = data.drop(columns=['Close_next'])
input_size = 9
output_size = 1
hidden_size_1 = 9
input_layer = tf.keras.Input(batch_shape=(batch_size, input_size))
input_layer_expanded = tf.expand_dims(input_layer, axis=-1)
hidden_1 = tf.keras.layers.LSTM(hidden_size_1, stateful=True)(input_layer_expanded)
output_layer = tf.keras.layers.Dense(1, activation='relu')(hidden_1)
model = tf.keras.Model(inputs=input_layer, outputs=output_layer)
model.compile(loss='mean_squared_error', optimizer='adam', run_eagerly=True)
model.fit(input_data, output_data, epochs=epochs)
model.save("model_1.h5")
It returns the following error:
Input 0 of layer "lstm" is incompatible with the layer: expected shape=(1, None, 1), found shape=(32, 9, 1)
I can't work out where the number 32 comes from, since it doesn't appear anywhere in my code. The code does run when I specify batch_size=32, but only for one batch.
(32, 9, 1) is the shape of the batch your model actually receives, whereas (1, None, 1) is the expected shape that you defined in the input layer (batch_size = 1).
32 is the default value of batch_size in the fit method. The default is used because you did not pass the batch_size argument and you are not using a TensorFlow dataset:
batch_size: Integer or None. Number of samples per gradient update. If
unspecified, batch_size will default to 32. Do not specify the
batch_size if your data is in the form of datasets, generators, or
keras.utils.Sequence instances (since they generate batches).
From: Tensorflow fit function
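As a rough illustration of where the 32 comes from, the sketch below mimics how array inputs are cut into batches when batch_size is left at its default. The helper batch_sizes is hypothetical and not part of Keras; it only reproduces the arithmetic.

```python
def batch_sizes(num_samples, batch_size=32):
    """Return the size of each batch produced from num_samples rows."""
    full, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if remainder:
        sizes.append(remainder)  # the final, smaller batch
    return sizes

# 100 rows with the default batch size: three full batches and a remainder of 4.
print(batch_sizes(100))  # [32, 32, 32, 4]

# With batch_size=1, every batch matches a model built with batch_shape=(1, ...).
print(batch_sizes(3, batch_size=1))  # [1, 1, 1]
```

A model built with a fixed batch_shape only accepts batches of exactly that size, so a smaller final batch would be rejected as well, which may be why batch_size=32 worked for only one batch. Passing batch_size=batch_size to model.fit keeps the two values consistent.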
I am doing a binary regression problem using keras.
The input shape is: (None, 2, 94, 3) (channels is the last dimension)
I have the following architecture:
input1 = Input(shape=(time, n_rows, n_channels))
masking = Masking(mask_value=-999)(input1)
convlstm = ConvLSTM1D(filters=16, kernel_size=15,
                      data_format='channels_last',
                      activation="tanh")(masking)
dropout = Dropout(0.2)(convlstm)
flatten1 = Flatten()(dropout)
outputs = Dense(n_outputs, activation='sigmoid')(flatten1)
model = Model(inputs=input1, outputs=outputs)
model.compile(loss=keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
However when training I get this error: Dimensions must be equal, but are 94 and 80 for '{{node conv_lstm1d/while/SelectV2}} = SelectV2[T=DT_FLOAT](conv_lstm1d/while/Tile, conv_lstm1d/while/mul_5, conv_lstm1d/while/Placeholder_2)' with input shapes: [?,94,16], [?,80,16], [?,80,16].
If I remove the masking layer, this error disappears. What is the masking doing that triggers it? Also, the only way I was able to run the above architecture was with a kernel_size of 1.
Seems like the ConvLSTM1D layer needs a mask with the shape (samples, timesteps) according to the docs. The mask you are calculating has the shape (samples, time, rows). Here is one solution to fix your problem but I am not sure if it is the 'correct' way to go:
import tensorflow as tf
input1 = tf.keras.layers.Input(shape=(2, 94, 3))
masking = tf.keras.layers.Masking(mask_value=-999)(input1)
convlstm = tf.keras.layers.ConvLSTM1D(filters=16, kernel_size=15,
                                      data_format='channels_last',
                                      activation="tanh")(inputs=masking, mask=tf.reduce_all(masking._keras_mask, axis=-1))
dropout = tf.keras.layers.Dropout(0.2)(convlstm)
flatten1 = tf.keras.layers.Flatten()(dropout)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(flatten1)
model = tf.keras.Model(inputs=input1, outputs=outputs)
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
This line mask = tf.reduce_all(masking._keras_mask, axis=-1) essentially reduces your mask to (samples, timesteps) by applying an AND operation to the last dimension of the mask. Alternatively, you could just create your own custom mask layer:
import tensorflow as tf
class Reduce(tf.keras.layers.Layer):
    def __init__(self):
        super(Reduce, self).__init__()
    def call(self, inputs):
        # True where any channel differs from the mask value: (samples, time, rows)
        valid = tf.reduce_any(tf.not_equal(inputs, -999), axis=-1, keepdims=False)
        # AND over the rows axis (the last one) to get a (samples, timesteps) mask
        return tf.reduce_all(valid, axis=-1)
input1 = tf.keras.layers.Input(shape=(2, 94, 3))
reduce_layer = Reduce()
boolean_mask = reduce_layer(input1)
convlstm = tf.keras.layers.ConvLSTM1D(filters=16, kernel_size=15,
                                      data_format='channels_last',
                                      activation="tanh")(inputs=input1, mask=boolean_mask)
dropout = tf.keras.layers.Dropout(0.2)(convlstm)
flatten1 = tf.keras.layers.Flatten()(dropout)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(flatten1)
model = tf.keras.Model(inputs=input1, outputs=outputs)
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
model.summary(expand_nested=True)
x = tf.random.normal((50, 2, 94, 3))
# Binary targets (0 or 1) to match the sigmoid output and BinaryCrossentropy
y = tf.random.uniform((50, ), maxval=2, dtype=tf.int32)
model.fit(x, y)
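The mask shapes discussed above can be checked with a small NumPy sketch. It assumes a position counts as valid when any of its channels differs from the mask value, which is how the Masking layer computes its mask, and it uses a tiny random batch as a stand-in for real data.

```python
import numpy as np

mask_value = -999
# Stand-in batch: (samples, time, rows, channels) = (4, 2, 94, 3).
x = np.random.default_rng(0).normal(size=(4, 2, 94, 3))
x[0, 1] = mask_value  # mark the second timestep of the first sample as padding

# Per-position mask: True where any channel differs from mask_value.
full_mask = np.any(x != mask_value, axis=-1)
print(full_mask.shape)  # (4, 2, 94)

# AND over the rows axis leaves one flag per (sample, timestep).
step_mask = np.all(full_mask, axis=-1)
print(step_mask.shape)  # (4, 2)
print(step_mask[0])     # [ True False]
```

Note that with an AND reduction a timestep is kept only if every row in it is valid, so a partially padded timestep is masked out entirely. Whether that is the right behaviour depends on how the padding was applied.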
I want to train an autoencoder on mp3 songs. Given the size of the dataset, it would be better if only part of the dataset is in memory at any given time.
What I tried is using tfio and tf.data.Dataset, but that gives me an error when fitting the model:
ValueError: Cannot iterate over a shape with unknown rank.
The code was as follows
segment_length = 1024
filenames = tf.data.Dataset.list_files('data/*')
def decode_mp3(mp3_path):
    mp3_path = mp3_path.numpy().decode("utf-8")
    audio = tfio.audio.AudioIOTensor(mp3_path)
    audio_tensor = tf.cast(audio[:], tf.float32)
    overflow = len(audio_tensor) % segment_length
    audio_tensor = audio_tensor[:-overflow, 0]
    audio_tensor = tf.reshape(audio_tensor, (len(audio_tensor), 1))
    audio_tensor = audio_tensor[:, 0]
    return audio_tensor
song_dataset = filenames.map(lambda path:
    tf.py_function(func=decode_mp3, inp=[path], Tout=tf.float32))
segment_dataset = song_dataset.flat_map(lambda song:
    tf.data.Dataset.from_tensor_slices(song)).batch(segment_length)
dataset = segment_dataset.map(lambda x: (x, x))  # add labels (identical to inputs here)
With a model like so
encoder = keras.models.Sequential([
    keras.layers.Input((segment_length, 1)),
    keras.layers.Conv1D(128, 3, strides=2, padding="same"),
    ...
])
but as I said, calling fit would throw the error above. Even though the shape is exactly as I would hope
for x, y in dataset.take(1):
    print(x.shape, y.shape)
> (1024, 1) (1024, 1)
Any help on this would be appreciated. I might be misunderstanding something with input shapes and datasets.
So I finally found part of the answer. The Input layer seems to be meant for models with the functional API (?) and I removed it. Now the model is like this
encoder = keras.models.Sequential([
    keras.layers.Conv1D(128, 3, strides=2, padding="same", input_shape=(segment_length, 1)),
    ...
where the Input layer is replaced with an input_shape parameter in the first Conv1D layer. Also I batched the dataset with
ds = dataset.batch(2)
and that was important too. Any further clarification would still be appreciated. Nonetheless, I hope this can help people with the same problem.
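Independently of the fit error, one detail in decode_mp3 is worth checking: when the audio length is an exact multiple of segment_length, overflow is 0 and the slice [:-0] is empty. The sketch below uses NumPy rather than tfio, with a hypothetical helper split_into_segments, to show a trim-and-reshape that handles that case and yields segments shaped (segment_length, 1). It is an illustration under those assumptions, not a drop-in replacement for the pipeline above.

```python
import numpy as np

def split_into_segments(audio, segment_length=1024):
    """Trim a 1-D signal to a multiple of segment_length and reshape it
    to (num_segments, segment_length, 1)."""
    usable = len(audio) - (len(audio) % segment_length)
    # Slicing with an explicit end index avoids the empty result of audio[:-0].
    return audio[:usable].reshape(-1, segment_length, 1)

audio = np.arange(5000, dtype=np.float32)   # stand-in for one decoded channel
print(split_into_segments(audio).shape)     # (4, 1024, 1)

exact = np.zeros(2048, dtype=np.float32)    # length is an exact multiple
print(exact[:-0].shape)                     # (0,)
print(split_into_segments(exact).shape)     # (2, 1024, 1)
```

Each row of the result has the per-sample shape (segment_length, 1) that the first Conv1D layer is declared with, and batching then adds the leading batch dimension.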
I'm trying to fit a CNN Keras model, feeding it with data handled by the Datasets API from Tensorflow. However, I stumble again and again upon the same Exception, despite following the official documentation (see there):
ValueError: No data provided for "conv2d_8_input". Need data for each key in: ['conv2d_8_input']
# conv2d_8 is the first Conv2D layer of my model, see below
I'm using the MNIST dataset from tensorflow-datasets, images are normalized and class labels are converted into one-hot encodings. You can see an excerpt from the code below.
test_data, train_data = tfds.load("mnist", split=Split.ALL.subsplit([1, 3]))
# [...] Images are normalized using Dataset.map method
# [...] Labels are converted into one-hot encodings as well, using tf.one_hot function
model = keras.Sequential([
    keras.layers.Conv2D(
        32,
        kernel_size=5,
        padding="same",
        input_shape=(28, 28, 1),
        activation="relu",
    ),
    keras.layers.MaxPooling2D(
        (2, 2),
        padding="same"
    ),
    keras.layers.Conv2D(
        64,
        kernel_size=5,
        padding="same",
        activation="relu"
    ),
    keras.layers.MaxPooling2D(
        (2, 2),
        padding="same"
    ),
    keras.layers.Flatten(),
    keras.layers.Dense(
        512,
        activation="relu"
    ),
    keras.layers.Dropout(rate=0.4),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(
    optimizer=tf.train.AdamOptimizer(0.01),
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)
train_data = train_data.batch(32).repeat()
test_data = test_data.batch(32).repeat()
model.fit(
    train_data,
    epochs=10,
    steps_per_epoch=30,
    validation_data=test_data,
    validation_steps=3
)  # The exception occurs at this step
I don't understand why it doesn't work, I tried to feed the fit method with one shot iterators instead of the datasets, but I get the same result. I'm not used to Keras and TensorFlow (I usually work with PyTorch), so I think I may be missing something obvious.
For those coming to this page after following TF 2.0 Beta tutorial on Loading images (https://www.tensorflow.org/beta/tutorials/load_data/images):
I was able to avoid the error by returning a tuple from the preprocess_image function:
def preprocess_image(image):
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [192, 192])
    image /= 255.0  # normalize to [0,1] range
    return (image, image)
I am not using the labels in my use case, so you might have to make other changes to follow the tutorial.
Ok, I got it. I enabled eager execution to see if Keras would yield a more precise exception, and I got this:
ValueError: Output of generator should be a tuple `(x, y, sample_weight)` or `(x, y)`. Found: {'image': <tf.Tensor: id=1012, shape=(32, 28, 28, 1), dtype=float64, numpy=array([...])>, 'label': <tf.Tensor: id=1013, shape=(32, 10), dtype=uint8, numpy=array([...]), dtype=uint8)>}
Indeed, the components of my datasets (images and their associated labels) have names ("image" and "label"), because this is how tensorflow_datasets loads them. As a result, an iterator on the datasets yields a dictionary with two values: "image" and "label".
However, Keras expects a tuple of two values (inputs, targets) (or three values (inputs, targets, sample_weights)), and it doesn't accept the dictionary yielded by the Dataset iterator (hence the error I got).
I added the following code before model.fit:
train_data = train_data.map(lambda x: tuple(x.values()))
test_data = test_data.map(lambda x: tuple(x.values()))
And it works.
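The conversion step can be pictured with plain Python, without TensorFlow. The element below is a stand-in dictionary and to_supervised is a hypothetical helper; selecting the keys by name avoids depending on the order in which x.values() returns them.

```python
def to_supervised(element):
    """Convert a {'image': ..., 'label': ...} element into an (inputs, targets) tuple."""
    return element["image"], element["label"]

# Stand-in for one element yielded by the dataset iterator.
element = {"label": 7, "image": [[0.0, 0.5], [1.0, 0.25]]}

# tuple(element.values()) follows the dictionary's own order,
# which for this stand-in puts the label first.
print(tuple(element.values())[0])   # 7

# Selecting by key always yields (inputs, targets).
inputs, targets = to_supervised(element)
print(targets)                      # 7
```

In the dataset pipeline the same idea would read train_data.map(lambda x: (x["image"], x["label"])).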
You can load data from tensorflow-datasets directly as a tuple using as_supervised
test_data, train_data = tfds.load("mnist", split=tfds.Split.ALL.subsplit([1, 3]), as_supervised=True)
I'm new to neural networks and am trying to create a simple network for image understanding.
I tried using the triplet loss method, but I keep getting errors that make me think I'm missing some fundamental concept.
My code is:
def triplet_loss(x):
    anchor, positive, negative = tf.split(x, 3)
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), 1)
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), 1)
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), ALPHA)
    loss = tf.reduce_mean(tf.maximum(basic_loss, 0.0), 0)
    return loss
def build_model(input_shape):
    K.set_image_data_format('channels_last')
    positive_example = Input(shape=input_shape)
    negative_example = Input(shape=input_shape)
    anchor_example = Input(shape=input_shape)
    embedding_network = create_embedding_network(input_shape)
    positive_embedding = embedding_network(positive_example)
    negative_embedding = embedding_network(negative_example)
    anchor_embedding = embedding_network(anchor_example)
    merged_output = concatenate([anchor_embedding, positive_embedding, negative_embedding])
    loss = Lambda(triplet_loss, (1,))(merged_output)
    model = Model(inputs=[anchor_example, positive_example, negative_example],
                  outputs=loss)
    model.compile(loss='mean_absolute_error', optimizer=Adam())
    return model
def create_embedding_network(input_shape):
    input_shape = Input(input_shape)
    x = Conv2D(32, (3, 3))(input_shape)
    x = PReLU()(x)
    x = Conv2D(64, (3, 3))(x)
    x = PReLU()(x)
    x = Flatten()(x)
    x = Dense(10, activation='softmax')(x)
    model = Model(inputs=input_shape, outputs=x)
    return model
Every image is read using:
imageio.imread(imagePath, pilmode="RGB")
And the shape of each image:
(1024, 1024, 3)
Then I use my own triplet method (it just creates 3 sets: anchor, positive and negative):
triplets = get_triplets(data)
triplets.shape
The shape is (number of examples, triplet, x_image, y_image, number of channels (RGB)):
(20, 3, 1024, 1024, 3)
Then I use the build_model function:
model = build_model((1024, 1024, 3))
And the problem starts here:
model.fit(triplets, y=np.zeros(len(triplets)), batch_size=1)
When I try to train my model with this line of code, I get this error:
For more details, my code is in this Colab notebook
The pictures I used can be found in this Drive
For this to run seamlessly, place this folder under
My Drive/Colab Notebooks/images/
For anyone else struggling with this:
My problem was actually the dimension of each observation.
I changed the dimension of each input, as suggested in the comments, to
(?, 1024, 1024, 3)
The Colab notebook has been updated with the solution.
P.S. I also changed the size of the pictures to 256 * 256 so that the code runs much faster on my PC.
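A model with three Input layers expects three arrays, each shaped (num_examples, height, width, channels), rather than a single 5-D array. Here is a minimal NumPy sketch of that split. It assumes the triplet axis is axis 1 and is ordered anchor, positive, negative, which the question does not state; the 8 x 8 image size is a placeholder to keep the example light.

```python
import numpy as np

# Stand-in for get_triplets(data): (examples, triplet, height, width, channels).
triplets = np.zeros((20, 3, 8, 8, 3), dtype=np.float32)

# Index the triplet axis to get one 4-D array per model input.
# Assumed order along axis 1: anchor, positive, negative.
anchors, positives, negatives = (triplets[:, i] for i in range(3))
print(anchors.shape)  # (20, 8, 8, 3)

# The list order must match the order of `inputs` passed to Model(...).
model_inputs = [anchors, positives, negatives]
dummy_targets = np.zeros(len(triplets))
```

These would then be passed to training as model.fit(model_inputs, y=dummy_targets, batch_size=1).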