I want to train an autoencoder on mp3 songs. Given the size of the dataset, it would be better if only part of the dataset is in memory at any given time.
What I tried is using tfio and tf.data.Dataset, but that gives me an error when fitting the model:
ValueError: Cannot iterate over a shape with unknown rank.
The code was as follows
segment_length = 1024
filenames = tf.data.Dataset.list_files('data/*')
def decode_mp3(mp3_path):
    mp3_path = mp3_path.numpy().decode("utf-8")
    audio = tfio.audio.AudioIOTensor(mp3_path)
    audio_tensor = tf.cast(audio[:], tf.float32)
    overflow = len(audio_tensor) % segment_length
    audio_tensor = audio_tensor[:-overflow, 0]
    audio_tensor = tf.reshape(audio_tensor, (len(audio_tensor), 1))
    audio_tensor = audio_tensor[:, 0]
    return audio_tensor
song_dataset = filenames.map(lambda path:
    tf.py_function(func=decode_mp3, inp=[path], Tout=tf.float32))
segment_dataset = song_dataset.flat_map(lambda song:
    tf.data.Dataset.from_tensor_slices(song)).batch(segment_length)
dataset = segment_dataset.map(lambda x: (x, x))  # add labels (identical to inputs here)
The model looks like this:
encoder = keras.models.Sequential([
    keras.layers.Input((segment_length, 1)),
    keras.layers.Conv1D(128, 3, strides=2, padding="same"),
    ...
])
But as I said, calling fit throws the error above, even though the shapes are exactly what I would expect:
for x, y in dataset.take(1):
    print(x.shape, y.shape)
> (1024, 1) (1024, 1)
Any help would be appreciated. I might be misunderstanding something about input shapes and datasets.
So I finally found part of the answer. The Input layer seems to be meant for models built with the functional API (?), so I removed it. The model now looks like this:
encoder = keras.models.Sequential([
    keras.layers.Conv1D(128, 3, strides=2, padding="same", input_shape=(segment_length, 1)),
    ...
Here the Input layer is replaced with an input_shape parameter on the first Conv1D layer. I also batched the dataset with
ds = dataset.batch(2)
and that was important too: the model expects a leading batch dimension, so each element passed to fit must have shape (batch_size, segment_length, 1). Any further clarification would still be appreciated. Nonetheless, I hope this helps people with the same problem.
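To make the shape requirement concrete, here is a minimal NumPy sketch (not the original tf.data pipeline) of how a mono waveform could be cut into fixed-length segments and grouped into a batch. The helper name split_into_segments and the random stand-in waveform are assumptions made for illustration; the point is only that a Conv1D model declared for (segment_length, 1) inputs is fed arrays of shape (batch_size, segment_length, 1).

```python
import numpy as np

segment_length = 1024


def split_into_segments(waveform, segment_length):
    """Cut a 1-D waveform into rows of segment_length samples each.

    Trailing samples that do not fill a whole segment are dropped.
    Slicing by the number of usable samples (rather than [:-overflow])
    also works when the length is already an exact multiple, because
    [:-0] would return an empty array.
    """
    usable = (len(waveform) // segment_length) * segment_length
    segments = waveform[:usable].reshape(-1, segment_length, 1)
    return segments.astype(np.float32)


# Hypothetical stand-in for one decoded song: 5000 random samples.
rng = np.random.default_rng(0)
waveform = rng.standard_normal(5000)

segments = split_into_segments(waveform, segment_length)
print(segments.shape)   # (4, 1024, 1): 4 segments, each (segment_length, 1)

# Grouping two segments gives the (batch, steps, channels) layout Keras expects.
batch = segments[:2]
print(batch.shape)      # (2, 1024, 1)
```

In a tf.data pipeline the same layout would come from producing one (segment_length, 1) element per segment and then calling batch on the dataset.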
I am getting this error:
Received incompatible tensor with shape (13, 224) when attempting to restore variable with shape (1, 224) and name layer_with_weights-0/kernel/.ATTRIBUTES/VARIABLE_VALUE.
I don't know what it means. Any lead would be helpful.
My code is as follows:
callback = EarlyStopping(monitor='loss', patience=3)
def build_model(hp):
    model = keras.Sequential()  #.reshape(input_shape=(1,len(x_test),totalColumns))
    #model.add(layers.Flatten()) #layers.Flatten(input_shape=(totalColumns,1)))
    for i in range(hp.Int('layers', 1, 10)):
        model.add(layers.Dense(units=hp.Int('units_' + str(i), 32, 512, step=32), input_shape=(totalColumns, 1),
                               activation=hp.Choice('act_' + str(i), ['relu'])))
        model.add(Dropout(hp.Float('Dropout_rate', min_value=0, max_value=0.5, step=0.1)))
    model.add(layers.Dense(10, activation='softmax'))
    model.compile(optimizer=keras.optimizers.Adam(hp.Choice('learning_rate', values=[1e-2, 1e-4])),
                  loss='mean_squared_error', metrics=[tf.keras.metrics.RootMeanSquaredError()])
    return model
tuner = RandomSearch(build_model, objective=keras_tuner.Objective("root_mean_squared_error", direction="min"),
                     max_trials=1, executions_per_trial=1)
tuner.search(x_train, y_train, epochs=1, validation_data=(x_test, y_test), callbacks=[callback])
best_model = tuner.get_best_models(num_models=1)[0]
pred = best_model.predict(x_test)
rows, columns = pred.shape
test_pred = np.array([])
for i in range(rows):
    value = pred[i].sum()
    test_pred = np.append(test_pred, value)
My x_train has shape (43130, 13) and y_train has shape (43130, 1); it is time-series data.
I also don't understand why, when I use layers.Flatten, it works but I get predictions with shape (43130, 10) instead of (43130, 1).
Please point out where I am going wrong!
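One possible reading of the shapes in the error message, offered as a hedged sketch rather than a confirmed diagnosis: a Keras Dense layer builds its kernel with shape (last_input_dimension, units). With input_shape=(totalColumns, 1) the last dimension is 1, whereas data of shape (43130, 13) has a last dimension of 13, which would match the (1, 224) versus (13, 224) mismatch. The NumPy code below only imitates the matrix product a Dense layer performs; dense_forward is a hypothetical helper, and 224 units is taken from the error message.

```python
import numpy as np


def dense_forward(x, units, rng):
    """Imitate Dense without bias or activation: kernel is (x.shape[-1], units)."""
    kernel = rng.standard_normal((x.shape[-1], units))
    return kernel.shape, (x @ kernel).shape


rng = np.random.default_rng(0)
units = 224

# A batch of 8 rows with 13 features, like x_train of shape (43130, 13).
x_2d = np.zeros((8, 13))
print(dense_forward(x_2d, units, rng))   # ((13, 224), (8, 224))

# The same rows declared as (13, 1): the layer now acts on an axis of size 1.
x_3d = x_2d[..., np.newaxis]             # shape (8, 13, 1)
print(dense_forward(x_3d, units, rng))   # ((1, 224), (8, 13, 224))
```

Under this reading, declaring input_shape=(totalColumns,) would keep the kernel at (13, units). Separately, the 10 columns in the prediction come from the final Dense(10) layer, so a single-column target would need a final layer with 1 unit.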
I'm trying to train a CNN model for a speech emotion recognition task using spectrograms as input. I've reshaped the spectrograms to have the shape (num_frequency_bins, num_time_frames, 1) which I thought would be sufficient, but upon trying to fit the model to the dataset, which is stored in a Tensorflow dataset, I got the following error:
Input 0 of layer "sequential_12" is incompatible with the layer: expected shape=(None, 257, 1001, 1), found shape=(257, 1001, 1)
I tried reshaping the spectrograms to have the shape (1, num_frequency_bins, num_time_frames, 1), but that produced an error when creating the Sequential model:
ValueError: Exception encountered when calling layer "resizing_14" (type Resizing).
'images' must have either 3 or 4 dimensions.
Call arguments received:
• inputs=tf.Tensor(shape=(None, 1, 257, 1001, 1), dtype=float32)
So I passed in the shape as (num_frequency_bins, num_time_frames, 1) when creating the model, and then fitted the model to the training data with the 4-dimensional data, but that raised this error:
InvalidArgumentError: slice index 0 of dimension 0 out of bounds. [Op:StridedSlice] name: strided_slice/
So I'm kind of at a loss now. I genuinely have no idea what to do and how I can go about fixing this. I've read around but haven't come across anything useful. Would really appreciate any help.
Here's some of the code for context.
dataset = [[specgram_files[i], labels[i]] for i in range(len(specgram_files))]
specgram_files_and_labels_dataset = tf.data.Dataset.from_tensor_slices((specgram_files, labels))
def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features of a particular sound file
    # item() returns numpy array of size 1 as a suitable python scalar.
    # data.item() then returns the bytes string stored in the numpy array.
    # decode() is then called on the bytes string to decode it from a bytes string to a regular string
    # so that it can be passed as a parameter in np.load()
    data = np.load(data.item().decode())
    # Shape of data is now (1, rows, columns)
    # Needs to be reshaped to (rows, columns, 1):
    data = np.reshape(data, (data.shape[0], data.shape[1], 1))
    return data.astype(np.float32)
specgram_dataset = specgram_files_and_labels_dataset.map(
    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
    num_parallel_calls=tf.data.AUTOTUNE)
num_files = len(train_df)
num_train = int(0.8 * num_files)
num_val = int(0.1 * num_files)
num_test = int(0.1 * num_files)
specgram_dataset.shuffle(buffer_size=1000)
specgram_train_ds = specgram_dataset.take(num_train)
specgram_test_ds = specgram_dataset.skip(num_train)
specgram_val_ds = specgram_test_ds.take(num_val)
specgram_test_ds = specgram_test_ds.skip(num_val)
batch_size = 32
specgram_train_ds.batch(batch_size)
specgram_val_ds.batch(batch_size)
specgram_train_ds = specgram_train_ds.cache().prefetch(tf.data.AUTOTUNE)
specgram_val_ds = specgram_val_ds.cache().prefetch(tf.data.AUTOTUNE)
for specgram, label in specgram_train_ds.take(1):
    input_shape = specgram.shape
num_emotions = len(train_df["emotion"].unique())
model = models.Sequential([
    layers.Input(shape=input_shape),
    # downsampling the input.
    layers.Resizing(32, 128),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="softmax"),
    layers.Dense(num_emotions)
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.01),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=["accuracy"]
)
EPOCHS = 10
model.fit(
    specgram_train_ds,
    validation_data=specgram_val_ds,
    epochs=EPOCHS,
    callbacks=tf.keras.callbacks.EarlyStopping(verbose=1, patience=2)
)
Assuming you know your input_shape, I would recommend first hard-coding it into your model:
model = models.Sequential([
    layers.Input(shape=(257, 1001, 1)),
    # downsampling the input.
    layers.Resizing(32, 128),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="softmax"),
    layers.Dense(num_emotions)
])
Also, tf.data.Dataset.batch does not modify the dataset in place; it returns a new Dataset, so you need to assign the result back to a variable (the same applies to shuffle):
batch_size = 32
specgram_train_ds = specgram_train_ds.batch(batch_size)
specgram_val_ds = specgram_val_ds.batch(batch_size)
Afterwards, check that a batch from specgram_train_ds really has the expected shape:
specgrams, _ = next(iter(specgram_train_ds.take(1)))
assert specgrams.shape == (32, 257, 1001, 1)
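The shapes being checked can be illustrated with plain NumPy, independent of the tf.data pipeline. This is only a sketch: the zero-filled array stands in for a real spectrogram, and 257 x 1001 is taken from the error message in the question.

```python
import numpy as np

num_frequency_bins, num_time_frames = 257, 1001
batch_size = 32

# Stand-in for one spectrogram loaded from disk as a 2-D array.
specgram = np.zeros((num_frequency_bins, num_time_frames), dtype=np.float32)

# Add the trailing channel axis that Conv2D layers expect.
example = np.expand_dims(specgram, axis=-1)
print(example.shape)    # (257, 1001, 1): the shape passed to layers.Input

# Batching stacks examples along a new leading axis; the shape given
# to layers.Input never includes this axis.
batch = np.stack([example] * batch_size, axis=0)
print(batch.shape)      # (32, 257, 1001, 1)
```

This is why reshaping each spectrogram to (1, 257, 1001, 1) by hand and then batching leads to a 5-dimensional tensor: the batch axis is added by the dataset, not by the individual examples.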
I have 256×192 pixel images and am looking for a small, fast CNN to act as a "pre-scanner" that finds interesting parts of an image (such as 52x52 or 32x32 chunks), which will then be checked with a more complex CNN. The reason for the small CNN is that it will run on an embedded system with limited resources.
Unfortunately, I'm new to this topic, TensorFlow, and Keras. My first idea was to create a net with only one 2D convolution that works like a 1D convolution. In this case the kernel should have a height of 192 and a width of 1 (maybe 3 later).
This is the model I built with TensorFlow 2:
# Model
model = models.Sequential()
model.add(layers.Conv2D(5, (1, 192), activation='relu', input_shape=(256, 192, 3)))
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
The idea is to get one value per row that indicates whether something "interesting" may be in that row. Based on this information and the neighboring rows, a bigger part of the image will be cut out and fed into the more complex CNN.
I have prepared normal 256x192 px images and, for each image, a text file with 256 values (0.0 or 1.0). Each value represents one row and indicates whether something interesting is in that row.
This was my naive plan, but the training crashes immediately with an error that I don't understand:
ValueError: Dimensions must be equal, but are 32 and 256 for 'metrics/accuracy/Equal' (op: 'Equal') with input shapes: [32,256], [32,256,1].
I think my basic idea/strategy is wrong. I don't understand where the 32 comes from. Can someone explain my mistake? And is my idea even feasible?
Edit:
As requested, here is the complete, dirty code. If there are major flaws, please be forgiving. This is my first Python experiment.
import tensorflow as tf
import os
from tensorflow.keras import datasets, layers, models
from PIL import Image
from numpy import asarray
train_images = []
train_labels = []
test_images = []
test_labels = []
# Prepare
dir = os.listdir('images/gt_image')
split = len(dir)*0.2
c = 0
for file in dir:
    c = c + 1
    im = Image.open('images/gt_image/' + file)
    data = im.load()
    image = []
    for x in range(0, im.size[0]):
        row = []
        for y in range(0, im.size[1]):
            row.append([x/255 for x in data[x, y]])
        image.append(row)
    if c <= split:
        test_images.append(image)
    else:
        train_images.append(image)
    file = open('images/gt_labels/' + file + '.txt', 'r')
    label = file.readlines()[0].split(', ')
    if c <= split:
        test_labels.append(label)
    else:
        train_labels.append(label)
print('prepare done')
# Model
model = models.Sequential()
model.add(layers.Conv2D(5, (1, 192), activation='relu', input_shape=(256, 192, 3)))
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
print('compile done')
# Learning
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
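As a rough check of the shapes in the error message (a sketch, not a full diagnosis): with Keras's default padding='valid' and a stride of 1, a convolution shrinks each spatial axis to input - kernel + 1. conv2d_output_shape below is a hypothetical helper written for illustration, and the batch size of 32 is the default that model.fit uses when batch_size is not given, which appears to be where the 32 comes from.

```python
def conv2d_output_shape(input_shape, kernel_size, filters, batch_size=32):
    """Output shape of a stride-1, 'valid'-padded 2-D convolution.

    input_shape is (rows, cols, channels); kernel_size is (rows, cols).
    """
    rows, cols, _channels = input_shape
    k_rows, k_cols = kernel_size
    out_rows = rows - k_rows + 1
    out_cols = cols - k_cols + 1
    if out_rows < 1 or out_cols < 1:
        raise ValueError("kernel is larger than the input")
    return (batch_size, out_rows, out_cols, filters)


# The model from the question: Conv2D(5, (1, 192)) on (256, 192, 3) inputs.
output_shape = conv2d_output_shape((256, 192, 3), (1, 192), filters=5)
print(output_shape)            # (32, 256, 1, 5)

# Dropping the last (filter) axis, as an argmax over it would,
# leaves (32, 256, 1), while the labels have shape (32, 256).
print(output_shape[:-1])       # (32, 256, 1)
```

As far as the shapes in the message suggest, the accuracy metric compares an argmax over the 5 filters, shape (32, 256, 1), with labels of shape (32, 256). The model output therefore still carries an extra axis and treats the 5 filters as classes, so the output head would need to be reshaped or redesigned to produce one value per row.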
I can't get keras.backend.function to work properly. I'm trying to follow this post:
How to calculate prediction uncertainty using Keras?
In this post they create a function f:
f = K.function([model.layers[0].input],[model.layers[-1].output]) #(I actually simplified the function a little bit).
In my neural network I have 3 inputs. When I try to compute f([[3], [23], [0.0]]) I get this error:
InvalidArgumentError: You must feed a value for placeholder tensor 'input_3' with dtype float and shape [?,1]
[[{{node input_3}} = Placeholder[dtype=DT_FLOAT, shape=[?,1], _device="/job:localhost/replica:0/task:0/device:CPU:0"]
Now I know using [[3], [23], [0.0]] as an input in my model doesn't give me an error during the testing phase. Can anyone tell me where I'm going wrong?
This is what my model looks like if it matters:
home_in = Input(shape=(1,))
away_in = Input(shape=(1,))
time_in = Input(shape = (1,))
embed_home = Embedding(input_dim = in_dim, output_dim = out_dim, input_length = 1)
embed_away = Embedding(input_dim = in_dim, output_dim = out_dim, input_length = 1)
embedding_home = Flatten()(embed_home(home_in))
embedding_away = Flatten()(embed_away(away_in))
keras.backend.set_learning_phase(1) #this will keep dropout on during the testing phase
model_layers = Dense(units=2)\
    (Dropout(0.3)\
    (Dense(units=64, activation = "relu")\
    (Dropout(0.3)\
    (Dense(units=64, activation = "relu")\
    (Dropout(0.3)\
    (Dense(units=64, activation = "relu")\
    (concatenate([embedding_home, embedding_away, time_in]))))))))
model = Model(inputs=[home_in, away_in, time_in], outputs=model_layers)
The function you have defined uses only one of the input layers (i.e. model.layers[0].input) as its input. Instead, it must use all of the inputs so that the model can be run. The model has inputs and outputs attributes that you can use to include all inputs and outputs with less verbosity:
f = K.function(model.inputs, model.outputs)
Update: Each input array must have shape (num_samples, 1). Therefore, you need to pass a list of lists (e.g. [[3]]) for each input instead of a flat list (e.g. [3]):
outs = f([[[3]], [[23]], [[0.0]]])
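The difference between the two nestings can be checked with NumPy alone. This is a small sketch of the array shapes only; it does not call the Keras function f, and the variable names are made up for illustration.

```python
import numpy as np

# One sample for each of the three inputs (home, away, time), flat lists.
flat = [np.array(v) for v in ([3], [23], [0.0])]
print([a.shape for a in flat])      # [(1,), (1,), (1,)]

# The same values as lists of lists: one row, one column per input.
nested = [np.array(v) for v in ([[3]], [[23]], [[0.0]])]
print([a.shape for a in nested])    # [(1, 1), (1, 1), (1, 1)]

# Several samples at once: reshape each 1-D column to (num_samples, 1).
home_ids = np.array([3, 7, 11]).reshape(-1, 1)
print(home_ids.shape)               # (3, 1)
```

Only the nested form matches the [?, 1] placeholder shape shown in the error message.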
I am trying to figure out how to train a CNN in Keras without using ImageDataGenerator. Essentially I'm trying to figure out the magic behind the ImageDataGenerator class so that I don't have to rely on it for all my projects.
I have a dataset organized into 2 folders: training_set and test_set. Each of these folders contains 2 sub-folders: cats and dogs.
I am loading them all into memory using Keras' load_img function in a for loop as follows:
trainingImages = []
trainingLabels = []
validationImages = []
validationLabels = []
classes = set()  # class names seen so far (assumed to be initialised like this; not shown in the original)
imgHeight = 32
imgWidth = 32
inputShape = (imgHeight, imgWidth, 3)
print('Loading images into RAM...')
for path in imgPaths:
    # The parent folder name is the class label, e.g. 'cats' or 'dogs'.
    classLabel = path.split(os.path.sep)[-2]
    classes.add(classLabel)
    img = img_to_array(load_img(path, target_size=(imgHeight, imgWidth)))
    # The grandparent folder decides between training and validation data.
    if path.split(os.path.sep)[-3] == 'training_set':
        trainingImages.append(img)
        trainingLabels.append(classLabel)
    else:
        validationImages.append(img)
        validationLabels.append(classLabel)
trainingImages = np.array(trainingImages)
trainingLabels = np.array(trainingLabels)
validationImages = np.array(validationImages)
validationLabels = np.array(validationLabels)
When I print the shapes of trainingImages and trainingLabels, I get:
Shape of trainingImages: (8000, 32, 32, 3)
Shape of trainingLabels: (8000,)
My model looks like this:
model = Sequential()
model.add(Conv2D(
    32, (3, 3), padding="same", input_shape=inputShape))
model.add(Activation("relu"))
model.add(Flatten())
model.add(Dense(len(classes)))
model.add(Activation("softmax"))
And when I compile and try to fit the data, I get:
ValueError: Error when checking target: expected activation_2 to have shape (2,) but got array with shape (1,)
Which tells me my data is not input into the system correctly. How can I properly prepare my data arrays without using ImageDataGenerator?
The error is caused by your model definition, not by the absence of ImageDataGenerator (which I don't see used in the code you have posted). Given the error message, I am assuming that len(classes) == 2. The last layer of your model expects each label to be a vector of size 2, but trainingLabels is a 1-D array with a single value per data point.
To fix this, you can either change your last layer to have just 1 unit, since this is binary classification (in that case use a sigmoid activation with a binary cross-entropy loss instead of softmax, and encode the labels as 0/1):
model.add(Dense(1))
or you can convert your training and validation labels to one-hot vectors (to_categorical expects integer class indices, so string labels must be mapped to integers first):
from keras.utils import to_categorical
training_labels_one_hot = to_categorical(trainingLabels)
validation_labels_one_hot = to_categorical(validationLabels)
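Because the labels in the question are folder names such as 'cats' and 'dogs', they need to become integer indices before they can be one-hot encoded. The following is a minimal NumPy-only sketch of that preparation step; the four-element label array is a made-up stand-in for trainingLabels, and the identity-matrix lookup is an alternative way to build one-hot rows, not the Keras implementation.

```python
import numpy as np

# Stand-in for trainingLabels, which holds folder names in the question.
training_labels = np.array(["cats", "dogs", "dogs", "cats"])

# Map each class name to an integer index (classes sorted alphabetically).
class_names, label_ids = np.unique(training_labels, return_inverse=True)
print(class_names)    # ['cats' 'dogs']
print(label_ids)      # [0 1 1 0]

# One-hot encode: row i of the identity matrix is the vector for class i.
one_hot = np.eye(len(class_names), dtype=np.float32)[label_ids]
print(one_hot.shape)  # (4, 2)
```

The integer array label_ids could equally be passed to to_categorical, or used directly as 0/1 targets for a single-unit sigmoid output.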