My main input feature is a 60x256x256 NumPy array that is meant to generate a 60x256x256 binary mask (also a NumPy array). The binary mask functions as a label, but I do not know how to produce a 3D array or tensor output from my neural network. This is my current code:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=(5, 5), strides=(1, 1),
                                 activation='relu',
                                 input_shape=(60, 256, 256)))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(tf.keras.layers.Conv2D(64, (5, 5), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1000, activation='relu'))
model.add(tf.keras.layers.Dense(256, activation='softmax'))
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.CosineSimilarity(),
    metrics=[tf.keras.metrics.CosineSimilarity()],
)
model.fit(
    train,
    epochs=6,
    validation_data=ds_valid,
)
In short, I want the output of the last layer to match the shape of the input so that it can work with the CosineSimilarity loss function. Any suggestions other than this CNN-based approach would also be very helpful, as it seems CNNs are mostly used for classification.
At the most basic level you can use tf.keras.layers.Reshape. See https://www.tensorflow.org/tutorials/generative/autoencoder
So your last two layers could be:
model.add(tf.keras.layers.Dense(60*256*256))
model.add(tf.keras.layers.Reshape((60, 256, 256)))
However, I think what you're looking for is an autoencoder-type network using tf.keras.layers.Conv2DTranspose layers.
The above link is an intro to Autoencoders and should be a good starting point I think.
I'm not sure about your use case, but I think it's very likely you do want a convolution-based approach: when you flatten the convolutional features, you force the network to forget the spatial structure of the problem (i.e. that it is a picture in 2D space). The fact that your problem is a regression problem doesn't change this.
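For concreteness, here is a minimal encoder-decoder sketch (not the code from the question) that produces a (256, 256, 60) mask. It assumes the 60 slices are moved to the channel axis (channels-last, so the input and labels are transposed to (256, 256, 60)), uses a sigmoid output, and swaps the cosine-similarity loss for binary cross-entropy, which is the usual choice for a per-voxel binary mask:
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(256, 256, 60)),
    # Encoder: shrink the spatial dimensions while growing the feature depth.
    tf.keras.layers.Conv2D(64, 3, strides=2, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(128, 3, strides=2, padding='same', activation='relu'),
    # Decoder: Conv2DTranspose upsamples back to 256x256.
    tf.keras.layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu'),
    # 60 output channels, one per slice; sigmoid gives a per-voxel probability.
    tf.keras.layers.Conv2DTranspose(60, 3, strides=2, padding='same', activation='sigmoid'),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss=tf.keras.losses.BinaryCrossentropy())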
I am trying to build a voice classifier (my voice vs. others) and then use it in a future program. I used a CNN model for this; in training it gave very good results. I converted the audio to a spectrogram so the CNN can work with it.
The problem is in prediction: I do the same conversion of the audio to a spectrogram, but it gives me this error.
ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 129, 1071, 1), found shape=(None, 1071)
Meanwhile, in the model I used this, and it gave no error:
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(129, 1071, 1)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
This is my code for the prediction
### VOICE CLASSIFICATION ###
voice_model = load_model(os.path.abspath('Models/voiceclassify2.model'))
classes = ['Other', 'Bernardo']
sample = os.path.abspath('Voiceclassification/Data/me/5.wav')
samplerate, data = wavfile.read(str(sample))
# convert into a spectrogram
frequencies, times, spectrogram = signal.spectrogram(data, samplerate)
vc_prediction = voice_model.predict(spectrogram)[0]
idx = np.argmax(vc_prediction)
label = classes[idx]
print(label, " | ", vc_prediction[idx]*100, "%")
Any ideas?
EDIT:
After some fiddling this was the solution:
On the one hand there was an error with the final dimension of the input (the 1 in the input_shape). This represents the number of channels (think of the RGB channels in an image). To expand our spectrogram we can use either
spectrogram = spectrogram.reshape(spectrogram.shape + (1,)) or
spectrogram = np.expand_dims(spectrogram, -1).
At this point the shape of spectrogram would be (129, 1071, 1).
On the other hand, during inference the first dimension (129) was being interpreted by TensorFlow as the batch dimension, which is why the model only saw samples of shape (1071,). You can solve this by wrapping the spectrogram in a (one-element) NumPy array like this:
spectrogram = np.array([spectrogram])
Now spectrogram's shape is (1, 129, 1071, 1) which is exactly what we need.
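Putting the two steps together, a minimal sketch of the prediction part (reusing the question's variable names) looks like this:
import numpy as np

spectrogram = np.expand_dims(spectrogram, -1)  # (129, 1071) -> (129, 1071, 1): add the channel axis
spectrogram = np.array([spectrogram])          # (129, 1071, 1) -> (1, 129, 1071, 1): add the batch axis
vc_prediction = voice_model.predict(spectrogram)[0]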
Original:
This is definitely more of a comment than an answer, but I cannot write those due to a lack of reputation, so feel free to move it to comments...
So the problem is that the expected shape (and thus the architecture of your network) and your data's shape don't match.
I guess that's because the predict() call expects you to hand over a batch of samples to evaluate (look at the first dimension of each shape).
You may get around this by wrapping the spectrogram argument inside the predict call with a list: vc_prediction = voice_model.predict([spectrogram])[0].
If this doesn't do the trick, I'd recommend investigating the shapes of the training and evaluation data further; I like to do this at runtime in debug mode.
I have three different Keras models, namely a multi-scale CNN, a single-scale CNN, and a shallow CNN, performing the same task. I want to compare the performance of these models. All of them reach roughly the same accuracy in roughly the same time, as I can see from the model.history values such as acc and val_acc. Now I want to specifically point out some differences among these models. Is there any way to observe their performance in more detail? I also want to make the following graphs, and I don't know how to plot them:
Model accuracy vs. time
Model accuracy vs. number of input batches (my batch size is 5)
My CNN code looks like this:
def Single_Scale_Model():
    model = Sequential()
    model.add(Conv2D(20, (1, 3), activation='relu', kernel_initializer='glorot_uniform', data_format='channels_first', input_shape=(19, 1, 50)))
    model.add(MaxPooling2D((1, 2), data_format='channels_first'))
    model.add(Conv2D(40, (1, 3), activation='tanh', kernel_initializer='glorot_uniform', data_format='channels_first'))
    model.add(MaxPooling2D((1, 2), data_format='channels_first'))
    model.add(Conv2D(60, (1, 3), activation='relu', kernel_initializer='glorot_uniform', data_format='channels_first'))
    model.add(MaxPooling2D((1, 3), data_format='channels_first'))
    model.add(Flatten(data_format='channels_first'))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(4, activation='softmax'))
    #print(model.summary())
    model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
    return model
The other two models are similar, with slight changes in the number of layers.
The plots look the same for the models (each model is meant to achieve 100% accuracy very fast).
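One way to record per-batch accuracy together with wall-clock time for such plots is a custom Keras callback. The sketch below is not from the original post; x_train, y_train, and plt are placeholders for your own data and a matplotlib import:
import time
import tensorflow as tf

class BatchHistory(tf.keras.callbacks.Callback):
    # Records elapsed time and training accuracy after every batch.
    def on_train_begin(self, logs=None):
        self.start_time = time.time()
        self.times, self.batch_acc = [], []

    def on_train_batch_end(self, batch, logs=None):
        self.times.append(time.time() - self.start_time)
        self.batch_acc.append(logs.get('accuracy'))  # key may be 'acc' in older Keras versions

history_cb = BatchHistory()
# model.fit(x_train, y_train, batch_size=5, epochs=10, callbacks=[history_cb])
# plt.plot(history_cb.times, history_cb.batch_acc)                  # accuracy vs. time
# plt.plot(range(len(history_cb.batch_acc)), history_cb.batch_acc)  # accuracy vs. batch index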
I have created a simple convolutional network using the Keras that comes packaged with TensorFlow. I have trained the model and the accuracy looks good.
I have trained the network on 10 different classes. The network is able to differentiate between each of the 10 classes with an accuracy of 0.93.
Now, it is very much possible that there are multiple classes in the same image. Is there a way I could use my trained network to detect multiple objects in the same image? The best thing would be to get the coordinates/bounding-box around the objects detected, so that it is easier to test/visualize.
Here is how I wrote the network:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.BatchNormalization(input_shape=x_train.shape[1:]))
model.add(tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation='elu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(tf.keras.layers.Dropout(0.25))
model.add(tf.keras.layers.BatchNormalization(input_shape=x_train.shape[1:]))
model.add(tf.keras.layers.Conv2D(128, (5, 5), padding='same', activation='elu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Dropout(0.25))
model.add(tf.keras.layers.BatchNormalization(input_shape=x_train.shape[1:]))
model.add(tf.keras.layers.Conv2D(256, (5, 5), padding='same', activation='elu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(tf.keras.layers.Dropout(0.25))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(256))
model.add(tf.keras.layers.Activation('elu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(10))
model.add(tf.keras.layers.Activation('softmax'))
model.compile(
    optimizer=tf.train.AdamOptimizer(learning_rate=1e-3),
    loss=tf.keras.losses.sparse_categorical_crossentropy,
    metrics=['sparse_categorical_accuracy']
)

def train_gen(batch_size):
    while True:
        offset = np.random.randint(0, x_train.shape[0] - batch_size)
        yield x_train[offset:offset + batch_size], y_train[offset:offset + batch_size]

model.fit_generator(
    train_gen(512),
    epochs=15,
    steps_per_epoch=100,
    validation_data=(x_valid, y_valid)
)
This works fine. How could I use this network to detect multiple objects from the 10 classes? Would I have to re-train the network in some way?
In order to teach your model to detect more than one class per image, you will need to perform a few changes to your model and data, and re-train it.
Your final activation will now need to be a sigmoid, since you will not predict a single class probability distribution anymore. Now you want each output neuron to predict a value between 0 and 1, with more than one neuron possibly having values close to 1.
Your loss function should now be binary_crossentropy, since you will treat each output neuron as an independent prediction, which you will compare to the true label.
As I see you have been using sparse_categorical_crossentropy, I assume your labels were integers. You will want to change your label encoding to one-hot style now, with each label being a vector of length num_classes that has 1's at the positions of the classes present in the image and 0's everywhere else.
With these changes, you can now re-train your model to learn to predict more than one class per image.
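A minimal sketch of those changes on the final layer and the compile call (assuming 10 classes as in the question; the rest of the network stays as it is):
# One independent sigmoid per class instead of a softmax over all classes.
model.add(tf.keras.layers.Dense(10, activation='sigmoid'))

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss='binary_crossentropy',      # each class becomes an independent yes/no decision
    metrics=['binary_accuracy'],
)

# Labels become multi-hot vectors, e.g. an image containing classes 2 and 7:
# y = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]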
As for predicting bounding boxes around the objects, that is a very different and much more challenging task. Advanced models such as YOLO or R-CNN can do this, but their structure is much more complex.
I'm new to machine learning and Keras. I made a neural network with Keras for regression that looks like this:
model = Sequential()
model.add(Dense(57, input_dim=44, kernel_initializer='normal',
                activation='relu'))
model.add(Dense(45, activation='relu'))
model.add(Dense(35, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(18, activation='relu'))
model.add(Dense(15, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(1, activation='linear'))
My data after preprocessing has 44 dimensions, so could you please give me an example of how I could make a CNN?
Originally it looks like this: https://scontent.fskp1-1.fna.fbcdn.net/v/t1.0-9/40159383_10204721730878434_598395145989128192_n.jpg?_nc_cat=0&_nc_eui2=AeEYA4Nb3gomElC9qt0kF6Ou86P7jidco_LeHxEkmCB0-oVA9YKVe9VAh41SF25YomKTqKdkS96E18-sTCBidxJdbml4OV7FvFuAOWxI4mRafQ&oh=e81f4f56ebdf15e9c6eefbb078b8a982&oe=5BFD4157
A convolutional neural network is not the best choice in this case. That said, you can do this easily with Conv1D:
model = keras.Sequential()
model.add(keras.layers.Embedding(44, 100))
model.add(keras.layers.Conv1D(50, kernel_size=1, strides=1))
model.add(keras.layers.GlobalAveragePooling1D())
# model.add(keras.layers.Dense(10, activation=tf.nn.relu))
model.add(keras.layers.Dense(1, activation=tf.nn.sigmoid))
To answer your question upfront: I don't think you can use CNNs for your problem. Generally, when people say they are using CNNs, they mean 2D convolution, which operates on 2D spatial data (images). In NLP there is 1D convolution, which people use to find local patterns in sequential data, but I don't think 1D convolution is relevant in your case either. If you come from an ML background, you can think of regression with feed-forward neural networks as polynomial regression: intuitively, you let the network decide which polynomial degree it should use to fit the data properly.
You can add 2D convnet layers like this:
model.add(Conv2D(32, (3, 3), input_shape=(3, 150, 150)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
where
model.add(Conv2D(<feature maps>, (<kernel size>), input_shape=(<input-tensor-shape>)))
But be careful: 2D convnet layers are mathematically different from dense layers, so you can't just stack them. To stack 2D convnet layers with dense layers, you have to flatten the feature maps first (you normally do this at the end to get your "fully connected" layers):
model.add(Flatten()) # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(64))
model.add(Activation('relu'))
You'll find a lot of good tutorials on creating conv nets with Keras. This one, for example, focuses on image recognition. The examples above are taken from this article.
To find out what a convolutional network does, I'd recommend this article.
Edit:
But I share the opinion that it might not be useful to use 2D convnet layers for your example. Your data structure seems rather "flat", and 2D convnets only make sense when you have multidimensional tensors as inputs.
I'm a student in acoustics and really new to deep learning. My goal is to get a good understanding of how a CNN exactly works. There is one part that I don't understand, and I can't find any precise information about it.
My model is something like this:
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', activation='relu', input_shape = input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(48, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(ndim, activation='relu', use_bias=True, batch_size=batchSize, kernel_initializer='glorot_uniform', kernel_regularizer=None))
model.add(Dense(nclasses, activation='softmax', kernel_regularizer=l2(1e-2)))
model.compile(loss='categorical_crossentropy', optimizer=opt)
It works; that's not the problem. I know that the input of the second conv layer consists of 32 feature maps (the output of the first pooling layer).
What exactly is each kernel of the second conv layer convolved with?
Thank you for your time and help!
To my knowledge, if your input image is M*N, then the output of the first Conv2D is M*N with depth 32, that is M*N*32, and it becomes (M/2)*(N/2)*32 after the first max pooling. So the input to the second Conv2D is an (M/2)*(N/2)*32 tensor. The second Conv2D then convolves these 32 feature maps of size (M/2)*(N/2) into an (M/2)*(N/2)*64 tensor.
To specify how the convolution behaves: I used TensorFlow for deep learning, and the explanation below should also help you understand CNNs.
For an input image of size [M, N, 3] (3 is the image depth, e.g. RGB), the first convolution kernel has shape [3, 3, 3, 32]: the first two 3s are the convolution window size, the third 3 is the input depth, and 32 is the output depth, as in your example. The second convolution kernel then has shape [3, 3, 32, 64], where the third number, 32, must match the output depth of the first conv.
From this we can see that each output map is produced by multiple 3*3 windows, one per input depth (channel). In your example, each kernel of the second conv therefore has 3*3*32 weights, which it uses to convolve the 32-channel output of the first conv/pooling block.
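As a quick check of those shapes, here is a small sketch (assuming an arbitrary 3-channel 128x128 input) where Keras reports exactly these kernel shapes:
import tensorflow as tf

m = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', input_shape=(128, 128, 3)),
    tf.keras.layers.Conv2D(64, (3, 3), padding='same'),
])

for layer in m.layers:
    print(layer.kernel.shape)
# (3, 3, 3, 32)  -> a 3x3 window over the 3 input channels, 32 output feature maps
# (3, 3, 32, 64) -> each kernel of the second layer spans all 32 input feature maps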
I hope this is what you wanted and that I have stated it clearly.