I have a set of 100x100 images and an output array whose length matches the number of pixels in the input (i.e. length 10000), where each element is either a 1 or a 0.
I am trying to write a Python program using TensorFlow/Keras to train a CNN on this data; however, I am not sure how to set up the layers to handle it, or what type of network to use.
Currently, I am doing the following (based on the TensorFlow tutorials):
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(100, 100)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10000, activation=tf.nn.softmax)
])
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
However, I can't seem to find what type of activation I should be using for the output layer to enable me to have multiple output values.
How would I set that up?
"I am not sure how to set up the layers to handle it."
Your code is one way to handle that, but as you might read in the literature, it is not the best one. State-of-the-art models for image data usually use 2D convolutional neural networks, e.g.:
img_shape = (100, 100, 1)  # 100x100 grayscale images need an explicit channel dimension
img_input = keras.layers.Input(shape=img_shape)
conv1 = keras.layers.Conv2D(16, 3, activation='relu', padding='same')(img_input)
pool1 = keras.layers.MaxPooling2D(2)(conv1)
conv2 = keras.layers.Conv2D(32, 3, activation='relu', padding='same')(pool1)
pool2 = keras.layers.MaxPooling2D(2)(conv2)
conv3 = keras.layers.Conv2D(64, 3, activation='relu', padding='same')(pool2)
pool3 = keras.layers.MaxPooling2D(2)(conv3)
flatten = keras.layers.Flatten()(pool3)
dense1 = keras.layers.Dense(512, activation='relu')(flatten)
dense2 = keras.layers.Dense(512, activation='relu')(dense1)
drop1 = keras.layers.Dropout(0.2)(dense2)
output = keras.layers.Dense(10000, activation='softmax')(drop1)
model = keras.Model(img_input, output)
"I can't seem to find what type of activation I should be using for the output layer to enable me to have multiple output values."
Softmax is a good choice for single-label classification. It squashes a K-dimensional vector of arbitrary real values into a K-dimensional vector of values in the range (0, 1) that sum to 1.
You can pass the output of your softmax to the top_k function to extract the top k predictions:
softmax_out = tf.nn.softmax(logit)   # logit: raw output of the last Dense layer (pre-activation)
values, indices = tf.nn.top_k(softmax_out, k=5, sorted=True)   # top-5 probabilities and their indices
If you need multi-label classification (more than one output can be 1 at the same time), you should change the above network: the last activation function becomes sigmoid, and the loss becomes binary_crossentropy:
output = keras.layers.Dense(10000, activation='sigmoid')(drop1)
Then threshold the sigmoid outputs with tf.where to extract the predicted labels:
indices = tf.where(output > 0.5)      # indices of the outputs whose probability exceeds 0.5
final_output = tf.gather(x, indices)  # x holds the corresponding label values/names
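Putting that together, a minimal sketch of compiling the multi-label variant (reusing img_input and the sigmoid output defined above; binary cross-entropy treats each of the 10000 outputs as an independent 0/1 label):
multi_label_model = keras.Model(img_input, output)
multi_label_model.compile(optimizer='adam',
                          loss='binary_crossentropy',
                          metrics=['binary_accuracy'])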
Related
I'm working on a machine learning project with convolutional neural networks using TF/Keras in Python, and my goal is to split an image up into patches, run a convolution on each one separately, and then put it back together.
What I can't figure out how to do is run a convolution for each slice of a 3D array.
For example, if I have a tensor of size (500, 100, 100), I want to do a separate convolution for all 500 slices of size (100 x 100). I'm implementing this within a custom Keras layer and want these to be trainable weights. I've tried a few different things:
Using tf.map_fn() to run a convolution for each slice of the array
This doesn't seem to attach weights to each layer separately.
Using the DepthwiseConv2D layer:
This works well for the first call of the layer, but fails when I call the layer a second time with more filters, because it wants to perform the depthwise convolution on each of the previously filtered channels.
This, of course, isn't what I want, because I want one convolution for each set of filters from the previous layer.
Any ideas are appreciated, as I'm truly stuck here. Thank you!
If you have a tensor with shape (500,100,100) and want to feed subsets of this tensor to separate Conv2D layers at the same time, you can do this by defining the Conv2D layers at the same level. First define Lambda layers to split the input, then feed their outputs to Conv2D layers, then concatenate the results.
Let's take a tensor with shape (100,28,28,1) as an example, which we want to split into 2 subsets and apply a Conv2D layer to each subset separately:
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Conv2D, Input, concatenate, Lambda
from tensorflow.keras.models import Model
# define a sample dataset
x = tf.random.uniform((100, 28, 28, 1))
y = tf.random.uniform((100, 1), dtype=tf.int32, minval=0, maxval=9)
ds = tf.data.Dataset.from_tensor_slices((x, y))
ds = ds.batch(16)
def create_nn_model():
    input = Input(shape=(28,28,1))
    b1 = Lambda(lambda a: a[:,:14,:,:], name="first_slice")(input)
    b2 = Lambda(lambda a: a[:,14:,:,:], name="second_slice")(input)
    d1 = Conv2D(64, 2, padding='same', activation='relu', name="conv1_first_slice")(b1)
    d2 = Conv2D(64, 2, padding='same', activation='relu', name="conv2_second_slice")(b2)
    x = concatenate([d1,d2], axis=1)
    x = Flatten()(x)
    x = Dense(64, activation='relu')(x)
    out = Dense(10, activation='softmax')(x)
    model = Model(input, out)
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
model = create_nn_model()
tf.keras.utils.plot_model(model, show_shapes=True)
Here is the plotted model architecture:
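Beyond plotting, a quick sanity check is to train the model directly on the sample ds dataset defined above (the data is random, so the reported accuracy itself is not meaningful):
model.fit(ds, epochs=1)   # one pass over the toy dataset, just to confirm the graph wires up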
I am working with a CNN and my professor wants me to try and include some information that is relevant but isn't available in the images themselves. As of right now, the data is a 1-D array. He thinks that adding it after the flattening layer and before the dense layers should be possible, but neither of us is quite knowledgeable enough for it yet.
model = Sequential()
for i, feat in enumerate(args.conv_f):
    if i==0:
        model.add(Conv2D(feat, input_shape=x[0].shape, kernel_size=3, padding='same', use_bias=False))
    else:
        model.add(Conv2D(feat, kernel_size=3, padding='same', use_bias=False))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=args.conv_act))
    model.add(Conv2D(feat, kernel_size=3, padding='same', use_bias=False))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=args.conv_act))
    model.add(Dropout(args.conv_do[i]))

model.add(Flatten())
#Input code here
denseArgs = {'use_bias':False}
for i, feat in enumerate(args.dense_f):
    model.add(Dense(feat, **denseArgs))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=args.dense_act))
    model.add(Dropout(args.dense_do[i]))
model.add(Dense(1))
We could be wrong, obviously, so any help is appreciated! Thanks!
One approach I know of requires the use of the functional API of keras.
This means you would have to drop the Sequential approach you are currently using.
Using a toy model as an example, let the block:
img_input = Input((64, 64, 1))
model = Conv2D(20, (5, 5))(img_input)
model = MaxPooling2D((2, 2))(model)
model = Flatten()(model)
be the convolutional layers of a CNN with final flattening.
It is possible to add information by concatenating the last model layer with the new information. The new information can be fed in through a second input (here af_input), which is just an input layer.
As an example:
af_input = Input(shape=(2,))
model = Concatenate()([model, af_input])
model = Dense(120, activation='relu')(model)
model = Dropout(0.1)(model)
model = Dense(100, activation='relu')(model)
predictions = Dense(2)(model)
fullmodel = Model(inputs=[img_input,af_input], outputs=predictions)
So now the results of the flatten layer of the CNN will be concatenated with a vector of extra information (here 2 features).
You can then keep adding layers to the networks as usual.
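As a usage sketch (the array names and shapes below are hypothetical, purely to illustrate how the two inputs are passed), training the combined model could look like this:
import numpy as np

images = np.random.rand(8, 64, 64, 1)   # dummy images matching the toy Input above
extra_info = np.random.rand(8, 2)       # the extra 1-D information, 2 features per sample
targets = np.random.rand(8, 2)          # dummy targets for the 2-unit output

fullmodel.compile(optimizer='adam', loss='mse')
fullmodel.fit([images, extra_info], targets, epochs=1, batch_size=4)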
I suggest you check the stackoverflow link:
How to concatenate two layers in keras?
for another example and a good explanation.
I am trying to tie together a CNN layer with 2 LSTM layers and ctc_batch_cost for loss, but I'm encountering some problems. My model is supposed to work with grayscale images.
During my debugging I've figured out that if I use just a CNN layer that keeps the output size equal to the input size, followed by the LSTMs and CTC, the model is able to train:
# === Without MaxPool2D ===
inp = Input(name='inp', shape=(128, 32, 1))
cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)
# Go from Bx128x32x1 to Bx128x32 (B x TimeSteps x Features)
rnn_inp = Reshape((128, 32))(cnn)  # reshape the Conv2D output directly (no pooling in this version)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)
# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)
# Model compiles, calling fit works!
But when I add a MaxPool2D layer that halves the dimensions, I get an error sequence_length(0) <= 64, similar to the one presented here.
# === With MaxPool2D ===
inp = Input(name='inp', shape=(128, 32, 1))
cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)
maxp = MaxPool2D(name='maxp', pool_size=2, strides=2, padding='valid')(cnn) # -> 64x16x1
# Go from Bx64x16x1 to Bx64x16 (B x TimeSteps x Features)
rnn_inp = Reshape((64, 16))(maxp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)
# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)
# Model compiles, but calling fit crashes with:
# InvalidArgumentError: sequence_length(0) <= 64
# [[{{node ctc_loss_1/CTCLoss}}]]
After struggling for about 3 days with this problem, I posted the above question here on Stack Overflow. About 2 hours after posting the question I finally figured it out.
TL;DR Solution:
If you're using ctc_batch_cost:
Make sure you're passing the lengths (numbers of timesteps) of the sequences entering your RNNs as their inputs for the input_length argument.
If you're using ctc_loss:
Make sure you're passing the lengths (numbers of timesteps) of the sequences entering your RNNs as their inputs for the logit_length argument.
Solution:
The solution lies in the documentation, which, being relatively sparse, can be cryptic for a machine learning newbie like myself.
The TensorFlow documentation for ctc_batch_cost reads:
tf.keras.backend.ctc_batch_cost(
y_true, y_pred, input_length, label_length
)
...
input_length tensor (samples, 1) containing the sequence length for
each batch item in y_pred.
...
input_length corresponds to logit_length from ctc_loss function's TensorFlow documentation:
tf.nn.ctc_loss(
labels, logits, label_length, logit_length, logits_time_major=True, unique=None,
blank_index=None, name=None
)
...
logit_length tensor of shape [batch_size] Length of input sequence in
logits.
...
That's where it clicked, at the word logit. So, the argument for input_length or logit_length is supposed to be a tensor/container (in my case, numpy array) of the lengths (i.e. number of timesteps) of the sequences entering the RNN (in my case LSTM) as input.
I was originally making the mistake of considering the required length to be the width of the grayscale images that act as input for the whole network (CNN + MaxPool2D + RNN), but because the MaxPool2D layer creates a tensor of different dimensions for the RNN's input, the ctc loss function crashes.
Now fit runs without crashing.
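To make the fix concrete, here is a minimal, self-contained sketch of calling ctc_batch_cost with dummy data; the shapes are hypothetical, and the key point is that input_length holds the post-MaxPool2D sequence length (64), not the original image width (128):
import numpy as np
from tensorflow.keras import backend as K

batch_size, num_timesteps, num_classes, max_label_len = 4, 64, 80, 10

# Dummy RNN output (softmax over classes at each timestep) and dummy integer labels
y_pred = np.random.rand(batch_size, num_timesteps, num_classes).astype('float32')
y_pred /= y_pred.sum(axis=-1, keepdims=True)
y_true = np.random.randint(0, num_classes - 1, (batch_size, max_label_len)).astype('float32')

# input_length: number of timesteps entering the RNN (64 after pooling, NOT 128)
input_length = np.full((batch_size, 1), num_timesteps, dtype='int64')
label_length = np.full((batch_size, 1), max_label_len, dtype='int64')

loss = K.ctc_batch_cost(y_true, y_pred, input_length, label_length)
print(loss.shape)  # (4, 1): one CTC loss value per sample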
I am working on a regression CNN using Keras/Tensorflow. I have a multi-output feed-forward model that I have trained up with some success. The model takes in a 201x201 grayscale image and returns two regression targets.
Here is an example of an input/target pair: an input image associated with the targets (z=562.59, a=4.53).
There exists an analytical solution for this problem, so I know it's solvable.
Here is the model architecture:
model_input = keras.Input(shape=input_shape, name='image')
x = model_input
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (2,2))(x)
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (2,2))(x)
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (2,2))(x)
x = Conv2D(16, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size = (4,4))(x)
x = Flatten()(x)
model_outputs = list()
out_names = ['z', 'a']
for i in range(2):
    out_name = out_names[i]
    local_output = x
    local_output = Dense(10, activation='relu')(local_output)
    local_output = Dropout(0.2)(local_output)
    local_output = Dense(units=1, activation='linear', name=out_name)(local_output)
    model_outputs.append(local_output)
model = Model(model_input, model_outputs)
model.compile(loss = 'mean_squared_error', optimizer='adam', loss_weights = [1,1])
My targets are on different scales, so I normalized one of them (named 'a') to the range [0, 1] for training. Here is how I rescale:
def rescale(min, max, list):
    scalar = 1./(max-min)
    list = (list-min)*scalar
    return list
Where min,max for each parameter are known a priori and are constant.
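For completeness, the inverse transform (unscale is a hypothetical helper, not part of my actual code) used when mapping predictions back to physical units is just:
def unscale(min, max, list):
    # Inverse of rescale(): map values in [0, 1] back to the original [min, max] range
    return list*(max-min) + min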
Here is how I trained:
model.fit({'image': x_train},
          {'z': z_train, 'a': a_train},
          batch_size=32,
          epochs=20,
          verbose=1,
          validation_data=({'image': x_test},
                           {'z': z_test, 'a': a_test}))
When I predict for 'a', I get a fairly good accuracy, but with an offset:
This is a fairly easy thing to fix: I just apply a linear fit to the predictions and invert it to rescale them:
But I can't think of a reason why this would be happening in the first place. I've used this same model architecture for other problems, and I get that same offset again. Has anyone seen this sort of thing before?
EDIT: This offset occurs in multiple different models of mine, which each predict different parameters but are rescaled/preprocessed in the same way. It happens regardless of how many epochs I train for, with more training resulting in predictions hugging the green line (in the first graph) more closely.
As a temporary work-around, I trained a single-node model that takes the original model's prediction as input and the ground truth as output. This trained up nicely and corrects the offset. What's strange, though, is that I can apply this rescale model to ANY of the models with this issue, and it corrects the offset equally well.
Essentially: the offset has the same weight for multiple different models, which predict completely different parameters. This makes me think it has something to do with the activation or regularization.
I would like to combine 2 neural networks which output class probabilities.
One says that it is a cat on the image.
The second says that the cat has a collar.
How do I use the softmax activation function on the output of the neural network?
Please, see the picture to understand the main idea:
You can use the functional API to create a multi-output network. Essentially every output will be a separate prediction. Something along the lines of:
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from keras.models import Model

inp = Input(shape=(w, h, c))                     # image input ('in' is a reserved word, so use another name)
x = Conv2D(32, (3, 3), activation='relu')(inp)   # some convolutional layers to extract features
x = MaxPooling2D((2, 2))(x)
latent = Flatten()(x)
# Share the underlying features between the two predictions
animal = Dense(2, activation='softmax')(latent)
collar = Dense(2, activation='softmax')(latent)
model = Model(inp, [animal, collar])
model.compile(loss='categorical_crossentropy', optimizer='adam')
You can have as many separate outputs as you like. If you have only binary features, you can have a single vector output as well, Dense(2, activation='sigmoid'): the first entry could predict cat or not, while the second predicts whether it has a collar. This would be a multi-class multi-label setup.
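A minimal sketch of that single-output, multi-label alternative (hypothetical, reusing inp and latent from the snippet above):
out = Dense(2, activation='sigmoid')(latent)   # out[:, 0]: cat / no cat, out[:, 1]: collar / no collar
model = Model(inp, out)
model.compile(loss='binary_crossentropy', optimizer='adam')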
Just create two separate Dense layers (with softmax activation) at the end of your model, e.g.:
from keras.layers import Input, Dense, Conv2D, Flatten
from keras.models import Model
# Input example:
inputs = Input(shape=(64, 64, 3))
# Example of model:
x = Conv2D(16, (3, 3), padding='same')(inputs)
x = Flatten()(x)   # flatten so each image yields a single feature vector before the Dense layers
x = Dense(512, activation='relu')(x)
x = Dense(64, activation='relu')(x)
# ... (replace with your actual layers)
# Then add two separate layers taking the previous output and generating two estimations:
cat_predictions = Dense(2, activation='softmax')(x)
collar_predictions = Dense(2, activation='softmax')(x)
model = Model(inputs=inputs, outputs=[cat_predictions, collar_predictions])
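A usage sketch with dummy data (the array names and shapes here are hypothetical) showing how the two outputs are compiled and fit, one label array per output:
import numpy as np
from keras.utils import to_categorical

images = np.random.rand(8, 64, 64, 3)                                       # dummy RGB images
cat_labels = to_categorical(np.random.randint(0, 2, 8), num_classes=2)      # one-hot cat / no cat
collar_labels = to_categorical(np.random.randint(0, 2, 8), num_classes=2)   # one-hot collar / no collar

# A single loss string is applied to both outputs; the list of label arrays follows the output order
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(images, [cat_labels, collar_labels], epochs=1, batch_size=4)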