With reference to the concept of Capsule Networks, I am trying to extract just the classification module from Intel's implementation of Capsules in Keras, as I don't need the decoder or reconstruction part that is provided in the linked code.
My attempt:
So I just commented out the decoder part of the network:
#dec = Dense(512, activation='relu')(mask)
#dec = Dense(1024, activation='relu')(dec)
#dec = Dense(784, activation='sigmoid')(dec)
#dec = Reshape(input_shape)(dec)
and removed the decoder output from the Model definition in the following line:
#model = Model([x, mask_input], [output_capsule, dec])
model = Model([x, mask_input], [output_capsule])
model.compile(optimizer='adam', loss=[ margin_loss, 'mae' ], metrics=[ margin_loss, 'mae', 'accuracy'])
model.fit([X, Y], [Y, X], batch_size=128, epochs=3, validation_split=0.2)
Error
I am getting the following error.
ValueError: When passing a list as loss, it should have one entry per model outputs. The model has 1 outputs, but you passed loss=[<function margin_loss at 0x0000020C3E7A30D0>, 'mae']
Help Required:
Can somebody guide me on how to use only the classification part of the module? I have images of dimension 90 x 90, and I want to use the classification part to check the accuracy; later on I would analyze the individual capsules.
To answer why you are getting that error:
The original model has two outputs, and each output has its own loss function. In this case, the outputs are [output_capsule, dec] and the corresponding loss functions are [margin_loss, 'mae']. Because you removed the dec output, you also need to remove its loss function from the compilation of the model.
In addition, you need to make sure that you are passing the correct inputs and outputs everywhere else; in the code you have here, that matters for model.fit. The first argument is the inputs. You still have two inputs to your model (whether you actually want two inputs is a separate question), so passing a list of two inputs to fit is correct. The second argument, however, is the desired outputs. You are currently passing two output arrays, but your model now has only one output. Since you removed your model's second output, you should remove the second output array that you pass to fit, i.e. X.
Modified code:
model = Model([x, mask_input], [output_capsule])
model.compile(optimizer='adam', loss=[margin_loss], metrics=[margin_loss, 'mae', 'accuracy'])
model.fit([X, Y], [Y], batch_size=128, epochs=3, validation_split=0.2)
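More generally, Keras also accepts a dict mapping output-layer names to losses, which makes the output-to-loss pairing explicit. A minimal sketch, assuming the capsule output layer is named 'capsnet' (the actual layer name in the linked script may differ):

model = Model([x, mask_input], [output_capsule])
model.compile(optimizer='adam',
              loss={'capsnet': margin_loss},  # key must match the output layer's name
              metrics=['accuracy'])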
Related
Say I have a function F that takes in a parameter vector P (say, a 5-element vector) and produces a (numerical) time series Y[t] of length T (e.g. T = 100, so t = 1, ..., 100). The function could be complicated (e.g. enzyme reaction models).
I want to make a neural network that predicts the output (Y[t]) that would result from feeding a new parameter set (P') into the function. How can this be done?
A simple feed-forward network can work, but it requires a very large number of output nodes and doesn't take into account the temporal correlations between points. Would it be possible, or better, to use an RNN or a Transformer instead?
Using an RNN might work for you. Here is some example code in Keras to get you started:
import tensorflow as tf

param_length = 5
time_length = 100
hidden_size = 20

model = tf.keras.Sequential([
    # Encode the input parameters into a hidden state.
    tf.keras.layers.Dense(hidden_size, input_shape=[param_length]),
    # Repeat the hidden state once per time step to seed a sequence.
    tf.keras.layers.RepeatVector(time_length),
    # Generate the sequence.
    tf.keras.layers.LSTM(32, return_sequences=True),
    # Map each time step's hidden state to a single output value.
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))
])
model.compile(loss="mse", optimizer="nadam")
model.fit(train_x, train_y, validation_data=(val_x, val_y), epochs=10)
The first Dense layer converts the input parameters to a hidden state. Then the LSTM units generate the time sequence. You will need to experiment with hyperparameters such as the number of Dense and LSTM layers, the size of the hidden layers, etc.
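As a quick sanity check on shapes, something like the following should run against the model above (the dummy arrays are placeholders, not from the original code; the model expects one target value per time step):

import numpy as np

train_x = np.random.rand(64, param_length)    # (batch, 5): one parameter vector per sample
train_y = np.random.rand(64, time_length, 1)  # (batch, 100, 1): one series value per time step
model.fit(train_x, train_y, epochs=1)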
One more thing you can try is a different loss function, such as the Huber loss (which is more robust to outliers than MSE), combined with early stopping:
early_stopping_cb = tf.keras.callbacks.EarlyStopping(
    monitor="val_mae", patience=50, restore_best_weights=True)

model.compile(loss=tf.keras.losses.Huber(), optimizer="nadam", metrics=["mae"])
history = model.fit(train_x, train_y, validation_data=(val_x, val_y), epochs=500,
                    callbacks=[early_stopping_cb])
train = timeseries_dataset_from_array(
    trainX, trainY, sequence_length=len(train), batch_size=batchTrain
)
val = timeseries_dataset_from_array(
    valX, valY, sequence_length=len(val), batch_size=batchVal
)
test = timeseries_dataset_from_array(
    testX, testY, sequence_length=len(test), batch_size=batchTest
)
return train, val, test
train, val, test = preprocessor()
model=Sequential()
model.add(LSTM(4,return_sequences=True))
model.add(Dense(2,activation='softmax'))
model.compile(optimizer='Adam', loss="mae")
model.fit(train, epochs=200, verbose=2, validation_data=val, shuffle=False)
I'm trying to make an LSTM from time-series data, and when I run the above, the loss doesn't change at all. I'm definitely struggling to understand how LSTM input/output shapes work. I've read as much online as I could find, but I can't seem to get the model to learn. I'm under the impression that the first argument is the dimensionality of the output space. I want the LSTM to return the whole sequence to the output function.
There are several problems in your model. Your final layer is a Dense layer with two units, and you are using softmax, which should be replaced by sigmoid. Since you are using softmax, I guess that you are using this model for classification and not regression.
If you are using the model for a classification task, then you should use BinaryCrossentropy, not MeanAbsoluteError, as the loss.
To answer the question in full detail, you would need to post additional information, for example what your target variables are.
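A minimal sketch of those fixes, assuming binary 0/1 targets and the train/val datasets from the question; one idiomatic reading of the advice is a single sigmoid unit with binary cross-entropy (keep return_sequences=True only if you want a prediction per time step):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(4, return_sequences=True))
model.add(Dense(1, activation='sigmoid'))  # one unit + sigmoid for a binary output
model.compile(optimizer='Adam',
              loss='binary_crossentropy',  # a classification loss instead of MAE
              metrics=['accuracy'])
model.fit(train, epochs=200, verbose=2, validation_data=val, shuffle=False)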
I would like to train two autoencoders jointly and connect their activations in the deepest layer.
How can I add all the terms in one loss function?
Assume:
diffLR = Lambda(lambda x: abs(x[0] - x[1]))([model1_act7, model2_act5])
model = Model(inputs=[in1, in2], outputs=[diffLR, model1_conv15, model2_conv10])
model.compile(loss=['MAE', 'mean_squared_error', 'mean_squared_error'],
              optimizer='SGD',
              metrics=['mae', rmse])
model.fit([x_train_n, y_train_n], [yM1, x_train_n, y_train_n], batch_size=10, epochs=350, validation_split=0.2, shuffle=True) #, callbacks=[es])
The two networks are convolutional autoencoders mapping x->x and y->y. The Lambda layer connects the latent spaces of the two networks. The target for diffLR is to train the network to the point where the two feature spaces represent the same distribution. (yM1 is a zero matrix of the same size as the latent feature space.)
Right now each output is optimized separately (or at least I think they are optimized separately); I would like to join them in a single loss function like this:
def my_loss(z, x, y, z_pred, x_pred, y_pred):
    loss = (backend.sqrt(backend.mean(backend.square(x_pred - x)))
            + backend.sqrt(backend.mean(backend.square(y_pred - y)))
            + backend.sqrt(backend.mean(backend.square(z_pred - z))))
    return loss
model.compile(loss=[my_loss],
              optimizer='SGD',
              metrics=['mae', rmse])
I get this error:
ValueError: When passing a list as loss, it should have one entry per model outputs. The model has 3 outputs, but you passed loss=[<function my_loss at 0x7fa3d17f2158>]
or
model.compile(loss=my_loss,
              optimizer='SGD',
              metrics=['mae', rmse])
TypeError: my_loss() missing 4 required positional arguments: 'y', 'z_pred', 'x_pred', and 'y_pred'
Is this possible to do? How can I do this?
So, what you are doing is computing the root-mean-square error on each of your n = 3 outputs, followed by a weighted sum (with equal weights in your case).
As the Error message says clearly:
ValueError: When passing a list as loss, it should have one entry per
model outputs. The model has 3 outputs, but you passed....
By passing a list of 3 loss functions (they may be the same or different) when compiling your model, you can do the same thing you are doing in your custom loss function. Additionally, you can define the weight of each individual loss by passing the loss_weights argument. For example, you can do the following:
def my_loss(y_true, y_pred):
    return backend.sqrt(backend.mean(backend.square(y_pred - y_true)))

model.compile(loss=[my_loss, my_loss, my_loss],  # you can pass 3 different (custom) loss functions as well
              loss_weights=[1.0, 1.0, 1.0],      # default weight is 1 for each output
              optimizer='SGD',
              metrics=['mae', rmse])
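With this setup, Keras minimizes the weighted sum of the per-output losses, i.e. loss_weights[0] * my_loss(z, z_pred) + loss_weights[1] * my_loss(x, x_pred) + loss_weights[2] * my_loss(y, y_pred), which is exactly what the hand-written six-argument my_loss was computing in a single function.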
I am building a very simple DNN binary classification model, which I define as:
def __build_model(self, vocabulary_size):
    model = Sequential()
    model.add(Embedding(vocabulary_size, 12, input_length=vocabulary_size))
    model.add(Flatten())
    model.add(Dense(16, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
    return model
with training like:
def __train_model(self, model, model_data, training_data, labels):
    hist = model.fit(training_data, labels, epochs=20, verbose=True, validation_split=0.2)
    model.save('models/' + model_data['Key'] + '.h5')
    return model
The idea is to feed it tf-idf-vectorized text after training and predict whether it belongs to class 1 or 0. Sadly, when I run predict against it, I get an array of predictions instead of the expected single probability of the article belonging to class 1. The array values seem very uniform. I assume this comes from some mistake in the model. I try producing a prediction like so:
self._tokenizer.fit_on_texts(asset_article_data.content)
predicted_post_vector = self._tokenizer.texts_to_matrix(post, mode='tfidf')
return model.predict(predicted_post_vector) > 0.60  # returns an array here instead of True/False
The training data is vectorized text itself. What might be off?
The mistake you are probably making is that post is a string, whereas it should be a list of strings. That's why, as you mentioned, model.predict() produces a lot of values: the tokenizer has iterated over the characters of post and produced a tf-idf vector for each of them! Just put it in a list and the problem should be resolved:
... = self._tokenizer.texts_to_matrix([post], ...)
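To illustrate the difference (a sketch, reusing the tokenizer fitted above):

self._tokenizer.texts_to_matrix(post, mode='tfidf')    # string: one tf-idf row per character of post
self._tokenizer.texts_to_matrix([post], mode='tfidf')  # list: one tf-idf row for the whole text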
There are two ways of solving your issue: use model.predict_classes, as Simon said, or use np.argmax:
np.argmax(model.predict(predicted_post_vector), axis=1)
I would personally use pd.get_dummies(y_train) on the target variable and adjust the output layer to Dense(2, activation='sigmoid').
Keras is built to predict outputs for more than one input at a time; that's why the output is an array. Refer to the Keras documentation (predict returns NumPy array(s) of predictions). So if you need a single output, just select the first element of the array:
model.predict(predicted_post_vector)[0] > 0.60
I have built and trained a sequential binary classification model using Keras layers. Everything seems to work fine until I start using the predict method, which gives me a weird exponential value rather than probabilities of the two classes.
This is what I get after training and calling the predict method on the model: a single value like [[2.977094e-12]].
This classification model has two classes, let's say cat and dog, so I was expecting the result to be something like [99.9999, 0.0001], suggesting that it's a cat. I'm not sure how to interpret the value that I'm getting back instead.
Here is the code I have:
# Get the data.
(train_texts, train_labels), (val_texts, val_labels) = data
train_labels = np.asarray(train_labels).astype('float32')
val_labels = np.asarray(val_labels).astype('float32')

# Vectorize the data.
train_texts, val_texts, word_index = vectorize_data.sequence_vectorize(
    train_texts, val_texts)

# Build the model architecture (add layers to the model).
model = build_model.simple_model_layers(train_texts.shape[1:])

# Set up and compile the model with the optimizer, loss and metrics functions.
model = build_model.simple_model_compile(model=model)

# This is when the learning happens.
history = model.fit(train_texts,
                    train_labels,
                    epochs=EPOCHS,
                    validation_data=(val_texts, val_labels),
                    verbose=VERBOSE_OFF, batch_size=BATCH_SIZE)
print('Validation accuracy: {acc}, loss: {loss}'.format(
    acc=history.history['val_acc'][-1], loss=history.history['val_loss'][-1]))

# Load the data to predict on.
test_text = None
with open('text_req.pickle', 'rb') as pickle_file:
    test_text = pickle.load(pickle_file)

print("Let's make a prediction of this requirement:")
prediction = model.predict(test_text, batch_size=None, verbose=0, steps=None)
print(prediction)
Here is what the simple model function looks like:
model = models.Sequential()
model.add(Dense(26, activation='relu', input_shape=input_shape))
model.add(Dense(16, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
return model
Gradient descent settings:
optimizer='adam', loss='binary_crossentropy'
The sample data is of string type, which I convert to constant-size matrices of 1's and 0's using padding. The features have two classes, so the labels are simply 1 and 0. That's all for the data. In my opinion, the data doesn't seem to be the problem; it could be something more trivial that I'm overlooking and have failed to recognize.
Thank you guys, this last problem was resolved, but I need a better understanding of this:
I read that sigmoid returns the probabilities of all possible classes and that all the probabilities should add up to 1. The values that I am getting back are:
Validation accuracy: 0.792168688343232, loss: 2.8360600299145804
Let's make a prediction of this requirement:
[[2.7182817, 1.       ]
 [2.7182817, 1.       ]
 [1.       , 2.7182817]
 [1.       , 2.7182817]]
They don't add up to 1, and looking at these values, it is not intuitive what to make of a 1 or anything else.
Your model only has one output. If your training labels are set to 0 for cat and 1 for dog, then an output like [[2.977094e-12]] means the network thinks it's a cat. If you want the probabilities of the two classes like you were expecting, then you need to change the output of your model as follows:
model = models.Sequential()
model.add(Dense(26, activation='relu', input_shape=input_shape))
model.add(Dense(16, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(2, activation='softmax'))
Your labels would also need to change to [1, 0] and [0, 1] for cat and dog respectively.
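One way to convert the existing 0/1 integer labels to that two-column form (a sketch using Keras's to_categorical; variable names as in the question):

from tensorflow.keras.utils import to_categorical

train_labels = to_categorical(train_labels, num_classes=2)  # 0 -> [1, 0], 1 -> [0, 1]
val_labels = to_categorical(val_labels, num_classes=2)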
I want to clarify that you don't get a weird exponential value, you just get a weird value: the e is scientific notation for ×10, so you basically get 2.7 × 10^-12. I'd love to help, but I can't check your data or your model. I tried to Google some parts of your code in the hope of finding some clarification, but I can't seem to find out what's under the hood of these two lines:
model = build_model.simple_model_layers(train_texts.shape[1:])
model = build_model.simple_model_compile(model=model)
I have no clue what network has been built; I'd like to know at least the loss function and the full final layer, which would already be much to go on. Are you also sure that your data is correct?
EDIT:
Sigmoid does not do what you describe; softmax does. Sigmoid is often used for multilabel classification, since it can mark multiple labels as true. A sigmoid output could look like [0.99, 0.3], because it treats each label separately. Softmax, on the other hand, does not: a softmax output could look like [0.99, 0.01], and the sum of all the probabilities is always 1.
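A quick numeric illustration of the difference (a sketch; the logits are made up):

import tensorflow as tf

logits = tf.constant([2.0, -1.0])
print(tf.sigmoid(logits))     # ~[0.88, 0.27]: per-label probabilities, need not sum to 1
print(tf.nn.softmax(logits))  # ~[0.95, 0.05]: a distribution that always sums to 1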
That should clear up the confusion. Now, about your output: I have no clue what it is; it should be between 0 and 1, unless I'm missing something here.
To answer the data question you asked K. Streutker:
The goal of a neural network is to reproduce the labels you feed it, on new data. If you want a probability distribution, then you also need to feed one: every cat image should have the label [1, 0] and every dog image [0, 1], or reversed, whatever you like. Then, once the model is trained, it will be able to give you two outputs that make sense. The loss function, most likely cross-entropy, takes these labels and the output of your model and tries to minimize the difference over time. So this is roughly what you need:
image (dog)--> model --> loss --> optimizer that updates the weights
labels ([0,1]) ------------------┘
Then predicting will look like this:
image --> model --> labels
Hope I helped a bit!