Layering softmax classifier into RNN autoencoder - python

The paper I'm implementing uses an RNN autoencoder to classify anomalous network data (binary classification). They first train the model unsupervised, and then they describe this process:

Next, fine-tuning training (supervised) is conducted to train the last layer of the network using labeled samples. Implementing the fine-tuning using supervised training criterion can further optimize the whole network. We use softmax regression layer with two channels at the top layer.
Currently, I've implemented the autoencoder:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Model

class AnomalyDetector(Model):
    def __init__(self):
        super(AnomalyDetector, self).__init__()
        self.encoder = tf.keras.Sequential([
            layers.Dense(64, activation="relu"),
            layers.Dense(32, activation="relu"),
            layers.Dense(16, activation="relu"),
            layers.Dense(8, activation="relu")])
        self.decoder = tf.keras.Sequential([
            layers.Dense(16, activation="relu"),
            layers.Dense(32, activation="relu"),
            layers.Dense(64, activation="relu"),
            layers.Dense(79, activation="relu")
        ])
How do you implement the softmax regression layer in TensorFlow?
I'm having trouble understanding the process: am I supposed to add another layer to the autoencoder, or add another function to the class?

Just in case anyone visits this in the future:
You can create a softmax layer by changing the activation. I chose a sigmoid activation in my case, since sigmoid is equivalent to a two-element softmax, as per the documentation.
class AnomalyDetector(Model):
    def __init__(self):
        super(AnomalyDetector, self).__init__()
        self.pretrained = False
        self.finished_training = False
        self.encoder = tf.keras.Sequential([
            layers.SimpleRNN(64, activation="relu", return_sequences=True),
            layers.SimpleRNN(32, activation="relu", return_sequences=True),
            layers.SimpleRNN(16, activation="relu", return_sequences=True),
            layers.SimpleRNN(8, activation="relu", return_sequences=True)])
        self.decoder = tf.keras.Sequential([
            layers.SimpleRNN(16, activation="relu", return_sequences=True),
            layers.SimpleRNN(32, activation="relu", return_sequences=True),
            layers.SimpleRNN(64, activation="relu", return_sequences=True),
            layers.SimpleRNN(79, activation="relu", return_sequences=True),
            layers.SimpleRNN(1, activation="sigmoid")])  # classification head: sigmoid == two-way softmax
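The supervised fine-tuning step the paper describes still needs the encoder and decoder wired together and trained on labeled samples. Below is a minimal sketch of how that stage could look, assuming labeled windows with 3 timesteps and 79 features; the Input/Model wrapper, the shapes, and the random data are my own illustration rather than part of the original answer.

import numpy as np

# hypothetical labeled windows: (samples, timesteps, features) = (1000, 3, 79)
x_labeled = np.random.uniform(0, 1, (1000, 3, 79)).astype("float32")
y_labeled = np.random.randint(0, 2, size=(1000, 1))

model = AnomalyDetector()

# wire encoder -> decoder into one trainable classifier; the decoder's final
# sigmoid unit acts as the two-class (softmax-equivalent) output
inputs = tf.keras.Input(shape=(3, 79))
outputs = model.decoder(model.encoder(inputs))
classifier = tf.keras.Model(inputs, outputs)

# optionally freeze the pretrained encoder so only the top of the network is fine-tuned
# model.encoder.trainable = False

classifier.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
classifier.fit(x_labeled, y_labeled, epochs=3, batch_size=32)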

Related

Using LSTM layer without Embedding

I have been trying to train a model for tf.keras.datasets.imdb using LSTM in TensorFlow.
After some processing I have input -> x_train shape: (25000, 100).
Since I cannot feed it into an LSTM layer directly, I used a Lambda function to change the dimension of the input:
model = Sequential([
    Lambda(lambda x: tf.expand_dims(x, axis=-1)),
    LSTM(62, dropout=0.2, recurrent_dropout=0.2, return_sequences=True),
    LSTM(32),
    Dense(1, activation='sigmoid')
])
But I am getting a horrible accuracy (around 55%).
Upon using an Embedding layer, however, the accuracy increases to 90% with the same hyperparameters:
model = Sequential([
    Embedding(5000, 32),
    LSTM(62, dropout=0.2, recurrent_dropout=0.2, return_sequences=True),
    LSTM(32),
    Dense(1, activation='sigmoid')
])
Am I doing something wrong in the first case?

How do I save the embeddings my model creates during training?

I am doing an image similarity problem and want to save the image embeddings the model creates during training. Is there a way I can capture the embeddings before they are passed into the loss function?
Here is my model:
import tensorflow as tf
import tensorflow_addons as tfa

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(filters=128, kernel_size=2, padding='same', activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Conv2D(filters=64, kernel_size=2, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation=None),  # no activation on final dense layer
    tf.keras.layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=1))  # L2-normalize embeddings
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tfa.losses.TripletSemiHardLoss(),
    metrics=["accuracy"])

history = model.fit(train_dataset, epochs=3, validation_data=test_dataset)
To be clear, I do not want just the outputs from the final layer shown here; I want to save the final resulting vector that is output from my model.
You can create a custom callback and, at the end of each training batch, call your model on a batch of inputs, get the output embeddings, and save them.
class SaveEmbeddingCallback(tf.keras.callbacks.Callback):
    def __init__(self, data):
        super().__init__()
        self.data = data  # `batch` below is only an index, so keep a reference to the data to embed

    def on_train_batch_end(self, batch, logs=None):
        embedding = self.model.predict(self.data)
        # IN THIS STAGE YOU HAVE THE OUTPUT OF THE MODEL
        # YOU CAN SAVE IT OR WHATEVER YOU WANT
        ...
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tfa.losses.TripletSemiHardLoss(),
    metrics=["accuracy"])

history = model.fit(train_dataset,
                    epochs=3,
                    validation_data=test_dataset,
                    callbacks=[SaveEmbeddingCallback(train_dataset)])

For more information on Keras custom callbacks, read this TensorFlow tutorial.
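If calling predict after every batch is too slow, a per-epoch variant of the same idea also works; the sketch below is my own addition (the file naming and np.save usage are illustrative, not part of the original answer):

import numpy as np

class SaveEmbeddingPerEpochCallback(tf.keras.callbacks.Callback):
    def __init__(self, data, out_prefix="embeddings"):
        super().__init__()
        self.data = data              # dataset (or array) to embed
        self.out_prefix = out_prefix  # hypothetical output file prefix

    def on_epoch_end(self, epoch, logs=None):
        # embed the stored data once per epoch and dump the vectors to disk
        embeddings = self.model.predict(self.data)
        np.save(f"{self.out_prefix}_epoch_{epoch}.npy", embeddings)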

Extract subnetwork from Keras Sequential model

I trained a very simple autoencoder network similar to this example:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential([
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(8, activation="relu", name="latent_space"),
    layers.Dense(16, activation="relu"),
    layers.Dense(32, activation="relu", name="decode_32"),
    layers.Dense(64, activation="relu"),
    layers.Dense(128, activation="sigmoid"),
])
model.compile(...)
model.fit(...)
# Extract subnetwork here after training
I would like to know if it is possible to feed data to the latent_space layer such that I can afterwards extract the activations from layer decode_32? Ideally I would like to crop a subnetwork after training with the latent_space layer as the input and the decode_32 layer as the output layer. Is that possible?
Does this answer fit your question?
from tensorflow.keras import Sequential

def extract_layers(main_model, starting_layer_ix, ending_layer_ix):
    # create an empty model
    new_model = Sequential()
    for ix in range(starting_layer_ix, ending_layer_ix + 1):
        curr_layer = main_model.get_layer(index=ix)
        # copy this layer over to the new model
        new_model.add(curr_layer)
    return new_model
If you prefer selecting your subnetwork with the names of the first and last layers, the get_layer method also has an argument for the layer's name, but an easier solution is to retrieve the indexes of the layers to select using the layer.name attribute.
That way, you just have to modify the previous function by adding:
layer_names = [layer.name for layer in main_model.layers]
starting_layer_ix = layer_names.index(starting_layer_name)
ending_layer_ix = layer_names.index(ending_layer_name)
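Applied to the model from the question, a usage sketch could look like the following; note that the extracted subnetwork starts at the latent_space layer itself, so it expects that layer's 16-dimensional input (the dummy input below is just an illustration):

import numpy as np

layer_names = [layer.name for layer in model.layers]
starting_layer_ix = layer_names.index("latent_space")
ending_layer_ix = layer_names.index("decode_32")

submodel = extract_layers(model, starting_layer_ix, ending_layer_ix)

# feed the latent_space layer its 16-dim input and read off the decode_32 activations
dummy = np.random.uniform(0, 1, (1, 16)).astype("float32")
activations = submodel.predict(dummy)   # shape (1, 32)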

CNN-LSTM Timeseries input for TimeDistributed layer

I created a CNN-LSTM for survival prediction of web sessions; my training data looks as follows:
print(x_train.shape)
(288, 3, 393)
with (samples, timesteps, features) and my model:
model = Sequential()
model.add(TimeDistributed(Conv1D(128, 5, activation='relu'),
input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(TimeDistributed(MaxPooling1D()))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(64, stateful=True, return_sequences=True))
model.add(LSTM(16, stateful=True))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer=Adam(lr=0.001), loss='binary_crossentropy', metrics=['accuracy'])
However, the TimeDistributed layer requires a minimum of 3 dimensions; how should I transform the data to get it to work?
Thanks a lot!
Your data are already in the 3D format (samples, timesteps, features), which is all you need to feed a Conv1D or an LSTM. If your target is 2D, remember to set return_sequences=False in your last LSTM cell.
Using a Flatten before an LSTM is a mistake, because you are destroying the 3D dimensionality.
Also pay attention to the pooling operation, so that you do not end up with a negative time dimension to reduce (I use 'same' padding in the convolution below in order to avoid this).
Below is an example for a binary classification task:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dense

n_sample, time_step, n_features = 288, 3, 393
X = np.random.uniform(0, 1, (n_sample, time_step, n_features))
y = np.random.randint(0, 2, n_sample)

model = Sequential()
model.add(Conv1D(128, 5, padding='same', activation='relu',
                 input_shape=(time_step, n_features)))
model.add(MaxPooling1D())
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(16, return_sequences=False))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=3)
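To verify that the pooling step has not shrunk the time dimension below 1 (the issue mentioned above), it helps to print the layer output shapes; the shapes in the comments are what I would expect for this 3-timestep, 393-feature input:

model.summary()
# Conv1D ('same' padding)          -> (None, 3, 128)  time dimension preserved
# MaxPooling1D(pool_size=2)        -> (None, 1, 128)
# LSTM(64, return_sequences=True)  -> (None, 1, 64)
# LSTM(16)                         -> (None, 16)
# Dense(1, 'sigmoid')              -> (None, 1)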

Probability for Tensorflow Binary Image Classification

I am trying to follow the Image Classification tutorial, but unfortunately it doesn't tell you how to use the model after you've created it.
The code I currently use to create the model is:
model = Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
On my first attempt I didn't have activation='sigmoid' on the last Dense layer, but then the predictions I get from the model are, for example, [[332.9539]], which I don't know how to interpret.
After I read this answer I added the sigmoid activation to receive a value between 0 and 1, but unfortunately when training the model the accuracy is stuck at 0.5, while it worked before.
What am I doing wrong?
If you add the sigmoid activation to the last layer, then you need to remove the from_logits=True from the loss instance, since your model is no longer producing logits:
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])
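With that change, the single sigmoid output can be read directly as the probability of the positive class; a small usage sketch (the random array just stands in for a preprocessed image and assumes IMG_SIZE is defined as in the question):

import numpy as np

# stand-in for one preprocessed image batch of shape (1, IMG_SIZE, IMG_SIZE, 3)
img = np.random.uniform(0, 255, (1, IMG_SIZE, IMG_SIZE, 3)).astype("float32")

prob = float(model.predict(img)[0][0])   # sigmoid output in [0, 1]
print(f"P(class 1) = {prob:.3f}, P(class 0) = {1 - prob:.3f}")
predicted_class = int(prob > 0.5)        # hard label via a 0.5 threshold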
