A question concerning Keras regression with multiple outputs:
Could you explain the difference between this net:
two inputs -> two outputs
input = Input(shape=(2,), name='combined_input')
hidden = Dense(hidden_dim, activation='tanh', name='hidden')(input)
output = Dense(2, activation='tanh', name='combined_output')(hidden)
and: two single inputs -> two single outputs:
speed_input = Input(shape=(1,), name='speed_input')
speed_hidden = Dense(hidden_dim, activation='tanh', name='speed_hidden')(speed_input)
speed_output = Dense(1, activation='tanh', name='speed_output')(speed_hidden)
angle_input = Input(shape=(1,), name='angle_input')
angle_hidden = Dense(hidden_dim, activation='tanh', name='angle_hidden')(angle_input)
angle_output = Dense(1, activation='tanh', name='angle_output')(angle_hidden)
model = Model(inputs=[speed_input, angle_input], outputs=[speed_output, angle_output])
They behave very similarly. Only when I completely separate them do the two nets behave like they're supposed to.
Also, is it normal that two single-output nets behave much more intelligibly than one bigger net with two outputs? I didn't think the difference could be as big as what I experienced.
Thanks a lot :)
This goes back to how neural networks operate. In your first model, each hidden neuron receives 2 input values (as it is a Dense layer, the input propagates to every neuron). In your second model, you have twice as many neurons, but each of them receives only speed_input or only angle_input, and works only with that data instead of the entire data.
So, if speed_input and angle_input are two completely unrelated attributes, you'll likely see better performance from splitting the two models, since the neurons aren't receiving what is basically noise input (they don't know that your outputs correspond to your inputs; they can only try to optimize your loss function). Essentially, you're creating two separate models.
But in most cases, you want to feed the model relevant attributes that combine to draw a prediction. Splitting the model then would not make sense, since you're just stripping necessary information away.
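To make the "essentially two separate models" point concrete, here is a hedged sketch of training them as literally separate models, reusing the tensors from the second snippet above (speed_x, speed_y, angle_x and angle_y are hypothetical arrays):
speed_model = Model(inputs=speed_input, outputs=speed_output)
angle_model = Model(inputs=angle_input, outputs=angle_output)

speed_model.compile(optimizer='adam', loss='mse')
angle_model.compile(optimizer='adam', loss='mse')

# each model only ever sees its own input, so neither receives the other's "noise"
speed_model.fit(speed_x, speed_y, epochs=50)
angle_model.fit(angle_x, angle_y, epochs=50)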
I have a simple model for a project which takes in some weather data and predicts a temperature. The structure is the following:
24 inputs with 10 features (one input per hour so 24 hours of input data)
1 output being a temperature
The model works, I'm all fine with that, however I need to present the way the model works and I'm unsure how some values I see about the model describe it. I'm struggling to visually represent the inner structure (as in the neurons or nodes).
Here is the model (The static input_shape is not set statically in my code, it is purely to help answer the question):
forward_layer = tf.keras.layers.LSTM(units=32, return_sequences=True)
backward_layer = tf.keras.layers.LSTM(units=32, return_sequences=True, go_backwards=True)
bilstm_model = tf.keras.models.Sequential([
    tf.keras.layers.Bidirectional(forward_layer, backward_layer=backward_layer, input_shape=(24, 10)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.LSTM(32, return_sequences=False),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(units=1)
])
Note that the reason I separated the layers for the Bidirectional is that I am running this with TFLite and had issues if the two weren't pre-defined, but the inner workings are the same as far as I understand it.
Now as some experts have probably already figured out just looking at this, the input shape is (32, 24, 10) and the output shape is (32, 1) where the input is under the (batch_size, sequence_length, features) format and the output is under the (batch_size, units) format.
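For reference, the per-layer output shapes can be printed with model.summary() (a quick sketch using the model defined above; the shape comments assume the default concat merge mode of Bidirectional):
bilstm_model.summary()
# Bidirectional        -> (None, 24, 64)  (32 forward + 32 backward units concatenated)
# each stacked LSTM    -> (None, 24, 32)  while return_sequences=True
# last LSTM            -> (None, 32)      since return_sequences=False
# Dense                -> (None, 1)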
While I feel I understand what the numbers represent, I can't quite wrap my head around how the model would "look". I mainly struggle to differentiate the structure during training and during predictions.
Here are the ideas I have (note that I will be describing the model based on neurons):
Input is a 24x10 DataFrame
The input is therefore 240 values which can be seen as 24 sets of 10 neurons ? (Not sure I'm right here)
return_sequences means that the 240 values propagate throughout the model ? (Still not sure here)
The 'neuron' structure would resemble this:
Input(240 neurons) -> BiLSTM(240 neurons) ->
Dropout(240 neurons, 20% drop rate) -> LSTM(240 neurons) ->
Dropout(240 neurons, 20% drop rate) -> LSTM(240 neurons) ->
Dropout(240 neurons, 20% drop rate) -> LSTM(240 neurons) ->
Dropout(? neurons, 20% drop rate) -> Dense(1 neurons) = Output
If I'm not mistaken, the Dropout layer isn't a layer strictly speaking but it stops (in this case) 20% of the input neurons (or output, I'm not sure) from activating 20% of the output neurons.
I'd really appreciate help visualising the structure, so thanks in advance to any brave soul ready to help me out.
EDIT
Here is an example of what I am trying to get out of the numbers. Note that this image ONLY represents the first BiLSTM layer, not the whole model.
The "x ?" in the image represent what I'm trying to understand, i.e. how many layers there are and how many neurons there are in each layer.
I have a Keras LSTM multitask model that performs two tasks. One is a sequence tagging task (so I predict a label per token). The other is a global classification task over the whole sequence using a CNN that is stacked on the hidden states of the LSTM.
In my setup (don't ask why) I only need the CNN task during training; the labels it predicts have no use in the final product. In Keras, one can train an LSTM model without specifying the input sequence length, like this:
l_input = Input(shape=(None,), dtype="int32", name=input_name)
However, if I add the CNN stacked on the LSTM hidden states I need to set a fixed sequence length for the model.
l_input = Input(shape=(timesteps_size,), dtype="int32", name=input_name)
The problem is that once I have trained the model with a fixed timesteps_size I can no longer use it to predict longer sequences.
In other frameworks this is not a problem. But in Keras, I cannot get rid of the CNN and change the expected input shape of the model once it has been trained.
Here is a simplified version of the model
l_input = Input(shape=(timesteps_size,), dtype="int32")
l_embs = Embedding(len(input.keys()), 100)(l_input)
l_blstm = Bidirectional(GRU(300, return_sequences=True))(l_embs)

# Sequential output
l_out1 = TimeDistributed(Dense(len(labels.keys()),
                               activation="softmax"))(l_blstm)

# Global output
conv1 = Conv1D(filters=5, kernel_size=10)(l_embs)
conv1 = Flatten()(MaxPooling1D(pool_size=2)(conv1))

conv2 = Conv1D(filters=5, kernel_size=8)(l_embs)
conv2 = Flatten()(MaxPooling1D(pool_size=2)(conv2))

conv = Concatenate()([conv1, conv2])
conv = Dense(50, activation="relu")(conv)
l_out2 = Dense(len(global_labels.keys()), activation='softmax')(conv)

model = Model(inputs=l_input, outputs=[l_out1, l_out2])

optimizer = Adam()
model.compile(optimizer=optimizer,
              loss="categorical_crossentropy",
              metrics=["accuracy"])
I would like to know if anyone here has faced this issue, whether there are any solutions for deleting layers from a model after training and, more importantly, how to reshape input layer sizes after training.
Thanks
Variable timestep length is not a problem because of the convolution layers (actually the nice thing about convolution layers is that they do not depend on the input size). Rather, the Flatten layers cause the problem here, since they need an input of fixed size. Instead, you can use global pooling layers. Further, I think stacking convolution and pooling layers on top of each other might give a better result than using two separate convolution layers and merging them (although this depends on the specific problem and dataset you are working on). So, considering these two points, it might be better to write your model like this:
# Global output
conv1 = Conv1D(filters=16, kernel_size=5)(l_embs)
conv1 = MaxPooling1D(pool_size=2)(conv1)
conv2 = Conv1D(filters=32, kernel_size=5)(conv1)
conv2 = MaxPooling1D(pool_size=2)(conv2)
gpool = GlobalAveragePooling1D()(conv2)
x = Dense(50, activation="relu")(gpool)
l_out2 = Dense(len(global_labels.keys()), activation='softmax')(x)
model = Model(inputs=l_input, outputs=[l_out1, l_out2])
You may need to tune the number of conv+maxpool layers, number of filters, kernel size and even add dropout or batch normalization layers.
As a side note, using TimeDistributed on a Dense layer is redundant as the Dense layer is applied on the last axis.
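A minimal sketch of that side note (hedged), reusing the tensors from the question: since Dense acts on the last axis of the 3D tensor, this gives the same per-timestep output as the TimeDistributed version.
# equivalent per-timestep classification without TimeDistributed
l_out1 = Dense(len(labels.keys()), activation="softmax")(l_blstm)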
I don't understand what's happening in this code:
def construct_model(use_imagenet=True):
    # line 1: how do we keep all layers of this model?
    model = keras.applications.InceptionV3(include_top=False, input_shape=(IMG_SIZE, IMG_SIZE, 3),
                                           weights='imagenet' if use_imagenet else None)
    new_output = keras.layers.GlobalAveragePooling2D()(model.output)
    new_output = keras.layers.Dense(N_CLASSES, activation='softmax')(new_output)
    model = keras.engine.training.Model(model.inputs, new_output)
    return model
Specifically, my confusion is, when we call the last constructor
model = keras.engine.training.Model(model.inputs, new_output)
we specify input layer and output layer, but how does it know we want all the other layers to stay?
In other words, we append the new_output layer to the pre-trained model we load in line 1, and then in the final constructor (final line), we just create and return a model with specified input and output layers. But how does it know what other layers we want in between?
Side question 1): What is the difference between keras.engine.training.Model and keras.models.Model?
Side question 2): What exactly happens when we do new_layer = keras.layers.Dense(...)(prev_layer)? Does the () operation return a new layer? What does it do exactly?
This model was created using the Functional API (Model).
Basically it works like this (perhaps if you go to the "side question 2" below before reading this it may get clearer):
You have an input tensor (you can see it as "input data" too)
You create (or reuse) a layer
You pass the input tensor to a layer (you "call" a layer with an input)
You get an output tensor
You keep working with these tensors until you have created the entire graph.
But this hasn't created a "model" yet (one you can train and use for other things).
All you have is a graph telling which tensors go where.
To create a model, you define its start and end points.
In the example:
They take an existing model: model = keras.applications.InceptionV3(...)
They want to expand this model, so they get its output tensor: model.output
They pass this tensor as the input of a GlobalAveragePooling2D layer
They get this layer's output tensor as new_output
They pass this as input to yet another layer: Dense(N_CLASSES, ....)
And get its output as new_output (this var was replaced as they are not interested in keeping its old value...)
But, as it works with the functional API, we don't have a model yet, only a graph. In order to create a model, we use Model defining the input tensor and the output tensor:
new_model = Model(old_model.inputs, new_output)
Now you have your model.
If you use it in another var, as I did (new_model), the old model will still exist in model. And these models are sharing the same layers, in a way that whenever you train one of them, the other gets updated as well.
Question: how does it know what other layers we want in between?
When you do:
outputTensor = SomeLayer(...)(inputTensor)
you have a connection between the input and output. (Keras will use the inner TensorFlow mechanism and add these tensors and nodes to the graph.) The output tensor cannot exist without the input. The entire InceptionV3 model is connected from start to end: its input tensor goes through all the layers to yield an output tensor. There is only one possible path for the data to follow, and the graph is that path.
When you get the output of this model and use it to get further outputs, all your new outputs are connected to this, and thus to the first input of the model.
Probably the attribute _keras_history that is added to the tensors is closely related to how it tracks the graph.
So, doing Model(old_model.inputs, new_output) will naturally follow the only way possible: the graph.
If you try doing this with tensors that are not connected, you will get an error.
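For example (a hedged sketch; the exact error wording varies between Keras versions, and unrelated_input is a hypothetical tensor that is not part of the InceptionV3 graph):
unrelated_input = Input(shape=(10,))  # never passed through any InceptionV3 layer
# Model(unrelated_input, new_output)
# -> ValueError: Graph disconnected: cannot obtain value for tensor ...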
Side question 1
Prefer to import from "keras.models". Basically, this module will import from the other module:
https://github.com/keras-team/keras/blob/master/keras/models.py
Notice that the file keras/models.py imports Model from keras.engine.training. So, it's the same thing.
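A quick check of that claim (hedged; this assumes the standalone Keras 2.x package, where keras/models.py simply re-exports the class):
from keras.models import Model
from keras.engine.training import Model as TrainingModel

# both names should refer to the same class object
assert Model is TrainingModel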
Side question 2
It's not new_layer = keras.layers.Dense(...)(prev_layer).
It is output_tensor = keras.layers.Dense(...)(input_tensor).
You're doing two things in the same line:
Creating a layer - with keras.layers.Dense(...)
Calling the layer with an input tensor to get an output tensor
If you wanted to use the same layer with different inputs:
denseLayer = keras.layers.Dense(...) #creating a layer
output1 = denseLayer(input1) #calling a layer with an input and getting an output
output2 = denseLayer(input2) #calling the same layer on another input
output3 = denseLayer(input3) #again
Bonus - Creating a functional model that is equal to a sequential model
If you create this sequential model:
model = Sequential()
model.add(Layer1(...., input_shape=some_shape))
model.add(Layer2(...))
model.add(Layer3(...))
You're doing exactly the same as:
inputTensor = Input(some_shape)
outputTensor = Layer1(...)(inputTensor)
outputTensor = Layer2(...)(outputTensor)
outputTensor = Layer3(...)(outputTensor)
model = Model(inputTensor,outputTensor)
What is the difference?
Well, functional API models are totally free to be built any way you want. You can create branches:
out1 = Layer1(..)(inputTensor)
out2 = Layer2(..)(inputTensor)
You can join tensors:
joinedOut = Concatenate()([out1,out2])
With this, you can create anything you want with all kinds of fancy stuff, branches, gates, concatenations, additions, etc., which you can't do with a sequential model.
In fact, a Sequential model is also a Model, but created for a quick use in models without branches.
There's this way of building a model from a pretrained one that you may build upon.
See https://keras.io/applications/#fine-tune-inceptionv3-on-a-new-set-of-classes:
base_model = InceptionV3(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(200, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
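As a hedged follow-up, along the lines of the fine-tuning recipe the linked docs describe: once the new head has converged, you can unfreeze some of the top layers of the base model and keep training with a small learning rate. The -30 cut-off below is an arbitrary illustration, not a recommendation:
from keras.optimizers import SGD

# unfreeze only the top of the base network
for layer in base_model.layers[:-30]:
    layer.trainable = False
for layer in base_model.layers[-30:]:
    layer.trainable = True

# recompile so the trainability changes take effect, using a low learning rate
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')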
Each time a layer is added by an op like "x = Dense(...)", information about the computational graph is updated. You can type this interactively to see what it contains:
x.graph.__dict__
You can see there's all kinds of attributes, including about previous and next layers. These are internal implementation details and possibly change over time.
I am creating a Tensorflow model which predicts multiple outputs (with different activations). I think there are two ways to do this:
Method 1: Create multiple loss functions (one for each output), merge them (using tf.reduce_mean or tf.reduce_sum) and pass it to the training op like so:
final_loss = tf.reduce_mean(loss1 + loss2)
train_op = tf.train.AdamOptimizer().minimize(final_loss)
Method 2: Create multiple training operations and then group them like so:
train_op1 = tf.train.AdamOptimizer().minimize(loss1)
train_op2 = tf.train.AdamOptimizer().minimize(loss2)
final_train_op = tf.group(train_op1, train_op2)
My question is whether one method is advantageous over the other? Is there a third method I don't know?
Thanks
I want to make a subtle point that I don't think was made in previous answers.
If you were using something like GradientDescentOptimizer, these would be very similar operations. That's because taking gradients is a linear operation, and the gradient of a sum is the same as the sum of the gradients.
But, ADAM does something special: regardless of the scale of your loss, it scales the gradients so that they're always on the order of your learning rate. If you multiplied your loss by 1000, it wouldn't affect ADAM, because the change would be normalized away.
So, if your two losses are roughly the same magnitude, then it shouldn't make a difference. If one is much larger than the other, then keep in mind that summing before the minimization will essentially ignore the small one, while making two ops will spend equal effort minimizing both.
I personally like dividing them up, which gives you more control over how much to focus on one loss or the other. For example, if it was multi-task learning, and one task was more important to get right than the other, two ops with different learning rates roughly accomplishes this.
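For instance, a hedged sketch of that idea with the same TF1 API as the question (the learning rates are hypothetical):
# give the more important task a larger step size
train_op1 = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss1)
train_op2 = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss2)
final_train_op = tf.group(train_op1, train_op2)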
The difference between the two methods is demonstrated clearly in this post on multi-task learning in tensorflow.
In short:
Method 1:
This is called joint training: since it directly adds the losses together, all the gradients and updates are done with respect to both losses at the same time. Generally this is used when training multiple outputs using the same set of input features.
Method 2:
This creates two separate optimizers and is called alternate training. This is used when you use a subset of input features for each of the outputs. Therefore, when feeding in the feature subset for train_op1, the sub-graph for train_op2 is untouched. Each optimizer can be called in an alternating order using different input features.
If you run both optimizers concurrently with the same input data, then the difference from method 1 is probably very minor.
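A minimal sketch of alternate training in a TF1 session (hedged; num_steps, feed_dict_task1 and feed_dict_task2 are hypothetical placeholders for your own feeding logic):
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):
        # each op only touches the sub-graph its loss depends on
        sess.run(train_op1, feed_dict=feed_dict_task1)
        sess.run(train_op2, feed_dict=feed_dict_task2)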
Method 1 is the correct one, because you define the gradient graph (for computing the backpropagation) only once. In this way, you use a single loss function with a single graph and perform a single update of the same parameters (the update takes into account both terms of the loss).
The second method, instead, defines two different graphs for computing the gradients, and is wrong.
When you execute the training op, you're executing the two training operations in parallel (because you used tf.group / tf.tuple / tf.control_dependencies).
The operations will compute two different losses and two different sets of variable updates.
When the moment of updating the variables comes, you have a problem: which update operation executes first, the one defined by the first graph or the other?
In any case, you're discarding one computation, because one will overwrite the other. There's no synchronization in the updates and no relation between the computed losses.
Both of the methods you described are correct; the difference is quite subtle. The main difference is that AdamOptimizer keeps separate gradient accumulators for each loss in the second solution. Which one works better requires experimentation.
I will showcase how to implement a regression model using TensorFlow's functional API.
In multi-task learning, we need a base network that is shared between tasks and a network head for each individual task:
from tensorflow.keras import layers, models, Model

def create_base_cnn(input_shape):
    model = models.Sequential()
    model.add(layers.Conv2D(filters=32, kernel_size=(3, 3), padding="same", activation="relu", input_shape=input_shape))
    model.add(layers.Conv2D(filters=32, kernel_size=(3, 3), padding="same", activation="relu"))
    # put more layers if you like
    model.add(layers.Flatten())  # collapse the spatial dimensions so the shared features are a flat vector
    model.add(layers.Dense(128, activation="relu"))
    return model

def create_head(input_shape, name):
    model = models.Sequential(name=name)
    model.add(layers.Dense(128, activation="relu", input_shape=input_shape))
    model.add(layers.Dense(64, activation="relu"))
    # put more layers if you like
    model.add(layers.Dense(1, activation="linear"))
    return model
We can now combine the base model with the heads.
# Create the model.
input_shape = (240, 180, 1)
base_model = create_base_cnn(input_shape)
head_model1 = create_head((128,), name="head1")
head_model2 = create_head((128,), name="head2")
model_input = layers.Input(shape=input_shape)
# Combine base with heads (using TF's functional API)
features = base_model(model_input)
model_output1 = head_model1(features)
model_output2 = head_model2(features)
model = Model(inputs=model_input, outputs=[model_output1, model_output2])
Finally to train the model we can refer to the different outputs by name (in my case: "head1" and "head2"). We can define a hyperparameter for the weight of each head in the loss function:
HEAD1_WEIGHT = 0.4
HEAD2_WEIGHT = 0.6
model.compile(
    optimizer="Adam",
    loss={"head1": "mse", "head2": "mse"},
    loss_weights={"head1": HEAD1_WEIGHT, "head2": HEAD2_WEIGHT},
    metrics={"head1": ["mae"], "head2": ["mae"]}
)
model.fit(dataset_training, validation_data=validation_data, epochs=epochs)
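If the training data lives in plain NumPy arrays instead of a tf.data dataset, the per-head targets can be passed as a dict keyed by the head names (a hedged sketch; x_train, y1_train and y2_train are hypothetical arrays):
model.fit(
    x_train,
    {"head1": y1_train, "head2": y2_train},  # one target array per named output
    validation_split=0.1,
    epochs=20,
    batch_size=32
)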
I'm new to Keras and neural networks. I'm writing a thesis and trying to create a SimpleRNN in Keras as illustrated below:
As it is shown in the picture, I need to create a model with 4 inputs + 2 outputs and with any number of neurons in the hidden layer.
This is my code:
model = Sequential()
model.add(SimpleRNN(4, input_shape=(1, 4), activation='sigmoid', return_sequences=True))
model.add(Dense(2))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(data, target, epochs=5000, batch_size=1, verbose=2)
predict = model.predict(data)
1) Does my model implement the graph?
2) Is it possible to specify connections between the neurons of the Input and Hidden layers, or of the Output and Input layers?
Explanation:
I am going to use backpropagation to train my network.
I have input and target values
Input is a 10*4 array and target is a 10*2 array which I then reshape:
input = input.reshape((10, 1, 4))
target = target.reshape((10, 1, 2))
It is crucial for me to be able to specify connections between neurons, as they can be different. For instance, here is an example:
1) Not really. But I'm not sure about what exactly you want in that graph. (Let's see how Keras recurrent layers work below)
2) Yes, it's possible to connect every layer to every layer, but you can't use Sequential for that, you must use Model.
This answer may not be what you're looking for. What exactly do you want to achieve? What kind of data do you have, what output do you expect, what is the model supposed to do? Etc.
1 - How does a recurrent layer work?
Documentation
Recurrent layers in Keras work with an "input sequence" and may output a single result or a sequence result. Their recurrence is totally contained within the layer and doesn't interact with other layers.
You should have inputs with shape (NumberOfExamples, TimeStepsInSequence, DimensionOfEachStep). This means input_shape=(TimeSteps, Dimension).
The recurrent layer will work internally with each time step. The cycles happen from step to step and this behavior is totally invisible. The layer seems to work just like any other layer.
This doesn't seem to be what you want, unless you have a "sequence" to input. The only way I know of using recurrent layers in Keras that is similar to your graph is when you have a segment of a sequence and want to predict the next step. If that's the case, see some examples by searching for "predicting the next element" on Google.
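For reference, a minimal sketch of the usual recurrent-layer usage (hedged; the shapes here are hypothetical, not taken from your data):
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

# hypothetical data: sequences of 5 time steps with 4 features each
model = Sequential()
model.add(SimpleRNN(8, input_shape=(5, 4)))  # consumes the whole sequence, returns only the last state: shape (None, 8)
model.add(Dense(2))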
2 - How to connect layers using Model:
Instead of adding layers to a sequential model (which will always follow a straight line), start using the layers independently, starting from an input tensor:
from keras.layers import *
from keras.models import Model
inputTensor = Input(shapeOfYourInput) #it seems the shape is "(2,)", but we must see your data.
#A dense layer with 2 outputs:
myDense = Dense(2, activation=ItsAGoodIdeaToUseAnActivation)
#The output tensor of that layer when you give it the input:
denseOut1 = myDense(inputTensor)
#You can do as many cycles as you want here:
denseOut2 = myDense(denseOut1)
#you can even make a loop:
denseOut = Activation(ItsAGoodIdeaToUseAnActivation)(inputTensor) #you may create a layer and call it with the input tensor in just one line if you're not going to reuse the layer
#I'm applying this activation layer here because since we defined an activation for the dense layer and we're going to cycle it, it's not going to behave very well receiving huge values in the first pass and small values the next passes....
for i in range(n):
    denseOut = myDense(denseOut)
This kind of usage allows you to create any kind of model, with branches, alternative ways, connections from anywhere to anywhere, provided you respect the shape rules. For a cycle like that, inputs and outputs must have the same shape.
At the end, you must define a model from one or many inputs to one or many outputs (you must have training data to match all inputs and outputs you choose):
model = Model(inputTensor,denseOut)
But notice that this model is static. If you want to change the number of cycles, you will have to create a new model.
In this case, it would be as simple as repeating the loop step denseOut = myDense(denseOut) and creating another model2=Model(inputTensor,denseOut).
3 - Trying to create something like the image below:
I am supposing C and F will participate in all iterations. If not, the loop below would need to be adjusted accordingly.
Since there are four actual inputs, and we are going to treat them all separately, let's create 4 inputs instead, all like (1,).
Your input array should be divided in 4 arrays, all being (10,1).
from keras.models import Model
from keras.layers import *
inputA = Input((1,))
inputB = Input((1,))
inputC = Input((1,))
inputF = Input((1,))
Now the layers N2 and N3, that will be used only once, since C and F are constant:
outN2 = Dense(1)(inputC)
outN3 = Dense(1)(inputF)
Now the recurrent layer N1, without giving it the tensors yet:
layN1 = Dense(1)
For the loop, let's create outA and outB. They start as the actual inputs and will be given to the layer N1, but in the loop they will be replaced.
outA = inputA
outB = inputB
Now in the loop, let's do the "passes":
for i in range(n):
    #unite A and B in one
    inputAB = Concatenate()([outA,outB])

    #pass through N1
    outN1 = layN1(inputAB)

    #sum results of N1 and N2 into A
    outA = Add()([outN1,outN2])

    #this is constant for all the passes except the first
    outB = outN3 #looks like B is never changing in your image....
Now the model:
finalOut = Concatenate()([outA,outB])
model = Model([inputA,inputB,inputC,inputF], finalOut)
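A hedged usage note: with the four inputs above and a target shaped to match finalOut (here (10, 2) after the concatenation), training could look like this, where arrA, arrB, arrC and arrF are hypothetical (10, 1) arrays split from your original input:
model.compile(optimizer='adam', loss='mean_absolute_error')
model.fit([arrA, arrB, arrC, arrF], target, epochs=5000, batch_size=1)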