I have the following models that I want to train (see image below):
The model has an input of 20. The model A has an input of 10 (the first 10 elements of the initial input), the model B has an input of 10 (the last 10 elements of the initial input) finally the input of the model C is the concatenation of the output of the models A and B.
How can I train these 3 models at the same time in Keras? Can I merge them into one big model? (I only have data to train the big model.)
Can I merge it in one big model?
Yes!
How can I train this 3 models at the same time in Keras?
I will give you pointers:
Use functional APIs. Want to know how it is different from sequential? Look here
Use concatenate layer - Reference
Let's assume that you have your three models defined and named model_A, model_B and model_C. You can now define your complete model somewhat like this (I did not check the exact code):
from tensorflow.keras import layers
from tensorflow.keras.models import Model

def complete_model(model_A, model_B, model_C):
    # One input per sub-model: the first and the last 10 features
    input_1 = layers.Input(shape=(10,))
    input_2 = layers.Input(shape=(10,))
    model_A_output = model_A(input_1)
    model_B_output = model_B(input_2)
    # Use the Keras Concatenate layer rather than a raw tf op on Keras tensors
    concatenated = layers.Concatenate(axis=-1)([model_A_output, model_B_output])
    model_C_output = model_C(concatenated)
    model = Model(inputs=[input_1, input_2], outputs=model_C_output)
    model.compile(loss='mse')
    model.summary()
    return model
This model takes two separate inputs, so you have to preprocess your data with some NumPy slicing: split each 20-element input into its first 10 and last 10 elements before feeding it to the model.
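The slicing step could look roughly like the sketch below. The array names and the batch size of 32 are made up for illustration, and the commented-out fit call assumes a target array y that is not shown here.

```python
import numpy as np

# Hypothetical data: 32 samples with 20 features each
x = np.random.rand(32, 20)

# First 10 features go to model A, last 10 features go to model B
x_a = x[:, :10]
x_b = x[:, 10:]

print(x_a.shape, x_b.shape)

# With the two-input model, training would then be called as (y is assumed):
# model.fit([x_a, x_b], y)
```

The order of the list passed to fit has to match the order of the inputs given when the model was built.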
If you still want your one-dimensional inputs, you can just define a single input layer with shape (20,) and then use the tf.split function to split it in half and feed it into the next networks.
Related
I am trying to test many ML models using keras.models.Sequential.
My idea is that once I have an iterator that looks like [num_layers, num_units_per_layers], for example [(1, 64),(2, (64,128))], to create a script using a kind of for loop running the iterator to be able to create a keras sequential model with the number of layers and units in each step of the iterator.
This is what I am trying:
it = [[(1, 128),(2, (64,128)), (3,(128,64,256))]]
for layers, units in it:
    model = keras.Sequential([
        layers.Dense(units[0])
        #How to get another layers here when layers > 1.
    ])
But I am stuck when adding new layers automatically. To sum up, what I want in each step of the iterator is the keras model represented by its values.
Is there any way to do this?
For example, when layers = 2 and units = (64,128) the code should look like:
model = keras.Sequential([
    layers.Dense(64),
    layers.Dense(128)
])
If layers = 1 and units = 128 the code must be:
model = keras.Sequential([
    layers.Dense(128)
])
Well, the first issue is the way you define it. As written, it is a list that contains a single inner list, whereas you want a list of n entries (here n is 3), one per model. If you define it as follows, you can unpack layers, units the way you intend.
it = [[1,[128]],[2,(64,128)],[3,(128,64,256)]]
If you want a model with one layer, you need to put the number of units in brackets, or it won't work well with the other architectures (because of indexing). Next, there are some necessary tweaks to the code that I would suggest. First I would use a different way to build a Sequential model (shown below). Then, you would need to define your input shape otherwise the model will not know how to build. Finally, just create an output layer for each model outside the hidden layer generator loop.
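To make the difference between the two layouts concrete, here is a small plain-Python sketch with no Keras involved. The names nested, unpack_failed and widths_per_model are only for illustration.

```python
# The original layout: one outer list holding a single inner list
nested = [[(1, 128), (2, (64, 128)), (3, (128, 64, 256))]]
print(len(nested))  # the loop would run once, over the whole inner list

unpack_failed = False
try:
    for layers, units in nested:
        pass
except ValueError:
    # The single element has three items, which cannot be unpacked into two names
    unpack_failed = True

# The suggested layout: one [n_layers, units] entry per model
it = [[1, [128]], [2, (64, 128)], [3, (128, 64, 256)]]
widths_per_model = []
for n_layers, units in it:
    # Indexing works for every entry because units is always a sequence
    widths_per_model.append([units[i] for i in range(n_layers)])

print(unpack_failed, widths_per_model)
```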
I wrote this toy problem to fit your idea of iterating through models for 10 training samples and one input dimension and one output dimension.
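from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

# Toy data: 10 samples, one input dimension, one output dimension
x = np.random.rand(10,1)
y = np.random.rand(10,1)
it = [[1,[128]],[2,(64,128)],[3,(128,64,256)]]
for layers, units in it:
    model = Sequential()
    # Declare the input shape once, before the hidden layers
    model.add(Input(shape=(1,)))
    for i in range(layers):
        model.add(Dense(units[i]))
    # Output layer, added outside the hidden-layer loop
    model.add(Dense(1))
    model.summary()
    model.compile(loss='mse',optimizer='adam')
    model.fit(x,y,batch_size=1,epochs=1)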
Is there a way to use the already trained RNN (SimpleRNN or LSTM) model to generate new sequences in Keras?
I'm trying to modify an exercise from the Coursera Deep Learning Specialization - Sequence Models course, where you train an RNN to generate dinosaur names. In the exercise you build the RNN using only numpy, but I want to use Keras.
One of the problems is different lengths of the sequences (dino names), so I used padding and set sequence length to the max size appearing in the dataset (I padded with 0, which is also the code for '\n').
My question is how to generate the actual sequence once training is done? In the numpy version of the exercise you take the softmax output of the previous cell and use it as a distribution to sample a new input for the next cell. But is there a way to connect the output of the previous cell as the input of the next cell in Keras, during testing/generation time?
Also, some additional side questions:
Since I'm using padding, I suspect the accuracy is way too optimistic. Is there a way to tell Keras not to include the padding values in its accuracy calculations?
Am I even doing this right? Is there a better way to use Keras with sequences of different lengths?
You can check my (WIP) code here.
Inferring from a model that has been trained on a sequence
So it's a pretty common thing to do in RNN models and in Keras the best way (at least from what I know) is to create two different models.
One model for training (which uses sequences instead of individual items)
Another model for predicting (which uses a single element instead of a sequence)
So let's see an example. Suppose you have the following model.
from tensorflow.keras import models, layers
n_chars = 26
timesteps = 10
inp = layers.Input(shape=(timesteps, n_chars))
lstm = layers.LSTM(100, return_sequences=True)
out1 = lstm(inp)
dense = layers.Dense(n_chars, activation='softmax')
out2 = layers.TimeDistributed(dense)(out1)
model = models.Model(inp, out2)
model.summary()
Now to infer from this model, you create another model which looks like the one below.
inp_infer = layers.Input(shape=(1, n_chars))
# Inputs to feed LSTM states back in
h_inp_infer = layers.Input(shape=(100,))
c_inp_infer = layers.Input(shape=(100,))
# We need return_state=True so we are creating a new layer
lstm_infer = layers.LSTM(100, return_state=True, return_sequences=True)
out1_infer, h, c = lstm_infer(inp_infer, initial_state=[h_inp_infer, c_inp_infer])
out2_infer = layers.TimeDistributed(dense)(out1_infer)
# Our model takes the previous states as inputs and spits out new states as outputs
model_infer = models.Model([inp_infer, h_inp_infer, c_inp_infer], [out2_infer, h, c])
# We are setting the weights from the trained model
lstm_infer.set_weights(lstm.get_weights())
model_infer.summary()
So what's different? We have defined a new input layer that accepts an input with only one timestep (in other words, a single item). The model then produces an output with a single timestep (technically we don't need the TimeDistributed layer, but I've kept it for consistency). Other than that, the model takes the previous LSTM state as an input and produces the new state as an output. More specifically, the inference model has the following inputs and outputs.
Input: [(None, 1, n_chars), (None, 100), (None, 100)] list of tensors
Output: [(None, 1, n_chars), (None, 100), (None, 100)] list of tensors
Note that I'm updating the weights of the new layers from the trained model or using the existing layers from the training model. It will be a pretty useless model if you don't reuse the trained layers and weights.
Now we can write inference logic.
import numpy as np
x = np.random.randint(0,2,size=(1, 1, n_chars))
h = np.zeros(shape=(1, 100))
c = np.zeros(shape=(1, 100))
seq_len = 10
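for _ in range(seq_len):
    print(x)
    y_pred, h, c = model_infer.predict([x, h, c])
    # Drop the time axis of the prediction: (batch, 1, n_chars) -> (batch, n_chars)
    y_pred = y_pred[:,0,:]
    # One-hot encode the most likely character and feed it back as the next input
    y_onehot = np.zeros(shape=(x.shape[0],n_chars))
    y_onehot[np.arange(x.shape[0]),np.argmax(y_pred,axis=1)] = 1.0
    x = np.expand_dims(y_onehot, axis=1)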
This part starts with an initial x, h, c. It gets the predictions y_pred, h, c, converts y_pred into the next input in the following lines, and assigns the results back to x, h, c. You keep going for as many iterations as you choose.
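The loop picks the most likely character at every step. The question also mentions sampling the next character from the softmax distribution instead, which could be sketched as follows. The helper sample_one_hot and the toy probabilities are hypothetical; in the loop, probs would be the prediction for the current step with the time axis removed.

```python
import numpy as np

def sample_one_hot(probs, rng):
    """Sample one class per row of probs, shape (batch, n_chars).

    Returns a one-hot array of shape (batch, 1, n_chars), ready to be fed
    back as the next single-timestep input.
    """
    batch, n_chars = probs.shape
    one_hot = np.zeros((batch, n_chars))
    for row in range(batch):
        p = probs[row].astype(np.float64)
        p = p / p.sum()  # renormalize to guard against float32 rounding
        idx = rng.choice(n_chars, p=p)
        one_hot[row, idx] = 1.0
    return np.expand_dims(one_hot, axis=1)

rng = np.random.default_rng(0)
# Toy softmax outputs for a batch of 2 and 3 classes
probs = np.array([[0.0, 1.0, 0.0], [0.2, 0.3, 0.5]])
x_next = sample_one_hot(probs, rng)
print(x_next.shape)
```

Sampling gives different names on each run, while argmax always produces the same sequence for a given starting character.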
About masking zeros
Keras does offer a Masking layer which can be used for this purpose. And the second answer in this question seems to be what you're looking for.
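Independently of how masking is set up in the model, you can check how optimistic the reported accuracy is by recomputing it over the real positions only. The sketch below assumes integer class ids and that the true length of each sequence is known. It uses lengths rather than the value 0 because, in the question, 0 is also the code for '\n'. The function name and the toy arrays are made up.

```python
import numpy as np

def masked_accuracy(y_true, y_pred, lengths):
    """Accuracy over non-padded positions only.

    y_true, y_pred: integer class ids, shape (batch, timesteps)
    lengths: true (unpadded) length of each sequence
    """
    timesteps = y_true.shape[1]
    # mask[i, t] is True while t is inside the real part of sequence i
    mask = np.arange(timesteps)[None, :] < np.asarray(lengths)[:, None]
    correct = (y_true == y_pred) & mask
    return correct.sum() / mask.sum()

y_true = np.array([[3, 1, 0, 0], [2, 2, 4, 0]])
y_pred = np.array([[3, 2, 0, 0], [2, 2, 4, 0]])
lengths = [2, 3]

plain = (y_true == y_pred).mean()                 # 0.875, padding counted as correct
masked = masked_accuracy(y_true, y_pred, lengths)  # 0.8, padding ignored
print(plain, masked)
```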
I have a trained TF model which has the following architecture:
Inputs:
word_a, one-hot representation, vocab-size: 50000
word_b, one-hot representation, vocab-size: 50
Output:
probs, size: 1x10000
The network consists of an embedding lookup of word_a, of size 1x100 (dense_word_a), from an embedding matrix. word_b is transformed by a character CNN into a dense vector of size 1x250. The two vectors are concatenated into a 1x350 vector, which a decoder layer followed by a sigmoid maps to the output layer of size 1x10000.
I need to run this model on the client, and for this I'm converting it to TFLite.
However, I also need to break the model into two sub-models with the following inputs and outputs:
Model 1:
Inputs:
word_a: one-hot representation, (1x50000) vocab-size: 50000
Output:
dense_word_a: dense word-embedding looked up from embedding matrix (1x100)
Network:
Simple embedding lookup for word_a from embedding matrix.
Model 2:
Inputs:
dense_word_a: embedding for word_a received from Model 1. (1x100).
word_b, one-hot representation, vocab-size: 50 (1x50)
Output:
probs, size: 1x10000
In Model 1, the input word_a is a placeholder and dense_word_a is a variable. In Model 2, dense_word_a is a placeholder and its value is concatenated with word_b's embedding.
The embeddings in my model are not pre-trained, and are trained as part of the model training process itself. So I need to train the model as a combined model but during inference I want to break it up into Model 1 and Model 2 as described above.
The idea is to run the Model 1 on server side and pass the embedding values to client so it can perform inference using word_b and not have a 5MB embedding matrix on the client. So, I'm not constrained on the size of Model 1, but since Model 2 runs on the client I need it to be small.
Here's what I've tried:
1. During model freezing time, I freeze the full model but in the output nodes list I also add the variable name dense_word_a along with probs. I then convert the model to TFLite. During inference, I'm able to see the dense_word_a output as a 1x100 vector. This seems to work fine. I'm also getting the probs as output.
2. For generating Model 2, I just remove the dense_word_a variable and convert it into a placeholder (using tf.placeholder), remove the placeholder for word_a and freeze the graph again.
However, I'm not able to match the probs values. The probs vector generated by the full model doesn't match the probs vector generated by Model 2.
How can I go about breaking the model into sub-models and also match the results?
I think what you described should work.
Is it easy to reproduce the problem that you're seeing? If you can isolate the reproducible steps and you believe there's a bug, could you file a bug on GitHub? Thanks!
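One way to narrow this down is to confirm on a toy example that splitting at the embedding is lossless in principle. The NumPy sketch below is not the real graph: the sizes are scaled down, the weights are random, and the character CNN is replaced by a single dense layer with tanh as a stand-in. All names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Scaled-down stand-ins for the sizes 50000, 100, 50, 250 and 10000
vocab_a, dim_a, vocab_b, dim_b, n_out = 50, 4, 5, 6, 10

E = rng.normal(size=(vocab_a, dim_a))            # embedding matrix for word_a
W_b = rng.normal(size=(vocab_b, dim_b))          # stand-in for the character CNN
W_dec = rng.normal(size=(dim_a + dim_b, n_out))  # decoder weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def full_model(word_a, word_b):
    dense_word_a = word_a @ E
    dense_word_b = np.tanh(word_b @ W_b)
    merged = np.concatenate([dense_word_a, dense_word_b], axis=-1)
    return sigmoid(merged @ W_dec)

def model_1(word_a):
    # Server side: only the embedding lookup
    return word_a @ E

def model_2(dense_word_a, word_b):
    # Client side: everything after the lookup, sharing W_b and W_dec
    dense_word_b = np.tanh(word_b @ W_b)
    merged = np.concatenate([dense_word_a, dense_word_b], axis=-1)
    return sigmoid(merged @ W_dec)

word_a = np.eye(vocab_a)[[7]]  # one-hot, shape (1, vocab_a)
word_b = np.eye(vocab_b)[[2]]  # one-hot, shape (1, vocab_b)

probs_full = full_model(word_a, word_b)
probs_split = model_2(model_1(word_a), word_b)
print(np.allclose(probs_full, probs_split))
```

Since the two paths agree when they share the same weights, a mismatch in the real graphs suggests comparing the dense_word_a values and the remaining weights of the two frozen graphs first.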
As an example, I'd like to train a neural network to predict the location of a picture (longitude, latitude) with the image, temperature, humidity and time of year as inputs to the model.
My question is, what is the best way to add this additional information to a CNN? Should I merge the numeric inputs with the CNN at the last dense layer or at the beginning? Should I encode the numeric values (temperature, humidity and time of year)?
Any information, resources, sources would be greatly appreciated, thanks in advance.
You can process numeric inputs separately and merge them afterwards before making the final prediction:
# Your usual CNN whatever it may be
img_in = Input(shape=(width, height, channels))
img_features = SomeCNN(...)(img_in)
# Your usual MLP model
aux_in = Input(shape=(3,))
aux_features = Dense(24, activation='relu')(aux_in)
# Possibly add more hidden layers, then merge
merged = concatenate([img_features, aux_features])
# create last layer.
out = Dense(num_locations, activation='softmax')(merged)
# build model
model = Model([img_in, aux_in], out)
model.compile(loss='categorical_crossentropy', ...)
Essentially, you treat them as separate inputs and learn useful features that combined allow your model to predict. How you encode numeric inputs really depends on their type.
For continuous inputs like temperature, you can normalize to the range [-1, 1]; for discrete inputs, one-hot encoding is very common. Here is a quick guide.
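As a rough sketch of these two encodings in NumPy: the helper names, the temperature range of -10 to 40 and the four-season split are assumptions for illustration, not values from the question.

```python
import numpy as np

def scale_to_unit_range(values, low, high):
    """Linearly map values from [low, high] to [-1, 1]."""
    values = np.asarray(values, dtype=np.float32)
    return 2.0 * (values - low) / (high - low) - 1.0

def one_hot(indices, num_classes):
    """Turn integer category ids into one-hot rows."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(indices)]

# Continuous input: temperatures with an assumed range of -10 to 40
temperature = scale_to_unit_range([-10.0, 15.0, 40.0], low=-10.0, high=40.0)
print(temperature)  # [-1.  0.  1.]

# Discrete input: time of year as one of 4 assumed seasons
season = one_hot([0, 3], num_classes=4)
print(season.shape)  # (2, 4)
```

The low and high bounds should come from the training data, so that the same scaling is applied at prediction time.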
If you want to predict on the basis of those four features, I would suggest going with CNN + RNN.
Feed the image to the CNN, take the logits, and then build a sequence like
logits=np.array(output).flatten()
[[logits], [temperature], [humidity], [time_of_year]] and feed it to the RNN, which will treat it as a sequence input.
I'm new to Keras and Neural Networks. I'm writing a thesis and trying to create a SimpleRNN in Keras as illustrated below:
As it is shown in the picture, I need to create a model with 4 inputs + 2 outputs and with any number of neurons in the hidden layer.
This is my code:
model = Sequential()
model.add(SimpleRNN(4, input_shape=(1, 4), activation='sigmoid', return_sequences=True))
model.add(Dense(2))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(data, target, epochs=5000, batch_size=1, verbose=2)
predict = model.predict(data)
1) Does my model implement the graph?
2) Is it possible to specify connections between neurons Input and Hidden layers or Output and Input layers?
Explanation:
I am going to use backpropagation to train my network.
I have input and target values
Input is a 10*4 array and target is a 10*2 array which I then reshape:
input = input.reshape((10, 1, 4))
target = target.reshape((10, 1, 2))
It is crucial for me to be able to specify connections between neurons, as they can be different. For instance, here is an example:
1) Not really. But I'm not sure about what exactly you want in that graph. (Let's see how Keras recurrent layers work below)
2) Yes, it's possible to connect every layer to every layer, but you can't use Sequential for that, you must use Model.
This answer may not be what you're looking for. What exactly do you want to achieve? What kind of data do you have, what output do you expect, and what is the model supposed to do?
1 - How does a recurrent layer work?
Documentation
Recurrent layers in Keras work with an "input sequence" and may output a single result or a sequence of results. Their recurrence is entirely contained within the layer and doesn't interact with other layers.
You should have inputs with shape (NumberOfExamples, TimeStepsInSequence, DimensionOfEachStep). This means input_shape=(TimeSteps,Dimension).
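To see what this shape convention means for a 10*4 array like the one in the question, compare the two reshapes below. The values in data are a made-up stand-in for the real inputs.

```python
import numpy as np

# Stand-in for the 10*4 input array from the question
data = np.arange(40, dtype=np.float32).reshape(10, 4)

# What the question does: 10 independent examples, each a sequence of length 1.
# With a single time step the recurrent connection is never actually used.
as_ten_samples = data.reshape((10, 1, 4))

# Alternative reading: one example that is a sequence of 10 steps
as_one_sequence = data.reshape((1, 10, 4))

print(as_ten_samples.shape, as_one_sequence.shape)
```

Which of the two is appropriate depends on whether the 10 rows are separate examples or consecutive steps of one series.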
The recurrent layer will work internally with each time step. The cycles happen from step to step and this behavior is totally invisible. The layer seems to work just like any other layer.
This doesn't seem to be what you want, unless you have a "sequence" to input. The only way I know of using recurrent layers in Keras that is similar to your graph is when you have a segment of a sequence and want to predict the next step. If that's the case, see some examples by searching for "predicting the next element" in Google.
2 - How to connect layers using Model:
Instead of adding layers to a sequential model (which will always follow a straight line), start using the layers independently, starting from an input tensor:
from keras.layers import *
from keras.models import Model
inputTensor = Input(shapeOfYourInput) #it seems the shape is "(2,)", but we must see your data.
#A dense layer with 2 outputs:
myDense = Dense(2, activation=ItsAGoodIdeaToUseAnActivation)
#The output tensor of that layer when you give it the input:
denseOut1 = myDense(inputTensor)
#You can do as many cycles as you want here:
denseOut2 = myDense(denseOut1)
#you can even make a loop:
denseOut = Activation(ItsAGoodIdeaToUseAnActivation)(inputTensor) #you may create a layer and call it with the input tensor in just one line if you're not going to reuse the layer
#I'm applying this activation layer here because since we defined an activation for the dense layer and we're going to cycle it, it's not going to behave very well receiving huge values in the first pass and small values the next passes....
for i in range(n):
    denseOut = myDense(denseOut)
This kind of usage allows you to create any kind of model, with branches, alternative ways, connections from anywhere to anywhere, provided you respect the shape rules. For a cycle like that, inputs and outputs must have the same shape.
At the end, you must define a model from one or many inputs to one or many outputs (you must have training data to match all inputs and outputs you choose):
model = Model(inputTensor,denseOut)
But notice that this model is static. If you want to change the number of cycles, you will have to create a new model.
In this case, it would be as simple as repeating the loop step denseOut = myDense(denseOut) and creating another model2=Model(inputTensor,denseOut).
3 - Trying to create something like the image below:
I am supposing C and F will participate in all iterations. If not,
Since there are four actual inputs, and we are going to treat them all separately, let's create 4 inputs instead, all like (1,).
Your input array should be divided in 4 arrays, all being (10,1).
from keras.models import Model
from keras.layers import *
inputA = Input((1,))
inputB = Input((1,))
inputC = Input((1,))
inputF = Input((1,))
Now the layers N2 and N3, that will be used only once, since C and F are constant:
outN2 = Dense(1)(inputC)
outN3 = Dense(1)(inputF)
Now the recurrent layer N1, without giving it the tensors yet:
layN1 = Dense(1)
For the loop, let's create outA and outB. They start as actual inputs and will be given to the layer N1, but in the loop they will be replaced.
outA = inputA
outB = inputB
Now in the loop, let's do the "passes":
for i in range(n):
    #unite A and B in one
    inputAB = Concatenate()([outA,outB])
    #pass through N1
    outN1 = layN1(inputAB)
    #sum results of N1 and N2 into A
    outA = Add()([outN1,outN2])
    #this is constant for all the passes except the first
    outB = outN3 #looks like B is never changing in your image....
Now the model:
finalOut = Concatenate()([outA,outB])
model = Model([inputA,inputB,inputC,inputF], finalOut)