I'll start with the code and after put my question.
model1_input= keras.Input(shape=(5,10))
x = layers.Dense(16, activation='relu')(model1_input)
model1_output = layers.Dense(4)(x)
model1= keras.Model(model1_input, model1_output, name='model1')
model1.summary()
//----
model2_input= keras.Input(shape=(5,10))
y = layers.Dense(16, activation='relu')(model2_input)
model2_output = layers.Dense(4)(y)
model2= keras.Model(model2_input, model2_output, name='model2')
model2.summary()
//----
model3_input= keras.Input(shape=(5, 10))
layer1 = model1(model3_input)
layer2 = model2(layer1)
model3_output = layers.Dense(1)(layer2)
model3= keras.Model(model3_input, model3_output , name='model3')
model3.summary()
model3.compile(loss='mse', optimizer='adam')
model3.fit(inputs, outputs, epochs=10, batch_size=32)
When execute this code, what will happen with the model 1 and model 2 weights? they would stay untrained?
I would like to use trained model1 and trained model2 predictions to train model3. Can I write something like that?
model1_input= keras.Input(shape=(5,10))
x = layers.Dense(16, activation='relu')(model1_input)
model1_output = layers.Dense(4)(x)
model1= keras.Model(model1_input, model1_output, name='model1')
model1.summary()
model1.compile(loss='mse', optimizer='adam')
model1.fit(model1_inputs, model1_outputs, epochs=10, batch_size=32)
//----
model2_input= keras.Input(shape=(5,10))
y = layers.Dense(16, activation='relu')(model2_input)
model2_output = layers.Dense(4)(y)
model2= keras.Model(model2_input, model2_output, name='model2')
model2.summary()
model2.compile(loss='mse', optimizer='adam')
model2.fit(model2_inputs, model2_outputs, epochs=10, batch_size=32)
//----
model3_input= keras.Input(shape=(5, 10))
layer1 = model1(model3_input)
layer2 = model2(layer1)
model3_output = layers.Dense(1)(layer2)
model3= keras.Model(model3_input, model3_output , name='model3')
model3.summary()
model3.compile(loss='mse', optimizer='adam')
model3.fit(inputs, outputs, epochs=10, batch_size=32)
I'm afraid when I train model3 this will change the already trained weights of models 1 and 2. In this case, what will happen with models 1 and 2 weights?
I am not sure if keras works that way but even if it does it still will change the weights as long as the layers are trainable. Try freezing the layers. These links might help you 1,2.
Another option will be to branch the layers like this.
Related
Problem: I have S sequences of T timesteps each and each timestep contains F features so collectively, a dataset of
(S x T x F) and each s in S is described by 2 values (Target_1 and Target_2)
Goal: Model/Train an architecture using LSTMs in order to learn/achieve a function approximator model M and given a sequence s, to predict Target_1 and Target_2 ?
Something like this:
M(s) ~ (Target_1, Target_2)
I'm really struggling to find a way, below is a Keras implementation of an example that probably does not work. I made 2 models one for the first Target value and 1 for the second.
model1 = Sequential()
model1.add(Masking(mask_value=-10.0))
model1.add(LSTM(1, input_shape=(batch, timesteps, features), return_sequences = True))
model1.add(Flatten())
model1.add(Dense(hidden_units, activation = "relu"))
model1.add(Dense(1, activation = "linear"))
model1.compile(loss='mse', optimizer=Adam(learning_rate=0.0001))
model1.fit(x_train, y_train[:,0], validation_data=(x_test, y_test[:,0]), epochs=epochs, batch_size=batch, shuffle=False)
model2 = Sequential()
model2.add(Masking(mask_value=-10.0))
model2.add(LSTM(1, input_shape=(batch, timesteps, features), return_sequences=True))
model2.add(Flatten())
model2.add(Dense(hidden_units, activation = "relu"))
model2.add(Dense(1, activation = "linear"))
model2.compile(loss='mse', optimizer=Adam(learning_rate=0.0001))
model2.fit(x_train, y_train[:,1], validation_data=(x_test, y_test[:,1]), epochs=epochs, batch_size=batch, shuffle=False)
I want to make somehow good use of LSTMs time relevant memory in order to achieve good regression.
IIUC, you can start off with a simple (naive) approach by using two output layers:
import tensorflow as tf
timesteps, features = 20, 5
inputs = tf.keras.layers.Input((timesteps, features))
x = tf.keras.layers.Masking(mask_value=-10.0)(inputs)
x = tf.keras.layers.LSTM(32, return_sequences=False)(x)
x = tf.keras.layers.Dense(32, activation = "relu")(x)
output1 = Dense(1, activation = "linear", name='output1')(x)
output2 = Dense(1, activation = "linear", name='output2')(x)
model = tf.keras.Model(inputs, [output1, output2])
model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001))
x_train = tf.random.normal((500, timesteps, features))
y_train = tf.random.normal((500, 2))
model.fit(x_train, [y_train[:,0],y_train[:,1]] , epochs=5, batch_size=32, shuffle=False)
I'm trying to add add conv layers to the transfer learning code mentioned below. But not sure how to proceed. I want to add
conv, max-pooling, 3x3 filter and stride 3 and activation mode ReLU or
conv, max-pooling, 3x3 filter and stride 3 and activation mode LReLU this layer in the below mentioned transfer learning code. Let me know if it's possible and if yes how?
CLASSES = 2
# setup model
base_model = MobileNet(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D(name='avg_pool')(x)
x = Dropout(0.4)(x)
predictions = Dense(CLASSES, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
# transfer learning
for layer in base_model.layers:
layer.trainable = False
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
"""##Data augmentation"""
# data prep
"""
## Transfer learning
"""
from tensorflow.keras.callbacks import ModelCheckpoint
filepath="mobilenet/my_model.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]
EPOCHS = 1
BATCH_SIZE = 32
STEPS_PER_EPOCH = 5
VALIDATION_STEPS = 32
MODEL_FILE = 'mobilenet/filename.model'
history = model.fit_generator(
train_generator,
epochs=EPOCHS,
steps_per_epoch=STEPS_PER_EPOCH,
validation_data=validation_generator,
validation_steps=VALIDATION_STEPS,
callbacks=callbacks_list)
model.save(MODEL_FILE)
backup_model = model
model.summary()
You can do it several ways, one of them is:
model = Sequential([
base_model,
GlobalAveragePooling2D(name='avg_pool'),
Dropout(0.4),
Conv(...), # the layers you would like to add for the base model
MaxPool(...),
...
])
model.compile(...)
I think this is what you are after
CLASSES=2
new_filters=256 # specify the number of filter you want in the added convolutional layer
img_shape=(224,224,3)
base_model=tf.keras.applications.mobilenet.MobileNet( include_top=False, input_shape=img_shape, weights='imagenet',dropout=.4)
x=base_model.output
x= Conv2D(new_filters, 3, padding='same', strides= (3,3), activation='relu', name='added')(x)
x= GlobalAveragePooling2D(name='avg_pool')(x)
x= Dropout(0.4)(x)
predictions= Dense(CLASSES, activation='softmax', name='output')(x)
model=Model(inputs=base_model.input, outputs=predictions)
model.summary()
My goal is to first train only using ResNet152 and then save the learned weights. Then i want to use these weights as a base for a more complex model with added layers which i ultimately want to do hyperparameter tuning on. The reason for this approach is that doing it all at once takes a very long time. The problem i am having is that my code doesnt seem to work. I dont get an error message but when i start training the more complex model it seems to start from 0 again and not using the learned ResNet152 weights.
Here is the code:
First i am only using ResNet152 and the output layer
input_tensor = Input(shape=train_generator.image_shape)
base_model = applications.ResNet152(weights='imagenet', include_top=False, input_tensor=input_tensor)
for layer in base_model.layers[:]:
layer.trainable = True¨
x = Flatten()(base_model.output)
predictions = Dense(num_classes, activation= 'softmax')(x)
model = Model(inputs = base_model.input, outputs = predictions)
model.compile(
loss='sparse_categorical_crossentropy',
optimizer=opt,
metrics=['accuracy'])
model.fit(
train_generator,
validation_data=valid_generator,
epochs=epochs,
steps_per_epoch=len_train // batch_size,
validation_steps=len_val // batch_size,
callbacks=[earlyStopping, reduce_lr]
)
Then i am saving the weights:
model.save_weights('/content/drive/MyDrive/MODELS_SAVED/model_RESNET152/model_weights.h5')
Adding more layers.
input_tensor = Input(shape=train_generator.image_shape)
base_model = applications.ResNet152(weights='imagenet', include_top=False, input_tensor=input_tensor)
for layer in base_model.layers[:]:
layer.trainable = False
x = Flatten()(base_model.output)
x = Dense(1024, kernel_regularizer=tf.keras.regularizers.L2(l2=0.01),
kernel_initializer=tf.keras.initializers.HeNormal(),
kernel_constraint=tf.keras.constraints.UnitNorm(axis=0))(x)
x = LeakyReLU()(x)
x = BatchNormalization()(x)
x = Dropout(rate=0.1)(x)
x = Dense(512, kernel_regularizer=tf.keras.regularizers.L2(l2=0.01),
kernel_initializer=tf.keras.initializers.HeNormal(),
kernel_constraint=tf.keras.constraints.UnitNorm(axis=0))(x)
x = LeakyReLU()(x)
x = BatchNormalization()(x)
predictions = Dense(num_classes, activation= 'softmax')(x)
model = Model(inputs = base_model.input, outputs = predictions)
Loading the weights after the added layers and using by_name=True, both according to the keras tutorial.
model.load_weights('/content/drive/MyDrive/MODELS_SAVED/model_RESNET152/model_weights.h5', by_name=True)
Then i start training again.
model.compile(
loss='sparse_categorical_crossentropy',
optimizer=opt,
metrics=['accuracy']
)
model.fit(
train_generator,
validation_data=valid_generator,
epochs=epochs,
steps_per_epoch=len_train // batch_size,
validation_steps=len_val // batch_size,
callbacks=[earlyStopping, reduce_lr]
)
But it is starting at a very low accuracy, basically from 0 again, so im guessing something is wrong here. Any ideas on how to fix this?
When you use adam and save model weights only - you have to save/load optimizer weights as well:
weight_values = model.optimizer.get_weights()
with open(output_path+'optimizer.pkl', 'wb') as f:
pickle.dump(weight_values, f)
dummy_input = tf.random.uniform(inp_shape) # create a tensor of input shape
dummy_label = tf.random.uniform(label_shape) # create a tensor of label shape
hist = model.fit(dummy_input, dummy_label)
with open(path_to_saved_model+'optimizer.pkl', 'rb') as f:
weight_values = pickle.load(f)
optimizer.set_weights(weight_values)
I am having a time series prediction problem and building an LSTM like below :
def create_model():
model = Sequential()
model.add(LSTM(50,kernel_regularizer=l2(0.01), recurrent_regularizer=l2(0.01), bias_regularizer=l2(0.01), input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dropout(0.591))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
return model
When I train the model on 5 splits like below :
tss = TimeSeriesSplit(n_splits = 5)
X = data.drop(labels=['target_prediction'], axis=1)
y = data['target_prediction']
for train_index, test_index in tss.split(X):
train_X, test_X = X.iloc[train_index, :].values, X.iloc[test_index,:].values
train_y, test_y = y.iloc[train_index].values, y.iloc[test_index].values
model=create_model()
history = model.fit(train_X, train_y, epochs=10, batch_size=64,validation_data=(test_X, test_y), verbose=0, shuffle=False)
I get an overfitting problem. The graph of loss is attached
I am not sure why there is overfitting when I use regularizers in my Keras model. Any help is appreciated .
EDIT:
Tried the architectures
def create_model():
model = Sequential()
model.add(LSTM(20, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
return model
def create_model(x,y):
# define LSTM
model = Sequential()
model.add(Bidirectional(LSTM(20, return_sequences=True), input_shape=(x,y)))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
model.compile(loss='mean_squared_error', optimizer='adam')
return model
but still it is overfitting.
First of all remove all your regularizers and dropout. You are literally spamming with all the tricks out there and 0.5 dropout is too high.
Reduce the number of units in your LSTM. Start from there. Reach a point where your model stops overfitting.
Then, add dropout if required.
After that, the next step is to add the tf.keras.Bidirectional. If still, you are not satfisfied then, increase number of layers. Remember to keep return_sequences True for every LSTM layer except the last one.
It is seldom I come across networks using layer regularization despite the availability because dropout and layer regularization have a same effect and people usually go with dropout (at maximum, I have seen 0.3 being used).
I'm using keras to implement a basic CNN emotion detection. here is my model architecture
def HappyModel(input_shape):
X_Input = Input(input_shape)
X = ZeroPadding2D((3,3))(X_Input)
X = Conv2D(32, (7,7), strides=(1,1), name='conv0')(X)
X = BatchNormalization(axis = 3, name='bn0')(X)
X = Activation('relu')(X)
X = MaxPooling2D((2,2), name='mp0')(X)
X = Flatten()(X)
X = Dense(1, activation='sigmoid', name='fc0')(X)
model = Model(inputs = X_Input, outputs = X, name='hmodel')
return model
happyModel = HappyModel(X_train.shape[1:])
happyModel.compile(Adam(lr=0.1) ,loss= 'binary_crossentropy', metrics=['accuracy'])
happyModel.fit(X_train, Y_train, epochs = 50, batch_size=16, validation_data=(X_test, Y_test))
it appears that the model loss and accuracy doesn't change at all in every epoch step. it feels like the gradient descent stuck on local minima as follow:
https://i.imgur.com/9As8v0c.png
have tried using Adam and SGD optimizer, with both learning rate .1 and .5, still no luck.
it turns out if I change the compile method command parameters, the model would converge nicely on training epoch
happyModel.compile(optimizer = 'adam' ,loss= 'binary_crossentropy', metrics=['accuracy'])
Keras documentation says that if we write the parameter this way, it would use the default parameters for adam (https://keras.io/optimizers/)
keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
but then, if I change the model compile method to the default parameters
happyModel.compile(Adam(lr=0.001, beta_1=0.9, beta_2=0.999, decay=0.0),loss= 'binary_crossentropy', metrics=['accuracy'])
accuracy and loss still stuck.
what's the difference between the two different implementations of Adam optimizer on Keras?
You can check a closed issue on official keras-team page:
https://github.com/keras-team/keras/issues/5564
You probably have a syntax issue, since the two ways are not completely equivalent.