Is Keras LSTM supposed to work without an input_shape parameter? - python

I am using an LSTM for fake news detection and added an embedding layer to my model.
It is working fine without adding any input_shape in the LSTM function, but I thought the input_shape parameter was mandatory. Could someone help me with why there is no error even without defining input_shape? Is it because the embedding layer implicitly defines the input_shape?
Following is the code:
model = Sequential()
embedding_layer = Embedding(total_words, embedding_dim, weights=[embedding_matrix], input_length=max_length)
model.add(embedding_layer)
model.add(LSTM(64))
model.add(Dense(1, activation='sigmoid'))
opt = SGD(learning_rate=0.01, decay=1e-6)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=['accuracy'])
model.fit(data, train['label'], epochs=30, verbose=1)

You only need to provide an input_length to the Embedding layer. Furthermore, if you use a Sequential model, you do not need to provide an input layer. Omitting the input layer means that your model's weights are only created when you pass real data, as you did in model.fit(...). If you wanted to see the weights of your model before providing real data, you would have to define an input layer before your Embedding layer, like this:
embedding_input = tf.keras.layers.Input(shape=(max_length,))
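For example, a minimal sketch reusing the names from your snippet; with the explicit input, the weights are created immediately, so you can inspect them before calling fit:
model = Sequential()
model.add(embedding_input)   # the explicit input layer defined above
model.add(Embedding(total_words, embedding_dim, weights=[embedding_matrix], input_length=max_length))
model.add(LSTM(64))
model.add(Dense(1, activation='sigmoid'))
model.summary()              # shapes and weights are already available, no data needed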
And yes, as you mentioned, your model infers the input_shape implicitly when you provide the real data. Your LSTM layer does not need an input_shape, as it is derived from the output of your Embedding layer. If the LSTM layer were the first layer of your model, it would be best to specify an input_shape for clarity. For example:
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(32, input_shape=(10, 5)))
model.add(tf.keras.layers.Dense(1))
where 10 represents the number of time steps and 5 the number of features. In your example, the input to the LSTM layer has the shape (max_length, embedding_dim). Here too, if you do not specify the input_shape, your model will infer the shape from your input data.
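If you want to check which shapes were inferred for your original model (the one without any input_shape), one option is to call it once on a dummy batch and then print each layer's output shape, roughly like this:
import numpy as np

dummy_batch = np.zeros((1, max_length), dtype='int32')   # one fake sequence of token ids
model(dummy_batch)                                       # builds the weights
for layer in model.layers:
    print(layer.name, layer.output_shape)
# Embedding -> (None, max_length, embedding_dim)
# LSTM      -> (None, 64)
# Dense     -> (None, 1)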
For more information check out the Keras documentation.

Related

Need an Example of tf.keras.Sequential() Weight Initialization

I need to see how I would initialize all layers of a Sequential model with data from a same-sized sequential model.
E.g., how would I initialize the weights for every layer of the following Sequential model?
model = tf.keras.Sequential([
    Dense(2000, activation='relu', input_shape=(11,)),
    Dense(1, activation='relu'),
    Dropout(0.5),
    Dense(400, activation='relu'),
    Dropout(0.5),
    Dense(150, activation='relu'),
    BatchNormalization(),
    Dense(y_max+1, activation='softmax')
])
I am fairly new to CNN training and have managed to make the above code work through trial and error and extensive research.
Datatype is list and np.array() of dtype np.float64
The idea is that I grab the weights from one model (same as above) and return it to another model (also same as above). I just need to be able to visualize how I can initialize the weights and biases of all layers using the following:
weights = model.get_weights()[0]
biases = model.get_weights()[1]
return weights, biases
I have attempted the model.set_weights() method, but with the code below I keep getting the following TypeError:
if iteration == 1:
    for layer in model.layers:
        layer.set_weights(None, None)
TypeError: set_weights() takes 2 positional arguments but 3 were given
I'd be very appreciative of any help, thank you.
In the Sequential example above, each layer's parameters can be accessed and assigned new weights as shown below:
# example: access the first layer
model.layers[0]
# weights of the first layer (kernel and bias of the Dense layer in this case)
model.layers[0].weights
# assign new weights
model.layers[0].kernel.assign(tf.Variable(new_kernel_weights))
model.layers[0].bias.assign(tf.Variable(new_bias_weights))
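If the goal is to initialize every layer of one model from another model built from the same layer list, a simple sketch (with hypothetical names source_model and target_model) is:
# copy all weights and biases in one go
target_model.set_weights(source_model.get_weights())

# or layer by layer; note that set_weights takes a single list argument
for src_layer, tgt_layer in zip(source_model.layers, target_model.layers):
    tgt_layer.set_weights(src_layer.get_weights())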

How to select number of hidden layers and number of memory cells in LSTM?

I want to make an LSTM model for classification.
from tensorflow.keras import Sequential
model = Sequential()
model.add(Embedding(44000,32))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
The number of memory cells is determined by the length of your input sequences, which you can set by passing the input_length parameter to your Embedding layer. This is optional and can be inferred when training data is provided.
You can increase the number of hidden LSTM layers by simply adding more. Note, however, that you need to set return_sequences=True on the intermediate LSTM layers to keep the temporal dimension, i.e.,
model = Sequential()
model.add(Embedding(44000, 32))  # 32-dim encoding is pretty small
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
gives two LSTM layers.
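To see why return_sequences matters, here is a small sketch (layer sizes taken from the example above) that just prints the output shapes:
import tensorflow as tf
from tensorflow.keras.layers import Embedding, LSTM

# a fake batch of 2 sequences, 20 token ids each
tokens = tf.random.uniform((2, 20), maxval=44000, dtype=tf.int32)
x = Embedding(44000, 32)(tokens)          # (2, 20, 32): batch, time steps, features
h = LSTM(32, return_sequences=True)(x)    # (2, 20, 32): one output per time step
y = LSTM(32)(h)                           # (2, 32): only the last time step
print(x.shape, h.shape, y.shape)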
There is a pretty comprehensive guide to using RNNs for text classification in the Tensorflow documentation.

Where do the parameters in keras layers apply?

I'm trying to get to grips with the basics of neural networks and am struggling to understand keras layers.
Take the following code from tensorflow's tutorials:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
So this network has 3 layers? The first is just the 28*28 nodes representing the pixel values. The second is a hidden layer which takes weighted sums from the first, applies relu, and then sends these to the 10 output nodes, which are softmaxed?
But then this model seems to pass different arguments to its layers:
model = keras.Sequential([
    layers.Dense(64, activation=tf.nn.relu, input_shape=[len(train_dataset.keys())]),
    layers.Dense(64, activation=tf.nn.relu),
    layers.Dense(1)
])
Why does the input layer now have both an input_shape and a value 64? I read that the first parameter specifies the number of nodes in the second layer, but that doesn't seem to fit with the code in the first example. Also, why does the input layer have an activation? Is this just relu-ing the values before they enter the network?
Also, with regards activation functions, why are softmax and relu treated as alternatives? I thought relu applied to all the inputs of a single node, whereas softmax acted on the outputs of all the nodes across a layer?
Any help is really appreciated!
First example is from: https://www.tensorflow.org/tutorials/keras/basic_classification
Second example is from: https://www.tensorflow.org/tutorials/keras/basic_regression
Basically you have two types of API in Keras: the Sequential API and the Functional API https://keras.io/getting-started/sequential-model-guide/
In the Sequential API you do not explicitly define an Input layer https://keras.io/layers/core/#input
That is why you pass an input_shape to the first layer, to specify the dimensions of the input.
More information: https://jovianlin.io/keras-models-sequential-vs-functional/
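As an illustration, a minimal sketch of the regression model from the second example written with the Functional API, where the input layer is explicit (num_features is just a stand-in for len(train_dataset.keys())); Dense(64, ...) still means 64 units in the first hidden layer, not the size of the input:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

num_features = len(train_dataset.keys())          # stand-in for the input dimension
inputs = keras.Input(shape=(num_features,))       # explicit input layer
x = layers.Dense(64, activation='relu')(inputs)   # first hidden layer with 64 units
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(1)(x)                      # single regression output
model = keras.Model(inputs=inputs, outputs=outputs)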

Keras extending embedding layer input

A keras sequential model with embedding needs to be retrained starting from the currently known weights.
A Keras sequential model is trained on the provided (text) training data. The training data is tokenized by a (custom-made) tokenizer. The input dimension for the first layer in the model, an embedding layer, is the number of words known by the tokenizer.
After a few days additional training data becomes available. The tokenizer needs to be refitted on this new data as it may contain additional words. That means that the input dimension of the embedding layer changes, so the previously trained model is not usable anymore.
self.model = Sequential()
self.model.add(Embedding(tokenizer.totalDistinctWords + 1,
                         hiddenSize + 1, batch_size=1,
                         input_length=int(self.config['numWords'])))
self.model.add(LSTM(hiddenSize, return_sequences=True,
                    stateful=True, activation='tanh', dropout=dropout))
self.model.add(LSTM(hiddenSize, return_sequences=True,
                    stateful=True, activation='tanh', dropout=dropout))
self.model.add(TimeDistributed(Dense(
    len(self.controlSupervisionConfig.predictableOptionsAsList))))
self.model.add(Activation('softmax'))
I want to use the previously trained model as initializer for the new training session. For the new words in the tokenizer, the embedding layer should just use a random initialization. For the words already known by the tokenizer, it should use the previously trained embedding.
You can access (get and set) a layer's weights directly as NumPy arrays with code like weights = model.layers[0].get_weights() and model.layers[0].set_weights(weights), where model.layers[0] is your Embedding layer. This way you can store embeddings separately and set the known embeddings by copying them from the stored data.
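A sketch of that idea (with hypothetical names old_model and new_model; it assumes the refitted tokenizer keeps the same indices for the previously known words):
import numpy as np

# the Embedding layer's weights are a list with one matrix of shape (vocab_size, embedding_dim)
old_embeddings = old_model.layers[0].get_weights()[0]
new_embeddings = new_model.layers[0].get_weights()[0]    # larger vocabulary, randomly initialized

# copy the rows for all previously known words; new words keep their random initialization
new_embeddings[:old_embeddings.shape[0], :] = old_embeddings
new_model.layers[0].set_weights([new_embeddings])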

Keras how to change trainable layers of a loaded model

I built and trained a network based on vgg16 network. In the original network I froze all the layers of vgg16 and trained only the last 4 layers which I added at the end of vgg16. Now I want to load and re-train this model by changing the trainable layers to use my own weights instead of ImageNet weights. Initially I tried to build the same model by changing the trainable layers of vgg16 and model weights with the following code.
# Load the VGG model
vgg_conv = VGG16(weights='imagenet', include_top=False, input_shape=(image_size, image_size, 3))
# Freeze all but the last 8 layers
for layer in vgg_conv.layers[:-8]:
    layer.trainable = False
# Check the trainable status of the individual layers
for layer in vgg_conv.layers:
    print(layer, layer.trainable)
# Create and compile the model
model = createModel()
trained_model = keras.models.load_model(trained_dir)
model.set_weights(trained_model.get_weights())
model.compile(loss='categorical_crossentropy', optimizer=optimizers.RMSprop(lr=lr), metrics=['acc'])
But this gives me this error:
ValueError: Cannot feed value of shape (3, 3, 3, 64) for Tensor 'Placeholder_869:0', which has shape '(3, 3, 256, 512)'
When I check the weights of the original and new networks I see that the shapes of some weights are different. I also tried to change the trainable layers of the original network, but for layer in trained_model.layers: print(layer, layer.trainable) shows only the last layers that I added. So how can I change the trainable layers of my own trained_model? Or is there another way to get the same result?
This might be a possible solution. I created a vgg16-based model with the above code. Then I changed the weights of the last layers by running model.layers[1].set_weights(trained_model.layers[1].get_weights()). Since I added 4 layers to vgg16, I executed this code with the layer index changed from 1 to 4. I have not tried the model yet. If this is not a correct solution I would be glad to read your answers.
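A related sketch, assuming trained_model and the freshly built model have their layers in the same order: copy the weights only where the shapes match, then set the trainable flags before compiling:
# copy weights layer by layer wherever the shapes match
for new_layer, old_layer in zip(model.layers, trained_model.layers):
    old_weights = old_layer.get_weights()
    if [w.shape for w in old_weights] == [w.shape for w in new_layer.get_weights()]:
        new_layer.set_weights(old_weights)

# freeze everything except the last 4 layers, then compile so the change takes effect
for layer in model.layers[:-4]:
    layer.trainable = False
for layer in model.layers[-4:]:
    layer.trainable = True
model.compile(loss='categorical_crossentropy', optimizer=optimizers.RMSprop(lr=lr), metrics=['acc'])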
