Where do the parameters in keras layers apply?

I'm trying to get to grips with the basics of neural networks and am struggling to understand keras layers.
Take the following code from tensorflow's tutorials:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
So this network has 3 layers? The first is just the 28*28 nodes representing the pixel values. The second is a hidden layer which takes weighted sums from the first, applies relu, and then sends these to 10 output nodes which are softmaxed?
But then this model seems to pass different arguments to its layers:
model = keras.Sequential([
    layers.Dense(64, activation=tf.nn.relu, input_shape=[len(train_dataset.keys())]),
    layers.Dense(64, activation=tf.nn.relu),
    layers.Dense(1)
])
Why does the input layer now have both an input_shape and a value 64? I read that the first parameter specifies the number of nodes in the second layer, but that doesn't seem to fit with the code in the first example. Also, why does the input layer have an activation? Is this just relu-ing the values before they enter the network?
Also, with regard to activation functions, why are softmax and relu treated as alternatives? I thought relu applied to all the inputs of a single node, whereas softmax acted on the outputs of all the nodes across a layer?
Any help is really appreciated!
First example is from: https://www.tensorflow.org/tutorials/keras/basic_classification
Second example is from: https://www.tensorflow.org/tutorials/keras/basic_regression

Basically you have two types of API in Keras: the Sequential API and the Functional API (https://keras.io/getting-started/sequential-model-guide/).
In the Sequential API you don't explicitly declare an Input layer (https://keras.io/layers/core/#input). That is why the first layer takes an input_shape argument: it specifies the dimensions of the data fed into the model. The first positional argument (64 in your second example, 128 in the first) is the number of units in that Dense layer itself, not the size of the input, and its activation is applied to that layer's outputs, not to the raw inputs.
More information: https://jovianlin.io/keras-models-sequential-vs-functional/
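As an illustration of the difference, here is a minimal sketch of the second tutorial model rewritten with an explicit Input layer via the Functional API (num_features is a placeholder standing in for len(train_dataset.keys()), not a name from the tutorial):
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

num_features = 9  # placeholder for len(train_dataset.keys())

inputs = keras.Input(shape=(num_features,))           # explicit input layer
x = layers.Dense(64, activation=tf.nn.relu)(inputs)   # 64 units in the first hidden layer
x = layers.Dense(64, activation=tf.nn.relu)(x)
outputs = layers.Dense(1)(x)                           # single regression output

model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()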

Related

Need an Example of tf.keras.Sequential() Weight Initialization

I need to see how I would initialize all layers of a Sequential model with data from a same-sized sequential model.
E.g., how would I initialize the weights for every layer of the following Sequential model?
model = tf.keras.Sequential([Dense(2000, activation='relu', input_shape=(11,)),
                             Dense(1, activation='relu'),
                             Dropout(0.5),
                             Dense(400, activation='relu'),
                             Dropout(0.5),
                             Dense(150, activation='relu'),
                             BatchNormalization(),
                             Dense(y_max+1, activation='softmax')
                             ])
I am fairly new to CNN training and have managed to make the above code work through trial and error and extensive research.
The data type is a list and an np.array() of dtype np.float64.
The idea is that I grab the weights from one model (same as above) and return them to another model (also same as above). I just need to be able to visualize how I can initialize the weights and biases of all layers using the following:
weights = model.get_weights()[0]
biases = model.get_weights()[1]
return weights, biases
I have attempted the model.set_weights() method, but I keep getting the following error message, given the code before the TypeError:
if iteration == 1:
    for layer in model.layers:
        layer.set_weights(None, None)
TypeError: set_weights() takes 2 positional arguments but 3 were given
I'd be very appreciative of any help, thank you.
In the Sequential example above, each layer's parameters can be accessed and assigned new weights as shown below:
# example of first layer
model.layers[0]
# weights of the first layer
model.layers[0].weights  # gives the weights of kernel and bias of the Dense layer in this case
# assign new weights by
model.layers[0].kernel.assign(tf.Variable(new_kernel_weights))
model.layers[0].bias.assign(tf.Variable(new_bias_weights))
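To copy all the parameters from one model to another of the same architecture, which is what the question ultimately asks, a minimal sketch could look like this (source_model and target_model are assumed to be two already-built instances of the Sequential model above). Note that set_weights() takes a single list of arrays, which is why passing two arguments raised the TypeError:
# copy every layer's kernels and biases in one call
all_weights = source_model.get_weights()   # list of numpy arrays for the whole model
target_model.set_weights(all_weights)      # expects one list, not two separate arguments

# or copy layer by layer
for src_layer, tgt_layer in zip(source_model.layers, target_model.layers):
    tgt_layer.set_weights(src_layer.get_weights())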

How to select number of hidden layers and number of memory cells in LSTM?

How do I select the number of hidden layers and the number of memory cells in an LSTM? I want to make an LSTM model for classification.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(44000, 32))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
The number of memory cells can be set by passing the input_length parameter to your Embedding layer, as it is defined by the length of your input sequences. This is optional and can be inferred when training data is provided.
You can increase the number of hidden LSTM layers by simply adding more. However, you need to set return_sequences=True on the intermediate LSTM layers to preserve the temporal dimension, e.g.:
model = Sequential()
model.add(Embedding(44000, 32))  # 32-dim encoding is pretty small
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
gives two LSTM layers.
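As a sketch of the input_length point above (assuming tf.keras in TF 2.x, where Embedding accepts input_length, and sequences padded to a hypothetical length of 200):
model = Sequential()
model.add(Embedding(44000, 32, input_length=200))  # 200 time steps per input sequence
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
model.summary()  # the Embedding output shape now shows (None, 200, 32)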
There is a pretty comprehensive guide to using RNNs for text classification in the Tensorflow documentation.

Can we have a dense layer between Conv layers in 1D CNN Architecture?

I have a question regarding one-dimensional convolutional neural networks (1D CNNs).
Can we have a dense layer between Conv layers in the architecture, like I have done in the following example?
Note: it works correctly with CSV files for classification problems.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Dense, MaxPooling1D, Flatten, Dropout

model = Sequential()
# First Convolutional Layer
model.add(Conv1D(128, 5, input_shape=(20,1), strides=2, padding='same'))
model.add(Dense(256, activation="relu"))
model.add(MaxPooling1D())
# Second Convolutional Layer
model.add(Conv1D(128, 3, strides=1, padding='same'))
model.add(Dense(64, activation="relu"))
model.add(MaxPooling1D())
# Passing to Fully Connected Layers
model.add(Flatten())
model.add(Dense(32, activation = 'relu'))
#model.add(Dropout(0.02))
# Output Layer
model.add(Dense(2, activation = 'sigmoid'))
# Model Compilation
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])
# Summary of The Model
model.summary()
Thank you very much!
Yes, you can certainly do that. It is not usual at all and not very advisable from a theoretical perspective, but it is possible.
why is it not advisable? (theory) With convolutions one tries to capture spatial features (i.e. information). Values next to each other should have an influence, but values far away from a given point (in time, in the case of time-series data) should have less influence. That is the whole idea of CNNs. To a fully connected NN, the order in which the input is presented does not matter: it looks at all inputs at the same time since it is equally connected to all of them, so you lose the spatial information. BTW, that is also the reason why it is plausible to do a global pooling before feeding the output of the CNN part of a model to the fully connected part (i.e. the dense layers).
Now if you do convolution, you care about spatial information. If you then apply a dense layer, you are essentially saying "I cared enough about the spatial info". Applying convolution again to the output vector of a dense layer then makes little sense.
feasibility
Nonetheless, such a network would be feasible. You would just need to make sure that the dense layer outputs a vector (or matrix) again, on which you can apply convolution.
However, your code lacks a proper adapter from the output of the convolution layers to the dense layer. You should apply some type of global pooling operation to create a vector that serves as the input to the dense layer; that would also save you the Flatten() step. Again, it should work your way anyway. It is just about style, since right now you are sending mixed signals: Flatten concatenates all spatial positions, but the NN then ignores the spatial information anyway.
I also don't get the point of applying MaxPooling1D after the Dense layer; one could simply reduce the number of outputs of the Dense layer. And you definitely don't need a second Flatten after a Dense layer, as it returns a vector by definition (and pooling won't add a dimension to it).
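One possible reading of the pooling-as-adapter advice above, keeping the layer sizes from the question but dropping the in-between Dense layers and switching the output to softmax to match sparse_categorical_crossentropy (these adjustments are illustrative, not from the original post):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, GlobalAveragePooling1D, Dense

model = Sequential()
model.add(Conv1D(128, 5, input_shape=(20, 1), strides=2, padding='same'))
model.add(MaxPooling1D())
model.add(Conv1D(128, 3, strides=1, padding='same'))
model.add(MaxPooling1D())
model.add(GlobalAveragePooling1D())         # adapter: collapses the temporal axis into a vector
model.add(Dense(32, activation='relu'))     # fully connected head
model.add(Dense(2, activation='softmax'))   # softmax pairs with sparse_categorical_crossentropy
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()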

Is keras LSTM supposed to work without an input_shape parameter?

I am using an LSTM for fake news detection and added an embedding layer to my model.
It is working fine without adding any input_shape in the LSTM function, but I thought the input_shape parameter was mandatory. Could someone help me with why there is no error even without defining input_shape? Is it because the embedding layer implicitly defines the input_shape?
Following is the code:
model = Sequential()
embedding_layer = Embedding(total_words, embedding_dim, weights=[embedding_matrix], input_length=max_length)
model.add(embedding_layer)
model.add(LSTM(64))
model.add(Dense(1, activation='sigmoid'))
opt = SGD(learning_rate=0.01, decay=1e-6)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=['accuracy'])
model.fit(data, train['label'], epochs=30, verbose=1)
You only need to provide an input_length to the Embedding layer. Furthermore, if you use a Sequential model, you do not need to provide an input layer. Omitting an input layer essentially means that your model's weights are only created when you pass real data, as you did in model.fit(...). If you wanted to see the weights of your model before providing real data, you would have to define an input layer before your Embedding layer like this:
embedding_input = tf.keras.layers.Input(shape=(max_length,))
And yes, as you mentioned, your model infers the input_shape implicitly when you provide the real data. Your LSTM layer does not need an input_shape as it is also derived based on the output of your Embedding layer. If the LSTM layer were the first layer of your model, it would be best to specify an input_shape for clarity. For example:
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(32, input_shape=(10, 5)))
model.add(tf.keras.layers.Dense(1))
where 10 represents the number of time steps and 5 the number of features. In your example, the input to the LSTM layer has the shape (max_length, embedding_dim). Here too, if you do not specify the input_shape, your model will infer the shape from your input data.
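Putting that together, a rough sketch of the original model with an explicit Input layer, so that the weights exist before any real data is passed (total_words, embedding_dim, embedding_matrix and max_length are the question's own variables):
model = Sequential()
model.add(tf.keras.layers.Input(shape=(max_length,)))  # explicit input layer
model.add(Embedding(total_words, embedding_dim, weights=[embedding_matrix], input_length=max_length))
model.add(LSTM(64))
model.add(Dense(1, activation='sigmoid'))
model.summary()  # weights are created immediately, before model.fit()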
For more information check out the Keras documentation.

Usage of sigmoid activation function in Keras

I have a big dataset composed of 18260 input fields with 4 outputs. I am using Keras and TensorFlow to build a neural network that can detect the possible output.
I have tried many solutions, but the accuracy does not get above 55% unless I use the sigmoid activation function in all model layers except the first one, as below:
def baseline_model(optimizer='adam', init='random_uniform'):
    # create model
    model = Sequential()
    model.add(Dense(40, input_dim=18260, activation="relu", kernel_initializer=init))
    model.add(Dense(40, activation="sigmoid", kernel_initializer=init))
    model.add(Dense(40, activation="sigmoid", kernel_initializer=init))
    model.add(Dense(10, activation="sigmoid", kernel_initializer=init))
    model.add(Dense(4, activation="sigmoid", kernel_initializer=init))
    model.summary()
    # Compile model
    model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model
Is using sigmoid for activation correct in all layers? The accuracy is reaching 99.9% when using sigmoid as shown above. So I was wondering if there is something wrong in the model implementation.
The sigmoid might work. But I suggest using relu activation for the hidden layers. The problem is that your output layer's activation is sigmoid but it should be softmax (because you are using the sparse_categorical_crossentropy loss):
model.add(Dense(4, activation="softmax", kernel_initializer=init))
Edit after discussion on comments
Your outputs are integer class labels. The sigmoid logistic function outputs values in the range (0, 1). The output of softmax is also in the range (0, 1), but the softmax function adds another constraint on the outputs: they must sum to 1. Therefore the outputs of softmax can be interpreted as the probability of the input belonging to each class.
E.g.:
import numpy as np

def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

def softmax(a):
    return np.exp(a - max(a)) / np.sum(np.exp(a - max(a)))

a = np.array([0.6, 10, -5, 4, 7])
print(sigmoid(a))
# [0.64565631, 0.9999546 , 0.00669285, 0.98201379, 0.99908895]
print(softmax(a))
# [7.86089760e-05, 9.50255231e-01, 2.90685280e-07, 2.35544722e-03, 4.73104222e-02]
print(sum(softmax(a)))
# 1.0
You have to use one activation or the other, since activations are what bring non-linearity into the model. Without any activation the model basically behaves like a single-layer network. Read more about why to use activations here. You can check the various available activations here.
It also seems like your model is overfitting when using sigmoid, so try techniques to overcome that, such as creating train/dev/test splits, reducing the complexity of the model, dropout, etc.
Neural networks require non-linearity at each layer to work. Without non-linear activations, no matter how many layers you have, you could write the same thing with only one layer.
Linear functions are limited in complexity: if g and f are linear functions, g(f(x)) can be written as z(x), where z is also a linear function. It is pointless to stack them without adding non-linearity.
That's why we use non-linear activation functions: sigmoid(g(f(x))) cannot be written as a linear function.
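A quick numerical sketch of that collapse, with made-up weights (not from the original post): two stacked weights-and-bias layers with no activation reduce to a single linear layer.
import numpy as np

# two stacked linear layers: f(x) = W1 @ x + b1 and g(y) = W2 @ y + b2
W1, b1 = np.array([[1.0, 2.0], [0.5, -1.0]]), np.array([0.1, 0.2])
W2, b2 = np.array([[3.0, -2.0]]), np.array([0.3])

x = np.array([4.0, -1.0])

stacked = W2 @ (W1 @ x + b1) + b2          # g(f(x))
single = (W2 @ W1) @ x + (W2 @ b1 + b2)    # z(x) with W = W2 @ W1 and b = W2 @ b1 + b2

print(np.allclose(stacked, single))        # True: the two layers collapse into one linear layer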
