Is convolution useful on a network with a timestep of 1? - python

This code comes from the Kaggle notebook https://www.kaggle.com/dkaraflos/1-geomean-nn-and-6featlgbm-2-259-private-lb (the competition's goal is to use seismic signals to predict the timing of laboratory earthquakes). The author of that notebook won first place among more than 4000 teams.
from tensorflow.keras import models, optimizers
from tensorflow.keras.layers import (LSTM, BatchNormalization, Convolution1D,
                                     Dense, Flatten, Input)

def get_model():
    # train_sample is assumed to be defined elsewhere in the notebook (not part of this snippet)
    inp = Input(shape=(1, train_sample.shape[1]))  # a single timestep per sample
    x = BatchNormalization()(inp)
    x = LSTM(128, return_sequences=True)(x)  # LSTM as first layer performed better than Dense.
    x = Convolution1D(128, (2), activation='relu', padding="same")(x)
    x = Convolution1D(84, (2), activation='relu', padding="same")(x)
    x = Convolution1D(64, (2), activation='relu', padding="same")(x)
    x = Flatten()(x)
    x = Dense(64, activation="relu")(x)
    x = Dense(32, activation="relu")(x)
    # outputs
    ttf = Dense(1, activation='relu', name='regressor')(x)  # Time to Failure
    tsf = Dense(1)(x)  # Time Since Failure
    classifier = Dense(1, activation='sigmoid')(x)  # Binary for TTF<0.5 seconds
    model = models.Model(inputs=inp, outputs=[ttf, tsf, classifier])
    opt = optimizers.Nadam(learning_rate=0.008)  # 'lr' is the deprecated name of this argument
    # We are fitting to 3 targets simultaneously: Time to Failure (TTF), Time Since Failure (TSF), and Binary for TTF<0.5 seconds
    # We weight the model to optimize heavily for TTF
    # Optimizing for TSF and Binary TTF<0.5 helps to reduce overfitting, and helps for generalization.
    model.compile(optimizer=opt, loss=['mae', 'mae', 'binary_crossentropy'],
                  loss_weights=[8, 1, 1], metrics=['mae'])
    return model
However, according to my derivation, x = Convolution1D(128, (2), activation='relu', padding="same")(x) and x = Dense(128, activation='relu')(x) have the same effect, because the convolution kernel operates on a sequence with a single timestep, which in principle makes it very similar to a fully connected layer. Why use Conv1D here instead of a fully connected layer directly? Is my derivation wrong?

1) Assuming you would input a sequence to the LSTM (the normal use case):
It would not be the same, since the LSTM returns a sequence (return_sequences=True) and therefore does not reduce the input dimensionality. The output shape is (Batch, Sequence, Hid). This is fed to the Convolution1D layer, which convolves along the Sequence dimension, i.e. over (Sequence, Hid). In effect, the purpose of the 1D convolutions is to extract local 1D subsequences (patches) from the LSTM output.
If we had return_sequences=False, the LSTM would return only the final state h_t. For a 1D convolution to behave like a Dense layer, it has to be fully connected: its kernel must cover the whole sequence (kernel size equal to the sequence length), and it needs as many filters as the Dense layer has output units (Hid). Under those conditions, the 1D convolution is equivalent to a Dense layer.
2) Assuming you do not input a sequence to the LSTM (your example):
In your example, the LSTM is used as a replacement for a Dense layer. It serves the same function, though it gives a slightly different result because the gates apply additional transformations (even though there is no sequence).
Since the convolution is then performed on (Sequence, Hid) = (1, Hid), it does operate per timestep. Every filter sees all 128 input channels, so it is fully connected, and the kernel (size 2) is large enough to cover the single element; with padding="same", the second kernel tap only ever multiplies zero padding. This meets the criteria defined above for a 1D convolution to be equivalent to a Dense layer, so you are correct.
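A quick numerical check of both equivalence conditions is sketched below. It uses plain NumPy instead of Keras, so both layers are re-implemented by hand under two assumptions: Keras's Conv1D computes a cross-correlation, and TensorFlow's "same" padding puts the extra zero on the right when the kernel size is even. The sizes in the first case (5 steps, 8 channels, 3 filters) are arbitrary; the second case mirrors the 128 -> 84 convolution from get_model.

```python
import numpy as np

def conv1d(x, kernel, bias, padding="valid"):
    """Keras-style Conv1D (cross-correlation), stride 1.

    x: (steps, in_channels), kernel: (k, in_channels, filters), bias: (filters,)
    'same' follows TensorFlow's rule (assumed): the extra zero goes on the right.
    """
    k = kernel.shape[0]
    if padding == "same":
        pad_total = k - 1
        x = np.pad(x, ((pad_total // 2, pad_total - pad_total // 2), (0, 0)))
    out_steps = x.shape[0] - k + 1
    out = np.empty((out_steps, kernel.shape[2]))
    for t in range(out_steps):
        # each filter sees every input channel in the window -> fully connected
        out[t] = np.einsum("ki,kif->f", x[t:t + k], kernel) + bias
    return out

def dense(x, w, b):
    return x @ w + b

rng = np.random.default_rng(0)

# Case 1: kernel spans the whole sequence ('valid') == Dense on the flattened input
T, C, F = 5, 8, 3
seq = rng.normal(size=(T, C))
k_full = rng.normal(size=(T, C, F))
b_full = rng.normal(size=F)
case1_conv = conv1d(seq, k_full, b_full, padding="valid")          # shape (1, F)
case1_dense = dense(seq.reshape(1, -1), k_full.reshape(T * C, F), b_full)

# Case 2: one timestep, kernel size 2, 'same' padding (as in get_model)
x = rng.normal(size=(1, 128))            # LSTM output for the single timestep
kernel = rng.normal(size=(2, 128, 84))   # weights of Convolution1D(84, 2)
bias = rng.normal(size=84)
case2_conv = conv1d(x, kernel, bias, padding="same")
case2_dense = dense(x, kernel[0], bias)  # Dense(84) reusing only the first tap

print(np.allclose(case1_conv, case1_dense), np.allclose(case2_conv, case2_dense))
```

With more than one timestep, the "same" convolution starts mixing neighbouring steps and the second equality no longer holds.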
As a side note, this type of architecture is something you would typically get from a Neural Architecture Search. The "replacements" used here are not commonplace and are not generally guaranteed to be better than their more established counterparts. Searches based on reinforcement learning or evolutionary algorithms often do find "unconventional" solutions with slightly better accuracy, but such small gains can arise by chance and do not necessarily reflect the usefulness of the architecture itself.

Related

Tensorflow - building LSTM model - need for tf.keras.layers.Dense()

Python 3.7, TensorFlow
I am experimenting with time series forecasting in TensorFlow.
I understand that the second line creates an LSTM RNN, i.e. a recurrent neural network of the Long Short-Term Memory type.
Why do we need to add a Dense(1) layer in the end?
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))
The documentation for Dense() says:
Dense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
Could you rephrase or elaborate on the need for Dense() here?
The following line
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
creates an LSTM layer that reads the input sequence (each step has #features values) and, because return_sequences defaults to False, outputs a single latent representation of size 32. You want to predict a single value, so you need to convert this latent representation of size 32 into a single value. Hence, you add the following line
single_step_model.add(tf.keras.layers.Dense(1))
which adds a Dense layer (a fully connected neural network layer) with one output neuron, which produces a single value. Think of it as a way to transform an intermediate result of higher dimensionality into the final result.
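To make the shapes concrete, here is a NumPy sketch of what Dense(1) does to the LSTM output, following the documented formula output = activation(dot(input, kernel) + bias) with no activation. The random values are stand-ins for the trained weights and for the LSTM(32) output, and the batch size of 4 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, lstm_units = 4, 32

# Stand-in for the LSTM(32) output: one 32-dim vector per input sequence
# (return_sequences defaults to False, so there is no time axis here).
latent = rng.normal(size=(batch_size, lstm_units))

# Dense(1): a (32, 1) kernel and a bias of length 1; no activation -> linear
kernel = rng.normal(size=(lstm_units, 1))
bias = np.zeros(1)
prediction = latent @ kernel + bias

print(prediction.shape)  # one predicted value per sequence
```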
Well, in the tutorial you are following (Time series forecasting), they are trying to forecast the temperature 6 hours ahead, for which they use an LSTM followed by a Dense layer.
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))
A Dense layer is nothing but a regular fully connected NN layer. In this case you are bringing the output dimensionality down to 1, which should represent some proportionality (not necessarily linear) to the temperature you are trying to predict. There are other layers you could use as well; check out Keras Layers.
If you are confused about the input and output shape of the LSTM, check out I/O Shape.

Run multiple models of an ensemble in parallel with PyTorch

My neural network has the following architecture:
input -> 128x (separate fully connected layers) -> output averaging
I am using a ModuleList to hold the list of fully connected layers. Here's how it looks at this point:
import torch.nn as nn
import torch.optim as optim

class MultiHead(nn.Module):
    def __init__(self, dim_state, dim_action, hidden_size=32, nb_heads=1):
        super(MultiHead, self).__init__()
        self.networks = nn.ModuleList()
        for _ in range(nb_heads):
            network = nn.Sequential(
                nn.Linear(dim_state, hidden_size),
                nn.Tanh(),
                nn.Linear(hidden_size, dim_action)
            )
            self.networks.append(network)
        self.cuda()
        self.optimizer = optim.Adam(self.parameters())
Then, when I need to calculate the output, I use a for ... in construct to perform the forward and backward pass through all the layers:
q_values = torch.cat([net(observations) for net in self.networks])
# skipped code which ultimately computes the loss I need
self.optimizer.zero_grad()
loss.backward()
self.optimizer.step()
This works! But I am wondering if I couldn't do this more efficiently. I feel like by doing a for...in, I am actually going through each separate FC layer one by one, while I'd expect this operation could be done in parallel.
If you used Convnd in place of Linear, you could use the groups argument for "grouped convolutions" (depthwise convolutions are the special case where groups equals the number of input channels). This lets you handle all parallel networks simultaneously.
If you use a convolution kernel of size 1, the convolution does nothing more than apply a Linear layer, where each channel is treated as an input dimension. So the rough structure of your network would look like this:
Modify the input tensor of shape B x dim_state as follows: add an additional dimension and replicate it nb_heads times, going from B x dim_state to B x (dim_state * nb_heads) x 1
Replace the two Linear layers with
nn.Conv1d(in_channels=dim_state * nb_heads, out_channels=hidden_size * nb_heads, kernel_size=1, groups=nb_heads)
and
nn.Conv1d(in_channels=hidden_size * nb_heads, out_channels=dim_action * nb_heads, kernel_size=1, groups=nb_heads)
We now have a tensor of size B x (dim_action * nb_heads) x 1, which you can reshape to whatever shape you want (e.g. B x nb_heads x dim_action)
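The arithmetic behind these steps can be checked without PyTorch. The NumPy sketch below (sizes are arbitrary, and only the first layer is shown) compares the per-head loop with a single product against a block-diagonal weight, which is the dense equivalent of what nn.Conv1d(..., kernel_size=1, groups=nb_heads) computes; PyTorch itself stores only the diagonal blocks, as a weight of shape (out_channels, in_channels // groups, 1). np.tile plays the role of replicating the input nb_heads times.

```python
import numpy as np

rng = np.random.default_rng(0)
B, dim_state, hidden_size, nb_heads = 5, 3, 4, 6

# Per-head weights, laid out as nn.Linear(dim_state, hidden_size) stores them: (out, in)
W = rng.normal(size=(nb_heads, hidden_size, dim_state))
b = rng.normal(size=(nb_heads, hidden_size))
x = rng.normal(size=(B, dim_state))

# Loop version: one Linear per head, like the for ... in over self.networks
loop_out = np.stack([x @ W[h].T + b[h] for h in range(nb_heads)], axis=1)  # (B, nb_heads, hidden)

# Grouped 1x1 convolution == one product with a block-diagonal weight
x_rep = np.tile(x, (1, nb_heads))[:, :, None]          # (B, dim_state * nb_heads, 1)
W_block = np.zeros((nb_heads * hidden_size, nb_heads * dim_state))
for h in range(nb_heads):
    W_block[h * hidden_size:(h + 1) * hidden_size,
            h * dim_state:(h + 1) * dim_state] = W[h]
grouped = np.einsum("oi,bil->bol", W_block, x_rep) + b.reshape(-1)[None, :, None]
grouped_out = grouped[:, :, 0].reshape(B, nb_heads, hidden_size)  # B x nb_heads x hidden

print(np.allclose(loop_out, grouped_out))
```

The zeros outside the diagonal blocks are why each head only sees its own copy of the input, so the heads stay independent even though they are computed in one operation.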
While CUDA natively supports grouped convolutions, there were some issues in PyTorch with the speed of grouped convolutions (see e.g. here), but I think those have since been solved.

How to properly setup an RNN in Keras for sequence to sequence modelling?

Although I am not new to machine learning, I am still relatively new to neural networks, and more specifically to implementing them (in Keras/Python). Feedforward and convolutional architectures are fairly straightforward, but I am having trouble with RNNs.
My X data consists of variable-length sequences, each data point in a sequence having 26 features. My y data is also of variable length, but within each pair X and y have the same length, e.g.:
X_train[0].shape: (226,26)
y_train[0].shape: (226,)
X_train[1].shape: (314,26)
y_train[1].shape: (314,)
X_train[2].shape: (189,26)
y_train[2].shape: (189,)
And my objective is to classify each item in the sequence into one of 39 categories.
What I can gather thus far from reading example code, is that we do something like the following:
encoder_inputs = Input(shape=(None, 26))
encoder = GRU(256, return_state=True)
encoder_outputs, state_h = encoder(encoder_inputs)
decoder_inputs = Input(shape=(None, 39))
decoder_gru= GRU(256, return_sequences=True)
decoder_outputs, _ = decoder_gru(decoder_inputs, initial_state=state_h)
decoder_dense = Dense(39, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adadelta(),
metrics=['accuracy'])
This makes sense to me because the sequences have different lengths.
Since I loop over all sequences with a for loop, we use None in the input shape of the first GRU layer because the sequence length is not known in advance, and then return the hidden state state_h of that encoder. The second GRU layer returns sequences and uses the state returned from the encoder as its initial state; its outputs are then passed to a final softmax activation layer.
Obviously something is flawed here because I get:
decoder_outputs, _ = decoder_gru(decoder_inputs, initial_state=state_h)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 458, in __iter__
"Tensor objects are only iterable when eager execution is "
TypeError: Tensor objects are only iterable when eager execution is enabled. To iterate over this tensor use tf.map_fn.
This link points to a proposed solution, but I don't understand why you would add the encoder states to a tuple once for every layer in the network.
I'm really looking for help in being able to successfully write this RNN to do this task, but also understanding. I am very interested in RNNs and want to understand them more in depth so I can apply them to other problems.
As an extra note, each sequence is of shape (sequence_length, 26), but I expand the dimension to be (1, sequence_length, 26) for X and (1, sequence_length) for y, and then pass them in a for loop to be fit, with the decoder_target_data one step ahead of the current input:
for idx in range(X_train.shape[0]):
    X_train_s = np.expand_dims(X_train[idx], axis=0)
    y_train_s = np.expand_dims(y_train[idx], axis=0)
    y_train_s1 = np.expand_dims(y_train[idx+1], axis=0)
    encoder_input_data = X_train_s
    decoder_input_data = y_train_s
    decoder_target_data = y_train_s1
    model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
              epochs=50,
              validation_split=0.2)
With other networks I have written (feedforward and CNN), I specify the model by adding layers to Keras's Sequential class. Because of the inherent complexity of RNNs, I find the general approach above (using Keras's Input class and retrieving hidden states, and cell states for LSTMs) logical, but I have also seen RNNs built with Keras's Sequential class. Although those were many-to-one tasks, I would be interested in how you would write it that way too.
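The problem is that the decoder_gru layer does not return its state (return_state is not set), so its output is a single tensor, and unpacking it into two values tries to iterate over that tensor, which raises the error. Therefore you should not use _ as the return value for the state (i.e. just remove , _):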
decoder_outputs = decoder_gru(decoder_inputs, initial_state=state_h)
Since the input and output lengths are the same and there is a one to one mapping between the elements of input and output, you can alternatively construct the model this way:
inputs = Input(shape=(None, 26))
gru = GRU(64, return_sequences=True)(inputs)
outputs = Dense(39, activation='softmax')(gru)
model = Model(inputs, outputs)
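Because a Keras Dense layer operates on the last axis of its input, Dense(39, activation='softmax') applied to the GRU's (batch, steps, units) output yields one 39-way distribution per timestep, which is exactly the per-item classification asked for. The NumPy sketch below mimics that with random stand-in values for a single sequence of length 226 (like X_train[0]) and the 64-unit GRU above; the weights are not trained.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
steps, gru_units, n_classes = 226, 64, 39

gru_out = rng.normal(size=(1, steps, gru_units))   # stand-in for GRU(64, return_sequences=True)
kernel = rng.normal(size=(gru_units, n_classes)) * 0.1
bias = np.zeros(n_classes)

# Dense acts on the last axis, so the same kernel is applied at every timestep
probs = softmax(gru_out @ kernel + bias)   # (1, 226, 39)
labels = probs.argmax(axis=-1)             # (1, 226): one predicted class per step
```

If the labels are integer class ids of shape (226,), as y_train[0] appears to be, sparse_categorical_crossentropy is the matching loss; categorical_crossentropy expects one-hot targets of shape (226, 39).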
Now you can make this model more complex (i.e. increase its capacity) by stacking multiple GRU layers on top of each other:
inputs = Input(shape=(None, 26))
gru = GRU(256, return_sequences=True)(inputs)
gru = GRU(128, return_sequences=True)(gru)
gru = GRU(64, return_sequences=True)(gru)
outputs = Dense(39, activation='softmax')(gru)
model = Model(inputs, outputs)
Further, instead of GRU layers you can use LSTM layers, which have more representational capacity (though this may come at a higher computational cost). And don't forget that increasing the capacity of the model also increases the chance of overfitting, so keep that in mind and consider techniques that prevent overfitting (e.g. adding regularization).
Side note: if you have a GPU available, you can use the CuDNNGRU (or CuDNNLSTM) layer instead, which is optimized for GPUs and runs much faster than GRU. In TensorFlow 2.x, the standard GRU and LSTM layers use the cuDNN kernel automatically on a GPU as long as their default arguments are kept.

Implementing weight normalization using TensorFlow layers' `kernel_constraint`

Some TensorFlow layers, such as tf.layers.dense and tf.layers.conv2d, take a kernel_constraint argument, which according to the TF API docs implements an
Optional projection function to be applied to the kernel after being updated by an Optimizer (e.g. used to implement norm constraints or value constraints for layer weights).
In [1], Salimans and Kingma present a neural network normalization technique called weight normalization, which normalizes the weight vectors of the network layers, in contrast to, for example, batch normalization [2], which normalizes the actual data batch flowing through the layer. In some cases the computational overhead of weight normalization is lower, and it can also be used where batch normalization is not feasible.
My question is: is it possible to implement weight normalization using the above-mentioned kernel_constraint argument of these TensorFlow layers? Assuming x is an input with shape (batch, height, width, channels), I thought I could implement it as follows:
x = tf.layers.conv2d(
    inputs=x,
    filters=16,
    kernel_size=(3, 3),
    strides=(1, 1),
    # normalize each output filter over all axes except the last (filters) axis
    kernel_constraint=lambda kernel: (
        tf.nn.l2_normalize(kernel, list(range(kernel.shape.ndims - 1)))))
What would be a simple test case to validate/invalidate my solution?
[1] SALIMANS, Tim; KINGMA, Diederik P. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In: Advances in Neural Information Processing Systems. 2016. p. 901-909.
[2] IOFFE, Sergey; SZEGEDY, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
Despite the title, the paper by Salimans and Kingma suggests decoupling the norm of the weights from their direction (w = g · v / ||v||, with g and v learned separately), rather than actually normalising the weights (i.e. setting their L2 norm to one, as you suggested).
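For comparison, here is a NumPy sketch of the paper's reparameterization next to the projection your constraint performs. The kernel shape (3, 3, 8, 16) and the values of g are arbitrary; in actual weight normalization both g and v are trainable parameters, which this forward-only sketch does not show.

```python
import numpy as np

def per_filter_norm(kernel):
    """L2 norm of each output filter (last axis) of a conv kernel."""
    return np.linalg.norm(kernel.reshape(-1, kernel.shape[-1]), axis=0)

rng = np.random.default_rng(0)
v = rng.normal(size=(3, 3, 8, 16))   # direction parameters: (height, width, in, filters)
g = rng.uniform(0.5, 2.0, size=16)   # one scale per output filter

# Weight normalization: w = g * v / ||v||, with the norm taken per output filter
v_norm = np.sqrt((v ** 2).sum(axis=(0, 1, 2), keepdims=True))  # shape (1, 1, 1, 16)
w = g * v / v_norm

# The kernel_constraint in the question instead projects each filter to unit norm
w_constrained = v / v_norm

print(per_filter_norm(w))              # matches g
print(per_filter_norm(w_constrained))  # all ones
```

In other words, the constraint-based version behaves like weight normalization with g fixed at 1, which is why it is not quite what the paper proposes.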
If you want to verify that your code has the intended effect even if it is not what they proposed, you can get the weights of the model and check their norm.
In pseudo-code:
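import numpy as np
model = tf.keras.Model(inputs=inputs, outputs=x)
kernel = model.layers[i].get_weights()[0]  # kernel of the i-th layer
# the constraint normalizes each output filter separately, so check the norm per filter
per_filter = kernel.reshape(-1, kernel.shape[-1])
print(np.linalg.norm(per_filter, ord=2, axis=0))  # should be ~1.0 once at least one optimizer step has run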

How to verify structure a neural network in keras model?

I'm new to Keras and neural networks. I'm writing a thesis and trying to create a SimpleRNN in Keras as illustrated below:
As shown in the picture, I need to create a model with 4 inputs and 2 outputs, with any number of neurons in the hidden layer.
This is my code:
model = Sequential()
model.add(SimpleRNN(4, input_shape=(1, 4), activation='sigmoid', return_sequences=True))
model.add(Dense(2))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(data, target, epochs=5000, batch_size=1, verbose=2)
predict = model.predict(data)
1) Does my model implement the graph?
2) Is it possible to specify connections between neurons Input and Hidden layers or Output and Input layers?
Explanation:
I am going to use backpropagation to train my network.
I have input and target values
Input is a 10*4 array and target is a 10*2 array which I then reshape:
input = input.reshape((10, 1, 4))
target = target.reshape((10, 1, 2))
It is crucial for me to be able to specify connections between neurons, as they can differ. For instance, here is an example:
1) Not really. But I'm not sure about what exactly you want in that graph. (Let's see how Keras recurrent layers work below)
2) Yes, it's possible to connect every layer to every layer, but you can't use Sequential for that, you must use Model.
This answer may not be what you're looking for. What exactly do you want to achieve? What kind of data do you have, what output do you expect, what is the model supposed to do? etc...
1 - How does a recurrent layer work?
Documentation
Recurrent layers in Keras work with an "input sequence" and may output either a single result or a sequence. Their recurrence is entirely contained within the layer and doesn't interact with other layers.
You should have inputs with shape (NumberOfExamples, TimeStepsInSequence, DimensionOfEachStep). This means input_shape=(TimeSteps, Dimension).
The recurrent layer works internally on each time step. The cycles happen from step to step and this behavior is completely invisible from outside: the layer looks just like any other layer.
This doesn't seem to be what you want, unless you have a "sequence" to input. The only way I know of using recurrent layers in Keras that resembles your graph is when you have a segment of a sequence and want to predict the next step. If that's the case, search Google for "predicting the next element" to find some examples.
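To make the hidden per-step loop of a recurrent layer concrete, here is a NumPy sketch of the computation a SimpleRNN performs for one example. It assumes Keras's default tanh activation (the question uses sigmoid) and a zero initial state; the weights are random stand-ins and the function name simple_rnn is hypothetical.

```python
import numpy as np

def simple_rnn(inputs, kernel, recurrent_kernel, bias, return_sequences=False):
    """Sketch of a SimpleRNN forward pass for one example.

    inputs: (timesteps, input_dim); returns (timesteps, units) or (units,)
    """
    h = np.zeros(kernel.shape[1])  # the initial state is zeros
    outputs = []
    for x_t in inputs:             # this loop over time is what the layer hides
        h = np.tanh(x_t @ kernel + h @ recurrent_kernel + bias)
        outputs.append(h)
    return np.stack(outputs) if return_sequences else h

rng = np.random.default_rng(0)
timesteps, input_dim, units = 1, 4, 4   # input_shape=(1, 4) as in the question
x = rng.normal(size=(timesteps, input_dim))
kernel = rng.normal(size=(input_dim, units))
recurrent_kernel = rng.normal(size=(units, units))
bias = np.zeros(units)

seq_out = simple_rnn(x, kernel, recurrent_kernel, bias, return_sequences=True)
print(seq_out.shape)
```

With input_shape=(1, 4), as in the question, the loop runs only once, so recurrent_kernel only ever multiplies the zero initial state and no recurrence actually takes place.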
2 - How to connect layers using Model:
Instead of adding layers to a sequential model (which will always follow a straight line), start using the layers independently, starting from an input tensor:
from keras.layers import *
from keras.models import Model
inputTensor = Input(shapeOfYourInput) #it seems the shape is "(2,)", but we must see your data.
#A dense layer with 2 outputs:
myDense = Dense(2, activation=ItsAGoodIdeaToUseAnActivation)
#The output tensor of that layer when you give it the input:
denseOut1 = myDense(inputTensor)
#You can do as many cycles as you want here:
denseOut2 = myDense(denseOut1)
#you can even make a loop:
denseOut = Activation(ItsAGoodIdeaToUseAnActivation)(inputTensor) #you may create a layer and call it with the input tensor in just one line if you're not going to reuse the layer
#I'm applying this activation layer here because since we defined an activation for the dense layer and we're going to cycle it, it's not going to behave very well receiving huge values in the first pass and small values the next passes....
for i in range(n):
    denseOut = myDense(denseOut)
This kind of usage allows you to create any kind of model, with branches, alternative ways, connections from anywhere to anywhere, provided you respect the shape rules. For a cycle like that, inputs and outputs must have the same shape.
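Reusing a layer object means reusing its weights. The NumPy sketch below mimics the cycle above with a hypothetical 2-unit my_dense function standing in for myDense; tanh stands in for the unspecified ItsAGoodIdeaToUseAnActivation, and the weights are random.

```python
import numpy as np

rng = np.random.default_rng(0)
units = 2
W = rng.normal(size=(units, units)) * 0.5  # the single kernel owned by the shared layer
b = np.zeros(units)

def my_dense(x):
    # every call uses the same W and b: calling a layer twice shares its weights
    return np.tanh(x @ W + b)

def unrolled(x, n):
    out = np.tanh(x)          # the Activation applied before the loop
    for _ in range(n):        # input and output shapes must match to cycle
        out = my_dense(out)
    return out

x = rng.normal(size=(10, units))
out3 = unrolled(x, 3)
out5 = unrolled(x, 5)         # a different number of cycles, same weights
```

Changing n is like building a second Model: the graph is longer, but W and b are still the single set of weights owned by the shared layer.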
At the end, you must define a model from one or many inputs to one or many outputs (you must have training data to match all inputs and outputs you choose):
model = Model(inputTensor,denseOut)
But notice that this model is static. If you want to change the number of cycles, you will have to create a new model.
In this case, it would be as simple as repeating the loop step denseOut = myDense(denseOut) and creating another model2=Model(inputTensor,denseOut).
3 - Trying to create something like the image below:
I am supposing C and F will participate in all iterations. If not,
Since there are four actual inputs, and we are going to treat them all separately, let's create 4 inputs instead, all like (1,).
Your input array should be divided in 4 arrays, all being (10,1).
from keras.models import Model
from keras.layers import *
inputA = Input((1,))
inputB = Input((1,))
inputC = Input((1,))
inputF = Input((1,))
Now the layers N2 and N3, that will be used only once, since C and F are constant:
outN2 = Dense(1)(inputC)
outN3 = Dense(1)(inputF)
Now the recurrent layer N1, without giving it the tensors yet:
layN1 = Dense(1)
For the loop, let's create outA and outB. They start as the actual inputs and will be given to the layer N1, but in the loop they will be replaced:
outA = inputA
outB = inputB
Now in the loop, let's do the "passes":
for i in range(n):
    #unite A and B in one
    inputAB = Concatenate()([outA,outB])
    #pass through N1
    outN1 = layN1(inputAB)
    #sum results of N1 and N2 into A
    outA = Add()([outN1,outN2])
    #this is constant for all the passes except the first
    outB = outN3 #looks like B is never changing in your image....
Now the model:
finalOut = Concatenate()([outA,outB])
model = Model([inputA,inputB,inputC,inputF], finalOut)
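To see what this graph computes, here is a NumPy sketch that mirrors it step by step, with each Dense(1) reduced to its weights (no activation, as in the code above). The answer leaves n undefined, so n = 3 is a hypothetical value, and the four (10, 1) arrays are random stand-ins for the split input data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3  # number of passes (hypothetical value)

# Four (10, 1) input arrays, one per Input((1,)) in the Keras model
A, B, C, F = (rng.normal(size=(10, 1)) for _ in range(4))

# Dense(1) layers as plain weights: N1 takes 2 inputs, N2 and N3 take 1
w1, b1 = rng.normal(size=(2, 1)), 0.0
w2, b2 = rng.normal(size=(1, 1)), 0.0
w3, b3 = rng.normal(size=(1, 1)), 0.0

out_n2 = C @ w2 + b2   # computed once: C is constant
out_n3 = F @ w3 + b3   # computed once: F is constant

out_a, out_b = A, B
for _ in range(n):
    in_ab = np.concatenate([out_a, out_b], axis=1)  # Concatenate()
    out_n1 = in_ab @ w1 + b1                         # the shared layer N1
    out_a = out_n1 + out_n2                          # Add()
    out_b = out_n3
final_out = np.concatenate([out_a, out_b], axis=1)  # (10, 2)
```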
