According to the documentation, tf.keras.activations.relu(x, alpha=0.0, max_value=None, threshold=0) seems to clip x within [threshold, max_value], but x must be specified. How can I use it for clipping the output of a layer in neural network? Or is there a more convenient way to do so?
Suppose I want to output the linear combination of all elements of a 10-by-10 2D-array only when the result is between 0 and 5.
import tensorflow as tf
from tensorflow import keras
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[10, 10]))
model.add(keras.layers.Dense(1, activation='relu') # output layer
Related
I am all new to neural network. I have a dataset of 3d joint positions (6400*23*3) and orientations in quaternions (6400*23*4) and I want to predict the joint angles for all 22 joints and 3 motion planes (6400*22*3). I have tried to make a model however it will not run as the input data don't match the output shape, and I can't figure out how to change it.
my code
import scipy
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling2D, Dense, Flatten
from tensorflow.keras.utils import to_categorical
Jaload = scipy.io.loadmat('JointAnglesXsens11MovementsIforlængelse.mat')
Orload = scipy.io.loadmat('OrientationXsens11MovementsIforlængelse.mat')
Or = np.array((Orload['OR'][:,:]), dtype='float')
Ja = np.array((Jaload['JA'][:,:]), dtype='float')
Jalabel = np.array(Ja)
a = 0.6108652382
Jalabel[Jalabel<a] = 0
Jalabel[Jalabel>a] = 1
Ja3d = np.array(Jalabel.reshape(6814,22,3)) # der er 22 ledvinkler
Or3d = np.array(Or.reshape(6814,23,4)) # der er 23 segmenter
X_train = np.array(Or3d)
Y_train = np.array(Ja3d)
model = Sequential([
Dense(64, activation='relu', input_shape=(23,4)),
Dense(64, activation='relu'),
Dense(3, activation='softmax'),])
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam') # works
model.fit(
X_train,
to_categorical(Y_train),
epochs=3,)
Running the model.fit returns with:
ValueError: A target array with shape (6814, 22, 3, 2) was passed for an output of shape (None, 3) while using as loss categorical_crossentropy. This loss expects targets to have the same shape as the output.
Here are some suggestions that might get you further down the road:
(1) You might want to insert a "Flatten()" layer just before the final Dense. This will basically collapse the output from the previous layers into a single dimension.
(2) You might want to make the final Dense layer have 22*3=66 units as opposed to three. Each output unit will represent a particular joint angle.
(3) You might want to likewise collapse the Y_train to be (num_samples, 22*3) using the numpy reshape.
(4) You might want to make the final Dense layer have "linear" activation instead of "softmax" - softmax will force the outputs to sum to 1 as a probability.
(5) Don't convert the y_train to categorical. It is already in the correct format I believe (after you reshape it to match the revised output of the model).
(6) The metric to use is probably not "categorical_crossentropy" but perhaps "mse" (mean squared error).
Hopefully, some of the above will help move you in the right direction. I hope this helps.
The problem is the following. I have a categorical prediction task of vocabulary size 25K. On one of them (input vocab 10K, output dim i.e. embedding 50), I want to introduce a trainable weight matrix for a matrix multiplication between the input embedding (shape 1,50) and the weights (shape(50,128)) (no bias) and the resulting vector score is an input for a prediction task along with other features.
The crux is, I think that the trainable weight matrix varies for each input, if I simply add it in. I want this weight matrix to be common across all inputs.
I should clarify - by input here I mean training examples. So all examples would learn some example specific embedding and be multiplied by a shared weight matrix.
After every so many epochs, I intend to do a batch update to learn these common weights (or use other target variables to do multiple output prediction)
LSTM? Is that something I should look into here?
With the exception of an Embedding layer, layers apply to all examples in the batch.
Take as an example a very simple network:
inp = Input(shape=(4,))
h1 = Dense(2, activation='relu', use_bias=False)(inp)
out = Dense(1)(h1)
model = Model(inp, out)
This a simple network with 1 input layer, 1 hidden layer and an output layer. If we take the hidden layer as an example; this layer has a weights matrix of shape (4, 2,). At each iteration the input data which is a matrix of shape (batch_size, 4) is multiplied by the hidden layer weights (feed forward phase). Thus h1 activation is dependent on all samples. The loss is also computed on a per batch_size basis. The output layer has a shape (batch_size, 1). Given that in the forward phase all the batch samples affected the values of the weights, the same is true for backdrop and gradient updates.
When one is dealing with text, often the problem is specified as predicting a specific label from a sequence of words. This is modelled as a shape of (batch_size, sequence_length, word_index). Lets take a very basic example:
from tensorflow import keras
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
sequence_length = 80
emb_vec_size = 100
vocab_size = 10_000
def make_model():
inp = Input(shape=(sequence_length, 1))
emb = Embedding(vocab_size, emb_vec_size)(inp)
emb = Reshape((sequence_length, emb_vec_size))(emb)
h1 = Dense(64)(emb)
recurrent = LSTM(32)(h1)
output = Dense(1)(recurrent)
model = Model(inp, output)
model.compile('adam', 'mse')
return model
model = make_model()
model.summary()
You can copy and paste this into colab and see the summary.
What this example is doing is:
Transform a sequence of word indices into a sequence of word embedding vectors.
Applying a Dense layer called h1 to all the batches (and all the elements in the sequence); this layer reduces the dimensions of the embedding vector. It is not a typical element of a network to process text (in isolation). But this seemed to match your question.
Using a recurrent layer to reduce the sequence into a single vector per example.
Predicting a single label from the "sentence" vector.
If I get the problem correctly you can reuse layers or even models inside another model.
Example with a Dense layer. Let's say you have 10 Inputs
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
# defining 10 inputs in a List with (X,) shape
inputs = [Input(shape = (X,),name='input_{}'.format(k)) for k in
range(10)]
# defining a common Dense layer
D = Dense(64, name='one_layer_to_rule_them_all')
nets = [D(inp) for inp in inputs]
model = Model(inputs = inputs, outputs = nets)
model.compile(optimizer='adam', loss='categorical_crossentropy')
This code is not going to work if the inputs have different shapes. The first call to D defines its properties. In this example, outputs are set directly to nets. But of course you can concatenate, stack, or whatever you want.
Now if you have some trainable model you can use it instead of the D:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
# defining 10 inputs in a List with (X,) shape
inputs = [Input(shape = (X,),name='input_{}'.format(k)) for k in
range(10)]
# defining a shared model with the same weights for all inputs
nets = [special_model(inp) for inp in inputs]
model = Model(inputs = inputs, outputs = nets)
model.compile(optimizer='adam', loss='categorical_crossentropy')
The weights of this model are shared among all inputs.
I'm studying the paper "An Introduction to Deep Learning for the Physical Layer". While implementing the proposed network with python keras, I should normalize output of some layer.
One way is simple L2 Normalization (||X||^2 = 1), where X is a tensor of former layer output. I can implement simple L2 Normalization by the following code:
from keras import backend as K
Lambda(lambda x: K.l2_normalize(x,axis=1))
The other way, what I want to know, is ||X||^2 ≤ 1.
Is there any way that constrains the value of layer outputs?
You can apply constraint on layer wights (kernels) for some keras layers. For example on a Dense() layer like:
from keras.constraints import max_norm
from keras.layers import Dense
model.add(Dense(units, kernel_constraint=max_norm(1.)))
But keras layer does not accept an activity_constraint argument, However they accept activity_regularizer and you can use that to implement the first kind of regularization easier).
You can also clip output values of any layer to have maximum norm 1.0 (although I'm not sure if this is what you're looking for). For example if you're using a tensorflow backend, you can define a custom activation layer that clips the value of the layer by norm like:
import tensorflow as tf
def norm_clip(x):
return tf.clip_by_norm(x, 1, axes=[1])
And use it in your model like:
model.add(Dense(units, activation=norm_clip))
I am brand new to Deep-Learning so I'm reading though Deep Learning with Keras by Antonio Gulli and learning a lot. I want to start using some of the concepts. I want to try and implement a neural network with a 1-dimensional convolutional layer that feeds into a bidirectional recurrent layer (like the paper below). All the tutorials or code snippets I've encountered do not implement anything remotely similar to this (e.g. image recognition) or use an older version of keras with different functions and usage.
What I'm trying to do is a variation of this paper:
(1) convert DNA sequences to one-hot encoding vectors; ✓
(2) use a 1 dimensional convolutional neural network; ✓
(3) with max pooling; ✓
(4) send the output to a bidirectional RNN; ⓧ
(5) classify the input;
I cannot figure out how to get the shapes to match up on the Bidirectional RNN. I can't even get an ordinary RNN to work at this stage. How can I restructure the incoming layers to work with a Bidirectional RNN?
Note:
The original code came from https://github.com/uci-cbcl/DanQ/blob/master/DanQ_train.py but I simplified the output layer to just do binary classification. This processed was described (kind of) in https://github.com/fchollet/keras/issues/3322 but I cannot get it to work with the updated keras. The original code (and the 2nd link) work on a very large dataset so I am generating some fake data to illustrate the concept. They are also using an older version of keras where key functionality changes have been made since then.
# Imports
import tensorflow as tf
import numpy as np
from tensorflow.python.keras._impl.keras.layers.core import *
from tensorflow.python.keras._impl.keras.layers import Conv1D, MaxPooling1D, SimpleRNN, Bidirectional, Input
from tensorflow.python.keras._impl.keras.models import Model, Sequential
# Set up TensorFlow backend
K = tf.keras.backend
K.set_session(tf.Session())
np.random.seed(0) # For keras?
# Constants
NUMBER_OF_POSITIONS = 40
NUMBER_OF_CLASSES = 2
NUMBER_OF_SAMPLES_IN_EACH_CLASS = 25
# Generate sequences
https://pastebin.com/GvfLQte2
# Build model
# ===========
# Input Layer
input_layer = Input(shape=(NUMBER_OF_POSITIONS,4))
# Hidden Layers
y = Conv1D(100, 10, strides=1, activation="relu", )(input_layer)
y = MaxPooling1D(pool_size=5, strides=5)(y)
y = Flatten()(y)
y = Bidirectional(SimpleRNN(100, return_sequences = True, activation="tanh", ))(y)
y = Flatten()(y)
y = Dense(100, activation='relu')(y)
# Output layer
output_layer = Dense(NUMBER_OF_CLASSES, activation="softmax")(y)
model = Model(input_layer, output_layer)
model.compile(optimizer="adam", loss="categorical_crossentropy", )
model.summary()
# ~/anaconda/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/layers/recurrent.py in build(self, input_shape)
# 1049 input_shape = tensor_shape.TensorShape(input_shape).as_list()
# 1050 batch_size = input_shape[0] if self.stateful else None
# -> 1051 self.input_dim = input_shape[2]
# 1052 self.input_spec[0] = InputSpec(shape=(batch_size, None, self.input_dim))
# 1053
# IndexError: list index out of range
You don't need to restructure anything at all to get the output of a Conv1D layer into an LSTM layer.
So, the problem is simply the presence of the Flatten layer, which destroys the shape.
These are the shapes used by Conv1D and LSTM:
Conv1D: (batch, length, channels)
LSTM: (batch, timeSteps, features)
Length is the same as timeSteps, and channels is the same as features.
Using the Bidirectional wrapper won't change a thing either. It will only duplicate your output features.
Classifying.
If you're going to classify the entire sequence as a whole, your last LSTM must use return_sequences=False. (Or you may use some flatten + dense instead after)
If you're going to classify each step of the sequence, all your LSTMs should have return_sequences=True. You should not flatten the data after them.
I learning python by using of the Keras high-level neural networks library.
I made a simple neural network with only one neuron for a linear classification problem for 2 inputs and one outputs.
My network is working well, but if I want to calculate the prediction self (to demonstrate the working of the NN), using the weights of the network, there will be a little difference between the own "implementation" of neural network's firing and Keras's. E.g.:
Using predict() method on trained network:
testset = np.array([[5],[1]])
prediction = model.predict(testset.transpose())
print prediction
The result is:
[[ 0.22708023]]
Calculate the result self:
# get the weights of the neural network
for layer in model.layers:
weights = layer.get_weights()
# the math of the prediction
prediction_calc = testset[0]*weights[0][0]+testset[1]*weights[0][1]+1*weights[1]
print prediction_calc
The reusult is:
[ 0.22708024]
Why is this little difference between the two values? The neural network should do more as I made at calculating the prediction_calc variable.
I think it should be something with casting between variables. If I print the weights variable, I see, it is a matrices of float32 values.
[array([[-0.07256483],
[ 0.02924729]], dtype=float32), array([ 0.56065708], dtype=float32)]
I don't find why is this difference, it should be a simple rounding error, but I don't know how to avoid it.
For some help here is the whole code:
import numpy as np
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
# load and transpose input data set
inputset = np.loadtxt("learningsets/hyperplane2d_INPUTS.csv", delimiter=",")
inputset = inputset.transpose()
# load output data set
outputset = np.loadtxt("learningsets/hyperplane2d_OUTPUTS.csv", delimiter=",")
outputset = outputset
# build the simple NN
from keras.models import Sequential
model = Sequential()
from keras.layers import Dense
# The neuron
# Dense: 2 inputs, 1 outputs . Linear activation
model.add(Dense(output_dim=1, input_dim=2, activation="linear"))
model.compile(loss='mean_squared_error', metrics=['accuracy'], optimizer='sgd')
model.fit(inputset, outputset, nb_epoch=20, batch_size=10)
# calculate prediction for one input
testset = np.array([[5],[1]])
predictions = model.predict(testset.transpose())
print predictions
# get the weights of the neural network
for layer in model.layers:
weights = layer.get_weights()
# the math of the prediction
prediction_calc = testset[0]*weights[0][0]+testset[1]*weights[0][1]+1*weights[1]
print prediction_calc