I am fairly new to TensorFlow, and I was following the answer to the question below in order to build a custom loss function in Keras that considers only the top 20 predictions.
How can I sort the values in a custom Keras / Tensorflow Loss Function?
However, when I try to compile my model using this code, I get the following error about dimensions:
InvalidArgumentError: input must have last dimension >= k = 20 but is 1 for 'loss_21/dense_65_loss/TopKV2' (op: 'TopKV2') with input shapes: [?,1], [] and with computed input tensors: input[1] = <20>.
A simplified version of the code that reproduces the error is the following:
import tensorflow as tf
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense
from tensorflow.python.keras.optimizers import SGD
top = 20
def top_loss(y_true, y_pred):
    y_pred_top_k, y_pred_ind_k = tf.nn.top_k(y_pred, top)
    loss_per_sample = tf.reduce_mean(tf.reduce_sum(y_pred_top_k, axis=-1))
    return loss_per_sample
model = Sequential()
model.add(Dense(50, input_dim=201))
model.add(Dense(1))
sgd = SGD(lr=0.01, decay=0, momentum=0.9)
model.compile(loss=top_loss, optimizer=sgd)
The error is thrown at the following line of the top_loss function when the model is compiled:
y_pred_top_k, y_pred_ind_k = tf.nn.top_k(y_pred, top)
It seems that y_pred at compile time is by default of shape [?, 1], while tf.nn.top_k expects its last dimension to be at least k (i.e. 20).
Do I have to cast y_pred to something so that tf.nn.top_k knows it is of the correct dimensions?
Use:
y_pred_top_k, y_pred_ind_k = tf.nn.top_k(y_pred[:,0], top)
y_pred[:,0] gets the predicted values of the full batch as a rank 1 tensor.
Another Problem:
However, you will still run into a problem with the last batch. Say your batch size is 32 and your training data has 100 samples; then the last batch will have fewer than 20 samples, so tf.nn.top_k will raise a runtime error for that batch. One option is to make sure your last batch size is >= 20 to avoid this issue, but a much better way is to check whether the current batch has fewer than 20 samples and, if so, adjust the k value used in top_k.
Code
import tensorflow as tf
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense
from tensorflow.python.keras.optimizers import SGD
top = tf.constant(20)

def top_loss(y_true, y_pred):
    # use the current batch size as k if the batch has fewer than `top` samples
    result = tf.cond(tf.math.greater(top, tf.shape(y_true)[0]),
                     lambda: tf.shape(y_true)[0], lambda: top)
    y_pred_top_k, y_pred_ind_k = tf.nn.top_k(y_pred[:, 0], result)
    loss_per_sample = tf.reduce_mean(tf.reduce_sum(y_pred_top_k, axis=-1))
    return loss_per_sample
model = Sequential()
model.add(Dense(50, input_dim=201))
model.add(Dense(1))
sgd = SGD(lr=0.01, decay=0, momentum=0.9)
model.compile(loss=top_loss, optimizer=sgd)
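For what it's worth, the same clamping can be written a bit more compactly with tf.minimum; a minimal sketch of the same loss, assuming the same model setup as above:

import tensorflow as tf

top = 20

def top_loss(y_true, y_pred):
    # clamp k to the current batch size so the last, smaller batch still works
    k = tf.minimum(top, tf.shape(y_pred)[0])
    y_pred_top_k, _ = tf.nn.top_k(y_pred[:, 0], k)
    return tf.reduce_mean(tf.reduce_sum(y_pred_top_k, axis=-1))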
I am developing an LSTM autoencoder model for anomaly detection. I have my Keras model set up as below:
from keras.models import Sequential
from keras import Model, layers
from keras.layers import Layer, Conv1D, Input, Masking, Dense, RNN, LSTM, Dropout, RepeatVector, TimeDistributed, Reshape
def create_RNN_with_attention():
    x = Input(shape=(X_train_dt.shape[1], X_train_dt.shape[2]))
    RNN_layer_1 = LSTM(units=64, return_sequences=False)(x)
    attention_layer = attention()(RNN_layer_1)
    dropout_layer_1 = Dropout(rate=0.2)(attention_layer)
    repeat_vector_layer = RepeatVector(n=X_train_dt.shape[1])(dropout_layer_1)
    RNN_layer_2 = LSTM(units=64, return_sequences=True)(repeat_vector_layer)
    dropout_layer_1 = Dropout(rate=0.2)(RNN_layer_2)
    output = TimeDistributed(Dense(X_train_dt.shape[2], trainable=True))(dropout_layer_1)
    model = Model(x, output)
    model.compile(loss='mae', optimizer='adam')
    return model
Notice the attention layer that I added, attention_layer. Before adding it, the model compiled perfectly; after adding the attention_layer, however, the model throws the following error: ValueError: Input 0 is incompatible with layer repeat_vector_40: expected ndim=2, found ndim=1
My attention layer is set up as follows:
import keras.backend as K

class attention(Layer):
    def __init__(self, **kwargs):
        super(attention, self).__init__(**kwargs)

    def build(self, input_shape):
        self.W = self.add_weight(name='attention_weight', shape=(input_shape[-1], 1),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(name='attention_bias', shape=(input_shape[1], 1),
                                 initializer='zeros', trainable=True)
        super(attention, self).build(input_shape)

    def call(self, x):
        # Alignment scores. Pass them through tanh function
        e = K.tanh(K.dot(x, self.W) + self.b)
        # Remove dimension of size 1
        e = K.squeeze(e, axis=-1)
        # Compute the weights
        alpha = K.softmax(e)
        # Reshape to tensorFlow format
        alpha = K.expand_dims(alpha, axis=-1)
        # Compute the context vector
        context = x * alpha
        context = K.sum(context, axis=1)
        return context
The idea of the attention mask is to allow the model to focus on more prominent features as it trains.
Why am I getting the error above and how can I fix this?
I think that the problem lies in this line:
RNN_layer_1 = LSTM(units=64, return_sequences=False)(x)
This layer outputs a tensor of shape (batch_size, 64). So you output a single vector per sample, and then run the attention mechanism w.r.t. the batch dimension instead of a sequence dimension. This also means the attention output collapses to a rank-1 tensor, which is not acceptable for any Keras layer. That is why the RepeatVector layer raises an error: it expects a tensor of at least shape (batch_dimension, dim).
If you want to run the attention mechanism over a sequence, then you should switch the line mentioned above to:
RNN_layer_1 = LSTM(units=64, return_sequences=True)(x)
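For completeness, a minimal sketch of the model-construction function with that single change applied, reusing the attention class and X_train_dt from the question (everything else kept as in the original code):

def create_RNN_with_attention():
    x = Input(shape=(X_train_dt.shape[1], X_train_dt.shape[2]))
    # return the full sequence so attention runs over time steps, not the batch
    RNN_layer_1 = LSTM(units=64, return_sequences=True)(x)
    attention_layer = attention()(RNN_layer_1)
    dropout_layer_1 = Dropout(rate=0.2)(attention_layer)
    repeat_vector_layer = RepeatVector(n=X_train_dt.shape[1])(dropout_layer_1)
    RNN_layer_2 = LSTM(units=64, return_sequences=True)(repeat_vector_layer)
    dropout_layer_2 = Dropout(rate=0.2)(RNN_layer_2)
    output = TimeDistributed(Dense(X_train_dt.shape[2]))(dropout_layer_2)
    model = Model(x, output)
    model.compile(loss='mae', optimizer='adam')
    return model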
This is my code:
# Importing the essential libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Getting the dataset
data = pd.read_csv("sales_train.csv")
X = data.iloc[:, 1:-1].values
y = data.iloc[:, -1].values
# y = np.array(y).reshape(-1, 1)
# Getting the values for november 2013 and 2014 to predict 2015
list_of_november_values = []
list_of_november_values_y = []
for i in range(0, len(y)):
    if X[i, 0] == 10 or X[i, 0] == 22:
        list_of_november_values.append(X[i, 1:])
        list_of_november_values_y.append(y[i])
# Converting list to array
arr_of_november_values = np.array(list_of_november_values)
y_train = np.array(list_of_november_values_y).reshape(-1, 1)
# Scaling the independent values
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(arr_of_november_values)
# Creating the neural network
from keras.models import Sequential
from keras.layers import Dense
nn = Sequential()
nn.add(Dense(units=120, activation='relu'))
nn.add(Dense(units=60, activation='relu'))
nn.add(Dense(units=30, activation='relu'))
nn.add(Dense(units=15, activation='relu'))
nn.add(Dense(units=1, activation='softmax'))
nn.compile(optimizer='adam', loss='mse')
nn.fit(X_train, y_train, batch_size=100, epochs=25)
# Saving the weights
nn.save_weights('weights.h5')
print("Weights Saved")
For my loss, I am getting the same value for every epoch. Is there a concept I am missing that is causing my loss to be constant?
Here is the dataset for the code.
The predominant reason is your odd choice of final-layer activation, paired with the loss function used. Reconsider this: you are using a softmax activation on a single-unit fully-connected layer. Softmax takes a vector and scales it so that the sum of the values equals one while retaining their proportions, according to the following function:
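$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$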
With a single unit, the denominator is just the numerator, so the activation is always e^z / e^z = 1. Your network will only ever output 1; thus there are no gradients, and no learning.
To resolve this, first change your final layer activation to either ReLU or Linear, depending upon the structure of your dataset (I'm not willing to use the provided data myself, but I'm sure you understand the structure of your dataset).
I expect there may be further issues regarding the structure of your network, but I'll leave that up to you. For now, the big issue is your final-layer activation.
Change this line:
nn.add(Dense(units=1, activation='softmax'))
To this line:
nn.add(Dense(units=1))
For a regression problem, you don't need an activation function on the output layer (Dense defaults to a linear activation).
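Put together, the tail of the model would then look like this (a minimal sketch; the earlier layers stay as they are):

nn.add(Dense(units=1))  # linear output for the regression target
nn.compile(optimizer='adam', loss='mse')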
I am trying to create my own custom activation function in Keras, which would return 0 if x < 0 and 1 if x >= 0.
from keras.layers import Dense
from keras.models import Sequential
from keras.layers import Activation
import tensorflow as tf
def hard_lim(x):
    zero = tf.convert_to_tensor(0., x.dtype.base_dtype)
    one = tf.convert_to_tensor(1., x.dtype.base_dtype)
    sess = tf.Session()
    if sess.run(tf.greater_equal(x, zero)):
        return one
    else:
        return zero

model = Sequential()
model.add(Dense(4, input_dim=2, activation=Activation(hard_lim)))
model.add(Dense(2, activation=Activation(hard_lim)))
model.add(Dense(1, activation=Activation(hard_lim)))
It's giving me this error:
InvalidArgumentError (see above for traceback): You must feed a value
for placeholder tensor '1_input' with dtype float and shape [?,2]
How can I fix it?
Warning: the operation you want has no gradient, so it will not allow any weights before it to be trainable. You will see error messages like "An operation has None for gradient" or something like "None type not supported".
As a workaround for your activation, I believe the 'relu' activation would be the closest and best option, with the advantage of being very popular and used in most models.
In Keras, you don't usually run sessions. For custom operations, you create a function using backend functions.
So, instead of running a session, you'd define it with backend functions:
import keras.backend as K
def hardlim(x):
    return K.cast(K.greater_equal(x, 0), K.floatx())
You can then use activation=hardlim in layers.
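For example, a minimal sketch of the question's model using this function directly (keeping in mind the warning above: the step activation has no gradient, so the weights before it won't actually train):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(4, input_dim=2, activation=hardlim))
model.add(Dense(2, activation=hardlim))
model.add(Dense(1, activation=hardlim))
model.compile(optimizer='sgd', loss='mse')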
I'm trying to get the activation values for each layer in this baseline autoencoder built using Keras, since I want to add a sparsity penalty to the loss function based on the Kullback-Leibler (KL) divergence, as shown here, page 14.
In this scenario, I'm going to calculate the KL divergence for each layer and then sum all of them with the main loss function, e.g. mse.
I therefore made a script in Jupyter to do that, but every time I try to compile I get ZeroDivisionError: integer division or modulo by zero.
This is the code:
import numpy as np
from keras.layers import Conv2D, Activation
from keras.models import Sequential
from keras import backend as K
from keras import losses
x_train = np.random.rand(128,128).astype('float32')
kl = K.placeholder(dtype='float32')
beta = K.constant(value=5e-1)
p = K.constant(value=5e-2)
# encoder
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=(4,4), padding='same',
                 name='encoder', input_shape=(128,128,1)))
model.add(Activation('relu'))
# get the average activation
A = K.mean(x=model.output)
# calculate the value for the KL divergence
kl = K.concatenate([kl, losses.kullback_leibler_divergence(p, A)],axis=0)
# decoder
model.add(Conv2D(filters=1,kernel_size=(4,4),padding='same', name='encoder'))
model.add(Activation('relu'))
B = K.mean(x=model.output)
kl = K.concatenate([kl, losses.kullback_leibler_divergence(p, B)],axis=0)
This seems to be the cause:
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py in _normalize_axis(axis, ndim)
989 else:
990 if axis is not None and axis < 0:
991 axis %= ndim <----------
992 return axis
993
so there might be something wrong in the mean calculation. If I print the value I get
Tensor("Mean_10:0", shape=(), dtype=float32)
which is quite strange because the weights and the biases are initialised to non-zero values. So there might also be something wrong in the way I'm getting the activation values.
I really don't know how to fix it; I'm not much of a skilled programmer.
Could anyone help me in understanding where I'm wrong?
First, you shouldn't be doing calculations outside layers. The model must keep track of all calculations.
If you need a specific calculation to be done in the middle of the model, you should use a Lambda layer.
If you need that a specific output be used in the loss function, you should split your model for that output and do calculations inside a custom loss function.
Here, I used a Lambda layer to calculate the mean, and a custom loss (customLoss) to calculate the Kullback-Leibler divergence.
import numpy as np
from keras.layers import *
from keras.models import Model
from keras import backend as K
from keras import losses
x_train = np.random.rand(128,128).astype('float32')
kl = K.placeholder(dtype='float32') #you'll probably not need this anymore, since losses will be treated individually in each output.
beta = K.constant(value=5e-1)
p = K.constant(value=5e-2)
# encoder
inp = Input((128,128,1))
lay = Convolution2D(filters=16,kernel_size=(4,4),padding='same', name='encoder',activation='relu')(inp)
#apply the mean using a lambda layer:
intermediateOut = Lambda(lambda x: K.mean(x),output_shape=(1,))(lay)
# decoder
finalOut = Convolution2D(filters=1, kernel_size=(4,4), padding='same', name='decoder', activation='relu')(lay)
#but from that, let's also calculate a mean output for loss:
meanFinalOut = Lambda(lambda x: K.mean(x),output_shape=(1,))(finalOut)
#Now, you have to create a model taking one input and those three outputs:
splitModel = Model(inp,[intermediateOut,meanFinalOut,finalOut])
And finally, compile your model with your custom loss function (we will define that later). But since I don't know if you're actually using the final output (not mean) for training, I'll suggest creating one model for training and another for predicting:
trainingModel = Model(inp,[intermediateOut,meanFinalOut])
trainingModel.compile(...,loss=customLoss)
predictingModel = Model(inp,finalOut)
#you don't need to compile the predicting model since you're only training the trainingModel
#both will share the same weights, you train one, and predict in the other
Our custom loss function should then deal with the Kullback-Leibler divergence.
def customLoss(p, mean):
    return #your own Kullback-Leibler expression (I don't know how it works, but maybe Keras' one can be used with single values?)
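As a sketch of what that expression could look like, here is the KL penalty in its usual sparse-autoencoder form, between a target activation p and the measured mean activation. Note that Keras passes losses as (y_true, y_pred), so the target sparsity p would be fed as the label for that output; this formulation is my assumption, not something specified in the post:

import keras.backend as K

def customLoss(y_true, y_pred):
    # y_pred: the scalar mean activation produced by the Lambda layer
    # y_true: fed as the target sparsity p when fitting trainingModel (an assumption)
    p = y_true
    return p * K.log(p / y_pred) + (1 - p) * K.log((1 - p) / (1 - y_pred))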
Alternatively, if you want a single loss function to be called instead of two:
summedMeans = Add()([intermediateOut, meanFinalOut])
trainingModel = Model(inp, summedMeans)
I'm new to Keras and Python. I'm working in Keras to fit a model to my data and then use model.predict for optimization; however, model.predict seems to only take input as a NumPy array of at least 2 elements.
My code is
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
import numpy as np
x = np.arange(-2,3.0,0.01)
y = x**2 - 2*x + 1
model = Sequential()
model.add(Dense(50, activation='sigmoid',
                input_dim=1, init='uniform'))
model.add(Dense(1, activation='linear'))
sgd = SGD(lr=0.05, decay=1e-6, momentum=0.9, nesterov=False)
model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(x,y,nb_epoch=300, batch_size = 5,verbose = 0)
The code fits fine, but if I try to use model.predict on a scalar number it gives me an error:
(Pdb) model.predict(0.0)
*** Exception: Error when checking : data should be a Numpy array, or list/dict of Numpy arrays. Found: 0.0...
I forced it to be a NumPy array but it still failed, saying that the input needs to have 2 dimensions:
(Pdb) model.predict(np.asarray(0.0))
*** Exception: Error when checking : expected dense_input_1 to have 2 dimensions, but got array with shape ()
But if I input two numbers, then it gives me the answer:
(Pdb) model.predict([0.0,0.0])
array([[ 1.07415712],
[ 1.07415712]], dtype=float32)
I need model.predict to take a single number as input to use for optimization. I'm not sure which setting I'm using wrong. Please help, thanks.
Try:
model.predict(np.asarray(0.0).reshape((1,1)))
In Keras, the first dimension is always the sample (batch) dimension, so it must be provided.
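Equivalently, you can build the (1, 1)-shaped input directly; a minimal sketch:

import numpy as np

# one sample with one feature -> shape (1, 1)
print(model.predict(np.array([[0.0]])))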