I'm using a custom loss function:
def ratio_loss(y, y0):
    return (K.mean(K.abs(y - y0) / y))
and I get negative predicted values, which in my case doesn't make sense (I use a CNN with a regression output as the last layer to get the length of an object).
I used division in order to penalize errors more where the true value is small relative to the predicted one.
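To make the effect of the division concrete, here is a minimal pure-Python sketch of the same formula (no Keras involved; `ratio_loss_plain` and the numbers are made up for this illustration). It assumes the first argument is the true value, as Keras passes it. The same absolute error of 5 costs ten times more when the true value is 10 than when it is 100:

```python
def ratio_loss_plain(y_true, y_pred):
    """Mean of |y_true - y_pred| / y_true, mirroring ratio_loss above."""
    terms = [abs(t - p) / t for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

small = ratio_loss_plain([10.0], [15.0])    # absolute error 5 on a small target
large = ratio_loss_plain([100.0], [105.0])  # absolute error 5 on a large target
print(small, large)
```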
How can I prevent the negative predictions?
This is the model (for now):
def create_model():
    model = Sequential()
    model.add(Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same', input_shape=(128, 128, 1)))
    model.add(Dropout(0.5))
    model.add(Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'))
    model.add(Dropout(0.25))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'))
    model.add(Dropout(0.25))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    #
    #
    model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'))
    model.add(Dropout(0.25))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.15))
    model.add(Dense(1))
    #model.compile(loss=keras.losses.mean_squared_error, optimizer=keras.optimizers.Adadelta(), metrics=[sacc])
    model.compile(loss=ratio_loss, optimizer=keras.optimizers.Adadelta(), metrics=[sacc])
    return model
Thanks,
Amir
You could continue training your neural network, and hopefully it will learn not to make any predictions below 0 (assuming all of the training targets are above 0). You could then add a post-prediction step: if the model makes any prediction below 0, just convert it to 0.
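The post-prediction step can be sketched in a few lines of plain Python. The helper name `clip_negative` and the sample values are made up for this illustration; with NumPy arrays, `np.maximum(predictions, 0)` would do the same thing.

```python
def clip_negative(predictions, floor=0.0):
    """Replace every prediction below `floor` with `floor`."""
    return [max(p, floor) for p in predictions]

raw = [12.3, -0.7, 0.0, 5.1]   # made-up model outputs
print(clip_negative(raw))       # [12.3, 0.0, 0.0, 5.1]
```

Note that this only hides the negative values after the fact; it does not change what the network learns.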
You could add an activation function as Daniel Möller answered.
That would involve changing
model.add(Dense(1))
to
model.add(Dense(1, activation='softplus'))
since you mentioned you wanted the output to be from 0 to ~200 in a comment.
This would guarantee there's no output below 0.
def ratio_loss(y, y0):
    # parenthesize the difference so the whole relative error is inside abs()
    return K.mean(K.abs((y - y0) / y))
But what is the range of your expected output?
You should probably be using some activation function at the end such as:
activation='sigmoid' - from 0 to 1
activation='tanh' - from -1 to +1
activation='softmax' - if it's a classification problem with only one correct class
activation='softplus' - from 0 to +inf
etc.
Usage in the last layer:
model.add(Dense(1,activation='sigmoid')) #from 0 to 1
#optional, from 0 to 200 after using the sigmoid above
model.add(Lambda(lambda x: 200*x))
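To check the ranges listed above without building a model, here is a small pure-Python sketch of the same functions. The helper names are mine, not Keras API, and the `softplus` here is just the textbook formula log(1 + e^x) written in a numerically stable form.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    """log(1 + e^x), written so that large |x| does not overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Columns: input, sigmoid, 200 * sigmoid (the Lambda layer above), tanh, softplus
for x in (-5.0, 0.0, 5.0):
    print(x, sigmoid(x), 200 * sigmoid(x), math.tanh(x), softplus(x))
```

Both the scaled sigmoid and softplus stay positive for any input, so either one rules out negative predictions; the scaled sigmoid additionally caps the output at 200.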
Hint: if you're a beginner, avoid relying too heavily on "relu"; its units often get stuck at 0, and it must be used with carefully selected learning rates.
I'm following this tutorial here.
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same', input_shape=(32, 32, 3)))
model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(10, activation='softmax'))
I am trying to understand the given code which uses the CIFAR-10 dataset.
why is he using kernel_initializer='he_uniform'?
why did he choose the 128 for the dense layer?
what will happen if we add more dense layer to the code like:
model.add(Dense(512, activation='relu', kernel_initializer='he_uniform'))
is there any way to increase the accuracy of the model?
what would be a suitable dropout rate?
why is he using kernel_initializer='he_uniform'?
The weights in a layer of a neural network are initialized randomly. How though? Which distribution should they follow? he_uniform is a strategy for initializing the weights of that layer.
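As a rough illustration of what he_uniform does: it draws each weight from a uniform distribution on [-limit, limit] with limit = sqrt(6 / fan_in), where fan_in is the number of inputs feeding one unit. The sketch below uses only the standard library; the function names are made up, and this is not the Keras implementation itself.

```python
import math
import random

def he_uniform_limit(fan_in):
    """Bound of the He-uniform distribution: sqrt(6 / fan_in)."""
    return math.sqrt(6.0 / fan_in)

def he_uniform_sample(fan_in, count, seed=0):
    """Draw `count` weights uniformly from [-limit, limit]."""
    rng = random.Random(seed)
    limit = he_uniform_limit(fan_in)
    return [rng.uniform(-limit, limit) for _ in range(count)]

# First Conv2D of the tutorial model: 3x3 kernel over 3 input channels,
# so each filter sees fan_in = 3 * 3 * 3 inputs; the layer has 32 filters.
fan_in = 3 * 3 * 3
limit = he_uniform_limit(fan_in)
weights = he_uniform_sample(fan_in, count=fan_in * 32)
print(round(limit, 4), len(weights))
```

The bound shrinks as fan_in grows, which keeps the scale of the activations roughly stable from layer to layer; this scheme is commonly paired with ReLU activations, as in the tutorial code.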
why did he choose the 128 for the dense layer?
This was chosen arbitrarily.
What will happen if we add more dense layer to the code like:
model.add(Dense(512, activation='relu', kernel_initializer='he_uniform'))
I assume you mean adding it where the other 128-neuron Dense layer is (there it won't break the model). The model will become deeper and have a much higher number of parameters (i.e. your model will become more complex), with whatever positives or negatives come along with this.
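To put a number on "much higher", here is a small parameter count in plain Python. It assumes the new Dense(512) goes between Flatten and the existing Dense(128); that placement is my assumption, and `dense_params` is a made-up helper.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

# Flatten output of the tutorial model: 32x32 input, 'same' convolutions and
# three 2x2 poolings leave 4x4 feature maps with 128 channels.
flat = 4 * 4 * 128

original = dense_params(flat, 128) + dense_params(128, 10)
# Assumed placement: Flatten -> Dense(512) -> Dense(128) -> Dense(10)
deeper = dense_params(flat, 512) + dense_params(512, 128) + dense_params(128, 10)
print(original, deeper)
```

Under that assumption the dense part of the network grows from about 0.26 million to about 1.1 million parameters.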
what would be a suitable dropout rate?
Usually you see rates in the range of [0.2, 0.5]. Higher rates reduce overfitting but might cause training to become more unstable.
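For intuition about what the rate means, here is a pure-Python sketch of inverted dropout, the scheme Keras uses at training time: each activation is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate). The function below is an illustration, not the Keras layer.

```python
import random

def dropout(activations, rate, rng):
    """Inverted dropout: zero each value with probability `rate`,
    scale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(0)
acts = [1.0] * 10000
for rate in (0.2, 0.5):
    out = dropout(acts, rate, rng)
    dropped = sum(1 for v in out if v == 0.0) / len(out)
    # fraction dropped is close to `rate`; the mean stays close to 1.0
    print(rate, round(dropped, 2), round(sum(out) / len(out), 2))
```

With a rate of 0.5, half of the units are silenced on every step, which is why higher rates regularize more strongly but also make each update noisier.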
I am working in Python and TensorFlow, but I get an error about a missing 'units' argument and I do not know how to solve it.
Here is the code:
def createModel():
    model = Sequential()
    # first set of CONV => RELU => MAX POOL layers
    model.add(Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=inputShape))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(output_dim=NUM_CLASSES, activation='softmax'))
    # returns our fully constructed deep learning + Keras image classifier
    opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
    # use binary_crossentropy if there are two classes
    model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
    return model
print("Reshaping trainX at..."+ str(datetime.now()))
#print(trainX.sample())
print(type(trainX)) # <class 'pandas.core.series.Series'>
print(trainX.shape) # (750,)
import numpy as np
Xtrain = np.zeros([trainX.shape[0],HEIGHT, WIDTH, DEPTH])
for i in range(trainX.shape[0]): # 0 to traindf Size -1
    Xtrain[i] = trainX[i]
print(Xtrain.shape) # (750,128,128,3)
print("Reshaped trainX at..."+ str(datetime.now()))
print("Reshaping valX at..."+ str(datetime.now()))
print(type(valX)) # <class 'pandas.core.series.Series'>
print(valX.shape) # (250,)
Xval = np.zeros([valX.shape[0],HEIGHT, WIDTH, DEPTH])
for i in range(valX.shape[0]): # 0 to valdf Size -1
    Xval[i] = valX[i]
print(Xval.shape) # (250,128,128,3)
print("Reshaped valX at..."+ str(datetime.now()))
# initialize the model
print("compiling model...")
sys.stdout.flush()
model = createModel()
# print the summary of model
from keras.utils import print_summary
print_summary(model, line_length=None, positions=None, print_fn=None)
# add some visualization
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
SVG(model_to_dot(model).create(prog='dot', format='svg'))
Try changing this line:
model.add(Dense(output_dim=NUM_CLASSES, activation='softmax'))
to
model.add(Dense(NUM_CLASSES, activation='softmax'))
I'm not experienced in Keras, but I could not find a parameter called output_dim on the documentation page for Dense. I think you meant to provide units but labelled it as output_dim.
The Keras Dense layer documentation is as follows:
keras.layers.Dense(units, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
Using the following :
classifier.add(Dense(6, activation='relu', kernel_initializer='glorot_uniform',input_dim=11))
This will work because units plays the role of output_dim here: it says that we need 6 neurons in the hidden layer. The weights are initialized with the Glorot uniform initializer, and input_dim=11 says that the input layer feeds 11 independent variables of the dataset into this hidden layer.
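One way to see how units and input_dim relate is to look at the shapes they imply. This is a plain-Python sketch; `dense_shapes` is a made-up helper, not a Keras function.

```python
def dense_shapes(input_dim, units):
    """Shapes of the kernel and bias of a Dense layer, plus parameter count."""
    kernel_shape = (input_dim, units)   # one column of weights per neuron
    bias_shape = (units,)               # one bias per neuron
    return kernel_shape, bias_shape, input_dim * units + units

kernel, bias, n_params = dense_shapes(input_dim=11, units=6)
print(kernel, bias, n_params)   # (11, 6) (6,) 72
```

So units fixes the size of the layer's output, while input_dim only tells the first layer how wide its input is.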
I think it's a version issue. In recent versions of Keras, Dense has no "output_dim" argument.
You can see this documentation link for Dense arguments.
https://keras.io/api/layers/core_layers/dense/
tf.keras.layers.Dense(
units,
activation=None,
use_bias=True,
kernel_initializer="glorot_uniform",
bias_initializer="zeros",
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
**kwargs
)
So the first argument is "units", which is mandatory.
Instead of this line:
model.add(Dense(output_dim=NUM_CLASSES, activation='softmax'))
use this:
model.add(Dense(units=NUM_CLASSES, activation='softmax'))
or
model.add(Dense(NUM_CLASSES, activation='softmax'))
I have a classifier model that I trained using the 'theano' backend. The model works properly and I got the expected classification performance. The tensor size is Nx3x28x112. However, I would like to use the same classifier in another file (main_file.py) which contains a GAN implementation (with the 'tensorflow' backend). Therefore, I want to use the same classifier in main_file.py and change the input size of the tensor to Nx28x112x3 (the proper input for the tensorflow backend). When training starts, the performance is nowhere near what I got with 'theano' and is close to random. My model looks like:
def createModel():
    model = Sequential()
    # The first two layers with 28 filters of window size 3x3
    model.add(Conv2D(28, (3, 3), padding='same', activation='relu', input_shape=(28, 112, 3)))
    # or input_shape = (3, 28, 112) in case of theano backend
    model.add(Conv2D(28, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(nClasses, activation='softmax'))
    return model
What should I do to make the model perform properly? Is there any fundamental difference between the backends other than the order of the input tensor dimensions?
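One thing that may be worth checking (an assumption on my part, since the data-loading code is not shown) is how the arrays were converted from Nx3x28x112 to Nx28x112x3. The axes have to be permuted; a plain reshape keeps the values in the same memory order and scrambles the pixels. The pure-Python sketch below shows the permutation for a single tiny sample; `chw_to_hwc` is a made-up helper.

```python
def chw_to_hwc(sample):
    """Convert one channels-first sample (C, H, W) to channels-last (H, W, C)."""
    channels, height, width = len(sample), len(sample[0]), len(sample[0][0])
    return [[[sample[c][h][w] for c in range(channels)]
             for w in range(width)]
            for h in range(height)]

# Tiny made-up sample: 3 channels, 2 rows, 2 columns.
sample = [[[1, 2], [3, 4]],
          [[10, 20], [30, 40]],
          [[100, 200], [300, 400]]]
converted = chw_to_hwc(sample)
print(converted[0][0])   # pixel (0, 0) across channels: [1, 10, 100]
```

With NumPy, `np.transpose(x, (0, 2, 3, 1))` performs the same permutation on a whole batch.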
I'm trying to combine LSTM with CNN but I got stuck because of an error.
Here is the model I'm trying to implement:
model=Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(28, 28,3), activation='relu'))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(LSTM(128, return_sequences=True,input_shape=(1,32), activation='relu'))
model.add(LSTM(256))
model.add(Dropout(0.25))
model.add(Dense(37))
model.compile(loss='categorical_crossentropy', optimizer='adam')
and error happens in the first LSTM layer:
ERROR: Input 0 is incompatible with layer lstm_12: expected ndim=3, found ndim=2
The input of LSTM layer should be a 3D array which represents a sequence or a timeseries (this is what the error is trying to say: expected ndim=3). However, in your model the input of LSTM layer, which is actually the output of the Dense layer before it, is a 2D array (i.e. found ndim=2). To make it into a 3D array of shape (n_samples, n_timesteps, n_features), one solution is to use a RepeatVector layer to repeat it as much as the number of timesteps (which you need to specify in your code):
model.add(Dense(32, activation='relu'))
model.add(RepeatVector(n_timesteps))
model.add(LSTM(128, return_sequences=True, input_shape=(n_timesteps,32), activation='relu'))
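To illustrate the shape change that RepeatVector performs, here is a pure-Python sketch on nested lists. `repeat_vector` is a stand-in written for this example, not the Keras layer, and the batch values are made up.

```python
def repeat_vector(batch, n_timesteps):
    """(n_samples, n_features) -> (n_samples, n_timesteps, n_features)."""
    return [[list(features) for _ in range(n_timesteps)] for features in batch]

batch = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # 2 samples, 3 features
repeated = repeat_vector(batch, n_timesteps=4)
print(len(repeated), len(repeated[0]), len(repeated[0][0]))   # 2 4 3
```

Every timestep carries the same feature vector, so the LSTM receives a valid 3D input, but the sequence itself contains no new information from step to step.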
My input shape is (20, 10, 1)
my non working model looks like this:
num_classes = 2
model.add(Conv2D(32, (5, 5),
padding='same',
data_format='channels_last',
input_shape=input_shape))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(32, (5, 5)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (5, 5), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(64, (5, 5)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
self.model = model
Which gives me the following error:
Negative dimension size caused by subtracting 5 from 2 for 'conv2d_4/convolution' (op: 'Conv2D') with input shapes: [?,5,2,64], [5,5,64,64].
However, the error disappears if I do one of two things:
1. Remove all three model.add(Conv2D(64, (5, 5))) layers
or
2. Change those three Conv2D layers from (5, 5) to (3, 3) and change all pool_size values to (2, 2)
I understand that the dimensions at the end of the 4th layer are causing the trouble.
What should I do to make this model work in its present state?
Basically, I want to compare the performance of this model (5x5 filters with pool_size (3, 3)) with another model that uses 3x3 filters with pool_size (2, 2). Thanks
The problem is that the output size of layer conv2d_4 became zero or negative.
To solve this problem, you must design the network so that the input data is not downsampled too aggressively.
Here are some possible solutions:
Use fewer layers. In particular, remove a max-pooling layer, which downsamples a lot (to one third of the input size under this setting).
Use smaller max-pooling, e.g. pool_size=(2, 2), which downsamples by half.
Use "same" padding for the Conv2D layers, which results in no downsampling during the convolution step.
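To see where the size runs out, the spatial shape can be traced layer by layer with the usual output-size rules: 'valid' gives floor((n - window) / stride) + 1 and 'same' gives ceil(n / stride). The sketch below is plain Python; the layer list is transcribed from the question for its (20, 10) input, and the helper names are mine.

```python
import math

def out_size(n, window, stride=1, padding="valid"):
    """Output length along one axis for a conv or pooling layer."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - window) // stride + 1

def trace(shape, layers):
    """Apply (name, window, stride, padding) layers to a (height, width) shape."""
    shapes = []
    for name, window, stride, padding in layers:
        shape = tuple(out_size(n, window, stride, padding) for n in shape)
        shapes.append((name, shape))
    return shapes

# The model from the question: 5x5 kernels, 3x3 pooling
# (the pooling stride defaults to the pool size).
layers_5x5 = [
    ("conv2d_1", 5, 1, "same"),
    ("conv2d_2", 5, 1, "valid"),
    ("max_pooling2d_1", 3, 3, "valid"),
    ("conv2d_3", 5, 1, "same"),
    ("conv2d_4", 5, 1, "valid"),
]
for name, shape in trace((20, 10), layers_5x5):
    print(name, shape)
```

The feature map is already (5, 2) when it reaches conv2d_4, so a 5-wide 'valid' window cannot fit along the second axis, which matches the "subtracting 5 from 2" message. Swapping in 3x3 kernels with 2x2 pooling, or setting padding='same' on conv2d_4, keeps every size positive.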
Change the strides to (1, 1) as shown below. That resolved the problem for me:
model.add(MaxPooling2D(pool_size=(2, 2),strides=(1,1),padding='same',name = 'pool2'))
The full model definition in my code is:
def getModel():
    optim = Adam(lr= LR, decay=0)
    model =Sequential()
    model.add(Conv2D(32,(3,3),activation = 'relu',input_shape =[28,28,1],kernel_initializer='he_uniform',
                     kernel_regularizer=regularizers.l2(l2_labmda),data_format='channels_last',name='1st'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2),strides=(1,1),padding='same',name = 'pool1'))
    model.add(Dropout(0.2,name='2'))
    model.add(Conv2D(64,(3,3),activation = 'relu',kernel_initializer='he_uniform',
                     kernel_regularizer=regularizers.l2(l2_labmda),name='3'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2),strides=(1,1),padding='same',name = 'pool2'))
    model.add(Dropout(0.2,name='4'))
    model.add(Conv2D(64,(5,5),activation = 'relu',kernel_initializer='he_uniform',
                     kernel_regularizer=regularizers.l2(l2_labmda),name='5'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2),strides=(1,1),padding='same',name = 'pool3'))
    model.add(Dropout(0.2, name='6'))
    # =============================================================================
    model.add(Conv2D(128,(3,3),activation = 'relu',kernel_initializer='he_uniform',
                     kernel_regularizer=regularizers.l2(l2_labmda),name='7'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2),strides=(2,2),padding='same',name = 'pool5'))
    model.add(Dropout(0.2,name='8'))
    #
    # =============================================================================
    model.add(Conv2D(128,(3,3),activation = 'relu',kernel_initializer='he_uniform',
                     kernel_regularizer=regularizers.l2(l2_labmda),name='9'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2),strides=(2,2),padding='same',name = 'pool6'))
    model.add(Dropout(0.2))
    model.add(Flatten(name='12'))
    model.add(Dense(512,activation='relu',kernel_initializer='he_uniform',
                    kernel_regularizer=regularizers.l2(l2_labmda), name='13'))
    model.add(Dropout(0.2,name='14'))
    model.add(Dense(512,activation='relu',kernel_initializer='he_uniform',
                    kernel_regularizer=regularizers.l2(l2_labmda),name='15'))
    model.add(Dropout(0.2,name='16'))
    # note: no trailing space inside the initializer name
    model.add(Dense(40,activation='softmax',kernel_initializer='glorot_uniform',
                    kernel_regularizer=regularizers.l2(l2_labmda),name='17'))
    model.compile(optimizer=optim,loss='categorical_crossentropy',metrics=['accuracy'])
    print(model.summary())
    return model
model = getModel()
Max pooling reduces the width and height of your output. For example, if you apply pool_size=(4, 4) with a stride of 4 to an input of shape (20, 20, 1), the output shape is (5, 5, 1). Apply the same pooling to an input of shape (10, 10, 1) and the output is only (2, 2, 1); any further layer whose window is larger than 2 then has nothing left to slide over, which produces the negative-dimension error. So consider reducing the pool_size of your MaxPooling2D layers (or using fewer of them) so that no layer receives an input smaller than its window.
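The numbers above follow from the pooling output-size rule. With 'valid' padding (the Keras default) the size along one axis is floor((n - pool) / stride) + 1, and with 'same' padding it is ceil(n / stride). A quick plain-Python check, using a made-up helper `pool_out`:

```python
import math

def pool_out(n, pool, stride, padding="valid"):
    """Output length along one axis of a max-pooling layer."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - pool) // stride + 1

print(pool_out(20, 4, 4))                  # 5
print(pool_out(10, 4, 4))                  # 2
print(pool_out(2, 4, 4))                   # 0, not a valid size
print(pool_out(2, 2, 1, padding="same"))   # 2, size preserved
```

The last line is the stride-1, 'same'-padding case: the pooling no longer shrinks the feature map at all, which avoids the error but also gives up the downsampling.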