Restrict the sum of outputs in a neural network regression (Keras) - python

I'm predicting 7 targets that are ratios of a single value, so for each sample the predicted values should sum to 1.
Apart from using softmax at the output (which seems obviously incorrect), I can't figure out another way to restrict the sum of all predicted outputs to 1.
Thanks for any suggestions.
input_x = Input(shape=(input_size,))
output = Dense(512, activation=PReLU())(input_x)
output = Dropout(0.5)(output)
output = Dense(512, activation=PReLU())(output)
output = Dropout(0.5)(output)
output = Dense(16, activation=PReLU())(output)
output = Dropout(0.3)(output)
outputs = Dense(output_size, activation='softmax')(output)
#outputs = [Dense(1, activation=PReLU())(output) for i in range(output_size)] #multioutput nn
nn = Model(inputs=input_x, outputs=outputs)
es = EarlyStopping(monitor='val_loss',min_delta=0,patience=10,verbose=1, mode='auto')
opt=Adam(lr=0.001, decay=1-0.995)
nn.compile(loss='mean_absolute_error', optimizer=opt)
history = nn.fit(X, Y, validation_data = (X_t, Y_t), epochs=100, verbose=1, callbacks=[es])
Example of targets:
So, this is all ratios from one feature, sum for each row =1.
For example Feature - 'Total' =100 points, A=25 points, B=25 points, all others - 10 points. So, my 7 target ratios will be 0.25/0.25/0.1/0.1/0.1/0.1/0.1.
I need to train and predict such ratios, so in future, knowing 'Total' we can restore points from predicted ratios.

I think I understand your motivation, and also why "softmax won't cut it".
This is because softmax doesn't scale linearly, so:
>>> from scipy.special import softmax
>>> softmax([1, 2, 3, 4])
array([0.0320586 , 0.08714432, 0.23688282, 0.64391426])
>>> softmax([1, 2, 3, 4]) * 10
array([0.32058603, 0.87144319, 2.36882818, 6.4391426 ])
Which looks nothing like the original array.
Don't dismiss softmax too easily, though: it copes with special situations such as negative values, zeros, or a zero sum of the pre-activation signal. But if you want the final regression output to be normalized to one and expect the results to be non-negative, you can simply divide each sample's output by its sum:
input_x = Input(shape=(input_size,))
output = Dense(512, activation=PReLU())(input_x)
output = Dropout(0.5)(output)
output = Dense(512, activation=PReLU())(output)
output = Dropout(0.5)(output)
output = Dense(16, activation=PReLU())(output)
output = Dropout(0.3)(output)
outputs = Dense(output_size, activation='relu')(output)
outputs = Lambda(lambda x: x / K.sum(x, axis=-1, keepdims=True))(outputs)  # sum per sample, not over the whole batch
nn = Model(inputs=input_x, outputs=outputs)
The Dense layer of course needs a different activation than 'softmax' (relu or even linear is OK).
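To make the per-sample normalization concrete, here is a minimal NumPy sketch of the same idea outside Keras. The helper name `normalize_rows`, the `eps` guard, and the sample numbers are illustrative assumptions, not part of the original model; `eps` is only there because a ReLU output row can be all zeros.

```python
import numpy as np

def normalize_rows(raw, eps=1e-8):
    """Divide each row by its own sum so every row adds up to (almost exactly) 1."""
    raw = np.asarray(raw, dtype=float)
    # keepdims=True keeps the shape (n_samples, 1) so the division broadcasts per row
    row_sums = raw.sum(axis=-1, keepdims=True)
    # eps guards against dividing by zero when a whole row is zero
    return raw / (row_sums + eps)

# Hypothetical non-negative network outputs for two samples.
# The second row is the first one scaled by 1/5.
raw_outputs = np.array([[25.0, 25.0, 10.0, 10.0, 10.0, 10.0, 10.0],
                        [ 5.0,  5.0,  2.0,  2.0,  2.0,  2.0,  2.0]])

ratios = normalize_rows(raw_outputs)
print(ratios)
print(ratios.sum(axis=-1))
```

Unlike softmax, this keeps the proportions of the raw outputs, so both rows map to the same ratios even though their scales differ.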

Related

How to print out equation that multiple linear regression model is using in Tensorflow?

For a multiple linear regression model in Tensorflow in python, how can you print out the equation that the model is using to predict the label. The model I am currently using takes two features to predict one label, so I think the general equation is this but how could I get the unknown parameters and values of all the constants using Tensorflow?
Code:
fundingFeatures = fundingTrainSet.copy()
fundingLabels = fundingFeatures.pop('% of total funding spent')
fundingFeatures = np.array(fundingFeatures)
normalizer = preprocessing.Normalization()
normalizer.adapt(fundingFeatures)
model = tf.keras.Sequential([
normalizer,
layers.Dense(units=1)
])
model.compile(loss = tf.losses.MeanSquaredError(),
optimizer = tf.keras.optimizers.SGD(
learning_rate=0.06, momentum=0.0, nesterov=True, name="SGD",
))
model.fit(fundingFeatures, fundingLabels, epochs=1000)
I will explain how you can write out the equation of your NN.
To do that, I have modified your code and added fixed values for your features and labels. That lets me show the whole calculation step by step, so that next time you can do it yourself.
Based on the information you have provided, it seems that you have:
A NN with 2 layers.
The first layer is a Normalization layer.
The second layer is a Dense layer.
2 features in your input tensor and 1 single output.
Let's start with the normalization layer. For normalization layers, it is kind of "strange" in my opinion to use the term "weights". The weights are basically the mean and variance that will be applied to each input in order to normalize the data.
I will call the 2 input features x0 and x1.
If you run my code (which is your code with my fixed data), you will see that the weights for the normalization layer are
[5. 4.6]
[ 5.4 11.24]
It means that the means for your [x0 x1] columns are [5. 4.6] and the variances are [5.4 11.24]
Can we verify that? Yes, we can. Let's check for x0.
[1,4,8,7,3,6,6,5,2,8]
mean = 5
stddev = 2.323790008
variance = 5.4 ( variance = stddev^2)
As you can see, it matches the "weights" of the normalization layer.
As data is pushed through the normalization layer, each value is normalized as
x' = (x-mean)/stddev ( stddev, not variance )
You can check that by applying the normalization to the data.
In the code, if you run these 2 lines
normalized_data = normalizer(fundingFeatures)
print(normalized_data)
You will get
[[-1.7213259 1.31241 ]
[-0.43033147 1.014135 ]
[ 1.2909944 0.41758505]
[ 0.86066294 -0.47723997]
[-0.86066294 -1.07379 ]
[ 0.43033147 1.31241 ]
[ 0.43033147 -1.07379 ]
[ 0. -1.07379 ]
[-1.2909944 0.71586 ]
[ 1.2909944 -1.07379 ]]
Let's verify the first number.
x0[0] = 1
x0'[0] = (1-5)/2.323790008 = -1.7213 ( it does match)
At this point, we should be able to write the equations for the normalization layer
y[0]' = (x0-5)/2.323790008 # (x-mean)/stddev
y[1]' = (x1-4.6)/3.352610923
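The statistics used in these two equations can be reproduced without Keras. The sketch below uses plain NumPy on the same ten fixed feature rows. It assumes the layer uses the population variance (dividing by N), which is what the numbers above are consistent with.

```python
import numpy as np

# Same fixed feature rows as in the full script: columns are x0 and x1
features = np.array([[1, 9], [4, 8], [8, 6], [7, 3], [3, 1],
                     [6, 9], [6, 1], [5, 1], [2, 7], [8, 1]], dtype=float)

mean = features.mean(axis=0)      # per-column mean
variance = features.var(axis=0)   # population variance (ddof=0)
stddev = np.sqrt(variance)

# Apply x' = (x - mean) / stddev to every row
normalized = (features - mean) / stddev

print("mean:    ", mean)
print("variance:", variance)
print("stddev:  ", stddev)
print("first normalized row:", normalized[0])
```

The first normalized row should agree with the first row printed by the normalizer, up to float32 rounding.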
Now, these 2 outputs will be fed into the next layer. Remember, you have a Dense layer, so it is fully connected: both values will be fed into the single neuron.
These lines show the value of both weights and bias for the Dense layer.
weights = model.layers[1].get_weights()[0]
biases = model.layers[1].get_weights()[1]
print(weights)
print(biases)
[[-0.12915221]
[-0.41322172]]
[0.32663438]
A neuron multiplies each input by its weight, then adds up the results together with the bias.
Let's modify y[0]' and y[1]' to include the weights.
y[0]' = ((x0-5)/2.323790008) * -0.12915221
y[1]' = (x1-4.6)/3.352610923 * -0.41322172
We are close, we just need to sum up these 2 and add the bias
y' = ((x0-5)/2.323790008)* -0.12915221 + (x1-4.6)/3.352610923 * -0.41322172 + 0.32663438
Since you don't have an activation function, we can stop here.
How can we verify if the formula is right?
Let's use the model to predict the label for a random input and see if it matches the result we get when we put the same values in our equation.
First, let's run a model prediction for [4,5]
print(model.predict( [[4,5]] ))
[[0.3329112]]
Now, let's plug the same inputs to our equation
y' = (((4-5)/2.323790008)* -0.12915221) + ((5-4.6)/3.352610923 * -0.41322172) + 0.32663438
y' = 0.332911
It seems that we are good. I dropped some digits of precision just to make my life easier.
Here is the function for your model. Just replace my numbers with your numbers.
y' = ((x0-5)/2.323790008)* -0.12915221 + (x1-4.6)/3.352610923 * -0.41322172 + 0.32663438
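The same equation can be wrapped in a small Python function, which makes it easy to compare against `model.predict`. This is only a sketch: the constant names and the `predict` helper are made up for illustration, and the numbers are the ones from this particular training run, so a different run will give different weights and bias.

```python
import math

# Values copied from the walkthrough above; replace them with your own model's.
MEAN = (5.0, 4.6)                         # normalization layer: per-feature mean
VARIANCE = (5.4, 11.24)                   # normalization layer: per-feature variance
WEIGHTS = (-0.12915221, -0.41322172)      # Dense layer kernel
BIAS = 0.32663438                         # Dense layer bias

def predict(x0, x1):
    """Normalize both inputs, then apply the single linear neuron."""
    z0 = (x0 - MEAN[0]) / math.sqrt(VARIANCE[0])
    z1 = (x1 - MEAN[1]) / math.sqrt(VARIANCE[1])
    return z0 * WEIGHTS[0] + z1 * WEIGHTS[1] + BIAS

result = predict(4, 5)
print(result)
```

For the input [4, 5] the printed value should agree with the 0.3329112 reported by the model to about five decimal places.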
And here is the code. I have also added tensorboard so you can verify yourself what I have said here.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
from matplotlib import pyplot as plt
import numpy as np
import datetime
fundingFeatures = tf.constant([[1, 9], [4, 8], [8, 6], [7 ,3], [3 ,1], [6, 9], [6, 1], [5, 1], [2, 7], [8, 1]], dtype=tf.int32)
fundingLabels = tf.constant([ 0.8160469, -0.05249139, 1.1515405, 1.0792135, 0.80369186, -1.7353221, 1.0092108, 0.19228514, -0.10366996, 0.10583907])
normalizer = preprocessing.Normalization()
normalizer.adapt(fundingFeatures)
normalized_data = normalizer(fundingFeatures)
print(normalized_data)
print("Features mean raw: %.2f" % (fundingFeatures[:,0].numpy().mean()))
print("Features std raw: %.2f" % (fundingFeatures[:,0].numpy().std()))
print("Features mean raw: %.2f" % (fundingFeatures[:,1].numpy().mean()))
print("Features std raw: %.2f" % (fundingFeatures[:,1].numpy().std()))
print("Features mean: %.2f" % (normalized_data.numpy().mean()))
print("Features std: %.2f" % (normalized_data.numpy().std()))
model = tf.keras.Sequential([
normalizer,
layers.Dense(units=1)
])
model.compile(loss = tf.losses.MeanSquaredError(),
optimizer = tf.keras.optimizers.SGD(
learning_rate=0.06, momentum=0.0, nesterov=True, name="SGD",
))
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
model.summary()
print('--------------')
weights = model.layers[0].get_weights()[0]
biases = model.layers[0].get_weights()[1]
print('--------------')
model.fit(fundingFeatures, fundingLabels, epochs=1000, callbacks=[tensorboard_callback])
weights = model.layers[0].get_weights()[0]
biases = model.layers[0].get_weights()[1]
print(weights)
print(biases)
print ("\n")
weights = model.layers[1].get_weights()[0]
biases = model.layers[1].get_weights()[1]
print(weights)
print(biases)
print('\n--------- Prediction ------')
print(model.predict( [[4,5]] ))

Depthwise convolution training loss is not decreasing

The model is for binary classification.
This is my model:
im_input= layers.Input(shape=[160,160,3])
x = layers.Conv2D(30,(3,3),strides=(2,2),padding='same')(im_input)
z = layers.DepthwiseConv2D((3,3),strides=2,padding='same',depth_multiplier=10)(im_input)
x = layers.ReLU()(x)
z = layers.ReLU()(z)
x = layers.Conv2D(60,(3,3),strides=(2,2),padding='same')(x)
z = layers.Conv2D(60,(3,3),strides=2,padding='same')(z)
x = layers.ReLU()(x)
z = layers.ReLU()(z)
x = layers.Concatenate()([x,z])
x = layers.Conv2D(120,(3,3),strides=2,padding='same')(x)
x = layers.ReLU()(x)
x = layers.Conv2D(200,(3,3),strides=2,padding='same')(x)
x = layers.ReLU()(x)
x = layers.Conv2D(400,(3,3),strides=1,padding='same')(x)
x = layers.ReLU()(x)
x = layers.Conv2D(900,(3,3),strides=1,padding='same')(x)
x = layers.Flatten()(x)
#x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(100,activation='relu')(x)
x = layers.Dropout(0.2)(x)
x = layers.Dense(20, activation='relu')(x)
out = layers.Dense(1,activation='sigmoid')(x)
smodel = tf.keras.Model(inputs=im_input, outputs=out, name="myModel2")
smodel.summary()
And this is the loss function:
cross_entropy = tf.keras.losses.BinaryCrossentropy()
The optimizer:
optimizer = tf.keras.optimizers.SGD(0.001)
Any suggestions for the optimizer?
Why is this model's loss not decreasing? Is there something wrong with the model? Someone, please help...
Instead of SGD, you should try the Adam optimizer.
Also, in your network, increase the number of units in the Dense layer, as this is the final representation of the data.
Finally, use fewer filters; keep the maximum at 512.
If your input size is small, then reduce the number of layers as well.
Try changing the optimizer to Adam.
I don't think there is anything wrong with the code.
Also try changing the dense layers:
after Flatten, use a Dense layer with 512 units and then go directly to your final output layer.
You don't need so many dense layers.
Also, can you post your loss value? If it's too large, then maybe there is something wrong with your training labels.

Feature learning with triplet loss after 1-2 epochs yields 100% val accuracy?

My NN has to learn image similarity with a custom triplet loss. The positive image is similar to the anchor, while the negative is not.
My task is to predict whether the second image or the third image of an unseen triplet is more similar to the anchor or not.
The triplets are given for both train and test sets in the task, so I did not have to mine them or randomly generate them: they are fixed in my task.
---> Idea: To improve my model, I try to use feature learning with Xception layers frozen and adding a Dense layer on top.
Problem:
When training the below model with Xception layers frozen, after 1-2 epochs it learns to just set all positive images to a very low distance to the anchor and all negative images to a very high distance. Hence, the 100% val accuracy.
I immediately thought of overfitting, but I only train one fully connected layer. How can I combat this? Or is my triplet loss somehow wrongly defined?
I don't use data augmentation, so could that potentially help?
Somehow this happens only when using a pretrained model. When I use a simple model I get realistic accuracy...
What am I missing here?
My triplet loss:
def triplet_loss(y_true, y_pred, alpha = 0.4):
    """
    Implementation of the triplet loss function
    Arguments:
    y_true -- true labels, required when you define a loss in Keras, you don't need it in this function.
    y_pred -- python list containing three objects:
        anchor -- the encodings for the anchor data
        positive -- the encodings for the positive data (similar to anchor)
        negative -- the encodings for the negative data (different from anchor)
    Returns:
    loss -- real number, value of the loss
    """
    total_length = y_pred.shape.as_list()[-1]
    anchor = y_pred[:,0:int(total_length*1/3)]
    positive = y_pred[:,int(total_length*1/3):int(total_length*2/3)]
    negative = y_pred[:,int(total_length*2/3):int(total_length*3/3)]
    # distance between the anchor and the positive
    pos_dist = K.sum(K.square(anchor-positive),axis=1)
    # distance between the anchor and the negative
    neg_dist = K.sum(K.square(anchor-negative),axis=1)
    # compute loss
    basic_loss = pos_dist-neg_dist+alpha
    loss = K.maximum(basic_loss,0.0)
    return loss
Then my model:
def baseline_model():
    input_1 = Input(shape=(256, 256, 3))
    input_2 = Input(shape=(256, 256, 3))
    input_3 = Input(shape=(256, 256, 3))
    pretrained_model = Xception(include_top=False, weights="imagenet")
    for layer in pretrained_model.layers:
        layer.trainable = False
    x1 = pretrained_model(input_1)
    x2 = pretrained_model(input_2)
    x3 = pretrained_model(input_3)
    x1 = Flatten(name='flatten1')(x1)
    x2 = Flatten(name='flatten2')(x2)
    x3 = Flatten(name='flatten3')(x3)
    x1 = Dense(128, activation=None,kernel_regularizer=l2(0.01))(x1)
    x2 = Dense(128, activation=None,kernel_regularizer=l2(0.01))(x2)
    x3 = Dense(128, activation=None,kernel_regularizer=l2(0.01))(x3)
    x1 = Lambda(lambda x: K.l2_normalize(x,axis=-1))(x1)
    x2 = Lambda(lambda x: K.l2_normalize(x,axis=-1))(x2)
    x3 = Lambda(lambda x: K.l2_normalize(x,axis=-1))(x3)
    concat_vector = concatenate([x1, x2, x3], axis=-1, name='concat')
    model = Model([input_1, input_2, input_3], concat_vector)
    model.compile(loss=triplet_loss, optimizer=Adam(0.00001), metrics=[accuracy])
    model.summary()
    return model
Fitting my model:
model.fit(
gen(X_train,batch_size=batch_size),
steps_per_epoch=13281 // batch_size,
epochs=10,
validation_data=gen(X_val,batch_size=batch_size),
validation_steps=1666 // batch_size,
verbose=1,
callbacks=callbacks_list
)
model.save_weights('try_6.h5')
Please note that you use a different Dense layer for each input: you define 3 different Dense layers. Each time you create a new Dense object it generates a new layer with new parameters, independent of the layers you created before. If the input is consistent, meaning input 1 is always the anchor, input 2 is always the positive, and input 3 is always the negative, it will be super easy for the model to overfit. What you should probably do is use a single Dense layer for all 3 inputs.
For example, based on your code you can define the model like this:
pretrained_model = Xception(include_top=False, weights="imagenet")
for layer in pretrained_model.layers:
    layer.trainable = False
general_input = Input(shape=(256, 256, 3))
x = pretrained_model(general_input)
x = Flatten()(x)
x = Dense(128, activation=None,kernel_regularizer=l2(0.01))(x)
base_model = Model([general_input], [x])
input_1 = Input(shape=(256, 256, 3))
input_2 = Input(shape=(256, 256, 3))
input_3 = Input(shape=(256, 256, 3))
x1 = base_model(input_1)
x2 = base_model(input_2)
x3 = base_model(input_3)
# ... continue with your code - normalize, concat, etc.
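To see why sharing one projection matters, here is a small NumPy sketch of the same pipeline: one weight matrix applied to all three branches, L2 normalization, then the triplet loss with the same `alpha = 0.4`. The random matrix `W` and the random "backbone features" are hypothetical stand-ins for the trained Dense layer and the frozen Xception output. Nothing here is trained.

```python
import numpy as np

def l2_normalize(v, axis=-1, eps=1e-12):
    """Scale each vector to unit length."""
    return v / np.maximum(np.linalg.norm(v, axis=axis, keepdims=True), eps)

def embed(features, shared_weights):
    """Project features with ONE weight matrix that every branch reuses."""
    return l2_normalize(features @ shared_weights)

def triplet_loss(anchor, positive, negative, alpha=0.4):
    pos_dist = np.sum(np.square(anchor - positive), axis=1)
    neg_dist = np.sum(np.square(anchor - negative), axis=1)
    return np.maximum(pos_dist - neg_dist + alpha, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))        # hypothetical shared projection
a_feat = rng.normal(size=(3, 8))   # stand-ins for frozen backbone features
n_feat = -a_feat                   # negatives pointing the opposite way

anchor = embed(a_feat, W)
positive = embed(a_feat, W)        # same input and same weights give the same embedding
negative = embed(n_feat, W)

easy = triplet_loss(anchor, positive, negative)  # correct ordering
hard = triplet_loss(anchor, negative, positive)  # roles swapped
print(easy, hard)
```

With shared weights, an image gets the same embedding no matter which input slot it arrives in, so the loss can only be reduced by the content of the images. With three separate Dense layers that guarantee disappears, and the slot itself becomes something the model can exploit.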

Keras model doesn't learn at all

My model weights (I output them to weights_before.txt and weights_after.txt) are precisely the same before and after the training, i.e. the training doesn't change anything, there's no fitting happening.
My data look like this (I basically want the model to predict the sign of feature, result is 0 if feature is negative, 1 if positive):
,feature,zerosColumn,result
0,-5,0,0
1,5,0,1
2,-3,0,0
3,5,0,1
4,3,0,1
5,3,0,1
6,-3,0,0
...
Brief summary of my approach:
Load the data.
Split it column-wise to x (feature) and y (result), split these two row-wise to test and validation sets.
Transform these sets into TimeseriesGenerators (not necessary in this scenario but I want to get this setup working and I don't see any reason why it shouldn't).
Create and compile simple Sequential model with few Dense layers and softmax activation on its output layer, use binary_crossentropy as loss function.
Train the model... nothing happens!
Complete code follows:
import keras
import pandas as pd
import numpy as np
np.random.seed(570)
TIMESERIES_LENGTH = 1
TIMESERIES_SAMPLING_RATE = 1
TIMESERIES_BATCH_SIZE = 1024
TEST_SET_RATIO = 0.2 # the portion of total data to be used as test set
VALIDATION_SET_RATIO = 0.2 # the portion of total data to be used as validation set
RESULT_COLUMN_NAME = 'feature'
FEATURE_COLUMN_NAME = 'result'
def create_network(csv_path, save_model):
    before_file = open("weights_before.txt", "w")
    after_file = open("weights_after.txt", "w")
    data = pd.read_csv(csv_path)
    data[RESULT_COLUMN_NAME] = data[RESULT_COLUMN_NAME].shift(1)
    data = data.dropna()
    x = data.ix[:, 1:2]
    y = data.ix[:, 3]
    test_set_length = int(round(len(x) * TEST_SET_RATIO))
    validation_set_length = int(round(len(x) * VALIDATION_SET_RATIO))
    x_train_and_val = x[:-test_set_length]
    y_train_and_val = y[:-test_set_length]
    x_train = x_train_and_val[:-validation_set_length].values
    y_train = y_train_and_val[:-validation_set_length].values
    x_val = x_train_and_val[-validation_set_length:].values
    y_val = y_train_and_val[-validation_set_length:].values
    train_gen = keras.preprocessing.sequence.TimeseriesGenerator(
        x_train,
        y_train,
        length=TIMESERIES_LENGTH,
        sampling_rate=TIMESERIES_SAMPLING_RATE,
        batch_size=TIMESERIES_BATCH_SIZE
    )
    val_gen = keras.preprocessing.sequence.TimeseriesGenerator(
        x_val,
        y_val,
        length=TIMESERIES_LENGTH,
        sampling_rate=TIMESERIES_SAMPLING_RATE,
        batch_size=TIMESERIES_BATCH_SIZE
    )
    model = keras.models.Sequential()
    model.add(keras.layers.Dense(10, activation='relu', input_shape=(TIMESERIES_LENGTH, 1)))
    model.add(keras.layers.Dropout(0.2))
    model.add(keras.layers.Dense(10, activation='relu'))
    model.add(keras.layers.Dropout(0.2))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(1, activation='softmax'))
    for item in model.get_weights():
        before_file.write("%s\n" % item)
    model.compile(
        loss=keras.losses.binary_crossentropy,
        optimizer="adam",
        metrics=[keras.metrics.binary_accuracy]
    )
    history = model.fit_generator(
        train_gen,
        epochs=10,
        verbose=1,
        validation_data=val_gen
    )
    for item in model.get_weights():
        after_file.write("%s\n" % item)
    before_file.close()
    after_file.close()
create_network("data/sign_data.csv", False)
Do you have any ideas?
The problem is that you are using softmax as the activation function of last layer. Essentially, softmax normalizes its input to make the sum of the elements to be one. Therefore, if you use it on a layer with only one unit (i.e. Dense(1,...)), then it would always output 1. To fix this, change the activation function of last layer to sigmoid which outputs a value in the range (0,1).
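A quick NumPy check illustrates the point. The `softmax` and `sigmoid` helpers and the sample pre-activations below are written just for this illustration; they are not taken from the question's code.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activations of a Dense(1) layer for four samples
logits = np.array([[-5.0], [-0.5], [0.5], [5.0]])

softmax_out = softmax(logits)   # normalizes over a single element per row
sigmoid_out = sigmoid(logits)

print(softmax_out.ravel())
print(sigmoid_out.ravel())
```

Because the softmax output is constant at 1 whatever the pre-activation is, its gradient with respect to the pre-activation is zero, which is consistent with the weights not changing during training. The sigmoid output does vary with the input.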

Keras custom softmax layer: Is it possible to have output neurons set to 0 in the output of a softmax layer based on zeros as data in an input layer?

I have a neural network with 10 output neurons in the last layer using softmax activation. I also know exactly that based on the input values, certain neurons in the output layer shall have 0 values. So I have a special input layer of 10 neurons, each of them being either 0 or 1.
Would it be somehow possible to force let's say the output neuron no. 3 to have value = 0 if the input neuron no 3 is also 0?
action_input = Input(shape=(10,), name='action_input')
...
x = Dense(10, kernel_initializer = RandomNormal(),bias_initializer = RandomNormal() )(x)
x = Activation('softmax')(x)
I know that there is a method via which I can mask out the results of the output layer OUTSIDE the neural network, and have all non zero related outputs reshaped (in order to have a total sum of 1). But I would like to solve this issue within the network and use it during the training of the network, too. Shall I use a custom layer for this?
You can use a Lambda layer and K.switch to check for zero values in the input and mask them in the output:
from keras import backend as K
inp = Input((5,))
soft_out = Dense(5, activation='softmax')(inp)
out = Lambda(lambda x: K.switch(x[0], x[1], K.zeros_like(x[1])))([inp, soft_out])
model = Model(inp, out)
model.predict(np.array([[0, 3, 0, 2, 0]]))
# array([[0., 0.35963967, 0., 0.47805876, 0.]], dtype=float32)
However, as you can see, the outputs no longer sum to one. If you want the sum to be one, you can rescale the values:
def mask_output(x):
    inp, soft_out = x
    y = K.switch(inp, soft_out, K.zeros_like(inp))
    # keepdims=True keeps shape (batch, 1) so the division broadcasts per sample
    y /= K.sum(y, axis=-1, keepdims=True)
    return y
# ...
out = Lambda(mask_output)([inp, soft_out])
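The masking and rescaling logic can be checked in isolation with NumPy. This is a sketch of the same arithmetic rather than a Keras layer: the function name `masked_renormalize`, the `eps` guard for rows where everything is masked, and the example probabilities are assumptions made for illustration.

```python
import numpy as np

def masked_renormalize(mask, probs, eps=1e-8):
    """Zero the probabilities where mask == 0, then rescale each row to sum to 1."""
    mask = np.asarray(mask)
    probs = np.asarray(probs, dtype=float)
    masked = np.where(mask != 0, probs, 0.0)
    # Per-row totals; keepdims=True so the division broadcasts row by row
    totals = masked.sum(axis=-1, keepdims=True)
    # If every entry in a row is masked, the row stays all zeros instead of becoming NaN
    return masked / np.maximum(totals, eps)

mask = np.array([[0, 3, 0, 2, 0]])
probs = np.array([[0.1, 0.3, 0.1, 0.4, 0.1]])   # hypothetical softmax output

out = masked_renormalize(mask, probs)
print(out)
print(out.sum(axis=-1))
```

The surviving entries keep their relative proportions (0.3 to 0.4 here) and the row sums to one again.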
At the end I came up with this code:
from keras import backend as K
import tensorflow as tf
def mask_output2(x):
    inp, soft_out = x
    # add a very small value in order to avoid having 0 everywhere
    c = K.constant(0.0000001, dtype='float32', shape=(32, 13))
    y = soft_out + c
    y = Lambda(lambda x: K.switch(K.equal(x[0],0), x[1], K.zeros_like(x[1])))([inp, soft_out])
    y_sum = K.sum(y, axis=-1)
    y_sum_corrected = Lambda(lambda x: K.switch(K.equal(x[0],0), K.ones_like(x[0]), x[0] ))([y_sum])
    y_sum_corrected = tf.divide(1,y_sum_corrected)
    y = tf.einsum('ij,i->ij', y, y_sum_corrected)
    return y
