Mapping NN output of Keras CNN in Python to interval [0, 1]

I am trying to train a CNN to binary-classify images of the (perhaps uncommon) shape height=2 and width=1000 pixels. My first approach is a small and simple CNN, coded as follows:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

def cnn_model_01():
    model = Sequential()
    # Assembly of layers
    model.add(Conv2D(16, (2, 2), input_shape=(1, 2, 1000), activation='relu'))
    model.add(MaxPooling2D(pool_size=(1, 1)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compilation of model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = cnn_model_01()

# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=5, batch_size=200, verbose=2)
The accuracy and the predictions of the NN simply reflect the class distribution in the sample. Typical training output is
Epoch 1/5
13s - loss: 0.7772 - acc: 0.5680 - val_loss: 0.6657 - val_acc: 0.6048
Epoch 2/5
15s - loss: 0.6654 - acc: 0.5952 - val_loss: 0.6552 - val_acc: 0.6048
Epoch 3/5
15s - loss: 0.6514 - acc: 0.5952 - val_loss: 0.6396 - val_acc: 0.6048
Epoch 4/5
15s - loss: 0.6294 - acc: 0.5952 - val_loss: 0.6100 - val_acc: 0.6048
Epoch 5/5
13s - loss: 0.5933 - acc: 0.6116 - val_loss: 0.5660 - val_acc: 0.6052
The reason for this is that the NN assigns all input samples to the same class, so in roughly two thirds of cases it is correct purely by chance, given a sample distributed in exactly this way.
In order to fix the problem and get the NN to produce better results, I inspected its output and found that the values fall into a relatively narrow interval, e.g. [0.55, 0.62]. I tried to rescale this interval to [0, 1], and as a result got a really good accuracy of ~99%. I did this mapping "by hand": subtract the minimum value of the array from each value, then divide by the difference between the maximum and the minimum.
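For reference, a minimal numpy sketch of that by-hand min-max rescaling (assuming preds is the array returned by model.predict):

import numpy as np

preds = model.predict(X_test)                                  # e.g. values in [0.55, 0.62]
scaled = (preds - preds.min()) / (preds.max() - preds.min())   # mapped to [0, 1]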
Can I implement this mapping in Keras? Is there a layer with this functionality?
Or did I do something completely wrong/not advisable with the layers, which leads to this narrow interval of the output?

I'm not sure I entirely understand what you want to achieve, but I have three ideas, one or two of which may help you.
1) Add a Dense(2) layer with softmax activation before the output layer. That way the penultimate layer classifies the image as class 1 or class 2, and the final Dense(1) layer then "merges" that information into a single value, 0 or 1, as output.
2) You could pick a threshold, e.g. 0.5, and simply compare the probability-based output of your NN against it, i.e. do something like result = output > 0.5. This could also be done inside a Lambda layer, e.g. model.add(Lambda(lambda x: K.cast(x > 0.5, K.floatx()))) (with from keras import backend as K; note that a plain Python 1 if x > 0.5 else 0 would not work on tensors).
3) When predicting, you can use predict_classes instead of predict and get 0 or 1 as the result instead of probabilities. This resembles my previous suggestion 2.
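As a rough sketch of suggestions 2 and 3 applied at prediction time (X_test is assumed to be your test data; predict_classes is available on Sequential models in older Keras versions):

probs = model.predict(X_test)               # probabilities in [0, 1], shape (n, 1)
labels = (probs > 0.5).astype('int32')      # suggestion 2: threshold by hand
labels = model.predict_classes(X_test)      # suggestion 3: let Keras apply the 0.5 threshold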
I hope one of the suggestions matches your task.

Related

Tensorflow model not improving

I'm just starting to learn ML/Tensorflow/etc., so I'm pretty much a novice and still don't really know what the troubleshooting process looks like. I'm currently having an issue with my model: it doesn't seem to ever improve. For instance, the output looks like
Epoch 1/10
4/4 [==============================] - 41s 10s/step - loss: 0.8833 - accuracy: 0.4300
Epoch 2/10
4/4 [==============================] - 12s 3s/step - loss: 0.8833 - accuracy: 0.4300
Epoch 3/10
4/4 [==============================] - 10s 3s/step - loss: 0.8833 - accuracy: 0.4300
Epoch 7/1000
4/4 [==============================] - 10s 3s/step - loss: 0.8833 - accuracy: 0.4300
The main aspect that worries me is that it doesn't change at all, which makes me think I'm doing something completely wrong. To give some more context and code: I am trying to do some time series classification. Basically, the input is the normalized time series of a song, and the net should classify whether it's classical music (an output of 1 means it is, 0 means it isn't).
This is the current model I am trying.
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv1D(filters=100, kernel_size=10000, strides=5000, input_shape=(1323000, 1), activation='relu'),
    keras.layers.Conv1D(filters=100, kernel_size=10, strides=3, input_shape=(263, 100), activation='relu'),
    keras.layers.LSTM(1000),
    keras.layers.Dense(500, activation='relu'),
    keras.layers.Dense(250, activation='relu'),
    keras.layers.Dense(1, activation='softmax')
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
This is how I get the training data (x and y are dictionaries with time series from different songs).
import numpy as np
from random import randint

minute = 1323000
x_train = np.zeros((100, minute, 1))
y_train = np.zeros((100,))
for kk in range(0, 100):
    num = randint(0, 41)
    ts = x[num]
    start = randint(0, len(ts) - minute)
    x_train[kk, :] = np.array([ts[start:(start + minute)]]).T
    y_train[kk] = 1 - y[num]
and then training:
for kk in range(1, 1000):
    x_train, y_train = create_training_set(x, y)
    model.fit(x_train, y_train, epochs=1000)
I looked at some similar questions; however, I was already doing what was suggested, or the advice was too specific to the asker. I also tried some fairly different models/activations, so I don't think the model is too complicated, and the data is already normalized, so that shouldn't be an issue either. But, as I said, I'm new to all this and could be wrong.
The usage of softmax activation in a single-node last layer is not correct. Additionally, the argument from_logits=True in your loss definition means that the model is expected to produce logits, and not probabilities (which are generally produced by softmax and sigmoid final activations).
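A quick sanity check of the first point: softmax over a single value is always 1.0, so a Dense(1, activation='softmax') layer outputs a constant regardless of its input (the logits below are made up for illustration):

import tensorflow as tf
print(tf.nn.softmax([[2.0], [-3.0]]).numpy())   # [[1.], [1.]] -- a one-node softmax is constant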
So, you should change your last layer to
keras.layers.Dense(1) # linear activation by default
Alternatively, you could change both your last layer and your loss function respectively to
keras.layers.Dense(1, activation='sigmoid')
loss=tf.keras.losses.BinaryCrossentropy(from_logits=False)
According to the docs, usage of from_logits=True may be more numerically stable, and probably this is the reason it is preferred in the standard Tensorflow classification tutorials (see here and here).
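To make the logits/probabilities distinction concrete, a small sketch (the values are made up for illustration) showing that both setups compute the same loss:

import tensorflow as tf

y_true = tf.constant([[1.0], [0.0]])
logits = tf.constant([[2.0], [-1.0]])   # raw model outputs, no final activation

bce_logits = tf.keras.losses.BinaryCrossentropy(from_logits=True)
bce_probs = tf.keras.losses.BinaryCrossentropy(from_logits=False)

print(bce_logits(y_true, logits).numpy())             # loss computed directly from logits
print(bce_probs(y_true, tf.sigmoid(logits)).numpy())  # same value, via explicit sigmoid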
So I can't promise this is going to work, because I don't know your data and this is a very weird architecture, but here are a few things that seem wrong:
The last Dense layer should have a sigmoid activation function
from_logits should then be False

can't train Tensorflow on a simple dataset

I'm trying to learn a bit about Tensorflow/Machine Learning. As a starting point, I'm trying to create a model that is trained on a simple 1-D function (y=x^2) and see how it behaves for other inputs outside of the training range.
The problem I'm having is that training never really improves the model. I'm sure it's due to a lack of understanding and/or misconfiguration on my part, but there really doesn't seem to be any sort of "baby's first machine learning" tutorial out there that deals with a dataset of a known form.
My code is pretty simple, and is borrowed from TensorFlow's introduction notebook here
import tensorflow as tf
import numpy as np

# Load the dataset
x_train = np.linspace(0, 10, 1000)
y_train = np.power(x_train, 2.0)
x_test = np.linspace(8, 12, 100)
y_test = np.power(x_test, 2.0)
# (x_train, y_train), (x_test, y_test) = mnist.load_data()
# x_train, x_test = x_train / 255.0, x_test / 255.0

"""Build the `tf.keras.Sequential` model by stacking layers. Choose an optimizer and loss function for training:"""
from tensorflow.keras import layers

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='mse',
              metrics=['mae'])

"""Train and evaluate the model:"""
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
and I get output like this:
Train on 1000 samples
Epoch 1/5
1000/1000 [==============================] - 0s 489us/sample - loss: 1996.3631 - mae: 33.2543
Epoch 2/5
1000/1000 [==============================] - 0s 36us/sample - loss: 1996.3540 - mae: 33.2543
Epoch 3/5
1000/1000 [==============================] - 0s 36us/sample - loss: 1996.3495 - mae: 33.2543
Epoch 4/5
1000/1000 [==============================] - 0s 33us/sample - loss: 1996.3474 - mae: 33.2543
Epoch 5/5
1000/1000 [==============================] - 0s 38us/sample - loss: 1996.3450 - mae: 33.2543
100/1 - 0s - loss: 15546.3655 - mae: 101.2603
Like I said, I'm positive this is a misconfiguration/lack of understanding on my part. I learn best when I can take something simple and incrementally make it more complex, rather than starting with something whose patterns I can't readily identify, but I can't find any tutorials that take this approach. Can anyone recommend either a good tutorial source, or just educate me on what I'm doing wrong here?
I think you have a mix of problems here. Let me try to explain them one by one:
First of all, the problem you want to solve is learning the function f(x) = x^2, so this fits a regression task. For a regression task (and any other task ^_^) you should pay attention to the activation function and also to what you are really trying to predict.
You have chosen softmax as the activation function, which does not make sense at all for regression. I suggest replacing it with a linear activation function (if you remove it completely, TF/Keras treats it as linear automatically).
Also, why do you have Dense(10) as the last layer? For each entry you want to predict a single value (for an input of 5 you want to predict 25, right?), so Dense(1) should be enough to generate your value.
On the other hand, since your network is not big, I would start with SGD as the optimizer, but Adam might work as well. Additionally, for the problem you are trying to solve, I do not believe you really need Dense(128) as the first hidden layer; you can start with a smaller number and see how it goes. I would start with something like Dense(3) or Dense(4).
Long story short, let's replace your model with these lines, and hopefully it gets working:
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1)
])
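A hedged end-to-end sketch of how this fixed model might be compiled and trained (the optimizer choice, epoch count, and reshape are illustrative assumptions, not part of the original answer):

model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(x_train.reshape(-1, 1), y_train, epochs=100, verbose=0)
print(model.predict(np.array([[5.0]])))   # should move toward 25 as training converges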

Why is the loss so high for a 4-input, 2-output function approximation using an MLP?

My code is very simple, since my understanding is that an MLP can approximate any function:
import tensorflow as tf

def build_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(5, activation='tanh', input_shape=(4, ), name='a'),
        tf.keras.layers.Dense(5, activation='tanh'),
        tf.keras.layers.Dense(2, activation='sigmoid', name='b')])
    optimizer = tf.keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
    model.compile(loss='mse', optimizer=optimizer)
    return model

def train_benchmark_NN(x, y, epochs=10000):
    model = build_model()
    es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=20, verbose=0)
    history = model.fit(x, y, batch_size=1000, epochs=epochs, validation_split=0.2, verbose=1, callbacks=[es])
    return model, history
I tried different numbers of layers (like [256, 128, 64, 32]), nodes, optimizers, initializers, and activation functions. I also tried handling the two outputs separately instead of training one model for both, but the results were bad too. Honestly, I don't have a good sense of how heavy my model should be for data like this. I also tried training the model on some known functions with the same numbers of inputs and outputs; it's always very hard when the function looks like
y1 = math.cos(x1) + math.cos(x2) + math.cos(x3) + math.cos(x4).
Can anyone tell me whether I should try a much heavier model, whether I did something wrong in my code, or whether I have to preprocess the data differently? I only normalized it with a z-score. The data size is ~6000 samples in total.
Current results:
Epoch 62/10000
4936/4936 [==============================] - 0s 4us/sample - loss: 0.2711 - val_loss: 3.9427
Epoch 63/10000
4936/4936 [==============================] - 0s 4us/sample - loss: 0.2686 - val_loss: 3.9444
Epoch 64/10000
4936/4936 [==============================] - 0s 3us/sample - loss: 0.2661 - val_loss: 3.9457
If I change validation_split from 0.2 to 0.01, the results become very different:
6109/6109 [==============================] - 0s 5us/sample - loss: 0.3729 - val_loss: 0.0589
Epoch 96/10000
6109/6109 [==============================] - 0s 5us/sample - loss: 0.3683 - val_loss: 0.0356
Epoch 97/10000
6109/6109 [==============================] - 0s 5us/sample - loss: 0.3702 - val_loss: 0.0381
i: 0 , err_mean: 2.383471436639142
Although the val_loss became much smaller, that is probably because the validation set isn't big enough; when I plot the errors, they still look the same.
Some visualization of the relationships in my data:
inputs are x1-car speed, x2-engine torque, x3-DOC temperature, x4-DPF temperature
outputs are y1-tailpipe CO gas, y2-tailpipe HC gas.
y1 plotted against each of x1, x2, x3, x4: [scatter plots not reproduced]
Should this function be easy to approximate at all? Thanks!!!
I plotted the errors against the targets; it seems the model didn't learn at all, because the errors are strongly correlated with the targets.

What does the categorical_accuracy value mean? categorical_accuracy is 100% but the data shows wrong predictions

In training, the Keras categorical_accuracy value is 100%. But when I save the model's output on that same training data to a file, it shows several (actually quite many) samples classified into the wrong class. I've checked the input file for the labels, and they are correct.
What does categorical_accuracy measure? Is there any better metric for debugging an LSTM?
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.metrics import categorical_accuracy

model = Sequential()
model.add(LSTM(64, batch_input_shape=(7763, TimeStep.TIME_STEP + 1, 10), return_sequences=True, activation='relu'))
model.add(LSTM(128, activation='relu', return_sequences=True))
model.add(LSTM(64, activation='relu', return_sequences=True))
model.add(LSTM(32, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=[categorical_accuracy])

history = model.fit(TimeStep.fodder, TimeStep.target, epochs=300, batch_size=7763)
Epoch 400/400
7763/31052 [======>.......................] - ETA: 1s - loss: 2.7971e-04 - categorical_accuracy: 1.0000
15526/31052 [==============>...............] - ETA: 1s - loss: 3.0596e-04 - categorical_accuracy: 1.0000
23289/31052 [=====================>........] - ETA: 0s - loss: 3.0003e-04 - categorical_accuracy: 1.0000
31052/31052 [==============================] - 2s 78us/step - loss: 2.9869e-04 - categorical_accuracy: 1.0000
From the Keras GitHub repo we have the categorical_accuracy function.
def categorical_accuracy(y_true, y_pred):
    return K.cast(K.equal(K.argmax(y_true, axis=-1),
                          K.argmax(y_pred, axis=-1)),
                  K.floatx())
Here we can see that it returns 1 if the position of the maximum value of y_true is the same as in y_pred, and 0 otherwise. 100% accuracy therefore means that the position of the maximum in y_pred always matches that in y_true (the position here is the class index, so the predicted class always matches the true class).
One possible reason for this could be that you only have a single output (a single binary class); hence the position of the maximum value will always be 0 for both y_true and y_pred.
If this is the case, use binary_accuracy instead.
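A small numpy sketch of that pitfall (the toy arrays are made up for illustration):

import numpy as np

# Single-column targets and predictions, shape (n, 1)
y_true = np.array([[1.0], [0.0], [1.0]])
y_pred = np.array([[0.1], [0.9], [0.2]])   # every prediction is wrong

# categorical_accuracy compares argmax along the last axis; the argmax of
# a single column is always index 0, so the metric reports 100% accuracy
acc = np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1))
print(acc)   # 1.0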

Validation loss increases after 3 epochs but validation accuracy keeps increasing

Training and validation are healthy for 2 epochs, but after 2-3 epochs the val_loss keeps increasing while the val_acc also keeps increasing.
I'm trying to train a CNN model to classify a given review into a single class from 1-5, so I treated it as a multi-class classification problem.
I've divided the dataset into 3 sets: 70% training, 20% testing and 10% validation.
The distribution of the training data across the 5 classes is as follows.
1 - 31613, 2 - 32527, 3 - 61044, 4 - 140005, 5 - 173023.
Therefore I've added class weights as follows.
{1: 5.47, 2: 5.32, 3: 2.83, 4: 1.26, 5: 1}
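(For reference, a sketch of how such weights could be derived; this assumes they are the ratio of the largest class count to each class count, which is my guess rather than something stated in the question:)

counts = {1: 31613, 2: 32527, 3: 61044, 4: 140005, 5: 173023}
largest = max(counts.values())
class_weights = {c: round(largest / n, 2) for c, n in counts.items()}
print(class_weights)   # approximately the weights quoted above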
Model structure is as below.
from keras.models import Model
from keras.layers import (Input, Embedding, Conv1D, GlobalMaxPooling1D,
                          concatenate, Dense, Dropout, Activation)

input_layer = Input(shape=(max_length, ), dtype='int32')
embedding = Embedding(vocab_size, 200, input_length=max_length)(input_layer)
channel1 = Conv1D(filters=100, kernel_size=2, padding='valid', activation='relu', strides=1)(embedding)
channel1 = GlobalMaxPooling1D()(channel1)
channel2 = Conv1D(filters=100, kernel_size=3, padding='valid', activation='relu', strides=1)(embedding)
channel2 = GlobalMaxPooling1D()(channel2)
channel3 = Conv1D(filters=100, kernel_size=4, padding='valid', activation='relu', strides=1)(embedding)
channel3 = GlobalMaxPooling1D()(channel3)
merged = concatenate([channel1, channel2, channel3], axis=1)
merged = Dense(256, activation='relu')(merged)
merged = Dropout(0.6)(merged)
merged = Dense(5)(merged)
output = Activation('softmax')(merged)
model = Model(inputs=[input_layer], outputs=[output])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])
model.fit(final_X_train, final_Y_train, epochs=5, batch_size=512, validation_data=(final_X_val, final_Y_val), callbacks=callback, class_weight=class_weights)
1/5 - loss: 1.8733 - categorical_accuracy: 0.5892 - val_loss: 0.7749 - val_categorical_accuracy: 0.6558
2/5 - loss: 1.3908 - categorical_accuracy: 0.6917 - val_loss: 0.7421 - val_categorical_accuracy: 0.6784
3/5 - loss: 0.9587 - categorical_accuracy: 0.7734 - val_loss: 0.7595 - val_categorical_accuracy: 0.6947
4/5 - loss: 0.6402 - categorical_accuracy: 0.8370 - val_loss: 0.7921 - val_categorical_accuracy: 0.7216
5/5 - loss: 0.4520 - categorical_accuracy: 0.8814 - val_loss: 0.8556 - val_categorical_accuracy: 0.7331
Final accuracy = 0.7328754744261703
This seems to be overfitting behavior, but I've tried adding dropout layers, which didn't help. I've also tried increasing the data, which made the results even worse.
I'm totally new to deep learning, if anyone has any suggestions to improve, please let me know.
"val_loss keeps increasing while the val_acc keeps increasing" - this may be because of the loss function: the loss is calculated from the actual predicted probabilities, while the accuracy is calculated from one-hot vectors.
Let's take a 4-class example. For one review the true class is, say, class 1. The predicted probabilities from the system are [0.25, 0.30, 0.25, 0.20]. According to categorical_accuracy your output is correct, i.e. its argmax matches [0, 1, 0, 0], but since the probability mass is so spread out, categorical_crossentropy will still give a high loss.
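A quick numpy illustration of that point, using the same made-up numbers:

import numpy as np

y_true = np.array([0.0, 1.0, 0.0, 0.0])       # one-hot true class
y_pred = np.array([0.25, 0.30, 0.25, 0.20])   # spread-out predicted probabilities

print(np.argmax(y_pred) == np.argmax(y_true))   # True  -> counts as correct for accuracy
print(-np.sum(y_true * np.log(y_pred)))         # ~1.20 -> yet the cross-entropy loss is high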
As for the overfitting problem, I am not really sure why introducing more data is causing issues, but:
Try increasing the strides.
Don't make the data more imbalanced by adding data to any particular class.
