Why isn't my loss function zero when using predictions as training examples? - python

I am building a neural network to control a pan-tilt gimbal using ROS, OpenCV and Keras. I present position errors and their derivatives as the state input, and my approach is to add the position error that resulted from the movement to the output command. So my examples are collected as
input(t) -> output(t) + error(t+1), and my understanding is that updates to the network will stop when the error is small.
However, after failing to converge, I decided to try removing my error term to verify that my loss is zero, reasoning that whatever the current state of the network, I am passing it training examples that it is already predicting. To my surprise, this still gives me a non-zero loss. I feel there is something fundamental that I am missing about how the network trains.
I have put together a minimal working example with a random input, removing the tilt axis for simplicity but otherwise leaving in the basic neural network architecture, in case that is where my problem lies.
from tensorflow.keras.layers import Input, Dense, BatchNormalization, Dropout, Activation
from tensorflow.keras.models import Model
from tensorflow.keras import optimizers
from tensorflow.keras.regularizers import l2
import numpy as np

input_dims = 4
batch_size = 4

state_input = Input(shape=(input_dims,))
initializer = 'glorot_uniform'
x = Dense(128, kernel_initializer=initializer, kernel_regularizer=l2(0.1),
          bias_regularizer=l2(0.01), name='neural_inputLayer')(state_input)
x = BatchNormalization()(x)
x = Activation('softplus')(x)
x = Dropout(0.3)(x)
x_pan = Dense(128, activation='softplus', kernel_initializer=initializer,
              kernel_regularizer=l2(0.1), bias_regularizer=l2(0.01), name='neural_PantLayer')(x)
mu_pan = Dense(1, activation='linear', kernel_initializer=initializer, kernel_regularizer=l2(0.1),
               bias_regularizer=l2(0.01), name='neural_ctrl_output_mu_pan')(x_pan)
model = Model(inputs=state_input, outputs=mu_pan)

OPTIMIZER = optimizers.Nadam(learning_rate=0.01, beta_1=0.9, beta_2=0.999, clipnorm=0.5,
                             clipvalue=0.1)
model.compile(optimizer=OPTIMIZER, loss='mse')
model.summary()

test_inputs = np.array([])
test_outputs = np.array([])
for i in range(batch_size):
    test_input = np.random.rand(input_dims)
    test_output = model.predict(test_input.reshape(1, input_dims))
    # save IO pairs
    if i == 0:
        test_inputs = test_input
    else:
        test_inputs = np.vstack((test_inputs, test_input))
    test_outputs = np.append(test_outputs, test_output)

test_loss = model.fit(test_inputs, test_outputs, batch_size=batch_size,
                      verbose=True, shuffle=True, epochs=10)
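For reference, here is a minimal sketch (not the gimbal code above; the regularizers, Dropout and BatchNormalization are stripped out) of the behaviour I expected: when the model is deterministic and the forward pass used by predict() is the same one used by evaluate()/fit(), feeding the model's own predictions back as targets gives an essentially zero MSE.

import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(4,))
out = Dense(1, activation='linear')(Dense(128, activation='softplus')(inp))
plain_model = Model(inp, out)
plain_model.compile(optimizer='adam', loss='mse')

x = np.random.rand(4, 4).astype('float32')
y = plain_model.predict(x)           # use the current predictions as targets
print(plain_model.evaluate(x, y))    # ~0.0 for this deterministic model

In the full model above, the reported loss also includes the l2() penalties on every kernel and bias, and Dropout/BatchNormalization behave differently in training mode than they do in predict(), which would keep the reported training loss away from zero even with self-generated targets.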

Related

Convolutional LSTM Model Dimension Incompatibility when making predictions & prediction dimension issues

I structured a Convolutional LSTM model to predict the forthcoming Bitcoin price data, using the analyzed past data of the Bitcoin close price and other features.
Let me jump straight to the code:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf
import tensorflow.keras as keras
import keras_tuner as kt
from keras_tuner import HyperParameters as hp
from keras.models import Sequential
from keras.layers import InputLayer, ConvLSTM1D, LSTM, Flatten, RepeatVector, Dense, TimeDistributed
from keras.callbacks import EarlyStopping
from tensorflow.keras.metrics import RootMeanSquaredError
from tensorflow.keras.optimizers import Adam
import keras.backend as K
from keras.losses import Huber
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

DIR = '../input/btc-features-targets'
SEG_DIR = '../input/segmented'

segmentized_features = os.listdir(SEG_DIR)
btc_train_features = []
for seg in segmentized_features:
    train_features = pd.read_csv(f'{SEG_DIR}/{seg}')
    train_features.set_index('date', inplace=True)
    btc_train_features.append(scaler.fit_transform(train_features.values))

btc_train_targets = pd.read_csv(f'{DIR}/btc_train_targets.csv')
btc_train_targets.set_index('date', inplace=True)

btc_test_features = pd.read_csv(f'{DIR}/btc_test_features.csv')
btc_tef1 = btc_test_features.iloc[:111]
btc_tef2 = btc_test_features.iloc[25:]
btc_tef1.set_index('date', inplace=True)
btc_tef2.set_index('date', inplace=True)

btc_test_targets = pd.read_csv(f'{DIR}/btc_test_targets.csv')
btc_test_targets.set_index('date', inplace=True)

btc_trt_log = np.log(btc_train_targets)
btc_tefs1 = scaler.fit_transform(btc_tef1.values)
btc_tefs2 = scaler.fit_transform(btc_tef2.values)
btc_tet_log = np.log(btc_test_targets)

scaled_train_features = []
for features in btc_train_features:
    shape = features.shape
    scaled_train_features.append(np.expand_dims(features, [0, 3]))

shape_2 = btc_tefs1.shape
btc_tefs1 = np.expand_dims(btc_tefs1, [0, 3])
shape_3 = btc_tefs2.shape
btc_tefs2 = np.expand_dims(btc_tefs2, [0, 3])

btc_trt_log = btc_trt_log.values[0]
btc_tet_log = btc_tet_log.values[0]

def build(hp):
    model = keras.Sequential()
    # Input Layer
    model.add(InputLayer(input_shape=(111, 32, 1)))
    # ConvLSTM1D
    convLSTM_hp_filters = hp.Int(name='convLSTM_filters', min_value=32, max_value=512, step=32)
    convLSTM_hp_kernel_size = hp.Choice(name='convLSTM_kernel_size', values=[3, 5, 7])
    convLSTM_activation = hp.Choice(name='convLSTM_activation', values=['selu', 'relu'])
    model.add(ConvLSTM1D(filters=convLSTM_hp_filters,
                         kernel_size=convLSTM_hp_kernel_size,
                         padding='same',
                         activation=convLSTM_activation,
                         use_bias=True,
                         bias_initializer='zeros'))
    # Flatten
    model.add(Flatten())
    # RepeatVector
    model.add(RepeatVector(5))
    # LSTM
    LSTM_hp_units = hp.Int(name='LSTM_units', min_value=32, max_value=512, step=32)
    LSTM_activation = hp.Choice(name='LSTM_activation', values=['selu', 'relu'])
    model.add(LSTM(units=LSTM_hp_units, activation=LSTM_activation, return_sequences=True))
    # TimeDistributed Dense
    dense_units = hp.Int(name='dense_units', min_value=32, max_value=512, step=32)
    dense_activation = hp.Choice(name='dense_activation', values=['selu', 'relu'])
    model.add(TimeDistributed(Dense(units=dense_units, activation=dense_activation)))
    # TimeDistributed Dense_Output
    model.add(Dense(1))
    # Set Learning Rate
    hp_learning_rate = hp.Choice(name='learning_rate', values=[1e-2, 1e-3, 1e-4])
    # Compile Model
    model.compile(optimizer=Adam(learning_rate=hp_learning_rate),
                  loss=Huber(),
                  metrics=[RootMeanSquaredError()])
    return model

tuner = kt.Hyperband(build,
                     objective=kt.Objective('root_mean_squared_error', direction='min'),
                     max_epochs=10,
                     factor=3)
early_stop = EarlyStopping(monitor='root_mean_squared_error', patience=5)

opt_hps = []
for train_features in scaled_train_features:
    tuner.search(train_features, btc_trt_log, epochs=50, callbacks=[early_stop])
    opt_hps.append(tuner.get_best_hyperparameters(num_trials=1)[0])

models, epochs = ([] for _ in range(2))
for hps in opt_hps:
    model = tuner.hypermodel.build(hps)
    models.append(model)
    history = model.fit(train_features, btc_trt_log, epochs=70, verbose=0)
    rmse = history.history['root_mean_squared_error']
    best_epoch = rmse.index(min(rmse)) + 1
    epochs.append(best_epoch)

hypermodel = tuner.hypermodel.build(opt_hps[0])
for train_features, epoch in zip(scaled_train_features, epochs):
    hypermodel.fit(train_features, btc_trt_log, epochs=epoch)

tp1 = hypermodel.predict(btc_tefs1).flatten()
tp2 = hypermodel.predict(btc_tefs2).flatten()
test_predictions = np.concatenate((tp1, tp2[86:]), axis=None)
The hyperparameters of the model are configured using keras_tuner. Because the notebook raised ResourceExhaustedError issues when training on the full features dataset, sequentially segmented datasets are used instead (and apparently, according to a study that used a similar model architecture, training can be done efficiently with this approach).
The input dimension of each segmented dataset is (111, 32, 1).
No issues are reported until the last code block; the models train fine. Yet when .predict() is executed, the notebook prints an error stating that the dimension of the input features used for prediction is incompatible with the dimension of the input features used while training. I did not understand why this happens, since as far as I know the input dimensions of a training dataset for a DNN model do not have to be identical to the input dimensions of a test dataset.
Even though all the price data from 2018 to early 2021 are used as training datasets, predictions are only needed for the mid-2021 timeframe.
The dataset used for prediction has a dimension of (136, 32, 1).
I tried matching the dimension of this dataset to (111, 32, 1) through index slicing.
This then showed issues in the output dimension: while predictions should be made for 136 data points, the result only returned 10.
Are there any issues with the model configuration? I cannot interpret the current situation.
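To make the shape question concrete, here is a standalone sketch (with assumed filter/unit values, not the tuned hyperparameters) of how the declared input shape and RepeatVector constrain the model:

import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import InputLayer, ConvLSTM1D, Flatten, RepeatVector, LSTM, TimeDistributed, Dense

sketch = keras.Sequential([
    InputLayer(input_shape=(111, 32, 1)),   # time dimension fixed to 111 here
    ConvLSTM1D(filters=32, kernel_size=3, padding='same'),
    Flatten(),
    RepeatVector(5),                        # output sequence length is always 5
    LSTM(32, return_sequences=True),
    TimeDistributed(Dense(16)),
    Dense(1),
])
print(sketch.output_shape)                               # (None, 5, 1)
print(sketch.predict(np.zeros((1, 111, 32, 1))).shape)   # matches the declared shape
# sketch.predict(np.zeros((1, 136, 32, 1)))               # in the original post this reportedly fails with a shape-incompatibility error

Because each segment is fed as a single sample, a (None, 5, 1) output gives 5 values per predict() call; with two test segments that flattens to 10 values, which may be why only 10 predictions come back.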

How to mix many distributions in one tensorflow probability layer?

I have several DistributionLambda layers as the outputs of one model, and I would like to apply a Concatenate-like operation to get a new layer, so that there is only one output that is the mix of all the distributions, assuming they are independent. Then I can apply a log-likelihood loss to the output of the model. Otherwise, I cannot apply the loss over a Concatenate layer, because it loses the log_prob method. I have been trying with the Blockwise distribution, but with no luck so far.
Here is some example code:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import optimizers
from tensorflow_probability import distributions
from tensorflow_probability import layers as tfp_layers

def likelihood_loss(y_true, y_pred):
    """Adding negative log likelihood loss."""
    return -y_pred.log_prob(y_true)

def distribution_fn(params):
    """Distribution function."""
    # tf.math so the softplus-style transform works on tensors
    return distributions.Normal(
        params[:, 0], tf.math.log(1.0 + tf.math.exp(params[:, 1])))

output_steps = 3
...
lstm_layer = layers.LSTM(10, return_state=True)
last_layer, l_h, l_c = lstm_layer(last_layer)
lstm_states = [l_h, l_c]
dense_layer = layers.Dense(2)
last_layer = dense_layer(last_layer)
last_layer = tfp_layers.DistributionLambda(
    make_distribution_fn=distribution_fn)(last_layer)
output_layers = [last_layer]
# Get output sequence, re-injecting the output of each step
for number in range(1, output_steps):
    last_layer = layers.Reshape((1, 1))(last_layer)
    last_layer, l_h, l_c = lstm_layer(last_layer, initial_state=lstm_states)
    # Storing state for next time step
    lstm_states = [l_h, l_c]
    last_layer = tfp_layers.DistributionLambda(
        make_distribution_fn=distribution_fn)(dense_layer(last_layer))
    output_layers.append(last_layer)

# This does not work
# last_layer = distributions.Blockwise(output_layers)
# This works for the model but cannot compute loss
# last_layer = layers.Concatenate(axis=1)(output_layers)
the_model = models.Model(inputs=[input_layer], outputs=[last_layer])
the_model.compile(loss=likelihood_loss, optimizer=optimizers.Adam(lr=0.001))
The problem is your Input, not your output layer ;)
Input:0 is referenced in your error message.
Could you try to be more specific about your input?
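For reference, here is a standalone sketch (toy parameters, separate from the Keras graph above) of what Blockwise is meant to provide: it concatenates a list of distributions into a single vector-valued distribution that still exposes log_prob, so a single negative log-likelihood loss can be applied to the combined output.

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# Three per-step distributions, each with a length-1 event
step_dists = [
    tfd.Independent(tfd.Normal(loc=[0.0], scale=[1.0]), reinterpreted_batch_ndims=1),
    tfd.Independent(tfd.Normal(loc=[1.0], scale=[2.0]), reinterpreted_batch_ndims=1),
    tfd.Independent(tfd.Normal(loc=[-1.0], scale=[0.5]), reinterpreted_batch_ndims=1),
]
joint = tfd.Blockwise(step_dists)     # event_shape = [3]
y = tf.constant([0.1, 0.9, -1.2])
print(joint.log_prob(y))              # scalar: the sum of the per-step log-probs

Note that the commented-out distributions.Blockwise(output_layers) above is handed symbolic Keras outputs rather than distribution instances, which is one possible reason it fails; building the Blockwise inside a single DistributionLambda from the concatenated parameters would be one way to keep log_prob available.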

TF2, Tensorflow Probability random seed generator and VAE

I have been playing around with Variational Autoencoders for a few days. I am trying to fit a small toy function with a small model.
I first implemented the model using the Keras Functional API, with the following code:
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

tfd = tfp.distributions
tfpl = tfp.layers

def define_tfp_encoder(latent_dim, n_inputs=2, kl_weight=1):
    prior = tfd.MultivariateNormalDiag(loc=tf.zeros(latent_dim))
    input_x = Input((n_inputs,))
    input_c = Input((1,))
    dense = Dense(25, activation='relu', name='tfpenc/dense_1')(input_x)
    dense = Dense(32, activation='relu', name='tfpenc/dense_2')(dense)
    dense_z_params = Dense(tfpl.MultivariateNormalTriL.params_size(latent_dim), name='tfpenc/z_params')(dense)
    dense_z = tfpl.MultivariateNormalTriL(latent_dim, name='tfpenc/z')(dense_z_params)
    # activity_regularizer=tfpl.KLDivergenceRegularizer(prior)  # weight=kl_weight
    kld = tfpl.KLDivergenceAddLoss(prior, name='tfpenc/kld_add')(dense_z)
    model = Model(inputs=input_x, outputs=kld)
    return model

def define_tfp_decoder(latent_dim, n_inputs=2):
    input_c = Input((1,), name='tfpdec/cond_input')
    input_n = Input((latent_dim,))
    dense = Dense(15, activation='relu', name='tfpdec/dense_1')(input_n)
    dense = Dense(32, activation='relu', name='tfpdec/dense_2')(dense)
    dense = Dense(tfpl.IndependentNormal.params_size(n_inputs), name='tfpdec/output')(dense)
    output = tfpl.IndependentNormal((n_inputs,))(dense)
    model = Model(input_n, output)
    return model

def get_custom_unconditional_vae():
    latent_size = 5
    encoder = define_tfp_encoder(latent_dim=latent_size)
    decoder = define_tfp_decoder(latent_dim=latent_size)
    encoder.trainable = True
    decoder.trainable = True
    x = encoder.input
    z = encoder.output
    out = decoder(z)
    vae = Model(inputs=x, outputs=out)
    vae.compile(loss=lambda x, pred: -pred.log_prob(x), optimizer='adam')
    return encoder, decoder, vae
The VAE model was then fitted and trained for 3000 epochs.
However, it only produced garbage for the very simple quadratic function it was supposed to fit.
Now it comes: when creating the exact same model using the Sequential API, it works as expected and the desired function gets approximated nicely.
And it becomes even stranger for me: after running tf.random.set_seed(None), the model created using the Functional API also works as expected. What am I missing or not understanding correctly so far? I assume that there are some differences regarding tf.random.set_seed when using the Sequential vs. the Functional API, but...?
Thanks in advance,
codax
EDIT: I forgot to mention that setting a seed (e.g. tf.random.set_seed(123)) leads to identical results for both models, with neither fitting the desired function.
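Since the puzzle revolves around seeding, here is a minimal sketch (unrelated to the VAE itself) of how the global seed drives Keras weight initialization, which is where I would expect tf.random.set_seed to make a difference between otherwise identical builds:

import tensorflow as tf
from tensorflow.keras.layers import Dense

def first_kernel_value():
    layer = Dense(4)
    layer.build((None, 3))
    return layer.kernel.numpy()[0, 0]

tf.random.set_seed(123)
a = first_kernel_value()
tf.random.set_seed(123)
b = first_kernel_value()   # same seed re-set before building -> identical initial weights
tf.random.set_seed(None)
c = first_kernel_value()   # seed cleared -> (almost certainly) different initialization
print(a == b, a == c)      # True False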

InvalidArgumentError while Building a deep-CNN [duplicate]

This question already has an answer here:
InvalidArgumentError: Received a label value of 8825 which is outside the valid range of [0, 8825) SEQ2SEQ model
(1 answer)
Closed last year.
I'm new to TensorFlow and Python...
I'm trying to build a deep CNN for cell image classification on the Hep-2 dataset. The dataset consists of 13596 images, and I'm using 8701 images as my training data for the CNN. I also have a .CSV file which contains each image ID and its cell type. I extracted the contents and am using the image_ID from the .CSV file as my labels. Both the training data and the image IDs have been converted with .astype('float32'). But somehow I'm getting an InvalidArgumentError, and I have no idea what's going on there.
I've posted my code and the error; any tips or help would be highly appreciated. Thank you in advance :)
I'm new to Stack Overflow as well, so sorry for my messy formatting.
My CODE:
from PIL import Image
import glob
import os
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D
from keras.optimizers import SGD

def extract_labels(image_names, Original_Labels):
    temp = np.array([image.split('.')[0] for image in image_names])
    temp2 = np.array([j[0] for i in temp for j in Original_Labels if(int(i) == int(j[0]))])
    return temp2

def get_Labels():
    df = pd.read_csv('gt_training.csv', sep=',')
    labels = np.asarray(df)
    path = 'path..../training/'
    image_names_train = [f for f in os.listdir(path) if os.path.splitext(f)[-1] == '.png']
    return labels, image_names_train

Train_images = glob.glob('path.../training/*.png')
train_data = np.array([np.array(Image.open(fname)) for fname in Train_images])
train_data = train_data.astype('float32')
train_data /= 255

# getting labels from .csv file for training data
labels, image_names_train = get_Labels()
train_labels = extract_labels(image_names_train, labels)
train_labels = train_labels.astype('float32')
print(train_labels.shape)

train_data = train_data.reshape(train_data.shape[0], 78, 78, 1)  # reshaping into 4-Dim
input_shape = (78, 78, 1)  # 1 because the provided dataset is in grey scale

# Adding pooling and dense layers to a non-optimized empty CNN
model = Sequential()
model.add(Conv2D(6, kernel_size=(7, 7), activation=tf.nn.tanh, input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(16, kernel_size=(4, 4), activation=tf.nn.tanh))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Conv2D(32, kernel_size=(3, 3), activation=tf.nn.tanh))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Flatten())
model.add(Dense(150, activation=tf.nn.tanh, kernel_regularizer=keras.regularizers.l2(0.00005)))
model.add(Dropout(0.5))
model.add(Dense(6, activation=tf.nn.softmax))

# setting an optimizer with a given loss function
opt = SGD(lr=0.01, momentum=0.9)
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x=train_data, y=train_labels, epochs=10, batch_size=77)
The error message I got:
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
InvalidArgumentError: Received a label value of 13269 which is outside the valid range of [0, 6). Label values: 8823 3208 9410 5223 8817 3799 6588 1779 1371 5017 9788 9886 3345 1815 5943 37 675 2396 4485 9528 11082 12457 13269 5488 3250 12896 13251 1854 10942 6287 6232 2944
[[node loss_24/dense_55_loss/sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at C:\Users\vardh\Anaconda3\envs\tf\lib\site-packages\keras\backend\tensorflow_backend.py:3009) ]] [Op:__inference_keras_scratch_graph_676176]
Function call stack:
keras_scratch_graph
I realized that my question is related to this question: InvalidArgumentError: Received a label value of 8825.....
Solution from that post (posted by shaili):
In the last layer, for example, you used model.add(Dense(1, activation='softmax')). Here 1 restricts the valid label values to [0, 1); change it to the maximum output label. For example, if your output labels are in [0, 7), then use model.add(Dense(7, activation='softmax')).
input_text = Input(shape=(max_len,), dtype=tf.string)
embedding = Lambda(ElmoEmbedding, output_shape=(max_len, 1024))(input_text)
x = Bidirectional(LSTM(units=512, return_sequences=True,
                       recurrent_dropout=0.2, dropout=0.2))(embedding)
x_rnn = Bidirectional(LSTM(units=512, return_sequences=True,
                           recurrent_dropout=0.2, dropout=0.2))(x)
x = add([x, x_rnn])  # residual connection to the first biLSTM
out = TimeDistributed(Dense(n_tags, activation="softmax"))(x)
Here, in the TimeDistributed layer, n_tags is the number of tags I want to classify among.
If I predict some other quantity such as q_tag whose length differs from n_tags (say its length is 10 while n_tags is 7) and an output label of 8 is received, it will give the invalid argument error: Received a label value of 8 which is outside the valid range of [0, 7).
From my experience, this error is usually generated when the number of classes to classify is specified inaccurately. Here in my code, model.add(Dense(6, activation=tf.nn.softmax)), I specified 6 output classes while my label values go up to 13596. However, this is not yet fully working code, but at least it gets my code running.
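To make that concrete, here is a small sketch (with made-up label values, not the actual Hep-2 labels): sparse_categorical_crossentropy expects integer labels in [0, num_classes), so raw IDs have to be remapped to contiguous class indices and the final Dense layer sized to match.

import numpy as np

# made-up raw labels (e.g. IDs read from a CSV), not the real dataset
raw_labels = np.array([37, 675, 8823, 37, 13269, 675])

# remap to contiguous indices 0..num_classes-1
classes, mapped_labels = np.unique(raw_labels, return_inverse=True)
num_classes = len(classes)
print(mapped_labels)   # [0 1 2 0 3 1]
print(num_classes)     # 4

# the output layer must then have exactly num_classes units, e.g.:
# model.add(Dense(num_classes, activation=tf.nn.softmax))
# model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])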

LSTM model has constant accuracy and doesn't vary

I'm stuck, as you can see, with my LSTM model. I'm trying to predict the amount of tons to produce per month. When I train the model, the accuracy is almost constant, with only minimal variation like:
0.34406
0.34407
0.34408
I tried different combinations of activations, initializers and parameters, and the accuracy doesn't increase.
I don't know whether the problem here is my data, my model, or whether this value is the maximum accuracy the model can reach.
Here is the code (if you notice some unused libraries, it's because I made some changes from the first version):
import numpy as np
import pandas as pd
from pandas.tseries.offsets import DateOffset
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler
from sklearn import preprocessing
import keras
%tensorflow_version 2.x
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dropout
from keras.optimizers import Adam
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
from plotly.offline import iplot
import matplotlib.pyplot as plt
import chart_studio.plotly as py
import plotly.offline as pyoff
import plotly.graph_objs as go

df_ventas = pd.read_csv('/content/drive/My Drive/proyectoPanimex/DEOPE.csv',
                        parse_dates=['Data Emissão'], index_col=0, squeeze=True)
#df_ventas = df_ventas.resample('M').sum().reset_index()
df_ventas = df_ventas.drop(columns=['weekday', 'month'], axis=1)
df_ventas = df_ventas.reset_index()
df_ventas = df_ventas.rename(columns={'Data Emissão': 'Fecha', 'Un': 'Cantidad'})
df_ventas['dia'] = [x.day for x in df_ventas.Fecha]
df_ventas['mes'] = [x.month for x in df_ventas.Fecha]
df_ventas['anio'] = [x.year for x in df_ventas.Fecha]
df_ventas = df_ventas[:-48]
df_ventas = df_ventas.drop(columns='Fecha')

df_diff = df_ventas.copy()
df_diff['cantidad_anterior'] = df_diff['Cantidad'].shift(1)
df_diff = df_diff.dropna()
df_diff['diferencia'] = (df_diff['Cantidad'] - df_diff['cantidad_anterior'])
df_supervised = df_diff.drop(['cantidad_anterior'], axis=1)

# adding lags
for inc in range(1, 31):
    nombre_columna = 'retraso_' + str(inc)
    df_supervised[nombre_columna] = df_supervised['diferencia'].shift(inc)
df_supervised = df_supervised.dropna()

df_supervisedNumpy = df_supervised.to_numpy()
train = df_supervisedNumpy

scaler = MinMaxScaler(feature_range=(0, 1))
X_train = scaler.fit(train)
train = train.reshape(train.shape[0], train.shape[1])
train_scaled = scaler.transform(train)

X_train, y_train = train_scaled[:, 1:], train_scaled[:, 0:1]
X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])

# LSTM MODEL
model = Sequential()
act = 'tanh'
actF = 'relu'
model.add(LSTM(200, activation=act, input_dim=34, return_sequences=True))
model.add(Dropout(0.15))
#model.add(Flatten())
model.add(LSTM(200, activation=act))
model.add(Dropout(0.2))
#model.add(Flatten())
model.add(Dense(200, activation=act))
model.add(Dropout(0.3))
model.add(Dense(1, activation=actF))

optimizer = keras.optimizers.Adam(lr=0.00001)
model.compile(optimizer=optimizer, loss=keras.losses.binary_crossentropy, metrics=['accuracy'])
history = model.fit(X_train, y_train, batch_size=100, epochs=50, verbose=1)

hist = pd.DataFrame(history.history)
hist['Epoch'] = history.epoch
hist
History plot:
loss acc Epoch
0 0.847146 0.344070 0
1 0.769400 0.344070 1
2 0.703548 0.344070 2
3 0.698137 0.344070 3
4 0.653952 0.344070 4
As you can see, the only value that changes is the loss, but what is going on with the accuracy? I'm starting out with machine learning and don't have enough knowledge yet to see my errors. Thanks!
A Dense(1, activation='softmax') will always freeze and not learn anything
A Dense(1, activation='relu') will very probably freeze and not learn anything
A Dense(1, activation='sigmoid') is ideal for classification (binary) problems and somewhat good for regression with values between 0 and 1.
A Dense(1, activation='tanh') is somewhat good for regression with values between -1 and 1
A Dense(1, activation='softplus') is somewhat good for regression with values between 0 and +infinite
A Dense(1, activation='linear') is good for regression in general with no limits (but it's highly recommended that the data be normalized before)
For regression, you can't use accuracy, but the metrics 'mae' and 'mse' don't provide "relative" difference, they provide "absolute" mean difference, one linear, the other squared.
Your output activation should be linear for continuous prediction or softmax for classification. Also multiply your learning rate by 100. Your loss should be mean_absolute_error. You could also easily divide your lstm neurons by a factor of 10. The tanh should be replaced by relu or the likes.
For your accuracy problem, it makes no sense to use accuracy, since you're not trying to classify. For metrics, you can use mae. You're trying to know how far the prediction is from the actual target, on a continuous scale. Accuracy is for categories, not continuous data.
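Putting those suggestions together, a minimal sketch of a regression-style head and compile step (illustrative layer sizes only, not a tuned configuration for this dataset):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential([
    LSTM(64, input_shape=(1, 34)),        # matches the (samples, 1, 34) training shape
    Dropout(0.2),
    Dense(32, activation='relu'),
    Dense(1, activation='linear'),        # linear output for a continuous target
])
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='mean_absolute_error', # regression loss
              metrics=['mae'])            # track MAE instead of accuracy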
