I am trying to train a neural network to make inverse kinematics calculations for a robotic arm with predefined segment lengths. I am not including the segment lengths in the neural network inputs; they are implied by the training data instead. The training data is a pandas DataFrame with the spatial mappings of the arm: the labels are the angles of rotation for the three segments, and the features are the x and y coordinates where the endpoint of the last segment ends up.
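For reference, the mapping described above can be generated with planar forward kinematics. Here is a minimal sketch under assumed values: the segment lengths and the 10-degree grid are placeholders, and the names samples and labels are only chosen to match the fit call below:

import numpy as np
import pandas as pd

# Hypothetical segment lengths; the real arm's lengths would go here.
L1, L2, L3 = 3.0, 2.0, 1.0

# Sample the three joint angles on a coarse grid (degrees).
angles = np.arange(0, 360, 10)
t1, t2, t3 = np.meshgrid(angles, angles, angles, indexing="ij")
t1, t2, t3 = t1.ravel(), t2.ravel(), t3.ravel()

# Planar forward kinematics: each segment rotates relative to the previous one.
a1 = np.radians(t1)
a2 = a1 + np.radians(t2)
a3 = a2 + np.radians(t3)
x = L1 * np.cos(a1) + L2 * np.cos(a2) + L3 * np.cos(a3)
y = L1 * np.sin(a1) + L2 * np.sin(a2) + L3 * np.sin(a3)

df = pd.DataFrame({"x": x, "y": y, "theta1": t1, "theta2": t2, "theta3": t3})
samples = df[["x", "y"]].values                       # features: endpoint coordinates
labels = df[["theta1", "theta2", "theta3"]].values    # labels: joint angles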
I am using Keras with Theano as the Backend.
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential([
    Dense(3, input_shape=(2,), activation="relu"),
    Dense(3, activation="relu"),
    Dense(3)  # linear output: the three joint angles
])
model.summary()

model.compile(Adam(lr=0.001), loss='mean_squared_error', metrics=['accuracy'])
model.fit(samples, labels, validation_split=0.2, batch_size=1000, epochs=10,
          shuffle=True, verbose=1)

score = model.evaluate(samples, labels, batch_size=32, verbose=1)
print('Test score:', score[0])
print('Test accuracy:', score[1])

weights = model.get_weights()
predictions = model.predict(samples, verbose=1)
print(predictions)

model.save("IK_NN_7-4-3_keras.h5")
OUTPUT===============================================================
Train on 6272736 samples, validate on 1568184 samples
Epoch 1/10
- 5s - loss: 10198.7558 - acc: 0.9409 - val_loss: 12149.1703 - val_acc: 0.9858
Epoch 2/10
- 5s - loss: 4272.9105 - acc: 0.9932 - val_loss: 12117.0527 - val_acc: 0.9858
Epoch 3/10
- 5s - loss: 4272.7862 - acc: 0.9932 - val_loss: 12113.3804 - val_acc: 0.9858
Epoch 4/10
- 5s - loss: 4272.7567 - acc: 0.9932 - val_loss: 12050.8211 - val_acc: 0.9858
Epoch 5/10
- 5s - loss: 4272.7271 - acc: 0.9932 - val_loss: 12036.5538 - val_acc: 0.9858
Epoch 6/10
- 5s - loss: 4272.7350 - acc: 0.9932 - val_loss: 12103.8665 - val_acc: 0.9858
Epoch 7/10
- 5s - loss: 4272.7553 - acc: 0.9932 - val_loss: 12175.0442 - val_acc: 0.9858
Epoch 8/10
- 5s - loss: 4272.7282 - acc: 0.9932 - val_loss: 12161.4815 - val_acc: 0.9858
Epoch 9/10
- 5s - loss: 4272.7213 - acc: 0.9932 - val_loss: 12101.4021 - val_acc: 0.9858
Epoch 10/10
- 5s - loss: 4272.7909 - acc: 0.9932 - val_loss: 12152.4966 - val_acc: 0.9858
Test score: 5848.549130022683
Test accuracy: 0.9917127071823204
[[ 59.452095 159.26912 258.94424 ]
[ 59.382706 159.41936 259.25183 ]
[ 59.72419 159.69777 259.48584 ]
...
[ 59.58721 159.33467 258.9603 ]
[ 59.51745 159.69331 259.62595 ]
[ 59.984367 160.5533 260.7689 ]]
Both the test accuracy and the validation accuracy seem good, but they don't exactly reflect reality. The predictions should have looked something like this:
[[ 0 0 0]
[ 0 0 1]
[ 0 0 2]
...
[358 358 359]
[358 359 359]
[359 359 359]]
That is because I fed back the same features, expecting to get the same labels back. Instead I'm getting these numbers for some reason:
[[ 59.452095 159.26912 258.94424 ]
[ 59.382706 159.41936 259.25183 ]
[ 59.72419 159.69777 259.48584 ]
...
[ 59.58721 159.33467 258.9603 ]
[ 59.51745 159.69331 259.62595 ]
[ 59.984367 160.5533 260.7689 ]]
Thank you for your time.
First of all, your metric is accuracy but you are predicting continuous values, so the accuracy numbers you get don't make any sense. Your problem is a regression problem, and your metric is for classification. You could just use MSE, R², or other regression metrics:
from keras import metrics
model.compile(loss='mse', optimizer='adam', metrics=[metrics.mean_squared_error, metrics.mean_absolute_error])
Additionally, you should consider increasing the number of neurons, and if your input data really only has 2 dimensions, think about shallow models rather than ANNs (an SVM with a Gaussian kernel, for example); a sketch follows below.
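As a rough illustration of that suggestion (not part of the original answer), here is a minimal sketch using scikit-learn's SVR with an RBF (Gaussian) kernel. SVR predicts a single target, so one regressor per joint angle is wrapped in MultiOutputRegressor; the subsample size is an arbitrary assumption to keep kernel training tractable on millions of rows:

import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor

# Kernel SVMs scale poorly with sample count, so fit on a random subsample.
idx = np.random.choice(len(samples), size=10000, replace=False)
X, y = samples[idx], labels[idx]

# One RBF-kernel SVR per output angle.
svm = MultiOutputRegressor(SVR(kernel="rbf", C=10.0, gamma="scale"))
svm.fit(X, y)

print(svm.predict(samples[:5]))  # predicted (theta1, theta2, theta3) triples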
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

# Sort chronologically and keep only the Date and Close columns.
df_btc1 = df_btc.sort_index(ascending=True, axis=0)
new_dataset = pd.DataFrame(index=range(0, len(df_btc)), columns=['Date', 'Close'])
L = len(df_btc)
for i in range(0, len(df_btc1)):
    new_dataset["Date"][i] = df_btc1['Date'][i]
    new_dataset["Close"][i] = df_btc1["Close"][i]

#Normalize the Dataset
scaler = MinMaxScaler(feature_range=(0, 1))
new_dataset.index = new_dataset.Date
new_dataset.drop("Date", axis=1, inplace=True)
final_dataset = new_dataset.values
train_data = final_dataset[0:L - 300, :]
valid_data = final_dataset[L - 300:, :]
scaled_data = scaler.fit_transform(final_dataset)

# Build sliding windows of 300 past closes to predict the next close.
x_train_data, y_train_data = [], []
for i in range(300, len(train_data)):
    x_train_data.append(scaled_data[i - 300:i, 0])
    y_train_data.append(scaled_data[i, 0])
x_train_data, y_train_data = np.array(x_train_data), np.array(y_train_data)
x_train_data = np.reshape(x_train_data, (x_train_data.shape[0], x_train_data.shape[1], 1))

#Build and train the LSTM model
lstm_model = Sequential()
lstm_model.add(LSTM(units=50, return_sequences=True, input_shape=(x_train_data.shape[1], 1)))
lstm_model.add(Dropout(0.2))
lstm_model.add(LSTM(units=50, return_sequences=True))
lstm_model.add(Dropout(0.2))
lstm_model.add(LSTM(units=50, return_sequences=True))
lstm_model.add(Dropout(0.2))
lstm_model.add(LSTM(units=50))
lstm_model.add(Dropout(0.2))
lstm_model.add(Dense(1))

inputs_data = new_dataset[len(new_dataset) - len(valid_data) - 300:].values
inputs_data = inputs_data.reshape(-1, 1)
inputs_data = scaler.transform(inputs_data)

lstm_model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer='adam')
lstm_model.fit(x_train_data, y_train_data, epochs=5, batch_size=5, verbose=2)
Epoch 1/5
280/280 - 91s - loss: 0.3312 - accuracy: 0.0000e+00 - 91s/epoch - 324ms/step
Epoch 2/5
280/280 - 85s - loss: 0.3344 - accuracy: 0.0000e+00 - 85s/epoch - 305ms/step
Epoch 3/5
280/280 - 83s - loss: 0.3286 - accuracy: 0.0000e+00 - 83s/epoch - 298ms/step
Epoch 4/5
280/280 - 84s - loss: 0.3267 - accuracy: 0.0000e+00 - 84s/epoch - 299ms/step
Epoch 5/5
280/280 - 83s - loss: 0.3297 - accuracy: 0.0000e+00 - 83s/epoch - 297ms/step
<keras.callbacks.History at 0x7f0f8c3e97d0>
The loss 'binary_crossentropy' is suited to binary classification problems, not to time series prediction problems.
For time series prediction, you should use mean squared error.
Also, you cannot use accuracy for problems that are not classification, so just remove the metrics from your code, giving:
lstm_model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer='adam')
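After training with MSE, the predictions come out in the scaled 0-1 range. As an illustration (not part of the original answer), here is a minimal sketch of turning them back into prices with the already-fitted scaler, windowing inputs_data the same way the question windows the training data:

import numpy as np

# Build 300-step windows over the scaled validation inputs.
x_test = []
for i in range(300, inputs_data.shape[0]):
    x_test.append(inputs_data[i - 300:i, 0])
x_test = np.reshape(np.array(x_test), (len(x_test), 300, 1))

# Predict scaled closes, then invert the MinMax scaling to recover prices.
scaled_preds = lstm_model.predict(x_test)
predicted_prices = scaler.inverse_transform(scaled_preds)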
I tried to train a CNN to classify 9 classes of images. Each class has 1000 images for training. I tried training on VGG16 and VGG19, and both can achieve a validation accuracy of 90%. But when I tried to train on the InceptionResNetV2 model, the accuracy seems stuck around 20% to 30%. Below is my code for InceptionResNetV2 and the training. What can I do to improve the training?
import tensorflow as tf
from tensorflow.keras.layers import Flatten, Dense, LeakyReLU, Dropout, BatchNormalization
from tensorflow.keras import regularizers

base_model = tf.keras.applications.InceptionResNetV2(input_shape=(IMG_HEIGHT, IMG_WIDTH, 3),
                                                     weights='imagenet', include_top=False)
base_model.trainable = False

model = tf.keras.Sequential([
    base_model,
    Flatten(),
    Dense(1024, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    LeakyReLU(alpha=0.4),
    Dropout(0.5),
    BatchNormalization(),
    Dense(1024, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    LeakyReLU(alpha=0.4),
    Dense(9, activation='softmax')])

optimizer_model = tf.keras.optimizers.Adam(learning_rate=0.0001, name='Adam', decay=0.00001)
# Note: loss_model is built with from_logits=True but is not the loss actually
# passed to compile below, which uses the plain string form.
loss_model = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
model.compile(optimizer_model, loss="categorical_crossentropy", metrics=['accuracy'])
Epoch 1/10
899/899 [==============================] - 255s 283ms/step - loss: 4.3396 - acc: 0.3548 - val_loss: 4.2744 - val_acc: 0.3874
Epoch 2/10
899/899 [==============================] - 231s 257ms/step - loss: 3.5856 - acc: 0.4695 - val_loss: 3.9151 - val_acc: 0.3816
Epoch 3/10
899/899 [==============================] - 225s 250ms/step - loss: 3.1451 - acc: 0.4959 - val_loss: 4.8801 - val_acc: 0.2425
Epoch 4/10
899/899 [==============================] - 227s 252ms/step - loss: 2.7771 - acc: 0.5124 - val_loss: 3.7167 - val_acc: 0.3023
Epoch 5/10
899/899 [==============================] - 231s 257ms/step - loss: 2.4993 - acc: 0.5260 - val_loss: 3.7276 - val_acc: 0.3770
Epoch 6/10
899/899 [==============================] - 227s 252ms/step - loss: 2.3148 - acc: 0.5251 - val_loss: 3.7677 - val_acc: 0.3115
Epoch 7/10
899/899 [==============================] - 234s 260ms/step - loss: 2.1381 - acc: 0.5379 - val_loss: 3.4867 - val_acc: 0.2862
Epoch 8/10
899/899 [==============================] - 230s 256ms/step - loss: 2.0091 - acc: 0.5367 - val_loss: 4.1032 - val_acc: 0.3080
Epoch 9/10
899/899 [==============================] - 225s 251ms/step - loss: 1.9155 - acc: 0.5399 - val_loss: 4.1270 - val_acc: 0.2954
Epoch 10/10
899/899 [==============================] - 232s 258ms/step - loss: 1.8349 - acc: 0.5508 - val_loss: 4.3918 - val_acc: 0.2276
VGG-16/19 has a depth of 23/26 layers, whereas InceptionResNetV2 has a depth of 572 layers. There is minimal domain similarity between medical images and the ImageNet dataset. In VGG, because of the low depth, the features you get are not that complex, and the network can classify on the basis of the Dense-layer features. In the much deeper InceptionResNetV2 network, however, the output of the network's final layers is far more complex (visualize it as something object-like, but for the ImageNet dataset), so the features obtained from those layers fail to connect to the Dense-layer features, and the network overfits. I hope this makes the point clear.
Check out my answer to a very similar question of yours on this link: Link. It will help improve your accuracy.
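One common mitigation consistent with this diagnosis (my sketch, not taken from the linked answer) is to pool the InceptionResNetV2 features instead of flattening them, so the Dense head stays small:

import tensorflow as tf
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout

base_model = tf.keras.applications.InceptionResNetV2(input_shape=(IMG_HEIGHT, IMG_WIDTH, 3),
                                                     weights='imagenet', include_top=False)
base_model.trainable = False

# Global average pooling collapses each of IRV2's 1536 feature maps to its
# mean, giving a compact vector and far fewer head parameters than Flatten().
model = tf.keras.Sequential([
    base_model,
    GlobalAveragePooling2D(),
    Dropout(0.5),
    Dense(256, activation='relu'),
    Dense(9, activation='softmax')])

model.compile(tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='categorical_crossentropy', metrics=['accuracy'])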
I am trying to train my model using transfer learning. For this I am using the VGG16 model, stripped of its top layers, with the first 2 layers frozen to keep the ImageNet initial weights. For fine-tuning I am using a learning rate of 0.0001, softmax activation, dropout of 0.5, categorical cross-entropy loss, the SGD optimizer, and 46 classes.
I am just unable to understand the behavior while training. The train loss and accuracy both look fine (loss decreasing, accuracy increasing). The val loss is decreasing and the val accuracy is increasing as well, BUT they are always higher than the train loss and accuracy.
Assuming it is overfitting, I made the model less complex, increased the dropout rate, and added more samples to the validation data, but nothing seemed to work. I am a newbie, so any kind of help is appreciated.
Epoch 1/50
26137/26137 [==============================] - 7446s 285ms/step - loss: 1.1200 - accuracy: 0.3810 - val_loss: 3.1219 - val_accuracy: 0.4467
Epoch 2/50
26137/26137 [==============================] - 7435s 284ms/step - loss: 0.9944 - accuracy: 0.4353 - val_loss: 2.9348 - val_accuracy: 0.4694
Epoch 3/50
26137/26137 [==============================] - 7532s 288ms/step - loss: 0.9561 - accuracy: 0.4530 - val_loss: 1.6025 - val_accuracy: 0.4780
Epoch 4/50
26137/26137 [==============================] - 7436s 284ms/step - loss: 0.9343 - accuracy: 0.4631 - val_loss: 1.3032 - val_accuracy: 0.4860
Epoch 5/50
26137/26137 [==============================] - 7358s 282ms/step - loss: 0.9185 - accuracy: 0.4703 - val_loss: 1.4461 - val_accuracy: 0.4847
Epoch 6/50
26137/26137 [==============================] - 7396s 283ms/step - loss: 0.9083 - accuracy: 0.4748 - val_loss: 1.4093 - val_accuracy: 0.4908
Epoch 7/50
26137/26137 [==============================] - 7424s 284ms/step - loss: 0.8993 - accuracy: 0.4789 - val_loss: 1.4617 - val_accuracy: 0.4939
Epoch 8/50
26137/26137 [==============================] - 7433s 284ms/step - loss: 0.8925 - accuracy: 0.4822 - val_loss: 1.4257 - val_accuracy: 0.4978
Epoch 9/50
26137/26137 [==============================] - 7445s 285ms/step - loss: 0.8868 - accuracy: 0.4851 - val_loss: 1.5568 - val_accuracy: 0.4953
Epoch 10/50
26137/26137 [==============================] - 7387s 283ms/step - loss: 0.8816 - accuracy: 0.4874 - val_loss: 1.4534 - val_accuracy: 0.4970
Epoch 11/50
26137/26137 [==============================] - 7374s 282ms/step - loss: 0.8779 - accuracy: 0.4894 - val_loss: 1.4605 - val_accuracy: 0.4912
Epoch 12/50
26137/26137 [==============================] - 7411s 284ms/step - loss: 0.8733 - accuracy: 0.4915 - val_loss: 1.4694 - val_accuracy: 0.5030
Yes, you are facing an over-fitting issue. To mitigate it, you can try to implement the steps below.
1. Shuffle the data by using shuffle=True in VGG16_model.fit. Code is shown below:
history = VGG16_model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1,
                          validation_data=(x_validation, y_validation), shuffle=True)
2. Use early stopping. Code is shown below:
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)
3. Use regularization. Code for regularization is shown below (you can try l1 regularization or l1_l2 regularization as well):
from tensorflow.keras.regularizers import l2
Regularizer = l2(0.001)
VGG16_model.add(Conv2D(96, (11, 11), input_shape=(227, 227, 3), strides=(4, 4), padding='valid',
                       activation='relu', data_format='channels_last',
                       activity_regularizer=Regularizer, kernel_regularizer=Regularizer))
VGG16_model.add(Dense(units=2, activation='sigmoid',
                      activity_regularizer=Regularizer, kernel_regularizer=Regularizer))
4. You can try using BatchNormalization.
5. Perform image data augmentation using ImageDataGenerator; see the sketch after this list. Refer to this link for more info about that.
6. If the pixels are not normalized, dividing the pixel values by 255 also helps.
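For points 5 and 6 together, here is a minimal sketch (my illustration, with arbitrary augmentation parameters) of an ImageDataGenerator that both augments and rescales; validation data is only rescaled, never augmented:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescaling covers point 6; the remaining arguments perform the augmentation from point 5.
train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=15,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   horizontal_flip=True)
val_datagen = ImageDataGenerator(rescale=1./255)

train_flow = train_datagen.flow(x_train, y_train, batch_size=batch_size)
val_flow = val_datagen.flow(x_validation, y_validation, batch_size=batch_size)

history = VGG16_model.fit(train_flow, epochs=epochs, verbose=1,
                          validation_data=val_flow,
                          callbacks=[callback])  # early stopping from point 2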
I am working on image classification results. My training and testing split used the same random_state, and the model definition is the same. However, when I run the model multiple times, three out of four times the model is not learning and the loss does not go down; one out of four times the model is learning and I get good classification results. I suspect the randomness comes from the ImageDataGenerator(), but I cannot figure out how to make the model learn every time (a sketch of pinning the generator's seeds appears after the code below).
I have a relatively small labeled dataset and no way to increase its size.
I tried different optimizers and different batch sizes; it doesn't help. I found that when I reduce the trainable layers and make the later fully-connected layers smaller (reducing them to 256 units), the model starts to learn every time. But why does the big network not learn well even on the training data set? My understanding is that the model would overfit, but why, in this case, is it not learning at all?
import os
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator
from keras.applications import VGG19
from keras.layers import Input, Flatten, Dense
from keras.models import Model

IMAGE_WIDTH = 128
IMAGE_HEIGHT = 128
IMAGE_SIZE = (IMAGE_WIDTH, IMAGE_HEIGHT)
IMAGE_CHANNELS = 3  # RGB color

os.chdir(r"XXX")
filenames = os.listdir(r"XXX")
ref_db = pd.read_csv(r"XXX")
ref_db['obj_id'] = [str(i) + '.tif' for i in ref_db.OBJECTID.values]
ref_db2 = ref_db[['label', 'obj_id']]
ref_db2['label'] = ref_db2['label'].apply(str)

train_df, validate_df = train_test_split(ref_db2, test_size=0.20, random_state=42)
train_df = train_df.reset_index(drop=True)
validate_df = validate_df.reset_index(drop=True)
total_train = train_df.shape[0]
total_validate = validate_df.shape[0]
batch_size = 64

train_datagen = ImageDataGenerator(
    rotation_range=15,
    rescale=1./255,
    shear_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1
)
train_generator = train_datagen.flow_from_dataframe(
    train_df,
    r"XXX",
    x_col='obj_id',
    y_col='label',  # matches the label column selected above
    target_size=IMAGE_SIZE,
    class_mode='binary',
    batch_size=batch_size
)
# validation_generator (used below) is built from validate_df in the same way,
# typically with rescale=1./255 only.

inputs = Input(shape=(IMAGE_WIDTH, IMAGE_HEIGHT, 3))
base_model = VGG19(weights='imagenet', include_top=False)
# Freeze everything except the last three layers of the convolutional base.
for layer in base_model.layers[:-3]:
    layer.trainable = False

x = base_model(inputs)
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
#x = Dropout(0.5)(x)
x = Dense(512, activation="relu")(x)
predictions = Dense(1, activation="sigmoid")(x)

model_vgg = Model(inputs=inputs, outputs=predictions)
model_vgg.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])

#########################
history = model_vgg.fit_generator(
    train_generator,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=total_validate // batch_size,
    steps_per_epoch=total_train // batch_size,
    verbose=2
)
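As background on the suspected ImageDataGenerator randomness, here is a minimal sketch of pinning the seed sources, reusing train_datagen and train_df from above. This assumes TF2-style Keras (the code above uses fit_generator, so older versions would need tf.set_random_seed instead), and whether seeding alone cures the instability is exactly what is in question:

import random
import numpy as np
import tensorflow as tf

# Pin every seed source that can affect a run: Python, NumPy, TensorFlow.
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)

# flow_from_dataframe also accepts its own seed for shuffling and augmentation.
train_generator = train_datagen.flow_from_dataframe(
    train_df,
    r"XXX",
    x_col='obj_id',
    y_col='label',
    target_size=IMAGE_SIZE,
    class_mode='binary',
    batch_size=batch_size,
    seed=42
)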
This is the unwanted behavior: the model is not learning, all observations are predicted as 1, and the loss is not dropping.
Found 756 validated image filenames belonging to 2 classes.
Found 190 validated image filenames belonging to 2 classes.
Epoch 1/50
- 4s - loss: 4.0464 - acc: 0.6203 - val_loss: 4.9820 - val_acc: 0.6875
Epoch 2/50
- 2s - loss: 4.3811 - acc: 0.7252 - val_loss: 4.8856 - val_acc: 0.6935
Epoch 3/50
- 2s - loss: 5.0209 - acc: 0.6851 - val_loss: 5.3556 - val_acc: 0.6641
Epoch 4/50
- 2s - loss: 4.3583 - acc: 0.7266 - val_loss: 4.1142 - val_acc: 0.7419
Epoch 5/50
- 2s - loss: 4.9317 - acc: 0.6907 - val_loss: 4.7329 - val_acc: 0.7031
Epoch 6/50
- 2s - loss: 4.6275 - acc: 0.7097 - val_loss: 5.3998 - val_acc: 0.6613
Epoch 7/50
This is the expected behavior: the model is learning, both 1 and 0 are predicted, and the loss is dropping.
Found 756 validated image filenames belonging to 2 classes.
Found 190 validated image filenames belonging to 2 classes.
Epoch 1/50
- 4s - loss: 2.1181 - acc: 0.6484 - val_loss: 0.8013 - val_acc: 0.6562
Epoch 2/50
- 2s - loss: 0.6609 - acc: 0.7096 - val_loss: 0.5670 - val_acc: 0.7581
Epoch 3/50
- 2s - loss: 0.6539 - acc: 0.6912 - val_loss: 0.5923 - val_acc: 0.6953
Epoch 4/50
- 2s - loss: 0.5695 - acc: 0.7083 - val_loss: 0.5426 - val_acc: 0.6774
Epoch 5/50
- 2s - loss: 0.5262 - acc: 0.7176 - val_loss: 0.5386 - val_acc: 0.6875
I am training a Keras model to predict the availability of bike-sharing stations. In the training set I give a whole row with the day of the year, time, weekday, station, and free bikes. Each sample contains the availability for the previous day (144 samples), and I am trying to predict the availability for the next day (144 samples). The shapes of the sets used are:
Train X (2362, 144, 5)
Train Y (2362, 144)
Test X (39, 144, 5)
Test Y (39, 144)
Validation X (1535, 144, 5)
Validation Y (1535, 144)
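For context, here is a minimal sketch of how windows with these shapes could be built; it assumes a hypothetical readings array of shape (n_steps, 5) with one row per 10-minute reading for a station and the free-bike count in the last column:

import numpy as np

def make_windows(readings, steps_per_day=144):
    # readings: (n_steps, 5) array; the last column holds the free-bike count.
    x, y = [], []
    for start in range(0, len(readings) - 2 * steps_per_day + 1, steps_per_day):
        # Features: all 5 columns for one full day.
        x.append(readings[start:start + steps_per_day])
        # Target: the availability column for the following day.
        y.append(readings[start + steps_per_day:start + 2 * steps_per_day, -1])
    return np.array(x), np.array(y)

x, y = make_windows(readings)
print(x.shape, y.shape)  # (n_windows, 144, 5) and (n_windows, 144)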
The model I am using is this one
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(LSTM(20, input_shape=(self.train_x.shape[1], self.train_x.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(20))
model.add(Dense(144))  # one output per 10-minute slot of the next day
model.compile(loss='mse', optimizer='adam', metrics=['acc', 'mape', 'mse'])

history = model.fit(self.train_x, self.train_y, batch_size=50, epochs=20,
                    validation_data=(self.validation_x, self.validation_y),
                    verbose=1, shuffle=True)
The predictions made after training have nothing to do with the expected output; they have a sawtooth-like shape, with values that exceed the original range.
The accuracy rarely goes up, but the loss decreases normally.
As an example, the history after each epoch looks like this:
Epoch 17/20
2362/2362 [==============================] - 12s 5ms/step - loss: 9.1214 - acc: 0.0000e+00 - mean_absolute_percentage_error: 21925846.0813 - mean_squared_error: 9.1214 - val_loss: 9.0642 - val_acc: 0.0000e+00 - val_mean_absolute_percentage_error: 24162847.3779 - val_mean_squared_error: 9.0642
Epoch 18/20
2362/2362 [==============================] - 12s 5ms/step - loss: 8.2241 - acc: 0.0013 - mean_absolute_percentage_error: 21906919.9136 - mean_squared_error: 8.2241 - val_loss: 8.1923 - val_acc: 0.0000e+00 - val_mean_absolute_percentage_error: 22754663.8013 - val_mean_squared_error: 8.1923
Epoch 19/20
2362/2362 [==============================] - 12s 5ms/step - loss: 7.4190 - acc: 0.0000e+00 - mean_absolute_percentage_error: 21910003.1744 - mean_squared_error: 7.4190 - val_loss: 7.3926 - val_acc: 0.0000e+00 - val_mean_absolute_percentage_error: 24673277.8420 - val_mean_squared_error: 7.3926
Epoch 20/20
2362/2362 [==============================] - 12s 5ms/step - loss: 6.7067 - acc: 0.0013 - mean_absolute_percentage_error: 22076339.2168 - mean_squared_error: 6.7067 - val_loss: 6.6758 - val_acc: 6.5147e-04 - val_mean_absolute_percentage_error: 22987089.8436 - val_mean_squared_error: 6.6758
I really don't know where the problem might be. More layers? Fewer layers? A different approach?
UPDATE: Plots of the training/test data. The left part of the plot shows the previous day of availability that is fed to the model, and the right part shows what the result should be and the prediction made by the model.