I am getting an InvalidArgumentError about shapes in a toy dilated 1D-CNN example. My input
train_generator has the shape TensorShape([128, 1]): 128 values plus one expanded dimension to hold the convolutional features.
def model():
    return Sequential([
        Convolution1D(1, 7, activation='relu', padding='causal', dilation_rate=2,
                      input_shape=np.shape(train_generator[0][0]))
    ])
EPOCHS = 4
model = model()
optimizer = Adam(lr=1.0e-4)
model.compile(optimizer=optimizer, loss='mse', metrics=['mse'])
model.summary()
print('Starting fit...')
history = model.fit(
    train_generator, epochs=EPOCHS, verbose=1,
    validation_data=val_generator)
The model summary gives me:
Model: "sequential_77"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_146 (Conv1D) (None, 128, 1) 8
And the error received is:
InvalidArgumentError: padded_shape[0]=13 is not divisible by block_shape[0]=2
[[node sequential_77/conv1d_146/Conv1D/SpaceToBatchND
(defined at C:\Users\xxxxx\anaconda3\envs\tensorflow27\lib\site-packages\keras\layers\convolutional.py:231)
]] [Op:__inference_train_function_129421]
To reproduce, create a vector of 128 random numbers, expand its dimensions on axis -1, and use the same vector for both training and validation.
import numpy as np
import tensorflow as tf
vec = np.random.rand(128)
vec = tf.expand_dims(vec, axis = -1)
train_x = vec
train_y = vec
I'm not experiencing the issue with the following code:
import numpy as np
from tensorflow.keras import layers, Input, Sequential, optimizers
from keras.layers import Convolution1D
import tensorflow as tf
vec = np.random.rand(128)
vec = tf.expand_dims(vec, axis = -1)
train_x = vec
train_y = vec
def model():
    return Sequential([
        Convolution1D(1, 7, activation='relu', padding='causal', dilation_rate=2,
                      input_shape=(128, 1))
    ])
EPOCHS = 4
model = model()
optimizer = optimizers.Adam(learning_rate=1.0e-4)
model.compile(loss='mse', metrics=['mse'], optimizer=optimizer)
print(model.summary())
print('Starting fit...')
history = model.fit(train_x, train_y, epochs=EPOCHS, verbose=1, validation_data = (train_x, train_y))
Output:
Model: "sequential_6"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_6 (Conv1D) (None, 128, 1) 8
=================================================================
Total params: 8
Trainable params: 8
Non-trainable params: 0
_________________________________________________________________
None
Starting fit...
Epoch 1/4
WARNING:tensorflow:Model was constructed with shape (None, 128, 1) for input KerasTensor(type_spec=TensorSpec(shape=(None, 128, 1), dtype=tf.float32, name='conv1d_6_input'), name='conv1d_6_input', description="created by layer 'conv1d_6_input'"), but it was called on an input with incompatible shape (32, 1, 1).
WARNING:tensorflow:Model was constructed with shape (None, 128, 1) for input KerasTensor(type_spec=TensorSpec(shape=(None, 128, 1), dtype=tf.float32, name='conv1d_6_input'), name='conv1d_6_input', description="created by layer 'conv1d_6_input'"), but it was called on an input with incompatible shape (32, 1, 1).
1/4 [======>.......................] - ETA: 2s - loss: 0.3000 - mse: 0.3000WARNING:tensorflow:Model was constructed with shape (None, 128, 1) for input KerasTensor(type_spec=TensorSpec(shape=(None, 128, 1), dtype=tf.float32, name='conv1d_6_input'), name='conv1d_6_input', description="created by layer 'conv1d_6_input'"), but it was called on an input with incompatible shape (32, 1, 1).
4/4 [==============================] - 1s 132ms/step - loss: 0.3131 - mse: 0.3131 - val_loss: 0.3131 - val_mse: 0.3131
Epoch 2/4
4/4 [==============================] - 0s 30ms/step - loss: 0.3131 - mse: 0.3131 - val_loss: 0.3131 - val_mse: 0.3131
Epoch 3/4
4/4 [==============================] - 0s 30ms/step - loss: 0.3131 - mse: 0.3131 - val_loss: 0.3131 - val_mse: 0.3131
Epoch 4/4
4/4 [==============================] - 0s 30ms/step - loss: 0.3131 - mse: 0.3131 - val_loss: 0.3131 - val_mse: 0.3131
I'm trying to achieve binary classification using MobileNetV2 in TensorFlow. I have two folders A and B and I'm using image_dataset_from_directory function to make them into two classes for training.
BATCH_SIZE = 32
IMG_SIZE = (224, 224)
train_directory = "Train_set/"
test_directory = "Test_set/"
train_dataset = image_dataset_from_directory(train_directory, shuffle=True, batch_size=BATCH_SIZE, image_size=IMG_SIZE)
validation_dataset = image_dataset_from_directory(test_directory, shuffle=True, batch_size=BATCH_SIZE, image_size=IMG_SIZE)
I'm preprocessing the input before passing it to the net.
preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input
Then I'm creating the model using the code:
def alpaca_model(image_shape=IMG_SIZE):
    input_shape = image_shape + (3,)
    base_model = tf.keras.applications.MobileNetV2(input_shape=input_shape,
                                                   include_top=False, # <== Important!!!!
                                                   weights='imagenet') # From imageNet
    # Freeze the base model by making it non-trainable
    base_model.trainable = False
    # create the input layer (same as the MobileNetV2 input size)
    inputs = tf.keras.Input(shape=input_shape)
    # data preprocessing using the same weights the model was trained on
    x = preprocess_input(inputs)
    # set training to False to avoid keeping track of statistics in the batch norm layer
    x = base_model(x, training=False)
    # Add the new binary classification layers:
    # use global avg pooling to summarize the info in each channel
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    # include dropout with probability of 0.2 to avoid overfitting
    x = tf.keras.layers.Dropout(0.2)(x)
    # create a prediction layer with one neuron (as a binary classifier only needs one)
    prediction_layer = tf.keras.layers.Dense(1, activation="sigmoid")
    outputs = prediction_layer(x)
    model = tf.keras.Model(inputs, outputs)
    return model
The model summary looks something like this
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer)                                [(None, 224, 224, 3)]    0
tf.math.truediv_1 (TFOpLambda)                      (None, 224, 224, 3)      0
tf.math.subtract_1 (TFOpLambda)                     (None, 224, 224, 3)      0
mobilenetv2_1.00_224 (Functional)                   (None, 7, 7, 1280)       2257984
global_average_pooling2d_1 (GlobalAveragePooling2D) (None, 1280)             0
dropout_1 (Dropout)                                 (None, 1280)             0
dense_1 (Dense)                                     (None, 1)                1281
=================================================================
Total params: 2,259,265
Trainable params: 1,281
Non-trainable params: 2,257,984
_________________________________________________________________
Then the model is compiled using the following:
loss_function=tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
metrics=['accuracy', tf.metrics.Recall(), tf.metrics.Precision()]
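The compile call itself isn't shown in the question; given the model2 object used below, it was presumably something along the lines of:
model2 = alpaca_model()
model2.compile(optimizer=optimizer, loss=loss_function, metrics=metrics)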
These are the stats of model.fit and model.evaluate
total_epochs = 5
history_fine = model2.fit(train_dataset, epochs=total_epochs, validation_data=validation_dataset)
Epoch 1/5
54/54 [==============================] - 213s 3s/step - loss: 0.2236 - accuracy: 0.9013 - recall: 0.9149 - precision: 0.8852 - val_loss: 0.0856 - val_accuracy: 0.9887 - val_recall: 0.9950 - val_precision: 0.9803
Epoch 2/5
54/54 [==============================] - 217s 4s/step - loss: 0.0614 - accuracy: 0.9855 - recall: 0.9928 - precision: 0.9776 - val_loss: 0.0439 - val_accuracy: 0.9977 - val_recall: 1.0000 - val_precision: 0.9950
Epoch 3/5
54/54 [==============================] - 216s 4s/step - loss: 0.0316 - accuracy: 0.9948 - recall: 0.9988 - precision: 0.9905 - val_loss: 0.0297 - val_accuracy: 0.9977 - val_recall: 1.0000 - val_precision: 0.9950
Epoch 4/5
54/54 [==============================] - 217s 4s/step - loss: 0.0258 - accuracy: 0.9954 - recall: 1.0000 - precision: 0.9905 - val_loss: 0.0373 - val_accuracy: 0.9910 - val_recall: 0.9850 - val_precision: 0.9949
Epoch 5/5
54/54 [==============================] - 220s 4s/step - loss: 0.0242 - accuracy: 0.9942 - recall: 0.9988 - precision: 0.9893 - val_loss: 0.0225 - val_accuracy: 0.9977 - val_recall: 1.0000 - val_precision: 0.9950
model2.evaluate(validation_dataset)
14/14 [==============================] - 15s 354ms/step - loss: 0.0225 - accuracy: 0.9977 - recall: 1.0000 - precision: 0.9950
The stats are really good. But when I use the same validation set, check the predictions for individual pics from both folders A and B, and plot the predictions, the points don't seem to be linearly separable.
A = []
for i in os.listdir("Test_set\A"):
    location = f"Test_set\A\{i}"
    my_image = tf.keras.preprocessing.image.load_img(location, target_size=(224, 224))
    preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input
    # preprocess the image
    my_image = tf.keras.preprocessing.image.img_to_array(my_image)
    my_image = my_image.reshape((1, my_image.shape[0], my_image.shape[1], my_image.shape[2]))
    my_image = preprocess_input(my_image)
    # make the prediction
    prediction = model2.predict(my_image)
    # print(prediction)
    A.append(float(prediction))
B = []
for i in os.listdir("Test_set\B"):
    location = f"Test_set\B\{i}"
    my_image = tf.keras.preprocessing.image.load_img(location, target_size=(224, 224))
    preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input
    # preprocess the image
    my_image = tf.keras.preprocessing.image.img_to_array(my_image)
    my_image = my_image.reshape((1, my_image.shape[0], my_image.shape[1], my_image.shape[2]))
    my_image = preprocess_input(my_image)
    # make the prediction
    prediction = model2.predict(my_image)
    # print(prediction)
    B.append(float(prediction))
Since you have two classes, you should replace
prediction_layer = tf.keras.layers.Dense(1, activation="sigmoid")
with
prediction_layer = tf.keras.layers.Dense(2, activation="softmax")
The number of units in the classifier's final layer is equal to the number of classes.
After this, you must re-train the model.
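With two softmax units, the loss also has to match the integer labels produced by image_dataset_from_directory; a minimal sketch of the combined change (optimizer settings kept from the question):
prediction_layer = tf.keras.layers.Dense(2, activation="softmax")
outputs = prediction_layer(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])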
I'm implementing my first neural network, an LSTM for binary sentiment-analysis classification. I've pre-processed the data by lowercasing, tokenizing, and removing most punctuation (keeping only .,').
I'm also using GloVe's 100d pre-trained embeddings for this.
The problem is: whatever I do, the accuracy is terrible and doesn't change with epochs (nor when changing the LSTM architecture).
I've tried changing the optimizer and its learning rate, adding more neurons to the LSTM, and changing the number of epochs and batch size.
Nothing seems to work.
def setLSTM(data, stopRem, stemm, lemma, negHand):
    # pre-processing data
    data = pre_processing(data, stopRem, stemm, lemma, negHand)
    print(data[1])
    # splitting data
    X_train, X_test, y_train, y_test = datasplit(data)
    # Setting the words as unique indexes (max 5k unique indexes)
    tokenizer = Tokenizer(num_words=5000)
    tokenizer.fit_on_texts(X_train)
    X_train = tokenizer.texts_to_sequences(X_train)
    X_test = tokenizer.texts_to_sequences(X_test)
    # getting the vocabulary
    vocab = tokenizer.word_index.items()
    print(vocab)
    vocab_size = len(tokenizer.word_index) + 1
    # maxlen corresponds to the maximum tweet length (so that we can pad shorter ones)
    maxlen = max(len(seq) for seq in X_train + X_test)
    print("Maxlen is: ", maxlen)
    # Padding the sequences to guarantee that all tweets have the same length
    X_train = pad_sequences(X_train, padding='post', maxlen=maxlen)
    X_test = pad_sequences(X_test, padding='post', maxlen=maxlen)
    # Create an embedding matrix of zeros (because some of the vocabulary might not exist in the embeddings)
    # and add the embeddings we have; word_index maps word -> index
    embedding_matrix = zeros((vocab_size, 100))
    for word, idx in vocab:
        embedding_vector = embeddings.get(word)
        if embedding_vector is not None:
            embedding_matrix[idx] = embedding_vector
    # creating the model with its layers (embedding layer, LSTM layer, dense layer)
    model = Sequential()
    # The embedding layer has "trainable=False" because we're using pre-trained embeddings
    embedding_layer = Embedding(vocab_size, 100, weights=[embedding_matrix], input_length=maxlen, trainable=False)
    model.add(embedding_layer)
    model.add(Dropout(0.2))
    # Adding an LSTM layer with 100 units
    model.add(LSTM(units=100))
    model.add(Dropout(0.2))
    # Adding a dense layer with sigmoid activation
    model.add(Dense(1, activation='sigmoid'))
    # opt = Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999, amsgrad=False)
    # Compiling the model (loss='binary_crossentropy' because this is a binary classification problem)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
    print(model.summary())
    history = model.fit(X_train, y_train, batch_size=64, epochs=5, verbose=1, validation_split=0.2)
    score = model.evaluate(X_test, y_test, verbose=1)
    print("Test Score:", score[0])
    print("Test Accuracy:", score[1])

setLSTM(tweets, False, False, False, False)
Model: "sequential_9"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_9 (Embedding) (None, 13, 100) 1916600
_________________________________________________________________
dropout_1 (Dropout) (None, 13, 100) 0
_________________________________________________________________
lstm_9 (LSTM) (None, 100) 80400
_________________________________________________________________
dropout_2 (Dropout) (None, 100) 0
_________________________________________________________________
dense_9 (Dense) (None, 1) 101
=================================================================
Total params: 1,997,101
Trainable params: 80,501
Non-trainable params: 1,916,600
_________________________________________________________________
None
Train on 10852 samples, validate on 2713 samples
Epoch 1/5
10852/10852 [==============================] - 5s 448us/step - loss: 0.6920 - acc: 0.5275 - val_loss: 0.6916 - val_acc: 0.5404
Epoch 2/5
10852/10852 [==============================] - 4s 360us/step - loss: 0.6917 - acc: 0.5286 - val_loss: 0.6908 - val_acc: 0.5404
Epoch 3/5
10852/10852 [==============================] - 4s 365us/step - loss: 0.6920 - acc: 0.5286 - val_loss: 0.6907 - val_acc: 0.5404
Epoch 4/5
10852/10852 [==============================] - 4s 382us/step - loss: 0.6916 - acc: 0.5286 - val_loss: 0.6903 - val_acc: 0.5404
Epoch 5/5
10852/10852 [==============================] - 4s 383us/step - loss: 0.6916 - acc: 0.5264 - val_loss: 0.6906 - val_acc: 0.5404
4522/4522 [==============================] - 1s 150us/step
Test Score: 0.6925433831950933
Test Accuracy: 0.5176913142204285
In order to know what a certain filter in a convolutional layer of a convolutional neural network is sensitive to, one can apply gradient-based filter visualization.
The idea is to feed a random image into the network and then find the gradients that maximize the activation of the filter's feature map. Add those gradients to the image and iterate.
Let:
conv_node be a convolutional layer in a neural network
filter_index be the index of the filter we want to visualize
The proposed approach can be found here: https://github.com/penny4860/cnn-visualizer/blob/master/src/utils.py
We can implement the proposed approach in TensorFlow using the following pseudo code:
loss = average(get_feature_map(conv_node, filter_index))
gradients = gradients(loss, input_image)
gradients = normalize(gradients)
Using a TensorFlow session:
gradients_values = session.run(gradients)
random_input_image += gradients_values
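For reference, a minimal eager-TensorFlow sketch of this iteration (the extractor sub-model, input shape, and step size below are assumptions for illustration, not part of the referenced code):

import tensorflow as tf

def gradient_ascent_step(extractor, image, filter_index, step_size=1.0):
    with tf.GradientTape() as tape:
        tape.watch(image)
        feature_map = extractor(image)
        # loss = average activation of the chosen filter's feature map
        loss = tf.reduce_mean(feature_map[..., filter_index])
    gradients = tape.gradient(loss, image)
    # normalize the gradients
    gradients /= tf.sqrt(tf.reduce_mean(tf.square(gradients))) + 1e-8
    return image + step_size * gradients

# extractor maps the input image to conv_node's feature maps:
# extractor = tf.keras.Model(model.input, model.get_layer('conv_node').output)
# image = tf.random.uniform((1, 224, 224, 3))
# for _ in range(30):
#     image = gradient_ascent_step(extractor, image, filter_index)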
My Question is:
How to implement the same procedure using TensorFlow's while_loop API?
A custom callback is a powerful tool to customize the behavior of a TensorFlow model during training, evaluation, or inference.
You can use callbacks for your tasks. Below is an example where we visualize the kernels of model.layers[4] after every epoch and also capture the gradients after every epoch. Similarly, you can create your own custom functions using the built-in methods. You can find more about it here - https://www.tensorflow.org/guide/keras/custom_callback
Note: I was using tensorflow 1.15.0
# Importing dependency
%tensorflow_version 1.x
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D, BatchNormalization
import numpy as np
from matplotlib import pyplot
np.random.seed(1000)
# Get Data
import tflearn.datasets.oxflower17 as oxflower17
x, y = oxflower17.load_data(one_hot=True)
# Create a sequential model
model = Sequential()
# 1st Convolutional Layer
model.add(Conv2D(filters=5, input_shape=(224,224,3), kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(Activation('relu'))
# Pooling
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
# Batch Normalisation before passing it to the next layer
model.add(BatchNormalization())
# 2nd Convolutional Layer
model.add(Conv2D(filters=10, kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(Activation('relu'))
# Pooling
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
# Batch Normalisation
model.add(BatchNormalization())
# 3rd Convolutional Layer
model.add(Conv2D(filters=5, kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(Activation('relu'))
# Pooling
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))
# Batch Normalisation
model.add(BatchNormalization())
# Passing it to a dense layer
model.add(Flatten())
# 1st Dense Layer
model.add(Dense(5, input_shape=(224*224*3,)))
model.add(Activation('relu'))
# Output Layer
model.add(Dense(17))
model.add(Activation('softmax'))
model.summary()
# Compile
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
epoch_gradient = []
epoch_count = 0
def get_gradient_func(model):
    grads = K.gradients(model.total_loss, model.trainable_weights)
    inputs = model._feed_inputs + model._feed_targets + model._feed_sample_weights
    func = K.function(inputs, grads)
    return func
# Define the Required Callback Function
class GradientCalcCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        print("\n", "Calculating Gradient for Epoch ", (epoch+1))
        get_gradient = get_gradient_func(model)
        grads = get_gradient([x, y, np.ones(len(y))])
        epoch_gradient.append(grads)
        # Visualize the Kernels for Layer 5 of the Model
        print("\n", "Visualizing the kernels for Layer 5 of the Model for Epoch ", (epoch+1))
        # retrieve weights from the second hidden layer
        filters, biases = model.layers[4].get_weights()
        # normalize filter values to 0-1 so we can visualize them
        f_min, f_max = filters.min(), filters.max()
        filters = (filters - f_min) / (f_max - f_min)
        # plot all the filters
        # n_filters = outgoing filters
        n_filters, ix = 10, 1
        for i in range(n_filters):
            # get the filter
            f = filters[:, :, :, i]
            # Range of incoming filters
            for j in range(5):
                # specify subplot and turn off axis
                ax = pyplot.subplot(10, 5, ix)
                ax.set_xticks([])
                ax.set_yticks([])
                # plot filter channel in grayscale
                pyplot.imshow(f[:, :, j], cmap='gray')
                ix += 1
        # show the figure
        pyplot.show()
epoch = 4
model.fit(x, y, batch_size=64, epochs= epoch, verbose=1, validation_split=0.2, shuffle=True, callbacks=[GradientCalcCallback()])
# Convert to an array
gradient = np.asarray(epoch_gradient)
print("Shape of the Captured Gradient Array :", gradient.shape)
Output -
Model: "sequential_29"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_87 (Conv2D) (None, 224, 224, 5) 140
_________________________________________________________________
activation_145 (Activation) (None, 224, 224, 5) 0
_________________________________________________________________
max_pooling2d_24 (MaxPooling (None, 112, 112, 5) 0
_________________________________________________________________
batch_normalization_24 (Batc (None, 112, 112, 5) 20
_________________________________________________________________
conv2d_88 (Conv2D) (None, 112, 112, 10) 460
_________________________________________________________________
activation_146 (Activation) (None, 112, 112, 10) 0
_________________________________________________________________
max_pooling2d_25 (MaxPooling (None, 56, 56, 10) 0
_________________________________________________________________
batch_normalization_25 (Batc (None, 56, 56, 10) 40
_________________________________________________________________
conv2d_89 (Conv2D) (None, 56, 56, 5) 455
_________________________________________________________________
activation_147 (Activation) (None, 56, 56, 5) 0
_________________________________________________________________
max_pooling2d_26 (MaxPooling (None, 28, 28, 5) 0
_________________________________________________________________
batch_normalization_26 (Batc (None, 28, 28, 5) 20
_________________________________________________________________
flatten_29 (Flatten) (None, 3920) 0
_________________________________________________________________
dense_58 (Dense) (None, 5) 19605
_________________________________________________________________
activation_148 (Activation) (None, 5) 0
_________________________________________________________________
dense_59 (Dense) (None, 17) 102
_________________________________________________________________
activation_149 (Activation) (None, 17) 0
=================================================================
Total params: 20,842
Trainable params: 20,802
Non-trainable params: 40
_________________________________________________________________
Train on 1088 samples, validate on 272 samples
Epoch 1/4
960/1088 [=========================>....] - ETA: 0s - loss: 2.8102 - acc: 0.1094
Calculating Gradient for Epoch 1
Visualizing the kernels for Layer 5 of the Model for Epoch 1
1088/1088 [==============================] - 9s 8ms/sample - loss: 2.7977 - acc: 0.1121 - val_loss: 2.8206 - val_acc: 0.1250
Epoch 2/4
960/1088 [=========================>....] - ETA: 0s - loss: 2.5060 - acc: 0.1979
Calculating Gradient for Epoch 2
Visualizing the kernels for Layer 5 of the Model for Epoch 2
1088/1088 [==============================] - 6s 5ms/sample - loss: 2.5227 - acc: 0.1921 - val_loss: 2.8027 - val_acc: 0.1140
Epoch 3/4
960/1088 [=========================>....] - ETA: 0s - loss: 2.3459 - acc: 0.2583
Calculating Gradient for Epoch 3
Visualizing the kernels for Layer 5 of the Model for Epoch 3
1088/1088 [==============================] - 5s 5ms/sample - loss: 2.3493 - acc: 0.2592 - val_loss: 2.7985 - val_acc: 0.0956
Epoch 4/4
960/1088 [=========================>....] - ETA: 0s - loss: 2.1954 - acc: 0.3063
Calculating Gradient for Epoch 4
Visualizing the kernels for Layer 5 of the Model for Epoch 4
1088/1088 [==============================] - 6s 5ms/sample - loss: 2.1978 - acc: 0.3006 - val_loss: 2.8202 - val_acc: 0.0551
Shape of the Captured Gradient Array : (4, 16)
I'm trying to train an autoencoder in Keras for signal processing, but I'm somehow failing.
My inputs are segments of 128 frames for 6 measurements (acceleration_x/y/z, gyro_x/y/z), so the overall shape of my dataset is (22836, 128, 6), where 22836 is the number of samples.
This is the sample code I'm using for the autoencoder:
X_train, X_test, Y_train, Y_test = load_dataset()
# reshape the input, whose size is (22836, 128, 6)
X_train = X_train.reshape(X_train.shape[0], np.prod(X_train.shape[1:]))
X_test = X_test.reshape(X_test.shape[0], np.prod(X_test.shape[1:]))
# now the shape will be (22836, 768)
### MODEL ###
input_shape = [X_train.shape[1]]
X_input = Input(input_shape)
x = Dense(1000, activation='sigmoid', name='enc0')(X_input)
encoded = Dense(350, activation='sigmoid', name='enc1')(x)
x = Dense(1000, activation='sigmoid', name='dec0')(encoded)
decoded = Dense(input_shape[0], activation='sigmoid', name='dec1')(x)
model = Model(inputs=X_input, outputs=decoded, name='autoencoder')
model.compile(optimizer='rmsprop', loss='mean_squared_error')
print(model.summary())
The output of model.summary() is
Model summary
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_55 (InputLayer) (None, 768) 0
_________________________________________________________________
enc0 (Dense) (None, 1000) 769000
_________________________________________________________________
enc1 (Dense) (None, 350) 350350
_________________________________________________________________
dec0 (Dense) (None, 1000) 351000
_________________________________________________________________
dec1 (Dense) (None, 768) 768768
=================================================================
Total params: 2,239,118
Trainable params: 2,239,118
Non-trainable params: 0
The training is done via
# train the model
history = model.fit(x=X_train, y=X_train,
                    epochs=5,
                    batch_size=32,
                    validation_data=(X_test, X_test))
where I'm simply trying to learn the identity function which yields:
Train on 22836 samples, validate on 5709 samples
Epoch 1/5
22836/22836 [==============================] - 27s 1ms/step - loss: 0.9481 - val_loss: 0.8862
Epoch 2/5
22836/22836 [==============================] - 24s 1ms/step - loss: 0.8669 - val_loss: 0.8358
Epoch 3/5
22836/22836 [==============================] - 25s 1ms/step - loss: 0.8337 - val_loss: 0.8146
Epoch 4/5
22836/22836 [==============================] - 25s 1ms/step - loss: 0.8164 - val_loss: 0.7960
Epoch 5/5
22836/22836 [==============================] - 25s 1ms/step - loss: 0.8004 - val_loss: 0.7819
At this point, to try to understand how well it performed, I check the plot of some true inputs vs the predicted ones:
prediction = model.predict(X_test)
for i in np.random.randint(0, 100, 7):
pred = prediction[i, :].reshape(128,6)
# getting only values for acceleration_x
pred = pred[:, 0]
true = X_test[i, :].reshape(128,6)
# getting only values for acceleration_x
true = true[:, 0]
# plot original and reconstructed
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(20, 6))
ax1.plot(true, color='green')
ax2.plot(pred, color='red')
and these are some of the plots which appear to be completely wrong:
Do you have any suggestions on what's wrong, aside from the small number of epochs (which actually does not seem to make any difference)?
Your data is not in the range [0, 1], so why do you use sigmoid as the activation function in the last layer? Remove the activation function from the last layer (and it might be better to use relu in the previous layers).
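Applied to the model in the question, that suggestion would look something like this (a sketch keeping the original layer sizes):
x = Dense(1000, activation='relu', name='enc0')(X_input)
encoded = Dense(350, activation='relu', name='enc1')(x)
x = Dense(1000, activation='relu', name='dec0')(encoded)
decoded = Dense(input_shape[0], name='dec1')(x)  # linear output, no activation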
Also normalize the training data. You can use feature-wise normalization:
X_mean = X_train.mean(axis=0)
X_train -= X_mean
X_std = X_train.std(axis=0)
X_train /= X_std + 1e-8
And don't forget to use the computed statistics (X_mean and X_std) in inference time (i.e. testing) to normalize test data.
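For example:
# apply the training-set statistics to the test data
X_test -= X_mean
X_test /= X_std + 1e-8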
I'm a beginner with deep learning and Keras/TensorFlow.
I have followed the first tutorial on tensorflow.org: a basic classification with Fashion MNIST.
In this case the input data are 60000 28x28 images, and the model is this:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
Compiled with:
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
At the end of training the model has this accuracy:
10000/10000 [==============================] - 0s 21us/step
Test accuracy: 0.8769
It's ok.
Now I'm trying to duplicate this model with another set of data. The new input is a dataset downloaded from Kaggle.
The dataset has differently sized images of dogs and cats, so I have created a simple script that loads the images, resizes them to 28x28 pixels, and converts them to a numpy array.
This is the code to do this:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from tensorflow.keras.models import load_model
from PIL import Image
import os
# Helper libraries
import numpy as np
# base path dataset
base_path = './dataset/'
training_path = base_path + "training_set/"
test_path = base_path + "test_set/"
# target size of the images
size = 28, 28
train_images = []
train_labels = []
test_images = []
test_labels = []
classes = ['dogs', 'cats']
# Iterate over the folders in the path and convert the images to numpy arrays
def from_files_to_nparray(path):
    images = []
    labels = []
    for subfolder in os.listdir(path):
        if subfolder == '.DS_Store':
            continue
        for image_name in os.listdir(path + subfolder):
            if not image_name.endswith('.jpg'):
                continue
            img = Image.open(path + subfolder + "/" + image_name).convert("L").resize(size)  # convert to grayscale and resize
            npimage = np.asarray(img)
            images.append(npimage)
            labels.append(classes.index(subfolder))
            img.close()
    # convert to np arrays
    images = np.asarray(images)
    labels = np.asarray(labels)
    # Normalize to [0, 1]
    images = images / 255.0
    return (images, labels)
(train_images, train_labels) = from_files_to_nparray(training_path)
(test_images, test_labels) = from_files_to_nparray(test_path)
At the end I have these shapes:
Train images shape : (8000, 28, 28)
Train labels shape : (8000,)
Test images shape : (2000, 28, 28)
Test labels shape : (2000,)
After training the same model (but with the last dense layer formed by 2 neurons) I have this result, which should be ok:
Train images shape : (8000, 28, 28)
Labels images shape : (8000,)
Test images shape : (2000, 28, 28)
Test images shape : (2000,)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
_________________________________________________________________
dense (Dense) (None, 128) 100480
_________________________________________________________________
dense_1 (Dense) (None, 2) 258
=================================================================
Total params: 100,738
Trainable params: 100,738
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/5
2018-07-27 15:25:51.283117: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
8000/8000 [==============================] - 1s 66us/step - loss: 0.6924 - acc: 0.5466
Epoch 2/5
8000/8000 [==============================] - 0s 39us/step - loss: 0.6679 - acc: 0.5822
Epoch 3/5
8000/8000 [==============================] - 0s 41us/step - loss: 0.6593 - acc: 0.6048
Epoch 4/5
8000/8000 [==============================] - 0s 39us/step - loss: 0.6545 - acc: 0.6134
Epoch 5/5
8000/8000 [==============================] - 0s 39us/step - loss: 0.6559 - acc: 0.6039
2000/2000 [==============================] - 0s 33us/step
Test accuracy: 0.592
Now, the question is: if I try to change the input size from 28x28 to, for example, 128x128, the result is this:
Train images shape : (8000, 128, 128)
Labels images shape : (8000,)
Test images shape : (2000, 128, 128)
Test images shape : (2000,)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 16384) 0
_________________________________________________________________
dense (Dense) (None, 128) 2097280
_________________________________________________________________
dense_1 (Dense) (None, 2) 258
=================================================================
Total params: 2,097,538
Trainable params: 2,097,538
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/5
2018-07-27 15:27:41.966860: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
8000/8000 [==============================] - 4s 483us/step - loss: 8.0341 - acc: 0.4993
Epoch 2/5
8000/8000 [==============================] - 3s 362us/step - loss: 8.0590 - acc: 0.5000
Epoch 3/5
8000/8000 [==============================] - 3s 351us/step - loss: 8.0590 - acc: 0.5000
Epoch 4/5
8000/8000 [==============================] - 3s 342us/step - loss: 8.0590 - acc: 0.5000
Epoch 5/5
8000/8000 [==============================] - 3s 342us/step - loss: 8.0590 - acc: 0.5000
2000/2000 [==============================] - 0s 217us/step
Test accuracy: 0.5
Why? Even adding a new dense layer or increasing the number of neurons gives the same result.
What is the connection between the input size and the model layers? Thanks!
The problem is that you have more parameters to train in the second example. In the first example you have just 100k parameters, and you train them with 8k images.
In the second example you have about 2.1M parameters, and you try to train them with the same number of images. This does not work, because there is a relation between the number of free parameters and the number of samples. There is no exact formula for this relation, but a rule of thumb is that you should have more samples than trainable parameters.
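For reference, the parameter counts in the two summaries follow directly from the flattened input size:
# Dense layer parameters = inputs * units + units (biases)
28 * 28 * 128 + 128      # = 100,480   for the 28x28 input
128 * 128 * 128 + 128    # = 2,097,280 for the 128x128 input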
What you can try is to train for more epochs and see how it works, but in general you need more data for more complex models.