Keras: How to connect a CNN model with a decision tree - python

I want to train a model to predict a person's emotion from physical signals. I have one physical signal that I am using as the input feature:
ECG (electrocardiography)
I want to use a CNN architecture to extract features from the data, and then feed these extracted features to a classical decision tree classifier. Below you can see my CNN approach without the decision tree:
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Dropout, BatchNormalization, Flatten, Dense
from keras import regularizers

model = Sequential()
model.add(Conv1D(15, 60, padding='valid', activation='relu', input_shape=(18000, 1), strides=1, kernel_regularizer=regularizers.l1_l2(l1=0.1, l2=0.1)))
model.add(MaxPooling1D(2, data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Conv1D(30, 60, padding='valid', activation='relu', kernel_regularizer=regularizers.l1_l2(l1=0.1, l2=0.1), strides=1))
model.add(MaxPooling1D(4, data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(3, activation='softmax'))
I want to edit this code so that, instead of model.add(Dense(3, activation='softmax')) in the output layer, there is a working decision tree. I have tried to save the output of the last convolutional layer like this:
output = model.layers[-6].output
When I printed the output variable, the result was this:
THE OUTPUT: Tensor("conv1d_56/Relu:0", shape=(?, 8971, 30), dtype=float32)
I guess the output variable holds the extracted features. Now, how can I feed my decision tree classifier with the data stored in the output variable? Here is the decision tree from scikit-learn:
from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier(criterion = 'entropy')
dtc.fit()
What should I feed to the fit() method? Thanks in advance.

To extract a vector of features that you can pass on to another algorithm, you need a fully connected layer before your softmax layer. Something like this will add a 128-dimensional layer just before your softmax layer:
model = Sequential()
model.add(Conv1D(15,60,padding='valid', activation='relu',input_shape=(18000,1), strides = 1, kernel_regularizer=regularizers.l1_l2(l1=0.1, l2=0.1)))
model.add(MaxPooling1D(2,data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Conv1D(30, 60, padding='valid', activation='relu',kernel_regularizer = regularizers.l1_l2(l1=0.1, l2=0.1), strides=1))
model.add(MaxPooling1D(4,data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3, activation = 'softmax'))
If you then run model.summary() you can see the name of the layers:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_9 (Conv1D) (None, 17941, 15) 915
_________________________________________________________________
max_pooling1d_9 (MaxPooling1 (None, 8970, 15) 0
_________________________________________________________________
dropout_10 (Dropout) (None, 8970, 15) 0
_________________________________________________________________
batch_normalization_9 (Batch (None, 8970, 15) 60
_________________________________________________________________
conv1d_10 (Conv1D) (None, 8911, 30) 27030
_________________________________________________________________
max_pooling1d_10 (MaxPooling (None, 2227, 30) 0
_________________________________________________________________
dropout_11 (Dropout) (None, 2227, 30) 0
_________________________________________________________________
batch_normalization_10 (Batc (None, 2227, 30) 120
_________________________________________________________________
flatten_6 (Flatten) (None, 66810) 0
_________________________________________________________________
dense_7 (Dense) (None, 128) 8551808
_________________________________________________________________
dropout_12 (Dropout) (None, 128) 0
_________________________________________________________________
dense_8 (Dense) (None, 3) 387
=================================================================
Total params: 8,580,320
Trainable params: 8,580,230
Non-trainable params: 90
_________________________________________________________________
Once your network has been trained, you can create a new model whose output layer is 'dense_7'; it will generate 128-dimensional feature vectors:
from keras.models import Model

feature_vectors_model = Model(model.input, model.get_layer('dense_7').output)
dtc_features = feature_vectors_model.predict(your_X_data)  # fit your decision tree on this data
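Continuing from the snippet above, a minimal sketch of fitting the scikit-learn tree on those features (your_y_labels and your_test_data are hypothetical placeholders for your label array and test signals):
from sklearn.tree import DecisionTreeClassifier

dtc = DecisionTreeClassifier(criterion='entropy')
dtc.fit(dtc_features, your_y_labels)  # 1-D class labels; take argmax first if they are one-hot
test_features = feature_vectors_model.predict(your_test_data)
tree_predictions = dtc.predict(test_features)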

Related

Keras dense input size error, flatten returns (None, None)

I'm trying to implement this model to generate MIDI music, but I'm getting an error:
The last dimension of the inputs to `Dense` should be defined. Found `None`.
Here's my code:
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, Dropout, Flatten, Dense
from keras_self_attention import SeqSelfAttention

model = Sequential()
model.add(Bidirectional(LSTM(512, return_sequences=True), input_shape=(network_input.shape[1], network_input.shape[2])))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
model.add(Dropout(0.3))
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.3))
model.add(Flatten())
model.summary()
model.add(Dense(note_variants_count))
model.compile(loss='categorical_crossentropy', optimizer='adam')
And here's the summary before the dense layer
Model: "sequential_17"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bidirectional_19 (Bidirectio (None, 100, 1024) 2105344
_________________________________________________________________
seq_self_attention_20 (SeqSe (None, None, 1024) 65601
_________________________________________________________________
dropout_38 (Dropout) (None, None, 1024) 0
_________________________________________________________________
lstm_40 (LSTM) (None, None, 512) 3147776
_________________________________________________________________
dropout_39 (Dropout) (None, None, 512) 0
_________________________________________________________________
flatten_14 (Flatten) (None, None) 0
=================================================================
Total params: 5,318,721
Trainable params: 5,318,721
Non-trainable params: 0
_________________________________________________________________
I think that the Flatten layer is causing the problem but I have no idea why it's returning a (None, None) shape.
You have to provide all dimensions to the Flatten layer except for the batch dimension.
Using RNNs and return_sequences
For RNNs like LSTM, there is an option to return either the whole sequence or just the final output. In this case you want just the final output. Change just this one line:
model.add(Dropout(0.3))
model.add(LSTM(512, return_sequences=False)) # <== change to False
model.add(Dropout(0.3))
model.add(Flatten())
model.summary()
The summary then shows the following network shape:
_________________________________________________________________
lstm_4 (LSTM) (None, 512) 3147776
_________________________________________________________________
dropout_3 (Dropout) (None, 512) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 512) 0
=================================================================
Notice that the rank of the output tensor drops from rank 3 to rank 2. This is because the layer now returns just that, the final output, and not the whole sequence with all of its hidden states.
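As a tiny standalone illustration of that rank difference (toy layer sizes, not the asker's model):
from keras.models import Sequential
from keras.layers import LSTM

seq = Sequential([LSTM(8, return_sequences=True, input_shape=(100, 4))])
last = Sequential([LSTM(8, return_sequences=False, input_shape=(100, 4))])
print(seq.output_shape)   # (None, 100, 8) -- the whole sequence, rank 3
print(last.output_shape)  # (None, 8)      -- only the final output, rank 2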

How many bidirectional LSTM layers should I use, and how many is too many? Any advice on a very imbalanced dataset?

I would much appreciate your help.
I am new to RNNs and am trying to implement an RNN architecture to classify protein sequences; essentially they are one-hot encoded NumPy arrays.
My issue is that the data is very imbalanced:
Examples:
Total: 34909
Positive: 282 (0.81% of total)
Therefore I am planning to weight the different classes by passing the class_weight=class_weight parameter when the model is fitted, as in the sketch below.
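A minimal sketch of what passing those weights could look like (the counts come from the figures above; the inverse-frequency weighting follows the TensorFlow imbalanced-data tutorial linked at the end of this question, and model / X_train / y_train are placeholders):
# Inverse-frequency class weights computed from the counts above (binary labels assumed).
total, pos = 34909, 282
neg = total - pos
class_weight = {0: total / (2.0 * neg), 1: total / (2.0 * pos)}
model.fit(X_train, y_train, epochs=20, class_weight=class_weight)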
I am also planning to use the F1 score on the validation set as a metric, instead of accuracy or loss, since I am not interested in the true negatives.
Moreover, I am planning to use transfer learning: I have datasets with more positive data and datasets with only a few points, so I plan to pretrain a general model and use its weights to train further on the specific problem.
I have come up with this model architecture, but I am not sure whether adding 4 bidirectional LSTM layers is a wise choice:
from keras import regularizers
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, Dropout, Dense
from keras.initializers import Constant
from keras.optimizers import Adam
from keras.losses import BinaryCrossentropy

if output_bias is not None:
    output_bias = Constant(output_bias)

model = Sequential()
# First LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True, recurrent_dropout=0.1), input_shape=(timesteps, features)))
model.add(Dropout(0.5))
# Second LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True)))
model.add(Dropout(0.5))
# Third LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True)))
model.add(Dropout(0.5))
# Fourth LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=False)))
model.add(Dropout(0.5))
# First Dense layer
model.add(Dense(units=128, kernel_initializer='he_normal', activation='relu'))
model.add(Dropout(0.5))
# Adding the output layer
if output_bias is None:
    model.add(Dense(units=1, activation='sigmoid', kernel_regularizer=regularizers.l2(0.001)))
else:
    model.add(Dense(units=1, activation='sigmoid',
                    bias_initializer=output_bias, kernel_regularizer=regularizers.l2(0.001)))
model.compile(optimizer=Adam(lr=1e-3), loss=BinaryCrossentropy(), metrics=metrics)
model.build()
How do I know how many LSTM layers I should add? Is it just trial and error?
Is there anything else I should include in the layers?
model.summary():
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bidirectional_13 (Bidirectio (None, 5, 100) 28400
_________________________________________________________________
dropout_16 (Dropout) (None, 5, 100) 0
_________________________________________________________________
bidirectional_14 (Bidirectio (None, 5, 100) 60400
_________________________________________________________________
dropout_17 (Dropout) (None, 5, 100) 0
_________________________________________________________________
bidirectional_15 (Bidirectio (None, 5, 100) 60400
_________________________________________________________________
dropout_18 (Dropout) (None, 5, 100) 0
_________________________________________________________________
bidirectional_16 (Bidirectio (None, 100) 60400
_________________________________________________________________
dropout_19 (Dropout) (None, 100) 0
_________________________________________________________________
dense_7 (Dense) (None, 128) 12928
_________________________________________________________________
dropout_20 (Dropout) (None, 128) 0
_________________________________________________________________
dense_8 (Dense) (None, 1) 129
=================================================================
Total params: 222,657
Trainable params: 222,657
Non-trainable params: 0
I have built this model by going through multiple tutorials such as https://www.tensorflow.org/tutorials/text/text_classification_rnn
and https://www.tensorflow.org/tutorials/structured_data/imbalanced_data
I would appreciate it if you could point me in the right direction.
Thanks!

Tensorflow: Prediction of float value from image always returns 0

I have a model as follows:
from tensorflow import keras
from tensorflow.keras.layers import Dense, Conv2D, Flatten, MaxPooling2D
model = keras.Sequential([
    Conv2D(16, (3, 3), padding='same', activation='relu', input_shape=(480, 640, 3), data_format="channels_last"),
    MaxPooling2D(),
    Conv2D(32, (3, 3), padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, (3, 3), padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(480, activation='relu'),
    Dense(1, activation="relu")
])
model.compile(optimizer='adam',
              loss=keras.losses.MeanSquaredError(),
              metrics=['accuracy'])
epochs = 3
model.fit(
    x=train_images,
    y=train_values,
    epochs=epochs
)
The variable train_images is an array of PNG images (640x480 pixels) and train_values is an array of floats (e.g. [1.11842, -17.894, 2.03, ...]).
My goal is to predict the float value (or at least to find some approximate value), so I suppose that MSE should be the loss function in this case.
However, after training the model, I only get zeros, not only with model.predict(test_images) but also with model.predict(train_images).
Note: my training set contains only 37 images and my test set contains 14. I know that the size is ridiculous, but this script is just a proof of concept for something bigger.
If it helps, here is the result of model.summary():
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 480, 640, 16) 448
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 240, 320, 16) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 240, 320, 32) 4640
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 120, 160, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 120, 160, 64) 18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 60, 80, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 307200) 0
_________________________________________________________________
dense (Dense) (None, 480) 147456480
_________________________________________________________________
dense_1 (Dense) (None, 1) 481
=================================================================
Total params: 147,480,545
Trainable params: 147,480,545
Non-trainable params: 0
To start with, change your output activation function: ReLU clamps everything below 0 to 0, which is not desirable here since your target values can be negative. Use a linear activation on the final layer instead.
Secondly, normalize your y values. As it stands, they could be anywhere from -inf to +inf; range-normalize them and store the normalization parameters. At prediction time you can always invert the normalization to get the actual values back.
Also increase the number of epochs. With such a small training set, I suggest trying to overfit the network; if a network can overfit, it will most likely also train well.
For now, try these suggestions out; I think normalization is the most important one.
Also: I suggest making the network much deeper. You need to extract the shapes and textures in the image, and your network might not be deep enough (or, for that matter, dense enough) to do that. I suggest using Keras to load a pretrained model such as VGG16, stripping the head off, adding regression layers, and transfer-learning it onto your dataset. That could work better.
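Putting these suggestions together, a rough sketch (not a drop-in solution) using tensorflow.keras: the 128-unit head, the epoch count, and the min-max normalization scheme are illustrative choices, and train_images / train_values / test_images are the asker's arrays.
from tensorflow import keras
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

# Range-normalize the targets and keep the parameters so predictions
# can be mapped back to real values later.
y_min, y_max = train_values.min(), train_values.max()
norm_values = (train_values - y_min) / (y_max - y_min)

# Pretrained convolutional base with the classification head stripped off.
base = VGG16(weights='imagenet', include_top=False, input_shape=(480, 640, 3))
base.trainable = False  # freeze the base for transfer learning

model = keras.Sequential([
    base,
    GlobalAveragePooling2D(),
    Dense(128, activation='relu'),
    Dense(1, activation='linear'),  # linear output, so negative targets are reachable
])
model.compile(optimizer='adam', loss='mse')
model.fit(train_images, norm_values, epochs=50)

# Undo the normalization to recover values on the original scale.
preds = model.predict(test_images) * (y_max - y_min) + y_min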

Shapes error in Convolutional Neural Network

I'm trying to train a neural network with the following structure:
model = Sequential()
model.add(Conv1D(filters = 300, kernel_size = 5, activation='relu', input_shape=(4000, 1)))
model.add(Conv1D(filters = 300, kernel_size = 5, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(filters = 320, kernel_size = 5, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Dropout(0.5))
model.add(Dense(num_labels, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
return model
And I'm getting this error:
expected dense_1 to have shape (442, 3) but got array with shape (3, 1)
My input is a set of phrases (12501 in total) that have been tokenized against the 4000 most relevant words, and there are 3 possible classes. So train_x.shape = (12501, 4000), which I reshaped to (12501, 4000, 1) for the Conv1D layer. My train_y.shape = (12501, 3), which I reshaped to (12501, 3, 1).
I'm using the fit function as follows:
model.fit(train_x, train_y, batch_size=32, epochs=10, verbose=1, validation_split=0.2, shuffle=True)
What am I doing wrong?
There's no need to reshape the labels for classification. Take a look at your network structure:
print(model.summary())
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_1 (Conv1D) (None, 3996, 300) 1800
_________________________________________________________________
conv1d_2 (Conv1D) (None, 3992, 300) 450300
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 1330, 300) 0
_________________________________________________________________
conv1d_3 (Conv1D) (None, 1326, 320) 480320
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 442, 320) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 442, 320) 0
_________________________________________________________________
dense_1 (Dense) (None, 442, 3) 963
=================================================================
Total params: 933,383
Trainable params: 933,383
Non-trainable params: 0
_________________________________________________________________
The last output of the model is (None, 442, 3), but the shape of your labels is (None, 3, 1). Your model should eventually end in either a global pooling layer (GlobalMaxPooling1D()) or a Flatten layer (Flatten()) to turn the 3D outputs into 2D outputs for classification or regression.
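A sketch of the corrected network under that suggestion (same layers as in the question, with a GlobalMaxPooling1D() added before the output; note that train_y then keeps its original (12501, 3) shape, with no reshape needed):
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Dropout, Dense, GlobalMaxPooling1D

model = Sequential()
model.add(Conv1D(filters=300, kernel_size=5, activation='relu', input_shape=(4000, 1)))
model.add(Conv1D(filters=300, kernel_size=5, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(filters=320, kernel_size=5, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Dropout(0.5))
model.add(GlobalMaxPooling1D())            # (None, 442, 320) -> (None, 320)
model.add(Dense(3, activation='softmax'))  # output (None, 3) now matches train_y
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])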

Keras 2D input to 2D output

First, I have read this and this question, which have similar names to mine, and still do not have an answer.
I want to build a feedforward network for sequence prediction. (I realize that RNNs are more suitable for this task, but I have my reasons). The sequences are of length 128 and each element is a vector with 2 entries, so each batch should be of shape (batch_size, 128, 2) and the target is the next step in the sequence, so the target tensor should be of shape (batch_size, 1, 2).
The network architecture is something like this:
model = Sequential()
model.add(Dense(50, batch_input_shape=(None, 128, 2), kernel_initializer="he_normal" ,activation="relu"))
model.add(Dense(20, kernel_initializer="he_normal", activation="relu"))
model.add(Dense(5, kernel_initializer="he_normal", activation="relu"))
model.add(Dense(2))
But trying to train I get the following error:
ValueError: Error when checking target: expected dense_4 to have shape (128, 2) but got array with shape (1, 2)
I've tried variations like:
model.add(Dense(50, input_shape=(128, 2), kernel_initializer="he_normal" ,activation="relu"))
but get the same error.
If you take a look at the model.summary() output, you will see what the issue is:
Layer (type) Output Shape Param #
=================================================================
dense_13 (Dense) (None, 128, 50) 150
_________________________________________________________________
dense_14 (Dense) (None, 128, 20) 1020
_________________________________________________________________
dense_15 (Dense) (None, 128, 5) 105
_________________________________________________________________
dense_16 (Dense) (None, 128, 2) 12
=================================================================
Total params: 1,287
Trainable params: 1,287
Non-trainable params: 0
_________________________________________________________________
As you can see, the output of the model is (None, 128, 2) and not (None, 1, 2) (or (None, 2)) as you expected. You may or may not know that a Dense layer is applied to the last axis of its input array; as a result, as you see above, the time axis and its dimension are preserved until the end.
How to resolve this? You mentioned you don't want to use an RNN layer, so you have two options: either use a Flatten layer somewhere in the model, or use some Conv1D + Pooling1D layers, or even a GlobalPooling layer. For example (these are just for demonstration; you may do it differently):
Using a Flatten layer
model = models.Sequential()
model.add(Dense(50, batch_input_shape=(None, 128, 2), kernel_initializer="he_normal" ,activation="relu"))
model.add(Dense(20, kernel_initializer="he_normal", activation="relu"))
model.add(Dense(5, kernel_initializer="he_normal", activation="relu"))
model.add(Flatten())
model.add(Dense(2))
model.summary()
Model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_17 (Dense) (None, 128, 50) 150
_________________________________________________________________
dense_18 (Dense) (None, 128, 20) 1020
_________________________________________________________________
dense_19 (Dense) (None, 128, 5) 105
_________________________________________________________________
flatten_1 (Flatten) (None, 640) 0
_________________________________________________________________
dense_20 (Dense) (None, 2) 1282
=================================================================
Total params: 2,557
Trainable params: 2,557
Non-trainable params: 0
_________________________________________________________________
Using a GlobalAveragePooling1D layer
model = models.Sequential()
model.add(Dense(50, batch_input_shape=(None, 128, 2), kernel_initializer="he_normal" ,activation="relu"))
model.add(Dense(20, kernel_initializer="he_normal", activation="relu"))
model.add(GlobalAveragePooling1D())
model.add(Dense(5, kernel_initializer="he_normal", activation="relu"))
model.add(Dense(2))
model.summary()
Model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_21 (Dense) (None, 128, 50) 150
_________________________________________________________________
dense_22 (Dense) (None, 128, 20) 1020
_________________________________________________________________
global_average_pooling1d_2 ( (None, 20) 0
_________________________________________________________________
dense_23 (Dense) (None, 5) 105
_________________________________________________________________
dense_24 (Dense) (None, 2) 12
=================================================================
Total params: 1,287
Trainable params: 1,287
Non-trainable params: 0
_________________________________________________________________
Note that in both cases above you need to reshape the labels (i.e. targets) array to (n_samples, 2) (or you may want to use a Reshape layer at the end).
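For example, if the targets currently have shape (n_samples, 1, 2), a simple reshape brings them in line with the (None, 2) model output (X and y here stand for your own arrays):
y = y.reshape(-1, 2)  # (n_samples, 1, 2) -> (n_samples, 2)
model.fit(X, y, epochs=10, batch_size=32)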
