I am having difficulty using the Keras Embedding layer with one-hot encoding of my input data.
The following is a toy example.
Import packages
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Embedding
from keras.optimizers import Adam
import matplotlib.pyplot as plt
import numpy as np
import openpyxl
import pandas as pd
from keras.callbacks import ModelCheckpoint
from keras.callbacks import ReduceLROnPlateau
The input data is text-based (SMILES strings), as follows.
Train and Test data
X_train_orignal= np.array(['OC(=O)C1=C(Cl)C=CC=C1Cl', 'OC(=O)C1=C(Cl)C=C(Cl)C=C1Cl',
'OC(=O)C1=CC=CC(=C1Cl)Cl', 'OC(=O)C1=CC(=CC=C1Cl)Cl',
'OC1=C(C=C(C=C1)[N+]([O-])=O)[N+]([O-])=O'])
X_test_orignal=np.array(['OC(=O)C1=CC=C(Cl)C=C1Cl', 'CCOC(N)=O',
'OC1=C(Cl)C(=C(Cl)C=C1Cl)Cl'])
Y_train=np.array(([[2.33],
[2.59],
[2.59],
[2.54],
[4.06]]))
Y_test=np.array([[2.20],
[2.81],
[2.00]])
Creating dictionaries
Now I create two dictionaries: character-to-index and index-to-character. The number of unique characters is len(charset), and the maximum string length plus 5 additional positions is stored in embed. The start of each string is marked with ! and the end is padded with E.
charset = set("".join(list(X_train_orignal))+"!E")
char_to_int = dict((c,i) for i,c in enumerate(charset))
int_to_char = dict((i,c) for i,c in enumerate(charset))
embed = max([len(smile) for smile in X_train_orignal]) + 5
print (str(charset))
print(len(charset), embed)
One hot encoding
I convert the train and test data to one-hot encoding as follows.
def vectorize(smiles):
    one_hot = np.zeros((smiles.shape[0], embed, len(charset)), dtype=np.int8)
    for i, smile in enumerate(smiles):
        #encode the startchar
        one_hot[i, 0, char_to_int["!"]] = 1
        #encode the rest of the chars
        for j, c in enumerate(smile):
            one_hot[i, j+1, char_to_int[c]] = 1
        #Encode endchar
        one_hot[i, len(smile)+1:, char_to_int["E"]] = 1
    return one_hot[:, 0:-1, :]
X_train = vectorize(X_train_orignal)
print(X_train.shape)
X_test = vectorize(X_test_orignal)
print(X_test.shape)
After conversion to one-hot encoding, the data has shape (5, 44, 14) for train and (3, 44, 14) for test. For train, there are 5 examples, 44 is the maximum sequence length, and 14 is the number of unique characters. Examples with fewer characters are padded with E up to the maximum length.
Verifying the correct padding
The following code verifies that the padding was done correctly.
mol_str_train=[]
mol_str_test=[]
for x in range(5):
    mol_str_train.append("".join([int_to_char[idx] for idx in np.argmax(X_train[x,:,:], axis=1)]))
for x in range(3):
    mol_str_test.append("".join([int_to_char[idx] for idx in np.argmax(X_test[x,:,:], axis=1)]))
And let's see what the train set looks like:
mol_str_train
['!OC(=O)C1=C(Cl)C=CC=C1ClEEEEEEEEEEEEEEEEEEEE',
'!OC(=O)C1=C(Cl)C=C(Cl)C=C1ClEEEEEEEEEEEEEEEE',
'!OC(=O)C1=CC=CC(=C1Cl)ClEEEEEEEEEEEEEEEEEEEE',
'!OC(=O)C1=CC(=CC=C1Cl)ClEEEEEEEEEEEEEEEEEEEE',
'!OC1=C(C=C(C=C1)[N+]([O-])=O)[N+]([O-])=OEEE']
Now it is time to build the model.
Model
model = Sequential()
model.add(Embedding(len(charset), 10, input_length=embed))
model.add(Flatten())
model.add(Dense(1, activation='linear'))
def coeff_determination(y_true, y_pred):
    from keras import backend as K
    SS_res = K.sum(K.square( y_true-y_pred ))
    SS_tot = K.sum(K.square( y_true - K.mean(y_true) ) )
    return ( 1 - SS_res/(SS_tot + K.epsilon()) )
def get_lr_metric(optimizer):
    def lr(y_true, y_pred):
        return optimizer.lr
    return lr
optimizer = Adam(lr=0.00025)
lr_metric = get_lr_metric(optimizer)
model.compile(loss="mse", optimizer=optimizer, metrics=[coeff_determination, lr_metric])
callbacks_list = [
ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-15, verbose=1, mode='auto',cooldown=0),
ModelCheckpoint(filepath="weights.best.hdf5", monitor='val_loss', save_best_only=True, verbose=1, mode='auto')]
history =model.fit(x=X_train, y=Y_train,
batch_size=1,
epochs=10,
validation_data=(X_test,Y_test),
callbacks=callbacks_list)
Error
ValueError: Error when checking input: expected embedding_3_input to have 2 dimensions, but got array with shape (5, 44, 14)
The Embedding layer expects a two-dimensional array. How can I deal with this so that it accepts the one-hot encoded data?
All of the code above can be run.
The Keras embedding layer works with indices, not directly with one-hot encodings.
So you don't need shape (5, 44, 14); (5, 44) works fine.
For example, get the indices with argmax:
X_test = np.argmax(X_test, axis=2)
X_train = np.argmax(X_train, axis=2)
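The reason argmax is enough: an embedding lookup W[i] returns the same vector as multiplying the one-hot row for index i by the embedding matrix W, so the integer indices carry exactly the same information as the one-hot array. A minimal NumPy sketch of that equivalence, with a random matrix W standing in for the layer's weights (sizes follow the question: 14 characters, output_dim 10, 44 time steps):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, seq_len = 14, 10, 44  # len(charset), Embedding output_dim, sequence length

W = rng.normal(size=(vocab_size, embed_dim))               # stand-in embedding matrix (random, untrained)
indices = rng.integers(0, vocab_size, size=(5, seq_len))   # integer-encoded batch, shape (5, 44)
one_hot = np.eye(vocab_size)[indices]                      # one-hot version, shape (5, 44, 14)

lookup = W[indices]           # what an Embedding layer computes: row lookup, (5, 44, 10)
matmul = one_hot @ W          # one-hot rows times W, also (5, 44, 10)
recovered = np.argmax(one_hot, axis=2)  # back to the (5, 44) indices
```

The same equivalence means a bias-free Dense layer applied to the one-hot array would compute the same kind of per-character vectors; the index form simply avoids storing and multiplying all the zeros.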
That said, it's probably better not to one-hot encode it in the first place =)
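Encoding straight to indices, without building the one-hot array first, could look like the sketch below. It reuses the question's training strings and its ! / E convention; to_indices is a hypothetical helper, and sorting the charset is a change from the question's code so that the character-to-index mapping does not depend on set iteration order, which can vary between Python runs.

```python
import numpy as np

# Training strings from the question
X_train_orignal = np.array(['OC(=O)C1=C(Cl)C=CC=C1Cl', 'OC(=O)C1=C(Cl)C=C(Cl)C=C1Cl',
                            'OC(=O)C1=CC=CC(=C1Cl)Cl', 'OC(=O)C1=CC(=CC=C1Cl)Cl',
                            'OC1=C(C=C(C=C1)[N+]([O-])=O)[N+]([O-])=O'])

# sorted() fixes the character order; iterating a plain set of strings can differ between runs
charset = sorted(set("".join(X_train_orignal) + "!E"))
char_to_int = {c: i for i, c in enumerate(charset)}
int_to_char = {i: c for i, c in enumerate(charset)}
# Same length as the question's vectorize() output: embed - 1 = max length + 4
seq_len = max(len(s) for s in X_train_orignal) + 4

def to_indices(smiles, seq_len):
    """Hypothetical helper: '!' start char, the string's characters, then 'E' padding, as indices."""
    out = np.full((len(smiles), seq_len), char_to_int["E"], dtype=np.int32)
    for i, smile in enumerate(smiles):
        out[i, 0] = char_to_int["!"]
        for j, c in enumerate(smile):
            out[i, j + 1] = char_to_int[c]
    return out

X_train_idx = to_indices(X_train_orignal, seq_len)
print(X_train_idx.shape)  # (5, 44)
```

Under the same character mapping, the resulting (5, 44) array has the same contents as np.argmax(vectorize(...), axis=2), since every position of the one-hot array holds exactly one 1, so it can be fed to the Embedding-based model below in place of the argmax output. Strings longer than seq_len - 1 characters would raise an IndexError here; the sketch does not truncate.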
Besides that, your embed variable is 45, while your sequences have length 44 (vectorize drops the last time step with [:,0:-1,:]).
If you change those two things, your model runs fine:
model = Sequential()
model.add(Embedding(len(charset), 10, input_length=44))
model.add(Flatten())
model.add(Dense(1, activation='linear'))
def coeff_determination(y_true, y_pred):
    from keras import backend as K
    SS_res = K.sum(K.square( y_true-y_pred ))
    SS_tot = K.sum(K.square( y_true - K.mean(y_true) ) )
    return ( 1 - SS_res/(SS_tot + K.epsilon()) )
def get_lr_metric(optimizer):
    def lr(y_true, y_pred):
        return optimizer.lr
    return lr
optimizer = Adam(lr=0.00025)
lr_metric = get_lr_metric(optimizer)
model.compile(loss="mse", optimizer=optimizer, metrics=[coeff_determination, lr_metric])
callbacks_list = [
ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-15, verbose=1, mode='auto',cooldown=0),
ModelCheckpoint(filepath="weights.best.hdf5", monitor='val_loss', save_best_only=True, verbose=1, mode='auto')]
history =model.fit(x=np.argmax(X_train, axis=2), y=Y_train,
batch_size=1,
epochs=10,
validation_data=(np.argmax(X_test, axis=2),Y_test),
callbacks=callbacks_list)
Your input shape was not defined properly in the Embedding layer. The following code works for me: instead of adding steps to convert your data to 2D, you can pass the 3D input directly to the Embedding layer.
#THE MISSING STUFF
#_________________________________________
Y_train = Y_train.reshape(5) #Dense layer contains a single unit so need to input single dimension array
max_len = len(charset)
max_features = embed-1
inputshape = (max_features, max_len) #input shape was not defined. The Embedding layer can accept 3D input via input_shape
#__________________________________________
model = Sequential()
#model.add(Embedding(len(charset), 10, input_length=14))
model.add(Embedding(max_features, 10, input_shape=inputshape))#input_length=max_len))
model.add(Flatten())
model.add(Dense(1, activation='linear'))
print(model.summary())
optimizer = Adam(lr=0.00025)
lr_metric = get_lr_metric(optimizer)
model.compile(loss="mse", optimizer=optimizer, metrics=[coeff_determination, lr_metric])
callbacks_list = [
ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-15, verbose=1, mode='auto',cooldown=0),
ModelCheckpoint(filepath="weights.best.hdf5", monitor='val_loss', save_best_only=True, verbose=1, mode='auto')]
history =model.fit(x=X_train, y=Y_train,
batch_size=10,
epochs=10,
validation_data=(X_test,Y_test),
callbacks=callbacks_list)
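A caveat worth checking with this approach: an Embedding layer treats each input value as a row index into its weight matrix. When the input is the one-hot array itself, every value is 0 or 1, so only rows 0 and 1 of the (44, 10) weight matrix are ever looked up, and the other rows never receive gradient updates. The NumPy sketch below mimics the lookup with a random matrix W (an assumption standing in for the layer's weights), using the shapes from this answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the Embedding weights in this answer: input_dim=max_features=44 rows,
# output_dim=10 columns (random values, not trained)
W = rng.normal(size=(44, 10))

# A one-hot batch shaped like X_train: (5 samples, 44 time steps, 14 characters)
char_indices = rng.integers(0, 14, size=(5, 44))
x = np.eye(14, dtype=np.int64)[char_indices]

# An Embedding layer is a table lookup: every input value v is replaced by the row W[v]
embedded = W[x]                      # shape (5, 44, 14, 10)
flattened = embedded.reshape(5, -1)  # what Flatten() passes to Dense: (5, 6160)

rows_used = np.unique(x)             # the only values in a one-hot array are 0 and 1
print(rows_used)                     # [0 1]
```

In other words, the model trains, but the layer embeds the values 0 and 1 rather than the 14 characters; passing np.argmax(X_train, axis=2) to Embedding(len(charset), ...), as in the previous answer, keeps one embedding row per character.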
Related
I believe I have correctly vectorized the train and test data and labels, and chosen adequate layers and a suitable optimizer, but I cannot understand what is wrong. Why am I getting a ValueError for incompatible shapes?
My code:
from keras.datasets import imdb
(train_data, train_labels),(test_data, test_labels)=imdb.load_data(num_words=10000)
import numpy as np
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
def to_one_hot(labels, dimension=46):
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.
    return results
one_hot_train_labels = to_one_hot(train_labels)
one_hot_test_labels = to_one_hot(test_labels)
from tensorflow.keras.utils import to_categorical
one_hot_train_labels = to_categorical(train_labels)
one_hot_test_labels = to_categorical(test_labels)
from keras import models
from keras import layers
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]
history = model.fit(partial_x_train, partial_y_train, epochs=20, batch_size=512, validation_data=(x_val, y_val))
My error message:
ValueError: Shapes (None, 1) and (None, 46) are incompatible
According to the comment, if your partial_y_train shape is (24000, 1), then it has not been one-hot encoded correctly. You are using the function to_one_hot() in your code, but I don't know what this code is doing. Using TensorFlow's one_hot function or scikit-learn's version would be best; then the shapes should match and the error will go away.
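A quick way to check this before calling fit is to encode the labels yourself and compare the width against the number of units in the last Dense layer (46 here). The sketch below uses plain NumPy (identity-matrix indexing) rather than the TensorFlow or scikit-learn functions mentioned above; one_hot is a hypothetical helper and the example labels are made up.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: integer labels -> (n, num_classes) one-hot matrix."""
    labels = np.asarray(labels, dtype=np.int64).ravel()  # accepts shape (n,) or (n, 1)
    if labels.min() < 0 or labels.max() >= num_classes:
        raise ValueError(f"labels must lie in [0, {num_classes})")
    return np.eye(num_classes, dtype=np.float32)[labels]

num_classes = 46                     # units in the model's final softmax layer
labels = np.array([[3], [0], [45]])  # made-up labels with shape (n, 1)
y = one_hot(labels, num_classes)
print(y.shape)  # (3, 46)
```

Note that to_categorical(labels) without num_classes infers the width from the largest label (max + 1), so labels that only take the values 0 and 1, as the IMDB sentiment labels do, produce two columns rather than 46.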
I am working on a classification task with inputs of shape (batch, step, features).
To improve the accuracy of this model, I intended to apply a self-attention layer to it.
I am unfamiliar with how to use it in my case, since most examples online are about NLP models with embeddings.
def opt_select(optimizer):
    if optimizer == 'Adam':
        adamopt = tf.keras.optimizers.Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
        return adamopt
    elif optimizer == 'RMS':
        RMSopt = tf.keras.optimizers.RMSprop(lr=learning_rate, rho=0.9, epsilon=1e-6)
        return RMSopt
    else:
        print('undefined optimizer')
def LSTM_attention_model(X_train, y_train, X_test, y_test, num_classes, loss, batch_size=68, units=128, learning_rate=0.005, epochs=20, dropout=0.2, recurrent_dropout=0.2, optimizer='Adam'):
    class myCallback(tf.keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs={}):
            if (logs.get('acc') > 0.90):
                print("\nReached 90% accuracy so cancelling training!")
                self.model.stop_training = True
    callbacks = myCallback()
    model = tf.keras.models.Sequential()
    model.add(Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout)))
    model.add(SeqSelfAttention(attention_activation='sigmoid'))
    model.add(Dense(num_classes, activation='softmax'))
    opt = opt_select(optimizer)
    model.compile(loss=loss,
                  optimizer=opt,
                  metrics=['accuracy'])
    history = model.fit(X_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(X_test, y_test),
                        verbose=1,
                        callbacks=[callbacks])
    score, acc = model.evaluate(X_test, y_test,
                                batch_size=batch_size)
    yhat = model.predict(X_test)
    return history, yhat
This led to IndexError: list index out of range
What is the correct way to apply this layer to my model?
As requested, the following code can be used to simulate a dataset.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout,Bidirectional,Masking,LSTM
from keras_self_attention import SeqSelfAttention
X_train = np.random.rand(700, 50,34)
y_train = np.random.choice([0, 1], 700)
X_test = np.random.rand(100, 50, 34)
y_test = np.random.choice([0, 1], 100)
batch_size= 217
epochs = 600
dropout = 0.6
Rdropout = 0.7
learning_rate = 0.00001
optimizer = 'RMS'
loss = 'categorical_crossentropy'
num_classes = y_train.shape[1]
LSTM_attention_his,yhat = LSTM_attention_model(X_train,y_train,X_test,y_test,loss =loss,num_classes=num_classes,batch_size=batch_size,units=32,learning_rate=learning_rate,epochs=epochs,dropout = 0.5,recurrent_dropout=Rdropout,optimizer=optimizer)
Here is how I would rewrite the code:
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, Bidirectional, Masking, LSTM, Reshape
from keras_self_attention import SeqSelfAttention
import numpy as np
def opt_select(optimizer):
    if optimizer == 'Adam':
        adamopt = tf.keras.optimizers.Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
        return adamopt
    elif optimizer == 'RMS':
        RMSopt = tf.keras.optimizers.RMSprop(lr=learning_rate, rho=0.9, epsilon=1e-6)
        return RMSopt
    else:
        print('undefined optimizer')
def LSTM_attention_model(X_train, y_train, X_test, y_test, num_classes, loss, batch_size=68, units=128,
                         learning_rate=0.005, epochs=20, dropout=0.2, recurrent_dropout=0.2, optimizer='Adam'):
    class myCallback(tf.keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs={}):
            if (logs.get('accuracy') > 0.90):
                print("\nReached 90% accuracy so cancelling training!")
                self.model.stop_training = True
    callbacks = myCallback()
    model = tf.keras.models.Sequential()
    model.add(Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout, return_sequences=True)))
    model.add(SeqSelfAttention(attention_activation='sigmoid'))
    model.add(Reshape((-1,)))  # flatten (steps, features) into one feature vector per sample
    model.add(Dense(num_classes, activation='softmax'))
    opt = opt_select(optimizer)
    model.compile(loss=loss,
                  optimizer=opt,
                  metrics=['accuracy'])
    history = model.fit(X_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(X_test, y_test),
                        verbose=1,
                        callbacks=[callbacks])
    score, acc = model.evaluate(X_test, y_test,
                                batch_size=batch_size)
    yhat = model.predict(X_test)
    return history, yhat
X_train = np.random.rand(700, 50,34)
y_train = np.random.choice([0, 1], (700, 1))
X_test = np.random.rand(100, 50, 34)
y_test = np.random.choice([0, 1], (100, 1))
batch_size= 217
epochs = 600
dropout = 0.6
Rdropout = 0.7
learning_rate = 0.00001
optimizer = 'RMS'
loss = 'categorical_crossentropy'
num_classes = y_train.shape[1]
LSTM_attention_his,yhat = LSTM_attention_model(
X_train,y_train,X_test,y_test,
loss =loss,num_classes=num_classes,batch_size=batch_size,units=32,
learning_rate=learning_rate,epochs=epochs,dropout = 0.5,recurrent_dropout=Rdropout,optimizer=optimizer
)
These are the changes I had to make to get this to start training:
The original issue was caused by the LSTM layer outputting the wrong dimensions. The SeqSelfAttention layer needs a 3D input (one dimension corresponding to the sequence of the data), which was missing from the output of the LSTM layer. As mentioned by @today in the comments, this can be solved by adding return_sequences=True to the LSTM layer.
But even with that modification, the code still gives an error when computing the loss. The output of the self-attention layer has shape (None, 50, 64); when this is passed directly into the Dense layer, the final output of the network becomes (None, 50, 1). That doesn't make sense for what we are trying to do, because the final output should contain a single label per data point (shape (None, 1)). The problem is that the self-attention output is 3-dimensional (each data point has a (50, 64) feature matrix), so it needs to be reshaped into a single flat feature vector for the computation to make sense. So I added a reshape layer, model.add(Reshape((-1,))), between the attention layer and the Dense layer.
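The shape argument above can be checked without Keras: a Dense layer multiplies along the last axis only, so on a 3D input it produces one output per time step. A NumPy sketch with random weights (bias omitted for brevity; the batch size of 4 is arbitrary, and the (50, 64) shape is the attention output described above):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, features = 4, 50, 64  # attention output is (None, 50, 64) in the answer

attn_out = rng.normal(size=(batch, steps, features))

# Dense on a 3D input acts on the LAST axis only, giving one prediction per time step:
# (batch, 50, 64) @ (64, 1) -> (batch, 50, 1)
W_3d = rng.normal(size=(features, 1))
per_step = attn_out @ W_3d

# Flattening first (what Reshape((-1,)) does) gives one feature vector per sample:
# (batch, 3200) @ (3200, 1) -> (batch, 1)
flat = attn_out.reshape(batch, -1)
W_flat = rng.normal(size=(steps * features, 1))
per_sample = flat @ W_flat

print(per_step.shape, per_sample.shape)  # (4, 50, 1) (4, 1)
```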
In addition, the myCallback class tests whether logs.get('acc') > 0.9, but I think it should be logs.get('accuracy').
To answer the OP's question in the comments about what kind of column should be added: in this case, it was just a matter of extracting the full sequential data from the LSTM layer. Without the return_sequences flag, the output of the LSTM layer is (None, 64), which is simply the final features of the LSTM without the intermediate sequential data.
I've been playing around with TensorFlow and Keras, and I got the following error while trying hyperparameter tuning:
"ValueError: activation is not a legal parameter"
The point is that I want to try different activation functions in my model to see which one works best.
I have the following code:
import pandas as pd
import tensorflow as tf
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
import numpy as np
ds = pd.read_csv(
"https://storage.googleapis.com/download.tensorflow.org/data/abalone_train.csv",
names=["Length", "Diameter", "Height", "Whole weight", "Shucked weight",
"Viscera weight", "Shell weight", "Age"])
print(ds)
x_train = ds.copy()
y_train = x_train.pop('Age')
x_train = np.array(x_train)
def create_model(layers, activations):
    model = tf.keras.Sequential()
    for i, nodes in enumerate(layers):
        if i == 0:
            model.add(tf.keras.layers.Dense(nodes, input_dim=x_train.shape[1]))
            model.add(layers.Activation(activations))
            model.add(Dropout(0.3))
        else:
            model.add(tf.keras.layers.Dense(nodes))
            model.add(layers.Activation(activations))
            model.add(Dropout(0.3))
    model.add(tf.keras.layers.Dense(units=1, kernel_initializer='glorot_uniform'))
    model.add(layers.Activation('sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
model = KerasClassifier(build_fn=create_model, verbose=0)
layers = [[20], [40,20], [45, 30, 15]]
activations = ['sigmoid', 'relu']
param_grid = dict(layers=layers, activation=activations, batch_size = [128, 256], epochs=[30])
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=5)
grid_result = grid.fit(x_train, y_train)
print(grid_result.best_score_, grid_result.best_params_)
pred_y = grid.predict(x_test)
y_pred = (pred_y > 0.5)
cm=confusion_matrix(y_pred, y_test)
score=accuracy_score(y_pred, y_test)
model.fit(x_train, y_train, epochs=30, callbacks=[cp_callback])
#steps_per_epoch
model.evaluate(x_test, y_test, verbose=2)
probability_model = tf.keras.Sequential([
model,
tf.keras.layers.Softmax()
])
probability_model(x_test[:100])
If you look at the Keras activations documentation, you can specify activations as:
from tensorflow.keras import activations
layers.Activation(activations.relu)
Right now, you have:
activations = ['sigmoid', 'relu']
So that's why you get the ValueError.
You should change your code to something like this:
model.add(tf.keras.layers.Dense(nodes, activation=activations[i], input_dim=x_train.shape[1]))
So, remove the Activation layer: model.add(layers.Activation(activations)) and instead use the activation inside each layer.
Example:
def create_model(layers, activations):
    model = tf.keras.Sequential()
    for i in range(2):
        if i == 0:
            model.add(tf.keras.layers.Dense(2, activation=activations[i], input_dim=x_train.shape[1]))
            model.add(tf.keras.layers.Dropout(0.3))
        else:
            model.add(tf.keras.layers.Dense(2, activation=activations[i]))
            model.add(tf.keras.layers.Dropout(0.3))
    model.add(tf.keras.layers.Dense(units=1, activation='sigmoid', kernel_initializer='glorot_uniform'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
layers.Activation() expects a function or a string, such as 'sigmoid', but you are currently passing an array, activations, to it. Use your index i (or a different index) to access the activation function, like activations[i].
You can also pass the activation as string directly to the Dense layer like so:
model.add(tf.keras.layers.Dense(nodes, activation=activations[i], input_dim=x_train.shape[1]))
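The wording of the error also points at the grid keys themselves. As far as I can tell from the legacy keras.wrappers.scikit_learn source, the wrapper checks every parameter name it receives (including each param_grid key that GridSearchCV sets) against the argument names of build_fn and of the model's fit/predict/evaluate methods, and raises "<name> is not a legal parameter" when nothing matches. In the question, the grid key is activation while create_model's argument is activations. The sketch below mimics that check in plain Python; illegal_grid_keys is a hypothetical helper, create_model is a stub with the question's signature, and extra_legal is a simplification of the fit/predict arguments.

```python
import inspect

def create_model(layers, activations):
    """Stub with the same signature as the question's build function."""
    return None

def illegal_grid_keys(param_grid, build_fn, extra_legal=("batch_size", "epochs", "verbose")):
    """Hypothetical helper: grid keys that are neither build_fn arguments nor in extra_legal."""
    legal = set(inspect.signature(build_fn).parameters) | set(extra_legal)
    return sorted(k for k in param_grid if k not in legal)

param_grid = dict(layers=[[20], [40, 20], [45, 30, 15]],
                  activation=['sigmoid', 'relu'],
                  batch_size=[128, 256], epochs=[30])
print(illegal_grid_keys(param_grid, create_model))  # ['activation']

# Renaming the grid key to match the function argument passes the check
fixed_grid = {("activations" if k == "activation" else k): v for k, v in param_grid.items()}
print(illegal_grid_keys(fixed_grid, create_model))  # []
```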
Here is the problem: I have a 2200x39 dataset (I know, very poor), where 38 columns are the features (texture and statistics) and the last column is the output class, which is 0 or 1. My dataset is balanced (1100 "1" and 1100 "0").
I'm trying to improve my performance, which is stuck at 0.69 for loss and 0.49 for accuracy. I tried adding a layer, adding neurons, and changing parameters. Nothing works: the accuracy and loss values change only a little.
So, first of all, I import everything I need:
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Conv1D
from tensorflow.keras.optimizers import SGD
import matplotlib.pyplot as plt
Then I prepare my data and split it into an 80% training set and a 20% validation set:
# fix a seed for reproducing same results if we wish to train and evaluate our network more than once
seed = 9
np.random.seed(seed)
# load dataset
dataset = np.loadtxt('tr_set.csv', delimiter=',', skiprows=1)
# Show the first 10 rows
print(dataset[1:10])
# Delete the first column with the patient index
dataset = dataset[:,1:42]
# Split into input (features) and output variables
X = dataset[:,2:40]
Y = dataset[:,40]
# Counting elements in class 0 and in class 1
count_0 = 0
count_1 = 0
for i in Y:
    if i == 0:
        count_0 = count_0 + 1
    if i == 1:
        count_1 = count_1 + 1
print("Number of elements in 0 class:", count_0)
print("Number of elements in 1 class:", count_1)
# The dataset is balanced
# Split into training set(80%) and validation set (20%)
(X_train, X_val, Y_train, Y_val) = train_test_split(X, Y, test_size=0.2, random_state=seed)
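The class-counting loop above can also be written with np.unique, which returns each distinct label together with how often it occurs. A small sketch; the label array is made up to mirror the 1100/1100 split described in the question.

```python
import numpy as np

# Made-up labels standing in for Y: 1100 zeros followed by 1100 ones
Y = np.concatenate([np.zeros(1100), np.ones(1100)])

classes, counts = np.unique(Y, return_counts=True)
class_counts = dict(zip(classes.astype(int).tolist(), counts.tolist()))
print(class_counts)  # {0: 1100, 1: 1100}
```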
And here is my model, after reshaping X_train and X_val because I use Conv1D:
# Create the model
opt = SGD(lr=0.00001)
model = Sequential()
model.add(Dense(1024, activation='relu', kernel_initializer='random_uniform', input_shape=(1,38)))
model.add(BatchNormalization()) # It is used to normalize the input layer by adjusting and scaling the activations.
model.add(Dense(512, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
model.summary()
model.add(Conv1D(64, 3, padding="same", activation="relu"))
# model.add(MaxPooling1D(2))
model.summary()
model.add(Dense(1, activation='sigmoid'))
model.summary()
# compile the model
model.compile(loss='binary_crossentropy', optimizer= opt, metrics=['accuracy'])
# fit the model
history = model.fit(X_train, Y_train, validation_data=(X_val, Y_val), epochs=15, batch_size=10)
# w_data = model.get_weights()
What is wrong? I deleted the max-pooling layer because I had problems with dimensions (something like subtracting 2 from 1).
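The "subtracting 2 from 1" message and the 0.69 loss can both be checked with a little arithmetic. With input_shape=(1, 38), the axis that Conv1D and MaxPooling1D slide over has length 1, so a pooling window of 2 cannot fit. And a binary cross-entropy of about 0.69 is ln 2, the loss obtained when the model outputs 0.5 for every sample; together with accuracy around 0.5 on a balanced dataset, this suggests the network is not separating the classes at all yet. pooled_length is a hypothetical helper implementing the usual "valid" pooling length formula; its error text is only modeled on TensorFlow's message.

```python
import math

def pooled_length(length, pool_size, strides=None):
    """Hypothetical helper: 1D 'valid' pooling output length, (L - pool_size) // strides + 1."""
    strides = pool_size if strides is None else strides  # MaxPooling1D defaults strides to pool_size
    if length < pool_size:
        # The window does not fit on the sequence axis (modeled on TF's message)
        raise ValueError(f"Negative dimension size caused by subtracting {pool_size} from {length}")
    return (length - pool_size) // strides + 1

# With input_shape=(1, 38), the sequence axis has length 1, so a window of 2 cannot fit
try:
    pooled_length(1, 2)
except ValueError as err:
    print(err)

# If the 38 features were the sequence axis instead, e.g. input shape (38, 1), pooling fits
print(pooled_length(38, 2))  # 19

# Binary cross-entropy when every prediction is 0.5 on balanced 0/1 labels
p = 0.5
bce = -(0.5 * math.log(p) + 0.5 * math.log(1 - p))
print(round(bce, 4))  # 0.6931, i.e. ln 2
```

Whether the features should really be treated as a sequence of length 38 with one channel depends on the data; the (38, 1) case is shown only to contrast the lengths.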
I understand that the features extracted from an autoencoder can be fed into an MLP for classification or regression. This is something I have done before.
But what if I have two autoencoders? Can I extract the features from the bottleneck layers of both autoencoders and feed them into an MLP that performs classification based on these features? If so, how? I am not sure how to concatenate the two feature sets. I tried numpy.hstack(), which gives me an 'unhashable slice' error, whereas tf.concat() gives me the error 'Input tensors to a Model must be Keras tensors.' The bottleneck layers of the two autoencoders each have shape (None, 100), so stacking them horizontally should give (None, 200). The hidden layer of the MLP may contain some number of neurons (num_hidden=100). Could anyone please help?
x1 = autoencoder1.get_layer('encoder2').output
x2 = autoencoder2.get_layer('encoder2').output
#inp = np.hstack((x1, x2))
inp = tf.concat([x1, x2], 1)
x = tf.concat([x1, x2], 1)
h = Dense(num_hidden, activation='relu', name='hidden')(x)
y = Dense(1, activation='sigmoid', name='prediction')(h)
mymlp = Model(inputs=inp, outputs=y)
# Compile model
mymlp.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train model
mymlp.fit(x_train, y_train, epochs=20, batch_size=8)
Updated as per @twolffpiggott's suggestion:
from keras.layers import Input, Dense, Dropout
from keras import layers
from keras.models import Model
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import numpy as np
x1 = Data1
x2 = Data2
y = Data3
num_neurons1 = x1.shape[1]
num_neurons2 = x2.shape[1]
# Train-test split
x1_train, x1_test, x2_train, x2_test, y_train, y_test = train_test_split(x1, x2, y, test_size=0.2)
# scale data within [0-1] range
scalar = MinMaxScaler()
x1_train = scalar.fit_transform(x1_train)
x1_test = scalar.transform(x1_test)
x2_train = scalar.fit_transform(x2_train)
x2_test = scalar.transform(x2_test)
x_train = np.concatenate([x1_train, x2_train], axis =-1)
x_test = np.concatenate([x1_test, x2_test], axis =-1)
# Auto-encoder1
encoding_dim1 = 500
encoding_dim2 = 100
input_data = Input(shape=(num_neurons1,))
encoded = Dense(encoding_dim1, activation='relu', name='encoder1')(input_data)
encoded1 = Dense(encoding_dim2, activation='relu', name='encoder2')(encoded)
decoded = Dense(encoding_dim2, activation='relu', name='decoder1')(encoded1)
decoded = Dense(num_neurons1, activation='sigmoid', name='decoder2')(decoded)
# this model maps an input to its reconstruction
autoencoder1 = Model(inputs=input_data, outputs=decoded)
autoencoder1.compile(optimizer='sgd', loss='mse')
# training
autoencoder1.fit(x1_train, x1_train,
epochs=100,
batch_size=8,
shuffle=True,
validation_data=(x1_test, x1_test))
# Auto-encoder2
encoding_dim1 = 500
encoding_dim2 = 100
input_data = Input(shape=(num_neurons2,))
encoded = Dense(encoding_dim1, activation='relu', name='encoder1')(input_data)
encoded2 = Dense(encoding_dim2, activation='relu', name='encoder2')(encoded)
decoded = Dense(encoding_dim2, activation='relu', name='decoder1')(encoded2)
decoded = Dense(num_neurons2, activation='sigmoid', name='decoder2')(decoded)
# this model maps an input to its reconstruction
autoencoder2 = Model(inputs=input_data, outputs=decoded)
autoencoder2.compile(optimizer='sgd', loss='mse')
# training
autoencoder2.fit(x2_train, x2_train,
epochs=100,
batch_size=8,
shuffle=True,
validation_data=(x2_test, x2_test))
# MLP
num_hidden = 100
encoded1.trainable = False
encoded2.trainable = False
encoded1 = autoencoder1(autoencoder1.inputs)
encoded2 = autoencoder2(autoencoder2.inputs)
concatenated = layers.concatenate([encoded1, encoded2], axis=-1)
x = Dropout(0.2)(concatenated)
h = Dense(num_hidden, activation='relu', name='hidden')(x)
h = Dropout(0.5)(h)
y = Dense(1, activation='sigmoid', name='prediction')(h)
myMLP = Model(inputs=[autoencoder1.inputs, autoencoder2.inputs], outputs=y)
# Compile model
myMLP.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Training
myMLP.fit(x_train, y_train, epochs=200, batch_size=8)
# Testing
myMLP.predict(x_test)
This gives me the error unhashable type: 'list' from the line:
myMLP = Model(inputs=[autoencoder1.inputs, autoencoder2.inputs], outputs=y)
The problem is that you're mixing numpy arrays with Keras tensors. That can't work.
There are two approaches.
Predict numpy arrays from each autoencoder, concatenate the arrays, and send them to the third model.
Connect all the models, probably making the autoencoders untrainable, and fit with one input for each autoencoder.
Personally, I'd go for the first. (Assuming the autoencoders are already trained and don't need change).
First approach
numpyOutputFromAuto1 = autoencoder1.predict(numpyInputs1)
numpyOutputFromAuto2 = autoencoder2.predict(numpyInputs2)
inputDataForThird = np.concatenate([numpyOutputFromAuto1,numpyOutputFromAuto2],axis=-1)
inputTensorForMlp = Input(inputDataForThird.shape[1:])
h = Dense(num_hidden, activation='relu', name='hidden')(inputTensorForMlp)
y = Dense(1, activation='sigmoid', name='prediction')(h)
mymlp = Model(inputs=inputTensorForMlp, outputs=y)
....
mymlp.fit(inputDataForThird ,someY)
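The concatenation step in the first approach only involves array shapes, so it can be sketched with random arrays standing in for the encoder outputs (an assumption; in practice they would come from predict). Note that autoencoder1.predict returns the reconstruction; if the bottleneck features are wanted, one option is to predict with an encoder sub-model such as Model(autoencoder1.input, autoencoder1.get_layer('encoder2').output).

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 8

# Stand-ins for the bottleneck features of each encoder: (n, 100) each
features1 = rng.random((n_samples, 100))
features2 = rng.random((n_samples, 100))

# Join along the last (feature) axis; row k of both arrays belongs to the same sample
combined = np.concatenate([features1, features2], axis=-1)
print(combined.shape)  # (8, 200)
```

Because the rows must stay aligned sample by sample, both arrays have to be predicted from inputs in the same order (for example the x1_train and x2_train produced by the same train_test_split call).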
Second Approach
This is a little more complicated, and at first glance I don't see much reason to do it (though of course there may be cases where it's a good choice).
Now we're totally forgetting numpy and working with keras tensors.
Creating the mlp on its own (good if you will use it later without the autoencoders):
inputTensorForMlp = Input(input_shape_compatible_with_concatenated_encoder_outputs)
x = Dropout(0.2)(inputTensorForMlp)
h = Dense(num_hidden, activation='relu', name='hidden')(x)
h = Dropout(0.5)(h)
y = Dense(1, activation='sigmoid', name='prediction')(h)
myMLP = Model(inputs=inputTensorForMlp, outputs=y)
We probably want the bottleneck features of the autoencoders, right? If you built each autoencoder properly (an encoder model and a decoder model, joined together), it's easier to just use the encoder model. Otherwise:
encodedOutput1 = autoencoder1.layers[bottleneckLayer].output #or encoder1.output
encodedOutput2 = autoencoder2.layers[bottleneckLayer].output #or encoder2.output
Creating a joined model. The concatenation must use a keras layer (we're working with keras tensors):
from keras.layers import Concatenate
concatenated = Concatenate()([encodedOutput1,encodedOutput2])
output = myMLP(concatenated)
joinedModel = Model([autoencoder1.input,autoencoder2.input],output)
I'd also go with Daniel's first approach (for simplicity and efficiency), but if you're interested in the second, for instance to run the network end-to-end, you'd approach it like this:
# make autoencoders not trainable
autoencoder1.trainable = False
autoencoder2.trainable = False
encoded1 = autoencoder1(input_data1)
encoded2 = autoencoder2(input_data2)
concatenated = layers.concatenate([encoded1, encoded2], axis=-1)
h = Dense(num_hidden, activation='relu', name='hidden')(concatenated)
y = Dense(1, activation='sigmoid', name='prediction')(h)
myMLP = Model([input_data1, input_data2], y)
myMLP.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Training
myMLP.fit([x1_train, x2_train], y_train, epochs=200, batch_size=8)
# Testing
myMLP.predict([x1_test, x2_test])
Key edits
The weights of both autoencoders should be frozen in the end-to-end model (otherwise early gradient updates from the randomly initialized MLP will likely destroy much of what they learned).
The autoencoder input layers should be assigned to separate variables, input_data1 and input_data2, one per autoencoder (instead of both to input_data). autoencoder1.inputs is a list of input tensors, so [autoencoder1.inputs, autoencoder2.inputs] is a list of lists; this is the source of the unhashable type: 'list' exception, and replacing it with [input_data1, input_data2] solves the issue.
When fitting the MLP for the end-to-end model, the input should be a list of x1_train and x2_train rather than the concatenated inputs. Same when predicting.