I am a beginner with Keras and with writing neural network models, and I'm currently trying to write an LSTM for text generation, without success. What am I doing wrong?
I read this question: here
and other articles, but there is something I'm missing that I can't figure out; sorry if I seem dumb.
The goal
My purpose is to generate English articles of a fixed length (1500 for now).
Suppose I have a dataset of 20k records, each a sequence (an article, basically) of different length. I set a fixed length for all articles (MAX_SEQUENCE_LENGTH = 1500) and tokenized them, getting a matrix (X, my training data) that looks like:
[[ 0 0 0 ... 88 664 206]
[ 0 0 0 ... 1 93 140]
[ 0 0 0 ... 3 173 2283]
...
[ 50 2761 4 ... 167 148 156]
[ 0 0 0 ... 10 77 206]
[ 0 0 0 ... 167 148 156]]
with a shape of 20000x1500
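(For completeness: X above comes from the usual Tokenizer + pad_sequences pipeline. A minimal sketch of that step; texts stands in for my raw list of articles, and the vocabulary cap is an illustrative value:)
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

MAX_NB_WORDS = 20000  # vocabulary cap, illustrative value
MAX_SEQUENCE_LENGTH = 1500

tokenizer = Tokenizer(num_words=MAX_NB_WORDS)
tokenizer.fit_on_texts(texts)                    # texts: list of raw article strings
sequences = tokenizer.texts_to_sequences(texts)  # lists of integer token ids
word_index = tokenizer.word_index                # used later by load_embeddings()
X = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)  # zero-pads shorter articles on the left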
The output of my LSTM should be a 1 x MAX_SEQUENCE_LENGTH array of tokens.
My model looks like this:
def generator_model(sequence_input, embedded_sequences, output_shape):
    layer = LSTM(16, return_sequences=True)(embedded_sequences)
    layer = LSTM(32, return_sequences=True)(layer)
    layer = Flatten()(layer)
    output = Dense(output_shape, activation='softmax')(layer)
    generator = Model(sequence_input, output)
    return generator
with:
sequence_input = Input(batch_shape=(1, 1,1500), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
output_shape = MAX_SEQUENCE_LENGTH
The LSTM is supposed to train, with model.fit(), on a training set X of shape 20k x MAX_SEQUENCE_LENGTH,
and to give an array of tokens of shape 1 x MAX_SEQUENCE_LENGTH as output when I call model.predict(seed), with seed being a random noise array.
compile, fit and predict
Comments for the following section:
. generator.compile works; the model is given in the edit section of this post.
. generator.fit compiles; the epochs=1 param is for testing purposes and will be BATCH_NUM.
. I have some doubts about the y I give to generator.fit. For now I'm giving a matrix of 0s as the target output; if I generate it with a first dimension different from X.shape[0], it throws an error, so it needs one label for every record in X. But if I give model.fit a matrix of 0s as the target, isn't it going to predict just arrays of 0s?
. The error it gives is always the same, whether I use noise_generator() or noise_integer_generator(); I believe that's because it doesn't like the y_shape param I'm giving.
embedding_layer = load_embeddings(word_index)
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,))
embedded_sequences = embedding_layer(sequence_input)
generator = generator_model(sequence_input, embedded_sequences, X.shape[1])
print(generator.summary())
generator.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
Xnoise = generate_integer_noise(MAX_SEQUENCE_LENGTH)
y_shape = np.zeros((X.shape[0],), dtype=int)
generator.fit(X, y_shape, epochs=1)
acc = generator.predict(Xnoise, verbose=1)
But actually I'm getting the following error
ValueError: Error when checking input: expected input_1 to have shape (1500,) but got array with shape (1,)
when I call:
Xnoise = generate_noise(samples_number=MAX_SEQUENCE_LENGTH)
generator.predict(Xnoise, verbose=1)
The noise I give is a 1 x 1500 array, but it seems to expect a (1500,) shape, so there must be some kind of error in the shape settings for my output.
Is my model correct for my purpose, or did I write something really, really stupid that I can't see?
Thanks for the help you can give me, I appreciate that!
edit
Changelog:
v1.
###
- Changed model structure, now return_sequences = True and using shape instead of batch_shape
###
- Changed
sequence_input = Input(batch_shape=(1,1,1500), dtype='int32')
to
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,))
###
- The error the model gives changed
v2.
###
- Changed generate_noise() code
###
- Added generate_integer_noise() code
###
- Added full sequence with the model compile, fit and predict
###
- Added model.fit summary under the model summary, in the tail of the post
generate_noise() code:
def generate_noise(samples_number, mean=0.5, stdev=0.1):
    noise = np.random.normal(mean, stdev, (samples_number, MAX_SEQUENCE_LENGTH))
    print(noise.shape)
    return noise
which prints: (1500,)
generate_integer_noise() code:
def generate_integer_noise(samples_number):
    noise = []
    for _ in range(0, samples_number):
        noise.append(np.random.randint(1, MAX_NB_WORDS))
    Xnoise = np.asarray(noise)
    return Xnoise
My function load_embeddings() is as follows:
def load_embeddings(word_index, embeddingsfile='Embeddings/glove.6B.%id.txt' % EMBEDDING_DIM):
    embeddings_index = {}
    f = open(embeddingsfile, 'r', encoding='utf8')
    for line in f:
        values = line.split(' ')  # split the line by spaces
        word = values[0]  # each line starts with the word
        coefs = np.asarray(values[1:], dtype='float32')  # the rest of the line is the vector
        embeddings_index[word] = coefs  # put into embedding dictionary
    f.close()
    print('Found %s word vectors.' % len(embeddings_index))
    embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
    for word, i in word_index.items():
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            # words not found in embedding index will be all-zeros
            embedding_matrix[i] = embedding_vector
    embedding_layer = Embedding(len(word_index) + 1,
                                EMBEDDING_DIM,
                                weights=[embedding_matrix],
                                input_length=MAX_SEQUENCE_LENGTH,
                                trainable=False)
    return embedding_layer
model summary:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 1500) 0
_________________________________________________________________
embedding_1 (Embedding) (None, 1500, 300) 9751200
_________________________________________________________________
lstm_1 (LSTM) (None, 1500, 16) 20288
_________________________________________________________________
lstm_2 (LSTM) (None, 1500, 32) 6272
_________________________________________________________________
flatten_1 (Flatten) (None, 48000) 0
_________________________________________________________________
dense_1 (Dense) (None, 1500) 72001500
=================================================================
Total params: 81,779,260
Trainable params: 72,028,060
Non-trainable params: 9,751,200
_________________________________________________________________
model.fit() summary (using a 999-sized dataset for testing, instead of the 20k-sized one):
999/999 [==============================] - 62s 62ms/step - loss: 0.5491 - categorical_accuracy: 0.9680
I rewrote the full answer; now it works (at least it compiles and runs, though I can't say anything about convergence).
First, I don't know why you use sparse_categorical_crossentropy instead of categorical_crossentropy; it could be important. I changed the model a bit so that it compiles, and it uses categorical_crossentropy. If you need the sparse one, change the shape of the target.
Also, I changed the batch_shape argument to shape, because that allows batches of different shapes; it's easier to work with.
And the last edit: you should change generate_noise, because an Embedding layer expects integers in (0, max_features), not normally distributed floats (see the comment in the function).
EDIT
Addressing the last comments, I've removed generate_noise and post the modified generate_integer_noise function:
from keras.layers import Input, Embedding, LSTM
from keras.models import Model
import numpy as np

MAX_SEQUENCE_LENGTH = 1500
MAX_NB_WORDS = 10  # must not exceed max_features of the Embedding below

def generate_integer_noise(samples_number):
    """
    samples_number is the number of samples, i.e. the first dimension in (some, 1500)
    """
    return np.random.randint(1, MAX_NB_WORDS, size=(samples_number, MAX_SEQUENCE_LENGTH))

"""
You can use your own definition of the embedding layer;
I post this one to make a reproducible example
"""
max_features, embed_dim = 10, 300
embedding_matrix = np.zeros((max_features, embed_dim))
output_shape = MAX_SEQUENCE_LENGTH

embedded_layer = Embedding(
    max_features,
    embed_dim,
    weights=[embedding_matrix],
    trainable=False
)

def generator_model(embedded_layer, output_shape):
    """
    embedded_layer: Embedding keras layer
    output_shape: shape of the target
    """
    sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH, ))
    embedded_sequences = embedded_layer(sequence_input)  # set trainable to True if you wish to train
    layer = LSTM(32, return_sequences=True)(embedded_sequences)
    layer = LSTM(64, return_sequences=True)(layer)
    output = LSTM(output_shape)(layer)
    generator = Model(sequence_input, output)
    return generator

generator = generator_model(embedded_layer, output_shape)
noise = generate_integer_noise(32)

# generator.predict(noise)
generator.compile(loss='categorical_crossentropy', optimizer='adam')
generator.fit(noise, noise)
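As a quick shape check (nothing about quality, only shapes): predicting on the noise batch should return one row of length MAX_SEQUENCE_LENGTH per sample, since the last LSTM has output_shape units and return_sequences=False:
preds = generator.predict(noise)
print(preds.shape)  # expected: (32, 1500)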
Related
I have the following code for time series prediction with RNNs, and I would like to know whether, for the testing, I am predicting one day in advance:
# -*- coding: utf-8 -*-
"""
Time Series Prediction with RNN
"""
import pandas as pd
import numpy as np
from tensorflow import keras
#%% Configure parameters
epochs = 5
batch_size = 50
steps_backwards = int(1* 4 * 24)
steps_forward = int(1* 4 * 24)
split_fraction_trainingData = 0.70
split_fraction_validatinData = 0.90
#%% "Reading the data"
dataset = pd.read_csv('C:/User1/Desktop/TestValues.csv', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime':[0]}, index_col=['datetime'])
df = dataset
data = df.values
indexWithYLabelsInData = 0
data_X = data[:, 0:2]
data_Y = data[:, indexWithYLabelsInData].reshape(-1, 1)
#%% Prepare the input data for the RNN
series_reshaped_X = np.array([data_X[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
series_reshaped_Y = np.array([data_Y[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
timeslot_x_train_end = int(len(series_reshaped_X)* split_fraction_trainingData)
timeslot_x_valid_end = int(len(series_reshaped_X)* split_fraction_validatinData)
X_train = series_reshaped_X[:timeslot_x_train_end, :steps_backwards]
X_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, :steps_backwards]
X_test = series_reshaped_X[timeslot_x_valid_end:, :steps_backwards]
indexWithYLabelsInSeriesReshapedY = 0
lengthOfTheYData = len(data_Y)-steps_backwards -steps_forward
Y = np.empty((lengthOfTheYData, steps_backwards, steps_forward))
for step_ahead in range(1, steps_forward + 1):
    Y[..., step_ahead - 1] = series_reshaped_Y[..., step_ahead:step_ahead + steps_backwards, indexWithYLabelsInSeriesReshapedY]
Y_train = Y[:timeslot_x_train_end]
Y_valid = Y[timeslot_x_train_end:timeslot_x_valid_end]
Y_test = Y[timeslot_x_valid_end:]
#%% Build the model and train it
model = keras.models.Sequential([
    keras.layers.SimpleRNN(90, return_sequences=True, input_shape=[None, 2]),
    keras.layers.SimpleRNN(60, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(steps_forward))
    #keras.layers.Dense(steps_forward)
])
model.compile(loss="mean_squared_error", optimizer="adam", metrics=['mean_absolute_percentage_error'])
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size,
                    validation_data=(X_valid, Y_valid))
#%% #Predict the test data
Y_pred = model.predict(X_test)
prediction_lastValues_list=[]
for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append(Y_pred[i][0][steps_forward-1])
#%% Create the dataframe for the whole data
wholeDataFrameWithPrediciton = pd.DataFrame((X_test[:,0]))
wholeDataFrameWithPrediciton.rename(columns = {indexWithYLabelsInData:'actual'}, inplace = True)
wholeDataFrameWithPrediciton.rename(columns = {1:'Feature 1'}, inplace = True)
wholeDataFrameWithPrediciton['predictions'] = prediction_lastValues_list
wholeDataFrameWithPrediciton['difference'] = (wholeDataFrameWithPrediciton['predictions'] - wholeDataFrameWithPrediciton['actual']).abs()
wholeDataFrameWithPrediciton['difference_percentage'] = ((wholeDataFrameWithPrediciton['difference'])/(wholeDataFrameWithPrediciton['actual']))*100
I define steps_forward = int(1 * 4 * 24), which is basically one full day (at 15-minute resolution, which makes 1 * 4 * 24 = 96 time stamps). I predict the test data using Y_pred = model.predict(X_test), and I create a list with the predicted values using for i in range(0, len(Y_pred)): prediction_lastValues_list.append(Y_pred[i][0][steps_forward-1]).
As the input and output data of RNNs is quite confusing to me, I am not sure whether for the test dataset I predict one day in advance, meaning 96 time steps into the future. What I actually want is to read historic data and then predict the next 96 time steps based on the historic 96 time steps. Can anyone tell me whether this code does that or not?
Here is a link to some test data that I just created randomly. Do not care about the actual values, just about the structure of the prediction: Download Test Data
Am I forecasting 96 steps in advance with the given code? (My code is based on a tutorial that can be found here: Tutorial RNN for electricity price prediction.)
Reminder: Can anyone tell me something about my question? Or do you need further information? If so, please tell me.
I'd highly appreciate your comments and would be quite thankful for your help. I will also award a bounty for a useful answer.
So if your goal is to predict the next 96 steps given 96 steps in the past, I think you are over-complicating it with your current model. Why not start off with something simple like this:
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
np.random.seed(42)
tf.random.set_seed(42)
df = pd.read_csv('TestValues.csv', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime':[0]}, index_col=['datetime'])
df = df.drop('value', axis=1)
steps = 96
scaler = MinMaxScaler()
data = scaler.fit_transform(df.values)
series_reshaped = np.array([data[i:i + (steps+steps)].copy() for i in range(len(data) - (steps + steps))])
x_train_index = int(len(series_reshaped)* .80)
x_valid_index = int(len(series_reshaped)* .10)
x_test_index = x_train_index + x_valid_index
X_train = series_reshaped[:x_train_index, :steps]
X_valid = series_reshaped[x_train_index: x_test_index, :steps]
X_test = series_reshaped[x_test_index:, :steps]
Y_train = series_reshaped[:x_train_index, steps:]
Y_valid = series_reshaped[x_train_index: x_test_index, steps:]
Y_test = series_reshaped[x_test_index:, steps:]
model = tf.keras.models.Sequential([
    tf.keras.layers.SimpleRNN(96, return_sequences=True, input_shape=(None, 1)),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))
])
model.compile(loss='mae', optimizer=tf.keras.optimizers.Adam(0.001))
history = model.fit(X_train, Y_train, epochs=20,
                    validation_data=(X_valid, Y_valid))
You simply split your data into 96 steps for training and the 96 steps after them as your "labels". After training, just make your predictions with your test data:
import matplotlib.pyplot as plt
Y_pred = model.predict(X_test)
prediction_list = []
for i in range(0, len(Y_pred)):
    prediction_list.append(Y_pred[i][0])
prediction_df = pd.DataFrame((Y_test[:, 0]))
prediction_df.rename(columns = {0:'actual'}, inplace = True)
prediction_df['predictions'] = prediction_list
prediction_df['difference'] = (prediction_df['predictions'] - prediction_df['actual']).abs()
prediction_df['difference_percentage'] = ((prediction_df['difference'])/(prediction_df['actual']))*100
print(prediction_df)
fig, ax = plt.subplots(figsize = (24,12))
ax.set_title('Temperatures across time', fontsize=20)
ax.set_xlabel('Timesteps', fontsize=20)
ax.tick_params(axis='both', which='major', labelsize=20)
ax.set_ylabel('Temperature', fontsize=20)
plt1 = ax.plot(prediction_df['predictions'][steps:], color = 'g', label='predictions')
plt2 = ax.plot(prediction_df['actual'][steps:], color = 'r', label='actual')
ax.legend(loc='upper left', prop={'size': 20})
actual predictions difference difference_percentage
0 0.540650 [0.52996427] [0.010686159] [1.9765377]
1 0.550813 [0.5463712] [0.0044417977] [0.8064075]
2 0.544715 [0.54527795] [0.00056248903] [0.1032629]
3 0.543360 [0.5469178] [0.003557384] [0.65470064]
4 0.547425 [0.5332471] [0.014178336] [2.590003]
.. ... ... ... ...
977 0.410569 [0.440537] [0.029967904] [7.2991133]
978 0.395664 [0.44218686] [0.046522915] [11.758189]
979 0.414634 [0.448785] [0.03415087] [8.236386]
980 0.414634 [0.43778685] [0.023152709] [5.5838885]
981 0.409214 [0.45098385] [0.041769773] [10.207315]
Note that this model can be improved in a lot of ways, but I want you to understand the basics, which is why I tried to make it as simple as possible. After you have understood this approach, you can try an autoregressive approach as mentioned by elbe. Also note that I have not de-normalised your data, which is why you get very low values.
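On the de-normalisation point: the scaled predictions can be mapped back to the original units with the same scaler (a sketch, assuming the scaler was fitted on the single temperature column as in the code above):
# invert the MinMax scaling; inverse_transform expects shape (n_samples, n_features)
pred_scaled = np.array(prediction_list).reshape(-1, 1)
pred_original = scaler.inverse_transform(pred_scaled)
actual_original = scaler.inverse_transform(prediction_df[['actual']].values)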
First, I suggest you read Tensorflow's tutorial on time series forecasting.
I played around a bit with your code and the data provided.
The first important thing is that only the temperature column contains information.
In the code below, I prepare the data so that X covers a time window of 96 samples/steps and the next step is in Y. X is of dimension (n_samples, 96, 1) and Y of dimension (n_samples,). I use only steps_backwards points for the past (and discard the future for simplicity, without affecting generality).
I have tried different models (a simple fully connected model, or RNN + FC, etc.).
I do mean pooling (with the functional API rather than the sequential model definition approach) so that I have a single predicted value at the end.
X_train = series_reshaped_X[:timeslot_x_train_end, :steps_backwards, 1][:, :, np.newaxis]
X_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, :steps_backwards, 1][:, :, np.newaxis]
X_test = series_reshaped_X[timeslot_x_valid_end:, :steps_backwards, 1][:, :, np.newaxis]
Y_train = series_reshaped_X[:timeslot_x_train_end, steps_backwards, 1]
Y_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, steps_backwards, 1]
Y_test = series_reshaped_X[timeslot_x_valid_end:, steps_backwards, 1]
# define the model
input = tf.keras.Input(shape=(96, 1))
x = input
x = keras.layers.SimpleRNN(10, return_sequences=False, input_shape=[96, 1])(x)
x = keras.layers.Dense(5)(x)
x = tf.reduce_mean(x, axis=1)
model = tf.keras.Model(inputs=input, outputs=x)
model.compile(loss="mean_squared_error", optimizer="adam", metrics=['mae'])
With return_sequences=False, the RNN outputs only the last predicted value.
Model:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_5 (InputLayer) [(None, 96, 1)] 0
_________________________________________________________________
simple_rnn_27 (SimpleRNN) (None, 10) 120
_________________________________________________________________
dense_21 (Dense) (None, 5) 55
_________________________________________________________________
tf.math.reduce_mean_3 (TFOpL (None,) 0
=================================================================
Total params: 175
Trainable params: 175
Non-trainable params: 0
If you set return_sequences=True, the entire output sequence is output, but the prediction time step in the RNN is still one. It is explained here.
One way to predict more steps is an autoregressive approach, i.e. concatenating the n-1 previous data points and the predicted value to get the next value (see the sketch after the next model summary). Another (better) way is to consider that the RNN captures the time dependency in the input, so another possible model, if we consider that the input and the output data have the same shape, could be:
input = tf.keras.Input(shape=(96, 1))
x = input
x = keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[96, 1])(x)
x = keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs=input, outputs=x)
model.compile(loss="mean_squared_error", optimizer="adam", metrics=['mae'])
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_7 (InputLayer) [(None, 96, 1)] 0
_________________________________________________________________
simple_rnn_29 (SimpleRNN) (None, 96, 10) 120
_________________________________________________________________
dense_23 (Dense) (None, 96, 1) 11
=================================================================
Total params: 131
Trainable params: 131
Non-trainable params: 0
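For completeness, the autoregressive idea mentioned above could look roughly like this (a sketch only; model here is the first variant above, the one with return_sequences=False and mean pooling that predicts a single next value per 96-step window):
# start from the last known 96 steps and feed predictions back in
window = X_test[0]                    # shape (96, 1)
predictions = []
for _ in range(96):                   # forecast 96 steps ahead, one at a time
    next_val = model.predict(window[np.newaxis, :, :])[0]  # single predicted value
    predictions.append(next_val)
    # slide the window: drop the oldest step, append the new prediction
    window = np.concatenate([window[1:], [[next_val]]], axis=0)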
In a way, you can think of the RNN as being able to capture the temporal dependencies in the sequence. It can be combined with other layers to give a better predictor (e.g. the dense layer as you did, or stacked RNNs, etc.).
Note that the number of parameters in the model summary gives you an idea of the ability of the network to learn complex relationships between inputs and outputs (and of the risk of overfitting if the number of parameters is too high).
I have a dir with NumPy array files: bias1.npy, kernel1.npy, bias2.npy, kernel2.npy. How can I build a TF model that uses those arrays as kernels and biases of layers?
To avoid confusion: for consistency of the numpy files, the bias is stored as a 2D matrix with a single row (e.g. shape (1, 80)). This post shows how I reproduced tf's model from the numpy weights and biases.
class NumpyInitializer(tf.keras.initializers.Initializer):
    # custom class converting numpy arrays to tf's initializers
    # used to initialize both kernel and bias
    def __init__(self, array):
        # convert numpy array into tensor
        self.array = tf.convert_to_tensor(array.tolist())

    def __call__(self, shape, dtype=None):
        # return tensor
        return self.array

def restore_model_from_numpy(directory):
    """
    Recreate model from the numpy files.
    Numpy files in the directory are ordered by layers
    and the bias numpy matrix comes before the numpy weight matrix.
    For example:
    directory -
        - L1B.npy  # numpy bias matrix for layer 1
        - L1W.npy  # numpy weights matrix for layer 1
        - L2B.npy  # numpy bias matrix for layer 2
        - L2W.npy  # numpy weights matrix for layer 2
    Parameters:
        directory - path to the directory with numpy files
    Return:
        tf's model recreated from the numpy files
    """
    def file_iterating(directory):
        """
        Iterate over the directory and create a
        dictionary mapping layer number to its structure:
        layers[layer_number] = [numpy_bias_matrix, numpy_weight_matrix]
        """
        pathlist = Path(directory).rglob("*.npy")  # list of numpy files
        layers = {}  # initialize dictionary
        index = 0
        for file in pathlist:  # iterate over files in the directory
            if index % 2 == 0:
                layers[int(index/2)] = []  # next layer - new key in dictionary
            layers[int(index/2)].append(np.load(file))  # add bias or weight matrix to dictionary
            index += 1
            print(file)  # optional, to show the list of files we deal with
        return layers  # return dictionary

    layers = file_iterating(directory)  # get dictionary with the model structure
    inputs = Input(shape=(np.shape(layers[0][1])[0],))  # create the model's input layer
    x = inputs
    for key, value in layers.items():  # iterate over all layers in the layers dictionary
        bias_initializer = NumpyInitializer(layers[key][0][0])  # bias initializer for this layer
        kernal_initializer = NumpyInitializer(layers[key][1])  # weights initializer for this layer
        layer_size = np.shape(layers[key][0])[-1]  # get the size of the layer
        new_layer = tf.keras.layers.Dense(  # initialize a new Dense layer
            units=layer_size,
            kernel_initializer=kernal_initializer,
            bias_initializer=bias_initializer,
            activation="tanh")
        x = new_layer(x)  # stack the layer on top of the previous layer
    model = tf.keras.Model(inputs, x)  # create tf's model from the stacked layers
    model.compile()  # compile model
    return model  # return compiled model
In my directory, I had 4 numpy files (layer 1 - L1 and layer 2 - L2):
100_5_25_1Knapsack_Layer1\100_5_25_1Knapsack\L1B.npy , shape: (1, 80)
100_5_25_1Knapsack_Layer1\100_5_25_1Knapsack\L1W.npy , shape: (100, 80)
100_5_25_1Knapsack_Layer1\100_5_25_1Knapsack\L2B.npy , shape: (1, 100)
100_5_25_1Knapsack_Layer1\100_5_25_1Knapsack\L2W.npy , shape: (80, 100)
Calling the function results in:
m = restore_model_from_numpy(my_numpy_files_directory)
m.summary()
Model: "model_592"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_312 (InputLayer) [(None, 100)] 0
_________________________________________________________________
dense_137 (Dense) (None, 80) 8080
_________________________________________________________________
dense_138 (Dense) (None, 100) 8100
=================================================================
Total params: 16,180
Trainable params: 16,180
Non-trainable params: 0
_________________________________________________________________
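As a quick sanity check (a sketch, assuming the same directory layout as above), you can compare the weights the restored model actually holds against the numpy files:
kernel, bias = m.layers[1].get_weights()  # first Dense layer after the input
print(np.allclose(kernel, np.load("100_5_25_1Knapsack_Layer1/100_5_25_1Knapsack/L1W.npy")))  # True
print(np.allclose(bias, np.load("100_5_25_1Knapsack_Layer1/100_5_25_1Knapsack/L1B.npy")[0]))  # True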
I hope that this post will be helpful to someone, as it's my first one.
Happy coding :D
I am training a Tensorflow model with LSTMs for predictive maintenance. For each instance I create a matrix (50, 4), where 50 is the length of the history sequence and 4 is the number of features for each record, so for training the model I use e.g. a (55048, 50, 4) tensor and a (55048, 1) tensor as labels. When I train in Jupyter on my computer it works (very slowly, but it works), but on Colab I get this error:
Training data shape is (55048, 50, 4)
Labels shape is (55048, 1)
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 50, 100) 42000
_________________________________________________________________
dense (Dense) (None, 50, 1) 101
=================================================================
Total params: 42,101
Trainable params: 42,101
Non-trainable params: 0
_________________________________________________________________
Epoch 1/50
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
ValueError: in user code:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:571 train_function *
outputs = self.distribute_strategy.run(
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:951 run **
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2290 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2649 _call_for_each_replica
return fn(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:543 train_step **
self.compiled_metrics.update_state(y, y_pred, sample_weight)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:406 update_state
metric_obj.update_state(y_t, y_p)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/metrics_utils.py:90 decorated
update_op = update_state_fn(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/metrics.py:2083 update_state
label_weights=label_weights)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/metrics_utils.py:351 update_confusion_matrix_variables
y_pred.shape.assert_is_compatible_with(y_true.shape)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_shape.py:1117 assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (None, 50) and (None, 1) are incompatible
I share with you some pieces of code. I know it is quite long:
def build_lstm(train_data, train_labels, structure=(100,), epochs=50, activation_fun="relu", dropout_rate=0.1,
               loss_function="binary_crossentropy", optimizer="adagrad", val_split=0.2, seq_length=50):
    #n_features = len(train_data.columns)
    print("Train data is\n", train_data)
    acceptable_ids = [idx for idx in train_data['id'].unique() if train_data[train_data['id']==idx].shape[0]>seq_length]
    seq_gen = [list(gen_sequence(train_data[train_data['id']==idx], seq_length)) for idx in acceptable_ids]
    print("Seq gen is\n")
    print(np.array(seq_gen).shape)
    seq_array = np.concatenate(seq_gen, 0).astype(np.float32)
    print("Training data shape is", seq_array.shape)
    #train_labels = np.asarray(train_labels).astype('float32').reshape((-1,1))
    label_gen = [gen_labels(train_labels[train_labels['id']==idx], seq_length) for idx in acceptable_ids]
    label_array = np.concatenate(label_gen).astype(np.float32)
    print("Labels shape is", label_array.shape)
    first_layer = True
    model = tf.keras.Sequential()
    for layer_nodes in structure:
        if first_layer:
            model.add(LSTM(layer_nodes, activation=activation_fun, input_shape=(seq_length, train_data.shape[1]-1),
                           dropout=dropout_rate, return_sequences=True))
            first_layer = False
        else:
            model.add(LSTM(layer_nodes, activation=activation_fun,
                           dropout=dropout_rate, return_sequences=False))
    model.add(Dense(1, activation='sigmoid'))
    model.summary()
    model.compile(loss=loss_function,
                  optimizer=optimizer,
                  metrics=['AUC','accuracy'])
    history = model.fit(seq_array, label_array, epochs=epochs, shuffle=True, validation_split=val_split, callbacks=[earlystop_callback])
    return model
def gen_sequence(id_df, seq_length):
    """ Only sequences that meet the window-length are considered, no padding is used. This means for testing
    we need to drop those which are below the window-length. An alternative would be to pad sequences so that
    we can use shorter ones """
    # for one id I put all the rows in a single matrix
    data_matrix = id_df.drop("id", 1).values
    num_elements = data_matrix.shape[0]
    # Iterate over two lists in parallel.
    # For example, id1 has 192 rows and sequence_length is equal to 50,
    # so zip iterates over the two following ranges of numbers: (0,142), (50,192)
    # 0 50 -> from row 0 to row 50
    # 1 51 -> from row 1 to row 51
    # 2 52 -> from row 2 to row 52
    # ...
    # 141 191 -> from row 141 to 191
    for start, stop in zip(range(0, num_elements-seq_length), range(seq_length, num_elements)):
        #print(data_matrix[start:stop, :], "\n")
        yield data_matrix[start:stop, :]

def gen_labels(id_df, seq_length):
    data_array = id_df.drop("id", 1).values
    num_elements = data_array.shape[0]
    return data_array[seq_length:num_elements, :]
...
for comb_hyp in hyp_combinations:
    for id_validation in training_folds_2:
        print(id_validation)
        ## SEPARATE TRAINING SET AND VALIDATION SET
        X_val = X[X.id.isin(id_validation)].copy()
        X_train = X[~X.id.isin(id_validation)].copy()
        y_val = y[y.id.isin(id_validation)].copy()
        y_train = y[~y.id.isin(id_validation)].copy()
        ## TRAIN THE CLASSIFIER
        clf = build_lstm(train_data=X_train, train_labels=y_train, structure=comb_hyp[2], epochs=EPOCHS, activation_fun=comb_hyp[0], optimizer=SOLVER, seq_length=SEQ_LENGTH)
...
Why does it work in Jupyter and not in Colab? Thanks for your attention.
In my case, I uninstalled tensorflow, installed tensorflow-gpu instead, and the problem was solved.
I was already working with the runtime set to GPU. It works if, as the last layer, I put not a Dense layer with one node (for binary classification) but an LSTM layer with one node. Maybe it is because LSTM and Dense should not be mixed.
Thank you for your replies.
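A note on why the shapes clash: with structure=(100,), build_lstm adds the single LSTM with return_sequences=True, so the Dense(1) on top emits one prediction per timestep, (None, 50, 1), while the labels are (None, 1); the traceback shows the AUC metric is what raises on this mismatch, which would explain why an older local install did not complain. A minimal sketch of an alternative fix (besides the LSTM(1) one above): set return_sequences=False on the final recurrent layer so only the last timestep reaches the Dense head:
import tensorflow as tf

seq_length, n_features = 50, 4
model = tf.keras.Sequential([
    # return_sequences=False -> output (None, 100) instead of (None, 50, 100)
    tf.keras.layers.LSTM(100, input_shape=(seq_length, n_features),
                         return_sequences=False),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # -> (None, 1), matches the labels
])
model.compile(loss='binary_crossentropy', optimizer='adagrad', metrics=['AUC', 'accuracy'])
model.summary()  # final output shape: (None, 1)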
I have two Keras (Tensorflow backend) models, which are stacked to make a combined model:
small_model with In: (None,K), Out: (None,K)
large_model with In: (None,N,K), Out: (None,1)
combined_model (N x small_model -> large_model) with In: (None,N,K), Out: (None,1)
large_model needs N stacked outputs from small_model as input.
I can define N small_models, which share weights, then concatenate their outputs (technically, I need to stack them), and then send that to large_model, as in the code below.
My problem is that I need to be able to do this for very large N (> 10**6), and that my current solution uses a lot of memory and time when creating the models, even for N ~ 10**2.
I'm hoping there is a solution that sends the N data points through small_model in parallel (like what is done when giving a batch to a model), collects those outputs (with the Keras history, so that backprop is possible), and sends them to large_model, without having to define N instances of small_model. The listed input and output shapes of the three models should not change, but other intermediate models can of course be defined.
Thank you.
Current unsatisfactory solution (assume that small_model and large_model already exist, and that N,K are defined):
from keras.layers import Input, Lambda
from keras.models import Model
from keras import backend as K
def build_small_model_on_batch():
    def distribute_inputs_to_small_model(input):
        return [small_model(input[:, i]) for i in range(N)]

    def stacker(list_of_tensors):
        return K.stack(list_of_tensors, axis=1)

    input = Input(shape=(N, K,))
    small_model_outputs = Lambda(distribute_inputs_to_small_model)(input)
    stacked_small_model_outputs = Lambda(stacker)(small_model_outputs)
    return Model(input, stacked_small_model_outputs)

def build_combined():
    input = Input(shape=(N, K,))
    stacked_small_model_outputs = small_model_on_batch(input)
    output = large_model(stacked_small_model_outputs)
    return Model(input, output)

small_model_on_batch = build_small_model_on_batch()
combined = build_combined()
You can do that with a TimeDistributed layer wrapper:
from keras.layers import Input, Dense, TimeDistributed
from keras.models import Sequential, Model
N = None # Use fixed value if you do not want variable input size
K = 20
def small_model():
    inputs = Input(shape=(K,))
    # Define the small model
    # Here it is just a single dense layer
    outputs = Dense(K, activation='relu')(inputs)
    return Model(inputs=inputs, outputs=outputs)

def large_model():
    inputs = Input(shape=(N, K))
    # Define the large model
    # Just a single neuron here
    outputs = Dense(1, activation='relu')(inputs)
    return Model(inputs=inputs, outputs=outputs)

def combined_model():
    inputs = Input(shape=(N, K))
    # The TimeDistributed layer applies the given model
    # to every input across dimension 1 (N)
    small_model_out = TimeDistributed(small_model())(inputs)
    # Apply large model
    outputs = large_model()(small_model_out)
    return Model(inputs=inputs, outputs=outputs)

model = combined_model()
model.compile(loss='mean_squared_error', optimizer='sgd')
model.summary()
Output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, None, 20) 0
_________________________________________________________________
time_distributed_1 (TimeDist (None, None, 20) 420
_________________________________________________________________
model_2 (Model) (None, None, 1) 21
=================================================================
Total params: 441
Trainable params: 441
Non-trainable params: 0
_________________________________________________________________
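A quick usage check (sketch): since N is left as None, the same compiled model accepts batches with different sequence lengths:
import numpy as np

x1 = np.random.rand(8, 100, K)  # batch of 8, N = 100
x2 = np.random.rand(8, 500, K)  # same model, N = 500
print(model.predict(x1).shape)  # (8, 100, 1)
print(model.predict(x2).shape)  # (8, 500, 1)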
I have two separately designed CNNs for two different features (image and text) of the same data, and the output has two classes.
In the very last layer:
for the image branch (resnet), I would like to use "he_normal" as the initializer:
flatten1 = Flatten()(image_maxpool)
dense = Dense(output_dim=2, kernel_initializer="he_normal")(flatten1)
but for the text CNN, I would like to use the default "glorot_normal":
flatten2 = Flatten()(text_maxpool)
output = Dense(output_dim=2, kernel_initializer="glorot_normal")(flatten2)
the flatten1 and flatten2 have sizes:
flatten_1 (Flatten) (None, 512)
flatten_2 (Flatten) (None, 192)
Is there any way I can concatenate these two flatten layers and have one long dense layer of size 512 + 192 = 704, where the first 512 units and the last 192 units have two separate kernel initializers, and produce a 2-class output?
Something like this:
merged_tensor = merge([flatten1, flatten2], mode='concat', concat_axis=1)
output = Dense(output_dim=2,
kernel_initializer for [:512]='he_normal',
kernel_initializer for [512:]='glorot_normal')(merged_tensor)
Edit: I think I have gotten this to work with the following code (thanks to @Aechlys):
def my_init(shape, shape1, shape2):
    x = initializers.he_normal()(shape1)
    y = initializers.glorot_normal()(shape2)
    return tf.concat([x, y], 0)

class_num = 2
flatten1 = Flatten()(image_maxpool)
flatten2 = Flatten()(text_maxpool)
merged_tensor = concatenate([flatten1, flatten2], axis=-1)
output = Dense(output_dim=class_num, kernel_initializer=lambda shape: my_init(shape,
                                                                              shape1=(512, class_num),
                                                                              shape2=(192, class_num)),
               activation='softmax')(merged_tensor)
I have to manually add the shape sizes 512 and 192, because I failed to get the size of flatten1 and flatten2 via the code
flatten1.get_shape().as_list()
which gave me [None, None], although it should be [None, 512]. Other than that it should be fine.
Oh my, have I had fun with this one. You have to create your own kernel initializer:
def my_init(shape, dtype=None, *, shape1, shape2):
    x = keras.initializers.he_normal()(shape1, dtype=dtype)
    y = keras.initializers.glorot_normal()(shape2, dtype=dtype)
    return tf.concat([x, y], 0)
Then you will call it via a lambda function within the Dense function:
Unfortunately, as you can see, I have not been able to deduce the shapes programmatically yet. I may update this answer when I do. But if you know the shapes beforehand, you can pass them as constants:
DENSE_UNITS = 64

input_t = Input((1, 25))
input_i = Input((1, 35))
input_a = Concatenate(axis=-1)([input_t, input_i])
dense = Dense(DENSE_UNITS, kernel_initializer=lambda shape: my_init(shape,
                                                                    shape1=(int(input_t.shape[-1]), DENSE_UNITS),
                                                                    shape2=(int(input_i.shape[-1]), DENSE_UNITS)))(input_a)
tf.keras.Model(inputs=[input_t, input_i], outputs=dense)
Out: <tensorflow.python.keras._impl.keras.engine.training.Model at 0x19ff7baac88>
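A quick sanity check for the combined initializer (a sketch): the dense kernel should have 25 + 35 = 60 rows, with the first 25 drawn from he_normal and the remaining 35 from glorot_normal, so the two halves show clearly different spreads:
model = tf.keras.Model(inputs=[input_t, input_i], outputs=dense)
kernel = model.get_weights()[0]  # only the Dense layer has weights here
print(kernel.shape)        # (60, 64)
print(kernel[:25].std())   # he_normal half: larger spread (~sqrt(2/25), truncated)
print(kernel[25:].std())   # glorot_normal half: smaller spread (~sqrt(2/(35+64)), truncated)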