I have recently started learning how to build an LSTM model for multivariate time series data. I have looked at two earlier posts on how to pad sequences and implement a many-to-many LSTM model. I created a dataframe to test the model, but I keep getting an error (below).
import pandas as pd
d = {'ID':['a12', 'a12','a12','a12','a12','b33','b33','b33','b33','v55','v55','v55','v55','v55','v55'], 'Exp_A':[2.2,2.2,2.2,2.2,2.2,3.1,3.1,3.1,3.1,1.5,1.5,1.5,1.5,1.5,1.5],
'Exp_B':[2.4,2.4,2.4,2.4,2.4,1.2,1.2,1.2,1.2,1.5,1.5,1.5,1.5,1.5,1.5],
'A':[0,0,1,0,1,0,1,0,1,0,1,1,1,0,1], 'B':[0,0,1,1,1,0,0,1,1,1,0,0,1,0,1],
'Time_Interval': ['11:00:00', '11:10:00', '11:20:00', '11:30:00', '11:40:00',
'11:00:00', '11:10:00', '11:20:00', '11:30:00',
'11:00:00', '11:10:00', '11:20:00', '11:30:00', '11:40:00', '11:50:00']}
df = pd.DataFrame(d)
df.set_index('Time_Interval', inplace=True)
I tried to pad using brute force:
import numpy as np
from keras.preprocessing.sequence import pad_sequences
x1 = df['A'][df['ID']== 'a12']
x2 = df['A'][df['ID']== 'b33']
x3 = df['A'][df['ID']== 'v55']
mx = df.groupby('ID').size().max() # Find the largest group
seq1 = [x1, x2, x3]
padded1 = np.array(pad_sequences(seq1, maxlen=mx, dtype='float32')).reshape(-1,mx,1)
In the same way I created padded2, padded3 and padded4 for the other features:
padded_data = np.dstack((padded1, padded1, padded3, padded4))
padded_data.shape # (3, 6, 4)
padded_data
array([[[0. , 0. , 0. , 0. ],
[0. , 0. , 2.2, 2.4],
[0. , 0. , 2.2, 2.4],
[1. , 1. , 2.2, 2.4],
[0. , 0. , 2.2, 2.4],
[1. , 1. , 2.2, 2.4]],
[[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 3.1, 1.2],
[1. , 1. , 3.1, 1.2],
[0. , 0. , 3.1, 1.2],
[1. , 1. , 3.1, 1.2]],
[[0. , 0. , 1.5, 1.5],
[1. , 1. , 1.5, 1.5],
[1. , 1. , 1.5, 1.5],
[1. , 1. , 1.5, 1.5],
[0. , 0. , 1.5, 1.5],
[1. , 1. , 1.5, 1.5]]], dtype=float32)
Edit:
#split into train/test
train = padded_data[:2] # train on the 1st two samples.
test = padded_data[-1:]
train_X = train[:,:-1] # one step ahead prediction.
train_y = train[:,1:]
test_X = test[:,:-1] # test on the last sample
test_y = test[:,1:]
# check shapes
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
#(2, 5, 4) (2, 5, 4) (1, 5, 4) (1, 5, 4)
# design network
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(train.shape[1], train.shape[2])))
model.add(LSTM(32, input_shape=(train.shape[1], train.shape[2]), return_sequences=True))
model.add(Dense(4))
model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
model.summary()
# fit network
history = model.fit(train, test, epochs=300, validation_data=(test_X, test_y), verbose=2, shuffle=False)
[Screenshot of the error message]
So my questions are:
Surely, there must be an efficient way of transforming the data?
Say I want a single time-step prediction for a future sequence, and I have
first time-step = array([[[0.5 , 0.9 , 2.5, 3.5]]], dtype=float32)
where first time-step is a single 'frame' of a sequence.
How do I adjust the model to incorporate this?
To resolve the error, remove return_sequences=True from the LSTM layer arguments (with the architecture you have defined, you only need the output of the last timestep), and use train[:, -1] and test[:, -1] (instead of train[:, -1:] and test[:, -1:]) to extract the labels. Dropping the trailing : removes the second axis, which makes the label shape consistent with the output shape of the model.
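The difference between the two slicing forms is easy to check on a dummy array. This is a minimal sketch that assumes a placeholder array with the same (samples, timesteps, features) layout as the padded data; the zeros are illustrative and are not the asker's values.

```python
import numpy as np

# Placeholder with the question's layout: 2 samples, 6 timesteps, 4 features.
train = np.zeros((2, 6, 4), dtype="float32")

labels_kept_axis = train[:, -1:]    # a slice keeps the timestep axis
labels_dropped_axis = train[:, -1]  # an integer index drops it

print(labels_kept_axis.shape)     # (2, 1, 4)
print(labels_dropped_axis.shape)  # (2, 4)
```

An LSTM without return_sequences=True followed by Dense(4) produces one vector of 4 values per sample, which is the second shape.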
As a side note, wrapping a Dense layer inside a TimeDistributed layer is redundant, since the Dense layer is applied on the last axis.
Update: As for the new question, either pad the input sequence which has only one timestep to make it have a shape of (5,4), or alternatively set the input shape of the first layer (i.e. Masking) to input_shape=(None, train.shape[2]) so the model can work with inputs of varying length.
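On the first question (a less manual way of building the padded array), one possible approach is to pad whole per-ID feature matrices at once instead of one feature at a time. The sketch below uses only NumPy; pad_groups is a hypothetical helper name, and the toy groups stand in for the three IDs in the question. It pads at the front, which is also what pad_sequences does by default.

```python
import numpy as np

def pad_groups(groups, value=0.0):
    """Left-pad a list of (timesteps, features) arrays to a common length."""
    max_len = max(len(g) for g in groups)
    n_features = groups[0].shape[1]
    out = np.full((len(groups), max_len, n_features), value, dtype="float32")
    for i, g in enumerate(groups):
        # Write each group at the end of its row so the padding sits in front.
        out[i, max_len - len(g):, :] = g
    return out

# Toy stand-ins for the per-ID groups: 5, 4 and 6 timesteps, 4 features each.
groups = [np.ones((5, 4)), np.ones((4, 4)), np.ones((6, 4))]
padded = pad_groups(groups)
print(padded.shape)  # (3, 6, 4)
```

With a DataFrame like the one in the question, the list of groups could be built with something like [g[cols].to_numpy() for _, g in df.groupby('ID')], where cols is an assumed list of the feature column names.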
Related
My initial data is:
[[ 8375.5 0. 8374.14285714 8374.14285714]
[ 8354.5 0. 8383.39285714 8371.52380952]
...
[11060. 0. 11055.21428571 11032.53702732]
[11076.5 0. 11061.60714286 11038.39875701]]
I create a MinMaxScaler to scale the data to values between 0 and 1:
scaler = MinMaxScaler(feature_range = (0, 1))
T = scaler.fit_transform(T)
So data now is:
[[0.5186697 , 0. , 0.46812344, 0.46950912],
[0.5161844 , 0. , 0.46935928, 0.46915412],
...,
[0.72264636, 0. , 0.6767292 , 0.6807525 ],
[0.7198651 , 0. , 0.6785377 , 0.6833385 ]]
I do some magic to prepare this data for LSTM layer and this is the result:
X_train variable of shape (6989, 4, 200)
[[[0.5186697 0. 0.46812344 ... 0. 0.45496237 0.45219505]
[0.48742527 0. 0.45273864 ... 0. 0.43144143 0.431924 ]
[0.4800284 0. 0.43054438 ... 0. 0.425362 0.4326681 ]
[0.5007989 0. 0.4290794 ... 0. 0.4696839 0.47831726]]
...
[[0.61240304 0. 0.57254803 ... 0. 0.5749577 0.57792616]
[0.61139715 0. 0.5746571 ... 0. 0.5971378 0.6017289 ]
[0.6365465 0. 0.59772 ... 0. 0.62671924 0.63145673]
[0.65719867 0. 0.62684333 ... 0. 0.6757128 0.6772785 ]]]
I process the data using this model with Dense(1) layer at the end:
model = Sequential()
model.add(LSTM(units = 50, activation = 'relu', #return_sequences = True,
input_shape = (X_train.shape[1], window_size)))
model.add(Dropout(0.2))
model.add(Dense(1, activation = 'linear'))
model.compile(loss = 'mean_squared_error', optimizer = 'adam')
When I set return_sequences to False, the predictions have shape (6989, 1), and when I try to undo the scaling with scaler.inverse_transform(train_predict) using this scaler, I get an error:
ValueError: non-broadcastable output operand with shape (6989,1) doesn't match the broadcast shape (6989,4)
When I set return_sequences to True, the predictions have shape (6989, 4, 1), and inverse_transform gives a different error:
ValueError: Found array with dim 3. None expected <= 2.
============
I think I know why I get these errors: the scaler requires an array of shape (6989, 4). But how can I transform the predictions so that I can call inverse_transform?
How can I inverse_transform data of shape (6989, 1)?
How can I inverse_transform data of shape (6989, 4, 1)?
Is it doable? Can my scaler be used, or should I create a new one? Can you suggest something? What am I missing?
I will appreciate any help, thanks!
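One way to think about the (6989, 1) case is that min-max scaling works column by column, so a single predicted column can be unscaled with that column's own minimum and maximum. The sketch below reproduces the arithmetic with NumPy only. The toy array and the choice of target_col = 0 are assumptions, since the question does not say which column the network predicts.

```python
import numpy as np

# Made-up data standing in for T (4 columns).
T = np.array([[10.0, 1.0, 100.0, 5.0],
              [20.0, 3.0, 150.0, 6.0],
              [40.0, 2.0, 300.0, 9.0]])

data_min = T.min(axis=0)
data_max = T.max(axis=0)
T_scaled = (T - data_min) / (data_max - data_min)  # same formula as feature_range=(0, 1)

target_col = 0                            # assumption: the model predicts column 0
pred_scaled = T_scaled[:, [target_col]]   # stand-in for train_predict, shape (n, 1)

# Option 1: undo the scaling for that one column only.
col_range = data_max[target_col] - data_min[target_col]
pred = pred_scaled * col_range + data_min[target_col]

# Option 2: pad to 4 columns, apply the full inverse, keep the target column.
padded = np.zeros((len(pred_scaled), T.shape[1]))
padded[:, target_col] = pred_scaled[:, 0]
pred_via_padding = (padded * (data_max - data_min) + data_min)[:, target_col]
```

A fitted MinMaxScaler exposes the same per-column values as data_min_ and data_max_, and option 2 corresponds to calling scaler.inverse_transform(padded) and keeping one column. For the (6989, 4, 1) output, dropping the last axis with train_predict[:, :, 0] gives a 2-D array of the shape the scaler accepts, though whether its 4 columns line up with the scaler's 4 columns depends on how the windows were built.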
I am trying to implement a custom layer that would preprocess a tokenized sequence of words into a matrix with a predefined number of elements equal to the size of vocabulary. Essentially, I'm trying to implement a 'bag of words' layer. This is the closest I could come up with:
def get_encoder(vocab_size=args.vocab_size):
    encoder = TextVectorization(max_tokens=vocab_size)
    encoder.adapt(train_dataset.map(lambda text, label: text))
    return encoder
class BagOfWords(tf.keras.layers.Layer):
    def __init__(self, vocab_size=args.small_vocab_size, batch_size=args.batch_size):
        super(BagOfWords, self).__init__()
        self.vocab_size = vocab_size
        self.batch_size = batch_size
    def build(self, input_shape):
        super().build(input_shape)
    def call(self, inputs):
        if inputs.shape[-1] == None:
            return tf.constant(np.zeros([self.batch_size, self.vocab_size])) # 32 is the batch size
        outputs = tf.zeros([self.batch_size, self.vocab_size])
        if inputs.shape[-1] != None:
            for i in range(inputs.shape[0]):
                for ii in range(inputs.shape[-1]):
                    ouput_idx = inputs[i][ii]
                    outputs[i][ouput_idx] = outputs[i][ouput_idx] + 1
        return outputs
model = keras.models.Sequential()
model.add(encoder)
model.add(bag_of_words)
model.add(keras.layers.Dense(64, activation='relu'))
model.add(keras.layers.Dense(1, activation='sigmoid'))
No surprise that I get an error when calling fit() on the model: "Incompatible shapes: [8,1] vs. [32,1]". This happens on the last steps, when the batch size is less than 32.
My question is: Putting aside performance, how do I define the outputs Tensor for my bag of words matrix so that it has a dynamic shape for batching and get my code working?
Edit 1
After the comment, I realised that the code indeed doesn't work, because it never reaches the 'else' branch.
I edited it a bit so that it uses only tf functions:
class BagOfWords(tf.keras.layers.Layer):
    def __init__(self, vocab_size=args.small_vocab_size, batch_size=args.batch_size):
        super(BagOfWords, self).__init__()
        self.vocab_size = vocab_size
        self.batch_size = batch_size
        self.outputs = tf.Variable(tf.zeros([batch_size, vocab_size]))
    def build(self, input_shape):
        super().build(input_shape)
    def call(self, inputs):
        if tf.shape(inputs)[-1] == None:
            return tf.zeros([self.batch_size, self.vocab_size])
        self.outputs.assign(tf.zeros([self.batch_size, self.vocab_size]))
        for i in range(tf.shape(inputs)[0]):
            for ii in range(tf.shape(inputs)[-1]):
                output_idx = inputs[i][ii]
                if output_idx >= tf.constant(self.vocab_size, dtype=tf.int64):
                    output_idx = tf.constant(1, dtype=tf.int64)
                self.outputs[i][output_idx].assign(self.outputs[i][output_idx] + 1)
        return outputs
It didn't help though: AttributeError: 'Tensor' object has no attribute 'assign'.
Here is an example of a Bag-of-Words custom keras layer without using any additional preprocessing layers:
import tensorflow as tf
class BagOfWords(tf.keras.layers.Layer):
    def __init__(self, vocabulary_size):
        super(BagOfWords, self).__init__()
        self.vocabulary_size = vocabulary_size
    def call(self, inputs):
        batch_size = tf.shape(inputs)[0]
        outputs = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
        for i in range(batch_size):
            string = inputs[i]
            string_length = tf.shape(tf.where(tf.math.not_equal(string, b'')))[0]
            string = string[:string_length]
            string_array = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
            for s in string:
                string_array = string_array.write(string_array.size(), tf.where(tf.equal(s, self.vocabulary_size), 1.0, 0.0))
            outputs = outputs.write(i, tf.cast(tf.reduce_any(tf.cast(string_array.stack(), dtype=tf.bool), axis=0), dtype=tf.float32))
        return outputs.stack()
And here are the manual preprocessing steps and the model:
labels = [[1], [0], [1], [0]]
texts = ['All my cats in a row',
'When my cat sits down, she looks like a Furby toy!',
'The cat from the outer space',
'Sunshine loves to sit like this for some reason.']
DEFAULT_STRIP_REGEX = r'[!"#$%&()\*\+,-\./:;<=>?@\[\\\]^_`{|}~\']'
tensor_of_strings = tf.constant(texts)
tensor_of_strings = tf.strings.lower(tensor_of_strings)
tensor_of_strings = tf.strings.regex_replace(tensor_of_strings, DEFAULT_STRIP_REGEX, "")
split_strings = tf.strings.split(tensor_of_strings).to_tensor()
flattened_split_strings = tf.reshape(split_strings, (split_strings.shape[0] * split_strings.shape[1]))
unique_words, _ = tf.unique(flattened_split_strings)
unique_words = tf.random.shuffle(unique_words)
bag_of_words = BagOfWords(vocabulary_size = unique_words)
train_dataset = tf.data.Dataset.from_tensor_slices((split_strings, labels))
model = tf.keras.Sequential()
model.add(bag_of_words)
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss = tf.keras.losses.BinaryCrossentropy())
model.fit(train_dataset.batch(2), epochs=2)
Epoch 1/2
4/4 [==============================] - 2s 7ms/step - loss: 0.7081
Epoch 2/2
4/4 [==============================] - 0s 6ms/step - loss: 0.7008
<keras.callbacks.History at 0x7f5ba844bad0>
And this is what the 4 encoded sentences look like:
print(bag_of_words(split_strings))
tf.Tensor(
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0.
1. 1. 1. 0.]
[1. 1. 1. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0.
0. 1. 1. 0.]
[0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 1. 0.
0. 0. 0. 0.]
[0. 1. 0. 1. 1. 0. 0. 1. 1. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.
0. 0. 0. 1.]], shape=(4, 28), dtype=float32)
Correct me if I am wrong, but I think that using the output_mode="multi_hot" of the TextVectorization layer would be sufficient to do what you want to do. According to the docs, the multi_hot output mode:
Outputs a single int array per batch, of either vocab_size or max_tokens size, containing 1s in all elements where the token mapped to that index exists at least once in the batch item
So it could be as simple as this:
import tensorflow as tf
def get_encoder():
    encoder = tf.keras.layers.TextVectorization(output_mode="multi_hot")
    encoder.adapt(train_dataset.map(lambda text, label: text))
    return encoder
texts = [
'All my cats in a row',
'When my cat sits down, she looks like a Furby toy!',
'The cat from outer space',
'Sunshine loves to sit like this for some reason.']
labels = [[1], [0], [1], [1]]
train_dataset = tf.data.Dataset.from_tensor_slices((texts, labels))
model = tf.keras.Sequential()
model.add(get_encoder())
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss = tf.keras.losses.BinaryCrossentropy())
model.fit(train_dataset.batch(2), epochs=2)
This is how your texts would be encoded:
import tensorflow as tf
texts = ['All my cats in a row',
'When my cat sits down, she looks like a Furby toy!',
'The cat from outer space',
'Sunshine loves to sit like this for some reason.']
encoder = get_encoder()
inputs = encoder(texts)
print(inputs)
tf.Tensor(
[[0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0.
0. 0. 1. 1.]
[0. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 1. 0. 1. 0.
0. 1. 0. 0.]
[0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1.
0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 1. 1. 0. 1. 0. 1. 0. 1. 0. 0. 1. 0. 1. 0. 0. 0. 0.
1. 0. 0. 0.]], shape=(4, 28), dtype=float32)
So just as you tried in your custom layer, the presence of words in a sequence is marked with 1 and the absence of words is marked with 0.
The answer above by @AloneTogether is perfectly relevant. I just wanted to publish the working code that I came up with in the first place, without manual preprocessing.
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
from tensorflow.keras.layers import TextVectorization
ds, info = tfds.load('imdb_reviews', with_info=True, as_supervised=True, data_dir='/tmp/imdb')
train_dataset = ds['train']
def get_encoder(vocab_size=args.vocab_size):
    encoder = TextVectorization(max_tokens=vocab_size)
    encoder.adapt(train_dataset.map(lambda text, label: text))
    return encoder
class BagOfWords(tf.keras.layers.Layer):
    def __init__(self, vocabulary_size):
        super(BagOfWords, self).__init__()
        self.vocabulary_size = vocabulary_size
    def call(self, inputs):
        batch_size = tf.shape(inputs)[0]
        outputs = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
        for i in range(batch_size):
            int_string = inputs[i]
            # Start each row with a zero count for every vocabulary index.
            array_string = tf.TensorArray(dtype=tf.float32, size=self.vocabulary_size)
            array_string = array_string.unstack(tf.zeros(self.vocabulary_size))
            for int_word in int_string:
                idx = int_word
                # Token ids outside the small vocabulary are counted at index 1.
                idx = tf.cond(idx >= self.vocabulary_size, lambda: 1, lambda: tf.cast(idx, tf.int32))
                array_string = array_string.write(idx, array_string.read(idx) + 1.0)
            outputs = outputs.write(i, array_string.stack())
        return outputs.stack()
encoder = get_encoder(args.small_vocab_size)
bag_of_words = BagOfWords(args.small_vocab_size)
model = keras.models.Sequential()
model.add(encoder)
model.add(bag_of_words)
model.add(keras.layers.Dense(64, activation='relu'))
model.add(keras.layers.Dense(1, activation='sigmoid'))
for d in train_dataset.batch(args.batch_size).take(1):
    model(d[0])
model.compile(optimizer=keras.optimizers.Nadam(learning_rate=1e-3),
loss='binary_crossentropy',
metrics=['accuracy'])
model.summary()
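To sanity-check what a counting layer like this should return, the same bookkeeping can be written in plain NumPy. This is only an illustrative sketch: bag_of_words_counts is a hypothetical helper, the token ids are made up, and ids at or above the vocabulary size are sent to index 1 to mirror the tf.cond in the layer.

```python
import numpy as np

def bag_of_words_counts(token_ids, vocabulary_size):
    """Count token occurrences per row; ids >= vocabulary_size are counted at index 1."""
    token_ids = np.asarray(token_ids)
    clipped = np.where(token_ids >= vocabulary_size, 1, token_ids)
    rows = [np.bincount(row, minlength=vocabulary_size) for row in clipped]
    return np.stack(rows).astype("float32")

# Made-up batch of token ids (0 is the padding id and is counted like any other).
batch = [[2, 3, 3, 0],
         [7, 1, 4, 4]]
counts = bag_of_words_counts(batch, vocabulary_size=5)
print(counts)
# [[1. 0. 1. 2. 0.]
#  [0. 2. 0. 0. 2.]]
```

Because the output shape follows the input's first axis, a final batch smaller than the others needs no special handling. TensorFlow also provides tf.math.bincount, which may allow a loop-free version of the layer, though that is not tested here.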
I'm building a neural network to classify 10 different compounds. The data set looks something like this:
array([[400. , 23. , 52.38, ..., 1. , 0. , 0. ],
[400. , 21.63, 61.61, ..., 0. , 0. , 0. ],
[400. , 21.49, 61.95, ..., 0. , 0. , 0. ],
...,
[400. , 21.69, 41.98, ..., 0. , 0. , 0. ],
[400. , 22.48, 65.2 , ..., 0. , 0. , 0. ],
[400. , 22.02, 58.91, ..., 0. , 0. , 1. ]])
where the last 10 columns are the one-hot encoding of the compound I want to identify. This is the code I'm using:
dataset=numpy.asarray(dataset[1:,0:],dtype=float) # numpy.asfarray was removed in NumPy 2.0
x = dataset[0:,0:30]
y = dataset[0:,30:40]
x_train, x_test, y_train, y_test = train_test_split(
x, y, test_size=0.20, random_state=1) # has always been 42
standard=preprocessing.StandardScaler().fit(x_train)
x_train=standard.transform(x_train)
x_test=standard.transform(x_test)
dump(standard, 'std_modelo_400.bin', compress=True)
model = Sequential()
model.add(Dense(50, input_dim = x_test.shape[1], activation = 'relu',kernel_regularizer=keras.regularizers.l1(0.01)))
model.add(Dense(30, input_dim = x_test.shape[1], activation = 'relu',kernel_regularizer=keras.regularizers.l1(0.01)))
model.add(Dense(15, input_dim = x_test.shape[1], activation = 'relu',kernel_regularizer=keras.regularizers.l1(0.01)))
model.add(Dense(10, activation='softmax',kernel_initializer='normal', bias_initializer=keras.initializers.Constant(value=0)))
model.summary()
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy']
)
history=model.fit(x_train,y_train,validation_data=(x_test,y_test),verbose=2,epochs=epochs,batch_size=batch_size)#callbacks=[monitor] , verbose=2
I try to get the confusion matrix with multilabel_confusion_matrix(y_test,pred), and the result has this form:
array([[[929681, 158],
[ 308, 102180]],
[[930346, 407],
[ 6677, 94897]],
[[930740, 38],
[ 477, 101072]],
[[929287, 1522],
[ 69, 101449]],
[[929703, 8843],
[ 12217, 81564]],
[[902624, 474],
[ 1565, 127664]],
[[931152, 2236],
[ 12140, 86799]],
[[929085, 10],
[ 0, 103232]],
[[911158, 22378],
[ 5362, 93429]],
[[930412, 689],
[ 617, 100609]]], dtype=int64)
When I use multilabel_confusion_matrix(y_test,pred,labels=["Comp1","Comp2","Comp3", "Comp4", "Comp5", "Comp6", "Comp7", "Comp8", "Comp9", "Comp10",]) I get an error:
elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
mask &= (ar1 != a)
Traceback (most recent call last):
File "<ipython-input-18-00af06ffcbef>", line 1, in <module>
multilabel_confusion_matrix(y_test,pred,labels=["Comp1","Comp2","Comp3", "Comp4", "Comp5", "Comp6", "Comp7", "Comp8", "Comp9", "Comp10",])
File "C:\Users\fmarin\Anaconda3\lib\site-packages\sklearn\metrics\_classification.py", line 485, in multilabel_confusion_matrix
if np.max(labels) > np.max(present_labels):
I have no idea how to fix it. I would also like to get a graphical version of the confusion matrix. I'm using the scikit-learn toolbox.
Thank you!
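Since every sample belongs to exactly one compound, a single 10x10 confusion matrix may be more useful here than ten 2x2 matrices. A possible sketch, using only NumPy and a made-up 3-class example: convert the one-hot rows and the softmax outputs to class indices with argmax, then count (true, predicted) pairs. confusion_from_one_hot is a hypothetical helper name.

```python
import numpy as np

def confusion_from_one_hot(y_true, y_prob):
    """Rows are true classes, columns are predicted classes."""
    n_classes = y_true.shape[1]
    true_idx = y_true.argmax(axis=1)
    pred_idx = y_prob.argmax(axis=1)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (true_idx, pred_idx), 1)  # increment once per (true, predicted) pair
    return cm

# Made-up labels and softmax outputs for 4 samples and 3 classes.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_prob = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]])
cm = confusion_from_one_hot(y_true, y_prob)
print(cm)
# [[1 0 0]
#  [1 1 0]
#  [0 0 1]]
```

In scikit-learn, confusion_matrix(y_test.argmax(axis=1), pred.argmax(axis=1)) should produce the same kind of table, and ConfusionMatrixDisplay takes a display_labels argument for names such as "Comp1". The error in the question appears to come from passing display names as labels, which are compared against the numeric values in y_test.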
Comparing the outputs of the embedding layer with and without dropout shows that some values in the array are replaced with 0. But why do the other values in the array change as well?
This is my model:
input = Input(shape=(23,))
model = Embedding(input_dim=n_words, output_dim=23, input_length=23)(input)
model = Dropout(0.2)(model)
model = Bidirectional(LSTM(units=LSTM_N, return_sequences=True, recurrent_dropout=0.1))(model)
out = TimeDistributed(Dense(n_tags, activation="softmax"))(model) # softmax output layer
model = Model(input, out)
I build model2 from the trained model, with the model's input layer as input and the output of Dropout(0.2) as output:
from keras import backend as K
model2 = K.function([model.layers[0].input , K.learning_phase()],
[model.layers[2].output] )
dropout = model2([X_train[0:1] , 1])[0]
nodrop = model2([X_train[0:1] , 0])[0]
Printing the first array of both dropout and no dropout:
dropout[0][0]
Output:
array([ 0. , -0. , -0. , -0.04656423, -0. ,
0.28391626, 0.12213208, -0.01187495, -0.02078421, -0. ,
0.10585815, -0. , 0.27178472, -0.21080771, 0. ,
-0.09336889, 0.07441022, 0.02960865, -0.2755439 , -0.11252255,
-0.04330419, -0. , 0.04974075], dtype=float32)
nodrop[0][0]
Output:
array([ 0.09657606, -0.06267098, -0.00049554, -0.03725138, -0.11286845,
0.22713302, 0.09770566, -0.00949996, -0.01662737, -0.05788678,
0.08468652, -0.22405024, 0.21742778, -0.16864617, 0.08558936,
-0.07469511, 0.05952817, 0.02368692, -0.22043513, -0.09001804,
-0.03464335, -0.05152775, 0.0397926 ], dtype=float32)
Some values are replaced with 0, agreed, but why do the other values change?
The embedding outputs have a meaning and are unique for each word. If they are changed by applying dropout, is it correct to apply dropout after the embedding layer?
Note: I have used "learning_phase" values of 0 and 1 for testing (no dropout) and training (dropout) respectively.
This is how dropout regularization works. After dropout is applied, the surviving values are divided by the keep probability (in this case 0.8).
When you use dropout, the function receives the probability of turning a neuron to zero as input, e.g., 0.2, which means it has 0.8 chance of keeping any given neuron. So, the values remaining will be multiplied by 1/(1-0.2).
This is called "inverted dropout technique" and it is done in order to ensure that the expected value of the activation remains the same. Otherwise, predictions will be wrong during inference when dropout is not used.
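The rescaling can be reproduced in a few lines of NumPy. This is a sketch of the inverted-dropout arithmetic only, not of Keras' internal implementation; the input of all ones and the fixed seed are chosen just to make the effect easy to read.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.2                      # probability of zeroing a value
x = np.ones(100_000)

keep_mask = rng.random(x.shape) >= rate           # True with probability 1 - rate
dropped = np.where(keep_mask, x / (1.0 - rate), 0.0)

survivors = dropped[keep_mask]
print(survivors[:3])    # each surviving 1.0 has become 1 / 0.8 = 1.25
print(dropped.mean())   # close to 1.0, the mean of x without dropout
```

Scaling the survivors up by 1 / (1 - rate) is what keeps the average activation roughly unchanged between training and inference.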
You'll notice that your dropout rate is 0.2, and every value that survived dropout has been divided by 0.8, so each original value is 0.8 times its dropped-out counterpart.
Look what happens if I divide your second output by the first:
import numpy as np
a = np.array([ 0. , -0. , -0. , -0.04656423, -0. ,
0.28391626, 0.12213208, -0.01187495, -0.02078421, -0. ,
0.10585815, -0. , 0.27178472, -0.21080771, 0. ,
-0.09336889, 0.07441022, 0.02960865, -0.2755439 , -0.11252255,
-0.04330419, -0. , 0.04974075])
b = np.array([ 0.09657606, -0.06267098, -0.00049554, -0.03725138, -0.11286845,
0.22713302, 0.09770566, -0.00949996, -0.01662737, -0.05788678,
0.08468652, -0.22405024, 0.21742778, -0.16864617, 0.08558936,
-0.07469511, 0.05952817, 0.02368692, -0.22043513, -0.09001804,
-0.03464335, -0.05152775, 0.0397926 ])
print(b/a)
[ inf inf inf 0.79999991 inf 0.80000004
0.79999997 0.8 0.8000001 inf 0.8 inf
0.80000001 0.80000001 inf 0.79999998 0.79999992 0.8
0.80000004 0.8 0.79999995 inf 0.8 ]
Based on a matrix, I am trying to approximate a value (regression). However, the CNN always predicts a matrix which is identical to the input of predict.
I am not getting any errors.
The data (matrices) used for training are stored in a numpy array but I only have around 9000 samples available. The values for each matrix are stored in a one dimensional array (one value for each matrix).
This is my model:
model = keras.Sequential([
layers.Conv2D(64, kernel_size=3, activation='selu', input_shape=(8, 8, 1)),
layers.Conv2D(64, kernel_size=3, activation='selu'),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Conv2D(64, kernel_size=2, activation='selu'),
layers.Flatten(),
layers.Dense(1, activation='linear')
])
optimizer = keras.optimizers.RMSprop(0.001)
model.compile(optimizer=optimizer,
loss='mean_squared_error',
metrics=['mean_squared_error'])
model.fit(matrices, values, epochs=10)
test_loss = model.evaluate(test_boards, test_values, verbose=2)
Example output when calling prediction = model.predict(some_matrix) can be found below. In this case some_matrix is equal to the output below.
[[ 51. 0. 33. 0. 100. 33. 0. 51.]
[ 10. 10. 10. 0. 0. 10. 10. 10.]
[ 0. 0. 32. 0. 0. 32. 0. 0.]
[ 0. 0. 0. 88. 10. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. -10. 0. -32. 0. 0.]
[ -10. -10. -10. 0. 0. -10. -10. -10.]
[ -51. -32. -33. -88. -100. -33. 0. -51.]]
What am I missing to get a single value as output? Or at least a modified version of the input?
Edit:
My matrix data (did not fit in a free pastebin account, sorry)
My values
An example google colab file
I did not find a way to provide the data into Google Colab and include them in the link, I'm sorry for the inconvenience.
I did get an error this time which I did not get when running the code in my own environment. This is definitely the issue but I am still unaware of how to fix this.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-595f98617fa0> in <module>()
97 [ -51, -32, -33, -88, -100, -33, 0, -51,]])
98 print(test_boards[0])
---> 99 prediction = model.predict(test_boards[0])
100 print("Prediction:")
101 print(prediction)
3 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
561 ': expected ' + names[i] + ' to have ' +
562 str(len(shape)) + ' dimensions, but got array '
--> 563 'with shape ' + str(data_shape))
564 if not check_batch_axis:
565 data_shape = data_shape[1:]
ValueError: Error when checking input: expected conv2d_12_input to have 4 dimensions, but got array with shape (8, 8, 1)
You need to add the batch dimension to the test sample, because Conv2D expects input of shape (batch, height, width, channels). For a bare 8x8 matrix:
some_matrix = some_matrix[np.newaxis,:,:,np.newaxis]
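Which axes need adding depends on the shape the sample already has. A small NumPy sketch, using placeholder arrays of zeros, covers both a bare 8x8 matrix and a sample that already carries the channel axis, like the (8, 8, 1) array reported in the traceback:

```python
import numpy as np

board_2d = np.zeros((8, 8))      # bare matrix: needs batch and channel axes
board_3d = np.zeros((8, 8, 1))   # already has the channel axis: needs only the batch axis

batch_from_2d = board_2d[np.newaxis, :, :, np.newaxis]
batch_from_3d = board_3d[np.newaxis, ...]   # same as np.expand_dims(board_3d, axis=0)

print(batch_from_2d.shape)  # (1, 8, 8, 1)
print(batch_from_3d.shape)  # (1, 8, 8, 1)
```

Slicing instead of indexing, for example test_boards[:1] rather than test_boards[0], also keeps the batch axis in place.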