Problem with dimensionality in Keras RNN - reshape isn't working?

Let's consider this random dataset on which I want to perform RNN:
import random
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
from keras.optimizers import SGD
import numpy as np
df_train = random.sample(range(1, 100), 50)
I want to apply RNN with lag equal to 1. I'll use my own function:
def create_dataset(dataset, lags):
    dataX, dataY = [], []
    for i in range(lags):
        subdata = dataset[i:len(dataset) - lags + i]
        dataX.append(subdata)
        dataY.append(dataset[lags:len(dataset)])
    return np.array(dataX), np.array(dataY)
which trims the dataset according to the number of lags. It outputs two NumPy arrays: the first holds the independent variables and the second the dependent variable.
x_train, y_train = create_dataset(df_train, lags = 1)
But now when I try to fit the model:
model = Sequential()
model.add(SimpleRNN(1, input_shape=(1, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer=SGD(lr = 0.1))
history = model.fit(x_train, y_train, epochs=1000, batch_size=50, validation_split=0.2)
I obtain this error:
ValueError: Error when checking input: expected simple_rnn_18_input to have 3 dimensions, but got array with shape (1, 49)
I've read about it and the solution is just to apply reshape:
x_train = np.reshape(x_train, (x_train.shape[0], 1, x_train.shape[1]))
but when I apply it I obtain another error:
ValueError: Error when checking input: expected simple_rnn_19_input to have shape (1, 1) but got array with shape (1, 49)
and I'm not sure where the mistake is. Could you please tell me what I'm doing wrong?

What you are calling lags is usually called look back in the literature. This technique allows you to feed the RNN more contextual data and to learn mid/long range dependencies.
The error is telling you that you are feeding a layer that expects shape 1x1 with data of shape 1x49.
There are 2 reasons behind the error:
The first is your create_dataset function, which builds a stack of 1x(50 - lags) = 1x49 vectors, the opposite of what you want: 1x(lags) = 1x1.
In particular, this line is responsible:
subdata = dataset[i:len(dataset) - lags + i]
# with lags = 1 you have just one
# iteration in range(1): i = 0
subdata = dataset[0:50 - 1 + 0]
subdata = dataset[0:49]  # which is a 1x49 vector

# In order to create the right vector
# you need to change your function:
def create_dataset(dataset, lags=1):
    dataX, dataY = [], []
    # iterate to a max of (50 - lags - 1) times
    # because we need "lags" elements in each vector
    for i in range(len(dataset) - lags - 1):
        # get "lags" elements from the dataset
        subdata = dataset[i:i + lags]
        dataX.append(subdata)
        # get only the last label, representing
        # the target for the current iteration
        dataY.append(dataset[i + lags])
    return np.array(dataX), np.array(dataY)
If you use look back in your RNN you also need to increase the input dimension, because you are also looking at preceding samples.
The network is indeed looking at more data than just one sample, because it needs to "look back" over several samples to understand mid/long range dependencies.
This is more conceptual than practical here; your code is fine because lags = 1:
model.add(SimpleRNN(1, input_shape=(1, 1)))
# you should use lags in the input shape
model.add(SimpleRNN(1, input_shape=(1, LAGS)))
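Putting the two fixes together, here is a minimal end-to-end sketch (my own illustration, not from the original answer) that uses the corrected create_dataset above, reshapes the features to (samples, 1, LAGS), and fits the model:
import random
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
from keras.optimizers import SGD

LAGS = 1
df_train = random.sample(range(1, 100), 50)

# create_dataset is the corrected version defined above
x_train, y_train = create_dataset(df_train, lags=LAGS)
# x_train has shape (samples, LAGS); add the timestep axis SimpleRNN expects
x_train = np.reshape(x_train, (x_train.shape[0], 1, LAGS))

model = Sequential()
model.add(SimpleRNN(1, input_shape=(1, LAGS)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.1))
model.fit(x_train, y_train, epochs=1000, batch_size=50, validation_split=0.2)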


Keras network returns same output regardless of the input

I am trying to use keras dense neural networks to forecast some time series.
When fitting my model on complex real datasets, my model converges toward a constant output, i.e. whatever the input, the model gives the same output (which seems to be a reasonable estimate of the mean of my dataset).
I reduced the problem down to very simple simulated datasets and still have the same issue. Here is a minimal working example:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt

X = []
Y = []
for jh in range(10000):
    x = np.arange(-1, 1, 0.01)
    y = 1 + x * (np.random.random() - 0.5)
    y += np.random.randn(len(x)) / 100
    X.append(y[:100])
    Y.append(y[100:])
X = np.array(X)[:, :, None]
Y = np.array(Y)[:, :, None]

model = models.Sequential()
model.add(layers.Input((100, 1,)))
model.add(layers.Flatten())
model.add(layers.Dense(100, activation='sigmoid'))
model.add(layers.Dense(100, activation='sigmoid'))
model.add(layers.Dense(100, activation='sigmoid'))
model.add(tf.keras.layers.Reshape((100, 1)))
model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer="adam")
# model.summary()

print("Fit model on training data")
history = model.fit(x=X, y=Y, batch_size=10000, epochs=200)

for k in np.arange(0, 10000, 1000):
    plt.plot(np.arange(len(X[k])), X[k])
    plt.plot(np.arange(len(X[k]), len(X[k]) + len(Y[k])), model(X)[k])
    plt.plot(np.arange(len(X[k]), len(X[k]) + len(Y[k])), Y[k])
In this example, the model returns exactly the same output regardless of the input.
I tried to change the number of layers, the loss function, the learning rate, the batch size and the number of epochs, without any noticeable improvement.
Do you have any suggestions on this issue?
If you rearrange your random inputs to be like
y = np.array(1. + x)
y += 1. / 100.
and also build a second dataset
J, K = [], []
for jh in range(10000):
    j = np.arange(-1, 1, 0.01)
    k = -np.array(1. - j)
    k += 1. / 100
    J.append(k[:100])
    K.append(k[100:])
J = np.array(J)[:, :, None]
K = np.array(K)[:, :, None]
and finally add
plt.plot(np.arange(len(X[k]), len(X[k]) + len(Y[k])), model(J)[k])
in the plotting loop, then you will see two different results. You should probably check your dataset's diversity.
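A related sketch of my own (not part of the original answer): the toy targets above differ from one another only by a small random slope, so predicting the dataset mean is already close to optimal for every sample. Widening that slope range, for example, makes the samples genuinely diverse and gives the same model something input-dependent to learn:
import numpy as np

X, Y = [], []
for jh in range(10000):
    x = np.arange(-1, 1, 0.01)
    slope = np.random.uniform(-5.0, 5.0)  # hypothetical: a much wider spread than (random() - 0.5)
    y = 1 + x * slope
    y += np.random.randn(len(x)) / 100
    X.append(y[:100])
    Y.append(y[100:])
X = np.array(X)[:, :, None]
Y = np.array(Y)[:, :, None]
# Training the same model on these more diverse samples should produce
# predictions that visibly depend on the input.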

Why can the reshape function in keras not change the number of dimensions

I'm attempting to make a chess engine using a neural network made in Keras. I want to output a prediction of the probable policy based off training games, and I am using a 73x8x8 output to do that (each position on the board times 73 different possible moves, 8 directions * 7 squares for the "queen moves", 8 knight moves, 3 promotions (any other promotion is a queen promotion) times 3 directions).
However, the final layer in my network is a Dense layer, which produces a one-dimensional output of length 4672. I am trying to reshape this into something easier to use through the Reshape layer.
But it gives me this error: ValueError: Error when checking target: expected reshape_1 to have 4 dimensions, but got array with shape (2, 1)
I have had a look at this question: Error when checking target: expected dense_1 to have 3 dimensions, but got array with shape (118, 1) but the answer doesn't seem to apply to Dense layers, as they do not have a "return sequences" argument.
Here is my code:
from keras.models import Model, Input
from keras.layers import Conv2D, Dense, Flatten, Reshape
from keras.optimizers import SGD
import numpy
from copy import deepcopy


class NeuralNetwork:
    def __init__(self):
        self.network = Model()
        self.create_network()

    def create_network(self):
        input = Input((13, 8, 8))
        output = Conv2D(256, (3, 3), padding='same')(input)

        policy_head_output = Conv2D(2, (1, 1))(output)
        policy_head_output = Flatten()(policy_head_output)
        policy_head_output = Dense(4672, name='policy_output')(policy_head_output)
        policy_head_output = Reshape((73, 8, 8), input_shape=(4672,))(policy_head_output)

        value_head_output = Conv2D(1, (1, 1))(output)
        value_head_output = Dense(256)(value_head_output)
        value_head_output = Flatten()(value_head_output)
        value_head_output = Dense(1, name="value_output")(value_head_output)

        self.network = Model(outputs=[value_head_output, policy_head_output], inputs=input)

    def train_network(self, input_training_data, labels):
        sgd = SGD(0.2, 0.9)
        self.network.compile(sgd, 'categorical_crossentropy', metrics=['accuracy'])
        self.network.fit(input_training_data, [labels[0], labels[1]])
        self.network.save("Neural Net 1")


def make_training_data():
    training_data = []
    labels = []
    for i in range(6):
        training_data.append(make_image())
        labels.append(make_label_image())
    return training_data, labels


def make_image():
    data = []
    for i in range(13):
        blank_board = []
        for j in range(8):
            a = []
            for k in range(8):
                a.append(0)
            blank_board.append(a)
        data.append(blank_board)
    return data


def make_label_image():
    policy_logits = []
    blank_board = []
    for i in range(8):
        a = []
        for j in range(8):
            a.append(0)
        blank_board.append(a)
    for i in range(73):
        policy_logits.append(deepcopy(blank_board))
    return [policy_logits, [0]]


def main():
    input_training_data, output_training_data = make_training_data()
    neural_net = NeuralNetwork()
    input_training_data = numpy.array(input_training_data)
    output_training_data = numpy.array(output_training_data)
    neural_net.train_network(input_training_data, output_training_data)


main()
Could someone please explain:
What's happening
What I can do to fix it
There are a few things wrong with your approach.
1. Your output/target data
You're creating a list object with two elements (board, label), where board is 73x8x8 and label is a single 0/1 value. The dimensions are inconsistent, so when you convert this ragged structure to a NumPy array, this happens:
a = [[0,1,2,3],[0]]
arr = np.array(a)
print(arr)
# => [list([0, 1, 2, 3]) list([0])]
From there, indexing and slicing take a very weird turn, so I won't go down that path. The first thing to do is separate your data so that each element returned by make_training_data has consistent dimensions. Here the input image, the output board image and the output labels are returned separately.
def make_training_data():
    training_data = []
    labels = []
    board_output = []
    for i in range(6):
        training_data.append(make_image())
        board, lbl = make_label_image()
        labels.append(lbl)
        board_output.append(board)
    return training_data, board_output, labels
and in the main(), it becomes,
input_training_data, output_training_board, output_training_labels = make_training_data()
input_training_data = np.array(input_training_data)
output_training_board = np.array(output_training_board)
output_training_labels = np.array(output_training_labels)
2. The error
So you're getting the error
ValueError: Error when checking target: expected reshape_1 to have 4 dimensions, but got array with shape (2, 1)
Well, it's simple: you have given the outputs in the wrong order in model.fit(). In other words, your model says
outputs=[value_head_output, policy_head_output]
and your make_labels() says,
[policy_logits, [0]]
which is the other way around. Your poor model is trying to reshape the labels into that 4-dimensional structure. That's why it complains. So it should be
neural_net.train_network(input_training_data, [output_training_labels, output_training_board])
Even if you correct just this (without fixing make_training_data()), you probably won't get it working because of the inconsistencies in your NumPy structure described in the first section.
3. The loss function
This is about your loss function. You have a Dense layer with a single output (the value head) and you're using categorical_crossentropy, which is for categorical outputs. You should use binary_crossentropy here, as you only have a single output unit.
Also, if you want multiple losses for your multiple outputs, do the following.
self.network.compile(sgd, ['binary_crossentropy', 'mean_squared_error'], metrics=['accuracy'])
This is just an example. If you want, you can have the same loss for both outputs too.
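As a side note (my own sketch, not part of the original answer): Keras also accepts losses and targets keyed by output-layer name, which avoids ordering mistakes like the one above. A minimal, self-contained toy example with made-up layer sizes and data:
from keras.models import Model
from keras.layers import Input, Dense
from keras.optimizers import SGD
import numpy as np

# Tiny two-output model just to demonstrate dict-keyed losses and targets
inp = Input((8,))
value_output = Dense(1, activation='sigmoid', name='value_output')(inp)
policy_output = Dense(4, name='policy_output')(inp)
model = Model(inputs=inp, outputs=[value_output, policy_output])

model.compile(SGD(0.2, 0.9),
              loss={'value_output': 'binary_crossentropy',
                    'policy_output': 'mean_squared_error'})

x = np.random.rand(6, 8)
model.fit(x, {'value_output': np.random.randint(0, 2, (6, 1)),
              'policy_output': np.random.rand(6, 4)})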

Averaging over the batch dimension in Keras

I've got a problem where I want to predict one time series with many time series. My input is (batch_size, time_steps, features) and my output should be (1, time_steps, features)
I can't figure out how to average over N.
Here's a dummy example. First, dummy data where the output is a linear function of 200 time series:
import numpy as np

time = 100
N = 2000

dat = np.zeros((N, time))
for i in range(time):
    dat[i, :] = np.sin(list(range(time))) * np.random.normal(size=1) + np.random.normal(size=1)

y = dat.T @ np.random.normal(size=N)
Now I'll define a time series model (using 1-D conv nets):
from keras.models import Model
from keras.layers import Input, Conv1D, Dense, Lambda
from keras.optimizers import Adam
from keras import backend as K

n_filters = 2
filter_width = 3
dilation_rates = [2**i for i in range(5)]

inp = Input(shape=(None, 1))
x = inp
for dilation_rate in dilation_rates:
    x = Conv1D(filters=n_filters,
               kernel_size=filter_width,
               padding='causal',
               activation="relu",
               dilation_rate=dilation_rate)(x)
x = Dense(1)(x)

model = Model(inputs=inp, outputs=x)
model.compile(optimizer=Adam(), loss='mean_squared_error')

model.predict(dat.reshape(N, time, 1)).shape
Out[43]: (2000, 100, 1)
The output is the wrong shape! Next, I tried using an averaging layer, but I get this weird error:
def av_over_batches(x):
    x = K.mean(x, axis=0)
    return x

x = Lambda(av_over_batches)(x)

model = Model(inputs=inp, outputs=x)
model.compile(optimizer=Adam(), loss='mean_squared_error')

model.predict(dat.reshape(N, time, 1)).shape

Traceback (most recent call last):
  File "<ipython-input-3-d43ccd8afa69>", line 4, in <module>
    model.predict(dat.reshape(N, time, 1)).shape
  File "/home/me/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1169, in predict
    steps=steps)
  File "/home/me/.local/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 302, in predict_loop
    outs[i][batch_start:batch_end] = batch_out
ValueError: could not broadcast input array from shape (100,1) into shape (32,1)
Where does 32 come from? (Incidentally, I got the same number in my real data, not just in the MWE).
But the main question is: how can I build a network that averages over the input batch dimension?
I would approach the problem in a different way.
Problem: you want to predict one time series from a set of time series. Let's say you have 3 time series TS1, TS2, TS3, each with 100 time steps, and you want to predict an output time series y1, y2, y3, ...
My approach to this problem is as follows, i.e. group the time series together per time step and feed that to an LSTM (see the data-format example at the end of this answer). If some time steps are shorter than others, you can pad them. Similarly, if some sets have fewer time series, pad those as well.
Example:
Example:
import numpy as np
np.random.seed(33)
time = 100
N = 5000
k = 5
magic = np.random.normal(size = k)
x = list()
y = list()
for i in range(N):
dat = np.zeros((k, time))
for i in range(k):
dat[i,:] = np.sin(list(range(time)))*np.random.normal(size =1) + np.random.normal(size = 1)
x.append(dat)
y.append(dat.T # magic)
So I want to predict a time series of 100 steps from a set of k = 5 time series. We want the model to learn the magic weights.
from keras.models import Model
from keras.layers import Input, Conv1D, Dense, Lambda, LSTM
from keras.optimizers import Adam
from keras import backend as K
import matplotlib.pyplot as plt

input = Input(shape=(time, k))
lstm = LSTM(32, return_sequences=True)(input)
output = Dense(1, activation='sigmoid')(lstm)

model = Model(inputs=input, outputs=output)
model.compile(optimizer=Adam(), loss='mean_squared_error')

data_x = np.zeros((N, 100, 5))
data_y = np.zeros((N, 100, 1))
for i in range(N):
    data_x[i] = x[i].T.reshape(100, 5)
    data_y[i] = y[i].reshape(100, 1)

from sklearn.preprocessing import StandardScaler
ss_x = StandardScaler()
ss_y = StandardScaler()
data_x = ss_x.fit_transform(data_x.reshape(N, -1)).reshape(N, 100, 5)
data_y = ss_y.fit_transform(data_y.reshape(N, -1)).reshape(N, 100, 1)

# Let's leave the last sample out for testing and split the rest into train and validation
model.fit(data_x[:-1], data_y[:-1], batch_size=64, epochs=100, validation_split=.25)
The validation loss was still going down but I stopped it. Let's see how good our prediction is:
y_hat = model.predict(data_x[-1].reshape(-1,100,5))
plt.plot(data_y[-1], label='y')
plt.plot(y_hat.reshape(100), label='y_hat')
plt.legend(loc='upper left')
The results are promising. Running it for more epochs, along with hyper-parameter tuning, should bring us even closer to the magic weights. One can also try stacked LSTMs and bidirectional LSTMs.
I feel RNNs are better suited to time series data than CNNs.
Data Format:
Let's say time steps = 3
Time series 1 = [1,2,3]
Time series 2 = [4,5,6]
Time series 3 = [7,8,9]
Time series 4 = [10,11,12]
Y = [100,200,300]
For a batch size of 1:
[[1,4,7,10],[2,5,8,11],[3,6,9,12]] -> LSTM -> [100,200,300]
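To make that layout concrete, here is a small sketch of my own (not from the original answer) showing how those four series stack into the (batch, time_steps, features) tensor described above:
import numpy as np

# The four example series, each with 3 time steps
ts1 = np.array([1, 2, 3])
ts2 = np.array([4, 5, 6])
ts3 = np.array([7, 8, 9])
ts4 = np.array([10, 11, 12])

# Stack them so each row is one time step and each column one series,
# then add the batch axis: the result has shape (1, 3, 4)
batch = np.stack([ts1, ts2, ts3, ts4], axis=-1)[None, ...]
print(batch.shape)  # (1, 3, 4)
print(batch[0])     # [[ 1  4  7 10] [ 2  5  8 11] [ 3  6  9 12]]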

Shape of data and LSTM Input for varying timesteps

For my master's thesis, I want to predict the price of a stock in the next hour using an LSTM model. My X data contains 30,000 rows with 6 dimensions (= 6 features), and my Y data contains 30,000 rows and only 1 dimension (= the target variable). For my first LSTM model, I reshaped the X data to (30,000, 1, 6), the Y data to (30,000, 1) and defined the input like this:
input_nn = Input(shape=(1, 6))
I am not sure how to reshape the data and to determine the input shape for the model if I want to increase the timesteps. I still want to predict the stock price in the next hour, but include more previous time steps.
Do I have to add the data from previous timesteps in my X data in the second dimension?
Can you explain what the number of units of an LSTM refers to exactly? Should it be the same as the number of timesteps in my case?
You are on the right track, but you are confusing the number of units with the number of timesteps. units is a hyper-parameter that controls the output dimension of the LSTM: it is the size of the LSTM's output vector. So if the input is (1, 6) and you have 32 units, the LSTM will traverse the single timestep and produce a vector of size 32, i.e. output shape (32,).
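A quick shape check (my own sketch, not part of the original answer) makes this concrete:
from keras.models import Sequential
from keras.layers import LSTM

# One timestep with 6 features in, 32 units out
m = Sequential([LSTM(32, input_shape=(1, 6))])
print(m.output_shape)  # (None, 32): one 32-dimensional vector per sample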
Timesteps refers to the size of the history you want your LSTM to consider, so it isn't the same as units at all. Instead of processing the data yourself, Keras has a handy TimeseriesGenerator which will take 2D data like yours and use a sliding window of some timestep size to generate timeseries data. From the documentation:
from keras.preprocessing.sequence import TimeseriesGenerator
import numpy as np

data = np.array([[i] for i in range(50)])
targets = np.array([[i] for i in range(50)])

data_gen = TimeseriesGenerator(data, targets,
                               length=10, sampling_rate=2,
                               batch_size=2)
assert len(data_gen) == 20

batch_0 = data_gen[0]
x, y = batch_0
assert np.array_equal(x,
                      np.array([[[0], [2], [4], [6], [8]],
                                [[1], [3], [5], [7], [9]]]))
assert np.array_equal(y,
                      np.array([[10], [11]]))
which you can use directly in model.fit_generator(data_gen, ...), giving you the option to try out different sampling rates, timesteps etc. You should probably investigate these parameters and how they affect the result in your thesis.
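For the setup in the question (6 features, a next-hour target), a minimal sketch of how this could look; the array sizes and the 24-step look-back are made up purely for illustration:
import numpy as np
from keras.preprocessing.sequence import TimeseriesGenerator
from keras.models import Sequential
from keras.layers import LSTM, Dense

num_rows, num_features, timesteps = 1000, 6, 24  # hypothetical sizes
X = np.random.rand(num_rows, num_features)       # stand-in for the real feature matrix
y = np.random.rand(num_rows)                     # stand-in for the next-hour price

# Each generated sample holds the previous `timesteps` rows of X;
# its target is the y value that follows that window.
gen = TimeseriesGenerator(X, y, length=timesteps, batch_size=32)

model = Sequential()
model.add(LSTM(32, input_shape=(timesteps, num_features)))  # 32 units, independent of timesteps
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit_generator(gen, epochs=10)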
Update with code that is roughly 5 times quicker than the last one:
x = np.load(nn_input + "/EOAN" + "/EOAN_X" + ".npy")
y = np.load(nn_input + "/EOAN" + "/EOAN_Y" + ".npy")
num_features = x.shape[1]
num_time_steps = 500

for train_index, test_index in tscv.split(x):
    # Split into train and test set
    print("Fold:", fold_counter, "\n" + "Train Index:", train_index, "Test Index:", test_index)
    x_train_raw, y_train, x_test_raw, y_test = x[train_index], y[train_index], x[test_index], y[test_index]

    # Scaling the data
    scaler = StandardScaler()
    scaler.fit(x_train_raw)
    x_train_raw = scaler.transform(x_train_raw)
    x_test_raw = scaler.transform(x_test_raw)

    # Creating input data with variable timesteps
    x_train = np.zeros((x_train_raw.shape[0] - num_time_steps + 1, num_time_steps, num_features), dtype="float32")
    x_test = np.zeros((x_test_raw.shape[0] - num_time_steps + 1, num_time_steps, num_features), dtype="float32")

    for row in range(len(x_train)):
        for timestep in range(num_time_steps):
            x_train[row][timestep] = x_train_raw[row + timestep]

    for row in range(len(x_test)):
        for timestep in range(num_time_steps):
            x_test[row][timestep] = x_test_raw[row + timestep]

    y_train = y_train[num_time_steps - 1:]
    y_test = y_test[num_time_steps - 1:]
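As an aside (my own sketch, not from the original post): the same sliding windows can be built with a single NumPy indexing operation instead of the nested Python loops, which is typically much faster:
import numpy as np

def make_windows(arr, num_time_steps):
    # arr: (rows, num_features) -> (rows - num_time_steps + 1, num_time_steps, num_features)
    idx = np.arange(num_time_steps)[None, :] + np.arange(arr.shape[0] - num_time_steps + 1)[:, None]
    return arr[idx]

x_demo = np.random.rand(1000, 6).astype("float32")  # made-up data for illustration
print(make_windows(x_demo, 500).shape)              # (501, 500, 6)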

Sequence-to-sequence classification with variable sequence lengths in Keras

I would like to train an LSTM or GRU network in TensorFlow/Keras to continuously recognize whether a user is walking or not based on input from motion sensors (accelerometer and gyroscope). I have 50 input sequences with lengths varying from 581 to 5629 time steps and 6 features and 50 corresponding output sequences of boolean values. My problem is that I don't know how to feed the training data to the fit() method.
I know approximately what I need to do: I'd like to train with 5 batches of 10 sequences each, and for each batch I have to pad all but the longest sequence so that all 10 sequences have the same length, and apply masking. I just don't know how to build the data structures. I know that I can make one big 3D tensor of size (50,5629,6) and that works, but it's painfully slow, so I'd really like to make the sequence length of each batch as small as possible.
Here's the problem in code:
import tensorflow as tf
import numpy as np

# Load data from file
x_list, y_list = loadSequences("train.csv")
# x_list is now a list of arrays (n,6) of float64, where n is the timesteps
# and 6 is the number of features, sorted by increasing sequence lengths.
# y_list is a list of arrays (n,1) of Boolean.

x_train = # WHAT DO I WRITE HERE?
y_train = # AND HERE?

model = tf.keras.models.Sequential([
    tf.keras.layers.Masking(),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(2, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=10, epochs=100)
You can do something like this: use a generator function and pass it to fit_generator (take a look at the fit_generator method in the Keras documentation).
import json
import numpy as np

def data_generater(batch_size):
    print("reading data")
    training_file = open('data_location', 'r')
    # assuming data is in json format, so feel free to change accordingly
    training_set = json.loads(training_file.read())
    training_file.close()

    batch_i = 0   # Counter inside the current batch vector
    batch_x = []  # The current batch's x data
    batch_y = []  # The current batch's y data

    while True:
        for obj in training_set:
            # append your input sequences one by one
            # ('sequence' is only a placeholder key; use whatever field holds your data)
            batch_x.append(obj['sequence'])
            if obj['val'] == True:
                batch_y.append([1])
            elif obj['val'] == False:
                batch_y.append([0])
            batch_i += 1

            if batch_i == batch_size:
                # Ready to yield the batch
                # pad input to max length in the batch
                batch_x = pad_txt_data(batch_x)
                yield batch_x, np.array(batch_y)
                batch_x = []
                batch_y = []
                batch_i = 0

def pad_txt_data(arr):
    # expecting arr to be in the shape of (batch_size, m, 6)
    paded_arr = []
    prefered_len = len(max(arr, key=len))
    # pad all sequences with zero rows up to the longest length in the batch
    for seq in arr:
        seq = np.asarray(seq)
        pad = np.zeros((prefered_len - len(seq), seq.shape[1]))
        paded_arr.append(np.vstack([seq, pad]))
    return np.array(paded_arr)
and in the model
model = keras.Sequential()
model.add(keras.layers.Masking(mask_value=0., input_shape=(None, 6)))
model.add(keras.layers.LSTM(32))
# a single sigmoid unit for the binary walking / not-walking label
# (a one-unit softmax would always output 1.0)
model.add(keras.layers.Dense(1, activation="sigmoid"))
model.compile(optimizer="Adam", loss='binary_crossentropy', metrics=['binary_accuracy'])

model.fit_generator(data_generater(10), steps_per_epoch=5, epochs=10)
batch_size, steps_per_epoch and epochs can be different.
Generally
steps_per_epoch = (number of sequences / batch_size)
Note: from reading your description, your task appears to be a binary classification problem, not a sequence-to-sequence problem. A good example of sequence-to-sequence is language translation. Just google around and you will see what I mean.
And if you really want to see the difference in training times, I suggest using a GPU if available, and CuDNNLSTM.
In case it helps someone, here's how I ended up implementing a solution:
import tensorflow as tf
import numpy as np

# Load data from file
x_list, y_list = loadSequences("train.csv")
# x_list is now a list of arrays (m,n) of float64, where m is the timesteps
# and n is the number of features.
# y_list is a list of arrays (m,1) of Boolean.
assert len(x_list) == len(y_list)

num_sequences = len(x_list)
num_features = len(x_list[0][0])
batch_size = 10
batches_per_epoch = 5
assert batch_size * batches_per_epoch == num_sequences

def train_generator():
    # Sort by length so the number of timesteps in each batch is minimized
    x_list.sort(key=len)
    y_list.sort(key=len)
    # Generate batches
    while True:
        for b in range(batches_per_epoch):
            longest_index = (b + 1) * batch_size - 1
            timesteps = len(x_list[longest_index])
            x_train = np.zeros((batch_size, timesteps, num_features))
            y_train = np.zeros((batch_size, timesteps, 1))
            for i in range(batch_size):
                li = b * batch_size + i
                x_train[i, 0:len(x_list[li]), :] = x_list[li]
                y_train[i, 0:len(y_list[li]), 0] = y_list[li]
            yield x_train, y_train

model = tf.keras.models.Sequential([
    tf.keras.layers.Masking(mask_value=0., input_shape=(None, num_features)),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(2, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit_generator(train_generator(), steps_per_epoch=batches_per_epoch, epochs=100)
