I am training a neural network to compute the inverse of a 3x3 matrix. I am using a Keras dense model with 1 hidden layer of 9 neurons. The hidden layers use 'relu' activation and the output layer is linear. I am training on 10000 matrices of determinant 1. The results are not very good (the RMSE is in the hundreds). I have tried more layers, more neurons, and other activation functions, but the gain is very small. Here is the code:
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
def generator(nb_samples, matrix_size = 2, entries_range = (0,1), determinant = None):
    '''
    Generate nb_samples random matrices of size matrix_size with float
    entries in interval entries_range and of determinant determinant
    '''
    matrices = []
    if determinant:
        inverses = []
        for i in range(nb_samples):
            matrix = np.random.uniform(entries_range[0], entries_range[1], (matrix_size,matrix_size))
            matrix[0] *= determinant/np.linalg.det(matrix)
            matrices.append(matrix.reshape(matrix_size**2,))
            inverses.append(np.array(np.linalg.inv(matrix)).reshape(matrix_size**2,))
        return np.array(matrices), np.array(inverses)
    else:
        determinants = []
        for i in range(nb_samples):
            matrix = np.random.uniform(entries_range[0], entries_range[1], (matrix_size,matrix_size))
            determinants.append(np.array(np.linalg.det(matrix)).reshape(1,))
            matrices.append(matrix.reshape(matrix_size**2,))
        return np.array(matrices), np.array(determinants)
### Select number of samples, matrix size and range of entries in matrices
nb_samples = 10000
matrix_size = 3
entries_range = (0, 100)
determinant = 1
### Generate random matrices and determinants
matrices, inverses = generator(nb_samples, matrix_size = matrix_size, entries_range = entries_range, determinant = determinant)
### Select number of layers and neurons
nb_hidden_layers = 1
nb_neurons = matrix_size**2
activation = 'relu'
### Create dense neural network with nb_hidden_layers hidden layers having nb_neurons neurons each
model = Sequential()
model.add(Dense(nb_neurons, input_dim = matrix_size**2, activation = activation))
for i in range(nb_hidden_layers):
    model.add(Dense(nb_neurons, activation = activation))
model.add(Dense(matrix_size**2))
model.compile(loss='mse', optimizer='adam')
### Train and save model using train size of 0.66
history = model.fit(matrices, inverses, epochs = 400, batch_size = 100, verbose = 0, validation_split = 0.33)
### Get validation loss from object 'history'
rmse = np.sqrt(history.history['val_loss'][-1])
### Print RMSE and parameter values
print('''
Validation RMSE: {}
Number of hidden layers: {}
Number of neurons: {}
Number of samples: {}
Matrices size: {}
Range of entries: {}
Determinant: {}
'''.format(rmse,nb_hidden_layers,nb_neurons,nb_samples,matrix_size,entries_range,determinant))
I have checked online and there seem to be papers dealing with the problem of inverse matrix approximation. However, before changing the model I would like to know if there would be other parameters I could change that could have a bigger impact on the error. I hope someone can provide some insight. Thank you.
Inverting a 3x3 matrix is pretty difficult for a neural network, as they tend to be bad at multiplying or dividing activations. I wasn't able to get it to work with a simple dense network, but a 7 layer resnet does the trick. It has millions of weights so it needs many more than 10000 examples: I found that it completely memorized up to 100,000 samples and badly overfit even with 10,000,000 samples, so I just generated samples continuously and fed each sample to the network once as it was generated.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
#too_small_model = tf.keras.Sequential([
# tf.keras.layers.Flatten(),
# tf.keras.layers.Dense(1500, activation="relu"),
# tf.keras.layers.Dense(1500, activation="relu"),
# tf.keras.layers.Dense(N * N),
# tf.keras.layers.Reshape([ N, N])
#])
N = 3
inp = tf.keras.layers.Input(shape=[N, N])
x = tf.keras.layers.Flatten()(inp)
x = tf.keras.layers.Dense(128, activation="relu")(x)
for _ in range(7):
    skip = x
    for _ in range(4):
        y = tf.keras.layers.Dense(256, activation="relu")(x)
        x = tf.keras.layers.concatenate([x, y])
    #x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Dense(128,
        kernel_initializer=tf.keras.initializers.Zeros(),
        bias_initializer=tf.keras.initializers.Zeros()
    )(x)
    x = skip + x
    #x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dense(N * N)(x)
x = tf.keras.layers.Reshape([N, N])(x)
model2 = tf.keras.models.Model(inp, x)
model2.compile(loss="mean_squared_error", optimizer=tf.keras.optimizers.Adam(learning_rate=.00001))
for _ in range(5000):
    random_matrices = np.random.random((1000000, N, N)) * 4 - 2
    random_matrices = random_matrices[np.abs(np.linalg.det(random_matrices)) > .1]
    inverses = np.linalg.inv(random_matrices)
    inverses = inverses / 5. # normalize target values, large target values hamper training
    model2.fit(random_matrices, inverses, epochs=1, batch_size=1024)
zz = model2.predict(random_matrices[:10000])
plt.scatter(inverses[:10000], zz, s=.0001)
# matrix times predicted inverse (rescaled by 5) should be close to the identity
print(random_matrices[76] @ zz[76] * 5)
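In the spirit of the final print above, one way to judge the predictions is to multiply each input matrix by its predicted inverse and measure how far the product is from the identity. The following is a minimal NumPy-only sketch, not part of the original answer: `inverse_residual` is a hypothetical helper name, and exact inverses from `np.linalg.inv` stand in for the network output.

```python
import numpy as np

def inverse_residual(matrices, predicted_inverses):
    """Frobenius norm of A @ B - I for each pair (A, B) in the batch."""
    n = matrices.shape[-1]
    products = matrices @ predicted_inverses            # batched matrix product
    return np.linalg.norm(products - np.eye(n), axis=(-2, -1))

rng = np.random.default_rng(0)
A = rng.uniform(-2, 2, size=(1000, 3, 3))
A = A[np.abs(np.linalg.det(A)) > 0.1]                   # same filter as in the training loop
exact = np.linalg.inv(A)                                # stand-in for model2.predict(A) * 5

# Exact inverses give a residual near zero; perturbed ones give a visibly larger value.
residual_exact = inverse_residual(A, exact)
residual_noisy = inverse_residual(A, exact + 0.01 * rng.standard_normal(exact.shape))
print(residual_exact.max(), residual_noisy.mean())
```

With a trained network, `exact` would be replaced by the rescaled prediction, and the mean residual could be tracked alongside the MSE.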
I am trying to use keras dense neural networks to forecast some time series.
When fitting my model on complex real datasets, my model converges toward a constant output, i.e. whatever the input, the model gives the same output (which seems to be a reasonable estimate of the mean of my dataset).
I reduced the problem down to a very simple simulated dataset, and I still have the same issue. Here is a minimal working example:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt
X = []
Y = []
for jh in range(10000):
    x = np.arange(-1, 1, 0.01)
    y = 1+x*((np.random.random()-0.5))
    y += np.random.randn(len(x))/(100)
    X.append(y[:100])
    Y.append(y[100:])
X = np.array(X)[:,:,None]
Y = np.array(Y)[:,:,None]
model = models.Sequential()
model.add(layers.Input((100,1,)))
model.add(layers.Flatten())
model.add(layers.Dense(100, activation='sigmoid'))
model.add(layers.Dense(100, activation='sigmoid'))
model.add(layers.Dense(100, activation='sigmoid'))
model.add(tf.keras.layers.Reshape((100,1)))
model.compile(loss = tf.keras.losses.MeanSquaredError(),optimizer="adam")
# model.summary()
print("Fit model on training data")
history = model.fit(x=X, y=Y, batch_size=10000, epochs=200)
for k in np.arange(0,10000,1000):
    plt.plot(np.arange(len(X[k])), X[k])
    plt.plot(np.arange(len(X[k]), len(X[k])+len(Y[k])), model(X)[k])
    plt.plot(np.arange(len(X[k]), len(X[k])+len(Y[k])), Y[k])
In this example, the model returns exactly the same output regardless of the input.
I tried changing the number of layers, the loss function, the learning rate, the batch size and the number of epochs, without any noticeable improvement.
Do you have any suggestions on this issue?
If you change how your random inputs are generated, for example to
y = np.array(1. + x)
y += 1. / 100.
and also build a second input set
J, K = [] , []
for jh in range(10000):
    j = np.arange(-1, 1, 0.01)
    k = -np.array(1. - j)
    k += 1. / 100
    J.append(k[:100])
    K.append(k[100:])
J = np.array(J)[:, :, None]
K = np.array(K)[:, :, None]
and finally add
plt.plot(np.arange(len(X[k]), len(X[k]) + len(Y[k])), model(J)[k])
to the plotting loop, then you will see two different results. You should probably check the diversity of your dataset.
I am training a Keras dense model to approximate the determinant of 2x2 matrices. I am using 30 hidden layers with 100 nodes each and 10^6 matrices (with integer entries in the interval [0, 100)). After predicting on the test set (about a third of the total), I compute the square root of the MSE and usually get something not greater than 100. I think this is quite a high error (although I am not sure what would count as a good error in this case), but besides increasing the number of samples, I am not sure how to improve it (10^6 already seems like a big number). I hope someone can provide some advice. Here is the code:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
### Select number of samples, matrix size and range of entries in matrices
nb_samples = 1000000
matrix_size = 2
entries_range = 100
### Generate random matrices and determinants
matrices = []
determinants = []
for i in range(nb_samples):
    matrix = np.random.randint(entries_range, size = (matrix_size,matrix_size))
    matrices.append(matrix.reshape(matrix_size**2,))
    determinants.append(np.array(np.linalg.det(matrix)).reshape(1,))
matrices = np.array(matrices)
determinants = np.array(determinants)
### Split the data
matrices_train, matrices_test, determinants_train, determinants_test = train_test_split(matrices,determinants,train_size = 0.66)
### Select number of layers and neurons
nb_layers = 30
nb_neurons = 100
### Create dense neural network with nb_layers hidden layers having nb_neurons neurons each
model = Sequential()
model.add(Dense(nb_neurons, input_dim = matrix_size**2, activation='relu'))
for i in range(nb_layers):
    model.add(Dense(nb_neurons, activation='relu'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
model.fit(matrices_train, determinants_train, epochs = 10, batch_size = 100, verbose = 0)
#_ , test_acc = model.evaluate(matrices_test,determinants_test)
#print(test_acc)
### Make a prediction on the test set
determinants_pred = model.predict(matrices_test)
print('''
RMSE: {}
Number of layers: {}
Number of neurons: {}
Number of samples: {}
'''.format(np.sqrt(mean_squared_error(determinants_test,determinants_pred)),nb_layers,nb_neurons,nb_samples))
Here is an output:
RMSE: 20.429616387932295
Number of layers: 32
Number of neurons: 32
Number of samples: 1000000
Note: I decided to go for 30 layers and 100 nodes in each by trial and error (the MSE seemed the lowest around these values).
I think your network is massive for the size of the problem (input dim = 4, output dim = 1), and you are not training for nearly enough epochs.
We can also cheat a bit here: since we know the determinant can be written in terms of squares of linear combinations of the inputs, we can use a custom x*x activation function. Here is an example with 10 neurons, 1 hidden layer, the custom activation function above, epochs = 1000 and nb_samples = 10000, which produces
RMSE: 0.04413008355924881
Number of layers: 1
Number of neurons: 10
Number of samples: 10000
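To make the "squares of linear combinations" remark concrete: for a matrix [[a, b], [c, d]], ad - bc = ((a+d)^2 - (a-d)^2 - (b+c)^2 + (b-c)^2) / 4, so four hidden neurons with an x*x activation and a linear output are enough in principle. The sketch below checks this identity with hand-written weights in plain NumPy; the names `W`, `v` and `det_via_squares` are illustrative and are not the weights the trained network actually finds.

```python
import numpy as np

# Hidden layer: each row holds one neuron's weights over the flattened input (a, b, c, d).
W = np.array([
    [1, 0,  0,  1],   # a + d
    [1, 0,  0, -1],   # a - d
    [0, 1,  1,  0],   # b + c
    [0, 1, -1,  0],   # b - c
], dtype=float)
# Linear output layer combining the squared hidden activations.
v = np.array([0.25, -0.25, -0.25, 0.25])

def det_via_squares(flat):
    """Determinant of 2x2 matrices given as rows (a, b, c, d)."""
    hidden = (flat @ W.T) ** 2        # x*x activation
    return hidden @ v

rng = np.random.default_rng(0)
mats = rng.integers(0, 100, size=(5, 2, 2)).astype(float)
approx = det_via_squares(mats.reshape(5, 4))
exact = np.linalg.det(mats)
print(np.allclose(approx, exact))
```

This is why a small network with this activation can reach a low error where a deep 'relu' network struggles.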
Here is your code in full with my small modifications:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
### Select number of samples, matrix size and range of entries in matrices
nb_samples = 10000#00
matrix_size = 2
entries_range = 100
### Generate random matrices and determinants
matrices = []
determinants = []
for i in range(nb_samples):
    matrix = np.random.randint(entries_range, size = (matrix_size,matrix_size))
    matrices.append(matrix.reshape(matrix_size**2,))
    determinants.append(np.array(np.linalg.det(matrix)).reshape(1,))
matrices = np.array(matrices)
determinants = np.array(determinants)
### Split the data
matrices_train, matrices_test, determinants_train, determinants_test = train_test_split(matrices,determinants,train_size = 0.66)
### Select number of layers and neurons
nb_layers = 1#30
nb_neurons = 10#0
### Create dense neural network with nb_layers hidden layers having nb_neurons neurons each
model = Sequential()
model.add(Dense(nb_neurons, input_dim = matrix_size**2, activation=lambda x:x*x))
#for i in range(nb_layers):
# model.add(Dense(nb_neurons, activation='relu'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
model.fit(matrices_train, determinants_train, epochs = 1000, batch_size = 100, verbose = 1)
#_ , test_acc = model.evaluate(matrices_test,determinants_test)
#print(test_acc)
### Make a prediction on the test set
determinants_pred = model.predict(matrices_test)
print('''
RMSE: {}
Number of layers: {}
Number of neurons: {}
Number of samples: {}
'''.format(np.sqrt(mean_squared_error(determinants_test,determinants_pred)),nb_layers,nb_neurons,nb_samples))
I'm trying to run a Bayesian neural network based on Yarin Gal's thesis "Uncertainty in Deep Learning". I found this code on GitHub:
import math
from scipy.special import logsumexp  # logsumexp is no longer available from scipy.misc in recent SciPy releases
import numpy as np
from keras.regularizers import l2
from keras import Input
from keras.layers import Dropout
from keras.layers import Dense
from keras import Model
import time
class net:
    def __init__(self, X_train, y_train, n_hidden, n_epochs = 40,
        normalize = False, tau = 1.0, dropout = 0.05):
        """
            Constructor for the class implementing a Bayesian neural network
            trained with the probabilistic back propagation method.
            @param X_train      Matrix with the features for the training data.
            @param y_train      Vector with the target variables for the
                                training data.
            @param n_hidden     Vector with the number of neurons for each
                                hidden layer.
            @param n_epochs     Number of epochs for which to train the
                                network. The recommended value 40 should be
                                enough.
            @param normalize    Whether to normalize the input features. This
                                is recommended unless the input vector is for
                                example formed by binary features (a
                                fingerprint). In that case we do not recommend
                                to normalize the features.
            @param tau          Tau value used for regularization
            @param dropout      Dropout rate for all the dropout layers in the
                                network.
        """
        # We normalize the training data to have zero mean and unit standard
        # deviation in the training set if necessary
        if normalize:
            self.std_X_train = np.std(X_train, 0)
            self.std_X_train[ self.std_X_train == 0 ] = 1
            self.mean_X_train = np.mean(X_train, 0)
        else:
            self.std_X_train = np.ones(X_train.shape[ 1 ])
            self.mean_X_train = np.zeros(X_train.shape[ 1 ])
        X_train = (X_train - np.full(X_train.shape, self.mean_X_train)) / \
            np.full(X_train.shape, self.std_X_train)
        self.mean_y_train = np.mean(y_train)
        self.std_y_train = np.std(y_train)
        y_train_normalized = (y_train - self.mean_y_train) / self.std_y_train
        y_train_normalized = np.array(y_train_normalized, ndmin = 2).T
        # We construct the network
        N = X_train.shape[0]
        batch_size = 128
        lengthscale = 1e-2
        reg = lengthscale**2 * (1 - dropout) / (2. * N * tau)
        inputs = Input(shape=(X_train.shape[1],))
        inter = Dropout(dropout)(inputs, training=True)
        inter = Dense(n_hidden[0], activation='relu', W_regularizer=l2(reg))(inter)
        for i in range(len(n_hidden) - 1):
            inter = Dropout(dropout)(inter, training=True)
            inter = Dense(n_hidden[i+1], activation='relu', W_regularizer=l2(reg))(inter)
        inter = Dropout(dropout)(inter, training=True)
        outputs = Dense(y_train_normalized.shape[1], W_regularizer=l2(reg))(inter)
        model = Model(inputs, outputs)
        model.compile(loss='mean_squared_error', optimizer='adam')
        # We iterate the learning process
        start_time = time.time()
        model.fit(X_train, y_train_normalized, batch_size=batch_size, nb_epoch=n_epochs, verbose=0)
        self.model = model
        self.tau = tau
        self.running_time = time.time() - start_time
        # We are done!
    def predict(self, X_test, y_test):
        """
            Function for making predictions with the Bayesian neural network.
            @param X_test   The matrix of features for the test data
            @return m       The predictive mean for the test target variables.
            @return v       The predictive variance for the test target
                            variables.
            @return v_noise The estimated variance for the additive noise.
        """
        X_test = np.array(X_test, ndmin = 2)
        y_test = np.array(y_test, ndmin = 2).T
        # We normalize the test set
        X_test = (X_test - np.full(X_test.shape, self.mean_X_train)) / \
            np.full(X_test.shape, self.std_X_train)
        # We compute the predictive mean and variance for the target variables
        # of the test data
        model = self.model
        standard_pred = model.predict(X_test, batch_size=500, verbose=1)
        standard_pred = standard_pred * self.std_y_train + self.mean_y_train
        rmse_standard_pred = np.mean((y_test.squeeze() - standard_pred.squeeze())**2.)**0.5
        T = 10000
        Yt_hat = np.array([model.predict(X_test, batch_size=500, verbose=0) for _ in range(T)])
        Yt_hat = Yt_hat * self.std_y_train + self.mean_y_train
        MC_pred = np.mean(Yt_hat, 0)
        rmse = np.mean((y_test.squeeze() - MC_pred.squeeze())**2.)**0.5
        # We compute the test log-likelihood
        ll = (logsumexp(-0.5 * self.tau * (y_test[None] - Yt_hat)**2., 0) - np.log(T)
            - 0.5*np.log(2*np.pi) + 0.5*np.log(self.tau))
        test_ll = np.mean(ll)
        # We are done!
        return rmse_standard_pred, rmse, test_ll
I'm new to programming, so I still have to study Python classes to understand the code. My question arises when I try to run it: the constructor asks for a "vector with the number of neurons for each hidden layer", and I don't know how to create this vector or what it means for the code. I've tried different vectors, like
vector = np.array([1, 2, 3]), but honestly I don't know the correct answer. All I have is the feature data and the target data. I hope you can help me.
That syntax is correct: vector = np.array([1, 2, 3]) is the way to define a vector with Python's NumPy.
A neural network can have any number of hidden (internal) layers, and each layer has a certain number of neurons.
So in this code, vector = np.array([100, 150, 100]) means that the network has 3 hidden layers (because the vector has 3 values), with 100, 150 and 100 neurons respectively, going from input to output.
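As a rough illustration of how such a vector maps onto layers, the sketch below lists the weight-matrix shape of each dense layer that a given `n_hidden` implies. `layer_shapes` is a hypothetical helper written for this explanation (it is not part of the GitHub code), and the 8 input features are an assumed example value.

```python
def layer_shapes(n_features, n_hidden, n_outputs=1):
    """Return the (inputs, outputs) shape of each dense layer's weight matrix, in order."""
    sizes = [n_features] + list(n_hidden) + [n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))

n_hidden = [100, 150, 100]                      # three hidden layers
shapes = layer_shapes(n_features=8, n_hidden=n_hidden)
print(shapes)   # [(8, 100), (100, 150), (150, 100), (100, 1)]
```

Since the constructor above only uses `n_hidden[i]` and `len(n_hidden)`, a plain Python list such as `[100, 150, 100]` should work just as well as a NumPy array, for example `net(X_train, y_train, n_hidden=[100, 150, 100])`.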
I am currently trying to learn logistic regression, and am stuck on plotting a line from the weights after training. I am expecting an array of 3 values, but when I print the weights to check them, I get (with different values each time, but the same format):
[array([[ 0.42433906],
[-0.67847246]], dtype=float32)
array([-0.06681705], dtype=float32)]
My question is: why are the weights in this format of 2 arrays, rather than 1 array of length 3? And how do I interpret these weights so that I can plot the separating line?
Here is my code:
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import L1L2
import random
import numpy as np
# return the array data of shape (m, 2) and the array labels of shape (m, 1)
def get_random_data(w, b, mu, sigma, m): # slope, y-intercept, mean of the data, standard deviation, size of arrays
    data = np.empty((m, 2))
    labels = np.empty((m, 1))
    # fill the arrays with random data
    for i in range(m):
        c = (random.random() > 0.5) # 0 with probability 1/2 and 1 with probability 1/2
        n = random.normalvariate(mu, sigma) # noise using normal distribution
        x_1 = random.random() # uniform distribution on [0, 1)
        x_2 = w * x_1 + b + (-1)**c * n
        labels[i] = c
        data[i][0] = x_1
        data[i][1] = x_2
    # the train set is the first 80% of our data, and the test set is the following 20%
    train_length = int(round(m * 0.8, 1))
    train_data = np.empty((train_length, 2))
    train_labels = np.empty((train_length, 1))
    test_data = np.empty((m - train_length, 2))
    test_labels = np.empty((m - train_length, 1))
    for i in range(train_length):
        train_data[i] = data[i]
        train_labels[i] = labels[i]
    for i in range(train_length, m):
        test_data[i - train_length] = data[i]
        test_labels[i - train_length] = labels[i]
    return (train_data, train_labels), (test_data, test_labels)
(train_data, train_labels), (test_data, test_labels) = get_random_data(2,3,100,100,200)
model = Sequential()
model.add(Dense(train_labels.shape[1],
activation='sigmoid',
kernel_regularizer=L1L2(l1=0.0, l2=0.1),
input_dim=(train_data.shape[1])))
model.compile(optimizer='sgd',
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit(train_data, train_labels, epochs=100, validation_data=(test_data,test_labels))
weights = np.asarray(model.get_weights())
print("the weights are " , weights)
The first array holds the weights (the coefficients of the inputs) and the second array holds the bias.
So you have an equation like the one below.
h(x) = 0.42433906*x1 - 0.67847246*x2 - 0.06681705
Logistic regression takes this equation and applies the sigmoid function to squeeze the result between 0 and 1.
So if you want to draw the separating line, you can use the returned weights as explained above: it is the set of points where h(x) = 0, i.e. where the sigmoid output equals 0.5.
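As a sketch of that last step, the line can be obtained by solving w1*x1 + w2*x2 + b = 0 for x2. The code below uses the example values printed in the question and assumes w2 is non-zero; `boundary_x2` is a hypothetical helper name, and only NumPy is used.

```python
import numpy as np

# Weights in the format returned by model.get_weights(): [kernel of shape (2, 1), bias of shape (1,)]
kernel = np.array([[0.42433906], [-0.67847246]])
bias = np.array([-0.06681705])

def boundary_x2(x1, kernel, bias):
    """x2 on the line w1*x1 + w2*x2 + b = 0, where the sigmoid output is 0.5."""
    w1, w2 = kernel[:, 0]
    return -(w1 * x1 + bias[0]) / w2

x1 = np.linspace(0.0, 1.0, 5)
x2 = boundary_x2(x1, kernel, bias)

# Sanity check: on the line, h(x) is ~0 and the predicted probability is ~0.5.
h = kernel[0, 0] * x1 + kernel[1, 0] * x2 + bias[0]
prob = 1.0 / (1.0 + np.exp(-h))
print(x2, prob)
```

Plotting `x1` against `x2` (for instance with matplotlib's `plt.plot(x1, x2)`) on top of the data scatter would then show the separating line.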
I've got a problem where I want to predict one time series from many time series. My input is (batch_size, time_steps, features) and my output should be (1, time_steps, features).
I can't figure out how to average over the batch dimension N.
Here's a dummy example. First, dummy data where the output is a linear function of 2000 time series:
import numpy as np
time = 100
N = 2000
dat = np.zeros((N, time))
for i in range(N):  # fill all N series (the original looped over range(time), leaving most rows at zero)
    dat[i,:] = np.sin(list(range(time)))*np.random.normal(size =1) + np.random.normal(size = 1)
y = dat.T @ np.random.normal(size = N)
Now I'll define a time series model (using 1-D conv nets):
from keras.models import Model
from keras.layers import Input, Conv1D, Dense, Lambda
from keras.optimizers import Adam
from keras import backend as K
n_filters = 2
filter_width = 3
dilation_rates = [2**i for i in range(5)]
inp = Input(shape=(None, 1))
x = inp
for dilation_rate in dilation_rates:
    x = Conv1D(filters=n_filters,
               kernel_size=filter_width,
               padding='causal',
               activation = "relu",
               dilation_rate=dilation_rate)(x)
x = Dense(1)(x)
model = Model(inputs = inp, outputs = x)
model.compile(optimizer = Adam(), loss='mean_squared_error')
model.predict(dat.reshape(N, time, 1)).shape
Out[43]: (2000, 100, 1)
The output is the wrong shape! Next, I tried using an averaging layer, but I get this weird error:
def av_over_batches(x):
    x = K.mean(x, axis = 0)
    return(x)
x = Lambda(av_over_batches)(x)
model = Model(inputs = inp, outputs = x)
model.compile(optimizer = Adam(), loss='mean_squared_error')
model.predict(dat.reshape(N, time, 1)).shape
Traceback (most recent call last):
File "<ipython-input-3-d43ccd8afa69>", line 4, in <module>
model.predict(dat.reshape(N, time, 1)).shape
File "/home/me/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1169, in predict
steps=steps)
File "/home/me/.local/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 302, in predict_loop
outs[i][batch_start:batch_end] = batch_out
ValueError: could not broadcast input array from shape (100,1) into shape (32,1)
Where does 32 come from? (Incidentally, I got the same number in my real data, not just in the MWE).
But the main question is: how can I build a network that averages over the input batch dimension?
I would approach the problem in a different way.
Problem: you want to predict a time series from a set of time series. Say you have 3 time series TS1, TS2, TS3, each of 100 time steps, and you want to predict a time series y1, y2, y3.
My approach is to group the values of all the time series at each time step together and feed them to an LSTM.
If some time series are shorter than others, you can pad them. Similarly, if some sets have fewer time series, pad them as well.
Example:
import numpy as np
np.random.seed(33)
time = 100
N = 5000
k = 5
magic = np.random.normal(size = k)
x = list()
y = list()
for i in range(N):
    dat = np.zeros((k, time))
    for j in range(k):  # inner index renamed so it no longer shadows the outer loop variable
        dat[j,:] = np.sin(list(range(time)))*np.random.normal(size =1) + np.random.normal(size = 1)
    x.append(dat)
    y.append(dat.T @ magic)
So I want to predict a time series of 100 steps from a set of 5 time series (k = 5). We want the model to learn the magic.
from keras.models import Model
from keras.layers import Input, Conv1D, Dense, Lambda, LSTM
from keras.optimizers import Adam
from keras import backend as K
import matplotlib.pyplot as plt
input = Input(shape=(time, k))
lstm = LSTM(32, return_sequences=True)(input)
output = Dense(1,activation='sigmoid')(lstm)
model = Model(inputs = input, outputs = output)
model.compile(optimizer = Adam(), loss='mean_squared_error')
data_x = np.zeros((N,100,5))
data_y = np.zeros((N,100,1))
for i in range(N):
    data_x[i] = x[i].T.reshape(100,5)
    data_y[i] = y[i].reshape(100,1)
from sklearn.preprocessing import StandardScaler
ss_x = StandardScaler()
ss_y = StandardScaler()
data_x = ss_x.fit_transform(data_x.reshape(N,-1)).reshape(N,100,5)
data_y = ss_y.fit_transform(data_y.reshape(N,-1)).reshape(N,100,1)
# Let's leave the last sample for testing; the rest is split into train and validation
model.fit(data_x[:-1],data_y[:-1], batch_size=64, epochs=100, validation_split=.25)
The validation loss was still going down, but I stopped it. Let's see how good our prediction is:
y_hat = model.predict(data_x[-1].reshape(-1,100,5))
plt.plot(data_y[-1], label='y')
plt.plot(y_hat.reshape(100), label='y_hat')
plt.legend(loc='upper left')
The results are promising. Running it for more epochs and tuning the hyperparameters should bring us closer to the magic. One can also try stacked LSTMs and bidirectional LSTMs.
I feel RNNs are better suited to time series data than CNNs.
Data Format:
Let's say time steps = 3
Time series 1 = [1,2,3]
Time series 2 = [4,5,6]
Time series 3 = [7,8,9]
Time series 4 = [10,11,12]
Y = [100,200,300]
For a batch size of 1
[[1,4,7,10],[2,5,8,11],[3,6,9,12]] -> LSTM -> [100,200,300]
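The layout above can be produced with NumPy by stacking the individual series along the last axis. This is a minimal sketch using the toy values from the example; the variable names `series`, `sample`, `batch` and `target` are illustrative only.

```python
import numpy as np

series = [
    [1, 2, 3],      # time series 1
    [4, 5, 6],      # time series 2
    [7, 8, 9],      # time series 3
    [10, 11, 12],   # time series 4
]

# Stack along the last axis: one row per time step, one column per series.
sample = np.stack(series, axis=-1)                    # shape (time_steps, n_series) = (3, 4)
batch = sample[np.newaxis, ...]                       # shape (1, 3, 4), a batch of size 1
target = np.array([100, 200, 300]).reshape(1, 3, 1)   # one output value per time step

print(sample.tolist())   # [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
print(batch.shape, target.shape)
```

An LSTM with `return_sequences=True` would then take `batch` as input and be trained against `target`.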