Averaging over the batch dimension in Keras - python

I've got a problem where I want to predict one time series from many time series. My input is (batch_size, time_steps, features) and my output should be (1, time_steps, features). I can't figure out how to average over N (the batch dimension).
Here's a dummy example. First, dummy data where the output is a linear function of 2000 time series:
import numpy as np

time = 100
N = 2000
dat = np.zeros((N, time))
for i in range(N):
    dat[i, :] = np.sin(list(range(time))) * np.random.normal(size=1) + np.random.normal(size=1)
y = dat.T @ np.random.normal(size=N)
Now I'll define a time series model (using 1-D conv nets):
from keras.models import Model
from keras.layers import Input, Conv1D, Dense, Lambda
from keras.optimizers import Adam
from keras import backend as K

n_filters = 2
filter_width = 3
dilation_rates = [2**i for i in range(5)]

inp = Input(shape=(None, 1))
x = inp
for dilation_rate in dilation_rates:
    x = Conv1D(filters=n_filters,
               kernel_size=filter_width,
               padding='causal',
               activation='relu',
               dilation_rate=dilation_rate)(x)
x = Dense(1)(x)

model = Model(inputs=inp, outputs=x)
model.compile(optimizer=Adam(), loss='mean_squared_error')
model.predict(dat.reshape(N, time, 1)).shape
Out[43]: (2000, 100, 1)
The output is the wrong shape! Next, I tried using an averaging layer, but I get this weird error:
def av_over_batches(x):
    x = K.mean(x, axis=0)
    return x

x = Lambda(av_over_batches)(x)
model = Model(inputs=inp, outputs=x)
model.compile(optimizer=Adam(), loss='mean_squared_error')
model.predict(dat.reshape(N, time, 1)).shape
Traceback (most recent call last):
  File "<ipython-input-3-d43ccd8afa69>", line 4, in <module>
    model.predict(dat.reshape(N, time, 1)).shape
  File "/home/me/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1169, in predict
    steps=steps)
  File "/home/me/.local/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 302, in predict_loop
    outs[i][batch_start:batch_end] = batch_out
ValueError: could not broadcast input array from shape (100,1) into shape (32,1)
Where does 32 come from? (Incidentally, I got the same number in my real data, not just in the MWE).
But the main question is: how can I build a network that averages over the input batch dimension?

I would approach the problem in a different way.
Problem: you want to predict one time series from a set of time series. Say you have 3 time series TS1, TS2, TS3, each of 100 time steps, and you want to predict a time series y1, y2, ..., y100.
My approach to this problem is as follows: group the values of all the time series at each time step together and feed them to an LSTM. If some time series are shorter than others, you can pad them; similarly, if some sets have fewer time series, pad those as well.
Example:
import numpy as np
np.random.seed(33)

time = 100
N = 5000
k = 5
magic = np.random.normal(size=k)

x = list()
y = list()
for i in range(N):
    dat = np.zeros((k, time))
    for j in range(k):
        dat[j, :] = np.sin(list(range(time))) * np.random.normal(size=1) + np.random.normal(size=1)
    x.append(dat)
    y.append(dat.T @ magic)
So I want to predict a time series of 100 steps from a set of k = 5 time series. We want the model to learn the magic.
from keras.models import Model
from keras.layers import Input, Conv1D, Dense, Lambda, LSTM
from keras.optimizers import Adam
from keras import backend as K
import matplotlib.pyplot as plt

input = Input(shape=(time, k))
lstm = LSTM(32, return_sequences=True)(input)
output = Dense(1, activation='sigmoid')(lstm)

model = Model(inputs=input, outputs=output)
model.compile(optimizer=Adam(), loss='mean_squared_error')

data_x = np.zeros((N, 100, 5))
data_y = np.zeros((N, 100, 1))
for i in range(N):
    data_x[i] = x[i].T.reshape(100, 5)
    data_y[i] = y[i].reshape(100, 1)

from sklearn.preprocessing import StandardScaler
ss_x = StandardScaler()
ss_y = StandardScaler()
data_x = ss_x.fit_transform(data_x.reshape(N, -1)).reshape(N, 100, 5)
data_y = ss_y.fit_transform(data_y.reshape(N, -1)).reshape(N, 100, 1)

# Let's leave the last sample for testing; the rest splits into train and validation
model.fit(data_x[:-1], data_y[:-1], batch_size=64, epochs=100, validation_split=.25)
The validation loss was still going down, but I stopped it. Let's see how good our prediction is:
y_hat = model.predict(data_x[-1].reshape(-1,100,5))
plt.plot(data_y[-1], label='y')
plt.plot(y_hat.reshape(100), label='y_hat')
plt.legend(loc='upper left')
The results are promising. Running it for more epochs, along with hyperparameter tuning, should bring us even closer to the magic. One can also try stacked LSTMs and bidirectional LSTMs, as sketched below. I feel RNNs are better suited to time series data than CNNs.
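For reference, here is a minimal sketch of what a stacked, bidirectional variant of the same model might look like (my illustration, assuming the same (time, k) input as above; the layer sizes are arbitrary):

from keras.models import Model
from keras.layers import Input, LSTM, Dense, Bidirectional

inp = Input(shape=(time, k))
# Stack two recurrent layers; both must return the full sequence
# so the output keeps one value per time step
h = Bidirectional(LSTM(32, return_sequences=True))(inp)
h = LSTM(16, return_sequences=True)(h)
out = Dense(1)(h)

model2 = Model(inputs=inp, outputs=out)
model2.compile(optimizer='adam', loss='mean_squared_error')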
Data Format:
Let's say time steps = 3
Time series 1 = [1,2,3]
Time series 2 = [4,5,6]
Time series 3 = [7,8,9]
Time series 4 = [10,11,12]
Y = [100,200,300]
For a batch size of 1
[[1,4,7,10],[2,5,8,11],[3,6,9,12]] -> LSTM -> [100,200,300]
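As a concrete illustration (my sketch, using the numbers above), building that single-sample batch with numpy:

import numpy as np

ts = np.array([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9],
               [10, 11, 12]])      # (num_series=4, time_steps=3)
y = np.array([100, 200, 300])

# Group the values of all series at each time step: transpose to
# (time_steps, num_series), then add a leading batch dimension of 1
x = ts.T[np.newaxis, :, :]         # shape (1, 3, 4)
y = y.reshape(1, 3, 1)             # shape (1, 3, 1), matching the return_sequences output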

Related

Keras network returns same output regardless of the input

I am trying to use Keras dense neural networks to forecast some time series.
When fitting my model on complex real datasets, it converges toward a constant output, i.e. whatever the input, the model gives the same output (which seems to be a reasonable estimate of the mean of my dataset).
I reduced the problem down to very simple simulated datasets and still have the same issue. Here is a minimal working example:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt

X = []
Y = []
for jh in range(10000):
    x = np.arange(-1, 1, 0.01)
    y = 1 + x * (np.random.random() - 0.5)
    y += np.random.randn(len(x)) / 100
    X.append(y[:100])
    Y.append(y[100:])
X = np.array(X)[:, :, None]
Y = np.array(Y)[:, :, None]

model = models.Sequential()
model.add(layers.Input((100, 1,)))
model.add(layers.Flatten())
model.add(layers.Dense(100, activation='sigmoid'))
model.add(layers.Dense(100, activation='sigmoid'))
model.add(layers.Dense(100, activation='sigmoid'))
model.add(tf.keras.layers.Reshape((100, 1)))
model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer="adam")
# model.summary()

print("Fit model on training data")
history = model.fit(x=X, y=Y, batch_size=10000, epochs=200)

for k in np.arange(0, 10000, 1000):
    plt.plot(np.arange(len(X[k])), X[k])
    plt.plot(np.arange(len(X[k]), len(X[k]) + len(Y[k])), model(X)[k])
    plt.plot(np.arange(len(X[k]), len(X[k]) + len(Y[k])), Y[k])
In this example, the model returns exactly the same output regardless of the input.
I tried changing the number of layers, the loss function, the learning rate, the batch size and the number of epochs, without any noticeable improvement.
Do you have any suggestions on this issue?
If you rearrange your random inputs to be like
y = np.array(1. + x)
y += 1. / 100.
and also build a second dataset
J, K = [], []
for jh in range(10000):
    j = np.arange(-1, 1, 0.01)
    k = -np.array(1. - j)
    k += 1. / 100
    J.append(k[:100])
    K.append(k[100:])
J = np.array(J)[:, :, None]
K = np.array(K)[:, :, None]
and finally add
plt.plot(np.arange(len(X[k]), len(X[k]) + len(Y[k])), model(J)[k])
to the plotting loop, then you will see two different results. You should probably check your dataset's diversity.
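A quick way to confirm the constant-output symptom (my addition, assuming the X and model from the question): measure how much the predictions vary across samples; for a model that ignores its input, this is essentially zero.

import numpy as np

preds = model.predict(X)              # shape (10000, 100, 1)
print(np.std(preds, axis=0).max())    # ~0 if every input maps to the same output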

Matrix inverse approximation with keras dense model

I am training a neural network to calculate the inverse of a 3x3 matrix. I am using a Keras dense model with 1 hidden layer and 9 neurons. The activation function on the first layer is 'relu', and linear on the output layer. I am using 10000 matrices of determinant 1. The results I am getting are not very good (the RMSE is in the hundreds). I have tried more layers, more neurons, and other activation functions, but the gain is very small. Here is the code:
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
def generator(nb_samples, matrix_size=2, entries_range=(0, 1), determinant=None):
    '''
    Generate nb_samples random matrices of size matrix_size with float
    entries in interval entries_range and of determinant determinant
    '''
    matrices = []
    if determinant:
        inverses = []
        for i in range(nb_samples):
            matrix = np.random.uniform(entries_range[0], entries_range[1], (matrix_size, matrix_size))
            # rescale the first row so the determinant equals the requested value
            matrix[0] *= determinant / np.linalg.det(matrix)
            matrices.append(matrix.reshape(matrix_size**2,))
            inverses.append(np.array(np.linalg.inv(matrix)).reshape(matrix_size**2,))
        return np.array(matrices), np.array(inverses)
    else:
        determinants = []
        for i in range(nb_samples):
            matrix = np.random.uniform(entries_range[0], entries_range[1], (matrix_size, matrix_size))
            determinants.append(np.array(np.linalg.det(matrix)).reshape(1,))
            matrices.append(matrix.reshape(matrix_size**2,))
        return np.array(matrices), np.array(determinants)
### Select number of samples, matrix size and range of entries in matrices
nb_samples = 10000
matrix_size = 3
entries_range = (0, 100)
determinant = 1
### Generate random matrices and their inverses
matrices, inverses = generator(nb_samples, matrix_size = matrix_size, entries_range = entries_range, determinant = determinant)
### Select number of layers and neurons
nb_hidden_layers = 1
nb_neurons = matrix_size**2
activation = 'relu'
### Create dense neural network with nb_hidden_layers hidden layers having nb_neurons neurons each
model = Sequential()
model.add(Dense(nb_neurons, input_dim = matrix_size**2, activation = activation))
for i in range(nb_hidden_layers):
    model.add(Dense(nb_neurons, activation=activation))
model.add(Dense(matrix_size**2))
model.compile(loss='mse', optimizer='adam')
### Train and save model using train size of 0.66
history = model.fit(matrices, inverses, epochs = 400, batch_size = 100, verbose = 0, validation_split = 0.33)
### Get validation loss from object 'history'
rmse = np.sqrt(history.history['val_loss'][-1])
### Print RMSE and parameter values
print('''
Validation RMSE: {}
Number of hidden layers: {}
Number of neurons: {}
Number of samples: {}
Matrices size: {}
Range of entries: {}
Determinant: {}
'''.format(rmse,nb_hidden_layers,nb_neurons,nb_samples,matrix_size,entries_range,determinant))
I have checked online and there seem to be papers dealing with the problem of inverse matrix approximation. However, before changing the model, I would like to know whether there are other parameters I could change that would have a bigger impact on the error. I hope someone can provide some insight. Thank you.
Inverting a 3x3 matrix is pretty difficult for a neural network, as networks tend to be bad at multiplying or dividing activations. I wasn't able to get it to work with a simple dense network, but a 7-layer resnet does the trick. It has millions of weights, so it needs many more than 10000 examples: I found that it completely memorized up to 100,000 samples and badly overfit even with 10,000,000 samples, so I just generated samples continuously and fed each sample to the network once as it was generated.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

#too_small_model = tf.keras.Sequential([
#    tf.keras.layers.Flatten(),
#    tf.keras.layers.Dense(1500, activation="relu"),
#    tf.keras.layers.Dense(1500, activation="relu"),
#    tf.keras.layers.Dense(N * N),
#    tf.keras.layers.Reshape([N, N])
#])

N = 3
inp = tf.keras.layers.Input(shape=[N, N])
x = tf.keras.layers.Flatten()(inp)
x = tf.keras.layers.Dense(128, activation="relu")(x)
for _ in range(7):
    skip = x
    for _ in range(4):
        y = tf.keras.layers.Dense(256, activation="relu")(x)
        x = tf.keras.layers.concatenate([x, y])
        #x = tf.keras.layers.BatchNormalization()(x)
    # zero-initialized projection back to 128 units, so each residual
    # block starts out as the identity
    x = tf.keras.layers.Dense(128,
                              kernel_initializer=tf.keras.initializers.Zeros(),
                              bias_initializer=tf.keras.initializers.Zeros()
                              )(x)
    x = skip + x
    #x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dense(N * N)(x)
x = tf.keras.layers.Reshape([N, N])(x)

model2 = tf.keras.models.Model(inp, x)
model2.compile(loss="mean_squared_error", optimizer=tf.keras.optimizers.Adam(learning_rate=.00001))

for _ in range(5000):
    random_matrices = np.random.random((1000000, N, N)) * 4 - 2
    # drop near-singular matrices, whose inverses have huge entries
    random_matrices = random_matrices[np.abs(np.linalg.det(random_matrices)) > .1]
    inverses = np.linalg.inv(random_matrices)
    inverses = inverses / 5.  # normalize target values, large target values hamper training
    model2.fit(random_matrices, inverses, epochs=1, batch_size=1024)

zz = model2.predict(random_matrices[:10000])
plt.scatter(inverses[:10000], zz, s=.0001)
print(random_matrices[76] @ zz[76] * 5)
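As a sanity check on the trained model (my addition, reusing the names from the snippet above): multiplying a matrix by its predicted inverse, after undoing the /5 target scaling, should give something close to the 3x3 identity.

import numpy as np

# average absolute deviation from the identity over 100 test matrices
approx_identity = random_matrices[:100] @ (zz[:100] * 5)
print(np.mean(np.abs(approx_identity - np.eye(N))))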

Concatenate a layer to itself

I am learning to work with categorical data by following the entity embedding code examples of Abhishek Thakur. He uses Visual Studio as his Python editor, but I am using a Google Colab notebook. The code is the same, but I am getting an error on a concatenation.
According to the TensorFlow documentation:
layers.Concatenate(axis = <axis along which to concatenate>)
The sample code uses:
x = layers.Concatenate()(outputs)
without mentioning the axis along which the concatenation is supposed to happen. Using the code as is, I get the error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-8d7152b44077> in <module>()
----> 1 run(0)

5 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/merge.py in build(self, input_shape)
    491     # Used purely for shape validation.
    492     if not isinstance(input_shape[0], tuple) or len(input_shape) < 2:
--> 493       raise ValueError('A `Concatenate` layer should be called '
    494                        'on a list of at least 2 inputs')
    495     if all(shape is None for shape in input_shape):

ValueError: A `Concatenate` layer should be called on a list of at least 2 inputs
I get the same error even when I add the argument axis = 1.
Here is my sample code:
import os
import gc
import joblib
import pandas as pd
import numpy as np
from sklearn import metrics, preprocessing
from tensorflow.keras import layers
from tensorflow.keras import optimizers
from tensorflow.keras import Model
from tensorflow.keras.models import load_model
from tensorflow.keras import callbacks
from tensorflow.keras import backend as K
from tensorflow.keras import utils
def create_model(data, catcols):
    # init list of inputs for the embeddings
    inputs = []
    # init list of outputs for the embeddings
    outputs = []
    # loop over all categorical columns
    for c in catcols:
        # find the number of unique values in the column
        num_unique_values = int(data[c].nunique())
        # simple embedding-dimension calculator:
        # min size is half the number of unique values,
        # max size is 50. Max size depends on the number of unique
        # categories too.
        embed_dim = int(min(np.ceil(num_unique_values / 2), 50))
        # simple keras input layer with size 1
        inp = layers.Input(shape=(1,))
        # add embedding layer to raw input
        # embedding size is always 1 more than the number of unique values in the input
        out = layers.Embedding(num_unique_values + 1, embed_dim, name=c)(inp)
        # 1-d spatial dropout is the standard for embedding layers
        # you can use it in NLP tasks too
        out = layers.SpatialDropout1D(0.3)(out)
        # reshape the input to the dimension of the embedding
        # this becomes our output layer for the current feature
        out = layers.Reshape(target_shape=(embed_dim,))(out)
        # add input to input list
        inputs.append(inp)
        # add out to output list
        outputs.append(out)
    # concatenate all output layers
    x = layers.Concatenate(axis=1)(outputs)
    # add a batchnorm layer
    x = layers.BatchNormalization()(x)
    # add dense layers with dropout
    x = layers.Dense(300, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dense(300, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    x = layers.BatchNormalization()(x)
    y = layers.Dense(2, activation="softmax")(x)
    # create final model
    model = Model(inputs=inputs, outputs=y)
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model
from sklearn import metrics, preprocessing
def run(fold):
    df = pd.read_csv('/content/drive/My Drive/train_folds.csv')
    features = [f for f in df.columns if f not in ("id", "target", "kfold")]
    for col in features:
        # fill the NaN values with the string "NONE"; fillna is applied to
        # one column at a time, not to a list of columns
        df.loc[:, col] = df[col].astype(str).fillna("NONE")
    for feat in features:
        lbl_enc = preprocessing.LabelEncoder()  # create the label encoder
        df.loc[:, feat] = lbl_enc.fit_transform(df[feat].values)  # label encode all columns
    # split the data: train on every fold except the argument fold,
    # which is reserved for validation
    df_train = df[df.kfold != fold].reset_index(drop=True)
    df_valid = df[df.kfold == fold].reset_index(drop=True)
    # create the model
    model = create_model(df, features)
    # our features are lists of lists
    xtrain = [df_train[features].values[:, k] for k in range(len(features))]
    xvalid = [df_valid[features].values[:, k] for k in range(len(features))]
    # retrieve target values
    ytrain = df_train.target.values
    yvalid = df_valid.target.values
    # convert target columns to categories (binarization)
    ytrain_cat = utils.to_categorical(ytrain)
    yvalid_cat = utils.to_categorical(yvalid)
    # fit the model
    model.fit(xtrain, ytrain_cat, validation_data=(xvalid, yvalid_cat), verbose=1, batch_size=1024, epochs=3)
    # generate validation predictions
    valid_preds = model.predict(xvalid)[:, 1]
    # metrics
    print(metrics.roc_auc_score(yvalid, valid_preds))
    # save memory by clearing the session
    K.clear_session()
I ran the code:
run(0)
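For what it's worth, that ValueError fires whenever the list handed to Concatenate holds fewer than two tensors, no matter which axis is passed; so one thing to verify is that catcols really yields more than one entry in outputs. A minimal sketch of the trigger and a common guard (my illustration, not part of the original post):

from tensorflow.keras import layers, Input

inp = Input(shape=(1,))
#layers.Concatenate(axis=1)([inp])            # raises the ValueError above
ok = layers.Concatenate(axis=1)([inp, inp])   # fine: the list has two tensors

# guard for the single-column case:
outputs = [inp]
x = outputs[0] if len(outputs) == 1 else layers.Concatenate()(outputs)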

Keras : second derivative of a keras model is giving 0 all the time

To calculate the second derivative of a neural network's output with respect to its input using a Keras model, I implemented the code used in these two posts:
Keras: using the sum of the first and second derivatives of a model as final output
Second derivative in Keras
However, for both techniques, the second derivative is always equal to 0. Here is the problem reproduced on a simple regression of the quadratic function:
import tensorflow as tf
import tensorflow.keras.backend as K
from tensorflow.keras.backend import set_session
from tensorflow.keras import datasets, layers, models
from tensorflow.keras import optimizers
import numpy as np

x = np.linspace(0, 1, 1000)
y = x**2

def model_regression():
    model = tf.keras.Sequential([
        layers.Dense(1024, input_dim=1, activation="relu"),
        layers.Dense(1, activation="linear")])
    model.compile(optimizer=optimizers.Adam(lr=0.001),
                  loss='mean_squared_error')
    return model

my_model = model_regression()
my_model.fit(x, y, epochs=10, batch_size=20)
y_pred = my_model.predict(x)

#### Technique 1
first_input = K.gradients(my_model.output, my_model.input)
second_input = K.gradients(first_input, my_model.input)
iterate_first = K.function([my_model.input], [first_input])
iterate_second = K.function([my_model.input], [second_input])

#### Technique 2
# def grad(y, x):
#     print(y.shape)
#     return layers.Lambda(lambda z: K.gradients(z[0], z[1]), output_shape=[1])([y, x])
# derivative1 = grad(my_model.output, my_model.input)
# derivative2 = grad(derivative1, my_model.input)
# iterate_first = K.function([my_model.input], [derivative1])
# iterate_second = K.function([my_model.input], [derivative2])

first_derivative, second_derivative = [], []
for i in range(x.shape[0]):
    first_derivative.append(iterate_first(np.array([[x[i]]]))[0][0][0])
    second_derivative.append(iterate_second(np.array([[x[i]]]))[0][0][0])
Here is the graphical result: as you can see, the second derivative is always equal to 0, while it should be approximately equal to 2.
How can I correctly compute the second derivative of my neural network in Keras?
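One observation worth adding (my note, not part of the original post): a network built only from relu activations is piecewise linear in its input, so its true second derivative is zero almost everywhere, which is consistent with the flat result above. With a smooth activation such as tanh, nested GradientTapes (TF2, eager mode) return a nonzero second derivative. A minimal sketch under those assumptions:

import tensorflow as tf
import numpy as np

# Same regression task, but with a smooth activation so the true
# second derivative is not identically zero
smooth_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_dim=1, activation="tanh"),
    tf.keras.layers.Dense(1)])
smooth_model.compile(optimizer="adam", loss="mean_squared_error")

x = np.linspace(0, 1, 1000)
smooth_model.fit(x, x**2, epochs=10, batch_size=20, verbose=0)

xt = tf.constant(x.reshape(-1, 1), dtype=tf.float32)
with tf.GradientTape() as t2:
    t2.watch(xt)
    with tf.GradientTape() as t1:
        t1.watch(xt)
        out = smooth_model(xt)
    d1 = t1.gradient(out, xt)    # first derivative w.r.t. the input
d2 = t2.gradient(d1, xt)         # second derivative; approaches 2 with enough training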

Model fit and dimension size error for kera LSTM network in python

Hi, I have been working on an LSTM network in Python using Keras. I created a 1-D array for my training and test sets. When I try to fit the model I get the following error:
ValueError: Error when checking input: expected lstm_31_input to have 3 dimensions, but got array with shape (599, 1)
I have tried resizing the dimensions and adding a Flatten layer. None of this works. My code is below:
#Setup
import pandas as pd
import numpy as np
from numpy import array, zeros, newaxis
from numpy import argmax
from keras.layers.core import Dense, Activation, Dropout
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding, Flatten
from keras.layers import LSTM
#Used to ignore warnings about some of the tensor commands being deprecated
#Code from:
#https://stackoverflow.com/questions/43819820/how-to-disable-keras-warnings
#import warnings
#with warnings.catch_warnings():
#    warnings.simplefilter("ignore")
"""
#Allow use of modules from the Common_Functions Folder
import sys
sys.path.append('../_Common_Functions')
import Hello_World as helloWorld
"""
#Creates dataset of random numbers
#import numpy as np
from random import random

def generateDatset(n):
    val = np.array([])
    typ = np.array([])
    for i in range(1, n):
        val = np.append(val, round(random() * 10, 2))
        if val[i-1] < 3 or val[i-1] > 7:
            typ = np.append(typ, 'm')
        else:
            typ = np.append(typ, 'f')
    return val, typ
# Encode the output labels
def lable_encoding(gender_series):
    labels = np.empty((0, 2))
    for i in gender_series:
        if i == 'm':
            labels = np.append(labels, [[1, 0]], axis=0)
        else:
            labels = np.append(labels, [[0, 1]], axis=0)
    return labels
#Gets dataset in proper format for this program
val, typ = generateDatset(1000)
df = pd.DataFrame( {"first_name": val[:], "gender": typ[:]} )
# Split dataset in 60% train, 20% test and 20% validation
train, validate, test = np.split(df.sample(frac=1), [int(.6*len(df)), int(.8*len(df))])
# Convert both the input values and the output labels into the discussed machine-readable vector format
train_x = np.asarray(train.first_name)
#train_x = np.reshape(train_x, train_x.shape + (1,))
#train_x = np.reshape(train_x, (train_x.shape[0], 1, train_x.shape[1]))
train_y = lable_encoding(train.gender)
#train_y = np.reshape(train_y, train_y.shape + (1,))
#train_y = np.reshape(train_y, (train_y.shape[0], 1, train_y.shape[1]))
validate_x = np.asarray(validate.first_name)
#validate_x = np.reshape(validate_x, validate_x.shape + (1,))
validate_y = lable_encoding(validate.gender)
#validate_y = np.reshape(validate_y, validate_y.shape + (1,))
test_x = np.asarray(test.first_name)
#test_x = np.reshape(test_x, test_x.shape + (1,))
test_y = lable_encoding(test.gender)
#test_x = np.reshape(test_x, test_x.shape + (1,))
"""
The number of hidden nodes can be determined by the following equation:
Nh = (Ns/ (alpha * Ni + No ) )
Where Ni --> number of input neurons
No --> number of output neurons
Ns --> number of samples
alph --> scaling factor
Alternatively the following equation can be used:
Nh = (2/3)*(Ni + No)
As a not this equation is simpler but may not provide the best performance
"""
#Set a value for the scaling factor.
#This typically ranges between 2 and 10
alpha = 8
hidden_nodes = int(np.size(train_x) / (alpha * ((len(df.columns)-1)+ 4)))
input_length = train_x.shape # Length of the character vector
output_labels = 2 # Number of output labels
from keras import optimizers
# Build the model
print('Building model...')
model = Sequential()
#print(train_x.shape)
#
df = np.expand_dims(df, axis=2)
model.add(LSTM(hidden_nodes, return_sequences=True, input_shape=(599, 1)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(units=output_labels))
model.add(Activation('softmax'))
sgd = optimizers.SGD(lr=0.5, clipnorm=10.)
model.compile(loss='categorical_crossentropy', optimizer= sgd, metrics=['acc'])
#
batch_size=1000
#x = train_x[..., newaxis, newaxis]
#x.shape
#y = train_y[..., newaxis, newaxis]
#y.shape
model.fit(train_x, train_y, batch_size=batch_size, epochs=10)
#http://45.76.113.195/?questions/46616674/expected-ndim-3-found-ndim-2-how-to-feed-sparse-matrix-to-lstm-layer-in-keras
input_shape=(599, 1) specifies the shape of one sample.
Here, 599 is your number of training samples and 1 is the shape of one sample. Since the first layer is an LSTM layer, it needs a 3-dimensional input: (batch_size, number_of_time_steps_of_a_sample, dimensionality_of_one_time_step). But we don't mention the batch size in input_shape, so the input shape of an LSTM layer should be
input_shape=(number_of_time_steps_of_a_sample, dimensionality_of_one_time_step)
So you should (both changes are put together in the sketch below):
1) Replace input_shape=(599, 1) with input_shape=(1, 1)
2) Add the following line before training: train_x = train_x.reshape(599, 1, 1)
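Putting both changes together (a sketch of the corrected lines, reusing the variables defined in the question):

# One sample = one time step with a single feature
train_x = train_x.reshape(599, 1, 1)

model = Sequential()
# input_shape excludes the batch dimension: (time_steps_per_sample, features_per_step)
model.add(LSTM(hidden_nodes, return_sequences=True, input_shape=(1, 1)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(units=output_labels))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['acc'])
model.fit(train_x, train_y, batch_size=batch_size, epochs=10)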
To make this clearer, please refer to this video I uploaded ;-)
