I am trying to implement an algorithm from a paper, using Keras, where they train a neural network to approximate a mathematical function f(x) with a limited number of data points. I want the input of the neural network to be x and the output to have the form f(x) = 1 + xN(x), where N(x) is the value from the final dense layer.
I know how to make it work for output f(x) = N(x) but I just don't know how to adjust the network for f(x) = 1 + xN(x). Can someone help me?
This is my current code
from keras.layers import Input, Dense, Add, Multiply
from keras.models import Model
import keras.backend as K
import matplotlib.pyplot as plt
import numpy as np
import time
def f(x):
    return x**2
Xtrain = np.linspace(0, 1, 10)
ytrain = np.array([f(x) for x in Xtrain])
X = np.linspace(0, 2, 100)
y = np.array([f(x) for x in X])
input = Input(shape=(1,))
init = np.ones(shape=(10, 1))
init = K.variable(init)
hidden = input
hidden = Dense(8, activation='relu')(hidden)
out = Dense(1, activation='linear')(hidden)
out = Add()([init, Multiply()([out, input])])
model = Model(inputs=input, outputs=out)
model.compile(loss='mean_squared_error', optimizer="adam")
tic = time.perf_counter()
model.fit(Xtrain, ytrain, epochs=1000, verbose=1)
toc = time.perf_counter()
print(f"Training time: {toc - tic:0.4f} seconds")
prediction = model.predict(X)
prediction = prediction.reshape((100,))
plt.figure(figsize=(10,5))
plt.plot(X, y, color='red', label='Analytical solution')
plt.plot(X, prediction, color='black', label = 'Prediction')
plt.scatter(Xtrain, ytrain, color='blue', label='Training points')
plt.legend()
plt.show()
plt.tight_layout()
which crashes at line
out = Add()([init, Multiply()([out, input])])
The Add layer works both between two layers and between a layer and a number/ndarray.
You can just use it like this:
init=np.ones(shape=(10, 1))
inp = Input(shape=(1,))
hidden = Dense(8, activation='relu')(inp)
out = Dense(1, activation='linear')(hidden)
mul=Multiply()([out, inp])
out = Add()([init, mul])
model = Model(inputs=inp, outputs=out)
model.compile(loss='mean_squared_error', optimizer="adam")
I checked it and it worked.
By the way, input is a Python built-in function; I don't recommend shadowing it with a variable name.
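Alternatively, if you want the constant 1 added per sample rather than via a fixed-shape ndarray, you can compute f(x) = 1 + x·N(x) inside a Lambda layer. This is only a minimal sketch of that idea (same layer sizes as above, nothing tuned):
from keras.layers import Input, Dense, Lambda
from keras.models import Model

inp = Input(shape=(1,))
hidden = Dense(8, activation='relu')(inp)
n_x = Dense(1, activation='linear')(hidden)  # N(x)
# f(x) = 1 + x * N(x), computed per sample, so it works for any batch size
out = Lambda(lambda t: 1.0 + t[0] * t[1])([inp, n_x])

model = Model(inputs=inp, outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam')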
Related
I am modelling a reactor operation with a neural network which has 4 inputs (X1, X2, X3 and X4) and 3 outputs (Y1, Y2 and Y3) using Python's scikit-learn. The predicted outputs should also respect a mass balance, which can be represented by the equation Y1 + Y2 + Y3 - massflow_in = 0. How can this equation or restriction be integrated into the neural network? Is this possible?
My dataset is already cleaned by removing the samples where the mass balance is not honored.
Code
train_x = ...
train_y = ...
test_x = ...
test_y = ...
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import numpy as np

processed_data = data[["Y1.PV", "Y2.PV", "Y3.PV", "X1.PV", "X2.PV", "X3.PV", "X4.PV"]]
train, test = train_test_split(processed_data, test_size=.25, random_state=np.random.RandomState(1))
train_x = train.drop(["Y1.PV", "Y2.PV", "Y3.PV"], axis=1)
train_y = train[["Y1.PV", "Y2.PV", "Y3.PV"]]
test_x = test.drop(["Y1.PV", "Y2.PV", "Y3.PV"], axis=1)
test_y = test[["Y1.PV", "Y2.PV", "Y3.PV"]]
### TUNED MODEL
best_iter = 2000
best_hidden_layer = 200
neural_net = MLPRegressor(solver='lbfgs', hidden_layer_sizes=best_hidden_layer, random_state=1, max_iter=best_iter)
scaler = StandardScaler()
model = make_pipeline(scaler, neural_net)
model.fit(train_x, train_y)
### END SOLUTION
predict_y_train = model.predict(train_x)
predict_y = model.predict(test_x)
fig, axs = plt.subplots(ncols=2, figsize=(10,4))
axs[0].scatter(train_y, predict_y_train)
axs[0].set_title("Train data")
axs[1].scatter(test_y, predict_y)
axs[1].set_title("Test data")
print("MSE train:", mean_squared_error(train_y, predict_y_train))
print("MSE test:", mean_squared_error(test_y, predict_y))
A simple variable-reduction approach could be to reduce the network's outputs to Y1 and Y2 and to replace any occurrence of Y3 (e.g. in the loss function) with massflow - Y1 - Y2.
EDIT It seems that scikit-learn only provides very specialized methods for training neural networks and is not suited for your problem. If, instead, you want to try tensorflow, then here is a minimum working example to get you started:
import numpy as np
import tensorflow as tf
def constraint_mse_loss(y_true, y_pred):
    massflow = 2
    y3 = tf.expand_dims(massflow - y_pred[:, 0] - y_pred[:, 1], axis=1)
    y_pred = tf.concat((y_pred, y3), axis=1)
    return tf.keras.losses.mean_squared_error(y_true, y_pred)
def main():
    massflow = 2
    # generate random training data
    rng = np.random.default_rng()
    X = rng.random((100, 4))
    y = rng.random((100, 2))
    y = np.concatenate((y, massflow - y[:, [0]] - y[:, [1]]), axis=1)
    # define model
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(50, activation="relu"),
        tf.keras.layers.Dense(25, activation="relu"),
        tf.keras.layers.Dense(2, activation="linear")
    ])
    model.compile(
        loss=constraint_mse_loss,
        optimizer=tf.keras.optimizers.Adam()
    )
    # run training
    model.fit(X, y,
              batch_size=16,
              epochs=10,
              shuffle=True,
              validation_split=0.2)

if __name__ == "__main__":
    main()
I used randomly generated data here and set massflow=2 arbitrarily; replace these with your actual data. The proposed model architecture as well as all hyperparameters are meant as examples. The main point here lies in the custom loss function constraint_mse_loss, which implements a mean squared error loss together with the required massflow constraint. Once trained, the model only outputs Y1 and Y2, while Y3 can be retrieved as Y3 = massflow - Y1 - Y2.
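For completeness, here is a small sketch of how Y3 could be recovered at prediction time; new_X is a placeholder for your actual feature matrix, and massflow=2 matches the toy value used above:
import numpy as np

massflow = 2
y12 = model.predict(new_X)                  # columns: Y1, Y2
y3 = massflow - y12[:, [0]] - y12[:, [1]]   # enforce Y1 + Y2 + Y3 = massflow
y_full = np.concatenate((y12, y3), axis=1)  # columns: Y1, Y2, Y3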
FURTHER EDIT Things are quite different if massflow, as you now clarified, actually varies with each sample. That makes the massflow itself an input to the model. The following minimum working example covers this situation better:
import numpy as np
import tensorflow as tf
def main():
    # generate random training data
    rng = np.random.default_rng()
    X = rng.random((100, 4))
    massflow = rng.random((100, 1))
    X = {'X': X, 'massflow': massflow}
    y = rng.random((100, 2))
    y = np.concatenate((y, massflow - y[:, [0]] - y[:, [1]]), axis=1)
    inputs = {
        'X': tf.keras.layers.Input(shape=(4,), name='X', dtype=tf.float32),
        'massflow': tf.keras.layers.Input(shape=(1,), name='massflow', dtype=tf.float32)
    }
    x = tf.keras.layers.Dense(100)(inputs['X'])
    x = tf.keras.layers.Dense(50, activation="relu")(x)
    x = tf.keras.layers.Dense(25, activation="relu")(x)
    y1 = tf.keras.layers.Dense(1, activation="linear")(x)
    y2 = tf.keras.layers.Dense(1, activation="linear")(x)
    y3 = tf.keras.layers.Add()([y1, y2])
    y3 = tf.keras.layers.Subtract()([inputs['massflow'], y3])
    outputs = tf.keras.layers.Concatenate(name='y')([y1, y2, y3])
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(
        loss=tf.keras.losses.MeanSquaredError(),
        optimizer=tf.keras.optimizers.Adam()
    )
    # run training
    model.fit(X, y,
              batch_size=16,
              epochs=10,
              shuffle=True,
              validation_split=0.2)
    X = {'X': rng.random((1, 4)), 'massflow': rng.random((1, 1))}
    y = model.predict(X)
    print(f"Massflow {X['massflow'][0,0]:.4f} should be equal to sum of outputs {y.sum():.4f}.")

if __name__ == "__main__":
    main()
Here, enforcing your massflow constraint is part of the model. Its outputs now are Y1, Y2 and Y3 where the relation Y3=massflow-Y1-Y2 is enforced. As a nice side-effect, we now can work with the built-in MSE loss and don't need to use a custom loss implementation. Again, the proposed model architecture and all hyperparameters are meant as examples and must still be adjusted to your data.
I am using a TF2 (2.3.0) neural network to approximate the function y which solves the ODE y' + 3y = 0.
I have defined a custom loss class and function in which I am trying to differentiate the single output with respect to the single input so the equation holds, provided that y_true is zero:
from tensorflow.keras.losses import Loss
import tensorflow as tf
class CustomLossOde(Loss):

    def __init__(self, x, model, name='ode_loss'):
        super().__init__(name=name)
        self.x = x
        self.model = model

    def call(self, y_true, y_pred):
        with tf.GradientTape() as tape:
            tape.watch(self.x)
            y_p = self.model(self.x)
        dy_dx = tape.gradient(y_p, self.x)
        loss = tf.math.reduce_mean(tf.square(dy_dx + 3 * y_pred - y_true))
        return loss
but running the following NN:
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras import Input
from custom_loss_ode import CustomLossOde
num_samples = 1024
x_train = 4 * (tf.random.uniform((num_samples, )) - 0.5)
y_train = tf.zeros((num_samples, ))
inputs = Input(shape=(1,))
x = Dense(16, 'tanh')(inputs)
x = Dense(8, 'tanh')(x)
x = Dense(4)(x)
y = Dense(1)(x)
model = Model(inputs=inputs, outputs=y)
loss = CustomLossOde(model.input, model)
model.compile(optimizer=Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.99),loss=loss)
model.run_eagerly = True
model.fit(x_train, y_train, batch_size=16, epochs=30)
For now I am getting 0 loss from the first epoch, which doesn't make any sense.
I have printed both y_true and y_pred from within the function and they seem OK, so I suspect that the problem is in the gradient, which I didn't succeed in printing.
Appreciate any help.
Defining a custom loss with the high-level Keras API is a bit difficult in that case. I would instead write the training loop from scratch, as it allows finer-grained control over what you can do.
I took inspiration from these two guides:
Advanced Automatic Differentiation
Writing a training loop from scratch
Basically, I used the fact that multiple tapes can interact seamlessly. I use one to compute the loss function and the other to calculate the gradients to be propagated by the optimizer.
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras import Input
num_samples = 1024
x_train = 4 * (tf.random.uniform((num_samples, )) - 0.5)
y_train = tf.zeros((num_samples, ))
inputs = Input(shape=(1,))
x = Dense(16, 'tanh')(inputs)
x = Dense(8, 'tanh')(x)
x = Dense(4)(x)
y = Dense(1)(x)
model = Model(inputs=inputs, outputs=y)
# using the high level tf.data API for data handling
x_train = tf.reshape(x_train,(-1,1))
dataset = tf.data.Dataset.from_tensor_slices((x_train,y_train)).batch(1)
opt = Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.99)
for step, (x, y_true) in enumerate(dataset):
    # we need to convert x to a variable if we want the tape to be
    # able to compute the gradient according to x
    x_variable = tf.Variable(x)
    with tf.GradientTape() as model_tape:
        with tf.GradientTape() as loss_tape:
            loss_tape.watch(x_variable)
            y_pred = model(x_variable)
        dy_dx = loss_tape.gradient(y_pred, x_variable)
        loss = tf.math.reduce_mean(tf.square(dy_dx + 3 * y_pred - y_true))
    grad = model_tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grad, model.trainable_variables))
    if step % 20 == 0:
        print(f"Step {step}: loss={loss.numpy()}")
So, there is the universal approximation theorem which says that a neural network can approximate any continuous function, provided it has at least one hidden layer and uses non-linear activation there.
So my doubt is as follows: "How do I approximate a function using neural networks with my input being other functions?"
Let's say I want to approximate y = x + 1 and I have z_1 = 2x, z_2 = 3x + 3 and z_3 = 4x + 1, with x being time-variant. What I want my model to learn is the relationship between z_1, z_2, z_3 and y, as I may write y = -6*z_1 - 1*z_2 + 4*z_3 (I want my network to learn this relationship).
From time 0 to T I have the values of all functions and can do supervised learning, but from T + 1 onward I will only have z_1, z_2 and z_3, so I would be using the network to approximate the future values of y based on these z functions (z_1, z_2, z_3).
How do I implement that in Python using Keras? I used the following code but didn't get any decent results.
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
n = 10000
def z_1(x):
    x_0 = []
    for i in x:
        x_0.append(2*i)
    return x_0

def z_2(x):
    x_0 = []
    for i in x:
        x_0.append(3*i + 3)
    return x_0

def z_3(x):
    x_0 = []
    for i in x:
        x_0.append(4*i + 1)
    return x_0

def z_0(x):
    x_0 = []
    for i in x:
        x_0.append(i + 1)
    return x_0
model = Sequential()
model.add(Dense(500, activation='relu', input_dim=3))
model.add(Dense(500, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
np.random.seed(seed = 2000)
input = np.random.random(n) * 10
dataset = z_0(input)
input_1 = z_1(input)
input_2 = z_2(input)
input_3 = z_3(input)
x_train = np.array([input_1[0:int(0.8*n)], input_2[0:int(0.8*n)], input_3[0:int(0.8*n)]])
y_train = np.array([dataset[0:int(0.8*n)]])
x_train = x_train.reshape(int(0.8*n), 3)
y_train = y_train.reshape(int(0.8*n),1)
es = keras.callbacks.EarlyStopping(monitor='val_loss',
min_delta=0,
patience=0,
verbose=0, mode='auto')
model.fit(x_train, y_train, epochs=100, batch_size=128, callbacks = [es])
x_test = np.array([input_1[int(n-100):n], input_2[int(n-100):n], input_3[int(n-100):n]])
x_test = x_test.reshape(int(100), 3)
classes = model.predict(x_test, batch_size=128)
y_test = np.array([dataset[int(n-100):n]]).reshape(int(100),1)
plt.plot(y_test,c='b', label = 'test data')
plt.plot(classes,c='r', label = 'test result')
plt.legend()
plt.show()
You can't do this with a feedforward neural network. You need to do this with recurrent neural networks. Look up LSTM or GRU cells in Keras.
https://keras.io/layers/recurrent/
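For reference, a minimal sketch of what an LSTM-based setup could look like in Keras; the window length, layer size and data layout below are illustrative placeholders, not tuned choices:
from keras.models import Sequential
from keras.layers import LSTM, Dense

timesteps, features = 20, 3   # sliding windows over (z_1, z_2, z_3)
model = Sequential()
model.add(LSTM(32, input_shape=(timesteps, features)))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')

# X should have shape (n_samples, timesteps, 3) holding windows of z_1, z_2, z_3,
# and y shape (n_samples, 1) with the corresponding target values
# model.fit(X, y, epochs=50, batch_size=32)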
So I've just started experimenting a bit with TensorFlow, but I feel like I have a hard time grasping the concept. I'm currently focusing on the MNIST dataset, using only 8000 samples for training and 2000 for testing. The little code snippet I currently have is:
from keras.layers import Input, Dense, initializers
from keras.models import Model
from Dataset import Dataset
import matplotlib.pyplot as plt
from keras import optimizers, losses
import tensorflow as tf
import keras.backend as K
#global variables
d = Dataset()
num_features = d.X_train.shape[1]
low_dim = 32
def autoencoder():
    w = initializers.RandomNormal(mean=0.0, stddev=0.05, seed=None)
    input = Input(shape=(num_features,))
    encoded = Dense(low_dim, activation='relu', kernel_initializer=w)(input)
    decoded = Dense(num_features, activation='sigmoid', kernel_initializer=w)(encoded)
    autoencoder = Model(input, decoded)
    adam = optimizers.Adagrad(lr=0.01, epsilon=None, decay=0.0)
    autoencoder.compile(optimizer=adam, loss='binary_crossentropy')
    autoencoder.fit(d.X_train, d.X_train,
                    epochs=50,
                    batch_size=64,
                    shuffle=True,
                    )
    encoded_imgs = autoencoder.predict(d.X_test)
    decoded_imgs = autoencoder.predict(encoded_imgs)
    #sess = tf.InteractiveSession()
    #error = losses.mean_absolute_error(decoded_imgs[0], d.X_train[0])
    #print(error.eval())
    #print(decoded_imgs.shape)
    #sess.close()
    n = 20  # how many digits we will display
    plt.figure(figsize=(20, 4))
    for i in range(n):
        # display original
        #sess = tf.InteractiveSession()
        error = losses.mean_absolute_error(decoded_imgs[n], d.X_test[n])
        #print(error.eval())
        #print(decoded_imgs.shape)
        #sess.close()
        ax = plt.subplot(2, n, i + 1)
        plt.imshow(d.X_test[i].reshape(28, 28))
        plt.gray()
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        # display reconstruction
        ax = plt.subplot(2, n, i + 1 + n)
        plt.imshow(decoded_imgs[i].reshape(28, 28))
        plt.gray()
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        #print(error)
    plt.show()
    return error
What I want to do is store the error as a list which I can later print or plot in a graph, but how do you do this efficiently with tensorflow/keras? Thanks in advance.
You can store the errors in a CSV file by using the CSVLogger callback. Here is a code snippet for this task.
from keras.callbacks import CSVLogger
# define callbacks
callbacks = [CSVLogger(path_csv_logger, separator=';', append=True)]
# pass callback to model.fit() oder model.fit_generator()
model.fit_generator(
train_batch, train_steps, epochs=10, callbacks=callbacks,
validation_data=validation_batch, validation_steps=val_steps)
EDIT: For storing the errors in a list you can use something like this:
# source https://keras.io/callbacks/
class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))
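A small usage sketch, reusing the compiled autoencoder model and the data names from the question (the per-batch losses end up in history_cb.losses):
history_cb = LossHistory()
autoencoder.fit(d.X_train, d.X_train,
                epochs=50,
                batch_size=64,
                shuffle=True,
                callbacks=[history_cb])

plt.plot(history_cb.losses)   # per-batch training loss
plt.show()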
I'm new to machine learning and trying to fit a sample data set with neural networks in Python using TensorFlow. After having implemented the neural network in Dymola, I want to compare the outputs of the function with those from the neural network.
The sample data set is:
import tensorflow as tf
from keras import metrics
import numpy as np
from keras.models import *
from keras.layers import Dense, Dropout
from keras import optimizers
from keras.callbacks import *
import scipy.io as sio
import mat4py as m4p
inputs = np.linspace(0, 15, num=3000)
outputs = 1/7 * ((inputs/5)**3 - (inputs/3)**2 + 5)
Inputs and outputs are then scaled into the interval [0; 0.9]:
inputs_max = np.max(inputs)
inputs_min = np.min(inputs)
outputs_max = np.max(outputs)
outputs_min = np.min(outputs)
upper_bound = 0.9
lower_bound = 0
m_in = (upper_bound - lower_bound) / (inputs_max - inputs_min)
c_in = upper_bound - (m_in * inputs_max)
scaled_in = m_in * inputs + c_in
m_out = (upper_bound - lower_bound) / (outputs_max - outputs_min)
c_out = upper_bound - (m_out * outputs_max)
scaled_out = m_out * outputs + c_out
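For comparing predictions in the original units later on, the inverse of this affine map can be applied to the network's scaled outputs; a small sketch using the constants defined above:
def unscale_out(scaled):
    # invert scaled_out = m_out * outputs + c_out
    return (scaled - c_out) / m_out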
and after that the neural network is trained with:
# shuffle values
def shuffle_in_unison(a, b):
    assert len(a) == len(b)
    shuffled_a = np.empty(a.shape, dtype=a.dtype)
    shuffled_b = np.empty(b.shape, dtype=b.dtype)
    permutation = np.random.permutation(len(a))
    for old_index, new_index in enumerate(permutation):
        shuffled_a[new_index] = a[old_index]
        shuffled_b[new_index] = b[old_index]
    return shuffled_a, shuffled_b
tf_features_64 = scaled_in
tf_labels_64 = scaled_out
tf_features_32 = tf_features_64.astype(np.float32)
tf_labels_32 = tf_labels_64.astype(np.float32)
X = tf_features_32
Y = tf_labels_32
X, Y = shuffle_in_unison(X, Y)
# define callbacks
filepath = "weights-improvement-{epoch:02d}-{val_loss:.2f}.hdf5"
savebestCallBack = ModelCheckpoint(filepath, monitor='val_loss', verbose=1,
save_best_only=True, save_weights_only=False, mode='auto', period=1)
tbCallBack = TensorBoard(log_dir='./Graph',
histogram_freq=5,
write_graph=True,
write_images=True)
esCallback = EarlyStopping(monitor='val_loss',
min_delta=0,
patience=500,
verbose=0,
mode='min')
# neural network architecture
visible = Input(shape=(1,))
x = Dense(40, activation='tanh')(visible)
x = Dense(39, activation='tanh')(x)
x = Dense(38, activation='tanh')(x)
x = Dense(30, activation='tanh')(x)
output = Dense(1)(x)
# setup optimizer
Optimizer = optimizers.adam(lr=0.0007, amsgrad=True)
model = Model(inputs=visible, outputs=output)
model.compile(optimizer=Optimizer,
loss=['mse'],
metrics=['mae', 'mse']
)
model.fit(X, Y, epochs=1000, batch_size=1, verbose=1,
shuffle=True, validation_split=0.05, callbacks=[tbCallBack, esCallback])
# return weights
weights1 = model.layers[1].get_weights()[0]
biases1 = model.layers[1].get_weights()[1]
print('Layer1---------------------------------------------------------------------------------------------------------')
print('weights1:')
print(repr(weights1.transpose()))
print('biases1:')
print(repr(biases1))
w1 = weights1.transpose()
b1 = biases1.transpose()
we1 = {'w1' : w1.tolist()}
bi1 = {'b1' : b1.tolist()}
.........
......
Later on, I implemented the trained neural network in the program "Dymola" by loading the weights and biases into pre-configured "neural network base classes" (which have been used several times and are known to work).
// Modelica code for Dymola:
Real inputs;
Real outputs;
Real scaled_outputs;
Real scaled_inputs(start=0);
Real scaled_outputsfunc;
der(scaled_inputs) = 0.9;
//part of the neural network implementation in Dymola
NeuralNetwork.BaseClasses.NeuralNetworkLayer neuralNetworkLayer1(
NeuronActivationFunction=NeuralNetwork.Types.ActivationFunction.TanSig,
numInputs=1,
numNeurons=40,
weightTable=[-0.367953330278397; ......])
annotation (Placement(transformation(extent={{-76,22},{-56,42}})));
//scaled inputs
neuralNetworkLayer1.u[1] = scaled_inputs;
//scaled outputs
neuralNetworkLayer5.y[1]= scaled_outputs;
//scaled_inputs = 0.06 * inputs
inputs = 1/0.06 * (scaled_inputs);
outputs = 1/875 * inputs^3 - 1/63 * inputs^2 + 5/7;
scaled_outputsfunc = 1.2173139581825052 * outputs - 0.3173139581825052;
When plotting and comparing the scaled outputs of the function and the returned (scaled) values of the neural network, I noticed that the approximation is very good in the interval [0.5, 0.8], but the closer the inputs get to the boundaries, the worse the approximation becomes.
Unfortunately, I have no clue why this is happening or how to fix this issue. I'd be very glad if someone could help me.
I want to answer my own question: I forgot to specify the activation function in the output layer in my Python code, so Keras set it to a linear activation by default; see also:
https://keras.io/layers/core/
In Dymola, where my ANN was implemented, 'tanh' was the activation function in the last layer, which led to a divergence near the boundaries.
The correct Python code for this application must be:
visible = Input(shape=(1,))
x = Dense(40, activation='tanh')(visible)
x = Dense(39, activation='tanh')(x)
x = Dense(38, activation='tanh')(x)
x = Dense(30, activation='tanh')(x)
output = Dense(1, activation='tanh')(x)