I want to solve an ODE with a neural network and compare the result with the true solution $u(x) = x$.
Method
Build a neural network, say $v(x; w)$, where $x$ is a 1-d input and $w$ are the weights.
Set the loss function to penalize the ODE residual, where the derivative of $v(x; w)$ is evaluated at the training points.
Use either Adam or SGD to minimize the loss.
Question
In theory, this should converge to the true solution by the fixed-point theorem.
But I do not know how to implement it, especially how to set up the loss function with a TensorFlow gradient.
Code
Here is some code up to the point where I am stuck.
import numpy as np
import tensorflow as tf

# create training data
x_train = np.arange(10) / 10.        # the derivatives will be taken at these points
y_train = np.zeros(x_train.shape)    # not actually needed

# model for the candidate function v(x; w)
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(10, input_shape=(1,), use_bias=True),
    tf.keras.layers.Dense(1)
])
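Here is a minimal sketch of how I imagine the derivative-based loss and training step could be written with tf.GradientTape (I am not sure this is correct). Since the equation itself is not shown above, it assumes, purely for illustration, the ODE $u'(x) = 1$ with boundary condition $u(0) = 0$, whose solution is $u(x) = x$; only the residual line would change for the actual equation.

# Sketch only: the ODE u'(x) = 1 with u(0) = 0 is assumed for illustration.
x = tf.convert_to_tensor(x_train.reshape(-1, 1), dtype=tf.float32)

def ode_loss(model, x):
    with tf.GradientTape() as tape:
        tape.watch(x)                    # x is a constant tensor, so watch it explicitly
        v = model(x)                     # candidate solution v(x; w)
    dv_dx = tape.gradient(v, x)          # dv/dx at the collocation points
    residual = dv_dx - 1.0               # assumed ODE: u'(x) - 1 = 0
    bc = model(tf.zeros((1, 1)))         # assumed boundary condition: u(0) = 0
    return tf.reduce_mean(tf.square(residual)) + tf.reduce_mean(tf.square(bc))

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)
for step in range(2000):
    with tf.GradientTape() as outer:     # outer tape: gradient of the loss w.r.t. the weights
        loss = ode_loss(model, x)
    grads = outer.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))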
Any opinion will be appreciated.
Related
I have a simple model in TensorFlow which is being trained on the first 1000 images in the MNIST dataset. From my previous experience, the learning rates I have used were on the order of 0.001; however, for this model to converge the learning rate needs to be far higher, at least larger than 1. The model is shown below.
def gen_model():
    return tf.keras.models.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='sigmoid'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
model = gen_model()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=5), loss='mean_squared_error')
model.summary()
model.fit(x_train, y_train, batch_size=1000, epochs=10000)
Is it expected for models of this form to require an extremely high learning rate, or is there something I have missed? When I use a learning rate of around 0.001 the loss changes incredibly slowly.
The dataset was created with the following code:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_train = x_train.reshape(60000, 28, 28)[:1000]
y_train = y_train[:1000]
y_train = tf.one_hot(y_train, 10)
Generally speaking, models that require learning rates larger than 1 raise a red flag for me. It seems like your model is a vanilla multilayer perceptron, so there's nothing overly complicated about that, but there are a couple things about your setup that stand out:
The output from your model uses a softmax, which is normally used to represent values from a categorical distribution (i.e., 1-of-k) -- this is typical for a classification model. But the loss you're using is typically used for optimizing Gaussian or regression outputs. You might want to try using a cross-entropy loss to see if that helps.
The output from your model is in probability space, so the values you get out of the model are in [0, 1]. The loss you're using averages the squared differences between the model output and the target one-hot vector (whose values are in {0, 1}). The value you'll get for this loss is always smaller than 1, so with a learning rate less than 1 multiplied in, the delta applied to your model weights is always going to be small. Sometimes that's a good thing, but my guess is that in this case -- and particularly at the start of training, when the model weights aren't near their optimal values -- this is going to be quite slow.
Related to the above point, you might try initializing your model weights with a larger range of values than the default. This would help make the gradient values larger, but could also make the model more likely to diverge.
You could also try to replace your softmax output activation with a plain linear activation, in effect converting your model's output to (unnormalized) log-probability space. Then you'd need to change your dataset labels to also represent target log-probability values, which isn't possible exactly, but you could get close with something like -1e8 * (1 - one_hot). But if you wanted to go this route, you'd effectively be implementing a cross-entropy loss yourself; see the first point.
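As a minimal sketch of the first point, reusing gen_model, x_train and y_train from the question: compile with a cross-entropy loss that matches the softmax output, after which learning rates in the usual range should be enough (the 0.1 and the batch size below are illustrative, not tuned values).

# Sketch: same architecture, but with a loss that matches the softmax output.
model = gen_model()
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),  # illustrative value, not tuned
    loss='categorical_crossentropy',                       # targets are one-hot vectors
    metrics=['accuracy'],
)
model.fit(x_train, y_train, batch_size=100, epochs=50)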
I have built a reinforcement learning DQN with variable-length sequences as inputs, and positive and negative rewards calculated for actions. Some problem with my DQN model in Keras means that although the model runs, average rewards decrease over time, over single and multiple cycles of epsilon. This does not change even after a significant period of training.
My thinking is that this is due to using MeanSquaredError in Keras as the loss function (minimising error), so I am trying to implement gradient ascent (to maximise reward) instead. How can I do this in Keras? My current model is:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Masking, LSTM, Dense
from tensorflow.keras.optimizers import Adam
import tensorflow as tf

# env, LEARNING_RATE and DECAY are defined elsewhere in the project
model = Sequential()
inp = (env.NUM_TIMEPERIODS, env.NUM_FEATURES)
model.add(Input(shape=inp))  # a shape tuple (integers), not including the batch size
model.add(Masking(mask_value=0., input_shape=inp))
model.add(LSTM(env.NUM_FEATURES, input_shape=inp, return_sequences=True))
model.add(LSTM(env.NUM_FEATURES))
model.add(Dense(env.NUM_FEATURES))
model.add(Dense(4))
model.compile(loss='mse',
              optimizer=Adam(lr=LEARNING_RATE, decay=DECAY),
              metrics=[tf.keras.losses.MeanSquaredError()])
In trying to implement gradient ascent, by 'flipping' the gradient (as negative or inverse loss?), I have tried various loss definitions:
loss=-'mse'
loss=-tf.keras.losses.MeanSquaredError()
loss=1/tf.keras.losses.MeanSquaredError()
but these all generate "bad operand for unary" errors.
How can I adapt the current Keras model to maximise rewards?
Or is this gradient ascent not even the problem? Could it be some issue with the action policy?
Writing a custom loss function
Here is the loss function you want:
@tf.function
def positive_mse(y_true, y_pred):
    return -1 * tf.keras.losses.MSE(y_true, y_pred)
And then your compile line becomes
model.compile(loss=positive_mse,
              optimizer=Adam(lr=LEARNING_RATE, decay=DECAY),
              metrics=[tf.keras.losses.MeanSquaredError()])
Please note: use loss=positive_mse and not loss=positive_mse(). That's not a typo: you need to pass the function itself, not the result of calling it.
I'm trying to improve my model so it can become a bit more accurate. Right now I'm training the model and get this as my training and validation accuracy.
For every epoch I get a training accuracy of 0.0003 and a validation accuracy of 0. I know this isn't good, but I don't know how to fix it.
Data is normalized with the MinMax scaler. 4 of the 8 features are normalized (the other 4 are hour, day, day_of_week and month).
Update:
I've also tried to normalize the entire dataset, and it doesn't make a difference:
scaling = MinMaxScaler(feature_range=(0, 1)).fit(df[cols])
df[cols] = scaling.transform(df[cols])
My model: the data shape is (5351, 1, 8) and the input_shape is (1, 8).
model = keras.Sequential()
model.add(keras.layers.Bidirectional(
    keras.layers.LSTM(2, input_shape=(X_train.shape[1], X_train.shape[2]),
                      return_sequences=True, activation='linear')))
model.add(keras.layers.Dense(1))
model.compile(loss='mean_squared_error', optimizer='Adamax', metrics=['acc'])

history = model.fit(
    X_train, y_train,
    epochs=200,
    batch_size=24,
    validation_split=0.35,
    shuffle=False,
)
I tried using the answer to this question:
Keras model accuracy not improving
but it didn't work.
A mean_squared_error loss is for regression tasks, while an acc metric is for classification problems, so it makes no sense to use them together.
If you work on a classification problem, use binary_crossentropy or categorical_crossentropy as loss and keep the metric parameter as you did.
If it is a regression task, change the metric to ['mse'] for mean squared error instead of ['acc'].
Your model "works" and you have applied the standard formula for backpropagation by using the mean squared error loss. But measuring the accuracy will make Keras check whether your model's output is EXACTLY equal to the expected values. Since the loss function is for regression, it will hardly ever be equal.
Three last points, because that little change won't correct everything.
Firstly, your last Dense layer should have an activation function (it's safer).
Secondly, I'm pretty sure a Bidirectional+LSTM layer placed before a Dense layer should have return_sequences=False. An LSTM layer (with or without Bidirectional) can return the full sequence of vectors (like a matrix), but a Dense layer takes vectors as input. In this case it happens to work because of the third point.
The last point is about the shape of your data. You have 5351 examples of shape (1, 8), each of which is a vector of size 8. But an LSTM layer takes a sequence of vectors, and here the length of your sequence is one, so I don't know whether it is relevant to use an RNN-type layer here.
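A minimal sketch combining the points above, assuming the same X_train as in the question; the tiny LSTM width (2 units) is kept from the original code.

from tensorflow import keras

# Sketch of the suggested changes: return_sequences=False before the Dense head,
# an explicit activation on the output, and mse (not acc) as the metric.
model = keras.Sequential([
    keras.layers.Bidirectional(
        keras.layers.LSTM(2, return_sequences=False, activation='linear'),
        input_shape=(X_train.shape[1], X_train.shape[2]),
    ),
    keras.layers.Dense(1, activation='linear'),
])
model.compile(loss='mean_squared_error', optimizer='Adamax', metrics=['mse'])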
I am trying to build a neural network model (a multi-task model) using the following code:
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.models import Model

inp = Input((336,))
x = Dense(300, activation='relu')(inp)
x = Dense(256, activation='relu')(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.1)(x)
x = Dense(56, activation='relu')(x)
x = Dense(16, activation='relu')(x)
x = Dropout(0.1)(x)
out_reg = Dense(1, name='reg')(x)
out_class = Dense(1, activation='sigmoid', name='class')(x)  # I assume a binary classification problem
model = Model(inp, [out_reg, out_class])
model.compile('adam', loss={'reg': 'mse', 'class': 'binary_crossentropy'},
              loss_weights={'reg': 0.5, 'class': 0.5})
Now I want to use a genetic algorithm in Python to optimize the neural network's weights, layers and number of neurons.
I have read many tutorials about genetic algorithms, but I didn't find any material discussing how to implement this.
Any help would be appreciated.
Initially, I think it is better to (a sketch follows this list):
- Fix the architecture of the model,
- Know how many trainable parameters there are and their format,
- Create a random population of trainable parameters,
- Define the objective function to optimize,
- Implement the GA operations (reproduction, crossover, mutation, etc.),
- Reshape this population of weights and biases into the correct format,
- Then run the ML model with those weights and biases,
- Get the loss and update the population, and
- Repeat the above process for a number of epochs / until a stopping criterion is met.
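A minimal sketch of those steps, assuming the compiled multi-task model from the question and hypothetical training arrays X, y_reg and y_class; for brevity the GA below uses only selection and Gaussian mutation, with no crossover.

import numpy as np

# Flatten / restore the model's weights so each individual is a 1-D vector.
def get_flat_weights(model):
    return np.concatenate([w.flatten() for w in model.get_weights()])

def set_flat_weights(model, flat):
    new_weights, i = [], 0
    for w in model.get_weights():
        size = w.size
        new_weights.append(flat[i:i + size].reshape(w.shape))
        i += size
    model.set_weights(new_weights)

# Fitness = negative total loss (higher is better); X, y_reg, y_class are assumed arrays.
def fitness(model, flat, X, y_reg, y_class):
    set_flat_weights(model, flat)
    return -model.evaluate(X, [y_reg, y_class], verbose=0)[0]

n_params = get_flat_weights(model).size
population = np.random.randn(20, n_params) * 0.1       # random initial population

for generation in range(50):
    scores = np.array([fitness(model, ind, X, y_reg, y_class) for ind in population])
    parents = population[np.argsort(scores)[-10:]]      # keep the fittest half
    children = parents[np.random.randint(0, 10, size=10)] \
               + np.random.randn(10, n_params) * 0.01   # mutation only, for brevity
    population = np.concatenate([parents, children])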
Hope it helps.
If you are this new to machine learning, I would not recommend using genetic algorithms to optimize your weights. You have already compiled your model with "Adam", which is an excellent gradient-descent based optimizer that is going to do all of the work for you, and you should use that instead.
Check out the Tensorflow quickstart tutorial for more information https://www.tensorflow.org/tutorials/quickstart/beginner
Here's an example of how to implement genetic algorithms from a Google search... https://towardsdatascience.com/introduction-to-genetic-algorithms-including-example-code-e396e98d8bf3
If you want to do hypertuning with genetic algorithms, you can encode the hyperparameters of the network (number of layers, neurons) as your genes. Evaluating the fitness will be very costly, because it involves training the network on the given task to get its final test loss.
If you want to do optimization with genetic algorithms, you can encode the model weights as genes, and the fitness would be directly related to the loss of the network.
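As a rough illustration of the hypertuning variant, here is a sketch in which a gene encodes the number of hidden layers and their widths; the gene layout, the 336-dimensional input (borrowed from the question above) and the short training run used as fitness are all illustrative assumptions, not a prescribed encoding.

import tensorflow as tf

def decode_gene(gene):
    # gene = [n_layers, units_1, units_2, ...]; only the first n_layers widths are used
    n_layers, *units = gene
    model = tf.keras.Sequential([tf.keras.Input(shape=(336,))])
    for u in units[:n_layers]:
        model.add(tf.keras.layers.Dense(u, activation='relu'))
    model.add(tf.keras.layers.Dense(1))
    model.compile(optimizer='adam', loss='mse')
    return model

def fitness(gene, X_train, y_train, X_val, y_val):
    # Costly: every fitness evaluation trains a network for a few epochs.
    model = decode_gene(gene)
    model.fit(X_train, y_train, epochs=5, verbose=0)
    return -model.evaluate(X_val, y_val, verbose=0)   # lower validation loss = higher fitness

example_gene = [2, 64, 32]   # two hidden layers with 64 and 32 units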
I have this doubt when I fit a neural network to a regression problem. I preprocessed the predictors (features) of my train and test data using the Imputer and scale methods from sklearn.preprocessing, but I did not preprocess the class or target of my train or test data.
In the architecture of my neural network, all layers have relu as the activation function except the last layer, which has the sigmoid function. I chose the sigmoid function for the last layer because the values of the predictions are between 0 and 1.
tl;dr: my question is: should I de-process the output of my neural net? If I don't use the sigmoid function, the output values fall below 0 and above 1. In that case, how should I do it?
Thanks
Usually, if you are doing regression, you should use a 'linear' activation in the last layer. A sigmoid function will 'favor' values close to 0 and 1, so it would be harder for your model to output intermediate values.
If the distribution of your targets is gaussian or uniform I would go with a linear output layer. De-processing shouldn't be necessary unless you have very large targets.
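A minimal sketch of that suggestion, with an illustrative 8-feature input and layer widths (none of these are given in the question): relu hidden layers and a 'linear' output, so no de-processing of the predictions is needed.

from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(8,)),                      # illustrative input size
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(1, activation='linear'),   # no squashing of the regression output
])
model.compile(optimizer='adam', loss='mse')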