I'm currently trying to train a network on complex-valued data generated from telecom engineering models. The weights and biases are also complex. I have used the ReLU activation for the hidden layers as follows at the l-th layer:
A_l = tf.complex(tf.nn.relu(tf.real(Z_l)), tf.nn.relu(tf.imag(Z_l)))
But how do I handle the cost and the optimizer, please? I am really confused because I'm a beginner in machine learning. I have gone through some papers on non-analytic functions, but none of them helped me use the TensorFlow API. For example, how do I rewrite the functions below?
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = Z_out, labels = y))
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
I have seen a recommendation to split the cost into real and imaginary parts, e.g.:
cost_R = .., cost_I = ...
but I didn't try it because I think the optimizer would then be split as well and the optimization would not work.
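For what it's worth, this is roughly what I understood the split-cost recommendation to mean (this is only my guess at it, not code taken from the papers):

cost_R = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=tf.real(Z_out), labels=y))
cost_I = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=tf.imag(Z_out), labels=y))
# Summing gives one real-valued scalar, so a single optimizer can minimize it
cost = cost_R + cost_I
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

Is that what was meant?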
Any help is much appreciated.
Related
I am working on a project in which I have to predict methane production.
input: pH, temperature, solution concentration
output: methane production
I have used Keras (with the TensorFlow backend).
My questions are:
(As of now I have 60 experimental data points.) The accuracy is always 0.2-0.3. Why? Should I increase the number of data points?
I used the following code:
from keras.models import Sequential
from keras.layers import Dense

classifier = Sequential()
classifier.add(Dense(6, activation='relu', kernel_initializer='uniform', input_dim=9))
classifier.add(Dense(6, kernel_initializer='uniform', activation='relu'))
classifier.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
classifier.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_squared_error'])
classifier.fit(X_train, y_train, batch_size=10, epochs=100)
It is possible to predict outputs other than binary ones, right? If not, which activation would be suitable for predicting non-binary values?
If you only have 60 data points, then yes, definitely try to get more data. In general it is good to have hundreds (if not thousands) of data points to effectively train a neural network. Your network looks fine, assuming the relationship between those inputs and the output is fairly linear; if that is not the case, you could try making your hidden layer wider (more neurons).
It is definitely possible to predict other than binary outputs; in fact, it looks like your network should be doing so. It really just depends on the activation function you put on your output layer. For example, softmax is good for classifying data when there are several possible labels. For binary classification, a sigmoid activation function is good. If you're just trying to predict an output quantity, you can probably omit the activation function on your output layer entirely (i.e. use a linear output).
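To make that concrete, here is a minimal sketch of the two output choices (input_dim=3 here assumes the three inputs you list; your posted code uses input_dim=9, so adjust to your actual feature count):

from keras.models import Sequential
from keras.layers import Dense

# Regression: linear output (no activation), so the model can predict any real value
regressor = Sequential()
regressor.add(Dense(6, activation='relu', input_dim=3))
regressor.add(Dense(1))  # e.g. methane production as a continuous quantity
regressor.compile(optimizer='adam', loss='mean_squared_error')

# Binary classification: sigmoid squashes the output into (0, 1)
classifier = Sequential()
classifier.add(Dense(6, activation='relu', input_dim=3))
classifier.add(Dense(1, activation='sigmoid'))
classifier.compile(optimizer='adam', loss='binary_crossentropy')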
Yes, you have to provide more data so the model can learn the pattern in the data points. If the relationship is linear, use linear regression instead; it will work better.
I am trying to build a neural network model, a multi-task model, using the following code:
from keras.layers import Input, Dense, Dropout
from keras.models import Model

inp = Input((336,))
x = Dense(300, activation='relu')(inp)
x = Dense(256, activation='relu')(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.1)(x)
x = Dense(56, activation='relu')(x)
x = Dense(16, activation='relu')(x)
x = Dropout(0.1)(x)

out_reg = Dense(1, name='reg')(x)
out_class = Dense(1, activation='sigmoid', name='class')(x)  # I suppose a binary classification problem

model = Model(inp, [out_reg, out_class])
model.compile('adam', loss={'reg': 'mse', 'class': 'binary_crossentropy'},
              loss_weights={'reg': 0.5, 'class': 0.5})
Now I want to use a genetic algorithm in Python to optimize the neural network's weights, the number of layers, and the number of neurons.
I have gone through many tutorials on this, but I didn't find any material that discusses how to implement it.
Any help would be appreciated.
Initially, I think it is better to:
- Fix the architecture of the model,
- Know how many trainable parameters there are and their shapes,
- Create a random population of trainable parameters,
- Define the objective function to optimize,
- Implement the GA operations (reproduction, crossover, mutation, etc.),
- Reshape each member of the population into the correct format for the weights and biases,
- Then run the ML model with those weights and biases,
- Get the loss and update the population, and
- Repeat the above process for a number of generations or until a stopping criterion is met.
Hope it helps.
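A very rough sketch of that loop, with helper names and GA details that are only placeholders (not a reference implementation):

import numpy as np

def weights_to_vector(model):
    # Flatten all trainable arrays into one 1-D "gene" vector
    return np.concatenate([w.flatten() for w in model.get_weights()])

def vector_to_weights(model, vector):
    # Reshape the flat gene vector back into the model's weight/bias shapes
    shaped, idx = [], 0
    for w in model.get_weights():
        shaped.append(vector[idx:idx + w.size].reshape(w.shape))
        idx += w.size
    return shaped

def fitness(model, vector, X, y):
    # Objective function: lower loss means a fitter individual
    model.set_weights(vector_to_weights(model, vector))
    scores = model.evaluate(X, y, verbose=0)
    total_loss = scores[0] if isinstance(scores, list) else scores  # multi-output models return a list
    return -total_loss

def mutate(vector, rate=0.05, scale=0.1):
    # Add small Gaussian noise to a random subset of genes
    mask = np.random.rand(vector.size) < rate
    return vector + mask * np.random.normal(0.0, scale, vector.size)

# GA loop (outline): start from a population of mutated copies of the current weights,
# score each with fitness(), keep the best, crossover pairs, mutate the children, and
# repeat for a number of generations or until a stopping criterion is met.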
If you are this new to machine learning, I would not recommend using genetic algorithms to optimize your weights. You have already compiled your model with "Adam", which is an excellent gradient-descent based optimizer that is going to do all of the work for you, and you should use that instead.
Check out the Tensorflow quickstart tutorial for more information https://www.tensorflow.org/tutorials/quickstart/beginner
Here's an example of how to implement genetic algorithms from a Google search... https://towardsdatascience.com/introduction-to-genetic-algorithms-including-example-code-e396e98d8bf3
If you want to do hyperparameter tuning with genetic algorithms, you can encode the hyperparameters of the network (number of layers, neurons) as your genes. Evaluating the fitness will be very costly, because it involves training the network on the given task to get its final test loss.
If you want to do optimization with genetic algorithms, you can encode the model weights as genes, and the fitness would be directly related to the loss of the network.
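As a small illustration of the hyperparameter-encoding option, a gene can simply be a list of hidden-layer widths; the names and ranges below are invented for the example:

import random
from keras.models import Sequential
from keras.layers import Dense

def build_model(genes, input_dim, output_dim):
    # genes is a list of hidden-layer widths, e.g. [300, 128, 16]
    model = Sequential()
    model.add(Dense(genes[0], activation='relu', input_dim=input_dim))
    for units in genes[1:]:
        model.add(Dense(units, activation='relu'))
    model.add(Dense(output_dim))
    model.compile(optimizer='adam', loss='mse')
    return model

def random_genes(max_layers=4, max_units=512):
    # One individual: a random number of layers, each with a random width
    return [random.randint(8, max_units) for _ in range(random.randint(1, max_layers))]

# Fitness is expensive: each individual has to be trained before it can be scored, e.g.
# model = build_model(genes, input_dim=336, output_dim=1)
# model.fit(X_train, y_train, epochs=5, verbose=0)
# fitness = -model.evaluate(X_val, y_val, verbose=0)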
I am trying to understand why regularization syntax in Keras looks the way that it does.
Roughly speaking, regularization is a way to reduce overfitting by adding a penalty term to the loss function, proportional to some function of the model weights. Therefore, I would expect regularization to be defined as part of the specification of the model's loss function.
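In other words, I was picturing something like loss_total = loss_data + lambda * sum_l ||W_l||^2, where the penalty is computed over all of the model's weights at once.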
However, in Keras the regularization is defined on a per-layer basis. For instance, consider this regularized DNN model:
from keras.layers import Input, Dense, Activation
from keras.models import Model
from keras.regularizers import l2

# note: the regularizer has to be instantiated, e.g. l2(0.01); the 0.01 is an arbitrary coefficient
input = Input(name='the_input', shape=(None, input_shape))
x = Dense(units=250, activation='tanh', name='dense_1', kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01), activity_regularizer=l2(0.01))(input)
x = Dense(units=28, name='dense_2', kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01), activity_regularizer=l2(0.01))(x)
y_pred = Activation('softmax', name='softmax')(x)
mymodel = Model(inputs=input, outputs=y_pred)
mymodel.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
I would have expected that the regularization arguments in the Dense layer were not needed and I could just write the last line more like:
mymodel.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'], regularization='l2')
This is obviously the wrong syntax, but I was hoping someone could elaborate a bit on why the regularizers are defined this way and what is actually happening when I use layer-level regularization.
The other thing I don't understand is under what circumstances I would use each of, or all of, the three regularization options: kernel_regularizer, bias_regularizer, and activity_regularizer.
Let's break down the components of your question:
Your expectation of regularisation is probably in line with a feed-forward network, where yes, the penalty term is applied to the weights of the overall network. But this is not necessarily the case when you have RNNs mixed with CNNs etc., so Keras opts to give fine-grained control. Perhaps, for easy setup, a model-level regularisation applied to all weights could be added to the API.
When you use layer regularisation, the base Layer class actually adds the regularising term to the loss, which at training time penalises the corresponding layer's weights etc.
Now in Keras you can often apply regularisation to 3 different things, as in the Dense layer. Every layer has different kernels (recurrent ones, for example), so for this question let's look at the ones you are interested in, but roughly the same applies to all layers:
kernel: this applies to the actual weights of the layer; in Dense it is the W of Wx + b.
bias: this is the bias vector of the layer, so you can apply a different regulariser to it; the b in Wx + b.
activity: this is applied to the output vector, the y in y = f(Wx + b).
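For example, in a Dense layer you can combine them like this (the coefficients here are arbitrary, just for illustration):

from keras.layers import Dense
from keras.regularizers import l2

layer = Dense(
    64,
    kernel_regularizer=l2(1e-4),    # penalises the entries of W
    bias_regularizer=l2(1e-4),      # penalises the entries of b
    activity_regularizer=l2(1e-5),  # penalises the layer's output y = f(Wx + b)
)
# Each of these adds a term to layer.losses / model.losses, which Keras adds to the
# loss you pass to compile() at training time.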
I'm using TensorFlow to train a neural-network-based reinforcement learning agent. The idea is the following: first I train an LSTM (on real data, to begin with a human bias) to predict what the future actions of the human may be. Then I use the cell state of my LSTM as a state for reinforcement learning.
However, I've noticed a tendency for the internal state to drift over time, and therefore the internal state is not directly suitable for reinforcement learning. It is apparently a common problem which is addressed with the layer-normalized LSTM. I tried the TensorFlow implementation of the layer-norm LSTM, but now I cannot train it, since TF produces NaN weights at its output.
I use the Adam optimizer and have tried different weights; nothing changes. I also tried increasing the number of units in my LSTM, with no improvement.
Does anyone see the problem in my code?
with tf.variable_scope("system_lstm") as scope:
    no_units_system = 128

    _seq_system = tf.placeholder(tf.float32, [batch_size, max_length_system, system_inputShapeLen], name='seq_')
    _seq_length_system = tf.placeholder(tf.int32, [batch_size], name='seq_length_')

    cell_system = tf.contrib.rnn.LayerNormBasicLSTMCell(no_units_system)

    output_system, hidden_states_system = tf.nn.dynamic_rnn(
        cell_system,
        _seq_system,
        sequence_length=_seq_length_system,
        dtype=tf.float32
    )

    out2_system = tf.reshape(output_system, shape=[-1, no_units_system])
    out2_system = tf.layers.dense(out2_system, system_outputShapeLen)
    out_final_system = tf.reshape(out2_system, shape=[-1, max_length_system, system_outputShapeLen])

    y_system_ = tf.placeholder(tf.float32, [None, max_length_system, system_outputShapeLen])

    softmax_system = tf.nn.softmax(out_final_system, dim=-1)
    loss_system = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_final_system, labels=y_system_))

    optimizer = tf.train.AdamOptimizer(learning_rate=1e-10)
    minimize_system = optimizer.minimize(loss_system)
TensorFlow has released a tutorial for transfer learning, named "Image retraining", which can be found here:
https://www.tensorflow.org/tutorials/image_retraining
What they are doing is taking a pre-trained Inception v3 model, changing only the very last layer (the softmax regression layer), and training that layer on the new dataset. This is very understandable and in fact a common practice in transfer learning.
I have tried their method on my dataset (which is a small dataset) and I have applied all the suggestions for getting a better result, from data augmentation to changing the number of steps, but I did not modify their code in any way. The accuracy I got is relatively bad, ~70%.
I am thinking of the possibility of training a small neural network on top of the given model, namely, changing the last layer from a simple regression to a more sophisticated network.
Here is the part of their code where they modify the softmax layer:
def add_final_training_ops(class_count, final_tensor_name, bottleneck_tensor):
  """Adds a new softmax and fully-connected layer for training.

  We need to retrain the top layer to identify our new classes, so this function
  adds the right operations to the graph, along with some variables to hold the
  weights, and then sets up all the gradients for the backward pass.

  The set up for the softmax and fully-connected layers is based on:
  https://tensorflow.org/versions/master/tutorials/mnist/beginners/index.html

  Args:
    class_count: Integer of how many categories of things we're trying to
        recognize.
    final_tensor_name: Name string for the new final node that produces results.
    bottleneck_tensor: The output of the main CNN graph.

  Returns:
    The tensors for the training and cross entropy results, and tensors for the
    bottleneck input and ground truth input.
  """
  with tf.name_scope('input'):
    bottleneck_input = tf.placeholder_with_default(
        bottleneck_tensor, shape=[None, BOTTLENECK_TENSOR_SIZE],
        name='BottleneckInputPlaceholder')

    ground_truth_input = tf.placeholder(tf.float32,
                                        [None, class_count],
                                        name='GroundTruthInput')

  # Organizing the following ops as `final_training_ops` so they're easier
  # to see in TensorBoard
  layer_name = 'final_training_ops'
  with tf.name_scope(layer_name):
    with tf.name_scope('weights'):
      layer_weights = tf.Variable(tf.truncated_normal([BOTTLENECK_TENSOR_SIZE, class_count], stddev=0.001), name='final_weights')
      variable_summaries(layer_weights)
    with tf.name_scope('biases'):
      layer_biases = tf.Variable(tf.zeros([class_count]), name='final_biases')
      variable_summaries(layer_biases)
    with tf.name_scope('Wx_plus_b'):
      logits = tf.matmul(bottleneck_input, layer_weights) + layer_biases
      tf.summary.histogram('pre_activations', logits)

  final_tensor = tf.nn.softmax(logits, name=final_tensor_name)
  tf.summary.histogram('activations', final_tensor)

  with tf.name_scope('cross_entropy'):
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
        labels=ground_truth_input, logits=logits)
    with tf.name_scope('total'):
      cross_entropy_mean = tf.reduce_mean(cross_entropy)
  tf.summary.scalar('cross_entropy', cross_entropy_mean)

  with tf.name_scope('train'):
    train_step = tf.train.GradientDescentOptimizer(FLAGS.learning_rate).minimize(
        cross_entropy_mean)

  return (train_step, cross_entropy_mean, bottleneck_input, ground_truth_input,
          final_tensor)
def add_evaluation_step(result_tensor, ground_truth_tensor):
  """Inserts the operations we need to evaluate the accuracy of our results.

  Args:
    result_tensor: The new final node that produces results.
    ground_truth_tensor: The node we feed ground truth data
      into.

  Returns:
    Tuple of (evaluation step, prediction).
  """
  with tf.name_scope('accuracy'):
    with tf.name_scope('correct_prediction'):
      prediction = tf.argmax(result_tensor, 1)
      correct_prediction = tf.equal(
          prediction, tf.argmax(ground_truth_tensor, 1))
    with tf.name_scope('accuracy'):
      evaluation_step = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
  tf.summary.scalar('accuracy', evaluation_step)
  return evaluation_step, prediction
However, I am facing two main problems. First, I am not sure whether this is a good idea or not; would I just be wasting my effort doing something useless? Second, they are using the simple MNIST tutorial as the model for the last layer. Say I wanted to use their expert MNIST tutorial (https://www.tensorflow.org/get_started/mnist/pros) instead; I am lost on what to do or how to configure it.
Any suggestions on what I can do?
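For reference, here is a rough sketch of the kind of change I have in mind inside add_final_training_ops: one extra fully-connected hidden layer between the bottleneck and the softmax (the 1024 units and the ReLU are arbitrary choices on my part):

with tf.name_scope('hidden_layer'):
    # Hypothetical extra layer between the bottleneck features and the final softmax
    hidden_weights = tf.Variable(
        tf.truncated_normal([BOTTLENECK_TENSOR_SIZE, 1024], stddev=0.001),
        name='hidden_weights')
    hidden_biases = tf.Variable(tf.zeros([1024]), name='hidden_biases')
    hidden = tf.nn.relu(tf.matmul(bottleneck_input, hidden_weights) + hidden_biases)

with tf.name_scope('Wx_plus_b'):
    # The final layer now takes the hidden activations instead of the bottleneck directly
    layer_weights = tf.Variable(
        tf.truncated_normal([1024, class_count], stddev=0.001), name='final_weights')
    layer_biases = tf.Variable(tf.zeros([class_count]), name='final_biases')
    logits = tf.matmul(hidden, layer_weights) + layer_biases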