I've built the siamese network from the PyImageSearch.com tutorial here:
https://pyimagesearch.com/2020/11/30/siamese-networks-with-keras-tensorflow-and-deep-learning/
I'm adapting the CNN to detect whether two images are identical (instead of whether they show the same digit, as in the tutorial). This just calls for changing one line of code within the make_pairs() function in utils.py:
old line: pairImages.append([currentImage, posImage])
new line: pairImages.append([currentImage, currentImage])
I thought it would be pretty easy for the network to learn this to 100% accuracy quickly, because the images are identical. However, it doesn't achieve accuracy beyond ~95% in 10 epochs.
I think the issue is in the output layer. A Euclidean distance is passed to a Dense output layer with a sigmoid activation. The positive class is identical images, so the Euclidean distance between the two images in a positive pair is 0. I'm thinking that if I could change the sigmoid cutoff from the default of 0.5 to something closer to 1.0, it would speed up learning; identical images always have a Euclidean distance of 0, so I imagine this should work.
My questions are then:
How do I change the cutoff for the sigmoid function from the default of 0.5?
Would changing the sigmoid cutoff have the intended effect?
Try using rectified linear unit (ReLU) activation instead of sigmoid.
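Note that the 0.5 cutoff is not part of the sigmoid itself: during training the loss is computed on the continuous sigmoid output, and the threshold only enters when you binarize predictions at evaluation time, so moving it will not speed up learning. A minimal sketch of applying a custom cutoff (the probability values below are made up; in the tutorial they would come from model.predict on image pairs):

import numpy as np

# Stand-ins for model.predict([imagesA, imagesB]) from the tutorial's model.
probs = np.array([[0.97], [0.62], [0.03]])

# The 0.5 cutoff is a post-hoc decision rule, so any threshold can be used.
threshold = 0.9
preds = (probs >= threshold).astype("int32")  # 1 = identical pair, 0 = different
print(preds.ravel())  # [1 0 0]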
When I use a U-Net for two-class semantic segmentation, I set the last layer of the model to 1 channel and 2 channels respectively, and then measure with cross-entropy loss: BCELoss and CrossEntropyLoss.
But the gap between the two is large. The performance of the former is normal, but the latter has a very low precision rate and a high recall rate.
I'm using PyTorch.
Mathematically, BCE loss (with logits) is just a special case of cross-entropy loss for the case of two classes.
Are you using a sigmoid or a softmax in the output of the network? In PyTorch, CrossEntropyLoss takes the raw output of the last layer (no need to softmax the output); the softmax is applied inside the loss for numerical stability.
BCELoss only takes input between 0 and 1, so a sigmoid is needed there. However, PyTorch has BCEWithLogitsLoss, which applies the sigmoid for you; this version is more stable.
One more thing it seems you are not doing correctly (it would be nicer to have some minimal amount of code to better understand your problem): CrossEntropyLoss requires one channel per class, so if you have 2 classes, you have to give it an input with two channels. BCELoss only takes one channel, with numbers ranging between 0 and 1. If I understood correctly, you are somehow giving it 2 channels, and that will bring you problems in training.
My best guess is that the gap in performance between the two is due to misuse of the loss functions. The PyTorch documentation has improved a lot; I can recommend spending a few minutes understanding the difference between each of those three loss functions: https://pytorch.org/docs/stable/nn.html#loss-functions
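To make the shape requirements concrete, here is a minimal sketch with dummy segmentation outputs (the sizes are made up for illustration):

import torch
import torch.nn as nn

N, H, W = 4, 8, 8  # batch, height, width

# CrossEntropyLoss: one channel per class, raw logits, integer class targets.
logits_2ch = torch.randn(N, 2, H, W)           # 2 classes -> 2 channels
target_idx = torch.randint(0, 2, (N, H, W))    # class indices, no channel dim
ce = nn.CrossEntropyLoss()(logits_2ch, target_idx)

# BCEWithLogitsLoss: a single channel of raw logits, float targets in {0, 1};
# the sigmoid is applied internally for numerical stability.
logits_1ch = torch.randn(N, 1, H, W)
target_bin = torch.randint(0, 2, (N, 1, H, W)).float()
bce = nn.BCEWithLogitsLoss()(logits_1ch, target_bin)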
I'm trying to build an agent that can play Pocket Tanks using RL. The problem I'm facing is how to train a neural network to output the correct power and angle, so instead of action classification I want regression.
In order to output the correct power and angle, all you need to do is go into your neural network architecture and change the activation of your last layer.
In your question, you stated that you are currently using an action classification output, so it is most likely a softmax output layer. We can do two things here:
If the power and angle have hard constraints, e.g. the angle cannot be greater than 360° or the power cannot exceed 700 kW, we can change the softmax output to a tanh output (hyperbolic tangent) and multiply it by the constraint on the power/angle. This creates a "scaling effect" because tanh's output is between -1 and 1. Multiplying the tanh output by the power/angle constraint ensures that the constraints are always satisfied and the output is a valid power/angle.
If there are no constraints on your problem, we can simply delete the softmax output altogether. Removing the softmax means the output is no longer constrained between 0 and 1; the last layer of the neural network will simply act as a linear mapping, i.e., y = Wx + b.
I hope this helps!
EDIT: In both cases, the loss function to train your neural network can simply be an MSE loss. Example: loss = (real_power - estimated_power)^2 + (real_angle - estimated_angle)^2
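To make this concrete, here is a minimal PyTorch sketch of such a regression head (the state size, layer widths, and constraint values are made up for illustration):

import torch
import torch.nn as nn

MAX_POWER = 100.0   # hypothetical hard constraints, for illustration only
MAX_ANGLE = 360.0

class PowerAngleHead(nn.Module):
    def __init__(self, state_dim=10, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.out = nn.Linear(hidden, 2)  # two regression outputs, no softmax

    def forward(self, state):
        # tanh squashes each output to (-1, 1); multiplying by the bounds
        # rescales it so the hard constraints are always satisfied.
        raw = torch.tanh(self.out(self.body(state)))
        return raw * torch.tensor([MAX_POWER, MAX_ANGLE])

net = PowerAngleHead()
pred = net(torch.randn(1, 10))                # [[power, angle]]
target = torch.tensor([[50.0, 45.0]])         # made-up supervision targets
loss = nn.functional.mse_loss(pred, target)   # the MSE loss from the EDIT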
I'm trying to build a simple multilayer perceptron model on a large data set, but I'm getting the loss value as NaN. The weird thing is that after the first training step, the loss value is not NaN, and is about 46 (which is oddly low; when I run a logistic regression model, the first loss value is about 3600). Right after that, though, the loss value is constantly NaN. I used tf.Print to try to debug it as well.
The goal of the model is to predict ~4500 different classes, so it's a classification problem. Using tf.Print, I see that after the first training step (or feed-forward through the MLP), the predictions coming out of the last fully connected layer seem right (varying numbers between 1 and 4500). But after that, the outputs of the last fully connected layer go to either all 0's or some other constant number (0 0 0 0 0).
For some information about my model:
3-layer model, all fully connected layers
batch size of 1000
learning rate of 0.001 (I also tried 0.1 and 0.01, but nothing changed)
using CrossEntropyLoss (I did add an epsilon value to prevent log(0))
using AdamOptimizer
learning rate decay of 0.95
The exact code for the model is below (I'm using the TF-Slim library):
input_layer = slim.fully_connected(model_input, 5000, activation_fn=tf.nn.relu)
hidden_layer = slim.fully_connected(input_layer, 5000, activation_fn=tf.nn.relu)
output = slim.fully_connected(hidden_layer, vocab_size, activation_fn=tf.nn.relu)
output = tf.Print(output, [tf.argmax(output, 1)], 'out = ', summarize = 20, first_n = 10)
return {"predictions": output}
Any help would be greatly appreciated! Thank you so much!
A few possible reasons why it doesn't work:
You skipped or inappropriately applied feature scaling of your inputs and outputs. Consequently, the data may be difficult to handle for TensorFlow.
Using ReLU, which is unbounded above and has a discontinuous derivative at zero, may raise issues. Try using other activation functions, such as tanh or sigmoid.
For some reason, your training process has diverged, and you may have infinite values in your weights, which gives NaN losses. The reasons can be many; try changing your training parameters (use smaller batches to test).
Also, using a ReLU for the last output in a classifier is not the usual method; the usual choice is a linear output layer whose raw logits are fed to a softmax cross-entropy loss, as sketched below.
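A minimal sketch of that fix in the question's TF-Slim setup; model_input and vocab_size are taken from the question, and labels is assumed to be a tensor of integer class indices:

input_layer = slim.fully_connected(model_input, 5000, activation_fn=tf.nn.relu)
hidden_layer = slim.fully_connected(input_layer, 5000, activation_fn=tf.nn.relu)
# Linear output: leave the logits unactivated.
logits = slim.fully_connected(hidden_layer, vocab_size, activation_fn=None)
# The softmax is applied inside the loss for numerical stability,
# so no epsilon hack is needed to guard against log(0).
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))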
From my understanding, ReLU doesn't put a cap on the upper bound of activations, so the network is more likely to diverge depending on its implementation.
Try switching all the activation functions to tanh or sigmoid. ReLU is generally used for convolutions in CNNs.
It's also difficult to determine whether you're diverging due to the cross-entropy, as we don't know how your epsilon value affects it. Try just using the plain squared error; it's much simpler but still effective.
Also, a 5000-5000-4500 neural network is huge. It's unlikely you actually need a network that large.
Background: I'm writing in Python a three-layer neural network using mini-batch stochastic gradient descent specifically designed to identify between three classes of iris plants from the famous iris data set. The input layer has four neurons, one for each feature in the data. The hidden layer has 3 neurons (but the code allows variations in hidden layer neuron numbers) and the output layer has three neurons (one for each species). All neurons use sigmoid activation functions.
Problem: The loss (mean squared error) generally decreases over time, but the accuracy (usually below 55.55%, or even 33.33%) is stagnant. I've tried experimenting with different numbers of epochs and learning rates, but nothing worked. Interestingly, more often than not the outputs of the algorithm remain fixed no matter what the input values are. I'm fairly certain of my math, since the loss does seem to decrease as the number of epochs increases.
To replicate the problem: just run the Python code and observe the LEARNING_RESULTS.txt file. (Make sure the iris.txt file from the repo is in the same directory.)
Question: How can I improve performance for this neural network?
Link to GitHub repo: https://github.com/kwonkyo/neural-networks
Thanks!
UPDATE: Problem solved. I was adding a constant value (the scalar sum over all entries of the mini-batch gradient matrices) to the weight and bias matrices, instead of the elementwise sum of the mini-batch gradient matrices. Updated code has been pushed to GitHub.
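For anyone who hits the same bug, here is a minimal NumPy sketch of the difference (the shapes and learning rate are made up for illustration):

import numpy as np

grads = [np.random.randn(3, 4) for _ in range(8)]  # per-example gradients in one mini-batch
W = np.zeros((3, 4))
lr = 0.1

# Bug: np.sum with no axis collapses everything to a single scalar, so every
# weight receives the same constant update.
W_buggy = W - lr * np.sum(grads)

# Fix: sum the gradient matrices elementwise, keeping the (3, 4) shape.
W_fixed = W - lr * np.sum(grads, axis=0)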
I have a neural network with one input, three hidden neurons and one output. I have 720 input and corresponding target values, 540 for training, 180 for testing.
When I train my network using the logistic sigmoid or tanh activation function, I get the same outputs while testing, i.e. I get the same number for all 180 output values. When I use a linear activation function, I get NaN, because apparently the values get too high.
Is there any activation function to use in such a case? Or any improvements to be done? I can update the question with details and code if required.
Neural nets are not stable when fed input data on arbitrary scales (such as between approximately 0 and 1000 in your case). If your output units are tanh, they can't even predict values outside the range -1 to 1 (or 0 to 1 for logistic units)!
You should try recentering/scaling the data so it has zero mean and unit variance (this is called standard scaling in the data science community). Since it is a lossless transformation, you can revert to your original scale once you've trained the net and predicted on the data.
Additionally, a linear output unit is probably the best choice, as it makes no assumptions about the output space, and I've found tanh units to do much better on recurrent neural networks in low-dimensional input/hidden/output nets.
Newmu is right that scaling is probably the issue here; you need to scale your inputs to lie in the valid range. (Standardization to zero mean and unit variance, as they suggest, isn't a great choice though, since it means about a third of your data will lie outside [-1, 1]...) I don't know about pybrain, but in scikit-learn you'd want sklearn.preprocessing.MinMaxScaler.
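A minimal sketch of that with made-up data, including the round trip back to the original scale mentioned above:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[0.0], [200.0], [640.0], [1000.0]])  # inputs on an arbitrary scale, as in the question

scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)                 # now inside [0, 1]

# ... train the net on X_scaled and predict, then map back:
X_restored = scaler.inverse_transform(X_scaled)    # original scale recovered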
But, also, in the comments you said your dataset looks like this: [scatter plot omitted], where the horizontal axis is inputs and the vertical axis is targets. So, when you see an input of 200, you have one training example saying the target is 80 and one saying it's 320; what do you want it to say then? An "optimal" neural network (which may be hard to achieve) would predict 200 or so, the average of the two.
You may need to think about how to reframe your learning problem so that it is a more consistent function from inputs to targets.