Why is my model giving different results each time I train it? - python

My question is: why does the same algorithm give different results each time I train it?
Is this normal, or could there be a problem in the data or the code?
The algorithm is deep deterministic policy gradient (DDPG).

It's absolutely normal. There is no problem with either data or code.
The algorithm may be initialized to a random state, such as the initial weights in an artificial neural network.
Try setting the NumPy seed for reproducibility, as below:
import numpy as np
np.random.seed(42)
Learn more about this from here.

When you initialize the weights of your model, they are usually initialized randomly by whatever framework you use, most likely via np.random.rand(), and therefore yield different results every time.
If you do not want randomized weights, use np.random.seed(10) to always get the same results. Other libraries have equivalent commands.
Edit: I saw you used TensorFlow; in that case:
tf.random.set_random_seed(10)
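For completeness, a minimal sketch (assuming TensorFlow 1.x, where tf.set_random_seed is available) that seeds Python, NumPy, and TensorFlow in one place, before any of the DDPG networks are built:
import os
import random
import numpy as np
import tensorflow as tf

SEED = 42
os.environ['PYTHONHASHSEED'] = str(SEED)  # fix Python's hash randomization
random.seed(SEED)                         # Python's built-in RNG
np.random.seed(SEED)                      # NumPy RNG (e.g. replay-buffer sampling)
tf.set_random_seed(SEED)                  # TensorFlow graph-level seed
# ...build the actor/critic networks and train after this point
Note that even with all seeds fixed, multi-threaded or GPU execution can still introduce small nondeterministic differences; see the related answers below.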

Related

Tensorflow: Different results with the same random seed

I'm running a reinforcement learning program in a gym environment (BipedalWalker-v2) implemented in TensorFlow. I've set the random seeds of the environment, TensorFlow, and NumPy manually as follows:
os.environ['PYTHONHASHSEED']=str(42)
random.seed(42)
np.random.seed(42)
tf.set_random_seed(42)
env = gym.make('BipedalWalker-v2')
env.seed(0)
config = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
# run the graph with sess
However, I get different results every time I run my program (without changing any code). Why are the results not consistent and what should I do if I want to obtain the same result?
Update:
The only places I can think of that may introduce randomness (other than the neural networks) are:
- I use tf.truncated_normal to generate random noise epsilon so as to implement a noisy layer
- I use np.random.uniform to randomly select samples from the replay buffer
I also notice that the scores I get are pretty consistent for the first 10 episodes but then begin to differ. Other things such as losses show a similar trend, but the numbers are not identical.
Update 2
I've also set "PYTHONHASHSEED" and used a single-thread CPU as @jaypops96 described, but still cannot reproduce the result. The code has been updated in the code block above.
I suggest checking whether your TensorFlow graph contains nondeterministic operations. For example, reduce_sum before TensorFlow 1.2 was one such operation. These operations are nondeterministic because floating-point addition and multiplication are nonassociative (the order in which floating-point numbers are added or multiplied affects the result) and because such operations don't guarantee their inputs are added or multiplied in the same order every time. See also this question.
EDIT (Sep. 20, 2020): The GitHub repository framework-determinism has more information about sources of nondeterminism in machine learning frameworks, particularly TensorFlow.
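As a quick illustration of the non-associativity point, summing the same numbers with a different grouping already changes the result in plain Python:
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False
print(a, b)    # 0.6000000000000001 0.6
A parallel reduction that accumulates partial sums in a thread-dependent order can therefore return slightly different totals on identical inputs.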
It seems that TensorFlow neural networks introduce randomness during training that isn't controlled by a NumPy random seed. At the very least, the randomness appears to come from Python hash operations and from parallelized operations executing in an uncontrolled order.
I had success getting 100% reproducibility using a keras-tensorflow NN, by following the setup steps in this response:
How to get reproducible results in keras
Specifically, I used the formulation proposed by @Poete Maudit in that link.
The key was to set random seed values UP FRONT for NumPy, Python, and TensorFlow, and then also to make TensorFlow run on a single-thread CPU in a specially configured session.
Here's the code I used, updated very slightly from the link I posted.
print('Running in 1-thread CPU mode for fully reproducible results training a CNN and generating numpy randomness. This mode may be slow...')
# Seed value
# Apparently you may use different seed values at each stage
seed_value= 1
# 1. Set `PYTHONHASHSEED` environment variable at a fixed value
import os
os.environ['PYTHONHASHSEED']=str(seed_value)
seed_value += 1
# 2. Set `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)
seed_value += 1
# 3. Set `numpy` pseudo-random generator at a fixed value
import numpy as np
np.random.seed(seed_value)
seed_value += 1
# 4. Set `tensorflow` pseudo-random generator at a fixed value
import tensorflow as tf
tf.set_random_seed(seed_value)
# 5. Configure a new global `tensorflow` session
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
tf.keras.backend.set_session(sess)
#rest of code...
Maybe you can try to set the number of parallelism threads to 1. I had the same problem: the loss became different in the seventh decimal place starting from the second episode. It was fixed when I set
tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
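If you are on TensorFlow 2.x (an assumption; the code above targets 1.x, where ConfigProto and sessions exist), the roughly equivalent single-thread setup is:
import tensorflow as tf

# Limit both thread pools to one thread; must be called before any ops run.
tf.config.threading.set_intra_op_parallelism_threads(1)
tf.config.threading.set_inter_op_parallelism_threads(1)
tf.random.set_seed(1)  # TF2 replacement for tf.set_random_seed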

Predict() on Keras always gives different results even if the NN and the dataset are the same

I have my model and a fixed dataset on which I do the train_test_split twice: once for getting train and test sets and the second time for getting a validation set too.
I have to reuse the same network, on the same data, twice in two different modules but every time I do that I get different results.
Is there a way to fix it?
I have the weights fixed and random_state = 42 so as to eliminate every form of randomness, but it still does not seem to be enough.
The optimizer I used is Adam and the loss function is the mean absolute error.
Do you train and evaluate (predict) the model in the same script and process?
Please check the official guide on how to obtain reproducible results with Keras during development.
In addition, you can try to save and load your model (in another file) to check the predictions.
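A sketch of that save-and-reload check (the model variable, file name, and X_test are placeholders, not taken from your code):
from tensorflow import keras

# In the first module, after training:
model.save('my_model.h5')

# In the second module, reload exactly the same weights and predict:
restored = keras.models.load_model('my_model.h5')
preds = restored.predict(X_test)  # should match the first module's predictions
If the reloaded model gives identical predictions but retraining does not, the remaining randomness is in the training step (weight initialization, shuffling, dropout), not in predict().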

Using K-Means with predefined centers?

I'm running a KNN classifier whose feature vectors come from a K-Means clusterer (more specifically, sklearn.cluster.MiniBatchKMeans). Since the K-Means starts with random points, I get different results every time I run my algorithm. I've stored the cluster centers from a run where the results were good in a separate .npy file, but now I need to use those centers in my K-Means and I don't know how.
Following this advice, I tried to use the cluster centers as starting points like so:
MiniBatchKMeans.__init__(self, n_clusters=self.clusters, n_init=1, init=np.load('cluster_centers.npy'))
Still, results change every time the algorithm is run.
Then I tried to manually alter the cluster centers after fitting the data:
kMeansInstance.cluster_centers_ = np.load('cluster_centers.npy')
Still, different results each time.
The only other solution I can think of is manually implementing the predict method using the centers I saved, but I don't know how, and I don't know if there is a better way to solve my problem than reinventing the wheel.
I would guess fixing the random_state will do the job.
See the API documentation.
Mini batch k-means only considers a sample of the data.
It uses a random generator for this.
If you want deterministic behaviour, fix the random seed, and prefer algorithms that do not use a random sample (i.e., use the regular k-means instead of mini-batch k-means).
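Putting the two suggestions together, a sketch (assuming cluster_centers.npy has shape (n_clusters, n_features), and with data / new_vectors as placeholder arrays):
import numpy as np
from sklearn.cluster import MiniBatchKMeans

centers = np.load('cluster_centers.npy')
kmeans = MiniBatchKMeans(n_clusters=centers.shape[0],
                         init=centers,     # start from the saved centers
                         n_init=1,         # a single, fixed initialisation
                         random_state=42)  # makes the mini-batch sampling reproducible
kmeans.fit(data)
labels = kmeans.predict(new_vectors)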

tensorflow evaluate with confusion matrix

The TensorFlow CNN tutorial computes the accuracy, but I want to extend that to a confusion matrix.
Three different approaches immediately came to mind:
I tried to directly compute the prediction result instead of top_k_op in TensorFlow, so that I could use sklearn, but I failed because it used multiple threads for the computation (line 88);
I tried to load the trained variables and feed a new placeholder to cifar10.inference, but failed again because it defines batch_image as the input (line 225);
The last approach is to define a new operation to replace line 128
top_k_op = tf.nn.in_top_k(logits, labels, 1)
but I could not find a proper operation that could do that.
This has been troubling me for several days. Please help. Thank you in advance.
You can use sklearn's confusion_matrix only after running inference on the whole dataset.
Meaning, if you are modifying the eval_only function, you should just accumulate all the scores into some thread-safe container (a list). Then, after all threads are stopped (line 113), you can run a single confusion-matrix computation.
Additionally, if you want to do it in the graph, TensorFlow recently got a confusion_matrix op you can try using. That said, it only works on a batch, so you will need to increase your batch size to get any kind of resolution, or write a custom aggregator.
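A sketch of the accumulate-then-compute approach with sklearn (sess, labels, logits, and num_batches stand in for the tutorial's own variables):
import numpy as np
from sklearn.metrics import confusion_matrix

all_labels, all_preds = [], []
for _ in range(num_batches):
    batch_labels, batch_logits = sess.run([labels, logits])
    all_labels.append(batch_labels)
    all_preds.append(np.argmax(batch_logits, axis=1))

# One confusion-matrix computation over the whole evaluation set.
cm = confusion_matrix(np.concatenate(all_labels), np.concatenate(all_preds))
print(cm)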

Why does a trained PyBrain network yield different results even with an input used for training?

I have trained a neural network using PyBrain, but when I test the network using the same input as one used for training, I get a completely different result. Here is my code:
from pybrain.structure import FeedForwardNetwork
from pybrain.structure import LinearLayer, SigmoidLayer
from pybrain.structure import FullConnection
import numpy as np
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised import BackpropTrainer
from pybrain.tools.xml.networkreader import NetworkReader
from pybrain.tools.xml.networkwriter import NetworkWriter
from pybrain.utilities import percentError
n = FeedForwardNetwork()
inLayer = LinearLayer(2)
hiddenLayer = SigmoidLayer(3)
outLayer = LinearLayer(1)
n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addOutputModule(outLayer)
in_to_hidden = FullConnection(inLayer, hiddenLayer)
hidden_to_out = FullConnection(hiddenLayer, outLayer)
n.addConnection(in_to_hidden)
n.addConnection(hidden_to_out)
n.sortModules()
X = np.array(([3,5], [5,1], [10,2]),dtype=float)
Y = np.array(([75], [82], [93]),dtype=float)
X/=np.amax(X, axis=0)
Y/=100
print(n.activate([ 1, 2]))
print(in_to_hidden.params)
ds = SupervisedDataSet(2,1)
for i in range(len(X)):
    ds.addSample(X[i], Y[i])
trainer=BackpropTrainer(n,ds, learningrate=0.5, momentum=0.05,verbose=True)
trainer.trainUntilConvergence(ds)
trainer.testOnData(ds, verbose=True)
Now when I want to test on an input using the code
print("Testing",n.activate([3,5]))
I get ('Testing', array([ 1.17809308])). I should have gotten around 0.75 for this input n.activate([3,5]), so I don't understand this strange result.
If I understand you correctly, this is just one aspect of model validation that you will always have to undertake. The network generally seeks to minimise its error against all of the training data, but it will not get each result exactly. You could probably improve prediction accuracy by running more epochs with more hidden neurons. However, doing so would eventually lead to over-fitting through excessive flexibility. It's a bit of a balancing act.
As an analogy, take regression. In the linear case below, the model does not match any of the training (blue) data, but generally captures the trend for blue and red (external test) data. Using the linear equation would always give me the wrong answer for all data but it's a decent approximator. Then say that I fit a polynomial trendline to the data. Now it has a lot more flexibility, hitting all of the blue points but the error on the testing data has increased.
Once you have your network built, you need to rerun all of your data back through it. You can then validate on absolute average deviation, MSE, MASE, etc., in addition to things like k-fold cross-validation. Your tolerance for error depends on your application: in engineering, I might always need to be within 5% error, and anything that exceeds that threshold (which would occur in the second graph) could have fatal consequences. In language processing, I might be able to tolerate one or two real mess-ups and try to catch them another way if the majority of predictions are very close, so I'd possibly take the second graph.
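For example, a minimal numpy sketch of that check, reusing n, X and Y from the question's code (X already normalised as during training):
# Run every training sample back through the trained network
preds = np.array([n.activate(x) for x in X])
mae = np.mean(np.abs(preds - Y))   # mean absolute error
mse = np.mean((preds - Y) ** 2)    # mean squared error
print(mae, mse)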
Playing with your learning rate and momentum might help converge on a better solution.
EDIT: Based on comments
The comment "should have been able to recognise it" suggests to me a different idea of what a neural network is. There is not even a vague concept of memory in the network; it simply uses the training data to develop a convoluted set of rules that try to minimise its error against all data points. Once the network is trained, it has no recollection of any of the training data; it's just left with a spaghetti of multiplication steps it will perform on input data. So no matter how good your network is, you will never be able to reverse-map your training inputs to exactly the right answer.
The idea of "convergence" cannot be taken to mean that you have a good network. The network might just have found a local minimum of the error and given up learning. That is why you must always validate your models. If you are not happy with the result of the validation, you can try to improve the model by:
- Simply re-running it again. The random initialisation of the network might now avoid the local minimum
- Changing the number of neurons. This loosens or tightens the flexibility of the model
- Change the learning rate and momentum
- Change the learning rule e.g. swapping from Levenberg-Marquardt to Bayesian Regularisation
I don't see anything strange about the result. Just because you have a training instance with input [3,5] and output [0.75], it does not mean that the network should always produce exactly that output. The model does not simply memorize training instances.
