I set up a network to learn Fashion MNIST in the style of Hands-On Machine Learning, page 298. When I ran the code multiple times, the accuracy was slightly different each time. This made me wonder whether the accuracies were normally distributed around some mean, and whether, with enough runs, I could accurately estimate the population mean of all runs for those parameters.
So I ran this code, which fits the model 1000 times, each with differently shuffled training data:
from tensorflow.keras import datasets, models, layers
import matplotlib.pyplot as plt
(X_train, y_train), _ = datasets.fashion_mnist.load_data()
final_accuracies = []
for i in range(1000):
    model = models.Sequential(layers=[
        layers.Flatten(input_shape=[28, 28]),
        layers.Dense(300, activation="relu"),
        layers.Dense(100, activation="relu"),
        layers.Dense(10, activation="softmax")])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
    history = model.fit(X_train / 255.0, y_train, epochs=5, validation_split=0.2)
    final_accuracies.append(history.history['val_accuracy'][-1])
plt.hist(final_accuracies, bins=30)
plt.show()
The resulting distribution of accuracies is shown in the attached histogram.
What kind of distribution is this?! It's clearly not normally distributed. It's got a much longer tail in the direction of lower accuracies.
Has the statistics of what kind of distribution these accuracies are drawn from been worked out? If so, please enlighten me.
Also, if this question really fits better at stats.stackexchange.com, please kindly let me know and I'll remove it from here and post it there instead.
I do not know the answer to your question specifically. It would be difficult to work out because there are numerous sources of randomness at play each time you run a model. One example is the weight initialization; others come from operations that shuffle the data, dropout layers, and so on. If you use transfer learning, randomness may also be baked into the model you select. It would take a lot of work to combine all these sources and mathematically derive the distribution. There are a lot of questions on Stack Overflow about how to get repeatable results in TensorFlow. It turns out that to do so you have to hunt down every source of randomness and eliminate it by setting a seed that makes the random behaviour repeatable each time the model is run. This is no easy task.
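For what it's worth, a minimal sketch of what that seed-hunting looks like with the TensorFlow 2.x API (older versions use different calls, and GPU kernels can still introduce non-determinism on top of this):
import random
import numpy as np
import tensorflow as tf

SEED = 42
random.seed(SEED)         # Python's built-in RNG
np.random.seed(SEED)      # NumPy RNG
tf.random.set_seed(SEED)  # TensorFlow's global RNG (weight init, dropout, shuffling)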
I am currently building a Conditional GAN to apply data augmentation on a small audio dataset.
My problem is that I don't really know how to calibrate my models and their parameters. I feel like the hyperparameters need fine-tuning in a certain way, but I don't know in which direction to go.
First of all, here is a plot of my losses over the epochs. Please don't mind the axis labels; they are wrong because I reused a plotting function without changing them:
plot of the losses per epochs
As we can see, the two losses cross each other. I believe they should stay balanced and approximately equal for the rest of the training, but in my case they diverge and never meet again. I was wondering whether this is normal behavior, or whether I should stop the training when they cross.
Please tell me if you have any leads, clues, or criticism that would allow me to improve my models.
For further information, here are some of the hyper-parameters I am using:
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

# I used custom loss functions for both models; each function uses this cross_entropy,
# but I am quite confident that this part is correct.
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=False)

# different learning rates because I felt that the discriminator model was too chaotic
generator_optimizer = Adam(8e-5)
discriminator_optimizer = Adam(2e-5)

BATCH_SIZE = 20
epochs = 1000
I am conscious that 1000 epochs is far too many for this, but I wanted to observe the behavior over a long run.
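For reference, the usual way such losses are built on a cross_entropy like this is roughly the following (a sketch only, not my exact code):
def discriminator_loss(real_output, fake_output):
    # real samples should be scored as 1, generated samples as 0
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

def generator_loss(fake_output):
    # the generator "wins" when the discriminator scores its fakes as 1
    return cross_entropy(tf.ones_like(fake_output), fake_output)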
I built my generator like that:
generator model
And my discriminator model is like that:
discriminator model
The architectures are built using the TensorFlow functional API.
Thanks for reading and please tell me if you see anything funny or if you have any leads.
I got the error “Allocation of 73138176 exceeds 10% of system memory” when running image classification code with a CNN. I tried different solutions to the problem; however, each one changed the model accuracy.
Model accuracy here was 0.6761.
model.fit(X, y, batch_size=32, epochs=9, validation_split=0.3)
Then, when I lowered batch_size to 2, the accuracy here increased to 0.8451. Also, it did not give any errors related to the allocation problem.
model.fit(X, y, batch_size=2, epochs=9, validation_split=0.3)
Then I was also curious about another snippet, which also solved the allocation problem. However, this time the model accuracy was 0.7183. The code is:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
model.fit(X, y, batch_size=32, epochs=9, validation_split=0.3)
My question is: which of these would you actually suggest I follow? Also, could you please enlighten me as to why the accuracy changes each time?
Thank you for any help and suggestions.
If you want exactly repeatable training results, you need to eliminate all sources of randomness. For a typical model training run, the main sources are (1) your dataset: randomization of the test/train split, or of the order in which batches are generated; and (2) the model initialization: if you want to train the same model every time, you need to start with the same initial parameters every time. How you ensure that you get 'the same random numbers' on every training run varies by framework. It was unreasonably painful the last time I tried it in TF, years ago, but it can be done; searching for how to fix the random seed in TF should turn up the details.
However, fixing the random seed may not be what you are really after. For repeatable experiments it is exactly what you want, but as far as the production qualities of your model are concerned, that's a different matter. If you find that the model you end up with behaves rather differently depending on the seed (and many problems intrinsically have this property, where multiple 'equally valid' but rather different interpretations exist), training an ensemble of such models, each with a different random seed, is a useful thing to do; that way you gain an explicit awareness of the amount of 'room for interpretation' that your model and dataset leave open.
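As a rough, self-contained sketch of that ensemble idea (with toy data standing in for your real dataset):
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# toy data just to make the sketch self-contained; substitute your own
X_train = np.random.rand(200, 10).astype("float32")
y_train = (X_train.sum(axis=1) > 5).astype("float32")
X_test = np.random.rand(50, 10).astype("float32")

def build_model():
    model = models.Sequential([
        layers.Dense(16, activation="relu", input_shape=(10,)),
        layers.Dense(1, activation="sigmoid")])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

ensemble = []
for seed in range(5):
    tf.random.set_seed(seed)              # a different seed per ensemble member
    member = build_model()
    member.fit(X_train, y_train, epochs=5, verbose=0)
    ensemble.append(member)

preds = np.stack([m.predict(X_test, verbose=0) for m in ensemble])
mean_pred = preds.mean(axis=0)            # the ensemble prediction
spread = preds.std(axis=0)                # disagreement across seeds = 'room for interpretation'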
I'm a beginner in data science and TensorFlow, so, as a test of my "skills", I wanted to try to make an AI that you give a number and that gives back a 28x28 pixel image of that number. It is possible to do this the other way around, so I figured, why not? The code runs fine, but the accuracy of the AI is very low, so low in fact that it just returns random pixels. Is there any way to make this AI more accurate, apart from maybe doing something like 100 epochs? Here's the code I'm using:
import tensorflow as tf
import tensorflow.keras as tk
import numpy as np
import matplotlib.pyplot as plt
(train_data, train_labels), (test_data, test_labels) = tk.datasets.mnist.load_data()
model = tk.Sequential([
tk.layers.Dense(64, activation='relu'),
tk.layers.Dense(64, activation='relu'),
tk.layers.Dense(784, activation='relu')])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
train_data = np.reshape(train_data, (60000, 784))
test_data = np.reshape(test_data, (-1, 784))
model.fit(train_labels, train_data, epochs=10, validation_data=(test_labels, test_data))
result = model.predict([2])
result = np.reshape(result, (28, 28))
plt.imshow(result)
plt.show()
I'm using Google Colab since I haven't yet been able to install TensorFlow on my computer; maybe it has something to do with that. Thanks in advance for any answers!
This is very much possible, and has resulted in a vast area of research called Generative Adversarial Networks (GANs).
First off, let me list the problems with your approach:
You use a single number as input and expect the model to understand it. This does not work well in practice. It's better to use a representation called one-hot encoding.
For each label, multiple images exist. Mathematically, if the domain X consists of labels (one-hot encoded or not) and the range Y consists of images, the relationship is a one-to-many relationship, which can't be learned in a supervised fashion. By its very nature, supervised learning can be used to model only many-to-one relationships, or one-to-one relationships (although there is no point in using ML for this; dictionaries are a much better choice). Therefore, it is easy to predict labels for images, but impossible to generate images for labels using fully supervised approaches, unless you use only one image per label for training.
The way GANs solve the problem in (2) is by generating a "probable" image given a set of random values. Variations of GANs allow specifying the exact value to generate.
I suggest reading the paper that introduced GANs. Then try out the basic GAN before moving on to generate specific numbers.
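For point (1), one-hot encoding the labels is a one-liner in Keras, for example:
import numpy as np
from tensorflow.keras.utils import to_categorical

labels = np.array([2, 0, 7])
one_hot = to_categorical(labels, num_classes=10)
# one_hot[0] is [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]: the label 2 becomes a 10-dimensional vector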
Here is my code:
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from keras import backend as K
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

for _ in range(5):
    K.clear_session()
    model = Sequential()
    model.add(LSTM(256, input_shape=(None, 1)))
    model.add(Dropout(0.2))
    model.add(Dense(256))
    model.add(Dropout(0.2))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])
    hist = model.fit(x_train, y_train, epochs=20, batch_size=64, verbose=0, validation_data=(x_val, y_val))
    p = model.predict(x_test)
    print(mean_squared_error(y_test, p))
    plt.plot(y_test)
    plt.plot(p)
    plt.legend(['testY', 'p'], loc='upper right')
    plt.show()
Total params: 330,241
Samples: 2,264
Below is the result:
I haven't changed anything between runs; I only ran the for loop. As you can see in the picture, the resulting MSE values are huge and vary from run to run, even though I only re-ran the same loop.
I think the fundamental reason for this problem is that the optimizer cannot find the global minimum and instead converges to a local minimum. I think this because, after checking all the loss graphs, the loss is no longer reduced significantly (after about 20 epochs). So in order to solve this problem, I have to find the global minimum. How should I do this?
I tried adjusting the batch_size and the number of epochs. I also tried changing the hidden layer size, the number of LSTM units, adding a kernel_initializer, changing the optimizer, etc., but could not get any meaningful improvement.
I wonder how I can solve this problem.
Your valuable opinions and thoughts will be very much appreciated.
If you want to see the full source, here is the link: https://gist.github.com/Lay4U/e1fc7d036356575f4d0799cdcebed90e
From your example, the problem simply comes from the fact that you have over 100 times more parameters than you have samples. If you reduce the size of your model, you will see less variance.
The wider question you are asking is actually very interesting and usually isn't covered in tutorials. Nearly all machine learning models are stochastic by nature; the output predictions will change slightly every time you train, which means you will always have to ask the question: which model do I deploy to production?
Off the top of my head there are two things you can do:
Choose the first model trained on all the data (after cross-validation, ...)
Build an ensemble of models that all have the same hyper-parameters and implement a simple voting strategy
References:
https://machinelearningmastery.com/train-final-machine-learning-model/
https://machinelearningmastery.com/randomness-in-machine-learning/
If you want to always start from the same point, you should set the random seeds. You can do it like this if you use the TensorFlow backend in Keras:
from numpy.random import seed
seed(1)
from tensorflow import set_random_seed
set_random_seed(2)
If you want to learn why you get different results from ML/DL models, I recommend this article.
I'm getting started with machine learning tools and I'd like to learn more about what the heck I'm doing. For instance, the script:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, BatchNormalization
from keras.initializers import RandomUniform
import numpy
numpy.random.seed(13)
RandomUniform(seed=13)
model = Sequential()
model.add(Dense(6, input_dim=6))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.01))
model.add(Dense(11))
model.add(Activation('tanh'))
model.add(Dropout(0.01))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(optimizer='sgd', loss='mean_absolute_error', metrics=['accuracy'])
data = numpy.loadtxt('train', delimiter=' ')
X = data[:, 0:6]
Y = data[:, 6]
model.fit(X, Y, batch_size=1, epochs=1000)
data = numpy.loadtxt('test', delimiter=' ')
X = data[:, 0:6]
Y = data[:, 6]
score = model.evaluate(X, Y, verbose=1)
print ('\n\nThe error is:\n', score, "\n")
print('\n\nPrediction:\n')
Y = model.predict(X, batch_size=1, verbose=1)
print('\nResult:\n', Y, '\n')
It's a Frankenstein I made from some examples I found on the internet and I have many unanswered questions about it:
The file train has 60 rows. Are 1000 epochs too few? Too many? Can I end up with underfitting/overfitting?
What does the result I get from model.evaluate() mean? I know it's the loss but, if I get a [7.0506157875061035, 0.0], does it mean that my model has a 7% error?
And last, I'm getting predictions of 0.99875391, 0.99875391, 0.9362126, 0.99875391, 0.99875391, 0.99875391, 0.93571019 when the expected values were around 7.86, 3.57, 8.93, 6.57, 11.7, 8.53, 9.06, which means it's a really bad prediction. Clearly there are a lot of things I'm doing wrong. Could you give me a few pointers?
I know it all depends on the type of data I'm using, but is there anything I shouldn't do at all? Or maybe something I should be doing?
1
There is never a ready answer for how many epochs is a good number. It varies wildly depending on the size of your data, your model, and what you want to achieve. Normally, small models require fewer epochs, bigger models require more. Yours seems small enough, and 1000 epochs seems far too many.
It also depends on the learning rate, a parameter given to the optimizer that defines how large the steps your model takes to update its weights are. Bigger learning rates mean fewer epochs, but there is a chance that you simply never find a good point because you keep adjusting the weights past it. Smaller learning rates mean more epochs and better learning.
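For example, with the SGD optimizer your script already uses, you can set the learning rate explicitly by passing an optimizer object instead of the string alias (a sketch; in tf.keras the argument is called learning_rate rather than lr):
from keras.optimizers import SGD

# explicit learning rate instead of the default 0.01 used by optimizer='sgd'
model.compile(optimizer=SGD(lr=0.001),
              loss='mean_absolute_error',
              metrics=['accuracy'])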
Normally, if the loss reaches a limit, you're approaching a point where training is not useful anymore. (Of course, there may be problems with the model too, there is really no simple answer for this one).
To detect overfitting, you need, besides the training data (X and Y), another group of test data (say Xtest and Ytest).
Then you use it in model.fit(X,Y, validation_data=(Xtest,Ytest), ...)
Test data is not given for training, it's kept separate just to see if your model can predict good things from data it has never seen in training.
If the training loss goes down, but the validation loss doesn't, you're overfitting (roughly, your model is capable of memorizing the training data without really understanding it).
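A quick way to see that is to plot both curves from the history object that fit() returns (a sketch, using the model and data from your script):
import matplotlib.pyplot as plt

history = model.fit(X, Y, validation_data=(Xtest, Ytest), batch_size=1, epochs=1000)

plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.legend()
plt.show()
# training loss falling while validation loss flattens or rises is the classic overfitting picture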
An underfit, on the contrary, happens when you never achieve the accuracy you expect (of course we always hope for 100% accuracy, no mistakes, but good models get around the 90s; some applications do better, 99%+, some worse; again, it's very subjective).
2
model.evaluate() gives you the losses and the metrics you added in the compile method.
The loss value is something your model will always try to decrease during training. It roughly means how distant your model is from the exact values. There is no general rule for what the loss value means; it could even be negative (but Keras losses are usually positive). The point is: it must decrease during training, which means your model is evolving.
The accuracy value means how many right predictions your model outputs compared to the true values (Y). It seems your accuracy is 0%; your model is getting everything wrong. (You can see that from the values you typed.)
3
In your model, you used activation functions. These squash the results so they don't get too big. This avoids overflow problems, numeric errors propagating, etc.
It's very common to work with values within such bounds.
tanh - outputs values between -1 and 1
sigmoid - outputs values between 0 and 1
Well, if you use a sigmoid activation in the last layer, your model can never output 3, for instance. It tries, but the maximum value it can produce is 1.
What you should do is prepare your data (Y) so it's contained between 0 and 1. (This is the best approach for classification problems, and it is often done with images too.)
But if you actually want numerical values, then you should just remove the final activation and let the output be free to reach higher values. (It all depends on what you want to achieve with your model.)
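For instance, a regression-style version of your output layers would drop the sigmoid entirely (a sketch, not a full replacement for your script):
from keras.models import Sequential
from keras.layers import Dense

# regression variant: no sigmoid on the output, so predictions are unbounded
model = Sequential()
model.add(Dense(6, input_dim=6, activation='tanh'))
model.add(Dense(11, activation='tanh'))
model.add(Dense(1))   # linear output, free to reach values like 7.86 or 11.7
model.compile(optimizer='sgd', loss='mean_absolute_error')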
An epoch is a single pass through the full training set. In my mind 1000 seems like a lot, but you'd have to check for overfitting and evaluate the predictions. There are many ways of checking and controlling for overfitting in a model. If you understand the methods described here, coding them in Keras should be no problem.
According to the documentation .evaluate returns:
Scalar test loss (if the model has no metrics) or list of scalars (if the model computes other metrics)
So these are the evaluation metrics of your model; they tell you how good your model is given some notion of "good". Those metrics depend on the model and the type of data you've used. Some explanation of them can be found here and here. As mentioned in the documentation,
The attribute model.metrics_names will give you the display labels for the scalar outputs.
So you can know what metric you are looking at. It is easier to do that interactively through the console (ipython, bpython) or Jupyter notebook.
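For example, with the model from your script:
score = model.evaluate(X, Y, verbose=0)
print(model.metrics_names)   # e.g. ['loss', 'acc']
for name, value in zip(model.metrics_names, score):
    print(name, '=', value)  # pairs each metric name with its value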
I can't see your data, but if you are doing a classification problem, as suggested by metrics=['accuracy'], then loss='mean_absolute_error' doesn't make sense, since it is meant for regression problems. To learn more, I refer you to here and here, which discuss classification and regression problems with Keras.
PS: question 3 is not related to software per se, but to the theoretical construct supporting the software. In such cases, I'd recommend asking them at Cross Validated.