I'm trying to learn how to use tensorflow and tensorboard. I have a test project based on the MNIST neural net tutorial.
In my code, I construct a node that calculates the fraction of digits in a data set that are correctly classified, like this:
correct = tf.nn.in_top_k(self._logits, labels, 1)
correct = tf.to_float(correct)
accuracy = tf.reduce_mean(correct)
Here, self._logits is the inference part of the graph, and labels is a placeholder that contains the correct labels.
Now, what I would like to do is evaluate the accuracy for both the training set and the validation set as training proceeds. I can do this by running the accuracy node twice, with different feed_dicts:
train_acc = sess.run(accuracy, feed_dict={images : training_set.images, labels : training_set.labels})
valid_acc = sess.run(accuracy, feed_dict={images : validation_set.images, labels : validation_set.labels})
This works as intended. I can print the values, and I can see that initially, the two accuracies will both increase, and eventually the validation accuracy will flatten out while the training accuracy keeps increasing.
However, I would also like to get graphs of these values in tensorboard, and I cannot figure out how to do this. If I simply add a scalar_summary to accuracy, the logged values will not distinguish between the training set and the validation set.
I also tried creating two identical accuracy nodes with different names and running one on the training set and one on the validation set. I then add a scalar_summary to each of these nodes. This does give me two graphs in tensorboard, but instead of one graph showing the training set accuracy and one showing the validation set accuracy, they are both showing identical values that do not match either of the ones printed to the terminal.
I am probably misunderstanding how to solve this problem. What is the recommended way of separately logging the output from a single node for different inputs?
There are several different ways you could achieve this, but you're on the right track with creating different tf.summary.scalar() nodes. Since you must explicitly call SummaryWriter.add_summary() each time you want to log a quantity to the event file, the simplest approach is probably to fetch the appropriate summary node each time you want to get the training or validation accuracy:
accuracy = tf.reduce_mean(correct)

training_summary = tf.summary.scalar("training_accuracy", accuracy)
validation_summary = tf.summary.scalar("validation_accuracy", accuracy)

summary_writer = tf.summary.FileWriter(...)

for step in xrange(NUM_STEPS):
    # Perform a training step....

    if step % LOG_PERIOD == 0:
        # To log training accuracy.
        train_acc, train_summ = sess.run(
            [accuracy, training_summary],
            feed_dict={images : training_set.images, labels : training_set.labels})
        summary_writer.add_summary(train_summ, step)

        # To log validation accuracy.
        valid_acc, valid_summ = sess.run(
            [accuracy, validation_summary],
            feed_dict={images : validation_set.images, labels : validation_set.labels})
        summary_writer.add_summary(valid_summ, step)
Alternatively, you could create a single summary op whose tag is a tf.placeholder(tf.string, []) and feed the string "training_accuracy" or "validation_accuracy" as appropriate.
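If you go down that route, a minimal sketch might look like the following; note that this relies on the older tf.scalar_summary op, which (unlike tf.summary.scalar) accepts a tensor-valued tag:
tag = tf.placeholder(tf.string, [])
tagged_summary = tf.scalar_summary(tag, accuracy)

# Fetch the same op with a different tag for each data set.
train_summ = sess.run(tagged_summary,
                      feed_dict={tag: "training_accuracy",
                                 images: training_set.images,
                                 labels: training_set.labels})
summary_writer.add_summary(train_summ, step)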
Another way to do it is to use a second file writer, so that you can keep using a single merged summaries op:
train_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/train',
                                     sess.graph)
test_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/test')
tf.global_variables_initializer().run()
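For completeness, this is roughly how the two writers are then used in the linked tutorial. Here merged collects every summary in the graph, and train_feed / test_feed are hypothetical feed dicts for the two data sets (train_op, accuracy, NUM_STEPS and LOG_PERIOD come from your own loop):
merged = tf.summary.merge_all()

for step in xrange(NUM_STEPS):
    # Training step, logged to the train writer.
    summary, _ = sess.run([merged, train_op], feed_dict=train_feed)
    train_writer.add_summary(summary, step)

    if step % LOG_PERIOD == 0:
        # Evaluation on the held-out data, logged to the test writer.
        summary, acc = sess.run([merged, accuracy], feed_dict=test_feed)
        test_writer.add_summary(summary, step)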
This works fine for me. Here is the complete documentation: TensorBoard: Visualizing Learning
PyTorch noob here, trying to learn.
Link to my notebook:
https://gist.github.com/jagadeesh-kotra/412f371632278a4d9f6cb31a33dfcfeb
I get a validation accuracy of 95%.
I use the following to predict:
m.eval()
testset_predictions = []
for batch_id, image in enumerate(test_dataloader):
    image = torch.autograd.Variable(image[0])
    output = m(image)
    _, predictated = torch.max(output.data, 1)
    for prediction in predicted:
        testset_predictions.append(prediction.item())

len(testset_predictions)
The problem is that I get only 10% accuracy when I submit the result to the Kaggle competition, which is as good as random prediction. I can't figure out what I'm doing wrong.
Please help :)
Most probably it is due to a typo: while you want to iterate over the newly created predictated outcomes, you actually iterate over predicted:
_, predictated = torch.max(output.data,1)
for prediction in predicted:
This predicted comes from earlier in your linked code, and it contains predictions from the validation set instead of your test set:
# validation
# ...
for batch_idx, (data, target) in enumerate(val_dataloader):
    data, target = Variable(data), Variable(target)
    output = m.forward(data)
    _, predicted = torch.max(output.data, 1)
So, you don't even get an error message, because predicted does indeed exist; it's just not what you actually want to use. You end up submitting the results for the validation set instead of the test one (it certainly doesn't help that both consist of 10,000 samples), hence you get, as expected, a random-guessing accuracy of ~10%.
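In other words, the fix is simply to loop over the tensor you actually computed for the test set:
_, predictated = torch.max(output.data, 1)
for prediction in predictated:   # iterate over the test-set predictions
    testset_predictions.append(prediction.item())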
I'm trying to build a NN with Keras and Tensorflow to predict the final chart position of a song, given a set of 5 features.
After playing around with it for a few days I realised that although my MAE was getting lower, this was because the model had just learned to predict the mean value of my training set for all input, and this was the optimal solution. (This is illustrated in the scatter plot below)
This is a random sample of 50 data points from my testing set vs what the network thinks they should be
At first I realised this was probably because my network was too complicated. I had one input layer with shape (5,) and a single node in the output layer, but then 3 hidden layers with over 32 nodes each.
I then stripped back the excess layers and moved to just a single hidden layer with a couple nodes, as shown here:
self.model = keras.Sequential([
    keras.layers.Dense(4,
                       activation='relu',
                       input_dim=num_features,
                       kernel_initializer='random_uniform',
                       bias_initializer='random_uniform'
                       ),
    keras.layers.Dense(1)
])
Training this with a gradient descent optimiser still results in exactly the same prediction being made the whole time.
Then it occurred to me that perhaps the actual problem I'm trying to solve isn't hard enough for the network, that maybe it's linearly separable. Since this would respond better to not having a hidden layer at all, essentially just doing regular linear regression, I tried that. I changed my model to:
inp = keras.Input(shape=(num_features,))
out = keras.layers.Dense(1, activation='relu')(inp)
self.model = keras.Model(inp,out)
This also changed nothing. My MAE and the predicted values are all still the same.
I've tried so many different things, different permutations of optimisation functions, learning rates, network configurations, and nothing can help. I'm pretty sure the data is good, but I've included a sample of it just in case.
chartposition,tagcount,dow,artistscore,timeinchart,finalpos
121,3925,5,35128,7,227
131,4453,3,85545,25,130
69,2583,4,17594,24,523
145,1165,3,292874,151,187
96,1679,5,102593,111,540
134,3494,5,1252058,37,370
6,34895,7,6824048,22,5
A sample of my dataset; finalpos is the value I'm trying to predict. The dataset contains ~40,000 records, split 80/20 into training/testing.
def __init__(self, validation_split, num_features, should_log):
    self.should_log = should_log
    self.validation_split = validation_split

    inp = keras.Input(shape=(num_features,))
    out = keras.layers.Dense(1, activation='relu')(inp)
    self.model = keras.Model(inp, out)

    optimizer = tf.train.GradientDescentOptimizer(0.01)
    self.model.compile(loss='mae',
                       optimizer=optimizer,
                       metrics=['mae'])

def train(self, data, labels, plot=False):
    early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=20)

    history = self.model.fit(data,
                             labels,
                             epochs=self.epochs,
                             validation_split=self.validation_split,
                             verbose=0,
                             callbacks=[PrintDot(), early_stop])

    if plot: self.plot_history(history)
All code relevant to constructing and training the network.
def normalise_dataset(df, mini, maxi):
    return (df - mini) / (maxi - mini)
Normalisation of the input data. Both my testing and training data are normalised to the max and min of the testing set
Graph of my loss vs validation-loss curves for the one-hidden-layer network, with an Adam optimiser and a learning rate of 0.01
Same graph but with linear regression and a gradient descent optimiser.
So I am pretty sure that your normalization is the issue: you are not normalizing by feature (as is the de facto industry standard), but across all data.
That matters if you have two different features with very different orders of magnitude/ranges (in your case, compare timeinchart with artistscore).
Instead, you might want to normalize using something like scikit-learn's StandardScaler. Not only does this normalize per column (so you can pass all features at once), but it also scales to unit variance (which makes some assumption about your data, but can potentially help, too).
To transform your data, use something along these lines
from sklearn.preprocessing import StandardScaler
import numpy as np
raw_data = np.array([[1,40], [2, 80]])
scaler = StandardScaler()
processed_data = scaler.fit_transform(raw_data)
# fit() calculates mean etc, transform() puts it to the new range.
print(processed_data) # returns [[-1, -1], [1,1]]
Note that you have two possibilities to normalize/standardize your training data:
Either scale your training and test data together, and then split afterwards,
or fit the scaler on the training data only, and then use that same scaler to transform your test data (sketched in the snippet below).
Never fit_transform your test set separately from your training data!
Since the test set can have different mean/min/max values, you could end up with totally wrong predictions! In a sense, the StandardScaler is your definition of your "data source distribution", which is inherently still the same for your test set, even though it might be a subset that does not follow exactly the same properties (due to small sample size, etc.).
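A minimal sketch of the second option, assuming X_train and X_test are your (hypothetically named) feature arrays:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit the statistics on the training data only
X_test_scaled = scaler.transform(X_test)        # reuse the same mean/variance for the test data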
Additionally, you might want to use a more advanced optimizer, like Adam, or specify some momentum property (0.9 is a good choice in practice, as a rule of thumb) for your SGD.
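For example, with the tf.train optimizers you are already passing into compile(), that could look roughly like this:
import tensorflow as tf

optimizer = tf.train.AdamOptimizer(0.001)
# or plain SGD with momentum:
# optimizer = tf.train.MomentumOptimizer(0.01, momentum=0.9)

self.model.compile(loss='mae', optimizer=optimizer, metrics=['mae'])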
Turns out the error was a really stupid and easy to miss bug.
When importing my dataset, I shuffle it; however, I was accidentally applying the shuffle only to the labels, not to the dataset as a whole.
As a result, each label was being assigned to a completely random feature set, and of course the model didn't know what to do with this.
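For reference, one way to keep the pairs aligned is to shuffle features and labels with the same permutation (features and labels here are hypothetical names for the two arrays):
import numpy as np

perm = np.random.permutation(len(labels))   # one permutation, applied to both arrays
features = features[perm]
labels = labels[perm]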
Thanks to @dennlinger for suggesting that I look in the place where I eventually found this bug.
Using machine learning (as libraries I've tried Tensorflow and Tflearn (which, I know, is just a wrapper around Tensorflow)), I'm trying to predict the congestion in an area for the next week (see my previous questions if you want more backstory on it). My training set is composed of 400K labelled entries (with the date and a congestion value for each minute).
My problem is that I now have a time gap between predictions and reality.
If I drew a chart of reality vs. prediction, you would see that my prediction, while having the same shape as the reality, is ahead of it: it increases/decreases before the reality does. This made me think that maybe my training had a problem, as if my predictions didn't start where my training ended.
My two datasets (training/testing) are in two different files. First I train on my training set (for convenience's sake, let's say it ends at the 100th minute and my testing set starts at the 101st minute); once my model is saved, I do my predictions. They should then normally start at minute 101, or am I wrong somewhere? Because it seems like the predictions start way after my training stopped (keeping the same example, they would start at minute 107, for instance).
For now, a bad fix was to remove from the training set as many values as the delay I observed (in this example, 7), and it worked: no more delay. But I don't understand why I have this problem, or how to fix it so it won't happen later.
Following some advice found on different websites, it seems that having gaps in my training dataset (missing timestamps, in this case) could be a problem. Seeing that there were indeed some (in total, around 7 to 9% of the whole dataset was missing), I used Pandas to add the missing timestamps (and gave them the congestion value of the last known timestamp). While I do think this may have helped a little (the gap is smaller), it hasn't fixed the problem.
I tried multistep forecasting, multivariate forecasting, LSTM, GRU, MLP, Tensorflow, and Tflearn, but nothing changed, which makes me think the problem could come from my training.
Here is my model training.
def fit_lstm(train, batch_size, nb_epoch, neurons):
    X, y = train[:, 0:-1], train[:, -1]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    print X.shape
    print y.shape

    model = Sequential()
    model.add(LSTM(neurons, batch_input_shape=(None, X.shape[1], X.shape[2]), stateful=False))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')

    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
        model.reset_states()
    return model
The two shapes are:
(80485, 1, 1)
(80485,)
(In this example I'm using only 80K rows of data for training, for speed purposes.)
As parameters I'm using 1 neuron, a batch_size of 64, and 5 epochs.
My dataset is made of two files. The first is the training file, with two columns:
timestamp | values
The second has the same shape but is the testing set (kept separate to avoid any influence of it on my predictions); that file is only used once all the predictions have been made, to compare reality and prediction. The testing set starts where the training set stops.
Do you have an idea of what could be the reason for this problem?
Edit:
On my code I have this function:
# invert differencing
yhat = inverse_difference(raw_values, yhat, len(test_scaled)+1-i)
# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]
It's supposed to invert the difference (to go from a scaled value to the real one).
When I use it as in the pasted example (i.e. with the testing set), I get near perfection: accuracy above 95% and no gap.
Since in reality we wouldn't know these values, I had to change it.
I first tried to use the training set instead, but got the problem explained in this post:
Why is this happening? Is there an explanation for this problem?
Found it. It was a problem with the inverse_difference(history, yhat, interval=1) function. In fact, it made my results look like the last lines of my training data. That is why I had a gap: since there is a pattern in my data (a peak at more or less the same moment), I thought the model was making predictions, while it was just giving me back values from the training set.
Say I have the following graph:
images, labels = load_batch(...)

with slim.arg_scope(inception_resnet_v2_arg_scope()):
    logits, end_points = inception_resnet_v2(images, num_classes=dataset.num_classes, is_training=True)

predictions = tf.argmax(end_points['Predictions'], 1)
accuracy, accuracy_update = tf.contrib.metrics.streaming_accuracy(predictions, labels)

....

train_op = slim.learning.create_train_op(...)
and in a supervisor managed_session as sess within the graph context, I run the following every once in a while:
print sess.run(logits)
print sess.run(end_points['Predictions'])
print sess.run(predictions)
print sess.run(labels)
Do they actually pull different batches for each sess.run call, given that the batch tensor must flow from load_batch onwards before it ever gets to logits, predictions, or labels? Because when I run each of these sess.run calls now, I get very confusing results: even the printed predictions do not match tf.argmax(end_points['Predictions'], 1), and despite the model's high accuracy, I do not get any predictions that even remotely match the labels well enough to explain that high accuracy. Therefore I suspect that each of the results from sess.run probably comes from a different batch of data.
This brings me to my next question: is there a way to inspect the results of different parts of the graph while a batch from load_batch flows all the way to the train_op, i.e. within the sess.run that is actually executed for training? In other words, is there a way to do what I want without calling yet another sess.run?
Also, if I were to check the results using sess.run in this way, would it affect my training, in that some batches of data would be skipped and never reach the train_op?
I realized there is a problem with using separate sess.run calls: the data loaded is different each time. Instead, when I did something like:
logits, probabilities, predictions, labels = sess.run([logits, probabilities, predictions, labels])
print 'logits: \n', logits
print 'Probabilities: \n', probabilities
print 'predictions: \n', predictions
print 'Labels:\n:', labels
All the quantities coincide very well, as I had expected. I have also tried using tf.Print, by writing something like:
logits = tf.Print(logits, [logits], message = 'logits: \n', summarize = 100)
immediately after defining logits, so that the values get printed within the same session run as the train_op. However, the printing is rather messy, so I would prefer the first method: running everything in a single session call to obtain the values, and then printing them normally as numpy arrays.
I use tensorboard to visualize the learning of my neural networks. Currently, I am working with datasets of such a size that an 80%/20% split into training and development collections leaves me with data that does not fit into memory during processing, so I have to use batches for the development step as well.
When I do so, I end up with the following shapes on the tensorboard charts:
The light-green line represents the batched summary, while the dark-green one represents unbatched development processing. I used datasets of the same size and split for both charts.
Here is the code that is used to populate the summaries:
# accuracy definition in network
network.correct_predictions = tf.equal(predictions, tf.argmax(input_y, 1))
network.accuracy = tf.reduce_mean(tf.cast(network.correct_predictions, "float"), name="accuracy")
network.loss = tf.reduce_mean(_losses)

# accuracy summary
acc_summary = tf.scalar_summary("accuracy", network.accuracy)
loss_summary = tf.scalar_summary("loss", network.loss)

# dev data summary writer
dev_summary_operation = tf.merge_summary([loss_summary, acc_summary])
dev_summary_writer = tf.train.SummaryWriter(dev_summary_dir, session.graph)

# dev step
def dev_step(x_batch, y_batch):
    feed_dict = {network.input_x: x_batch, network.input_y: y_batch}
    step, summaries, loss, accuracy = session.run(
        [global_step, dev_summary_operation, network.loss, network.accuracy], feed_dict)
    dev_summary_writer.add_summary(summaries, step)
The same approach during the training step gives normal lines (those that are not jagged), since training is batched in both cases.
In both cases (batched or not), the development summary gets updated once every 100 training batches.
Suggestions of reasons for such behaviour and possible fixes are more than welcome and highly appreciated.