I am quite new to PyTorch, and I am trying to fine-tune an object detection model (transfer learning) so that it learns to detect the objects in my new dataset.
Here is how I load the dataset:
train_dataset = MyDataset(train_data_path, 512, 512, train_labels_path, get_train_transform())
train_loader = DataLoader(train_dataset,batch_size=8,shuffle=True,num_workers=4,collate_fn=collate_fn)
valid_dataset = MyDataset(test_data_path, 512, 512, test_labels_path, get_valid_transform())
valid_loader = DataLoader(valid_dataset,batch_size=8, shuffle=False,num_workers=4,collate_fn=collate_fn)
I define the model and optimizer as follows:
# load Faster RCNN pre-trained model
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="FasterRCNN_ResNet50_FPN_Weights.COCO_V1")
# get the number of input features
in_features = model.roi_heads.box_predictor.cls_score.in_features
# define a new head for the detector with the required number of classes
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model = model.to(DEVICE)
# get the model parameters
params = [p for p in model.parameters() if p.requires_grad]
# define the optimizer
# We are using the SGD optimizer with a learning rate of 0.001, momentum of 0.9 and weight decay of 0.0005.
optimizer = torch.optim.SGD(params, lr=0.001, momentum=0.9, weight_decay=0.0005)
I train the model as follows:
def train(train_data_loader, model, optimizer, train_loss_hist):
    global train_itr
    global train_loss_list
    prog_bar = tqdm(train_data_loader, total=len(train_data_loader), position=0, leave=True, ascii=True)
    # Then we have the for loop iterating over the batches.
    for i, data in enumerate(prog_bar):
        optimizer.zero_grad()
        images, targets = data
        images = list(image.to(DEVICE) for image in images)
        targets = [{k: v.to(DEVICE) for k, v in t.items()} for t in targets]
        # Forward pass
        loss_dict = model(images, targets)
        # Then we sum the losses and append the current iterations loss value to train_loss_list list.
        losses = sum(loss for loss in loss_dict.values())
        loss_value = losses.item()
        # We also send the current loss value to train_loss_hist of the Averager class.
        train_loss_list.append(loss_value)
        train_loss_hist.send(loss_value)
        # Then we backpropagate the gradients and update parameters.
        losses.backward()
        optimizer.step()
        train_itr += 1
    return train_loss_list
I adapted this code from an example I found, and I have not defined any kind of loss in it, so I believe it uses the default loss that was used to train the original object detector. Where is the loss defined, and how can I update my code to train the network on such an imbalanced dataset?
It seems that you have two questions.
How to deal with an imbalanced dataset.
Note that Faster R-CNN is an anchor-based detector, which means that the number of anchors containing an object is extremely small compared to the total number of anchors. The detector is therefore already designed to cope with a heavy imbalance, so you usually don't need to handle the imbalanced dataset yourself. Alternatively, you can use RetinaNet, which introduced a loss function called focal loss to improve performance on imbalanced data.
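As a rough illustration of why focal loss helps with imbalance, here is a minimal pure-Python sketch of the binary focal loss from the RetinaNet paper, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names and the example probabilities are made up for illustration; gamma = 2 and alpha = 0.25 are the defaults reported in the paper, not values taken from the question.

```python
import math

def cross_entropy(p, y):
    """Plain binary cross-entropy for one prediction (p = P(positive), y in {0, 1})."""
    p_t = p if y == 1 else 1.0 - p            # probability given to the true class
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of examples that are already easy.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy background example (true class predicted with 0.99) and a hard
# foreground example (true class predicted with 0.10).
easy = (0.01, 0)
hard = (0.10, 1)

for name, (p, y) in (("easy", easy), ("hard", hard)):
    print(name, "CE:", cross_entropy(p, y), "FL:", focal_loss(p, y))
```

Relative to cross-entropy, the easy example is down-weighted far more strongly than the hard one, so the many easy background anchors no longer dominate the total loss.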
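Where the loss function is.
torchvision integrates the loss computation into the model object: in training mode, the call model(images, targets) in your code returns the losses directly. You can step through your Python code with a debugger, into the torchvision package, to see the implementation details.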
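To make this concrete: in training mode, torchvision's Faster R-CNN returns a dictionary with four loss terms, and the training loop in the question simply sums them. The sketch below imitates that step with plain floats so that it runs without torchvision; the numeric values are invented, and only the key names reflect what torchvision returns.

```python
# Stand-in for `loss_dict = model(images, targets)` in training mode.
# The numbers are invented; in real code the values are scalar tensors.
loss_dict = {
    "loss_classifier": 0.42,   # classification loss of the box head
    "loss_box_reg": 0.17,      # box regression loss of the box head
    "loss_objectness": 0.08,   # object-vs-background loss of the RPN
    "loss_rpn_box_reg": 0.03,  # box regression loss of the RPN
}

# Same reduction as in the question's train() function.
total_loss = sum(loss for loss in loss_dict.values())

for name, value in loss_dict.items():
    print(f"{name}: {value:.2f}")
print(f"total: {total_loss:.2f}")
```

Logging the individual terms in this way during training shows which component dominates the total.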
I'm attempting feature extraction in an unorthodox way. I extract features in eval() mode to switch off the batch norm and dropout layers and use the running means and std provided by ImageNet.
I use a feature extractor to extract features from two related images and concatenate the two tensors stackwise before passing through a linear dense classifier model for training. I'm wondering whether I can avoid using with torch.no_grad() as the two models are unrelated.
Here is a simplified version:
num_classes = 2
num_epochs = 10
densenet = DenseNetConv()
# set densenet to eval to switch off batch norm and dropout layers and use ImageNet running means/ std devs
densenet.eval()
densenet.to(device)
classifier = nn.Linear(4416, num_classes)
classifier.to(device)
criterion = nn.CrossEntropyLoss().to(device)
# the optimizer is created after the classifier whose parameters it updates
optimizer = torch.optim.Adam(classifier.parameters(), lr=0.001)
for epoch in range(num_epochs):
    classifier.train()
    for i, (inputs_1, inputs_2, labels) in enumerate(dataloaders_dict['train']):
        inputs_1 = inputs_1.to(device)
        inputs_2 = inputs_2.to(device)
        labels = labels.to(device)
        features_1 = densenet(inputs_1) # extract features 1
        features_2 = densenet(inputs_2) # extract features 2
        combined = torch.cat([features_1, features_2], dim=1) # combine features
        combined = combined.view(-1, 4416) # reshape
        optimizer.zero_grad()
        # Forward pass to get output/logits
        outputs = classifier(combined)
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        _, pred = torch.max(outputs, 1)
        equality_check = (labels.data == pred)
        # Getting gradients w.r.t. parameters
        loss.backward()
        optimizer.step()
As you can see, I do not call with torch.no_grad(), despite having densenet.eval() as my separate feature extractor. Is there an issue with the way this is implemented or can I assume that this will not interfere with the classifier model?
If you are doing inference on a model, applying torch.no_grad() won't have any effect on the resulting output. As you've said, only nn.Module.eval() will, since it modifies how the forward operation is performed (namely, which statistics are used to normalize the batch elements).
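To illustrate the difference that eval() makes, here is a toy, framework-free sketch of batch normalization for a single feature. The class ToyBatchNorm is hypothetical and much simpler than the real PyTorch layers, but it follows the same idea: training mode normalizes with the statistics of the current batch and updates the running estimates, while eval mode uses the stored running estimates.

```python
class ToyBatchNorm:
    """Hypothetical 1-feature batch norm, for illustration only."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.running_mean = 0.0
        self.running_var = 1.0
        self.momentum = momentum
        self.eps = eps
        self.training = True

    def eval(self):
        self.training = False
        return self

    def __call__(self, batch):
        if self.training:
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # Update the running estimates with an exponential moving average.
            self.running_mean += self.momentum * (mean - self.running_mean)
            self.running_var += self.momentum * (var - self.running_var)
        else:
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / (var + self.eps) ** 0.5 for x in batch]


bn = ToyBatchNorm()
batch = [10.0, 12.0, 14.0]

train_out = bn(batch)          # normalized with this batch's own mean and variance
eval_out = bn.eval()(batch)    # normalized with the running estimates instead

print("train:", [round(v, 3) for v in train_out])
print("eval: ", [round(v, 3) for v in eval_out])
```

The same input gives different outputs in the two modes, which is why the mode, and not torch.no_grad(), determines the values the feature extractor produces.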
It is recommended to switch off gradient computation when backpropagation is not necessary. This avoids caching activations during the forward pass, which saves memory and speeds up inference.
In your case, you can either wrap your inference call on densenet with torch.no_grad:
with torch.no_grad():
    features_1 = densenet(inputs_1) # extract features 1
    features_2 = densenet(inputs_2) # extract features 2
Or alternatively, switch off the requires_grad flag on your module's parameter tensors using nn.Module.requires_grad_:
densenet.eval()
densenet.requires_grad_(False)
I need to create a custom training loop with Tensorflow / Keras (because I want to have more than one optimizer and tell which weights each optimizer should act upon).
Although this tutorial and that one too are quite clear regarding this matter, they miss a very important point: how do I predict for training phase and how do I predict for validation phase?
Suppose my model has Dropout layers, or BatchNormalization layers. They certainly work in a completely different way whether they are in training or validation.
How do I adapt these tutorials? This is a dummy example (may contain one or two pieces of pseudocode):
# Iterate over epochs.
for epoch in range(3):
    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        # persistent=True because tape.gradient() is called twice below
        with tf.GradientTape(persistent=True) as tape:
            #model with two outputs
            #IMPORTANT: must be in training phase (use dropouts, calculate batch statistics)
            logits1, logits2 = model(x_batch_train) #must be "training"
            loss_value1 = loss_fn1(y_batch_train[0], logits1)
            loss_value2 = loss_fn2(y_batch_train[1], logits2)
        grads1 = tape.gradient(loss_value1, model.trainable_weights[selection1])
        grads2 = tape.gradient(loss_value2, model.trainable_weights[selection2])
        optimizer1.apply_gradients(zip(grads1, model.trainable_weights[selection1]))
        optimizer2.apply_gradients(zip(grads2, model.trainable_weights[selection2]))
    # Run a validation loop at the end of each epoch.
    for x_batch_val, y_batch_val in val_dataset:
        ##Important: must be validation phase
        #dropouts are off: calculate all neurons and divide value
        #batch norms use previously calculated statistics
        val_logits1, val_logits2 = model(x_batch_val)
        #.... do the evaluations
I think you can just pass a training parameter when you call a tf.keras.Model, and it will be passed down to the layers:
# On training
logits1, logits2 = model(x_batch_train, training=True)
# On evaluation
val_logits1, val_logits2 = model(x_batch_val, training=False)
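What the training flag changes inside a layer can be sketched without TensorFlow. The function below is a hypothetical stand-in for a Dropout layer using the usual inverted-dropout formulation: during training, units are zeroed at random and the survivors are scaled up by 1 / (1 - rate); at inference the input passes through unchanged. The function name and the example values are made up for illustration.

```python
import random

def toy_dropout(values, rate, training, rng):
    """Hypothetical inverted dropout on a list of floats."""
    if not training:
        return list(values)                 # inference: identity, nothing is dropped
    keep = 1.0 - rate
    # Training: zero each unit with probability `rate` and rescale the
    # survivors so that the expected activation stays the same.
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]

print("training=True: ", toy_dropout(x, rate=0.5, training=True, rng=rng))
print("training=False:", toy_dropout(x, rate=0.5, training=False, rng=rng))
```

Calling the model with training=False therefore gives deterministic outputs, which is what a validation loop needs.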
What I do:
I am training a pre-trained CNN with Keras fit_generator(). This produces evaluation metrics (loss, acc, val_loss, val_acc) after each epoch. After training the model, I produce evaluation metrics (loss, acc) with evaluate_generator().
What I expect:
If I train the model for one epoch, I would expect that the metrics obtained with fit_generator() and evaluate_generator() are the same. They both should derive the metrics based on the entire dataset.
What I observe:
Both loss and acc differ between fit_generator() and evaluate_generator():
What I don't understand:
Why the accuracy from fit_generator() is different to that from evaluate_generator()
My code:
def generate_data(path, imagesize, nBatches):
    datagen = ImageDataGenerator(rescale=1./255)
    generator = datagen.flow_from_directory(
        directory=path, # path to the target directory
        target_size=(imagesize,imagesize), # dimensions to which all images found will be resized
        color_mode='rgb', # whether the images will be converted to have 1, 3, or 4 channels
        classes=None, # optional list of class subdirectories
        class_mode='categorical', # type of label arrays that are returned
        batch_size=nBatches, # size of the batches of data
        shuffle=True) # whether to shuffle the data
    return generator
[...]
def train_model(model, nBatches, nEpochs, trainGenerator, valGenerator, resultPath):
    history = model.fit_generator(
        generator=trainGenerator,
        steps_per_epoch=trainGenerator.samples//nBatches, # total number of steps (batches of samples)
        epochs=nEpochs, # number of epochs to train the model
        verbose=2, # verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch
        callbacks=None, # keras.callbacks.Callback instances to apply during training
        validation_data=valGenerator, # generator or tuple on which to evaluate the loss and any model metrics at the end of each epoch
        validation_steps=valGenerator.samples//nBatches, # number of steps (batches of samples) to yield from validation_data generator before stopping at the end of every epoch
        class_weight=None, # optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function
        max_queue_size=10, # maximum size for the generator queue
        workers=32, # maximum number of processes to spin up when using process-based threading
        use_multiprocessing=True, # whether to use process-based threading
        shuffle=False, # whether to shuffle the order of the batches at the beginning of each epoch
        initial_epoch=0) # epoch at which to start training
    print("%s: Model trained." % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
    # Save model
    modelPath = os.path.join(resultPath, datetime.now().strftime('%Y-%m-%d_%H-%M-%S') + '_modelArchitecture.h5')
    weightsPath = os.path.join(resultPath, datetime.now().strftime('%Y-%m-%d_%H-%M-%S') + '_modelWeights.h5')
    model.save(modelPath)
    model.save_weights(weightsPath)
    print("%s: Model saved." % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
    return history, model
[...]
def evaluate_model(model, generator):
    score = model.evaluate_generator(
        generator=generator, # Generator yielding tuples
        steps=generator.samples//nBatches) # number of steps (batches of samples) to yield from generator before stopping
    print("%s: Model evaluated:"
          "\n\t\t\t\t\t\t Loss: %.3f"
          "\n\t\t\t\t\t\t Accuracy: %.3f" %
          (datetime.now().strftime('%Y-%m-%d_%H-%M-%S'),
           score[0], score[1]))
[...]
def main():
    # Create model
    modelUntrained = create_model(imagesize, nBands, nClasses)
    # Prepare training and validation data
    trainGenerator = generate_data(imagePathTraining, imagesize, nBatches)
    valGenerator = generate_data(imagePathValidation, imagesize, nBatches)
    # Train and save model
    history, modelTrained = train_model(modelUntrained, nBatches, nEpochs, trainGenerator, valGenerator, resultPath)
    # Evaluate on validation data
    print("%s: Model evaluation (valX, valY):" % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
    evaluate_model(modelTrained, valGenerator)
    # Evaluate on training data
    print("%s: Model evaluation (trainX, trainY):" % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
    evaluate_model(modelTrained, trainGenerator)
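One detail in the code above that may contribute to the mismatch: samples // nBatches rounds down, so when the number of images is not a multiple of the batch size, a pass does not cover the whole dataset, and with shuffle=True a different subset can be covered each time. A small sketch with made-up numbers (1050 images, batch size 32):

```python
import math

samples = 1050      # made-up dataset size
batch_size = 32     # made-up batch size

steps_floor = samples // batch_size            # what the code above computes
steps_ceil = math.ceil(samples / batch_size)   # also covers the last partial batch

print("floor steps:", steps_floor, "-> images seen:", steps_floor * batch_size)
print("ceil steps: ", steps_ceil, "-> images seen:", min(steps_ceil * batch_size, samples))
```

Whether this matters in practice depends on the dataset size; it is one thing to rule out, not necessarily the cause.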
Update
I found some sites that report on this issue:
The Batch Normalization layer of Keras is broken
Strange behaviour of the loss function in keras model, with pretrained convolutional base
model.evaluate() gives a different loss on training data from the one in training process
Got different accuracy between history and evaluate
ResNet: 100% accuracy during training, but 33% prediction accuracy with the same data
I tried following some of their suggested solutions without success so far. acc and loss are still different from fit_generator() and evaluate_generator(), even when using the exact same data generated with the same generator for training and validation. Here is what I tried:
statically setting the learning_phase for the entire script or before adding new layers to the pre-trained ones
K.set_learning_phase(0) # testing
K.set_learning_phase(1) # training
unfreezing all batch normalization layers from the pre-trained model
for layer in model.layers:
    if layer.name.startswith('bn'):
        layer.trainable = True
not adding dropout or batch normalization as untrained layers
# Create pre-trained base model
basemodel = ResNet50(include_top=False, # exclude final pooling and fully connected layer in the original model
weights='imagenet', # pre-training on ImageNet
input_tensor=None, # optional tensor to use as image input for the model
input_shape=(imagesize, # shape tuple
imagesize,
nBands),
pooling=None, # output of the model will be the 4D tensor output of the last convolutional layer
classes=nClasses) # number of classes to classify images into
# Create new untrained layers
x = basemodel.output
x = GlobalAveragePooling2D()(x) # global spatial average pooling layer
x = Dense(1024, activation='relu')(x) # fully-connected layer
y = Dense(nClasses, activation='softmax')(x) # logistic layer making sure that probabilities sum up to 1
# Create model combining pre-trained base model and new untrained layers
model = Model(inputs=basemodel.input,
outputs=y)
# Freeze weights on pre-trained layers
for layer in basemodel.layers:
    layer.trainable = False
# Define learning optimizer
learningRate = 0.01
optimizerSGD = optimizers.SGD(lr=learningRate, # learning rate.
momentum=0.9, # parameter that accelerates SGD in the relevant direction and dampens oscillations
decay=learningRate/nEpochs, # learning rate decay over each update
nesterov=True) # whether to apply Nesterov momentum
# Compile model
model.compile(optimizer=optimizerSGD, # stochastic gradient descent optimizer
loss='categorical_crossentropy', # objective function
metrics=['accuracy'], # metrics to be evaluated by the model during training and testing
loss_weights=None, # scalar coefficients to weight the loss contributions of different model outputs
sample_weight_mode=None, # sample-wise weights
weighted_metrics=None, # metrics to be evaluated and weighted by sample_weight or class_weight during training and testing
target_tensors=None) # tensor model's target, which will be fed with the target data during training
using different pre-trained CNNs as base model (VGG19, InceptionV3, InceptionResNetV2, Xception)
from keras.applications.vgg19 import VGG19
basemodel = VGG19(include_top=False, # exclude final pooling and fully connected layer in the original model
weights='imagenet', # pre-training on ImageNet
input_tensor=None, # optional tensor to use as image input for the model
input_shape=(imagesize, # shape tuple
imagesize,
nBands),
pooling=None, # output of the model will be the 4D tensor output of the last convolutional layer
classes=nClasses) # number of classes to classify images into
Please let me know if there are other solutions around that I am missing.
I have now managed to get the same evaluation metrics. I changed the following:
I set seed in flow_from_directory() as suggested by @Anakin
def generate_data(path, imagesize, nBatches):
    datagen = ImageDataGenerator(rescale=1./255)
    generator = datagen.flow_from_directory(
        directory=path, # path to the target directory
        target_size=(imagesize,imagesize), # dimensions to which all images found will be resized
        color_mode='rgb', # whether the images will be converted to have 1, 3, or 4 channels
        classes=None, # optional list of class subdirectories
        class_mode='categorical', # type of label arrays that are returned
        batch_size=nBatches, # size of the batches of data
        shuffle=True, # whether to shuffle the data
        seed=42) # random seed for shuffling and transformations
    return generator
I set use_multiprocessing=False in fit_generator() according to the warning: use_multiprocessing=True and multiple workers may duplicate your data
history = model.fit_generator(generator=trainGenerator,
steps_per_epoch=trainGenerator.samples//nBatches, # total number of steps (batches of samples)
epochs=nEpochs, # number of epochs to train the model
verbose=2, # verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch
callbacks=callback, # keras.callbacks.Callback instances to apply during training
validation_data=valGenerator, # generator or tuple on which to evaluate the loss and any model metrics at the end of each epoch
validation_steps=
valGenerator.samples//nBatches, # number of steps (batches of samples) to yield from validation_data generator before stopping at the end of every epoch
class_weight=None, # optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function
max_queue_size=10, # maximum size for the generator queue
workers=1, # maximum number of processes to spin up when using process-based threading
use_multiprocessing=False, # whether to use process-based threading
shuffle=False, # whether to shuffle the order of the batches at the beginning of each epoch
initial_epoch=0) # epoch at which to start training
I unified my python setup as suggested in the keras documentation on how to obtain reproducible results using Keras during development
import numpy as np
import tensorflow as tf
import random as rn
from keras import backend as K
np.random.seed(42)
rn.seed(12345)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
inter_op_parallelism_threads=1)
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
Instead of rescaling input images with datagen = ImageDataGenerator(rescale=1./255), I now generate my data with:
from keras.applications.resnet50 import preprocess_input
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
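The two preprocessing options feed very different values to the network. Here is a minimal sketch for a single pixel, assuming the "caffe"-style mode that, as far as I know, keras.applications uses for ResNet50 (RGB to BGR, then subtraction of the ImageNet channel means, no rescaling). The helper names are made up for illustration and are not part of Keras.

```python
# ImageNet channel means used by the "caffe"-style preprocessing, in BGR order.
BGR_MEANS = (103.939, 116.779, 123.68)

def rescale_pixel(rgb):
    """What ImageDataGenerator(rescale=1./255) does to one pixel."""
    return tuple(c / 255.0 for c in rgb)

def caffe_style_pixel(rgb):
    """Sketch of ResNet50-style preprocessing for one pixel:
    reorder RGB -> BGR, then subtract the per-channel mean (no scaling)."""
    r, g, b = rgb
    return tuple(c - m for c, m in zip((b, g, r), BGR_MEANS))

pixel = (255, 128, 0)   # an arbitrary orange RGB pixel
print("rescaled:   ", rescale_pixel(pixel))
print("caffe-style:", caffe_style_pixel(pixel))
```

Pretrained weights expect inputs in the range and channel order they were trained with, so feeding [0, 1] values to a network trained on mean-subtracted BGR inputs can degrade the frozen features.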
With this, I managed to get similar accuracy and loss from fit_generator() and evaluate_generator(). Also, using the same data for training and testing now results in similar metrics. Reasons for the remaining differences are provided in the keras documentation.
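One of those reasons can be sketched in a few lines: the training loss reported by fit_generator() is the mean over the batches of the epoch, computed while the weights are still changing, whereas evaluate_generator() uses the weights as they are at the end. The toy "model" below is hypothetical: a single weight w fitted to a target of 3.0 by gradient descent on a squared error, with a made-up learning rate.

```python
target = 3.0
w = 0.0            # single trainable weight
lr = 0.2           # made-up learning rate
n_batches = 10

def loss(w):
    return (w - target) ** 2

batch_losses = []
for _ in range(n_batches):
    batch_losses.append(loss(w))      # loss measured BEFORE this batch's update
    w -= lr * 2.0 * (w - target)      # gradient step on the squared error

reported_training_loss = sum(batch_losses) / n_batches   # mean over the epoch
loss_after_epoch = loss(w)                                # what an evaluation pass sees

print("mean loss during epoch: ", reported_training_loss)
print("loss with final weights:", loss_after_epoch)
```

Even on identical data, the epoch mean is larger than the loss of the final weights, so some gap between the two numbers is expected.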
Setting use_multiprocessing=False at the fit_generator level fixes the problem, BUT at the cost of slowing down training significantly. A better but still imperfect workaround would be to set use_multiprocessing=False for only the validation generator, as in the code below, modified from keras' fit_generator function.
...
try:
    if do_validation:
        if val_gen and workers > 0:
            # Create an Enqueuer that can be reused
            val_data = validation_data
            if isinstance(val_data, Sequence):
                val_enqueuer = OrderedEnqueuer(val_data,
                                               use_multiprocessing=False)  # modified line
                validation_steps = len(val_data)
            else:
                val_enqueuer = GeneratorEnqueuer(val_data,
                                                 use_multiprocessing=False)  # modified line
            val_enqueuer.start(workers=workers,
                               max_queue_size=max_queue_size)
            val_enqueuer_gen = val_enqueuer.get()
...
Training for one epoch might not be informative enough in this case. Also, your train and test data may not be exactly the same, since you are not passing a random seed to the flow_from_directory method. Have a look here.
Maybe, you can set a seed, remove augmentations (if any) and save trained model weights to load them later to check.
I am experimenting with TensorFlow 2.0 (alpha). I want to implement a simple feed forward Network with two output nodes for binary classification (it's a 2.0 version of this model).
This is a simplified version of the script. After I defined a simple Sequential() model, I set:
# import layers + dropout & activation
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.activations import elu, softmax
# Neural Network Architecture
n_input = X_train.shape[1]
n_hidden1 = 15
n_hidden2 = 10
n_output = y_train.shape[1]
model = tf.keras.models.Sequential([
Dense(n_input, input_shape = (n_input,), activation = elu), # Input layer
Dropout(0.2),
Dense(n_hidden1, activation = elu), # hidden layer 1
Dropout(0.2),
Dense(n_hidden2, activation = elu), # hidden layer 2
Dropout(0.2),
Dense(n_output, activation = softmax) # Output layer
])
# define loss and accuracy
bce_loss = tf.keras.losses.BinaryCrossentropy()
accuracy = tf.keras.metrics.BinaryAccuracy()
# define optimizer
optimizer = tf.optimizers.Adam(learning_rate = 0.001)
# save training progress in lists
loss_history = []
accuracy_history = []
# loop over 1000 epochs
for epoch in range(1000):
    with tf.GradientTape() as tape:
        # take binary cross-entropy (bce_loss)
        current_loss = bce_loss(model(X_train), y_train)
    # Update weights based on the gradient of the loss function
    gradients = tape.gradient(current_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    # save in history vectors
    current_loss = current_loss.numpy()
    loss_history.append(current_loss)
    accuracy.update_state(model(X_train), y_train)
    current_accuracy = accuracy.result().numpy()
    accuracy_history.append(current_accuracy)
    # print loss and accuracy scores each 100 epochs
    if (epoch+1) % 100 == 0:
        print(str(epoch+1) + '.\tTrain Loss: ' + str(current_loss) + ',\tAccuracy: ' + str(current_accuracy))
    accuracy.reset_states()
print('\nTraining complete.')
Training goes without errors, however strange things happen:
Sometimes, the Network doesn't learn anything. All loss and accuracy scores are constant throughout all the epochs.
Other times, the network is learning, but very badly. Accuracy never went beyond 0.4 (while in TensorFlow 1.x I effortlessly got 0.95+). Such low performance suggests to me that something went wrong in the training.
Other times, the accuracy is very slowly improving, while the loss remains constant all the time.
What can cause these problems? Please help me understand my mistakes.
UPDATE:
After some corrections, I can make the Network learn. However, its performance is extremely poor. After 1000 epochs, it reaches about 40% accuracy, which clearly means something is still wrong. Any help is appreciated.
tf.GradientTape records every operation that is executed inside its scope.
You don't want the tape to record the gradient calculation itself; inside the scope you only want the forward computation of the loss.
with tf.GradientTape() as tape:
    # take binary cross-entropy (bce_loss); Keras losses expect (y_true, y_pred)
    current_loss = bce_loss(classification, model(df))
# End of tape scope
# Update weights based on the gradient of the loss function
gradients = tape.gradient(current_loss, model.trainable_variables)
# The tape is now consumed
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
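A side note on argument order: Keras loss objects are called as loss(y_true, y_pred), and binary cross-entropy is not symmetric in its arguments. The plain-Python function below is a hypothetical re-implementation for illustration only (the clipping constant is an assumption, not necessarily what Keras uses). It shows how much the value changes when labels and predictions are swapped.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over two equally long lists."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)     # clip to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1.0, 0.0, 1.0, 0.0]
probs = [0.9, 0.2, 0.7, 0.4]    # made-up model outputs

correct_order = binary_crossentropy(labels, probs)
swapped_order = binary_crossentropy(probs, labels)

print("loss(y_true, y_pred):", correct_order)
print("swapped arguments:   ", swapped_order)
```

With swapped arguments, the hard 0/1 labels are treated as predictions and get clipped, so the loss and its gradients no longer reflect the quality of the model's outputs.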
More importantly, I don't see the loop on the training set, therefore I suppose the complete code looks like:
for epoch in range(n_epochs):
    for df, classification in dataset:
        # your code that computes loss and trains
Moreover, the usage of the metrics is wrong.
You want to accumulate, thus update the internal state of the accuracy operation, at every training step and measure the overall accuracy at the end of every epoch.
Thus you have to:
# Measure the accuracy inside the training loop; Keras metrics expect (y_true, y_pred)
accuracy.update_state(classification, model(df))
Call accuracy.result() only at the end of the epoch, once all the accuracy values have been accumulated in the metric.
Remember to call the .reset_states() method at the end of every epoch to clear the metric's state and reset it to zero.
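The update / result / reset pattern can be sketched without TensorFlow. ToyBinaryAccuracy below is a hypothetical stand-in for a stateful Keras metric, and the batches are made-up data; the point is only the order of the three calls.

```python
class ToyBinaryAccuracy:
    """Hypothetical stand-in for a stateful Keras metric."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.reset_states()

    def reset_states(self):
        self.correct = 0
        self.count = 0

    def update_state(self, y_true, y_pred):
        # Accumulate counts only; nothing is averaged yet.
        for t, p in zip(y_true, y_pred):
            self.correct += int((p > self.threshold) == bool(t))
            self.count += 1

    def result(self):
        return self.correct / self.count if self.count else 0.0


accuracy = ToyBinaryAccuracy()
# Two made-up epochs, each a list of (labels, predictions) batches.
epochs = [
    [([1, 0], [0.4, 0.6]), ([1, 1], [0.7, 0.3])],   # 1 of 4 correct
    [([1, 0], [0.8, 0.1]), ([1, 1], [0.9, 0.6])],   # 4 of 4 correct
]

history = []
for batches in epochs:
    for y_true, y_pred in batches:
        accuracy.update_state(y_true, y_pred)   # at every training step
    history.append(accuracy.result())           # once, at the end of the epoch
    accuracy.reset_states()                     # start the next epoch from zero

print(history)
```

Without the reset, the second value would mix in the counts from the first epoch and understate the current accuracy.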
I am working on texture classification and based on previous works, I am trying to modify the final layer of AlexNET to have 20 classes, and train only that layer for my multi class classification problem.
I am using Tensorflow-GPU on an NVIDIA GTX 1080, Python3.6 on Ubuntu 16.04.
I am using the Gradient Descent Optimiser and the class Estimator to build this. I am also using two dropout layers for regularization. Therefore, my hyperparameters are the learning rate, batch_size, and weight_decay. I have tried batch sizes of 50, 100 and 200, weight decays of 0.005 and 0.0005, and learning rates of 1e-3, 1e-4 and 1e-5. All the training loss curves for the above values follow similar trends.
My training loss curve does not monotonically decrease and instead seems to oscillate. I have provided a tensorboard visualization for learning rate=1e-5, weight decay=0.0005, and batch_size=200.
Please assist in understanding what went wrong and how I could possibly rectify it.
The Tensorboard Visualization for the case I specified
# Create the Estimator
classifier = tf.estimator.Estimator(model_fn=cnn_model)
# Set up logging for predictions
tensors_to_log = {"probabilities": "softmax_tensor"}
logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=10)
# Train the model
train_input_fn = tf.estimator.inputs.numpy_input_fn(x={"x": train_data},y=train_labels,batch_size=batch_size,num_epochs=None,shuffle=True)
classifier.train(input_fn=train_input_fn, steps=200000, hooks=[logging_hook])
# Evaluate the model and print results
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
x={"x": eval_data},
y=eval_labels,
num_epochs=1,
shuffle=False)
eval_results = classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)
#Sections of the cnn_model
#Output Config
predictions = { "classes": tf.argmax(input=logits, axis=1),# Generate predictions (for PREDICT and EVAL mode)
"probabilities": tf.nn.softmax(logits, name="softmax_tensor")} # Add `softmax_tensor` to the graph. It is used for PREDICT and by the `logging_hook`.
if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
# Calculate Loss (for both TRAIN and EVAL modes)
onehot_labels = tf.one_hot(indices=tf.cast(labels,tf.int32),depth=20)
loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
#Training Config
if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    tf.summary.scalar('training_loss',loss)
    summary_hook = tf.train.SummarySaverHook(save_steps=10,output_dir='outputs',summary_op=tf.summary.merge_all())
    train_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op,training_hooks=[summary_hook])
# Evaluation Metric- Accuracy
eval_metric_ops = {"accuracy": tf.metrics.accuracy(labels=labels, predictions=predictions["classes"])}
print(time.time()-t)
tf.summary.scalar('eval_loss',loss)
ac=tf.metrics.accuracy(labels=labels,predictions=predictions["classes"])
tf.summary.scalar('eval_accuracy',ac)
evaluation_hook= tf.train.SummarySaverHook(save_steps=10,output_dir='outputseval',summary_op=tf.summary.merge_all())
return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops,evaluation_hooks=[evaluation_hook])
Are you selecting your mini-batches randomly? It looks like you have a high variance across your mini-batches which leads to a high variance of the loss at different iterations.
I assume the x-axis in your plot shows iterations and not epochs, and that the training data provided every ~160 iterations differs in difficulty from the rest, which leads to the periodic pattern in your loss curve. How does your validation loss behave?
Possible solutions/ideas:
Try randomizing your training data selection in a better way
Check your training data for mislabeled examples
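The first idea can be sketched in plain Python. The dataset below is made up (20 samples stored class by class) and the helper names are hypothetical; it only shows how batches drawn in storage order can each contain a single class, while batches drawn after a shuffle are free to mix classes.

```python
import random

def make_batches(items, batch_size):
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def class_counts(batch):
    counts = {}
    for _, label in batch:
        counts[label] = counts.get(label, 0) + 1
    return counts

# Made-up dataset stored class by class: 10 samples of class 0, then 10 of class 1.
dataset = [(i, 0) for i in range(10)] + [(i, 1) for i in range(10, 20)]

ordered_batches = make_batches(dataset, batch_size=5)

shuffled = dataset[:]
random.Random(0).shuffle(shuffled)       # in training, reshuffle before every epoch
shuffled_batches = make_batches(shuffled, batch_size=5)

print("ordered: ", [class_counts(b) for b in ordered_batches])
print("shuffled:", [class_counts(b) for b in shuffled_batches])
```

If the per-batch class counts of the real input pipeline look like the ordered case, the loss will swing with whatever class happens to fill the current batch.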