Caffe - inconsistency in the activation feature values - GPU mode - python

Hi, I am using Caffe on Ubuntu 14.04 with:
CUDA version 7.0 (latest)
cuDNN version 2 (latest)
GPU: NVIDIA GT 730
In Caffe I first run the initialization and then load the ImageNet model (AlexNet). I also enable the GPU using set_mode_gpu().
After that I take an image, copy it into the Caffe input blob, and perform a forward pass for that image using: net.forward(end='fc7')
Then I extract the 4096-dimensional fc7 output (the activation features of the fc7 layer).
The problem I am facing is that when I run the same code multiple times, I obtain a different result each time. That is, in GPU mode the activation features differ for the same image on every run. A forward pass through the network is supposed to be deterministic, right? So I should get the same output every time for the same image.
On the other hand, when I run Caffe on the CPU using set_mode_cpu(), everything works perfectly, i.e., I get the same output each time.
The code used and the outputs obtained are shown below. I am not able to understand what the problem is. Is it caused by GPU round-off? But the errors are very large. Is it due to some issue with the latest cuDNN version? Or is it something else altogether?
Following is the CODE
1) IMPORT libraries
from cStringIO import StringIO
import numpy as np
import scipy.ndimage as nd
import PIL.Image
from IPython.display import clear_output, Image, display
from google.protobuf import text_format
import scipy
import matplotlib.pyplot as plt
import caffe
2) IMPORT Caffe Models and define utility functions
model_path = '../../../caffe/models/bvlc_alexnet/'
net_fn = model_path + 'deploy.prototxt'
param_fn = model_path + 'bvlc_reference_caffenet.caffemodel'
model = caffe.io.caffe_pb2.NetParameter()
text_format.Merge(open(net_fn).read(), model)
model.force_backward = True
open('tmp.prototxt', 'w').write(str(model))
net = caffe.Classifier('tmp.prototxt', param_fn,
                       mean=np.float32([104.0, 116.0, 122.0]),  # ImageNet mean, training set dependent
                       channel_swap=(2, 1, 0),  # the reference model has channels in BGR order instead of RGB
                       image_dims=(227, 227))
caffe.set_mode_gpu()
# caffe.set_mode_cpu()
# a couple of utility functions for converting to and from Caffe's input image layout
def preprocess(net, img):
    return np.float32(np.rollaxis(img, 2)[::-1]) - net.transformer.mean['data']

def deprocess(net, img):
    return np.dstack((img + net.transformer.mean['data'])[::-1])
3) LOADING Image and setting constants
target_img = PIL.Image.open('alpha.jpg')
target_img = target_img.resize((227,227), PIL.Image.ANTIALIAS)
target_img = np.float32(target_img)
target_img = preprocess(net, target_img)
end='fc7'
4) Setting the source image and making the forward pass to obtain fc7 activation features
src = net.blobs['data']
src.reshape(1,3,227,227) # resize the network's input image size
src.data[0] = target_img
dst = net.blobs[end]
net.forward(end=end)
target_data = dst.data[0]
print dst.data
FOLLOWING is the output I obtained for 'print dst.data' when I ran the above code multiple times:
output on 1st execution of code
[[-2.22313166 -1.66219997 -1.67641115 ..., -3.62765646 -2.78621101
-5.06158161]]
output on 2nd execution of code
[[ -82.72431946 -372.29296875 -160.5559845 ..., -367.49728394 -138.7151947
-343.32080078]]
output on 3rd execution of code
[[-10986.42578125 -10910.08105469 -10492.50390625 ..., -8597.87011719
-5846.95898438 -7881.21923828]]
output on 4th execution of code
[[-137360.3125 -130303.53125 -102538.78125 ..., -40479.59765625
-5832.90869141 -1391.91259766]]
The output values keep growing larger and larger and then become smaller again after some time. I am not able to understand the issue.

Switch your network to test mode to prevent the effect of dropout, which is non-deterministic and needed only during training.
Add the following line right after initializing your network:
net.set_phase_test()
That way you'll always get the same results.
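For reference, a minimal sketch of where that line would go in the code from the question (using the same net variable as above):
net = caffe.Classifier('tmp.prototxt', param_fn,
                       mean=np.float32([104.0, 116.0, 122.0]),
                       channel_swap=(2, 1, 0),
                       image_dims=(227, 227))
net.set_phase_test()  # fix the phase to TEST so dropout is disabled and forward passes are deterministic
caffe.set_mode_gpu()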
Soner

Related

fastai error predicting with exported/reloaded model: "Input type and weight type should be the same"

Whenever I export a fastai model and reload it, I get this error (or a very similar one) when I try to use the reloaded model to generate predictions on a new test set:
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
A minimal reproducible code example is below; you just need to update the FILES_DIR variable to wherever the MNIST data gets deposited on your system:
from fastai import *
from fastai.vision import *
# download data for reproducible example
untar_data(URLs.MNIST_SAMPLE)
FILES_DIR = '/home/mepstein/.fastai/data/mnist_sample' # this is where command above deposits the MNIST data for me
# Create FastAI databunch for model training
tfms = get_transforms()
tr_val_databunch = ImageDataBunch.from_folder(path=FILES_DIR,  # location of downloaded data shown in log of prev command
                                              train='train',
                                              valid_pct=0.2,
                                              ds_tfms=tfms).normalize()
# Create Model
conv_learner = cnn_learner(tr_val_databunch,
                           models.resnet34,
                           metrics=[error_rate]).to_fp16()
# Train Model
conv_learner.fit_one_cycle(4)
# Export Model
conv_learner.export() # saves model as 'export.pkl' in path associated with the learner
# Reload Model and use it for inference on new hold-out set
reloaded_model = load_learner(path=FILES_DIR,
                              test=ImageList.from_folder(path=f'{FILES_DIR}/valid'))
preds = reloaded_model.get_preds(ds_type=DatasetType.Test)
Output:
"RuntimeError: Input type (torch.cuda.FloatTensor) and weight type
(torch.cuda.HalfTensor) should be the same"
Stepping through the code statement by statement, everything works fine until the last line, preds = ..., which is where the torch error above pops up.
Relevant software versions:
Python 3.7.3
fastai 1.0.57
torch 1.2.0
torchvision 0.4.0
So the answer to this ended up being relatively simple:
1) As noted in my comment, training in mixed-precision mode (setting the learner to fp16 with to_fp16()) caused the error with the exported/reloaded model.
2) To train in mixed-precision mode (which is faster than regular training) and still export/reload the model without errors, simply set the model back to default precision before exporting.
...In code, simply changing the example above:
# Export Model
conv_learner.export()
to:
# Export Model (after converting back to default precision for safe export/reload)
conv_learner = conv_learner.to_fp32()
conv_learner.export()
...and now the full (reproducible) code example above runs without errors, including the prediction after model reload.
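If you still want the speed of mixed precision for any further training after the export, you can simply flip the learner back afterwards (a small sketch reusing the learner from above):
# Export at default precision, then return to fp16 for continued training
conv_learner = conv_learner.to_fp32()
conv_learner.export()
conv_learner = conv_learner.to_fp16()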
Your model is in half precision if you called .to_fp16(), which is the same as calling model.half() in PyTorch. In fact, if you trace the code, .to_fp16() calls model.half().
But there is a problem: if you also convert the batch norm layers to half precision, you may run into convergence problems.
This is why you would typically do this in PyTorch:
model.half()  # convert the whole model to half precision
for layer in model.modules():
    if isinstance(layer, torch.nn.modules.batchnorm._BatchNorm):
        layer.float()  # keep batch norm layers in full precision
This converts every layer to half precision except batch norm.
Note that the code from the PyTorch forum is also fine, but it only covers nn.BatchNorm2d.
Then make sure your input is in half precision as well, using to(), like this:
import torch
t = torch.tensor(10.)
print(t)
print(t.dtype)
t=t.to(dtype=torch.float16)
print(t)
print(t.dtype)
# tensor(10.)
# torch.float32
# tensor(10., dtype=torch.float16)
# torch.float16
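Putting the two pieces together, here is a minimal end-to-end sketch of the recipe above (the resnet18 model and the random input are only illustrative, and a CUDA device is assumed, since half precision is a GPU feature):
import torch
import torchvision.models as models

model = models.resnet18().cuda()
model.half()  # convert all weights to half precision
for layer in model.modules():
    if isinstance(layer, torch.nn.modules.batchnorm._BatchNorm):
        layer.float()  # keep batch norm in full precision for stable statistics

x = torch.randn(1, 3, 224, 224, device='cuda').half()  # input dtype must match the weights
with torch.no_grad():
    out = model(x)
print(out.dtype)  # torch.float16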

Different spectrogram between audio_ops and tf.contrib.signal

I am trying to update the feature extraction pipeline of a speech command recognition model, replacing the function audio_ops.audio_spectrogram() with tf.contrib.signal.stft(). I assumed they were equivalent, but I am obtaining different spectrogram values for the same input audio. Could someone explain the relation between the two methods, or whether it is possible to obtain the same results using tf.contrib.signal.stft()?
My code:
1) audio_ops method:
from tensorflow.contrib.framework.python.ops import audio_ops
import tensorflow as tf
import numpy as np
from tensorflow.python.ops import io_ops
#WAV audio loader
wav_filename_placeholder_ = tf.placeholder(tf.string, [], name='wav_filename')
wav_loader = io_ops.read_file(wav_filename_placeholder_)
sample_rate = 16000
desired_samples = 16000 #1 sec audio
wav_decoder = audio_ops.decode_wav(wav_loader, desired_channels=1, desired_samples=desired_samples)
# Computing the spectrogram
spectrogram = audio_ops.audio_spectrogram(wav_decoder.audio,
                                          window_size=320,
                                          stride=160,
                                          magnitude_squared=False)

with tf.Session() as sess:
    feed_dict = {wav_filename_placeholder_: "/<folder_path>/audio_sample.wav"}
    # Get the input audio and the spectrogram
    audio_ops_wav_decoder_audio, audio_ops_spectrogram = sess.run([wav_decoder.audio, spectrogram], feed_dict)
2) tf.contrib.signal method:
#Input WAV audio (will be initialized with the same audio signal: wav_decoder.audio )
signals = tf.placeholder(tf.float32, [None, None])
#Compute the spectrograms and get the absolute values
stfts = tf.contrib.signal.stft(signals,
                               frame_length=320,
                               frame_step=160,
                               fft_length=512,
                               window_fn=None)
magnitude_spectrograms = tf.abs(stfts)

with tf.Session() as sess:
    feed_dict = {signals: audio_ops_wav_decoder_audio.reshape(1, 16000)}
    tf_original, tf_stfts, tf_spectrogram = sess.run([signals, stfts, magnitude_spectrograms], feed_dict)
Thank you in advance
I found these helpful comments on GitHub that discuss the differences:
https://github.com/tensorflow/tensorflow/issues/11339#issuecomment-345741527
https://github.com/tensorflow/tensorflow/issues/11339#issuecomment-443553788
You can think of audio_ops.audio_spectrogram and audio_ops.mfcc as
"fused" ops (like fused batch-norm or fused LSTM cells that TensorFlow
has) for the ops in tf.contrib.signal. I think the original motivation
of them was that a fused op makes it easier to provide mobile support.
I think long term it would be nice if we removed them and provided
automatic fusing via XLA, or unified the API to match
tf.contrib.signal API, and provided fused keyword arguments to
tf.contrib.signal functions, like we do for
tf.layers.batch_normalization.
audio_spectrogram is a C++ implementation of an STFT, while
tf.signal.stft uses TensorFlow ops to compute the STFT (and thus has
CPU, GPU and TPU support).
The main cause of difference between them is that audio_spectrogram
uses fft2d to compute FFTs while tf.contrib.signal.stft uses Eigen
(CPU), cuFFT (GPU), and XLA (TPU). There is another very minor
difference, which is that the default periodic Hann window used by
each is slightly different. tf.contrib.signal.stft follows
numpy/scipy's definition.
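Building on those comments, one concrete thing to check in the snippet from the question is the window: audio_spectrogram applies a Hann window internally, while window_fn=None disables windowing in tf.contrib.signal.stft altogether. Here is a sketch of how you might bring the two closer and quantify the remaining gap, reusing the tensors defined above (exact equality is not expected, per the comments):
import functools
import numpy as np

# Use a periodic Hann window, as audio_spectrogram does internally
stfts = tf.contrib.signal.stft(signals,
                               frame_length=320,
                               frame_step=160,
                               fft_length=512,
                               window_fn=functools.partial(tf.contrib.signal.hann_window, periodic=True))
magnitude_spectrograms = tf.abs(stfts)

with tf.Session() as sess:
    feed_dict = {signals: audio_ops_wav_decoder_audio.reshape(1, 16000)}
    tf_spectrogram = sess.run(magnitude_spectrograms, feed_dict)

# Both arrays should now have shape (1, 99, 257); inspect the residual difference
print(np.max(np.abs(audio_ops_spectrogram - tf_spectrogram)))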

Variable size inputs for CNTK in Keras

I want to feed a CNN with images of different resolutions using Keras, so I defined the input layer shape as (None,None,3), since the images have 3 channels. My problem is that this works well on TensorFlow, but gives an error on CNTK (and I must use CNTK).
The following python code illustrates my problem:
import numpy as np
from keras.models import Model
from keras.layers import Conv2D, Input
input_layer = Input(shape=(None, None, 3), name='input')
x = Conv2D(16, 3)(input_layer)
x = Conv2D(16, 3)(x)
model = Model(input=input_layer, output=x)
model.compile('adam', 'mse')
X = np.random.random((1, 32, 32, 3))
Y = model.predict(X)
print Y.shape
If I run it using Keras+TensorFlow it executes nicely; however, changing the Keras backend to CNTK gives the error:
ValueError: Convolution operation requires that kernel dim 3 <= input dim 1.
As far as I could find on the internet, this problem should have been fixed from CNTK 2.2 onwards; however, I'm using CNTK 2.5. Any ideas on how I can overcome this issue?

Keras with CNTK backend: Writing custom layers

I'm trying to write a custom layer in Keras to replicate a particular architecture proposed in a paper. The layer has no trainable weights. I believe this might be relevant, since it may not be necessary to extend the Layer class.
I'm using the CNTK backend, but I'm trying to keep the code as backend-agnostic as possible, so I'm relying on the interfaces defined in keras.backend, instead of directly using CNTK.
Right now I'm just trying to get a small example to work. The example is as follows:
import numpy as np
from scipy.misc import imread
from keras import backend as K
im = imread('test.bmp')
#I'm extending a grayscale image to behave as a color image
ex_im = np.empty([im.shape[0],im.shape[1],3])
ex_im[:,:,0] = im
ex_im[:,:,1] = im
ex_im[:,:,2] = im
conv_filter = K.ones([3,3,ex_im.shape[2],ex_im.shape[2]])
x = K.conv2d(ex_im,conv_filter,padding='same')
This code, however, results in the following error:
RuntimeError: Convolution currently requires the main operand to have
dynamic axes
CNTK requires the input to the convolution to have dynamic axes, otherwise it would interpret the first dimension of the input as the batch size. So I tried to make the axes dynamic with placeholders (the only way I could find of doing so):
import numpy as np
from scipy.misc import imread
from keras import backend as K
im = imread('test.bmp')
ex_im = np.empty([1,im.shape[0],im.shape[1],3])
ex_im[0,:,:,0] = im
ex_im[0,:,:,1] = im
ex_im[0,:,:,2] = im
place = K.placeholder(shape=((None,) + ex_im.shape[1:]))
conv_filter = K.ones([3,3,ex_im.shape[3],ex_im.shape[3]])
x = K.conv2d(place,conv_filter,padding='same')
The image is now an array of images, with what is basically a batch size of 1.
This works correctly. However, I can't figure out how to feed an input to the placeholder in order to test my code. eval() doesn't take any arguments, and there doesn't seem to be a way to pass the input as an argument to the evaluation.
Is there a way to do this without placeholders? Or a way to feed the inputs to the placeholder? Am I doing something fundamentally wrong and should be following another path?
I should add that I really want to avoid being locked in to a backend, so any solutions should be backend-agnostic.
When using custom layers, you don't define tensors yourself; let Keras do it for you. Just create the layer, and whatever is given to the layer will already be a proper tensor:
import numpy as np
from keras import backend as K
from keras.layers import Input, Lambda
from keras.models import Model

images = np.ones((1, 50, 50, 3))

def myFunc(x):
    conv_filter = K.ones([3, 3, 3, 3])
    return K.conv2d(x, conv_filter, padding='same')

inp = Input((50, 50, 3))
out = Lambda(myFunc, output_shape=(50, 50, 3))(inp)
model = Model(inp, out)
print(model.predict(images))
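If you'd prefer a reusable layer over a Lambda, the same idea can also be written as a minimal Layer subclass with no trainable weights. This is only a sketch against the Keras 2 API of that era (the FixedConv name and the all-ones 3x3 filter are illustrative):
import numpy as np
from keras import backend as K
from keras.engine.topology import Layer
from keras.layers import Input
from keras.models import Model

class FixedConv(Layer):
    # applies a fixed (non-trainable) 3x3 all-ones convolution
    def build(self, input_shape):
        channels = input_shape[-1]
        self.conv_filter = K.ones([3, 3, channels, channels])
        super(FixedConv, self).build(input_shape)

    def call(self, x):
        return K.conv2d(x, self.conv_filter, padding='same')

    def compute_output_shape(self, input_shape):
        return input_shape

inp = Input((50, 50, 3))
out = FixedConv()(inp)
model = Model(inp, out)
print(model.predict(np.ones((1, 50, 50, 3))))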

How to get a layer from a caffe model using torch

In Python, when I want to get the data from a layer using Caffe, I have the following code:
input_image = caffe.io.load_image(imgName)
input_oversampled = caffe.io.resize_image(input_image, self.net.crop_dims)
prediction = self.net.predict([input_image])
caffe_input = np.asarray(self.net.preprocess('data', prediction))
self.net.forward(data=caffe_input)
data = self.net.blobs['fc7'].data[4]  # I want to get this value in Lua
However, when I'm using Torch I'm a bit stuck, since I don't know how to perform the same action.
Currently I have the following code
require 'caffe'
require 'image'
net = caffe.Net('/opt/caffe/models/bvlc_reference_caffenet/deploy.prototxt', '/opt/caffe/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')
img = image.lena()
dest = torch.Tensor(3, 227,227)
img = image.scale(dest, img)
img = img:resize(10,3,227,227)
output = net:forward(img:float())
conv_nodes = net:findModules('fc7') -- not working
Any help would be appreciated
First of all, please note that torch-caffe-binding (i.e. the tool you use with require 'caffe') is a direct wrapper around the Caffe library, thanks to LuaJIT FFI.
This means that it lets you conveniently run a forward or backward pass with a Torch tensor, but behind the scenes these operations are performed on a caffe::Net and not on a Torch nn network.
So if you want to manipulate a plain Torch network, what you should use instead is the loadcaffe library, which fully converts the network into an nn.Sequential:
require 'loadcaffe'
local net = loadcaffe.load('net.prototxt', 'net.caffemodel')
Then you can use findModules. However, please note that you can no longer use the layers' original names (like conv1 or fc7), as they are discarded by the conversion.
Here fc7 (an INNER_PRODUCT layer) corresponds to the second-to-last linear transformation, so you can get it as follows:
local nodes = net:findModules('nn.Linear')
local fc7 = nodes[#nodes-1]
Then you can read the data (weights and biases) via fc7.weight and fc7.bias - these are regular torch.Tensors.
UPDATE
As of commit 2516fac loadcaffe now saves layer names in addition. So to retrieve the 'fc7' layer you can now do something like:
local fc7
for _,m in pairs(net:listModules()) do
if m.name == 'fc7' then
fc7 = m
break
end
end
