import pandas as pd
import numpy as np
from pandas import DataFrame
from random import shuffle
import tensorflow as tf
Taking data from a CSV file (the IMDB dataset):
data=pd.read_csv('imdb.csv')
data.fillna(-1)
features=data.loc[:,['actor_1_facebook_likes','actor_2_facebook_likes','actor_3_facebook_likes','movie_facebook_likes']].as_matrix()
labels=data.loc[:,['imdb_score']].as_matrix()
learning_rate=.01
training_epochs=2000
display_steps=50
n_samples=features.size
Defining placeholders for features and labels:
inputX = tf.placeholder(tf.float32,[None,4])
inputY = tf.placeholder(tf.float32,[None,1])
Defining weights and bias.
The weights and bias are coming out as NaN.
w = tf.Variable(tf.zeros([4,4]))
b = tf.Variable(tf.zeros([4]))
y_values = tf.add(tf.matmul(inputX,w),b)
Applying the neural network:
y=tf.nn.softmax(y_values)
cost=tf.reduce_sum(tf.pow(inputY-y,2))/2*n_samples
optimizer=tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(training_epochs):
        sess.run(optimizer, feed_dict={inputX: features, inputY: labels})
        if i % display_steps == 0:
            cc = sess.run(cost, feed_dict={inputX: features, inputY: labels})
            print(sess.run(w, feed_dict={inputX: features, inputY: labels}))
Your learning rate is too big (try starting with 1e-3).
Also, your neural network won't learn anything, because you're starting from a condition in which the weights can't change: you have initialized them to zero, and that's wrong.
Change your weight initialization to random values, like this:
w = tf.Variable(tf.truncated_normal([4,4]))
and you'll be able to train your network (biases initialized to 0 are fine).
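For instance, applied to the code in the question, those two changes might look like the sketch below (TensorFlow 1.x, reusing the same names as above); as a side note, the cost line is also parenthesised so that the sum is actually divided by 2*n_samples instead of being multiplied by n_samples:
# Sketch of the suggested changes, reusing inputX/inputY/features/labels/n_samples from the question.
learning_rate = 1e-3                              # smaller step size
w = tf.Variable(tf.truncated_normal([4, 4]))      # random initialization instead of zeros
b = tf.Variable(tf.zeros([4]))                    # zero biases are fine
y_values = tf.add(tf.matmul(inputX, w), b)
y = tf.nn.softmax(y_values)
# Parentheses matter: the original divides by 2 and then multiplies by n_samples.
cost = tf.reduce_sum(tf.pow(inputY - y, 2)) / (2 * n_samples)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)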
Use add_check_numerics_ops from the TensorFlow library to check which operation is producing the NaN values:
https://www.tensorflow.org/api_docs/python/tf/add_check_numerics_ops
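In the TF 1.x API this is a graph-level helper: you call tf.add_check_numerics_ops() once the graph is built and run the returned op alongside the training op, and the first op that produces an inf or NaN raises an InvalidArgumentError naming the offending tensor. A minimal sketch:
# Build the graph first (w, b, cost, optimizer as above), then add the checks.
check_op = tf.add_check_numerics_ops()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(training_epochs):
        # Raises tf.errors.InvalidArgumentError as soon as any tensor goes inf/NaN.
        sess.run([optimizer, check_op], feed_dict={inputX: features, inputY: labels})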
I am trying to create a bare-minimum PyTorch example purely for learning purposes. I found that my PyTorch code works fine with really small training data, but as soon as I increase the input data size it stops working. This seems very counterintuitive; ideally, a bigger training data size should give better results.
[I have intentionally not used the object-oriented paradigm, as I am trying to learn the core functionality first, hence keeping the code to a bare minimum.]
import numpy as np
import torch

x_train = np.float32(np.random.rand(25,1)*10)

#Synthesize training data; we will verify the weights and bias later with the trained model
def synthesize_output(input):
    return (1.29*input[0] + 13)

y_train = np.array([synthesize_output(row) for row in x_train]).reshape(-1,1)

X_train = torch.from_numpy(x_train)
Y_train = torch.from_numpy(y_train)

learning_rate = 0.001

# Initialize weights and bias to random starting values
w = torch.rand(1, 1, requires_grad=True)
b = torch.rand(1, 1, requires_grad=True)

for iter in range(1, 4001):
    #forward pass: predict values
    y_pred = X_train.mm(w).clamp(min=0).add(b)
    #find loss
    loss = (y_pred - Y_train).pow(2).sum()
    #backward pass for computing gradients
    loss.backward()
    #just printing the loss to see how it is changing over the iterations
    if (iter % 100) == 0:
        print(f"Iter: {iter}, Loss={loss}")
    #manually updating weights
    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
        w.grad.zero_()
        b.grad.zero_()

#finally check the weight and bias
print(f"Weights: {w} \n\nBias: {b}")
The above code works as it is, but as soon as I increase (just double) the data size, it stops working:
x_train = np.float32(np.random.rand(50,1)*10)
Unlike the above code, the basic sklearn sample I created seems to work fine even with a much larger input dataset:
import numpy as np
from sklearn.linear_model import LinearRegression
x_train = np.float32(np.random.rand(2000000,1)*10)
def synthesize_output(input):
    return (1.29*input[0] + 13)
y_train = np.array([synthesize_output(row) for row in x_train]).reshape(-1,1)
lm = LinearRegression()
lm.fit(x_train, y_train)
#finally check the weight and constant
lm.score(x_train, y_train)
print(f"Weight: {lm.coef_}")
print(f"Bias: {lm.intercept_}")
Why is PyTorch not able to handle large input data like sklearn?
When I use a very large training data size (> 5000), the loss goes to NaN.
How do we typically work around this problem?
I am learning how to build a neural network using PyTorch.
This formula is the target of my code:
y = 2x^3 + 7x^2 - 8x + 120
It is a regression problem.
I used this because it is simple and the output can be calculated exactly, so I can verify that my neural network is able to predict the output for a given input.
However, I met some problem during training.
The problem occurs in this line of code:
loss = loss_func(prediction, outputs)
The loss computed in this line is NaN (not a number).
I am using MSELoss as the loss function. 100 data points are used for training the ANN model. The input X_train ranges from -1000 to 1000.
I believe that the problem is due to the values of X_train and MSELoss: X_train should be scaled to values between 0 and 1 so that MSELoss can compute the loss.
However, is it possible to train the ANN model without scaling the input to values between 0 and 1 in a regression problem?
Here is my code; it does not use MinMaxScaler, and it prints the loss as NaN:
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torch.autograd import Variable
#Load datasets
dataset = pd.read_csv('test_100.csv')
x_temp_train = dataset.iloc[:79, :-1].values
y_temp_train = dataset.iloc[:79, -1:].values
x_temp_test = dataset.iloc[80:, :-1].values
y_temp_test = dataset.iloc[80:, -1:].values
#Turn into tensor
X_train = torch.FloatTensor(x_temp_train)
Y_train = torch.FloatTensor(y_temp_train)
X_test = torch.FloatTensor(x_temp_test)
Y_test = torch.FloatTensor(y_temp_test)
#Define an Artificial Neural Network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear = nn.Linear(1,1) #input=1, output=1, bias=True

    def forward(self, x):
        x = self.linear(x)
        return x
net = Net()
print(net)
#Define a Loss function and optimizer
optimizer = torch.optim.SGD(net.parameters(), lr=0.2)
loss_func = torch.nn.MSELoss()
#Training
inputs = Variable(X_train)
outputs = Variable(Y_train)
for i in range(100): #epoch=100
    prediction = net(inputs)
    loss = loss_func(prediction, outputs)

    optimizer.zero_grad() #zero the parameter gradients
    loss.backward()       #compute gradients (dloss/dx)
    optimizer.step()      #update the parameters

    if i % 10 == 9: #print every 10 epochs
        #plot and show the learning process
        plt.cla()
        plt.scatter(X_train.data.numpy(), Y_train.data.numpy())
        plt.plot(X_train.data.numpy(), prediction.data.numpy(), 'r-', lw=2)
        plt.text(0.5, 0, 'Loss=%.4f' % loss.data.numpy(), fontdict={'size': 10, 'color': 'red'})
        plt.pause(0.1)

plt.show()
Thanks for your time.
Is normalization necessary for a regression problem in a neural network?
No.
But...
I can tell you that MSELoss works with non-normalised values. You can tell because:
>>> import torch
>>> torch.nn.MSELoss()(torch.randn(1)-1000, torch.randn(1)+1000)
tensor(4002393.)
MSE is a very well-behaved loss function, and you can't really get NaN without giving it a NaN. I would bet that your model is giving a NaN output.
The two most common causes of a NaN are: an accidental divide by 0, and absurdly large weights/gradients.
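Both are easy to reproduce in isolation; a quick illustration in the same REPL style as above (float32 overflows to inf around 3.4e38, and inf - inf is NaN):
>>> torch.tensor(0.) / torch.tensor(0.)              # accidental divide by 0
tensor(nan)
>>> torch.tensor(1e30)**2 - torch.tensor(1e30)**2    # overflow: inf - inf
tensor(nan)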
I ran a variant of your code on my machine using:
x = torch.randn(79, 1)*1000
y = 2*x**3 + 7*x**2 - 8*x + 120
And it got to NaN in about 20 training steps due to absurdly large weights.
A model can get absurdly large weights if the learning rate is too large. You may think 0.2 is not too large, but that's a typical learning rate people use for normalised data, which forces their gradients to be fairly small. Since you are not using normalised data, let's calculate how large your gradients are (roughly).
First, your x is on the order of 1e3, your expected output y scales as x^3, and MSE computes (pred - y)^2, so your loss is on the scale of (1e3^3)^2 = 1e18. This propagates to your gradients, and recall that each weight update is proportional to gradient * learning_rate, so it's easy to see why your weights fairly quickly explode outside of float precision.
How to fix this? Well, you could use a learning rate of 2e-7, or you could just normalise your data. I recommend normalising your data; it has other nice properties for training and avoids these kinds of problems.
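As a rough sketch of what that normalisation could look like for the data above (standardising both inputs and targets before training; this is just an illustration, not code from the original post):
# Standardise X and Y to zero mean / unit variance, and keep the statistics
# so predictions can be mapped back to the original scale afterwards.
x_mean, x_std = X_train.mean(), X_train.std()
y_mean, y_std = Y_train.mean(), Y_train.std()
X_norm = (X_train - x_mean) / x_std
Y_norm = (Y_train - y_mean) / y_std
# Train net on (X_norm, Y_norm) exactly as before, then undo the scaling:
# y_original_scale = net(x_norm) * y_std + y_mean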
I am building a TensorFlow implementation of an autoencoder for time series. I have 2000 time series, each of which consists of 501 time components. These time series are stored in a '.mat' file, which I read as input using scipy.
I then build the autoencoder and train it using batches of the 2000 time series. Finally, I would like to visualize the prediction of the trained autoencoder on the 2000 time series given as input, and compare it with the original series, so that I can see whether the autoencoder is doing a good job of compressing the data.
I use a double-layer autoencoder, with 250 and 100 nodes in the first and second hidden layer, respectively.
My problem is that when I compare the predicted time series with the original ones, the predicted ones have only positive values, while the original time series have both negative and positive values.
Here is the code I have been using:
import scipy.io
mat = scipy.io.loadmat('input_time_series.mat')
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.contrib.layers import fully_connected
input = mat
output = input
tf.reset_default_graph()
num_inputs=501 #number of components in the original time series
num_hid1=250
num_hid2=100
num_hid3=num_hid1
num_output=num_inputs
lr=0.01
actf=tf.nn.relu
X=tf.placeholder(tf.float32,shape=[None,num_inputs])
initializer=tf.variance_scaling_initializer()
w1=tf.Variable(initializer([num_inputs,num_hid1]),dtype=tf.float32)
w2=tf.Variable(initializer([num_hid1,num_hid2]),dtype=tf.float32)
w3=tf.Variable(initializer([num_hid2,num_hid3]),dtype=tf.float32)
w4=tf.Variable(initializer([num_hid3,num_output]),dtype=tf.float32)
b1=tf.Variable(tf.zeros(num_hid1))
b2=tf.Variable(tf.zeros(num_hid2))
b3=tf.Variable(tf.zeros(num_hid3))
b4=tf.Variable(tf.zeros(num_output))
hid_layer1=actf(tf.matmul(X,w1)+b1)
hid_layer2=actf(tf.matmul(hid_layer1,w2)+b2)
hid_layer3=actf(tf.matmul(hid_layer2,w3)+b3)
output_layer=actf(tf.matmul(hid_layer3,w4)+b4)
loss=tf.reduce_mean(tf.square(output_layer-X))
optimizer=tf.train.AdamOptimizer(lr)
train=optimizer.minimize(loss)
init=tf.global_variables_initializer()
num_epoch=5000
batch_size=150
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(num_epoch):
        num_batches = 2000 // batch_size
        for iteration in range(num_batches):
            X_batch = input[:]
            Y_batch = output[:]
            sess.run(train, feed_dict={X: X_batch})
        train_loss = loss.eval(feed_dict={X: X_batch})
        print("epoch {} loss {}".format(epoch, train_loss))
    results = output_layer.eval(feed_dict={X: input})
I also include an example comparing one input time series (in blue) with the corresponding one predicted by the autoencoder (in orange).
Solution below
If you are just interested in solving this problem, you can skip to my answer below.
Original question
I'm using tensorflow for reinforcement learning. A swarm of agents uses the model in parallel and one central entity trains it on the collected data.
I had found here (Is it thread-safe when using tf.Session in inference service?) that TensorFlow sessions are thread-safe, so I simply let the prediction and updating run in parallel.
But now I would like to change the setup. Instead of updating and training one single model, I now need to keep two models: one is used for prediction, and the second one is trained. After some training steps the weights from the second one are copied over to the first. Below is a minimal example in Keras. For multiprocessing, it is recommended to finalize the graph, but then I can't copy the weights:
# the usual imports
import numpy as np
import tensorflow as tf
from keras.models import *
from keras.layers import *
# set up the first model
i = Input(shape=(10,))
b = Dense(1)(i)
prediction_model = Model(inputs=i, outputs=b)
# set up the second model
i2 = Input(shape=(10,))
b2 = Dense(1)(i2)
training_model = Model(inputs=i2, outputs=b2)
# look at this code, to check if the weights are the same
# here the output is different
prediction_model.predict(np.ones((1, 10)))
training_model.predict(np.ones((1, 10)))
# now to use them in multiprocessing, the following is necessary
prediction_model._make_predict_function()
training_model._make_predict_function()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
default_graph = tf.get_default_graph()
# the following line is the critical part
# if this is uncommented, the two options below both fail
# default_graph.finalize()
# option 1, use keras methods to update the weights
prediction_model.set_weights(training_model.get_weights())
# option 2, use tensorflow to update the weights
update_ops = [tf.assign(to_var, from_var) for to_var, from_var in
              zip(prediction_model.trainable_weights, training_model.trainable_weights)]
sess.run(update_ops)
# now the predictions are the same
prediction_model.predict(np.ones((1, 10)))
training_model.predict(np.ones((1, 10)))
According to the question above, it is recommended to finalize the graph. If it is not finalized, there can be memory leaks (!?), so that seems like a strong recommendation.
But if I finalize it, I can no longer update the weights.
What confuses me about this is: it is possible to train the network, so changing the weights is allowed. Assignment looks to me like the weights are just overwritten, so why is this different from applying an optimizer step?
In short, my problem was assigning values to the weights of a finalized graph. If this assignment is done after finalization, TensorFlow complains that the graph can no longer be changed.
I was confused about why this is forbidden. After all, changing the weights by backpropagation is allowed.
But the problem is not related to changing the weights. Keras's set_weights() is confusing because it looks as if the weights are simply overwritten (as in backprop). Actually, behind the scenes, assignment operations are added and executed. These new operations represent a change to the graph, and that change is forbidden.
So the solution is to set up the assignment operations before finalizing the graph. You have to reorder the code:
# the usual imports
import numpy as np
import tensorflow as tf
from keras.models import *
from keras.layers import *
# set up the first model
i = Input(shape=(10,))
b = Dense(1)(i)
prediction_model = Model(inputs=i, outputs=b)
# set up the second model
i2 = Input(shape=(10,))
b2 = Dense(1)(i2)
training_model = Model(inputs=i2, outputs=b2)
# set up operations to move weights from training to prediction
update_ops = [tf.assign(to_var, from_var) for to_var, from_var in
              zip(prediction_model.trainable_weights, training_model.trainable_weights)]
# now to use them in multiprocessing, the following is necessary
prediction_model._make_predict_function()
training_model._make_predict_function()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
default_graph = tf.get_default_graph()
default_graph.finalize()
# this can be executed now
sess.run(update_ops)
# now the predictions are the same
prediction_model.predict(np.ones((1, 10)))
training_model.predict(np.ones((1, 10)))
I'm trying to follow this tutorial.
TensorFlow just came out and I'm really trying to understand it. I'm familiar with penalized linear regression like Lasso, Ridge, and ElasticNet, and their usage in scikit-learn.
For scikit-learn Lasso regression, all I need to input into the regression algorithm is DF_X [an M x N dimensional attribute matrix (pd.DataFrame)] and SR_y [an M dimensional target vector (pd.Series)]. The Variable structure in TensorFlow is a bit new to me and I'm not sure how to structure my input data into what it wants.
It seems as if softmax regression is for classification. How can I restructure my DF_X (M x N attribute matrix) and SR_y (M-dimensional target vector) to input into TensorFlow for linear regression?
My current method for doing a linear regression uses pandas, numpy, and sklearn, and it's shown below. I think this question will be really helpful for people getting familiar with TensorFlow:
#!/usr/bin/python
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LassoCV
#Create DataFrames for attribute and target matrices
DF_X = pd.DataFrame(np.array([[0,0,1],[2,3,1],[4,5,1],[3,4,1]]),columns=["att1","att2","att3"],index=["s1","s2","s3","s4"])
SR_y = pd.Series(np.array([3,2,5,8]),index=["s1","s2","s3","s4"],name="target")
print DF_X
#att1 att2 att3
#s1 0 0 1
#s2 2 3 1
#s3 4 5 1
#s4 3 4 1
print SR_y
#s1 3
#s2 2
#s3 5
#s4 8
#Name: target, dtype: int64
#Create Linear Model (Lasso Regression)
model = LassoCV()
model.fit(DF_X,SR_y)
print model
#LassoCV(alphas=None, copy_X=True, cv=None, eps=0.001, fit_intercept=True,
#max_iter=1000, n_alphas=100, n_jobs=1, normalize=False, positive=False,
#precompute='auto', random_state=None, selection='cyclic', tol=0.0001,
#verbose=False)
print model.coef_
#[ 0. 0.3833346 0. ]
Softmax is only a function applied on top of a model (in logistic regression, for example); it is not a model like
model = LassoCV()
model.fit(DF_X,SR_y)
Therefore you can't simply give it data with a fit method. However, you can easily create your model with the help of TensorFlow functions.
First of all, you have to create a computational graph. For linear regression, for example, you create placeholder tensors with the shape of your data; they are only tensors, and you will feed them your arrays in another part of the program.
import tensorflow as tf
x = tf.placeholder("float", [4, 3])
y_ = tf.placeholder("float",[4])
Then you create two variables that will contain the initial weights of the model:
W = tf.Variable(tf.zeros([3,1]))
b = tf.Variable(tf.zeros([1]))
And now you can create the model (you want regression, not classification, so you don't need tf.nn.softmax):
y=tf.matmul(x,W) + b
Since you have a regression problem and a linear model, you will use
loss=tf.reduce_sum(tf.square(y_ - y))
Then we train the model with the same training step as in the tutorial:
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
Now that you have created the computational graph, you have to write one more part of the program, where you use this graph to work with your data.
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
sess.run(train_step, feed_dict={x:np.asarray(DF_X),y_:np.asarray(SR_y)})
Here you give your data to the computational graph with the help of feed_dict; in TensorFlow you provide the data as numpy arrays.
If you want to see your error (the loss), you can run
sess.run(loss,feed_dict={x:np.asarray(DF_X),y_:np.asarray(SR_y)})
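Putting the fragments together, a self-contained sketch might look like the following (not verbatim from the answer above: the target placeholder is given shape [4, 1] and SR_y is reshaped to a column vector, so the subtraction in the loss is element-wise rather than broadcasting to a 4x4 matrix, and the training step is simply repeated in a loop):
import numpy as np
import pandas as pd
import tensorflow as tf

DF_X = pd.DataFrame(np.array([[0,0,1],[2,3,1],[4,5,1],[3,4,1]]),
                    columns=["att1","att2","att3"], index=["s1","s2","s3","s4"])
SR_y = pd.Series(np.array([3,2,5,8]), index=["s1","s2","s3","s4"], name="target")

x  = tf.placeholder("float", [4, 3])
y_ = tf.placeholder("float", [4, 1])   # column vector, matching y below

W = tf.Variable(tf.zeros([3, 1]))
b = tf.Variable(tf.zeros([1]))
y = tf.matmul(x, W) + b                # shape [4, 1]

loss = tf.reduce_sum(tf.square(y_ - y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

feed = {x: np.asarray(DF_X, dtype=np.float32),
        y_: np.asarray(SR_y, dtype=np.float32).reshape(-1, 1)}

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    for _ in range(1000):              # repeat the gradient step
        sess.run(train_step, feed_dict=feed)
    print(sess.run(loss, feed_dict=feed))   # final squared error
    print(sess.run(W))                      # learned weights
    print(sess.run(b))                      # learned bias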