nan session output from polynomial regression training in Tensorflow - python

I am new to Tensorflow and am working through the examples of regression examples given here tensorflow tutorials. Speicifically, I am working on the 3rd: "polynomial_regression.py"
I followed the linear regression example fine, and have now moved on to the polynomial regression.
However, I wanted to try substituting another set of data instead of that made up in the example. I did this by exchanging
xs = np.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
7.042,10.791,5.313,7.997,5.654,9.27,3.1], dtype=np.float32)
ys = np.asarray([1.7,2.76,-2.09,3.19,1.9,1.573,3.366,2.596,2.53,1.221,
2.827,-3.465,1.65,-2.1004,2.42,2.94,1.3], dtype=np.float32)
n_observations = xs.shape[0]
for
n_observations = 100
xs = np.linspace(-3, 3, n_observations)
ys = np.tan(xs) + np.random.uniform(-0.5, 0.5, n_observations)
I.e. the second was what was given in the example, and I wanted to try to run the same training with the new xs,ys, n_observation. These were the only lines I changed. I also tried changing the dtype of the array to be float64, but this did not change the output.
The output I am getting (which is from print(training_cost) is just a repeated nan. When I switch back to the original data, the network runs fine, and generates a fitting funciton.
Thank you for any ideas!

NaNs can be caused by many things, usually some form of numerical instability. Lowering the learning rate or using a more stable optimizer are good things to try.

Related

Pytorch Linear Regression wrong result on particular dataset

I'm using pytorch to implement a simple linear regression model.
The code works perfectly for randomly created datasets, but when it comes to the dataset I wanted to train, it gives significantly wrong results.
Here is the code:
x = torch.linspace(1,100,steps=100)
learn_rate = 0.000001
x_train = x[:100]
x_test = x[100:]
y_train = data[:100]
y_test = data[100:]
# y_train = -0.01*x_train + torch.randn(100)*10 #Code for generating random data.
w = torch.rand(1,requires_grad=True)
b= torch.rand(1,requires_grad=True)
for i in range(1000):
loss = torch.mean((y_train-(w*x_train+b))**2)
if(i%100==0):
print(loss)
loss.backward()
w.data.add_(-w.grad.data*learn_rate)
b.data.add_(-b.grad.data*learn_rate)
w.grad.data.zero_()
b.grad.data.zero_()
The result it gives makes no sense.
However, when I used a randomly generated dataset, it works perfectly:
The dataset actually looks similar. I am not sure for the reason of the inaccuracy of this model.
Code for plotting data:
plt.plot(x_train.numpy(),y_train.numpy())
plt.plot(x_train.numpy(),(w*x_train+b).data.numpy())
plt.show()
--
Now the problem seems to be that weight converges much faster than bias. At the current learning rate, bias will not converge to the optimal. However, if I increase the learning rate just by a little, the weight will simply diverge. I have to set two learning rates.
However, I'm wondering whether setting different learning rate is the best solution for a simple model like this, because I've found out that not much model actually uses different learning rate for different parameters.
Your code seems to be correct, but your model converges slower when there is a large bias in your data (because it now has to update the bias parameter many times before it reaches the correct value).
You could try running it for more iterations or increasing the learning rate.

How can I get predicted the following value of stock using predict method of Tensorflow?

I am wondering how to predict and get future time series data after model training. I would like to get the values after N steps. I wonder if the time series data has been properly learned and predicted. How do I do this right to get the following(next) value? I want to get the next value using model.predict or similar.
I have x_test and x_test[-1] == t So, the meaning of the next value is t+1, t+2, .... t+n. In this example I want to get t+1, t+2 ... t+n
First
I tried using stock index data
inputs = total_data[len(total_data) - forecast - look_back:]
inputs = scaler.transform(inputs)
X_test = []
for i in range(look_back, inputs.shape[0]):
X_test.append(inputs[i - look_back:i])
X_test = np.array(X_test)
predicted = model.predict(X_test)
but the result is like below
The results from X_test[-20:] and the following 20 predictions looks like same. I'm wondering if it's the correct method to train and predicted value and also if the result was correct.
full source
The method I tried first did not work correctly.
Second
I realized something is wrong, I tried using another official data so I used the time series in the Tensorflow tutorial to practice training the model.
a = y_val[-look_back:]
for i in range(N-step prediction): #predict a new value n times.
tmp = model.predict(a.reshape(-1, look_back, num_feature)) #predicted value
a = a[1:] #remove first
a = np.append(a, tmp) #insert predicted value
The results were predicted in a linear regression shape very differently from the real data.
Output a linear regression abnormal that is independent of the real data:
full source (After the 25th line is my code.)
I'm really very curious that How can I predict the following value of time series using Tensorflow predict method
I'm not wondering if this works or not theoretically. I'm just wondering how to get the following n steps using the predict method.
Thank you for reading the long question. I seek advice about your priceless opinion.
In the Second approach, Output is not expected, as per my understanding, because of a small mistake in the code.
The line of code,
a = y_val[-look_back:]
should be replaced by
look_back = 20
x = x_val_uni
a = x[-look_back:]
a.shape
In other words, we should send X Values as Inputs to the Model for the Prediction, not the Y Values.
However, we can compare it's predictions with Y Values, with the code,
y = y_val_uni[-20:]
plt.plot(y)
plt.plot(tmp)
plt.show()
Which would result in the plot shown below:
Please find the Complete Working Code in this Google Colab Gist.

Loop over a layer to do a Monte Carlo from a neural net output

I've recently had to tweak a neural network. Here's how it works:
Given an image as input, several layer turns it into a mean matrix mu and a covariance matrix sigma.
Then, a sample z is taken from the Gaussian distribution of parameters mu, sigma.
Several layer turns this sample into an output
this output is compared to a given image, which gives a cost
What I want to do is to keep mu and sigma, take multiple samples z, propagate them through the rest of the NN, and compare the multiple images I get to a given image.
Note that the step z -> image output calls other package, I'd like not having to dig into these...
What I did so far :
At first, I thought I did not need to go through all this hassle : I take a batch_size of one, it is as if I'm doing a Monte Carlo by running the NN multiple times. But I actually need the neural net to try several image before updating the weights, and thus changing mu and sigma.
I simply sampled multiple z then propagated them through the net. But I soon discovered that I was duplicating all the layers, making the code terribly slow, and above all preventing me from taking many samples to achieve the MC I'm aiming at.
Of course, I updated the loss and data input classes to take that into account.
Do you have any ideas ? Basically, I'd like an efficient way to make z -> output multiple time, in a cost-efficient manner. I've still a lot to learn from tensorflow and keras, so I'm a little bit lost on how to do that. As usual, please apologized if an answer already exists somewhere, I did my best to look for one by myself!
Ok, my question was a bit stupid. So as not to duplicate layers, I created multiple slice layers, and then I simply propagated them through the net with previously declared layers. Here's my code :
# First declare layers
a = layer_A()
b = layer_B()
# And so on ...
# Generate samples
samples = generate_samples()([mu, sigma])
# for all the monte carlo samples, do :
for i in range(mc_samples):
cur_sample = Lambda(lambda x: K.slice(x, (0, 0, 0, 2*i), (-1, -1, -1, 2)), name="slice-%i" % i)(samples)
cur_output = a(cur_sample)
cur_output = b(cur_output)
all_output.append(output)
output_of_net = keras.layers.concatenate(all_output)
return Model(inputs=inputs, outputs=output_of_net)
Simply loop over the last dimension in the loss function, average, and you're done ! A glimpse at my loss :
loss = 0
for i in range(mc_samples):
loss += f(y_true[..., i], y_pred[..., i])
return loss/mc_samples

Memory error when using logistic regression

I have an array with the shape 57159x924 which I will use as training data. 896 of these 924 columns are feature and the remaining labels. I want to use logistic regression on this, but when I use the fit function from logistic regression I get a memory error. I guess it is because it's too much data for my computer's memory to handle. Is there any way to get around this problem?
The code I want to use is
lr = LogisticRegression(random_state=1)
lr.fit(train_set, train_label)
lr.predict_proba(x_test)
And the following is the error
line 21, in main
lr.fit(train_set, train_label)
....
return array(a, dtype, copy=False, order=order)
MemoryError
You haven't given enough details to really understand the problem or give a definite answer, but here are a few options I hope will help:
The amount of memory available might be configurable.
Training over all the data at the same time would raise OOM problems in many contexts, which is why the common practice is to use SGD (stochastic gradient descent) by training over batches, i.e. introducing only subsets of the data every iteration and getting a global optimization solution in a stochastic sense. If I'm guessing correctly, you're using sklearn.linear_model.LogisticRegression, which has different "solvers". Maybe the saga solver will handle your situation better.
There are other implementations out there, and some of them definitely have batching options built-in in a highly configurable way. And if worst comes to worst, implementing a logistic-regression model is fairly simple, and then you can batch easy as pie.
Edit (due to discussion in comments):
Here's a practical way to go about it, with a very very simple (and easy) example -
from sklearn.linear_model import SGDClassifier
import numpy as np
import random
X1 = np.random.multivariate_normal(mean=[10, 5], cov = np.diag([3, 8]), size=1000) # diagonal covariance for simplicity
Y1 = np.zeros((1000, 1))
X2 = np.random.multivariate_normal(mean=[-4, 55], cov = np.diag([5, 1]), size=1000) # diagonal covariance for simplicity
Y2 = np.ones((1000, 1))
X = np.vstack([X1, X2])
Y = np.vstack([Y1, Y2]).reshape([2000,])
sgd = SGDClassifier(loss='log', warm_start=True) # as mentioned in answer. note that shuffle is defaulted to True.
sgd.partial_fit(X, Y, classes = [0, 1]) # first time you need to say what your classes are
for k in range(1000):
batch_indexs = random.sample(range(2000), 20)
sgd.partial_fit(X[batch_indexs, :], Y[batch_indexs])
In practice you should be looking at the loss and accuracy and using a suitable while instead of for, but that much is left for the reader ;-)
Note that you can control more than I've shown (like the number of iterations etc.), so you should read the documentation of SGDClassifier properly.
Another thing to note is that there are different practices of batching. I just took a random subset every iteration, but some prefer to make sure every point in the data has been seen an equal amount of times (e.g. shuffle the data and then take batches in order indexes or something).

SKlearn prediction on test dataset with different shape from training dataset shape

I'm new to ML and would be grateful for any assistance provided. I've run a linear regression prediction using test set A and training set A. I saved the linear regression model and would now like to use the same model to predict a test set A target using features from test set B. Each time I run the model it throws up the error below
How can I successfully predict a test data set from features and a target with different shapes?
Input
print(testB.shape)
print(testA.shape)
Output
(2480, 5)
(1315, 6)
Input
saved_model = joblib.load(filename)
testB_result = saved_model.score(testB_features, testA_target)
print(testB_result)
Output
ValueError: Found input variables with inconsistent numbers of samples: [1315, 2480]
Thanks again
They are inconsistent shapes which is why the error is being thrown. Have you tried to reshape the data so one of them are same shape? From a quick look, it seems that you have more samples and one less feature in testA.
Think about it, if you have trained your model with 5 features you cannot then ask the same model to make a prediction given 6 features. You speak of using a Linear Regressor, the equation is roughly:
y = b + w0*x0 + w1*x1 + w2*x2 + .. + wN-1*xN-1
Where {
y is your output/label
N is the number of features
b is the bias term
w(i) is the ith weight
x(i) is the ith feature value
}
You have trained a linear regressor with 5 features, effectively producing the following
y (your output/label) = b + w0*x0 + w1*x1 + w2*x2 + w3*x3 + w4*x4
You then ask it to make a prediction given 6 features but it only knows how to deal with 5.
Aside from that issue, you also have too many samples, testB has 2480 and testA has 1315. These need to match, as the model wants to make 2480 predictions, but you only give it 1315 outputs to compare it to. How can you get a score for 1165 missing samples? Do you now see why the data has to be reshaped?
EDIT
Assuming you have datasets with an equal amount of features as discussed above, you may now look at reshaping (removing data) testB like so:
testB = testB[0:1314, :]
testB.shape
(1315, 5)
Or, if you would prefer a solution using the numpy API:
testB = np.delete(testB, np.s_[0:(len(testB)-len(testA))], axis=0)
testB.shape
(1315, 5)
Keep in mind, when doing this you slice out a number of samples. If this is important to you (which it can be) then it may be better to introduce a pre-processing step to help out with the missing values, namely imputing them like this. It is worth noting that the data you are reshaping should be shuffled (unless it is already), as you may be removing parts of the data the model should be learning about. Neglecting to do this could result in a model that may not generalise as well as you hoped.

Categories