Use attribute and target matrices for TensorFlow Linear Regression Python

I'm trying to follow this tutorial.
TensorFlow just came out and I'm really trying to understand it. I'm familiar with penalized linear regression like Lasso, Ridge, and ElasticNet and its usage in scikit-learn.
For scikit-learn Lasso regression, all I need to input into the regression algorithm is DF_X [an M x N dimensional attribute matrix (pd.DataFrame)] and SR_y [an M dimensional target vector (pd.Series)]. The Variable structure in TensorFlow is a bit new to me and I'm not sure how to structure my input data into what it wants.
It seems as if softmax regression is for classification. How can I restructure my DF_X (M x N attribute matrix) and SR_y (M dimensional target vector) to input into tensorflow for linear regression?
My current method for doing a Linear Regression uses pandas, numpy, and sklearn and it's shown below. I think this question will be really helpful for people getting familiar with TensorFlow:
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LassoCV
#Create DataFrames for attribute and target matrices
DF_X = pd.DataFrame(np.array([[0,0,1],[2,3,1],[4,5,1],[3,4,1]]),columns=["att1","att2","att3"],index=["s1","s2","s3","s4"])
SR_y = pd.Series(np.array([3,2,5,8]),index=["s1","s2","s3","s4"],name="target")
print(DF_X)
#att1 att2 att3
#s1 0 0 1
#s2 2 3 1
#s3 4 5 1
#s4 3 4 1
print(SR_y)
#s1 3
#s2 2
#s3 5
#s4 8
#Name: target, dtype: int64
#Create Linear Model (Lasso Regression)
model = LassoCV().fit(DF_X,SR_y)
print(model)
#LassoCV(alphas=None, copy_X=True, cv=None, eps=0.001, fit_intercept=True,
#max_iter=1000, n_alphas=100, n_jobs=1, normalize=False, positive=False,
#precompute='auto', random_state=None, selection='cyclic', tol=0.0001,
print(model.coef_)
#[ 0. 0.3833346 0. ]

Softmax is only an additional function (used in logistic regression, for example); it is not a model like
model = LassoCV().fit(DF_X,SR_y)
Therefore you can't simply hand it data through a fit method. However, you can build your own model with the help of TensorFlow functions.
First of all, you have to create a computational graph. For linear regression, for example, you create placeholder tensors with the size of your data. They are only placeholders; you will feed your arrays into them in another part of the program.
import tensorflow as tf
x = tf.placeholder("float", [4, 3])
y_ = tf.placeholder("float",[4])
Then you create two variables that will hold the weights of the model (initialized to zero here)
W = tf.Variable(tf.zeros([3,1]))
b = tf.Variable(tf.zeros([1]))
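And now you can create the model (you want regression, not classification, so you don't need tf.nn.softmax). The matmul result has shape [4, 1], so flatten it to shape [4] to match y_; otherwise y_ - y would broadcast to a [4, 4] matrix.
y = tf.reshape(tf.matmul(x,W) + b, [-1])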
Since this is regression with a linear model, you will use the sum of squared errors as the loss
loss=tf.reduce_sum(tf.square(y_ - y))
Then we will train our model with the same step as in the tutorial
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
Now that you created the computational graph you have to write one more part of the program, where you will use this graph to work with your data.
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
sess.run(train_step, feed_dict={x:np.asarray(DF_X),y_:np.asarray(SR_y)})
Here you give your data to this computational graph with the help of feed_dict. In TensorFlow you provide information in numpy arrays.
If you want to see the error (the value of the loss), you can write
sess.run(loss, feed_dict={x:np.asarray(DF_X),y_:np.asarray(SR_y)})
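For intuition, here is roughly what the graph computes when the training step is run repeatedly. This is a plain NumPy sketch, not TensorFlow code. The learning rate (0.005) and the number of steps are my own assumptions, picked so that the sum-of-squares loss goes down steadily on this toy data.

```python
import numpy as np

# Same toy data as in the question: 4 samples, 3 attributes.
X = np.array([[0, 0, 1], [2, 3, 1], [4, 5, 1], [3, 4, 1]], dtype=float)
y = np.array([3, 2, 5, 8], dtype=float)

W = np.zeros(3)        # weights, initialized to zero like tf.zeros([3, 1])
b = 0.0                # bias, initialized to zero like tf.zeros([1])
learning_rate = 0.005  # assumed value, not taken from the answer
n_steps = 1000         # assumed value


def sum_squared_error(W, b):
    """Sum of squared residuals, the same loss as tf.reduce_sum(tf.square(y_ - y))."""
    return float(np.sum((X @ W + b - y) ** 2))


initial_loss = sum_squared_error(W, b)
for _ in range(n_steps):
    residual = X @ W + b - y          # shape (4,), matches y
    grad_W = 2.0 * X.T @ residual     # gradient of the loss w.r.t. W
    grad_b = 2.0 * residual.sum()     # gradient of the loss w.r.t. b
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b
final_loss = sum_squared_error(W, b)

print(initial_loss, final_loss)
```

Note that the prediction and the target both have shape (4,) here, which is the shape agreement the TensorFlow version also needs.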


Tensorflow model architecture for sparse dataset

I have a regression dataset where approximately 95% of the target values are zeros (the other 5% are between 1 and 30), and I am trying to design a TensorFlow model for that data. I am thinking of implementing a model that combines a classifier and a regressor (check the output of the classifier submodel; if it's less than a threshold, pass it to the regression submodel). My intuition is that this should be built with the functional API, but I couldn't find helpful resources on that. Any ideas?
Here is the code that generates the data that I am using to replicate the problem:
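import numpy as np
n = 10000
zero_percentage = 0.95
# 95% of the targets are exactly zero
zeros = np.zeros(round(n * zero_percentage))
# the remaining 5% are integers drawn from [1, 30)
non_zeros = np.random.randint(1,30,size=round(n * (1- zero_percentage)))
y = np.concatenate((zeros,non_zeros))
a = 50
b = 10
# feature: random integer in [31, 60) for zero targets, (y - b) / a otherwise
x = np.array([np.random.randint(31,60) if element == 0 else (element - b) / a for element in y])
# binary label: 0 for zero targets, 1 for non-zero targets
y_classification = np.array([0 if element == 0 else 1 for element in y])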
Note: I experimented with probabilistic models (Poisson regression and regression with a discretized logistic mixture distribution), and they provided good results but the training was unstable (loss diverges very often).
Instead of trying to find some heuristic to balance the training between the zero values and the others, you might want to try some input preprocessing method that can handle imbalanced training sets better (usually by mapping to another space before running the model, then doing the inverse with the results); for example, an embedding layer. Alternatively, normalize the values to a small range (like [-1, 1]) and apply an activation function before evaluating the model on the data.
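As a small illustration of the "map to another space, then invert" idea, here is a NumPy sketch that scales targets to [-1, 1] and maps them back. The helper names to_unit_range and from_unit_range are made up for this example, and the sample values are arbitrary.

```python
import numpy as np


def to_unit_range(values, lo, hi):
    """Linearly map values from [lo, hi] to [-1, 1]."""
    return 2.0 * (values - lo) / (hi - lo) - 1.0


def from_unit_range(scaled, lo, hi):
    """Inverse of to_unit_range: map values from [-1, 1] back to [lo, hi]."""
    return (scaled + 1.0) * (hi - lo) / 2.0 + lo


# Arbitrary sample targets: mostly zeros plus a few non-zero values.
y = np.array([0.0, 0.0, 0.0, 7.0, 29.0])
y_lo, y_hi = y.min(), y.max()

y_scaled = to_unit_range(y, y_lo, y_hi)               # train the model on this
y_restored = from_unit_range(y_scaled, y_lo, y_hi)    # apply to model outputs
```

The same lo and hi computed on the training targets would have to be reused when inverting the model's predictions.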

Is normalization necessary for regression problem in Neural network

I am learning how to build a neural network using PyTorch.
This formula is the target of my code:
y = 2*X^3 + 7*X^2 - 8*X + 120
It is a regression problem.
I used this because it is simple and the output can be calculated so that I can ensure my neural network is able to predict output with the given input.
However, I ran into a problem during training.
The problem occurs in this line of code:
loss = loss_func(prediction, outputs)
The loss computed in this line is NaN (not a number).
I am using MSELoss as the loss function. 100 data points are used for training the ANN model. The input X_train ranges from -1000 to 1000.
I believe the problem is caused by the values of X_train combined with MSELoss, and that X_train should be scaled to values between 0 and 1 so that MSELoss can compute the loss.
However, is it possible to train the ANN model in a regression problem without scaling the input to values between 0 and 1?
Here is my code; it does not use MinMaxScaler and it prints NaN as the loss:
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torch.autograd import Variable
#Load datasets
dataset = pd.read_csv('test_100.csv')
x_temp_train = dataset.iloc[:79, :-1].values
y_temp_train = dataset.iloc[:79, -1:].values
x_temp_test = dataset.iloc[80:, :-1].values
y_temp_test = dataset.iloc[80:, -1:].values
#Turn into tensor
X_train = torch.FloatTensor(x_temp_train)
Y_train = torch.FloatTensor(y_temp_train)
X_test = torch.FloatTensor(x_temp_test)
Y_test = torch.FloatTensor(y_temp_test)
#Define an Artificial Neural Network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear = nn.Linear(1,1) #input=1, output=1, bias=True
    def forward(self, x):
        x = self.linear(x)
        return x
net = Net()
#Define a Loss function and optimizer
optimizer = torch.optim.SGD(net.parameters(), lr=0.2)
loss_func = torch.nn.MSELoss()
inputs = Variable(X_train)
outputs = Variable(Y_train)
for i in range(100): #epoch=100
    prediction = net(inputs)
    loss = loss_func(prediction, outputs)
    optimizer.zero_grad() #zero the parameter gradients
    loss.backward() #compute gradients(dloss/dx)
    optimizer.step() #updates the parameters
    if i % 10 == 9: #print every 10 mini-batches
        #plot and show learning process
        #assumed arguments: training inputs vs. predictions, and the current loss
        plt.plot(inputs.data.numpy(), prediction.data.numpy(), 'r-', lw=2)
        plt.text(0.5, 0, 'Loss=%.4f' % loss.item(), fontdict={'size': 10, 'color': 'red'})
Thanks for your time.
Is normalization necessary for regression problem in Neural Network?
I can tell you that MSELoss works with non-normalised values. You can tell because:
>>> import torch
>>> torch.nn.MSELoss()(torch.randn(1)-1000, torch.randn(1)+1000)
MSE is a very well-behaved loss function, and you can't really get NaN without giving it a NaN. I would bet that your model is giving a NaN output.
The two most common causes of a NaN are: an accidental divide by 0, and absurdly large weights/gradients.
I ran a variant of your code on my machine using:
x = torch.randn(79, 1)*1000
y = 2*x**3 + 7*x**2 - 8*x + 120
And it got to NaN in about 20 training steps due to absurdly large weights.
A model can get absurdly large weights if the learning rate is too large. You may think 0.2 is not too large, but that's a typical learning rate people use for normalised data, which forces their gradients to be fairly small. Since you are not using normalised data, let's calculate how large your gradients are (roughly).
First, your x is on the order of 1e3 and your expected output y scales as x^3, so y is on the order of 1e9. MSE then calculates (pred - y)^2, so your loss is on the scale of (1e3^3)^2 = 1e18. This propagates to your gradients, and recall that weight updates are -= gradient*learning_rate, so it's easy to see why your weights fairly quickly explode outside of float precision.
How to fix this? Well you could use a learning rate of 2e-7. Or you could just normalise your data. I recommend normalising your data; it has other nice properties for training and avoids these kinds of problems.
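To make the effect concrete, here is a NumPy sketch (not the original PyTorch code) that fits a one-weight linear model with plain gradient descent on MSE, once on the raw data and once on normalised data. The function name train_linear, the evenly spaced x values and the particular normalisation are assumptions for illustration only.

```python
import numpy as np


def train_linear(x, y, lr, steps=100):
    """Fit y ~ w*x + b by gradient descent on MSE and return the final loss."""
    w, b = 0.0, 0.0
    # Overflow is expected in the unscaled run, so silence the warnings.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(steps):
            err = w * x + b - y
            w -= lr * 2.0 * np.mean(err * x)   # d(MSE)/dw
            b -= lr * 2.0 * np.mean(err)       # d(MSE)/db
        return float(np.mean((w * x + b - y) ** 2))


# Inputs in the same range as the question, targets from the same cubic.
x = np.linspace(-1000.0, 1000.0, 79)
y = 2 * x**3 + 7 * x**2 - 8 * x + 120

# Raw data with lr=0.2: the weights blow up and the loss is no longer finite.
raw_loss = train_linear(x, y, lr=0.2)

# Normalised data with the same lr: x scaled to [-1, 1], y standardised.
x_scaled = x / 1000.0
y_scaled = (y - y.mean()) / y.std()
scaled_loss = train_linear(x_scaled, y_scaled, lr=0.2)

print(raw_loss, scaled_loss)
```

The linear model still cannot fit a cubic exactly, so the normalised loss stays above zero; the point is only that it stays finite with the same learning rate.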

custom class-wise loss function in tensorflow

For my problem, I want to predict customer review scores ranging from 1 to 5.
I thought it would be good to implement this as a regression problem, because predicting 1 when the true value is 5 should count as a "worse" prediction than predicting 4.
The model should also perform roughly equally well for all review score classes.
Because my dataset is highly unbalanced, I want to create a metric/loss that is capable of capturing this (similar to what F1 does for classification, I think).
Therefore I created the following metric (for now only MSE is relevant):
def custom_metric(y_true, y_pred):
    df = pd.DataFrame(np.column_stack([y_pred, y_true]), columns=["Predicted", "Truth"])
    class_mse = 0
    #class_mae = 0
    print("MSE for Classes:")
    for i in df.Truth.unique():
        temp = df[df["Truth"]==i]
        mse = mean_squared_error(temp.Truth, temp.Predicted)
        #mae = mean_absolute_error(temp.Truth, temp.Predicted)
        print("Class {}: {}".format(i, mse))
        class_mse += mse
        #class_mae += mae
    print("AVG MSE over Classes {}".format(class_mse/len(df.Truth.unique())))
    #print("AVG MAE over Classes {}".format(class_mae/len(df.Truth.unique())))
Now an example prediction:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, mean_absolute_error
# sample predictions: "model" messed up at class 2 and 3
y_true = np.array((1,1,1,2,2,2,3,3,3,4,4,4,5,5,5))
y_pred = np.array((1,1,1,2,2,3,5,4,3,4,4,4,5,5,5))
custom_metric(y_true, y_pred)
Now my question: Is it possible to create a custom TensorFlow loss function that behaves in a similar way? I also worked on this implementation, which is not yet ready for TensorFlow but is perhaps closer:
def custom_metric(y_true, y_pred):
    mse_class = 0
    num_classes = len(np.unique(y_true))
    stacked = np.vstack((y_true, y_pred))
    for i in np.unique(stacked[0]):
        y_true_temp = stacked[0][np.where(stacked[0]==i)]
        y_pred_temp = stacked[1][np.where(stacked[0]==i)]
        mse = np.mean(np.square(y_pred_temp - y_true_temp))
        mse_class += mse
    return mse_class/num_classes
But still, I am not sure how to work around the for loop for a tensorflow like definition.
Thanks in advance for any help!
The for loop should be replaced by vectorized numpy/tensorflow operations on the tensor.
A custom metric example would be:
from keras import backend as K
def custom_mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
where y_true is the ground truth label and y_pred are your predictions. You can see there are no explicit for-loops.
The motivation for not using for loops is that vectorized operations (which are present both in numpy and tensorflow) take advantage of modern CPU architectures, turning multiple iterative operations into matrix ones. Consider that a dot-product implementation in numpy takes approximately 30 times less time than a regular for-loop in Python.
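For the class-wise case from the question, one way to drop the loop over classes is to group the squared errors by class index. The following is a NumPy sketch only; the function name classwise_mse is made up here, and a TensorFlow loss would need the equivalent tensor operations instead.

```python
import numpy as np


def classwise_mse(y_true, y_pred):
    """Average of the per-class MSEs, computed without a Python loop."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # class_idx maps every sample to the index of its class in the sorted unique values
    _, class_idx, counts = np.unique(y_true, return_inverse=True, return_counts=True)
    squared_error = (y_pred - y_true) ** 2
    # Sum the squared errors per class, then divide by the class sizes
    per_class_mse = np.bincount(class_idx, weights=squared_error) / counts
    return float(per_class_mse.mean())


# Sample predictions from the question: the "model" messed up at class 2 and 3
y_true = np.array((1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5))
y_pred = np.array((1, 1, 1, 2, 2, 3, 5, 4, 3, 4, 4, 4, 5, 5, 5))
result = classwise_mse(y_true, y_pred)
print(result)
```

On this sample the per-class MSEs are 0, 1/3, 5/3, 0 and 0, so the average is 0.4, which matches what the loop-based version in the question computes.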

Simple Neural Network does not learn non linear data?

I am trying to understand why this sample neural network written with NumPy does not learn non-linear data. Even a simple NN is supposed to learn non-linear data, right?
I want my NN to learn the following: if the input is 1, output 0; if the input is greater than 1 and less than 4, output 1; if the value is greater than 4, output 0.
I have tried many sample NN implementations in NumPy found through Google, and I keep running into this problem.
The code below does not learn this mapping, but it learns well with input [2,2,0,0] and desired output [1,1,0,0].
import numpy as np
# #sigmoid function
def nonlin(x,deriv=False):
    if deriv:
        # x is already a sigmoid output here, so this is the sigmoid's slope
        return x*(1-x)
    return 1/(1+np.exp(-x))
# input dataset
# middle rows assumed to be 2 and 3 (inputs between 1 and 4), to match the 4 targets
X = np.array([ [1],
               [2],
               [3],
               [4] ])
# #output dataset
y = np.array([[0,1,1,0]]).T
# seed random numbers to make calculation
# deterministic (just a good practice)
np.random.seed(1)
# #initialize weights randomly with mean 0
syn0 = 2*np.random.random((1,1)) - 1
for iter in range(10000):
    # forward propagation
    l0 = X
    l1 = nonlin(np.dot(l0,syn0))
    # how much did we miss?
    l1_error = y - l1
    # multiply how much we missed by the
    # slope of the sigmoid at the values in l1
    l1_delta = l1_error * nonlin(l1,True)
    # update weights
    syn0 += np.dot(l0.T,l1_delta)
print ("Output After Training:")
print (l1)
Because your model is essentially a linear model. You need to add at least one hidden layer if you want to fit nonlinear data.
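A quick way to see this is the NumPy sketch below. It assumes the four inputs are 1, 2, 3 and 4. With a single weight, sigmoid(w*x) can only rise or fall as x grows, so it can never produce 0, 1, 1, 0. With one hidden layer the pattern becomes representable. The hidden-layer weights here are hand-picked for illustration, not trained.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


X = np.array([[1.0], [2.0], [3.0], [4.0]])  # assumed inputs


def single_layer(X, w):
    """The model from the question: one weight, no hidden layer."""
    return sigmoid(X @ np.array([[w]]))


# Whatever the weight, the outputs are monotonic in x.
monotonic = []
for w in (-3.0, -0.5, 0.5, 3.0):
    steps = np.diff(single_layer(X, w).ravel())
    monotonic.append(bool(np.all(steps >= 0) or np.all(steps <= 0)))

# One hidden layer with two units, weights chosen by hand:
# unit 1 switches on above x = 1.5, unit 2 switches on above x = 3.5.
W1 = np.array([[10.0, 10.0]])
b1 = np.array([-15.0, -35.0])
# The output fires when unit 1 is on and unit 2 is off.
W2 = np.array([[10.0], [-10.0]])
b2 = np.array([-5.0])

hidden = sigmoid(X @ W1 + b1)
two_layer_out = sigmoid(hidden @ W2 + b2)
rounded = np.round(two_layer_out).ravel()
print(monotonic, rounded)
```

Training would still have to find such weights by backpropagation through both layers; the sketch only shows that the hidden layer makes the target pattern reachable at all.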
As already said, you have built a simple linear logistic regression model.
The sigmoid in your NN is only used to get the prediction of your model and not to actually non-linearly train the NN.
A good start at learning neural networks is this:

how to predict only one class in tensorflow

Suppose you want to predict only one class. Then you first need to relabel your vectors: label as 'one' all vectors whose ground truth is 5, and as 'zero' all vectors whose ground truth is not 5.
How can I implement this in TensorFlow using python?
While preparing the data, you can use numpy to set all the data points in class 5 to 1 and all the others to 0 using np.where:
arr = np.where(arr == 5, 1, 0)
Then you can create a binary classifier in TensorFlow to classify them, using a binary_crossentropy loss to optimize the classifier.
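Here is a small NumPy sketch of the relabeling step, together with what a binary cross-entropy loss computes on such labels. The sample labels and the predicted probabilities are made-up values, and binary_crossentropy below is a hand-written helper for illustration, not the TensorFlow function.

```python
import numpy as np

# Made-up ground-truth classes; class 5 is the one we want to detect.
labels = np.array([1, 5, 3, 5, 2])
binary_labels = np.where(labels == 5, 1, 0)


def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


# Hypothetical classifier outputs: probability that each sample is class 5.
p_pred = np.array([0.1, 0.9, 0.2, 0.8, 0.1])
loss = binary_crossentropy(binary_labels, p_pred)
print(binary_labels, loss)
```

The binary_labels array is what you would feed to the classifier as the target.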
