pyspark, logistic regression, how to get coefficient of respective features - python

I am new to Spark; my current version is 1.3.1. I want to implement logistic regression with PySpark, so I found this example in the Spark Python MLlib documentation:
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib.regression import LabeledPoint
from numpy import array
# Load and parse the data
def parsePoint(line):
    values = [float(x) for x in line.split(' ')]
    return LabeledPoint(values[0], values[1:])
data = sc.textFile("data/mllib/sample_svm_data.txt")
parsedData = data.map(parsePoint)
# Build the model
model = LogisticRegressionWithLBFGS.train(parsedData)
# Evaluating the model on training data
labelsAndPreds = parsedData.map(lambda p: (p.label, model.predict(p.features)))
trainErr = labelsAndPreds.filter(lambda (v, p): v != p).count() / float(parsedData.count())
print("Training Error = " + str(trainErr))
I found that the attributes of model are:
In [21]: model.<TAB>
model.clearThreshold model.predict model.weights
model.intercept model.setThreshold
How can I get the coefficients of logistic regression?

As you noticed, the way to obtain the coefficients is through the LogisticRegressionModel's attributes.
Parameters:
weights – Weights computed for every feature.
intercept – Intercept computed for this model. (Only used in Binary Logistic Regression. In Multinomial Logistic Regression, the intercepts will not be a single value, so the intercepts will be part of the weights.)
numFeatures – The dimension of the features.
numClasses – The number of possible outcomes for k-class classification in Multinomial Logistic Regression. By default, it is binary logistic regression, so numClasses will be set to 2.
Don't forget that hθ(x) = 1 / (1 + exp(-(θ0 + θ1 * x1 + ... + θn * xn))), where θ0 represents the intercept, [θ1, ..., θn] the weights, and n is the number of features.
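For example, with the model trained above, the weights and intercept can be read off and plugged into that formula directly. A minimal sketch, assuming the model and parsedData objects from the question:
import numpy as np
# The coefficients live in model.weights, the bias term in model.intercept.
print(model.weights)       # [θ1, ..., θn], one value per feature
print(model.intercept)     # θ0
# Reproduce the binary-class probability for one training point by hand:
x = parsedData.first().features
margin = model.weights.dot(x) + model.intercept
prob = 1.0 / (1.0 + np.exp(-margin))
print(prob)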
Edit
This is how the prediction is done; you can check the LogisticRegressionModel source:
def predict(self, x):
    """
    Predict values for a single data point or an RDD of points
    using the model trained.
    """
    if isinstance(x, RDD):
        return x.map(lambda v: self.predict(v))

    x = _convert_to_vector(x)
    if self.numClasses == 2:
        margin = self.weights.dot(x) + self._intercept
        if margin > 0:
            prob = 1 / (1 + exp(-margin))
        else:
            exp_margin = exp(margin)
            prob = exp_margin / (1 + exp_margin)
        if self._threshold is None:
            return prob
        else:
            return 1 if prob > self._threshold else 0
    else:
        best_class = 0
        max_margin = 0.0
        if x.size + 1 == self._dataWithBiasSize:
            for i in range(0, self._numClasses - 1):
                margin = x.dot(self._weightsMatrix[i][0:x.size]) + \
                    self._weightsMatrix[i][x.size]
                if margin > max_margin:
                    max_margin = margin
                    best_class = i + 1
        else:
            for i in range(0, self._numClasses - 1):
                margin = x.dot(self._weightsMatrix[i])
                if margin > max_margin:
                    max_margin = margin
                    best_class = i + 1
        return best_class

I'm using
model.coefficients
and it works!
Documentation:
https://spark.apache.org/docs/2.4.5/api/python/pyspark.ml.html?highlight=coefficients#pyspark.ml.classification.LogisticRegressionModel.coefficients
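Note that coefficients (along with intercept) belongs to the DataFrame-based pyspark.ml API rather than the RDD-based pyspark.mllib one used in the question. A minimal sketch, assuming an existing SparkSession called spark:
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

# Tiny toy DataFrame with a label column and a vector feature column.
df = spark.createDataFrame(
    [(0.0, Vectors.dense([0.0, 1.1])),
     (1.0, Vectors.dense([2.0, 1.0])),
     (1.0, Vectors.dense([2.0, 1.3]))],
    ["label", "features"])

lr = LogisticRegression(maxIter=10, regParam=0.01)
lr_model = lr.fit(df)
print(lr_model.coefficients)   # per-feature coefficients as a DenseVector
print(lr_model.intercept)      # the intercept term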

Related

Difference in cost result between Sklearn and manual implementation of logistic learning

I have been trying to replicate scikit-learn's logistic regression results with a manual implementation. There is a minor difference in the cost, so I think there must be a mistake somewhere.
Here is the code from Sklearn:
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=0.30)
classifier = sklearn.linear_model.LogisticRegression(max_iter = 10000)
classifier.fit(X_train,Y_train)
function = np.dot(X_train, classifier.coef_.reshape(-1,1)) + classifier.intercept_ - Y_train.reshape(-1,1)
activation = 1 / (1 + np.exp(-function))
cost = np.sum(-(Y_train.reshape(-1,1) * np.log(activation) + (1-Y_train.reshape(-1,1)) * np.log(1-activation)))/Y_train.shape[0]
print(cost)
cost = 0.09494712120076532
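For reference, a minimal cross-check of the fitted model's average cross-entropy can be computed with scikit-learn's own log_loss (this sketch assumes the classifier and split above). Note also that LogisticRegression applies L2 regularization by default (C=1.0), which a hand-rolled gradient descent does not include:
from sklearn.metrics import log_loss

p_train = classifier.predict_proba(X_train)[:, 1]   # P(y=1|x) for each training row
print(log_loss(Y_train, p_train))                   # average cross-entropy on the training set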
My attempt of replicating logistic regression model manually:
## My Logistic Model:
## Preprocessing
ones_train = np.ones((X_train.shape[0],1)) # Vector of ones for train X
ones_test = np.ones((X_test.shape[0],1)) # Vector of ones for test X
X_train_rev = np.concatenate((X_train,ones_train),axis = 1) # Append Intercept column in the training X data
X_test_rev = np.concatenate((X_test,ones_test),axis = 1) # Append Intercept column in the test X data
Y_train = Y_train.reshape(-1,1) # Reshape Y_train
Y_test = Y_test.reshape(-1,1) # Reshape Y_test
m = X_train_rev.shape[0]
n = X_train_rev.shape[1]
# Parameterization
alpha = 0.001 # Learning rate
# Initialization
coefficient = np.random.randn(1,n) # Initialisation of coefficients including intercept
# Update Gradients
for i in range(10000):
    Z = np.dot(X_train_rev, coefficient.reshape(-1,1)) - Y_train.reshape(-1,1) # Compute Z
    A = 1 / (1 + np.exp(-Z)) # Compute A
    cost = np.sum(-(Y_train.reshape(-1,1) * np.log(A) + (1-Y_train.reshape(-1,1)) * np.log(1-A)))/X_train_rev.shape[0] # Compute cost
    if i % 1000 == 0: print(cost) # Print cost
    grad = np.dot((A - Y_train).T, X_train_rev) # Compute gradient
    coefficient = coefficient - (alpha * grad) # Adjust coefficients including intercept
Cost printed at every 1000th iteration:
1.4916689930810232
0.0875458497191988
0.0875181643157349
0.08751717663190926
0.08751704000862144
0.08751701549904194
0.08751701099400619
0.08751701016449238
0.08751701001176625
0.08751700998365015
There is a minor difference of approximately 0.01 between scikit-learn and my manual implementation, which I guess is considered a big difference in ML. I re-ran the code multiple times, and the cost from the manual implementation is consistently about 0.01 lower than what scikit-learn gives.
Here are the coefficients learned by the two models. They are quite different:
Thank you all

Scipy fails to minimize cost function

Currently I'm taking Andrew Ng's "Machine Learning" course on Coursera. In exercise 5, we built a model that can predict digits, trained on the MNIST dataset. I completed this task successfully in Matlab, but I wanted to migrate the code to Python, just to see how different things are and maybe continue to play around with the model.
I managed to implement the cost function and the back-propagation algorithm correctly. I know that because I compared the metrics with my working Matlab model and it emits the same numbers.
Now, because in the course we train the model using fmincg, I tried to do the same using SciPy's fmin_cg function.
My problem is that the cost function takes extremely small steps and fails to converge.
Here is my code for the network:
import numpy as np
import utils
import scipy.optimize as op


class Network:
    def __init__(self, layers):
        self.layers = layers
        self.weights = self.generate_params()

    # Function for generating theta multidimensional matrix
    def generate_params(self):
        theta = []
        epsilon = 0.12
        for i in range(len(self.layers) - 1):
            current_layer_units = self.layers[i]
            next_layer_units = self.layers[i + 1]
            theta_i = np.multiply(
                np.random.rand(next_layer_units, current_layer_units + 1),
                2 * epsilon - epsilon
            )
            # Appending the params to the theta matrix
            theta.append(theta_i)
        return theta

    # Function to append bias row/column to matrix X
    def append_bias(self, X, d):
        m = X.shape[0]
        n = 1 if len(X.shape) == 1 else X.shape[1]
        if (d == 'column'):
            ones = np.ones((m, n + 1))
            ones[:, 1:] = X.reshape((m, n))
        elif (d == 'row'):
            ones = np.ones((m + 1, n))
            ones[1:, :] = X.reshape((m, n))
        return ones

    # Function for computing the gradient for 1 training example
    def back_prop(self, y, feed, theta):
        activations = feed["activations"]
        weighted_layers = feed["weighted_layers"]
        delta_output = activations[-1] - y.reshape(len(y), 1)
        current_delta = delta_output
        # Initializing gradients
        gradients = []
        for i, theta_i in enumerate(theta):
            gradients.append(np.zeros(theta_i.shape))
        # Performing delta calculations.
        # Here, we continue to propagate the delta values backwards
        # until we arrive at the second layer.
        for i in reversed(range(len(theta))):
            theta_i = theta[i]
            if (i > 0):
                i_weighted_inputs = self.append_bias(weighted_layers[i - 1], 'row')
                t_theta_i = np.transpose(theta_i)
                delta_i = np.multiply(np.dot(t_theta_i, current_delta), utils.sigmoidGradient(i_weighted_inputs))
                delta_i = delta_i[1:]
                gradients[i] = current_delta * np.transpose(activations[i])
                # Setting current delta for the next layer
                current_delta = delta_i
            else:
                gradients[i] = current_delta * np.transpose(activations[i])
        return gradients

    # Function for computing the cost and the derivatives
    def compute_cost(self, theta, X, y, r12n = 0):
        m = len(X)
        num_labels = self.layers[-1]
        costs = np.zeros(m)
        # Initializing gradients
        gradients = []
        for i, theta_i in enumerate(theta):
            gradients.append(np.zeros(theta_i.shape))
        # Iterating over the training set
        for i in range(m):
            inputs = X[i]
            observed = utils.create_output_vector(y[i], num_labels)
            feed = self.feed_forward(inputs)
            predicted = feed["activations"][-1]
            total_cost = 0
            for k, o in enumerate(observed):
                if (o == 1):
                    total_cost += np.log(predicted[k])
                else:
                    total_cost += np.log(1 - predicted[k])
            cost = -1 * total_cost
            # Storing the cost for the i-th training example
            costs[i] = cost
            # Calculating the gradient for this training example
            # using back propagation algorithm
            gradients_i = self.back_prop(observed, feed, theta)
            for i, gradient in enumerate(gradients_i):
                gradients[i] += gradient
        # Calculating the avg regularization term for the cost
        sum_of_theta = 0
        for i, theta_i in enumerate(theta):
            squared_theta = np.power(theta_i[:, 1:], 2)
            sum_of_theta += np.sum(squared_theta)
        r12n_avg = r12n * sum_of_theta / (2 * m)
        total_cost = np.sum(costs) / m + r12n_avg
        # Applying regularization terms to the gradients
        for i, theta_i in enumerate(theta):
            lambda_i = np.copy(theta_i)
            lambda_i[:, 0] = 0
            lambda_i = np.multiply((r12n / m), lambda_i)
            # Adding the r12n matrix to the gradient
            gradients[i] = gradients[i] / m + lambda_i
        return total_cost, gradients

    # Function for training the neural network using conjugate gradient algorithm
    def train_cg(self, X, y, r12n = 0, iterations = 50):
        weights = self.weights

        def Cost(theta, X, y):
            theta = utils.roll_theta(theta, self.layers)
            cost, _ = self.compute_cost(theta, X, y, r12n)
            print(cost)
            return cost

        def Gradient(theta, X, y):
            theta = utils.roll_theta(theta, self.layers)
            _, gradient = self.compute_cost(theta, X, y, r12n)
            return utils.unroll_theta(gradient)

        unrolled_theta = utils.unroll_theta(weights)
        result = op.fmin_cg(f = Cost,
                            x0 = unrolled_theta,
                            args=(X, y),
                            fprime=Gradient,
                            maxiter = iterations)
        self.weights = utils.roll_theta(result, self.layers)

    # Function for feeding forward the network
    def feed_forward(self, X):
        # Useful variables
        activations = []
        weighted_layers = []
        weights = self.weights
        currentActivations = self.append_bias(X, 'row')
        activations.append(currentActivations)
        for i in range(len(self.layers) - 1):
            layer_weights = weights[i]
            weighted_inputs = np.dot(layer_weights, currentActivations)
            # Storing the weighted inputs
            weighted_layers.append(weighted_inputs)
            activation_nodes = []
            # If the next layer is not the output layer, we'd like to add a bias unit to it
            # (Excluding the input and the output layer)
            if (i < len(self.layers) - 2):
                activation_nodes = self.append_bias(utils.sigmoid(weighted_inputs), 'row')
            else:
                activation_nodes = utils.sigmoid(weighted_inputs)
            # Appending the layer of nodes to the activations array
            activations.append(activation_nodes)
            currentActivations = activation_nodes
        data = {
            "activations": activations,
            "weighted_layers": weighted_layers
        }
        return data

    def predict(self, X):
        data = self.feed_forward(X)
        output = data["activations"][-1]
        # Finding the max index in the output layer
        return np.argmax(output, axis=0)
Here is the invocation of the code:
import numpy as np
from network import Network
# %% Load data
X = np.genfromtxt('data/mnist_data.csv', delimiter=',')
y = np.genfromtxt('data/mnist_outputs.csv', delimiter=',').astype(int)
# %% Create network
num_labels = 10
input_layer = 400
hidden_layer = 25
output_layer = num_labels
layers = [input_layer, hidden_layer, output_layer]
# Create a new neural network
network = Network(layers)
# %% Train the network and save the weights
network.train_cg(X, y, r12n = 1, iterations = 20)
This is what the code emits after each iteration:
15.441233231650283
15.441116436313076
15.441192262452514
15.44122384651483
15.441231216030646
15.441232804294314
15.441233141284435
15.44123321255294
15.441233227614855
As you can see, the changes to the cost are very small.
I checked the shapes of the vectors and gradients and they both seem fine, just like in my Matlab implementation. I'm not sure what I'm doing wrong here.
If you guys could help me, that'd be great :)
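One way to narrow this down (a sketch under the assumption that X, y, utils and the Network class above are importable) is to compare the analytic gradient with a numerical one via scipy.optimize.check_grad on a tiny subset before calling fmin_cg; a result far from zero would point at the back-propagation, a result near zero at the optimization setup:
import numpy as np
import scipy.optimize as op

# Use a small hidden layer and only a few examples so the numerical check stays cheap.
check_net = Network([400, 5, 10])
theta0 = utils.unroll_theta(check_net.weights)

def cost_only(t):
    rolled = utils.roll_theta(t, check_net.layers)
    cost, _ = check_net.compute_cost(rolled, X[:5], y[:5], 1)
    return cost

def grad_only(t):
    rolled = utils.roll_theta(t, check_net.layers)
    _, grads = check_net.compute_cost(rolled, X[:5], y[:5], 1)
    return utils.unroll_theta(grads)

print(op.check_grad(cost_only, grad_only, theta0))  # should be close to 0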

How to create a customized scoring function in scikit-learn for scoring a set of instances based on their individual properties?

I'm trying to perform GridSearchCV to optimize the hyperparameters of my classifier, which should be done by optimizing a custom scoring function. The problem is that the scoring function depends on a cost that is different for each instance (the cost is also a feature of each instance). As shown in the example below, another array test_amt is needed that holds the cost of each instance (in addition to the 'normal' scoring function arguments y and y_pred).
def calculate_costs(y_test, y_test_pred, test_amt):
    cost = 0
    for i in range(1, len(y_test)):
        y = y_test.iloc[i]
        y_pred = y_test_pred.iloc[i]
        x_amt = test_amt.iloc[i]
        if y == 0 and y_pred == 0:
            cost -= x_amt * 1.1
        elif y == 0 and y_pred == 1:
            cost += x_amt
        elif y == 1 and y_pred == 0:
            cost += x_amt * 1.1
        elif y == 1 and y_pred == 1:
            cost += 0
        else:
            print("ERROR! No cost could be assigned to the instance: " + str(i))
    return cost
When I call this function after training with the three arrays, it correctly calculates the total cost that results from a model. However, integrating this into GridSearchCV is difficult, because the scoring function only expects two parameters. While it is possible to pass additional kwargs to the scorer, I have no clue how to pass a subset that depends on the split GridSearchCV is currently working on.
What I have thought of / tried so far:
Wrapping the whole pipeline in a class with a globally stored pandas.Series object that stores the cost of each instance with an index. Then, it would theoretically be possible to reference the cost of an instance by calling it with the same index. Unfortunately, this does not work as scikit-learn transforms everything into a numpy array.
def calculate_costs_class(y_test, y_test_pred):
    cost = 0
    for index, _ in y_test.iteritems():
        y = y_test.loc[index]
        y_pred = y_test_pred.loc[index]
        x_amt = self.test_amt.loc[index]
        if y == 0 and y_pred == 0:
            cost += (x_amt * (-1)) + 5 + (x_amt * 0.1)  # -revenue, +shipping, +fees
        elif y == 0 and y_pred == 1:
            cost += x_amt  # +revenue
        elif y == 1 and y_pred == 0:
            cost += x_amt + 5 + (x_amt * 0.1) + 5  # +revenue, +shipping, +fees, +charge cost
        elif y == 1 and y_pred == 1:
            cost += 0  # nothing
        else:
            print("ERROR! No cost could be assigned to the instance: " + str(index))
    return cost
Creating a custom PseudoInt class as the data type of the label, which inherits all properties from int but can also store the cost of an instance (while retaining its properties for logical operations). While this works outside of scikit-learn, the check_classification_targets method in scikit-learn raises a ValueError: Unknown label type: 'unknown'.
class PseudoInt(int):
    def __new__(cls, x, cost, *args, **kwargs):
        instance = int.__new__(cls, x, *args, **kwargs)
        instance.cost = cost
        return instance
I haven't tried it, but I thought of this: since the cost is also a feature in the instance set X, it is also available in the __call__ method of the _PredictScorer(_BaseScorer) class in scikit-learn's scorer.py. If I reprogrammed the call function to also pass the cost array as a subset of X to the score_func, I would also have the cost.
Or: I could just implement everything myself.
Is there an "easier" solution?
I found a way to solve the problem by going down the path of the second proposed approach: passing a PseudoInt to scikit-learn that behaves exactly like a normal integer in comparisons and mathematical operations, but also acts as a wrapper around the int so that instance variables (such as the cost of an instance) can be stored on it. As already stated in the question, this causes scikit-learn to detect that the values inside the passed label array are actually of type object rather than int. So I replaced the test in the type_of_target(y) method of scikit-learn's multiclass.py at line 273 to return 'binary' even though it doesn't pass the test, so that scikit-learn treats the whole problem (as it should be treated) as a binary classification problem. Lines 269-273 of the type_of_target(y) method in multiclass.py now look like this:
# Invalid inputs
if y.ndim > 2 or (y.dtype == object and len(y) and
        not isinstance(y.flat[0], string_types)):
    # return 'unknown'  # [[[1, 2]]] or [obj_1] and not ["label_1"]
    return 'binary'  # Sneaky, modified to force binary classification.
My code then looks like this:
import sklearn
import sklearn.model_selection
import sklearn.base
import sklearn.metrics
import numpy as np
import sklearn.tree
import sklearn.feature_selection
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.metrics.scorer import make_scorer
class PseudoInt(int):
    # Behaves like an integer, but is able to store instance variables
    pass


def grid_search(x, y_normal, x_amounts):
    # Change the label set to a np array containing pseudo ints with the costs associated with the instances
    y = np.empty(len(y_normal), dtype=PseudoInt)
    for index, value in y_normal.iteritems():
        new_int = PseudoInt(value)
        new_int.cost = x_amounts.loc[index]  # Here the cost is added to the label
        y[index] = new_int

    # Normal train test split
    x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.2)
    # Classifier
    clf = sklearn.tree.DecisionTreeClassifier()
    # Custom scorer with the cost function below (lower cost is better)
    cost_scorer = make_scorer(cost_function, greater_is_better=False)
    # Define pipeline
    pipe = Pipeline([('clf', clf)])
    # Grid search grid with any hyper parameters or other settings
    param_grid = [
        {'sfs__estimator__criterion': ['gini', 'entropy']}
    ]
    # Grid search and pass the custom scorer function
    gs = GridSearchCV(estimator=pipe,
                      param_grid=param_grid,
                      scoring=cost_scorer,
                      n_jobs=1,
                      cv=5,
                      refit=True)
    # Run grid search and refit with best hyper parameters
    gs = gs.fit(x_train.as_matrix(), y_train)
    print("Best Parameters: " + str(gs.best_params_))
    print('Best Accuracy: ' + str(gs.best_score_))
    # Predict with retrained model (with best parameters)
    y_test_pred = gs.predict(x_test.as_matrix())
    # Get scores (also cost score)
    get_scores(y_test, y_test_pred)


def get_scores(y_test, y_test_pred):
    print("Getting scores")
    print("SCORES")
    precision = sklearn.metrics.precision_score(y_test, y_test_pred)
    recall = sklearn.metrics.recall_score(y_test, y_test_pred)
    f1_score = sklearn.metrics.f1_score(y_test, y_test_pred)
    accuracy = sklearn.metrics.accuracy_score(y_test, y_test_pred)
    print("Precision " + str(precision))
    print("Recall " + str(recall))
    print("Accuracy " + str(accuracy))
    print("F1_Score " + str(f1_score))
    print("COST")
    cost = cost_function(y_test, y_test_pred)
    print("Cost Savings " + str(-cost))
    print("CONFUSION MATRIX")
    cnf_matrix = sklearn.metrics.confusion_matrix(y_test, y_test_pred)
    cnf_matrix = cnf_matrix.astype('float') / cnf_matrix.sum(axis=1)[:, np.newaxis]
    print(cnf_matrix)


def cost_function(y_test, y_test_pred):
    """
    Calculates total cost based on TP, FP, TN, FN and the cost of a certain instance
    :param y_test: Has to be an array of PseudoInts containing the cost of each instance
    :param y_test_pred: Any array of PseudoInts or ints
    :return: Returns total cost
    """
    cost = 0
    for index in range(len(y_test)):
        # print(index)
        y = y_test[index]
        y_pred = y_test_pred[index]
        x_amt = y.cost
        if y == 0 and y_pred == 0:
            cost -= x_amt  # Reducing cost by x_amt
        elif y == 0 and y_pred == 1:
            cost += x_amt  # Wrong classification adds cost
        elif y == 1 and y_pred == 0:
            cost += x_amt + 5  # Wrong classification adds cost and fee
        elif y == 1 and y_pred == 1:
            cost += 0  # No cost
        else:
            raise ValueError("No cost could be assigned to the instance: " + str(index))
    # print("Cost: " + str(cost))
    return cost
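A tiny usage sketch of cost_function with hand-built labels (the label/amount values are hypothetical, just to illustrate how the per-instance cost rides along on the label):
y_true = np.empty(3, dtype=object)
for i, (label, amount) in enumerate([(0, 100.0), (1, 250.0), (1, 80.0)]):
    pseudo = PseudoInt(label)
    pseudo.cost = amount
    y_true[i] = pseudo

y_pred = np.array([0, 0, 1])
print(cost_function(y_true, y_pred))  # -100.0 + (250.0 + 5) + 0 = 155.0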
UPDATE
Instead of changing the files in the package directly (which is kind of dirty), I now add the following to the first import lines of my project:
import sklearn.utils.multiclass

def return_binary(y):
    return "binary"

sklearn.utils.multiclass.type_of_target = return_binary
This overrides the type_of_target(y) method in sklearn.utils.multiclass so that it always returns 'binary'. Note that this has to come before all the other sklearn imports.

Logistic Regression with regularization in python failing to minimize

I'm implementing logistic regression based on the Coursera documentation, both in python and Octave.
In Octave, I managed to do it and achieve the right training accuracy, but in Python, since I don't have access to fminunc, I cannot figure out a workaround.
Currently, this is my code:
df = pandas.DataFrame.from_csv('ex2data2.txt', header=None, index_col=None)
df.columns = ['x1', 'x2', 'y']
y = df[df.columns[-1]].as_matrix()
m = len(y)
y = y.reshape(m, 1)
X = df[df.columns[:-1]]
X = X.as_matrix()
from sklearn.preprocessing import PolynomialFeatures
feature_mapper = PolynomialFeatures(degree=6)
X = feature_mapper.fit_transform(X)
def sigmoid(z):
    return 1/(1+np.power(np.e, z))

def cost_function_reg(theta):
    _theta = theta.copy().reshape(-1, 1)
    shifted_theta = np.insert(_theta[1:], 0, 0)
    h = sigmoid(np.dot(X, _theta))
    reg = (_lambda / (2.0*m)) * shifted_theta.T.dot(shifted_theta)
    J = ((1.0/m)*(-y.T.dot(np.log(h)) - (1 - y).T.dot(np.log(1-h)))) + reg
    return J

def gradient(theta):
    _theta = theta.copy().reshape(-1, 1)
    shifted_theta = np.insert(_theta[1:], 0, 0)
    h = sigmoid(np.dot(X, _theta))
    gradR = _lambda*shifted_theta
    gradR.shape = (gradR.shape[0], 1)
    grad = (1.0/m)*(X.T.dot(h-y)+gradR)
    return grad.flatten()
from scipy.optimize import *
theta = fmin_ncg(cost_f, initial_theta, fprime=gradient)
predictions = predict(theta, X)
accuracy = np.mean(np.double(predictions == y)) * 100
print 'Train Accuracy: %.2f' % accuracy
The output is:
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: 0.693147
Iterations: 0
Function evaluations: 22
Gradient evaluations: 12
Hessian evaluations: 0
Train Accuracy: 50.85
In Octave, the accuracy is 83.05.
Any help is appreciated.
There were two problems with that implementation:
The first one: fmin_ncg is not ideal for this minimization. I had used it in the previous exercise, but it was failing to find the theta with that gradient function, which matches the one in Octave.
Switching to
theta = fmin_bfgs(cost_function_reg, initial_theta)
Fixed that issue.
The second issue was that the accuracy was being miscalculated.
Once I optimized with fmin_bfgs and achieved the cost that matched the Octave result (0.529), the (predictions == y) part had mismatched shapes ((118, 118) and (118, 1)), yielding a matrix that was MxM instead of a vector.
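A minimal sketch of the shape fix, assuming predict returns a flat array of 0/1 labels: broadcasting a (118,) array against a (118, 1) column produces a 118x118 matrix, so reshaping the predictions into a column first keeps the comparison element-wise:
predictions = predict(theta, X).reshape(-1, 1)          # shape (118, 1), matching y
accuracy = np.mean(np.double(predictions == y)) * 100   # element-wise comparison now
print('Train Accuracy: %.2f' % accuracy)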

Using theano to implement maximum likelihood learning in neural probability language model Python

I'm trying to implement maximum likelihood learning for a neural probabilistic language model in Python, starting from this log-bilinear model code:
https://github.com/wenjieguan/Log-bilinear-language-models/blob/master/lbl.py
I used Theano's grad function to compute the gradient and tried to use a compiled train function to update the model parameters, but it raises errors. Here is my code:
def train(self, sentences, alpha = 0.001, batches = 1000):
    print('Start training...')
    self.alpha = alpha
    count = 0
    RARE = self.vocab['<>']
    #print RARE
    q = np.zeros(self.dim, np.float32)
    #print q
    delta_context = [np.zeros((self.dim, self.dim), np.float32) for i in range(self.context)]
    #print delta_context
    delta_feature = np.zeros((len(self.vocab), self.dim), np.float32)
    #print delta_feature
    for sentence in sentences:
        sentence = self.start_sen + sentence + self.end_sen
        for pos in range(self.context, len(sentence)):
            count += 1
            q.fill(0)
            featureW = []
            contextMatrix = []
            indices = []
            for i, r in enumerate(sentence[pos - self.context : pos]):
                if r == '<_>':
                    continue
                index = self.vocab.get(r, RARE)
                print index
                indices.append(index)
                ri = self.featureVectors[index]
                #print ri
                ci = self.contextMatrix[i]
                #print ci
                featureW.append(ri)
                contextMatrix.append(ci)
                # Calculating predicted representation for the target word
                q += np.dot(ci, ri)
            # Computing energy function
            energy = np.exp(np.dot(self.featureVectors, q) + self.biases)
            #print energy
            # Computing the conditional distribution
            probs = energy / np.sum(energy)
            #print probs
            w_index = self.vocab.get(sentence[pos], RARE)
            # Computing gradient
            logProbs = T.log(probs[w_index])
            print 'Gradient start...'
            delta_context, delta_feature = T.grad(logProbs, [self.contextMatrix, self.featureVectors])
            print 'Gradient completed!'
            train = theano.function(
                inputs = [self.vocab],
                outputs = [logProbs],
                updates=((self.featureVectors, self.featureVectors - self.alpha * delta_feature),
                         (self.contextMatrix, self.contextMatrix - self.alpha * delta_context)),
                name="train"
            )
    print('Training is finished!')
I have only just learned about Python and neural probabilistic language models, so this is quite difficult for me.
Could you help me, please? Thank you!
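Theano's grad and function work on symbolic variables, not on NumPy arrays built inside a Python loop, so the parameters need to be theano.shared values and the whole log-probability needs to be expressed symbolically once, before compiling a single update function. A rough standalone sketch of that pattern (the shapes, names, and the single context matrix here are assumptions, not the structure of the lbl.py code):
import numpy as np
import theano
import theano.tensor as T

rng = np.random.RandomState(0)
V, D = 100, 20                                   # hypothetical vocabulary size / embedding dim
R = theano.shared(rng.randn(V, D), name='R')     # word feature vectors
C = theano.shared(rng.randn(D, D), name='C')     # a single context matrix (simplification)
b = theano.shared(np.zeros(V), name='b')         # per-word biases

context_index = T.iscalar('context_index')       # index of the (single) context word
target_index = T.iscalar('target_index')         # index of the word to predict

q = T.dot(C, R[context_index])                   # predicted representation of the target
energy = T.exp(T.dot(R, q) + b)
probs = energy / T.sum(energy)                   # conditional distribution over the vocabulary
log_prob = T.log(probs[target_index])

alpha = 0.001
gR, gC = T.grad(log_prob, [R, C])                # symbolic gradients w.r.t. shared parameters
train_step = theano.function(
    inputs=[context_index, target_index],
    outputs=log_prob,
    updates=[(R, R + alpha * gR),                # gradient ascent on the log-likelihood
             (C, C + alpha * gC)],
    name='train_step')

print(train_step(3, 7))                          # one maximum-likelihood update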
