I've been searching on how to perform matrix factorization for this very simple and basic case that I will show, but didn't find anything. I only found complex and long solutions, so I will present what I want to solve:
U x V = A
I would just like to know how to solve this equation in TensorFlow 2, where A is a known sparse matrix and U and V are two randomly initialized matrices. That is, I would like to find U and V such that their product is approximately equal to A.
For example, having these variables:
# I use this function to build a toy dataset for the sparse matrix
def build_rating_sparse_tensor(ratings):
    indices = ratings[['U_num', 'V_num']].values
    values = ratings['rating'].values
    return tf.SparseTensor(
        indices=indices,
        values=values,
        dense_shape=[ratings.U_num.max()+1, ratings.V_num.max()+1])
# here I create what will be the matrix A
ratings = (pd.DataFrame({'U_num': list(range(0,10_000))*30,
'V_num': list(range(0,60_000))*5,
'rating': np.random.randint(6, size=300_000)})
.sample(1000)
.drop_duplicates(subset=['U_num','V_num'])
.sort_values(['U_num','V_num'], ascending=[1,1]))
# Variables
A = build_rating_sparse_tensor(ratings)
U = tf.Variable(tf.random.normal(
    [A.shape[0], embeddings], stddev=init_stddev))
# this matrix would be transposed in the equation
V = tf.Variable(tf.random.normal(
    [A.shape[1], embeddings], stddev=init_stddev))
# loss function
def sparse_mean_square_error(sparse_ratings, user_embeddings, movie_embeddings):
    predictions = tf.reduce_sum(
        tf.gather(user_embeddings, sparse_ratings.indices[:, 0]) *
        tf.gather(movie_embeddings, sparse_ratings.indices[:, 1]),
        axis=1)
    loss = tf.losses.mean_squared_error(sparse_ratings.values, predictions)
    return loss
Is it possible to do this with a particular loss function, optimizer and learning schedule?
Thank you very much.
A naive and straightforward approach using TensorFlow 2:
Note that rating was converted to float32, because TensorFlow cannot compute gradients over integers; see https://github.com/tensorflow/tensorflow/issues/20524.
A = build_rating_sparse_tensor(ratings)
indices = ratings[["U_num", "V_num"]].values
embeddings = 3000
U = tf.Variable(tf.random.normal([A.shape[0], embeddings]), dtype=tf.float32)
V = tf.Variable(tf.random.normal([embeddings, A.shape[1]]), dtype=tf.float32)
optimizer = tf.optimizers.Adam()
trainable_weights = [U, V]
for step in range(100):
    with tf.GradientTape() as tape:
        A_prime = tf.matmul(U, V)
        # indexing the result based on the indices of A that contain a value
        A_prime_sparse = tf.gather(
            tf.reshape(A_prime, [-1]),
            indices[:, 0] * tf.shape(A_prime)[1] + indices[:, 1],
        )
        loss = tf.reduce_sum(tf.metrics.mean_squared_error(A_prime_sparse, A.values))
    grads = tape.gradient(loss, trainable_weights)
    optimizer.apply_gradients(zip(grads, trainable_weights))
    if step % 20 == 0:
        print(f"Training loss at step {step}: {loss:.4f}")
We take advantage of the sparsity of A by calculating the loss only over the actual values of A. However, we still have to allocate two really big dense tensors for the trainable weights U and V, and tf.matmul(U, V) materializes the full dense product. For big sizes like the ones in your example, you will probably run into OOM errors.
Maybe it could be worth exploring another representation for your data.
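One way to avoid building the full dense product is to gather only the rows of U and V that correspond to observed entries, as the loss function in the question already does. The following is a minimal NumPy sketch of that idea with plain gradient descent; the toy sizes, the learning rate, and the squared-error objective (half the sum of squared residuals) are assumptions made for illustration, not values from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse matrix A stored as (row, col, value) triples; sizes are made up.
n_rows, n_cols, rank = 50, 80, 4
rows = rng.integers(0, n_rows, size=300)
cols = rng.integers(0, n_cols, size=300)
vals = rng.integers(0, 6, size=300).astype(np.float64)

U = rng.normal(scale=0.1, size=(n_rows, rank))
V = rng.normal(scale=0.1, size=(n_cols, rank))  # stored transposed, as in the question


def observed_mse(U, V):
    """Mean squared error over the observed entries only."""
    preds = np.sum(U[rows] * V[cols], axis=1)
    return np.mean((preds - vals) ** 2)


initial_loss = observed_mse(U, V)
lr = 0.005  # assumed step size for this toy problem
for step in range(500):
    preds = np.sum(U[rows] * V[cols], axis=1)
    err = preds - vals  # gradient of 0.5 * sum(err**2) w.r.t. preds
    grad_U = np.zeros_like(U)
    grad_V = np.zeros_like(V)
    # Scatter-add per-entry gradients; np.add.at handles repeated indices.
    np.add.at(grad_U, rows, err[:, None] * V[cols])
    np.add.at(grad_V, cols, err[:, None] * U[rows])
    U -= lr * grad_U
    V -= lr * grad_V
final_loss = observed_mse(U, V)
print(initial_loss, final_loss)
```

Only arrays of shape (number of observed entries, rank) are created per step, so memory scales with the number of ratings rather than with the size of the dense product.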
I am working on implementing an algorithm which approximates the solution of a partial differential equation. The main idea behind this is that I start at time 0 with a guess of the solution u[0], and its gradient z[0], and then use a recursive formula to approximately calculate the solution up to the last time point in a forward manner. The formula looks like this
u[i+1] = u[i] + f(t[i],x[i],u[i],z[i])*dt + z[i]*dW[i]
where the function f, the time discretization, the time step dt, and the increments dW of a Brownian motion are given. The gradient z[i] at time point i is approximated by a deep neural network with input x[i], which I have already implemented with tf.keras using two hidden dense layers. These networks perform quite well. So far, I have N (the number of time points) independent neural networks, each approximating z[i] for its own time point.
My task is to form a global neural network with input (x, W), where (u[0], z[0]) are given to this network as network parameters, such that the network can then optimize its parameters by minimizing the expected quadratic loss between its output, the approximation of u[N], and the given terminal condition g(x) of the partial differential equation. u[0] will then be the solution of the PDE. So while each of my neural networks approximating z[i] has 2 hidden layers, the global network should have 2*(N-1) layers in total.
My neural networks for the gradients look like this:
# Input dimension
d = 1
# Output dimension
d_1 = 1
# Number of neurons
m = d + 10
# Batch size
batch_size = 32
# Training data
x_tr = some_simulation()
z_tr = calculated_given(x_tr)
# Test data
x_te = some_simulation()
z_te = calculated_given(x_te)
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(d,), dtype=tf.float32))
model.add(tf.keras.layers.Dense(m, activation=tf.nn.tanh))
model.add(tf.keras.layers.Dense(m, activation=tf.nn.tanh))
model.add(tf.keras.layers.Dense(d_1, activation=tf.keras.activations.linear))
model.compile(optimizer='adam',
loss='MeanSquaredError',
metrics=[])
model.fit(x_tr, z_tr, batch_size=batch_size, epochs=10)
val_loss = model.evaluate(x_te, z_te)
print(val_loss)
So I have trained N of them, and saved each as a file using
model.save(path_to_model)
So given the approximations of the gradients, I now want to stack all the subnetworks together to form a global deep neural network based on the recursive formula above. It should take only the N-dimensional vectors x and W as input data, give the final value u[N] as output, and use (u[0], z[0]) as parameters. I have been trying for two days to wrap my head around how such a global neural network should be implemented in Python using tf.keras, so maybe someone can give me a push in the right direction?
I assume that you'll pass the tuple (x, dW, t) as input to your model, since t is also indexed. Furthermore, you can always create dW from W using np.diff. I also assume that u0 and z0 are scalars (common to all batches).
With all of that in mind, you can subclass the base Model and override its call() as follows
class GlobalModel(tf.keras.models.Model):
    def __init__(self, u0, z0, dt, subnet_list, **kwargs):
        super().__init__(**kwargs)
        self.u0 = tf.Variable(u0, trainable=True, dtype=tf.float32)
        self.z0 = tf.Variable(z0, trainable=True, dtype=tf.float32)
        self.dt = tf.constant(dt, dtype=tf.float32)
        self.subnet_list = subnet_list
        # Freeze the pre-trained subnets
        for subnet in subnet_list:
            subnet.trainable = False

    def f(self, t, x, u, z):
        # code of your function f() goes here
        raise NotImplementedError

    def step_update(self, t, x, u, z, dW):
        return u + self.f(t, x, u, z) * self.dt + z * dW

    def call(self, inputs, training=None):
        x, dW, t = inputs
        # First step
        x_i = tf.gather(x, 0, axis=1)
        dW_i = tf.gather(dW, 0, axis=1)
        t_i = tf.gather(t, 0, axis=1)
        u_i = self.step_update(t_i, x_i, self.u0, self.z0, dW_i)
        # Subsequent steps
        for i, subnet in enumerate(self.subnet_list):
            x_i = tf.gather(x, i+1, axis=1)
            dW_i = tf.gather(dW, i+1, axis=1)
            t_i = tf.gather(t, i+1, axis=1)
            z_i = subnet(x_i, training=False)
            u_i = self.step_update(t_i, x_i, u_i, z_i, dW_i)
        return u_i
You initialize this model by
global_model = GlobalModel(init_u0, init_z0, dt, subnet_list)
where subnet_list is a list of your pre-trained subnets, ordered by time index. That is, the subnet responsible for predicting z_i should be at index i-1 in this list.
After compiling, you call fit() on the model by
global_model.fit(x=(x_tr, dW_tr, t_tr), y=y_tr, batch_size=batch_size, epochs=epochs)
where y_tr is your target.
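To make the recursion concrete outside of Keras, here is a small NumPy sketch of the same forward pass for a single path. It builds dW from W with np.diff, as suggested above. The function name forward_u, the array z of precomputed gradient values (in the real model these come from the subnets), and the zero drift f used in the usage lines are all assumptions for illustration.

```python
import numpy as np


def forward_u(u0, z, f, t, x, W, dt):
    """Roll u forward with u[i+1] = u[i] + f(t[i], x[i], u[i], z[i]) * dt + z[i] * dW[i]."""
    dW = np.diff(W)  # Brownian increments W[i+1] - W[i]
    u = u0
    for i in range(len(dW)):
        u = u + f(t[i], x[i], u, z[i]) * dt + z[i] * dW[i]
    return u


# Hypothetical toy path: zero drift and z[i] = 1, so u_N = u0 + W[-1] - W[0].
t = np.linspace(0.0, 1.0, 5)
W = np.array([0.0, 0.2, -0.1, 0.4, 0.3])
x = np.zeros(5)
z = np.ones(4)
u_N = forward_u(u0=1.0, z=z, f=lambda t, x, u, z: 0.0, t=t, x=x, W=W, dt=0.25)
print(u_N)
```

With zero drift and constant z the sum telescopes, which gives a quick sanity check of the indexing before wiring in the trained subnets.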
I want to implement Logistic Regression with crossentropy loss using this formula for loss function and the gradient of it.
the image of loss function
the image of the gradient of the loss function
This is my code in Python. I want it to work without a for loop, using vectorized array operations instead.
w = np.array([1, 1, 1, 1])
hasses = []
for i in range(X.shape[0]):
soo = np.dot(y, X)
print(soo)
print(soo.shape)
mac = np.dot(y[i], w.T, X[i])
#print(mac)
print(mac.shape)
mac = 1 + np.exp(mac)
#print(mac)
print(mac.shape)
has = soo / mac
#print(has)
print(has.shape)
hasses.append(has)
sdf
hasses = np.array(hasses)
print(hasses)
print(hasses.shape)
res = np.sum(hasses)
print(res)
print(res.shape)
res /= X.shape[0]
print(res)
print(res.shape)
The dataset is the Titanic dataset. I have 3 features for X, plus one bias feature (which is always 1), and the labels are only +1 and -1.
Thanks.
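A possible vectorized version is sketched below. Since the formula images are not available here, it assumes the usual logistic loss for labels in {-1, +1}, L(w) = (1/N) * sum_i log(1 + exp(-y_i * w.x_i)), whose gradient is -(1/N) * sum_i y_i * x_i / (1 + exp(y_i * w.x_i)); this matches the shape of the loop above but may differ from the exact formulas in the images. The function name is hypothetical.

```python
import numpy as np


def logistic_loss_and_grad(w, X, y):
    """Loss and gradient for labels y in {-1, +1}; X has shape (N, D), w has shape (D,)."""
    margins = y * (X @ w)                         # y_i * w.x_i, shape (N,)
    loss = np.mean(np.logaddexp(0.0, -margins))   # stable log(1 + exp(-margin))
    coeff = -y / (1.0 + np.exp(margins))          # per-sample scalar factor, shape (N,)
    grad = (X.T @ coeff) / X.shape[0]             # sum over samples, shape (D,)
    return loss, grad


# Toy data standing in for the Titanic features: 3 features plus a bias column.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(20, 3)), np.ones((20, 1))])
y = rng.choice([-1.0, 1.0], size=20)
w = np.array([1.0, 1.0, 1.0, 1.0])
loss, grad = logistic_loss_and_grad(w, X, y)
print(loss, grad)
```

The matrix product X.T @ coeff replaces the per-sample loop: each row of X is scaled by its own coefficient and the rows are summed in one step.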
I have a tensorflow model whose outputs correspond to coefficients of multiple polynomials. Note that my model actually has another set outputs (multi-output), but I've mocked this below just by returning the input in addition to the polynomial coefficients.
I'm having a lot of trouble during the training of the model, related to tensor shapes. I've verified that the model is able to predict on sample inputs, and that the loss function works on sample outputs. But, during training, it immediately throws an error (see below)
For every input, the model takes in a fixed embedding-size input, and outputs coefficients for 2 polynomials of degree 2. For example, the output on a single input can look like:
[array([[[1, 2, 3],
[ 4, 5, 6]]]),
[...]]
corresponding to polynomials [1*x^2+2*x+3, 4*x^2+5*x+6]. Note that I've hidden the second output.
I noticed that tf.math.polyval requires a list of coefficients, making it wonky with AutoGrad. So, I implemented my own version of Horner's algorithm with pure tensors.
import numpy as np
import tensorflow as tf
import logging
import tensorflow.keras as K
@tf.function
def tensor_polyval(coeffs, x):
    """
    Calculates polynomial scalars from tensor of polynomial coefficients
    Tensorflow tf.math.polyval requires a list coeff, which isn't compatible with autograd
    # Inputs:
    - coeffs (NxD Tensor): each row of coeffs corresponds to r[0]*x^(D-1)+r[1]*x^(D-2)...+r[D]
    - x: Scalar!
    # Output:
    - r[0]*x^(D-1)+r[1]*x^(D-2)...+r[D] for row in coeffs
    """
    p = coeffs[:, 0]
    for i in range(1,coeffs.shape[1]):
        tf.autograph.experimental.set_loop_options(
            shape_invariants=[(p, tf.TensorShape([None]))])
        c = coeffs[:, i]
        p = tf.add(c, tf.multiply(x, p))
    return p
@tf.function
def coeffs_to_poly(coeffs, n):
    # Converts a NxD array of coefficients to N evaluated polynomials at x=n
    return tensor_polyval(coeffs, tf.convert_to_tensor(n))
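For reference, the Horner recurrence that tensor_polyval implements can be sketched in plain NumPy, which makes the expected shapes easy to check: a 2-D coefficient array of shape (N, D) goes in, and a 1-D array of N values comes out. The function name horner_rows is hypothetical; the coefficients are the two example polynomials from above, evaluated at x = 5.

```python
import numpy as np


def horner_rows(coeffs, x):
    """Evaluate each row of coeffs (highest power first) at the scalar x."""
    coeffs = np.asarray(coeffs, dtype=np.float64)
    p = coeffs[:, 0].copy()
    for i in range(1, coeffs.shape[1]):
        p = coeffs[:, i] + x * p   # Horner step: p <- c_i + x * p
    return p


values = horner_rows([[1, 2, 3], [4, 5, 6]], 5.0)
print(values)  # [ 38. 131.]
```

Note that the first indexing step, coeffs[:, 0], needs a 2-D input; a 1-D row of 3 coefficients would fail at exactly that line.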
Now here's a super-simplified example of my model, loss function and training routine:
def model_init(embedDim=8, polyDim=2,terms=2):
    input = K.Input(shape=(embedDim,))
    x = K.layers.Reshape((embedDim,))(input)
    aCoeffs = K.layers.Dense((polyDim+1)*terms, activation='tanh')(x)
    aCoeffs = K.layers.Reshape((terms, polyDim+1))(aCoeffs)
    model = K.Model(inputs=input, outputs=[aCoeffs, input])
    return model
def get_random_batch(batch, embedDim, dtype='float64'):
    x = np.random.randn(batch, embedDim).astype(dtype)
    y = np.array([1. for i in range(batch)]).astype(dtype)
    return [x,
            y]
@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
    an = tf.vectorized_map(lambda y_p: coeffs_to_poly(y_p[0],
                                                      tf.constant(5,dtype=dataType)),
                           y_pred)
    return tf.reduce_mean(tf.reduce_mean(an,axis=-1))
embedDim=8
polyDim=2
terms=2
dataType = 'float64'
tf.keras.backend.set_floatx(dataType)
model = model_init(embedDim, polyDim, terms)
XTrain, yTrain = get_random_batch(batch=128,
embedDim=embedDim)
# Init Model
LR = 0.001
loss = test_loss
epochs = 5
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LR), loss=loss)
hist = model.fit(XTrain,
yTrain,
batch_size=4,
epochs=epochs,
max_queue_size=10, workers=2, use_multiprocessing=True)
The error I get is related to the tensor_polyval function:
<ipython-input-15-f96bd099fe08>:3 test_loss *
an = tf.vectorized_map(lambda y_p: coeffs_to_poly(y_p[0],
<ipython-input-5-7205207d12fd>:23 coeffs_to_poly *
return tensor_polyval(coeffs, tf.convert_to_tensor(n))
<ipython-input-5-7205207d12fd>:13 tensor_polyval *
p = coeffs[:, 0]
...
ValueError: Index out of range using input dim 1; input has only 1 dims for '{{node strided_slice}} = StridedSlice[Index=DT_INT32, T=DT_DOUBLE, begin_mask=1, ellipsis_mask=0, end_mask=1, new_axis_mask=0, shrink_axis_mask=2](coeffs, strided_slice/stack, strided_slice/stack_1, strided_slice/stack_2)' with input shapes: [3], [2], [2], [2] and with computed input tensors: input[3] = <1 1>.
What's frustrating is that I'm perfectly able to predict with the model on sample inputs and also calculate a sample loss:
test_loss(yTrain[0:5],
model.predict(XTrain[0:5]),
dtype=dataType)
which runs just fine.
In the test_loss function, I'm just referring to the first output, via y_p[0]. It tries to calculate the value of the polynomials at n=5 and then outputs an average over everything (again, this is just mocked code). As I understand it, y_p[1] would refer to the second output (in this case, a copy of the input). I would think that tf.vectorized_map should be operating across all outputs of the model batch, but it seems to be slicing one extra dimension?
I noticed that the code does train if I remove the second output (input) from the model, making it a single-output model, and change y_p[0] to y_p in test_loss. I have no idea why it breaks when the extra output is added, as my understanding of tf.vectorized_map implies that it acts separately on each element of the list y_pred.
If we need the single loss function to receive multiple outputs altogether, perhaps we can concatenate them together to form one output.
In this case:
Changes to the model structure, here we pack the outputs:
def model_init(embedDim=8, polyDim=2, terms=2):
    input = K.Input(shape=(embedDim, ))
    x = K.layers.Reshape((embedDim, ))(input)
    aCoeffs = K.layers.Dense((polyDim + 1) * terms, activation='tanh')(x)
    # pack the two outputs, add flatten layers if their shapes are not batch*K
    outputs = K.layers.Concatenate()([aCoeffs, input])
    model = K.Model(inputs=input, outputs=outputs)
    model.summary()
    return model
Changes to the loss function, here we unpack the outputs:
# the loss function needs to know these
polyDim = 2
terms = 2
@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
    """Loss function for flattened outputs."""
    # unpack multiple outputs
    offset = (polyDim + 1) * terms
    aCoeffs = tf.reshape(y_pred[:, :offset], [-1, terms, polyDim + 1])
    inputs = y_pred[:, offset:]
    print(aCoeffs, inputs)
    # do something with the two unpacked outputs, like below
    an = tf.vectorized_map(
        lambda y_p: coeffs_to_poly(y_p, tf.constant(5, dtype=dataType)),
        aCoeffs)
    return tf.reduce_mean(tf.reduce_mean(an, axis=-1))
Notice that the loss function relies on the knowledge of the original shapes of the outputs in order to restore them. Consider sub-classing tf.keras.losses.Loss.
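The pack and unpack steps are plain slicing and reshaping, so the round trip can be checked in NumPy before touching the model. This is only a sketch of the shape bookkeeping; the batch size and the snake_case variable names are made up for illustration.

```python
import numpy as np

poly_dim, terms, embed_dim, batch = 2, 2, 8, 4
offset = (poly_dim + 1) * terms  # number of coefficient columns per sample

# Stand-ins for the two model outputs.
coeffs = np.arange(batch * offset, dtype=np.float64).reshape(batch, terms, poly_dim + 1)
inputs = np.ones((batch, embed_dim))

# Pack: flatten the coefficients and concatenate along the feature axis.
packed = np.concatenate([coeffs.reshape(batch, -1), inputs], axis=1)

# Unpack: slice by the known offset and restore the original shape.
unpacked_coeffs = packed[:, :offset].reshape(-1, terms, poly_dim + 1)
unpacked_inputs = packed[:, offset:]
print(packed.shape, unpacked_coeffs.shape, unpacked_inputs.shape)
```

The unpacking only works because offset, terms and poly_dim are known on the loss side, which is the dependency the note above points out.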
P.S. For anyone who simply needs different losses for the multiple outputs:
Define loss functions for the two outputs.
@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
    """Loss function for output 1
    (Only changed y_p[0] to y_p)"""
    an = tf.vectorized_map(
        lambda y_p: coeffs_to_poly(y_p, tf.constant(5, dtype=dataType)),
        y_pred)
    return tf.reduce_mean(tf.reduce_mean(an, axis=-1))
@tf.function
def dummy_loss(y_true, y_pred, dtype=dataType):
    """Loss function for output 2 i.e. the input, for debugging
    Better use 0 instead of 1.2345"""
    return tf.constant(1.2345, dataType)
Change to model.compile:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LR), loss=[test_loss, dummy_loss])
I saw this question: "Implementing custom loss function in keras with condition". I need to do the same thing, but with code that seems to need loops.
I have a custom numpy function which calculates the mean Euclid distance from the mean vector. I wrote this based on the paper https://arxiv.org/pdf/1801.05365.pdf:
import numpy as np
def mean_euclid_distance_from_mean_vector(n_vectors):
    dists = []
    for (i, v) in enumerate(n_vectors):
        n_vectors_rest = n_vectors[np.arange(len(n_vectors)) != i]
        print("rest of vectors: ")
        print(n_vectors_rest)
        # calculate mean vector
        mean_rest = n_vectors_rest.mean(axis=0)
        print("mean rest vector")
        print(mean_rest)
        dist = v - mean_rest
        print("dist vector")
        print(dist)
        dists.append(dist)
    # dists is now a matrix of distance vectors (distance from the mean vector)
    dists = np.array(dists)
    print("distance vector matrix")
    print(dists)
    # here we matmult each vector
    # sum them up
    # and divide by the total number of elements
    result = np.sum([np.matmul(d, d) for d in dists]) / dists.size
    return result
features = np.array([
[1,2,3,4],
[4,3,2,1]
])
c = mean_euclid_distance_from_mean_vector(features)
print(c)
However, I need this function to work inside TensorFlow with Keras, so as a custom Lambda layer: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Lambda
However, I'm not sure how to implement the above in Keras/Tensorflow since it has loops, and the way the paper talked about calculating the m_i seems to require loops like the way I implemented the above.
For reference, the PyTorch version of this code is here: https://github.com/PramuPerera/DeepOneClass
Given a feature map like:
features = np.array([
[1, 2, 3, 4],
[2, 4, 4, 3],
[3, 2, 1, 4],
], dtype=np.float64)
reflecting a batch_size of
batch_size = features.shape[0]
and
k = features.shape[1]
The formulas above could be prototyped in TensorFlow as follows:
dim = (batch_size, features.shape[1])
def zero(i):
    arr = np.ones(dim)
    arr[i] = 0
    return arr
mapper = [zero(i) for i in range(batch_size)]
elems = (features, mapper)
m = (1 / (batch_size - 1)) * tf.map_fn(lambda x: tf.math.reduce_sum(x[0] * x[1], axis=0), elems, dtype=tf.float64)
pairs = tf.map_fn(lambda x: tf.concat(x, axis=0) , tf.stack([features, m], 1), dtype=tf.float64)
compactness_loss = (1 / (batch_size * k)) * tf.map_fn(lambda x: tf.math.reduce_euclidean_norm(x), pairs, dtype=tf.float64)
with tf.Session() as sess:
    print("loss value output is: ", compactness_loss.eval())
Which yields:
loss value output is: [0.64549722 0.79056942 0.64549722]
However, a single measure is required for the batch, so the values must be reduced by summing them.
The wanted Compactness Loss function à la Tensorflow is:
def compactness_loss(actual, features):
    features = Flatten()(features)
    k = 7 * 7 * 512
    dim = (batch_size, k)

    def zero(i):
        z = tf.zeros((1, dim[1]), dtype=tf.dtypes.float32)
        o = tf.ones((1, dim[1]), dtype=tf.dtypes.float32)
        arr = []
        for k in range(dim[0]):
            arr.append(o if k != i else z)
        res = tf.concat(arr, axis=0)
        return res

    masks = [zero(i) for i in range(batch_size)]
    m = (1 / (batch_size - 1)) * tf.map_fn(
        # row-wise summation
        lambda mask: tf.math.reduce_sum(features * mask, axis=0),
        masks,
        dtype=tf.float32,
    )
    dists = features - m
    sqrd_dists = tf.pow(dists, 2)
    red_dists = tf.math.reduce_sum(sqrd_dists, axis=1)
    compact_loss = (1 / (batch_size * k)) * tf.math.reduce_sum(red_dists)
    return compact_loss
Of course, the Flatten() could be moved back into the model for convenience, and k could be derived directly from the feature map; this answers your question. You may still have some trouble working out what the expected values for the model are: feature maps from VGG16 (or any other architecture) trained on ImageNet, for instance?
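As a cross-check that needs no masks or loops, the leave-one-out mean can also be computed as (column sum minus the row itself) divided by (n - 1). The NumPy sketch below uses that identity and follows the normalization of the question's function (division by the total number of elements); the function name is hypothetical and this is not the code from the paper's repository.

```python
import numpy as np


def compactness_loss_np(features):
    """Mean squared distance of each row from the mean of all other rows."""
    features = np.asarray(features, dtype=np.float64)
    n, k = features.shape
    # Leave-one-out mean for every row at once: (sum of all rows - this row) / (n - 1).
    loo_mean = (features.sum(axis=0, keepdims=True) - features) / (n - 1)
    dists = features - loo_mean
    return np.sum(dists ** 2) / (n * k)


print(compactness_loss_np([[1, 2, 3, 4], [4, 3, 2, 1]]))  # 5.0
```

The same expression carries over to TensorFlow by swapping np.sum for tf.reduce_sum, since it relies only on broadcasting.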
The paper says:
In our formulation (shown in Figure 2 (e)), starting from a pre-trained deep model, we freeze initial features (gs) and learn (gl) and (hc). Based on the output of the classification sub-network (hc), two losses compactness loss and descriptiveness loss are evaluated. These two losses, introduced in the subsequent sections, are used to assess the quality of the learned deep feature. We use the provided one-class dataset to calculate the compactness loss. An external multi-class reference dataset is used to evaluate the descriptiveness loss. As shown in Figure 3, weights of gl and hc are learned in the proposed method through back-propagation from the composite loss. Once training is converged, system shown in setup in Figure 2(d) is used to perform classification where the resulting model is used as the pre-trained model.
then looking at the "Framework" backbone here plus:
AlexNet Binary and VGG16 Binary (Baseline). A binary CNN is trained by having ImageNet samples and one-class image samples as the two classes using AlexNet and VGG16 architectures, respectively. Testing is performed using k-nearest neighbor, One-class SVM [43], Isolation Forest [3] and Gaussian Mixture Model [3] classifiers.
Makes me wonder whether it would not be reasonable to add the suggested dense layers to both the Secondary and the Reference Networks, ending in a single-class output (sigmoid) or even a binary-class output (softmax), and to use mean_squared_error as the so-called Compactness Loss and binary_cross_entropy as the Descriptiveness Loss.
I am watching some videos for Stanford CS231: Convolutional Neural Networks for Visual Recognition but do not quite understand how to calculate analytical gradient for softmax loss function using numpy.
From this stackexchange answer, softmax gradient is calculated as:
Python implementation for above is:
num_classes = W.shape[0]
num_train = X.shape[1]
for i in range(num_train):
    for j in range(num_classes):
        p = np.exp(f_i[j])/sum_i
        dW[j, :] += (p-(j == y[i])) * X[:, i]
Could anyone explain how the above snippet works? A detailed implementation of the softmax loss is also included below.
def softmax_loss_naive(W, X, y, reg):
    """
    Softmax loss function, naive implementation (with loops)
    Inputs:
    - W: C x D array of weights
    - X: D x N array of data. Data are D-dimensional columns
    - y: 1-dimensional array of length N with labels 0...K-1, for K classes
    - reg: (float) regularization strength
    Returns:
    a tuple of:
    - loss as single float
    - gradient with respect to weights W, an array of same size as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    #############################################################################
    # Compute the softmax loss and its gradient using explicit loops.           #
    # Store the loss in loss and the gradient in dW. If you are not careful     #
    # here, it is easy to run into numeric instability. Don't forget the        #
    # regularization!                                                           #
    #############################################################################
    # Get shapes
    num_classes = W.shape[0]
    num_train = X.shape[1]
    for i in range(num_train):
        # Compute vector of scores
        f_i = W.dot(X[:, i]) # in R^{num_classes}
        # Normalization trick to avoid numerical instability, per http://cs231n.github.io/linear-classify/#softmax
        log_c = np.max(f_i)
        f_i -= log_c
        # Compute loss (and add to it, divided later)
        # L_i = - f(x_i)_{y_i} + log \sum_j e^{f(x_i)_j}
        sum_i = 0.0
        for f_i_j in f_i:
            sum_i += np.exp(f_i_j)
        loss += -f_i[y[i]] + np.log(sum_i)
        # Compute gradient
        # dw_j = 1/num_train * \sum_i[x_i * (p(y_i = j)-Ind{y_i = j} )]
        # Here we are computing the contribution to the inner sum for a given i.
        for j in range(num_classes):
            p = np.exp(f_i[j])/sum_i
            dW[j, :] += (p-(j == y[i])) * X[:, i]
    # Compute average
    loss /= num_train
    dW /= num_train
    # Regularization
    loss += 0.5 * reg * np.sum(W * W)
    dW += reg*W
    return loss, dW
Not sure if this helps, but:
is really the indicator function , as described here. This forms the expression (j == y[i]) in the code.
Also, the gradient of the loss with respect to the weights is:
where
which is the origin of the X[:,i] in the code.
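Since the formula images are missing here, the following is a sketch of the chain-rule steps, written to be consistent with the comments in the code above rather than copied from the original answer. It uses the code's notation: f = W x_i is the score vector, w_j is the j-th row of W, and p_j is the softmax probability of class j.

```latex
\begin{aligned}
L_i &= -f_{y_i} + \log \sum_k e^{f_k},
\qquad p_j = \frac{e^{f_j}}{\sum_k e^{f_k}}, \\
\frac{\partial L_i}{\partial f_j} &= p_j - \mathbb{1}\{j = y_i\}, \\
\frac{\partial f_j}{\partial w_j} &= x_i, \\
\frac{\partial L_i}{\partial w_j} &= \left(p_j - \mathbb{1}\{j = y_i\}\right) x_i .
\end{aligned}
```

The last line is exactly the update dW[j, :] += (p-(j == y[i])) * X[:, i]; averaging over i and adding reg*W gives the returned gradient.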
I know this is late but here's my answer:
I'm assuming you are familiar with the cs231n Softmax loss function.
We know that:
So just as we did with the SVM loss function the gradients are as follows:
Hope that helped.
A supplement to this answer with a small example.
I came across this post and still was not 100% clear how to arrive at the partial derivatives.
For that reason I took another approach to get to the same results - maybe it is helpful to others too.
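One generic way to confirm an analytic gradient like the one discussed in this thread is to compare it against central finite differences. The sketch below is not the approach from the original post; it uses a vectorized softmax loss with the same conventions as softmax_loss_naive (W is C x D, X is D x N), and the function names, toy sizes and step size eps are assumptions.

```python
import numpy as np


def softmax_loss_vectorized(W, X, y, reg):
    """Softmax loss and gradient; W is C x D, X is D x N, y holds N integer labels."""
    num_train = X.shape[1]
    scores = W @ X                                        # C x N
    scores = scores - scores.max(axis=0, keepdims=True)   # stability shift
    exp_scores = np.exp(scores)
    probs = exp_scores / exp_scores.sum(axis=0, keepdims=True)
    cols = np.arange(num_train)
    loss = -np.mean(np.log(probs[y, cols])) + 0.5 * reg * np.sum(W * W)
    dscores = probs.copy()
    dscores[y, cols] -= 1.0                               # p_j - 1{j == y_i}
    dW = dscores @ X.T / num_train + reg * W
    return loss, dW


def numerical_grad(W, X, y, reg, eps=1e-6):
    """Central finite-difference estimate of dLoss/dW, one entry at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        loss_plus = softmax_loss_vectorized(W_plus, X, y, reg)[0]
        loss_minus = softmax_loss_vectorized(W_minus, X, y, reg)[0]
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
X = rng.normal(size=(5, 7))
y = rng.integers(0, 3, size=7)
loss, dW = softmax_loss_vectorized(W, X, y, reg=0.1)
max_diff = np.max(np.abs(dW - numerical_grad(W, X, y, reg=0.1)))
print(max_diff)
```

If the analytic expression (p - indicator) * x were wrong in sign or indexing, the difference would be of the same order as the gradient itself rather than close to zero.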