I'm implementing my own neural network from scratch (educational purposes, I know there are faster and better libraries for this) and for that I'm trying to calculate the derivative of a fully connected layer. I know the following:
and assuming I have a way to calculate the derivative of f using f.derivative(<some_matrix>), how can I use numpy to efficiently calculate the derivative of f(XW) with respect to W as seen in the picture?
I want to be able to calculate the derivative for N different inputs at the same time (giving me a 4-d tensor, 1 dimension for the N samples, and 3 dimensions for the derivative in the image).
Note: f.derivative takes in a matrix of N inputs with d features each, and returns the derivative of each of the input points.
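For what it's worth, here is a minimal NumPy sketch of the kind of computation being asked about, assuming X has shape (N, d), W has shape (d, k), and f acts elementwise (so f.derivative returns an array of the same shape as its input); the function name is illustrative only:
import numpy as np

def fc_derivative(X, W, f_derivative):
    # Z = XW has shape (N, k); for sample n,
    # d f(Z)[n, a] / d W[i, j] = f'(Z)[n, a] * X[n, i] * delta(a == j),
    # which stacks into an (N, k, d, k) tensor: one leading axis for the
    # N samples and three axes for each sample's derivative.
    Z = X @ W
    fprime = f_derivative(Z)          # (N, k)
    N, k = fprime.shape
    eye = np.eye(k)                   # Kronecker delta over the output index
    return np.einsum('na,ni,aj->naij', fprime, X, eye)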
I want to use the Tucker and canonical polyadic decompositions (CPD, also known as PARAFAC/CANDECOMP) of a 3-dimensional tensor for latent analysis.
I use Python and the parafac function from tensorly.decomposition in the tensorly library.
import tensorly as tl
from tensorly.decomposition import parafac

# Rank of the CP decomposition
cp_rank = 5

# Perform the CP decomposition
weights, factors = parafac(result, non_negative=True, rank=cp_rank, normalize_factors=True, init='random', tol=10e-6)

# Reconstruct the tensor from the factors
cp_reconstruction = tl.kruskal_to_tensor((weights, factors))
The factor matrices and the core are not unique (they can be multiplied by a non-singular matrix), so the factor matrices change every time the function is called.
Use this code to see this:
weights = 0
for i in range(100):
    error = weights
    weights, factors = parafac(result, non_negative=True, rank=8, normalize_factors=True, init='random', tol=10e-6)
    error -= weights
    print(tl.norm(error))
How can I describe or analyse each component of the tensor? Does any of them have a meaning?
For matrices I understand the SVD decomposition. What should I do for a tensor?
The decomposition you are using in your example (parafac, also known as the Canonical Polyadic (CP) decomposition) does not have a core. It expresses the original tensor as a weighted sum of rank-1 tensors, i.e. a weighted sum of outer products of vectors. These vectors are collected for each mode (dimension) into the factor matrices. The weights of the sum form a vector. The CP decomposition, unlike the Tucker decomposition, does not have a core and is unique under mild conditions (you could consider CP as a special case of Tucker, with the weight vector being the values of a diagonal core).
However, there are several issues with directly comparing the factors: for one, even if the decomposition is unique, it is also invariant under permutations of the factors so you can't directly compare the factors. In addition, finding the actual rank of a tensor is in general NP-hard. What is typically computed with a CP decomposition is a low-rank approximation (i.e. best rank-R approximation) which is also, in general, NP-hard and the ALS is just a (good) heuristic. If you want to compare several factorized tensors, it is easier to compare the reconstructions rather than directly the factors.
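For instance, here is a small sketch in the same tensorly style as the code above (same variable names, random initialization assumed) that compares the reconstructions from two independent runs instead of the factors themselves:
import tensorly as tl
from tensorly.decomposition import parafac

# Fit the same tensor twice; the factors may differ by permutation/scaling,
# but the reconstructions should be close if the approximation is stable.
weights1, factors1 = parafac(result, rank=cp_rank, normalize_factors=True, init='random', tol=10e-6)
weights2, factors2 = parafac(result, rank=cp_rank, normalize_factors=True, init='random', tol=10e-6)

rec1 = tl.kruskal_to_tensor((weights1, factors1))
rec2 = tl.kruskal_to_tensor((weights2, factors2))

# Relative difference between the two reconstructions
print(tl.norm(rec1 - rec2) / tl.norm(result))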
For latent factor analysis I advise you look at this paper which shows how you can learn latent variable models by factorizing low-order observable moments.
I want to make a zero-mean Gaussian matrix, e.g., M of size (n, n), in Python such that
where the four-dimensional array A, with given entries, is provided. Is there any way to do that without flattening M into a vector?
What does TensorFlow actually do when the gradient descent optimizer is applied to a "loss" placeholder that is not a scalar (a tensor of size 1) but rather a vector (a 1-dimensional tensor of size 2, 3, 4, or more)?
Is it like doing the descent on the sum of the components?
The answer to your second question is "no".
As for the second question: just like in the one-dimensional case (e.g. y = f(x), x in R), where the direction the algorithm takes is defined by the derivative of the function with respect to its single variable, in the multidimensional case the overall direction is defined by the derivatives of the function with respect to each variable.
This means the size of the step you take in each direction is determined by the value of the derivative with respect to the variable corresponding to that direction.
Since there's no way to properly type math in StackOverflow, instead of messing around with it I'll suggest you take a look at this article.
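To make the idea concrete, here is a tiny toy step (plain NumPy, illustrative values only) showing that the move along each coordinate is proportional to the partial derivative in that coordinate:
import numpy as np

# One gradient-descent step on f(x, y) = x**2 + 10 * y**2
point = np.array([1.0, 1.0])
grad = np.array([2 * point[0], 20 * point[1]])   # (df/dx, df/dy)
lr = 0.05
point = point - lr * grad   # bigger step along y, where the slope is steeper
print(point)                # [0.9, 0.0]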
TensorFlow first reduces your loss to a scalar and then optimizes that.
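A small sketch (TF 1.x style, matching the question's use of placeholders and optimizers; toy values only) illustrating this: taking gradients of a vector-valued loss gives the same result as taking gradients of its sum.
import tensorflow as tf

x = tf.Variable([1.0, 2.0])
loss_vec = tf.square(x)   # vector-valued "loss", shape (2,)

# tf.gradients implicitly sums over the elements of a non-scalar target,
# so these two gradients are identical.
g_vec = tf.gradients(loss_vec, x)[0]
g_sum = tf.gradients(tf.reduce_sum(loss_vec), x)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([g_vec, g_sum]))   # both [2., 4.]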
I'm trying to compute a kernel function for a GP using only matrix operations (no loops).
Vectors were no problem, taking advantage of broadcasting:
import numpy as np

def kernel(A, B):
    return 1 / np.exp(np.linalg.norm(A - B.T)) ** 2
A and B are both [n, 1] vectors, but with [n, m]-shaped matrices it just doesn't work. (I also tried reshaping to [1, n, m].)
I'm interested in computing a matrix X where every (i, j)-th element is defined by A_i - B_j.
Right now I'm working in NumPy, but my final objective is to implement this in TensorFlow.
Thanks in Advance.
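For reference, a minimal broadcasting sketch of what is being asked for, assuming A has shape (n, d) and B has shape (m, d); the function names are made up for illustration:
import numpy as np

def pairwise_diff(A, B):
    # diff[i, j, :] = A[i, :] - B[j, :], computed in one shot via broadcasting
    return A[:, None, :] - B[None, :, :]                    # shape (n, m, d)

def rbf_kernel(A, B, lengthscale=1.0):
    sq_dists = np.sum(pairwise_diff(A, B) ** 2, axis=-1)    # shape (n, m)
    return np.exp(-sq_dists / (2 * lengthscale ** 2))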
I am solving a detection problem using a ConvNet. However, in my case the labels are matrices of dimension [3 x 5] for each image. I use Caffe for this work. I read the images using the Data layer, while I read the labels using the HDF5 layer.
The HDF5 layer reads the [3 x 5] label matrix as a [1 x 15]-dimensional vector.
So I used a Reshape layer to reshape the vector into a matrix before computing the L2 loss. However, I realized that the Reshape layer formats the data as H x W, while my label matrix is [W x H], i.e., [w=3, h=5],
hence the reshape is incorrect. I wonder, is there a way to reshape the [1 x 15] label vector in the right order, i.e., [3 x 5] and not [5 x 3]?
Another way I thought I could work around this is by flattening the output from the convolutional layer into [1 x 15] and then computing the loss against my [1 x 15] label.
I am showing the problem using Figures for better understanding because of my poor English.
Example of my input matrix label (note the images are just enlarged for illustration)
Result of Caffe Reshape Layer
Any suggestions on whether I am doing it right?
Either way of computing the loss is just fine. In fact, computing in the [1 x 15] shape will save you the time of converting. The loss computation is still element-by-element; the logical organization doesn't matter.
Using the same idea, it doesn't really matter whether you compute 3x5 or 5x3; all that matters is that your convolutional output and your label properly match each other.
If you want the display (graph, picture, etc.) to match, perhaps you can just switch the x and y designations before you plot the output.
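As a quick sanity check (plain NumPy, not Caffe), the Euclidean loss comes out the same no matter how the 15 values are laid out, as long as the prediction and the label use the same layout:
import numpy as np

label = np.arange(15, dtype=float)
pred = label + 0.1   # toy prediction

loss_flat = np.sum((pred - label) ** 2)                              # [1 x 15]
loss_3x5 = np.sum((pred.reshape(3, 5) - label.reshape(3, 5)) ** 2)   # [3 x 5]
loss_5x3 = np.sum((pred.reshape(5, 3) - label.reshape(5, 3)) ** 2)   # [5 x 3]

print(loss_flat, loss_3x5, loss_5x3)   # all identical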