Constrained linear combination of learned parameters in PyTorch? - python

I have three tensors X, Y, Z and I want to learn the optimal convex combination of these tensors with respect to some cost, i.e.
aX + bY + cZ such that a + b + c = 1. How can I do this easily in PyTorch?
I know that I could just concatenate along an unsqueezed axis and then apply a linear layer, like so:
X = X.unsqueeze(-1)
Y = Y.unsqueeze(-1)
Z = Z.unsqueeze(-1)
W = torch.cat([X, Y, Z], dim=-1)  # third axis has dimension 3
W = nn.Linear(3, 1)(W)
but this would not apply the convex combination constraint...

I found an answer that works well, for those who are interested. This generalizes to a linear combination of N tensors: you just need to change the weights dimension and the number of tensors you concatenate.
import torch
import torch.nn as nn

weights = nn.Parameter(torch.rand(1, 3))  # one learnable raw weight per tensor
X = X.unsqueeze(-1)
Y = Y.unsqueeze(-1)
Z = Z.unsqueeze(-1)
weights_normalized = nn.functional.softmax(weights, dim=-1)  # non-negative, sums to 1
output = torch.matmul(torch.cat([X, Y, Z], dim=-1), weights_normalized.t()).squeeze(-1)
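For anyone who wants to reuse this, here is a minimal sketch of the same softmax trick wrapped in an nn.Module and generalized to N tensors. The class name ConvexCombination and the toy shapes are my own illustration, not part of the original answer.
import torch
import torch.nn as nn

class ConvexCombination(nn.Module):
    """Learn a convex combination (coefficients >= 0, summing to 1) of N same-shaped tensors."""
    def __init__(self, n_tensors):
        super().__init__()
        self.raw_weights = nn.Parameter(torch.zeros(n_tensors))  # unconstrained parameters

    def forward(self, *tensors):
        coeffs = torch.softmax(self.raw_weights, dim=0)  # non-negative, sums to 1
        stacked = torch.stack(tensors, dim=-1)           # shape (..., N)
        return (stacked * coeffs).sum(dim=-1)

# usage sketch
combo = ConvexCombination(3)
X, Y, Z = torch.randn(4, 5), torch.randn(4, 5), torch.randn(4, 5)
out = combo(X, Y, Z)  # aX + bY + cZ with a + b + c = 1 after softmax
Training combo's single parameter vector alongside the rest of the model keeps the constraint satisfied at every step, since softmax enforces it by construction.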

Related

How to solve 1D heat equation using neural networks in pytorch

I want to solve 1D heat conduction using neural networks in PyTorch. The PDE representing the heat conduction is as follows:
du/dt = k d2u/dx2
where k is a constant, u represents temperature and x is the spatial coordinate. I also include a boundary condition of zero temperature at x = 0 and an initial condition at t = 0. I am quite new to the field of PINNs and so far can only solve ordinary ODEs with them. The following code tries to solve a very simple ODE, du/dx = x, with the boundary condition u = 0 at x = 0; the answer is simply u = x^2/2. Here is a simple PINN that solves this ODE:
import torch
import torch.nn as nn
import numpy as np

# N is a neural network with three layers
N = nn.Sequential(nn.Linear(1, 50), nn.Sigmoid(), nn.Linear(50, 1, bias=False))
BC = 0.  # boundary condition
g_f = lambda x: BC + x * N(x)  # a trial function satisfying the BC by construction
f = lambda x: x  # ODE to solve: du/dx = x  ---->  u = x^2/2

# The loss function: mean squared residual of the ODE
def loss(x):
    x.requires_grad = True
    outputs = g_f(x)
    grdnt = torch.autograd.grad(outputs, x, grad_outputs=torch.ones_like(outputs),
                                create_graph=True)[0]
    return torch.mean((grdnt - f(x)) ** 2)

optimizer = torch.optim.LBFGS(N.parameters())
x = torch.Tensor(np.linspace(-4, 4, 100)[:, None])

# Run the optimizer
def closure():
    optimizer.zero_grad()
    l = loss(x)
    l.backward()
    return l

for i in range(10):
    optimizer.step(closure)

x_test = np.linspace(-4, 4, 100)[:, None]
with torch.no_grad():
    y_test = g_f(torch.Tensor(x_test)).numpy()
How can I adapt this code to the 1D heat conduction problem?
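Not a full answer, but a hedged sketch of one way to extend the same idea to the heat equation: make the network a function of both x and t, compute the residual du/dt - k * d2u/dx2 with autograd, and penalize the boundary and initial conditions in the loss. The network size, collocation points, conductivity k, and the sin(pi*x) initial condition below are illustrative assumptions, not part of the question.
import math
import torch
import torch.nn as nn

k = 1.0  # assumed thermal conductivity
net = nn.Sequential(nn.Linear(2, 50), nn.Tanh(), nn.Linear(50, 1))

def pde_residual(x, t):
    # residual of du/dt = k * d2u/dx2 at the collocation points
    x.requires_grad_(True)
    t.requires_grad_(True)
    u = net(torch.cat([x, t], dim=1))
    u_t = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return u_t - k * u_xx

def loss():
    # interior collocation points on (0, 1) x (0, 1)
    x = torch.rand(200, 1)
    t = torch.rand(200, 1)
    res = pde_residual(x, t)
    # boundary condition: u(0, t) = u(1, t) = 0
    tb = torch.rand(50, 1)
    u_left = net(torch.cat([torch.zeros_like(tb), tb], dim=1))
    u_right = net(torch.cat([torch.ones_like(tb), tb], dim=1))
    # initial condition (illustrative): u(x, 0) = sin(pi * x)
    x0 = torch.rand(50, 1)
    u0 = net(torch.cat([x0, torch.zeros_like(x0)], dim=1))
    return (res ** 2).mean() + (u_left ** 2).mean() + (u_right ** 2).mean() \
           + ((u0 - torch.sin(math.pi * x0)) ** 2).mean()

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(5000):
    optimizer.zero_grad()
    l = loss()
    l.backward()
    optimizer.step()
The x * N(x) trick from the ODE example can also be adapted so the trial solution satisfies the conditions by construction, but penalizing them in the loss is the simpler change.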

How to vectorize this batched-pairwise computation in PyTorch?

I want to write a batched pairwise bivariate Moran's I. The formula can be found here.
If X and Y are both (n, n), then the weight matrix W has dimension (n^2, n^2).
I think I have a vectorization for a toy example with a single pair, as follows. Note that x and y have to be flattened and standardized.
import numpy as np
import torch

n_elts = 4
x = torch.FloatTensor([0, 1, 1, 0])
y = torch.FloatTensor([0, 1, 1, 0])
w = torch.FloatTensor(np.identity(n_elts))
x = x - torch.mean(x)  # center x
y = y - torch.mean(y)  # center y
ans = torch.sum(torch.outer(x, y) * w) / (torch.norm(x) ** 2) * (n_elts / torch.sum(w))  # = 1
I'm having a hard time extending this to the batched pairwise case. That is, x has shape (B, n, n) and y has shape (C, n, n). You can assume they get flattened to (B, n^2) and (C, n^2), respectively. The output should have shape (B, C). Here B is the batch size and C is some number that will generally differ from B.
So far all I can figure out is that, again, if x is (B, n^2) and y is (C, n^2), then I can get a broadcasted outer product as follows:
xt = x[:, None, :, None]
yt = y[None, :, None, :]
outer = xt * yt  # has shape (B, C, n^2, n^2)
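Under the assumption that each pair follows exactly the single-pair formula above (numerator x_b^T W y_c, denominator ||x_b||^2, scale n^2 / sum(W)), a plain matmul chain gives the (B, C) result without ever materializing that (B, C, n^2, n^2) outer product. This is my sketch, not a verified Moran's I implementation:
import torch

def batched_pairwise_moran(x, y, w):
    # x: (B, n, n), y: (C, n, n), w: (n*n, n*n) spatial weights -> (B, C)
    B, n, _ = x.shape
    C = y.shape[0]
    xf = x.reshape(B, n * n)
    yf = y.reshape(C, n * n)
    xf = xf - xf.mean(dim=1, keepdim=True)      # center each flattened map
    yf = yf - yf.mean(dim=1, keepdim=True)
    num = xf @ w @ yf.t()                        # (B, C): sum_ij w_ij * x_b,i * y_c,j
    denom = (xf ** 2).sum(dim=1, keepdim=True)   # (B, 1): ||x_b||^2
    return (n * n / w.sum()) * num / denom

# sanity check against the toy example above (B = C = 1, n = 2)
x = torch.FloatTensor([[0, 1], [1, 0]]).unsqueeze(0)
y = torch.FloatTensor([[0, 1], [1, 0]]).unsqueeze(0)
w = torch.eye(4)
print(batched_pairwise_moran(x, y, w))  # tensor([[1.]])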

What does the locality linear coding function do?

I got this code for spectral clustering.
https://github.com/BirdYin/scllc/blob/master/scllc.py
This is a landmark-based spectral clustering code.
What does the locality_linear_coding function do in this code?
class Scllc:

    def __locality_linear_coding(self, data, neighbors):
        indicator = np.ones([neighbors.shape[0], 1])
        penalty = np.eye(self.n_neighbors)
        # Get the weights of every neighbors
        z = neighbors - indicator.dot(data.reshape(-1, 1).T)
        local_variance = z.dot(z.T)
        local_variance = local_variance + self.lambda_val * penalty
        weights = scipy.linalg.solve(local_variance, indicator)
        weights = weights / np.sum(weights)
        weights = weights / np.sum(np.abs(weights))
        weights = np.abs(weights)
        return weights.reshape(self.n_neighbors)

    def fit(self, X):
        [n_data, n_dim] = X.shape
        # Select landmarks
        if self.func_landmark == 'kmeans':
            landmarks, centers, unknown = k_means(X, self.n_landmarks, n_init=1, max_iter=100)
        nbrs = NearestNeighbors(metric='euclidean').fit(landmarks)
        # Create properties of the sparse matrix Z
        [dist, indy] = nbrs.kneighbors(X, n_neighbors=self.n_neighbors)
        indx = np.ones([n_data, self.n_neighbors]) * np.asarray(range(n_data))[:, None]
        valx = np.zeros([n_data, self.n_neighbors])
        self.delta = np.mean(valx)
        # Compute all the coded data
        for index in range(n_data):
            # Compute the weights of its neighbors
            localmarks = landmarks[indy[index, :], :]
            weights = self.__locality_linear_coding(X[index, :], localmarks)
            # Compute the coded data
            valx[index] = weights
        # Construct sparse matrix
        indx = indx.reshape(n_data * self.n_neighbors)
        indy = indy.reshape(n_data * self.n_neighbors)
        valx = valx.reshape(n_data * self.n_neighbors)
        Z = sparse.coo_matrix((valx, (indx, indy)), shape=(n_data, self.n_landmarks))
        Z = Z / np.sqrt(np.sum(Z, 0))
        # Get first k eigenvectors
        [U, Sigma, V] = svds(Z, k=self.n_clusters + 1)
        U = U[:, 0:self.n_clusters]
        embedded_data = U / np.sqrt(np.sum(U * U, 0))
You can look at the documentation of the NumPy module to see how it handles n-dimensional arrays. For example, the dot method computes the matrix product.
They also use the scipy module; you can find its documentation online as well.
The first method of a class is usually the initializer: it runs when an object of the class is created and is where all the variables the class needs are defined and saved.
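As for what the function actually computes: it solves the regularized local least-squares system that reconstructs a data point from its nearest landmarks (the same idea as LLE reconstruction weights) and normalizes the weights to sum to one. Below is a simplified standalone sketch with toy data of my own; the repository additionally takes absolute values and renormalizes.
import numpy as np
import scipy.linalg

def llc_weights(point, neighbors, lambda_val=0.01):
    # point: (d,), neighbors: (k, d) -> k reconstruction weights summing to 1
    k = neighbors.shape[0]
    ones = np.ones((k, 1))
    z = neighbors - point[None, :]                  # shift so the point sits at the origin
    local_cov = z @ z.T + lambda_val * np.eye(k)    # regularized local Gram matrix
    w = scipy.linalg.solve(local_cov, ones)         # solve C w = 1
    return (w / w.sum()).ravel()                    # enforce the sum-to-one constraint

# toy usage: code a 2-D point against three nearby landmarks
point = np.array([0.5, 0.5])
landmarks = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(llc_weights(point, landmarks))  # three weights summing to 1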

Linear Regression with gradient descent: two questions

I'm trying to understand Linear Regression with Gradient Descent and I do not understand this part in my loss_gradients function below.
import numpy as np

def forward_linear_regression(X, y, weights):
    # dot product weights * inputs
    N = np.dot(X, weights['W'])
    # add bias
    P = N + weights['B']
    # compute loss with MSE
    loss = np.mean(np.power(y - P, 2))
    forward_info = {}
    forward_info['X'] = X
    forward_info['N'] = N
    forward_info['P'] = P
    forward_info['y'] = y
    return loss, forward_info
Here is where I'm stuck in my understanding, I have commented out my questions:
def loss_gradients(forward_info, weights):
    # to update weights, we need: dLdW = dLdP * dPdN * dNdW
    dLdP = -2 * (forward_info['y'] - forward_info['P'])
    dPdN = np.ones_like(forward_info['N'])
    dNdW = np.transpose(forward_info['X'], (1, 0))
    dLdW = np.dot(dNdW, dLdP * dPdN)
    # why do we mix matrix multiplication and dot product like this?
    # Why not dLdP * dPdN * dNdW instead?

    # to update biases, we need: dLdB = dLdP * dPdB
    dPdB = np.ones_like(weights['B'])
    dLdB = np.sum(dLdP * dPdB, axis=0)
    # why do we sum those values along axis 0?
    # why not just dLdP * dPdB ?
    return {'W': dLdW, 'B': dLdB}
It looks to me like this code is expecting a 'batch' of data. What I mean by that is, it's expecting that when you do forward_info and loss_gradients, you're actually passing a bunch of (X, y) pairs together. Let's say you pass B such pairs. The first dimension of all of your forward info stuff will have size B.
Now, the answers to both of your questions are the same: essentially, these lines compute the gradients (using the formulas you predicted) for each of the B terms, and then sum up all of the gradients so you get one gradient update. I encourage you to work out the logic behind the dot product yourself, because this is a very common pattern in ML, but it's a little tricky to get the hang of at first.
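A quick NumPy check of that claim, with toy shapes of my own choosing: summing the per-example gradients explicitly gives the same result as the dot product and the axis-0 sum used in the code.
import numpy as np

B, D = 4, 3                                   # toy batch size and feature count
X = np.random.randn(B, D)
y = np.random.randn(B, 1)
W = np.random.randn(D, 1)
b = np.random.randn(1)

P = X @ W + b
dLdP = -2 * (y - P)                           # (B, 1): one gradient row per example

# per-example gradients, summed explicitly over the batch
dLdW_loop = sum(X[i:i + 1].T @ dLdP[i:i + 1] for i in range(B))  # (D, 1)
dLdB_loop = sum(dLdP[i] for i in range(B))                       # (1,)

# the vectorized versions from the code
dLdW_vec = X.T @ dLdP                         # the dot product sums over the batch axis
dLdB_vec = np.sum(dLdP * np.ones_like(b), axis=0)

print(np.allclose(dLdW_loop, dLdW_vec))       # True
print(np.allclose(dLdB_loop, dLdB_vec))       # True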

How to do one-dimensional linear interpolation in TensorFlow efficiently?

I need to do one-dimensional linear interpolation as part of my model in TensorFlow. I tried to follow the definition and wrote a linear interpolation function, but it is computationally intensive and almost unusable in my model. Is there an efficient way to perform one-dimensional interpolation in TensorFlow?
Here is my code for linear interpolation, similar to numpy.interp().
t contains the points at which to interpolate and has shape [64, 64].
x contains the x-coordinates of the data points and has shape [91, 1].
y contains the y-coordinates of the data points and has shape [91, 1].
t and x are NumPy arrays and y is a tensor.
def tf_interpolation_v2(t, x, y, left, right):
    # perform one-dimensional linear interpolation
    # returns a tensor with the same shape as t
    # t: the points at which to interpolate
    # x: the x-coordinates of the data points, must be increasing
    # y: the y-coordinates of the data points, same length as x
    # left: value to return for t < x[0]
    # right: value to return for t > x[-1]
    t = np.asarray(t)
    t = t.astype(np.float32)
    x = x.astype(np.float32)
    y = tf.cast(y, tf.float32)
    t_return = []
    t_return_row = []
    for row in t:
        for v in row:
            if v < x[0]:  # value smaller than x[0]
                a = left
                t_return_row.append(a)
            elif v > x[-1]:  # value larger than x[-1]
                a = right
                t_return_row.append(a)
            else:  # find the interval containing v
                nearest_index = 0  # initialize interval index
                for i in range(1, len(x) - 1):
                    if (v >= x[i]) & (v <= x[i + 1]):  # v lies between x[i] and x[i+1]; i is the index we need
                        nearest_index = i
                        break
                k = tf.subtract(tf.gather(y, nearest_index + 1), tf.gather(y, nearest_index))  # calculate slope
                de_x = x[nearest_index + 1] - x[nearest_index]
                k = tf.divide(k, de_x)
                b_sub = tf.multiply(k, x[nearest_index])  # calculate bias
                b = tf.subtract(tf.gather(y, nearest_index), b_sub)
                a = tf.multiply(k, v)
                a = tf.add(a, b)
                t_return_row.append(a)
        t_return.append(t_return_row)
        t_return_row = []
    t_return = tf.convert_to_tensor(t_return)
    t_return = tf.cast(t_return, tf.float32)
    return t_return
EDIT:
I say it is unusable because TensorFlow has to calculate gradients through all of the per-element operations in these loops, which makes the network really hard to train. This might be the answer to the question I asked yesterday.
There is a function in TensorFlow doing bilinear interpolation.
tf.contrib.resampler.resampler()
I'm wondering whether it is possible to use this function to do linear interpolation in my situation?
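I cannot say whether tf.contrib.resampler fits here, but below is a hedged sketch of a loop-free 1-D interpolation using tf.searchsorted and tf.gather (TensorFlow 1.13+ / 2.x). It assumes x is strictly increasing, keeps the computation differentiable with respect to y, and handles the left/right fill values with tf.where:
import tensorflow as tf

def tf_interp(t, x, y, left, right):
    # linear interpolation of y(x) at query points t; x must be 1-D and increasing
    t = tf.cast(tf.reshape(t, [-1]), tf.float32)   # flatten the queries
    x = tf.cast(tf.reshape(x, [-1]), tf.float32)
    y = tf.cast(tf.reshape(y, [-1]), tf.float32)

    # index i of the interval [x[i], x[i+1]] containing each query
    i = tf.searchsorted(x, t, side='right') - 1
    i = tf.clip_by_value(i, 0, tf.shape(x)[0] - 2)

    x0, x1 = tf.gather(x, i), tf.gather(x, i + 1)
    y0, y1 = tf.gather(y, i), tf.gather(y, i + 1)
    out = y0 + (y1 - y0) / (x1 - x0) * (t - x0)

    # constant fill values outside the data range
    out = tf.where(t < x[0], tf.ones_like(out) * left, out)
    out = tf.where(t > x[-1], tf.ones_like(out) * right, out)
    return out
The flat result can be reshaped back to t's original [64, 64] shape with tf.reshape.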
