I need to do one-dimensional linear interpolation while building my model in TensorFlow. I wrote a linear interpolation function directly from the definition, but it is so computationally intensive that it is almost unusable in my model. Is there an efficient way to perform one-dimensional interpolation in TensorFlow?
Here is my code for linear interpolation, similar to numpy.interp().
t contains the points at which to interpolate and has shape [64,64].
x contains the x-coordinates of the data points and has shape [91,1].
y contains the y-coordinates of the data points and has shape [91,1].
t and x are NumPy arrays and y is a tensor.
import numpy as np
import tensorflow as tf

def tf_interpolation_v2(t, x, y, left, right):
    # Perform one-dimensional linear interpolation.
    # Returns a tensor with the same shape as t.
    # t: the x-coordinates at which to evaluate the interpolated values
    # x: the x-coordinates of the data points, must be increasing
    # y: the y-coordinates of the data points, same length as x
    # left: value to return for t < x[0]
    # right: value to return for t > x[-1]
    t = np.asarray(t)
    t = t.astype(np.float32)
    x = x.astype(np.float32)
    y = tf.cast(y, tf.float32)
    t_return = []
    t_return_row = []
    for row in t:
        for v in row:
            if v < x[0]:  # value smaller than x[0]
                a = left
                t_return_row.append(a)
            elif v > x[-1]:  # value larger than x[-1]
                a = right
                t_return_row.append(a)
            else:  # find the interval that contains v
                nearest_index = 0  # initialize interval index
                for i in range(1, len(x) - 1):
                    if (v >= x[i]) & (v <= x[i+1]):  # v lies between x[i] and x[i+1]
                        nearest_index = i  # i is the index we need
                        break
                k = tf.subtract(tf.gather(y, nearest_index + 1), tf.gather(y, nearest_index))  # calculate slope
                de_x = x[nearest_index + 1] - x[nearest_index]
                k = tf.divide(k, de_x)
                b_sub = tf.multiply(k, x[nearest_index])  # calculate bias
                b = tf.subtract(tf.gather(y, nearest_index), b_sub)
                a = tf.multiply(k, v)
                a = tf.add(a, b)
                t_return_row.append(a)
        t_return.append(t_return_row)
        t_return_row = []
    t_return = tf.convert_to_tensor(t_return)
    t_return = tf.cast(t_return, tf.float32)
    return t_return
EDIT:
I say it is unusable because TensorFlow has to calculate gradients through every operation in the linear interpolation function, which makes the network very hard to train. This might be the answer to the question I asked yesterday.
TensorFlow has a function that does bilinear interpolation:
tf.contrib.resampler.resampler()
Is it possible to use this function for linear interpolation in my case?
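For comparison, here is a minimal sketch of a loop-free version of the same interpolation, written in NumPy so it can be checked against numpy.interp(). The function name interp_vectorized is made up for this illustration. TensorFlow provides tf.searchsorted and tf.gather, so the same structure should carry over, but that port is an assumption and is not shown or tested here.

```python
import numpy as np

def interp_vectorized(t, x, y, left, right):
    """Piecewise-linear interpolation of (x, y) at all points of t at once."""
    t = np.asarray(t, dtype=np.float32)
    x = np.asarray(x, dtype=np.float32).ravel()  # accept shape [91, 1]
    y = np.asarray(y, dtype=np.float32).ravel()
    # For every t, index of the right end of the interval that contains it,
    # clipped so that idx - 1 and idx are always valid positions in x.
    idx = np.clip(np.searchsorted(x, t, side="right"), 1, len(x) - 1)
    x0, x1 = x[idx - 1], x[idx]
    y0, y1 = y[idx - 1], y[idx]
    out = y0 + (y1 - y0) * (t - x0) / (x1 - x0)
    # Points outside [x[0], x[-1]] get the fill values.
    out = np.where(t < x[0], left, out)
    out = np.where(t > x[-1], right, out)
    return out.astype(np.float32)

x = np.linspace(0.0, 1.0, 91).reshape(91, 1)
y = np.sin(x)
t = np.random.RandomState(0).uniform(-0.2, 1.2, size=(64, 64))
result = interp_vectorized(t, x, y, left=-1.0, right=2.0)
```

The whole [64,64] grid is handled by a handful of array operations instead of one small graph per element, which is where the original version spends its time.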
Related
I have M points in 2-dimensional Euclidean space, stored in an array X of size M x 2.
I have constructed a cost matrix whose element ij is the distance d(X[i, :], X[j, :]). The distance function I am using is the standard Euclidean distance weighted by the inverse of a matrix D, i.e. d(x,y) = < D^{-1}(x-y) , x-y >. I would like to know whether there is a more efficient way of doing this. Note that I have already avoided for loops almost entirely.
import numpy as np
Dinv = np.linalg.inv(D)
def cost(X, Dinv):
    Msq = len(X) ** 2
    mesh = []
    for i in range(2):  # separate each coordinate axis
        xmesh = np.meshgrid(X[:, i], X[:, i])  # meshgrid each axis
        xmesh = xmesh[1] - xmesh[0]  # create the difference matrix
        xmesh = xmesh.reshape(Msq)  # reshape into vector
        mesh.append(xmesh)  # save/append into list
    meshv = np.vstack((mesh[0], mesh[1])).T  # recombined coordinate axis
    # apply D^{-1}
    Dx = np.einsum("ij,kj->ki", Dinv, meshv)
    return np.sum(Dx * meshv, axis=1)  # dot the elements
I'd try something like this, mostly optimizing your meshv calculation:
meshv = (X[:,None]-X).reshape(-1,2)
((meshv @ Dinv.T)*meshv).sum(1)
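As a quick sanity check, the broadcasting approach can be compared against a plain double loop over all pairs. This is only a sketch: the function names and the random test matrix D are made up for the illustration.

```python
import numpy as np

def cost_broadcast(X, Dinv):
    """Return d(X[a], X[b]) = <Dinv (X[a]-X[b]), X[a]-X[b]> for all pairs, flattened."""
    diff = (X[:, None, :] - X[None, :, :]).reshape(-1, X.shape[1])  # (M*M, 2)
    # Quadratic form diff_k^T Dinv diff_k for every row k.
    return np.einsum("ki,ij,kj->k", diff, Dinv, diff)

def cost_loops(X, Dinv):
    """Reference implementation with explicit loops, for checking only."""
    M = len(X)
    out = np.empty(M * M)
    for a in range(M):
        for b in range(M):
            v = X[a] - X[b]
            out[a * M + b] = v @ Dinv @ v
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))
D = np.array([[2.0, 0.3], [0.3, 1.0]])  # arbitrary invertible matrix
Dinv = np.linalg.inv(D)
flat = cost_broadcast(X, Dinv)
cost_matrix = flat.reshape(len(X), len(X))  # element [a, b] is d(X[a], X[b])
```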
I want to write a batched pairwise bivariate Moran's I. The formula can be found here.
If X and Y are both (n,n) then the weight matrix W has dimension (n^2, n^2).
I think I have a vectorization for a toy example with a single pair, as follows. Note that x and y have to be flattened and standardized.
import numpy as np
import torch
n_elts = 4
x = torch.FloatTensor([0,1,1,0])
y = torch.FloatTensor([0,1,1,0])
w = torch.FloatTensor(np.identity(n_elts))
x = x - torch.mean(x)
y = y - torch.mean(y)
ans = torch.sum(torch.outer(x,y) * w)/(torch.norm(x)**2) * (n_elts/torch.sum(w)) # = 1
I'm having a hard time extending this to the batched pairwise case. That is, x has shape (B,n,n) and y has shape (C,n,n). You can assume they get flattened to (B,n^2) and (C, n^2), respectively. The output should have shape (B,C). Here B is batch size and C is some number that will generally be different than B.
So far all I can figure out is that, again if x is (B,n^2) and y is (C,n^2), I can get a broadcasted outer product as follows:
xt = x[:,None,:,None]
yt = y[None,:,None,:]
outer = xt*yt # has shape (B,C,n^2,n^2)
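One possible way to avoid materializing the (B,C,n^2,n^2) tensor: the sum of outer(x,y) * w is the bilinear form x^T W y, so all pairs can be obtained with two matrix products. The sketch below uses NumPy and follows the normalization of the toy example above (division by the squared norm of x); whether that is the intended definition for the batched case is an assumption. The same expression should work with torch tensors using the @ operator, but that is not tested here.

```python
import numpy as np

def batched_bivariate_moran(x, y, w):
    """x: (B, n2), y: (C, n2), w: (n2, n2). Returns an array of shape (B, C)."""
    n2 = x.shape[1]
    x = x - x.mean(axis=1, keepdims=True)  # center every flattened sample
    y = y - y.mean(axis=1, keepdims=True)
    num = x @ w @ y.T                      # num[b, c] = sum_ij x[b, i] w[i, j] y[c, j]
    denom = (x ** 2).sum(axis=1, keepdims=True)  # squared norm of each x, shape (B, 1)
    return num / denom * (n2 / w.sum())

# Toy example from above, as a batch with B = C = 1.
toy = batched_bivariate_moran(
    np.array([[0.0, 1.0, 1.0, 0.0]]),
    np.array([[0.0, 1.0, 1.0, 0.0]]),
    np.identity(4),
)
```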
I have a set of data values for a scalar 3D function, arranged as inputs x,y,z in an array of shape (n,3) and the function values f(x,y,z) in an array of shape (n,).
EDIT: For instance, consider the following simple function
data = np.array([np.arange(n)]*3).T
F = np.linalg.norm(data,axis=1)**2
I would like to convolve this function with a spherical kernel in order to perform a 3D smoothing. The easiest way I found to perform this is to map the function values in a 3D spatial grid and then apply a 3D convolution with the kernel I want.
This works fine, however the part that maps the 3D function to the 3D grid is very slow, as I did not find a way to do it with NumPy only. The code below is my actual implementation, where data is the (n,3) array containing the 3D positions in which the function is evaluated, F is the (n,) array containing the corresponding values of the function and M is the (N,N,N) array that contains the 3D space grid.
step = 0.1
# Create meshgrid
xmin = data[:,0].min()
xmax = data[:,0].max()
ymin = data[:,1].min()
ymax = data[:,1].max()
zmin = data[:,2].min()
zmax = data[:,2].max()
x = np.linspace(xmin,xmax,int((xmax-xmin)/step)+1)
y = np.linspace(ymin,ymax,int((ymax-ymin)/step)+1)
z = np.linspace(zmin,zmax,int((zmax-zmin)/step)+1)
# Build image
M = np.zeros((len(x),len(y),len(z)))
for l in range(len(data)):
    for i in range(len(x)-1):
        if x[i] < data[l,0] < x[i+1]:
            for j in range(len(y)-1):
                if y[j] < data[l,1] < y[j+1]:
                    for k in range(len(z)-1):
                        if z[k] < data[l,2] < z[k+1]:
                            M[i,j,k] = F[l]
Is there a more efficient way to fill a 3D spatial grid with the values of a 3D function?
For each item of data you are scanning the pixels of the cuboid to check whether the point lies inside. You can skip this scan by calculating the corresponding pixel indices directly, for example:
import numpy as np
data = np.array([[1, 2, 3],        # 14 (corner1)
                 [4, 5, 6],        # 77 (corner2)
                 [2.5, 3.5, 4.5],  # 38.75 (duplicated pixel)
                 [2.9, 3.9, 4.9],  # 47.63 (duplicated pixel)
                 [1.5, 2, 3]])     # 15.25 (one step up from [1, 2, 3])
F = np.linalg.norm(data, axis=1)**2  # the function values listed in the comments above
step = 0.5
data_idx = ((data - data.min(axis=0))//step).astype(int)
M = np.zeros(np.max(data_idx, axis=0) + 1)
x, y, z = data_idx.T
M[x, y, z] = F
Note that when several points fall into the same pixel, only one of their values ends up in M.
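If losing the other values is not acceptable, one option (not part of the approach above, just a sketch) is to average all values that land in the same pixel. np.add.at accumulates correctly when an index appears more than once, which plain fancy-index assignment does not.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [2.5, 3.5, 4.5],  # shares a pixel with the next point
                 [2.9, 3.9, 4.9],
                 [1.5, 2, 3]])
F = np.linalg.norm(data, axis=1) ** 2
step = 0.5

data_idx = ((data - data.min(axis=0)) // step).astype(int)
shape = tuple(data_idx.max(axis=0) + 1)
index = tuple(data_idx.T)  # (x indices, y indices, z indices)

sums = np.zeros(shape)
counts = np.zeros(shape)
np.add.at(sums, index, F)    # accumulate values, duplicates included
np.add.at(counts, index, 1)  # how many points fell into each pixel

# Mean per pixel; pixels without any point stay at zero.
M = np.divide(sums, counts, out=np.zeros(shape), where=counts > 0)
```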
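If the data lies on a complete regular grid, all you need is to reshape the function values into a grid. Here F is assumed to be an (n, 4) array whose columns are x, y, z and f(x, y, z), so the values are F[:, 3]. It is hard to be more precise without sample data.
If the data is not sorted, you need to sort it:
F_sorted = F[np.lexsort((F[:,0], F[:,1], F[:,2]))] # the last key is the primary one: sorts by z, then y, then x
Choose only f(x, y, z):
F_values = F_sorted[:, 3]
Finally, reshape the values into a grid (with this sort order the axes of M are z, y, x):
M = F_values.reshape(N, N, N)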
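A small self-contained sketch of the sort-and-reshape idea, under the assumption that the points cover a full N x N x N grid exactly once. The array name table and the test function are made up for the illustration. Note that np.lexsort treats the last key as the primary one, so x is passed last here to obtain M[i, j, k] with x as the first axis.

```python
import numpy as np

N = 3
axis = np.arange(N, dtype=float)
gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")

# Columns: x, y, z, f(x, y, z), with the rows in random order.
table = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel(),
                         (gx**2 + gy**2 + gz**2).ravel()])
rng = np.random.default_rng(0)
table = table[rng.permutation(len(table))]

# Sort with x as primary key, then y, then z (last key is primary).
order = np.lexsort((table[:, 2], table[:, 1], table[:, 0]))
M = table[order, 3].reshape(N, N, N)  # M[i, j, k] = f(axis[i], axis[j], axis[k])
```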
This method is faster than the original (approximately a 20x speed-up):
step = 0.1
mins = np.min(data, axis=0)
maxs = np.max(data, axis=0)
ranges = np.floor((maxs - mins) / step + 1).astype(int)
indx = np.zeros(data.shape, dtype=int)
for i in range(3):
    x = np.linspace(mins[i], maxs[i], ranges[i])
    first_ge = np.argmax(data[:, i, np.newaxis] <= x[np.newaxis, :], axis=1)
    # clamp so that points exactly on the minimum map to cell 0 instead of index -1
    indx[:, i] = np.maximum(first_ge - 1, 0)
M = np.zeros(ranges)
M[indx[:, 0], indx[:, 1], indx[:, 2]] = F
The first part sets up the required grid variables. The argmax function provides a simple (and fast) way to find the first true value of the broadcasted array. This produces a set of indices for x, y and z directions for each of the function values.
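To make the argmax step concrete, here is a tiny one-dimensional illustration with made-up values: comparing each data value against every grid coordinate gives a boolean row per value, and argmax returns the position of the first True in that row.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)       # grid coordinates: 0, 0.25, 0.5, 0.75, 1
vals = np.array([0.1, 0.5, 0.9])   # data values along one axis

# mask[r, c] is True where grid coordinate c is >= value r.
mask = vals[:, np.newaxis] <= x[np.newaxis, :]
first_true = np.argmax(mask, axis=1)  # index of the first grid coordinate >= value
print(first_true)  # [1 2 4]
```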
The resulting array M is not the same as that produced by the original code as the original code loses data. The logic of y[j] < data[l,1] < y[j+1] where y is a vector produced using linspace means the minimum and maximum values for each direction will be missed (data[l,1] might be equal to either y[j] or y[j+1]!). Run it with a dataset of two values each with their own coordinates and the M array will be all zeros.
I wrote a multilayer perceptron using three layers (0,1,2). I want to plot the decision boundary and the data set (eight features long) that I classified, using Python.
How do I plot it on the screen using one of the Python libraries?
Weight function -> matrix[3][8]
Sample x -> vector[8]
import random
import sys
import numpy as np
#-- Trains the decision boundary and tests it. --#
def perceptron(x, y):
    m = len(x)
    d = len(x[0])
    eta = 0.1
    w = [[0 for k in range(d)] for j in range(3)]
    T = 2500
    for t in range(0, T):
        i = random.randint(0, m - 1)
        v = [float(j) for j in x[i]]
        y_hat = np.argmax(np.dot(w, v))
        if y_hat != y[i]:
            w[y[i]] = np.add(w[y[i]], np.array(v) * eta)
            w[y_hat] = np.subtract(w[y_hat], np.array(v) * eta)
    w_perceptron = w
    #-- Test the decision boundary that we trained. --#
    #-- Returns the fraction of misclassified training samples. --#
    M_perceptron = 0
    for t in range(0, m):
        y_hat = np.argmax(np.dot(w_perceptron, x[t]))
        if y[t] != y_hat:
            M_perceptron = M_perceptron + 1
    return float(M_perceptron) / m
def main():
    y = []
    x = [[]]
    x = readTrain_X(sys.argv[1], x)  # Reads the training data set.
    readTrain_Y(sys.argv[2], y)  # Reads the correct labels of the training set.
    print(perceptron(x, y))
You cannot plot 8 features directly: there is no way to visualize an 8-D space. What you can do is reduce the data to 2-D with PCA or t-SNE for visualization. Once the data is in 2-D, you can create a grid of values and use the probabilities returned by the model on that grid to visualize the decision boundary.
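A minimal sketch of the PCA route, using only NumPy. The helper names (pca_2d, decision_grid) are made up, and the way grid points are classified is an assumption: each 2-D grid point is mapped back into the 8-D feature space through the two principal components and then labelled with argmax(w . x), as in the perceptron above. The returned arrays could then be passed to a plotting library, for example as a filled contour of labels with the projected samples scattered on top.

```python
import numpy as np

def pca_2d(X):
    """Return the mean and the first two principal directions of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:2]  # rows of vt are the principal directions

def decision_grid(w, X, n=100):
    """Project X to 2-D and label an n x n grid in that plane with the classifier w."""
    mean, comps = pca_2d(X)
    Z = (X - mean) @ comps.T                  # samples in 2-D, shape (m, 2)
    lo, hi = Z.min(axis=0) - 1.0, Z.max(axis=0) + 1.0
    gx, gy = np.meshgrid(np.linspace(lo[0], hi[0], n),
                         np.linspace(lo[1], hi[1], n))
    grid2d = np.column_stack([gx.ravel(), gy.ravel()])
    grid_full = mean + grid2d @ comps         # back to the original feature space
    labels = np.argmax(grid_full @ np.asarray(w).T, axis=1).reshape(gx.shape)
    return Z, gx, gy, labels

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((50, 8))   # stand-in for the real training data
w_demo = rng.standard_normal((3, 8))    # stand-in for the trained 3 x 8 weights
Z, gx, gy, labels = decision_grid(w_demo, X_demo, n=40)
```

Keep in mind that the plot only shows a 2-D slice through the 8-D decision regions, so samples that look misplaced in the plane may still be classified correctly in the full space.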
Currently my convergence criterion for SGD checks whether the ratio of the current MSE to the previous MSE is within a specific bound.
def compute_mse(data, labels, weights):
    m = len(labels)
    hypothesis = np.dot(data,weights)
    sq_errors = (hypothesis - labels) ** 2
    mse = np.sum(sq_errors)/(2.0*m)
    return mse
cur_mse = 1.0
prev_mse = 100.0
m = len(labels)
while cur_mse/prev_mse < 0.99999:
    prev_mse = cur_mse
    for i in range(m):
        d = np.array(data[i])
        hypothesis = np.dot(d, weights)
        gradient = np.dot((labels[i] - hypothesis), d)/m
        weights = weights + (alpha * gradient)
    cur_mse = compute_mse(data, labels, weights)
    if cur_mse > prev_mse:
        return
The weights are updated with respect to a single data point in the training set.
With an alpha of 0.001, the model is supposed to converge within a few iterations, but I get no convergence. Is this convergence criterion too strict?
I'll try to answer the question. First, the pseudocode of stochastic gradient descent looks something like this:
input: f(x), alpha, initial x (guess or random)
output: min_x f(x) # x that minimizes f(x)
while True:
    shuffle data # good practice, not completely needed
    for d in data:
        x -= alpha * grad(f(x)) # df/dx
    if <stopping criterion>:
        break
There can be other regularization parameters added to the function that you want to minimize, such as the l1 penalty to avoid overfitting.
Going back to your problem: judging by your data and your definition of the gradient, it looks like you want to solve a simple linear system of equations of the form:
Ax = b
which yields the objective function:
f(x) = ||Ax - b||^2
Stochastic gradient descent uses one row of the data at a time:
f_i(x) = ||A_i x - b_i||^2
where || o || is the Euclidean norm and _i denotes the index of a row.
Here, A is your data, x is your weights and b is your labels.
The gradient of the function is then computed as:
grad(f(x)) = 2 * A.T (Ax - b)
Or, in the case of stochastic gradient descent:
grad(f_i(x)) = 2 * A_i.T (A_i x - b_i)
where .T means transpose.
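The gradient expression can be checked numerically. The sketch below, with made-up random data, compares 2 * A.T (Ax - b) against a central finite-difference approximation of f(x) = ||Ax - b||^2.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 2))
b = rng.standard_normal((100, 1))
x = rng.standard_normal((2, 1))

def f(x):
    """Least-squares objective ||Ax - b||^2."""
    return np.sum((A @ x - b) ** 2)

analytic = 2 * A.T @ (A @ x - b)

# Central differences, one coordinate of x at a time.
eps = 1e-6
numeric = np.zeros_like(x)
for j in range(x.shape[0]):
    e = np.zeros_like(x)
    e[j] = eps
    numeric[j] = (f(x + e) - f(x - e)) / (2 * eps)

max_abs_diff = np.max(np.abs(analytic - numeric))
```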
Putting everything back into your code... first I will set up some synthetic data:
import numpy as np
from random import shuffle
A = np.random.randn(100, 2)  # 100x2 data
x = np.random.randn(2, 1)  # 2x1 weights
b = np.random.randint(0, 2, 100).reshape(100, 1)  # 100x1 labels
b[b == 0] = -1  # labels in {-1, 1}
Then, define the parameters:
alpha = 0.001
cur_mse = 100.
prev_mse = np.inf
it = 0
max_iter = 100
m = A.shape[0]
idx = list(range(m))  # a list, so that shuffle() can reorder it in place
And loop!
while cur_mse/prev_mse < 0.99999 and it < max_iter:
    prev_mse = cur_mse
    shuffle(idx)
    for i in idx:
        d = A[i:i+1]
        y = b[i:i+1]
        h = np.dot(d, x)
        dx = 2 * np.dot(d.T, (h - y))
        x -= (alpha * dx)
    cur_mse = np.mean((A.dot(x) - b)**2)
    if cur_mse > prev_mse:
        raise Exception("Not converging")
    it += 1
This code is pretty much the same as yours, with a couple of additions:
Another stopping criterion based on the number of iterations (to avoid looping forever if the system doesn't converge or converges too slowly).
A redefinition of the gradient dx (still similar to yours). You compute the residual with the opposite sign (labels - hypothesis), so your weight update uses + where mine uses - (subtracting makes sense, since you are moving down the gradient).
Indexing of data and labels. While data[i] gives an array of shape (2,) (in this case for 100x2 data), slicing with data[i:i+1] returns a view of the data that keeps both dimensions (i.e. shape (1, 2)) and therefore lets you perform the proper matrix multiplications.
You can add a third stopping criterion based on an acceptable MSE, i.e. if cur_mse < 1e-3: break.
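The three stopping criteria can be combined in one function. This is a sketch under stated assumptions, not the code from the question: the function name, the parameter names (tol_ratio, mse_target) and the noise-free test system are all made up for the illustration.

```python
import numpy as np

def sgd_least_squares(A, b, alpha=0.001, tol_ratio=0.99999,
                      mse_target=1e-3, max_iter=100, seed=0):
    """SGD on ||Ax - b||^2 that stops on stall, on target MSE, or after max_iter epochs."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((A.shape[1], 1))
    cur_mse = np.mean((A @ x - b) ** 2)
    it = 0
    while it < max_iter:                       # criterion 1: iteration budget
        for i in rng.permutation(A.shape[0]):  # shuffle every epoch
            d, y = A[i:i+1], b[i:i+1]          # slices keep the 2-D shapes
            x -= alpha * 2 * d.T @ (d @ x - y)
        prev_mse, cur_mse = cur_mse, np.mean((A @ x - b) ** 2)
        it += 1
        if cur_mse < mse_target:               # criterion 2: error is small enough
            break
        if cur_mse / prev_mse >= tol_ratio:    # criterion 3: no more real progress
            break
    return x, cur_mse, it

# Noise-free system, so an exact solution exists and the MSE can reach the target.
rng = np.random.default_rng(1)
A_demo = rng.standard_normal((100, 2))
x_true = np.array([[1.5], [-0.5]])
b_demo = A_demo @ x_true
x_hat, final_mse, epochs = sgd_least_squares(A_demo, b_demo, alpha=0.01, max_iter=500)
```

With labels in {-1, 1} and random data, as in the example above, no exact solution exists, so the MSE target would normally not be reached and the ratio or iteration criterion would end the loop instead.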
This algorithm, with random data, converges in 20-40 iterations for me (depending on the generated random data).
So... assuming that this is the function you want to minimize, if this method doesn't work for you, it might mean that your system is underdetermined (you have fewer training samples than features, which means A is wider than it is tall).
Hope it helps!