I have been trying to model nonlinear dynamic systems with LSTM networks in Keras. I have had some success simply using the standard Keras LSTM layers, defining my input/output relationship with something like the following pseudo-code:
x[t] = NN(y[t-200:t],x[t-200-1:t-1])
where y is my forcing function and x is the variable I'm after, so I use the past 200 points to estimate the next point. I do this recursively by adding each newly predicted point to my "past outputs" vector.
Now I would like to add some information about the PDE that I'm solving to the loss function, so I need to compute derivatives with respect to time. I have read this answer and the related answers to get started, but I can't seem to get that workflow to work with LSTMs. First of all, time is not an explicit variable in my setup, so I would need to add it as an input to accommodate the approach in that answer.
So I could add the time vector to the list of inputs, and then try to compute derivatives of the output with respect to the input:
_df1 = grad(model.output,model.input)
df1 = tf.Print( _df1, [ _df1 ], message = "df1" )
For reference, my input dimension is (?,200,3) and my output dimension is simply (?,1). The code above works and I get a (?,200,3) tensor. But when I try to compute the second derivative like so:
_df2 = grad(df1,model.input)
df2 = tf.Print( _df2, [ _df2 ], message = "df2" )
Then I get the error:
TypeError: Second-order gradient for while loops not supported.
Since I only need the derivatives at the current timestep (t), I have tried slicing the tensor, but that doesn't work either.
_df2 = grad(df1[:,-1,-1],model.input)
df2 = tf.Print( _df2, [ _df2 ], message = "df2" )
Even if I could do something like that, I am not too comfortable adding the time vector as an explicit input. I have considered computing the derivative numerically with diff() (given a constant dt), but I am not sure how to do that here when dealing with tensors.
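Regarding the numerical diff() idea, here is a rough sketch of what that could look like on tensors. Everything here is an assumption for illustration: `model` is the compiled Keras LSTM, channel 0 of the (?,200,3) input holds the past x values, and dt is the constant sampling interval.

from keras import backend as K

dt = 0.01  # assumed constant sampling interval (placeholder value)

# assumed layout: channel 0 of the input holds the past x values
x_hist = model.input[:, :, 0]                           # shape (?, 200)
x_seq = K.concatenate([x_hist, model.output], axis=1)   # shape (?, 201)

# backward differences at the newest point
dx_dt = (x_seq[:, -1] - x_seq[:, -2]) / dt
d2x_dt2 = (x_seq[:, -1] - 2.0 * x_seq[:, -2] + x_seq[:, -3]) / dt ** 2

Those tensors could then be combined into the homogeneous PDE residual and penalized inside a custom loss.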
So I'd appreciate any suggestions or ideas to help me solve this problem. Ultimately, I'd like to add the homogeneous portion of the PDE to the loss function. At this point, my equation only has derivatives with respect to time.
Thanks.
I'm trying to implement a loss function that penalizes the model according to how badly it ranks a batch of images from worst to best. To do this, I've come up with an algorithm that sorts the predicted score array according to the true scores of the image batch. Then, starting from the largest predicted score, it checks how far away that score is from the first position in the array and assigns a loss based on that distance; for the second largest, it checks how far away it is from the second position, and so on.
To do this I'm using tf.nn.top_k and other functions that, to my knowledge, are all differentiable, but I still get the "No gradients provided" error.
Can someone please tell me which part I'm doing wrong?
Please note that the global sub_tensor is a workaround (replacing correct_indices) to avoid using tf.range, which I know is non-differentiable: it is a fixed array from outside the function holding a range of the batch length [0-32]. This still didn't work.
import numpy as np
import tensorflow as tf

# fixed range of batch indices, used as a workaround for tf.range
sub_tensor = tf.constant(np.array([np.arange(32)], dtype='int32'))

def get_ranking_loss(y_true, y_pred):
    global sub_tensor
    # sort the predictions according to the true scores
    _, y_true_ind_k = tf.nn.top_k(y_true, y_true.shape[1])
    sorted_y_pred = tf.gather(y_pred, y_true_ind_k)
    # rank of each sorted prediction
    _, y_pred_ind_k = tf.nn.top_k(sorted_y_pred, sorted_y_pred.shape[1])
    # correct_indices = tf.range(0, sorted_y_pred.shape[1])
    # distance of each rank from the position it should occupy
    subtracted = tf.math.subtract(y_pred_ind_k, sub_tensor)
    absolute = tf.abs(subtracted)
    absolute = tf.cast(absolute, tf.float64)
    return tf.reduce_sum(absolute)
I tried changing almost all of the functions to tf functions only, but no luck.
I have a dataset of peak load for a year. It's a simple two-column dataset with the date and the load (kWh).
I want to train on the first 9 months and then predict the next three months. I can't get my head around how to implement SVR. I understand my 'y' would be the predicted value in kWh, but what about my X values?
Can anyone help?
Given multi-variable regression, y = f(x_1, x_2, ..., x_n):
Regression is a multi-dimensional separation, which can be hard to visualize in one's head since it is not 3D.
The better question might be: which inputs are consequential to the output value y?
Since you have the code for loadavg in the kernel source, you can use its input parameters.
For Python (I suppose it will be the same way for R):
Collect the data in this way:
[x_i-9, x_i-8, ..., x_i] vs [x_i+1, x_i+2, x_i+3]
The first vector is your input vector; the second vector is your output vector (or a single value if you like). Use the fit method of SVR, for example: http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html#sklearn.svm.SVR.fit
You can try scaling, removing outliers, applying weights, and so on. Play around :)
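A minimal sketch of that windowing with scikit-learn. The synthetic load series, the 10-point window and the SVR hyperparameters below are placeholders for illustration; for the three-steps-ahead target above you would fit one SVR per horizon.

import numpy as np
from sklearn.svm import SVR

# stand-in for the real kWh column: one year of synthetic daily peak loads
rng = np.random.RandomState(0)
load = 500 + 50 * np.sin(np.arange(365) * 2 * np.pi / 365) + rng.normal(0, 10, 365)

def make_windows(series, n_past=10):
    """Turn a 1-D series into ([x_{i-9}, ..., x_i], x_{i+1}) pairs."""
    X, y = [], []
    for i in range(n_past - 1, len(series) - 1):
        X.append(series[i - n_past + 1:i + 1])  # past n_past values
        y.append(series[i + 1])                 # next value to predict
    return np.array(X), np.array(y)

X, y = make_windows(load)
split = int(len(X) * 0.75)   # roughly the first 9 months for training

svr = SVR(kernel='rbf', C=10.0, epsilon=0.1)
svr.fit(X[:split], y[:split])
predictions = svr.predict(X[split:])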
One of my operations needs integers, but the output of a convolution is float.
That means I need to use tf.floor, tf.ceil, tf.cast, etc. to handle it.
But these operations produce None gradients, since operations like tf.floor are not differentiable.
So I tried the two approaches below.
First: a detour
out1 = tf.subtract(vif, tf.subtract(vif, tf.floor(vif)))
But the output of tf.test.compute_gradient_error is 500 or 0; I don't think this is a reasonable gradient.
Second: override the gradient function of Floor
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops

@ops.RegisterGradient("CustomFloor")
def _custom_floor_grad(op, grads):
    # pass the incoming gradient straight through, treating floor as identity
    return [grads]

A, B = 50, 7
shape = [A, B]
f = np.ones(shape, dtype=np.float32)
vif = tf.constant(f, dtype=tf.float32)
# out1 = tf.subtract(vif, tf.subtract(vif, tf.floor(vif)))
with tf.get_default_graph().gradient_override_map({"Floor": "CustomFloor"}):
    out1 = tf.floor(vif)

with tf.Session() as sess:
    err1 = tf.test.compute_gradient_error(vif, shape, out1, shape)
    print(err1)
The output of tf.test.compute_gradient_error is 500 or 1, so that doesn't work either.
Question: is there a way to get integers while keeping backpropagation working (values like 2.0 or 5.0 are fine)?
In general, it's not advisable to solve a discrete problem with gradient descent. You should be able to express integer solvers in TF to some extent, but you're more or less on your own.
FWIW, the floor function looks like a staircase. Its derivative is zero almost everywhere, with a Dirac impulse at every integer step, like a rake if you wish. A Dirac impulse has a finite integral but no finite value.
The canonical way to tackle these problems is to relax the hard floor constraint, replacing it with something that is (at least once) differentiable (smooth).
There are multiple ways to do this. Perhaps the most popular are:
Hack up a function that looks like what you want. For instance a piece-wise linear function that slopes down quickly, but not vertically.
Replace step functions by sigmoids
Use a filter approximation which is well understood if it's a time series
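For completeness, the second approach in the question (an identity gradient for floor) can also be written without a gradient override, using tf.stop_gradient. This is the common "straight-through" trick rather than one of the relaxations listed above, and is only a sketch:

import tensorflow as tf

def floor_with_identity_grad(x):
    # Forward pass: tf.floor(x). Backward pass: the gradient of x alone,
    # because tf.stop_gradient blocks gradients through (floor(x) - x).
    return x + tf.stop_gradient(tf.floor(x) - x)

x = tf.constant([1.3, 2.7, 5.0])
y = floor_with_identity_grad(x)           # values 1.0, 2.0, 5.0
g = tf.gradients(tf.reduce_sum(y), x)[0]  # all ones

with tf.Session() as sess:
    print(sess.run([y, g]))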
I have built a basic multi-layer perceptron which can learn from the MNIST dataset.
After training, I wanted to obtain an input from an output, i.e. reverse the feedforward function in order to obtain a picture generated by the network.
My whole code is similar to https://github.com/mnielsen/neural-networks-and-deep-learning/blob/master/src/network.py
The feedforward function looks like:
def feedforward(self, a):
    """Return the output of the network if ``a`` is input."""
    for b, w in zip(self.biases, self.weights):
        a = sigmoid(np.dot(w, a)+b)
    return a
If I try to reverse it using np.linalg.pinv, it doesn't work, because at some point I have to invert a vector and the solution is not unique. For example:
import numpy as np

m = np.array([[3, 4]])
n = np.array([[1], [2]])
o = np.dot(m, n)
print(np.dot(np.linalg.pinv(m), o))  # not equal to n
I'm stuck there. I think I need a hack to make the output of np.linalg.pinv more coherent, because as it is I don't even manage to retrieve a plausible input: the inverse of the sigmoid involves a logarithm, and with the values given by np.linalg.pinv I end up with negative numbers, resulting in NaN everywhere.
Any ideas?
PS: the end goal is to feed random noise into the network and watch numbers appear, hence the "dream" expression, which I picked up from http://googleresearch.blogspot.fr/2015/06/inceptionism-going-deeper-into-neural.html
If I understand correctly, there is no need to reverse the network in any way; instead, you create a random image, feed it to the network, and see how it performs.
Then you gradually change the input image in ways that bring the output closer to what you would like it to be. You should also try to make neighboring pixels similar to each other in intensity.
(This is basically what the provided link states in other words.)
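As a rough sketch of that procedure for the numpy network in the question (the helper names, step size and target digit below are assumptions for illustration): backpropagate the output error all the way to the input and take gradient steps on the image itself.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

def input_gradient(weights, biases, a0, target):
    """Gradient of 0.5*||feedforward(a0) - target||^2 with respect to the input a0."""
    # forward pass, caching the pre-activations
    a, zs = a0, []
    for w, b in zip(weights, biases):
        z = np.dot(w, a) + b
        zs.append(z)
        a = sigmoid(z)
    # backward pass all the way down to the input
    delta = (a - target) * sigmoid_prime(zs[-1])
    for l in range(len(weights) - 1, 0, -1):
        delta = np.dot(weights[l].T, delta) * sigmoid_prime(zs[l - 1])
    return np.dot(weights[0].T, delta)

# hypothetical usage with a trained Network instance `net` (sizes [784, 30, 10]):
# img = np.random.rand(784, 1)                     # start from random noise
# target = np.zeros((10, 1)); target[3] = 1.0      # "dream" the digit 3
# for _ in range(1000):
#     img -= 0.5 * input_gradient(net.weights, net.biases, img, target)
#     img = np.clip(img, 0.0, 1.0)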
I've tried to model a mixture of exponentials by copying the mixture-of-Gaussians example given here. The code is below. I know there are some funky aspects to the inference here, but my question is more about how to debug the calculations in models like this.
The idea is that it's a mixture of three exponentials, with scale parameters taken from the Gamma assigned to scales. However, all observations get assigned to the zeroth exponential during the ElemwiseCategoricalStep. You can see from initial_assignments that the assignments of the observations to the exponential components are initially diverse, and you can see that all observations are assigned to the zeroth component on all iterations from the fact that set(tr['exp'].flatten()) contains only 0.
I assume this is because all of the values assigned to p in the expression array([logp(v * self.sh) for v in self.values]) in ElemwiseCategoricalStep.astep are minus infinity. I would like to know why that is and how to correct it, but even more, I would like to know what tools are available to debug this kind of thing. Is there any way for me to step through the calculation of logp(v * self.sh) to see how the result is determined? If I try to do it using pdb, I think I get stymied at outputs = self.fn() in theano.compile.function_module.Function.__call__, which I guess I can't step into because it's a native function.
Even knowing how to compute the pdf for a given set of model parameters would be a useful start.
import numpy as np
import pymc as pm
from pymc import Model, Gamma, Normal, Dirichlet, Exponential
from pymc import Categorical
from pymc import sample, Metropolis, ElemwiseCategoricalStep

durations = np.concatenate(
    [np.random.exponential(1/lam, 10)
     for lam in [1e-3, 7e-5, 2e-6]])

initial_assignments = np.random.randint(0, 3, len(durations))
print 'initial_assignments', initial_assignments

with Model() as model:
    scales = Gamma('hp', 1, 1, shape=3)
    props = Dirichlet('props', a=np.array([1., 1., 1.]), shape=3)
    category = Categorical('exp', p=props, shape=len(durations))
    points = Exponential('obs', lam=scales[category], observed=durations)
    step1 = pm.Metropolis(vars=[props, scales])
    step2 = ElemwiseCategoricalStep(var=category, values=[0, 1, 2])
    start = {'exp': initial_assignments,
             'hp': np.ones(3),
             'props': np.ones(3)}
    tr = sample(3000, step=[step1, step2], start=start)

print set(tr['exp'].flatten())
Excellent question. One thing you can do is look at the pdf for each of the components.
The Model and each of the variables should have both a .logp and an .elemwise_logp property on them, which returns a function that can take a point or parameter values.
Thus you can say something like print scales.logp(start) or print model.logp(start) or print scales.dlogp()(start).
For now, I think you unfortunately have to specify all the parameter values (even ones that don't affect the result for a particular variable).
Model, FreeRV and ObservedRV all inherit from Factor, which provides this functionality and has a few other methods. You'll probably want the non-fast versions, since those are more forgiving in the kinds of arguments they accept.
Does that help? Please let me know if you have other ideas for things that might help you in debugging. This is one area where we know pymc3 and theano need some work.
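For example, putting the calls above together with the variable names from the question (just an illustration, evaluated at the same start point):

# Evaluate the log-probabilities at the starting point used for sampling;
# a -inf here points at the term that is causing the problem.
print 'model ', model.logp(start)
print 'scales', scales.logp(start)
print 'obs   ', points.logp(start)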