I designed a really simple recurrent neural network in Blocks (and Theano). As a cost function I decided to use the squared error function, defined simply as (y - y')^2. I would like to compute the average cost across the minibatch.
The following code is an almost-working example using the Blocks brick SquaredError, which, as far as I understand, is supposed to do exactly the desired operation.
Please ignore the inefficient float64; I use it only to simplify the eval call. The problem persists with 32-bit floats.
import theano.tensor as tt
from blocks.bricks.cost import SquaredError

if __name__ == '__main__':
    a = tt.vector('a', dtype='float64')
    b = tt.vector('b', dtype='float64')
    cost = SquaredError().apply(a, b)
    print(cost.eval({a: [1.0, 2.0, 3.0, 4.0],
                     b: [0.5, 2.1, 3.4, 3.8]}))
    # Expected: mean(0.5^2 + 0.1^2 + 0.4^2 + 0.2^2)
    # Got: ValueError: Not enough dimensions on squarederror_cost_matrix_output_0 to reduce on axis 1
If I change the problematic line to the one below, everything works as expected.
cost = tt.sqr(tt.abs_(a - b)).mean()
What am I doing wrong? I am trying to learn Blocks more but this is beyond my understanding. Am I supposed to use another brick? Or somehow preprocess the tensors?
Looks as though we require 2D inputs for CostMatrix bricks, which is kind of dumb. I've filed an issue about it. You can get around this, if you like, by dimshuffling your inputs up to (N, 1) matrices, but the Cost bricks are mainly useful if you're using the automatic tagging of inputs and outputs for VariableFilter operations, etc. Writing down the cost as you did in a Theano expression is fine too (although, to nitpick, you don't need the abs; the square of a negative number is always positive anyway).
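For reference, a minimal sketch of that dimshuffle workaround, reusing a and b from the question (I'm assuming the brick reduces over axis 1, so the result is a per-example cost that still needs averaging):

a2 = a.dimshuffle(0, 'x')   # shape (N,) -> (N, 1)
b2 = b.dimshuffle(0, 'x')
per_example = SquaredError().apply(a2, b2)   # assumed per-example cost vector
cost = per_example.mean()
print(cost.eval({a: [1.0, 2.0, 3.0, 4.0],
                 b: [0.5, 2.1, 3.4, 3.8]}))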
Related
So I've got some code
tensors = []  # it's filled with 3D float tensors
total = sum(tensors)
if I change that last line to
total = tf.add_n(tensors)
then the code produces the same output but runs much more slowly and soon causes
an out-of-memory exception. What's going on here? Can someone explain how Python's built-in sum function and tf.add_n each interact with a list of tensors, and why Python's sum would seemingly just be the better option?
When you use sum, you call a standard Python algorithm that calls __add__ successively on the elements of the list. Since __add__ (or +) is indeed overloaded on TensorFlow's tensors, it works as expected: it creates a graph that can be executed during a session. It is not optimal, however, because you add as many operations as there are elements in your list; you also enforce the order of the operations (add the first two elements, then the third to the result, and so on), which is not optimal either.
By contrast, add_n is a specialized operation to do just that. Looking at the graph is really telling I think:
import tensorflow as tf

with tf.variable_scope('sum'):
    xs = [tf.zeros(()) for _ in range(10)]
    sum(xs)

with tf.variable_scope('add_n'):
    xs = [tf.zeros(()) for _ in range(10)]
    tf.add_n(xs)
However, contrary to what I thought earlier, add_n takes up more memory because it waits for all of its inputs to be available, and stores them, before summing them. If the number of inputs is large, the difference can be substantial.
The behavior I was expecting from add_n, that is, summation of the inputs as they become available, is actually achieved by tf.accumulate_n. This should be the superior alternative, as it takes less memory than add_n but does not enforce the order of summation like sum.
Why did the authors of tensorflow-wavenet use sum instead of tf.accumulate_n? Most likely because this function was not differentiable on TF < 1.7. So if you have to support TF < 1.7 and be memory efficient, good old sum is actually quite a good option.
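A quick side-by-side sketch of the three options discussed above (TF 1.x graph mode assumed; the shapes are arbitrary):

import tensorflow as tf

xs = [tf.random_uniform([2, 2]) for _ in range(10)]

total_sum = sum(xs)                # chain of add ops; order of summation is fixed
total_add_n = tf.add_n(xs)         # single op; waits for (and stores) all inputs
total_acc = tf.accumulate_n(xs)    # accumulates inputs as they become available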
The sum() built-in only requires iterables and therefore would seem to gain the advantage of generators with regard to memory profile.
The add_n() function takes a list of tensors and seems to retain that data structure throughout handling, given its requirement for shape comparison.
In [29]: y = [1,2,3,4,5,6,7,8,9,10]
In [30]: y.__sizeof__()
Out[30]: 120
In [31]: x = iter(y)
In [32]: x.__sizeof__()
Out[32]: 32
I'm struggling with the fact that elements of sympy.MatrixSymbol don't seem to interact well with sympy's differentiation routines.
The reason I'm trying to work with elements of sympy.MatrixSymbol rather than "normal" sympy symbols is that I want to autowrap a large function, and this seems to be the only way to overcome argument-count limitations and allow input of a single array.
To give the reader a picture of the restrictions on possible solutions, I'll start with an overview of my intentions; however, the hasty reader might as well jump to the codeblocks below, which illustrate my problem.
Declare a vector or array of variables of some sort.
Build some expressions out of the elements of the former; these expressions are to make up the components of a vector valued function of said vector. In addition to this function, I'd like to obtain the Jacobian w.r.t. the vector.
Use autowrap (with the cython backend) to get numerical implementations of the vector function and its Jacobian. This puts some limitations on the former steps: (a) it is desired that the input of the function is given as a vector, rather than a list of symbols. (Both because there seems to be a limit to the number of inputs for an autowrapped function, and to ease interaction with scipy later on, i.e. avoid having to unpack numpy vectors to lists often).
On my journey, I ran into 2 issues:
Cython does not seem to like some sympy functions, among them sympy.Max, upon which I rely heavily. The "helpers" kwarg of autowrap seems unable to handle multiple helpers at once.
This is by itself not a big deal, as I learned to circumvent it using abs() or sign(), which cython readily understands.
(see also this question on the above)
As stated before, autowrap/cython do not accept more than 509 arguments in form of symbols, at least not in my compiler setup. (See also here)
As I would prefer to give a vector rather than a list as input to the function anyways, I looked for a way to get the wrapped function to take a numpy array as input (comparable to DeferredVector + lambdify). It seems the natural way to do this is sympy.MatrixSymbol. (See thread linked above. I'm not sure there'd be an alternative, if so, suggestions are welcome.)
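For comparison, the DeferredVector + lambdify route I mean looks roughly like the sketch below; it yields a purely numerical function that accepts a single array, but without cython, so it does not solve my autowrap problem:

import sympy
import numpy as np

x = sympy.DeferredVector('x')
expr = x[1] * ((x[0] - abs(x[0])) / 2) ** 2
f = sympy.lambdify(x, expr, modules='numpy')
print(f(np.array([-1.0, 2.0])))   # the resulting function takes one array argument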
My latest problem then starts here: I realized that the elements of sympy.MatrixSymbol in many ways do not behave like "other" sympy symbols. One has to assign the properties real and commutative individually, which then seems to work fine though. However, my real trouble starts when trying to get the Jacobian; sympy seems not to get derivatives of the elements right out of the box:
import sympy

X = sympy.MatrixSymbol("X", 10, 1)
for element in X:
    element._assumptions.update({"real": True, "commutative": True})

X[0].diff(X[0])
Out[2]: Derivative(X[0, 0], X[0, 0])

X[1].diff(X[0])
Out[15]: Derivative(X[1, 0], X[0, 0])
The following block is a minimal example of what I'd like to do, but here using normal symbols:
(I think it captures all I need, if I forgot something I'll add that later.)
import sympy
from sympy.utilities.autowrap import autowrap
X = sympy.symbols("X:2", real = True)
expr0 = X[1]*( (X[0] - abs(X[0]) ) /2)**2
expr1 = X[0]*( (X[1] - abs(X[1]) ) /2)**2
F = sympy.Matrix([expr0, expr1])
J = F.jacobian([X[0],X[1]])
J_num = autowrap(J, args = [X[0],X[1]], backend="cython")
And here is my (currently) best guess using sympy.MatrixSymbol, which then of course fails because of the Derivative expressions within J:
X = sympy.MatrixSymbol("X", 2, 1)
for element in X:
    element._assumptions.update({"real": True, "commutative": True, "complex": False})
expr0 = X[1]*( (X[0] - abs(X[0]) ) /2)**2
expr1 = X[0]*( (X[1] - abs(X[1]) ) /2)**2
F = sympy.Matrix([expr0, expr1])
J = F.jacobian([X[0],X[1]])
J_num = autowrap(J, args = [X], backend="cython")
Here is what J looks like after running the above:
J
Out[50]:
Matrix([
[(1 - Derivative(X[0, 0], X[0, 0])*X[0, 0]/Abs(X[0, 0]))*(-Abs(X[0, 0])/2 + X[0, 0]/2)*X[1, 0], (-Abs(X[0, 0])/2 + X[0, 0]/2)**2],
[(-Abs(X[1, 0])/2 + X[1, 0]/2)**2, (1 - Derivative(X[1, 0], X[1, 0])*X[1, 0]/Abs(X[1, 0]))*(-Abs(X[1, 0])/2 + X[1, 0]/2)*X[0, 0]]])
Which, unsurprisingly, autowrap does not like:
[...]
wrapped_code_2.c(4): warning C4013: 'Derivative' undefined; assuming extern returning int
[...]
wrapped_code_2.obj : error LNK2001: unresolved external symbol Derivative
How can I tell sympy that X[0].diff(X[0])=1 and X[0].diff(X[1])=0? And perhaps even that abs(X[0]).diff(X[0]) = sign(X[0]).
Or is there any way around using sympy.MatrixSymbol and still get a cythonized function, where the input is a single vector rather than a list of symbols?
I would be grateful for any input; it might well be a workaround at any step of the process described above. Thanks for reading!
Edit:
One short remark: One solution I could come up with myself is this:
Construct F and J using normal symbols; then replace the symbols in both expressions by the elements of some sympy.MatrixSymbol. This seems to get the job done, but the replacement takes considerable time, as J can reach dimensions of ~1000x1000 and above. I therefore would prefer to avoid such an approach.
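For concreteness, a rough sketch of that substitution workaround (using lowercase x for the plain symbols and X for the MatrixSymbol; this only illustrates the idea and scales poorly):

import sympy

x = sympy.symbols("x:2", real=True)
X = sympy.MatrixSymbol("X", 2, 1)

expr0 = x[1] * ((x[0] - abs(x[0])) / 2) ** 2
expr1 = x[0] * ((x[1] - abs(x[1])) / 2) ** 2
F = sympy.Matrix([expr0, expr1])
J = F.jacobian([x[0], x[1]])

# replace the plain symbols by matrix elements so autowrap can take a single array argument
subs = {x[i]: X[i, 0] for i in range(2)}
F_m = F.subs(subs)
J_m = J.subs(subs)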
After more extensive research, it seems the problem I was describing above is already fixed in the development/github version. After updating accordingly, all the Derivative terms involving MatrixElement are resolved correctly!
See here for reference.
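A quick way to verify the fix after updating is something like the following (the expected outputs are my assumption based on the issue being resolved):

import sympy

X = sympy.MatrixSymbol("X", 2, 1)
print(X[0, 0].diff(X[0, 0]))   # expected: 1
print(X[1, 0].diff(X[0, 0]))   # expected: 0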
This is a generic question. I found that in TensorFlow, after we build the graph and feed data into it, the output from the graph is a tensor. But in many cases we need to do further computation based on this output (which is a tensor), which is not allowed in TensorFlow.
For example, I'm trying to implement an RNN that loops a number of times determined by a property of the data itself. That is, I need to use a tensor to judge whether I should stop (I am not using dynamic_rnn, since in my design the RNN is highly customized). I find that tf.while_loop(cond, body, ...) might be a candidate for my implementation, but the official tutorial is too simple. I don't know how to add more functionality to the 'body'. Can anyone give me a few more complex examples?
Also, in the case where future computation depends on a tensor output (e.g. the RNN stops based on an output criterion), which is a very common case, is there a more elegant or better way to do this than a dynamic graph?
What is stopping you from adding more functionality to the body? You can build whatever complex computational graph you like in the body and take whatever inputs you like from the enclosing graph. Also, outside of the loop, you can then do whatever you want with whatever outputs you return. As you can see from the amount of 'whatevers', TensorFlow's control flow primitives were built with much generality in mind. Below is another 'simple' example, in case it helps.
import tensorflow as tf
import numpy as np

def body(x):
    a = tf.random_uniform(shape=[2, 2], dtype=tf.int32, maxval=100)
    b = tf.constant(np.array([[1, 2], [3, 4]]), dtype=tf.int32)
    c = a + b
    return tf.nn.relu(x + c)

def condition(x):
    return tf.reduce_sum(x) < 100

x = tf.Variable(tf.constant(0, shape=[2, 2]))

with tf.Session():
    tf.global_variables_initializer().run()
    result = tf.while_loop(condition, body, [x])
    print(result.eval())
I have written Python code to smooth a given signal using the Weierstrass transform, which is basically the convolution of a normalised Gaussian with the signal.
The code is as follows:
#Importing relevant libraries
from __future__ import division
from scipy.signal import fftconvolve
import numpy as np
def smooth_func(sig, x, t=0.002):
    N = len(x)
    x1 = x[-1]
    x0 = x[0]
    # defining a new array y which is symmetric around zero, to make the gaussian symmetric
    y = np.linspace(-(x1 - x0) / 2, (x1 - x0) / 2, N)
    # gaussian centered around zero
    gaus = np.exp(-y**2 / t)
    # using fftconvolve to speed up the convolution; gaus.sum() is the normalization constant
    return fftconvolve(sig, gaus / gaus.sum(), mode='same')
If I run this code on, say, a step function, it smooths the corner, but at the boundary it sees another corner and smooths that too, which gives unwanted behaviour at the boundary. I illustrate this with the figure linked below.
Boundary effects
This problem does not arise if we compute the convolution by direct integration. Hence the problem is not in the Weierstrass transform itself, but in the fftconvolve function of scipy.
To understand why this problem arises we first need to understand the working of fftconvolve in scipy.
The fftconvolve function basically uses the convolution theorem to speed up the computation.
In short it says:
convolution(int1,int2)=ifft(fft(int1)*fft(int2))
If we apply this theorem directly we don't get the desired result. To get the desired result we need to take the fft over an array twice the size of max(int1, int2). But this leads to the undesired boundary effects: if the fft length is greater than the size of the input, the input is zero-padded up to that length before the fft is taken, and this zero padding is exactly what is responsible for the undesired boundary effects.
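The effect of that zero padding is easy to see on a toy signal: the outputs near the edges are pulled down because the kernel overlaps the implicit zeros outside the signal (a small sketch, not the code from above):

import numpy as np
from scipy.signal import fftconvolve

sig = np.ones(8)
ker = np.ones(3) / 3
print(fftconvolve(sig, ker, mode='same'))   # interior values ~1.0, edge values ~0.67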
Can you suggest a way to remove these boundary effects?
I have tried to remove it with a simple trick. After smoothing the function, I compare the value of the smoothed signal with the original signal near the boundaries, and if they don't match I replace the value of the smoothed function with the input signal at that point.
It is as follows:
i = 0
eps = 1e-3
while abs(smooth[i] - sig[i]) > eps:  # comparing the signals on the left boundary
    smooth[i] = sig[i]
    i = i + 1

j = -1
while abs(smooth[j] - sig[j]) > eps:  # comparing on the right boundary
    smooth[j] = sig[j]
    j = j - 1
There is a problem with this method: because of the epsilon, there are small jumps in the smoothed function, as shown below:
jumps in the smooth func
Can any changes be made to the above method to solve this boundary problem?
The best approach is probably to use mode='valid':
The output consists only of those elements that do not
rely on the zero-padding.
Unless you can wrap your signal, or the signal being processed is an excerpt from a larger signal (in which case: process the full signal, then crop the region of interest), you are always going to have edge effects when doing convolution. You have to choose how you want to deal with them. Using mode='valid' just crops them off, which is a pretty good solution. If you know that the signal is always 'step-like', you could then extend the front and end of the processed signal as appropriate.
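As a sketch of that suggestion: with a kernel as long as the signal, 'valid' would leave only a single sample, so the Gaussian has to be truncated to a short support first (the 4-sigma width and the dx argument here are my own choices, not taken from the question):

import numpy as np
from scipy.signal import fftconvolve

def smooth_valid(sig, dx, t=0.002):
    sigma = np.sqrt(t / 2)                  # std of exp(-y**2 / t)
    half = int(np.ceil(4 * sigma / dx))
    y = np.arange(-half, half + 1) * dx
    gaus = np.exp(-y**2 / t)
    # output has len(sig) - 2*half samples, none of which touch the zero padding
    return fftconvolve(sig, gaus / gaus.sum(), mode='valid')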
What a symmetric filter kernel produces at the ends depends on what you assume the data is beyond the ends.
If you don't like the looks of the current result, which assumes zeros beyond both ends, try extending the data with another assumption, say a reflection of the data, or polynomial regression continuation. Extend the data on both ends by at least half the length of the filter kernel (except if your extension is zeros, which come for free with the existing zero-padding required for non-circular convolution). Then remove the added end extensions after filtering, and see if you like the looks of your assumption. If not, try another assumption. Or better yet, use actual data beyond the ends if you have such.
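If you want the output to keep the input's length, a sketch of the extend-filter-crop idea with reflection padding, built on the smooth_func from the question (the pad width of N//2 is an assumption; anything of at least half the kernel's effective length should do):

import numpy as np
from scipy.signal import fftconvolve

def smooth_reflect(sig, x, t=0.002):
    N = len(x)
    pad = N // 2
    sig_ext = np.pad(sig, pad, mode='reflect')   # reflect the data at both ends
    y = np.linspace(-(x[-1] - x[0]) / 2, (x[-1] - x[0]) / 2, N)
    gaus = np.exp(-y**2 / t)
    out = fftconvolve(sig_ext, gaus / gaus.sum(), mode='same')
    return out[pad:pad + N]                      # drop the added extensions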
I am having trouble solving the optical Bloch equation, which is a first-order ODE system with complex values. I have found that scipy may solve such a system, but its webpage offers too little information and I can hardly understand it.
I have 8 coupled first-order ODEs, and I should generate a function like:

def derv(y):
    # compute the time derivative of the elements in y
    return answers  # as an array

then do complex_ode(derv).
My questions are:
My y is not a list but a matrix; how can I give a correct output that fits into complex_ode()?
complex_ode() needs a Jacobian; I have no idea how to start constructing one, and what type should it be?
Where should I put the initial conditions (as in the normal ode) and the time linspace?
this is scipy's complex_ode link:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.complex_ode.html
Could anyone provide me with more information so that I can learn a bit more?
I think we can at least point you in the right direction. The optical Bloch equation is a problem which is well understood in the scientific community, although not by me :-), so there are already solutions on the internet to this particular problem.
http://massey.dur.ac.uk/jdp/code.html
However, to address your needs, you spoke of using complex_ode, which I suppose is fine, but I think plain scipy.integrate.ode will work just as well, according to the documentation:
from scipy import eye
from scipy.integrate import ode

y0, t0 = [1.0j, 2.0], 0

def f(t, y, arg1):
    return [1j*arg1*y[0] + y[1], -arg1*y[1]**2]

def jac(t, y, arg1):
    return [[1j*arg1, 1], [0, -arg1*2*y[1]]]

r = ode(f, jac).set_integrator('zvode', method='bdf', with_jacobian=True)
r.set_initial_value(y0, t0).set_f_params(2.0).set_jac_params(2.0)
t1 = 10
dt = 1
while r.successful() and r.t < t1:
    r.integrate(r.t + dt)
    print(r.t, r.y)
You also have the added benefit of an older, more established, and better documented function. I am surprised you have 8 and not 9 coupled ODEs, but I'm sure you understand this better than I do. Yes, you are correct: your function should be of the form ydot = f(t, y), which you call derv(), but you're going to need to make sure your function takes at least two parameters, like derv(t, y). If your y is a matrix, no problem! Just "reshape" it in the derv(t, y) function like so:

Y = numpy.reshape(y, (num_rows, num_cols))
As long as num_rows*num_cols = 8, the number of ODEs, you should be fine. Then use the matrix in your computations. When you're all done, just be sure to return a vector and not a matrix, like:

out = numpy.reshape(Y, (8,))
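Putting those pieces together, here is a minimal sketch of the reshape pattern (the dynamics below are placeholders, not the optical Bloch equations, and the 2x4 split of the 8 equations is just an assumption for illustration):

import numpy as np
from scipy.integrate import ode

num_rows, num_cols = 2, 4   # 8 complex ODEs arranged as a 2x4 matrix

def derv(t, y):
    Y = np.reshape(y, (num_rows, num_cols))    # work in matrix form
    dYdt = -1j * Y                             # placeholder dynamics
    return np.reshape(dYdt, (num_rows * num_cols,))

y0 = np.ones(num_rows * num_cols, dtype=complex)
r = ode(derv).set_integrator('zvode', method='bdf')
r.set_initial_value(y0, 0.0)
print(r.integrate(1.0))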
The Jacobian is not required, but it will likely allow the computation to proceed much more quickly. If you do not know how to compute it, you may want to consult Wikipedia or a calculus textbook. It's pretty simple, but can be time-consuming.

As far as initial conditions go, you should probably already know what those should be, whether they're complex or real valued. As long as you select values that are within reason, it shouldn't matter much.