How to call Python scipy quad with array inputs

I am using nested scipy.integrate.quad calls to integrate a two-dimensional integrand. The integrand is built from numpy functions, so passing it an array of inputs is much more efficient than looping over the inputs and calling it once for each one: roughly two orders of magnitude faster, thanks to numpy's array operations.
However, if I integrate over only one dimension while passing an array of inputs for the other dimension, things fall down. It seems the QUADPACK routines behind scipy's quad cannot handle array inputs the way numpy does. Has anyone else seen this and found a way around it, or am I misunderstanding something? The error I get from quad is:
Traceback (most recent call last):
File "C:\Users\JP\Documents\Python\TestingQuad\TestingQuad_v2.py", line 159, in <module>
fnIntegrate_x(0, 1, NCALLS_SET, True)
File "C:\Users\JP\Documents\Python\TestingQuad\TestingQuad_v2.py", line 35, in fnIntegrate_x
I = Integrate_x(yarray)
File "C:\Users\JP\Documents\Python\TestingQuad\TestingQuad_v2.py", line 23, in Integrate_x
return quad(Integrand, 0, np.pi/2, args=(y))[0]
File "C:\Python27\lib\site-packages\scipy\integrate\quadpack.py", line 247, in quad
retval = _quad(func,a,b,args,full_output,epsabs,epsrel,limit,points)
File "C:\Python27\lib\site-packages\scipy\integrate\quadpack.py", line 312, in _quad
return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit)
quadpack.error: Supplied function does not return a valid float.
I have put a cartoon version of what I'm trying to do below. What I'm actually doing has a more complicated integrand, but this is the gist.
The meat is at the top; the bottom is benchmarking to show my point.
import numpy as np
import time
from scipy.integrate import quad
def Integrand(x, y):
    '''
    Integrand
    '''
    return np.sin(x)*np.sin( y )
def Integrate_x(y):
    '''
    Integrate over x given (y)
    '''
    return quad(Integrand, 0, np.pi/2, args=(y))[0]
def fnIntegrate_x(ystart, yend, nsteps, ArrayInput = False):
    '''
    Integrate over x for nsteps values of y in [ystart, yend)
    '''
    yarray = np.arange(ystart,yend, (yend - ystart)/float(nsteps))
    I = np.zeros(nsteps)
    if ArrayInput :
        I = Integrate_x(yarray)
    else :
        for i,y in enumerate(yarray) :
            I[i] = Integrate_x(y)
    # return the full array of y values, not just the last loop variable
    return yarray, I
NCALLS_SET = 1000
NSETS = 10
SETS_t = np.zeros(NSETS)
for i in np.arange(NSETS) :
XInputs = np.random.rand(NCALLS_SET, 2)
t0 = time.time()
for x in XInputs :
Integrand(x[0], x[1])
t1 = time.time()
SETS_t[i] = (t1 - t0)/NCALLS_SET
print "Benchmarking Integrand - Single Values:"
print "NCALLS_SET: ", NCALLS_SET
print "NSETS: ", NSETS
print "TimePerCall(s): ", np.mean(SETS_t) , np.std(SETS_t)/ np.sqrt(SETS_t.size)
print "TotalTime: ",np.sum(SETS_t) * NCALLS_SET
'''
Benchmarking Integrand - Single Values:
NCALLS_SET: 1000
NSETS: 10
TimePerCall(s): 1.23999834061e-05 4.06987868647e-06
'''
NCALLS_SET = 1000
NSETS = 10
SETS_t = np.zeros(NSETS)
for i in np.arange(NSETS) :
XInputs = np.random.rand(NCALLS_SET, 2)
t0 = time.time()
Integrand(XInputs[:,0], XInputs[:,1])
t1 = time.time()
SETS_t[i] = (t1 - t0)/NCALLS_SET
print "Benchmarking Integrand - Array Values:"
print "NCALLS_SET: ", NCALLS_SET
print "NSETS: ", NSETS
print "TimePerCall(s): ", np.mean(SETS_t) , np.std(SETS_t)/ np.sqrt(SETS_t.size)
print "TotalTime: ",np.sum(SETS_t) * NCALLS_SET
'''
Benchmarking Integrand - Array Values:
NCALLS_SET: 1000
NSETS: 10
TimePerCall(s): 2.00009346008e-07 1.26497018465e-07
'''
NCALLS_SET = 1000
NSETS = 100
SETS_t = np.zeros(NSETS)
for i in np.arange(NSETS) :
t0 = time.time()
fnIntegrate_x(0, 1, NCALLS_SET, False)
t1 = time.time()
SETS_t[i] = (t1 - t0)/NCALLS_SET
print "Benchmarking fnIntegrate_x - Single Values:"
print "NCALLS_SET: ", NCALLS_SET
print "NSETS: ", NSETS
print "TimePerCall(s): ", np.mean(SETS_t) , np.std(SETS_t)/ np.sqrt(SETS_t.size)
print "TotalTime: ",np.sum(SETS_t) * NCALLS_SET
'''
NCALLS_SET: 1000
NSETS: 100
TimePerCall(s): 0.000165750000477 8.61204306241e-07
TotalTime: 16.5750000477
'''
NCALLS_SET = 1000
NSETS = 100
SETS_t = np.zeros(NSETS)
for i in np.arange(NSETS) :
t0 = time.time()
fnIntegrate_x(0, 1, NCALLS_SET, True)
t1 = time.time()
SETS_t[i] = (t1 - t0)/NCALLS_SET
print "Benchmarking fnIntegrate_x - Array Values:"
print "NCALLS_SET: ", NCALLS_SET
print "NSETS: ", NSETS
print "TimePerCall(s): ", np.mean(SETS_t) , np.std(SETS_t)/ np.sqrt(SETS_t.size)
'''
**** Doesn't work!!!! *****
Traceback (most recent call last):
File "C:\Users\JP\Documents\Python\TestingQuad\TestingQuad_v2.py", line 159, in <module>
fnIntegrate_x(0, 1, NCALLS_SET, True)
File "C:\Users\JP\Documents\Python\TestingQuad\TestingQuad_v2.py", line 35, in fnIntegrate_x
I = Integrate_x(yarray)
File "C:\Users\JP\Documents\Python\TestingQuad\TestingQuad_v2.py", line 23, in Integrate_x
return quad(Integrand, 0, np.pi/2, args=(y))[0]
File "C:\Python27\lib\site-packages\scipy\integrate\quadpack.py", line 247, in quad
retval = _quad(func,a,b,args,full_output,epsabs,epsrel,limit,points)
File "C:\Python27\lib\site-packages\scipy\integrate\quadpack.py", line 312, in _quad
return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit)
quadpack.error: Supplied function does not return a valid float.
'''

This is possible with the numpy.vectorize function. I had this problem for a long time before I came across it.
You can use it like this:
vectorized_function = numpy.vectorize(your_function)
output = vectorized_function(your_array_input)
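Applied to the integrand from the question, a minimal sketch might look like the following. The lowercase names are illustrative and not from the original post. Keep in mind that the NumPy documentation describes numpy.vectorize as essentially a Python-level loop provided for convenience, so it tidies the calling code but should not be expected to recover the speed of true array operations.

```python
import numpy as np
from scipy.integrate import quad

def integrand(x, y):
    return np.sin(x) * np.sin(y)

def integrate_x(y):
    # quad needs a scalar-valued integrand, so y must be a single number here
    return quad(integrand, 0.0, np.pi / 2, args=(y,))[0]

# np.vectorize calls integrate_x once per element of its input
integrate_x_vec = np.vectorize(integrate_x)

ys = np.linspace(0.0, 1.0, 5)
result = integrate_x_vec(ys)  # one integral per entry of ys
```

Since the integral of sin(x) over [0, pi/2] is 1, each entry of result should be close to sin(y) for the matching y.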

I'm afraid I'm answering my own question with a negative here: I don't think it is possible. quad appears to be a wrapper around a library written in another language (QUADPACK, in Fortran), so the library underneath defines how things are done, and it is probably not possible to do what I wanted without redesigning the library itself.
For other people with timing issues on multi-dimensional integration, I found the best way was to use a dedicated integration library. 'Cuba' seemed to have some pretty efficient multi-dimensional integration routines.
http://www.feynarts.de/cuba/
These routines are written in C, so I ended up using SWIG to talk to them, and eventually rewrote my integrand in C as well, which sped things up considerably.
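For readers on a recent SciPy, there may be a simpler route than an external library: to the best of my knowledge, scipy.integrate.quad_vec (added around SciPy 1.4) accepts an integrand that returns an array. The sketch below assumes that function is available in your installation and reuses the toy integrand from the question; the variable names are illustrative.

```python
import numpy as np
from scipy.integrate import quad_vec

ys = np.linspace(0.0, 1.0, 5)

def integrand(x):
    # x is a scalar; the return value is an array with one entry per y
    return np.sin(x) * np.sin(ys)

# quad_vec integrates every component of the array-valued integrand at once
values, error_estimate = quad_vec(integrand, 0.0, np.pi / 2)
```

Each call to the integrand evaluates all y values together, so the numpy array speed-up is kept inside the integration loop.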

Use quadpy (a project of mine). It is fully vectorized, so can handle array-valued functions of any shape, and does so very fast.

I was having this issue with integrating probability density functions from -np.inf to np.inf over all dimensions.
I fixed it by creating a wrapper function taking in *args, converting args to a numpy array, and integrating the wrapper function.
I think using numpy's vectorize only integrates the subspace where all values are equal.
Here's an example:
import numpy as np
from scipy.integrate import nquad
from scipy.stats import multivariate_normal
mean = [0., 0.]
cov = np.array([[1., 0.],
                [0., 1.]])
bivariate_normal = multivariate_normal(mean=mean, cov=cov)
def pdf(*args):
    # nquad passes one scalar per dimension; pack them into a single point
    x = np.array(args)
    return bivariate_normal.pdf(x)
integration_range = [[-18, 18], [-18, 18]]
nquad(pdf, integration_range)
Output: (1.000000000000001, 1.3429066352690133e-08)

Related

How do I assign a value to a 'Nonetype' function?

I have been trying to use the Runge-Kutta45 integration method to update a set of positions and velocities of particles in space to get the new state at some time step.
Initially, I created an array with all these elements and combined them (y):
r_vec_1 = np.array([0, 0])
v_vec_1 = np.array([-np.sqrt(2), -np.sqrt(2)])
r_vec_2 = np.array([-1, 0])
v_vec_2 = np.array([np.sqrt(2) / 2, np.sqrt(2) / 2])
r_vec_3 = np.array([1, 0])
v_vec_3 = np.array([np.sqrt(2) / 2, np.sqrt(2) / 2])
y_0 = np.concatenate((r_vec_1, v_vec_1, r_vec_2, v_vec_2, r_vec_3, v_vec_3))
y = y_0
Now, I used this array as my initial conditions and created a function fun(t, y) that is meant to give F(y), the derivative of y, expressed as a set of first-order ODEs:
def fun(t,y):
    np.array([y[2], y[3], x1_double_dot(y, mass_vector), y1_double_dot(y, mass_vector),
              y[6], y[7], x2_double_dot(y, mass_vector), y2_double_dot(y, mass_vector),
              y[10], y[11], x3_double_dot(y, mass_vector), y3_double_dot(y, mass_vector)])
Once I had this function, I set an initial time, a final time, and a time step for use with scipy.integrate.RK45, resulting in the following code:
#Time start, step, and finish point
t_0 = 0
t = 0
t_step = 0.01
t_final = 200
nsteps = int((t_final - t)/t_step)
#The loop for running the Runge-Kutta method over some time period.
for step in np.linspace(t, t_final, num = nsteps):
    y_new = sp.integrate.RK45(fun(t,y), t_0, y_0, t_final,vectorized=True)
    history.append(y_new)
    y_new = y
    t += dt
history = np.array(history)
The problem is that once I run the code, I would expect the function y to update to the new state and keep integrating over the time period until elapsed. However, upon running this I receive the following error message:
Traceback (most recent call last):
File "C:\Users\RSlat\PycharmProjects\pythonProject\Practice\3BP Calculator.py", line 68, in <module>
y_new = sp.integrate.RK45(fun(t,y), t_0, y_0, t_final,vectorized=True)
File "C:\Users\RSlat\PycharmProjects\pythonProject\Practice\lib\site-packages\scipy\integrate\_ivp\rk.py", line 94, in __init__
self.f = self.fun(self.t, self.y)
File "C:\Users\RSlat\PycharmProjects\pythonProject\Practice\lib\site-packages\scipy\integrate\_ivp\base.py", line 138, in fun
return self.fun_single(t, y)
File "C:\Users\RSlat\PycharmProjects\pythonProject\Practice\lib\site-packages\scipy\integrate\_ivp\base.py", line 125, in fun_single
return self._fun(t, y[:, None]).ravel()
File "C:\Users\RSlat\PycharmProjects\pythonProject\Practice\lib\site-packages\scipy\integrate\_ivp\base.py", line 20, in fun_wrapped
return np.asarray(fun(t, y), dtype=dtype)
TypeError: 'NoneType' object is not callable
Any help at all would be greatly appreciated. Thanks and have an awesome day!
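According to the documentation (https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.RK45.html), sp.integrate.RK45() requires a callable as its first argument. Writing fun(t,y) calls the function immediately and passes its return value instead; since fun has no return statement, that value is None, hence the "'NoneType' object is not callable" error.
It should work when you write it this way:
sp.integrate.RK45(fun, t_0, y_0, t_final,vectorized=True)
As you can see, I pass only the function object "fun" (without arguments) to RK45. Note that fun also needs to return the array it builds; otherwise the solver receives None when it evaluates the derivative.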
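To show how the solver object is then advanced, here is a minimal sketch that uses a harmonic oscillator as a stand-in for the three-body right-hand side (the oscillator, the step limit, and the variable names are assumptions made for illustration). RK45 is constructed once, and step() is called until the solver reports that it has reached the final time.

```python
import numpy as np
from scipy.integrate import RK45

def fun(t, y):
    # y = [position, velocity]; note the explicit return
    return np.array([y[1], -y[0]])

t_0, t_final = 0.0, 1.0
y_0 = np.array([1.0, 0.0])

# Pass the function itself (not fun(t, y)); max_step caps the step size
solver = RK45(fun, t_0, y_0, t_final, max_step=0.01)

history = [solver.y.copy()]
while solver.status == "running":
    solver.step()                    # advance by one adaptive step
    history.append(solver.y.copy())  # record the new state
history = np.array(history)
```

The sketch leaves vectorized at its default of False, since setting it to True requires fun to accept a 2-D array of states. If only the trajectory is needed, scipy.integrate.solve_ivp wraps this stepping loop for you.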

how to pack my numpy variables and arrays when calling curve_fit?

This is my standalone code to reproduce the problem:
import numpy as np
from scipy.optimize import curve_fit
def find_vector_of_minor_axis_from_chunk(data):
n = 20 # number of points
time = np.linspace(0, 2 * np.pi, n)
guess_center_point = data.mean(1)
guess_center_point = guess_center_point[np.newaxis, :].transpose()
guess_a_phase = 0
guess_b_phase = 0
guess_a = 1
guess_b = 1
guess_a_axis_vector = np.array([[1], [0], [0]])
guess_b_axis_vector = np.array([[0], [1], [0]])
p0 = np.array([guess_center_point,
guess_a, guess_a_axis_vector, guess_a_phase,
guess_b, guess_b_axis_vector, guess_b_phase])
def ellipse_func(t, center_point, a, a_axis_vector, a_phase, b, b_axis_vector, b_phase):
return center_point + a * a_axis_vector * np.sin(t * a_phase) + b * b_axis_vector * np.sin(t + b_phase)
popt, pcov = curve_fit(ellipse_func, time, data, p0=p0)
center_point, a, a_axis_vector, a_phase, b, b_axis_vector, b_phase = popt
print(a_axis_vector, b_axis_vector)
shorter_vector = a_axis_vector
if np.linalg.norm(a_axis_vector) > np.linalg.norm(b_axis_vector):
shorter_vector = b_axis_vector
return shorter_vector
def main():
data = np.array([[-4.62767933, -4.6275775, -4.62735346, -4.62719652, -4.62711625, -4.62717975,
-4.62723845, -4.62722407, -4.62713901, -4.62708749, -4.62703238, -4.62689101,
-4.62687185, -4.62694013, -4.62701082, -4.62700483, -4.62697488, -4.62686825,
-4.62675683, -4.62675204],
[-1.58625998, -1.58625039, -1.58619648, -1.58617611, -1.58620606, -1.5861833,
-1.5861821, -1.58619169, -1.58615814, -1.58616893, -1.58613179, -1.58615934,
-1.58611262, -1.58610782, -1.58613179, -1.58614017, -1.58613059, -1.58612699,
-1.58607428, -1.58610183],
[-0.96714786, -0.96713827, -0.96715984, -0.96715145, -0.96716703, -0.96712869,
-0.96716104, -0.96713228, -0.96719698, -0.9671838, -0.96717062, -0.96717062,
-0.96715744, -0.96707717, -0.96709275, -0.96706519, -0.96715026, -0.96711791,
-0.96713588, -0.96714786]])
print(str(find_vector_of_minor_axis_from_chunk(data)))
if __name__ == '__main__':
main()
That gives me this traceback:
Traceback (most recent call last):
File "C:/Users/X/PycharmProjects/lissajous-achse/ellipse_fit.py", line 52, in <module>
main()
File "C:/Users/X/PycharmProjects/lissajous-achse/ellipse_fit.py", line 49, in main
print(str(find_vector_of_minor_axis_from_chunk(data)))
File "C:/Users/X/PycharmProjects/lissajous-achse/ellipse_fit.py", line 25, in find_vector_of_minor_axis_from_chunk
popt, pcov = curve_fit(ellipse_func, time, data, p0=p0)
File "C:\Users\X\PycharmProjects\lissajous-achse\venv\lib\site-packages\scipy\optimize\minpack.py", line 763, in curve_fit
res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
File "C:\Users\X\PycharmProjects\lissajous-achse\venv\lib\site-packages\scipy\optimize\minpack.py", line 392, in leastsq
raise TypeError('Improper input: N=%s must not exceed M=%s' % (n, m))
TypeError: Improper input: N=7 must not exceed M=3
Process finished with exit code 1
My code is an adaptation of the second answer here. The problem causing the error message is solved by simple packing of variables here.
Why does the problem not surface in the mentioned second answer? And how can I pack my variables, which consist of several 3D vectors and individual scalars, to solve this problem? How do I pass in my t, which is a constant and should not be optimized?
Apparently Python is quite flexible about the lengths of the argument fields, depending on the initial guesses. So I could just pass in ONE variable and split it up inside the function, like so:
import numpy as np
from scipy.optimize import minimize
def find_vector_of_minor_axis_from_chunk(data):
n = 20 # number of points
guess_center_point = data.mean(1)
guess_center_point = guess_center_point[np.newaxis, :].transpose()
guess_a_phase = 0.0
guess_b_phase = 0.0
guess_a = 1.0
guess_b = 1.0
guess_a_axis_vector = np.array([[1.0], [0.0], [0.0]])
guess_b_axis_vector = np.array([[0.0], [1.0], [0.0]])
p0 = np.array([guess_center_point,
guess_a, guess_a_axis_vector, guess_a_phase,
guess_b, guess_b_axis_vector, guess_b_phase])
def ellipse_func(x, data):
center_point = x[0]
a = x[1]
a_axis_vector = x[2]
a_phase = x[3]
b = x[4]
b_axis_vector = x[5]
b_phase = x[6]
t = np.linspace(0, 2 * np.pi, n)
error = center_point + a * a_axis_vector * np.sin(t * a_phase) + b * b_axis_vector * np.sin(t + b_phase) - data
error_sum = np.sum(error**2)
print(str(error_sum))
return error_sum
popt, pcov = minimize(ellipse_func, p0, args=(data))
center_point, a, a_axis_vector, a_phase, b, b_axis_vector, b_phase = popt
print(a_axis_vector, b_axis_vector)
shorter_vector = a_axis_vector
if np.linalg.norm(a_axis_vector) > np.linalg.norm(b_axis_vector):
shorter_vector = b_axis_vector
return shorter_vector
def main():
data = np.array([[-4.62767933, -4.6275775, -4.62735346, -4.62719652, -4.62711625, -4.62717975,
-4.62723845, -4.62722407, -4.62713901, -4.62708749, -4.62703238, -4.62689101,
-4.62687185, -4.62694013, -4.62701082, -4.62700483, -4.62697488, -4.62686825,
-4.62675683, -4.62675204],
[-1.58625998, -1.58625039, -1.58619648, -1.58617611, -1.58620606, -1.5861833,
-1.5861821, -1.58619169, -1.58615814, -1.58616893, -1.58613179, -1.58615934,
-1.58611262, -1.58610782, -1.58613179, -1.58614017, -1.58613059, -1.58612699,
-1.58607428, -1.58610183],
[-0.96714786, -0.96713827, -0.96715984, -0.96715145, -0.96716703, -0.96712869,
-0.96716104, -0.96713228, -0.96719698, -0.9671838, -0.96717062, -0.96717062,
-0.96715744, -0.96707717, -0.96709275, -0.96706519, -0.96715026, -0.96711791,
-0.96713588, -0.96714786]])
print(str(find_vector_of_minor_axis_from_chunk(data)))
if __name__ == '__main__':
main()
I also fixed some floating-point vs. integer errors in the vector of initial values.
However, now I get a different error:
Traceback (most recent call last):
File "C:/Users/X/PycharmProjects/lissajous-achse/ellipse_fit.py", line 61, in <module>
main()
File "C:/Users/X/PycharmProjects/lissajous-achse/ellipse_fit.py", line 58, in main
print(str(find_vector_of_minor_axis_from_chunk(data)))
File "C:/Users/X/PycharmProjects/lissajous-achse/ellipse_fit.py", line 34, in find_vector_of_minor_axis_from_chunk
popt, pcov = minimize(ellipse_func, p0, args=(data))
File "C:\Users\X\PycharmProjects\lissajous-achse\venv\lib\site-packages\scipy\optimize\_minimize.py", line 604, in minimize
return _minimize_bfgs(fun, x0, args, jac, callback, **options)
File "C:\Users\X\PycharmProjects\lissajous-achse\venv\lib\site-packages\scipy\optimize\optimize.py", line 1063, in _minimize_bfgs
if isinf(rhok): # this is patch for numpy
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I guess that
The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()
is some internal error, stemming from the optimizer's internal decision logic about how to proceed. I don't know how I caused it or how to fix it. When I figure out how it is done properly, I will come back and edit this answer.
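One common way to sidestep both errors is to give the optimizer a flat 1-D parameter vector and unpack it inside the model, passing the constant t through args so it is never optimized. The sketch below is only an illustration under stated assumptions: it uses scipy.optimize.least_squares, a simplified stand-in model with a single frequency omega instead of the two phases, and made-up synthetic data; pack, unpack, model, and residuals are hypothetical helper names.

```python
import numpy as np
from scipy.optimize import least_squares

def pack(center, a_vec, b_vec, omega):
    # Flatten three 3-vectors and one scalar into one 1-D array of length 10
    return np.concatenate([center, a_vec, b_vec, [omega]])

def unpack(p):
    return p[0:3], p[3:6], p[6:9], p[9]

def model(p, t):
    center, a_vec, b_vec, omega = unpack(p)
    # Result has shape (3, len(t)), the same layout as the data array
    return (center[:, None]
            + a_vec[:, None] * np.sin(omega * t)
            + b_vec[:, None] * np.cos(omega * t))

def residuals(p, t, data):
    # least_squares expects a flat vector of residuals
    return (model(p, t) - data).ravel()

t = np.linspace(0.0, 2.0 * np.pi, 20)  # constant, passed via args

# Synthetic data from made-up parameters, for illustration only
true_p = pack(np.array([-4.6, -1.6, -1.0]),
              np.array([1e-3, 0.0, 0.0]),
              np.array([0.0, 1e-4, 0.0]),
              1.0)
data = model(true_p, t)

p0 = pack(data.mean(axis=1),
          np.array([1.0, 0.0, 0.0]),
          np.array([0.0, 1.0, 0.0]),
          0.9)
fit = least_squares(residuals, p0, args=(t, data))
center, a_vec, b_vec, omega = unpack(fit.x)
```

With 60 residuals (3 rows of 20 samples) and 10 parameters, the "N must not exceed M" check is no longer an issue, and every entry of p0 is a plain float rather than a nested array.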

Custom Theano Op to do numerical integration

I'm attempting to write a custom Theano Op which numerically integrates a function between two values. The Op is a custom likelihood for PyMC3 which involves the numerical evaluation of some integrals. I can't simply use the @as_op decorator as I need to use HMC to do the MCMC step. Any help would be much appreciated, as this question seems to have come up several times but has never been solved (e.g. https://stackoverflow.com/questions/36853015/using-theano-with-numerical-integration, Theano: implementing an integral function).
Clearly one solution would be to write a numerical integrator within Theano, but this seems like a waste of effort when very good integrators are already available, for example through scipy.integrate.
To keep this as a minimal example, let's just try and integrate a function between 0 and 1 inside an Op. The following integrates a Theano function outside of an Op, and produces correct results as far as my testing has gone.
import theano
import theano.tensor as tt
from scipy.integrate import quad
x = tt.dscalar('x')
y = x**4 # integrand
f = theano.function([x], y)
print f(0)
print f(1)
ans = quad(f, 0, 1)[0]
print ans
However, attempting to do integration within an Op appears much harder. My current best effort is:
import numpy as np
import theano
import theano.tensor as tt
from scipy import integrate
class IntOp(theano.Op):
    __props__ = ()
    def make_node(self, x):
        x = tt.as_tensor_variable(x)
        return theano.Apply(self, [x], [x.type()])
    def perform(self, node, inputs, output_storage):
        x = inputs[0]
        z = output_storage[0]
        f_to_int = theano.function([x], x)
        z[0] = tt.as_tensor_variable(integrate.quad(f_to_int, 0, 1)[0])
    def infer_shape(self, node, i0_shapes):
        return i0_shapes
    def grad(self, inputs, output_grads):
        ans = integrate.quad(output_grads[0], 0, 1)[0]
        return [ans]
intOp = IntOp()
x = tt.dmatrix('x')
y = intOp(x)
f = theano.function([x], y)
inp = np.asarray([[2, 4], [6, 8]], dtype=theano.config.floatX)
out = f(inp)
print inp
print out
Which gives the following error:
Traceback (most recent call last):
File "stackoverflow.py", line 35, in <module>
out = f(inp)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 871, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 859, in __call__
outputs = self.fn()
File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 912, in rval
r = p(n, [x[0] for x in i], o)
File "stackoverflow.py", line 17, in perform
f_to_int = theano.function([x], x)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function.py", line 320, in function
output_keys=output_keys)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 390, in pfunc
for p in params]
File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 489, in _pfunc_param_to_in
raise TypeError('Unknown parameter type: %s' % type(param))
TypeError: Unknown parameter type: <type 'numpy.ndarray'>
Apply node that caused the error: IntOp(x)
Toposort index: 0
Inputs types: [TensorType(float64, matrix)]
Inputs shapes: [(2, 2)]
Inputs strides: [(16, 8)]
Inputs values: [array([[ 2., 4.],
[ 6., 8.]])]
Outputs clients: [['output']]
Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "stackoverflow.py", line 30, in <module>
y = intOp(x)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 611, in __call__
node = self.make_node(*inputs, **kwargs)
File "stackoverflow.py", line 11, in make_node
return theano.Apply(self, [x], [x.type()])
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
I'm surprised by this, especially the TypeError, as I thought I had converted the output_storage variable into a tensor but it appears to believe here that it is still an ndarray.
I found your question because I'm trying to build a random variable in PyMC3 that represents a general point process (Hawkes, Cox, Poisson, etc) and the likelihood function has an integral. I really want to be able to use Hamiltonian Monte Carlo or NUTS samplers, so I needed that integral with respect to time to be differentiable.
Starting off of your attempt, I made an integrateOut theano Op that seems to work correctly with the behavior I need. I've tested it out on a few different inputs (not on my stats model just yet, but it appears promising!). I'm a total theano n00b, so pardon any stupidity. I would greatly appreciate feedback if anyone has any. Not sure it's exactly what you're looking for, but here's my solution (example at the bottom and in the doc strings). *EDIT: simplified some remnants of screwing around with ways to do this.
import theano
import theano.tensor as T
from scipy.integrate import quad
class integrateOut(theano.Op):
    """
    Integrate out a variable from an expression, computing
    the definite integral w.r.t. the variable specified
    !!! Only implemented in this for scalars !!!
    Parameters
    ----------
    f : scalar
        input 'function' to integrate
    t : scalar
        the variable to integrate out
    t0: float
        lower integration limit
    tf: float
        upper integration limit
    Returns
    -------
    scalar
        a new scalar with the 't' integrated out
    Notes
    -----
    usage of this looks like:
    x = T.dscalar('x')
    y = T.dscalar('y')
    t = T.dscalar('t')
    z = (x**2 + y**2)*t
    # integrate z w.r.t. t as a function of (x,y)
    intZ = integrateOut(z,t,0.0,5.0)(x,y)
    gradIntZ = T.grad(intZ,[x,y])
    funcIntZ = theano.function([x,y],intZ)
    funcGradIntZ = theano.function([x,y],gradIntZ)
    """
    def __init__(self,f,t,t0,tf,*args,**kwargs):
        super(integrateOut,self).__init__()
        self.f = f
        self.t = t
        self.t0 = t0
        self.tf = tf
    def make_node(self,*inputs):
        self.fvars=list(inputs)
        # This will fail when taking the gradient... don't be concerned
        try:
            self.gradF = T.grad(self.f,self.fvars)
        except:
            self.gradF = None
        return theano.Apply(self,self.fvars,[T.dscalar().type()])
    def perform(self,node, inputs, output_storage):
        # Everything else is an argument to the quad function
        args = tuple(inputs)
        # create a function to evaluate the integral
        f = theano.function([self.t]+self.fvars,self.f)
        # actually compute the integral
        output_storage[0][0] = quad(f,self.t0,self.tf,args=args)[0]
    def grad(self,inputs,grads):
        return [integrateOut(g,self.t,self.t0,self.tf)(*inputs)*grads[0] \
                for g in self.gradF]
x = T.dscalar('x')
y = T.dscalar('y')
t = T.dscalar('t')
z = (x**2+y**2)*t
intZ = integrateOut(z,t,0,1)(x,y)
gradIntZ = T.grad(intZ,[x,y])
funcIntZ = theano.function([x,y],intZ)
funcGradIntZ = theano.function([x,y],gradIntZ)
print funcIntZ(2,2)
print funcGradIntZ(2,2)
SymPy is proving harder than anticipated, but in the meantime in case anyone's finding this useful, I'll also point out how to modify this Op to allow for changing the final timepoint without creating a new Op. This can be useful if you have a point process, or if you have uncertainty in your time measurements.
class integrateOut2(theano.Op):
    def __init__(self, f, int_var, *args,**kwargs):
        super(integrateOut2,self).__init__()
        self.f = f
        self.int_var = int_var
    def make_node(self, *inputs):
        tmax = inputs[0]
        self.fvars=list(inputs[1:])
        return theano.Apply(self, [tmax]+self.fvars, [T.dscalar().type()])
    def perform(self, node, inputs, output_storage):
        # Everything else is an argument to the quad function
        tmax = inputs[0]
        args = tuple(inputs[1:])
        # create a function to evaluate the integral
        f = theano.function([self.int_var]+self.fvars, self.f)
        # actually compute the integral
        output_storage[0][0] = quad(f, 0., tmax, args=args)[0]
    def grad(self, inputs, grads):
        tmax = inputs[0]
        param_grads = T.grad(self.f, self.fvars)
        ## Recall fundamental theorem of calculus
        ## d/dt \int^{t}_{0}f(x)dx = f(t)
        ## So sub in t_max to the graph
        FTC_grad = theano.clone(self.f, {self.int_var: tmax})
        grad_list = [FTC_grad*grads[0]] + \
                    [integrateOut2(grad_fn, self.int_var)(*inputs)*grads[0] \
                     for grad_fn in param_grads]
        return grad_list
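The two gradient rules these Ops rely on can be checked numerically without Theano. The sketch below is a plain SciPy illustration, not part of the original answer: it takes the example integrand z = (x**2 + y**2)*t, compares the integral of the partial derivative with a finite-difference derivative of the integral, and compares the integrand evaluated at the upper limit with a finite-difference derivative in that limit. The helper names and the step size h are assumptions.

```python
from scipy.integrate import quad

def f(t, x, y):
    return (x**2 + y**2) * t

def df_dx(t, x, y):
    # partial derivative of f with respect to the parameter x
    return 2.0 * x * t

x0, y0, t_max, h = 2.0, 2.0, 1.0, 1e-5

def integral(x, y, upper):
    return quad(f, 0.0, upper, args=(x, y))[0]

# Rule 1: d/dx of the integral equals the integral of df/dx
grad_x = quad(df_dx, 0.0, t_max, args=(x0, y0))[0]
fd_x = (integral(x0 + h, y0, t_max) - integral(x0 - h, y0, t_max)) / (2 * h)

# Rule 2 (fundamental theorem of calculus): d/dt_max of the integral is f(t_max)
grad_t = f(t_max, x0, y0)
fd_t = (integral(x0, y0, t_max + h) - integral(x0, y0, t_max - h)) / (2 * h)
```

For this integrand the integral is (x**2 + y**2) * t_max**2 / 2, so both pairs of numbers should agree closely.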
I always use the following code where I generate B = 10000 samples of n = 30 observations from a normal distribution with µ = 1 and σ² = 2.25. For each sample, the parameters µ and σ are estimated and stored in a matrix. I hope this can help you.
loglik <- function(p,z){
sum(dnorm(z,mean=p[1],sd=p[2],log=TRUE))
}
set.seed(45)
n <- 30
x <- rnorm(n,mean=1,sd=1.5)
optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=x)
B <- 10000
bootstrap.results <- matrix(NA,nrow=B,ncol=3)
colnames(bootstrap.results) <- c("mu","sigma","convergence")
for (b in 1:B){
sample.b <- rnorm(n,mean=1,sd=1.5)
m.b <- optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=sample.b)
bootstrap.results[b,] <- c(m.b$par,m.b$convergence)
}
One can also obtain the ML estimate of λ and use the bootstrap to estimate the bias and the standard error of the estimate. First calculate the MLE of λ. Then estimate the bias and the standard error of λ̂ by a nonparametric bootstrap.
B <- 9999
lambda.B <- rep(NA,B)
n <- length(w.time)
for (b in 1:B){
b.sample <- sample(1:n,n,replace=TRUE)
lambda.B[b] <- 1/mean(w.time[b.sample])
}
bias <- mean(lambda.B-m$estimate)
sd(lambda.B)
In the second part we calculate a 95% confidence interval for the mean time between failures.
n <- length(w.time)
m <- mean(w.time)
se <- sd(w.time)/sqrt(n)
interval.1 <- m + se * qnorm(c(0.025,0.975))
interval.1
But we can also use the assumption that the data are from an exponential distribution. In that case we have Var(X̄) = 1/(nλ²) = θ²/n, which can be estimated by X̄²/n.
sd.m <- sqrt(m^2/n)
interval.2 <- m + sd.m * qnorm(c(0.025,0.975))
interval.2
We can also estimate the standard error of θ̂ by means of a bootstrap procedure. We use the nonparametric bootstrap, that is, we sample from the original sample with replacement.
B <- 9999
m.star <- rep(NA,B)
for (b in 1:B){
m.star[b] <- mean(sample(w.time,replace=TRUE))
}
sd.m.star <- sd(m.star)
interval.3 <- m + sd.m.star * qnorm(c(0.025,0.975))
interval.3
An interval not based on the assumption of normality of θ̂ is obtained by the percentile method:
interval.4 <- quantile(m.star, probs=c(0.025,0.975))
interval.4
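For readers working in Python rather than R, the nonparametric bootstrap and percentile interval above can be sketched with NumPy. This is an illustrative translation under assumptions: w_time is replaced by made-up exponential waiting times, and the seed, sample size, and number of resamples are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
w_time = rng.exponential(scale=2.0, size=50)  # made-up waiting times

B = 2000
n = w_time.size

# Each row of idx is one resample (with replacement) of the original indices
idx = rng.integers(0, n, size=(B, n))
m_star = w_time[idx].mean(axis=1)  # bootstrap replicates of the mean

se_boot = m_star.std(ddof=1)                          # bootstrap standard error
lower, upper = np.quantile(m_star, [0.025, 0.975])    # percentile interval
```

Drawing all resample indices in one call avoids the explicit loop over b used in the R version.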

Scipy fmin_powell function

I'm trying to optimize the function eul with the initial guess X0 (X0 = [0.6421, -0.5046]) using fmin_powell. The function eul gets the initial conditions and calculates the velocity and temperature profile across a vertical flat plate using predictor-corrector method. I've displayed my code below:
def eul(X):
    f2, q1 = X
    N_tot = 5000
    n=np.linspace(0.0,10.0,N_tot)
    f = np.zeros(N_tot,dtype=float).reshape(N_tot,)
    dfdn = np.zeros(N_tot,dtype=float).reshape(N_tot,)
    d2fdn2 = np.zeros(N_tot,dtype=float).reshape(N_tot,)
    q = np.zeros(N_tot,dtype=float).reshape(N_tot,)
    dqdn = np.zeros(N_tot,dtype=float).reshape(N_tot,)
    Pr = 0.72 #Prandtl Number
    ##x0 = [d2fdn2_g1, dtdn_g1]
    # Boundary Conditions
    f[0] = 0.0
    dfdn[0] = 0.0
    d2fdn2[0] = f2
    q[0] = 1.0
    dqdn[0] = q1
    for i in np.arange(0,N_tot-1):
        Dn = n[i+1] - n[i]
        f_tmp=f[i]+dfdn[i]*Dn
        dfdn_tmp=dfdn[i]+d2fdn2[i]*Dn
        d2fdn2_tmp=d2fdn2[i]+(-3*f[i]*d2fdn2[i]+2*(dfdn[i])**2-q[i])*Dn
        q_tmp=q[i]+dqdn[i]*Dn
        dqdn_tmp=dqdn[i]-3*Pr*f[i]*dqdn[i]*Dn
        f[i+1]=f[i]+0.5*Dn*(dfdn[i]+dfdn_tmp)
        dfdn[i+1]=dfdn[i]+0.5*Dn*(d2fdn2[i]+d2fdn2_tmp)
        d2fdn2[i+1]=d2fdn2[i]+0.5*Dn*((-3*f[i]*d2fdn2[i]+2*(dfdn[i])**2-q[i])+(-3*f_tmp*d2fdn2_tmp+2*(dfdn_tmp)**2-q_tmp))
        q[i+1]=q[i]+0.5*Dn*(dqdn[i]+dqdn_tmp)
        dqdn[i+1]=dqdn[i]-0.5*Dn*((3*Pr*f[i]*dqdn[i])+(3*Pr*f_tmp*dqdn_tmp))
        if((q[i+1]>1)|(q[i+1]<0)|(f[i+1]>2)|(f[i+1]<0)):
            q[N_tot-1]=1+1/i
            dfdn[N_tot-1]=1+1/i
            break
    return dfdn, q, n
MAIN PROGRAM
import numpy as np
import scipy as sp
import scipy.optimize
# Initial Guess
d2fdn2_g1 = 0.6421;
dtdn_g1 = -0.5046;
X0 = np.array([d2fdn2_g1, dtdn_g1])
X = scipy.optimize.fmin_powell(eul, X0)
I'm getting an error message:
Traceback (most recent call last):
File "C:\Users\labuser\Desktop\Sankar\New_Euler.py", line 52, in <module>
X = scipy.optimize.fmin_powell(eul, X0)
File "C:\Python27\lib\site-packages\scipy\optimize\optimize.py", line 1519, in fmin_powell
fval, x, direc1 = _linesearch_powell(func, x, direc1, tol=xtol*100)
File "C:\Python27\lib\site-packages\scipy\optimize\optimize.py", line 1418, in _linesearch_powell
alpha_min, fret, iter, num = brent(myfunc, full_output=1, tol=tol)
File "C:\Python27\lib\site-packages\scipy\optimize\optimize.py", line 1241, in brent
brent.optimize()
File "C:\Python27\lib\site-packages\scipy\optimize\optimize.py", line 1113, in optimize
xa,xb,xc,fa,fb,fc,funcalls = self.get_bracket_info()
File "C:\Python27\lib\site-packages\scipy\optimize\optimize.py", line 1089, in get_bracket_info
xa,xb,xc,fa,fb,fc,funcalls = bracket(func, args=args)
File "C:\Python27\lib\site-packages\scipy\optimize\optimize.py", line 1357, in bracket
if (fa < fb): # Switch so fa > fb
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
My guess is your function eul is returning an array. fmin_powell minimizes a scalar function. Check that eul returns a single value, not an array.
(Without seeing more code, the best we can do is guess. It would help if you added the definition of eul to the question.)
Instead of sending an array to fmin_powell just define another function that computes sum of the returned array, and use it.
# Initial Guess
d2fdn2_g1 = 0.6421;
dtdn_g1 = -0.5046;
def eeul(X):
return np.sum(eul(X))
X0 = np.array([d2fdn2_g1, dtdn_g1])
X = scipy.optimize.fmin_powell(eeul, X0)
This seems to work properly.
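Summing every returned array does give fmin_powell a scalar, but the sum also includes the grid n, so it may not measure what the shooting method is after. A hedged alternative is to minimize the squared mismatch of the end-point conditions. The sketch below uses a small analytic stand-in for eul so that it runs quickly; the stand-in profiles, the target values, and the names profiles and objective are assumptions made for illustration, not the original problem.

```python
import numpy as np
from scipy.optimize import fmin_powell

T = 10.0
n = np.linspace(0.0, T, 200)

def profiles(X):
    # Stand-in for eul: returns whole profiles (arrays), not a single number
    a, b = X
    u = a * np.cos(n) + b * np.sin(n)
    du = -a * np.sin(n) + b * np.cos(n)
    return u, du

def objective(X):
    u, du = profiles(X)
    # Reduce the arrays to one float: squared mismatch of the end-point targets
    return (u[-1] - 1.0) ** 2 + (du[-1] - 0.0) ** 2

X0 = np.array([0.5, 0.5])
X_opt = fmin_powell(objective, X0, disp=False)
```

For the real problem, the analogous objective would be built from the far-field values such as dfdn[-1] and q[-1], with targets taken from the physical boundary conditions.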

SciPy optimization for under-constrained system

I often have to solve nonlinear problems in which the number of variables exceeds the number of constraints (or sometimes the other way around). Usually some of the constraints or variables are redundant in a complicated way. Is there any way to solve such problems?
Most of the scipy solvers seem to assume that the number of constraints equals the number of variables, and that the Jacobian is nonsingular. leastsq works sometimes but it doesn't even try when the constraints are fewer than the number of variables. I realize that I could just run fmin on linalg.norm(F), but this is much less efficient than any method which makes use of the Jacobian.
Here is an example of a problem which demonstrates what I am talking about. It obviously has a solution, but leastsq gives an error. Of course, this example is easy to solve by hand, I just put it here to demonstrate the issue.
import numpy as np
import scipy.optimize
mat = np.random.randn(5, 7)
def F(x):
    y = np.dot(mat, x)
    return np.array([ y[0]**2 + y[1]**3 + 12, y[2] + 17 ])
x0 = np.random.randn(7)
scipy.optimize.leastsq(F, x0)
The error message I get is:
Traceback (most recent call last):
File "question.py", line 13, in <module>
scipy.optimize.leastsq(F, x0)
File "/home/dstahlke/apps/scipy/lib64/python2.7/site-packages/scipy/optimize/minpack.py", line 278, in leastsq
raise TypeError('Improper input: N=%s must not exceed M=%s' % (n,m))
TypeError: Improper input: N=7 must not exceed M=2
I have scoured the net for an answer and have even asked on the SciPy mailing list, and got no response. For now I hacked the SciPy source so that the newton_krylov solver uses pinv(), but I don't think this is an optimal solution.
How about resizing the array returned from F() to the number of variables (np.resize pads it by repeating its entries):
import numpy as np
import scipy.optimize
mat = np.random.randn(5, 7)
def F(x):
    y = np.dot(mat, x)
    return np.resize(np.array([ y[0]**2 + y[1]**3 + 12, y[2] + 17]), 7)
while True:
    x0 = np.random.randn(7)
    r = scipy.optimize.leastsq(F, x0)
    err = F(r[0])
    norm = np.dot(err, err)
    if norm < 1e-6:
        break
print err
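Another option that avoids padding the residual vector is scipy.optimize.least_squares. As far as I know, its 'lm' method carries the same restriction as leastsq, while the 'trf' and 'dogbox' methods accept fewer residuals than variables. The sketch below assumes a SciPy version that provides least_squares and reuses F from the question with a seeded random matrix; whether a given start converges to a root still depends on x0.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
mat = rng.standard_normal((5, 7))

def F(x):
    y = mat @ x
    # Two residuals, seven unknowns: an under-determined system
    return np.array([y[0]**2 + y[1]**3 + 12, y[2] + 17])

x0 = rng.standard_normal(7)

# 'trf' (trust region reflective) does not require len(F(x)) >= len(x)
result = least_squares(F, x0, method="trf")
residual_norm = np.linalg.norm(F(result.x))
```

As in the loop above, it is worth checking residual_norm and restarting from a new x0 if the solver stalls away from a root.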
