Conflict between vectorization/broadcasting and solving an ODE with solve_ivp - python

Using a NumPy array and vectorization, I'm trying to create a population of n different individuals, with each individual having three properties: alpha, beta, and phenotype (the phenotype being calculated as the steady state of a differential equation that involves alpha and beta). So, I want each individual to have its own phenotype.
However, my code produces the same phenotype for every individual. Moreover, this unwanted behavior only occurs when solve_ivp's y0 array (here [0, 1]) happens to have exactly n entries; otherwise, a broadcasting error is produced:
ValueError: operands could not be broadcast together with shapes (2,) (3,)
Here's the code:
import numpy as np
from scipy.integrate import solve_ivp
def create_population(n):
    """creates a population of n individuals"""
    pop = np.zeros(n, dtype=[('alpha', '<f8'), ('beta', '<f8'), ('phenotype', '<f8')])
    pop['alpha'] = np.random.randn(n)
    pop['beta'] = np.random.randn(n) + 5

    def phenotype(n):
        """creates the phenotype"""
        def pheno_ode(t_ode, y):
            """defines the ode for the phenotype"""
            dydt = 0.123 - y + pop['alpha'] * (y ** pop['beta'] / (1 + y ** pop['beta']))
            return dydt

        t_end = 1e06
        sol = solve_ivp(pheno_ode, [0, t_end], [0, 1], method='BDF')
        return sol.y[0][-1]  # last entry is assumed to be the steady state

    pop['phenotype'] = phenotype(n)
    return pop
popul = create_population(3)
print(popul)
In contrast, if the phenotype is calculated from alpha and beta via a "simple" equation, then vectorization works fine:
def phenotype(n):
    """creates the phenotype"""
    phenotype_simple = 2 * pop['alpha'] + pop['beta']
    return phenotype_simple

There are two problems that I can see:
First, you have the initial condition for the ODE set to [0, 1]. This sets the size of the solution vector for solve_ivp to 2, regardless of the value of n. However, the arrays pop['alpha'] and pop['beta'] have length n, and in your script you call create_population with n set to 3. So you have a mismatch in the array shapes in the formula for dydt: y has length 2, but pop['alpha'] and pop['beta'] have length 3. That causes the error that you see.
You can fix this by using, say, np.ones(n) instead of [0, 1] as the initial condition in your call to solve_ivp.
The second problem is in the statement return sol.y[0][-1] in the function phenotype(n). sol.y has shape (n, num_points), where num_points is the number of points computed by solve_ivp. So sol.y[0] is just the first component of the solution, and sol.y[0][-1] is the last value of the solution for the first component. It is a scalar, so when you execute pop['phenotype'] = phenotype(n), you are assigning the same value (the steady state of the first component) to all the phenotypes.
The return statement should be return sol.y[:, -1]. That returns the last column of the solution array (i.e. all the steady state phenotypes).
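Putting both fixes together, a corrected sketch of the function (imports as in the question; t_end and the BDF method kept unchanged):
def create_population(n):
    """creates a population of n individuals"""
    pop = np.zeros(n, dtype=[('alpha', '<f8'), ('beta', '<f8'), ('phenotype', '<f8')])
    pop['alpha'] = np.random.randn(n)
    pop['beta'] = np.random.randn(n) + 5

    def pheno_ode(t_ode, y):
        """defines the ode for the phenotype; y has one component per individual"""
        return 0.123 - y + pop['alpha'] * (y ** pop['beta'] / (1 + y ** pop['beta']))

    # n initial conditions, so the ODE system has one component per individual
    sol = solve_ivp(pheno_ode, [0, 1e06], np.ones(n), method='BDF')
    pop['phenotype'] = sol.y[:, -1]  # last column: the steady state of every component
    return pop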

Related

Machine Learning - implementing a Gradient Descent in Python from Octave code

I am trying to implement a gradient descent function in Python from scratch, having already implemented a working version in GNU Octave. Unfortunately I am stuck. I fiddled with it for a while and checked the NumPy documentation, but so far no luck.
I am aware of libraries such as scikit-learn, but my purpose is to learn to code such a function from scratch. Perhaps I am going about it the wrong way.
Below you will find all the code necessary to reproduce the error.
Thanks in advance for your help.
Actual result: the test fails with the error "ValueError: matmul: Input operand 0 does not have enough dimensions (has 0, gufunc core with signature (n?,k),(k,m?)->(n?,m?) requires 1)"
Expected result: an array with the values [5.2148, -0.5733]
Function gradientDescent() in Octave:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        theta = theta - (alpha/m) * X' * (X*theta - y);
        J_history(iter) = computeCost(X, y, theta);
    end
Function gradient_descent() in Python:
from numpy import zeros

def compute_cost(X, y, theta):
    m = len(y)
    ans = (X.T @ theta).T - y
    J = (ans @ ans.T) / (2 * m)
    return J[0, 0]

def gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    J_history = zeros((num_iters, 1), dtype=int)
    for iter in range(num_iters):
        theta = theta - (alpha / m) @ X.T @ (X @ theta - y)
        J_history[iter] = compute_cost(X, y, theta)
    return theta
the test file: test_ml_utils.py
import unittest
import numpy as np
from ml.ml_utils import compute_cost, gradient_descent

class TestGradientDescent(unittest.TestCase):
    # TODO: implement tests for Gradient Descent function
    # [theta J_hist] = gradientDescent([1 5; 1 2; 1 4; 1 5],[1 6 4 2]',[0 0]',0.01,1000);
    def test_gradient_descent_00(self):
        X = np.array([[1, 5], [1, 2], [1, 4], [1, 5]])
        y = np.array([1, 6, 4, 2])
        theta = np.zeros(2)
        alpha = 0.01
        num_iter = 1000
        r_theta = np.array([5.2148, -0.5733])
        result = gradient_descent(X, y, theta, alpha, num_iter)
        self.assertEqual((round(result, 4), r_theta), 'Result is wrong!')

if __name__ == '__main__':
    unittest.main()
The __matmul__ operator @ in Python binds more tightly than -. That means you're trying to do matrix multiplication with the operands (alpha / m), which is a scalar, and X.T, which is a matrix. See the operator precedence table.
In the Octave code, (alpha/m) * X' is doing scalar multiplication, not matrix multiplication, so if you want that same behavior in Python, use * rather than @. This seems to be because Octave overloads the * operator to perform scalar multiplication if one operand is a scalar, but matrix multiplication if both operands are matrices.
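For reference, a minimal corrected sketch (keeping everything as 1-D arrays; the original compute_cost also breaks on 1-D inputs, so a plain-scalar version is sketched alongside):
import numpy as np

def compute_cost(X, y, theta):
    # mean squared error cost for linear regression
    m = len(y)
    r = X @ theta - y  # residual vector, shape (m,)
    return (r @ r) / (2 * m)

def gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    J_history = np.zeros(num_iters)  # float dtype, so costs are not truncated to int
    for i in range(num_iters):
        # scalar factor uses *, matrix products use @
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
        J_history[i] = compute_cost(X, y, theta)
    return theta
With the test inputs above (alpha = 0.01, 1000 iterations), this reproduces the expected result [5.2148, -0.5733].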
Adding to Adam's answer (which is correct in regards to the error you're getting).
However I wanted to add more generally, this code is meaningless to a reader without some sort of hint (whether programmatically or in the form of a comment) what dimensions the different variables take.
E.g., there is a hint in the code that y is likely to be 2-dimensional, and you're using len to get its size. Just as an example of how this might fail silently, consider this:
>>> y = numpy.array([[1,2,3,4,5]])
>>> len( y )
1
whereas presumably you want
>>> numpy.shape( y )
(1, 5)
or
>>> numpy.size( y )
5
I note in your unit test that you're passing a rank-1 vector instead of a rank-2 one, so y turns out to be 1-D rather than 2-D, yet it still combines with the 2-D X thanks to broadcasting. Your code therefore works despite the logic implied, but in the absence of making such things explicit, this is a runtime error waiting to happen.
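One illustrative way to make the intended shapes explicit (a sketch, not part of the original code) is to check them at the top of the function:
def gradient_descent(X, y, theta, alpha, num_iters):
    # fail fast if the inputs are not shaped the way the maths assumes
    assert X.ndim == 2, "X must be a 2-D array of shape (m, n)"
    assert y.shape == (X.shape[0],), "y must be 1-D with one entry per row of X"
    assert theta.shape == (X.shape[1],), "theta must be 1-D with one entry per column of X"
    ...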

Scipy minimize returns a higher value than minimum

As a part of multi-start optimization, I am running differential evolution (DE), the output of which I feed as initial values to scipy minimization with SLSQP (I need constraints).
I am testing the procedure on the Ackley function. Even in situations in which DE returns the optimum (zeros), scipy minimization deviates from the optimal initial value and returns a value higher than at the optimum.
Do you know how to make scipy minimize return the optimum? I noticed it helps to specify tolerance for scipy minimize, but it does not solve the issue completely. Scaling the objective function makes things worse. The problem is not present for COBYLA solver.
Here are the optimization steps:
import numpy as np
from scipy.optimize import differential_evolution, minimize

# Set up
x0min = -20
x0max = 20
xdim = 4
fun = ackley
bounds = [(x0min, x0max)] * xdim
tol = 1e-12

# Get a DE solution
result = differential_evolution(fun, bounds,
                                maxiter=10000,
                                tol=tol,
                                workers=1,
                                init='latinhypercube')

# Initialize at DE output
x0 = result.x

# Estimate the model
r = minimize(fun, x0, method='SLSQP', tol=1e-18)
which in my case yields
result.fun = -4.440892098500626e-16
r.fun = 1.0008238682246429e-09
result.x = array([0., 0., 0., 0.])
r.x = array([-1.77227927e-10, -1.77062108e-10, 4.33179228e-10, -2.73031830e-12])
Here is the implementation of the Ackley function:
def ackley(x):
    # Computes the value of the Ackley benchmark function.
    # ACKLEY accepts a matrix of size (dim, N) and returns a vector
    # FVALS of size (N,)
    #
    # Parameters
    # ----------
    # x : 1-D array of size (dim,) or a 2-D array of size (dim, N)
    #     Each row of the matrix represents one dimension.
    #     Columns therefore correspond to the different points at which
    #     the function is evaluated; N is the number of points.
    #
    # Returns
    # -------
    # fvals : a scalar if x is a 1-D array, or
    #     a 1-D array of size (N,) if x is a 2-D array of size (dim, N),
    #     containing the function value for each column of x.
    n = x.shape[0]
    ninverse = 1 / n
    sum1 = np.sum(x**2, axis=0)
    sum2 = np.sum(np.cos(2 * np.pi * x), axis=0)
    fvals = (20 + np.exp(1) - (20 * np.exp(-0.2 * np.sqrt(ninverse * sum1)))
             - np.exp(ninverse * sum2))
    return fvals
Decreasing the "step size used for numerical approximation of the Jacobian" in the SLSQP options solved the issue for me.

scipy's sparse eigs yielding wrong eigenvectors

I am a bit confused regarding the following issue:
I am computing fixed points of a quantum channel, which means I want to compute the leading eigenvector of a specific matrix. The matrix is of dimension n^2 x n^2 and defined in such a way that its leading eigenvector, reshaped to an n x n matrix, is a positive matrix (self-adjoint with positive eigenvalues).
If I do this with scipy.sparse.linalg.eigs, however, I get wrong results. The exact computation (using scipy.linalg.eig) works fine. I tried playing with the arguments k and ncv for the solver, but didn't get it working properly unless I set k=n**2, in which case eigs just defers to eig. That, however, won't work in the case I actually have in mind, where the channel (super_op in the script below) is encoded as a LinearOperator. So I rely on using eigs.
Anybody any idea how to get this run properly?
Thanks to everybody in advance!
import numpy as np
from numpy.random import rand
from numpy import tensordot as td
from scipy.sparse.linalg import eigs
from scipy.linalg import eig
n = 16
d = 3
kraus_op = .5 - rand(n, d, n) + 1j * (.5 - rand(n, d, n))
super_op = td(kraus_op, kraus_op.conj(), [[1], [1]]).transpose(0, 2, 1, 3)
########
# Sparse
########
vals, vecs = eigs(super_op.reshape(n**2, n**2), k=n*(n-1), which='LM')
rho = vecs[:,0].reshape(n, n)
print('is self adjoint: ', np.allclose(rho, rho.conj().T))
super_op_times_rho = td(super_op, rho, [[2, 3], [0, 1]])
print('super_op(rho) == lambda * rho :', np.allclose(rho, super_op_times_rho/vals[0]))
########
# Exact
########
vals, vecs = eig(super_op.reshape(n**2, n**2))
rho = vecs[:,0].reshape(n, n)
print('is self adjoint: ', np.allclose(rho, rho.conj().T))
super_op_times_rho = td(super_op, rho, [[2, 3], [0, 1]])
print('super_op(rho) == lambda * rho :', np.allclose(rho, super_op_times_rho/vals[0]))
the result is:
is self adjoint: False
super_op(rho) == lambda * rho : True
is self adjoint: True
super_op(rho) == lambda * rho : True
For completeness:
Python 3.5.2
numpy 1.16.1
scipy 1.2.1
In the end I found the solution, with some help from my colleagues:
While eig returns the eigenvectors (1) sorted by eigenvalue magnitude and (2) such that the first entry is real, eigs uses a different ordering and does not regularize the complex phase of the eigenvectors. Correcting the phase is easily done by dividing the vector by the phase of its first entry; making the first diagonal element of the reshaped matrix real removes the freedom in choosing the complex phase of an eigenvector and makes Hermiticity possible.
So the corrected code snippet for the sparse case is:
vals, vecs = eigs(super_op.reshape(n**2, n**2), k=n*(n-1), which='LM')
# find the index corresponding to the largest eigenvalue
arg = np.argmax(np.abs(vals))
rho = vecs[:, arg].reshape(n, n)
# regularize the complex phase so that rho[0, 0] is real and positive
rho *= np.abs(rho[0, 0]) / rho[0, 0]
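As a sanity check, the regularized sparse result can be compared with the dense eig result treated the same way (a sketch reusing the question's imports and super_op, and assuming the leading eigenvalue is non-degenerate):
vals_d, vecs_d = eig(super_op.reshape(n**2, n**2))
arg_d = np.argmax(np.abs(vals_d))
rho_d = vecs_d[:, arg_d].reshape(n, n)
rho_d *= np.abs(rho_d[0, 0]) / rho_d[0, 0]  # apply the same phase fix
print('sparse and dense agree:', np.allclose(rho, rho_d))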

Solve a system of odes with scipy - how to reference different indexes?

I have a system of ODEs that depend on a matrix of data. Each ODE should reference a different column of data in its evaluation.
import numpy as np
n_eqns = 20
coeffs = np.random.normal(0, 1, (n_eqns, 20))
def dfdt(_, f, idx):
    return (f ** 2) * coeffs[idx, :].sum() - 2 * f * coeffs.sum()
from scipy.integrate import ode
f0 = np.random.uniform(-1, 1, n_eqns)
t0 = 0
tf = 1
dt = 0.001
r = ode(dfdt)
r.set_initial_value(f0, t0).set_f_params(range(n_eqns))
while r.successful() and r.t < tf:
    print(r.t + dt, r.integrate(r.t + dt))
How can I specify that each ODE should use the idx value associated with its index in the system of ODEs? The first equation should be passed idx=0, the second idx=1, and so on.
The function dfdt takes the state and returns the derivative, each as an array (or other iterable). Thus, all you have to do is loop over all indices and apply your operations accordingly. For example:
def dfdt(t, f):
    output = np.empty_like(f)
    for i, entry in enumerate(f):
        output[i] = entry**2 * coeffs[i, :].sum() - 2*entry*coeffs.sum()
    return output
You can also write this using NumPy’s component-wise operations (which is quicker):
def dfdt(t,f):
return f**2 * coeffs.sum(axis=1) - 2*f*coeffs.sum()
Finally note that using f for your state may be somewhat confusing since this is how ode denotes the derivative (which you call dfdt).
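For completeness, a minimal sketch wiring the vectorized version into the question's integration loop (same parameters as above):
import numpy as np
from scipy.integrate import ode

n_eqns = 20
coeffs = np.random.normal(0, 1, (n_eqns, 20))

def dfdt(t, f):
    # equation i uses the sum of row i of coeffs; the second term is shared
    return f**2 * coeffs.sum(axis=1) - 2*f*coeffs.sum()

f0 = np.random.uniform(-1, 1, n_eqns)
r = ode(dfdt).set_initial_value(f0, 0)
tf, dt = 1, 0.001
while r.successful() and r.t < tf:
    r.integrate(r.t + dt)
print(r.y)  # state of all 20 equations at t ≈ tf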

Artefacts from Riemann sum in scipy.signal.convolve

Short summary: How do I quickly calculate the finite convolution of two arrays?
Problem description
I am trying to obtain the finite convolution of two functions f(x), g(x) defined by
(f ∗ g)(x) = ∫₀ˣ f(τ) g(x − τ) dτ
To achieve this, I have taken discrete samples of the functions and turned them into arrays of length steps:
xarray = [x * i / steps for i in range(steps)]
farray = [f(x) for x in xarray]
garray = [g(x) for x in xarray]
I then tried to calculate the convolution using the scipy.signal.convolve function. This function gives the same results as the algorithm conv suggested here. However, the results differ considerably from analytical solutions. Modifying the algorithm conv to use the trapezoidal rule gives the desired results.
To illustrate this, I let
f(x) = exp(-x)
g(x) = 2 * exp(-2 * x)
the results are:
Here Riemann represents a simple Riemann sum, trapezoidal is a modified version of the Riemann algorithm to use the trapezoidal rule, scipy.signal.convolve is the scipy function and analytical is the analytical convolution.
Now let g(x) = x^2 * exp(-x) and the results become:
Here 'ratio' is the ratio of the values obtained from scipy to the analytical values. The above demonstrates that the problem cannot be solved by renormalising the integral.
The question
Is it possible to use the speed of scipy but retain the better results of a trapezoidal rule or do I have to write a C extension to achieve the desired results?
An example
Just copy and paste the code below to see the problem I am encountering. The two results can be brought to closer agreement by increasing the steps variable. I believe that the problem is due to artefacts from right hand Riemann sums because the integral is overestimated when it is increasing and approaches the analytical solution again as it is decreasing.
EDIT: I have now included the original algorithm 2 as a comparison which gives the same results as the scipy.signal.convolve function.
import numpy as np
import scipy.signal as signal
import matplotlib.pyplot as plt
import math
def convolveoriginal(x, y):
    '''
    The original algorithm from http://www.physics.rutgers.edu/~masud/computing/WPark_recipes_in_python.html.
    '''
    P, Q, N = len(x), len(y), len(x) + len(y) - 1
    z = []
    for k in range(N):
        t, lower, upper = 0, max(0, k - (Q - 1)), min(P - 1, k)
        for i in range(lower, upper + 1):
            t = t + x[i] * y[k - i]
        z.append(t)
    return np.array(z)  # modified to include conversion to numpy array

def convolve(y1, y2, dx=None):
    '''
    Compute the finite convolution of two signals of equal length.
    @param y1: First signal.
    @param y2: Second signal.
    @param dx: [optional] Integration step width.
    @note: Based on the algorithm at http://www.physics.rutgers.edu/~masud/computing/WPark_recipes_in_python.html.
    '''
    P = len(y1)  # determine the length of the signal
    z = []  # create a list of convolution values
    for k in range(P):
        t = 0
        lower = max(0, k - (P - 1))
        upper = min(P - 1, k)
        for i in range(lower, upper):
            t += (y1[i] * y2[k - i] + y1[i + 1] * y2[k - (i + 1)]) / 2
        z.append(t)
    z = np.array(z)  # convert to a numpy array
    if dx is not None:  # is a step width specified?
        z *= dx
    return z
steps = 50 #Number of integration steps
maxtime = 5 #Maximum time
dt = float(maxtime) / steps #Obtain the width of a time step
time = [dt * i for i in range (steps)] #Create an array of times
exp1 = [math.exp(-t) for t in time] #Create an array of function values
exp2 = [2 * math.exp(-2 * t) for t in time]
#Calculate the analytical expression
analytical = [2 * math.exp(-2 * t) * (-1 + math.exp(t)) for t in time]
#Calculate the trapezoidal convolution
trapezoidal = convolve(exp1, exp2, dt)
#Calculate the scipy convolution
sci = signal.convolve(exp1, exp2, mode = 'full')
#Slice the first half to obtain the causal convolution and multiply by dt
#to account for the step width
sci = sci[0:steps] * dt
#Calculate the convolution using the original Riemann sum algorithm
riemann = convolveoriginal(exp1, exp2)
riemann = riemann[0:steps] * dt
#Plot
plt.plot(time, analytical, label = 'analytical')
plt.plot(time, trapezoidal, 'o', label = 'trapezoidal')
plt.plot(time, riemann, 'o', label = 'Riemann')
plt.plot(time, sci, '.', label = 'scipy.signal.convolve')
plt.legend()
plt.show()
Thank you for your time!
Or, for those who prefer NumPy to C: this will be slower than the C implementation, but it's just a few lines.
>>> t = np.linspace(0, maxtime-dt, 50)
>>> fx = np.exp(-np.array(t))
>>> gx = 2*np.exp(-2*np.array(t))
>>> analytical = 2 * np.exp(-2 * t) * (-1 + np.exp(t))
This looks like the trapezoidal result in this case (but I didn't check the math):
>>> s2a = signal.convolve(fx[1:], gx, 'full')*dt
>>> s2b = signal.convolve(fx, gx[1:], 'full')*dt
>>> s = (s2a+s2b)/2
>>> s[:10]
array([ 0.17235682, 0.29706872, 0.38433313, 0.44235042, 0.47770012,
0.49564748, 0.50039326, 0.49527721, 0.48294359, 0.46547582])
>>> analytical[:10]
array([ 0. , 0.17221333, 0.29682141, 0.38401317, 0.44198216,
0.47730244, 0.49523485, 0.49997668, 0.49486489, 0.48254154])
largest absolute error:
>>> np.max(np.abs(s[:len(analytical)-1] - analytical[1:]))
0.00041657780840698155
>>> np.argmax(np.abs(s[:len(analytical)-1] - analytical[1:]))
6
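As a quick check (assuming the convolve function, steps and dt from the question are in scope), this averaged result should match the trapezoidal implementation up to floating-point error, shifted by one sample:
>>> trap = convolve(fx, gx, dt)              # trapezoidal version from the question
>>> np.max(np.abs(s[:steps-1] - trap[1:]))   # expect something on the order of 1e-16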
Short answer: Write it in C!
Long answer
Using the cookbook on numpy arrays, I rewrote the trapezoidal convolution method in C. To use the C code, you need three files (https://gist.github.com/1626919):
The C code (performancemodule.c).
The setup file to build the code and make it callable from python (performancemodulesetup.py).
The python file that makes use of the C extension (performancetest.py)
The code should run after downloading if you do the following:
Adjust the include path in performancemodule.c.
Run the following
python performancemodulesetup.py build
python performancetest.py
You may have to copy the library file performancemodule.so or performancemodule.dll into the same directory as performancetest.py.
Results and performance
The results agree neatly with one another (comparison plot omitted).
The performance of the C method is even better than scipy's convolve method. Running 10k convolutions with array length 50 requires
convolve (seconds, microseconds) 81 349969
scipy.signal.convolve (seconds, microseconds) 1 962599
convolve in C (seconds, microseconds) 0 87024
Thus, the C implementation is about 1000 times faster than the python implementation and a bit more than 20 times as fast as the scipy implementation (admittedly, the scipy implementation is more versatile).
EDIT: This does not solve the original question exactly but is sufficient for my purposes.
