Tricking numpy/python into representing very large and very small numbers - python

I need to compute the integral of the following function within ranges that start as low as -150:
import numpy as np
from scipy.special import ndtr
def my_func(x):
return np.exp(x ** 2) * 2 * ndtr(x * np.sqrt(2))
The problem is that this part of the function
np.exp(x ** 2)
tends toward infinity -- I get inf for values of x less than approximately -26.
And this part of the function
2 * ndtr(x * np.sqrt(2))
which is equivalent to
from scipy.special import erf
1 + erf(x)
tends toward 0.
So, a very, very large number times a very, very small number should give me a reasonably sized number -- but, instead of that, python is giving me nan.
What can I do to circumvent this problem?

I think a combination of #askewchan's solution and scipy.special.log_ndtr will do the trick:
from scipy.special import log_ndtr
_log2 = np.log(2)
_sqrt2 = np.sqrt(2)
def my_func(x):
return np.exp(x ** 2) * 2 * ndtr(x * np.sqrt(2))
def my_func2(x):
return np.exp(x * x + _log2 + log_ndtr(x * _sqrt2))
print(my_func(-150))
# nan
print(my_func2(-150)
# 0.0037611803122451198
For x <= -20, log_ndtr(x) uses a Taylor series expansion of the error function to iteratively compute the log CDF directly, which is much more numerically stable than simply taking log(ndtr(x)).
Update
As you mentioned in the comments, the exp can also overflow if x is sufficiently large. Whilst you could work around this using mpmath.exp, a simpler and faster method is to cast up to a np.longdouble which, on my machine, can represent values up to 1.189731495357231765e+4932:
import mpmath
def my_func3(x):
return mpmath.exp(x * x + _log2 + log_ndtr(x * _sqrt2))
def my_func4(x):
return np.exp(np.float128(x * x + _log2 + log_ndtr(x * _sqrt2)))
print(my_func2(50))
# inf
print(my_func3(50))
# mpf('1.0895188633566085e+1086')
print(my_func4(50))
# 1.0895188633566084842e+1086
%timeit my_func3(50)
# The slowest run took 8.01 times longer than the fastest. This could mean that
# an intermediate result is being cached 100000 loops, best of 3: 15.5 µs per
# loop
%timeit my_func4(50)
# The slowest run took 11.11 times longer than the fastest. This could mean
# that an intermediate result is being cached 100000 loops, best of 3: 2.9 µs
# per loop

There already is such a function: erfcx. I think erfcx(-x) should give you the integrand you want (note that 1+erf(x)=erfc(-x)).

Not sure how helpful will this be, but here are a couple of thoughts that are too long for a comment.
You need to calculate the integral of , which you correctly identified would be . Opening the brackets you can integrate both parts of the summation.
Scipy has this imaginary error function implemented
The second part is harder:
This is a generalized hypergeometric function. Sadly it looks like scipy does not have an implementation of it, but this package claims it does.
Here I used indefinite integrals without constants, knowing the from to values it is clear how to use definite ones.

Related

Is it possible to improve python performance for this code?

I have a simple code that:
Read a trajectory file that can be seen as a list of 2D arrays (list of positions in space) stored in Y
I then want to compute for each pair (scipy.pdist style) the RMSD
My code works fine:
trajectory = read("test.lammpstrj", index="::")
m = len(trajectory)
#.get_positions() return a 2d numpy array
Y = np.array([snapshot.get_positions() for snapshot in trajectory])
b = [np.sqrt(((((Y[i]- Y[j])**2))*3).mean()) for i in range(m) for j in range(i + 1, m)]
This code execute in 0.86 seconds using python3.10, using Julia1.8 the same kind of code execute in 0.46 seconds
I plan to have trajectory much larger (~ 200,000 elements), would it be possible to get a speed-up using python or should I stick to Julia?
You've mentioned that snapshot.get_positions() returns some 2D array, suppose of shape (p, q). So I expect that Y is a 3D array with some shape (m, p, q), where m is the number of snapshots in the trajectory. You also expect m to scale rather high.
Let's see a basic way to speed up the distance calculation, on the setting m=1000:
import numpy as np
# dummy inputs
m = 1000
p, q = 4, 5
Y = np.random.randn(m, p, q)
# your current method
def foo():
return [np.sqrt(((((Y[i]- Y[j])**2))*3).mean()) for i in range(m) for j in range(i + 1, m)]
# vectorized approach -> compute the upper triangle of the pairwise distance matrix
def bar():
u, v = np.triu_indices(Y.shape[0], 1)
return np.sqrt((3 * (Y[u] - Y[v]) ** 2).mean(axis=(-1, -2)))
# Check for correctness
out_1 = foo()
out_2 = bar()
print(np.allclose(out_1, out_2))
# True
If we test the time required:
%timeit -n 10 -r 3 foo()
# 3.16 s ± 50.3 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)
The first method is really slow, it takes over 3 seconds for this calculation. Let's check the second method:
%timeit -n 10 -r 3 bar()
# 97.5 ms ± 405 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)
So we have a ~30x speedup here, which would make your large calculation in python much more feasible than using the original code. Feel free to test out with other sizes of Y to see how it scales compared to the original.
JIT
In addition, you can also try out JIT, mainly jax or numba. It is fairly simple to port the function bar with jax.numpy, for example:
import jax
import jax.numpy as jnp
#jax.jit
def jit_bar(Y):
u, v = jnp.triu_indices(Y.shape[0], 1)
return jnp.sqrt((3 * (Y[u] - Y[v]) ** 2).mean(axis=(-1, -2)))
# check for correctness
print(np.allclose(bar(), jit_bar(Y)))
# True
If we test the time of the jitted jnp op:
%timeit -n 10 -r 3 jit_bar(Y)
# 10.6 ms ± 678 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)
So compared to the original, we could reach even up to ~300x speed.
Note that not every operation can be converted to jax/jit so easily (this particular problem is conveniently suitable), so the general advice is to simply avoid python loops and use numpy's broadcasting/vectorization capabilities, like in bar().
Stick to Julia.
If you already made it in a language which runs faster, why are you trying to use python in the first place?
Your question is about speeding up Python, relative to Julia, so I'd like to offer some Julia code for comparison.
Since your data is most naturally expressed as a list of 4x5 arrays, I suggest expressing it as a vector of SMatrixes:
sumdiff2(A, B) = sum((A[i] - B[i])^2 for i in eachindex(A, B))
function dists(Y)
M = length(Y)
V = Vector{float(eltype(eltype(Y)))}(undef, sum(1:M-1))
Threads.#threads for i in eachindex(Y)
ii = sum(M-i+1:M-1) # don't worry about this sum
for j in i+1:lastindex(Y)
ind = ii + (j-i)
V[ind] = sqrt(3 * sumdiff2(Y[i], Y[j])/length(Y[i]))
end
end
return V
end
using Random: randn
using StaticArrays: SMatrix
Ys = [randn(SMatrix{4,5,Float64}) for _ in 1:1000];
Benchmarks:
# single-threaded
julia> using BenchmarkTools
julia> #btime dists($Ys);
6.561 ms (2 allocations: 3.81 MiB)
# multi-threaded with 6 cores
julia> #btime dists($Ys);
1.606 ms (75 allocations: 3.82 MiB)
I was not able to install jax on my computer, but when comparing with #Mercury's numpy code I got
foo: 5.5seconds
bar: 179ms
i.e. approximately 3400x speedup over foo.
It is possible to write this as a one-liner at a ~2-3x performance cost.
While Python tends to be slower than Julia for many tasks, it is possible to write numerical codes as fast as Julia in Python using Numba and plain loops. Indeed, Numba is based on LLVM-Lite which is basically a JIT-compiler based on the LLVM toolchain. The standard implementation of Julia also use a JIT and the LLVM toolchain. This means the two should behave pretty closely besides the overhead introduced by the languages that are negligible once the computation is performed in parallel (because the resulting computation will be memory-bound on nearly all modern platforms).
This computation can be parallelized in both Julia and Python (still using Numba). While writing a sequential computation is quite straightforward, writing a parallel computation is if bit more complex. Indeed, computing the upper triangular values can result in an imbalanced workload and so to a sub-optimal execution time. An efficient strategy is to compute, for each iteration, a pair of lines: one comes from the top of the upper triangular part and one comes from the bottom. The top line contains m-i items while the bottom one contains i+1 items. In the end, there is m+1 items to compute per iteration so the number of item is independent of the iteration number. This results in a much better load-balancing. The line of the middle needs to be computed separately regarding the size of the input array.
Here is the final implementation:
import numba as nb
import numpy as np
#nb.njit(inline='always', fastmath=True)
def compute_line(tmp, res, i, m):
offset = (i * (2 * m - i - 1)) // 2
factor = 3.0 / n
for j in range(i + 1, m):
s = 0.0
for k in range(n):
s += (tmp[i, k] - tmp[j, k]) ** 2
res[offset] = np.sqrt(s * factor)
offset += 1
return res
#nb.njit('()', parallel=True, fastmath=True)
def fastest():
m, n = Y.shape[0], Y.shape[1] * Y.shape[2]
res = np.empty(m*(m-1)//2)
tmp = Y.reshape(m, n)
for i in nb.prange(m//2):
compute_line(tmp, res, i, m)
compute_line(tmp, res, m-i-1, m)
if m % 2 == 1:
compute_line(tmp, res, (m+1)//2, m)
return res
# [...] same as others
%timeit -n 100 fastest()
Results
Here are performance results on my machine (with a i5-9600KF having 6 cores):
foo (seq, Python, Mercury): 4910.7 ms
bar (seq, Python, Mercury): 134.2 ms
jit_bar (seq, Python, Mercury): ???
dists (seq, Julia, DNF) 6.9 ms
dists (par, Julia, DNF) 2.2 ms
fastest (par, Python, me): 1.5 ms <-----
(Jax does not work on my machine so I cannot test it yet)
This implementation is the fastest one and succeed to beat the best Julia code so far.
Optimal implementation
Note that for large arrays like (200_000,4,5), all implementations provided so far are inefficient since they are not cache friendly. Indeed, the input array will take 32 MiB and will not for on the cache of most modern processors (and even if it could, one need to consider the space needed for the output and the fact that caches are not perfect). This can be fixed using tiling, at the expense of an even more complex code. I think such an implementation should be optimal if you use Z-order curves.

How does sympy simplify ln((exp(x)+1)/exp(x)) to log(1+exp(-x))?

If I use simplify() function in sympy, log((exp(x)+1)/exp(x)) does simplify to log(1+exp(-x)), however, as I read the doc, the simplify function is "can be unnecessarily slow", I tried other simplification methods, but non of them works, so I'm wondering how do I simplify ln((exp(x)+1)/exp(x)) to the form like this log(1+exp(-x)) without calling simplify().
The exact function you want to use depends on the general form of the expressions you are dealing with. cancel apparently works, but perhaps only by accident. In general, cancel cancels common factors from a numerator and denominator, like cancel((x**2 - 1)/(x - 1)) -> x + 1. I think it is only working here because it represents the expression in terms of exp(-x). If it instead used exp(x), it wouldn't simplify, because (x + 1)/x doesn't have any common factors. This might be why you are seeing different results from cancel in different versions. See this issue for more information.
For this expression, I would use expand() (or the more targeted expand_mul). expand will distribute the denominator over the numerator, i.e., (exp(x) + 1)/exp(x) will become exp(x)/exp(x) + 1/exp(x). SymPy then automatically cancels exp(x)/exp(x) into 1 and converts 1/exp(x) into exp(-x) (they are internally both represented the same way).
In [1]: log((exp(x)+1)/exp(x)).expand()
Out[1]:
⎛ -x⎞
log⎝1 + ℯ ⎠
There's a guide on some of the simplification functions in the tutorial.
You can more directly just use sympy.polys.polytools.cancel(), which is available as a method on your expression with .cancel().
>>> from sympy.abc import x
>>> from sympy import *
>>> my_expr = log((exp(x)+1)/exp(x))
>>> my_expr.cancel()
log(1 + exp(-x))
This is what is doing the work of simplifying your expression inside simplify().
A very naive benchmark:
>>> import timeit
>>> %timeit my_expr.simplify()
100 loops, best of 3: 7.78 ms per loop
>>> %timeit my_expr.cancel()
1000 loops, best of 3: 972 µs per loop
Edit: This isn't a stable solution, and I would advise that you take a look at asmeurer's answer where he suggests using expand().

Speed up function evaluation for integration in scipy

I am trying to port code from Matlab to SciPy. Here is the simplified version of the code I have written so far: https://gist.github.com/atmo/01b6e007be9ef90e402c . However, Python version is considerably slower then Matlab. I've included profiling results in the gist and they show that almost 90% of time python spends evaluating function f. Is there any way to speed up its evaluation, except from rewriting it in C or Cython?
As I mentioned in the comments, you can get rid of about half the calls to quad (and consequently the complicated function f) if you take into account that the matrix is symmetric.
Further speed gains, still in pure python, are to be had by rewriting that complicated function. I did most of that in sympy.
Finally I tried to vectorize the call to quad using np.vectorize.
from scipy.integrate import quad
from scipy.special import jn as besselj
from scipy import exp, zeros, linspace
from scipy.linalg import norm
import numpy as np
def complicated_func(lmbd, a, n, k):
u,v,w = 5, 3, 2
x = a*lmbd
fac = exp(2*x)
comm = (2*w + x)
part1 = ((v**2 + 4*w*(w + 2*x) + 2*x*(x - 1))*fac**5
+ 2*u*fac**4
+ (-v**2 - 4*(w*(3*w + 4*x + 1) + x*(x-2)) + 1)*fac**3
+ (-8*(w + x) + 2)*fac**2
+ (2*comm*(comm + 1) - 1)*fac)
return part1/lmbd *besselj(n+1, lmbd) * besselj(k+1, lmbd)
def perform_quad(n, k, a):
return quad(complicated_func, 0, np.inf, args=(a,n,k))[0]
def improved_main():
sz = 20
amatrix = np.zeros((sz,sz))
ls = -np.linspace(1, 10, 20)/2
inds = np.tril_indices(sz)
myv3 = np.vectorize(perform_quad)
res = myv3(inds[0], inds[1], ls.reshape(-1,1))
results = np.empty(res.shape[0])
for rowind, row in enumerate(res):
amatrix[inds] = row
symm_matrix = amatrix + amatrix.T - np.diag(amatrix.diagonal())
results[rowind] = norm(symm_matrix)
return results
Timing results show me a speed increase of a factor 5 (you'll forgive me if I only ran it once, it takes long enough as it is):
In [11]: %timeit -n1 -r1 improved_main()
1 loops, best of 1: 6.92 s per loop
In [12]: %timeit -n1 -r1 main()
1 loops, best of 1: 35.9 s per loop
There was also a microgain to be had if you replaced v immediately by its square, because that's the only time it is used in that complicated function: as its square.
There's also an extreme amount of repetition in the calls to besselj, but I don't see how to avoid that, because quad will determine lmbd, so you can't easily precompute those values and then perform a lookup.
If you profile the improved_main, you'll see that the amount of calls to complicated_func has nearly decreased by a factor of 2 (the diagonal still needs to be computed). All the other speed gains can be attributed to np.vectorize and the improvements to complicated_func.
I don't have Matlab on my system, so I can't make any statements for its speed gain if you improve the complicated function there.
Your numpy version probably is comparable in to speed to older MATLAB runs. But new MATLAB versions do various forms of just-in-time compilation that speed up repeated calculations considerably.
My guess is that you can nibble away at the lambda and f code, and maybe cut their evaluation times in half. But the real killer is that you are calling f so many times.
For a start I'd try to precalculate things in f. For example define K1=K[1] and use K1 in the calculations. That will reduce the number of indexing calls. Are of the exponentials repeated? Maybe replace the lambda definition with a regular def, or combine it with f.

Log-computations in Python

I'm looking to compute something like:
Where f(i) is a function that returns a real number in [-1,1] for any i in {1,2,...,5000}.
Obviously, the result of the sum is somewhere in [-1,1], but when I can't seem to be able to compute it in Python using straight forward coding, as 0.55000 becomes 0 and comb(5000,2000) becomes inf, which result in the computed sum turning into NaN.
The required solution is to use log on both sides.
That is using the identity a × b = 2log(a) + log(b), if I could compute log(a) and log(b) I could compute the sum, even if a is big and b is almost 0.
So I guess what I'm asking is if there's an easy way of computing
log2(scipy.misc.comb(5000,2000))
So I could compute my sum simply by
sum([2**(log2comb(5000,i)-5000) * f(i) for i in range(1,5000) ])
#abarnert's solution, while working for the 5000 figure addresses the problem by increasing the precision in which the comb is computed. This works for this example, but doesn't scale, as the memory required would significantly increase if instead of 5000 we had 1e7 for example.
Currently, I'm using a workaround which is ugly, but keeps memory consumption low:
log2(comb(5000,2000)) = sum([log2 (x) for x in 1:5000])-sum([log2 (x) for x in 1:2000])-sum([log2 (x) for x in 1:3000])
Is there a way of doing so in a readable expression?
The sum
is the expectation of f with respect to a binomial distribution with n = 5000 and p = 0.5.
You can compute this with scipy.stats.binom.expect:
import scipy.stats as stats
def f(i):
return i
n, p = 5000, 0.5
print(stats.binom.expect(f, (n, p), lb=0, ub=n))
# 2499.99999997
Also note that as n goes to infinity, with p fixed, the binomial distribution approaches the normal distribution with mean np and variance np*(1-p). Therefore, for large n you can instead compute:
import math
print(stats.norm.expect(f, loc=n*p, scale=math.sqrt((n*p*(1-p))), lb=0, ub=n))
# 2500.0
EDIT: #unutbu has answered the real question, but I'll leave this here in case log2comb(n, k) is useful to anyone.
comb(n, k) is n! / ((n-k)! k!), and n! can be computed using the Gamma function gamma(n+1). Scipy provides the function scipy.special.gamma. Scipy also provides gammaln, which is the log (natural log, that is) of the Gamma function.
So log(comb(n, k)) can be computed as gammaln(n+1) - gammaln(n-k+1) - gammaln(k+1)
For example, log(comb(100, 8)) (after executing from scipy.special import gammaln):
In [26]: log(comb(100, 8))
Out[26]: 25.949484949043022
In [27]: gammaln(101) - gammaln(93) - gammaln(9)
Out[27]: 25.949484949042962
and log(comb(5000, 2000)):
In [28]: log(comb(5000, 2000)) # Overflow!
Out[28]: inf
In [29]: gammaln(5001) - gammaln(3001) - gammaln(2001)
Out[29]: 3360.5943053174142
(Of course, to get the base-2 logarithm, just divide by log(2).)
For convenience, you can define:
from math import log
from scipy.special import gammaln
def log2comb(n, k):
return (gammaln(n+1) - gammaln(n-k+1) - gammaln(k+1)) / log(2)
By default, comb gives you a float64, which overflows and gives you inf.
But if you pass exact=True, it gives you a Python variable-sized int instead, which can't overflow (unless you get so ridiculously huge you run out of memory).
And, while you can't use np.log2 on an int, you can use Python's math.log2.
So:
math.log2(scipy.misc.comb(5000, 2000, exact=True))
As an alternative, you relative that n choose k is defined as n!k / k!, right? You can reduce that to ∏(i=1...k)((n+1-i)/i), which is simple to compute.
Or, if you want to avoid overflow, you can do it by alternating * (n-i) and / (k-i).
Which, of course, you can also reduce to adding and subtracting logs. I think looping in Python and computing 4000 logarithms is going to be slower than looping in C and computing 4000 multiplications, but we can always vectorize it, and then, it might be faster. Let's write it and test:
In [1327]: n, k = 5000, 2000
In [1328]: %timeit math.log2(scipy.misc.comb(5000, 2000, exact=True))
100 loops, best of 3: 1.6 ms per loop
In [1329]: %timeit np.log2(np.arange(n-k+1, n+1)).sum() - np.log2(np.arange(1, k+1)).sum()
10000 loops, best of 3: 91.1 µs per loop
Of course if you're more concerned with memory instead of time… well, this obviously makes it worse. We've got 2000 8-byte floats instead of one 608-byte integer at a time. And if you go up to 100000, 20000, you get 20000 8-byte floats instead of one 9K integer. And at 1000000, 200000, it's 200000 8-byte floats vs. one 720K integer.
I'm not sure why either way is a problem for you. Especially given that you're using a listcomp instead of a genexpr, and therefore creating an unnecessary list of 5000, 100000, or 1000000 Python floats—24MB is not a problem, but 720K is? But if it is, we can obviously just do the same thing iteratively, at the cost of some speed:
r = sum(math.log2(n-i) - math.log2(k-i) for i in range(n-k))
This isn't too much slower than the scipy solution, and it never uses more than a small constant number of bytes (a handful of Python floats). (Unless you're on Python 2, in which case… just use xrange instead of range and it's back to constant.)
As a side note, why are you using a list comprehension instead of an NumPy array with vectorized operations (for speed, and also a bit of compactness) or a generator expression instead of a list comprehension (for no memory usage at all, at no cost to speed)?

solving colebrook (nonlinear) equation in python

I want to do in python what this guy did in MATLAB.
I have installed anaconda, so i have numpy and sympy libraries. So far I have tried with numpy nsolve, but that doesn't work. I should say I'm new with python, and also that I konw how to do it in MATLAB :P.
The equation:
-2*log(( 2.51/(331428*sqrt(x)) ) + ( 0.0002 /(3.71*0.26)) ) = 1/sqrt(x)
Normally, I would solve this iteratively, simply guessing x on the left and than solving for the x on the right. Put solution on the left, solve again. Repeat until left x is close to right. I have an idea what solution should be.
So I could do that, but that's not very cool. I want to do it numerically.
My 15€ Casio calculator can solve it as is, so I think it shouldn't be to complicated?
Thank you for your help,
edit: so I have tried the following:
from scipy.optimize import brentq
w=10;
d=0.22;
rho=1.18;
ni=18.2e-6;
Re=(w*d*rho)/ni
k=0.2e-3;
d=0.26;
def f(x,Re,k,d):
return (
-2*log((2.51/(Re*sqrt(x)))+(k/(3.71*d)),10)*sqrt(x)+1
);
print(
scipy.optimize.brentq
(
f,0.0,1.0,xtol=4.44e-12,maxiter=100,args=(),full_output=True,disp=True
)
);
And i get this result:
r = _zeros._brentq(f,a,b,xtol,maxiter,args,full_output,disp)
TypeError: f() takes exactly 4 arguments (1 given)
Is it because I'm solving also solving for constants?
edit2:
so I think I have to assign constants via args=() keyword, so I changed:
f,0.0,1.0,xtol=4.44e-12,maxiter=100,args=(Re,k,d),full_output=True,disp=True
but now I get this:
-2*log((2.51/(Re*sqrt(x)))+(k/(3.71*d)),10)*sqrt(x)+1
TypeError: return arrays must be of ArrayType
Anyway, when I put in a different equation; lets say 2*x*Re+(k*d)/(x+5) it works, so I guess I have to transform the equation.
so it dies here: log(x,10)..
edit4: correct syntax is log10(x)... Now it works but I get zero as a result
This works fine. I've done a few things here. First, I've used a simpler definition of the function using the global variables you've defined anyway. I find this a little nicer than passing the args= to the solvers, it also enables easier use of your own custom solvers if you ever need something like that. I've used the generic root function as an entry point rather than using a particular algorithm - this is nice because you can simply pass a different method later. I've also fixed up your spacing to be as recommended by PEP 8 and fixed your erronious rewriting of the equation. I find it more intuitive simply to write LHS - RHS rather than manipulate as you did. Also, notice that I've replaced all the integer literals with 1.0 or whatever to avoid problems with integer division. 0.02 is regarded as a pretty standard starting point for the friction factor.
import numpy
from scipy.optimize import root
w = 10.0
d = 0.22
rho = 1.18
ni = 18.2e-6
Re = w*d*rho/ni
k = 0.2e-3
def f(x):
return (-2*numpy.log10((2.51/(Re*numpy.sqrt(x))) + (k/(3.71*d))) - 1.0/numpy.sqrt(x))
print root(f, 0.02)
I must also mention that fixed point iteration is actually faster than even Newton's method for this problem. You can use the built-in fixed point iteration routine by defining f2 as follows:
def f2(x):
LHS = -2*numpy.log10((2.51/(Re*numpy.sqrt(x))) + (k/(3.71*d)))
return 1/LHS**2
Timings (starting further from the root to show speed of convergence):
%timeit root(f, 0.2)
1000 loops, best of 3: 428 µs per loop
%timeit fixed_point(f2, 0.2)
10000 loops, best of 3: 148 µs per loop
Your tags are a little off: you're tagging it as sympy which is a library for symbolic computations, but say that you want to solve it numerically. In case the latter is your actual intention, here are relevant scipy docs:
http://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html#root-finding
Scipy with fixed_point is to be preferred also because the root does not converge for guess values far away, like the 0.2 in the #chthonicdaemon %timeit exempla.

Categories