Can I use libraries like numpy and scipy in fortran? - python

I'm trying to speed up my Python code by porting a bunch of my nested loops over to Fortran and calling them as subroutines.
But a lot of my loops call numpy, and special functions from scipy like Bessel functions.
Before I try to use Fortran, I was wondering: is it possible to import scipy and numpy into my Fortran subroutine and call those modules for Bessel functions?
Otherwise, would I have to implement the Bessel function in Fortran in order to use it?
Ideally, I would create some sort of subroutine that would optimize the code below. This is just a snippet of my entire project to give you an idea of what I'm trying to accomplish.
I understand that there are other practices I should implement to improve the speed, but for now I am investigating the benefits of calling Fortran subroutines from my main Python program.
for m in range(self.MaxNum_Eigen):
    # looping through the eigenvalues for the given maximum number of eigenvalues allotted
    bm = self.beta[m]

    # not sure
    # *note: rprime = r. BUT tprime ~= t.
    # K is a list of 31 elements for this particular case
    K = (bm / math.sqrt((self.H2**2) + (bm**2)))*(math.sqrt(2) / self.b)*((scipy.special.jv(0, bm * self.r))/(scipy.special.jv(0, bm * self.b)))  # Kernel, K0(bm, r).

    # initial condition
    F = [37] * (self.n1)

    # Integral transform of the initial condition
    # Fbar = (np.trapz(self.r,self.r*K*F))
    '''
    matlab syntax trapz(X,Y), where X is either the spacing or a vector
    matlab: trapz(r, r.*K.*F)            -> trapz(X,Y)
    python: np.trapz(self.r*K*F, self.r) -> trapz(Y,X)
    '''
    # *(np.trapz(self.r,self.r*K*F))
    Fbar = np.ones((self.n1, self.n2))*(np.trapz(self.r*K*F, self.r))

    # steady state condition: integral is in steady state
    SS = np.zeros((sz[0], sz[1]))

    coeff = 5000000*math.exp(-(10**3))  # defining value outside of loop with higher precision

    for i in range(sz[0]):
        for j in range(sz[1]):
            '''
            matlab reshape(Array, size1, size2) takes multiple arguments: the item it is resizing and the new desired shape
            create self variables so we are not re-initializing them over and over again?
            using generators? How to use generators
            '''
            s = np.reshape(tau[i, j, :], (1, n3))

            # will be used for rprime and tprime in Ozisik solution.
            [RR, TT] = np.meshgrid(self.r, s)

            '''
            ##### ERROR DUE TO ROUNDING OF HEAT SOURCE ####
            error in rounding: 5000000*math.exp(-(10**3)) becomes zero
            log10(e^-10000) = -10000*(0.4342944819) = -4342.944819
            e^-10000 = 10^-4342.944819 = 10^0.055181 * 10^-4343 ~= 1.13548386531e-4343
            '''
            # g = 5000000*math.exp(-(10**3))  # *(RR - self.c*TT)**2) #[W / m^2] heat source.
            g = coeff * (RR - self.c*TT)**2

            K = (bm/math.sqrt(self.H2**2 + bm**2))*(math.sqrt(2)/self.b)*((scipy.special.jv(0, bm*RR))/(scipy.special.jv(0, bm*self.b)))

            # integral transform of heat source
            gbar = np.trapz(RR*K*g, self.r, 2)  # trapz(Y, X, dx (spacing))
            gbar = gbar.transpose()

            # boundary condition. BE SURE TO WRITE IN TERMS OF s!!!
            f2 = self.h2 * 37

            A = (self.alpha/self.k)*gbar + ((self.alpha*self.b)/self.k2)*((bm/math.sqrt(self.H2**2 + bm**2))*(math.sqrt(2)/self.b)*((scipy.special.jv(0, bm*self.b))/(scipy.special.jv(0, bm*self.b))))*f2
            # A is essentially a constant; is this correct all the time?
            # What does A represent?

            SS[i, j] = np.trapz(np.exp((-self.alpha*bm**2)*(T[i, j] - s))*A, s)

    # INSIDE M loop
    K = (bm / math.sqrt((self.H2**2) + (bm**2)))*(math.sqrt(2)/self.b)*((scipy.special.jv(0, bm * R))/(scipy.special.jv(0, bm * self.b)))
    U[:, :, m] = np.exp(-self.alpha * bm**2 * T)*K*Fbar + K*SS
    # print(['Eigenvalue ' num2str(m) ', found at time ' num2str(toc) ' seconds'])

Compilation of answers given in the comments
Answers specific to my code:
As vorticity mentioned, my code in itself was not using the numpy and scipy packages to the fullest extent.
Regarding the Bessel function, 'royvib' mentions using .j0 from scipy rather than .jv. Calling the general Bessel function jv is much more computationally expensive, and since I knew that I would be using a zeroth-order Bessel function for many of my declarations, the minor change from jv -> j0 sped up the process.
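For illustration, a minimal before/after of that change (the argument x below is just a placeholder; no timing figures are claimed here):
import scipy.special
x = 2.5  # placeholder argument
val_general = scipy.special.jv(0, x)  # general-order Bessel function of the first kind
val_zeroth = scipy.special.j0(x)      # dedicated zeroth-order routine, same value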
In addition, I declared variables outside the loop to prevent expensive repeated lookups of the appropriate functions. Example below.
Before
for i in range(SomeLength):
    some_var = scipy.special.jv(1, coeff)
After
Bessel = scipy.special.jv
for i in range(SomeLength):
    some_var = Bessel(1, coeff)
Storing the function in a local name saved time by not using the dot ('.') lookup to search through the library on every single loop iteration. However, keep in mind this does make the Python less readable, and readability is the main reason I chose to do this project in Python. I do not have an exact figure for how much time this step cut from my process.
Fortran specific:
Since I was able to improve my Python code I did not go this route, so I lack specifics, but the general answer, as stated by 'High Performance Mark', is that yes, there are libraries that handle Bessel functions in Fortran.
If I do port my code over to Fortran or use f2py to mix Fortran and python I will update this answer accordingly.

Related

Using NEOS as a Pyomo solver

I have recently started doing some OR and have been trying to use Pyomo and NEOS for some optimization problems. I have been following along with one of the UT Austin Pyomo lectures, and when my GLPK install was being difficult, I moved on to NEOS. I am now having some difficulty receiving a solved answer from NEOS.
What I have so far is this:
from pyomo import environ as pe
import os
os.environ['NEOS_EMAIL'] = 'my registered email'
model = pe.ConcreteModel()
model.x1 = pe.Var(domain=pe.Binary)
model.x2 = pe.Var(domain=pe.Binary)
model.x3 = pe.Var(domain=pe.Binary)
model.x4 = pe.Var(domain=pe.Binary)
model.x5 = pe.Var(domain=pe.Binary)
obj_expr = 3 * model.x1 + 4 * model.x2 + 5 * model.x3 + 8 * model.x4 + 9 * model.x5
model.obj = pe.Objective(sense=pe.maximize, expr=obj_expr)
con_expr = 2 * model.x1 + 3 * model.x2 + 4 * model.x3 + 5 * model.x4 + 9 * model.x5 <= 20
model.con = pe.Constraint(expr=con_expr)
solver_manager = pe.SolverManagerFactory('neos')
results = solver_manager.solve(model, solver = "minos")
print(results)
What I receive in return is number of solutions = 0, while I know for a fact that one exists. I also see that I don't have any bounds set, so how would I go about doing that? Once again, I am very new to this and have not been able to find any documentation about this elsewhere, or perhaps I just don't know where to look.
Thanks for any help!
This is a "problem" with the design of the current results object. For historical reasons, that field reports the number of solutions contained in the results object and is not the number of solutions generated by the solver. By default, Pyomo solvers directly load the solution returned by the solver into the original model (both for convenience and efficiency) and do not return it in the results object. You can change that behavior by providing load_solutions=False to the solve() call.
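For instance, a minimal sketch of that pattern, reusing the model and solver manager from the question (load_solutions= and model.solutions.load_from are, as far as I know, the standard Pyomo calls for this):
results = solver_manager.solve(model, solver="minos", load_solutions=False)
print(results)                          # now reports the solution(s) returned by the solver
if len(results.solution) > 0:
    model.solutions.load_from(results)  # explicitly load a returned solution into the model
    print(pe.value(model.obj))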
As for the bounds, what bounds are you referring to? Variable bounds are set using either the bounds= argument to the Var() declaration, or the domain= argument. For your example, because the variables are declared to be Binary, they all have bounds of [0..1]. Bounds on the objective are gathered by parsing the solver output. This depends on both the solver that you are using (many do not report bounds information) and the interface used to parse the solver results.
As a final note, you are sending a MIP problem to an LP/NLP solver (minos). You will get fractional values for your binary variables back from the solver.
To retrieve the solution from the model, you can use something like:
print(model.x1.value, model.x2.value, model.x3.value, model.x4.value, model.x5.value)
And using solver="cbc" you can avoid fractional values in this example.
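(That is, the same call with only the solver name changed:)
results = solver_manager.solve(model, solver="cbc")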

python dblquad invalid callable

I am trying to approximate the Gauss Linking integral for two straight lines in R^3 using dblquad. I've created this pair of lines as an object.
I have a form for the integrand in parametrisation variables s and t generated by a function gaussint(self,s,t) and this is working. I'm then just trying to define a function which returns the double integral over the two intervals [0,1].
Edit - the code for the function looks like this:
def gaussint(self, s, t):
    formnum = self.newlens()[0]*self.newlens()[1]*np.sin(test.angle())*np.cos(test.angle())
    formdenone = (np.cos(test.angle())**2)*(t*(self.newlens()[0]) - s*(self.newlens()[1]) + self.adists()[0] - self.adists()[1])**2
    formdentwo = (np.sin(test.angle())**2)*(t*(self.newlens()[0]) + s*(self.newlens()[1]) + self.adists()[0] + self.adists()[1])**2
    fullden = (4 + formdenone + formdentwo)**(3/2)
    fullform = formnum/fullden
    return fullform
The various other function calls here are just bits of linear algebra - lengths of lines, angle between them and so forth. s and t have been defined as symbols upstream, if they need to be.
The code for the integration then just looks like this (I've separated it out just to try to work out what was going on):
def approxint(self, s, t):
    from scipy.integrate import dblquad
    return dblquad(self.gaussint(s,t), 0, 1, lambda t: 0, lambda t: 1)
Running it gets me a lengthy bit of somewhat impenetrable process messages, followed by
ValueError: invalid callable given
Any idea where I'm going wrong?
Cheers.
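For reference, the error usually means dblquad was handed a value instead of a function: self.gaussint(s, t) evaluates the integrand once and passes the result, whereas dblquad needs the callable itself. A minimal sketch under that assumption, keeping gaussint's (s, t) signature:
from scipy.integrate import dblquad

def approxint(self):
    # pass the bound method itself; dblquad evaluates it at each point it chooses
    result, abserr = dblquad(self.gaussint, 0, 1, lambda s: 0, lambda s: 1)
    return result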

How to monitor the process of SciPy.odeint?

SciPy can solve ODE systems with scipy.integrate.odeint or other packages, but it only gives the result after the whole problem has been solved. However, if the ODE function is very complex, the program will take a lot of time (one or two days) to produce the full result. So how can I monitor the progress as it solves the equations (print out intermediate results before the equation has been solved completely)?
When I was googling for an answer, I couldn't find a satisfactory one. So I made a simple gist with a proof-of-concept solution using the tqdm project. Hope that helps you.
Edit: Moderators asked me to give an explanation of what is going on in the link above.
First of all, I am using scipy's OOP version of odeint (solve_ivp) but you could adapt it back to odeint. Say you want to integrate from time T0 to T1 and you want to show progress for every 0.1% of progress. You can modify your ode function to take two extra parameters, a pbar (progress bar) and a state (current state of integration). Like so:
import time

def fun(t, y, omega, pbar, state):
    # state is a list containing the last updated time t:
    # state = [last_t, dt]
    # I used a list because its values can be carried between function
    # calls throughout the ODE integration
    last_t, dt = state
    # let's subdivide t_span into 1000 parts
    # call update(n) here where n = (t - last_t) / dt
    time.sleep(0.1)
    n = int((t - last_t)/dt)
    pbar.update(n)
    # we need this to take into account that n is a rounded number.
    state[0] = last_t + dt * n
    # YOUR CODE HERE
    dydt = 1j * y * omega
    return dydt
This is necessary because the function itself must know where it is located, but scipy's odeint doesn't really give this context to the function. Then, you can integrate fun with the following code:
import numpy as np
from scipy.integrate import solve_ivp
from tqdm import tqdm

T0 = 0
T1 = 1
t_span = (T0, T1)
omega = 20
y0 = np.array([1], dtype=complex)  # the builtin complex; np.complex is deprecated
t_eval = np.arange(*t_span, 0.25/omega)
with tqdm(total=1000, unit="‰") as pbar:
    sol = solve_ivp(
        fun,
        t_span,
        y0,
        t_eval=t_eval,
        args=[omega, pbar, [T0, (T1-T0)/1000]],
    )
Note that anything mutable (like a list) in the args is instantiated once and can be changed from within the function. I recommend doing this rather than using a global variable.
This will show a progress bar that looks like this:
100%|█████████▉| 999/1000 [00:13<00:00, 71.69‰/s]
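For completeness, a rough sketch of adapting the same idea back to odeint (my assumptions: odeint passes (y, t) rather than (t, y), forwards extras via args=, and only handles real-valued systems, so the right-hand side below is a real stand-in rather than the complex example above):
from scipy.integrate import odeint
from tqdm import tqdm
import numpy as np

def fun_odeint(y, t, omega, pbar, state):
    # same bookkeeping as above, just with odeint's (y, t) argument order
    last_t, dt = state
    n = int((t - last_t) / dt)
    if n > 0:
        pbar.update(n)
        state[0] = last_t + dt * n
    return -omega * y  # real-valued stand-in for the right-hand side

T0, T1 = 0.0, 1.0
t = np.linspace(T0, T1, 500)
with tqdm(total=1000, unit="‰") as pbar:
    y = odeint(fun_odeint, [1.0], t, args=(20.0, pbar, [T0, (T1 - T0) / 1000]))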
You could split the integration domain and integrate the segments, taking the last value of the previous as initial condition of the next segment. In-between, print out whatever you want. Use numpy.concatenate to assemble the pieces if necessary.
In a standard example of a 3-body solar system simulation, replacing the code
u0 = solsys.getState0();
t = np.arange(0, 100*365.242*day, 0.5*day);
%timeit u_res = odeint(lambda u,t: solsys.getDerivs(u), u0, t, atol = 1e11*1e-8, rtol = 1e-9)
output: 1 loop, best of 3: 5.53 s per loop
with a progress-reporting code
def progressive(t, N):
    nk = [int(n+0.5) for n in np.linspace(0, len(t), N+1)]
    u0 = solsys.getState0()
    u_seg = [np.array([u0])]
    for k in range(N):
        # integrate one segment (the +1 lets consecutive segments share their boundary point),
        # drop the duplicated first row, and reuse the last state as the next initial condition
        seg = odeint(lambda u,t: solsys.getDerivs(u), u0, t[nk[k]:nk[k+1]+1], atol=1e11*1e-8, rtol=1e-9)
        u_seg.append(seg[1:])
        u0 = seg[-1]
        print(t[nk[k]]/day)
        for b in solsys.bodies:
            print("%10s %s" % (b.name, b.x))
    return np.concatenate(u_seg)
%timeit u_res = progressive(t,20)
output: 1 loop, best of 3: 5.96 s per loop
shows only a slight 8% overhead for console printing. With a more substantive ODE function, the fraction of the reporting overhead will reduce significantly.
That said, python, at least with its standard packages, is not the tool for industrial-scale number-crunching. Always use compiled versions with strong typing of variables to reduce interpretative overhead as much as possible.
Use some heavily developed and tested package like Sundials or the julia-lang framework DifferentialEquations.jl, coding the ODE function directly in an appropriate compiled language. Use higher-order methods to allow larger step sizes, and thus fewer steps. Test whether implicit or exponential/Rosenbrock methods reduce the number of steps or ODE function evaluations per fixed interval even further. The difference can be a factor of 10 to 100 in speedup.
Use a python wrapper of the above with some acceleration-friendly implementation of your ODE function.
Use the quasi-source-translating tool JiTCODE to translate your Python ODE function into a spaghetti list of C instructions, which then gives a compiled function that can be (almost) directly called from the compiled FORTRAN kernel of odeint.
Simple and Clear.
If you want to integrate an ODE from T0 to T1:
In the last lines of the code, before the return, you can use print((t/T1)*100, end='').
Then call sys.stdout.flush() so the output keeps overwriting the same line.
Here is an example; my integration time span is [0, 0.2]:
ddt[-2]=(beta/(Ap2*(L-x)))*(-Qgap+Ap*u)
ddt[-1]=(beta/(Ap2*(L+x)))*(Qgap-Ap*u)
print("\rCompletion percentage "+str(format(((t/0.2)*100),".4f")),end='')
sys.stdout.flush()
return ddt
It slows the solving process a bit, by fractions of a second, but it serves the purpose perfectly rather than requiring new functions.

rewriting python scipy.integrate.odeint to mimic matlab ode15s

I am new to python, and would like to mimic using the matlab ode15s in python instead of the built-in odeint from scipy.
The code originally is written like this:
newRphi = odeint(PSP,Rphi,t,(b,k,F))[-1,:]
where PSP is defined as:
def PSP(xx,t,b,k,F):
    R = xx[0]
    phi = xx[1]
    Rdot = sum([b[i]*R**(i+1) for i in xrange(len(b))]) + F(t)  # indexing from zero
    phiDot = 2*pi * k[2]*((R/k[1])**k[0])
    yy = hstack((Rdot,phiDot))
    return(yy)
from reading the instructions on scipy.integrate.odeint(), this function takes arguments in the following format:
scipy.integrate.odeint(func, y0, t, args=())
which means that func=PSP, y0=Rphi, t=t, args=(b,k,F)
So Rphi goes into the PSP function, gets integrated, becomes yy, and comes out; the function does this repeatedly for every element of t.
Now I want to translate it into something that would mimic ode15s from matlab. From reading some other treads, I found out that I can do that using
ode.set_integrator('vode', method='bdf', order=15)
Now the question becomes, how do I pass the original arguments to this integrator?
I am thinking it would probably look something like this:
ode15s = scipy.integrate.ode(f)
ode15s.set_integrator('vode', method='bdf', order=15)
ode15s.set_initial_value(y0, t0)
I know that the f is my PSP function, y0 is still the same: Rphi,
Here are my questions:
what is my initial value for t0, is it just t[0]?
how do I pass the variables (b,k,f) to the function f=PSP?
when I call this ode15s, how do I integrate through the vector size of t and collect the final values for yy?
Any help would be greatly appreciated. Thank you.
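No answer is recorded here, but for reference, a rough sketch of how the scipy.integrate.ode interface is usually driven (assumptions: b, k, F, Rphi and t are exactly as defined in the question; ode calls the right-hand side as f(t, y, ...), the reverse of odeint's (y, t); and, as far as I know, the vode BDF method caps the order at 5, so order=15 would not be honored):
import numpy as np
from scipy.integrate import ode

def PSP_ode(t, xx, b, k, F):
    # same model as PSP above, with the (t, xx) argument order that ode expects
    R, phi = xx
    Rdot = sum(b[i]*R**(i+1) for i in range(len(b))) + F(t)
    phiDot = 2*np.pi*k[2]*((R/k[1])**k[0])
    return [Rdot, phiDot]

ode15s = ode(PSP_ode)
ode15s.set_integrator('vode', method='bdf', order=5)  # BDF, the family ode15s uses
ode15s.set_f_params(b, k, F)           # this is how the extra arguments are passed
ode15s.set_initial_value(Rphi, t[0])   # t0 is simply t[0]

ys = [np.asarray(Rphi)]
for tk in t[1:]:                       # step through the time vector, collecting yy
    ys.append(ode15s.integrate(tk))
newRphi = ys[-1]                       # final values, analogous to odeint(...)[-1,:]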

Numpy/Python performing terribly vs. Matlab

Novice programmer here. I'm writing a program that analyzes the relative spatial locations of points (cells). The program gets boundaries and cell type off an array with the x coordinate in column 1, y coordinate in column 2, and cell type in column 3. It then checks each cell for cell type and appropriate distance from the bounds. If it passes, it then calculates its distance from each other cell in the array and if the distance is within a specified analysis range it adds it to an output array at that distance.
My cell marking program is in wxpython so I was hoping to develop this program in python as well and eventually stick it into the GUI. Unfortunately right now python takes ~20 seconds to run the core loop on my machine while MATLAB can do ~15 loops/second. Since I'm planning on doing 1000 loops (with a randomized comparison condition) on ~30 cases times several exploratory analysis types this is not a trivial difference.
I tried running a profiler and array calls are 1/4 of the time, almost all of the rest is unspecified loop time.
Here is the python code for the main loop:
for basecell in range (0, cellnumber-1):
    if firstcelltype == np.array((cellrecord[basecell,2])):
        xloc=np.array((cellrecord[basecell,0]))
        yloc=np.array((cellrecord[basecell,1]))
        xedgedist=(xbound-xloc)
        yedgedist=(ybound-yloc)
        if xloc>excludedist and xedgedist>excludedist and yloc>excludedist and yedgedist>excludedist:
            for comparecell in range (0, cellnumber-1):
                if secondcelltype==np.array((cellrecord[comparecell,2])):
                    xcomploc=np.array((cellrecord[comparecell,0]))
                    ycomploc=np.array((cellrecord[comparecell,1]))
                    dist=math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2)
                    dist=round(dist)
                    if dist>=1 and dist<=analysisdist:
                        arraytarget=round(dist*analysisdist/intervalnumber)
                        addone=np.array((spatialraw[arraytarget-1]))
                        addone=addone+1
                        targetcell=arraytarget-1
                        np.put(spatialraw,[targetcell,targetcell],addone)
Here is the matlab code for the main loop:
for basecell = 1:cellnumber;
    if firstcelltype==cellrecord(basecell,3);
        xloc=cellrecord(basecell,1);
        yloc=cellrecord(basecell,2);
        xedgedist=(xbound-xloc);
        yedgedist=(ybound-yloc);
        if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist);
            for comparecell = 1:cellnumber;
                if secondcelltype==cellrecord(comparecell,3);
                    xcomploc=cellrecord(comparecell,1);
                    ycomploc=cellrecord(comparecell,2);
                    dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2);
                    if (dist>=1) && (dist<=100.4999);
                        arraytarget=round(dist*analysisdist/intervalnumber);
                        spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1;
                    end
                end
            end
        end
    end
end
Thanks!
Here are some ways to speed up your python code.
First: Don't make np arrays when you are only storing one value. You do this many times over in your code. For instance,
if firstcelltype == np.array((cellrecord[basecell,2])):
can just be
if firstcelltype == cellrecord[basecell,2]:
I'll show you why with some timeit statements:
>>> timeit.Timer('x = 111.1').timeit()
0.045882196294822819
>>> t=timeit.Timer('x = np.array(111.1)','import numpy as np').timeit()
0.55774970267830071
That's an order of magnitude in difference between those calls.
Second: The following code:
arraytarget=round(dist*analysisdist/intervalnumber)
addone=np.array((spatialraw[arraytarget-1]))
addone=addone+1
targetcell=arraytarget-1
np.put(spatialraw,[targetcell,targetcell],addone)
can be replaced with
arraytarget=round(dist*analysisdist/intervalnumber)-1
spatialraw[arraytarget] += 1
Third: You can get rid of the sqrt as Philip mentioned by squaring analysisdist beforehand. However, since you use analysisdist to get arraytarget, you might want to create a separate variable, analysisdist2 that is the square of analysisdist and use that for your comparison.
Fourth: You are looking for cells that match secondcelltype every time you get to that point rather than finding those one time and using the list over and over again. You could define an array:
comparecells = np.where(cellrecord[:,2]==secondcelltype)[0]
and then replace
for comparecell in range (0, cellnumber-1):
    if secondcelltype==np.array((cellrecord[comparecell,2])):
with
for comparecell in comparecells:
Fifth: Use psyco. It is a JIT compiler. Matlab has a built-in JIT compiler if you're using a somewhat recent version. This should speed-up your code a bit.
Sixth: If the code still isn't fast enough after all previous steps, then you should try vectorizing your code. It shouldn't be too difficult. Basically, the more stuff you can have in numpy arrays the better. Here's my try at vectorizing:
basecells = np.where(cellrecord[:,2]==firstcelltype)[0]
xlocs = cellrecord[basecells, 0]
ylocs = cellrecord[basecells, 1]
xedgedists = xbound - xlocs
yedgedists = ybound - ylocs
whichcells = np.where((xlocs>excludedist) & (xedgedists>excludedist) & (ylocs>excludedist) & (yedgedists>excludedist))[0]
selectedcells = basecells[whichcells]  # row indices into cellrecord, if you need them
comparecells = np.where(cellrecord[:,2]==secondcelltype)[0]
xcomplocs = cellrecord[comparecells, 0]
ycomplocs = cellrecord[comparecells, 1]
analysisdist2 = analysisdist**2
for idx in whichcells:  # idx indexes xlocs/ylocs, not cellrecord
    dists = np.round((xcomplocs-xlocs[idx])**2 + (ycomplocs-ylocs[idx])**2)
    passed = np.where((dists >= 1) & (dists <= analysisdist2))[0]
    arraytargets = (np.round(np.sqrt(dists[passed])*analysisdist/intervalnumber) - 1).astype(int)
    for target in arraytargets:
        spatialraw[target] += 1
You can probably take out that inner for loop, but you have to be careful because some of the elements of arraytargets could be the same. Also, I didn't actually try out all of the code, so there could be a bug or typo in there. Hopefully, it gives you a good idea of how to do this. Oh, one more thing: you could make analysisdist/intervalnumber a separate variable to avoid doing that division over and over again.
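For what it's worth, one safe way to drop that inner loop is np.add.at, which does unbuffered in-place addition, so repeated indices in arraytargets are each counted (a plain spatialraw[arraytargets] += 1 would count duplicates only once):
# arraytargets must be an integer array when used as an index
np.add.at(spatialraw, arraytargets, 1)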
Not too sure about the slowness of the Python, but your Matlab code can be HIGHLY optimized. Nested for-loops tend to have horrible performance issues. You can replace the inner loop with a vectorized function ... as below:
for basecell = 1:cellnumber;
    if firstcelltype==cellrecord(basecell,3);
        xloc=cellrecord(basecell,1);
        yloc=cellrecord(basecell,2);
        xedgedist=(xbound-xloc);
        yedgedist=(ybound-yloc);
        if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist);
            % for comparecell = 1:cellnumber;
            %     if secondcelltype==cellrecord(comparecell,3);
            %         xcomploc=cellrecord(comparecell,1);
            %         ycomploc=cellrecord(comparecell,2);
            %         dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2);
            %         if (dist>=1) && (dist<=100.4999);
            %             arraytarget=round(dist*analysisdist/intervalnumber);
            %             spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1;
            %         end
            %     end
            % end
            % replace with:
            secondcelltype_mask = secondcelltype == cellrecord(:,3);
            xcomploc_vec = cellrecord(secondcelltype_mask,1);
            ycomploc_vec = cellrecord(secondcelltype_mask,2);
            dist_vec = sqrt((xcomploc_vec-xloc).^2+(ycomploc_vec-yloc).^2);  % .^ for element-wise squaring
            dist_mask = dist_vec>=1 & dist_vec<=100.4999;
            arraytarget_vec = round(dist_vec(dist_mask)*analysisdist/intervalnumber);
            count = accumarray(arraytarget_vec, 1, [size(spatialsum,2),1]);  % spatialsum is a 1-by-N row vector
            spatialsum(1,:) = spatialsum(1,:) + count';
        end
    end
end
There may be some small errors in there since I don't have any data to test the code with but it should get ~10X speed up on the Matlab code.
From my experience with numpy I've noticed that swapping out for-loops for vectorized/matrix-based arithmetic gives noticeable speed-ups as well. However, without the shapes of all of your variables it's hard to vectorize things.
You can avoid some of the math.sqrt calls by replacing the lines
dist=math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2)
dist=round(dist)
if dist>=1 and dist<=analysisdist:
    arraytarget=round(dist*analysisdist/intervalnumber)
with
dist=(xcomploc-xloc)**2+(ycomploc-yloc)**2
dist=round(dist)
if dist>=1 and dist<=analysisdist_squared:
    arraytarget=round(math.sqrt(dist)*analysisdist/intervalnumber)
where you have the line
analysisdist_squared = analysisdist * analysisdist
outside of the main loop of your function.
Since math.sqrt is called in the innermost loop, you should have from math import sqrt at the top of the module and just call the function as sqrt.
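(That is, at module level:)
from math import sqrt
# ... and inside the innermost loop the call becomes:
arraytarget = round(sqrt(dist)*analysisdist/intervalnumber)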
I would also try replacing
dist=(xcomploc-xloc)**2+(ycomploc-yloc)**2
with
dist=(xcomploc-xloc)*(xcomploc-xloc)+(ycomploc-yloc)*(ycomploc-yloc)
There's a chance it will produce faster byte code to do multiplication rather than exponentiation.
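(If you want to check that on your own machine rather than take it on faith, a quick timeit sketch; no numbers claimed here:)
import timeit
setup = 'a, b = 1.3, 2.7'
print(timeit.timeit('(a-b)**2', setup=setup))
print(timeit.timeit('(a-b)*(a-b)', setup=setup))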
I doubt these will get you all the way to MATLAB's performance, but they should help reduce some overhead.
If you have a multicore, you could maybe give the multiprocessing module a try and use multiple processes to make use of all the cores.
Instead of sqrt you could use x**0.5, which is, if I remember correctly, slightly faster.
