Is there a cythonized version of `norm` methods from scipy.stats? - python

I'm talking about the main public methods for continuous RV in scipy.stats:
specifically,
from scipy.stats import norm
then using
norm.ppf or norm.pdf
Link: http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html
Is there any opportunity for speed optimization on norm.ppf() or norm.pdf() from using cython? Or is it already optimized or not worth wrapping with cython?

Have you examined the code in
.../scipy/stats/distributions.py?
It looks like norm_pdf ends up using
_norm_pdf_C = math.sqrt(2*pi)
_norm_pdf_logC = math.log(_norm_pdf_C)
def _norm_pdf(x):
    return exp(-x**2/2.0) / _norm_pdf_C
Since it doesn't involve a loop through numpy arrays it does not look like a prime candidate for cython speedup. Would you write it differently?
Oops, sorry. You are asking about a function like:
def pdf(self,x,*args,**kwds):
    args, loc, scale = self._parse_args(*args, **kwds)
    x,loc,scale = map(asarray,(x,loc,scale))
    args = tuple(map(asarray,args))
    x = asarray((x-loc)*1.0/scale)
    cond0 = self._argcheck(*args) & (scale > 0)
    cond1 = (scale > 0) & (x >= self.a) & (x <= self.b)
    cond = cond0 & cond1
    output = zeros(shape(cond),'d')
    putmask(output,(1-cond0)+np.isnan(x),self.badvalue)
    if any(cond):
        goodargs = argsreduce(cond, *((x,)+args+(scale,)))
        scale, goodargs = goodargs[-1], goodargs[:-1]
        place(output,cond,self._pdf(*goodargs) / scale)
    if output.ndim == 0:
        return output[()]
    return output
Besides argument checking and massaging with map, I see some iteration hiding in putmask and place. I haven't used place much, but I think it iterates on cond, applies self._pdf, and places the values in output. I suspect the code organization is intended to provide a lot of flexibility, allowing for different models and distributions. I don't see tight code aimed at speed.
For code that would benefit from conversion to Cython you probably need to write something from scratch, something that does not call a lot of other numpy and scipy code and does not build an elaborate class structure. Focus on a very specific calculation, not a family of calculations.
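For instance, if all you need is the standard normal pdf and ppf evaluated over arrays, one option is to bypass the rv_continuous machinery and call the underlying routines directly. Here is a minimal sketch under that assumption (the helper names are mine; the loc/scale handling and argument checking that norm.pdf/norm.ppf perform are deliberately omitted):
import numpy as np
from scipy.special import ndtri  # inverse CDF of the standard normal

_NORM_PDF_C = np.sqrt(2 * np.pi)

def fast_norm_pdf(x):
    # Standard normal pdf, vectorized, without the framework overhead of norm.pdf
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x * x) / _NORM_PDF_C

def fast_norm_ppf(q):
    # Standard normal quantile function; equivalent to norm.ppf with loc=0, scale=1
    return ndtri(q)
Whether Cython buys you anything beyond this depends on whether you evaluate large arrays at once (the compiled special functions already dominate) or call the function scalar-by-scalar in a tight loop (where a typed Cython routine can help).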

Related

Can I use libraries like numpy and scipy in fortran?

I'm trying to speed up my Python code by porting a bunch of my nested loops over to Fortran and calling them as subroutines.
But a lot of my loops call numpy, and special functions from scipy like Bessel functions.
Before I try to use Fortran, I was wondering whether it is possible to import scipy and numpy into my Fortran subroutine and call those modules for the Bessel functions?
Or would I have to implement the Bessel function in Fortran in order to use it?
Ideally, I would create some sort of subroutine that would optimize the code below. This is just a snippet of my entire project, to give you an idea of what I'm trying to accomplish.
I understand that there are other practices I should implement to improve the speed, but for now I am investigating the benefits of calling Fortran subroutines from my main Python program.
for m in range(self.MaxNum_Eigen):
    #looping through the eigenvalues for the given maximum number of eigenvalues allotted
    bm = self.beta[m]
    #not sure
    #*note: rprime = r. BUT tprime ~= t.
    #K is a list of 31 elements for this particular case
    K = (bm / math.sqrt( (self.H2**2) + (bm**2) ))*(math.sqrt(2) / self.b)*((scipy.special.jv(0, bm * self.r))/ (scipy.special.jv(0, bm * self.b))) # Kernel, K0(bm, r).
    #initial condition
    F = [37] * (self.n1)
    # Integral transform of the initial condition
    #Fbar = (np.trapz(self.r,self.r*K*F))
    '''
    matlab syntax trapz(X,Y), X is either spacing or a vector
    matlab: trapz(r,r.*K.*F)              trapz(X,Y)
    python: np.trapz(self.r*K*F, self.r)  trapz(Y,X)
    '''
    #*(np.trapz(self.r,self.r*K*F))
    Fbar = np.ones((self.n1,self.n2))*(np.trapz(self.r*K*F, self.r))
    #steady state condition: integral is in steady state
    SS = np.zeros((sz[0],sz[1]))
    coeff = 5000000*math.exp(-(10**3)) #defining value outside of loop with higher precision
    for i in range(sz[0]):
        for j in range(sz[1]):
            '''
            matlab reshape(Array, size1, size2) takes multiple arguments: the item it is resizing and the new desired shape
            create self variables so we are not re-initializing them over and over again?
            using generators? How to use generators
            '''
            s = np.reshape(tau[i,j,:],(1,n3))
            # will be used for rprime and tprime in Ozisik solution.
            [RR,TT] = np.meshgrid(self.r,s)
            '''
            ##### ERROR DUE TO ROUNDING OF HEAT SOURCE ####
            error in rounding: 5000000*math.exp(-(10**3)) becomes zero
            log10(e^-10000) = -10000*(0.4342944819) = -4342.944819
            e^-10000 = 10^-4342.944819 = 10^-4343 * 10^0.05518 = 1.13548386531e-4343
            '''
            #g = 5000000*math.exp(-(10**3)) #*(RR - self.c*TT)**2) #[W / m^2] heat source.
            g = coeff * (RR - self.c*TT)**2
            K = (bm/math.sqrt(self.H2**2 + bm**2))*(math.sqrt(2)/self.b)*((scipy.special.jv(0,bm*RR))/(scipy.special.jv(0,bm*self.b)))
            #integral transform of heat source
            gbar = np.trapz(RR*K*g, self.r, 2) #trapz(Y,X,dx (spacing) )
            gbar = gbar.transpose()
            #boundary condition. BE SURE TO WRITE IN TERMS OF s!!!
            f2 = self.h2 * 37
            A = (self.alpha/self.k)*gbar + ((self.alpha*self.b)/self.k2)*((bm/math.sqrt(self.H2**2 + bm**2))*(math.sqrt(2)/self.b)*((scipy.special.jv(0,bm*self.b))/(scipy.special.jv(0,bm*self.b))))*f2
            #A is essentially a constant; is this correct all the time?
            #What does A represent?
            SS[i, j] = np.trapz(np.exp( (-self.alpha*bm**2)*(T[i,j] - s) )*A, s)
    #INSIDE M loop
    K = (bm / math.sqrt((self.H2 ** 2) + (bm ** 2)))*(math.sqrt(2) /self.b)*((scipy.special.jv(0, bm * R))/ (scipy.special.jv(0, bm * self.b)))
    U[:,:, m] = np.exp(-self.alpha * bm ** 2 * T)* K* Fbar + K* SS
    #print(['Eigenvalue ' num2str(m) ', found at time ' num2str(toc) ' seconds'])
Compilation of answers given in the comments
Answers specific to my code:
As vorticity mentioned, my code itself was not using the numpy and scipy packages to their fullest extent.
In regards to the Bessel function, 'royvib' mentions using .j0 from scipy rather than .jv. Calling the general Bessel function jv is much more computationally expensive; since I knew I would be using a zeroth-order Bessel function for many of my declarations, the minor change from jv -> j0 sped up the process.
In addition, I assigned the function to a variable outside the loop to avoid repeatedly looking up the appropriate function on every iteration. Example below.
Before
for i in range(SomeLength):
    some_var = scipy.special.jv(1,coeff)
After
Bessel = scipy.special.jv
for i in range(SomeLength):
    some_var = Bessel(1,coeff)
Storing the function in a local variable saved time by avoiding the dot ('.') lookup through the library namespaces on every single loop iteration. However, keep in mind this does make the Python less readable, and readability is the main reason I chose to do this project in Python. I do not have an exact figure for how much time this step cut from my process.
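As a rough illustration of both points, here is a micro-benchmark sketch (the harness and variable names are mine, not from the original post; the exact numbers will depend on your machine and scipy version):
import timeit

setup = "import scipy.special; x = 2.5"

# General-order Bessel routine vs. the dedicated zeroth-order routine
t_jv = timeit.timeit("scipy.special.jv(0, x)", setup=setup, number=100000)
t_j0 = timeit.timeit("scipy.special.j0(x)", setup=setup, number=100000)

# Attribute lookup inside the loop vs. a local alias bound once outside it
t_local = timeit.timeit("j0(x)", setup=setup + "; j0 = scipy.special.j0", number=100000)

print("jv(0, x): %.3fs   j0(x): %.3fs   local j0 alias: %.3fs" % (t_jv, t_j0, t_local))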
Fortran specific:
Since I was able to improve my Python code I did not go this route, so I lack specifics, but the general answer, as stated by 'High Performance Mark', is that yes, there are libraries that have been written to handle Bessel functions in Fortran.
If I do port my code over to Fortran or use f2py to mix Fortran and python I will update this answer accordingly.

Can I provide a solver in Google's ortools package with a BFS to start?

I'm solving a very large LP — one which doesn't have 0 as a Basic feasible solution (BFS). I'm wondering if by passing the solver a basic feasible solution, I can speed up the process. Looking for something along the lines of: solver.setBasicFeasibleSolution(). I'll formulate a toy instance below (with a lot fewer constraints) and show you what I mean.
from ortools.linear_solver import pywraplp

def main():
    # Instantiate solver
    solver = pywraplp.Solver('Toy',
                             pywraplp.Solver.GLOP_LINEAR_PROGRAMMING)
    # Variables
    x = solver.NumVar(-1, solver.infinity(), 'x')
    y = solver.NumVar(-1, solver.infinity(), 'y')
    z = solver.NumVar(-1, solver.infinity(), 'z')
    # Constraint 1: x + y >= 10.
    constraint1 = solver.Constraint(10, solver.infinity())
    constraint1.SetCoefficient(x, 1)
    constraint1.SetCoefficient(y, 1)
    # Constraint 2: x + z >= 5.
    constraint2 = solver.Constraint(5, solver.infinity())
    constraint2.SetCoefficient(x, 1)
    constraint2.SetCoefficient(z, 1)
    # Constraint 3: y + z >= 15.
    constraint3 = solver.Constraint(15, solver.infinity())
    constraint3.SetCoefficient(y, 1)
    constraint3.SetCoefficient(z, 1)
    # Objective function: min 2x + 3y + 4z.
    objective = solver.Objective()
    objective.SetCoefficient(x, 2)
    objective.SetCoefficient(y, 3)
    objective.SetCoefficient(z, 4)
    objective.SetMinimization()
    # What I want:
    """
    solver.setBasicFeasibleSolution({x: 10, y: 5, z: 15})
    """
    solver.Solve()
    print([val.solution_value() for val in [x, y, z]])

if __name__ == '__main__':
    main()
Hoping something like this will speed things up (in case the solver has to use two-phase simplex or the big-M method to find an initial BFS).
Also, if anyone can point me to python API docs — not Google provided examples — that would be really helpful. Looking to understand what objects are available in ortools' solvers, what their methods are, and what their return values and patterns are. Sort of like the C++ docs.
Of course, other resources are also welcomed.
Crawling the docs, this seems to be the API documentation for the C++-based solver, and the SWIG-based Python binding is mentioned there.
Within this, you will find MPSolver with the following method:
SetStartingLpBasis
Return type: void
Arguments: const std::vector<MPSolver::BasisStatus>& variable_statuses, const std::vector<MPSolver::BasisStatus>& constraint_statuses
Advanced usage: Incrementality. This function takes a starting basis to be used in the next LP Solve() call. The statuses of a current solution can be retrieved via the basis_status() function of a MPVariable or a MPConstraint. WARNING: With Glop, you should disable presolve when using this because this information will not be modified in sync with the presolve and will likely not mean much on the presolved problem.
The warning somewhat makes me wonder if this will work out for you (in terms of saving time).
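For completeness, here is a rough sketch of how that might look from the Python side. This is an assumption-heavy example: it presumes the SWIG binding in your ortools version exposes SetStartingLpBasis and the MPSolver::BasisStatus constants on pywraplp.Solver (check dir(solver) and your version's documentation before relying on it), and, per the warning above, it disables GLOP's presolve:
# Hypothetical usage sketch: statuses must be supplied for every variable and
# every constraint, in the order they were added to the solver.
variable_statuses = [solver.BASIC, solver.BASIC, solver.AT_LOWER_BOUND]
constraint_statuses = [solver.AT_LOWER_BOUND, solver.BASIC, solver.BASIC]

solver.SetSolverSpecificParametersAsString("use_preprocessing: false")  # assumed GLOP parameter name
solver.SetStartingLpBasis(variable_statuses, constraint_statuses)
solver.Solve()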
If you don't have a good reason to stick with GLOP (looks interesting!), use CoinOR's Clp (depressing state of docs, but imho the best open-source LP solver, including some interesting crashing procedures)! I think it's even interfaced within ortools. (See the Mittelmann benchmarks, where it's even beating CPLEX; but in regards to a scientific evaluation that only shows it's very competitive!)
Or if your problem is very large and you don't need simplex-like solutions, go for an interior-point method (Clp has one; no info about its quality).

Algorithm - return aliasing frequency

In Python, I'm trying to write an algorithm alias_freq(f_signal,f_sample,n), which behaves as follows:
def alias_freq(f_signal,f_sample,n):
    f_Nyquist=f_sample/2.0
    if f_signal<=f_Nyquist:
        return n'th frequency higher than f_signal that will alias to f_signal
    else:
        return frequency (lower than f_Nyquist) that f_signal will alias to
The following is code that I have been using to test the above function (f_signal, f_sample, and n below are chosen arbitrarily just to fill out the code)
import numpy as np
import matplotlib.pyplot as plt
t=np.linspace(0,2*np.pi,500)
f_signal=10.0
y1=np.sin(f_signal*t)
plt.plot(t,y1)
f_sample=13.0
t_sample=np.linspace(0,int(f_sample)*(2*np.pi/f_sample),f_sample)
y_sample=np.sin(f_signal*t_sample)
plt.scatter(t_sample,y_sample)
n=2
f_alias=alias_freq(f_signal,f_sample,n)
y_alias=np.sin(f_alias*t)
plt.plot(t,y_alias)
plt.xlim(xmin=-.1,xmax=2*np.pi+.1)
plt.show()
My thinking is that if the function works properly, the plots of both y1 and y_alias will hit every scattered point from y_sample. So far I have been completely unsuccessful in getting either the if statement or the else statement in the function to do what I think it should, which makes me believe that either I don't understand aliasing nearly as well as I want to, or my test code is no good.
My questions are: Preliminarily, is the test code I'm using sound for what I'm trying to do? And primarily, what is the alias_freq function that I am looking for?
Also please note: If some Python package has a function just like this already built in, I'd love to hear about it - however, part of the reason I'm doing this is to give myself a device to understand phenomena like aliasing better, so I'd still like to see what my function should look like.
If I understood the question correctly, the frequency of the aliased signal is abs(sampling_rate * n - f_signal), where n is the integer for which n * sampling_rate is closest to f_signal.
Thus:
n = round(f_signal / float(f_sample))
f_alias = abs(f_sample * n - f_signal)
This should work for frequencies under and over Nyquist.
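A quick worked check with the numbers from the test code above (my own example, not part of the original answer):
f_signal, f_sample = 10.0, 13.0
n = round(f_signal / float(f_sample))   # n = 1
f_alias = abs(f_sample * n - f_signal)  # |13 - 10| = 3: the 10 Hz tone shows up at 3 Hz

# A frequency above Nyquist, e.g. 20 Hz, folds the other way:
n = round(20.0 / f_sample)              # n = 2
print(abs(f_sample * n - 20.0))         # |26 - 20| = 6 Hz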
I figured out the answer to my own question and just realized that I forgot to post it here, sorry. It turns out it was something silly: Antii's answer is basically right, but the way I wrote the code I need f_sample-1 in the alias_freq function where I just had f_sample. There's still a phase-shift thing that happens sometimes, but just plugging in either 0 or pi for the phase shift has worked for me every time; I think it's just due to even or odd folding. The working function and test code are below.
import numpy as np
import matplotlib.pyplot as plt
#Given a sample frequency and a signal frequency, return frequency that signal frequency will be aliased to.
def alias_freq(f_signal,f_sample,n):
    f_alias = np.abs((f_sample-1)*n - f_signal)
    return f_alias
t=np.linspace(0,2*np.pi,500)
f_signal=13
y1=np.sin(f_signal*t)
plt.plot(t,y1)
f_sample=7
t_sample=np.linspace(0,int(f_sample)*(2*np.pi/f_sample),f_sample)
y_sample=np.sin((f_signal)*t_sample)
plt.scatter(t_sample,y_sample)
f_alias=alias_freq(f_signal,f_sample,3)
y_alias=np.sin(f_alias*t+np.pi)#Sometimes with phase shift, usually np.pi for integer f_signal and f_sample, sometimes without.
plt.plot(t,y_alias)
plt.xlim(xmin=-.1,xmax=2*np.pi+.1)
plt.show()
Here is a Python aliased frequency calculator:
def get_aliased_freq(f, fs):
    """
    return aliased frequency of f sampled at fs
    """
    fn = fs / 2
    if int(f / fn) % 2 == 0:
        return f % fn
    else:
        return fn - (f % fn)
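A quick usage check (my own numbers, matching the worked example earlier in this thread):
print(get_aliased_freq(10, 13))  # 3.0: a 10 Hz tone sampled at 13 Hz shows up at 3 Hz
print(get_aliased_freq(20, 13))  # 6.0: 20 Hz folds down to 6 Hz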

How to improve the performance of this tiny distance Python function

I'm running into a performance bottleneck when using a custom distance metric function for a clustering algorithm from sklearn.
The profile from Run Snake Run (profiler output image not reproduced here) makes it clear that the problem is the dbscan_metric function. The function looks very simple and I don't quite know what the best approach to speeding it up would be:
def dbscan_metric(a,b):
    if a.shape[0] != NUM_FEATURES:
        return np.linalg.norm(a-b)
    else:
        return np.linalg.norm(np.multiply(FTR_WEIGHTS, (a-b)))
Any thoughts as to what is causing it to be this slow would be much appreciated.
I am not familiar with what the function does - but is there a possibility of repeated calculations? If so, you could memoize the function:
cache = {}
def dbscan_metric(a,b):
    diff = a - b
    if a.shape[0] != NUM_FEATURES:
        to_calc = diff
    else:
        to_calc = np.multiply(FTR_WEIGHTS, diff)
    key = to_calc.tobytes()  # arrays are not hashable, so key the cache on their raw bytes
    if key not in cache:
        cache[key] = np.linalg.norm(to_calc)
    return cache[key]
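If all of your samples do have NUM_FEATURES features, another option (my suggestion, not part of the answer above) is to avoid a per-pair Python callback entirely: scale the data once by FTR_WEIGHTS and let DBSCAN use its built-in 'euclidean' metric, which then computes exactly np.linalg.norm(FTR_WEIGHTS * (a - b)) for each pair. A sketch, with X, eps and min_samples as placeholders:
import numpy as np
from sklearn.cluster import DBSCAN

# X: (n_samples, NUM_FEATURES) array; FTR_WEIGHTS: length-NUM_FEATURES weight vector
X_scaled = X * FTR_WEIGHTS  # elementwise scaling, broadcast across rows

# Euclidean distance on the scaled rows equals the weighted norm on the original rows,
# so no Python-level metric callback is needed.
labels = DBSCAN(eps=0.5, min_samples=5, metric='euclidean').fit_predict(X_scaled)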

Numpy/Python performing terribly vs. Matlab

Novice programmer here. I'm writing a program that analyzes the relative spatial locations of points (cells). The program gets boundaries and cell type off an array with the x coordinate in column 1, y coordinate in column 2, and cell type in column 3. It then checks each cell for cell type and appropriate distance from the bounds. If it passes, it then calculates its distance from each other cell in the array and if the distance is within a specified analysis range it adds it to an output array at that distance.
My cell-marking program is in wxPython, so I was hoping to develop this program in Python as well and eventually stick it into the GUI. Unfortunately, right now Python takes ~20 seconds to run the core loop on my machine while MATLAB can do ~15 loops/second. Since I'm planning on doing 1000 loops (with a randomized comparison condition) on ~30 cases, times several exploratory analysis types, this is not a trivial difference.
I tried running a profiler; array calls are 1/4 of the time, and almost all of the rest is unspecified loop time.
Here is the python code for the main loop:
for basecell in range (0, cellnumber-1):
    if firstcelltype == np.array((cellrecord[basecell,2])):
        xloc=np.array((cellrecord[basecell,0]))
        yloc=np.array((cellrecord[basecell,1]))
        xedgedist=(xbound-xloc)
        yedgedist=(ybound-yloc)
        if xloc>excludedist and xedgedist>excludedist and yloc>excludedist and yedgedist>excludedist:
            for comparecell in range (0, cellnumber-1):
                if secondcelltype==np.array((cellrecord[comparecell,2])):
                    xcomploc=np.array((cellrecord[comparecell,0]))
                    ycomploc=np.array((cellrecord[comparecell,1]))
                    dist=math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2)
                    dist=round(dist)
                    if dist>=1 and dist<=analysisdist:
                        arraytarget=round(dist*analysisdist/intervalnumber)
                        addone=np.array((spatialraw[arraytarget-1]))
                        addone=addone+1
                        targetcell=arraytarget-1
                        np.put(spatialraw,[targetcell,targetcell],addone)
Here is the matlab code for the main loop:
for basecell = 1:cellnumber;
    if firstcelltype==cellrecord(basecell,3);
        xloc=cellrecord(basecell,1);
        yloc=cellrecord(basecell,2);
        xedgedist=(xbound-xloc);
        yedgedist=(ybound-yloc);
        if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist);
            for comparecell = 1:cellnumber;
                if secondcelltype==cellrecord(comparecell,3);
                    xcomploc=cellrecord(comparecell,1);
                    ycomploc=cellrecord(comparecell,2);
                    dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2);
                    if (dist>=1) && (dist<=100.4999);
                        arraytarget=round(dist*analysisdist/intervalnumber);
                        spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1;
                    end
                end
            end
        end
    end
end
Thanks!
Here are some ways to speed up your python code.
First: Don't make np arrays when you are only storing one value. You do this many times over in your code. For instance,
if firstcelltype == np.array((cellrecord[basecell,2])):
can just be
if firstcelltype == cellrecord[basecell,2]:
I'll show you why with some timeit statements:
>>> timeit.Timer('x = 111.1').timeit()
0.045882196294822819
>>> t=timeit.Timer('x = np.array(111.1)','import numpy as np').timeit()
0.55774970267830071
That's an order of magnitude in difference between those calls.
Second: The following code:
arraytarget=round(dist*analysisdist/intervalnumber)
addone=np.array((spatialraw[arraytarget-1]))
addone=addone+1
targetcell=arraytarget-1
np.put(spatialraw,[targetcell,targetcell],addone)
can be replaced with
arraytarget=round(dist*analysisdist/intervalnumber)-1
spatialraw[arraytarget] += 1
Third: You can get rid of the sqrt as Philip mentioned by squaring analysisdist beforehand. However, since you use analysisdist to get arraytarget, you might want to create a separate variable, analysisdist2 that is the square of analysisdist and use that for your comparison.
Fourth: You are looking for cells that match secondcelltype every time you get to that point rather than finding those one time and using the list over and over again. You could define an array:
comparecells = np.where(cellrecord[:,2]==secondcelltype)[0]
and then replace
for comparecell in range (0, cellnumber-1):
    if secondcelltype==np.array((cellrecord[comparecell,2])):
with
for comparecell in comparecells:
Fifth: Use psyco. It is a JIT compiler. Matlab has a built-in JIT compiler if you're using a somewhat recent version. This should speed-up your code a bit.
Sixth: If the code still isn't fast enough after all previous steps, then you should try vectorizing your code. It shouldn't be too difficult. Basically, the more stuff you can have in numpy arrays the better. Here's my try at vectorizing:
basecells = np.where(cellrecord[:,2]==firstcelltype)[0]
xlocs = cellrecord[basecells, 0]
ylocs = cellrecord[basecells, 1]
xedgedists = xbound - xlocs
yedgedists = ybound - ylocs
whichcells = np.where((xlocs>excludedist) & (xedgedists>excludedist) & (ylocs>excludedist) & (yedgedists>excludedist))[0]
selectedcells = basecells[whichcells]
comparecells = np.where(cellrecord[:,2]==secondcelltype)[0]
xcomplocs = cellrecord[comparecells,0]
ycomplocs = cellrecord[comparecells,1]
analysisdist2 = analysisdist**2
for basecell in selectedcells:
    dists = np.round((xcomplocs-cellrecord[basecell,0])**2 + (ycomplocs-cellrecord[basecell,1])**2)
    whichcells = np.where((dists >= 1) & (dists <= analysisdist2))[0]
    arraytargets = (np.round(np.sqrt(dists[whichcells])*analysisdist/intervalnumber) - 1).astype(int)
    for target in arraytargets:
        spatialraw[target] += 1
You can probably take out that inner for loop, but you have to be careful because some of the elements of arraytargets could be the same; see the sketch just below. Also, I didn't actually try out all of the code, so there could be a bug or typo in there. Hopefully it gives you a good idea of how to do this. Oh, one more thing: you can make analysisdist/intervalnumber a separate variable to avoid doing that division over and over again.
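On that last caveat about repeated elements of arraytargets, here is a small sketch (my addition, reusing the hypothetical names from the code above) of how the inner loop could go away: np.add.at does an unbuffered in-place add, so duplicate indices are each counted, and np.bincount is an equivalent alternative.
# Replaces "for target in arraytargets: spatialraw[target] += 1".
# Note that spatialraw[arraytargets] += 1 would NOT work here, because
# repeated indices would only be incremented once.
np.add.at(spatialraw, arraytargets, 1)

# Equivalent alternative:
# spatialraw += np.bincount(arraytargets, minlength=spatialraw.shape[0])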
Not too sure about the slowness of the Python, but your Matlab code can be HIGHLY optimized. Nested for-loops tend to have horrible performance issues. You can replace the inner loop with a vectorized version, as below:
for basecell = 1:cellnumber;
    if firstcelltype==cellrecord(basecell,3);
        xloc=cellrecord(basecell,1);
        yloc=cellrecord(basecell,2);
        xedgedist=(xbound-xloc);
        yedgedist=(ybound-yloc);
        if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist);
            % for comparecell = 1:cellnumber;
            %     if secondcelltype==cellrecord(comparecell,3);
            %         xcomploc=cellrecord(comparecell,1);
            %         ycomploc=cellrecord(comparecell,2);
            %         dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2);
            %         if (dist>=1) && (dist<=100.4999);
            %             arraytarget=round(dist*analysisdist/intervalnumber);
            %             spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1;
            %         end
            %     end
            % end
            % replace with:
            secondcelltype_mask = secondcelltype == cellrecord(:,3);
            xcomploc_vec = cellrecord(secondcelltype_mask ,1);
            ycomploc_vec = cellrecord(secondcelltype_mask ,2);
            dist_vec = sqrt((xcomploc_vec-xloc).^2+(ycomploc_vec-yloc).^2);
            dist_mask = dist_vec>=1 & dist_vec<=100.4999;
            arraytarget_vec = round(dist_vec(dist_mask)*analysisdist/intervalnumber);
            count = accumarray(arraytarget_vec,1, [size(spatialsum,2),1]);
            spatialsum(1,:) = spatialsum(1,:)+count';
        end
    end
end
There may be some small errors in there since I don't have any data to test the code with, but it should give roughly a 10x speed-up of the Matlab code.
From my experience with numpy I've noticed that swapping out for-loops for vectorized/matrix-based arithmetic gives noticeable speed-ups as well. However, without the shapes of all of your variables it's hard to vectorize things.
You can avoid some of the math.sqrt calls by replacing the lines
dist=math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2)
dist=round(dist)
if dist>=1 and dist<=analysisdist:
    arraytarget=round(dist*analysisdist/intervalnumber)
with
dist=(xcomploc-xloc)**2+(ycomploc-yloc)**2
dist=round(dist)
if dist>=1 and dist<=analysisdist_squared:
    arraytarget=round(math.sqrt(dist)*analysisdist/intervalnumber)
where you have the line
analysisdist_squared = analysis_dist * analysis_dist
outside of the main loop of your function.
Since math.sqrt is called in the innermost loop, you should have from math import sqrt at the top of the module and just call the function as sqrt.
I would also try replacing
dist=(xcomploc-xloc)**2+(ycomploc-yloc)**2
with
dist=(xcomploc-xloc)*(xcomploc-xloc)+(ycomploc-yloc)*(ycomploc-yloc)
There's a chance it will produce faster byte code to do multiplication rather than exponentiation.
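If you want to check that claim on your own interpreter, a quick micro-benchmark sketch (mine, not from the answer):
import timeit

print(timeit.timeit("d*d", setup="d = 3.7"))   # repeated multiplication
print(timeit.timeit("d**2", setup="d = 3.7"))  # exponentiation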
I doubt these will get you all the way to MATLABs performance, but they should help reduce some overhead.
If you have a multicore machine, you could maybe give the multiprocessing module a try and use multiple processes to make use of all the cores.
Instead of sqrt you could use x**0.5, which is, if I remember correctly, slightly faster.
