I have a fairly simple function involving a logarithm of base 10 (f1 shown below). I need it to run as fast as possible since it is called millions of times as part of a larger code.
I tried with a Taylor approximation (f2 below) but even with a large expansion the accuracy is very poor and, even worse, it ends up taking a lot more time.
Have I reached the limit of performance attainable with numpy?
import time
import numpy as np
def f1(m1, m2):
    return m1 - 2.5 * np.log10(1. + 10 ** (-.4 * (m2 - m1)))
def f2(m1, m2):
    """
    Taylor expansion of 'f1'.
    """
    x = -.4 * (m2 - m1)
    return m1 - 2.5 * (
        0.30102999 + .5 * x + 0.2878231366 * x ** 2 -
        0.0635837 * x ** 4 + 0.0224742887 * x ** 6 -
        0.00904311879 * x ** 8 + 0.00388579 * x ** 10)
# The data I actually use has more or less this range.
N = 1000
m1 = np.random.uniform(5., 30., N)
m2 = np.random.uniform(.7 * m1, m1)
# Test both functions
M = 5000
s = time.perf_counter()
for _ in range(M):
    mc1 = f1(m1, m2)
t1 = time.perf_counter() - s
s = time.perf_counter()
for _ in range(M):
    mc2 = f2(m1, m2)
t2 = time.perf_counter() - s
print(t1, t2, np.allclose(mc1, mc2, 0.01))
With this code snippet, I'm not sure you should optimize the log itself so much as the whole vector expression.
You can try numexpr (a fast numerical array expression evaluator for Python, NumPy, ...), which might do a lot for you.
The idea came from Ignacio's comment, which made me wonder where his speedup comes from (I'm sure it's not from the log calculation itself).
In my simple modification of your code:
import numexpr as ne
def f1(m1, m2):
    return ne.evaluate("m1 - 2.5 * log10( 1.0 + 10 ** (-0.4 * (m2-m1)))")
It seems the above is 5-6 times as fast as the (unoptimized) approximation f2, while still giving the original accuracy.
It's also nearly twice as fast as the original NumPy approach f1.
These numbers might change depending on numexpr's setup, as Intel's MKL, for example, could be used too. As I'm too lazy to check my Anaconda-based setup, I offer this just as a tech demo which everyone can try out.
While I have used numexpr a few times in the past for simple stuff, I might add that it's also used within pandas, just to mention a real-world project depending on its correct workings.
Disclaimer: I used your benchmark as a template (and hope caching and the like do not play a role).
Replace all of those exponentiations in f2 with multiplication:
def f2(m1, m2):
    """
    Taylor expansion of 'f1'.
    """
    x = -0.4 * (m2 - m1)
    x2 = x * x
    x4 = x2 * x2
    x6 = x4 * x2
    return m1 - 2.5 * (
        0.30102999 + .5 * x + 0.2878231366 * x2 -
        0.0635837 * x4 + 0.0224742887 * x6 -
        0.00904311879 * x4 * x4 + 0.00388579 * x4 * x6)
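The same idea can be pushed a little further with Horner's scheme, which needs one multiplication per coefficient. What follows is only a sketch, not part of the answer above: `f2_horner` and `f1_exact` are hypothetical names, and the coefficients are copied from f2. It also illustrates why the approximation struggles: the series for log10(1 + 10**x) about x = 0 converges only for |x| < pi/ln(10), roughly 1.36, while the data in the question reaches x of about 3.6.

```python
import math


def f1_exact(m1, m2):
    """Reference formula from the question, for scalar inputs."""
    return m1 - 2.5 * math.log10(1.0 + 10.0 ** (-0.4 * (m2 - m1)))


def f2_horner(m1, m2):
    """Same truncated series as f2, with the even powers nested (Horner)."""
    x = -0.4 * (m2 - m1)
    x2 = x * x
    # c2*x^2 - c4*x^4 + c6*x^6 - c8*x^8 + c10*x^10, nested in powers of x2
    even = x2 * (0.2878231366 + x2 * (-0.0635837 + x2 * (0.0224742887
                 + x2 * (-0.00904311879 + x2 * 0.00388579))))
    return m1 - 2.5 * (0.30102999 + 0.5 * x + even)


# x = 0.5: inside the radius of convergence, the series tracks f1 closely.
err_small = abs(f2_horner(20.0, 18.75) - f1_exact(20.0, 18.75))
# x = 3.2: outside the radius of convergence, the truncated series is far off.
err_large = abs(f2_horner(20.0, 12.0) - f1_exact(20.0, 12.0))
print(err_small, err_large)
```

Because only arithmetic operators are used, `f2_horner` should also accept NumPy arrays unchanged; the scalar reference `f1_exact` would need np.log10 for that.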
Background.
I'm attempting to write a python implementation of this answer over on Math SE. You may find the following background to be useful.
Problem
I have an experimental setup consisting of three (3) receivers, with known locations [xi, yi, zi], and a transmitter with unknown location [x,y,z] emitting a signal at known velocity v. This signal arrives at the receivers at known times ti. The time of emission, t, is unknown.
I wish to find the angle of arrival (i.e. the transmitter's polar coordinates theta and phi), given only this information.
Solution
It is not possible to locate the transmitter exactly with only three (3) receivers, except in a handful of unique cases (there are several great answers across Math SE explaining why this is the case). In general, at least four (and, in practice, >>4) receivers are required to uniquely determine the rectangular coordinates of the transmitter.
The direction to the transmitter, however, may be "reliably" estimated. Letting vi be the vector representing the location of receiver i, ti being the time of signal arrival at receiver i, and n be the vector representing the unit vector pointing in the (approximate) direction of the transmitter, we obtain the following equations:
<n, vj - vi> = v(ti - tj)
(where < > denotes the scalar product)
...for all pairs of indices i,j. Together with |n| = 1, the system has 2 solutions in general, symmetric by reflection in the plane through vi/vj/vk. We may then determine phi and theta by simply writing n in polar coordinates.
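For three receivers these equations can also be solved in closed form, which may help when checking a numerical solver. Only two of the three pair equations are independent (the third is the difference of the other two), so n is a particular solution in the receiver plane plus or minus a multiple of the plane normal chosen to make |n| = 1. The sketch below assumes that reading; `directions` and its helpers are hypothetical names, and the convention "theta measured from the z axis, phi in the x-y plane" is an assumption.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def directions(receivers, times, v):
    """Return the two unit vectors n with <n, vj - vi> = v*(ti - tj)."""
    v0, v1, v2 = receivers
    t0, t1, t2 = times
    # Two independent linear equations: n.a = ra and n.b = rb.
    a, ra = sub(v1, v0), v * (t0 - t1)
    b, rb = sub(v2, v0), v * (t0 - t2)
    # Particular solution p = alpha*a + beta*b inside the receiver plane.
    aa, ab, bb = dot(a, a), dot(a, b), dot(b, b)
    det = aa * bb - ab * ab  # zero if the receivers are collinear
    alpha = (ra * bb - rb * ab) / det
    beta = (rb * aa - ra * ab) / det
    p = tuple(alpha * x + beta * y for x, y in zip(a, b))
    # Add the plane normal, scaled so that |n| = 1 (two mirror solutions).
    d = cross(a, b)
    d_len = math.sqrt(dot(d, d))
    h = math.sqrt(max(0.0, 1.0 - dot(p, p)))
    return [tuple(pi + s * h * di / d_len for pi, di in zip(p, d))
            for s in (1.0, -1.0)]


# Synthetic plane-wave test: arrival times generated from a known direction.
receivers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
n_true = (0.6, 0.0, 0.8)
speed = 2.0
times = [5.0 - dot(n_true, r) / speed for r in receivers]
solutions = directions(receivers, times, speed)
for n in solutions:
    theta = math.acos(max(-1.0, min(1.0, n[2])))  # angle from the z axis
    phi = math.atan2(n[1], n[0])                  # azimuth in the x-y plane
    print(n, theta, phi)
```

One of the two returned vectors should match the true direction; the other is its mirror image in the plane of the receivers, so extra information is needed to pick between them.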
Implementation.
I've attempted to write a python implementation of the above solution, using scipy's fsolve.
from dataclasses import dataclass
import scipy.optimize
import random
import math
c = 299792
@dataclass
class Vertexer:
    roc: list
    def fun(self, var, dat):
        (x,y,z) = var
        eqn_0 = (x * (self.roc[0][0] - self.roc[1][0])) + (y * (self.roc[0][1] - self.roc[1][1])) + (z * (self.roc[0][2] - self.roc[1][2])) - c * (dat[1] - dat[0])
        eqn_1 = (x * (self.roc[0][0] - self.roc[2][0])) + (y * (self.roc[0][1] - self.roc[2][1])) + (z * (self.roc[0][2] - self.roc[2][2])) - c * (dat[2] - dat[0])
        eqn_2 = (x * (self.roc[1][0] - self.roc[2][0])) + (y * (self.roc[1][1] - self.roc[2][1])) + (z * (self.roc[1][2] - self.roc[2][2])) - c * (dat[2] - dat[1])
        norm = math.sqrt(x**2 + y**2 + z**2) - 1
        return [eqn_0, eqn_1, eqn_2, norm]
    def find(self, dat):
        result = scipy.optimize.fsolve(self.fun, (0,0,0), args=dat)
        print('Solution ', result)
# Crude code to simulate a source, receivers at random locations
x0 = random.randrange(0,50); y0 = random.randrange(0,50); z0 = random.randrange(0,50)
x1 = random.randrange(0,50); x2 = random.randrange(0,50); x3 = random.randrange(0,50);
y1 = random.randrange(0,50); y2 = random.randrange(0,50); y3 = random.randrange(0,50);
z1 = random.randrange(0,50); z2 = random.randrange(0,50); z3 = random.randrange(0,50);
t1 = math.sqrt((x0-x1)**2 + (y0-y1)**2 + (z0-z1)**2)/c
t2 = math.sqrt((x0-x2)**2 + (y0-y2)**2 + (z0-z2)**2)/c
t3 = math.sqrt((x0-x3)**2 + (y0-y3)**2 + (z0-z3)**2)/c
print('Actual coordinates ', x0,y0,z0)
myVertexer = Vertexer([[x1,y1,z1], [x2,y2,z2], [x3,y3,z3]])
myVertexer.find([t1,t2,t3])
Unfortunately, I have far more experience solving such problems in C/C++ using GSL, and have limited experience working with scipy and the like. I'm getting the error:
TypeError: fsolve: there is a mismatch between the input and output shape of the 'func' argument 'fun'.Shape should be (3,) but it is (4,).
...which seems to suggest that fsolve expects a square system.
How may I solve this rectangular system? I can't seem to find anything useful in the scipy docs.
If necessary, I'm open to using other (Python) libraries.
As you already mentioned, fsolve expects a system with N variables and N equations, i.e. it finds a root of the function F: R^N -> R^N. Since you have four equations, you simply need to add a fourth variable. Note also that fsolve is a legacy function, and it's recommended to use root instead. Last but not least, note that sqrt(x^2+y^2+z^2) = 1 is equivalent to x^2+y^2+z^2=1 and that the latter is much less susceptible to rounding errors caused by the finite differences when approximating the jacobian of F.
Long story short, your class should look like this:
from dataclasses import dataclass
from scipy.optimize import root
# c is the propagation speed defined in the question's code
@dataclass
class Vertexer:
    roc: list
    def fun(self, var, dat):
        # The fourth entry of var is a dummy variable that makes the system square
        x,y,z, *_ = var
        eqn_0 = (x * (self.roc[0][0] - self.roc[1][0])) + (y * (self.roc[0][1] - self.roc[1][1])) + (z * (self.roc[0][2] - self.roc[1][2])) - c * (dat[1] - dat[0])
        eqn_1 = (x * (self.roc[0][0] - self.roc[2][0])) + (y * (self.roc[0][1] - self.roc[2][1])) + (z * (self.roc[0][2] - self.roc[2][2])) - c * (dat[2] - dat[0])
        eqn_2 = (x * (self.roc[1][0] - self.roc[2][0])) + (y * (self.roc[1][1] - self.roc[2][1])) + (z * (self.roc[1][2] - self.roc[2][2])) - c * (dat[2] - dat[1])
        norm = x**2 + y**2 + z**2 - 1
        return [eqn_0, eqn_1, eqn_2, norm]
    def find(self, dat):
        result = root(self.fun, (0,0,0,0), args=dat)
        if result.success:
            print('Solution ', result.x[:3])
I have been using MATLAB on a daily basis for a few years, but recently I decided to switch to Python because of its additional capabilities (e.g. Machine learning libraries, etc.).
I started by translating a simple MATLAB code into Python using primarily NumPy, and ran it in the Spyder environment. The results are striking:
Python is more than 100 times slower! Below I attached both codes. For the given CSV file (gmr) Python executed the code in 25 secs, while MATLAB took several ms. More importantly, I want to repeat the same calculations for a large number of CSV files (e.g. 200). It is quite striking that MATLAB needs around 50 secs for 200 CSV files, while Python would need more than 1 hour.
Could you please suggest some modifications? What am I doing wrong here?
I have tried several alternative ways to formulate the code, but unfortunately none of them worked.
MATLAB:
clear variables; close all ; clc ;
tic;
damp=0.05;
T=[0.01:0.01:3]';
Spectra = cell(num_records,2);
name='rec_1_out_scaled.csv';
gmr=load(name);
dt = gmr(3,1) - gmr(2,1);
cd (main_Dir)
Sa=zeros(length(T),1);
% Newmark's Direct Integration Method
for j=1:length(T)
Sa(j)=newmark_dim(gmr(:,2),T(j),damp,dt);
end
Spectra(i,:) = {name,Sa};
toc;
function Sa = newmark_dim(gacc,T,zeta,dt)
% newmark coefficients
beta = 1/4;
gamma = 1/2;
% natural circular frequency of SDOF system
wn = 2*pi/T;
% initialization
npun = length(gacc);
y = zeros(npun,1);
yp = zeros(npun,1);
ypp = zeros(npun,1);
ypp(1) = -gacc(1)-2*wn*zeta*yp0-wn^2*y0;
% Integration coefficients
keff = wn^2 + 1/(beta*dt^2) + gamma*2*wn*zeta/(beta*dt);
a1 = 1/(beta*dt^2)+gamma*2*wn*zeta/(beta*dt);
a2 = 1/(beta*dt)+2*wn*zeta*(gamma/beta-1);
a3 = (1/(2*beta)-1)+2*wn*zeta*dt*(gamma/(2*beta)-1);
for i=1:npun-1
y(i+1) = (-gacc(i+1)+a1*y(i)+a2*yp(i)+a3*ypp(i))/keff;
ypp(i+1) = (y(i+1)-y(i)-dt*yp(i)-dt^2*ypp(i)/2)/(beta*dt^2) + ypp(i);
yp(i+1) = yp(i)+dt*ypp(i)+dt*gamma*(ypp(i+1)-ypp(i));
end
Sd = max(abs(y));
Sa = Sd*wn^2;
return
PYTHON:
import time
import numpy as np
import math
import pandas as pd
start = time.time()
# Initialize
name = "rec_1_out_scaled.csv"
T = np.arange(0.01,3.01,0.01)
zeta = 0.05 # 5% damping
extension = 'csv'
num_records = 200
Spectra = np.zeros((num_records,T.size,2)) # Multidimensional array
"""
Newmark's Direct Integration Method
"""
def Newmark_DIM(gmr,T,zeta,dt):
    # newmark coefficients for constant acceleration
    beta = 1/4
    gamma = 1/2
    # natural circular frequency of the SDOF system
    wn = 2*math.pi/T
    # initialization
    npun = len(gmr)
    y = np.zeros(npun)
    yp = np.zeros(npun)
    ypp = np.zeros(npun)
    ypp[0] = - gmr[0] - 2 * wn * zeta* yp[0] - wn**2 * y[0]
    # Integration coefficients
    keff = wn**2 + 1/(beta * dt**2) + gamma * 2 * wn * zeta /(beta * dt)
    a1 = 1/(beta * dt**2) + gamma * 2 * wn * zeta/(beta * dt)
    a2 = 1/(beta * dt) + 2 * wn * zeta * (gamma/beta - 1)
    a3 = (1/(2 * beta) - 1) + 2 * wn * zeta * dt * (gamma/(2 * beta) - 1)
    # Solves ODE
    for i in range(npun-1):
        y[i+1] = (-gmr[i+1] + a1 * y[i] + a2 * yp[i] + a3 * ypp[i])/keff
        ypp[i+1] = (y[i+1] -y[i] - dt * yp[i] - dt**2 * ypp[i]/2) / (beta * dt**2) + ypp[i]
        yp[i+1] = yp[i] + dt * ypp[i] + dt * gamma * (ypp[i+1] - ypp[i])
    Sd = np.amax(np.abs(y))
    Sa = Sd * wn**2
    return (Sa)
# Calculates Spectrum
gmr = pd.read_csv(name, header=None)
gmr = gmr.values # makes it numpy array
dt = gmr[2,0] - gmr[1,0]
Sa = np.zeros(len(T))
for j in range(len(T)):
    Sa[j] = Newmark_DIM(gmr[:,1],T[j],zeta,dt)
Spectra[0][:,0] = T
Spectra[0][:,1] = Sa
print("Finished")
end = time.time()
print(end - start)
I will try and explain exactly what's going on and my issue.
This is a bit mathy and SO doesn't support latex, so sadly I had to resort to images. I hope that's okay.
I don't know why it's inverted, sorry about that.
At any rate, this is a linear system Ax = b where we know A and b, so we can find x, which is our approximation at the next time step. We continue doing this until time t_final.
This is the code
import numpy as np
tau = 2 * np.pi
tau2 = tau * tau
i = complex(0,1)
def solution_f(t, x):
    return 0.5 * (np.exp(-tau * i * x) * np.exp((2 - tau2) * i * t) + np.exp(tau * i * x) * np.exp((tau2 + 4) * i * t))
def solution_g(t, x):
    return 0.5 * (np.exp(-tau * i * x) * np.exp((2 - tau2) * i * t) - np.exp(tau * i * x) * np.exp((tau2 + 4) * i * t))
for l in range(2, 12):
    N = 2 ** l #number of grid points
    dx = 1.0 / N #space between grid points
    dx2 = dx * dx
    dt = dx #time step
    t_final = 1
    approximate_f = np.zeros((N, 1), dtype = complex)
    approximate_g = np.zeros((N, 1), dtype = complex)
    #Insert initial conditions
    for k in range(N):
        approximate_f[k, 0] = np.cos(tau * k * dx)
        approximate_g[k, 0] = -i * np.sin(tau * k * dx)
    #Create coefficient matrix
    A = np.zeros((2 * N, 2 * N), dtype = complex)
    #First row is special
    A[0, 0] = 1 -3*i*dt
    A[0, N] = ((2 * dt / dx2) + dt) * i
    A[0, N + 1] = (-dt / dx2) * i
    A[0, -1] = (-dt / dx2) * i
    #Last row is special
    A[N - 1, N - 1] = 1 - (3 * dt) * i
    A[N - 1, N] = (-dt / dx2) * i
    A[N - 1, -2] = (-dt / dx2) * i
    A[N - 1, -1] = ((2 * dt / dx2) + dt) * i
    #middle
    for k in range(1, N - 1):
        A[k, k] = 1 - (3 * dt) * i
        A[k, k + N - 1] = (-dt / dx2) * i
        A[k, k + N] = ((2 * dt / dx2) + dt) * i
        A[k, k + N + 1] = (-dt / dx2) * i
    #Bottom half
    A[N :, :N] = A[:N, N:]
    A[N:, N:] = A[:N, :N]
    Ainv = np.linalg.inv(A)
    #Advance through time
    time = 0
    while time < t_final:
        b = np.concatenate((approximate_f, approximate_g), axis = 0)
        x = np.dot(Ainv, b) #Solve Ax = b
        approximate_f = x[:N]
        approximate_g = x[N:]
        time += dt
    approximate_solution = np.concatenate((approximate_f, approximate_g), axis=0)
    #Calculate the actual solution
    actual_f = np.zeros((N, 1), dtype = complex)
    actual_g = np.zeros((N, 1), dtype = complex)
    for k in range(N):
        actual_f[k, 0] = solution_f(t_final, k * dx)
        actual_g[k, 0] = solution_g(t_final, k * dx)
    actual_solution = np.concatenate((actual_f, actual_g), axis = 0)
    print(np.sqrt(dx) * np.linalg.norm(actual_solution - approximate_solution))
It doesn't work. At least not in the beginning; it shouldn't start this slow. It should be unconditionally stable and converge to the right answer.
What's going wrong here?
The L2-norm can be a useful metric to test convergence, but isn't ideal when debugging as it doesn't explain what the problem is. Although your solution should be unconditionally stable, backward Euler won't necessarily converge to the right answer. Just like forward Euler is notoriously unstable (anti-dissipative), backward Euler is notoriously dissipative. Plotting your solutions confirms this. The numerical solutions converge to zero. For a next-order approximation, Crank-Nicolson is a reasonable candidate. The code below contains the more general theta-method so that you can tune the implicit-ness of the solution. theta=0.5 gives CN, theta=1 gives BE, and theta=0 gives FE.
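The dissipation can be seen on the scalar test equation y' = lam*y with purely imaginary lam, where the exact solution keeps |y| constant. The sketch below is illustrative only (`theta_amplification` is a hypothetical name); it evaluates the one-step growth factor of the theta-method, (1 + (1 - theta)*z) / (1 - theta*z) with z = lam*dt, for the three choices of theta.

```python
def theta_amplification(z, theta):
    """One-step growth factor of the theta-method for y' = lam*y, z = lam*dt."""
    return (1 + (1 - theta) * z) / (1 - theta * z)


z = 0.1j  # purely imaginary lam*dt, i.e. an undamped oscillation
factors = {}
for name, theta in [("FE", 0.0), ("CN", 0.5), ("BE", 1.0)]:
    factors[name] = abs(theta_amplification(z, theta))
    print(name, factors[name])
```

A magnitude above 1 means the amplitude grows every step (FE), below 1 means it decays (BE), and a magnitude of 1 preserves it (CN), which matches the behaviour described above.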
A couple other things that I tweaked:
I selected a more appropriate time step of dt = (dx**2)/2 instead of dt = dx. The latter doesn't converge to the right solution using CN.
It's a minor note, but since t_final isn't guaranteed to be a multiple of dt, you weren't comparing solutions at the same time step.
With regard to your comment about it being slow: as you increase the spatial resolution, your time resolution needs to increase too. Even in your case with dt=dx, you have to perform a (1024 x 1024)*1024 matrix multiplication 1024 times. I didn't find this to take particularly long on my machine. I removed some unneeded concatenation to speed it up a bit, but changing the time step to dt = (dx**2)/2 will really bog things down, unfortunately. You could try compiling with Numba if you are concerned with speed.
All that said, I didn't find tremendous success with the consistency of CN. I had to set N=2^6 to get anything at t_final=1. Increasing t_final makes this worse, decreasing t_final makes it better. Depending on your needs, you could look into implementing TR-BDF2 or other linear multistep methods to improve this.
The code with a plot is below:
import numpy as np
import matplotlib.pyplot as plt
tau = 2 * np.pi
tau2 = tau * tau
i = complex(0,1)
def solution_f(t, x):
    return 0.5 * (np.exp(-tau * i * x) * np.exp((2 - tau2) * i * t) + np.exp(tau * i * x) * np.exp((tau2 + 4) * i * t))
def solution_g(t, x):
    return 0.5 * (np.exp(-tau * i * x) * np.exp((2 - tau2) * i * t) - np.exp(tau * i * x) *
                  np.exp((tau2 + 4) * i * t))
l=6
N = 2 ** l
dx = 1.0 / N
dx2 = dx * dx
dt = dx2/2
t_final = 1.
x_arr = np.arange(0,1,dx)
approximate_f = np.cos(tau*x_arr)
approximate_g = -i*np.sin(tau*x_arr)
H = np.zeros([2*N,2*N], dtype=complex)
for k in range(N):
    H[k,k] = -3*i*dt
    H[k,k+N] = (2/dx2+1)*i*dt
    if k==0:
        H[k,N+1] = -i/dx2*dt
        H[k,-1] = -i/dx2*dt
    elif k==N-1:
        H[N-1,N] = -i/dx2*dt
        H[N-1,-2] = -i/dx2*dt
    else:
        H[k,k+N-1] = -i/dx2*dt
        H[k,k+N+1] = -i/dx2*dt
### Bottom half
H[N :, :N] = H[:N, N:]
H[N:, N:] = H[:N, :N]
### Theta method. 0.5 -> Crank Nicolson
theta=0.5
A = np.eye(2*N)+H*theta
B = np.eye(2*N)-H*(1-theta)
### Precompute for faster computations
mat = np.linalg.inv(A)@B
t = 0
b = np.concatenate((approximate_f, approximate_g))
while t < t_final:
    t += dt
    b = mat@b
approximate_f = b[:N]
approximate_g = b[N:]
approximate_solution = np.concatenate((approximate_f, approximate_g))
#Calculate the actual solution
actual_f = solution_f(t,np.arange(0,1,dx))
actual_g = solution_g(t,np.arange(0,1,dx))
actual_solution = np.concatenate((actual_f, actual_g))
plt.figure(figsize=(7,5))
plt.plot(x_arr,actual_f.real,c="C0",label=r"$Re(f_\mathrm{true})$")
plt.plot(x_arr,actual_f.imag,c="C1",label=r"$Im(f_\mathrm{true})$")
plt.plot(x_arr,approximate_f.real,c="C0",ls="--",label=r"$Re(f_\mathrm{num})$")
plt.plot(x_arr,approximate_f.imag,c="C1",ls="--",label=r"$Im(f_\mathrm{num})$")
plt.legend(loc=3,fontsize=12)
plt.xlabel("x")
plt.savefig("num_approx.png",dpi=150)
I am not going to go through all of your math, but I'm going to offer a suggestion.
The use of a direct calculation for fxx and gxx seems like a good candidate for being numerically unstable. Intuitively a first order method should be expected to make second order mistakes in the terms. Second order mistakes in the individual terms, after passing through that formula, wind up as constant order mistakes in the second derivative. Plus when your step size gets small, you are going to find that a quadratic formula makes even small roundoff mistakes turn into surprisingly large errors.
Instead I would suggest that you start by turning this into a first-order system of 4 functions, f, fx, g, and gx, and then proceed with backward Euler on that system. Intuitively, with this approach, a first order method creates second order mistakes, which pass through a formula that creates first order mistakes of them. And now you are converging as you should from the start, and are also not as sensitive to propagation of roundoff errors.
I've written some Python code to do some image processing work, but it takes a huge amount of time to run. I've spent the last few hours trying to optimize it, but I think I've reached the end of my abilities.
Looking at the outputs from the profiler, the function below is taking a large proportion of the overall time of my code. Is there any way that it can be speeded up?
def make_ellipse(x, x0, y, y0, theta, a, b):
    c = np.cos(theta)
    s = np.sin(theta)
    a2 = a**2
    b2 = b**2
    xnew = x - x0
    ynew = y - y0
    ellipse = (xnew * c + ynew * s)**2/a2 + (xnew * s - ynew * c)**2/b2 <= 1
    return ellipse
To give the context, it is called with x and y as the output from np.meshgrid with a fairly large grid size, and all of the other parameters as simple integer values.
Although that function seems to be taking a lot of the time, there are probably ways that the rest of the code can be speeded up too. I've put the rest of the code at this gist.
Any ideas would be gratefully received. I've tried using numba and autojiting the main functions, but that doesn't help much.
Let's try to optimize make_ellipse in conjunction with its caller.
First, notice that a and b are the same over many calls. Since make_ellipse squares them each time, just have the caller do that instead.
Second, notice that np.cos(np.arctan(theta)) is 1 / np.sqrt(1 + theta**2) which seems slightly faster on my system. A similar trick can be used to compute the sine, either from theta or from cos(theta) (or vice versa).
Third, and less concretely, think about short-circuiting some of the final ellipse formula evaluations. For example, wherever (xnew * c + ynew * s)**2/a2 is greater than 1, the ellipse value must be False. If this happens often, you can "mask" out the second half of the (expensive) calculation of the ellipse at those locations. I haven't planned this thoroughly, but see numpy.ma for some possible leads.
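The identity behind the second point can be checked with a few lines. This is a scalar sketch using the standard library (`cos_sin_of_arctan` is a hypothetical helper name); for arrays, np.sqrt would take the place of math.sqrt.

```python
import math


def cos_sin_of_arctan(t):
    """Return (cos(arctan(t)), sin(arctan(t))) without calling arctan."""
    c = 1.0 / math.sqrt(1.0 + t * t)
    # sin(arctan(t)) = t * cos(arctan(t)), since tan = sin / cos
    return c, t * c


for t in (-3.0, -0.5, 0.0, 0.25, 7.0):
    c, s = cos_sin_of_arctan(t)
    print(t, c - math.cos(math.atan(t)), s - math.sin(math.atan(t)))
```

The printed differences should be at the level of floating-point rounding.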
It won't speed things up in every case, but if your ellipses don't take up the whole image, you should limit your search for points inside the ellipse to its bounding rectangle. I am lazy with the math, so I googled it and reused @JohnZwinck's neat cosine-of-an-arctangent trick to come up with this function:
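def ellipse_bounding_box(x0, y0, theta, a, b):
    # Parameter t at which x(t) is extremal: tan(t) = -b*tan(theta)/a
    x_tan_t = -b * np.tan(theta) / a
    if np.isinf(x_tan_t):
        x_cos_t = 0
        x_sin_t = np.sign(x_tan_t)
    else:
        x_cos_t = 1 / np.sqrt(1 + x_tan_t*x_tan_t)
        x_sin_t = x_tan_t * x_cos_t
    # Half-width of the box, measured from the centre
    dx = abs(a*x_cos_t*np.cos(theta) - b*x_sin_t*np.sin(theta))
    # Parameter t at which y(t) is extremal: tan(t) = b/(a*tan(theta))
    y_tan_t = b / np.tan(theta) / a
    if np.isinf(y_tan_t):
        y_cos_t = 0
        y_sin_t = np.sign(y_tan_t)
    else:
        y_cos_t = 1 / np.sqrt(1 + y_tan_t*y_tan_t)
        y_sin_t = y_tan_t * y_cos_t
    # Half-height of the box, measured from the centre
    dy = abs(b*y_sin_t*np.cos(theta) + a*y_cos_t*np.sin(theta))
    # The box is centred on (x0, y0), not on the origin
    return np.array([x0 - dx, x0 + dx]), np.array([y0 - dy, y0 + dy])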
You can now modify your original function to something like this:
def make_ellipse(x, x0, y, y0, theta, a, b):
    c = np.cos(theta)
    s = np.sin(theta)
    a2 = a**2
    b2 = b**2
    x_box, y_box = ellipse_bounding_box(x0, y0, theta, a, b)
    indices = ((x >= x_box[0]) & (x <= x_box[1]) &
               (y >= y_box[0]) & (y <= y_box[1]))
    xnew = x[indices] - x0
    ynew = y[indices] - y0
    ellipse = np.zeros_like(x, dtype=bool)
    ellipse[indices] = ((xnew * c + ynew * s)**2/a2 +
                        (xnew * s - ynew * c)**2/b2 <= 1)
    return ellipse
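As a cross-check on the bounding box, the half-extents of a rotated ellipse also have a short closed form: sqrt((a*cos(theta))**2 + (b*sin(theta))**2) horizontally and sqrt((a*sin(theta))**2 + (b*cos(theta))**2) vertically. The sketch below (with the hypothetical name `ellipse_half_extents`) compares that formula with a brute-force scan of boundary points of the ellipse tested in make_ellipse; the sample values of theta, a and b are arbitrary.

```python
import math


def ellipse_half_extents(theta, a, b):
    """Half-width and half-height of the axis-aligned box around the ellipse."""
    c, s = math.cos(theta), math.sin(theta)
    half_w = math.sqrt((a * c) ** 2 + (b * s) ** 2)
    half_h = math.sqrt((a * s) ** 2 + (b * c) ** 2)
    return half_w, half_h


# Brute-force check: walk around the boundary and record the largest offsets.
theta, a, b = 0.7, 5.0, 2.0
c, s = math.cos(theta), math.sin(theta)
xs, ys = [], []
for n in range(3600):
    t = 2 * math.pi * n / 3600
    u, v = a * math.cos(t), b * math.sin(t)  # point in the ellipse's own axes
    xs.append(u * c + v * s)                 # offset from x0
    ys.append(u * s - v * c)                 # offset from y0
half_w, half_h = ellipse_half_extents(theta, a, b)
print(half_w, max(xs), half_h, max(ys))
```

The box is then x0 ± half_w by y0 ± half_h, with no special case needed for theta = 0 or theta = pi/2.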
Since everything but x and y are integers, you can try to minimize the number of array computations. I imagine most of the time is spent in this statement:
ellipse = (xnew * c + ynew * s)**2/a2 + (xnew * s - ynew * c)**2/b2 <= 1
A simple rewriting like so should reduce the number of array operations:
a = float(a)
b = float(b)
ellipse = (xnew * (c/a) + ynew * (s/a))**2 + (xnew * (s/b) - ynew * (c/b))**2 <= 1
What was 12 array operations is now 10 (plus 4 scalar ops). I'm not sure if numba's jit would have tried this. It might just do all the broadcasting first, then jit the resulting operations. In this case, reordering so common operations are done at once should help.
Furthering along, you can rewrite this again as
ellipse = ((xnew + ynew * (s/c)) * (c/a))**2 + ((xnew * (s/c) - ynew) * (c/b))**2 <= 1
Or
t = np.tan(theta)
ellipse = ((xnew + ynew * t) * (b/a))**2 + (xnew * t - ynew)**2 <= (b/c)**2
Replacing one more array operation with a scalar, and eliminating other scalar ops to get 9 array operations and 2 scalar ops.
As always, be aware of what the range of inputs are to avoid rounding errors.
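A quick way to gain confidence in a rewrite like this is to compare it with the original inequality on a few points that are clearly inside or outside the ellipse. The sketch below does that for scalars (`inside_original` and `inside_tan_form` are hypothetical names, and the parameter values are arbitrary). Note that the tangent form divides by cos(theta), so it cannot be used when theta is an odd multiple of pi/2.

```python
import math


def inside_original(x, y, theta, a, b):
    c, s = math.cos(theta), math.sin(theta)
    return (x * c + y * s) ** 2 / a ** 2 + (x * s - y * c) ** 2 / b ** 2 <= 1


def inside_tan_form(x, y, theta, a, b):
    # Original inequality multiplied through by (b/c)**2
    c, t = math.cos(theta), math.tan(theta)
    return ((x + y * t) * (b / a)) ** 2 + (x * t - y) ** 2 <= (b / c) ** 2


theta, a, b = 0.7, 5.0, 2.0
points = [(0.0, 0.0), (3.0, 1.0), (6.0, 0.0), (-2.0, -4.0), (1.0, 2.5)]
results = [(inside_original(x, y, theta, a, b),
            inside_tan_form(x, y, theta, a, b)) for x, y in points]
print(results)
```

Points very close to the boundary may still disagree between the two forms because of rounding, which is the caveat raised above.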
Unfortunately there's no good way to do a running sum and bail early if either of the two addends is greater than the right hand side of the comparison. That would be an obvious speed-up, but one you'd need cython (or c/c++) to code.
You can speed it up considerably by using Cython. There is very good documentation on how to do this.
I have the following set of equations, and I want to solve them simultaneously for X and Y. I've been advised that I could use numpy to solve these as a system of linear equations. Is that the best option, or is there a better way?
a = (((f * X) + (f2 * X3 )) / (1 + (f * X) + (f2 * X3 ))) * i
b = ((f2 * X3 ) / (1 + (f * X) + (f2 * X3))) * i
c = ((f * X) / (1 + (j * X) + (k * Y))) * i
d = ((k * Y) / (1 + (j * X) + (k * Y))) * i
f = 0.0001
i = 0.001
j = 0.0001
k = 0.001
e = 0 = X + a + b + c
g = 0.0001 = Y + d
h = i - a
As noted by Joe, this is actually a system of nonlinear equations. You are going to need more firepower than numpy alone provides.
Solution of nonlinear equations is tricky, and the typical approach is to define an objective function
F(z) = sum( e[n]^2, n=1...13 )
where z is a vector containing a value for each of your 13 variables a,b,c,d,e,f,g,h,i,j,k,X,Y and e[n] is the amount by which each of your 13 equations is violated. For example
e[3] = (d - ((k * Y) / (1 + (j * X) + (k * Y))) * i )
Once you have that objective function, then you can apply a nonlinear solver to try to find a z for which F(z)=0. That of course corresponds to a solution to your equations.
Commonly used solvers include:
The Solver in Microsoft Excel
The python library scipy.optimize
Fitting routines in the Gnu Scientific Library
Matlab's optimization toolbox
Note that all of them will work far better if you first alter your set of equations to eliminate as many variables as practical before trying to run the solver (e.g. by substituting for k wherever it is found). The reduced dimensionality makes a big difference.
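To make the objective-function idea and the elimination advice concrete, here is a deliberately small sketch that uses only the equations for d and g from the question. It assumes X is held at a fixed, hypothetical value (X = 0.0) purely for illustration, so it is not a solution of the full system. Substituting d into g = Y + d leaves a single equation in Y, which plain bisection can solve; the objective F is then evaluated at the result.

```python
# Constants taken from the question
i = 0.001
j = 0.0001
k = 0.001
g = 0.0001


def residuals(z, X):
    """Amount by which each of the two equations is violated, z = (Y, d)."""
    Y, d = z
    r_d = d - (k * Y) / (1 + j * X + k * Y) * i  # equation for d
    r_g = g - (Y + d)                            # equation g = Y + d
    return [r_d, r_g]


def objective(z, X):
    """F(z): sum of squared residuals, zero exactly at a solution."""
    return sum(r * r for r in residuals(z, X))


def solve_Y(X, lo=0.0, hi=g, iters=200):
    """Bisection on the one equation left after substituting d into g."""
    def h(Y):
        return Y + (k * Y) / (1 + j * X + k * Y) * i - g
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


X = 0.0  # hypothetical fixed value, for illustration only
Y = solve_Y(X)
d = g - Y
print(Y, d, objective((Y, d), X))
```

With more unknowns the same objective would be handed to a general solver such as those listed above; the point here is only that fewer unknowns make the search much simpler.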