More efficient weighted Gini coefficient in Python

Per https://stackoverflow.com/a/48981834/1840471, this is an implementation of the weighted Gini coefficient in Python:
import numpy as np

def gini(x, weights=None):
    if weights is None:
        weights = np.ones_like(x)
    # Calculate mean absolute deviation in two steps, for weights.
    count = np.multiply.outer(weights, weights)
    mad = np.abs(np.subtract.outer(x, x) * count).sum() / count.sum()
    rmad = mad / np.average(x, weights=weights)
    # Gini equals half the relative mean absolute deviation.
    return 0.5 * rmad
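As a quick sanity check: equal values give 0, and concentrating everything in one of n observations gives (n - 1)/n:
>>> gini(np.ones(100))
0.0
>>> gini(np.array([0., 0., 0., 1.]))
0.75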
This is clean and works well for medium-sized arrays, but, as warned in the answer it was adapted from (https://stackoverflow.com/a/39513799/1840471), it's O(n²) in both time and memory. On my computer that means it breaks somewhere above ~20k rows:
n = 20000  # Works; 30000 fails.
gini(np.random.rand(n), np.random.rand(n))
Can this be adjusted to work for larger datasets? Mine is ~150k rows.

Here is a version that is much faster than the one you provided above, and that also uses a simplified formula for the unweighted case to get even faster results there.
def gini(x, w=None):
    # The rest of the code requires numpy arrays.
    x = np.asarray(x)
    if w is not None:
        w = np.asarray(w)
        sorted_indices = np.argsort(x)
        sorted_x = x[sorted_indices]
        sorted_w = w[sorted_indices]
        # Force float dtype to avoid overflows
        cumw = np.cumsum(sorted_w, dtype=float)
        cumxw = np.cumsum(sorted_x * sorted_w, dtype=float)
        return (np.sum(cumxw[1:] * cumw[:-1] - cumxw[:-1] * cumw[1:]) /
                (cumxw[-1] * cumw[-1]))
    else:
        sorted_x = np.sort(x)
        n = len(x)
        cumx = np.cumsum(sorted_x, dtype=float)
        # The above formula, with all weights equal to 1, simplifies to:
        return (n + 1 - 2 * np.sum(cumx) / cumx[-1]) / n
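For intuition (my own reading of the formula, not spelled out in the original): after sorting, let

$$p_i = \frac{\mathrm{cumw}_i}{\mathrm{cumw}_n}, \qquad q_i = \frac{\mathrm{cumxw}_i}{\mathrm{cumxw}_n}, \qquad p_0 = q_0 = 0,$$

so that $(p_i, q_i)$ are points on the Lorenz curve. Approximating the area under the curve by trapezoids gives

$$G = 1 - \sum_{i=1}^{n}(p_i - p_{i-1})(q_i + q_{i-1}) = \sum_{i=1}^{n}\big(q_i\,p_{i-1} - q_{i-1}\,p_i\big),$$

which is exactly the expression returned above: the $i = 1$ term vanishes because $p_0 = q_0 = 0$, and the normalization is the division by $\mathrm{cumxw}_n \cdot \mathrm{cumw}_n$.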
Here is some test code to check that we get (mostly) the same results:
>>> x = np.random.rand(1000000)
>>> w = np.random.rand(1000000)
>>> gini_max_ghenis(x, w)
0.33376310938610521
>>> gini(x, w)
0.33376310938610382
But the speed is very different:
%timeit gini(x, w)
203 ms ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit gini_max_ghenis(x, w)
55.6 s ± 3.35 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
If you remove the pandas ops from the function, it is already much faster:
%timeit gini_max_ghenis_no_pandas_ops(x, w)
1.62 s ± 75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If you want to squeeze out the last drop of performance you could use numba or cython, but that would only gain a few percent, because most of the time is spent in sorting:
%timeit ind = np.argsort(x); sx = x[ind]; sw = w[ind]
180 ms ± 4.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Edit: gini_max_ghenis is the code used in Max Ghenis' answer below.

Adapting the StatsGini R function from here:
import numpy as np
import pandas as pd
def gini(x, w=None):
    # Array indexing requires reset indexes.
    x = pd.Series(x).reset_index(drop=True)
    if w is None:
        w = np.ones_like(x)
    w = pd.Series(w).reset_index(drop=True)
    n = x.size
    wxsum = sum(w * x)
    wsum = sum(w)
    sxw = np.argsort(x)
    sx = x[sxw] * w[sxw]
    sw = w[sxw]
    pxi = np.cumsum(sx) / wxsum
    pci = np.cumsum(sw) / wsum
    g = 0.0
    for i in np.arange(1, n):
        g = g + pxi.iloc[i] * pci.iloc[i - 1] - pci.iloc[i] * pxi.iloc[i - 1]
    return g
This works for large vectors, at least up to 10M rows:
n = int(1e7)  # np.random.rand needs an integer size.
gini(np.random.rand(n), np.random.rand(n))  # Takes ~15s.
It also produces the same result as the function provided in the question; for example, it gives 0.2553 for this input:
gini(np.array([3, 1, 6, 2, 1]), np.array([4, 2, 2, 10, 1]))

Related

How to speed up the Ipopt solver?

I want to solve the following (relaxed, i.e. v(t) ∈ [0, 1]) optimal control problem with cyipopt:
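(The problem statement was an image in the original post; reconstructed from the code below, it is approximately:)

$$\min_{v}\ \int_0^{12} \big((x_1(t)-1)^2 + (x_2(t)-1)^2\big)\,dt$$

subject to

$$\dot{x}_1 = x_1 - x_1 x_2 - 0.4\,x_1 v, \qquad \dot{x}_2 = -x_2 + x_1 x_2 - 0.2\,x_2 v,$$

$$x_1(0) = 0.5, \qquad x_2(0) = 0.7, \qquad v(t) \in [0, 1].$$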
Here's what I have so far to solve the discretized problem:
import numpy as np
import matplotlib.pyplot as plt
from cyipopt import minimize_ipopt
from scipy.optimize._numdiff import approx_derivative
# z = (x1(t0) .... x1(tN) x2(t0) .... x2(tN) v(t0) .... v(tN))^T
def objective(z, time):
    x0, x1, v = np.split(z, 3)
    res = 0.0
    for i in range(time.size-1):
        h = time[i+1] - time[i]
        res += h*((x0[i]-1)**2 + (x1[i]-1)**2)
    return res

def ode_rhs(t, x, v):
    x0, x1 = x
    xdot1 = x0 - x0*x1 - 0.4*x0*v
    xdot2 = -x1 + x0*x1 - 0.2*x1*v
    return np.array([xdot1, xdot2])

def constraint(z, time):
    x0, x1, v = np.split(z, 3)
    x = np.array([x0, x1])
    res = np.zeros((2, x0.size))
    # initial values
    res[:, 0] = x[:, 0] - np.array([0.5, 0.7])
    # 'solve' the ode-system
    for j in range(time.size-1):
        h = time[j+1] - time[j]
        # implicit Euler scheme
        res[:, j+1] = x[:, j+1] - x[:, j] - h*ode_rhs(time[j+1], x[:, j+1], v[j])
    return res.flatten()
# time grid
tspan = [0, 12]
dt = 0.1
time = np.arange(tspan[0], tspan[1] + dt, dt)
# initial point
z0 = 0.1 + np.zeros(time.size*3)
# variable bounds
bnds = [(None, None) if i < 2*time.size else (0, 1) for i in range(z0.size)]
# constraints:
cons = [{
    'type': 'eq',
    'fun': lambda z: constraint(z, time),
    'jac': lambda z: approx_derivative(lambda zz: constraint(zz, time), z)
}]

# call the solver
res = minimize_ipopt(lambda z: objective(z, time), x0=z0, bounds=bnds,
                     constraints=cons, options={'disp': 5})
The code works as expected. However, it runs quite slow. Any ideas on how I can speed up the solver?
By analyzing Ipopt's output
Total CPU secs in IPOPT (w/o function evaluations) = 30.153
Total CPU secs in NLP function evaluations = 203.782
we can see that the evaluation of your functions is the bottleneck. So let's try to profile your code as Tom suggested in the comments:
In [2]: %timeit objective(z0, time)
307 µs ± 6.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [3]: %timeit constraint(z0, time)
1.38 ms ± 4.77 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Okay, not bad. But we can do better. As a rule of thumb, try to avoid loops in numerical Python code whenever possible. You can find some numpy best practices in, e.g., Jake VanderPlas' excellent talk at PyCon 2015. Your objective is equivalent to:
def objective(z, time):
    x0, x1, v = np.split(z, 3)
    h = time[1:] - time[:-1]
    return np.sum(h*((x0[:-1]-1)**2 + (x1[:-1]-1)**2))
Similarly, you can remove the loop inside your constraint function. Note that
# 'solve' the ode-system
for j in range(time.size-1):
    h = time[j+1] - time[j]
    # implicit Euler scheme
    res[:, j+1] = x[:, j+1] - x[:, j] - h*ode_rhs(time[j+1], x[:, j+1], v[j])
is the same as
h = time[1:] - time[:-1]
res[:, 1:] = x[:, 1:] - x[:, :-1] - h * ode_rhs(time[1:], x[:, 1:], v[:-1])
Timing the functions again, we get
In [4]: %timeit objective(z0, time)
31.8 µs ± 683 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [5]: %timeit constraint(z0, time)
54.1 µs ± 647 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
i.e. speedups with factors 10x and 25x! Consequently, we can significantly reduce the solver runtime:
Total CPU secs in IPOPT (w/o function evaluations) = 30.906
Total CPU secs in NLP function evaluations = 46.950
However, note that calculating the gradient and jacobian numerically by finite differences is still computationally expensive and prone to rounding errors:
In [6]: %timeit approx_derivative(lambda zz: objective(zz, time), z0)
232 ms ± 3.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit approx_derivative(lambda zz: constraint(zz, time), z0)
642 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Instead, we can go one step further and calculate both via algorithmic differentiation (AD) by means of the jax library:
from jax.config import config
# enable 64 bit floating point precision
config.update("jax_enable_x64", True)
import jax.numpy as np
from jax import grad, jacfwd, jit
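(If you are on a recent jax release where the jax.config import above no longer exists, the equivalent setup is simply:)

import jax
jax.config.update("jax_enable_x64", True)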
Then, we only need to change the constraint function as follows:
def constraint(z, time):
    x0, x1, v = np.split(z, 3)
    x = np.array([x0, x1])
    res = np.zeros((2, x0.size))
    # initial values
    res = res.at[:, 0].set(x[:, 0] - np.array([0.5, 0.7]))
    h = time[1:] - time[:-1]
    res = res.at[:, 1:].set(x[:, 1:] - x[:, :-1] - h*ode_jit(time[1:], x[:, 1:], v[:-1]))
    return res.flatten()
since jax arrays do not support in-place item assignment (see here). Next, we just-in-time (jit) compile the functions:
# jit the functions
ode_jit = jit(ode_rhs)
obj_jit = jit(lambda z: objective(z, time))
con_jit = jit(lambda z: constraint(z, time))
# Build and jit the derivatives
obj_grad = jit(grad(obj_jit)) # objective gradient
con_jac = jit(jacfwd(con_jit)) # constraint jacobian
# Dummy first call in order to compile the functions
print("Compiling the functions...")
_ = obj_jit(z0), con_jit(z0), obj_grad(z0), con_jac(z0)
print("Done.")
Timing again, we obtain
In [10]: %timeit obj_grad(z0)
62.1 µs ± 353 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [11]: %timeit con_jac(z0)
204 µs ± 1.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
i.e. speedups of roughly 3700x and 3100x. Finally, we can pass the exact gradient and Jacobians:
# constraints:
cons = [{'type': 'eq', 'fun': con_jit, 'jac': con_jac}]
# call the solver
res = minimize_ipopt(obj_jit, x0=z0, jac=obj_grad, bounds=bnds,
                     constraints=cons, options={'disp': 5})
and obtain
Total CPU secs in IPOPT (w/o function evaluations) = 35.348
Total CPU secs in NLP function evaluations = 1.691

Efficient way to do a large number of regressions using numpy?

I have a large collection of data sets (26,214,400 to be exact) that I want to perform linear regressions on, i.e. each of the 26,214,400 data sets consists of n x values and n y values, and I want to find y = m * x + b for each. For any single set of points I can use sklearn or numpy.linalg.lstsq, something like:
A = np.vstack([x, np.ones(len(x))]).T
m, b = np.linalg.lstsq(A, y, rcond=None)[0]
Is there a way to set up the matrices so that I can avoid a Python loop through 26,214,400 items? Or do I have to use a loop, and would I then be better served by something like Numba?
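One loop-free possibility is the closed-form least-squares solution for a single-variable regression, which broadcasts across all data sets at once; a minimal sketch (assuming XX and yy hold the x and y values with one data set per row):

import numpy as np

def fit_closed_form(XX, yy):
    """Per-row simple linear regression via the closed-form normal equations."""
    xm = XX.mean(axis=1, keepdims=True)   # per-row mean of x
    ym = yy.mean(axis=1, keepdims=True)   # per-row mean of y
    xd = XX - xm                          # centered x values
    m = (xd * (yy - ym)).sum(axis=1) / (xd**2).sum(axis=1)  # slopes
    b = (ym - m[:, None] * xm).ravel()    # intercepts: b = ybar - m*xbar
    return m, b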
I ended up going the numba route, which yielded a ~20x speedup on my laptop. It used all my cores, so I assume more CPUs would help further. The answer looked something like this:
import numpy as np
from numpy.linalg import lstsq
import numba

@numba.jit(nogil=True, parallel=True)
def fit(XX, yy):
    """Fit a large set of points to a regression"""
    assert XX.shape == yy.shape, "Inputs mismatched"
    n_pnts, n_samples = XX.shape
    scale = np.empty(n_pnts)
    offset = np.empty(n_pnts)
    for i in numba.prange(n_pnts):
        X, y = XX[i], yy[i]
        A = np.vstack((np.ones_like(X), X)).T
        offset[i], scale[i] = lstsq(A, y)[0]
    return offset, scale
Running it:
XX, yy = np.random.randn(2, 1000, 10)
offset, scale = fit(XX, yy)
%timeit offset, scale = fit(XX, yy)
1.87 ms ± 37.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
The non-jitted version has this timing:
41.7 ms ± 620 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

How to write a case-when-like statement on numpy arrays

def custom_asymmetric_train(y_true, y_pred):
    residual = (y_true - y_pred).astype("float")
    grad = np.where(residual > 0, -2*10.0*residual, -2*residual)
    hess = np.where(residual > 0, 2*10.0, 2.0)
    return grad, hess
I want to write this statement:
case when residual >= 0 and residual <= 0.5 then -2*1.2*residual
     when residual >= 0.5 and residual <= 0.7 then -2*1.0*residual
     when residual > 0.7 then -2*2*residual
end
However, np.where doesn't accept Python's and to combine conditions. How do I write this case-when logic with np.where in Python?
Thanks
This statement can be written using np.select as:
import numpy as np

residual = np.random.rand(10) - 0.3  # -0.3 to get some negative values
condlist = [(residual >= 0.0) & (residual <= 0.5),
            (residual >= 0.5) & (residual <= 0.7),
            residual > 0.7]
choicelist = [-2*1.2*residual, -2*1.0*residual, -2*2.0*residual]
residual = np.select(condlist, choicelist, default=residual)
Note that when multiple conditions in condlist are satisfied, the first one encountered is used; when all conditions evaluate to False, the default value is used. Also, for your information, you need the bitwise operator & on boolean numpy arrays, as the Python keyword and won't work on them.
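For example, the first-match rule in action:
>>> x = np.array([0.2, 0.6, 1.5])
>>> np.select([x > 1.0, x > 0.5], [2*x, -x], default=0.0)
array([ 0. , -0.6,  3. ])
Both conditions hold for 1.5, but x > 1.0 comes first, so 2*x is chosen.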
Let's benchmark these answers:
residual = np.random.rand(10000) - 0.3

def charl_3where(residual):
    residual = np.where((residual >= 0.0) & (residual <= 0.5), -2*1.2*residual, residual)
    residual = np.where((residual >= 0.5) & (residual <= 0.7), -2*1.0*residual, residual)
    residual = np.where(residual > 0.7, -2*2.0*residual, residual)
    return residual

def yaco_select(residual):
    condlist = [(residual >= 0.0) & (residual <= 0.5),
                (residual >= 0.5) & (residual <= 0.7),
                residual > 0.7]
    choicelist = [-2*1.2*residual, -2*1.0*residual, -2*2.0*residual]
    residual = np.select(condlist, choicelist, default=residual)
    return residual
%timeit charl_3where(residual)
>>> 112 µs ± 1.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit yaco_select(residual)
>>> 141 µs ± 2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Let's try to optimize these with numba:
from numba import jit

@jit(nopython=True)
def yaco_numba(residual):
    out = np.empty_like(residual)
    for i in range(residual.shape[0]):
        if residual[i] < 0.0:
            out[i] = residual[i]
        elif residual[i] <= 0.5:
            out[i] = -2*1.2*residual[i]
        elif residual[i] <= 0.7:
            out[i] = -2*1.0*residual[i]
        else:  # residual > 0.7
            out[i] = -2*2.0*residual[i]
    return out
%timeit yaco_numba(residual)
>>> 6.65 µs ± 123 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Final check
res1 = charl_3where(residual)
res2 = yaco_select(residual)
res3 = yaco_numba(residual)
np.allclose(res1,res3)
>>> True
np.allclose(res2,res3)
>>> True
This one is about 17x faster than the previous best (6.65 µs vs 112 µs). Hope this helps.
You can use the syntax (condition1) & (condition2) in np.where() calls, so you would modify your function's np.where() calls like so:
def custom_asymmetric_train(y_true, y_pred):
    residual = (y_true - y_pred).astype("float")
    residual = np.where((residual >= 0.0) & (residual <= 0.5), -2*1.2*residual, residual)
    residual = np.where((residual >= 0.5) & (residual <= 0.7), -2*1.0*residual, residual)
    residual = np.where(residual > 0.7, -2*2.0*residual, residual)
    ...
The first argument is the condition to test, the second is the value used where the condition holds, and the third is the value used where it doesn't. Note that each np.where call here overwrites residual, so later conditions are tested against already-transformed values; that happens to be harmless in this case (every transformed value is negative and cannot re-trigger a later condition), but in general it is safer to build all the masks from the original array first, as in the mask-based version below.
You can also use vectorization, since the conditions are mutually exclusive (apart from the boundary point 0.5, which the first two masks share):
residual = (y_true - y_pred).astype(float)
m1 = (residual >= 0.0) & (residual <= 0.5)
m2 = (residual >= 0.5) & (residual <= 0.7)
m3 = residual > 0.7
new_residual = -2*(m1*1.2*residual + m2*residual + m3*2.0*residual)
return new_residual
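This works because boolean masks act as 0 and 1 in arithmetic, so with disjoint masks each element picks up exactly one branch. For instance:
>>> np.array([True, False, True]) * 2.0
array([2., 0., 2.])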
This has the following performance:
residual = np.random.rand(10000) - 0.3

def speed_test(residual):
    m1 = (residual >= 0.0) & (residual <= 0.5)
    m2 = (residual >= 0.5) & (residual <= 0.7)
    m3 = residual > 0.7
    return -2*(m1*1.2*residual + m2*residual + m3*2.0*residual)
%timeit speed_test(residual)
123 µs ± 35.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

How to vectorize calculation of barycentric coordinates in python

In scipy.spatial there is the Delaunay class. Its documentation includes an example of how to calculate barycentric coordinates.
Following that example, the code below calculates barycentric coordinates using a loop.
import numpy as np
from scipy.spatial import Delaunay

points = np.array([(0,0), (0,1), (1,0), (1,1)])
samples = np.array([(0.5,0.5), (0,0), (0.1,0.1)])
dim = len(points[0])                # determine the dimension of the samples
simp = Delaunay(points)             # create simplexes for the defined points
s = simp.find_simplex(samples)      # find the corresponding simplex for each sample
b0 = np.zeros((len(samples), dim))  # reserve space for each barycentric coordinate
for ii in range(len(samples)):
    b0[ii, :] = simp.transform[s[ii], :dim].dot(
        (samples[ii] - simp.transform[s[ii], dim]).transpose())
coord = np.c_[b0, 1 - b0.sum(axis=1)]
This is OK for a short list of samples, but for very large lists the performance is poor. How can this be modified to take advantage of vectorized math in numpy/scipy to improve performance?
Consider the following modification (for-loop replaced with numpy methods):
import itertools
import numpy as np
import scipy.spatial as ssp

def f_1(points, samples):
    """ original """
    dim = len(points[0])
    simp = ssp.Delaunay(points)
    s = simp.find_simplex(samples)
    b0 = np.zeros((len(samples), dim))
    for ii in range(len(samples)):
        b0[ii, :] = simp.transform[s[ii], :dim].dot(
            (samples[ii] - simp.transform[s[ii], dim]).transpose())
    coord = np.c_[b0, 1 - b0.sum(axis=1)]
    return coord

def f_2(points, samples):
    """ modified """
    simp = ssp.Delaunay(points)
    s = simp.find_simplex(samples)
    b0 = (simp.transform[s, :points.shape[1]].transpose([1, 0, 2]) *
          (samples - simp.transform[s, points.shape[1]])).sum(axis=2).T
    coord = np.c_[b0, 1 - b0.sum(axis=1)]
    return coord
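The batched matrix-vector product in f_2 can equivalently be written with np.einsum, which some may find more readable (an untimed variant sketch, same assumptions as above):

def f_3(points, samples):
    """ same as f_2, with the batched dot spelled as an einsum """
    dim = points.shape[1]
    simp = ssp.Delaunay(points)
    s = simp.find_simplex(samples)
    # one (dim x dim) transform block per sample, applied to its offset vector
    b0 = np.einsum('ijk,ik->ij',
                   simp.transform[s, :dim],
                   samples - simp.transform[s, dim])
    return np.c_[b0, 1 - b0.sum(axis=1)]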
Test case:
N = 100
points = np.array(list(itertools.product(range(N), repeat=2)))
samples = np.random.rand(100_000, 2) * N
Result:
%timeit f_1(points, samples)
712 ms ± 2.76 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit f_2(points, samples)
422 ms ± 809 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
With the modified version, the line simp.find_simplex(samples) accounts for about 95% of the running time, so I guess there is nothing more to gain from vectorization alone. To improve performance further you would need another implementation of the find_simplex method, or another approach to the problem.

Improving matrix multiplication in Numpy

Some friends and I are doing a small language competition computing some neural networks: some are doing it in C, others in Fortran, and me: Python.
The code is simple: it's just a bunch of vector dot operations followed by a summation, after which a signal function is applied that returns -1 or 1 (activated or not).
With that we are sending a bunch of random numbers and checking (right now only single-process) which language does it faster.
My code is as simple as this:
def sgn(h):
    """Signal function"""
    return -1 if h < 0 else 1

def lincomb(A, B):
    """Linear combinator between two matrices"""
    return np.einsum('ji,ij->', A, B)

def lincombrav(A, B):
    return A.ravel().dot(B.ravel('F'))

def functional_test():
    w1 = np.random.random(50**2).reshape(50,50)
    w2 = np.random.random(50**2).reshape(50,50)
    return sgn(lincombrav(w1, w2))
Here A and B are matrices that represent each layer in the neural network. We dot the i-th column of the first matrix with the i-th row of the second matrix, sum all the results, and send them to the signal function. Something like:
w1 = 2*np.random.random(100**2).reshape(100,100)-1
w2 = 2*np.random.random(100**2).reshape(100,100)-1
Then we time it with:
%timeit sgn(lincomb(w1, w2))
Python is losing to Fortran by 38x :-(
Is there any way to improve that Python "code"?
EDIT: Added timeit results:
Python version (already with the ravel mode)
In [10]: %timeit functional_test()
8.72 µs ± 406 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Python version (with einsum)
In [16]: %timeit functional_test()
10.27 µs ± 490 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Fortran version
In [13]: %timeit fort.test()
235 ns ± 12.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
The Fortran version was created using the "f2py" program, which generates a Python-loadable module from Fortran code.
The test functions do the following (in each language):
Create the matrix A
Create the matrix B
call sgn(lincomb(A,B)) # from each respective language implementation
I also moved the matrix creation outside, to time only the mathematical operation rather than the memory handling as well. Still, Python is behind by the same order of magnitude.
EDIT2: Good news for Python: it has won in all but the small-matrix tests. The whole code follows:
Python functions (bla.py)
import numpy as np
from numba import jit
import timeit
import matplotlib.pyplot as plt

def sgn(h):
    """Signal function"""
    return -1 if h < 0 else 1

def lincomb(A, B):
    """Linear combinator between two matrices"""
    return np.einsum('ji,ij->', A, B)

def lincombrav(A, B):
    return A.ravel().dot(B.ravel('F'))

def functional_test_ravel(n):
    """Functional tests (Victor experiment)"""
    w = 2*np.random.random(n**2).reshape(n,n)-1
    x = 2*np.random.random(n**2).reshape(n,n)-1
    return sgn(lincombrav(w, x))

def functional_test_einsum(n):
    """Functional tests (Victor experiment)"""
    w = 2*np.random.random(n**2).reshape(n,n)-1
    x = 2*np.random.random(n**2).reshape(n,n)-1
    return sgn(lincomb(w, x))

@jit()
def functional_test_numbaein(n):
    """Functional tests (Victor experiment)"""
    w = 2*np.random.random(n**2).reshape(n,n)-1
    x = 2*np.random.random(n**2).reshape(n,n)-1
    return sgn(lincomb(w, x))

@jit()
def functional_test_numbarav(n):
    """Functional tests (Victor experiment)"""
    w = 2*np.random.random(n**2).reshape(n,n)-1
    x = 2*np.random.random(n**2).reshape(n,n)-1
    return sgn(lincombrav(w, x))
Fortran functions (fbla.f95)
module fbla
  implicit none
  integer, parameter :: dp = selected_real_kind(12,100)
  public
contains

  real(kind=dp) function sgn(x)
    integer, parameter :: dp = selected_real_kind(12,100)
    real(kind=dp), intent(in) :: x
    if (x >= 0.0) then
      sgn = +1.0
    else if (x < 0.0) then
      sgn = -1.0
    end if
  end function sgn

  real(kind=dp) function lincomb(A, B, n)
    integer, parameter :: sp = selected_int_kind(r=8)
    integer, parameter :: dp = selected_real_kind(12,100)
    integer(kind=sp) :: i
    integer(kind=sp), intent(in) :: n
    real(kind=dp), intent(in) :: A(n,n)
    real(kind=dp), intent(in) :: B(n,n)
    lincomb = 0
    do i = 1, n
      lincomb = lincomb + dot_product(A(:,i), B(i,:))
    end do
  end function lincomb

  real(kind=dp) function functional_test(n)
    integer, parameter :: dp = selected_real_kind(12,100)
    integer, parameter :: sp = selected_int_kind(r=8)
    integer(kind=sp), intent(in) :: n
    integer(kind=sp) :: i, j
    real(kind=dp), allocatable, dimension(:,:) :: x, w, wt
    allocate(wt(n,n), w(n,n), x(n,n))
    do i = 1, n
      do j = 1, n
        w(i,j) = 2*rand(0)-1
        x(i,j) = 2*rand(0)-1
      end do
    end do
    wt = transpose(w)
    functional_test = sgn(lincomb(wt, x, n))
  end function functional_test

end module fbla
Test execution functions (tests.py)
import numpy as np
import timeit
import matplotlib.pyplot as plt
import bla
from fbla import fbla

def run_test(test_functions, N, runs=1000):
    results = []
    global rank
    for n in N:
        rank = n
        for t in test_functions:
            # print(f'Rank {globals()["rank"]}')
            print(f'Running {t} to matrix size {rank}', end='')
            r = min(timeit.Timer(t, globals=globals()).repeat(repeat=5, number=runs))
            print(f' total time {r} per run {r/runs}')
            results.append((t, n, r, r/runs))
    return results

def plotbars(results, test_functions, N):
    Nsz = len(N)
    M = len(test_functions)
    fig, ax = plt.subplots()
    ind = np.arange(int(Nsz))
    width = 1/(M+1)
    p = []
    for n in range(M):
        g = [w*1000 for (x, y, z, w) in results if x == test_functions[n]]
        p.append(ax.bar(ind+n*width, g, width, bottom=0))
    ax.legend([l[0] for l in p], test_functions)
    ax.set_xticks(ind-width/2+((M/2)*width))
    ax.set_xticklabels(np.array(N).astype(str))
    ax.set_xlabel('Rank of square random matrix')
    ax.set_ylabel('Average time(ms) per run')
    ax.set_yscale('log')
    return fig

N = (10, 50, 100, 1000)
test_functions = [
    'bla.functional_test_einsum(rank)',
    'fbla.functional_test(rank)'
]
results = run_test(test_functions, N)
plot = plotbars(results, test_functions, N)
plot.show()
The results are:
[('bla.functional_test_einsum(rank)', 10, 0.023221354000270367, 2.3221354000270368e-05),
('fbla.functional_test(rank)', 10, 0.005375514010665938, 5.375514010665938e-06),
('bla.functional_test_einsum(rank)', 50, 0.07035048000398092, 7.035048000398091e-05),
('fbla.functional_test(rank)', 50, 0.1242617039824836, 0.0001242617039824836),
('bla.functional_test_einsum(rank)', 100, 0.22694124400732107, 0.00022694124400732108),
('fbla.functional_test(rank)', 100, 0.5518505079962779, 0.0005518505079962779),
('bla.functional_test_einsum(rank)', 1000, 37.88827919398318, 0.03788827919398318),
('fbla.functional_test(rank)', 1000, 74.09929457501858, 0.07409929457501857)]
Some standard timeit output from an ipython3 session; fbla is the Fortran library, while bla is the standard Python one.
In : n=1000
In : w1 = 2*np.random.random(n**2).reshape(n,n)-1
In : w2 = 2*np.random.random(n**2).reshape(n,n)-1
In : bla.sgn(bla.lincomb(w1,w2))
Out: -1
In : fbla.sgn(fbla.lincomb(w1,w2))
Out: -1.0
In : %timeit fbla.sgn(fbla.lincomb(w1,w2))
11.3 ms ± 430 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In : %timeit bla.sgn(bla.lincomb(w1,w2))
3.81 ms ± 573 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
We can improve a bit with matrix multiplication:
sgn(w1.ravel().dot(w2.ravel('F')))
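For intuition: lincomb, lincombrav, and this one-liner all compute the same quantity, trace(A @ B) = sum over i, j of A[j,i]*B[i,j]; the einsum and ravel forms are faster than a literal trace because they never materialize the full n x n product. A quick check (illustrative snippet, not from the original answer):

A = np.random.rand(4, 4)
B = np.random.rand(4, 4)
t1 = np.trace(A @ B)              # forms the full product first: O(n^3)
t2 = np.einsum('ji,ij->', A, B)   # sums only the needed terms: O(n^2)
t3 = A.ravel().dot(B.ravel('F'))  # the same sum as one flat dot: O(n^2)
assert np.allclose([t1, t2], t3)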
If you want Numpy to be faster, get a faster Numpy. Try uninstalling Numpy and installing the Intel-optimized version, which includes a number of CPU-level optimizations that should significantly improve the performance of operations such as matrix multiplication on machines with an Intel CPU.
pip uninstall numpy
pip install intel-numpy
