how to efficiently vectorize a function and retrieve values from it

how to efficiently vectorize a function and retrieve values from it - python

Suppose I have a very computationally expensive function f(x). I want to compute some values of it, and then just access them instead of evaluating the function every time with new x values.
See the following simple example to illustrate what I mean:
import numpy as np
x = np.linspace(-3, 3, 6001)
fx = x**2
x = np.round(x, 3)
#I want to evaluate the function for the following w:
w = np.random.rand(10000)
#Rounding is necessary, so that the w match the x.
w = np.round(w, 3)
fx_w = []
for i in range(w.size):
fx_w.append(fx[x==w[i]])
fx_w = np.asarray(fx_w)
So, I'd like to have f(w) computed from the values already generated for x. Of course, a for loop is out of the question, so my question is: how can I implement this somewhat efficiently?

You can use searchsorted to find the corresponding indices of your prepared function array. This will be an approximation. Rounding is not necessary.
import numpy as np
np.random.seed(42)
x = np.linspace(-3, 3, 6001)
fx = x ** 2
w = np.random.rand(10000)
result = fx[np.searchsorted(x, w)]
print('aprox. F(x):', result)
print('real F(x):', w ** 2)
Output
aprox. F(x): [0.140625 0.904401 0.535824 ... 0.896809 0.158404 0.047524 ]
real F(x): [0.1402803 0.90385769 0.53581513 ... 0.89625588 0.1579967 0.04714996]
Your function has to be much more computationally intensive to justify this approach
%timeit fx[np.searchsorted(x, w)] #1000 loops, best of 5: 992 µs per loop
%timeit w ** 2 #100000 loops, best of 5: 3.81 µs per loop

Related

Cross product all points in 3D space

I have a vector, a, which I wish to cross with every point in a defined 3D space.
import numpy as np
# Grid
x = np.arange(-4,4,0.1)
y = np.arange(-4,4,0.1)
z = np.arange(-4,4,0.1)
a = [1,0,0]
result = [[] for i in range(3)]
for j in range(len(x)): # loop on x coords
for k in range(len(y)): # loop on y coords
for l in range(len(z)): # loop on z coords
r = [x[j] , y[k], z[l]]
result[0].append(np.cross(a, r)[0])
result[1].append(np.cross(a, r)[1])
result[2].append(np.cross(a, r)[2])
This produces an array which has taken the cross product of a with every point in space. However, the process takes far too long, due to the nested loops. Is there anyway to exploit vectors (meshgrid perhaps?) to make this process faster?

Here's one vectorized approach -
np.cross(a, np.array(np.meshgrid(x,y,z)).transpose(2,1,3,0)).reshape(-1,3).T
Sample run -
In [403]: x = np.random.rand(4)
...: y = np.random.rand(5)
...: z = np.random.rand(6)
...:
In [404]: result = original_app(x,y,z,a)
In [405]: out = np.cross(a, np.array(np.meshgrid(x,y,z)).\
transpose(2,1,3,0)).reshape(-1,3).T
In [406]: np.allclose(result[0], out[0])
Out[406]: True
In [407]: np.allclose(result[1], out[1])
Out[407]: True
In [408]: np.allclose(result[2], out[2])
Out[408]: True
Runtime test -
# Original setup used in the question
In [393]: # Grid
...: x = np.arange(-4,4,0.1)
...: y = np.arange(-4,4,0.1)
...: z = np.arange(-4,4,0.1)
...:
# Original approach
In [397]: %timeit original_app(x,y,z,a)
1 loops, best of 3: 21.5 s per loop
# #Denziloe's soln
In [395]: %timeit [np.cross(a, r) for r in product(x, y, z)]
1 loops, best of 3: 7.34 s per loop
# Proposed in this post
In [396]: %timeit np.cross(a, np.array(np.meshgrid(x,y,z)).\
transpose(2,1,3,0)).reshape(-1,3).T
100 loops, best of 3: 16 ms per loop
More than 1000x speedup over the original one and more than 450x over the loopy approach from other post.

This takes a couple of seconds to run on my machine:
from itertools import product
result = [np.cross(a, r) for r in product(x, y, z)]
I don't know if that's fast enough for you, but there are a lot of calculations involved. It's certainly cleaner, and there is at least some reduction of redundancy (e.g. calculating np.cross(a, r) three times). It also gives the result in a slightly different format, but this is the natural way to store the result and is hopefully fine for your purposes.

NumPy - vectorizing matrix-matrix column correlation coefficient [duplicate]

I have two arrays that have the shapes N X T and M X T. I'd like to compute the correlation coefficient across T between every possible pair of rows n and m (from N and M, respectively).
What's the fastest, most pythonic way to do this? (Looping over N and M would seem to me to be neither fast nor pythonic.) I'm expecting the answer to involve numpy and/or scipy. Right now my arrays are numpy arrays, but I'm open to converting them to a different type.
I'm expecting my output to be an array with the shape N X M.
N.B. When I say "correlation coefficient," I mean the Pearson product-moment correlation coefficient.
Here are some things to note:
The numpy function correlate requires input arrays to be one-dimensional.
The numpy function corrcoef accepts two-dimensional arrays, but they must have the same shape.
The scipy.stats function pearsonr requires input arrays to be one-dimensional.

Correlation (default 'valid' case) between two 2D arrays:
You can simply use matrix-multiplication np.dot like so -
out = np.dot(arr_one,arr_two.T)
Correlation with the default "valid" case between each pairwise row combinations (row1,row2) of the two input arrays would correspond to multiplication result at each (row1,row2) position.
Row-wise Correlation Coefficient calculation for two 2D arrays:
def corr2_coeff(A, B):
# Rowwise mean of input arrays & subtract from input arrays themeselves
A_mA = A - A.mean(1)[:, None]
B_mB = B - B.mean(1)[:, None]
# Sum of squares across rows
ssA = (A_mA**2).sum(1)
ssB = (B_mB**2).sum(1)
# Finally get corr coeff
return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None],ssB[None]))
This is based upon this solution to How to apply corr2 functions in Multidimentional arrays in MATLAB
Benchmarking
This section compares runtime performance with the proposed approach against generate_correlation_map & loopy pearsonr based approach listed in the other answer.(taken from the function test_generate_correlation_map() without the value correctness verification code at the end of it). Please note the timings for the proposed approach also include a check at the start to check for equal number of columns in the two input arrays, as also done in that other answer. The runtimes are listed next.
Case #1:
In [106]: A = np.random.rand(1000, 100)
In [107]: B = np.random.rand(1000, 100)
In [108]: %timeit corr2_coeff(A, B)
100 loops, best of 3: 15 ms per loop
In [109]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.6 ms per loop
Case #2:
In [110]: A = np.random.rand(5000, 100)
In [111]: B = np.random.rand(5000, 100)
In [112]: %timeit corr2_coeff(A, B)
1 loops, best of 3: 368 ms per loop
In [113]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 493 ms per loop
Case #3:
In [114]: A = np.random.rand(10000, 10)
In [115]: B = np.random.rand(10000, 10)
In [116]: %timeit corr2_coeff(A, B)
1 loops, best of 3: 1.29 s per loop
In [117]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 1.83 s per loop
The other loopy pearsonr based approach seemed too slow, but here are the runtimes for one small datasize -
In [118]: A = np.random.rand(1000, 100)
In [119]: B = np.random.rand(1000, 100)
In [120]: %timeit corr2_coeff(A, B)
100 loops, best of 3: 15.3 ms per loop
In [121]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.7 ms per loop
In [122]: %timeit pearsonr_based(A, B)
1 loops, best of 3: 33 s per loop

#Divakar provides a great option for computing the unscaled correlation, which is what I originally asked for.
In order to calculate the correlation coefficient, a bit more is required:
import numpy as np
def generate_correlation_map(x, y):
"""Correlate each n with each m.
Parameters
----------
x : np.array
Shape N X T.
y : np.array
Shape M X T.
Returns
-------
np.array
N X M array in which each element is a correlation coefficient.
"""
mu_x = x.mean(1)
mu_y = y.mean(1)
n = x.shape[1]
if n != y.shape[1]:
raise ValueError('x and y must ' +
'have the same number of timepoints.')
s_x = x.std(1, ddof=n - 1)
s_y = y.std(1, ddof=n - 1)
cov = np.dot(x,
y.T) - n * np.dot(mu_x[:, np.newaxis],
mu_y[np.newaxis, :])
return cov / np.dot(s_x[:, np.newaxis], s_y[np.newaxis, :])
Here's a test of this function, which passes:
from scipy.stats import pearsonr
def test_generate_correlation_map():
x = np.random.rand(10, 10)
y = np.random.rand(20, 10)
desired = np.empty((10, 20))
for n in range(x.shape[0]):
for m in range(y.shape[0]):
desired[n, m] = pearsonr(x[n, :], y[m, :])[0]
actual = generate_correlation_map(x, y)
np.testing.assert_array_almost_equal(actual, desired)

For those interested in computing the Pearson correlation coefficient between a 1D and 2D array, I wrote the following function, where x is a 1D array and y a 2D array.
def pearsonr_2D(x, y):
"""computes pearson correlation coefficient
where x is a 1D and y a 2D array"""
upper = np.sum((x - np.mean(x)) * (y - np.mean(y, axis=1)[:,None]), axis=1)
lower = np.sqrt(np.sum(np.power(x - np.mean(x), 2)) * np.sum(np.power(y - np.mean(y, axis=1)[:,None], 2), axis=1))
rho = upper / lower
return rho
Example run:
>>> x
Out[1]: array([1, 2, 3])
>>> y
Out[2]: array([[ 1, 2, 3],
[ 6, 7, 12],
[ 9, 3, 1]])
>>> pearsonr_2D(x, y)
Out[3]: array([ 1. , 0.93325653, -0.96076892])

Fully vectorise numpy polyfit

Overview
I am running into issues with performance using polyfit because it doesn't appear able to accept broadcast arrays. I am aware from this post that the dependant data y can be multidimensional if you use numpy.polynomial.polynomial.polyfit. However, the x dimension cannot be multidimensional. Is there anyway around this?
Motivation
I need to compute the rate of change of some data. To match with an experiment I want to use the following method: take data y and x, for short sections of data fit a polynomial, then use the fitted coefficient as an estimate of the rate of change.
Illustration
import numpy as np
import matplotlib.pyplot as plt
n = 100
x = np.linspace(0, 10, n)
y = np.sin(x)
window_length = 10
ydot = [np.polyfit(x[j:j+window_length], y[j:j+window_length], 1)[0]
for j in range(n - window_length)]
x_mids = [x[j+window_length/2] for j in range(n - window_length)]
plt.plot(x, y)
plt.plot(x_mids, ydot)
plt.show()
The blue line is the original data (a sine curve), while the green is the first differential (a cosine curve).
The problem
To vectorise this I did the following:
window_length = 10
vert_idx_list = np.arange(0, len(x) - window_length, 1)
hori_idx_list = np.arange(window_length)
A, B = np.meshgrid(hori_idx_list, vert_idx_list)
idx_array = A + B
x_array = x[idx_array]
y_array = y[idx_array]
This broadcasts the two 1D vectors to 2D vectors of shape (n-window_length, window_length). Now I was hoping that polyfit would have an axis argument so I could parallelise the calculation, but no such luck.
Does anyone have any suggestion for how to do this? I am open to

The way polyfit works is by solving a least-square problem of the form:
y = [X].a
where y are your dependent coordinates, [X] is the Vandermonde matrix of the corresponding independent coordinates, and a is the vector of fitted coefficients.
In your case you are always computing a 1st degree polynomial approximation, and are actually only interested in the coefficient of the 1st degree term. This has a well known closed form solution you can find in any statistics book, or produce your self by creating a 2x2 linear system of equation premultiplying both sides of the above equation by the transpose of [X]. This all adds up to the value you want to calculate being:
>>> n = 10
>>> x = np.random.random(n)
>>> y = np.random.random(n)
>>> np.polyfit(x, y, 1)[0]
-0.29207474654700277
>>> (n*(x*y).sum() - x.sum()*y.sum()) / (n*(x*x).sum() - x.sum()*x.sum())
-0.29207474654700216
On top of that you have a sliding window running over your data, so you can use something akin to a 1D summed area table as follows:
def sliding_fitted_slope(x, y, win):
x = np.concatenate(([0], x))
y = np.concatenate(([0], y))
Sx = np.cumsum(x)
Sy = np.cumsum(y)
Sx2 = np.cumsum(x*x)
Sxy = np.cumsum(x*y)
Sx = Sx[win:] - Sx[:-win]
Sy = Sy[win:] - Sy[:-win]
Sx2 = Sx2[win:] - Sx2[:-win]
Sxy = Sxy[win:] - Sxy[:-win]
return (win*Sxy - Sx*Sy) / (win*Sx2 - Sx*Sx)
With this code you can easily check that (notice I extended the range by 1):
>>> np.allclose(sliding_fitted_slope(x, y, window_length),
[np.polyfit(x[j:j+window_length], y[j:j+window_length], 1)[0]
for j in range(n - window_length + 1)])
True
And:
%timeit sliding_fitted_slope(x, y, window_length)
10000 loops, best of 3: 34.5 us per loop
%%timeit
[np.polyfit(x[j:j+window_length], y[j:j+window_length], 1)[0]
for j in range(n - window_length + 1)]
100 loops, best of 3: 10.1 ms per loop
So it is about 300x faster for your sample data.

Sorry for answering my own question, but 20 minutes more of trying to get to grips with it I have the following solution:
ydot = np.polynomial.polynomial.polyfit(x_array[0], y_array.T, 1)[-1]
One confusing part is that np.polyfit returns the coefficients with the highest power first. In np.polynomial.polynomial.polyfit the highest power is last (hence the -1 instead of 0 index).
Another confusion is that we use only the first slice of x (x_array[0]). I think that this is okay because it is not the absolute values of the independent vector x that are used, but the difference between them. Or alternatively it is like changing the reference x value.
If there is a better way to do this I am still happy to hear about it!

Using an alternative method for calculating the rate of change may be the solution for both speed and accuracy increase.
n = 1000
x = np.linspace(0, 10, n)
y = np.sin(x)
def timingPolyfit(x,y):
window_length = 10
vert_idx_list = np.arange(0, len(x) - window_length, 1)
hori_idx_list = np.arange(window_length)
A, B = np.meshgrid(hori_idx_list, vert_idx_list)
idx_array = A + B
x_array = x[idx_array]
y_array = y[idx_array]
ydot = np.polynomial.polynomial.polyfit(x_array[0], y_array.T, 1)[-1]
x_mids = [x[j+window_length/2] for j in range(n - window_length)]
return ydot, x_mids
def timingSimple(x,y):
dy = (y[2:] - y[:-2])/2
dx = x[1] - x[0]
dydx = dy/dx
return dydx, x[1:-1]
y1, x1 = timingPolyfit(x,y)
y2, x2 = timingSimple(x,y)
polyfitError = np.abs(y1 - np.cos(x1))
simpleError = np.abs(y2 - np.cos(x2))
print("polyfit average error: {:.2e}".format(np.average(polyfitError)))
print("simple average error: {:.2e}".format(np.average(simpleError)))
result = %timeit -o timingPolyfit(x,y)
result2 = %timeit -o timingSimple(x,y)
print("simple is {0} times faster".format(result.best / result2.best))
polyfit average error: 3.09e-03
simple average error: 1.09e-05
100 loops, best of 3: 3.2 ms per loop
100000 loops, best of 3: 9.46 µs per loop
simple is 337.995634151131 times faster
Relative error:
Results:

Efficient weighted vector distance calculation with numpy

I want to calculate the squared euclidean distance between two sets of points, inputs and testing. inputs is typically a real array of size ~(200, N), whereas testing is typically ~(1e8, N), and N is around 10. The distances should be scaled in each dimension in N, so I'd be aggregating the expression scale[j]*(inputs[i,j] - testing[ii,j])**2 (where scale is the scaling vector) over N times. I am trying to make this as fast as possible, particularly as N can be large. My first test is
def old_version (inputs, testing, x0):
nn, d1 = testing.shape
n, d1 = inputs.shape
b = np.zeros((n, nn))
for d in xrange(d1):
b += x0[d] * (((np.tile(inputs[:, d], (nn, 1)) -
np.tile (testing[:, d], (n, 1)).T))**2).T
return b
Nothing too fancy. I then tried using scipy.spatial.distance.cdist, although I still have to loop through it to get the scaling right
def new_version (inputs, testing, x0):
# import scipy.spatial.distance as dist
nn, d1 = testing.shape
n, d1 = inputs.shape
b = np.zeros ((n, nn))
for d in xrange(d1):
b += x0[d] * dist.cdist(inputs[:, d][:, None],
testing[:, d][:, None], 'sqeuclidean')
return b
It would appear that new_version scales better (as N > 1000), but I'm not sure that I've gone as fast as possible here. Any further ideas much appreciated!

This code gave me a factor of 10 over your implementation, give it a try:
x = np.random.randn(200, 10)
y = np.random.randn(1e5, 10)
scale = np.abs(np.random.randn(1, 10))
scale_sqrt = np.sqrt(scale)
dist_map = dist.cdist(x*scale_sqrt, y*scale_sqrt, 'sqeuclidean')
These are the test results:
In [135]: %timeit suggested_version(inputs, testing, x0)
1 loops, best of 3: 341 ms per loop
In [136]: %timeit op_version(inputs, testing, x00) (NOTICE: x00 is a reshape of x0)
1 loops, best of 3: 3.37 s per loop
Just make sure than when you go for the larger N you don't get low on memory. It can really slow things down.

Slow computation: could itertools.product be the culprit?

A numerical integration is taking exponentially longer than I expect it to. I would like to know if the way that I implement the iteration over the mesh could be a contributing factor. My code looks like this:
import numpy as np
import itertools as it
U = np.linspace(0, 2*np.pi)
V = np.linspace(0, np.pi)
for (u, v) in it.product(U,V):
# values = computation on each grid point, does not call any outside functions
# solution = sum(values)
return solution
I left out the computations because they are long and my question is specifically about the way that I have implemented the computation over the parameter space (u, v). I know of alternatives such as numpy.meshgrid; however, these all seem to create instances of (very large) matrices, and I would guess that storing them in memory would slow things down.
Is there an alternative to it.product that would speed up my program, or should I be looking elsewhere for the bottleneck?
Edit: Here is the for loop in question (to see if it can be vectorized).
import random
import numpy as np
import itertools as it
##########################################################################
# Initialize the inputs with random (to save space)
##########################################################################
mat1 = np.array([[random.random() for i in range(3)] for i in range(3)])
mat2 = np.array([[random.random() for i in range(3)] for i in range(3)])
a1, a2, a3 = np.array([random.random() for i in range(3)])
plane_normal = np.array([random.random() for i in range(3)])
plane_point = np.array([random.random() for i in range(3)])
d = np.dot(plane_normal, plane_point)
truthval = True
##########################################################################
# Initialize the loop
##########################################################################
N = 100
U = np.linspace(0, 2*np.pi, N + 1, endpoint = False)
V = np.linspace(0, np.pi, N + 1, endpoint = False)
U = U[1:N+1] V = V[1:N+1]
Vsum = 0
Usum = 0
##########################################################################
# The for loops starts here
##########################################################################
for (u, v) in it.product(U,V):
cart_point = np.array([a1*np.cos(u)*np.sin(v),
a2*np.sin(u)*np.sin(v),
a3*np.cos(v)])
surf_normal = np.array(
[2*x / a**2 for (x, a) in zip(cart_point, [a1,a2,a3])])
differential_area = \
np.sqrt((a1*a2*np.cos(v)*np.sin(v))**2 + \
a3**2*np.sin(v)**4 * \
((a2*np.cos(u))**2 + (a1*np.sin(u))**2)) * \
(np.pi**2 / (2*N**2))
if (np.dot(plane_normal, cart_point) - d > 0) == truthval:
perp_normal = plane_normal
f = np.dot(np.dot(mat2, surf_normal), perp_normal)
Vsum += f*differential_area
else:
perp_normal = - plane_normal
f = np.dot(np.dot(mat2, surf_normal), perp_normal)
Usum += f*differential_area
integral = abs(Vsum) + abs(Usum)

If U.shape == (nu,) and (V.shape == (nv,), then the following arrays vectorize most of your calculations. With numpy you get the best speed by using arrays for the largest dimensions, and looping on the small ones (e.g. 3x3).
Corrected version
A = np.cos(U)[:,None]*np.sin(V)
B = np.sin(U)[:,None]*np.sin(V)
C = np.repeat(np.cos(V)[None,:],U.size,0)
CP = np.dstack([a1*A, a2*B, a3*C])
SN = np.dstack([2*A/a1, 2*B/a2, 2*C/a3])
DA1 = (a1*a2*np.cos(V)*np.sin(V))**2
DA2 = a3*a3*np.sin(V)**4
DA3 = (a2*np.cos(U))**2 + (a1*np.sin(U))**2
DA = DA1 + DA2 * DA3[:,None]
DA = np.sqrt(DA)*(np.pi**2 / (2*Nu*Nv))
D = np.dot(CP, plane_normal)
S = np.sign(D-d)
F1 = np.dot(np.dot(SN, mat2.T), plane_normal)
F = F1 * DA
#F = F * S # apply sign
Vsum = F[S>0].sum()
Usum = F[S<=0].sum()
With the same random values, this produces the same values. On a 100x100 case, it is 10x faster. It's been fun playing with these matrices after a year.

In ipython I did simple sum calculations on your 50 x 50 gridspace
In [31]: sum(u*v for (u,v) in it.product(U,V))
Out[31]: 12337.005501361698
In [33]: UU,VV = np.meshgrid(U,V); sum(sum(UU*VV))
Out[33]: 12337.005501361693
In [34]: timeit UU,VV = np.meshgrid(U,V); sum(sum(UU*VV))
1000 loops, best of 3: 293 us per loop
In [35]: timeit sum(u*v for (u,v) in it.product(U,V))
100 loops, best of 3: 2.95 ms per loop
In [38]: timeit list(it.product(U,V))
1000 loops, best of 3: 213 us per loop
In [45]: timeit UU,VV = np.meshgrid(U,V); (UU*VV).sum().sum()
10000 loops, best of 3: 70.3 us per loop
# using numpy's own sum is even better
product is slower (by factor 10), not because product itself is slow, but because of the point by point calculation. If you can vectorize your calculations so they use the 2 (50,50) arrays (without any sort of looping) it should speed up the overall time. That's the main reason for using numpy.

[k for k in it.product(U,V)] runs in 2ms for me, and the itertool package is made to be efficient, e.g. it does not create a long array first (http://docs.python.org/2/library/itertools.html).
The culprit seems to be your code inside the iteration, or your using a lot of points in linspace.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

how to efficiently vectorize a function and retrieve values from it - python

Related

Cross product all points in 3D space

NumPy - vectorizing matrix-matrix column correlation coefficient [duplicate]

Fully vectorise numpy polyfit

Efficient weighted vector distance calculation with numpy

Slow computation: could itertools.product be the culprit?

Categories

Resources