Given
z = np.linspace(1,10,100)
calculate the summation of z_i**k * exp(-z_i**2 / 2) over all values z_i in z.
import numpy as np
import math
def calc_Summation1(z, k):
    ans = 0.0
    for i in range(0, len(z)):
        ans += math.pow(z[i], k) * math.exp(math.pow(-z[i], 2) / 2)
    return ans
def calc_Summation2(z, k):
    part1 = z**k
    part2 = math.exp(-z**2 / 2)
    return np.dot(part1, part2.transpose())
Can someone tell me what is wrong with both calc_Summation1 and calc_Summation2?
I think this might be what you're looking for:
sum(z_i**k * math.exp(-z_i**2 / 2) for z_i in z)
If you want to vectorize calculations with numpy, you need to use numpy's ufuncs. Also, the usual way of doing your calculation would be:
import numpy as np
calc = np.sum(z**k * np.exp(-z*z / 2))
although you can keep your approach using np.dot if you call np.exp instead of math.exp:
calc = np.dot(z**k, np.exp(-z*z / 2))
It does run faster with dot:
In [1]: z = np.random.rand(1000)
In [2]: %timeit np.sum(z**5 * np.exp(-z*z / 2))
10000 loops, best of 3: 142 µs per loop
In [3]: %timeit np.dot(z**5, np.exp(-z*z / 2))
1000 loops, best of 3: 129 µs per loop
In [4]: np.allclose(np.sum(z**5 * np.exp(-z*z / 2)),
... np.dot(z**5, np.exp(-z*z / 2)))
Out[4]: True
k = 1
def myfun(z_i):
    return z_i**k * math.exp(-z_i**2 / 2)
sum(map(myfun, z))
We define a function for the term we want to sum, use map to apply it to each value in the list, and then sum the results. Having to rely on the external variable k is slightly awkward.
A refinement would be to define a two argument function
def myfun2(z_i, k):
    return z_i**k * math.exp(-z_i**2 / 2)
and use a lambda expression to evaluate it
sum(map(lambda x:myfun2(x,1), z))
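Another option, if the lambda feels clunky, is functools.partial. A minimal sketch, reusing myfun2 from above; partial binds k once and returns a one-argument callable:
from functools import partial
# bind k=1 up front, then map the resulting single-argument function over z
sum(map(partial(myfun2, k=1), z))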
I want to calculate the weighted average for a long array, rolling over a fixed-size window.
For example:
In [1]: import pandas as pd
In [2]: a = pd.Series(range(int(1e8)))
In [5]: import numpy as np; w = np.array(list(range(10)));
In [6]: a.rolling(10).apply(lambda x: (x * w).sum())
As I tried it, this is very slow. I read in some blogs that it can sometimes be sped up with:
a.rolling(10).apply(np.argmax, engine='numba', raw=True)
but that only seems to work for built-in functions; for a custom function it does not seem to help.
Do you know how to make the runtime acceptable?
Solution with np.convolve
We can convolve the weights w over the series a using 'valid'-mode convolution; this has the same effect as computing the rolling weighted sum. The weights are reversed (w[::-1]) because convolution flips the kernel.
s = np.convolve(a, w[::-1], 'valid')
s = [np.nan] * (len(w) - 1) + list(s)
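As a quick sanity check (a small self-contained sketch, using a short series rather than the 1e8-element one from the question), the convolution result matches pandas' rolling apply:
import numpy as np
import pandas as pd

a = pd.Series(range(100))
w = np.arange(10)

expected = a.rolling(10).apply(lambda x: (x * w).sum(), raw=True)

s = np.convolve(a, w[::-1], 'valid')
s = [np.nan] * (len(w) - 1) + list(s)

# both have NaN in the first len(w) - 1 positions, so compare with equal_nan=True
print(np.allclose(np.asarray(s), expected.to_numpy(), equal_nan=True))  # True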
Solution with sliding_window_view
Alternatively, we can also use sliding_window_view to speed up the rolling weighted sum computation
from numpy.lib.stride_tricks import sliding_window_view
s = (sliding_window_view(a, len(w)) * w).sum(1)
s = [np.nan] * (len(w) - 1) + list(s)
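If the result needs to stay aligned with the original Series index, it can be wrapped back into a pandas Series (a small addition; it assumes s has the same length as a, which the NaN padding above guarantees):
s = pd.Series(s, index=a.index)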
Timings
a = pd.Series(range(int(1e4)))
%%timeit
s = np.convolve(a, w[::-1], 'valid')
s = [np.nan] * (len(w) - 1) + list(s)
# 1000 loops, best of 5: 626 µs per loop
%%timeit
s = (sliding_window_view(a, len(w)) * w).sum(1)
s = [np.nan] * (len(w) - 1) + list(s)
# 1000 loops, best of 5: 1.2 ms per loop
%%timeit
s = a.rolling(10).apply(lambda x: (x * w).sum())
# 1 loop, best of 5: 3.6 s per loop
As is evident from the performance test, np.convolve is about 5750x faster and sliding_window_view around 2880x faster than the pandas rolling + apply method.
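Regarding the engine='numba' point in the question: it is not limited to built-in functions, but the custom function must be numba-compilable and raw=True is required. A hedged sketch, assuming numba is installed and that your pandas version forwards args to the compiled function:
def weighted_sum(x, weights):
    # plain loop; numba compiles this to machine code
    total = 0.0
    for i in range(len(x)):
        total += x[i] * weights[i]
    return total

s = a.rolling(10).apply(weighted_sum, raw=True, engine='numba', args=(w,))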
Suppose I have a very computationally expensive function f(x). I want to compute some values of it, and then just access them instead of evaluating the function every time with new x values.
See the following simple example to illustrate what I mean:
import numpy as np
x = np.linspace(-3, 3, 6001)
fx = x**2
x = np.round(x, 3)
#I want to evaluate the function for the following w:
w = np.random.rand(10000)
#Rounding is necessary, so that the w match the x.
w = np.round(w, 3)
fx_w = []
for i in range(w.size):
    fx_w.append(fx[x == w[i]])
fx_w = np.asarray(fx_w)
So, I'd like to have f(w) computed from the values already generated for x. Of course, a for loop is out of the question, so my question is: how can I implement this somewhat efficiently?
You can use searchsorted to find the corresponding indices of your prepared function array. This will be an approximation. Rounding is not necessary.
import numpy as np
np.random.seed(42)
x = np.linspace(-3, 3, 6001)
fx = x ** 2
w = np.random.rand(10000)
result = fx[np.searchsorted(x, w)]
print('aprox. F(x):', result)
print('real F(x):', w ** 2)
Output
aprox. F(x): [0.140625 0.904401 0.535824 ... 0.896809 0.158404 0.047524 ]
real F(x): [0.1402803 0.90385769 0.53581513 ... 0.89625588 0.1579967 0.04714996]
Your function has to be much more computationally intensive to justify this approach
%timeit fx[np.searchsorted(x, w)] #1000 loops, best of 5: 992 µs per loop
%timeit w ** 2 #100000 loops, best of 5: 3.81 µs per loop
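A related alternative (not part of the approach above, but worth knowing) is to interpolate linearly between the precomputed samples instead of snapping to the nearest index; np.interp requires x to be increasing, which linspace guarantees:
result = np.interp(w, x, fx)  # piecewise-linear approximation of f(w)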
For approximating the value of Pi consider this stochastic method that populates an array with random values and tests for unit circle inclusion,
import random as rd
import numpy as np
def r(_): return rd.random()
def np_pi(n):
    v_r = np.vectorize(r)
    x = v_r(np.zeros(n))
    y = v_r(np.zeros(n))
    return sum(x*x + y*y <= 1) * 4. / n
Note that the random number generation relies on the Python standard library; now consider numpy's own random generation instead:
def np_pi(n):
    x = np.random.random(n)
    y = np.random.random(n)
    return sum(x*x + y*y <= 1) * 4. / n
Consider now the non-vectorized approach,
import random as rd
def dart_board():
    x, y = rd.random(), rd.random()
    return (x*x + y*y <= 1)
def pi(n):
    s = sum([dart_board() for _ in range(n)])
    return s * 4. / n
The non-vectorized form proves about 4 times faster on average than the vectorized counterpart; for instance, with n = 5000000, timed from the OS command line (Python 2.7, quad-core, 8 GB RAM, Red Hat Linux):
time python pi.py
time python np_pi.py
Hence the question: how can the vectorized approach be improved so that it performs better?
You are invoking the python builtin sum, rather than numpy's vectorized method sum:
import numpy as np
import random as rd
def np_pi(n):
    x = np.random.random(n)
    y = np.random.random(n)
    return (x*x + y*y <= 1).sum() * 4. / n
def dart_board():
    x, y = rd.random(), rd.random()
    return (x*x + y*y <= 1)
def pi(n):
    s = sum([dart_board() for _ in range(n)])
    return s * 4. / n
Timing results are now much different:
In [12]: %timeit np_pi(10000)
1000 loops, best of 3: 250 us per loop
In [13]: %timeit pi(10000)
100 loops, best of 3: 3.54 ms per loop
It is my guess that calling the builtin sum on a numpy-array causes overhead by iterating over the array, rather than using vectorized routines.
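The difference is easy to see in isolation (a minimal sketch; exact timings will vary by machine):
import numpy as np

hits = (np.random.random(10000)**2 + np.random.random(10000)**2) <= 1
slow = sum(hits)    # Python-level loop over 10000 array elements
fast = hits.sum()   # single vectorized reduction in C
assert slow == fast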
I have data for latitude and longitude, and I need to calculate the distance matrix between two arrays of locations. I used this formula to get the distance between two locations given their latitude and longitude.
Here is an example of my code:
import numpy as np
import math
def get_distances(locs_1, locs_2):
    n_rows_1 = locs_1.shape[0]
    n_rows_2 = locs_2.shape[0]
    dists = np.empty((n_rows_1, n_rows_2))
    # The loops here are inefficient
    for i in xrange(n_rows_1):
        for j in xrange(n_rows_2):
            dists[i, j] = get_distance_from_lat_long(locs_1[i], locs_2[j])
    return dists

def get_distance_from_lat_long(loc_1, loc_2):
    earth_radius = 3958.75
    lat_dif = math.radians(loc_1[0] - loc_2[0])
    long_dif = math.radians(loc_1[1] - loc_2[1])
    sin_d_lat = math.sin(lat_dif / 2)
    sin_d_long = math.sin(long_dif / 2)
    step_1 = (sin_d_lat ** 2) + (sin_d_long ** 2) * math.cos(math.radians(loc_1[0])) * math.cos(math.radians(loc_2[0]))
    step_2 = 2 * math.atan2(math.sqrt(step_1), math.sqrt(1 - step_1))
    dist = step_2 * earth_radius
    return dist
My expected output is this:
>>> locations_1 = np.array([[34, -81], [32, -87], [35, -83]])
>>> locations_2 = np.array([[33, -84], [39, -81], [40, -88], [30, -80]])
>>> get_distances(locations_1, locations_2)
array([[ 186.13522573, 345.46610882, 566.23466349, 282.51056676],
[ 187.96657622, 589.43369894, 555.55312473, 436.88855214],
[ 149.5853537 , 297.56950329, 440.81203371, 387.12153747]])
Performance is important for me, and one thing I could do is use Cython to speed up the loops, but it would be nice if I don't have to go there.
Is there a module that can do something like this? Or any other solution?
There are a lot of suboptimal things in the Haversine equations you are using. You can trim some of that and minimize the number of sines, cosines and square roots you need to calculate. The following is the best I have been able to come up with; on my system it runs about 5x faster than Ophion's code (which does mostly the same as far as vectorization goes) on two random arrays of 1000 and 2000 elements:
def spherical_dist(pos1, pos2, r=3958.75):
    pos1 = pos1 * np.pi / 180
    pos2 = pos2 * np.pi / 180
    cos_lat1 = np.cos(pos1[..., 0])
    cos_lat2 = np.cos(pos2[..., 0])
    cos_lat_d = np.cos(pos1[..., 0] - pos2[..., 0])
    cos_lon_d = np.cos(pos1[..., 1] - pos2[..., 1])
    return r * np.arccos(cos_lat_d - cos_lat1 * cos_lat2 * (1 - cos_lon_d))
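For reference, the expression inside np.arccos reduces, via cos(a - b) = cos a cos b + sin a sin b applied to the latitude difference, to the spherical law of cosines:
cos(lat1 - lat2) - cos(lat1) cos(lat2) (1 - cos(lon1 - lon2))
    = sin(lat1) sin(lat2) + cos(lat1) cos(lat2) cos(lon1 - lon2)
    = cos(central angle)
so r * arccos(...) is exactly the great-circle distance, computed with fewer trig calls than the Haversine form.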
If you feed it your two arrays "as is" it will complain, but that's not a bug, it's a feature. Basically, this function computes the distance on a sphere over the last dimension, and broadcasts on the rest. So you can get what you are after as:
>>> spherical_dist(locations_1[:, None], locations_2)
array([[ 186.13522573, 345.46610882, 566.23466349, 282.51056676],
[ 187.96657622, 589.43369894, 555.55312473, 436.88855214],
[ 149.5853537 , 297.56950329, 440.81203371, 387.12153747]])
But it could also be used to calculate the distances between two lists of points, i.e.:
>>> spherical_dist(locations_1, locations_2[:-1])
array([ 186.13522573, 589.43369894, 440.81203371])
Or between two single points:
>>> spherical_dist(locations_1[0], locations_2[0])
186.1352257300577
This is inspired by how gufuncs work, and once you get used to it, I have found it to be a wonderful "swiss army knife" coding style that lets you reuse a single function in lots of different settings.
It is more efficient to use meshgrid to replace the double for loop:
import numpy as np
earth_radius = 3958.75
def get_distances(locs_1, locs_2):
    # build grids of every (loc_1, loc_2) latitude/longitude combination
    lats1, lats2 = np.meshgrid(locs_1[:, 0], locs_2[:, 0])
    lons1, lons2 = np.meshgrid(locs_1[:, 1], locs_2[:, 1])
    lat_dif = np.radians(lats1 - lats2)
    long_dif = np.radians(lons1 - lons2)
    sin_d_lat = np.sin(lat_dif / 2.)
    sin_d_long = np.sin(long_dif / 2.)
    # use the full lats1/lats2 grids here, not just their first rows
    step_1 = (sin_d_lat ** 2) + (sin_d_long ** 2) * np.cos(np.radians(lats1)) * np.cos(np.radians(lats2))
    step_2 = 2 * np.arctan2(np.sqrt(step_1), np.sqrt(1 - step_1))
    dist = step_2 * earth_radius
    # the result has shape (len(locs_2), len(locs_1)), the transpose of the loop version
    return dist
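A quick check against the loop version from the question (a sketch; get_distances_mesh is just the meshgrid function above, renamed here to avoid clashing with the question's get_distances):
old = get_distances(locations_1, locations_2)        # double-loop version
new = get_distances_mesh(locations_1, locations_2)   # meshgrid version above
print(np.allclose(old, new.T))  # True; note the transpose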
This is simply vectorizing your code:
def new_get_distances(loc1, loc2):
    earth_radius = 3958.75
    locs_1 = np.deg2rad(loc1)
    locs_2 = np.deg2rad(loc2)
    lat_dif = (locs_1[:, 0][:, None] / 2 - locs_2[:, 0] / 2)
    lon_dif = (locs_1[:, 1][:, None] / 2 - locs_2[:, 1] / 2)
    np.sin(lat_dif, out=lat_dif)
    np.sin(lon_dif, out=lon_dif)
    np.power(lat_dif, 2, out=lat_dif)
    np.power(lon_dif, 2, out=lon_dif)
    lon_dif *= (np.cos(locs_1[:, 0])[:, None] * np.cos(locs_2[:, 0]))
    lon_dif += lat_dif
    np.arctan2(np.power(lon_dif, .5), np.power(1 - lon_dif, .5), out=lon_dif)
    lon_dif *= (2 * earth_radius)
    return lon_dif
locations_1 = np.array([[34, -81], [32, -87], [35, -83]])
locations_2 = np.array([[33, -84], [39, -81], [40, -88], [30, -80]])
old = get_distances(locations_1, locations_2)
new = new_get_distances(locations_1,locations_2)
np.allclose(old,new)
True
If we look at timings:
%timeit new_get_distances(locations_1,locations_2)
10000 loops, best of 3: 80.6 µs per loop
%timeit get_distances(locations_1,locations_2)
10000 loops, best of 3: 74.9 µs per loop
It is actually slower for a small example; however, let's look at a larger example:
locations_1 = np.random.rand(1000,2)
locations_2 = np.random.rand(1000,2)
%timeit get_distances(locations_1,locations_2)
1 loops, best of 3: 5.84 s per loop
%timeit new_get_distances(locations_1,locations_2)
10 loops, best of 3: 149 ms per loop
We now have a speedup of 40x. Can probably squeeze some more speed in a few places.
Edit: Made a few updates to cut out redundant places and make it clear that we are not altering the original location arrays.
Does the Haversine formula provide good enough accuracy for your use? It can be off by quite a bit. I think you'd be able to get both accuracy and speed if you use proj.4, in particular the python bindings, pyproj. Note that pyproj can work directly on numpy arrays of coordinates.
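A minimal sketch of that idea, assuming pyproj is installed; Geod.inv takes longitudes first and returns distances in metres, and the results will differ slightly from the great-circle values above because it uses the WGS84 ellipsoid rather than a sphere:
from pyproj import Geod
import numpy as np

geod = Geod(ellps='WGS84')

# build every (loc_1, loc_2) pair, then reshape back into a distance matrix
lats1 = np.repeat(locations_1[:, 0], len(locations_2))
lons1 = np.repeat(locations_1[:, 1], len(locations_2))
lats2 = np.tile(locations_2[:, 0], len(locations_1))
lons2 = np.tile(locations_2[:, 1], len(locations_1))

_, _, dist_m = geod.inv(lons1, lats1, lons2, lats2)
dists_miles = (dist_m / 1609.344).reshape(len(locations_1), len(locations_2))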
A numerical integration is taking exponentially longer than I expect it to. I would like to know if the way that I implement the iteration over the mesh could be a contributing factor. My code looks like this:
import numpy as np
import itertools as it
U = np.linspace(0, 2*np.pi)
V = np.linspace(0, np.pi)
for (u, v) in it.product(U, V):
    # values = computation on each grid point, does not call any outside functions
    # solution = sum(values)
return solution
I left out the computations because they are long and my question is specifically about the way that I have implemented the computation over the parameter space (u, v). I know of alternatives such as numpy.meshgrid; however, these all seem to create instances of (very large) matrices, and I would guess that storing them in memory would slow things down.
Is there an alternative to it.product that would speed up my program, or should I be looking elsewhere for the bottleneck?
Edit: Here is the for loop in question (to see if it can be vectorized).
import random
import numpy as np
import itertools as it
##########################################################################
# Initialize the inputs with random (to save space)
##########################################################################
mat1 = np.array([[random.random() for i in range(3)] for i in range(3)])
mat2 = np.array([[random.random() for i in range(3)] for i in range(3)])
a1, a2, a3 = np.array([random.random() for i in range(3)])
plane_normal = np.array([random.random() for i in range(3)])
plane_point = np.array([random.random() for i in range(3)])
d = np.dot(plane_normal, plane_point)
truthval = True
##########################################################################
# Initialize the loop
##########################################################################
N = 100
U = np.linspace(0, 2*np.pi, N + 1, endpoint = False)
V = np.linspace(0, np.pi, N + 1, endpoint = False)
U = U[1:N+1]
V = V[1:N+1]
Vsum = 0
Usum = 0
##########################################################################
# The for loops starts here
##########################################################################
for (u, v) in it.product(U, V):
    cart_point = np.array([a1*np.cos(u)*np.sin(v),
                           a2*np.sin(u)*np.sin(v),
                           a3*np.cos(v)])
    surf_normal = np.array(
        [2*x / a**2 for (x, a) in zip(cart_point, [a1, a2, a3])])
    differential_area = \
        np.sqrt((a1*a2*np.cos(v)*np.sin(v))**2 + \
                a3**2*np.sin(v)**4 * \
                ((a2*np.cos(u))**2 + (a1*np.sin(u))**2)) * \
        (np.pi**2 / (2*N**2))
    if (np.dot(plane_normal, cart_point) - d > 0) == truthval:
        perp_normal = plane_normal
        f = np.dot(np.dot(mat2, surf_normal), perp_normal)
        Vsum += f*differential_area
    else:
        perp_normal = -plane_normal
        f = np.dot(np.dot(mat2, surf_normal), perp_normal)
        Usum += f*differential_area

integral = abs(Vsum) + abs(Usum)
If U.shape == (nu,) and V.shape == (nv,), then the following arrays vectorize most of your calculations. With numpy you get the best speed by using arrays for the largest dimensions and looping over the small ones (e.g. 3x3).
Corrected version
Nu, Nv = U.size, V.size              # grid sizes (both equal to N in the original code)
A = np.cos(U)[:,None]*np.sin(V)      # broadcast U against V to get (Nu, Nv) grids
B = np.sin(U)[:,None]*np.sin(V)
C = np.repeat(np.cos(V)[None,:], U.size, 0)
CP = np.dstack([a1*A, a2*B, a3*C])   # cart_point at every grid point, shape (Nu, Nv, 3)
SN = np.dstack([2*A/a1, 2*B/a2, 2*C/a3])  # surf_normal at every grid point
DA1 = (a1*a2*np.cos(V)*np.sin(V))**2
DA2 = a3*a3*np.sin(V)**4
DA3 = (a2*np.cos(U))**2 + (a1*np.sin(U))**2
DA = DA1 + DA2 * DA3[:,None]
DA = np.sqrt(DA)*(np.pi**2 / (2*Nu*Nv))   # differential_area on the grid
D = np.dot(CP, plane_normal)
S = np.sign(D-d)                     # which side of the plane each point lies on
F1 = np.dot(np.dot(SN, mat2.T), plane_normal)
F = F1 * DA
#F = F * S # apply sign
Vsum = F[S>0].sum()
Usum = F[S<=0].sum()
With the same random values, this produces the same values. On a 100x100 case, it is 10x faster. It's been fun playing with these matrices after a year.
In IPython I did simple sum calculations on your 50 x 50 grid:
In [31]: sum(u*v for (u,v) in it.product(U,V))
Out[31]: 12337.005501361698
In [33]: UU,VV = np.meshgrid(U,V); sum(sum(UU*VV))
Out[33]: 12337.005501361693
In [34]: timeit UU,VV = np.meshgrid(U,V); sum(sum(UU*VV))
1000 loops, best of 3: 293 us per loop
In [35]: timeit sum(u*v for (u,v) in it.product(U,V))
100 loops, best of 3: 2.95 ms per loop
In [38]: timeit list(it.product(U,V))
1000 loops, best of 3: 213 us per loop
In [45]: timeit UU,VV = np.meshgrid(U,V); (UU*VV).sum().sum()
10000 loops, best of 3: 70.3 us per loop
# using numpy's own sum is even better
product is slower (by a factor of 10), not because product itself is slow, but because of the point-by-point calculation. If you can vectorize your calculations so they use the two (50, 50) arrays (without any explicit looping), it should speed up the overall time. That's the main reason for using numpy.
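For the simple u*v example, broadcasting avoids even building the meshgrid (a small sketch of the pattern):
total = (U[:, None] * V).sum()  # same value as the it.product loop, one vectorized expression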
[k for k in it.product(U,V)] runs in 2 ms for me, and the itertools package is made to be efficient, e.g. it does not create a long array first (http://docs.python.org/2/library/itertools.html).
The culprit seems to be your code inside the iteration, or the fact that you are using a lot of points in linspace.