How can I speed up the calculations for this equation using a GPU or CUDA? The file contains 30,000 points:
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as sch

points = pd.read_csv('file.dat', sep='\t', usecols=[0, 1])
d = pd.DataFrame(np.zeros((max_id, max_id)))
dis = sch.distance.pdist(points, 'euclidean')
n = 0
for i in range(max_id):
    print(i)
    for j in range(i + 1, max_id):
        d.at[i, j] = dis[n]
        d.at[j, i] = d.at[i, j]
        n += 1
EDIT:
I tried
import torch
from numpy import genfromtxt

points = genfromtxt(path, delimiter='\t', usecols=[0, 1])
points = torch.tensor(points)
d = pd.DataFrame(np.zeros((max_id, max_id)))
dis = torch.cdist(points)
but got
TypeError: cdist() missing 1 required positional argument: 'x2'
Does that mean I need to read the points, or the two columns of points, separately?
NumPy doesn't natively support GPUs, but there are libraries that play well with numpy and do support them. One such option is PyTorch: torch.cdist is one function you could look at (then you don't need to assemble the matrix with for loops). There is also torch.nn.functional.pdist. Note that you don't need a for loop in the second case either: once you have the output, you can just reshape it as needed.
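To address the TypeError from the edit: torch.cdist takes two tensors, so pass points twice to get the full square distance matrix. A minimal sketch (the device line assumes a CUDA-capable GPU and falls back to CPU otherwise):
import numpy as np
import torch

points = np.genfromtxt(path, delimiter='\t', usecols=[0, 1])
device = 'cuda' if torch.cuda.is_available() else 'cpu'
points = torch.tensor(points, device=device)

# cdist needs both operands; passing points twice yields the full
# pairwise Euclidean distance matrix in one call, with no Python loops
d = torch.cdist(points, points)

# condensed form (upper triangle only), analogous to scipy's pdist
dis = torch.nn.functional.pdist(points)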
Sorry, numpy doesn't support GPUs, but it's not impossible to speed up your program. It seems that you want to write the calculated distance values to the off-diagonal positions. I'll use a pure numpy array to show you how to speed up your program:
>>> max_id = 5
>>> d = np.zeros((max_id, max_id))
>>> dis = np.arange(max_id * (max_id - 1) // 2) ** 2 # set the values at will
>>> d[np.triu_indices(max_id, k=1)] = dis
>>> d + d.T # d += d.T is better
array([[ 0., 0., 1., 4., 9.],
[ 0., 0., 16., 25., 36.],
[ 1., 16., 0., 49., 64.],
[ 4., 25., 49., 0., 81.],
[ 9., 36., 64., 81., 0.]])
After testing: with max_id set to 500, the vectorized version is more than ten times faster than the loop:
>>> def loop(max_id):
...     d = np.zeros((max_id, max_id))
...     dis = np.arange(max_id * (max_id - 1) // 2) ** 2
...     n = 0
...     for i in range(max_id):
...         for j in range(i + 1, max_id):
...             d[i, j] = d[j, i] = dis[n]
...             n += 1
...
>>> def triu(max_id):
...     d = np.zeros((max_id, max_id))
...     dis = np.arange(max_id * (max_id - 1) // 2) ** 2
...     d[np.triu_indices(max_id, k=1)] = dis
...     d += d.T
...
>>> timeit(lambda: loop(500), number=10)
0.546407600006205
>>> timeit(lambda: triu(500), number=10)
0.0350386000063736
If you need more complex loop operations, it is worth learning about numba.
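For example, a minimal numba sketch of the original fill loop (assuming dis holds all max_id * (max_id - 1) // 2 condensed distances); the JIT-compiled loop runs at near-native speed:
import numba
import numpy as np

@numba.njit
def fill_square(dis, max_id):
    # same double loop as the original, but compiled by numba
    d = np.zeros((max_id, max_id))
    n = 0
    for i in range(max_id):
        for j in range(i + 1, max_id):
            d[i, j] = dis[n]
            d[j, i] = dis[n]
            n += 1
    return d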
Update:
It seems that the off-diagonal block is larger than dis; in that case you just need to slice the result of triu_indices:
dis_size = dis.size
i, j = np.triu_indices(max_id, k=1)
d[i[:dis_size], j[:dis_size]] = dis
I am trying to use broadcasting to speed up my numpy code. The real code has much larger arrays and loops through multiple times, but I think this snippet illustrates the issue.
import numpy as np
row = np.array([0,0,1,1,4])
dl_ddk = np.array([0,8,29,112,11])
change1 = np.zeros(5)
change2 = np.zeros(5)
for k in range(0, row.shape[0]):
    i = row[k]
    change1[i] += dl_ddk[k]
change2[row] += dl_ddk
print(change1)
print(change2)
change1 = [8, 141, 0, 0, 11]
change2 = [8, 112, 0, 0, 11]
I thought these two change arrays would be equal; however, it seems that the broadcast += operation overwrites rather than accumulates values. Is there a way to vectorize a loop in np with index arrays like this that will give the same results as change1?
You can use np.bincount() and use dl_ddk as the weights:
import numpy as np
row = np.array([0,0,1,1,4])
dl_ddk = np.array([0,8,29,112,11])
change1 = np.bincount(row, weights=dl_ddk)
print(change1)
# [ 8. 141. 0. 0. 11.]
The bit in the docs shows usage almost exactly like your problem:
If weights is specified the input array is weighted by it, i.e. if a value n is found at position i, out[n] += weight[i] instead of out[n] += 1.
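One caveat worth noting: np.bincount sizes its output as max(row) + 1, so if the highest indices might be absent, pass minlength (a standard bincount parameter) to keep the result the same length as change1:
change1 = np.bincount(row, weights=dl_ddk, minlength=5)  # always length 5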
In [1]: row = np.array([0,0,1,1,4])
...: dl_ddk = np.array([0,8,29,112,11])
...: change1 = np.zeros(5)
...: change2 = np.zeros(5)
...: for k in range(0, row.shape[0]):
...:     i = row[k]
...:     change1[i] += dl_ddk[k]
...: change2[row] += dl_ddk
change2 does not match because of buffering. ufuncs have an at method to address exactly this:
Performs unbuffered in place operation on operand 'a' for elements specified by 'indices'.
In [3]: change3 = np.zeros(5)
In [4]: np.add.at(change3, row, dl_ddk)
In [5]: change1
Out[5]: array([ 8., 141., 0., 0., 11.])
In [6]: change2
Out[6]: array([ 8., 112., 0., 0., 11.])
In [7]: change3
Out[7]: array([ 8., 141., 0., 0., 11.])
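If performance matters, it may be worth timing np.add.at against the np.bincount approach from the other answer; np.add.at is unbuffered but often slower. A rough sketch, with made-up sizes (results will vary):
import numpy as np
from timeit import timeit

row_big = np.random.randint(0, 1000, size=1_000_000)
w_big = np.random.random(1_000_000)

def with_at():
    out = np.zeros(1000)
    np.add.at(out, row_big, w_big)   # unbuffered in-place accumulate
    return out

def with_bincount():
    return np.bincount(row_big, weights=w_big, minlength=1000)

print(timeit(with_at, number=10))
print(timeit(with_bincount, number=10))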
Suppose I have this 2D array A:
[[0,0,0,0],
[0,0,0,0],
[0,0,0,0],
[0,0,0,4]]
and I want to sum B:
[[1,2,3]
[4,5,6]
[7,8,9]]
centered on A[0][0] so the result would be:
array_sum(A,B,0,0) =
[[5,6,0,4],
 [8,9,0,7],
 [0,0,0,0],
 [2,3,0,5]]
I was thinking that I should write a function that checks whether it's on a boundary and then adjusts the indices accordingly:
def array_sum(A, B, i, j):
    ...
    if i == 0 and j == 0:
        A[-1][-1] = A[-1][-1] + B[0][0]
        ...
    else:
        A[i-1][j-1] = A[i][j] + B[0][0]
        A[i][j] = A[i][j] + B[1][1]
        A[i+1][j+1] = A[i][j] + B[2][2]
    ...
but I don't know if there is a better way of doing this. I was reading about broadcasting, or maybe using a convolution, but I'm not sure what the best approach is.
Assuming B.shape is all odd numbers, you can use np.indices, shift them to point where you want, and use np.add.at:
def array_sum(A, B, loc = (0, 0)):
    A_ = A.copy()
    # index grids for every element of B
    ix = np.indices(B.shape)
    # shift so that the centre of B lands on loc
    new_loc = np.array(loc) - np.array(B.shape) // 2
    # wrap out-of-range indices around the edges of A
    new_ix = np.mod(ix + new_loc[:, None, None],
                    np.array(A.shape)[:, None, None])
    # unbuffered in-place add, so repeated indices accumulate correctly
    np.add.at(A_, tuple(new_ix), B)
    return A_
Testing:
array_sum(A, B)
Out:
array([[ 5., 6., 0., 4.],
[ 8., 9., 0., 7.],
[ 0., 0., 0., 0.],
[ 2., 3., 0., 5.]])
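And a quick check with an interior location, where nothing wraps around (output written out by hand from the definition, so treat it as a sanity check):
array_sum(A, B, loc=(2, 2))
Out:
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  1.,  2.,  3.],
       [ 0.,  4.,  5.,  6.],
       [ 0.,  7.,  8., 13.]])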
As a rule of thumb, slice indexing is faster (~2x) than fancy indexing. This appears to hold even for the small example in the OP. Downside: the code is slightly more complicated.
import numpy as np
from numpy import s_ as _
from itertools import product, starmap
def wrapsl1d(N, n, c):
    # check in 1D whether a patch of size n centered at c in a vector
    # of length N fits or has to be wrapped around;
    # return appropriate slice objects for both vector and patch
    assert n <= N
    l = (c - n//2) % N
    h = l + n
    # return list of pairs (index into A, index into patch):
    # 2 pairs if we wrap around, otherwise 1 pair
    return [_[l:h, :]] if h <= N else [_[l:, :N-l], _[:h-N, n+N-h:]]
def use_slices(A, patch, center=(0, 0)):
    slAptch = product(*map(wrapsl1d, A.shape, patch.shape, center))
    # the product now has elements [(idx0A, idx0ptch), (idx1A, idx1ptch)];
    # transpose them:
    slAptch = starmap(zip, slAptch)
    out = A.copy()
    for sa, sp in slAptch:
        out[sa] += patch[sp]
    return out
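A quick hypothetical sanity check that the slice-based version agrees with the fancy-indexing array_sum above (assuming both functions are in scope):
A = np.zeros((4, 4))
A[3, 3] = 4
B = np.arange(1, 10).reshape(3, 3)
assert np.array_equal(use_slices(A, B), array_sum(A, B))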
I am implementing exponentiation of a matrix using for loops:
import numpy as np
fl=2
cl=2
fl2=fl
cl2=cl
M = np.random.random((fl, cl))
M2 = M
Result = np.zeros((fl, cl))
Temp = np.zeros((fl, cl))
itera = 2
print('Matrix A:\n', M)
print('Matrix AxA:\n', M2)
for i in range(0, itera):
    for a in range(0, fl):
        for b in range(0, cl):
            Result[a, b] += M[a, b] * M[a, b]
            temp[a, b] = Result[a, b]
            Res[a, k] = M[a, b]
print('Power:\n', temp)
print('Matrix:\n', Result)
The problem is that the multiplication in Result[a,b] += M[a,b]*M[a,b] is not performed correctly, and when I save it in a temporary matrix to multiply with the original matrix, the next pass of for i in range(0, itera): does not pick it up.
I know I can use the function np.matmul, but I am trying to do it with for loops.
You're looking for np.linalg.matrix_power.
If you're using numpy, don't use a for loop; use a vectorized operation.
arr = np.arange(16).reshape((4,4))
np.linalg.matrix_power(arr, 3)
array([[ 1680, 1940, 2200, 2460],
[ 4880, 5620, 6360, 7100],
[ 8080, 9300, 10520, 11740],
[11280, 12980, 14680, 16380]])
Which is the same as the explicit multiplication:
arr @ arr @ arr
>>> np.array_equal(arr @ arr @ arr, np.linalg.matrix_power(arr, 3))
True
Since you asked
If you really want a naive solution using loops, we can put together the pieces quite easily. First we need a way to actually multiply the matrices. There are algorithms that beat O(n^3) complexity, but this answer is not going to use them. Here is a basic matrix multiplication function:
def matmultiply(a, b):
    res = np.zeros(a.shape)
    size = a.shape[0]
    for i in range(size):
        for j in range(size):
            for k in range(size):
                res[i][j] += a[i][k] * b[k][j]
    return res
Now you need an exponentiation function. This function takes a matrix and a power, and raises the matrix to that power.
def loopy_matrix_power(a, n):
    # exponentiation by squaring
    res = np.identity(a.shape[0])
    while n > 0:
        if n % 2 == 0:
            a = matmultiply(a, a)
            n //= 2
        else:
            res = matmultiply(res, a)
            n -= 1
    return res
In action:
loopy_matrix_power(arr, 3)
array([[ 1680., 1940., 2200., 2460.],
[ 4880., 5620., 6360., 7100.],
[ 8080., 9300., 10520., 11740.],
[11280., 12980., 14680., 16380.]])
There are some problems here:
you do not reset the result matrix after multiplication is done, hence you keep adding more values; and
you never assign the result back to m to perform a next generation of multiplications.
Naive power implementation
I think it is also better to "encapsulate" matrix multiplication in a separate function, like:
def matmul(a1, a2):
    m, ka = a1.shape
    kb, n = a2.shape
    if ka != kb:
        raise ValueError()
    res = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            d = 0.0
            for k in range(ka):
                d += a1[i,k] * a2[k,j]
            res[i, j] = d
    return res
Then we can calculate the power of this matrix with:
m2 = m
for i in range(topow - 1):
    m = matmul(m, m2)
Note that we can not use m here as the only matrix: if we write m = matmul(m, m), then m is now M². If we then perform the multiplication a second time, we get M⁴ instead of M³.
This then produces the expected results:
>>> cross = np.array([[1,0,1],[0,1,0], [1,0,1]])
>>> matmul(cross, cross)
array([[2., 0., 2.],
[0., 1., 0.],
[2., 0., 2.]])
>>> matmul(cross, matmul(cross, cross))
array([[4., 0., 4.],
[0., 1., 0.],
[4., 0., 4.]])
>>> matmul(cross, matmul(cross, matmul(cross, cross)))
array([[8., 0., 8.],
[0., 1., 0.],
[8., 0., 8.]])
Logarithmic power multiplication
The above calculates Mⁿ in O(n) (linear time), but we can do better: we can calculate this matrix in logarithmic time. We do this by looking at the power: if it is 1, we simply return the matrix; if it is not, we check whether the power is even. If it is even, we multiply the matrix with itself and raise that matrix to half the power, since M²ⁿ = (M×M)ⁿ. If the power is odd, we do more or less the same, except that we multiply with the original M: M²ⁿ⁺¹ = M×(M×M)ⁿ. Like:
def matpow(m, p):
    if p <= 0:
        raise ValueError()
    if p == 1:
        return m
    elif p % 2 == 0:  # even
        return matpow(matmul(m, m), p // 2)
    else:  # odd
        return matmul(m, matpow(matmul(m, m), p // 2))
The above can be written more elegantly, but I leave this as an exercise :).
Note however that using numpy arrays for scalar computations is typically less efficient than using the matrix multiplication (and other functions) numpy offers. These are optimized, are not interpreted, and typically outperform Python equivalents significantly. Therefore I would really advise you to use them. The numpy functions are also well tested, making it less likely that there are bugs in them.
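For instance, a quick check (deterministic, using the cross matrix from above) that matpow agrees with numpy's built-in:
>>> np.array_equal(matpow(cross, 3), np.linalg.matrix_power(cross, 3))
True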
I get an error such as:
Traceback (most recent call last):
  File "C:\Users\SONY\Desktop\deneme.py", line 42, in <module>
    G[alpha][n]=compute_G(x,n)
NameError: name 'G' is not defined
Here is my code:
N = 20
N_cor = 25
N_cf = 25
a = 0.5
eps = 1.4
def update(x):
for j in range(0,N):
old_x = x[j]
old_Sj = S(j,x)
x[j] = x[j] + random.uniform(-eps,eps)
dS = S(j,x) - old_Sj
if dS>0 and exp(-dS)<random.uniform(0,1):
x[j] = old_x
def S(j,x):
jp = (j+1)%N
jm = (j-1)%N
return a*x[j]**2/2 + x[j]*(x[j]-x[jp]-x[jm])/a
def compute_G(x,n):
g = 0
for j in range(0,N):
g = g + x[j]*x[(j+n)%N]
return g/N
#def MCaverage(x,G):
import random
from math import exp
x=[]
for j in range(0,N):
x.append(0.0)
print"x(%d)=%f"%(j,x[j])
for j in range(0,5*N_cor):
update(x)
for alpha in range(0,N_cf):
for j in range(0,N_cor):
update(x)
for i in range(0,N):
print"x(%d)=%f"%(i,x[i])
for n in range(0,N):
G[alpha][n]=compute_G(x,n)
for n in range(0,N):
avg_G = 0
for alpha in range(0,N_cf):
avg_G = avg_G + G[alpha][n]
avg_G = avg_G / N_cf
print "G(%d) = %f"%(n,avg_G)
When I define G, I get another error:
Traceback (most recent call last):
  File "C:\Users\SONY\Desktop\deneme.py", line 43, in <module>
    G[alpha][n]=compute_G(x,n)
IndexError: list index out of range
Here is how I define G:
...
for alpha in range(0,N_cf):
for j in range(0,N_cor):
update(x)
for n in range(0,N):
G=[][]
G[alpha][n]=compute_G(x,n)
...
What should I do to define an array with two indices, i.e. a two-dimensional matrix?
In Python, a=[] defines a list, not an array. A list certainly can be used to store a lot of elements all of the same numeric type, and one can define a mapping from two integers indexing a rectangular array to one list index, but that goes against the grain. It is hard to program and inefficiently stored, because lists are intended as ordered collections of objects which may be of arbitrary type.
What you probably need most is a direction on where to start reading, so here it is: learn about Numpy (http://www.numpy.org/), a Python module for typical scientific calculations on arrays of (mostly) numeric data in which all the elements are of the same type. Here is a brief taster, once you have installed numpy.
>>> import numpy as np # importing as np is conventional
>>> p = np.zeros( (6,4) ) # two dimensional, 24 elements in total
>>> for i in range(4): p[i,i]=1
>>> p
array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
numpy arrays are efficient ways of manipulating as much data as you can fit into your computer's RAM.
Under the hood, numpy stores data in compact typed buffers (similar in spirit to Python's array.array datatype, which is rarely used on its own); numpy is the support code that you'll usually not want to write for yourself. Not least because, when your arrays have millions or billions of elements, you can't afford the inefficiency of inner loops over their indices in an interpreted language like Python. Numpy offers row-, column- and array-level operations whose underlying code is compiled and optimized, so it runs considerably faster.
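Applied to your code, a minimal sketch (assuming, as in your loops, one row per configuration alpha and one column per n):
import numpy as np

G = np.zeros((N_cf, N))   # 2-D array: N_cf rows, N columns
for alpha in range(N_cf):
    for j in range(N_cor):
        update(x)
    for n in range(N):
        G[alpha, n] = compute_G(x, n)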
I have a list of times (called times in my code, produced by the code suggested to me in the thread astropy.io fits efficient element access of a large table) and I want to do some statistical tests for periodicity, using Zn^2 and epoch folding tests. Some steps in the code take quite a while to run, and I am wondering if there is a faster way to do it. I have tried the equivalent map and lambda functions, but that takes even longer. My list of times has several hundred or maybe thousands of elements, depending on the dataset. Here is my code:
phase=[(x-mintime)*testfreq[m]-int((x-mintime)*testfreq[m]) for x in times]
# the above step takes 3 seconds for the dataset I am using for testing
# testfreq[m] is just one of several hundred frequencies I am testing
# times is of type numpy.ndarray
phasebin=[int(ph*numbins)for ph in phase]
# 1 second (numbins is 20)
powerarray=[phasebin.count(n) for n in range(0,numbins-1)]
# 0.3 seconds
poweravg=np.mean(powerarray)
chisq[m]=sum([(pow-poweravg)**2/poweravg for pow in powerarray])
# the above 2 steps are very quick
for n in range(0,maxn): # maxn is 3
cosparam=sum([(np.cos(2*np.pi*(n+1)*ph)) for ph in phase])
sinparam=sum([(np.sin(2*np.pi*(n+1)*ph)) for ph in phase])
# these steps each take 4 seconds
z2[m,n]=sum(z2[m,])+(cosparam**2+sinparam**2)/count
# this is quick (count is the number of times)
As this steps through several hundred frequencies on either side of frequencies identified through an FFT search, it takes a very long time to run. The same functionality in a lower level language runs much more quickly, but I need some of the Python modules for plotting, etc. I am hoping that Python can be persuaded to do some of the operations, particularly the phase, phasebin, powerarray, cosparam, and sinparam calculations, significantly faster, but I am not sure how to make this happen. Can anyone tell me how this can be done, or do I have to write and call functions in C or fortran? I know that this could be done in a few minutes e.g. in fortran, but this Python code takes hours as it is.
Thanks very much.
Instead of Python lists, you could use the numpy library, it is much faster for linear algebra type operations. For example to add two arrays in an element-wise fashion
>>> import numpy as np
>>> a = np.array([1,2,3,4,5])
>>> b = np.array([2,3,4,5,6])
>>> a + b
array([ 3, 5, 7, 9, 11])
Similarly, you can multiply arrays by scalars which multiplies each element as you'd expect
>>> 2 * a
array([ 2, 4, 6, 8, 10])
As far as speed, here is the Python list equivalent of adding two lists
>>> c = [1,2,3,4,5]
>>> d = [2,3,4,5,6]
>>> [i+j for i,j in zip(c,d)]
[3, 5, 7, 9, 11]
Then timing the two
>>> from timeit import timeit
>>> setup = '''
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([2,3,4,5,6])'''
>>> timeit('a+b', setup)
0.521275608325351
>>> setup = '''
c = [1,2,3,4,5]
d = [2,3,4,5,6]'''
>>> timeit('[i+j for i,j in zip(c,d)]', setup)
1.2781205834379108
In this small example numpy was more than twice as fast.
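Applied to the hot spots in the question, an untested sketch (assuming times is a numpy array with times >= mintime, and testfreq, numbins, count, chisq as defined in the question):
scaled = (times - mintime) * testfreq[m]
phase = scaled - np.floor(scaled)           # fractional part, computed array-wise
phasebin = (phase * numbins).astype(int)    # vectorized int() truncation
# bin counts in one call; note the original counted only numbins-1 bins
powerarray = np.bincount(phasebin, minlength=numbins)
poweravg = powerarray.mean()
chisq[m] = ((powerarray - poweravg)**2 / poweravg).sum()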
for loop substitute - operating on complete arrays
First multiply phase by 2*pi*n using broadcasting
phase = np.arange(10)
maxn = 3
ens = np.arange(1, maxn+1) # array([1, 2, 3])
two_pi_ens = 2*np.pi*ens
b = phase * two_pi_ens[:, np.newaxis]
b.shape is (3, 10), one row for each value of range(1, maxn+1)
Take the cosine then sum to get the three cosine parameters
c = np.cos(b)
c_param = c.sum(axis = 1) # c_param.shape is 3
Take the sine then sum to get the three sine parameters
s = np.sin(b)
s_param = s.sum(axis = 1) # s_param.shape is 3
Sum of the squares divided by count
d = (np.square(c_param) + np.square(s_param)) / count
# d.shape is (3,)
Assign to z2
for n in range(maxn):
z2[m,n] = z2[m,:].sum() + d[n]
That loop is doing a cumulative sum. numpy ndarrays have a cumsum method.
If maxn is small (3 in your case) it may not be noticeably faster.
z2[m,:] += d
z2[m,:].cumsum(out = z2[m,:])
To illustrate:
>>> a = np.ones((3,3))
>>> a
array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])
>>> m = 1
>>> d = (1,2,3)
>>> a[m,:] += d
>>> a
array([[ 1., 1., 1.],
[ 2., 3., 4.],
[ 1., 1., 1.]])
>>> a[m,:].cumsum(out = a[m,:])
array([ 2., 5., 9.])
>>> a
array([[ 1., 1., 1.],
[ 2., 5., 9.],
[ 1., 1., 1.]])