I am trying to create a 3D image mat1 from the data given to me by an object, but I am getting an error on the last line, mat1[x,y,z] = mat[x,y,z] + (R**2/U**2)**pf1[l,m,beta]:
IndexError: too many indices for array
What could possibly be the problem here?
Following is my code:
import math
import numpy as np
from math import sin, cos

# mat is the 3D data array provided by the object
mat1 = np.zeros((1024,1024,360),dtype=np.int32)
k = 498
gamma = 0.00774267
R = 0.37
g = np.zeros(1024)
g[0:512] = np.linspace(0,1,512)
g[513:] = np.linspace(1,0,511)
pf = np.zeros((1024,1024,360))
pf1 = np.zeros((1024,1024,360))
for b in range(0,1023) :
    for beta in range(0,359) :
        for a in range(0,1023) :
            pf[a,b,beta] = (R/(((R**2)+(a**2)+(b**2))**0.5))*mat[a,b,beta]
        pf1[:,b,beta] = np.convolve(pf[:,b,beta],g,'same')
for x in range(0,1023) :
    for y in range(0,1023) :
        for z in range(0,359) :
            for beta in range(0,359) :
                a = R*((-x*0.005)*(sin(beta)) + (y*0.005)*(cos(beta)))/(R+ (x*0.005)*(cos(beta))+(y*0.005)*(sin(beta)))
                b = z*R/(R+(x*0.005)*(cos(beta))+(y*0.005)*(sin(beta)))
                U = R+(x*0.005)*(cos(beta))+(y*0.005)*(sin(beta))
                l = math.trunc(a)
                m = math.trunc(b)
                if (0<=l<1024 and 0<=m<1024) :
                    mat1[x,y,z] = mat[x,y,z] + (R**2/U**2)**pf1[l,m,beta]
The line where you do the convolution:
pf1 = np.convolve(pf[:,b,beta],g)
generates a 1-dimensional array, not the 3-dimensional array that your indexing in the last line, pf1[l,m,beta], expects.
To solve this you can use:
pf1[:,b,beta] = np.convolve(pf[:,b,beta],g,'same')
and you also need to predefine pf1:
pf1 = np.zeros((1024,1024,360))
Note that the convolution f*g (np.convolve(f,g)) normally returns an array of length |f|+|g|-1. If you use np.convolve with the parameter 'same', however, it returns an array whose length is the maximum of the lengths of f and g (i.e. max(|f|,|g|)).
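A quick self-contained check of the two output lengths (example arrays of my own, not from the question):
import numpy as np
f = np.arange(5.0)                # |f| = 5
g = np.array([1.0, 2.0, 3.0])     # |g| = 3
print(len(np.convolve(f, g)))            # 7 = |f| + |g| - 1
print(len(np.convolve(f, g, 'same')))    # 5 = max(|f|, |g|)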
Edit:
Furthermore you have to be sure that the dimensions of the matrices and the indices you use are correct, for example:
You define mat1 = np.zeros((100,100,100),dtype=np.int32), i.e. a 100x100x100 matrix, but in the last line you index mat1[x,y,z] where the variables x, y and z clearly go beyond those dimensions (they range over the dimensions of the mat matrix). You probably have to change the dimensions of mat1 to match:
mat1 = np.zeros((1024,1024,360),dtype=np.int32)
Also be sure that the last variable indices you calculate (l and m) are within the dimensions of pf1.
Edit 2: The range(a,b) function produces the integers from a up to, but not including, b. So instead of range(0,1023), for example, you should write range(0,1024) (or shorter: range(1024)).
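For example:
range(0, 1023)[-1]   # 1022 -- so index 1023 is never visited
range(0, 1024)[-1]   # 1023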
Edit 3: To check whether l or m exceed the dimensions, you could print a warning as soon as they do:
l = math.trunc(a)
if l>=1024:
    print 'l exceeded bounds: ',l
m = math.trunc(b)
if m>=1024:
    print 'm exceeded bounds: ',m
Edit 4: Note that your code, especially the last nested for loop, will take a long time! That loop alone results in 1024*1024*360*360 = 135,895,449,600 iterations. With a small timing estimate I did (measuring the running time of the body of your loop), your code might take about 5 days to run.
A small, easy optimization: instead of calculating the sin and cos several times, store the values in variables once:
sinbeta = sin(beta)
cosbeta = cos(beta)
but it will probably still take several days. You might want to look into optimizing your calculations further, or implement them in C, for example.
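As an illustration only (a hedged sketch, not tested on your data): the innermost beta loop can be replaced by numpy array operations, which removes 360 pure-Python iterations per voxel. It assumes beta is meant in radians, exactly as sin(beta)/cos(beta) is used in your current code, and it keeps your original expression (including the ** applied to pf1) unchanged:
import numpy as np
beta_arr = np.arange(360)                  # all projection angles at once
sin_b, cos_b = np.sin(beta_arr), np.cos(beta_arr)
# inside the x/y/z loops:
U = R + (x*0.005)*cos_b + (y*0.005)*sin_b
a = R*((-x*0.005)*sin_b + (y*0.005)*cos_b) / U
b = z*R / U
l = np.trunc(a).astype(int)
m = np.trunc(b).astype(int)
ok = (0 <= l) & (l < 1024) & (0 <= m) & (m < 1024)
contrib = (R**2 / U[ok]**2) ** pf1[l[ok], m[ok], beta_arr[ok]]
# your original code overwrites mat1[x,y,z] on every beta iteration;
# summing over beta is shown here as one plausible intent
mat1[x, y, z] = mat[x, y, z] + contrib.sum()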
I've been working on a basic simulation for "diffusion monte carlo" to find the ground state energy of the hydrogen molecule. There's a critical piece of the algorithm which is slowing my code down painfully, and I'm not sure how to fix it.
This is what the code is doing. I have a 6 by N numpy array called x. The array represents N random "walkers" which sample the 6 dimensional phase space (two electrons times 3 dimensions is 6 dimensions). I propose certain random changes to each "walker" to get my new "walker", and then using a formula I spit out a number "m" for each new walker.
The number m can either be 0,1,2,or 3. This is where the hard part comes in. If m is 0, then the "walker" it corresponds to is deleted from the array. If m is 1 then the walker remains. If m is 2 then the walker remains AND I have to make a new copy of the walker in the array. If m is 3 then the walker remains AND I have to make TWO new copies of the walker in the array. After this the code repeats; random changes are proposed to my array of walkers, etc.
So, the following is the code that's slowing down the algorithm. This is the final part, where I go through my m's, determine what to do with each "walker", and create my new array x to use for the next iteration of the algorithm.
j1 = 0
n1 = len(x[0,:])
x_n = np.ones((6,2*n1))
for i in range(0,n1):
    if m[i] == 1:
        x_n[:,j1] = x[:,i]
        j1 = j1 + 1
    if m[i] == 2:
        x_n[:,j1] = x[:,i]
        x_n[:,j1+1] = x[:,i]
        j1 = j1 + 2
    if m[i] == 3:
        x_n[:,j1] = x[:,i]
        x_n[:,j1+1] = x[:,i]
        x_n[:,j1+2] = x[:,i]
        j1 = j1 + 3
x = np.ones((6,j1))
for j in range(0,j1):
    x[:,j] = x_n[:,j]
My question is as follows; is there a way to do what I'm doing in this code using numpy itself? Numpy tends to be way faster than for loops in my experience. Using numpy directly in variational monte carlo simulations I was able to achieve a 100 fold improvement in run-time. If you'd like the full code to actually run the algorithm then I can post that; it's just fairly long.
Let M be an N x 1 array of the m values for each random walker, and let X be your original 6 x N data array.
# np.where returns a tuple of index arrays; take element 0 to get the indices themselves
zeros = np.where(M == 0)[0] # don't actually need this variable, I just did it for completeness
ones = np.where(M == 1)[0]
twos = np.where(M == 2)[0]
threes = np.where(M == 3)[0]
# use the index arrays to access the relevant columns of X
ones_array = X[:,ones]
twos_array = X[:,twos]
threes_array = X[:,threes]
# rebuild X with one copy where m = 1, two copies where m = 2, three copies where m = 3
X = np.concatenate((ones_array,twos_array,twos_array,threes_array,threes_array,threes_array),axis = 1)
This doesn't preserve the ordering of the walkers, so if that is important the code will be slightly different.
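If the ordering does matter, a hedged alternative (assuming M is the integer array of m values, as above) is np.repeat, which keeps each walker M[i] times in place, so m = 0 deletes a walker, m = 2 duplicates it, and so on:
import numpy as np
# repeat column i of X exactly M[i] times, preserving the original order
X = np.repeat(X, M, axis=1)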
I'm wondering if there is a faster way to do this.
"""
Structure
-data[]
-data[0]
-data[number, number, number, number, number, number, number]
- ... ect X 12000
-data[1]
-data[number, number, number, number, number, number, number]
- ... ect X 12000
-data[2]
-data[number, number, number, number, number, number, number]
- ... ect X 12000
-data[3]
-data[number, number, number, number, number, number, number]
- ... ect X 12000
x and y are the first two numbers in each data array.
"""
I need to scan each item in layers 1,2,3 against each item in the first layer (0) looking to see if they fall within a given search radius. This takes a while.
for i in range (len(data[0])):
    x = data[0][i][0]
    y = data[0][i][1]
    for x in range (len(data[1])):
        x1 = data[1][x][0]
        y1 = data[1][x][1]
        if( math.pow((x1 -x),2) + math.pow((y1 - y),2) < somevalue):
            matches1.append(data[0][i])
            matches2.append(data[1][x])
            continue
        else:
            continue
Thanks for any assistance!
First you should write more readable python code:
for x,y in data[0]:
    for x1, y1 in data[1]:
        if (x1 - x)**2 + (y1 - y)**2 < somevalue:
            matches1.append((x,y))
            matches2.append((x1,y1))
Then you can vectorize the inner loop with numpy:
for x,y in data[0]:
    x1, y1 = data[1].T
    indices = (x1 - x)**2 + (y1 - y)**2 < somevalue
    matches.append(((x,y), data[1][indices]))
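Note (my assumption, not stated in the original answer): both snippets take each layer reduced to its x, y columns, and the vectorized version additionally needs each layer as an (N, 2) numpy array, e.g.:
import numpy as np
data = [np.asarray(layer)[:, :2] for layer in data]   # keep only x and y of each row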
For this specific problem scipy.spatial.KDTree, or rather its Cython workalike scipy.spatial.cKDTree, would appear to be tailor-made:
import numpy as np
from scipy.spatial import cKDTree
# create some random data
data = np.random.random((4, 12000, 7))
# in each record discard all but x and y
data_xy = data[..., :2]
# build trees
trees = [cKDTree(d) for d in data_xy]
somevalue = 0.001
# find all close pairs between reference layer and other layers
pairs = []
for tree in trees[1:]:
    pairs.append(trees[0].query_ball_tree(tree, np.sqrt(somevalue)))
This example takes less than a second. Please note that the output format is different from the one your script produces. For each of the three non-reference layers it is a list of lists, where the inner list at index k contains the indices of the points that are close to point k in the reference layer.
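If you need the same kind of output as your original matches1/matches2 lists, a hedged sketch of the conversion for the first non-reference layer could look like this:
matches1, matches2 = [], []
for k, neighbours in enumerate(pairs[0]):   # layer 1 queried against reference layer 0
    for idx in neighbours:
        matches1.append(data_xy[0][k])
        matches2.append(data_xy[1][idx])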
I would suggest creating a function out of this and using the numba library with the decorator @jit(nopython=True).
Also, as suggested above, you should use numpy arrays, since numba focuses on utilizing numpy operations.
import math
from numba import jit

@jit(nopython=True)
def search(data, somevalue):
    matches1 = []
    matches2 = []
    for i in range(len(data[0])):
        x = data[0][i][0]
        y = data[0][i][1]
        for j in range(len(data[1])):
            x1 = data[1][j][0]
            y1 = data[1][j][1]
            if math.pow((x1 - x), 2) + math.pow((y1 - y), 2) < somevalue:
                matches1.append(data[0][i])
                matches2.append(data[1][j])
    return matches1, matches2

if __name__ == '__main__':
    # Initialize
    # import your data however.
    m1, m2 = search(data, somevalue)
The key is to make sure to only use the allowed functions supported by numba.
I have seen speed increases from 100x to ~300x.
This could also be a good place to use GPGPU computation. From Python you have PyCUDA and PyOpenCL, depending on your underlying hardware. OpenCL can also use some of the SIMD instructions on the CPU if you don't have a GPU.
If you don't want to go down the GPGPU road then numpy or numba would also be useful as mentioned before.
I have the following two arrays with shape:
A = (d,w,l)
B = (d,q)
And I want to combine these into a 3d array with shape:
C = (q,w,l)
To be a bit more specific: in my case d (the depth of the 3D array) is 2, and I'd first like to multiply all w * l positions in the upper layer of A (so d = 0) by the first value of B in the highest row (so d = 0, q = 0). For d = 1 I do the same, and then sum the two, so:
C_{q=0,w,l} = A_{d=0,w,l}*B_{d=0,q=0} + A_{d=1,w,l}*B_{d=1,q=0}
I wanted to calculate C by making use of numpy.einsum. I thought of the following code:
A = np.arange(100).reshape(2,10,5)
B = np.arange(18).reshape(2,9)
C = np.einsum('ijk,i -> mjk',A,B)
Where ijk refers to 2,10,5 and mjk refers to 9,10,5. However I get an error. Is there some way to perform this multiplication with numpy einsum?
Thanks
Your shapes A = (d,w,l), B = (d,q), C = (q,w,l) practically write the einsum expression
C=np.einsum('dwl,dq->qwl',A,B)
which I can test with
In [457]: np.allclose(A[0,:,:]*B[0,0]+A[1,:,:]*B[1,0],C[0,:,:])
Out[457]: True
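A minimal self-contained check with the shapes from the question (the example values are my own):
import numpy as np
A = np.arange(100).reshape(2, 10, 5)   # (d, w, l)
B = np.arange(18).reshape(2, 9)        # (d, q)
C = np.einsum('dwl,dq->qwl', A, B)
print(C.shape)                                           # (9, 10, 5)
print(np.allclose(A[0]*B[0, 0] + A[1]*B[1, 0], C[0]))    # True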
I have trouble passing 2D arrays to Fortran. I want to combine a bunch of non-overlapping spectra. First I select the points on the x-axis, then I interpolate all data to this new, common grid. I store the spectra in a 2D list in Python.
This works in Python 2.7, but is very slow:
for i in range(len(wlp)):
    print wlp[i],
    for a in range(len(datax)):
        inrange = 0
        if datax[a][0] >= wlp[i] or datax[a][-1] <= wlp[i]:
            for b in range(len(datax[a])-1):
                if float(datax[a][b]) <= wlp[i] and float(datax[a][b+1]) >= wlp[i]:
                    sp = float(datax[a][b]); ep = float(datax[a][b+1])
                    delx = ep-sp; dely = float(data[a][b+1])-float(data[a][b])
                    ji = (dely/delx)*(wlp[i]-sp)+float(data[a][b])
                    inrange = 1
        if inrange == 0: ji = '?0'
        else: ji = ji * weights[a]
        print ji,
    print
The common x-grid is printed in column one and all the interpolated spectra are printed in subsequent columns. If some shorter spectra are out of range, it prints "?0". This helps to set up proper weights for each data point later.
I ended up having this fortran subroutine to speed it up with f2py:
c wlp = x axis points (wavelength)
c lenwlp = length of list wlp, len(wlp)
c datay = 2D python list with flux
c datax = 2D python list with wavelength
c lendatax = number of spectra, len(datax)
c datax_pl = list of the lengths of all spectra
c weights = list of optional weights
c maxi = length of the longest spectrum
C============================================================================80
      SUBROUTINE DOIT(wlp,lenwlp,datay,datax,lendatax,datax_pl,
     .                weights,maxi)
C============================================================================80
      INTEGER I,a,b,lenwlp,inrange,datax_pl(*),maxi,lendatax
      DOUBLE PRECISION WLP(*),SP,EP,DELY,DELX,ji
      DOUBLE PRECISION WEIGHTS(*)
      DOUBLE PRECISION DATAY(lendatax,maxi)
      DOUBLE PRECISION DATAX(lendatax,maxi)
    2 FORMAT (E20.12, 1X, $)
    3 FORMAT (A, $)
    4 FORMAT (1X)
      I = 1
      DO WHILE (I.LE.lenwlp)
         WRITE(*,2) WLP(I)
         DO a=1,lendatax
            inrange = 0
            ji = 0.0
            IF (datax(a,1).ge.WLP(I) .or.
     .          datax(a,datax_pl(a)).le.WLP(I)) THEN
               DO b=1,datax_pl(a)-1
                  IF (DATAX(a,b).LE.WLP(I) .and.
     .                DATAX(a,b+1).GE.WLP(I)) THEN
                     SP = DATAX(a,b); EP = DATAX(a,b+1)
                     DELX = EP - SP; DELY = datay(a,b+1)-datay(a,b)
                     if (delx.eq.0.0) then
                        ji = datay(a,b)
                     else
                        ji = (DELY/DELX)*(WLP(I)-SP)+datay(a,b)
                     end if
                     inrange = 1
                  END IF
               END DO
            END IF
            IF (inrange.eq.0) THEN
               WRITE(*,3) ' ?0'
            ELSE
               WRITE(*,2) ji*WEIGHTS(a)
            END IF
         END DO
         I = I + 1
         write(*,4)
      END DO
      END
which compiles with gfortran 4.8 fine. Then I import it in the Python code, set up the lists and run the subroutine:
import subroutines
wlp = [...]
data = [[...],[...],[...]]
datax = [[...],[...],[...]]
datax_pl = [...]
weights = [...]
maxi = max(datax_pl)
subroutines.doit(wlp,len(wlp),data,datax,len(datax),datax_pl,weights,maxi)
and it returns:
ValueError: setting an array element with a sequence.
I pass the lists and the length of the longest spectrum (maxi); this should define the maximum dimension in Fortran (?).
I don't need return values, everything is printed on stdout.
The problem must be right at the beginning at the array declarations. I don't have experience with this... any advice is appreciated.
As I said in the comment, you cannot pass Python lists to f2py procedures. You MUST use numpy arrays, which are compatible with Fortran or C arrays.
The error message you show comes from this problem.
You can create an array from a list; see http://docs.scipy.org/doc/numpy/user/basics.creation.html
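A hedged sketch of that conversion, reusing the names from your script (it assumes the spectra are ragged and have to be padded out to maxi; order='F' matches the Fortran layout, though f2py will copy as needed):
import numpy as np
wlp_a   = np.asarray(wlp, dtype=np.float64)
datax_a = np.zeros((len(datax), maxi), dtype=np.float64, order='F')
datay_a = np.zeros((len(datax), maxi), dtype=np.float64, order='F')
for i in range(len(datax)):
    datax_a[i, :len(datax[i])] = datax[i]
    datay_a[i, :len(data[i])]  = data[i]
subroutines.doit(wlp_a, len(wlp), datay_a, datax_a, len(datax),
                 np.asarray(datax_pl, dtype=np.int32),
                 np.asarray(weights, dtype=np.float64), maxi)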
I have two matrices, both filled with zeros and ones. One is big (3000 x 2000 elements) and the other is smaller (20 x 20 elements). I am doing something like:
newMatrix = (size of bigMatrix), filled with zeros
l = (a constant)
for y in xrange(0, len(bigMatrix[0])):
    for x in xrange(0, len(bigMatrix)):
        for b in xrange(0, len(smallMatrix[0])):
            for a in xrange(0, len(smallMatrix)):
                if (bigMatrix[x, y] == smallMatrix[x + a - l, y + b - l]):
                    newMatrix[x, y] = 1
This is painfully slow. Am I doing anything wrong? Is there a smart way to make this faster?
edit: Basically I am, for each (x,y) in the big matrix, checking all the pixels of both big matrix and the small matrix around (x,y) to see if they are 1. If they are 1, then I set that value on newMatrix. I am doing a sort of collision detection.
I can think of a couple of optimisations there -
As you are using 4 nested python "for" statements, you are about as slow as you can be.
I can't figure out exactly what you are looking for -
but for one thing, if the density of 1s in your big matrix is low, you can certainly use an any() check on slices of bigMatrix to quickly see whether any elements are set there -- you could get a several-fold speed increase:
step = len(smallMatrix[0])
for y in xrange(0, len(bigMatrix[0]), step):
    for x in xrange(0, len(bigMatrix), step):
        if not bigMatrix[x: x+step, y: y+step].any():
            continue
        (...)
At this point, if you still need to iterate over each element, you add another pair of indices to walk each position inside the step - but I think you get the idea.
Apart from using inner numpy operations like this any() usage, you could certainly add some control-flow code to break off the (b, a) loop when the first matching pixel is found.
(Like inserting a "break" statement inside your last "if", and another if..break pair for the "b" loop, as sketched below.)
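Schematically, the early-exit pattern might look like this (a hedged sketch only; the small-matrix indexing is kept simple, since the intended window in your code is unclear):
hit = False
for b in xrange(0, len(smallMatrix[0])):
    for a in xrange(0, len(smallMatrix)):
        if bigMatrix[x, y] == smallMatrix[a, b]:   # or whatever the intended comparison is
            newMatrix[x, y] = 1
            hit = True
            break        # leave the "a" loop at the first match
    if hit:
        break            # leave the "b" loop as well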
I really can't figure out exactly what your intent is - so I can't give you more specific code.
Your example code makes no sense, but the description of your problem sounds like you are trying to do a 2D convolution of a small bit array over the big bit array. There's a convolve2d function in the scipy.signal package that does exactly this. Just do convolve2d(bigMatrix, smallMatrix) to get the result. Unfortunately the scipy implementation doesn't have a special case for boolean arrays, so the full convolution is rather slow. Here's a function that takes advantage of the fact that the arrays contain only ones and zeros:
import numpy as np

def sparse_convolve_of_bools(a, b):
    if a.size < b.size:
        a, b = b, a
    offsets = zip(*np.nonzero(b))
    n = len(offsets)
    dtype = np.byte if n < 128 else np.short if n < 32768 else np.int
    result = np.zeros(np.array(a.shape) + b.shape - (1,1), dtype=dtype)
    for o in offsets:
        result[o[0]:o[0] + a.shape[0], o[1]:o[1] + a.shape[1]] += a
    return result
On my machine it runs in less than 9 seconds for a 3000x2000 by 20x20 convolution. The running time depends on the number of ones in the smaller array, being 20ms per each nonzero element.
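For completeness, a hedged usage sketch of the convolve2d route mentioned above (mode='same' is my assumption, so the output stays the size of bigMatrix; any positive count is treated as a collision):
import numpy as np
from scipy.signal import convolve2d
hits = convolve2d(bigMatrix, smallMatrix, mode='same')
newMatrix = (hits > 0).astype(np.int8)   # 1 wherever the small pattern overlaps a 1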
If your bits are really packed 8 per byte / 32 per int,
and you can reduce your smallMatrix to 20x16,
then try the following, here for a single row.
(newMatrix[x, y] = 1 when any bit of the 20x16 around x,y is 1 ??
What are you really looking for ?)
python -m timeit -s '
""" slide 16-bit mask across 32-bit pairs bits[j], bits[j+1] """
import numpy as np
bits = np.zeros( 2000 // 16, np.uint16 )  # 2000 bits
bits[::8] = 1
mask = 32+16
nhit = 16 * [0]

def hit16( bits, mask, nhit ):
    """
    slide 16-bit mask across 32-bit pairs bits[j], bits[j+1]
    bits: long np.array( uint16 )
    mask: 16 bits, int
    out: nhit[j] += 1 where pair & mask != 0
    """
    left = bits[0]
    for b in bits[1:]:
        pair = (left << 16) | b
        if pair:  # np idiom for non-0 words ?
            m = mask
            for j in range(16):
                if pair & m:
                    nhit[j] += 1
                    # hitposition = jb*16 + j
                m <<= 1
        left = b
    # if any(nhit): print "hit16:", nhit
' \
'
hit16( bits, mask, nhit )
'
# 15 msec per loop, bits[::4] = 1
# 11 msec per loop, bits[::8] = 1
# mac g4 ppc