I've been working on a basic simulation for "diffusion monte carlo" to find the ground state energy of the hydrogen molecule. There's a critical piece of the algorithm which is slowing my code down painfully, and I'm not sure how to fix it.
This is what the code is doing. I have a 6 by N numpy array called x. The array represents N random "walkers" which sample the 6 dimensional phase space (two electrons times 3 dimensions is 6 dimensions). I propose certain random changes to each "walker" to get my new "walker", and then using a formula I spit out a number "m" for each new walker.
The number m can either be 0, 1, 2, or 3. This is where the hard part comes in. If m is 0, then the "walker" it corresponds to is deleted from the array. If m is 1 then the walker remains. If m is 2 then the walker remains AND I have to make a new copy of the walker in the array. If m is 3 then the walker remains AND I have to make TWO new copies of the walker in the array. After this the code repeats; random changes are proposed to my array of walkers, etc.
So; the following is the code that's slowing down the algorithm. This is the code for the final part where I have to go through my m's and determine what to do with each "walker", and create my new array x to use for the next iteration of the algorithm.
j1 = 0
n1 = len(x[0,:])
x_n = np.ones((6,2*n1))
for i in range(0,n1):
    if m[i] == 1:
        x_n[:,j1] = x[:,i]
        j1 = j1 + 1
    if m[i] == 2:
        x_n[:,j1] = x[:,i]
        x_n[:,j1+1] = x[:,i]
        j1 = j1 + 2
    if m[i] == 3:
        x_n[:,j1] = x[:,i]
        x_n[:,j1+1] = x[:,i]
        x_n[:,j1+2] = x[:,i]
        j1 = j1 + 3
x = np.ones((6,j1))
for j in range(0,j1):
    x[:,j] = x_n[:,j]
My question is as follows: is there a way to do what I'm doing in this code using numpy itself? Numpy tends to be way faster than for loops in my experience. Using numpy directly in variational monte carlo simulations I was able to achieve a 100-fold improvement in run-time. If you'd like the full code to actually run the algorithm then I can post that; it's just fairly long.
let M be an N x 1 array of the m values for each random walker.
let X be your original 6 x N data array
# np.where returns a tuple of index arrays; take element [0] to get the indices where the condition is satisfied
zeros = np.where(M == 0)[0] # don't actually need this variable, I just did it for completeness
ones = np.where(M == 1)[0]
twos = np.where(M == 2)[0]
threes = np.where(M == 3)[0]
# use the index arrays to access the relevant portions of X
ones_array = X[:,ones]
twos_array = X[:,twos]
threes_array = X[:,threes]
# update X with one copy where m = 1, two copies where m = 2, three copies where m = 3
X = np.concatenate((ones_array,twos_array,twos_array,threes_array,threes_array,threes_array),axis = 1)
This doesn't preserve the ordering of the walkers, so if that is important the code will be slightly different.
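If the ordering does matter, np.repeat offers a compact alternative (a minimal sketch of mine, not part of the original answer): it repeats each column of X according to its m value, so walkers with m = 0 are dropped and the surviving walkers keep their original order.
import numpy as np
# M is the array of m values (0, 1, 2 or 3), one per walker, as above
X_new = np.repeat(X, np.ravel(M), axis=1) # column i of X appears M[i] times, order preserved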
I have two ordered lists of consecutive integers m = 0, 1, ..., M and n = 0, 1, 2, ..., N. Each value of m has a probability pm, and each value of n has a probability pn. I am trying to find the ordered list of unique values r = m/n and their probabilities pr. I am aware that r is infinite if n = 0 and can even be undefined if m = n = 0.
In practice, I would like M and N each to be of the order of 2E4, meaning up to 4E8 values of r, which would mean about 3 GB of floats (assuming 8 bytes/float).
For this calculation, I have written the python code below.
The idea is to iterate over m and n, and for each new m/n, insert it in the right place with its probability if it isn't there yet, otherwise add its probability to the existing number. My assumption is that it is easier to sort things on the way instead of waiting until the end.
The cases related to 0 are added at the end of the loop.
I am using the Fraction class since we are dealing with fractions.
The code also tracks the multiplicity of each unique value of m/n.
I have tested up to M=N=100, and things are quite slow. Are there better approaches to the question, or more efficient ways to tackle the code?
Timing:
M=N=30: 1 s
M=N=50: 6 s
M=N=80: 30 s
M=N=100: 82 s
import numpy as np
from fractions import Fraction
import time # For timing
start_time = time.time() # Timing
M, N = 6, 4
mList, nList = np.arange(1, M+1), np.arange(1, N+1) # From 1 to M inclusive, deal with 0 later
mProbList, nProbList = [1/(M+1)]*(M), [1/(N+1)]*(N) # Probabilities, here assumed equal (not general case)
# Deal with mn=0 later
pmZero, pnZero = 1/(M+1), 1/(N+1) # P(m=0) and P(n=0)
pNaN = pmZero * pnZero # P(0/0) = P(m=0)P(n=0)
pZero = pmZero * (1 - pnZero) # P(0) = P(m=0)P(n!=0)
pInf = pnZero * (1 - pmZero) # P(inf) = P(m!=0)P(n=0)
# Main list of r=m/n, P(r) and mult(r)
# Start with first line, m=1
rList = [Fraction(mList[0], n) for n in nList[::-1]] # Smallest first
rProbList = [mProbList[0] * nP for nP in nProbList[::-1]] # Start with first line
rMultList = [1] * len(rList) # Multiplicity of each element
# Main loop
for m, mP in zip(mList[1:], mProbList[1:]):
    for n, nP in zip(nList[::-1], nProbList[::-1]): # Pick an n value
        r, rP, rMult = Fraction(m, n), mP*nP, 1
        for i in range(len(rList)-1): # See where it fits in existing list
            if r < rList[i]:
                rList.insert(i, r)
                rProbList.insert(i, rP)
                rMultList.insert(i, 1)
                break
            elif r == rList[i]:
                rProbList[i] += rP
                rMultList[i] += 1
                break
            elif r < rList[i+1]:
                rList.insert(i+1, r)
                rProbList.insert(i+1, rP)
                rMultList.insert(i+1, 1)
                break
            elif r == rList[i+1]:
                rProbList[i+1] += rP
                rMultList[i+1] += 1
                break
            if r > rList[-1]:
                rList.append(r)
                rProbList.append(rP)
                rMultList.append(1)
                break
# Deal with 0
rList.insert(0, Fraction(0, 1))
rProbList.insert(0, pZero)
rMultList.insert(0, N)
# Deal with infty
rList.append(np.inf)
rProbList.append(pInf)
rMultList.append(M)
# Deal with undefined case
rList.append(np.nan)
rProbList.append(pNaN)
rMultList.append(1)
print(".... done in %s seconds." % round(time.time() - start_time, 2))
print("************** Final list\nr", 'Prob', 'Mult')
for r, rP, rM in zip(rList, rProbList, rMultList): print(r, rP, rM)
print("************** Checks")
print("mList", mList, 'nList', nList)
print("Sum of proba = ", np.sum(rProbList))
print("Sum of multi = ", np.sum(rMultList), "\t(M+1)*(N+1) = ", (M+1)*(N+1))
Based on the suggestion of @Prune, and on this thread about merging lists of tuples, I have modified the code as below. It's a lot easier to read, and runs about an order of magnitude faster for N=M=80 (I have omitted dealing with 0 - it would be done the same way as in the original post). I assume there may be ways to tweak the merge and the conversion back to lists further yet.
# Do calculations
data = [(Fraction(m, n), mProb(m) * nProb(n)) for n in range(1, N+1) for m in range(1, M+1)]
data.sort()
# Merge duplicates using a dictionary
d = {}
for r, p in data:
    if r not in d: d[r] = [0, 0]
    d[r][0] += p
    d[r][1] += 1
# Convert back to lists
rList, rProbList, rMultList = [], [], []
for k in d:
    rList.append(k)
    rProbList.append(d[k][0])
    rMultList.append(d[k][1])
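One possible tweak along those lines (a sketch of mine, assuming the same data list of (Fraction, probability) pairs as above): collections.defaultdict removes the membership test, and a single zip unpacks the dictionary back into parallel sequences.
from collections import defaultdict
d = defaultdict(lambda: [0, 0])
for r, p in data:
    d[r][0] += p  # accumulate probability
    d[r][1] += 1  # accumulate multiplicity
# Unpack back into three parallel sequences in one pass (these come back as tuples, not lists)
rList, rProbList, rMultList = zip(*((r, v[0], v[1]) for r, v in d.items()))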
I expect that "things are quite slow" because you've chosen a known inefficient sort. A single list insertion is O(K) (later list elements have to be bumped over, and there is added storage allocation on a regular basis). Thus a full-list insertion sort is O(K^2). For your notation, that is O((M*N)^2).
If you want any sort of reasonable performance, research and use the best-know methods. The most straightforward way to do this is to make your non-exception results as a simple list comprehension, and use the built-in sort for your penultimate list. Simply append your n=0 cases, and you're done in O(K log K) time.
I the expression below, I've assumed functions for m and n probabilities.
This is a notational convenience; you know how to directly compute them, and can substitute those expressions if you wish.
data = [ (Fraction(m, n), mProb(m) * nProb(n))
         for n in range(1, N+1)
         for m in range(0, M+1) ]
data.sort()
data.extend([])  # generate your "zero" (n = 0) cases here
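The duplicate merge that this leaves implicit can then be done in one pass over the sorted list, for example with itertools.groupby (a sketch of mine, assuming data holds (r, probability) tuples as above):
from itertools import groupby
from operator import itemgetter
rList, rProbList, rMultList = [], [], []
for r, group in groupby(data, key=itemgetter(0)):
    probs = [p for _, p in group]  # all probabilities sharing this value of r
    rList.append(r)
    rProbList.append(sum(probs))
    rMultList.append(len(probs))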
I am writing Python code (and am also relatively new to Python) to do some classification using neural networks. The number of neurons in the hidden layers varies, which means I have several matrices of different sizes that I want to save.
If this were MATLAB, I would just do
for layer_iteration = 1 : number_of_layers
    cell_matrix{layer_iteration} = create_matrix(size_current, size_previous)
end
which would store different sized matrices in cell_matrix{layer_iteration}.
I wasn't quite sure how to do this in Python, so my first thought was to create the matrices at each iteration, flatten them, and append them. Then I create an index array of the same size as the flattened array, where each value is an integer that points to the correct layer_iteration. I can then use indexing in later operations to find the correct slice of the 1D array and reshape it. This is some of the code:
layer_sizes = [2, 10, 6, 2]
total_amount_of_values = 92 # e.g. 2*10 + 10*6 + 6*2
W = np.zeros(total_amount_of_values)
idx_W = np.zeros((total_amount_of_values,), dtype=int)
it_idx = 0
for idx, val in enumerate(layer_sizes):
    to_idx = idx + 1
    if idx == 0:
        start_idx = 0
    else:
        start_idx = end_idx
    if it_idx < len(layer_sizes)-1:
        tmp_W = create_matrix(val, layer_sizes[to_idx])
        end_idx = start_idx + layer_sizes[idx + 1] * val
        W[start_idx:end_idx] = tmp_W
        idx_W[start_idx:end_idx] = int(it_idx)
        it_idx += 1
If I want to use W for matrix multiplication later on, all I need to do is:
tmp_W = W[idx_W == iteration]
tmp_W = np.reshape(tmp_W, (size_1, size_2))
This works fine, but I've realised it doesn't seem Pythonic at all. I am now at a part of my code where using my technique is complicating things. I would really benefit from a simpler way of storing different-sized matrices, both for my current code and, more importantly, for my education in Python.
What would be the best solution?
Thanks!
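A minimal sketch of one common approach, a plain Python list holding NumPy arrays, which behaves much like a MATLAB cell array (np.random.randn stands in here for the create_matrix helper):
import numpy as np
layer_sizes = [2, 10, 6, 2]
# Each list entry can hold an array of a different shape
weights = []
for size_previous, size_current in zip(layer_sizes[:-1], layer_sizes[1:]):
    weights.append(np.random.randn(size_current, size_previous))
# The matrix for a given layer is simply weights[layer_iteration];
# no flattening, index bookkeeping, or reshaping is needed
tmp_W = weights[1]  # shape (6, 10)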
First time posting here, so here goes:
I have two sets of data (v and t), each with 46 values. The data is imported with the "pandas" module and converted to a numpy array in order to do the calculation.
I need to set ml_min1[45], ml_min2[45], and so on to the value 0. The problem is that each time I run the script, the values corresponding to position 45 of ml_min1 and ml_min2 are different. This is the piece of code that I have:
t1 = fil_copy.t1.as_matrix()
t2 = fil_copy.t2.as_matrix()
v1 = fil_copy.v1.as_matrix()
v2 = fil_copy.v2.as_matrix()
ml_min1 = np.empty(len(t1))
l_h1 = np.empty(len(t1))
ml_min2 = np.empty(len(t2))
l_h2 = np.empty(len(t2))
for i in range(0, (len(v1) - 1)):
    if (i != (len(v1) - 1)) and (v1[i+1] > v1[i]):
        ml_min1[i] = v1[i+1] - v1[i]
        l_h1[i] = ml_min1[i] * (60/1000)
    elif i == (len(v1)-1):
        ml_min1[i] = 0
        l_h1[i] = 0
        print(i, ml_min1[i])
    else:
        ml_min1[i] = 0
        l_h1[i] = 0
        print(i, ml_min1[i])
for i in range(0, (len(v2) - 1)):
    if (i != (len(v2) - 1)) and (v2[i+1] > v2[i]):
        ml_min2[i] = v2[i+1] - v2[i]
        l_h2[i] = ml_min2[i] * (60/1000)
    elif i == (len(v2)-1):
        ml_min2[i] = 0
        l_h2[i] = 0
        print(i, ml_min2[i])
    else:
        ml_min2[i] = 0
        l_h2[i] = 0
        print(i, ml_min2[i])
Your code as it is currently written doesn't work because the elif blocks are never hit, since range(0, x) does not include x (it stops just before getting there). The easiest way to solve this is probably just to initialize your output arrays with numpy.zeros rather than numpy.empty, since then you don't need to do anything in the elif and else blocks (you can just delete them).
That said, it's generally a design error to use loops like yours in numpy code. Instead, you should use numpy's broadcasting features to perform your mathematical operations to a whole array (or a slice of one) at once.
If I understand correctly, the following should be equivalent to what you wanted your code to do (just for one of the arrays, the other should work the same):
ml_min1 = np.zeros(len(t1)) # use zeros rather than empty, so we don't need to assign any 0s
diff = v1[1:] - v1[:-1] # find the differences between all adjacent values (using slices)
mask = diff > 0 # check which ones are positive (creates a Boolean array)
ml_min1[:-1][mask] = diff[mask] # assign with mask to a slice of the ml_min1 array
l_h1 = ml_min1 * (60/1000) # create l_h1 array with a broadcast scalar multiplication
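Equivalently (a sketch of mine, not part of the original answer), np.diff and np.maximum collapse the same logic into a couple of lines:
ml_min1 = np.zeros(len(t1))
ml_min1[:-1] = np.maximum(np.diff(v1), 0)  # np.diff(v1) is v1[1:] - v1[:-1]; negative steps become 0
l_h1 = ml_min1 * (60/1000)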
I am trying to create a 3D image mat1 from the data given to me by an object, but I am getting an error on the last line, mat1[x,y,z] = mat[x,y,z] + (R**2/U**2)**pf1[l,m,beta]:
IndexError: too many indices for array
What could possibly be the problem here?
Following is my code:
import math
from math import sin, cos
import numpy as np
# mat is the 3-D input array provided by the object mentioned above
mat1 = np.zeros((1024,1024,360),dtype=np.int32)
k = 498
gamma = 0.00774267
R = 0.37
g = np.zeros(1024)
g[0:512] = np.linspace(0,1,512)
g[513:] = np.linspace(1,0,511)
pf = np.zeros((1024,1024,360))
pf1 = np.zeros((1024,1024,360))
for b in range(0,1023):
    for beta in range(0,359):
        for a in range(0,1023):
            pf[a,b,beta] = (R/(((R**2)+(a**2)+(b**2))**0.5))*mat[a,b,beta]
        pf1[:,b,beta] = np.convolve(pf[:,b,beta],g,'same')
for x in range(0,1023):
    for y in range(0,1023):
        for z in range(0,359):
            for beta in range(0,359):
                a = R*((-x*0.005)*(sin(beta)) + (y*0.005)*(cos(beta)))/(R+ (x*0.005)*(cos(beta))+(y*0.005)*(sin(beta)))
                b = z*R/(R+(x*0.005)*(cos(beta))+(y*0.005)*(sin(beta)))
                U = R+(x*0.005)*(cos(beta))+(y*0.005)*(sin(beta))
                l = math.trunc(a)
                m = math.trunc(b)
                if (0<=l<1024 and 0<=m<1024):
                    mat1[x,y,z] = mat[x,y,z] + (R**2/U**2)**pf1[l,m,beta]
The line where you do the convolution:
pf1 = np.convolve(pf[:,b,beta],g)
generates a 1-dimensional array, not the 3-dimensional one that your indexing in the last line (pf1[l,m,beta]) expects.
To solve this you can use:
pf1[:,b,beta] = np.convolve(pf[:,b,beta],g,'same')
and you also need to predefine pf1:
pf1 = np.zeros((1024,1024,360))
Note that the convolution f*g (np.convolve(f,g)) normally returns an array of length |f|+|g|-1. If you use np.convolve with the parameter 'same', however, it returns an array whose length is the maximum of |f| and |g| (i.e. max(|f|,|g|)).
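A quick illustration of the two lengths (my example, not from the original answer):
import numpy as np
f = np.ones(5)
g = np.ones(3)
print(len(np.convolve(f, g)))          # 7, i.e. |f| + |g| - 1
print(len(np.convolve(f, g, 'same')))  # 5, i.e. max(|f|, |g|)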
Edit:
Furthermore you have to be sure that the dimensions of the matrices and the indices you use are correct, for example:
You define mat1 = np.zeros((100,100,100),dtype=np.int32), i.e. a 100x100x100 matrix, but in the last line you do mat1[x,y,z] where the variables x, y and z clearly go beyond those dimensions; they run over the range of the mat matrix instead. You probably have to change the dimensions of mat1 to match:
mat1 = np.zeros((1024,1024,360),dtype=np.int32)
Also be sure that the last variable indices you calculate (l and m) are within the dimensions of pf1.
Edit 2: The range(a,b) function returns a sequence from a up to, but not including, b. So instead of range(0,1023), for example, you should write range(0,1024) (or shorter: range(1024)).
Edit 3: To check if l or m exceed the dimensions you could add an error as soon as they do:
l = math.trunc(a)
if l>=1024:
print 'l exceeded bounds: ',l
m = math.trunc(b)
if m>=1024:
print 'm exceeded bounds: ',m
Edit 4: Note that your code, especially the last set of nested for loops, will take a long time! That nesting alone results in 1024*1024*360*360 = 135,895,449,600 iterations. From a small timing estimate I did (measuring the running time of the loop body), your code might take about 5 days to run.
A small, easy optimization is, instead of calculating the sin and cos several times, to create variables storing the values:
sinbeta = sin(beta)
cosbeta = cos(beta)
but it will probably still take several days. You might want to look into optimizing your calculations further or implementing them in a C program, for example.
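Taking that a step further (a sketch of mine, assuming beta is used directly as the angle argument, exactly as in the question), the sines and cosines of all 360 beta values can be precomputed once before the loops:
import numpy as np
betas = np.arange(360)
sin_table = np.sin(betas)  # use np.sin(np.radians(betas)) instead if beta is meant in degrees
cos_table = np.cos(betas)
# Inside the loops, replace sin(beta) and cos(beta) with sin_table[beta] and cos_table[beta]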
I have two matrices. Both are filled with zeros and ones. One is big (3000 x 2000 elements), and the other is smaller (20 x 20 elements). I am doing something like:
newMatrix = np.zeros(bigMatrix.shape)  # size of bigMatrix, filled with zeros
l = (a constant)
for y in xrange(0, len(bigMatrix[0])):
    for x in xrange(0, len(bigMatrix)):
        for b in xrange(0, len(smallMatrix[0])):
            for a in xrange(0, len(smallMatrix)):
                if (bigMatrix[x, y] == smallMatrix[x + a - l, y + b - l]):
                    newMatrix[x, y] = 1
Which is being painfully slow. Am I doing anything wrong? Is there a smart way to make this work faster?
edit: Basically I am, for each (x,y) in the big matrix, checking all the pixels of both big matrix and the small matrix around (x,y) to see if they are 1. If they are 1, then I set that value on newMatrix. I am doing a sort of collision detection.
I can think of a couple of optimisations there -
As you are using 4 nested python "for" statements, you are about as slow as you can be.
I can't figure out exactly what you are looking for -
but for one thing, if your big matrix's density of "1"s is low, you can certainly use the array's any() method on slices of bigMatrix to quickly check whether any elements are set there -- you could get a several-fold speed increase there:
step = len(smallMatrix[0])
for y in xrange(0, len(bigMatrix[0]), step):
    for x in xrange(0, len(bigMatrix), step):
        if not bigMatrix[x: x+step, y: y+step].any():
            continue
        (...)
At this point, if you still need to iterate over each element, you use another pair of indices to walk each position inside the step - but I think you get the idea.
Apart from using inner numeric operations like this any() usage, you could certainly add some control-flow code to break out of the (b,a) loop when the first matching pixel is found.
(Like inserting a "break" statement inside your last "if", and another if..break pair for the "b" loop.)
I really can't figure out exactly what your intent is - so I can't give you more specific code.
Your example code makes no sense, but the description of your problem sounds like you are trying to do a 2d convolution of a small bitarray over the big bitarray. There's a convolve2d function in scipy.signal package that does exactly this. Just do convolve2d(bigMatrix, smallMatrix) to get the result. Unfortunately the scipy implementation doesn't have a special case for boolean arrays so the full convolution is rather slow. Here's a function that takes advantage of the fact that the arrays contain only ones and zeroes:
import numpy as np
def sparse_convolve_of_bools(a, b):
    if a.size < b.size:
        a, b = b, a
    offsets = list(zip(*np.nonzero(b)))
    n = len(offsets)
    dtype = np.byte if n < 128 else np.short if n < 32768 else int
    result = np.zeros(np.array(a.shape) + b.shape - (1,1), dtype=dtype)
    for o in offsets:
        result[o[0]:o[0] + a.shape[0], o[1]:o[1] + a.shape[1]] += a
    return result
On my machine it runs in less than 9 seconds for a 3000x2000 by 20x20 convolution. The running time depends on the number of ones in the smaller array, at roughly 20 ms per nonzero element.
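A hedged usage sketch (random 0/1 data standing in for the real matrices, with the sizes from the question):
rng = np.random.default_rng(0)
bigMatrix = (rng.random((3000, 2000)) < 0.01).astype(np.byte)
smallMatrix = (rng.random((20, 20)) < 0.5).astype(np.byte)
result = sparse_convolve_of_bools(bigMatrix, smallMatrix)
overlap = result > 0  # True wherever the two bit patterns overlap at least once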
If your bits are really packed 8 per byte / 32 per int, and you can reduce your smallMatrix to 20x16, then try the following, here for a single row.
(newMatrix[x, y] = 1 when any bit of the 20x16 around x,y is 1?? What are you really looking for?)
python -m timeit -s '
""" slide 16-bit mask across 32-bit pairs bits[j], bits[j+1] """
import numpy as np
bits = np.zeros( 2000 // 16, np.uint16 )  # 2000 bits
bits[::8] = 1
mask = 32+16
nhit = 16 * [0]
def hit16( bits, mask, nhit ):
    """
    slide 16-bit mask across 32-bit pairs bits[j], bits[j+1]
    bits: long np.array( uint16 )
    mask: 16 bits, int
    out: nhit[j] += 1 where pair & mask != 0
    """
    left = bits[0]
    for b in bits[1:]:
        pair = (left << 16) | b
        if pair:  # np idiom for non-0 words ?
            m = mask
            for j in range(16):
                if pair & m:
                    nhit[j] += 1
                    # hitposition = jb*16 + j
                m <<= 1
        left = b
    # if any(nhit): print "hit16:", nhit
' \
'
hit16( bits, mask, nhit )
'
# 15 msec per loop, bits[::4] = 1
# 11 msec per loop, bits[::8] = 1
# mac g4 ppc