I am trying to use NumPy's vectorized operations to make a section of code run faster. However, I appear to have a misunderstanding of how to vectorize this code (probably due to an incomplete understanding of vectorization).
Here's the working code with loops (A and B are 2D arrays of a set size, already initialized):
for k in range(num_v):
    B[:] = A[:]
    for i in range(num_v):
        for j in range(num_v):
            A[i][j] = min(B[i][j], B[i][k] + B[k][j])
return A
And here is my attempt at vectorizing the above code:
for k in range(num_v):
    B = numpy.copy(A)
    A = numpy.minimum(B, B[:,k] + B[k,:])
return A
For testing these, I used the following, with the code above wrapped in a function called 'algorithm':
def setup_array(edges, num_v):
    r = range(1, num_v + 1)
    A = [[None for x in r] for y in r]  # or (numpy.ones((num_v, num_v)) * 1e10) for numpy
    for i in r:
        for j in r:
            val = 1e10
            if i == j:
                val = 0
            elif (i, j) in edges:
                val = edges[(i, j)]
            A[i-1][j-1] = val
    return A
A = setup_array({(1, 2): 2, (6, 4): 1, (3, 2): -3, (1, 3): 5, (3, 6): 5, (4, 5): 2, (3, 1): 4, (4, 3): 8, (3, 4): 6, (2, 4): -4, (6, 5): -5}, 6)
B = []
algorithm(A, B, 6)
The expected outcome, and what I get with the first code is:
[[0, 2, 5, -2, 0, 10]
[8, 0, 4, -4, -2, 9]
[4, -3, 0, -7, -5, 5]
[12, 5, 8, 0, 2, 13]
[10000000000.0, 9999999997.0, 10000000000.0, 9999999993.0, 0, 10000000000.0]
[13, 6, 9, 1, -5, 0]]
The second (vectorized) function instead returns:
[[ 0. -4. 0. 0. 0. 0.]
[ 0. -4. 0. -4. 0. 0.]
[ 0. -4. 0. 0. 0. 0.]
[ 0. -4. 0. 0. 0. 0.]
[ 0. -4. 0. 0. 0. 0.]
[ 0. -4. 0. 0. -5. 0.]]
What am I missing?
Usually you want to vectorize code because you think it is running too slow.
If your code is too slow, then I can tell you that proper indexing will make it faster.
Instead of A[i][j] you should write A[i, j] -- this avoids a transient copy of a (sub)array.
Since you do this in the inner-most loop of your code this might be very costly.
Look here:
In [37]: timeit test[2][2]
1000000 loops, best of 3: 1.5 us per loop
In [38]: timeit test[2,2]
1000000 loops, best of 3: 639 ns per loop
Do this consistently in your code -- I strongly believe that alone already solves your performance problem!
Having said that...
... here's my take on how to vectorize
for k in range(num_v):
    numpy.minimum(A, numpy.add.outer(A[:,k], A[k,:]), A)
return A
numpy.minimum compares two arrays and returns, element-wise, the smaller of the two elements. If you pass a third argument, it is used as the output array. If that output array is one of the inputs, the whole operation is done in place.
As Peter de Rivay explains, there is a problem in your solution with broadcasting -- but mathematically what you want to do is some kind of outer product over addition of two vectors. Therefore you can use the outer operation on the add function.
NumPy's binary ufuncs have special methods for performing certain kinds of vectorized operations, such as reduce, accumulate, and outer.
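For illustration, here is a minimal sketch of what numpy.add.outer produces for one value of k (the toy array below is made up for the example):
import numpy
A = numpy.arange(9.0).reshape(3, 3)    # toy data: [[0,1,2],[3,4,5],[6,7,8]]
k = 1
# out[i, j] = A[i, k] + A[k, j] -- the full pairwise sum needed in iteration k
print(numpy.add.outer(A[:, k], A[k, :]))
# [[ 4.  5.  6.]
#  [ 7.  8.  9.]
#  [10. 11. 12.]]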
The problem is caused by array broadcasting in the line:
A = numpy.minimum(B, B[:,k] + B[k,:])
B is size 6 by 6, B[:,k] is an array with 6 elements, B[k,:] is an array with 6 elements.
(Because you are using the numpy array type, both B[:,k] and B[k,:] return a rank-1 array of shape N)
Numpy automatically changes the sizes to match:
First B[:,k] is added to B[k,:] to make an intermediate array result with 6 elements. (This is not what you intended)
Second this 6 element array is broadcast to a 6 by 6 matrix by repeating the rows
Third the minimum of the original matrix and this broadcast matrix is computed.
This means that your numpy code is equivalent to:
for k in range(num_v):
    B[:] = A[:]
    C = [B[i][k] + B[k][i] for i in range(num_v)]
    for i in range(num_v):
        for j in range(num_v):
            A[i][j] = min(B[i][j], C[j])
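To see that broadcasting concretely, here is a small made-up 3x3 check of what B[:,k] + B[k,:] actually produces:
import numpy
B = numpy.arange(9.0).reshape(3, 3)    # [[0,1,2],[3,4,5],[6,7,8]]
k = 1
print(B[:, k] + B[k, :])               # 1-D element-wise sum, shape (3,): [ 4.  8. 12.]
print(numpy.minimum(B, B[:, k] + B[k, :]))
# the 1-D sum is repeated along the rows before the minimum is taken:
# [[0. 1. 2.]
#  [3. 4. 5.]
#  [4. 7. 8.]]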
The easiest way to fix your code is to use the matrix type instead of the array type:
A = numpy.matrix(A)
for k in range(num_v):
    A = numpy.minimum(A, A[:,k] + A[k,:])
The matrix type uses stricter broadcasting rules so in this case:
A[:,k] is extended to a 6 by 6 matrix by repeating columns
A[k,:] is extended to a 6 by 6 matrix by repeating rows
The broadcasted matrices are added together to make a 6 by 6 matrix
The minimum is applied
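If you would rather stay with the array type, a sketch of the same fix is to keep the column slice two-dimensional, so that an (n, 1) column plus an (n,) row broadcasts to the full (n, n) outer sum (assumes A and num_v from the question):
import numpy
A = numpy.array(A, dtype=float)
for k in range(num_v):
    A = numpy.minimum(A, A[:, k:k+1] + A[k, :])   # (n, 1) + (n,) broadcasts to (n, n)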
Related
I intend to remove slices from the third dimension of a 4D NumPy array if they contain only zeros.
I have a 4D NumPy array of dimensions [256,256,336,6] and I need to delete the slices in the third dimension that contain only zeros. So the result would have a shape like, e.g., [256,256,300,6] if 36 slices are all zeros. I have tried multiple approaches including for loops, np.delete, and the all()/any() functions, without success.
You need to reduce on all axes but the one you are interested in.
An example using np.any() where there are all-zero subarrays along the axis 1 (at position 0 and 2):
import numpy as np
a = np.ones((2, 3, 2, 3))
a[:, 0, :, :] = a[:, 2, :, :] = 0
mask = np.any(a, axis=(0, 2, 3))
new_a = a[:, mask, :, :]
print(new_a.shape)
# (2, 1, 2, 3)
print(new_a)
# [[[[1. 1. 1.]
# [1. 1. 1.]]]
#
#
# [[[1. 1. 1.]
# [1. 1. 1.]]]]
The same code parametrized and refactored as a function:
def remove_all_zeros(arr: np.ndarray, axis: int) -> np.ndarray:
    red_axes = tuple(i for i in range(arr.ndim) if i != axis)
    mask = np.any(arr, axis=red_axes)
    slicing = tuple(slice(None) if i != axis else mask for i in range(arr.ndim))
    return arr[slicing]
a = np.ones((2, 3, 2, 3))
a[:, 0, :, :] = a[:, 2, :, :] = 0
new_a = remove_all_zeros(a, 1)
print(new_a.shape)
# (2, 1, 2, 3)
print(new_a)
# [[[[1. 1. 1.]
# [1. 1. 1.]]]
#
#
# [[[1. 1. 1.]
# [1. 1. 1.]]]]
I'm not an aficionado with numpy, but does this do what you want?
I take the following small example matrix with 4 dimensions all full of 1s and then I set some slices to zero:
import numpy as np
a=np.ones((4,4,5,2))
The shape of a is:
>>> a.shape
(4, 4, 5, 2)
I will artificially set some of the slices in dimension 3 to zero:
a[:,:,0,:]=0
a[:,:,3,:]=0
I can find the indices of the slices that are not all zeros by calculating sums (not very efficient, perhaps!):
indices = [i for i in range(a.shape[2]) if a[:,:,i,:].sum() != 0]
>>> indices
[1, 2, 4]
So, in your general case you could do this:
indices = [i for i in range(a.shape[2]) if a[:,:,i,:].sum() != 0]
a_new = a[:, :, indices, :].copy()
Then the shape of a_new is:
>>> a_new.shape
(4, 4, 3, 2)
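For what it's worth, the same selection can also be done without the Python-level loop, using np.any as in the first answer (a sketch reusing the example array a above):
mask = a.any(axis=(0, 1, 3))   # True for slices along axis 2 that contain any non-zero value
a_new = a[:, :, mask, :]
print(a_new.shape)
# (4, 4, 3, 2)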
I've read the tf.scatter_nd documentation and run the example code for 1D and 3D tensors... and now I'm trying to do it for a 2D tensor. I want to 'interleave' the columns of two tensors. For 1D tensors, one can do this via
'''
We want to interleave elements of 1D tensors arr1 and arr2, where
arr1 = [10, 11, 12]
arr2 = [1, 2, 3, 4, 5, 6]
such that
desired result = [1, 2, 10, 3, 4, 11, 5, 6, 12]
'''
import tensorflow as tf
with tf.Session() as sess:
    updates1 = tf.constant([1,2,3,4,5,6])
    indices1 = tf.constant([[0], [1], [3], [4], [6], [7]])
    shape = tf.constant([9])
    scatter1 = tf.scatter_nd(indices1, updates1, shape)
    updates2 = tf.constant([10,11,12])
    indices2 = tf.constant([[2], [5], [8]])
    scatter2 = tf.scatter_nd(indices2, updates2, shape)
    result = scatter1 + scatter2
    print(sess.run(result))
(aside: is there a better way to do this? I'm all ears.)
This gives the output
[ 1 2 10 3 4 11 5 6 12]
Yay! that worked!
Now let's try to extend this to 2D.
'''
We want to interleave the *columns* (not rows; rows would be easy!) of
arr1 = [[1,2,3,4,5,6],[1,2,3,4,5,6],[1,2,3,4,5,6]]
arr2 = [[10, 11, 12], [10, 11, 12], [10, 11, 12]]
such that
desired result = [[1,2,10,3,4,11,5,6,12],[1,2,10,3,4,11,5,6,12],[1,2,10,3,4,11,5,6,12]]
'''
updates1 = tf.constant([[1,2,3,4,5,6],[1,2,3,4,5,6],[1,2,3,4,5,6]])
indices1 = tf.constant([[0], [1], [3], [4], [6], [7]])
shape = tf.constant([3, 9])
scatter1 = tf.scatter_nd(indices1, updates1, shape)
This gives the error
ValueError: The outer 1 dimensions of indices.shape=[6,1] must match the outer 1
dimensions of updates.shape=[3,6]: Dimension 0 in both shapes must be equal, but
are 6 and 3. Shapes are [6] and [3]. for 'ScatterNd_2' (op: 'ScatterNd') with
input shapes: [6,1], [3,6], [2].
Seems like my indices is specifying row indices instead of column indices, and given the way that arrays are "connected" in numpy and tensorflow (i.e. row-major order), does that mean
I need to explicitly specify every single pair of indices for every element in updates1?
Or is there some kind of 'wildcard' specification I can use for the rows? (Note indices1 = tf.constant([[:,0], [:,1], [:,3], [:,4], [:,6], [:,7]]) gives syntax errors, as it probably should.)
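For reference, specifying every single pair of indices would mean building an index tensor of shape [num_rows, num_cols, 2], which can be generated with tf.meshgrid rather than written by hand. A sketch (untested, same TF 1.x session style as above, reusing updates1):
rows = tf.range(3)                                   # row indices 0..2
col_map1 = tf.constant([0, 1, 3, 4, 6, 7])           # target columns for updates1
rr, cc = tf.meshgrid(rows, col_map1, indexing='ij')  # both shaped [3, 6]
indices1_full = tf.stack([rr, cc], axis=-1)          # shape [3, 6, 2]: one (row, col) pair per element
scatter1 = tf.scatter_nd(indices1_full, updates1, shape=[3, 9])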
Would it be easier to just do a transpose, interleave the rows, then transpose back?
Because I tried that...
scatter1 = tf.scatter_nd(indices1, tf.transpose(updates1), tf.transpose(shape))
print(sess.run(tf.transpose(scatter1)))
...and got a much longer error message, that I don't feel like posting unless someone requests it.
PS- I searched to make sure this isn't a duplicate -- I find it hard to imagine that someone else hasn't asked this before -- but turned up nothing.
This is pure slicing, but I didn't know that syntax like arr1[0:,:][:,:2] actually works. It seems it does, but I'm not sure if it is better.
This may be the wildcard slicing mechanism you are looking for.
arr1 = tf.constant([[1,2,3,4,5,6],[1,2,3,4,5,7],[1,2,3,4,5,8]])
arr2 = tf.constant([[10, 11, 12], [10, 11, 12], [10, 11, 12]])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(tf.concat([arr1[0:,:][:,:2], arr2[0:,:][:,:1],
                              arr1[0:,:][:,2:4], arr2[0:,:][:,1:2],
                              arr1[0:,:][:,4:6], arr2[0:,:][:,2:3]], axis=1)))
Output is
[[ 1 2 10 3 4 11 5 6 12]
[ 1 2 10 3 4 11 5 7 12]
[ 1 2 10 3 4 11 5 8 12]]
So, for example,
arr1[0:,:] returns
[[1 2 3 4 5 6]
[1 2 3 4 5 7]
[1 2 3 4 5 8]]
and arr1[0:,:][:,:2] returns the first two columns
[[1 2]
[1 2]
[1 2]]
The concat axis is 1.
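As a side note, the chained slice arr1[0:,:][:,:2] should be equivalent to the single slice arr1[:, :2], so the concat can be written a bit more compactly (a sketch; same output expected):
result = tf.concat([arr1[:, 0:2], arr2[:, 0:1],
                    arr1[:, 2:4], arr2[:, 1:2],
                    arr1[:, 4:6], arr2[:, 2:3]], axis=1)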
Some moderators might have regarded my question as a duplicate of this one, not because the questions are the same, but only because the answers contain parts one can use to answer this question -- i.e. specifying every index combination by hand.
A totally different method would be to multiply by a permutation matrix as shown in the last answer to this question. Since my original question was about scatter_nd, I'm going to post this solution but wait to see what other answers come in... (Alternatively, I or someone could edit the question to make it about reordering columns, not specific to scatter_nd --EDIT: I have just edited the question title to reflect this).
Here, we concatenate the two different arrays/tensors...
import numpy as np
import tensorflow as tf
sess = tf.Session()
# the ultimate application is for merging variables which should be in groups,
# e.g. in this example, [1,2,10] is a group of 3, and there are 3 groups of 3
n_groups = 3
vars_per_group = 3 # once the single value from arr2 (below) is included
arr1 = 10+tf.range(n_groups, dtype=float)
arr1 = tf.stack((arr1,arr1,arr1),0)
arr2 = 1+tf.range(n_groups * (vars_per_group-1), dtype=float)
arr2 = tf.stack((arr2,arr2,arr2),0)
catted = tf.concat((arr1,arr2),1) # concatenate the two arrays together
print("arr1 = \n",sess.run(arr1))
print("arr2 = \n",sess.run(arr2))
print("catted = \n",sess.run(catted))
Which gives output
arr1 =
[[10. 11. 12.]
[10. 11. 12.]
[10. 11. 12.]]
arr2 =
[[1. 2. 3. 4. 5. 6.]
[1. 2. 3. 4. 5. 6.]
[1. 2. 3. 4. 5. 6.]]
catted =
[[10. 11. 12. 1. 2. 3. 4. 5. 6.]
[10. 11. 12. 1. 2. 3. 4. 5. 6.]
[10. 11. 12. 1. 2. 3. 4. 5. 6.]]
Now we build the permutation matrix and multiply...
start_index = 2  # location of where the interleaving begins

# cml = "column map list" is the list of where each column will get mapped to
cml = [start_index + x*(vars_per_group) for x in range(n_groups)]  # first array
for i in range(n_groups):  # second array
    cml += [x + i*(vars_per_group) for x in range(start_index)]  # vars before start_index
    cml += [1 + x + i*(vars_per_group) + start_index
            for x in range(vars_per_group-start_index-1)]  # vars after start_index
print("\n cml = ", cml, "\n")

# Create a permutation matrix using cml
np_perm_mat = np.zeros((len(cml), len(cml)))
for idx, i in enumerate(cml):
    np_perm_mat[idx, i] = 1
perm_mat = tf.constant(np_perm_mat, dtype=float)

result = tf.matmul(catted, perm_mat)
print("result = \n", sess.run(result))
Which gives output
cml = [2, 5, 8, 0, 1, 3, 4, 6, 7]
result =
[[ 1. 2. 10. 3. 4. 11. 5. 6. 12.]
[ 1. 2. 10. 3. 4. 11. 5. 6. 12.]
[ 1. 2. 10. 3. 4. 11. 5. 6. 12.]]
Even though this doesn't use scatter_nd as the original question asked, one thing I like about this is, you can allocate the perm_mat once in some __init__() method, and hang on to it, and after that initial overhead it's just matrix-matrix multiplication by a sparse, constant matrix, which should be pretty fast. (?)
Still happy to wait and see what other answers might come in.
...and that reference comes from a separate matrix.
This question is an extension of an earlier answered question where the reference element came directly from the same column it was being compared against. Some clever sorting and referencing the index of the sort seemed to solve that one.
Broadcasting has been suggested in both the original and this new question. I run out of memory at around n ~ 3000 and need another order of magnitude larger yet.
The Target ( Production-grade ) Scaling Definitions:
So as to keep the proposed solutions' approaches fair and mutually comparable, both in the [SPACE]- and the [TIME]-domains,
let's assume n = 50000; m = 20; k = 50; a = np.random.rand( n, m ); ...
I'm now interested in a more general form where the reference value comes from another matrix of reference values.
Original question:
Vectorized pythonic way to get count of elements greater than current element
New question: Can we write a vectorized form to perform the following role.
Function receives as input 2 2-d arrays.
A = n x m
B = k x m
and returns
C = k x m.
C[i,j] is the proportion of observations in A[:,j] ( just the j-th column ) that are larger than B[i,j]
Here is my embarrassingly slow double for loop implementation.
import numpy as np
n = 100
m = 20
k = 50
a = np.random.rand(n,m)
b = np.random.rand(k,m)
c = np.zeros((k,m))
for j in range(0, m):  # cols
    for i in range(0, k):  # rows
        r = b[i, j]
        c[i, j] = ((a[:, j] > r).sum()) / (n)
Approach #1
We could again use the argsort trick as discussed in this solution, but in a bit twisted manner. We would concatenate the second array into the first array and then perform argsort-ing. We need to use argsort for both the concatenated array and the second one to get our desired output. The implementation would look something like this -
ab = np.vstack((a,b))
len_a, len_b = len(a), len(b)
b_argoffset = b.argsort(0).argsort(0)
total_args = ab.argsort(0).argsort(0)[-len_b:]
out = len_a - total_args + b_argoffset
Explanation
Concatenate the second array (whose counts are to be computed) onto the first array.
Now, since we are appending, those elements' index positions come after the first array's length.
We use one argsort to get the relative positions of the second array w.r.t. the entire concatenated array, and one more argsort to trace back those indices w.r.t. the original order.
We need to repeat the double argsort-ing for the second array on itself, so as to compensate for the concatenation.
These indices are for each element in b with the comparison: a[:,j] > b[i,j]. Now, these index orders are 0-based, i.e. an index closer to 0 represents a greater number of elements in a[:,j] larger than the current element b[i,j], so a greater count, and vice versa. So, we need to subtract those indices from the length of a[:,j] for the final output.
Approach #1 - Improvement
We would optimize it further by using array-assignment, again inspired by Approach #2 from the same solution. So, those arg outputs : b_argoffset and total_args could be alternatively computed, like so -
def unqargsort(a):
    n, m = a.shape
    idx = a.argsort(0)
    out = np.zeros((n, m), dtype=int)
    out[idx, np.arange(m)] = np.arange(n)[:, None]
    return out
b_argoffset = unqargsort(b)
total_args = unqargsort(ab)[-len_b:]
Approach #2
We could also leverage searchsorted for an altogether different approach -
k,m = b.shape
sidx = a.argsort(0)
out = np.empty((k,m), dtype=int)
for i in range(m): #cols
    out[:,i] = np.searchsorted(a[:,i], b[:,i], sorter=sidx[:,i])
out = len(a) - out
Explanation
We get the sorted order indices for each column of a.
Then, use those indices to get how we could place values of b into the sorted a with searchsorted. This gives us the same as the output from steps 3 and 4 in Approach #1.
Note that these approaches give us the count. So, for the final output, divide the output thus obtained by n.
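As a quick sanity check of Approach #2 against the original double loop (a sketch; side='right' is used so ties are not counted as "greater", which does not matter for continuous random data):
import numpy as np

np.random.seed(0)
n, m, k = 100, 20, 50
a = np.random.rand(n, m)
b = np.random.rand(k, m)

sidx = a.argsort(0)
out = np.empty((k, m), dtype=int)
for i in range(m):
    out[:, i] = np.searchsorted(a[:, i], b[:, i], sorter=sidx[:, i], side='right')
c_fast = (len(a) - out) / n

c_loop = np.array([[(a[:, j] > b[i, j]).sum() / n for j in range(m)]
                   for i in range(k)])
print(np.allclose(c_fast, c_loop))   # True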
I think you can use broadcasting:
c = (a[:,None,:] > b).mean(axis=0)
Demo:
In [207]: n = 5
In [208]: m = 3
In [209]: a = np.random.randint(10, size=(n,m))
In [210]: b = np.random.randint(10, size=(n,m))
In [211]: c = np.zeros((n,m))
In [212]: a
Out[212]:
array([[2, 2, 8],
[5, 0, 8],
[2, 5, 7],
[4, 4, 4],
[2, 6, 7]])
In [213]: b
Out[213]:
array([[3, 6, 8],
[2, 7, 5],
[8, 9, 2],
[9, 8, 7],
[2, 7, 2]])
In [214]: for j in range(0,m): #cols
     ...:     for i in range(0,n): # rows
     ...:         r = b[i,j]
     ...:         c[i,j] = ( ( a[:,j] > r).sum() ) / (n)
...:
...:
In [215]: c
Out[215]:
array([[0.4, 0. , 0. ],
[0.4, 0. , 0.8],
[0. , 0. , 1. ],
[0. , 0. , 0.4],
[0.4, 0. , 1. ]])
In [216]: (a[:,None,:] > b).mean(axis=0)
Out[216]:
array([[0.4, 0. , 0. ],
[0.4, 0. , 0.8],
[0. , 0. , 1. ],
[0. , 0. , 0.4],
[0.4, 0. , 1. ]])
check:
In [217]: ((a[:,None,:] > b).mean(axis=0) == c).all()
Out[217]: True
The following code does exactly what I want, which is to compute the pairwise sum of squares of differences between elements of a vector (length three in the example), of which I have a long series (limited to five here). The desired result is shown at the bottom.
But the implementation feels kludgy for two reasons:
1) the need to add a phantom dimension, changing the shape from (5, 3) to (5,1,3) to avoid broadcast problems, and
2) the apparent necessity of an explicit 'for' loop, which I'm sure is why it's taking hours to execute on my much larger data set (a million vectors of length 2904).
Is there a more efficient and/or pythonic way to achieve the same result?
a = np.array([[ 4, 2, 3], [-1, -5, 4], [ 2, 1, 4], [-5, -1, 4], [6, -3, 3]])
a = a.reshape((5,1,3))
m = a.shape[0]
n = a.shape[2]
d = np.zeros((n,n))
for i in range(m):
    c = a[i,:] - np.transpose(a[i,:])
    c = c**2
    d += c
print d
[[ 0. 118. 120.]
[ 118. 0. 152.]
[ 120. 152. 0.]]
If you don't mind the dependency on scipy, you can use functions from the scipy.spatial.distance library:
In [17]: from scipy.spatial.distance import pdist, squareform
In [18]: a = np.array([[ 4, 2, 3], [-1, -5, 4], [ 2, 1, 4], [-5, -1, 4], [6, -3, 3]])
In [19]: d = pdist(a.T, metric='sqeuclidean')
In [20]: d
Out[20]: array([ 118., 120., 152.])
In [21]: squareform(d)
Out[21]:
array([[ 0., 118., 120.],
[ 118., 0., 152.],
[ 120., 152., 0.]])
You could eliminate the for-loop by using:
In [48]: ((a - a.swapaxes(1,2))**2).sum(axis=0)
Out[48]:
array([[ 0, 118, 120],
[118, 0, 152],
[120, 152, 0]])
Note that if a has shape (N, 1, M) then (a - a.swapaxes(1,2)) has shape (N, M, M). Make sure you have enough RAM to accommodate an array of this size. Page swapping can also slow the calculation to a crawl.
If you do have too little memory, you will have to break up the calculation in chunks:
m, _, n = a.shape
chunksize = 10**4
d = np.zeros((n,n))
for i in range(0, m, chunksize):
    b = a[i:i+chunksize]
    d += ((b - b.swapaxes(1,2))**2).sum(axis=0)
This is a compromise between performing the calculation on the entire array and
calculating row-by-row. If there are a million rows, and the chunksize is 10**4, then there will be only 100 iterations of the loop instead of a million.
Thus, it should be significantly faster than calculating row-by-row. Choose the largest value of chunksize you can which allows the calculation to be performed in RAM.
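As a rough way to size chunksize for a given memory budget (a sketch assuming float64 data and the vector length from the question; the 8 GB budget is just an example):
M = 2904                      # vector length from the question
budget_bytes = 8 * 10**9      # assume roughly 8 GB available for the intermediate
bytes_per_row = M * M * 8     # one (M, M) float64 block per row in the chunk
chunksize = budget_bytes // bytes_per_row
print(chunksize)              # about 118 rows per chunk for this M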
I have a numpy array consisting of a lot of 0s and a few non-zero entries e.g. like this (just a toy example):
myArray = np.array([[ 0. , 0. , 0.79],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0.435 , 0. ]])
Now I would like to move each of the non-zero entries with a given probability which means that some of the entries are moved, some might remain at the current position. Some of the rows are not allowed to contain a non-zero entry which means that values are not allowed to be moved there. I implemented that as follows:
import numpy as np
# for reproducibility
np.random.seed(2)
myArray = np.array([[ 0. , 0. , 0.79],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0.435 , 0. ]])
# list of rows where numbers are not allowed to be moved to
ignoreRows = [2]
# moving probability
probMove = 0.3
# get non-zero entries
nzEntries = np.nonzero(myArray)
# indices of the non-zero entries as tuples
indNZ = zip(nzEntries[0], nzEntries[1])
# store values
valNZ = [myArray[i] for i in indNZ]
# generating probabilities for moving for each non-zero entry
lProb = np.random.rand(len(valNZ))
allowedRows = [ind for ind in xrange(myArray.shape[0]) if ind not in ignoreRows] # replace by "range" in python 3.x
allowedCols = [ind for ind in xrange(myArray.shape[1])] # replace by "range" in python 3.x
for indProb, prob in enumerate(lProb):
    # only move with a certain probability
    if prob <= probMove:
        # randomly change position
        myArray[np.random.choice(allowedRows), np.random.choice(allowedCols)] = valNZ[indProb]
        # set old position to zero
        myArray[indNZ[indProb]] = 0.
print myArray
First, I determine all the indices and values of the non-zero entries. Then I assign a certain probability to each of these entries which determines whether the entry will be moved. Then I get the allowed target rows.
In the second step, I loop through the list of indices and move them according to their moving probability which is done by choosing from the allowed rows and columns, assigning the respective value to these new indices and set the "old" value to 0.
It works fine with the code above, however, speed really matters in this case and I wonder whether there is a more efficient way of doing this.
EDIT:
Hpaulj's answer helped me to get rid of the for-loop which is nice and the reason why I accepted his answer. I incorporated his comments and posted an answer below as well, just in case someone else stumbles over this example and wonders how I used his answer in the end.
You can index elements with arrays, so:
valNZ=myArray[nzEntries]
can replace the zip and comprehension.
Simplify these 2 assignments:
allowedCols=np.arange(myArray.shape[1]);
allowedRows=np.delete(np.arange(myArray.shape[0]), ignoreRows)
With:
I = lProb < probMove; valNZ = valNZ[I]; indNZ = indNZ[I]
you don't need to perform the prob <= probMove test each time in the loop; just iterate over valNZ and indNZ.
I think your random.choice can be generated for all of these valNZ at once:
np.random.choice(np.arange(10), 10, True)
# 10 choices from the range with replacement
With that it should be possible to move all of the points without a loop.
I haven't worked out the details yet.
There is one way in which your iterative move will be different from any parallel one. If a destination choice is another value, the iterative approach can overwrite, and possibly move a given value a couple of times. Parallel code will not perform the sequential moves. You have to decide whether one is correct or not.
There is a ufunc method, .at, that performs unbuffered operations. It works for operations like add, but I don't know if it would apply to an indexing move like this.
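For illustration, here is a minimal sketch of the buffered vs. unbuffered behaviour that .at refers to, using np.add; whether this helps with the move itself is a separate question:
import numpy as np

a = np.zeros(5)
idx = np.array([0, 0, 1, 4])
a[idx] += 1            # buffered: the repeated index 0 is only incremented once
print(a)               # [1. 1. 0. 0. 1.]

b = np.zeros(5)
np.add.at(b, idx, 1)   # unbuffered: repeated indices accumulate
print(b)               # [2. 1. 0. 0. 1.]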
simplified version of the iterative moving:
In [106]: arr=np.arange(20).reshape(4,5)
In [107]: I=np.nonzero(arr>10)
In [108]: v=arr[I]
In [109]: rows,cols=np.arange(4),np.arange(5)
In [110]: for i in range(len(v)):
   .....:     dest=(np.random.choice(rows),np.random.choice(cols))
   .....:     arr[dest]=v[i]
   .....:     arr[I[0][i],I[1][i]] = 0
In [111]: arr
Out[111]:
array([[ 0, 18, 2, 14, 11],
[ 5, 16, 7, 13, 19],
[10, 0, 0, 0, 0],
[ 0, 17, 0, 0, 0]])
possible vectorized version:
In [117]: dest=(np.random.choice(rows,len(v),True),np.random.choice(cols,len(v),True))
In [118]: dest
Out[118]: (array([1, 1, 3, 1, 3, 2, 3, 0, 0]), array([3, 0, 0, 1, 2, 3, 4, 0, 1]))
In [119]: arr[dest]
Out[119]: array([ 8, 5, 15, 6, 17, 13, 19, 0, 1])
In [120]: arr[I]=0
In [121]: arr[dest]=v
In [122]: arr
Out[122]:
array([[18, 19, 2, 3, 4],
[12, 14, 7, 11, 9],
[10, 0, 0, 16, 0],
[13, 0, 15, 0, 17]])
If arr[I] is set to 0 after the assignment to dest, there are more zeros:
In [124]: arr[dest]=v
In [125]: arr[I]=0
In [126]: arr
Out[126]:
array([[18, 19, 2, 3, 4],
[12, 14, 7, 11, 9],
[10, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0]])
same dest, but done iteratively:
In [129]: for i in range(len(v)):
.....: arr[dest[0][i],dest[1][i]] = v[i]
.....: arr[I[0][i],I[1][i]] = 0
In [130]: arr
Out[130]:
array([[18, 19, 2, 3, 4],
[12, 14, 7, 11, 9],
[10, 0, 0, 16, 0],
[ 0, 0, 0, 0, 0]])
With this small size, and high moving density, the differences between iterative and vectorized solutions are large. For a sparse array they would be smaller.
Below you can find the code I came up with after incorporating hpaulj's answer and the answer from this question. This way, I got rid of the for-loop which improved the code a lot. Therefore, I accepted hpaulj's answer. Maybe the code below helps someone else in a similar situation.
import numpy as np
from itertools import compress
# for reproducibility
np.random.seed(2)
myArray = np.array([[ 0. , 0.2 , 0.79],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0.435 , 0. ]])
# list of rows where numbers are not allowed to be moved to
ignoreRows= []
# moving probability
probMove = 0.5
# get non-zero entries
nzEntries = np.nonzero(myArray)
# indices of the non-zero entries as tuples
indNZ = zip(nzEntries[0],nzEntries[1])
# store values
valNZ = myArray[nzEntries]
# generating probabilities for moving for each non-zero entry
lProb = np.random.rand(len(valNZ))
# get the rows/columns where the entries are allowed to be moved
allowedCols = np.arange(myArray.shape[1]);
allowedRows = np.delete(np.arange(myArray.shape[0]), ignoreRows)
# get the entries that are actually moved
I = lProb < probMove
print I
# get the values of the entries that are moved
valNZ = valNZ[I]
# get the indices of the entries that are moved
indNZ = list(compress(indNZ, I))
# get the destination for the entries that are moved
dest = (np.random.choice(allowedRows, len(valNZ), True), np.random.choice(allowedCols, len(valNZ), True))
print myArray
print indNZ
print dest
# set the old indices to 0
myArray[zip(*indNZ)] = 0
# move the values to their respective destination
myArray[dest] = valNZ
print myArray
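A small side note in case someone runs this under Python 3 and a recent NumPy: apart from the print statements, the line myArray[zip(*indNZ)] = 0 no longer works there, because zip() returns a lazy iterator and NumPy has deprecated interpreting a list of index sequences as a multi-dimensional index. A hedged sketch of the replacement:
# replacement for: myArray[zip(*indNZ)] = 0
rows_old, cols_old = zip(*indNZ)               # the surviving row indices and column indices
myArray[list(rows_old), list(cols_old)] = 0    # index with separate row/column sequences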