I would like to set some values of a 2D array to a specific number by indexing them efficiently.
Say I have a 2D numpy array,
A = array([[1, 6, 6],
[9, 7, 7],
[10, 2, 2]])
and I would like to get the indices in the array that belong to a set of numbers, say indList=[10, 1] so that I can set them to zero. However, indList can be a huge list.
Is there a faster way for doing this without a for loop?
As a for loop it would be,
indList = [10, 1]
for i in indList:
    A[A==i] = 0
But this can get inefficient when indList is large.
With numpy, you can vectorize this by first finding the indices of elements that are in indList and then setting them to be zero.
import numpy as np

A = np.array([[1, 6, 6],
              [9, 7, 7],
              [10, 2, 2]])
A[np.where(np.isin(A, [10, 1]))] = 0
This gives
A = [[0 6 6]
[9 7 7]
[0 2 2]]
Following on from @Miket25's answer, there is actually no need to add the np.where layer. np.isin(A, [10, 1]) returns a boolean array, which is perfectly acceptable as an index. So simply do
A[np.isin(A, [10, 1])] = 0
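As a quick sanity check, the sketch below runs the loop from the question and the np.isin mask on copies of the same array and compares the results. The names loop_result and mask_result are only for illustration.

```python
import numpy as np

A = np.array([[1, 6, 6],
              [9, 7, 7],
              [10, 2, 2]])
indList = [10, 1]

# Loop version from the question, run on a copy
loop_result = A.copy()
for i in indList:
    loop_result[loop_result == i] = 0

# Vectorized version: one boolean mask covers every value in indList
mask_result = A.copy()
mask_result[np.isin(mask_result, indList)] = 0

print(np.array_equal(loop_result, mask_result))  # True
print(mask_result)
```

The mask is built in a single pass over A, so the cost no longer grows with one full array comparison per entry of indList.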
A related question about slicing NumPy arrays.
Say I have an array:
A = np.array([1,2,3,4,5,6,7,8,9]).reshape(3,3)
[1 2 3]
[4 5 6]
[7 8 9]
and indices:
idx = [2,2,1]
and I want to slice each row up to its index value, i.e. [:2] in the first row, [:2] in the second, and [:1] in the third. I would also like to sum the slices as I go.
I know I can achieve this doing the following:
for i,a in zip(idx,A):
    print(a[:i],sum(a[:i]))
output:
[1 2] 3
[4 5] 9
[7] 7
Is there any way this could be achieved without a for loop? The main focus is the irregular slicing; the sum is just an arbitrary operation I want to perform.
Something like:
A[:,:idx]
just to give context to what I mean
You could create a matrix of column indices and build a mask by checking whether each index is within the required range for its row.
idx = np.repeat(np.arange(0,3), 3, 0).reshape(3,3).T
row_limits = np.array([[2], [2], [1]])
mask = idx < row_limits
masked_A = np.multiply(A, mask)
# masked_A outputs:
array([[1, 2, 0],
[4, 5, 0],
[7, 0, 0]])
and then apply sum along axis=1
masked_A.sum(1)
# outputs: array([3, 9, 7])
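The same mask can also be built with broadcasting instead of np.repeat. This is a minimal sketch of that variant, not the answer's original code; np.where is used here in place of np.multiply to zero out the entries outside each row's limit.

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)
idx = np.array([2, 2, 1])

# Column numbers (shape (3,)) compared against per-row limits (shape (3, 1))
# broadcast to a (3, 3) boolean mask.
mask = np.arange(A.shape[1]) < idx[:, None]

# Keep values inside each row's limit, zero elsewhere, then sum per row
row_sums = np.where(mask, A, 0).sum(axis=1)
print(row_sums)  # [3 9 7]
```

Zero-filling only works for operations where 0 is neutral, such as a sum. For something like a product or a minimum, a different fill value would be needed.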
I have a numpy array like the following:
Xtrain = np.array([[1, 2, 3],
[4, 5, 6],
[1, 7, 3]])
I want to shuffle the items of each row separately, but do not want the shuffle to be the same for each row (as in several examples just shuffle column order).
For example, I want an output like the following:
output = np.array([[3, 2, 1],
[4, 6, 5],
[7, 3, 1]])
How can I shuffle each of the rows independently in an efficient way? My actual array has over 100000 rows and 1000 columns.
Since you only want to shuffle the columns, you can just perform the shuffle on the transpose of your matrix:
In [86]: np.random.shuffle(Xtrain.T)
In [87]: Xtrain
Out[87]:
array([[2, 3, 1],
[5, 6, 4],
[7, 3, 1]])
Note that np.random.shuffle() on a 2D array shuffles the rows, not the items within each row, i.e. it changes the position of the rows. Therefore, if you shuffle the rows of the transposed matrix, you are actually shuffling the columns of your original array.
If you still want a completely independent shuffle you can create random indexes for each row and then create the final array with a simple indexing:
In [172]: def crazyshuffle(arr):
     ...:     x, y = arr.shape
     ...:     rows = np.indices((x,y))[0]
     ...:     cols = [np.random.permutation(y) for _ in range(x)]
     ...:     return arr[rows, cols]
     ...:
Demo:
In [173]: crazyshuffle(Xtrain)
Out[173]:
array([[1, 3, 2],
[6, 5, 4],
[7, 3, 1]])
In [174]: crazyshuffle(Xtrain)
Out[174]:
array([[2, 3, 1],
[4, 6, 5],
[1, 3, 7]])
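If a recent NumPy is available (assumption: version 1.20 or later, where Generator.permuted was added), an independent per-row shuffle may not need a hand-written helper at all. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng()
Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])

# permuted returns a shuffled copy; with axis=1 each row is
# permuted independently of the others
shuffled = rng.permuted(Xtrain, axis=1)
print(shuffled)
```

Xtrain itself is left untouched here, since permuted returns a copy unless an out array is passed.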
From: https://github.com/numpy/numpy/issues/5173
def disarrange(a, axis=-1):
    """
    Shuffle `a` in-place along the given axis.
    Apply numpy.random.shuffle to the given axis of `a`.
    Each one-dimensional slice is shuffled independently.
    """
    b = a.swapaxes(axis, -1)
    # Shuffle `b` in-place along the last axis. `b` is a view of `a`,
    # so `a` is shuffled in place, too.
    shp = b.shape[:-1]
    for ndx in np.ndindex(shp):
        np.random.shuffle(b[ndx])
    return
This solution is not efficient by any means, but I had fun thinking about it, so I wrote it down. Basically, you ravel the array and create an array of row labels and an array of indices. You shuffle the index array and index both the original and the row label arrays with it. Then you apply a stable argsort to the row labels to gather the data back into rows. Apply that index, reshape, and voilà, the data is shuffled independently by row:
import numpy as np
r, c = 3, 4 # x.shape
x = np.arange(12) + 1 # Already raveled
inds = np.arange(x.size)
rows = np.repeat(np.arange(r).reshape(-1, 1), c, axis=1).ravel()
np.random.shuffle(inds)
x = x[inds]
rows = rows[inds]
inds = np.argsort(rows, kind='mergesort')
x = x[inds].reshape(r, c)
We can create a random 2-dimensional matrix, sort it by each row, and then use the index matrix given by argsort to reorder the target matrix.
target = np.random.randint(10, size=(5, 5))
# [[7 4 0 2 5]
# [5 6 4 8 7]
# [6 4 7 9 5]
# [8 6 6 2 8]
# [8 1 6 7 3]]
shuffle_helper = np.argsort(np.random.rand(5,5), axis=1)
# [[0 4 3 2 1]
# [4 2 1 3 0]
# [1 2 3 4 0]
# [1 2 4 3 0]
# [1 2 3 0 4]]
target[np.arange(shuffle_helper.shape[0])[:, None], shuffle_helper]
# array([[7, 5, 2, 0, 4],
# [7, 4, 6, 8, 5],
# [4, 7, 9, 5, 6],
# [6, 6, 8, 2, 8],
# [1, 6, 7, 8, 3]])
Explanation
We use np.random.rand and argsort to mimic the effect of shuffling.
np.random.rand provides the randomness.
Then we use argsort with axis=1 to rank the values within each row. This produces the index matrix that is used for reordering.
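The reordering step can also be written with np.take_along_axis, which saves building the column of row numbers by hand. The sketch below checks that both forms agree; the names by_index and by_take are illustrative only.

```python
import numpy as np

target = np.arange(25).reshape(5, 5)
shuffle_helper = np.argsort(np.random.rand(5, 5), axis=1)

# Fancy indexing with an explicit column of row numbers...
by_index = target[np.arange(target.shape[0])[:, None], shuffle_helper]
# ...and the equivalent take_along_axis call
by_take = np.take_along_axis(target, shuffle_helper, axis=1)

print(np.array_equal(by_index, by_take))  # True
```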
Let's say you have an array a with shape 100000 x 1000.
b = np.random.choice(100000 * 1000, (100000, 1000), replace=False)
ind = np.argsort(b, axis=1)
a_shuffled = a[np.arange(100000)[:,np.newaxis], ind]
I don't know if this is faster than a loop, because it needs sorting, but it may be a starting point for something better, for example using np.argpartition instead of np.argsort.
You may use Pandas:
df = pd.DataFrame(Xtrain)
# apply returns a new DataFrame; df itself is left unchanged
shuffled = df.apply(lambda x: np.random.permutation(x), axis=1, raw=True)
shuffled.values
Change the keyword to axis=0 if you want to shuffle columns.
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
I want to get
[[1, 2],
[4, 5],
[7, 8]]
but
ndarray[:][0:2]
gives
array([[1, 2, 3],
[4, 5, 6]])
Why?
ndarray[:] returns the whole array unchanged, and applying [0:2] to that returns an array with its first 2 elements (rows), hence [[1,2,3],[4,5,6]].
What you want to do is this: ndarray[0:3,0:2] or, more simply, ndarray[:,:2]
This returns a slice of the array taken along both dimensions at once.
ndarray[:] gives you the whole array, and with the following [0:2] you select the first two elements of it, which are [1,2,3] and [4,5,6].
You need to slice (as DavidG already suggested) in the first bracket for all desired dimensions:
ndarray[:,0:2]
P.S.: Please change your title to something that relates more closely to your problem.
Your code:
import numpy as np
a = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(a[:])
Output: the same as a, since you are selecting all of the rows:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Indexing that result again applies the new index to it, i.e. to all of the rows (a[:][0:2] is equivalent to a[0:2], which selects the first two rows).
To select the first two columns:
print(a[:, 0:2])
Output (as expected):
[[1 2]
 [4 5]
 [7 8]]
I would recommend going through Numpy's indexing documentation: https://docs.scipy.org/doc/numpy/user/basics.indexing.html
I have the following problem:
Let's say I have an array defined like this:
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
What I would like to do is to make use of Numpy multiple indexing and set several elements to 0. To do that I'm creating a vector:
indices_to_remove = [1, 2, 0]
What I want it to mean is the following:
Remove element with index '1' from the first row
Remove element with index '2' from the second row
Remove element with index '0' from the third row
The result should be the array [[1,0,3],[4,5,0],[0,8,9]]
I've managed to get the values of the elements I would like to modify with the following code:
values = np.diagonal(np.take(A, indices_to_remove, axis=1))
However, that doesn't allow me to modify them. How could this be solved?
You could use integer array indexing to assign those zeros -
A[np.arange(len(indices_to_remove)), indices_to_remove] = 0
Sample run -
In [445]: A
Out[445]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [446]: indices_to_remove
Out[446]: [1, 2, 0]
In [447]: A[np.arange(len(indices_to_remove)), indices_to_remove] = 0
In [448]: A
Out[448]:
array([[1, 0, 3],
[4, 5, 0],
[0, 8, 9]])
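The same pair of index arrays works for reading as well as writing, so it can stand in for the np.diagonal(np.take(...)) expression from the question. A minimal sketch, where rows is just a helper name for illustration:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
indices_to_remove = [1, 2, 0]
rows = np.arange(len(indices_to_remove))

# Reading: positions (0, 1), (1, 2), (2, 0) are picked element by element
values = A[rows, indices_to_remove]
print(values)  # [2 6 7]

# Writing through the same index modifies A in place
A[rows, indices_to_remove] = 0
print(A.tolist())  # [[1, 0, 3], [4, 5, 0], [0, 8, 9]]
```

Reading with integer arrays returns a copy, which is why modifying the earlier values array had no effect on A, whereas assigning through the index writes into A directly.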
I'm trying to iterate through a 2D array getting the sum for each list inside the array. For example I have:
test = [[5, 3, 6], [2, 1, 3], [1, 1, 3], [2, 6, 6], [4, 5, 3], [3, 6, 2], [5, 5, 2], [4, 4, 4], [3, 5, 3], [1, 3, 4]]
I want to take the values of each smaller array, so for example 5+3+6 and 2+1+3 and put them into a new array. So I'm aiming for something like:
testSum = [14, 6, 5, 14...].
I'm having trouble properly enumerating through a 2D array. It seems to jump around. I know my code's not correct, but this is what I have so far:
k = 10
m = 3
testSum = []
#create array with 10 arrays of length 3
test = [[numpy.random.randint(1,7) for i in range(m)] for j in range(k)]
sum = 0
#go through each sub-array in test array
for array in test:
    #add sums of sub-arrays
    for i in array:
        sum += test[array][i]
    testSum.append(sum)
You can do this in a more Pythonic way:
In [17]: print([sum(i) for i in test])
[14, 6, 5, 14, 12, 11, 12, 12, 11, 8]
or
In [19]: print(list(map(sum, test)))
[14, 6, 5, 14, 12, 11, 12, 12, 11, 8]
Since you're using Numpy, you should let Numpy handle the looping: it's much more efficient than using explicit Python loops.
import numpy as np
k = 10
m = 3
test = np.random.randint(1, 7, size=(k, m))
print(test)
print('- ' * 20)
testSum = np.sum(test, axis=1)
print(testSum)
typical output
[[2 5 1]
[1 5 5]
[6 5 3]
[1 1 1]
[2 5 6]
[4 2 5]
[3 3 1]
[6 4 6]
[2 5 1]
[6 5 2]]
- - - - - - - - - - - - - - - - - - - -
[ 8 11 14 3 13 11 7 16 8 13]
As for the code you posted, it has a few problems. The main one being that you need to set the sum variable to zero for each sub-list. BTW, you shouldn't use sum as a variable name because that shadows Python's built-in sum function.
Also, your array access is wrong. (And you shouldn't use array as a variable name either, since it's the name of a standard module).
for array in test:
    for i in array:
iterates over each list in test and then over each item in each of those lists. That means i is already an item of an inner list, so in
sum += test[array][i]
you are attempting to index the test list with a list instead of an integer, and then you're trying to index the result of that with the current item in i.
(In other words, in Python, when you iterate over a container object in a for loop the loop variable takes on the values of the items in the container, not their indices. This may be confusing if you are coming from a language where the loop variable gets the indices of those items. If you want the indices you can use the built-in enumerate function to get the indices and items at the same time).
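For illustration, here is a small sketch of enumerate on a nested list. The names row_index, col_index and sums are hypothetical, and only the first three sub-lists of the question's data are used.

```python
test = [[5, 3, 6], [2, 1, 3], [1, 1, 3]]
sums = []

for row_index, row in enumerate(test):
    total = 0
    for col_index, item in enumerate(row):
        # Indexing with the two integers gives back the same item
        # that the loop variable already holds
        assert test[row_index][col_index] == item
        total += item
    sums.append(total)

print(sums)  # [14, 6, 5]
```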
Here's a repaired version of your code.
import numpy as np
k = 10
m = 3
#create array with 10 arrays of length 3
test = [[np.random.randint(1,7) for i in range(m)] for j in range(k)]
print(test)
print()
testSum = []
#go through each sub-array in test array
for array in test:
    #add sums of sub-arrays
    asum = 0
    for i in array:
        asum += i
    testSum.append(asum)
print(testSum)
typical output
[[4, 5, 1], [3, 6, 6], [3, 4, 1], [2, 1, 1], [1, 6, 4], [3, 4, 4], [3, 2, 6], [6, 3, 2], [1, 3, 5], [5, 3, 3]]
[10, 15, 8, 4, 11, 11, 11, 11, 9, 11]
As I said earlier, it's much better to use Numpy arrays and let Numpy do the looping for you. However, if your program is only processing small lists there's no need to use Numpy: just use the functions in the standard random module to generate your random numbers and use the technique shown in Rahul K P's answer to calculate the sums: it's more compact and faster than using a Python loop.
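A minimal sketch of that pure-Python route, assuming the same k and m as above. Note that random.randint includes both endpoints, unlike np.random.randint, so the upper bound here is 6 rather than 7.

```python
import random

k = 10  # number of sub-lists
m = 3   # items per sub-list

# random.randint(1, 6) covers the same values as np.random.randint(1, 7),
# whose upper bound is exclusive
test = [[random.randint(1, 6) for _ in range(m)] for _ in range(k)]
testSum = [sum(row) for row in test]

print(test)
print(testSum)
```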