Apply same permutation for every row in a 2D numpy array - python

To permute a 1D array A I know that you can run the following code:
import numpy as np
A = np.random.permutation(A)
I have a 2D array and want to apply exactly the same permutation to every row of the array. Is there a way to tell NumPy to do that?

Generate a random permutation of the column indices of A and use it to index the columns of A, like so -
A[:,np.random.permutation(A.shape[1])]
Sample run -
In [100]: A
Out[100]:
array([[3, 5, 7, 4, 7],
[2, 5, 2, 0, 3],
[1, 4, 3, 8, 8]])
In [101]: A[:,np.random.permutation(A.shape[1])]
Out[101]:
array([[7, 5, 7, 4, 3],
[3, 5, 2, 0, 2],
[8, 4, 3, 8, 1]])
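For a repeatable version of the same idea, here is a minimal sketch that uses NumPy's Generator API instead of the legacy np.random functions. The seed value and the names rng, perm and shuffled are illustrative assumptions, not part of the answer above.

```python
import numpy as np

# Seeded generator so the shuffle is repeatable (the seed value is arbitrary).
rng = np.random.default_rng(0)

A = np.array([[3, 5, 7, 4, 7],
              [2, 5, 2, 0, 3],
              [1, 4, 3, 8, 8]])

# One permutation of the column indices, shared by all rows.
perm = rng.permutation(A.shape[1])
shuffled = A[:, perm]

# Column k of the result is column perm[k] of the input, for every row.
print(perm)
print(shuffled)
```

Keeping perm around also lets you apply the identical reordering to other arrays with the same number of columns.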

Actually, you do not need to do this. From the documentation:
If x is a multi-dimensional array, it is only shuffled along its first
index.
So, taking Divakar's array:
a = np.array([
[3, 5, 7, 4, 7],
[2, 5, 2, 0, 3],
[1, 4, 3, 8, 8]
])
you can just do: np.random.permutation(a) and get something like:
array([[2, 5, 2, 0, 3],
[3, 5, 7, 4, 7],
[1, 4, 3, 8, 8]])
P.S. If you need to permute the columns instead (which is what applies the same permutation to every row), just do np.random.permutation(a.T).T. Similar things apply to multi-dimensional arrays.

It depends on what you mean by "every row".
If you want to permute all values (regardless of row and column), reshape your array to 1D, permute, and reshape back to 2D.
If you want to permute each row while keeping every element in its original row, you need to loop over the first axis and call permutation on each row. Note that this gives every row its own, independent permutation:
for i in range(len(A)):
    A[i] = np.random.permutation(A[i])
It can probably be done more concisely, but that is how it can be done.
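The different readings of "permute every row" can be put side by side in a short sketch. It assumes a NumPy version that provides Generator.permuted. The example array, the seed and the variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability
A = np.arange(12).reshape(3, 4)

# Case 1: shuffle all values, ignoring rows and columns.
all_shuffled = rng.permutation(A.ravel()).reshape(A.shape)

# Case 2: shuffle each row independently (a different order per row).
rows_shuffled = rng.permuted(A, axis=1)

# Case 3: the same column order for every row.
same_perm = A[:, rng.permutation(A.shape[1])]

print(all_shuffled)
print(rows_shuffled)
print(same_perm)
```

Only the third case keeps the columns intact, which seems to be what the question asks for.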

Related

Numpy Array: Slice several values at every step

I am trying to extract several values at once from an array but I can't seem to find a way to do it in a one-liner in Numpy.
Simply put, considering an array:
a = numpy.arange(10)
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
I would like to be able to extract, say, 2 values, skip the next 2, extract the 2 following values etc. This would result in:
array([0, 1, 4, 5, 8, 9])
This is an example but I am ideally looking for a way to extract x values and skip y others.
I thought this could be done with slicing, doing something like:
a[:2:2]
but it only returns 0, which is the expected behavior.
I know I could obtain the expected result by combining several slicing operations (similarly to Numpy Array Slicing) but I was wondering if I was not missing some numpy feature.
If you want to avoid creating copies and allocating new memory, you could use a sliding window view of two elements:
win = np.lib.stride_tricks.sliding_window_view(a, 2)
array([[0, 1],
[1, 2],
[2, 3],
[3, 4],
[4, 5],
[5, 6],
[6, 7],
[7, 8],
[8, 9]])
And then only take every 4th window view:
win[::4].ravel()
array([0, 1, 4, 5, 8, 9])
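Or go directly with the more dangerous as_strided, but heed the warnings in the documentation. Strides are given in bytes, and hard-coded values such as (32, 8) are only correct for 8-byte elements, so they are derived from a.strides here:
np.lib.stride_tricks.as_strided(a, shape=(3,2), strides=(4*a.strides[0], a.strides[0]))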
You can use a modulo operator:
x = 2 # keep
y = 2 # skip
out = a[np.arange(a.shape[0])%(x+y)<x]
Output: array([0, 1, 4, 5, 8, 9])
Output with x = 2 ; y = 3:
array([0, 1, 5, 6])
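The modulo approach can be wrapped in a small helper. This is only a sketch: the function name take_skip is hypothetical, and it assumes a 1D input.

```python
import numpy as np

def take_skip(a, x, y):
    """Keep x values, skip the next y, and repeat along a 1D array."""
    idx = np.arange(a.shape[0])
    # Position within each block of length x + y; keep the first x positions.
    return a[idx % (x + y) < x]

a = np.arange(10)
print(take_skip(a, 2, 2))  # [0 1 4 5 8 9]
print(take_skip(a, 2, 3))  # [0 1 5 6]
```

Unlike the window-view variant, boolean masking returns a copy, but it works for any array length.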

Appending to the rows and columns to Multi dimensional Arrays Numpy Python

I am trying to build a multi-dimensional array by appending, with 4 random numbers per row. The code below does not work. How can I fix it?
import numpy as np
import random
Array = np.array([[]])
for i in range(3):
    for k in range(4):
        Array[i][k]= np.append(Array[i][k], random.randint(0,9))
Expected Output:
[[1,3,4,8],
[2,3,6,4],
[7,4,1,5],
[8,3,1,1]]
Don't do this. It is highly inefficient to try to create an array like this incrementally using np.append. If you must do something like this, use a list and then convert the resulting list to a numpy.ndarray later.
However, in this case, you simply want:
>>> import numpy as np
>>> np.random.randint(0, 10, (3,4))
array([[0, 3, 7, 4],
[6, 4, 2, 2],
[4, 4, 0, 6]])
Or perhaps:
>>> np.random.randint(0, 10, (4,4))
array([[8, 8, 2, 7],
[3, 7, 2, 1],
[5, 5, 5, 5],
[6, 2, 7, 9]])
Note, np.random.randint has an exclusive end, so if you want to draw from numbers [0, 9] you need to use 9+1 as the end.
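If the values really do have to be produced one at a time, the list-then-convert route mentioned above might look like the following sketch. The 3 x 4 shape is taken from the loops in the question. The variable names are illustrative.

```python
import random
import numpy as np

n_rows, n_cols = 3, 4  # shape taken from the loops in the question

rows = []
for _ in range(n_rows):
    # random.randint includes both end points, unlike np.random.randint.
    rows.append([random.randint(0, 9) for _ in range(n_cols)])

arr = np.array(rows)  # convert once, at the end
print(arr)
```

Appending to a Python list is cheap, whereas np.append copies the whole array on every call.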

How to loop back to beginning of the array for out of bounds index in numpy?

I have a 2D numpy array that I want to extract a submatrix from.
I get the submatrix by slicing the array as below.
Here I want a 3*3 submatrix around an item at the index of (2,3).
>>> import numpy as np
>>> a = np.array([[0, 1, 2, 3],
... [4, 5, 6, 7],
... [8, 9, 0, 1],
... [2, 3, 4, 5]])
>>> a[1:4, 2:5]
array([[6, 7],
[0, 1],
[4, 5]])
But what I want is that for indexes that are out of range, it goes back to the beginning of array and continues from there. This is the result I want:
array([[6, 7, 4],
[0, 1, 8],
[4, 5, 2]])
I know that I can do things like getting mod of the index to the width of the array; but I'm looking for a numpy function that does that.
Also, for a one-dimensional array this will cause an index out of range error, which is not really useful...
This is one way using np.pad with wraparound mode.
>>> a = np.array([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 0, 1],
[2, 3, 4, 5]])
>>> pad_width = 1
>>> i, j = 2, 3
>>> startrow, endrow = i-1+pad_width, i+2+pad_width # for 3 x 3 submatrix
>>> startcol, endcol = j-1+pad_width, j+2+pad_width
>>> np.pad(a, (pad_width, pad_width), 'wrap')[startrow:endrow, startcol:endcol]
array([[6, 7, 4],
[0, 1, 8],
[4, 5, 2]])
Depending on the shape of your patch (eg. 5 x 5 instead of 3 x 3) you can increase the pad_width and start and end row and column indices accordingly.
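The index bookkeeping for other patch sizes can be folded into a helper. This is a sketch under the assumption of a 2D array and an odd patch size. The name wrapped_patch is hypothetical.

```python
import numpy as np

def wrapped_patch(a, i, j, size=3):
    """Return the size x size patch centred on (i, j), wrapping at the edges.

    Assumes a 2D array and an odd size.
    """
    half = size // 2
    padded = np.pad(a, half, mode='wrap')
    # Element (i, j) of a sits at (i + half, j + half) in padded, so the
    # patch that starts at i - half in a starts at index i in padded.
    return padded[i:i + size, j:j + size]

a = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 0, 1],
              [2, 3, 4, 5]])
print(wrapped_patch(a, 2, 3))
```

Note that np.pad builds a padded copy of the whole array on every call, so for many lookups it may be worth padding once and reusing the result.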
np.take does have a mode parameter which can wrap-around out of bound indices. But it's a bit hacky to use np.take for multidimensional arrays since the axis must be a scalar.
However, in your particular case you could do this:
a = np.array([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 0, 1],
[2, 3, 4, 5]])
np.take(a, np.r_[2:5], axis=1, mode='wrap')[1:4]
Output:
array([[6, 7, 4],
[0, 1, 8],
[4, 5, 2]])
EDIT
This function might be what you are looking for (?)
def select3x3(a, idx):
    x, y = idx
    return np.take(np.take(a, np.r_[x-1:x+2], axis=0, mode='wrap'), np.r_[y-1:y+2], axis=1, mode='wrap')
But in retrospect, I recommend using modulo and fancy indexing for this kind of operation (it is basically what mode='wrap' does internally anyway):
def select3x3(a, idx):
    x, y = idx
    return a[np.r_[x-1:x+2][:,None] % a.shape[0], np.r_[y-1:y+2][None,:] % a.shape[1]]
The solution above also works for a 2D array a of any shape.
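The modulo idea extends to other odd window sizes. The sketch below is one possible generalisation, not the answer's own code: select_window and its size parameter are hypothetical, and np.ix_ is swapped in for the manual [:, None] / [None, :] broadcasting.

```python
import numpy as np

def select_window(a, idx, size=3):
    """Return a size x size window centred on idx, wrapping around the edges.

    Assumes a 2D array and an odd size.
    """
    half = size // 2
    x, y = idx
    rows = np.arange(x - half, x + half + 1) % a.shape[0]
    cols = np.arange(y - half, y + half + 1) % a.shape[1]
    # np.ix_ builds the open mesh needed to select every row/column pair.
    return a[np.ix_(rows, cols)]

a = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 0, 1],
              [2, 3, 4, 5]])
print(select_window(a, (2, 3)))
```

Because the indices are reduced modulo the shape, the same call also works at the corners, for example select_window(a, (0, 0)).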

Sorting 2D array by the first n rows

How can I sort an array in NumPy by the two first rows?
For example,
A=array([[9, 2, 2],
[4, 5, 6],
[7, 0, 5]])
And I'd like to sort columns by the first two rows, such that I get back:
A=array([[2, 2, 9],
[5, 6, 4],
[0, 5, 7]])
Thank you!
One approach is to transform the 2D array over which we want to take the argsort into an easier-to-handle 1D array. One idea is to multiply the rows that should count for sorting by successively decreasing powers of 10, sum them, and then argsort the result (note: this method will be numerically unstable for high values of k; it is meant for values up to ~20):
def sort_on_first_k_rows(x, k):
    # normalize each row so that its max value is 1
    a = (x[:k,:]/x[:k,:,None].max(1)).astype('float64')
    # multiply each row by the seq 10^n, for n=k-1,k-2...0
    # Ensures that the contribution of each row in the sorting is
    # captured in the final sum
    a_pow = (a*10**np.arange(a.shape[0]-1,-1,-1)[:,None])
    # Sort with the argsort on the resulting sum
    return x[:,a_pow.sum(0).argsort()]
Checking with the shared example:
sort_on_first_k_rows(A, 2)
array([[2, 2, 9],
[5, 6, 4],
[0, 5, 7]])
Or with another example:
A=np.array([[9, 2, 2, 1, 5, 2, 9],
[4, 7, 6, 0, 9, 3, 3],
[7, 0, 5, 0, 2, 1, 2]])
sort_on_first_k_rows(A, 2)
array([[1, 2, 2, 2, 5, 9, 9],
[0, 3, 6, 7, 9, 3, 4],
[0, 1, 5, 0, 2, 2, 7]])
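As an alternative to the weighted sum, np.lexsort sorts by several keys directly and does not depend on the magnitude of the values. The weighted sum can misorder two columns when their values in a higher-priority row differ by less than about a tenth of that row's maximum. The sketch below is a swapped-in technique, not the answer's method, and the function name is hypothetical.

```python
import numpy as np

def lexsort_on_first_k_rows(x, k):
    """Sort the columns of x by row 0, then row 1, ..., then row k-1."""
    # np.lexsort treats the LAST key as the primary one, so reverse
    # the first k rows to make row 0 the primary key.
    order = np.lexsort(x[:k][::-1])
    return x[:, order]

A = np.array([[9, 2, 2],
              [4, 5, 6],
              [7, 0, 5]])
print(lexsort_on_first_k_rows(A, 2))
# [[2 2 9]
#  [5 6 4]
#  [0 5 7]]
```

Ties in the first row are broken by the second row, and so on down to row k-1.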
The pandas library is very flexible for sorting DataFrames, but only by columns. So I suggest transposing your array and converting it to a DataFrame like this (note that you need to specify column names so that you can refer to them in the sorting criteria):
import pandas as pd
df = pd.DataFrame(A.transpose(), columns=['col'+str(i) for i in range(len(A))])
Then sort it and convert it back like this:
A_new = df.sort_values(['col0', 'col1'], ascending=[True, True]).to_numpy().transpose()

Indexing in NumPy: Access every other group of values

The [::n] indexing option in numpy provides a very useful way to index every nth item in a list. However, is it possible to use this feature to extract multiple values, e.g. every other pair of values?
For example:
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
And I want to extract every other pair of values i.e. I want to return
a[0, 1, 4, 5, 8, 9,]
Of course the index could be built using loops or something, but I wonder if there's a faster way to use ::-style indexing in numpy but also specifying the width of the pattern to take every nth iteration of.
Thanks
With length of array being a multiple of the window size -
In [29]: W = 2 # window-size
In [30]: a.reshape(-1,W)[::2].ravel()
Out[30]: array([0, 1, 4, 5, 8, 9])
Explanation with breaking-down-the-steps -
# Reshape to split into W-sized groups
In [43]: a.reshape(-1,W)
Out[43]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11]])
# Use stepsize to select every other pair starting from the first one
In [44]: a.reshape(-1,W)[::2]
Out[44]:
array([[0, 1],
[4, 5],
[8, 9]])
# Flatten for desired output
In [45]: a.reshape(-1,W)[::2].ravel()
Out[45]: array([0, 1, 4, 5, 8, 9])
If you are okay with 2D output, skip the last step, as the result is still a view into the input and virtually free at runtime. Let's verify the view part -
In [47]: np.shares_memory(a,a.reshape(-1,W)[::2])
Out[47]: True
For the generic case, where the length is not necessarily a multiple of the window size, we can use a mask-based approach -
In [64]: a[(np.arange(len(a))%(2*W))<W]
Out[64]: array([0, 1, 4, 5, 8, 9])
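To see why the mask is needed in the generic case, here is a small sketch with an array whose length is not a multiple of the window size. The array b and its length are made up for illustration.

```python
import numpy as np

W = 2               # window size
b = np.arange(11)   # length 11 is not a multiple of W

# The reshape-based approach cannot split 11 values into pairs.
try:
    b.reshape(-1, W)
except ValueError as exc:
    print("reshape failed:", exc)

# The mask-based approach has no such restriction.
mask = (np.arange(len(b)) % (2 * W)) < W
print(b[mask])  # [0 1 4 5 8 9]
```

The trade-off is that boolean masking always returns a copy, while the reshape-and-step version can return a view.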
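You can do that by reshaping the array into an n x 3 matrix, then slicing the first two elements of each row, and finally flattening the result:
a.reshape((-1,3))[:,:2].flatten()
resulting in:
array([ 0, 1, 3, 4, 6, 7, 9, 10])
Note that this keeps two values and skips one. To keep two and skip two, as in the question, reshape to (-1,4) instead, which gives array([0, 1, 4, 5, 8, 9]).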
