How to sort a numpy matrix using a mask?

How to sort a numpy matrix using a mask? - python

I have two matrices A, B, Which look like this:
A = array([[2, 2, 1, 0, 8],
[8, 2, 0, 3, 7],
[3, 2, 6, 5, 3],
[1, 4, 2, 5, 8],
[2, 3, 7, 0, 3]])
B = array([[3, 7, 6, 8, 3],
[0, 7, 4, 4, 3],
[1, 2, 0, 0, 4],
[8, 6, 6, 7, 1],
[8, 1, 0, 4, 8]])
I am trying to sort A and B BUT I need B to be ordered with the mask from A.
I tried this:
mask = A.argsort()
A = A[mask]
B = B[mask]
However the return value is a shaped (5, 5, 5) matrix
The next snippet works, but is using two iterations. I need something faster. Has anybody an Idea ?
A = [row[order] for row, order in zip(A,mask)]
B = [row[order] for row, order in zip(B,mask)]

You can use fancy indexing. The result will be the same shape as your indices broadcasted together. Your column index is already the right shape. A row index of size (A.shape[0], 1) would broadcast correctly:
r = np.arange(A.shape[0]).reshape(-1, 1)
c = np.argsort(A)
A = A[r, c]
B = B[r, c]
The reason that your original index didn't work out is that you were indexing with a single dimension, which selects entire rows based on each location. This would have failed if you had more columns than rows.
A simpler way would be to follow what the argsort docs suggest:
A = np.take_along_axis(A, mask, axis=-1)
B = np.take_along_axis(B, mask, axis=-1)

Related

python replace value in array based on previous and following value in column

given the following array, I want to replace the zero with their previous value columnwise as long as it is surrounded by two values greater than zero.
I am aware of np.where but it would consider the whole array instead of its columns.
I am not sure how to do it and help would be appreciated.
This is the array:
a=np.array([[4, 3, 3, 2],
[0, 0, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
and since the only zero that meets this condition is the second row/second column one,
the new array should be the following
new_a=np.array([[4, 3, 3, 2],
[0, 3, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
How do I accomplish this?
And what if I would like to extend the gap surrounded by nonzero ? For instance, the first column contains two 0 and the second column contains one 0, so the new array would be
new_a=np.array([[4, 3, 3, 2],
[4, 3, 1, 2],
[4, 4, 2, 4],
[2, 4, 3, 0]])
In short, how do I solve this if the columnwise condition would be the one of having N consecutive zeros or less?

As a generic method, I would approach this using a convolution:
from scipy.signal import convolve2d
# kernel for top/down neighbors
kernel = np.array([[1],
[0],
[1]])
# is the value a zero?
m1 = a==0
# count non-zeros neighbors
m2 = convolve2d(~m1, kernel, mode='same') > 1
mask = m1&m2
# replace matching values with previous row value
a[mask] = np.roll(a, 1, axis=0)[mask]
output:
array([[4, 3, 3, 2],
[0, 3, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
filling from surrounding values
Using pandas to benefit from ffill/bfill (you can forward-fill in pure numpy but its more complex):
import pandas as pd
df = pd.DataFrame(a)
# limit for neighbors
N = 2
# identify non-zeros
m = df.ne(0)
# mask zeros
m2 = m.where(m)
# mask for values with 2 neighbors within limits
mask = m2.ffill(limit=N) & m2.bfill(limit=N)
df.mask(mask&~m).ffill()
array([[4, 3, 3, 2],
[4, 3, 1, 2],
[4, 4, 2, 4],
[2, 4, 3, 0]])

That's one solution I found. I know it's basic but I think it works.
a=np.array([[4, 3, 3, 2],
[0, 0, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
a_t = a.T
for i in range(len(a_t)):
ar = a_t[i]
for j in range(len(ar)-1):
if (j>0) and (ar[j] == 0) and (ar[j+1] > 0):
a_t[i][j] = a_t[i][j-1]
a = a_t.T

How to loop back to beginning of the array for out of bounds index in numpy?

I have a 2D numpy array that I want to extract a submatrix from.
I get the submatrix by slicing the array as below.
Here I want a 3*3 submatrix around an item at the index of (2,3).
>>> import numpy as np
>>> a = np.array([[0, 1, 2, 3],
... [4, 5, 6, 7],
... [8, 9, 0, 1],
... [2, 3, 4, 5]])
>>> a[1:4, 2:5]
array([[6, 7],
[0, 1],
[4, 5]])
But what I want is that for indexes that are out of range, it goes back to the beginning of array and continues from there. This is the result I want:
array([[6, 7, 4],
[0, 1, 8],
[4, 5, 2]])
I know that I can do things like getting mod of the index to the width of the array; but I'm looking for a numpy function that does that.
And also for an one dimensional array this will cause an index out of range error, which is not really useful...

This is one way using np.pad with wraparound mode.
>>> a = np.array([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 0, 1],
[2, 3, 4, 5]])
>>> pad_width = 1
>>> i, j = 2, 3
>>> startrow, endrow = i-1+pad_width, i+2+pad_width # for 3 x 3 submatrix
>>> startcol, endcol = j-1+pad_width, j+2+pad_width
>>> np.pad(a, (pad_width, pad_width), 'wrap')[startrow:endrow, startcol:endcol]
array([[6, 7, 4],
[0, 1, 8],
[4, 5, 2]])
Depending on the shape of your patch (eg. 5 x 5 instead of 3 x 3) you can increase the pad_width and start and end row and column indices accordingly.

np.take does have a mode parameter which can wrap-around out of bound indices. But it's a bit hacky to use np.take for multidimensional arrays since the axis must be a scalar.
However, In your particular case you could do this:
a = np.array([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 0, 1],
[2, 3, 4, 5]])
np.take(a, np.r_[2:5], axis=1, mode='wrap')[1:4]
Output:
array([[6, 7, 4],
[0, 1, 8],
[4, 5, 2]])
EDIT
This function might be what you are looking for (?)
def select3x3(a, idx):
x,y = idx
return np.take(np.take(a, np.r_[x-1:x+2], axis=0, mode='wrap'), np.r_[y-1:y+2], axis=1, mode='wrap')
But in retrospect, i recommend using modulo and fancy indexing for this kind of operation (it's basically what the mode='wrap' is doing internally anyways):
def select3x3(a, idx):
x,y = idx
return a[np.r_[x-1:x+2][:,None] % a.shape[0], np.r_[y-1:y+2][None,:] % a.shape[1]]
The above solution is also generalized for any 2d shape on a.

Compare two 3d Numpy array and return unmatched values with index and later recreate them without loop

I am currently working on a problem where in one requirement I need to compare two 3d NumPy arrays and return the unmatched values with their index position and later recreate the same array. Currently, the only approach I can think of is to loop across the arrays to get the values during comparing and later recreating. The problem is with scale as there will be hundreds of arrays and looping effects the Latency of the overall application. I would be thankful if anyone can help me with better utilization of NumPy comparison while using minimal or no loops. A dummy code is below:
def compare_array(final_array_list):
base_array = None
i = 0
for array in final_array_list:
if i==0:
base_array =array[0]
else:
index = np.where(base_array != array)
#getting index like (array([0, 1]), array([1, 1]), array([2, 2]))
# to access all unmatched values I need to loop.Need to avoid loop here
i=i+1
return [base_array, [unmatched value (8,10)and its index (array([0, 1]), array([1, 1]), array([2, 2])],..]
# similarly recreate array1 back
def recreate_array(array_list):
# need to avoid looping while recreating array back
return list of array #i.e. [base_array, array_1]
# creating dummy array
base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = b = np.array([[[1, 2,3], [3, 4,8]], [[5, 6,7], [7, 8,10]]])
final_array_list = [base_array,array_1, ...... ]
#compare base_array with other arrays and get unmatched values (like 8,10 in array_1) and their index
difff_array = compare_array(final_array_list)
# recreate array1 from the base array after receiving unmatched value and its index value
recreate_array(difff_array)

I think this may be what you're looking for:
base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = b = np.array([[[1, 2,3], [3, 4,8]], [[5, 6,7], [7, 8,10]]])
match_mask = (base_array == array_1)
idx_unmatched = np.argwhere(~match_mask)
# idx_unmatched:
# array([[0, 1, 2],
# [1, 1, 2]])
# values with associated with idx_unmatched:
values_unmatched = base_array[tuple(idx_unmatched.T)]
# values_unmatched:
# array([5, 9])

I'm not sure I understand what you mean by "recreate them" (completely recreate them? why not use the arrays themselves?).
I can help you though by noting that ther are plenty of functions which vectorize with numpy, and as a general rule of thumb, do not use for loops unless G-d himself tells you to :)
For example:
If a, b are any np.arrays (regardless of dimensions), the simple a == b will return a numpy array of the same size, with boolean values. Trues = they are equal in this coordinate, and False otherwise.
The function np.where(c), will convert c to a boolean np.array, and return you the indexes in which c is True.
To clarify:
Here I instantiate two arrays, with b differing from a with -1 values:
Note what a==b is, at the end.
>>> a = np.random.randint(low=0, high=10, size=(4, 4))
>>> b = np.copy(a)
>>> b[2, 3] = -1
>>> b[0, 1] = -1
>>> b[1, 1] = -1
>>> a
array([[9, 9, 3, 4],
[8, 4, 6, 7],
[8, 4, 5, 5],
[1, 7, 2, 5]])
>>> b
array([[ 9, -1, 3, 4],
[ 8, -1, 6, 7],
[ 8, 4, 5, -1],
[ 1, 7, 2, 5]])
>>> a == b
array([[ True, False, True, True],
[ True, False, True, True],
[ True, True, True, False],
[ True, True, True, True]])
Now the function np.where, which output is a bit tricky, but can be used easily. This will return two arrays of the same size: the first array is the rows and the second array is the columns at places in which the given array is True.
>>> np.where(a == b)
(array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3], dtype=int64), array([0, 2, 3, 0, 2, 3, 0, 1, 2, 0, 1, 2, 3], dtype=int64))
Now you can "fix" the b array to match a, by switching the values of b ar indexes in which it differs from a, to be a's indexes:
>>> b[np.where(a != b)]
array([-1, -1, -1])
>>> b[np.where(a != b)] = a[np.where(a != b)]
>>> np.all(a == b)
True

Reshape jagged array and fill with zeros

The task I wish to accomplish is the following: Consider a 1-D array a and an array of indices parts of length N. Example:
a = np.arange(9)
parts = np.array([4, 6, 9])
# a = array([0, 1, 2, 3, 4, 5, 6, 7, 8])
I want to cast a into a 2-D array of shape (N, <length of longest partition in parts>), inserting values of a upto each index in indx in each row of the 2-D array, filling the remaining part of the row with zeroes, like so:
array([[0, 1, 2, 3],
[4, 5, 0, 0],
[6, 7, 8, 0])
I do not wish to use loops. Can't wrap my head around this, any help is appreciated.

Here's one with boolean-indexing -
def jagged_to_regular(a, parts):
lens = np.ediff1d(parts,to_begin=parts[0])
mask = lens[:,None]>np.arange(lens.max())
out = np.zeros(mask.shape, dtype=a.dtype)
out[mask] = a
return out
Sample run -
In [46]: a = np.arange(9)
...: parts = np.array([4, 6, 9])
In [47]: jagged_to_regular(a, parts)
Out[47]:
array([[0, 1, 2, 3],
[4, 5, 0, 0],
[6, 7, 8, 0]])

Sorting 2D array by the first n rows

How can I sort an array in NumPy by the two first rows?
For example,
A=array([[9, 2, 2],
[4, 5, 6],
[7, 0, 5]])
And I'd like to sort columns by the first two rows, such that I get back:
A=array([[2, 2, 9],
[5, 6, 4],
[0, 5, 7]])
Thank you!

One approach is to transform the 2D array over which we want to take the argsort into an easier to handle 1D array. For that one idea could be to multiply the rows to take into accounts for the sorting purpose by successively decreasing values in the power of 10 sequence, sum them and then use argsort (note: this method will be numerically unstable for high values of k. Meant for values up to ~20):
def sort_on_first_k_rows(x, k):
# normalize each row so that its max value is 1
a = (x[:k,:]/x[:k,:,None].max(1)).astype('float64')
# multiply each row by the seq 10^n, for n=k-1,k-2...0
# Ensures that the contribution of each row in the sorting is
# captured in the final sum
a_pow = (a*10**np.arange(a.shape[0]-1,-1,-1)[:,None])
# Sort with the argsort on the resulting sum
return x[:,a_pow.sum(0).argsort()]
Checking with the shared example:
sort_on_first_k_rows(A, 2)
array([[2, 2, 9],
[5, 6, 4],
[0, 5, 7]])
Or with another example:
A=np.array([[9, 2, 2, 1, 5, 2, 9],
[4, 7, 6, 0, 9, 3, 3],
[7, 0, 5, 0, 2, 1, 2]])
sort_on_first_k_rows(A, 2)
array([[1, 2, 2, 2, 5, 9, 9],
[0, 3, 6, 7, 9, 3, 4],
[0, 1, 5, 0, 2, 2, 7]])

The pandas library is very flexible for sorting DataFrames - but only based on columns. So I suggest to transpose and convert your array to a DataFrame like this (note that you need to specify column names for later defining the sorting criteria):
df = pd.DataFrame(A.transpose(), columns=['col'+str(i) for i in range(len(A))])
Then sort it and convert it back like this:
A_new = df.sort_values(['col0', 'col1'], ascending=[True, True]).to_numpy().transpose()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to sort a numpy matrix using a mask? - python

Related

python replace value in array based on previous and following value in column

How to loop back to beginning of the array for out of bounds index in numpy?

Compare two 3d Numpy array and return unmatched values with index and later recreate them without loop

Reshape jagged array and fill with zeros

Sorting 2D array by the first n rows

Categories

Resources