I have a binary matrix of N rows and K columns. I would like to reorder the matrix into "left-ordered form," meaning the columns are permuted such that if each column is treated as a number in binary format (with the first row's value as most significant bit and the last row's value as the least significant bit), the columns are in decreasing value from left to right. For instance, this would look like the following:
[[0, 1, 1],
[1, 1, 0]]
becomes
[[1, 1, 0],
[1, 0, 1]]
This question is the same as Left ordered binary matrix algorithm in R, but the answer there is insufficient. I have no upper bound on the number of rows N, so explicitly computing each column's binary value is impossible.
Unless I'm missing something, this is simply sorting the columns in reverse lexicographic order?
import numpy as np
sample = np.array([
[1, 0, 1, 1, 0, 0, 1, 0],
[1, 1, 0, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 0, 1, 1]
])
sample[:, np.lexsort(-sample[::-1])]
output:
array([[1, 1, 1, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 1, 1, 0, 0],
[1, 0, 1, 0, 1, 0, 1, 0]])
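To unpack the `-sample[::-1]` part: np.lexsort sorts in ascending order and treats the last key it receives as the primary one. Reversing the rows makes the first row the primary key, and negating the values turns ascending into descending. A minimal sketch with two made-up keys (the names `primary` and `secondary` are only for illustration):

```python
import numpy as np

# Two illustrative keys; element j is described by (primary[j], secondary[j]).
primary = np.array([1, 0, 1, 0])
secondary = np.array([0, 1, 1, 0])

# np.lexsort uses the LAST key in the sequence as the primary sort key.
ascending = np.lexsort((secondary, primary))

# Negating every key flips the result to descending order.
descending = np.lexsort((-secondary, -primary))

print(ascending)   # [3 1 0 2]
print(descending)  # [2 0 1 3]
```

Because the keys are compared row by row, no per-column integer is ever built, so the number of rows is not limited by an integer width.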
You can adapt bucket sort to sort it row by row until there's nothing left to sort.
import numpy as np
sample = np.array([
[1, 0, 1, 1, 0, 0, 1, 0],
[1, 1, 0, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 0, 1, 1]
])
def SortMatrix(matrix, indices, row):
    if indices.size <= 1 or row >= matrix.shape[0]:
        return indices
    left = indices[np.where(matrix[row, indices] == 1)]
    right = indices[np.where(matrix[row, indices] == 0)]
    return np.concatenate((SortMatrix(matrix, left, row+1), SortMatrix(matrix, right, row+1)))
sample = sample[:,SortMatrix(sample, np.array(range(sample.shape[1])), 0)]
print(sample)
# [[1 1 1 1 0 0 0 0]
# [1 1 0 0 1 1 0 0]
# [1 0 1 0 1 0 1 0]]
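The recursion above goes one level deeper per row for as long as two or more columns share the same prefix, so a very tall matrix with duplicate columns may exceed Python's default recursion limit of 1000. A possible iterative variant, sketched here as an assumption and not part of the answer above, is a radix sort that starts from the last row and relies on a stable argsort (`left_order_radix` is a hypothetical name):

```python
import numpy as np

def left_order_radix(matrix):
    """Return matrix with its columns sorted in decreasing binary value."""
    order = np.arange(matrix.shape[1])
    # Process rows from least significant (last) to most significant (first).
    for row in matrix[::-1]:
        # Stable sort on "is zero": ones (False) come before zeros (True),
        # and ties keep the order set by the less significant rows.
        order = order[np.argsort(row[order] == 0, kind="stable")]
    return matrix[:, order]

sample = np.array([
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 1]
])
print(left_order_radix(sample))
# [[1 1 1 1 0 0 0 0]
#  [1 1 0 0 1 1 0 0]
#  [1 0 1 0 1 0 1 0]]
```

This does one pass per row over all columns, so it trades the early exit of the recursive version for a fixed, recursion-free loop.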
You can convert each column to its decimal value and take the sorted order of those values.
>>> mul = 2 ** (np.arange(arr.shape[0])[::-1]).reshape(1, -1)
>>> mul
array([[2, 1]], dtype=int32)
>>> order = np.argsort(mul @ arr).squeeze()[::-1]
>>> order
array([1, 0, 2], dtype=int64)
>>> arr[:, order]
array([[1, 1, 0],
[1, 0, 1]])
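Note that fixed-width integer keys overflow once the matrix has more rows than the integer type has bits, which matters when N is unbounded as in the question. One hedged workaround, assuming integer 0/1 entries and a column count small enough to loop over in Python, is to build the keys as Python ints, which have arbitrary precision (`left_order_bigint` is a hypothetical helper, not part of the answer above):

```python
import numpy as np

def left_order_bigint(arr):
    # One arbitrary-precision Python int per column; the first row is the
    # most significant bit. Assumes the entries are the integers 0 and 1.
    keys = [int("".join(map(str, col)), 2) for col in arr.T.tolist()]
    # Sort the column indices by key, largest first.
    order = sorted(range(len(keys)), key=keys.__getitem__, reverse=True)
    return arr[:, order]

# 100 rows: more bits than a 64-bit integer key could hold.
tall = np.zeros((100, 3), dtype=int)
tall[0, 2] = 1    # column 2 has the most significant bit set
tall[99, 1] = 1   # column 1 has only the least significant bit set
result = left_order_bigint(tall)
print(result[0], result[99])  # [1 0 0] [0 1 0]
```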
I started learning NumPy yesterday.
My aim is:
Extract the odd-index rows and the even-index rows from a NumPy array and merge them side by side, each as a vertical column.
Let's say I have the array
mat = np.array([[1, 1, 0, 0, 0],
[0, 1, 0, 0, 1],
[1, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[1, 0, 1, 0, 1]])
What I've tried:
--> I transposed the array, since I have to merge side by side vertically.
mat = np.transpose(mat)
Which gives me
[[1 0 1 0 1]
[1 1 0 0 0]
[0 0 0 0 1]
[0 0 1 0 0]
[0 1 1 0 1]]
I've tried accessing odd index elements
odd = mat[1::2]
print(odd)
Gives me
[[1 1 0 0 0] ----> wrong... should be [0,1,0,0,1], right? I'm confused
[0 0 1 0 0]] ----> wrong... should be [0,0,0,0,0], right? Where are these coming from?
My final output should look like
[[0 0 1 1 1]
[1 0 1 0 0]
[0 0 0 0 1]
[0 0 0 1 0]
[1 0 0 1 1]]
Type: np.ndarray
Looks like you want:
mat[np.r_[1:mat.shape[0]:2,:mat.shape[0]:2]].T
Output:
array([[0, 0, 1, 1, 1],
[1, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 1, 0],
[1, 0, 0, 1, 1]])
Intermediate:
np.r_[1:mat.shape[0]:2,:mat.shape[0]:2]
output: array([1, 3, 0, 2, 4])
While the selection of rows is straightforward, there are various ways of combining them.
In [244]: mat = np.array([[1, 1, 0, 0, 0],
...: [0, 1, 0, 0, 1],
...: [1, 0, 0, 1, 1],
...: [0, 0, 0, 0, 0],
...: [1, 0, 1, 0, 1]])
The odd rows:
In [245]: mat[1::2,:] # or mat[1::2]
Out[245]:
array([[0, 1, 0, 0, 1],
[0, 0, 0, 0, 0]])
The even rows:
In [246]: mat[0::2,:]
Out[246]:
array([[1, 1, 0, 0, 0],
[1, 0, 0, 1, 1],
[1, 0, 1, 0, 1]])
Joining the rows vertically (np.vstack can also be used):
In [247]: np.concatenate((mat[1::2,:], mat[0::2,:]), axis=0)
Out[247]:
array([[0, 1, 0, 0, 1],
[0, 0, 0, 0, 0],
[1, 1, 0, 0, 0],
[1, 0, 0, 1, 1],
[1, 0, 1, 0, 1]])
But since you want columns, transpose:
In [248]: np.concatenate((mat[1::2,:], mat[0::2,:]), axis=0).transpose()
Out[248]:
array([[0, 0, 1, 1, 1],
[1, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 1, 0],
[1, 0, 0, 1, 1]])
We could transpose the selections first:
np.concatenate((mat[1::2,:].T, mat[0::2,:].T), axis=1)
or transpose before indexing (note the change in the ':' slice position):
np.concatenate((mat.T[:,1::2], mat.T[:,0::2]), axis=1)
The r_ in the other answer converts the slices into arrays and concatenates them, to make one row indexing array. That's equally valid.
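As a quick check, the variants above can be compared directly. The sketch below (variable names are only illustrative) also shows what happened in the question: slicing rows after transposing picks the odd columns of the original array, not its odd rows.

```python
import numpy as np

mat = np.array([[1, 1, 0, 0, 0],
                [0, 1, 0, 0, 1],
                [1, 0, 0, 1, 1],
                [0, 0, 0, 0, 0],
                [1, 0, 1, 0, 1]])
n = mat.shape[0]

# Four ways to put the odd rows first, then the even rows, as columns.
via_concat = np.concatenate((mat[1::2], mat[0::2]), axis=0).T
via_selection_T = np.concatenate((mat[1::2].T, mat[0::2].T), axis=1)
via_mat_T = np.concatenate((mat.T[:, 1::2], mat.T[:, 0::2]), axis=1)
via_r_ = mat[np.r_[1:n:2, 0:n:2]].T

same = all(np.array_equal(via_concat, other)
           for other in (via_selection_T, via_mat_T, via_r_))
print(same)  # True

# Row slicing on the transposed array selects columns of the original.
print(np.array_equal(mat.T[1::2], mat[:, 1::2].T))  # True
```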
Here is an alternative logic you can use.
1. Convert the array to a list (optional).
2. Select the nested list items with mat[1::2] for the odd rows and mat[::2] for the even rows.
3. Concatenate them vertically using np.concatenate with `axis=0`.
4. Transpose the result.
Implementation.
mat = np.array([[1, 1, 0, 0, 0],
[0, 1, 0, 0, 1],
[1, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[1, 0, 1, 0, 1]])
mat_list = mat.tolist()  # optional
l_odd = mat_list[1::2]
l_even= mat_list[::2]
mask = np.concatenate((l_odd, l_even), axis=0)
mask = np.transpose(mask)
print(mask)
Output:
[[0 0 1 1 1]
[1 0 1 0 0]
[0 0 0 0 1]
[0 0 0 1 0]
[1 0 0 1 1]]
Checking Type
print(type(mask))
Gives
<class 'numpy.ndarray'>
I have an array which contains 1's and 0's. A very small section of it looks like this:
arr=[[0,0,0,0,1],
[0,0,1,0,0],
[0,1,0,0,0],
[1,0,1,0,0]]
I want to change the value of every cell to 1, if it is to the left of a cell with a value of 1. I want all other cells to keep their value of 0, i.e:
arrOut=[[1,1,1,1,1],
[1,1,1,0,0],
[1,1,0,0,0],
[1,1,1,0,0]]
Some rows have >1 cell with a value =1.
I have managed to do this using a very ugly double for-loop:
for i in range(len(arr)):
    for j in range(len(arr[i])):
        if arr[i][j]==1:
            arrOut[i][0:j]=1
Does anyone know of another way to do this without using for loops? I'm relatively comfortable with numpy and pandas, but also open to other libraries.
Thanks!
You can flip the array along its columns and use np.cumsum:
>>> arr[:, ::-1].cumsum(axis=1)[:, ::-1]
array([[1, 1, 1, 1, 1],
[1, 1, 1, 0, 0],
[1, 1, 0, 0, 0]], dtype=int32)
Or the same using np.fliplr,
>>> np.fliplr(np.fliplr(arr).cumsum(axis=1))
array([[1, 1, 1, 1, 1],
[1, 1, 1, 0, 0],
[1, 1, 0, 0, 0]], dtype=int32)
Using np.where:
>>> np.where(arr.cumsum(1)==0, 1, arr)
array([[1, 1, 1, 1, 1],
[1, 1, 1, 0, 0],
[1, 1, 0, 0, 0]], dtype=int32)
If a row has more than one 1, use np.clip:
>>> arr
array([[0, 0, 0, 0, 1],
[0, 0, 1, 0, 0],
[0, 1, 0, 1, 0]])
>>> np.clip(arr[:, ::-1].cumsum(axis=1)[:, ::-1], 0, 1)
array([[1, 1, 1, 1, 1],
[1, 1, 1, 0, 0],
[1, 1, 1, 1, 0]], dtype=int32)
# If you want to make all 0s before the leftmost 1 to 1:
>>> np.where(arr.cumsum(1)==0, 1, arr)
array([[1, 1, 1, 1, 1],
[1, 1, 1, 0, 0],
[1, 1, 0, 1, 0]])
A possibility without using libraries:
for subarray in arr:
    # Get the index of the 1 starting from the end, so that if there are two 1s,
    # you get the index of the rightmost one
    indexof1 = len(subarray) - 1 - subarray[::-1].index(1)
    # Up to that 1, replace everything with 1s
    subarray[:indexof1] = [1]*len(subarray[:indexof1])
Another solution using np.maximum.accumulate:
np.maximum.accumulate(arr[:,::-1],axis=1)[:,::-1]
Here np.maximum.accumulate simply applies a cumulative maximum (cummax).
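To make the flip, accumulate, flip-back idea concrete, here is a small step-by-step sketch using the three-row example with two 1s in its last row (the intermediate names are only illustrative):

```python
import numpy as np

arr = np.array([[0, 0, 0, 0, 1],
                [0, 0, 1, 0, 0],
                [0, 1, 0, 1, 0]])

# 1. Reverse each row so the rightmost 1 comes first.
flipped = arr[:, ::-1]
# 2. Running maximum: everything after the first 1 becomes 1.
running = np.maximum.accumulate(flipped, axis=1)
# 3. Reverse back to the original orientation.
result = running[:, ::-1]

print(result)
# [[1 1 1 1 1]
#  [1 1 1 0 0]
#  [1 1 1 1 0]]
```

Unlike the cumsum version, the running maximum never exceeds 1, so no clipping is needed when a row contains several 1s.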
I have a Pandas dataframe which looks like this:
col_1 col_2
a 4
a 3
b 2
c 2
d 1
b 4
c 1
I need to transform it into a NumPy array of 2D-arrays where each 2D-array corresponds to one of the letters. For example, if 'a' doesn't occur together with 1 and 2, and occurs with 3 and 4, 2D array corresponding to it should look like [0, 0, 1, 1]. So in this example I need:
[[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 0, 0], [1, 0, 0, 1]]
What is the best way to do this?
Here is one way, using pd.crosstab:
l = pd.crosstab(df.col_1,df.col_2).values.tolist()
Out[23]: [[0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 0]]
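For a self-contained version, the sketch below rebuilds the DataFrame from the question. Since pd.crosstab counts occurrences, a repeated (letter, number) pair would produce values above 1; clipping at 1 is an assumption about what is wanted here, not something the answer above states.

```python
import pandas as pd

df = pd.DataFrame({"col_1": list("aabcdbc"),
                   "col_2": [4, 3, 2, 2, 1, 4, 1]})

# Rows are the letters, columns are the distinct numbers, cells are counts.
table = pd.crosstab(df.col_1, df.col_2)

# Cap counts at 1 so the result is a pure 0/1 indicator array.
binary = table.clip(upper=1).to_numpy()

print(binary.tolist())
# [[0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 0]]
```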
I would like to take a list of values and transform it into a table (2D list) of 0's and 1's, with one column for each unique number in the source list and as many rows as the original list has entries. Each row has a 1 in the column whose index equals the original value minus 1, and 0 everywhere else.
I have code that accomplishes this task, but I'm wondering if there is a better/faster way to do it. (The actual dataset has millions of entries vs. the simplified set below)
Sample Input:
value_list = [1, 2, 1, 3, 6, 5, 4, 3]
Desired output:
output_table = [[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0]]
Current Solution:
value_list = [1, 2, 1, 3, 6, 5, 4, 3]
max_val = max(value_list)
# initialize to table of 0's
a = [([0] * max_val) for i in range(len(value_list))]
# overwrite with 1's where required
for i in range(len(value_list)):
    j = value_list[i] - 1
    a[i][j] = 1
print(f'a = ')
for row in a:
    print(f'{row}')
You can do:
import numpy as np
value_list = [1, 2, 1, 3, 6, 5, 4, 3]
# create matrix of zeros
x = np.zeros(shape=(len(value_list), max(value_list)), dtype='int')
for i,v in enumerate(value_list):
    x[i,v-1] = 1
print(x)
Output:
[[1 0 0 0 0 0]
[0 1 0 0 0 0]
[1 0 0 0 0 0]
[0 0 1 0 0 0]
[0 0 0 0 0 1]
[0 0 0 0 1 0]
[0 0 0 1 0 0]
[0 0 1 0 0 0]]
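With millions of entries, the Python-level loop can be replaced by a single indexed assignment. A minimal sketch, assuming all values are positive integers (the names `values` and `one_hot` are illustrative):

```python
import numpy as np

value_list = [1, 2, 1, 3, 6, 5, 4, 3]
values = np.asarray(value_list)

# One row per entry, one column per possible value 1..max.
one_hot = np.zeros((values.size, values.max()), dtype=int)

# Pair row i with column values[i] - 1 and set all of them at once.
one_hot[np.arange(values.size), values - 1] = 1

print(one_hot[4])  # [0 0 0 0 0 1]
```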
You can try this:
dummy_list = [0]*6
output_table = [dummy_list[:i-1] + [1] + dummy_list[i:] for i in value_list]
Output:
output_table = [[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0]]
I have very sparse matrices, so I want to extract the smallest rectangular region of a matrix that contains all of its non-zero values. I know that numpy.nonzero(a) gives you the indices of the elements that are non-zero, but how can I use this to extract a submatrix that contains the elements of the matrix at those indices?
To give an example, this is what I am aiming for:
>>> test
array([[0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0],
[0, 0, 1, 1, 0, 0]])
>>> np.nonzero(test)
(array([1, 1, 1, 1, 2, 2]), array([1, 2, 3, 4, 2, 3]))
>>> submatrix(test)
array([[1, 1, 1, 1],
[0, 1, 1, 0]])
Does anyone know a simple way to do this in numpy? Thanks.
It seems like you're looking to find the smallest region of your matrix that contains all the nonzero elements. If that's true, here's a method:
import numpy as np
def submatrix(arr):
    x, y = np.nonzero(arr)
    # Using the smallest and largest x and y indices of nonzero elements,
    # we can find the desired rectangular bounds.
    # And don't forget to add 1 to the top bound to avoid the fencepost problem.
    return arr[x.min():x.max()+1, y.min():y.max()+1]
test = np.array([[0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0],
[0, 0, 1, 1, 0, 0]])
print(submatrix(test))
# Result:
# [[1 1 1 1]
# [0 1 1 0]]
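One edge case worth noting: for an all-zero matrix, np.nonzero returns empty index arrays and calling .min() on them raises a ValueError. A hedged variant that returns an empty array in that case could look like this (`bounding_box` is a hypothetical name, and returning a 0x0 slice is just one possible convention):

```python
import numpy as np

def bounding_box(arr):
    # Indices of rows and columns that contain at least one nonzero value.
    rows = np.flatnonzero(arr.any(axis=1))
    cols = np.flatnonzero(arr.any(axis=0))
    if rows.size == 0:
        # Nothing nonzero: return an empty 0x0 slice instead of raising.
        return arr[:0, :0]
    return arr[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

test = np.array([[0, 0, 0, 0, 0, 0],
                 [0, 1, 1, 1, 1, 0],
                 [0, 0, 1, 1, 0, 0]])
print(bounding_box(test))
# [[1 1 1 1]
#  [0 1 1 0]]
print(bounding_box(np.zeros((2, 3), dtype=int)).shape)  # (0, 0)
```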