Transform 2d numpy array into 2d one hot encoding - python

How would I transform
a=[[0,6],
[3,7],
[5,5]]
into
b=[[1,0,0,0,0,0,1,0],
[0,0,0,1,0,0,0,1],
[0,0,0,0,0,1,0,0]]
Note that the final row of b has only one value set to 1, because the final row of a repeats the same value.

Using indexing:
a = np.array([[0,6],
[3,7],
[5,5]])
b = np.zeros((len(a), a.max()+1), dtype=int)
b[np.arange(len(a)), a.T] = 1
Output:
array([[1, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 0, 0]])
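The indexing works because np.arange(len(a)) (shape (3,)) broadcasts against a.T (shape (2, 3)), so each row index is paired with every value in that row. A minimal sketch of the same idea without the transpose, wrapped in a hypothetical helper (the name multi_hot and the num_classes parameter are assumptions for illustration, not part of the answer):

```python
import numpy as np

def multi_hot(a, num_classes=None):
    """Hypothetical helper: set b[i, v] = 1 for every value v in row i of a."""
    a = np.asarray(a)
    if num_classes is None:
        num_classes = a.max() + 1
    b = np.zeros((a.shape[0], num_classes), dtype=int)
    # Column vector of row indices, shape (n, 1), broadcasts against a, shape (n, k).
    rows = np.arange(a.shape[0])[:, None]
    b[rows, a] = 1  # a repeated value in a row just writes the same 1 twice
    return b

a = np.array([[0, 6],
              [3, 7],
              [5, 5]])
b = multi_hot(a)
print(b)
```

Passing num_classes explicitly should keep the output width fixed when different batches have different maxima.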

This can also be done using NumPy broadcasting and boolean comparison, as follows:
a = np.array([[0,6],
[3,7],
[5,5]])
# Add a trailing axis so that each element can be compared along the last axis.
# Comparing with arange(max+1) marks the position equal to each value as True.
b = (a[:,:,None] == np.arange(a.max()+1))
# Check if any element along axis 1 is true and convert the type to int
b = b.any(1).astype(int)
Output:
array([[1, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 0, 0]])
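To see what the broadcasting version does, it may help to look at the intermediate shapes. This is only a sketch; the names classes and matches are introduced here for illustration:

```python
import numpy as np

a = np.array([[0, 6],
              [3, 7],
              [5, 5]])
classes = np.arange(a.max() + 1)       # shape (8,)
matches = a[:, :, None] == classes     # shape (3, 2, 8): one one-hot row per element of a
b = matches.any(axis=1).astype(int)    # shape (3, 8): merge the one-hot rows of each sample
print(matches.shape, b.shape)
```

The intermediate array holds n * k * (max + 1) booleans, so for large maxima the indexing approach is likely to use less memory.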

Related

encode a 0-1 matrix from an integer matrix numpy

So I have an n*K integer matrix [Note: it's a representation of the number of samples drawn from K distributions (K columns)]
a =[[0,1,0,0,2,0],
[0,0,1,0,0,0],
[3,0,0,0,0,0],
]
[Note: in the application context this matrix basically means that for row i (a simulation instance) we drew 1 element from "distribution 1" (1 \in [0,..K]) (a[0,1] = 1) and 2 from distribution 4 (a[0,4] = 2)].
What I need is to generate a 0-1 matrix that represents the same integer matrix using only ones. In this case, it is a 3D matrix of shape n*a.max()*K that has a 1 for each sample drawn from the distributions. [Note: we need this matrix so we can multiply by our K-distribution sample matrix]
Output
b = [[[0,1,0,0,1,0], # we don't care how the samples are stacked
[0,0,0,0,1,0],
[0,0,0,0,0,0]], # this is the first row representation
[[0,0,1,0,0,0],
[0,0,0,0,0,0],
[0,0,0,0,0,0]], # this is the second row representation
[[1,0,0,0,0,0],
[1,0,0,0,0,0],
[1,0,0,0,0,0]], # this is the third row representation
]
How can I do that in NumPy?
Thanks !
From @michael-szczesny's comment:
a = np.array([[0,1,0,0,2,0],
[0,0,1,0,0,0],
[3,0,0,0,0,0],
])
b = (np.arange(1, a.max()+1)[:,None] <= a[:,None]).astype('uint8')
print(repr(b))
array([[[0, 1, 0, 0, 1, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0]],
[[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]],
[[1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0]]], dtype=uint8)
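The comparison works by broadcasting a column of thresholds 1..a.max() against a with an extra middle axis, so b[i, j, k] is 1 exactly when a[i, k] >= j + 1. A short sketch that checks this by summing the layers back together (the name levels is introduced here for illustration):

```python
import numpy as np

a = np.array([[0, 1, 0, 0, 2, 0],
              [0, 0, 1, 0, 0, 0],
              [3, 0, 0, 0, 0, 0]])
levels = np.arange(1, a.max() + 1)[:, None]   # shape (3, 1): thresholds 1, 2, 3
b = (levels <= a[:, None]).astype('uint8')    # (3, 1) against (3, 1, 6) gives (3, 3, 6)
# Each count a[i, k] becomes that many stacked ones, so summing over axis 1 recovers a.
print(b.shape)
print(np.array_equal(b.sum(axis=1), a))
```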

Insert matrix inside another matrix using numpy without overwriting some original values

I need to insert a matrix inside another one using numpy.
The matrix I need to insert looks like this:
tetraminos = [[0, 1, 0],
[1, 1, 1]]
While the other matrix is like this:
board = numpy.array([
[6,0,0,0,0,0,0,0,0,0],
[6,0,0,0,0,0,0,0,0,0]
])
The code I'm currently using is this:
board[0:0 + len(tetraminos), 0:0 + len(tetraminos[0])] = tetraminos
The wrong matrix that I'm getting is this one:
wrong_matrix = numpy.array([
[0,1,0,0,0,0,0,0,0,0],
[1,1,1,0,0,0,0,0,0,0]
])
while the expected result is:
expected_result = numpy.array([
[6,1,0,0,0,0,0,0,0,0],
[1,1,1,0,0,0,0,0,0,0]
])
The problem is that, since the inserted matrix contains 0, inserting it overwrites the first value in the first row (the number 6), which I wanted to keep.
Full code:
import numpy
if __name__ == '__main__':
    board = numpy.array([
        [6,0,0,0,0,0,0,0,0,0],
        [6,0,0,0,0,0,0,0,0,0]
    ])
    tetraminos = [[0, 1, 0], [1, 1, 1]]
    board[0:0 + len(tetraminos), 0:0 + len(tetraminos[0])] = tetraminos
    print(board)
    expected_result = numpy.array([
        [6,1,0,0,0,0,0,0,0,0],
        [1,1,1,0,0,0,0,0,0,0]
    ])
    exit(1)
As long as you always want to put a constant value in there, you can treat your tetramino as a mask and use the np.putmask function:
>>> board = np.array([[6,0,0,0,0,0,0,0,0],[6,0,0,0,0,0,0,0,0]])
>>> board
array([[6, 0, 0, 0, 0, 0, 0, 0, 0],
[6, 0, 0, 0, 0, 0, 0, 0, 0]])
>>> tetraminos = [[0,1,0],[1,1,1]]
>>> np.putmask(board[0:len(tetraminos),0:len(tetraminos[0])], tetraminos,1)
>>> board
array([[6, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0, 0, 0, 0]])
You might do it in two steps:
tetraminos = np.array([[0, 1, 0], [1, 1, 1]])
temp = board[0:0 + len(tetraminos), 0:0 + len(tetraminos[0])]
board[0:0 + tetraminos.shape[0], 0:0 + tetraminos.shape[1]] = np.where(tetraminos == 0, temp, tetraminos)
Output:
array([[6, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0, 0, 0, 0, 0]])
A variant on the putmask
In [243]: board = np.array([
...: [6,0,0,0,0,0,0,0,0,0],
...: [6,0,0,0,0,0,0,0,0,0]
...: ])
In [244]: tetraminos = np.array([[0, 1, 0],
...: [1, 1, 1]])
In [245]: aview = board[:tetraminos.shape[0],:tetraminos.shape[1]]
Use tetraminos as a boolean mask to select the slots of aview to assign:
In [246]: aview[tetraminos.astype(bool)]=1
In [247]: board
Out[247]:
array([[6, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0, 0, 0, 0, 0]])
This could be generalized if tetraminos has other non-zero values.
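One possible generalization, as a sketch: copy only the non-zero cells of the piece, keeping their values, at an arbitrary position. The helper name place_piece and its top/left parameters are hypothetical, and the code assumes the piece fits entirely inside the board (there is no bounds checking):

```python
import numpy as np

def place_piece(board, piece, top=0, left=0):
    """Hypothetical helper: copy the non-zero cells of piece onto board in place."""
    piece = np.asarray(piece)
    h, w = piece.shape
    view = board[top:top + h, left:left + w]   # basic slicing returns a view of board
    mask = piece != 0
    view[mask] = piece[mask]                   # zeros in piece leave board untouched
    return board

board = np.array([[6, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                  [6, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
piece = np.array([[0, 2, 0],
                  [2, 2, 2]])   # illustrative non-binary values
place_piece(board, piece)
print(board)
```

Because view is a slice of board, assigning through the mask writes directly into the original array.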

Distance to next non-zero element in one-dimensional numpy array

I have a one-dimensional numpy array consisting of ones and zeroes, like this:
[0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
For each non-zero element of the array, I want to calculate the "distance" to the next non-zero element. That is, I want to answer the question "How far away is the next non-zero element?" So the result for the above array would be:
[0, 0, 0, 1, 3, 0, 0, 6, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0]
Is there a built-in numpy function for this? And if not, what's the most efficient way to implement this in numpy?
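Here is a two-liner. It modifies a in place, so if you don't want to overwrite the original, work on a.copy() instead. In this example the last 1 is the final element and has no successor, so it is left unchanged and dropped when printing.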
import numpy as np
a = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])
ix = np.where(a)[0]
a[ix[:-1]] = np.diff(ix)
print(a[:-1]) # --> array([0, 0, 0, 1, 3, 0, 0, 6, 0, 0, 0, 0, 0, 4, 0, 0, 0])
Probably not the best answer.
np.where will give you the locations of the non-zero indices in increasing order. By iterating through the result, you know the location of each 1 and the location of the following 1, and can build the result array yourself easily. If the 1s are sparse, this is probably pretty efficient.
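The iterative approach described above might look roughly like this (a sketch on the array from the question; the variable names are illustrative):

```python
import numpy as np

x = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])
ones = np.where(x)[0]            # positions of the non-zero entries, in ascending order
result = np.zeros_like(x)
# Pair each position with the one that follows it; the last one has no successor.
for current, following in zip(ones[:-1], ones[1:]):
    result[current] = following - current
print(result)
```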
Let me see if I can think of something more numpy-ish.
== UPDATE ==
Ah, just came to me
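import numpy as np
# x is the input array (here, the example from the question)
x = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])
# Find the indices of the ones in the array
temp = np.where(x)[0]
# Find the difference between adjacent indices
deltas = temp[1:] - temp[:-1]
# Build the result: each one except the last gets the distance to the next one
result = np.zeros_like(x)
result[temp[:-1]] = deltas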
Let's try:
import numpy as np
arr = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])
# create output
res = np.zeros_like(arr)
# select indices non-zero
where, = np.where(arr)
# at each non-zero index except the last, store the distance to the next one
res[where[:-1]] = np.diff(where)
print(res)
Output
[0 0 0 1 3 0 0 6 0 0 0 0 0 4 0 0 0 0]

Numpy re-index to first N natural numbers

I have a matrix that has a quite sparse index (the largest values in both rows and columns are beyond 130000), but only a few of those rows/columns actually have non-zero values.
Thus, I want to have the row and column indices shifted to only represent the non-zero ones, by the first N natural numbers.
Visually, I want an example matrix like this
1 0 1
0 0 0
0 0 1
to look like this
1 1
0 1
A row or column should be removed only if all of its values are zero.
Since I do have the matrix in a sparse format, I could simply create a dictionary, store every value by an increasing counter (for row and column separately), and get a result.
row_dict = {}
col_dict = {}
row_ind = 0
col_ind = 0
# el looks like this: (row, column, value)
for el in sparse_matrix:
    if el[0] not in row_dict.keys():
        row_dict[el[0]] = row_ind
        row_ind += 1
    if el[1] not in col_dict.keys():
        col_dict[el[1]] = col_ind
        col_ind += 1
# now recreate matrix with new index
But I was looking for maybe an internal function in NumPy. Also note that I do not really know how to word the question, so there might well be a duplicate out there that I do not know of; Any pointers in the right direction are appreciated.
You can use np.unique:
>>> import numpy as np
>>> from scipy import sparse
>>>
>>> A = np.random.randint(-100, 10, (10, 10)).clip(0, None)
>>> A
array([[6, 0, 5, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 7, 0, 0, 0, 0, 4, 9],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 4, 0],
[9, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 4, 0, 0, 0, 0, 0, 0]])
>>> B = sparse.coo_matrix(A)
>>> B
<10x10 sparse matrix of type '<class 'numpy.int64'>'
with 8 stored elements in COOrdinate format>
>>> runq, ridx = np.unique(B.row, return_inverse=True)
>>> cunq, cidx = np.unique(B.col, return_inverse=True)
>>> C = sparse.coo_matrix((B.data, (ridx, cidx)))
>>> C.A
array([[6, 5, 0, 0, 0],
[0, 0, 7, 4, 9],
[0, 0, 0, 4, 0],
[9, 0, 0, 0, 0],
[0, 0, 4, 0, 0]])
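The key step is return_inverse=True, which maps each original index to its rank among the distinct indices. A NumPy-only sketch on the 3x3 example from the question, assuming the data is available as separate row, column, and value arrays (that layout, and the variable names, are assumptions for illustration):

```python
import numpy as np

# (row, column, value) triplets for the non-zero entries of the 3x3 example.
rows = np.array([0, 0, 2])
cols = np.array([0, 2, 2])
vals = np.array([1, 1, 1])

# The inverse arrays are the new, gap-free indices; the unique arrays keep the old labels.
row_labels, new_rows = np.unique(rows, return_inverse=True)
col_labels, new_cols = np.unique(cols, return_inverse=True)

# A dense array is built here only for display; real data would stay sparse.
compact = np.zeros((len(row_labels), len(col_labels)), dtype=vals.dtype)
compact[new_rows, new_cols] = vals
print(compact)
print(row_labels, col_labels)
```

Keeping row_labels and col_labels around makes it possible to map a compact index back to its original row or column.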

create sparse array from diagonal parts

How to construct sparse matrix from diagonal vectors like this:
Let's say my matrix is square with dimension N=6 and I have the following vector
vec = np.array([[1], [1,2]], dtype=object)
and I want to put those parts on diagonals
offset = np.array([2,3])
but vec[0] should start at Mat[0,2] and vec[1] should start at Mat[1,4]
I know about scipy.sparse.diags() but I don't think there is a way to specify just part of a diagonal where non-zero elements are present.
This is just an example to illustrate the problem. In reality I deal with very big arrays and I don't want to waste memory on useless zeros.
Is this the matrix that you want?
In [200]: sparse.dia_matrix(([[0,0,1,0,0,0],[0,0,0,0,1,2]],[2,3]),(6,6)).A
Out[200]:
array([[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 2],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
Yes, the specification includes zeros, which could be annoying in larger cases.
spdiags just wraps the dia_matrix, with the option of converting the result to another format. In your example that converts a 7 element sparse to a 3.
sparse.diags accepts a ragged list of values, but they still need to match the diagonals in length. And internally it converts them to the rectangular array that dia_matrix takes.
S3=sparse.diags([[1,0,0,0],[0,1,2]],[2,3],(6,6))
So if you really need to be parsimonious about the zeros you need to go the coo route.
For example:
In [363]: starts = [[0,2],[1,4]]
In [364]: data = np.concatenate(vec)
In [365]: rows=np.concatenate([range(s[0],s[0]+len(v)) for s,v in zip(starts, vec)])
In [366]: cols=np.concatenate([range(s[1],s[1]+len(v)) for s,v in zip(starts, vec)])
In [367]: sparse.coo_matrix((data,(rows,cols)),(6,6)).A
Out[367]:
array([[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 2],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
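The start positions do not have to be written out by hand: on diagonal k, the entry in row r sits in column r + k. A sketch that derives the coordinates from the offsets in the question, assuming the starting row of each piece is known (start_rows is a hypothetical input, and the dense array is only there to check the result):

```python
import numpy as np

vec = [[1], [1, 2]]      # diagonal pieces, kept as plain lists because they are ragged
offsets = [2, 3]         # diagonal offsets from the question
start_rows = [0, 1]      # assumed input: row where each piece begins

data = np.concatenate(vec)
rows = np.concatenate([np.arange(r, r + len(v)) for r, v in zip(start_rows, vec)])
# Column index is row index plus the diagonal offset.
cols = np.concatenate([np.arange(r, r + len(v)) + k
                       for r, k, v in zip(start_rows, offsets, vec)])

# Dense check only; in practice (data, (rows, cols)) would go to a COO constructor.
dense = np.zeros((6, 6), dtype=data.dtype)
dense[rows, cols] = data
print(rows, cols)
print(dense)
```

Only the three non-zero values and their coordinates are stored, which is the point of the coo route.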
