How to convert a dataframe to ndarray of 0s and 1s? - python

I have a Pandas dataframe which looks like this:
col_1 col_2
a 4
a 3
b 2
c 2
d 1
b 4
c 1
I need to transform it into a 2D NumPy array in which each row corresponds to one of the letters. For example, if 'a' doesn't occur together with 1 and 2 but does occur with 3 and 4, the row corresponding to it should look like [0, 0, 1, 1]. So in this example I need:
[[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 0, 0], [1, 0, 0, 1]]
What is the best way to do this?

Here is one way, using pd.crosstab:
l = pd.crosstab(df.col_1,df.col_2).values.tolist()
Out[23]: [[0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 0]]
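A self-contained sketch of the crosstab approach, assuming the frame is named df as in the question. pd.crosstab counts co-occurrences, so the clip step (an addition of mine, not part of the answer above) caps the counts at 1 in case a letter/number pair appears more than once.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_1": ["a", "a", "b", "c", "d", "b", "c"],
    "col_2": [4, 3, 2, 2, 1, 4, 1],
})

# Rows are the sorted letters, columns are the sorted numbers (1..4).
counts = pd.crosstab(df["col_1"], df["col_2"])

# crosstab counts co-occurrences; clip so repeated pairs still give 0/1.
result = counts.clip(upper=1).to_numpy()
print(result)
```

Using to_numpy() keeps the result as an ndarray; .values.tolist() as in the answer gives a nested Python list instead.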

Related

How to access odd-index and even-index elements and merge them vertically

I started learning NumPy yesterday.
My aim is to extract the odd-index elements and the even-index elements from a NumPy array and merge them side by side vertically.
Let's say I have the array
mat = np.array([[1, 1, 0, 0, 0],
[0, 1, 0, 0, 1],
[1, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[1, 0, 1, 0, 1]])
What I've tried:
I transposed the array first, since I have to merge side by side vertically.
mat = np.transpose(mat)
Which gives me
[[1 0 1 0 1]
[1 1 0 0 0]
[0 0 0 0 1]
[0 0 1 0 0]
[0 1 1 0 1]]
I've tried accessing odd index elements
odd = mat[1::2]
print(odd)
Gives me
[[1 1 0 0 0] ----> wrong... should be [0,1,0,0,1], right? I'm confused.
[0 0 1 0 0]] ----> wrong... should be [0,0,0,0,0], right? Where are these coming from?
My final output should look like
[[0 0 1 1 1]
[1 0 1 0 0]
[0 0 0 0 1]
[0 0 0 1 0]
[1 0 0 1 1]]
Type: np.ndarray
Looks like you want:
mat[np.r_[1:mat.shape[0]:2,:mat.shape[0]:2]].T
Output:
array([[0, 0, 1, 1, 1],
[1, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 1, 0],
[1, 0, 0, 1, 1]])
Intermediate:
np.r_[1:mat.shape[0]:2,:mat.shape[0]:2]
output: array([1, 3, 0, 2, 4])
While the selection of rows is straightforward, there are various ways of combining them.
In [244]: mat = np.array([[1, 1, 0, 0, 0],
...: [0, 1, 0, 0, 1],
...: [1, 0, 0, 1, 1],
...: [0, 0, 0, 0, 0],
...: [1, 0, 1, 0, 1]])
The odd rows:
In [245]: mat[1::2,:] # or mat[1::2]
Out[245]:
array([[0, 1, 0, 0, 1],
[0, 0, 0, 0, 0]])
The even rows:
In [246]: mat[0::2,:]
Out[246]:
array([[1, 1, 0, 0, 0],
[1, 0, 0, 1, 1],
[1, 0, 1, 0, 1]])
Joining the rows vertically (np.vstack can also be used):
In [247]: np.concatenate((mat[1::2,:], mat[0::2,:]), axis=0)
Out[247]:
array([[0, 1, 0, 0, 1],
[0, 0, 0, 0, 0],
[1, 1, 0, 0, 0],
[1, 0, 0, 1, 1],
[1, 0, 1, 0, 1]])
But since you want columns, transpose:
In [248]: np.concatenate((mat[1::2,:], mat[0::2,:]), axis=0).transpose()
Out[248]:
array([[0, 0, 1, 1, 1],
[1, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 1, 0],
[1, 0, 0, 1, 1]])
We could transpose the selections first:
np.concatenate((mat[1::2,:].T, mat[0::2,:].T), axis=1)
or transpose before indexing (note the change in the ':' slice position):
np.concatenate((mat.T[:,1::2], mat.T[:,0::2]), axis=1)
The r_ in the other answer converts the slices into arrays and concatenates them, to make one row indexing array. That's equally valid.
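To make the comparison concrete, here is a small check (a sketch using the mat from the question; the names row_order, via_r and via_concat are mine) that the np.r_ index array and the concatenate approach select the same rows in the same order.

```python
import numpy as np

mat = np.array([[1, 1, 0, 0, 0],
                [0, 1, 0, 0, 1],
                [1, 0, 0, 1, 1],
                [0, 0, 0, 0, 0],
                [1, 0, 1, 0, 1]])

n = mat.shape[0]

# np.r_ turns the two slices into one integer index array:
# odd rows first, then even rows.
row_order = np.r_[1:n:2, 0:n:2]
print(row_order)  # [1 3 0 2 4]

via_r = mat[row_order].T
via_concat = np.concatenate((mat[1::2], mat[0::2]), axis=0).T

print(np.array_equal(via_r, via_concat))  # True
```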
Here is an alternative logic you can use.
1. Convert the array to a list (optional).
2. Select the nested list items with mat_list[1::2] for odd and mat_list[::2] for even.
3. Join them vertically using np.concatenate with `axis=0`.
4. Transpose the result.
Implementation:
mat = np.array([[1, 1, 0, 0, 0],
[0, 1, 0, 0, 1],
[1, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[1, 0, 1, 0, 1]])
mat_list = mat.tolist()  # optional
l_odd = mat_list[1::2]
l_even= mat_list[::2]
mask = np.concatenate((l_odd, l_even), axis=0)
mask = np.transpose(mask)
print(mask)
Output:
[[0 0 1 1 1]
[1 0 1 0 0]
[0 0 0 0 1]
[0 0 0 1 0]
[1 0 0 1 1]]
Checking Type
print(type(mask))
Gives
<class 'numpy.ndarray'>

How can I get a numpy array slice by choosing specific rows and columns, in place?

As in the title, if I have a matrix a
a = np.diag(np.arange(5))
array([[0, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 2, 0, 0],
[0, 0, 0, 3, 0],
[0, 0, 0, 0, 4]])
How can I assign a new 4x4 matrix or even 3x4 matrix to a without i-th row and i-th column? Let's say
b = array([[1,1,1,1],
[1,1,1,1],
[1,1,1,1]])
I want to slice a so that the first and second rows and the second column are left out, and assign b to what remains, which in R would be something like
a[c(-1,-2), -2] = b
a =
array([[0, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[1, 0, 1, 1, 1],
[1, 0, 1, 1, 1],
[1, 0, 1, 1, 1]])
But in python, I tried something like
a[[2,3,4],:][:,[0,2,3,4]]
output:
array([[0, 2, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 4]])
This operation won't allow me to assign a new matrix to slices of a.
How can I do that? I really appreciate any help you can provide.
P.S.
I found that in this special case I can assign values by blocks. But what I actually want to ask is this: when we slice like a[2:5, [0,2,3,4]], we get a 3x4 matrix and can assign a new matrix to that position. What I want is to index with a[[0,2,3,4],[0,2,3,4]] to get a 4x4 matrix or some other shape (the row and column indices may even be random) and assign a new matrix to that position. But NumPy gives me a 1D array.
newmatrix = a[[0, 1, 3, 4], :][:, [0, 1, 3, 4]]
Regarding setting the values of a part of a larger matrix: chained indexing like the line above returns a copy, so assigning to it leaves a unchanged. One workaround is to rebuild the original matrix around the one to be inserted:
before = np.array([[0, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 2, 0, 0],
[0, 0, 0, 3, 0],
[0, 0, 0, 0, 4]])
insert_array = np.array([[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]])
# first two rows without the second column
first_step = np.delete(before[:2, :], 1, 1)
# or
first_step = before[:2, [0, 2, 3, 4]]
# placed on top of the insert matrix
second_step = np.insert(insert_array, 0, first_step, axis=0)
# second column inserted back at position 1
third_step = np.insert(second_step, 1, before[:, 1], axis=1)
# final matrix
third_step = np.array([[0, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[1, 0, 1, 1, 1],
[1, 0, 1, 1, 1],
[1, 0, 1, 1, 1]])
I couldn't find a one-step solution for that, but I think we can assign the matrix by blocks.
a[2:5, 0] = 1
a[2:5, 2:5] = 1
Then I can get what I want.
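For arbitrary row and column lists, another option not mentioned in the answers above is np.ix_, which builds an open mesh from the two index lists; the resulting indexer can be used on the left-hand side of an assignment. A minimal sketch, assuming the a and b from the question (the names rows and cols are mine):

```python
import numpy as np

a = np.diag(np.arange(5))
b = np.ones((3, 4), dtype=int)

rows = [2, 3, 4]      # leave out rows 0 and 1
cols = [0, 2, 3, 4]   # leave out column 1

# np.ix_ reshapes rows to (3, 1) and cols to (1, 4); together they
# broadcast to a 3x4 block, and assignment writes into a itself.
a[np.ix_(rows, cols)] = b
print(a)
```

By contrast, a[[0,2,3,4],[0,2,3,4]] pairs the two lists element by element, which is why it returns a 1D array of four values rather than a 4x4 block.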

How to set values in a 2d numpy array given 1D indices for each row?

In NumPy you can set the elements of a 1D array at given indices to a value:
import numpy as np
b = np.array([0, 0, 0, 0, 0])
indices = [1, 3]
b[indices] = 1
b
array([0, 1, 0, 1, 0])
I'm trying to do this with multi-rows and an index for each row in the most programmatically elegant and computationally efficient way possible. For example
b = np.array([[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]])
indices = [[1, 3], [0, 1], [0, 3]]
The desired result is
array([[0, 1, 0, 1, 0],
[1, 1, 0, 0, 0],
[1, 0, 0, 1, 0]])
I tried b[indices] and b[:,indices] but they resulted in an error or undesired result.
From searching, there are a few workarounds, but each tends to need at least one loop in Python.
Solution 1: Run a loop through each row of the 2D array. The drawback is that the loop runs in Python, so this part won't take advantage of NumPy's C processing.
Solution 2: Use np.put. The drawback is that put works on a flattened version of the input array, so the indices need to be flattened too and offset by the row size and number of rows, which would use a double for loop in Python.
Solution 3: put_along_axis seems to only be able to set one value per row, so I would need to repeat this function for the number of values per row.
What would be the most computationally and programmatically elegant solution? Anything where NumPy would handle all the operations?
In [330]: b = np.zeros((3,5),int)
To index with the (3,2) array of columns, the row indices need to have shape (3,1), so that the two match by broadcasting:
In [331]: indices = np.array([[1,3],[0,1],[0,3]])
In [332]: b[np.arange(3)[:,None], indices] = 1
In [333]: b
Out[333]:
array([[0, 1, 0, 1, 0],
[1, 1, 0, 0, 0],
[1, 0, 0, 1, 0]])
np.put_along_axis does the same thing:
In [335]: b = np.zeros((3,5),int)
In [337]: np.put_along_axis(b, indices,1,axis=1)
In [338]: b
Out[338]:
array([[0, 1, 0, 1, 0],
[1, 1, 0, 0, 0],
[1, 0, 0, 1, 0]])
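The broadcasting step can be inspected directly. In this sketch (the names rows, cols, paired_rows and paired_cols are mine), the (3, 1) row index broadcasts against the (3, 2) column index, so each row number is paired with both of its column numbers.

```python
import numpy as np

b = np.zeros((3, 5), dtype=int)
cols = np.array([[1, 3], [0, 1], [0, 3]])   # shape (3, 2)
rows = np.arange(3)[:, None]                # shape (3, 1)

# Broadcasting expands rows to (3, 2) so it lines up with cols.
paired_rows, paired_cols = np.broadcast_arrays(rows, cols)
print(paired_rows.tolist())  # [[0, 0], [1, 1], [2, 2]]

# Each (row, col) pair is set in one vectorized assignment.
b[rows, cols] = 1
print(b)
```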
One solution is to build the indices for each dimension and then use integer-array (advanced) indexing:
from itertools import chain
b = np.array([[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]])
# Find the indices along the axis 0
y = np.arange(len(indices)).repeat(np.fromiter(map(len, indices), dtype=np.int_))
# Flatten the list and convert it to an array
x = np.fromiter(chain.from_iterable(indices), dtype=np.int_)
# Finally set the items
b[y, x] = 1
It works even for indices lists with variable-sized sub-lists like indices = [[1, 3], [0, 1], [0, 2, 3]]. If your indices list always contains the same number of items in each sub-list then you can use the (more efficient) following code:
b = np.array([[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]])
indices = np.array(indices)
n, m = indices.shape
y = np.arange(n).repeat(m)
x = indices.ravel()
b[y, x] = 1
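The variable-length case mentioned above can be checked with a short sketch. It follows the same idea as the first snippet, but uses np.repeat with a plain list of lengths, which is my simplification rather than the answer's exact code.

```python
from itertools import chain

import numpy as np

b = np.zeros((3, 5), dtype=int)
indices = [[1, 3], [0, 1], [0, 2, 3]]  # last row has three entries

# Row number repeated once per entry in that row: [0, 0, 1, 1, 2, 2, 2]
y = np.repeat(np.arange(len(indices)), [len(row) for row in indices])
# Column numbers flattened in the same order: [1, 3, 0, 1, 0, 2, 3]
x = np.fromiter(chain.from_iterable(indices), dtype=np.int_)

b[y, x] = 1
print(b)
```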
Simple one-liner based on Jérôme's answer (requires all items of indices to be equal-length):
>>> b[np.arange(np.size(indices)) // len(indices[0]), np.ravel(indices)] = 1
>>> b
array([[0, 1, 0, 1, 0],
[1, 1, 0, 0, 0],
[1, 0, 0, 1, 0]])

Clearing a row based on the state of the first element

There is a 3d array:
Input:
[[[0,2,3,4]
[4,2,3,4]
[6,2,3,4]]
[[2,2,3,4]
[3,2,3,4]
[2,2,3,4]]]
How can I make a numpy array look like this?
rule: if array[:,:,0] < 3
Output:
[[[0,0,0,0]
[4,2,3,4]
[6,2,3,4]]
[[0,0,0,0]
[0,0,0,0]
[0,0,0,0]]]
Here's one way:
a[a[:, :, 0] <= 3, :] = 0
OUTPUT:
array([[[0, 0, 0, 0],
[4, 2, 3, 4],
[6, 2, 3, 4]],
[[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]]])
NOTE: I've assumed you want to set rows to 0 where the first value is less than or equal to 3, since the expected output also zeroes the row that starts with 3. Change the condition if required.
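A runnable version of the same idea, assuming the input from the question is stored in a (the name mask is mine). The boolean mask has the shape of the first two axes, so indexing with it selects whole length-4 rows.

```python
import numpy as np

a = np.array([[[0, 2, 3, 4],
               [4, 2, 3, 4],
               [6, 2, 3, 4]],
              [[2, 2, 3, 4],
               [3, 2, 3, 4],
               [2, 2, 3, 4]]])

# Boolean mask of shape (2, 3): True where a row's first element is <= 3.
mask = a[:, :, 0] <= 3

# Indexing the first two axes with the mask selects entire rows,
# and the scalar 0 is broadcast across each selected row.
a[mask] = 0
print(a)
```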

Python - Left Ordered Binary Matrix

I have a binary matrix of N rows and K columns. I would like to reorder the matrix into "left-ordered form," meaning the columns are permuted such that if each column is treated as a number in binary format (with the first row's value as most significant bit and the last row's value as the least significant bit), the columns are in decreasing value from left to right. For instance, this would look like the following:
[[0, 1, 1],
[1, 1, 0]]
becomes
[[1, 1, 0],
[1, 0, 1]]
This question is the same as Left ordered binary matrix algorithm in R, but the answer there is insufficient. I have no upper bound on the number of rows N, so explicitly computing each column's binary value is impossible.
Unless I'm missing something, this is simply sorting the columns in reverse lexicographic order?
import numpy as np
sample = np.array([
[1, 0, 1, 1, 0, 0, 1, 0],
[1, 1, 0, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 0, 1, 1]
])
sample[:, np.lexsort(-sample[::-1])]
output:
array([[1, 1, 1, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 1, 1, 0, 0],
[1, 0, 1, 0, 1, 0, 1, 0]])
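Applied to the small example from the question, the same expression can be unpacked as follows (a sketch; the names m and order are mine). Because np.lexsort compares the columns key by key, no per-column binary value is ever computed, so the number of rows is not limited by integer width.

```python
import numpy as np

m = np.array([[0, 1, 1],
              [1, 1, 0]])

# np.lexsort treats the LAST key as the primary one, so reverse the rows
# to make row 0 the most significant; negate to sort in decreasing order.
order = np.lexsort(-m[::-1])
print(order)        # [1 2 0]
print(m[:, order])
```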
You can adapt bucket sort to sort it row by row until there's nothing left to sort.
import numpy as np
sample = np.array([
[1, 0, 1, 1, 0, 0, 1, 0],
[1, 1, 0, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 0, 1, 1]
])
def SortMatrix(matrix, indices, row):
    if indices.size <= 1 or row >= matrix.shape[0]:
        return indices
    left = indices[np.where(matrix[row, indices] == 1)]
    right = indices[np.where(matrix[row, indices] == 0)]
    return np.concatenate((SortMatrix(matrix, left, row+1), SortMatrix(matrix, right, row+1)))
sample = sample[:,SortMatrix(sample, np.array(range(sample.shape[1])), 0)]
print(sample)
# [[1 1 1 1 0 0 0 0]
# [1 1 0 0 1 1 0 0]
# [1 0 1 0 1 0 1 0]]
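You can convert the columns to decimal and get the sorted order. Note that this only works while 2**N fits in the integer dtype, so it does not cover the unbounded-N case from the question.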
>>> mul = 2 ** (np.arange(arr.shape[0])[::-1]).reshape(1, -1)
>>> mul
array([[2, 1]], dtype=int32)
>>> order = np.argsort(mul @ arr).squeeze()[::-1]
>>> order
array([1, 0, 2], dtype=int64)
>>> arr[:, order]
array([[1, 1, 0],
[1, 0, 1]])
