Map numpy categorical data to a numpy vector - python

I have a NumPy array that looks like this:
my_arr = array([[0., 0., 0., 0., 1., 0.],
[0., 1., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 1., 0.],
[1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0.],
[0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0.],
...
...]
I want to return a vector that contains, for each row of my_arr, the index of the entry with value one. How can I do so?

You use np.argmax() for that.
inds = np.argmax(my_arr, axis=1)
# array([4, 1, 3, 4, 0, 4, 1, 4])

np.where(my_arr)[1]
Look at docs: https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html

You can use np.argwhere to return an array of coordinates:
arr = np.random.randint(0, 2, (5, 5))
print(arr)
[[0 0 1 1 1]
[0 1 0 1 1]
[1 1 0 0 1]
[1 1 1 0 0]
[1 1 1 1 0]]
res = np.argwhere(arr)
print(res)
array([[0, 2], [0, 3], ..., [4, 2], [4, 3]], dtype=int64)
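As a rough comparison of the three suggestions above, the following sketch applies them to a small one-hot array modelled on the one in the question (the four rows are picked for illustration only). It assumes every row contains exactly one nonzero entry.

```python
import numpy as np

# Small one-hot array modelled on the question's example (illustrative data)
my_arr = np.array([
    [0., 0., 0., 0., 1., 0.],
    [0., 1., 0., 0., 0., 0.],
    [0., 0., 0., 1., 0., 0.],
    [1., 0., 0., 0., 0., 0.],
])

by_argmax = np.argmax(my_arr, axis=1)    # position of the maximum in each row
by_where = np.where(my_arr)[1]           # column indices of the nonzero entries
by_argwhere = np.argwhere(my_arr)[:, 1]  # second column of the (row, col) pairs

print(by_argmax)    # [4 1 3 0]
print(by_where)     # [4 1 3 0]
print(by_argwhere)  # [4 1 3 0]
```

The results only agree under the one-hot assumption: for a row of all zeros, np.argmax still returns 0, while np.where and np.argwhere simply skip that row.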

Related

np.ufunc.at for 2D array

To compute a confusion matrix (not the accuracy), a loop over the predicted and true labels may be needed. How can this be done the NumPy way, given that the following code does not give the needed result?
>> a = np.zeros((5, 5))
>> indices = np.array([
[0, 0],
[2, 2],
[4, 4],
[0, 0],
[2, 2],
[4, 4],
])
np.add.at(a, indices, 1)
>> a
>> array([
[4., 4., 4., 4., 4.],
[0., 0., 0., 0., 0.],
[4., 4., 4., 4., 4.],
[0., 0., 0., 0., 0.],
[4., 4., 4., 4., 4.]
])
# Wanted
>> array([
[2., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 2., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 2.]
])
The docs say: "If first operand has multiple dimensions, indices can be a tuple of array like index objects or slice objects."
Passing the indices as a tuple of row and column arrays gives the wanted result:
np.add.at(a, (indices[:, 0], indices[:, 1]), 1)
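Applied to the confusion-matrix use case from the question, the tuple form might look like the sketch below. The labels y_true and y_pred and the class count are made up for illustration; rows are taken to be true labels and columns predicted labels, which is an assumption about the desired layout.

```python
import numpy as np

# Hypothetical labels for a 3-class problem
y_true = np.array([0, 2, 1, 0, 2, 2])
y_pred = np.array([0, 2, 2, 0, 1, 2])

n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)

# One index array per axis; repeated (true, pred) pairs accumulate
np.add.at(cm, (y_true, y_pred), 1)

print(cm)
# [[2 0 0]
#  [0 0 1]
#  [0 1 2]]
```

A plain cm[y_true, y_pred] += 1 would count each repeated pair only once, which is why np.add.at is used here.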

How to get rid of nested for loop

How do I replace the nested for loop without affecting the functionality of the code:
def addCoordinate(self, x, y, blockType):
    if self.x1 < x:
        self.x1 = x
    if self.y1 < y:
        self.y1 = y
    if self.x1 >= len(self.mazeboard) or self.y1 >= len(self.mazeboard):
        modified_board = [[1 for a in range(self.x1 + 1)] for b in range(self.y1 + 1)]
        for a in range(len(self.mazeboard)):
            for b in range(len(self.mazeboard[a])):
                modified_board[a][b] = self.mazeboard[a][b]
        self.mazeboard = modified_board
    self.mazeboard[x][y] = blockType
Yes, the nested loops and the range(len(self.mazeboard)) are highly unpythonic here, especially when you just want to extend a matrix like
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
to
0 0 0 0 0 1 1 1
0 0 0 0 0 1 1 1
0 0 0 0 0 1 1 1
0 0 0 0 0 1 1 1
0 0 0 0 0 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
You could work in place: complete the existing rows with ones, then add rows of ones until you reach the proper dimensions.
Self-contained example:
mazeboard = [[0]*5 for _ in range(5)]
x1 = 7
x2 = 7
old_len = len(mazeboard[0])
# extend the existing rows
for m in mazeboard:
    m += [1]*(x1+1-old_len)
# add rows
mazeboard += [[1]*(x1+1) for i in range(len(mazeboard),x2+1)]
print(mazeboard)
result:
[[0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1]]
So: no nested loop, no useless copy, and list multiplication generates lists of the proper length to add.
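Wrapped into a reusable helper, the in-place approach might look like the sketch below. The function name grow_board and its fill parameter are hypothetical and not part of the original class; x is treated as the row index and y as the column index, following mazeboard[x][y] in the question.

```python
def grow_board(board, x, y, fill=1):
    """Grow `board` (a list of lists) in place so that board[x][y] exists."""
    # Target row width: keep the current width unless y needs more columns
    width = max(len(board[0]) if board else 0, y + 1)
    # Extend the existing rows with the fill value
    for row in board:
        row += [fill] * (width - len(row))
    # Append new rows until row x exists
    board += [[fill] * width for _ in range(len(board), x + 1)]
    return board


mazeboard = [[0] * 5 for _ in range(5)]
grow_board(mazeboard, 7, 7)
mazeboard[7][7] = 0  # the new cell can now be assigned

print(len(mazeboard), len(mazeboard[0]))  # 8 8
```

Because += on a list mutates it, the caller's board object is updated without building a second matrix.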
If you work with matrices in Python, you may want to consider using NumPy.
Your example becomes trivial with NumPy. First, import numpy:
>>> import numpy as np
Create the 5x5 matrix:
>>> a=np.ones(shape=(5,5))
>>> a
array([[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]])
Expand that matrix with 5 more columns and 5 more rows:
>>> a=np.pad(a,((0,5),(0,5)),mode='constant', constant_values=0)
>>> a
array([[ 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
Instead of nested Python loops, you get compiled C code executing the matrix operations, which is typically many times faster.
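To match the maze example more closely (a board of zeros that grows with ones), the padding amounts can be derived from the target coordinate. This is only a sketch; the names board, x and y are chosen for illustration.

```python
import numpy as np

board = np.zeros((5, 5), dtype=int)
x, y = 7, 7  # coordinate that must become addressable

# Number of extra rows/columns needed (never negative)
pad_rows = max(0, x + 1 - board.shape[0])
pad_cols = max(0, y + 1 - board.shape[1])

# Pad only at the bottom and on the right, filling with ones
board = np.pad(board, ((0, pad_rows), (0, pad_cols)),
               mode='constant', constant_values=1)

print(board.shape)  # (8, 8)
```

Note that np.pad returns a new array instead of growing the old one in place, so the result has to be assigned back.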

A neater way to set values at indexes with NumPy

I have a numpy array initially with zeros, like this:
v = np.zeros((5, 5))
v
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
I also have a set of arrays idx1 and idx2.
idx1
array([[0, 3],
[0, 4],
[1, 3],
[2, 4]])
idx2
array([[0, 1],
[0, 2],
[0, 4],
[1, 3]])
Look upon each pair of values as row and column indices. So, for example, in idx1, the first pair (0, 3) would be indexers into v[0, 3] and so on.
I want to first set values at indexes specified by idx1 to 1, followed by all indexes specified by idx2 to 0.
Also, please note that if there is a pair (i, j) in some array, I want to set v[i, j] and v[j, i] at the same time.
My final result becomes:
array([[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0.]])
I currently achieve this by doing:
def set_vals(x, i, j, v):
    x[i, j] = x.T[i, j] = v
v = np.zeros((5, 5))
i1, j1 = idx1[:, 0], idx1[:, 1]
i2, j2 = idx2[:, 0], idx2[:, 1]
set_vals(v, i1, j1, 1)
set_vals(v, i2, j2, 0)
v # the result
However, I believe there might be a better way. Would love to hear any thoughts/suggestions for improvement. Thanks!
In search of a more "compact" way of expressing it, I got this:
v = np.zeros((5, 5))
v[tuple(np.r_[idx1,idx1[:,::-1]].T)] = 1
v[tuple(np.r_[idx2,idx2[:,::-1]].T)] = 0
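On Python 3.6+, you can use the * unpacking operator to shorten this further. Build a tuple rather than a list, because recent NumPy versions treat a list of index arrays as a single array index instead of one index per axis:
v[(*np.r_[idx1,idx1[:,::-1]].T,)] = 1
v[(*np.r_[idx2,idx2[:,::-1]].T,)] = 0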
v
array([[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0.]])
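For readers who prefer something more explicit than the one-liners, a small helper can do the symmetric assignment with two plain fancy-indexing statements. The name set_symmetric is hypothetical; idx1 and idx2 are the arrays from the question.

```python
import numpy as np

def set_symmetric(mat, idx, value):
    """Set mat[i, j] and mat[j, i] to `value` for every (i, j) row of idx."""
    rows, cols = idx[:, 0], idx[:, 1]
    mat[rows, cols] = value
    mat[cols, rows] = value

idx1 = np.array([[0, 3], [0, 4], [1, 3], [2, 4]])
idx2 = np.array([[0, 1], [0, 2], [0, 4], [1, 3]])

v = np.zeros((5, 5))
set_symmetric(v, idx1, 1)  # first switch the idx1 positions on
set_symmetric(v, idx2, 0)  # then switch the idx2 positions off

print(v)
```

The order of the two calls matters: pairs that appear in both arrays, such as (0, 4) and (1, 3), end up as 0.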

Numpy Cyclic Broadcast of Fancy Indexing

A is a NumPy array with shape (6, 8).
I want:
x_id = np.array([0, 3])
y_id = np.array([1, 3, 4, 7])
A[[x_id, y_id]] += 1  # this doesn't actually work
Tricks like ::2 won't work because the indices do not increase regularly.
I don't want to use extra memory to repeat [0, 3] and make a new array [0, 3, 0, 3] because that is slow.
The indices for the two dimensions do not have equal length.
What I want is equivalent to:
A[0, 1] += 1
A[3, 3] += 1
A[0, 4] += 1
A[3, 7] += 1
Can numpy do something like this?
Update:
Not sure if broadcast_to or stride_tricks is faster than nested python loops. (Repeat NumPy array without replicating data?)
You can convert y_id to a 2D array whose second dimension has the same size as x_id; the two index arrays are then broadcast against each other automatically:
x_id = np.array([0, 3])
y_id = np.array([1, 3, 4, 7])
A = np.zeros((6,8))
A[x_id, y_id.reshape(-1, x_id.size)] += 1
A
array([[ 0., 1., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0.]])
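To see which (row, column) pairs the reshaped index actually addresses, the broadcast can be made explicit with np.broadcast_arrays. This sketch is only for inspection; it assumes y_id.size is a multiple of x_id.size, otherwise the reshape fails.

```python
import numpy as np

x_id = np.array([0, 3])
y_id = np.array([1, 3, 4, 7])

cols = y_id.reshape(-1, x_id.size)  # [[1, 3], [4, 7]]

# Broadcast x_id against cols without materialising repeats of x_id by hand
rows_b, cols_b = np.broadcast_arrays(x_id, cols)
pairs = list(zip(rows_b.ravel().tolist(), cols_b.ravel().tolist()))
print(pairs)  # [(0, 1), (3, 3), (0, 4), (3, 7)]

A = np.zeros((6, 8))
A[x_id, cols] += 1  # same pairs as the four explicit assignments
```

If the same pair could occur more than once, += would still increment it only once; the pairs here are all distinct, so that does not matter.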

Keras np_utils.to_categorical behaves differently

Why does Keras to_categorical behave differently on [1, -1] and [2, -2]?
y = [1, -1, -1]
y_ = np_utils.to_categorical(y)
array([[ 0., 1.],
[ 0., 1.],
[ 0., 1.]])
y = [2, -2, -2]
y_ = np_utils.to_categorical(y)
array([[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 1., 0.]])
to_categorical does not handle negative values. If your dataset has negative labels, you can pass y - y.min() to to_categorical so that it works as you would expect:
>>> y = numpy.array([2, -2, -2])
>>> to_categorical(y)
array([[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 1., 0.]])
>>> to_categorical(y - y.min())
array([[ 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 0.]])
y = np.array(y, dtype='int').ravel()
if not num_classes:
    num_classes = np.max(y) + 1
n = y.shape[0]
categorical = np.zeros((n, num_classes))
categorical[np.arange(n), y] = 1
The above is the implementation of to_categorical.
So in the [1, -1, -1] case, this is what happens:
num_classes = 2 [np.max(y) + 1]
The shape of categorical becomes (3, 2).
The label -1 is used as a negative index, so it selects the last column and sets it to 1. The label 1 selects column 1 (indices start from 0), which here is that same last column.
That is why the final output becomes
array([[ 0., 1.],
[ 0., 1.],
[ 0., 1.]])
In the [2, -2, -2] case, this is what happens:
num_classes = 3 [np.max(y) + 1]
The shape of categorical becomes (3, 3).
The label -2 is used as a negative index, so it selects the second-to-last column and sets it to 1. The label 2 selects column 2 (indices start from 0).
That is why the final output becomes
array([[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 1., 0.]])
So if you try something like [2, -4, -4], you will get an error: the shape of categorical is (3, 3), and there is no index -4 along an axis of length 3.
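The behaviour described above can be reproduced without Keras by wrapping the quoted lines in a function. The name one_hot_sketch is hypothetical, and this is a stand-in built from the snippet in this answer, not the library's actual code.

```python
import numpy as np

def one_hot_sketch(y, num_classes=None):
    """Stand-in for the quoted to_categorical logic (not the Keras function)."""
    y = np.array(y, dtype='int').ravel()
    if not num_classes:
        num_classes = np.max(y) + 1
    n = y.shape[0]
    categorical = np.zeros((n, num_classes))
    # Negative labels act as negative column indices here
    categorical[np.arange(n), y] = 1
    return categorical

a = one_hot_sketch([1, -1, -1])   # -1 wraps around to the last column
b = one_hot_sketch([2, -2, -2])   # -2 wraps around to the second-to-last column

y = np.array([2, -2, -2])
shifted = one_hot_sketch(y - y.min())  # labels become [4, 0, 0], 5 classes

error_raised = False
try:
    one_hot_sketch([2, -4, -4])   # -4 is out of bounds for 3 columns
except IndexError:
    error_raised = True

print(a, b, shifted, error_raised, sep="\n")
```

Shifting by y.min() keeps every label non-negative, so each distinct label gets its own column.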
