Convert NumPy array to 0 or 1 based on threshold

I have the array below:
a=np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
I want to convert this vector to a binary vector based on a threshold.
Taking threshold=0.5 as an example, every element greater than 0.5 should become 1, and every other element 0.
The output vector should look like this:
a_output = [0, 0, 0, 1, 1, 1]
How can I do this?

np.where
np.where(a > 0.5, 1, 0)
# array([0, 0, 0, 1, 1, 1])
Boolean masking with astype
(a > .5).astype(int)
# array([0, 0, 0, 1, 1, 1])
np.select
np.select([a <= .5, a>.5], [np.zeros_like(a), np.ones_like(a)])
# array([ 0., 0., 0., 1., 1., 1.])
Special case: np.round
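This is the best solution if your array holds floating-point values between 0 and 1 and your threshold is 0.5. NumPy rounds exact halves to the nearest even value, so 0.5 itself becomes 0, consistent with a > 0.5.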
a.round()
# array([0., 0., 0., 1., 1., 1.])
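The approaches above differ mainly in the dtype of the result. A minimal sketch comparing them on the array from the question; the second array `b` is an illustrative addition, not part of the original answer, used to show what rounding does to values that sit exactly on a half:

```python
import numpy as np

a = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
threshold = 0.5

via_where = np.where(a > threshold, 1, 0)   # integer array
via_mask = (a > threshold).astype(int)      # integer array
via_round = a.round()                       # float array, only valid for a 0.5 threshold

# Illustrative extra input: exact halves go to the nearest even value.
b = np.array([0.5, 1.5])
half_rounded = b.round()

print(via_where, via_mask, via_round, half_rounded)
```

For any threshold other than 0.5, only the comparison-based variants apply.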

You could use binarize from the sklearn.preprocessing module.
However, this only works if you want the final values to be binary, i.e. 0 or 1. The answers above also work well for non-binary results.
import numpy as np
from sklearn.preprocessing import binarize
a = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9]).reshape(1,-1)
x = binarize(a, threshold=0.5)  # the default threshold is 0.0, so pass 0.5 explicitly
a_output = np.ravel(x)
print(a_output)
# everything together
a_output = np.ravel(binarize(a.reshape(1,-1), threshold=0.5))

Related

How to implement Multinomial conditional distributions depending on the conditional binary value in Tensorflow Probability?

I am trying to build a graphical model in TensorFlow Probability, where we first sample a number of positive (1) and negative (0) examples (count_i) from a Categorical distribution and then construct a Multinomial distribution (Y_i) depending on the value of count_i. These events (Y_i) are mutually exclusive:
Y_1 ~ Multinomial([.9, 0.1, 0.05, 0.05, 0.1], total_count = tf.reduce_sum(tf.cast(count==1, tf.float32)))
Y_2 ~ Multinomial([0.99, 0.01, 0., 0., 0.], total_count = tf.reduce_sum(tf.cast(count==0, tf.float32)))
I have read these tutorials; however, I am stuck on two issues:
1. This code generates two arrays of length 500, whereas I only need one array of 500. What should I change so that we only get one sample from the Categorical distribution, and the Multinomial is then constructed from the overall count of the value we are conditioning on?
2. The sample from the Categorical distribution gives only values of 0, whereas it should be a mix of 0 and 1. What am I doing wrong here?
My code is as follows. You can run it to replicate the behaviour:
def simplied_model():
    return tfd.JointDistributionSequential([
        tfd.Uniform(low=0., high = 1., name = 'e'), #e
        lambda e: tfd.Sample(tfd.Categorical(probs = tf.stack([e, 1.-e], 0)), sample_shape =500), #count #should it be independent?
        lambda count: tfd.Multinomial(probs = tf.constant([[.9, 0.1, 0.05, 0.05, 0.1], [0.99, 0.01, 0., 0., 0.]]), total_count = tf.cast(tf.stack([tf.reduce_sum(tf.cast(count==1, tf.float32)),tf.reduce_sum(tf.cast(count==0, tf.float32))], 0), dtype= tf.float32))
    ])
tt = simplied_model()
tt.resolve_graph()
tt.sample(1)

The first array will be your Y_{1} and the second will be your Y_{2}. The key is that your output will always be of shape (2, 5) because that is the length of the probabilities you are passing to tfd.Multinomial.
Code:
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow_probability import distributions as tfd
# helper function
def _get_counts(vec):
    zeros = tf.reduce_sum(tf.cast(vec == 0, tf.float32))
    ones = tf.reduce_sum(tf.cast(vec == 1, tf.float32))
    return tf.stack([ones, zeros], 0)
joint = tfd.JointDistributionSequential([
    tfd.Sample( # sample from uniform to make it 2D
        tfd.Uniform(0., 1., name="e"), 1),
    lambda e: tfd.Sample(
        tfd.Categorical(probs=tf.stack([e, 1.-e], -1)), 500),
    lambda c: tfd.Multinomial(
        probs=[
            [0.9, 0.1, 0.05, 0.05, 0.1],
            [0.99, 0.01, 0., 0., 0.],
        ],
        total_count=_get_counts(c),
    )
])
joint.sample(5) # or however many you want to sample
Output:
# [<tf.Tensor: shape=(5, 1), dtype=float32, numpy=
# array([[0.5611458 ],
# [0.48223293],
# [0.6097224 ],
# [0.94013655],
# [0.14861858]], dtype=float32)>,
# <tf.Tensor: shape=(5, 1, 500), dtype=int32, numpy=
# array([[[1, 0, 0, ..., 1, 0, 1]],
#
# [[1, 1, 1, ..., 1, 0, 0]],
#
# [[0, 0, 0, ..., 1, 0, 0]],
#
# [[0, 0, 0, ..., 0, 0, 0]],
#
# [[1, 0, 1, ..., 1, 0, 1]]], dtype=int32)>,
# <tf.Tensor: shape=(2, 5), dtype=float32, numpy=
# array([[ 968., 109., 0., 0., 0.],
# [1414., 9., 0., 0., 0.]], dtype=float32)>]

How do you efficiently sum the occurrences of a value in one array at positions in another array

I'm looking for an efficient solution that avoids 'for' loops for an array-related problem. I want to use a huge 1D array (A, size = 250.000) of values between 0 and 40 for indexing in one dimension, and an array (B) of the same size with values between 0 and 9995 for indexing in a second dimension.
The result should be an array of size (41, 9996) that holds, for each index pair, the number of times a value from A occurs together with a value from B at the same position.
Example:
A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
which should result in:
[[0, 1, 0],
[0, 0, 0],
[0, 0, 1],
[0, 0, 2],
[1, 0, 0]]
The dirty way is too slow because the amount of data is huge. What you could do is:
out = np.zeros((41, 9996))
for i in A:
    for j in B:
        out[i,j] += 1
which will take 238.000 * 238.000 loops...
I've tried this, which works partially:
out = np.zeros((41, 9996))
out[A,B] += 1
This generates a result with 1 at every indexed position, regardless of how many times the values occur.
Does anyone have a clue how to fix this? Thanks in advance!

You are looking for a sparse tensor:
import torch
A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
idx = torch.LongTensor([A, B])
torch.sparse.FloatTensor(idx, torch.ones(idx.shape[1]), torch.Size([5,3])).to_dense()
Output:
tensor([[0., 1., 0.],
[0., 0., 0.],
[0., 0., 1.],
[0., 0., 2.],
[1., 0., 0.]])
You can also do the same with scipy sparse matrix:
import numpy as np
from scipy.sparse import coo_matrix
coo_matrix((np.ones(len(A)), (np.array(A), np.array(B))), shape=(5,3)).toarray()
output:
array([[0., 1., 0.],
[0., 0., 0.],
[0., 0., 1.],
[0., 0., 2.],
[1., 0., 0.]])
Sometimes it is better to leave the matrix in its sparse representation, rather than forcing it to be "dense" again.

Use numpy.add.at:
import numpy as np
A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
arr = np.zeros((5, 3))
np.add.at(arr, (A, B), 1)
print(arr)
Output
[[0. 1. 0.]
[0. 0. 0.]
[0. 0. 1.]
[0. 0. 2.]
[1. 0. 0.]]

Given that the numbers are in a small range, bincount would be a good choice for bin-based summing -
def accumulate_coords(A,B):
    nrows = A.max()+1
    ncols = B.max()+1
    return np.bincount(A*ncols+B,minlength=nrows*ncols).reshape(-1,ncols)
Sample run -
In [55]: A
Out[55]: array([0, 3, 2, 4, 3])
In [56]: B
Out[56]: array([1, 2, 2, 0, 2])
In [58]: accumulate_coords(A,B)
Out[58]:
array([[0, 1, 0],
[0, 0, 0],
[0, 0, 1],
[0, 0, 2],
[1, 0, 0]])
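As a rough cross-check of the answers above, the following sketch compares np.add.at and np.bincount against a plain loop over the paired positions zip(A, B). The random data, its size of 10,000 and all variable names are assumptions made for illustration; they are not the asker's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 41, 9996
A = rng.integers(0, n_rows, size=10_000)  # stand-in for the asker's A
B = rng.integers(0, n_cols, size=10_000)  # stand-in for the asker's B

# Reference result: one increment per (A[k], B[k]) pair.
expected = np.zeros((n_rows, n_cols), dtype=np.int64)
for i, j in zip(A, B):
    expected[i, j] += 1

# Unbuffered in-place add: repeated index pairs are accumulated.
via_add_at = np.zeros((n_rows, n_cols), dtype=np.int64)
np.add.at(via_add_at, (A, B), 1)

# Flatten each (row, col) pair to a single linear index and count occurrences.
via_bincount = np.bincount(
    A * n_cols + B, minlength=n_rows * n_cols
).reshape(n_rows, n_cols)

# Plain fancy-index "+=" is buffered, so duplicates are counted only once.
naive = np.zeros((n_rows, n_cols), dtype=np.int64)
naive[A, B] += 1

print(np.array_equal(expected, via_add_at), np.array_equal(expected, via_bincount))
```

The last variant reproduces the behaviour described in the question: `out[A, B] += 1` never stores a value above 1, however often a pair repeats.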

Python function that identifies if the numbers in a list or array are closer to 0 or 1

I have a numpy array of numbers. Below is an example:
[[-2.10044520e-04 1.72314372e-04 1.77235336e-04 -1.06613465e-04
6.76617611e-07 2.71623057e-03 -3.32789944e-05 1.44899758e-05
5.79249863e-05 4.06502549e-04 -1.35823707e-05 -4.13955189e-04
5.29862793e-05 -1.98286005e-04 -2.22829175e-04 -8.88758230e-04
5.62228710e-05 1.36249752e-05 -2.00474996e-05 -2.10090068e-05
1.00007518e+00 1.00007569e+00 -4.44597417e-05 -2.93724453e-04
1.00007513e+00 1.00007496e+00 1.00007532e+00 -1.22357142e-03
3.27903892e-06 1.00007592e+00 1.00007468e+00 1.00007558e+00
2.09869172e-05 -1.97610235e-05 1.00007529e+00 1.00007530e+00
1.00007503e+00 -2.68725642e-05 -3.00372853e-03 1.00007386e+00
1.00007443e+00 1.00007388e+00 5.86993822e-05 -8.69989983e-06
1.00007590e+00 1.00007488e+00 1.00007515e+00 8.81850779e-04
2.03875532e-05 1.00007480e+00 1.00007425e+00 1.00007517e+00
-2.44678912e-05 -4.36556267e-08 1.00007436e+00 1.00007558e+00
1.00007571e+00 -5.42990711e-04 1.45517859e-04 1.00007522e+00
1.00007469e+00 1.00007575e+00 -2.52271817e-05 -7.46339417e-05
1.00007427e+00]]
I want to know if each of the numbers is closer to 0 or 1. Is there a function in Python that could do it or do I have to do it manually?

A straightforward way:
lst=[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]
closerTo1 = [x >= 0.5 for x in lst]
Or you can use np:
import numpy as np
lst=[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]
arr = np.array(lst)
closerTo1 = arr >= 0.5
Note that >= 0.5 can be changed to > 0.5, depending on how you want to treat values that are exactly 0.5.

numpy.rint is a ufunc that will round the elements of an array to the nearest integer.
>>> a = np.arange(0, 1.1, 0.1)
>>> a
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
>>> np.rint(a)
array([0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1.])
What if the numbers don't have to be between 0 and 1?
In that case, I'd use numpy.where.
>>> a = np.arange(-2, 2.1, 0.1)
>>> a
array([-2.00000000e+00, -1.90000000e+00, -1.80000000e+00, -1.70000000e+00,
-1.60000000e+00, -1.50000000e+00, -1.40000000e+00, -1.30000000e+00,
-1.20000000e+00, -1.10000000e+00, -1.00000000e+00, -9.00000000e-01,
-8.00000000e-01, -7.00000000e-01, -6.00000000e-01, -5.00000000e-01,
-4.00000000e-01, -3.00000000e-01, -2.00000000e-01, -1.00000000e-01,
1.77635684e-15, 1.00000000e-01, 2.00000000e-01, 3.00000000e-01,
4.00000000e-01, 5.00000000e-01, 6.00000000e-01, 7.00000000e-01,
8.00000000e-01, 9.00000000e-01, 1.00000000e+00, 1.10000000e+00,
1.20000000e+00, 1.30000000e+00, 1.40000000e+00, 1.50000000e+00,
1.60000000e+00, 1.70000000e+00, 1.80000000e+00, 1.90000000e+00,
2.00000000e+00])
>>> np.where(a <= 0.5, 0, 1)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

Here is one simple way to do this:
>>> a = np.arange(-2, 2.1, 0.1)
>>> (a >= .5).astype(float)
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1.])
(Change float to int if you want integers. The np.float and np.int aliases were removed in NumPy 1.24.)

You could use numpy.where:
import numpy as np
arr = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 2.0])
result = np.where(arr >= 0.5, 1, 0)
print(result)
Output
[0 0 0 0 1 1 1 1 1 1]
Note that this will return 1 for numbers above 1 (for instance 2).

You could use abs() to measure the distance between each number and 0 and 1, and check which one is shorter.
x = [[-2.10044520e-04, 1.72314372e-04, 1.77235336e-04, -1.06613465e-04,
6.76617611e-07, 2.71623057e-03, -3.32789944e-05, 1.44899758e-05,
5.79249863e-05, 4.06502549e-04, -1.35823707e-05, -4.13955189e-04,
5.29862793e-05, -1.98286005e-04, -2.22829175e-04, -8.88758230e-04,
5.62228710e-05, 1.36249752e-05, -2.00474996e-05, -2.10090068e-05,
1.00007518e+00, 1.00007569e+00, -4.44597417e-05, -2.93724453e-04,
1.00007513e+00, 1.00007496e+00, 1.00007532e+00, -1.22357142e-03,
3.27903892e-06, 1.00007592e+00, 1.00007468e+00, 1.00007558e+00,
2.09869172e-05, -1.97610235e-05, 1.00007529e+00, 1.00007530e+00,
1.00007503e+00, -2.68725642e-05, -3.00372853e-03, 1.00007386e+00,
1.00007443e+00, 1.00007388e+00, 5.86993822e-05, -8.69989983e-06,
1.00007590e+00, 1.00007488e+00, 1.00007515e+00, 8.81850779e-04,
2.03875532e-05, 1.00007480e+00, 1.00007425e+00, 1.00007517e+00,
-2.44678912e-05, -4.36556267e-08, 1.00007436e+00, 1.00007558e+00,
1.00007571e+00, -5.42990711e-04, 1.45517859e-04, 1.00007522e+00,
1.00007469e+00, 1.00007575e+00, -2.52271817e-05, -7.46339417e-05,
1.00007427e+00]]
rounded_x = [0 if abs(i) < abs(1-i) else 1 for i in x[0]]
print(rounded_x)
Output:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1]

Here's a simple generalization for any arbitrary numbers a and b, instead of just 0 and 1:
def closerab(l, a=0, b=1):
    l = np.asarray(l)
    boolarr = (np.abs(l - b) > np.abs(l - a))
    # returns two arrays of indices, one for numbers closer to a and one for numbers closer to b
    return boolarr.nonzero()[0], (boolarr==0).nonzero()[0]
This returns two arrays, one with the indices of the numbers closer to a, and one with the indices of the numbers closer to b.
Testing it out:
l = [
-2.10044520e-04, 1.72314372e-04, 1.77235336e-04, 1.06613465e-04,
6.76617611e-07, 2.71623057e-03, 3.32789944e-05, 1.44899758e-05,
5.79249863e-05, 4.06502549e-04, 1.35823707e-05, 4.13955189e-04,
5.29862793e-05, 1.98286005e-04, 2.22829175e-04, 8.88758230e-04,
5.62228710e-05, 1.36249752e-05, 2.00474996e-05, 2.10090068e-05,
1.00007518e+00, 1.00007569e+00, 4.44597417e-05, 2.93724453e-04,
1.00007513e+00, 1.00007496e+00, 1.00007532e+00, 1.22357142e-03,
3.27903892e-06, 1.00007592e+00, 1.00007468e+00, 1.00007558e+00,
2.09869172e-05, 1.97610235e-05, 1.00007529e+00, 1.00007530e+00,
1.00007503e+00, 2.68725642e-05, 3.00372853e-03, 1.00007386e+00,
1.00007443e+00, 1.00007388e+00, 5.86993822e-05, 8.69989983e-06,
1.00007590e+00, 1.00007488e+00, 1.00007515e+00, 8.81850779e-04,
2.03875532e-05, 1.00007480e+00, 1.00007425e+00, 1.00007517e+00,
-2.44678912e-05, 4.36556267e-08, 1.00007436e+00, 1.00007558e+00,
1.00007571e+00, 5.42990711e-04, 1.45517859e-04, 1.00007522e+00,
1.00007469e+00, 1.00007575e+00, 2.52271817e-05, 7.46339417e-05,
1.00007427e+00
]
print(closerab(l, 0, 1))
This outputs:
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 22, 23, 27, 28, 32, 33, 37, 38, 42, 43, 47, 48, 52, 53,
57, 58, 62, 63]),
array([20, 21, 24, 25, 26, 29, 30, 31, 34, 35, 36, 39, 40, 41, 44, 45, 46,
49, 50, 51, 54, 55, 56, 59, 60, 61, 64]))
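If labels are more convenient than index arrays, the same distance comparison can be extended to any number of candidate values. The sketch below is an illustrative variation, not part of the answer above: the function name `nearest_target`, its parameters and the short sample list are all hypothetical, and it assumes 1-D input.

```python
import numpy as np

def nearest_target(values, targets=(0.0, 1.0)):
    """Return, for each value, the candidate in `targets` it is closest to."""
    values = np.asarray(values, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Pairwise distances, shape (len(values), len(targets)), via broadcasting.
    distances = np.abs(values[:, None] - targets[None, :])
    # argmin returns the first minimum, so ties go to the earlier target.
    return targets[distances.argmin(axis=1)]

sample = [-2.1e-04, 2.7e-03, 1.00007518, 0.4, 0.75]
result = nearest_target(sample)
print(result)
```

Because ties resolve to the earlier entry in `targets`, a value of exactly 0.5 would be assigned to 0 with the default arguments.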

Alternatively, you can use a ternary operator.
x = [-0.2, 0.1, 1.1, 0.75, 0.4, 0.2, 1.5, 0.9]
a = 0
b = 1
[a if i <= (a+b)/2 else b for i in x]

From the Python 2 built-in function docs for round(number[, ndigits]):
Return the floating point value number rounded to ndigits digits after the decimal point. If ndigits is omitted, it defaults to zero. The result is a floating point number. Values are rounded to the closest multiple of 10 to the power minus ndigits; if two multiples are equally close, rounding is done away from 0 (so, for example, round(0.5) is 1.0 and round(-0.5) is -1.0).
Note that Python 3 changed this: ties are rounded to the nearest even value, so round(0.5) is 0 and round(1.5) is 2.
For NumPy arrays in particular, you can use the numpy.round function (numpy.round_ was an alias that was removed in NumPy 2.0).
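A quick check of how ties are actually rounded under Python 3 and NumPy, as a sketch. Both round exact halves to the nearest even value, unlike the Python 2 rule quoted above; the input values are chosen purely for illustration.

```python
import numpy as np

# Python 3 built-in round: exact halves go to the nearest even integer.
builtin = [round(0.5), round(1.5), round(2.5), round(-0.5)]

# NumPy applies the same tie rule but keeps the float dtype.
numpy_rounded = np.round(np.array([0.5, 1.5, 2.5, -0.5]))

# If a value of exactly 0.5 should count as "closer to 1",
# compare against the midpoint explicitly instead of rounding.
ties_up = (np.array([0.49, 0.5, 0.51]) >= 0.5).astype(int)

print(builtin, numpy_rounded, ties_up)
```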

your_list=[[-2.10044520e-04, 1.72314372e-04, 1.77235336e-04, 1.06613465e-04,
6.76617611e-07, 2.71623057e-03, 3.32789944e-05, 1.44899758e-05,
5.79249863e-05, 4.06502549e-04, 1.35823707e-05, 4.13955189e-04,
5.29862793e-05, 1.98286005e-04, 2.22829175e-04, 8.88758230e-04,
5.62228710e-05, 1.36249752e-05, 2.00474996e-05, 2.10090068e-05,
1.00007518e+00, 1.00007569e+00, 4.44597417e-05, 2.93724453e-04,
1.00007513e+00, 1.00007496e+00, 1.00007532e+00, 1.22357142e-03,
3.27903892e-06, 1.00007592e+00, 1.00007468e+00, 1.00007558e+00,
2.09869172e-05, 1.97610235e-05, 1.00007529e+00, 1.00007530e+00,
1.00007503e+00, 2.68725642e-05, 3.00372853e-03, 1.00007386e+00,
1.00007443e+00, 1.00007388e+00, 5.86993822e-05, 8.69989983e-06,
1.00007590e+00, 1.00007488e+00, 1.00007515e+00, 8.81850779e-04,
2.03875532e-05, 1.00007480e+00, 1.00007425e+00, 1.00007517e+00,
-2.44678912e-05, 4.36556267e-08, 1.00007436e+00, 1.00007558e+00,
1.00007571e+00, 5.42990711e-04, 1.45517859e-04, 1.00007522e+00,
1.00007469e+00, 1.00007575e+00, 2.52271817e-05, 7.46339417e-05,
1.00007427e+00]]
close_to_one_or_zero=[1 if x > 0.5 else 0 for x in your_list[0]]
close_to_one_or_zero
[0, 0, 0, 0, 0,....... 1, 1, 1, 0, 0, 1]

You can use round:
[round(i) for i in [0.1,0.2,0.3,0.8,0.9]]

Is there a better way to produce a membership matrix (one-hot array) for an array of cluster assignments in Python? [duplicate]

This question already has answers here:
Convert array of indices to one-hot encoded array in NumPy
After running kmeans I can easily get an array with the assigned clusters for every data point. Now I want to get a membership matrix (one-hot array) which has the different clusters as columns and indicates the cluster assignment by either 1 or 0 in the matrix for each data point.
My code is shown below and it works but I am wondering if there is a more elegant way to do the same.
km = KMeans(n_clusters=3).fit(data)
membership_matrix = np.stack([np.where(km.labels_ == 0, 1,0),
                              np.where(km.labels_ == 1, 1,0),
                              np.where(km.labels_ == 2, 1,0)],
                             axis = 1)

You can create a one-hot array, which is equivalent to your membership matrix, from the array of cluster assignments, as described in this question. Here is how to do it using np.eye:
import numpy as np
clusters = np.array([2,1,2,2,0,1])
n_clusters = max(clusters) + 1
membership_matrix = np.eye(n_clusters)[clusters]
Output is as follows
array([[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 1., 0., 0.],
[ 0., 1., 0.]])

Here's a method that's agnostic to the number of clusters you have (with your method, you'll have to "stack" more things if you have more clusters).
This code sample assumes you have six data points and 3 clusters:
NUM_DATA_POINTS = 6
NUM_CLUSTERS = 3
clusters = np.array([2,1,2,2,0,1]) # hard-coded as an example, but this is your KMeans output
# create your empty membership matrix
membership = np.zeros((NUM_DATA_POINTS, NUM_CLUSTERS))
membership[np.arange(NUM_DATA_POINTS), clusters] = 1
The key feature being used here is 2D array indexing - in the last line of code above, we index into the rows of membership sequentially (np.arange creates an incrementing sequence from 0 to NUM_DATA_POINTS-1) and into the columns of membership using the cluster assignments. Here's the relevant numpy reference.
It would produce the following membership matrix:
>>> membership
array([[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 1., 0., 0.],
[ 0., 1., 0.]])

You are looking for LabelBinarizer. Give this code a try:
from sklearn import preprocessing
lb = preprocessing.LabelBinarizer()
membership_matrix = lb.fit_transform(km.labels_)
In contrast to other solutions proposed here, this approach:
Generates a compact membership matrix when the labels are not consecutive numbers.
Is able to deal with categorical labels.
Sample run:
In [9]: lb.fit_transform([0, 1, 2, 0, 2, 2])
Out[9]:
array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
[1, 0, 0],
[0, 0, 1],
[0, 0, 1]])
In [10]: lb.fit_transform([0, 1, 9, 0, 9, 9])
Out[10]:
array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
[1, 0, 0],
[0, 0, 1],
[0, 0, 1]])
In [11]: lb.fit_transform(['first', 'second', 'third', 'first', 'third', 'third'])
Out[11]:
array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
[1, 0, 0],
[0, 0, 1],
[0, 0, 1]])
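When scikit-learn is not available, a similar compaction of non-consecutive labels can be sketched with NumPy alone by combining np.unique with the np.eye indexing shown in an earlier answer. This is an illustrative alternative, not what LabelBinarizer does internally, and the variable names are assumptions.

```python
import numpy as np

labels = np.array([0, 1, 9, 0, 9, 9])

# classes: the sorted unique labels
# inverse: for each label, its position within classes
classes, inverse = np.unique(labels, return_inverse=True)

# Row k of the identity matrix is the one-hot vector for class k.
membership = np.eye(len(classes), dtype=int)[inverse]

print(classes)
print(membership)
```

The column order follows the sorted unique labels, so column 2 here corresponds to label 9.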

Numpy matrix binarization using only one expression

I am looking for a way to binarize numpy N-d array based on the threshold using only one expression. So I have something like this:
np.random.seed(0)
np.set_printoptions(precision=3)
a = np.random.rand(4, 4)
threshold, upper, lower = 0.5, 1, 0
a is now:
array([[ 0.02 , 0.833, 0.778, 0.87 ],
[ 0.979, 0.799, 0.461, 0.781],
[ 0.118, 0.64 , 0.143, 0.945],
[ 0.522, 0.415, 0.265, 0.774]])
Now I can fire these 2 expressions:
a[a>threshold] = upper
a[a<=threshold] = lower
and achieve what I want:
array([[ 0., 1., 1., 1.],
[ 1., 1., 0., 1.],
[ 0., 1., 0., 1.],
[ 1., 0., 0., 1.]])
But is there a way to do this with just one expression?

We may consider np.where:
np.where(a>threshold, upper, lower)
Out[6]:
array([[0, 1, 1, 1],
[1, 1, 0, 1],
[0, 1, 0, 1],
[1, 0, 0, 1]])

Numpy treats every 1d array as a vector, 2d array as sequence of vectors (matrix) and 3d+ array as a generic tensor. This means when we perform operations, we are performing vector math. So you can just do:
>>> a = (a > 0.5).astype(np.int_)
For example:
>>> np.random.seed(0)
>>> np.set_printoptions(precision=3)
>>> a = np.random.rand(4, 4)
>>> a
array([[ 0.549, 0.715, 0.603, 0.545],
[ 0.424, 0.646, 0.438, 0.892],
[ 0.964, 0.383, 0.792, 0.529],
[ 0.568, 0.926, 0.071, 0.087]])
>>> a = (a > 0.5).astype(np.int_) # Where the numpy magic happens.
>>> a
array([[1, 1, 1, 1],
[0, 1, 0, 1],
[1, 0, 1, 1],
[1, 1, 0, 0]])
What's going on here is that the comparison is applied to every element of the 4x4 matrix at once, without an explicit loop.
An element greater than 0.5 yields True, otherwise False.
Then by calling the .astype method and passing np.int_ as the argument, you're telling numpy to replace all boolean values with their integer representation, in effect binarizing the matrix based on your comparison value.

A shorter method is to simply multiply the boolean matrix from the condition by 1 or 1.0, depending on the type you want.
>>> a = np.random.rand(4,4)
>>> a
array([[ 0.63227032, 0.18262573, 0.21241511, 0.95181594],
[ 0.79215808, 0.63868395, 0.41706148, 0.9153959 ],
[ 0.41812268, 0.70905987, 0.54946947, 0.51690887],
[ 0.83693151, 0.10929998, 0.19219377, 0.82919761]])
>>> (a>0.5)*1
array([[1, 0, 0, 1],
[1, 1, 0, 1],
[0, 1, 1, 1],
[1, 0, 0, 1]])
>>> (a>0.5)*1.0
array([[ 1., 0., 0., 1.],
[ 1., 1., 0., 1.],
[ 0., 1., 1., 1.],
[ 1., 0., 0., 1.]])

You can write the expression directly. This returns a boolean array, which can be used like a 1-byte unsigned integer ("uint8") array in further calculations:
print(a > 0.5)
output
[[False True True True]
[ True True False True]
[False True False True]
[ True False False True]]
In one line and with custom upper/lower values you can write, for example:
upper = 10
lower = 3
threshold = 0.5
print(lower + (a>threshold) * (upper-lower))
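The arithmetic form works because True and False behave as 1 and 0 in NumPy arithmetic, so each element becomes either lower + (upper - lower) or lower + 0. A small sketch checking it against np.where; the random input and the generator seed are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((4, 4))  # illustrative input, not the asker's array
threshold, upper, lower = 0.5, 10, 3

mask = a > threshold
via_arithmetic = lower + mask * (upper - lower)  # True -> upper, False -> lower
via_where = np.where(mask, upper, lower)

print(np.array_equal(via_arithmetic, via_where))
```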
