Comparing elements at specific positions in numpy.ndarray - python

I'm not sure the title describes my question well. I have a list of floats like the one below, obtained from a sigmoid activation function.
outputs =
[[0.015161413699388504,
0.6720218658447266,
0.0024502829182893038,
0.21356457471847534,
0.002232735510915518,
0.026410426944494247],
[0.006432057358324528,
0.0059209042228758335,
0.9866275191307068,
0.004609372932463884,
0.007315939292311668,
0.010821194387972355],
[0.02358204871416092,
0.5838017225265503,
0.005475651007145643,
0.012086033821106,
0.540218658447266,
0.010054176673293114]]
To calculate my metrics, I would like to say if any neuron's output value is greater than 0.5, it is assumed that the comment belongs to the class (multi-label problem). I could easily do that using
outputs = np.where(np.array(outputs) >= 0.5, 1, 0)
However, I would like to add a condition: if class #5 and any other class both have values > 0.5, keep only the larger value (class #5 cannot occur together with other classes). How do I write that condition?
In my example the output should be:
[[0 1 0 0 0 0]
 [0 0 1 0 0 0]
 [0 1 0 0 0 0]]
instead of:
[[0 1 0 0 0 0]
 [0 0 1 0 0 0]
 [0 1 0 0 1 0]]
Thanks,

You can write a custom function and apply it to each row of outputs with np.apply_along_axis():
import numpy as np
def choose_class(a):
    # If class #5 (index 4) fires together with at least one other class, keep only the maximum.
    if (len(np.argwhere(a >= 0.5)) > 1) and (a[4] >= 0.5):
        return np.where(a == a.max(), 1, 0)
    # Otherwise apply the plain 0.5 threshold.
    return np.where(a >= 0.5, 1, 0)
outputs = np.apply_along_axis(choose_class, 1, outputs)
outputs
# array([[0, 1, 0, 0, 0, 0],
#        [0, 0, 1, 0, 0, 0],
#        [0, 1, 0, 0, 0, 0]])

For the simple mask, you don't need np.where (this assumes outputs is already a NumPy array, e.g. outputs = np.asarray(outputs)):
mask = outputs >= 0.5
If you want an integer instead of a boolean:
mask = (outputs >= 0.5).view(np.uint8)
To check the fifth column, you need to keep a reference to the original data around. You can get the position of the maximum masked value in each relevant row with
rows = np.flatnonzero(mask[:, 4])
keep = (outputs[rows] * mask[rows]).argmax(axis=1)
Then you can blank out those rows and set only the maximum value:
mask[rows] = 0
mask[rows, keep] = 1
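The mask-based steps can be put together as one runnable sketch. The function name exclusive_class_mask and its parameters exclusive_col and threshold are illustrative assumptions, not something from the original posts. The sketch also restricts the correction to rows where class #5 fires together with at least one other class.

```python
import numpy as np

def exclusive_class_mask(outputs, exclusive_col=4, threshold=0.5):
    """Threshold scores, keeping only the row maximum where the exclusive class co-fires."""
    scores = np.asarray(outputs, dtype=float)
    mask = scores >= threshold
    # Rows where the exclusive class fires together with at least one other class.
    rows = np.flatnonzero(mask[:, exclusive_col] & (mask.sum(axis=1) > 1))
    # Column of the largest above-threshold score in each of those rows.
    keep = np.where(mask[rows], scores[rows], -np.inf).argmax(axis=1)
    mask[rows] = False
    mask[rows, keep] = True
    return mask.astype(np.uint8)

outputs = [[0.015, 0.672, 0.002, 0.213, 0.002, 0.026],
           [0.006, 0.005, 0.986, 0.004, 0.007, 0.010],
           [0.023, 0.583, 0.005, 0.012, 0.540, 0.010]]
result = exclusive_class_mask(outputs)
print(result)
```

Rows that do not involve class #5 keep every label at or above the threshold, so ordinary multi-label rows are left untouched.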

Another solution:
# Your example input array
out = np.array([[0.015, 0.672, 0.002, 0.213, 0.002, 0.026],
                [0.006, 0.005, 0.986, 0.004, 0.007, 0.010],
                [0.023, 0.583, 0.005, 0.012, 0.540, 0.010]])
# We get the desired result
val = (out>=0.5)*out//(out.max(axis=1))[:,None]
This solution does the following:
Sets every value < 0.5 to zero.
Sets the maximum value of each row to 1 (if and only if this value is >= 0.5).
Note that this keeps only the row maximum in every row, whether or not class #5 is involved, and that val is a float array.

Related

Comparing two numpy arrays for compliance with two conditions

Consider two numpy arrays having the same shape, A and B, composed of 1s and 0s. A small example is shown:
A = [[1 0 0 1]     B = [[0 0 0 0]
     [0 0 1 0]          [0 0 0 0]
     [0 0 0 0]          [1 1 0 0]
     [0 0 0 0]          [0 0 1 0]
     [0 0 1 1]]         [0 1 0 1]]
I now want to assign values to the two Boolean variables test1 and test2 as follows:
test1: Is there at least one instance where a 1 in an A column and a 1 in the SAME B column have row differences of exactly 1 or 2? If so, then test1 = True, otherwise False.
In the example above, column 0 of both arrays have 1s that are 2 rows apart, so test1 = True. (there are other instances in column 2 as well, but that doesn't matter - we only require one instance.)
test2: Do the 1 values in A and B all have different array addresses? If so, then test2 = True, otherwise False.
In the example above, both arrays have [4,3] = 1, so test2 = False.
I'm struggling to find an efficient way to do this and would appreciate some assistance.
Here is a simple way to test if two arrays have an entry one element apart in the same column (only in one direction):
(A[1:, :] * B[:-1, :]).any(axis=None)
So you can do
test1 = (A[1:, :] * B[:-1, :] + A[:-1, :] * B[1:, :]).any(axis=None) or (A[2:, :] * B[:-2, :] + A[:-2, :] * B[2:, :]).any(axis=None)
The second test can be done by converting the locations to indices, stacking them together, and using np.unique to count the number of duplicates. Duplicates can only come from the same index in two arrays since an array will never have duplicate indices. We can further speed up the calculation by using flatnonzero instead of nonzero:
test2 = np.all(np.unique(np.concatenate((np.flatnonzero(A), np.flatnonzero(B))), return_counts=True)[1] == 1)
A more efficient test would use np.intersect1d in a similar manner:
test2 = not np.intersect1d(np.flatnonzero(A), np.flatnonzero(B)).size
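As a self-contained check of both tests on the example arrays, here is a minimal sketch. The helper name has_vertical_neighbour and its max_gap parameter are hypothetical. For test2 it uses a plain elementwise AND, which is a different technique from the index-based versions above.

```python
import numpy as np

A = np.array([[1, 0, 0, 1], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1]])
B = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1]])

def has_vertical_neighbour(a, b, max_gap=2):
    """True if a 1 in `a` and a 1 in `b` share a column and are 1..max_gap rows apart."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    for gap in range(1, max_gap + 1):
        # Shift the arrays against each other by `gap` rows, in both directions.
        if (a[gap:] & b[:-gap]).any() or (a[:-gap] & b[gap:]).any():
            return True
    return False

test1 = has_vertical_neighbour(A, B)
# test2 is True only if no position holds a 1 in both arrays.
test2 = not (A.astype(bool) & B.astype(bool)).any()
print(test1, test2)
```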
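You can also use masked arrays. For the second task, the comparison below finds positions where both arrays hold a 1, so it is negated to match the definition of test2:
A_m = np.ma.masked_equal(A, 0)
B_m = np.ma.masked_equal(B, 0)
test2 = not np.any((A_m==B_m).compressed())
A naive way of doing the first task (np.ma.vstack is used so that the masks survive the stacking):
test1 = np.any((np.ma.vstack((A_m[:-1],A_m[:-2],A_m[1:],A_m[2:]))==np.ma.vstack((B_m[1:],B_m[2:],B_m[:-1],B_m[:-2]))).compressed())
Output of print(test1) and print(test2) for the example:
True
False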
For test2, you can check whether the two arrays share any index where the value is 1:
A = np.array([[1, 0, 0, 1],[0, 0, 1, 0],[0, 0, 0, 0],[0, 0, 0, 0],[0, 0, 1, 1]])
B = np.array([[0, 0, 0, 0],[0, 0, 0, 0],[1, 1, 0, 0],[0, 0, 1, 0],[0, 1, 0, 1]])
print(len(np.intersect1d(np.flatnonzero(A==1),np.flatnonzero(B==1)))>0)
This prints True when such a shared position exists, in which case test2 is False.

Looping through a numpy array

I have a 5 by 10 array and I want to flip a bit whenever a random number is greater than 0.9. However, this only works for the first row of the array; the second and subsequent rows are never changed. I replaced the flipped bits with 3 and 4 so I can easily see where the flipping occurs. I have been getting results that look like this:
[[3 1 1 1 4 1 3 1 0 1]
[1 1 0 0 1 0 1 1 1 0]
[1 0 1 0 1 0 1 1 1 1]
[0 0 1 0 1 1 0 1 1 1]
[0 1 1 0 0 0 0 1 1 1]]
Please help me figure out where I'm wrong.
import numpy as np
from random import random
RM = np.random.randint(0,2, size=(5,10))
print(RM)
for k in range(0, RM.shape[0]):
    for j in range(0, RM.shape[1]):
        A = random()
        if A > 0.9:
            if RM[k,j] == 0:
                np.put(RM, [k,j], [3])
                print("k1",k)
                print("j1", j)
            else:
                np.put(RM, [k,j], [4])
                print("k2", k)
        else:
            continue
print(RM)
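Looking at the documentation of np.put:
numpy.put(a, ind, v, mode='raise')
Replaces specified elements of an array with given values.
Under Examples:
>>> a = np.arange(5)
>>> np.put(a, [0, 2], [-44, -55])
>>> a
array([-44,   1, -55,   3,   4])
So if you pass a list as ind, each entry is treated as a separate index into the flattened array. np.put(RM, [k, j], [3]) therefore writes to flat positions k and j, not to row k, column j. Since k is at most 4 and j is at most 9, every write lands in the first row.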
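To see the flat-index behaviour of np.put in isolation, here is a small sketch on an all-zero array. The values 7 and 9 are arbitrary markers chosen for illustration.

```python
import numpy as np

RM = np.zeros((5, 10), dtype=int)

# A list passed as `ind` is read as separate flat indices, not as (row, col).
np.put(RM, [2, 3], [7])
print(RM[0, 2], RM[0, 3], RM[2, 3])  # 7 7 0

# To address row 2, column 3 through np.put, convert the pair to a flat index first.
flat = np.ravel_multi_index((2, 3), RM.shape)
np.put(RM, flat, 9)
print(flat, RM[2, 3])  # 23 9
```

Plain indexing, RM[2, 3] = 9, gives the same result without the conversion.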
To make your loop work, simply assign the values to the array directly:
import numpy as np
from random import random
RM = np.random.randint(0,2, size=(5,10))
print(RM)
for k in range(0, RM.shape[0]):
    for j in range(0, RM.shape[1]):
        A = random()
        if A > 0.9:
            if RM[k,j] == 0:
                RM[k,j] = 3
                print("k1",k)
                print("j1", j)
            else:
                RM[k,j] = 4
                print("k2", k)
        else:
            continue
Most likely you don't need the iteration at all. The flips are independent, so you can generate all the random numbers in one go and flip the selected entries:
np.random.seed(100)
RM = np.random.randint(0,2, size=(5,10))
array([[0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 1, 0, 0, 1],
[0, 1, 0, 0, 0, 1, 1, 1, 0, 0],
[1, 0, 0, 1, 1, 1, 1, 1, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 0, 1]])
alpha = np.random.uniform(0,1,(5,10))
np.round(alpha,2)
array([[0.49, 0.4 , 0.35, 0.5 , 0.45, 0.09, 0.27, 0.94, 0.03, 0.04],
[0.28, 0.58, 0.99, 0.99, 0.99, 0.11, 0.66, 0.52, 0.17, 0.94],
[0.24, 1. , 0.58, 0.18, 0.39, 0.19, 0.41, 0.59, 0.72, 0.49],
[0.31, 0.58, 0.44, 0.36, 0.32, 0.21, 0.45, 0.49, 0.9 , 0.73],
[0.77, 0.38, 0.34, 0.66, 0.71, 0.11, 0.13, 0.46, 0.16, 0.96]])
RM[alpha>0.9] = abs(1-RM[alpha>0.9])
RM
array([[0, 0, 1, 1, 1, 1, 0, 1, 0, 0],
[0, 1, 1, 1, 1, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
[1, 0, 0, 1, 1, 1, 1, 1, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 0, 0]])
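The same vectorized idea can be written as a short, checkable sketch. It assumes NumPy's newer Generator API (np.random.default_rng) instead of the legacy np.random.seed calls above, so the random numbers will differ from the printed arrays. The names before and flip are illustrative.

```python
import numpy as np

rng = np.random.default_rng(100)           # seeded only to make the run repeatable
RM = rng.integers(0, 2, size=(5, 10))      # random 0/1 matrix
before = RM.copy()

flip = rng.random(RM.shape) > 0.9          # True where a bit should be flipped
RM[flip] = 1 - RM[flip]                    # flip only the selected entries

print(np.argwhere(flip))                   # (row, col) pairs that were flipped
print(RM)
```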
To iterate over a NumPy array, a convenient (and recommended) tool is np.nditer. If you want to change values of the iterated array, pass op_flags=['readwrite']. To get access to the indices of the current element of a multi-dimensional array, pass flags=['multi_index'].
The example code below also prints the indices each time an element is flipped. To show how it operates, it prints RM both before and after the loop.
import numpy as np
np.random.seed(0)
RM = np.random.randint(0, 2, size=(5, 10))
print('Before:')
print(RM, '\n')
with np.nditer(RM, op_flags=['readwrite'], flags=['multi_index']) as it:
    for x in it:
        A = np.random.random()
        if A > 0.9:
            x[...] = 1 - x  # Flip
            print(f'Flip: <{it.multi_index}>, {A:.3f}')
print('\nAfter:')
print(RM)
To get a repeatable result, I added np.random.seed(0) (remove it in the target version). With this seeding, I got the following result:
Before:
[[0 1 1 0 1 1 1 1 1 1]
[1 0 0 1 0 0 0 0 0 1]
[0 1 1 0 0 1 1 1 1 0]
[1 0 1 0 1 1 0 1 1 0]
[0 1 0 1 1 1 1 1 0 1]]
Flip: <(0, 2)>, 0.945
Flip: <(1, 3)>, 0.944
Flip: <(2, 7)>, 0.988
Flip: <(4, 5)>, 0.976
Flip: <(4, 7)>, 0.977
After:
[[0 1 0 0 1 1 1 1 1 1]
[1 0 0 0 0 0 0 0 0 1]
[0 1 1 0 0 1 1 0 1 0]
[1 0 1 0 1 1 0 1 1 0]
[0 1 0 1 1 0 1 0 0 1]]
Compare the elements reported as flipped in the "Before" and "After" sections to confirm that the code does its job, and check that no other element has changed.
The slightly tricky part of the code is x[...] = 1 - x. The 1 - x part simply reads the current value. But if you wrote x = 1 - x instead, you would rebind the name x to a new value and break its link to the source array element, so the array would stay unchanged. Writing x[...] = keeps that link and stores the result in the array.
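The difference between rebinding x and writing through x[...] can be checked with a tiny sketch. The arrays a and b are made up for illustration.

```python
import numpy as np

a = np.zeros(3, dtype=int)
with np.nditer(a, op_flags=['readwrite']) as it:
    for x in it:
        x = 1            # rebinds the local name only; `a` is not modified

b = np.zeros(3, dtype=int)
with np.nditer(b, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = 1       # writes through to the current element of `b`

print(a)  # [0 0 0]
print(b)  # [1 1 1]
```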

How to apply lower and upper threshold to NumPy array?

I have the following array
array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])
and would like to apply two thresholds, such that all values below -1.0 are set to 1 and all values above -0.3 are set to 0. For the values in between, the following rule should apply: if the last value outside that band was below -1.0, the output should be 1, but if it was above -0.3, the output should be 0.
For the example array above, the output should be
target = np.array([0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0])
If multiple consecutive values are between -1.0 and -0.3, then it should go back as far as required until there is a value above or below the two thresholds and set the output accordingly.
I tried to achieve this by iterating over the array and using a while loop inside the for loop to find the next occurrence where the value is above the threshold, but it doesn't work:
array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])
p = []
def function(array, p):
    for i in np.nditer(array):
        if i < -1:
            while i <= -0.3:
                p.append(1)
                i += 1
        else:
            p.append(0)
            i += 1
    return p
a = function(array, p)
print(a)
How can I apply the two thresholds to my array as described above?
What you are trying to achieve is called "thresholding with hysteresis". For this, I adapted the very nice algorithm from this answer.
Given your test data,
import numpy as np
array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])
you detect which values are at or below the first threshold -1.0, and which are at or above the second threshold -0.3:
low_values = array <= -1.0
high_values = array >= -0.3
These are the values for which you know the result: either 1 or 0. For all other values, it depends on its neighbors. Thus, all values for which either low_values or high_values is True are known.
You can get the indices of all known elements with:
known_values = high_values | low_values
known_idx = np.nonzero(known_values)[0]
To find the result for all unknown values, we use the np.cumsum function on the known_values array. The Booleans are interpreted as 0 or 1, so this gives us the following array:
acc = np.cumsum(known_values)
which will result in the following for your example:
[ 0 1 2 2 3 4 5 6 7 8 9 10 11].
Now, known_idx[acc - 1] will contain the index of the last known value for each point. With low_values[known_idx[acc - 1]] you get a True if the last known value was below -1.0 and a False if it was above -0.3:
result = low_values[known_idx[acc - 1]]
There is one problem left: if the initial value is at or below -1.0 or at or above -0.3, everything works out fine. But if it lies in between, it would depend on its left neighbor, which it doesn't have (the index acc - 1 = -1 wraps around to the last known value). So in your case, you simply define it to be zero.
We can do that by checking where acc equals 0. Those are exactly the leading values that come before the first known value, so we set all of them to zero:
result[acc == 0] = False
Finally, as we were doing lots of comparisons, our result array is a boolean array. To convert it to integer 0 and 1, we simply call
result = np.int8(result)
and we get our desired result:
array([0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0], dtype=int8)
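The steps above can be collected into one function. This is a minimal sketch: the function name hysteresis_threshold and the parameters low, high and initial are assumptions introduced here for illustration. Every leading in-between value (where acc == 0) receives the initial default.

```python
import numpy as np

def hysteresis_threshold(values, low=-1.0, high=-0.3, initial=False):
    """Return 1 where the signal is in the 'low' state and 0 otherwise, with hysteresis."""
    values = np.asarray(values, dtype=float)
    low_values = values <= low               # definitely 1
    high_values = values >= high             # definitely 0
    known = low_values | high_values
    known_idx = np.flatnonzero(known)
    if known_idx.size == 0:                  # nothing ever crosses a threshold
        return np.full(values.shape, int(initial), dtype=np.int8)
    acc = np.cumsum(known)                   # number of known values seen so far
    result = low_values[known_idx[acc - 1]]  # state of the last known value
    result[acc == 0] = initial               # leading values without a known predecessor
    return result.astype(np.int8)

array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])
result = hysteresis_threshold(array)
print(result)  # [0 1 1 1 0 0 0 1 1 0 0 0 0]
```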

How to display a percentage of an Array in python

How would you print out a percentage of an array?
For example, if I had
x = np.array([2,3,1,0,4,3,5,4,3,2,3,4,5,10,15,120,102,10])
how would I set a percentage of the array to zero? Say I wanted to keep the first 10% of the array as it is and change the remaining 90% to zeros.
Thank you in advance.
This will give you roughly 90% at the front:
x[0:int(len(x)*0.9)]
And 90% at the back (by skipping the first 10%):
x[int(len(x)*0.1):]
So to set the last 90% to zero:
x[int(len(x)*0.1):] = 0
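As a runnable sketch of this slicing approach, the helper below works on a copy so the original array stays intact. The name zero_after_fraction and the parameter keep_fraction are hypothetical.

```python
import numpy as np

def zero_after_fraction(values, keep_fraction=0.1):
    """Keep the leading `keep_fraction` of the array and zero the rest (on a copy)."""
    out = np.array(values)                 # np.array copies its input by default
    cut = int(len(out) * keep_fraction)    # truncates, e.g. 18 * 0.1 -> 1
    out[cut:] = 0
    return out

x = np.array([2, 3, 1, 0, 4, 3, 5, 4, 3, 2, 3, 4, 5, 10, 15, 120, 102, 10])
result = zero_after_fraction(x)
print(result)
```

Because int() truncates, an 18-element array keeps only its first element. Using round() instead would change how many elements survive.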
You could do it like this:
import numpy as np
x = np.array([2,3,1,0,4,3,5,4,3,2,3,4,5,10,15,120,102,10])
cut_off = int(0.1*len(x))
print(len(x), cut_off)
for idx in range(cut_off,len(x)):
    x[idx] = 0
With a plain Python list:
x = [2,3,1,0,4,3,5,4,3,2,3,4,5,10,15,120,102,10]
index = 10 * len(x) // 100  # integer index of the 10% mark
x[index:] = [0] * (len(x) - index)
print(x)
# [2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Here is what I would do:
x = np.array([2,3,1,0,4,3,5,4,3,2,3,4,5,10,15,120,102,10])
change = round(0.9 * len(x)) # changing 90%
x[-change:] = 0 # change from last value towards beginning of array
print(x)
yielding
[2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Does this work?
x = np.array([2,3,1,0,4,3,5,4,3,2,3,4,5,10,15,120,102,10])
j = len(x)
k = j * 10 // 100  # integer index of the 10% mark
for index in range(k, j):
    x[index] = 0

NumPy: How to avoid this loop?

Is there a way to avoid this loop in order to optimize the code?
import numpy as np
cLoss = 0
dist_ = np.array([0,1,0,1,1,0,0,1,1,0]) # just an example, longer in reality
TLabels = np.array([-1,1,1,1,1,-1,-1,1,-1,-1]) # just an example, longer in reality
t = float(dist_.size)
for i in range(len(dist_)):
    labels = TLabels[dist_ == dist_[i]]
    cLoss += 1 - TLabels[i]*(1. * np.sum(labels)/t)
print(cLoss)
Note: dist_ and TLabels are both numpy arrays with the same shape (t,1)
I am not sure exactly what you want to do, but are you aware of scipy.ndimage.measurements for computing on arrays with labels? It looks like you want something like:
cLoss = len(dist_) - sum(TLabels * scipy.ndimage.measurements.sum(TLabels,dist_,dist_) / len(dist_))
I first wonder, what is labels at each step in the loop?
With dist_ = array([2,1,2]) and TLabels=array([1,2,3])
I get
[-1 1]
[1]
[-1 1]
The differing lengths immediately raise a warning flag: it may be difficult to vectorize this.
With the longer arrays in the edited example
[-1 1 -1 -1 -1]
[ 1 1 1 1 -1]
[-1 1 -1 -1 -1]
[ 1 1 1 1 -1]
[ 1 1 1 1 -1]
[-1 1 -1 -1 -1]
[-1 1 -1 -1 -1]
[ 1 1 1 1 -1]
[ 1 1 1 1 -1]
[-1 1 -1 -1 -1]
The labels vectors are all the same length. Is that normal, or just a coincidence of values?
Drop a couple of elements off of dist_, and labels are:
In [375]: for i in range(len(dist_)):
   .....:     labels = TLabels[dist_ == dist_[i]]
   .....:     v = (1.*np.sum(labels)/t); v1 = 1-TLabels[i]*v
   .....:     print(labels, v, TLabels[i], v1)
   .....:     cLoss += v1
   .....:
(array([-1, 1, -1, -1]), -0.25, -1, 0.75)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
(array([-1, 1, -1, -1]), -0.25, 1, 1.25)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
(array([-1, 1, -1, -1]), -0.25, -1, 0.75)
(array([-1, 1, -1, -1]), -0.25, -1, 0.75)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
Again, different lengths of labels are possible, but there are really only a few calculations: there is one v value for each distinct dist_ value.
Without working out all the details, it looks like you are just calculating labels*labels for each distinct dist_ value, and then summing those.
This looks like a groupby problem. You want to divide dist_ into groups with a common value, and sum some function of their corresponding TLabels values. Python's itertools has a groupby function, and so does pandas. itertools.groupby requires dist_ to be sorted first; pandas groupby does not.
Try sorting dist_ and see if that adds any clarity to the problem.
I'm not sure if this is any better, since I didn't exactly understand why you might want to do this. Many variables in your loop take only two values and hence can be computed in advance.
The entries of dist_ could also be used directly as a boolean switch, but I used an explicit copy anyhow.
dist_ = np.array([0,1,0,1,1,0,0,1,1,0])
TLabels = np.array([-1,1,1,1,1,-1,-1,1,-1,-1])
t = len(dist_)
dist_zeros = dist_ == 0
one_zero_sum = [sum(TLabels[dist_zeros])/t , sum(TLabels[~dist_zeros])/t]
cLoss = sum([1-x*one_zero_sum[dist_[y]] for y,x in enumerate(TLabels)])
which results in cLoss = 8.2. I am using Python 3, so I didn't check whether this is true division in Python 2.
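For dist_ arrays with more than two distinct values, the per-group sums can be computed without a Python loop. The sketch below swaps in np.unique with return_inverse plus np.bincount, which none of the answers above use. The function name c_loss is hypothetical.

```python
import numpy as np

def c_loss(dist_, t_labels):
    """Loop-free version of: sum_i 1 - TLabels[i] * sum(TLabels[dist_ == dist_[i]]) / t."""
    dist_ = np.asarray(dist_).ravel()
    t_labels = np.asarray(t_labels, dtype=float).ravel()
    # `inverse` maps every element to the index of its group in the sorted unique values.
    _, inverse = np.unique(dist_, return_inverse=True)
    inverse = inverse.ravel()
    # Sum of the labels inside each group.
    group_sums = np.bincount(inverse, weights=t_labels)
    return float(np.sum(1.0 - t_labels * group_sums[inverse] / dist_.size))

dist_ = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 0])
TLabels = np.array([-1, 1, 1, 1, 1, -1, -1, 1, -1, -1])
loss = c_loss(dist_, TLabels)
print(loss)
```

For the example arrays this agrees with the value 8.2 reported above, up to floating-point rounding.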
