I have the following array
array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])
and would like to apply two thresholds, such that all values below -1.0 are set to 1 and all values above -0.3 are set to 0. For the values in between, the following rule should apply: if the last value was below -1.0, then it should be a 1, but if the last value was above -0.3, then it should be a 0.
For the example array above, the output should be
target = np.array([0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0])
If multiple consecutive values are between -1.0 and -0.3, then it should go back as far as required until there is a value above or below the two thresholds and set the output accordingly.
I tried to achieve this by iterating over the array and using a while loop inside the for loop to find the next occurrence where the value is above the threshold, but it doesn't work:
array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])
p = []
def function(array, p):
    for i in np.nditer(array):
        if i < -1:
            while i <= -0.3:
                p.append(1)
                i += 1
        else:
            p.append(0)
            i += 1
    return p
a = function(array, p)
print(a)
How can I apply the two thresholds to my array as described above?
What you are trying to achieve is called "thresholding with hysteresis". For this, I adapted the very nice algorithm from this answer.
Given your test data,
import numpy as np
array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])
you detect which values are below the first threshold -1.0, and which are above the second threshold -0.3:
low_values = array <= -1.0
high_values = array >= -0.3
These are the values for which you know the result: either 1 or 0. For all other values, the result depends on their neighbors. Thus, all values for which either low_values or high_values is True are known.
You can get the indices of all known elements with:
known_values = high_values | low_values
known_idx = np.nonzero(known_values)[0]
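For your example, known_idx is
[ 1 2 4 5 6 7 8 9 10 11 12]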
To find the result for all unknown values, we use the np.cumsum function on the known_values array. The Booleans are interpreted as 0 or 1, so this gives us the following array:
acc = np.cumsum(known_values)
which will result in the following for your example:
[ 0 1 2 2 3 4 5 6 7 8 9 10 11].
Now, known_idx[acc - 1] will contain the index of the last known value for each point. With low_values[known_idx[acc - 1]] you get a True if the last known value was below -1.0 and a False if it was above -0.3:
result = low_values[known_idx[acc - 1]]
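For your example, result now holds
[False True True True False False False True True False False False False]
(For the first element, acc - 1 is -1, so the lookup wrapped around to the last known value; the next step handles this case explicitly.)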
There is one problem left: if the initial value is below -1.0 or above -0.3, then everything works out perfectly fine. But if it is in between, then it would depend on its left neighbor, which it doesn't have. So in your case, you simply define it to be zero.
We can do that by checking if acc[0] equals 0 or 1. If acc[0] = 1, then everything is fine, but if acc[0] = 0, then this means that the first value is between -1.0 and -0.3, so we have to set it to zero:
if not acc[0]:
    result[0] = False
Finally, as we were doing lots of comparisons, our result array is a boolean array. To convert it to integers 0 and 1, we simply call
result = np.int8(result)
and we get our desired result:
array([0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0], dtype=int8)
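For reference, here is the whole procedure collected into one function (a sketch; the function name and default thresholds are my own choice):

import numpy as np

def hysteresis_threshold(array, low=-1.0, high=-0.3):
    low_values = array <= low        # known 1s
    high_values = array >= high      # known 0s
    known_values = high_values | low_values
    known_idx = np.nonzero(known_values)[0]
    # Index of the last known value at each position.
    acc = np.cumsum(known_values)
    result = low_values[known_idx[acc - 1]]
    # A leading in-between value defaults to 0.
    if not acc[0]:
        result[0] = False
    return np.int8(result)

array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])
print(hysteresis_threshold(array))
# [0 1 1 1 0 0 0 1 1 0 0 0 0]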
I have a (8864, 40) array A containing both negative and positive values. I want to divide the positive values of the array by the maximum value of A, and divide the negative values by the minimum of A. Is it possible to do this whilst also keeping the shape of the array A? Any help would be appreciated.
Please see the snippet below:
A[A > 0] /= np.max(A)
A[A < 0] /= np.min(A)
This?
np.where(A > 0, A/A.max(), A/A.min())
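For example, with a small float array (the values here are made up; the same idea applies to an (8864, 40) array), np.where evaluates both branches elementwise and keeps the original shape:

import numpy as np

A = np.array([[-2.0, 1.0, 3.0],
              [-1.0, -4.0, 5.0]])
B = np.where(A > 0, A / A.max(), A / A.min())
print(B.shape)  # (2, 3) -- same shape as A
print(B)
# [[0.5  0.2  0.6 ]
#  [0.25 1.   1.  ]]

Note that, as the list-comprehension answer below points out, negative entries come out positive when divided by the negative minimum; divide by abs(A.min()) to conserve the sign.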
If it is a list, you can use a list comprehension such as
x = [-2, 1, 3, 0, -4, -1, 0, 5, 2]
y = [i / max(x) if i > 0 else i / abs(min(x)) for i in x]
print(x)
print(y)
that produces
[-2, 1, 3, 0, -4, -1, 0, 5, 2]
[-0.5, 0.2, 0.6, 0.0, -1.0, -0.25, 0.0, 1.0, 0.4]
where the sign of the number (- or +) is conserved. Without the use of abs() you would get only positive values.
I do not quite understand the phrase
Is it possible to do this whilst also keeping the shape of the array A?
By any chance, does "the shape" mean the sign?
I don't know if the title describes my question. I have such list of floats obtained from a sigmoid activation function.
outputs =
[[0.015161413699388504,
0.6720218658447266,
0.0024502829182893038,
0.21356457471847534,
0.002232735510915518,
0.026410426944494247],
[0.006432057358324528,
0.0059209042228758335,
0.9866275191307068,
0.004609372932463884,
0.007315939292311668,
0.010821194387972355],
[0.02358204871416092,
0.5838017225265503,
0.005475651007145643,
0.012086033821106,
0.540218658447266,
0.010054176673293114]]
To calculate my metrics, I would like to say that if any neuron's output value is greater than 0.5, the comment is assumed to belong to that class (it is a multi-label problem). I could easily do that using
outputs = np.where(np.array(outputs) >= 0.5, 1, 0)
However, I would like to add a condition to consider only the bigger value if class#5 and any other class both have values > 0.5 (as class#5 cannot occur together with other classes). How do I write that condition?
In my example the output should be:
[[0 1 0 0 0 0]
[0 0 1 0 0 0]
[0 1 0 0 0 0]]
instead of:
[[0 1 0 0 0 0]
[0 0 1 0 0 0]
[0 1 0 0 1 0]]
Thanks,
You can write a custom function that you can then apply to each sub-array in outputs using the np.apply_along_axis() function:
def choose_class(a):
    if (len(np.argwhere(a >= 0.5)) > 1) & (a[4] >= 0.5):
        return np.where(a == a.max(), 1, 0)
    return np.where(a >= 0.5, 1, 0)
outputs = np.apply_along_axis(choose_class, 1, outputs)
outputs
# array([[0, 1, 0, 0, 0, 0],
# [0, 0, 1, 0, 0, 0],
# [0, 1, 0, 0, 0, 0]])
For the simple mask, you don't need np.where:
mask = outputs >= 0.5
If you want an integer instead of a boolean:
mask = (outputs >= 0.5).view(np.uint8)
To check the fifth column, you need to keep a reference to the original data around (as a numpy array). You can get the column of the maximum masked value in each relevant row with
rows = np.flatnonzero(mask[:, 4])
keep = (outputs[rows] * mask[rows]).argmax(axis=1)
Then you can blank out the rows and set only the maximum value:
mask[rows] = 0
mask[rows, keep] = 1
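Putting these pieces together on the example data (a sketch; outputs is assumed to already be a numpy array, and column 4 is class#5):

import numpy as np

outputs = np.array([[0.015, 0.672, 0.002, 0.213, 0.002, 0.026],
                    [0.006, 0.005, 0.986, 0.004, 0.007, 0.010],
                    [0.023, 0.583, 0.005, 0.012, 0.540, 0.010]])

mask = (outputs >= 0.5).view(np.uint8)
rows = np.flatnonzero(mask[:, 4])                    # rows where class#5 fired
keep = (outputs[rows] * mask[rows]).argmax(axis=1)   # per-row winner among fired classes
mask[rows] = 0
mask[rows, keep] = 1
print(mask)
# [[0 1 0 0 0 0]
#  [0 0 1 0 0 0]
#  [0 1 0 0 0 0]]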
One other solution:
# Your example input array
out = np.array([[0.015, 0.672, 0.002, 0.213, 0.002, 0.026],
[0.006, 0.005, 0.986, 0.004, 0.007, 0.010],
[0.023, 0.583, 0.005, 0.012, 0.540, 0.010]])
# We get the desired result
val = (out >= 0.5) * out // out.max(axis=1)[:, None]
This solution does the following operations:
Set to zero all values < 0.5
Set to 1 the maximum value in each row (if this value is >= 0.5)
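For the example array, this yields exactly the desired result (as floats, since out is a float array; cast if integers are needed):

print(val.astype(int))
# [[0 1 0 0 0 0]
#  [0 0 1 0 0 0]
#  [0 1 0 0 0 0]]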
I have a numpy array where 0 denotes empty space and 1 denotes that a location is filled. I am trying to find a quick method of scanning the numpy array for where there are multiple values of zero adjacent to each other and return the location of the central zero.
For example, if I had the following array
[0 1 0 1]
[0 0 0 1]
[0 1 0 1]
[1 1 1 1]
I want to return the locations for which there is an adjacent zero on either side of a central zero
e.g.
[1,1]
as this is the central zero of 3 zeros, i.e. there is a zero on either side of the zero at this location.
I'm aware that this can be calculated using if statements, but I wondered if there was a more Pythonic way of doing this.
Any help is greatly appreciated
The desired output for arbitrary inputs is not exhaustively specified in the question, but here is a possible approach that might be useful for this kind of problem, adapted to the details of the desired output. It uses np.cumsum, np.bincount, np.where, and np.median to find the middle index of each group of consecutive zeros along the rows of a 2D array:
import numpy as np
def find_groups(x, min_size=3, value=0):
    # Compute a sequential label for groups in each row.
    xc = (x != value).cumsum(1)
    # Count the number of occurrences per group in each row.
    counts = np.apply_along_axis(
        lambda x: np.bincount(x, minlength=1 + xc.max()),
        axis=1, arr=xc)
    # Filter by minimum number of occurrences.
    i, j = np.where(counts >= min_size)
    # Compute the median index of each group.
    return [
        (ii, int(np.ceil(np.median(np.where(xc[ii] == jj)[0]))))
        for ii, jj in zip(i, j)
    ]
x = np.array([[0, 1, 0, 1],
[0, 0, 0, 1],
[0, 1, 0, 1],
[1, 1, 1, 1]])
print(find_groups(x))
# [(1, 1)]
It should work properly even for multiple rows with groups of varying sizes, and even multiple groups per row:
x2 = np.array([[0, 1, 0, 1, 1, 1, 1],
[0, 0, 0, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0]])
print(find_groups(x2))
# [(1, 1), (1, 5), (2, 3), (3, 3)]
I have a big binary array (500 x 700) in which I want to check for NaNs and infill the central pixel with the mode of the eight surrounding pixels (if more than 4 surrounding pixels are 0 or 1). It's more like a 3x3 sliding-window search. Are there any tools/functions to do this in xarray, scipy.ndimage, or even plain numpy?
E.g.
arr = np.asarray([0, 1, 1, 1, 0, 1, 1, np.nan, 0, 1, 0, 1, 1, 1, 0, 1, 1, np.nan]).reshape(3, 6)
The expected infills are arr[1,1] = 1 and arr[-1,-1] = 1 (the corner cell has only 3 neighbours).
Any help would be highly appreciated. Thanks in advance.
You can implement your idea directly using numpy and scipy.stats.mode.
First, find the locations of NaN values by comparing the array to itself, because a NaN float is not equal to itself by definition. The np.where function returns all locations where this condition holds, as a pair of index arrays: one for the rows and one for the columns.
Then, for each location where a NaN is found, add 8 offsets to it to get its surrounding pixels. This can be done efficiently using delta arrays that list all possible offsets of the row and column index, one pair per neighbour.
Finally, do a within-boundary check and run the mode function on the selected, valid neighbours and fill this value into the NaN cell.
Here's the code following my description above:
import numpy as np
import scipy.stats
arr = np.asarray([
0, 1, 1, 1, 0, 1,
1, np.nan, 0, 1, 0, 1,
1, 1, 0, 1, 1, np.nan
]).reshape(3, 6)
delta_rows = np.array([-1, -1, -1, 0, 0, 1, 1, 1])
delta_cols = np.array([-1, 0, 1, -1, 1, -1, 0, 1])
nan_rows, nan_cols = np.where(arr != arr)
for nan_row, nan_col in zip(nan_rows, nan_cols):
    neighbour_rows = nan_row + delta_rows
    neighbour_cols = nan_col + delta_cols
    within_boundary = (
        (0 <= neighbour_rows) & (neighbour_rows < arr.shape[0]) &
        (0 <= neighbour_cols) & (neighbour_cols < arr.shape[1])
    )
    neighbour_rows = neighbour_rows[within_boundary]
    neighbour_cols = neighbour_cols[within_boundary]
    arr[nan_row, nan_col] = scipy.stats.mode(arr[neighbour_rows, neighbour_cols]).mode
Afterwards, we can see that each NaN value in arr is correctly populated with the mode of its surrounding cells:
>>> print(arr)
[[0. 1. 1. 1. 0. 1.]
[1. 1. 0. 1. 0. 1.]
[1. 1. 0. 1. 1. 1.]]
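Since the question also asks about scipy.ndimage: here is a sketch of the same infill using scipy.ndimage.generic_filter (fill_fn is a helper name I made up). Out-of-bounds pixels are padded with NaN and then discarded, so border cells use only their real neighbours. Note that, unlike the in-place loop above, generic_filter sees the original (unfilled) values in every window:

import numpy as np
from scipy.ndimage import generic_filter

def fill_fn(window):
    center = window[4]  # the 3x3 window arrives flattened; index 4 is the center
    if not np.isnan(center):
        return center
    neighbours = np.delete(window, 4)
    neighbours = neighbours[~np.isnan(neighbours)]  # drop padding and NaN neighbours
    values, counts = np.unique(neighbours, return_counts=True)
    return values[counts.argmax()]  # mode of the valid neighbours

arr = np.asarray([
    0, 1, 1, 1, 0, 1,
    1, np.nan, 0, 1, 0, 1,
    1, 1, 0, 1, 1, np.nan
]).reshape(3, 6)
filled = generic_filter(arr, fill_fn, size=3, mode='constant', cval=np.nan)
print(filled)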
Is there a way to avoid this loop and so optimize the code?
import numpy as np
cLoss = 0
dist_ = np.array([0,1,0,1,1,0,0,1,1,0]) # just an example, longer in reality
TLabels = np.array([-1,1,1,1,1,-1,-1,1,-1,-1]) # just an example, longer in reality
t = float(dist_.size)
for i in range(len(dist_)):
    labels = TLabels[dist_ == dist_[i]]
    cLoss += 1 - TLabels[i] * (1. * np.sum(labels) / t)
print(cLoss)
Note: dist_ and TLabels are both numpy arrays with the same shape (t,1)
I am not sure exactly what you want to do, but are you aware of scipy.ndimage.measurements for computing on arrays with labels? It looks like you want something like:
cLoss = len(dist_) - sum(TLabels * scipy.ndimage.measurements.sum(TLabels, dist_, dist_) / len(dist_))
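A quick check of this one-liner against the original loop (a sketch using scipy.ndimage.sum, which is the same function as scipy.ndimage.measurements.sum):

import numpy as np
import scipy.ndimage

dist_ = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 0])
TLabels = np.array([-1, 1, 1, 1, 1, -1, -1, 1, -1, -1])

# Group sums of TLabels per dist_ value, broadcast back to each element.
group_sums = scipy.ndimage.sum(TLabels, labels=dist_, index=dist_)
cLoss = len(dist_) - np.sum(TLabels * group_sums / len(dist_))
print(cLoss)  # 8.2, matching the original loop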
I first wonder, what is labels at each step in the loop?
With dist_ = np.array([2, 1, 2]) and TLabels = np.array([-1, 1, 1])
I get
[-1  1]
[1]
[-1  1]
The differing lengths immediately raise a warning flag: it may be difficult to vectorize this.
With the longer arrays in the edited example
[-1 1 -1 -1 -1]
[ 1 1 1 1 -1]
[-1 1 -1 -1 -1]
[ 1 1 1 1 -1]
[ 1 1 1 1 -1]
[-1 1 -1 -1 -1]
[-1 1 -1 -1 -1]
[ 1 1 1 1 -1]
[ 1 1 1 1 -1]
[-1 1 -1 -1 -1]
The labels vectors are all the same length. Is that normal, or just a coincidence of values?
Drop a couple of elements off of dist_ and TLabels (so t = 8), and the labels are:
In [375]: for i in range(len(dist_)):
   .....:     labels = TLabels[dist_ == dist_[i]]
   .....:     v = (1. * np.sum(labels) / t); v1 = 1 - TLabels[i] * v
   .....:     print(labels, v, TLabels[i], v1)
   .....:     cLoss += v1
   .....:
(array([-1, 1, -1, -1]), -0.25, -1, 0.75)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
(array([-1, 1, -1, -1]), -0.25, 1, 1.25)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
(array([-1, 1, -1, -1]), -0.25, -1, 0.75)
(array([-1, 1, -1, -1]), -0.25, -1, 0.75)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
The lengths of labels can differ, but there are really only a few distinct calculations: there is one v value for each distinct dist_ value.
Without working out all the details, it looks like you are just calculating sum(labels)**2 for each distinct dist_ value, summing those, and subtracting the total (divided by t) from t.
This looks like a groupby problem. You want to divide dist_ into groups with a common value and sum some function of their corresponding TLabels values. Python's itertools has a groupby function, and so does pandas. I think both require you to sort dist_.
Try sorting dist_ and see if that adds any clarity to the problem.
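For instance, a sketch along those lines, sorting by dist_ and using itertools.groupby together with the sum(labels)**2 observation above:

import numpy as np
from itertools import groupby

dist_ = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 0])
TLabels = np.array([-1, 1, 1, 1, 1, -1, -1, 1, -1, -1])
t = float(dist_.size)

# Sort the (dist, label) pairs so groupby sees each group contiguously.
pairs = sorted(zip(dist_, TLabels), key=lambda p: p[0])
squares = sum(sum(label for _, label in group) ** 2
              for _, group in groupby(pairs, key=lambda p: p[0]))
cLoss = t - squares / t
print(cLoss)  # 8.2, same as the original loop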
I'm not sure if this is any better, since I didn't exactly understand why you might want to do this. Many variables in your loop are bivalued and hence can be computed in advance.
Also, the entries of dist_ can be used as a boolean switch, but I used an explicit copy anyhow.
dist_ = np.array([0,1,0,1,1,0,0,1,1,0])
TLabels = np.array([-1,1,1,1,1,-1,-1,1,-1,-1])
t = len(dist_)
dist_zeros = dist_== 0
one_zero_sum = [sum(TLabels[dist_zeros])/t , sum(TLabels[~dist_zeros])/t]
cLoss = sum([1-x*one_zero_sum[dist_[y]] for y,x in enumerate(TLabels)])
which results in cLoss = 8.2. I am using Python 3, so I didn't check whether this is a true division or not in Python 2.
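For completeness, a vectorized sketch of my own that also handles more than two distinct dist_ values, using np.unique and np.bincount:

import numpy as np

dist_ = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 0])
TLabels = np.array([-1, 1, 1, 1, 1, -1, -1, 1, -1, -1])
t = float(dist_.size)

# Map each dist_ value to a group id, sum TLabels per group,
# then broadcast the group sums back to the elements.
_, inverse = np.unique(dist_, return_inverse=True)
group_sums = np.bincount(inverse, weights=TLabels)
cLoss = np.sum(1 - TLabels * group_sums[inverse] / t)
print(cLoss)  # 8.2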