Get values in numpy array either side of a specified value - python

I have a numpy array, provided at random, which for this example looks like:
a = np.array([10, 8, 6, 4, 2, -2, -4, -6, -8, -10, 1])
Ideally, in this example, the values would run between -10 and 10, but this cannot be guaranteed (as above).
I want to retrieve the 2 values closest to zero, such that:
b = a[a > 0][-1]
c = a[a < 0][0]
which would ideally return the values 2 and -2. However, the trailing 1 is included in the slice for b, so I get 1 and -2 instead.
Is there a way in numpy to retrieve the values immediately 'next' to zero?
It's worth noting that whilst I always want to split the array at 0, the array could be any length and I could have an uneven number of positive and negative values in the array (e.g. [5, 4, 3, 2, 1, 0, -1]).
A real world example is:
I want the yellow and green positions but get the blue and green positions instead, because the data crosses back over zero from negative to positive.

This function should do the job:
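import numpy as np
def my_func(x):
    # index of the last positive value, ignoring the final element
    left = np.where(x[:-1]>0)[0][-1]
    # index of the first negative value, ignoring the first element
    right = 1 + np.where(x[1:]<0)[0][0]
    return x[left], x[right]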
Demo:
>>> a = np.array([10, 8, 6, 4, 2, -2, -4, -6, -8, -10, 1])
>>> b = np.array([5, 4, 3, 2, 1, 0, -1])
>>> my_func(a)
(2, -2)
>>> my_func(b)
(1, -1)
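The function above only guards against a single stray value at the very end of the array. As a hedged alternative that is not part of the original answer, the sketch below anchors on the first negative value and then takes the last positive value before it, so any later crossings back over zero are ignored. The name `values_around_first_crossing` is hypothetical, and the sketch assumes the data starts on the positive side and goes negative at least once.

```python
import numpy as np

def values_around_first_crossing(x):
    """Return (last positive value before the first negative value, first negative value)."""
    x = np.asarray(x)
    negatives = np.flatnonzero(x < 0)
    if negatives.size == 0:
        raise ValueError("no negative value found")
    first_neg = negatives[0]
    # Only look at the part of the array before the first negative value.
    positives_before = np.flatnonzero(x[:first_neg] > 0)
    if positives_before.size == 0:
        raise ValueError("no positive value before the first negative value")
    return x[positives_before[-1]], x[first_neg]

# Example with several values crossing back over zero at the end.
a = np.array([10, 8, 6, 4, 2, -2, -4, -6, -8, -10, 1, 3, -5])
b = np.array([5, 4, 3, 2, 1, 0, -1])
print(values_around_first_crossing(a))
print(values_around_first_crossing(b))
```

For `a` this picks 2 and -2, and for `b` it picks 1 and -1, skipping the exact zero in between.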

Related

Can I put a condition for y-index in numpy.where?

I have a 2D numpy array taken from a segmentation, so it is an image like the one on the right:
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQwYeYOHk0xUJ6vBd_g8Xn1LxMON0g2qHpf_TPJx6h7IM5nG2OXeKtDuCcjgN9mqFtLB5c&usqp=CAU
The colours you see mean that each cell of my array can only take a value from a limited set (e.g., green is 5, orange is 7...). Now I would like to change all the cells that contain a 5 (green) and whose y-coordinate is below a value I choose (e.g. only apply the condition up to row 400). What's the most efficient way to do this?
I guess that you can use something like:
np.where(myarray == 5, myarray, valueIwant)
but I will need to apply the condition for y-index...
Your current example seems to be misaligned with what you want:
a = np.array([1, 1, 2, 2, 3, 3])
np.where(a==2, a, 7)
produces:
array([7, 7, 2, 2, 7, 7])
If you want to replace 2 with some other value:
array([1, 1, 7, 7, 3, 3])
you can do this:
np.where(a==2, 7, a)
or
a[a==2] = 7
To replace only up to a certain index (starting again from the original a; the slice is a view, so a itself is modified):
sub_array = a[:3]
sub_array[sub_array==2] = 7
a
array([1, 1, 7, 2, 3, 3])
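The same slicing idea carries over to the 2D case in the question. The following is a minimal sketch under stated assumptions: the function names `replace_in_rows` and `replace_in_rows_copy` are hypothetical, and "y-index" is taken to mean the first axis (rows). The first function modifies the array in place through a view; the second keeps `np.where` and adds a row-index condition via broadcasting.

```python
import numpy as np

def replace_in_rows(arr, old, new, row_limit):
    """Replace `old` with `new` in rows [0, row_limit), modifying `arr` in place."""
    region = arr[:row_limit]       # basic slicing returns a view, not a copy
    region[region == old] = new    # so this assignment writes into `arr`
    return arr

def replace_in_rows_copy(arr, old, new, row_limit):
    """Same replacement, but returning a new array built with np.where."""
    rows = np.arange(arr.shape[0])[:, np.newaxis]   # column vector of row indices
    return np.where((arr == old) & (rows < row_limit), new, arr)

seg = np.array([[5, 7, 5],
                [5, 5, 7],
                [7, 5, 5],
                [5, 7, 5]])

print(replace_in_rows_copy(seg, old=5, new=0, row_limit=2))  # seg is left untouched
replace_in_rows(seg, old=5, new=0, row_limit=2)              # seg is changed in place
print(seg)
```

The in-place version avoids allocating a second full-size array, which may matter for large images; the `np.where` version is safer when the original data is still needed.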

Replace consecutive duplicates in 2D numpy array

I have a two dimensional numpy array x:
import numpy as np
x = np.array([
    [1, 2, 8, 4, 5, 5, 5, 3],
    [0, 2, 2, 2, 2, 1, 1, 4]
])
My goal is to replace all consecutive duplicate numbers with a specific value (let's take -1), while leaving one occurrence unchanged.
I could do this as follows:
def replace_consecutive_duplicates(x):
    consec_dup = np.zeros(x.shape, dtype=bool)
    consec_dup[:, 1:] = np.diff(x, axis=1) == 0
    x[consec_dup] = -1
    return x
# current output
replace_consecutive_duplicates(x)
# array([[ 1, 2, 8, 4, 5, -1, -1, 3],
# [ 0, 2, -1, -1, -1, 1, -1, 4]])
However, in this case the one occurrence left unchanged is always the first.
My goal is to leave the middle occurrence unchanged.
So given the same x as input, the desired output of function replace_consecutive_duplicates is:
# desired output
replace_consecutive_duplicates(x)
# array([[ 1, 2, 8, 4, -1, 5, -1, 3],
# [ 0, -1, 2, -1, -1, 1, -1, 4]])
Note that for consecutive duplicate sequences with an even number of occurrences, the middle-left value should be left unchanged. So the consecutive duplicate sequence [2, 2, 2, 2] in x[1] becomes [-1, 2, -1, -1].
Also note that I'm looking for a vectorized solution for 2D numpy arrays since performance is of absolute importance in my particular use case.
I've already tried looking at things like run length encoding and using np.diff(), but I didn't manage to solve this. Hope you guys can help!
The main problem is that you need the length of each run of consecutive equal values. This is not easy to get with numpy, but with itertools.groupby we can solve it using the following code.
import itertools
import numpy as np
x = np.array([
    [1, 2, 8, 4, 5, 5, 5, 3],
    [0, 2, 2, 2, 2, 1, 1, 4]
])
def replace_row(arr: np.ndarray, new_val=-1):
    results = []
    for val, count in itertools.groupby(arr):
        k = len(list(count))  # length of this run of equal values
        results.extend([new_val] * ((k - 1) // 2))  # replaced values left of the middle
        results.append(val)                         # the middle (or middle-left) value
        results.extend([new_val] * (k // 2))        # replaced values right of the middle
    return np.fromiter(results, arr.dtype)
if __name__ == '__main__':
    for idx, row in enumerate(x):
        x[idx, :] = replace_row(row)
    print(x)
Output:
[[ 1  2  8  4 -1  5 -1  3]
 [ 0 -1  2 -1 -1  1 -1  4]]
This isn't vectorized, but since every row is handled independently, the rows can be processed in parallel (for example with multithreading).
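If a loop-free version is needed, one possible approach is to work on run starts and run lengths directly. The sketch below is not from the original answer; the function name `keep_middle_of_runs` is hypothetical. It marks where each run begins, derives the run lengths from the gaps between consecutive starts in the flattened array, and keeps only the element at offset `(length - 1) // 2` within each run. Because column 0 of every row is always treated as a run start, runs never spill across rows.

```python
import numpy as np

def keep_middle_of_runs(x, new_val=-1):
    """Replace every element of a run of equal values except the middle (or middle-left) one."""
    x = np.asarray(x)
    # True wherever a new run begins; the first column always starts a run.
    starts = np.ones(x.shape, dtype=bool)
    starts[:, 1:] = x[:, 1:] != x[:, :-1]
    # Flat (row-major) index of each run start, and the length of each run.
    flat_starts = np.flatnonzero(starts)
    run_lengths = np.diff(np.append(flat_starts, x.size))
    # Flat index of the element to keep in each run.
    middle = flat_starts + (run_lengths - 1) // 2
    out = np.full(x.shape, new_val, dtype=x.dtype)
    out.ravel()[middle] = x.ravel()[middle]  # out is contiguous, so ravel() is a view
    return out

x = np.array([
    [1, 2, 8, 4, 5, 5, 5, 3],
    [0, 2, 2, 2, 2, 1, 1, 4]
])
print(keep_middle_of_runs(x))
```

Unlike the question's function, this sketch returns a new array instead of modifying `x` in place.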

numpy argmin vectorization

I'm trying to iterate over numpy rows, and put the index of each cluster of 3 elements that contains the lowest value into another row. This should be in the context of left, middle, right; the left and right edges only look at two values ('left and middle' or 'middle and right'), but everything in the middle should look at all 3.
For loops do this trivially, but they are very slow. Some kind of numpy vectorization would probably speed this up.
For example:
[1 18 3 6 2]
# should give the indices...
[0 0 2 4 4] # matching values 1 1 3 2 2
A slow for-loop implementation:
for y in range(height):
    for x in range(width):
        i = 0 if x == 0 else x - 1
        other_array[y,x] = np.argmin(array[y,i:x+2]) + i
NOTE: See update below for a solution with no for loops.
This works for an array of any number of dimensions:
import numpy as np
def window_argmin(arr):
    # pad only the last axis, one element on each side
    padded = np.pad(
        arr,
        [(0,)] * (arr.ndim-1) + [(1,)],
        'constant',
        constant_values=np.max(arr)+1,
    )
    # one 3-element window per position along the last axis
    slices = np.concatenate(
        [
            padded[..., np.newaxis, i:i+3]
            for i in range(arr.shape[-1])
        ],
        axis=-2,
    )
    return (
        np.argmin(slices, axis=-1) +
        np.arange(-1, arr.shape[-1]-1)
    )
The code uses np.pad to pad the last dimension of the array with an extra number to the left and one to the right, so we can always use windows of 3 elements for the argmin. It sets the extra elements as max+1 so they'll never be picked by argmin.
Then it uses an np.concatenate of a list of slices to add a new dimension holding each of the 3-element windows. This is the only place we're using a for loop and we're only looping over the last dimension, once, to create the separate 3-element windows. (See update below for a solution that removes this for loop.)
Finally, we call np.argmin on each of the windows.
np.argmin returns positions relative to each window (0, 1 or 2), so we need to convert them into indices of the original array. We can do that by adding the offset of the first element of the window (which is actually -1 for the first window, since it's a padded element). The adjustment is a simple addition of an arange array, which broadcasts over the leading dimensions.
Here's a test with your sample array:
>>> x = np.array([1, 18, 3, 6, 2])
>>> window_argmin(x)
array([0, 0, 2, 4, 4])
And a 3d example:
>>> z
array([[[ 1, 18,  3,  6,  2],
        [ 1,  2,  3,  4,  5],
        [ 3,  6, 19, 19,  7]],
       [[ 1, 18,  3,  6,  2],
        [99,  4,  4, 67,  2],
        [ 9,  8,  7,  6,  3]]])
>>> window_argmin(z)
array([[[0, 0, 2, 4, 4],
        [0, 0, 1, 2, 3],
        [0, 0, 1, 4, 4]],
       [[0, 0, 2, 4, 4],
        [1, 1, 1, 4, 4],
        [1, 2, 3, 4, 4]]])
UPDATE: Here's a version using stride_tricks that doesn't use any for loops:
def window_argmin(arr):
    padded = np.pad(
        arr,
        [(0,)] * (arr.ndim-1) + [(1,)],
        'constant',
        constant_values=np.max(arr)+1,
    )
    slices = np.lib.stride_tricks.as_strided(
        padded,
        shape=arr.shape + (3,),
        strides=padded.strides + (padded.strides[-1],),
    )
    return (
        np.argmin(slices, axis=-1) +
        np.arange(-1, arr.shape[-1]-1)
    )
What helped me come up with the stride tricks solution was this numpy issue asking to add a sliding window function, linking to an example implementation of it, so I just adapted it for this specific case. It's still pretty much magic to me, but it works. 😁
Tested and works as expected for arrays of different numbers of dimensions.
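On NumPy 1.20 or later, the same windows can be built with `np.lib.stride_tricks.sliding_window_view`, which computes the shape and strides itself and returns a read-only view by default. The following is a sketch of that variant, not part of the original answer; the name `window_argmin_swv` is hypothetical, and the padding and offset logic are carried over unchanged from the versions above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_argmin_swv(arr):
    arr = np.asarray(arr)
    # Pad only the last axis with one value larger than anything in arr,
    # so the padding can never win the argmin.
    pad_width = [(0, 0)] * (arr.ndim - 1) + [(1, 1)]
    padded = np.pad(arr, pad_width, mode="constant", constant_values=arr.max() + 1)
    # Shape (..., n, 3): one 3-element window per original position.
    windows = sliding_window_view(padded, 3, axis=-1)
    # argmin is relative to each window; add the window's start offset.
    return np.argmin(windows, axis=-1) + np.arange(-1, arr.shape[-1] - 1)

x = np.array([1, 18, 3, 6, 2])
print(window_argmin_swv(x))  # [0 0 2 4 4]
```

As with the versions above, `arr.max() + 1` can overflow for integer arrays that already contain the largest value of their dtype, so this sketch assumes that is not the case.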
import numpy as np
array = [1, 18, 3, 6, 2]
array.insert(0, np.max(array) + 1) # right shift of array
# [19, 1, 18, 3, 6, 2]
other_array = [ np.argmin(array[i-1:i+2]) + i - 2 for i in range(1, len(array)) ]
array.remove(np.max(array)) # original array
# [1, 18, 3, 6, 2]

How can I calculate the mean value of positive values in each row in numpy? [duplicate]

I have a big numpy array of time series data. Each row holds 15 seconds of acceleration data, such as this:
a = [[1,2,3,-1,-2,-3,-4,-1,1,2,1,2,3,2,5],
[1,2,3,-1,-2,-3,-4,-1,1,2,1,2,3,2,5],
.
.
[1,2,3,-1,-2,-3,-4,-1,1,2,1,2,3,2,5]]
I want to calculate the average value of the positive items in each row. For example, in this case I want to have:
avg = [0.73 , 0.73, ... , 0.73]
I don't want to use a for loop in my implementation.
Here's the original answer:
import numpy as np
a = [[1, 2, 3, -1, -2, -3, -4, -1, 1, 2, 1, 2, 3, 2, 5],
     [1, 2, 3, -1, -2, -3, -4, -1, 1, 2, 1, 2, 3, 2, 5],
     [1, 2, 3, -1, -2, -3, -4, -1, 1, 2, 1, 2, 3, 2, 5]]
b = np.array(a)
def avg(a):
    # mean of the strictly positive entries of one row
    return a[a > 0].mean()
np.apply_along_axis(avg, 1, b)
Output:
array([2.2, 2.2, 2.2])
EDIT: Here's a better answer, following the comment by @user3483203:
np.nanmean(np.where(b > 0, b, np.nan), axis=1)
If you want the average of only the positive elements, and a is a numpy array, you can do:
a.clip(0).sum(1)/np.sum(a>0,1)
If you instead want to sum only the positive elements and divide by the total number of elements in each row, you can do:
a.clip(0).mean(1)
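None of the one-liners above says what should happen for a row with no positive values: the division and `nanmean` variants would yield NaN along with a runtime warning. As a hedged sketch that is not from the original answers, the hypothetical helper `positive_row_means` below computes the per-row mean of strictly positive entries and returns NaN for such rows without dividing by zero.

```python
import numpy as np

def positive_row_means(arr):
    """Mean of the strictly positive entries in each row; NaN where a row has none."""
    arr = np.asarray(arr, dtype=float)
    mask = arr > 0
    sums = np.where(mask, arr, 0.0).sum(axis=1)   # sum of positive entries per row
    counts = mask.sum(axis=1)                     # number of positive entries per row
    out = np.full(arr.shape[0], np.nan)
    # Divide only where there is at least one positive entry.
    np.divide(sums, counts, out=out, where=counts > 0)
    return out

data = np.array([[1, 2, 3, -1, -2, -3, -4, -1, 1, 2, 1, 2, 3, 2, 5],
                 [-1] * 15])
print(positive_row_means(data))
```

For the first row this gives 2.2 (22 divided by the 10 positive entries), matching the output of the `apply_along_axis` answer; the second row, which has no positive entries, comes out as NaN.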

Making an array from multiple elements from different arrays

I want to make a new array out of different numbers from each array. This is an example:
import numpy as np
a=[[0,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10],[0,1,2,3,4,5,6,7,8,9,10]]
b=[[0,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10],[0,1,2,3,4,5,6,7,8,9,10]]
c=[[0,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10],[0,1,2,3,4,5,6,7,8,9,10]]
d=[]
for c in range (0,2):
    d.append([])
    for s in range (0,10):
        d[c] =np.concatenate((a[c][s],b[c][s],c[c][s]))
print(d)
When I print 'd', it gives me a TypeError: 'int' object is not subscriptable.
Is this due to the concatenate function? Or can I use stack?
I want the outcome to be something like:
d[0][0]= [0,0,0]
holding the first term from each array. d[0][0] indexes a file and a row; that's why I want this format.
NumPy is an incredibly powerful library, so I would recommend always trying to manipulate your arrays with it first, before resorting to for loops. You should look up what NumPy axes and shapes mean.
The array d that you want seems to be 3D, but the arrays a, b and c are 2D. Therefore we will first expand the dimensions of the three arrays. Then we can easily concatenate them along this new dimension.
The following code achieves what you want:
import numpy as np
# First convert the lists themselves to numpy arrays.
a = np.array([[0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10],
              [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]) # Shape: (2, 11)
b = np.array([[0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10],
              [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]) # Shape: (2, 11)
c = np.array([[0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10],
              [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]) # Shape: (2, 11)
# Print the shape of the arrays
print(a.shape, b.shape, c.shape)
# Add an additional dimension to the three arrays along a new axis.
# axis 0 and axis 1 already exist. So we create it along axis 2.
a_ = np.expand_dims(a, axis=2) # Shape: (2, 11, 1)
b_ = np.expand_dims(b, axis=2) # Shape: (2, 11, 1)
c_ = np.expand_dims(c, axis=2) # Shape: (2, 11, 1)
# Print the shape of the arrays
print(a_.shape, b_.shape, c_.shape)
# Concatenate all three arrays along the last axis i.e. axis 2.
d = np.concatenate((a_, b_, c_), axis=2) # Shape: (2, 11, 3)
# Print d[0][0] to check if it is [0, 0, 0]
print(d[0][0])
You should print the individual arrays a, a_ and d to check what kind of transformations are taking place.
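On the two points raised in the question: the TypeError comes from the loop `for c in range(0, 2)`, which rebinds the name `c` to an integer, so `c[c][s]` tries to index an `int` (and `np.concatenate` would reject bare scalars in any case). And yes, `np.stack` can be used: it creates the new axis itself, so the separate `expand_dims` step is not needed. The following is a small sketch of that variant; it builds the same (2, 11) arrays with `np.arange` purely for brevity.

```python
import numpy as np

# Same values as in the question: row 0 counts down from 0, row 1 counts up.
a = np.array([-np.arange(11), np.arange(11)])   # Shape: (2, 11)
b = a.copy()
c = a.copy()

# Stack along a new last axis, so d[i][j] holds (a[i][j], b[i][j], c[i][j]).
d = np.stack((a, b, c), axis=2)                 # Shape: (2, 11, 3)

print(d.shape)   # (2, 11, 3)
print(d[0][0])   # [0 0 0]

# The result matches the expand_dims + concatenate approach.
d_concat = np.concatenate(
    (np.expand_dims(a, axis=2), np.expand_dims(b, axis=2), np.expand_dims(c, axis=2)),
    axis=2,
)
print(np.array_equal(d, d_concat))
```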
