How can I get, for each element of b, the last index in a where b > a, when a and b have different lengths, using numpy?
For instance, for the following values:
>>> a = np.asarray([10, 20, 30, 40])
>>> b = np.asarray([12, 25])
I would expect a result of [0, 1]: 0 because 12 > 10, which is index 0 in a, and 1 because 25 > 20, which is index 1 in a. The length of the result vector should equal the length of b, and the values in the result should be less than the length of a, since they refer to indices in a.
Another test is for b = np.asarray([12, 25, 31, 9, 99]) (same a as above), the result should be array([ 0, 1, 2, -1, 3]).
A vectorized solution:
Remember that you can compare all elements in b with all elements in a using broadcasting:
b[:, None] > a
# array([[ True, False, False, False], # b[0] > a[:]
# [ True, True, False, False]]) # b[1] > a[:]
Now find the index of the last True value in each row, which equals the index of the first False value in that row minus 1:
np.argmin((b[:, None] > a), axis=1) - 1
# array([0, 1])
Note that there might be an ambiguity as to what a returned value of -1 means. It could mean
b[x] was larger than all elements in a, or
b[x] was not larger than any element in a
In our data, this means
a = np.asarray([10, 20, 30, 40])
b = np.asarray([9, 12, 25, 39, 40, 41, 50])
mask = b[:, None] > a
# array([[False, False, False, False], # 9 is smaller than a[:], case 2
# [ True, False, False, False],
# [ True, True, False, False],
# [ True, True, True, False],
# [ True, True, True, False],
# [ True, True, True, True], # 41 is larger than a[:], case 1
# [ True, True, True, True]]) # 50 is larger than a[:], case 1
So for case 1 we need to find rows with all True values:
is_max = np.all(mask, axis=1)
And for case 2 we need to find rows with all False values:
none_found = np.all(~mask, axis=1)
This means we can use is_max to find the -1 values that belong to case 1 and replace them with the last valid index:
mask = b[:, None] > a
is_max = np.all(mask, axis=1)
# array([False, False, False, False, False, True, True])
idx = np.argmin(mask, axis=1) - 1
# array([-1, 0, 1, 2, 2, -1, -1])
idx[is_max] = len(a) - 1
# array([-1, 0, 1, 2, 2, 3, 3])
However, be aware that the index -1 already has a meaning: just like 3, it refers to the last element of a. So if you want to use idx for indexing, keeping -1 as an invalid value marker may cause trouble down the line.
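The steps above can be combined into one helper. This is only a minimal sketch, not the answerer's code: the function name last_index_where_greater is made up for illustration, and -1 is returned only when b[x] is not larger than any element of a. It finds the last True in each row by reversing the mask, so it does not rely on a being sorted. If a is known to be sorted in ascending order, np.searchsorted should give the same result without building the full len(b) x len(a) mask.

```python
import numpy as np


def last_index_where_greater(a, b):
    """For each b[x], return the last index i with b[x] > a[i], or -1 if none.

    Hypothetical helper name, used here for illustration only.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    mask = b[:, None] > a                      # shape (len(b), len(a))
    # Reverse the columns so that argmax finds the LAST True of each row.
    last = a.size - 1 - np.argmax(mask[:, ::-1], axis=1)
    # Rows without any True get the marker -1.
    return np.where(mask.any(axis=1), last, -1)


a = np.array([10, 20, 30, 40])
b = np.array([12, 25, 31, 9, 99])

result = last_index_where_greater(a, b)

# Alternative that assumes a is sorted in ascending order:
# searchsorted counts the elements of a that are strictly smaller than b[x].
via_sorted = np.searchsorted(a, b, side="left") - 1
```

For the second test case from the question, both variants give [0, 1, 2, -1, 3].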
This works even if a is shorter than b: first take the length of the shorter list, then compare the two lists element-wise over that range:
[i for i in range(min(len(a),len(b))) if min(a, b, key=len)[i] > max(a, b, key=len)[i]]
# [0, 1]
You can zip a and b to combine them and then enumerate to iterate it with its index
[i for i,(x,y) in enumerate(zip(a,b)) if y>x]
# [0, 1]
np.asarray([i for i in range(len(b)) if b[i]>a[i]])
This should give you the answer. The lengths of a and b do not have to be the same, as long as b is not longer than a.
Related
I have a tensor / array of shape N x M, where M is less than 10 but N can potentially be > 2000. All entries are larger than or equal to zero. I want to keep only the rows that either
Do not contain any zeros
End with zeros only, i.e. [1,2,0,0] would be valid but not [1,0,2,0] or [0,0,1,2]. Put differently, once a zero appears all following entries of that row must also be zero, otherwise the row should be ignored.
and I want to do this as efficiently as possible. Consider the following example
Example:
[[35, 25, 17], # no zeros -> valid
[12, 0, 0], # ends with zeros -> valid
[36, 2, 0], # ends with zeros -> valid
[8, 0, 9]] # contains zeros and does not end with zeros -> invalid
should yield [True, True, True, False]. The straightforward implementation I came up with is:
import numpy as np
T = np.array([[35,25,17], [12,0,0], [36,2,0], [8,0,9]])
N,M = T.shape
valid = [i*[True,] + (M-i)*[False,] for i in range(1, M+1)]
mask = [((row > 0).tolist() in valid) for row in T]
Is there a more elegant and efficient solution to this? Any help is greatly appreciated!
Here's one way:
x[np.all((x == 0) == (x.cumprod(axis=1) == 0), axis=1)]
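This calculates the row-wise cumulative product, which becomes zero at the first zero in a row and stays zero afterwards. It then compares the zeros of the original array with the zeros of the cumprod array and keeps only the rows where the two agree in every position.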
Workings:
In [3]: x
Out[3]:
array([[35, 25, 17],
[12, 0, 0],
[36, 2, 0],
[ 8, 0, 9]])
In [4]: x == 0
Out[4]:
array([[False, False, False],
[False, True, True],
[False, False, True],
[False, True, False]])
In [5]: x.cumprod(axis=1) == 0
Out[5]:
array([[False, False, False],
[False, True, True],
[False, False, True],
[False, True, True]])
In [6]: (x == 0) == (x.cumprod(axis=1) == 0)
Out[6]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, False]]) # bad row!
In [7]: np.all((x == 0) == (x.cumprod(axis=1) == 0), axis=1)
Out[7]: array([ True, True, True, False])
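The same rule can also be checked without multiplying anything, which may matter if the entries are large enough for the cumulative product to overflow a fixed-width integer. The following is a sketch of that alternative, not the answer's method; the variable names nonzero, valid and filtered are made up for illustration. A row is invalid exactly when a zero is directly followed by a non-zero entry somewhere in the row.

```python
import numpy as np

x = np.array([[35, 25, 17],
              [12, 0, 0],
              [36, 2, 0],
              [8, 0, 9]])

nonzero = x != 0
# A zero at column j followed by a non-zero at column j + 1 breaks the rule.
zero_then_nonzero = ~nonzero[:, :-1] & nonzero[:, 1:]
valid = ~zero_then_nonzero.any(axis=1)

filtered = x[valid]
```

Like the cumprod version, this treats a row consisting only of zeros as valid.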
I am working with time series data. Let's say I have two lists of equal length and I need to find the positions where both lists have numbers greater than zero.
To break it down
A = [1,0,2,0,4,6,0,5]
B = [0,0,5,6,7,5,0,2]
We can see that in four positions, both lists have numbers greater than 0. There are other conditions I need to check as well, but I am sure that once I have some simple code, I only need to adjust the comparison signs and can also use it on a larger scale.
I have tried
len([1 for i in A if i > 0 and 1 for i in B if i > 0 ])
But I think the answer it gives me is the product of the two counts instead.
Since you have a numpy tag:
A = np.array([1,0,2,0,4,6,0,5])
B = np.array([0,0,5,6,7,5,0,2])
mask = ((A>0)&(B>0))
# array([False, False, True, False, True, True, False, True])
mask.sum()
# 4
A[mask]
# array([2, 4, 6, 5])
B[mask]
# array([5, 7, 5, 2])
In pure python (can be generalized to any number of lists):
A = [1,0,2,0,4,6,0,5]
B = [0,0,5,6,7,5,0,2]
mask = [all(e>0 for e in x) for x in zip(A, B)]
# [False, False, True, False, True, True, False, True]
If you want to use vanilla python, this should be doing what you are looking for
l = 0
for i in range(len(A)):
    if A[i] > 0 and B[i] > 0:
        l = l + 1
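The numpy mask shown earlier extends to any number of equally long lists by stacking them first. This is a rough sketch under that assumption; the names stacked, mask, count and positions are illustrative only.

```python
import numpy as np

A = [1, 0, 2, 0, 4, 6, 0, 5]
B = [0, 0, 5, 6, 7, 5, 0, 2]

# One row per list; more lists of the same length can be added here.
stacked = np.array([A, B])

# True where every list has a value greater than zero at that position.
mask = (stacked > 0).all(axis=0)

count = int(mask.sum())
positions = np.flatnonzero(mask)
```

To test a different condition, only the comparison inside the parentheses needs to change, for example stacked < 0.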
I have an array like that:
array = np.array([
[True, False],
[True, False],
[True, False],
[True, True],
])
I would like to find the last occurrence of True in each row of the array.
If it was 1d array I would do it in this way:
np.where(array)[0][-1]
How do I do something similar in 2D? Kind of like:
np.where(array, axis = 1)[0][:,-1]
but there is no axis argument in np.where.
Since True is greater than False, find the position of the largest element in each row. Unfortunately, argmax finds the first largest element, not the last one. So, reverse the array sideways, find the first True from the end, and recalculate the indexes:
(array.shape[1] - 1) - array[:, ::-1].argmax(axis=1)
# array([0, 0, 0, 1])
The method fails if there are no True values in a row. You can check whether that's the case by dividing by array.max(axis=1): a row with no Trues will have its last True at infinity :)
array[0, 0] = False
((array.shape[1] - 1) - array[:, ::-1].argmax(axis=1)) / array.max(axis=1)
#array([inf, 0., 0., 1.])
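If an integer result is preferred over inf, the rows without any True can be marked explicitly instead of dividing. The snippet below is a sketch of that variation, not part of the original answer; the marker value -1 and the names last and last_true are assumptions chosen for illustration.

```python
import numpy as np

array = np.array([
    [False, False],
    [True, False],
    [True, False],
    [True, True],
])

# Index of the last True per row (meaningless for rows without any True).
last = (array.shape[1] - 1) - array[:, ::-1].argmax(axis=1)

# Replace the result with -1 for rows that contain no True at all.
last_true = np.where(array.any(axis=1), last, -1)
```

This avoids the floating-point result and the division-by-zero warning, at the cost of using -1 as a marker, which is itself a valid index in Python.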
I found an older answer but didn't like that it returns 0 for both a True in the first position, and for a row of False.
So here's a way to solve that problem, if it's important to you:
import numpy as np
arr = np.array([[False, False, False], # -1
[False, False, True], # 2
[True, False, False], # 0
[True, False, True], # 2
[True, True, False], # 1
[True, True, True], # 2
])
# Make an adjustment for rows with no Trues at all.
adj = np.sum(arr, axis=1) == 0
# Get the position and adjust.
x = np.argmax(np.cumsum(arr, axis=1), axis=1) - adj
# Compare to expected result:
assert np.all(x == np.array([-1, 2, 0, 2, 1, 2]))
print(x)
Gives [-1 2 0 2 1 2].
In a Python list, I need to count how many times a value is exceeded.
The code below counts how many values exceed a limit.
Suppose I have this example, and I want to count how many times 2 is exceeded.
array = [1, 2, 3, 4, 1, 2, 3, 1]
a = pd.Series(array)
print(len(a[a >= 2]))
# prints 5
How can I collapse consecutive values, such that 2 is returned instead?
First compute exc = a.ge(2), a Series that answers, for each element, the question:
is the current value >= 2?
Then, to get the number of runs of consecutive "exceeding" elements, run:
result = (exc.shift().ne(exc) & exc).sum()
The result for your data is just 2.
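Put together as a runnable snippet, the two steps look roughly like this. The name run_starts is introduced here only to make the intermediate result visible.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4, 1, 2, 3, 1])

# True where the value reaches the limit.
exc = a.ge(2)

# A run starts where exc is True and differs from the previous element.
# The first element has no predecessor, so shift() yields NaN there,
# which compares as "not equal" and lets a run start at position 0.
run_starts = exc.shift().ne(exc) & exc

result = int(run_starts.sum())
```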
I think you are very close.
>>> a = np.array([1, 2, 3, 4, 1, 2, 3, 1])
>>> b = a >= 2
>>> b
array([False, True, True, True, False, True, True, False])
Now, instead of counting Trues, you need to count how many times a False is followed by a True. You can compare each item in b to the item before it, b[i] > b[i-1], to find these False, True pairs. You also need to account for the start of the array, in case b[0] is already True.
>>> c = np.r_[ b[0], b[1:] > b[:-1] ]
>>> c
array([ False, True, False, False, False, True, False, False])
>>> np.sum( c )
2
where
>>> b[1:]
array([ True, True, True, False, True, True, False])
>>> b[:-1]
array([False, True, True, True, False, True, True])
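The same idea can be wrapped in a small function. This is a sketch only: the name count_runs and its parameters are hypothetical, and it pads the mask with a leading False instead of using np.r_, which handles a run that starts at index 0 in the same way.

```python
import numpy as np


def count_runs(values, limit):
    """Count runs of consecutive elements that are >= limit.

    Hypothetical helper, named here for illustration only.
    """
    b = np.asarray(values) >= limit
    # Prepend False so that a run starting at index 0 is also detected.
    padded = np.concatenate(([False], b))
    # A run starts where the current flag is True and the previous is False.
    run_starts = padded[1:] & ~padded[:-1]
    return int(run_starts.sum())


n_runs = count_runs([1, 2, 3, 4, 1, 2, 3, 1], 2)
```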
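You can use a set to remove duplicates before converting it to a pandas Series. Note that this counts the distinct values that reach the limit (3 for this data), not the consecutive runs.
import pandas as pd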
array = [1, 2, 3, 4, 1, 2, 3, 1]
arr_set = set(array)
a = pd.Series(list(arr_set))
print(len(a[a >= 2]))
You can also do this on the original Series with pandas, by taking only the unique values and then filtering.
len(a.unique()[a.unique() >= 2])
I'm trying to delete the rows of my numpy array that satisfy certain conditions.
This is an example:
a = np.array ([[1,1,0,0,1],
[0,0,1,1,1],
[0,1,0,1,1],
[1,0,1,0,1],
[0,0,1,0,1],
[1,0,1,0,0]])
I want to be able to delete all rows where specific columns are zero. The array could be a lot bigger.
In this example, if the first two elements are zero, or if the last two elements are zero, the row should be deleted.
It could be any combination of columns, not only the first or the last ones.
This should be the final:
a = np.array ([[1,1,0,0,1],
[0,1,0,1,1],
[1,0,1,0,1]])
I have read:
Remove lines with empty values from multidimensional-array in php
and this question: How to delete specific rows from a numpy array using a condition?
But they don't seem to apply to my case, or I'm probably misunderstanding something, as nothing works in my case.
For example, if I try:
a[:,0:2] == 0
this shows, for each row, whether each of the first two columns is zero. The rows to delete are the ones with True, True:
array([[False, False],
[ True, True],
[ True, False],
[False, True],
[ True, True],
[False, True]])
The same applies to the last two columns being zero, so the last row should be deleted too. In the end I should be left with only 3 rows.
a[:,3:5] == 0
array([[ True, False],
[False, False],
[False, False],
[ True, False],
[ True, False],
[ True, True]])
I'm trying something like this, but I don't understand how to tell it to give me only the rows that satisfy the condition. This only gives me:
(a[a[:,0:2]] == 0).all(axis=1)
array([[ True, True, False, False, False],
[False, False, True, True, False],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, True, True, False],
[False, False, False, False, False]])
(a[((a[:,0])& (a[:,1])) ] == 0).all(axis=1)
and this shows everything as False.
Could you please guide me a bit?
Thank you.
One addition to the question: it won't always be the first 2 or the last 2 columns. If my matrix has 35 columns, it could be columns 6 to 10, and then columns 20 and 25. A user will be able to decide which columns are checked when deleting rows.
Try this
idx0 = (a[:,0:2] == 0).all(axis=1)
idx1 = (a[:,-2:] == 0).all(axis=1)
a[~(idx0 | idx1)]
The first two steps select the indices of the rows that match your filtering criteria. Then do an or (|) operation, and the not (~) operation to obtain the final indices you want.
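Since the question mentions that the column groups are chosen by the user, the same pattern can be looped over an arbitrary list of groups. This is a sketch under that assumption; the function name drop_rows_with_zero_groups and the column_groups parameter are hypothetical and not part of the answer above.

```python
import numpy as np


def drop_rows_with_zero_groups(a, column_groups):
    """Drop every row in which all columns of at least one group are zero.

    column_groups is a list of column selections (lists of indices or
    slices). Hypothetical helper, for illustration only.
    """
    drop = np.zeros(a.shape[0], dtype=bool)
    for cols in column_groups:
        # True for rows where every selected column is zero.
        drop |= (a[:, cols] == 0).all(axis=1)
    return a[~drop]


a = np.array([[1, 1, 0, 0, 1],
              [0, 0, 1, 1, 1],
              [0, 1, 0, 1, 1],
              [1, 0, 1, 0, 1],
              [0, 0, 1, 0, 1],
              [1, 0, 1, 0, 0]])

# First two columns as one group, last two columns as another.
result = drop_rows_with_zero_groups(a, [[0, 1], [3, 4]])
```

Each group can also be given as a slice, for example slice(5, 10) for columns 6 to 10.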
If I understood correctly you could do something like this:
import numpy as np
a = np.array([[1, 1, 0, 0, 1],
[0, 0, 1, 1, 1],
[0, 1, 0, 1, 1],
[1, 0, 1, 0, 1],
[0, 0, 1, 0, 1],
[1, 0, 1, 0, 0]])
left = np.count_nonzero(a[:, :2], axis=1) != 0
a = a[left]
right = np.count_nonzero(a[:, -2:], axis=1) != 0
a = a[right]
print(a)
Output
[[1 1 0 0 1]
[0 1 0 1 1]
[1 0 1 0 1]]
Or, a shorter version:
left = np.count_nonzero(a[:, :2], axis=1) != 0
right = np.count_nonzero(a[:, -2:], axis=1) != 0
a = a[(left & right)]
Use the following mask:
np.any(a[:,:2], axis=1) & np.any(a[:,-2:], axis=1)
To select the matching rows with boolean indexing (this returns a copy, not a view):
a[np.any(a[:,:2], axis=1) & np.any(a[:,-2:], axis=1)]
Or, to remove the non-matching rows with np.delete:
np.delete(a, np.where(~(np.any(a[:,:2], axis=1) & np.any(a[:,-2:], axis=1)))[0], axis=0)