Finding instances similar in two lists with the same shape - python

I am working with time series data. Let's say I have two lists of equal shape and I need to find instances where both lists have numbers greater than zero at the same position.
To break it down
A = [1,0,2,0,4,6,0,5]
B = [0,0,5,6,7,5,0,2]
We can see that in four positions, both lists have numbers greater than 0. There are other cases I need to handle, but I am sure that once I have simple code for this one, I only need to adjust the comparison signs, and I can also use it at a larger scale.
I have tried
len([1 for i in A if i > 0 and 1 for i in B if i > 0 ])
But I think the answer it's giving me is a product of both instances instead.

Since you have a numpy tag:
import numpy as np
A = np.array([1,0,2,0,4,6,0,5])
B = np.array([0,0,5,6,7,5,0,2])
mask = ((A>0)&(B>0))
# array([False, False, True, False, True, True, False, True])
mask.sum()
# 4
A[mask]
# array([2, 4, 6, 5])
B[mask]
# array([5, 7, 5, 2])
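If the positions themselves are needed rather than only the count, the mask can be turned into indices. This is a minimal sketch using the same A and B; the names positions and count are only illustrative.

```python
import numpy as np

A = np.array([1, 0, 2, 0, 4, 6, 0, 5])
B = np.array([0, 0, 5, 6, 7, 5, 0, 2])

# True where both arrays are positive at the same position
mask = (A > 0) & (B > 0)

# Indices of the True entries
positions = np.flatnonzero(mask)
# Number of True entries (same value as mask.sum())
count = np.count_nonzero(mask)

print(positions)  # [2 4 5 7]
print(count)      # 4
```

Changing the comparison signs in the mask line (for example A < 0) adapts this to the other cases.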
In pure python (can be generalized to any number of lists):
A = [1,0,2,0,4,6,0,5]
B = [0,0,5,6,7,5,0,2]
mask = [all(e>0 for e in x) for x in zip(A, B)]
# [False, False, True, False, True, True, False, True]
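To illustrate the generalization to any number of lists, here is a small sketch that counts the matching positions directly. The helper name count_all_positive and the third list C are hypothetical and not part of the question.

```python
def count_all_positive(*lists):
    """Count positions where every list has a value greater than zero."""
    # zip(*lists) yields one tuple per position; True counts as 1 in sum()
    return sum(all(e > 0 for e in column) for column in zip(*lists))

A = [1, 0, 2, 0, 4, 6, 0, 5]
B = [0, 0, 5, 6, 7, 5, 0, 2]
C = [1, 1, 0, 1, 1, 1, 1, 1]  # hypothetical third list for illustration

print(count_all_positive(A, B))     # 4
print(count_all_positive(A, B, C))  # 3
```

Note that zip stops at the shortest input, so lists of unequal length are silently truncated.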

If you want to use vanilla Python, this should do what you are looking for:
l = 0
for i in range(len(A)):
    if A[i] > 0 and B[i] > 0:
        l = l + 1

Related

How do I pass a list as changing condition in an array?

Let's say that I have a numpy array a = [1 2 3 4 5 6 7 8] and I want to change everything but 1, 2 and 3 to 0. With a list b = [1,2,3] I tried a[a not in b] = 0, but Python does not accept this. Currently I'm using a for loop like this:
c = np.unique(a)
for i in c:
    if i not in b:
        a[a == i] = 0
This works very slowly (around 900 different values in a 3D array of around 1000x1000x1000) and doesn't feel like the optimal solution for numpy. Is there a more optimal way of doing it in numpy?
You can use numpy.isin() to create a boolean mask to use as an index:
np.isin(a, b)
# array([ True, True, True, False, False, False, False, False])
Use ~ to do the opposite:
~np.isin(a, b)
# array([False, False, False, True, True, True, True, True])
Using this to index the original array lets you assign zero to the specific elements:
a = np.array([1,2,3,4,5,6,7,8])
b = np.array([1, 2, 3])
a[~np.isin(a, b)] = 0
print(a)
# [1 2 3 0 0 0 0 0]
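If the original array should stay untouched, the same mask can be combined with np.where to build a new array instead of assigning in place. This is a sketch under that assumption; the name result is illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
b = [1, 2, 3]

# Keep values found in b, use 0 everywhere else; `a` itself is not modified
result = np.where(np.isin(a, b), a, 0)

print(result)  # [1 2 3 0 0 0 0 0]
print(a)       # [1 2 3 4 5 6 7 8]
```

For a very large array the in-place assignment avoids allocating a second full-size array, so which variant fits better depends on memory constraints.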

Count how many times a value is exceeded in a list

In a Python list, I need to count how many times a value is exceeded.
The code below counts how many values exceed a limit.
Suppose I have this example, and I want to count how many times 2 is exceeded.
array = [1, 2, 3, 4, 1, 2, 3, 1]
a = pd.Series(array)
print(len(a[a >= 2]))
# prints 5
How can I collapse consecutive values, such that 2 is returned instead?
First compute exc = a.ge(2), a Series answering the question:
Is the current value >= 2?
Then, to get the number of runs of "exceeding" elements, run:
result = (exc.shift().ne(exc) & exc).sum()
The result for your data is just 2.
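To make the one-liner easier to follow, the intermediate steps can be written out separately. This is a sketch on the question's data; the names changed and starts are illustrative.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4, 1, 2, 3, 1])

exc = a.ge(2)                  # True where the value is >= 2
changed = exc.shift().ne(exc)  # True where the flag differs from the previous row
starts = changed & exc         # True only on the first element of each run of True
result = int(starts.sum())

print(starts.tolist())  # [False, True, False, False, False, True, False, False]
print(result)           # 2
```

The first row has no predecessor, so shift() puts a missing value there; it compares as "different", which means a series that begins with an exceeding value still counts as a run.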
I think you are very close.
>>> import numpy as np
>>> a = np.array([1, 2, 3, 4, 1, 2, 3, 1])
>>> b = a >= 2
>>> b
array([False, True, True, True, False, True, True, False])
Now, instead of counting Trues, you need to count how many times a False is followed by a True. You can compare each item in b to the item before it, b[i] > b[i-1], to find these False, True pairs. You also need to consider the start of the array, since a leading True begins a run as well.
>>> c = np.r_[ b[0], b[1:] > b[:-1] ]
>>> c
array([ False, True, False, False, False, True, False, False])
>>> np.sum( c )
2
where
>>> b[1:]
array([ True, True, True, False, True, True, False])
>>> b[:-1]
array([False, True, True, True, False, True, True])
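The same idea can be wrapped in a small function. This is only a sketch: the name count_runs_at_or_above is hypothetical, and it writes the rising-edge test as b[1:] & ~b[:-1], which is equivalent to b[1:] > b[:-1] for boolean arrays.

```python
import numpy as np

def count_runs_at_or_above(values, limit):
    """Count runs of consecutive values that are >= limit."""
    b = np.asarray(values) >= limit
    if b.size == 0:
        return 0
    # A run starts where b is True and the previous element is False,
    # or at index 0 if the array begins with True
    starts = np.r_[b[0], b[1:] & ~b[:-1]]
    return int(starts.sum())

print(count_runs_at_or_above([1, 2, 3, 4, 1, 2, 3, 1], 2))  # 2
print(count_runs_at_or_above([5, 5, 1, 5], 2))              # 2
```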
You can use a set to remove duplicates before converting it to a pandas Series.
import pandas as pd
array = [1, 2, 3, 4, 1, 2, 3, 1]
arr_set = set(array)
a = pd.Series(list(arr_set))
print(len(a[a >= 2]))
You can also do this with numpy by only showing unique values and then filtering.
len(a.unique()[a.unique() >= 2])

Elegant way to check co-ordinates of a 2D NumPy array lie within a certain range

So let us say we have a 2D NumPy array (denoting co-ordinates) and I want to check whether all the co-ordinates lie within a certain range. What is the most Pythonic way to do this? For example:
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
# All the coordinates within x -> 0 to 4 and y -> 0 to 4 should
# be put in b (x and y ranges might not be equal)
b = # DO SOME OPERATION
>>> b
[[3, 4],
 [0, 0]]
If the range is the same for both directions, x, and y, just compare them and use all:
import numpy as np
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
a[(a >= 0).all(axis=1) & (a <= 4).all(axis=1)]
# array([[3, 4],
# [0, 0]])
If the ranges are not the same, you can also compare to an iterable of the same size as that axis (so two here):
mins = 0, 1 # x_min, y_min
maxs = 4, 10 # x_max, y_max
a[(a >= mins).all(axis=1) & (a <= maxs).all(axis=1)]
# array([[1, 5],
# [3, 4]])
To see what is happening here, let's have a look at the intermediate steps:
The comparison gives a per-element result of the comparison, with the same shape as the original array:
a >= mins
# array([[False, True],
# [ True, True],
# [ True, True],
# [ True, True],
# [ True, True],
# [ True, False],
# [False, False]], dtype=bool)
Using numpy.ndarray.all, you get whether all values are truthy or not, similar to the built-in function all:
(a >= mins).all()
# False
With the axis argument, you can restrict this to only compare values along one (or multiple) axis of the array:
(a >= mins).all(axis=1)
# array([False, True, True, True, True, False, False], dtype=bool)
(a >= mins).all(axis=0)
# array([False, False], dtype=bool)
Note that the output has the same shape as the array, except that all dimensions mentioned with axis have been contracted to a single True/False.
When an array is indexed with a sequence of True/False values, the sequence is applied along the leading axis. Since we index an array of shape (7, 2) with a boolean index of shape (7,), each True selects a whole row of the original array.
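Putting the pieces together, the filter can be packaged as a reusable function. This is a minimal sketch; the name points_in_box is hypothetical, and it combines both comparisons before a single all(axis=1), which gives the same rows as the two separate calls above.

```python
import numpy as np

def points_in_box(points, mins, maxs):
    """Return the rows of `points` lying inside the closed box [mins, maxs]."""
    points = np.asarray(points)
    # Each comparison broadcasts mins/maxs across the rows
    inside = ((points >= mins) & (points <= maxs)).all(axis=1)
    return points[inside]

a = np.array([[-1, 2], [1, 5], [6, 7], [5, 2], [3, 4], [0, 0], [-1, -1]])

print(points_in_box(a, (0, 0), (4, 4)))
# [[3 4]
#  [0 0]]
print(points_in_box(a, (0, 1), (4, 10)))
# [[1 5]
#  [3 4]]
```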

Get first index in array where relation is true

How can I get the last index of the element in a where b > a, when a and b have different lengths, using numpy?
For instance, for the following values:
>>> a = np.asarray([10, 20, 30, 40])
>>> b = np.asarray([12, 25])
I would expect a result of [0, 1] (0.. because 12 > 10 -> index 0 in a; 1.. because 25 > 20 -> index 1 in a). Obviously, the length of the result vector should equal the length of b (and the values of the result list should be less than the length of a (as they refer to the indices in a)).
Another test is for b = np.asarray([12, 25, 31, 9, 99]) (same a as above), the result should be array([ 0, 1, 2, -1, 3]).
A vectorized solution:
Remember that you can compare all elements in b with all elements in a using broadcasting:
b[:, None] > a
# array([[ True, False, False, False], # b[0] > a[:]
# [ True, True, False, False]]) # b[1] > a[:]
Now find the index of the last True value in each row, which equals the index of the first False value in each row minus 1:
np.argmin((b[:, None] > a), axis=1) - 1
# array([0, 1])
Note that there might be an ambiguity as to what a returned value of -1 means. It could mean
b[x] was larger than all elements in a, or
b[x] was not larger than any element in a
In our data, this means
a = np.asarray([10, 20, 30, 40])
b = np.asarray([9, 12, 25, 39, 40, 41, 50])
mask = b[:, None] > a
# array([[False, False, False, False], # 9 is smaller than a[:], case 2
# [ True, False, False, False],
# [ True, True, False, False],
# [ True, True, True, False],
# [ True, True, True, False],
# [ True, True, True, True], # 41 is larger than a[:], case 1
# [ True, True, True, True]]) # 50 is larger than a[:], case 1
So for case 1 we need to find rows with all True values:
is_max = np.all(mask, axis=1)
And for case 2 we need to find rows with all False values:
none_found = np.all(~mask, axis=1)
This means we can use is_max to find the -1 values that belong to case 1 and replace them with a valid index:
mask = b[:, None] > a
is_max = np.all(mask, axis=1)
# array([False, False, False, False, False, True, True])
idx = np.argmin(mask, axis=1) - 1
# array([-1, 0, 1, 2, 2, -1, -1])
idx[is_max] = len(a) - 1
# array([-1, 0, 1, 2, 2, 3, 3])
However, be aware that the index -1 has a meaning: just like 3, it already means "the last element". So if you want to use idx for indexing, keeping -1 as an invalid-value marker may cause trouble down the line.
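An alternative that sidesteps the ambiguity is to look for the last True in each row directly, by taking argmax on the reversed rows, and to mark rows without any True explicitly. This is a sketch, not the answer's method: the function name last_index_below and the missing parameter are hypothetical. Unlike the argmin approach, it does not rely on a being sorted.

```python
import numpy as np

def last_index_below(a, b, missing=-1):
    """For each value in b, return the last index i with value > a[i]."""
    a = np.asarray(a)
    b = np.asarray(b)
    mask = b[:, None] > a  # shape (len(b), len(a))
    # Position of the last True in each row, found by scanning the reversed row
    last_true = a.size - 1 - np.argmax(mask[:, ::-1], axis=1)
    # Rows without any True get the `missing` marker instead
    return np.where(mask.any(axis=1), last_true, missing)

a = [10, 20, 30, 40]
print(last_index_below(a, [12, 25, 31, 9, 99]))  # [ 0  1  2 -1  3]
```

Passing a different missing value, such as len(a), avoids the clash with -1 meaning "last element" when the result is later used for indexing.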
This works even if a is shorter than b: iterate over the shorter length, then compare the shorter list with the longer one element-wise:
[i for i in range(min(len(a),len(b))) if min(a, b, key=len)[i] > max(a, b, key=len)[i]]
# [0, 1]
You can zip a and b to combine them and then enumerate to iterate it with its index
[i for i,(x,y) in enumerate(zip(a,b)) if y>x]
# [0, 1]
np.asarray([i for i in range(len(b)) if b[i]>a[i]])
This should give you the answer. Also, the length of the result does not have to be the same as that of either a or b.

Efficiently adapt a python list according to another list

I could not come up with a better title for my question, sorry.
I have two lists of the same length, e.g.
a = [True, False, False, True, True, False, False, True]
b = [1,    2,     2,     1,    1,    3,     3,     2   ]
#          i      j                  i'     j'
I want to adjust list a in such a way that whenever there is a single False or a block of False values in list a going from index i to j, it is adjusted according to the following condition:
if b[i-1]==b[j+1]:
    a[i:j+1]=[True]*(j-i+1)
In the above example there are two such blocks: i,j=1,2 and i',j'=5,6.
The result should be:
a = [True, True, True, True, True, False, False, True]
I wrote a solution with a for loop using ifs but that is too slow since I want to use it on very large lists.
a = [True, False, False, True, True, False, False, True]
b = [1,    2,     2,     1,    1,    3,     3,     2   ]
#Edit: the next two lines were originally and wrongly inside the for loop
moving=True
istart=1
for i,trp in enumerate((a)):
    if trp==False:
        if moving==False:
            # if this condition holds, the particle just started a new move
            istart = i
            moving = True
    else:
        if moving==True:
            # if this condition holds, the particle has stopped its move
            moving = False
            if b[i]==b[istart-1]:
                # if this holds, a needs to be adjusted
                a[istart:i]=[True]*(i-istart)
Any help would be greatly appreciated. (The comments and variable names are like that since it's for analyzing a physics simulation.)
You can try this:
import itertools
a = [True, False, False, True, True, False, False, True]
b = [1, 2, 2, 1, 1, 3, 3, 2 ]
new_a = [(a, list(b)) for a, b in itertools.groupby(zip(a, b), key=lambda x:x[0])]
final_list = list(itertools.chain(*[[True]*len(b) if not a and new_a[i-1][-1][-1] == new_a[i+1][-1][-1] and i > 0 else [c for c, d in b] for i, [a, b] in enumerate(new_a)]))
Output:
[True, True, True, True, True, False, False, True]
Edit: test with new input:
a = [True, False, True]
b = [1, 3, 1]
new_a = [(a, list(b)) for a, b in itertools.groupby(zip(a, b), key=lambda x:x[0])]
final_list = list(itertools.chain(*[[True]*len(b) if not a and new_a[i-1][-1][-1] == new_a[i+1][-1][-1] and i > 0 else [c for c, d in b] for i, [a, b] in enumerate(new_a)]))
Output:
[True, True, True]
Code explanation:
itertools.groupby collects the consecutive blocks of True/False values into single lists. Then, final_list is built by iterating over the lists stored in new_a: a block composed entirely of False values is replaced by a sublist of True values if and only if the preceding and following values are the same, and every other block is kept as it is. enumerate supplies the current index for each iteration, which can then be used to access the preceding and following blocks via i-1 and i+1.
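The same groupby idea can also be written as an explicit loop, which may be easier to read and to adapt. This is a sketch under the question's stated rule, not the answer's exact code: the function name fill_false_blocks is hypothetical, it compares b[i-1] with b[j+1] directly, and it skips False blocks that touch either end of the list because they have no neighbour on one side.

```python
from itertools import groupby

def fill_false_blocks(a, b):
    """Return a copy of `a` where each interior block of False values
    spanning i..j is set to True when b[i-1] == b[j+1]."""
    result = list(a)
    pos = 0
    for value, group in groupby(a):
        length = len(list(group))
        i, j = pos, pos + length - 1  # block covers indices i..j
        # Only False blocks with a neighbour on both sides qualify
        if not value and i > 0 and j < len(a) - 1 and b[i - 1] == b[j + 1]:
            result[i:j + 1] = [True] * length
        pos += length
    return result

a = [True, False, False, True, True, False, False, True]
b = [1, 2, 2, 1, 1, 3, 3, 2]
print(fill_false_blocks(a, b))
# [True, True, True, True, True, False, False, True]
```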
