NumPy mean of multiple slices - python

Let's say I have an array a where I would like to calculate the mean across multiple slices defined in idx:
a = np.arange(10)
idx = np.random.choice([0,1], a.size).astype(bool)
a, idx
Out[1]: (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([False, False, True, True, False, False, False, True, True,
True]))
With the desired output:
array([2.5, 8. ])
Of course I could write a simple for loop, but I would prefer a fully vectorized approach, given that the array sizes can become quite big and the arrays may be 2D.

It's possible to do it completely vectorized:
edges = np.diff(idx.astype(np.int8), prepend=0, append=0)
rising = np.where(edges == 1)[0]
falling = np.where(edges == -1)[0]
cum = np.insert(np.cumsum(a), 0, 0)
means = (cum[falling] - cum[rising]) / (falling - rising)
This takes about 0.2 seconds on my machine with a = np.arange(10**7).
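Putting the pieces together on the small example from the question, this self-contained sketch reproduces the desired output:

```python
import numpy as np

a = np.arange(10)
idx = np.array([False, False, True, True, False,
                False, False, True, True, True])

# 1 marks the start of a True-run, -1 marks the position just past its end
edges = np.diff(idx.astype(np.int8), prepend=0, append=0)
rising = np.where(edges == 1)[0]    # run starts (inclusive)
falling = np.where(edges == -1)[0]  # run ends (exclusive)

# prefix sums: each run's sum is a difference of two cumulative sums
cum = np.insert(np.cumsum(a), 0, 0)
means = (cum[falling] - cum[rising]) / (falling - rising)
print(means)  # [2.5 8. ]
```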

Related

Finding instances similar in two lists with the same shape

I am working with a timeseries data. Let's say I have two lists of equal shape and I need to find instances where both lists have numbers greater than zero at the same position.
To break it down
A = [1,0,2,0,4,6,0,5]
B = [0,0,5,6,7,5,0,2]
We can see that in four positions, both lists have numbers greater than 0. There are other instances, but I am sure that if I can get simple code for this, all it needs is adjusting the signs, and I can also use it at a larger scale.
I have tried
len([1 for i in A if i > 0 and 1 for i in B if i > 0 ])
But I think the answer it's giving me is the product of the two counts instead.
Since you have a numpy tag:
import numpy as np

A = np.array([1,0,2,0,4,6,0,5])
B = np.array([0,0,5,6,7,5,0,2])
mask = ((A>0)&(B>0))
# array([False, False, True, False, True, True, False, True])
mask.sum()
# 4
A[mask]
# array([2, 4, 6, 5])
B[mask]
# array([5, 7, 5, 2])
In pure python (can be generalized to any number of lists):
A = [1,0,2,0,4,6,0,5]
B = [0,0,5,6,7,5,0,2]
mask = [all(e>0 for e in x) for x in zip(A, B)]
# [False, False, True, False, True, True, False, True]
If you want to use vanilla Python, this should do what you are looking for:
l = 0
for i in range(len(A)):
    if A[i] > 0 and B[i] > 0:
        l = l + 1
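If you later need this for more than two lists, the numpy approach also generalizes by stacking the arrays and reducing with all along the stacking axis. A minimal sketch:

```python
import numpy as np

A = np.array([1, 0, 2, 0, 4, 6, 0, 5])
B = np.array([0, 0, 5, 6, 7, 5, 0, 2])

# stack the arrays into one 2-D array and require > 0 in every row
mask = np.all(np.stack([A, B]) > 0, axis=0)
print(mask.sum())  # 4
```

Adding a third array is then just a matter of extending the list passed to np.stack.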

Count how many times a value is exceeded in a list

In a Python list, I need to count how many times a value is exceeded.
This code counts how many values exceed a limit.
Suppose I have this example, and I want to count how many times 2 is exceeded.
import pandas as pd

array = [1, 2, 3, 4, 1, 2, 3, 1]
a = pd.Series(array)
print(len(a[a >= 2]))
# prints 5
How can I collapse consecutive values, such that 2 is returned instead?
First compute exc = a.ge(2) - a Series answering the question:
Is the current value >= 2?
Then, to get a number of sequences of "exceeding" elements, run:
result = (exc.shift().ne(exc) & exc).sum()
The result for your data is just 2.
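Spelled out end to end with the data from the question, the idea is that a run starts wherever exc is True but the previous element was not:

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4, 1, 2, 3, 1])
exc = a.ge(2)                      # True where the value is >= 2
# True only at positions where a run of exceeding values begins
starts = exc.shift().ne(exc) & exc
result = starts.sum()
print(result)  # 2
```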
I think you are very close.
>>> a = np.array([1, 2, 3, 4, 1, 2, 3, 1])
>>> b = a >= 2
>>> b
array([False, True, True, True, False, True, True, False])
Now, instead of counting Trues, you need to count how many times you see a False followed by a True. You can compare each item in b to the item before it, b[i] > b[i-1], to find the False-to-True transitions, and you need to consider the start of the array as well.
>>> c = np.r_[ b[0], b[1:] > b[:-1] ]
>>> c
array([ False, True, False, False, False, True, False, False])
>>> np.sum( c )
2
where
>>> b[1:]
array([ True, True, True, False, True, True, False])
>>> b[:-1]
array([False, True, True, True, False, True, True])
You can use a set to remove duplicates before converting to a Series. Note that this counts distinct values >= 2 (here 3, for the values 2, 3 and 4) rather than runs of consecutive exceeding values, so it answers a slightly different question:
import numpy as np
import pandas as pd

array = [1, 2, 3, 4, 1, 2, 3, 1]
arr_set = set(array)
a = pd.Series(list(arr_set))
print(len(a[a >= 2]))
# prints 3
You can also do this by keeping only the unique values and then filtering:
len(a.unique()[a.unique() >= 2])

Numpy array limiting operation X[X < {value}] = {value}

I came across the following in a piece of code:
X = numpy.array(...)  # some array of floats
X[X < np.finfo(float).eps] = np.finfo(float).eps
I found out the following from the documentation:
class numpy.finfo(dtype):
Machine limits for floating point types.
Parameters:
dtype : float, dtype, or instance
Kind of floating point data-type about which to get information.
I understand that np.finfo(float).eps returns the machine epsilon (the smallest representable positive float such that 1.0 + eps != 1.0), and that X[X < np.finfo(float).eps] = np.finfo(float).eps makes sure that no value less than np.finfo(float).eps remains in the array X, but I'm unable to understand how exactly that happens in a statement of the form X[X < {value}] = {value} and what it means. Any help is much appreciated.
The first time I saw it, it was used as a way to replace NaNs in an array.
Basically the conditional X < np.finfo(float).eps creates a boolean mask of X, and then the values of X that have a True associated with them are replaced.
So for instance,
x=np.array([-4, -3, -2, -1, 0, 1, 2, 3, 4])
x[x < 0] = 0
Here the mask array would look like,
[True, True, True, True, False, False, False, False, False]
It's a quicker way of doing the following with large arrays:
x = np.array([-4, -3, -2, -1, 0, 1, 2, 3, 4])
for idx, y in enumerate(x):
    if y < 0:
        x[idx] = 0
This is a fancy way of changing the values of an array when a condition is met.
On an easy example:
X = np.random.randint(1, 100, size=5)
print(X) # array([ 1, 17, 92, 9, 11])
X[X < 50] = 50 # Change any value lower than 50 to 50
print(X) # array([50, 50, 92, 50, 50])
Note that this changes the array X in place, and the former values are lost unless you make a copy first. Using np.where() achieves the same goal but does not overwrite the original array:
X = np.random.randint(1, 100, size=5)
print(X) # array([ 1, 17, 92, 9, 11])
np.where(X < 50, 50, X) # array([50, 50, 92, 50, 50])
print(X) # array([ 1, 17, 92, 9, 11])
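For the specific lower-bound pattern in the question, np.clip expresses the same operation; passing None for the upper bound leaves it unbounded:

```python
import numpy as np

X = np.array([1, 17, 92, 9, 11])
# returns a new array; the original X is untouched
clipped = np.clip(X, 50, None)
print(clipped)  # [50 50 92 50 50]
```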
Extra info: see the fancy indexing (boolean array indexing) section of the NumPy indexing documentation.
When we index a numpy array X with another array x, the output is a numpy array with values corresponding to the values of X at indices corresponding to the values of x.
And X < {value} returns a numpy array which has boolean values True or False against each item in X depending on whether the item passed the condition {item} < {value}. Hence, X[X < {value}] = {value} means that we're assigning the value {value} whenever an array item is less than {value}. The following would make things more clear:
>>> x = [1, 2, 0, 3, 4, 0, 5, 6, 0, 7, 8, 0]
>>> X = numpy.array(x)
>>> X < 1
array([False, False, True, False, False, True, False, False, True,
False, False, True])
>>> X[X < 1] = -1
>>> X
array([ 1, 2, -1, 3, 4, -1, 5, 6, -1, 7, 8, -1])
>>> X[x]
array([ 2, -1, 1, 3, 4, 1, -1, 5, 1, 6, -1, 1])
P.S.: The credit for this answer goes to @ForceBru and his comment above!

Elegant way to check co-ordinates of a 2D NumPy array lie within a certain range

So let us say we have a 2D NumPy array (denoting co-ordinates) and I want to check whether all the co-ordinates lie within a certain range. What is the most Pythonic way to do this? For example:
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
# ALL THE COORDINATES WITH x IN THE RANGE 0 TO 4 AND y IN THE RANGE
# 0 TO 4 SHOULD BE PUT IN b (x and y ranges might not be equal)
b = # DO SOME OPERATION
>>> b
>>> [[3,4],
[0,0]]
If the range is the same for both directions, x and y, just compare them and use all:
import numpy as np
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
a[(a >= 0).all(axis=1) & (a <= 4).all(axis=1)]
# array([[3, 4],
# [0, 0]])
If the ranges are not the same, you can also compare to an iterable of the same size as that axis (so two here):
mins = 0, 1 # x_min, y_min
maxs = 4, 10 # x_max, y_max
a[(a >= mins).all(axis=1) & (a <= maxs).all(axis=1)]
# array([[1, 5],
# [3, 4]])
To see what is happening here, let's have a look at the intermediate steps:
The comparison gives a per-element result of the comparison, with the same shape as the original array:
a >= mins
# array([[False, True],
# [ True, True],
# [ True, True],
# [ True, True],
# [ True, True],
# [ True, False],
# [False, False]], dtype=bool)
Using numpy.ndarray.all, you get whether all values are truthy, similar to the built-in function all:
(a >= mins).all()
# False
With the axis argument, you can restrict this to only compare values along one (or multiple) axis of the array:
(a >= mins).all(axis=1)
# array([False, True, True, True, True, False, False], dtype=bool)
(a >= mins).all(axis=0)
# array([False, False], dtype=bool)
Note that the output of this has the same shape as the array, except that every dimension mentioned in axis has been contracted to a single True/False.
When indexing an array with a boolean array, the index is applied along the leading axes. Since we index an array of shape (7, 2) with a boolean index of shape (7,), the index runs along the first axis, so these values select rows of the original array.
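The two comparisons can also be combined element-wise before reducing, which reads naturally when wrapped in a small helper (the function name here is just for illustration):

```python
import numpy as np

def in_box(points, mins, maxs):
    """Select rows of `points` whose coordinates all lie within [mins, maxs]."""
    inside = (points >= mins) & (points <= maxs)  # element-wise, shape (n, 2)
    return points[inside.all(axis=1)]

a = np.array([[-1, 2], [1, 5], [6, 7], [5, 2], [3, 4], [0, 0], [-1, -1]])
print(in_box(a, (0, 0), (4, 4)))
# [[3 4]
#  [0 0]]
```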

Find with python numpy the amount of combinations in tuple

I have a tuple (1,2,5,3,2,1,3,4,1) and want to find the number of combinations of 1 and 2. So in this example it should return 2, because there are three 1s but only two 2s.
The task that I want to solve is:
Give the number of possible combinations of a 1 and a 2, where each number can only be used once per combination.
I already solved the issue with this code:
count1 = tuple.count(number1)
count2 = tuple.count(number2)
if count1 < count2:
    return count1
else:
    return count2
But because I want to learn more magic things with numpy, I would like to know if there is a better solution to use it here.
Your if/else can be expressed more compactly with min:
In [707]: tup = (1,2,5,3,2,1,3,4,1)
In [708]: max(tup.count(1), tup.count(2))
Out[708]: 3
In [709]: min(tup.count(1),tup.count(2))
Out[709]: 2
numpy won't improve on this. There is bincount, which counts all values in a range.
In [710]: arr = np.array(tup)
In [711]: arr
Out[711]: array([1, 2, 5, 3, 2, 1, 3, 4, 1])
In [712]: np.bincount(arr)
Out[712]: array([0, 3, 2, 2, 1, 1], dtype=int32)
We could select a couple of values and, as before, take their min:
In [716]: np.bincount(arr)[[1,2]]
Out[716]: array([3, 2], dtype=int32)
In [717]: min(np.bincount(arr)[[1,2]])
Out[717]: 2
Keep in mind that np.array(tup) takes time; so sticking with list operations is often faster.
Another array approach is to use a broadcasted == test
In [532]: arr == [[1],[2]]
Out[532]:
array([[ True, False, False, False, False, True, False, False, True],
[False, True, False, False, True, False, False, False, False]], dtype=bool)
In [533]: _.sum(axis=1)
Out[533]: array([3, 2])
using sum on the booleans to count them.
There's also the Counter class, which counts all values in one call:
In [534]: from collections import Counter
In [535]: Counter(tup)
Out[535]: Counter({1: 3, 2: 2, 3: 2, 4: 1, 5: 1})
In [536]: min(_[1],_[2])
Out[536]: 2
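As a side-by-side sketch, all three counting approaches above agree on the example tuple:

```python
import numpy as np
from collections import Counter

tup = (1, 2, 5, 3, 2, 1, 3, 4, 1)
arr = np.array(tup)

via_count = min(tup.count(1), tup.count(2))          # plain tuple counting
via_bincount = np.bincount(arr)[[1, 2]].min()        # counts of 1 and 2 at once
via_counter = min(Counter(tup)[1], Counter(tup)[2])  # dict-like counting
print(via_count, via_bincount, via_counter)  # 2 2 2
```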