numpy.diff with parameter n=2 produces strange result - python

I'm having a hard time understanding the behaviour of np.diff when n>1
The documentation gives the following example :
x = np.array([1, 2, 4, 7, 0])
np.diff(x)
array([ 1, 2, 3, -7])
np.diff(x, n=2)
array([ 1, 1, -10])
It seems from the first example that we are subtracting the previous number from each number (x[i+1]-x[i]), and all results make sense.
The second time the function is called, with n=2, it seems that we're computing x[i+2]-x[i+1]-x[i]. The first two numbers (1 and 1) in the resulting array fit that, but I am surprised that the last number is -10 and not -11 (0 - 7 - 4).
Looking in the documentation I found this explanation:
The first difference is given by out[i] = a[i+1] - a[i] along the given axis, higher differences are calculated by using diff recursively.
I fail to understand this 'recursively' so I'd be glad if someone had a clearer explanation !

np.diff(x, n=2) is the same as np.diff(np.diff(x)) (that's what "recursively" means in this case).

"Recursively" in this case simply means it's performing the same operation multiple times, each time on the array resulting from the previous step.
So:
x = np.array([1, 2, 4, 7, 0])
output = np.diff(x)
produces
output = [2-1, 4-2, 7-4, 0-7] = [1, 2, 3, -7]
If you use n=2, it simply does the same thing 2 times:
output = np.diff(x, n=2)
# first step, you won't see this result
output = [2-1, 4-2, 7-4, 0-7] = [1, 2, 3, -7]
# and then again (this will be your actual output)
output = [2-1, 3-2, -7-3] = [1, 1, -10]
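As a quick check of the explanation above, here is a minimal sketch that compares np.diff(x, n=2) with applying np.diff twice. The name closed_form is only illustrative; it expands the double difference into x[i+2] - 2*x[i+1] + x[i], which is the formula the question was looking for.

```python
import numpy as np

x = np.array([1, 2, 4, 7, 0])

first = np.diff(x)        # x[i+1] - x[i]
second = np.diff(first)   # the same operation applied to the first differences

# Expanding (x[i+2] - x[i+1]) - (x[i+1] - x[i]) gives x[i+2] - 2*x[i+1] + x[i]
closed_form = x[2:] - 2 * x[1:-1] + x[:-2]

print(first.tolist())                            # [1, 2, 3, -7]
print(second.tolist())                           # [1, 1, -10]
print(np.array_equal(second, np.diff(x, n=2)))   # True
print(np.array_equal(second, closed_form))       # True
```

So the last value is -10 because it is 0 - 2*7 + 4, not 0 - 7 - 4.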

Related

Index of Position of Values from B in A

I have a little bit of a tricky problem here...
Given two arrays A and B
A = np.array([8, 5, 3, 7])
B = np.array([5, 5, 7, 8, 3, 3, 3])
I would like to replace the values in B with the index of that value in A. In this example case, that would look like:
[1, 1, 3, 0, 2, 2, 2]
For the problem I'm working on, A and B contain the same set of values and all of the entries in A are unique.
The simple way to solve this is to use something like:
B_new = np.empty_like(B)
for idx in range(len(A)):
    ind = np.where(B == A[idx])[0]
    B_new[ind] = idx
But the B array I'm working with contains almost a million elements and using a for loop gets super slow. There must be a way to vectorize this, but I can't figure it out. The closest I've come is to do something like
np.intersect1d(A, B, return_indices=True)
But this only gives me the first occurrence of each element of A in B. Any suggestions?
The solution from @mozway is good for small arrays but not for big ones, as it compares every element of B with every element of A and so runs in O(len(A) * len(B)) time (i.e. quadratic time when both arrays grow together; see time complexity for more information). Here is a much better solution for big arrays, running in O(n log n) time (i.e. quasi-linear) and based on a fast binary search:
unique_values, index = np.unique(A, return_index=True)
result = index[np.searchsorted(unique_values, B)]
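To make the two lines above easier to follow, here is the same approach with the intermediate arrays written out for the example from the question. The variable name pos is introduced only for illustration. Note the assumption that every value in B occurs in A: np.searchsorted returns an insertion point for a missing value rather than raising an error.

```python
import numpy as np

A = np.array([8, 5, 3, 7])
B = np.array([5, 5, 7, 8, 3, 3, 3])

# Sorted unique values of A, and the position in A where each first occurs.
unique_values, index = np.unique(A, return_index=True)
# unique_values -> [3, 5, 7, 8]
# index         -> [2, 1, 3, 0]

# Binary search: where each element of B sits in the sorted values.
pos = np.searchsorted(unique_values, B)
# pos -> [1, 1, 2, 3, 0, 0, 0]

# Map the sorted positions back to positions in the original A.
result = index[pos]
print(result.tolist())   # [1, 1, 3, 0, 2, 2, 2]
```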
Use numpy broadcasting:
np.where(B[:, None]==A)[1]
NB. the values in A must be unique
Output:
array([1, 1, 3, 0, 2, 2, 2])
I can't tell exactly what the complexity of this is, but I believe it will perform quite well (it relies on A and B containing the same set of values, as stated in the question):
A.argsort()[np.unique(B, return_inverse = True)[1]]
array([1, 1, 3, 0, 2, 2, 2], dtype=int64)
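The three approaches above can be checked against each other on a larger input. This is only a sketch: the seed, the array sizes and the variable names are arbitrary choices, and B is built so that it contains every value of A, which the argsort variant requires.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed
A = rng.permutation(50)             # 50 unique values in random order
# Prepend A itself so that B is guaranteed to contain every value of A.
B = np.concatenate([A, rng.choice(A, size=1000)])

# Binary search on the sorted unique values of A.
uniq, first_idx = np.unique(A, return_index=True)
by_search = first_idx[np.searchsorted(uniq, B)]

# Broadcasting: builds a len(B) x len(A) boolean matrix.
by_broadcast = np.where(B[:, None] == A)[1]

# argsort of A combined with the inverse index of unique(B).
by_argsort = A.argsort()[np.unique(B, return_inverse=True)[1]]

print(np.array_equal(A[by_search], B))            # True
print(np.array_equal(by_search, by_broadcast))    # True
print(np.array_equal(by_search, by_argsort))      # True
```

The broadcasting version allocates len(A) * len(B) booleans, which is what makes it unattractive when both arrays are large.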

efficiently sample sequences of consecutive integers that terminate in the same number from a numpy array?

Suppose I have the following numpy array:
Space = np.arange(7)
Question: How could I generate a set of N samples from Space such that:
Each sample consists only of increasing or decreasing consecutive numbers
The sampling is done with replacement so the sample need not be monotonically increasing or decreasing.
Each sample ends with a 6 or 0, and
There is no limitation on the length of the samples (however each sample terminates once a 6 or 0 has been selected).
In essence I'm creating a Markov reward process via numpy sampling (there is probably a more efficient package for this, but I'm not sure what it would be). For example, if N = 3, a possible sampled set would look something like this:
Sample = [[1,0],[4, 3, 4, 5, 6],[4, 3, 2, 1, 2, 1, 0]]
I can accomplish this with something not very elegant like this:
N = len(Space)
Set = []
for i in range(3):
    X = np.random.randint(N)
    if (X == 0) | (X == 6):
        # start is already terminal: the sample is just that value
        Set.append([X])
    else:
        # include the starting value, as in the example above
        Sample = [X]
        while (X != 0) & (X != 6):
            Next = np.array([X-1, X+1])
            X = np.random.choice(Next)
            Sample.append(X)
        Set.append(Sample)
print(Set)
But I was wondering what a more efficient/pythonic way to go about this type of sampling would be, perhaps without so many loops? Or alternatively, are there better Python libraries for this sort of thing? Thanks.
NumPy doesn't seem to be helping much here; I'd just use the standard random module. The main reason is that random is faster when working with single values, as this algorithm does, and there is no reason to pull in an extra dependency unless it is needed.
from random import randint, choice
def bounded_path(lo, hi):
    # r covers the interior space
    r = range(lo+1, hi)
    n = randint(lo, hi)
    result = [n]
    while n in r:
        n += choice((-1, 1))
        result.append(n)
    return result
seems to do the right thing for me, e.g. evaluating the above 10 times, I get:
[0]
[4, 3, 4, 3, 2, 1, 0]
[5, 6]
[2, 3, 4, 3, 4, 5, 4, 3, 4, 3, 2, 1, 0]
[1, 0]
[1, 0]
[4, 3, 4, 3, 4, 3, 2, 3, 2, 1, 0]
[3, 2, 3, 2, 1, 0]
[6]
[4, 5, 4, 3, 4, 3, 2, 1, 0]
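To get the set of N samples the question asks for, the function can simply be called N times. The sketch below repeats bounded_path so that it runs on its own; sample_paths is a hypothetical helper name, and the seed is only there to make a run repeatable.

```python
from random import choice, randint, seed


def bounded_path(lo, hi):
    # interior covers the non-terminal states lo+1 .. hi-1
    interior = range(lo + 1, hi)
    n = randint(lo, hi)          # uniform start, endpoints included
    path = [n]
    while n in interior:
        n += choice((-1, 1))     # step to a neighbouring integer
        path.append(n)
    return path


def sample_paths(count, lo=0, hi=6):
    # Hypothetical helper: collect `count` independent paths.
    return [bounded_path(lo, hi) for _ in range(count)]


seed(0)                          # arbitrary seed, for repeatability only
samples = sample_paths(3)
for path in samples:
    print(path)
```

Every path ends in 0 or 6, and consecutive entries differ by exactly 1, which matches the constraints listed in the question.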
I just did a quick benchmark of random number generation, comparing:
def rng_np(X):
    for _ in range(10):
        X = np.random.choice(np.array([X-1,X+1]))
    return X
def rng_py(X):
    for _ in range(10):
        X += choice((-1, +1))
    return X
The NumPy version is ~30 times slower. NumPy has to do lots of extra work: building a Python list each iteration, converting it to a NumPy array, and switching inside choice to allow for fancy vectorisation. Python knows that the (-1, +1) in the vanilla version is constant, so it's built just once (the dis module is useful to see what's going on inside).
You might be able to get somewhere by working with larger blocks of numbers, but I doubt it would be much faster. Maintaining the uniformity of starting point seems awkward, but you could probably do something if you were really careful! Numpy starts to break even when each call is vectorised over approx 10 values, and really shines when you have more than 100 values.
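The comparison described in this answer could be reproduced roughly as follows with the standard timeit module. The starting value 3 and the number of runs are arbitrary assumptions, and the ratio printed will vary with the machine and library versions, so no expected figures are shown.

```python
import timeit
from random import choice

import numpy as np


def rng_np(X):
    # ten steps of the walk, drawing each step with NumPy
    for _ in range(10):
        X = np.random.choice(np.array([X - 1, X + 1]))
    return X


def rng_py(X):
    # ten steps of the walk, drawing each step with the random module
    for _ in range(10):
        X += choice((-1, +1))
    return X


runs = 200   # kept small so the sketch finishes quickly
t_np = timeit.timeit(lambda: rng_np(3), number=runs)
t_py = timeit.timeit(lambda: rng_py(3), number=runs)
print(f"numpy: {t_np:.4f}s  random: {t_py:.4f}s  ratio: {t_np / t_py:.1f}x")
```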

Reversing the list in python

In [122]: a = list(range(10))
In [123]: a[: : -1]
Out[123]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
Could you explain the expression a[: : -1]?
a[:] is clearly understandable -> "start from the beginning (space before the colon) and retrieve the list up to the end (space after the colon)"
But I am not getting what the two colons are actually doing in the expression a[: : -1].
A slice takes three arguments, just like range: start, stop and step:
[0, 1, 2, 3, 4, 5][0:4:2] == list(range(0, 4, 2)) # every second element from 0 to 3
The negative step causes the slice to work backwards through the sequence. Without a start and stop (i.e. just the step, [::-1]) it starts from the end, as it is working backwards.
The third argument (after the two colons) is the step size. -1 can be interpreted as stepping backwards, in other words, reversing the list.
Try a step size of -2, i.e. a[::-2], and you'll get:
[9, 7, 5, 3, 1]
Hope this helps.
More elaborate answers and explanations can be found in the question "Explain Python's slice notation".
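One way to see which start and stop Python fills in for the blanks is the indices method of a slice object, which takes the sequence length and returns the concrete (start, stop, step) triple. A short sketch:

```python
a = list(range(10))

print(a[::-1])     # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(a[::-2])     # [9, 7, 5, 3, 1]
print(a[7:2:-1])   # [7, 6, 5, 4, 3]  (the stop index 2 is excluded)

# For a negative step, the omitted start becomes the last index and the
# omitted stop becomes -1, i.e. "one before index 0".
triple = slice(None, None, -1).indices(len(a))
print(triple)                              # (9, -1, -1)
print(list(range(*triple)) == a[::-1])     # True
```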

How to return all the minimum indices in numpy

I am a little bit confused reading the documentation of the argmin function in numpy.
It looks like it should do the job.
Reading this:
Return the indices of the minimum values along an axis.
I might assume that
np.argmin([5, 3, 2, 1, 1, 1, 6, 1])
will return an array of all indices, which would be [3, 4, 5, 7].
But instead it returns only 3. Where is the catch, or what should I do to get my result?
That documentation makes more sense when you think about multidimensional arrays.
>>> x = numpy.array([[0, 1],
...                  [3, 2]])
>>> x.argmin(axis=0)
array([0, 0])
>>> x.argmin(axis=1)
array([0, 1])
With an axis specified, argmin takes one-dimensional subarrays along the given axis and returns the first index of each subarray's minimum value. It doesn't return all indices of a single minimum value.
To get all indices of the minimum value, you could do
numpy.where(x == x.min())
See the documentation for numpy.argmax (which is referred to by the docs for numpy.argmin):
In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.
The phrasing of the documentation ("indices" instead of "index") refers to the multidimensional case when axis is provided.
So, you can't do it with np.argmin. Instead, this will work:
np.where(arr == arr.min())
I would like to quickly add that, as user grofte mentioned, np.where returns a tuple. Its documentation states that, when called with only a condition, it is shorthand for nonzero, and the related function flatnonzero returns an array directly.
So, the cleanest version seems to be
my_list = np.array([5, 3, 2, 1, 1, 1, 6, 1])
np.flatnonzero(my_list == my_list.min())
=> array([3, 4, 5, 7])
Assuming that you want the indices of a list, not a numpy array, try
import numpy as np
my_list = [5, 3, 2, 1, 1, 1, 6, 1]
np.where(np.array(my_list) == min(my_list))[0]
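The index [0] is needed because np.where, when called with only a condition, returns a tuple containing one index array per dimension of the input. For a one-dimensional input that tuple has a single element, which is the array of indices.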
The way recommended by the NumPy documentation to get all indices of the minimum value is to call nonzero directly:
x = np.array([5, 3, 2, 1, 1, 1, 6, 1])
a, = np.nonzero(x == x.min()) # a=>array([3, 4, 5, 7])
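The difference between argmin and the nonzero-based approaches is easiest to see on a small two-dimensional array. The array below is made up for illustration:

```python
import numpy as np

x = np.array([[4, 1, 7],
              [1, 9, 1]])

# argmin without an axis works on the flattened array and
# reports only the first minimum.
print(x.argmin())                    # 1

# With an axis, it returns one index per row (or column),
# again the first minimum in each.
print(x.argmin(axis=1).tolist())     # [1, 0]

# All positions of the global minimum, as (row, column) pairs.
rows, cols = np.nonzero(x == x.min())
print(list(zip(rows.tolist(), cols.tolist())))   # [(0, 1), (1, 0), (1, 2)]

# The same positions as indices into the flattened array.
print(np.flatnonzero(x == x.min()).tolist())     # [1, 3, 5]
```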

Reversed array in numpy?

The tentative NumPy tutorial suggests that a[ : :-1] is a reversed a. Can someone explain to me how we got there?
I understand that a[:] means each element of a (along axis=0). From my understanding, the next : should denote the number of elements to skip (or the period).
It isn't numpy, it's Python.
In Python, sequences support slices, which use the following syntax
seq[start:stop:step] => a slice from start to stop, stepping step each time.
All the arguments are optional, but a : has to be there for Python to recognize this as a slice.
Negative values for step also work; a step of -1 makes a copy of the same sequence in reverse order:
>>> L = list(range(10))
>>> L[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
And numpy follows that "rule", like any good third-party library.
>>> a = numpy.array(range(10))
>>> a[::-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
As others have noted, this is a python slicing technique, and numpy just follows suit. Hopefully this helps explain how it works:
The last bit is the stepsize. The 1 indicates to step by one element at a time, the - does that in reverse.
Blanks indicate the first and last, unless you have a negative stepsize, in which case they indicate last and first:
In [1]: import numpy as np
In [2]: a = np.arange(5)
In [3]: a
Out[3]: array([0, 1, 2, 3, 4])
In [4]: a[0:5:1]
Out[4]: array([0, 1, 2, 3, 4])
In [5]: a[0:5:-1]
Out[5]: array([], dtype=int64)
In [6]: a[5:0:-1]
Out[6]: array([4, 3, 2, 1])
In [7]: a[::-2]
Out[7]: array([4, 2, 0])
Line 5 gives an empty array since it tries to step backwards from the 0th element to the 5th.
The slice doesn't include the endpoint (the stop index), so line 6 misses 0 when going backwards.
This isn't specific to numpy: the slice a[::-1] is equivalent to a[slice(None, None, -1)], where the first argument is the start index, the second argument is the stop index, and the third argument is the step. None for start or stop means the corresponding end of the sequence in the direction of travel (so, with a negative step, the last element for start and just past the first element for stop), and -1 for step iterates over the sequence in reverse order.
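The equivalence with an explicit slice object can be checked directly. The sketch below also shows one point where numpy and plain lists differ: a reversed numpy slice is a view onto the same data, whereas slicing a list produces a new list. The value 99 is just a marker.

```python
import numpy as np

a = np.arange(5)
rev = a[slice(None, None, -1)]        # same as a[::-1]

print(np.array_equal(rev, a[::-1]))   # True
print(np.shares_memory(rev, a))       # True: a view, not a copy

rev[0] = 99                           # writes through to the last element of a
print(a.tolist())                     # [0, 1, 2, 3, 99]

L = list(range(5))
L_rev = L[::-1]                       # list slicing builds a new list
L_rev[0] = 99
print(L)                              # [0, 1, 2, 3, 4]
```

If an independent reversed array is needed, a[::-1].copy() gives one.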
You can use the reversed Python built-in:
import numpy as np
bins = np.arange(0.0, 1.1, .1)
for i in reversed(bins):
    print(i)
