Interpolation with the most recent value - python

Say I have a set of (x,y) points in two arrays, x and y of the same length.
I would like to interpolate the values of y for new values of x_new. However, this interpolation should use the last (i.e. most recently "seen") value of y in the array.
In other words, the interpolation of
x = [0, 10, 15]
y = [1, 3, 6]
on
x_new = [1, 2, 9, 14, 16]
should return:
y_new = [1, 1, 1, 3, 6]
How can I do that in numpy? Is looping and manually checking the previous value my only alternative?
Explanation
The first element of y_new is 1 because its associated x_new value is 1, and the greatest x value smaller than 1 is 0, whose y is 1.
Perhaps the best way to look at this is to consider x as temporal values; I'm hoping to fill in y_new with the most recent y value.

Assume x is in increasing order. Here's how you could use np.searchsorted to do your interpolation:
In [194]: x
Out[194]: array([ 0, 10, 15])
In [195]: y
Out[195]: array([1, 3, 6])
In [196]: x_new
Out[196]: array([ 1, 2, 9, 14, 15, 16])
In [197]: i = np.searchsorted(x, x_new, side='right') - 1
In [198]: y_new = y[i]
In [199]: y_new
Out[199]: array([1, 1, 1, 3, 6, 6])
(x_new does not have to be sorted.)
This will give an incorrect result if any value in x_new is less than x[0], but that shouldn't be a problem, because the process isn't defined in that case.
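For a self-contained version, here's a minimal sketch wrapping the same searchsorted idea in a function; the guard for values below x[0] is an addition beyond the original answer:
import numpy as np

def previous_interp(x, y, x_new):
    # Zero-order hold: for each x_new, return the y of the last x <= x_new.
    # Assumes x is sorted in increasing order; x_new need not be sorted.
    x, y, x_new = np.asarray(x), np.asarray(y), np.asarray(x_new)
    i = np.searchsorted(x, x_new, side='right') - 1
    if np.any(i < 0):
        raise ValueError("x_new contains values below x[0]; result undefined")
    return y[i]

print(previous_interp([0, 10, 15], [1, 3, 6], [1, 2, 9, 14, 16]))
# [1 1 1 3 6]
If SciPy is an option, scipy.interpolate.interp1d with kind='previous' implements the same "most recent value" behavior.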

Related

What is the difference between (13027,) and (13027,1) in numpy expand_dims()

These are two outputs in a chunk of code: I apply .shape to a variable b before and after calling np.expand_dims(b, axis=1).
I see that the _dims part may seem like a dead giveaway, but the outputs don't seem to be different, except, perhaps, for turning a row vector into a column vector (?):
b is [208. 193. 208. ... 46. 93. 200.], a row vector, but np.expand_dims(b, axis=1) gives:
[[208.]
[193.]
[208.]
...
[ 46.]
[ 93.]
[200.]]
Which could be interpreted as a column vector (?), as opposed to any increased number of dimensions.
What is the difference between (13027,) and (13027,1)
They are arrays of different dimensions and some operations apply to them differently. For example
>>> a = np.arange(5)
>>> b = np.arange(5, 10)
>>> a + b
array([ 5, 7, 9, 11, 13])
>>> np.expand_dims(a, axis=1) + b
array([[ 5, 6, 7, 8, 9],
[ 6, 7, 8, 9, 10],
[ 7, 8, 9, 10, 11],
[ 8, 9, 10, 11, 12],
[ 9, 10, 11, 12, 13]])
The last result is what we call broadcasting, which you can read about in the numpy docs, or even in this SO question.
Basically, np.expand_dims adds new axes at the specified dimensions, and all of the following achieve the same result:
>>> a.shape
(5,)
>>> np.expand_dims(a, axis=(0, 2)).shape
(1, 5, 1)
>>> a[None,:,None].shape
(1, 5, 1)
>>> a[np.newaxis,:,np.newaxis].shape
(1, 5, 1)
Note that in numpy the transpose of a 1D array is still a 1D array. It isn't like in MATLAB where a row vector turns to a column vector.
>>> a
array([0, 1, 2, 3, 4])
>>> a.T
array([0, 1, 2, 3, 4])
>>> a.T.shape
(5,)
So in order to turn it into a "column vector" you have to turn the array from shape (N,) to (N, 1) with np.expand_dims (or reshaping). But you're better off treating it as a 2D array of N rows with 1 element per row.
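For instance, a minimal illustration of that (N,) to (N, 1) conversion:
import numpy as np

a = np.arange(5)         # shape (5,)
col = a.reshape(-1, 1)   # shape (5, 1): a 2D "column vector"
# equivalent: np.expand_dims(a, axis=1), a[:, None], a[:, np.newaxis]
print(col.shape)         # (5, 1)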
(13027,) describes a 1D array whose only axis (axis 0) holds 13027 elements, while (13027, 1) describes a 2D array with 13027 rows along axis 0 and a single column along the new axis 1.
https://numpy.org/doc/stable/reference/generated/numpy.expand_dims.html
Axes are numbered like a loop index i that starts at 0, so if you don't explicitly pick one, counting begins at axis 0.

How to attach the same row to every element of a column, creating a 2D array?

I have a numpy array, say x, with shape:
(10,)
(i.e. a column) and one other array, say y, with shape:
(1,100)
(i.e. a row). I need to place a "copy" of y (the row) next to every element of x (the column), creating a new array of shape
(10,101)
What is the most efficient way to do that?
There are various ways. You could adjust dimensions, and concatenate.
Or you could make a target array of the right size, and copy values to it:
In [68]: x = np.arange(10)*10; y = np.arange(5).reshape(1,5)
In [69]: x
Out[69]: array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
In [70]: y
Out[70]: array([[0, 1, 2, 3, 4]])
In [71]: z = np.zeros((10,6),int)
In [72]: z[:,0] = x
In [73]: z[:,1:].shape
Out[73]: (10, 5)
In [74]: z[:,1:] = y
Here I'm taking advantage of y's shape, (1,5), which can broadcast to (10,5):
In [75]: z
Out[75]:
array([[ 0, 0, 1, 2, 3, 4],
[10, 0, 1, 2, 3, 4],
[20, 0, 1, 2, 3, 4],
[30, 0, 1, 2, 3, 4],
[40, 0, 1, 2, 3, 4],
[50, 0, 1, 2, 3, 4],
[60, 0, 1, 2, 3, 4],
[70, 0, 1, 2, 3, 4],
[80, 0, 1, 2, 3, 4],
[90, 0, 1, 2, 3, 4]])
You can also use broadcasting and concatenate:
import numpy as np
a = np.arange(10)
b = np.arange(100).reshape(1, 100)
c = np.concatenate([a[:, np.newaxis], np.broadcast_to(b, (len(a), b.shape[1]))], axis=1)
print(c.shape)
# (10, 101)
Try this, making use of numpy expand_dims, repeat and block functions:
import numpy as np
x = np.ones(10)
y = np.zeros((1, 100))
x_expanded = np.expand_dims(x, axis=1)
y_expanded = np.repeat(y, 10, axis=0)
result = np.block([x_expanded, y_expanded])
How it works:
expand_dims adds a new dimension along axis 1, turning an array of shape (10,) into one of shape (10, 1)
repeat copies the values of y 10 times along axis 0, generating an array of shape (10, 100)
block juxtaposes the two arrays, (10, 1) and (10, 100), to form the result array of shape (10, 101)
Edit: I timed my approach and @hpaulj's. His solution is about 3 times faster than mine, so in terms of efficiency you should use his:
# My approach. Benchmark: 2.690964487e-05 s (number of runs: 10000000)
x = np.arange(10)*10
y = np.arange(100).reshape(1, 100)
x_expanded = np.expand_dims(x, axis=1)
y_expanded = np.repeat(y, 10, axis=0)
result = np.block([x_expanded, y_expanded])
# hpaulj's approach. Benchmark: 7.89659798e-06 s (number of runs: 10000000)
x = np.arange(10)*10
y = np.arange(100).reshape(1, 100)
z = np.zeros((10, 101), int)
z[:, 0] = x
z[:,1:] = y

Numpy: select value at a particular row for each column of a matrix

I have a 2D matrix X = ((a11, a12, ..., a1n), (a21, ..., a2n), ..., (am1, ..., amn)) and a 1D vector y = [y1, ..., yn], where each yi is between 1 and m. For each column i of X I want to pick out the element at row yi. That is, I want to pick out the vector z = (a_{y1,1}, ..., a_{yn,n}).
Is there a vectorized way to do this?
How about this:
In [39]: x = np.arange(12).reshape(4,3)
In [40]: y = np.array([0,3,2])
In [41]: x[y[None, :], np.arange(len(y))[None,:]][0]
Out[41]: array([ 0, 10, 8])
In [42]: x
Out[42]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
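As an aside (not part of the original answer), the leading None axes aren't strictly necessary here; plain integer fancy indexing gives the same result:
import numpy as np

x = np.arange(12).reshape(4, 3)
y = np.array([0, 3, 2])
# pair each row index from y with its column index
print(x[y, np.arange(len(y))])   # [ 0 10  8]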
As an alternative solution, np.choose is useful for making the selections.
>>> x = np.arange(16).reshape(4,4)
So x looks like this:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
Now the selection of the value at a particular row y in each column can be done like this:
>>> y = np.array([3, 0, 2, 1])
>>> np.choose(y, x)
array([12, 1, 10, 7])
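For comparison (an addition, not from the original answer), np.choose(y, x) here picks x[y[i], i] for each column i, which matches plain fancy indexing with a row-index array:
import numpy as np

x = np.arange(16).reshape(4, 4)
y = np.array([3, 0, 2, 1])
print(np.array_equal(np.choose(y, x), x[y, np.arange(x.shape[1])]))  # True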

Backwards axes in numpy.delete

It seems as though the axis argument in numpy.delete() is backwards from all other axis arguments in both numpy and pandas. Typically, axis=0 refers to columns and axis=1 refers to rows. For example:
import numpy as np
mat=np.array([[1,2], [3,4]])
# sum columns
np.sum(mat, axis=0)
# sum rows
np.sum(mat, axis=1)
# min of columns
np.min(mat, axis=0)
That all works as expected. But if I use numpy.delete, I have to switch:
# delete 1st row
np.delete(mat, 0, axis=0)
# delete 1st column
np.delete(mat, 0, axis=1)
Has anyone else noticed this? Am I crazy or is this by design?
It is by design. You are specifying the axis from which to delete the given index (or indices). For example, suppose we have z as follows:
In [62]: z
Out[62]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
You select different rows of z by varying the first index of z (i.e. by selecting indices along axis 0):
In [63]: z[0, :]
Out[63]: array([0, 1, 2, 3, 4])
In [64]: z[1, :]
Out[64]: array([5, 6, 7, 8, 9])
So it makes sense that you would also select axis=0 to delete, say, the row at index 1:
In [65]: np.delete(z, 1, axis=0)
Out[65]:
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14]])
Similarly, you use axis 1 (i.e. the second index) to access different columns:
In [66]: z[:, 0]
Out[66]: array([ 0, 5, 10])
In [67]: z[:, 3]
Out[67]: array([ 3, 8, 13])
and so you use axis=1 to delete columns:
In [68]: np.delete(z, 3, axis=1)
Out[68]:
array([[ 0, 1, 2, 4],
[ 5, 6, 7, 9],
[10, 11, 12, 14]])
Don't forget that this generalizes to n-dimensional arrays. For example, if you have a three-dimensional array a, and you want to delete the two-dimensional slice a[:, :, k], you would use np.delete(a, k, axis=2).
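A quick sketch of that 3D case (with a hypothetical array a and index k):
import numpy as np

a = np.arange(24).reshape(2, 3, 4)   # hypothetical 3D array
k = 2
b = np.delete(a, k, axis=2)          # removes the slice a[:, :, k]
print(b.shape)                       # (2, 3, 3)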

NumPy selecting specific column index per row by using a list of indexes

I'm struggling to select the specific columns per row of a NumPy matrix.
Suppose I have the following matrix which I would call X:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
I also have a list of column indexes per every row which I would call Y:
[1, 0, 2]
I need to get the values:
[2]
[4]
[9]
Instead of a list of indexes Y, I can also produce a matrix with the same shape as X where every entry is a bool / 0-1 int, indicating whether this is the required column.
[0, 1, 0]
[1, 0, 0]
[0, 0, 1]
I know this can be done by iterating over the array and selecting the column values I need. However, this will be executed frequently on big arrays of data, and that's why it has to run as fast as it can.
I was thus wondering if there is a better solution?
If you've got a boolean array you can do direct selection based on that like so:
>>> a = np.array([True, True, True, False, False])
>>> b = np.array([1,2,3,4,5])
>>> b[a]
array([1, 2, 3])
To go along with your initial example you could do the following:
>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> b = np.array([[False,True,False],[True,False,False],[False,False,True]])
>>> a[b]
array([2, 4, 9])
You can also add in an arange and do direct selection on that, though depending on how you're generating your boolean array and what your code looks like YMMV.
>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> a[np.arange(len(a)), [1,0,2]]
array([2, 4, 9])
You can do something like this:
In [7]: a = np.array([[1, 2, 3],
...: [4, 5, 6],
...: [7, 8, 9]])
In [8]: lst = [1, 0, 2]
In [9]: a[np.arange(len(a)), lst]
Out[9]: array([2, 4, 9])
More on indexing multi-dimensional arrays: http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays
Recent numpy versions have added take_along_axis (and put_along_axis), which do this indexing cleanly.
In [101]: a = np.arange(1,10).reshape(3,3)
In [102]: b = np.array([1,0,2])
In [103]: np.take_along_axis(a, b[:,None], axis=1)
Out[103]:
array([[2],
[4],
[9]])
It operates in the same way as:
In [104]: a[np.arange(3), b]
Out[104]: array([2, 4, 9])
but with different axis handling. It's especially aimed at applying the results of argsort and argmax.
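For example (an illustrative addition), pairing take_along_axis with argmax:
import numpy as np

a = np.array([[1, 9, 3],
              [7, 2, 8]])
i = np.argmax(a, axis=1)                          # index of the max in each row
print(np.take_along_axis(a, i[:, None], axis=1))  # [[9], [8]]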
A simple way might look like:
In [1]: a = np.array([[1, 2, 3],
...: [4, 5, 6],
...: [7, 8, 9]])
In [2]: y = [1, 0, 2] #list of indices we want to select from matrix 'a'
range(a.shape[0]) yields the row indices 0, 1, 2
In [3]: a[range(a.shape[0]), y] #we're selecting y indices from every row
Out[3]: array([2, 4, 9])
You can do it with an iterator, like this:
np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
Time:
N = 1000
X = np.zeros(shape=(N, N))
Y = np.arange(N)
##Aशwini चhaudhary
%timeit X[np.arange(len(X)), Y]
10000 loops, best of 3: 30.7 us per loop
#mine
%timeit np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
1000 loops, best of 3: 1.15 ms per loop
#mine
%timeit np.diag(X.T[Y])
10 loops, best of 3: 20.8 ms per loop
Another clever way is to first transpose the array and index it thereafter. Finally, take the diagonal; it's always the right answer. (As the timings above show, though, this is by far the slowest option, since X.T[Y] builds a full intermediate array.)
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Y = np.array([1, 0, 2, 2])
np.diag(X.T[Y])
Step by step:
Original arrays:
>>> X
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
>>> Y
array([1, 0, 2, 2])
Transpose to make it possible to index it right.
>>> X.T
array([[ 1, 4, 7, 10],
[ 2, 5, 8, 11],
[ 3, 6, 9, 12]])
Get rows in the Y order.
>>> X.T[Y]
array([[ 2, 5, 8, 11],
[ 1, 4, 7, 10],
[ 3, 6, 9, 12],
[ 3, 6, 9, 12]])
The diagonal should now become clear.
>>> np.diag(X.T[Y])
array([ 2,  4,  9, 12])
The answer from hpaulj using take_along_axis should be the accepted one.
Here is a derived version with an N-dim index array:
>>> arr = np.arange(20).reshape((2,2,5))
>>> idx = np.array([[1,0],[2,4]])
>>> np.take_along_axis(arr, idx[...,None], axis=-1)
array([[[ 1],
[ 5]],
[[12],
[19]]])
Note that the selection operation is ignorant of the shapes. I used this to refine a possibly vector-valued argmax result from a histogram by fitting parabolas:
def interpol(arr):
    i = np.argmax(arr, axis=-1)
    a = lambda Δ: np.squeeze(np.take_along_axis(arr, i[..., None] + Δ, axis=-1), axis=-1)
    frac = .5 * (a(1) - a(-1)) / (2 * a(0) - a(-1) - a(1))  # |frac| < 0.5
    return i + frac
Note the squeeze, which removes the dimension of size 1 so that i and frac, the integer and fractional parts of the peak position, have the same shape.
I'm quite sure that it is possible to avoid the lambda, but would the interpolation formula still look as nice?
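One lambda-free variant, for comparison (a sketch; whether it reads better is a matter of taste):
import numpy as np

def interpol(arr):
    i = np.argmax(arr, axis=-1)
    # sample the neighborhood of each peak: arr[..., i-1], arr[..., i], arr[..., i+1]
    lo, mid, hi = (np.squeeze(np.take_along_axis(arr, i[..., None] + d, axis=-1), axis=-1)
                   for d in (-1, 0, 1))
    frac = 0.5 * (hi - lo) / (2 * mid - lo - hi)  # parabolic peak offset, |frac| < 0.5
    return i + frac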
