interp - Program to interpolate data using Lagrange polynomial
I am not able to complete the for loop in the code below. I don't see anything wrong with it, since I chose np.empty(nplot) to create the 1D array for xi, and for some reason the loop won't fill those values.
import numpy as np

def intrpf(xi, x, y):
    """Function to interpolate between data points
    using Lagrange polynomial (quadratic)
    Inputs
      x    Vector of x coordinates of data points (3 values)
      y    Vector of y coordinates of data points (3 values)
      xi   The x value where interpolation is computed
    Output
      yi   The interpolation polynomial evaluated at xi
    """
    #* Calculate yi = p(xi) using Lagrange polynomial
    yi = ( (xi-x[1])*(xi-x[2])/((x[0]-x[1])*(x[0]-x[2])) * y[0]
         + (xi-x[0])*(xi-x[2])/((x[1]-x[0])*(x[1]-x[2])) * y[1]
         + (xi-x[0])*(xi-x[1])/((x[2]-x[0])*(x[2]-x[1])) * y[2] )
    return yi
#* Initialize the data points to be fit by quadratic
x = np.empty(3)
y = np.empty(3)
print('Enter data points as x,y pairs (e.g., [1, 2])')
for i in range(3):
    temp = np.array(input('Enter data point: '))
    x[i] = temp[0]
    y[i] = temp[1]
#* Establish the range of interpolation (from x_min to x_max)
xr = np.array(input('Enter range of x values as [x_min, x_max]: '))
I'm getting stuck on this part, where it seems properly set up, but "Too many indices for array" appears on xi[i] within the for loop.
#* Find yi for the desired interpolation values xi using
# the function intrpf
nplot = 100 # Number of points for interpolation curve
xi = np.empty(nplot)
yi = np.empty(nplot)
for i in range(nplot):
    xi[i] = xr[0] + (xr[1]-xr[0]) * i / float(nplot)
    yi[i] = intrpf(xi[i], x, y)   # Use intrpf function to interpolate
From the docs of np.array:
Parameters:
object : array_like
    An array, any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence.
This means np.array should receive something like a list in order to do the conversion, while input returns a string. So what Python ends up doing here is essentially
np.array('[1, 2]')
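For a concrete illustration, here is a minimal sketch (run on a literal string rather than an actual input call) of what that produces and why indexing it fails:

import numpy as np

temp = np.array('[1, 2]')      # the whole string becomes one 0-dimensional element
print(temp.shape, temp.dtype)  # -> () <U6
# temp[0] raises:
# IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed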
While it might be tempting to do something like
np.array(eval(input()))
you should never do this: it is unsafe because it allows the user to execute arbitrary code in your program. If you really need that kind of input, I would suggest something like
np.array(list(map(int, input('Enter data point: ')
                       .replace('[', '')
                       .replace(']', '')
                       .split(','))))
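For example, applied to the literal string '[1, 2]' (standing in for what input would return), this yields a proper 1-D integer array:

point = np.array(list(map(int, '[1, 2]'.replace('[', '').replace(']', '').split(','))))
print(point, point.shape)   # -> [1 2] (2,)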
The error occurs with your data input lines:
Enter data points as x,y pairs (e.g., [1, 2])
Enter data point: [1,2]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-6-8d648ad8c9e4> in <module>
22 for i in range(3):
23 temp = np.array(input('Enter data point: '))
---> 24 x[i] = temp[0]
25 y[i] = temp[1]
26
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
The code doesn't even get to the "I chose np.empty(nplot) to create the 1D array for xi, and for some reason the loop won't fill those values" part.
When asking for help, give full and accurate information about the error.
If I change the input lines to:
...: x = np.empty(3)
...: y = np.empty(3)
...: print ('Enter data points as x,y pairs')
...: for i in range(3):
...:     temp = input('Enter data point: ').split()
...:     x[i] = temp[0]
...:     y[i] = temp[1]
...:
...: #* Establish the range of interpolation (from x_min to x_max)
...: xr = np.array(input('Enter range of x values as x_min, x_max: ').split(),float)
Enter data points as x,y pairs
Enter data point: 1 2
Enter data point: 3 4
Enter data point: 5 6
Enter range of x values as x_min, x_max: 0 4
In [9]: x
Out[9]: array([1., 3., 5.])
In [10]: y
Out[10]: array([2., 4., 6.])
In [11]: xr
Out[11]: array([0., 4.])
Getting array values via user input is not ideal, but this at least works. input (in Py3) does not evaluate the inputs; it just returns a string. I split it (on default space), and then assign the values to an array. x is defined as a float array, so the x[i]=temp[0] takes care of converting the string to float. Similarly the xr line makes a float array from the string inputs. This input style is not very robust; I could easily raise an error with wrong input.
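If you want something a bit more forgiving, a minimal sketch (the read_point helper is only an illustration, not part of the original code) could re-prompt until it gets two numbers; the loop above would then read x[i], y[i] = read_point():

def read_point(prompt='Enter data point: '):
    # Keep asking until the line parses as two numbers separated by whitespace
    while True:
        parts = input(prompt).split()
        try:
            return float(parts[0]), float(parts[1])
        except (IndexError, ValueError):
            print('Please enter two numbers separated by a space, e.g. 1 2')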
===
The rest of the code runs with this input:
In [12]: nplot = 100 # Number of points for interpolation curve
...: xi = np.empty(nplot)
...: yi = np.empty(nplot)
...: for i in range(nplot):
...:     xi[i] = xr[0] + (xr[1]-xr[0]) * i/float(nplot)
...:     yi[i] = intrpf(xi[i], x, y)  # Use intrpf function to interpolate
...:
In [13]: xi
Out[13]:
array([0. , 0.04, 0.08, 0.12, 0.16, 0.2 , 0.24, 0.28, 0.32, 0.36, 0.4 ,
0.44, 0.48, 0.52, 0.56, 0.6 , 0.64, 0.68, 0.72, 0.76, 0.8 , 0.84,
...
3.52, 3.56, 3.6 , 3.64, 3.68, 3.72, 3.76, 3.8 , 3.84, 3.88, 3.92,
3.96])
In [14]: yi
Out[14]:
array([1. , 1.04, 1.08, 1.12, 1.16, 1.2 , 1.24, 1.28, 1.32, 1.36, 1.4 ,
1.44, 1.48, 1.52, 1.56, 1.6 , 1.64, 1.68, 1.72, 1.76, 1.8 , 1.84,
....
4.52, 4.56, 4.6 , 4.64, 4.68, 4.72, 4.76, 4.8 , 4.84, 4.88, 4.92,
4.96])
Let's say I have a 2D numpy array, like
arr = np.array([[0, 0.001, 0.002], [0.03, 0.04, 0.05], [0.01, 0.002, 0.5], [0.05, 0.8, 0.003]])
and I want to perform a piecewise function on it, say
def gammacor(x):
    return np.piecewise(x, [x <= 0.00313, x > 0.00313],
                        [12.92*x, 1.055*x**(1/2.4) - 0.055])
gcarr = gammacor(arr)
When I do this, I get an error:
TypeError: NumPy boolean array indexing assignment requires a 0 or 1-dimensional input, input has 2 dimensions
If I try to run the function on the flattened array (with the plan to reshape back to n x 3 after running the function), I get the error:
ValueError: NumPy boolean array indexing assignment cannot assign 3 input values to the 0 output values where the mask is true
Is there an easy way to apply a piecewise function to all elements of a 2D (or ND) array?
The third parameter of np.piecewise is a funclist.
Its elements should be callables:
import numpy as np
arr = np.array([[0, 0.001, 0.002], [0.03, 0.04, 0.05], [0.01, 0.002, 0.5],
[0.05, 0.8, 0.003]])
p = np.piecewise(arr, [arr <= 0.00313, arr > 0.00313],
                 [lambda v: 12.92 * v,
                  lambda v: 1.055 * v ** (1 / 2.4) - 0.055])
print(p)
Output:
[[0. 0.01292 0.02584 ]
[0.18974828 0.22091636 0.24780053]
[0.09985282 0.02584 0.73535698]
[0.24780053 0.90633175 0.03876 ]]
def gammacor(x):
    return np.piecewise(x, [x <= 0.00313, x > 0.00313],
                        [lambda v: 12.92 * v,
                         lambda v: 1.055 * v ** (1 / 2.4) - 0.055])
gcarr = gammacor(arr)
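As a quick sanity check (assuming arr and p from the snippet above are still in scope), the fixed gammacor reproduces the same values:

print(np.allclose(gcarr, p))   # expected: True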
Background
After generating a list of random weights:
sizes = [784,30,10]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1],sizes[1:])]
I utilize NumPy's Kronecker product to create foo (with shape (900, 23520)):
foo = np.kron(np.identity(30),weights[0])
Then I matrix-multiply foo with a slice from data, namely
bar = np.dot(foo,data[0])
where data[0].shape is (23520,) and data[0].dtype is float32.
Question
foo is rather wasteful. How can weights[0], which has shape (30, 784), be used for the multiplication with data[0] in a more efficient manner?
More generally, data[0] is a slice from an array with shape (1666,23520), so the multiplication procedure will need to be carried out 1666 times. Also, the data array is close to sparse with less than 20% of entries being non-zero.
Here's the loop I had tried:
for i in range(len(data)):
    foo = np.kron(np.identity(30), weights[0])
    bar = np.dot(foo, data[i])
The trick is to reshape data into a 3D tensor and then use np.tensordot against weights[0], thus bypassing the foo creation, like so -
k = 30 # kernel size
data3D = data.reshape(data.shape[0],k,-1)
out = np.tensordot(data3D, weights[0], axes=(2,1)).reshape(-1,k**2)
Under the hood, tensordot transposes axes, reshapes and then uses np.dot. So, doing all that manual labor to avoid the function call to tensordot, we would have an equivalent one, like so -
out = data.reshape(-1,data.shape[1]//k).dot(weights[0].T).reshape(-1,k**2)
Related post to understand tensordot.
Sample run
Let's use a toy example to explain what's going on for people who might not have understood the problem:
In [68]: # Toy setup and code run with original codes
...: k = 3 # kernel size, which is 30 in the original case
...:
...: data = np.random.rand(4,6)
...: w0 = np.random.rand(3,2) # this is weights[0]
...: foo = np.kron(np.identity(k), w0)
...: output_first_row = foo.dot(data[0])
So, the question is to get rid of the foo creation step and get to output_first_row and do this for all rows of data.
The proposed solution is :
...: data3D = data.reshape(data.shape[0],k,-1)
...: vectorized_out = np.tensordot(data3D, w0, axes=(2,1)).reshape(-1,k**2)
Let's verify the results :
In [69]: output_first_row
Out[69]: array([ 0.11, 0.13, 0.34, 0.67, 0.53, 1.51, 0.17, 0.16, 0.44])
In [70]: vectorized_out
Out[70]:
array([[ 0.11, 0.13, 0.34, 0.67, 0.53, 1.51, 0.17, 0.16, 0.44],
[ 0.43, 0.23, 0.73, 0.43, 0.38, 1.05, 0.64, 0.49, 1.41],
[ 0.57, 0.45, 1.3 , 0.68, 0.51, 1.48, 0.45, 0.28, 0.85],
[ 0.41, 0.35, 0.98, 0.4 , 0.24, 0.75, 0.22, 0.28, 0.71]])
Runtime test for all proposed approaches -
In [30]: import numpy as np
In [31]: sizes = [784,30,10]
In [32]: weights = [np.random.rand(y, x) for x, y in zip(sizes[:-1],sizes[1:])]
In [33]: data = np.random.rand(1666,23520)
In [37]: k = 30 # kernel size
# @Paul Panzer's soln
In [38]: %timeit (weights[0] @ data.reshape(-1, 30, 784).swapaxes(1, 2)).swapaxes(1, 2)
1 loops, best of 3: 707 ms per loop
In [39]: %timeit np.tensordot(data.reshape(data.shape[0],k,-1), weights[0], axes=(2,1)).reshape(-1,k**2)
10 loops, best of 3: 114 ms per loop
In [40]: %timeit data.reshape(-1,data.shape[1]//k).dot(weights[0].T).reshape(-1,k**2)
10 loops, best of 3: 118 ms per loop
This Q&A and the comments under it might help you understand better how tensordot works with tensors.
You are essentially doing matrix-matrix multiplication where the first factor is weights[0] and the second is data[i] chopped up into 30 equal slices that form the columns.
import numpy as np
sizes = [784,30,10]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1],sizes[1:])]
k = 2
# create sparse data
data = np.maximum(np.random.uniform(-100, 1, (k, 23520)), 0)
foo = np.kron(np.identity(30),weights[0])
# This is the original loop as a list comprehension
bar = [np.dot(foo,d) for d in data]
# This is the equivalent using matrix multiplication.
# We can take advantage of the fact that the '@' operator
# can do batch matrix multiplication (it uses the last two
# dimensions as the matrix and all others as batch index).
# The reshape does the chopping up but gives us rows where columns
# are required, hence the first swapaxes.
# The second swapaxes is to make the result directly comparable to
# the `np.kron` based result.
bar2 = (weights[0] @ data.reshape(k, 30, 784).swapaxes(1, 2)).swapaxes(1, 2)
# Instead of letting numpy do the batching we can glue all the
# columns of all the second factors together into one matrix
bar3 = (weights[0] @ data.reshape(-1, 784).T).T.reshape(k, -1)
# This last formulation works more or less unchanged on sparse data
from scipy import sparse
dsp = sparse.csr_matrix(data.reshape(-1, 784))
bar4 = (weights[0] @ dsp.T).T.reshape(k, -1)
print(np.allclose(bar, bar2.reshape(k, -1)))
print(np.allclose(bar, bar3))
print(np.allclose(bar, bar4))
Prints:
True
True
True
I'm currently trying to apply Chi-Squared analysis to some data.
I want to plot a colourmap of varying values depending on the two coefficients of a model:
def f(x, coeff):
    return coeff[0] + numpy.exp(coeff[1] * x)

def chi_squared(coeff, x, y, y_err):
    return numpy.sum(((y - f(x, coeff) / y_err)**2)
us = numpy.linspace(u0, u1, n)
vs = numpy.linspace(v0, v1, n)
rs = numpy.meshgrid(us, vs)
chi = numpy.vectorize(chi_squared)
chi(rs, x, y, y_error)
I tried vectorizing the function to be able to pass a meshgrid of the varying coefficients to produce the colormap.
The values of x, y, y_err are all 1D arrays of length n.
And u, v are the various changing coefficients.
However this doesn't work, resulting in
IndexError: invalid index to scalar variable.
This is because coeff is passed as a scalar rather than a vector; however, I don't know how to correct this.
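To see this concretely, here is a minimal sketch (show_coeff is just an illustration) of how np.vectorize hands the wrapped function one scalar element at a time:

import numpy as np

def show_coeff(coeff, x, y, y_err):
    print(type(coeff))   # each call receives a single scalar, not the whole grid
    return 0.0

v = np.vectorize(show_coeff)
v(np.array([[0.0, 0.5], [1.0, 1.5]]), 1.0, 2.0, 1.0)
# every coeff printed is a plain float, so coeff[0] inside the real
# chi_squared raises "IndexError: invalid index to scalar variable."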
Update
My aim is to take an array of coordinates
rs = [[[u0, v0], [u1, v0], ..., [un, v0]], ..., [[u0, vm], ..., [un, vm]]]
Where each coordinate is the coefficient parameters to be passed to the chi-squared method.
This should return a 2D array populated with Chi-Squared values for the appropriate coordinate
chi = [[c00, c10, ..., cn0], ..., [c0m, c1m, ..., cnm]]
I can then use this data to plot a colormap using imshow
Here's my first attempt to run your code:
In [44]: def f(x, coeff):
...:     return coeff[0] + numpy.exp(coeff[1] * x)
...:
...: def chi_squared(coeff, x, y, y_err):
...:     return numpy.sum((y - f(x, coeff) / y_err)**2)
(I had to remove the extra ( in that last line)
First guess at possible array values:
In [45]: x = np.arange(3)
In [46]: y = x
In [47]: y_err = x
In [48]: us = np.linspace(0,1,3)
In [49]: rs = np.meshgrid(us,us)
In [50]: rs
Out[50]:
[array([[ 0. , 0.5, 1. ],
[ 0. , 0.5, 1. ],
[ 0. , 0.5, 1. ]]),
array([[ 0. , 0. , 0. ],
[ 0.5, 0.5, 0.5],
[ 1. , 1. , 1. ]])]
In [51]: chi_squared(rs, x, y, y_err)
/usr/local/bin/ipython3:5: RuntimeWarning: divide by zero encountered in true_divide
import sys
Out[51]: inf
oops, y_err shouldn't have a 0. Try again:
In [52]: y_err = np.array([1,1,1])
In [53]: chi_squared(rs, x, y, y_err)
Out[53]: 53.262865105526018
It also works if I turn the rs list into an array:
In [55]: np.array(rs).shape
Out[55]: (2, 3, 3)
In [56]: chi_squared(np.array(rs), x, y, y_err)
Out[56]: 53.262865105526018
Now, what was the purpose of vectorize?
The f function returns a (n,n) array:
In [57]: f(x, rs)
Out[57]:
array([[ 1. , 1.5 , 2. ],
[ 1. , 2.14872127, 3.71828183],
[ 1. , 3.21828183, 8.3890561 ]])
Let's modify chi_squared to give sum an axis:
In [61]: def chi_squared(coeff, x, y, y_err, axis=None):
...:     return numpy.sum((y - f(x, coeff) / y_err)**2, axis=axis)
In [62]: chi_squared(np.array(rs), x, y, y_err)
Out[62]: 53.262865105526018
In [63]: chi_squared(np.array(rs), x, y, y_err, axis=0)
Out[63]: array([ 3. , 6.49033483, 43.77253028])
In [64]: chi_squared(np.array(rs), x, y, y_err, axis=1)
Out[64]: array([ 1.25 , 5.272053 , 46.74081211])
I'm tempted to change the coeff to coeff0, coeff1, to give more control from the start on how this parameter is passed, but it probably doesn't make a difference.
update
Now that you've been more specific about how the coeff values relate to x, y etc, I see that this can be solved with simple broadcasting. No need to use np.vectorize.
First, define a grid that has a different size; that way we, and the code, won't think that each dimension of the coeff grid has anything to do with the x,y values.
In [134]: rs = np.meshgrid(np.linspace(0,1,4), np.linspace(0,1,5), indexing='ij')
In [135]: coeff=np.array(rs)
In [136]: coeff.shape
Out[136]: (2, 4, 5)
Now look at what f looks like when given this coeff and x.
In [137]: f(x, coeff[...,None]).shape
Out[137]: (4, 5, 3)
coeff[..., None] is effectively (4,5,1), while x is (1,1,3), resulting in a (4,5,3) array (by broadcasting rules).
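A quick way to confirm those shapes (np.broadcast_shapes needs NumPy 1.20 or later):

print(np.broadcast_shapes((4, 5, 1), (3,)))   # -> (4, 5, 3)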
The same thing happens inside chi_squared, with the final step of sum on the last axis (size 3):
In [138]: chi_squared(coeff[...,None], x, y, y_err, axis=-1)
Out[138]:
array([[ 2. , 1.20406718, 1.93676807, 8.40646968,
32.99441808],
[ 2.33333333, 2.15923164, 3.84810347, 11.80559574,
38.73264336],
[ 3.33333333, 3.78106277, 6.42610554, 15.87138846,
45.13753532],
[ 5. , 6.06956056, 9.67077427, 20.60384785,
52.20909393]])
In [139]: _.shape
Out[139]: (4, 5)
One value for each coeff pair of values, the (4,5) grid.
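To get the colourmap the question asks for, one possible sketch (assuming matplotlib is available; chi2 here just names the (4, 5) grid computed in In [138] above):

import matplotlib.pyplot as plt

chi2 = chi_squared(coeff[..., None], x, y, y_err, axis=-1)
# rows of chi2 follow coeff[0] (linspace(0, 1, 4)), columns follow coeff[1] (linspace(0, 1, 5))
plt.imshow(chi2.T, origin='lower', aspect='auto', extent=[0, 1, 0, 1])
plt.xlabel('coeff[0]')
plt.ylabel('coeff[1]')
plt.colorbar(label='chi-squared')
plt.show()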
I am doing a cubic spline interpolation using scipy.interpolate.splrep as follows:
import numpy as np
import scipy.interpolate
x = np.linspace(0, 10, 10)
y = np.sin(x)
tck = scipy.interpolate.splrep(x, y, task=0, s=0)
F = scipy.interpolate.PPoly.from_spline(tck)
I print the knots F.x and the coefficients F.c:
print F.x
array([ 0. , 0. , 0. , 0. ,
2.22222222, 3.33333333, 4.44444444, 5.55555556,
6.66666667, 7.77777778, 10. , 10. ,
10. , 10. ])
print F.c
array([[ -1.82100357e-02, -1.82100357e-02, -1.82100357e-02,
-1.82100357e-02, 1.72952212e-01, 1.26008293e-01,
-4.93704109e-02, -1.71230879e-01, -1.08680287e-01,
1.00658224e-01, 1.00658224e-01, 1.00658224e-01,
1.00658224e-01],
[ -3.43151441e-01, -3.43151441e-01, -3.43151441e-01,
-3.43151441e-01, -4.64551679e-01, 1.11955696e-01,
5.31983340e-01, 3.67415303e-01, -2.03354294e-01,
-5.65621916e-01, 1.05432909e-01, 1.05432909e-01,
1.05432909e-01],
[ 1.21033389e+00, 1.21033389e+00, 1.21033389e+00,
1.21033389e+00, -5.84561936e-01, -9.76335250e-01,
-2.60847433e-01, 7.38484392e-01, 9.20774403e-01,
6.63563923e-02, -9.56285846e-01, -9.56285846e-01,
-9.56285846e-01],
[ -4.94881722e-18, -4.94881722e-18, -4.94881722e-18,
-4.94881722e-18, 7.95220057e-01, -1.90567963e-01,
-9.64317117e-01, -6.65101515e-01, 3.74151231e-01,
9.97097891e-01, -5.44021111e-01, -5.44021111e-01,
-5.44021111e-01]])
So I had supplied the x array as :
array([ 0. , 1.11111111, 2.22222222, 3.33333333,
4.44444444, 5.55555556, 6.66666667, 7.77777778,
8.88888889, 10. ])
Q.1: The F.x (knots) are not the same as the original x array and have duplicate values (possibly to force the first derivative to zero?). Also some values in x (1.11111111, 8.88888889) are missing in F.x. Any ideas?
Q.2: The shape of F.c is (4, 13). I understand that the 4 comes from the fact that it is a cubic spline fit. But I do not know how to select the coefficients for each of the 9 sections that I want (from x = 0 to x = 1.11111, x = 1.11111 to x = 2.22222, and so on). Any help in extracting the coefficients for the different segments would be appreciated.
If you want to have the knots at specific locations along the curve, you need to use the argument task=-1 of splrep and give an array of interior knots as the t argument.
The knots in t must satisfy the following condition:
If provided, knots t must satisfy the Schoenberg-Whitney conditions, i.e., there must be a subset of data points x[j] such that t[j] < x[j] < t[j+k+1], for j=0, 1,...,n-k-2.
See the documentation here.
Then you should get F.c of the following size (4, <length of t> + 2*(k+1)-1) corresponding to the consecutive intervals along the curve (k+1 knots are added at either end of the curve by splrep).
Try the following:
import numpy as np
import scipy.interpolate
x = np.linspace(0, 10, 20)
y = np.sin(x)
t = np.linspace(0, 10, 10)
tck = scipy.interpolate.splrep(x, y, t=t[1:-1])
F = scipy.interpolate.PPoly.from_spline(tck)
print(F.x)
print(F.c)
# Accessing coeffs of nth segment: index = k + n - 1
# Eg. for second segment:
print(F.c[:,4])
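As a quick sanity check (a small sketch using the arrays defined above), the piecewise polynomial should reproduce the underlying spline:

xs = np.linspace(0, 10, 7)
print(np.allclose(F(xs), scipy.interpolate.splev(xs, tck)))   # expected: True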
I wrote code that performs a spline interpolation:
import numpy as np
import scipy as sp
import scipy.interpolate
x1 = [ 0., 13.99576991, 27.99153981, 41.98730972, 55.98307963, 69.97884954, 83.97461944, 97.97038935, 111.9661593, 125.9619292, 139.9576991, 153.953469 ]
y1 = [ 1., 0.88675318, 0.67899118, 0.50012243, 0.35737022, 0.27081293, 0.18486778, 0.11043095, 0.08582272, 0.04946131, 0.04285015, 0.02901567]
x = np.array(x1)
y = np.array(y1)
# Interpolate the data using a cubic spline to "new_length" samples
new_length = 50
new_x = np.linspace(x.min(), x.max(), new_length)
new_y = sp.interpolate.interp1d(x, y, kind='cubic')(new_x)
But in the newly generated dataset new_x and new_y, the original points are eliminated; only the first and the last values are kept. I would like to keep the original points.
Right, linspace won't generate any of the values in x except the ones you pass to it (x.min() and x.max()).
I don't have a great snappy answer, but here is one way to do it:
# Interpolate the data using a cubic spline to "new_length" samples
new_length = 50
interpolated_x = np.linspace(x.min(), x.max(), new_length - len(x) + 2)
new_x = np.sort(np.append(interpolated_x, x[1:-1])) # include the original points
new_y = sp.interpolate.interp1d(x, y, kind='cubic')(new_x)
This code uses:
np.linspace to create as many extra points as we need
np.append to combine the array of extra points with the original points from x
np.sort to put the combined array in order
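A quick check (just a sketch, using x and new_x from the snippet above) that the original points really are kept:

print(np.all(np.isin(x, new_x)))   # expected: True
print(len(new_x))                  # still new_length (50) points in total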
If your data is uniform, this is another way:
import numpy as np
def interpolate_add(x_add, num):
    x_add_ls = []
    for i in range(x_add.shape[0]-1):
        # num+2 evenly spaced points between neighbours; drop the right endpoint to avoid duplicates
        x_add_ls += (np.linspace(x_add[i], x_add[i+1], num+2)[0:-1]).tolist()
    x_add_ls.append(x_add[-1])  # append the last element of the parameter, not the global x
    return np.array(x_add_ls)
x=np.linspace(1,5,5)
print(x)
print(interpolate_add(x,3))
it prints:
[1. 2. 3. 4. 5.]
[1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. ]
In the code, interpolate_add has two parameters:
x_add is your data (a numpy.array with shape (N,)).
num is the number of points to insert between two adjacent original data points.
For example, if your data is array([1, 3]) and num is 1, the result is array([1., 2., 3.]).
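That example as a direct call to the function above:

print(interpolate_add(np.array([1, 3]), 1))   # -> [1. 2. 3.]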