Python: SciPy.interpolate PiecewisePolynomial

import numpy as np
from scipy.interpolate import PiecewisePolynomial
xi = np.array([1,10])
yi = np.array([10,1])
p = PiecewisePolynomial(xi,yi)
This does not yield a linear interpolation of the two points, but instead raises:
ZeroDivisionError: integer division or modulo by zero
What's wrong there?

Replace your yi with
yi = np.array([[10], [1]])
PiecewisePolynomial expects yi to be a list of array-likes: each element gives the function value at the corresponding xi, optionally followed by the values of its derivatives there. With this change the constructor produces the correct linear interpolation:
p = PiecewisePolynomial(xi,yi)
p([5.])
>> array([6.])
p([2.])
>> array([9.])
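As an aside, PiecewisePolynomial was removed from newer SciPy releases; a minimal sketch of the same construction with scipy.interpolate.BPoly.from_derivatives, which accepts the same xi and list-of-derivative-lists yi layout:
import numpy as np
from scipy.interpolate import BPoly
# each inner list holds the value (and optionally derivatives) at that xi
p = BPoly.from_derivatives(np.array([1, 10]), [[10], [1]])
print(p([5.]))  # array([6.]), the same linear interpolant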


Interpolate between linear and nonlinear values

I have been able to interpolate values successfully from linear values of x to sine-like values of y.
However - I am struggling to interpolate the other way - from nonlinear values of y to linear values of x.
Below is a toy example:
import numpy as np
import matplotlib.pylab as plt
from scipy import interpolate
#create 100 x values
x = np.linspace(-np.pi, np.pi, 100)
#create 100 values of y where y= sin(x)
y=np.sin(x)
#learn function to map y from x
f = interpolate.interp1d(x, y)
With new values of linear x
xnew = np.array([-1,1])
I get correctly interpolated values of nonlinear y
ynew = f(xnew)
print(ynew)
array([-0.84114583, 0.84114583])
The problem comes when I try and interpolate values of x from y.
I create a new function, the reverse of f:
f2 = interpolate.interp1d(y,x,kind='cubic')
I put in values of y that I successfully interpolated before
ynew=np.array([-0.84114583, 0.84114583])
I am expecting to get the original values of x [-1, 1]
But I get:
array([-1.57328791, 1.57328791])
I have tried other values for the 'kind' parameter with no luck, and I am not sure if I have the wrong approach here. Thanks for your help.
I guess the problem arises from the fact that x is not a function of y, since for an arbitrary y value there may be more than one x value.
Take a look at a truncated range of data.
When x ranges from 0 to np.pi/2, then for every y value there is a unique x value.
In this case the snippet below works as expected.
>>> import numpy as np
>>> from scipy import interpolate
>>> x = np.linspace(0, np.pi / 2, 100)
>>> y = np.sin(x)
>>> f = interpolate.interp1d(x, y)
>>> f([0, 0.1, 0.3, 0.5])
array([0. , 0.09983071, 0.29551713, 0.47941047])
>>> f2 = interpolate.interp1d(y, x)
>>> f2([0, 0.09983071, 0.29551713, 0.47941047])
array([0. , 0.1 , 0.3 , 0.50000001])
Maxim provided the reason for this behavior: this interpolation class is designed to work for functions, and in your case x = arcsin(y) is a function only on a limited interval. Outside such an interval the routine interpolates along the sorted y-values, so the nearest y is not necessarily the neighboring point on the x-y curve but may lie several periods away. An illustration:
import numpy as np
import matplotlib.pylab as plt
from scipy import interpolate

xmin = -np.pi
xmax = np.pi
fig, axes = plt.subplots(3, 3, figsize=(15, 10))
for i, fac in enumerate([2, 1, 0.5]):
    x = np.linspace(xmin * fac, xmax * fac, 100)
    y = np.sin(x)
    # x -> y
    f = interpolate.interp1d(x, y)
    x_fit = np.linspace(xmin * fac, xmax * fac, 1000)
    y_fit = f(x_fit)
    axes[i][0].plot(x_fit, y_fit)
    axes[i][0].set_ylabel(f"sin period {fac}")
    if not i:
        axes[i][0].set_title(label="interpolation x->y")
    # y -> x
    f2 = interpolate.interp1d(y, x)
    y2_fit = np.linspace(.99 * min(y), .99 * max(y), 1000)
    x2_fit = f2(y2_fit)
    axes[i][1].plot(x2_fit, y2_fit)
    if not i:
        axes[i][1].set_title(label="interpolation y->x")
    # y -> x with cubic interpolation
    f3 = interpolate.interp1d(y, x, kind="cubic")
    y3_fit = np.linspace(.99 * min(y), .99 * max(y), 1000)
    x3_fit = f3(y3_fit)
    axes[i][2].plot(x3_fit, y3_fit)
    if not i:
        axes[i][2].set_title(label="cubic interpolation y->x")
plt.show()
As you can see, the interpolation works along the ordered list of y-values (as you instructed it to), and this works particularly badly with cubic interpolation.
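A practical workaround is to restrict the data to a single monotonic branch of sin before building the inverse interpolant. A minimal sketch, using the branch x in [-pi/2, pi/2] where sin is one-to-one:
import numpy as np
from scipy import interpolate
x = np.linspace(-np.pi, np.pi, 100)
y = np.sin(x)
mask = (x >= -np.pi/2) & (x <= np.pi/2)  # sin is strictly increasing here
f2 = interpolate.interp1d(y[mask], x[mask], kind='cubic')
print(f2([-0.84114583, 0.84114583]))  # close to [-1, 1]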

1D Wasserstein distance in Python

The formula below is a special case of the Wasserstein distance/optimal transport when the source and target distributions, x and y (also called the marginal distributions), are 1D, that is, are vectors:

W_p(u, v) = \left( \int_0^1 \left| F_u^{-1}(t) - F_v^{-1}(t) \right|^p \, dt \right)^{1/p}

where the F^{-1} are the inverse cumulative distribution functions (quantile functions) of the marginals u and v, derived from real data called x and y, both generated from the normal distribution:
import numpy as np
from numpy.random import randn
import scipy.stats as ss
n = 100
x = randn(n)
y = randn(n)
How can the integral in the formula be coded in python and scipy? I'm guessing the x and y have to be converted to ranked marginals, which are non-negative and sum to 1, while Scipy's ppf could be used to calculate the inverse F^{-1}'s?
Note that as n gets large, a sorted set of n samples approaches the inverse CDF sampled at 1/n, 2/n, ..., n/n. E.g.:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
plt.plot(norm.ppf((np.arange(1000) + 0.5) / 1000), label="invcdf")  # midpoints avoid ppf(0) = -inf and ppf(1) = +inf
plt.plot(np.sort(np.random.normal(size=1000)), label="sortsample")
plt.legend()
plt.show()
Also note that your integral from 0 to 1 can be approximated as a sum over 1/n, 2/n, ..., n/n.
Thus we can simply answer your question:
def W(p, u, v):
    assert len(u) == len(v)
    return np.mean(np.abs(np.sort(u) - np.sort(v))**p)**(1/p)
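For p = 1 and equal-sized samples this is just the mean absolute difference of the order statistics, so (assuming SciPy >= 1.0) you can cross-check it against scipy.stats.wasserstein_distance:
import numpy as np
from scipy.stats import wasserstein_distance
u, v = np.random.randn(100), np.random.randn(100)
print(W(1, u, v), wasserstein_distance(u, v))  # the two values should agree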
Note that if len(u) != len(v) you can still apply the method with linear interpolation:
def W(p, u, v):
    u = np.sort(u)
    v = np.sort(v)
    if len(u) != len(v):
        if len(u) > len(v):
            u, v = v, u
        us = np.linspace(0, 1, len(u))
        vs = np.linspace(0, 1, len(v))
        # np.interp (not np.linalg.interp) resamples the shorter quantile function onto the finer grid
        u = np.interp(vs, us, u)
    return np.mean(np.abs(u - v)**p)**(1/p)
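A quick usage sketch with unequal sample sizes (the shorter sample's quantile function gets resampled onto the finer grid):
u, v = np.random.randn(100), np.random.randn(80)
print(W(2, u, v))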
An alternative method, if you have prior information about the type of distribution of your data but not its parameters, is to fit the best matching distribution to both u and v (e.g. with scipy.stats.norm.fit) and then compute the integral with the desired precision. E.g.:
from scipy.stats import norm as gauss

def W_gauss(p, u, v, num_steps):
    ud = gauss(*gauss.fit(u))
    vd = gauss(*gauss.fit(v))
    # midpoints of a uniform grid on (0, 1) keep the ppf values finite
    z = np.linspace(0, 1, num_steps, endpoint=False) + 1/(2*num_steps)
    return np.mean(np.abs(ud.ppf(z) - vd.ppf(z))**p)**(1/p)
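Usage sketch, assuming the data really is approximately normal:
u, v = np.random.randn(200), np.random.randn(150) + 0.5
print(W_gauss(2, u, v, 1000))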
I guess I am a bit late, but this is what I would do for an exact solution (using only numpy):
import numpy as np
from numpy.random import randn

n = 100
m = 80
p = 2
x = np.sort(randn(n))
y = np.sort(randn(m))
a = np.ones(n)/n
b = np.ones(m)/m

# cdfs
ca = np.cumsum(a)
cb = np.cumsum(b)

# points on which we need to evaluate the quantile functions
cba = np.sort(np.hstack([ca, cb]))
# weights for the integral
h = np.diff(np.hstack([0, cba]))

# construction of the first quantile function
bins = ca + 1e-10  # small tolerance to avoid rounding errors and enforce right continuity
index_qx = np.digitize(cba, bins, right=True)  # right=True because the quantile function is right continuous
qx = x[index_qx]  # quantile function F^{-1}

# construction of the second quantile function
bins = cb + 1e-10
index_qy = np.digitize(cba, bins, right=True)
qy = y[index_qy]  # quantile function G^{-1}

ot_cost = np.sum((qx - qy)**p * h)
print(ot_cost)
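Note that ot_cost is the p-th power of the distance, so ot_cost**(1/p) gives W_p itself. For p = 1 you can cross-check against SciPy (assuming SciPy >= 1.0; it also handles n != m):
from scipy.stats import wasserstein_distance
print(wasserstein_distance(x, y))  # should equal ot_cost when p = 1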
In case you are interested, here you can find a more detailed numpy based implementation of the ot problem on the real line with dual and primal solutions as well: https://github.com/gnies/1d-optimal-transport. (I am still working on it though).

Evaluate numerically an equation with sympy

I want to ask something that probably is extremely easy, but I didn't find how to do it... The point is that I want to define some function in Python in a symbolic way using sympy, take its derivative, and then use this expression numerically.
Here an example is showed:
import numpy as np
from sympy import *
z = Symbol('z')
function = z*exp(z**2)
deriv = diff(function, z)
x = np.arange(1, 3, 0.1) #interval of points
#How can I evaluate numerically this array "x" with the function deriv???
Do you know how to do it? Thanks!
You can use lambdify with the numpy backend:
import numpy as np
from sympy import *
z = Symbol('z')
function = z*exp(z**2)
deriv = diff(function, z)
x = np.arange(1, 3, 0.1) #interval of points
d = lambdify(z, deriv, "numpy")
d(x)
# array([ 8.15484549e+00, 1.14689175e+01, 1.63762998e+01,
# 2.37373255e+01, 3.49286892e+01, 5.21825471e+01,
# 7.91672020e+01, 1.21994639e+02, 1.90992239e+02,
# 3.03860954e+02, 4.91383350e+02, 8.07886132e+02,
# 1.35069268e+03, 2.29681687e+03, 3.97320108e+03,
# 6.99317313e+03, 1.25255647e+04, 2.28335915e+04,
# 4.23706166e+04, 8.00431723e+04])
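For a single point you could instead substitute into the expression with sympy's subs/evalf; this is much slower than lambdify on an array but handy as a cross-check:
deriv.subs(z, 1.0).evalf()  # 8.15484548..., matching d(x)[0] above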

if y>0.0 and x -y>=-Q1: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I have been trying to get this to work for a while now without success. I am trying to compute the look-ahead estimate density of a piecewise Gaussian function, i.e. to estimate the stationary distribution of a piecewise normally distributed process. Is there a way to avoid this type of error:
Error-type: the truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
for instance with y = np.linspace(-200.0, 200.0, 100) and x = np.linspace(-200.0, 200.0, 100), and then verify the condition as stated in the code below?
import numpy as np
import sympy as sp
from numpy import exp,sqrt,pi
from sympy import Integral, log, exp, sqrt, pi
import math
import matplotlib.pyplot as plt
import scipy.integrate
from scipy.special import erf
from scipy.stats import norm, gaussian_kde
from quantecon import LAE
from sympy.abc import q
#from sympy import symbols
#var('q')
#q= symbols('q')
## == Define parameters == #
mu=80
sigma=20
b=0.2
Q=80
Q1=Q*(1-b)
Q2=Q*(1+b)
d = (sigma*np.sqrt(2*np.pi))
phi = norm()
n = 500
#Phi(z) = 1/2[1 + erf(z/sqrt(2))].
def p(x, y):
    # x, y = np.array(x, dtype=float), np.array(y, dtype=float)
    Positive_RG = norm.pdf(x - y + Q1, mu, sigma)
    print('Positive_R = ', Positive_RG)
    Negative_RG = norm.pdf(x - y + Q2, mu, sigma)
    print('Negative_RG = ', Negative_RG)
    pdf_0 = (1/(2*math.sqrt(2*math.pi)))*(erf((x + Q2 - mu)/(sigma*np.sqrt(2))) - erf((x + Q1 - mu)/(sigma*np.sqrt(2))))
    Zero_RG = norm.pdf
    print('Zero_RG', Zero_RG)
    print('y', y)
    if y > 0.0 and x - y >= -Q1:
        #print('printA', Positive_RG)
        return Positive_RG
    elif y < 0.0 and x - y >= -Q2:
        #print('printC', Negative_RG)
        return Negative_RG
    elif y == 0.0 and x >= -Q1:
        #print('printB', Zero_RG)
        return Zero_RG
    return 0.0

Z = phi.rvs(n)
X = np.empty(n)
for t in range(n-1):
    X[t+1] = X[t] + Z[t]
    #X[t+1] = np.abs(X[t]) + Z[t]
psi_est = LAE(p, X)
k_est = gaussian_kde(X)
fig, ax = plt.subplots(figsize=(10,7))
ys = np.linspace(-200.0, 200.0, 200)
ax.plot(ys, psi_est(ys), 'g-', lw=2, alpha=0.6, label='look ahead estimate')
ax.plot(ys, k_est(ys), 'k-', lw=2, alpha=0.6, label='kernel based estimate')
ax.legend(loc='upper left')
plt.show()
See all those similar ValueError questions in the sidebar?
This error is produced when a boolean array is used in a scalar boolean context, such as if or or/and.
Try your y or x in this test, or an even simpler one, in an interactive shell. Each of
if y > 0.0 and x - y >= -Q1: ...
if y > 0:
(y > 0.0) and (x - y >= 10)
will produce this error with your x and y.
Notice also that I edited your question for clarity.
The error starts with quantecon.LAE(p, X), which expects a vectorized function p. Your function isn't vectorized, which is why nothing else works. You copied some vectorized code but left a lot of things as sympy-style functions, which is why the numpy folks were confused about what you wanted.
Here "vectorized" means transforming two 1D arrays of length n into a 2D n x n array. You don't want to return the scalar 0.0; you want to return a 2D ndarray that holds 0.0 at the locations out[i, j] where a boolean mask based on a function of x[i], y[j] is False.
You can do this by broadcasting:
def sum_function(x, y):
    return x[:, None] + y[None, :]  # or however you want to add them, broadcast to 2D

def myFilter(x, y):
    x, y = x.squeeze(), y.squeeze()
    out = np.zeros((x.size, y.size))
    xyDiff = x[:, None] - y[None, :]
    # note: the comparison is >= (the original => is a syntax error)
    out = np.where(np.bitwise_and(y[None, :] >= 0.0, xyDiff >= -Q1), sum_function(x, y), out)  # unless the sum functions are different
    out = np.where(np.bitwise_and(y[None, :] < 0.0, xyDiff >= -Q2), sum_function(x, y), out)
    return out
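A quick shape check (a sketch; it assumes numpy as np and the Q1, Q2 values from the question are in scope):
x = np.linspace(-200.0, 200.0, 100)
y = np.linspace(-200.0, 200.0, 100)
print(myFilter(x, y).shape)  # (100, 100): one value for every (x[i], y[j]) pair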

Finding the derivative of a plot given two axis - python

I have a list of the x-axis values and another list of the y-axis values, and currently I am finding the derivative (the gradient) like this:
from pylab import polyfit
x = [0, 2, 3, 4]
y = [23, 4, 34, 67]
m, _ = polyfit(x, y, 1)
print(m)
If I don't want to rely on the pylab/scipy polyfit, how else could I get the derivative?
matplotlib.pylab imports numpy into its namespace for you, so just use the function numpy.polyfit directly:
import numpy as np
x = [0, 2, 3, 4]
y = [23, 4, 34, 67]
m, _ = np.polyfit(x, y, 1)
print(m)
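If you want to avoid polyfit entirely, the slope of a degree-1 least-squares fit has a closed form; a minimal sketch:
import numpy as np
x = np.array([0, 2, 3, 4], dtype=float)
y = np.array([23, 4, 34, 67], dtype=float)
m = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()
print(m)  # same slope as np.polyfit(x, y, 1)[0]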
