Joint CDF in numpy

Computing a CDF (cumulative distribution function) in numpy is fairly straightforward in one dimension, but now I want to move to multiple dimensions, using three-dimensional data, and then easily compute the corresponding X, Y, Z for, say, the Nth percentile.
I'm finding that the documentation out there is not the easiest to navigate, so pointers would be useful. I'm trying to use what already exists rather than reinvent the wheel.
Here is how I do it in 1D:
h, x = np.histogram(np.array(full_data), bins=10, density=True)
dx = x[1] - x[0]
f1 = np.cumsum(h)*dx
Then plot:
plt.plot(x[1:], f1)
In 3D it will look like:
full_data = [[1,2,4], [2,3,4], ...]
Any suggestions for something more Pythonic and elegant before I kludge something together?
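One possible approach, sketched here under assumptions rather than taken from the question: bin the data with np.histogramdd, normalize the counts, and take a cumulative sum along each axis in turn. The helper name joint_cdf and the random test data are made up for illustration. Note that in more than one dimension a given joint probability is reached by many (X, Y, Z) combinations, so the example reads a percentile off a marginal CDF instead.

```python
import numpy as np

def joint_cdf(data, bins=10):
    """Empirical joint CDF on a histogram grid, plus the bin edges per axis."""
    data = np.asarray(data, dtype=float)
    hist, edges = np.histogramdd(data, bins=bins)
    cdf = hist / hist.sum()              # normalize counts to probabilities
    for axis in range(cdf.ndim):         # cumulative sum along every axis
        cdf = np.cumsum(cdf, axis=axis)
    return cdf, edges

# Hypothetical stand-in for full_data: 1000 points in 3 dimensions
rng = np.random.default_rng(0)
full_data = rng.normal(size=(1000, 3))
cdf, edges = joint_cdf(full_data, bins=10)

# cdf[i, j, k] estimates P(X <= edges[0][i+1], Y <= edges[1][j+1], Z <= edges[2][k+1])
# Marginal CDF of the first coordinate: last index on the other two axes
cdf_x = cdf[:, -1, -1]
# Upper bin edge at which the marginal CDF first reaches the 90th percentile
x90 = edges[0][1:][np.searchsorted(cdf_x, 0.90)]
```

The resolution of any percentile read this way is limited by the bin width, so more bins (or np.percentile on a single column) may be preferable when only marginals are needed.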

Related

Scipy interp2d function produces z = f(x,y), I would like to solve for x

I am using the 2D interpolation function in scipy to smooth a 2D image. As I understand it, the interpolant returns z = f(x,y). What I want to do is find x for known values of y and z. I tried something like this:
f = interp2d(x,y,z)
index = (np.abs(f(:,y) - z)).argmin()
However the interp2d object does not work that way. Any ideas on how to do this?

I was able to figure this out. yvalue, zvalue, xmin, and xmax are known values. By creating a linspace out of the possible values x can take on, a list can be created with all of the corresponding function values. Then using argmin() we can find the closest value in the list to the known z value.
f = interp2d(x,y,z)
xnew = numpy.linspace(xmin, xmax)
fnew = f(xnew, yvalue)
xindex = (numpy.abs(fnew - zvalue)).argmin()
xvalue = xnew[xindex]
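The grid search above is only as precise as the spacing of the linspace (50 points by default). A possible refinement, offered as a sketch and not as the answerer's method, is to hand the one-dimensional slice to a bracketing root finder. This version swaps in RectBivariateSpline for interp2d and uses made-up data z = x**2 + y; it assumes the slice changes sign exactly once between the two bracket ends.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline
from scipy.optimize import brentq

# Hypothetical gridded data: z = x**2 + y, with z.shape == (len(x), len(y))
x = np.linspace(0, 4, 21)
y = np.linspace(0, 2, 11)
z = x[:, None] ** 2 + y[None, :]
spline = RectBivariateSpline(x, y, z)

yvalue, zvalue = 1.0, 5.0   # known y and z; we want the matching x

def residual(xv):
    # Difference between the interpolated surface at (xv, yvalue) and the target z
    return float(spline.ev(xv, yvalue)) - zvalue

# brentq needs residual() to have opposite signs at the two ends of the bracket
xvalue = brentq(residual, x[0], x[-1])
```

If the slice can cross the target more than once, each crossing needs its own bracket.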

Invert interpolation to give the variable associated with a desired interpolation function value

I am trying to invert an interpolated function using scipy's interpolate function. Let's say I create an interpolated function,
import scipy.interpolate as interpolate
interpolatedfunction = interpolate.interp1d(xvariable,data,kind='cubic')
Is there some function that can find x when I specify a:
interpolatedfunction(x) == a
In other words, "I want my interpolated function to equal a; what is the value of xvariable such that my function is equal to a?"
I appreciate I can do this with some numerical scheme, but is there a more straightforward method? What if the interpolated function is multivalued in xvariable?

There are dedicated methods for finding roots of cubic splines. The simplest to use is the .roots() method of InterpolatedUnivariateSpline object:
spl = InterpolatedUnivariateSpline(x, y)
roots = spl.roots()
This finds all of the roots instead of just one, as generic solvers (fsolve, brentq, newton, bisect, etc) do.
x = np.arange(20)
y = np.cos(np.arange(20))
spl = InterpolatedUnivariateSpline(x, y)
print(spl.roots())
outputs array([ 1.56669456, 4.71145244, 7.85321627, 10.99554642, 14.13792756, 17.28271674])
However, you want to equate the spline to some arbitrary number a, rather than 0. One option is to rebuild the spline (you can't just subtract a from it):
solutions = InterpolatedUnivariateSpline(x, y - a).roots()
Note that none of this will work with the function returned by interp1d; it does not have a roots method. For that function, using generic methods like fsolve is an option, but you will only get one root at a time from it. In any case, why use interp1d for cubic splines when there are more powerful ways to do the same kind of interpolation?
Non-object-oriented way
Instead of rebuilding the spline after subtracting a from data, one can directly subtract a from spline coefficients. This requires us to drop down to non-object-oriented interpolation methods. Specifically, sproot takes in a tck tuple prepared by splrep, as follows:
tck = splrep(x, y, k=3, s=0)
tck_mod = (tck[0], tck[1] - a, tck[2])
solutions = sproot(tck_mod)
I'm not sure if messing with tck is worth the gain here, as it's possible that the bulk of computation time will be in root-finding anyway. But it's good to have alternatives.
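As a quick sanity check, the two approaches can be run side by side. This is a sketch reusing the cosine data from above, with the target value a = 0.5 chosen arbitrarily for illustration.

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline, splrep, sproot

x = np.arange(20)
y = np.cos(x)
a = 0.5   # arbitrary target value for illustration

# Approach 1: rebuild the spline on shifted data and take its roots
sol_rebuild = InterpolatedUnivariateSpline(x, y - a).roots()

# Approach 2: shift the B-spline coefficients of the original fit
t, c, k = splrep(x, y, k=3, s=0)
sol_shift = sproot((t, c - a, k))

# Evaluate the unshifted spline at the solutions to confirm it equals a there
spl = InterpolatedUnivariateSpline(x, y)
values_at_solutions = spl(sol_rebuild)
```

Shifting the coefficients works because B-spline basis functions sum to one, so subtracting a constant from every coefficient subtracts that constant from the spline.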

After creating an interpolated function interp_fn, you can find the value of x where interp_fn(x) == a by the roots of the function
interp_fn2 = lambda x: interp_fn(x) - a
There are a number of options to find the roots in scipy.optimize. For instance, to use Newton's method with the initial value starting at 10:
from scipy import optimize
optimize.newton(interp_fn2, 10)
Actual example
Create an interpolated function and then find the roots where fn(x) == 5
import numpy as np
from scipy import interpolate, optimize
import matplotlib.pyplot as plt
x = np.arange(10)
y = 1 + 6*np.arange(10) - np.arange(10)**2
y2 = 5*np.ones_like(x)
plt.scatter(x,y)
plt.plot(x,y)
plt.plot(x,y2,'k-')
plt.show()
# create the interpolated function, and then the offset
# function used to find the roots
interp_fn = interpolate.interp1d(x, y, 'quadratic')
interp_fn2 = lambda x: interp_fn(x)-5
# to find the roots, we need to supply a starting value
# because there are more than 1 root in our range, we need
# to supply multiple starting values. They should be
# fairly close to the actual root
root1, root2 = optimize.newton(interp_fn2, 1), optimize.newton(interp_fn2, 5)
root1, root2
# returns:
(0.76393202250021064, 5.2360679774997898)
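If good starting values are not known in advance, one alternative (a sketch, not part of the answer above) is to sample the offset function on a fine grid, look for sign changes, and pass each bracketing interval to optimize.brentq. The grid size of 201 is an arbitrary choice; a root that falls exactly on a grid point, or two roots inside one grid cell, would be missed by this simple version.

```python
import numpy as np
from scipy import interpolate, optimize

x = np.arange(10)
y = 1 + 6 * x - x ** 2
interp_fn = interpolate.interp1d(x, y, 'quadratic')
target = 5

def shifted(v):
    # Offset function whose roots are the solutions of interp_fn(v) == target
    return float(interp_fn(v)) - target

# Sample on a fine grid and locate cells where the sign flips
grid = np.linspace(x[0], x[-1], 201)
vals = interp_fn(grid) - target
sign_change = np.nonzero(np.sign(vals[:-1]) * np.sign(vals[1:]) < 0)[0]

# Refine each bracketing interval with a bracketing solver
roots = [optimize.brentq(shifted, grid[i], grid[i + 1]) for i in sign_change]
```

For this data the list holds two roots, matching the values found with Newton's method.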

If your data are monotonic you might also try the following:
inversefunction = interpolate.interp1d(data, xvariable, kind='cubic')

Mentioning another option because I found this page in a Google search and the other option works for my simple use case. Hopefully it'll be of use to someone.
If the function you're interpolating is very simple and always has a 1:1 relationship between y and x, then you can simply take your data, swap x and y when you pass it into interp1d, and then call the interpolation function in that direction.
Adapting code from https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
x = np.arange(0, 10)
y = np.exp(-x/3.0)
f = interpolate.interp1d(x, y)
xnew = np.arange(0, 9, 0.1)
ynew = f(xnew)
plt.plot(x, y, 'o', xnew, ynew, '-')
plt.show()
Once x and y have been swapped, calling the new interpolation function f(a) returns the x value at which the original function equals a.
f = interpolate.interp1d(y, x)
xnew = np.arange(np.exp(-9/3), np.exp(0), 0.01)
ynew = f(xnew)
plt.plot(y, x, 'o', xnew, ynew, '-')
plt.title("Inverted")
plt.show()
Of course, if the function ever has multiple x values for a given y value (like sine or a parabola) then this will not work because it will no longer be a 1:1 function from x to y, and the above answers are necessary. This is just a simplification in a limited use case.

numpy polyfit yields nonsense

I am trying to fit these values:
This is my code:
for i in range(-area,area):
    stDev1= []
    for j in range(-area,area):
        stDev0 = stDev[i+i0][j+j0]
        stDev1.append(stDev0)
    slices[i] = stDev1
fitV = []
xV = []
for l in range(-area,area):
    y = np.asarray(slices[l])
    x = np.arange(0,2*area,1)
    for m in range(-area,area):
        fitV.append(slices[m][l])
        xV.append(l)
fit = np.polyfit(xV,fitV,4)
yfit = function(fit,area)
x100 = np.arange(0,100,1)
plt.plot(xV,fitV,'.')
plt.savefig("fits1.png")
def function(fit,area):
    yfit = []
    for x in range(-area,area):
        yfit.append(fit[0]+fit[1]*x+fit[2]*x**2+fit[3]*x**3+fit[4]*x**4)
    return(yfit)
i0 = 400
j0 = 400
area = 50
stDev = np.random.rand(1300, 800) # stand-in for an image of "noise"; feel free to use any image / 2D np.array you like.
This yields:
Obviously this is completely wrong.
I assume I misunderstand the concept of polyfit? From the doc, the requirement is that I feed it two arrays of the form x[i], y[i]? My values in
xV = [ x_1_-50,x_1_-49,...,x_1_49,x_2_-50,...,x_49_49]
and my ys are:
fitV = [y_1_-50,y_1_-49,...,y_1_49,...y_2_-50,...,y_2_49]

I do not completely understand your program. In the future, it would be helpful if you were to distill your issue to an MCVE. But here are some thoughts:
It seems, in your data, that for a given value of x there are multiple values of y. Given (x,y) data, polyfit returns an array of coefficients that represents a polynomial function, but no function can map a single value of x onto multiple values of y. As a first step, consider collapsing each set of y values into a single representative value using, for example, the mean, median, or mode. Or perhaps, in your domain, there's a more natural way to do this.
Second, there is an idiomatic way to use the pair of functions np.polyfit and np.polyval, and you're not using them in the standard way. Of course, numerous useful departures from this pattern exist, but first make sure you understand the basic pattern of these two functions.
a. Given your measurements y_data, taken at times or locations x_data, plot them and make a guess as to the order of the fit. That is, does it look like a line? Like a parabola? Let's assume you believe your data to be parabolic, and that you'll use a second order polynomial fit.
b. Make sure that your arrays are sorted in order of increasing x. There are many ways to do this, but np.argsort is an easy one.
c. Run polyfit: p = polyfit(x_data,y_data,2), which returns an array containing the 2nd, 1st, and 0th order coefficients in p, (c2,c1,c0).
d. In the idiomatic use of polyfit and polyval, next you would generate your fit: polyval(p,x_data). Or perhaps you want the fit to be sampled more coarsely or finely, in which case you might take a subset of x_data or interpolate more values in x_data.
A complete example is below.
import numpy as np
from matplotlib import pyplot as plt
# these are your measurements, unsorted
x_data = np.array([18, 6, 9, 12 , 3, 0, 15])
y_data = np.array([583.26347805, 63.16059915, 100.94286909, 183.72581827, 62.24497418,
134.99558191, 368.78421529])
# first, sort both vectors in increasing-x order:
sorted_indices = np.argsort(x_data)
x_data = x_data[sorted_indices]
y_data = y_data[sorted_indices]
# now, plot and observe the parabolic shape:
plt.plot(x_data,y_data,'ks')
plt.show()
# generate the 2nd order fitting polynomial:
p = np.polyfit(x_data,y_data,2)
# make a more finely sampled x_fit vector with, for example
# 1024 equally spaced points between the first and last
# values of x_data
x_fit = np.linspace(x_data[0],x_data[-1],1024)
# now, compute the fit using your polynomial:
y_fit = np.polyval(p,x_fit)
# and plot them together:
plt.plot(x_data,y_data,'ks')
plt.plot(x_fit,y_fit,'b--')
plt.show()
Hope that helps.
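The first suggestion in this answer, collapsing repeated x values into one representative y before fitting, could be sketched as follows. The data are invented for illustration and the mean is used as the representative value. Also worth noting: np.polyfit returns coefficients from the highest power down to the constant term, so evaluating with np.polyval avoids getting the order wrong by hand.

```python
import numpy as np

# Hypothetical data with two y measurements for every x
x_raw = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_raw = np.array([1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 10.0, 12.0])

# Group by unique x and average the y values in each group
x_unique, inverse = np.unique(x_raw, return_inverse=True)
y_mean = np.bincount(inverse, weights=y_raw) / np.bincount(inverse)

# Fit a 2nd order polynomial; p holds (c2, c1, c0), highest power first
p = np.polyfit(x_unique, y_mean, 2)
y_fit = np.polyval(p, x_unique)
```

Here the averaged points lie exactly on y = x**2 + 2, so p comes out close to (1, 0, 2).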

How to best perform a surface integral over 2D point data?

I have a data set of 363 x-points by 190 y-points with an associated functional value that I would like to integrate over multiple different subregions. I've tried to create a SciPy interp2d function to integrate; however, creating that function, even with linear interpolation, has taken over 2 hours (and is not yet done).
What is a better approach to perform this task?
Some snippets below...
In the convert_RT_to_XY function below, imb/jmb are the r,theta mesh boundaries that I convert to Cartesian boundaries.
Later, in my code, I convert the mesh boundaries (imb/jmb) to mesh-center values (imm, jmm), convert those to vectors (iX, iY), convert my function to a vector (iZ), and then attempt to make my interpolation function.
# Convert R, T mesh vectors to X, Y mesh arrays.
def convert_RT_to_XY(imb, jmb):
    R, T = np.meshgrid(imb,jmb)
    X = R * np.cos(np.radians(T*360))
    Y = R * np.sin(np.radians(T*360))
    return(X, Y)
...
imm = imb[:-1]+np.divide(np.diff(imb),2)
jmm = jmb[:-1]+np.divide(np.diff(jmb),2)
iX, iY = convert_RT_to_XY(imm, jmm)
iX = np.ndarray.flatten(iX)
iY = np.ndarray.flatten(iY)
iZ = np.ndarray.flatten(plot_function)
f = interpolate.interp2d(iX, iY, iZ, kind='linear')
Ultimately, I want to perform:
result = dblquad(f, 10, 30,
                 lambda x: 10,
                 lambda x: 30)

Look into SciPy's RectBivariateSpline. If you are placing your data on a Cartesian grid anyway, it performs much faster than interp2d.
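A minimal sketch of that suggestion, assuming the values already sit on a rectangular Cartesian grid (the polar mesh from the question would first need to be resampled onto one). The grid and the test function x + 2*y are made up for illustration. The spline object has an integral method for rectangular regions, so dblquad is not needed for the subregion in the question.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# Hypothetical rectangular grid; Z must have shape (len(x), len(y))
x = np.linspace(0, 40, 81)
y = np.linspace(0, 40, 81)
Z = x[:, None] + 2.0 * y[None, :]

# kx=ky=1 gives a piecewise (bi)linear interpolant
spline = RectBivariateSpline(x, y, Z, kx=1, ky=1)

# Integrate over the rectangle 10 <= x <= 30, 10 <= y <= 30
result = spline.integral(10, 30, 10, 30)
```

For this test function the exact integral is 24000, which the linear spline reproduces. Non-rectangular subregions would still need a numerical quadrature over the spline.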

3D interpolation of 2 lists in python

Hi, I have two sets of data taken from two separate import files, which are both being imported into Python and have been placed in two separate lists as follows:
list 1 is of the form:
(node, x coordinate, y coordinate, z coordinate)
example list 1: [[1,0,0,0],[2,1,0,0],[3,0,1,0],[4,1,1,0],[5,0,0,1],[6,1,0,1],[7,0,1,1],[8,1,1,1]]
list 2 is in the form:
(x coordinate, y coordinate, z coordinate, temperature)
example list 2: [[0,0,0,100],[1,0,0,90],[0,1,0,85],[1,1,0,110],[0,0,1,115],[1,0,1,118],[0,1,1,100],[1,1,1,96]]
From these two lists I need to use the coordinates to create a third list which contains each node value and its corresponding temperature. This task is a simple dictionary lookup if all the x, y and z coordinates match up; however, with the data I am working with this will not always be the case.
For example if in list 1 I add a new entry at the end of the list, node number 9;
new entry at end of list 1 [9, 0.5, 0.9, 0.25]
Now I find myself with a node number with no corresponding temperature. At this point an interpolation will need to be performed on list 2 to give me the temperature related to this node. Through basic 3D interpolation calculations I have worked out that this temperature will be 97.9, so my final output list would look like this:
Output list:
(node, temperature)
Output list: [[1,100],[2,90],[3,85],[4,110],[5,115],[6,118],[7,100],[8,96],[9,97.9]]
I am reasonably new to Python, so I am struggling to find a solution to this interpolation problem. I have been researching how to do this for a number of weeks now and have still not been able to find a solution.
Any help would be greatly greatly appreciated,
Thanks

There are quite a few interpolation routines in scipy, but above 2 dimensions most of them only offer linear and nearest-neighbour interpolation, which might not be sufficient for your use.
All of the interpolation routines are listed on the interpolation page of the scipy docs. Straight away you can ignore the univariate, 1D spline, and 2D spline sections; you want the multivariate section.
There are 9 functions here, split into unstructured data and data on a grid:
Unstructured data:
griddata(points, values, xi[, method, ...]) Interpolate unstructured D-dimensional data.
LinearNDInterpolator(points, values[, ...]) Piecewise linear interpolant in N dimensions.
NearestNDInterpolator(points, values) Nearest-neighbour interpolation in N dimensions.
CloughTocher2DInterpolator(points, values[, tol]) Piecewise cubic, C1 smooth, curvature-minimizing interpolant in 2D.
Rbf(*args) A class for radial basis function approximation/interpolation of n-dimensional scattered data.
interp2d(x, y, z[, kind, copy, ...]) Interpolate over a 2-D grid.
For data on a grid:
interpn(points, values, xi[, method, ...]) Multidimensional interpolation on regular grids.
RegularGridInterpolator(points, values[, ...]) Interpolation on a regular grid in arbitrary dimensions.
RectBivariateSpline(x, y, z[, bbox, kx, ky, s]) Bivariate spline approximation over a rectangular mesh.
plus an additional one in the see also section, though we'll ignore that.
You should read how each of them works; it might help you understand a little better.
The way these functions work, though, is that you pass them data, i.e. x, y, z coordinates and the corresponding values at those points, and they return a function which lets you get a value at any location.
I would recommend the Rbf function here, though, as from what I can see it's the only nD option which does not limit you to linear or nearest-neighbour interpolation.
For example, you have two lists:
node_locations = [(node, x_coord, y_coord, z_coord), ...]
temp_data = [(x0, y0, z0, temp0), (x1, y1, z1, temp1), ...]
xs, ys, zs, temps = zip(*temp_data) # This will unpack your data into columns, rather than rows.
from scipy.interpolate import Rbf
rbfi = Rbf(xs, ys, zs, temps)
# I don't know how you want your output data, so I'm just dumping it in a dictionary.
node_data = {}
for node, x, y, z in node_locations:
    node_data[node] = rbfi(x, y, z)
Try something like that.
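For comparison, here is a sketch using a different routine from the list above, RegularGridInterpolator, which is not what this answer recommends but fits the example because the sample temperatures sit on the corners of a unit cube. It assumes the last entry of example list 2 is located at (1, 1, 1). With its default linear method it performs trilinear interpolation and reproduces the 97.9 worked out by hand in the question.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

node_locations = [[1, 0, 0, 0], [2, 1, 0, 0], [3, 0, 1, 0], [4, 1, 1, 0],
                  [5, 0, 0, 1], [6, 1, 0, 1], [7, 0, 1, 1], [8, 1, 1, 1],
                  [9, 0.5, 0.9, 0.25]]
temp_data = [[0, 0, 0, 100], [1, 0, 0, 90], [0, 1, 0, 85], [1, 1, 0, 110],
             [0, 0, 1, 115], [1, 0, 1, 118], [0, 1, 1, 100], [1, 1, 1, 96]]

# Arrange the temperatures on a 2x2x2 grid indexed by (x, y, z)
axis = np.array([0.0, 1.0])
grid = np.empty((2, 2, 2))
for x, y, z, temp in temp_data:
    grid[x, y, z] = temp   # coordinates are 0 or 1, so they double as indices

interp = RegularGridInterpolator((axis, axis, axis), grid)

# Evaluate at every node position in one call and pair with the node ids
nodes = np.array(node_locations, dtype=float)
temps = interp(nodes[:, 1:])
node_temps = [[int(n), float(t)] for n, t in zip(nodes[:, 0], temps)]
```

This only applies when the temperature samples form a full regular grid; for scattered samples one of the unstructured routines is needed instead.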

For scientific computing, I wouldn't use lists but numpy arrays instead.
So in your case:
import numpy as np
nodes = np.array(example_list_1)
temperatures = np.array(example_list_2)
With this you can then go on to use scipy's interpolation functions, like for example:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.griddata.html#scipy.interpolate.griddata
from scipy.interpolate import griddata
interpolated = griddata(temperatures[:, :-1],
temperatures[:, -1],
nodes[:, 1:])
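A runnable version of the call above, using the example lists from the question, might look like this. It is a sketch that assumes the last entry of example list 2 sits at (1, 1, 1). With method='linear', griddata triangulates the sample points, so the value at node 9 depends on how the cube is split into tetrahedra and need not equal the trilinear 97.9 from the question.

```python
import numpy as np
from scipy.interpolate import griddata

example_list_1 = [[1, 0, 0, 0], [2, 1, 0, 0], [3, 0, 1, 0], [4, 1, 1, 0],
                  [5, 0, 0, 1], [6, 1, 0, 1], [7, 0, 1, 1], [8, 1, 1, 1],
                  [9, 0.5, 0.9, 0.25]]
example_list_2 = [[0, 0, 0, 100], [1, 0, 0, 90], [0, 1, 0, 85], [1, 1, 0, 110],
                  [0, 0, 1, 115], [1, 0, 1, 118], [0, 1, 1, 100], [1, 1, 1, 96]]

nodes = np.array(example_list_1, dtype=float)
temperatures = np.array(example_list_2, dtype=float)

# Sample coordinates, sample values, and the coordinates to evaluate at
interpolated = griddata(temperatures[:, :-1],
                        temperatures[:, -1],
                        nodes[:, 1:],
                        method='linear')

# Pair each node id with its interpolated temperature
output = [[int(n), float(t)] for n, t in zip(nodes[:, 0], interpolated)]
```

Points outside the convex hull of the samples come back as NaN with the linear method, so they would need separate handling (for example method='nearest').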
