Generating 3D Gaussian Data [duplicate]

Generating 3D Gaussian Data [duplicate] - python

This question already has answers here:
Generating 3D Gaussian distribution in Python
(2 answers)
Closed 2 years ago.
I'm trying to generate a 3D distribution, where x, y represents the surface plane, and z is the magnitude of some value, distributed over a range.
I'm looking at numpy's multivariate_normal, but it only lets me get a number of samples. I'd like the ability to specify some x, y coordinate, and get back what the z value should be; so I'd be able to query gp(x, y) and get back a z value that adheres to some mean and covariance.
Perhaps a more illustrative (toy) example: assume I have some temperature distribution that can be modeled as a gaussian process. So I might have a mean temperature of 20 at (0, 0), and some covariance [[1, 0], [0, 1]]. I'd like to be able to create a model that I can then query at different x, y locations to get the temperature at that position (so, at (5, 5) I might get back something like 7 degrees).
How to best accomplish this?

I assume that your data can be copied to a single np.array, which I will refer to as X in my code, with shape X.shape = (n,2), where n is the number of data points you have and you can have n = 1, if you wish to test a single point at a time. 2, of course, refers to the 2D space spanned by your coordinates (x and y) base. Then:
def estimate_gaussian(X):
return X.mean(axis=0), np.cov(X.T)
def mva_gaussian( X, mu, sigma2 ):
k = len(mu)
# check if sigma2 is a vector and, if yes, use as the diagonal of the covariance matrix
if sigma2.ndim == 1 :
sigma2 = np.diag(sigma2)
X = X - mu
return (2 * np.pi)**(-k/2) * np.linalg.det(sigma2)**(-0.5) * \
np.exp( -0.5 * np.sum( np.multiply( X.dot( np.linalg.inv(sigma2) ), X ), axis=1 ) ).reshape( ( X.shape[0], 1 ) )
will do what you want - that is, given data points you will get the value of the gaussian function at those points (or a single point). This is actually a generalized version of what you need, as this function can describe a multivariate gaussian. You seem to be interested in the k = 2 case and a diagonal covariance matrix sigma2.
Moreover, this is also a probability distribution - which you say you don't want. We don't have enough info to know what exactly it is you're trying to fit to (i.e. what you expect the three parameters of the gaussian function to be. Usually, people are interested in a normal distribution). Nevertheless, you can simply change the parameters in the return statement of the mva_gaussian function according to your needs and ignore the estimate gaussian function if you don't want a normalized distribution (although a normalized function would still give you what you seek - a real valued temperature - as long as you know the normalization process - which you do :-) ).

You can create a multivariate normal using scipy.stats.multivariate_normal.
>>> import scipy.stats
>>> dist = scipy.stats.multivariate_normal(mean=[2,3], cov=[[1,0],
[0,1]])
Then to find p(x,y) you can use pdf
>>> dist.pdf([2,3])
0.15915494309189535
>>> dist.pdf([1,1])
0.013064233284684921
Which represents the probability (which you called z) given any [x,y]

Related

Linear regression for calibrating on a 2D grid - numpy

I am running a laser experiment where I'm trying to measure some features, but I also have to deal with a linear background. This background is both a constant (whatever light I measure when the laser is off) as well as a multiplicative scale factor. This background cannot be determined analytically, so I need to do an mx+b fit on the data. But, I need to do this on every point in the field of view.
The way I'd do it would be to take calibration images at a range of uniform brightnesses, and then run a regression, assigning a unique m_ij and b_ij to every point. I could probably do this in a for loop, but that seems like it'd be insanely slow for an image that's on the order of 1 mpx.
I found a solution here that used np.vander. I've tried using that, but (a) don't quite understand what I'm doing with it, and (b) it doesn't work with curve_fit. I could use np.linalg.lstsq, but it doesn't allow me to assign yerr corresponding to the noise of the images.
My current non-working example:
def fit_many_with_error(x, y, order=2, xerrs=None, yerrs=None):
'''
arguments:
x: [N]
y: [N x S]
where:
N - # of measurements per pixel
S - # pixels
returns [`order` x S]
'''
def f(x, m, b):
return m * x + b
A = np.vander(x, N=order)
B = np.vander(y, N=order)
params = curve_fit(f, A, B, sigma=None)
return params
params = fit_many_with_error(xvals, yvals)
Which gives me ValueError: operands could not be broadcast together with shapes (20,) (20,200,100)

How to implement the following formula for derivatives in python?

I'm trying to implement the following formula in python for X and Y points
I have tried following approach
def f(c):
"""This function computes the curvature of the leaf."""
tt = c
n = (tt[0]*tt[3] - tt[1]*tt[2])
d = (tt[0]**2 + tt[1]**2)
k = n/d
R = 1/k # Radius of Curvature
return R
There is something incorrect as it is not giving me correct result. I think I'm making some mistake while computing derivatives in first two lines. How can I fix that?
Here are some of the points which are in a data frame:
pts = pd.DataFrame({'x': x, 'y': y})
x y
0.089631 97.710199
0.089831 97.904541
0.090030 98.099313
0.090229 98.294513
0.090428 98.490142
0.090627 98.686200
0.090827 98.882687
0.091026 99.079602
0.091225 99.276947
0.091424 99.474720
0.091623 99.672922
0.091822 99.871553
0.092022 100.070613
0.092221 100.270102
0.092420 100.470020
0.092619 100.670366
0.092818 100.871142
0.093017 101.072346
0.093217 101.273979
0.093416 101.476041
0.093615 101.678532
0.093814 101.881451
0.094013 102.084800
0.094213 102.288577
pts_x = np.gradient(x_c, t) # first derivatives
pts_y = np.gradient(y_c, t)
pts_xx = np.gradient(pts_x, t) # second derivatives
pts_yy = np.gradient(pts_y, t)
After getting the derivatives I am putting the derivatives x_prim, x_prim_prim, y_prim, y_prim_prim in another dataframe using the following code:
d = pd.DataFrame({'x_prim': pts_x, 'y_prim': pts_y, 'x_prim_prim': pts_xx, 'y_prim_prim':pts_yy})
after having everything in the data frame I am calling function for each row of the data frame to get curvature at that point using following code:
# Getting the curvature at each point
for i in range(len(d)):
temp = d.iloc[i]
c_temp = f(temp)
curv.append(c_temp)

You do not specify exactly what the structure of the parameter pts is. But it seems that it is a two-dimensional array where each row has two values x and y and the rows are the points in your curve. That itself is problematic, since the documentation is not quite clear on what exactly is returned in such a case.
But you clearly are not getting the derivatives of x or y. If you supply only one array to np.gradient then numpy assumes that the points are evenly spaced with a distance of one. But that is probably not the case. The meaning of x' in your formula is the derivative of x with respect to t, the parameter variable for the curve (which is separate from the parameters to the computer functions). But you never supply the values of t to numpy. The values of t must be the second parameter passed to the gradient function.
So to get your derivatives, split the x, y, and t values into separate one-dimensional arrays--lets call them x and y and t. Then get your first and second derivatives with
pts_x = np.gradient(x, t) # first derivatives
pts_y = np.gradient(y, t)
pts_xx = np.gradient(pts_x, t) # second derivatives
pts_yy = np.gradient(pts_y, t)
Then continue from there. You no longer need the t values to calculate the curvatures, which is the point of the formula you are using. Note that gradient is not really designed to calculate the second derivatives, and it absolutely should not be used to calculate third or higher-order derivatives. More complex formulas are needed for those. Numpy's gradient uses "second order accurate central differences" which are pretty good for the first derivative, poor for the second derivative, and worthless for higher-order derivatives.

I think your problem is that x and y are arrays of double values.
The array x is the independent variable; I'd expect it to be sorted into ascending order. If I evaluate y[i], I expect to get the value of the curve at x[i].
When you call that numpy function you get an array of derivative values that are the same shape as the (x, y) arrays. If there are n pairs from (x, y), then
y'[i] gives the value of the first derivative of y w.r.t. x at x[i];
y''[i] gives the value of the second derivative of y w.r.t. x at x[i].
The curvature k will also be an array with n points:
k[i] = abs(x'[i]*y''[i] -y'[i]*x''[i])/(x'[i]**2 + y'[i]**2)**1.5
Think of x and y as both being functions of a parameter t. x' = dx/dt, etc. This means curvature k is also a function of that parameter t.
I like to have a well understood closed form solution available when I program a solution.
y(x) = sin(x) for 0 <= x <= pi
y'(x) = cos(x)
y''(x) = -sin(x)
k = sin(x)/(1+(cos(x))**2)**1.5
Now you have a nice formula for curvature as a function of x.
If you want to parameterize it, use
x(t) = pi*t for 0 <= t <= 1
x'(t) = pi
x''(t) = 0
See if you can plot those and make your Python solution match it.

Python: how to randomly sample from nonstandard Cauchy distribution, hence with different parameters?

I was looking here: numpy
And I can see you can use the command np.random.standard_cauchy() specifying an array, to sample from a standard Cauchy.
I need to sample from a Cauchy which might have x_0 != 0 and gamma != 1, i.e. might not be located at the origin, nor have scale equal to 1.
How can I do this?

If you have scipy, you can use scipy.stats.cauchy, which takes a location (x0) and a scale (gamma) parameter. It exposes the rvs method to draw random samples:
x = stats.cauchy.rvs(loc=100, scale=2.5, size=1000) # draw 1000 samples

You may avoid the dependency on SciPy, since the Cauchy distribution is part of the location-scale family. That means, if you draw a sample x from Cauchy(0, 1), just shift it by x_0 and multiply with gamma and x' = x_0 + gamma * x will be distributed according to Cauchy(x_0, gamma).

Calculate a plane from point cloud in Python without Numpy

I've seen several posts on this subject, but I need a pure Python (no Numpy or any other imports) solution that accepts a list of points (x,y,z coordinates) and calculates a normal for the closest plane that to those points.
I'm following one of the working Numpy examples from here: Fit points to a plane algorithms, how to iterpret results?
def fitPLaneLTSQ(XYZ):
# Fits a plane to a point cloud,
# Where Z = aX + bY + c ----Eqn #1
# Rearanging Eqn1: aX + bY -Z +c =0
# Gives normal (a,b,-1)
# Normal = (a,b,-1)
[rows,cols] = XYZ.shape
G = np.ones((rows,3))
G[:,0] = XYZ[:,0] #X
G[:,1] = XYZ[:,1] #Y
Z = XYZ[:,2]
(a,b,c),resid,rank,s = np.linalg.lstsq(G,Z)
normal = (a,b,-1)
nn = np.linalg.norm(normal)
normal = normal / nn
return normal
XYZ = np.array([
[0,0,1],
[0,1,2],
[0,2,3],
[1,0,1],
[1,1,2],
[1,2,3],
[2,0,1],
[2,1,2],
[2,2,3]
])
print fitPLaneLTSQ(XYZ)
[ -8.10792259e-17 7.07106781e-01 -7.07106781e-01]
I'm trying to adapt this code: Basic ordinary least squares calculation to replace np.linalg.lstsq
Here is what I have so far without using Numpy using the same coords as above:
xvals = [0,0,0,1,1,1,2,2,2]
yvals = [0,1,2,0,1,2,0,1,2]
zvals = [1,2,3,1,2,3,1,2,3]
""" Basic ordinary least squares calculation. """
sumx, sumy = map(sum, [xvals, yvals])
sumxy = sum(map(lambda x, y: x*y, xvals, yvals))
sumxsq = sum(map(lambda x: x**2, xvals))
Nsamp = len(xvals)
# y = a*x + b
# a (slope)
slope = (Nsamp*sumxy - sumx*sumy) / ((Nsamp*sumxsq - sumx**2))
# b (intercept)
intercept = (sumy - slope*sumx) / (Nsamp)
a = slope
b = intercept
normal = (a,b,-1)
mag = lambda x : math.sqrt(sum(i**2 for i in x))
nn = mag(normal)
normal = [i/nn for i in normal]
print normal
[0.0, 0.7071067811865475, -0.7071067811865475]
As you can see, the answers come out the same, but that is only because of this particular example. In other examples, they don't match. If you look closely you'll see that in the Numpy example the 'z' values are fed into 'np.linalg.lstsq', but in the non-Numpy version the 'z' values are ignored. How do I work in the 'z' values to the least-squares code?
Thanks

I do not think you can get away without implementing some basic matrix operations. As this is a multivariate linear regression problem, you will definitely need dot product, transpose and norm. These are easy. The difficult part is that you also need matrix inverse or QR decomposition or something similar. People usually use BLAS for these for good reasons, implementing them is not easy - but not impossible either.
With QR decomposition
I would start by creating a Matrix class that has the following methods
dot(m1, m2) (or __matmul__(m1, m2) if you have python 3.5): it is just the sum of products, should be straightforward
transpose(self): swapping matrix elements, should be easy
norm(self): square root of sum of squares (should be only used on vectors)
qr_decomp(self): this one is tricky. For an almost pure python implementation see this rosetta code solution (disclaimer: I have not thoroughly checked this code). It uses some numpy functions, but these are basic functions you can implement for your matrix class (shape, eye, dot, copysign, norm).
leastsqr_ut(R, A): solve the equation Rx = A if R is an upper triangular matrix. Not trivial, but should be easy enough as you can solve it equation by equation from the bottom.
With these, the solution is easy:
Generate the matrix G as detailed in your numpy example
Find the QR decomposition of G
Solve Rb = Q'z for b using that R is an upper triangular matrix
Then the normal vector you are looking for is (b[0], b[1], -1) (or the norm of it if you want a unit length normal vector).
With matrix inverse
The inverse of a 3x3 matrix is relatively easy to calculate, but this method is much less numerically stable than doing QR decomposition. If it is not an important concern, then you can do the following: implement
dot(m1, m2) (or __matmul__(m1, m2) if you have python 3.5): it is just the sum of products, should be straightforward
transpose(self): swapping matrix elements, should be easy
norm(self): square root of sum of squares (should be only used on vectors)
det(self): determinant, but it is enough if it works on 2x2 and 3x3 matrices, and for those simple formulas are available
inv(self): matrix inverse. It is enough if it works on 3x3 matrices, there is a simple formula for example here
Then the formula for b is b = inv(G'G) * (G'z) and your normal vector is again (b[0], b[1], -1).
As you can see, none of these are simple, and most of it is replicating some numpy functionality while making it a lot slower lot slower. So make sure you have absolutely no other choice.

I generated a code with a similar purpose (see "tangentplane_3D" function in the linked code).
In my case I had a scatter cloud of points that define a 3D ellipsoid. For each point I wanted to determine the tangent plane to the ellipsoid containing such point --> Goal: Determination of a 3D plane.
The problem can be seen in the following way: A plane is defined by its normal and the normal can be seen as the eigenvector associated to the minimum of the eigenvalues of a n set of points.
What I did, and you can check it on the code I posted, is to select k points close to the point of interest at which I wanted to calculate the tangent plane. Then, I performed a 3D Single Value Decomposition to these k points. Finally, from these SVD I selected the minimum eigenvalue and its associated eigenvector which is, in fact, the normal of the plane best fitting my set of points, and thus in my case, tangent to the ellipsoid plane. With the normal vector and the point you can subsequently calculate the complete plane equation.
I hope it helps!!
Best wishes.

Fast 3D interpolation of atmospheric data in Numpy/Scipy

I am trying to interpolate 3D atmospheric data from one vertical coordinate to another using Numpy/Scipy. For example, I have cubes of temperature and relative humidity, both of which are on constant, regular pressure surfaces. I want to interpolate the relative humidity to constant temperature surface(s).
The exact problem I am trying to solve has been asked previously here, however, the solution there is very slow. In my case, I have approximately 3M points in my cube (30x321x321), and that method takes around 4 minutes to operate on one set of data.
That post is nearly 5 years old. Do newer versions of Numpy/Scipy perhaps have methods that handle this faster? Maybe new sets of eyes looking at the problem have a better approach? I'm open to suggestions.
EDIT:
Slow = 4 minutes for one set of data cubes. I'm not sure how else I can quantify it.
The code being used...
def interpLevel(grid,value,data,interp='linear'):
"""
Interpolate 3d data to a common z coordinate.
Can be used to calculate the wind/pv/whatsoever values for a common
potential temperature / pressure level.
grid : numpy.ndarray
The grid. For example the potential temperature values for the whole 3d
grid.
value : float
The common value in the grid, to which the data shall be interpolated.
For example, 350.0
data : numpy.ndarray
The data which shall be interpolated. For example, the PV values for
the whole 3d grid.
kind : str
This indicates which kind of interpolation will be done. It is directly
passed on to scipy.interpolate.interp1d().
returns : numpy.ndarray
A 2d array containing the *data* values at *value*.
"""
ret = np.zeros_like(data[0,:,:])
for yIdx in xrange(grid.shape[1]):
for xIdx in xrange(grid.shape[2]):
# check if we need to flip the column
if grid[0,yIdx,xIdx] > grid[-1,yIdx,xIdx]:
ind = -1
else:
ind = 1
f = interpolate.interp1d(grid[::ind,yIdx,xIdx], \
data[::ind,yIdx,xIdx], \
kind=interp)
ret[yIdx,xIdx] = f(value)
return ret
EDIT 2:
I could share npy dumps of sample data, if anyone was interested enough to see what I am working with.

Since this is atmospheric data, I imagine that your grid does not have uniform spacing; however if your grid is rectilinear (such that each vertical column has the same set of z-coordinates) then you have some options.
For instance, if you only need linear interpolation (say for a simple visualization), you can just do something like:
# Find nearest grid point
idx = grid[:,0,0].searchsorted(value)
upper = grid[idx,0,0]
lower = grid[idx - 1, 0, 0]
s = (value - lower) / (upper - lower)
result = (1-s) * data[idx - 1, :, :] + s * data[idx, :, :]
(You'll need to add checks for value being out of range, of course).For a grid your size, this will be extremely fast (as in tiny fractions of a second)
You can pretty easily modify the above to perform cubic interpolation if need be; the challenge is in picking the correct weights for non-uniform vertical spacing.
The problem with using scipy.ndimage.map_coordinates is that, although it provides higher order interpolation and can handle arbitrary sample points, it does assume that the input data be uniformly spaced. It will still produce smooth results, but it won't be a reliable approximation.
If your coordinate grid is not rectilinear, so that the z-value for a given index changes for different x and y indices, then the approach you are using now is probably the best you can get without a fair bit of analysis of your particular problem.
UPDATE:
One neat trick (again, assuming that each column has the same, not necessarily regular, coordinates) is to use interp1d to extract the weights doing something like follows:
NZ = grid.shape[0]
zs = grid[:,0,0]
ident = np.identity(NZ)
weight_func = interp1d(zs, ident, 'cubic')
You only need to do the above once per grid; you can even reuse weight_func as long as the vertical coordinates don't change.
When it comes time to interpolate then, weight_func(value) will give you the weights, which you can use to compute a single interpolated value at (x_idx, y_idx) with:
weights = weight_func(value)
interp_val = np.dot(data[:, x_idx, y_idx), weights)
If you want to compute a whole plane of interpolated values, you can use np.inner, although since your z-coordinate comes first, you'll need to do:
result = np.inner(data.T, weights).T
Again, the computation should be practically immediate.

This is quite an old question but the best way to do this nowadays is to use MetPy's interpolate_1d funtion:
https://unidata.github.io/MetPy/latest/api/generated/metpy.interpolate.interpolate_1d.html

There is a new implementation of Numba accelerated interpolation on regular grids in 1, 2, and 3 dimensions:
https://github.com/dbstein/fast_interp
Usage is as follows:
from fast_interp import interp2d
import numpy as np
nx = 50
ny = 37
xv, xh = np.linspace(0, 1, nx, endpoint=True, retstep=True)
yv, yh = np.linspace(0, 2*np.pi, ny, endpoint=False, retstep=True)
x, y = np.meshgrid(xv, yv, indexing='ij')
test_function = lambda x, y: np.exp(x)*np.exp(np.sin(y))
f = test_function(x, y)
test_x = -xh/2.0
test_y = 271.43
fa = test_function(test_x, test_y)
interpolater = interp2d([0,0], [1,2*np.pi], [xh,yh], f, k=5, p=[False,True], e=[1,0])
fe = interpolater(test_x, test_y)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.