Points that follow a 1/R density distribution in an XY grid? - python

I have an XY grid with some gridpoints having certain values assigned to them; in this case, each value means a certain mass, so basically point masses in a grid. I now want to obtain a set of points which follow a density distribution of 1/R, where R is the distance from the center, so R = sqrt(x^2 + y^2). By density distribution, I mean the number of points has to fall off as 1/R. How would I go about coding this?
My code is below:
import numpy as np
x = np.linspace(-50,50,100)
y = np.linspace(-50,50,100)
X, Y = np.meshgrid(x,y)
zeta_a = (25,25)
zeta_b = (-10,5)
M_a = 150
M_b = 150
The zeta_a and zeta_b correspond to 2 point masses having masses of 150 units. I also need to perform follow-up calculations using these points, so I'd also like to know how to use a more general format rather than using 'a', 'b', 'c' for n point masses.
Thanks for your help.

Assuming I understood your question (if not, comments are welcome):
The way to create any given distribution is by interpolating over the inverse of the distribution CDF. This is my function to do it:
import numpy as np
import matplotlib.pyplot as plt
def randdist(PDF, x, n):
    """Create a distribution following PDF(x). PDF and x
    must be of the same length. n is the number of samples."""
    fp = np.random.rand(n,)
    CDF = np.cumsum(PDF)
    return np.interp(fp, CDF, x)
Now, in your case we're going to work in polar coordinates with R distributed as 1/r and Theta uniformly distributed:
num = 1000 # The number of points
r = np.linspace(-50, 50, 100)
PDF = np.abs(1/r)
PDF = PDF/np.sum(PDF) # PDF should be normalized
R = randdist(PDF, r, num)
Theta = 2*np.pi*np.random.rand(num,)
Now let's create the x and y vectors of the points:
x = [R[k]*np.cos(Theta[k]) for k in range(num)]
y = [R[k]*np.sin(Theta[k]) for k in range(num)]
To plot:
plt.plot(x, y, '.')
Note that in my answer there is a hard cutoff at r=50. There are ways to overcome this but for now I leave it as it is.
Now you seem to also want to embed the points inside a 2D grid, much like a histogram. You can do that using
z, _, _ = np.histogram2d(x, y, [100, 100])
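Regarding the follow-up about avoiding per-mass names like 'a', 'b', 'c': a minimal sketch, assuming the positions and masses can simply live in NumPy arrays indexed by mass number:
import numpy as np
# positions as an (n, 2) array and masses as an (n,) array
zeta = np.array([[25, 25],
                 [-10, 5]])
M = np.array([150, 150])
# example follow-up calculation: distance from every grid point to mass i
x = np.linspace(-50, 50, 100)
y = np.linspace(-50, 50, 100)
X, Y = np.meshgrid(x, y)
i = 0
R_i = np.sqrt((X - zeta[i, 0])**2 + (Y - zeta[i, 1])**2)
This scales to any number of masses by appending rows to zeta and entries to M.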

Related

How to compute and plot the pdf from the empirical cdf?

I have two numpy arrays: one is an array of x values and the other an array of y values, and together they give me the empirical cdf. E.g.:
plt.plot(xvalues, yvalues)
plt.show()
I assume the data needs to be smoothed somehow in order to give a smooth pdf.
I would like to plot the pdf. How can I do that?
The raw data is at: http://dpaste.com/1HVK5DR .
There are two main problems: your data seems to be quite noisy, and it is not equally spaced: the points at the low end are sampled quite densely, while the points at the high end are sampled quite sparsely. This can cause numerical issues.
So first I suggest resampling the data using a linear interpolation to get equally spaced samples. (Note that all the snippets appended to each other form the content of one Python file.)
import matplotlib.pyplot as plt
import numpy as np
from data import xvalues, yvalues #load data from file
print("#datapoints: {}".format(len(xvalues)))
#don't use every point if your computer is not very fast
xv = np.array(xvalues)[::5]
yv = np.array(yvalues)[::5]
#interpolate to have evenly spaced data
xi = np.linspace(xv.min(), xv.max(), 400)
yi = np.interp(xi, xv, yv)
Then, to smooth the data, I suggest performing an RBF regression (i.e. using an "RBF network"). The idea is to fit a curve of the form
c(t) = sum a(i) * phi(t - x(i)) #(not part of the program)
where phi is some radial basis function. (In theory we could use any functions.) To get a very smooth result I choose a very smooth function, namely a Gaussian: phi(x) = exp(-x^2/sigma^2), where sigma is yet to be determined. (The code below actually uses the similarly smooth inverse quadratic kernel 1/(1 + (r/sigma)^2); the reasoning is the same.) The x(i) are just some nodes that we can define. If we have a smooth function, we just need a few nodes. The number of nodes also determines how much computation needs to be done. The a(i) are the coefficients we can optimize to get the best fit. In this case I just use a least squares approach.
Note that IF we can write a function in the form above, it is very easy to compute the derivative; it is just
c'(t) = sum a(i) * phi'(t - x(i)) #(not part of the program)
where phi' is the derivative of phi.
Regarding sigma: It is usually a good idea to choose it as a multiple of the step between the nodes we chose. The greater we choose sigma, the smoother the resulting function gets.
#set up rbf network
rbf_nodes = xv[::50][None, :]#use a subset of the x-values as rbf nodes
print("#rbfs: {}".format(rbf_nodes.shape[1]))
#estimate width of kernels:
sigma = 20 #greater = smoother, this is the primary parameter to play with
sigma *= np.max(np.abs(rbf_nodes[0,1:]-rbf_nodes[0,:-1]))
# kernel & derivative
rbf = lambda r:1/(1+(r/sigma)**2)
Drbf = lambda r: -2*r*sigma**2/(sigma**2 + r**2)**2
#compute coefficients of rbf network
r = np.abs(xi[:, None]-rbf_nodes)
A = rbf(r)
coeffs = np.linalg.lstsq(A, yi, rcond=None)[0]
print(coeffs)
#evaluate rbf network
N = 1000
xe = np.linspace(xi.min(), xi.max(), N)
Ae = rbf(xe[:, None] - rbf_nodes)
ye = Ae @ coeffs
#evaluate derivative
xd = np.linspace(xi.min(), xi.max(), N)
Bd = Drbf(xd[:, None] - rbf_nodes)
yd = Bd @ coeffs
fig,ax = plt.subplots()
ax2 = ax.twinx()
ax.plot(xv, yv, '-')
ax.plot(xi, yi, '-')
ax.plot(xe, ye, ':')
ax2.plot(xd, yd, '-')
fig.savefig('graph.png')
print('done')
You need the derivative to go from CDF to PDF
PDF(x) = d CDF(x)/ dx
With NumPy, you could use gradient
pdf = np.gradient(yvalues, xvalues)
plt.plot(xvalues, pdf)
plt.show()
or compute the differences manually:
pdf = np.diff(yvalues)/np.diff(xvalues)
l = np.asarray(xvalues[:-1])
r = np.asarray(xvalues[1:])
plt.plot((l+r)/2.0, pdf) # points in the middle of interval
plt.show()
Both approaches produce essentially the same PDF plot (figure omitted).

Finding the correct x and y widths of 2D array for Gaussian fit

I'd like to fit my 2D numpy array (image) data to a Gaussian. I've read a lot of examples using scipy.optimize, and I've tried but the fit has never been good -- this is probably because my background is non-zero, and sometimes I have other peaks too. I think it might be easier for me to simply generate a Gaussian that has the parameters of the correct peak. I already have the subpixel centroid coordinates x and y of the peak I want, and can easily get the amplitude of the peak with data[y][x], although I guess I would have to round the coordinates. What I'm stuck on now is the x and y widths. My Gaussian function looks like this:
import numpy as np
def gaussian_func(xy, x0, y0, width_x, width_y, amp): #x0 and y0 are the centroid coordinates
    x = xy[0]
    y = xy[1]
    offset = np.min(data) #should this be a median value of the background instead?
    a = 1/(2*width_x**2)
    c = 1/(2*width_y**2)
    exp_term = a*(x-x0)**2 + c*(y-y0)**2
    return (offset + amp * np.exp(-exp_term)).ravel()
x, y = np.arange(0, np.shape(data)[1], 1), np.arange(0, np.shape(data)[0], 1)
xx, yy = np.meshgrid(x, y)
gaussian = gaussian_func((xx, yy), x0, y0, width_x, width_y, amp)
gaussian = np.reshape(gaussian, np.shape(data))
So I'm basically just confused on what to insert for width_x and width_y. I know these terms are supposed to be interchangeable with the standard deviations in x and y, but when I tried simply using np.std(data), I got bad results. Do the widths correspond to the actual physical widths of the peak? If so, how do I find those? Thanks!
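No answer is recorded here, but a common approach (my suggestion, not from the original thread) is to estimate the widths from intensity-weighted second moments of a background-subtracted cutout around the peak; np.std(data) over the whole frame fails because the background dominates. A minimal sketch, assuming data is the 2D array and (x0, y0) the centroid; estimate_widths and the box size are hypothetical:
import numpy as np

def estimate_widths(data, x0, y0, box=10):
    # cut out a box around the peak so background and other peaks don't dominate
    yc, xc = int(round(y0)), int(round(x0))
    cut = data[max(yc-box, 0):yc+box, max(xc-box, 0):xc+box].astype(float)
    cut -= np.median(cut)   # crude background subtraction
    cut[cut < 0] = 0
    yy, xx = np.indices(cut.shape)
    total = cut.sum()
    xbar = (cut*xx).sum()/total
    ybar = (cut*yy).sum()/total
    width_x = np.sqrt((cut*(xx - xbar)**2).sum()/total)
    width_y = np.sqrt((cut*(yy - ybar)**2).sum()/total)
    return width_x, width_y
These moment-based widths are usually good enough either to plug in directly or to use as starting guesses for a scipy.optimize fit.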

Basemap streamplot blank sphere

I want to make a streamplot in Basemap module, but I get a blank sphere. Please help me resolve this problem. I use matplotlib 1.3 and ordinary streamplot is working fine.
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.basemap import Basemap
map = Basemap(projection='ortho',lat_0=45,lon_0=-100,resolution='l')
# draw lat/lon grid lines every 30 degrees.
map.drawmeridians(np.arange(0,360,30))
map.drawparallels(np.arange(-90,90,30))
# prepare grids
lons = np.linspace(0, 2*np.pi, 100)
lats = np.linspace(-np.pi/2, np.pi/2, 100)
lons, lats = np.meshgrid(lons, lats)
# parameters for vector field
beta = 0.0
alpha = 1.0
u = -np.cos(lats)*(beta - alpha*np.cos(2.0*lons))
v = alpha*(1.0 - np.cos(lats)**2)*np.sin(2.0*lons)
speed = np.sqrt(u*u + v*v)
# compute native map projection coordinates of lat/lon grid.
x, y = map(lons*180./np.pi, lats*180./np.pi)
# contour data over the map.
cs = map.streamplot(x, y, u, v, latlon = True, color = speed, cmap=plt.cm.autumn, linewidth=0.5)
plt.show()
I can't exactly tell you what's wrong, but from the matplotlib.streamplot manual:
matplotlib.pyplot.streamplot(x, y, u, v, density=1, linewidth=None,
    color=None, cmap=None, norm=None, arrowsize=1, arrowstyle=u'-|>',
    minlength=0.1, transform=None, zorder=1, hold=None)
Draws streamlines of a vector flow.
x, y : 1d arrays
an evenly spaced grid.
u, v : 2d arrays
x and y-velocities. Number of rows should match length of y, and the number of columns should match x.
Additionally from matplotlib.basemap.streamplot you can read that
If latlon keyword is set to True, x,y are interpreted as longitude and latitude in degrees.
Which corresponds to the fact that x and y should be 1D arrays (lat, lon). However in your example x and y are
>>> np.shape(x)
(100, 100)
>>> np.shape(y)
(100, 100)
Then again, you call map() "to compute native map projection coordinates of lat/lon grid", and map is coincidentally also the name of your Basemap instance. So which one do you want? Both are callable and both will return a value (or, better said, both will return an error).
Additionally, check the values you have in your u array: they are of order 1e-17, while other values are easily of order 1e+30. IIRC, the way you get streamlines is by solving a differential equation in which the values you pass in are used as parameters at the coordinates you sent. It's not hard to imagine that, while calculating something with these numbers, a floating-point round-off occurs and you suddenly start getting NaN or 0 values.
Try to scale your example better, or, if you want to pursue the solution to the end, try np.seterr to get a more detailed idea of where it fails.
Sorry I couldn't be of more help.
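For reference, a minimal sketch of the np.seterr suggestion above (the names u and speed come from the question's code; treating the division as the failing computation is my assumption):
import numpy as np
np.seterr(divide='raise', invalid='raise')  # raise instead of silently producing inf/NaN
try:
    u_norm = u / speed  # any array computation that might divide by zero
except FloatingPointError as err:
    print('floating point problem:', err)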

How to generate equispaced interpolating values

I have a list of (x,y) values that are not uniformly spaced. Here is the archive used in this question.
I am able to interpolate between the values but what I get are not equispaced interpolating points. Here's what I do:
x_data = [0.613,0.615,0.615,...]
y_data = [5.919,5.349,5.413,...]
# Interpolate values for x and y.
t = np.linspace(0, 1, len(x_data))
t2 = np.linspace(0, 1, 100)
# One-dimensional linear interpolation.
x2 = np.interp(t2, t, x_data)
y2 = np.interp(t2, t, y_data)
# Plot x,y data.
plt.scatter(x_data, y_data, marker='o', color='k', s=40, lw=0.)
# Plot interpolated points.
plt.scatter(x2, y2, marker='o', color='r', s=10, lw=0.5)
Which results in (figure omitted):
As can be seen, the red dots are closer together in sections of the graph where the original points distribution is denser.
I need a way to generate the interpolated points equispaced in x, y according to a given step value (say 0.1).
As askewchan correctly points out, by "equispaced in x, y" I mean that two consecutive interpolated points on the curve should be separated by the same (Euclidean straight-line) distance.
I tried unubtu's answer and it works well for smooth curves, but it seems to break for not-so-smooth ones:
This happens because the code calculates the point distance in a Euclidean way instead of directly along the curve, and I need the distance along the curve to be the same between points. Can this issue be worked around somehow?
Convert your xy-data to a parametrized curve, i.e. calculate all distances between consecutive points and generate the coordinates on the curve by cumulative summing. Then interpolate the x- and y-coordinates independently with respect to the new coordinates.
import numpy as np
from matplotlib import pyplot as plt
data = '''0.615 5.349
0.615 5.413
0.617 6.674
0.617 6.616
0.63 7.418
0.642 7.809
0.648 8.04
0.673 8.789
0.695 9.45
0.712 9.825
0.734 10.265
0.748 10.516
0.764 10.782
0.775 10.979
0.783 11.1
0.808 11.479
0.849 11.951
0.899 12.295
0.951 12.537
0.972 12.675
1.038 12.937
1.098 13.173
1.162 13.464
1.228 13.789
1.294 14.126
1.363 14.518
1.441 14.969
1.545 15.538
1.64 16.071
1.765 16.7
1.904 17.484
2.027 18.36
2.123 19.235
2.149 19.655
2.172 20.096
2.198 20.528
2.221 20.945
2.265 21.352
2.312 21.76
2.365 22.228
2.401 22.836
2.477 23.804'''
data = np.array([line.split() for line in data.split('\n')],dtype=float)
x,y = data.T
xd = np.diff(x)
yd = np.diff(y)
dist = np.sqrt(xd**2+yd**2)
u = np.cumsum(dist)
u = np.hstack([[0],u])
t = np.linspace(0,u.max(),10)
xn = np.interp(t, u, x)
yn = np.interp(t, u, y)
f = plt.figure()
ax = f.add_subplot(111)
ax.set_aspect('equal')
ax.plot(x,y,'o', alpha=0.3)
ax.plot(xn,yn,'ro', markersize=8)
ax.set_xlim(0,5)
Let's first consider a simple case. Suppose your data looked like the blue line below (figure omitted).
If you wanted to select equidistant points that were r distance apart,
then there would be some critical value for r where the cusp at (1,2) is the first equidistant point.
If you wanted points that were greater than this critical distance apart, then
the first equidistant point would jump from (1,2) to some place very different --
depicted by the intersection of the green arc with the blue line. The change is not gradual.
This toy case suggests that a tiny change in the parameter r can have a radical, discontinuous effect on the solution.
It also suggests that you must know the location of the ith equidistant point
before you can determine the location of the (i+1)-th equidistant point.
So it appears an iterative solution is required:
import numpy as np
import matplotlib.pyplot as plt
import math
x, y = np.genfromtxt('data', unpack=True, skip_header=1)
# find lots of points on the piecewise linear curve defined by x and y
M = 1000
t = np.linspace(0, len(x), M)
x = np.interp(t, np.arange(len(x)), x)
y = np.interp(t, np.arange(len(y)), y)
tol = 1.5
i, idx = 0, [0]
while i < len(x):
    total_dist = 0
    for j in range(i+1, len(x)):
        total_dist += math.sqrt((x[j]-x[j-1])**2 + (y[j]-y[j-1])**2)
        if total_dist > tol:
            idx.append(j)
            break
    i = j+1
xn = x[idx]
yn = y[idx]
fig, ax = plt.subplots()
ax.plot(x, y, '-')
ax.scatter(xn, yn, s=50)
ax.set_aspect('equal')
plt.show()
Note: I set the aspect ratio to 'equal' to make it more apparent that the points are equidistant.
The following script will interpolate points with an equal step in x of (x_max - x_min) / len(x) ≈ 0.04438:
import numpy as np
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt
data = np.loadtxt('data.txt')
x = data[:,0]
y = data[:,1]
f = interp1d(x, y)
x_new = np.linspace(np.min(x), np.max(x), x.shape[0])
y_new = f(x_new)
plt.plot(x,y,'o', x_new, y_new, '*r')
plt.show()
Expanding on the answer by @Christian K., here's how to do this for higher-dimensional data with scipy.interpolate.interpn. Let's say we want to resample to 10 equally-spaced points:
import numpy as np
from scipy.interpolate import interpn
# Assuming that 'data' is rows x dims (where dims is the dimensionality)
diffs = data[1:, :] - data[:-1, :]
dist = np.linalg.norm(diffs, axis=1)
u = np.cumsum(dist)
u = np.hstack([[0], u])
t = np.linspace(0, u[-1], 10)
resampled = interpn((u,), data, t)
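For example, if your points are 1-D x and y arrays like those in the earlier answers, data can be built as (an assumption about the input layout):
data = np.column_stack([x, y])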
It IS possible to generate equidistant points along the curve. But a real answer needs a more precise definition of what you want. Sorry, the code I've written for this task is in MATLAB, but I can describe the general ideas. There are three possibilities.
First, are the points to be truly equidistant from the neighbors in terms of a simple Euclidean distance? To do so would involve finding the intersection at any point on the curve with a circle of a fixed radius. Then just step along the curve.
Next, if you intend distance to mean distance along the curve itself, if the curve is a piecewise linear one, the problem is again easy to do. Just step along the curve, since distance on a line segment is easy to measure.
Finally, if you intend for the curve to be a cubic spline, again this is not incredibly difficult, but is a bit more work. Here the trick is to:
1. Compute the piecewise linear arclength from point to point along the curve. Call it t.
2. Generate a pair of cubic splines, x(t), y(t).
3. Differentiate x and y as functions of t. Since these are cubic segments, this is easy. The derivative functions will be piecewise quadratic.
4. Use an ODE solver to move along the curve, integrating the differential arclength function. In MATLAB, ODE45 worked nicely.
Thus, one integrates
sqrt((x')^2 + (y')^2)
Again, in MATLAB, ODE45 can be set to identify those locations where the function crosses certain specified points.
If your MATLAB skills are up to the task, you can look at the code in interparc for more explanation. It is reasonably well commented code.
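For readers without MATLAB, here is a rough Python sketch of the same spline-plus-ODE idea, using scipy's CubicSpline and solve_ivp (my tool choices, not part of interparc itself). It assumes consecutive input points are distinct:
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.integrate import solve_ivp

def equidistant_points(x, y, n_pts=20):
    # 1. piecewise linear arclength as the spline parameter t
    t = np.concatenate([[0], np.cumsum(np.hypot(np.diff(x), np.diff(y)))])
    # 2. cubic splines x(t), y(t) and their (piecewise quadratic) derivatives
    sx, sy = CubicSpline(t, x), CubicSpline(t, y)
    dsx, dsy = sx.derivative(), sy.derivative()
    # 3. integrate ds/dt = sqrt(x'(t)^2 + y'(t)^2) for the true arclength s(t)
    sol = solve_ivp(lambda tt, s: [np.hypot(dsx(tt), dsy(tt))],
                    (t[0], t[-1]), [0.0], dense_output=True)
    # 4. invert s(t) on a dense grid and evaluate the splines at equal arclengths
    tt = np.linspace(t[0], t[-1], 2000)
    ss = sol.sol(tt)[0]
    ts = np.interp(np.linspace(0, ss[-1], n_pts), ss, tt)
    return sx(ts), sy(ts)
Since ds/dt is strictly positive, s(t) is monotone and the np.interp inversion is safe.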

Fitting a line in 3D

Are there any algorithms that will return the equation of a straight line from a set of 3D data points? I can find plenty of sources which will give the equation of a line from 2D data sets, but none in 3D.
Thanks.
If you are trying to predict one value from the other two, then you should use lstsq with the a argument as your independent variables (plus a column of 1's to estimate an intercept) and b as your dependent variable.
If, on the other hand, you just want to get the best fitting line to the data, i.e. the line which, if you projected the data onto it, would minimize the squared distance between the real point and its projection, then what you want is the first principal component.
One way to define it is the line whose direction vector is the eigenvector of the covariance matrix corresponding to the largest eigenvalue, and which passes through the mean of your data. That said, eig(cov(data)) is a really bad way to calculate it, since it does a lot of needless computation and copying and is potentially less accurate than using svd. See below:
import numpy as np
# Generate some data that lies along a line
x = np.mgrid[-2:5:120j]
y = np.mgrid[1:9:120j]
z = np.mgrid[-5:3:120j]
data = np.concatenate((x[:, np.newaxis],
                       y[:, np.newaxis],
                       z[:, np.newaxis]),
                      axis=1)
# Perturb with some Gaussian noise
data += np.random.normal(size=data.shape) * 0.4
# Calculate the mean of the points, i.e. the 'center' of the cloud
datamean = data.mean(axis=0)
# Do an SVD on the mean-centered data.
uu, dd, vv = np.linalg.svd(data - datamean)
# Now vv[0] contains the first principal component, i.e. the direction
# vector of the 'best fit' line in the least squares sense.
# Now generate some points along this best fit line, for plotting.
# I use -7, 7 since the spread of the data is roughly 14
# and we want it to have mean 0 (like the points we did
# the svd on). Also, it's a straight line, so we only need 2 points.
linepts = vv[0] * np.mgrid[-7:7:2j][:, np.newaxis]
# shift by the mean to get the line in the right place
linepts += datamean
# Verify that everything looks right.
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d as m3d
ax = m3d.Axes3D(plt.figure())
ax.scatter3D(*data.T)
ax.plot3D(*linepts.T)
plt.show()
Here's what it looks like (figure omitted):
If your data is fairly well behaved then it should be sufficient to find the least squares sum of the component distances. That is, fit a linear regression of z on x, and then another of z on y.
Following the documentation example:
import numpy as np
pts = np.add.accumulate(np.random.random((10,3)))
x,y,z = pts.T
# this will find the slope and x-intercept of a plane
# parallel to the y-axis that best fits the data
A_xz = np.vstack((x, np.ones(len(x)))).T
m_xz, c_xz = np.linalg.lstsq(A_xz, z, rcond=None)[0]
# again for a plane parallel to the x-axis
A_yz = np.vstack((y, np.ones(len(y)))).T
m_yz, c_yz = np.linalg.lstsq(A_yz, z, rcond=None)[0]
# the intersection of those two planes and
# the function for the line would be:
# z = m_yz * y + c_yz
# z = m_xz * x + c_xz
# or:
def lin(z):
    x = (z - c_xz)/m_xz
    y = (z - c_yz)/m_yz
    return x,y
#verifying:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
fig = plt.figure()
ax = Axes3D(fig)
zz = np.linspace(0,5)
xx,yy = lin(zz)
ax.scatter(x, y, z)
ax.plot(xx,yy,zz)
plt.savefig('test.png')
plt.show()
If you want to minimize the actual orthogonal distances from the points to the line in 3-space (which I'm not sure is even referred to as linear regression), then I would build a function that computes the RSS and use a scipy.optimize minimization function to solve it.
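A minimal sketch of that scipy.optimize approach (the helper fit_line_orthogonal is mine, not an established API): parametrize the line by a unit direction through the centroid, which the optimal orthogonal-distance line must pass through, and minimize the residual sum of squares over the direction.
import numpy as np
from scipy.optimize import minimize

def fit_line_orthogonal(data):
    # data: (n, 3) array; returns (point_on_line, unit_direction)
    center = data.mean(axis=0)   # the optimal line passes through the centroid
    rel = data - center

    def rss(params):
        d = params / np.linalg.norm(params)   # unit direction
        proj = rel @ d                        # signed position along the line
        perp = rel - np.outer(proj, d)        # orthogonal residuals
        return np.sum(perp**2)

    res = minimize(rss, x0=np.ones(3))
    return center, res.x / np.linalg.norm(res.x)
Up to sign, this should recover the same direction as the SVD answer above, since minimizing orthogonal distances is exactly what the first principal component does.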
