I'm trying to interpolate the following satellite ground track:
The problem is the discontinuity at (lon, lat) -> (0, 0), which causes a poorly fitting curve in this region:
I'm not sure whether a parametric interpolation would be the best approach; for now I have just applied a linear interpolation:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import interpolate

# Load the ground track (tab-separated lon/lat columns)
df = pd.read_table('groundtracks', sep='\t')
x = df['lon']
y = df['lat']

# Linear interpolation of lat as a function of lon,
# evaluated on a grid twice as dense as the data
f = interpolate.interp1d(x, y, kind='linear')
xnew = np.linspace(x.min(), x.max(), num=x.count() * 2)
ynew = f(xnew)
You shouldn't interpolate near 0, because the satellite was there at two very different points in time.
You should add 360° to the second half of your data (which might correspond to the upper-left curve), so that you get a continuous curve from about -30° to 360°. Then interpolate. Then perform the reverse operation: lon[lon > 180] -= 360.
As an alternative you might also try splitting the track into its two distinct halves, and interpolating them individually.
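For illustration, a minimal sketch of the unwrap/re-wrap approach on synthetic data (the 50-point track and the sine-shaped latitude are made up for the example; np.unwrap works in radians, hence the conversions):
import numpy as np
from scipy import interpolate

# Synthetic ground track whose longitude wraps at the 180° meridian
lon = (np.linspace(100, 400, 50) + 180) % 360 - 180
lat = 30 * np.sin(np.linspace(0, np.pi, 50))

# Unwrap: shift later segments by 360° wherever consecutive longitudes
# jump by more than 180°, giving one continuous, monotonic curve
lon_cont = np.degrees(np.unwrap(np.radians(lon)))

# Interpolate on the continuous curve
f = interpolate.interp1d(lon_cont, lat, kind='linear')
lon_new = np.linspace(lon_cont.min(), lon_cont.max(), 2 * len(lon))
lat_new = f(lon_new)

# Reverse operation: map the interpolated longitudes back into [-180, 180)
lon_new = (lon_new + 180) % 360 - 180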
One of the assumptions behind the natural cubic spline is that at the endpoints of the interpolation interval, the second derivative of the spline polynomial is set equal to 0. I tried to verify this using scipy.interpolate.CubicSpline in the example below.
from scipy.interpolate import CubicSpline
from numpy import linspace
import matplotlib.pyplot as plt

# Runge's function, a classic test case for interpolation
runge_f = lambda x: 1 / (1 + 25 * x**2)

# Natural cubic spline through 11 points on [-2, 2]
x = linspace(-2, 2, 11)
y = runge_f(x)
cs = CubicSpline(x, y, bc_type="natural")

# Evaluate well outside the interpolation interval to see the extrapolation
t = linspace(-5, 5, 1000)
plt.plot(x, y, "p", color="red")
plt.plot(t, runge_f(t), color="black")
plt.plot(t, cs(t), color="lightblue")
plt.show()
In the presented example, the extrapolated points' curvature is not equal to zero - shouldn't the extrapolation outside the interval be linear in the Natural Cubic Spline?
The curvature (second derivative) of the spline at the endpoints is indeed 0, as you can check by running
print(cs(x[0], 2), cs(x[-1], 2))
which evaluates the second derivative at both ends of your x interpolation interval. However, that does not mean the spline is flat beyond the limits -- it continues as a cubic polynomial. If you want to extrapolate linearly outside the range, you have to do it yourself. It is easy enough to extrapolate flat: replace your cs = ... line with
from scipy.interpolate import interp1d
cs = interp1d(x, y, fill_value=(y[0], y[-1]), bounds_error=False)
to get something like this:
but it takes a bit more work to extrapolate linearly. I am not sure there is a ready-made Python function for that (or rather, I am sure there is one; I just don't know what it is). One way to do it by hand is sketched below.
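A hand-rolled sketch (an assumption on my part, not a library feature): keep the natural spline inside the data range, and outside it continue along the tangent lines at the endpoints:
import numpy as np
from scipy.interpolate import CubicSpline

def spline_linear_extrap(x, y):
    """Natural cubic spline inside [x[0], x[-1]], endpoint tangent lines outside."""
    cs = CubicSpline(x, y, bc_type="natural")
    x0, x1 = x[0], x[-1]
    y0, y1 = cs(x0), cs(x1)
    d0, d1 = cs(x0, 1), cs(x1, 1)  # first derivatives at the endpoints

    def f(t):
        t = np.asarray(t, dtype=float)
        out = cs(t)
        out = np.where(t < x0, y0 + d0 * (t - x0), out)
        out = np.where(t > x1, y1 + d1 * (t - x1), out)
        return out

    return f

x = np.linspace(-2, 2, 11)
f = spline_linear_extrap(x, 1 / (1 + 25 * x**2))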
I'm wondering if there is a good way to fit a Gaussian (normal) curve to a histogram in the form of a numpy array, i.e. the output of np.histogram(array, bins).
How can such a curve be plotted on the same graph, scaled in height and width to match the histogram?
You can fit your histogram using a Gaussian (i.e. normal) distribution, for example using scipy's curve_fit. I have written a small example below. Note that depending on your data, you may need to find a way to make good guesses for the starting values for the fit (p0). Poor starting values may cause your fit to fail.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
from scipy.stats import norm

def fit_func(x, a, mu, sigma, c):
    """Gaussian function used for the fit."""
    return a * norm.pdf(x, loc=mu, scale=sigma) + c

# Make up some normally distributed data and histogram it
y = 2 * np.random.normal(loc=1, scale=2, size=1000) + 2
no_bins = 20
hist, edges = np.histogram(y, bins=no_bins)
centers = edges[:-1] + (edges[1] - edges[0]) / 2  # bin centers, not left edges

# Fit the histogram counts at the bin centers
p0 = [2, 0, 2, 2]  # starting values for the fit
p1, _ = curve_fit(fit_func, centers, hist, p0, maxfev=10000)

# Plot the histogram and the fit together
fig, ax = plt.subplots()
ax.hist(y, bins=no_bins)
x = np.linspace(edges[0], edges[-1], 1000)
y_fit = fit_func(x, *p1)
ax.plot(x, y_fit, 'r-')
plt.show()
I'm new to Python and having some trouble with matplotlib. I currently have data contained in two numpy arrays, call them x and y, that I am plotting on a scatter plot with coordinates (x, y) for each point (i.e. I have points x[0], y[0] and x[1], y[1] and so on). I have been using the following code segment to color the points in my scatter plot based on the spatial density of nearby points (found on another stackoverflow post):
http://prntscr.com/abqowk
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
x = np.random.normal(size=1000)
y = x*3 + np.random.normal(size=1000)
# Estimate point density with a Gaussian KDE, evaluated at each point
xy = np.vstack([x, y])
z = gaussian_kde(xy)(xy)
# Sort by density so the densest points are plotted last (on top)
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=50, edgecolor='')
plt.show()
I've been using it without being sure exactly how it works (namely the point-density calculation; if someone could explain how exactly that works, that would also be much appreciated).
However, now I'd like to color code by the ratio of the spatial density of points in x,y to that of the spatial density of points in another set of numpy arrays, call them x2, y2. That is, I would like to make a plot such that I can identify how the density of points in x,y compares to the points in x2,y2 on the same scatter plot. Could someone please explain how I could go about doing this?
Thanks in advance for your help!
I've been trying to do the same thing based on that same earlier post, and I think I just figured it out! The trick is to use matplotlib.colors.Normalize() to define a scale and then weight it according to some data set (xnorm,ynorm):
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mplc
import matplotlib.cm as cm
from scipy.stats import gaussian_kde

def kdeplot(x, y, xnorm, ynorm):
    # Density of (x, y) from a Gaussian KDE, evaluated at each point
    xy = np.vstack([x, y])
    z = gaussian_kde(xy)(xy)
    # Weight the color scale by the relative size of the two data sets
    wt = 1.0 * len(x) / (1.0 * len(xnorm))
    norm = mplc.Normalize(vmin=0, vmax=8 / wt)
    cmap = cm.gnuplot
    # Sort by density so the densest points are drawn on top
    idx = z.argsort()
    x, y, z = x[idx], y[idx], z[idx]
    args = (x, y)
    kwargs = {'c': z, 's': 10, 'edgecolor': '', 'cmap': cmap, 'norm': norm}
    return args, kwargs

# (x1, y1) is some data set whose density-map coloring you
# want to scale to (xnorm, ynorm)
args, kwargs = kdeplot(x1, y1, xnorm, ynorm)
plt.scatter(*args, **kwargs)
I used trial and error to optimize my normalization for my particular data and choice of colormap. Here's what my data looks like scaled to itself; here's my data scaled to some comparison data (which is on the bottom of that image).
I'm not sure this method is entirely general, but it works in my case: I know that my data and the comparison data are in similar regions of parameter space, and they both have gaussian scatter, so I can use a naive linear scaling determined by the number of data points and it results in something that gives the right idea visually.
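If you want the literal density ratio the original question asks about, a hedged sketch of a more direct route (the samples x1, y1, x2, y2 below are synthetic stand-ins; I have not tested how well the resulting scale reads visually) is to fit one KDE per data set and color by their quotient:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Two synthetic samples; (x2, y2) plays the comparison set
x1, y1 = np.random.normal(size=(2, 1000))
x2, y2 = np.random.normal(loc=0.5, size=(2, 1000))

kde1 = gaussian_kde(np.vstack([x1, y1]))
kde2 = gaussian_kde(np.vstack([x2, y2]))

# Ratio of the two estimated densities at the points of the first set
pts = np.vstack([x1, y1])
ratio = kde1(pts) / kde2(pts)

# Sort so the highest-ratio points are drawn on top
idx = ratio.argsort()
plt.scatter(x1[idx], y1[idx], c=ratio[idx], s=10)
plt.colorbar(label='density(x1, y1) / density(x2, y2)')
plt.show()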
I have two arrays of data that correspond to x and y values, that I would like to interpolate with a cubic spline.
I have tried to do this, but my interpolated function doesn't pass through my data points.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
re = np.array([0.2, 2, 20, 200, 2000, 20000], dtype=float)
cd = np.array([103, 13.0, 2.72, 0.800, 0.401, 0.433], dtype=float)
plt.yscale('log')
plt.xscale('log')
plt.xlabel( "Reynold's number" )
plt.ylabel( "Drag coefficient" )
plt.plot(re,cd,'x', label='Data')
x = np.linspace(0.2,20000,200000)
f = interp1d(re,cd,kind='cubic')
plt.plot(x,f(x))
plt.legend()
plt.show()
What I end up with looks like this:
which is clearly an awful representation of my function. What am I missing here?
Thank you.
You can get the result you probably expect (smooth spline on the log axes) by doing this:
f = interp1d(np.log(re), np.log(cd), kind='cubic')
plt.plot(x, np.exp(f(np.log(x))))
This will build the interpolation in the log space and plot it correctly. Plot your data on a linear scale to see how the cubic has to flip to get the tail on the left hand side.
The main thing you are missing is the log scaling on your axes. The spline shown is not an unreasonable result given your input data; try drawing the plot with plt.xscale('linear') instead of plt.xscale('log') to see this. Perhaps a cubic spline is not the best interpolation technique, at least on the raw data. A better option may be to interpolate on the log of the data instead.
Given some data of shape 20x45, where each row is a separate data set, say 20 different sine curves with 45 data points each, how would I go about getting the same data, but with shape 20x100?
In other words, I have some data A of shape 20x45, and some data B of length 20x100, and I would like to have A be of shape 20x100 so I can compare them better.
This is for Python and Numpy/Scipy.
I assume it can be done with splines, so I am looking for a simple example, maybe just 2x10 to 2x20 or something, where each row is just a line, to demonstrate the solution.
Thanks!
Ubuntu beat me to it while I was typing this example, but his example just uses linear interpolation, which can be done more easily with numpy.interp... (The difference is only a keyword argument in scipy.interpolate.interp1d, however.)
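For reference, a minimal sketch of that linear version with np.interp, on made-up 10-point data:
import numpy as np

y = (np.random.random(10) - 0.5).cumsum()  # made-up data
x = np.arange(y.size)
new_x = np.linspace(x.min(), x.max(), 50)
new_y = np.interp(new_x, x, y)  # plain linear interpolation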
I figured I'd include my example, as it shows using scipy.interpolate.interp1d with a cubic spline...
import numpy as np
import scipy as sp
import scipy.interpolate
import matplotlib.pyplot as plt
# Generate some random data
y = (np.random.random(10) - 0.5).cumsum()
x = np.arange(y.size)
# Interpolate the data using a cubic spline to "new_length" samples
new_length = 50
new_x = np.linspace(x.min(), x.max(), new_length)
new_y = sp.interpolate.interp1d(x, y, kind='cubic')(new_x)
# Plot the results
plt.figure()
plt.subplot(2,1,1)
plt.plot(x, y, 'bo-')
plt.title('Using 1D Cubic Spline Interpolation')
plt.subplot(2,1,2)
plt.plot(new_x, new_y, 'ro-')
plt.show()
One way would be to use scipy.interpolate.interp1d:
import numpy as np
import scipy.interpolate

x = np.linspace(0, 2 * np.pi, 45)
y = np.zeros((2, 45))
y[0, :] = np.sin(x)
y[1, :] = np.sin(2 * x)

# interp1d interpolates along the last axis by default,
# so both rows are resampled in one call
f = scipy.interpolate.interp1d(x, y)
y2 = f(np.linspace(0, 2 * np.pi, 100))
If your data is fairly dense, it may not be necessary to use higher order interpolation.
If your application is not sensitive to precision or you just want a quick overview, you could just fill the unknown data points with averages from neighbouring known data points (in other words, do naive linear interpolation).
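A minimal sketch of that naive linear fill for the 20x45 -> 20x100 case in the question (the toy array A of scaled sine curves is made up; np.interp resamples each row linearly):
import numpy as np

# Toy 20x45 data: 20 scaled sine curves with 45 samples each
A = np.arange(1, 21)[:, None] * np.sin(np.linspace(0, 2 * np.pi, 45))
old_t = np.linspace(0, 1, A.shape[1])
new_t = np.linspace(0, 1, 100)

# Linearly resample each of the 20 rows to 100 points
B = np.vstack([np.interp(new_t, old_t, row) for row in A])
print(B.shape)  # (20, 100)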