Using SciPy to interpolate data into a quadratic fit - python

I have a set of data in which, when plotted, most points cluster towards the left end of the x axis:
plt.plot(x, y, marker='o')
plt.title('Original')
plt.show()
ORIGINAL GRAPH
I want to use SciPy to interpolate the data and then fit a quadratic curve to the interpolated points. I am avoiding fitting a quadratic directly to the raw data, since the resulting curve would be biased towards the mass of points at one end of the x axis. I tried this using
f = interp1d(x, y, kind='quadratic')
# Array with points in between min(x) and max(x) for interpolation
x_interp = np.linspace(min(x), max(x), num=np.size(x))
# Plot graph with interpolation
plt.plot(x_interp, f(x_interp), marker='o')
plt.title('Interpolated')
plt.show()
and got INTERPOLATED GRAPH.
However, what I intend to get is something like this:
EXPECTED GRAPH
What am I doing wrong?
My values for x can be found here and values for y here.
Thank you!

Solution 1
I'm pretty sure this does what you want. It fits a second degree (quadratic) polynomial to your data, then plots that function on an evenly spaced array of x values ranging from the minimum to the maximum of your original x data.
new_x = np.linspace(min(x), max(x), num=np.size(x))
coefs = np.polyfit(x,y,2)
new_line = np.polyval(coefs, new_x)
Plotting it returns:
plt.scatter(x,y)
plt.scatter(new_x,new_line,c='g', marker='^', s=5)
plt.xlim(min(x)-0.00001,max(x)+0.00001)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
if that wasn't what you meant...
However, from your question, it seems like you might be trying to force all your original y-values onto evenly spaced x-values (if that's not your intention, let me know, and I'll just delete this part).
This is also possible. There are lots of ways to do it, but I've done it here in pandas:
import pandas as pd
xy_df=pd.DataFrame({'x_orig': x, 'y_orig': y})
sorted_x_y=xy_df.sort_values('x_orig')
sorted_x_y['new_x'] = np.linspace(min(x), max(x), np.size(x))
plt.figure(figsize=[5,5])
plt.scatter(sorted_x_y['new_x'], sorted_x_y['y_orig'])
plt.xlim(min(x)-0.00001,max(x)+0.00001)
plt.xticks(rotation=90)
plt.tight_layout()
Which looks pretty different from your original data... which is why I think it might not be exactly what you're looking for.
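The two steps described in the question (resample onto an even grid, then fit) can also be chained directly. The following is only a minimal sketch under stated assumptions: the data is a synthetic stand-in for the asker's x and y, the grid size of 50 is arbitrary, and linear interpolation via np.interp is swapped in for interp1d(kind='quadratic') to keep the example small.

```python
import numpy as np

# Synthetic stand-in for the asker's data (assumption): x values crowd the left end.
rng = np.random.default_rng(0)
x = np.sort(rng.exponential(scale=0.2, size=200))
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.05, size=x.size)

# Resample onto an evenly spaced grid. np.interp needs increasing x, hence the sort above.
x_even = np.linspace(x.min(), x.max(), 50)
y_even = np.interp(x_even, x, y)

# Fit the quadratic to the resampled points, and to the raw points for comparison.
coefs_even = np.polyfit(x_even, y_even, 2)
coefs_raw = np.polyfit(x, y, 2)

print("fit on even grid:", coefs_even)
print("fit on raw data: ", coefs_raw)
```

Because every grid point carries the same weight, the dense cluster no longer dominates the fit, but the sparse region is then represented mostly by interpolated rather than measured values, so the result should be checked against the original scatter.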

Related

Is there a way in lmfit to only show the curve of the fit?

So I wrote some code with the help of lmfit to fit a Gaussian curve on some histogram data. While the curve itself is fine, when I try to plot the results in matplotlib, it displays the fit along with the data points. In reality, I want to plot histogram bars with the curve fit. How do you do this? Or alternatively, is there a way in lmfit to only show the fit curve and then add the histogram plot and combine them together?
Relevant part of my code:
counts, bin_edges = np.histogram(some_array, bins=1000)
bin_widths = np.diff(bin_edges)
x = bin_edges[:-1] + (bin_widths / 2)
y = counts
mod = GaussianModel()
pars = mod.guess(y, x=x)
final_fit = mod.fit(y, pars, x=x)
final_fit.plot_fit()
plt.show()
Here's the graphed result:
Gaussian curve
lmfit's builtin plotting routines are minimal wrappers around matplotlib, intended to give reasonable default plots for many cases. They don't make histograms.
But the arrays are readily available and using matplotlib to make a histogram is easy. I think all you need is:
import matplotlib.pyplot as plt
plt.hist(some_array, bins=1000, rwidth=0.5, label='binned data')
plt.plot(x, final_fit.best_fit, label='best fit')
plt.legend()
plt.show()
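Since counts and bin_edges are already computed in the question, the bars can also be drawn from them with bar instead of re-binning with hist. This is a sketch under assumptions: some_array is synthetic, and the curve is a stand-in Gaussian built from the sample mean and standard deviation, used here only so the example runs without lmfit. In the original code the curve would be final_fit.best_fit.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for some_array (assumption).
rng = np.random.default_rng(0)
some_array = rng.normal(loc=5.0, scale=2.0, size=10_000)

counts, bin_edges = np.histogram(some_array, bins=100)
bin_widths = np.diff(bin_edges)
x = bin_edges[:-1] + bin_widths / 2

# Stand-in for final_fit.best_fit (assumption): a Gaussian scaled to expected counts per bin.
mu, sigma = some_array.mean(), some_array.std()
curve = (counts.sum() * bin_widths
         * np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi)))

fig, ax = plt.subplots()
bars = ax.bar(x, counts, width=bin_widths, label="binned data")
ax.plot(x, curve, color="k", label="fit curve (stand-in)")
ax.legend()
fig.savefig("hist_with_fit.png")
```

Drawing the bars from the same counts that were fitted guarantees that the bars and the curve use identical binning.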

Density plot from plotting multiple arrays

I have a MxN (say, 1000x50) array. I want to plot each 50-point line onto the same plot, and have a heatmap of their density.
Simply doing a plt.pcolor(data) is not what I want, since I don't want to plot the matrix.
This is what I want to plot, but as I said it doesn't provide me with the heatmap I need.
import numpy as np
import matplotlib.pyplot as plt
data = np.random.rand(1000, 50)
fig, ax = plt.subplots()
for i in range(0, 1000):
    ax.plot(data[i], '.')
plt.show()
I would like a way of getting this together (I assume it will have something to do with histograms and binning?).
EDIT: simply adding an alpha value to the plot ( ax.plot(data[i], '.r', alpha=0.01)) achieves something similar to what I want. I would like, however, to have a heatmap with different colours.
As you already pointed out in your question, probably one of the simplest approaches involves histograms. A linear approximation of the histogram is probably enough for this application.
You can use np.histogram to calculate bin heights and edges and use scipy.interpolate.interp1d to obtain a function that provides an interpolation of the histogram. We can define a simple helper function to get the approximate density around each value in one column of the data array:
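import scipy.interpolate as interp

def get_density(vals, bins=30, kind="linear"):
    # Histogram of the values, normalized to a probability density
    y, bin_edges = np.histogram(vals, bins=bins, density=True)
    # Bin centers
    x = (bin_edges[1:] + bin_edges[:-1]) / 2.
    # Interpolate between bin centers; extrapolate for values outside them
    f = interp.interp1d(x, y, kind=kind, fill_value="extrapolate")
    return f(vals)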
Then you can use any colormap you want to map the density to a color value. The easiest way to go from here is to use plt.scatter instead of plot, where you can provide a specific color for every data point.
I would do something like this:
fig, ax = plt.subplots()
for i in range(data.shape[1]):
    colors = plt.cm.viridis(get_density(data[:, i]))
    ax.scatter(i*np.ones(data.shape[0]), data[:, i], c=colors, marker='.')
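If a gridded heatmap is acceptable instead of coloured points, the same per-column histograms can be stacked into a 2D array. This is a sketch of that alternative, not the approach from the answer above; n_bins, edges and density are names chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1000, 50))

# Shared bin edges so every column is binned on the same y grid.
n_bins = 30
edges = np.linspace(data.min(), data.max(), n_bins + 1)

# density[k, j] is the estimated density of column j within y-bin k.
density = np.empty((n_bins, data.shape[1]))
for j in range(data.shape[1]):
    density[:, j], _ = np.histogram(data[:, j], bins=edges, density=True)

print(density.shape)
```

The resulting array has one row per y-bin and one column per x position, so it can be passed to ax.pcolormesh(np.arange(data.shape[1] + 1) - 0.5, edges, density) with any colormap.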

Plotting KDE with logarithmic x-data in Matplotlib

I want to plot a KDE for data that covers a large range of x-values, so I want to use a logarithmic scale for the x-axis. For plotting I was using seaborn and the solution from Plotting 2D Kernel Density Estimation with Python, both of which fail once I set the xscale to logarithmic. When I take the logarithm of my x-data beforehand, everything looks fine, except that the ticks and tick labels are still linear, with the logarithm of the actual values as the labels. I could manually change the tick labels using something like:
labels = np.array(ax.get_xticks().tolist(), dtype=np.float64)
new_labels = [r'$10^{%.1f}$' % (labels[i]) for i in range(len(labels))]
ax.set_xticklabels(new_labels)
but to my eyes that looks just wrong and is nothing close to the axis labels (including the minor ticks) I would get if I just used
ax.set_xscale('log')
Is there an easier way to plot a KDE with logarithmic x-data? Or is it possible to just change the tick or label scale without changing the scaling of the data, so that I could plot the logarithmic values of x and change the scaling of the labels afterwards?
Edit:
The plot I want to create looks like this:
The two right columns are what it is supposed to look like. There I used the x data with the logarithm already applied. I don't like the labels on the x-axis, though.
The left column displays the plots, when the original data is used for the kde and all the other plots, and afterwards the scale is changed using
ax.set_xscale('log')
For some reason the KDE does not look like it is supposed to. This is not a result of erroneous data either, since it looks just fine if the logarithmic data is used.
Edit 2:
A working example of code is
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = np.random.multivariate_normal((0, 0), [[0.8, 0.05], [0.05, 0.7]], 100)
x = np.power(10, data[:, 0])
y = data[:, 1]
fig, ax = plt.subplots(2, 1)
sns.kdeplot(data=np.log10(x), data2=y, ax=ax[0])
sns.kdeplot(data=x, data2=y, ax=ax[1])
ax[1].set_xscale('log')
plt.show()
The ax[1] plot is not displayed correctly for me (the x-axis is inverted), but the general behavior is the same as for the case described above. I believe the problem lies with the bandwidth of the kde, which should probably account for the logarithmic x-data.
I found an answer that works for me and wanted to post it in case someone else has a similar problem.
Based on the accepted answer from this post, I defined a function that first applies the logarithm to the x-data and after the KDE was performed, transforms the x-values back to the original values. Afterwards I can simply plot the contours and use ax.set_xscale('log')
import numpy as np
import scipy.stats as st
def logx_kde(x, y, xmin, xmax, ymin, ymax):
    # Work in log10 space; xmin and xmax must therefore be given in log10 units
    x = np.log10(x)
    # Perform the kernel density estimate
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    positions = np.vstack([xx.ravel(), yy.ravel()])
    values = np.vstack([x, y])
    kernel = st.gaussian_kde(values)
    f = np.reshape(kernel(positions).T, xx.shape)
    # Transform the x grid back to the original (linear) values
    return np.power(10, xx), yy, f
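A possible way to call the function is sketched below, using the synthetic data from the question. The function is repeated so the block runs on its own. Taking the grid limits from the minimum and maximum of the log10 data is an assumption of this example; the answer does not say which limits were used.

```python
import numpy as np
import scipy.stats as st

def logx_kde(x, y, xmin, xmax, ymin, ymax):
    # KDE in log10(x) space; xmin and xmax are in log10 units
    x = np.log10(x)
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    positions = np.vstack([xx.ravel(), yy.ravel()])
    values = np.vstack([x, y])
    kernel = st.gaussian_kde(values)
    f = np.reshape(kernel(positions).T, xx.shape)
    return np.power(10, xx), yy, f

# Same construction as the working example in the question.
rng = np.random.default_rng(0)
data = rng.multivariate_normal((0, 0), [[0.8, 0.05], [0.05, 0.7]], 100)
x = np.power(10, data[:, 0])
y = data[:, 1]

# Grid limits in log10 units (assumption: span exactly the data range).
log_x = np.log10(x)
xx, yy, f = logx_kde(x, y, log_x.min(), log_x.max(), y.min(), y.max())

print(xx.shape, yy.shape, f.shape)
```

The returned xx is back in linear units, so ax.contour(xx, yy, f) followed by ax.set_xscale('log') gives contours on a properly labelled logarithmic axis.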

python How to plot scatter and regression line with more than 127 or 128?

I am trying to make a simple scatter and also overlay a simple regression. All the x,y points plot in a scatter form, as expected, no matter what. Great. My problem is that if N is >127 then all the (x,y) points are plotted, but the regression line does not extend from the min(x) to the max(x). The regression line should extend all the way from the left side (to min(x)) all the way to the max(x). What is going on here and how can I fix it?
fig1, ax1 = plt.subplots(1,1)
N=128
x=np.random.rand(N)
y=np.random.rand(N)
fit = np.polyfit(x,y,1)
fit_fn = np.poly1d(fit)
ya=fit_fn(x)
ax1.plot(x,y, 'bo',x, ya,'-k')
I did notice that if I change the last line to
ax1.plot(x,y, 'bo',x, ya,'-ko')
then all the points plot, but this is not what I want, since this gives me a scatter plot for x, ya instead of a line.
I see the problem now. I'm not quite sure why it happens, but there is a way around it: evaluate the fit on an evenly spaced grid of x values instead of on the original x. Does this produce the result you want? (See my version below.)
import matplotlib.pyplot as plt
import numpy as np
fig1, ax1 = plt.subplots(1,1)
# distribute N random points in the interval [0, 1)
N=300
x=np.random.rand(N)
y=np.random.rand(N)
#get fit information
fit = np.polyfit(x,y,1)
fit_fn = np.poly1d(fit)
# evaluate the fitted line on an evenly spaced grid
# covering the range from min(x) to max(x)
current = np.arange(min(x), max(x), 0.01)
current_fit = np.polyval(fit_fn, current)
# you can even extend the line beyond the data; the default line color is blue
future = np.arange(min(x)-0.5, max(x)+0.5, 0.01)
future_fit = np.polyval(fit_fn, future)
#plot
ax1.plot(x,y, 'bo')
ax1.plot(current, current_fit, "-ko")
ax1.plot(future, future_fit)
plt.show()
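Because a straight line is fully determined by two points, a lighter variant of the same workaround is to evaluate the fit only at min(x) and max(x). This is a sketch, with x_line and y_line as names chosen for the example. It also sidesteps one plausible cause of the problem, not confirmed in this thread: the original x is unsorted, so the line is drawn as a path that doubles back on itself many times.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 300
x = rng.random(N)
y = rng.random(N)

# Degree-1 fit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# Two endpoints are enough to draw the regression line across the full data range.
x_line = np.array([x.min(), x.max()])
y_line = slope * x_line + intercept

print(x_line, y_line)
```

The line is then drawn with ax1.plot(x, y, 'bo') followed by ax1.plot(x_line, y_line, '-k'), and it spans exactly from min(x) to max(x) regardless of N.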

How to make colorbar display set limits with 3D data

I've found many examples for this using features like pcolor and clim, but unless I am misusing them, they seem to only want to work for 2 dimensional data with x, y, values.
My data is formatted as follows:
x y z values
Before I get to plotting in the script I am performing an interpolation which gives me newly gridded data, but with no change to the range of values, meaning a single colorbar will suffice between the two figures. This interpolated data is formatted as follows:
xi yi zi interp
The code I am using to plot it is as follows:
fig = plt.figure()
ax = fig.add_subplot(121, projection = '3d')
ax.scatter(xi, yi, zi, c=interp, alpha=0.08, edgecolors='none')
ax=fig.add_subplot(122, projection = '3d')
s = ax.scatter(x, y, z, c=values, alpha=0.3, edgecolors='none')
plt.colorbar(s)
All of this works just fine, however the problem arises when I load in a separate data-set with similar, but not identical range of values. The colorbars between the two data-sets display different ranges which is not ideal to draw comparisons between the two. All I am looking to do is to forcibly set the min (2100) and max (2600) for the colorbar so that I can apply it to any and all data-sets.
Sorry if this is an easy question, but as I said, I can't seem to find a solution that works for 3D data.
Cheers, Vlad.
I think you are looking for vmin and vmax, for example
ax.scatter(x, y, z, c=values, alpha=0.3, edgecolors='none',vmin=2100,vmax=2600)
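To keep both panels and the colorbar on the same fixed scale, one option is to share a single Normalize object between the two scatter calls, which is equivalent to passing the same vmin and vmax to each. This is a sketch with synthetic stand-ins for x, y, z, values and the interpolated arrays (assumptions); the limits 2100 and 2600 come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

# Synthetic stand-ins for the original and interpolated data (assumption).
rng = np.random.default_rng(0)
x, y, z = rng.random((3, 100))
values = rng.uniform(2200, 2500, 100)
xi, yi, zi = rng.random((3, 400))
interp = rng.uniform(2200, 2500, 400)

# One normalization shared by both plots fixes the colour scale to [2100, 2600].
norm = Normalize(vmin=2100, vmax=2600)

fig = plt.figure()
ax1 = fig.add_subplot(121, projection="3d")
ax1.scatter(xi, yi, zi, c=interp, norm=norm, alpha=0.08, edgecolors="none")
ax2 = fig.add_subplot(122, projection="3d")
s = ax2.scatter(x, y, z, c=values, norm=norm, alpha=0.3, edgecolors="none")

# A single colorbar serves both axes because they share the same norm.
cbar = fig.colorbar(s, ax=[ax1, ax2])
fig.savefig("scatter3d_fixed_clim.png")
```

With the limits fixed this way, loading a different data set changes the colours of the points but not the range shown on the colorbar.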
