How to transform nonlinear model to linear? - python

I'm analyzing a dataset, and I know that the data should follow a power model:
y = a*x**b
I transformed it to linear by taking logarithms:
ln(y) = ln(a) + b* ln(x)
However, problems arose when adding a trend line to the plot:
slope, intercept, r_value, p_value, std_err = scipy.stats.mstats.linregress(x_ln, y_ln)
yy = np.exp(intercept)*wetarea_x**slope
plt.scatter(wetarea_x, arcgis_wtrshd_x, color = 'blue')
plt.plot(wetarea_x, yy, color = 'green')
This is what I get with this code.
How to modify the code, so that the trend line on the plot would be correct?

Your strange green plot is what you get when you do a line plot in matplotlib with the x values unsorted. It is still a line plot, but it connects the (x, y) pairs in the order they appear, jumping left and right (in your specific case, back to near the x-origin). That is what produces these strange patterns.
You don't have this problem with the blue plot, because it's a scatter plot.
Try calling the plot only after sorting both arrays by the order of the first one, using numpy.argsort, say
wetarea_x[np.argsort(wetarea_x)]
and
yy[np.argsort(wetarea_x)]
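For instance (a minimal sketch, assuming wetarea_x and arcgis_wtrshd_x are the NumPy arrays from your question, with wetarea_x as x and arcgis_wtrshd_x as y):
import numpy as np
from scipy.stats import mstats
import matplotlib.pyplot as plt

# fit the power law in log space: ln(y) = ln(a) + b*ln(x)
x_ln = np.log(wetarea_x)
y_ln = np.log(arcgis_wtrshd_x)
slope, intercept, r_value, p_value, std_err = mstats.linregress(x_ln, y_ln)

# back-transform the trend line: y = a*x**b with a = exp(intercept), b = slope
yy = np.exp(intercept) * wetarea_x**slope

# sort both arrays by x so the line is drawn from left to right
order = np.argsort(wetarea_x)

plt.scatter(wetarea_x, arcgis_wtrshd_x, color='blue')
plt.plot(wetarea_x[order], yy[order], color='green')
plt.show()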

Related

Plotting a curve from dots of a scatter plot

I'm facing a silly problem while plotting a graph from a regression function calculated using scikit-learn. After constructing the function I need to plot a graph that shows X and Y from the original data along with the points calculated from my function. The problem is that my function is not a straight line: although the model is linear, it uses a Fourier series to give my curve the right shape, and when I try to plot the line using:
ax.plot(df['GDPercapita'], modelp1.predict(df1), color='k')
I get a graph like this:
Plot
But the true graph is supposed to be a line following those black points:
Dots to be connected
I'm generating the graph using the following code:
fig, ax = plt.subplots()
ax.scatter(df['GDPercapita'], df['LifeExpectancy'], edgecolors=(0, 0, 0))
ax.scatter(df['GDPercapita'], modelp1.predict(df1),color='k') #this line is changed to get the first pic.
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show(block=True)
Does anyone have an idea about what to do?
POST DISCUSSION EDIT:
Ok, so first things first:
The data can be download at: http://www.est.ufmg.br/~marcosop/est171-ML/dados/worldDevelopmentIndicators.csv
I had to generate new data using a Fourier expansion, with normalized values of GDPercapita, in order to run an exhaustive optimization algorithm for the regression function used to predict LifeExpectancy, and I found that the number of p parameters that gives the best regression function is p=22.
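For reference, a minimal sketch of what such a Fourier feature expansion might look like (the normalization and the exact sine/cosine terms are assumptions for illustration, not the code actually used here):
import numpy as np

def fourier_features(x, p):
    # illustrative only: p sine/cosine pairs of a variable normalized to [0, 1]
    x_norm = (x - x.min()) / (x.max() - x.min())
    cols = [np.ones_like(x_norm)]  # intercept column
    for k in range(1, p + 1):
        cols.append(np.sin(2 * np.pi * k * x_norm))
        cols.append(np.cos(2 * np.pi * k * x_norm))
    return np.column_stack(cols)

# hypothetical usage: xp22 = fourier_features(df['GDPercapita'].to_numpy(), 22)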
Now I have to generate a polynomial function using the prediction points of the regression function with p=22, to show how the best regression function compares to the polynomial function with 22 degrees.
To generate the prediction I use the following code:
from sklearn import linear_model
modelp22 = linear_model.LinearRegression()
modelp22.fit(xp22,y_train)
df22 = df[p22]
fig, ax = plt.subplots()
ax.scatter(df['GDPercapita'], df['LifeExpectancy'], edgecolors=(0, 0, 0))
ax.scatter(df['GDPercapita'], modelp22.predict(df22),color='k')
ax.set_xlabel('GDPercapita')
ax.set_ylabel('LifeExpectancy')
plt.show(block=True)
Now I need to use the prediction points to create a polynomial function and plot a graph with: the original data (first scatter), the prediction points (second scatter) and the polynomial function (a curve or plot) to show their visual relation.
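A minimal sketch of that last step, assuming df, df22 and modelp22 are the objects defined above (the key point, as in the first question on this page, is to sort by GDPercapita before drawing the connecting line):
import numpy as np
import matplotlib.pyplot as plt

pred = modelp22.predict(df22)
gdp = df['GDPercapita'].to_numpy()

# sort by x so the connecting line is drawn from left to right
order = np.argsort(gdp)

fig, ax = plt.subplots()
ax.scatter(gdp, df['LifeExpectancy'], edgecolors=(0, 0, 0))  # original data
ax.scatter(gdp, pred, color='k')                             # prediction points
ax.plot(gdp[order], pred[order], color='g')                  # curve through the predictions
ax.set_xlabel('GDPercapita')
ax.set_ylabel('LifeExpectancy')
plt.show(block=True)
A polynomial through the prediction points could then be fitted with np.polyfit(gdp[order], pred[order], deg) and plotted the same way, but note that a high-degree fit on raw GDP values is badly conditioned unless the x values are normalized first.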

Using SciPy to interpolate data into a quadratic fit

I have a set of data where, when plotted, most points congregate to the left of the x axis:
plt.plot(x, y, marker='o')
plt.title('Original')
plt.show()
ORIGINAL GRAPH
I want to use scipy to interpolate the data and later try to fit a quadratic line to it. I am avoiding simply fitting a quadratic curve without interpolation, since that would bias the obtained curve towards the mass of data at one extreme end of the x axis. I tried this using
from scipy.interpolate import interp1d
f = interp1d(x, y, kind='quadratic')
# Array with points in between min(x) and max(x) for interpolation
x_interp = np.linspace(min(x), max(x), num=np.size(x))
# Plot graph with interpolation
plt.plot(x_interp, f(x_interp), marker='o')
plt.title('Interpolated')
plt.show()
and got INTERPOLATED GRAPH.
However, what I intend to get is something like this:
EXPECTED GRAPH
What am I doing wrong?
My values for x can be found here and values for y here.
Thank you!
Solution 1
I'm pretty sure this does what you want. It fits a second degree (quadratic) polynomial to your data, then plots that function on an evenly spaced array of x values ranging from the minimum to the maximum of your original x data.
new_x = np.linspace(min(x), max(x), num=np.size(x))
coefs = np.polyfit(x,y,2)
new_line = np.polyval(coefs, new_x)
Plotting it returns:
plt.scatter(x,y)
plt.scatter(new_x,new_line,c='g', marker='^', s=5)
plt.xlim(min(x)-0.00001,max(x)+0.00001)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
If that wasn't what you meant...
However, from your question, it seems like you might be trying to force all your original y-values onto evenly spaced x-values (if that's not your intention, let me know, and I'll just delete this part).
This is also possible; there are lots of ways to do it, but I've done it here in pandas:
import pandas as pd
xy_df=pd.DataFrame({'x_orig': x, 'y_orig': y})
sorted_x_y=xy_df.sort_values('x_orig')
sorted_x_y['new_x'] = np.linspace(min(x), max(x), np.size(x))
plt.figure(figsize=[5,5])
plt.scatter(sorted_x_y['new_x'], sorted_x_y['y_orig'])
plt.xlim(min(x)-0.00001,max(x)+0.00001)
plt.xticks(rotation=90)
plt.tight_layout()
Which looks pretty different from your original data... which is why I think it might not be exactly what you're looking for.

The line of best fit doesn't match the scatter plot

Below is my scatter plot with a linear regression line. Just by looking at how the markers are distributed on the plot, I feel like the line is not covering them correctly. From what I see, it is supposed to be more of a diagonal, straighter line instead of a curve. Here is my code producing the plot:
for i in range(len(linkKarmaList)):
    plt.scatter(commentKarmaList[i], linkKarmaList[i], marker="o", s=len(clearModSet[i])*1.0*0.9)
x = numpy.asarray(commentKarmaList)
y = numpy.asarray(linkKarmaList )
plt.plot(numpy.unique(x), numpy.poly1d(numpy.polyfit(x, y, 1))(numpy.unique(x)))
plt.xlabel('Comment Karma ')
plt.ylabel('Link Karma')
plt.title('Link and comment Karma of most popular Forums on reddit')
plt.xscale('log')
plt.yscale('log')
plt.legend()
plt.show()
Am I interpreting that correctly? What am I missing?
You're trying to fit a straight line y = a*x + b, which doesn't look like a straight line in log-space. Instead, you should be fitting a function that forms a straight line in log-space.
This comes down to log(y) = a*log(x) + b, which we can rewrite as log(y) = log(x^a) + b.
Taking 10 to the power of both sides, we find:
y = x^a * 10^b, or just y = C * x^a, where C (= 10^b) and a are the fitting parameters and x and y are your data.
This is the function that makes a straight line in log-log space, which is the function you should try to fit against your data.
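A minimal sketch of that fit (assuming commentKarmaList and linkKarmaList are the lists from the question and all values are positive, which the log transform requires):
import numpy as np
import matplotlib.pyplot as plt

x = np.asarray(commentKarmaList, dtype=float)
y = np.asarray(linkKarmaList, dtype=float)

# fit a straight line in log-log space: log10(y) = a*log10(x) + b
a, b = np.polyfit(np.log10(x), np.log10(y), 1)

# back-transform: y = 10**b * x**a
xs = np.sort(x)
plt.scatter(x, y, marker='o')
plt.plot(xs, 10**b * xs**a, color='k')
plt.xscale('log')
plt.yscale('log')
plt.xlabel('Comment Karma')
plt.ylabel('Link Karma')
plt.show()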
From what you show, I'd say the problem is that in the log-log plot the scatter looks more or less like a line.
The problem is that you're fitting against the raw (untransformed) values and then plotting on log-log axes.

Matplotlib negative axis

I want to fit a straight line y = mx + c to my data points, but in log form. For this purpose I am using the curve_fit module. My simple code is
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as pl

def func(x, m, c):
    return x*m + c

x = np.log10(xdata)
y = np.log10(ydata)
err = np.log10(error)

coeff, var = curve_fit(func, x, y, sigma=err)
yfit = func(x, coeff[0], coeff[1])

pl.plot(x, y, 'ro')
pl.plot(x, yfit, 'k-')
pl.show()
This plot gives me negative numbers on the y axis, since my y values are in mV. Is there any way to use the original xdata and ydata (in mV) on the plots while fitting in log space?
Plot transformed variables instead.
plot(10**x, 10**yfit, 'k-')
and maybe display the plot in log scale
set_yscale('log')
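Putting it together, a minimal sketch continuing from the question's variables (x, y and yfit are in log10 space, so 10**x and 10**y recover the original mV values):
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(10**x, 10**y, 'ro')      # original data, back in mV
ax.plot(10**x, 10**yfit, 'k-')   # fitted line, back-transformed
ax.set_xscale('log')             # log axes keep the fitted line looking straight
ax.set_yscale('log')
plt.show()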

python How to plot scatter and regression line with more than 127 or 128 points?

I am trying to make a simple scatter plot and also overlay a simple regression line. All the (x, y) points plot in scatter form, as expected, no matter what. Great. My problem is that if N > 127, all the (x, y) points are plotted, but the regression line does not extend from min(x) to max(x). The regression line should extend all the way from min(x) on the left to max(x) on the right. What is going on here and how can I fix it?
import numpy as np
import matplotlib.pyplot as plt

fig1, ax1 = plt.subplots(1,1)
N=128
x=np.random.rand(N)
y=np.random.rand(N)
fit = np.polyfit(x,y,1)
fit_fn = np.poly1d(fit)
ya=fit_fn(x)
ax1.plot(x,y, 'bo',x, ya,'-k')
I did notice that if I change the last line to
ax1.plot(x,y, 'bo',x, ya,'-ko')
then all the points plot, but this is not what I want, since it gives me a scatter of (x, ya) points instead of a line.
I get it now. I'm not quite sure why it happens like that, but there's a way around it. Does this produce the same result? (see mine below)
import matplotlib.pyplot as plt
import numpy as np
fig1, ax1 = plt.subplots(1,1)
#distribute N random points in the interval [0, 1)
N=300
x=np.random.rand(N)
y=np.random.rand(N)
#get fit information
fit = np.polyfit(x,y,1)
fit_fn = np.poly1d(fit)
#extend fitted line interval to make sure you
#get min and max on x axis
current = np.arange(min(x), max(x), 0.01)
current_fit = np.polyval(fit_fn, current)
#you can even extend it further; the default line color is blue
future = np.arange(min(x)-0.5, max(x)+0.5, 0.01)
future_fit = np.polyval(fit_fn, future)
#plot
ax1.plot(x,y, 'bo')
ax1.plot(current, current_fit, "-ko")
ax1.plot(future, future_fit)
plt.show()
