Is it possible to find the peak (vertex?) values (x,y) of a polynomial regression line that was computed using Matplotlib?
I've included my basic setup below (of course with fuller data sets), as well as a screenshot of the actual regression line question.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
degree=6
setX={'Fixation Duration': {0:1,1:2,2:3}}
setY={'Fixation Occurrences': {0:1,1:2,2:3}}
X_gall=pd.DataFrame.from_dict(setX)
Y_gall=pd.DataFrame.from_dict(setY)
X_seqGall = np.linspace(X_gall.min(),X_gall.max(),300).reshape(-1,1)
polyregGall=make_pipeline(PolynomialFeatures(degree),LinearRegression())
polyregGall.fit(X_gall,Y_gall)
plt.scatter(X_gall,Y_gall, c="#1E4174", s=100.0, alpha=0.4)
plt.plot(X_seqGall,polyregGall.predict(X_seqGall),color="#1E4174", linewidth=4)
plt.show()
would like to find x,y values along red arrows
You can find the maximum from the underlying plot data.
First, let's change your plotting commands to explicitly define the axes:
fig, ax = plt.subplots(figsize=(6,4))
_ = ax.scatter(X_gall,Y_gall, c="#1E4174", s=100.0, alpha=0.4)
poly = ax.plot(X_seqGall,polyregGall.predict(X_seqGall),color="#1E4174", linewidth=4)
plt.show()
Now you can access the line data:
lines = poly[0].axes.lines
for line in lines:
max_y = np.max(line.get_ydata())
print(f"Maximum y is: {max_y}")
x_of_max_y = line.get_xdata()[np.argmax(line.get_ydata())]
print(f"x value of maximum y is: {x_of_max_y}")
Output:
Maximum y is: 3.1515605364361114
x value of maximum y is: 2.8127090301003346
Related
I want to get a list of random numbers from GMM.
I've plotted a distribution like this.
And I was wondering whether there is some way to get the value from the distribution curve, such as getting number 0.94, 0.96, 0.95.... How to get it?
I am not sure whether the belowing is useful for u or not. I got the code from Minimal reproduction
from matplotlib import rc
from sklearn import mixture
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
import matplotlib.ticker as tkr
import scipy.stats as stats
# x = open("prueba.dat").read().splitlines()
# create the data
x = np.concatenate((np.random.normal(5, 5, 1000),np.random.normal(10, 2, 1000)))
f = np.ravel(x).astype(np.float)
f=f.reshape(-1,1)
g = mixture.GaussianMixture(n_components=3,covariance_type='full')
g.fit(f)
weights = g.weights_
means = g.means_
covars = g.covariances_
plt.hist(f, bins=100, histtype='bar', density=True, ec='red', alpha=0.5)
# f_axis = f.copy().ravel()
# f_axis.sort()
# plt.plot(f_axis,weights[0]*stats.norm.pdf(f_axis,means[0],np.sqrt(covars[0])).ravel(), c='red')
plt.rcParams['agg.path.chunksize'] = 10000
plt.grid()
plt.show()
Thanks!
According to the documentation you can simply do g.sample(n_samples=64) on the fitted model to generate 64 samples
I have never been great with Python plotting concepts, and now I'm still apparently missing something new.
Here is my code.
import pandas as pd
import matplotlib.pyplot as plt
import sys
from numpy import genfromtxt
from sklearn.cluster import DBSCAN
data = pd.read_csv('C:\\Users\\path_here\\wine.csv')
data
# Reading in 2D Feature Space
model = DBSCAN(eps=0.9, min_samples=10).fit(data)
array_flavanoids = data.iloc[:, 2]
# Slicing array
array_colorintensity = data.iloc[:, 3]
# Scatter plot function
colors = model.labels_
plt.scatter(array_flavanoids, array_colorintensity, c=colors, marker='o')
plt.xlabel('Concentration of flavanoids', fontsize=16)
plt.ylabel('Color intensity', fontsize=16)
plt.title('Concentration of flavanoids vs Color intensity', fontsize=20)
plt.show()
Here is my result.
I am expecting the outliers to be in a different color than the non-outliers. So, something like this.
Maybe one color for outliers and another for non-outliers. I am just trying to learn the concept in this exercise. I am trying to follow the example from this link.
https://towardsdatascience.com/outlier-detection-python-cd22e6a12098
I am using this data source.
https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009
I am testing different data sets.
I got this to work.
from sklearn.cluster import DBSCAN
def dbscan(X, eps, min_samples):
ss = StandardScaler()
X = ss.fit_transform(X)
db = DBSCAN(eps=eps, min_samples=min_samples)
db.fit(X)
y_pred = db.fit_predict(X)
plt.scatter(X[:,0], X[:,1],c=y_pred, cmap='Paired')
plt.title("DBSCAN")
dbscan(data, eps=.5, min_samples=5)
I found this to be a great resource.
https://medium.com/#plog397/functions-to-plot-kmeans-hierarchical-and-dbscan-clustering-c4146ed69744
I'm trying to make a small program that will plot a graph with best fit line and that will predict the COST value based on inputted SIZE value.
I always get this error, and I do not know what it means:
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
No handles with labels found to put in legend.
This is the graph that I get (red), and I think that curve should look like green curve that i have draw.
And finally, program makes prediction only when I exit the graph.
What am I doing wrong?
This is the code:
import numpy as np
from sklearn.svm import SVR
import matplotlib.pyplot as plt
size=[[1],[2],[3],[4],[5],[7],[9],[10],[11],[13]]
cost=[[10],[22],[35],[48],[60],[80],[92],[111],[118],[133]]
def predict(size,cost,x):
dates=np.reshape(size,(len(size),1))
svr_poly=SVR(kernel="poly",C=1e3, degree=2)
svr_poly.fit(size,cost)
plt.scatter(size,cost, color="blue")
plt.plot(cost, svr_poly.predict(cost), color="red")
plt.xlabel("Size")
plt.ylabel("Cost")
plt.title("prediction")
plt.legend()
plt.show()
predictedcost=predict(size,cost,7)
print(predictedcost)
Here, I found the answer to this problem. So if you are interested, check it
import numpy as np
import matplotlib.pyplot as plt
import math
X = np.array([1,2,3,5,6,7,4,7,8,9,5,10,11,7,6,6,10,11,11,12,13,13,14])
Y=np.array([2,3,5,8,11,14,9,19,15,19,15,16,14,7,13,13,14,13,23,25,26,27,33])
koeficienti_polinom = np.polyfit(X, Y, 2)
a=koeficienti_polinom[0]
b=koeficienti_polinom[1]
c=koeficienti_polinom[2]
xval=np.linspace(np.min(X), np.max(X))
regression=a * xval**2 + b*xval + c
predX = float(input("Enter: "))
predY = a * predX**2 + b*predX + c
plt.scatter(X,Y, s=20, color="blue" )
plt.scatter(predX, predY, color="red")
plt.plot(xval, regression, color="black", linewidth=1)
print("Kvadratno predvidjanje: ",round(predY,2))
gh_data = ascii.read('http://dept.astro.lsa.umich.edu/~ericbell/data/GHOSTS/M81/ngc3031- field15.newphoto_radec')
ra = gh_data['col5'][:]
dec = gh_data['col6'][:]
f606 = gh_data['col3'][:]
f814 = gh_data['col4'][:]
plot(f6062-f8142,f8142, 'bo', alpha=0.15)
axis([-1,2.5,27,23])
xlabel('F606W-F814W')
ylabel('F814W')
title('Field 14')
The data set is imported and organized into different columns, I am trying to overlay a line of best fit, or linear regression over the scatterplot created, but I cannot figure out how. Thanks in advance.
As #rayryeng pointed out, your code just plots the data, but doesn't actually compute any regression results to plot. Here's one way of doing it:
import statsmodels.api as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.DataFrame({"y": range(1,11)+np.random.rand(10),
"x": range(1,11)+np.random.rand(10)})
Use statsmodels OLS method to fit a regression line, and params to extract the coefficient on the single regressor:
beta_1 = sm.OLS(data.y, data.x).fit().params
Produce a scatterplot and add a regression line:
fig, ax = plt.subplots()
ax.scatter(data.x, data.y)
ax.plot(range(1,11), [i*beta_1 for i in range(1,11)], label = "best fit")
ax.legend(loc="best")
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.mlab as mlab`
mu = np.loadtxt('my_data/corr.txt')
d = mu[:,2]
y=[]
tot=0
min=999
for i in d:
y.append(float(i))
tot=tot+float(i)
if (min>float(i)):
min=float(i)
av=tot/len(y)
z=[]
m=[]
for i in y:
z.append(i-av)
m.append(i-min)
plt.acorr(z,usevlines=True,maxlags=None,normed=True)
plt.show()
WIth this code I have the output showing a bar chart.
Now,
1) How do I change this plot style to give just the trend line? I cant modify the line properties by any means.
2) How do I write this output data to a dat or txt file?
this should be a working minimal example:
import matplotlib.pyplot as plt
import numpy as np
from numpy.random import normal
data = normal(0, 1, 1000)
# return values are lags, correlation vector and the drawn line
lags, corr, line, rest = plt.acorr(data, marker=None, linestyle='-', color='red', usevlines=False)
plt.show()
np.savetxt("correlations.txt", np.transpose((lags, corr)), header='Lags\tCorrelation')
But i would recommand not to connect the points.