Linear regression not applying to loglog scale with seaborn - python

I am trying to do a linear regression on a log-log plot using seaborn.
Currently the regression is fitted on the linear scale, even though the data is plotted on log-log axes.
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pylab as plt
import seaborn as sns
x = np.arange(1, 10)
y = x**2.0
data = pd.DataFrame(data={'x': x, 'y': y})
f, ax = plt.subplots(figsize=(7, 7))
ax.set(xscale="log", yscale="log")
sns.regplot(x="x", y="y", data=data, ax=ax)  # keyword arguments; recent seaborn versions reject positional x, y
The only workaround I have found is to take the log of x and y before plotting, but then the x and y axes no longer show the original values as nicely as with the code above.
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pylab as plt
import seaborn as sns
x = np.arange(1, 10)
y = x**2.0
data = pd.DataFrame(data={'x': x, 'y': y})
data=np.log(data)
f, ax = plt.subplots(figsize=(7, 7))
sns.regplot(x="x", y="y", data=data, ax=ax)  # keyword arguments; recent seaborn versions reject positional x, y
Is there a way to keep the loglog scale from the first example of code but have the linear regression apply to the loglog scale and not to the normal scale?
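One possible approach, sketched here under the assumption that a power law y = c * x^m is the intended model: fit a straight line to log10(x) and log10(y) with np.polyfit, then draw the back-transformed line on the log-log axes. The helper name fit_loglog is illustrative and not part of seaborn, and this sketch draws the line only, without the confidence band that regplot adds.

```python
import numpy as np
import matplotlib.pyplot as plt


def fit_loglog(x, y):
    """Least-squares line through (log10 x, log10 y); returns (slope, intercept)."""
    slope, intercept = np.polyfit(np.log10(x), np.log10(y), deg=1)
    return slope, intercept


x = np.arange(1, 10)
y = x**2.0

slope, intercept = fit_loglog(x, y)

fig, ax = plt.subplots(figsize=(7, 7))
ax.set(xscale="log", yscale="log", xlabel="x", ylabel="y")
ax.scatter(x, y)

# Back-transform the fitted line so it can be drawn in data units.
x_line = np.logspace(np.log10(x.min()), np.log10(x.max()), 100)
y_line = 10**intercept * x_line**slope
ax.plot(x_line, y_line, label=f"y = {10**intercept:.2f} * x^{slope:.2f}")
ax.legend()
```

The axes keep their log-log tick labels because only the fit, not the plotted data, is computed in log space. Call plt.show() to display the figure.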

Related

How to find the centre of a 3D scatter plot?

I have been trying to plot the centre of a 3D scatter plot, but I am having trouble determining its z-value. I want the z-axis value that puts the centre on the scatter plot.
Here is what I have tried.
import xlrd
import numpy as np
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# axes = [5, 5, 5]
# Create Data
# data = np.ones(axes, dtype=np.bool)
df=pd.read_csv('wr.csv')
df=df.sample(frac=1)
# print(df.head())
fig = plt.figure()
#---------------1st plt-------------------------
ax = fig.add_subplot(projection='3d')
ax.scatter(df.index, df['volatile acidity'], np.ones(df.shape[0]), s=5,color='red')
ax.scatter(np.mean(df.index),np.mean(df['volatile acidity']),1,color='black',s=200)
Fig1
Now I wanted to try the same thing with a different approach and let the z-axis value change dynamically according to the scatter plot.
import xlrd
import numpy as np
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# axes = [5, 5, 5]
# Create Data
# data = np.ones(axes, dtype=np.bool)
df=pd.read_csv('wr.csv')
df=df.sample(frac=1)
# print(df.head())
fig = plt.figure()
#---------------1st plt-------------------------
ax = fig.add_subplot(projection='3d')
#ax.scatter(df.index, df['volatile acidity'], np.ones(df.shape[0]), s=5,color='red')
#ax.scatter(np.mean(df.index),np.mean(df['volatile acidity']),1,color='black',s=200)
ax.scatter(df.index, df['volatile acidity'], np.square(df['volatile acidity']), s=5,color='black')
ax.scatter(np.mean(df.index),np.mean(df['volatile acidity']),1,color='orange',s=20)
Here is the image for reference.
Fig2
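A minimal sketch of one way to get the z-value, assuming "centre" means the centroid of the point cloud: take the mean of each coordinate separately, including the z values that were actually plotted. The data here is synthetic because wr.csv is not available; the array acidity is a stand-in for df['volatile acidity'].

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Synthetic stand-in for df['volatile acidity']; real values would come from wr.csv.
acidity = rng.uniform(0.1, 1.2, size=200)

xs = np.arange(acidity.size)   # plays the role of df.index
ys = acidity
zs = np.square(acidity)        # same z definition as in the second plot

# The centroid is the mean of each coordinate taken separately.
centre = np.array([xs.mean(), ys.mean(), zs.mean()])

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(xs, ys, zs, s=5, color="black")
ax.scatter(*centre, s=200, color="orange")
```

Note that mean(y**2) is generally larger than mean(y)**2, so the centroid does not lie exactly on the curve z = y**2. If the marker should sit on the curve instead, np.square(ys.mean()) could be used for z.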

Add a normal distribution to seaborn 2D histogram

Is it possible to take a histogram from seaborn and add a normal distribution?
Say I had something like this scatter plot and histogram from the documentation.
import matplotlib.pyplot as plt  # needed for plt.savefig below
import seaborn as sns
penguins = sns.load_dataset("penguins")
sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm");
plt.savefig('deletethis.png', bbox_inches='tight')
Can I superimpose a distribution on the sides, like in the image below?
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
x = np.random.normal(size=100000)
# Plot histogram in one-dimension
plt.hist(x,bins=80,density=True)
xvals = np.arange(-4,4,0.01)
plt.plot(xvals, norm.pdf(xvals),label='$N(0,1)$')
plt.legend();
The following draws a kernel density estimate over each marginal histogram, which shows the shape of the distribution (and whether it looks normal):
g = sns.JointGrid(data=penguins, x="bill_length_mm", y="bill_depth_mm")
g.plot_joint(sns.scatterplot, s=100, alpha=.5)
g.plot_marginals(sns.histplot, kde=True)
The following superimposes a normal distribution on the histograms in the axes.
import seaborn as sns
import numpy as np
import pandas as pd
from scipy.stats import norm
df1 = penguins.loc[:,["bill_length_mm", "bill_depth_mm"]]
axs = sns.jointplot(x="bill_length_mm", y="bill_depth_mm", data=df1)  # keyword arguments; recent seaborn versions reject positional x, y
axs.ax_joint.scatter("bill_length_mm", "bill_depth_mm", data=df1, c='r', marker='x')
axs.ax_marg_x.cla()
axs.ax_marg_y.cla()
sns.distplot(df1.bill_length_mm, ax=axs.ax_marg_x, fit=norm)
sns.distplot(df1.bill_depth_mm, ax=axs.ax_marg_y, vertical=True, fit=norm)
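Since distplot is deprecated in recent seaborn releases, here is a hedged alternative sketch that draws the fitted normal curve directly on a matplotlib Axes. The helper add_normal_curve is a hypothetical name, the data is synthetic, and the normal density is written out with NumPy using the sample mean and standard deviation.

```python
import numpy as np
import matplotlib.pyplot as plt


def add_normal_curve(ax, values, horizontal=False, **plot_kwargs):
    """Draw a normal pdf with the sample mean and std of `values` on `ax`.

    With horizontal=True the curve is drawn sideways, which suits a
    marginal histogram attached to the y axis.
    """
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # drop missing entries
    mu, sigma = values.mean(), values.std()
    grid = np.linspace(values.min(), values.max(), 200)
    pdf = np.exp(-0.5 * ((grid - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    if horizontal:
        ax.plot(pdf, grid, **plot_kwargs)
    else:
        ax.plot(grid, pdf, **plot_kwargs)
    return mu, sigma


rng = np.random.default_rng(0)
sample = rng.normal(loc=44.0, scale=5.5, size=5000)  # synthetic stand-in data

fig, ax = plt.subplots()
ax.hist(sample, bins=40, density=True, alpha=0.5)  # density=True matches the pdf scale
mu, sigma = add_normal_curve(ax, sample, color="k")
```

With a JointGrid g whose marginals are density-normalised, for example via g.plot_marginals(sns.histplot, stat="density"), the same helper could be called as add_normal_curve(g.ax_marg_x, penguins["bill_length_mm"]) and add_normal_curve(g.ax_marg_y, penguins["bill_depth_mm"], horizontal=True).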

How can I add a vertical line to a seaborn dist plot where it peaks?

How can I add a vertical line at the x-location in which y is on its maximum in a seaborn dist plot?
import seaborn as sns, numpy as np
sns.set(); np.random.seed(0)
x = np.random.randn(5000)
ax = sns.distplot(x, kde = False)
PS: In the example above, we know that it will probably peak at 0. I would like to know how to find this value in general, for any given distribution of x.
Here is one way to get a more accurate value: draw the smooth KDE curve, read the location of its maximum from the plotted line, and then remove the curve.
import seaborn as sns, numpy as np
import matplotlib.pyplot as plt
sns.set(); np.random.seed(0)
x = np.random.randn(5000)
ax = sns.distplot(x, kde = True)
x = ax.lines[0].get_xdata()
y = ax.lines[0].get_ydata()
plt.axvline(x[np.argmax(y)], color='red')
ax.lines[0].remove()
Edit: an alternative solution without kde=True:
import seaborn as sns, numpy as np
from scipy import stats
import matplotlib.pyplot as plt
sns.set(); np.random.seed(0)
x = np.random.randn(5000)
ax = sns.distplot(x, kde = False)
kde = stats.gaussian_kde(x) # Compute the Gaussian KDE
idx = np.argmax(kde.pdf(x)) # Get the index of the maximum
plt.axvline(x[idx], color='red') # Plot a vertical line at corresponding x
With this version the histogram still shows the actual counts rather than density values.
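A third option, sketched here with plain NumPy and matplotlib rather than seaborn, is to read the peak off the histogram itself: take the tallest bin and mark its centre. The bin count of 50 is an arbitrary assumption, and the result is only as precise as the bin width.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)

# Bin the data once and reuse the same edges for plotting.
counts, edges = np.histogram(x, bins=50)
i = np.argmax(counts)                       # index of the tallest bin
peak_x = 0.5 * (edges[i] + edges[i + 1])    # centre of that bin

fig, ax = plt.subplots()
ax.hist(x, bins=edges)
ax.axvline(peak_x, color="red")
```

Compared with the KDE-based solutions, this gives a coarser estimate that depends on the binning, but the line always coincides with the tallest bar that is actually drawn.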

Plot paraboloid surface fitting

How can I plot the paraboloid after fitting it in Python, so as to get that plot?
import numpy as np
import scipy.optimize as opt
import matplotlib.pyplot as plt
doex = [0.4,0.165,0.165,0.585,0.585]
doey = [.45, .22, .63, .22, .63]
doez = np.array([1, .99, .98,.97,.96])
def paraBolEqn(data,a,b,c,d):
    x,y = data
    return -(((x-b)/a)**2+((y-d)/c)**2)+1.0
popt,pcov=opt.curve_fit(paraBolEqn,np.vstack((doex,doey)),doez,p0=[1.5,0.4,1.5,0.4])
print(popt)
Everything you need to know is documented at the mplot3d tutorial, where the different methods to make 3d plots in matplotlib are presented.
Your desired plot can be reproduced using the methods Axes3D.plot_wireframe and Axes3D.scatter:
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111, projection='3d')
x, y = np.meshgrid(np.linspace(np.min(doex), np.max(doex),10), np.linspace(np.min(doey),np.max(doey), 10))
ax.plot_wireframe(x, y, paraBolEqn((x,y), *popt))
ax.scatter(doex, doey, doez, color='b')
which results in the following plot:

Drawing a logarithmic spiral in three axes in Python

I am trying to draw a logarithmic spiral in the form of a spring in three axes.
Using the parametric equations:
x=a*exp(b*th)*cos(th)
y=a*exp(b*th)*sin(th)
Using the code:
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
from math import exp,sin,cos
from pylab import *
mpl.rcParams['legend.fontsize'] = 10
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) was removed in Matplotlib 3.6
n=100
a=0.5
b=0.20
th=np.linspace(0, 500, 10000)
x=a*exp(b*th)*cos(th)
y=a*exp(b*th)*sin(th)
ax.plot(x, y)
ax.legend()
plt.show()
I get:
However, I would like to stretch the spiral along the Z axis to get a result similar to the following, but using the logarithmic spiral as the basis:
How can you do it? How do you modify the function by adding a condition to the Z axis?
Which z to use is largely up to you. It is hard to tell from the plot itself, but my guess is that it is linear (the simplest option).
Taking your code and adding the z axis, you can do something like this:
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
from math import exp,sin,cos
from pylab import *
mpl.rcParams['legend.fontsize'] = 10
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) was removed in Matplotlib 3.6
a=0.05
b=0.10
# took the liberty of reducing the max value for th
# as it was giving you values of the order of e42
th=np.linspace(0, 50, 10000)
x=a*exp(b*th)*cos(th)
y=a*exp(b*th)*sin(th)
z=np.linspace(0,2, 10000) # creating the z array with the same length as th
ax.plot(x, y, z) # adding z as an argument for the plot
ax.legend()
plt.show()
You can play with your a and b parameters to get the elliptical shape you want. You can also change the definition of z to make it grow exponentially, logarithmically, or in some other way entirely.
BTW, your imports are a bit redundant, and some functions from one package are probably being shadowed by another package.
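As a rough illustration of changing the definition of z, the sketch below compares the linear height with one possible exponential profile. The particular scaling of z_exp (running from 0 to 2 and growing at the same rate b as the radius) is an assumption chosen for the example, not something taken from the question.

```python
import numpy as np
import matplotlib.pyplot as plt

a, b = 0.05, 0.10
th = np.linspace(0, 50, 10000)

r = a * np.exp(b * th)   # radius of the logarithmic spiral
x = r * np.cos(th)
y = r * np.sin(th)

# Two illustrative height profiles, both scaled to run from 0 to 2.
z_linear = np.linspace(0, 2, th.size)
z_exp = 2 * (np.exp(b * th) - 1) / (np.exp(b * th[-1]) - 1)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot(x, y, z_linear, label="linear z")
ax.plot(x, y, z_exp, label="exponential z")
ax.legend()
```

With the exponential profile the turns stay close together near the centre and spread out as the radius grows, whereas the linear profile spaces them evenly in height.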
Since 95% of the points of the spiral are condensed in a single point in the middle of the plot it would make sense to restrict the plotted range to something like
th=np.linspace(475, 500, 10000)
A linear range of z values then gives the desired curve directly: pass it as the third argument to the plot function, plot(x, y, z).
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['legend.fontsize'] = 10
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) was removed in Matplotlib 3.6
a=0.5
b=0.20
th=np.linspace(475, 500, 10000)
x=a*np.exp(b*th)*np.cos(th)
y=a*np.exp(b*th)*np.sin(th)
z = np.linspace(0,2, len(th))
ax.plot(x, y, z)
#ax.legend()
plt.show()
Note that I cleaned up the imports here. E.g. if you import cos from math but later import everything (*) from pylab into the namespace, the function cos that is used is the numpy cos function, not the one from math (the math cos function would not work here anyways). In general: don't use pylab at all.
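The claim about most points collapsing into the centre can be checked numerically with the question's original parameters. This is a small sketch; the variable names fraction_inner and relative_radius are illustrative.

```python
import numpy as np

a, b = 0.5, 0.20
th = np.linspace(0, 500, 10000)
r = a * np.exp(b * th)   # radius at each angle

inner = th < 475
fraction_inner = inner.mean()                    # share of points before th = 475
relative_radius = r[inner].max() / r.max()       # their largest radius vs. the overall maximum

print(f"{fraction_inner:.1%} of the points lie within "
      f"{relative_radius:.2%} of the maximum radius")
```

About 95% of the samples fall below th = 475, and their radius is under 1% of the final radius (the ratio is roughly exp(-0.2 * 25)), which is why they appear as a single dot at the plot's scale.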
