How to change axis scale in Python?

More specifically, how do I change it to work like this graph? I've tried using plt.yscale(), but to no avail, as it only accepts certain preset values, and I didn't get very far with plt.axis. This code is a simple attempt at a linear regression on the values shown below; my coefficients (for a function A + Bx) were A = 38.99 and B = 2.055.
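import numpy as np
import numpy.polynomial.polynomial as P  # assumed import: this polyfit returns (intercept, slope)
import matplotlib.pyplot as plt
X=np.array([2,4,6,8,10])
Y=np.array([42.0,48.4,51.3,56.3,58.6])
A, B=P.polyfit(X,Y,1)
plt.plot(X,Y,'o')
plt.plot(X,A+B*X)
plt.yscale('linear')
plt.show()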
My graph comes out looking like graph2, which isn't wrong, but I got curious about how to make it look like the one above and just couldn't figure it out.

Here is a version using Matplotlib's object-oriented API. It pins the tick positions with FixedLocator and sets the axis limits explicitly.
import matplotlib as mpl
import matplotlib.pyplot as plt
# The next line is only needed when running in a Jupyter notebook.
%matplotlib inline
import numpy as np
X = np.array([2,4,6,8,10])
Y = np.array([42.0,48.4,51.3,56.3,58.6])
# np.polyfit returns the highest-degree coefficient first: slope B, then intercept A.
B, A = np.polyfit(X,Y,1)
fig, ax = plt.subplots()
ax.plot(X, Y, 'o')
ax.plot(X, A+B*X)
ax.xaxis.set_major_locator(mpl.ticker.FixedLocator([0, 2, 4, 6, 8, 10]))
ax.yaxis.set_major_locator(mpl.ticker.FixedLocator([40, 50, 60]))
ax.set(xlim=[0, 11], ylim=[40, 60], xlabel=r'Mass (kg) $->$', ylabel=r'Length (cm) $->$');
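As a cross-check on the coefficients quoted in the question, the closed-form least-squares formulas can be evaluated directly. This is only a sketch in plain Python; the helper name fit_line is hypothetical and is not part of NumPy or Matplotlib.

```python
# Closed-form simple linear regression for y = A + B*x.
# fit_line is a hypothetical helper written for this illustration.
def fit_line(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-deviation of x and y divided by the squared deviation of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

X = [2, 4, 6, 8, 10]
Y = [42.0, 48.4, 51.3, 56.3, 58.6]
A, B = fit_line(X, Y)
print(round(A, 2), round(B, 3))  # 38.99 2.055
```

The result agrees with the A and B given in the question, so the fitted line itself is fine and only the axis presentation needs changing.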


Python Seaborn lineplot - Smoothing with Hue [duplicate]

I've got the following simple script that plots a graph:
import matplotlib.pyplot as plt
import numpy as np
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])
plt.plot(T,power)
plt.show()
As it is now, the line goes straight from point to point, which looks OK but could be better in my opinion. What I want is to smooth the line between the points. In Gnuplot I would have plotted with smooth csplines.
Is there an easy way to do this in PyPlot? I've found some tutorials, but they all seem rather complex.
You could use scipy.interpolate.spline to smooth out your data yourself:
from scipy.interpolate import spline
# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)
power_smooth = spline(T, power, xnew)
plt.plot(xnew,power_smooth)
plt.show()
spline was deprecated in SciPy 0.19.0 and removed in SciPy 1.3.0; use make_interp_spline (which returns a BSpline) instead.
Switching from spline to BSpline isn't a straightforward copy/paste and requires a little tweaking:
from scipy.interpolate import make_interp_spline, BSpline
# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)
spl = make_interp_spline(T, power, k=3) # type: BSpline
power_smooth = spl(xnew)
plt.plot(xnew, power_smooth)
plt.show()
(Before and after plots not reproduced here.)
For this example the spline works well, but if the function is not inherently smooth and you still want a smoothed version, you can also try:
from scipy.ndimage import gaussian_filter1d
# x and y are your data arrays
ysmoothed = gaussian_filter1d(y, sigma=2)
plt.plot(x, ysmoothed)
plt.show()
Increasing sigma gives a smoother curve.
Proceed with caution with this one. It modifies the original values and may not be what you want.
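To make that caution concrete, here is a minimal pure-Python sketch of Gaussian-weighted averaging. It is not SciPy's implementation: the function name gaussian_smooth is hypothetical, and the edge handling (simply dropping neighbours that fall outside the list) is a simplification that is not claimed to match gaussian_filter1d.

```python
import math

def gaussian_smooth(values, sigma):
    """Weighted moving average with Gaussian weights (illustrative only)."""
    radius = int(4 * sigma)  # ignore weights beyond 4 standard deviations
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        total = 0.0
        norm = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(values):  # skip neighbours that fall off either end
                total += w * values[j]
                norm += w
        smoothed.append(total / norm)
    return smoothed

spike = [0, 0, 0, 10, 0, 0, 0]
result = gaussian_smooth(spike, sigma=1)
print([round(v, 2) for v in result])
```

The peak of the spike comes out lower than 10 and its neighbours come out above 0, so the smoothed curve no longer passes through the measured points. An interpolating spline, by contrast, does pass through them.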
See the scipy.interpolate documentation for some examples.
The following example demonstrates its use, for linear and cubic spline interpolation:
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d
# Define x, y, and xnew to resample at.
x = np.linspace(0, 10, num=11, endpoint=True)
y = np.cos(-x**2/9.0)
xnew = np.linspace(0, 10, num=41, endpoint=True)
# Define interpolators.
f_linear = interp1d(x, y)
f_cubic = interp1d(x, y, kind='cubic')
# Plot.
plt.plot(x, y, 'o', label='data')
plt.plot(xnew, f_linear(xnew), '-', label='linear')
plt.plot(xnew, f_cubic(xnew), '--', label='cubic')
plt.legend(loc='best')
plt.show()
Slightly modified for increased readability.
One of the easiest implementations I found was to use the exponential moving average that TensorBoard uses:
from typing import List
def smooth(scalars: List[float], weight: float) -> List[float]:  # Weight between 0 and 1
    last = scalars[0]  # First value in the plot (first timestep)
    smoothed = list()
    for point in scalars:
        smoothed_val = last * weight + (1 - weight) * point  # Calculate smoothed value
        smoothed.append(smoothed_val)  # Save it
        last = smoothed_val  # Anchor the last smoothed value
    return smoothed
# ax, x_labels and train_data come from the surrounding plotting code
ax.plot(x_labels, smooth(train_data, .9), x_labels, train_data)
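To see what the weight does, the same recurrence can be run on a small step input. This is a sketch only; the step list is made-up illustration data, not taken from the thread.

```python
from typing import List

def smooth(scalars: List[float], weight: float) -> List[float]:
    """Exponential moving average; weight is between 0 and 1."""
    last = scalars[0]  # start from the first value
    smoothed = []
    for point in scalars:
        # Blend the previous smoothed value with the new point.
        smoothed_val = last * weight + (1 - weight) * point
        smoothed.append(smoothed_val)
        last = smoothed_val
    return smoothed

step = [0.0, 0.0, 1.0, 1.0, 1.0]  # hypothetical data: a jump from 0 to 1
print(smooth(step, 0.5))  # [0.0, 0.0, 0.5, 0.75, 0.875]
print(smooth(step, 0.0))  # [0.0, 0.0, 1.0, 1.0, 1.0]
```

A weight of 0 returns the data unchanged, while larger weights make the curve lag further behind the jump.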
I presume you mean curve-fitting and not anti-aliasing from the context of your question. PyPlot doesn't have any built-in support for this, but you can easily implement some basic curve-fitting yourself, like the code seen here, or if you're using GuiQwt it has a curve fitting module. (You could probably also steal the code from SciPy to do this as well).
Here is a simple solution for dates:
from scipy.interpolate import make_interp_spline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dates
from datetime import datetime
data = {
    datetime(2016, 9, 26, 0, 0): 26060, datetime(2016, 9, 27, 0, 0): 23243,
    datetime(2016, 9, 28, 0, 0): 22534, datetime(2016, 9, 29, 0, 0): 22841,
    datetime(2016, 9, 30, 0, 0): 22441, datetime(2016, 10, 1, 0, 0): 23248
}
# create data
date_np = np.array(list(data.keys()))
value_np = np.array(list(data.values()))
date_num = dates.date2num(date_np)
# smooth
date_num_smooth = np.linspace(date_num.min(), date_num.max(), 100)
spl = make_interp_spline(date_num, value_np, k=3)
value_np_smooth = spl(date_num_smooth)
# plot
plt.plot(date_np, value_np)
plt.plot(dates.num2date(date_num_smooth), value_np_smooth)
plt.show()
It's worth your time looking at seaborn for plotting smoothed lines.
The seaborn lmplot function will plot data and regression model fits.
The following illustrates both polynomial and lowess fits:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])
df = pd.DataFrame(data = {'T': T, 'power': power})
sns.lmplot(x='T', y='power', data=df, ci=None, order=4, truncate=False)
sns.lmplot(x='T', y='power', data=df, ci=None, lowess=True, truncate=False)
The order = 4 polynomial fit is overfitting this toy dataset. I don't show it here but order = 2 and order = 3 gave worse results.
The lowess = True fit is underfitting this tiny dataset but may give better results on larger datasets.
Check the seaborn regression tutorial for more examples.
Another option is LOWESS smoothing from statsmodels; how much it alters the curve depends on the parameters you use:
from statsmodels.nonparametric.smoothers_lowess import lowess
def smoothing(x, y):
    lowess_frac = 0.15  # fraction of the data (0 to 1) used for each local estimate, i.e. the smoothing window
    lowess_it = 0  # number of residual-based reweighting iterations
    x_smooth = x
    y_smooth = lowess(y, x, is_sorted=False, frac=lowess_frac, it=lowess_it, return_sorted=False)
    return x_smooth, y_smooth
That was better suited than other answers for my specific application case.

Changing fig size with statsmodel

I am trying to make QQ-plots using the statsmodels package. However, the resolution of the figure is so low that I could not possibly use the results in a presentation.
I know that to make a NetworkX graph plot at a higher resolution I can use:
plt.figure( figsize=(N,M) )
networkx.draw(G)
and change the values of N and M to get the desired result.
However, when I try the same method with a QQ-plot from the statsmodels package, it seems to have no impact on the size of the resulting figure, i.e., when I use
plt.Figure( figsize = (N,M) )
statsmodels.qqplot_2samples(sample1, sample2, line = 'r')
changing M and N has no effect on the figure size. Any ideas on how to fix this (and why this method isn't working)?
You can use mpl.rc_context to temporarily set the default figsize before plotting.
import numpy as np
import matplotlib as mpl
from statsmodels.graphics.gofplots import qqplot_2samples
np.random.seed(10)
sample1 = np.random.rand(10)
sample2 = np.random.rand(10)
n, m = 6, 6
with mpl.rc_context():
    mpl.rc("figure", figsize=(n,m))
    qqplot_2samples(sample1, sample2, line = 'r')
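This is a great solution and works for other plots too - I upvoted it. Here is the implementation for acf and pacf plots.
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf
N, M = 12, 6
fig, ax = plt.subplots(figsize=(N, M))
# df2 is the series being analysed (defined elsewhere)
plot_pacf(df2, lags = 40, title='Daily Female Births', ax=ax)
plt.show()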
The qqplot_2samples function has an ax parameter which allows you to specify
a matplotlib axes object on which the plot should be drawn. If you don't supply
the ax, then a new axes object is created for you.
So, as an alternative to cel's solution, if you wish to create your own figure,
then you should also pass the figure's axes object to qqplot_2samples:
sm.qqplot_2samples(sample1, sample2, line='r', ax=ax)
For example,
import scipy.stats as stats
import matplotlib.pyplot as plt
import statsmodels.api as sm
N, M = 6, 5
fig, ax = plt.subplots(figsize=(N, M))
sample1 = stats.norm.rvs(5, size=1000)
sample2 = stats.norm.rvs(10, size=1000)
sm.qqplot_2samples(sample1, sample2, line='r', ax=ax)
plt.show()
Just use plt.rc("figure", figsize=(16,8)) before plotting. Note that this changes the default size for every figure created afterwards, not only the next one.
I used plt.rc()
plt.rc("figure", figsize=(10,6))
sm.graphics.tsa.plot_acf(nifty_50['close_price'], lags=36000);

Create a Student-Age graph in Python-Matplotlib

import matplotlib.pyplot as plt
x = ['Eric','Jhon','bill','Daniel']
y = [10, 17, 12.5, 20]
plt.plot(x,y)
plt.show()
When I run this code, I get this error:
ValueError: could not convert string to float:
I want the names in list x along the x-axis, with the corresponding ages from list y, for use in a bar graph.
I also have one more question:
Is this a good way to structure the data in my case, or would a list of (name, age) tuples be easier? Or something else?
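The error occurs because the matplotlib version you are running expects numerical data but you're providing strings (the names). Recent Matplotlib releases accept strings as categorical values, so this error is specific to older versions.
What you can do instead is plot your data against numerical positions and then replace the ticks on the x-axis using plt.xticks as below.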
import matplotlib.pyplot as plt
names = ['Eric','John','Bill','Daniel']
x = range(len(names))
y = [10, 17, 12.5, 20]
plt.plot(x, y)
plt.xticks(x, names)
plt.show()
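On the second part of the question, a list of (name, age) tuples works too and keeps each name next to its age. A minimal sketch, assuming a hypothetical students list built from the values in the question:

```python
# Hypothetical (name, age) records, using the values from the question.
students = [('Eric', 10), ('John', 17), ('Bill', 12.5), ('Daniel', 20)]

# zip(*...) transposes the list of pairs into one tuple per column.
names, ages = zip(*students)
positions = list(range(len(names)))  # numeric x positions for the bars

print(names)      # ('Eric', 'John', 'Bill', 'Daniel')
print(ages)       # (10, 17, 12.5, 20)
print(positions)  # [0, 1, 2, 3]
```

The positions and ages can then be passed to plt.bar(positions, ages), with plt.xticks(positions, names) supplying the labels as in the answer above.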

pyplot: loglog() with base e

Python (and matplotlib) newbie here coming over from R, so I hope this question is not too idiotic. I'm trying to make a loglog plot on a natural log scale. But after some googling I cannot somehow figure out how to force pyplot to use a base e scale on the axes. The code I have currently:
import matplotlib.pyplot as pyplot
import math
e = math.exp(1)
pyplot.loglog(range(1,len(degrees)+1),degrees,'o',basex=e,basey=e)
Where degrees is a vector of counts at each value of range(1,len(degrees)+1). For some reason when I run this code, pyplot keeps giving me a plot with powers of 2 on the axes. I feel like this ought to be easy, but I'm stumped...
Any advice is greatly appreciated!
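When plotting with plt.loglog you can set the logarithm base through keyword arguments. Matplotlib 3.3 and later take a single base argument that applies to both axes; older releases used basex and basey instead.
From numpy you can get the e constant with numpy.e (or np.e if you import numpy as np).
import numpy as np
import matplotlib.pyplot as plt
# Generate some data.
x = np.linspace(0, 2, 1000)
y = x**np.e
# On Matplotlib older than 3.3, use basex=np.e, basey=np.e instead of base.
plt.loglog(x, y, base=np.e)
plt.show()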
Edit
Additionally if you want pretty looking ticks you can use matplotlib.ticker to choose the format of your ticks, an example of which is given below.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
x = np.linspace(1, 4, 1000)
y = x**3
fig, ax = plt.subplots()
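# On Matplotlib older than 3.3, use basex=np.e, basey=np.e instead of base.
ax.loglog(x, y, base=np.e)
def ticks(y, pos):
    # Doubled braces keep multi-character exponents grouped in the label.
    return r'$e^{{{:.0f}}}$'.format(np.log(y))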
ax.xaxis.set_major_formatter(mtick.FuncFormatter(ticks))
ax.yaxis.set_major_formatter(mtick.FuncFormatter(ticks))
plt.show()
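The label function can be tried on its own, without drawing anything, to see what strings it produces. A small sketch using only the standard library; the name e_label is hypothetical, and the doubled braces are there so that exponents such as -2 or 10 stay grouped in the math text.

```python
import math

def e_label(value, pos=None):
    """Format a tick value as a power of e; pos is unused but FuncFormatter passes it."""
    exponent = math.log(value)
    # {{ and }} emit literal braces around the formatted exponent.
    return r'$e^{{{:.0f}}}$'.format(exponent)

print(e_label(math.e ** 3))   # $e^{3}$
print(e_label(math.e ** -2))  # $e^{-2}$
print(e_label(1.0))           # $e^{0}$
```

A function with this signature can be wrapped in matplotlib.ticker.FuncFormatter in the same way as ticks above.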
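This also works for semilogx and semilogy: the ticks can be shown as powers of e and relabelled in the same way.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
fig, ax = plt.subplots()
def ticks(y, pos):
    return r'$e^{{{:.0f}}}$'.format(np.log(y))
# Time_Series and California_Pervalence are the poster's own data arrays.
# On Matplotlib older than 3.3, use basey=np.e instead of base.
ax.semilogy(Time_Series, California_Pervalence, 'gray', base=np.e)
ax.yaxis.set_major_formatter(mtick.FuncFormatter(ticks))
plt.show()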
