Changing figure size with statsmodels - python

I am trying to make QQ-plots using the statsmodels package. However, the resolution of the figure is so low that I cannot use the results in a presentation.
I know that to make networkX graph plot a higher resolution image I can use:
plt.figure( figsize=(N,M) )
networkx.draw(G)
and change the values of N and M to attain desirable results.
However, when I try the same method with a QQ-plot from the statsmodels package, it seems to have no impact on the size of the resulting figure, i.e., when I use
plt.Figure( figsize = (N,M) )
statsmodels.qqplot_2samples(sample1, sample2, line = 'r')
changing M and N has no effect on the figure size. Any ideas on how to fix this (and why this method isn't working)?

You can use mpl.rc_context to temporarily set the default figsize before plotting.
import numpy as np
import matplotlib as mpl
from statsmodels.graphics.gofplots import qqplot_2samples
np.random.seed(10)
sample1 = np.random.rand(10)
sample2 = np.random.rand(10)
n, m = 6, 6
with mpl.rc_context():
    mpl.rc("figure", figsize=(n,m))
    qqplot_2samples(sample1, sample2, line = 'r')
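To illustrate what "temporarily" means here, the following is a minimal sketch using only matplotlib (no statsmodels). It assumes the plotting function creates its own figure when no axes is passed, so the new figure picks up the default figsize that is active inside the context. The names default_size, fig_inside, size_inside, and size_after are made up for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib as mpl
import matplotlib.pyplot as plt

# Remember the default figure size that is active before the context.
default_size = tuple(mpl.rcParams["figure.figsize"])

# Inside the context, figures created without an explicit figsize are 6x6 inches.
with mpl.rc_context({"figure.figsize": (6, 6)}):
    fig_inside = plt.figure()
    size_inside = tuple(float(v) for v in fig_inside.get_size_inches())

# After the context exits, the previous default is restored.
size_after = tuple(mpl.rcParams["figure.figsize"])

print(size_inside, size_after == default_size)
plt.close(fig_inside)
```

In contrast, calling plt.rc("figure", figsize=...) outside of a context changes the default for every figure created afterwards in the session.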

This is a great solution and works for other plots too - I upvoted it. Here is the implementation for acf and pacf plots.
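import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf
N, M = 12, 6
fig, ax = plt.subplots(figsize=(N, M))
# df2 is the poster's own time series (not shown in the post)
plot_pacf(df2, lags = 40, title='Daily Female Births', ax=ax)
plt.show()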

The qqplot_2samples function has an ax parameter which allows you to specify
a matplotlib axes object on which the plot should be drawn. If you don't supply
the ax, then a new axes object is created for you.
So, as an alternative to cel's solution, if you wish to create your own figure,
then you should also pass the figure's axes object to qqplot_2samples:
sm.qqplot_2samples(sample1, sample2, line='r', ax=ax)
For example,
import scipy.stats as stats
import matplotlib.pyplot as plt
import statsmodels.api as sm
N, M = 6, 5
fig, ax = plt.subplots(figsize=(N, M))
sample1 = stats.norm.rvs(5, size=1000)
sample2 = stats.norm.rvs(10, size=1000)
sm.qqplot_2samples(sample1, sample2, line='r', ax=ax)
plt.show()

Just use plt.rc("figure", figsize=(16,8)) before plotting.

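I used plt.rc():
import matplotlib.pyplot as plt
import statsmodels.api as sm
plt.rc("figure", figsize=(10,6))
# nifty_50 is the poster's own DataFrame (not shown in the post)
sm.graphics.tsa.plot_acf(nifty_50['close_price'], lags=36000);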


How to change the number of size categories in seaborn scatterplot

I tried hard to look through all the documentation and examples but I am not able to figure it out. How do I change the number of categories = the number of size bubbles, and their boundaries in seaborn scatterplot? The sizes parameter doesn't help here.
It always gives me 6 of them regardless of what I try (here 8, 16, ..., 48):
import seaborn as sns
tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip", size="total_bill")
or
penguins = sns.load_dataset("penguins")
sns.scatterplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", size="body_mass_g")
And how do I change their boundaries? I.e., what if I want to have 10, 20, 30, 40, 50 in the first case or 3000, 4000, 5000, 6000 in the second?
I know that going around and creating another column in the dataframe works but that is not wanted (adds unnecessary columns and even if I do it on the fly, it's just not what I am looking for).
Workaround:
def myfunc(mass):
    if mass < 3500:
        return 3000
    elif mass < 4500:
        return 4000
    elif mass < 5500:
        return 5000
    return 6000
penguins["mass"] = penguins.apply(lambda x: myfunc(x['body_mass_g']), axis=1)
sns.scatterplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", size="mass")
I don't think seaborn offers fine-grained control over this; it tries to come up with something that works intuitively in many situations, but not in all. The legend='full' parameter shows all values of the size column, but that can be overwhelming.
The suggestion to create a new column with binned sizes has the drawback that this will also change the sizes used in the scatterplot.
An approach could be to create your own custom legend. Note that when the legend also contains other elements, this approach needs to be adapted a bit.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
tips = sns.load_dataset("tips")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", size="total_bill", legend='full')
handles, labels = ax.get_legend_handles_labels()
labels = np.array([float(l) for l in labels])
desired_labels = [10, 20, 30, 40, 50]
desired_handles = [handles[np.argmin(np.abs(labels - d))] for d in desired_labels]
ax.legend(handles=desired_handles, labels=desired_labels, title=ax.legend_.get_title().get_text())
plt.show()
The code can be wrapped into a function, and e.g. applied to the penguins:
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
def sizes_legend(desired_sizes, ax=None):
    ax = ax or plt.gca()
    handles, labels = ax.get_legend_handles_labels()
    labels = np.array([float(l) for l in labels])
    desired_handles = [handles[np.argmin(np.abs(labels - d))] for d in desired_sizes]
    ax.legend(handles=desired_handles, labels=desired_sizes, title=ax.legend_.get_title().get_text())
penguins = sns.load_dataset("penguins")
ax = sns.scatterplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", size="body_mass_g", legend='full')
sizes_legend([3000, 4000, 5000, 6000], ax)
plt.show()

Gain plot using seaborn - matplotlib

I am generating gain plot based on the following example data in Matplotlib.
M_GRP_1   F_GRP_1   GRP_1     GAIN_GRP_1
0.036796  0.067024  0.058878  0.624948
0.000093  0.000087  0.000089  1.043674
0.000316  0.0002    0.000231  1.366149
0.011152  0.008329  0.00909   1.226813
0.001227  0.000747  0.000876  1.400792
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
fig.set_size_inches([18, 9])
ax.plot(np.linspace(0,1),np.linspace(0,1), color = 'black', linewidth = 2)
D = d.sort_values('GRP_1', ascending = False).cumsum()
ax.plot(D.iloc[:,2], D.iloc[:,0], color = 'orange', linewidth = 2)
plt.xlabel('Percentage of total data')
plt.ylabel('Gain')
plt.title ('Target groups :: GRP_1')
plt.legend(['Baseline','Male'])
plt.grid(True)
plt.show()
However, I want to generate the same plot using seaborn. I am wondering how I can do that, as I'm not familiar with it.
Can anybody suggest how to do this?
Thanks in advance.
Seaborn is based on matplotlib, so most of your code is the same.
Just import seaborn as sns and replace ax.plot by sns.lineplot.
You may also want to add sns.set_theme() (or sns.set() prior to version 0.11.0) to apply seaborn default styles.

How to add axis offset in matplotlib plot?

I'm drawing several point plots in seaborn on the same graph. The x-axis is ordinal, not numerical; the ordinal values are the same for each point plot. I would like to shift each plot a bit to the side, the way pointplot(dodge=...) parameter does within multiple lines within a single plot, but in this case for multiple different plots drawn on top of each other. How can I do that?
Ideally, I'd like a technique that works for any matplotlib plot, not just seaborn specifically. Adding an offset to the data won't work easily, since the data is not numerical.
Example that shows the plots overlapping and making them hard to read (dodge within each plot works okay)
import pandas as pd
import seaborn as sns
df1 = pd.DataFrame({'x':list('ffffssss'), 'y':[1,2,3,4,5,6,7,8], 'h':list('abababab')})
df2 = df1.copy()
df2['y'] = df2['y']+0.5
sns.pointplot(data=df1, x='x', y='y', hue='h', ci='sd', errwidth=2, capsize=0.05, dodge=0.1, markers='<')
sns.pointplot(data=df2, x='x', y='y', hue='h', ci='sd', errwidth=2, capsize=0.05, dodge=0.1, markers='>')
I could use something other than seaborn, but the automatic confidence / error bars are very convenient so I'd prefer to stick with seaborn here.
Answering this for the most general case first.
A dodge can be implemented by shifting the artists in the figure by some amount. It might be useful to use points as units of that shift. E.g. you may want to shift your markers on the plot by 5 points.
This shift can be accomplished by adding a translation to the data transform of the artist. Here I propose a ScaledTranslation.
Now to keep this most general, one may write a function which takes the plotting method, the axes and the data as input, and in addition some dodge to apply, e.g.
draw_dodge(ax.errorbar, X, y, yerr =y/4., ax=ax, dodge=d, marker="d" )
The full functional code:
import matplotlib.pyplot as plt
from matplotlib import transforms
import numpy as np
import pandas as pd
def draw_dodge(*args, **kwargs):
    func = args[0]
    dodge = kwargs.pop("dodge", 0)
    ax = kwargs.pop("ax", plt.gca())
    trans = ax.transData + transforms.ScaledTranslation(dodge/72., 0,
                                       ax.figure.dpi_scale_trans)
    artist = func(*args[1:], **kwargs)
    def iterate(artist):
        if hasattr(artist, '__iter__'):
            for obj in artist:
                iterate(obj)
        else:
            artist.set_transform(trans)
    iterate(artist)
    return artist
X = ["a", "b"]
Y = np.array([[1,2],[2,2],[3,2],[1,4]])
Dodge = np.arange(len(Y),dtype=float)*10
Dodge -= Dodge.mean()
fig, ax = plt.subplots()
for y,d in zip(Y,Dodge):
    draw_dodge(ax.errorbar, X, y, yerr =y/4., ax=ax, dodge=d, marker="d" )
ax.margins(x=0.4)
plt.show()
You may use this with ax.plot, ax.scatter, etc., but not with the seaborn functions, because they don't return any useful artist to work with.
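For a single ax.plot call, the same shift can also be applied without the helper by passing the shifted transform directly through the transform keyword. This is a minimal sketch of that variant, not the answer's own code; offset_transform is a hypothetical helper name, and the shift of 5 points is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import transforms

fig, ax = plt.subplots()
x = ["a", "b", "c"]  # ordinal (categorical) x values


def offset_transform(ax, points):
    """Return the data transform shifted horizontally by `points` (1 point = 1/72 inch)."""
    shift = transforms.ScaledTranslation(points / 72.0, 0, ax.figure.dpi_scale_trans)
    return ax.transData + shift


# Two series on the same categories, dodged 5 points to the left and to the right.
line_left, = ax.plot(x, [1, 2, 3], "o-", transform=offset_transform(ax, -5))
line_right, = ax.plot(x, [1.5, 2.5, 3.5], "s-", transform=offset_transform(ax, 5))
ax.margins(x=0.4)
```

Since the translation is expressed in points rather than data units, the dodge keeps the same on-screen size when the axes limits change.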
Now for the case in question, the remaining problem is to get the data in a useful format. One option would be the following.
df1 = pd.DataFrame({'x':list('ffffssss'),
'y':[1,2,3,4,5,6,7,8],
'h':list('abababab')})
df2 = df1.copy()
df2['y'] = df2['y']+0.5
N = len(np.unique(df1["x"].values))*len([df1,df2])
Dodge = np.linspace(-N,N,N)/N*10
fig, ax = plt.subplots()
k = 0
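for df in [df1,df2]:
    for (n, grp) in df.groupby("h"):
        # select the numeric column "y" explicitly; recent pandas versions
        # raise an error when averaging the string column "h"
        x = grp.groupby("x")["y"].mean()
        std = grp.groupby("x")["y"].std()
        draw_dodge(ax.errorbar, x.index, x.values,
                   yerr =std.values.flatten(), ax=ax,
                   dodge=Dodge[k], marker="o", label=n)
        k+=1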
ax.legend()
ax.margins(x=0.4)
plt.show()
You can use linspace to easily shift your graphs to where you want them to start and end. It also makes it easy to scale the graphs so that they are visually the same width.
import numpy as np
import matplotlib.pyplot as plt
start_offset = 3
end_offset = start_offset
y1 = np.random.randint(0, 10, 20)  # y1 has 20 random ints in [0, 10)
y2 = np.random.randint(0, 10, 10)  # y2 has 10 random ints in [0, 10)
x1 = np.linspace(0, 20, y1.size)  # y1.size evenly spaced x positions from 0 to 20
x2 = np.linspace(0, 20, y2.size)
plt.plot(x1, y1)
plt.plot(x2, y2)
plt.show()
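The snippet above defines start_offset and end_offset but never uses them. One possible reading of the answer's intent, offered here only as an assumption, is that the offsets are meant to move the start and end points of the second curve's x positions, as in this sketch (x_max is a name introduced for the illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
start_offset = 3
end_offset = start_offset
x_max = 20

y1 = rng.integers(0, 10, 20)  # 20 random ints in [0, 10)
y2 = rng.integers(0, 10, 10)  # 10 random ints in [0, 10)

# First curve spans the full range; second curve is inset by the offsets.
x1 = np.linspace(0, x_max, y1.size)
x2 = np.linspace(start_offset, x_max - end_offset, y2.size)

fig, ax = plt.subplots()
ax.plot(x1, y1)
ax.plot(x2, y2)
```

This only works for numerical x values, so for an ordinal axis the categories would first have to be mapped to numbers.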

Using Colormaps to set color of line in matplotlib

How does one set the color of a line in matplotlib with scalar values provided at run time using a colormap (say jet)? I tried a couple of different approaches here and I think I'm stumped. values[] is a sorted array of scalars, curves is a set of 1-d arrays, and labels is an array of text strings. All of the arrays have the same length.
fig = plt.figure()
ax = fig.add_subplot(111)
jet = colors.Colormap('jet')
cNorm = colors.Normalize(vmin=0, vmax=values[-1])
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=jet)
lines = []
for idx in range(len(curves)):
    line = curves[idx]
    colorVal = scalarMap.to_rgba(values[idx])
    retLine, = ax.plot(line, color=colorVal)
    #retLine.set_color()
    lines.append(retLine)
ax.legend(lines, labels, loc='upper right')
ax.grid()
plt.show()
The error you are receiving is due to how you define jet. You are creating the base class Colormap with the name 'jet', but this is very different from getting the default definition of the 'jet' colormap. This base class should never be created directly, and only the subclasses should be instantiated.
What you've found with your example is a buggy behavior in Matplotlib. There should be a clearer error message generated when this code is run.
This is an updated version of your example:
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx
import numpy as np
# define some random data that emulates your intended code:
NCURVES = 10
np.random.seed(101)
curves = [np.random.random(20) for i in range(NCURVES)]
values = range(NCURVES)
fig = plt.figure()
ax = fig.add_subplot(111)
# replace the next line
#jet = colors.Colormap('jet')
# with
jet = cm = plt.get_cmap('jet')
cNorm = colors.Normalize(vmin=0, vmax=values[-1])
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=jet)
print(scalarMap.get_clim())
lines = []
for idx in range(len(curves)):
    line = curves[idx]
    colorVal = scalarMap.to_rgba(values[idx])
    colorText = (
        'color: (%4.2f,%4.2f,%4.2f)'%(colorVal[0],colorVal[1],colorVal[2])
    )
    retLine, = ax.plot(line,
                       color=colorVal,
                       label=colorText)
    lines.append(retLine)
#added this to get the legend to work
handles,labels = ax.get_legend_handles_labels()
ax.legend(handles, labels, loc='upper right')
ax.grid()
plt.show()
Using a ScalarMappable is an improvement over the approach presented in my related answer:
creating over 20 unique legend colors using matplotlib
I thought it would be beneficial to include what I consider to be a simpler method using numpy's linspace coupled with matplotlib's cm-type object. It's possible that the above solution is for an older version. I am using Python 3.4.3, matplotlib 1.4.3, and numpy 1.9.3, and my solution is as follows.
import matplotlib.pyplot as plt
from matplotlib import cm
from numpy import linspace
start = 0.0
stop = 1.0
number_of_lines= 1000
cm_subsection = linspace(start, stop, number_of_lines)
colors = [ cm.jet(x) for x in cm_subsection ]
for i, color in enumerate(colors):
    plt.axhline(i, color=color)
plt.ylabel('Line Number')
plt.show()
This results in 1000 uniquely-colored lines that span the entire cm.jet colormap as pictured below. If you run this script you'll find that you can zoom in on the individual lines.
Now say I want my 1000 line colors to just span the greenish portion between lines 400 to 600. I simply change my start and stop values to 0.4 and 0.6 and this results in using only 20% of the cm.jet color map between 0.4 and 0.6.
So in a one line summary you can create a list of rgba colors from a matplotlib.cm colormap accordingly:
colors = [ cm.jet(x) for x in linspace(start, stop, number_of_lines) ]
In this case I use the commonly invoked map named jet but you can find the complete list of colormaps available in your matplotlib version by invoking:
>>> from matplotlib import cm
>>> dir(cm)
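The sub-range variant described above (start and stop set to 0.4 and 0.6) can be sketched as follows. The block uses plt.get_cmap("jet") to look the colormap up by name, which is a small substitution for the cm.jet attribute used in the answer, and it draws onto an explicit Axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

cmap = plt.get_cmap("jet")
number_of_lines = 1000
start, stop = 0.4, 0.6  # use only the greenish 20% of the colormap

# One RGBA tuple per line, sampled evenly between start and stop.
colors = [cmap(x) for x in np.linspace(start, stop, number_of_lines)]

fig, ax = plt.subplots()
for i, color in enumerate(colors):
    ax.axhline(i, color=color)
ax.set_ylabel("Line Number")
```

Any other colormap name can be passed to plt.get_cmap in the same way.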
A combination of line styles, markers, and qualitative colors from matplotlib:
import itertools
import matplotlib as mpl
import matplotlib.pyplot as plt
N = 8*4+10
l_styles = ['-','--','-.',':']
m_styles = ['','.','o','^','*']
colormap = mpl.cm.Dark2.colors # Qualitative colormap
for i,(marker,linestyle,color) in zip(range(N),itertools.product(m_styles,l_styles, colormap)):
    plt.plot([0,1,2],[0,2*i,2*i], color=color, linestyle=linestyle,marker=marker,label=i)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,ncol=4);
UPDATE: Supporting not only ListedColormap, but also LinearSegmentedColormap
import itertools
import matplotlib.pyplot as plt
Ncolors = 8
#colormap = plt.cm.Dark2# ListedColormap
colormap = plt.cm.viridis# LinearSegmentedColormap
Ncolors = min(colormap.N,Ncolors)
mapcolors = [colormap(int(x*colormap.N/Ncolors)) for x in range(Ncolors)]
N = Ncolors*4+10
l_styles = ['-','--','-.',':']
m_styles = ['','.','o','^','*']
fig,ax = plt.subplots(gridspec_kw=dict(right=0.6))
for i,(marker,linestyle,color) in zip(range(N),itertools.product(m_styles,l_styles, mapcolors)):
    ax.plot([0,1,2],[0,2*i,2*i], color=color, linestyle=linestyle,marker=marker,label=i)
ax.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,ncol=3,prop={'size': 8})
You may do it the way I described from my deleted account (there was a ban on new posts). It's rather simple and looks nice.
I usually use the third of the three versions below; I have not tested versions 1 and 2.
import matplotlib.pyplot as plt
from matplotlib.pyplot import cm
import numpy as np
# n should be the number of curves to plot: n = len(array_of_curves_to_plot)
# x, y and ax1 are the poster's own data and Axes (not shown in the post)
# version 1:
color = cm.rainbow(np.linspace(0, 1, n))
for i, c in zip(range(n), color):
    ax1.plot(x, y, c=c)
# or version 2 - faster and better:
color = iter(cm.rainbow(np.linspace(0, 1, n)))
c = next(color)
plt.plot(x, y, c=c)
# or version 3:
color = iter(cm.rainbow(np.linspace(0, 1, n)))
for i in range(n):
    c = next(color)
    ax1.plot(x, y, c=c)
Example of version 3 (figure not shown): Ship RAO of Roll vs Ikeda damping in function of Roll amplitude A44

Python-Matplotlib boxplot. How to show percentiles 0,10,25,50,75,90 and 100?

I would like to plot an EPSgram (see below) using Python and Matplotlib.
The boxplot function only plots quartiles (0, 25, 50, 75, 100). So, how can I add two more boxes?
I put together a sample, if you're still curious. It uses scipy.stats.scoreatpercentile, but you may be getting those numbers from elsewhere:
from random import random
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import scoreatpercentile
x = np.array([random() for _ in range(100)])
# percentiles of interest
perc = [min(x), scoreatpercentile(x,10), scoreatpercentile(x,25),
        scoreatpercentile(x,50), scoreatpercentile(x,75),
        scoreatpercentile(x,90), max(x)]
midpoint = 0 # time-series time
fig = plt.figure()
ax = fig.add_subplot(111)
# min/max
ax.broken_barh([(midpoint-.01,.02)], (perc[0], perc[1]-perc[0]))
ax.broken_barh([(midpoint-.01,.02)], (perc[5], perc[6]-perc[5]))
# 10/90
ax.broken_barh([(midpoint-.1,.2)], (perc[1], perc[2]-perc[1]))
ax.broken_barh([(midpoint-.1,.2)], (perc[4], perc[5]-perc[4]))
# 25/75
ax.broken_barh([(midpoint-.4,.8)], (perc[2], perc[3]-perc[2]))
ax.broken_barh([(midpoint-.4,.8)], (perc[3], perc[4]-perc[3]))
ax.set_ylim(-0.5,1.5)
ax.set_xlim(-10,10)
ax.set_yticks([0,0.5,1])
ax.grid(True)
plt.show()
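The answer notes that the percentile values may come from elsewhere. As one such alternative, np.percentile can compute all seven levels in a single call. This is a small sketch with randomly generated data, and the names levels and rng are introduced only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100)

# 0 and 100 correspond to the minimum and maximum of the sample.
levels = [0, 10, 25, 50, 75, 90, 100]
perc = np.percentile(x, levels)

for level, value in zip(levels, perc):
    print(f"{level:3d}th percentile: {value:.3f}")
```

The resulting perc array can be indexed exactly like the perc list in the plotting code above.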
