Seaborn -- map diagonal with asymmetric axes - python

I have some data where it makes sense to plot certain, but not all, variables against each other, and in particular where it only makes sense to plot KDEs for certain variables. This seems like a good use case for seaborn's PairGrid. However, I cannot use sb.PairGrid.map_diag for the variables for which I do want a KDE, even though it seems like I ought to be able to.
The following code works as I imagine it would:
import seaborn as sb
import pandas as pd
iris=sb.load_dataset('iris')
pgiris = sb.PairGrid(data=iris,
x_vars=['sepal_width','petal_width','sepal_length'],
y_vars=['sepal_width','petal_width','sepal_length'])
pgiris.map_diag(sb.kdeplot)
Let's imagine, though, that it doesn't make sense to plot sepal_length on both axes:
pgiris=sb.PairGrid(data=iris,x_vars=['sepal_width','petal_width','sepal_length'],y_vars=['sepal_width','petal_width'])
pgiris.map_diag(sb.kdeplot)
Under seaborn 0.9.0 and Python 3.6.7, this throws a TypeError for reasons I do not understand. A cursory reading of the source suggests that no axes get assigned to the grid's diag_axes attribute. Oddly, the map_offdiag method seems to work just fine, so I don't think asymmetric PairGrids are intentionally unsupported.
How do I properly map functions to diagonal elements of asymmetric PairGrids?

Related

Sawtooth look in violin plot [duplicate]

The following code gives me a very nice violinplot (and boxplot within).
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
foo = np.random.rand(100)
sns.violinplot(foo)
plt.boxplot(foo)
plt.show()
So far so good. However, when I look at foo, the variable does not contain any negative values. The seaborn plot seems misleading here. The normal matplotlib boxplot gives something closer to what I would expect.
How can I make violinplots with a better fit (not showing false negative values)?
As the comments note, this is a consequence (I'm not sure I'd call it an "artifact") of the assumptions underlying gaussian KDE. As has been mentioned, this is somewhat unavoidable, and if your data don't meet those assumptions, you might be better off just using a boxplot, which shows only points that exist in the actual data.
However, in your response you ask about whether it could be fit "tighter", which could mean a few things.
One answer might be to change the bandwidth of the smoothing kernel. You do that with the bw argument, which is actually a scale factor; the bandwidth that will be used is bw * data.std():
data = np.random.rand(100)
sns.violinplot(y=data, bw=.1)
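To make the scale-factor relationship concrete, here is a minimal sketch. It assumes the violin's density behaves like scipy.stats.gaussian_kde with a scalar bw_method, which is an assumption about seaborn's internals rather than something stated above; the variable names are illustrative only.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.random(100)

bw = 0.1  # scale factor, playing the role of bw in sns.violinplot(y=data, bw=.1)

# With a scalar bw_method, scipy uses the number directly as the scaling factor.
kde = gaussian_kde(data, bw_method=bw)

# Standard deviation of the Gaussian kernel that the fitted KDE actually uses.
bandwidth_used = float(np.sqrt(kde.covariance[0, 0]))

# The same quantity by hand: scale factor times the sample standard deviation.
# scipy uses the unbiased estimate (ddof=1); for 100 points this differs from
# data.std() by well under 1%.
bandwidth_by_hand = bw * data.std(ddof=1)

print(bandwidth_used, bandwidth_by_hand)
```

A smaller scale factor gives a narrower kernel, so the curve follows the data more closely and spills less past its edges, at the cost of a bumpier shape.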
Another answer might be to truncate the violin at the extremes of the datapoints. The KDE will still be fit with densities that extend past the bounds of your data, but the tails will not be shown. You do that with the cut parameter, which specifies how many units of bandwidth past the extreme values the density should be drawn. To truncate, set it to 0:
sns.violinplot(y=data, cut=0)
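The effect of cut can be sketched outside seaborn as well. The helper below, density_curve, is hypothetical (it is not part of seaborn) and assumes a scipy.stats.gaussian_kde fit; it only illustrates how the drawn range depends on cut, measured in units of bandwidth.

```python
import numpy as np
from scipy.stats import gaussian_kde


def density_curve(data, bw=0.1, cut=2.0, gridsize=100):
    """Evaluate a Gaussian KDE on a grid reaching `cut` bandwidths past the data."""
    kde = gaussian_kde(data, bw_method=bw)
    bandwidth = float(np.sqrt(kde.covariance[0, 0]))
    lo = data.min() - cut * bandwidth
    hi = data.max() + cut * bandwidth
    grid = np.linspace(lo, hi, gridsize)
    return grid, kde(grid)


rng = np.random.default_rng(0)
data = rng.random(100)  # all values lie in [0, 1)

# cut=2: the curve is drawn two bandwidths beyond the smallest and largest point.
grid_cut2, density_cut2 = density_curve(data, cut=2.0)

# cut=0: the curve starts and stops exactly at the extremes of the data.
grid_cut0, density_cut0 = density_curve(data, cut=0.0)

print(grid_cut2.min() < data.min(), grid_cut0.min() == data.min())
```

Note that the fitted density is identical in both cases; only the range over which it is evaluated and drawn changes.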
By the way, the API for violinplot is going to change in 0.6, and I'm using the development version here, but both the bw and cut arguments exist in the current released version and behave more or less the same way.

How to force AxesSubplot to use integer tick labels

I'm struggling to apply integer tick labels to the AxesSubplot object. Somehow none of the popular solutions work. I presume this has to do with how pandas plots a dataframe versus how matplotlib plots natively.
When looking at the docs, there is a mention about preventing resolution adjustments by using the optional parameter x_compat like so:
df['A'].plot(x_compat=True)
However this only throws me an error and I also can't find this parameter in the pandas 1.1.2 docs.
This is the code that produces the plot:
def count_id(id_val):
    plot = df[df['ID']==id_val]['TRANSCRIPTION_STRING'].value_counts().plot(kind='bar')
    plot.set_xticklabels(plot.get_xticklabels(), rotation=40, ha='right')
    print(type(plot))
I have found this answer and it wasn't helpful for the following reasons:
It uses .gca() which throws an error for my case.
It specifically uses matplotlib instead of pandas.plot, which is the same under the hood but has very different documentation, so it isn't obvious how to make it work for my case. More experienced users might disagree.
It was a good suggestion and I see the similarities, but I very much tried to find it helpful before asking and it just wasn't helpful.
You can use:
from matplotlib.ticker import MaxNLocator
plot.yaxis.set_major_locator(MaxNLocator(integer=True))
MaxNLocator is the basis of the default tick locator (the default, AutoLocator, is a subclass of it). You can also force fixed multiples, e.g. set_major_locator(MultipleLocator(1)).
PS: The usual variable name for the return value of pandas plot functions is ax. This makes it clearer that the standard matplotlib functions can be used to customize the plots.
ax = df[df['ID']==id_val]['TRANSCRIPTION_STRING'].value_counts().plot(kind='bar')
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha='right')
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
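For a version that runs without the original dataframe, here is a small sketch. The labels series is made-up stand-in data for the TRANSCRIPTION_STRING column, and the Agg backend is selected only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, chosen here only for reproducibility
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import MaxNLocator, MultipleLocator

# Hypothetical stand-in for df[df['ID']==id_val]['TRANSCRIPTION_STRING']
labels = pd.Series(["a", "b", "a", "c", "a", "b"])

ax = labels.value_counts().plot(kind="bar")

# Ask the y axis for "nice" tick positions restricted to integers.
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
ticks_maxn = ax.get_yticks()

# Alternative: a tick at every multiple of 1.
ax.yaxis.set_major_locator(MultipleLocator(1))
ticks_multiple = ax.get_yticks()

print(ticks_maxn, ticks_multiple)
plt.close("all")
```

MaxNLocator(integer=True) still limits the number of ticks when the counts are large, whereas MultipleLocator(1) draws every integer, which can get crowded on tall bars.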

'Line 2D' object has no property 'kind' -- Are pyplot.plot( ) and .plot( ) different?

I'm learning the pandas module on datacamp, and in a particular course, the instructor uses:
dog_pack.plot(x= "height_cm", y= "weight_kg", kind="scatter")
plt.show()
to create a scatter plot. On my local PC, I try to do the same thing with the gapminder dataset, this works as intended:
# with the necessary imports (gapminder, matplotlib.pyplot, pandas)
gapminder.plot(x = "gdpPercap", y = "lifeExp", kind = "scatter")
But this throws an error:
# With the necessary imports
plt.plot(gapminder["gdpPercap"], gapminder["lifeExp"], kind = "scatter")
# This gives an error
And this works as intended:
plt.scatter(gapminder["gdpPercap"], gapminder["lifeExp"])
plt.show()
Are plt.plot( ) and .plot () (called on a dataframe) different?
Each function belongs to a different library: DataFrame.plot is a pandas method, and pyplot.plot is a matplotlib function.
pandas' plot uses matplotlib by default, as mentioned in the .plot documentation. Even so, the pandas developers chose a slightly different API to make it more convenient to plot a dataframe directly. So yes, the APIs differ, even though pandas uses pyplot as its backend.
One example is the kind argument: pandas added it to make plotting easier, while matplotlib is designed differently and plt.plot does not accept it.
Put yourself in the position of a pandas developer: matplotlib's way of plotting already exists, but you want to make life easier for your users. From that point of view, one general .plot method that takes a kind argument is the more convenient design.
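The difference can be reproduced with a few lines. The dataframe below is made-up data rather than the gapminder set, and the Agg backend is used only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, chosen here only for reproducibility
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the gapminder columns
df = pd.DataFrame({"gdpPercap": [1000.0, 5000.0, 20000.0],
                   "lifeExp": [55.0, 68.0, 79.0]})

# pandas: `kind` picks the plot type and the call returns a matplotlib Axes.
ax = df.plot(x="gdpPercap", y="lifeExp", kind="scatter")

# matplotlib: plt.plot draws lines; it forwards unknown keywords to Line2D,
# which has no `kind` property, so the call fails.
try:
    plt.plot(df["gdpPercap"], df["lifeExp"], kind="scatter")
    error = None
except AttributeError as exc:
    error = exc

# The matplotlib counterpart of kind="scatter" is a separate function.
points = plt.scatter(df["gdpPercap"], df["lifeExp"])

print(type(ax).__name__, type(error).__name__)
plt.close("all")
```

In other words, pandas selects the plot type through an argument, while matplotlib exposes one function per plot type.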

Why can I create a plot using df.plot, but then modify the plot I see by calling the plt object?

Maybe I'm just missing something really straightforward here. But I know in pandas I can use the matplotlib plotting feature by simply doing dataframe.plot (info here).
But how is it that I can modify that EXACT plot just by using plt.title, plt.xlabel, plt.ylabel, etc.? It doesn't make sense to me. For reference, I'm following this tutorial:
dataset.plot(x='MinTemp', y='MaxTemp', style='o')
plt.title('MinTemp vs MaxTemp')
plt.xlabel('MinTemp')
plt.ylabel('MaxTemp')
plt.show()
Does it have something to do with the fact that when I'm running .plot on a dataframe, I'm really creating a matplotlib.pyplot object?
Matplotlib has the concept of the current axes. Essentially what this means is that whenever you first do something that requires an axes object, one is created for you and becomes the default object that all of your future actions will be applied to until you change the current axes to something else. In your case, dataset.plot(x='MinTemp', y='MaxTemp', style='o') creates an axes object and sets it as the current axes. All of plt.title(), plt.xlabel(), and plt.ylabel() simply apply their changes to the current axes.
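A short sketch makes the current-axes mechanism visible. The dataframe is made-up data using the column names from the question, and the Agg backend is used only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, chosen here only for reproducibility
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the tutorial's dataset
dataset = pd.DataFrame({"MinTemp": [10, 12, 15, 9],
                        "MaxTemp": [20, 25, 27, 18]})

# DataFrame.plot returns the Axes it drew on.
ax = dataset.plot(x="MinTemp", y="MaxTemp", style="o")

# That Axes is also what pyplot considers the current axes.
same_axes = plt.gca() is ax

# pyplot-level calls act on the current axes...
plt.title("MinTemp vs MaxTemp")
plt.ylabel("MaxTemp")

# ...so the changes are visible on the object pandas returned.
title_on_ax = ax.get_title()

# The explicit, object-oriented equivalent of plt.xlabel:
ax.set_xlabel("MinTemp")

print(same_axes, title_on_ax)
plt.close("all")
```

Keeping the returned ax and calling its set_* methods avoids relying on whichever axes happens to be current, which matters once a script creates more than one figure.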

understanding (and finding) matplotlib source code

Here it appears that matplotlib's specgram returns 4 variables including the last which is a plot:
http://matplotlib.org/examples/pylab_examples/specgram_demo.html
But here it seems that only 3 values are returned in the tuple:
https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/mlab.py#L478
Where is the missing code to generate the specgram plot? Perhaps I am just confused on the difference between pylab and matplotlib. Either way, I can't find the source.
You're confusing the function that computes the data to be plotted with the function that plots the data.
mlab.specgram just computes the data, while the axes method specgram plots it.
Have a look at: https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes/_axes.py#L5786
ipython is very useful for things like this. method_name? will display the relevant documentation and the location of the source file, while method_name?? will display the relevant code, as well.
Understanding where the source for a matplotlib function is can be a bit confusing. Basically, anything in matplotlib.pyplot is auto-generated. Essentially all of the plotting methods are actually methods of the Axes object.
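Outside IPython, the same lookup can be done with the standard inspect module. The sketch below uses a made-up test signal; it compares what mlab.specgram and the Axes method return and asks where the Axes method is defined. The call to inspect.unwrap is there because the method is wrapped by a decorator in the matplotlib versions I am aware of, which is an assumption about library internals.

```python
import inspect

import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, chosen here only for reproducibility
import matplotlib.pyplot as plt
from matplotlib import mlab
from matplotlib.axes import Axes

# Hypothetical test signal: a sine wave with a little noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 4096)
signal = np.sin(2 * np.pi * 200 * t) + 0.1 * rng.standard_normal(t.size)

# mlab.specgram only computes: (spectrum, freqs, times)
computed = mlab.specgram(signal, NFFT=256, Fs=4096)

# Axes.specgram computes and draws: (spectrum, freqs, times, image)
fig, ax = plt.subplots()
plotted = ax.specgram(signal, NFFT=256, Fs=4096)

# Locate the file that defines the Axes method, looking through any decorators.
source_file = inspect.getsourcefile(inspect.unwrap(Axes.specgram))

print(len(computed), len(plotted), source_file)
plt.close("all")
```

inspect.getsource(plt.specgram) can be used in the same way to read the thin pyplot wrapper itself.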
Hopefully that gets you started. If no one else gives a better answer, I'll elaborate more in a bit, when I have more time.
