How to force AxesSubplot to use integer tick labels - python

I'm struggling to apply integer tick labels to the AxesSubplot object. None of the popular solutions work for me. I presume this has to do with how pandas plots a dataframe versus how Matplotlib plots natively.
When looking at the docs, there is a mention about preventing resolution adjustments by using the optional parameter x_compat like so:
df['A'].plot(x_compat=True)
However, this only throws an error, and I can't find this parameter in the pandas 1.1.2 docs.
This is the code that produces the plot:
def count_id(id_val):
    plot = df[df['ID'] == id_val]['TRANSCRIPTION_STRING'].value_counts().plot(kind='bar')
    plot.set_xticklabels(plot.get_xticklabels(), rotation=40, ha='right')
    print(type(plot))
I have found this answer and it wasn't helpful for the following reasons:
It uses .gca() which throws an error for my case.
It specifically uses matplotlib instead of pandas.plot, which is the same under the hood but has very different documentation, and it isn't obvious that it works for my case. More experienced users might disagree.
It was a good suggestion and I see the similarities, but I very much tried to find it helpful before asking and it just wasn't helpful.

You can use:
from matplotlib.ticker import MaxNLocator
plot.yaxis.set_major_locator(MaxNLocator(integer=True))
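AutoLocator, the default tick locator, is a MaxNLocator with preset parameters, so passing MaxNLocator(integer=True) only adds the integer constraint. You can also force fixed multiples, e.g. set_major_locator(MultipleLocator(1)); MultipleLocator is likewise imported from matplotlib.ticker.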
PS: The usual variable name for the return value of pandas plot functions is ax. This makes it clearer that the standard matplotlib functions can be used to customize the plots.
ax = df[df['ID']==id_val]['TRANSCRIPTION_STRING'].value_counts().plot(kind='bar')
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha='right')
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
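The answer above can be tried end to end with a small stand-in dataframe. This is only a sketch: the data values are made up, the Agg backend is selected so it runs without a display, and plt.setp is used for the label rotation as one possible alternative to set_xticklabels.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import MaxNLocator

# Hypothetical stand-in for the dataframe in the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 1],
    "TRANSCRIPTION_STRING": ["yes", "no", "yes", "no", "maybe"],
})

def count_id(id_val):
    """Bar chart of value counts for one ID, with integer-only y ticks."""
    counts = df[df["ID"] == id_val]["TRANSCRIPTION_STRING"].value_counts()
    ax = counts.plot(kind="bar")  # pandas returns a matplotlib Axes
    ax.yaxis.set_major_locator(MaxNLocator(integer=True))
    plt.setp(ax.get_xticklabels(), rotation=40, ha="right")
    return ax

ax = count_id(1)
ax.figure.canvas.draw()   # force a render so the tick positions are computed
yticks = ax.get_yticks()  # tick positions chosen by the locator
print(yticks)
```

Returning ax from the function keeps the Axes available for any further matplotlib customization.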

Related

'Line 2D' object has no property 'kind' -- Are pyplot.plot( ) and .plot( ) different?

I'm learning the pandas module on datacamp, and in a particular course, the instructor uses:
dog_pack.plot(x= "height_cm", y= "weight_kg", kind="scatter")
plt.show()
to create a scatter plot. On my local PC, I try to do the same thing with the gapminder dataset, this works as intended:
# with the necessary imports (gapminder, matplotlib.pyplot, pandas)
gapminder.plot(x = "gdpPercap", y = "lifeExp", kind = "scatter")
But this throws an error:
# With the necessary imports
plt.plot(gapminder["gdpPercap"], gapminder["lifeExp"], kind = "scatter")
# This gives an error
And this works as intended:
plt.scatter(gapminder["gdpPercap"], gapminder["lifeExp"])
plt.show()
Are plt.plot( ) and .plot () (called on a dataframe) different?
They belong to different libraries: DataFrame.plot is a pandas method, and pyplot.plot is a matplotlib function.
pandas' plot uses matplotlib by default, as mentioned in the .plot documentation. Even so, the pandas developers chose a slightly different API to make it more convenient to plot a dataframe directly. So yes, the APIs differ, even though pandas uses matplotlib as its backend.
One example is the kind parameter: it is a pandas addition that selects the plot type, whereas matplotlib has a separate function for each plot type (plt.plot, plt.scatter, plt.bar, ...) and does not accept kind.
Put yourself in the place of a pandas developer: matplotlib's way of plotting already exists, but you want to make life easier for your users. In their judgment, a single general .plot method with a kind parameter is the more convenient design.
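The difference can be seen side by side in a short sketch. The three data rows are invented placeholders for the gapminder columns, and the Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes

# Made-up rows standing in for the gapminder dataset
data = pd.DataFrame({
    "gdpPercap": [800.0, 5000.0, 30000.0],
    "lifeExp": [45.0, 68.0, 80.0],
})

# pandas API: one method, plot type selected with `kind`
ax_pandas = data.plot(x="gdpPercap", y="lifeExp", kind="scatter")

# matplotlib API: a dedicated function (or Axes method) per plot type
fig, ax_mpl = plt.subplots()
ax_mpl.scatter(data["gdpPercap"], data["lifeExp"])

# matplotlib's plot() forwards unknown keywords to Line2D, which rejects `kind`
try:
    plt.plot(data["gdpPercap"], data["lifeExp"], kind="scatter")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(isinstance(ax_pandas, Axes))
print(error_message)
```

Both routes end up with a matplotlib Axes; only the calling convention differs.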

Why can I create a plot using df.plot, but then modify the plot I see by calling the plt object?

Maybe I'm just missing something really straightforward here. But I know in pandas I can use the matplotlib plotting feature by simply doing dataframe.plot (info here).
But how is it that I can modify that EXACT plot by just using plt.title, plt.xlabel, plt.ylabel, etc.? it doesn't make sense to me. For reference, I'm following this tutorial
dataset.plot(x='MinTemp', y='MaxTemp', style='o')
plt.title('MinTemp vs MaxTemp')
plt.xlabel('MinTemp')
plt.ylabel('MaxTemp')
plt.show()
Does it have something to do with the fact that when I'm running .plot on a dataframe, I'm really creating a matplotlib.pyplot object?
Matplotlib has the concept of the current axes. Essentially what this means is that whenever you first do something that requires an axes object, one is created for you and becomes the default object that all of your future actions will be applied to until you change the current axes to something else. In your case, dataset.plot(x='MinTemp', y='MaxTemp', style='o') creates an axes object and sets it as the current axes. All of plt.title(), plt.xlabel(), and plt.ylabel() simply apply their changes to the current axes.
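A quick way to check this is to compare the Axes returned by pandas with what plt.gca() reports. The snippet below is a sketch with invented temperature values; the Agg backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for the tutorial's weather dataset
dataset = pd.DataFrame({"MinTemp": [10, 12, 15], "MaxTemp": [20, 25, 27]})

ax = dataset.plot(x="MinTemp", y="MaxTemp", style="o")

# The Axes that pandas created is now the "current axes"
same_axes = plt.gca() is ax

# plt.title() therefore writes to that same Axes
plt.title("MinTemp vs MaxTemp")
title_on_ax = ax.get_title()

print(same_axes, title_on_ax)
```

Calling ax.set_title(...) directly would have the same effect without relying on the current-axes state.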

Adding errorbars in ggplot Python

Can't find a way of adding errorbars to a Python ggplot plot. The following issue has been neglected for over a year. Nothing in the docs.
I had this same problem and found no solution. However I did find a way around it. You can use matplotlib in the style of ggplot. From there it's much easier to use error bars. I've attached an example of some code I used.
import matplotlib.pyplot as plt
plt.style.use('ggplot')
This is an extract from my code:
df2.gLongCrFiltered['mean'].plot(kind='bar', yerr=df2.gLongCrFiltered['std'])
which returned a bar chart with error bars (the image is not included here).
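A self-contained sketch of the same workaround follows. The stats dataframe and its numbers are hypothetical, since the original df2 is not shown, and the Agg backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import pandas as pd

plt.style.use("ggplot")  # matplotlib's built-in ggplot-like style sheet

# Hypothetical summary statistics, one row per group
stats = pd.DataFrame(
    {"mean": [2.0, 3.5, 1.2], "std": [0.3, 0.5, 0.1]},
    index=["a", "b", "c"],
)

# pandas passes yerr through to matplotlib, which draws the error bars
ax = stats["mean"].plot(kind="bar", yerr=stats["std"])
ax.set_ylabel("mean")
ax.figure.savefig("errorbars.png")  # illustrative output file name
```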

When to use the matplotlib.pyplot class and when to use the plot object (matplotlib.collections.PathCollection)

I wondered what the logic is for when to use the plot instance (which is a PathCollection) and when to use the plot class itself.
import matplotlib.pyplot as plt
p = plt.scatter([1,2,3],[1,2,3])
brings up a scatter plot. To make it work, I have to say:
plt.annotate(...)
and to configure axes labels or limits, you write:
plt.xlim(...)
plt.xlabel(...)
and so on.
But on the other hand, you write:
p.axes.set_aspect(...)
p.axes.yaxis.set_major_locator(...)
What is the logic behind this? Can I look it up somewhere? Unfortunately, I haven't found an answer to this particular question in the documentation.
When do you use the actual instance p for configuration of your graph and when do you use the pyplot class plt?
According to PEP20:
"Explicit is better than implicit."
"Simple is better than complex."
Oftentimes, the "make-it-just-work" code takes the pyplot route, as it hides away all of the figure and axes management that many wouldn't care about. This is often used for interactive mode coding, simple one-off scripts, or plotting performed at high-level scripts.
However, if you are creating a library module that is to do plotting, and you have no guarantee that the library user isn't doing any additional plotting of their own, then it is best to be explicit and avoid the pyplot interface. I usually design my functions to accept as optional arguments the axes and/or figure objects the user would like to operate upon (if not given, then I use plt.gcf() and/or plt.gca()).
The rule of thumb I have is that if the operation I am performing could be done via pyplot, but if doing so would likely change the "state machine", then I avoid pyplot. Note that any action via pyplot (such as plt.xlim()) gets/sets the current axes/figure/image (the "state machine"), while actions like ax.set_xlim() do not.
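The optional-axes pattern described above might look like the following sketch. The helper name plot_line and its arguments are hypothetical, and the Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt

def plot_line(xs, ys, ax=None):
    """Hypothetical library helper: draw on the given Axes, else the current one."""
    if ax is None:
        ax = plt.gca()  # fall back to pyplot's current axes only when nothing is passed
    ax.plot(xs, ys)
    ax.set_xlim(min(xs), max(xs))  # explicit call: touches only this Axes
    return ax

fig, (top, bottom) = plt.subplots(2, 1)

# The caller decides where the line goes; the other subplot is left alone
returned = plot_line([0, 1, 2], [0, 1, 4], ax=top)

print(len(top.lines), len(bottom.lines))
```

Because the function never calls plt.xlim() or similar, it cannot accidentally modify whichever axes the caller happens to have current.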
'plt' is just a shortcut; it's useful when you have only one plot. When you use plt directly, matplotlib automatically creates a 'figure' and a subplot, but when you want to work with more than one subplot you will need to use 'axes' methods, for example:
import matplotlib.pyplot as plt

fig = plt.figure()
a = fig.add_subplot(211)  # top subplot
b = fig.add_subplot(212)  # bottom subplot
print(type(a))    # an Axes class (shown as AxesSubplot in older matplotlib versions)
print(type(fig))  # <class 'matplotlib.figure.Figure'>
a.plot([0, 1], [0, 1], 'r')
b.plot([1, 0], [0, 1], 'b')
plt.show()
This is awkward to do through 'plt' alone, because every plt call acts only on whichever subplot is current.

understanding (and finding) matplotlib source code

Here it appears that matplotlib's specgram returns 4 variables including the last which is a plot:
http://matplotlib.org/examples/pylab_examples/specgram_demo.html
But here it seems there are only 3 variables returned in the tuple:
https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/mlab.py#L478
Where is the missing code to generate the specgram plot? Perhaps I am just confused on the difference between pylab and matplotlib. Either way, I can't find the source.
You're confusing the function that computes the data to be plotted with the function that plots the data.
mlab.specgram just computes the data, while the axes method specgram plots it.
Have a look at: https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes/_axes.py#L5786
ipython is very useful for things like this. method_name? will display the relevant documentation and the location of the source file, while method_name?? will display the relevant code, as well.
Understanding where the source for a matplotlib function is can be a bit confusing. Basically, anything in matplotlib.pyplot is auto-generated. Essentially all of the plotting methods are actually methods of the Axes object.
Hopefully that gets you started. If no one else gives a better answer, I'll elaborate more in a bit, when I have more time.
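Outside IPython, the same lookup can be done with the standard inspect module. The sketch below uses a synthetic sine wave as input and assumes the Agg backend so that it runs without a display; inspect.unwrap is applied because the Axes method may be wrapped by a decorator.

```python
import inspect

import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import mlab
from matplotlib.axes import Axes

# Synthetic test signal: one second of a 100 Hz sine sampled at 1024 Hz
t = np.arange(0.0, 1.0, 1.0 / 1024)
signal = np.sin(2 * np.pi * 100 * t)

# mlab.specgram only computes: (spectrum, freqs, t)
computed = mlab.specgram(signal, NFFT=256, Fs=1024)

# Axes.specgram computes and plots: (spectrum, freqs, t, image)
fig, ax = plt.subplots()
plotted = ax.specgram(signal, NFFT=256, Fs=1024)

# Locate the source files of both functions
mlab_file = inspect.getsourcefile(inspect.unwrap(mlab.specgram))
axes_file = inspect.getsourcefile(inspect.unwrap(Axes.specgram))

print(len(computed), mlab_file)
print(len(plotted), axes_file)
```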
