difference between countplot and catplot - python

In Python's seaborn, what is the difference between countplot and catplot?
E.g.:
sns.catplot(x='class', y='survived', hue='sex', kind='bar', data=titanic);
sns.countplot(y='deck', hue='class', data=titanic);

seaborn.countplot
Shows the counts of observations in each categorical bin using bars.
seaborn.catplot
Provides access to several axes-level functions that show the relationship between a numerical and one or more categorical variables using one of several visual representations.

There is a lot of overhead in catplot, or for that matter in FacetGrid, which ensures that the categories are synchronized across the grid. Consider, e.g., a variable plotted along the columns of the grid for which not every age group occurs in every column. You would still need to show that non-occurring age group and hold on to its color. Hence, two countplots next to each other do not necessarily make up one catplot.
However, if you are only interested in a single countplot, a catplot is clearly overkill. On the other hand, even a single countplot is overkill compared to a barplot of the counts.
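The remark about a plain bar plot of the counts can be sketched as follows. This is only an illustration: the small DataFrame is invented and stands in for the titanic data, and the counting is done with pandas' value_counts before handing the result to matplotlib.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the titanic data (values invented for illustration)
df = pd.DataFrame({"deck": ["A", "B", "B", "C", "C", "C"]})

# Tally the observations per category; this is the quantity a countplot draws
counts = df["deck"].value_counts().sort_index()

# Draw the counts as ordinary bars
fig, ax = plt.subplots()
ax.bar(counts.index.tolist(), counts.values)
ax.set_xlabel("deck")
ax.set_ylabel("count")
```

Call plt.show() or fig.savefig(...) afterwards to view the result. A countplot or catplot becomes worthwhile once hue nesting or faceting is needed.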

Related

How can I create a plot that combines a plot of data, and a histogram of different data?

I need to create a plot that has two y-axes and a single x-axis. On one x/y-axis pair, I need to plot several sets of data (as lines). On the other x/y-axis pair, I need to plot a histogram of a different data set. The intention is to present several curves that represent the performance of several design variations, together with a histogram of x-axis data, to visualize how well each variant is optimized for the operating region.
See this example plot.
There are several curves on the upper plot that represent the value of epsilon as a function of V for a set of variants A, B, C.
The lower plot is a histogram that represents the number of data points collected, H, for each V. This data is not directly related to the upper plot. The data on the lower plot visualizes the operating region for V, so that it is visually obvious which regions are more important for optimization.
I looked into the seaborn documentation for "Visualizing distributions of data".
It appears that the seaborn histograms can only be presented for the data being plotted.
I think that I need to do some combination of a separate line plot and histogram so that the correct data is represented in each plot.
I want this to be represented in a single figure, but I am unsure of the exact method to achieve this.
You'll need:
to share the x axis: https://matplotlib.org/stable/gallery/subplots_axes_and_figures/shared_axis_demo.html
to adjust the gap/space/padding between subplots: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots_adjust.html
to invert one of the y axes (two options):
https://matplotlib.org/stable/gallery/subplots_axes_and_figures/invert_axes.html
https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.invert_yaxis.html
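The three ingredients above can be combined roughly as in the following sketch. All data here is invented for illustration (the curves for variants A, B, C and the sampled V values are placeholders), and the 3:1 height ratio and the bin count are arbitrary choices, not something the question specifies.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration
V = np.linspace(0.0, 10.0, 50)
curves = {"A": np.sin(V / 3.0), "B": np.cos(V / 4.0), "C": V / 10.0}
rng = np.random.default_rng(0)
v_samples = rng.normal(loc=5.0, scale=1.5, size=500)  # unrelated data set

# Two stacked axes that share the x axis; the upper one is taller
fig, (ax_top, ax_bottom) = plt.subplots(
    2, 1, sharex=True, gridspec_kw={"height_ratios": [3, 1]}
)

# Upper axes: one line per design variant
for name, eps in curves.items():
    ax_top.plot(V, eps, label=name)
ax_top.set_ylabel("epsilon")
ax_top.legend()

# Lower axes: histogram of the separate data set, drawn hanging downwards
counts, edges, _ = ax_bottom.hist(v_samples, bins=20)
ax_bottom.invert_yaxis()
ax_bottom.set_xlabel("V")
ax_bottom.set_ylabel("H")

# Shrink the vertical gap between the two axes
fig.subplots_adjust(hspace=0.05)
```

Because the histogram gets its own axes, it can show any data set, not only the data plotted in the upper panel.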

Is there a simple way to plot multiple series on one pandas scatter plot?

I come across this issue constantly, and my current solution is to create additional dataframes. I feel like there must be an easier solution.
Here is an example of data where I have multiple countries with multiple attributes:
If I wanted to plot Population vs. Depression (%) I would write:
ax = df.plot.scatter(x='Population', y='Depression (%)')
This isn't super helpful, as there are clearly lines linked to specific Countries (df['Country']). Is there a simple way to plot a scatter plot with different series (colors/shapes/etc) as different Countries?
Right now I use groupby to separate out individual Countries and plot them on the same axes (ax = ax).
Any thoughts or input would be greatly appreciated! Thank you!
Try c="Country", and if you want some nice colors you can also pass colormap='viridis', for example. From the documentation:
ax2 = df.plot.scatter(x='length',
                      y='width',
                      c='species',
                      colormap='viridis')
Since the Country column holds strings, this approach can't be used directly; the values first need to be converted to numbers. This can be done by passing:
c=df['Country'].astype("category").cat.codes
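Put together, the category-code approach might look like the sketch below. The DataFrame is a made-up stand-in for the data in the question (the country names and numbers are invented); only the column names are taken from the question.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C"],
    "Population": [1.0, 1.2, 3.0, 3.1, 5.0],
    "Depression (%)": [4.0, 4.1, 3.5, 3.6, 5.2],
})

# Map each country name to an integer code (0, 1, 2, ...)
codes = df["Country"].astype("category").cat.codes

# Color every point by its country code
ax = df.plot.scatter(
    x="Population",
    y="Depression (%)",
    c=codes,
    colormap="viridis",
)
```

The colorbar then shows the integer codes rather than the country names, so the groupby approach from the question may still be preferable when a legend with names is needed.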

using matplotlib, not seaborn, make equal area violin plots

I'm trying to plot some data as split violins, for which I adapted this answer, to get a first pass. The issue with this is that the parameter controlling the violin sizes is a 'width' which means that distributions that are narrow will look materially smaller than distributions that are wide (they'll have less visual weight). I do not want to use Seaborn (I'm actually not using categorical data, for one thing), but it has a handy feature that makes the plotted area of violins equal. Does anyone have any ideas about how I could customize matplotlib's violinplot to do this?

Off-center X-Axis in Seaborn

I'm having an issue getting my boxplot to align with my x axis labels. I've tried adjusting the size of the chart, but the data points still look a little off. I appreciate any help!
This is the current chart:
It's hard to tell without an MCVE, but I'm guessing it's because you're using two categorical variables: x and hue. This creates a so-called "nested" box plot (search the documentation for the keyword "smoke"), and if one of the categories is empty in some sense, that might cause the observed offset.
Again, only guessing 'cause that's what you gave us.
Good luck!
This misalignment can happen when the hue argument is set.
You can add the dodge=False argument to the sns.boxplot function to keep boxplots aligned with the x-axis labels.
In your example, it would look like this:
sns.boxplot(x=df["Groups"], y=df["Rate per Month"], hue=df["Hours per Month"], dodge=False)
Description of the dodge parameter from the seaborn.boxplot documentation:
dodge: bool, optional
When hue nesting is used, elements should be shifted along the categorical axis.
Example from the seaborn.boxplot documentation.

Pandas scatter plot

I'm new to Python and Pandas but have a CSV file with multiple columns that I have read into a dataframe. I would like to plot a scatter plot of x=Index and y='data', where the index is the index of the dataframe and is a date.
Thanks heaps
Jason
You can use matplotlib's plot_date:
plt.plot_date(df.index, df['data'])
While it is technically not a scatter plot, you can use the DataFrame.plot method with point markers drawn and no lines.
df.plot(marker='o', linewidth=0)
This then lets you use all of the convenient pandas plotting functionality, e.g. plotting two series on different scales with a single call:
df.plot(marker='o', linewidth=0, secondary_y='y2')
The downside to this is that you lose some of the scatter functionality such as shading and sizing the markers differently.
Still, if your aim is a quick scatter plot, this might be the easiest route.
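As a self-contained sketch of the markers-only approach: the dates and values below are invented, and the column name 'data' follows the question.

```python
import pandas as pd

# Hypothetical data with a date index, invented for illustration
idx = pd.date_range("2020-01-01", periods=5, freq="D")
df = pd.DataFrame({"data": [1.0, 3.0, 2.0, 5.0, 4.0]}, index=idx)

# Markers only: linewidth=0 hides the connecting line
ax = df.plot(marker="o", linewidth=0)
ax.set_ylabel("data")
```

pandas takes care of formatting the date ticks on the x axis, which is one reason this route is convenient for a date index.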
