using matplotlib, not seaborn, make equal area violin plots - python

I'm trying to plot some data as split violins, for which I adapted this answer, to get a first pass. The issue with this is that the parameter controlling the violin sizes is a 'width' which means that distributions that are narrow will look materially smaller than distributions that are wide (they'll have less visual weight). I do not want to use Seaborn (I'm actually not using categorical data, for one thing), but it has a handy feature that makes the plotted area of violins equal. Does anyone have any ideas about how I could customize matplotlib's violinplot to do this?
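One possible approach, as a minimal sketch rather than a definitive answer: `violinplot` accepts an array for `widths`, and it rescales each density so that its peak equals that width. Since a kernel density estimate has (roughly) unit area, the drawn area is about width / peak density, so choosing each width proportional to the peak density should make the areas come out about equal. The helper names `equal_area_widths` and `plot_equal_area_violins` and the sample data are made up for illustration, and the sketch draws full violins rather than split ones.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import mlab


def equal_area_widths(datasets, max_width=0.8, points=100):
    """Return one width per dataset so all violins cover about the same area."""
    peaks = []
    for data in datasets:
        data = np.asarray(data, dtype=float)
        # Evaluate a Gaussian KDE on a grid from min to max of the data,
        # which mirrors violinplot's default settings (points=100, Scott's rule).
        coords = np.linspace(data.min(), data.max(), points)
        density = mlab.GaussianKDE(data).evaluate(coords)
        peaks.append(density.max())
    peaks = np.asarray(peaks)
    # Width proportional to peak density; the tallest density gets max_width.
    return max_width * peaks / peaks.max()


def plot_equal_area_violins(datasets, positions, max_width=0.8):
    """Draw violins whose widths are chosen to equalise the plotted area."""
    fig, ax = plt.subplots()
    widths = equal_area_widths(datasets, max_width=max_width)
    ax.violinplot(datasets, positions=positions, widths=widths,
                  showextrema=False)
    return fig, ax


# Made-up sample data: one narrow and one wide distribution.
rng = np.random.default_rng(0)
narrow = rng.normal(0.0, 0.5, 500)
wide = rng.normal(0.0, 2.0, 500)
fig, ax = plot_equal_area_violins([narrow, wide], positions=[1, 2])
```

With this scaling the narrow distribution is drawn wide and short, and the wide distribution thin and tall, so neither dominates visually. The areas are only approximately equal because the density is cut off at the data's minimum and maximum.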

Related

How can I create a plot that combines a plot of data, and a histogram of different data?

I need to create a plot that has two y-axes and a single x-axis. On one x/y-axis pair, I need to plot several sets of data (with lines). On the other x/y-axis pair, I need to plot a histogram of a different data set. The intention is to present several curves that represent the performance of several design variations, together with a histogram of x-axis data, to visualize how well each variant is optimized for the operating region.
For reference, see this example plot (image: plot example).
There are several curves on the upper plot that represent the value of epsilon as a function of V for a set of variants A, B, C.
The lower plot is a histogram that represents the number of data points collected, H, for each V. This data is not directly related to the upper plot. The data on the lower plot visualizes the operating region for V, so that it is visually obvious which regions are more important for optimization.
I looked into the seaborn documentation for "Visualizing distributions of data".
It appears that the seaborn histograms can only be presented for the data being plotted.
I think that I need to do some combination of a separate line plot and histogram so that the correct data is represented in each plot.
I want this to be represented in a single figure, but I am unsure of the exact method to achieve this.
You'll need:
to share x axis: https://matplotlib.org/stable/gallery/subplots_axes_and_figures/shared_axis_demo.html
to adjust gap/space/padding between subplots:
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots_adjust.html
to invert one of the y axes (two options):
https://matplotlib.org/stable/gallery/subplots_axes_and_figures/invert_axes.html
https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.invert_yaxis.html
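Putting the three pieces above together might look like the following sketch. The function name `curves_with_histogram`, the curve shapes, and the histogram data are all invented for illustration; only the matplotlib calls (`subplots` with `sharex`, `subplots_adjust`, `invert_yaxis`) come from the links above.

```python
import numpy as np
import matplotlib.pyplot as plt


def curves_with_histogram(v, curves, hist_data, bins=20):
    """Line plots on top, an unrelated histogram below, sharing one x axis."""
    fig, (ax_top, ax_bottom) = plt.subplots(
        2, 1, sharex=True, gridspec_kw={"height_ratios": [3, 1]}
    )
    # Upper panel: one curve per design variant.
    for label, eps in curves.items():
        ax_top.plot(v, eps, label=label)
    ax_top.set_ylabel("epsilon")
    ax_top.legend()

    # Lower panel: histogram of a different data set over the same x range.
    ax_bottom.hist(hist_data, bins=bins)
    ax_bottom.invert_yaxis()  # bars hang down from the shared axis
    ax_bottom.set_xlabel("V")
    ax_bottom.set_ylabel("H")

    # Shrink the vertical gap between the two panels.
    fig.subplots_adjust(hspace=0.05)
    return fig, ax_top, ax_bottom


# Made-up data standing in for variants A, B, C and the collected V values.
rng = np.random.default_rng(1)
v = np.linspace(0.0, 10.0, 200)
curves = {name: np.exp(-((v - c) ** 2) / 8.0)
          for name, c in [("A", 3), ("B", 5), ("C", 7)]}
hist_data = rng.normal(5.0, 1.5, 1000)
fig, ax_top, ax_bottom = curves_with_histogram(v, curves, hist_data)
```

Inverting the lower y axis is optional; drop the `invert_yaxis()` call if you want the bars to grow upward.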

difference between countplot and catplot

In Python's seaborn, what is the difference between countplot and catplot?
E.g.:
import seaborn as sns
titanic = sns.load_dataset('titanic')  # seaborn's example dataset
sns.catplot(x='class', y='survived', hue='sex', kind='bar', data=titanic)
sns.countplot(y='deck', hue='class', data=titanic)
seaborn.countplot
Shows the counts of observations in each categorical bin using bars.
seaborn.catplot
Provides access to several axes-level functions that show the relationship between a numerical and one or more categorical variables using one of several visual representations.
There is a lot of overhead in catplot, or for that matter in FacetGrid, that ensures the categories are synchronized across the grid. Consider, e.g., a variable plotted along the columns of the grid for which not every age group occurs in every column. You would still need to show the non-occurring age group and hold on to its color. Hence, two countplots next to each other do not necessarily make up one catplot.
However, if you are only interested in a single countplot, a catplot is clearly overkill. On the other hand, even a single countplot is overkill compared to a barplot of the counts.
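To illustrate the last point, here is a rough sketch of a "barplot of the counts" done with plain matplotlib instead of seaborn. The helper `plot_counts` and the sample values are hypothetical; this is not what seaborn does internally.

```python
from collections import Counter

import matplotlib.pyplot as plt


def plot_counts(values, ax=None):
    """Count the occurrences of each category and draw them as bars."""
    if ax is None:
        _, ax = plt.subplots()
    counts = Counter(values)
    labels = sorted(counts)                     # fixed, sorted category order
    heights = [counts[label] for label in labels]
    ax.bar(labels, heights)
    ax.set_ylabel("count")
    return ax, dict(zip(labels, heights))


# Made-up categorical values, e.g. a 'deck' column.
decks = ["A", "C", "B", "C", "C", "B"]
ax, counts = plot_counts(decks)
```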

How do I plot a scatterplot with marginal histograms AND histogram of differences using matplotlib and/or seaborn?

I'm looking to augment my scatterplot (Python code, using Matplotlib and/or Seaborn) with marginal distributions (here plotted as histograms, but could also be kernel density estimates):
with a visualization of the differences (histogram/density estimate), like so:
I could probably roll my own, but this seems like such a common use case that I'm suspecting this might be implemented somewhere already in Matplotlib or Seaborn. A good fifteen minutes of Googling did not yield anything, and it also has not been asked before here on StackOverflow. Does anyone know of an off-the-shelf solution for this? (If no one does, I'll write my own and post it of course.) Thanks!

Two violin plots (horizontal and vertical) intersecting at a point

I am unsure whether this functionality is possible through Plotly or if it is achievable with other plotting packages so I am open to different solutions.
I am trying to plot two violin plots in a single figure where one violin is oriented horizontally while the other is vertical. I would like to specify the point at which they intersect (i.e., the primary axis of each). Ideally each would be transparent and interactive.
Somewhat poor illustration of what I need
Thank you for any proposed solutions!
Several Python libraries can draw violin plots. Plotly is among them; check its manual.
Besides, Seaborn is really good at this:
It is a Python visualization library based on Matplotlib. The code snippet for the example above can be found in Seaborn's gallery.

Python: how to plot points with little overlapping

I am using python to plot points. The plot shows relationship between area and the # of points of interest (POIs) in this area. I have 3000 area values and 3000 # of POI values.
Now the plot looks like this:
The problem is that, in the lower left corner, the points overlap each other so severely that it is hard to get any information out of them. Most areas are not that big and they don't have many POIs.
I want to make a plot with little overlap. I am wondering whether I can use an unevenly spaced axis or a histogram to make a readable plot. Can anyone help me?
I would suggest using a logarithmic scale for the y axis. You can either use pyplot.semilogy(...) or pyplot.yscale('log') (http://matplotlib.org/api/pyplot_api.html).
Note that points whose value on the log-scaled axis is <= 0 (for example, area <= 0) will not be rendered.
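A minimal sketch of the log-scale suggestion, using the Axes method `set_yscale` (the object-oriented counterpart of `pyplot.yscale`). The data is randomly generated to imitate many small areas with few POIs; the variable names and the choice of which quantity goes on which axis are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data: 3000 skewed area values and POI counts.
rng = np.random.default_rng(2)
area = rng.lognormal(mean=2.0, sigma=1.0, size=3000)
poi = rng.poisson(lam=np.sqrt(area)) + 1  # +1 keeps every count positive

fig, ax = plt.subplots()
ax.scatter(area, poi, s=5, alpha=0.4)  # small, translucent markers
ax.set_yscale("log")                   # spread out the crowded low values
# ax.set_xscale("log") could be added if the x values are skewed as well.
ax.set_xlabel("area")
ax.set_ylabel("number of POIs")
```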
I think we have two major choices here: first, adjusting this plot, and second, displaying your data in another type of plot.
For the first option, I would suggest clipping the boundaries. You have plenty of empty space around the borders. If you limit the plot to the extent of the data, the data will scale better. On top of that, you may choose to plot the points with smaller dots, so that they overlap less.
The second option would be to display the data in a different view, such as a histogram. This might give better insight into how your data is distributed among different bins. But this would be a completely different type of view compared to the former plot.
I would suggest first adjusting the plot by limiting its boundaries to the data points, so that the plot area has enough space to scale the data, and trying histograms later. But as I mentioned, these are two different things and would give different insights about your data.
For adjusting, you might try this:
import matplotlib.pyplot as plt
x1, x2, y1, y2 = plt.axis()   # read the current axis limits
plt.axis((x1, x2, y1, y2))    # set the limits; adjust these values to fit your data
You would probably need to make minor adjustments to the axis variables. Note that there are definitely better options than this, but this was the first thing that came to my mind.
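As one way of making those adjustments, the limits can be derived from the data itself. This is only a sketch: the helper `scatter_tight`, its `pad` parameter, and the sample data are invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt


def scatter_tight(x, y, pad=0.02, ax=None):
    """Scatter plot whose limits hug the data, with small markers."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if ax is None:
        _, ax = plt.subplots()
    ax.scatter(x, y, s=4)  # smaller dots overlap less
    # Leave a small margin (a fraction `pad` of the data range) on each side.
    x_margin = pad * (x.max() - x.min())
    y_margin = pad * (y.max() - y.min())
    ax.set_xlim(x.min() - x_margin, x.max() + x_margin)
    ax.set_ylim(y.min() - y_margin, y.max() + y_margin)
    return ax


# Made-up data for demonstration.
rng = np.random.default_rng(3)
ax = scatter_tight(rng.exponential(50.0, 3000), rng.exponential(20.0, 3000))
```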
