Changing from a subplot, violin plot stops working? - python

I want to make a plot like the first subfigure here:
import matplotlib.pyplot as plt
import numpy as np
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(9, 4))
# generate some random test data
all_data = [np.random.normal(0, std, 100) for std in range(6, 10)]
# plot violin plot
axes[0].violinplot(all_data,
                   showmeans=False,
                   showmedians=True)
axes[0].set_title('violin plot')
This code works but I just want the first subplot as a separate plot, so I change to plt.figure and remove the parts related to axes[1], but I can't get the violin plot to work anymore!
I have also tried a separate plot using sns.violinplot but it rotates the violin and plots them all on top of each other. Tips?

For simple single plots it's often easier to use matplotlib's pyplot interface rather than the object-oriented interface. Some functions have different names between these interfaces, e.g. plt.title() corresponds to ax.set_title().
import matplotlib.pyplot as plt
import numpy as np
# generate some random test data
all_data = [np.random.normal(0, std, 100) for std in range(6, 10)]
# plot violin plot
plt.violinplot(all_data,
               showmeans=False,
               showmedians=True)
plt.title('violin plot')

If you create a figure using fig = plt.figure(), you still need to create a subplot in this figure using add_subplot(). You can do this as follows:
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(figsize=(9, 4))
axes = fig.add_subplot()
# generate some random test data
all_data = [np.random.normal(0, std, 100) for std in range(6, 10)]
# plot violin plot
axes.violinplot(all_data,
                showmeans=False,
                showmedians=True)
axes.set_title('violin plot')
This produces the following figure:
Note that fig, axes = plt.subplots() is simply shorthand for the two lines above, and the default values for ncols and nrows are 1, so you can simply remove these arguments from your original code and it will also work:
import matplotlib.pyplot as plt
import numpy as np
fig, axes = plt.subplots(figsize=(9, 4))
# generate some random test data
all_data = [np.random.normal(0, std, 100) for std in range(6, 10)]
# plot violin plot
axes.violinplot(all_data,
                showmeans=False,
                showmedians=True)
axes.set_title('violin plot')
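A small check, as a sketch, of why axes[0] stops working once the second subplot is removed: with the default of one row and one column, plt.subplots() hands back a single Axes object rather than an array. The variable names below are only illustrative, and the Agg backend line is there just so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.axes import Axes

# Default nrows=1, ncols=1: a single Axes object comes back, so axes[0] would fail.
fig_single, ax_single = plt.subplots(figsize=(9, 4))

# More than one column: a NumPy array of Axes comes back, so axes[0] works.
fig_multi, axes_multi = plt.subplots(nrows=1, ncols=2, figsize=(9, 4))

# squeeze=False always gives a 2D array, even for a single subplot.
fig_sq, axes_sq = plt.subplots(squeeze=False)

returns_single_axes = isinstance(ax_single, Axes)
returns_array = isinstance(axes_multi, np.ndarray)
print(returns_single_axes, returns_array, axes_multi.shape, axes_sq.shape)
```

If code has to work for both one and several subplots, passing squeeze=False and indexing as axes_sq[0, 0] is one way to keep the indexing uniform.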

Related

Aligning y-axis label with middle of subplot?

I'm working with Matplotlib and have a large number of 1D heatmaps, each with its own label. However, the labels are misaligned with the plots and I cannot figure out how to get this to work automatically.
Here's an MWE
import numpy as np
import matplotlib.pyplot as plt
data = np.random.rand(10, 1000)
dogs = ["woof", "bark", "bowwow"]
fig, axs = plt.subplots(10)
for i in range(10):
    axs[i].scatter(np.linspace(0, 1, 1000), np.linspace(0,1,1000)*0, 2000,
                   c=data[i, :], marker="|", cmap='inferno')
    axs[i].set_frame_on(False)
    axs[i].set_yticklabels([])
    axs[i].set_xticklabels([])
    axs[i].set_xticks([])
    axs[i].set_yticks([])
    axs[i].set_ylabel(dogs[i%3], rotation='horizontal')
plt.show()
I experimented with
axs[i].yaxis.set_label_coords(x, y)
for various values of x and y, and nothing seems to work. I would prefer to have it align automatically, with the bottom of the text corresponding to the bottom of the individual plot.
Attached is an image showcasing the alignment issue.
You could create your heatmaps via seaborn, and use yticklabels=[label_name] to set the labels. Rotating the labels to 0 degrees should have them nicely aligned. Note that the data is expected to have a shape of 1xN.
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
labels = ['Alkaid', 'Mizar', 'Alioth', 'Megrez', 'Phecda', 'Merak', 'Dubhe']
nrows = len(labels)
fig, axs = plt.subplots(nrows=nrows, figsize=(10, 5))
for ax_i, data_i, label_i in zip(axs, np.random.randn(nrows, 1, 100).cumsum(axis=2), labels):
    sns.heatmap(data=data_i, xticklabels=[], yticklabels=[label_i], cmap='inferno', cbar=False, ax=ax_i)
    ax_i.tick_params(axis='y', rotation=0, labelsize=22, length=0) # length means length of the tick mark
plt.tight_layout()
plt.show()
After a bit of playing around, I found that
axs[0].set_ylabel("Pseudotime", fontsize=12, rotation='horizontal', ha='right', va='center')
is sufficient for aligning the y-labels.
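For completeness, here is a sketch of those alignment arguments applied inside the loop of the MWE from the question, so that every label is handled the same way. The seeded random generator and the Agg backend line are additions made only so the snippet runs reproducibly without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((10, 1000))
dogs = ["woof", "bark", "bowwow"]

fig, axs = plt.subplots(10)
x = np.linspace(0, 1, 1000)
for i, ax in enumerate(axs):
    ax.scatter(x, np.zeros_like(x), 2000, c=data[i, :], marker="|", cmap="inferno")
    ax.set_frame_on(False)
    ax.set_xticks([])
    ax.set_yticks([])
    # ha='right' keeps the text to the left of the strip,
    # va='center' centers it vertically on the strip
    ax.set_ylabel(dogs[i % 3], rotation="horizontal", ha="right", va="center")

# Collect the label Text objects to inspect their alignment
labels = [ax.yaxis.label for ax in axs]
```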

Change color of violin plot in matplotlib according to how broad the distribution is

I have been trying to work on plotting a nice violin plot to present my data using Matplotlib library in Python. This is the code I have used:
ax.violinplot(vdistances,showmeans=False,showmedians=True)
ax.set_title('Distance analysis for molecule 1')
ax.set_xlabel('Atomic distances')
ax.set_ylabel('Distances in Angstroms')
ax.set_xticks([x for x in range(1,len(distances)+1)])
plt.show()
And this is what I have come up with:
What I have been wondering is whether it is possible to assign different tones of the same color to each of the violins depending on how broad the distribution is, so that violins with more dispersed data get a different tone.
You could loop through the generated violins, extract their height and use that to set a color:
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
from matplotlib.cm import ScalarMappable
import numpy as np
fig, ax = plt.subplots(figsize=(12, 5))
vdistances = np.random.normal(np.random.uniform(10, 20, (20, 10)), np.random.uniform(2, 5, (20, 10)))
violins = ax.violinplot(vdistances, showmeans=False, showmedians=True)
ax.xaxis.set_major_locator(MultipleLocator(1))
heights = [violin.get_paths()[0].get_extents().height for violin in violins['bodies']]
norm = plt.Normalize(min(heights), max(heights))
cmap = plt.get_cmap('plasma')
for violin, height in zip(violins['bodies'], heights):
    violin.set_color(cmap(norm(height)))
    violin.set_alpha(0.5)
plt.colorbar(ScalarMappable(norm=norm, cmap=cmap), alpha=violin.get_alpha(), label='Violin Extent', ax=ax)
plt.tight_layout()
plt.show()
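A possible variation, sketched under the assumption that "how broad" can be measured by the standard deviation of each dataset rather than by the drawn height of the violin: compute one spread value per column and map it onto a single-hue colormap such as 'Blues'. The test data and the names spreads and body are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical test data: 10 datasets (columns) with 200 values each,
# every column with its own mean and spread
vdistances = rng.normal(loc=rng.uniform(10, 20, 10),
                        scale=rng.uniform(1, 5, 10),
                        size=(200, 10))

fig, ax = plt.subplots(figsize=(12, 5))
violins = ax.violinplot(vdistances, showmeans=False, showmedians=True)

# One standard deviation per column, i.e. per violin
spreads = vdistances.std(axis=0)
norm = plt.Normalize(spreads.min(), spreads.max())
cmap = plt.get_cmap("Blues")  # single hue: broader data gets a darker tone

for body, spread in zip(violins["bodies"], spreads):
    body.set_facecolor(cmap(norm(spread)))
    body.set_edgecolor("black")
    body.set_alpha(0.8)
```

Using the standard deviation makes the color less sensitive to single outliers than the full extent of the violin, which may or may not be what is wanted.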

How can I add jitter to my seaborn and matplot plots?

I am trying to add jitter to my plots using seaborn and matplotlib. I am getting mixed information from what I am reading online. Some sources say that custom code is needed, while others show it as being as simple as jitter = True. Is there another library or something that I should be importing that I am not aware of? Below is the code that I am running and trying to add jitter to:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
filename = 'https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/JitteredHeadCount.csv'
headcount_df = pd.read_csv(filename)
headcount_df.describe()
%matplotlib inline
ax = plt.figure(figsize=(12, 6)).gca() # define axis
headcount_df.plot.scatter(x = 'Hour', y = 'TablesOpen', ax = ax, alpha = 0.2)
# auto_price.plot(kind = 'scatter', x = 'city-mpg', y = 'price', ax = ax)
ax.set_title('Hour vs TablesOpen') # Give the plot a main title
ax.set_ylabel('TablesOpen')# Set text for y axis
ax.set_xlabel('Hour')
ax = sns.kdeplot(headcount_df.loc[:, ['TablesOpen', 'Hour']], shade = True, cmap = 'PuBu')
headcount_df.plot.scatter(x = 'Hour', y = 'TablesOpen', ax = ax, jitter = True)
ax.set_title('Hour vs TablesOpen') # Give the plot a main title
ax.set_ylabel('TablesOpen')# Set text for y axis
ax.set_xlabel('Hour')
I receive the error: AttributeError: 'PathCollection' object has no property 'jitter' when trying to add the jitter. Any help or more information on this would be much appreciated.
To add jitter to a scatter plot, first get a handle to the collection that contains the scatter dots. When a scatter plot is just created on an ax, ax.collections[-1] will be the desired collection.
Calling get_offsets() on the collection gets all the xy coordinates of the dots. Add some small random number to each of them. As in this case all coordinates are integers, adding a random number between 0 and 1 spreads the dots out evenly.
In this case the number of dots is very large. To better see where the dots are concentrated, they can be made very small (marker=',', linewidth=0, s=1) and very transparent (e.g. alpha=0.1).
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
filename = 'https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/JitteredHeadCount.csv'
headcount_df = pd.read_csv(filename)
fig, ax = plt.subplots(figsize=(12, 6))
headcount_df.plot.scatter(x='Hour', y='TablesOpen', marker=',', linewidth=0, s=1, alpha=.1, color='crimson', ax=ax)
dots = ax.collections[-1]
offsets = dots.get_offsets()
jittered_offsets = offsets + np.random.uniform(0, 1, offsets.shape)
dots.set_offsets(jittered_offsets)
ax.set_title('Hour vs TablesOpen') # Give the plot a main title
ax.set_ylabel('TablesOpen') # Set text for y axis
ax.set_xlabel('Hour')
ax.set_xticks(range(25))
ax.autoscale(enable=True, tight=True)
plt.tight_layout()
plt.show()
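In case the CSV is not reachable, the same get_offsets()/set_offsets() idea can be tried on synthetic integer data. This is only a sketch: the column-like arrays hours and tables_open are made up, and the jitter is centered on the original position (between -0.5 and 0.5) instead of 0 to 1, which is a design choice rather than a requirement.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
# Made-up integer data standing in for the 'Hour' and 'TablesOpen' columns
hours = rng.integers(0, 24, size=2000)
tables_open = rng.integers(0, 25, size=2000)

fig, ax = plt.subplots(figsize=(12, 6))
# ax.scatter returns the PathCollection directly, so ax.collections[-1] is not needed here
dots = ax.scatter(hours, tables_open, s=1, alpha=0.3, color="crimson")

# Copy the current xy positions, add a small random shift, and write them back
original = np.asarray(dots.get_offsets()).copy()
jitter = rng.uniform(-0.5, 0.5, size=original.shape)
dots.set_offsets(original + jitter)
jittered = np.asarray(dots.get_offsets())
```

With centered jitter, each cloud of dots stays centered on its integer tick instead of being shifted half a unit to the right and up.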
As there are a huge number of points, drawing the 2D kde takes a long time. The time can be reduced by taking a random sample of the rows. Note that to draw a 2D kde, recent versions of Seaborn expect each column to be passed as a separate argument (x and y) rather than as a single two-column DataFrame.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
filename = 'https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/JitteredHeadCount.csv'
headcount_df = pd.read_csv(filename)
fig, ax = plt.subplots(figsize=(12, 6))
N = 5000
rand_sel_df = headcount_df.iloc[np.random.choice(range(len(headcount_df)), N)]
# seaborn 0.11+ expects x and y as keywords; fill= replaces the deprecated shade=
ax = sns.kdeplot(x=rand_sel_df['Hour'], y=rand_sel_df['TablesOpen'], fill=True, cmap='PuBu', ax=ax)
ax.set_title('Hour vs TablesOpen')
ax.set_xticks(range(25))
plt.tight_layout()
plt.show()
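As a side note on the sampling step, pandas also offers DataFrame.sample, which could replace the np.random.choice indexing. The sketch below uses a made-up DataFrame in place of the CSV. One difference worth knowing is that sample() draws without replacement by default, whereas np.random.choice as used above draws with replacement.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Made-up stand-in for headcount_df
df = pd.DataFrame({
    "Hour": rng.integers(0, 24, 100_000),
    "TablesOpen": rng.integers(0, 25, 100_000),
})

N = 5000
# Draw N distinct rows; random_state makes the selection reproducible
rand_sel_df = df.sample(n=N, random_state=0)
print(rand_sel_df.shape)
```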

Colorbar for sns.jointplot "kde"-style on the side

I'm trying to plot a colorbar next to my density plot with marginal axes.
It does plot the colorbar, but unfortunately not on the side.
That's what I tried so far:
sns.jointplot(x,y, data=df3, kind="kde", color="skyblue", legend=True, cbar=True,
xlim=[-10,40], ylim=[900,1040])
It looks like this:
I also tried this:
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
kdeplot = sns.jointplot(x=tumg, y=pumg, kind="kde")
plt.subplots_adjust(left=0.2, right=0.8, top=0.8, bottom=0.2)
cbar_ax = kdeplot.fig.add_axes([.85, .25, .05, .4])
plt.colorbar(cax=cbar_ax)
plt.show()
But with the second option I'm getting a runtime error:
No mappable was found to use for colorbar creation.
First define a mappable such as an image (with imshow) or a contour set (with contourf).
Does anyone have an idea how to solve the problem?
The information needed for a colorbar only seems to be available when the colorbar is created together with the kde plot (cbar=True).
So, an idea is to combine both approaches: add a colorbar via kdeplot, and then move it to the desired location. This will leave the main joint plot with insufficient width, so its width also should be adapted:
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
# create some dummy data: gaussian multivariate with 10 centers with each 1000 points
tumg = np.random.normal(np.tile(np.random.uniform(10, 20, 10), 1000), 2)
pumg = np.random.normal(np.tile(np.random.uniform(10, 20, 10), 1000), 2)
kdeplot = sns.jointplot(x=tumg, y=pumg, kind="kde", cbar=True)
plt.subplots_adjust(left=0.1, right=0.8, top=0.9, bottom=0.1)
# get the current positions of the joint ax and the ax for the marginal x
pos_joint_ax = kdeplot.ax_joint.get_position()
pos_marg_x_ax = kdeplot.ax_marg_x.get_position()
# reposition the joint ax so it has the same width as the marginal x ax
kdeplot.ax_joint.set_position([pos_joint_ax.x0, pos_joint_ax.y0, pos_marg_x_ax.width, pos_joint_ax.height])
# reposition the colorbar using new x positions and y positions of the joint ax
kdeplot.fig.axes[-1].set_position([.83, pos_joint_ax.y0, .07, pos_joint_ax.height])
plt.show()
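To illustrate the runtime error from the question: plt.colorbar() without arguments needs a "mappable" to take the colormap and value range from. The following matplotlib-only sketch builds one by hand with ScalarMappable and places the colorbar in a manually added axes. The value range of 0 to 1 is arbitrary here and is not the actual density scale of a kde, so this only shows what the colorbar call expects, not a replacement for the approach above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

fig, ax = plt.subplots()
fig.subplots_adjust(left=0.1, right=0.8)

# Axes reserved for the colorbar: [left, bottom, width, height] in figure coordinates
cbar_ax = fig.add_axes([0.85, 0.25, 0.05, 0.4])

# A mappable bundles a colormap with a normalization (arbitrary range, for illustration)
mappable = ScalarMappable(norm=Normalize(vmin=0.0, vmax=1.0), cmap=plt.get_cmap("Blues"))
cbar = fig.colorbar(mappable, cax=cbar_ax)
```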

Histogram at specific coordinates inside axes

What I want to achieve with Python 3.6 is something like this :
Obviously made in paint and missing some ticks on the xAxis. Is something like this possible? Essentially, can I control exactly where to plot a histogram (and with what orientation)?
I specifically want them to be on the same axes just like the figure above and not on separate axes or subplots.
fig = plt.figure()
ax2Handler = fig.gca()
ax2Handler.scatter(np.array(np.arange(0,len(xData),1)), xData)
ax2Handler.hist(xData,bins=60,orientation='horizontal',normed=True)
This and other approaches (of inverting the axes) gave me no results. xData is loaded from a pandas DataFrame.
# This also doesn't work as intended
fig = plt.figure()
axHistHandler = fig.gca()
axScatterHandler = fig.gca()
axHistHandler.invert_xaxis()
axHistHandler.hist(xData,orientation='horizontal')
axScatterHandler.scatter(np.array(np.arange(0,len(xData),1)), xData)
A. using two axes
There is simply no reason not to use two different axes. The plot from the question can easily be reproduced with two different axes:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use("ggplot")
xData = np.random.rand(1000)
fig,(ax,ax2)= plt.subplots(ncols=2, sharey=True)
fig.subplots_adjust(wspace=0)
ax2.scatter(np.linspace(0,1,len(xData)), xData, s=9)
ax.hist(xData,bins=60,orientation='horizontal',density=True)  # density= replaces the removed normed=
ax.invert_xaxis()
ax.spines['right'].set_visible(False)
ax2.spines['left'].set_visible(False)
ax2.tick_params(axis="y", left=0)
plt.show()
B. using a single axes
Just for the sake of answering the question: In order to plot both in the same axes, one can shift the bars by their length towards the left, effectively giving a mirrored histogram.
import numpy as np
import matplotlib.pyplot as plt
plt.style.use("ggplot")
xData = np.random.rand(1000)
fig,ax= plt.subplots(ncols=1)
fig.subplots_adjust(wspace=0)
ax.scatter(np.linspace(0,1,len(xData)), xData, s=9)
xlim1 = ax.get_xlim()
_,__,bars = ax.hist(xData,bins=60,orientation='horizontal',density=True)  # density= replaces the removed normed=
for bar in bars:
    bar.set_x(-bar.get_width())
xlim2 = ax.get_xlim()
ax.set_xlim(-xlim2[1],xlim1[1])
plt.show()
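Another way to get the mirrored histogram in a single axes, sketched here as an alternative rather than as the method above, is to compute the histogram with np.histogram and draw it with barh using negative widths, so the bars extend to the left of x=0 from the start. The scale factor is an arbitrary choice that controls how much horizontal room the histogram takes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
xData = rng.random(1000)

fig, ax = plt.subplots()
ax.scatter(np.linspace(0, 1, len(xData)), xData, s=9)

# Histogram of the y-values: one density value per bin
density, edges = np.histogram(xData, bins=60, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Arbitrary scaling: the longest bar spans 0.5 x-units
scale = 0.5 / density.max()

# Negative widths make the bars grow from x=0 towards the left
bars = ax.barh(centers, -density * scale, height=np.diff(edges), left=0)
```

Because the bar lengths are rescaled, the x-axis ticks on the left side no longer read as densities; they would need custom tick labels if the values matter.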
You might be interested in seaborn jointplots:
# Import and fake data
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(2,1000)
# actual plot
jg = sns.jointplot(x=data[0], y=data[1], marginal_kws={"bins":100})  # recent seaborn versions require x and y as keywords
jg.ax_marg_x.set_visible(False) # remove the top axis
plt.subplots_adjust(top=1.15) # fill the empty space
produces this:
See more examples of bivariate distribution representations, available in Seaborn.
