I have created three tricontour plots and overlaid them as shown below:
finalDf = pca_cont('%_helix')
x1 = list(finalDf['pc1'])
y1 = list(finalDf['pc2'])
z1 = list(finalDf['%_helix'])
zf1 = threshold(z1, 0.60)
finalDf2 = pca_cont('%_sheet')
x2 = list(finalDf2['pc1'])
y2 = list(finalDf2['pc2'])
z2 = list(finalDf2['%_sheet'])
zf2 = threshold(z2, 0.60)
finalDf3 = pca_cont('%_coil')
x3 = list(finalDf3['pc1'])
y3 = list(finalDf3['pc2'])
z3 = list(finalDf3['%_coil'])
zf3 = threshold(z3, 0.90)
plt.figure(figsize = (16,16))
plt.tricontour(x1,y1,zf1,500,cmap="Reds", alpha=.5)
plt.tricontour(x2,y2,zf2,500,cmap="Greens", alpha=.5)
plt.tricontour(x3,y3,zf3,500,cmap="Blues", alpha=.5)
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("100 Dimension Embedding")
plt.colorbar()
plt.show()
My problem is that I have three sets of colours at various depths, so when I add a colorbar using plt.colorbar() it won't show all three. What is the best way to represent the three sets of data? Should I just plot a simple legend, or is there a way to represent the three sets of data using three colorbars?
Thank you
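One possible approach, sketched here under assumptions, is to keep the object returned by each tricontour call and pass it to its own colorbar call. The data below is synthetic and stands in for the output of pca_cont and threshold, which are not shown in the question; the field shapes and the number of contour levels are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption): three smooth fields sampled at
# the same scattered points, in place of the pca_cont/threshold output.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 300)
y = rng.uniform(-1, 1, 300)
fields = {
    "%_helix": ("Reds", np.exp(-4 * ((x + 0.5) ** 2 + y ** 2))),
    "%_sheet": ("Greens", np.exp(-4 * ((x - 0.5) ** 2 + y ** 2))),
    "%_coil": ("Blues", np.exp(-4 * (x ** 2 + (y - 0.5) ** 2))),
}

fig, ax = plt.subplots(figsize=(10, 6))
for label, (cmap, z) in fields.items():
    # Keep the contour set so that the colorbar knows which data it describes.
    contour_set = ax.tricontour(x, y, z, 10, cmap=cmap, alpha=0.5)
    # One colorbar per contour set; each call takes some space from ax.
    fig.colorbar(contour_set, ax=ax, label=label)

ax.set_xlabel("Principal Component 1")
ax.set_ylabel("Principal Component 2")
```

Calling plt.show() afterwards displays the figure. Three colorbars take up a fair amount of horizontal space, so a wider figure than usual may be needed.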
I am trying to create a plot by grouping the data into categories: "Topic", "Person" and "Organisation" similar to this graph:
How can I do this?
Thank you very much.
This was my code:
figure, axis = plt.subplots(2, 2)
df_per = df_per.sort_values(by=['Frequency'])
y1 = plt.barh(y=df_per.Person, width=df_per.Frequency);
plt.title('Person')
df_org = df_org.sort_values(by=['Frequency'])
y2 = plt.barh(y=df_org.Organization, width=df_org.Frequency);
plt.title('Organization')
df_top = df_loc.sort_values(by=['Frequency'])
y3 = plt.barh(y=df_top.Topic, width=df_top.Frequency);
plt.title('Topic')
plt.show()
You need to select the axes you want to plot on by indexing the two-dimensional array returned by plt.subplots. The titles should be set on those same axes, because plt.title only affects the current axes:
figure, axis = plt.subplots(2, 2)
df_per = df_per.sort_values(by=['Frequency'])
y1 = axis[0,0].barh(y=df_per.Person, width=df_per.Frequency)
axis[0,0].set_title('Person')
df_org = df_org.sort_values(by=['Frequency'])
y2 = axis[0,1].barh(y=df_org.Organization, width=df_org.Frequency)
axis[0,1].set_title('Organization')
df_top = df_loc.sort_values(by=['Frequency'])
y3 = axis[1,0].barh(y=df_top.Topic, width=df_top.Frequency)
axis[1,0].set_title('Topic')
plt.show()
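The same idea can also be written as a loop over the flattened axes array, which avoids repeating the indexing by hand. This is only a sketch: the labels and frequencies below are made-up stand-ins for the DataFrames in the question, and hiding the unused fourth panel is an optional choice.

```python
import matplotlib.pyplot as plt

# Hypothetical stand-in data: title -> (labels, frequencies sorted ascending).
groups = {
    "Person": (["Ann", "Bob", "Cy"], [3, 5, 8]),
    "Organization": (["Acme", "Globex"], [2, 7]),
    "Topic": (["Energy", "Health", "Trade"], [1, 4, 6]),
}

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
flat_axes = list(axes.flat)  # row-major order: [0,0], [0,1], [1,0], [1,1]

for ax, (title, (labels, freqs)) in zip(flat_axes, groups.items()):
    ax.barh(y=labels, width=freqs)
    ax.set_title(title)  # set the title on this specific axes

# Hide any panel that did not receive data (here the bottom-right one).
for ax in flat_axes[len(groups):]:
    ax.set_visible(False)

fig.tight_layout()
```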
I'd like to plot different data sets in a stacked histogram, but I want the data on top to have a step type.
I have done this by splitting the data: the first two sets go in a stacked histogram, and the sum of all the data sets goes in a separate step histogram. Here is the code and plot:
mu, sigma = 100, 10
x1 = list(mu + sigma*np.random.uniform(1,100,100))
x2 = list(mu + sigma*np.random.uniform(1,100,100))
x3 = list(mu + sigma*np.random.uniform(1,100,100))
plt.hist([x1, x2], bins=20, stacked=True, histtype='stepfilled', color=['green', 'red'], zorder=2)
plt.hist(x1+x2+x3, bins=20, histtype='step', ec='dodgerblue', ls='--', linewidth=3., zorder=1)
The problem with this example is that the borders of the 'step' histogram are wider than the 'stepfilled' histogram. Is there any way of fixing this?
For the bars to coincide, two issues need to be solved:
1. The bin boundaries for the histograms should be exactly equal. They can be calculated by dividing the range from the overall minimum to the overall maximum into N equal parts, which gives N+1 boundaries. Both calls to plt.hist need the same bin boundaries.
2. The thick edge of the 'step' histogram makes the bars wider. Therefore, the other histogram needs edges of the same width. plt.hist doesn't seem to accept a list of edge colors for the different parts of the stacked histogram, so a fixed edge color needs to be set. Optionally, the edge color can be changed afterwards by looping through the generated bars.
from matplotlib import pyplot as plt
import numpy as np
mu, sigma = 100, 10
x1 = mu + sigma * np.random.uniform(1, 100, 100)
x2 = mu + sigma * np.random.uniform(1, 100, 100)
x3 = mu + sigma * np.random.uniform(1, 100, 100)
xmin = np.min([x1, x2, x3])
xmax = np.max([x1, x2, x3])
bins = np.linspace(xmin, xmax, 21)
_, _, barlist = plt.hist([x1, x2], bins=bins, stacked=True, histtype='stepfilled',
                         color=['limegreen', 'crimson'], ec='black', linewidth=3, zorder=2)
plt.hist(np.concatenate([x1, x2, x3]), bins=bins, histtype='step',
         ec='dodgerblue', ls='--', linewidth=3, zorder=1)
for bars in barlist:
    for bar in bars:
        bar.set_edgecolor(bar.get_facecolor())
plt.show()
This is how it would look with cross-hatching (plt.hist(..., hatch='X')) and black edges:
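As a side note on the shared bin boundaries, the edges can also be computed with np.histogram_bin_edges, which is an alternative to the np.linspace call used above and not part of the original answer. The sketch below uses made-up uniform samples and only checks the binning, without plotting.

```python
import numpy as np

# Made-up samples (assumption), standing in for x1, x2 and x3 above.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 100)
x2 = rng.uniform(2, 12, 100)
x3 = rng.uniform(-1, 9, 100)

all_values = np.concatenate([x1, x2, x3])

# 20 equal-width bins over the full range give 21 shared boundaries.
edges = np.histogram_bin_edges(all_values, bins=20)

# Reusing the same edges guarantees that both histograms line up exactly.
counts_stacked, _ = np.histogram(np.concatenate([x1, x2]), bins=edges)
counts_total, _ = np.histogram(all_values, bins=edges)

print(len(edges), counts_stacked.sum(), counts_total.sum())
```

The resulting edges array can be passed as bins=edges to both plt.hist calls.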
Using the answers to this question, I could generate the dist plot I needed. However, when I apply the same solution to multiple plots, it doesn't work as expected. I am looking for suggestions. Here is what I am trying to do:
import seaborn as sns, numpy as np
from scipy import stats
import matplotlib.pyplot as plt
plt.figure(figsize=(20,10))
sns.set(); np.random.seed(0)
data01 = np.random.normal(10, 5, 1000)
data02 = np.random.normal(20, 5, 1000)
ax1 = sns.distplot(data01, color = 'blue', kde = True)
x1 = ax1.lines[0].get_xdata()
y1 = ax1.lines[0].get_ydata()
plt.axvline(x1[np.argmax(y1)], color='blue')
ax2 = sns.distplot(data02, color = 'red', kde = True)
x2 = ax2.lines[0].get_xdata()
y2 = ax2.lines[0].get_ydata()
plt.axvline(x2[np.argmax(y2)], color='red')
plt.legend()
Here is what I get, which is not what I expected (I expected two vertical lines, one for each distribution):
You need to use the correct index: index 0 is the blue KDE curve, index 1 is the blue vertical line, and index 2 is the red KDE curve.
As the name suggests, ax2.lines gives you all the lines on the axes. Both distplot calls draw on the same axes, so ax1 and ax2 refer to the same object, and the artists added through ax1 also appear in ax2.lines. When you call distplot with kde=True the second time, there are already two lines (the first KDE curve and its vertical line), so the second KDE curve has index 2, because indexing starts from 0 in Python. If you had used individual subplots instead, you would use index 0 for both.
ax1 = sns.distplot(data01, color = 'blue', kde = True)
x1 = ax1.lines[0].get_xdata()
y1 = ax1.lines[0].get_ydata()
plt.axvline(x1[np.argmax(y1)], color='blue')
ax2 = sns.distplot(data02, color = 'red', kde = True)
x2 = ax2.lines[2].get_xdata() # <--- Use correct index 2 here
y2 = ax2.lines[2].get_ydata() # <--- Use correct index 2 here
plt.axvline(x2[np.argmax(y2)], color='red')
plt.legend()
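The index bookkeeping can be illustrated with plain matplotlib, without seaborn. In this sketch the bell curves are hypothetical stand-ins for the KDE curves, and taking ax.lines[-1] right after plotting each curve is one way, under that assumption, to avoid hard-coding the index.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = np.linspace(-5, 35, 401)

def bell(center, width=5.0):
    # Hypothetical stand-in for a KDE curve: an unnormalized Gaussian.
    return np.exp(-0.5 * ((x - center) / width) ** 2)

peaks = []
for center, color in [(10, "blue"), (20, "red")]:
    ax.plot(x, bell(center), color=color)
    curve = ax.lines[-1]              # the line that was just added
    xd, yd = curve.get_xdata(), curve.get_ydata()
    peak = xd[np.argmax(yd)]
    ax.axvline(peak, color=color)     # this also appends to ax.lines
    peaks.append(peak)

# ax.lines now holds: curve, vline, curve, vline (indices 0 to 3).
print(len(ax.lines), peaks)
```

Because every plotted curve and every vertical line is appended to ax.lines in order, the second curve ends up at index 2, which matches the explanation above.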
Is it possible to make one set of subplots (with 2 plots) in a for loop that runs three times, and then fit the three sets of subplots into one main figure? The point is to have 6 plots on one figure, with a space after every second plot. I know how to put 6 plots in one figure, but I can only put space between every plot, not between every other plot. I hope my question makes sense. The data is a basic set I'm using for practice. Each pair of plots shares the same x-axis, which is why I don't want a space between them.
import matplotlib.pyplot as plt
x1 = [0,1,2,3,4,5]
y1 = [i*2 for i in x1]
y2 = [i*3 for i in x1]
x2 = [4,8,12,16,20]
y3 = [i*5 for i in x2]
y4 = [i*3 for i in x2]
x3 = [0,1,2,3,4,5]
y5 = [i*4 for i in x3]
y6 = [i*7 for i in x3]
fig = plt.figure(1,figsize=(5,5))
ax1 = plt.subplot(611)
ax1.plot(x1,y1)
ax2 = plt.subplot(612)
ax2.plot(x1,y2)
ax3 = plt.subplot(613)
ax3.plot(x2,y3)
ax4 = plt.subplot(614)
ax4.plot(x2,y4)
ax5 = plt.subplot(615)
ax5.plot(x3,y5)
ax6 = plt.subplot(616)
ax6.plot(x3,y6)
fig.subplots_adjust(hspace=0.5)
plt.show()
This is what I get:
Your code makes a graph with six sub-plots. If you make eight subplots and leave two of them empty, you get your added space. Here is the code I used, slightly modified from your code.
import matplotlib.pyplot as plt
x1 = [0,1,2,3,4,5]
y1 = [i*2 for i in x1]
y2 = [i*3 for i in x1]
x2 = [4,8,12,16,20]
y3 = [i*5 for i in x2]
y4 = [i*3 for i in x2]
x3 = [0,1,2,3,4,5]
y5 = [i*4 for i in x3]
y6 = [i*7 for i in x3]
fig = plt.figure(1,figsize=(5,7))
ax1 = plt.subplot(811)
ax1.plot(x1,y1)
ax2 = plt.subplot(812)
ax2.plot(x1,y2)
ax3 = plt.subplot(814)
ax3.plot(x2,y3)
ax4 = plt.subplot(815)
ax4.plot(x2,y4)
ax5 = plt.subplot(817)
ax5.plot(x3,y5)
ax6 = plt.subplot(818)
ax6.plot(x3,y6)
fig.subplots_adjust(hspace=0.5)
plt.show()
I get this result:
I had to increase the figure size height to 7 inches to accommodate the extra space. Is that what you want?
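An alternative that builds each pair inside a loop, as the question asks, is a nested grid: an outer 3x1 GridSpec with vertical spacing, and an inner 2x1 grid without spacing in each outer cell. This is a sketch rather than part of the answer above, and the hspace values and figure size are arbitrary choices.

```python
import matplotlib.pyplot as plt
from matplotlib import gridspec

x1 = [0, 1, 2, 3, 4, 5]
x2 = [4, 8, 12, 16, 20]
x3 = [0, 1, 2, 3, 4, 5]
# Each entry is one pair of plots sharing an x-axis: (x, upper y, lower y).
datasets = [
    (x1, [i * 2 for i in x1], [i * 3 for i in x1]),
    (x2, [i * 5 for i in x2], [i * 3 for i in x2]),
    (x3, [i * 4 for i in x3], [i * 7 for i in x3]),
]

fig = plt.figure(figsize=(5, 7))
outer = gridspec.GridSpec(3, 1, figure=fig, hspace=0.4)  # gaps between pairs

for i, (x, y_upper, y_lower) in enumerate(datasets):
    # Inner grid with no vertical gap, placed inside one outer cell.
    inner = gridspec.GridSpecFromSubplotSpec(
        2, 1, subplot_spec=outer[i], hspace=0.0)
    ax_upper = fig.add_subplot(inner[0])
    ax_lower = fig.add_subplot(inner[1], sharex=ax_upper)
    ax_upper.plot(x, y_upper)
    ax_lower.plot(x, y_lower)
    # Hide the upper panel's x tick labels, since the x-axis is shared.
    ax_upper.tick_params(labelbottom=False)
```

This keeps the two plots of each pair touching while leaving room between the pairs, and no empty placeholder subplots are needed.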
My question is how do I correlate my two binned plots and output a Pearson's correlation coefficient?
I'm not sure how to properly extract the binned arrays necessary for the np.corrcoef function. Here's my script:
import numpy as np
import matplotlib.pyplot as plt
A = np.genfromtxt('data1.txt')
x1 = A[:,1]
y1 = A[:,2]
B=np.genfromtxt('data2.txt')
x2 = B[:,1]
y2 = B[:,2]
fig = plt.figure()
plt.subplots_adjust(hspace=0.5)
plt.subplot(121)
AA = plt.hexbin(x1,y1,cmap='jet',gridsize=500,vmin=0,vmax=450,mincnt=1)
plt.axis([-180,180,-180,180])
cb = plt.colorbar()
plt.title('Data1')
plt.subplot(122)
BB = plt.hexbin(x2,y2,cmap='jet',gridsize=500,vmin=0,vmax=450,mincnt=1)
plt.axis([-180,180,-180,180])
cb = plt.colorbar()
plt.title('Data 2')
array1 = np.ndarray.flatten(AA)
array2 = np.ndarray.flatten(BB)
print(np.corrcoef(array1, array2))
plt.show()
The answer can be found in the documentation:
Returns: object
a PolyCollection instance; use get_array() on this PolyCollection to get the counts in each hexagon.
Here's a revised version of your code:
import numpy as np
import matplotlib.pyplot as plt
A = np.genfromtxt('data1.txt')
x1 = A[:,1]
y1 = A[:,2]
B = np.genfromtxt('data2.txt')
x2 = B[:,1]
y2 = B[:,2]
# make figure and axes
fig, (ax1, ax2) = plt.subplots(1, 2)
# define common keyword arguments
hex_params = dict(cmap='jet', gridsize=500, vmin=0, vmax=450, mincnt=1)
# plot and set titles
hex1 = ax1.hexbin(x1, y1, **hex_params)
hex2 = ax2.hexbin(x2, y2, **hex_params)
ax1.set_title('Data 1')
ax2.set_title('Data 2')
# set axes lims
for ax in (ax1, ax2):
    ax.set_xlim(-180, 180)
    ax.set_ylim(-180, 180)
# add single colorbar
fig.subplots_adjust(right=0.8, hspace=0.5)
cbar_ax = fig.add_axes([0.85, 0.15, 0.05, 0.7])
fig.colorbar(hex2, cax=cbar_ax)
# get binned data and corr coeff
binned1 = hex1.get_array()
binned2 = hex2.get_array()
print(np.corrcoef(binned1, binned2))
plt.show()
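Two comments, though. First, are you sure you want the Pearson correlation coefficient? What are you actually trying to show? If you want to show whether the distributions are the same or different, you might want to use a Kolmogorov-Smirnov test.
Also, don't use jet as a colormap. It is not perceptually uniform, so it can exaggerate or hide features in the data; a perceptually uniform colormap such as viridis is a safer choice.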
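One caveat about correlating the binned counts: np.corrcoef needs two arrays of equal length whose entries refer to the same hexagons. With mincnt=1, empty hexagons are dropped, so the two arrays can differ in length. The sketch below, on synthetic data, assumes that leaving mincnt unset and passing the same gridsize and extent to both calls keeps the hexagons aligned; the small gridsize is only there to keep the example fast.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption), in place of data1.txt and data2.txt.
rng = np.random.default_rng(0)
x1, y1 = rng.uniform(-180, 180, (2, 5000))
x2, y2 = rng.uniform(-180, 180, (2, 5000))

# Same grid for both plots: identical gridsize and extent, no mincnt,
# so empty hexagons are kept and both arrays cover the same cells.
common = dict(gridsize=20, extent=(-180, 180, -180, 180))

fig, (ax1, ax2) = plt.subplots(1, 2)
hex1 = ax1.hexbin(x1, y1, **common)
hex2 = ax2.hexbin(x2, y2, **common)

binned1 = np.asarray(hex1.get_array())
binned2 = np.asarray(hex2.get_array())

# Pearson correlation between the per-hexagon counts.
r = np.corrcoef(binned1, binned2)[0, 1]
print(binned1.shape, binned2.shape, r)
```

If the empty hexagons should not be drawn, one option is to compute the correlation from an unfiltered pair like this and apply mincnt only in the hexbin calls used for display.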