I am trying to create a plot by grouping the data into categories: "Topic", "Person" and "Organisation" similar to this graph:
How can I do this?
Thank you very much.
This was my code:
figure, axis = plt.subplots(2, 2)
df_per = df_per.sort_values(by=['Frequency'])
y1 = plt.barh(y=df_per.Person, width=df_per.Frequency);
plt.title('Person')
df_org = df_org.sort_values(by=['Frequency'])
y2 = plt.barh(y=df_org.Organization, width=df_org.Frequency);
plt.title('Organization')
df_top = df_loc.sort_values(by=['Frequency'])
y3 = plt.barh(y=df_top.Topic, width=df_top.Frequency);
plt.title('Topic')
plt.show()
plt.subplots(2, 2) returns a two-dimensional array of axes, so you need to index it to select the axes you want to plot on. Set each title on that same axes as well, because plt.title only affects the current (most recently created) axes:
figure, axis = plt.subplots(2, 2)
df_per = df_per.sort_values(by=['Frequency'])
y1 = axis[0, 0].barh(y=df_per.Person, width=df_per.Frequency)
axis[0, 0].set_title('Person')
df_org = df_org.sort_values(by=['Frequency'])
y2 = axis[0, 1].barh(y=df_org.Organization, width=df_org.Frequency)
axis[0, 1].set_title('Organization')
df_top = df_loc.sort_values(by=['Frequency'])
y3 = axis[1, 0].barh(y=df_top.Topic, width=df_top.Frequency)
axis[1, 0].set_title('Topic')
plt.show()
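As a self-contained illustration of indexing the axes array, here is a minimal sketch. The three small DataFrames and their contents are made up for the example (the original data is not shown), and hiding the unused fourth panel is my own addition, not something the question asked for.

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for df_per, df_org and df_top.
frames = {
    "Person": pd.DataFrame({"Person": ["Ann", "Bob", "Cy"], "Frequency": [5, 2, 9]}),
    "Organization": pd.DataFrame({"Organization": ["X Ltd", "Y Inc"], "Frequency": [7, 3]}),
    "Topic": pd.DataFrame({"Topic": ["Tax", "Health", "Sport"], "Frequency": [4, 8, 1]}),
}

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
flat_axes = axes.ravel()  # flatten the 2x2 array so it can be looped over

for ax, (category, df) in zip(flat_axes, frames.items()):
    df = df.sort_values(by="Frequency")
    ax.barh(y=df[category], width=df["Frequency"])
    ax.set_title(category)  # title goes on this specific axes

# Three categories in a 2x2 grid leave one panel empty; hide it.
for ax in flat_axes[len(frames):]:
    ax.axis("off")

fig.tight_layout()
# Call plt.show() here when running interactively.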
I want to read an Excel file, sum the values for the years 2021, 2020 and 2019 for the locations from the same region (region B) and then create a graph with two lines (for region A and B) which will show how the values for both regions have changed during the years.
I tried with this code:
import matplotlib.pyplot as plt
import pandas as pd
excel_file_path = "Testfile.xlsx"
df = pd.read_excel(excel_file_path)
x = ["2021", "2020", "2019"]
y1 = df_region["Values2021"]
y2 = df_region["Values2020"]
y3 = df_region["Values2019"]
fig = plt.figure(figsize=(20,5))
plt.plot(x, y1, color = 'red', label = "A")
plt.plot(x, y2, color = 'blue', label = "B")
plt.legend(loc='best')
plt.show()
But it isn't working for me - I get the following error:
"Exception has occurred: ValueError
x and y must have same first dimension, but have shapes (3,) and (2,)"
What do I need to do to get the result that I want? Any help would be greatly appreciated.
I think you meant to define your y1 and y2 a little differently.
x = ["2021", "2020", "2019"]
fig = plt.figure(figsize=(20,5))
colors = ['red', 'blue']
for i, region in enumerate(df_region.index):
    y = df_region.loc[region, :]
    plt.plot(x, y, color = colors[i], label = region)
plt.legend(loc='best')
plt.show()
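This plots one line per region and reads the region names for the legend from the DataFrame index. It assumes df_region has one row per region and exactly three value columns, in the same order as x.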
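The answer assumes a df_region that already holds one row per region. A minimal sketch of how such a table could be built with groupby is shown below. The "Location" and "Region" column names and all the numbers are assumptions for illustration, since the layout of Testfile.xlsx is not shown; an in-memory DataFrame stands in for pd.read_excel.

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for pd.read_excel("Testfile.xlsx").
df = pd.DataFrame({
    "Location": ["A1", "B1", "B2"],
    "Region": ["A", "B", "B"],
    "Values2021": [10, 4, 6],
    "Values2020": [8, 3, 5],
    "Values2019": [7, 2, 2],
})

value_cols = ["Values2021", "Values2020", "Values2019"]
years = ["2021", "2020", "2019"]

# Sum the yearly values of all locations that belong to the same region.
df_region = df.groupby("Region")[value_cols].sum()

fig, ax = plt.subplots(figsize=(10, 4))
for region in df_region.index:
    # One line per region: three years on x, three summed values on y.
    ax.plot(years, df_region.loc[region, value_cols].to_numpy(), label=region)
ax.legend(loc="best")
# Call plt.show() here when running interactively.

Each line now has three y values for the three x labels, which avoids the shape mismatch reported in the error message.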
I am trying to draw three plots, one for each column ('AFp1', 'AFp2', 'F9'), in a single figure with 'freqs' on the x axis and 'psd' on the y axis. I'm looking for a way to loop through the variables, because in the end I want to draw more than 50 plots in one figure.
I found some code that seems to do what I want, but I can't get it to work:
num_plots = 20
colormap = plt.cm.gist_ncar
plt.gca().set_prop_cycle(plt.cycler('color', plt.cm.jet(np.linspace(0, 1, num_plots))))
x = np.arange(10)
labels = []
for i in range(1, num_plots + 1):
    plt.plot(x, i * x + 5 * i)
    labels.append(r'$y = %ix + %i$' % (i, 5*i))
plt.legend(labels, ncol=4, loc='upper center',
bbox_to_anchor=[0.5, 1.1],
columnspacing=1.0, labelspacing=0.0,
handletextpad=0.0, handlelength=1.5,
fancybox=True, shadow=True)
plt.show()
Here is how I tried to include this code in my for loop:
path = r'C:/M'
for fil in os.listdir(path):
#extract SUBJECT name
r = (fil.split(" ")[0])
#load file in pandas dataframe
data = pd.read_csv(path+f'{r} task.txt',sep=",",usecols=['AFp1','AFp2','F9'])
data.columns = ['AFp1','AFp2','F9']
num_plots = 3
for columns in data(1, num_plots + 1):
freqs, psd = signal.welch(data[columns], fs=500,
window='hanning',nperseg=1000, noverlap=500, scaling='density', average='mean')
colormap = plt.cm.gist_ncar
plt.gca().set_prop_cycle(plt.cycler('color', plt.cm.jet(np.linspace(0, 1, num_plots))))
plt.plot(freqs, psd)
plt.legend(columns, ncol=4, loc='upper center',
bbox_to_anchor=[0.5, 1.1],
columnspacing=1.0, labelspacing=0.0,
handletextpad=0.0, handlelength=1.5,
fancybox=True, shadow=True)
plt.title(f'PSD for {r}')#, nperseg=1000, noverlap=500
plt.xlabel('Frequency [Hz]')
plt.ylabel('Power [V**2/Hz]')
plt.axis([0,50, -1, 5])
plt.show()
I get the following error:
for columns in data(1, num_plots + 1):
TypeError: 'DataFrame' object is not callable
If anyone could tell me how I can make it work, it would be great :D
Thank you very much,
Angelika
Shoaib's answer finally worked. Thank you very much:
"You should call plt.show() only once, so put it outside the for loop. Your error occurs because data is a DataFrame, not a function, but you called it like a function with data(something). Check the dimensions of data and then select columns or values with data[something], not data(something). Printing things such as print(data), print(data.columns) or print(len(data)) will help you debug your code."
Here is how you plot three functions in one figure:
import matplotlib.pyplot as plt
from math import sin, cos
x_lim = 6
n = 1000
X = []
Y1 = []
Y2 = []
Y3 = []
for i in range(n):
    x = x_lim * (i/n-1)
    y1 = sin(x)
    y2 = cos(x)
    y3 = x**2
    X.append( x )
    Y1.append( y1 )
    Y2.append( y2 )
    Y3.append( y3 )
plt.plot(X,Y1)
plt.plot(X,Y2)
plt.plot(X,Y3)
plt.title("title")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
Your question was not reproducible because the file you read your data from is not included, so we cannot reproduce your error by copying and pasting your code. But if you have an array called data with, for example, four columns, you can separate the columns and then plot them:
for row in data:
    X.append( row[0] )
    Y1.append( row[1] )
    Y2.append( row[2] )
    Y3.append( row[3] )
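Applied to a pandas DataFrame like the one in the question, the same idea can be sketched as a loop over data.columns with a single figure and one legend call after the loop. The signals below are synthetic (the original text files are not available), and window='hann' is used because, as far as I know, recent SciPy releases no longer accept the older 'hanning' alias.

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import signal

fs = 500  # sampling rate in Hz, as in the question
rng = np.random.default_rng(0)
t = np.arange(10 * fs) / fs

# Synthetic stand-in for one subject's file: a sine plus noise per channel.
data = pd.DataFrame({
    "AFp1": np.sin(2 * np.pi * 10 * t) + rng.normal(size=t.size),
    "AFp2": np.sin(2 * np.pi * 20 * t) + rng.normal(size=t.size),
    "F9": np.sin(2 * np.pi * 30 * t) + rng.normal(size=t.size),
})

fig, ax = plt.subplots()
for column in data.columns:  # index with [], never call data(...)
    freqs, psd = signal.welch(
        data[column].to_numpy(), fs=fs, window="hann",
        nperseg=1000, noverlap=500, scaling="density", average="mean",
    )
    ax.plot(freqs, psd, label=column)

ax.set_xlim(0, 50)
ax.set_xlabel("Frequency [Hz]")
ax.set_ylabel("Power [V**2/Hz]")
ax.legend(ncol=3, loc="upper center")
# Call plt.show() once, here, when running interactively.

Because the loop runs over data.columns, the same code should work unchanged for 50 or more columns.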
I have created three tricontour plots and have overlaid them as shown below:
finalDf = pca_cont('%_helix')
x1 = list(finalDf['pc1'])
y1 = list(finalDf['pc2'])
z1 = list(finalDf['%_helix'])
zf1 = threshold (z1, 0.60)
finalDf2 = pca_cont('%_sheet')
x2 = list(finalDf2['pc1'])
y2 = list(finalDf2['pc2'])
z2 = list(finalDf2['%_sheet'])
zf2 = threshold (z2, 0.60)
finalDf3 = pca_cont('%_coil')
x3 = list(finalDf3['pc1'])
y3 = list(finalDf3['pc2'])
z3 = list(finalDf3['%_coil'])
zf3 = threshold (z3, 0.90)
plt.figure(figsize = (16,16))
plt.tricontour(x1,y1,zf1,500,cmap="Reds", alpha=.5)
plt.tricontour(x2,y2,zf2,500,cmap="Greens", alpha=.5)
plt.tricontour(x3,y3,zf3,500,cmap="Blues", alpha=.5)
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("100 Dimension Embedding")
plt.colorbar()
plt.show()
My problem is that I have three sets of colours with varying depths, so when I add a colorbar using plt.colorbar() it won't show all three. What is the best way to represent the three sets of data? Should I just plot a simple legend, or is there a way to represent the three sets of data using three colorbars?
Thank you
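One possible approach, sketched here under assumptions, is to keep the object returned by each tricontour call and pass it to fig.colorbar, which gives one colorbar per data set. The points and z values below are random placeholders (pca_cont and threshold from the question are not available), and the level count is reduced to 10 to keep the example light.

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = rng.uniform(-1, 1, 200)

# Placeholder z values standing in for %_helix, %_sheet and %_coil.
datasets = {
    "% helix": (np.exp(-((x - 0.4) ** 2 + y ** 2) * 4), "Reds"),
    "% sheet": (np.exp(-((x + 0.4) ** 2 + y ** 2) * 4), "Greens"),
    "% coil": (np.exp(-(x ** 2 + (y - 0.4) ** 2) * 4), "Blues"),
}

fig, ax = plt.subplots(figsize=(10, 6))
for label, (z, cmap) in datasets.items():
    contour_set = ax.tricontour(x, y, z, levels=10, cmap=cmap, alpha=0.5)
    # One colorbar per contour set, each labelled with its data set.
    fig.colorbar(contour_set, ax=ax, label=label)

ax.set_xlabel("Principal Component 1")
ax.set_ylabel("Principal Component 2")
# Call plt.show() here when running interactively.

Each fig.colorbar call takes some space from the main axes, so a wider figure may be needed when three colorbars sit side by side.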
Is it possible to make one set of subplots (with 2 plots) in a for loop that runs three times, and then fit the three sets of subplots into one main figure? The point is to have 6 plots on one figure, with a space between every other plot. I know how to put 6 plots in one figure, but I can only put space between every plot rather than between every other plot. I hope my question makes sense. The data is a basic set I'm using for practice. Each pair of plots shares the same x-axis, which is why I don't want a space between them.
import matplotlib.pyplot as plt
x1 = [0,1,2,3,4,5]
y1 = [i*2 for i in x1]
y2 = [i*3 for i in x1]
x2 = [4,8,12,16,20]
y3 = [i*5 for i in x2]
y4 = [i*3 for i in x2]
x3 = [0,1,2,3,4,5]
y5 = [i*4 for i in x3]
y6 = [i*7 for i in x3]
fig = plt.figure(1,figsize=(5,5))
ax1 = plt.subplot(611)
ax1.plot(x1,y1)
ax2 = plt.subplot(612)
ax2.plot(x1,y2)
ax3 = plt.subplot(613)
ax3.plot(x2,y3)
ax4 = plt.subplot(614)
ax4.plot(x2,y4)
ax5 = plt.subplot(615)
ax5.plot(x3,y5)
ax6 = plt.subplot(616)
ax6.plot(x3,y6)
fig.subplots_adjust(hspace=0.5)
plt.show()
This is what I get:
Your code makes a graph with six sub-plots. If you make eight subplots and leave two of them empty, you get your added space. Here is the code I used, slightly modified from your code.
import matplotlib.pyplot as plt
x1 = [0,1,2,3,4,5]
y1 = [i*2 for i in x1]
y2 = [i*3 for i in x1]
x2 = [4,8,12,16,20]
y3 = [i*5 for i in x2]
y4 = [i*3 for i in x2]
x3 = [0,1,2,3,4,5]
y5 = [i*4 for i in x3]
y6 = [i*7 for i in x3]
fig = plt.figure(1,figsize=(5,7))
ax1 = plt.subplot(811)
ax1.plot(x1,y1)
ax2 = plt.subplot(812)
ax2.plot(x1,y2)
ax3 = plt.subplot(814)
ax3.plot(x2,y3)
ax4 = plt.subplot(815)
ax4.plot(x2,y4)
ax5 = plt.subplot(817)
ax5.plot(x3,y5)
ax6 = plt.subplot(818)
ax6.plot(x3,y6)
fig.subplots_adjust(hspace=0.5)
plt.show()
I get this result:
I had to increase the figure size height to 7 inches to accommodate the extra space. Is that what you want?
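Since the question asked for a for loop, the same eight-row layout can be sketched with one loop iteration per pair. The grid positions 1-2, 4-5 and 7-8 match the answer above; sharing the x axis within each pair and hiding the top plot's tick labels are my own assumptions about the intended look.

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x1 = [0, 1, 2, 3, 4, 5]
x2 = [4, 8, 12, 16, 20]
x3 = [0, 1, 2, 3, 4, 5]

# Each entry is (x values, y values of top plot, y values of bottom plot).
pairs = [
    (x1, [i * 2 for i in x1], [i * 3 for i in x1]),
    (x2, [i * 5 for i in x2], [i * 3 for i in x2]),
    (x3, [i * 4 for i in x3], [i * 7 for i in x3]),
]

fig = plt.figure(figsize=(5, 7))
for k, (x, y_top, y_bottom) in enumerate(pairs):
    # Rows 3k+1 and 3k+2 are used; row 3k+3 stays empty as the spacer.
    top = fig.add_subplot(8, 1, 3 * k + 1)
    bottom = fig.add_subplot(8, 1, 3 * k + 2, sharex=top)
    top.plot(x, y_top)
    bottom.plot(x, y_bottom)
    top.tick_params(labelbottom=False)  # the pair shares one set of x labels

fig.subplots_adjust(hspace=0.5)
# Call plt.show() here when running interactively.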
My question is how do I correlate my two binned plots and output a Pearson's correlation coefficient?
I'm not sure how to properly extract the binned arrays necessary for the np.corrcoef function. Here's my script:
import numpy as np
import matplotlib.pyplot as plt
A = np.genfromtxt('data1.txt')
x1 = A[:,1]
y1 = A[:,2]
B=np.genfromtxt('data2.txt')
x2 = B[:,1]
y2 = B[:,2]
fig = plt.figure()
plt.subplots_adjust(hspace=0.5)
plt.subplot(121)
AA = plt.hexbin(x1,y1,cmap='jet',gridsize=500,vmin=0,vmax=450,mincnt=1)
plt.axis([-180,180,-180,180])
cb = plt.colorbar()
plt.title('Data1')
plt.subplot(122)
BB = plt.hexbin(x2,y2,cmap='jet',gridsize=500,vmin=0,vmax=450,mincnt=1)
plt.axis([-180,180,-180,180])
cb = plt.colorbar()
plt.title('Data 2')
array1 = np.ndarray.flatten(AA)
array2 = np.ndarray.flatten(BB)
print np.corrcoef(array1,array2)
plt.show()
The answer can be found in the documentation:
Returns: object
a PolyCollection instance; use get_array() on this PolyCollection to get the counts in each hexagon.
Here's a revised version of your code:
import numpy as np
import matplotlib.pyplot as plt
A = np.genfromtxt('data1.txt')
x1 = A[:,1]
y1 = A[:,2]
B = np.genfromtxt('data2.txt')
x2 = B[:,1]
y2 = B[:,2]
# make figure and axes
fig, (ax1, ax2) = plt.subplots(1, 2)
# define common keyword arguments
hex_params = dict(cmap='jet', gridsize=500, vmin=0, vmax=450, mincnt=1)
# plot and set titles
hex1 = ax1.hexbin(x1, y1, **hex_params)
hex2 = ax2.hexbin(x2, y2, **hex_params)
ax1.set_title('Data 1')
ax2.set_title('Data 2')
# set axes lims
for ax in (ax1, ax2):
    ax.set_xlim(-180, 180)
    ax.set_ylim(-180, 180)
# add single colorbar
fig.subplots_adjust(right=0.8, hspace=0.5)
cbar_ax = fig.add_axes([0.85, 0.15, 0.05, 0.7])
fig.colorbar(hex2, cax=cbar_ax)
# get binned data and corr coeff
binned1 = hex1.get_array()
binned2 = hex2.get_array()
print(np.corrcoef(binned1, binned2))
plt.show()
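Two comments though: are you sure you want the Pearson correlation coefficient? What are you actually trying to show? If you want to show whether the distributions are the same or different, you might want to use a Kolmogorov-Smirnov test.
Also, avoid jet as a colormap. It is not perceptually uniform, so it can exaggerate or hide features in the data; a perceptually uniform map such as viridis is a safer default.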
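One caveat worth checking when correlating the binned counts: np.corrcoef needs two arrays of equal length, and with mincnt=1 hexbin drops empty hexagons, so the two get_array() results can end up with different sizes. A minimal sketch that sidesteps this, assuming it is acceptable to leave mincnt unset and to fix the same extent and gridsize for both plots, is shown below. The data is random and the gridsize of 30 is arbitrary.

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Random stand-ins for the columns read from data1.txt and data2.txt.
x1, y1 = rng.uniform(-180, 180, (2, 5000))
x2, y2 = rng.uniform(-180, 180, (2, 5000))

# Same gridsize and extent, and no mincnt, so both plots use an identical hexagon grid.
hex_params = dict(gridsize=30, extent=(-180, 180, -180, 180), cmap="viridis")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
hex1 = ax1.hexbin(x1, y1, **hex_params)
hex2 = ax2.hexbin(x2, y2, **hex_params)

# Counts per hexagon, in the same hexagon order for both plots.
binned1 = np.asarray(hex1.get_array())
binned2 = np.asarray(hex2.get_array())

# corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r.
r = np.corrcoef(binned1, binned2)[0, 1]
print(r)
# Call plt.show() here when running interactively.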