I have a dataframe as shown below:
PNO,SID,SIZE,N3IC,S4IC,N4TC,KPAC,NAAC,ECTC
SJV026,VIDAIC,FINE,0.0926,0.0446,0.0333,0.0185,0.005,0.0516
SJV028,CHCRUC,FINE,0,0.1472,0.0076,0.0001,0.0025,0.0301
SJV051,AMSUL,FINE,0,0.727,0.273,0,0,0
SJV035,MOVES1,FINE,0.02,0.04092,0,0,0,0.45404
I am looking to plot this using matplotlib or seaborn where there will be 'n' number of subplots for each row of data (that is one bar plot for each row of data).
import matplotlib.pyplot as plt
import pandas as pd
from tkinter.filedialog import askopenfilename
Inp_Filename = askopenfilename()
df = pd.read_csv(Inp_Filename)
rows, columns = df.shape
fig, axes = plt.subplots(rows, 1, figsize=(15, 20))
count = 0
for each in df.iterrows():
    row = df.iloc[count,3:]
    row.plot(kind='bar')
    count = count + 1
plt.show()
The above code output is not what I am looking for. Is there a way to plot each row of the data in the 'fig' and 'axes' above?
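In principle the approach is correct. There are just a couple of errors in your code, which, when corrected, give the desired result. The main one is that each row.plot call needs ax=axes[i], so that the bars are drawn into the subplot created for that row; the loop counter can also come directly from iterrows.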
import io
import matplotlib.pyplot as plt
import pandas as pd
u = u"""PNO,SID,SIZE,N3IC,S4IC,N4TC,KPAC,NAAC,ECTC
SJV026,VIDAIC,FINE,0.0926,0.0446,0.0333,0.0185,0.005,0.0516
SJV028,CHCRUC,FINE,0,0.1472,0.0076,0.0001,0.0025,0.0301
SJV051,AMSUL,FINE,0,0.727,0.273,0,0,0
SJV035,MOVES1,FINE,0.02,0.04092,0,0,0,0.45404"""
df = pd.read_csv(io.StringIO(u))
rows, columns = df.shape
fig, axes = plt.subplots(rows, 1, sharex=True, sharey=True)
for i, r in df.iterrows():
    row = df.iloc[i,3:]
    row.plot(kind='bar', ax=axes[i])
plt.show()
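The loop above looks rows up with df.iloc[i, ...], which works here because read_csv produces an index running 0..n-1. As a hedged variant, the sketch below pairs each Axes with its row through zip, so it does not depend on the index labels. The helper name plot_rows and its n_meta parameter are made up for this illustration and are not part of pandas or matplotlib.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

CSV = """PNO,SID,SIZE,N3IC,S4IC,N4TC,KPAC,NAAC,ECTC
SJV026,VIDAIC,FINE,0.0926,0.0446,0.0333,0.0185,0.005,0.0516
SJV028,CHCRUC,FINE,0,0.1472,0.0076,0.0001,0.0025,0.0301
SJV051,AMSUL,FINE,0,0.727,0.273,0,0,0
SJV035,MOVES1,FINE,0.02,0.04092,0,0,0,0.45404"""


def plot_rows(df, n_meta=3):
    """Draw one bar chart per row; the first n_meta columns are treated as labels."""
    fig, axes = plt.subplots(len(df), 1, sharex=True, sharey=True)
    axes = np.atleast_1d(axes)  # plt.subplots returns a bare Axes when there is one row
    for ax, (_, row) in zip(axes, df.iterrows()):
        # Keep only the numeric columns and draw them into this row's own Axes
        row.iloc[n_meta:].astype(float).plot(kind="bar", ax=ax)
        ax.set_title(str(row.iloc[0]))  # first column (PNO) as the panel title
    return fig, axes


df = pd.read_csv(io.StringIO(CSV))
fig, axes = plot_rows(df)
# plt.show() would display the figure in an interactive session
```

Returning fig and axes lets the caller adjust figsize, labels, or spacing afterwards.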
I'm trying to plot two histograms from the result of a group by, but the axis labels appear on only one of the charts.
How can I put the labels on both charts?
And how can I give the charts different titles (e.g. the first as Men's grade and the second as Women's grade)?
import pandas as pd
import matplotlib.pyplot as plt
microdataEnem = pd.read_csv('C:\\Users\\Lucas\\AppData\\Local\\Programs\\Python\\Python39\\Scripts\\Data Science\\Data Analysis\\Projects\\ENEM\\DADOS\\MICRODADOS_ENEM_2019.csv', sep = ';', encoding = 'ISO-8859-1', nrows=10000)
sex_essaygrade = ['TP_SEXO', 'NU_NOTA_REDACAO']
filter_sex_essaygrade = microdataEnem.filter(items = sex_essaygrade)
filter_sex_essaygrade.dropna(subset = ['NU_NOTA_REDACAO'], inplace = True)
filter_sex_essaygrade.groupby('TP_SEXO').hist()
plt.xlabel('Grade')
plt.ylabel('Number of students')
plt.show()
Instead of using filter_sex_essaygrade.groupby('TP_SEXO').hist() you can try the following format: axs = filter_sex_essaygrade['NU_NOTA_REDACAO'].hist(by=filter_sex_essaygrade['TP_SEXO']). This will automatically title each histogram with the group name.
Assign the returned array of axes to a variable such as axs so that you can set the x and y labels on both plots.
I created some data similar to yours, and I get the following result:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(42)
sex_essaygrade = ['TP_SEXO', 'NU_NOTA_REDACAO']
## create two distinct sets of grades
sample_grades = np.concatenate((np.random.randint(low=70,high=100,size=100), np.random.randint(low=80,high=100,size=100)))
filter_sex_essaygrade = pd.DataFrame({
    'NU_NOTA_REDACAO': sample_grades,
    'TP_SEXO': ['Men']*100 + ['Women']*100
})
axs = filter_sex_essaygrade['NU_NOTA_REDACAO'].hist(by=filter_sex_essaygrade['TP_SEXO'])
for ax in axs.flatten():
    ax.set_xlabel("Grade")
    ax.set_ylabel("Number of students")
plt.show()
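The question also asked for custom titles. Since hist(by=...) titles each panel with its group name, one possible approach is to read that title back and map it to the text you want. The sketch below assumes the real column holds the codes "M" and "F"; the tiny data set and the custom_titles mapping are invented for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import pandas as pd

# Made-up sample; the group codes "M"/"F" are an assumption about TP_SEXO
grades = pd.DataFrame({
    "NU_NOTA_REDACAO": [70, 85, 90, 60, 75, 95],
    "TP_SEXO": ["M", "M", "M", "F", "F", "F"],
})
custom_titles = {"M": "Men's grade", "F": "Women's grade"}  # hypothetical mapping

axs = grades["NU_NOTA_REDACAO"].hist(by=grades["TP_SEXO"])
for ax in axs.flatten():
    group = ax.get_title()  # pandas has already set the group name as the title
    ax.set_title(custom_titles.get(group, group))
    ax.set_xlabel("Grade")
    ax.set_ylabel("Number of students")

titles = [ax.get_title() for ax in axs.flatten()]
```

Using .get(group, group) keeps the original title for any group that is missing from the mapping.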
I am trying to create a grid of subplots. Each subplot will look like the one on this site:
https://python-graph-gallery.com/24-histogram-with-a-boxplot-on-top-seaborn/
If I have 10 different sets of this style of plot, I want to arrange them into a 5x2 grid, for example.
I have read through the Matplotlib documentation and cannot figure out how to do it. I can loop over the subplots and output each one, but I cannot arrange them into rows and columns.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randint(0,100,size=(100, 10)),columns=list('ABCDEFGHIJ'))
for c in df:
    # Cut the window in 2 parts
    f, (ax_box, ax_hist) = plt.subplots(2,
                                        sharex=True,
                                        gridspec_kw={"height_ratios":(.15, .85)},
                                        figsize = (10, 10))
    # Add a graph in each part
    sns.boxplot(df[c], ax=ax_box)
    ax_hist.hist(df[c])
    # Remove x axis name for the boxplot
    plt.show()
The result I want is to take the plots from this loop and arrange them in rows and columns, in this case 5x2.
You have 10 columns, each of which creates 2 subplots: a box plot and a histogram. So you need a total of 20 axes. You can get them by creating a grid of 2 rows and 10 columns.
Complete answer: (Adjust the figsize and height_ratios as per taste)
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
f, axes = plt.subplots(2, 10, sharex=True, gridspec_kw={"height_ratios":(.35, .35)},
                       figsize = (12, 5))
df = pd.DataFrame(np.random.randint(0,100,size=(100, 10)),columns=list('ABCDEFGHIJ'))
for i, c in enumerate(df):
    # pass the column as x= so the box is drawn horizontally, along the shared x axis
    sns.boxplot(x=df[c], ax=axes[0,i])
    axes[1,i].hist(df[c])
plt.tight_layout()
plt.show()
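The 2x10 grid above puts all box plots in one row and all histograms in the other. If the goal is the 5x2 arrangement from the question, with each cell holding its own box-over-histogram pair, one option is a nested GridSpec. The following is only a sketch under that assumption; the spacing values and figure size are arbitrary choices, not taken from the answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 10)), columns=list("ABCDEFGHIJ"))

n_rows, n_cols = 5, 2
fig = plt.figure(figsize=(10, 16))
# Outer grid: one cell per DataFrame column
outer = fig.add_gridspec(n_rows, n_cols, hspace=0.4, wspace=0.2)

pairs = []
for idx, col in enumerate(df.columns):
    r, c = divmod(idx, n_cols)
    # Inner grid: a thin strip for the box plot above a taller histogram
    inner = outer[r, c].subgridspec(2, 1, height_ratios=[0.15, 0.85], hspace=0.05)
    ax_box = fig.add_subplot(inner[0])
    ax_hist = fig.add_subplot(inner[1], sharex=ax_box)
    sns.boxplot(x=df[col], ax=ax_box)
    ax_hist.hist(df[col])
    ax_box.set(xlabel="")                  # remove x axis name for the box plot
    ax_box.tick_params(labelbottom=False)  # tick labels only on the histogram
    ax_hist.set_xlabel(col)
    pairs.append((ax_box, ax_hist))
# plt.show() would display the figure in an interactive session
```

Each pair shares its x axis only with its partner, so the ten cells stay independent of one another.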
I have a pandas dataframe which I would like to slice, and plot each slice in a separate subplot. I would like to use the sharey='all' and have matplotlib decide on some reasonable y-axis limits, rather than having to search the dataframe for the min and max and add offsets.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig, axes = plt.subplots(nrows=0,ncols=0, sharey='all', tight_layout=True)
for i in range(1, len(df.columns) + 1):
    ax = fig.add_subplot(2,3,i)
    iC = df.iloc[:, i-1]
    iC.plot(ax=ax)
Which gives the following plot:
In fact, it gives that irrespective of what I specify sharey to be ('all','col','row',True, or False). What I sought after using sharey='all' would be something like:
Can somebody explain what I'm doing wrong here?
The following version would only add those axes you need for your df-columns and share their y-scales:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig = plt.figure(tight_layout=True)
ref_ax = None
for i in range(len(df.columns)):
    ax = fig.add_subplot(2, 3, i+1, sharey=ref_ax)
    ref_ax=ax
    iC = df.iloc[:, i]
    iC.plot(ax=ax)
plt.show()
The grid layout parameters, given explicitly here as add_subplot(2, 3, ...), can of course be calculated from len(df.columns).
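As a rough illustration of that calculation, the sketch below derives the number of rows from the number of columns to plot and a chosen grid width. The helper grid_shape and its default of three columns are assumptions made for this example, not something the answer prescribes.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def grid_shape(n_plots, n_cols=3):
    """Return (n_rows, n_cols) large enough to hold n_plots panels."""
    n_cols = max(1, min(n_cols, n_plots))  # never wider than the number of panels
    n_rows = math.ceil(n_plots / n_cols)   # round up so the last row may be partly empty
    return n_rows, n_cols


df = pd.DataFrame(np.arange(50).reshape((5, 10))).transpose()
n_rows, n_cols = grid_shape(len(df.columns))

fig = plt.figure(tight_layout=True)
ref_ax = None
for i, col in enumerate(df.columns):
    # Each new Axes shares its y axis with the previous one, chaining them all together
    ax = fig.add_subplot(n_rows, n_cols, i + 1, sharey=ref_ax)
    ref_ax = ax
    df[col].plot(ax=ax)
# plt.show() would display the figure in an interactive session
```

Because only the needed subplots are added, no empty Axes have to be switched off afterwards.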
Your plots are not shared. You create a subplot grid with 0 rows and 0 columns, i.e. no subplots at all, but those nonexisting subplots have their y axes shared. Then you create some other (existing) subplots, which are not shared. Those are the ones that are plotted to.
Instead, you need to set nrows and ncols to some useful values and plot to the axes that this call creates.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig, axes = plt.subplots(nrows=2,ncols=3, sharey='all', tight_layout=True)
for i, ax in zip(range(len(df.columns)), axes.flat):
    iC = df.iloc[:, i]
    iC.plot(ax=ax)
for j in range(len(df.columns),len(axes.flat)):
    axes.flatten()[j].axis("off")
plt.show()
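To check for yourself whether two Axes actually share a y axis, you can ask the shared-axes group of one of them. The snippet below is a small sketch of that check; the variable names are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt

# Axes returned by plt.subplots(..., sharey="all") are linked to one another.
fig_a, axes_a = plt.subplots(nrows=2, ncols=3, sharey="all")
first, last = axes_a.flat[0], axes_a.flat[-1]
linked = first.get_shared_y_axes().joined(first, last)

# Axes added later with add_subplot get no sharing unless sharey= is passed.
fig_b = plt.figure()
b1 = fig_b.add_subplot(2, 3, 1)
b2 = fig_b.add_subplot(2, 3, 2)
independent = not b1.get_shared_y_axes().joined(b1, b2)

print(linked, independent)
```

This makes the difference between the two ways of creating subplots visible without having to inspect the rendered figure.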
I have a Dataframe and I slice the Dataframe into three subsets. Each subset has 3 to 4 rows of data. After I slice the data frame into three subsets, I plot them using Matplotlib.
The problem is that I cannot get each subplot to show one row of the sliced DataFrame. For example, for a group with three rows, only the last subplot is drawn and the first two stay empty. It looks like the 'r' value is not passed to 'r.plot' for all three subplots.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['key1'] = 0
df.key1.iloc[0:3] = 1
df.key1.iloc[3:7] = 2
df.key1.iloc[7:] = 3
df_grouped = df.groupby('key1')
for group_name, group_value in df_grouped:
    rows, columns = group_value.shape
    fig, axes = plt.subplots(rows, 1, sharex=True, sharey=True, figsize=(15,20))
    for i,r in group_value.iterrows():
        r = r[0:columns-1]
        r.plot(kind='bar', fill=False, log=False)
I think you might want what I call df_subset to be summarized in some way, but here's a way to plot each group in its own panel.
# Your Code Setting Up the Dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['key1'] = 0
df.key1.iloc[0:3] = 1
df.key1.iloc[3:7] = 2
df.key1.iloc[7:] = 3
# My Code to Plot in Three Panels
distinct_keys = df['key1'].unique()
fig, axes = plt.subplots(len(distinct_keys), 1, sharex=True, figsize=(3,5))
for i, key in enumerate(distinct_keys):
    df_subset = df[df.key1==key]
    # {maybe insert a line here to summarize df_subset somehow interesting?}
    # plot into the i-th panel; without ax=, pandas opens a new figure on every call
    df_subset.plot(kind='bar', ax=axes[i], fill=False, log=False)
plt.show()
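If the layout from the question is what you are after (one figure per group, one subplot per row of that group), a possible sketch is shown below. It assumes the same key1 grouping as the question; the figures dictionary and the axis labels are additions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=list("ABCD"))
df["key1"] = [1] * 3 + [2] * 4 + [3] * 3  # same grouping as in the question

figures = {}
for key, group in df.groupby("key1"):
    values = group.drop(columns="key1")  # plot only the data columns A..D
    fig, axes = plt.subplots(len(values), 1, sharex=True, sharey=True)
    axes = np.atleast_1d(axes)  # keeps the zip below valid for one-row groups
    for ax, (label, row) in zip(axes, values.iterrows()):
        row.plot(kind="bar", ax=ax, fill=False)  # ax= sends each row to its own panel
        ax.set_ylabel(f"row {label}")
    fig.suptitle(f"key1 = {key}")
    figures[key] = fig
# plt.show() would display the figures in an interactive session
```

The missing ax= argument is what left the first panels empty in the question: without it, every call draws into the most recently created Axes.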
I have three different data sets, and I produce a facet plot for each:
a = sns.FacetGrid(data1, col="overlap", hue="comp")
a = (a.map(sns.kdeplot, "val",bw=0.8))
b = sns.FacetGrid(data2, col="overlap", hue="comp")
b = (b.map(sns.kdeplot, "val",bw=0.8))
c = sns.FacetGrid(data3, col="overlap", hue="comp")
c = (c.map(sns.kdeplot, "val",bw=0.8))
Each of those plots has three subplots in one row, so in total I have nine plots.
I would like to combine these plots, in a subplots setting like this
f, (ax1, ax2, ax3) = plt.subplots(3,1)
ax1.a
ax2.b
ax3.c
How can I do that?
A FacetGrid creates its own figure. Combining several figures into one is not an easy task. Additionally, there is no such thing as subplot rows which can be added to a figure. So one would need to manipulate the axes individually.
That said, it might be easier to find workarounds. E.g. if the dataframes to show have the same structure as it seems to be from the question code, one can combine the dataframes into a single frame with a new column and use this as the row attribute of the facet grid.
import numpy as np; np.random.seed(3)
import pandas as pd
import seaborn as sns  # seaborn.apionly no longer exists in current seaborn
import matplotlib.pyplot as plt
def get_data(n=266, s=[5,13]):
    val = np.c_[np.random.poisson(lam=s[0], size=n),
                np.random.poisson(lam=s[1], size=n)].T.flatten()
    comp = [s[0]]*n + [s[1]]*n
    ov = np.random.choice(list("ABC"), size=2*n)
    return pd.DataFrame({"val":val, "overlap":ov, "comp":comp})
data1 = get_data(s=[9,11])
data2 = get_data(s=[7,19])
data3 = get_data(s=[1,27])
#option1 combine
for i, df in enumerate([data1,data2,data3]):
    df["data"] = ["data{}".format(i+1)] * len(df)
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
data = pd.concat([data1, data2, data3], ignore_index=True)
bw = 2
a = sns.FacetGrid(data, col="overlap", hue="comp", row="data")
# bw_method replaces the deprecated bw argument of kdeplot
a = (a.map(sns.kdeplot, "val", bw_method=bw))
plt.show()
You can also loop over the dataframes and axes to obtain the desired result.
import numpy as np; np.random.seed(3)
import pandas as pd
import seaborn as sns  # seaborn.apionly no longer exists in current seaborn
import matplotlib.pyplot as plt
def get_data(n=266, s=[5,13]):
    val = np.c_[np.random.poisson(lam=s[0], size=n),
                np.random.poisson(lam=s[1], size=n)].T.flatten()
    comp = [s[0]]*n + [s[1]]*n
    ov = np.random.choice(list("ABC"), size=2*n)
    return pd.DataFrame({"val":val, "overlap":ov, "comp":comp})
data1 = get_data(s=[9,11])
data2 = get_data(s=[7,19])
data3 = get_data(s=[1,27])
#option2 plot each subplot individually
data = [data1,data2,data3]
bw = 2
fig, axes = plt.subplots(3,3, sharex=True, sharey=True)
for i in range(3):
    for j in range(3):
        x = data[i]
        x = x[x["overlap"] == x["overlap"].unique()[j]]
        for hue in x["comp"].unique():
            d = x[x["comp"] == hue]
            # bw_method replaces the deprecated bw argument of kdeplot
            sns.kdeplot(x=d["val"], ax=axes[i,j], bw_method=bw, label=hue)
plt.show()