pandas subplots with multiindex column - python

According to the following answer I should be able to use subplots with multiindex to show pairwise plots: Pandas Plotting with Multi-Index
But this seems not to work for me.
This is an example:
import pandas as pd
import numpy as np
# prepare example data
adf = pd.DataFrame(index=pd.date_range('2019-01-01', periods=30),
                   data=np.random.randint(0, 100, (30, 2)), columns=['X', 'Y'])
bdf = pd.DataFrame(index=pd.date_range('2019-01-01', periods=30),
                   data=np.random.randint(0, 100, (30, 2)), columns=['X', 'Y'])
df = pd.concat({'a': adf, 'b': bdf}).unstack(level=0)
# plot
_ = df.plot(kind='line', subplots=True, figsize=(10, 10))
The result is 4 plots, one for each column.
But I want 2 pairwise plots like this:
I can achieve this with the following line:
_ = [df.loc[:,df.columns.get_level_values(0) == c].plot() for c in df.columns.get_level_values(0).unique()]
But shouldn't this be possible with a multiindex and the subplots feature?
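One way to get the two pairwise plots on a single figure is to create the axes first and draw each top-level column group into its own axes. The following is a minimal sketch under that assumption, not a built-in multiindex feature; the seeded random generator and the Agg backend line are only there so the example runs reproducibly without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same shape of data as in the question, with a seeded generator
rng = np.random.default_rng(0)
idx = pd.date_range("2019-01-01", periods=30)
adf = pd.DataFrame(rng.integers(0, 100, (30, 2)), index=idx, columns=["X", "Y"])
bdf = pd.DataFrame(rng.integers(0, 100, (30, 2)), index=idx, columns=["X", "Y"])
df = pd.concat({"a": adf, "b": bdf}).unstack(level=0)  # columns: (X, a), (X, b), (Y, a), (Y, b)

# One axes per value of the outer column level
groups = df.columns.get_level_values(0).unique()
fig, axes = plt.subplots(len(groups), 1, sharex=True, figsize=(10, 10), squeeze=False)
for ax, name in zip(axes.ravel(), groups):
    # df[name] selects the outer level and leaves the inner columns 'a' and 'b'
    df[name].plot(ax=ax, title=name)
```

Each axes then holds one line per inner-level column, which is the pairwise layout asked for.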


How to display different values of the same column in plot graph/chart

The column class has 2 options for the value, either 'b' or 's'. I am trying to display a graph/chart that shows how many are 'b' and how many are 's'. I can't figure out how to do this when they are both in the same column.
The current code shows a scatter plot, but I'd like to use the data from column 'class'.
import pandas as pd
import matplotlib.pylab as plt
import numpy as np
#df = df.groupby('class')['class'].count()
df = pd.DataFrame(np.random.randint(0,10,size=(5, 2)), columns=['x','y'])
df['class'] = ['Benign','Malware','Benign','Malware','Malware']
# plot groupby results on the same canvas
fig, ax = plt.subplots(figsize=(8,6))
for n, grp in df.groupby('class'):
    ax.scatter(x = "x", y = "y", data=grp, label=n)
Don't group by class and loop, rather aggregate (with value_counts) then plot:
or with matplotlib's functions:
s = df['class'].value_counts()
plt.bar(s.index, s)
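The aggregate-then-plot idea can be sketched with the pandas plotting accessor as follows. It reuses the class labels from the question; the axis label and the Agg backend line are assumptions added so the example is self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"class": ["Benign", "Malware", "Benign", "Malware", "Malware"]})

# Count how often each label occurs, then draw one bar per label
counts = df["class"].value_counts()
fig, ax = plt.subplots(figsize=(8, 6))
counts.plot.bar(ax=ax, rot=0)
ax.set_ylabel("count")
```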

How to plot seaborn lmplots in multiple subplots

I was trying to plot multiple lmplots in the same figure. But I am getting too many unwanted subplots.
I found another SO link How to plot 2 seaborn lmplots side-by-side? but that also did not help me.
In this example I want 1 row 2 columns.
# imports
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# data
df = sns.load_dataset('titanic')
# plot
m,n = 1,2
cols1 = ['age','fare']
cols2 = ['fare','age']
target = 'survived'
fontsize = 12
fig, ax = plt.subplots(m,n,figsize=figsize)
for i, (col1,col2) in enumerate(zip(cols1,cols2)):
hue=target, palette='Set1',
plt.tick_params(axis='both', which='major', labelsize=fontsize)
for i in range(m*n-len(cols1)):
My attempt so far:
df = pd.DataFrame({'x0':[10,20,30,40],
'y0': [100,200,300,400],
'target': [0,1,1,1]
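# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df1 = pd.concat([df, df])
df1 = df1.reset_index(drop=True)
# assign through .loc instead of chained indexing so the write hits df1 itself
df1.loc[len(df):, 'x0'] = df['x1'].to_numpy()
df1.loc[len(df):, 'y0'] = df['y1'].to_numpy()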
df1['col'] = ['c0']* len(df) + ['c1'] * len(df)
df1 = df1.drop(['x1','y1'],axis=1)
df1 = df1.rename(columns={'x0':'x','y0':'y'})
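sns.lmplot is a figure-level function: it builds its own figure and FacetGrid, so it does not draw into axes created with plt.subplots. A layout of 1 row and 2 columns can instead come from its col parameter once the data is in long form. The sketch below assumes made-up data with hypothetical columns x0, y0, x1, y1 and target; only the reshaping idea is taken from the attempt above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
n = 20
# Hypothetical wide data: two (x, y) pairs plus a binary target
wide = pd.DataFrame({
    "x0": rng.normal(size=n), "y0": rng.normal(size=n),
    "x1": rng.normal(size=n), "y1": rng.normal(size=n),
    "target": np.tile([0, 1], n // 2),
})

# Stack the pairs into long form and label each pair in a 'col' column
long_df = pd.concat(
    [wide[[f"x{i}", f"y{i}", "target"]]
         .rename(columns={f"x{i}": "x", f"y{i}": "y"})
         .assign(col=f"c{i}")
     for i in range(2)],
    ignore_index=True,
)

# One facet per value of 'col': 1 row, 2 columns
g = sns.lmplot(data=long_df, x="x", y="y", col="col",
               hue="target", palette="Set1", ci=None)
```

If the plots must live in axes from plt.subplots, the axes-level sns.regplot accepts an ax argument, though it has no hue parameter.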

Count Rows in Dictionary of Dataframes

I have a dictionary of dataframes. I am trying to count the rows in each dataframe. For the real data, my code is counting just over ten thousand rows for a dataframe that only has a few rows.
I have tried to reproduce the error using dummy data. Unfortunately, the code works fine with the dummy data!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Dataframe
Df = pd.DataFrame(np.random.randint(0,10,size=(100, 4)), columns=list('ABCD'))
# Map
Ma = Df.groupby('D')
# Dictionary of Dataframes
Di = {}
for name, group in Ma:
    Di[str(name)] = group
# Count the Rows in each Dataframe
Li = []
for k in Di:
    Count = Di[k].shape[0]
    Li.append([Count])
# Flatten
Li_1 = []
for sublist in Li:
    for item in sublist:
        Li_1.append(item)
# Histogram
plt.hist(Li_1, bins=10)
plt.xlabel("Rows / Dataframe")
fig = plt.gcf()
To get the number of rows corresponding to each category in 'D', you can simply use .size when you do your groupby:
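As a small sketch of that suggestion, the group sizes can be compared against the hand-built dictionary from the question. The seeded generator and the names row_counts and manual are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
Df = pd.DataFrame(rng.integers(0, 10, size=(100, 4)), columns=list("ABCD"))

# Series indexed by the values of D; each entry is the number of rows in that group
row_counts = Df.groupby("D").size()

# The same numbers, obtained the long way through a dictionary of dataframes
Di = {name: group for name, group in Df.groupby("D")}
manual = {name: len(group) for name, group in Di.items()}
```

If the two disagree on real data, the dictionary-building step is the likely place to look.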
pandas also allows you to directly plot graphs, so your code can be reduced to:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Df = pd.DataFrame(np.random.randint(0,10,size=(100, 4)), columns=list('ABCD'))
plt.xlabel("Rows / Dataframe")
fig = plt.gcf()
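The snippet above does not show the plotting call itself. One possible version, assuming a histogram of the group sizes is what is wanted, uses the pandas plot accessor directly on the result of size():

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
Df = pd.DataFrame(rng.integers(0, 10, size=(100, 4)), columns=list("ABCD"))

# Histogram of "rows per group" without building any intermediate lists
ax = Df.groupby("D").size().plot.hist(bins=10)
ax.set_xlabel("Rows / Dataframe")
fig = ax.get_figure()
```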
Assuming the data in column D is a categorical variable, you can get the count for each category using seaborn's countplot.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Dataframe
df = pd.DataFrame(np.random.randint(0,10,size=(100, 4)), columns=list('ABCD'))
# easy count plot in sns
sns.countplot(x='D', data=df)
If you are looking for a distribution plot rather than a categorical count plot, you can use the following code.
# for distribution plot
If you want a distribution plot after grouping by the elements (which does not make much sense to me), you can use the following:
# for distribution plot after group by
# histplot replaces distplot, which is deprecated since seaborn 0.11
sns.histplot(df.groupby('D').size(), bins=10)

Plotting through a subset of data frame in Pandas using Matplotlib

I have a Dataframe and I slice the Dataframe into three subsets. Each subset has 3 to 4 rows of data. After I slice the data frame into three subsets, I plot them using Matplotlib.
The problem is that I cannot get each subplot to draw its own slice of the DataFrame. In each group of three subplots, only the last one is plotted; the first two stay empty. It looks like the 'r' value is not passed to 'r.plot' for all three subplots.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['key1'] = 0
df.loc[df.index[0:3], 'key1'] = 1
df.loc[df.index[3:7], 'key1'] = 2
df.loc[df.index[7:], 'key1'] = 3
df_grouped = df.groupby('key1')
for group_name, group_value in df_grouped:
    rows, columns = group_value.shape
    fig, axes = plt.subplots(rows, 1, sharex=True, sharey=True, figsize=(15,20))
    for i,r in group_value.iterrows():
        r = r[0:columns-1]
        r.plot(kind='bar', fill=False, log=False)
I think you might want what I call df_subset to be summarized in some way, but here's a way to plot each group in its own panel.
# Your Code Setting Up the Dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['key1'] = 0
df.loc[df.index[0:3], 'key1'] = 1
df.loc[df.index[3:7], 'key1'] = 2
df.loc[df.index[7:], 'key1'] = 3
# My Code to Plot in Three Panels
distinct_keys = df['key1'].unique()
fig, axes = plt.subplots(len(distinct_keys), 1, sharex=True, figsize=(3,5))
for i, key in enumerate(distinct_keys):
    df_subset = df[df.key1==key]
    # {maybe insert a line here to summarize df_subset somehow interesting?}
    # plot into the i-th panel; without ax= pandas opens a new figure each time
    df_subset.plot(kind='bar', fill=False, log=False, ax=axes[i])
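If the layout from the question is what is wanted (one figure per group, one subplot per row), the missing piece is the ax argument: without it every plot call draws on the current axes, which is the last subplot created. A minimal sketch, assuming seeded random data and hypothetical titles:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=list("ABCD"))
df["key1"] = [1] * 3 + [2] * 4 + [3] * 3

figures = []
for group_name, group_value in df.groupby("key1"):
    values = group_value.drop(columns="key1")
    # squeeze=False keeps axes two-dimensional even for a single-row group
    fig, axes = plt.subplots(len(values), 1, sharex=True, sharey=True,
                             figsize=(6, 2 * len(values)), squeeze=False)
    for ax, (idx, row) in zip(axes.ravel(), values.iterrows()):
        row.plot(kind="bar", ax=ax)  # ax= sends each row to its own subplot
        ax.set_title(f"group {group_name}, row {idx}")
    figures.append(fig)
```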

Side-by-side boxplot of multiple columns of a pandas DataFrame

One year of sample data:
import pandas as pd
import numpy.random as rnd
import seaborn as sns
n = 365
df = pd.DataFrame(data = {"A":rnd.randn(n), "B":rnd.randn(n)+1},
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))
I want to boxplot these data side-by-side grouped by the month (i.e., two boxes per month, one for A and one for B).
For a single column sns.boxplot(df.index.month, df["A"]) works fine. However, sns.boxplot(df.index.month, df[["A", "B"]]) throws an error (ValueError: cannot copy sequence with size 2 to array axis with dimension 365). Melting the data by the index (pd.melt(df, id_vars=df.index, value_vars=["A", "B"], var_name="column")) in order to use seaborn's hue property as a workaround doesn't work either (TypeError: unhashable type: 'DatetimeIndex').
(A solution doesn't necessarily need to use seaborn, if it is easier using plain matplotlib.)
I found a workaround that basically produces what I want. However, it becomes somewhat awkward to work with once the DataFrame includes more variables than I want to plot. So if there is a more elegant/direct way to do it, please share!
df_stacked = df.stack().reset_index()
df_stacked.columns = ["date", "vars", "vals"]
df_stacked.index = df_stacked["date"]
sns.boxplot(x=df_stacked.index.month, y="vals", hue="vars", data=df_stacked)
Here's a solution using pandas melt and seaborn:
import pandas as pd
import numpy.random as rnd
import seaborn as sns
n = 365
df = pd.DataFrame(data = {"A": rnd.randn(n),
                          "B": rnd.randn(n)+1,
                          "C": rnd.randn(n) + 10, # will not be plotted
                          },
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))
df['month'] = df.index.month
df_plot = df.melt(id_vars='month', value_vars=["A", "B"])
sns.boxplot(x='month', y='value', hue='variable', data=df_plot)
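import matplotlib.pyplot as plt
month_dfs = []
for group in df.groupby(df.index.month):
    # each group is a (month, sub-frame) tuple; keep the sub-frame
    month_dfs.append(group[1])
for i,month_df in enumerate(month_dfs):
    axi = plt.subplot(1, len(month_dfs), i + 1)
    month_df.plot(kind='box', subplots=False, ax = axi)
    plt.ylim([-4, 4])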
Will give this
Not exactly what you're looking for but you get to keep a readable DataFrame if you add more variables.
You can also easily remove the axis by using
if i > 0:
    y_axis = axi.axes.get_yaxis()
    y_axis.set_visible(False)
in the loop before
This is quite straight-forward using Altair:
df.reset_index().melt(id_vars = ["index"], value_vars=["A", "B"]).assign(month = lambda x: x["index"].dt.month)
alt.X('variable:N', title=''),
The code above melts the DataFrame and adds a month column. Then Altair creates box-plots for each variable broken down by months as the plot columns.
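The reshaping step described above can be checked on its own with pandas, independent of the plotting library. This sketch assumes the sample data from the question with a seeded generator; long_df is a hypothetical name.

```python
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.standard_normal(n), "B": rng.standard_normal(n) + 1},
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))

long_df = (
    df.reset_index()  # the unnamed DatetimeIndex becomes a column called 'index'
      .melt(id_vars=["index"], value_vars=["A", "B"])  # one row per (date, variable)
      .assign(month=lambda x: x["index"].dt.month)     # month number for grouping
)
```

Moving the index into a regular column first is also what avoids the unhashable DatetimeIndex error mentioned in the question, since melt expects column labels in id_vars.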
