Plot all pandas dataframe columns separately - python

I have a pandas dataframe that has only numeric columns, and I am trying to create a separate histogram for each feature.
ind group people value value_50
1 1 5 100 1
1 2 2 90 1
2 1 10 80 1
2 2 20 40 0
3 1 7 10 0
3 2 23 30 0
In my real data there are 50+ columns. How can I create a separate plot for each of them?
I have tried
df.plot.hist(subplots=True, grid=True)
but it gave me an overlapping, unclear plot.
How can I arrange the plots on a grid using pandas subplots=True? The example below would give me a (2,2) grid for four columns, but it is a long method for 50 columns.
fig, [(ax1,ax2),(ax3,ax4)] = plt.subplots(2,2, figsize = (20,10))

Pandas subplots=True will arrange the axes in a single column.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.rand(7,20))
df.plot(subplots=True)
plt.tight_layout()
plt.show()
Here, tight_layout isn't applied, because the figure is too small to arrange the axes nicely. One can use a bigger figure (figsize=(...)) though.
In order to have the axes on a grid, one can use the layout parameter, e.g.
df.plot(subplots=True, layout=(4,5))
The same can be achieved if creating the axes via plt.subplots()
fig, axes = plt.subplots(nrows=4, ncols=5)
df.plot(subplots=True, ax=axes)
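For a frame with 50+ columns, the layout tuple does not have to be typed by hand; it can be derived from the column count. The following is a minimal sketch, assuming five plots per row and random stand-in data (the values of ncols and figsize are arbitrary choices, not part of the answer above):

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to see windows or notebook output
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data: 100 rows, 23 numeric columns (hypothetical sizes)
df = pd.DataFrame(np.random.rand(100, 23))

ncols = 5                                # plots per row (arbitrary choice)
nrows = math.ceil(df.shape[1] / ncols)   # enough rows to fit every column

# One histogram per column, arranged on an nrows x ncols grid
axes = df.plot(kind="hist", subplots=True, layout=(nrows, ncols),
               figsize=(3 * ncols, 2.5 * nrows))
plt.tight_layout()
```

With 23 columns this asks for a 5 x 5 grid, so two cells stay unused. Scaling figsize with the grid size keeps each subplot readable as the column count grows.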

If you want to plot them separately (which is why I ended up here), you can use
for i in df.columns:
    plt.figure()
    plt.hist(df[i])

An alternative for this task is the "hist" method with its "layout" parameter. Example using part of the code provided by @ImportanceOfBeingErnest:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.rand(7,20))
df.hist(layout=(5,4), figsize=(15,10))
plt.show()

With a pandas.DataFrame I would suggest using pandas.DataFrame.apply. With a custom function, in this example plot(), you can show and save each figure separately.
def plot(col):
    fig, ax = plt.subplots()
    ax.plot(col)
    plt.show()
df.apply(plot)
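The saving part is not shown in the snippet above. One way it could look is sketched below; the helper name plot_and_save, the temporary output directory, and the file naming are all assumptions made for illustration:

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; figures are only written to disk
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(7, 4), columns=list("abcd"))
out_dir = Path(tempfile.mkdtemp())  # hypothetical output location


def plot_and_save(col):
    """Draw one column on its own figure and write it to <column name>.png."""
    fig, ax = plt.subplots()
    ax.plot(col)
    ax.set_title(col.name)
    fig.savefig(out_dir / f"{col.name}.png")
    plt.close(fig)  # release the figure; this matters with 50+ columns


# apply passes each column to the function as a Series named after the column
df.apply(plot_and_save)
saved = sorted(p.name for p in out_dir.iterdir())
```

Closing each figure after saving keeps memory use flat, which Matplotlib would otherwise warn about once many figures are open at the same time.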

While not asked for in the question I thought I'd add that using the x parameter to plot would allow you to specify a column for the x axis data.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.rand(7,20),columns=list('abcdefghijklmnopqrst'))
df.plot(x='a',subplots=True, layout=(4,5))
plt.tight_layout()
plt.show()
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html


How do I plot multiple lines within the same graph and each represents targeted group by using matplotlib? [duplicate]

In Pandas, I am doing:
bp = p_df.groupby('class').plot(kind='kde')
p_df is a dataframe object.
However, this is producing two plots, one for each class.
How do I force one plot with both classes in the same plot?
Version 1:
You can create your axis, and then use the ax keyword of DataFrameGroupBy.plot to add everything to these axes:
import matplotlib.pyplot as plt
import pandas as pd
p_df = pd.DataFrame({"class": [1,1,2,2,1], "a": [2,3,2,3,2]})
fig, ax = plt.subplots(figsize=(8,6))
bp = p_df.groupby('class').plot(kind='kde', ax=ax)
This is the result:
Unfortunately, the labeling of the legend does not make too much sense here.
Version 2:
Another way would be to loop through the groups and plot the curves manually:
classes = ["class 1"] * 5 + ["class 2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
p_df = pd.DataFrame({"class": classes, "vals": vals})
fig, ax = plt.subplots(figsize=(8,6))
for label, df in p_df.groupby('class'):
    df.vals.plot(kind="kde", ax=ax, label=label)
plt.legend()
This way you can easily control the legend. This is the result:
import matplotlib.pyplot as plt
p_df.groupby('class').plot(kind='kde', ax=plt.gca())
Another approach would be using seaborn module. This would plot the two density estimates on the same axes without specifying a variable to hold the axes as follows (using some data frame setup from the other answer):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# data to create an example data frame
classes = ["c1"] * 5 + ["c2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
# the data frame
df = pd.DataFrame({"cls": classes, "vals": vals})
# this is to plot the kde
sns.kdeplot(df.vals[df.cls == "c1"],label='c1');
sns.kdeplot(df.vals[df.cls == "c2"],label='c2');
# beautifying the labels
plt.xlabel('value')
plt.ylabel('density')
plt.show()
This results in the following image.
There are two easy methods to plot each group in the same plot.
When using pandas.DataFrame.groupby, the column to be plotted, (e.g. the aggregation column) should be specified.
Use seaborn.kdeplot or seaborn.displot and specify the hue parameter
Using pandas v1.2.4, matplotlib 3.4.2, seaborn 0.11.1
The OP is specific to plotting the kde, but the steps are the same for many plot types (e.g. kind='line', sns.lineplot, etc.).
Imports and Sample Data
For the sample data, the groups are in the 'kind' column, and the kde of 'duration' will be plotted, ignoring 'waiting'.
import pandas as pd
import seaborn as sns
df = sns.load_dataset('geyser')
# display(df.head())
duration waiting kind
0 3.600 79 long
1 1.800 54 short
2 3.333 74 long
3 2.283 62 short
4 4.533 85 long
Plot with pandas.DataFrame.plot
Reshape the data using .groupby or .pivot
.groupby
Specify the aggregation column, ['duration'], and kind='kde'.
ax = df.groupby('kind')['duration'].plot(kind='kde', legend=True)
.pivot
ax = df.pivot(columns='kind', values='duration').plot(kind='kde')
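To see why .pivot works here, it can help to look at the reshaped frame: each group becomes its own column, and DataFrame.plot then draws one curve per column on the same axes. A small sketch with hand-made stand-in rows (illustrative values, not the actual geyser dataset):

```python
import pandas as pd

# Hand-made stand-in for the geyser data (values are illustrative only)
df = pd.DataFrame({
    "duration": [3.6, 1.8, 3.3, 2.3, 4.5],
    "kind": ["long", "short", "long", "short", "long"],
})

# One column per group; rows that belong to the other group hold NaN
wide = df.pivot(columns="kind", values="duration")
print(wide)
```

The wide frame keeps the original row index, so every row has a value in exactly one of the two columns.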
Plot with seaborn.kdeplot
Specify hue='kind'
ax = sns.kdeplot(data=df, x='duration', hue='kind')
Plot with seaborn.displot
Specify hue='kind' and kind='kde'
fig = sns.displot(data=df, kind='kde', x='duration', hue='kind')
Plot
Maybe you can try this:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,8))
classes = list(df['class'].unique())
for c in classes:
    df2 = df.loc[df['class'] == c]
    df2.vals.plot(kind="kde", ax=ax, label=c)
plt.legend()

Subplot of Subplots Matplotlib / Seaborn

I am trying to create a grid of subplots. Each subplot should look like the one on this site:
https://python-graph-gallery.com/24-histogram-with-a-boxplot-on-top-seaborn/
If I have 10 different sets of this style of plot, I want to arrange them into a 5x2 grid, for example.
I have read through the Matplotlib documentation and cannot figure out how to do it. I can loop over the subplots and output each one, but I cannot arrange them into rows and columns.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randint(0,100,size=(100, 10)),columns=list('ABCDEFGHIJ'))
for c in df:
    # Cut the window in 2 parts
    f, (ax_box, ax_hist) = plt.subplots(2, sharex=True,
                                        gridspec_kw={"height_ratios": (.15, .85)},
                                        figsize=(10, 10))
    # Add a graph in each part
    sns.boxplot(x=df[c], ax=ax_box)
    ax_hist.hist(df[c])
    # Remove x axis name for the boxplot
    ax_box.set(xlabel='')
    plt.show()
The desired result is the output of this loop arranged into rows and columns, in this case 5x2.
You have 10 columns, each of which creates 2 subplots: a box plot and a histogram. So you need a total of 20 axes. You can get them by creating a grid of 2 rows and 10 columns.
Complete answer (adjust figsize and height_ratios to taste):
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
f, axes = plt.subplots(2, 10, sharex=True, gridspec_kw={"height_ratios":(.35, .35)},
figsize = (12, 5))
df = pd.DataFrame(np.random.randint(0,100,size=(100, 10)),columns=list('ABCDEFGHIJ'))
for i, c in enumerate(df):
    sns.boxplot(x=df[c], ax=axes[0,i])
    axes[1,i].hist(df[c])
plt.tight_layout()
plt.show()
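If the box/histogram pairs should sit on a 5x2 arrangement as the question asks, rather than on one long row, one possible approach is a grid with twice as many axes rows as pair rows and alternating height ratios. This is only a sketch: pairs_per_row and figsize are arbitrary, and Matplotlib's own Axes.boxplot stands in for sns.boxplot to keep the dependencies small.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to display the figure
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 10)),
                  columns=list("ABCDEFGHIJ"))

pairs_per_row = 5                              # 10 columns -> 2 rows of pairs
pair_rows = -(-df.shape[1] // pairs_per_row)   # ceiling division

# Each pair needs a short box-plot row above a taller histogram row
fig, axes = plt.subplots(
    2 * pair_rows, pairs_per_row, figsize=(15, 8), squeeze=False,
    gridspec_kw={"height_ratios": [0.15, 0.85] * pair_rows},
)

for i, c in enumerate(df):
    r, col = divmod(i, pairs_per_row)
    ax_box, ax_hist = axes[2 * r, col], axes[2 * r + 1, col]
    # Horizontal box plot; newer Matplotlib releases also accept orientation="horizontal"
    ax_box.boxplot(df[c], vert=False)
    ax_box.set_yticks([])
    ax_box.set_title(c)
    ax_hist.hist(df[c])

# Hide cells left empty when the column count is not a multiple of pairs_per_row
for j in range(df.shape[1], pair_rows * pairs_per_row):
    r, col = divmod(j, pairs_per_row)
    axes[2 * r, col].axis("off")
    axes[2 * r + 1, col].axis("off")

plt.tight_layout()
```

Column i lands in pair row i // 5 and grid column i % 5, so the box plot of 'F' sits in the third axes row directly above its histogram.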

pandas how to have different color line graph

I have a dataframe,
index block array_size time
0 2 100 0.102710
1 2 1000 0.356194
2 2 10000 2.884903
3 2 100000 28.484935
4 2 1000000 293.656645
5 2 8000000 91286.889516
6 4 100 0.103323
7 4 1000 0.347484
8 4 10000 2.799290
9 4 100000 27.3598
I want a different color for each value of block (2 and 4, the second column).
My code for this plot is df.plot(x='array_size', y='time')
How can I get a different color for each block value?
You can either just plot them each separately with a simple groupby, or if you are willing to use seaborn that will allow you to specify a column for hue
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8,4))
for idx, gp in df.groupby('block'):
    gp.plot(x='array_size', y='time', ax=ax, label=idx)
plt.show()
With seaborn (at least 0.9) you can just do:
sns.lineplot(data=df, x='array_size', y='time', hue='block')
Since you have many 'blocks' the standard color cycler isn't going to cut it. You can adjust that easily with ax.set_prop_cycle
Sample Data
df = pd.DataFrame({'x': np.tile(np.arange(1,11,1),20),
'y': np.random.randint(1,25,200),
'block': np.repeat(np.arange(1,21,1),10)})
Code:
fig, ax= plt.subplots(figsize=(8,4))
colors = sns.color_palette("coolwarm", df.block.nunique())
ax.set_prop_cycle('color', colors)
for idx, gp in df.groupby('block'):
    gp.plot(x='x', y='y', ax=ax, legend=False)
plt.show()
This should do it.
import matplotlib.pyplot as plt
for unq_value in df['block'].unique():
    mask = df['block'] == unq_value
    df_subset = df[mask]
    plt.plot(df_subset['array_size'], df_subset['time'])
plt.show()
Here, we find the unique values in the block column, subset the dataframe for each unique value, and plot each subset separately.
Use the color parameter to the .plot method to pass in a list of colors, one for each column.
https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.DataFrame.plot.line.html
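Because the color list is matched to columns, the long-format frame from the question first has to be reshaped so that each block value gets its own column. A minimal sketch using the first three sizes of each block from the question (the color names and the log-scaled x axis are arbitrary choices):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to display the figure
import matplotlib.pyplot as plt
import pandas as pd

# Subset of the question's data (first three sizes for each block)
df = pd.DataFrame({
    "block": [2, 2, 2, 4, 4, 4],
    "array_size": [100, 1000, 10000, 100, 1000, 10000],
    "time": [0.102710, 0.356194, 2.884903, 0.103323, 0.347484, 2.799290],
})

# Reshape so that every block value becomes its own column
wide = df.pivot(index="array_size", columns="block", values="time")

# Hand .plot one color per column, in column order
ax = wide.plot(color=["tab:blue", "tab:red"], logx=True)
ax.set_ylabel("time")
```

The legend entries come from the column labels of the reshaped frame, so here they read 2 and 4.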

sharey='all' argument in plt.subplots() not passed to df.plot()?

I have a pandas dataframe which I would like to slice, and plot each slice in a separate subplot. I would like to use the sharey='all' and have matplotlib decide on some reasonable y-axis limits, rather than having to search the dataframe for the min and max and add offsets.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig, axes = plt.subplots(nrows=0,ncols=0, sharey='all', tight_layout=True)
for i in range(1, len(df.columns) + 1):
    ax = fig.add_subplot(2,3,i)
    iC = df.iloc[:, i-1]
    iC.plot(ax=ax)
Which gives the following plot:
In fact, I get the same plot irrespective of what I set sharey to ('all', 'col', 'row', True, or False). What I expected from sharey='all' is something like:
Can somebody explain what I'm doing wrong here?
The following version would only add those axes you need for your df-columns and share their y-scales:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig = plt.figure(tight_layout=True)
ref_ax = None
for i in range(len(df.columns)):
    ax = fig.add_subplot(2, 3, i+1, sharey=ref_ax)
    ref_ax=ax
    iC = df.iloc[:, i]
    iC.plot(ax=ax)
plt.show()
The grid-layout parameters, which are given explicitly here as ...add_subplot(2, 3, ..., can of course be calculated from len(df.columns).
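A sketch of that calculation, assuming a fixed number of axes per row (ncols = 3 is an arbitrary choice). It also shares every axes with the first one instead of chaining each axes to the previous one, which should give the same common y-scale:

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to display the figure
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(50).reshape((5, 10))).transpose()

n = len(df.columns)
ncols = 3                      # axes per row (arbitrary choice)
nrows = math.ceil(n / ncols)   # rows needed to hold all n columns

fig = plt.figure(tight_layout=True)
ref_ax = None
for i, name in enumerate(df.columns):
    # Every axes after the first shares its y-scale with the first one
    ax = fig.add_subplot(nrows, ncols, i + 1, sharey=ref_ax)
    if ref_ax is None:
        ref_ax = ax
    df[name].plot(ax=ax)
```

Only as many axes as there are columns get created, so no empty cell has to be switched off afterwards.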
Your plots are not shared. You create a subplot grid with 0 rows and 0 columns, i.e. no subplots at all, but those nonexisting subplots have their y axes shared. Then you create some other (existing) subplots, which are not shared. Those are the ones that are plotted to.
Instead, you need to set nrows and ncols to useful values and plot to the axes created that way.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig, axes = plt.subplots(nrows=2,ncols=3, sharey='all', tight_layout=True)
for i, ax in zip(range(len(df.columns)), axes.flat):
    iC = df.iloc[:, i]
    iC.plot(ax=ax)
for j in range(len(df.columns),len(axes.flat)):
    axes.flatten()[j].axis("off")
plt.show()
