Multiple count plots in seaborn - python

I have a CSV file with multiple columns, and I am trying to draw count plots for selected columns side by side. With the code below I can plot two columns, but it stops working when I try to add more. How can I plot several selected columns side by side?
Also, when I plot two graphs they overlap. How can I increase the gap between them?
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
train_data = pd.read_csv(r"train_ctrUa4K.csv")
plt.figure(figsize=(10, 8))
fig, ax =plt.subplots(1,2)
sns.countplot(train_data['Gender'], ax=ax[0])
sns.countplot(train_data['Dependents'], ax=ax[1])
#sns.countplot(train_data['Self_Employed'], ax=ax[1])
#sns.countplot(train_data['Property_Area'], ax=ax[1,1])
fig.show()

Change the number of columns in the call to subplots():
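fig, ax = plt.subplots(1, 4)
# name the column with x=; newer seaborn releases treat the first positional argument as data
sns.countplot(x='Gender', data=train_data, ax=ax[0])
sns.countplot(x='Dependents', data=train_data, ax=ax[1])
sns.countplot(x='Self_Employed', data=train_data, ax=ax[2])
sns.countplot(x='Property_Area', data=train_data, ax=ax[3])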
If you have too many subplots to fit on a single row, you can increase the number of rows as well. Note that with more than one row and more than one column, the variable ax is a 2D array and needs two indices:
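fig, ax = plt.subplots(2, 2)
# name the column with x=; newer seaborn releases treat the first positional argument as data
sns.countplot(x='Gender', data=train_data, ax=ax[0, 0])
sns.countplot(x='Dependents', data=train_data, ax=ax[0, 1])
sns.countplot(x='Self_Employed', data=train_data, ax=ax[1, 0])
sns.countplot(x='Property_Area', data=train_data, ax=ax[1, 1])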
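For the overlap part of the question, a minimal sketch that loops over the selected columns and then calls fig.tight_layout() to space the subplots. The small DataFrame is a made-up stand-in for train_ctrUa4K.csv, and the figsize is an arbitrary choice; the Agg backend line is only there so the sketch runs as a plain script and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for train_data (the real CSV is not available here)
train_data = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Male"],
    "Dependents": ["0", "1", "2", "0", "3+", "0"],
    "Self_Employed": ["No", "No", "Yes", "No", "Yes", "No"],
    "Property_Area": ["Urban", "Rural", "Urban", "Semiurban", "Rural", "Urban"],
})
columns = ["Gender", "Dependents", "Self_Employed", "Property_Area"]

# One subplot per column; flatten() turns the 2x2 array of Axes into a 1D sequence
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for column, ax in zip(columns, axes.flatten()):
    sns.countplot(x=column, data=train_data, ax=ax)

# Adjust the spacing so tick labels and axis labels do not overlap
fig.tight_layout()
```

If tight_layout() does not leave enough room, fig.subplots_adjust(wspace=0.4, hspace=0.4) sets the gaps explicitly; the 0.4 values are only illustrative.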

Related

Plotting a boxplot and histogram side by side with seaborn

I'm trying to plot a simple box plot next to a simple histogram in the same figure using seaborn (0.11.2) and pandas (1.3.4) in a jupyter notebook (6.4.5).
I've tried multiple approaches with nothing working.
fig, ax = plt.subplots(1, 2)
sns.boxplot(x='rent', data=df, ax=ax[0])
sns.displot(x='rent', data=df, bins=50, ax=ax[1])
An extra empty plot or grid appears next to the boxplot, and this extra empty plot shows up any time I try to create multiple axes.
Changing the layout to:
fig, ax = plt.subplots(2)
gives that extra empty plot again, this time below the boxplot.
Trying the following code:
fig, (axbox, axhist) = plt.subplots(1,2)
sns.boxplot(x='rent', data=df, ax=axbox)
sns.displot(x='rent', data=df, bins=50, ax=axhist)
Gets the same results.
Following the answer in this post, I try:
fig, axs = plt.subplots(ncols=2)
sns.boxplot(x='rent', data=df, ax=axs[0])
sns.displot(x='rent', data=df, bins=50, ax=axs[1])
This results in the same thing.
If I just create the figure and then the plots underneath:
plt.figure()
sns.boxplot(x='rent', data=df)
sns.displot(x='rent', data=df, bins=50)
It just gives me the two plots on top of each other, which I assume is just making two different figures.
I'm not sure why that extra empty plot shows up next to the boxplot when I try to do multiple axes in seaborn.
If I use pyplot instead of seaborn, I can get it to work:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
ax1.hist(df['rent'], bins=50)
ax2.boxplot(df['rent'])
The closest I've come is to use seaborn only on the boxplot, and pyplot for the histogram:
plt.figure(figsize=(8, 5))
plt.subplot(1, 2, 1)
sns.boxplot(x='rent', data=df)
plt.subplot(1, 2, 2)
plt.hist(df['rent'], bins=50)
What am I missing? Why can't I get this to work with two seaborn plots on the same figure, side by side (1 row, 2 columns)?
Try this function:
def creating_box_hist(column, df):
    # create a figure composed of two matplotlib Axes objects (ax_box and ax_hist)
    f, (ax_box, ax_hist) = plt.subplots(2, sharex=True, gridspec_kw={"height_ratios": (.15, .85)})
    # assign a graph to each ax
    sns.boxplot(x=df[column], ax=ax_box)
    sns.histplot(data=df, x=column, ax=ax_hist)
    # remove the x axis name for the boxplot
    ax_box.set(xlabel='')
    plt.show()

how to plot many columns of Pandas data frame

I need to make scatter plots using the Boston Housing dataset. I want to plot each of the other columns against the MEDV column. This code draws all the plots on the same graph. How can I separate them?
import matplotlib.pyplot as plt
%matplotlib inline
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(12, 12))
for column, ax in zip(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM'], axes):
    plt.scatter(boston_df[column], boston_df.MEDV)
Your code will work if you flatten the axes object: axes is a 2-D array, so looping over it directly yields one row of subplots at a time rather than one subplot. Use axes.flatten() in the for loop and then ax.scatter, which plots each column in its own subplot.
The subplots are filled row by row: the first row, then the second row, then the third row.
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(12, 12))
for column, ax in zip(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM'], axes.flatten()):
    ax.scatter(boston_df[column], boston_df.MEDV)
You need to use ax.scatter instead of plt.scatter so that they plot in the axes you've created.
Try plotting with axes[row, col].scatter(). You then have to iterate over both rows and columns.
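The row-and-column indexing suggested in the last answer could look roughly like this. It is a sketch only: boston_df is filled with random numbers because the real dataset is not part of the post, and divmod is just one way to turn a running index into a (row, col) pair.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

feature_columns = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM"]

# Hypothetical stand-in for the Boston Housing DataFrame (random values)
rng = np.random.default_rng(0)
boston_df = pd.DataFrame(rng.random((50, 7)), columns=feature_columns + ["MEDV"])

n_rows, n_cols = 3, 2
fig, axes = plt.subplots(nrows=n_rows, ncols=n_cols, figsize=(12, 12))
for i, column in enumerate(feature_columns):
    # Convert the running index into a grid position, filling row by row
    row, col = divmod(i, n_cols)
    axes[row, col].scatter(boston_df[column], boston_df["MEDV"])
    axes[row, col].set_xlabel(column)
    axes[row, col].set_ylabel("MEDV")
fig.tight_layout()
```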

Multi line plot and re-label legend

1) Why are lines so dense?
In the dataset the time is recorded by hour. Would it make a difference if it were by day? I would like to see a line chart for each host.
2) How can I re-label the legend from count to host?
fig, ax = plt.subplots(figsize=(15,7))
df.groupby('host').plot(x='time', y='count',ax=ax, legend=True)
You are plotting hourly data of more than 6 months. That's ~4k data points, of course it is dense. Daily data would be better, although it's still going to be dense.
There are a couple of options:
You could use seaborn:
import seaborn as sns
fig, ax = plt.subplots(figsize=(15,7))
# data=df tells seaborn where to look up the 'time', 'count' and 'host' columns
sns.lineplot(x='time', y='count', hue='host', data=df, ax=ax)
Or do a loop on groupby:
fig, ax = plt.subplots(figsize=(15,7))
for h, d in df.groupby('host'):
    d.plot(x='time', y='count', ax=ax, label=h)
Regarding 2), add label='name' as a parameter, i.e.:
fig, ax = plt.subplots(figsize=(15,7))
df.groupby('host').plot(x='time', y='count',ax=ax, legend=True, label='host')
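To make the "daily data would be better" remark concrete, one possible sketch aggregates the hourly rows to one value per host per day before plotting. The data is randomly generated, the host names are invented, and summing the counts per day is an assumption; a mean may suit the real data better.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical hourly data: two hosts, 14 days each
rng = np.random.default_rng(0)
times = pd.Timestamp("2021-01-01") + pd.to_timedelta(np.arange(24 * 14), unit="h")
df = pd.DataFrame({
    "time": np.tile(times, 2),
    "host": np.repeat(["host-a", "host-b"], len(times)),
    "count": rng.integers(0, 100, size=2 * len(times)),
})

# Collapse the hourly rows to one row per host per day (sum is an assumption)
daily = (
    df.set_index("time")
      .groupby("host")["count"]
      .resample("D")
      .sum()
      .reset_index()
)

# One line per host, labelled with the host name instead of 'count'
fig, ax = plt.subplots(figsize=(15, 7))
for host, group in daily.groupby("host"):
    group.plot(x="time", y="count", ax=ax, label=host)
```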

horizontal grid only (in python using pandas plot + pyplot)

I would like to get only horizontal grid using pandas plot.
The integrated parameter of pandas only has grid=True or grid=False, so I tried with matplotlib pyplot, changing the axes parameters, specifically with this code:
import pandas as pd
import matplotlib.pyplot as plt
fig = plt.figure()
ax2 = plt.subplot()
ax2.grid(axis='x')
df.plot(kind='bar',ax=ax2, fontsize=10, sort_columns=True)
plt.show(fig)
But I get no grid, neither horizontal nor vertical. Is Pandas overwriting the axes? Or am I doing something wrong?
Try setting the grid after plotting the DataFrame. Also, to get the horizontal grid, you need to use ax2.grid(axis='y'). Below is an answer using a sample DataFrame.
I have restructured how you define ax2 by making use of subplots.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'lab':['A', 'B', 'C'], 'val':[10, 30, 20]})
fig, ax2 = plt.subplots()
# sort_columns was removed in pandas 2.0, so it is omitted here
df.plot(kind='bar', ax=ax2, fontsize=10)
ax2.grid(axis='y')
plt.show()
Alternatively, use the Axes object returned by DataFrame.plot directly to turn on the horizontal grid:
ax2 = df.plot(kind='bar', fontsize=10)
ax2.grid(axis='y')
A third option, as suggested by @ayorgo in the comments, is to chain the two commands:
df.plot(kind='bar', ax=ax2, fontsize=10).grid(axis='y')
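Putting the pieces together, a self-contained sketch of the "plot first, then set the grid" order using the same sample DataFrame. The set_axisbelow(True) call is an optional extra that is not in the answers above; it draws the grid lines behind the bars.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"lab": ["A", "B", "C"], "val": [10, 30, 20]})

fig, ax2 = plt.subplots()
# Plot first, so that the grid setting below is applied after pandas has drawn
df.plot(kind="bar", x="lab", y="val", ax=ax2, fontsize=10)

# Horizontal grid lines belong to the y axis
ax2.grid(True, axis="y")
# Optional: keep the grid lines behind the bars
ax2.set_axisbelow(True)
```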

Matplotlib/Pandas: How to plot multiple scatterplots within different locations in the same plot?

I often have two pandas dataframes, which I would like to plot within the same plot. Normally these are two samples, and I would like to contrast their properties, as an example:
The x axis simply has two locations, the left for the first dataset, and the right for the second dataset.
In matplotlib, one can plot multiple datasets within the same plot:
import matplotlib.pyplot as plt
x = range(100)
y = range(100,200)
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(x[:4], y[:4], s=10, c='b', marker="s", label='first')
ax1.scatter(x[40:],y[40:], s=10, c='r', marker="o", label='second')
plt.show()
However,
(1) How do you separate your datasets into two compartmentalized locations like the first example?
(2) How do you accomplish this with two pandas dataframes? Do you merge them and then specify two locations for plotting?
Use return_type='axes' to get data1.boxplot to return a matplotlib Axes object. Then pass that axes to the second call to boxplot using ax=ax. This will cause both boxplots to be drawn on the same axes.
ax = df1.plot()
df2.plot(ax=ax)
a1 = a[['a', 'time']]
ax = a1.boxplot(by='time', meanline=True, showmeans=True, showcaps=True,
                showbox=True, showfliers=False, return_type='axes')
a2 = a[['c', 'time']]
a2.boxplot(by='time', meanline=True, showmeans=True, showcaps=True,
           showbox=True, showfliers=False, ax=ax)
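For the scatter version asked about in (1) and (2), one possible approach is to give each DataFrame its own x position and add a little horizontal jitter, without merging the frames. This is a sketch under assumptions: df1, df2, the "value" column and the jitter width are all invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical samples; "value" is an invented column name
rng = np.random.default_rng(0)
df1 = pd.DataFrame({"value": rng.normal(10, 2, size=40)})
df2 = pd.DataFrame({"value": rng.normal(14, 3, size=40)})

fig, ax = plt.subplots()
for position, (label, frame) in enumerate([("first", df1), ("second", df2)]):
    # Small random horizontal offsets so the points do not sit on one vertical line
    jitter = rng.uniform(-0.1, 0.1, size=len(frame))
    ax.scatter(position + jitter, frame["value"], s=10, label=label)

# Two labelled locations on the x axis, one per dataset
ax.set_xticks([0, 1])
ax.set_xticklabels(["first", "second"])
ax.set_xlim(-0.5, 1.5)
```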
