how to plot many columns of Pandas data frame - python

I need to make scatter plots using the Boston Housing dataset. I want to plot each of the other columns against the MEDV column. This code draws all the plots on the same graph. How can I separate them?
import matplotlib.pyplot as plt
%matplotlib inline
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(12, 12))
for column, ax in zip(['CRIM', 'ZN','INDUS', 'CHAS', 'NOX', 'RM'], axes):
    plt.scatter(boston_df[column], boston_df.MEDV)

Your code will work if you flatten the axes object. axes is a 2-D array, so looping over it directly yields whole rows of axes rather than individual axes. Use axes.flatten() in the for loop, and then call ax.scatter, which plots each column on its own subplot.
The subplots are filled row by row: first the first row, then the second row, then the third row.
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(12, 12))
for column, ax in zip(['CRIM', 'ZN','INDUS', 'CHAS', 'NOX', 'RM'], axes.flatten()):
    ax.scatter(boston_df[column], boston_df.MEDV)

You need to use ax.scatter instead of plt.scatter so that they plot in the axes you've created.
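A quick way to see why the question's code put everything on one graph: pyplot functions such as plt.scatter draw on the current axes (plt.gca()), which right after plt.subplots is the last subplot created. This sketch uses made-up data and simply counts how many scatter collections each subplot received.

```python
import matplotlib.pyplot as plt
import numpy as np

# Same 3x2 grid as in the question
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(12, 12))
x = np.arange(10)  # made-up data
for i in range(3):
    # pyplot-level call: draws on plt.gca(), the "current" axes,
    # not on any particular subplot you might be looping over
    plt.scatter(x, x * i)

# Number of scatter collections drawn on each subplot, in row-by-row order
counts = [len(ax.collections) for ax in axes.flat]
print(plt.gca() is axes[-1, -1], counts)
```

All three scatters end up on the bottom-right subplot, which is exactly the symptom in the question; calling ax.scatter on each subplot avoids it.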

Try plotting with axes[row, col].scatter(). This should do the trick, but you then have to iterate over both the rows and the columns.
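A minimal sketch of the row/column approach, assuming a hypothetical boston_df filled with random values under the question's column names (the real dataset is not loaded here). It assumes the number of columns equals nrows * ncols, so every grid cell gets exactly one column.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for boston_df: random values under the question's column names
rng = np.random.default_rng(0)
columns = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM']
boston_df = pd.DataFrame(rng.random((100, 7)), columns=columns + ['MEDV'])

nrows, ncols = 3, 2
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(12, 12))
for row in range(nrows):
    for col in range(ncols):
        # Map the (row, col) grid position back to a position in the column list
        column = columns[row * ncols + col]
        ax = axes[row, col]
        ax.scatter(boston_df[column], boston_df['MEDV'])
        ax.set_xlabel(column)
        ax.set_ylabel('MEDV')
fig.tight_layout()
```

The index row * ncols + col gives the same row-by-row order as looping over axes.flatten().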

Related

Multiple count plots in seaborn

I have a CSV file with multiple columns, and I am trying to draw side-by-side count plots for selected columns. With the code below I can plot only two columns; when I try to add more columns, it doesn't work. How can I plot multiple selected columns side by side?
When I plot two graphs, they overlap. How can I increase the gap between them?
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
train_data = pd.read_csv(r"train_ctrUa4K.csv")
plt.figure(figsize=(10, 8))
fig, ax =plt.subplots(1,2)
sns.countplot(train_data['Gender'], ax=ax[0])
sns.countplot(train_data['Dependents'], ax=ax[1])
#sns.countplot(train_data['Self_Employed'], ax=ax[1])
#sns.countplot(train_data['Property_Area'], ax=ax[1,1])
fig.show()
Change the number of columns in the call to subplots():
fig, ax = plt.subplots(1,4)
sns.countplot(train_data['Gender'], ax=ax[0])
sns.countplot(train_data['Dependents'], ax=ax[1])
sns.countplot(train_data['Self_Employed'], ax=ax[2])
sns.countplot(train_data['Property_Area'], ax=ax[3])
If you have too many subplots to fit on a single row, you can increase the number of rows as well. Be careful: if you have more than one row and more than one column, the variable ax will be a 2D array:
fig, ax = plt.subplots(2,2)
sns.countplot(train_data['Gender'], ax=ax[0,0])
sns.countplot(train_data['Dependents'], ax=ax[0,1])
sns.countplot(train_data['Self_Employed'], ax=ax[1,0])
sns.countplot(train_data['Property_Area'], ax=ax[1,1])
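A sketch that loops over the selected columns instead of writing one call per subplot, and also addresses the overlap. It assumes a small hypothetical train_data (the CSV is not available here) and passes the columns to seaborn as keywords (data=..., x=...). figsize is given to plt.subplots directly, because plt.subplots creates its own figure, so a preceding plt.figure(figsize=...) call only produces a separate, unused figure. fig.tight_layout() adds padding between subplots; fig.subplots_adjust(wspace=..., hspace=...) is an alternative if you want to set the gaps by hand.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for train_data (the real CSV is not loaded here)
train_data = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male', 'Male'],
    'Dependents': ['0', '1', '2', '0'],
    'Self_Employed': ['No', 'Yes', 'No', 'No'],
    'Property_Area': ['Urban', 'Rural', 'Semiurban', 'Urban'],
})

columns = ['Gender', 'Dependents', 'Self_Employed', 'Property_Area']
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
# axes is 2-D here, so flatten it to pair one axes object with each column
for column, ax in zip(columns, axes.flatten()):
    sns.countplot(data=train_data, x=column, ax=ax)
# Add padding so labels of neighbouring subplots do not overlap
fig.tight_layout()
```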

Plots different columns of different dataframe in one plot as scatter plot

I am trying to plot different columns (longitude and latitude) from different dataframes in one plot, but they are being plotted separately in different figures.
Here is the code I am using
fig,ax=plt.subplots()
cells_final.plot.scatter(x='lon',y='lat')
data_rupture.plot.scatter(x='Longitude',y='Latitude',color='red')
plt.show()
How can I plot this in one single figure?
Use the axes instance (ax) created by
fig, ax = plt.subplots()
and pass it as the ax parameter of pandas.DataFrame.plot:
fig,ax=plt.subplots()
cells_final.plot.scatter(x='lon',y='lat', ax=ax)
data_rupture.plot.scatter(x='Longitude',y='Latitude',color='red', ax=ax)
plt.show()
Or, if you'd rather have the plots on different subplots of the same figure, you can create multiple axes:
fig, (ax1, ax2) = plt.subplots(1, 2)
cells_final.plot.scatter(x='lon',y='lat', ax=ax1)
data_rupture.plot.scatter(x='Longitude',y='Latitude',color='red', ax=ax2)
plt.show()
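When both dataframes are drawn on one shared ax, it helps to label each series so the legend tells them apart. A minimal sketch with small hypothetical versions of cells_final and data_rupture (made-up coordinates); the label keyword is passed through to Matplotlib's scatter.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for cells_final and data_rupture (made-up coordinates)
cells_final = pd.DataFrame({'lon': [10.0, 10.5, 11.0], 'lat': [45.0, 45.2, 45.4]})
data_rupture = pd.DataFrame({'Longitude': [10.2, 10.8], 'Latitude': [45.1, 45.3]})

fig, ax = plt.subplots()
# Both calls draw on the same ax; label= gives each series a legend entry
cells_final.plot.scatter(x='lon', y='lat', ax=ax, label='cells')
data_rupture.plot.scatter(x='Longitude', y='Latitude', color='red', ax=ax, label='rupture')
ax.set_xlabel('longitude')
ax.set_ylabel('latitude')
ax.legend()
```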
You need to specify the axes:
fig,ax=plt.subplots(1,2, figsize=(12, 8))
cells_final.plot.scatter(x='lon',y='lat', ax=ax[0])
data_rupture.plot.scatter(x='Longitude',y='Latitude',color='red', ax=ax[1])
plt.show()
Thanks @William Miller!

Matplotlib several subplots and axes

I am trying to use matplotlib to plot several subplots, each with two y-axes (the values of the two curves are completely different, so I have to plot them on different y-axes).
To plot one graph with two y-axes I do:
fig, ax1 = plt.subplots(figsize=(16, 10))
ax2 = ax1.twinx()
ax1.plot(line1, 'r')
ax2.plot(line2, 'g')
To plot two subplots, one for each curve, I do:
plt.subplot(2,1,1)
plt.plot(line1, 'r')
plt.subplot(2,1,2)
plt.plot(line2, 'g')
I can't manage to merge the two methods.
I wanted something like:
fig, ax1 = plt.subplots(figsize=(16, 10))
plt.subplot(2,1,1)
ax2 = ax1.twinx()
ax1.plot(line1, 'r')
ax2.plot(line2, 'g')
plt.subplot(2,1,2)
ax1.plot(line3, 'r')
ax2.plot(line4, 'g')
But this doesn't work; it just shows two empty subplots.
How can I do this?
You should create your subplots first, then twin the axes for each subplot. It is easier to do the plotting with the methods of the axes objects than with the high-level pyplot function calls.
The axes returned by subplots is an array of axes. If you have only 1 column or 1 row, it is a 1-D array (with exactly 1 row and 1 column it is a single Axes object instead), but if both are greater than 1 it is a 2-D array. In the latter case, you need to either .ravel() the axes array or iterate over the rows and then over the axes in each row.
import numpy as np
import matplotlib.pyplot as plt
# create a figure with 4 subplot axes
fig, axes = plt.subplots(2,2, figsize=(8,8))
for ax_row in axes:
    for ax in ax_row:
        # create a twin of the axis that shares the x-axis
        ax2 = ax.twinx()
        # plot some data on each axis.
        ax.plot(np.arange(50), np.random.randint(-10,10, size=50).cumsum())
        ax2.plot(np.arange(50), 100+np.random.randint(-100,100, size=50).cumsum(), 'r-')
plt.tight_layout()
plt.show()
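Applied to the question's layout (two stacked subplots), the same idea can use .ravel() instead of nested loops. This sketch assumes hypothetical arrays in place of the question's line1 to line4; the left curve of each pair goes on the subplot's own y-axis and the right curve on its twin.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for the question's line1..line4
x = np.arange(50)
line1, line2 = np.sin(x / 5), 1000 * np.cos(x / 5)
line3, line4 = np.sqrt(x), 500 - x ** 2

fig, axes = plt.subplots(2, 1, figsize=(16, 10))
pairs = [(line1, line2), (line3, line4)]
twins = []
# ravel() makes the loop work whatever the grid shape (1-D or 2-D)
for ax, (left, right) in zip(axes.ravel(), pairs):
    ax2 = ax.twinx()  # second y-axis sharing this subplot's x-axis
    ax.plot(left, 'r')
    ax2.plot(right, 'g')
    twins.append(ax2)
fig.tight_layout()
```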

python - plotting N by 1 number of plots when N is unknown

I am currently plotting multiple plots across 4 axes using seaborn. To do this, I manually set nrows=4 and then run 4 boxplots at once.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
data=np.random.randn(1000)
label = ['A','B','C','D'] * 250
df = pd.DataFrame(
{'label': prod1,
'data': data
})
fig, (ax1, ax2, ax3, ax4) = plt.subplots(nrows=4, sharey=True)
fig.set_size_inches(12, 16)
sns.boxplot(data=df[df['label']=='A'], y='data', ax=ax1)
sns.boxplot(data=df[df['label']=='B'], y='data', ax=ax2)
sns.boxplot(data=df[df['label']=='C'], y='data', ax=ax3)
sns.boxplot(data=df[df['label']=='D'], y='data', ax=ax4)
I would like to rewrite this function so that it automatically recognizes the unique number of labels, creates the number of axes automatically, then plots.
Does anyone know how I can accomplish this? Thank you.
The assignment
fig, ax = plt.subplots(nrows=4, sharey=True)
makes ax a NumPy array of axes. This array can be one- or two-dimensional (depending on the values of the nrows and ncols parameters), so ax.ravel() is called to ensure it is one-dimensional.
Now you can loop over zip(label, ax.ravel()) to call sns.boxplot once for each label and axes.
fig, ax = plt.subplots(nrows=4, sharey=True)
fig.set_size_inches(12, 16)
for labeli, axi in zip(label, ax.ravel()):
    sns.boxplot(data=df[df['label']==labeli], y='data', ax=axi)
Note that zip stops when the shortest of the iterators ends. So even though label has length 1000, only its first 4 items are used in the loop, since there are only 4 axes.
Alternatively, just assign label = ['A','B','C','D'] since that variable is not used anywhere else (at least, not in the posted code).
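To make the number of subplots follow the data, as the question asks, the unique labels can be computed first and nrows set from their count. A sketch under these assumptions: df is built from the label list (taking that to be what the question's undefined prod1 meant), and squeeze=False is used so that axes stays a 2-D array even when there is only one label (with the default squeeze=True, a single subplot comes back as a bare Axes object, which has no ravel()). The function name plot_by_label is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

def plot_by_label(df):
    """Draw one boxplot of df['data'] per unique value of df['label']."""
    labels = df['label'].unique()  # N comes from the data, not a hard-coded 4
    n = len(labels)
    # squeeze=False keeps axes 2-D (shape (n, 1)) even when n == 1
    fig, axes = plt.subplots(nrows=n, sharey=True, squeeze=False,
                             figsize=(12, 4 * n))
    for lab, ax in zip(labels, axes.ravel()):
        sns.boxplot(data=df[df['label'] == lab], y='data', ax=ax)
        ax.set_title(lab)
    return fig, axes

rng = np.random.default_rng(0)
df = pd.DataFrame({'label': ['A', 'B', 'C', 'D'] * 250,
                   'data': rng.standard_normal(1000)})
fig, axes = plot_by_label(df)
```

For 4 labels this reproduces the question's 12x16 figure; a fifth label would simply add a fifth row.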

wrong y axis range using matplotlib subplots and seaborn

I'm playing with seaborn for the first time, trying to plot different columns of a pandas dataframe on different plots using matplotlib subplots. The simple code below produces the expected figure, but the last plot does not have a proper y range (it seems linked to the full range of values in the dataframe).
Does anyone have an idea why this happens and how to prevent it? Thanks.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pds
import seaborn as sns
X = np.arange(0,10)
df = pds.DataFrame({'X': X, 'Y1': 4*X, 'Y2': X/2., 'Y3': X+3, 'Y4': X-7})
fig, axes = plt.subplots(ncols=2, nrows=2)
ax1, ax2, ax3, ax4 = axes.ravel()
sns.set(style="ticks")
sns.despine(fig=fig)
sns.regplot(x='X', y='Y1', data=df, fit_reg=False, ax=ax1)
sns.regplot(x='X', y='Y2', data=df, fit_reg=False, ax=ax2)
sns.regplot(x='X', y='Y3', data=df, fit_reg=False, ax=ax3)
sns.regplot(x='X', y='Y4', data=df, fit_reg=False, ax=ax4)
plt.show()
Update: I modified the above code with:
fig, axes = plt.subplots(ncols=2, nrows=3)
ax1, ax2, ax3, ax4, ax5, ax6 = axes.ravel()
If I plot data on any axis but the last one, I obtain what I'm looking for.
Of course I don't want the empty frames. All plots present the data with a similar visual aspect.
When data is plotted on the last axis, it gets a y range that is too wide like in the first example. Only the last axis seems to have this problem. Any clue?
If you want the scales to be the same on all axes you could create subplots with this command:
fig, axes = plt.subplots(ncols=2, nrows=2, sharey=True, sharex=True)
This makes all the plots share the relevant axes.
If you want to change the limits of that particular axes manually, you could add this line at the end of the plotting commands:
ax4.set_ylim(top=5)
# or for both limits like this:
# ax4.set_ylim([-2, 5])
[Figure: the result with adjusted y-limits on ax4]
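The manual set_ylim fix can be generalized so every panel's y-range follows its own column. This sketch does not explain why only the last axis is affected; it just replaces the four regplot calls with a loop and pins each panel's limits to its data plus a 5% margin (the margin size is an arbitrary choice).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

X = np.arange(0, 10)
df = pd.DataFrame({'X': X, 'Y1': 4*X, 'Y2': X/2., 'Y3': X+3, 'Y4': X-7})

fig, axes = plt.subplots(ncols=2, nrows=2)
for col, ax in zip(['Y1', 'Y2', 'Y3', 'Y4'], axes.ravel()):
    sns.regplot(x='X', y=col, data=df, fit_reg=False, ax=ax)
    # Pin this panel's y-range to its own data plus a 5% margin on each side
    lo, hi = df[col].min(), df[col].max()
    pad = 0.05 * (hi - lo)
    ax.set_ylim(lo - pad, hi + pad)
fig.tight_layout()
```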
