Subplots Frequency Plots - python

I've been struggling to generate the frequency plot of 2 columns named "Country" and "Company" in my DataFrame and show them as 2 subplots. Here's what I've got.
Figure1 = plt.figure(1)
Subplot1 = Figure1.add_subplot(2,1,1)
Here I want to use the bar chart pd.value_counts(DataFrame['Country']).plot('barh')
as the first subplot.
The problem is, I can't just write Subplot1.pd.value_counts(DataFrame['Country']).plot('barh'), because Subplot1 has no attribute pd. Could anybody shed some light on this?
Thanks a million in advance for your tips,
R.

You don't have to create Figure and Axes objects separately, and you should probably avoid initial caps in variable names, to differentiate them from classes.
Here, you can use plt.subplots, which creates a Figure and a number of Axes and binds them together. Then, you can just pass the Axes objects to the plot method of pandas:
import pandas as pd
from matplotlib import pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
# df is your DataFrame; each call draws into the Axes passed via ax=
df['Country'].value_counts().plot(kind='barh', ax=ax1)
df['Company'].value_counts().plot(kind='barh', ax=ax2)

Pandas' plot method can take in a Matplotlib axes object and direct the resulting plot into that subplot.
# If you want two plots, one above the other.
nrows = 2
ncols = 1
# Here axes contains 2 objects representing the two subplots
fig, axes = plt.subplots(nrows, ncols, figsize=(8, 4))
# Below, "my_data_frame" is the name of your Pandas dataframe.
# Change it accordingly for the code to work.
# Plot first subplot
# This counts the number of times each country appears and plots
# that as a bar chart in the first subplot, represented by axes[0].
my_data_frame['Country'].value_counts().plot('barh', ax=axes[0])
# Plot second subplot
my_data_frame['Company'].value_counts().plot('barh', ax=axes[1])
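For reference, here is a minimal self-contained sketch of the same approach. The demo DataFrame, the subplot titles and the Agg backend line are assumptions added for illustration so that it runs without a display; replace them with your own data and setup.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical demo data standing in for the real DataFrame.
df = pd.DataFrame({
    "Country": ["US", "FR", "US", "DE", "FR", "US"],
    "Company": ["A", "B", "A", "C", "A", "B"],
})

# Two subplots, one above the other; axes is an array of two Axes objects.
fig, axes = plt.subplots(2, 1, figsize=(8, 6))

# Draw one horizontal frequency bar chart per column, each into its own Axes.
for ax, column in zip(axes, ["Country", "Company"]):
    counts = df[column].value_counts()
    counts.plot(kind="barh", ax=ax)
    ax.set_title(f"Frequency of {column}")

fig.tight_layout()
```

Looping over the columns keeps the plotting call in one place, which helps if more columns are added later.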

Related

Stacked bar plot in subplots using pandas .plot()

I created a hypothetical DataFrame containing 3 measurements for 20 experiments. Each experiment is associated with a Subject (3 possibilities).
import random
import numpy as np
import pandas as pd
random.seed(42)  # set seed for the subject labels
# index labels: (experiment number, subject)
tuples = list(zip(range(20), random.choices(['Jean', 'Marc', 'Paul'], k=20)))
index = pd.MultiIndex.from_tuples(tuples, names=['num_exp', 'Subject'])
test = pd.DataFrame(np.random.randint(0, 100, size=(20, 3)), index=index, columns=['var1', 'var2', 'var3'])
test.head()  # first lines
I succeeded in constructing stacked bar plots with the 3 measurements (each bar is an experiment) for each subject:
test.groupby('Subject').plot(kind='bar', stacked=True,legend=False) #plots
Now, I would like to put each plot (for each subject) in a subplot. If I use the "subplots" argument, it gives me the following :
test.groupby('Subject').plot(kind='bar', stacked=True,legend=False,subplots= True) #plot with subplot
It created a subplot for each measurement because the measurements correspond to columns in my DataFrame.
I don't know how I could do otherwise because I need them as columns to create stacked bars.
So here is my question :
Is it possible to construct this kind of figure with stacked bar plots in subplots (ideally in an elegant way, without iterating) ?
Thanks in advance !
I solved my problem with a simple loop without using anything else than pandas .plot()
Pandas .plot() has an ax parameter that accepts a matplotlib Axes object.
So, starting from the list of distinct subjects :
subj= list(dict.fromkeys(test.index.get_level_values('Subject')))
I define my subplots :
fig, axs = plt.subplots(1, len(subj))
Then, I have to iterate for each subplot :
for a in range(len(subj)):
    test.loc[test.index.get_level_values('Subject') == subj[a]].unstack(level=1).plot(ax=axs[a], kind='bar', stacked=True, legend=False, xlabel='', fontsize=10)  # plot
    axs[a].set_title(subj[a], pad=0, fontsize=15)  # title
    axs[a].tick_params(axis='y', pad=0, size=1)  # yticks
And it works well!
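As a variation on the loop above, the groups produced by groupby can be paired directly with the Axes objects. This is only a sketch under assumptions: the subject labels are fixed rather than random so the result is reproducible, the Agg backend is set so it runs without a display, and sharey=True and droplevel are my own choices rather than part of the original solution.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data in the same shape as the question: 20 experiments, 3 subjects.
subjects = ["Jean", "Marc", "Paul"] * 6 + ["Jean", "Marc"]
index = pd.MultiIndex.from_arrays([list(range(20)), subjects], names=["num_exp", "Subject"])
rng = np.random.default_rng(42)
test = pd.DataFrame(rng.integers(0, 100, size=(20, 3)), index=index,
                    columns=["var1", "var2", "var3"])

groups = test.groupby(level="Subject")

# One subplot per subject; squeeze=False keeps axs two-dimensional even for one group.
fig, axs = plt.subplots(1, groups.ngroups, figsize=(12, 4), sharey=True, squeeze=False)

for ax, (subject, group) in zip(axs[0], groups):
    # Drop the Subject level so the x axis shows only the experiment number.
    group.droplevel("Subject").plot(kind="bar", stacked=True, legend=False, ax=ax)
    ax.set_title(subject)

fig.tight_layout()
```

This still iterates once per subject, but it avoids building the boolean mask by hand for every subplot.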

Two seaborn plots with different scales displayed on same plot but bars overlap

I am trying to include 2 seaborn countplots with different scales on the same plot but the bars display as different widths and overlap as shown below. Any idea how to get around this?
Setting dodge=False doesn't work, as the bars then appear on top of each other.
The main problem with the approach in the question is that the first countplot doesn't take hue into account. The second countplot won't magically move the bars of the first. An additional categorical column could be added, only taking on the 'weekend' value. Note that the column should be explicitly made categorical with two values, even if only one value is really used.
Things can be simplified a lot by starting from the original dataframe, which supposedly already has a column 'is_weekend'. Creating the twinx ax beforehand makes it possible to write a loop (so the call to sns.countplot() is written only once, with parameters).
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
sns.set_style('dark')
# create some demo data
data = pd.DataFrame({'ride_hod': np.random.normal(13, 3, 1000).astype(int) % 24,
                     'is_weekend': np.random.choice(['weekday', 'weekend'], 1000, p=[5 / 7, 2 / 7])})
# now, make 'is_weekend' a categorical column (not just strings)
data['is_weekend'] = pd.Categorical(data['is_weekend'], ['weekday', 'weekend'])
fig, ax1 = plt.subplots(figsize=(16, 6))
ax2 = ax1.twinx()
for ax, category in zip((ax1, ax2), data['is_weekend'].cat.categories):
    sns.countplot(data=data[data['is_weekend'] == category], x='ride_hod', hue='is_weekend', palette='Blues', ax=ax)
    ax.set_ylabel(f'Count ({category})')
ax1.legend_.remove() # both axes got a legend, remove one
ax1.set_xlabel('Hour of Day')
plt.tight_layout()
plt.show()
Another suggestion: set the x tick labels by hand with plt.xticks().

How to remove the extra figures created when running a for loop to create seaborn plots

I am trying to do EDA along with exploring the Matplotlib and Seaborn libraries.
The data_cat DataFrame has 4 columns and I want to create plots in a single row with 4 columns.
For that, I created a figure object with 4 axes objects.
fig, ax = plt.subplots(1,4, figsize = (16,4))
for i in range(len(data_cat.columns)):
    sns.catplot(x = data_cat.columns[i], kind = 'count', data = data_cat, ax= ax[i])
The output for it is a figure with the 4 plots (as required) but it is followed by 4 blank plots that I think are the extra figure objects generated by the sns.catplot function.
Your code does not work as intended because sns.catplot() is a figure level function, that is designed to create its own grid of subplots if desired. So if you want to set up the subplot grid directly in matplotlib, as you do with your first line, you should use the appropriate axes level function instead, in this case sns.countplot():
fig, ax = plt.subplots(1, 4, figsize = (16,4))
for i in range(4):
    sns.countplot(x = data_cat.columns[i], data = data_cat, ax= ax[i])
Alternatively, you could use pandas' df.melt() method to tidy up your dataset so that all the values from your four columns are in one column (say 'col_all'), and you have another column (say 'subplot') that identifies from which original column each value is. Then you can get all the subplots with one call:
sns.catplot(x='col_all', kind='count', data=data_cat, col='subplot')
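To make the melt step concrete, here is a small sketch. The demo data_cat contents, the intermediate name tidy, and the sharex=False and height settings are assumptions for illustration; the column names 'subplot' and 'col_all' follow the description above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line for interactive use
import pandas as pd
import seaborn as sns

# Hypothetical categorical data with four columns.
data_cat = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "M", "L"],
    "shape": ["round", "square", "round", "round"],
    "grade": ["a", "b", "a", "c"],
})

# Long format: 'subplot' holds the original column name, 'col_all' holds the value.
tidy = data_cat.melt(var_name="subplot", value_name="col_all")

# One figure-level call draws one count plot per original column.
# sharex=False lets each facet show only its own categories.
grid = sns.catplot(data=tidy, x="col_all", col="subplot", kind="count",
                   sharex=False, height=3)
```

Because catplot creates and owns its figure here, no extra blank figures are produced.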
I answered a related question here.

Add multiple axes from different sources into same figure

I am using Python/matplotlib to create a figure whereby it has three subplots, each returned from a different 'source' or class method.
For example, I have a script called 'plot_spectra.py' that contains the Spectra() class with method Plot().
So, calling Spectra('filename.ext').Plot() will return a tuple, as per the code below:
# create the plot
fig, ax = plt.subplots()
ax.contour(xx, yy, plane, levels=cl, cmap=cmap)
ax.set_xlim(ppm_1h_0, ppm_1h_1)
ax.set_ylim(ppm_13c_0, ppm_13c_1)
# return the contour plot
return fig, ax
It is my understanding that the 'figure' is the 'window' in matplotlib, and the 'ax' is an individual plot. I would then want to say, plot three of these 'ax' objects in the same figure, but I am struggling to do so because I keep getting an empty window and I think I have misunderstood what each object actually is.
Calling:
hnca, hnca_ax = Spectra('data/HNCA.ucsf', type='sparky').Plot(plane_ppm=resi.N(), vline=resi.H())
plt.subplot(2,2,1)
plt.subplot(hnca_ax)
eucplot, barplot = PlotEucXYIntensity(scores, x='H', y='N')
plt.subplot(2,2,3)
plt.subplot(eucplot)
plt.subplot(2,2,4)
plt.subplot(barplot)
plt.show()
Ultimately, what I am trying to obtain is a single window that looks like this:
Where each plot has been returned from a different function or class method.
What 'object' do I need to return from my functions? And how do I incorporate these three objects into a single figure?
I would suggest this kind of approach, where you specify the ax on which you want to plot in the function:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
def Spectra(data, ax):
    ax.plot(data)
def PlotIntensity(data, ax):
    ax.hist(data)
def SeabornScatter(data, ax):
    # x and y must be passed by keyword in recent seaborn versions
    sns.scatterplot(x=data, y=data, ax=ax)
spectrum = np.random.random((1000,))
plt.figure()
ax1 = plt.subplot(1,3,1)
Spectra(spectrum, ax1)
ax2 = plt.subplot(1,3,2)
SeabornScatter(spectrum, ax2)
ax3 = plt.subplot(1,3,3)
PlotIntensity(spectrum, ax3)
plt.tight_layout()
plt.show()
You can specify the grid for the subplots in several different ways, and you probably also want to have a look at the gridspec module.
One way to do this is:
f = plt.figure()
gs = f.add_gridspec(2,2)
ax = f.add_subplot(gs[0,:])
Think of the '2,2' as adding 2 row x 2 columns.
On the third line, 'gs[0,:]' says to add a chart on row 0, spanning all columns. This creates the chart at the top of your figure. Note that indices begin at 0, not 1.
To add the 'eucplot' you will have to call a different ax on row 1 and column 0:
ax2 = f.add_subplot(gs[1,0])
Lastly, the 'barplot' will go in yet a different ax on row 1, column 1:
ax3 = f.add_subplot(gs[1,1])
See this site here for further reference: Customizing Figure Layouts Using GridSpec and Other Functions
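Putting the two answers together, a rough sketch could look like the following. The three plotting functions are hypothetical stand-ins for Spectra(...).Plot() and PlotEucXYIntensity(...), which are not shown in full in the question; the key assumption is that each function accepts an Axes and draws on it instead of creating its own figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

def plot_spectrum(data, ax):
    """Hypothetical stand-in for the contour plot: draws on the given Axes."""
    ax.plot(data)
    ax.set_title("spectrum")

def plot_euc(data, ax):
    """Hypothetical stand-in for the 'eucplot'."""
    ax.hist(data, bins=20)
    ax.set_title("eucplot")

def plot_bars(data, ax):
    """Hypothetical stand-in for the 'barplot'."""
    ax.bar(range(len(data)), data)
    ax.set_title("barplot")

rng = np.random.default_rng(0)
values = rng.random(100)

fig = plt.figure(figsize=(8, 6))
gs = fig.add_gridspec(2, 2)              # 2 rows x 2 columns
ax_top = fig.add_subplot(gs[0, :])       # row 0, all columns
ax_left = fig.add_subplot(gs[1, 0])      # row 1, column 0
ax_right = fig.add_subplot(gs[1, 1])     # row 1, column 1

plot_spectrum(values, ax_top)
plot_euc(values, ax_left)
plot_bars(values[:10], ax_right)
fig.tight_layout()
```

With this design the functions do not need to return a figure at all; the caller owns the figure and decides where each plot goes.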

Python: Parallel coordinates subplots in subplot

I saw this example on how to create a parallel coordinate plot: Parallel Coordinates:
This creates a nice Parallel Coordinates figure, but I would like to add this plot to an already existing figure in a subplot (there should be another plot next to it in the same plot).
For the already existing figure, the figure and axes are defined as:
fig = plt.figure(figsize=plt.figaspect(2.))
ax = fig.add_subplot(1,2,1)
For the Parallel Coordinates, they suggest:
fig, axes = plt.subplots(1, dims-1, sharey=False)
How can I reconcile both initializations of the figure and the ax(es)?
One option is to create all the axes using subplots then just shift the location of the one that you don't want to have wspace=0 as is done for the Parallel Coordinate plots:
import matplotlib.pyplot as plt
dims = 4
fig, axes = plt.subplots(1, dims-1 + 1, sharey=False)
plt.subplots_adjust(wspace=0)
ax1 = axes[0]
pos = ax1.get_position()
ax1.set_position(pos.translated(tx = -0.1,ty=0))
I have added 1 to the number of columns created (leaving it explicitly as -1+1) and set wspace=0, which draws all the plots adjacent to one another with no space in between. Take the leftmost axes and get its position, which is a Bbox. This is convenient because a Bbox can be translated, here by tx=-0.1, which separates your existing plot from the rest.
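A different technique, not the one used in the answer above, is to nest a GridSpec: the outer grid holds the existing plot and a second cell, and that second cell is subdivided with wspace=0 for the parallel coordinates axes. The following is only a sketch; dims, the width ratios and the figure size are assumed values.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line for interactive use
import matplotlib.pyplot as plt

dims = 4  # assumed number of dimensions in the parallel coordinates plot

fig = plt.figure(figsize=(10, 4))

# Outer grid: left cell for the existing plot, right cell for the parallel coordinates.
outer = fig.add_gridspec(1, 2, width_ratios=[1, 2], wspace=0.3)
ax = fig.add_subplot(outer[0, 0])

# Inner grid inside the right cell: dims-1 axes drawn with no horizontal gap.
inner = outer[0, 1].subgridspec(1, dims - 1, wspace=0)
axes = [fig.add_subplot(inner[0, i]) for i in range(dims - 1)]
```

Here ax plays the role of the existing subplot and axes replaces the result of plt.subplots(1, dims-1), so no manual position shifting is needed.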
