How to draw multi-series histogram from several pd.Series objects? - python

I want to draw a multi-series histogram chart that looks like this:
[Image: multi-series histogram chart]
I'm trying to add this to an existing Jupyter notebook that already had code in place to establish a double chart:
fig, (ax, ax2) = plt.subplots(2,1)
The existing code uses the style where the plotting is done using methods on the data objects themselves. For example, here's some of the existing code that plots line charts in one of the existing subplots:
ax = termstruct[i].T.plot.line(ax=ax, c=linecolor,
dashes=dash, grid=True, linewidth=width, figsize=FIGURE_SIZE)
The main point I'm making here is that the way the plotting is achieved is to use the .plot.line method on the Pandas pd.Series (termstruct). This is not at all consistent with the examples and tutorials I was able to find online for drawing charts with pyplot, but it works and it establishes a framework I'm trying to work within.
So I started by taking the obvious step of adding a 3rd subplot for my histogram by changing the subplots call to plt from above:
fig, (ax, ax2, ax3) = plt.subplots(3,1)
My data are in four separate pd.Series objects, where each one represents a series that should map to one of the colors in the chart example at the top of this post. But when I try following the same general coding style of using methods on the data objects to do the plotting, I always seem to wind up with the X and Y axes opposite what I want, like this:
[Image: what I wound up with!]
The code that generated the above chart was:
ax3 = NakedPNLperMo.plot.hist(ax=ax3,grid=True, figsize=FIGURE_SIZE)
ax3 = H9PNLperMo.plot.hist(ax=ax3, grid=True, figsize=FIGURE_SIZE)
ax3 = H12PNLperMo.plot.hist(ax=ax3, grid=True, figsize=FIGURE_SIZE)
ax3 = H15PNLperMo.plot.hist(ax=ax3, grid=True, figsize=FIGURE_SIZE)
NakedPNLperMo and the other 3 pd.Series objects are full of arcane financial symbols, but a simplified version of their contents (to make this clear) would be:
NakedPNLperMo = pd.Series(data=[1.2,3.4,5.6,7.8,-2.3,-4.6],
                          index=['Month 1','Month 2','Month 3','Month 4',
                                 'Month 5','Month 6'])
My intention/goal is that the data are plotted on the Y axis and the index values ('Month 1', etc.) are like columns across the x axis, but I can't seem to get that output no matter what I try.
Clearly the problem is the axes are swapped. But when I went looking for how to fix that, I couldn't find any examples online that follow this approach of drawing the chart using methods on the data objects. Everything I found in online tutorials was using a bunch of calls to plt to set up the charts. And more to the point, I couldn't see any way to follow the style in those examples and still draw the chart as a 3rd subplot alongside the 2 subplots already defined by this program.
My first (and foremost) question is what I SHOULD be trying next... Does it make sense to figure out how to change the parameters of [data-object].plot.xxx to get the axes the way I need them, or would it make more sense to follow the completely different style of making a series of calls to plt to design and draw the charts? The former would be consistent with what I have, but I can't find any online help for using that coding style. (Should I infer that it's a deprecated style of doing things?)
If the answer to the above is to take the approach of calling plt like the online examples all seem to show, how can I use the ax3 that ties this chart into the existing subplots? If the answer to the above is to stick with the approach of [data-object].plot.xxx, where can I find help on using that style? All the online examples I could find followed a different coding style.
And of course the most obvious question: How do I swap the axes so the chart looks right? :)
Thanks!

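I hope this code helps. What you describe is a grouped bar chart rather than a histogram: plot.hist bins the values and plots how often they occur, which is why your axes look swapped. I have created three series to show how you can do it by combining them into a DataFrame and plotting with kind='bar':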
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#jupyter notebook only
%matplotlib inline
s1 = pd.Series(data=[1.2,3.4,5.6,7.8,-2.3,-4.6],
index=['Month 1','Month 2','Month 3','Month 4',
'Month 5','Month 6'])
s2=pd.Series(data=[5,3.4,7.4,-5.1,-2.3,3],
index=['Month 1','Month 2','Month 3','Month 4',
'Month 5','Month 6'])
s3=pd.Series(data=[5,2,-2.4,0,1,3],
index=['Month 1','Month 2','Month 3','Month 4',
'Month 5','Month 6'])
df=pd.concat([s1,s2,s3],axis=1)
df.columns=['s1','s2','s3']
print(df)
ax=df.plot(kind='bar',figsize=(10,10),fontsize=15)
#------------------------------------------------#
plt.xticks(rotation=-45)
#grid on
plt.grid()
# set y=0
ax.axhline(0, color='black', lw=1)
#change size of legend
ax.legend(fontsize=20)
#hiding upper and right axis layout
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
#changing the thickness
ax.spines['bottom'].set_linewidth(3)
ax.spines['left'].set_linewidth(3)
Output:
          s1   s2   s3
Month 1  1.2  5.0  5.0
Month 2  3.4  3.4  2.0
Month 3  5.6  7.4 -2.4
Month 4  7.8 -5.1  0.0
Month 5 -2.3 -2.3  1.0
Month 6 -4.6  3.0  3.0
[Figure: grouped bar chart of s1, s2 and s3 for each month]
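To tie this into the question's existing subplots, the same DataFrame plot can probably be pointed at ax3 through the ax keyword, which keeps the [data-object].plot.xxx style. This is only a sketch: the series names and values below are made-up stand-ins for NakedPNLperMo and the others, and the figure size is an assumption.

```python
import matplotlib.pyplot as plt
import pandas as pd

months = [f"Month {i}" for i in range(1, 7)]

# Hypothetical stand-ins for NakedPNLperMo, H9PNLperMo and H12PNLperMo
naked = pd.Series([1.2, 3.4, 5.6, 7.8, -2.3, -4.6], index=months)
h9 = pd.Series([5, 3.4, 7.4, -5.1, -2.3, 3], index=months)
h12 = pd.Series([5, 2, -2.4, 0, 1, 3], index=months)

# Three stacked subplots, as in the question; ax and ax2 are left empty here
fig, (ax, ax2, ax3) = plt.subplots(3, 1, figsize=(10, 12))

# One column per series; the dict keys become the legend labels
df = pd.concat({"Naked": naked, "H9": h9, "H12": h12}, axis=1)

# Passing ax=ax3 draws the grouped bars into the existing third subplot
returned = df.plot.bar(ax=ax3, grid=True, rot=0)
ax3.axhline(0, color="black", lw=1)
```

Each month gets one group of bars on the x axis, and the values are on the y axis. Because the call returns the same Axes object that was passed in, any further tweaks can go through ax3 directly.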

Related

Problems with plt.subplots, which should be the best option?

I'm new to both Python and Stack Overflow. I come from an R/ggplot2 background and I am still getting stuck with Python. I don't understand why I get an empty plot before my figure when using matplotlib. I just have a basic pandas series and I want to plot some of the rows in one subplot and some in the other (however, my display is terrible and I don't know why or how to fix it). Thank you in advance!
df = organism_df.T
fig, (ax1,ax2) = plt.subplots(nrows=1,ncols=2,figsize=(5,5))
ax1 = df.iloc[[0,2,3,-1]].plot(kind='bar')
ax1.get_legend().remove()
ax1.set_title('Number of phages/bacteria interacting vs actual DB')
ax2 = df.iloc[[1,4,5,6,7]].plot(kind='bar')
ax2.get_legend().remove()
ax2.set_title('Number of different taxonomies with interactions')
plt.tight_layout()
The method plot from pandas would need the axes given as an argument, e.g., df.plot(ax=ax1, kind='bar'). In your example, first the figure (consisting of ax1 and ax2) is created, then another figure is created by the plot function (at the same time overwriting the original ax1 object) etc.
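A corrected version of the question's code might look like the sketch below. The DataFrame is a made-up stand-in for organism_df.T (its column name and values are assumptions); the only real change is passing ax=ax1 and ax=ax2 instead of reassigning those names.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for organism_df.T: nine rows, one numeric column
df = pd.DataFrame(
    {"count": [5, 3, 8, 2, 7, 4, 6, 1, 9]},
    index=[f"row {i}" for i in range(9)],
)

fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(10, 5))

# ax=... tells pandas to draw into the existing Axes instead of a new figure;
# legend=False replaces the get_legend().remove() calls
df.iloc[[0, 2, 3, -1]].plot(kind="bar", ax=ax1, legend=False)
ax1.set_title("Number of phages/bacteria interacting vs actual DB")

df.iloc[[1, 4, 5, 6, 7]].plot(kind="bar", ax=ax2, legend=False)
ax2.set_title("Number of different taxonomies with interactions")

fig.tight_layout()
```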

Is there a way to add a line plot on top of all plots within a Catplot grid in Seaborn/Python?

Hello I am very new to using python, I am starting to use it for creating graphs at work (for papers and reports etc). I was just wondering if someone could help with the problem which I have detailed below? I am guessing there is a very simple solution but I can't figure it out and it is driving me insane!
Basically, I am plotting the results of an experiment: the y-axis shows the result, which is numeric (Result), and the x-axis is categorical and labeled Location. The data is then split across four graphs based on which machine the experiment was carried out on (Machine, also categorical).
This first part is easy. The code used is this:
sns.catplot(x='Location', y='Result', data=df3, hue='Machine', col='Machine', col_wrap=2, linewidth=2, kind='swarm')
This gives me the following graph:
I now want to add another layer to the plot: a red line that represents the upper spec limit for the data.
So I added the following line of code to the above:
sns.lineplot(x='Location', y=1.8, data=df3, linestyle='--', color='r', linewidth=2)
This then gives the following graph:
As you can see, the red line I want appears on only one of the graphs. All I want is to add the same red line to all four graphs in exactly the same position.
Can anyone help me???
You could use .map to draw a horizontal line on each of the subplots. To do that, you need to capture the FacetGrid object returned by catplot in a variable.
Here is an example:
import matplotlib.pyplot as plt
import seaborn as sns
titanic = sns.load_dataset('titanic').dropna()
g = sns.catplot(x='class', y='age', data=titanic,
hue='embark_town', col='embark_town', col_wrap=2, linewidth=2, kind='swarm')
g.map(plt.axhline, y=50, ls='--', color='r', linewidth=2)
plt.tight_layout()
plt.show()
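An equivalent approach, if you prefer not to use .map, is to loop over the grid's Axes yourself. The sketch below uses the column names from the question, but the data is randomly generated and the USL name is just a placeholder for the upper spec limit of 1.8.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)

# Hypothetical data shaped like the question's df3: 4 machines x 3 locations
df3 = pd.DataFrame({
    "Location": np.tile(["A", "B", "C"], 40),
    "Machine": np.repeat(["M1", "M2", "M3", "M4"], 30),
    "Result": rng.normal(1.5, 0.2, size=120),
})

g = sns.catplot(x="Location", y="Result", data=df3,
                hue="Machine", col="Machine", col_wrap=2, kind="swarm")

USL = 1.8  # upper spec limit (placeholder name)

# g.axes holds one Axes per facet; draw the same line on each of them
for ax in g.axes.flat:
    ax.axhline(USL, ls="--", color="r", linewidth=2)
```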

matplotlib: plotting more than one figure at once

I am working with 3 pandas dataframes having the same column structures(number and type), only that the datasets are for different years.
I would like to plot the ECDF for each of the dataframes, but every time I do this, I do it individually (I lack the Python skills to do better). Also, one of the figures (2018) is scaled differently on the x-axis, which makes it difficult to compare. Here's how I do it.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from empiricaldist import Cdf
df1 = pd.read_csv('2016.csv')
df2 = pd.read_csv('2017.csv')
df3 = pd.read_csv('2018.csv')
#some info about the dfs
df1.columns.values
array(['id', 'trip_id', 'distance', 'duration', 'speed', 'foot', 'bike',
'car', 'bus', 'metro', 'mode'], dtype=object)
modal_class = df1['mode']
print(modal_class[:5])
0 bus
1 metro
2 bike
3 foot
4 car
def decorate_ecdf(title, x, y):
    plt.xlabel(x)
    plt.ylabel(y)
    plt.title(title)
#plotting the ecdf for 2016 dataset
for name, group in df1.groupby('mode'):
    Cdf.from_seq(group.speed).plot()
title, x, y = 'Speed distribution by travel mode (April 2016)','speed (m/s)', 'ECDF'
decorate_ecdf(title,x,y)
#plotting the ecdf for 2017 dataset
for name, group in df2.groupby('mode'):
    Cdf.from_seq(group.speed).plot()
title, x, y = 'Speed distribution by travel mode (April 2017)','speed (m/s)', 'ECDF'
decorate_ecdf(title,x,y)
#plotting the ecdf for 2018 dataset
for name, group in df3.groupby('mode'):
    Cdf.from_seq(group.speed).plot()
title, x, y = 'Speed distribution by travel mode (April 2018)','speed (m/s)', 'ECDF'
decorate_ecdf(title,x,y)
Output:
I am pretty sure this isn't the Pythonic way of doing it, just a dirty way to get the work done. You can also see how the 2018 plot is scaled differently on the x-axis.
Is there a way to enforce that all figures are scaled the same way?
How do I re-write my code such that the figures are plotted by calling a function once?
When using pyplot, you can plot using an implicit method with plt.plot(), or you can use the explicit method, by creating and calling the figure and axis objects with fig, ax = plt.subplots(). What happened here is, in my view, a side-effect from using the implicit method.
For example, you can use two pd.DataFrame.plot() commands and have them share the same axis by supplying the returned axis to the other function.
foo = pd.DataFrame(dict(a=[1,2,3], b=[4,5,6]))
bar = pd.DataFrame(dict(c=[3,2,1], d=[6,5,4]))
ax = foo.plot()
bar.plot(ax=ax) # ax object is updated
ax.plot([0,3], [1,1], 'k--')
You can also create the figure and axis objects beforehand and supply them as needed. Also, it's perfectly fine to have multiple plot commands. Often, my code is 25% work, 75% fiddling with plots. Don't try to be clever at the cost of readability.
fig, axes = plt.subplots(nrows=3, ncols=1, sharex=True)
# In this case, axes is a numpy array with 3 axis objects
# You can access the objects with indexing
# All will have the same x range
axes[0].plot([-1, 2], [1,1])
axes[1].plot([-2, 1], [1,1])
axes[2].plot([1,3],[1,1])
So you can combine both of these snippets to your own code. First, create the figure and axes object, then plot each dataframe, but supply the correct axis to them with the keyword ax.
Also, suppose you have three axis objects and they have different x limits. You can get them all, then set the three to have the same minimum value and the same maximum value. For example:
axis_list = [ax1, ax2, ax3] # suppose you created these separately and want to enforce the same axis limits
minimum_x = min([ax.get_xlim()[0] for ax in axis_list])
maximum_x = max([ax.get_xlim()[1] for ax in axis_list])
for ax in axis_list:
    ax.set_xlim(minimum_x, maximum_x)
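Putting these pieces together for the ECDF question, one function called once per dataframe could look like the sketch below. Note the assumptions: the ECDF is computed directly with NumPy rather than with empiricaldist's Cdf, the data is randomly generated in place of the three CSV files, and plot_ecdf_by_mode is a name I made up.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_ecdf_by_mode(df, ax, title):
    """Draw one ECDF of `speed` per travel mode on the given Axes."""
    for name, group in df.groupby("mode"):
        x = np.sort(group["speed"].to_numpy())
        y = np.arange(1, len(x) + 1) / len(x)  # fraction of values <= x
        ax.step(x, y, where="post", label=name)
    ax.set(title=title, ylabel="ECDF")
    ax.legend()


# Hypothetical stand-ins for 2016.csv, 2017.csv and 2018.csv;
# the 2018 speeds are deliberately spread over a wider range
rng = np.random.default_rng(0)
frames = {
    year: pd.DataFrame({
        "mode": rng.choice(["bus", "metro", "bike"], size=200),
        "speed": rng.gamma(2.0, scale, size=200),
    })
    for year, scale in [(2016, 2.0), (2017, 2.0), (2018, 5.0)]
}

# sharex=True forces the same x range on all three subplots
fig, axes = plt.subplots(nrows=3, ncols=1, sharex=True, figsize=(6, 9))
for ax, (year, df) in zip(axes, frames.items()):
    plot_ecdf_by_mode(df, ax, f"Speed distribution by travel mode (April {year})")
axes[-1].set_xlabel("speed (m/s)")
fig.tight_layout()
```

With sharex=True there is no need to collect and reset the limits by hand; that step is only needed when the Axes were created separately.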

Histograms of grouped data

I'm fairly new to Python and getting my head around this has been really hard.
I have a code like this
df = pd.read_csv("files/athena-query-1.txt", sep=";")
ax = df.hist(column="distance", range=[0.0, 0.5], bins=100, by="gate_id")
All I want is to see a distance distribution per gate on separate charts. If there are 400 gate_id, I want to see 400 distribution plots.
It tells me that ax is a collection of AxesSubplot objects. When I try to plot this, I get only one graph that is unreadable. My guess is that it tries to create a single chart (a Figure?).
EDIT:
I reproduced a minimal example of what I think you might mean:
import pandas as pd
import scipy.stats
# Create a dataframe with 100 normally distributed values for 'distance',
# and spread the 'gate_id' values (1, 2, 3, 4) evenly among them:
df = pd.DataFrame({'distance': scipy.stats.norm.rvs(size=100), 'gate_id': 25*[1,2,3,4]})
df.hist(column='distance', range=[0.0, 0.5], bins=100, by='gate_id')
This yields a figure with 4 subplots, corresponding to 'gate_id':
However, if I try for 400 as you mentioned, the figure isn't even shown, probably because it's simply not big enough to hold 400 subplots. This is the reason I recommend the first solution example I gave below.
ORIGINAL:
If you want 400 separate distribution plots, then why not create 400 figures using matplotlib?
from matplotlib import pyplot as plt
for i in range(400):
    fig, ax = plt.subplots()
    ax.plot(<dataframe['x']>,<dataframe['y']>)
or you can also try to plot a huge figure with many subplots, such as
fig, ( (ax1, ax2, ax3, ...<fill up here>..., ax10), (ax11, ..., ax20), ..., (ax91, ..., ax100)) = plt.subplots(nrows=10, ncols=10)
ax1.bar(<dataframe['x']>,<dataframe['y']>)
...
ax100.bar(<dataframe['x']>,<dataframe['y']>)
This is only for 100 subplots, not sure if 400 is just simply too big.
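Rather than naming every Axes by hand, you could let groupby drive a loop over a grid of subplots. This is a rough sketch with made-up data (12 gates instead of 400); the column names come from the question, while the grid width and figure size are arbitrary choices.

```python
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: 12 gates with 50 distances each
df = pd.DataFrame({
    "gate_id": np.repeat(np.arange(12), 50),
    "distance": rng.uniform(0.0, 0.5, size=600),
})

groups = df.groupby("gate_id")
ncols = 4
nrows = math.ceil(groups.ngroups / ncols)

# squeeze=False keeps `axes` two-dimensional even for a single row
fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows),
                         sharex=True, squeeze=False)

# One histogram per gate, each in its own Axes
for ax, (gate_id, group) in zip(axes.ravel(), groups):
    ax.hist(group["distance"], bins=100, range=(0.0, 0.5))
    ax.set_title(f"gate_id {gate_id}")

# Hide any unused Axes in the last row
for ax in axes.ravel()[groups.ngroups:]:
    ax.set_visible(False)

fig.tight_layout()
```

For 400 gates a single figure would probably have to be very large, so splitting the groups across several figures (for example 20 gates per figure) may be more practical.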

Remove one of the two legends produced in this Seaborn figure?

I have just started using seaborn to produce my figures. However I can't seem to remove one of the legends produced here.
I am trying to plot two accuracies against each other and draw a line along the diagonal to make it easier to see which has performed better (if anyone has a better way of plotting this data in seaborn - let me know!). The legend I'd like to keep is the one on the left, that shows the different colours for 'N_bands' and different shapes for 'Subject No'
ax1 = sns.relplot(y='y',x='x',data=df,hue='N bands',legend='full',style='Subject No.',markers=['.','^','<','>','8','s','p','*','P','X','D','H','d']).set(ylim=(80,100),xlim=(80,100))
ax2 = sns.lineplot(x=range(80,110),y=range(80,110),legend='full')
I have tried setting the kwarg legend to 'full','brief' and False for both ax1 and ax2 (together and separately) and it only seems to remove the one on the left, or both.
I have also tried to remove the legends using matplotlib
ax1.ax.legend_.remove()
ax2.legend_.remove()
But this results in the same behaviour (the left legend disappearing).
UPDATE: Here is a minimal example you can run yourself:
test_data = np.array([[1.,2.,100.,9.],[2.,1.,100.,8.],[3.,4.,200.,7.]])
test_df = pd.DataFrame(columns=['x','y','p','q'], data=test_data)
sns.set_context("paper")
ax1=sns.relplot(y='y',x='x',data=test_df,hue='p',style='q',markers=['.','^','<','>','8'],legend='full').set(ylim=(0,4),xlim=(0,4))
ax2=sns.lineplot(x=range(0,5),y=range(0,5),legend='full')
Although this doesn't reproduce the error perfectly as the right legend is coloured (I have no idea how to reproduce this error then - does the way my dataframe was created make a difference?). But the essence of the problem remains - how do I remove the legend on the right but keep the one on the left?
You're plotting a lineplot in the (only) axes of a FacetGrid produced via relplot. That's quite unconventional, so strange things might happen.
One option to remove the legend of the FacetGrid while keeping the one from the lineplot would be
g._legend.remove()
Full code (where I also corrected the confusing naming of grids and axes)
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
test_data = np.array([[1.,2.,100.,9.],[2.,1.,100.,8.],[3.,4.,200.,7.]])
test_df = pd.DataFrame(columns=['x','y','p','q'], data=test_data)
sns.set_context("paper")
g=sns.relplot(y='y',x='x',data=test_df,hue='p',style='q',markers=['.','^','<','>','8'], legend='full')
sns.lineplot(x=range(0,5),y=range(0,5),legend='full', ax=g.axes[0,0])
g._legend.remove()
plt.show()
Note that this is kind of a hack, and it might break in future seaborn versions.
The other option is to not use a FacetGrid here, but just plot a scatter and a line plot in one axes,
ax1 = sns.scatterplot(y='y',x='x',data=test_df,hue='p',style='q',
markers=['.','^','<','>','8'], legend='full')
sns.lineplot(x=range(0,5),y=range(0,5), legend='full', ax=ax1)
plt.show()
