Sharing two y axes on multiple matplotlib subplots [duplicate] - python

My goal is to have two rows and three columns of plots using matplotlib. Each graph in the top row will contain two data series, and two y-axes. I want to make the scales on each axis line up so that the corresponding data series are directly comparable. Right now I have it so that the primary y-axis on each graph is aligned, but I can't get the secondary y-axes to align. Here is my current code:
import matplotlib.pyplot as plt
import pandas as pd
excel_file = 'test_data.xlsx'
sims = ['Sim 02', 'Sim 01', 'Sim 03']
if __name__ == '__main__':
    data = pd.read_excel(excel_file, skiprows=[0, 1, 2, 3], sheetname=None, header=1, index_col=[0, 1], skip_footer=10)
    plot_cols = len(sims)
    plot_rows = 2
    f, axes = plt.subplots(plot_rows, plot_cols, sharex='col', sharey='row')
    secondary_ax = []
    for i, sim in enumerate(sims):
        df = data[sim]
        modern = df.loc['Modern']
        traditional = df.loc['Traditional']
        axes[0][i].plot(modern.index, modern['Idle'])
        secondary_ax.append(axes[0][i].twinx())
        secondary_ax[i].plot(modern.index, modern['Work'])
        axes[1][i].bar(modern.index, modern['Result'])
        axes[0][i].set_xlim(12, 6)
        if i > 0:
            secondary_ax[0].get_shared_y_axes().join(secondary_ax[0], secondary_ax[i])
    # secondary_ax[0].get_shared_y_axes().join(x for x in secondary_ax)
    plt.show()
The solutions I tried (both the line in the if statement and the commented-out line before plt.show()) came from answers to similar questions, but they didn't resolve my issue. Nothing breaks; the secondary axes just aren't aligned.
I also tried adding an extra row in the plt.subplots call and using twinx() to combine the first two rows, but that still created an empty second row of plots.
As a fallback, I think I could check each axis for its max and min and loop through them to update the limits manually, but I'd love to find a cleaner solution if one exists. Thanks.

You just need to share the y axes before plotting your data:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# excel_file = 'test_data.xlsx'
sims = ['Sim 02', 'Sim 01', 'Sim 03']
if __name__ == '__main__':
    # data = pd.read_excel(excel_file, skiprows=[0, 1, 2, 3], sheetname=None, header=1, index_col=[0, 1], skip_footer=10)
    # Random stand-in data, since the Excel file is not available here
    modern = pd.DataFrame(np.random.randint(0, 100, (100, 3)), columns=sims)
    traditional = pd.DataFrame(np.random.randint(10, 30, (100, 3)), columns=sims)
    traditional[sims[1]] = traditional[sims[1]] + 40
    traditional[sims[2]] = traditional[sims[2]] - 40
    data3 = pd.DataFrame(np.random.randint(0, 100, (100, 3)), columns=sims)
    plot_cols = len(sims)
    plot_rows = 2
    f, axes = plt.subplots(plot_rows, plot_cols, sharex='col', sharey='row', figsize=(30, 10))
    secondary_ax = []
    for i, sim in enumerate(sims):
        # df = data[sim]  # disabled: `data` is not loaded in this example
        modern_series = modern[sim]
        traditional_series = traditional[sim]
        idle = data3
        axes[0][i].plot(modern_series.index, modern_series)
        secondary_ax.append(axes[0][i].twinx())
        if i > 0:
            # Share the secondary y-axis before plotting onto it
            secondary_ax[0].get_shared_y_axes().join(secondary_ax[0], secondary_ax[i])
        secondary_ax[i].plot(traditional_series.index, traditional_series)
        # axes[1][i].bar(data3.index, data3)
        axes[0][i].set_xlim(12, 6)
    plt.show()
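As far as I know, newer Matplotlib releases deprecated get_shared_y_axes().join(...) (around version 3.6), so the call above may warn or fail there. A minimal sketch of the same idea using Axes.sharey, which links one axis to another, follows. The function name make_figure and the random data are assumptions for illustration only, and the Agg backend is selected just so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def make_figure(n_cols=3, seed=0):
    """Top row gets a twinned secondary axis per column; all secondary y-axes are linked."""
    rng = np.random.default_rng(seed)
    fig, axes = plt.subplots(2, n_cols, sharex='col', sharey='row')
    secondary = []
    x = np.arange(20)
    for i in range(n_cols):
        axes[0][i].plot(x, rng.integers(0, 100, 20))
        twin = axes[0][i].twinx()
        if secondary:
            # Link this secondary axis to the first one before plotting on it
            twin.sharey(secondary[0])
        secondary.append(twin)
        # Shift each series so that unshared limits would clearly differ
        twin.plot(x, rng.integers(10, 30, 20) + 40 * i, color='C1')
    return fig, axes, secondary


fig, axes, secondary = make_figure()
```

Because every secondary axis is tied to the first one, autoscaling considers the data on all of them, so the right-hand scales end up identical across the row.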

Related

how to plot pairs in different subplots with difference on the side

I want to make a plot in seaborn but I am having some difficulties. The data have two variables: time (2 levels) and state (2 levels). I want to plot time on the x axis and state as different subplots, showing individual data lines. Finally, to the right of these I want to show a difference plot of the difference between time 2 and time 1, for each of the levels of state. I am struggling because I cannot get the second plot to show on the right. Here is my attempt:
import numpy as np
import pandas as pd
import seaborn as sns
# Just making some fake data
ids = [1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5]
times = [1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2]
states = ['A', 'B', 'A', 'B'] * 5
np.random.seed(121)
resps = [(i*t) + np.random.normal() for i, t in zip(ids, times)]
DATA = {
    'identity': ids,
    'time': times,
    'state': states,
    'resps': resps
}
df = pd.DataFrame(DATA)
# Done with data
g = sns.relplot(
    data=df, kind='line',
    col='state', x='time', y='resps', units='identity',
    estimator=None, alpha=.5, height=5, aspect=.7)
# # Draw a line onto each Axes
g.map(sns.lineplot,"time", "resps", lw=5, ci=None)
# Make a wide data to make the difference
wide = df.set_index(['identity', 'state', 'time']).unstack().reset_index()
A = wide['state']=='A'
B = wide['state']=='B'
wide['diffA'] = wide[A][('resps', 2)] - wide[A][('resps', 1)]
wide['diffB'] = wide[B][('resps', 2)] - wide[B][('resps', 1)]
wide['difference'] = wide[['diffA', 'diffB']].sum(axis=1)
wide = wide.drop(columns=[('diffA', ''), ('diffB', '')])
sns.pointplot(x='state', y='difference', data=wide, join=False)
Output from the first
And output from the second:
Is there no way to put them together? Even though they are different data? I did try to use matplotlib. And then achieved slightly better results but this still had a problem because I wanted the two left plots to have a shared y axis but not the difference. This created lots of work as well, because I want to be flexible for different numbers of the state variable, but only kept to 2 for simplicity. Here is a paint version of what I want to do (sorry for the poor quality), hopefully with some more control over appearance but this is secondary:
Is there a reliable way to do this in a simpler way? Thanks!
The problem is that sns.relplot operates at the figure level. This means it creates its own figure object and we cannot control the axes it uses. If you want to leverage seaborn for the creation of the lines without using "pure" matplotlib, you can copy the lines onto matplotlib axes:
import numpy as np
import pandas as pd
import seaborn as sns
# Just making some fake data
ids = [1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5]
times = [1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2]
states = ['A', 'B', 'A', 'B'] * 5
np.random.seed(121)
resps = [(i*t) + np.random.normal() for i, t in zip(ids, times)]
DATA = {
    'identity': ids,
    'time': times,
    'state': states,
    'resps': resps
}
df = pd.DataFrame(DATA)
# Done with data
g = sns.relplot(
    data=df, kind='line',
    col='state', x='time', y='resps', units='identity',
    estimator=None, alpha=.5, height=5, aspect=.7)
# # Draw a line onto each Axes
g.map(sns.lineplot,"time", "resps", lw=5, ci=None)
# Make a wide data to make the difference
wide = df.set_index(['identity', 'state', 'time']).unstack().reset_index()
A = wide['state']=='A'
B = wide['state']=='B'
wide['diffA'] = wide[A][('resps', 2)] - wide[A][('resps', 1)]
wide['diffB'] = wide[B][('resps', 2)] - wide[B][('resps', 1)]
wide['difference'] = wide[['diffA', 'diffB']].sum(axis=1)
wide = wide.drop(columns=[('diffA', ''), ('diffB', '')])
# New code ----------------------------------------
import matplotlib.pyplot as plt
plt.close(g.figure)
fig = plt.figure(figsize=(12, 4))
ax1 = fig.add_subplot(1, 3, 1)
ax2 = fig.add_subplot(1, 3, 2, sharey=ax1)
ax3 = fig.add_subplot(1, 3, 3)
# Copy every line seaborn drew onto the matching matplotlib axes
for ax, g_ax in zip([ax1, ax2], g.axes[0]):
    l = list(g_ax.get_lines())
    for line in l:
        ax.plot(line.get_data()[0], line.get_data()[1], color=line.get_color(), lw=line.get_linewidth())
    ax.set_title(g_ax.get_title())
sns.pointplot(ax=ax3, x='state', y='difference', data=wide, join=False)
# End of new code ----------------------------------
plt.show()
Result:
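Separately from the plotting, the difference column itself can probably be built more directly than with the diffA/diffB masks. The sketch below is only one possible way: it reuses the question's fake data, reshapes it so each time point becomes a column, and subtracts the two columns. The variable name diff is an assumption introduced here.

```python
import numpy as np
import pandas as pd

# Same fake data as in the question
ids = [1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5]
times = [1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2]
states = ['A', 'B', 'A', 'B'] * 5
np.random.seed(121)
resps = [(i*t) + np.random.normal() for i, t in zip(ids, times)]
df = pd.DataFrame({'identity': ids, 'time': times, 'state': states, 'resps': resps})

# One row per (identity, state), one column per time point
wide = df.set_index(['identity', 'state', 'time'])['resps'].unstack('time')

# Time 2 minus time 1, back in long form for plotting
diff = (wide[2] - wide[1]).rename('difference').reset_index()
```

The resulting frame has the columns identity, state and difference, so it can be passed to sns.pointplot with x='state' and y='difference' regardless of how many states there are.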

Set 'global' colorbar range for multiple matplotlib subplots of different ranges

I would like to plot data in subplots using matplotlib.pyplot in python. Each subplot will contain data of different ranges. I would like to plot them using pyplot.scatter, and use one single colorbar for the entire plot. Thus, the colorbar should encompass the entire range of the values in every subplot. However, when I use a loop to plot the subplots and call a colorbar outside of the loop, it only uses the range of values from the last subplot. Many of the available examples concern the sizing and positioning of the colorbar, so the answer (how to make one universal colorbar for multiple subplots) is not obvious.
I have the following self-contained example code. Here, two subplots are rendered, one that should be colored with frigid temperatures typical of Russia and the other with tropical temperatures of Brazil. However, the end result shows a colorbar that only ranges the tropical Brazilian temperatures, making the Russia subplot erroneous:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
core_list = ['Russia', 'Brazil']
core_depth = [0, 2, 4, 6, 8, 10]
lo = [-33, 28]
hi = [10, 38]
df = pd.DataFrame([], columns = ['Location', 'Depth', '%TOC', 'Temperature'])
#Fill df
for ii, name in enumerate(core_list):
    for jj in core_depth:
        df.loc[len(df.index)] = [name, jj, (np.random.randint(1, 20))/10, np.random.randint(lo[ii], hi[ii])]
#Russia data have much colder temperatures than Brazil data due to hi and lo
#Plot data from each location using scatter plots
fig, axs = plt.subplots(nrows = 1, ncols = 2, sharey = True)
for nn, name in enumerate(core_list):
    core_mask = df['Location'] == name
    data = df.loc[core_mask]
    plt.sca(axs[nn])
    plt.scatter(data['Depth'], data['%TOC'], c = data['Temperature'], s = 50, edgecolors = 'k')
    axs[nn].set_xlabel('%TOC')
    plt.text(1.25*min(data['%TOC']), 1.75, name)
    if nn == 0:
        axs[nn].set_ylabel('Depth')
cbar = plt.colorbar()
cbar.ax.set_ylabel('Temperature, degrees C')
#How did Russia get so warm?!? Temperatures and ranges of colorbar are set to last called location.
#How do I make one colorbar encompass global temperature range of both data sets?
The output of this code shows that the temperatures in Brazil and Russia fall within the same range of colors:
We know intuitively, and from glancing at the data, that this is wrong. So, how do we tell pyplot to plot this correctly?
The answer is straightforward using the vmax and vmin controls of pyplot.scatter. These must be set with a universal range of data, not just the data focused on in any single iteration of a loop. Thus, to change the code above:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
core_list = ['Russia', 'Brazil']
core_depth = [0, 2, 4, 6, 8, 10]
lo = [-33, 28]
hi = [10, 38]
df = pd.DataFrame([], columns = ['Location', 'Depth', '%TOC', 'Temperature'])
#Fill df
for ii, name in enumerate(core_list):
    for jj in core_depth:
        df.loc[len(df.index)] = [
            name,
            jj,
            (np.random.randint(1, 20))/10,
            np.random.randint(lo[ii], hi[ii])
        ]
#Russia data have much colder temperatures than Brazil data due to hi and lo
#Plot data from each location using scatter plots
fig, axs = plt.subplots(nrows = 1, ncols = 2, sharey = True)
for nn, name in enumerate(core_list):
    core_mask = df['Location'] == name
    data = df.loc[core_mask]
    plt.sca(axs[nn])
    plt.scatter(
        data['Depth'],
        data['%TOC'],
        c=data['Temperature'],
        s=50,
        edgecolors='k',
        vmax=max(df['Temperature']),
        vmin=min(df['Temperature'])
    )
    axs[nn].set_xlabel('%TOC')
    plt.text(1.25*min(data['%TOC']), 1.75, name)
    if nn == 0:
        axs[nn].set_ylabel('Depth')
cbar = plt.colorbar()
cbar.ax.set_ylabel('Temperature, degrees C')
Now, the output shows a temperature difference between Russia and Brazil, which one would expect after a cursory glance at the data. The change that fixes this problem occurs within the for loop, however it references all of the data to find a max and min:
plt.scatter(data['Depth'], data['%TOC'], c = data['Temperature'], s = 50, edgecolors = 'k', vmax = max(df['Temperature']), vmin = min(df['Temperature']) )
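An equivalent way to express the same fix, sketched here under assumptions, is to build one matplotlib.colors.Normalize object from the full temperature range and pass it to every scatter call, then attach a single colorbar to all axes. The data below are random stand-ins generated with NumPy rather than the DataFrame from the question, and %TOC is placed on the x axis so that the axis labels match the plotted values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

rng = np.random.default_rng(0)
depth = np.array([0, 2, 4, 6, 8, 10])
temps = {
    'Russia': rng.integers(-33, 10, depth.size),
    'Brazil': rng.integers(28, 38, depth.size),
}

# One normalization built from ALL temperatures, shared by every subplot
all_temps = np.concatenate(list(temps.values()))
norm = Normalize(vmin=all_temps.min(), vmax=all_temps.max())

fig, axs = plt.subplots(1, 2, sharey=True)
scatters = []
for ax, (name, t) in zip(axs, temps.items()):
    toc = rng.integers(1, 20, depth.size) / 10
    sc = ax.scatter(toc, depth, c=t, s=50, edgecolors='k', norm=norm)
    ax.set_title(name)
    ax.set_xlabel('%TOC')
    scatters.append(sc)
axs[0].set_ylabel('Depth')

# Any of the scatters can drive the colorbar, since they share the same norm
cbar = fig.colorbar(scatters[0], ax=axs)
cbar.set_label('Temperature, degrees C')
```

Passing ax=axs lets the colorbar take its space from both subplots instead of only the last one.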

Python: Barplot colored according to a third variable

Currently I am trying to create a Barplot that shows the amount of reviews for an app per week. The bar should however be colored according to a third variable which contains the average rating of the reviews in each week (range: 1 to 5).
I followed the instructions of the following post to create the graph: Python: Barplot with colorbar
The code works fine:
# Import Packages
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
# Create Dataframe
data = [[1, 10, 3.4], [2, 15, 3.9], [3, 12, 3.6], [4, 30,1.2]]
df = pd.DataFrame(data, columns = ["week", "count", "score"])
# Convert to lists
data_x = list(df["week"])
data_hight = list(df["count"])
data_color = list(df["score"])
#Create Barplot:
data_color = [x / max(data_color) for x in data_color]
fig, ax = plt.subplots(figsize=(15, 4))
my_cmap = plt.cm.get_cmap('RdYlGn')
colors = my_cmap(data_color)
rects = ax.bar(data_x, data_hight, color=colors)
sm = ScalarMappable(cmap=my_cmap, norm=plt.Normalize(1,5))
sm.set_array([])
cbar = plt.colorbar(sm)
cbar.set_label('Color', rotation=270,labelpad=25)
plt.show()
Now to the issue: As you might notice the value of the average score in week 4 is "1.2". The Barplot does however indicate that the value lies around "2.5". I understand that this stems from the following code line, which standardizes the values by dividing it with the max value:
data_color = [x / max(data_color) for x in data_color]
Unfortunately, I am not able to change this command so that the colors reflect the absolute values of the scores, e.g. with an average score of 1.2 the last bar should be colored deep red, not light orange. I tried to just plug in the regular (not standardized) score values to solve the issue; however, doing so creates all bars with the same green color. Since this is only my second python project, I have a hard time comprehending the process behind this matter and would be very thankful for any advice or solution.
Cheers Neil
You correctly identified that the normalization is the problem here. In the linked code by SO user @ImportanceOfBeingEarnest, it is defined for the interval [0, 1]. If you want another normalization range [normmin, normmax], you have to take this into account during the normalization:
# Import Packages
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
# Create Dataframe
data = [[1, 10, 3.4], [2, 15, 3.9], [3, 12, 3.6], [4, 30,1.2]]
df = pd.DataFrame(data, columns = ["week", "mycount", "score"])
# Not necessary to convert to lists, pandas series or numpy array is also fine
data_x = df.week
data_hight = df.mycount
data_color = df.score
#Create Barplot:
normmin=1
normmax=5
data_color = [(x-normmin) / (normmax-normmin) for x in data_color] #see the difference here
fig, ax = plt.subplots(figsize=(15, 4))
my_cmap = plt.cm.get_cmap('RdYlGn')
colors = my_cmap(data_color)
rects = ax.bar(data_x, data_hight, color=colors)
sm = ScalarMappable(cmap=my_cmap, norm=plt.Normalize(normmin,normmax))
sm.set_array([])
cbar = plt.colorbar(sm)
cbar.set_label('Color', rotation=270,labelpad=25)
plt.show()
Sample output:
Obviously, this does not check that all values are indeed within the range [normmin, normmax], so a better script would make sure that all values adhere to this specification. We could, alternatively, address this problem by clipping the values that are outside the normalization range:
#...
import numpy as np
#.....
#Create Barplot:
normmin=1
normmax=3.5
data_color = [(x-normmin) / (normmax-normmin) for x in np.clip(data_color, normmin, normmax)]
#....
You may also have noticed another change that I introduced. You don't have to provide lists; pandas Series or NumPy arrays are fine, too. And if your column names do not clash with pandas attributes or methods such as count, you can access them as df.ABC instead of df["ABC"].
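The manual (x-normmin)/(normmax-normmin) step and the clipping can also be delegated to matplotlib.colors.Normalize, so that the bars and the colorbar are guaranteed to use the same mapping. This is a minimal sketch under that assumption; it uses plt.get_cmap instead of plt.cm.get_cmap because, as far as I know, the latter has been deprecated in recent Matplotlib versions. The label text 'Average score' is a choice made here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

df = pd.DataFrame(
    [[1, 10, 3.4], [2, 15, 3.9], [3, 12, 3.6], [4, 30, 1.2]],
    columns=["week", "mycount", "score"],
)

# clip=True maps scores outside [1, 5] to the ends of the colormap
norm = Normalize(vmin=1, vmax=5, clip=True)
cmap = plt.get_cmap('RdYlGn')

# The same norm object turns scores into [0, 1] values for the colormap
colors = cmap(norm(df["score"].to_numpy()))

fig, ax = plt.subplots(figsize=(15, 4))
ax.bar(df["week"], df["mycount"], color=colors)

# The colorbar is built from the identical norm and cmap
sm = ScalarMappable(norm=norm, cmap=cmap)
cbar = fig.colorbar(sm, ax=ax)
cbar.set_label('Average score', rotation=270, labelpad=25)
```

With this arrangement, changing the range in one place updates both the bar colors and the colorbar.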

Matplotlib: custom ticker for pandas MultiIndex DataFrame

I have a large pandas MultiIndex DataFrame that I would like to plot. A minimal example would look like:
import pandas as pd
years = range(2015, 2018)
fields = range(4)
days = range(4)
bands = ['R', 'G', 'B']
index = pd.MultiIndex.from_product(
    [years, fields], names=['year', 'field'])
columns = pd.MultiIndex.from_product(
    [days, bands], names=['day', 'band'])
df = pd.DataFrame(0, index=index, columns=columns)
df.loc[(2015,), (0,)] = 1
df.loc[(2016,), (1,)] = 1
df.loc[(2017,), (2,)] = 1
If I plot this using plt.spy, I get:
However, the tick locations and labels are less than desirable. I would like the ticks to completely ignore the second level of the MultiIndex. Using IndexLocator and IndexFormatter, I'm able to do the following:
from matplotlib.ticker import IndexFormatter, IndexLocator
import matplotlib.pyplot as plt
ax = plt.gca()
plt.spy(df)
xbase = len(bands)
xoffset = xbase / 2
xlabels = df.columns.get_level_values('day')
ax.xaxis.set_major_locator(IndexLocator(base=xbase, offset=xoffset))
ax.xaxis.set_major_formatter(IndexFormatter(xlabels))
plt.xlabel('Day')
ax.xaxis.tick_bottom()
ybase = len(fields)
yoffset = ybase / 2
ylabels = df.index.get_level_values('year')
ax.yaxis.set_major_locator(IndexLocator(base=ybase, offset=yoffset))
ax.yaxis.set_major_formatter(IndexFormatter(ylabels))
plt.ylabel('Year')
plt.show()
This gives me exactly what I want:
But here's the problem. My actual DataFrame has 15 years, 4,000 fields, 365 days, and 7 bands. If I actually label every single day, the labels would be illegible. I could place a tick every 50 days, but I would like the ticks to be dynamic so that when I zoom in, the ticks become more fine-grained. Basically what I'm looking for is a custom MultiIndexLocator that combines the placement of IndexLocator with the dynamism of MaxNLocator.
Bonus: My data is really nice in the sense that there are always the same number of fields for every year and the same number of bands for every day. But what if this was not the case? I would love to contribute a generic MultiIndexLocator and MultiIndexFormatter to matplotlib that works for any MultiIndex DataFrame.
Matplotlib does not know about dataframes or MultiIndex. It simply plots the data you supply. I.e. you get the same as if you were plotting the numpy array of data, spy(df.values).
So I would suggest first setting the extent of the image correctly so that you can use numeric tickers. A MaxNLocator should then work fine, as long as you do not zoom in too much.
import numpy as np
import pandas as pd
from matplotlib.ticker import MaxNLocator
import matplotlib.pyplot as plt
plt.rcParams['axes.formatter.useoffset'] = False
years = range(2000, 2018)
fields = range(9) #17
days = range(120) #365
bands = ['R', 'G', 'B', 'A']
index = pd.MultiIndex.from_product(
    [years, fields], names=['year', 'field'])
columns = pd.MultiIndex.from_product(
    [days, bands], names=['day', 'band'])
data = np.random.rand(len(years)*len(fields),len(days)*len(bands))
x,y = np.meshgrid(np.arange(data.shape[1]),np.arange(data.shape[0]))
data += 2*((y//len(fields)+x//len(bands)) % 2)
df = pd.DataFrame(data, index=index, columns=columns)
############
# Plotting
############
xbase = len(bands)
xlabels = df.columns.get_level_values('day')
ybase = len(fields)
ylabels = df.index.get_level_values('year')
extent = [xlabels.min()-np.diff(np.unique(xlabels))[0]/2.,
          xlabels.max()+np.diff(np.unique(xlabels))[0]/2.,
          ylabels.min()-np.diff(np.unique(ylabels))[0]/2.,
          ylabels.max()+np.diff(np.unique(ylabels))[0]/2.,]
fig, ax = plt.subplots()
ax.imshow(df.values, extent=extent, aspect="auto")
ax.set_ylabel('Year')
ax.set_xlabel('Day')
ax.xaxis.set_major_locator(MaxNLocator(integer=True,min_n_ticks=1))
ax.yaxis.set_major_locator(MaxNLocator(integer=True,min_n_ticks=1))
plt.show()
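If the outer index levels are not evenly spaced numbers, the extent trick is harder to apply. One alternative, sketched here as an assumption rather than a tested recipe for the large data set, keeps the plain row/column positions and lets a FuncFormatter look up the outer-level label for whatever positions MaxNLocator picks. To my knowledge IndexFormatter was removed from recent Matplotlib versions, which is another reason to use FuncFormatter. The helper name level_formatter is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter, MaxNLocator

years = range(2015, 2018)
fields = range(4)
days = range(4)
bands = ['R', 'G', 'B']
index = pd.MultiIndex.from_product([years, fields], names=['year', 'field'])
columns = pd.MultiIndex.from_product([days, bands], names=['day', 'band'])
df = pd.DataFrame(0, index=index, columns=columns)
df.iloc[0:4, 0:3] = 1  # a few non-zero cells so the plot is not empty


def level_formatter(labels):
    """Return a formatter that maps a row/column position to its outer-level label."""
    labels = list(labels)

    def fmt(x, pos=None):
        i = int(round(x))
        # Positions outside the data get an empty label
        return str(labels[i]) if 0 <= i < len(labels) else ''

    return FuncFormatter(fmt)


fig, ax = plt.subplots()
ax.spy(df.values, aspect='auto')
# MaxNLocator re-chooses integer tick positions on every zoom
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
ax.xaxis.set_major_formatter(level_formatter(df.columns.get_level_values('day')))
ax.yaxis.set_major_formatter(level_formatter(df.index.get_level_values('year')))
```

Ticks chosen this way are not guaranteed to fall in the middle of a group, so several neighbouring ticks can show the same label when zoomed in far enough.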

Plotting two datasets from different folders on one plot

I have two folders with similar number of files: maindirNo and maindirWith. I'm trying to plot each pair of similar files from folders on one plot:
for i in [maindirNo, maindirWith]:
    for root, dirs, files in os.walk(i):
        for fil in files:
            if 'output.rsv' in fil:
                df = pd.read_csv(os.path.join(i, fil), skiprows = 9, delimiter = r'\s+', header = None)
                df['SIMULATEDm'] = mergedlevels
                df['OBSERVEDm'] = df_observed['OBSERVEDm']
                df['date'] = pd.date_range('1/1991','12/2040', freq='MS')
                if i == maindirNo:
                    plt.plot(df['date'], df['SIMULATEDm'], 'b', label='No outlet')
                if i == maindirWith:
                    plt.plot(df['date'], df['SIMULATEDm'], 'r', label='With outlet')
                plt.legend(loc = 'lower right')
                plt.savefig('C:/Users/sgulbin/Desktop/AGU_Conf/plots/%s.jpg' %fil)
                plt.close()
The problem is that I either get all datasets on one plot or one plot for each file (I need two datasets on one plot). I assume I could append the output to an empty dataframe and then plot it, but is there a simpler way to plot them within the loop?
P.S. I know there are kind of similar questions to this, but not exactly.
pandas plots with matplotlib, and plt.subplots gives you fig and ax when you create several plots, e.g. 5 plots in one column:
fig, ax = plt.subplots(5, 1)
You can then use ax[0], ax[1], etc. to choose which plot each line is drawn on.
import matplotlib.pyplot as plt
import pandas as pd
import random
SIZE = 5
# create grid 5x1
fig, ax = plt.subplots(SIZE, 1)
# --- first folder --- blue ---
for idx in range(SIZE):
    # dataframe with random data as example
    df = pd.DataFrame([ random.randint(0,10) for _ in range(10) ])
    # draw it
    ax[idx].plot(df, 'b')
# --- second folder --- red ---
for idx in range(SIZE):
    # dataframe with random data as example
    df = pd.DataFrame([ random.randint(0,10) for _ in range(10) ])
    # draw it
    ax[idx].plot(df, 'r')
plt.show()
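The example above pairs plots by position. When the two folders contain files with matching names, another option is to pair the files first and then draw both lines of a pair on one figure. The sketch below only covers the pairing step and assumes that both trees use the same relative paths; the function name paired_outputs and the default marker are illustrative, not part of the question's code.

```python
import os


def paired_outputs(dir_a, dir_b, marker='output.rsv'):
    """Yield (relative_path, path_in_a, path_in_b) for files that exist in both trees."""
    for root, _dirs, files in os.walk(dir_a):
        for name in sorted(files):
            if marker not in name:
                continue
            path_a = os.path.join(root, name)
            # Look for the same relative path in the second tree
            rel = os.path.relpath(path_a, dir_a)
            path_b = os.path.join(dir_b, rel)
            if os.path.exists(path_b):
                yield rel, path_a, path_b


# Intended use (not run here): one figure per pair
# for rel, no_outlet, with_outlet in paired_outputs(maindirNo, maindirWith):
#     ...read both files, plot both lines on one figure, save it, close it...
```

Each iteration then has exactly the two files that belong on one plot, so the figure can be saved and closed before moving on to the next pair.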
