Matplotlib: Plot on double y-axis plot misaligned - python

I'm trying to plot two datasets into one plot with matplotlib. One of the two plots is misaligned by 1 on the x-axis.
This MWE pretty much sums up the problem. What do I have to adjust to bring the box-plot further to the left?
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
titles = ["nlnd", "nlmd", "nlhd", "mlnd", "mlmd", "mlhd", "hlnd", "hlmd", "hlhd"]
plotData = pd.DataFrame(np.random.rand(25, 9), columns=titles)
failureRates = pd.DataFrame(np.random.rand(9, 1), index=titles)
color = {'boxes': 'DarkGreen', 'whiskers': 'DarkOrange', 'medians': 'DarkBlue',
'caps': 'Gray'}
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
plotData.plot.box(ax=ax1, color=color, sym='+')
failureRates.plot(ax=ax2, color='b', legend=False)
ax1.set_ylabel('Seconds')
ax2.set_ylabel('Failure Rate in %')
plt.xlim(-0.7, 8.7)
ax1.set_xticks(range(len(titles)))
ax1.set_xticklabels(titles)
fig.tight_layout()
fig.show()
Actual result. Note that there are only 8 box plots instead of 9 and that they start at index 1.

The issue is a mismatch between how box() and plot() work: box() starts at x-position 1, while plot() depends on the index of the dataframe (which defaults to starting at 0). There are only 8 plots because the 9th is being cut off, since you specify plt.xlim(-0.7, 8.7). There are several easy ways to fix this. As @Sheldore's answer indicates, you can explicitly set the positions for the boxplot. Another way is to change the indexing of the failureRates dataframe to start at 1 when constructing it, i.e.
failureRates = pd.DataFrame(np.random.rand(9, 1), index=range(1, len(titles)+1))
Note that you need not specify the xticks or the xlim for the question's MCVE, but you may need to for your complete code.
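The idea of moving the line onto the box positions can be sketched with plain matplotlib calls. This is only an illustration under assumptions: the data are random, the names `plot_data`, `rates` and `box_x` are made up here, and `Axes.boxplot` / `Axes.plot` are called directly instead of going through the pandas wrappers. The Agg backend line is there only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the figure interactively
import matplotlib.pyplot as plt
import numpy as np

titles = ["nlnd", "nlmd", "nlhd", "mlnd", "mlmd", "mlhd", "hlnd", "hlmd", "hlhd"]
rng = np.random.default_rng(0)
plot_data = rng.random((25, len(titles)))  # one column per box
rates = rng.random(len(titles))            # hypothetical failure rates

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

# Boxes are drawn at x = 1..9 by default.
ax1.boxplot(plot_data)

# Plot the line at the same x positions (1..9) instead of 0..8.
box_x = np.arange(1, len(titles) + 1)
(line,) = ax2.plot(box_x, rates, color="b")

ax1.set_xticks(box_x)
ax1.set_xticklabels(titles)
ax1.set_ylabel("Seconds")
ax2.set_ylabel("Failure Rate in %")
fig.tight_layout()
```

With both artists on x = 1..9 no xlim is needed, so the ninth box is no longer cut off.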

You can specify the positions on the x-axis where you want the box plots to be drawn. Since you have 9 boxes, use the following, which places them at x = 0 to 8:
plotData.plot.box(ax=ax1, color=color, sym='+', positions=range(9))
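To see what the positions argument changes, here is a small check written against matplotlib's own `Axes.boxplot` (an assumption made for simplicity; the answer above passes the same keyword through the pandas wrapper). The helper `box_centres` is a made-up name; it reads each box's x position off its median line.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((25, 9))  # 9 columns, so 9 boxes

fig, (ax_default, ax_shifted) = plt.subplots(1, 2)
default = ax_default.boxplot(data)                      # no positions given
shifted = ax_shifted.boxplot(data, positions=range(9))  # explicit positions

def box_centres(result):
    """Return the x centre of each box, taken from its median line."""
    return [float(np.mean(median.get_xdata())) for median in result["medians"]]

default_centres = box_centres(default)  # boxes sit at 1, 2, ..., 9
shifted_centres = box_centres(shifted)  # boxes sit at 0, 1, ..., 8
```

The shifted boxes then line up with a line plot drawn over a default 0-based index.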

Related

Two seaborn plots with different scales displayed on same plot but bars overlap

I am trying to include 2 seaborn countplots with different scales on the same plot but the bars display as different widths and overlap as shown below. Any idea how to get around this?
Setting dodge=False, doesn't work as the bars appear on top of each other.
The main problem with the approach in the question is that the first countplot doesn't take hue into account. The second countplot won't magically move the bars of the first. An additional categorical column could be added, only taking on the 'weekend' value. Note that the column should be explicitly made categorical with two values, even if only one value is really used.
Things can be simplified a lot by starting from the original dataframe, which supposedly already has a column 'is_weekend'. Creating the twinx ax beforehand allows writing a loop (so the call to sns.countplot() is written only once, with parameters).
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
sns.set_style('dark')
# create some demo data
data = pd.DataFrame({'ride_hod': np.random.normal(13, 3, 1000).astype(int) % 24,
'is_weekend': np.random.choice(['weekday', 'weekend'], 1000, p=[5 / 7, 2 / 7])})
# now, make 'is_weekend' a categorical column (not just strings)
data['is_weekend'] = pd.Categorical(data['is_weekend'], ['weekday', 'weekend'])
fig, ax1 = plt.subplots(figsize=(16, 6))
ax2 = ax1.twinx()
for ax, category in zip((ax1, ax2), data['is_weekend'].cat.categories):
    sns.countplot(data=data[data['is_weekend'] == category], x='ride_hod', hue='is_weekend', palette='Blues', ax=ax)
    ax.set_ylabel(f'Count ({category})')
ax1.legend_.remove() # both axes got a legend, remove one
ax1.set_xlabel('Hour of Day')
plt.tight_layout()
plt.show()
Use plt.xticks() to put the x tick labels in by hand.

multiple boxplots, side by side, using matplotlib from a dataframe

I'm trying to plot 60+ boxplots side by side from a dataframe and I was wondering if someone could suggest some possible solutions.
At the moment I have df_new, a dataframe with 66 columns, which I'm using to plot boxplots. The easiest way I found to plot the boxplots was to use the boxplot package inside pandas:
boxplot = df_new.boxplot(column=x, figsize = (100,50))
This gives me a very very tiny chart with illegible axis which I cannot seem to change the font size for, so I'm trying to do this natively in matplotlib but I cannot think of an efficient way of doing it. I'm trying to avoid creating 66 separate boxplots using something like:
fig, ax = plt.subplots(nrows=1,
                       ncols=66,
                       figsize=(10, 5),
                       sharex=True)
ax[0].boxplot(...)  # insert parameters here (with nrows=1, ax is a 1-D array)
I actually do not know how to get the data from df_new.describe() into the boxplot function, so any tips on this would be greatly appreciated! The documentation is confusing. Not sure what x vectors should be.
Ideally I'd like to just give the boxplot function the dataframe and for it to automatically create all the boxplots by working out all the quartiles, column separations etc on the fly - is this even possible?
Thanks!
I tried to replace the boxplot with a ridge plot, which takes up less space because:
it requires half of the width
you can partially overlap the ridges
it develops vertically, so you can scroll down all the plot
I took the code from the seaborn documentation and adapted it a little in order to have 60 different ridges, normally distributed; here is the code:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import itertools
sns.set(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# # Create the data
n = 20
x = list(np.random.randn(1, 60)[0])
g = [item[0] + item[1] for item in list(itertools.product(list('ABCDEFGHIJ'), list('123456')))]
df = pd.DataFrame({'x': n*x,
'g': n*g})
# Initialize the FacetGrid object
pal = sns.cubehelix_palette(10, rot=-.25, light=.7)
g = sns.FacetGrid(df, row="g", hue="g", aspect=15, height=.5, palette=pal)
# Draw the densities in a few steps
g.map(sns.kdeplot, "x", clip_on=False, shade=True, alpha=1, lw=1.5, bw=.2)
g.map(sns.kdeplot, "x", clip_on=False, color="w", lw=2, bw=.2)
g.map(plt.axhline, y=0, lw=2, clip_on=False)
# Define and use a simple function to label the plot in axes coordinates
def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .2, label, fontweight="bold", color=color,
            ha="left", va="center", transform=ax.transAxes)
g.map(label, "x")
# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.25)
# Remove axes details that don't play well with overlap
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)
plt.show()
This is the result I get:
I don't know if it will be good for your needs, in any case keep in mind that keeping so many distributions next to each other will always require a lot of space (and a very big screen).
Maybe you could try dividing the distributions into smaller groups and plotting them a few at a time?
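If you would rather stay with boxplots, matplotlib can take the whole table at once and work out the quartiles itself. The following is only a sketch under assumptions: `df_new` here is a random stand-in with 66 numeric columns and no missing values, and the figure size and font sizes are arbitrary starting points.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_new: 66 numeric columns.
rng = np.random.default_rng(2)
df_new = pd.DataFrame(rng.normal(size=(200, 66)),
                      columns=[f"col{i:02d}" for i in range(66)])

fig, ax = plt.subplots(figsize=(20, 6))
# Axes.boxplot computes the statistics itself; each column of the 2-D array becomes one box.
result = ax.boxplot(df_new.to_numpy())

# Label the boxes with the column names, rotated so that 66 labels stay legible.
ax.set_xticks(range(1, len(df_new.columns) + 1))
ax.set_xticklabels(df_new.columns, rotation=90, fontsize=8)
ax.tick_params(axis="y", labelsize=10)
fig.tight_layout()
```

If some columns contain NaN values, passing a list such as [df_new[c].dropna() for c in df_new.columns] instead of the 2-D array is probably the safer route.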

How to use different axis scales in pandas' DataFrame.plot.hist?

I find DataFrame.plot.hist to be amazingly convenient, but I cannot find a solution in this case.
I want to plot the distribution of many columns in the dataset. The problem is that pandas retains the same scale on all x axes, rendering most of the plots useless. Here is the code I'm using:
X.plot.hist(subplots=True, layout=(13, 6), figsize=(20, 45), bins=50, sharey=False, sharex=False)
plt.show()
And here's a section of the result:
It appears that the issue is that pandas uses the same bins on all the columns, irrespective of their values. Is there a convenient solution in pandas or am I forced to do it by hand?
I centered the data (zero mean and unit variance) and the result improved a little, but it's still not acceptable.
There are a couple of options, here is the code and output:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Dummy data - value ranges differ a lot between columns
X = pd.DataFrame()
for i in range(18):
    X['COL0{0}'.format(i+38)]=(2**i)*np.random.random(1000)
# Method 1 - just using the hist function to generate each plot
X.hist(layout=(3, 6), figsize=(20, 10), sharey=False, sharex=False, bins=50)
plt.title('Method 1')
plt.show()
# Method 2 - generate each plot separately
# plt.cm.spectral was renamed nipy_spectral and is gone from recent matplotlib releases
cols = plt.cm.nipy_spectral(np.arange(1,255,13))
fig, axes = plt.subplots(3,6,figsize=(20,10))
for index, column in enumerate(X.columns):
    ax = axes.flatten()[index]
    ax.hist(X[column],bins=50, label=column, fc=cols[index])
    ax.legend(loc='upper right')
    ax.set_ylim((0,1.2*ax.get_ylim()[1]))
fig.suptitle('Method 2')
fig.show()
The first plot:
The second plot:
I would definitely recommend the second method as you have much more control over the individual plots, for example you can change the axes scales, labels, grid parameters, and almost anything else.
I couldn't find anything that would allow you to modify the original plot.hist bins to accept individually calculated bins.
I hope this helps!
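For what it's worth, the per-column bins can also be made explicit. This is a rough sketch, assuming equal-width bins over each column's own range are what you want; the dummy data and the name `edges_by_column` are made up for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Dummy data: value ranges differ a lot between columns.
rng = np.random.default_rng(3)
X = pd.DataFrame({f"COL{i:03d}": (2 ** i) * rng.random(1000) for i in range(6)})

fig, axes = plt.subplots(2, 3, figsize=(12, 6))
edges_by_column = {}
for ax, column in zip(axes.flat, X.columns):
    values = X[column].to_numpy()
    # 50 equal-width bins spanning this column's own minimum and maximum
    edges = np.histogram_bin_edges(values, bins=50)
    edges_by_column[column] = edges
    ax.hist(values, bins=edges)
    ax.set_title(column)
fig.tight_layout()
```

Because the edges are computed separately, any column can be given a different rule (for example bins="auto") without touching the others.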

How to change the positions of subplot titles and axis labels in Seaborn FacetGrid?

I am trying to plot a polar plot using Seaborn's FacetGrid, similar to what is detailed in seaborn's gallery.
I am using the following code:
sns.set(context='notebook', style='darkgrid', palette='deep', font='sans-serif', font_scale=1.25)
# Set up a grid of axes with a polar projection
g = sns.FacetGrid(df_total, col="Construct", hue="Run", col_wrap=5, subplot_kws=dict(projection='polar'), size=5, sharex=False, sharey=False, despine=False)
# Draw a scatterplot onto each axes in the grid
g.map(plt.plot, 'Rad', 'y axis label', marker=".", ms=3, ls='None').set_titles("{col_name}")
plt.savefig('./image.pdf')
Which with my data gives the following:
I want to keep this organisation of 5 plots per line.
The problem is that the title of each subplot overlap with the values of the ticks, same for the y axis label.
Is there a way to prevent this behaviour? Can I somehow shift the titles slightly above their current position and can I shift the y axis labels slightly on the left of their current position?
Many thanks in advance!
EDIT:
This is not a duplicate of this SO as the problem was that the title of one subplot overlapped with the axis label of another subplot.
Here my problem is that the title of one subplot overlaps with the ticks label of the same subplot and similarly the axis label overlaps with the ticks label of the same subplot.
I would also like to add that I do not care whether they overlap in my Jupyter notebook (as it has been created with it); however, I want the final saved image to have no overlap. Perhaps there is something I need to do to save the image in a slightly different format to avoid that, but I don't know what (I am only using plt.savefig to save it).
EDIT 2: If someone would like to reproduce the problem here is a minimal example:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
sns.set()
sns.set(context='notebook', style='darkgrid', palette='deep', font='sans-serif', font_scale=1.5)
# Generate an example radial dataset
r = np.linspace(0, 10000, num=100)
df = pd.DataFrame({'label': r, 'slow': r, 'medium-slow': 1 * r, 'medium': 2 * r, 'medium-fast': 3 * r, 'fast': 4 * r})
# Convert the dataframe to long-form or "tidy" format
df = pd.melt(df, id_vars=['label'], var_name='speed', value_name='theta')
# Set up a grid of axes with a polar projection
g = sns.FacetGrid(df, col="speed", hue="speed",
subplot_kws=dict(projection='polar'), size=4.5, col_wrap=5,
sharex=False, sharey=False, despine=False)
# Draw a scatterplot onto each axes in the grid
g.map(plt.scatter, "theta", "label")
plt.savefig('./image.png')
plt.show()
This gives the following image, in which the titles are not as bad as in my original problem (there is still some overlap) and the label on the left-hand side overlaps completely.
In order to move the title a bit higher you can set a new position,
ax.title.set_position([.5, 1.1])
In order to move the ylabel a little further left, you can add some padding
ax.yaxis.labelpad = 25
To do this for the axes of the facetgrid, you'd do:
for ax in g.axes:
    ax.title.set_position([.5, 1.1])
    ax.yaxis.labelpad = 25
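The same adjustment can be tried without seaborn. The sketch below is an assumption-laden stand-in for the facet grid: three plain polar axes with made-up data. It passes y to set_title instead of calling title.set_position, since recent matplotlib versions may reposition titles automatically unless y is given explicitly.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

theta = np.linspace(0, 2 * np.pi, 100)
r = np.linspace(0, 10000, 100)

fig, axes = plt.subplots(1, 3, figsize=(12, 4),
                         subplot_kw=dict(projection="polar"))
for ax, name in zip(axes, ["slow", "medium", "fast"]):
    ax.scatter(theta, r, s=4)
    # y is in axes coordinates: 1.0 is the top edge, so 1.1 sits above it
    ax.set_title(name, y=1.1)
    ax.set_ylabel("label")
    # extra space, in points, between the tick labels and the axis label
    ax.yaxis.labelpad = 25
```

When saving, plt.savefig(..., bbox_inches='tight') may help keep the shifted titles and labels inside the image.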
The answer provided by ImportanceOfBeingErnest in this SO question may help.

matplotlib: Creating two (stacked) subplots with SHARED X axis but SEPARATE Y axis values

I am using matplotlib 1.2.x and Python 2.6.5 on Ubuntu 10.0.4. I am trying to create a SINGLE plot that consists of a top plot and a bottom plot.
The X axis is the date of the time series. The top plot contains a candlestick plot of the data, and the bottom plot should consist of a bar type plot - with its own Y axis (also on the left - same as the top plot). These two plots should NOT OVERLAP.
Here is a snippet of what I have done so far.
datafile = r'/var/tmp/trz12.csv'
r = mlab.csv2rec(datafile, delimiter=',', names=('dt', 'op', 'hi', 'lo', 'cl', 'vol', 'oi'))
mask = (r["dt"] >= datetime.date(startdate)) & (r["dt"] <= datetime.date(enddate))
selected = r[mask]
plotdata = zip(date2num(selected['dt']), selected['op'], selected['cl'], selected['hi'], selected['lo'], selected['vol'], selected['oi'])
# Setup charting
mondays = WeekdayLocator(MONDAY) # major ticks on the mondays
alldays = DayLocator() # minor ticks on the days
weekFormatter = DateFormatter('%b %d') # Eg, Jan 12
dayFormatter = DateFormatter('%d') # Eg, 12
monthFormatter = DateFormatter('%b %y')
# every Nth month
months = MonthLocator(range(1,13), bymonthday=1, interval=1)
fig = pylab.figure()
fig.subplots_adjust(bottom=0.1)
ax = fig.add_subplot(111)
ax.xaxis.set_major_locator(months)#mondays
ax.xaxis.set_major_formatter(monthFormatter) #weekFormatter
ax.format_xdata = mdates.DateFormatter('%Y-%m-%d')
ax.format_ydata = price
ax.grid(True)
candlestick(ax, plotdata, width=0.5, colorup='g', colordown='r', alpha=0.85)
ax.xaxis_date()
ax.autoscale_view()
pylab.setp( pylab.gca().get_xticklabels(), rotation=45, horizontalalignment='right')
# Add volume data
# Note: the code below OVERWRITES the bottom part of the first plot
# it should be plotted UNDERNEATH the first plot - but somehow, that's not happening
fig.subplots_adjust(hspace=0.15)
ay = fig.add_subplot(212)
volumes = [ x[-2] for x in plotdata]
ay.bar(range(len(plotdata)), volumes, 0.05)
pylab.show()
I have managed to display the two plots using the code above, however, there are two problems with the bottom plot:
It COMPLETELY OVERWRITES the bottom part of the first (top) plot - almost as though the second plot was drawing on the same 'canvas' as the first plot - I can't see where/why that is happening.
It OVERWRITES the existing X axis with its own indices; the X axis values (dates) should be SHARED between the two plots.
What am I doing wrong in my code? Can someone spot what is causing the 2nd (bottom) plot to overwrite the first (top) plot, and how can I fix this?
Here is a screenshot of the plot created by the code above:
[[Edit]]
After modifying the code as suggested by hwlau, this is the new plot. It is better than the first in that the two plots are separate, however the following issues remain:
The X axis should be SHARED by the two plots (i.e. the X axis should be shown only for the 2nd [bottom] plot)
The Y values for the 2nd plot seem to be formatted incorrectly
I think these issues should be quite easy to resolve; however, my matplotlib fu is not great at the moment, as I have only recently started programming with matplotlib. Any help will be much appreciated.
There seem to be a couple of problems with your code:
If you were using fig.add_subplot with the full signature of subplot(nrows, ncols, plotNum), it may have been more apparent that your first plot was asking for 1 row and 1 column and the second plot was asking for 2 rows and 1 column. Hence your first plot is filling the whole figure. Rather than fig.add_subplot(111) followed by fig.add_subplot(212), use fig.add_subplot(211) followed by fig.add_subplot(212).
Sharing an axis should be done in the add_subplot command using sharex=first_axis_instance.
I have put together an example which you should be able to run:
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import matplotlib.dates as mdates
import datetime as dt
n_pts = 10
dates = [dt.datetime.now() + dt.timedelta(days=i) for i in range(n_pts)]
ax1 = plt.subplot(2, 1, 1)
ax1.plot(dates, range(10))
ax2 = plt.subplot(2, 1, 2, sharex=ax1)
ax2.bar(dates, range(10, 20))
# Now format the x axis. This *MUST* be done after all sharex commands are run.
# put no more than 10 ticks on the date axis.
ax1.xaxis.set_major_locator(mticker.MaxNLocator(10))
# format the date in our own way.
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
# rotate the labels on both date axes
for label in ax1.xaxis.get_ticklabels():
    label.set_rotation(30)
for label in ax2.xaxis.get_ticklabels():
    label.set_rotation(30)
# tweak the subplot spacing to fit the rotated labels correctly
plt.subplots_adjust(hspace=0.35, bottom=0.125)
plt.show()
Hope that helps.
You should change this line:
ax = fig.add_subplot(111)
to
ax = fig.add_subplot(211)
The original command means that there is one row and one column, so it occupies the whole figure. Your second graph, fig.add_subplot(212), therefore covers the lower part of the first graph.
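The overlap can be checked from the axes positions. This is just a sketch with empty axes and default subplot margins; the helper `vertical_extent` and the variable names are made up for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Layout from the question: the single cell of a 1x1 grid plus the bottom cell of a 2x1 grid.
fig_bad = plt.figure()
full = fig_bad.add_subplot(1, 1, 1)
bottom_bad = fig_bad.add_subplot(2, 1, 2)

# Suggested layout: both axes come from the same 2x1 grid.
fig_ok = plt.figure()
top = fig_ok.add_subplot(2, 1, 1)
bottom_ok = fig_ok.add_subplot(2, 1, 2, sharex=top)

def vertical_extent(ax):
    """Bottom and top edge of an axes in figure coordinates (0..1)."""
    box = ax.get_position()
    return box.y0, box.y1

# The bottom axes overlaps the other one if its top edge lies above the other's bottom edge.
overlap_bad = vertical_extent(bottom_bad)[1] > vertical_extent(full)[0]
overlap_ok = vertical_extent(bottom_ok)[1] > vertical_extent(top)[0]
```

With the 111/212 combination the bottom axes lies inside the full-figure axes, while with 211/212 the two extents do not intersect.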
Edit
If you don't want the gap between the two plots, use subplots_adjust() to change the size of the subplot margins.
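A minimal sketch of closing the gap, assuming two stacked axes with a shared x axis and placeholder data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# plt.subplots with sharex=True already hides the x tick labels of the top plot.
fig, (top, bottom) = plt.subplots(2, 1, sharex=True)
top.plot(range(10))
bottom.bar(range(10), range(10, 20))

# hspace is the gap between rows, as a fraction of the average axes height;
# 0 makes the two plots touch.
fig.subplots_adjust(hspace=0)
```

A small positive value such as hspace=0.05 leaves a thin gap if touching axes look too cramped.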
The example from @Pelson, simplified.
import matplotlib.pyplot as plt
import datetime as dt
#Two subplots that share one x axis
fig,ax=plt.subplots(2,sharex=True)
#plot data
n_pts = 10
dates = [dt.datetime.now() + dt.timedelta(days=i) for i in range(n_pts)]
ax[0].bar(dates, range(10, 20))
ax[1].plot(dates, range(10))
#rotate and format the dates on the x axis
fig.autofmt_xdate()
The subplots sharing an x-axis are created in one line, which is convenient when you want more than two subplots:
fig, ax = plt.subplots(number_of_subplots, sharex=True)
To format the date correctly on the x axis, we can simply use fig.autofmt_xdate()
For additional information, see the shared axis demo and the date demo from the pylab examples.
This example ran on Python3, matplotlib 1.5.1
