Weird behavior of matplotlib's boxplot when using the notch shape

I am encountering some weird behavior in matplotlib's boxplot function when I use the "notch" shape. I am using code that I wrote a while ago and never had these issues with, so I am wondering what the problem is. Any ideas?
When I turn the notch shape off, it looks normal, though.
This would be the code:
import matplotlib.pyplot as plt

def boxplot_modified(data):
    fig = plt.figure(figsize=(8,6))
    ax = plt.subplot(111)
    bplot = plt.boxplot(data,
            #notch=True,        # notch shape
            vert=True,          # vertical box alignment
            sym='ko',           # black circles for outliers
            patch_artist=True,  # fill with color
            )
    # choosing custom colors to fill the boxes
    colors = 3*['lightgreen'] + 3*['lightblue'] + ['lightblue', 'lightblue', 'lightblue']
    for patch, color in zip(bplot['boxes'], colors):
        patch.set_facecolor(color)
    # modifying the whiskers: straight lines, black, wider
    for whisker in bplot['whiskers']:
        whisker.set(color='black', linewidth=1.2, linestyle='-')
    # making the caps a little bit wider
    for cap in bplot['caps']:
        cap.set(linewidth=1.2)
    # hiding axis ticks
    plt.tick_params(axis="both", which="both", bottom=False, top=False,
                    labelbottom=True, left=False, right=False, labelleft=True)
    # adding horizontal grid lines
    ax.yaxis.grid(True)
    # remove axis spines
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.spines["bottom"].set_visible(True)
    ax.spines["left"].set_visible(True)
    plt.xticks([y+1 for y in range(len(data))], 8*['x'])
    # raised title
    #plt.text(2, 1, 'Modified',
    #     horizontalalignment='center',
    #     fontsize=18)
    plt.tight_layout()
    plt.show()

boxplot_modified(df.values)
And when I make a plain plot without the customization, the problem still occurs:
def boxplot(data):
    fig = plt.figure(figsize=(8,6))
    ax = plt.subplot(111)
    bplot = plt.boxplot(data,
            notch=True,         # notch shape
            vert=True,          # vertical box alignment
            sym='ko',           # black circles for outliers
            patch_artist=True,  # fill with color
            )
    plt.show()

boxplot(df.values)

Okay, as it turns out, this is actually a correct behavior ;)
From Wikipedia:
Notched box plots apply a "notch" or narrowing of the box around the median. Notches are useful in offering a rough guide to significance of difference of medians; if the notches of two boxes do not overlap, this offers evidence of a statistically significant difference between the medians. The width of the notches is proportional to the interquartile range of the sample and inversely proportional to the square root of the size of the sample. However, there is uncertainty about the most appropriate multiplier (as this may vary depending on the similarity of the variances of the samples). One convention is to use +/-1.58*IQR/sqrt(n).
This was also discussed in an issue on GitHub; R produces a similar output as evidence that this behaviour is "correct."
Thus, if a notched box plot has this weird "flipped" appearance, it simply means that the confidence interval around the median extends beyond the box: its lower bound lies below the 1st quartile, or its upper bound lies above the 3rd quartile. Although it looks ugly, it is actually useful information about the (un)certainty of the median.
Bootstrapping (random sampling with replacement to estimate parameters of a sampling distribution, here: confidence intervals) might reduce this effect:
From the plt.boxplot documentation:
bootstrap : None (default) or integer
Specifies whether to bootstrap the confidence intervals
around the median for notched boxplots. If bootstrap==None,
no bootstrapping is performed, and notches are calculated
using a Gaussian-based asymptotic approximation (see McGill, R.,
Tukey, J.W., and Larsen, W.A., 1978, and Kendall and Stuart,
1967). Otherwise, bootstrap specifies the number of times to
bootstrap the median to determine it's 95% confidence intervals.
Values between 1000 and 10000 are recommended.
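To make the "flipped" notch condition concrete, here is a minimal sketch in plain Python. It assumes the +/-1.58*IQR/sqrt(n) convention quoted from Wikipedia above and linearly interpolated quartiles; matplotlib's own constant and quartile method may differ slightly. The helper names notch_bounds and notch_flips are made up for this illustration, and statistics.quantiles needs Python 3.8 or newer.

```python
import math
import statistics

def notch_bounds(data, multiplier=1.58):
    """Return (q1, lower_notch, median, upper_notch, q3) for a sample.

    The notch is median +/- multiplier * IQR / sqrt(n); the multiplier is
    the convention quoted above, not necessarily matplotlib's exact value.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    half_width = multiplier * (q3 - q1) / math.sqrt(len(data))
    return q1, median - half_width, median, median + half_width, q3

def notch_flips(data):
    """True when the notch extends beyond the box on either side."""
    q1, lower, _, upper, q3 = notch_bounds(data)
    return lower < q1 or upper > q3

small = [1.0, 2.0, 3.0, 4.0, 5.0]           # few samples: wide notch
large = [float(i) for i in range(1, 101)]   # many samples: narrow notch

print(notch_flips(small))
print(notch_flips(large))
```

With only five samples the notch is wider than the box, which is the situation that produces the odd shape; with a hundred samples the notch stays inside the quartiles.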

Related

Python plt.contour colorbar

I am trying to do a plot of a seismic wave using plt.contour.
I have 3 arrays:
time (x-axis)
frequency (y-axis)
amplitude (z-axis)
These are my results so far:
The problem is that I want to change the scaling of the colorbar: I want a gradation, without this white color where the amplitude is low. But I am not able to do so, even though I spent a lot of time browsing the docs.
I read that plt.pcolormesh is not appropriate here (it only works here because I am in a special case), but this is what I want to get in terms of the colours and colorbar:
This is the code I wrote:
T = len(time[0])*(time[0][1] - time[0][0]) # multiply ampFFT with T to offset
Z = abs(ampFFT)*(T) # abbreviation
# freq = frequency, ampFFT = Fast Fourier Transform of the amplitude of the wave
# freq, ampFFT and time have the same dimensions: 40 x 1418 (40 steps of time discretization x steps to have the total time. 2D because it is easier to use)
maxFreq = abs(freq).max() # maximum frequency for plot boundaries
maxAmpFFT = abs(Z).max()/2 # maximum ampFFT for plot boundaries of colorbar, divided by 2 to scale better with the colors
minAmpFFT = abs(Z).min()
plt.figure(1)
plt.contour(time, freq, Z, vmin=minAmpFFT, vmax=maxAmpFFT)
plt.colorbar()
plt.ylim(0,maxFreq) # 0 to remove the negative frequencies useless here
plt.title("Amplitude intensity regarding to time and frequency")
plt.xlabel('time (in secondes)')
plt.ylabel('frequency (in Hz)')
plt.show()
Thank you for your attention!
NB : In case you were wondering about plt.pcolormesh: the plot is completely messed up when I choose to increase the time discretization (here I split the time in 40, but when I split the time in 1000 the plot is not correct, and I want to be able to split the time in smaller pieces).
EDIT: When I use plt.contourf instead of plt.contour I got this plot:
Which is not really convincing either. I understand why the yellow colour takes so much space (it is because I set a low vmax), but I don't understand why there is still white colour in my plot.
EDIT 2: My teacher plotted my data, and I have the correct data. The only problem that is left is the white background in my plot (and the deep blue on the left and right borders for no apparent reason when I use plt.contourf). Despite those problems, the highest amplitude is located around 0.5 Hz, which is in agreement with the work of my teacher.
He used gnuplot, but since I don't know gnuplot, I prefer to use python.

Solution/Workaround I found
Here is what I did to display my data like contourf does, but without the display problems:
Explanation: for the surface, I took abs(freq) instead of just freq because I have negative frequencies.
This is because, when you calculate the frequencies of an FFT, the spectrum repeats itself a second time, like this:
There are 2 ways of obtaining the frequencies:
- the frequencies are all positive, and the array spans 2 x the Nyquist frequency (so if you keep only the first half of the array, you have your whole wave, and it doesn't repeat itself).
- the frequencies are negative and positive, and the array also spans 2 x the Nyquist frequency (so if you remove the negative values you have your whole wave, and it doesn't repeat itself).
Python's fft.fftfreq uses the 2nd option. plot_surface doesn't cope well with removing data from an array (for me it was still displayed), so I made the frequency values absolute and the problem disappeared.
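As a rough illustration of that frequency layout, the following plain-Python sketch reproduces the ordering that numpy.fft.fftfreq returns (zero and positive frequencies first, then the negative ones) and shows what taking the absolute value does to it. The function fft_frequencies is a hypothetical stand-in written for this example, not the NumPy function itself.

```python
def fft_frequencies(n, d=1.0):
    """Sample frequencies in the same order as numpy.fft.fftfreq(n, d):
    zero and positive frequencies first, followed by the negative ones."""
    positive = list(range(0, (n - 1) // 2 + 1))
    negative = list(range(-(n // 2), 0))
    return [k / (n * d) for k in positive + negative]

freqs = fft_frequencies(8)            # 8 samples, unit sample spacing
folded = [abs(f) for f in freqs]      # what abs(freq) does in the workaround

print(freqs)
print(folded)
```

Taking the absolute value folds the negative half onto the positive half, so every frequency value ends up in the range shown on the y-axis.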
fig = plt.figure(1, figsize=(18,15)) # figsize: increase plot size
ax = fig.add_subplot(111, projection='3d') # fig.gca(projection='3d') no longer works in recent matplotlib versions
surf = ax.plot_surface(time, abs(freq), Z, rstride=1, cstride=1, cmap=cm.magma, linewidth=0, antialiased=False, vmin=minAmpFFT, vmax=maxAmpFFT)
ax.set_zlim(0, maxAmpFFT)
ax.set_ylim(0, maxFreq)
ax.view_init(azim=90, elev=90) # change view to top view, with axis in the right direction
plt.title("Amplitude intensity (m/Hz^0.5) regarding to time and frequency")
plt.xlabel('x : time (in secondes)')
plt.ylabel('y : frequency (in Hz)')
# ax.yaxis._set_scale('log') # should be in log, but does not work
plt.gca().invert_xaxis() # invert x axis !! MUST BE AFTER X,Y,Z LIM
plt.gca().invert_yaxis() # invert y axis !! MUST BE AFTER X,Y,Z LIM
plt.colorbar(surf)
fig.tight_layout()
plt.show()
This is the plot I got:

Radius of matplotlib scatter plot [duplicate]

In the pyplot document for scatter plot:
matplotlib.pyplot.scatter(x, y, s=20, c='b', marker='o', cmap=None, norm=None,
vmin=None, vmax=None, alpha=None, linewidths=None,
faceted=True, verts=None, hold=None, **kwargs)
The marker size
s:
size in points^2. It is a scalar or an array of the same length as x and y.
What kind of unit is points^2? What does it mean? Does s=100 mean 10 pixel x 10 pixel?
Basically I'm trying to make scatter plots with different marker sizes, and I want to figure out what does the s number mean.

This can be a somewhat confusing way of defining the size but you are basically specifying the area of the marker. This means, to double the width (or height) of the marker you need to increase s by a factor of 4. [because A = WH => (2W)(2H)=4A]
There is a reason, however, that the size of markers is defined in this way. Because of the scaling of area as the square of width, doubling the width actually appears to increase the size by more than a factor 2 (in fact it increases it by a factor of 4). To see this consider the following two examples and the output they produce.
# doubling the width of markers
x = [0,2,4,6,8,10]
y = [0]*len(x)
s = [20*4**n for n in range(len(x))]
plt.scatter(x,y,s=s)
plt.show()
gives
Notice how the size increases very quickly. If instead we have
# doubling the area of markers
x = [0,2,4,6,8,10]
y = [0]*len(x)
s = [20*2**n for n in range(len(x))]
plt.scatter(x,y,s=s)
plt.show()
gives
Now the apparent size of the markers increases roughly linearly in an intuitive fashion.
As for the exact meaning of what a 'point' is, it is fairly arbitrary for plotting purposes, you can just scale all of your sizes by a constant until they look reasonable.
Edit: (In response to comment from @Emma)
It's probably confusing wording on my part. The question asked about doubling the width of a circle, so in the first picture each circle (as we move from left to right) has double the width of the previous one; for the area this is an exponential with base 4. Similarly, in the second example each circle has double the area of the last one, which gives an exponential with base 2.
However, it is in the second example (where we are scaling area) that doubling the area appears to make the circle twice as big to the eye. Thus, if we want a circle to appear a factor of n bigger, we would increase the area by a factor of n, not the radius, so the apparent size scales linearly with the area.
Edit to visualize the comment by @TomaszGandor:
This is what it looks like for different functions of the marker size:
x = [0,2,4,6,8,10,12,14,16,18]
s_exp = [20*2**n for n in range(len(x))]
s_square = [20*n**2 for n in range(len(x))]
s_linear = [20*n for n in range(len(x))]
plt.scatter(x,[1]*len(x),s=s_exp, label='$s=2^n$', lw=1)
plt.scatter(x,[0]*len(x),s=s_square, label='$s=n^2$')
plt.scatter(x,[-1]*len(x),s=s_linear, label='$s=n$')
plt.ylim(-1.5,1.5)
plt.legend(loc='center left', bbox_to_anchor=(1.1, 0.5), labelspacing=3)
plt.show()

Because other answers here claim that s denotes the area of the marker, I'm adding this answer to clarify that this is not necessarily the case.
Size in points^2
The argument s in plt.scatter denotes the markersize**2. As the documentation says
s : scalar or array_like, shape (n, ), optional
size in points^2. Default is rcParams['lines.markersize'] ** 2.
This can be taken literally. In order to obtain a marker which is x points large, you need to square that number and give it to the s argument.
So the relationship between the markersize of a line plot and the scatter size argument is the square. In order to produce a scatter marker of the same size as a plot marker of size 10 points you would hence call scatter( .., s=100).
import matplotlib.pyplot as plt
fig,ax = plt.subplots()
ax.plot([0],[0], marker="o", markersize=10)
ax.plot([0.07,0.93],[0,0], linewidth=10)
ax.scatter([1],[0], s=100)
ax.plot([0],[1], marker="o", markersize=22)
ax.plot([0.14,0.86],[1,1], linewidth=22)
ax.scatter([1],[1], s=22**2)
plt.show()
Connection to "area"
So why do other answers and even the documentation speak about "area" when it comes to the s parameter?
Of course the units of points**2 are area units.
For the special case of a square marker, marker="s", the area of the marker is indeed directly the value of the s parameter.
For a circle, the area of the circle is area = pi/4*s.
For other markers there may not even be any obvious relation to the area of the marker.
In all cases however the area of the marker is proportional to the s parameter. This is the motivation to call it "area" even though in most cases it isn't really.
Specifying the size of the scatter markers in terms of a quantity that is proportional to the marker's area makes sense insofar as it is the area of the marker that is perceived when comparing different patches, rather than its side length or diameter. I.e. doubling the underlying quantity should double the area of the marker.
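The relations stated above (side length or diameter equal to the square root of s, square area equal to s, circle area equal to pi/4*s) can be checked with a few lines of plain Python. The helper names below are invented for this sketch and are not part of matplotlib.

```python
import math

def marker_extent_points(s):
    """Side length (square marker) or diameter (circle marker) in points."""
    return math.sqrt(s)

def square_marker_area(s):
    """Area of a marker="s" square in points^2: side**2, which is s itself."""
    return marker_extent_points(s) ** 2

def circle_marker_area(s):
    """Area of a marker="o" circle in points^2: pi/4 * diameter**2."""
    return math.pi / 4 * marker_extent_points(s) ** 2

for s in (100, 400):
    print(s, marker_extent_points(s), square_marker_area(s), circle_marker_area(s))
```

Going from s=100 to s=400 doubles the extent (10 to 20 points) and quadruples the area for both shapes, which is why s is proportional to the area even when it is not equal to it.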
What are points?
So far the answer to what the size of a scatter marker means is given in units of points. Points are often used in typography, where fonts are specified in points. Line widths are also often specified in points. The standard size of points in matplotlib is 72 points per inch (ppi); 1 point is hence 1/72 inches.
It might be useful to be able to specify sizes in pixels instead of points. If the figure dpi is 72 as well, one point is one pixel. If the figure dpi is different (matplotlib default is fig.dpi=100),
1 point == fig.dpi/72. pixels
While the scatter marker's size in points would hence look different for different figure dpi, one could produce a 10 by 10 pixels^2 marker, which would always have the same number of pixels covered:
import matplotlib.pyplot as plt
for dpi in [72,100,144]:
    fig,ax = plt.subplots(figsize=(1.5,2), dpi=dpi)
    ax.set_title("fig.dpi={}".format(dpi))
    ax.set_ylim(-3,3)
    ax.set_xlim(-2,2)
    ax.scatter([0],[1], s=10**2,
               marker="s", linewidth=0, label="100 points^2")
    ax.scatter([1],[1], s=(10*72./fig.dpi)**2,
               marker="s", linewidth=0, label="100 pixels^2")
    ax.legend(loc=8,framealpha=1, fontsize=8)
    fig.savefig("fig{}.png".format(dpi), bbox_inches="tight")
plt.show()
If you are interested in a scatter in data units, check this answer.
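The point-to-pixel relation used above (1 point == fig.dpi/72 pixels) can also be written as two small helpers. This is only a sketch of the arithmetic; points_to_pixels and s_for_pixel_side are hypothetical names, not matplotlib functions.

```python
def points_to_pixels(points, dpi):
    """Convert a length in points to pixels: 1 point is dpi/72 pixels."""
    return points * dpi / 72.0

def s_for_pixel_side(side_px, dpi):
    """Value of s (in points^2) for a square marker that is side_px pixels wide."""
    side_points = side_px * 72.0 / dpi
    return side_points ** 2

for dpi in (72, 100, 144):
    print(dpi, points_to_pixels(10, dpi), s_for_pixel_side(10, dpi))
```

At 72 dpi a 10 point marker is 10 pixels wide, while at 144 dpi the same marker covers 20 pixels, so the s value needed for a fixed pixel size shrinks as the dpi grows.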

You can use markersize to specify the size of the circle in plot method
import numpy as np
import matplotlib.pyplot as plt
x1 = np.random.randn(20)
x2 = np.random.randn(20)
plt.figure(1)
# you can specify the marker size two ways directly:
plt.plot(x1, 'bo', markersize=20) # blue circle with size 20
plt.plot(x2, 'ro', ms=10,) # ms is just an alias for markersize
plt.show()
From here

It is the area of the marker. I mean if you have s1 = 1000 and then s2 = 4000, the relation between the radius of each circle is: r_s2 = 2 * r_s1. See the following plot:
plt.scatter(2, 1, s=4000, c='r')
plt.scatter(2, 1, s=1000 ,c='b')
plt.scatter(2, 1, s=10, c='g')
I had the same doubt when I saw the post, so I did this example then I used a ruler on the screen to measure the radii.

I also attempted to use 'scatter' initially for this purpose. After quite a bit of wasted time - I settled on the following solution.
import matplotlib.pyplot as plt
input_list = [{'x':100,'y':200,'radius':50, 'color':(0.1,0.2,0.3)}]
output_list = []
for point in input_list:
    output_list.append(plt.Circle((point['x'], point['y']), point['radius'], color=point['color'], fill=False))
ax = plt.gca()
ax.cla()
ax.set_aspect('equal') # plt.gca(aspect='equal') is not accepted by recent matplotlib versions
ax.set_xlim((0, 1000))
ax.set_ylim((0, 1000))
for circle in output_list:
    ax.add_artist(circle)
This is based on an answer to this question

If the size of the circles corresponds to the square of the parameter in s=parameter, then take the square root of each element you append to your size array, like this: s=[1, 1.414, 1.73, 2.0, 2.24]. When these values are squared as they are drawn, the resulting relative sizes are [1, 2, 3, 4, 5], i.e. a linear progression.
Try a list comprehension: s=[numpy.sqrt(i) for i in s]

plt.quiver() plotting dots instead of vectors in some places

I'm currently analyzing some data by creating a vector plot. All the vectors have length 1 unit. Most show up fine, but certain vectors such as:
fig = plt.figure()
plt.axes(xlim=(-24, 24), ylim=(0, 150))
plt.quiver([-19.1038], [96.5851], [-19.1001+19.1038], [97.5832-96.5851],angles='xy', scale_units='xy', scale=1, headwidth=1, headlength=10, minshaft=5)
plt.show()
show up as a point. (Please note that I am not drawing my vectors individually like this; I only drew this particular one to try to debug my code.) This appears to only be occurring for nearly vertical vectors. I've also noticed that this issue is resolved if I "zoom in" on the vector (i.e. change the axis scaling). However, I cannot do that as many other vectors in my plot will be outside of the domain/range. Is there another way to fix this?
The problem is demonstrated in the below figure:

There are two components to your problem, and both have to do with how you chose to represent your data.
The default behaviour of quiver is to auto-scale your vectors to a reasonable size for a pretty result. The documentation says as much:
The default settings auto-scales the length of the arrows to a reasonable size. To change this behavior see the scale and scale_units kwargs.
And then
scale_units : [ ‘width’ | ‘height’ | ‘dots’ | ‘inches’ | ‘x’ | ‘y’ | ‘xy’ ], None, optional
[...]
If scale_units is ‘x’ then the vector will be 0.5 x-axis units. To plot vectors in the x-y plane, with u and v having the same units as x and y, use angles='xy', scale_units='xy', scale=1.
So in your case, you're telling quiver to plot the arrow in xy data units. Since your arrow is of unit length, it is drawn as a 1-length arrow. Your data limits, on the other hand, are huge: 48 units wide, 150 units tall. On this scale a length-1 arrow is just too small, and matplotlib decides to truncate the arrow and plot a dot instead.
If you zoom in, as you said yourself, the arrow appears. If we remove the parameters that turn your arrow into a toothpick, it turns out that the arrow you plot is perfectly fine if you look closely enough (note the axes):
Now, the question is why this behaviour depends on the orientation of your vectors. The reason for this behaviour is that the x and y limits are different in your plot, so a unit-length horizontal line and a unit-length vertical line contain a different number of pixels (since your data is scaled in xy units). This implies that while horizontal arrows are long enough to be represented accurately, vertical ones become so short that matplotlib decides to truncate them to dots, which shouldn't be too obvious with the default arrow format, but it is pretty bad with your custom arrows. Your use case is such that the rendering cut-off used by matplotlib happens to fall between the length of your horizontal vectors and the length of your vertical ones.
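To put rough numbers on this, the sketch below computes how many pixels one data unit covers along each axis. The axes size of 496 x 369.6 pixels is an assumption (roughly what a default 6.4 x 4.8 inch figure at 100 dpi gives); the real value depends on your figure, and pixels_per_data_unit is a helper made up for this example.

```python
def pixels_per_data_unit(axes_size_px, data_limits):
    """Number of screen pixels covered by one data unit along an axis."""
    lo, hi = data_limits
    return axes_size_px / (hi - lo)

# Assumed axes size in pixels; not taken from the question.
axes_width_px, axes_height_px = 496.0, 369.6

px_per_x = pixels_per_data_unit(axes_width_px, (-24, 24))   # xlim from the question
px_per_y = pixels_per_data_unit(axes_height_px, (0, 150))   # ylim from the question

print(px_per_x)  # pixel length of a horizontal unit vector
print(px_per_y)  # pixel length of a vertical unit vector
```

Under these assumptions a horizontal unit vector is about four times as long on screen as a vertical one, which is consistent with only the nearly vertical arrows collapsing to dots.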
You have two straightforward choices. One is to increase the scaling for your arrows to the point where every orientation is represented accurately. This would probably be the solution to Y in a small XY problem here. What you should really do, is represent your data accurately. Since you're plotting your vector field in xy data units, you presumably want your x and y axes to have equal sizes, and you want your arrows to have visually unit length (i.e. a length that's independent from their orientation).
So I suggest that you force your plot to have equal units on both axes, at the cost of ending up with a rectangular figure:
fig = plt.figure()
ax = fig.add_subplot(111)
ax.axis('scaled') # <-- key addition
ax.axis([-24, 24, 0, 150])
ax.quiver([-19.1038], [96.5851], [-19.1001+19.1038], [97.5832-96.5851],
angles='xy', scale_units='xy', scale=1, headwidth=1,
headlength=10, minshaft=5)
plt.show()
Trust me: there's a tiny arrow in there. The main point is that this way either all of your vectors will be dots (if you're zoomed out too much), or none of them will. Then you have a sane situation, and can choose the overall scaling of your vectors accordingly.

Misaligned bins in matplotlib stackplot

I am trying to make a stack plot where the bins don't seem to be aligning correctly with the data. What I have plotted is the proportion of something in a sphere as you go radially outward from the center. The error became visible to me in the rightmost section of this plot. The lighter blue should be a vertical column of one width. Instead the dark blue seems to slant into the lighter blue section.
What I believe is the problem is that the data are not evenly spaced. For example: at a radius of 300 I might have a certain proportion value. Then at a radius of 330 I might have another, then the next at 400.
I had thought that stackplot would be able to take care of this but it appears not. Is there a way for me to straighten up these columns of data?
Source Code:
def phaseProp(rad,phase):
    #phaseLabel = np.array(['coe','en','fs','olv','maj','perov','ppv','ring','wad','per','wust','st'])
    #print phaseLabel
    rad = rad/1000.
    phase = phase/100.
    print(phase[:,:])
    #print phase[:,0]
    fig, ax = plt.subplots(figsize = (15,10))
    ax.stackplot(rad[:],phase[:,0],phase[:,1],phase[:,2],phase[:,3],phase[:,4],
                 phase[:,5],phase[:,6],phase[:,7],phase[:,8],
                 phase[:,9] ,phase[:,10],phase[:,11],phase[:,12],
                 colors = ['gainsboro','gold','lightsage','darkorange','tomato','indianred',
                           'darksage','sage','palevioletred','darkgrey','dodgerblue' ,'mediumblue' ,'darkblue' ])
    plt.legend([mpatches.Patch(color='gainsboro'),
                mpatches.Patch(color='gold'),
                mpatches.Patch(color='lightsage'),
                mpatches.Patch(color='darkorange'),
                mpatches.Patch(color='tomato'),
                mpatches.Patch(color='indianred'),
                mpatches.Patch(color='darksage'),
                mpatches.Patch(color='sage'),
                mpatches.Patch(color='palevioletred'),
                mpatches.Patch(color='darkgrey'),
                mpatches.Patch(color='dodgerblue'),
                mpatches.Patch(color='mediumblue'),
                mpatches.Patch(color='darkblue')],
               ['coe','opx','ol','gt','pv','ppv','rw','wad','fp','st','h2o','iceIh','iceVII'],
               loc='upper center', bbox_to_anchor=(0.5, 1.127),fancybox=True, shadow=True, ncol=5,fontsize='20')
    plt.ylabel(r'Phase Proportion',fontsize = 34)
    plt.xlabel(r'Radius (km)',fontsize = 34)
    plt.tick_params(axis='both', which='both', labelsize=32)
    plt.xlim(rad[noc+1],rad[nr])
    plt.ylim(0,1.0)
    #ax.stackplot(rad,phase)
    #plt.gca().invert_xaxis()
    plt.show()

I've had a look at your problem, and I think it lies in the fact that the last two points for the H2O line are (7100,0) and (7150,1); the line therefore simply slopes up between them, as you are seeing.
However it is very simple to add an additional point to give a square edge:
rad_amended = np.hstack((rad,rad[-1])) #extend the array by 1
rad_amended[-2] = rad[-2] +1 #alter the penultimate value
phase_amended = np.vstack((phase,phase[-1])) #extend the Y values
nr+=1 #extend the range of the x-axis
phaseProp(rad_amended,phase_amended)
This principle could be extended for the full dataset and give square edges to every Area, but I assume you are happy with the rest of the graph?
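As a sketch of how that principle might be extended to the full dataset, the helper below duplicates every point so that each value is reached immediately after the previous position instead of sloping towards it. It works on plain lists for illustration; square_edges is a hypothetical name, and unlike the snippet above it reuses the previous x value exactly rather than adding 1 to it.

```python
def square_edges(x, rows):
    """Duplicate points so values jump vertically instead of sloping.

    x    : increasing positions (may be unevenly spaced)
    rows : one row of stacked values per position
    Returns (x_step, rows_step); rows[i] is reached right at x[i-1]
    and held until x[i].
    """
    x_step, rows_step = [x[0]], [rows[0]]
    for i in range(1, len(x)):
        # jump to the new values at the previous position ...
        x_step.append(x[i - 1])
        rows_step.append(rows[i])
        # ... and hold them until the current position
        x_step.append(x[i])
        rows_step.append(rows[i])
    return x_step, rows_step

x = [300, 330, 400]                            # unevenly spaced radii
rows = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]    # proportions of two phases
x_step, rows_step = square_edges(x, rows)

print(x_step)
print(rows_step)
```

The returned lists could then be converted to arrays and passed to the plotting function in place of the original rad and phase.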

Changing aspect ratio of subplots in matplotlib

I have created a series of simple greyscale images which I have plotted in a grid (unfortunately, can't upload an image because I don't have a high enough reputation :( ).
The code is essentially:
import numpy as np
import matplotlib.pyplot as plt

# Define matplotlib PyPlot object
nrow = 8
ncol = 12
fig, axes = plt.subplots(nrow, ncol, subplot_kw={'xticks': [], 'yticks': []})
fig.subplots_adjust(hspace=0.05, wspace=0.05)
# Sample the fine scale model at random well locations
for ax in axes.flat:
    # placeholder: the real code creates a 2D grey scale array here
    plot_data = np.random.random((20, 10))
    # ... create sub-plot
    img = ax.imshow(plot_data, interpolation='none')
    img.set_cmap('gray')
# Display the plot
plt.show()
I want to change the aspect ratio so that the plots are squashed vertically and stretched horizontally. I have tried using ax.set_aspect and passing 'aspect' as a subplot_kw argument but to no avail. I also switched 'autoscale' off but I can then only see a handful of pixels. All suggestions welcome!
Thanks in advance!!
@JoeKington - thank you! That was a great reply!! Still trying to get my head around it all. Thanks also to the other posters for their suggestions. So, the original plot looked like this: http://imgur.com/Wi6v4cs
When I set aspect='auto', the plot looks like this: http://imgur.com/eRBO6MZ
which is a big improvement. All I need to do now is adjust the subplot size so that sub-plots are plotted in a portrait aspect ratio of eg 2:1, but with the plot filling the entire sub-plot. I guess 'colspan' would do this?

The Short Answer
You're probably wanting to call:
ax.imshow(..., aspect='auto')
imshow will set the aspect ratio of the axes to 1 when it is called, by default. This will override any aspect you specify when you create the axes.
However, this is a common source of confusion in matplotlib. Let me back up and explain what's going on in detail.
Matplotlib's Layout Model
aspect in matplotlib refers to the ratio of the xscale and yscale in data coordinates. It doesn't directly control the ratio of the width and height of the axes.
There are three things that control the size and shape of the "outside box" of a matplotlib axes:
- The size/shape of the Figure (shown in red in figures below)
- The specified extent of the Axes in figure coordinates (e.g. the subplot location, shown in green in figures below)
- The mechanism that the Axes uses to accommodate a fixed aspect ratio (the adjustable parameter).
Axes are always placed in figure coordinates; in other words, their shape/size is always a ratio of the figure's shape/size. (Note: Some things such as axes_grid will change this at draw time to get around this limitation.)
However, the extent the axes is given (either from its subplot location or explicitly set extent) isn't necessarily the size it will take up. Depending on the aspect and adjustable parameters, the Axes will shrink inside of its given extent.
To understand how everything interacts, let's plot a circle in lots of different cases.
No Fixed Aspect
In the basic case (no fixed aspect ratio set for the axes), the axes will fill up the entire space allocated to it in figure coordinates (shown by the green box).
The x and y scales (as set by aspect) will be free to change independently, distorting the circle:
When we resize the figure (interactively or at figure creation), the axes will "squish" with it:
Fixed Aspect Ratio, adjustable='box'
However, if the aspect ratio of the plot is set (imshow will force the aspect ratio to 1, by default), the Axes will adjust the size of the outside of the axes to keep the x and y data ratios at the specified aspect.
A key point to understand here, though, is that the aspect of the plot is the aspect of the x and y data scales. It's not the aspect of the width and height of the plot. Therefore, if the aspect is 1, the circle will always be a circle.
As an example, let's say we had done something like:
fig, ax = plt.subplots()
# Plot circle, etc, then:
ax.set(xlim=[0, 10], ylim=[0, 20], aspect=1)
By default, adjustable will be "box". Let's see what happens:
The maximum space the Axes can take up is shown by the green box. However, it has to maintain the same x and y scales. There are two ways this could be accomplished: Change the x and y limits or change the shape/size of the Axes bounding box. Because the adjustable parameter of the Axes is set to the default "box", the Axes shrinks inside of its maximum space.
And as we resize the figure, it will keep shrinking, but maintain the x and y scales by making the Axes use up less of the maximum space allocated to the axes (green box):
Two quick side-notes:
If you're using shared axes, and want to have adjustable="box", use adjustable="box-forced" instead.
If you'd like to control where the axes is positioned inside of the "green box" set the anchor of the axes. E.g. ax.set_anchor('NE') to have it remain "pinned" to the upper right corner of the "green box" as it adjusts its size to maintain the aspect ratio.
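The shrinking behaviour described above for adjustable="box" can be sketched numerically. The function below is a simplified model written for this explanation, not matplotlib's actual implementation: given the space available to the Axes (the "green box", in pixels here) and the data limits, it returns the largest box that keeps the requested aspect.

```python
def shrink_box_to_aspect(avail_w, avail_h, xlim, ylim, aspect=1.0):
    """Largest (width, height) inside the available space that keeps
    `aspect` screen units per y data unit for each x data unit."""
    x_range = xlim[1] - xlim[0]
    y_range = ylim[1] - ylim[0]
    box_ratio = aspect * y_range / x_range   # required height / width
    if avail_h / avail_w > box_ratio:
        # the available space is too tall: use the full width
        return avail_w, avail_w * box_ratio
    # the available space is too wide: use the full height
    return avail_h / box_ratio, avail_h

# xlim=[0, 10], ylim=[0, 20], aspect=1, as in the example above
print(shrink_box_to_aspect(400, 300, (0, 10), (0, 20)))
print(shrink_box_to_aspect(800, 300, (0, 10), (0, 20)))   # wider figure
print(shrink_box_to_aspect(300, 900, (0, 10), (0, 20)))   # taller figure
```

Making the available space wider does not change the result at all in this model: the box stays twice as tall as it is wide and simply uses less of the space it was given.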
Fixed Aspect, adjustable="datalim"
The other main option for adjustable is "datalim".
In this case, matplotlib will keep the x and y scales in data space by changing one of the axes limits. The Axes will fill up the entire space allocated to it. However, if you manually set the x or y limits, they may be overridden to allow the axes to both fill up the full space allocated to it and keep the x/y scale ratio to the specified aspect.
In this case, the x limits were set to 0-10 and the y-limits to 0-20, with aspect=1, adjustable='datalim'. Note that the y-limit was not honored:
And as we resize the figure, the aspect ratio stays the same, but the data limits change (in this case, the x-limit is not honored).
On a side note, the code to generate all of the above figures is at: https://gist.github.com/joferkington/4fe0d9164b5e4fe1e247
What does this have to do with imshow?
When imshow is called, it calls ax.set_aspect(1.0), by default. Because adjustable="box" by default, any plot with imshow will behave like the 3rd/4th images above.
For example:
However, if we specify imshow(..., aspect='auto'), the aspect ratio of the plot won't be overridden, and the image will "squish" to take up the full space allocated to the Axes:
On the other hand, if you wanted the pixels to remain "square" (note: they may not be square depending on what's specified by the extent kwarg), you can leave out the aspect='auto' and set the adjustable parameter of the axes to "datalim" instead.
E.g.
ax.imshow(data, cmap='gist_earth', interpolation='none')
ax.set(adjustable="datalim")
Axes Shape is Controlled by Figure Shape
The final part to remember is that the axes shape/size is defined as a percentage of the figure's shape/size.
Therefore, if you want to preserve the aspect ratio of the axes and have a fixed spacing between adjacent subplots, you'll need to define the shape of the figure to match. plt.figaspect is extremely handy for this. It simply generates a tuple of width, height based on a specified aspect ratio or a 2D array (it will take the aspect ratio from the array's shape, not contents).
For your example of a grid of subplots, each with a constant 2x1 aspect ratio, you might consider something like the following (note that I'm not using aspect="auto" here, as we want the pixels in the images to remain square):
import numpy as np
import matplotlib.pyplot as plt
nrows, ncols = 8, 12
dx, dy = 1, 2
figsize = plt.figaspect(float(dy * nrows) / float(dx * ncols))
fig, axes = plt.subplots(nrows, ncols, figsize=figsize)
for ax in axes.flat:
    data = np.random.random((10*dy, 10*dx))
    ax.imshow(data, interpolation='none', cmap='gray')
    ax.set(xticks=[], yticks=[])
pad = 0.05 # Padding around the edge of the figure
xpad, ypad = dx * pad, dy * pad
fig.subplots_adjust(left=xpad, right=1-xpad, top=1-ypad, bottom=ypad)
plt.show()
