Misaligned bins in matplotlib stackplot - python

I am trying to make a stack plot where the bins don't seem to be aligning correctly with the data. What I have plotted is the proportion of something in a sphere as you go radially outward from the center. The error became visible to me in the rightmost section of this plot. The lighter blue should be a vertical column of one width. Instead the dark blue seems to slant into the lighter blue section.
What I believe is the problem is that the data are not evenly spaced. For example: at a radius of 300 I might have a certain proportion value. Then at a radius of 330 I might have another, then the next at 400.
I had thought that stackplot would be able to take care of this but it appears not. Is there a way for me to straighten up these columns of data?
Source Code:
def phaseProp(rad,phase):
    #phaseLabel = np.array(['coe','en','fs','olv','maj','perov','ppv','ring','wad','per','wust','st'])
    #print phaseLabel
    rad = rad/1000.
    phase = phase/100.
    print(phase[:,:])
    #print phase[:,0]
    fig, ax = plt.subplots(figsize = (15,10))
    ax.stackplot(rad[:],phase[:,0],phase[:,1],phase[:,2],phase[:,3],phase[:,4],
                 phase[:,5],phase[:,6],phase[:,7],phase[:,8],
                 phase[:,9] ,phase[:,10],phase[:,11],phase[:,12],
                 colors = ['gainsboro','gold','lightsage','darkorange','tomato','indianred',
                           'darksage','sage','palevioletred','darkgrey','dodgerblue' ,'mediumblue' ,'darkblue' ])
    plt.legend([mpatches.Patch(color='gainsboro'),
                mpatches.Patch(color='gold'),
                mpatches.Patch(color='lightsage'),
                mpatches.Patch(color='darkorange'),
                mpatches.Patch(color='tomato'),
                mpatches.Patch(color='indianred'),
                mpatches.Patch(color='darksage'),
                mpatches.Patch(color='sage'),
                mpatches.Patch(color='palevioletred'),
                mpatches.Patch(color='darkgrey'),
                mpatches.Patch(color='dodgerblue'),
                mpatches.Patch(color='mediumblue'),
                mpatches.Patch(color='darkblue')],
               ['coe','opx','ol','gt','pv','ppv','rw','wad','fp','st','h2o','iceIh','iceVII'],
               loc='upper center', bbox_to_anchor=(0.5, 1.127),fancybox=True, shadow=True, ncol=5,fontsize='20')
    plt.ylabel(r'Phase Proportion',fontsize = 34)
    plt.xlabel(r'Radius (km)',fontsize = 34)
    plt.tick_params(axis='both', which='both', labelsize=32)
    plt.xlim(rad[noc+1],rad[nr])
    plt.ylim(0,1.0)
    #ax.stackplot(rad,phase)
    #plt.gca().invert_xaxis()
    plt.show()

I've had a look at your problem, and I think it comes from the fact that the last two points for the H2O line are (7100, 0) and (7150, 1). The plot connects them with a straight line, so the boundary simply slopes up as you are seeing.
However it is very simple to add an additional point to give a square edge:
rad_amended = np.hstack((rad,rad[-1])) #extend the array by 1
rad_amended[-2] = rad[-2] +1 #alter the penultimate value
phase_amended = np.vstack((phase,phase[-1])) #extend the Y values
nr+=1 #extend the range of the x-axis
phaseProp(rad_amended,phase_amended)
This principle could be extended for the full dataset and give square edges to every Area, but I assume you are happy with the rest of the graph?
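The same idea can be applied to every sample at once. The following is only a minimal sketch, not part of the original answer: to_steps is a hypothetical helper that duplicates each radius so that every value is held constant up to the next radius, which turns the sloped boundaries into vertical edges. It assumes rad is a 1-D array and phase has one row per radius; the numbers are made up.

```python
import numpy as np

def to_steps(x, y):
    """Duplicate points so each y[i] is held constant from x[i] to x[i+1].

    Hypothetical helper: x is 1-D, y has one row (or value) per x entry.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xs = np.repeat(x, 2)[1:]            # x0, x1, x1, x2, x2, ...
    ys = np.repeat(y, 2, axis=0)[:-1]   # y0, y0, y1, y1, y2, ...
    return xs, ys

# Made-up, unevenly spaced radii with two phases per radius
rad = np.array([300.0, 330.0, 400.0])
phase = np.array([[1.0, 0.0],
                  [0.4, 0.6],
                  [0.0, 1.0]])

rad_steps, phase_steps = to_steps(rad, phase)
print(rad_steps)    # radii with the interior points doubled
print(phase_steps)  # matching rows, one per stepped radius
```

The stepped arrays could then be passed on as ax.stackplot(rad_steps, phase_steps.T), since stackplot expects one row per stacked series.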

Related

Get automatically coordinates of subplots in order to set them for automatic positioning of legend

I first tried to manually set the location of the main legend of a plot produced by the Getdist tool.
The plot below represents the 1/2 sigma confidence levels coming from a covariance matrix with joint distributions. It is produced by Getdist tool.
The main routine that generates this plot is :
# g.settings
g = plots.get_subplot_plotter()
g.settings.figure_legend_frame = True
g.settings.legend_fontsize = 21
g.triangle_plot([matrix1, matrix2],
names,
filled = True,
contour_colors = ['darkblue','red'],
line_args = [{'lw':2, 'color':'darkblue'},
{'lw':2, 'color':'red'}]
)
g.add_legend(['Opt. Flat. No Gamma. - cross - standard situation - Criterion taking into accound a = 200',\
'Pess. Flat. No Gamma. - cross - standard situation - Criterion taking into account a = 300' ],\
bbox_to_anchor = [1.5, 8.5])
The value 1.5 seems to correspond to the x-coordinate (width), and 8.5 to the y-coordinate (height) of the legend.
Now, I would like to automate this process instead of setting the legend position manually each time.
I want the top right of the legend to be positioned at the top border of the first left upper box (just at the level of top line border below the "1sigma ± 0.0012" title).
I would also like the legend to be pushed to the right of the figure (up to the right border of the right lower box of the figure, identified by the sigma8 "1sigma ± 0.001" title; caution: I want it located before the 1.0 and 0.0 xticks, just at the x-coordinate of the right line border).
Here is what I tried in order to get the global coordinates (relative to the entire plot) of the top border of this left upper box:
# First, get y coordinates of top border for first Likelihood
box1 = g.subplots[0,0]
box1_coords = box1._position.bounds
print('box1_coords = ', box1_coords)
and I get at the execution the following values :
box1_coords = (0.125, 0.7860975609756098, 0.09451219512195125, 0.09390243902439022)
As you can see, these values seem to be normalized, so I don't know how to proceed if I want to insert them into:
bbox_to_anchor = [box1_coords[0], box1_coords[1]]
This line of code produces a bad position for legend, as expected.
So, how can I manage to automatically assign the good values for bbox_to_anchor to get what I want (y-coordinate at level of top border of left upper box identified by the "1sigma ± 0.0012" title) and pushed on the right side up to the right border of right lower box (x-coordinate identified by sigma8 with "1sigma ± 0.001" title)?
Update 1
I tried to adapt the suggestions to my case, but the issue still occurs. Here is what I have done:
# g.settings
g = plots.get_subplot_plotter()
# get the max y position of the top left axis
top_left_plot = g.subplots[0,0].axes.get_position().ymax
# get the max x position of the bottom right axis
# it is -1 to reference the last plot
bottom_right_plot = g.subplots[-1,-1].axes.get_position().xmax
I don't know why the values of top_left_plot and bottom_right_plot are not the right ones.
I think that subplots[0,0] (for the top y-coordinate of the legend) refers to the top left subplot and subplots[-1,-1] to the bottom right subplot (for the right x-coordinate of the legend), but even so, it doesn't work.
For example :
# g.settings
g = plots.get_subplot_plotter()
# Call triplot
g.triangle_plot([matrix1, matrix2],
names,
filled = True,
legend_labels = [],
contour_colors = ['darkblue','red'],
line_args = [{'lw':2, 'color':'darkblue'},
{'lw':2, 'color':'red'}])
g.add_legend(['Opt. Flat. No Gamma. - cross - standard situation - Criterion taking into accound a = 200',
'Pess. Flat. No Gamma. - cross - standard situation - Criterion taking into account a = 300'],
legend_loc='upper right',
bbox_to_anchor=(bottom_right_plot, top_left_plot)
)
I get :
legend_coords y_max, x_max 0.88 0.9000000000000001
I can't understand why these values (which seem to lie between 0.0 and 1.0) are not taken into account by g.add_legend.
With @mullinscr's solution, I get the following figure:
If I force the legend coordinates to:
top_left_plot = 8.3
bottom_right_plot = 1.0
the result looks like the first figure of this post. But these 2 values do not lie between 0.0 and 1.0 as they should.
Update 2
@mullinscr, thanks, I have followed your update and still get an issue. If I apply the same code snippet directly in my script, i.e.:
g.add_legend(['An example legend - item 1'],
             legend_loc='upper right', # we want to specify the location of this point
             bbox_to_anchor=(bottom_right_plot, top_left_plot), # this is the x and y co-ords we extracted above
             bbox_transform=plt.gcf().transFigure, # interpret the anchor in figure coordinates
             borderaxespad=0, # this means there is no padding around the legend
             edgecolor='black')
Then I get the following figure :
As you can see, the coordinates are not quite what is expected: there is a slight shift in both the x-coordinate and the y-coordinate.
If I apply your code snippet for my legend text, I get:
Here is the link to my entire script, which may make it easier for you to spot the error:
My entire Python script
Here's my answer. It's the same as @scleronomic's answer, but I'll point out some of the things that tripped me up when figuring this out.
Below is my code to reproduce your desired positioning. I've tried to create the same subplot layout as yours, but through matplotlib, not getdist -- same result though.
As you discovered, the trick lies in extracting the position data of the first and last axes (top-left and lower-right) to use as a reference. The bounds attribute that you used returns the x0, y0, width and height of the axes (see the docs). However, what we want is the maximum x and y, so that our legend corner is in the top right. This can be achieved by using the xmax and ymax properties:
axes.flatten()[-1].get_position().xmax
axes.flatten()[0].get_position().ymax
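As a quick check of how bounds relates to these maxima, here is a small sketch that converts the bounds tuple printed in the question by hand. The variable names are only illustrative.

```python
# bounds as printed in the question: (x0, y0, width, height) in figure fractions
box1_coords = (0.125, 0.7860975609756098,
               0.09451219512195125, 0.09390243902439022)
x0, y0, width, height = box1_coords

xmax = x0 + width    # right edge of the axes
ymax = y0 + height   # top edge of the axes
print(round(xmax, 4), round(ymax, 4))
```

The top edge comes out at about 0.88, which agrees with the y_max value reported in the question.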
Once we have these variables, they can be passed to the bbox_to_anchor parameter of the add_legend() function, as you did. If we also use loc='upper right', it tells matplotlib that we want the upper right of the legend to be pinned to this top right corner. Finally, we need to set borderaxespad=0, otherwise the legend won't sit exactly where we want it because of the default padding.
Please see my example code below, as well as the resulting picture. Note that I left the top-right plot in so you can see that it lines up correctly.
Also, note that as @scleronomic says, calls to plt.tight_layout() etc. will mess this positioning up.
import matplotlib.pyplot as plt
# code to layout subplots as in your example:
# --------------------------------------------
g, axes = plt.subplots(nrows=7, ncols=7,figsize=(10,10))
unwanted = [1,2,3,4,5,9,10,11,12,13,17,
18,19,20,25,26,27,33,34,41]
for ax in axes.flatten():
    ax.plot([1,2], [1,2])
    ax.set_yticks([])
    ax.set_xticks([])
for n, ax in enumerate(axes.flatten()):
    if n in unwanted:
        ax.remove()
# Code to answer your question:
# ------------------------------
# get the max y position of the top left axis
top_left_plot = axes.flatten()[0].get_position().ymax
# get the max x position of the bottom right axis
# it is -1 to reference the last plot
bottom_right_plot = axes.flatten()[-1].get_position().xmax
# I'm using plain matplotlib, so it is g.legend() not g.add_legend()
# but g.add_legend() should work the same as it is a wrapper of the matplotlib func
g.legend(['Opt. Flat. No Gamma. - cross - standard situation - Criterion taking into account a = 200',
'Pess. Flat. No Gamma. - cross - standard situation - Criterion taking into account a = 300'],
loc='upper right', # we want to specify the location of this point
bbox_to_anchor=(bottom_right_plot, top_left_plot), # this is the x and y co-ords we extracted above
borderaxespad=0, # this means there is no padding around the legend
edgecolor='black') # I set it black for this example
plt.show()
Update
After @youpilat13's comments, I investigated some more and installed getdist to try to recreate the plot with that tool. Initially I got the same results, but found that the trick is that, unlike when making this in plain matplotlib, you have to transform the legend's coordinates to figure coordinates. This can be achieved by passing the following in the g.add_legend() call:
bbox_transform=plt.gcf().transFigure
Here is a complete example:
import matplotlib.pyplot as plt
import getdist
from getdist import plots, MCSamples
from getdist.gaussian_mixtures import GaussianND
covariance = [[0.001**2, 0.0006*0.05, 0], [0.0006*0.05, 0.05**2, 0.2**2], [0, 0.2**2, 2**2]]
mean = [0.02, 1, -2]
gauss=GaussianND(mean, covariance)
g = plots.get_subplot_plotter(subplot_size=3)
g.triangle_plot(gauss,filled=True)
top_left_plot = g.subplots.flatten()[0].get_position().ymax
bottom_right_plot = g.subplots.flatten()[-1].get_position().xmax
g.add_legend(['An example legend - item 1'],
             legend_loc='upper right', # we want to specify the location of this point
             bbox_to_anchor=(bottom_right_plot, top_left_plot), # this is the x and y co-ords we extracted above
             bbox_transform=plt.gcf().transFigure, # interpret the anchor in figure coordinates
borderaxespad=0, # this means there is no padding around the legend
edgecolor='black')
And the resulting image:
It basically works as you described. The bboxes (xmin, ymin, width, height) of the axes are given in fractions of the figure, and fig.legend() uses the same format, so the two are compatible. By setting the upper right corner of the legend to the corner defined by the outermost axes, you get the clean layout and don't have to worry about the exact size of the legend.
import numpy as np
import matplotlib.pyplot as plt
n = 4
# Create the subplot grid
# Alternative: fig, ax = plt.subplots(n, n); ax[i, j].remove() for j > i
fig = plt.figure()
gs = fig.add_gridspec(nrows=n, ncols=n)
ax = np.zeros((n, n), dtype=object)
for i in range(n):
    for j in range(n):
        if j <= i:
            ax[i, j] = fig.add_subplot(gs[i, j])
# add this to make the position of the legend easier to spot
ax[0, -1] = fig.add_subplot(gs[0, -1])
# Plot some dummy data
ax[0, 0].plot(range(10), 'b-o', label='Dummy Label 4x4')
# Set the legend
y_max = ax[0][0].get_position().ymax
x_max = ax[-1][-1].get_position().xmax
fig.legend(loc='upper right', bbox_to_anchor=(x_max, y_max),
borderaxespad=0)
plt.show()
Some pitfalls could be using the Constrained Layout or using bbox_inches='tight' when saving the file, as both screw up the position of the legend in unexpected ways.
For some more examples of legend placement, I found this collection very helpful.

Adding a circle on specific date in a matplotlib plot

I have a matplotlib plot made using this code:
ax.plot(df_c.index, y1, color='b')
Here df_c.index is:
DatetimeIndex(['2019-10-31', '2019-11-01', '2019-11-02', '2019-11-03',
'2019-11-04', '2019-11-05', '2019-11-06', '2019-11-07',
'2019-11-08', '2019-11-09',
...
'2020-04-04', '2020-04-05', '2020-04-06', '2020-04-07',
'2020-04-08', '2020-04-09', '2020-04-10', '2020-04-11',
'2020-04-12', '2020-04-13'],
dtype='datetime64[ns]', length=166, freq=None)
The above code makes a lineplot.
I want to add a circle on this date '2020-04-12' with a value of 100. How do I do that? I tried:
ax.plot(datetime.date(2020, 04, 12), 100, 'bo')
but it does not work. How can I fix it?
I'm not entirely certain where you want to draw your circle, but I present here three different circle positions, and a simpler fourth alternative just for highlighting a specific date. A demo image is shown at the bottom. First, let's just plot some data:
import matplotlib.pyplot as plt
dates = ['2019-10-31', '2019-11-01', '2019-11-02', '2019-11-03']
y = [i*i for i in range(len(dates))] # some random y values
# An arbitrary date value to encircle
encircled_date = dates[1]
val_of_encircled_date = y[1]
# Plot the graph
fig, ax = plt.subplots()
ax.plot_date(dates,y,'-')
bottom, top = plt.ylim() # Later, we'll need the min value of the y-axis for correct positioning of circle
Now, if you just want a circle on the graph where it passes through your specific date, the simplest and (imho) best approach is to simply replot that specific value with a circle marker ('o'). Adjust marker size, line width and color to your preferences:
# Plot a circle around specific graph value
ax.plot_date(encircled_date, val_of_encircled_date,
'og', # marker style 'o', color 'g'
fillstyle='none', # circle is not filled (with color)
ms=10.0) # size of marker/circle
Then, if you instead wanted a circle around the date-tick, on the x-axis for your specific date, it is a little more tricky depending on what details you need to encircle. Of course, you could use the approach above to get a small circle around the tick only, but I'll show a more advanced approach based on another SO-question:
# Plot a circle around the 'tick' of specific date on the x-axis
circle1 = plt.Circle((encircled_date, bottom), # position
1.0 / len(dates), # radius
color='r',
clip_on=False, # allow drawing outside of axes
fill=False)
ax.add_artist(circle1)
The above solution only encircles the tick, and not the date label itself. We may fine-tune the circle to fit the date label inside by adjusting two offset parameters:
# Plot a circle around the specific date's label on the x-axis
pos_offset = 0.5
len_offset = 0.4
circle2 = plt.Circle((encircled_date, bottom-pos_offset), # position
(1.0+len_offset) / len(dates), # radius
color='purple',
clip_on=False, # allow drawing outside of axis
fill=False)
ax.add_artist(circle2)
However, this tuning may be a tedious task. If your objective is only to emphasize this particular date, it may be better to simply restyle its tick label. You may, for instance, change the color and weight of the label like this:
ax.get_xticklabels()[2].set_color("red")
ax.get_xticklabels()[2].set_weight("bold")
The four different approaches are shown in the image below. I hope this helps.
One final remark: When you get a densely populated x-axis of dates, it might be worthwhile looking into more advanced formatting of date-labels which you can read all about in the official documentation. For instance, they show how to neatly rotate the labels to fit them closer together without overlapping.
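Coming back to the attempt in the question: one likely culprit is the month written as 04. In Python 3, an integer literal with a leading zero is a syntax error, so it has to be written as 4. Below is a minimal sketch with made-up data; the dates, values and variable names are assumptions standing in for the asker's df_c, and the Agg backend is selected only so the sketch also runs without a display.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Made-up stand-in for df_c.index and y1
start = datetime.date(2020, 4, 9)
dates = [start + datetime.timedelta(days=i) for i in range(5)]
y1 = [80, 90, 95, 100, 97]

fig, ax = plt.subplots()
ax.plot(dates, y1, color="b")

# Month written as 4, not 04; draw an unfilled circle marker on that date
target = datetime.date(2020, 4, 12)
(circle,) = ax.plot([target], [100], marker="o", markersize=12,
                    fillstyle="none", color="b", linestyle="none")
fig.canvas.draw()
```

With an interactive backend, plt.show() would display the line with the circle drawn around the value on 2020-04-12.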

Radius of matplotlib scatter plot [duplicate]

In the pyplot document for scatter plot:
matplotlib.pyplot.scatter(x, y, s=20, c='b', marker='o', cmap=None, norm=None,
vmin=None, vmax=None, alpha=None, linewidths=None,
faceted=True, verts=None, hold=None, **kwargs)
The marker size
s:
size in points^2. It is a scalar or an array of the same length as x and y.
What kind of unit is points^2? What does it mean? Does s=100 mean 10 pixels x 10 pixels?
Basically, I'm trying to make scatter plots with different marker sizes, and I want to figure out what the s number means.
This can be a somewhat confusing way of defining the size but you are basically specifying the area of the marker. This means, to double the width (or height) of the marker you need to increase s by a factor of 4. [because A = WH => (2W)(2H)=4A]
There is a reason, however, that the size of markers is defined in this way. Because of the scaling of area as the square of width, doubling the width actually appears to increase the size by more than a factor 2 (in fact it increases it by a factor of 4). To see this consider the following two examples and the output they produce.
# doubling the width of markers
x = [0,2,4,6,8,10]
y = [0]*len(x)
s = [20*4**n for n in range(len(x))]
plt.scatter(x,y,s=s)
plt.show()
gives
Notice how the size increases very quickly. If instead we have
# doubling the area of markers
x = [0,2,4,6,8,10]
y = [0]*len(x)
s = [20*2**n for n in range(len(x))]
plt.scatter(x,y,s=s)
plt.show()
gives
Now the apparent size of the markers increases roughly linearly in an intuitive fashion.
As for the exact meaning of what a 'point' is, it is fairly arbitrary for plotting purposes, you can just scale all of your sizes by a constant until they look reasonable.
Edit: (In response to comment from #Emma)
It's probably confusing wording on my part. The question asked about doubling the width of a circle, so in the first picture each circle (as we move from left to right) has double the width of the previous one, which for the area is an exponential with base 4. Similarly, in the second example each circle has double the area of the last one, which gives an exponential with base 2.
However it is the second example (where we are scaling area) that doubling area appears to make the circle twice as big to the eye. Thus if we want a circle to appear a factor of n bigger we would increase the area by a factor n not the radius so the apparent size scales linearly with the area.
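Put as a rule of thumb: to make a marker look n times bigger, multiply s by n; to make it n times wider, multiply s by n squared. A tiny sketch of that arithmetic, with hypothetical helper names:

```python
def scale_area(s, n):
    """Value of s for a marker with n times the area."""
    return s * n

def scale_width(s, n):
    """Value of s for a marker that is n times as wide."""
    return s * n ** 2

base = 20
print(scale_area(base, 2))   # s for double the area
print(scale_width(base, 2))  # s for double the width
```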
Edit to visualize the comment by #TomaszGandor:
This is what it looks like for different functions of the marker size:
x = [0,2,4,6,8,10,12,14,16,18]
s_exp = [20*2**n for n in range(len(x))]
s_square = [20*n**2 for n in range(len(x))]
s_linear = [20*n for n in range(len(x))]
plt.scatter(x,[1]*len(x),s=s_exp, label='$s=2^n$', lw=1)
plt.scatter(x,[0]*len(x),s=s_square, label='$s=n^2$')
plt.scatter(x,[-1]*len(x),s=s_linear, label='$s=n$')
plt.ylim(-1.5,1.5)
plt.legend(loc='center left', bbox_to_anchor=(1.1, 0.5), labelspacing=3)
plt.show()
Because other answers here claim that s denotes the area of the marker, I'm adding this answer to clarify that this is not necessarily the case.
Size in points^2
The argument s in plt.scatter denotes the markersize**2. As the documentation says
s : scalar or array_like, shape (n, ), optional
size in points^2. Default is rcParams['lines.markersize'] ** 2.
This can be taken literally. In order to obtain a marker which is x points large, you need to square that number and give it to the s argument.
So the relationship between the markersize of a line plot and the scatter size argument is the square. In order to produce a scatter marker of the same size as a plot marker of size 10 points you would hence call scatter( .., s=100).
import matplotlib.pyplot as plt
fig,ax = plt.subplots()
ax.plot([0],[0], marker="o", markersize=10)
ax.plot([0.07,0.93],[0,0], linewidth=10)
ax.scatter([1],[0], s=100)
ax.plot([0],[1], marker="o", markersize=22)
ax.plot([0.14,0.86],[1,1], linewidth=22)
ax.scatter([1],[1], s=22**2)
plt.show()
Connection to "area"
So why do other answers and even the documentation speak about "area" when it comes to the s parameter?
Of course the units of points**2 are area units.
For the special case of a square marker, marker="s", the area of the marker is indeed directly the value of the s parameter.
For a circle, the area of the circle is area = pi/4*s.
For other markers there may not even be any obvious relation to the area of the marker.
In all cases however the area of the marker is proportional to the s parameter. This is the motivation to call it "area" even though in most cases it isn't really.
Specifying the size of the scatter markers in terms of a quantity that is proportional to the area of the marker makes sense insofar as it is the area of the marker that is perceived when comparing different patches, rather than its side length or diameter. That is, doubling the underlying quantity should double the area of the marker.
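The relations above for a circular marker can be written down directly. This is only a sketch of the arithmetic stated in this answer (diameter equals the square root of s, area equals pi/4 times s); the function name is hypothetical and the marker's edge line width is ignored.

```python
import math

def circle_marker_geometry(s):
    """Diameter (points) and area (points^2) of an 'o' marker drawn with size s."""
    diameter = math.sqrt(s)      # s is the marker size squared
    area = math.pi / 4 * s       # area of a circle with that diameter
    return diameter, area

diameter, area = circle_marker_geometry(100)
print(diameter, area)
```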
What are points?
So far, the answer to what the size of a scatter marker means is given in units of points. Points are often used in typography, where fonts are specified in points. Line widths are also often specified in points. The standard size of points in matplotlib is 72 points per inch (ppi), so 1 point is 1/72 inches.
It might be useful to be able to specify sizes in pixels instead of points. If the figure dpi is 72 as well, one point is one pixel. If the figure dpi is different (matplotlib default is fig.dpi=100),
1 point == fig.dpi/72. pixels
While the scatter marker's size in points would hence look different for different figure dpi, one could produce a 10 by 10 pixels^2 marker, which would always have the same number of pixels covered:
import matplotlib.pyplot as plt
for dpi in [72,100,144]:
    fig,ax = plt.subplots(figsize=(1.5,2), dpi=dpi)
    ax.set_title("fig.dpi={}".format(dpi))
    ax.set_ylim(-3,3)
    ax.set_xlim(-2,2)
    ax.scatter([0],[1], s=10**2,
               marker="s", linewidth=0, label="100 points^2")
    ax.scatter([1],[1], s=(10*72./fig.dpi)**2,
               marker="s", linewidth=0, label="100 pixels^2")
    ax.legend(loc=8,framealpha=1, fontsize=8)
    fig.savefig("fig{}.png".format(dpi), bbox_inches="tight")
plt.show()
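The conversion used in the code above can be isolated into two small helpers. This is a sketch of the same arithmetic only; the function names are hypothetical.

```python
def points_to_pixels(points, dpi):
    """Convert a length in points (1/72 inch) to pixels at the given dpi."""
    return points * dpi / 72.0

def s_for_pixel_width(pixels, dpi):
    """Value of s that gives a marker `pixels` wide at the given figure dpi."""
    return (pixels * 72.0 / dpi) ** 2

for dpi in (72, 100, 144):
    print(dpi, points_to_pixels(10, dpi), s_for_pixel_width(10, dpi))
```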
If you are interested in a scatter in data units, check this answer.
You can use markersize to specify the size of the circle in plot method
import numpy as np
import matplotlib.pyplot as plt
x1 = np.random.randn(20)
x2 = np.random.randn(20)
plt.figure(1)
# you can specify the marker size two ways directly:
plt.plot(x1, 'bo', markersize=20) # blue circle with size 20
plt.plot(x2, 'ro', ms=10) # ms is just an alias for markersize
plt.show()
From here
It is the area of the marker. I mean if you have s1 = 1000 and then s2 = 4000, the relation between the radius of each circle is: r_s2 = 2 * r_s1. See the following plot:
plt.scatter(2, 1, s=4000, c='r')
plt.scatter(2, 1, s=1000 ,c='b')
plt.scatter(2, 1, s=10, c='g')
I had the same doubt when I saw the post, so I did this example then I used a ruler on the screen to measure the radii.
I also attempted to use 'scatter' initially for this purpose. After quite a bit of wasted time - I settled on the following solution.
import matplotlib.pyplot as plt
input_list = [{'x':100,'y':200,'radius':50, 'color':(0.1,0.2,0.3)}]
output_list = []
for point in input_list:
    output_list.append(plt.Circle((point['x'], point['y']), point['radius'], color=point['color'], fill=False))
ax = plt.gca()
ax.cla()
ax.set_aspect('equal')  # recent matplotlib no longer accepts keyword arguments in plt.gca()
ax.set_xlim((0, 1000))
ax.set_ylim((0, 1000))
for circle in output_list:
    ax.add_artist(circle)
This is based on an answer to this question
If the size of the circles corresponds to the square of the parameter in s=parameter, then assign a square root to each element you append to your size array, like this: s=[1, 1.414, 1.73, 2.0, 2.24] such that when it takes these values and returns them, their relative size increase will be the square root of the squared progression, which returns a linear progression.
If I were to square each one as it gets output to the plot: output=[1, 2, 3, 4, 5]. Try a list comprehension: s=[numpy.sqrt(i) for i in s]

Arrow pointing to a point on a curve

I am trying to plot arrows pointing at a point on a curve in python using matplotlib.
On this line i need to point vertical arrows at specific points.
This is for indicating forces acting on a beam, so their direction is very important. Where the curve is the beam and the arrow is the force.
I know the coordinate of said point exactly, but it of course changes with the input.
This input should also dictate whether the arrow points upwards or downwards from the line (negative and positive forces applied).
I have tried endlessly with plt.arrow, but the scale changes drastically, and so does the quadrant in which the arrow has to be, so it might have to start at y < 0 and end at a point where y > 0.
The problem is that the arrowhead then points the wrong way, like this --<. instead of -->.
So before I go bald because of this, I would like to know if there is an easy way to draw a vertical arrow (it could be infinite in the opposite direction for all I care) pointing to a point on a curve, for which I can control whether it points upwards to the curve or downwards to the curve.
I'm not sure I completely follow you, but my approach would be to use annotate rather than arrow (just leave the text field blank). You can specify one end of the arrow in data coordinates and the other as an offset in points, but you do have to map your forces (indicating the length of the arrows) to a number of points. For example:
import matplotlib.pyplot as plt
import numpy as np
# Trial function for adding vertical arrows to
def f(x):
    return np.sin(2*x)
x = np.linspace(0,10,1000)
y = f(x)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x,y, 'k', lw=2)
ax.set_ylim(-3,3)
def add_force(F, x1):
    """Add a vertical force arrow F points long at x1 (in data coordinates)."""
    ax.annotate('', xy=(x1, f(x1)), xytext=(0, F), textcoords='offset points',
                arrowprops=dict(arrowstyle='<|-', color='r'))
add_force(60, 4.5)
add_force(-45, 6.5)
plt.show()
The inverted arrowhead is due to a negative sign of the head_length variable. Probably you are scaling it using a negative value. Using head_length= abs(value)*somethingelse should take care of your problem.
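If you prefer to stay with arrow, the following sketch follows that advice: the direction is carried only by the sign of dy, while the head size is always passed as a positive number. The helper name vertical_force_arrow and its parameters are hypothetical, the curve is the same trial function as in the other answer, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

def vertical_force_arrow(ax, x, y_tip, length, head_length=0.3, head_width=0.15):
    """Draw a vertical arrow that ends at (x, y_tip).

    length > 0 points upwards towards the tip, length < 0 points downwards.
    The head dimensions are forced to be positive so the head never flips.
    """
    return ax.arrow(x, y_tip - length, 0, length,
                    head_length=abs(head_length), head_width=abs(head_width),
                    length_includes_head=True, color="r")

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(2 * x), "k", lw=2)
ax.set_ylim(-3, 3)

up = vertical_force_arrow(ax, 4.5, np.sin(2 * 4.5), 1.5)     # pushes upwards
down = vertical_force_arrow(ax, 6.5, np.sin(2 * 6.5), -1.2)  # pushes downwards
fig.canvas.draw()
```

Because the arrow length here is in data units, it scales with the y-axis, unlike the annotate approach where the length is given in points.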

Weird behavior of matplotlibs boxplot when using the notch shape

I am encountering some weird behavior in matplotlib's boxplot function when I am using the "notch" shape. I am using some code that I have written a while ago and never had those issues -- I am wondering what the problem is. Any ideas?
When I turn the notch shape off it looks normal though
This would be the code:
def boxplot_modified(data):
    fig = plt.figure(figsize=(8,6))
    ax = plt.subplot(111)
    bplot = plt.boxplot(data,
                        #notch=True, # notch shape
                        vert=True, # vertical box alignment
                        sym='ko', # black circle for outliers
                        patch_artist=True, # fill with color
                        )
    # choosing custom colors to fill the boxes
    colors = 3*['lightgreen'] + 3*['lightblue'] + ['lightblue', 'lightblue', 'lightblue']
    for patch, color in zip(bplot['boxes'], colors):
        patch.set_facecolor(color)
    # modifying the whiskers: straight lines, black, wider
    for whisker in bplot['whiskers']:
        whisker.set(color='black', linewidth=1.2, linestyle='-')
    # making the caps a little bit wider
    for cap in bplot['caps']:
        cap.set(linewidth=1.2)
    # hiding axis ticks (booleans; the "on"/"off" strings are not accepted by recent matplotlib)
    plt.tick_params(axis="both", which="both", bottom=False, top=False,
                    labelbottom=True, left=False, right=False, labelleft=True)
    # adding horizontal grid lines
    ax.yaxis.grid(True)
    # remove axis spines
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.spines["bottom"].set_visible(True)
    ax.spines["left"].set_visible(True)
    plt.xticks([y+1 for y in range(len(data))], 8*['x'])
    # raised title
    #plt.text(2, 1, 'Modified',
    #         horizontalalignment='center',
    #         fontsize=18)
    plt.tight_layout()
    plt.show()
boxplot_modified(df.values)
and when I make a plain plot without the customization, the problem still occurs:
def boxplot(data):
    fig = plt.figure(figsize=(8,6))
    ax = plt.subplot(111)
    bplot = plt.boxplot(data,
                        notch=True, # notch shape
                        vert=True, # vertical box alignment
                        sym='ko', # black circle for outliers
                        patch_artist=True, # fill with color
                        )
    plt.show()
boxplot(df.values)
Okay, as it turns out, this is actually a correct behavior ;)
From Wikipedia:
Notched box plots apply a "notch" or narrowing of the box around the median. Notches are useful in offering a rough guide to significance of difference of medians; if the notches of two boxes do not overlap, this offers evidence of a statistically significant difference between the medians. The width of the notches is proportional to the interquartile range of the sample and inversely proportional to the square root of the size of the sample. However, there is uncertainty about the most appropriate multiplier (as this may vary depending on the similarity of the variances of the samples). One convention is to use +/-1.58*IQR/sqrt(n).
This was also discussed in an issue on GitHub; R produces a similar output as evidence that this behaviour is "correct."
Thus, if we have this weird "flipped" appearance in the notched box plots, it simply means that the lower end of the median's confidence interval lies below the 1st quartile, and likewise the upper end lies above the 3rd quartile. Although it looks ugly, it's actually useful information about the (un)certainty of the median.
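To see when a notch extends past the box, the notch limits can be computed by hand. The sketch below uses the +/-1.58*IQR/sqrt(n) convention quoted above; the function name and the sample are made up, and the constant a plotting library actually uses may differ slightly.

```python
import math
import statistics

def notch_limits(sample, factor=1.58):
    """Return (q1, notch_low, median, notch_high, q3) for one sample.

    factor=1.58 is the convention quoted above; it is an assumption here,
    not necessarily the constant used by any particular library.
    """
    q1, med, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    half_width = factor * (q3 - q1) / math.sqrt(len(sample))
    return q1, med - half_width, med, med + half_width, q3

# A small, made-up sample: few points relative to the spread
q1, low, med, high, q3 = notch_limits([1, 2, 3, 4, 10])
print(low < q1, high > q3)  # both True means the notch sticks out of the box
```

With only five points the notch is wider than the interquartile box, which is exactly the situation that produces the flipped shape; larger samples shrink the notch by the square root of n.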
A bootstrapping (random sampling with replacement to estimate parameters of a sampling distribution, here: confidence intervals) might reduce this effect:
From the plt.boxplot documentation:
bootstrap : None (default) or integer
Specifies whether to bootstrap the confidence intervals
around the median for notched boxplots. If bootstrap==None,
no bootstrapping is performed, and notches are calculated
using a Gaussian-based asymptotic approximation (see McGill, R.,
Tukey, J.W., and Larsen, W.A., 1978, and Kendall and Stuart,
1967). Otherwise, bootstrap specifies the number of times to
bootstrap the median to determine it's 95% confidence intervals.
Values between 1000 and 10000 are recommended.
