I have a reasonably complicated grid of subplots that involves two sets (one on the left and another on the right) of columns plotting a set of quantities for each row, separated by a common legend to label the entries in each row.
Here is a sample of what I want to accomplish
Using matplotlib with constrained_layout = True works 95% perfectly for applying optimal sizes & spacing to the columns, down to the tricky case of having the legend run down the middle. The remaining 5% is highlighted in red, where the wordy x-axis tick labels seem to push the columns apart: it would be perfect if there were a way to make the layout engine "ignore" the tick labels when determining the spacing.
Methods using other libraries are also appreciated. Thank you in advance.
What I tried:
subplots_adjust
GridSpec
The main difficulty with those attempts:
constrained_layout is incompatible with those settings, so one must sacrifice either the optimized legend spacing or the correct column spacing.
I have a very large dataset that I cannot plot directly using holoviews. I want to make a scatterplot with categorical data. Unfortunately my data is very sparse and many points have NA as category. I would like to make these points gray. Is there any way to make datashader know what I want to do?
I show you the way I do it now (as more or less proposed in https://holoviews.org/user_guide/Large_Data.html ).
I provide you an example:
import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
import datashader as ds
from datashader.colors import Sets1to3
from holoviews.operation.datashader import datashade,spread
raw_data = [('Alice', 60, 'London', 5),
            ('Bob', 14, 'Delhi', 7),
            ('Charlie', 66, np.nan, 11),
            ('Dave', np.nan, 'Delhi', 15),
            ('Eveline', 33, 'Delhi', 4),
            ('Fred', 32, 'New York', np.nan),
            ('George', 95, 'Paris', 11)
            ]
# Create a DataFrame object
df = pd.DataFrame(raw_data, columns=['Name', 'Age', 'City', 'Experience'])
df['City']=pd.Categorical(df['City'])
x='Age'
y='Experience'
color='City'
cats=df[color].cat.categories
# Make dummy-points (currently the only way to make a legend: https://holoviews.org/user_guide/Large_Data.html)
for cat in cats:
    # Just to make clear how many points of a given category we have
    print(cat, ((df[color] == cat) & (df[x].notnull()) & (df[y].notnull())).sum())
color_key=[(name,color) for name, color in zip(cats,Sets1to3)]
color_points = hv.NdOverlay({n: hv.Points([0,0], label=str(n)).opts(color=c,size=0) for n,c in color_key})
# Create the plot with datashader
points=hv.Points(df, [x, y],label="%s vs %s" % (x, y),)
datashaded=datashade(points,aggregator=ds.by(color)).opts(width=800, height=480)
(spread(datashaded,px=4, shape='square')*color_points).opts(legend_position='right')
It produces the following picture:
You can see some issues:
Most importantly although there is just one person from Paris you see that the NA-person (Charlie) is also printed in purple, the color for Paris. Is there a way to make the dot gray? I have tried many plots and it seems like the NAs always take the color of the last item in the legend.
Then there are some minor issues I have I did not want to open questions for. (If you think they deserve their own question please tell me, I am new to stackoverflow and appreciate your advice.)
One other problem:
The dots are not all of the same size. This is quite ugly. Is there a way to change that?
And then there is also a question that I have: Does the datashader internally also use the .cat.categories-method to decide what color to use? How are the colors, that datashader uses, determined? Because I wonder whether the legend is always in correct order (showing the correct colors: If you permute the order in cats then color_key and cats are not in the same order anymore and the legend shows wrong colors). It seems to always work the way I do but I feel a bit insecure.
And maybe someone wants to give their opinion whether Points is okay to use for scatterplots in this case. Because I do not see any difference to Scatter and also semantically there is not really one variable that causes the other (although one might argue that age causes experience in this case, but I am going to plot variables where it is not easy at all to find those kinds of causalities) so it is best to use Points if I understood the documentation https://holoviews.org/reference/elements/bokeh/Points.html correctly.
Most importantly although there is just one person from Paris you see that the NA-person (Charlie) is also printed in purple, the color for Paris. Is there a way to make the dot gray? I have tried many plots and it seems like the NAs always take the color of the last item in the legend.
Right now I believe Datashader replaces NaNs with zeros (see https://github.com/holoviz/datashader/blob/master/datashader/transfer_functions/__init__.py#L351). Seems like a good feature request to be able to supply Datashader with a color to use for NaNs instead, but in the meantime, I'd recommend replacing the NaNs with an actual category name like "Other" or "Missing" or "Unknown", and then both the coloring and the legend should reflect that name.
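A minimal sketch of that workaround, on a cut-down version of the question's data. The label "Missing" is an arbitrary choice made for this example; any name that is not already a category would do.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the question's DataFrame
df = pd.DataFrame({
    "Age": [60, 14, 66, 33, 95],
    "Experience": [5, 7, 11, 4, 11],
    "City": ["London", "Delhi", np.nan, "Delhi", "Paris"],
})
df["City"] = pd.Categorical(df["City"])

# Register an explicit category for the missing values, then fill the NaNs
# with it. "Missing" is an arbitrary label chosen for this sketch.
df["City"] = df["City"].cat.add_categories("Missing").fillna("Missing")

print(df["City"].value_counts())
```

With the NaNs turned into a real category, that category can be given gray in the color key like any other.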
One other problem: The dots are not all of the same size. This is quite ugly. Is there a way to change that?
Usually Datashader in a Bokeh HoloViews plot will render once initially before it is put into a Bokeh layout, and will be triggered to update once the layout is finished with a final version. Here, the initial rendering is being auto-ranged to precisely the range of the data points, then clipped by the boundaries of the plot (making squares near the edges become rectangles), and then the range of the plot is updated once the legend is added. To see how that works, remove *color_points and you'll see the same shape of dots, but now cropped by the plot edges:
You can manually trigger an update to the plot by zooming or panning slightly once it's displayed, but to force it to update without needing manual intervention, you can supply an explicit plot range:
points=hv.Points(df, [x, y],label="%s vs %s" % (x, y),).redim.range(Age=(0,90), Experience=(0,14))
It would be great if you could file a bug report on HoloViews asking why it is not refreshing automatically in this case when the legend is included. Hopefully a simple fix!
Does the datashader internally also use the .cat.categories-method to decide what color to use? How are the colors, that datashader uses, determined? Because I wonder whether the legend is always in correct order (showing the correct colors: If you permute the order in cats then color_key and cats are not in the same order anymore and the legend shows wrong colors). It seems to always work the way I do but I feel a bit insecure.
Definitely! Whenever you show a legend, you should be sure to pass in the color key as a dictionary so that you can be sure that the legend and the plot coloring use the same colors for each category. Just pass in color_key={k:v for k,v in color_key} when you call datashade.
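As a rough sketch of what that looks like: the palette below is a stand-in list (the question uses datashader.colors.Sets1to3), and the datashade call is shown only as a comment because it needs the question's points object.

```python
import pandas as pd

cats = pd.Categorical(["London", "Delhi", "Paris", "Delhi"]).categories

# Stand-in palette for this sketch; the question uses datashader.colors.Sets1to3.
palette = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3"]

# One explicit category -> color mapping, shared by the plot and the legend.
color_key = dict(zip(cats, palette))

# Colors are looked up by name, so the order in which the categories
# are visited no longer matters.
for name in reversed(list(cats)):
    print(name, color_key[name])

# With the question's objects (not run here):
# datashaded = datashade(points, aggregator=ds.by(color), color_key=color_key)
# color_points = hv.NdOverlay({n: hv.Points([0, 0], label=str(n)).opts(color=c, size=0)
#                              for n, c in color_key.items()})
```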
And maybe someone wants to give their opinion whether Points is okay to use for scatterplots in this case. Because I do not see any difference to Scatter and also semantically there is not really one variable that causes the other (although one might argue that age causes experience in this case, but I am going to plot variables where it is not easy at all to find those kinds of causalities) so it is best to use Points if I understood the documentation https://holoviews.org/reference/elements/bokeh/Points.html correctly.
Points is meant for a 2D location in a plane where both axes are interchangeable, such as the physical x,y location on a geometric plane. Points has two independent dimensions, and expects you to have any dependent dimensions be used for the color, marker shape, etc. It's true that if you don't know which variable might be dependent on the other, a Scatter is tricky to use, but simply choosing to put something on the x axis will set people up to think that that variable is independent, so there's not much that can be done about it. Definitely not appropriate to use Points in this case.
I am looking for some "magic" command that spaces the subplots (2x2 in my case) well: not too far apart, but with the right spacing to be considered a "quality plot". I found that I can control this with the rect option of plt.tight_layout, and after some trial and error I settled on these parameters: plt.tight_layout(rect=(0.02,0.02,0.97,0.97))
Now the plot fits the PDF page well, but the two top plots are too close to the two below (see the picture). How can I separate them without going out of bounds at the top? And how can I get the plot title a bit further from the figure? Hoping for your hints!
EDIT: OK, if I use the command plt.title('...',y=1.1), it is applied only to the last plot (axs[1,1]), even though I write the command before all the subplots!
Sorry I don't have an answer for automatic resizing but since you asked for some hints, there is one possible solution:
https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplots_adjust.html
What you need is wspace for horizontal spacing and hspace for vertical spacing between the subplots. This gives you freedom to choose the desired spacing to make your plot a "quality plot". Hope it helps.
As to your second question about placing the title, you can use:
plt.title("Title here", y=1.25)
where y defines the position of your title in relative (axes) coordinates: y=1 is the top edge of the axes and y=0.5 is the centre of the plot. Since you have a title for each subplot, you can use the respective relative coordinates for each.
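Putting both hints together, a small sketch for a 2x2 grid. The spacing and y values are illustrative guesses, not recommended settings, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2, figsize=(8, 6))

for i, ax in enumerate(axs.flat):
    ax.plot([0, 1, 2], [0, 1, 4])
    # set_title acts on this particular axes, whereas plt.title only
    # affects the current (most recently created) one.
    ax.set_title(f"Panel {i + 1}", y=1.05)

# wspace: horizontal gap, hspace: vertical gap, both as a fraction of the
# average axes size; top leaves room above the upper row of titles.
fig.subplots_adjust(wspace=0.3, hspace=0.5, top=0.9)
```

Calling set_title on each axes is also why a single plt.title call ends up on axs[1,1] only: it targets whichever axes is current.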
I am plotting some scalar data as a contour plot with matplotlib.contourf. On top of it, I am plotting some vector data with matplotlib.arrow. The basic plot has come along OK, but now I need to put a box on the plot with a default-size arrow plus the data value to which it corresponds, so the viewer will know what kind of scale he is looking at. For instance, I need a box with a horizontal arrow of some length and, below that, some text like "10 cm/sec".
First, if anyone can give me a simple approach to this, I would be grateful.
Second, the approach I have tried is to do the contour plot, then plot the arrows, then add a rectangle to the plot like so:
rect=pl.Rectangle((300,70),15,15,fc='white')
pl.gca().add_patch(rect)
and then, finally, put my scale arrow and text on top of this rectangle.
This isn't working because the rectangle patch covers up the contour, but it doesn't cover up the arrows in the plot. Is there a way to move the patch completely "to the front" of everything else?
Got it. Using pylab.quiver and pylab.quiverkey functions. quiver produces a nice vector field with just a few lines of code, and quiverkey makes it easy to produce a scaling vector with text. And, for some reason, the arrows plotted with quiver are indeed covered by my rectangle, so it is easy to make the scaling arrow very visible. There are still some mysteries in all of this for me. If anyone wants to try to clear them up, would be much obliged. But I have a way now to do what I need in this instance.
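For anyone landing here, a minimal sketch of the quiver plus quiverkey approach. The fields are made up for illustration, and the key position and the "10 cm/sec" reference length are arbitrary choices for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

x, y = np.meshgrid(np.linspace(0, 10, 12), np.linspace(0, 10, 12))
scalar = np.sin(x / 2) * np.cos(y / 2)          # made-up scalar field
u, v = 10 * np.cos(y / 3), 10 * np.sin(x / 3)   # made-up velocities in cm/sec

fig, ax = plt.subplots()
ax.contourf(x, y, scalar)
q = ax.quiver(x, y, u, v)

# Reference arrow of length 10 (in the units of u and v) with its label
# below it ("S"), positioned in axes coordinates near the lower right.
qk = ax.quiverkey(q, 0.85, 0.1, 10, "10 cm/sec",
                  labelpos="S", coordinates="axes")
```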
I have a matplotlib axes instance inside which I'm animating an AxesImage via blitting.
What I'd like to do is animate the ticks on the x-axis as well.
I am updating the data on the AxesImage (and subsequently) drawing its artist quite frequently, and on each update I'd like to move one extra tick placed to highlight the position of something.
This is what I'm doing right now:
axis = axes.get_xaxis()
im.set_data(new_data)
axis.set_ticks([10,20,30,x,t])
axis.set_ticklabels(["p", "u", "z", "z", "i"])
axes.draw_artist(im)
axes.draw_artist(axis)
While I see the ticks updating correctly, the labels are not. I think that the axes bbox does not include the axis; is this possible? If so, how can I animate it? Should I copy and restore from somewhere else?
The axes bbox doesn't include anything outside of the "inside" of the axes (e.g. it doesn't include the tick labels, title, etc.)
One quick way around this is to just grab the entire region of the figure when you're blitting. (E.g. background = canvas.copy_from_bbox(fig.bbox))
This can cause problems if you have multiple subplots and only want to animate one of them. In that case, you can do something along the lines of background = canvas.copy_from_bbox(ax.bbox.expanded(1.1, 1.2)). You'll have to guesstimate the ratios you need, though.
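If you need the exact extent of the tick labels, it's a bit trickier. The easiest way is to iterate through the ticklabel objects and get the union with ax.bbox. You can make this a one-liner: Bbox.union([ax.bbox] + [label.get_window_extent() for label in ax.get_xticklabels()]), with Bbox imported from matplotlib.transforms. (union is a static method that combines only the boxes in the list it is given, so ax.bbox has to be part of that list.)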
At any rate, one of those three options should do what you need, I think.
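Here is a rough sketch of the three options side by side, followed by one blitting step that uses the exact region. It assumes the Agg backend and made-up image data; the tick positions and labels echo the question but are otherwise arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.transforms import Bbox

fig, ax = plt.subplots()
im = ax.imshow(np.random.rand(20, 40), animated=True)
ax.set_xticks([10, 20, 30])
ax.set_xticklabels(["p", "u", "z"])

fig.canvas.draw()  # text extents are only meaningful after a first draw

# Option 1: the whole figure
whole = fig.bbox
# Option 2: the axes bbox grown by guessed factors
padded = ax.bbox.expanded(1.1, 1.2)
# Option 3: the axes bbox plus the exact extents of the x tick labels
exact = Bbox.union([ax.bbox] +
                   [label.get_window_extent() for label in ax.get_xticklabels()])

background = fig.canvas.copy_from_bbox(exact)

# One animation step: restore the saved region, update, redraw, blit
fig.canvas.restore_region(background)
im.set_data(np.random.rand(20, 40))
ax.draw_artist(im)
ax.draw_artist(ax.xaxis)
fig.canvas.blit(exact)
```

Note that the saved background here still contains the tick labels from the first draw; if the labels change between frames, the axis would probably need to be marked as animated (or hidden) before the background is copied.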