How to make this chart easier to read? - python

I want to know how to get my x axis labels to display bigger so that the team labels aren't overlapping. I'm sure it's just a matter of configuring the chart size
My code:
plt.plot(prem_data.Team, prem_data.attack_scored,'o')
plt.plot(prem_data.Team, prem_data.defence_saves)
plt.xlabel("Team")
plt.ylabel("Attack goals scored & Defence tackles")
plt.legend(["attack scored", "defence saved"])
plt.show()

I can imagine there being two mutually non-exclusive solutions.
Directly alter the size of the font. This can be achieved via calling plt.rcParams.update({'font.size': <font_size>}), assuming that you have imported matplotlib.pyplot under the alias plt, as you have done in the source code provided. You would probably want to set the <font_size> to be small to prevent overlapping labels, but this would require some experimentation.
Increase the size of the figure. This can be done in a number of ways, but perhaps the simplest method you can implement with minimal edits to your current code would be to use the command plt.rcParams["figure.figsize"] = <fig_size> where <fig_size> is a tuple specifying the size of the figure in inches, such as (10, 5).
With some trial and error, you should be able to manipulate the size of the font and the figure to produce a plot with improved readability.
Note: The method for altering figure size I introduced above is not the most conventional way to go about this problem. Instead, it is much more common to use matplotlib.pyplot.figure or similar variants. For more information, I recommend that you check out this thread and the documentation.

Related

How to tell datashader to use gray color for NA values when plotting categorical data

I have very large dataset that I cannot plot directly using holoviews. I want to make a scatterplot with categorial data. Unfortunately my data is very sparse and many points have NA as category. I would like to make these points gray. Is there any way to make datashader know what I want to do?
I show you the way I do it now (as more or less proposed in https://holoviews.org/user_guide/Large_Data.html ).
I provide you an example:
import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
import datashader as ds
from datashader.colors import Sets1to3
from holoviews.operation.datashader import datashade,spread
raw_data = [('Alice', 60, 'London', 5) ,
('Bob', 14, 'Delhi' , 7) ,
('Charlie', 66, np.NaN, 11) ,
('Dave', np.NaN,'Delhi' , 15) ,
('Eveline', 33, 'Delhi' , 4) ,
('Fred', 32, 'New York', np.NaN ),
('George', 95, 'Paris', 11)
]
# Create a DataFrame object
df = pd.DataFrame(raw_data, columns=['Name', 'Age', 'City', 'Experience'])
df['City']=pd.Categorical(df['City'])
x='Age'
y='Experience'
color='City'
cats=df[color].cat.categories
# Make dummy-points (currently the only way to make a legend: https://holoviews.org/user_guide/Large_Data.html)
for cat in cats:
#Just to make clear how many points of a given category we have
print(cat,((df[color]==cat)&(df[x].notnull())&(df[y].notnull())).sum())
color_key=[(name,color) for name, color in zip(cats,Sets1to3)]
color_points = hv.NdOverlay({n: hv.Points([0,0], label=str(n)).opts(color=c,size=0) for n,c in color_key})
# Create the plot with datashader
points=hv.Points(df, [x, y],label="%s vs %s" % (x, y),)
datashaded=datashade(points,aggregator=ds.by(color)).opts(width=800, height=480)
(spread(datashaded,px=4, shape='square')*color_points).opts(legend_position='right')
It produces the following picture:
You can see some issues:
Most importantly although there is just one person from Paris you see that the NA-person (Charlie) is also printed in purple, the color for Paris. Is there a way to make the dot gray? I have tried many plots and it seems like the NAs always take the color of the last item in the legend.
Then there are some minor issues I have I did not want to open questions for. (If you think they deserve their own question please tell me, I am new to stackoverflow and appreciate your advice.)
One other problem:
The dots are not all of the same size. This is quite ugly. Is there a way to change that?
And then there is also a question that I have: Does the datashader internally also use the .cat.categories-method to decide what color to use? How are the colors, that datashader uses, determined? Because I wonder whether the legend is always in correct order (showing the correct colors: If you permute the order in cats then color_key and cats are not in the same order anymore and the legend shows wrong colors). It seems to always work the way I do but I feel a bit insecure.
And maybe someone wants to give their opinion whether Points is okay to use for scatterplots in this case. Because I do not see any difference to Scatter and also semantically there is not really one variable that causes the other (although one might argue that age causes experience in this case, but I am going to plot variables where it is not easy at all to find those kinds of causalities) so it is best to use Points if I understood the documentation https://holoviews.org/reference/elements/bokeh/Points.html correctly.
Most importantly although there is just one person from Paris you see that the NA-person (Charlie) is also printed in purple, the color for Paris. Is there a way to make the dot gray? I have tried many plots and it seems like the NAs always take the color of the last item in the legend.
Right now I believe Datashader replaces NaNs with zeros (see https://github.com/holoviz/datashader/blob/master/datashader/transfer_functions/__init__.py#L351). Seems like a good feature request to be able to supply Datashader with a color to use for NaNs instead, but in the meantime, I'd recommend replacing the NaNs with an actual category name like "Other" or "Missing" or "Unknown", and then both the coloring and the legend should reflect that name.
One other problem: The dots are not all of the same size. This is quite ugly. Is there a way to change that?
Usually Datashader in a Bokeh HoloViews plot will render once initially before it is put into a Bokeh layout, and will be triggered to update once the layout is finished with a final version. Here, the initial rendering is being auto-ranged to precisely the range of the data points, then clipped by the boundaries of the plot (making squares near the edges become rectangles), and then the range of the plot is updated once the legend is added. To see how that works, remove *color_points and you'll see the same shape of dots, but now cropped by the plot edges:
You can manually trigger an update to the plot by zooming or panning slightly once it's displayed, but to force it to update without needing manual intervention, you can supply an explicit plot range:
points=hv.Points(df, [x, y],label="%s vs %s" % (x, y),).redim.range(Age=(0,90), Experience=(0,14))
It would be great if you could file a bug report on HoloViews asking why it is not refreshing automatically in this case when the legend is included. Hopefully a simple fix!
Does the datashader internally also use the .cat.categories-method to decide what color to use? How are the colors, that datashader uses, determined? Because I wonder whether the legend is always in correct order (showing the correct colors: If you permute the order in cats then color_key and cats are not in the same order anymore and the legend shows wrong colors). It seems to always work the way I do but I feel a bit insecure.
Definitely! Whenever you show a legend, you should be sure to pass in the color key as a dictionary so that you can be sure that the legend and the plot coloring use the same colors for each category. Just pass in color_key={k:v for k,v in color_key} when you call datashade.
And maybe someone wants to give their opinion whether Points is okay to use for scatterplots in this case. Because I do not see any difference to Scatter and also semantically there is not really one variable that causes the other (although one might argue that age causes experience in this case, but I am going to plot variables where it is not easy at all to find those kinds of causalities) so it is best to use Points if I understood the documentation https://holoviews.org/reference/elements/bokeh/Points.html correctly.
Points is meant for a 2D location in a plane where both axes are interchangeable, such as the physical x,y location on a geometric plane. Points has two independent dimensions, and expects you to have any dependent dimensions be used for the color, marker shape, etc. It's true that if you don't know which variable might be dependent on the other, a Scatter is tricky to use, but simply choosing to put something on the x axis will set people up to think that that variable is independent, so there's not much that can be done about it. Definitely not appropriate to use Points in this case.

How to buffer pyplot plots

TL;DR: I want to do something like
cache.append(fig.save_lines)
....
cache.load_into(fig)
I'm writing a (QML) front-end for a pyplot-like and matplotlib based MCMC sample visualisation library, and hit a small roadblock. I want to be able to produce and cache figures in the background, so that when the user moves some sliders, the plots aren't re-generated (they are complex and expensive to re-compute) but just brought in from the cache.
In order to do that I need to be able to do the plotting (but not the rendering) offline and then simply change the contents of a canvas. Effectively I want to do something like cache the
line = plt.plot(x,y)
object, but for multiple subplots.
The library produces very complex plots so I can't keep track of the line2D objects and use those.
My attempt at a solution: render to a pixmap with the correct DPI and use that. Issues arise if I resize the canvas, and not want to re-scale the Pixmaps. I've had situations where the wonderful SO community came up with much better solutions than what I had in mind, so if anyone has experience and/or ideas for how to get this behaviour, I'd be very much obliged!

Matplotlib: increase distance between automatically generated tick labels

When generating a new figure or axis with matplotlib (or pyplot), there is (I assume) some sort of automated way to determine how many ticks are appropriate for each axis.
Unfortunately, this often results in labels which are too close to be read comfortably, or even overlap. I'm aware of the ways to specify tick locations and labels explicitly (e.g. ax.set_xticks, ax.set_xtick_labels, but I wonder if whatever does the automatic tick distribution if nothing is specified can be influenced by some global matplotlib parameter(s).
Do such global parameters exist, and what are they?
I'm generating lots of figures automatically and save them, and it can get a little annoying having to treat them all individually ...
In case there is no simple way to tell matplotlib to thin out the labels, is there some other workaround to achieve more generous spacing between them?
by reading the documentation of xticks matplotlib.pyplot.xticks there seems to be no such global arguments.
However it is very simple to get around it by using the explicit xticks and xticks_labels taking into account that you can:
change the font size (decrease it)
rotate the labels (by 45°) or make them vertical (less overlapping).
increase the fig size.
program a function that generates adaptif xticks based on your input.
and many other possible workarounds.

Multiple tick locators on single axis of a plot in matplotlib

I've done some searching around, and cannot easily find a solution this problem. Effectively, I want to have multiple tick locators on a single axis such that I can do something like in the plot below.
Note how the x-axis starts off logarithmic, but becomes linear once 500 is reached. I figured one possible solution was to simply divide the data into two portions, plot it on two graphs, each with their own locators, and then put the graphs right next to each other so they're seamless, but that seems very unpythonic. Anyone have a better solution?
I suspect the following URL might be of use:
http://matplotlib.org/examples/axes_grid/parasite_simple2.html (click on the plot to have the python code)
If you need some specialized graphs, it's always a good idea to have a look at the Matplotlib gallery:
http://matplotlib.org/gallery.html
EDIT: It is possible to make custom ticks on the X-axis:
http://matplotlib.org/examples/ticks_and_spines/ticklabels_demo_rotation.html
You may find an implementation of this scale by Jesús Torrado here.

Interactive selection of series in a matplotlib plot

I have been looking for a way to be able to select which series are visible on a plot, after a plot is created.
I need this as i often have plots with many series. they are too many to plot at the same time, and i need to quickly and interactively select which series are visible. Ideally there will be a window with a list of series in the plot and checkboxes, where the series with the checked checkbox is visible.
Does anyone know if this has been already implemented somewhere?, if not then can someone guide me of how can i do it myself?
Thanks!
Omar
It all depends on how much effort you are willing to do and what the exact requirements are, but you can bet it has already been implemented somewhere :-)
If the aim is mainly to not clutter the image, it may be sufficient to use the built-in capabilities; you can find relevant code in the matplotlib examples library:
http://matplotlib.org/examples/event_handling/legend_picking.html
http://matplotlib.org/examples/widgets/check_buttons.html
If you really want to have a UI, so you can guard the performance by limiting the amount of plots / data, you would typically use a GUI toolbox such as GTK, QT or WX. Look here for some articles and example code:
http://eli.thegreenplace.net/2009/05/23/more-pyqt-plotting-demos/
A list with checkboxes will be fine if you have a few plots or less, but for more plots a popup menu would probably be better. I am not sure whether either of these is possible with matplotlib though.
The way I implemented this once was to use a slider to select the plot from a list - basically you use the slider to set the index of the series that should be shown. I had a few hundred series per dataset, so it was a good way to quickly glance through them.
My code for setting this up was roughly like this:
fig = pyplot.figure()
slax = self.fig.add_axes((0.1,0.05,0.35,0.05))
sl = matplotlib.widgets.Slider(slax, "Trace #", 0, len(plotlist), valinit=0.0)
def update_trace():
ax.clear()
tracenum = int(np.floor(sl.val))
ax.plot(plotlist[tracenum])
fig.canvas.draw()
sl.on_changed(update_trace)
ax = self.fig.add_axes((0.6, 0.2, 0.35, 0.7))
fig.add_subplot(axes=self.traceax)
update_trace()
Here's an example:
Now that plot.ly has opened sourced their libraries, it is a really good choice for interactive plots in python. See for example: https://plot.ly/python/legend/#legend-names. You can click on the legend traces and select/deselect traces.
If you want to embed in an Ipython/Jupyter Notebook, that is also straightforward: https://plot.ly/ipython-notebooks/gallery/

Categories