Displaying large graphs with matplotlib

Displaying large graphs with matplotlib - python

I'm using the matplotlib Python library to graph information about text. On my y-axis, I have a word, and on my x-axis, I have the number of times that word appears in the text. The thing is, with large pieces of text, the number of unique of words becomes undisplayable on a screen.
I'm currently using the PyCharm IDE and there is a helpful tool which shows my bar graph in its entirety, just zoomed out a lot. I can zoom into this graph and see all my data nicely.
My question is if there is a way to do such a thing with matplotlib. That is, make a graph such that in the case that there are too many words to display, it resizes and zooms out so that all the words do not overlap one another.
If there are any other means to display data such as mine in a better way, I would highly appreciate any suggestions. Thanks!

You can increase the figure size before plotting.
import matplotlib.pyplot as plt
plt.figure(figsize=(X, Y)

Related

matplotlib legend performance issue

I am using Jupyter-notebook with python 3.6.2 and matplotlib to plot some data.
When I plot my data, I want to add a legend to the plot (basically to know which line is which)
However calling plt.legend takes a lot of time (almost as much as the plot itself, which to my understanding should just be instant).
Minimal toy problem that reproduces the issue:
import numpy as np
import matplotlib.pyplot as plt
# Toy useless data (one milion x 4)
my_data = np.random.rand(1000000,4)
plt.plot(my_data)
#plt.legend(['A','C','G','T'])
plt.show()
The data here is just random and useless, but it reproduces my problem:
If I uncomment the plt.legend line, the run takes almost double the time
Why? Shouldn't the legend just look at the plot, see that 4 plots have been made, and draw a box assigning each color to the corresponding string?
Why is a simple legend taking so much time?
Am I missing something?

Replicating the answer by #bnaecker, such that this question is answered:
By default, the legend will be placed in the "best" location, which requires computing how many points from each line are inside a potential legend box. If there are many points, this can take a while. Drawing is much faster when specifying a location other than "best", e.g. plt.legend(loc=3).

matplotlib shows different figure than saves from the show() window

I plot rather complex data with matplotlib's imshow(), so I prefer to first visually inspect if it is all right, before saving. So I usually call plt.show(), see if it is fine, and then manually save it with a GUI dialog in the show() window. And everything was always fine, but recently I started getting a weird thing. When I save the figure I get a very wrong picture, though it looks perfectly fine in the matplotlib's interactive window.
If I zoom to a specific location and then save what I see, I get a fine figure.
So, this is the correct one (a small area of the picture, saved with zooming first):
And this one is a zoom into approximately the same area of the figure, after I saved it all:
For some reason pixels in the second one are much bigger! That is vary bad for me - as you can see, it looses a lot of details in there.
Unfortunately, my code is quite complicated and I wasn't able to reproduce it with some randomly generated data. This problem appeared after I started to plot two triangles of the picture separately: I read my two huge data files with np.loadtxt(), get np.triu(data1) and np.tril(data2), mask zeroes, NAs, -inf and +inf and then plot them on the same axes with plt.imshow(data, interpolation='none', origin='lower', extent=extent). I do lot's of other different things to make it nicer, but I guess it doesn't matter, because it all worked like a charm before.
Please, let me know, if you need to know anything else specific from my code, that could be relevant to this problem.

When you save a figure in png/jpg you are forced to rasterize it, convert it to a finite number of pixels. If you want to keep the full resolution, you have a few options:
Use a very high dpi parameter, like 900. Saving the plot will be slow, and many image viewers will take some time to open it, but the information is there and you can always crop it.
Save the image data, the exact numbers you used to make the plot. Whenever you need to inspect it, load it in Matplotlib in interactive mode, navigate to your desired corner, and save it.
Use SVG: it is a vector graphics format, so you are not limited to pixels.
Here is how to use SVG:
import matplotlib
matplotlib.use('SVG')
import matplotlib.pyplot as plt
# Generate the image
plt.imshow(image, interpolation='none')
plt.savefig('output_image')
Edit:
To save a true SVG you need to use the SVG backend from the beginning, which is unfortunately, incompatible with interactive mode. Some backends, like GTKCairo seem to allow both, but the result is still rasterized, not a true SVG.
This may be a bug in matplotlib, at least, to the best of my knowledge, it is not documented.

Interactive selection of series in a matplotlib plot

I have been looking for a way to be able to select which series are visible on a plot, after a plot is created.
I need this as i often have plots with many series. they are too many to plot at the same time, and i need to quickly and interactively select which series are visible. Ideally there will be a window with a list of series in the plot and checkboxes, where the series with the checked checkbox is visible.
Does anyone know if this has been already implemented somewhere?, if not then can someone guide me of how can i do it myself?
Thanks!
Omar

It all depends on how much effort you are willing to do and what the exact requirements are, but you can bet it has already been implemented somewhere :-)
If the aim is mainly to not clutter the image, it may be sufficient to use the built-in capabilities; you can find relevant code in the matplotlib examples library:
http://matplotlib.org/examples/event_handling/legend_picking.html
http://matplotlib.org/examples/widgets/check_buttons.html
If you really want to have a UI, so you can guard the performance by limiting the amount of plots / data, you would typically use a GUI toolbox such as GTK, QT or WX. Look here for some articles and example code:
http://eli.thegreenplace.net/2009/05/23/more-pyqt-plotting-demos/

A list with checkboxes will be fine if you have a few plots or less, but for more plots a popup menu would probably be better. I am not sure whether either of these is possible with matplotlib though.
The way I implemented this once was to use a slider to select the plot from a list - basically you use the slider to set the index of the series that should be shown. I had a few hundred series per dataset, so it was a good way to quickly glance through them.
My code for setting this up was roughly like this:
fig = pyplot.figure()
slax = self.fig.add_axes((0.1,0.05,0.35,0.05))
sl = matplotlib.widgets.Slider(slax, "Trace #", 0, len(plotlist), valinit=0.0)
def update_trace():
ax.clear()
tracenum = int(np.floor(sl.val))
ax.plot(plotlist[tracenum])
fig.canvas.draw()
sl.on_changed(update_trace)
ax = self.fig.add_axes((0.6, 0.2, 0.35, 0.7))
fig.add_subplot(axes=self.traceax)
update_trace()
Here's an example:

Now that plot.ly has opened sourced their libraries, it is a really good choice for interactive plots in python. See for example: https://plot.ly/python/legend/#legend-names. You can click on the legend traces and select/deselect traces.
If you want to embed in an Ipython/Jupyter Notebook, that is also straightforward: https://plot.ly/ipython-notebooks/gallery/

transforming coordinates in matplotlib

I'm trying to plot a series of rectangles and lines based on a tab delimited text file in matplotlib. The coordinates are quite large in the data and shown be drawn to scale -- except scaled down by some factor X -- in matplotlib.
What's the easiest way to do this in matplotlib? I know that there are transformations, but I am not sure how to define my own transformation (i.e. where the origin is and what the scale factor is) in matplotlib and have it easily convert between "data space" and "plot space". Can someone please show a quick example or point me to the right place?

If you simply use matplotlib's plot function, the plot will fit into one online window, so you don't really need to 'rescale' explicitly. Linearly rescaling is pretty easy, if you include some code sample to show your formatting of the data, somebody can help you in translating the origin and scaling the coordinates.

Change dynamically the contents of a matplotlib plot

I while ago, I was comparing the output of two functions using python and matplotlib. The result was as good as simple, since plotting with matplotlib is quite easy: I just plotted two arrays with different markers. Piece of cake.
Now I find myself with the same problem, but now I have a lot of pair of curves to compare. I initially tried plotting everything with different colors and markers. This did not satisfy me since the ranges of each curve are not quite the same. In addition to this, I quickly ran out of colors and markers that were sufficiently different to identify (RGBCMYK, after that, custom colors resemble any of the previous ones).
I also tried subplotting each pair of curves, obtaining a window with many plots. Too crowded.
I tried one window per plot, too many windows.
So I was just wondering if there is any existing widget or if you have any suggestion (or a different idea) to accomplish this:
I want to see a pair of curves and then select easily the next one, with a slidebar, button, mouse scroll, or any other widget or event. By changing curves, the previous one should disappear, the legend should change and its axis as well.

Well I managed to do it with an event handler for mouse clicks. I will change it for something more useful, but I post my solution anyway.
import matplotlib.pyplot as plt
figure = plt.figure()
# plotting
plt.plot([1,2,3],[10,20,30],'bo-')
plt.grid()
plt.legend()
def on_press(event):
print 'you pressed', event.button, event.xdata, event.ydata
event.canvas.figure.clear()
# select new curves to plot, in this example [1,2,3] [0,0,0]
event.canvas.figure.gca().plot([1,2,3],[0,0,0], 'ro-')
event.canvas.figure.gca().grid()
event.canvas.figure.gca().legend()
event.canvas.draw()
figure.canvas.mpl_connect('button_press_event', on_press)

Sounds like you want to embed matplotlib in an application. There are some resources available for that:
user interface examples
Embedding in WX

I really like using traits. If you follow the tutorial Writing a graphical application for scientific programming , you should be able to do what you want. The tutorial shows how to interact with a matplotlib graph using graphical user interface.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.