Does Bokeh's WebGL speed up heatmaps? - python

I have been reading a lot about Bokeh for visualisations of large datasets. I plan on plotting a heatmap with over 25 million points.
I saw read the page on speeding up WebGL and they mention that any plots with glyphs are accelerated.
Does the Heatmap plot use glyphs? Will there be any benefits in turning on WebGL for heatmap plots?

Pretty much everything that Bokeh draws is a glyph of some type. However, the text on that page you link actually states that "allows rendering some glyph types on graphics hardware." Currently (as of Bokeh 0.12.3) WebGL support only extends to scatter-type markers (e.g. circle, x, etc) and to lines. But HeatMap is implemented using the Rect glyph, so I would not expect WebGL to offer any improvement at the present time.
But I would add: It's good to thoroughly investigate any actual performance hotspots. Bokeh is really two libraries: a Python library and a JavaScript library. If you are seeing performance issues, are you sure it's on the JS side? For example, you have not said what your data sizes are. Are you sure it's not actually the binning/aggregation (that happens on the Python side) that is your issue?
Finally, if you have data sizes that are in the millions-to-billions of points range, you should probably be looking at the separate bokeh/datashader project.

Related

Simple 3D Graphics in Python

I am working in a project where 3D visualizations are important to see what is happening during the setup stage and perhaps for visual validation by making a short videos of what is happening.
The problem that I have is that 3D visualizations in Python are too sophisticated, and complicated to learn for what I need. I find that Mathematica is the perfect kind of software...but it is not portable and is very expensive.
Question
Is there any Python package similar to Mathematica?
Clarification
I don't want a "plotting" program, since plotting is not what I am looking for. I want to generate simple geometric shapes like spheres and cubes that can move around, this is more than enough. Give some coordinates, perhaps a rotation, and the program just shows the desired image(s) to export as a .png or make a quick video; as in Mathematica.
Packages like Pygame, Panda3D, Pyglet, etc., look too complicated and an overkill for what I need, as well as software like Blender, etc. Jupyter notebooks are similar, but they don't have the 3D graphics capabilities. I found a Python module named Fresnel, but it looks too sophisticated for what I need.
I have read several answers to this question here in Stack Overflow, but they seem outdated and not really what I am looking for.
Further Clarification
To draw spheres in Mathematica you do:
coordinates = Flatten[Table[Table[Table[ {i,j,k}, {k,1,10}], {j,1,10}], {i,1,10}],1]
spheres= Flatten[Table[Graphics3D[{Sphere[coordinates[[i]],0.5]}],{i,1,Length[coordinates]}]]
Show[{spheres}]
This is a simple quick and easy way of displaying a group of spheres. To use any program in Python to do the same, it seems like you must be an expert in 3D graphics to do this simple thing.
Programs that have capabilities of using Python scripts, like Blender, make it difficult to use the interface in a straight forward way (try doing the same in Blender, it will take a while just to learn the basics!).
I know several other user-friendly plotting libraries than matplotlib, but not a lot provide an interactive view. There is of course the well known vtk but it's not for end-user
plotly
For usage in a notebook, like jupyter and mathematica, you probably would go for plotly It's using a browser-based interface with plots very similar to mathematica
pymadcad
If you need a more offline version and what you are looking for is some view you can rotate/zoom/pan to look on your geometry by different sides, you can take a look at pymadcad
It even works with touchscreens.
It is not centered on 3D visualization, so it's a bit overkill to use it only for it, but for 3D curves, 3D surfaces, spheres and cubes as you said, it can do the job
simple plots with pymadcad:
from madcad import *
from madcad.rendering import Displayable
from madcad.displays import GridDisplay
# create a wire from custom points
s = 100
mycurve = Wire([ vec3(sin(t/s), cos(t/s), 0.1*t/s)
for t in range(int(s*6*pi)) ])
# create a sphere
mysphere = uvsphere(vec3(0), 0.5)
# display in a separated window
show([
mycurve, # displaying the curve
mysphere, # displaying the sphere
Displayable(GridDisplay, vec3(0)), # this is to have a 2D grid centered on origin
])
result:
(The window is dark because so is my system theme, but likely it will adapt to yours)

How to buffer pyplot plots

TL;DR: I want to do something like
cache.append(fig.save_lines)
....
cache.load_into(fig)
I'm writing a (QML) front-end for a pyplot-like and matplotlib based MCMC sample visualisation library, and hit a small roadblock. I want to be able to produce and cache figures in the background, so that when the user moves some sliders, the plots aren't re-generated (they are complex and expensive to re-compute) but just brought in from the cache.
In order to do that I need to be able to do the plotting (but not the rendering) offline and then simply change the contents of a canvas. Effectively I want to do something like cache the
line = plt.plot(x,y)
object, but for multiple subplots.
The library produces very complex plots so I can't keep track of the line2D objects and use those.
My attempt at a solution: render to a pixmap with the correct DPI and use that. Issues arise if I resize the canvas, and not want to re-scale the Pixmaps. I've had situations where the wonderful SO community came up with much better solutions than what I had in mind, so if anyone has experience and/or ideas for how to get this behaviour, I'd be very much obliged!

difference between datashader and other plotting libraries

I want to understand the clear difference between Datashader and other graphing libraries eg plotly/matplotlib etc.
I understand that in order to plot millions/billions of data points, we need datashader as other plotting libraries will hung up the browser.
But what exactly is the reason which makes datashader fast and does not hung up the browser and how exactly the plotting is done which doesnt put any load on the browser ????
Also, datashader doesnt put any load on browser because in the backend datashader will create a graph on the basis of my dataframe and send only the image to the browser which is why its fast??
Plz explain i am unable to understand the in and out clearly.
It may be helpful to first think of Datashader not in comparison to Matplotlib or Plotly, but in comparison to numpy.histogram2d. By default, Datashader will turn a long list of (x,y) points into a 2D histogram, just like histogram2d. Doing so only requires a simple increment of a grid cell for each new point, which is easily accellerated to machine-code speeds with Numba and is trivial to parallelize with Dask. The resulting array is then at most the size of your display screen, no matter how big your dataset is. So it's cheap to process in a separate program that adds axes, labels, etc., and it will never crash your browser.
By contrast, a plotting program like Plotly will need to convert each data point into a JSON or other serialized representation, pass that to JavaScript in the browser, have JavaScript draw a shape into a graphics buffer, and make each such shape support hover and other interactive features. Those interactive features are great, but it means Plotly is doing vastly more work per data point than Datashader is, and requires that the browser can hold all those data points. The only computation Datashader needs to do with your full data is to linearly scale the x and y locations of each point to fit the grid, then increment the grid value, which is much easier than what Plotly does.
The comparison to Matplotlib is slightly more complicated, because with an Agg backend, Matplotlib is also pre-rendering to a fixed-size graphics buffer before display (somewhat like Datashader). But Matplotlib was written before Numba and Dask (making it more difficult to speed up), it still has to draw shapes for each point (not just a simple increment), it can't fully parallelize the operations (because later points overwrite earlier ones in Matplotlib), and it provides anti-aliasing and other nice features not available in Datashader. So again Matplotlib is doing a lot more work than Datashader.
But if what you really want to do is see the faithful 2D distribution of billions of data points, Datashader is the way to go, because that's really all it is doing. :-)
From the datashader docs,
datashader is designed to "rasterize" or "aggregate" datasets into regular grids that can be viewed as images, making it simple and quick to see the properties and patterns of your data. Datashader can plot a billion points in a second or so on a 16GB laptop, and scales up easily to out-of-core or distributed processing for even larger datasets.
There aren't any tricks going on in any of these libraries - rendering a huge number of points takes a long time. What datashader does is to shift the burden of visualization from rendering to computing. There's a very good reason you have to create a canvas before plotting instructions in datashader. The first step in a datashader pipeline is to rasterize a dataset, in other words, it approximates the position of each piece of data and then uses aggregation functions to determine the intensity or color of each pixel. This allows datashader to plot enormous numbers of points; even more points than can be held in memory.
Matplotlib, on the other hand, renders every single point you instruct it to plot, making plotting large datasets time consuming or even impossible.

Stacked heatmaps - seaborn solution?

EDIT: For some reason I've been downvoted twice for posting this question (it hurts ppl) so I've rejigged it.
How do you combine multiple heatmaps in a stacked way with same color scale like to following image?
Additionally, does anyone know how to create the Augmented suffix tree?
Background:
I've worked through the python jupyter notebooks at the following link on how to create the heatmaps of (any) daily consumption profiles using seaborn
http://www.datadrivenbuilding.org/
...however there's a realllllllly cool combination graphic I'd love to be able to reproduce.
That image is an edited version of an image from this paper:
C. Miller, Z. Nagy, A. Schlueter, Automated daily pattern filtering of
measured building performance data, Automation in Construction 49,
Part A (2015) 1–17. doi:10.1016/j.autcon.2014.09.004. URL
http://www.sciencedirect.com/science/article/pii/S0926580514002015
They came up with the visualisation techniques themselves and describe them there. It looks like C. Miller is the one who wrote the notebook that you already found that shows how to draw the stacked heatmaps.
The augmented suffix tree is a type of visualization called a Sankey Diagram. You can plot these very beautifully using Plotly for example, or pySankey if you want to use matplotlib.

Is there an equivalent to the Matlab figure window in Python (with all tools)?

I'm just wondering if it exists an equivalent to the Matlab figure window in Python where we can modify plots directly from the figure window, or add some features (text, box , arrow, and so on), or make curve fitting, etc.
Matplotlib is good, but it is not as high-level as the Matlab figure. We need to code everything and if we want to modify plots, we need to modify the code directly (except for some basic stuffs like modifing the line color)
With matplotlib, you will indeed remain in the "code it all" workflow. This is not directly the answer you expect but the matplotlib documentation recently gained a very instructive figure that will probably help you if you stay with matplotlib: http://matplotlib.org/examples/showcase/anatomy.html shows the "anatomy" of the figure with all the proper designations for the parts of the figure.
Overall, I could always find examples of what I needed in their excellent gallery http://matplotlib.org/gallery.html
In my opinion, you'll save time by coding these customizations instead of doing them by hand. You may indeed feel otherwise but if not there is a ton of examples of matplotlib code on SO, in their docs and a large community of people around it :-)

Categories