I want to visualize time-series-like data with several measurements over time.
There are a lot of such measurements in a dataset, on the order of tens to hundreds of thousands.
In order to view these in a notebook or HTML page, I would like some efficient method to show a subrange of the whole time range with just a few hundred to a thousand data points, and to have controls to scroll left/right, i.e. forward/backward in time through the data.
I have tried doing this with Plotly and a range slider, but unfortunately this does not scale to large data at all. Apparently, this approach embeds all the graph data in the output JavaScript, which slows everything down and at some point makes the browser hang or crash.
What I would need is an approach that actually only renders the data in the subrange and interacts with the Python code via the scrolling widgets to update the view.
Ideally, this would work with Plotly as I am using it for all other visualizations, but any other efficient solution would also be welcome.
Plotly runs into rendering issues when there are too many data points within the window (see Plotly Benchmarks). I would suggest using Plotly-Resampler, which resamples the data that is within the user's view.
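Here is a minimal sketch of that suggestion (the data and trace name are made up for illustration): wrapping a figure in a FigureResampler means only a down-sampled view of each trace is sent to the browser, and panning/zooming triggers a re-sample on the Python side.

import numpy as np
import plotly.graph_objects as go
from plotly_resampler import FigureResampler

# Generate a large dummy time series (1M points) for illustration
n = 1_000_000
x = np.arange(n)
y = np.sin(x / 300) + np.random.randn(n) / 10

# The wrapper sends only a resampled subset of the trace to the browser;
# pan/zoom events call back into Python to re-resample the visible window
fig = FigureResampler(go.Figure())
fig.add_trace(go.Scattergl(name='measurement'), hf_x=x, hf_y=y)
fig.show_dash(mode='inline')  # interactive scrolling/zooming in a notebook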
Suppose I have a dataset with 100k rows (1000 different times, 100 different series, an observation for each, and auxiliary information). I'd like to create something like the following:
(1) the first panel of the plot has time on the x-axis, and the average of the different series (with standard error) on the y-axis.
(2) based on the time slice (vertical line) we hover over in panel 1, display a (potentially down-sampled) scatter plot of auxiliary information versus the series value at that time slice.
I've looked into a few options for this: (1) matplotlib + ipywidgets doesn't seem to handle it unless you explicitly select points via a slider, and it also doesn't translate well to HTML export. This is not ideal, but potentially workable. (2) Altair: this library is pretty sleek, but from my understanding I need to give it the whole dataset for it to handle the interactions, yet it can't handle more than about 5k data points. This would preclude my use case, correct?
Any suggestions as to how to proceed? Is what I'm asking impossible in the current state of things?
You can work with datasets larger than 5k rows in Altair, as specified in this section of the docs.
One of the most convenient solutions, in my opinion, is to install altair_data_server and then add alt.data_transformers.enable('data_server') at the top of your notebooks and scripts. This server provides the data to Altair for as long as your Python process is running, so there is no need to include all the data as part of the created chart specification, which means the 5k-row error is avoided. The main drawback is that it won't work if you export to a standalone HTML file, because you rely on being in an environment where the server's Python process is running.
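A minimal sketch of this setup (the DataFrame is a stand-in for your own data):

import altair as alt
import numpy as np
import pandas as pd

# Serve data from the running Python process instead of embedding it in the
# chart spec (requires `pip install altair_data_server`)
alt.data_transformers.enable('data_server')

# A 100k-row frame would normally trigger Altair's MaxRowsError
df = pd.DataFrame({'x': np.arange(100_000), 'y': np.random.randn(100_000)})
alt.Chart(df).mark_line().encode(x='x', y='y')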
I’ve been playing around with Plotly and Dash for the first time over the past few days, with the hope of developing a browser-based data explorer for geographic NetCDF4 data. I’ve been impressed at how straightforward this has been so far, however I’m finding that some interactions with choroplethmapbox are taking longer to update and render than expected. I believe this may be the same issue discussed here
The following refers to the code and sample data available here, where the Dash application can be run using:
python choropleth.py (Python 3.7).
The source of my data comes from a 4D NetCDF4 file (in this case a model of ocean temperature - temp.nc) with dimensions of time, depth, lat and lon. In my case I'm only plotting a 2D choropleth map, but I'd like the user to interactively select the desired time interval (and eventually depth) as well (the render will always be in 2D space).
Using the examples from here, I’m using a GeoJSON file of the 2D grid cells coupled with a Pandas DataFrame to render ocean temperature. Everything is working as expected, however any changes to the slider value (time) take a long time to update (approx six seconds on my machine). It appears as though there’s a second or so between selecting the slider value and running the update_figure() callback, then another 4-5 seconds before the new render starts to take place in the browser.
The update_figure() callback reads the requested data directly from the NetCDF4 file, then directly updates the Z values in the existing figure dictionary and returns this as a new figure (see code fragment below). At first I was concerned that the slow response time was due to reading from the NetCDF4 file, however a basic timing check shows that the update_figure() callback runs in less than 0.01 seconds in most cases. So it appears the delay is coming either from the @app.callback machinery or from the render step (post update_figure()) in Dash?
# Create the callback and callback function (update_figure)
# `app` (the Dash app) and `nc` (the opened temp.nc dataset) are defined earlier in choropleth.py
import time as tme
import numpy as np
from dash.dependencies import Output, Input, State

@app.callback(Output('plot', 'figure'),
              [Input('slide', 'value')],
              [State('plot', 'relayoutData'), State('plot', 'figure')])
def update_figure(x, r, f):
    t0 = tme.time()
    # If the map window has been panned or zoomed, grab those values for the new figure
    if r is not None and 'mapbox.center' in r:
        f['layout']['mapbox']['center']['lat'] = r['mapbox.center']['lat']
        f['layout']['mapbox']['center']['lon'] = r['mapbox.center']['lon']
        f['layout']['mapbox']['zoom'] = r['mapbox.zoom']
    # Extract the requested time slice from the NetCDF file
    tmp = nc['temp'][x, -1, :, :].values.flatten()
    # Replace the Z values in the original figure with the updated values;
    # leave everything else (e.g. cell GeoJSON and max/min ranges) as-is
    f['data'][0]['z'] = np.where(np.isnan(tmp), None, tmp).tolist()
    print("update_figure() time: ", tme.time() - t0)
    return f
I suspect that the slow render times are somehow related to the GeoJSON of each cell polygon (47k grid cell polygons are being rendered in total, with each polygon being defined by 6 points (i.e. 284k points total)), and unfortunately this can’t be simplified any further.
I'm seeking suggestions on how I can speed up the update/render when a user is interacting with the application. Two ideas I've had include:
Utilising WebGL if possible? It's unclear to me from the documentation whether choroplethmapbox already uses WebGL? If not, is there a pathway for making use of this for faster rendering?
Implementing some form of client-side callback, although I don't know if this is possible given that I need to read the values directly out of the NetCDF file when the user requests them? Perhaps it's possible to just read/return the new Z values, then merge them with the existing GeoJSON on the client side (a rough sketch of this idea follows).
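For what it's worth, here is a hypothetical sketch of that second idea (the dcc.Store id 'z-store' and the split into two callbacks are my invention, and this pair would replace the server-side figure callback above):

# A small server-side callback returns only the new Z values into a dcc.Store
# (a dcc.Store(id='z-store') must be added to the layout)
@app.callback(Output('z-store', 'data'), [Input('slide', 'value')])
def update_z(x):
    tmp = nc['temp'][x, -1, :, :].values.flatten()
    return np.where(np.isnan(tmp), None, tmp).tolist()

# A clientside callback merges the new Z values into the existing figure in
# the browser, without re-serializing the 47k-polygon GeoJSON in Python
app.clientside_callback(
    """
    function(zValues, fig) {
        if (!zValues) { return window.dash_clientside.no_update; }
        const newFig = JSON.parse(JSON.stringify(fig));
        newFig.data[0].z = zValues;  // only the temperatures change
        return newFig;
    }
    """,
    Output('plot', 'figure'),
    [Input('z-store', 'data')],
    [State('plot', 'figure')],
)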
Suggestions appreciated.
This is an open-ended question. I use a tkinter program to display large sets of data in matplotlib, but I want to make it run faster. What I thought of is to only update the data in the plot as the user moves their mouse along the x and y axes. Is there a way to hook into matplotlib such that I can render data as the mouse pans across the axes?
For example, the current view covers y = [0, 1], but data also exists at y = 2. I can write an algorithm that selects only the data within that range and passes it to the analytics code for display. I just want to know what steps exist to dynamically render data in matplotlib instead of having it all there at once. Perhaps, using the navigation toolbar, it could render when the mouse is released? The data is quite large.
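One possible starting point, as a minimal sketch (assuming the data is a sorted NumPy array; all names are made up): matplotlib's 'xlim_changed' axes callback fires after every pan/zoom, so the artist can be re-filled with only the points in the visible window.

import numpy as np
import matplotlib.pyplot as plt

# Large dataset kept in memory; only a slice of it is ever handed to the artist
x = np.linspace(0, 1000, 10_000_000)
y = np.sin(x)

fig, ax = plt.subplots()
(line,) = ax.plot(x[::1000], y[::1000])  # coarse overview to start with

def on_xlim_changed(ax):
    # Re-slice the data to the visible x-range after every pan/zoom
    lo, hi = ax.get_xlim()
    i, j = np.searchsorted(x, [lo, hi])
    step = max(1, (j - i) // 10_000)  # crude down-sampling to ~10k points
    line.set_data(x[i:j:step], y[i:j:step])
    ax.figure.canvas.draw_idle()

ax.callbacks.connect('xlim_changed', on_xlim_changed)
plt.show()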
I am trying to create a plot in Bokeh to visualize data from a live data feed. I'm fairly new to Bokeh at this point. The data stream is a stream of files, where the data for the plots first has to be extracted and then pre-processed before being visualized. This part is currently handled by the Python watchdog package, where processing is triggered by the appearance of a new file in the monitored streams.
The output of this is a dictionary holding all the information for this particular datapoint needed in the plots handled by the Bokeh app.
My question is, how would I trigger an update of the Bokeh plot when a new datapoint arrives?
I had looked into add_periodic_callback, but since I do not know up front when a new datapoint will arrive, nor how much time there will be between them, I risk missing data in the plot. What would be the best way to solve this?
1) Use functionality "x" in Bokeh that I am unaware of, that will trigger an update of the ColumnDataSource and the actual plots, exactly when a new datapoint arrives (this would be my preferred solution).
2) Create a form of buffer data source that stores the data for the past NN files, and then use add_periodic_callback with a function that queries this source to update the ColumnDataSource (see the sketch after this list).
3) Some other solution beyond the two above that I am not aware of, given my limited software development skills.
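In case it helps frame the question, here is a minimal sketch of option 2 (assuming each processed datapoint is a dict with 'x' and 'y' keys; the buffer name, rollover and period are arbitrary):

from collections import deque
from bokeh.io import curdoc
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# The watchdog handler appends each processed datapoint (a dict) here;
# deque.append()/popleft() are thread-safe, so no explicit locking is needed
buffer = deque()

source = ColumnDataSource(data=dict(x=[], y=[]))
fig = figure()
fig.line('x', 'y', source=source)

def drain_buffer():
    # Move every buffered datapoint into the ColumnDataSource; nothing is
    # missed even if several files arrived since the last tick
    while buffer:
        point = buffer.popleft()  # e.g. {'x': 3.1, 'y': 42.0}
        source.stream({'x': [point['x']], 'y': [point['y']]}, rollover=10000)

curdoc().add_root(fig)
curdoc().add_periodic_callback(drain_buffer, 200)  # poll every 200 ms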
I am trying to develop a telemetry system which sends sensor data from an Arduino, plotted in real time. For this I'm using Python and the matplotlib library. My problem is that every time a new data point arrives, I want to add it to the same figure as the existing data points. So far I have not found a solution to this.
You can stream data from an Arduino into a Plotly graph with the Arduino API in Plotly. You have two options: continuously transmit data (which it sounds like you'll want to do), or transmit a single chunk.
It will update the graph every few seconds if you refresh the page.
The Arduino API is available here. And, if you're already using Python, you can use the extend option to update data into another plot. The Python API is here.
Here's an example of how it looks to transmit from an Arduino, and you can see the interactive version here
Full disclosure: I work at Plotly.
As far as I can see, you have a few different ways of doing this (I'll list them in what I consider increasing difficulty):
Making a bitmap file, e.g. .png, which has to be regenerated each time a new datapoint arrives. To do this you need to have your old data stored somewhere, in a file or in a database (a sketch of this option follows the list).
Using svg in a browser. Then you can add on points or lines using javascript (e.g. http://sickel.net/blogg/?p=1506 )
Make a bitmap, store it and edit it to add in new points. This really gets tricky if you either want to "roll old points off" at one end, or rescale the image when more data arrives.
Make a series of bitmaps, and have the total graph be a combination of many slices. Here you can easily "roll off" old points, but you are out of luck if you want to rescale.
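A minimal sketch of the first option (the file name and datapoint shape are made up): keep the received points in memory and regenerate a .png on every arrival.

import matplotlib
matplotlib.use('Agg')  # render off-screen; we only write an image file
import matplotlib.pyplot as plt

# All received points are kept in memory (a file or database works the same way)
history_x, history_y = [], []

def on_new_datapoint(x, y, path='telemetry.png'):
    # Append the new point and re-render the whole figure to disk
    history_x.append(x)
    history_y.append(y)
    fig, ax = plt.subplots()
    ax.plot(history_x, history_y)
    ax.set_xlabel('time')
    ax.set_ylabel('sensor value')
    fig.savefig(path)
    plt.close(fig)  # free the figure; one image per update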