Efficient visualisation in Python

I have data (generated by an algorithm I wrote) for a random process consisting of coalescing and branching random walks on a finite space, which I would like to visualize in Python, probably with something from matplotlib.
The data look like this:
A list of lists of the states of the process at the times when something changes (a walk moves to an empty spot, coalesces with another one, or a new particle is born). Say the process lives on {0,1,2,3,4}; then the data look something like this:
[[0,1,2,0,2],...,[1,0,2,2,0]], so at the beginning the process has particles at positions 1, 2 and 4. There are two different kinds of particles: "1" indicates the presence of the first type, "2" the presence of the second type, while "0" means the site is empty.
I also have the list of events that alter the process, as a list of lists of the form
[place,time,type]
so I know what happens where and at what time (which corresponds to writing appropriate marks in the graphical representation, for example an arrow to the left if the event was that a particle moved to the left).
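I wrote something like this:
import matplotlib.pyplot as plt  # pylab is discouraged; pyplot provides the same plot/arrow/show calls
# Plot two invisible single points so that autoscaling covers [-spacebound, spacebound] x [0, maxtime]
plt.plot(-spacebound, 0, spacebound, maxtime)
while something in the process:  # pseudocode
    current = listofevents.pop(0)
    for i that are nonempty at current time:  # pseudocode
        plt.arrow(...)  # one separate arrow artist per call, drawn according to the data
plt.show()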
This works, but it is extremely slow: for a big process the visualization takes an enormous amount of time, while generating the process data takes a few seconds at most, even for rather extreme parameters (a big space, a long time and a high rate of particle births, which means a lot of events changing the process often).
I am pretty sure using arrows like this is pretty idiotic, but since I've only visualized things in R so far (I could of course simply export my data from python and visualize them in R but I want to avoid that) I am also very green at doing this in Python.
I did some googling, found matplotlib and worked through some of its tutorials. Apart from the arrows, I also tried visualizing just the states of the process (without the events) by looping plt.scatter() over all the states, but while this is slightly faster, it is still extremely slow and it also looks messy.
So how would I plot this in a sensible way? Even a link to something like "learn to do plotting in Python properly" is welcome as an answer. Thanks!
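Within matplotlib itself, most of the cost in a loop like the one above comes from creating one artist per arrow or per scatter call. One common alternative is to stack all recorded states into a 2-D array and draw the whole history with a single imshow call. The sketch below assumes every state list has the same length and uses only the values 0, 1 and 2 as described; the colours and the name plot_state_history are illustrative choices, not part of the original code. Note that the vertical axis is the snapshot index, not physical time.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this to use plt.show()
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap


def plot_state_history(states, ax=None):
    """Draw every recorded state in one imshow call.

    `states` is a list of equal-length lists with entries 0 (empty),
    1 (first particle type) or 2 (second particle type), one row per change.
    """
    grid = np.asarray(states, dtype=int)  # shape: (number of snapshots, number of sites)
    if ax is None:
        _, ax = plt.subplots()
    # One colour per value: 0 -> white, 1 -> blue, 2 -> red
    cmap = ListedColormap(["white", "tab:blue", "tab:red"])
    image = ax.imshow(
        grid,
        cmap=cmap,
        vmin=0,
        vmax=2,
        aspect="auto",            # fill the axes instead of using square pixels
        interpolation="nearest",  # keep sharp cell boundaries
        origin="lower",           # first snapshot at the bottom, later ones above
    )
    ax.set_xlabel("site")
    ax.set_ylabel("snapshot index")
    return image
```

Calling plt.show() (with an interactive backend) or fig.savefig(...) afterwards displays the result. If rows should sit at the actual event times rather than evenly spaced, pcolormesh accepts explicit row edges and could be used in place of imshow.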

matplotlib is primarily designed for producing static, article-quality plots rather than interactive ones. For interactive plots you could try Chaco or other libraries. The Chaco philosophy is to create a plot and link it to the data: as you update the data, your chart is updated automatically.
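If you stay with matplotlib, the arrows for the events can be batched as well: quiver draws any number of arrows as a single collection. This is a minimal sketch that assumes, hypothetically, that a move event's type is the string "left" or "right"; the question does not specify the encoding, so DIRECTION and plot_move_events are illustrative names to adapt.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# Hypothetical encoding: the question does not say how `type` is coded,
# so a move event is assumed here to be "left" or "right".
DIRECTION = {"left": -1.0, "right": 1.0}


def plot_move_events(events, ax=None):
    """Draw all move events with one quiver call instead of one arrow() per event.

    `events` is a list of [place, time, type] entries; events whose type is
    not a key of DIRECTION (births, coalescences, ...) are skipped here.
    """
    moves = [(place, time, DIRECTION[kind]) for place, time, kind in events if kind in DIRECTION]
    if not moves:
        return None
    if ax is None:
        _, ax = plt.subplots()
    x, t, dx = (np.array(column, dtype=float) for column in zip(*moves))
    # Each arrow starts at (place, time) and points one site left or right;
    # angles/scale_units/scale make the arrow length equal dx in data units.
    return ax.quiver(x, t, dx, np.zeros_like(dx),
                     angles="xy", scale_units="xy", scale=1, width=0.003)
```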

Related

Python interactive plotting for large data sets

Suppose I have a dataset with 100k rows (1000 different times, 100 different series, an observation for each, and auxiliary information). I'd like to create something like the following:
(1) The first panel has time on the x-axis and the average of the different series (with its standard error) on the y-axis.
(2) Based on the time slice (vertical line) we hover over in panel 1, display a (potentially downsampled) scatter plot of the auxiliary information versus the series value at that time slice.
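For panel (1), only the per-time summary needs to reach the plotting library, i.e. about 1000 means and standard errors rather than 100k rows. A minimal NumPy sketch, assuming the observations have already been reshaped into an array with one row per time and one column per series (mean_and_stderr is a hypothetical helper name):

```python
import numpy as np


def mean_and_stderr(values):
    """Per-time mean and standard error across series.

    `values` has shape (n_times, n_series), e.g. (1000, 100) for the data
    described above once reshaped to one row per time.
    """
    values = np.asarray(values, dtype=float)
    n_series = values.shape[1]
    mean = values.mean(axis=1)
    # Sample standard deviation (ddof=1) divided by sqrt(n) is the standard error of the mean
    stderr = values.std(axis=1, ddof=1) / np.sqrt(n_series)
    return mean, stderr
```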
I've looked into a few options for this: (1) matplotlib + ipywidgets doesn't seem to handle it unless you explicitly select points via a slider, and this also doesn't translate well to HTML export. This is not ideal, but potentially workable. (2) Altair: this library is pretty sleek, but as I understand it, I need to give it the whole dataset for it to handle the interactions, yet it can't handle more than about 5,000 data points. That would preclude my use case, correct?
Any suggestions as to how to proceed? Is what I'm asking impossible in the current state of things?
You can work with datasets larger than 5k rows in Altair, as described in the relevant section of the Altair docs.
One of the most convenient solutions, in my opinion, is to install altair_data_server and then add alt.data_transformers.enable('data_server') at the top of your notebooks and scripts. This server provides the data to Altair for as long as your Python process is running, so there is no need to include all the data in the generated chart specification, which means the 5k-row error is avoided. The main drawback is that it won't work if you export to a standalone HTML file, because you rely on being in an environment where the server's Python process is running.
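When a standalone HTML export is required and the data server is therefore not an option, another route is to reduce the rows in Python before handing them to Altair, which by default refuses to embed more than 5000 rows. A sketch of uniform random downsampling with NumPy; the function name, the default cap and the fixed seed are assumptions for illustration:

```python
import numpy as np


def downsample(*columns, max_rows=5000, seed=0):
    """Keep the same random subset of at most max_rows rows from every column.

    Row order is preserved, so time-ordered data stays ordered.
    """
    n_rows = len(columns[0])
    if any(len(column) != n_rows for column in columns):
        raise ValueError("all columns must have the same length")
    if n_rows <= max_rows:
        idx = np.arange(n_rows)
    else:
        rng = np.random.default_rng(seed)  # fixed seed -> reproducible subset
        idx = np.sort(rng.choice(n_rows, size=max_rows, replace=False))
    return tuple(np.asarray(column)[idx] for column in columns)
```

Because the same index subset is applied to every column, rows stay aligned; for the per-time-slice scatter one could instead downsample within each slice.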

How to buffer pyplot plots

TL;DR: I want to do something like
cache.append(fig.save_lines)
....
cache.load_into(fig)
I'm writing a (QML) front-end for a pyplot-like, matplotlib-based MCMC sample visualisation library, and have hit a small roadblock. I want to be able to produce and cache figures in the background, so that when the user moves some sliders, the plots aren't re-generated (they are complex and expensive to re-compute) but just brought in from the cache.
In order to do that I need to be able to do the plotting (but not the rendering) offline and then simply change the contents of a canvas. Effectively I want to do something like cache the
line = plt.plot(x,y)
object, but for multiple subplots.
The library produces very complex plots so I can't keep track of the line2D objects and use those.
My attempt at a solution: render to a pixmap with the correct DPI and use that. Issues arise if I resize the canvas and don't want to rescale the pixmaps. I've had situations where the wonderful SO community came up with much better solutions than what I had in mind, so if anyone has experience and/or ideas for how to get this behaviour, I'd be very much obliged!
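One option worth trying is to cache the Figure objects themselves rather than pixmaps: matplotlib figures can be pickled, and an unpickled figure can be rasterised again at whatever size the canvas currently has, so only the drawing is redone, not the plotting logic. This is a sketch under the assumptions that the cached figures are built with matplotlib.figure.Figure directly (not through pyplot) and are unpickled by the same matplotlib version; build_figure stands in for the library's expensive plotting code and is hypothetical. It renders to an RGBA NumPy array via the Agg canvas, which a QML front-end would still need to convert into an image.

```python
import pickle

import numpy as np
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg


def build_figure(x, y):
    """Hypothetical stand-in for an expensive plotting routine."""
    fig = Figure(figsize=(4, 3))
    ax = fig.add_subplot(111)
    ax.plot(x, y)
    return fig


def cache_figure(fig):
    """Serialise the whole figure (all subplots, lines, labels) to bytes."""
    return pickle.dumps(fig)


def render_cached(blob, width_in, height_in, dpi=100):
    """Restore a cached figure and rasterise it at the requested size."""
    fig = pickle.loads(blob)  # a fresh copy, so the cached bytes are never mutated
    fig.set_size_inches(width_in, height_in)
    fig.set_dpi(dpi)
    canvas = FigureCanvasAgg(fig)
    canvas.draw()
    return np.asarray(canvas.buffer_rgba())  # shape (height_px, width_px, 4)
```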

Can I truncate or "zoom-in" on a section of a pyplot figure before calling show, to avoid exceeding the complexity limit?

I don't have a specific code example because this is something I do a lot in an interactive session, so I'm sort of looking for a more general answer to my question.
So say I want to look at two large datasets to check if they are synchronized:
import matplotlib.pyplot as plt
plt.plot(Big_List_X)
plt.plot(list_Y_to_X_index, Big_List_Y)
plt.show()
Where list_Y_to_X_index is a list that links Big_List_Y to Big_List_X. This causes a rendering complexity error.
This is what my workaround looks like:
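plt.plot(range(0, len(Big_List_X), 100), Big_List_X[0::100])  # keep X's original positions on the x-axis
plt.plot(list_Y_to_X_index[0::100], Big_List_Y[0::100])  # thin index and values together so lengths match
plt.show()
In this case I lose precision and can't really see what's happening up close.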
Alternatively:
plt.plot(range(some_lower_boundry, some_upper_boundry), Big_List_X[some_lower_boundry:some_upper_boundry])
plt.plot(list_Y_to_X_index[some_lower_boundry:some_upper_boundry], Big_List_Y[some_lower_boundry:some_upper_boundry])
plt.show()
This only works if I have some sort of index linking one list to the other, and it feels a bit inelegant. Whatever I do, I still end up playing a guessing game to see how much data I can look at in one window without hitting the limit.
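Since list_Y_to_X_index already links the two lists, a boolean mask on that index can select exactly the Y samples that fall inside an X window, instead of slicing Y with X's bounds. A minimal NumPy sketch (window is a hypothetical helper; each returned pair is meant to be passed to plt.plot as x and y):

```python
import numpy as np


def window(big_x, y_to_x_index, big_y, lo, hi):
    """Return the parts of both series that fall in X-positions [lo, hi)."""
    big_x = np.asarray(big_x)
    idx = np.asarray(y_to_x_index)
    big_y = np.asarray(big_y)
    hi = min(hi, len(big_x))          # do not run past the end of X
    x_positions = np.arange(lo, hi)   # original x-axis positions of the X slice
    mask = (idx >= lo) & (idx < hi)   # Y samples whose X-position lies in the window
    return x_positions, big_x[lo:hi], idx[mask], big_y[mask]
```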
Is there anything I can do like:
plt.show(range = [some_lower_bound, some_upper_bound])
And am I stuck playing the guessing game of how many points I can see at once?
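Two settings may help here, as a sketch rather than a definitive fix. The closest equivalent of the hypothetical plt.show(range=...) is to set the x-limits with ax.set_xlim before showing, which zooms the view while matplotlib keeps all the data (so you can still pan or zoom out interactively). And, assuming the error is Agg's rendering-complexity ("Exceeded cell block limit") OverflowError, the agg.path.chunksize rcParam makes Agg render long lines in chunks, which matplotlib documents as a way to avoid that failure on very large datasets at the cost of possible minor artifacts. plot_zoomed and the value 10000 are illustrative choices.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; use an interactive backend for plt.show()
import matplotlib.pyplot as plt

# 0 (the default) disables chunking; values in the 10000-100000 range can
# prevent the Agg cell-block-limit failure on huge lines, possibly with minor artifacts.
matplotlib.rcParams["agg.path.chunksize"] = 10000


def plot_zoomed(big_x, y_to_x_index, big_y, lo, hi):
    """Plot both series in full, then zoom the view to x in [lo, hi]."""
    fig, ax = plt.subplots()
    ax.plot(big_x)
    ax.plot(y_to_x_index, big_y)
    ax.set_xlim(lo, hi)  # zoom before show(); the full data is still in the lines
    return fig, ax
```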

matplotlib, draw multiple graphs / points in figure

I am trying to develop a telemetry system which sends sensor data from an Arduino, plotted in real time. For this I'm using Python and the matplotlib library. My problem is that every time a new data point arrives, I want to add that data point by plotting it into the same figure as the other data points. So far I have not found a solution to this.
You can stream data from an Arduino into a Plotly graph with the Arduino API in Plotly. You have two options: continuously transmit data (which it sounds like you'll want to do), or transmit a single chunk.
It will update the graph every few seconds if you refresh the page.
The Arduino API is available here. And, if you're already using Python, you can use the extend option to update data into another plot. The Python API is here.
Here's an example of how it looks to transmit from an Arduino, and you can see the interactive version here.
Full disclosure: I work at Plotly.
As far as I can see, you have a few different ways of doing this (I'll list them in what I consider increasing order of difficulty):
(1) Make a bitmap file, e.g. a .png, which has to be regenerated each time a new data point arrives. To do this you need to store your old data somewhere, in a file or in a database.
(2) Use SVG in a browser. Then you can add points or lines using JavaScript (e.g. http://sickel.net/blogg/?p=1506 ).
(3) Make a bitmap, store it, and edit it to add new points. This gets really tricky if you want to "roll old points off" at one end, or rescale the image when more data arrives.
(4) Make a series of bitmaps and build the total graph as a combination of many slices. Here you can easily "roll off" old points, but you are out of luck if you want to rescale.
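Within matplotlib, the usual way to add a point to the same figure is to keep a single Line2D and update its data, rather than calling plot() again for every sample. A sketch, assuming samples arrive as (time, value) pairs from whatever reads the Arduino (that part is not shown); LivePlot and max_points are hypothetical names. A deque with maxlen gives the "roll old points off" behaviour mentioned above, and relim/autoscale_view handle the rescaling.

```python
from collections import deque

import matplotlib
matplotlib.use("Agg")  # headless for the sketch; use an interactive backend for a live window
import matplotlib.pyplot as plt


class LivePlot:
    """Keep one Line2D and append incoming samples to it instead of re-plotting."""

    def __init__(self, max_points=500):
        # deque(maxlen=...) drops the oldest sample automatically
        self.times = deque(maxlen=max_points)
        self.values = deque(maxlen=max_points)
        self.fig, self.ax = plt.subplots()
        (self.line,) = self.ax.plot([], [])

    def add_point(self, t, value):
        self.times.append(t)
        self.values.append(value)
        self.line.set_data(list(self.times), list(self.values))
        self.ax.relim()           # recompute data limits from the line's current data
        self.ax.autoscale_view()  # rescale the axes to those limits
        self.fig.canvas.draw_idle()
```

With an interactive backend, calling plt.pause(0.01) after each add_point gives the GUI a chance to redraw.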

How to visualize IP addresses as they change in python?

I've written a little script that collects my external IP address every time I open a new terminal window and appends it, as well as the current time, to a text file. I'm looking for ideas on a way to visualize when and how often my IP address changes. I bounce between home and campus and could separate them using the script, but it would be nice to visualize them separately.
I frequently use matplotlib. Any ideas?
Plot your IP as a point on the xkcd internet map (or some zoomed in subset of the map, to better show different but closely neighboring IPs).
Plot each point "stacked" proportional to how often you've had that IP, and color the IPs to make more recent points brighter, less recent points proportionally darker.
"When" is one dimensional temporal data, which is well shown by a timeline. At larger timescales, you'd probably lose the details, but most any plot of "when" would have this defect.
For "How often", a 2D (bar) plot of time vs. frequency, divided into buckets for each day/week/month, would be the standard way to go. A moving average might also be informative.
You could combine the timeline & bar plot, with the timeline visible when you're zoomed in & the frequency display when zoomed out.
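The "how often" bar plot needs change counts per bucket. A sketch for daily buckets, assuming (hypothetically) that each line of the log holds an ISO-8601 timestamp and the IP separated by whitespace, e.g. 2013-05-01T09:15:00 203.0.113.7; the script's real format is not given in the question, so parse_log will need adapting.

```python
from collections import Counter
from datetime import datetime


def parse_log(lines):
    """Parse 'ISO-timestamp IP' lines (hypothetical format) into sorted (datetime, ip) pairs."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        stamp, ip = line.split()
        records.append((datetime.fromisoformat(stamp), ip))
    records.sort()
    return records


def changes_per_day(records):
    """Count, per calendar day, how often the IP differs from the previous observation."""
    counts = Counter()
    for (_, prev_ip), (when, ip) in zip(records, records[1:]):
        if ip != prev_ip:
            counts[when.date()] += 1
    return counts
```

The result can then be drawn with plt.bar(list(counts), list(counts.values())), since matplotlib accepts date objects on the x-axis.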
How about a bar plot with time on the horizontal axis, where the width of each bar is the length of time your computer held a particular IP address and the height of each bar is inversely proportional to the width? That would give a "when" vs. "how often" plot in one.
You could also interpret the data as a pulse-density-modulated signal, like what you get on a Super Audio CD. You could graph this or even listen to the data. As there's no obvious time length for an IP change event, the length of a pulse would be a tunable parameter. Along similar lines, you could view the data as a square wave (or triangular wave, sawtooth, etc.), where each IP change event is a level transition. Sounds like a fun Pure Data project.
There's a section in the matplotlib user guide about drawing bars on a chart to represent ranges. I've never done that myself but it seems appropriate for what you're looking for.
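The matplotlib call for bars that represent ranges is Axes.broken_barh, which takes (start, width) pairs and a (y position, height) pair. A sketch of a timeline with one row per IP, assuming the log has already been parsed into a non-empty, time-sorted list of (datetime, ip) pairs; holding_intervals and plot_timeline are hypothetical helpers, and a span is taken to last until the next observation of a different IP, which only approximates the true change time.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


def holding_intervals(records):
    """Turn sorted (datetime, ip) observations into (ip, start, end) spans."""
    spans = []
    start_time, current_ip = records[0]
    for when, ip in records[1:]:
        if ip != current_ip:
            spans.append((current_ip, start_time, when))
            start_time, current_ip = when, ip
    spans.append((current_ip, start_time, records[-1][0]))  # last span ends at the last observation
    return spans


def plot_timeline(spans):
    """One horizontal row per IP, with a bar for each period the IP was held."""
    ips = sorted({ip for ip, _, _ in spans})
    fig, ax = plt.subplots()
    for row, ip in enumerate(ips):
        # broken_barh wants (xstart, xwidth) pairs; dates are converted to matplotlib day numbers
        ranges = [(mdates.date2num(start), mdates.date2num(end) - mdates.date2num(start))
                  for span_ip, start, end in spans if span_ip == ip]
        ax.broken_barh(ranges, (row - 0.4, 0.8))
    ax.set_yticks(range(len(ips)))
    ax.set_yticklabels(ips)
    ax.xaxis_date()  # format the x-axis as dates
    return fig, ax


if __name__ == "__main__":
    demo = [(datetime(2013, 5, 1, 9), "198.51.100.1"), (datetime(2013, 5, 1, 13), "203.0.113.7")]
    plot_timeline(holding_intervals(demo))
```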
Since you mentioned a terminal, I'll assume you are on a Unix-like system. The command-line utility tail with the -f switch lets you constantly monitor the end of a file. You could also use Linux's inotify, which can monitor file changes, or the older dnotify, which watches directories (so place the file in its own directory); these usually come standard on most distributions, and you can then call tail -n 1 to get the last line. Once the line changes, you can grab the current time in seconds since the epoch using Python's time.time(), subtract the time of the last change from it, and plot this difference using matplotlib. I assume you could categorize the times into ranges to make the graphing easier on yourself: one bar for change intervals of less than 1 hour, another for changes between 1 and 5 hours, and so on.
There is a Python implementation of tail -f located here if you don't want to call tail directly. Upon detecting a change in the file, you could perform the steps above.
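The bucketing described above (less than 1 hour, 1 to 5 hours, and so on) can be sketched with NumPy, assuming the change times were recorded as seconds since the epoch, as time.time() returns. The bucket edges beyond 5 hours, the label strings and the name bucket_intervals are illustrative choices.

```python
import numpy as np

# Upper edges in hours: < 1 h, 1-5 h, 5-24 h, and everything from 24 h up
UPPER_EDGES_HOURS = np.array([1.0, 5.0, 24.0])
LABELS = ["< 1 h", "1-5 h", "5-24 h", ">= 24 h"]


def bucket_intervals(change_times):
    """Count gaps between consecutive IP-change timestamps (seconds since the epoch) per bucket."""
    times = np.sort(np.asarray(change_times, dtype=float))
    gaps_hours = np.diff(times) / 3600.0
    # side="right" puts a gap of exactly 1 h into "1-5 h", 5 h into "5-24 h", and so on
    bucket = np.searchsorted(UPPER_EDGES_HOURS, gaps_hours, side="right")
    counts = np.bincount(bucket, minlength=len(LABELS))
    return dict(zip(LABELS, counts.tolist()))
```

The resulting dictionary can be drawn directly with plt.bar(list(buckets), list(buckets.values())).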
