live plotting in matplotlib while performing a measurement that takes time - python

I would like to perform a measurement and plot a graph while the measurement is running. The measurement takes quite some time in Python (it has to retrieve data over a slow connection), and the problem is that the graph freezes while measuring. The measurement consists of setting a center wavelength and then measuring some signal.
My program looks something like this:
# this is just some arbitrary library that has the functions set_wavelength
# and perform_measurement
from measurement_module import set_wavelength, perform_measurement
from pylab import *

xdata = np.linspace(600, 1000, 30)  # this will be the x axis
ydata = np.zeros(len(xdata))        # this will be the y data, filled in during the loop
for i in range(len(xdata)):
    # this call takes approx 1 s
    set_wavelength(xdata[i])
    # this takes approx 10 s
    ydata[i] = perform_measurement(xdata[i])
    # now I would like to plot the measured data
    plot(xdata, ydata)
    draw()
This will work when it is run in IPython with the -pylab switch, but while the measurement is running the figure freezes. How can I modify the behaviour to get an interactive plot while measuring?
You cannot simply use pylab.ion(), because Python is busy while performing the measurements.
regards,
Dirk

You can, though it is maybe a bit awkward, run the data gathering as a separate process. I find Popen in the subprocess module quite handy. Let that data-gathering script save what it measures to disk somewhere, and use Popen.poll() to check whether it has completed.
It ought to work.
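A minimal sketch of that approach, assuming a hypothetical worker script measure.py that appends comma-separated "wavelength,signal" lines to results.csv (both file names are made up for illustration):

import subprocess
import numpy as np
import pylab as p

proc = subprocess.Popen(['python', 'measure.py', 'results.csv'])

while proc.poll() is None:                    # None means the worker is still running
    try:
        data = np.atleast_2d(np.loadtxt('results.csv', delimiter=','))
    except (IOError, ValueError):             # file missing or caught mid-write
        data = np.empty((0, 2))
    if data.size and data.shape[1] == 2:
        p.cla()
        p.plot(data[:, 0], data[:, 1], 'b.-')
        p.draw()
    p.pause(1.0)                              # handles GUI events while waiting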

I recommend buffering the data in large chunks and rendering/re-rendering when the buffer fills up. If you want it to be non-blocking, look at greenlets.

from gevent.greenlet import Greenlet
import copy

def render(buffer):
    '''
    do rendering stuff
    '''
    pass

# connection and not_finished are placeholders for your data source
buff = ''
while not_finished:
    buff = connection.read()
    g = Greenlet(render, copy.deepcopy(buff))
    g.start()

Slow input and output is the perfect time to use threads and queues in Python. Threads have their limitations, but this is a case where they work easily and effectively.
Outline of how to do this:
Generally the GUI (e.g., the matplotlib window) needs to be in the main thread, so do the data collection in a second thread. In the data thread, check for new data coming in (and if you do this in some kind of infinite polling loop, put in a short time.sleep to release the thread occasionally). Then, whenever needed, let the main thread know that there's some new data to be processed/displayed. Exactly how to do this depends on the details of your program and your GUI. You could just use a flag in the data thread that you check from the main thread, or a threading.Event, or, e.g., if you have a wx backend for matplotlib, wx.CallAfter is easy.
I recommend looking through one of the many Python threading tutorials to get a sense of it; threading with a GUI usually has a few extra issues, so do a quick search on threading with your particular backend. This sounds cumbersome as I explain it so briefly, but it's really pretty easy and powerful, and will be smoother than, e.g., reading and writing to the same file from different processes.
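Here is a minimal sketch of that outline, reusing the question's set_wavelength/perform_measurement and a queue.Queue to hand data to the main thread (I pass a single wavelength to perform_measurement, which is presumably what the question's code intended):

import queue
import threading
import numpy as np
import matplotlib.pyplot as plt
from measurement_module import set_wavelength, perform_measurement

data_q = queue.Queue()

def acquire(wavelengths):
    # Worker thread: talks to the instrument, never touches the GUI.
    for wl in wavelengths:
        set_wavelength(wl)                         # ~1 s
        data_q.put((wl, perform_measurement(wl)))  # ~10 s

xdata = np.linspace(600, 1000, 30)
worker = threading.Thread(target=acquire, args=(xdata,), daemon=True)
worker.start()

xs, ys = [], []
fig, ax = plt.subplots()
line, = ax.plot([], [], 'o-')
while worker.is_alive() or not data_q.empty():
    try:
        wl, y = data_q.get(timeout=0.1)  # short wait instead of busy-polling
        xs.append(wl)
        ys.append(y)
        line.set_data(xs, ys)
        ax.relim()
        ax.autoscale_view()
    except queue.Empty:
        pass
    plt.pause(0.05)                      # lets the GUI process its events
plt.show()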

Take a look at Traits and Chaco, Enthought's type system and plotting library. They provide a nice abstraction to solve the problem you're running into. A Chaco plot will update itself whenever any of its dependencies change.

Related

Sample data for an ADC

I applied the following code (https://circuitpython.readthedocs.io/projects/ads1x15/en/latest/examples.html) to read voltage.
In the last line of the Python code, I have set the time.sleep command to (1/3300) s.
I have the following queries:
In the time column, the time-step comes out to be approximately 0.02 s, while the expected time-step is (1/3300) s. Why does this occur?
How do I ensure that the time-step, i.e. the sampling interval between two successive data points, corresponds to exactly 3300 Hz?
How do I ensure that the first time-data point starts at "0"?
Can somebody please clarify my doubts?
The sampling rate of the ADS1015 is 3300 samples/second only in continuous mode, sampling one channel at a time.
There are 2 steps here:
Ensure your ADC is in continuous sampling mode. Putting it in continuous mode would be something like adc.mode = 0, provided your library supports it. I have used this one, https://github.com/adafruit/Adafruit_ADS1X15, and it does support it.
Ensure that the data rate in the config register is set to 3300 (page 16 of the datasheet at https://cdn-shop.adafruit.com/datasheets/ads1015.pdf).
Even that alone would mostly not be enough: getting to the full potential of the ADC also needs a host that can handle large amounts of data on its I2C bus. Something like a Raspberry Pi is mostly not powerful enough.
Using faster languages like C/C++ would also help.
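A sketch of those two steps against the legacy library linked above; start_adc()/get_last_result()/stop_adc() are the method names from that library's examples, so double-check them against your version:

import time
import Adafruit_ADS1x15   # the library linked above (now deprecated)

adc = Adafruit_ADS1x15.ADS1015()

# Continuous conversion on channel 0 at 3300 samples/s.
adc.start_adc(0, gain=1, data_rate=3300)

t0 = time.perf_counter()
reads = []
while time.perf_counter() - t0 < 1.0:    # poll for one second
    reads.append(adc.get_last_result())  # latest completed conversion

adc.stop_adc()
# Note: get_last_result() just reads the conversion register, so this
# loop can return the same conversion twice or skip some; the Pi's I2C
# bus speed, not the ADC, is usually the limiting factor.
print(len(reads), 'reads in 1 s')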
You have at least 3 problems and need to read the time module docs.
time.time is not guaranteed to be accurate to better than a second. In the following, in the IDLE Shell on Windows 10, multiple time.time() calls return the same time.
>>> for i in range(30):
        print(time.perf_counter(), time.time(), time.perf_counter())
8572.4002846 1607086901.7035756 8572.4002855
8572.4653746 1607086901.756807 8572.4653754
8572.4706208 1607086901.7724454 8572.4706212
8572.4755909 1607086901.7724454 8572.4755914
8572.4806756 1607086901.7724454 8572.4806759
... # time.time continues repeating 3 or 4 times.
time.sleep(t) has a minimum system-dependent interval, even if t is much smaller. On Windows, it is about .015 seconds. There is no particular upper limit if there is other system activity.
>>> for i in range(5):
        print(time.perf_counter())
        time.sleep(.0000001)
9125.1041623
9125.1188101
9125.134417
9125.1565579
9125.1722012
Printing to IDLE's shell is slower than running a program directly with Python (from the command line) and printing to the system console. For one thing, IDLE runs user code in a separate process, adding interprocess overhead. For another, IDLE is a GUI program, and the GUI framework, tk via tkinter, adds more overhead. IDLE is designed for learning Python and developing Python programs; it is not optimized for running Python programs.
If user code outputs to a tkinter GUI it creates in the same process, avoiding the interprocess delay, the minimum interval is much shorter: about .0012 seconds in this particular example.
>>> import tkinter as tk
>>> r = tk.Tk()
>>> t = tk.Text(r)
>>> t.pack()
>>> for i in range(5):
        t.insert('insert', f'{time.perf_counter()}\n')
        r.update()
# In the text widget...
9873.6484271
9873.6518752
9873.6523338
9873.6527421
9873.6532307

How can I measure the coverage (in a production system)?

I would like to measure the coverage of my Python code which gets executed in the production system.
I want an answer to this question:
Which lines get executed often (hot spots) and which lines are never used (dead code)?
Of course this must not slow down my production site.
I am not talking about measuring the coverage of tests.
I assume you are not talking about test-suite code coverage, which the other answer refers to. That is indeed a job for CI.
If you want to know which code paths are hit often in your production system, then you're going to have to do some instrumentation / profiling. This will have a cost. You cannot add measurements for free. You can do it cheaply though and typically you would only run it for short amounts of time, long enough until you have your data.
Python has cProfile to do full profiling, measuring call counts per function etc. This will give you the most accurate data but will likely have relatively high impact on performance.
Alternatively, you can do statistical profiling, which basically means you sample the stack on a timer instead of instrumenting everything. This can be much cheaper, even at a high sampling rate! The downside, of course, is a loss of precision.
Even though it is surprisingly easy to do in Python, this stuff is still a bit much to put into an answer here. There is an excellent blog post by the Nylas team on this exact topic though.
The sampler below was lifted from the Nylas blog with some tweaks. After you start it, it fires an interrupt every millisecond and records the current call stack:
import collections
import signal

class Sampler(object):
    def __init__(self, interval=0.001):
        self.stack_counts = collections.defaultdict(int)
        self.interval = interval

    def start(self):
        # Note: SIGVTALRM pairs with ITIMER_VIRTUAL (CPU-time based).
        signal.signal(signal.SIGVTALRM, self._sample)
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)

    def _sample(self, signum, frame):
        stack = []
        while frame is not None:
            formatted_frame = '{}({})'.format(
                frame.f_code.co_name,
                frame.f_globals.get('__name__'))
            stack.append(formatted_frame)
            frame = frame.f_back
        formatted_stack = ';'.join(reversed(stack))
        self.stack_counts[formatted_stack] += 1
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)
You inspect stack_counts to see what your program has been up to. This data can be plotted as a flame graph, which makes it really obvious in which code paths your program is spending the most time.
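For completeness, usage looks roughly like this (run_production_workload is a placeholder for whatever your process actually does):

sampler = Sampler()
sampler.start()

run_production_workload()   # placeholder for the code you want to profile

# The most frequently sampled stacks are your hottest code paths.
# Note: ITIMER_VIRTUAL counts CPU time, so code that is sleeping or
# waiting on I/O is never sampled.
for stack, count in sorted(sampler.stack_counts.items(),
                           key=lambda kv: kv[1], reverse=True)[:10]:
    print(count, stack)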
If I understand it right, you want to learn which parts of your application are used most often by users.
TL;DR:
Use one of the metrics frameworks for Python if you do not want to do it by hand. Some of them are listed below:
DataDog
Prometheus
Prometheus Python Client
Splunk
It is usually done at function level, and it actually depends on the application.
If it is a desktop app with internet access:
You can create a simple DB and collect how many times your functions are called. To accomplish this, you can write a simple helper and call it inside every function that you want to track (a decorator sketch follows below). After that you can define an asynchronous task to upload your data over the internet.
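A minimal sketch of such a call counter, using a decorator instead of editing every function body (all names are illustrative):

import collections
import functools

call_counts = collections.Counter()

def tracked(func):
    # Count every call; a background task can periodically upload
    # call_counts to your DB/server.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__qualname__] += 1
        return func(*args, **kwargs)
    return wrapper

@tracked
def open_document():
    pass

open_document()
print(call_counts)    # Counter({'open_document': 1})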
If it is a web application:
You can track which functions are called from JS (mostly preferred for user-behaviour tracking) or from the web API. It is good practice to work from the outside in. First detect which endpoints are frequently called; if you are using a proxy like nginx, you can analyze the server logs to gather this information, which is the easiest and cleanest way (a sketch follows below). After that, insert a logger into every other function that you want to track and simply analyze your logs every week or month.
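For the nginx route, counting endpoint hits from the access log can be as simple as this (it assumes the default "combined" log format, where the request path is the seventh whitespace-separated field):

import collections

hits = collections.Counter()
with open('/var/log/nginx/access.log') as log:
    for line in log:
        parts = line.split()
        if len(parts) > 6:          # skip malformed lines
            hits[parts[6]] += 1     # the request path

for path, count in hits.most_common(20):
    print(count, path)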
But if you want to analyze your production code line by line (it is a very bad idea), you can start your application under a Python profiler. Python already ships with one: cProfile.
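For a bounded experiment, a run could look like this (handle_requests is an illustrative entry point, not a real API; the command-line equivalent is "python -m cProfile -o prod.prof myapp.py"):

import cProfile
import pstats

cProfile.run('handle_requests(1000)', 'prod.prof')

stats = pstats.Stats('prod.prof')
stats.sort_stats('cumulative').print_stats(20)   # top 20 by cumulative time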
Maybe make a text file and, in every method of your program, append a short marker to it, like "Method one executed". Run the web application about 10 times, as thoroughly as a viewer would, and then write a Python program that reads the file, counts each specific marker (or even a pattern), and outputs the totals.

How to measure the sample rate in Python when getting data from an ADC?

I am working on a project using a Raspberry Pi 3 B where I get data from an IR sensor (Sharp GP2Y0A21YK0F) through an ADC (MCP3008) and display it in real time using the PyQtGraph library.
However, it seems that I am getting very few samples and the graph is not as "smooth" as I expect.
I am using the Adafruit Python MCP3008 Library and the function mcp.read_adc(0) to get the data.
Is there a way to measure the sample rate in Python?
Thank you
Hugo Oliveira
I would suggest setting up some next-level buffering, ideally via multiprocessing (see multiprocessing and GUI updating - Qprocess or multiprocessing?), to get a better handle on how fast you can access the data. Currently you're using a QTimer to poll with, which is only getting 3 raw reads every 50 msec, so you're REALLY limiting yourself artificially via the timer.
I haven't used the MCP3008, but a quick look at their code suggests you'll have to set up some sample testing to try things out, or investigate further for better documentation. The question is the behavior of the mcp.read_adc(0) method: is it blocking or non-blocking, and if non-blocking, does it return stale data when there's no new data, etc.?
It would be ideal if it were blocking: from a timing sense, you could just set up a loop on it and time-delta each successive return to determine how fast you're able to get new samples (see the sketch below). If it's non-blocking, you would want it to return null for no new samples, and only return actual samples that are new. You'll have to play around with it and see how it behaves.
At any rate, once you get the secondary thread set up to just poll mcp.read_adc(0), you can use the update() timer to collect the latest buffer and plot it. I also don't know the implications of multithreading/multiprocessing on the Raspberry Pi (see the general discussion here: Multiprocessing vs Threading Python), but anything should be better than the QTimer polling.
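For the direct question of measuring the effective sample rate, a simple sketch, assuming the same Adafruit library as in the question (the software-SPI pin numbers are illustrative; use your own wiring):

import time
import Adafruit_MCP3008

mcp = Adafruit_MCP3008.MCP3008(clk=18, cs=25, miso=23, mosi=24)

N = 1000
t0 = time.perf_counter()
for _ in range(N):
    mcp.read_adc(0)
dt = time.perf_counter() - t0
print('{} reads in {:.3f} s -> ~{:.0f} samples/s'.format(N, dt, N / dt))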

Python minecraft pyglet glClear() skipping frames

I recently downloaded fogleman's excellent "Minecraft in 500 lines" demo from https://github.com/fogleman/Craft. I used the 2to3 tool and corrected some details by hand to make it runnable under Python 3. I am now wondering about the call to self.clear() in the render method. This is my modified rendering method, which is called every frame by pyglet:
def on_draw(self):
    """ Called by pyglet to draw the canvas.
    """
    frameStart = time.time()
    self.clear()
    clearTime = time.time()
    self.set_3d()
    glColor3d(1, 1, 1)
    self.model.batch.draw()
    self.draw_focused_block()
    self.set_2d()
    self.draw_label()
    self.draw_reticle()
    renderTime = time.time()
    self.clearBuffer.append(str(clearTime - frameStart))
    self.renderBuffer.append(str(renderTime - clearTime))
As you can see, I took the execution times of self.clear() and of the rest of the rendering method. The call to self.clear() invokes this method of pyglet, which can be found at .../pyglet/window/__init__.py:
def clear(self):
    '''Clear the window.

    This is a convenience method for clearing the color and depth
    buffer. The window must be the active context (see `switch_to`).
    '''
    gl.glClear(gl.GL_COLOR_BUFFER_BIT | gl.GL_DEPTH_BUFFER_BIT)
So I basically make a call to glClear().
I noticed some frame drops while testing the game (at 60 FPS), so I added the above code to measure the execution time of the commands, and especially that of glClear(). I found out that the rendering itself never takes longer than 10 ms. But the duration of glClear() is a bit of a different story. Here is the distribution for 3 measurements under different conditions:
[Figure: duration of glClear() under different conditions]
The magenta lines show the time limit of a frame, so everything beyond the first line means there was a frame drop.
The execution time of glClear() seems to have some kind of "echo" after the first frame expires. Can you explain why? And how can I make the call faster?
Unfortunately I am not an OpenGL expert, so I am thankful for any advice, guys. ;)
Your graph is wrong. Well, at least it's not a suitable graph for the purpose of measuring performance. Don't ever trust a gl* function to execute when you tell it to, and don't ever trust it to execute as fast as you'd expect it to.
Most gl* functions aren't executed right away; they couldn't be. Remember, we're dealing with the GPU, and telling it to do stuff directly is slow. So, instead, we write a to-do list (a command queue) for the GPU, and dump it into VRAM when we really need the GPU to output something. This "dump" is part of a process called synchronisation, and we can trigger one with glFlush. Though OpenGL is user friendly (compared to, say, Vulkan, at least), and as such it doesn't rely on us to explicitly flush the command queue. Many gl* functions, exactly which depends on your graphics driver, will implicitly synchronise the CPU and GPU state, which includes a flush.
Since glClear usually initiates a frame, it is possible that your driver thinks it'd be good to perform such an implicit synchronisation. As you might imagine, synchronisation is a very slow process and blocks CPU execution until it's finished.
And this is probably what's going on here. Part of the synchronisation process is to perform memory transactions (like glBufferData, glTexImage*), which are probably queued up until they're flushed with your glClear call. Which makes sense in your example; the spikes that we can observe are probably the frames after you've uploaded a lot of block data.
But bear in mind, this is just pure speculation, and I'm not a total expert on this sort of stuff either, so don't trust me on the exact details. This page on the OpenGL wiki is a good resource on this sort of thing.
Though one thing is certain, your glClear call does not take as long as your profiler says it does. You should be using a profiler dedicated to profiling graphics.
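That said, if you just want less misleading numbers from your existing time.time() instrumentation, one crude diagnostic trick (my suggestion, not something from pyglet) is to force a sync point before each timestamp; the forced sync itself costs performance, so only do this while investigating:

import time
from pyglet import gl

def synced_time():
    # glFinish blocks until all previously issued GL commands have
    # completed, so the timestamp reflects finished GPU work rather
    # than whenever the driver decided to flush its queue.
    gl.glFinish()
    return time.time()

# In on_draw(), replace the time.time() calls with synced_time() to
# see where the GPU time is actually being spent.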

Update live plot in SLOW loop without plot greying out

Sometimes, I do live updates to a plot in a loop. Normally this works fine, but when the processing within the loop takes a long time, the plot 'greys out'/sleeps for all but the first 10 seconds of this time. This can be quite annoying, as it typically makes it impossible to distinguish the curves (I could use dotted lines, of course, but...). I'm using Ubuntu, and about 10 seconds is the threshold where this starts to happen for me.
Below is some toy code to reproduce the problem, and some pictures to demonstrate what happens.
Is there an easy way to prevent this 'greying out' behaviour?
import numpy as np
import pylab as p
import time

def create_data(i):
    time.sleep(10)  # INCREASE THIS VALUE TO MAKE THE PLOT GREY OUT WHILE IT WAITS
    return np.sin(np.arange(i) * 0.1)

def live_plot(y):
    p.cla()
    p.plot(y)
    p.plot(y**2)
    p.draw()
    p.pause(0.01)

for i in range(1000):  # was xrange; use range on Python 3
    y = create_data(i)
    live_plot(y)
The issue is that the GUI window becomes non-responsive from the PoV of the window manager so it 'helpfully' grays it out to tell you this. There may be a setting in gnome/unity (not sure which you are using) to disable this. (system dependent, maybe impossible :( )
One solution is to push the computation to another thread/process which will allow you to use a blocking show to leave the GUI main loop responsive. (complex :( )
Another solution is to periodically poke the GUI loop from within your slow loop. I think calling fig.canvas.flush_events() should be sufficient; see the sketch below. (dirty :( )
In short, GUIs (and asynchronous stuff in general) are hard to get right.
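A sketch of the third option applied to the toy code above:

import time
import numpy as np
import pylab as p

fig = p.figure()

def create_data(i):
    # Same slow "computation" as in the question, but poke the GUI
    # event loop every half second so the window manager keeps
    # treating the window as responsive.
    for _ in range(20):
        time.sleep(0.5)               # stand-in for the real work
        fig.canvas.flush_events()
    return np.sin(np.arange(i) * 0.1)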
In Ubuntu 18.04, go to "search program" and type "compiz" in the search box. When the CCSM (CompizConfig Settings Manager) window comes up, look for and uncheck the "fade window" box.
