Scheduling non-periodic events with multiple threads - python

I am attempting to develop a GUI program in Python (using PyQt5) to interact with a data acquisition device (DAQ) that will be connected via LAN or USB to a Windows PC. On the click of a button (in the GUI), the DAQ will perform a test.
Each "Test" will consist of collecting a reading (collecting a reading takes about 1.5 seconds) at user-defined intervals from the start of the test (e.g., 0.1 min, 0.2 min, 0.5 min, 1 min, 2 min, 5 min ... 1000 min, etc.). A reading is collected by execution of a function, so code for a single test might look like this:
import time

t = [0, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]  # times from start of test to collect readings at (min)
intervals = [t[i] - t[i-1] for i in range(1, len(t))]  # time delta between readings (min)

def GetReading():
    # some code to connect to the DAQ (using pyvisa) and collect reading
    reading = ['2020-01-02 17:33:33', 1.23456]  # the data returned from the DAQ
    return reading

def RunTest(r):
    results = [GetReading()]  # get the initial (t=0) reading
    ReadTime = 1.5  # time in seconds to collect 1 reading (I may use an implementation of
                    # time.run_process() or similar to actually calculate this instead)
    for j in r:
        time.sleep(j*60 - ReadTime)
        results.append(GetReading())
    return results

RunTest(intervals)
The DAQ can only perform one reading at a time. I would like to be able to run multiple tests simultaneously, and have my program automatically wait and start a new test when it is feasible (i.e., delay starting a test on click if another test is already running).
It is imperative that the first, say, 5 readings happen on time, but subsequent readings of a given test can be delayed a bit without affecting the quality of the test. For example, if a test is at the 0.2 min reading interval and the user initiates a new test, the program would wait until the current test completed, say, the 5 min reading before starting the additional test sequence.
Subsequent readings beyond the 5 min reading could be delayed to collect the first 5 readings of a new test sequence, or collect a reading from another test.
I'm struggling with how to program this, conceptually. I think I need to use multiprocessing or similar to allow multiple tests to be run in parallel (though no actual parallel readings can occur). Or perhaps I can use a scheduler? I'm just not sure how to implement either of these; I've never used them before, and I'm having trouble understanding the examples I find in the context of my problem.
Furthermore, I need to be able to access the results (the output from RunTest) between calls to GetReading() (e.g., to view data as the test progresses), and using time.sleep wouldn't allow that.
UPDATE
The measurement the DAQ is collecting is deformation, via an LVDT.
The time zero in var t is not actually the button click supplied by the user. On button click, the DAQ will open the specified channel and the program will monitor for a change in deformation above a certain threshold. The user will then physically start the test (which involves adding a weight to some material, to measure stress-strain properties), and time zero will occur at reading i-1, where i is the first reading at which a change above the threshold is detected (i.e., t=0 corresponds to the zero-deformation reading taken the instant before the weight is added). I need the whole process from button click, to adding the weight, to collecting up to the 5 minute reading to be uninterrupted for a single test (deformation occurs most rapidly, and potentially erratically, in the first 5 minutes or so).
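For clarity, a minimal sketch of that trigger logic (the threshold value, polling approach, and function name are illustrative assumptions, not part of my actual code):

def WaitForTestStart(threshold=0.005):
    """Poll the DAQ until the deformation reading jumps by more than
    `threshold`, then return the reading taken just before the jump,
    which serves as the t=0 (zero-deformation) reading."""
    previous = GetReading()
    while True:
        current = GetReading()
        if abs(current[1] - previous[1]) > threshold:
            return previous  # t=0 is the reading before the weight was added
        previous = current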

The code below works, but it doesn't ensure that the first measurements of a new test are prioritized.
If that is essential, the solution will be a little more difficult.
To be sure that only one function / thread is reading data at a given time, you can use a mutex (threading.Lock):
from threading import Lock

read_lock = Lock()

def get_data():
    with read_lock:
        # some code to connect to the DAQ (using pyvisa) and collect reading
        reading = ['2020-01-02 17:33:33', 1.23456]  # the data returned from the DAQ
    return reading
I'd propose writing a function that fetches the result and appends it to a results list.
Any object modified by one thread and read by another should be protected with a Lock, so there is a second lock to avoid simultaneous reading / writing of results:
results_lock = Lock()

def get_and_store_data(results):
    result = get_data()
    with results_lock:
        results.append(result)
You can schedule a get_and_store_data action with threading.Timer. Below is the full code example:
import datetime
import random
import time
from threading import Lock
from threading import Timer

t = [0, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]  # times from start of test to collect readings at (min)

read_lock = Lock()
results_lock = Lock()

def get_data():
    with read_lock:
        time.sleep(1.5)
        # some code to connect to the DAQ (using pyvisa) and collect reading
        reading = [
            datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            random.randint(0, 10000) / 1000,
        ]
    return reading

def get_and_store_data(results):
    result = get_data()
    with results_lock:
        results.append(result)

# schedule measures for one test
def schedule_measures(measure_times, results):
    timers = []
    for t in measure_times:
        timer = Timer(t, get_and_store_data, args=[results])
        timer.start()
        timers.append(timer)

def main():
    results = []
    meas_times = [tim * 1 for tim in t]  # t is in minutes; use tim * 60 for real tests (Timer takes seconds)
    schedule_measures(meas_times, results)
    while True:
        msg = "Please press enter to display results or q and enter to quit"
        choice = input(msg).strip()
        if choice == "q":
            break
        print("Results:")
        with results_lock:
            for result in results:
                print(result)

main()
If you want to reduce the 'drift' between measurements, then you could do something like:
import time

def schedule_measures(measure_times, results):
    timers = []
    t_0 = time.time()
    for t in measure_times:
        now = time.time()
        timer = Timer(t - (now - t_0), get_and_store_data, args=[results])
        timer.start()
        timers.append(timer)
The drift will probably be low enough anyway, but it's a neat trick if you have more CPU-intensive actions in your scheduling function or if you do not want to schedule all events at startup.
For prioritizing measurements, it might be easier to create a sorted list of lists with the calculated times at which each measurement should be performed, and to start the next timer only when the previous timer has fired. There would have to be some logic to decide which measurement should be scheduled next. I don't have time now, but will probably come back within the next 12 hours with a suggested algorithm; a rough sketch of the idea follows.
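A possible shape for it (illustrative only, reusing get_data and results_lock from the example above: a heapq-based schedule where the first five readings of each test carry a priority flag, and a single worker thread decides what to read next):

import heapq
import time
from threading import Lock

READ_TIME = 1.5      # approximate seconds one reading takes
queue = []           # heap of (due_time, priority, test_id, index); priority 0 = first 5 readings
queue_lock = Lock()

def add_test(test_id, measure_times_s):
    """Push every reading of one test onto the queue; the first 5 get priority 0."""
    t_0 = time.time()
    with queue_lock:
        for i, t_rel in enumerate(measure_times_s):
            heapq.heappush(queue, (t_0 + t_rel, 0 if i < 5 else 1, test_id, i))

def next_reading():
    """Choose the next reading: a priority reading due within one READ_TIME
    pre-empts an earlier non-priority reading (illustrative policy only)."""
    with queue_lock:
        if not queue:
            return None
        entries = sorted(queue)
        chosen = entries[0]
        if chosen[1] == 1:  # non-priority: check for a priority reading due soon
            for e in entries[1:]:
                if e[1] == 0 and e[0] - chosen[0] < READ_TIME:
                    chosen = e
                    break
        queue.remove(chosen)
        heapq.heapify(queue)
        return chosen

def scheduler_loop(results):
    """Single worker thread, so only one reading ever runs at a time."""
    while True:
        entry = next_reading()
        if entry is None:
            break
        due, _, test_id, i = entry
        time.sleep(max(0.0, due - time.time()))
        reading = get_data()
        with results_lock:
            results.append((test_id, i, reading))

A real implementation would also need to wake the worker when a new test is added mid-sleep (e.g. with a threading.Event) rather than sleeping blindly until the next due time.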

Related

Outputing analog voltage continuously to NI DAQ modules with nidaqmx-python

This is about working with the nidaqmx-python package, maintained by National Instruments for interfacing with their acquisition modules.
Specs: NI cDAQ-9178 with NI 9264 output card plugged into it. Package nidaqmx-python in a conda virtual environment for Python 3.7 on a machine running Windows 10.
Overarching goal: read an input voltage continuously and, after lowpass filtering it in real time with an IIR filter, output some PID-calculated analog voltage to continuously drive some machine (irrelevant which one).
Concrete goal right now: understand how to make the best possible use of nidaqmx-python high-level functions and callbacks to continuously output voltage through my cDAQ in a way that is efficient, with the PC buffer correctly managed, and to understand how this happens.
Knowledge: I'm OK in Python but I have only been playing with the nidaqmx-python package for some weeks now. I have successfully managed to use the built-in callback mechanism which allows continuous reading of an analog signal at some sampling rate, and thought it would then be straightforward to do the writing part. It seems it is not, and I am struggling with it, although I have read the (not very friendly?) documentation for the package, here.
Issues: with the code below, which seemed like a good and easy way to get to know these functions, I simply try to increase the values in an array, data, representing the voltage to be output, and then with the function register_every_n_samples_transferred_from_buffer_event (documented here) I have the callback my_callback called every time the device has read 10 samples from the PC buffer. That callback does something simple: it uses write_many_sample to write data to the PC buffer. I wanted to use this simple example to check whether, with these parameters, I could get from 0 to 5 V in 5 seconds (seeing as I increase by 0.01 V every 10 ms, because the rate is 1000 Hz and the callback is called every 10 transferred samples, i.e. at 100 Hz). This fails: I go from 0 to 5 V in approximately 25 seconds (checked with a multimeter).
Code:
# Continuous write single channel
import numpy as np
import nidaqmx
from nidaqmx.stream_writers import AnalogSingleChannelWriter
from nidaqmx import constants

datasize = 10        # I guess this ought to be the same as rate_callback
bufsize = 10         # issues warnings as is; can be increased to stop warnings
rate_outcfg = 1000   # chosen at random; constraint is to update output at 100 Hz, so whatever works would be fine here
rate_callback = 10   # basically rate_outcfg/100 as I would like to update output at 100 Hz (note setting it to that does not work)

# ISSUE: it seems instead of refreshing voltage every second it updates every bufsize/10 seconds, if counter_limit = 100
# This means there is something I am missing
counter_limit = 1    # 1 to update every callback call (which is supposed to be at 100 Hz rate)

data = np.empty((bufsize,))  # cannot be vertical for nidaqmx to work
data[:] = 0                  # starting voltage in Volts

counter = 0

def my_callback(task_idx, event_type, num_samples, callback_data):
    global counter
    if counter == counter_limit:  # with 100, voltage will change at 1 Hz given the above parameters (should be configured better)
        counter = 0
        data[:] = data[:] + 0.01
    else:
        counter = counter + 1
    stream.write_many_sample(data, timeout=constants.WAIT_INFINITELY)
    return 0

def setTask(t):
    t.ao_channels.add_ao_voltage_chan("cDAQ2Mod8/ao0")
    t.timing.cfg_samp_clk_timing(rate=rate_outcfg, sample_mode=nidaqmx.constants.AcquisitionType.CONTINUOUS,
                                 samps_per_chan=bufsize)  # last arg is the buffer size for continuous output

task = nidaqmx.Task()
setTask(task)
stream = AnalogSingleChannelWriter(task.out_stream, auto_start=False)  # with auto_start=True it complains
# Call the my_callback function every time rate_callback samples are read by the device from the PC buffer
task.register_every_n_samples_transferred_from_buffer_event(rate_callback, my_callback)
stream.write_many_sample(data)  # first manual write to buffer, required otherwise it complains it can't start
task.start()
input('hey')  # task runs for as long as ENTER is not pressed
task.close()  # important otherwise when re-running the code it says the specified device is reserved!
# NOTE somehow once ENTER is pressed it takes some seconds to actually stop if bufsize is very large, I don't know why
Notes:
It seems the timing depends on the bufsize: doubling it to 20 results in reaching 5 Volts in approximately 50 seconds instead of 25 s.
Unless I make my buffer very large, I get a warning at every callback call:
C:\Users\james\anaconda3\envs\venv37_drift_null\lib\site-packages\nidaqmx\errors.py:141: DaqWarning:
Warning 200015 occurred.
While writing to the buffer during a regeneration, the actual data generated might have alternated between old data and new data. That is, while the driver was replacing the old pattern in the buffer with the new pattern, the device might have generated a portion of new data, then a portion of old data, and then a portion of new data again.
Reduce the sample rate, use a larger buffer, or refer to documentation about DAQmx Write for information about other ways to avoid this warning.
  error_buffer.value.decode("utf-8"), error_code))
Note the counter variable counter, included to allow for some flexibility (currently counter_limit is set to 1 so that the output data increases on every callback call).
Bottom line: I'm a bit lost with this. Ideally I would like to understand how I can achieve, e.g., going from 0 to 5 V in 5 seconds. But this is only an example. I would like to understand what role is played by the different variables bufsize, rate_callback and rate_outcfg and what sets the timing of execution. Ultimately I'd like to get to the point where my basic understanding enables me to write such a simple task (outputting a continuously increasing voltage, or some other function like a sine wave) in a way that is efficient and warning-free.
Many thanks to anyone contributing!
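An aside on the regeneration warning quoted above (an editorial note, not part of the original post): nidaqmx exposes the output stream's regeneration mode, and disallowing regeneration is one documented way to stop the device from re-outputting old buffer data, at the cost that the callback must then always keep the buffer fed. A minimal sketch of that configuration:

import nidaqmx
from nidaqmx import constants

task = nidaqmx.Task()
task.ao_channels.add_ao_voltage_chan("cDAQ2Mod8/ao0")  # channel name from the post
task.timing.cfg_samp_clk_timing(rate=1000,
                                sample_mode=constants.AcquisitionType.CONTINUOUS)
# With regeneration disallowed, the device never replays old samples, so the
# old/new data warning goes away -- but the writer must not let the buffer run dry.
task.out_stream.regen_mode = constants.RegenerationMode.DONT_ALLOW_REGENERATION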

How to use multiple, but a limited number of threads in Python to process a list

I have a dataframe, several thousand rows in length, that contains two pairs of GPS coordinates in one of its columns, from which I am trying to calculate the drive time between those coordinates. I have a function that takes in those coordinates and returns the drive time, and it takes maybe 3-8 seconds to calculate each entry, so the total process can take quite a while. What I'd like to be able to do is: using maybe 3-5 threads, iterate through the list, calculate the drive time, and move on to the next entry while the other threads are completing, without creating more than 5 threads in the process. Independently, I have everything working - I can run multiple threads, I can track the thread count and wait until the number of active threads drops below the limit before the next one starts, and I can iterate over the dataframe and calculate the drive time. However, I'm having trouble piecing it all together. Here's an edited, slimmed-down version of what I have.
import pandas
import threading
import arcgis

class MassFunction:
    # This is intended to keep track of the active threads
    threadCount = 0

    def startThread(functionName, params=None):
        # This kicks off a new thread and should count up to keep track of the threads
        MassFunction.threadCount += 1
        if params is None:
            t = threading.Thread(target=functionName)
        else:
            t = threading.Thread(target=functionName, args=[params])
        t.daemon = True
        t.start()

class GeoAnalysis:
    # This class handles the connection to the ArcGIS services
    def __init__(self):
        super(GeoAnalysis, self).__init__()
        self.my_gis = arcgis.gis.GIS("https://www.arcgis.com", username, pw)

    def drivetimeCalc(self, coordsString):
        # The coords come in as a string, formatted as 'lat_1,long_1,lat_2,long_2'
        # This is the bottleneck of the process, as this calculation/response
        # below takes a few seconds to get a response
        points = coordsString.split(", ")
        route_service_url = self.my_gis.properties.helperServices.route.url
        self.route_layer = arcgis.network.RouteLayer(route_service_url, gis=self.my_gis)
        point_a_to_point_b = "{0}, {1}; {2}, {3}".format(points[1], points[0], points[3], points[2])
        result = self.route_layer.solve(stops=point_a_to_point_b, return_directions=False, return_routes=True,
                                        output_lines='esriNAOutputLineNone', return_barriers=False,
                                        return_polygon_barriers=False, return_polyline_barriers=False)
        travel_time = result['routes']['features'][0]['attributes']['Total_TravelTime']
        # This is intended to 'remove' one of the active threads
        MassFunction.threadCount -= 1
        return travel_time

class MainFunction:
    # This is to give access to the GeoAnalysis class from this class
    GA = GeoAnalysis()

    def closureDriveTimeCalc(self, coordsList):
        # This is intended to loop in the event that a fifth loop gets started and will prevent additional threads from starting
        while MassFunction.threadCount > 4:
            pass
        MassFunction.startThread(MainFunction.GA.drivetimeCalc, coordsList)

    def driveTimeAnalysis(self, location):
        # This reads a csv file containing a few thousand entries.
        # Each entry/row contains gps coordinates, which need to be
        # iterated over to calculate the drivetimes
        locationMemberFile = pandas.read_csv(someFileName)
        # The built-in apply() method in pandas seems to be the
        # fastest way to iterate through the rows
        locationMemberFile['DRIVETIME'] = locationMemberFile['COORDS_COL'].apply(self.closureDriveTimeCalc)
When I run this right now, using VS Code, I can see the thread count go up into the thousands in the call stack, so I feel like it is not waiting for threads to finish before adding to / subtracting from the threadCount value. Any ideas/suggestions/tips would be much appreciated.
EDIT: Essentially my problem is how to get the travel_time value back so that it can be placed into the dataframe. I currently have no return statement for the closureDriveTimeCalc function, so while the function runs correctly, it doesn't send any information back to the apply() method.
Rather than do this in an apply, I'd use multiprocessing Pool.map:
from multiprocessing import Pool

with Pool(processes=4) as pool:
    locationMemberFile['DRIVETIME'] = pool.map(self.closureDriveTimeCalc, locationMemberFile['COORDS_COL'])
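Since the bottleneck here is a network request rather than CPU work, a thread pool is arguably a better fit and sidesteps multiprocessing's pickling of bound methods. A sketch with concurrent.futures, calling drivetimeCalc (which already returns the travel time) directly:

from concurrent.futures import ThreadPoolExecutor

# At most 5 worker threads, matching the limit in the question; executor.map
# preserves input order, so the results line up with the dataframe rows.
with ThreadPoolExecutor(max_workers=5) as executor:
    locationMemberFile['DRIVETIME'] = list(
        executor.map(MainFunction.GA.drivetimeCalc, locationMemberFile['COORDS_COL'])
    )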

Job scheduling for data scraping on Python

I'm scraping (extracting) data from a certain website. The data contains two values that I need, namely (grid) frequency value and time.
The data on the website is updated every second. I'd like to continuously save these values (append them) to a list or a tuple using Python. To do that I tried using the schedule library. The following job-scheduling commands run the data scraping function (socket_freq) every second.
import schedule

schedule.every(1).seconds.do(socket_freq)

while True:
    schedule.run_pending()
I'm facing two problems:
1. I don't know how to restrict the schedule to run within a chosen time interval. For example, I'd like to run it for 5 or 10 minutes - how do I tell the schedule to stop after a certain time?
2. If I run this code and stop it after a few seconds (using break), then I often get multiple entries. For example, here is one result, where the first list [] in the tuple holds the time values and the second list [] holds the frequency values:
out:
(['19:27:02','19:27:02','19:27:02','19:27:03','19:27:03','19:27:03','19:27:03','19:27:03','19:27:03','19:27:03','19:27:04','19:27:04','19:27:04', ...],
['50.020','50.020','50.020','50.018','50.018','50.018','50.018','50.018','50.018','50.018','50.017','50.017','50.017'...])
As you can see, each time value is appended multiple times, although I used a schedule that runs every 1 second. What I'd actually expect to retrieve is:
out:
(['19:27:02','19:27:03','19:27:04'],['50.020','50.018','50.017'])
Does anybody know how to solve these problems?
Thanks!
(I'm using python 2.7.9)
OK, so here's how I would tackle these problems:
Obtain a timestamp at the start of your program, and then, each time the scheduled piece of code executes, simply check whether it has been running long enough.
Use time.sleep() to put your program to sleep for a period of time.
Check my example below:
import schedule
import datetime
import time

# Obtain current time
start = datetime.datetime.now()

# Simple callable for example
class DummyClock:
    def __call__(self):
        print datetime.datetime.now()

schedule.every(1).seconds.do(DummyClock())

while True:
    schedule.run_pending()
    # 5 minutes == 300 seconds
    if (datetime.datetime.now() - start).seconds >= 300:
        break
    # And here we halt execution for a second
    time.sleep(1)
All refactoring is welcome

Looping at a constant rate with high precision for signal sampling

I am trying to sample a signal at 10 kHz in Python. There is no problem when I try to run this code (at 1 kHz):
import sched, time

i = 0

def f():  # sampling function
    s.enter(0.001, 1, f, ())
    global i
    i += 1
    if i == 1000:
        i = 0
        print "one second"

s = sched.scheduler(time.time, time.sleep)
s.enter(0.001, 1, f, ())
s.run()
When I try to make the interval smaller, it starts to exceed one second (on my computer, 1.66 s at 10e-6).
Is it possible to run a sampling function at a specific frequency in Python?
You didn't account for the code's overhead. Each iteration, this error adds up and skews the "clock".
I'd suggest using a loop with time.sleep() instead (see the comments to https://stackoverflow.com/a/14813874/648265) and counting the time to sleep from a fixed reference moment, so the inevitable error doesn't add up:
period = 0.001
t = time.time()
while True:
    t += period
    <...>
    time.sleep(max(0, t - time.time()))  # max is needed on Windows due to
                                         # sleep's behaviour with a negative argument
Note that the OS scheduling will not allow you to reach precisions beyond a certain level since other processes have to preempt yours from time to time. In this case, you'll need to use some OS-specific facilities for multimedia applications or work out a solution that doesn't need this level of accuracy (e.g. sample the signal with a specialized app and work with its saved output).
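For reference, a self-contained version of that pattern (the sampling work here is a placeholder assumption), written for Python 3 and reporting the achieved rate once per second:

import time

period = 0.001            # target interval: 1 kHz
samples = 0
t = time.perf_counter()   # monotonic clock, better suited than time.time() for intervals
second_mark = t + 1.0

while True:
    t += period
    samples += 1          # placeholder for the real sampling work
    now = time.perf_counter()
    if now >= second_mark:
        print("%d samples in the last second" % samples)
        samples = 0
        second_mark += 1.0
    time.sleep(max(0, t - now))

Note that on Windows time.sleep has historically had roughly 15 ms granularity, so a loop at 1 kHz or faster may additionally need a busy-wait or OS-level timer adjustments.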

Control the speed of a loop

I'm currently reading physics at university, and I'm learning Python as a little hobby.
To practise both at the same time, I figured I'll write a little "physics engine" that calculates the movement of an object based on x, y and z coordinates. I'm only going to return the movement as text (at least for now!), but I want the position updates to be real-time.
To do that I need to update the position of an object, let's say a hundred times a second, and print it back to the screen. So every 10 ms the program prints the current position.
So if the execution of the calculations takes 2 ms, then the loop must wait 8 ms before it prints and recalculates the next position.
What's the best way of constructing a loop like that, and is 100 times a second a fair frequency, or would you go slower, like 25 times/sec?
The basic way to wait in Python is to import time and use time.sleep. Then the question is how long to sleep, which depends on how you want to handle cases where your loop misses the desired timing. The following implementation tries to catch up to the target interval if it misses.
import time
import random

def doTimeConsumingStep(N):
    """
    This represents the computational part of your simulation.
    For the sake of illustration, I've set it up so that it takes a random
    amount of time which is occasionally longer than the interval you want.
    """
    r = random.random()
    computationTime = N * (r + 0.2)
    print("...computing for %f seconds..." % (computationTime,))
    time.sleep(computationTime)

def timerTest(N=1):
    repsCompleted = 0
    beginningOfTime = time.perf_counter()  # time.clock() in the original; it was removed in Python 3.8
    start = time.perf_counter()
    goAgainAt = start + N
    while 1:
        print("Loop #%d at time %f" % (repsCompleted, time.perf_counter() - beginningOfTime))
        repsCompleted += 1
        doTimeConsumingStep(N)
        # If we missed our interval, iterate immediately and increment the target time
        if time.perf_counter() > goAgainAt:
            print("Oops, missed an iteration")
            goAgainAt += N
            continue
        # Otherwise, wait for next interval
        timeToSleep = goAgainAt - time.perf_counter()
        goAgainAt += N
        time.sleep(timeToSleep)

if __name__ == "__main__":
    timerTest()
Note that you will miss your desired timing on a normal OS, so things like this are necessary. Note that even with asynchronous frameworks like tulip and twisted you can't guarantee timing on a normal operating system.
Since you cannot know in advance how long each iteration will take, you need some sort of event-driven loop. A possible solution would be using the twisted module, which is based on the reactor pattern.
from twisted.internet import task
from twisted.internet import reactor

delay = 0.1

def work():
    print("called")

l = task.LoopingCall(work)
l.start(delay)
reactor.run()
However, as has been noted, don't expect true real-time responsiveness.
A word of warning: you cannot expect real-time behaviour on a non-real-time system. The sleep family of calls guarantees at least a given delay, but may well delay you for longer.
Therefore, once you have returned from sleep, query the current time, and make the calculations into the "future" (accounting for the calculation time); a short sketch of this follows.
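A minimal sketch of that advice applied to the physics-loop question above (the motion update and the stop condition are stand-in assumptions):

import time

dt = 0.01                      # 10 ms target interval (100 updates/sec)
position, velocity = 0.0, 1.0

next_tick = time.perf_counter() + dt
while position < 5.0:
    # advance the simulation by exactly dt of simulated time
    position += velocity * dt
    print("position: %.3f" % position)
    # sleep only for whatever remains of this interval
    next_tick += dt
    time.sleep(max(0.0, next_tick - time.perf_counter()))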
