Python: accurate timing of multiple sensors

I have a hardware setup (Nvidia Jetson) with multiple sensors which I read out and store for later processing. I use multiprocessing to read the sensors in separate processes. I essentially read the data and attach a timestamp to it, so that I can later interpolate the data of the different sensors so they "sync up".
To store the time I use time.perf_counter() because I read that it is an accurate option. I don't necessarily need the time since the epoch, but I do need the time in all the processes to advance at the same rate as a wall clock (so real time, not CPU time). Does this work with perf_counter(), or do I need to use a different function?
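One way to check this on the target machine is to ask Python what backs each clock. The documentation states that perf_counter() includes time elapsed during sleep, so it measures real elapsed time rather than CPU time, but it also says the reference point is undefined. Whether that reference point is shared between processes depends on the platform, so the sketch below prints what the interpreter reports instead of assuming it. The helper names describe_clock and stamp are made up for illustration, and recording a wall-clock timestamp next to the monotonic one is an assumption about what might be useful later, not something the question requires.

```python
import time


def describe_clock(name):
    """Return the properties Python reports for one of its named clocks."""
    info = time.get_clock_info(name)
    return {
        "implementation": info.implementation,  # underlying OS function
        "monotonic": info.monotonic,            # True if it cannot go backwards
        "adjustable": info.adjustable,          # True if it can be changed (e.g. by NTP or an admin)
        "resolution": info.resolution,          # resolution in seconds
    }


def stamp(value):
    """Attach a monotonic and a wall-clock timestamp to one reading (hypothetical helper)."""
    return {
        "value": value,
        "t_perf_ns": time.perf_counter_ns(),  # elapsed real time, suited to intervals
        "t_wall_ns": time.time_ns(),          # time since the epoch, may jump if the clock is set
    }


if __name__ == "__main__":
    for clock_name in ("perf_counter", "monotonic", "time"):
        print(clock_name, describe_clock(clock_name))
    print(stamp(42))
```

If the reported implementation is a system-wide monotonic clock, timestamps taken in different processes on the same machine should be directly comparable. If in doubt, comparing stamps from two processes against time.time_ns() is a simple sanity check.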

Related

The way to write ten thousand data points to InfluxDB per second

I’m using a raspberry pi 4 to collect sensor data with a python script.
Like:
val=mcp.read_adc(0)
This can read ten thousand values per second.
Now I want to save this data to InfluxDB for real-time analysis.
I have tried to save them to a log file while reading, and then use telegraf to collect as this blog did:
But it’s not working for my stream data as it is too slow.
Also I have tried to use python's influxdb module to write directly, like:
client.write(['interface,path=address,elementss=link value=3.14'],{'db':'db'},204,'line')
That is even slower.
So how can I write this data into InfluxDB fast enough? Are there any solutions?
Thank you, much appreciated!
By the way, I'm a beginner and can only use simple Python, sadly.
InfluxDB OSS will process writes faster if you batch them. The Python client has a batch_size parameter that you can use to do this. If you are reading ~10k points/s, I would try a batch size of about 10k too. The batches should be compressed to speed up the transfer.
The write method also allows sending the tags path=address,elementss=link as a dictionary. Doing this should decrease parsing effort.
Are you also running InfluxDB on the raspberry pi or do you send the data off the Pi over a network connection?
I noticed that you said in the comments that nanosecond precision is very important but you did not include a timestamp in your line protocol point example. You should provide a timestamp yourself if the timing is this critical. Without an explicit timestamp in the data, InfluxDB will insert a timestamp at "when the data arrives" which is unpredictable.
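A minimal sketch of both suggestions, batching and explicit timestamps, using only the standard library. The measurement and tag names are copied from the line in the question. The helpers to_line, batched and collect, and the send callback, are hypothetical names introduced here. The sketch assumes the database accepts nanosecond timestamps, which is the default precision for InfluxDB line protocol.

```python
import time


def to_line(measurement, tags, value, timestamp_ns):
    """Format one point as InfluxDB line protocol with an explicit timestamp."""
    tag_part = ",".join(f"{key}={val}" for key, val in sorted(tags.items()))
    return f"{measurement},{tag_part} value={value} {timestamp_ns}"


def batched(lines, batch_size):
    """Yield lists of at most batch_size lines from any iterable of lines."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, partially filled batch


def collect(read_sensor, count, send, batch_size=10_000):
    """Read `count` values, timestamp each at read time, and send them in batches.

    `read_sensor` and `send` are caller-supplied callables (assumptions):
    `send` could wrap the same client.write(..., 'line') call shown in the
    question, which is not exercised here.
    """
    tags = {"path": "address", "elementss": "link"}
    lines = (
        to_line("interface", tags, read_sensor(), time.time_ns())
        for _ in range(count)
    )
    for batch in batched(lines, batch_size):
        send(batch)
```

Because the timestamp is taken next to the read, the stored time no longer depends on when the batch reaches the server.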
As noted in the comments, you may want to consider preprocessing this data before sending it to InfluxDB. We can't make a suggestion without knowing how you are processing the piezo data to detect footsteps. Usually ADC values are averaged in small batches (10 - 100 reads, depending) to reduce noise. Assuming your footstep detector runs continuously, you'll have over 750 million points per day from a single sensor. This is a lot of data to store and postprocess.
Please edit your question to include more information, if you are willing.
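To illustrate the averaging mentioned above, here is a small sketch that reduces a stream of ADC values to one mean per group. The function name averaged and the group size are illustrative assumptions. Whether averaging is appropriate depends on how the footsteps are detected, which the question does not say.

```python
def averaged(samples, group_size):
    """Yield the mean of each consecutive group of `group_size` samples."""
    total = 0.0
    count = 0
    for sample in samples:
        total += sample
        count += 1
        if count == group_size:
            yield total / count
            total, count = 0.0, 0
    if count:
        yield total / count  # mean of the trailing, incomplete group


# At 10,000 reads per second, a day holds 10_000 * 86_400 = 864,000,000 raw values.
# Averaging groups of 100 leaves 8,640,000 points per day.
points_per_day = 10_000 * 86_400 // 100
```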

Threading priority workaround method

I have a Python application where I use threading for the I/O-bound tasks (reading from two separate input sensors). I am aware that because of the GIL it is not possible to set a priority on a thread; however, I feel like my problem must be common enough that someone has thought of a decent workaround. When I run the application, it uses the maximum computational power of the CPU, and I assume that is the problem; however, I cannot avoid using the CPU's full capacity.
Now for the issue: I know that a specific sensor sends data roughly every 24 ms (this might drift over time). However, the application reads the data at times like the following:
Available data at time (s): 4.361776
Available data at time (s): 4.3772116
Available data at time (s): 4.4171033
Available data at time (s): 4.4250908
Available data at time (s): 4.4596746
Available data at time (s): 4.5154788
Available data at time (s): 4.5154788
Available data at time (s): 4.5254734
This is basically 24 ms on average between measurements, but they are read in "clumps". Does anyone have a workaround for this problem? I know that I could implement a sort of "guessing" algorithm to estimate the actual time of each measurement based on the times of the previous measurements; however, that seems like a solution prone to unexpected errors.
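For reference, the "guessing" approach described above could look like the following sketch: a least-squares line fitted through (sample index, arrival time), which gives an estimated period and evenly spaced timestamps. This is only one possible estimator. It assumes that no samples were dropped and that the period is roughly constant over the fitted window. The function name fit_sample_times is made up for illustration.

```python
def fit_sample_times(arrival_times):
    """Fit t = start + period * i by least squares; return (period, fitted_times)."""
    n = len(arrival_times)
    if n < 2:
        raise ValueError("need at least two arrival times")
    mean_i = (n - 1) / 2
    mean_t = sum(arrival_times) / n
    cov = sum((i - mean_i) * (t - mean_t) for i, t in enumerate(arrival_times))
    var = sum((i - mean_i) ** 2 for i in range(n))
    period = cov / var
    start = mean_t - period * mean_i
    return period, [start + period * i for i in range(n)]


if __name__ == "__main__":
    # Arrival times taken from the listing in the question.
    arrivals = [4.361776, 4.3772116, 4.4171033, 4.4250908,
                4.4596746, 4.5154788, 4.5154788, 4.5254734]
    period, fitted = fit_sample_times(arrivals)
    print(f"estimated period: {period * 1000:.1f} ms")
```

Since a delayed read can only make an arrival time later than the true sample time, a plain least-squares fit may be biased slightly late. Refitting over a sliding window would be one way to follow slow drift.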

Getting Dask map_blocks to make use of all available resources

I am using Dask to parallelize time series satellite imagery analysis on a cluster with a substantial amount of computational resources.
I have set up a distributed scheduler with many workers (--nprocs = 56) each managing one thread (--nthreads = 1) and 4GB of memory due to the embarrassingly parallel nature of the work.
My data comes in as an xarray that is chunked into a dask array and map_blocks is used to map a function across each chunk in order to generate an output array that will be saved to an image file.
data = inputArray.chunk(chunks={'y':1})
client.persist(data)
future = data.data.map_blocks(timeSeriesTrends.timeSeriesTrends, jd, drop_axis=[1])
future = client.persist(future)
dask.distributed.wait(future)
outputArray = future.compute()
My problem is that Dask does not make use of all the resources I have allocated to it. Instead it begins with very few parallelized tasks and slowly adds more as processes finish without ever reaching capacity.
This dramatically restricts the capabilities of the hardware I have access to as many of my resources spend most of their time sitting idle.
Is my approach appropriate for generating an output array from an input array? How can I best make use of the hardware I have access to in this situation?

How to find the expected file transfer duration in Python?

I am using rsync from Python to transfer files. I have a basic UI where the user can select files and initiate the transfer. I want to show the expected duration of transferring all the files they selected. I know the total size of all the files in bytes. What's a smart way to show the expected file transfer duration? It doesn't have to be precise.
To calculate an estimated time to completion for anything, you simply need to keep track of the amount of time taken to transfer the data currently completed and base your estimate for the rest of the data on the past speed. Once you get that basic method, there are all sorts of ways you can adjust your estimate to take account of acceleration, congestion and other effects - for example, taking the amount of data transferred in the last 100 seconds, breaking this down into 20s increments and calculating a weighted mean speed.
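A minimal sketch of that idea, assuming progress is sampled periodically (for example every 20 s) as a cumulative byte count. The class name EtaEstimator, the 100 s window and the linear weighting that favours recent intervals are illustrative choices, not part of any rsync API.

```python
from collections import deque


class EtaEstimator:
    """Estimate remaining transfer time from periodic (time, bytes_done) samples."""

    def __init__(self, total_bytes, window_s=100.0):
        self.total_bytes = total_bytes
        self.window_s = window_s
        self.samples = deque()  # (timestamp in seconds, cumulative bytes transferred)

    def update(self, now_s, bytes_done):
        """Record a progress sample and drop samples that fall outside the window."""
        self.samples.append((now_s, bytes_done))
        while len(self.samples) > 2 and self.samples[1][0] <= now_s - self.window_s:
            self.samples.popleft()

    def speed(self):
        """Weighted mean speed in bytes/s; later intervals get larger weights."""
        pairs = list(self.samples)
        weighted, weights = 0.0, 0.0
        for weight, ((t0, b0), (t1, b1)) in enumerate(zip(pairs, pairs[1:]), start=1):
            if t1 > t0:
                weighted += weight * (b1 - b0) / (t1 - t0)
                weights += weight
        return weighted / weights if weights else None

    def eta_seconds(self):
        """Return the estimated seconds remaining, or None if no speed is known yet."""
        current_speed = self.speed()
        if not current_speed or current_speed <= 0:
            return None
        remaining = max(self.total_bytes - self.samples[-1][1], 0)
        return remaining / current_speed
```

Calling update() with an explicit timestamp keeps the class easy to test. In the UI, the timestamp would typically come from time.monotonic().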
I'm not familiar with using rsync in Python. Are you just calling it using os.exec*() or are you using something like pysync (http://freecode.com/projects/pysync)? If you are spawning rsync processes, you'll struggle to get granular data (esp. if transferring large files). I suppose you could spawn rsync --progress and get/parse the progress lines in some sneaky way but that seems horridly awkward.

Running process in parallel for data collection

I am collecting data from two pieces of equipment using serial ports (a scale and a conductivity probe). I need to continuously collect data from the scale, which I average between collection points of the conductivity probe (roughly a minute apart).
Thus I need to run two processes at the same time: one that collects data from the scale, and another that waits for data from the conductivity probe. Once the second process gets its data, it would send a command to the first process to get the collected scale data, which is then timestamped and saved to a .csv file.
I looked into subprocess, but I can't figure out how to reset a running script. Any suggestions on what to look into?
Instead of using threads, you could also implement your data sources as generators and just loop over them to consume the incoming data and do something with it. Perhaps you could use two different generators and zip them together; that would be a nice experiment, though I'm not entirely sure it can be done...
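A rough sketch of the generator idea, under the assumption that each source yields (timestamp, value) pairs in time order. Plain zip would pair readings one to one, which does not fit two sources running at different rates, so this version pulls scale readings until it passes each probe timestamp. The function name average_between is hypothetical, and real serial reads would replace the lists used in the example.

```python
def average_between(scale_readings, probe_readings):
    """Yield (probe_time, conductivity, mean_weight) for each probe reading.

    Both inputs are iterables of (timestamp, value) pairs sorted by time.
    mean_weight averages the scale values seen since the previous probe reading,
    or is None if no scale value arrived in that interval.
    """
    scale_iter = iter(scale_readings)
    pending = next(scale_iter, None)  # next scale reading not yet consumed
    for probe_time, conductivity in probe_readings:
        total, count = 0.0, 0
        while pending is not None and pending[0] <= probe_time:
            total += pending[1]
            count += 1
            pending = next(scale_iter, None)
        yield probe_time, conductivity, (total / count if count else None)


if __name__ == "__main__":
    scale = [(0, 10.0), (1, 12.0), (2, 14.0), (3, 20.0)]
    probe = [(1.5, 5.0), (3.5, 6.0)]
    for row in average_between(scale, probe):
        print(row)
```

With live, blocking sources, the inner loop waits for the first scale reading after each probe timestamp before yielding, so a thread and a queue may still be the more practical choice.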
