Visualize data at microsecond resolution - Python

I'm measuring multiple processes in a component to see where the bottlenecks are. These processes take anywhere from 1 to 1000 µs to complete.
I'm logging this to an InfluxDB database, set to µs precision, using Python 3.
My problem is visualising this. I tried Grafana, thinking it would suit me. However, when graphing microsecond data it lumps multiple data points onto the same millisecond, the finest resolution Grafana supports, making it impossible to see increments, zoom in, or anything similar.
Judging by some Google results (1, 2, 3), I'm not alone.
Is there any way I can make this data more readable/understandable, either by having the graphing tool display it in microseconds or by changing the X-axis to something other than a timestamp? (Ideally something similar to Grafana or Chronograf.)
Thanks.

According to this Grafana feature request post (from 2016):
https://github.com/grafana/grafana/issues/6252
Quote by Torkel Ödegaard, Co-founder of Grafana:
No there is no way to do that. It would be quite tricky as all time
formats in javascript (and time libs) only go down to millisecond resolution.
So it seems this is currently not possible (and won't be in the mid-term future), as JavaScript time handling only goes down to milliseconds.
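If Grafana itself is a dead end, the second half of the question (changing the X-axis to something other than a timestamp) can still be done by pulling the points into Python and plotting them against relative microsecond offsets. A minimal sketch, assuming the timestamps have already been queried out of InfluxDB as integer epoch microseconds (the measurements list below is made-up placeholder data, not your schema):

    import matplotlib.pyplot as plt

    # Hypothetical data: (epoch-microsecond timestamp, duration in µs) pairs,
    # e.g. as returned by an InfluxDB query with epoch='u'.
    measurements = [
        (1617181723000001, 12),
        (1617181723000047, 240),
        (1617181723000113, 35),
        (1617181723000900, 870),
    ]

    t0 = measurements[0][0]
    # Plot against microseconds since the first sample instead of wall-clock
    # time, so the X axis keeps full microsecond resolution.
    offsets_us = [t - t0 for t, _ in measurements]
    durations = [d for _, d in measurements]

    plt.plot(offsets_us, durations, marker="o")
    plt.xlabel("time since first sample (µs)")
    plt.ylabel("process duration (µs)")
    plt.show()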

Related

Configuring matplotlib to fetch data on demand?

I am working on software that processes time series. Sometimes these are very long (>10 million data points). Our software is very usable for shorter time series but gets unusably bogged down for these long ones. When looking at the RAM usage, it's almost 10x what all the time series data together occupy.
When doing some tests, it's clear that a lot of memory is used by matplotlib, which we are using to plot the time series. Using a separate piece of code that includes ONLY loading of the time series from a file and plotting, I can see that when going from loading only (with the plotting command commented out) to plotting, the memory usage goes up almost 3-fold. This is true whether or not the whole time range is visible within the given axis limits, although passing only a small slice of the series (numpy array) to matplotlib DOES proportionally reduce the excess memory.
Given that we expect users to scroll through the time series and only view short chunks at a time, it would be much better to have matplotlib only fetch the visible portion of the numpy array, grabbing new elements as the user scrolls or zooms. In fact, it would likely be preferable to replace the X and Y arrays with generators that re-compute the values on the fly as the plot needs them, possibly caching points just outside the limits to make scrolling faster. The X values in particular are simple linspaces that would likely be best not stored at all, given that computing them should be as fast as a lookup into a huge array, never mind storing them once in the outer software AND also in matplotlib.
I know we could try to "fake" this by capturing user events sent to the plot and re-sending new X and Y arrays all the time, but this feels clunky, prone to all sorts of corner cases where things get out of sync, and like trying to take over from the plotting library things it "wants" to do itself. At some point it would become easier just to write our own simple plotting routine in C/C++ that does the computations and draws lines using a graphics API. In fact, the nearest closed-source competitor to our software seems to be doing just that, given that it's super snappy and uses an amount of RAM that is a mere fraction of the size of a time series. But, we want our software to be extensible by users without a deep understanding of the internals of our code.
Is there a standard way of handling this, or is this just too far from the "spirit" of matplotlib to be worth using it? And in that case, is there an alternative Python plotting library with exactly this use case in mind? I would imagine that data scientists working with terabytes of data would want a way to graphically explore it without the plotting code eating terabytes of storage itself...
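For what it's worth, the "capture user events and re-send new X and Y arrays" approach dismissed above as clunky can be kept fairly small: matplotlib lets you hook axis-limit changes and swap in only the visible slice of the data. A rough sketch of that idea, where the make_y function and the sample rate are hypothetical stand-ins for the real data source:

    import numpy as np
    import matplotlib.pyplot as plt

    FS = 1000.0          # hypothetical sample rate (Hz)
    N = 10_000_000       # total number of samples in the series

    def make_y(i0, i1):
        """Hypothetical on-demand source: compute/load only samples i0..i1."""
        x = np.arange(i0, i1) / FS
        return x, np.sin(2 * np.pi * 0.25 * x)

    fig, ax = plt.subplots()
    x, y = make_y(0, 100_000)              # start with a modest slice
    (line,) = ax.plot(x, y)

    def on_xlim_changed(axes):
        lo, hi = axes.get_xlim()
        i0 = max(0, int(lo * FS))
        i1 = min(N, int(hi * FS) + 1)
        line.set_data(*make_y(i0, i1))     # re-send only the visible portion
        axes.figure.canvas.draw_idle()

    ax.callbacks.connect("xlim_changed", on_xlim_changed)
    plt.show()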

How to write ten thousand data points per second to InfluxDB

I'm using a Raspberry Pi 4 to collect sensor data with a Python script.
Like:
val=mcp.read_adc(0)
It can read ten thousand values per second.
Now I want to save this data to InfluxDB for real-time analysis.
I have tried saving the readings to a log file while reading and then using Telegraf to collect them, as this blog post did:
But that doesn't work for my streaming data, as it is too slow.
I have also tried using Python's influxdb module to write directly, like:
client.write(['interface,path=address,elementss=link value=3.14'],{'db':'db'},204,'line')
It's worse.
So how can I write this data into InfluxDB in time? Are there any solutions?
Thank you, much appreciated!
By the way, I'm a beginner and can only use simple Python, so sad.
InfluxDB OSS will process writes faster if you batch them. The Python client has a batch_size parameter on its write method that you can use to do this. If you are reading ~10k points/s, I would try a batch size of about 10k as well. The batches should be compressed to speed up transfer.
The write method also allows sending the tags (path=address,elementss=link) as a dictionary. Doing this should reduce parsing effort.
Are you also running InfluxDB on the raspberry pi or do you send the data off the Pi over a network connection?
I noticed that you said in the comments that nanosecond precision is very important but you did not include a timestamp in your line protocol point example. You should provide a timestamp yourself if the timing is this critical. Without an explicit timestamp in the data, InfluxDB will insert a timestamp at "when the data arrives" which is unpredictable.
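Putting the batching, dictionary-tag, and explicit-timestamp suggestions together, a minimal sketch with the influxdb Python client might look like the following. The measurement and tag names are taken from your line-protocol example; the read_adc function is a stand-in for your mcp.read_adc(0) call, and time.time_ns() requires Python 3.7+:

    import time
    from influxdb import InfluxDBClient

    client = InfluxDBClient(host="localhost", port=8086, database="db")

    def read_adc():
        # stand-in for your mcp.read_adc(0) call
        return 512

    points = []
    for _ in range(10_000):
        points.append({
            "measurement": "interface",
            "tags": {"path": "address", "elementss": "link"},
            "time": time.time_ns(),          # explicit nanosecond timestamp
            "fields": {"value": float(read_adc())},
        })

    # One batched call instead of 10k individual writes.
    client.write_points(points, time_precision="n", batch_size=10_000)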
As noted in the comments, you may want to consider preprocessing this data some before sending it to InfluxDB. We can't make a suggestion without knowing how you are processing the piezo data to detect footsteps. Usually ADC values are averaged in small batches (10 - 100 reads, depending) to reduce noise. Assuming your footstep detector runs continuously, you'll have over 750 million points per day from a single sensor. This is a lot of data to store and postprocess.
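As an illustration of that kind of preprocessing (not necessarily the right choice for footstep detection), block-averaging the raw reads before writing looks roughly like this; the block size of 50 is an arbitrary example value:

    import numpy as np

    BLOCK = 50                                       # average every 50 raw reads (example)

    raw = np.random.randint(0, 1024, size=10_000)    # stand-in for one second of ADC reads
    trimmed = raw[: len(raw) // BLOCK * BLOCK]
    averaged = trimmed.reshape(-1, BLOCK).mean(axis=1)
    # 10,000 reads/s become 200 averaged points/s to store in InfluxDB.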
Please edit your question to include more information, if you are willing.

Is Python sometimes simply not fast enough for a task?

I noticed a lack of good soundfont-compatible synthesizers written in Python. So, a month or so ago, I started some work on my own (for reference, it's here). Making this was also a challenge that I set for myself.
I keep coming up against the same problem again and again and again, summarized by this:
To play sound, a stream of data with a more-or-less constant rate of flow must be sent to the audio device
To synthesize sound in real time based on user input, little-to-no buffering can be used
Thus, there is a cap on the amount of time one 'buffer generation loop' can take
Python, as a language, simply cannot run fast enough to synthesize sound within this time limit
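To put a number on the cap in the third point: with a typical sample rate and buffer size (the 44.1 kHz / 256-frame figures below are illustrative, not taken from my synth), the budget per buffer works out to only a few milliseconds:

    SAMPLE_RATE = 44_100      # Hz (illustrative)
    BUFFER_FRAMES = 256       # frames per buffer (illustrative)

    budget = BUFFER_FRAMES / SAMPLE_RATE
    print(f"{budget * 1000:.2f} ms to generate each buffer")   # ~5.80 ms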
The problem is not my code, or at least, I've tried to optimize it to extreme levels - using local variables in time-sensitive parts of the code, avoiding dotted attribute access in loops, using itertools for iteration, using built-in functions like max, changing thread-switching parameters, doing as few calculations as possible, making approximations; the list goes on.
Using Pypy helps, but even that starts to struggle after not too long.
It's worth noting that (at best) my synth at the moment can play about 25 notes simultaneously. But this isn't enough. Fluidsynth, a synth written in C, has a cap on the number of notes per instrument at 128 notes. It also supports multiple instruments at a time.
Is my assertion that Python simply cannot be used to write a synthesizer correct? Or am I missing something very important?
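One technique worth checking that isn't on the optimisation list above is whether the per-sample work can be vectorised with NumPy, so each buffer is produced by a few array operations instead of a Python-level loop over samples. A toy sketch of mixing many notes that way, using sine voices as a stand-in for real soundfont rendering:

    import numpy as np

    SAMPLE_RATE = 44_100
    BUFFER_FRAMES = 256

    def render_buffer(freqs, start_frame):
        """Mix all active notes for one buffer with array ops, no per-sample loop."""
        t = (start_frame + np.arange(BUFFER_FRAMES)) / SAMPLE_RATE
        # freqs: active note frequencies; broadcasting gives a
        # (num_notes, BUFFER_FRAMES) matrix that is summed down to one buffer.
        voices = np.sin(2 * np.pi * np.asarray(freqs)[:, None] * t[None, :])
        return voices.sum(axis=0) / max(len(freqs), 1)

    buf = render_buffer([261.6, 329.6, 392.0], start_frame=0)   # a C-major chord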

Getting good timestamps for sensor readings

I'm trying to get timestamp data to match accelerometer and gyroscope readings.
I'm using a Raspberry Pi 3 B+ with Python to pull accelerometer (ADXL345) and gyroscope (ITG3200) readings. I'm currently reading them over I2C as fast as I can, and I write a timestamp from the system time (time.time()) immediately before each reading.
I thought this would be sufficiently accurate, but when I look at the resulting data the time is often not monotonic and/or just looks wrong. In fact, the results often seem to match the motion I was tracking better if I throw out all but the first timestamp and then synthetically create times based on the sampling frequency of the device I'm collecting from!
That said, this approach clearly gives me wrong data, but I'm at a loss as to where to pull correct data from. The accelerometer and gyro don't seem to include times in anything they report, and if I pull data as fast as I can I'm still bound to miss some readings at their highest rates, meaning the times I use will always be somewhat wrong.
The accelerometer also has a FIFO cache it can store some values in, but again, if I pull from that cache how do I know what timestamp goes with each value? The documentation mentions the cache storing values but nothing about timestamps.
All of which leads me to believe I'm missing something. Is there a secret here I don't know? Or a standard practice I'm unaware of? Any thoughts or suggestions would be most welcome.
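One detail that may explain the non-monotonic timestamps: time.time() follows the wall clock, which NTP can step backwards or forwards on a Pi, whereas time.monotonic_ns() (or time.monotonic()) is guaranteed never to go backwards. A minimal sketch of pairing readings with a monotonic clock; read_accel and read_gyro are placeholders for the actual I2C reads:

    import time

    def read_accel():
        # placeholder for the ADXL345 I2C read
        return (0.0, 0.0, 9.81)

    def read_gyro():
        # placeholder for the ITG3200 I2C read
        return (0.0, 0.0, 0.0)

    samples = []
    t0 = time.monotonic_ns()
    for _ in range(1000):
        ts = time.monotonic_ns() - t0        # monotonic, never steps backwards
        samples.append((ts, read_accel(), read_gyro()))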

Realtime art project --- input: sound --- output: image (better title?)

I am not quite sure if I should ask this question here.
I want to make an art project.
I want to use voice as an input, and an image as output.
The image changes accordingly to the sound.
How can I realise this? Because I need realtime or a delay under 50 ms.
At first I thought it would be better to use a microcontroller.
But I want to calculate huge images, and maybe a microcontroller can't do this.
For example, I want to calculate 10,000 moving objects.
Could I realise this with Windows/Linux/a microcontroller?
It would be very good if I could use Python.
Or do you think Processing is a better choice?
Do you need more details?
Have you thought about using a graphical dataflow environment like Pure Data (Pd) or Max? Max is a commercial product, but Pd is free.
Even if you don't end up using Pd for your final project, it makes an excellent rapid prototyping tool. Whilst the graphics processing capabilities of Pd are limited, there are extensions such as Gridflow and Gem, which may help you. Certainly with Pd you can analyse incoming sound using the [fiddle~] object, which will give you the overall pitch and frequency/amplitude of individual partials and [env~], which will give you RMS amplitude. You could then very easily map changes in sound (pitch, amplitude, timbre) to various properties of an image such as colour, shapes, number of elements and so on in Gem or Gridflow.
10k moving objects sounds like a heck of a lot even on a modern desktop GPU! Calculating all of those positions on-the-fly is going to consume a lot of resources. I think even with a dedicated C++ graphics library like openFrameworks, this might be a struggle. You might want to consider an optimisation strategy like pre-rendering aspects of the image, and using the real-time audio control to determine which pre-rendered components are displayed at any given time. This might give the illusion of control over 10k objects, when in reality much of it is pre-rendered.
Good luck!
The above answer is a good one; Pd is very flexible. But if you want something more code-oriented and better suited to mixing with MCUs, Processing might be better.
Another good way to do it would be to use Csound with the Csound Python API. Csound has a steep learning curve, but it has tons of audio analysis functionality and is very good for running with low latency. You could definitely analyse an input signal in real time and then send control values out to a graphics environment, scripting both with Python.
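If you end up scripting this in plain Python instead, the "analyse the input and emit control values" part can be sketched with the third-party sounddevice library (an assumption here, not something the answer above requires) by computing RMS amplitude per audio block and handing that number to whatever draws the image:

    import numpy as np
    import sounddevice as sd

    def update_image(level):
        print(f"control value: {level:.3f}")  # stand-in for the graphics side

    def on_audio(indata, frames, time_info, status):
        # RMS amplitude of this ~6 ms block; map it to a drawing parameter.
        rms = float(np.sqrt(np.mean(indata ** 2)))
        update_image(rms)

    # 256 frames at 44.1 kHz keeps the audio->image latency well under 50 ms.
    with sd.InputStream(channels=1, samplerate=44_100, blocksize=256, callback=on_audio):
        sd.sleep(5000)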
