PyAudio - Synchronizing playback and recording - python

I'm currently using PyAudio to work on a lightweight recording utility that fits a specific need of an application I'm planning. I am working with an ASIO audio interface. What I'm writing the program to do is play a wav file through the interface, while simultaneously recording the output from the interface. The interface is processing the signal onboard in realtime and altering the audio. As I'm intending to import this rendered output into a DAW, I need the output to be perfectly synced with the input audio. Using a DAW I can simultaneously play audio into my interface and record the output. It is perfectly synced in the DAW when I do this. The purpose of my utility is to be able to trigger this from a python script.
Through a brute-force approach I've come up with a solution that works, but I'm now stuck with a magic number and I'm unsure of whether this is some sort of constant or something I can calculate. If it is a number I can calculate that would be ideal, but I still would like to understand where it is coming from either way.
My callback is as follows:
def testCallback(in_data, frame_count, time_info, status):
    # lastCt and firstCt are counters defined at module level
    global lastCt, firstCt
    # read data from wave file
    data = wave_file.readframes(frame_count)
    # calculate number of latency frames for playback and recording
    # 1060 is my magic number
    latencyCalc = math.ceil((stream.get_output_latency() + stream.get_input_latency()) * wave_file.getframerate()) + 1060
    # no more data in playback file
    if data == b"":
        # this is the number of times we must keep the loop alive to capture all playback
        recordEndBuffer = latencyCalc / frame_count
        if lastCt < recordEndBuffer:
            # return silent (zero-valued) frames to keep the callback alive
            data = b"\x00" * wave_file.getsampwidth() * frame_count
            lastCt += 1
    # we start recording before playback, so this accounts for the initial "pre-playback" data in the output file
    if firstCt > (latencyCalc / frame_count):
        wave_out.writeframes(in_data)
    else:
        firstCt += 1
    return (data, pyaudio.paContinue)
My concern is with this line:
latencyCalc = math.ceil((stream.get_output_latency() + stream.get_input_latency()) * wave_file.getframerate()) + 1060
I put this calculation together by observing the offset of my output file in comparison to the original playback file. Two things were occurring: my output file started later than the original file when both were played simultaneously, and it also ended early. Through trial and error I determined it was a specific number of frames extra at the beginning and missing at the end. This calculation accounts for those frames. I do understand the first piece; it is the input/output latencies (reported in seconds) converted to frames using the sample rate. But I'm not quite sure how to account for the 1060 value, as I'm not sure where it comes from.
I've found that by playing with the latency settings on my ASIO driver, my application continues to properly sync the recorded file even as the output/input latencies above change due to the adjustment (input/output latencies are always the same value), so the 1060 appears to be consistent on my machine. However, I simply don't know whether this is a value that can be calculated. Or if it is a specific constant, I'm unsure what exactly it represents.
Any help in better understanding these values would be appreciated. I'm happy my utility is now working properly, but I would like to fully understand what is happening here, as I suspect that using a different interface would likely no longer work correctly (I would like to support that down the road for a few reasons).
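For what it's worth, one idea I had for measuring this residual offset programmatically (rather than lining the files up by eye in an editor) is to cross-correlate the original and recorded files. This is only a rough sketch, assuming both files are mono 16-bit WAVs at the same sample rate; the function name is mine:

import wave
import numpy as np

def measure_offset_frames(original_path, recorded_path):
    # Estimate how many frames the recording lags the original by locating
    # the peak of their cross-correlation (slow for long files; a short
    # excerpt of both files is usually enough).
    def read_mono(path):
        w = wave.open(path, "rb")
        try:
            raw = w.readframes(w.getnframes())
        finally:
            w.close()
        return np.frombuffer(raw, dtype=np.int16).astype(np.float64)

    original = read_mono(original_path)
    recorded = read_mono(recorded_path)
    corr = np.correlate(recorded, original, mode="full")
    return int(np.argmax(corr)) - (len(original) - 1)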
EDIT 4/8/2014 in response to Roberto:
The value I receive for
math.ceil((stream.get_output_latency() + stream.get_input_latency()) * wave_file.getframerate())
is 8576, and the extra 1060 frames bring the total latency to 9636 frames. You are correct in your assumption of why I added the 1060 frames. I am playing the file through the external ASIO interface, and the processing I'm hoping to capture in my recorded file is the result of the processing that occurs on the interface (not something I have coded). To compare the outputs, I simply played the test file and recorded the interface's output without any of the processing effects engaged on the interface. I then examined the two tracks in Audacity and, by trial and error, determined that 1060 was the closest I could get the two to align. I have since realized it is still not exactly perfect, but it is incredibly close and audibly undetectable when played simultaneously (which is not true when the 1060 offset is removed; there is a noticeable delay). Adding or removing even one more frame is too much compensation compared to 1060 as well.
I do believe you are correct that the additional latency comes from the external interface. I was initially wondering if it was something I could calculate from the numerical info I had at hand, but I am concluding it's just a constant of the interface. I feel this is true as I have determined that if I remove the 1060, the offset of the file is exactly the same as when performing the same test manually in Reaper (this is exactly the process I'm automating). I am getting much better latency than I would in Reaper with my new brute-force offset, so I'm going to call this a win. In my application, the goal is to completely replace the original file with the newly processed file, so the absolute minimum latency between the two is desired.
In response to your question about ASIO in PyAudio, the answer is fortunately yes. You must compile PortAudio against the ASIO SDK for PortAudio to function with ASIO, and then build PyAudio against that PortAudio. Fortunately I'm working on Windows and used the precompiled PyAudio binaries from http://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio, which have ASIO support built in; the devices are then accessible through ASIO.
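For anyone setting this up: once PyAudio is built against an ASIO-enabled PortAudio, something like the following should list the ASIO devices and their default latencies. This is just a sketch using the standard PyAudio host-API helpers:

import pyaudio

pa = pyaudio.PyAudio()
try:
    asio = pa.get_host_api_info_by_type(pyaudio.paASIO)
    for i in range(asio["deviceCount"]):
        dev = pa.get_device_info_by_host_api_device_index(asio["index"], i)
        print(dev["name"],
              "in:", dev["defaultLowInputLatency"],
              "out:", dev["defaultLowOutputLatency"])
finally:
    pa.terminate()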

Since I'm not allowed to comment, I'll ask you here: What is the value of (stream.get_output_latency() + stream.get_input_latency()) * wave_file.getframerate()? And how did you get the number 1060 in the first place?
With the line of code you marked:
latencyCalc = math.ceil((stream.get_output_latency() + stream.get_input_latency()) * wave_file.getframerate()) + 1060
you simply add an extra 1060 frames to your total latency. It's not clear to me from your description why you do this, but I assume you have measured the total latency in your resulting file and found that there is always a constant number of extra frames beyond the sum of the input and output latencies. So, did you consider that this extra delay might be due to processing? You said that you do some processing of the input audio signal, and processing certainly takes some time. Try the same thing with an unaltered input signal and see if the extra delay is reduced or removed. Even the other parts of your application can slow the recording down, e.g. if the application has a GUI. You didn't describe your app completely, but I'm guessing that the extra latency is caused by your code and the operations it performs. And why is the 'magic number' always the same? Because your code is always the same.
To summarize: what does the 'magic number' represent? Obviously, it represents some extra latency in addition to your total round-trip latency.
What is causing this extra latency? The cause is most likely somewhere in your code: your application is doing something that takes additional time and thus introduces additional delay. The only other possibility that comes to mind is that you have added some extra 'silence period' somewhere in your settings, so you can check that too.

Related

Interfacing Load Cell and HX711 with Raspberry Pi 3 using Python 3

I am currently working on an IoT project in which I am trying to interface my Raspberry Pi 3 with an HX711 so that I can read weight readings from my load cell, which has a range of 200 kg.
For the Python code, I tried this Python library from GitHub.
According to the description of this repository, I first calibrated the HX711 (calibration.py) using a known 5 kg weight, which gave me the offset and scale. I then copied those values into example_python3.py.
But I keep getting variable readings from the load cell, as shown in the following screenshot from the Raspberry Pi terminal:
I am getting this output with a 5 kg load. I have repeated this calibrate-and-check cycle many times, but my output is still variable.
This is the code that I was using:
import RPi.GPIO as GPIO
import time
import sys
from hx711 import HX711

# Force Python 3 ###########################################################
if sys.version_info[0] != 3:
    raise Exception("Python 3 is required.")
############################################################################

GPIO.setwarnings(False)
hx = HX711(5, 6)

def cleanAndExit():
    print("Cleaning...")
    GPIO.cleanup()
    print("Bye!")
    sys.exit()

def setup():
    """
    code run once
    """
    # Pasted offset and scale I got from calibration
    hx.set_offset(8608276.3125)
    hx.set_scale(19.828315054835493)

def loop():
    """
    code run continuously
    """
    try:
        val = hx.get_grams()
        print(val)
        hx.power_down()
        time.sleep(0.001)
        hx.power_up()
    except (KeyboardInterrupt, SystemExit):
        cleanAndExit()

##################################
if __name__ == "__main__":
    setup()
    while True:
        loop()
Unfortunately I do not have an HX711 so I cannot test your code. But I can give some pointers that might help.
My main question is: why does your code contain
hx.power_down()
time.sleep(0.001)
hx.power_up()
in the loop? According to the datasheet, the output settling time (i.e. the time from power-up, reset, input channel change or gain change to valid, stable output data) is 400 ms. So if you power down the HX711 after every reading, every reading will be unstable!
Furthermore, how much deviation between readings do you expect? Your values currently fluctuate between roughly 4990 and 5040, which is a difference of 50, so only 1% difference. That's not really bad. Unfortunately, the accuracy of the HX711 is not specified in the datasheet, so I can't determine if this is "correct" or "wrong". However, you should check what you can expect before assuming something is wrong. The data sheet mentions an input offset drift of 0.2 mV, while the full-scale differential input voltage (at gain 128) is 20 mV. That's 1% too. (That might be coincidence, but you should probably dive into it if you want to be sure.)
Did you check the timing of your serial communication? The code just toggles IO pins without any specific timing, while the datasheet mentions that PD_SCK must be high for at least 0.2 us and at most 50 us. Toggling faster might result in incorrect readings, while toggling slower might cause the device to reset (since keeping PD_SCK high for longer than 60 us causes it to enter power down mode). See for example this C implementation which included a fix for fast CPUs. Your Python library did not include this fix.
I'm not sure how you would enforce this using a Raspberry Pi, though. It seems like you're just lucky if you get this working reliably on a Raspberry Pi (or any other non-real-time platform), because if your code is interrupted at a bad time, the reading may fail (see this comment for example).
I've read a few reports on the internet from people stating that the HX711 needs to "warm up" for 2-3 minutes, and that the readings become more stable after that time.
Finally, the issue could also be hardware related. There seem to be many low-quality boards. For example, there is this known design fault that might be related.
NB: Also note that your offset (8,608,276.3125) can't be correct. The HX711 returns a 24-bit 2's complement value. That means a value between -8,388,608 and +8,388,607. Your value is outside that range. The reason that you got this value is that the library you're using is not taking the data coding into account correctly. See this discussion. There are several forks of the repository in which this was fixed, for example this one. If correctly read, the value would have been -8,168,939.6875. This bug won't affect the accuracy, but could result in incorrect results for certain weights.
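To illustrate the 2's complement correction, a small sketch (the function name is mine):

def hx711_signed(raw):
    # Interpret a raw 24-bit HX711 reading as a two's complement signed value
    return raw - (1 << 24) if raw & 0x800000 else raw

# For the offset from the question (an average of 16 readings, hence the fraction):
# 8608276.3125 - 16777216 = -8168939.6875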
Just a final note for everyone thinking that the precision is 24-bit and thus it should return very reliable readings: precision is not the same as accuracy. Just because the device returns 24 bits does not mean that those bits are correct. How close the value is to the real value (the actual weight) depends on many other factors. Note that the library that you use by default reads the weight 16 times and averages the result. That shouldn't be necessary at all if the device is so accurate!
My advice (a short sketch applying the first two points follows this list):
Remove the power down from your loop (you may want to use it once in your setup though).
Wait at least 400 ms before your first reading.
Fix the data conversion for 2's complement logic if you want to use the full range.
Do not expect the device to be accurate to 24 bits.
If you still want to improve the accuracy of the readings, you might want to dive into the hardware- or timing-related issues (the datasheet and these github issues contain a lot of information).
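A minimal sketch applying the first two points, assuming the same hx711 library as in your code (untested, since I don't have the hardware):

import time
from hx711 import HX711  # same library as in the question

hx = HX711(5, 6)
hx.set_offset(8608276.3125)       # calibration values from the question
hx.set_scale(19.828315054835493)

hx.power_up()     # power up once, outside the loop
time.sleep(0.4)   # wait for the 400 ms settling time before the first reading

while True:
    print(hx.get_grams())
    time.sleep(0.1)   # read at whatever rate you need, without power cycling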

Is Python sometimes simply not fast enough for a task?

I noticed a lack of good soundfont-compatible synthesizers written in Python. So, a month or so ago, I started some work on my own (for reference, it's here). Making this was also a challenge that I set for myself.
I keep coming up against the same problem again and again and again, summarized by this:
To play sound, a stream of data with a more-or-less constant rate of flow must be sent to the audio device
To synthesize sound in real time based on user input, little-to-no buffering can be used
Thus, there is a cap on the amount of time one 'buffer generation loop' can take
Python, as a language, simply cannot run fast enough to synthesize sound within this time limit
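To make that cap concrete, here are rough numbers (a 44.1 kHz output and 256-frame buffers are just assumed values):

SAMPLE_RATE = 44100    # assumed output rate
BUFFER_FRAMES = 256    # assumed buffer size requested by the audio API

budget = BUFFER_FRAMES / SAMPLE_RATE
print(f"{budget * 1000:.2f} ms per buffer")   # ~5.80 ms

# Every active note must be synthesized and mixed within that window;
# with 25 notes that leaves roughly 0.23 ms of pure-Python work per note.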
The problem is not my code, or at least I've tried to optimize it to extreme levels: using local variables in time-sensitive parts of the code, avoiding dots to access attributes in loops, using itertools for iteration, using C-implemented built-ins like max, changing thread switching parameters, doing as few calculations as possible, making approximations; the list goes on.
Using PyPy helps, but even that starts to struggle after not too long.
It's worth noting that (at best) my synth can currently play about 25 notes simultaneously. But this isn't enough. FluidSynth, a synth written in C, caps the number of notes per instrument at 128, and it also supports multiple instruments at a time.
Is my assertion that Python simply cannot be used to write a synthesizer correct? Or am I missing something very important?

How to find the expected file transfer duration in Python?

I am transferring files with rsync from Python. I have a basic UI where the user can select the files and initiate the transfer. I want to show the expected time to transfer all the files they selected. I know the total size of all the files in bytes. What's a smart way to show them the expected transfer duration? It doesn't have to be precise.
To calculate an estimated time to completion for anything, you simply need to keep track of the amount of time taken to transfer the data currently completed and base your estimate for the rest of the data on the past speed. Once you get that basic method, there are all sorts of ways you can adjust your estimate to take account of acceleration, congestion and other effects - for example, taking the amount of data transferred in the last 100 seconds, breaking this down into 20s increments and calculating a weighted mean speed.
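As a rough sketch of that idea (class and method names are mine, and it uses a simple sliding window of recent throughput rather than the weighted 20 s buckets described above):

import time
from collections import deque

class TransferEta:
    """Rough ETA from recently observed throughput."""

    def __init__(self, total_bytes, window_seconds=100):
        self.total = total_bytes
        self.done = 0
        self.samples = deque()   # (timestamp, bytes_done) pairs
        self.window = window_seconds

    def update(self, bytes_done):
        now = time.time()
        self.done = bytes_done
        self.samples.append((now, bytes_done))
        # keep only the recent window of progress samples
        while self.samples and now - self.samples[0][0] > self.window:
            self.samples.popleft()

    def eta_seconds(self):
        if len(self.samples) < 2:
            return None   # not enough history yet
        (t0, b0), (t1, b1) = self.samples[0], self.samples[-1]
        if t1 <= t0 or b1 <= b0:
            return None
        speed = (b1 - b0) / (t1 - t0)   # bytes per second over the window
        return (self.total - self.done) / speed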
I'm not familiar with using rsync in Python. Are you just calling it using os.exec*() or are you using something like pysync (http://freecode.com/projects/pysync)? If you are spawning rsync processes, you'll struggle to get granular data (esp. if transferring large files). I suppose you could spawn rsync --progress and get/parse the progress lines in some sneaky way but that seems horridly awkward.

Storing and replaying binary network data with python

I have a Python application which sends 556 bytes of data across the network at a rate of 50 Hz. The binary data is generated using struct.pack() which returns a string, which is subsequently written to a UDP socket.
As well as transmitting this data, I would like to save this data to file as space-efficiently as possible, including a timestamp for each message, so that I can replay the data at a later time. What would be the best way of doing this using Python?
I have mulled over using a logging object, but have not yet found out whether Python can read in log files so that I can replay the data. Also, I don't know whether the logging object can handle binary data.
Any tips would be much appreciated! Although Wireshark would be an option, I'd rather store the data using my application so that I can automatically start new data files each time I run the program.
Python's logging system is intended to process human-readable strings, and it's intended to be easy to enable or disable depending on whether it's you (the developer) or someone else running your program. Don't use it for something that your application always needs to output.
The simplest way to store the data is to just write the same 556-byte string that you send over the socket out to a file. If you want to have timestamps, you could precede each 556-byte message with the time of sending, converted to an integer, and packed into 4 or 8 bytes using struct.pack(). The exact method would depend on your specific requirements, e.g. how precise you need the time to be, and whether you need absolute time or just relative to some reference point.
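For example, a minimal sketch of that format (an 8-byte timestamp plus a 2-byte length field; the helper names are mine):

import struct
import time

HEADER = struct.Struct("!dH")   # 8-byte float timestamp + 2-byte payload length
                                # (the length is fixed at 556 here, but storing it
                                # keeps the format self-describing)

def log_message(fileobj, payload):
    fileobj.write(HEADER.pack(time.time(), len(payload)) + payload)

def read_messages(fileobj):
    while True:
        header = fileobj.read(HEADER.size)
        if len(header) < HEADER.size:
            return
        timestamp, length = HEADER.unpack(header)
        yield timestamp, fileobj.read(length)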
One possibility for a compact timestamp for replay purposes: take the time as a floating point number of seconds since the epoch with time.time(), multiply by 50 since you said you're repeating this 50 times a second (the resulting unit, one fiftieth of a second, is sometimes called "a jiffy"), truncate to int, subtract the similar int count of jiffies since the epoch that you measured at the start of your program, and struct.pack the result into an unsigned int with the number of bytes you need to represent the intended duration -- for example, with 2 bytes for this timestamp you could represent runs of about 1200 seconds (20 minutes), but if you plan longer runs you'd need 4 bytes (3 bytes is just too unwieldy IMHO ;-).
Not all operating systems have time.time() returning decent precision, so you may need more devious means if you need to run on such unfortunately limited OSs. (That's VERY os-dependent, of course). What OSs do you need to support...?
Anyway: for even more compactness, use a slightly higher multiplier than 50 (say 10000) for more accuracy, and store, each time, the difference from the previous timestamp -- since that difference should not be much different from a jiffy (if I understand your spec correctly), it should be about 200 or so of these "ten-thousandths of a second", so you can store it in a single unsigned byte (and have no limit on the duration of the runs you're storing for future replay). This depends even more on accurate returns from time.time(), of course.
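A sketch of that delta scheme (names and the clamping choice are mine):

import struct
import time

RESOLUTION = 10000   # ticks per second, as suggested above

class DeltaLogger:
    """Store each message preceded by a one-byte delta timestamp."""

    def __init__(self, fileobj):
        self.f = fileobj
        self.prev = int(time.time() * RESOLUTION)

    def write(self, payload):
        now = int(time.time() * RESOLUTION)
        delta = min(now - self.prev, 255)   # clamp to one unsigned byte
        self.prev = now
        self.f.write(struct.pack("B", delta) + payload)

# At 50 Hz the delta is about 200 ticks, which fits comfortably in a byte.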
If your 556-byte binary data is highly compressible, it will be worth your while to use gzip to store the stream of timestamp-then-data in compressed form; this is best assessed empirically on your actual data, though.
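For example, reusing the hypothetical log_message() helper from the earlier sketch:

import gzip

# The same helpers work unchanged on a compressed file object:
with gzip.open("capture.bin.gz", "wb") as f:
    log_message(f, message_bytes)   # message_bytes: one 556-byte packed message (hypothetical)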

How to get sound input from the microphone in Python, and process it on the fly?

Greetings,
I'm trying to write a program in Python which would print a string every time it gets a tap in the microphone. When I say 'tap', I mean a loud sudden noise or something similar.
I searched in SO and found this post: Recognising tone of the audio
I think the PyAudio library would fit my needs, but I'm not quite sure how to make my program wait for an audio signal (realtime microphone monitoring), and once I get one, how to process it (do I need to use a Fourier transform, as instructed in the above post)?
Thank you in advance for any help you could give me.
If you are using Linux, you can use pyalsaaudio.
For Windows, we have PyAudio, and there is also a library called SoundAnalyse.
I found an example for Linux here:
#!/usr/bin/python
## This is an example of a simple sound capture script.
##
## The script opens an ALSA pcm for sound capture, sets
## various attributes of the capture, reads in a loop,
## and then prints the volume.
##
## To test it out, run it and shout at your microphone:

import alsaaudio, time, audioop

# Open the device in nonblocking capture mode. The last argument could
# just as well have been zero for blocking mode. Then we could have
# left out the sleep call at the bottom of the loop.
inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NONBLOCK)

# Set attributes: Mono, 8000 Hz, 16 bit little endian samples
inp.setchannels(1)
inp.setrate(8000)
inp.setformat(alsaaudio.PCM_FORMAT_S16_LE)

# The period size controls the internal number of frames per period.
# The significance of this parameter is documented in the ALSA API.
# For our purposes, it is sufficient to know that reads from the device
# will return this many frames, each frame being 2 bytes long.
# This means that the reads below will return either 320 bytes of data
# or 0 bytes of data. The latter is possible because we are in nonblocking
# mode.
inp.setperiodsize(160)

while True:
    # Read data from device
    l, data = inp.read()
    if l:
        # Print the maximum of the absolute value of all samples in a fragment.
        print(audioop.max(data, 2))
    time.sleep(.001)
...and once I get one, how to process it (do I need to use a Fourier transform, as instructed in the above post)?
If you want a "tap" then I think you are interested in amplitude more than frequency. So Fourier transforms probably aren't useful for your particular goal. You probably want to make a running measurement of the short-term (say 10 ms) amplitude of the input, and detect when it suddenly increases by a certain delta. You would need to tune the parameters of:
what is the "short-term" amplitude measurement
what is the delta increase you look for
how quickly the delta change must occur
Although I said you're not interested in frequency, you might want to do some filtering first, to filter out especially low and high frequency components. That might help you avoid some "false positives". You could do that with an FIR or IIR digital filter; Fourier isn't necessary.
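Here is a rough sketch of that amplitude-based approach using PyAudio and audioop (the chunk size, threshold, and smoothing factor are arbitrary and would need tuning):

import audioop
import pyaudio

RATE = 44100
CHUNK = 441            # ~10 ms of audio per read
THRESHOLD_DELTA = 3.0  # "tap" = level jumps to 3x the recent background (arbitrary)

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)

background = None
try:
    while True:
        data = stream.read(CHUNK)
        level = audioop.rms(data, 2)   # short-term amplitude of this ~10 ms block
        if background is None:
            background = level
        if background and level > THRESHOLD_DELTA * background:
            print("tap!")
        # slowly track the background level (simple exponential smoothing)
        background = 0.9 * background + 0.1 * level
finally:
    stream.stop_stream()
    stream.close()
    pa.terminate()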
I know it's an old question, but if someone is looking here again... see https://python-sounddevice.readthedocs.io/en/0.4.1/index.html .
It has a nice example "Input to Output Pass-Through" here https://python-sounddevice.readthedocs.io/en/0.4.1/examples.html#input-to-output-pass-through .
... and a lot of other examples as well ...
