How to control a sound card programmatically? - python

I'm playing with pyaudio on a mac using a Saffire Pro 40 sound card.
Currently I have two inputs plugged in and I'd like to control the levels of the second input channel programmatically. (This works fine using the sound card's mix control software).
I've been going through the pyaudio docs, but haven't found anything glaring on this issue so far. What's the simplest way to essentially do what the mix control software does (control volume per channel) programmatically? (A Python API would be nice, but not essential)
To simplify: it looks like it's possible to manually read the streams from the channels I want to control, scale them using numpy, them write them as output, but I'm hoping there is a method to simply send a normalized value per channel to control it.
So instead of something like this:
stream1 = pyaudioInstance.open( format = FORMAT,
channels = CHANNELS,
rate = RATE,
input = True,
output = True,
input_device_index = 0,
frames_per_buffer = CHUNK
)
stream2 = pyaudioInstance.open( format = FORMAT,
channels = CHANNELS,
rate = RATE,
input = True,
input_device_index = 1,
frames_per_buffer = CHUNK
)
while processingAudio:
# manually fetch each channel
data1In = stream1.read(CHUNK)
data2In = stream2.read(CHUNK)
# convert to numpy to easy scale the arrays
decodeddata1 = numpy.fromstring(data1In, numpy.int16)
decodeddata2 = numpy.fromstring(data2In, numpy.int16)
newdata = (decodeddata1 * 0.5 + decodeddata2* 0.1).astype(numpy.int16)
# finally write the processed data
stream1.write(result.tostring())
This is a bit misleading but I would need to mix separate channels from the same input device index. However what I'm hoping is something like:
someSoundCardAPI.channels[0].setVolume(0.2)
Having a look at the Channel Maps example feels closer to what I'm after. At the moment I find the host_api_specific part of API a bit confusing and I was hoping someone already has some experience successfully using this.
I am using OSX 10.10

I don't really have any experience with OSX, so I don't know, but normally you can remote-control everything with AppleScript.
See, for example, this question.
It doesn't say how to control the volume of a single channel separately, though.
Probably you should ask there ...
Regarding the inferior work-around, you can use python-sounddevice to create a little (untested) Python script:
import sounddevice as sd
def callback(indata, outdata, *stuff):
outdata[:] = indata * [1, 0.5]
with sd.Stream(channels=2, callback=callback):
input()
This script will run until you press <Return> and it will reduce the volume of the second channel.

Related

Inaccurate real-time audio FFT interpretation with Python

I'm trying to use Python to create a live music visualization. The libraries I'm using are SoundCard (for live audio capture) and Librosa (for short-time Fourier transform).
However I suspect I'm not interpreting the audio data correctly. Looking at the 100Hz-200Hz bin, I get a constant stream of sound even when the song doesn't contain that much bass (or really, any whatsoever). I admit I am a bit in over my head with all the audio processing FFT stuff, since it's not really my expertise and the math beats me most of the time.
This is the function that captures and analyses the audio. lb is set to the speakers and works properly. Fs is set to 48000 and I record 1000 frames in the attempt of keeping 48FPS. fftwindowsize is set to 2048*8 because... I'm not sure. I increased the number until Librosa stopped throwing warnings.
def audioanalysis():
with lb.recorder(samplerate=Fs) as mic:
rawdata = mic.record(numframes=1000)
datalen: int = int(rawdata.size/2)
monodata = numpy.empty(datalen)
for x in range(0, datalen):
monodata[x] = max(rawdata[x][0], rawdata[x][1])
data = numpy.abs(librosa.stft(monodata, n_fft=fftwindowsize, hop_length=1024))
return librosa.amplitude_to_db(data, ref=numpy.max)
And the code for making buckets:
frequencies = librosa.core.fft_frequencies(n_fft=fftwindowsize)
freq_index_ratio = len(frequencies)/frequencies[len(frequencies)-1] / 2
[...]
for i in range(0,buckets):
avg = 0
for j in range (i * bucketsize, (i+1)*bucketsize):
avg += amp(spectrogram=spectrogram, freq=j)
amps[i] = avg/bucketsize
def amp(spectrogram, freq) -> float:
return spectrogram[int(freq*freq_index_ratio)]
Over the course of a song, amps[1] (so 100Hz-200Hz) stays in the -50dB to -30dB range, which isn't really useful or representative of the song playing.
Is my FFT analysis wrong? Is there no way to better interpret short samples of sound?
P.S. I know my Python code isn't excellent. This is my first project in Python :)

Read left channel of wav data into numpy array

I'm using pyaudio to take input from a microphone or read a wav file, and analyze the stream while playing it. I want to only analyze the right channel if the input is stereo. I've been able to extract the data and convert to integers using loops:
levels = []
length = len(data)
if channels == 1:
for i in range(length//2):
volume = abs(struct.unpack('<h', data[i:i+2])[0])
levels.append(volume)
elif channels == 2:
for i in range(length//4):
j = 4 * i + 2
volume = abs(struct.unpack('<h', data[j:j+2])[0])
levels.append(volume)
I think this working correctly, I know it runs without error on a laptop and Raspberry Pi 3, but it appears to consume too much time to run on a Raspberry Pi Zero when simultaneously streaming the output to a speaker. I figure that eliminating the loop and using numpy may help. I assume I need to use np.ndarray to do this, and the first parameter will be (CHUNK,) where CHUNK is my chunk size for analyzing the audio (I'm using 1024). And the format would be '<h', as in the struct code above, I think. But I'm at a loss as to how to code it correctly for each of the two cases (mono and right channel only for stereo). How do I create the numpy arrays for each of the two cases?
You are here reading 16-bit integers from a binary file. It seems that you are first reading the data into data variable with something like data = f.read(), which is here not visible. Then you do:
for i in range(length//2):
volume = abs(struct.unpack('<h', data[i:i+2])[0])
levels.append(volume)
BTW, that code is wrong, it shoud be abs(struct.unpack('<h', data[2*i:2*i+2])[0]), otherwise you are overlapping bytes from different values.
To do the same with numpy, you should just do this (instead of both f.read()and the whole loop):
data = np.fromfile(f, dtype='<i2')
This is over 100 times faster than the manual thing above in my test on 5 MB of data.
In the second case, you have interleaved left-right-left-right values. Again you can read them all (assuming you have enough memory) and then access only one half:
data = np.fromfile(f, dtype='<i2')
left = data[::2]
right = data[1::2]
This processes everything, even though you need just one half, but it is still much much faster.
EDIT: If the data not coming from a file, np.fromfile can be replaced with np.frombuffer. Then you have this:
channel_data = np.frombuffer(data, dtype='<i2')
if channels == 2:
channel_data = channel_data[1::2]
levels = np.abs(channel_data)

Making a synthesized sound at arbitrary pitch in python

I'm finding it strangely hard to find a synthesizer module for python that allows the program to play a note at an arbitrary pitch. Preferably the note should be more than just a pure sinewave and should include at least a few harmonics - it should be more than just a beep.
The idea is to be able to write something like
the_module.play(frequency, loudness, duration)
or
my_synth = the_module.newsynth()
my_synth.play(frequency, loudness, duration)
where frequency is specified in Hz, and have a synthesized tone play from the user's speakers. There's JavaScript modules for doing this, such as Tone.js, but does anyone know of an approach using Python?
If on windows, you can use the builtin winsound.Beep.
If on Linux, you need to write directly to /dev/audio, like suggested here:
def beep(frequency, amplitude, duration):
sample = 8000
half_period = int(sample/frequency/2)
beep = chr(amplitude)*half_period+chr(0)*half_period
beep *= int(duration*frequency)
audio = file('/dev/audio', 'wb')
audio.write(beep)
audio.close()

Python wave audio sample rate

I am trying to tie together javascript front end, flask server and microsoft's cognitive services for audio identification.
Microsoft's server requests audio data to be with specific parameters, particularly it requests 16000 framerate\frequency.
But from the browser on windows I can only get 41000.
Now, I get audio at 41000, and then save it like this:
audioData = message['audio']
af = wave.open('audioData.wav', 'w')
af.setnchannels(1)
af.setparams((1, 2, 16000, 0, 'NONE', 'Uncompressed'))
af.writeframes(audioData)
af.close()
Audio is received through socketio in form of a dict\json data. If I save it directly without changing anything, it sounds fine. But If I change the sample rate to 16000, it obviously sounds distorted and very slow, so a few seconds of audio stretch into a minute or so.
How do I correctly change the audio rate witohut affecting how it sounds in Python 3.4?
Thanks.
EDIT:
Here is the working code:
with open("audioData_original.wav", 'wb') as of:
of.write(message['audio'])
audioFile = wave.open("audioData_original.wav", 'r')
n_frames = audioFile.getnframes()
audioData = audioFile.readframes(n_frames)
originalRate = audioFile.getframerate()
af = wave.open('audioData.wav', 'w')
af.setnchannels(1)
af.setparams((1, 2, 16000, 0, 'NONE', 'Uncompressed'))
converted = audioop.ratecv(audioData, 2, 1, originalRate, 16000, None)
af.writeframes(converted[0])
af.close()
audioFile.close()
The downside here is that even though I get audio data from mediaRecorder Api through json, so I have it in memory... And I write it down on disk, and open it again to be able to get the sampling rate using wave's functions. But how do I do it without writing it to disk? Thanks. If I have to make a new question for that, sure, can do that.
EDIT2:
Oh, ok, answering my own follow-up question - io.BytesIO did the trick.
Have a look at audioop.ratecv (it's in the standard library)
Let it operate on the raw frames of your sample (in your case, audioData).
It's a simple algorithm so expect some sound quality loss, but I guess for speech that is insignificant.

Producing histogram of sound for m3u of radio station

I want to sample a radio station which broadcasts in format *.m3u8 and to produce the histogram of the first n seconds (where the user fixes n).
I had been trying using radiopy but it doesn't work and gnuradio seems useless. How can I produce and show this histogram?
EDIT: Now I use Gstreamer v1.0 so I can play it directly but now I need to live-sample my broadcast. How can I do it using Gst?
gnuradio seems useless
Well, I'd argue that this is what you're looking for, if you're looking for a live spectrogram:
As you can see, it's but a matter of connecting a properly configured audio source to a Qt GUI sink. If properly configured (I wrote an answer about that, and a GNU Radio wiki page as well).
Point is: you shouldn't be trying to play an internet station by yourself. Let a software do that which knows what it is doing.
In your case, I'd recommend:
using VLC or mplayer to write the radio, after decoding it to PCM 32bit float of a fixed sampling rate to a file.
Use Python with the libraries Numpy to open that file (samples = numpy.fromfile(filename, dtype=numpy.float32)), and matplotlib/pyplot to plot a spectrogram to a file, i.e. something like (untested, because written right here):
#!/usr/bin/python2
import sys
import os
import tempfile
import numpy
from matplotlib import pyplot
stream = sys.argv[1] ## you can pass the stream URL as argument
outfile = sys.argv[2] ## second argument: output file; ending determines type!
num_of_seconds = min(int(sys.argv[3]), 60) # not more than 1min of streaming
(intermediate_file, inter_fname) = tempfile.mkstemp()
# pan = number of output channels (1: mix to mono)
# resample = sampling rate. this must be the same for all files, so that we can actually compare spectrograms
# format = sample format, here: native floats
sys.system("mplayer -endpos %d -vo null -af pan=1 -af resample=441000 -af format=floatne -ao pcm:nowaveheader:file=%s" % num_of_seconds % inter_fname)
samples = numpy.fromfile(inter_fname, dtype=float32)
pyplot.figure((num_of_seconds * 44100, 256), dpi=1)
### Attention: this call to specgram expects of you to understand what the Discrete Fourier Transform does.
### This uses a Hanning window by default; whether that is appropriate for audio data is questionable. Use all your DSP skillz!
### pyplot.specgram has a lot of options, including colormaps, frequency scaling, overlap. Make yourself acquintanced with those!
pyplot.specgram(samples, NFFT=256, FS=44100)
pyplot.savefig(outfile, bbox_inches="tight")

Categories