Python wave audio sample rate

I am trying to tie together a JavaScript front end, a Flask server and Microsoft's Cognitive Services for audio identification.
Microsoft's server requires audio data with specific parameters; in particular, it requires a frame rate (sampling frequency) of 16000.
But from the browser on Windows I can only get 41000.
Now, I get audio at 41000, and then save it like this:
audioData = message['audio']
af = wave.open('audioData.wav', 'w')
af.setnchannels(1)
af.setparams((1, 2, 16000, 0, 'NONE', 'Uncompressed'))
af.writeframes(audioData)
af.close()
Audio is received through socketio in the form of dict/JSON data. If I save it directly without changing anything, it sounds fine. But if I change the sample rate to 16000, it obviously sounds distorted and very slow, so a few seconds of audio stretch into a minute or so.
How do I correctly change the audio rate without affecting how it sounds in Python 3.4?
Thanks.
EDIT:
Here is the working code:
import audioop  # note: removed from the standard library in Python 3.13
import wave
with open("audioData_original.wav", 'wb') as of:
    of.write(message['audio'])
audioFile = wave.open("audioData_original.wav", 'r')
n_frames = audioFile.getnframes()
audioData = audioFile.readframes(n_frames)
originalRate = audioFile.getframerate()
af = wave.open('audioData.wav', 'w')
af.setnchannels(1)
af.setparams((1, 2, 16000, 0, 'NONE', 'Uncompressed'))
# ratecv returns (converted_fragment, state); only the fragment is written
converted = audioop.ratecv(audioData, 2, 1, originalRate, 16000, None)
af.writeframes(converted[0])
af.close()
audioFile.close()
The downside here is that even though I get the audio data from the MediaRecorder API through JSON, so I already have it in memory, I write it to disk and open it again just to be able to get the sampling rate using wave's functions. But how do I do it without writing it to disk? Thanks. If I have to make a new question for that, sure, I can do that.
EDIT2:
Oh, ok, answering my own follow-up question - io.BytesIO did the trick.
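The io.BytesIO approach from EDIT2 might look roughly like the sketch below. It is not the asker's actual code: the helper names are made up for illustration, and because audioop was removed from the standard library in Python 3.13, the rate conversion here is a plain linear interpolation standing in for audioop.ratecv. It assumes mono 16-bit PCM and applies no low-pass filter, so some aliasing is to be expected when downsampling.

```python
import io
import sys
import wave
from array import array


def read_wav_bytes(wav_bytes):
    """Parse a complete WAV file held in memory; return (params, raw frames)."""
    with wave.open(io.BytesIO(wav_bytes), 'rb') as wf:
        return wf.getparams(), wf.readframes(wf.getnframes())


def resample_linear(frames, src_rate, dst_rate):
    """Resample mono 16-bit PCM by linear interpolation (hypothetical helper)."""
    samples = array('h')
    samples.frombytes(frames)
    if sys.byteorder == 'big':
        samples.byteswap()  # WAV data is little-endian
    if not samples:
        return b''
    n_out = int(len(samples) * dst_rate / src_rate)
    out = array('h')
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(int(round(samples[lo] * (1 - frac) + samples[hi] * frac)))
    if sys.byteorder == 'big':
        out.byteswap()
    return out.tobytes()


def write_wav_bytes(frames, rate):
    """Build a mono 16-bit WAV file in memory and return its bytes."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(frames)
    return buf.getvalue()


def convert_wav_bytes(wav_bytes, dst_rate=16000):
    """Read the source rate from the in-memory file, resample, re-wrap as WAV."""
    params, frames = read_wav_bytes(wav_bytes)
    return write_wav_bytes(resample_linear(frames, params.framerate, dst_rate), dst_rate)
```

With this, something like convert_wav_bytes(message['audio']) would return 16000 Hz WAV bytes without touching the disk, assuming message['audio'] really is a complete mono 16-bit WAV file.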

Have a look at audioop.ratecv (it's in the standard library)
Let it operate on the raw frames of your sample (in your case, audioData).
It's a simple algorithm so expect some sound quality loss, but I guess for speech that is insignificant.

Related

Python nidaqmx stream read does not change on every read

What I'm trying to do is set up 16 analog input channels, sample them constantly at a given rate and read 1 sample from each channel when calling the read function. Ideally I would like to read the newest sample so I can timestamp it when reading.
The problem is that the readings do not change from read to read, only after a few seconds. If I adjust the sampling speed, I can get to a situation where I get an error saying the software can't keep up with the hardware sampling rate.
Which part of my code is wrong?
import numpy
import nidaqmx
from nidaqmx.stream_readers import AnalogSingleChannelReader, AnalogMultiChannelReader
from nidaqmx.constants import Edge, AcquisitionType
# Create a task and a reader
task = nidaqmx.Task()
values_read = numpy.zeros(16, dtype = numpy.float64)
task.ai_channels.add_ai_current_chan('cDAQ1Mod2/ai0:15')
task.timing.cfg_samp_clk_timing(rate = 1000, active_edge = Edge.RISING, sample_mode = AcquisitionType.CONTINUOUS, samps_per_chan = 1)
reader = AnalogMultiChannelReader(task.in_stream)
task.start()
while 1:
    reader.read_one_sample(values_read)
    print(values_read)
The sampling rate is 1000 Hz, but you are reading only one sample per call. Each Read call usually takes a few milliseconds, so you are not reading fast enough, hence the buffer overflow error.
Suggestions:
Reduce sample rate.
Read more samples per Read call.
Since you want to read only the latest data and timestamp yourself, you can use the On Demand software timed acquisition. See example ai_voltage_sw_timed.py
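To see why the first two suggestions help, here is a rough back-of-the-envelope model of the buffer backlog. The 3 ms read time is a hypothetical figure (the answer only says "a few milliseconds"), the function name is made up, and the model ignores that a read blocks when the buffer is empty, so treat it as a sketch rather than a measurement.

```python
def backlog_after(seconds, sample_rate_hz, read_time_s, samples_per_read=1):
    """Estimate unread samples per channel left in the buffer after `seconds`."""
    produced = sample_rate_hz * seconds                      # written by the hardware
    consumed = (seconds / read_time_s) * samples_per_read    # taken out by the loop
    return max(0.0, produced - consumed)


# Assumed 3 ms per read call (hypothetical value):
stale = backlog_after(1.0, 1000, 0.003)           # one sample per read at 1000 Hz
batched = backlog_after(1.0, 1000, 0.003, 10)     # ten samples per read
slower = backlog_after(1.0, 100, 0.003)           # reduced sample rate
```

Under these assumptions the one-sample loop falls several hundred samples further behind every second, which would also explain why the printed values lag behind the signal, while a lower rate or larger reads keep the backlog at zero.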

How to play base64-encoded sound in Python

I want to play a base64-encoded sound in Python. I've tried using pygame.mixer, but all I get is a hiss of white noise.
This is an example of my code:
import pygame
coinflip = b'data:audio/ogg;base64,T2dnUwACAAAAAAAAAACYZ...' # Truncated for brevity
flip = pygame.mixer.Sound(coinflip)
ch = flip.play()
while ch.get_busy():
    pygame.time.wait(100)
The pygame mixer works well if I import a wav/mp3/ogg file, but I want to write a compact self-contained program that doesn't need external files, so I'm trying to embed a base64 encoded version of the sound in the Python code.
NB: The solution doesn't need to be using pygame, but it would be preferable since I'm already using it elsewhere in the program.
The reason you hear white noise is that you are trying to play audio data with a different encoding than expected.
I think the documentation is not 100% clear about this, but it states that a Sound object represents actual sound sample data. It can be loaded from a file or a buffer. Apparently, when using a buffer, it does expect raw sample data, not some base64-encoded data (and not even raw MP3 or OGG file data).
Note that there has been an issue reported about this on the GitHub repository.
So there are two things you can do:
Get the raw bytes of your sound (e.g. using pygame.mixer.Sound(filename).get_raw(), or for simple sounds you could create them mathematically) and encode that in base64 format.
Wrap the original (MP3/OGG encoded) file data in a BytesIO object, which is a file-like object, so the Sound module will treat it like a file and properly decode it.
Note that in both cases, you still need to base64-decode the data first! The pygame module doesn't automatically do that for you.
Since you want a small file, option 2 is the best. But I'll give examples of both solutions.
Example 1
If you have the raw sample data, you could use that directly as the buffer argument for pygame.mixer.Sound(). Note that the sample data must match the frequency, bit size and number of channels used by the mixer. The following is a small example that plays a 400 Hz sine wave tone.
import base64
import pygame
# The following bytes object consists of 160 unsigned 8-bit samples,
# which are base64 encoded. When played at 8000 Hz, it results in a
# tone of 400 Hz. The duration of the sound is 0.02 s, so it should
# be looped 50 times per second for longer sounds.
base64_encoded_sound_data = b'''
gKfK5vj/+ObKp39YNRkHAQcZNViAp8rm+P/45sqngFg1GQcBBxk1WI
Cnyub4//jmyqeAWDUZBwEHGTVYf6fK5vj/+ObKp39YNRkHAQcZNViA
p8rm+P/45sqngFg1GQcBBxk1WH+nyub4//jmyqd/WDUZBwEHGTVYf6
fK5vj/+ObKp39YNRkHAQcZNViAp8rm+P/45sqnf1g1GQcBBxk1WA==
'''
pygame.mixer.init(frequency=8000, size=8, channels=1, allowedchanges=0)
sound_data = base64.b64decode(base64_encoded_sound_data)
sound = pygame.mixer.Sound(sound_data)
ch = sound.play(loops=50)
while ch.get_busy():
    pygame.time.wait(100)
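For reference, sample data like that in Example 1 could be created mathematically along these lines. This is only a sketch: the function name and the rounding are my own assumptions, so the result may differ from the data above by a count or so per sample. It produces 8-bit samples centred on 128, which is the layout the mixer expects for size=8.

```python
import base64
import math


def sine_chunk_base64(freq_hz=400, rate_hz=8000, n_samples=160):
    """Return base64-encoded unsigned 8-bit samples of a sine tone."""
    samples = bytes(
        128 + round(127 * math.sin(2 * math.pi * freq_hz * n / rate_hz))
        for n in range(n_samples)
    )
    return base64.b64encode(samples)


# 160 samples at 8000 Hz is 0.02 s, i.e. 8 full cycles of a 400 Hz tone.
encoded = sine_chunk_base64()
```

The resulting bytes object can be pasted into the program in place of base64_encoded_sound_data.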
Example 2
If you want to use an MP3 or OGG file (which is generally much smaller), you could do it like the following example:
import base64
import io
import pygame
# Your base64-encoded data here.
# NOTE: Do NOT include the "data:audio/ogg;base64," part.
base64_encoded_sound_file_data = b'T2dnUwACAAAAAAAAAACY...' # Truncated for brevity
pygame.mixer.init()
sound_file_data = base64.b64decode(base64_encoded_sound_file_data)
assert sound_file_data.startswith(b'OggS') # just to prove it is an Ogg Vorbis file
sound_file = io.BytesIO(sound_file_data)
# The following line will only work with VALID data. With above example data it will fail.
sound = pygame.mixer.Sound(sound_file)
ch = sound.play()
while ch.get_busy():
    pygame.time.wait(100)
I would have preferred to use real data in this example as well, but the smallest useful Ogg file I could find was 9 kB, which would add about 120 long lines of data, and I don't think that is appropriate for a Stack Overflow answer. But if you replace it with your own data (which is hopefully a valid Ogg audio file), it should work.

Python - Reading a large audio file to a stream?

The Question
I want to load an audio file of any type (mp3, m4a, flac, etc) and write it to an output stream.
I tried using pydub, but it loads the entire file at once which takes forever and runs out of memory easily.
I also tried using python-vlc, but it's been unreliable and too much of a black box.
So, how can I open large audio files chunk-by-chunk for streaming?
Edit #1
I found half of a solution here, but I'll need to do more research for the other half.
TL;DR: Use subprocess and ffmpeg to convert the file to wav data, and pipe that data into np.frombuffer. The problem is, the subprocess still has to finish before frombuffer is used.
...unless it's possible to have the pipe written to on 1 thread while np reads it from another thread, which I haven't tested yet. For now, this problem is not solved.
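On the concern in Edit #1: a pipe can normally be read while the child process is still writing to it, so a separate thread should not be needed just to consume the output in chunks. Below is a minimal sketch of that idea. The helper names are hypothetical, the ffmpeg arguments are an assumption about a typical raw-PCM invocation, and the sketch has not been tested against ffmpeg itself.

```python
import subprocess


def stream_command_output(cmd, chunk_bytes=4096):
    """Yield the stdout of `cmd` in chunks while the process is still running."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        while True:
            chunk = proc.stdout.read(chunk_bytes)  # blocks until data or EOF
            if not chunk:
                break
            yield chunk
    finally:
        proc.stdout.close()
        proc.wait()


def ffmpeg_pcm_command(path, rate=44100, channels=1):
    """Assumed ffmpeg call: decode `path` to raw 16-bit little-endian PCM on stdout."""
    return ["ffmpeg", "-loglevel", "error", "-i", path,
            "-f", "s16le", "-ac", str(channels), "-ar", str(rate), "pipe:1"]
```

Usage would be something like for chunk in stream_command_output(ffmpeg_pcm_command("big_file.flac")): ..., where each chunk could then be passed to np.frombuffer(chunk, dtype=np.int16) if numpy is wanted.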
I think the Python package https://github.com/irmen/pyminiaudio can be helpful. You can stream an audio file like this:
import miniaudio
audio_path = "my_audio_file.mp3"
target_sampling_rate = 44100  # the input audio will be resampled at this sampling rate
n_channels = 1  # either 1 or 2
waveform_duration = 30  # in seconds
offset = 15  # this means that we read only in the interval [15s, duration of file]
waveform_generator = miniaudio.stream_file(
    filename=audio_path,
    sample_rate=target_sampling_rate,
    seek_frame=int(offset * target_sampling_rate),
    frames_to_read=int(waveform_duration * target_sampling_rate),
    output_format=miniaudio.SampleFormat.FLOAT32,
    nchannels=n_channels)
for waveform in waveform_generator:
    # do something with the waveform....
    pass
I know for sure that this works on mp3, ogg, wav and flac, but for some reason it does not work on mp4/aac, and I am still looking for a way to read mp4/aac.

How to change microphone sample rate to 16000 on linux?

I am currently working on a project for which I am trying to use DeepSpeech on a Raspberry Pi with microphone audio, but I keep getting an Invalid Sample Rate error. Using PyAudio I create a stream with the sample rate the model wants, which is 16000, but the microphone I am using has a sample rate of 44100. When running the Python script no rate conversion is done, and the mismatch between the microphone's sample rate and the model's expected sample rate produces an Invalid Sample Rate error.
The microphone info is listed like this by pyaudio:
{'index': 1, 'structVersion': 2, 'name': 'Logitech USB Microphone: Audio (hw:1,0)', 'hostApi': 0, 'maxInputChannels': 1, 'maxOutputChannels': 0, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': -1.0, 'defaultHighInputLatency': 0.034829931972789115, 'defaultHighOutputLatency': -1.0, 'defaultSampleRate': 44100.0}
The first thing I tried was setting the pyAudio stream sample rate to 44100 and feeding the model that. But after testing I found out that the model does not work well when it gets a rate different from its requested 16000.
I have been trying to find a way to have the microphone change rate to 16000, or at least have its rate converted to 16000 when it is used in the python script, but to no avail.
The latest thing I have tried is changing the .asoundrc file to find a way to change the rate, but I don't know if it is possible to change the microphone's rate to 16000 within this file. This is how the file currently looks:
pcm.!default {
    type asymd
    playback.pcm
    {
        type plug
        slave.pcm "dmix"
    }
    capture.pcm
    {
        type plug
        slave.pcm "usb"
    }
}
ctl.!default {
    type hw
    card 0
}
pcm.usb {
    type hw
    card 1
    device 0
    rate 16000
}
The Python code I made works on Windows, which I guess is because Windows converts the input rate to the sample rate requested in the code. But Linux does not seem to do this.
tl;dr: the microphone rate is 44100, but it has to change to 16000 to be usable. How do you do this on Linux?
Edit 1:
I create the pyAudio stream like this:
self.paStream = self.pa.open(rate = self.model.sampleRate(), channels = 1, format= pyaudio.paInt16, input=True, input_device_index = 1, frames_per_buffer= self.model.beamWidth())
It uses the model's rate and the model's beam width, along with the microphone's number of channels and device index.
To get the next audio frame and format it properly for the stream I create for the model, I do this:
def __get_next_audio_frame__(self):
    audio_frame = self.paStream.read(self.model.beamWidth(), exception_on_overflow= False)
    audio_frame = struct.unpack_from("h" * self.model.beamWidth(), audio_frame)
    return audio_frame
exception_on_overflow = False was used to test the model with an input rate of 44100; without this set to False, the same error I am currently dealing with would occur. model.beamWidth holds the number of chunks the model expects. I then read that number of chunks and reformat them before feeding them to the model's stream, which happens like this:
modelStream.feedAudioContent(self.__get_next_audio_frame__())
So after some more testing I wound up editing the config file for PulseAudio. In this file you can uncomment entries that set the default and/or alternate sampling rate. Changing the alternate sampling rate from 48000 to 16000 is what solved my problem.
The file is located here: /etc/pulse/daemon.conf.
We can open and edit this file on Raspbian using sudo vi /etc/pulse/daemon.conf.
Then we need to uncomment the line ; alternate-sample-rate = 48000, which is done by removing the ;, and change the value from 48000 to 16000. Save the file and exit vim. Then restart PulseAudio using pulseaudio -k to make sure it runs with the changed file.
If you are unfamiliar with vim and Linux here is a more elaborate guide through the process of changing the sample rate.
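For clarity, the change to /etc/pulse/daemon.conf described above amounts to the following; only this one line is touched, and the rest of the file is left as it is.

```
; before (commented out, PulseAudio default)
; alternate-sample-rate = 48000

; after (uncommented, value changed)
alternate-sample-rate = 16000
```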

How to control a sound card programmatically?

I'm playing with pyaudio on a Mac using a Saffire Pro 40 sound card.
Currently I have two inputs plugged in and I'd like to control the levels of the second input channel programmatically. (This works fine using the sound card's mix control software).
I've been going through the pyaudio docs, but haven't found anything glaring on this issue so far. What's the simplest way to essentially do what the mix control software does (control volume per channel) programmatically? (A Python API would be nice, but not essential)
To simplify: it looks like it's possible to manually read the streams from the channels I want to control, scale them using numpy, then write them as output, but I'm hoping there is a method to simply send a normalized value per channel to control it.
So instead of something like this:
stream1 = pyaudioInstance.open( format = FORMAT,
                                channels = CHANNELS,
                                rate = RATE,
                                input = True,
                                output = True,
                                input_device_index = 0,
                                frames_per_buffer = CHUNK
                                )
stream2 = pyaudioInstance.open( format = FORMAT,
                                channels = CHANNELS,
                                rate = RATE,
                                input = True,
                                input_device_index = 1,
                                frames_per_buffer = CHUNK
                                )
while processingAudio:
    # manually fetch each channel
    data1In = stream1.read(CHUNK)
    data2In = stream2.read(CHUNK)
    # convert to numpy to easily scale the arrays
    decodeddata1 = numpy.frombuffer(data1In, numpy.int16)
    decodeddata2 = numpy.frombuffer(data2In, numpy.int16)
    newdata = (decodeddata1 * 0.5 + decodeddata2 * 0.1).astype(numpy.int16)
    # finally write the processed data
    stream1.write(newdata.tobytes())
This is a bit misleading but I would need to mix separate channels from the same input device index. However what I'm hoping is something like:
someSoundCardAPI.channels[0].setVolume(0.2)
Having a look at the Channel Maps example feels closer to what I'm after. At the moment I find the host_api_specific part of the API a bit confusing and I was hoping someone already has some experience successfully using this.
I am using OSX 10.10
I don't really have any experience with OSX, so I don't know, but normally you can remote-control everything with AppleScript.
See, for example, this question.
It doesn't say how to control the volume of a single channel separately, though.
Probably you should ask there ...
Regarding the inferior work-around, you can use python-sounddevice to create a little (untested) Python script:
import sounddevice as sd
def callback(indata, outdata, *stuff):
    outdata[:] = indata * [1, 0.5]
with sd.Stream(channels=2, callback=callback):
    input()
This script will run until you press <Return> and it will reduce the volume of the second channel.
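If the data arrives as raw interleaved 16-bit bytes (as it does from a pyaudio stream opened with paInt16), the same per-channel scaling can be sketched with the standard library alone. The function name is made up for illustration, and the clipping to the int16 range is my own addition to avoid wrap-around when a gain is above 1.

```python
from array import array


def scale_channels(interleaved, gains):
    """Scale each channel of interleaved int16 PCM by its own gain, with clipping."""
    samples = array('h')            # native byte order, as delivered by the sound API
    samples.frombytes(interleaved)
    n_channels = len(gains)
    for i, sample in enumerate(samples):
        scaled = int(round(sample * gains[i % n_channels]))
        samples[i] = max(-32768, min(32767, scaled))
    return samples.tobytes()
```

For a stereo buffer, scale_channels(data, [1.0, 0.5]) leaves the first channel untouched and halves the second, which mirrors what the callback above does.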
