I am currently working on a project in which I am trying to run DeepSpeech on a Raspberry Pi with live microphone audio, but I keep getting an Invalid Sample Rate error. Using PyAudio I open a stream with the sample rate the model expects, which is 16000 Hz, but the microphone I am using has a sample rate of 44100 Hz. When the Python script runs, no rate conversion is done, and the mismatch between the microphone's sample rate and the model's expected sample rate produces the Invalid Sample Rate error.
The microphone info is listed like this by pyaudio:
{'index': 1, 'structVersion': 2, 'name': 'Logitech USB Microphone: Audio (hw:1,0)', 'hostApi': 0, 'maxInputChannels': 1, 'maxOutputChannels': 0, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': -1.0, 'defaultHighInputLatency': 0.034829931972789115, 'defaultHighOutputLatency': -1.0, 'defaultSampleRate': 44100.0}
The first thing I tried was setting the PyAudio stream's sample rate to 44100 and feeding the model that. After testing, I found that the model does not work well when it receives a rate other than its requested 16000.
I have been trying to find a way to have the microphone change rate to 16000, or at least have its rate converted to 16000 when it is used in the python script, but to no avail.
The latest thing I have tried is changing the .asoundrc file to find a way to change the rate, but I don't know whether it is possible to set the microphone's rate to 16000 within this file. This is what the file currently looks like:
pcm.!default {
    type asym
    playback.pcm {
        type plug
        slave.pcm "dmix"
    }
    capture.pcm {
        type plug
        slave.pcm "usb"
    }
}

ctl.!default {
    type hw
    card 0
}

pcm.usb {
    type hw
    card 1
    device 0
    rate 16000
}
The Python code I made works on Windows, I guess because Windows converts the input's rate to the sample rate requested in the code, but Linux does not seem to do this.
tl;dr: the microphone's rate is 44100, but it has to be converted to 16000 to be usable. How do you do this on Linux?
Edit 1:
I create the pyAudio stream like this:
self.paStream = self.pa.open(rate=self.model.sampleRate(), channels=1, format=pyaudio.paInt16, input=True, input_device_index=1, frames_per_buffer=self.model.beamWidth())
It uses the model's sample rate and beam width, together with the microphone's channel count and device index.
To get the next audio frame and format it properly for the stream I create for the model, I do this:
def __get_next_audio_frame__(self):
    audio_frame = self.paStream.read(self.model.beamWidth(), exception_on_overflow=False)
    audio_frame = struct.unpack_from("h" * self.model.beamWidth(), audio_frame)
    return audio_frame
exception_on_overflow=False was used to test the model with an input rate of 44100; without it set to False, the same error I am currently dealing with would occur. model.beamWidth is a variable holding the number of chunks the model expects. I read that many chunks and reformat them before feeding them to the model's stream, which happens like this:
modelStream.feedAudioContent(self.__get_next_audio_frame__())
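One alternative worth noting is doing the rate conversion inside the script itself with the standard library's audioop.ratecv (the same function that appears later in this thread). This is only a minimal sketch, assuming self.paStream is opened at the microphone's native 44100 Hz instead of the model's rate:

import audioop
import struct

def __get_next_audio_frame_resampled__(self, state=None):
    # read raw 16-bit mono samples at the microphone's native 44100 Hz
    raw = self.paStream.read(self.model.beamWidth(), exception_on_overflow=False)
    # convert 44100 Hz -> 16000 Hz (2-byte samples, 1 channel); 'state' carries
    # the converter's position between successive calls
    converted, state = audioop.ratecv(raw, 2, 1, 44100, 16000, state)
    samples = struct.unpack_from("h" * (len(converted) // 2), converted)
    return samples, state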
After some more testing, I wound up editing the configuration file for PulseAudio. In this file you can uncomment entries that let you change the default and/or alternate sampling rate. Changing the alternate sampling rate from 48000 to 16000 is what solved my problem.
The file is located at /etc/pulse/daemon.conf.
On Raspbian you can open and edit this file with sudo vi /etc/pulse/daemon.conf.
Then uncomment the line ; alternate-sample-rate = 48000 by removing the ;, and change the value from 48000 to 16000. Save the file and exit vim. Then restart PulseAudio with pulseaudio -k so it picks up the changed file.
If you are unfamiliar with vim and Linux, here is a more elaborate guide to changing the sample rate.
I'm using ffmpeg-python. I would like to change the sample rate of the audio file.
On the ffmpeg command line, it seems that you can change the sample rate as follows:
ffmpeg -i movie.mp3 -ar 44100 -y movie.flac
Here -ar sets the sample rate.
How do I change the sample rate with ffmpeg-python?
This is the source code I am currently writing:
stream = ffmpeg.input(input_file_path)
audio = stream.audio
stream = ffmpeg.output(audio, output_file_path)
ffmpeg.run(stream, capture_stdout=True, capture_stderr=True)
I solved it.
I can set keyword arguments like this:
stream = ffmpeg.output(audio, output_file_path, **{'ar': '16000','acodec':'flac'})
It is important to keep the ** in place. If you remove it, you will get an error.
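Putting the pieces together, the whole snippet would then look roughly like this (a sketch based on the code above; input_file_path and output_file_path are assumed to be defined):

import ffmpeg

stream = ffmpeg.input(input_file_path)
audio = stream.audio
# pass the sample rate (-ar) and codec (-acodec) as keyword arguments
stream = ffmpeg.output(audio, output_file_path, **{'ar': '16000', 'acodec': 'flac'})
ffmpeg.run(stream, capture_stdout=True, capture_stderr=True)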
I am trying to create a system, part of which requires a microphone to be connected to an Arduino. I haven't worked with microphones much.
I have connected a microphone (Adafruit Electret Microphone Amplifier - MAX9814 with Auto Gain Control) to an Arduino Nano. I want to record audio data from it.
void setup() {
  Serial.begin(9600);
  pinMode(A2, INPUT);
}

void loop() {
  if (Serial.available())
  {
    Serial.println(analogRead(A2));
  }
}
I send the data to the computer, record it using a Python script, and convert it into a WAV file to make sure that the microphone is working properly. I tried multiple things: using the raw ADC value, scaling the ADC value between -1 and 1, and converting it into a voltage and then scaling it, but nothing seems to work. When I play it back, all I can hear is static with a few clicks where the voice should be.
Below is the Python code I wrote for the configuration where I am sending the ADC value using println. I collect the data using the pyserial library, convert it into a float, normalize it between -1 and 1, and then save it in a WAV file.
import serial
import matplotlib.pyplot as plt
import sounddevice as sd
import numpy as np
from scipy.io.wavfile import write
import pyaudio
import wave

def audnorm(aud):
    normaud = -1 + 2*((aud - np.amin(aud))/(np.amax(aud) - np.amin(aud)))
    return normaud

ser = serial.Serial('/dev/ttyACM0', 115200)
ser.flushInput()

sound = []
sound2 = []

while True:
    try:
        ser_bytes = ser.readline()
        ser_bytes2 = float(ser_bytes)
        sound.append(ser_bytes2)
        sound2.append(ser_bytes)
        print(ser_bytes + "\t" + str(ser_bytes2))
        print(type(ser_bytes))
    except:
        print("Keyboard Interrupt")
        break

print(str(len(sound)))

soundnp = np.asarray(sound)
soundnp = soundnp - np.mean(soundnp)
soundnorm = audnorm(soundnp)
soundnormstr = [str(x) for x in soundnorm]

plt.plot(soundnp)
plt.show()
plt.plot(soundnorm)
plt.show()

wf = wave.open("output.wav", 'wb')
wf.setnchannels(1)
wf.setsampwidth(2)
wf.setframerate(10000)
wf.writeframes(b''.join(soundnormstr))
wf.close()
I have attached 2 images of the data I recorded using this code.
What am I doing wrong?
Raw Data
Normalized Data
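As context for the last step described above (saving the normalized data as a WAV file): writeframes expects raw PCM bytes rather than text, so writing the normalized floats as 16-bit samples could look roughly like this, using the scipy.io.wavfile.write already imported above (assuming soundnorm is the normalized array from the code and 10000 the intended sample rate):

import numpy as np
from scipy.io.wavfile import write

# scale the [-1, 1] floats into the 16-bit integer range and write them as PCM
pcm = (soundnorm * 32767).astype(np.int16)
write("output.wav", 10000, pcm)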
To be recorded without distortion, the signal you're trying to record (I assume it's an audio signal) requires three things: 1) sampling at a uniform rate, 2) sampling at more than 8,000 samples per second, the bare minimum for barely intelligible speech, and 3) transmitting or storing the data as fast as you acquire it.
re: 1 & 2) There is an instructable that goes into all the messy details of recording high-fidelity audio on the Arduino. It contains far more information than I could write here. See https://www.instructables.com/id/Arduino-Audio-Input/
If your application requires the Arduino to simply detect that there is a sound - such as a clapping pair of hands - you can get by with a lower and non-uniform sample rate. Search for "Arduino Clapper" to get some ideas.
I agree with Bradford. You will need uniform sampling to acquire the audio signal, and 8000 Hz is a minimum.
I think you need to set a higher serial baud rate to achieve this sampling frequency. I have slightly modified your code to measure, with an oscilloscope, the "actual maximum frequency" of the serial "transmission" (plus the analogRead).
void setup() {
  // Serial.begin(9600);
  Serial.begin(115200);
  pinMode(A2, INPUT);
}

void loop() {
  //if(Serial.available())
  {
    int val = analogRead(A2);
    Serial.write(0);
  }
}
On the oscilloscope this comes out to roughly 9 kHz while simply sending zeros over the serial wire; see the attached figure. It might be doable (for speech, not for music).
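On the PC side, a similar sanity check is possible by counting how many bytes arrive per second over the serial port (with the sketch above, one byte corresponds to one sample). A minimal, untested sketch, reusing the port name and baud rate from the question:

import time
import serial

ser = serial.Serial('/dev/ttyACM0', 115200)
ser.flushInput()

start = time.time()
count = 0
while time.time() - start < 1.0:
    # read whatever has arrived so far (at least one byte)
    count += len(ser.read(ser.in_waiting or 1))

print("approximate samples per second:", count)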
What shall be evaluated and achieved:
I am trying to record audio data with a minimum of influence from hardware and especially software. After using Adobe Audition for some time, I stumbled across PyAudio and was driven by curiosity as well as the chance to refresh my Python knowledge.
As the headline above may have given away, I compared the sample values of two wave files (sections of them, to be precise) and found that the two programs produce different output.
As I am definitely at my wit's end, I hope to find someone who can help me.
What has been done so far:
An M-Audio “M-Track Two-Channel USB Interface” was used to record audio data with Audition CS6 and PyAudio simultaneously, executing the following steps in the given order:
Audition is prepared for recording by opening “Preferences / Audio Hardware” and selecting the audio interface, a sample rate of 48 kHz and a latency of 250 ms (over the last few years this has proven to be the second-lowest value I can use without getting the warning about lost samples; if I understand its purpose correctly, I only have to worry about losing samples, since monitoring is not an issue).
A new file with one channel, a sample rate of 48 kHz and a bit depth of 24 bit is opened.
The Python code (displayed below) is started; its countdown is used to switch over to Audition and start that recording 10 s before Python starts its own.
Wait until Python prints the “end of programme” message.
Stop and save the data recorded by Audition.
Now the data has to be examined:
Both files (one recorded by Audition and one by Python) are opened in Audition (multitrack session). As Audition was started and stopped manually, the two files have completely different start and end times. They are then aligned visually so that small extracts (which, judging by waveform shape, contain the same data) can be cut out and saved.
A Python program was written that opens and reads both files using the standard wave module and displays the sample values using matplotlib.pyplot (graphs are shown below; a rough sketch of this step is shown below as well).
Differences in both waveforms and a big question mark are revealed…
Does anybody have an idea why Audition shows different sample values, and where exactly the mistake (if there is one) hides?
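For reference, the comparison step described above could be sketched roughly as follows (the file names are placeholders; the extracts are assumed to be mono 24-bit WAV files as described):

import wave
import numpy as np
import matplotlib.pyplot as plt

def read_samples(path):
    """Read a mono 24-bit WAV file and return its samples as signed integers."""
    with wave.open(path, 'rb') as wf:
        assert wf.getnchannels() == 1 and wf.getsampwidth() == 3
        raw = wf.readframes(wf.getnframes())
    # unpack the 3-byte little-endian samples one by one
    return np.array([int.from_bytes(raw[i:i + 3], 'little', signed=True)
                     for i in range(0, len(raw), 3)])

audition_samples = read_samples('audition_extract.wav')   # placeholder file names
pyaudio_samples = read_samples('pyaudio_extract.wav')

plt.plot(audition_samples, label='Audition')
plt.plot(pyaudio_samples, label='PyAudio')
plt.legend()
plt.show()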
some (interesting) observations
a) When calling the pyaudio.PyAudio().get_default_input_device_info() method, the default sample rate is listed as 44.1 kHz, even though the M-Track's default sample rate is given as 48 kHz in its specifications (indeed, Audition recognizes the 48 kHz and resamples incoming data if another rate is selected). Any ideas why, and how to change this?
b) Aligning both files using the beginning of the sequence covered by PyAudio and checking whether they are still “in phase” at the end reveals that they are not: PyAudio's file is shorter and seems to have lost samples (even though no exception was raised and the exception_on_overflow argument is True).
c) Using the “frames_per_buffer” keyword in the stream open method I was unable to align both files, having no idea where Python got its data from.
d) Using the “.get_default_input_device_info()” method and trying different sample rates (22.05 k, 44.1 k, 48 k, 192 k), I always receive True as output (see the sketch below).
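Regarding observation d): a dict-returning call like get_default_input_device_info() would not return True for a given rate, so presumably the check in question is PyAudio's is_format_supported(). A minimal sketch of such a check (the device index is a placeholder):

import pyaudio

p = pyaudio.PyAudio()
device_index = 0  # placeholder; use the M-Track's actual input device index

for rate in (22050, 44100, 48000, 192000):
    try:
        # returns True if PortAudio believes the combination is supported,
        # otherwise raises ValueError
        ok = p.is_format_supported(rate,
                                   input_device=device_index,
                                   input_channels=1,
                                   input_format=pyaudio.paInt24)
        print(rate, ok)
    except ValueError as err:
        print(rate, "not supported:", err)

p.terminate()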
Official Specifications M-Track:
bit depth = 24 bit
sample rate = 48 kHz
input via XLR
output via USB
Specifications Computer and Software:
Windows 8.1
i5-3230M @ 2.6 GHz
8 GB RAM
Python 3.4.2 with PyAudio 0.2.11 – 32 bit
Audition CS6 Version 5.0.2
Python Code
import pyaudio
import wave
import time

formate = pyaudio.paInt24
channels = 1
framerate = 48000
fileName = 'test ' + '.wav'
chunk = 6144

# output of stream.get_read_available() at different positions

p = pyaudio.PyAudio()

stream = p.open(format=formate,
                channels=channels,
                rate=framerate,
                input=True)
                #frames_per_buffer=chunk)  # observation c

# COUNTDOWN
for n in range(0, 30):
    print(n)
    time.sleep(1)

# get data
sampleList = []
for i in range(0, 79):
    data = stream.read(chunk, exception_on_overflow=True)
    sampleList.append(data)

print('end -', time.strftime('%d.%m.%Y %H:%M:%S', time.gmtime(time.time())))

stream.stop_stream()
stream.close()
p.terminate()

# produce file
file = wave.open(fileName, 'w')
file.setnchannels(channels)
file.setframerate(framerate)
file.setsampwidth(p.get_sample_size(formate))
file.writeframes(b''.join(sampleList))
file.close()
Figure 1: first comparison Audition - PyAudio
Figure 2: second comparison Audition - PyAudio
I am trying to tie together a JavaScript front end, a Flask server and Microsoft's Cognitive Services for audio identification.
Microsoft's server requires audio data with specific parameters; in particular, it requests a 16000 Hz frame rate/frequency. But from the browser on Windows I can only get 41000. So I get audio at 41000 and then save it like this:
audioData = message['audio']
af = wave.open('audioData.wav', 'w')
af.setnchannels(1)
af.setparams((1, 2, 16000, 0, 'NONE', 'Uncompressed'))
af.writeframes(audioData)
af.close()
The audio is received through socketio in the form of dict/JSON data. If I save it directly without changing anything, it sounds fine. But if I change the sample rate to 16000, it obviously sounds distorted and very slow, so a few seconds of audio stretch into a minute or so.
How do I correctly change the audio rate without affecting how it sounds, in Python 3.4?
Thanks.
EDIT:
Here is the working code:
with open("audioData_original.wav", 'wb') as of:
of.write(message['audio'])
audioFile = wave.open("audioData_original.wav", 'r')
n_frames = audioFile.getnframes()
audioData = audioFile.readframes(n_frames)
originalRate = audioFile.getframerate()
af = wave.open('audioData.wav', 'w')
af.setnchannels(1)
af.setparams((1, 2, 16000, 0, 'NONE', 'Uncompressed'))
converted = audioop.ratecv(audioData, 2, 1, originalRate, 16000, None)
af.writeframes(converted[0])
af.close()
audioFile.close()
The downside here is that even though I get the audio data from the MediaRecorder API through JSON, so I already have it in memory, I still write it to disk and open it again just to get the sampling rate using wave's functions. How do I do this without writing to disk? Thanks. If I have to make a new question for that, sure, I can do that.
EDIT2:
Oh, ok, answering my own follow-up question - io.BytesIO did the trick.
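For completeness, a rough sketch of that in-memory variant, under the same assumptions as the code above (message['audio'] holds a complete WAV file received over socketio):

import io
import wave
import audioop

# wrap the received WAV bytes in a file-like object instead of writing them to disk
buf = io.BytesIO(message['audio'])
with wave.open(buf, 'rb') as audioFile:
    originalRate = audioFile.getframerate()
    audioData = audioFile.readframes(audioFile.getnframes())

# resample the raw 16-bit mono frames to 16000 Hz
converted, _ = audioop.ratecv(audioData, 2, 1, originalRate, 16000, None)

with wave.open('audioData.wav', 'wb') as af:
    af.setnchannels(1)
    af.setsampwidth(2)
    af.setframerate(16000)
    af.writeframes(converted)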
Have a look at audioop.ratecv (it's in the standard library)
Let it operate on the raw frames of your sample (in your case, audioData).
It's a simple algorithm so expect some sound quality loss, but I guess for speech that is insignificant.
I'm playing with PyAudio on a Mac using a Saffire Pro 40 sound card.
Currently I have two inputs plugged in and I'd like to control the levels of the second input channel programmatically. (This works fine using the sound card's mix control software).
I've been going through the PyAudio docs, but haven't found anything obvious on this issue so far. What's the simplest way to do essentially what the mix control software does (control the volume per channel) programmatically? (A Python API would be nice, but not essential.)
To simplify: it looks like it's possible to manually read the streams from the channels I want to control, scale them using numpy, then write them back out, but I'm hoping there is a way to simply send a normalized value per channel to control it.
So instead of something like this:
stream1 = pyaudioInstance.open(format=FORMAT,
                               channels=CHANNELS,
                               rate=RATE,
                               input=True,
                               output=True,
                               input_device_index=0,
                               frames_per_buffer=CHUNK)

stream2 = pyaudioInstance.open(format=FORMAT,
                               channels=CHANNELS,
                               rate=RATE,
                               input=True,
                               input_device_index=1,
                               frames_per_buffer=CHUNK)
while processingAudio:
    # manually fetch each channel
    data1In = stream1.read(CHUNK)
    data2In = stream2.read(CHUNK)
    # convert to numpy to easily scale the arrays
    decodeddata1 = numpy.fromstring(data1In, numpy.int16)
    decodeddata2 = numpy.fromstring(data2In, numpy.int16)
    newdata = (decodeddata1 * 0.5 + decodeddata2 * 0.1).astype(numpy.int16)
    # finally write the processed data
    stream1.write(newdata.tostring())
This is a bit misleading, as I would actually need to mix separate channels from the same input device index. What I'm hoping for is something like:
someSoundCardAPI.channels[0].setVolume(0.2)
Having a look at the Channel Maps example feels closer to what I'm after. At the moment I find the host_api_specific part of the API a bit confusing, and I was hoping someone already has experience using it successfully.
I am using OSX 10.10
I don't really have any experience with OSX, so I don't know, but normally you can remote-control everything with AppleScript.
See, for example, this question.
It doesn't say how to control the volume of a single channel separately, though.
Probably you should ask there ...
Regarding the inferior work-around, you can use python-sounddevice to create a little (untested) Python script:
import sounddevice as sd

def callback(indata, outdata, *stuff):
    outdata[:] = indata * [1, 0.5]

with sd.Stream(channels=2, callback=callback):
    input()
This script will run until you press <Return> and it will reduce the volume of the second channel.