I'm using ffmpeg-python. I would like to change the sample rate of the audio file.
On the ffmpeg command line, it seems that you can change the sample rate as follows:
ffmpeg -i movie.mp3 -y movie.flac -ar 44100
-ar sets the audio sample rate.
How do I change the sample rate with ffmpeg-python?
This is the source code I have written so far:
stream = ffmpeg.input(input_file_path)
audio = stream.audio
stream = ffmpeg.output(audio, output_file_path)
ffmpeg.run(stream, capture_stdout=True, capture_stderr=True)
I solved it.
I can set keyword arguments like this.
stream = ffmpeg.output(audio, output_file_path, **{'ar': '16000','acodec':'flac'})
It is important to keep the ** in place: it unpacks the dict into keyword arguments. If you remove it, the dict is passed as a single positional argument and you will get an error.
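For what it's worth, passing the options as plain keyword arguments also appears to work here, since ar and acodec are valid Python identifiers and ffmpeg-python maps keyword arguments onto output options. A minimal end-to-end sketch (file names are placeholders):
import ffmpeg

input_file_path = "movie.mp3"    # placeholder
output_file_path = "movie.flac"  # placeholder

stream = ffmpeg.input(input_file_path)
audio = stream.audio
# Keyword arguments become ffmpeg output options: ar -> -ar, acodec -> -acodec.
stream = ffmpeg.output(audio, output_file_path, ar=16000, acodec='flac')
ffmpeg.run(stream, capture_stdout=True, capture_stderr=True)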
What I'm trying to do is setup 16 analog input channels, sample them constantly at a given rate and read 1 sample from each channel when calling the read function. Ideally I would like to read the newest sample so I can timestamp it when reading.
The problem is that the readings do not change from read to read; they only update every few seconds. If I increase the sampling rate, I eventually get an error saying the software can't keep up with the hardware sampling rate.
Which part of my code is wrong?
import numpy
import nidaqmx
from nidaqmx.stream_readers import AnalogSingleChannelReader, AnalogMultiChannelReader
from nidaqmx.constants import Edge, AcquisitionType
# Create a task and a reader
task = nidaqmx.Task()
values_read = numpy.zeros(16, dtype = numpy.float64)
task.ai_channels.add_ai_current_chan('cDAQ1Mod2/ai0:15')
task.timing.cfg_samp_clk_timing(rate = 1000, active_edge = Edge.RISING, sample_mode = AcquisitionType.CONTINUOUS, samps_per_chan = 1)
reader = AnalogMultiChannelReader(task.in_stream)
task.start()
while True:
    reader.read_one_sample(values_read)
    print(values_read)
The sampling rate is 1000, but you are reading only one sample each time. Usually, each read call takes a few milliseconds, so you are not reading fast enough; hence the buffer overflow error.
Suggestions:
Reduce sample rate.
Read more samples per Read call.
Since you want to read only the latest data and timestamp it yourself, you can use on-demand software-timed acquisition. See the example ai_voltage_sw_timed.py.
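A minimal sketch of that on-demand pattern (untested; the channel string is carried over from the question):
import time
import nidaqmx

# No sample clock is configured, so the acquisition is software-timed:
# each read returns the most recent sample from every channel.
with nidaqmx.Task() as task:
    task.ai_channels.add_ai_current_chan('cDAQ1Mod2/ai0:15')
    while True:
        values = task.read()      # one sample per channel (a list of 16 floats)
        timestamp = time.time()   # timestamp taken at read time
        print(timestamp, values)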
The Question
I want to load an audio file of any type (mp3, m4a, flac, etc) and write it to an output stream.
I tried using pydub, but it loads the entire file at once, which takes forever and easily runs out of memory.
I also tried using python-vlc, but it's been unreliable and too much of a black box.
So, how can I open large audio files chunk-by-chunk for streaming?
Edit #1
I found half of a solution here, but I'll need to do more research for the other half.
TL;DR: Use subprocess and ffmpeg to convert the file to wav data, and pipe that data into np.frombuffer. The problem is, the subprocess still has to finish before frombuffer is used.
...unless it's possible to have the pipe written to on one thread while np reads it from another, which I haven't tested yet. For now, this problem is not solved.
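A rough sketch of what I have in mind, in case it helps someone (untested; the file name, PCM format, and chunk size are arbitrary assumptions). Since proc.stdout.read() blocks until data is available, reading stdout in fixed-size chunks may already avoid waiting for the subprocess to finish:
import subprocess
import numpy as np

# Ask ffmpeg to decode to raw 16-bit little-endian mono PCM on stdout.
proc = subprocess.Popen(
    ["ffmpeg", "-i", "input.m4a", "-f", "s16le", "-ac", "1",
     "-ar", "44100", "-loglevel", "quiet", "pipe:1"],
    stdout=subprocess.PIPE,
)

chunk_bytes = 4096  # arbitrary chunk size
while True:
    chunk = proc.stdout.read(chunk_bytes)
    if not chunk:
        break
    samples = np.frombuffer(chunk, dtype=np.int16)
    # ...stream/process this chunk of samples...
proc.wait()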
I think the Python package https://github.com/irmen/pyminiaudio can be helpful. You can stream an audio file like this:
import miniaudio

audio_path = "my_audio_file.mp3"
target_sampling_rate = 44100  # the input audio will be resampled at this sampling rate
n_channels = 1  # either 1 or 2
waveform_duration = 30  # in seconds
offset = 15  # this means that we read only in the interval [15s, duration of file]

waveform_generator = miniaudio.stream_file(
    filename=audio_path,
    sample_rate=target_sampling_rate,
    seek_frame=int(offset * target_sampling_rate),
    frames_to_read=int(waveform_duration * target_sampling_rate),
    output_format=miniaudio.SampleFormat.FLOAT32,
    nchannels=n_channels)

for waveform in waveform_generator:
    pass  # do something with the waveform...
I know for sure that this works on mp3, ogg, wav, and flac, but for some reason it does not on mp4/AAC, and I am actually looking for a way to read mp4/AAC.
I am currently working on a project in which I am trying to use DeepSpeech on a Raspberry Pi with live microphone audio, but I keep getting an Invalid Sample Rate error. Using PyAudio I create a stream that uses the sample rate the model wants, which is 16000, but the microphone I am using has a sample rate of 44100. When running the Python script no rate conversion is done, and the mismatch between the microphone's rate and the model's expected rate produces the Invalid Sample Rate error.
The microphone info is listed like this by pyaudio:
{'index': 1, 'structVersion': 2, 'name': 'Logitech USB Microphone: Audio (hw:1,0)', 'hostApi': 0, 'maxInputChannels': 1, 'maxOutputChannels': 0, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': -1.0, 'defaultHighInputLatency': 0.034829931972789115, 'defaultHighOutputLatency': -1.0, 'defaultSampleRate': 44100.0}
The first thing I tried was setting the pyAudio stream sample rate to 44100 and feeding the model that. But after testing I found out that the model does not work well when it gets a rate different from its requested 16000.
I have been trying to find a way to have the microphone change rate to 16000, or at least have its rate converted to 16000 when it is used in the python script, but to no avail.
The latest thing I have tried is changing the .asoundrc file to find a way to change the rate, but I don't know if it is possible to change the microphone's rate to 16000 within this file. This is how the file currently looks:
pcm.!default {
    type asym
    playback.pcm {
        type plug
        slave.pcm "dmix"
    }
    capture.pcm {
        type plug
        slave.pcm "usb"
    }
}
ctl.!default {
    type hw
    card 0
}
pcm.usb {
    type hw
    card 1
    device 0
    rate 16000
}
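From what I can tell, the usual way to force resampling in ALSA is a plug wrapper that pins the slave rate, something like this (an untested assumption on my part):
pcm.mic16k {
    type plug
    slave {
        pcm "hw:1,0"   # the USB microphone
        rate 16000     # resample capture to 16 kHz
    }
}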
The Python code I wrote works on Windows, which I guess is because Windows converts the input to the sample rate requested in the code. Linux does not seem to do this.
tl;dr: the microphone's rate is 44100, but it has to be converted to 16000 to be usable. How do you do this on Linux?
Edit 1:
I create the PyAudio stream like this:
self.paStream = self.pa.open(rate=self.model.sampleRate(), channels=1, format=pyaudio.paInt16, input=True, input_device_index=1, frames_per_buffer=self.model.beamWidth())
It uses the model's sample rate and beam width, the microphone's channel count, and the microphone's device index.
To get the next audio frame and format it properly for the model's stream, I do this:
def __get_next_audio_frame__(self):
    audio_frame = self.paStream.read(self.model.beamWidth(), exception_on_overflow=False)
    audio_frame = struct.unpack_from("h" * self.model.beamWidth(), audio_frame)
    return audio_frame
exception_on_overflow=False was used to test the model with an input rate of 44100; without it set to False, the same error I am currently dealing with would occur. model.beamWidth is a variable that holds the number of chunks the model expects. I then read that number of chunks and reformat them before feeding them to the model's stream, which happens like this:
modelStream.feedAudioContent(self.__get_next_audio_frame__())
So after some more testing, I wound up editing the config file for PulseAudio. In this file you are able to uncomment entries that let you edit the default and/or alternate sampling rate. Changing the alternate sampling rate from 48000 to 16000 is what solved my problem.
The file is located here: /etc/pulse/daemon.conf .
We can open and edit this file on Raspbian using sudo vi daemon.conf.
Then we need to uncomment the line ; alternate-sample-rate = 48000 by removing the ;, and change the value from 48000 to 16000. Save the file and exit vim. Then restart PulseAudio using pulseaudio -k to make sure it runs with the changed file.
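In daemon.conf the change looks like this: the shipped line
; alternate-sample-rate = 48000
becomes
alternate-sample-rate = 16000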
If you are unfamiliar with vim and Linux, here is a more elaborate guide through the process of changing the sample rate.
I'm doing an rfft and irfft on a wave file:
import numpy as np
from scipy.io import wavfile

samplerate, data = wavfile.read(location)
input = data.T[0]  # first track of audio
fftData = np.fft.rfft(input[sample:], length)
output = np.fft.irfft(fftData).astype(data.dtype)
So it reads from a file and then does an rfft. However, the result is very noisy when I play it back through a PyAudio stream. I searched for an answer to this question and tried this solution:
rfft or irfft increasing wav file volume in python
That is why I apply .astype(data.dtype) to the irfft output. It reduced the noise a bit, but it still sounds all wrong.
This is the playback code, where p is the PyAudio instance:
stream = p.open(format=pyaudio.paFloat32,
channels=1,
rate=fs,
output=True)
stream.write(output)
stream.stop_stream()
stream.close()
p.terminate()
So what am I doing wrong here?
Thanks!
edit: I also tried using .astype(dtype=np.float32) on the irfft output, since PyAudio uses float32 in my playback code. However, it was still noisy.
The best working solution so far seems to be normalizing by the median value and using .astype(np.float32), as the PyAudio output format is float32:
import numpy as np
from scipy.io import wavfile

samplerate, data = wavfile.read(location)
input = data.T[0]  # first track of audio
fftData = np.fft.rfft(input[sample:], length)
fftData = np.divide(fftData, np.median(fftData))
output = np.fft.irfft(fftData).astype(dtype=np.float32)
If anyone has a better solution I'd like to hear it. I tried mean normalization, but it still resulted in clipping; normalizing by np.max made the whole audio too quiet. This normalization problem with FFT keeps giving me trouble, and I haven't found any 100% working solution here on SO.
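One thing that may be worth trying (an assumption on my part, not tested on this exact file): if the wav data is int16, skip the spectrum-based normalization and instead scale into the [-1.0, 1.0] range that paFloat32 expects, by dividing by the int16 full scale of 32768:
import numpy as np
from scipy.io import wavfile

samplerate, data = wavfile.read(location)  # assumes int16 samples
input = data.T[0]  # first track of audio
fftData = np.fft.rfft(input[sample:], length)
# ...any spectral processing here...
output = (np.fft.irfft(fftData) / 32768.0).astype(np.float32)  # int16 full scale -> [-1, 1]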
I am trying to tie together a JavaScript front end, a Flask server, and Microsoft's Cognitive Services for audio identification.
Microsoft's service requires audio data with specific parameters; in particular, it requires a 16000 frame rate/frequency.
But from the browser on Windows I can only get 44100.
So now I get audio at 44100, and then save it like this:
import wave

audioData = message['audio']
af = wave.open('audioData.wav', 'w')
af.setnchannels(1)
af.setparams((1, 2, 16000, 0, 'NONE', 'Uncompressed'))
af.writeframes(audioData)
af.close()
The audio is received through Socket.IO in the form of dict/JSON data. If I save it directly without changing anything, it sounds fine. But if I only change the declared sample rate to 16000 without resampling, it obviously sounds distorted and very slow (playing data recorded at 44100 back at 16000 slows it down by a factor of roughly 2.75), so a few seconds of audio stretch into a minute or so.
How do I correctly change the audio rate without affecting how it sounds, in Python 3.4?
Thanks.
EDIT:
Here is the working code:
import wave
import audioop

with open("audioData_original.wav", 'wb') as of:
    of.write(message['audio'])

audioFile = wave.open("audioData_original.wav", 'r')
n_frames = audioFile.getnframes()
audioData = audioFile.readframes(n_frames)
originalRate = audioFile.getframerate()
af = wave.open('audioData.wav', 'w')
af.setnchannels(1)
af.setparams((1, 2, 16000, 0, 'NONE', 'Uncompressed'))
# ratecv(fragment, width, nchannels, inrate, outrate, state) resamples the raw frames.
converted = audioop.ratecv(audioData, 2, 1, originalRate, 16000, None)
af.writeframes(converted[0])
af.close()
audioFile.close()
The downside here is that even though I get the audio data from the MediaRecorder API through JSON, so I already have it in memory, I still write it to disk and open it again just to get the sampling rate using wave's functions. How do I do this without writing to disk? If I have to make a new question for that, sure, I can do that.
EDIT2:
Oh, OK, answering my own follow-up question: io.BytesIO did the trick.
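For anyone else, a minimal sketch of that in-memory variant (assuming message['audio'] holds complete wav bytes as above):
import io
import wave
import audioop

# Wrap the in-memory wav bytes in a file-like object instead of touching disk.
buffer = io.BytesIO(message['audio'])
audioFile = wave.open(buffer, 'rb')
originalRate = audioFile.getframerate()
audioData = audioFile.readframes(audioFile.getnframes())
audioFile.close()

converted, _ = audioop.ratecv(audioData, 2, 1, originalRate, 16000, None)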
Have a look at audioop.ratecv (it's in the standard library).
Let it operate on the raw frames of your sample (in your case, audioData).
It's a simple algorithm, so expect some sound quality loss, but I guess for speech that is insignificant.