I accidentally forgot to convert some NumPy arrays to bytes objects when using PyAudio, but to my surprise it still played audio, even if it sounded a bit off. I wrote a little test script (see below) for playing 1 second of a 440Hz tone, and it seems like writing a NumPy array directly to a PyAudio Stream cuts that tone short.
Can anyone explain why this happens? I thought a NumPy array was a contiguous sequence of bytes with some header information about its dtype and strides, so I would've predicted that PyAudio played the full second of the tone after some garbled audio from the header, not cut the tone off.
# script segment
import pyaudio
import numpy as np
RATE = 48000
p = pyaudio.PyAudio()
stream = p.open(format = pyaudio.paFloat32, channels = 1, rate = RATE, output = True)
TONE = 440
SECONDS = 1
t = np.arange(0, 2*np.pi*TONE*SECONDS, 2*np.pi*TONE/RATE)
sina = np.sin(t).astype(np.float32)
sinb = sina.tobytes()
# console commands segment
stream.write(sinb) # bytes object plays 1 second of 440Hz tone
stream.write(sina) # still plays 440Hz tone, but noticeably shorter than 1 second
The problem is more subtle than you describe. Your first call passes a bytes object of length 192,000. The second call passes a NumPy array of 48,000 float32 values, whose len() is 48,000. PyAudio accepts both and passes the buffer to PortAudio to be played.
However, when you opened pyaudio, you told it you were sending paFloat32 data, which has 4 bytes per sample. The pyaudio write handler takes the length of the array you gave it, and divides by the number of channels times the sample size to determine how many audio samples there are. In your second call, the length of the array is 48,000, which it divides by 4, and thereby tells portaudio "there are 12,000 samples here".
So everyone understood the format, but there was confusion about the size. If you change the second call to
stream.write(sina, 48000)
then no one has to guess, and it works perfectly fine.
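To make the arithmetic concrete, here is a minimal sketch of the frame count that gets inferred from each buffer. It only mirrors the division described above; inferred_frames is a hypothetical helper written for illustration, not part of PyAudio, and it needs no audio device to run.

```python
import numpy as np

RATE = 48000
TONE = 440
CHANNELS = 1
SAMPLE_WIDTH = 4  # bytes per paFloat32 sample

# One second of a 440 Hz tone, exactly RATE samples long
t = np.arange(RATE) * 2 * np.pi * TONE / RATE
sina = np.sin(t).astype(np.float32)
sinb = sina.tobytes()

def inferred_frames(buf):
    # Hypothetical helper: len(buffer) divided by (channels * sample width),
    # i.e. the division described in the answer above
    return len(buf) // (CHANNELS * SAMPLE_WIDTH)

frames_from_bytes = inferred_frames(sinb)  # len(sinb) counts bytes
frames_from_array = inferred_frames(sina)  # len(sina) counts float32 elements

print(len(sinb), frames_from_bytes)
print(len(sina), frames_from_array)
```

The bytes object yields the full second of frames, while the array yields only a quarter of them, which matches the shortened tone.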
I'm trying to use Python to create a live music visualization. The libraries I'm using are SoundCard (for live audio capture) and Librosa (for short-time Fourier transform).
However I suspect I'm not interpreting the audio data correctly. Looking at the 100Hz-200Hz bin, I get a constant stream of sound even when the song doesn't contain that much bass (or really, any whatsoever). I admit I am a bit in over my head with all the audio processing FFT stuff, since it's not really my expertise and the math beats me most of the time.
This is the function that captures and analyses the audio. lb is set to the speakers and works properly. Fs is set to 48000 and I record 1000 frames per call in an attempt to keep 48 FPS. fftwindowsize is set to 2048*8 because... I'm not sure. I increased the number until Librosa stopped throwing warnings.
def audioanalysis():
    with lb.recorder(samplerate=Fs) as mic:
        rawdata = mic.record(numframes=1000)
    datalen: int = int(rawdata.size/2)
    monodata = numpy.empty(datalen)
    for x in range(0, datalen):
        monodata[x] = max(rawdata[x][0], rawdata[x][1])
    data = numpy.abs(librosa.stft(monodata, n_fft=fftwindowsize, hop_length=1024))
    return librosa.amplitude_to_db(data, ref=numpy.max)
And the code for making buckets:
frequencies = librosa.core.fft_frequencies(n_fft=fftwindowsize)
freq_index_ratio = len(frequencies)/frequencies[len(frequencies)-1] / 2
[...]
for i in range(0,buckets):
    avg = 0
    for j in range (i * bucketsize, (i+1)*bucketsize):
        avg += amp(spectrogram=spectrogram, freq=j)
    amps[i] = avg/bucketsize

def amp(spectrogram, freq) -> float:
    return spectrogram[int(freq*freq_index_ratio)]
Over the course of a song, amps[1] (so 100Hz-200Hz) stays in the -50dB to -30dB range, which isn't really useful or representative of the song playing.
Is my FFT analysis wrong? Is there no way to better interpret short samples of sound?
P.S. I know my Python code isn't excellent. This is my first project in Python :)
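Not a full answer, but one thing worth checking is how a frequency in Hz maps to an FFT bin. The sketch below uses plain NumPy and assumes the 48000 Hz rate and the 2048*8 window size stated above; freq_to_bin is a hypothetical helper. As far as I know, librosa's fft_frequencies assumes sr=22050 unless sr is passed explicitly, so a ratio derived from its default output may not match audio captured at 48 kHz.

```python
import numpy as np

FS = 48000          # sample rate stated in the question
N_FFT = 2048 * 8    # window size stated in the question

# Centre frequency of every bin of a real FFT: N_FFT // 2 + 1 bins from 0 to FS / 2
freqs = np.fft.rfftfreq(N_FFT, d=1.0 / FS)
bin_width = FS / N_FFT  # spacing between neighbouring bins in Hz

def freq_to_bin(freq_hz):
    # Hypothetical helper: index of the bin whose centre is closest to freq_hz
    return int(round(freq_hz / bin_width))

lo, hi = freq_to_bin(100), freq_to_bin(200)
print(bin_width, lo, hi, freqs[lo], freqs[hi])
```

With these numbers the 100 Hz to 200 Hz bucket covers rows lo to hi of the spectrogram, so the bucket would be something like spectrogram[lo:hi + 1] averaged over rows.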
I am attempting to preprocess audiofiles to be used in a neural net with soundfile.read(), but the function is formatting the returned data differently for different .FLAC files with the same sample rate and length. For example, calling data, sr = soundfile.read(audiofile1) produced an array with shape data.shape = (48000, 2) (where individual element values were either the amplitude, 0, or the negative amplitude in NumPy float64), while calling data, sr = soundfile.read(audiofile2) produced an array with shape data.shape = (48000,) (where individual element values were varied NumPy float64).
Also, if it helps, audiofile1 was taken from a recording made via PyAudio, whereas audiofile2 was a sample from the LibriSpeech corpus.
So, my question is twofold:
Why is soundfile.read() producing two different data formats, and how do I ensure that the function returns the arrays in the same format in the future?
Your audiofile2 sample is mono, whereas your audiofile1 recording is stereo (i.e. you probably recorded it with a PyAudio stream configured with channels=2). So I suggest you first figure out whether you need mono or stereo for your application.
If all you really care is a mono audio signal, you can convert stereo (or more generally N-channel) audio to mono by averaging the channels:
data, sr = soundfile.read(audiofile)
if np.ndim(data) > 1:
    data = np.mean(data, axis=1)
If you need stereo audio, then you may create an additional channel by duplicating the one you have (although that would not be adding the usual additional information such as phase or amplitude differences between the different channels) with:
if np.ndim(data) < 2:
    data = np.tile(data, (2, 1)).transpose()
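Both conversions can be checked on synthetic arrays without reading any file. This is a minimal sketch; to_mono and to_stereo are hypothetical helper names, and the zero-filled arrays merely stand in for what soundfile.read() returned in the question.

```python
import numpy as np

def to_mono(data):
    # Average the channels of an (n_frames, n_channels) array; 1-D input passes through
    data = np.asarray(data)
    return data.mean(axis=1) if data.ndim > 1 else data

def to_stereo(data):
    # Duplicate a 1-D mono signal into two identical columns; 2-D input passes through
    data = np.asarray(data)
    return np.tile(data, (2, 1)).transpose() if data.ndim < 2 else data

stereo = np.zeros((48000, 2))  # stand-in for audiofile1
mono = np.zeros(48000)         # stand-in for audiofile2

print(to_mono(stereo).shape, to_mono(mono).shape)
print(to_stereo(stereo).shape, to_stereo(mono).shape)
```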
It's as simple as:
data, sr = soundfile.read(audiofile2, always_2d=True)
With this, data.shape will always have two elements; data.shape[0] will be the number of frames and data.shape[1] will be the number of channels.
I'm using pyaudio to take input from a microphone or read a wav file, and analyze the stream while playing it. I want to only analyze the right channel if the input is stereo. I've been able to extract the data and convert to integers using loops:
levels = []
length = len(data)
if channels == 1:
    for i in range(length//2):
        volume = abs(struct.unpack('<h', data[i:i+2])[0])
        levels.append(volume)
elif channels == 2:
    for i in range(length//4):
        j = 4 * i + 2
        volume = abs(struct.unpack('<h', data[j:j+2])[0])
        levels.append(volume)
I think this is working correctly. I know it runs without error on a laptop and a Raspberry Pi 3, but it appears to consume too much time to run on a Raspberry Pi Zero when simultaneously streaming the output to a speaker. I figure that eliminating the loop and using numpy may help. I assume I need to use np.ndarray to do this, and the first parameter will be (CHUNK,) where CHUNK is my chunk size for analyzing the audio (I'm using 1024). And the format would be '<h', as in the struct code above, I think. But I'm at a loss as to how to code it correctly for each of the two cases (mono and right channel only for stereo). How do I create the numpy arrays for each of the two cases?
You are here reading 16-bit integers from a binary file. It seems that you are first reading the data into data variable with something like data = f.read(), which is here not visible. Then you do:
for i in range(length//2):
    volume = abs(struct.unpack('<h', data[i:i+2])[0])
    levels.append(volume)
BTW, that code is wrong; it should be abs(struct.unpack('<h', data[2*i:2*i+2])[0]), otherwise you are overlapping bytes from different values.
To do the same with numpy, you should just do this (instead of both f.read() and the whole loop):
data = np.fromfile(f, dtype='<i2')
This is over 100 times faster than the manual thing above in my test on 5 MB of data.
In the second case, you have interleaved left-right-left-right values. Again you can read them all (assuming you have enough memory) and then access only one half:
data = np.fromfile(f, dtype='<i2')
left = data[::2]
right = data[1::2]
This processes everything, even though you need just one half, but it is still much much faster.
EDIT: If the data is not coming from a file, np.fromfile can be replaced with np.frombuffer. Then you have this:
channel_data = np.frombuffer(data, dtype='<i2')
if channels == 2:
    channel_data = channel_data[1::2]
levels = np.abs(channel_data)
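As a quick check of the np.frombuffer approach, here is a small self-contained sketch that packs a few known samples with struct and decodes them again. right_channel_levels is a hypothetical wrapper name, and the widening to int32 is my own addition so that abs(-32768) does not overflow int16.

```python
import struct
import numpy as np

def right_channel_levels(data, channels):
    # Interpret the raw bytes as little-endian 16-bit signed integers
    samples = np.frombuffer(data, dtype='<i2')
    if channels == 2:
        # Interleaved L R L R ...: every second value, starting at index 1, is the right channel
        samples = samples[1::2]
    # Widen before abs() so that -32768 maps to 32768 instead of overflowing
    return np.abs(samples.astype(np.int32))

stereo_bytes = struct.pack('<6h', 1, -2, 3, -4, 5, -6)  # L, R, L, R, L, R
mono_bytes = struct.pack('<3h', -7, 8, -9)

stereo_levels = right_channel_levels(stereo_bytes, 2)
mono_levels = right_channel_levels(mono_bytes, 1)
print(stereo_levels, mono_levels)
```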
I'm doing an rfft and irfft from a wave file:
samplerate, data = wavfile.read(location)
input = data.T[0] # first track of audio
fftData = np.fft.rfft(input[sample:], length)
output = np.fft.irfft(fftData).astype(data.dtype)
So it reads from a file and then does an rfft. However, it produces a lot of noise when I play the audio with a PyAudio stream. I searched for an answer to this question and used this solution:
rfft or irfft increasing wav file volume in python
That is why I have the .astype(data.dtype) when doing the irfft. However, it only reduced the noise a bit; it still sounds all wrong.
This is the playback, where p is the pyAudio:
stream = p.open(format=pyaudio.paFloat32,
                channels=1,
                rate=fs,
                output=True)
stream.write(output)
stream.stop_stream()
stream.close()
p.terminate()
So what am I doing wrong here?
Thanks!
edit: Also I tried to use .astype(dtype=np.float32) when doing the irfft as the pyaudio uses that when streaming audio. However it was still noisy.
The best working solution this far seems to be normalization with median value and using .astype(np.float32) as pyAudio output is float32:
samplerate, data = wavfile.read(location)
input = data.T[0] # first track of audio
fftData = np.fft.rfft(input[sample:], length)
fftData = np.divide(fftData, np.median(fftData))
output = np.fft.irfft(fftData).astype(dtype=np.float32)
If anyone has better solutions I'd like to hear them. I tried mean normalization, but it still resulted in clipping audio, and normalization with np.max made the whole audio too quiet. This normalization problem with FFT always gives me trouble, and I haven't found any 100% working solutions here on SO.
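One alternative that may be worth trying, sketched under the assumption that the WAV file holds 16-bit PCM: paFloat32 output expects samples in the range -1.0 to 1.0, so the integer samples can be scaled by 32768 before the transform instead of normalizing the spectrum. The random array below only stands in for data.T[0]; it is not the original audio.

```python
import numpy as np

# Stand-in for 16-bit PCM samples as returned by wavfile.read (assumption: int16 data)
rng = np.random.default_rng(0)
pcm = rng.integers(-32768, 32767, size=1024, dtype=np.int16)

# Scale to the [-1.0, 1.0] range that float32 playback expects
scaled = pcm.astype(np.float32) / 32768.0

spectrum = np.fft.rfft(scaled)
# Passing n keeps the output length equal to the input length, also for odd lengths
output = np.fft.irfft(spectrum, n=len(scaled)).astype(np.float32)

print(np.abs(output).max(), np.abs(output - scaled).max())
```

With no processing in between, the round trip returns the scaled input up to floating-point error. With a float32 stream, the buffer passed to stream.write would then be output.tobytes().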
What shall be evaluated and achieved:
I try to record audio data with a minimum of influence by hard- and especially software. After using Adobe Audition for some time I stumbled across PyAudio and was driven by curiosity as well as the possibility to refresh my Python knowledge.
As the fact displayed in the headline above may have given away I compared the sample values of two wave files (indeed sections of them) and had to find out that both programmes produce different output.
As I am definitely at my wit's end, I do hope to find someone who can help me.
What has been done so far:
An M-Audio “M-Track Two-Channel USB Interface” has been used to record Audio Data with Audition CS6 and PyAudio simultaneously as the following steps are executed in the given order…
Audition is prepared for recording by opening “Preferences/ Audio Hardware” and selecting the audio interface, a sample rate of 48 kHz and a latency of 250 ms (over the last few years this value has proven to be the second lowest I can get without a warning about lost samples; if I understood the purpose correctly, I only have to worry about losing samples, because monitoring is not an issue).
A new file with one channel, a sample rate of 48 kHz and a bit depth of 24 bit is opened.
The Python code (displayed below) is started; its countdown leaves time to switch over to Audition and start the recording there 10 s before Python starts its own.
Wait until Python prints the “end of programme” message.
Stop and save the data recorded by Audition.
Now data has to be examined:
Both files (one recorded by Audition and Python respectively) are opened in Audition (Multitrack session). As Audition was started and terminated manually the two files have completely different beginning and ending times. Then they are aligned visually so that small extracts (which visually – by waveform shape – contain the same data) can be cut out and saved.
A Python programme has been written opening, reading and displaying the sample values using the default wave module and matplotlib.pyplot respectively (graphs are shown below).
Differences in both waveforms and a big question mark are revealed…
Does anybody have an idea why Audition is showing different sample values and specifically where precisely the mistake (is there any?) hides??
some (interesting) observations
a) When calling the pyaudio.PyAudio().get_default_input_device_info() method the default sample rate is listed as 44,1 kHz even though the default M-Track sample rate is said to be 48 kHz by its specifications (indeed Audition recognizes the 48 kHz by resampling incoming data if another rate was selected). Any ideas why and how to change this?
b) Aligning both files using the beginning of the sequence covered by PyAudio and checking whether they are still “in phase” at the end reveals no – PyAudio is shorter and seems to have lost samples (even though no exception was raised and the “exception on overflow” argument is “True”)
c) Using the “frames_per_buffer” keyword in the stream open method I was unable to align both files, having no idea where Python got its data from.
d) Using the “.get_default_input_device_info()” method and trying different sample rates (22,05 k, 44,1 k, 48 k, 192 k) I always receive True as an output.
Official Specifications M-Track:
bit depth = 24 bit
sample rate = 48 kHz
input via XLR
output via USB
Specifications Computer and Software:
Windows 8.1
I5-3230M # 2,6 GHz
8 GB RAM
Python 3.4.2 with PyAudio 0.2.11 – 32 bit
Audition CS6 Version 5.0.2
Python Code
import pyaudio
import wave
import time
formate = pyaudio.paInt24
channels = 1
framerate = 48000
fileName = 'test ' + '.wav'
chunk = 6144
# output of stream.get_read_available() at different positions
p = pyaudio.PyAudio()
stream = p.open(format=formate,
                channels=channels,
                rate=framerate,
                input=True)
                #frames_per_buffer=chunk) # observation c
# COUNTDOWN
for n in range(0, 30):
    print(n)
    time.sleep(1)
# get data
sampleList = []
for i in range(0, 79):
    data = stream.read(chunk, exception_on_overflow = True)
    sampleList.append(data)
print('end -', time.strftime('%d.%m.%Y %H:%M:%S', time.gmtime(time.time())))
stream.stop_stream()
stream.close()
p.terminate()
# produce file
file = wave.open(fileName, 'w')
file.setnchannels(channels)
file.setframerate(framerate)
file.setsampwidth(p.get_sample_size(formate))
file.writeframes(b''.join(sampleList))
file.close()
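For the comparison step, one detail that may be worth double checking is how the 24-bit samples are turned into integers, because the wave module returns raw bytes and there is no native 3-byte integer type. The following is only a sketch, assuming little-endian signed 3-byte PCM frames as stored in a standard WAV file; decode_int24 is a hypothetical helper, not part of wave or PyAudio.

```python
import numpy as np

def decode_int24(raw):
    # View the bytes as rows of three (least significant byte first) and widen to int32
    b = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3).astype(np.int32)
    values = b[:, 0] | (b[:, 1] << 8) | (b[:, 2] << 16)
    # Sign-extend: values with bit 23 set represent negative numbers
    return np.where(values >= (1 << 23), values - (1 << 24), values)

expected = [0, 1, -1, 8388607, -8388608]
raw = b''.join(v.to_bytes(3, 'little', signed=True) for v in expected)
decoded = decode_int24(raw)
print(decoded)
```

For a mono file, raw would be the result of readframes() on the opened wave file. If the two recordings are decoded differently (for example unsigned instead of signed), the plotted sample values will differ even when the underlying audio is the same.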
Figure 1: first comparison Audition – PyAudio (image 1)
Figure 2: second comparison Audition – PyAudio (image 2)