To perform an end-to-end test of an embedded platform that plays musical notes, we are trying to record via a microphone and identify whether a specific sound was played using the device's speakers. The testing setup is not a real-time system, so we don't really know when (or even if) the expected sound begins and ends.
The expected sound is represented in a wave file (or similar) we can read from disk.
How can we run a test that asserts whether the sound was played as expected?
There are a few ways to tackle this problem:
Convert the expected sound into a sequence of frequency-amplitude pairs. Then, record the sound via the microphone and convert that recording into a corresponding sequence of frequency-amplitude pairs. Finally, compare the two sequences to see if they match. This task can be accomplished using the modules scipy, numpy, and matplotlib.
We'll need to generate a sequence of frequency-amplitude pairs for the expected sound. We can do this by using the scipy.io.wavfile.read() function to read in a wave file containing the expected sound. This function returns a tuple containing the sample rate (in samples per second) and a numpy array containing the amplitudes of the waveform. We can then use the numpy.fft.fft() function to convert the amplitudes into a sequence of frequency-amplitude pairs.
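A minimal sketch of that step (the file name expected.wav and the synthetic 440 Hz test tone are stand-ins so the example is self-contained):

```python
import numpy as np
from scipy.io import wavfile

# write a stand-in "expected" sound so the sketch is self-contained:
# one second of a 440 Hz tone at 22050 samples/s
rate = 22050
t = np.arange(rate) / rate
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
wavfile.write('expected.wav', rate, tone)

# read the expected sound back: sample rate plus a numpy array of amplitudes
rate, amplitudes = wavfile.read('expected.wav')

# FFT of the amplitudes; keep only the positive-frequency half of the spectrum
spectrum = np.abs(np.fft.fft(amplitudes))[:len(amplitudes) // 2]
freqs = np.fft.fftfreq(len(amplitudes), d=1.0 / rate)[:len(amplitudes) // 2]

# the sequence of (frequency, amplitude) pairs
pairs = list(zip(freqs, spectrum))

# the strongest pair should sit at 440 Hz for this test tone
dominant_freq = freqs[np.argmax(spectrum)]
```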
We'll need to record the sound via the microphone. For this, we'll use the pyaudio module. We can create a PyAudio object using the pyaudio.PyAudio() constructor, and then use the open() method to open a stream on the microphone. We can then read blocks of data from the stream using the read() method. Each block arrives as raw bytes, which we can convert into a numpy array of waveform amplitudes with numpy.frombuffer(). We can then use the numpy.fft.fft() function to convert the amplitudes into a sequence of frequency-amplitude pairs.
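A sketch of the conversion step; the pyaudio calls are shown only in comments because they need a live microphone, so a fabricated byte buffer stands in for the stream.read() result:

```python
import numpy as np

RATE = 44100
CHUNK = 4096

# with pyaudio the buffer would come from something like:
#   pa = pyaudio.PyAudio()
#   stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
#                    input=True, frames_per_buffer=CHUNK)
#   raw = stream.read(CHUNK)
# here we fabricate an equivalent raw buffer: CHUNK int16 samples of a 1 kHz tone
t = np.arange(CHUNK) / RATE
raw = (0.5 * np.sin(2 * np.pi * 1000 * t) * 32767).astype(np.int16).tobytes()

# stream.read() returns raw bytes, not a numpy array, so convert first
block = np.frombuffer(raw, dtype=np.int16)

# frequency-amplitude pairs for this block
spectrum = np.abs(np.fft.fft(block))[:CHUNK // 2]
freqs = np.fft.fftfreq(CHUNK, d=1.0 / RATE)[:CHUNK // 2]
peak_freq = freqs[np.argmax(spectrum)]
```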
Finally, we can compare the two sequences of frequency-amplitude pairs to see if they match. If they do, we can conclude that the expected sound was played and recorded correctly; if they don't, it was not.
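A crude sketch of such a comparison, matching only the dominant frequency components. Real recordings would additionally need windowing, time alignment, and a noise-tolerant threshold; the 5 Hz tolerance below is an assumption:

```python
import numpy as np

def dominant_freqs(samples, rate, n=2):
    """Return the n strongest frequency components of a signal, sorted."""
    spectrum = np.abs(np.fft.fft(samples))[:len(samples) // 2]
    freqs = np.fft.fftfreq(len(samples), d=1.0 / rate)[:len(samples) // 2]
    return sorted(freqs[np.argsort(spectrum)[-n:]])

def sounds_match(expected, recorded, rate, tol_hz=5.0):
    """True if the dominant frequencies of the two signals agree within tol_hz."""
    e = dominant_freqs(expected, rate)
    r = dominant_freqs(recorded, rate)
    return all(abs(a - b) <= tol_hz for a, b in zip(e, r))

# synthetic example: a two-tone "expected" sound, and an attenuated,
# noisy version of it standing in for the microphone recording
rate = 22050
t = np.arange(rate) / rate
expected = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 880 * t)
rng = np.random.default_rng(0)
recorded = 0.8 * expected + 0.01 * rng.standard_normal(rate)

match = sounds_match(expected, recorded, rate)
```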
Use a sound recognition system to identify the expected sound in the recording.
from pydub import AudioSegment
from pydub.silence import split_on_silence
from pydub.playback import play

def get_sound_from_recording():
    sound = AudioSegment.from_wav("recording.wav")
    # detect silent chunks and split the recording on them:
    # split on silences longer than 1000 ms (anything under -16 dBFS, the
    # default threshold, counts as silence) and keep 200 ms of silence
    # at the beginning and end of each chunk
    chunks = split_on_silence(sound, min_silence_len=1000, keep_silence=200)
    for chunk in chunks:
        play(chunk)
    return chunks
Cross-correlate the recording with the expected sound. This will produce a sequence of values that indicates how closely the recording matches the expected sound. A high value at a particular time index indicates that the recording and expected sound match closely at that time.
from scipy.io import wavfile
from scipy import signal
import matplotlib.pyplot as plt

# read in the recorded audio and the expected sound, along with their sampling rates
sampling_freq, audio = wavfile.read('audio_file.wav')
expected_freq, expected = wavfile.read('expected.wav')
# cross-correlate the recording with the expected sound
corr = signal.correlate(audio, expected, mode='full')
# plot the cross-correlation signal; a sharp peak marks where the expected sound occurs
plt.plot(corr)
plt.show()
This way you can set up your test to check if you are getting the correct output.
For example,
import numpy as np
import librosa

y, sr = librosa.load("sound.wav", sr=44100, mono=True)
half = y.shape[0] // 2   # y.shape is a tuple; index it before halving
y1 = y[:half]
y2 = y[half:]
y_pit = librosa.effects.pitch_shift(y2, sr=sr, n_steps=24)
y = np.concatenate([y1, y_pit])
This code loads sound.wav, pitch-shifts only the latter half, and finally concatenates the two halves into one sound. Now, what I want to do goes a bit further: I would like to pitch-shift only around a specific frequency, like 440 Hz = A.
For example, say I have a sound (A C E) = Am chord. I want to pitch-shift only around A, to make (G C E).
Where should I start? Is librosa.effects.pitch_shift useful for this purpose?
This is not possible with a pitch shifter. A pitch shifter simply changes the frequencies by speeding the sound up or slowing it down (like a varispeed), then cutting some small slices if the resulting sound is longer or, on the contrary, duplicating some small slices if it is shorter. As you can imagine, this process treats the whole wave as a single thing, meaning that the entire spectrum is transposed.
Doing what you want requires a much more sophisticated technique called resynthesis, which first converts the wave into a synthetic sound using FFT and additive synthesis (or other techniques more appropriate when the sound is noisy), then allows manipulation of independent parts of the spectrum, and finally converts the synthetic sound back into an audio wave. There is a standalone piece of software that does this quite well, called Spear. You could also investigate Loris, which seems to have a Python module.
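To get a feel for the idea, here is a very crude single-frame approximation in plain numpy. This is only a sketch of the spectral-manipulation step (real resynthesis tools like Spear or Loris track partials over time, which this does not), and the band limits and tone frequencies are illustrative assumptions:

```python
import numpy as np

fs = 22050
t = np.arange(fs) / fs
# stand-in "Am chord": pure tones at A4, C5 and E5
y = (np.sin(2 * np.pi * 440.0 * t)
     + np.sin(2 * np.pi * 523.25 * t)
     + np.sin(2 * np.pi * 659.25 * t)) / 3.0

Y = np.fft.rfft(y)
freqs = np.fft.rfftfreq(len(y), 1.0 / fs)

# take the energy in a narrow band around A (440 Hz) and move it down
# to G (392 Hz), leaving the rest of the spectrum untouched
band = (freqs > 400) & (freqs < 480)
shifted = np.where(band, 0, Y)
src = np.nonzero(band)[0]
dst = np.round(src * 392.0 / 440.0).astype(int)
np.add.at(shifted, dst, Y[src])   # accumulate moved bins at their new positions

# back to an audio wave: now roughly (G C E) instead of (A C E)
y_out = np.fft.irfft(shifted, n=len(y))
```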
I have a piece of Python script that runs through a continuous loop (~5 Hz) to obtain data from a set of sensors connected to my PC, much like a proximity sensor.
I would like to translate this sensor data into audio output, using Python and in a continuous manner. That is: while my sensor loop is running, I want to generate and play a continuous sinusoidal audio sound whose frequency is modulated by the sensor output (e.g. higher sensor value = higher frequency). This is sort of the output that I want (without the GUI, of course): http://www.szynalski.com/tone-generator/
I've looked through a lot of the available packages (pyDub, pyAudio, Winsound), but each seems to solve only a piece of the puzzle (signal generation, saving, or playing), and I can't seem to find out how to combine them.
It's possible to perform frequency modulation, link different frequencies together, and then save them, but how do I play them in real time without clogging up my sensor loop?
It's possible to play threaded audio using WinSound, but how do I update the frequency in real time?
Or is this not a feasible route to walk using Python, and should I write a script that feeds the sensor data into another, more audio-friendly language?
Thanks.
I have a piece of python script that runs through a continuous loop (~5Hz)
Does it not work if you just add winsound.Beep(frequency_to_play, 1) in the loop?
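If winsound.Beep's blocking and clicks become a problem, another sketch (assuming some audio backend such as pyaudio for the actual playback) is to generate the sine wave yourself in short blocks, carrying the phase across blocks so frequency changes don't click:

```python
import numpy as np

class ToneGenerator:
    """Produces successive blocks of a sine wave; the frequency can change
    between blocks without a phase jump (which would be heard as a click)."""

    def __init__(self, fs=44100, blocksize=2205):  # 2205 samples = 50 ms, ~20 blocks/s
        self.fs = fs
        self.blocksize = blocksize
        self.phase = 0.0

    def next_block(self, freq):
        n = np.arange(self.blocksize)
        block = np.sin(self.phase + 2 * np.pi * freq * n / self.fs)
        # remember where this block ended so the next one starts there
        self.phase = (self.phase + 2 * np.pi * freq * self.blocksize / self.fs) % (2 * np.pi)
        return block.astype(np.float32)

# in the sensor loop, each iteration would map the sensor value to a frequency
# and write one block to an output stream, e.g. (untested pyaudio usage):
#   stream = pyaudio.PyAudio().open(format=pyaudio.paFloat32, channels=1,
#                                   rate=44100, output=True)
#   stream.write(gen.next_block(200 + 10 * sensor_value).tobytes())
gen = ToneGenerator()
b1 = gen.next_block(440.0)
b2 = gen.next_block(880.0)  # frequency doubled, still no discontinuity
```

Each call to next_block() is a single vectorized numpy operation, so generating audio inside a 5 Hz sensor loop should not clog it.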
I've been working on an animatronic shooting gallery using a Raspberry Pi B+ and an Arduino Mega. Overall, things are going well except for one detail. I'm having trouble keeping the motor movements in sync with sound.
For example, say I got this talking pig. You hit the target and the pig says "Hey, Kid! Watch where you're pointing that thing! You'll put your eye out!" or something like that. Problem is, the motors making the pig move can't keep up with the audio. They lag behind, getting out of sync with the audio. The longer the routine, the further they fall behind. What's more, it doesn't lag behind at a consistent rate. It seems to depend on what else the computer is doing at the time. For example, I had Audacity running at the same time as my program, and there was noticeably more lag than when my program was the only program running. I also notice a slight difference in lag between running my program in IDLE vs running from the terminal.
Some of my targets are more sensitive to synchronization than others, for example, if I had a rocket that went up to a whooshing sound, it's not really an issue if the motor moves a little faster or slower, but mouth movements look terrible if they're off by more than maybe a tenth of a second.
To control my motors, I use a pickled list of tuples, each containing: the time in ms after the beginning of the routine at which the position signal should be sent, the number of the motor the signal should be sent to, and the position of the motor. The main program loops through all the targets and gives each a chance to check whether it is time to send its next motor position command.
Here is my current code:
import time
import pygame.mixer
import os
import cPickle as pickle
import RPi.GPIO as GPIO
from Adafruit_PWM_Servo_Driver import PWM

GPIO.setmode(GPIO.BOARD)
chan_list = [17, 18]
GPIO.setup(chan_list, GPIO.IN, pull_up_down=GPIO.PUD_UP)
GPIO.add_event_detect(17, GPIO.FALLING)
GPIO.add_event_detect(18, GPIO.FALLING)

pygame.mixer.init(channels=1, frequency=22050)
pygame.mixer.set_num_channels(10)

pwm = PWM(0x40)
pwm.setPWMFreq(60)

game_on = 1

class Bot:
    def __init__(self, folder, servopins):
        self.startExecute = 0
        self.finishExecute = 0
        self.folder = folder  # name of folder containing routine sound and motor position data
        self.servoPins = servopins  # list of servo motor pins for this bot
##        self.IRpin = IRpin
        self.startTime = 0  # start time of current routine in ms. begins at 0 ms
        self.counter = 0  # next frame of routine to be executed
        self.isRunning = 0  # 1 if bot is running a routine, 0 if not; target inactive if 1
        self.routine = 0  # position of current routine in self.routines
##        self.numRoutines = 0
        self.sounds = []  # sound objects for routines. One per routine
        self.routines = []  # lists of (time, motor number, motor position) tuples. One per routine
        self.list_dir = os.listdir(self.folder)  # names of all routine sound and motor position data files
        self.list_dir.sort()
        self.currentFrame = ()  # current routine frame (execution time, motor number, motor position) waiting to be executed
        # append all routine sound files into a list
        for filo in self.list_dir:
            if filo.endswith('.wav'):
                self.sounds.append(pygame.mixer.Sound(self.folder + filo))
                print filo
        # append all routine motor position files into a list
        for filo in self.list_dir:
            if filo.endswith('.pkl'):
                self.incoming = open(self.folder + filo)
                self.pklroutine = pickle.load(self.incoming)
                self.routines.append(self.pklroutine)
                self.incoming.close()
                print filo
##        self.sound = pygame.mixer.Sound(str(self.routine) + '.wav')
##        self.motorfile = open(str(self.routine) + '.pkl')

    # starts a routine running. resets counter to first frame of routine.
    # starts routine timer. starts routine sound object playing. loads first frame of routine
    def run(self):
        self.isRunning = 1
        self.startTime = round(time.clock() * 1000)
        self.sounds[self.routine].play()
        self.currentFrame = self.routines[self.routine][self.counter]

    def execute(self):
        if self.counter == 0:
            self.startExecute = round(time.clock() * 1000)
        if self.currentFrame[0] <= (round(time.clock() * 1000) - self.startTime):
            pwm.setPWM(self.servoPins[self.currentFrame[1] - 1], 0, int(round(150 + (self.currentFrame[2] * 2.6))))
            if self.counter < (len(self.routines[self.routine]) - 2):
                self.counter = self.counter + 1
                self.currentFrame = self.routines[self.routine][self.counter]
            else:
                print (round(time.clock() * 1000) - self.startTime)
                self.finishExecute = round(time.clock() * 1000)
                self.counter = 0
                self.isRunning = 0
                if self.routine < (len(self.routines) - 1):
                    self.routine = self.routine + 1
                else:
                    self.routine = 0

bot1 = Bot('Pig/', [0, 1])
bot2 = Bot('Goat/', [2, 3])

while 1:
    if game_on:
        if GPIO.event_detected(17):
            bot1.run()
        if GPIO.event_detected(18):
            bot2.run()
        if bot1.isRunning:
            bot1.execute()
        if bot2.isRunning == 1:
            bot2.execute()
The list of tuples is created using two other programs, one on the Arduino Mega and another on my PC in Python. The Arduino is connected to a couple of potentiometers wired to +5V and ground, with the signal voltage coming from the middle terminal into an analog input. The Arduino converts this analog value to a motor position in degrees and sends it over the serial port to the PC, which saves it in a tuple along with the motor number and the time the byte was received, appends the tuple to a list, and at the end of the program saves the list in a pkl file. I will provide that code if somebody really wants to see it, but I don't believe the problem lies within the data produced by this process.
If you have any suggestions about how I could fix this code to make it work, that would be the preferable solution, because the solution I'm contemplating now seems like it might just be one massive PITA.
I'm thinking that I could use one audio channel (say, the left channel) for my audio, then in the other channel, I could have sine waves of different frequencies and amplitudes with each frequency being assigned to a specific motor. Say for example I have this one motor that moves a mouth. This motor will have an assigned frequency of 1000 Hz. Whenever the mouth needs to be open, there will be a 1000 Hz sine wave in the motor control channel. I would use an Arduino Mega to detect that sine wave, and send the PWM signal to the motor to move to a certain position. The position may even be determined by the amplitude of the sine wave. To make things a little simpler, I could just have 3 positions for the mouth, closed (no signal), open a little (small amplitude signal), and open a little more (larger amplitude signal).
All of this would be done using this FFT library: http://wiki.openmusiclabs.com/wiki/ArduinoFFT
I would need to build a circuit to convert the AC signal from the RPi to a 0-5V signal. I have found this circuit here that looks like it plausibly might work, assuming I can find a way to cleanly amplify the signal from the RPi to +/- 2.5V. Source
Other than that, I don't know how difficult or effective it would be to implement this solution. For one, I'm not sure if I can send PWM signals to the motors and run the FFT library at the same time, and I've never worked with this library, or had any experience with FFT at all.
My fallback solution is to design the shooting gallery so that it is less dependent on synchronization. This would involve compromises such as making the mouth open and close at a steady rate rather than trying to make it move in sync with the audio, and keeping triggered routines short (<= 5 seconds).
Okay, problem defined, now on to specific questions:
What can I do to make the current version of the program work without having to resort to analyzing a bunch of sine waves using FFT?
Failing that, any suggestions on the input circuit? What could I use to cleanly amplify the RPi signal to +/- 2.5V, and will the circuit I linked to actually work to convert that signal to a 0V-5V signal readable by an Arduino analog input?
Will using the FFT library interfere with sending the PWM signals to the motors?
How much of a PITA will it be to extract usable data from a bunch of sine waves using FFT?
The documentation for the FFT library is a little lacking. How do I set the range of frequencies to analyze? I see that it starts at 0 Hz (DC), but what's the high end? Can I set that? I only need maybe 30 usable bins. What frequencies should I use to get the clearest signals? I'm guessing I don't want to use the highest and lowest, because the lowest would take longer to detect, and the highest will be more distorted due to the poor quality of the audio coming out of the RPi. Should I set it to 64 bins and use the middle 30? And again, how do I determine the center frequency of each bin?
Gosh, who will answer all these Qs?
1. Your program is doing OK; the Raspi is responsible for your trouble.
The OS has a timer that interrupts every process, usually 100 times per second. The OS then does what it needs to do and chooses another process to be active. After some number of ms or µs, according to the OS's preference, that process is interrupted in turn, and so on. This causes your program to pause for at least 0.01 s (best case). I don't know about the Raspi B+, but on the B, kernel interrupts interfered with software PWM as well because, as far as I remember, the Raspi B has only one hardware PWM pin. You have more than one servo, and each has to be updated every 20 ms with the correct duty cycle to hold the wanted position. Your idea of using the audio output to control the servos is really not so bad.
For that, you do not need FFT; you just generate the sine wave according to the formula (use numpy):
sine = A*np.sin(2*np.pi*np.arange(how_long)*freq/fs)
where A is the amplitude, freq is the desired frequency of the sine, fs is the sample rate of the audio output, and how_long is the number of samples you want to generate. Anyway, PWM is not a sine; it is a square signal. So you would not even need that: just N samples of zeros and N samples of ones.
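For instance, one second of such a square control signal could be generated like this (the 50 Hz frame and 1.5 ms pulse width below are the usual hobby-servo assumptions, not values from your setup):

```python
import numpy as np

fs = 44100      # audio output sample rate
freq = 50       # 50 Hz: one 20 ms servo frame per period
duty = 0.075    # 1.5 ms high out of 20 ms, roughly a servo's centre position

period = fs // freq                       # samples per period
high = int(period * duty)                 # samples the pulse stays high
one_period = np.concatenate([np.ones(high), np.zeros(period - high)])
signal = np.tile(one_period, freq)        # one second of the square wave
```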
I can also recommend implementing a feedback signal from the Arduino: you tell a motor to start moving, and when it does, you play the sound. Slow down the motors so they are able to keep up with the sound.
The best way of doing this would be to build your motor controller out of some MCU instead of using the Raspi GPIO, and then talk to the MCU over I2C or RS232. This works very well.
If you have multiple motors, take care that the chosen MCU has enough hardware PWM pins, or choose one with a high oscillator frequency and add a demultiplexer to it.
Yes, if you use FFT, or start another program, or anything else demanding, the PWM will suffer badly.
I think numpy is included in Raspbian, and FFT works very well on it, but you would not need it anyway. Maybe, just maybe, try pyaudio or alsaaudio instead of pygame to see which responds faster. On PyPI there is a nice module called SWMixer that gives pyaudio a pygame.mixer()-like interface, so you would not need to do anything extra, although I think it will be very slow because of the manual mixing. But pyaudio can open multiple streams to which you can write in parallel. Also, for this purpose it would be great to have a real-time OS.
There are some prepared images for the Raspi. A few years back I found an image from Machinoid, a precompiled Xenomai with Debian. The trouble was that the audio driver did not work; it is possible that somebody has fixed that since. On an RTOS you can specify the time within which some action must be performed, thus making it atomic, i.e. the OS will do everything it can to complete the action in the specified time. In that manner the OS's interrupts and switches to other processes are governed by the timeout of the required action.
Choose whichever solution is easiest for you. I would first see what I can do with the feedback connection; if that does not work, I would try audio; and if that does not work either, I would use an external IC for PWM.
The audio approach has a limit: you cannot make two pigs talk at the same time. :D
Enjoy yourself. I would! :D
I'm working on a project where I need to know the amplitude of sound coming in from a microphone on a computer.
I'm currently using Python with the Snack Sound Toolkit, and I can record audio coming in from the microphone, but I need to know how loud that audio is. I could save the recording to a file and use another toolkit to read the amplitude at given points in time from the audio file, or try to get the amplitude while the audio is coming in (which could be more error-prone).
Are there any libraries or sample code that can help me out with this? I've been looking and so far the Snack Sound Toolkit seems to be my best hope, yet there doesn't seem to be a way to get direct access to amplitude.
Looking at the Snack Sound Toolkit examples, there seems to be a dbPowerSpectrum function.
From the reference:
dBPowerSpectrum ( )
Computes the log FFT power spectrum of the sound (at the sample number given in the start option) and returns a list of dB values. See the section item for a description of the rest of the options. Optionally an ending point can be given, using the end option. In this case the result is the average of consecutive FFTs in the specified range. Their default spacing is taken from the fftlength but this can be changed using the skip option, which tells how many points to move the FFT window each step. Options:
EDIT: I am assuming that when you say amplitude, you mean how "loud" the sound appears to a human, and not the time-domain voltage, which would average to roughly 0 over the entire length, since the integral of a sine wave is 0. E.g. 10 * sin(t) is louder than 5 * sin(t), but both have an average value of 0 over time. (You do not want to send non-AC voltages to a speaker anyway.)
To get how loud the sound is, you will need to determine the amplitudes of each frequency component. This is done with a Fourier transform (FFT), which breaks the sound down into its frequency components. The dBPowerSpectrum function seems to give you a list of the magnitudes (forgive me if this differs from the exact definition of a power spectrum) of each frequency. To get the total volume, you can just sum the entire list (which will be close, except it still might differ from perceived loudness, since the human ear has a frequency response of its own).
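That summing step is actually principled: by Parseval's theorem, the summed FFT power equals the signal's time-domain energy, which is exactly what separates 10 * sin(t) from 5 * sin(t). A small numpy check:

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs
loud = 10 * np.sin(2 * np.pi * 440 * t)
quiet = 5 * np.sin(2 * np.pi * 440 * t)

def power_spectrum_sum(x):
    # total energy computed from the FFT magnitudes (Parseval's theorem)
    return np.sum(np.abs(np.fft.fft(x)) ** 2) / len(x)

# this agrees with the time-domain energy sum(x**2),
# and ranks the louder signal higher
```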
I disagree completely with this "answer" from CookieOfFortune. Granted, the question is poorly phrased, but this answer is making things much more complex than necessary. I am assuming that by 'amplitude' you mean perceived loudness. Technically, each sample in the (PCM) audio stream represents the amplitude of the signal at a given time-slice. To get a loudness representation, try a simple RMS (root mean square) calculation.
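A minimal sketch of that RMS calculation:

```python
import numpy as np

def rms(samples):
    """Root mean square of a block of PCM samples: a rough loudness measure."""
    samples = np.asarray(samples, dtype=np.float64)
    return np.sqrt(np.mean(samples ** 2))

# 10*sin(t) comes out louder than 5*sin(t), as it should
t = np.arange(44100) / 44100.0
loud = rms(10 * np.sin(2 * np.pi * 440 * t))    # ~7.07 (10 / sqrt(2))
quiet = rms(5 * np.sin(2 * np.pi * 440 * t))    # ~3.54
```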
I'm not sure if this will help, but skimpygimpy provides facilities for parsing WAVE files into Python sequences and back. You could potentially use this to examine the waveform samples directly and do what you like. You will have to read some source, though; these subcomponents are not documented.