get playing wav audio level as output - python

I want to make a speaking mouth which moves or emits light or something when a playing wav file emits sound. So I need to detect when the wav file is speaking and when it is in a silence between words. Currently I'm using a pygame script that I found:
import pygame
pygame.mixer.init()
pygame.mixer.music.load("my_sentence.wav")
pygame.mixer.music.play()
while pygame.mixer.music.get_busy():
    continue
I guess I could add a check inside the while loop to look at the sound's output level, or something like that, and then send it to one of the GPIO outputs. But I don't know how to achieve that.
Any help would be much appreciated

You'll need to inspect the WAV file to work out when the voice is present. The simplest way to do this is to look for loud and quiet periods. Because sound is made of waves, the values in the WAV file change very little when it's quiet and change a lot when it's loud.
One way of estimating loudness is the variance. As described in the article, this can be defined as E[(X - mu)^2], which could be written average((X - average(X))^2). Here, X is the value of the signal at a given point (the values stored in the WAV file, called sample in the code). If the signal is changing a lot, the variance will be large.
This would let you calculate the loudness of an entire file. However, you want to track how loud the file is at any given time, which means you need a form of moving average. An easy way to get this is with a first-order low-pass filter.
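Concretely, with a smoothing factor alpha = 1 / (time_constant * samplerate) (the same names the code below uses), each new sample x updates the two running estimates as:
mean = (1 - alpha) * mean + alpha * x
variance = (1 - alpha) * variance + alpha * (x - mean)^2
Older samples fade out geometrically, so both estimates track the signal with a reaction time of roughly time_constant seconds.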
I haven't tested the code below against real audio, so treat it as a starting point rather than a finished solution. It loads the WAV file, uses low-pass filters to track the mean and variance, and works out when the variance goes above and below a certain threshold. Then, while playing the WAV file, it keeps track of the time since playback started and prints out whether the WAV file is loud or quiet.
Here's what you might still need to do:
Test the code and fix anything I've missed
Add something useful to react to the loud/quiet changes
Change the threshold and time_constant to get good results with your audio
Add some hysteresis (a variable threshold) to stop the light flickering (there's a sketch of this after the code)
I hope this helps!
import wave
import struct
import time

import pygame

def get_loud_times(wav_path, threshold=10000, time_constant=0.1):
    '''Work out which parts of a WAV file are loud.
    - threshold: the variance threshold that is considered loud
    - time_constant: the approximate reaction time in seconds'''
    wav = wave.open(wav_path, 'r')
    length = wav.getnframes()
    samplerate = wav.getframerate()

    assert wav.getnchannels() == 1, 'wav must be mono'
    assert wav.getsampwidth() == 2, 'wav must be 16-bit'

    # Our result will be a list of (time, is_loud) giving the times when
    # the audio switches from loud to quiet and back.
    is_loud = False
    result = [(0., is_loud)]

    # The following values track the mean and variance of the signal.
    # When the variance is large, the audio is loud.
    mean = 0
    variance = 0

    # If alpha is small, mean and variance change slower but are less noisy.
    alpha = 1 / (time_constant * float(samplerate))

    for i in range(length):
        sample_time = float(i) / samplerate
        # readframes() returns bytes; unpack one little-endian signed short
        sample = struct.unpack('<h', wav.readframes(1))[0]

        # mean is the average value of sample
        mean = (1 - alpha) * mean + alpha * sample

        # variance is the average value of (sample - mean) ** 2
        variance = (1 - alpha) * variance + alpha * (sample - mean) ** 2

        # check if we're loud, and record the time if this changes
        new_is_loud = variance > threshold
        if is_loud != new_is_loud:
            result.append((sample_time, new_is_loud))
        is_loud = new_is_loud

    return result
def play_sentence(wav_path):
    # assumes pygame.mixer.init() has already been called (as in the question)
    loud_times = get_loud_times(wav_path)
    pygame.mixer.music.load(wav_path)

    start_time = time.time()
    pygame.mixer.music.play()

    for (t, is_loud) in loud_times:
        # wait until the time described by this entry
        sleep_time = start_time + t - time.time()
        if sleep_time > 0:
            time.sleep(sleep_time)

        # do whatever
        print('loud' if is_loud else 'quiet')
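On the hysteresis suggestion: the idea is to use two thresholds instead of one, so the state only flips when the variance clearly leaves its current band. A minimal sketch (the threshold values are illustrative, not tuned):

def is_loud_hysteresis(variance, currently_loud, loud_threshold=10000, quiet_threshold=5000):
    '''Two-threshold (hysteresis) version of the loudness test.
    The state only changes when the variance clearly leaves the current
    band, which stops the output flickering near the boundary.'''
    if currently_loud:
        # stay loud until the variance drops well below the loud level
        return variance > quiet_threshold
    else:
        # stay quiet until the variance rises above the loud level
        return variance > loud_threshold

Inside the loop, you'd replace variance > threshold with is_loud_hysteresis(variance, is_loud).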

Related

How can I remove an annoying 'snap' noise each time I loop a wav file?

I'm trying to create a simple, controllable, non-blocking looping wav player. The following class, built with simpleaudio, works; however, there is a brief pause and a noticeable popping/snapping noise between each loop of the audio. I'm honestly not sure what's causing it, and I'm not sure how I could fix it other than using a different audio module entirely. Any suggestions?
import simpleaudio as sa
from threading import Thread
class WavPlayer(Thread):
    def __init__(self, filepath, loop=False):
        self.loop = loop
        self.wav_obj = sa.WaveObject.from_wave_file(filepath)
        self.play_obj = None
        Thread.__init__(self)

    def run(self):
        self.play_obj = self.wav_obj.play()  # initialize play buffer and play once
        while self.loop is True:
            if not self.play_obj.is_playing():
                print("played again")
                self.play_obj = self.wav_obj.play()

    def terminate(self):
        print("music terminated")
        self.play_obj.stop()
        self.loop = False
        self.join()
main_loop = WavPlayer("main_loop.wav", True)
menu_loop = WavPlayer("menu.wav", True)

main_loop.start()
z = input("Enter to end theme looping")
main_loop.loop = False
z = input("Enter to terminate music")
main_loop.terminate()

print("playing next song")
menu_loop.start()
cont = ''
while cont != 'n':
    cont = input("continue testing non-blocking? enter n to stop: ")
menu_loop.terminate()
Filedropper link to menu_loop wav file
Filedropper link to main_loop wav file
I encountered the same problem while programming a tuner that generates an infinitely long note from a limited number of samples (I used pyaudio; a tuning note is a very simple sound, which makes the pops very obvious, but the remedy much easier).
If you loop a wave object like:
** **
* * *
* * *
* * *
** **
... you'll hear a pop because of the large jump between the last and the first samples. Dropping a few samples until the last and the first sample match will eliminate the pops (simpleaudio allows you to read the samples into numpy arrays, which you can then truncate slightly until the first and last samples match).
Cutting at zero crossings is a special case of this; see this discussion. Of course, some sound libraries may already truncate your samples at zero crossings.
Before you take the trouble of extending your program to convert your samples to numpy arrays and truncate them, you could first try doing it by hand in a sound editor like Audacity, and then listen to the result.
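As a rough illustration of the truncation idea (a sketch only, assuming a 16-bit mono WAV called loop.wav and a made-up tolerance of 50; it reads the file with the standard wave module plus numpy rather than simpleaudio's own loader):

import wave

import numpy as np
import simpleaudio as sa

with wave.open('loop.wav', 'rb') as wav:
    rate = wav.getframerate()
    samples = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

# Drop samples from the end until the last sample is close in value to
# the first one, so the loop seam has no sudden jump
end = len(samples)
while end > 1 and abs(int(samples[end - 1]) - int(samples[0])) > 50:
    end -= 1
trimmed = samples[:end]

# play_buffer takes raw bytes, channel count, bytes per sample, sample rate
play_obj = sa.play_buffer(trimmed.tobytes(), 1, 2, rate)
play_obj.wait_done()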

Syncing gifs to tempo of music results in shorter duration than expected

I'm attempting to sync a gif to the beat of music playing on Spotify, but I'm encountering a speed issue while doing so. I must be going crazy because I can't find a reason as to why this isn't working. Below is my methodology:
Take initial BPM (ex: 150) and find the Beats/Second (BPS)
BPS = BPM / 60
Find the Seconds/Beat (SPB) from the Beats/Second (BPS)
SPB = 1 / BPS
Find the Seconds/Loop (SPL) by multiplying by the number of Beats/Loop (BPL) of the .gif
SPL = SPB * BPL
Convert Seconds/Loop (SPL) to Milliseconds/Loop (MSPL)
MSPL = SPL * 1000
Divide the Milliseconds/Loop (MSPL) by the number of frames (num_frames) in the .gif to find the time required for one frame (frame_time), rounding to the nearest even number since .gif frame times are only accurate to whole milliseconds
frame_time = MSPL / num_frames
Add up the total frame times (actual_duration), then loop through the frames adding or subtracting 1 millisecond until actual_duration matches ceil(MSPL), always prioritizing a longer actual duration over a shorter one
difference = MSPL - actual_duration
if not math.isclose(0, difference):
    # Add the difference and always prioritize longer duration compared to real duration value
    correction = int(math.ceil(difference))
    for i in range(0, abs(correction)):
        # Add/subtract corrections as necessary to get actual duration as close as possible to calculated duration
        frame_times[i % len(frame_times)] += math.copysign(1, correction)
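As a worked example of the arithmetic: at 150 BPM, BPS = 150 / 60 = 2.5, so SPB = 0.4 s. With 4 beats per loop, SPL = 1.6 s and MSPL = 1600 ms; a 30-frame gif then gets frame_time = 1600 / 30 ≈ 53.3 ms per frame, which the rounding and correction steps turn into a mix of 53 ms and 54 ms frames summing back to 1600 ms.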
Now, from this, the actual Milliseconds/Loop of the gif should always be equal to or greater than MSPL. However, when I save the .gif with the specified frame times, if the correction value is not 0 then the .gif always plays at a faster speed than expected. I have noticed that other online services providing the same "sync gif to music" functionality have this problem too; so it's not just me going crazy, I think.
Below is the actual code used to get frame times:
import math

def get_frame_times(tempo: float, beats_per_loop: int, num_frames: int):
    # Calculate the number of seconds per beat in order to get number of milliseconds per loop
    beats_per_sec = tempo / 60
    secs_per_beat = 1 / beats_per_sec
    duration = math.ceil(secs_per_beat * beats_per_loop * 1000)
    frame_times = []

    # Try to make frame times as even as possible by dividing duration by number of frames and rounding
    actual_duration = 0
    for _ in range(0, num_frames):
        # Rounding method: banker's rounding (round halves to the nearest even number)
        frame_time = round(duration / num_frames)
        frame_times.append(frame_time)
        actual_duration += frame_time

    # Add the difference and always prioritize longer duration compared to real duration value
    difference = duration - actual_duration
    if not math.isclose(0, difference):
        correction = int(math.ceil(difference))
        for i in range(0, abs(correction)):
            # Add/subtract corrections as necessary to get actual duration as close as possible to calculated duration
            frame_times[i % len(frame_times)] += math.copysign(1, correction)

    return frame_times
I'm saving the gif by using PIL (Pillow)'s Image module:
frame_times = get_frame_times(tempo, beats_per_loop, num_frames)
frames = []
for i in range(0, num_frames):
    # Frames are appended to the frames list here
    ...

# disposal=2 used since the frames may be transparent
frames[0].save(
    output_file,
    save_all=True,
    append_images=frames[1:],
    loop=0,
    duration=frame_times,
    disposal=2)
Is there anything that I am doing wrong here? I can't seem to find out why this isn't working and why the actual duration of the gif is much shorter than the specified frame times. It makes me feel slightly better that other sites/services that provide this functionality end up with the same results, but at the same time I feel like this should definitely be possible.
Solved! This turns out to be a limitation of the .gif format itself: frame delays are stored in hundredths of a second, so frame times are only accurate to multiples of 10 milliseconds. Upon inspecting the actual frame times of the modified image, they were being floored to the nearest multiple of 10, resulting in an overall faster playback speed than expected.
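A quick way to see this for yourself (a hypothetical check, assuming Pillow and a saved file test.gif) is to read back the per-frame delays:

from PIL import Image

img = Image.open('test.gif')
for frame in range(img.n_frames):
    img.seek(frame)
    # 'duration' is the delay decoded for this frame, in milliseconds;
    # it comes back as a multiple of 10 regardless of what was requested
    print(frame, img.info.get('duration'))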
To fix this, I modified the code to choose frame times in increments of 10 (again prioritizing a longer actual duration if necessary) and to disperse the frame time adjustments as evenly as possible throughout the list:
import math

def round_tens(x):
    # Round to the nearest multiple of 10. This helper wasn't shown in the
    # original post; rounding half away from zero here is an assumption.
    return int(round(x / 10.0)) * 10

def get_frame_times(tempo: float, beats_per_loop: int, num_frames: int):
    # Calculate the number of seconds per beat in order to get number of milliseconds per loop
    beats_per_sec = tempo / 60
    secs_per_beat = 1 / beats_per_sec
    duration = round_tens(secs_per_beat * beats_per_loop * 1000)
    frame_times = []

    # Try to make frame times as even as possible by dividing duration by number of frames
    actual_duration = 0
    for _ in range(0, num_frames):
        frame_time = round_tens(duration / num_frames)
        frame_times.append(frame_time)
        actual_duration += frame_time

    # Adjust frame times to match as closely as possible to the actual duration, rounded to multiple of 10
    # Keep track of which indexes we've added to already and attempt to split corrections as evenly as possible
    # throughout the frame times
    correction = duration - actual_duration
    adjust_val = int(math.copysign(10, correction))
    i = 0
    seen_i = {i}
    while actual_duration != duration:
        frame_times[i % num_frames] += adjust_val
        actual_duration += adjust_val
        if i not in seen_i:
            seen_i.add(i)
        elif len(seen_i) == num_frames:
            seen_i.clear()
            i = 0
        else:
            i += 1
        i += num_frames // abs(correction // 10)
    return frame_times
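For instance (made-up numbers), get_frame_times(150.0, 4, 30) targets a 1600 ms loop split across 30 frames: every frame starts at round_tens(1600 / 30) = 50 ms, and the correction loop then bumps ten of them to 60 ms, spread out across the list, so the durations sum to exactly 1600 and every value is a valid multiple of 10.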

Remove the system sound output when recording with microphone in Python

I'm recording everything the microphone picks up, and I would like to filter the input to remove the system's sound output, so that the user's voice can be understood clearly when, for example, music is playing (like Skype does).
I am looking for a Python module that allows this on Ubuntu 16.04, or at least something that records the system output.
Here's my script (I am using PyAudio):
import time
import wave
from array import array
from struct import pack
from sys import byteorder

import pyaudio

THRESHOLD = 1500
CHUNK_SIZE = 1024
FORMAT = pyaudio.paInt16
RATE = 44100
MAX_RECORDING_TIME = 7  # seconds
MAX_SILENCE_UNITS = 65

def is_silent(snd_data):
    "Returns 'True' if below the 'silent' threshold"
    return max(snd_data) < THRESHOLD

def normalize(snd_data):
    "Average the volume out"
    MAXIMUM = 16384
    times = float(MAXIMUM) / max(abs(i) for i in snd_data)
    r = array('h')
    for i in snd_data:
        r.append(int(i * times))
    return r

def record():
    """
    Record a word or words from the microphone and
    return the data as an array of signed shorts.
    Normalizes the audio.
    Recording stops after 7 seconds or a sequence of 65 silent recording units.
    """
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT, channels=1, rate=RATE,
                    input=True, output=False,
                    frames_per_buffer=CHUNK_SIZE)

    num_silent = 0
    snd_started = False
    r = array('h')
    begin_time = 0

    while 1:
        # little endian, signed short
        snd_data = array('h', stream.read(CHUNK_SIZE))
        if byteorder == 'big':
            snd_data.byteswap()

        if snd_started:
            r.extend(snd_data)

        silent = is_silent(snd_data)
        if silent and snd_started:  # we count the number of silent units
            num_silent += 1
        elif not silent and not snd_started:
            print("start recording !")
            snd_started = True
            begin_time = time.time()  # we save the current time
        if not silent:
            num_silent = 0

        now = int(time.time())
        if snd_started and (now - begin_time > MAX_RECORDING_TIME or num_silent > MAX_SILENCE_UNITS):
            break

    print("recording finished !")
    sample_width = p.get_sample_size(FORMAT)
    stream.stop_stream()
    stream.close()
    p.terminate()

    r = normalize(r)
    return sample_width, r

def record_to_file(path):
    "Records from the microphone and outputs the resulting data to 'path'"
    sample_width, data = record()
    data = pack('<' + ('h' * len(data)), *data)

    wf = wave.open(path, 'wb')
    wf.setnchannels(1)
    wf.setsampwidth(sample_width)
    wf.setframerate(RATE)
    wf.writeframes(data)
    wf.close()
Removing the system's sound output completely from a voice recording (usually made in less than ideal conditions, i.e. a desktop mic in a noisy environment) is very hard to do well. There are many filtering techniques; the simplest one is described first.
Removing the system output direct sound
To get the audio system output, you will need some sort of loopback device, probably using PulseAudio. This way you can open two input audio streams and receive your microphone data and the system output data at the same time (this will work with a blocking approach like the one you currently have, but I would be wary if you swap to callbacks).
The simplest approach, then, is to subtract each value in the system output audio block from the corresponding value in the block received by the microphone. Assuming there is no real latency issue, this removes all of the direct sound coming out of the device from the microphone recording.
Pseudocode:
output = microphone_audioBlock - systemOutput_audioBlock
You will need to think about a couple of things:
You will need to check whether headphones are in use, so you aren't subtracting sound that never reached the microphone
This will not cancel the indirect sound (i.e. the reverb / reflections generated in the room)
This method is simple, but as mentioned, it will not cancel the indirect sound. There are many methods for cancelling indirect sound, but they are all generally research concepts.
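A minimal sketch of the subtraction, assuming both streams are already open with identical format, rate, and chunk size, and that mic_stream and loopback_stream are hypothetical PyAudio input streams (the loopback one reading a PulseAudio monitor source):

import numpy as np

CHUNK_SIZE = 1024

def echo_subtracted_block(mic_stream, loopback_stream):
    # Read one block from each stream and interpret it as 16-bit samples
    mic = np.frombuffer(mic_stream.read(CHUNK_SIZE), dtype=np.int16)
    sys_out = np.frombuffer(loopback_stream.read(CHUNK_SIZE), dtype=np.int16)

    # Subtract the system output from the microphone signal. In practice
    # the two blocks will be misaligned by some latency, so a real
    # implementation would also need to estimate and compensate a delay.
    cleaned = mic.astype(np.int32) - sys_out.astype(np.int32)

    # Clip back into 16-bit range before converting
    return np.clip(cleaned, -32768, 32767).astype(np.int16).tobytes()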
Reducing background noise
Beyond that, you will probably want to reduce background noise; in DSP terms this is called noise suppression.
Since you will only have access to one microphone and won't have great control over its positioning (most likely), there is no straightforward way to implement this without some sort of DSP algorithm. Here are a couple of places where you can read up on noise suppression techniques:
Speech enhancement based on a priori signal to noise estimation
LMS Adaptive Filters for Noise Cancellation: A Review

Collecting large amounts of data efficiently

I have a program that creates a solar system, integrates until a close encounter between adjacent planets occurs (or until 10e+9 years), then writes two data points to a file. The try/except acts as a flag for when planets get too close. This process is repeated 16,000 times. This is all done by importing the module REBOUND, a software package that integrates the motion of particles under the influence of gravity.
for i in range(0, 16000):
    def P_dist(p1, p2):
        x = sim.particles[p1].x - sim.particles[p2].x
        y = sim.particles[p1].y - sim.particles[p2].y
        z = sim.particles[p1].z - sim.particles[p2].z
        dist = np.sqrt(x**2 + y**2 + z**2)
        return dist

    init_periods = [sim.particles[1].P, sim.particles[2].P, sim.particles[3].P,
                    sim.particles[4].P, sim.particles[5].P]
    try:
        sim.integrate(10e+9 * 2 * np.pi)
    except rebound.Encounter as error:
        print(error)
        print(sim.t)
        for j in range(len(init_periods) - 1):
            distance = P_dist(j, j + 1)
            print(j, ":", j + 1, '=', distance)
            # record the period ratio of the two planets that had the close
            # encounter and the inner orbital period between the two
            if distance <= .01:
                p_r = init_periods[j + 1] / init_periods[j]
                with open('good.txt', 'a') as data:  # opens a file, writing the x & y values for the graph
                    data.write(str(math.log10(sim.t / init_periods[j])))
                    data.write('\n')
                    data.write(str(p_r))
                    data.write('\n')
Whether or not there is a close encounter depends mostly on a random value I assign, and that random value also controls how long a simulation can run. For instance, with the random value at its maximum of 9.99, a close encounter happened at approximately 11e+8 years (approximately 14 hours of run time). The random values range from 2 to 10, and close encounters happen more often on the lower side. On every iteration with a close encounter, my code writes to the file, which I believe may be taking up a lot of simulation time. Since the majority of my simulation time is taken up by trying to locate close encounters, I'd like to save some time by collecting the data without appending to the file on every iteration.
Since I'm attempting to plot the data collected from this simulation, would creating two arrays and outputting the data into those be faster? Or is there a way to write to the file only once, when all 16,000 iterations are complete?
sim is a variable holding all of the information about the solar system.
This is not the full code; I left out the part where I create the solar system.
count = 0
data = open('good.txt', 'a+')
....
if distance <= .01:
    count += 1
    while count <= 4570:
        data.write(~~~~~~~)
....
data.close()
The problem isn't that you write every time you find a close encounter. It's that, for each encounter, you open the file, write one output record, and close the file. All the opening and appending is slow. Try this, instead: open the file once, and do only one write per record.
# Near the top of the program
data = open('good.txt', 'a')
...
# Write one output record: the period ratio of the two planets that had
# the close encounter, and the scaled encounter time
if distance <= .01:
    p_r = init_periods[j + 1] / init_periods[j]
    data.write(str(math.log10(sim.t / init_periods[j])) + '\n' +
               str(p_r) + '\n')
...
data.close()
This should work well, as writes will get buffered, and will often run in parallel with the next computation.
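To answer the other half of the question: yes, you can also collect the points in two lists and write everything once after the loop finishes. A minimal sketch (the x_vals/y_vals names are mine):

x_vals = []
y_vals = []
...
# inside the loop, instead of writing to the file:
x_vals.append(math.log10(sim.t / init_periods[j]))
y_vals.append(init_periods[j + 1] / init_periods[j])
...
# after all 16000 iterations:
with open('good.txt', 'w') as data:
    for x, y in zip(x_vals, y_vals):
        data.write(str(x) + '\n' + str(y) + '\n')

Since the lists stay in memory, this also gives you the data in a form that's convenient to plot directly. The speed difference versus a single buffered file is small, though; the expensive part was reopening the file on every encounter.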

Find the speed of download for a progressbar

I'm writing a script to download videos from a website. I've added a report hook to get download progress. So far, it shows the percentage and size of the downloaded data. I thought it'd be interesting to add download speed and an ETA.
The problem is, if I use a simple speed = chunk_size/time, the speeds shown are accurate enough but jump around like crazy. So I've used the history of the time taken to download individual chunks. Something like speed = chunk_size*n/sum(n_time_history).
Now it shows a stable download speed, but it is most certainly wrong, because its value comes out to a few bits/s while the downloaded file visibly grows at a much faster pace.
Can somebody tell me where I'm going wrong?
Here's my code.
import sys
import time

# unitsize() and format_time() are helper functions defined elsewhere in the script

def dlProgress(count, blockSize, totalSize):
    global init_count
    global time_history

    try:
        time_history.append(time.monotonic())
    except NameError:
        time_history = [time.monotonic()]

    try:
        init_count
    except NameError:
        init_count = count

    percent = count * blockSize * 100 / totalSize
    dl, dlu = unitsize(count * blockSize)  # returns size in kB, MB, GB, etc.
    tdl, tdlu = unitsize(totalSize)

    count -= init_count  # because continuation of partial downloads is supported
    if count > 0:
        n = 5  # length of time history to consider
        _count = n if count > n else count
        time_history = time_history[-_count:]
        time_diff = [i - j for i, j in zip(time_history[1:], time_history[:-1])]
        speed = blockSize * _count / sum(time_diff)
    else:
        speed = 0

    n = int(percent // 4)
    try:
        eta = format_time((totalSize - blockSize * (count + 1)) // speed)
    except:
        eta = '>1 day'
    speed, speedu = unitsize(speed, True)  # returns speed in B/s, kB/s, MB/s, etc.

    sys.stdout.write("\r%.1f%% |%s%s| %s%s/%s%s %s%s %s" %
                     (percent, "#" * n, " " * (25 - n), dl, dlu, tdl, tdlu, speed, speedu, eta))
    sys.stdout.flush()
Edit:
Corrected the logic. The download speed shown is now much better.
As I increase the length of the history used to calculate the speed, the stability increases, but sudden changes in speed (if the download stalls, etc.) aren't reflected.
How do I make it stable, yet sensitive to large changes?
I realize the question is now more math oriented, but it'd be great if somebody could help me out or point me in the right direction.
Also, please do tell me if there's a more efficient way to accomplish this.
_count = n if count > n else count
time_history = time_history[-_count:]
time_weights = list(range(1, len(time_history)))  # just simple linear weights
time_diff = [(i - j) * k for i, j, k in zip(time_history[1:], time_history[:-1], time_weights)]
speed = blockSize * sum(time_weights) / sum(time_diff)
To make it more stable, and to keep it from reacting when the download momentarily spikes up or down, you could add this as well:
_count = n if count > n else count
time_history = time_history[-_count:]
time_history.remove(min(time_history))
time_history.remove(max(time_history))
time_weights = list(range(1, len(time_history)))  # just simple linear weights
time_diff = [(i - j) * k for i, j, k in zip(time_history[1:], time_history[:-1], time_weights)]
speed = blockSize * sum(time_weights) / sum(time_diff)
This will remove the highest and lowest spike in time_history, which will make the displayed number more stable. If you want to be picky, you could generate the weights before the removal, and then filter the mapped values using time_diff.index(min(time_diff)).
Also, using a non-linear function (like sqrt()) for weight generation will give you better results. Oh, and as I said in the comments: adding statistical methods to filter the times should be marginally better, but I suspect it's not worth the overhead it would add.
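One further option, not from the answer above but a common alternative: an exponentially weighted moving average reacts to every chunk yet stays smooth, with the trade-off controlled by a single constant. A minimal sketch (alpha = 0.3 is a made-up starting value):

import time

class SpeedEstimator:
    '''Exponentially weighted moving average of download speed.
    Larger alpha reacts faster to real changes (stalls, speed-ups)
    but shows more jitter; smaller alpha is smoother but laggier.'''
    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.speed = None
        self.last_time = None

    def update(self, chunk_bytes, now=None):
        now = time.monotonic() if now is None else now
        if self.last_time is not None and now > self.last_time:
            instant = chunk_bytes / (now - self.last_time)
            if self.speed is None:
                self.speed = instant
            else:
                self.speed = (1 - self.alpha) * self.speed + self.alpha * instant
        self.last_time = now
        return self.speed or 0.0

Each call to update(blockSize) would go where dlProgress() currently appends to time_history.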
