Making a synthesized sound at arbitrary pitch in Python

I'm finding it strangely hard to find a synthesizer module for Python that lets a program play a note at an arbitrary pitch. Preferably the note should be more than a pure sine wave and include at least a few harmonics; it should be more than just a beep.
The idea is to be able to write something like:
my_synth = the_module.newsynth()
my_synth.play(frequency, loudness, duration)
where frequency is specified in Hz, and have a synthesized tone play from the user's speakers. There are JavaScript libraries that do this, such as Tone.js, but does anyone know of an approach using Python?
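A minimal sketch of the "more than a beep" part, assuming NumPy is available. It sums a fundamental and a few harmonics with decaying weights, adds a short fade in and out to avoid clicks, and can write a 16-bit mono WAV with the standard-library wave module. The names synth_note and write_wav and the harmonic weights are illustrative choices, not an existing module's API. Actually playing the samples through the speakers still needs a playback library (for example sounddevice's sd.play(samples, rate)) or any audio player.

```python
import wave

import numpy as np


def synth_note(frequency, loudness=0.5, duration=1.0, rate=44100,
               harmonics=(1.0, 0.5, 0.25, 0.125)):
    """Return float samples in [-1, 1] for a note with a few harmonics."""
    t = np.arange(int(rate * duration)) / rate
    tone = np.zeros_like(t)
    for n, weight in enumerate(harmonics, start=1):
        # skip harmonics above the Nyquist frequency to avoid aliasing
        if n * frequency < rate / 2:
            tone += weight * np.sin(2 * np.pi * n * frequency * t)
    peak = np.max(np.abs(tone))
    if peak > 0:
        tone /= peak  # normalise the peak to 1 before applying loudness
    # 10 ms linear fade in and out so the note does not click
    fade = min(int(0.01 * rate), len(t) // 2)
    envelope = np.ones_like(t)
    envelope[:fade] = np.linspace(0, 1, fade)
    envelope[len(t) - fade:] = np.linspace(1, 0, fade)
    return loudness * tone * envelope


def write_wav(path, samples, rate=44100):
    """Write float samples in [-1, 1] as a 16-bit mono WAV file."""
    pcm = (np.clip(samples, -1, 1) * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(rate)
        f.writeframes(pcm.tobytes())


if __name__ == "__main__":
    # half a second of A4 at 30% loudness, saved so any player can open it
    write_wav("a440.wav", synth_note(440.0, loudness=0.3, duration=0.5))
```

Changing the harmonics tuple changes the timbre; (1.0,) gives back a plain sine.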

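If you are on Windows, you can use the built-in winsound.Beep(frequency, duration), where duration is in milliseconds.
On Linux, you can write raw 8-bit samples directly to /dev/audio (an OSS-style device, so it may not exist on current systems without OSS emulation):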
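def beep(frequency, amplitude, duration):
    sample = 8000  # assumes 8000 samples per second, the classic /dev/audio default
    half_period = int(sample / frequency / 2)
    # one period: half_period bytes at `amplitude`, then half_period bytes at 0
    # (amplitude must be an int in 0..255, since each sample is one byte)
    beep = bytes([amplitude]) * half_period + bytes([0]) * half_period
    beep *= int(duration * frequency)  # repeat the period for `duration` seconds
    with open('/dev/audio', 'wb') as audio:
        audio.write(beep)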


Inaccurate real-time audio FFT interpretation with Python

I'm trying to use Python to create a live music visualization. The libraries I'm using are SoundCard (for live audio capture) and Librosa (for the short-time Fourier transform).
However, I suspect I'm not interpreting the audio data correctly. Looking at the 100-200 Hz bucket, I get a constant stream of sound even when the song doesn't contain much bass (or really, any at all). I admit I'm a bit in over my head with all the audio processing and FFT stuff, since it's not really my expertise and the math beats me most of the time.
This is the function that captures and analyses the audio. lb is set to the speakers and works properly. Fs is set to 48000, and I record 1000 frames in an attempt to keep 48 FPS (48000 / 1000). fftwindowsize is set to 2048*8 because... I'm not sure. I increased the number until Librosa stopped throwing warnings.
def audioanalysis():
    with lb.recorder(samplerate=Fs) as mic:
        rawdata = mic.record(numframes=1000)
    datalen: int = int(rawdata.size/2)
    monodata = numpy.empty(datalen)
    for x in range(0, datalen):
        monodata[x] = max(rawdata[x][0], rawdata[x][1])
    data = numpy.abs(librosa.stft(monodata, n_fft=fftwindowsize, hop_length=1024))
    return librosa.amplitude_to_db(data, ref=numpy.max)
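For comparison, a minimal NumPy-only sketch of analysing one captured block, assuming rawdata is a (frames, 2) float array as SoundCard's record() returns. It averages the two channels instead of taking max (max is non-linear, so it can introduce components that are in neither channel), applies a Hann window, and reports magnitudes in dB relative to full scale. That differs from ref=numpy.max, which makes the loudest bin of every block 0 dB, so quiet blocks and loud blocks end up on the same relative scale. The function name block_spectrum_db is hypothetical.

```python
import numpy as np


def block_spectrum_db(rawdata, samplerate=48000):
    """Return (freqs, magnitude_db) for one (frames, channels) block."""
    mono = rawdata.mean(axis=1)  # average the channels: a linear downmix
    window = np.hanning(len(mono))
    spectrum = np.fft.rfft(mono * window)
    # scale so a full-scale sine centred on a bin reads about 0 dB
    magnitude = np.abs(spectrum) * 2 / window.sum()
    freqs = np.fft.rfftfreq(len(mono), d=1 / samplerate)
    # floor avoids log10(0) for digital silence
    return freqs, 20 * np.log10(np.maximum(magnitude, 1e-12))
```

With 1000 samples at 48 kHz the bins are 48 Hz apart, so the 100-200 Hz range covers only about two bins per block.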
And the code for making buckets:
frequencies = librosa.core.fft_frequencies(n_fft=fftwindowsize)
freq_index_ratio = len(frequencies)/frequencies[len(frequencies)-1] / 2
def amp(spectrogram, freq) -> float:
    return spectrogram[int(freq*freq_index_ratio)]
for i in range(0, buckets):
    avg = 0
    for j in range(i * bucketsize, (i+1)*bucketsize):
        avg += amp(spectrogram=spectrogram, freq=j)
    amps[i] = avg/bucketsize
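The mapping from Hz to bin index depends on the sample rate as well as the FFT size: bin k sits at k * sr / n_fft Hz. librosa.fft_frequencies uses sr=22050 unless you pass sr explicitly, which is worth checking against Fs = 48000 here. Also, the frequency resolution is limited by the 1000 real samples (about 48 Hz at 48 kHz); making n_fft larger than the block only pads it and adds no new information. The sketch below, using numpy.fft.rfftfreq, shows both mappings; bin_index and band_mean are hypothetical helper names.

```python
import numpy as np


def bin_index(freq, samplerate, n_fft):
    """Index of the FFT bin whose centre is closest to freq (in Hz)."""
    return int(round(freq * n_fft / samplerate))


def band_mean(magnitude, samplerate, n_fft, low, high):
    """Average the magnitude bins whose centre frequency is in [low, high).

    magnitude: 1-D array of length n_fft // 2 + 1 (one rfft frame).
    """
    freqs = np.fft.rfftfreq(n_fft, d=1 / samplerate)
    mask = (freqs >= low) & (freqs < high)
    if not mask.any():
        raise ValueError("no FFT bin falls inside the requested band")
    return magnitude[mask].mean()
```

Averaging linear magnitudes and converting to dB afterwards is usually easier to reason about than averaging values that are already in dB.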
Over the course of a song, amps[1] (i.e. 100-200 Hz) stays between -50 dB and -30 dB, which isn't really useful or representative of the song that's playing.
Is my FFT analysis wrong? Is there no way to better interpret short samples of sound?
P.S. I know my Python code isn't excellent. This is my first project in Python :)

Playing audio in python at given timestamp

I am trying to find a way in python to play a section of an audio file given a start and end time.
For example, say I have an audio file that is 1 min in duration. I want to play the section from 0:30 to 0:45 seconds.
I do not want to process or splice the file, only play back the given section.
Any suggestions would be greatly appreciated!
I found a great solution using pydub:
from pydub import AudioSegment
from pydub.playback import play
audiofile = "audio.wav"  # path to audio file
start_ms = 30 * 1000  # start of clip in milliseconds
end_ms = 45 * 1000  # end of clip in milliseconds
sound = AudioSegment.from_file(audiofile, format="wav")
splice = sound[start_ms:end_ms]  # pydub slices by milliseconds
play(splice)
Step one is to get Python to play the entire audio file; several libraries are available for this. Then see whether the library has a time-specific API call. You can always roll up your sleeves and implement this yourself: read the audio file into a buffer, or stream the file and stop streaming at the end of the chosen time section.
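A minimal standard-library sketch of the "implement this yourself" route, assuming an uncompressed WAV file: it seeks to the first frame of the section with wave's setpos() and reads only the frames up to the end time. The function name read_section is hypothetical; the returned raw bytes still have to be handed to some playback library that accepts raw PCM.

```python
import wave


def read_section(path, start_s, end_s):
    """Return (params, raw_frames) for the part of a WAV file between
    start_s and end_s seconds, without reading the rest of the file."""
    with wave.open(path, "rb") as f:
        rate = f.getframerate()
        first = int(start_s * rate)
        last = min(int(end_s * rate), f.getnframes())
        f.setpos(first)  # seek to the first frame of the section
        frames = f.readframes(max(last - first, 0))
        return f.getparams(), frames
```

The params tuple carries nchannels, sampwidth and framerate, which a playback library needs to interpret the bytes.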
Another alternative is to leverage command-line tools like ffmpeg, the Swiss Army knife of audio processing. ffmpeg has command-line options for a time-specific start and stop; also look at its sibling ffplay.
Similar to ffplay/ffmpeg is another command-line audio tool called sox.
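A small sketch of driving ffplay from Python, assuming ffplay is installed and on the PATH: -ss seeks to the start, -t limits the playback duration, -nodisp suppresses the video window and -autoexit quits at the end. The helper name ffplay_command is hypothetical. With sox, the equivalent would be along the lines of play audio.wav trim 30 15 (start, then length).

```python
import shlex
import subprocess


def ffplay_command(path, start_s, end_s):
    """Build an ffplay command that plays path from start_s to end_s seconds."""
    return [
        "ffplay",
        "-nodisp",                    # audio only, no video window
        "-autoexit",                  # quit when playback reaches the end
        "-ss", str(start_s),          # seek to the start position
        "-t", str(end_s - start_s),   # play only this many seconds
        path,
    ]


if __name__ == "__main__":
    cmd = ffplay_command("audio.wav", 30, 45)
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell-quoting problems with file names that contain spaces.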
Use PyMedia and its Player. Look at the functions SeekTo() and SeekEndTime(). I think you will be able to find the right solution after experimenting with these functions.
I always have trouble installing external libraries and if you are running your code on a server and you don't have sudo privileges then it becomes even more cumbersome. Don't even get me started on ffmpeg installation.
So, here's an alternative solution with scipy and native IPython that avoids the hassle of installing some other library.
from scipy.io import wavfile  # to read and write audio files
import IPython.display  # to play them in Jupyter notebook without the hassle of some other library
def PlayAudioSegment(filepath, start, end, channel='none'):
    # get sample rate and audio data
    sample_rate, audio_data = wavfile.read(filepath)  # where filepath = 'directory/audio.wav'
    # get length in minutes of audio file
    print('duration: ', audio_data.shape[0] / sample_rate / 60, 'min')
    ## splice the audio with preferred start and end times (works for mono and stereo)
    spliced_audio = audio_data[int(start * sample_rate):int(end * sample_rate)]
    ## choose left or right channel if preferred (0 or 1 for left and right, respectively; or leave as a string to keep as stereo)
    spliced_audio = spliced_audio[:, channel] if isinstance(channel, int) else spliced_audio
    ## playback natively with IPython; shape needs to be (nChannel, nSamples)
    return IPython.display.Audio(spliced_audio.T, rate=sample_rate)
Use like this:
filepath = 'directory_with_file/audio.wav'
start = 30  # in seconds
end = 45  # in seconds
channel = 0  # left channel
PlayAudioSegment(filepath, start, end, channel)

Increasing the playback speed of combined wav file in python?

I was trying to combine multiple WAV files in Python using pydub, but the output song's playback speed was kinda slower than I wanted. So I looked up an existing question on this and tried the same approach.
import os, glob
import random
from pydub import AudioSegment
FRAMERATE = 44100  # frame rate (samples per second) of the input WAV files
OUTPUT_FILE = 'MySong/random.wav'
audio_data = [AudioSegment.from_wav(wavfile)
              for wavfile in glob.glob(os.path.join('wav_files/', '*.wav'))]
my_music = sum([random.choice(audio_data) for i in range(100)])
my_music = my_music.set_frame_rate(FRAMERATE * 4)
my_music.export(OUTPUT_FILE, format='wav')
But this isn't working. Is there any technical reason I'm unaware of, or is there any better way of doing it?
To increase the pace without changing the pitch, you’ll need to do something a little fancier than changing the frame rate (which will give you a “chipmunk” effect).
If you’re dealing with spoken word, you can try stripping out silence with the (unfortunately undocumented) functions in pydub.silence.
You can also look at AudioSegment.speedup(), which is a naive attempt at resampling. You can also make a copy of that function and try to improve it (and contribute back to pydub?)
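To see the "chipmunk" effect in isolation, here is a small standard-library sketch that copies a WAV file's frames unchanged but declares a higher frame rate in the header, so the file plays faster and higher in pitch by the same factor. As far as I know, pydub's set_frame_rate() resamples the audio instead (duration and pitch stay the same), which would explain why the code in the question only produced a file with a higher sample rate. The helper name relabel_frame_rate is hypothetical, and this is deliberately not a pitch-preserving time stretch.

```python
import wave


def relabel_frame_rate(src, dst, factor):
    """Copy src to dst, multiplying the declared frame rate by factor.

    The samples are untouched, so playback is `factor` times faster and
    `factor` times higher in pitch (the "chipmunk" effect).
    """
    with wave.open(src, "rb") as fin:
        params = fin.getparams()
        frames = fin.readframes(params.nframes)
    with wave.open(dst, "wb") as fout:
        fout.setnchannels(params.nchannels)
        fout.setsampwidth(params.sampwidth)
        fout.setframerate(int(params.framerate * factor))
        fout.writeframes(frames)
```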

How to control a sound card programmatically?

I'm playing with pyaudio on a Mac using a Saffire Pro 40 sound card.
Currently I have two inputs plugged in, and I'd like to control the level of the second input channel programmatically. (This works fine using the sound card's mix control software.)
I've been going through the pyaudio docs, but haven't found anything obvious on this issue so far. What's the simplest way to do what the mix control software does (control volume per channel) programmatically? (A Python API would be nice, but not essential.)
To simplify: it looks like it's possible to manually read the streams from the channels I want to control, scale them using numpy, then write them as output, but I'm hoping there is a method to simply send a normalized value per channel to control it.
So instead of something like this:
import numpy
import pyaudio
FORMAT = pyaudio.paInt16  # assumed: matches the int16 decoding below
CHANNELS = 1  # assumed values; set these for your device
RATE = 44100
CHUNK = 1024
p = pyaudio.PyAudio()
stream1 = p.open(format=FORMAT,
                 channels=CHANNELS,
                 rate=RATE,
                 input=True,
                 output=True,
                 input_device_index=0,
                 frames_per_buffer=CHUNK)
stream2 = p.open(format=FORMAT,
                 channels=CHANNELS,
                 rate=RATE,
                 input=True,
                 input_device_index=1,
                 frames_per_buffer=CHUNK)
processingAudio = True
while processingAudio:
    # manually fetch each channel
    data1In = stream1.read(CHUNK)
    data2In = stream2.read(CHUNK)
    # convert to numpy to easily scale the arrays
    decodeddata1 = numpy.frombuffer(data1In, numpy.int16)
    decodeddata2 = numpy.frombuffer(data2In, numpy.int16)
    newdata = (decodeddata1 * 0.5 + decodeddata2 * 0.1).astype(numpy.int16)
    # finally write the processed data
    stream1.write(newdata.tobytes())
This is a bit misleading, though: I would actually need to mix separate channels from the same input device index. What I'm really hoping for is a way to simply set a normalized level per channel, as described above.
Having a look at the Channel Maps example feels closer to what I'm after. At the moment I find the host_api_specific part of the API a bit confusing, and I was hoping someone already has experience using it successfully.
I am using OSX 10.10
I don't really have any experience with OS X, so I can't say for sure, but normally you can remote-control almost everything with AppleScript.
There are existing questions about controlling the volume that way.
They don't say how to control the volume of a single channel separately, though.
That is probably worth asking about in its own question.
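As an illustration of the AppleScript route, a sketch that builds an osascript call using AppleScript's "set volume output volume" command (0 to 100). Note that this only sets the overall macOS output volume, not the level of a single input channel on the Saffire, and it assumes osascript is available, as it is on standard macOS installs. The helper name set_output_volume_command is hypothetical.

```python
import subprocess


def set_output_volume_command(percent):
    """Build an osascript call that sets the overall macOS output volume."""
    percent = max(0, min(100, int(percent)))  # AppleScript expects 0..100
    return ["osascript", "-e", f"set volume output volume {percent}"]


if __name__ == "__main__":
    subprocess.run(set_output_volume_command(50), check=True)
```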
Regarding the inferior work-around, you can use python-sounddevice to create a little (untested) Python script:
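import sounddevice as sd
def callback(indata, outdata, frames, time, status):
    # pass the first channel through unchanged and halve the second
    outdata[:] = indata * [1, 0.5]
with sd.Stream(channels=2, callback=callback):
    input()  # keep the stream open until <Return> is pressed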
This script will run until you press <Return> and it will reduce the volume of the second channel.
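For the manual numpy route from the question (read, scale, write), a sketch of a helper that applies one gain per channel to an interleaved 16-bit buffer, such as the bytes PyAudio's stream.read() returns for a paInt16 stream, and clips instead of wrapping around on overflow. apply_channel_gains is a hypothetical name; it assumes paInt16 samples and that len(gains) equals the stream's channel count.

```python
import numpy as np


def apply_channel_gains(raw, gains):
    """Scale each channel of an interleaved int16 byte buffer by its gain.

    raw: bytes with interleaved int16 samples (one sample per channel per frame)
    gains: one float per channel, e.g. [1.0, 0.5]
    """
    gains = np.asarray(gains, dtype=np.float64)
    samples = np.frombuffer(raw, dtype=np.int16).reshape(-1, len(gains))
    scaled = samples * gains  # broadcasting applies each gain to its column
    # clip to the int16 range so loud input saturates instead of wrapping
    return np.clip(scaled, -32768, 32767).astype(np.int16).tobytes()
```

The result can be passed straight to stream.write() on an output stream with the same format and channel count.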

Producing histogram of sound for m3u of radio station

I want to sample a radio station that broadcasts in *.m3u8 format and produce a histogram of the first n seconds (where the user chooses n).
I tried radiopy, but it doesn't work, and gnuradio seems useless. How can I produce and show this histogram?
EDIT: Now I use GStreamer 1.0, so I can play the stream directly, but now I need to sample my broadcast live. How can I do that using Gst?
"gnuradio seems useless"
Well, I'd argue that GNU Radio is exactly what you're looking for if you want a live spectrogram.
It's just a matter of connecting a properly configured audio source to a Qt GUI sink (I wrote an answer about configuring that, and a GNU Radio wiki page as well).
Point is: you shouldn't be trying to play an internet radio station by yourself. Let software that knows what it is doing handle that.
In your case, I'd recommend:
Use VLC or mplayer to decode the radio stream to 32-bit float PCM at a fixed sampling rate and write it to a file.
Then use Python with NumPy to open that file (samples = numpy.fromfile(filename, dtype=numpy.float32)), and matplotlib/pyplot to plot a spectrogram to a file, i.e. something like this (untested, because written right here):
import os
import subprocess
import sys
import tempfile
import numpy
from matplotlib import pyplot
stream = sys.argv[1]  ## you can pass the stream URL as argument
outfile = sys.argv[2]  ## second argument: output file; ending determines type!
num_of_seconds = min(int(sys.argv[3]), 60)  # not more than 1min of streaming
(fd, inter_fname) = tempfile.mkstemp()
os.close(fd)  # mplayer writes to the path itself; we only need the name
# pan = number of output channels, then mix coefficients (1:0.5:0.5 downmixes stereo to mono)
# resample = sampling rate. this must be the same for all files, so that we can actually compare spectrograms
# format = sample format, here: native floats
subprocess.run(["mplayer", "-endpos", str(num_of_seconds), "-vo", "null",
                "-af", "pan=1:0.5:0.5,resample=44100,format=floatne",
                "-ao", "pcm:nowaveheader:file=" + inter_fname, stream], check=True)
samples = numpy.fromfile(inter_fname, dtype=numpy.float32)
pyplot.figure(figsize=(16, 6))  # size in inches; pick whatever is readable
### Attention: this call to specgram expects of you to understand what the Discrete Fourier Transform does.
### This uses a Hanning window by default; whether that is appropriate for audio data is questionable. Use all your DSP skillz!
### pyplot.specgram has a lot of options, including colormaps, frequency scaling, overlap. Make yourself acquainted with those!
pyplot.specgram(samples, NFFT=256, Fs=44100)
pyplot.savefig(outfile, bbox_inches="tight")
os.remove(inter_fname)
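Since the question asked for a histogram, here is a NumPy-only sketch, assuming the intermediate file holds raw native-endian float32 mono samples at 44100 Hz, as the mplayer call above is meant to write. numpy.histogram gives the distribution of sample amplitudes, and averaging windowed rfft frames gives a mean spectrum (effectively a histogram over frequency). The function name summarize and the bin counts are illustrative choices.

```python
import sys

import numpy as np


def summarize(samples, rate=44100, n_fft=2048, amp_bins=50):
    """Return (amplitude histogram, bin edges, freqs, mean spectrum)."""
    counts, edges = np.histogram(samples, bins=amp_bins, range=(-1.0, 1.0))
    n_frames = len(samples) // n_fft
    if n_frames == 0:
        raise ValueError("need at least n_fft samples")
    window = np.hanning(n_fft)
    # non-overlapping frames, each multiplied by the Hann window
    frames = samples[: n_frames * n_fft].reshape(n_frames, n_fft) * window
    spectrum = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    freqs = np.fft.rfftfreq(n_fft, d=1 / rate)
    return counts, edges, freqs, spectrum


if __name__ == "__main__":
    data = np.fromfile(sys.argv[1], dtype=np.float32)
    counts, edges, freqs, spectrum = summarize(data)
    print("loudest frequency: %.1f Hz" % freqs[np.argmax(spectrum)])
```

The counts/edges pair can be drawn with pyplot.stairs or pyplot.bar, and freqs/spectrum with pyplot.plot.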
