Let's say I have a sound file (file1.wav) which is 1s long.
I can read it in via
from scipy.io import wavfile
samplerate, data = wavfile.read("file1.wav")
I can then Fourier-transform it via:
from scipy.fft import fft
yf = fft(data)
Now let's say I have another file, file2.wav, which also contains a sound but does not have the same duration as file1 (it might also have a different samplerate).
I would now like to create a sound from the spectrum yf that is as long as file2, and then add the two signals.
How can I compute a sound from file1 with the samplerate and duration of file2 so that I can add both?
It sounds like the essential question here is "how do I stretch/compress audio to another duration". This is a nontrivial task; there is no silver-bullet method that works well in all cases. See Audio time stretching and pitch scaling on Wikipedia. It matters what kind of audio you are operating on: speech, music, or something else.
A decent place to start is waveform-similarity-based synchronized overlap-add or WSOLA algorithm. One way to perform WSOLA is with the free SoX command line utility using its "tempo" effect:
Change the audio playback speed but not its pitch. This effect uses the WSOLA algorithm. The audio is chopped up into segments which are then shifted in the time domain and overlapped (cross-faded) at points where their waveforms are most similar as determined by measurement of ‘least squares’.
Example use:
sox infile.wav outfile.wav tempo -s 1.1
where the 1.1 means "speed up by 10%" and the -s configures for speech (other options are -m for music and -l for generic "linear" processing). There are further options besides these; check the documentation for more detail.
(Side note: A related problem is pitch shifting audio without changing the duration. SoX can do that too; see the "pitch" and "bend" effects.)
If you want to perform time stretching in Python, there is a pysox library that wraps SoX. Another possibility in Python is audiotsm, which implements WSOLA and a couple other time stretching methods.
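For example, a minimal sketch with audiotsm's WSOLA implementation (the filenames are placeholders; speed=1.1 mirrors the SoX example above):

from audiotsm import wsola
from audiotsm.io.wav import WavReader, WavWriter

with WavReader("infile.wav") as reader:
    with WavWriter("outfile.wav", reader.channels, reader.samplerate) as writer:
        # speed=1.1 plays 10% faster, matching the sox tempo example
        wsola(reader.channels, speed=1.1).run(reader, writer)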
Related
Let's say I have a few very long audio files (for example, radio recordings). I need to extract the 5 seconds after a particular sound (for example, an ad start sound) from each file. Each file may contain 3-5 such sounds, so I should end up with (3-5) × (number of source files) result files.
I found the librosa and scipy Python libraries, but I'm not sure if they can help. What should I start with?
You could start by calculating the correlation of the signal with your particular sound. I'm not sure if librosa offers this; I'd start with scipy.signal.correlate or scipy.signal.convolve.
Not sure what your background is. Start here if you need some theory.
Basically, the correlation will be high where the audio matches your particular signal or is very similar to it. After identifying these positions, you can select an area around them.
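A minimal sketch of that approach, assuming mono 16-bit files at the same samplerate (the filenames and the single-peak pick are placeholders; for 3-5 occurrences per file you would threshold the correlation instead of taking one argmax):

import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

rate, recording = wavfile.read("radio_recording.wav")
_, template = wavfile.read("ad_start_sound.wav")
recording = recording.astype(np.float64)
template = template.astype(np.float64)

# Correlation peaks mark where the template occurs in the recording
corr = correlate(recording, template, mode="valid")
peak = int(np.argmax(corr))  # best single match

# Extract the 5 seconds that follow the matched sound
start = peak + len(template)
clip = recording[start:start + 5 * rate]
wavfile.write("clip.wav", rate, clip.astype(np.int16))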
Pure tones in PsychoPy end with clicks. How can I remove these clicks?
Tones generated within PsychoPy and tones imported as .wav files both have the same problem. I tried adding 0.025 ms of fade-out to the .wav tones that I generated using Audacity, but they still end with a click when played in PsychoPy.
I am not sure how to go ahead with this. I need to perform a psychoacoustic experiment, and it cannot proceed with tone presentation like that.
Crackling sounds or clicks are, to my knowledge, often associated with buffering errors. Many years back, I experienced similar problems on Linux systems when an incorrect bitrate was set. So there could be at least two possible culprits at work here: the bitrate, and the buffer size.
You already applied both an onset and offset ramp to allow the membranes to swing in/out, so this should not be the issue. (By the way, I think you meant 0.025 seconds instead of ms? Otherwise, the ramps would be too short!)
PyGame initializes the sound system with the following settings:
initPygame(rate=22050, bits=16, stereo=True, buffer=1024)
Whereas Pyo initializes it the following way:
initPyo(rate=44100, stereo=True, buffer=128)
The documentation of psychopy.sound states:
For control of bitrate and buffer size you can call psychopy.sound.init before creating your first Sound object:
from psychopy import sound
sound.init(rate=44100, stereo=True, buffer=128)
s1 = sound.Sound('ding.wav')
So, I would suggest you:
Try out both sound backends, Pyo and PyGame -- you can change which one to use in the PsychoPy preferences under General / audio library. Change the field to ['pyo'] to use Pyo only, or to ['pygame'] to use only PyGame.
Experiment with different settings for bitrate and buffer size with both backends (Pyo, PyGame).
If you want to get started with serious psychoacoustics, however, I would suggest not using either of the proposed solutions. Instead, get a piece of professional sound hardware or a data-acquisition board with analog outputs, such as the devices produced by National Instruments or their competitors; these deliver undistorted sound with sub-millisecond precision. The NI boards can be controlled from Python via PyLibNIDAQmx.
Clicks at the beginning and end of sounds often occur because the sound is stopped midway, so that the wave jumps abruptly from some value to zero. Such a step can only be represented by high-amplitude, high-frequency components superimposed on the signal, i.e. a click. So the solution is to make the wave stop at zero.
Are you using an old version of psychopy? If yes, then upgrade. Newer versions add a Hamming window (fade in/out) to self-generated tones which should avoid the click.
For the .wav files, try adding (extra) silence at the end, e.g. 50 ms. It might be that psychopy stops the sound prematurely.
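A minimal sketch of both fixes with scipy and numpy, assuming a mono tone.wav (the 25 ms ramp and 50 ms pad follow the numbers discussed above):

import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read("tone.wav")
x = data.astype(np.float64)

# 25 ms raised-cosine offset ramp so the wave ends at zero
n = int(0.025 * rate)
x[-n:] *= 0.5 * (1 + np.cos(np.linspace(0, np.pi, n)))

# Append 50 ms of silence in case playback stops prematurely
pad = np.zeros(int(0.05 * rate))
wavfile.write("tone_fixed.wav", rate, np.concatenate([x, pad]).astype(data.dtype))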
I have one source .wav file in which I have recorded 5 tones separated by silence.
I have to make 5 different .wav files containing only these tones (without any silence).
I am using scipy.
I was trying to do something similar to this post: constructing a wav file and writing it to disk using scipy,
but it does not work for me.
Can you please advise me how to do it?
Thanks in advance
You need to perform silence detection to identify where each tone starts and ends. You can use a measure of magnitude, such as the root mean square (RMS).
Here's how to compute an RMS using scipy.
Depending on your noise levels, this may be hard to do programmatically. You may want to consider simply using an audio editor, such as Audacity.
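A minimal sketch of that approach, assuming a mono file named tones.wav; the 10 ms frame length and the threshold of 10% of peak RMS are guesses you will likely need to tune:

import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read("tones.wav")
x = data.astype(np.float64)

# Windowed RMS: one value per 10 ms frame
frame = int(0.010 * rate)
n = len(x) // frame
rms = np.sqrt(np.mean(x[:n * frame].reshape(n, frame) ** 2, axis=1))

# Frames above the threshold count as tone, the rest as silence
loud = rms > 0.1 * rms.max()

# Turn runs of consecutive loud frames into (start, end) frame pairs
edges = np.diff(loud.astype(int))
starts = np.where(edges == 1)[0] + 1
ends = np.where(edges == -1)[0] + 1
if loud[0]:
    starts = np.insert(starts, 0, 0)
if loud[-1]:
    ends = np.append(ends, n)

# Write each detected tone to its own file
for i, (s, e) in enumerate(zip(starts, ends), start=1):
    wavfile.write("tone_%d.wav" % i, rate, data[s * frame:e * frame])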
I am trying to extract the prevailing bitrate of a video file (e.g. an .mkv file containing a movie) at a regular sampling interval of 1-10 seconds under conditions of normal playback. Kind of like what you see in VLC's statistics window during playback.
Can anyone suggest the best way to bootstrap the coding of such an analyser? Is there a library that provides an API to this kind of information? Perhaps a Python wrapper for ffmpeg, or an equivalent tool that processes video files and can extract such statistics.
What I am really aiming for is a CSV format file containing the seconds offset and the average or actual bitrate in KiB/s at that offset into the asset.
Update:
I built pyffmpeg and wrote the following spike:
import pyffmpeg

reader = pyffmpeg.FFMpegReader(False)
reader.open("/home/mark/Videos/BBB.m2ts", pyffmpeg.TS_VIDEO)
tracks = reader.get_tracks()

# Called for each frame
def obs(f):
    pass

tracks[0].set_observer(obs)
reader.run()
But observing the frame information (f) in the callback does not appear to give me any hooks for calculating per-second bitrates. In fact, bitrate calculations within pyffmpeg are measured across the entire file (filesize / duration), so the library's treatment is very superficial. Clearly its focus is on extracting I-frames and other frame/GOP-specific work.
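As a sketch of the ffmpeg route mentioned in the question: ffprobe (shipped with ffmpeg) can dump per-packet timestamps and sizes, which can then be bucketed into one-second bins to produce the CSV described above. A rough sketch, with movie.mkv as a placeholder filename:

import csv
import subprocess
from collections import defaultdict

# Dump "pts_time,size" for every video packet as CSV lines
out = subprocess.run(
    ["ffprobe", "-v", "error", "-select_streams", "v:0",
     "-show_entries", "packet=pts_time,size", "-of", "csv=p=0",
     "movie.mkv"],
    capture_output=True, text=True, check=True,
).stdout

buckets = defaultdict(int)  # second offset -> bytes seen in that second
for line in out.splitlines():
    pts_time, size = line.split(",")[:2]
    if pts_time == "N/A":
        continue
    buckets[int(float(pts_time))] += int(size)

with open("bitrate.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["seconds_offset", "KiB_per_s"])
    for sec in sorted(buckets):
        writer.writerow([sec, round(buckets[sec] / 1024.0, 1)])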
Something like these:
http://code.google.com/p/pyffmpeg/
http://pymedia.org/
You should be able to do this with gstreamer. http://pygstdocs.berlios.de/pygst-tutorial/seeking.html has an example of a simple media player. It calls
pos_int = self.player.query_position(gst.FORMAT_TIME, None)[0]
periodically. All you have to do is call query_position() a second time with gst.FORMAT_BYTES, do some simple math, and voila! Bitrate vs. time.
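A minimal sketch of that math, using the old pygst (GStreamer 0.10) API from the tutorial; player is the playbin element from that example, and you would call this once per sampling interval, feeding back the previous readings:

import gst  # pygst, as in the linked tutorial

def sample_bitrate(player, prev_ns, prev_bytes):
    pos_ns = player.query_position(gst.FORMAT_TIME, None)[0]
    pos_bytes = player.query_position(gst.FORMAT_BYTES, None)[0]
    d_secs = (pos_ns - prev_ns) / 1e9
    kib_per_s = (pos_bytes - prev_bytes) / 1024.0 / d_secs if d_secs > 0 else 0.0
    return kib_per_s, pos_ns, pos_bytes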
I'm working on a project where I need to know the amplitude of sound coming in from a microphone on a computer.
I'm currently using Python with the Snack Sound Toolkit, and I can record audio coming in from the microphone, but I need to know how loud that audio is. I could save the recording to a file and use another toolkit to read the amplitude at given points in time from the audio file, or try to get the amplitude while the audio is coming in (which could be more error-prone).
Are there any libraries or sample code that can help me out with this? I've been looking and so far the Snack Sound Toolkit seems to be my best hope, yet there doesn't seem to be a way to get direct access to amplitude.
Looking at the Snack Sound Toolkit examples, there seems to be a dBPowerSpectrum function.
From the reference:
dBPowerSpectrum ( )
Computes the log FFT power spectrum of the sound (at the sample number given in the start option) and returns a list of dB values. See the section item for a description of the rest of the options. Optionally an ending point can be given, using the end option. In this case the result is the average of consecutive FFTs in the specified range. Their default spacing is taken from the fftlength but this can be changed using the skip option, which tells how many points to move the FFT window each step. Options:
EDIT: I am assuming that by amplitude you mean how "loud" the sound appears to a human, and not the time-domain voltage, which would average out to roughly 0 over the whole recording since the integral of a sine wave is 0; e.g. 10 * sin(t) is louder than 5 * sin(t), but both have an average value of 0 over time. (You do not want to send non-AC voltages to a speaker anyway.)
To get how loud the sound is, you will need to determine the amplitude of each frequency component. This is done with a Fourier transform (FFT), which breaks the sound down into its frequency components. The dBPowerSpectrum function seems to give you a list of the magnitudes (forgive me if this differs from the exact definition of a power spectrum) of each frequency. To get the total volume, you can just sum the entire list (which will be close, except it still might differ from perceived loudness, since the human ear has a frequency response of its own).
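A rough numpy sketch of what this answer describes, summing the FFT magnitudes of a block of samples (it ignores the ear's frequency response, as noted):

import numpy as np

def crude_volume(samples):
    # Sum of FFT magnitudes as a crude overall-volume figure
    spectrum = np.abs(np.fft.rfft(samples.astype(np.float64)))
    return spectrum.sum()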
I disagree completely with this "answer" from CookieOfFortune.
Granted, the question is poorly phrased... but this answer is making things much more complex than necessary. I am assuming that by 'amplitude' you mean perceived loudness; technically, each sample in the (PCM) audio stream represents the amplitude of the signal at a given time slice. To get a loudness representation, try a simple RMS calculation:
RMS = sqrt((1/N) * sum(x[n]^2)), taken over the N samples x[n] in the block.
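In Python, a minimal sketch over a block of PCM samples held in a numpy array:

import numpy as np

def rms(samples):
    x = samples.astype(np.float64)
    return np.sqrt(np.mean(x ** 2))

# e.g. for 16-bit samples, a level relative to full scale in dB:
# level_db = 20 * np.log10(rms(block) / 32768.0)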
I'm not sure if this will help, but skimpygimpy provides facilities for parsing WAVE files into Python sequences and back. You could potentially use this to examine the waveform samples directly and do what you like. You will have to read some source; these subcomponents are not documented.