How to transpose part of a sound with librosa - python

For example,
import librosa
import numpy as np

y, sr = librosa.load("sound.wav", sr=44100, mono=True)
half = y.shape[0] // 2  # y.shape is a tuple, so index it before halving
y1 = y[:half]
y2 = y[half:]
# shift the second half up 24 semitones (two octaves)
y_pit = librosa.effects.pitch_shift(y2, sr=sr, n_steps=24)
y = np.concatenate([y1, y_pit])
This code loads sound.wav, pitch-shifts only the latter half, and finally writes everything out as one sound file.
Now, what I want to do goes further.
I would like to pitch-shift only the content around a specific frequency, e.g. 440 Hz = A.
For example, if I have a sound (A C E) = an Am chord, I want to pitch-shift only the part around the A so the result is (G C E).
Where should I start? Is librosa.effects.pitch_shift useful for this purpose?

This is not possible with a pitch shifter. A pitch shifter simply changes the frequencies by speeding the sound up or slowing it down (like varispeed), then cutting out small slices if the resulting sound is longer or, on the contrary, duplicating small slices if it is shorter. As you can imagine, this process treats the whole wave as a single thing, meaning that the spectrum is transposed in its entirety.
Doing what you want requires a much more sophisticated technique called resynthesis, which first converts the wave into a synthetic sound using FFT analysis and additive synthesis (or other techniques more appropriate when the sound is noisy), then allows manipulation of independent parts of the spectrum, and finally converts the synthetic sound back into an audio wave. There is a standalone program that does this quite well, called Spear. You could also investigate Loris, which seems to have a Python module.
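If you want to experiment within librosa, a crude approximation of the "manipulate one part of the spectrum" idea is to split the signal with an STFT mask, shift only the isolated band, and mix it back. This is only a hedged sketch, not true resynthesis (no partial tracking as in Spear or Loris): it will miss the A's harmonics at 880 Hz, 1320 Hz, and so on, and it will catch any other note's energy that happens to fall inside the band.

import librosa
import numpy as np

y, sr = librosa.load("sound.wav", sr=44100, mono=True)
S = librosa.stft(y)
freqs = librosa.fft_frequencies(sr=sr)  # centre frequency of each STFT bin

band = (freqs > 410) & (freqs < 470)    # bins within roughly +/- 30 Hz of A4
S_band = np.where(band[:, None], S, 0)  # only the region around A
S_rest = np.where(band[:, None], 0, S)  # everything else

y_band = librosa.istft(S_band, length=len(y))
y_rest = librosa.istft(S_rest, length=len(y))

# shift the isolated band down a whole tone (A -> G) and remix
y_out = y_rest + librosa.effects.pitch_shift(y_band, sr=sr, n_steps=-2)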

Related

How do I find the amplitude of a wav file in Python?

I am working on WAV file analysis using the librosa library in Python. I used librosa.load() to load the audio file. Apparently this function loads the WAV file into a numpy array with normalised amplitude values in the range -1 to 1. But I need the actual amplitude values for processing. How can I get those?
Thanks in advance!
You observed correctly that librosa always normalises the samples to mono in the range [-1, 1] (and, by default, resamples to 22050 Hz). That said, it's digital audio, so you could multiply by whatever you want to get a different scale. If you insist that your samples be on a scale of -2^15 to 2^15, simply multiply by 2^15; it means pretty much the same thing.
You won't gain anything, except dragging a peculiarity of the encoding audio format into your data.
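For instance, a minimal sketch (sr=None simply keeps the file's original sample rate):

import librosa

y, sr = librosa.load('existing_file.wav', sr=None, mono=True)
y_rescaled = y * (2 ** 15)  # same data, rescaled to the signed 16-bit range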
That said, if that's what you want, you could use PySoundFile like this:
import soundfile as sf
# reads raw signed 16-bit integer samples instead of normalised floats
y, sr = sf.read('existing_file.wav', dtype='int16')
The parameter dtype='int16' tells the library to assume a signed 16-bit format per sample.
You can't. As Hendrik mentioned, the signal is digital, and the amplitude in the WAV file won't tell you anything about the actual sound wave amplitude / sound power. That was completely lost the moment it was digitised to WAV.
That being said, you can compute e.g. loudness, a relative perception of the sound power. If you are dealing with the human auditory system, one of the recommended approaches is (a sketch follows below):
1. Use the Bark scale (the Bark scale better reflects how we hear).
2. Compute the energy in each bin.
3. (Optional) Normalise by the overall sum.
If you don't want to compute it yourself, check out e.g. YAAFE.
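Here is a hedged sketch of those three steps with numpy/scipy; the bark() formula is Zwicker's approximation, and the band edges and normalisation are my own assumptions, not YAAFE's exact recipe.

import numpy as np
from scipy.signal import spectrogram

def bark(f):
    # Zwicker's approximation of the Bark scale
    return 13 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def bark_band_energies(y, sr, n_bands=24):
    f, t, Sxx = spectrogram(y, fs=sr)   # power spectrogram
    z = bark(f)                         # step 1: map FFT bins to Bark values
    edges = np.linspace(0, 24, n_bands + 1)
    e = np.array([Sxx[(z >= lo) & (z < hi)].sum()   # step 2: energy per band
                  for lo, hi in zip(edges[:-1], edges[1:])])
    return e / e.sum()                  # step 3: normalise by the overall sum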

How to properly use pitch_shift (librosa)?

I'm trying to use librosa and its pitch_shift function.
I recorded some of my voice and used this code:
sampling_rate = 44100
y, sr = librosa.load(directory, sr=sampling_rate)  # y: samples as a numpy array, sr: sample rate
# n_steps=4 with bins_per_octave=24 shifts by 4 quarter-tones, i.e. 2 semitones
y_shifted = librosa.effects.pitch_shift(y, sr, n_steps=4, bins_per_octave=24)
# note: librosa.output.write_wav was removed in librosa 0.8; soundfile.write replaces it
librosa.output.write_wav(directory, y_shifted, sr=sampling_rate, norm=False)
It works fine, almost.
I hear some noise in my new voice (after pitch-shifting).
Is there something else I need to do?
Without shift:
https://vocaroo.com/i/s1qEEDvzcUHN
With shift (n_steps = 4):
https://vocaroo.com/i/s0cOiC0cFJSB
Pitch-shifting typically involves an STFT, a shift of the magnitude spectrum along the frequency axis, and then signal reconstruction via the Griffin-Lim algorithm (there is a Quora explanation of how Griffin-Lim works).
The problem is that when we shift the magnitude spectrum, we do just that and ignore the phase! Griffin-Lim tries to find a reasonable solution for the correct phase when reconstructing the time-domain signal, but it is often just that: a reasonable solution, not a perfect one. And that is why you hear this metallic twang. That is the phases of your signal not being quite right (also called "phasiness").
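You can hear the effect in isolation with librosa's own Griffin-Lim implementation. A hedged sketch (librosa.ex downloads a bundled example clip and assumes librosa >= 0.8):

import librosa

y, sr = librosa.load(librosa.ex("trumpet"))
S_mag = abs(librosa.stft(y))       # keep the magnitude, discard the phase
y_rec = librosa.griffinlim(S_mag)  # the phase is only estimated, hence the twang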
I believe your function call to librosa is perfectly alright. It may just not be the greatest implementation on earth. Give PyRubberband a try. It's based on Rubber Band (a C++ library) and has a good reputation.
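A minimal sketch of the PyRubberband route (it assumes the rubberband command-line utility is installed, since pyrubberband shells out to it; file names are placeholders):

import librosa
import pyrubberband as pyrb
import soundfile as sf

y, sr = librosa.load("voice.wav", sr=44100)
y_shifted = pyrb.pitch_shift(y, sr, n_steps=2)  # 2 semitones, same as n_steps=4 at bins_per_octave=24
sf.write("voice_shifted.wav", y_shifted, sr)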

How to manipulate an image very fast according to a custom math function in Python

I'm trying to create a 360-degree camera, just like Google Street View cameras.
(This is my whole code, if you are interested.)
I have a particular kind of perspective equation that maps pixel [xold, yold] to [xnew, ynew] according to alpha and beta angles given as inputs.
To simplify that equation and my question, assume I'm just trying to rotate an image.
Now my question is how to rotate an image very fast by applying the rotation equation to each pixel, in pygame or any other interactive shell:
xnew = xold * cos(alpha) - yold * sin(alpha)
ynew = xold * sin(alpha) + yold * cos(alpha)
Assume pygame.transform.rotate() is not available
Read the following words from pygame.org:
http://www.pygame.org/docs/ref/surface.html
"There is support for pixel access for the Surfaces. Pixel access on hardware surfaces is slow and not recommended. Pixels can be accessed using the get_at() and set_at() functions. These methods are fine for simple access, but will be considerably slow when doing of pixel work with them. If you plan on doing a lot of pixel level work, it is recommended to use a pygame.PixelArray object for direct pixel access of surfaces, which gives an array like view of the surface. For involved mathematical manipulations try the pygame.surfarray module for accessing surface pixel data using array interfaces module (It’s quite quick, but requires NumPy.)"
pygame.Surface.set_at((x,y),Color) is definitely the easiest way to do it, but for performance (which is what you asked), you must use pygame.PixelArray or pygame.surfarray.
I can't do the coding for you because I'm short on time, but these websites will point you in the right direction:
http://www.pygame.org/docs/ref/pixelarray.html#pygame.PixelArray
http://www.pygame.org/docs/ref/surfarray.html#module-pygame.surfarray
Good luck with your coding!
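That said, a rough sketch of the surfarray approach might look like the following (a hedged example; the function name and the inverse-mapping choice are mine). Instead of looping per pixel, it computes the source coordinate of every destination pixel at once with numpy:

import numpy as np
import pygame

def rotate_surface(surface, alpha):
    src = pygame.surfarray.array3d(surface)  # (width, height, 3) pixel array
    w, h = src.shape[0], src.shape[1]
    cx, cy = w / 2.0, h / 2.0
    xs, ys = np.meshgrid(np.arange(w), np.arange(h), indexing="ij")
    # inverse rotation: for each destination pixel, find its source pixel
    ca, sa = np.cos(-alpha), np.sin(-alpha)
    xold = (ca * (xs - cx) - sa * (ys - cy) + cx).astype(int)
    yold = (sa * (xs - cx) + ca * (ys - cy) + cy).astype(int)
    valid = (xold >= 0) & (xold < w) & (yold >= 0) & (yold < h)
    dst = np.zeros_like(src)
    dst[valid] = src[xold[valid], yold[valid]]
    return pygame.surfarray.make_surface(dst)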
Given that you are trying to simulate a 3D environment, it would be extremely hard to beat a PyOpenGL solution performance-wise. From what I saw when I ran your code, it looks like you are implementing a "skybox", where the viewer sits inside a virtual cube. OpenGL is meant for 3D computations like this, so you do not need to shift pixels one at a time; instead, you let the GPU do that for you while you just pass in a series of vertices and textures. If you need really complicated equations that manipulate every single pixel on the screen, you can use GLSL shaders to do that work on the GPU in parallel. Let me know if you want me to elaborate on this approach, as it would be very different from your current code.

Generating smooth audio from Python on a low-powered computer

I am trying to write a simple audio function generator in Python, to be run on a Raspberry Pi (model 2). The code essentially does this:
Generate 1 second of the audio signal (say, a sine wave, or a square wave, etc.)
Play it repeatedly in a loop
For example:
import pyaudio
from numpy import linspace, sin, pi, int16

def note(freq, duration, amp=1, rate=44100):
    t = linspace(0, duration, int(duration * rate))
    data = sin(2 * pi * freq * t) * amp
    return data.astype(int16)  # two-byte integers

RATE = 44100
FREQ = 261.6

pa = pyaudio.PyAudio()
s = pa.open(output=True,
            channels=2,
            rate=RATE,
            format=pyaudio.paInt16,
            output_device_index=2)

# generate 1 second of sound
tone = note(FREQ, 1, amp=10000, rate=RATE)

# play it forever
while True:
    s.write(tone)
The problem is that every iteration of the loop results in an audible "tick" in the audio, even when using an external USB sound card. Is there any way to avoid this, rather than trying to rewrite everything in C?
I tried using the pyaudio callback interface, but that actually sounded worse (like maybe my Pi was flatulent).
The generated audio needs to be short because it will ultimately be adjusted dynamically with an external control, and anything more than 1 second of latency on control changes just feels awkward. Is there a better way to produce these signals from within Python code?
You're hearing a "tick" because there's a discontinuity in the audio you're sending. One second of 261.6 Hz contains 261.6 cycles, so you end up with about half a cycle left over at the end.
You'll need to either change the frequency so that there is a whole number of cycles per second (e.g. 262 Hz), change the duration so that it covers a whole number of cycles, or generate a new audio clip every second that starts in the right phase to fit where the last chunk left off.
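A minimal sketch of the first option (variable names are mine), snapping the frequency to a whole number of cycles per buffer so the loop point stays continuous:

DURATION = 1.0                    # seconds per buffer, as in the question
FREQ = 261.6
cycles = round(FREQ * DURATION)   # 262 whole cycles fit in one buffer
freq_fixed = cycles / DURATION    # 262.0 Hz loops seamlessly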
I was looking for a question similar to yours, and found a variation that plays a pre-calculated length by concatenating a bunch of pre-calculated chunks:
http://milkandtang.com/blog/2013/02/16/making-noise-in-python/
Using a for loop with a 1-second pre-calculated chunk in a "play_tone" function seems to generate smooth-sounding output, but this was on a PC. If it doesn't work for you, it may be that the Raspberry Pi's audio back-end handles successive writes differently.

Get the amplitude at a given time within a sound file?

I'm working on a project where I need to know the amplitude of the sound coming in from a microphone on a computer.
I'm currently using Python with the Snack Sound Toolkit, and I can record audio coming in from the microphone, but I need to know how loud that audio is. I could save the recording to a file and use another toolkit to read the amplitude at given points in time from the audio file, or try to get the amplitude while the audio is coming in (which could be more error-prone).
Are there any libraries or sample code that can help me with this? I've been looking, and so far the Snack Sound Toolkit seems to be my best hope, yet there doesn't seem to be a way to get direct access to the amplitude.
Looking at the Snack Sound Toolkit examples, there seems to be a dbPowerSpectrum function.
From the reference:
dBPowerSpectrum()
Computes the log FFT power spectrum of the sound (at the sample number given in the start option) and returns a list of dB values. See the section item for a description of the rest of the options. Optionally an ending point can be given, using the end option. In this case the result is the average of consecutive FFTs in the specified range. Their default spacing is taken from the fftlength, but this can be changed using the skip option, which tells how many points to move the FFT window each step.
EDIT: I am assuming that when you say amplitude, you mean how "loud" the sound appears to a human, and not the time-domain voltage, which would probably average to 0 over the entire length, since the integral of a sine wave is 0; e.g. 10 * sin(t) is louder than 5 * sin(t), but their average value over time is 0. (You do not want to send non-AC voltages to a speaker anyway.)
To get how loud the sound is, you will need to determine the amplitudes of the frequency components. This is done with a Fourier transform (FFT), which breaks the sound down into its frequency components. The dBPowerSpectrum function seems to give you a list of the magnitudes (forgive me if this differs from the exact definition of a power spectrum) of each frequency. To get the total volume, you can just sum the entire list, which will be close, though it still might differ from perceived loudness, since the human ear has a frequency response of its own.
I disagree completely with this "answer" from CookieOfFortune.
Granted, the question is poorly phrased, but this answer makes things much more complex than necessary. I am assuming that by "amplitude" you mean perceived loudness; technically, each sample in the (PCM) audio stream already represents an amplitude of the signal at a given time slice. To get a loudness representation, try a simple RMS (root mean square) calculation: RMS = sqrt(mean(x^2)) over a window of samples x.
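A minimal sketch of that calculation, assuming the samples are already in a numpy array:

import numpy as np

def rms(samples):
    x = samples.astype(np.float64)  # avoid integer overflow when squaring
    return np.sqrt(np.mean(x ** 2))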
I'm not sure if this will help, but skimpygimpy provides facilities for parsing WAVE files into Python sequences and back. You could potentially use this to examine the waveform samples directly and do what you like. You will have to read some source, though; these subcomponents are not documented.
