I need a Python library to change the pitch of my WAV file without any raw audio data processing.
I spent a couple of hours looking for one, but only found some strange raw-data-processing code snippets and a video showing a real-time pitch shift, without source code.
Since a wav file basically is raw audio data, you won't be able to change the pitch without "raw audio processing".
Here is what you could do.
You will need the wave (standard library) and numpy modules.
import wave
import numpy as np
Open the files.
wr = wave.open('input.wav', 'r')
# Set the parameters for the output file.
par = list(wr.getparams())
par[3] = 0 # The number of samples will be set by writeframes.
par = tuple(par)
ww = wave.open('pitch1.wav', 'w')
ww.setparams(par)
The sound should be processed in small fractions of a second. This cuts down on reverb. Try setting fr to 1; you'll hear annoying echoes.
fr = 20
sz = wr.getframerate()//fr  # Read and process 1/fr second at a time.
# A larger number for fr means less reverb.
c = int(wr.getnframes()/sz)  # number of chunks in the whole file
shift = 100//fr  # shift by 100 Hz (each rfft bin spans fr Hz, since the window is 1/fr s)
for num in range(c):
Read the data and split it into left and right channels (assuming a stereo WAV file).
    da = np.frombuffer(wr.readframes(sz), dtype=np.int16)
    left, right = da[0::2], da[1::2]  # left and right channel
Extract the frequencies using the Fast Fourier Transform built into numpy.
    lf, rf = np.fft.rfft(left), np.fft.rfft(right)
Roll the array to increase the pitch.
    lf, rf = np.roll(lf, shift), np.roll(rf, shift)
The highest frequencies roll over to the lowest ones. That's not what we want, so zero them.
    lf[0:shift], rf[0:shift] = 0, 0
Now use the inverse Fourier transform to convert the signal back to amplitudes (the time domain).
    nl, nr = np.fft.irfft(lf), np.fft.irfft(rf)
Combine the two channels.
    ns = np.column_stack((nl, nr)).ravel().astype(np.int16)
Write the output data.
    ww.writeframes(ns.tobytes())
Close the files when all frames are processed.
wr.close()
ww.close()
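A note on the arithmetic behind shift: each chunk is 1/fr seconds long, so adjacent rfft bins are fr Hz apart, and rolling by n bins moves the spectrum by roughly n*fr Hz. A small helper (a sketch; the name is illustrative) for an arbitrary shift in Hz:

def bins_for_hz(hz, fr):
    # Each rfft bin spans fr Hz when the analysis window is 1/fr seconds.
    return hz // fr

shift = bins_for_hz(100, 20)  # 5 bins, roughly 100 Hz when fr = 20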
I recommend trying Librosa's pitch shift function:
https://librosa.github.io/librosa/generated/librosa.effects.pitch_shift.html
import librosa
y, sr = librosa.load('your_file.wav', sr=16000) # y is a numpy array of the wav file, sr = sample rate
y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=4) # shifted by 4 half steps (sr passed by keyword; newer librosa versions require it)
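To save the shifted signal back to a WAV file, one option (a sketch; assumes the soundfile package, which librosa itself uses for I/O) is:

import soundfile as sf
sf.write('your_file_shifted.wav', y_shifted, sr)  # write at the same sample rate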
You can try pydub for a quick and easy pitch change across an entire audio file, for different formats (WAV, MP3, etc.).
Here is working code. Inspiration from here; refer here for more details on pitch change.
from pydub import AudioSegment
from pydub.playback import play
sound = AudioSegment.from_file('in.wav', format="wav")
# shift the pitch up by half an octave (speed will increase proportionally)
octaves = 0.5
new_sample_rate = int(sound.frame_rate * (2.0 ** octaves))
# keep the same samples but tell the computer they ought to be played at the
# new, higher sample rate. This file sounds like a chipmunk but has a weird sample rate.
hipitch_sound = sound._spawn(sound.raw_data, overrides={'frame_rate': new_sample_rate})
# now we just convert it to a common sample rate (44.1k - standard audio CD) to
# make sure it works in regular audio players. Other than potentially losing audio
# quality (if you set it too low - 44.1k is plenty) this shouldn't noticeably change
# how the audio sounds.
hipitch_sound = hipitch_sound.set_frame_rate(44100)
# play the pitch-changed sound
play(hipitch_sound)
# export / save the pitch-changed sound
hipitch_sound.export("out.wav", format="wav")
I am using pydub to do some experiments with an audio file. After loading it, I want to take some parts and analyze them further with numpy, so I extract the raw data as explained here: Pydub raw audio data.
import array
import numpy as np
from pydub import AudioSegment
from pydub.utils import get_array_type

song = AudioSegment.from_file("dodos_delight.m4a")

# Take the first 17.5 seconds
start = 0
end = 17.5 * 1000  # pydub slices in milliseconds
guitar = song[start:end]

# Now get the raw data as an array
bit_depth = song.sample_width * 8
array_type = get_array_type(bit_depth)
fs = song.frame_rate
guitar_np = array.array(array_type, guitar.raw_data)
guitar_t = np.arange(0, len(guitar_np)/fs, 1/fs)
However, len(guitar_np)/fs = 35, which does not make sense: it's exactly double what it should be. The only way to get 17.5 would be if fs were doubled, but the points are taken 1/fs apart in time.
If I try to save the data like this
from scipy.io.wavfile import write
rate = fs
scaled = np.int16(guitar_np / np.max(np.abs(guitar_np)) * 32767)
write('test.wav', rate, scaled)
I get a super slow version of it, and the only way to make it sound like the original is to save it with rate = fs*2.
Any thoughts?
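One thing worth checking, consistent with the channel handling used elsewhere on this page: for a stereo file, pydub's raw_data interleaves both channels, so the array holds 2 * fs samples per second, which would explain the factor of two exactly. A minimal de-interleaving sketch (assuming song.channels == 2, reusing the names above):

left, right = guitar_np[0::2], guitar_np[1::2]  # split the interleaved channels
guitar_t = np.arange(0, len(left)/fs, 1/fs)     # len(left)/fs is now 17.5 seconds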
I'm just starting with audio processing in Python and still trying to understand the nature of the data and how to process it. I've been stuck trying to increase/decrease the amplitude (volume/loudness, if I can say so) of a WAV file, without using pydub's AudioSegment. From what I've seen, this hasn't been answered anywhere.
I extract the audio data using the following code, but I don't know what to do next:
import numpy as np
import wave
filename = 'violin.wav'
audiofile = wave.open(filename,'rb')
nch = audiofile.getnchannels()
if nch == 2:
    print('Stereo audio file')
elif nch == 1:
    print('Mono audio file')
sw = audiofile.getsampwidth()
n_frames = audiofile.getnframes()
fr = audiofile.getframerate()
frames = audiofile.readframes(-1)
typ = { 1: np.int8, 2: np.int16, 4: np.int32 }.get(sw)
data = np.frombuffer(frames,dtype=typ)
I have tried increasing the values of the data array by a certain amount, but it seems that's not how it works. I also tried doing it with the Fourier transform, but I got stuck reversing the process.
How can I change the amplitude from the numpy array? Is it necessary to go through the Fourier transform for that?
Thank you!
Well, it works fine for me.
The missing piece might be how to convert back to bytes without changing the size; since you did not show that part, you might have done something wrong when packing the data back.
import numpy as np
import wave
filename = 'violin.wav'
audiofile = wave.open(filename,'rb')
nch = audiofile.getnchannels()
if nch == 2:
    print('Stereo audio file')
elif nch == 1:
    print('Mono audio file')
sw = audiofile.getsampwidth()
n_frames = audiofile.getnframes()
fr = audiofile.getframerate()
frames = audiofile.readframes(-1)
typ = { 1: np.int8, 2: np.int16, 4: np.int32 }.get(sw)
data = np.frombuffer(frames,dtype=typ)
# Your code, so far
p = audiofile.getparams() # Just to get all params
outfile = wave.open("out.wav", 'wb')
outfile.setparams(p) # same params
outfile.writeframes((data*0.5).astype(typ).tobytes())
outfile.close()
Note that data*0.5 is float type (data//2 would have kept the correct type, but I assume you may want to scale by any scalar value).
So we need to convert back to the correct int type. And you already computed it, so it's easy: I just reuse your typ variable.
(data*0.5).astype(typ) is the data, scaled by 0.5, with the correct type.
So (data*0.5).astype(typ).tobytes() gives the bytes.
Note on adding a value
Note that adding a value does nothing (except, if too big, saturating the file). It's a wave: it's the frequency and the amplitude that count, not the zero point.
This remark is not computer science, just basic physics: sound is a variation of atmospheric pressure. When you hit an A (La) on a piano, the string vibrates at 440 Hz, and so it creates a variation of air pressure at 440 Hz. Whether the weather is good and the pressure is 102000 Pa and, because of the piano, oscillates between 101999.9 Pa and 102000.1 Pa; or the weather is bad and the pressure oscillates between 98999.9 Pa and 99000.1 Pa; or you are in a plane, listening to a piano, with the pressure in your ear oscillating between 69999.9 Pa and 70000.1 Pa: you won't really think the sound is different. It is the oscillation (its frequency, its amplitude) that matters, not the mean value it oscillates around.
Another way to put it: the direct way to play a sound is to move a membrane following the values in data.
Imagine you have, connected to your computer (in fact, that is more or less what you have), a motor able to displace a metal plate. If you send 0 to that motor, the plate is positioned at 0 mm. If you send 1000, it is positioned at 1 mm; 2000 -> 2 mm; etc.
Now if, at a rate of 44100 samples per second (assuming 44100 Hz sampling), you send the values in data, so that the motion of the plate follows your data, you'll hear the sound.
Now add 1000 to all your data and play it again.
Well, the plate will follow the exact same movement, only 1 mm more to the right. But it is the same movement. It is exactly as if you had moved yourself 1 mm to the left before playing it again. Of course, you don't expect any change from that.
On the other hand, if you multiply all the data by 2 and play again, the motion of the plate will be the same, but twice as big (a displacement of 1 mm is turned into one of 2 mm). So, not surprisingly, you hear the same sound, but louder.
So, long story short: adding a value to sound samples does nothing. Except maybe making the values go out of the possible range.
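A quick numpy illustration of the point (a sketch; the numbers are arbitrary): a constant offset leaves the oscillation untouched, while scaling changes its amplitude.

import numpy as np

t = np.linspace(0, 1, 44100, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone

shifted = tone + 0.25  # DC offset: same oscillation around a new mean
scaled = tone * 2      # scaling: same frequency, twice the amplitude

# Peak-to-peak amplitude is what you hear; the offset leaves it unchanged.
print(shifted.max() - shifted.min())  # ~2.0, same as the original tone
print(scaled.max() - scaled.min())    # ~4.0, twice as loud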
I am trying to detect sudden loud noises in audio recordings. One way I have found to do this is by creating a spectrogram of the audio and adding the values of each column. By graphing the sum of the values in each column, one can see a spike every time there is a sudden loud noise. The problem is that, in my use case, I need to play a beep tone (with a frequency of 2350 Hz), while the audio is being recorded. The spectrogram of the beep looks like this:
As you can see, at the beginning and the end of this beep (which is a simple tone with a frequency of 2350 Hz), there are other frequencies present, which I have been unsuccessful in removing. These unwanted frequencies cause a spike when summing up the columns of the spectrogram, at the beginning and at the end of the beep. I want to avoid this because I don't want my beep to be detected as a sudden loud noise. See the spectrogram below for reference:
Here is the graph of the sum of each column in the spectrogram:
Obviously, I want to avoid false positives in my algorithm, so I need some way of getting rid of the spikes caused by the beginning and end of the beep. One idea I have had so far is to add low-level random noise above and/or below the 2350 Hz line in the beep spectrogram above. Ideally, this would create a tone that sounds very similar to the original, but instead of producing a spike when I add up all the values in a column, it would produce more of a plateau. Is this idea a feasible solution to my problem? If so, how would I go about creating a beep sound with random noise like I described, using Python? Is there another, easier solution that I am overlooking?
Currently, I am using the following code to generate my beep sound:
import math
import wave
import struct
audio = []
sample_rate = 44100.0
def append_sinewave(
        freq=440.0,
        duration_milliseconds=500,
        volume=1.0):
    """
    The sine wave generated here is the standard beep. If you want something
    more aggressive you could try a square or sawtooth waveform. Though there
    are some rather complicated issues with making high quality square and
    sawtooth waves... which we won't address here :)
    """
    global audio  # using global variables isn't cool.

    num_samples = duration_milliseconds * (sample_rate / 1000.0)

    for x in range(int(num_samples)):
        audio.append(volume * math.sin(2 * math.pi * freq * (x / sample_rate)))

    return
def save_wav(file_name):
    # Open up a wav file
    wav_file = wave.open(file_name, "w")

    # wav params
    nchannels = 1
    sampwidth = 2

    # 44100 is the industry standard sample rate - CD quality. If you need to
    # save on file size you can adjust it downwards. The standard for low quality
    # is 8000 or 8kHz.
    nframes = len(audio)
    comptype = "NONE"
    compname = "not compressed"
    wav_file.setparams((nchannels, sampwidth, int(sample_rate), nframes, comptype, compname))

    # WAV files here are using short, 16 bit, signed integers for the
    # sample size. So we multiply the floating point data we have by 32767, the
    # maximum value for a short integer. NOTE: It is theoretically possible to
    # use the floating point -1.0 to 1.0 data directly in a WAV file, but it's not
    # obvious how to do that using the wave module in python.
    for sample in audio:
        wav_file.writeframes(struct.pack('h', int(sample * 32767.0)))

    wav_file.close()
    return
append_sinewave(volume=1, freq=2350)
save_wav("output.wav")
Not really an answer - more of a question.
You're asking the speaker to go from stationary to a sine wave instantaneously - that is quite hard to do (though the frequencies aren't that high). If it does manage it, then the spectrum of the received signal should be the convolution of the spectrum of the top hat with that of the sine wave (sort of like what you are seeing, but without having the data, and knowing exactly how you compute the spectrogram, it's hard to tell).
In either case you could check this by smoothing the start and end of your tone. Something like this for your tone generation:
tr = 0.05  # rise time, in seconds
tf = duration_milliseconds / 1000  # finish time of tone, in seconds

for x in range(int(num_samples)):
    t = x / sample_rate  # Time of sample in seconds

    # Calculate a bump function
    bump_function = 1
    if 0 < t < tr:  # go smoothly from 0 to 1 at the start of the tone
        tp = 1 - t / tr
        bump_function = math.e * math.exp(1/(tp**2 - 1))
    elif tf - tr < t < tf:  # go smoothly from 1 to 0 at the end of the tone
        tp = 1 + (t - tf) / tr
        bump_function = math.e * math.exp(1/(tp**2 - 1))

    audio.append(volume * bump_function * math.sin(2 * math.pi * freq * t))
You might need to tune the rise time a bit. With this form of bump function you know that you have a full volume tone from tr after the start to tr before the end. Lots of other functions exist, but if this smooths the start/stop effects in your spectrogram then you at least know why they are there. And prevention is generally better than trying to remove the effect in post-processing.
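For reference, here is a sketch of how the smoothing could be folded into the question's append_sinewave (same globals and names as above; the function name is illustrative):

def append_smoothed_sinewave(freq=2350.0, duration_milliseconds=500, volume=1.0, tr=0.05):
    # Same as append_sinewave, but multiplied by the bump function above.
    global audio
    num_samples = int(duration_milliseconds * (sample_rate / 1000.0))
    tf = duration_milliseconds / 1000.0
    for x in range(num_samples):
        t = x / sample_rate
        bump = 1.0
        if 0 < t < tr:  # smooth rise
            tp = 1 - t / tr
            bump = math.e * math.exp(1 / (tp**2 - 1))
        elif tf - tr < t < tf:  # smooth fall
            tp = 1 + (t - tf) / tr
            bump = math.e * math.exp(1 / (tp**2 - 1))
        audio.append(volume * bump * math.sin(2 * math.pi * freq * t))

append_smoothed_sinewave(volume=1, freq=2350)
save_wav("output_smoothed.wav")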
I need a program that transforms tones recorded by the microphone into keyboard presses. Example: if somebody sings into the microphone at a frequency between 400 Hz and 600 Hz, and the average tone is 550 Hz, then I store the average frequency in the variable tom, and the key "G" of my keyboard is pressed.
Even though I'm a newbie at programming, I searched and figured out a way to do it: using PyAudio in Python to record small WAV files, I could then read those and get a number as an average frequency, and with this number and some ifs and elifs, press keys (it's not that hard to find code to press keys), all in an enormous while loop that repeats the process while the program runs. So I would have the process of talking, reading the small files the talk produces, and then transforming them into key presses, according to the tone.
The main problem is that I have no idea how to turn the WAV files I've been recording into a single average frequency number. Can somebody help me with this? Or with the big picture? Because I know this method is not a really good one. Thanks! I was using this code to record, which I found on the PyAudio website:
import wave
import numpy as np
import pyaudio

CHUNK = 2048
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 1
WAVE_OUTPUT_FILENAME = "output1.wav"

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")

frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
    print("* done chunk")

stream.stop_stream()
stream.close()
p.terminate()

# save the recorded frames (WAVE_OUTPUT_FILENAME was defined above but never used)
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
To press the keys, I use this other code:
import win32com.client

shell = win32com.client.Dispatch("WScript.Shell")
if tom >= 400 and tom <= 500:
    shell.SendKeys("G")
PS: I'm using Windows.
You can use the Fourier transform to convert sound into frequencies.
More specifically, use the one-dimensional discrete Fourier Transform provided by numpy.fft.rfft.
Here is an example that reads a single second from a stereo WAV file and extracts the frequencies:
import wave
import numpy as np
with wave.open('input.wav', 'r') as wr:
    sz = wr.getframerate()  # Read and process 1 second.
    da = np.frombuffer(wr.readframes(sz), dtype=np.int16)
    left, right = da[0::2], da[1::2]  # separate into left and right channels
    lf, rf = np.absolute(np.fft.rfft(left)), np.absolute(np.fft.rfft(right))
lf and rf are numpy arrays containing the intensity of each frequency. Using numpy.argmax you can get the index of the strongest frequency; because exactly one second was transformed, the bin index corresponds directly to the frequency in Hz.
But try it and graph the result using e.g. matplotlib. You'll see that there are probably multiple peaks in the data. For example, you might find a peak at 50 Hz or 60 Hz. This is most probably interference from mains electricity and should be ignored by zeroing out that part of the data.
Example for 60 Hz:
lf[55:65], rf[55:65] = 0, 0
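From there, a minimal sketch (the variable name tom comes from the question) of extracting the single dominant frequency:

tom = int(np.argmax(lf + rf))  # dominant frequency in Hz (1-second window: bin index == Hz)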
Below is an example plot made with matplotlib from a one-second sound clip. The top graph shows the samples from the WAV file while the bottom one shows the same data converted to frequencies. This is a graph of a person speaking, so there are many peaks. The highest is around 200 Hz.
I was wondering if there is a way to set the audio pitch. A certain tone would be the base; I want to know how to make the pitch go up or down. Thanks.
Also, how do you play an audio tone? If you know of any modules that do this, I would like to hear about them. Thanks.
My goal is to create a Pong game that a blind person could play: the higher the ball is, the higher the pitch; the lower the ball, the lower the pitch. Preferably in Python. Thanks in advance.
If you want to try the PyAudio library, you can use this piece of code I created a few days ago!
import pyaudio
import struct
import math

SHRT_MAX = 32767  # a short uses 16 bits, two's complement

def my_sin(t, frequency):
    radians = t * frequency * 2.0 * math.pi
    pulse = math.sin(radians)
    return pulse

# pulse_function creates numbers in the [-1, 1] interval
def generate(duration=5, pulse_function=(lambda t: my_sin(t, 1000))):
    sample_width = 2
    sample_rate = 44100
    sample_duration = 1.0 / sample_rate
    total_samples = int(sample_rate * duration)
    p = pyaudio.PyAudio()
    pformat = p.get_format_from_width(sample_width)
    stream = p.open(format=pformat, channels=1, rate=sample_rate, output=True)
    for n in range(total_samples):
        t = n * sample_duration
        pulse = int(SHRT_MAX * pulse_function(t))
        data = struct.pack("h", pulse)
        stream.write(data)
    # release the audio device when done
    stream.stop_stream()
    stream.close()
    p.terminate()
# example of a function I took from Wikipedia: a major chord
major_chord = f = lambda t: (my_sin(t, 440) + my_sin(t, 550) + my_sin(t, 660)) / 3

# choose any frequency you want
# choose an amplitude from 0 to 1
def create_pulse_function(frequency=1000, amplitude=1):
    return lambda t: amplitude * my_sin(t, frequency)
if __name__ == "__main__":
    # play a fundamental tone at 1000 Hz for 5 seconds at maximum intensity
    f = create_pulse_function(1000, 1)
    generate(pulse_function=f)
    # play a fundamental tone at 500 Hz for 5 seconds at maximum intensity
    f = create_pulse_function(500, 1)
    generate(pulse_function=f)
    # play a fundamental tone at 500 Hz for 5 seconds at 50% intensity
    f = create_pulse_function(500, 0.5)
    generate(pulse_function=f)
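To tie this to the Pong idea (mapping ball height to pitch), here is a hedged sketch on top of the functions above; the helper name and the linear mapping are illustrative assumptions:

def frequency_for_height(y, screen_height, f_min=200.0, f_max=2000.0):
    # linear mapping: bottom of the screen -> f_min, top -> f_max
    # (assumes y is measured from the bottom, 0 <= y <= screen_height)
    return f_min + (f_max - f_min) * (y / screen_height)

# short blip whose pitch tracks a ball at height 300 on a 480-pixel screen
f = create_pulse_function(frequency_for_height(300, 480), 1)
generate(duration=0.1, pulse_function=f)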