On the client side, I am sending an audio blob (WAV). On the server side, I am trying to convert the blob to a WAV audio file. I did the following:
blob = request.FILES['file']
name = "TEST.wav"
audio = wave.open(name, 'wb')
audio.setnchannels(1)
audio.writeframes(blob.read())
I thought that converting the blob would be similar to converting a blob image to a JPEG file, but I was very wrong in that assumption. That didn't work; I get an error: "Error: sample width not specified." I then used setsampwidth() and tossed in an arbitrary number between 1 and 4 (after looking at the wave.py source file; I don't know why the width has to be between 1 and 4 bytes). After that, another error is thrown: "Error: sampling rate not specified." How do I specify the sampling rate?
What do the setnchannels() and setsampwidth() methods do? Is there an "easy" way to generate the WAV file from the blob?
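For reference, a minimal sketch of what the wave module requires before writeframes() will accept data. The parameter values here are assumptions for illustration; they must match how the client actually recorded the blob:

```python
import wave

# These values are guesses for illustration; they must match the
# format the browser recorded in (channels, bytes/sample, sample rate).
with wave.open("TEST.wav", "wb") as audio:
    audio.setnchannels(1)       # mono
    audio.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
    audio.setframerate(44100)   # samples per second
    # one second of silence standing in for blob.read()
    audio.writeframes(b"\x00\x00" * 44100)
```

setnchannels() declares how many interleaved channels each frame holds, setsampwidth() the number of bytes per sample, and setframerate() the sample rate; all three are mandatory because the WAV header cannot be written without them.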
I had never done this before, but in my test the script below worked well for me (although the audio output isn't the same as the original file).
>>> nchannels = 2
>>> sampwidth = 2
>>> framerate = 8000
>>> nframes = 100
>>>
>>> import wave
>>>
>>> name = 'output.wav'
>>> audio = wave.open(name, 'wb')
>>> audio.setnchannels(nchannels)
>>> audio.setsampwidth(sampwidth)
>>> audio.setframerate(framerate)
>>> audio.setnframes(nframes)
>>>
>>> blob = open("original.wav", "rb").read() # such as `blob.read()`
>>> audio.writeframes(blob)
>>>
I found this method at https://stackoverflow.com/a/3637480/6396981
Finally, by changing the values of nchannels and sampwidth to 1, I got audio that is the same as the original file.
nchannels = 1
sampwidth = 1
framerate = 8000
nframes = 1
Tested under Python 2. Under Python 3, opening the file in text mode raises UnicodeDecodeError: 'utf-8' codec can't decode byte 0x95 in position 4: invalid start byte, so the file must be opened in binary mode ('rb').
I encountered the same problem as well. My problem was a low-pitched output compared to the original. I managed to reverse engineer the original audio to get nframes, samplerate, and sampwidth using getnframes(), getframerate(), and getsampwidth() respectively. In the end, I managed to tweak the sample frequency/frame rate to bring back the correct tone.
The tweak became perfect at a certain frequency offset from the original. Mine worked fine with an offset of one sixteenth of the original samplerate added on.
i.e.
OffsetFrequency = OriginalFrequency/16
Frequency = OriginalFrequency + OffsetFrequency
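That tweak can be sketched as a small helper (the function name is made up; the one-sixteenth offset is the empirical value from above and may not apply to other recordings):

```python
def adjusted_framerate(original_framerate):
    # Empirical correction from above: add one sixteenth of the
    # original rate back onto the original rate.
    offset = original_framerate / 16
    return original_framerate + offset

# e.g. an 8000 Hz recording would be rewritten at 8500 Hz
```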
I am using the struct module for the first time, and my code gives me an error: "unpack requires a buffer of 1486080 bytes"
Here is my code:
def speed_up(n):
    source = wave.open('sound.wav', mode='rb')
    dest = wave.open('out.wav', mode='wb')
    dest.setparams(source.getparams())
    frames_count = source.getnframes()
    data = struct.unpack("<" + str(frames_count) + "h",
                         source.readframes(frames_count))
    new_data = []
    for i in range(0, len(data), n):
        new_data.append(data[i])
    newframes = struct.pack('<' + str(len(new_data)) + 'h', new_data)
    dest.writeframes(newframes)
    source.close()
    dest.close()
How do I figure out which format I should use?
The issue in your code is that you're providing struct.unpack with the wrong number of bytes. This is because of your usage of the wave module: Each frame in a wave file has getnchannels() samples, so when calling readframes(n) you will get back n * getnchannels() samples and this is the number you'd have to pass to struct.unpack.
To make your code more robust, you'd also need to look at getsampwidth() and use an appropriate format character, but the vast majority of wave files are 16-bit.
In the comments you also mentioned that the code didn't work after adding print(len(source.readframes(frames_count))). You didn't show the full code but I assume this is because you called readframes twice without calling rewind, so the second call didn't have any more data to return. It would be best to store the result in a variable if you want to use it in multiple lines.
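Putting those fixes together, a corrected sketch could look like this (assuming 16-bit samples, which is the common case; note the original also needs `*new_data` to unpack the list when packing, and I've parameterised the file names for illustration):

```python
import struct
import wave

def speed_up(src_path, dst_path, n):
    source = wave.open(src_path, mode='rb')
    dest = wave.open(dst_path, mode='wb')
    dest.setparams(source.getparams())

    frames_count = source.getnframes()
    # a frame holds one sample per channel, so unpack frames * channels
    samples_count = frames_count * source.getnchannels()
    data = struct.unpack("<%dh" % samples_count,
                         source.readframes(frames_count))

    new_data = data[::n]  # keep every n-th sample
    dest.writeframes(struct.pack("<%dh" % len(new_data), *new_data))
    source.close()
    dest.close()
```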
I'm working with Python to do some string decoding, and I am trying to understand what this line of code does...
for irradiance_data in struct.iter_unpack("qHHHHfff", irradiance_list_bytes):
    print(irradiance_data)
In my case irradiance_list_bytes is something like this
"\xf5R\x960\x00\x00\x00\x009\x0f\xb4\x03\x01\x00d\x00\xa7D\xd1BC\x8c\x9d\xc2\xb3\xa5\xf0\xc0\xaer\x990\x00\x00\x00\x000\x0f\xb2\x03\x01\x00d\x00\x8f+\xd1B\x81\x9c\x9d\xc2\xf7\xfb\xe6\xc0u\x96\x9c0\x00\x00\x00\x00.\x0f\xb1\x03\x01\x00d\x00\xfe\x81\xd3B\x8a\r\x9e\xc2\xb4\xe7\x01\xc1\x1a\x7f\x9f0\x00\x00\x00\x00*\x0f\xb0\x03\x01\x00d\x00Z\xf5\xd3B\xedq\x9e\xc2&\xa1\x03\xc1\x94\x82\xa20\x00\x00\x00\x00-\x0f\xb1\x03\x01\x00d\x00\xb6\x8f\xd3Bg\xdf\x9d\xc2\x00\xad\xfd\xc0#\x93\xa50\x00\x00\x00\x000\x0f\xb2\x03\x01\x00d\x00\x95n\xd4B\x1d'\x9e\xc2\x1dW\x01\xc1\xd3\xa1\xa80\x00\x00\x00\x001\x0f\xb2\x03\x01\x00d\x00\x1d\xbc\xd3B\xeb\xca\x9d\xc2s\xbf\xf2\xc0.\xaf\xab0\x00\x00\x00\x001\x0f\xb2\x03\x01\x00d\x00\x13\xad\xd4BJx\x9d\xc2G(\xfb\xc0.\xc2\xae0\x00\x00\x00\x007\x0f\xb4\x03\x01\x00d\x00\xd1\xc9\xd4BS\xb8\x9d\xc2\xf0\xd9\xf8\xc0"
And the message error is
AttributeError: 'module' object has no attribute 'iter_unpack'
I believe I have to change "qHHHHfff" to another format string, but I don't understand which one.
The complete code is here...
import os
import glob
import exiftool
import base64
import struct
irradiance_list_tag = 'XMP:IrradianceList'
irradiance_calibration_measurement_golden_tag = 'XMP:IrradianceCalibrationMeasurementGolden'
irradiance_calibration_measurement_tag = 'XMP:IrradianceCalibrationMeasurement'
tags = [ irradiance_list_tag, irradiance_calibration_measurement_tag ]
directory = '/home/stagiaire/Bureau/AAAA/'
channels = [ 'RED' ]
index = 0
for channel in channels:
    files = glob.glob(os.path.join(directory, '*' + channel + '*'))
    with exiftool.ExifTool() as et:
        metadata = et.get_tags_batch(tags, files)
    for file_metadata in metadata:
        irradiance_list = file_metadata[irradiance_list_tag]
        irradiance_calibration_measurement = file_metadata[irradiance_calibration_measurement_tag]
        irradiance_list_bytes = base64.b64decode(irradiance_list)
        print(files[index])
        index += 1
        for irradiance_data in struct.iter_unpack("qHHHHfff", irradiance_list_bytes):
            print(irradiance_data)
EDIT
So, as stated by Strubbly, this is the solution to this question:
print struct.unpack("I", x[:4])
for i in range(8):
    start = 4 + i*28
    print struct.unpack("qHHHHfff", x[start:start+28])
struct.iter_unpack is only available in Python 3 and you are using Python 2.
There is no direct equivalent. struct.unpack will unpack one lump of 28 bytes (with that format string). struct.iter_unpack will unpack multiples of 28 bytes in Python 3.
If your data was suitable for struct.iter_unpack with that format code then you could do something like this:
for i in range(0, len(x), 28):
    print struct.unpack("qHHHHfff", x[i:i+28])
Unfortunately your sample data is not a multiple of 28 bytes long and so I would expect an error in Python 3 as well.
Without knowing about your data it is hard to correct your code but, at a guess, your data might have 4 bytes of some other data at the front. That could be unpacked with something like this:
print struct.unpack("I", x[:4])
for i in range(8):
    start = 4 + i*28
    print struct.unpack("qHHHHfff", x[start:start+28])
In this example I have guessed that the first four bytes are an unsigned int but I have no way of knowing if that is correct. More information is needed.
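On Python 3, the same guess can be written with struct.iter_unpack once the assumed 4-byte prefix is sliced off (whether those first four bytes really are an unsigned int remains a guess):

```python
import struct

def unpack_records(x):
    # Guess: a 4-byte unsigned-int header, then 28-byte "qHHHHfff" records.
    (header,) = struct.unpack("<I", x[:4])
    records = list(struct.iter_unpack("qHHHHfff", x[4:]))
    return header, records
```

iter_unpack raises an error unless the remaining buffer length is an exact multiple of the format size (28 bytes here), which is why the prefix has to be stripped first.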
I have a python script that receives chunks of binary raw audio data and I would like to change the sample rate of those chunks to 16000 and then pipe them to another component.
I tried my luck with audiotools but without success:
# f is a filelike FIFO buffer
reader = PCMFileReader(f, 44100, 1, 1, 16)
conv = PCMConverter(reader, 16000, 1, 1, 16)
Then I just write to the buffer any time I get a new chunk:
f.write(msg)
And read from the buffer in another thread:
while not reader.file.closed:
    fl = conv.read(10)
    chunk = fl.to_bytes(False, True)
The problem is that I get this value error, which seems to come from a "samplerate.c" library:
ValueError: SRC_DATA->data_out is NULL
This error only occurs with resampling. If I turn off that step, then everything works fine and I get playable audio.
Therefore my question: what would be a good tool for this task? And if audiotools turns out to be the right answer, how do I use it correctly?
Here is a simplified resampler. Its arguments: dataFormat is the number of bytes per sample in the stream (e.g. 16-bit stereo would be 4); original_samples is the source chunk size in samples and desired_samples the target chunk size (e.g. for 16 kHz -> 44.1 kHz, original = 160 and desired = 441); pcm is the source byte string, and the return value is the resampled byte string:
import itertools

def resampleSimplified(pcm, desired_samples, original_samples, dataFormat):
    samples_to_pad = desired_samples - original_samples

    q, r = divmod(desired_samples, original_samples)
    times_to_pad_up = q + int(bool(r))
    times_to_pad_down = q

    # split the pcm byte string into one entry per sample
    pcmList = [pcm[i:i+dataFormat] for i in range(0, len(pcm), dataFormat)]

    if samples_to_pad > 0:
        # extending pcm times_to_pad times
        pcmListPadded = list(itertools.chain.from_iterable(
            itertools.repeat(x, times_to_pad_up) for x in pcmList))
    else:
        # shrinking pcm times_to_pad times
        if times_to_pad_down > 0:
            pcmListPadded = pcmList[::times_to_pad_down]
        else:
            pcmListPadded = pcmList

    padded_pcm = b''.join(pcmListPadded[:desired_samples])
    return padded_pcm
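If you'd rather avoid chunk-list juggling, a dependency-free nearest-neighbour variant of the same idea could look like this. It is crude compared to a proper low-pass resampler (it will alias), but it works on arbitrary raw chunks; the function name and defaults are mine:

```python
def resample_nearest(pcm, src_rate, dst_rate, sampwidth=2):
    """Nearest-neighbour resampling of a raw PCM byte string (mono).

    Crude compared to a filtered resampler, but has no dependencies.
    sampwidth is the number of bytes per sample (2 = 16-bit).
    """
    n_in = len(pcm) // sampwidth
    n_out = n_in * dst_rate // src_rate
    out = bytearray()
    for i in range(n_out):
        j = i * src_rate // dst_rate          # nearest source sample index
        out += pcm[j * sampwidth:(j + 1) * sampwidth]
    return bytes(out)
```

For anything beyond quick experiments, a library with a real filter (e.g. the samplerate bindings or scipy.signal.resample) will sound noticeably better.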
I'm trying to read the data from a .wav file.
import wave
wr = wave.open("~/01 Road.wav", 'r')
# sample width is 2 bytes
# number of channels is 2
wave_data = wr.readframes(1)
print(wave_data)
This gives:
b'\x00\x00\x00\x00'
Which is the "first frame" of the song. These 4 bytes obviously correspond to the (2 channels * 2 byte sample width) bytes per frame, but what does each byte correspond to?
In particular, I'm trying to convert it to a mono amplitude signal.
If you want to understand what the 'frame' is you will have to read the standard of the wave file format. For instance: https://web.archive.org/web/20140221054954/http://home.roadrunner.com/~jgglatt/tech/wave.htm
From that document:
The sample points that are meant to be "played" ie, sent to a Digital to Analog Converter(DAC) simultaneously are collectively called a sample frame. In the example of our stereo waveform, every two sample points makes up another sample frame. This is illustrated below for that stereo example.
  sample        sample              sample
  frame 0       frame 1             frame N
 _____ _____   _____ _____         _____ _____
| ch1 | ch2 | | ch1 | ch2 | . . . | ch1 | ch2 |
|_____|_____| |_____|_____|       |_____|_____|

 _____
|     | = one sample point
|_____|
To convert to mono you could do something like this,
import struct
import wave

wr = wave.open('piano2.wav', 'rb')
nchannels, sampwidth, framerate, nframes, comptype, compname = wr.getparams()

ww = wave.open('piano_mono.wav', 'wb')
ww.setparams((1, sampwidth, framerate, nframes, comptype, compname))

# unpack the interleaved 16-bit samples: L0, R0, L1, R1, ...
# (assumes sampwidth == 2, i.e. 16-bit stereo)
frames = wr.readframes(nframes)
samples = struct.unpack('<%dh' % (len(frames) // sampwidth), frames)

# average each left/right pair into one mono sample
mono = [(l + r) // 2 for l, r in zip(samples[0::2], samples[1::2])]
ww.writeframes(struct.pack('<%dh' % len(mono), *mono))
ww.close()
wr.close()
There is no clear-cut way to go from stereo to mono. You could just drop one channel. Above, I am averaging the channels. It all depends on your application.
For wav file IO I prefer to use scipy. It is perhaps overkill for reading a wav file, but generally after reading the wav it is easier to do downstream processing.
import scipy.io.wavfile
fs1, y1 = scipy.io.wavfile.read(filename)
From here the data y1 will be N samples long and will have Z columns, one per channel. You don't say how you'd like to do the mono conversion; you can take the average, or whatever else you'd like. For the average, use
monoChannel = y1.mean(axis=1)
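One detail worth noting: mean() returns floats, so the result should be cast back to an integer dtype before writing it out as PCM. A sketch with a tiny stand-in array (the values are made up for illustration):

```python
import numpy as np

# a tiny stereo signal standing in for y1 as read from the wav
y1 = np.array([[100, 200], [300, 500]], dtype=np.int16)

monoChannel = y1.mean(axis=1)          # float64 result: [150., 400.]
mono16 = monoChannel.astype(np.int16)  # back to int16 before writing
```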
As a direct answer to your question: two bytes make one 16-bit integer value in the "usual" way, given by the explicit formula: value = ord(data[0]) + 256 * ord(data[1]). But using the struct module is a better way to decode (and later reencode) such multibyte integers:
import struct
print(struct.unpack("HH", b"\x00\x00\x00\x00"))
# -> gives a 2-tuple of integers, here (0, 0)
or, if we want a signed 16-bit integer (which I think is the case in .wav files), use "hh" instead of "HH". (I leave to you the task of figuring out how exactly two bytes can encode an integer value from -32768 to 32767 :-)
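For example, the signed and unsigned variants decode the same bytes differently once the high bit of a sample is set:

```python
import struct

data = b"\x00\x80\xff\x7f"  # two little-endian 16-bit samples
print(struct.unpack("<HH", data))  # unsigned: (32768, 32767)
print(struct.unpack("<hh", data))  # signed:   (-32768, 32767)
```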
Another way to convert 2 bytes into an int16 is numpy.fromstring() (deprecated for binary data in current NumPy in favour of numpy.frombuffer()). Here's an example, where audio_sample comes from a wav file:
>>> audio_sample[0:8]
b'\x8b\xff\xe1\xff\x92\xffn\xff'
>>> x = np.fromstring(audio_sample, np.int16)
>>> x[0:4]
array([-117, -31, -110, -146], dtype=int16)
You can use ndarray.tobytes() to convert back to bytes.
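The same conversion with the non-deprecated np.frombuffer, round-tripped back to bytes (this assumes a little-endian platform, since np.int16 uses native byte order):

```python
import numpy as np

raw = b'\x8b\xff\xe1\xff\x92\xffn\xff'
x = np.frombuffer(raw, dtype=np.int16)
# on little-endian machines x is array([-117, -31, -110, -146], dtype=int16)
assert x.tobytes() == raw  # tobytes() is the exact inverse
```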
I'm using the script found on this blog: Google speech recognition with python (all credit to the author).
import sys
import pyaudio, speex
import numpy as np # just for doing a standard deviation for audio level checks
import urllib2
import wave
e = speex.Encoder()
e.initialize(speex.SPEEX_MODEID_WB)
d = speex.Decoder()
d.initialize(speex.SPEEX_MODEID_WB)
chunk = 320 # tried other numbers... some don't work
FORMAT = pyaudio.paInt16
bytespersample=2
CHANNELS = 1
RATE = 16000 # "wideband" mode for speex. May work with 8000. Haven't tried it.
p = pyaudio.PyAudio()
# Start the stream to record the audio
stream = p.open(format = FORMAT,
channels = CHANNELS,
rate = RATE,
input = True,
output = True,
frames_per_buffer = chunk)
print "Listening. Recording will start when some sound is heard."
threshold = 200 # Adjust this to be slightly above the noise level of your recordings.
nquit = 40 # number of silent frames before terminating the program
nover = 0
keepgoing = True
spxlist=[] # list of the encoded speex packets/frames
while keepgoing:
    data = stream.read(chunk)  # grab 320 samples from the microphone
    spxdata = e.encode(data)   # encode using the speex dll
    print "Length encoded: %d" % len(spxdata)  # print the length, after encoding. Can't exceed 255!
    spxlist.append(spxdata)
    a = np.frombuffer(data, np.int16)  # convert to numpy array to check for silence or audio
    audiolevel = np.std(a)
    if audiolevel < threshold:  # too quiet
        nover += 1
    else:
        nover = 0
    if nover >= nquit:
        keepgoing = False
    print '%2.1f (%d%% quiet)' % (audiolevel, nover*100/nquit)
print "Too quiet. I'm stopping now."
stream.stop_stream()
stream.close()
p.terminate()
fullspx=''.join(spxlist) # make a string of all the header-ized speex packets
out_file = open("test.spx","wb")
out_file.write(fullspx)
out_file.close()
As you can see, I slightly modified the script to make it write an output file in .spx, but it doesn't work.
Any advice?
Thanks for your help.
Edit:
I'm running this script under an Ubuntu-linux machine.