I am trying to convert the sample rate (from 44100 to 22050) of a numpy.array with 88200 samples on which I have already done some processing (such as adding silence and converting to mono). I tried to convert this array with audioop.ratecv and it works, but it returns a str instead of a numpy array, and when I wrote that data with scipy.io.wavfile.write the result was that half of the data was lost and the audio played twice as fast (instead of slower, which would at least make some sense).
audioop.ratecv works fine with str data such as wave.open returns, but I don't know how to process that, so I tried to convert from numpy to str with numpy.array2string(data) to pass it to ratecv and get correct results, and then convert back to numpy with numpy.fromstring(data, dtype), and now the length of the data is 8 samples. I think this is due to the mixing of formats, but I don't know how to control it. I also haven't figured out what format the str returned by wave.open is in, so I can't force that format either.
Here is this part of my code
def conv_sr(data, srold, fixSR, dType, chan = 1):
    state = None
    width = 2  # numpy.int16
    print "data shape", data.shape, type(data[0])  # returns shape 88200, type int16
    fragments = numpy.array2string(data)
    print "new fragments len", len(fragments), "type", type(fragments)  # returns len 30, type str
    fragments_new, state = audioop.ratecv(fragments, width, chan, srold, fixSR, state)
    print "fragments", len(fragments_new), type(fragments_new[0])  # returns 16, type str
    data_to_return = numpy.fromstring(fragments_new, dtype=dType)
    return data_to_return
and I call it like this
data1 = numpy.array(data1, dtype=dType)
data_to_copy = numpy.append(data1, data2)
data_to_copy = data_to_copy.sum(axis = 1) / chan
data_to_copy = data_to_copy.flatten() # because its mono
data_to_copy = conv_sr(data_to_copy, sr, fixSR, dType) #sr = 44100, fixSR = 22050
scipy.io.wavfile.write(filename, fixSR, data_to_copy)
After a bit more research I found my mistake: it seems that 16-bit audio samples are made of two 8-bit 'cells', so the dtype I was using was wrong and that is why I had the audio speed issue. I found the correct dtype here. So, in the conv_sr def, I pass in a numpy array, convert it to a data string, pass that to ratecv to convert the sample rate, convert it back to a numpy array for scipy.io.wavfile.write and, finally, convert the two 8-bit values into 16-bit format
def widthFinder(dType):
    try:
        b = str(dType)
        bits = int(b[-2:])
    except:
        b = str(dType)
        bits = int(b[-1:])
    width = bits/8
    return width

def conv_sr(data, srold, fixSR, dType, chan = 1):
    state = None
    width = widthFinder(dType)
    if width != 1 and width != 2 and width != 4:
        width = 2
    fragments = data.tobytes()
    fragments_new, state = audioop.ratecv(fragments, width, chan, srold, fixSR, state)
    fragments_dtype = numpy.dtype((numpy.int16, {'x':(numpy.int8,0), 'y':(numpy.int8,1)}))
    data_to_return = numpy.fromstring(fragments_new, dtype=fragments_dtype)
    data_to_return = data_to_return.astype(dType)
    return data_to_return
If you find anything wrong, please feel free to correct me, I am still a learner
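As a side note, numpy can report the sample width directly, which avoids parsing the dtype name. Here is a minimal sketch of that idea (my own simplification, assuming dType is any numpy integer dtype):

import numpy

def widthFinder(dType):
    # itemsize is the size of one element in bytes, e.g. 2 for numpy.int16
    width = numpy.dtype(dType).itemsize
    # fall back to 2 bytes, mirroring the check in conv_sr above
    return width if width in (1, 2, 4) else 2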
I am trying to read bytes from an image, and get all the int (16 bit) values from that image.
After I parsed the image header, I got to the pixel values. The value that I get when the pair of bytes is something like b"\xd4\x00" is incorrect. In this case it should be 54272, not 3392.
These are the relevant parts of the code:
I use a generator to get the bytes:
import itertools

def osddef_generator(in_file):
    with open(in_file, mode='rb') as f:
        dat = f.read()
        for byte in dat:
            yield byte

def take_slice(in_generator, size):
    return ''.join(str(chr(i)) for i in itertools.islice(in_generator, size))

def take_single_pixel(in_generator):
    pix = itertools.islice(in_generator, 2)
    hex_list = [hex(i) for i in pix]
    hex_str = "".join(hex_list)[2:].replace("0x", '')
    intval = int(hex_str, 16)
    print("hex_list: ", hex_list)
    print("hex_str: ", hex_str)
    print("intval: ", intval)
After I get the header correctly using the take_slice method, I get to the part with the pixel values, where I use the take_single_pixel method.
Here, I get the bad results.
This is what I get:
hex_list: ['0xd4', '0x0']
hex_str: d40
intval: 3392
But the actual sequence of bytes that should be interpreted is: \xd4\x00, which equals to 54272, so that my hex_list = ['0xd4', '0x00'] and hex_str = d400.
Something goes wrong whenever the second byte of the pair is \x00.
Got any ideas? Thanks!
There are much better ways of converting bytes to integers:
int.from_bytes() takes bytes input, and a byte order argument:
>>> int.from_bytes(b"\xd4\x00", 'big')
54272
>>> int.from_bytes(b"\xd4\x00", 'little')
212
The struct.unpack() function lets you convert a whole series of bytes to integers following a pattern:
>>> import struct
>>> struct.unpack('!4H', b'\xd4\x00\xd4\x00\xd4\x00\xd4\x00')
(54272, 54272, 54272, 54272)
The array module lets you read binary data representing homogeneous integer data into a memory structure efficiently:
>>> import array
>>> arr = array.array('H')
>>> arr.fromfile(fileobject, number_of_values)
However, array can't be told what byte order to use. You'd have to determine the current architecture byte order and call arr.byteswap() to reverse order if the machine order doesn't match the file order.
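For instance, a minimal sketch along those lines (the file name and item count here are placeholders, and the data is assumed to be unsigned 16-bit and big-endian in the file):

import array
import sys

arr = array.array('H')               # unsigned 16-bit integers
with open('pixels.bin', 'rb') as f:  # hypothetical file, header already stripped
    arr.fromfile(f, 100)             # read 100 values
if sys.byteorder == 'little':        # file assumed big-endian, so swap if needed
    arr.byteswap()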
When reading image data, it is almost always preferable to use the struct module to do the parsing. You generally then use file.read() calls with specific sizes; if the header consists of 10 bytes, use:
headerinfo = struct.unpack('<expected header pattern for 10 bytes>', f.read(10))
and go from there. For examples, look at the Pillow / PIL image plugins source code; here is how the Blizzard Mipmap image format header is read:
def _read_blp_header(self):
    self._blp_compression, = struct.unpack("<i", self.fp.read(4))
    self._blp_encoding, = struct.unpack("<b", self.fp.read(1))
    self._blp_alpha_depth, = struct.unpack("<b", self.fp.read(1))
    self._blp_alpha_encoding, = struct.unpack("<b", self.fp.read(1))
    self._blp_mips, = struct.unpack("<b", self.fp.read(1))
    self._size = struct.unpack("<II", self.fp.read(8))
    if self.magic == b"BLP1":
        # Only present for BLP1
        self._blp_encoding, = struct.unpack("<i", self.fp.read(4))
        self._blp_subtype, = struct.unpack("<i", self.fp.read(4))
    self._blp_offsets = struct.unpack("<16I", self.fp.read(16 * 4))
    self._blp_lengths = struct.unpack("<16I", self.fp.read(16 * 4))
Because struct.unpack() always returns tuples, you can assign individual elements of the tuple to name1, name2, ... names on the left-hand side, including single_name, = assignments to extract a single result.
The separate set of read calls above could also be compressed into fewer calls:
comp, enc, adepth, aenc, mips, *size = struct.unpack("<i4b2I", self.fp.read(16))
if self.magic == b"BLP1":
    # Only present for BLP1
    enc, subtype = struct.unpack("<2i", self.fp.read(8))
followed by specific attribute assignments.
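Applied to the pixel data in the question, a minimal sketch (assuming the pixels really are unsigned 16-bit, big-endian, as b"\xd4\x00" -> 54272 suggests) could read straight from the file object instead of the byte generator:

import struct

def take_single_pixel(in_file):
    raw = in_file.read(2)
    if len(raw) < 2:
        return None                     # end of data
    return struct.unpack('>H', raw)[0]  # big-endian unsigned 16-bit

# equivalently, without struct: int.from_bytes(raw, 'big')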
Where am I wrong? I want to create a basic white picture from bytes
from PIL import Image

if __name__ == "__main__":
    data = [chr(1)] * 8192
    data = "".join(data)
    im = Image.frombytes('1', (128,64), data, 'raw')
    im = im.convert("RGB")
    im.save("image.png", "PNG")
But I get this:
Just use Image.new instead:
im = Image.new(mode='RGB', size=(128,64), color=(255,255,255))
If you really want to make it from bytes, it would be like this:
Image.frombytes(mode='RGB', size=(128,64), data=b'\xff'*128*64*3)
edit: Image.frombytes expects bytes, not a list of integers. To convert a list of integers to the right type, use this:
>>> bytes([0,1,2]) # Python 3
b'\x00\x01\x02'
>>> bytes(bytearray([0,1,2])) # Python 2
'\x00\x01\x02'
edit 2: either mode='1' or the docs have a bug (see comment thread). Assuming you have a list of zeros and ones, one per pixel (8192 elements for 128x64), and you want to convert this to a 128x64 monochromatic image (one bit per pixel), then you'll have to pack the bits into bytes manually:
bits = [int(not (y%13 and x%7)) for x in range(64) for y in range(128)]
# asymmetric grid
octets = [bits[i:i+8] for i in range(0, len(bits), 8)]
def bits2byte(bits8):
    result = 0
    for bit in bits8:
        result <<= 1
        result |= bit
    return result
data = bytes(bytearray([bits2byte(octet) for octet in octets]))
im = Image.frombytes(mode='1', size=(128,64), data=data)
im.show()
Result:
In mode '1' each byte represents 8 pixels (there may be zero padding at the end of each row if the width is not divisible by 8). So to get a white image, you have to pass in only b'\xff' bytes:
data = b'\xff' * 1024
im = Image.frombytes('1', (128,64), data)
Even though the Pillow docs say that there's one pixel per byte in this mode, that is not true for the frombytes and tobytes methods, at least.
Any other repeating input other than \xff (all white) or \x00 (all black) will give some sort of pinstripe pattern, like the one in your question.
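As a small illustration of that last point (my own example, not from the original answer), a repeating b'\xaa' byte is the bit pattern 10101010, so every other pixel is set and the image comes out as vertical stripes:

from PIL import Image

data = b'\xaa' * 1024                       # 10101010 repeated, 1024 bytes for 128x64 pixels
im = Image.frombytes('1', (128, 64), data)  # alternating on/off pixels in every row
im.show()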
I'm still pretty new to python. I've been able to write a program that will read in a file from binary and stores the data that's there in a few arrays. Now that I've been able to complete a few other tasks with this program, I'm trying to go back through all my code and see where I can make it more efficient, learning Python better along the way. In particular, I'm trying to update the reading and storing of data from the file. Using numpy's fromfile is MUCH, MUCH faster at unpacking data than the struct.unpack method, and works wonderfully for a 1D array structure. However, I have some of the data stored in 2D arrays. I am seemingly stuck on how to implement the same type of storing in the 2D array. Does anyone have any ideas or hints as to how I may be able to perform this?
My basic program structure is as follows:
from numpy import fromfile
import numpy as np
file = open(theFilePath,'rb')
####### File Header #########
reservedParse = 4
fileHeaderBytes = 4 + int(131072/reservedParse) #Parsing out the bins in the file header
fileHeaderArray = np.zeros(fileHeaderBytes)
fileHeaderArray[0] = fromfile(file, dtype='<I', count=1) #File Index; 4 Bytes
fileHeaderArray[1] = fromfile(file, dtype='<I', count=1) #The Number of Packets; 4 bytes
fileHeaderArray[2] = fromfile(file, dtype='<Q', count=1) #Timestamp; 16 bytes; 2, 8-byte.
fileHeaderArray[3] = fromfile(file, dtype='<Q', count=1)
fileHeaderArray[4:] = fromfile(file, dtype='<I', count=int(131072/reservedParse)) #Empty header space
####### Data Packets #########
#Note: Each packet begins with a header containing information about the data stream followed by the data.
packets = int(fileHeaderArray[1]) #The number of packets in the data stream
dataLength = int(28672)
packHeader = np.zeros(14*packets).reshape((packets,14))
data = np.zeros(dataLength*packets).reshape((packets,dataLength))
for i in range(packets):
    packHeader[i][0] = fromfile(file, dtype='>H', count=1) #Model Num
    packHeader[i][1] = fromfile(file, dtype='>H', count=1) #Packet ID
    packHeader[i][2] = fromfile(file, dtype='>I', count=1) #Coarse Timestamp
    ....#Continuing on
    packHeader[i][13] = fromfile(file, dtype='>i', count=1) #4 bytes of reserved space
    data[i] = fromfile(file, dtype='<h', count=dataLength) #Actual data
Essentially this is what I have right now. Is there a way I can do this without doing the loop? Going through that loop does not seem particularly fast or numpy-ish.
For reference, the for-loop structure using unpack and not numpy is:
packHeader = [[0 for x in range(14)] for y in range(packets)]
data = [[0 for x in range(dataLength)] for y in range(packets)]
for i in range(packets):
    packHeader[i][0] = unpack('>H', file.read(2)) #Model Num
    packHeader[i][1] = unpack('>H', file.read(2)) #Packet ID
    packHeader[i][2] = unpack('>I', file.read(4)) #Coarse Timestamp
    ....#Continuing on
    packHeader[i][13] = unpack('>i', file.read(4)) #4 bytes of reserved space
    packHeader[i] = list(chain(*packHeader[i])) #Deals with the tuple issue ((x,),(y,),...) -> (x,y,...)
    data[i] = [unpack('<h', file.read(2)) for j in range(dataLength)] #Actual data
    data[i] = list(chain(*data[i])) #Deals with the tuple issue ((x,),(y,),...) -> (x,y,...)
Let me know if any clarification is needed.
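One direction worth exploring, sketched below, is to describe one whole packet (header plus samples) with a numpy structured dtype and let fromfile read every packet in a single call. The field names, the 'rest' pad size, and the layout beyond the first three header fields are placeholders/assumptions, since only part of the packet format is shown above:

import numpy as np

dataLength = 28672
packet_dtype = np.dtype([
    ('model',     '>u2'),                # Model Num
    ('packet_id', '>u2'),                # Packet ID
    ('coarse_ts', '>u4'),                # Coarse Timestamp
    ('rest',      'V40'),                # placeholder: remaining header bytes (size is a guess)
    ('data',      '<i2', (dataLength,))  # the actual samples
])

# 'file' is the already-open handle, positioned just after the file header
records = np.fromfile(file, dtype=packet_dtype, count=packets)
data = records['data']        # shape (packets, dataLength), no Python loop
model_nums = records['model'] # each header field is a 1D array of length 'packets'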
I have a python script that receives chunks of binary raw audio data and I would like to change the sample rate of those chunks to 16000 and then pipe them to another component.
I tried my luck with audiotools but without success:
# f is a filelike FIFO buffer
reader = PCMFileReader(f, 44100, 1, 1, 16)
conv = PCMConverter(reader, 16000, 1, 1, 16)
Then I just write to the buffer anytime, I get a new chunk:
f.write(msg)
And read from the buffer in another thread:
while not reader.file.closed:
    fl = conv.read(10)
    chunk = fl.to_bytes(False, True)
The problem is that I get this value error, which seems to come from a "samplerate.c" library:
ValueError: SRC_DATA->data_out is NULL
This error only occurs with resampling. If I turn off that step, then everything works fine and I get playable audio.
Therefore my question: what would be a good tool for this task? And if audiotools turns out to be the right answer, how do I do this correctly?
Here is a simplified resampler. Its arguments are: dataFormat, the number of bytes per sample frame in the stream (e.g. stereo 16-bit would be 4); original_samples, the number of samples in the source binary string; desired_samples, the number of samples wanted in the output (for a 16 kHz -> 44.1 kHz conversion, e.g. original = 160 and desired = 441); and pcm, the source binary string. The return value is the resampled binary string:
import itertools

def resampleSimplified(pcm, desired_samples, original_samples, dataFormat):
    samples_to_pad = desired_samples - original_samples

    q, r = divmod(desired_samples, original_samples)
    times_to_pad_up = q + int(bool(r))
    times_to_pad_down = q

    pcmList = [pcm[i:i+dataFormat] for i in range(0, len(pcm), dataFormat)]

    if samples_to_pad > 0:
        # extending pcm times_to_pad times
        pcmListPadded = list(itertools.chain.from_iterable(
            itertools.repeat(x, times_to_pad_up) for x in pcmList)
        )
    else:
        # shrinking pcm times_to_pad times
        if times_to_pad_down > 0:
            pcmListPadded = pcmList[::(times_to_pad_down)]
        else:
            pcmListPadded = pcmList

    padded_pcm = ''.join(pcmListPadded[:desired_samples])
    return padded_pcm
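Another option for this kind of streaming setup is the standard-library audioop module, whose ratecv function keeps its filter state between calls, so successive chunks can be resampled consistently. A minimal sketch, assuming 16-bit mono PCM chunks going from 44100 Hz to 16000 Hz:

import audioop

def resample_chunk(chunk, state, inrate=44100, outrate=16000):
    # width=2 -> 16-bit samples, 1 channel; state carries the filter history between chunks
    converted, state = audioop.ratecv(chunk, 2, 1, inrate, outrate, state)
    return converted, state

# usage: start with state = None and feed each incoming chunk through the same state
# out, state = resample_chunk(msg, state)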
I want to know how to get samples out of a .wav file in order to perform windowed join of two .wav files.
Can anyone please tell me how to do this?
The wave module of the standard library is the key: after of course import wave at the top of your code, wave.open('the.wav', 'r') returns a "wave read" object from which you can read frames with the .readframes method, which returns a string of bytes which are the samples... in whatever format the wave file has them (you can determine the two parameters relevant to decomposing frames into samples with the .getnchannels method for the number of channels, and .getsampwidth for the number of bytes per sample).
The best way to turn the string of bytes into a sequence of numeric values is with the array module, and a type of (respectively) 'B', 'H', 'L' for 1, 2, 4 bytes per sample (on a 32-bit build of Python; you can use the itemsize value of your array object to double-check this). If you have different sample widths than array can provide you, you'll need to slice up the byte string (padding each little slice appropriately with bytes worth 0) and use the struct module instead (but that's clunkier and slower, so use array instead if you can).
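For example, a minimal sketch along those lines (note that 16-bit WAV samples are signed, so the 'h' type code is used here rather than 'H'):

import array
import wave

w = wave.open('the.wav', 'r')
sampwidth = w.getsampwidth()
nchannels = w.getnchannels()
frames = w.readframes(w.getnframes())
w.close()

# one array entry per sample; channels are interleaved for stereo files
samples = array.array({1: 'b', 2: 'h', 4: 'l'}[sampwidth], frames)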
You can use the wave module. First you should read the metadata, such as the sample size or the number of channels. Using the readframes() method, you can read samples, but only as a byte string. Based on the sample format, you have to convert them to numeric samples using struct.unpack().
Alternatively, if you want the samples as an array of floating-point numbers, you can use SciPy's io.wavfile module.
Here's a function to read samples from a wave file (tested with mono & stereo):
def read_samples(wave_file, nb_frames):
    frame_data = wave_file.readframes(nb_frames)
    if frame_data:
        sample_width = wave_file.getsampwidth()
        nb_samples = len(frame_data) // sample_width
        format = {1:"%db", 2:"<%dh", 4:"<%dl"}[sample_width] % nb_samples
        return struct.unpack(format, frame_data)
    else:
        return ()
And here's the full script that does windowed mixing or concatenating of multiple .wav files. All input files need to have the same params (# of channels and sample width).
import argparse
import itertools
import struct
import sys
import wave

def _struct_format(sample_width, nb_samples):
    return {1:"%db", 2:"<%dh", 4:"<%dl"}[sample_width] % nb_samples

def _mix_samples(samples):
    return sum(samples)//len(samples)

def read_samples(wave_file, nb_frames):
    frame_data = wave_file.readframes(nb_frames)
    if frame_data:
        sample_width = wave_file.getsampwidth()
        nb_samples = len(frame_data) // sample_width
        format = _struct_format(sample_width, nb_samples)
        return struct.unpack(format, frame_data)
    else:
        return ()

def write_samples(wave_file, samples, sample_width):
    format = _struct_format(sample_width, len(samples))
    frame_data = struct.pack(format, *samples)
    wave_file.writeframes(frame_data)

def compatible_input_wave_files(input_wave_files):
    nchannels, sampwidth, framerate, nframes, comptype, compname = input_wave_files[0].getparams()
    for input_wave_file in input_wave_files[1:]:
        nc,sw,fr,nf,ct,cn = input_wave_file.getparams()
        if (nc,sw,fr,ct,cn) != (nchannels, sampwidth, framerate, comptype, compname):
            return False
    return True

def mix_wave_files(output_wave_file, input_wave_files, buffer_size):
    output_wave_file.setparams(input_wave_files[0].getparams())
    sampwidth = input_wave_files[0].getsampwidth()
    max_nb_frames = max([input_wave_file.getnframes() for input_wave_file in input_wave_files])
    for frame_window in xrange(max_nb_frames // buffer_size + 1):
        all_samples = [read_samples(wave_file, buffer_size) for wave_file in input_wave_files]
        mixed_samples = [_mix_samples(samples) for samples in itertools.izip_longest(*all_samples, fillvalue=0)]
        write_samples(output_wave_file, mixed_samples, sampwidth)

def concatenate_wave_files(output_wave_file, input_wave_files, buffer_size):
    output_wave_file.setparams(input_wave_files[0].getparams())
    sampwidth = input_wave_files[0].getsampwidth()
    for input_wave_file in input_wave_files:
        nb_frames = input_wave_file.getnframes()
        for frame_window in xrange(nb_frames // buffer_size + 1):
            samples = read_samples(input_wave_file, buffer_size)
            if samples:
                write_samples(output_wave_file, samples, sampwidth)

def argument_parser():
    parser = argparse.ArgumentParser(description='Mix or concatenate multiple .wav files')
    parser.add_argument('command', choices = ("mix", "concat"), help='command')
    parser.add_argument('output_file', help='output .wav file')
    parser.add_argument('input_files', metavar="input_file", help='input .wav files', nargs="+")
    parser.add_argument('--buffer_size', type=int, help='nb of frames to read per iteration', default=1000)
    return parser

if __name__ == '__main__':
    args = argument_parser().parse_args()

    input_wave_files = [wave.open(name,"rb") for name in args.input_files]
    if not compatible_input_wave_files(input_wave_files):
        print "ERROR: mixed wave files must have the same params."
        sys.exit(2)

    output_wave_file = wave.open(args.output_file, "wb")
    if args.command == "mix":
        mix_wave_files(output_wave_file, input_wave_files, args.buffer_size)
    elif args.command == "concat":
        concatenate_wave_files(output_wave_file, input_wave_files, args.buffer_size)

    output_wave_file.close()
    for input_wave_file in input_wave_files:
        input_wave_file.close()
After reading the samples (for example with the wave module, more details here) you may want to have the values scaled between -1 and 1 (this is the convention for audio signals).
In this case, you can add:
# scale to -1.0 -- 1.0
max_nb_bit = float(2**(nb_bits-1))
samples = signal_int / (max_nb_bit + 1.0)
where nb_bits is the bit depth and signal_int holds the integer values.
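Putting that together, a minimal sketch for a 16-bit file (so nb_bits = 16 and each sample is a little-endian signed short) might look like:

import struct
import wave

w = wave.open('the.wav', 'r')
nb_bits = w.getsampwidth() * 8                 # 16 for a 16-bit file
raw = w.readframes(w.getnframes())
w.close()

signal_int = struct.unpack('<%dh' % (len(raw) // 2), raw)

# scale to -1.0 -- 1.0
max_nb_bit = float(2 ** (nb_bits - 1))
samples = [s / (max_nb_bit + 1.0) for s in signal_int]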