Python Wave byte data - python

I'm trying to read the data from a .wav file.
import wave
wr = wave.open("~/01 Road.wav", 'r')
# sample width is 2 bytes
# number of channels is 2
wave_data = wr.readframes(1)
print(wave_data)
This gives:
b'\x00\x00\x00\x00'
Which is the "first frame" of the song. These 4 bytes obviously correspond to the (2 channels * 2 byte sample width) bytes per frame, but what does each byte correspond to?
In particular, I'm trying to convert it to a mono amplitude signal.

If you want to understand what the 'frame' is you will have to read the standard of the wave file format. For instance: https://web.archive.org/web/20140221054954/http://home.roadrunner.com/~jgglatt/tech/wave.htm
From that document:
The sample points that are meant to be "played" ie, sent to a Digital to Analog Converter(DAC) simultaneously are collectively called a sample frame. In the example of our stereo waveform, every two sample points makes up another sample frame. This is illustrated below for that stereo example.
sample sample sample
frame 0 frame 1 frame N
_____ _____ _____ _____ _____ _____
| ch1 | ch2 | ch1 | ch2 | . . . | ch1 | ch2 |
|_____|_____|_____|_____| |_____|_____|
_____
| | = one sample point
|_____|
To convert to mono you could do something like this,
import wave
def stereo_to_mono(hex1, hex2):
"""average two hex string samples"""
return hex((ord(hex1) + ord(hex2))/2)
wr = wave.open('piano2.wav','r')
nchannels, sampwidth, framerate, nframes, comptype, compname = wr.getparams()
ww = wave.open('piano_mono.wav','wb')
ww.setparams((1,sampwidth,framerate,nframes,comptype,compname))
frames = wr.readframes(wr.getnframes()-1)
new_frames = ''
for (s1, s2) in zip(frames[0::2],frames[1::2]):
new_frames += stereo_to_mono(s1,s2)[2:].zfill(2).decode('hex')
ww.writeframes(new_frames)
There is no clear-cut way to go from stereo to mono. You could just drop one channel. Above, I am averaging the channels. It all depends on your application.

For wav file IO I prefer to use scipy. It is perhaps overkill for reading a wav file, but generally after reading the wav it is easier to do downstream processing.
import scipy.io.wavfile
fs1, y1 = scipy.io.wavfile.read(filename)
From here the data y1, will be N samples long, and will have Z columns where each column corresponds to a channel. To convert to a mono wav file you don't say how you'd like to do that conversion. You can take the average, or whatever else you'd like. For average use
monoChannel = y1.mean(axis=1)

As a direct answer to your question: two bytes make one 16-bit integer value in the "usual" way, given by the explicit formula: value = ord(data[0]) + 256 * ord(data[1]). But using the struct module is a better way to decode (and later reencode) such multibyte integers:
import struct
print(struct.unpack("HH", b"\x00\x00\x00\x00"))
# -> gives a 2-tuple of integers, here (0, 0)
or, if we want a signed 16-bit integer (which I think is the case in .wav files), use "hh" instead of "HH". (I leave to you the task of figuring out how exactly two bytes can encode an integer value from -32768 to 32767 :-)

Another way to convert 2 bytes into an int16, use numpy.fromstring(). Here's an example:
audio_sample is from a wav file.
>>> audio_sample[0:8]
b'\x8b\xff\xe1\xff\x92\xffn\xff'
>>> x = np.fromstring(audio_sample, np.int16)
>>> x[0:4]
array([-117, -31, -110, -146], dtype=int16)
You can use np.tobytes to convert back to bytes

Related

Python Pillow Image.frombytes mode '1' bad result

Where am I wrong ? I want to create a basic white pict from bytes
from PIL import Image
if __name__ == "__main__":
data = [chr(1)] * 8192
data = "".join(data)
im = Image.frombytes('1', (128,64), data, 'raw')
im = im.convert("RGB")
im.save("image.png", "PNG")
But I get this:
Just use Image.new instead:
im = Image.new(mode='RGB', size=(128,64), color=(255,255,255))
If you really want to make it from bytes, it would be like this:
Image.frombytes(mode='RGB', size=(128,64), data=b'\xff'*128*64*3)
edit: Image.frombytes expects bytes, not a list of integers. To convert a list of integers to the right type, use this:
>>> bytes([0,1,2]) # Python 3
b'\x00\x01\x02'
>>> bytes(bytearray([0,1,2])) # Python 2
'\x00\x01\x02'
edit 2: mode='1' or the docs have bug (see comment thread). Assuming you have a list of zeros and ones, 1024 elements long, and you want to convert this to an 128x64 monochromatic image (one bit per pixel) then you'll have to pack the bytes manually:
bits = [int(not (y%13 and x%7)) for x in range(64) for y in range(128)]
# asymmetric grid
octets = [bits[i:i+8] for i in range(0, len(bits), 8)]
def bits2byte(bits8):
result = 0
for bit in bits8:
result <<= 1
result |= bit
return result
data = bytes(bytearray([bits2byte(octet) for octet in octets]))
im = Image.frombytes(mode='1', size=(128,64), data=data)
im.show()
Result:
In mode 1 each byte represents 8 pixels (there might be zero padding at end of each row if the width does not divide by 8). So to get a white image, you have to pass in only the byte b'\xff'
data = b'\xff' * 1024
im = Image.frombytes('1', (128,64), data)
Even if the Pillow docs say that there's one pixel per byte in this mode, that is not true for the frombytes and tobytes methods, at least.
Any other repeating input other than \xff (all white) or \x00 (all black) will give some sort of pinstripe pattern, like the one in your question.

Python: reading 12 bit packed binary image

I have a 12 bit packed image from a GigE camera. It is a little-endian file and each 3 bytes hold 2 12-bit pixels.
I am trying to read this image using python and I tried something like this:
import bitstring
import numpy
with open('12bitpacked1.bin', 'rb') as f:
data = f.read()
ii=numpy.zeros(2*len(data)/3)
ic = 0
for oo in range(0,len(data)/3):
aa = bitstring.Bits(bytes=data[oo:oo+3], length=24)
ii[ic],ii[ic+1] = aa.unpack('uint:12,uint:12')
ic=ic+2
b = numpy.reshape(ii,(484,644))
In short: I read 3 bytes, convert them to bits and then unpack them as two 12-bit integers.
The result is, however, very different from what it should be. It looks like the image is separated into four quarters, each of them expanded to full image size and then overlapped.
What am I doing wrong here?
Update: Here are the test files:
12-bit packed
12-bit normal
They will not be identical, but they should show the same image. 12-bit normal has 12-bit pixel as uint16.
with open('12bit1.bin', 'rb') as f:
a = numpy.fromfile(f, dtype=numpy.uint16)
b = numpy.reshape(a,(484,644))
With this code
for oo in range(0,len(data)/3):
aa = bitstring.Bits(bytes=data[oo:oo+3], length=24)
you are reading bytes data[0:3], data[1:4], ... What you actually want is probably this:
for oo in range(0,len(data)/3):
aa = bitstring.Bits(bytes=data[3*oo:3*oo+3], length=24)
[EDIT]
Even more compact would be this:
for oo in range(0,len(data)-2,3):
aa = bitstring.Bits(bytes=data[oo:oo+3], length=24)

How to byte-swap a 32-bit integer in python?

Take this example:
i = 0x12345678
print("{:08x}".format(i))
# shows 12345678
i = swap32(i)
print("{:08x}".format(i))
# should print 78563412
What would be the swap32-function()? Is there a way to byte-swap an int in python, ideally with built-in tools?
One method is to use the struct module:
def swap32(i):
return struct.unpack("<I", struct.pack(">I", i))[0]
First you pack your integer into a binary format using one endianness, then you unpack it using the other (it doesn't even matter which combination you use, since all you want to do is swap endianness).
Big endian means the layout of a 32 bit int has the most significant byte first,
e.g. 0x12345678 has the memory layout
msb lsb
+------------------+
| 12 | 34 | 56 | 78|
+------------------+
while on little endian, the memory layout is
lsb msb
+------------------+
| 78 | 56 | 34 | 12|
+------------------+
So you can just convert between them with some bit masking and shifting:
def swap32(x):
return (((x << 24) & 0xFF000000) |
((x << 8) & 0x00FF0000) |
((x >> 8) & 0x0000FF00) |
((x >> 24) & 0x000000FF))
From python 3.2 you can define function swap32() as the following:
def swap32(x):
return int.from_bytes(x.to_bytes(4, byteorder='little'), byteorder='big', signed=False)
It uses array of bytes to represent the value and reverses order of bytes by changing endianness during conversion back to integer.
Maybe simpler use the socket library.
from socket import htonl
swapped = htonl (i)
print (hex(swapped))
that's it.
this library also works in the other direction with ntohl
The array module provides a byteswap() method for fixed sized items.
The array module appears to be in versions back to Python 2.7
array.byteswap()
“Byteswap” all items of the array. This is only supported for values which are 1, 2, 4, or 8 bytes in size;
Along with the fromfile() and tofile() methods, this module is quite easy to use:
import array
# Open a data file.
input_file = open( 'my_data_file.bin' , 'rb' )
# Create an empty array of unsigned 4-byte integers.
all_data = array.array( 'L' )
# Read all the data from the file.
all_data.fromfile( input_file , 16000 ) # assumes the size of the file
# Swap the bytes in all the data items.
all_data.byteswap( )
# Write all the data to a new file.
output_file = open( filename[:-4] + '.new' , 'wb' ) # assumes a three letter extension
all_data.tofile( output_file )
# Close the files.
input_file.close( )
output_file_close( )
The above code worked for me since I have fixed-size data files. There are more Pythonic ways to handle variable length files.

Unpacking mono channel wave data and storing it in an array

I'm trying to unpack the data from a single channel WAVE file using struct.unpack. I want to store the data in an array and be able to manipulate it (say by adding noise of given variance). I have extracted the header data and stored it in a dictionary as follows:
stHeaderFields['ChunkSize'] = struct.unpack('<L', bufHeader[4:8])[0]
stHeaderFields['Format'] = bufHeader[8:12]
stHeaderFields['Subchunk1Size'] = struct.unpack('<L', bufHeader[16:20])[0]
stHeaderFields['AudioFormat'] = struct.unpack('<H', bufHeader[20:22])[0]
stHeaderFields['NumChannels'] = struct.unpack('<H', bufHeader[22:24])[0]
stHeaderFields['SampleRate'] = struct.unpack('<L', bufHeader[24:28])[0]
stHeaderFields['ByteRate'] = struct.unpack('<L', bufHeader[28:32])[0]
stHeaderFields['BlockAlign'] = struct.unpack('<H', bufHeader[32:34])[0]
stHeaderFields['BitsPerSample'] = struct.unpack('<H', bufHeader[34:36])[0]
When I pass in a file I get the following output:
NumChannels: 1
ChunkSize: 78476
BloackAlign: 0
Filename: foo.wav
ByteRate: 32000
BlockAlign: 2
AudioFormat: 1
SampleRate: 16000
BitsPerSample: 16
Format: WAVE
Subchunk1Size: 16
I then try to get the data by doing struct.unpack('<h', self.bufHeader[36:])[0] but doing this returns a simple integer value 24932. I'm not allowed to use the wave library or anything else to do with waves specifically as I Will have to adapt this to other sorts of signals. How can I store and manipulate the actual wave data?
EDIT:
while chunk_reader < stHeaderFields['ChunkSize']:
data.append(struct.unpack('<H', bufHeader[chunk_reader:chunk_reader+stHeaderFields['BlockAlign']]))
Okay, I'll try to write a complete walk-through.
First, it is a common mistake to treat WAV (or, more likely, RIFF) file as a linear structure. It is actually a tree, with each element having a 4-byte tag, 4-byte length of data and/or child elements, and some kind of data inside.
It is just common for WAV files to have only two child elements ('fmt ' and 'data'), but it also may have metadata ('LIST') with some child elements ('INAM', 'IART', 'ICMT' etc.) or some other elements. Also there's no actual order requirement for blocks, so it is incorrect to think that 'data' follows 'fmt ', because metadata may stick in between.
So let's look at the RIFF file:
'RIFF'
|-- file type ('WAVE')
|-- 'fmt '
| |-- AudioFormat
| |-- NumChannels
| |-- ...
| L_ BitsPerSample
|-- 'LIST' (optional)
| |-- ... (other tags)
| L_ ... (other tags)
L_ 'data'
|-- sample 1 for channel 1
|-- ...
|-- sample 1 for channel N
|-- sample 2 for channel 1
|-- ...
|-- sample 2 for channel N
L_ ...
So, how should you read a WAV file? Well, first you need to read 4 bytes from the beginning of the file and make sure it is RIFF or RIFX tag, otherwise it is not a valid RIFF file. The difference between RIFF and RIFX is the former uses little-endian encoding (and is supported everywhere) while the latter uses big-endian (and virtually nobody supports it). For simplicity let's assume we're dealing only with little-endian RIFF files.
Next you read the root element length (in file endianness) and following file type. If file type is not WAVE, it is not a WAV file, so you might abandon further processing. After reading the root element, you start to read all child elements and process those you're interested in.
Reading fmt header is pretty straightforward, and you have actually done it in your code.
Data samples are usually represented as 1, 2, 3 or 4 bytes (again, in the file endianness). The most common format is a so-called s16_le (you might have seen such naming in some audio processing utilities like ffmpeg), which means samples are presented as signed 16-bit integers in little endian. Other possible formats are u8 (8-bit samples are unsigned numbers!), s24_le, s32_le. Data samples are interleaved, so it is easy to seek to arbitrary position in a stream even for multi-channel audio. Note: this is valid only for uncompressed WAV files, as indicated by AudioFormat == 1. For other formats data samples may have another layout.
So let's take a look at a simple WAV reader:
stHeaderFields = dict()
rawData = None
with open("file.wav", "rb") as f:
riffTag = f.read(4)
if riffTag != 'RIFF':
print 'not a valid RIFF file'
exit(1)
riffLength = struct.unpack('<L', f.read(4))[0]
riffType = f.read(4)
if riffType != 'WAVE':
print 'not a WAV file'
exit(1)
# now read children
while f.tell() < 8 + riffLength:
tag = f.read(4)
length = struct.unpack('<L', f.read(4))[0]
if tag == 'fmt ': # format element
fmtData = f.read(length)
fmt, numChannels, sampleRate, byteRate, blockAlign, bitsPerSample = struct.unpack('<HHLLHH', fmtData)
stHeaderFields['AudioFormat'] = fmt
stHeaderFields['NumChannels'] = numChannels
stHeaderFields['SampleRate'] = sampleRate
stHeaderFields['ByteRate'] = byteRate
stHeaderFields['BlockAlign'] = blockAlign
stHeaderFields['BitsPerSample'] = bitsPerSample
elif tag == 'data': # data element
rawData = f.read(length)
else: # some other element, just skip it
f.seek(length, 1)
Now we know file format info and its sample data, so we can parse it. As it was said, sample may have any size, but for now let's assume we're dealing only with 16-bit samples:
blockAlign = stHeaderFields['BlockAlign']
numChannels = stHeaderFields['NumChannels']
# some sanity checks
assert(stHeaderFields['BitsPerSample'] == 16)
assert(numChannels * stHeaderFields['BitsPerSample'] == blockAlign * 8)
for offset in range(0, len(rawData), blockAlign):
samples = struct.unpack('<' + 'h' * numChannels, rawData[offset:offset+blockAlign])
# now samples contains a tuple with sample values for each channel
# (in case of mono audio, you'll have a tuple with just one element).
# you may store it in the array for future processing,
# change and immediately write to another stream, whatever.
So now you have all the samples in rawData, and you may access and modify it as you like. It might be handy to use Python's array() to effectively access and modify data (but it won't do in case of 24-bit audio, you'll need to write your own serialization and deserialization).
After you've done with data processing (which may involve upscaling or downscaling the number of bits per sample, changing number of channels, sound levels manipulation etc.), you just write a new RIFF header with correct data length (usually may be computed with a simplified formula 36 + len(rawData)), an altered fmt header and data stream.
Hope this helps.

reorder byte order in hex string (python)

I want to build a small formatter in python giving me back the numeric
values embedded in lines of hex strings.
It is a central part of my formatter and should be reasonable fast to
format more than 100 lines/sec (each line about ~100 chars).
The code below should give an example where I'm currently blocked.
'data_string_in_orig' shows the given input format. It has to be
byte swapped for each word. The swap from 'data_string_in_orig' to
'data_string_in_swapped' is needed. In the end I need the structure
access as shown. The expected result is within the comment.
Thanks in advance
Wolfgang R
#!/usr/bin/python
import binascii
import struct
## 'uint32 double'
data_string_in_orig = 'b62e000052e366667a66408d'
data_string_in_swapped = '2eb60000e3526666667a8d40'
print data_string_in_orig
packed_data = binascii.unhexlify(data_string_in_swapped)
s = struct.Struct('<Id')
unpacked_data = s.unpack_from(packed_data, 0)
print 'Unpacked Values:', unpacked_data
## Unpacked Values: (46638, 943.29999999943209)
exit(0)
array.arrays have a byteswap method:
import binascii
import struct
import array
x = binascii.unhexlify('b62e000052e366667a66408d')
y = array.array('h', x)
y.byteswap()
s = struct.Struct('<Id')
print(s.unpack_from(y))
# (46638, 943.2999999994321)
The h in array.array('h', x) was chosen because it tells array.array to regard the data in x as an array of 2-byte shorts. The important thing is that each item be regarded as being 2-bytes long. H, which signifies 2-byte unsigned short, works just as well.
This should do exactly what unutbu's version does, but might be slightly easier to follow for some...
from binascii import unhexlify
from struct import pack, unpack
orig = unhexlify('b62e000052e366667a66408d')
swapped = pack('<6h', *unpack('>6h', orig))
print unpack('<Id', swapped)
# (46638, 943.2999999994321)
Basically, unpack 6 shorts big-endian, repack as 6 shorts little-endian.
Again, same thing that unutbu's code does, and you should use his.
edit Just realized I get to use my favorite Python idiom for this... Don't do this either:
orig = 'b62e000052e366667a66408d'
swap =''.join(sum([(c,d,a,b) for a,b,c,d in zip(*[iter(orig)]*4)], ()))
# '2eb60000e3526666667a8d40'
The swap from 'data_string_in_orig' to 'data_string_in_swapped' may also be done with comprehensions without using any imports:
>>> d = 'b62e000052e366667a66408d'
>>> "".join([m[2:4]+m[0:2] for m in [d[i:i+4] for i in range(0,len(d),4)]])
'2eb60000e3526666667a8d40'
The comprehension works for swapping byte order in hex strings representing 16-bit words. Modifying it for a different word-length is trivial. We can make a general hex digit order swap function also:
def swap_order(d, wsz=4, gsz=2 ):
return "".join(["".join([m[i:i+gsz] for i in range(wsz-gsz,-gsz,-gsz)]) for m in [d[i:i+wsz] for i in range(0,len(d),wsz)]])
The input params are:
d : the input hex string
wsz: the word-size in nibbles (e.g for 16-bit words wsz=4, for 32-bit words wsz=8)
gsz: the number of nibbles which stay together (e.g for reordering bytes gsz=2, for reordering 16-bit words gsz = 4)
import binascii, tkinter, array
from tkinter import *
infile_read = filedialog.askopenfilename()
with open(infile, 'rb') as infile_:
infile_read = infile_.read()
x = (infile_read)
y = array.array('l', x)
y.byteswap()
swapped = (binascii.hexlify(y))
This is a 32 bit unsigned short swap i achieved with code very much the same as "unutbu's" answer just a little bit easier to understand. And technically binascii is not needed for the swap. Only array.byteswap is needed.

Categories