I'm trying to unpack the data from a single channel WAVE file using struct.unpack. I want to store the data in an array and be able to manipulate it (say by adding noise of given variance). I have extracted the header data and stored it in a dictionary as follows:
stHeaderFields['ChunkSize'] = struct.unpack('<L', bufHeader[4:8])[0]
stHeaderFields['Format'] = bufHeader[8:12]
stHeaderFields['Subchunk1Size'] = struct.unpack('<L', bufHeader[16:20])[0]
stHeaderFields['AudioFormat'] = struct.unpack('<H', bufHeader[20:22])[0]
stHeaderFields['NumChannels'] = struct.unpack('<H', bufHeader[22:24])[0]
stHeaderFields['SampleRate'] = struct.unpack('<L', bufHeader[24:28])[0]
stHeaderFields['ByteRate'] = struct.unpack('<L', bufHeader[28:32])[0]
stHeaderFields['BlockAlign'] = struct.unpack('<H', bufHeader[32:34])[0]
stHeaderFields['BitsPerSample'] = struct.unpack('<H', bufHeader[34:36])[0]
When I pass in a file I get the following output:
NumChannels: 1
ChunkSize: 78476
BloackAlign: 0
Filename: foo.wav
ByteRate: 32000
BlockAlign: 2
AudioFormat: 1
SampleRate: 16000
BitsPerSample: 16
Format: WAVE
Subchunk1Size: 16
I then try to get the data by doing struct.unpack('<h', self.bufHeader[36:])[0] but doing this returns a simple integer value 24932. I'm not allowed to use the wave library or anything else to do with waves specifically as I Will have to adapt this to other sorts of signals. How can I store and manipulate the actual wave data?
EDIT:
while chunk_reader < stHeaderFields['ChunkSize']:
data.append(struct.unpack('<H', bufHeader[chunk_reader:chunk_reader+stHeaderFields['BlockAlign']]))
Okay, I'll try to write a complete walk-through.
First, it is a common mistake to treat WAV (or, more likely, RIFF) file as a linear structure. It is actually a tree, with each element having a 4-byte tag, 4-byte length of data and/or child elements, and some kind of data inside.
It is just common for WAV files to have only two child elements ('fmt ' and 'data'), but it also may have metadata ('LIST') with some child elements ('INAM', 'IART', 'ICMT' etc.) or some other elements. Also there's no actual order requirement for blocks, so it is incorrect to think that 'data' follows 'fmt ', because metadata may stick in between.
So let's look at the RIFF file:
'RIFF'
|-- file type ('WAVE')
|-- 'fmt '
| |-- AudioFormat
| |-- NumChannels
| |-- ...
| L_ BitsPerSample
|-- 'LIST' (optional)
| |-- ... (other tags)
| L_ ... (other tags)
L_ 'data'
|-- sample 1 for channel 1
|-- ...
|-- sample 1 for channel N
|-- sample 2 for channel 1
|-- ...
|-- sample 2 for channel N
L_ ...
So, how should you read a WAV file? Well, first you need to read 4 bytes from the beginning of the file and make sure it is RIFF or RIFX tag, otherwise it is not a valid RIFF file. The difference between RIFF and RIFX is the former uses little-endian encoding (and is supported everywhere) while the latter uses big-endian (and virtually nobody supports it). For simplicity let's assume we're dealing only with little-endian RIFF files.
Next you read the root element length (in file endianness) and following file type. If file type is not WAVE, it is not a WAV file, so you might abandon further processing. After reading the root element, you start to read all child elements and process those you're interested in.
Reading fmt header is pretty straightforward, and you have actually done it in your code.
Data samples are usually represented as 1, 2, 3 or 4 bytes (again, in the file endianness). The most common format is a so-called s16_le (you might have seen such naming in some audio processing utilities like ffmpeg), which means samples are presented as signed 16-bit integers in little endian. Other possible formats are u8 (8-bit samples are unsigned numbers!), s24_le, s32_le. Data samples are interleaved, so it is easy to seek to arbitrary position in a stream even for multi-channel audio. Note: this is valid only for uncompressed WAV files, as indicated by AudioFormat == 1. For other formats data samples may have another layout.
So let's take a look at a simple WAV reader:
stHeaderFields = dict()
rawData = None
with open("file.wav", "rb") as f:
riffTag = f.read(4)
if riffTag != 'RIFF':
print 'not a valid RIFF file'
exit(1)
riffLength = struct.unpack('<L', f.read(4))[0]
riffType = f.read(4)
if riffType != 'WAVE':
print 'not a WAV file'
exit(1)
# now read children
while f.tell() < 8 + riffLength:
tag = f.read(4)
length = struct.unpack('<L', f.read(4))[0]
if tag == 'fmt ': # format element
fmtData = f.read(length)
fmt, numChannels, sampleRate, byteRate, blockAlign, bitsPerSample = struct.unpack('<HHLLHH', fmtData)
stHeaderFields['AudioFormat'] = fmt
stHeaderFields['NumChannels'] = numChannels
stHeaderFields['SampleRate'] = sampleRate
stHeaderFields['ByteRate'] = byteRate
stHeaderFields['BlockAlign'] = blockAlign
stHeaderFields['BitsPerSample'] = bitsPerSample
elif tag == 'data': # data element
rawData = f.read(length)
else: # some other element, just skip it
f.seek(length, 1)
Now we know file format info and its sample data, so we can parse it. As it was said, sample may have any size, but for now let's assume we're dealing only with 16-bit samples:
blockAlign = stHeaderFields['BlockAlign']
numChannels = stHeaderFields['NumChannels']
# some sanity checks
assert(stHeaderFields['BitsPerSample'] == 16)
assert(numChannels * stHeaderFields['BitsPerSample'] == blockAlign * 8)
for offset in range(0, len(rawData), blockAlign):
samples = struct.unpack('<' + 'h' * numChannels, rawData[offset:offset+blockAlign])
# now samples contains a tuple with sample values for each channel
# (in case of mono audio, you'll have a tuple with just one element).
# you may store it in the array for future processing,
# change and immediately write to another stream, whatever.
So now you have all the samples in rawData, and you may access and modify it as you like. It might be handy to use Python's array() to effectively access and modify data (but it won't do in case of 24-bit audio, you'll need to write your own serialization and deserialization).
After you've done with data processing (which may involve upscaling or downscaling the number of bits per sample, changing number of channels, sound levels manipulation etc.), you just write a new RIFF header with correct data length (usually may be computed with a simplified formula 36 + len(rawData)), an altered fmt header and data stream.
Hope this helps.
Related
I want to share memory between a program in C and another in python.
The c program uses the following structure to define the data.
struct Memory_LaserFrontal {
char Data[372]; // original data
float Med[181]; // Measurements in [m]
charD; // 'I': Invalid -- 'V': Valid
charS; // 'L': Clean -- 'S': Dirty
char LaserStatus[2];
};
From python I have managed to read the variable in memory using sysv_ipc but they have no structure and is seen as a data array. How can I restructure them?
python code:
from time import sleep
import sysv_ipc
# Create shared memory object
memory = sysv_ipc.SharedMemory(1234)
# Read value from shared memory
memory_value = memory.read()
print (memory_value)
print (len(memory_value))
while True:
memory_value = memory.read()
print (float(memory_value[800]))
sleep(0.1)
I have captured and printed the data in python, I have modified the sensor reading and the read data is also modified, confirming that the read data corresponds to the data in the sensor's shared memory. But without the proper structure y cant use the data.
You need to unpack your binary data structure into Python types. The Python modules struct and array can do this for you.
import struct
import array
NB: Some C compilers, but not the comomn ones, may pad your member variables to align each of them with the expected width for your CPU ( almost always 4 bytes ). This means that it may add padding bytes. You may have to experiment with the struct format parameter 'x' between the appropriate parts of your struct if this is the case. Python's struct module does not expect aligned or padded types by default, you need to inform it. See my note at the very end for a guess on what the padding might look like. Again, per #Max's comment, this is unlikely.
NB: I think the members charD and charS are really char D; and char S;
Assuming you want the floats as a Python list or equivalent we have to do some more work with the Python array module . Same for the char[] Data.
# Get the initial char array - you can turn it into a string if you need to later.
my_chars = array.array("b") # f for float, b for byteetc.
my_chars.from_bytes(memory_value[:372]) # happens that 372 C chars is 372 bytes.
Data = my_chars.tolist() # Could be bytes list
# advance to the member after Data
end_of_Data = struct.calcsize("372c")
# get the length in bytes that 181 floats take up
end_of_Med = struct.calcsize("181f") + end_of_Data
# now we know where the floats are
floats_as_bytes = memory_value[ end_of_Data : end_of_Med ]
# unpack the remaining parts
( D, S, LaserStatus_1, LaserStatus_2 ) = struct.unpack( "cccc", memory_value[end_of_Med:] )
Now use the array module to unpack to make a Python list
my_floats = array.array("f") # f for float, c for char etc.
my_floats.from_bytes(floats_as_bytes)
Now Data might be a list of Python bytes type that you need to convert to your preferred string encoding. Usually .decode('utf-8') is good enough.
Data_S = "".join(Data).decode('utf-8') # get a usable string in Data_S
Padding
struct Memory_LaserFrontal {
char Data[372]; // 372 is a multiple of 4, probably no padding
float Med[181]; // floats are 4 bytes, probably no padding
charD; // single char, expect 3 padding bytes after
charS; // single char, expect 3 padding bytes after
char LaserStatus[2]; // double char expect 2 padding bytes after.
};
So the last Python line above might be - where the 'x' indicates a padding byte that can be ignored.
( D, S, LaserStatus_1, LaserStatus_2 ) = struct.unpack( "cxxxcxxxccxx", memory_value[end_of_Med:] )
I always like to leave the full source code of the problem solved so others can use it if they have a similar problem.
thanks a lot all!
from time import sleep
import sysv_ipc
import struct
import array
# Create shared memory object
while True:
memory = sysv_ipc.SharedMemory(1234)
# Read value from shared memory
memory_value = memory.read()
#print (memory_value)
#print (len(memory_value))
# Get the initial char array - you can turn it into a string if you need to later.
my_chars = array.array("b") # f for float, c for char etc.
#my_chars.from_bytes(memory_value[:372]) # happens that 372 chars is 372 bytes.
Data = my_chars.tolist() # Could be bytes list
# advance to the member after Data
end_of_Data = struct.calcsize("372c")
# get the length in bytes that 181 floats take up
end_of_Med = struct.calcsize("181f") + end_of_Data
# now we know where the floats are
floats_as_bytes = memory_value[ end_of_Data : end_of_Med ]
# unpack the remaining parts
( D, S, LaserStatus_1, LaserStatus_2 ) = struct.unpack( "cccc", memory_value[end_of_Med:] )
print(len(floats_as_bytes)/4)
a=[]
for i in range(0,len(floats_as_bytes),4):
a.append(struct.unpack('<f', floats_as_bytes[i:i+4]))
print (a[0])
sleep(0.1)
I'm a heavy LabVIEW user who is just starting to learn Python. I work with industrial and aerospace equipment a lot and something I need to do very often is process some data, then export it over some communications protocol in binary. For example, let's say I have a packet that contains a struct/cluster/other-complex-data-element that has the following underlying data elements:
sync - unsigned 32-bit integer
time - 64-bit double
payload ID - 16-bit signed integer
source - 16-but signed integer
destination - 16-bit signed integer
payload length - 32 bit signed integer
data 1 - 8-bit unsigned integer
data 2 - 8-bit unsigned integer
data 3 - 8-bit unsigned integer
data 4 - 8-bit unsigned integer
data 4 - 64-bit double
data 5 - 32-bit single
data 6 - 16 bit unsigned integer
crc - 32-bit unsigned integer
(This frame should be 42 bytes long)
I call this a frame, where there is some header information, a payload, then a crc, I think that's a common term for what I'm creating. The data types, and their location in the byte stream is critical. Any extraneous or missing bytes breaks the data transfer protocol and the data cannot be tolerated.
My question is this:
How do you achieve this easily in Python? In LabVIEW (and probably other languages), there are good, built in functions and methods to clearly define the data types, then flatten them to a string of bytes that is very efficient. It seems that with picking, there are things going on that I don't understand.
In my example code, I have a simple function to get some memory information, then serialize it. I would expect the integer version to have 88 bytes and the float version to have 172 bytes, but I get 87 and 115 respectively. Here is the code, thanks for your help!
import psutil
import time
import pickle
def getMemoryInfo():
while True:
virtual_memory = psutil.virtual_memory()
swap_memory = psutil.swap_memory()
memoryInfo = list(virtual_memory+swap_memory)
# memoryInfo = [float(x) for x in memoryInfo]
time.sleep(1.000)
print(memoryInfo)
string = pickle.dumps(memoryInfo)
print(string)
print(len(memoryInfo))
print(len(string))
getMemoryInfo()
The struct module worked for me. It was a little more tedious than I would have hoped, but it worked just fine. Here is the code I ended up using:
def build_frame(payload, payload_class, payload_id, source, destination):
# form frame header and payload
frame = {"sync": int(0x64617665),
"absolute_time": time.time(),
"relative_time": time.monotonic(),
"source": int(source),
"destination": int(destination),
"counter": 0,
"payload_class": int(payload_class),
"payload_id": int(payload_id),
"payload_length": int(len(payload)),
"payload": payload}
# form bytearray to crc
sync = (struct.pack('<I', frame['sync']))
absolute_time = (struct.pack('<d', frame['absolute_time']))
relative_time = (struct.pack('<d', frame['relative_time']))
source = (struct.pack('i', frame['source']))
destination = (struct.pack('i', frame['destination']))
counter = (struct.pack('I', frame['counter']))
payload_class = (struct.pack('i', frame['payload_class']))
payload_id = (struct.pack('i', frame['payload_id']))
payload_length = (struct.pack('i', frame['payload_length']))
payload_bytes = frame['payload']
crc_bytes = sync + absolute_time + relative_time + source + destination + counter + payload_class + payload_id + payload_length + payload_bytes
# crc bytes and add to frame
frame['crc'] = binascii.crc32(crc_bytes)
return frame
I saved one data set(200 double data values) from Keil, it turns to be a .hex file with IntelHex format, I installed IntelHex in python and load it. Now the problem is I do not know how to interpret it, for example, this post
tells you to use dict, but it does not work for hex file including double data. my code:
from intelhex import IntelHex
ih = IntelHex() # create empty object
ih.loadhex('output.hex')
ihdict = ih.todict()
datastr = ""
startAddress = 536871952
while ihdict.get(startAddress) != None:
datastr += str("%0.2X" %ihdict.get(startAddress))
startAddress += 1
the file output.hex:
:020000042000DA
:0802A8003FB7809F5BC03F409F
:1002B000DFB56EEF5AB73F407F717CBF38BE3F401D
:1002C000DFD369EFE9B43F407F717CBF38BE3F4068
:1002D0003F895E9F44AF3F401F706A0F38B53F4073
:1002E0009F20584F10AC3F405F5F72AF2FB93F4027
:1002F000DFB56EEF5AB73F40DF5B7DEFADBE3F40ED
:10030000BFA364DF51B23F40DF62676FB1B33F40CC
:100310001F9E8C0F4FC63F405F0C6B2F86B53F4032
:10032000BF7542DF3AA13F403F4D689F26B43F4032
:100330009F2742CF13A13F40DF2D5BEF96AD3F409B
:100340009F915ACF48AD3F40DF874CEF43A63F40D7
:100350007FD2573FE9AB3F40FD721E7F398F3F4050
:10036000FF892FFFC4973F409D5311CFA9883F407D
:100370001F706A0F38B53F407F78663F3CB33F40FF
:100380001DFD148F7E8A3F401F954F8FCAA73F40A7
:100390005FC04D2FE0A63F401F0D3C8F069E3F40A3
:1003A0007F4A443F25A23F40DFE13DEFF09E3F40C2
:1003B0003F185C1F0CAE3F403F79379FBC9B3F40CE
:1003C000FF2F3EFF179F3F40DFBC586F5EAC3F40A2
:1003D000FD36287F1B943F403F3D419F9EA03F40FC
:1003E000FFFA317FFD983F409FF2354FF99A3F4029
:1003F0007D0511BF82883F40DF703B6FB89D3F4055
:10040000FF1143FF88A13F40DD60146F308A3F40F9
:100410001F49328F24993F407D230CBF11863F40F6
:100420009DBD29CFDE943F40BFED2EDF76973F4044
:10043000DDBA056FDD823F407D58183F2C8C3F4070
:100440007F3333BF99993F40DD9C0A6F4E853F4013
:100450007F3333BF99993F403DBC171FDE8B3F4030
:10046000BDF4185F7A8C3F403D16091F8B843F40D6
:100470003DE1FC9E707E3F40DD0D0DEF86863F40E6
:100480003F1F469F0FA33F403DFFF79EFF7B3F402E
:10049000DD42196FA18C3F40BDC6F65E637B3F40D5
:1004A0009D5AFB4EAD7D3F40FD4BE6FE25733F4020
:1004B0001D12D30E89693F403D8EF51EC77A3F401D
:1004C0003DBC171FDE8B3F409D7FE0CE3F703F401D
:1004D000BD6C055FB6823F40DD4903EFA4813F401C
:1004E000DD14F76E8A7B3F40DDF6FB6EFB7D3F40FF
:1004F000BD20E85E10743F40DD6EE86E37743F400B
:100500001DB1F78ED87B3F407DFCD33EFE693F4056
:100510009D4C274FA6933F407DD7EEBE6B773F4063
:10052000DDAADE6E556F3F401D55B38EAA593F4080
:100530009DF7CCCE7B663F40DDCFC3EEE7613F4009
:10054000BDF2C55EF9623F40BD7AD95EBD6C3F40E9
:100550005D90D82E486C3F40BD7AD95EBD6C3F405F
:100560003D67BD9EB35E3F403D0DCC9E06663F405D
:10057000BD88AD5EC4563F40BDA6A85E53543F4003
:10058000BDD4CA5E6A653F40FD95B0FE4A583F4003
:100590003DA3B39ED1593F405DF89D2EFC4E3F4098
:1005A000FD69E1FEB4703F40FD59BAFE2C5D3F404D
:1005B000BD63C8DE31643F407DB0B63E585B3F400E
:1005C0001DCD9F8EE64F3F405DEAC92EF5643F404A
:1005D000FD4993FEA4493F405D8E852EC7423F40B2
:1005E0007D4D88BE26443F401D3EA20E1F513F4018
:1005F0003D938C9E49463F40FD7E9F7EBF4F3F40CE
:10060000DDB1C8EE58643F40BD7F70DE3F383F40EB
:100610005DBCA72EDE533F409D4197CEA04B3F408F
:100620003D0B799E853C3F403D0B799E853C3F408C
:10063000BD7F70DE3F383F407D6B83BEB5413F409C
:10064000FD32827E19413F409D2A864E15433F4030
:10065000FDEFA1FEF7503F401DC4620E62313F40E6
:100660003D476F9EA3373F401D98930ECC493F40B6
:100670001D53608E29303F403DF4671EFA333F40E2
:100680003D048F1E82473F407D726D3EB9363F402C
:10069000FDF68B7EFB453F40FD5767FEAB333F4089
:1006A0005D7774AE3B3A3F40BD2C695E96343F4067
:1006B0009DF579CEFA3C3F40DD4C476EA6233F4086
:1006C0001D1E540E0F2A3F407DE36FBEF1373F40A1
:1006D000FD7C4C7E3E263F40DD8153EEC0293F40ED
:1006E0009D034ECE01273F409D1375CE893A3F4072
:1006F000DDC4336EE2193F407D444B3EA2253F40AE
:100700005DD84F2EEC273F407D0F3FBE871F3F40F7
:100710007DF143BEF8213F40BDE04B5EF0253F40F8
:100720005D8548AE42243F405DD84F2EEC273F40C8
:100730007D5B5CBE2D2E3F40FD40567E202B3F4012
:10074000BD514EDE28273F40FD2945FE94223F4003
:100750001DD9208E6C103F409DE552CE72293F403E
:100760003D91399EC81C3F407D16293E8B143F4069
:100770007D6930BE34183F409D9935CECC1A3F403C
:100780005DFD34AE7E1A3F409DB730CE5B183F40D2
:100790003DAF349E571A3F405D8C322E46193F4084
:1007A0003D81129E40093F405DC8282E64143F40A1
:1007B0007D34243E1A123F405DC13EAE601F3F4073
:1007C0003D2E0B1E97053F405D6E372EB71B3F40F9
:1007D0003D09269E04133F403DCD2F9EE6173F4026
:1007E000BD6F49DEB7243F403D5C2D1EAE163F4035
:1007F0001DE00A0E70053F403DBD089E5E043F406F
:100800009D890ECE44073F401D86190EC30C3F4004
:10081000BDD70EDE6B073F401DBB258EDD123F406E
:100820001DBB258EDD123F407D06023E03013F4089
:10083000BDD0245E68123F40FDA81B7ED40D3F4012
:100840005D14462E0A233F40FD12347E091A3F40B4
:100850009D180C4E0C063F401D681E0E340F3F4085
:100860001D510D8EA8063F403DD4191EEA0C3F4095
:100870001B94ED0DCAF63E405D40152EA00A3F4088
:100880009D64294EB2143F407DF82D3EFC163F403A
:100890001DD9208E6C103F405DE6232EF3113F40A2
:1008A0007DA526BE52133F40BDB913DEDC093F4093
:1008B0005D2904AE14023F40BBE5E2DD72F13E402B
:1008C0009D180C4E0C063F40BD84075EC2033F409E
:1008D0005D221A2E110D3F40FDC6167E630B3F4070
:0108E0009D7A
:00000001FF
Assuming the data represents a list of 64-bit floating point numbers that you want to decode, the process is to collect the appropriate number of octets and decode them as a double.
Reusing the structure you presented:
from intelhex import IntelHex
import struct
ih = IntelHex()
ih.loadhex('output.hex')
ihdict = ih.todict()
# Read all the data into a long list of int octets
data = []
startAddress = 536871952
while ihdict.get(startAddress) is not None:
data.append(ihdict.get(startAddress))
startAddress += 1
# slice the list into 8-byte bytearrays
bin_arr = [bytearray(data[n:n+8]) for n in range(0, len(data), 8)]
# unpack each bytearray as a double
# Filter for 8 byte arrays because len(data) is not divisible by 8.
# Is the data properly aligned?
doubles_list = [struct.unpack('d', b) for b in bin_arr if len(b) == 8]
print(doubles_list)
It may be worth mentioning that the above assumes a big endian byte ordering. I believe you can use < as part of the format definition to assume a little endian ordering. More information is available in the struct.unpack docs.
Let's suppose I have a large number of NumPy arrays saved as files (np.save(), ".npy" files). All these have shape e.g. (n,20), where I don't know n without opening the file. n is different for every file.
I want to merge these into a single dataset, and then using a set of selection methods split it into three different numpy arrays written on the disk.
Usually I would loop over all files and use np.concatenate(). However the final array is likely not to fit in memory.
The other option I have is to use np.memmap(), which I am absolutely not so sure how it works. To my understanding, I'd have to do something like that:
a = np.memmap('output.npy',dtype='float64',mode='w+',shape=(N,20))
for i,f in enumerate(myfiles):
a[i,:] = np.load(f)
a.flush()
# And then find a way to split "a" into three, does the following work?
part_one = a[ [0,2,10,42,58] , : ]
The problem is that I don't know N, the final number of rows. Therefore I would need to open each file, read number of rows, close the file, sum all the number of rows before declaring the memmap. Which is highly inefficient, and there must be a better method.
Do you have any suggestion on this problem? Am I doing something wrong?
The .npy file specification defines the header for npy files. I couldn't find an already-baked way to read it, but the format is easy and you can pull the information out yourself. The file information is encoded in a python dict including a shape tuple. This is a short read of the top of the file and will be much faster than reading in the data.
import struct
import ast
# structs to decode .npy file header consisting of a "magic"
# string verifying the file type, major and minor version numbers,
# header length, and literal string representation of a python dict
# holding file's type and shape.
npy_magic = b"\x93NUMPY"
npy_v1_header = struct.Struct(
"<" # little-endian encoding
"6s" # 6 byte magic string
"B" # 1 byte major number
"B" # 1 byte minor number
"H" # 2 byte header length
# ... header string follows
)
npy_v2_header = struct.Struct(
"<" # little-endian encoding
"6s" # 6 byte magic string
"B" # 1 byte major number
"B" # 1 byte minor number
"L" # 4 byte header length
# ... header string follows
)
def read_npy_file_header(filename):
with open(filename, 'rb') as fp:
buf = fp.read(npy_v1_header.size)
magic, major, minor, hdr_size = npy_v1_header.unpack(buf)
if magic != npy_magic:
raise IOError("Not an npy file")
if major not in (0,1):
raise IOError("Unknown npy file version")
if major == 2:
fp.seek(0)
buf = fp.read(npy_v2_header.size)
magic, major, minor, hdr_size = npy_v2_header.unpack(buf)
return ast.literal_eval(fp.read(hdr_size).decode('ascii'))
# test
from glob import glob
for fn in glob('*.npy'):
print(fn, read_npy_file_header(fn))
I'm trying to read the data from a .wav file.
import wave
wr = wave.open("~/01 Road.wav", 'r')
# sample width is 2 bytes
# number of channels is 2
wave_data = wr.readframes(1)
print(wave_data)
This gives:
b'\x00\x00\x00\x00'
Which is the "first frame" of the song. These 4 bytes obviously correspond to the (2 channels * 2 byte sample width) bytes per frame, but what does each byte correspond to?
In particular, I'm trying to convert it to a mono amplitude signal.
If you want to understand what the 'frame' is you will have to read the standard of the wave file format. For instance: https://web.archive.org/web/20140221054954/http://home.roadrunner.com/~jgglatt/tech/wave.htm
From that document:
The sample points that are meant to be "played" ie, sent to a Digital to Analog Converter(DAC) simultaneously are collectively called a sample frame. In the example of our stereo waveform, every two sample points makes up another sample frame. This is illustrated below for that stereo example.
sample sample sample
frame 0 frame 1 frame N
_____ _____ _____ _____ _____ _____
| ch1 | ch2 | ch1 | ch2 | . . . | ch1 | ch2 |
|_____|_____|_____|_____| |_____|_____|
_____
| | = one sample point
|_____|
To convert to mono you could do something like this,
import wave
def stereo_to_mono(hex1, hex2):
"""average two hex string samples"""
return hex((ord(hex1) + ord(hex2))/2)
wr = wave.open('piano2.wav','r')
nchannels, sampwidth, framerate, nframes, comptype, compname = wr.getparams()
ww = wave.open('piano_mono.wav','wb')
ww.setparams((1,sampwidth,framerate,nframes,comptype,compname))
frames = wr.readframes(wr.getnframes()-1)
new_frames = ''
for (s1, s2) in zip(frames[0::2],frames[1::2]):
new_frames += stereo_to_mono(s1,s2)[2:].zfill(2).decode('hex')
ww.writeframes(new_frames)
There is no clear-cut way to go from stereo to mono. You could just drop one channel. Above, I am averaging the channels. It all depends on your application.
For wav file IO I prefer to use scipy. It is perhaps overkill for reading a wav file, but generally after reading the wav it is easier to do downstream processing.
import scipy.io.wavfile
fs1, y1 = scipy.io.wavfile.read(filename)
From here the data y1, will be N samples long, and will have Z columns where each column corresponds to a channel. To convert to a mono wav file you don't say how you'd like to do that conversion. You can take the average, or whatever else you'd like. For average use
monoChannel = y1.mean(axis=1)
As a direct answer to your question: two bytes make one 16-bit integer value in the "usual" way, given by the explicit formula: value = ord(data[0]) + 256 * ord(data[1]). But using the struct module is a better way to decode (and later reencode) such multibyte integers:
import struct
print(struct.unpack("HH", b"\x00\x00\x00\x00"))
# -> gives a 2-tuple of integers, here (0, 0)
or, if we want a signed 16-bit integer (which I think is the case in .wav files), use "hh" instead of "HH". (I leave to you the task of figuring out how exactly two bytes can encode an integer value from -32768 to 32767 :-)
Another way to convert 2 bytes into an int16, use numpy.fromstring(). Here's an example:
audio_sample is from a wav file.
>>> audio_sample[0:8]
b'\x8b\xff\xe1\xff\x92\xffn\xff'
>>> x = np.fromstring(audio_sample, np.int16)
>>> x[0:4]
array([-117, -31, -110, -146], dtype=int16)
You can use np.tobytes to convert back to bytes