I am looking for a fast, preferably standard library mechanism to determine the bit-depth of a wav file, e.g. '16-bit' or '24-bit'.
I am using a subprocess call to Sox to get a plethora of audio metadata, but a subprocess call is very slow, and the only information I can currently get reliably only from Sox is the bit-depth.
The built-in wave module does not have a function like "getbitdepth()" and is also not compatible with 24-bit wav files. I could use a 'try except' to access the file's metadata using the wave module (if it works, manually record that it is 16-bit), then on except call sox instead (where sox will perform the analysis to accurately record its bit-depth). My concern is that this approach feels like guesswork. What if an 8-bit file is read? I would be manually assigning 16-bit when it is not.
scipy.io.wavfile is also not compatible with 24-bit audio, so it creates a similar issue.
This tutorial is really interesting and even includes some really low level (low level for Python at least) scripting examples to extract information from the wav files headers - unfortunately these scripts don't work for 16-bit audio.
Is there any way to simply (and without calling sox) determine what bit-depth the wav file I'm checking has?
The wave header parser script I'm using is as follows:
import struct
import os

def print_wave_header(f):
    '''
    Function takes an audio file path as a parameter and
    returns a dictionary of metadata parsed from the header
    '''
    r = {}  # the results of the header parse
    r['path'] = f
    fin = open(f, "rb")  # Read wav file, "r flag" - read, "b flag" - binary
    ChunkID = fin.read(4)  # First four bytes are ChunkID which must be "RIFF" in ASCII
    r["ChunkID"] = ChunkID
    ChunkSizeString = fin.read(4)  # Total size of file in bytes, minus 8 bytes
    ChunkSize = struct.unpack('<I', ChunkSizeString)  # '<I' treats the 4 bytes as a little-endian unsigned 32-bit integer
    TotalSize = ChunkSize[0] + 8  # The subscript is used because struct.unpack returns everything as a tuple
    r["TotalSize"] = TotalSize
    DataSize = TotalSize - 44  # Number of bytes of data, assuming a canonical 44-byte header
    r["DataSize"] = DataSize
    Format = fin.read(4)  # "WAVE" in ASCII
    r["Format"] = Format
    SubChunk1ID = fin.read(4)  # "fmt " in ASCII
    r["SubChunk1ID"] = SubChunk1ID
    SubChunk1SizeString = fin.read(4)  # Should be 16 (PCM, Pulse Code Modulation)
    SubChunk1Size = struct.unpack("<I", SubChunk1SizeString)  # little-endian unsigned 32-bit integer
    r["SubChunk1Size"] = SubChunk1Size[0]
    AudioFormatString = fin.read(2)  # Should be 1 (PCM)
    AudioFormat = struct.unpack("<H", AudioFormatString)  # '<H' is a little-endian unsigned 16-bit integer
    r["AudioFormat"] = AudioFormat[0]
    NumChannelsString = fin.read(2)  # Should be 1 for mono, 2 for stereo
    NumChannels = struct.unpack("<H", NumChannelsString)
    r["NumChannels"] = NumChannels[0]
    SampleRateString = fin.read(4)  # Should be 44100 (CD sampling rate)
    SampleRate = struct.unpack("<I", SampleRateString)
    r["SampleRate"] = SampleRate[0]
    ByteRateString = fin.read(4)  # 44100*NumChan*2 (88200 - Mono, 176400 - Stereo)
    ByteRate = struct.unpack("<I", ByteRateString)
    r["ByteRate"] = ByteRate[0]
    BlockAlignString = fin.read(2)  # NumChan*2 (2 - Mono, 4 - Stereo)
    BlockAlign = struct.unpack("<H", BlockAlignString)
    r["BlockAlign"] = BlockAlign[0]
    BitsPerSampleString = fin.read(2)  # 16 (CD has 16 bits per sample for each channel)
    BitsPerSample = struct.unpack("<H", BitsPerSampleString)
    r["BitsPerSample"] = BitsPerSample[0]
    SubChunk2ID = fin.read(4)  # "data" in ASCII
    r["SubChunk2ID"] = SubChunk2ID
    SubChunk2SizeString = fin.read(4)  # Number of data bytes, same as DataSize
    SubChunk2Size = struct.unpack("<I", SubChunk2SizeString)
    r["SubChunk2Size"] = SubChunk2Size[0]
    S1String = fin.read(2)  # Read first sample, number between -32768 and 32767
    S1 = struct.unpack("<h", S1String)
    r["S1"] = S1[0]
    S2String = fin.read(2)  # Read second sample, number between -32768 and 32767
    S2 = struct.unpack("<h", S2String)
    r["S2"] = S2[0]
    S3String = fin.read(2)  # Read third sample, number between -32768 and 32767
    S3 = struct.unpack("<h", S3String)
    r["S3"] = S3[0]
    S4String = fin.read(2)  # Read fourth sample, number between -32768 and 32767
    S4 = struct.unpack("<h", S4String)
    r["S4"] = S4[0]
    S5String = fin.read(2)  # Read fifth sample, number between -32768 and 32767
    S5 = struct.unpack("<h", S5String)
    r["S5"] = S5[0]
    fin.close()
    return r
Essentially the same answer as from Matthias, but with copy-pastable code.
Requirements
pip install soundfile
Code
import soundfile as sf
ob = sf.SoundFile('example.wav')
print('Sample rate: {}'.format(ob.samplerate))
print('Channels: {}'.format(ob.channels))
print('Subtype: {}'.format(ob.subtype))
Explanation
Channels: Usually 2, meaning you have one left speaker and one right speaker.
Sample rate: Audio signals are analog, but we want to represent them digitally, meaning we want to discretize them in value and in time. The sample rate gives how many times per second we get a value. The unit is Hz. The sample rate needs to be at least double the highest frequency in the original sound, otherwise you get aliasing. The human hearing range goes from ~20 Hz to ~20 kHz, so you can cut off anything above 20 kHz, meaning a sample rate of much more than 40 kHz does not make much sense.
Bit-depth: The higher the bit-depth, the more dynamic range can be captured. Dynamic range is the difference between the quietest and loudest volume of an instrument, part or piece of music. A typical value seems to be 16 bit or 24 bit. A bit-depth of 16 bit has a theoretical dynamic range of 96 dB, whereas 24 bit has a dynamic range of 144 dB (source).
Subtype: PCM_16 means 16 bit depth, where PCM stands for Pulse-Code Modulation.
Alternative
If you are only looking for a command line tool, then I can recommend MediaInfo:
$ mediainfo example.wav
General
Complete name : example.wav
Format : Wave
File size : 83.2 MiB
Duration : 8 min 14 s
Overall bit rate mode : Constant
Overall bit rate : 1 411 kb/s
Audio
Format : PCM
Format settings : Little / Signed
Codec ID : 1
Duration : 8 min 14 s
Bit rate mode : Constant
Bit rate : 1 411.2 kb/s
Channel(s) : 2 channels
Sampling rate : 44.1 kHz
Bit depth : 16 bits
Stream size : 83.2 MiB (100%)
I highly recommend the soundfile module (but mind you, I'm very biased because I wrote a large part of it).
There you can open your file as a soundfile.SoundFile object, which has a subtype attribute that holds the information you are looking for.
In your case that would probably be 'PCM_16' or 'PCM_24'.
Not clear when this update went out but the built in wave module appears to be compatible with 24 bit wav files. I'm using python 3.10.5
The documentation for Wave_read.getsampwidth() states that it returns the sample width in bytes. I'm fairly sure that just taking this value and multiplying it by 8 will give us the bit depth. For example:
import wave

with wave.open(path, 'rb') as wav:
    bit_depth = wav.getsampwidth() * 8
getsampwidth() returns 2 for a 16 bit file and 3 for a 24 bit. No additional modules or subprocesses needed!
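If the wave module rejects a particular file, the bits-per-sample field can also be read straight from the 'fmt ' chunk with the standard library. The following is only a minimal sketch: the function name wav_bit_depth is made up for illustration, and it assumes a well-formed little-endian RIFF/WAVE file whose 'fmt ' chunk is at least 16 bytes long. Unlike a fixed-offset header parser, it skips over any chunks that appear before 'fmt '.

```python
import struct


def wav_bit_depth(path):
    """Return the bits-per-sample value stored in the 'fmt ' chunk.

    Hypothetical helper; assumes a little-endian RIFF/WAVE file.
    """
    with open(path, "rb") as f:
        riff_header = f.read(12)
        if len(riff_header) < 12:
            raise ValueError("file too short to be a WAV file")
        riff, _size, wave_id = struct.unpack("<4sI4s", riff_header)
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")

        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                fmt = f.read(16)
                if len(fmt) < 16:
                    raise ValueError("truncated 'fmt ' chunk")
                # Fields: format tag, channels, sample rate, byte rate,
                # block align, bits per sample.
                return struct.unpack("<HHIIHH", fmt)[5]
            # Skip this chunk; chunks are padded to an even number of bytes.
            f.seek(chunk_size + (chunk_size & 1), 1)
```

Calling wav_bit_depth('example.wav') returns an integer such as 16 or 24. Only the first few dozen bytes of a typical file are read, so it should be much cheaper than a subprocess call.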
I am working on a script in Python to parse MIDI files (yes, I know MIDI parsing libraries exist for Python, but for my use case it's easiest if I make it from scratch).
The one thing I'm having a problem with is the time division. The last two bytes of the header specify the time division, but I'm having trouble determining whether a file's time division is noted in ticks per beat or frames per second. After doing some reading, it seems that the top bit of the top byte indicates which of the two the time division is noted in. What I am confused about is whether the top bit of a byte is the first bit or the last bit of the byte, as well as how to read the MIDI time division entirely.
EDIT: for example, a header of a MIDI file I have is the following:
4d54 6864 0000 0006 0000 0001 0078
0078 are the two bytes that denote the time sig, but I am confused as how to interpret it.
Edit 2:
def openmidi(file):
    tmbr = []
    f = open(file, "rb")#opening the midi in binary mode
    loopfile = True
    while loopfile == True:
        cb = f.read(1)
        if cb != b'':#checking if there are still bytes left to read
            tmbr.append(cb)
        else:
            loopfile = False
    return tmbr

def byteread(num):#will read and return the specified number of bytes
    global bytecounter
    bytehold = b''
    for i in range(0, num):#reads specified number of bytes
        bytehold+=midibytearray[i+bytecounter]#number of increment plus the read position
    bytecounter+=num#after reading is done read position is incremented by the number of bytes read.
    return bytehold#after looping is done the specified bytes are returned.

def timetype(deltatimebytes):#used to determine if the time division is in ticks per beat or frames per second.
    if str(deltatimebytes).replace("b'","").replace("'","")[0:2] == "00":
        return True#if true the time division is in ticks per beat.
    else:
        return False#the time division is in frames per second.

global bytecounter
bytecounter = 0 #keeps track of what position in the file is being read.
midibytearray = openmidi("C:\\Users\\gabep\\Desktop\\Electrorchestrion\\Midis\\BONEY M.Rasputin K.mid") #array that the bytes will be stored in.
header = byteread(4)
chunklength = byteread(4)
formattype = byteread(2)
numofmtrkchunks = byteread(2)
deltatime = byteread(2)#if no tempo is assigned, 120bpm is assumed.
print(deltatime)
print("Header: "+str(header.decode("utf-8")))
print("MThd chunk length: "+str(int(chunklength.hex(), 16)))
print("Midi Format Type: "+str(int(formattype.hex(), 16)))
print("Number of MTrk chunks (number of tracks): "+str(int(numofmtrkchunks.hex(), 16)))
print("Delta time: "+str(int(deltatime.hex(), 16)))
if timetype(deltatime.hex()) == True:
    print("Time signature is in ticks per beat")
else:
    print("Time signature is in frames per second")
Maybe you don't know that the official MIDI specifications are available and you can download the document for free. (You need to register as site user first). It includes the detailed SMF format.
Here is the description of the header chunk.
The header chunk at the beginning of the file specifies some basic information about the data in the file. Here's the syntax of the complete chunk:
<Header Chunk> = <chunk type> <length> <format> <ntrks> <division>
As described above, <chunk type> is the four ASCII characters 'MThd'; <length> is a 32-bit representation of the number 6 (high byte first).
The data section contains three 16-bit words, stored most-significant byte first.
The first word, <format>, specifies the overall organization of the file. Only three values of <format> are specified:
0=the file contains a single multi-channel track
1=the file contains one or more simultaneous tracks (or MIDI outputs) of a
sequence
2=the file contains one or more sequentially independent single-track patterns
More information about these formats is provided below.
The next word, <ntrks>, is the number of track chunks in the file. It will always be 1 for a format 0 file.
The third word, <division>, specifies the meaning of the delta-times. It has two formats, one for metrical time, and one for time-code-based time:
| bit 15 | bits 14 ... 8          | bits 7 ... 0    |
|--------|------------------------|-----------------|
|   0    | ticks per quarter-note (bits 14 ... 0)   |
|   1    | negative SMPTE format  | ticks per frame |
If bit 15 of <division> is a zero, the bits 14 thru 0 represent the number of delta-time "ticks" which make up a quarter-note. For instance, if <division> is 96, then a time interval of an eighth-note between two events in the file would be 48. If bit 15 of <division> is a one, delta-times in a file correspond to subdivisions of a second, in a way consistent with SMPTE and MIDI time code. Bits 14 thru 8 contain one of the four values -24, -25, -29, or -30, corresponding to the four standard SMPTE and MIDI time code formats (-29 corresponds to 30 drop frame), and represents the number of frames per second. These negative numbers are stored in two's complement form. The second byte (stored positive) is the resolution within a frame: typical values may be 4 (MIDI time code resolution), 8, 10, 80 (bit resolution), or 100. This system allows exact specification of time-code-based tracks, but also allows millisecond-based tracks by specifying 25 frames/sec and a resolution of 40 units per frame. If the events in a file are stored with bit resolution of thirty-frame time code, the division word would be E250 hex.
In your example, your third word (hex 0078) means that the <division> is 120 ticks per quarter-note.
Delta time is given in ticks for the events in the file. Time signature is another totally different thing. It is an indication of the rhythm, and is a meta-event type. (See page 10 of the specification).
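As an illustration of the description above, here is a short sketch of how the header chunk could be decoded with struct. The function name parse_midi_header and the dictionary keys are made up for this example; only the field layout comes from the specification text quoted above. "Top bit" here means bit 15, the most significant bit of the first (high) byte of <division>.

```python
import struct


def parse_midi_header(data):
    """Decode the 14-byte MThd header chunk of a Standard MIDI File.

    Hypothetical helper; the key names are chosen for illustration only.
    """
    # All header fields are big-endian: 4-byte type, 32-bit length,
    # then three 16-bit words (format, ntrks, division).
    chunk_type, length, fmt, ntrks, division = struct.unpack(">4sIHHH", data[:14])
    if chunk_type != b"MThd":
        raise ValueError("not a MIDI header chunk")

    info = {"length": length, "format": fmt, "ntrks": ntrks}
    if division & 0x8000:
        # Bit 15 set: the high byte is a negative SMPTE format in
        # two's complement, the low byte is the ticks per frame.
        info["smpte_format"] = struct.unpack(">b", data[12:13])[0]
        info["ticks_per_frame"] = division & 0x00FF
    else:
        # Bit 15 clear: bits 14..0 are ticks per quarter-note.
        info["ticks_per_quarter_note"] = division & 0x7FFF
    return info


# The header from the question: 4d54 6864 0000 0006 0000 0001 0078
example = bytes.fromhex("4d546864000000060000000100 78".replace(" ", ""))
print(parse_midi_header(example))
```

For the header in the question this reports 120 ticks per quarter-note; for the E250 example from the specification it gives an SMPTE format of -30 with 80 ticks per frame.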
So there is this super interesting thread already about getting the original size of a .gz file. It turns out the size one can get from the last 4 bytes of the file is 'just' there to make sure extraction was successful. However: it's fine to rely on it IF the extracted data size is below 2**32 bytes, i.e. 4 GB.
Now IF there are more than 4 GB of uncompressed data, there must be multiple members in the .gz! The last 4 bytes only indicate the uncompressed size of the last chunk!
So how do we get the ending bytes of the other chunks?
Reading the gzip specs I don't see a length of the
+=======================+
|...compressed blocks...|
+=======================+
Ok. Must depend on the CM - compression method. Which is probably deflate. Let's see the RFC about it. There on page 11 it says there is a LEN attribute for "Non-compressed blocks" but it gets funky when they tell about the Compressed ones ...
I can imagine something like
full_size = os.path.getsize(gz_path)
gz = gzip.open(gz_path)
pos = 0
size = 0
while True:
    try:
        head_len = get_header_length(gz, pos)
        block_len = get_block_length(gz, pos + head_len)
        size += get_orig_size(gz, pos + head_len + block_len)
        pos += head_len + block_len + 8
    except:
        break
print('uncompressed size of "%s" is: %i bytes' % (gz_path, size))
But how to get_block_length?!? :|
This was probably never intended because ... "stream data". But I don't wanna give up now.
One big bummer already: Even 7zip shows such a big .gz with the exact uncompressed size of just the very last 4 bytes.
Does someone have another idea?
First off, no, there do not need to be multiple members. There is no limit on the length of a gzip member. If the uncompressed data is more than 4 GB, then the last four bytes simply represent that length modulo 2^32. A gzip file with more than 4 GB of uncompressed data is in fact very likely to be a single member.
Second, the fact that you can have multiple members is true even for small gzip files. The uncompressed data does not need to be more than 4 GB for the last four bytes of the file to be useless.
The only way to reliably determine the amount of uncompressed data in a gzip file is to decompress it. You don't have to write the data out, but you have to process the entire gzip file and count the number of uncompressed bytes.
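A minimal sketch of that approach in Python, using only the standard gzip module: the data is decompressed in chunks and discarded, so only the byte count is kept. The function name and the chunk size are arbitrary choices for this example.

```python
import gzip


def gzip_uncompressed_size(path, chunk_size=1 << 20):
    """Count the uncompressed bytes in a gzip file by decompressing it.

    Hypothetical helper; nothing is written out, only counted.
    """
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # up to chunk_size uncompressed bytes
            if not chunk:
                break
            total += len(chunk)
    return total
```

gzip.open reads through concatenated members transparently, so this also covers the multi-member case where the last four bytes of the file only describe the final member.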
I'm coming here to leave an estimate of what you are looking for. The good answer is the one given by Mark Adler: the only reliable way to determine the uncompressed size of a gzip file is by actually decompressing it.
But I'm working with an estimate that will usually give good results, but it can fail at the boundaries. The assumptions are:
there is only one stream in the file
the stream has a similar compression ratio throughout the whole file
The idea is to get the compression ratio of the beginning of the file (get a 1 MB sample, decompress and measure), use it to extrapolate the uncompressed size from the compressed size, and finally, substitute the 32 least significant bits with the size modulo 2^32 obtained from the gzip stream.
The caveat comes at the multiple-of-4 GiB boundaries, as it could over/underestimate the size and give an estimate displaced by +/-4 GiB.
The code would be:
from io import SEEK_END
import os
import struct
import zlib

def estimate_uncompressed_gz_size(filename):
    # From the input file, get some data:
    # - the 32 LSB from the gzip stream
    # - 1MB sample of compressed data
    # - compressed file size
    with open(filename, "rb") as gz_in:
        sample = gz_in.read(1000000)
        gz_in.seek(-4, SEEK_END)
        lsb = struct.unpack('<I', gz_in.read(4))[0]
        file_size = os.fstat(gz_in.fileno()).st_size
    # Estimate the total size by decompressing the sample to get the
    # compression ratio so we can extrapolate the uncompressed size
    # using the compression ratio and the real file size
    dobj = zlib.decompressobj(31)
    d_sample = dobj.decompress(sample)
    compressed_len = len(sample) - len(dobj.unconsumed_tail)
    decompressed_len = len(d_sample)
    # Integer division keeps the estimate an int, so it can be masked below
    estimate = file_size * decompressed_len // compressed_len
    # 32 LSB to zero
    mask = ~0xFFFFFFFF
    # Kill the 32 LSB to be substituted by the data read from the file
    adjusted_estimate = (estimate & mask) | lsb
    return adjusted_estimate
A workaround for the stated caveats could be to check the difference between the estimate and the adjusted estimate, and if it is bigger than 2 GiB, add/subtract 4 GiB accordingly. But in the end, it will always be an estimate, not a reliable number.
I've got a raw binary file (about 1 KB) that is a serial data dump of a GPS stream (along with some associated metadata). I'm specifically trying to pull a value out of the binary file that represents the GPS time; I know its offset and width in the file (10 and 8 bytes respectively, with a total frame width of 28 bytes) but it's encoded in a very weird way as described in the quote below.
What's the most Pythonic way to read this data (into a list or array)?
GPS TIME - GPS Sensor time (time of week in seconds, starting at
Saturday 2400 hours/ Sunday 0000 hours) if GPS Time Valid Message 3500
is set to 1, otherwise SDN500 system time since power up is reported.
Data words are in the order 2, 1 (MSW), 4 (LSW), 3.
A message word length is 16 bits on the SDN500–HV interface. However,
the SDN500–HV protocol, which uses a standard Universal Asynchronous
Receiver Transmitter (UART), transmits data in 8-bit groups (bytes).
This means that two bytes are required in order to make up one message
word.
A byte of information is transmitted as a sequence of 11 bits: one
start bit, 8 bits of data (least significant bit (LSB) first), one
parity bit (odd), and one stop bit. For each 16-bit data word, the
least significant byte is transmitted first, followed by the most
significant byte. Integer and floating point data types consisting of
more than one word are transmitted from the lowest numbered word to
the highest numbered word. The one exception to this rule is the time
tag, which is output in words 6-9 of each HV output message. The four
16-bit data words are in the following order: 2,1,4,3, where 1
represents the most significant word and 4 the least significant word.
Each word is separately byte-reversed.
Start by opening the file:
import struct

fin = open("20160128t184727_pps","rb")
Then read in a frame:
def read_frame(f_handle):
    frame = f_handle.read(28) # 28 byte frame size
    start_byte = 10
    end_byte = 18 # 4 words each word is 2 bytes
    timestamp_raw = frame[start_byte:end_byte]
    timestamp_words = struct.unpack(">HHHH",timestamp_raw)
    return timestamp_words
I could probably help more, but I don't understand where the timestamp start byte and end byte come from, as your description does not seem to match the documentation you quoted. I also do not know what the expected output value is. If you provided those details I could probably help more.
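Going only by the quoted documentation, the word reordering could be sketched as below. This rests on several assumptions that the question does not confirm: that each 16-bit word is stored least significant byte first (hence "<" rather than ">" in the format string), that the four words at offset 10 arrive in the order 2, 1, 4, 3, and that the reassembled 64 bits are what is wanted. How those 64 bits should then be interpreted (integer or floating point) is not stated in the quote, so the sketch stops at the raw value. All names are made up for illustration.

```python
import struct

FRAME_SIZE = 28    # bytes per frame, from the question
TIME_OFFSET = 10   # byte offset of the time tag, from the question


def time_tag_raw(frame):
    """Reassemble the 64-bit time tag from one frame (hypothetical helper)."""
    # Words arrive as 2, 1, 4, 3, where word 1 is the most significant.
    # Assumption: each word is little-endian, per the quoted UART description.
    w2, w1, w4, w3 = struct.unpack_from("<4H", frame, TIME_OFFSET)
    return (w1 << 48) | (w2 << 32) | (w3 << 16) | w4


def read_time_tags(path):
    """Return the raw time tag of every complete frame in the file."""
    tags = []
    with open(path, "rb") as f:
        while True:
            frame = f.read(FRAME_SIZE)
            if len(frame) < FRAME_SIZE:
                break
            tags.append(time_tag_raw(frame))
    return tags
```

If the result turns out to be an IEEE double rather than an integer, the same reordered bytes could be passed through struct with a ">d" format instead, but that would need to be checked against a known timestamp.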
I think questions like this have been asked before, but I'm having quite a bit of trouble getting this implemented.
I'm dealing with CSV files that contain floating points between -1 and 1. All of these floating points have to be converted to 16 bit 2s complement without the leading '0b'. From there, I will convert that number to a string representation of the 2s complement, and all of those from the CSV will be written to a .dat file with no space in between. So for example, if I read in the CSV file and it has two entries [0.006534, -.1232], I will convert each entry to its respective 2s complement and write them one after another into a .dat file.
The problem is I'm getting stuck in my code on how to convert the floating point to a 16 bit 2s complement. I've been looking at other posts like this and I've been told to use the .float() function but I've had no luck.
Can someone help me write a script that will take in a floating point number, and return the 16 bit 2s complement string of it? It has to be exactly 16 bits because I'm dealing with the MIT 16 standard.
I am using python 3.4 btw
To answer the question in the title: to convert a Python float to IEEE 754 half-precision binary floating-point format, you could use binary16:
>>> from binary16 import binary16
>>> binary16(0.006534)
b'\xb0\x1e'
>>> binary16(-.1232)
b'\xe2\xaf'
numpy produces similar results:
>>> import numpy as np
>>> np.array([0.006534, -.1232], np.float16).tobytes()
b'\xb1\x1e\xe3\xaf'
>>> np.array([0.006534, -.1232], '>f2').tobytes() # big-endian
b'\x1e\xb1\xaf\xe3'
My goal was to save the amplitudes as the ecg mit signal format 16
..snip..
the input is a .CSV file containing the f.p. values of the amplitude from a .WAV file (which is the recording of an ECG).
You could read the wav file directly and write the corresponding 16-bit two's complement amplitudes in little-endian byte order where any unused high-order bits are sign-extended from the most significant bit ('<h' struct format):
#!/usr/bin/env python3
import wave

with wave.open('ecg.wav') as wavfile, open('ecg.mit16', 'wb') as output_file:
    assert wavfile.getnchannels() == 1 # mono
    assert wavfile.getsampwidth() == 2 # 16bit
    output_file.writelines(iter(lambda: wavfile.readframes(4096), b''))
There is a bug in Python 3 where .readframes() sometimes returns str instead of bytes. To work around it, use an "if not data" test, which works on both empty str and bytes:
#!/usr/bin/env python3
import wave

with wave.open('ecg.wav') as wavfile, open('ecg.mit16', 'wb') as output_file:
    assert wavfile.getnchannels() == 1 # mono
    assert wavfile.getsampwidth() == 2 # 16bit
    while True:
        data = wavfile.readframes(4096)
        if not data:
            break
        output_file.write(data)
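If the CSV route from the question is still needed, a float in [-1, 1] can be turned into a 16-bit two's complement bit string along the following lines. This is only a sketch: the function name is made up, and the scale factor of 32767 (so that 1.0 maps to the largest positive value) is an assumption, since the question does not say how the amplitudes were normalized.

```python
def float_to_int16_bits(x, scale=32767):
    """Return the 16-bit two's complement bit string for a float in [-1, 1].

    Hypothetical helper; the scale factor is an assumption.
    """
    n = round(x * scale)
    # Clamp to the representable signed 16-bit range.
    n = max(-32768, min(32767, n))
    # Masking with 0xFFFF yields the two's complement bit pattern
    # for negative values; format() pads to exactly 16 digits.
    return format(n & 0xFFFF, "016b")


values = [0.006534, -0.1232]
bits = "".join(float_to_int16_bits(v) for v in values)
print(bits)
```

Each value always contributes exactly 16 characters, so the concatenated string can be written to the .dat file without separators.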
I have written code for joining two wave files. It works fine when I am joining larger segments, but as I need to join very small segments, the clarity is not good.
I have learned that a signal processing technique such as a windowed join can be used to improve the joining of files.
y[n] = w[n]s[n]
Multiply the value of the signal at sample number n by the value of a windowing function.
Hamming window: w[n] = 0.54 - 0.46*cos(2*Pi*n/L), 0 <= n <= L
I do not understand how to get the value of the signal at sample n, or how to implement this.
The code I am using for joining is:
import wave

m=['C:/begpython/S0001_0002.wav', 'C:/begpython/S0001_0001.wav']
i=1
a=m[i]
infiles = [a, "C:/begpython/S0001_0002.wav", a]
outfile = "C:/begpython/S0001_00367.wav"
data= []
data1=[]
for infile in infiles:
    w = wave.open(infile, 'rb')
    data1=[w.getnframes()]
    data.append( [w.getparams(), w.readframes(w.getnframes())] )
    #data1 = [ord(character) for character in data1]
    #print data1
    #data1 = ''.join(chr(character) for character in data1)
    w.close()
output = wave.open(outfile, 'wb')
output.setparams(data[0][0])
output.writeframes(data[0][1])
output.writeframes(data[1][1])
output.writeframes(data[2][1])
output.close()
During joining I am manipulating the frames in byte format. I guess I now have to use an integer or float format to perform operations on them. If what I am thinking is true, how can I do this?
It's probably not the best solution, but I'm sure it will work. Maybe you can find existing libs or so for some steps, I don't know for Python. The steps I suggest are:
Load the wave file.
Create the sample values (amplitude) for each frame (depending on frame size, little/big endian, signed/unsigned).
Divide the resulting array of int values into windows, e.g. sample 0-511, 512-1023, ...
Perform the window function, for the windows that you want to join.
Do your joining.
Store the windows back in a byte array, the inverse operation of the first step.
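The steps above can be sketched in Python roughly as follows. This is a simplified illustration, not a drop-in solution: it assumes 16-bit mono little-endian PCM frames (as returned by wave's readframes for such files), applies one Hamming window across each whole segment rather than fixed 512-sample windows, and uses the common L-1 denominator so that the window is symmetric. The function names are made up.

```python
import math
import struct


def hamming(length):
    """Hamming window coefficients for `length` samples."""
    if length < 2:
        return [1.0] * length
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_hamming(frames):
    """Window raw 16-bit mono PCM bytes and return bytes of equal length."""
    count = len(frames) // 2
    # Bytes -> signed 16-bit sample values.
    samples = struct.unpack("<%dh" % count, frames[:count * 2])
    window = hamming(count)
    # y[n] = w[n] * s[n], rounded back to an integer sample.
    shaped = [int(round(s * w)) for s, w in zip(samples, window)]
    # Sample values -> bytes, the inverse of the unpack above.
    return struct.pack("<%dh" % count, *shaped)


def join_windowed(segments):
    """Concatenate several windowed segments of raw frames."""
    return b"".join(apply_hamming(segment) for segment in segments)
```

The result of join_windowed can be passed to writeframes on an output wave object whose parameters were copied from one of the inputs. Because each segment fades towards its edges, the discontinuity at every join is reduced.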
Old Post:
You have to calculate the sample value. In Java, a function for a 2 byte/frame sound file would look like this:
public static int createIntFrom16( byte _8Bit1, byte _8Bit2 ) {
    return ( _8Bit1 << 8 ) | ( _8Bit2 & 0x00FF );
}
Normally you will have to care about whether or not the file uses little endian, I don't know if the Python lib will take this into account.
Once you have created all sample values, you have to divide your file into windows, e.g. of size 512 samples. Then you can window the values, and create back the byte values. For 16bit it would look like this:
public static byte[] createBytesFromInt(int i) {
    byte[] bytes = new byte[2];
    bytes[0] = (byte)(i >> 8);
    bytes[1] = (byte)i;
    return bytes;
}
To give you a high-level understanding: the WAV audio format typically consists of a 44-byte header, where you define necessary metadata like sample rate, number of channels, etc., followed by the payload where the actual audio data lives. Audio is simply a curve of amplitude change over time. Conceptually, this amplitude varies from a maximum value of +1.0 to a minimum of -1.0 when expressed as a floating point number. As an audio recording is made, this amplitude is measured typically 44100 times per second (the sample rate). So a WAV file just stores this series of sample values. A 16-bit PCM WAV file does NOT store floating points; instead it maps the range of -1 to +1 onto signed integers ranging from -32768 to 32767 (2^16 distinct values). These 16-bit samples require two bytes of file storage per sample. In example code like the above, i>>8 shifts the sample value right by 8 bits to extract its high byte. If you think about these ideas, and write your own WAV format code to read or write from/to files, you'll be well on your way to being able to answer your question.