I have tried using the Pydub library; however, it only allows the reduction or increase of a certain amount of decibels. How would I proceed if I wanted, for example, to reduce the volume of the wav by a certain percent?
This is simple enough to do with just the tools in the stdlib.
First, you use wave to open the input file and create an output file:
import audioop, os, wave

pathout = os.path.splitext(path)[0] + '-quiet.wav'
with wave.open(path, 'rb') as fin, wave.open(pathout, 'wb') as fout:
Now, you have to copy over all the wave params, and hold onto the sample width for later:
fout.setparams(fin.getparams())
sampwidth = fin.getsampwidth()
Then you loop over frames until done:
while True:
frames = bytearray(fin.readframes(1024))
if not frames:
break
You can use audioop to process this data:
frames = audioop.mul(frames, sampwidth, factor)
… but this will only work for 16-bit little-endian signed LPCM wave files (the most common kind, but not the only kind). You could solve that with other functions—most importantly, lin2lin to handle 8-bit unsigned LPCM (the second most common kind). But it's worth understanding how to do it manually:
for i in range(0, len(frames), sampwidth):
    if sampwidth == 1:
        # 8-bit unsigned
        frames[i] = int(round((frames[i] - 128) * factor + 128))
    else:
        # 16-, 24-, or 32-bit signed
        sample = int.from_bytes(frames[i:i+sampwidth], 'little', signed=True)
        quiet = round(sample * factor)
        frames[i:i+sampwidth] = quiet.to_bytes(sampwidth, 'little', signed=True)
audioop.mul only handles the else part—but it does more than I've done here. In particular, it has to handle cases with factors over 1—a naive multiply would clip, which will just add weird distortion without adding the desired max energy. (It's worth reading the pure Python implementation from PyPy if you want to learn the basics of this stuff.)
If you also want to handle float32 files, you need to look at the format, because they have the same sampwidth as int32, and you'll probably want the struct module or the array module to pack and unpack them. If you want to handle even less common formats, like a-law and µ-law, you'll need to read a more detailed format spec. Notice that audioop has tools for handling most of them, like ulaw2lin to convert µ-law to LPCM so you can process it and convert it back—but again, it might be worth learning how to do it manually. And for some of them, like CoolEdit float24/32, you pretty much have to do it manually.
Anyway, once you've got the quieted frames, you just write them out:
fout.writeframes(frames)
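Putting the pieces above together into one runnable sketch (assuming 16-bit mono LPCM; the two-sample demo file and the 0.5 factor are made up for illustration, and a factor over 1 would still need the clipping handling discussed above):

```python
import os, struct, tempfile, wave

def quiet_wav(path, factor):
    # Scale every sample in a 16-bit little-endian signed LPCM wav by factor.
    pathout = os.path.splitext(path)[0] + '-quiet.wav'
    with wave.open(path, 'rb') as fin, wave.open(pathout, 'wb') as fout:
        fout.setparams(fin.getparams())
        sampwidth = fin.getsampwidth()
        while True:
            frames = bytearray(fin.readframes(1024))
            if not frames:
                break
            for i in range(0, len(frames), sampwidth):
                sample = int.from_bytes(frames[i:i+sampwidth], 'little', signed=True)
                frames[i:i+sampwidth] = round(sample * factor).to_bytes(
                    sampwidth, 'little', signed=True)
            fout.writeframes(frames)
    return pathout

# Demo on a tiny mono 16-bit file containing two known samples.
src = os.path.join(tempfile.mkdtemp(), 'tone.wav')
with wave.open(src, 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(struct.pack('<2h', 1000, -2000))

with wave.open(quiet_wav(src, 0.5), 'rb') as w:
    halved = struct.unpack('<2h', w.readframes(2))
```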
You could use the mul function from the built-in audioop module. This is what pydub uses internally, after converting the decibel value to a multiplication factor.
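For completeness, that conversion is just amplitude ratio = 10 ** (dB / 20); a tiny sketch (the helper name is mine, not pydub's):

```python
def db_to_factor(db):
    # Amplitude ratio corresponding to a gain change in decibels.
    return 10 ** (db / 20)

# A -6 dB reduction is roughly half amplitude; the result is what you'd
# pass as the factor argument of audioop.mul.
half = db_to_factor(-6.0)
```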
I'm using pyaudio to take input from a microphone or read a wav file, and analyze the stream while playing it. I want to only analyze the right channel if the input is stereo. I've been able to extract the data and convert to integers using loops:
levels = []
length = len(data)
if channels == 1:
for i in range(length//2):
volume = abs(struct.unpack('<h', data[i:i+2])[0])
levels.append(volume)
elif channels == 2:
for i in range(length//4):
j = 4 * i + 2
volume = abs(struct.unpack('<h', data[j:j+2])[0])
levels.append(volume)
I think this is working correctly; I know it runs without error on a laptop and a Raspberry Pi 3, but it appears to consume too much time to run on a Raspberry Pi Zero when simultaneously streaming the output to a speaker. I figure that eliminating the loop and using numpy may help. I assume I need to use np.ndarray to do this, and the first parameter will be (CHUNK,) where CHUNK is my chunk size for analyzing the audio (I'm using 1024). And the format would be '<h', as in the struct code above, I think. But I'm at a loss as to how to code it correctly for each of the two cases (mono and right channel only for stereo). How do I create the numpy arrays for each of the two cases?
You are reading 16-bit integers from a binary file here. It seems that you first read the data into the data variable with something like data = f.read(), which is not shown. Then you do:
for i in range(length//2):
volume = abs(struct.unpack('<h', data[i:i+2])[0])
levels.append(volume)
BTW, that code is wrong: it should be abs(struct.unpack('<h', data[2*i:2*i+2])[0]); otherwise you are overlapping bytes from different values.
To do the same with numpy, you should just do this (instead of both the f.read() and the whole loop):
data = np.fromfile(f, dtype='<i2')
This is over 100 times faster than the manual thing above in my test on 5 MB of data.
In the second case, you have interleaved left-right-left-right values. Again you can read them all (assuming you have enough memory) and then access only one half:
data = np.fromfile(f, dtype='<i2')
left = data[::2]
right = data[1::2]
This processes everything, even though you need just one half, but it is still much much faster.
EDIT: If the data is not coming from a file, np.fromfile can be replaced with np.frombuffer. Then you have this:
channel_data = np.frombuffer(data, dtype='<i2')
if channels == 2:
channel_data = channel_data[1::2]
levels = np.abs(channel_data)
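As a quick sanity check of the interleaving, here is the same code run on a hypothetical stereo chunk (the sample values are made up for illustration):

```python
import numpy as np

# Interleaved L,R,L,R 16-bit little-endian samples, the way PyAudio
# delivers a stereo chunk.
data = np.array([1, -10, 2, -20, 3, -30], dtype='<i2').tobytes()

channel_data = np.frombuffer(data, dtype='<i2')
right = channel_data[1::2]     # every second sample, starting at index 1
levels = np.abs(right)         # magnitudes of the right channel only
```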
I have some binary input files (extension ".bin") that describe a 2D field of ocean depth, where all the values are negative floats. I have been able to load them in matlab as follows:
f = fopen(filename,'r','b');
data = reshape(fread(f,'float32'),[128 64]);
This matlab code gives me double values between 0 and -5200. However, when I try to do the same in Python, I strangely get values between 0 and 1e-37. The Python code is:
f = open(filename, 'rb')
data = np.fromfile(f, np.float32)
data.shape = (64,128)
The strange thing is that there is a mask value of 0 for land which shows up in the right places in the (64,128) array in both cases. It seems to just be the magnitude and sign of the numpy.float32 values that are off.
What am I doing wrong in the Python code?
numpy.fromfile isn't platform independent; in particular, the byte order is mentioned in the documentation:
Do not rely on the combination of tofile and fromfile for data storage, as the binary files generated are not platform independent. In particular, no byte-order or data-type information is saved.
You could try:
data = np.fromfile(f, '>f4') # big-endian float32
and:
data = np.fromfile(f, '<f4') # little-endian float32
and check which one (big endian or little endian) gives the correct values.
Based on your matlab fopen, the file is big-endian ('b'), but your Python code does not take the endianness into account.
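Here is a sketch of the symptom on synthetic data (the four depth values are made up): reading big-endian bytes with the native little-endian dtype produces exactly the kind of tiny garbage magnitudes described above. Note also that MATLAB's reshape fills column-major, so its [128 64] array is the transpose of NumPy's (64, 128) row-major reshape.

```python
import numpy as np

# A made-up 4-value depth field serialized big-endian, as MATLAB's 'b' flag implies.
raw = np.array([-100.0, -200.0, 0.0, -5200.0], dtype='>f4').tobytes()

wrong = np.frombuffer(raw, dtype='<f4')   # bytes swapped: tiny garbage magnitudes
right = np.frombuffer(raw, dtype='>f4')   # matches the file's byte order
```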
Background
The binary file contains successive raw output from a camera sensor in the form of a Bayer pattern, i.e. the data is successive blocks of the form shown below, where each block is one image in the image stream:
[(bayer width) * (bayer height) * sizeof(short)]
Objective
To read information from a specific block of data and store it as an array for processing. I was digging through the OpenCV documentation and am totally lost on how to proceed. I apologize for the novice question, but any suggestions?
Assuming you can read the binary file (as a whole), I would try to use
NumPy to read it into a numpy.array. You can use numpy.fromstring (deprecated in current NumPy in favor of numpy.frombuffer) and, depending on the system the file was written on (little or big endian), use >i2 or <i2 as your data type (you can find the list of data types here).
Also note that > means big endian and < means little endian (more on that here)
You can set an offset and specify the length in order to read a certain block.
import numpy as np

with open('datafile.bin', 'rb') as f:
    dataBytes = f.read()
# frombuffer is the successor of the deprecated fromstring
data = np.frombuffer(dataBytes[blockStartIndex:blockEndIndex], dtype='>i2')
In case you cannot read the file as a whole, I would use mmap (requires a little knowledge of C) in order to break it down to multiple files and then use the method above.
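For what it's worth, Python's own mmap module (no C required) combined with numpy.frombuffer lets you view one block lazily without splitting the file; a sketch with made-up dimensions and a small stand-in file:

```python
import mmap, os, tempfile
import numpy as np

# Stand-in values for illustration; a real sensor would be much larger.
bayer_width, bayer_height = 4, 2
frame_bytes = bayer_width * bayer_height * 2     # sizeof(short) == 2

# Build a small stand-in file containing two consecutive frames.
path = os.path.join(tempfile.mkdtemp(), 'datafile.bin')
with open(path, 'wb') as f:
    f.write(np.arange(16, dtype='<u2').tobytes())

with open(path, 'rb') as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # View frame 1 without reading the whole file into memory.
    block = np.frombuffer(mm, dtype='<u2',
                          count=bayer_width * bayer_height,
                          offset=1 * frame_bytes)
```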
OP here: with @lsxliron's suggestion I looked into using NumPy to achieve my goals, and this is what I ended up doing:
import numpy as np
# Data read length
length = (bayer width) * (bayer height)
# In terms of bytes: short = 2
step = 2 * length
# Open filename
img = open("filename","rb")
# Seek to block i, the block we are interested in
img.seek(i * step)
# Copy data as Numpy array
Bayer = np.fromfile(img, dtype=np.uint16, count=length)
Bayer now holds the bayer pattern values in the form of a numpy array. Success!
I am an absolute beginner in programming, and I use Ubuntu.
But now I am trying to perform sound analysis with Python.
In the following code I used the wave package to open the wav file and struct to convert the information:
from wav import *
from struct import *
fp = wave.open("sound.wav", "rb")
total_num_samps = fp.getnframes()
num_fft = (total_num_samps // 512) - 2  # for an FFT length of 512
for i in range(num_fft):
tempb = fp.readframes(512);
tempb2 = struct.unpack('f', tempb)
print (tempb2)
So in terminal the message that appears is:
struct.error: unpack requires a string argument of length 4
Please, can someone help me solve this? Does anyone have a suggestion of another strategy to interpret the sound file?
The format string provided to struct has to tell it exactly the format of the second argument. For example, "there are one hundred and three unsigned shorts". The way you've written it, the format string says "there is exactly one float". But then you provide it a string with way more data than that, and it barfs.
So issue one is that you need to specify the exact number of packed C types in your byte string: in this case, 512 (the number of frames) times the number of channels (likely 2, but your code doesn't take this into account).
The second issue is that your .wav file simply doesn't contain floats. If it's 8-bit, it contains unsigned chars, if it's 16 bit it contains signed shorts, etc. You can check the actual sample width for your .wav by doing fp.getsampwidth().
So then: let's assume you have 512 frames of two-channel 16 bit audio; you would write the call to struct as something like:
channels = fp.getnchannels()
...
tempb = fp.readframes(512);
tempb2 = struct.unpack('{}h'.format(512*channels), tempb)
Using SciPy, you could load the .wav file into a NumPy array using:
import scipy.io.wavfile as wavfile
sample_rate, data = wavfile.read(FILENAME)
NumPy/SciPy will also be useful for computing the FFT.
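For example, a minimal FFT with NumPy might look like this (the sample rate and the 1 kHz test tone are made-up values for illustration):

```python
import numpy as np

# 512-point FFT of a synthetic 1 kHz tone.
sample_rate = 8000
t = np.arange(512) / sample_rate
signal = np.sin(2 * np.pi * 1000 * t)

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(512, d=1 / sample_rate)
peak_hz = freqs[np.argmax(spectrum)]     # dominant frequency, ~1000 Hz
```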
Tips:
On Ubuntu, you can install NumPy/SciPy with
sudo apt-get install python-scipy
This will install NumPy as well, since NumPy is a dependency of SciPy.
Avoid using * imports such as from struct import *. This copies
names from the struct namespace into the current module's global
namespace. Although it saves you a bit of typing, you pay an awful
price later when the script becomes more complex and you lose
track of where variables are coming from (or worse, the imported variables
mask the value of other variables with the same name).
I'm reading an image file of dpx format, and want to extract the "Orientation" in the image section of the header, and also modify it. I have never tried to interpret binary data, so I'm a bit at a loss. I'm trying to use the struct module, but I really don't know how to do it properly. The file header specification is here:
http://www.fileformat.info/format/dpx/egff.htm
Thanks.
There seems to be a constant offset to the Orientation field, so if this is all you want to change I wouldn't bother trying to parse the whole header; just work out the offset (which I think is the size of the GENERICFILEHEADER, plus one byte for the high byte of the orientation word) and read/manipulate it directly.
Using a bytearray would be my first choice. The offset varies by one depending on if it's in a big or little endian format, so something like this might work for you:
b = bytearray(your_byte_data)
big_endian = (b[0] == 0x53)  # the magic number starts with b'SDPX' in big-endian files
offset = 768 + big_endian
current_orientation = b[offset]  # get current orientation
b[offset] = new_orientation      # set it to something new
open('out_file', 'wb').write(b)
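Alternatively, struct can pull the whole 16-bit word out by offset while respecting the byte order signalled by the magic number; a sketch against the header layout in the linked spec (the fake header below is made up for illustration):

```python
import struct

def read_orientation(header: bytes) -> int:
    # The DPX magic is "SDPX" for big-endian files and "XPDS" for
    # little-endian; the image information header (whose first field is
    # the 16-bit orientation word) starts right after the 768-byte
    # generic file header.
    endian = '>' if header[:4] == b'SDPX' else '<'
    return struct.unpack_from(endian + 'H', header, 768)[0]

# Made-up big-endian header: magic, 764 padding bytes, orientation = 2.
fake = b'SDPX' + b'\x00' * 764 + struct.pack('>H', 2)
orientation = read_orientation(fake)
```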
You might want to consider using ImageMagick for that. It's open source and supports the DPX format.
The Python Imaging Library (PIL) has an attribute .info that might return the relevant data.