How can MATLAB function wavread() be implemented in Python? - python

In Matlab:
[s, fs] = wavread(file);
Since the value of each element in s is between -1 and 1, so we import scikits.audiolab using its wavread function.
from scikits.audiolab import wavread
s, fs, enc = wavread(filename)
But when I red the same input wav file, the value of s in Matlab and Python were totally different. How could I get the same output of s as in MATLAB?
p.s. The wav file is simple 16-bit mono channel file sampled in 44100Hz.

The Matlab wavread() function returns normalised data as default, i.e. it scales all the data to the range -1 to +1. If your audio file is in 16-bit format then the raw data values will be in the range -32768 to +32767, which should match the range returned by scikits.audiolab.wavread().
To normalise this data to within the range -1 to +1 all you need to do is divide the data array by the value with the maximum magnitude (using the numpy module in this case):
from scikits.audiolab import wavread
import numpy
s, fs, enc = wavread(filename) # s in range -32768 to +32767
s = numpy.array(s)
s = s / max(abs(s)) # s in range -1 to +1
Note that you can also use the 'native' option with the Matlab function to return un-normalised data values, as suggested by horchler in his comment.
[s, fs] = wavread(file, 'native');

Related

WAV FFT: Slice indices must be integers

I am trying to perform some analysis on a .wav file, I have taken the code from the following question (Python Scipy FFT wav files) and it seems to give exactly what I need however when running the code I run into the following error:
TypeError: slice indices must be integers or None or have an index method
This occurs on line 9 of my code. I don't undertand why this occurs, because I thought that the abs function would make it an integer.
import matplotlib.pyplot as plt
from scipy.fftpack import fft
from scipy.io import wavfile # get the api
fs, data = wavfile.read('New Recording 2.wav') # load the data
a = data.T[0] # this is a two channel soundtrack, I get the first track
b=[(ele/2**8.)*2-1 for ele in a] # this is 8-bit track, b is now normalized on [-1,1)
c = fft(b) # calculate fourier transform (complex numbers list)
d = len(c)/2 # you only need half of the fft list (real signal symmetry)
plt.plot(abs(c[:(d-1)]),'r')
plt.show()
plt.savefig("Test.png", bbox_inches = "tight")
abs() doesn't make your number into an integer. It just turns negative numbers into positive numbers. When len(c) is an odd number your variable d is a float that ends in x.5.
What you want is probably round(d) instead of abs(d)

maximum allowed dimention exceeded

I am attempting to make a painting based on the mass of the universe with pi and the gravitational constant of earth at sea level converted to binary. i've done the math and i have the right dimentions and it should only be less than a megabyte of ram but im running into maximum allowed dimention exceeded value error.
Here is the code:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
boshi = 123456789098765432135790864234579086542098765432135321 # universal mass
genesis = boshi ** 31467 # padding
artifice = np.binary_repr(genesis) # formatting
A = int(artifice)
D = np.array(A).reshape(A, (1348, 4117))
plt.imsave('hello_world.png', D, cmap=cm.gray) # save image
I keep running into the error at D = np.array..., and maybe my reshape is too big but its only a little bigger than 4k. seems like this should be no problem for gpu enhanced colab. Doesn't run on my home machine either with the same error. Would this be fixed with more ram?
Making it Work
The problem is that artifice = np.binary_repr(genesis) creates a string. The string consists of 1348 * 4117 = 5549716 digits, all of them zeros and ones. If you convert the string to a python integer, A = int(artifice), you will (A) wait a very long time, and (B) get a non-iterable object. The array you create with np.array(A) will have a single element.
The good news is that you can bypass the time-consuming step entirely using the fact that the string artifice is already an iterable:
D = np.array(list(artifice), dtype=np.uint8).reshape(1348, 4117)
The step list(artifice) will take a couple of seconds since it has to split up the string, but everything else should be quite fast.
Plotting is easy from there with plt.imsave('hello_world.png', D, cmap=cm.gray):
Colormaps
You can easily change the color map to coolwarm or whatever you want when you save the image. Keep in mind that your image is binary, so only two of the values will actually matter:
plt.imsave('hello_world2.png', D, cmap=cm.coolwarm)
Exploration
You have an opportunity here to add plenty of color to your image. Normally, a PNG is 8-bit. For example, instead of converting genesis to bits, you can take the bytes from it to construct an image. You can also take nibbles (half-bytes) to construct an indexed image with 16 colors. With a little padding, you can even make sure that you have a multiple of three data points, and create a full color RGB image in any number of ways. I will not go into the more complex options, but I would like to explore making a simple image from the bytes.
5549716 bits is 693715 = 5 * 11 * 12613 bytes (with four leading zero bits). This is a very nasty factorization leading to an image size of 55x12613, so let's remove that upper nibble: while 693716's factorization is just as bad as 693715's, 693714 factors very nicely into 597 * 1162.
You can convert your integer to an array of bytes using its own to_bytes method:
from math import ceil
byte_genesis = genesis.to_bytes(ceil(genesis.bit_length() / 8), 'big')
The reason that I use the built-in ceil rather than np.ceil is that it return an integer rather than a float.
Converting the huge integer is very fast because the bytes object has direct access to the data of the integer: even if it makes a copy, it does virtually no processing. It may even share the buffer since both bytes and int are nominally immutable. Similarly, you can create a numpy array from the bytes as just a view to the same memory location using np.frombuffer:
img = np.frombuffer(byte_genesis, dtype=np.uint8)[1:].reshape(597, 1162)
The [1:] is necessary to chop off the leading nibble, since bytes_genesis must be large enough to hold the entirety of genesis. You could also chop off on the bytes side:
img = np.frombuffer(byte_genesis[1:], dtype=np.uint8).reshape(597, 1162)
The results are identical. Here is what the picture looks like:
plt.imsave('hello_world3.png', img, cmap=cm.viridis)
The result is too large to upload (because it's not a binary image), but here is a randomly selected sample:
I am not sure if this is aesthetically what you are looking for, but hopefully this provides you with a place to start looking at how to convert very large numbers into data buffers.
More Options, Because this is Interesting
I wanted to look at using nibbles rather than bytes here, since that would allow you to have 16 colors per pixel, and twice as many pixels. You can get an 1162x1194 image starting from
temp = np.frombuffer(byte_genesis, dtype=np.uint8)[1:]
Here is one way to unpack the nibbles:
img = np.empty((1162, 1194), dtype=np.uint8)
img.ravel()[::2] = np.bitwise_and(temp >> 4, 0x0F)
img.ravel()[1::2] = np.bitwise_and(temp, 0x0F)
With a colormap like jet, you get:
plt.imsave('hello_world4.png', img, cmap=cm.jet)
Another option, going in the opposite direction in a manner of speaking) is not to use colormaps at all. Instead, you can divide your space by a factor of three and generate your own colors in RGB space. Luckily, one of the prime factors of 693714 is 3. You can therefore have a 398x581 image (693714 == 3 * 398 * 581). How you interpret the data is even more than usual up to you.
Side Note Before I Continue
With the black-and-white binary image, you could control the color, size and orientation of the image. With 8-bit data, you could control how the bits were sampled (8 or fewer, as in the 4-bit example), the endianness of your interpretation, the color map, and the image size. With full color, you can treat each triple as a separate color, treat the entire dataset as three consecutive color planes, or even do something like apply a Bayer filter to the array. All in addition to the other options like size, ordering, number of bits per sample, etc.
The following will show the color triples and three color planes options for now.
Full Color Images
To treat each set of 3 consecutive bytes as an RGB triple, you can do something like this:
img = temp.reshape(398, 581, 3)
plt.imsave('hello_world5.png', img)
Notice that there is no colormap in this case.
Interpreting the data as three color planes requires an extra step because plt.imsave expects the last dimension to have size 3. np.rollaxis is a good tool for this:
img = np.rollaxis(temp.reshape(3, 398, 581), 0, 3)
plt.imsave('hello_world6.png', img)
I could not reproduce your problem, because the line A = int(artifice) took like forever. I replaced it with a ,for loop to cast each digit on its own. The code worked then and produced the desired image.
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
boshi = 123456789098765432135790864234579086542098765432135321
genesis = boshi ** 31467
artifice = np.binary_repr(genesis)
D = np.zeros((1348, 4117), dtype=int)
for i, val in enumerate(D):
D[i] = int(artifice[i])
plt.imsave('hello_world.png', D, cmap=cm.gray)

Reading MONO 16-bit images using PyCapture2

I am using the CMLN-13S2M-CS camera from PointGrey. This camera has a MONO 16-bit pixel format.Using the PyCapture2 wrapper from PointGrey I am unable to retrieve the image the camera is recording.
I have the following code:
import sys
import numpy
import PyCapture2
## Connect camera
bus = PyCapture2.BusManager()
c = PyCapture2.Camera()
c.connect(bus.getCameraFromIndex(0))
## Configure camera format7 settings
fmt7imgSet = PyCapture2.Format7ImageSettings(0, 0, 0, 1296, 964, PyCapture2.PIXEL_FORMAT.MONO16)
fmt7pktInf, isValid = c.validateFormat7Settings(fmt7imgSet)
c.setFormat7ConfigurationPacket(fmt7pktInf.recommendedBytesPerPacket, fmt7imgSet)
## Start capture and retrieve buffer
c.startCapture()
im = c.retrieveBuffer()
print im.getData().shape
print numpy.max(im.getData())
The following is returned by the print statements: (2498688,) and 240. The shape is exactly 2 x (964 x 1296). How should I reshape this? Also, the maximum value when saturated is 255. This is odd as this corresponds to MONO 8 Pixel format. What am I doing wrong?
Here's a quick demo that shows how to convert a 1D array of uint8 to a 2D array of uint16. The key function we need here is view.
import numpy as np
# Make 24 bytes of fake data
raw = np.arange(24, dtype=np.uint8)
#Convert
out = raw.view(np.uint16).reshape(3, 4)
print(out)
print(out.dtype)
output
[[ 256 770 1284 1798]
[2312 2826 3340 3854]
[4368 4882 5396 5910]]
uint16
Thanks to Andras Deak for his assistance!
If the resulting image doesn't look correct, you may need to swap the byte ordering of the 16 bit integers. You can read about byte ordering in Numpy here.
And if that still doesn't look correct, then the data may be organized as two planes, with one plane for the low-order bits of a pixel and the other plane for the high-order bits. That's also easy to deal with, but hopefully it won't come to that. ;)

Numpy image slicing returning black patches/ wrong values

The end goal is to take an image and slice it up into samples that I save. The problem is that my slices are randomly returning black/ incorrect patches. Bellow is a small sample program.
import scipy.ndimage as ndimage
import scipy.misc as misc
import numpy as np
image32 = misc.imread("work0.png")
patches = np.zeros((36, 8, 8))
for i in range(4):
for j in range(4):
patches[i*4 + j] = image32[i:i+8,j:j+8]
misc.imsave("{0}{1}.png".format(i,j), patches[i*4 + j])
An example of my image would be:
Patch of 0,0 of 8x8 patch yields:
Two things:
You are initializing your patch matrix to be the wrong data type. By default, numpy will make patches matrix a np.float64 type and if you use this with saving, you won't get the results you would expect. Specifically, if you consult Mr. F's answer, there is actually some scaling performed on floating-point images where the minimum and maximum values of the image get scaled to black and white respectively and so if you have an image that is completely uniform in background, both the minimum and maximum will be the same and will get visualized to black. As such, the best thing is to respect the original image's data type, namely setting the dtype of your patches matrix to np.uint8.
Judging from your for loop indexing, you want to extract out 8 x 8 patches that are non-overlapping. This means that if you have a 32 x 32 image with 8 x 8 patches, you have 16 patches in total arranged in a 4 x 4 grid.
Therefore, you need to change the patches statement so that it has 16 in the first dimension, not 36. In addition, you'll have to adjust the way you're indexing into your image to extract out the 8 x 8 patches because right now, the patches are overlapping. Specifically, you want to make the image patch indexing go from 8*i to 8*(i+1) for the rows and 8*j to 8*(j+1) for the columns. If you substitute sample values of i and j yourself, you'll see that we get unique 8 x 8 patches for each grid in your image.
With both of the above things I noted, the modified code should be:
import scipy.ndimage as ndimage
import scipy.misc as misc
import numpy as np
image32 = misc.imread('work0.png')
patches = np.zeros((16,8,8), dtype=np.uint8) # Change
for i in range(4):
for j in range(4):
patches[i*4 + j] = image32[8*i:8*(i+1),8*j:8*(j+1)] # Change
misc.imsave("{0}{1}.png".format(i,j), patches[i*4 + j])
When I do this and take a look at the output images, I get what I expect.
To be absolutely sure, let's plot the segments using matplotlib. You've conveniently saved all of the patches in patches so it shouldn't be a problem showing what we need. However, I'll place some code in comments so that you can read in the images that were saved from disk with your above code so you can verify that it still works, regardless of looking at patches or the images on disk:
import matplotlib.pyplot as plt
plt.figure()
for i in range(4):
for j in range(4):
plt.subplot(4, 4, 4*i + j + 1)
img = patches[4*i + j]
# or you can do this:
# img = misc.imread('{0}{1}.png'.format(i,j))
img = np.dstack([img, img, img])
plt.imshow(img)
plt.show()
The weird thing about matplotlib.pyplot.imshow is that if you have an image that is single channel (such as your case) that has the same intensity all around, it gets visualized to black no matter what the colour map is, much like what we experienced with imsave. Therefore, I had to artificially make this a RGB image but with all of the channels to be the same so this gets visualized as grayscale before we show the image.
We get:
According to this answer the issue is that imsave normalizes the data so that the computed minimum is defined as black (and, if there is a distinct maximum, that is defined as white).
This led me to go digging as to why the suggested use of uint8 did work to create the desired output. As it turns out, in the source there is a function called bytescale that gets called internally.
Actually, imsave itself is a very thin wrapper around toimage followed by save (from the image object). Inside of toimage if mode is None (which it is by default), that's when bytescale gets invoked.
It turns out that bytescale has an if statement that checks for the uint8 data type, and if the data is in that format, it returns the data unaltered. But if not, then the data is scaled according to a max and min transformation (where 0 and 255 are the default low and high pixel values to compare to).
This is the full snippet of code linked above:
if data.dtype == uint8:
return data
if high < low:
raise ValueError("`high` should be larger than `low`.")
if cmin is None:
cmin = data.min()
if cmax is None:
cmax = data.max()
cscale = cmax - cmin
if cscale < 0:
raise ValueError("`cmax` should be larger than `cmin`.")
elif cscale == 0:
cscale = 1
scale = float(high - low) / cscale
bytedata = (data * 1.0 - cmin) * scale + 0.4999
bytedata[bytedata > high] = high
bytedata[bytedata < 0] = 0
return cast[uint8](bytedata) + cast[uint8](low)
For the blocks of your data that are all 255, cscale will be 0, which will be checked for and changed to 1. Then the line
bytedata = (data * 1.0 - cmin) * scale + 0.4999
will result in the whole image block having the float value of 0.4999, thus set explicitly to 0 in the next chunk of code (when casted to uint8 from float) as for example:
In [102]: np.cast[np.uint8](0.4999)
Out[102]: array(0, dtype=uint8)
You can see in the body of bytescale that there are only two possible ways to return: either your data is type uint8 and it's returned as-is, or else it goes through this kind of silly scaling process. So in the end, it is indeed correct, and good practice, to be using uint8 for the pieces of your code that specifically load from or save to an image format via these functions.
So this cascade of stuff is why you were getting all zeros in the outputted image file and why the other suggestion of using dtype=np.uint8 actually helps you. It's not because you need to avoid floating point data for images, just because of this bizarre convention to check and scale data on the part of imsave.

Read the data of a single channel from a stereo wave file in Python

I have to read the data from just one channel in a stereo wave file in Python.
For this I tried it with scipy.io:
import scipy.io.wavfile as wf
import numpy
def read(path):
data = wf.read(path)
for frame in data[1]:
data = numpy.append(data, frame[0])
return data
But this code is very slow, especially if I have to work with longer files.
So does anybody know a faster way to do this? I thought about the standard wave module by using wave.readframes(), but how are the frames stored there?
scipy.io.wavfile.read returns the tuple (rate, data). If the file is stereo, data is a numpy array with shape (nsamples, 2). To get a specific channel, use a slice of data. For example,
rate, data = wavfile.read(path)
# data0 is the data from channel 0.
data0 = data[:, 0]
The wave module returns the frames as a string of bytes, which can be converted to numbers with the struct module. For instance:
def oneChannel(fname, chanIdx):
""" list with specified channel's data from multichannel wave with 16-bit data """
f = wave.open(fname, 'rb')
chans = f.getnchannels()
samps = f.getnframes()
sampwidth = f.getsampwidth()
assert sampwidth == 2
s = f.readframes(samps) #read the all the samples from the file into a byte string
f.close()
unpstr = '<{0}h'.format(samps*chans) #little-endian 16-bit samples
x = list(struct.unpack(unpstr, s)) #convert the byte string into a list of ints
return x[chanIdx::chans] #return the desired channel
If your WAV file has some other sample size, you can use the (uglier) function in another answer I wrote here.
I've never used scipy's wavfile function so I can't compare speed, but the wave and struct approach I use here has always worked for me.
rate, audio = wavfile.read(path)
audio = np.mean(audio, axis=1)

Categories