I would like to write a data converter tool. I need to analyze the bitstream in a file to display 2D cross-sections of a 3D volume.
The dataset I am trying to view can be found here: https://figshare.com/articles/SSOCT_test_dataset_for_OCTproZ/12356705.
It's the file titled: burned_wood_with_tape_1664x512x256_12bit.raw (832 MB)
I would greatly appreciate some direction, and I'm willing to award a bounty if I can get some software that displays the dataset as images using a data conversion.
As I'm totally new to this concept, I don't have code to show for this problem. However, here's a little something I tried using inspiration from other questions on SO:
import rawpy
import imageio

path = "Datasets/burned_wood_with_tape_1664x512x256_12bit.raw"

for item in path:
    item_path = path + item
    raw = rawpy.imread(item_path)
    rgb = raw.postprocess()
    rawpy.imshow(rgb)
Below I implemented the following visualization.
The example RAW file burned_wood_with_tape_1664x512x256_12bit.raw consists of 1664 samples per A-scan, 512 A-scans per B-scan, 16 B-scans per buffer, 16 buffers per volume, and 2 volumes in this file. Each sample is encoded as a 2-byte unsigned integer in little-endian order; only the 12 higher bits are used, the lower 4 bits contain zeros. Samples are centered approximately around 2^15; to be precise, the data has these stats: min 0, max 47648, mean 32757, standard deviation 454.5.
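To make that layout concrete, here is a minimal sketch (my addition; the frame index k is an arbitrary value in 0..511) that reads a single frame with np.memmap, without loading the whole 832 MB file:

import numpy as np

# '<u2' = little-endian unsigned 16-bit, matching the description above
sizeX, sizeY, k = 1664, 512, 0
frame = np.memmap('burned_wood_with_tape_1664x512x256_12bit.raw',
                  dtype='<u2', mode='r',
                  offset=k * sizeY * sizeX * 2,
                  shape=(sizeY, sizeX))
print(frame.min(), frame.max(), frame.mean())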
I draw grayscale images of size 1664 x 512; there are a total of 16 * 16 * 2 = 512 such images (frames) in the file. I draw the animated frames on screen using the matplotlib library and also render the animation into a GIF file. One example of a rendered GIF, at reduced quality, is located after the code.
To render/draw images at a different resulting resolution you need to change the code line with plt.rcParams['figure.figsize']. This figure size holds (width_in_inches, height_in_inches); by default DPI (dots per inch) equals 100, meaning that if you want a resulting GIF of resolution 720x265 you need to set this figure size to (7.2, 2.65). Also, the resulting GIF animation is a bit smaller than this because the axes and padding are included in the figure size.
The code below needs its pip modules installed once with the command python -m pip install numpy matplotlib.
# Needs: python -m pip install numpy matplotlib
def oct_show(file, *, begin = 0, end = None):
    import os, numpy as np, matplotlib, matplotlib.pyplot as plt, matplotlib.animation
    plt.rcParams['figure.figsize'] = (7.2, 2.65) # (4.8, 1.75) (7.2, 2.65) (9.6, 3.5)
    sizeX, sizeY, cnt, bits = 1664, 512, 16 * 16 * 2, 12
    stepX, stepY = 16, 8
    fps = 5
    try:
        fsize, opened_here = None, False
        if type(file) is str:
            fsize = os.path.getsize(file)
            file, opened_here = open(file, 'rb'), True
        by = (bits + 7) // 8 # bytes per sample
        if end is None and fsize is not None:
            end = fsize // (sizeX * sizeY * by)
        imgs = []
        file.seek(begin * sizeY * sizeX * by)
        a = file.read((end - begin) * sizeY * sizeX * by)
        a = np.frombuffer(a, dtype = np.uint16)
        a = a.reshape(end - begin, sizeY, sizeX)
        amin, amax, amean, stdd = np.amin(a), np.amax(a), np.mean(a), np.std(a)
        print('min', amin, 'max', amax, 'mean', round(amean, 1), 'std_dev', round(stdd, 3))
        # normalize to zero mean / unit variance, then map into the 8-bit range
        a = (a.astype(np.float32) - amean) / stdd
        a = np.maximum(0.1, np.minimum(a * 128 + 128.5, 255.1)).astype(np.uint8)
        a = a[:, :, :, None].repeat(3, axis = -1) # grayscale to RGB
        fig, ax = plt.subplots()
        plt.subplots_adjust(left = 0.08, right = 0.99, bottom = 0.06, top = 0.97)
        for i in range(a.shape[0]):
            title = ax.text(
                0.5, 1.02, f'Frame {i}',
                size = plt.rcParams['axes.titlesize'],
                ha = 'center', transform = ax.transAxes,
            )
            imgs.append([ax.imshow(a[i], interpolation = 'antialiased'), title])
        ani = matplotlib.animation.ArtistAnimation(plt.gcf(), imgs, interval = 1000 // fps)
        print('Saving animated frames to GIF...', flush = True)
        ani.save(file.name + '.gif', writer = 'imagemagick', fps = fps)
        print('Showing animated frames on screen...', flush = True)
        plt.show()
    finally:
        if opened_here:
            file.close()

oct_show('burned_wood_with_tape_1664x512x256_12bit.raw')
Example output GIF:
I don't think it's a valid RAW file at all.
If you try this code:
import rawpy
import imageio
path = 'Datasets/burned_wood_with_tape_1664x512x256_12bit.raw'
raw = rawpy.imread(path)
rgb = raw.postprocess()
You will get the following error:
----> 5 raw = rawpy.imread(path)
6 rgb = raw.postprocess()
~\Anaconda3\envs\py37tf2gpu\lib\site-packages\rawpy\__init__.py in imread(pathOrFile)
18 d.open_buffer(pathOrFile)
19 else:
---> 20 d.open_file(pathOrFile)
21 return d
rawpy\_rawpy.pyx in rawpy._rawpy.RawPy.open_file()
rawpy\_rawpy.pyx in rawpy._rawpy.RawPy.handle_error()
LibRawFileUnsupportedError: b'Unsupported file format or not RAW file'
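rawpy wraps LibRaw, which only understands camera RAW formats that carry metadata headers, and this file is a headerless dump. As a minimal sketch (dimensions assumed from the file name), you can read it directly with numpy instead:

import numpy as np

# headerless dump: 512 A-scans x 1664 samples per frame, little-endian uint16
data = np.fromfile('Datasets/burned_wood_with_tape_1664x512x256_12bit.raw',
                   dtype='<u2')
frames = data.reshape(-1, 512, 1664)  # one grayscale image per frame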
I am working on a violence detection service. I am trying to develop software based on the code in this repo. My dataset consists of videos residing in two directories, "Violence" and "Non-Violence".
I used this code to generate npy files out of RGB channels and optical flow features. The output of this part is 2 folders containing npy arrays of shape 244x244x5 (np.float32 dtype), so I have the video frames in RGB in the first 3 channels (npy[..., :3]) and the optical flow features in the next two channels (npy[..., 3:]).
Now I am trying to convert them to tfrecords and use tf.data.TFRecordDataset to speed up the training process. Since my model input has to be a cube tensor, my training elements have to be 64 frames of each video, which means the data point shape has to be 64x244x244x5.
So I used this code to convert the npy files to tfrecords.
from pathlib import Path
from os.path import join
import tensorflow as tf
import numpy as np
import cv2
from tqdm import tqdm


def normalize(data):
    mean = np.mean(data)
    std = np.std(data)
    return (data - mean) / std


def random_flip(video, prob):
    s = np.random.rand()
    if s < prob:
        video = np.flip(m=video, axis=2)
    return video


def color_jitter(video):
    # range of s-component: 0-1
    # range of v-component: 0-255
    s_jitter = np.random.uniform(-0.2, 0.2)
    v_jitter = np.random.uniform(-30, 30)
    for i in range(len(video)):
        hsv = cv2.cvtColor(video[i], cv2.COLOR_RGB2HSV)
        s = hsv[..., 1] + s_jitter
        v = hsv[..., 2] + v_jitter
        s[s < 0] = 0
        s[s > 1] = 1
        v[v < 0] = 0
        v[v > 255] = 255
        hsv[..., 1] = s
        hsv[..., 2] = v
        video[i] = cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
    return video


def uniform_sample(video: np.ndarray, target_frames: int = 64) -> np.ndarray:
    """
    Uniformly samples target_frames frames from the video,
    padding with the last frames if too few were sampled.
    Args:
        video: array of frames to sample from
        target_frames: number of frames to return
    Returns:
        float32 array of target_frames frames
    """
    len_frames = int(len(video))
    interval = int(np.ceil(len_frames / target_frames))
    # init empty list for sampled video
    sampled_video = []
    for i in range(0, len_frames, interval):
        sampled_video.append(video[i])
    # calculate number of padded frames and fix it
    num_pad = target_frames - len(sampled_video)
    if num_pad > 0:
        padding = [video[i] for i in range(-num_pad, 0)]
        sampled_video += padding
    return np.array(sampled_video, dtype=np.float32)


def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


if __name__ == '__main__':
    path = Path('transformed/')
    npy_files = list(path.rglob('*.npy'))[:100]
    aug = True
    # one_hots = to_categorical(range(2), dtype=np.int8)
    path_to_save = 'data_tfrecords'
    tfrecord_path = join(path_to_save, 'all_data.tfrecord')
    with tf.io.TFRecordWriter(tfrecord_path) as writer:
        for file in tqdm(npy_files, desc='files converted'):
            # load npy files
            npy = np.load(file.as_posix(), mmap_mode='r')
            data = np.float32(npy)
            del npy
            # uniform sampling
            data = uniform_sample(data, target_frames=64)
            # add augmentation
            if aug:
                data[..., :3] = color_jitter(data[..., :3])
                data = random_flip(data, prob=0.5)
            # normalization
            data[..., :3] = normalize(data[..., :3])
            data[..., 3:] = normalize(data[..., 3:])
            # label encoding
            label = 1 if file.parent.stem.startswith('F') else 0
            # label = one_hots[label]
            feature = {'image': _bytes_feature(tf.compat.as_bytes(data.tobytes())),
                       'label': _int64_feature(int(label))}
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())
The code works fine, but the real problem is that it consumes too much disk space. My whole dataset of 2000 videos takes 12 GB; when I converted them to npy files it became around 80 GB, and now using tfrecords it is over 120 GB. How can I convert them in an efficient way that reduces the space required to store them?
The answer might be too late, but I see you are still saving the full video frames in your tfrecords file, which is why the file is taking more space.
Try removing the "image" feature from your features list and saving, per frame, only what you actually need (its height, width, channels, and so forth):
feature = {'label': _int64_feature(int(label))}
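Independently of that, here is a hedged sketch of another space saver (not part of the suggestion above): write the TFRecord with GZIP compression and read it back with the matching compression_type. The dummy array below only stands in for a real 64x244x244x5 sample:

import numpy as np
import tensorflow as tf

options = tf.io.TFRecordOptions(compression_type='GZIP')
with tf.io.TFRecordWriter('all_data.tfrecord.gz', options=options) as writer:
    data = np.zeros((64, 244, 244, 5), dtype=np.float32)  # stand-in sample
    feature = {'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[data.tobytes()])),
               'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[0]))}
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    writer.write(example.SerializeToString())

# reading requires the matching compression_type
ds = tf.data.TFRecordDataset('all_data.tfrecord.gz', compression_type='GZIP')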
I am having some trouble saving my PDF properly. I am trying to plot a barcode label and subsequently save it as a PDF, as in the following code. I have installed the code128.ttf font on my Windows machine. Also, I have tried setting the .savefig dpi argument to fig.dpi, as argued in this post.
import os
import logging

import matplotlib.pyplot as plt
from matplotlib import font_manager as fm


def draw_label(label, label_dimensions_x=3.8189, label_dimensions_y=1.41732):
    # import barcode code128 font
    fpath = os.path.join("path", "to", "font", "code128.ttf")
    prop = fm.FontProperties(fname=fpath, size=58)
    fig, ax = plt.subplots(1, figsize=(label_dimensions_x,
                                       label_dimensions_y))
    plt.axis('off')
    plt.xticks([], [])
    plt.yticks([], [])
    plt.tight_layout()
    plt.xlim(0, label_dimensions_x)
    plt.ylim(0, label_dimensions_y)
    # plot barcode
    plt.text(label_dimensions_x / 2, label_dimensions_y / 2, label,
             ha='center', va='bottom',
             fontproperties=prop)
    plt.show()
    try:
        plt.savefig(os.path.join("path", "to", "output", label + '.pdf'),
                    dpi=plt.gcf().dpi)
    except PermissionError:
        logging.warning("Close the current label PDFs before running this script.")
    plt.close()
    return

draw_label('123456789')
This is what is output in the plot window.
This is what is output in the .pdf saved file, and this happens for all kinds of labels - it's not as if the numbers 1 to 9 except 8 are not printable.
EDIT: If I substitute a normal text font (in this case Frutiger Roman) for the code128.ttf and set plt.axis('on'), the text is not clipped; see this. Admittedly, it's not pretty and doesn't fit too well, but it should still be readable.
Sam,
First, your barcode won't scan, as is. The string requires a start character, a checksum and a stop character to be added for Code128B. So, there's that.
I recommend changing to a Code 39 font (which doesn't require a checksum, and whose start and stop characters are the same: "*"), or writing the code to produce the checksum and learning a little more about Code 128 at the Code 128 Wiki.
Second, I suspect there are issues with the bounding box for the graphic during the conversion to PDF. That small section of barcode being converted looks more like a piece of the number nine in the string. I suspect there is some image clipping going on.
Try substituting a regular text font to make sure the barcode image isn't being lost in the conversion.
Edited answer to include suggestion to use PNG instead of PDF.
I managed to get the software to work if you output to PNG format. I know, now the problem becomes how to convert PNG to PDF. You can start by investigating some of the libraries mentioned here: Create PDF from a list of images
In short I recommend you create graphics files and then embed them in document files.
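For the PNG-to-PDF step, one lightweight option (a sketch of mine, assuming Pillow is installed) is to let Pillow write the PDF itself:

from PIL import Image

# Pillow can save a raster image directly as a single-page PDF
Image.open('123456789.png').convert('RGB').save('123456789.pdf')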
I also added the code you need to build the barcode with the start, checksum and stop characters:
import os
import logging

import matplotlib.pyplot as plt
from matplotlib import font_manager as fm


def draw_label(label, label_dimensions_x=3.8189, label_dimensions_y=1.41732):
    # import barcode code128 font
    fpath = os.path.join("./", "code128.ttf")
    prop = fm.FontProperties(fname=fpath, size=32)
    fig, ax = plt.subplots(1, figsize=(label_dimensions_x,
                                       label_dimensions_y))
    plt.axis('off')
    plt.xticks([], [])
    plt.yticks([], [])
    plt.tight_layout()
    plt.xlim(0, label_dimensions_x)
    plt.ylim(0, label_dimensions_y)
    # calc checksum THEN plot barcode
    weight = 1
    chksum = 104  # start value for Code 128B
    for x in label:
        chksum = chksum + weight * (ord(x) - 32)
        weight = weight + 1
    chksum = chksum % 103
    chkchar = chr(chksum + 32)
    # start char + label + checksum char + stop char, as mapped by the font
    label128 = "%s%s%s%s" % ('Ñ', label, chkchar, 'Ó')
    plt.text(label_dimensions_x / 2, label_dimensions_y / 2, label128,
             ha='center', va='bottom',
             fontproperties=prop)
    try:
        plt.savefig(os.path.join("./", label + '.png'))
    except PermissionError:
        logging.warning("Close the current label PDFs before running this script.")
    return

draw_label('123456789')
draw_label('987654321')
draw_label('Test&Show')
You're over-complicating things by using matplotlib and a font. Generating an image directly and saving it to a PDF file is not much more complicated, and far more reliable.
As noted by Brian Anderson, it's not enough to encode the characters in your string. You need to add a start code, a checksum, and a stop code to make a complete barcode. The function code128_codes below does this, leaving the conversion to an image as a separate step.
from PIL import Image

def list_join(seq):
    ''' Join a sequence of lists into a single list, much like str.join
        will join a sequence of strings into a single string.
    '''
    return [x for sub in seq for x in sub]

_code128B_mapping = dict((chr(c), [98, c+64] if c < 32 else [c-32]) for c in range(128))
_code128C_mapping = dict([(u'%02d' % i, [i]) for i in range(100)] + [(u'%d' % i, [100, 16+i]) for i in range(10)])

def code128_codes(s):
    ''' Code 128 conversion to a list of raw integer codes.
        Only encodes ASCII characters, does not take advantage of
        FNC4 for bytes with the upper bit set. Control characters
        are not optimized and expand to 2 characters each.
        Coded for https://stackoverflow.com/q/52710760/5987
    '''
    if s.isdigit() and len(s) >= 2:
        # use Code 128C, pairs of digits
        codes = [105] + list_join(_code128C_mapping[s[i:i+2]] for i in range(0, len(s), 2))
    else:
        # use Code 128B and shift for Code 128A
        codes = [104] + list_join(_code128B_mapping[c] for c in s)
    check_digit = (codes[0] + sum(i * x for i, x in enumerate(codes))) % 103
    codes.append(check_digit)
    codes.append(106) # stop code
    return codes

_code128_patterns = '''
11011001100 11001101100 11001100110 10010011000 10010001100 10001001100
10011001000 10011000100 10001100100 11001001000 11001000100 11000100100
10110011100 10011011100 10011001110 10111001100 10011101100 10011100110
11001110010 11001011100 11001001110 11011100100 11001110100 11101101110
11101001100 11100101100 11100100110 11101100100 11100110100 11100110010
11011011000 11011000110 11000110110 10100011000 10001011000 10001000110
10110001000 10001101000 10001100010 11010001000 11000101000 11000100010
10110111000 10110001110 10001101110 10111011000 10111000110 10001110110
11101110110 11010001110 11000101110 11011101000 11011100010 11011101110
11101011000 11101000110 11100010110 11101101000 11101100010 11100011010
11101111010 11001000010 11110001010 10100110000 10100001100 10010110000
10010000110 10000101100 10000100110 10110010000 10110000100 10011010000
10011000010 10000110100 10000110010 11000010010 11001010000 11110111010
11000010100 10001111010 10100111100 10010111100 10010011110 10111100100
10011110100 10011110010 11110100100 11110010100 11110010010 11011011110
11011110110 11110110110 10101111000 10100011110 10001011110 10111101000
10111100010 11110101000 11110100010 10111011110 10111101110 11101011110
11110101110 11010000100 11010010000 11010011100 1100011101011'''.split()

def code128_img(s, height=100, bar_width=1):
    ''' Generate a Code 128 barcode image.
        Coded for https://stackoverflow.com/q/52968042/5987
    '''
    codes = code128_codes(s)
    pattern = ''.join(_code128_patterns[c] for c in codes)
    pattern = '00000000000' + pattern + '00000000000'
    width = bar_width * len(pattern)
    color, bg = (0, 0, 0), (255, 255, 255)
    im = Image.new('RGB', (width, height), bg)
    ld = im.load()
    for i, bar in enumerate(pattern):
        if bar == '1':
            for y in range(height):
                for x in range(i * bar_width, (i + 1) * bar_width):
                    ld[x, y] = color
    return im
>>> im = code128_img('AM-H-10-01-1')
>>> im.save(r'c:\temp\temp.pdf')
I am trying to create a spectrogram from a .wav file in python3.
I want the final saved image to look similar to this image:
I have tried the following:
This stack overflow post:
Spectrogram of a wave file
This post worked, somewhat. After running it, I got
However, this graph does not contain the colors that I need. I need a spectrogram that has colors. I tried to tinker with this code to add the colors, but after spending significant time and effort on it, I couldn't figure it out!
I then tried this tutorial.
This code crashed (on line 17) when I tried to run it, with the error TypeError: 'numpy.float64' object cannot be interpreted as an integer.
line 17:
samples = np.append(np.zeros(np.floor(frameSize/2.0)), sig)
I tried to fix it by casting
samples = int(np.append(np.zeros(np.floor(frameSize/2.0)), sig))
and I also tried
samples = np.append(np.zeros(int(np.floor(frameSize/2.0)), sig))
However, neither of these worked in the end.
I would really like to know how to convert my .wav files to spectrograms with color so that I can analyze them! Any help would be appreciated!!!!!
Please tell me if you want me to provide any more information about my version of python, what I tried, or what I want to achieve.
Use scipy.signal.spectrogram.
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile
sample_rate, samples = wavfile.read('path-to-mono-audio-file.wav')
frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate)
plt.pcolormesh(times, frequencies, spectrogram)
plt.imshow(spectrogram)
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()
Be sure that your wav file is mono (single channel) and not stereo (dual channel) before trying to do this. I highly recommend reading the scipy documentation at https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.signal.spectrogram.html.
Putting plt.pcolormesh before plt.imshow seems to fix some issues, as pointed out by @Davidjb, and if an unpacking error occurs, follow the steps by @cgnorthcutt below.
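If your wav file is stereo, here is a small sketch of one way to make it mono first (wavfile.read returns an (n_samples, n_channels) array for stereo; averaging the channels is one choice among several):

from scipy.io import wavfile

sample_rate, samples = wavfile.read('path-to-stereo-audio-file.wav')
if samples.ndim == 2:  # stereo: average the two channels into one
    samples = samples.mean(axis=1)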
I have fixed the errors you were facing in the code from http://www.frank-zalkow.de/en/code-snippets/create-audio-spectrograms-with-python.html
This implementation is better because you can change the binsize (e.g., binsize=2**8).
import numpy as np
from matplotlib import pyplot as plt
import scipy.io.wavfile as wav
from numpy.lib import stride_tricks

""" short time fourier transform of audio signal """
def stft(sig, frameSize, overlapFac=0.5, window=np.hanning):
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))

    # zeros at beginning (thus center of 1st window should be for sample nr. 0)
    samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)
    # cols for windowing
    cols = np.ceil((len(samples) - frameSize) / float(hopSize)) + 1
    # zeros at end (thus samples can be fully covered by frames)
    samples = np.append(samples, np.zeros(frameSize))

    frames = stride_tricks.as_strided(samples, shape=(int(cols), frameSize), strides=(samples.strides[0]*hopSize, samples.strides[0])).copy()
    frames *= win

    return np.fft.rfft(frames)

""" scale frequency axis logarithmically """
def logscale_spec(spec, sr=44100, factor=20.):
    timebins, freqbins = np.shape(spec)

    scale = np.linspace(0, 1, freqbins) ** factor
    scale *= (freqbins-1)/max(scale)
    scale = np.unique(np.round(scale))

    # create spectrogram with new freq bins
    newspec = np.complex128(np.zeros([timebins, len(scale)]))
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            newspec[:,i] = np.sum(spec[:,int(scale[i]):], axis=1)
        else:
            newspec[:,i] = np.sum(spec[:,int(scale[i]):int(scale[i+1])], axis=1)

    # list center freq of bins
    allfreqs = np.abs(np.fft.fftfreq(freqbins*2, 1./sr)[:freqbins+1])
    freqs = []
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            freqs += [np.mean(allfreqs[int(scale[i]):])]
        else:
            freqs += [np.mean(allfreqs[int(scale[i]):int(scale[i+1])])]

    return newspec, freqs

""" plot spectrogram """
def plotstft(audiopath, binsize=2**10, plotpath=None, colormap="jet"):
    samplerate, samples = wav.read(audiopath)

    s = stft(samples, binsize)

    sshow, freq = logscale_spec(s, factor=1.0, sr=samplerate)

    ims = 20.*np.log10(np.abs(sshow)/10e-6) # amplitude to decibel

    timebins, freqbins = np.shape(ims)

    print("timebins: ", timebins)
    print("freqbins: ", freqbins)

    plt.figure(figsize=(15, 7.5))
    plt.imshow(np.transpose(ims), origin="lower", aspect="auto", cmap=colormap, interpolation="none")
    plt.colorbar()

    plt.xlabel("time (s)")
    plt.ylabel("frequency (hz)")
    plt.xlim([0, timebins-1])
    plt.ylim([0, freqbins])

    xlocs = np.float32(np.linspace(0, timebins-1, 5))
    plt.xticks(xlocs, ["%.02f" % l for l in ((xlocs*len(samples)/timebins)+(0.5*binsize))/samplerate])
    ylocs = np.int16(np.round(np.linspace(0, freqbins-1, 10)))
    plt.yticks(ylocs, ["%.02f" % freq[i] for i in ylocs])

    if plotpath:
        plt.savefig(plotpath, bbox_inches="tight")
    else:
        plt.show()

    plt.clf()

    return ims

ims = plotstft(filepath)
import wave

import pylab


def graph_spectrogram(wav_file):
    sound_info, frame_rate = get_wav_info(wav_file)
    pylab.figure(num=None, figsize=(19, 12))
    pylab.subplot(111)
    pylab.title('spectrogram of %r' % wav_file)
    pylab.specgram(sound_info, Fs=frame_rate)
    pylab.savefig('spectrogram.png')


def get_wav_info(wav_file):
    wav = wave.open(wav_file, 'r')
    frames = wav.readframes(-1)
    # frombuffer replaces the deprecated fromstring for binary data
    sound_info = pylab.frombuffer(frames, 'int16')
    frame_rate = wav.getframerate()
    wav.close()
    return sound_info, frame_rate
For A Capella Science - Bohemian Gravity!, this gives:
Use graph_spectrogram(path_to_your_wav_file).
I don't remember the blog from where I took this snippet. I will add the link whenever I see it again.
Beginner's answer above is excellent. I don't have 50 rep, so I can't comment on it, but if you want the correct amplitude in the frequency domain, the stft function should look like this:
import numpy as np
from matplotlib import pyplot as plt
import scipy.io.wavfile as wav
from numpy.lib import stride_tricks

""" short time fourier transform of audio signal """
def stft(sig, frameSize, overlapFac=0, window=np.hanning):
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))

    # zeros at beginning (thus center of 1st window should be for sample nr. 0)
    samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)
    # cols for windowing
    cols = np.ceil((len(samples) - frameSize) / float(hopSize)) + 1
    # zeros at end (thus samples can be fully covered by frames)
    samples = np.append(samples, np.zeros(frameSize))

    frames = stride_tricks.as_strided(samples, shape=(int(cols), frameSize), strides=(samples.strides[0]*hopSize, samples.strides[0])).copy()
    frames *= win

    fftResults = np.fft.rfft(frames)
    windowCorrection = 1/(np.sum(np.hanning(frameSize))/frameSize) # amplitude correction (1/mean(window)); energy correction would be 1/rms(window)
    FFTcorrection = 2/frameSize
    scaledFftResults = fftResults*windowCorrection*FFTcorrection

    return scaledFftResults
You can use librosa for your mp3 spectrogram needs. Here is some code I found, thanks to Parul Pandey from Medium. The code I used is this:
# Method described here https://stackoverflow.com/questions/15311853/plot-spectogram-from-mp3
import librosa
import librosa.display
from pydub import AudioSegment
import matplotlib.pyplot as plt
from scipy.io import wavfile
from tempfile import mktemp


def plot_mp3_matplot(filename):
    """
    plot_mp3_matplot -- using matplotlib to simply plot time vs amplitude waveplot

    Arguments:
    filename -- filepath to the file that you want to see the waveplot for

    Returns -- None
    """
    # sr is for 'sampling rate'
    # Feel free to adjust it
    x, sr = librosa.load(filename, sr=44100)
    plt.figure(figsize=(14, 5))
    librosa.display.waveplot(x, sr=sr)


def convert_audio_to_spectogram(filename):
    """
    convert_audio_to_spectogram -- using librosa to simply plot a spectogram

    Arguments:
    filename -- filepath to the file that you want to see the waveplot for

    Returns -- None
    """
    # sr == sampling rate
    x, sr = librosa.load(filename, sr=44100)

    # stft is short time fourier transform
    X = librosa.stft(x)

    # convert the slices to amplitude
    Xdb = librosa.amplitude_to_db(abs(X))

    # ... and plot, magic!
    plt.figure(figsize=(14, 5))
    librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
    plt.colorbar()


# same as above, just changed the y_axis from hz to log in the display func
def convert_audio_to_spectogram_log(filename):
    x, sr = librosa.load(filename, sr=44100)
    X = librosa.stft(x)
    Xdb = librosa.amplitude_to_db(abs(X))
    plt.figure(figsize=(14, 5))
    librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='log')
    plt.colorbar()
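Hypothetical usage, with any audio file that librosa can load:

convert_audio_to_spectogram('some_song.mp3')      # linear frequency axis
convert_audio_to_spectogram_log('some_song.mp3')  # log frequency axis
plt.show()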
Cheers!
I am trying to plot a very big file (~5 GB) using python and matplotlib. I am able to load the whole file in memory (the total available in the machine is 16 GB), but when I plot it using simple imshow I get a segmentation fault. This is most probably due to the ulimit, which I have set to 15000 but cannot set higher. I have come to the conclusion that I need to plot my array in batches and therefore wrote simple code to do that. My main issue is that when I plot a batch of the big array, the x coordinates always start from 0 and there is no way I can overlay the images to create a final big one. If you have any suggestions please let me know. Also, I am not able to install new packages like "Image" on this machine due to administrative rights. Here is a sample of the code that reads the first 12 lines of my array and makes 3 plots.
import os
import sys
import scipy
import numpy as np
import pylab as pl
import matplotlib as mpl
import matplotlib.cm as cm
from optparse import OptionParser
from scipy import fftpack
from scipy.fftpack import *
from cmath import *
from pylab import *
import pp
import fileinput
import matplotlib.pylab as plt
import pickle

def readalllines(file1,rows,freqs):
    file = open(file1,'r')
    sizer = int(rows*freqs)
    i = 0
    q = np.zeros(sizer,'float')
    for i in range(rows*freqs):
        s = file.readline()
        s = s.split()
        #print s[4],q[i]
        q[i] = float(s[4])
        if i%262144 == 0:
            print '\r ',int(i*100.0/(337*262144)),' percent complete',
        i += 1
    file.close()
    return q
parser = OptionParser()
parser.add_option('-f',dest="filename",help="Read dynamic spectrum from FILE",metavar="FILE")
parser.add_option('-t',dest="dtime",help="The time integration used in seconds, default 10",default=10)
parser.add_option('-n',dest="dfreq",help="The bandwidth of each frequency channel in Hz",default=11.92092896)
parser.add_option('-w',dest="reduce",help="The chunker divider in frequency channels, integer default 16",default=16)
(opts,args) = parser.parse_args()
rows=12
freqs = 262144
file1 = opts.filename
s = readalllines(file1,rows,freqs)
s = np.reshape(s,(rows,freqs))
s = s.T
print s.shape
#raw_input()
#s_shift = scipy.fftpack.fftshift(s)
#fig = plt.figure()
#fig.patch.set_alpha(0.0)
#axes = plt.axes()
#axes.patch.set_alpha(0.0)
###plt.ylim(0,8)
plt.ion()

i = 0
for o in range(0,rows,4):

    fig = plt.figure()
    #plt.clf()

    plt.imshow(s[:,o:o+4],interpolation='nearest',aspect='auto', cmap=cm.gray_r, origin='lower')
    if o == 0:
        axis([0,rows,0,freqs])
        fdf, fdff = xticks()
        print fdf
        xticks(fdf+o)
        print xticks()
    #axis([o,o+4,0,freqs])
    plt.draw()

    #w, h = fig.canvas.get_width_height()
    #buf = np.fromstring(fig.canvas.tostring_argb(), dtype=np.uint8)
    #buf.shape = (w,h,4)
    #buf = np.roll(buf, 3, axis=2)
    #w,h,_ = buf.shape
    #img = Image.fromstring("RGBA", (w,h),buf.tostring())

    #if prev:
    #    prev.paste(img)
    #    del prev
    #prev = img
    i += 1
pl.colorbar()
pl.show()
If you plot any array with more than ~2k pixels across, something in your graphics chain will downsample the image in some way to display it on your monitor. I would recommend downsampling in a controlled way, something like
import numpy as np

data = convert_raw_data_to_fft(args)  # make sure data is row major

def ds_decimate(row, step=100):
    return row[::step]

def ds_sum(row, step=100):
    return np.sum(row[:step*(len(row)//step)].reshape(-1,step),1)

# as per suggestion from tom10 in comments
def ds_max(row, step=100):
    return np.max(row[:step*(len(row)//step)].reshape(-1,step),1)

data_plotable = [ds_sum(d) for d in data]  # plug in whichever function you want
or interpolation.
Matplotlib is pretty memory-inefficient when plotting images. It creates several full-resolution intermediate arrays, which is probably why your program is crashing.
One solution is to downsample the image before feeding it into matplotlib, as @tcaswell suggests.
I also wrote some wrapper code to do this downsampling automatically, based on your screen resolution. It's at https://github.com/ChrisBeaumont/mpl-modest-image, if it's useful. It also has the advantage that the image is resampled on the fly, so you can still pan and zoom without sacrificing resolution where you need it.
I think you're just missing the extent=(left, right, bottom, top) keyword argument in plt.imshow.
import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(2, 10)
y = np.ones((4, 10))
x[0] = 0  # To make it clear which side is up, etc
y[0] = -1

plt.imshow(x, extent=(0, 10, 0, 2))
plt.imshow(y, extent=(0, 10, 2, 6))
# This is necessary, else the plot gets scaled and only shows the last array
plt.ylim(0, 6)
plt.colorbar()
plt.show()
I want to represent an audio file in an image with a maximum size of 180×180 pixels.
I want to generate this image so that it somehow gives a representation of the audio file; think of it like SoundCloud's waveform (amplitude graph).
I wonder if any of you have something for this. I have been searching around for a bit, mainly "audio visualization" and "audio thumbnailing", but I have not found anything useful.
I first posted this to ux.stackexchange.com, this is my attempt to reach any programmers working on this.
You could also break up the audio into chunks and measure the RMS (a measure of loudness). Let's say you want an image that is 180 pixels wide.
I'll use pydub, a lightweight wrapper I wrote around the std lib wave module:
from pydub import AudioSegment

# first I'll open the audio file
sound = AudioSegment.from_mp3("some_song.mp3")

# break the sound into 180 even chunks (or however
# many pixels wide the image should be)
chunk_length = len(sound) // 180

loudness_of_chunks = []
for i in range(180):
    start = i * chunk_length
    end = start + chunk_length
    chunk = sound[start:end]
    loudness_of_chunks.append(chunk.rms)
The for loop can be represented as the following list comprehension; I just wanted the explicit version to be clear:
loudness_of_chunks = [
    sound[i*chunk_length : (i+1)*chunk_length].rms
    for i in range(180)]
Now the only thing left to do is scale the RMS down to a 0-180 scale (since you want the image to be 180px tall):
max_rms = max(loudness_of_chunks)
scaled_loudness = [ (loudness / max_rms) * 180 for loudness in loudness_of_chunks]
I'll leave the drawing of the actual pixels to you; I'm not very experienced with PIL or ImageMagick :/
Based on Jiaaro's answer (thanks for writing pydub!), and built for web2py, here are my two cents:
import os
import cStringIO

import pydub
from PIL import Image, ImageDraw, ImageFilter


def generate_waveform():
    # runs inside a web2py controller: request and response are web2py globals
    img_width = 1170
    img_height = 140
    line_color = 180

    filename = os.path.join(request.folder, 'static', 'sounds', 'adg3.mp3')

    # first I'll open the audio file
    sound = pydub.AudioSegment.from_mp3(filename)

    # break the sound into img_width even chunks (or however
    # many pixels wide the image should be)
    chunk_length = len(sound) / img_width

    loudness_of_chunks = [
        sound[i*chunk_length : (i+1)*chunk_length].rms
        for i in range(img_width)
    ]

    max_rms = float(max(loudness_of_chunks))
    scaled_loudness = [round(loudness * img_height / max_rms) for loudness in loudness_of_chunks]

    # now convert the scaled_loudness to an image
    im = Image.new('L', (img_width, img_height), color=255)
    draw = ImageDraw.Draw(im)

    for x, rms in enumerate(scaled_loudness):
        y0 = img_height - rms
        y1 = img_height
        draw.line((x, y0, x, y1), fill=line_color, width=1)

    buffer = cStringIO.StringIO()
    del draw
    im = im.filter(ImageFilter.SMOOTH).filter(ImageFilter.DETAIL)
    im.save(buffer, 'PNG')
    buffer.seek(0)

    return response.stream(buffer, filename=filename + '.png')