Matplotlib agg complexity exceeding issue even with very small dataset - python

I've been trying to get some data to display in a matplotlib graph and I'm having an issue that seems fairly unexpected. I was originally trying to plot a large number of data points (~500000) and was getting the
OverflowError: Agg rendering complexity exceeded. Consider downsampling or decimating your data.
So I did just that. I decimated my data using both the signal.decimate function and slice notation. Neither solved my issue; I still get the complexity exceeded error even when trying to plot only 60 data points. I've tried to determine whether my computer may have some bad settings, but I can plot 500000 points in a straight line without a hiccup. I'll add some example code, and maybe someone can help me spot the error of my ways.
import scikits.audiolab as audiolab
if __name__ == "__main__":
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.signal import freqz
    sound = audiolab.sndfile('exampleFile.wav', 'read')
    sound_info = sound.read_frames(sound.get_nframes())
    sound.close()
    nsamples = sound_info.size
    t = np.linspace(0, 5, nsamples, endpoint=False)
    plt.figure()
    plt.plot(t, sound_info, label='Filtered signal (600 Hz)')
    plt.show()
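For reference, decimating by slice notation as described in the question could look roughly like the sketch below. The helper name decimate_by_slice and the synthetic sine wave are made up for illustration; they are not the asker's data. Note that plain slicing applies no anti-aliasing filter, whereas scipy.signal.decimate filters before downsampling. Printing the shape and checking for non-finite values before plotting may also help narrow down what is actually being passed to plot.

```python
import numpy as np


def decimate_by_slice(t, y, max_points):
    """Keep every k-th sample so that at most max_points samples remain."""
    t = np.asarray(t)
    y = np.asarray(y)
    step = max(1, int(np.ceil(y.shape[0] / max_points)))
    return t[::step], y[::step]


# Synthetic stand-in for the audio samples (an assumption, not the real file).
nsamples = 500_000
t = np.linspace(0, 5, nsamples, endpoint=False)
samples = np.sin(2 * np.pi * 600 * t)

t_small, y_small = decimate_by_slice(t, samples, max_points=5_000)

# Sanity checks worth running on real data before plotting.
print(samples.shape, y_small.shape)
print(bool(np.isfinite(y_small).all()))
```

The decimated pair (t_small, y_small) can then be passed to plt.plot in place of the full arrays.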

Related

How can I map the amplitude of an ECG signal in relation to the time stamp?

I'm really new to programming, so can anyone suggest a way to process ECG signals from the MIT-BIH Arrhythmia database that allows me to map the amplitude and time together. Essentially, I want an array that illustrates time as an x value, and amplitude as a y-value. Does anyone have any ideas about how I can go about creating this?
Do you have a csv, text, or Excel file? You can load it in using Pandas. Then you can graph it with Matplotlib (or Seaborn which is built off of Matplotlib)
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
signal = np.random.randint(100, size=(1000))
time = np.arange(0,100,0.1)
df = pd.DataFrame({"Signal":signal, "Time":time})
sns.lineplot(data=df, x="Time", y="Signal")
plt.title("ECG Signal")
plt.show()
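If the goal is simply an array with time as the x value and amplitude as the y value, a minimal sketch is to build the time axis from the sampling rate. The 360 Hz rate and the amplitude values below are assumptions for illustration; check the header of the record you are using for the real sampling rate.

```python
import numpy as np

fs = 360.0  # assumed sampling rate in Hz; confirm against the record header
amplitude = np.array([0.00, 0.05, 0.90, 0.10, -0.05])  # made-up samples

# Sample k was taken at time k / fs seconds.
time = np.arange(amplitude.size) / fs

# Two columns: time (s) and amplitude, one row per sample.
ecg = np.column_stack((time, amplitude))
print(ecg.shape)
```

Each row of ecg is then one (time, amplitude) pair, which can be plotted directly or wrapped in a DataFrame as in the answer above.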

plotting spectrogram in audio analysis

I am working on speech recognition using a neural network. To do so, I need to get the spectrograms of the training audio files (.wav). How can I get those spectrograms in Python?
There are numerous ways to do so. The easiest is to check out the methods proposed in Kernels on Kaggle competition TensorFlow Speech Recognition Challenge (just sort by most voted). This one is particularly clear and simple and contains the following function. The input is a numeric vector of samples extracted from the wav file, the sample rate, the size of the frame in milliseconds, the step (stride or skip) size in milliseconds and a small offset.
from scipy.io import wavfile
from scipy import signal
import numpy as np
# path_to_wav_file is the path to your .wav file
sample_rate, audio = wavfile.read(path_to_wav_file)
def log_specgram(audio, sample_rate, window_size=20,
                 step_size=10, eps=1e-10):
    nperseg = int(round(window_size * sample_rate / 1e3))
    noverlap = int(round(step_size * sample_rate / 1e3))
    freqs, times, spec = signal.spectrogram(audio,
                                            fs=sample_rate,
                                            window='hann',
                                            nperseg=nperseg,
                                            noverlap=noverlap,
                                            detrend=False)
    return freqs, times, np.log(spec.T.astype(np.float32) + eps)
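To make the millisecond parameters of the function above concrete, here is the same arithmetic worked through for a hypothetical 16 kHz recording (the sample rate and the helper name ms_to_samples are assumptions for illustration). The frame count assumes frames advance by nperseg - noverlap samples with no padding.

```python
def ms_to_samples(duration_ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(duration_ms * sample_rate / 1e3))


sample_rate = 16000  # assumed sample rate in Hz
nperseg = ms_to_samples(20, sample_rate)   # 20 ms window
noverlap = ms_to_samples(10, sample_rate)  # 10 ms, passed as the overlap

hop = nperseg - noverlap  # samples between the starts of consecutive frames
n_samples = sample_rate   # one second of audio
n_frames = (n_samples - noverlap) // hop

print(nperseg, noverlap, hop, n_frames)
```

Because the function passes the step size as noverlap, the hop only equals the step when the step is half the window, as with the defaults here. With other settings the effective hop is window minus step.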
Outputs are defined in the SciPy manual, with an exception that the spectrogram is rescaled with a monotonic function (Log()), which depresses larger values much more than smaller values, while leaving the larger values still larger than the smaller values. This way no extreme value in spec will dominate the computation. Alternatively, one can cap the values at some quantile, but log (or even square root) are preferred. There are many other ways to normalize the heights of the spectrogram, i.e. to prevent extreme values from "bullying" the output :)
freq (f) : ndarray, Array of sample frequencies.
times (t) : ndarray, Array of segment times.
spec (Sxx) : ndarray, Spectrogram of x. By default, the last axis of Sxx corresponds to the segment times.
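The effect of the log rescaling can be seen on a few made-up power values. This is only a sketch with invented numbers; eps plays the same role as in the function above, keeping log(0) from producing -inf.

```python
import numpy as np

eps = 1e-10
# Invented power values spanning many orders of magnitude.
spec = np.array([0.0, 1e-6, 1.0, 1e6], dtype=np.float32)

log_spec = np.log(spec + eps)

print(spec.max() - spec.min())          # raw spread is about a million
print(log_spec.max() - log_spec.min())  # log spread is a few tens
print(bool(np.all(np.diff(log_spec) > 0)))  # ordering is preserved
```

The largest value is still the largest after the transform, but it no longer dwarfs the others.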
Alternatively, you can check the train.py and models.py code on github repo from the Tensorflow example on audio recognition.
Here is another thread that explains and gives code on building spectrograms in Python.
SciPy serves this purpose.
from scipy import signal
from scipy.io import wavfile
# Read the .wav file
sample_rate, data = wavfile.read('directory_path/file_name.wav')
# Spectrogram of .wav file
sample_freq, segment_time, spec_data = signal.spectrogram(data, sample_rate)
# Note: sample_freq is the array of frequency bins of the spectrogram, not the sampling rate
Use Matplotlib to visualize the spectrogram:
import matplotlib.pyplot as plt
plt.pcolormesh(segment_time, sample_freq, spec_data )
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()
You can use the NumPy, SciPy and Matplotlib packages to make spectrograms. See the following post:
http://www.frank-zalkow.de/en/code-snippets/create-audio-spectrograms-with-python.html

How do I plot GFS grib2 data with Python?

I would like to have a chart with the temperatures for the following days on my website, and the Global Forecasting System meets my needs the most. How do I plot the GRIB2 data in matplotlib and create a PNG image from the plot?
I've spent hours searching the internet and asking people who know how to do this (they were not helpful at all), and I don't know where to start.
GFS data can be found here: ftp://ftp.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/
If possible, I'd like it to be lightweight and not use too much server space.
If you want to be lightweight in data usage and storage, you may want to consider data formats other than GRIB. GRIB files usually contain worldwide data, which is pretty useless when you only want to plot a specific domain.
I can strongly recommend using data from the NOAA-NCEP OPeNDAP data server. You can retrieve data from this server using netCDF4. Unfortunately, this server is known to be unstable at times, which may cause delays in refreshing runs and/or malformed datasets. Still, about 95% of the time I have access to all the data I need.
Note: This data server may be slow due to heavy traffic after the release of a new run. Access to the data server can be found here: http://nomads.ncdc.noaa.gov/data.php?name=access#hires_weather_datasets
Plotting data is pretty easy with the Matplotlib and Basemap toolkits. Some examples, including usage of GFS datasets, can be found here: http://matplotlib.org/basemap/users/examples.html
Basically, there are 2 steps:
1. Use wgrib2 to extract selected variables from the GRIB2 data and save them into a NetCDF file. Although there are APIs such as pygrib, I found it less buggy to use the command-line tool directly. Some useful links:
install: http://www.cpc.ncep.noaa.gov/products/wesley/wgrib2/compile_questions.html
tricks: http://www.ftp.cpc.ncep.noaa.gov/wd51we/wgrib2/tricks.wgrib2
For example, to extract temperature and humidity:
wgrib2 test.grb2 -s | egrep '(:RH:2 m above ground:|:TMP:2 m above ground:)' | wgrib2 -i test.grb2 -netcdf test.nc
2. Use Python libraries to process the NetCDF file. Example code may look like this:
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# %matplotlib inline  (IPython/Jupyter magic; uncomment in a notebook, omit in a plain script)
from netCDF4 import Dataset
from mpl_toolkits.basemap import Basemap
from pyproj import Proj
import matplotlib.cm as cm
import datetime
file = "test.nc"
rootgrp = Dataset(file, "r")
x = rootgrp['longitude'][:] # 0-359, step = 1
y = rootgrp['latitude'][:] # -90~90, step =1
tmp = rootgrp['TMP_2maboveground'][:][0] # shape(181,360)
dt = datetime.datetime(1970,1,1) + datetime.timedelta(seconds = rootgrp['time'][0])
fig = plt.figure(dpi=150)
m = Basemap(projection='mill',lat_ts=10,llcrnrlon=x.min(),
            urcrnrlon=x.max(),llcrnrlat=y.min(),urcrnrlat=y.max(), resolution='c')
xx, yy = m(*np.meshgrid(x,y))
m.pcolormesh(xx,yy,tmp-273.15,shading='flat',cmap=plt.cm.jet)
m.colorbar(location='right')
m.drawcoastlines()
m.drawparallels(np.arange(-90.,120.,30.), labels=[1,0,0,0], fontsize=10)
m.drawmeridians(np.arange(0.,360.,60.), labels=[0,0,0,1], fontsize=10)
plt.title("{}, GFS, Temperature (C) ".format(dt.strftime('%Y-%m-%d %H:%M UTC')))
plt.show()
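Since a global grid is usually more than a regional chart needs, one option is to cut out a latitude/longitude box before plotting or storing anything. The sketch below uses a synthetic 1-degree grid with the same shape as in the code above; the box limits and the constant temperature field are placeholders, not real GFS values.

```python
import numpy as np

# Synthetic stand-in for the global 1-degree grid (shape 181 x 360).
lon = np.arange(0, 360, 1.0)
lat = np.arange(-90, 91, 1.0)
tmp_k = np.full((lat.size, lon.size), 288.15)  # placeholder field in Kelvin

# Hypothetical region of interest.
lat_min, lat_max = 45.0, 60.0
lon_min, lon_max = 0.0, 15.0

lat_idx = np.where((lat >= lat_min) & (lat <= lat_max))[0]
lon_idx = np.where((lon >= lon_min) & (lon <= lon_max))[0]

# Select the box and convert Kelvin to degrees Celsius.
region_c = tmp_k[np.ix_(lat_idx, lon_idx)] - 273.15
print(region_c.shape)
```

The same index arrays can be applied to the longitude and latitude vectors so that the plot axes match the reduced field.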

matplotlib plot array size limit?

I've created a program that retrieves data from a device on the serial port every half second or so. It then appends that data to the array that sets the data points and then updates the plot. Everything goes fine until it's been running for an hour or so, at which point the program stops responding.
Does anyone know if there is a size limit for this array? If anyone has any ideas on handling a data set that could be millions of points, I would love to hear your thoughts.
Using the code below I was able to get matplotlib to show a simple graph of ten million points. I suspect the problem isn't with the array size.
import numpy as np
import matplotlib.pyplot as plt
import random
nsteps = 10000000
draws = np.random.randint(0,2,size=nsteps)
steps = np.where(draws>0,1,-1)
walk = steps.cumsum()
plt.plot(np.arange(nsteps), np.array(walk), 'r-')
plt.title(r"Big Set Random Walk with $\pm1$ steps")
plt.show()
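If the slowdown comes from an ever-growing list that is redrawn on every update rather than from a hard size limit, one option is to cap how many points are kept. This is only a sketch: MAX_POINTS and add_sample are made-up names, and the loop stands in for readings from the serial port.

```python
from collections import deque

MAX_POINTS = 7200  # hypothetical cap: one hour at two samples per second

# deque(maxlen=...) silently drops the oldest item once the cap is reached.
times = deque(maxlen=MAX_POINTS)
values = deque(maxlen=MAX_POINTS)


def add_sample(t, v):
    """Store one reading, discarding the oldest one if the buffer is full."""
    times.append(t)
    values.append(v)


# Simulated readings every half second.
for i in range(10_000):
    add_sample(i * 0.5, float(i % 100))

print(len(times), times[0], times[-1])
```

On each update, list(times) and list(values) can be handed to the existing line object (for example with set_data) so the plot only ever holds the most recent window.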
There seems to be some limit. I just tried:
import pylab
import numpy as np
n = 10000000 # my code works fine for n = 1000000
x = np.random.normal(0,1,n)
pylab.plot(x)
pylab.show()
And got the following error:
OverflowError: Agg rendering complexity exceeded. Consider downsampling or decimating your data.
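Matplotlib has an rcParam, agg.path.chunksize, that makes the Agg renderer draw long lines in chunks; its documentation mentions it as a way to avoid Agg rendering failures on very large data sets. A minimal sketch follows, assuming a chunk size of 10000 and a smaller array so it runs quickly; whether it resolves a particular case is not guaranteed.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Split long paths into chunks of this many vertices when rendering with Agg.
matplotlib.rcParams["agg.path.chunksize"] = 10000

n = 100_000  # smaller than in the answer above, to keep the example quick
x = np.random.normal(0, 1, n)

fig, ax = plt.subplots()
ax.plot(x)

# Render to an in-memory PNG instead of opening a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
print(buffer.getbuffer().nbytes)
```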

Reduce the size of .eps figure made using matplotlib

Today I was doing a report for a course and I needed to include a figure of a contour plot of some field. I did this with matplotlib (ignore the chaotic header):
import numpy as np
import matplotlib
from matplotlib import rc
rc('font',**{'family':'sans-serif','sans-serif':['Helvetica']})
## for Palatino and other serif fonts use:
#rc('font',**{'family':'serif','serif':['Palatino']})
rc('text', usetex=True)
from matplotlib.mlab import griddata  # removed in Matplotlib 3.1; scipy.interpolate.griddata is the usual replacement
import matplotlib.pyplot as plt
import numpy.ma as ma
from numpy.random import uniform
from matplotlib.colors import LogNorm
fig = plt.figure()
data = np.genfromtxt('Isocurvas.txt')
matplotlib.rcParams['xtick.direction'] = 'out'
matplotlib.rcParams['ytick.direction'] = 'out'
rc('text', usetex=True)
rc('font', family='serif')
x = data[:,0]
y = data[:,1]
z = data[:,2]
# define grid.
xi = np.linspace(0.02,1, 100)
yi = np.linspace(0.02,1.3, 100)
# grid the data.
zi = griddata(x,y,z,xi,yi)
# contour the gridded data.
CS = plt.contour(xi,yi,zi,25,linewidths=0,colors='k')
CS = plt.contourf(xi,yi,zi,25,cmap=plt.cm.jet)
plt.colorbar() # draw colorbar
# plot data points.
plt.scatter(x,y,marker='o',c='b',s=0)
plt.xlim(0.01,1)
plt.ylim(0.01,1.3)
plt.ylabel(r'$t$')
plt.xlabel(r'$x$')
plt.title(r' Contour de $\rho(x,t)$')
plt.savefig("Isocurvas.eps", format="eps")
plt.show()
where "Isocurvas.txt" is a 3-column file, which I really don't want to touch (eliminating data, or something like that, wouldn't work for me). My problem was that the figure size was 1.8 MB, which is too much for me. The figure itself was bigger than the whole rest of the report, and when I opened the PDF it wasn't very smooth.
So, my question is:
Are there any ways of reducing this size without sacrificing the quality of the figure? I'm looking for any solution, not necessarily Python related.
This is the .png figure, with a slight variation in parameters. With .png you can see the pixels, which I don't like very much, so PDF or EPS is preferable.
Thank you.
The scatter plot is what's causing your large size. Using the EPS backend, I used your data to create the figures. Here are the file sizes that I got:
Straight from your example: 1.5 MB
Without the scatter plot: 249 KB
With a raster scatter plot: 249 KB
In your particular example it's unclear why you want the scatter (not visible). But for future problems, you can use the rasterized=True keyword on the call to plt.scatter to activate a raster mode. In your example you have 12625 points in the scatter plot, and in vector mode that's going to take a bit of space.
Another trick that I use to trim down vector images from matplotlib is the following:
Save figure as EPS
Run epstopdf (available with a TeX distribution) on the resulting file
This will generally give you a smaller PDF than matplotlib's default, and the quality is unchanged. For your example, using the EPS file without the scatter, it produced a 73 KB PDF, which seems quite reasonable. If you really want a vector scatter, running epstopdf on the original 1.5 MB EPS file produced a 198 KB PDF on my system.
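The rasterized=True keyword mentioned above can be tried out with a sketch like the following. The helper name save_scatter_eps, the random points, the marker size and the dpi are all assumptions for illustration. The resulting file sizes depend strongly on the dpi, the number of markers and the backend, so it is worth measuring both variants on your own figure rather than relying on these numbers.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def save_scatter_eps(path, rasterized):
    """Save a scatter plot as EPS and return (rasterized flag, size in bytes)."""
    rng = np.random.default_rng(0)
    x, y = rng.random(12625), rng.random(12625)  # random stand-in points
    fig, ax = plt.subplots()
    points = ax.scatter(x, y, s=4, rasterized=rasterized)
    # dpi controls the resolution of the rasterized part only.
    fig.savefig(path, format="eps", dpi=150)
    plt.close(fig)
    return points.get_rasterized(), os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    flag_v, size_v = save_scatter_eps(os.path.join(tmp, "vector.eps"), False)
    flag_r, size_r = save_scatter_eps(os.path.join(tmp, "raster.eps"), True)

print("vector scatter:", size_v, "bytes")
print("raster scatter:", size_r, "bytes")
```

Only the scatter artist is rasterized here; axes, labels and any contour lines stay as vector graphics.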
I'm not sure if it helps with size, but if you're willing to try the matplotlib 1.2 release candidate, there is a new backend for producing PGF images (designed to slot straight into LaTeX seamlessly). You can find the docs for that here: http://matplotlib.org/1.2.0/users/whats_new.html#pgf-tikz-backend
If you do decide to give it a shot and you have any questions, I'm probably not the best person to talk to, so I would recommend emailing the matplotlib-users mailing list.
HTH,
Try removing the scatter plot of your data. The points are not visible in your final figure (because you gave them size 0) and may be taking up space in your EPS.
EDITED: to completely change the answer because I read the question wrong.
