Matplotlib Agg Rendering Complexity Error - python

I am trying to print a 600 dpi graph using Python matplotlib. However, Python plotted only 2 out of 8 graphs and output the error:
OverflowError: Agg rendering complexity exceeded. Consider downsampling or decimating your data.
I am plotting a huge chunk of data (7,500,000 data points per column), so I guess this is either an overload problem or that I need to set a larger cell_block_limit.
I tried searching Google for solutions on changing the cell_block_limit, but to no avail. What would be a good approach?
The code is as follows:
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, FormatStrFormatter
majorLocator = MultipleLocator(200)
majorFormatter = FormatStrFormatter('%d')
minorLocator = MultipleLocator(20)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_major_formatter(majorFormatter)
ax.xaxis.set_minor_locator(minorLocator)
ax.xaxis.set_ticks_position('bottom')
ax.xaxis.grid(True,which='minor')
ax.yaxis.grid(True)
plt.plot(timemat,fildata)
plt.xlabel(plotxlabel,fontsize=14)
plt.ylabel(plotylabel,fontsize=14)
plt.title(plottitle,fontsize=16)
fig.savefig(plotsavetitle,dpi=600)

In addition to @Lennart's point that there's no need for the full resolution, you might also consider a plot similar to the following.
Calculating the max/mean/min of a "chunked" version is very simple and efficient if you use a 2D view of the original array and the axis keyword arg to x.min(), x.max(), etc.
Even with the filtering, plotting this is much faster than plotting the full array.
(Note: to plot this many points, you'll have to tune down the noise level a bit. Otherwise you'll get the OverflowError you mentioned. If you want to compare plotting the "full" dataset, change the 0.3 in the y += 0.3 * y.max() * np.random.random(num) line to something more like 0.1, or remove that line completely.)
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(1977)
# Generate some very noisy but interesting data...
num = int(1e7)  # must be an int for linspace and random.random
x = np.linspace(0, 10, num)
y = np.random.random(num) - 0.5
y.cumsum(out=y)
y += 0.3 * y.max() * np.random.random(num)
fig, ax = plt.subplots()
# Wrap the array into a 2D array of chunks, truncating the last chunk if
# chunksize isn't an even divisor of the total size.
# (This part won't use _any_ additional memory)
chunksize = 10000
numchunks = y.size // chunksize
ychunks = y[:chunksize*numchunks].reshape((-1, chunksize))
xchunks = x[:chunksize*numchunks].reshape((-1, chunksize))
# Calculate the max, min, and means of chunksize-element chunks...
max_env = ychunks.max(axis=1)
min_env = ychunks.min(axis=1)
ycenters = ychunks.mean(axis=1)
xcenters = xchunks.mean(axis=1)
# Now plot the bounds and the mean...
ax.fill_between(xcenters, min_env, max_env, color='gray',
                edgecolor='none', alpha=0.5)
ax.plot(xcenters, ycenters)
fig.savefig('temp.png', dpi=600)

With 600 dpi you would have to make the plot 13 meters wide to plot that data without decimating it. :-)
I would suggest chunking the data into pieces a couple of hundred or maybe even a thousand samples long, and extracting the maximum value out of that.
Something like this:
def chunkmax(data, chunk_size):
    source = iter(data)
    while True:
        chunk = []  # reset each pass; otherwise max() would span all data seen so far
        try:
            for i in range(chunk_size):
                chunk.append(next(source))
        except StopIteration:  # data exhausted; drop any partial final chunk
            return
        yield max(chunk)
This would then, with a chunk_size of 1000, give you 7500 points to plot, where you can then easily see where in the data the shock comes. (Unless the data is so noisy you would have to average it to see if there is a shock or not. But that's also easily fixable.)
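If the data is already in a NumPy array, the same chunked reduction can be done without a Python-level loop; this is my own vectorized sketch (the function name is made up), using the same reshape trick as the other answer:

```python
import numpy as np

def chunk_reduce(data, chunk_size, func=np.max):
    """Reduce each chunk_size-long block of data with func.

    Trailing elements that don't fill a whole chunk are dropped.
    """
    data = np.asarray(data)
    n_chunks = data.size // chunk_size
    chunks = data[:n_chunks * chunk_size].reshape(n_chunks, chunk_size)
    return func(chunks, axis=1)

# 7,500,000 samples reduced to 7,500 plottable points:
y = np.random.random(7_500_000)
peaks = chunk_reduce(y, 1000, np.max)
means = chunk_reduce(y, 1000, np.mean)  # averaged version, for noisy data
```

Because the reshape is just a view, no copy of the 7.5-million-element array is made before the reduction.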

Radial heatmap from similarity matrix in Python

Summary
I have a 2880x2880 similarity matrix (8.5 mil points). My attempt with Holoviews resulted in a 500 MB HTML file which never finishes "opening". So how do I make a round heatmap of the matrix?
Details
I had data from 10 different places, measured over 1 whole year. The hours of each month were turned into arrays, so each month had 24 arrays (one for all 00:00, one for all 01:00 ... 22:00, 23:00).
These were about 28-31 cells long, and each cell had the measurement of the thing I'm trying to analyze. So there are these 24 arrays for each month of 1 whole year, i.e. 24x12 = 288 arrays per place. And there are measurements from 10 places. So a total of 2880 arrays were created and all compared to each other, and saved in a 2880x2880 matrix with similarity coefficients.
I'm trying to turn it into a radial similarity matrix like the one from holoviews, but without the ticks and tags (since the format Place01Jan0800 would be cumbersome to look at for 2880 rows), just the shape and colors and divisions:
I managed to create the HTML file itself, but it ended up being 500 MB big, so it never shows up when I open it up. It's just blank. I've added a minimal example below of what I have, and replaced the loading of the datafile with some randomly generated data.
import sys
sys.setrecursionlimit(10000)
import random
import numpy as np
import pandas as pd
import holoviews as hv
from holoviews import opts
from bokeh.plotting import show
import gc
# Function creating dummy data for this example
def transformer():
    dimension = 2880
    dummy_matrix = [[random.random() for i in range(dimension)] for j in range(dimension)]  # Fake, similar data
    col_vals = [str(i) for i in range(dimension*dimension)]  # Placeholder
    row_vals = [str(i) for i in range(dimension*dimension)]  # Placeholder
    val_vals = np.reshape(np.array(dummy_matrix), -1).tolist()  # Turn matrix into an array
    idx_vals = [i for i in range(dimension*dimension)]  # Placeholder
    return idx_vals, val_vals, row_vals, col_vals
idx_arr, val_arr, row_arr, col_arr = transformer()
df = pd.DataFrame({"values": val_arr, "x-label": row_arr, "y-label": col_arr}, index=idx_arr)
hv.extension('bokeh')
heatmap = hv.HeatMap(df, ["x-label", "y-label"])
heatmap.opts(opts.HeatMap(cmap="viridis", radial=True))
gc.collect() # Attempt to save memory, because this thing is huge
show(hv.render(heatmap))
I had a look at datashader to see if it would help, but I have no idea how to plug it in (if it's possible for this case) to this radial heatmap, since it seems like the radial heatmap doesn't have that datashade-feature.
So I have no idea how to tackle this. I would be content with a broad overview too, I don't need the details nor the hover-infobox nor ability to zoom or any fancy extra features, I just need the general overview for a presentation. I'm open to any solution really.
I recommend using a regular heatmap instead of a radial heatmap for showing the similarity matrix. The reasons are:
The radial heatmap is designed for periodic variables. The time variable (288 hours) can be considered periodic data; however, I think the 288 x 10 combination (288 hours, 10 places) is no longer periodic because of the existence of the "place".
Near the center of the radial heatmap, the color points will be too dense to be understood by a human.
The following is a simple code to show a heatmap.
import matplotlib.cm
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import numpy as np
n = 2880
m = 2880
dummy_matrix = np.random.rand(m, n)
fig = plt.figure(figsize=(50,50)) # change the figsize to control the resolution
ax = fig.add_subplot(111)
cmap = matplotlib.cm.get_cmap("Blues")  # you may use another built-in colormap or define your own
# If your data is not in the range [0, 1], use a normalization. Here it is normalized by the min and max values.
norm = Normalize(vmin=np.amin(dummy_matrix), vmax=np.amax(dummy_matrix))
image = ax.imshow(dummy_matrix, cmap=cmap, norm=norm)
plt.colorbar(image)
plt.show()
Which gives:
Another idea that comes to mind is that perhaps the computation of the similarity matrix is unnecessary, and you could plot the original 288 x 10 data using a radial heatmap or just a normal heatmap; one can then get a sense of the data similarity directly from the color distribution.
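Following that idea, here is a minimal sketch of what plotting the raw data could look like (the 288-rows-by-10-places layout and the random stand-in values are assumptions; substitute your actual measurements):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Hypothetical layout: one row per month-hour slot (24 x 12 = 288),
# one column per place (10); random values stand in for the measurements.
raw = np.random.rand(288, 10)

fig, ax = plt.subplots(figsize=(4, 12))
im = ax.imshow(raw, aspect='auto', cmap='viridis')
ax.set_xlabel('place')
ax.set_ylabel('month-hour slot')
fig.colorbar(im, ax=ax)
fig.savefig('raw_overview.png', dpi=150)
```

At 288 x 10 cells this renders instantly, versus the 8.3-million-cell similarity matrix.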
Plain Matplotlib seems to be able to handle it, based on answers from here: How do I create radial heatmap in matplotlib?
import random
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
fig = plt.figure()
ax = Axes3D(fig)
n = 2880
m = 2880
rad = np.linspace(0, 10, m)
a = np.linspace(0, 2 * np.pi, n)
r, th = np.meshgrid(rad, a)
dummy_matrix = ([[ random.random() for i in range(n) ] for j in range(m)])
plt.subplot(projection="polar")
plt.pcolormesh(th, r, dummy_matrix, cmap = 'Blues')
plt.plot(a, r, ls='none', color = 'k')
plt.grid()
plt.colorbar()
plt.savefig("custom_radial_heatmap.png")
plt.show()
And it didn't even take an eternity, took only about 20 seconds max.
You would think it would turn out monstrous like that
But the sheer amount of points drowns out the jaggedness, WOOHOO!
There's some things left to be desired, like tags and ticks, but I think I'll figure that out.

Plot stochastic trajectories deviations from 'real' path using a colormesh in matplotlib (Python)

Hi, I created a program that generates deviations from a real trajectory; it is complicated and unfortunately I do not have a simple example.
It calculates a path with stochastic initial conditions from the real path and does this for x iterations; the goal is to show that the deviations become larger at greater times.
The real path and the deviations are shown below.
However, I want to show that the deviations become greater the longer in time we are. Of course I could just calculate the variance and plot mean+var and mean-var at each time step, but I was wondering if I could plot something like this using hist2d:
You see that the blocks are not as smooth as I'd like, and this is not that great to use.
Then I went and looked at Python's KDE and created the following.
This is also not preferable, as I think it bins more points at the minima and maxima. It is also 'too smeared out'. Especially in the beginning all the points are the same, so I want there to just be a straight line to really show that the deviations start later on.
I guess my question is: is what I want even possible, and what package/command should I use? I haven't found what I am looking for in other questions. Or does anyone have a suggestion to nicely show what I want in any other way?
Here is an idea plotting multiple curves with transparency on top of each other:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 200)
for _ in range(1000):
    plt.plot(x, np.sin(x * np.random.normal(1, 0.1)) * np.random.normal(1, 0.1), color='r', alpha=0.02)
plt.plot(x, np.sin(x), color='b')
plt.margins(x=0)
plt.show()
Another option creates a 2d histogram:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 200)
all_curves = np.array([np.sin(x * np.random.normal(1, 0.1)) * np.random.normal(1, 0.1) for _ in range(100)])
plt.hist2d(x=np.tile(x, all_curves.shape[0]), y=all_curves.ravel(), bins=(100, 100), cmap='inferno')
plt.show()
Still another approach would use fill_between (as suggested by @bramb) between confidence intervals:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 200)
all_curves = np.array([np.sin(x * np.random.normal(1, 0.1)) * np.random.normal(1, 0.1) for _ in range(1000)])
confidence_interval1 = 95
confidence_interval2 = 80
confidence_interval3 = 50
for ci in [confidence_interval1, confidence_interval2, confidence_interval3]:
    low = np.percentile(all_curves, 50 - ci / 2, axis=0)
    high = np.percentile(all_curves, 50 + ci / 2, axis=0)
    plt.fill_between(x, low, high, color='r', alpha=0.2)
plt.plot(x, np.sin(x), color='b')
plt.margins(x=0)
plt.show()
You could use something like the matplotlib.pyplot.fill_between method. It fills everything between y1 (max) and y2 (min) for a given (common) x array. You would then be able to accentuate that the filled region keeps enlarging with increasing x value.
However, this would require you to find the minimal and maximal value of your deviations at each time point and save these to two separate arrays. The exact method of doing this will depend on how you are storing these individual runs.
In case they are separate lists / arrays, you can convert these to a numpy matrix / pandas dataframe and use the minimum / maximum methods along the relevant axis.
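As a concrete sketch of this recipe (the trajectory generation below is invented stand-in data; the point is stacking the runs into one array so the per-timestep min/max fall out of a single call):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

t = np.linspace(0, 10, 200)
# Stand-in for the stored runs: one row per stochastic trajectory,
# with noise whose scale grows proportionally to t.
runs = np.array([np.sin(t) + np.random.normal(0, 0.02 * t) for _ in range(500)])

low = runs.min(axis=0)   # minimal deviation at each time point
high = runs.max(axis=0)  # maximal deviation at each time point

fig, ax = plt.subplots()
ax.fill_between(t, low, high, color='r', alpha=0.3)
ax.plot(t, np.sin(t), color='b')  # the 'real' path
fig.savefig('envelope.png')
```

Because the noise scale is zero at t = 0, the envelope starts as a single line and widens with time, which is exactly the effect the question asks to show.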

How can I create a list of the values on the y-axis without having to plot a graph in Python?

I have a piece of code that plots a random walk with a specified number of bins on my y-axis. Is there a way in Python to replicate/recreate the values on my y-axis without having to plot the graph? Below is the code I've been working on. The method I've tried is to divide the min-max range by the number of wanted bins and then create a list with these values, but I find that method far from optimal and not close to the results I get using the code below.
I am grateful for any help on this matter!
import matplotlib.pyplot as plt
import numpy as np
import random
dims = 1
step_n = 2000
step_set = [-1, 0, 1]
origin = np.zeros((1,dims))
random.seed(30)
step_shape = (step_n,dims)
steps = np.random.choice(a=step_set, size=step_shape)
path = np.concatenate([origin, steps]).cumsum(0)
# create subplot
fig, ax = plt.subplots(1,1, figsize=(20, 11))
img = ax.plot(path)
plt.locator_params(axis='y', nbins=20)
y_values = ax.get_yticks() # y_values is a numpy array with my y values
I am not sure if I understood your problem correctly.
Matplotlib chooses the differences between ticks in a way that, I assume, mostly yields multiples of 5.
A general approach could be to calculate a padding based on the number of bins you want and add/subtract it. For your given example, the following gives the same result as ax.get_yticks():
bins = 19
padding = np.ceil((np.max(path) - np.min(path)) / bins)
np.linspace(np.min(path) - padding, np.max(path) + padding, bins, dtype=np.int32)
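If all you need are the tick values themselves, you can also skip the figure entirely: locator_params with nbins configures matplotlib's MaxNLocator under the hood, and its tick_values method can be called directly on the data range. A sketch (the values can differ slightly from ax.get_yticks(), because axes add a margin around the data before the locator runs):

```python
import numpy as np
from matplotlib.ticker import MaxNLocator

# A random walk comparable to the one in the question.
path = np.random.choice(a=[-1, 0, 1], size=2000).cumsum()

# Ask the same locator that locator_params(nbins=20) configures for
# "nice" tick values covering the data range, without any plotting.
ticks = MaxNLocator(nbins=20).tick_values(path.min(), path.max())
```

The returned ticks are evenly spaced "round" numbers that bracket the data range, just like the ones matplotlib draws.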

Faster way to provide rotation to scatter point plots in matplotlib?

Currently I use the following to plot a set of rotated lines (geologic strike indicators). However, this section of code takes a long time even with only a modest amount of strikes (5000). Each point has a unique rotation. Is there a way to give matplotlib a list with the rotations and perform the plotting faster than rotating one-by-one like this?
sample = ...  # 3xN array of points (x, y, theta), where theta is the amount to rotate each point by
for i in range(len(sample.T)):
    t = matplotlib.markers.MarkerStyle(marker='|')
    t._transform = t.get_transform().rotate_deg(sample[2,i])
    plt.scatter(sample[0,i], sample[1,i], marker=t, s=50, c='0', linewidth=1)
Here you create 5000 individual scatter plots, which is for sure inefficient. You may use a solution I proposed in this answer, namely to set the individual markers as paths on a PathCollection. This works similarly to scatter, with an additional argument m for the markers.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.markers as mmarkers
def mscatter(x, y, ax=None, m=None, **kw):
    if not ax:
        ax = plt.gca()
    sc = ax.scatter(x, y, **kw)
    if (m is not None) and (len(m) == len(x)):
        paths = []
        for marker in m:
            if isinstance(marker, mmarkers.MarkerStyle):
                marker_obj = marker
            else:
                marker_obj = mmarkers.MarkerStyle(marker)
            path = marker_obj.get_path().transformed(
                marker_obj.get_transform())
            paths.append(path)
        sc.set_paths(paths)
    return sc
np.random.seed(42)
data = np.random.rand(5000,3)
data[:,2] *= 360
markers = []
fig, ax = plt.subplots()
for i in range(len(data)):
    t = mmarkers.MarkerStyle(marker='|')
    t._transform = t.get_transform().rotate_deg(data[i,2])
    markers.append(t)
mscatter(data[:,0], data[:,1], m=markers, s=50, c='0', linewidth=1)
plt.show()
If we time this we find that this takes ~250 ms to create the plot with 5000 points and 5000 different angles. The loop solution would in contrast take more than 12 seconds.
So far for the general question on how to rotate many markers. For the special case here, it seems you want to use simple line markers. This could easily be done using a quiver plot. One may then turn the arrow heads off to have the arrows look like lines.
fig, ax = plt.subplots()
ax.quiver(data[:,0], data[:,1], 1, 1, angles=data[:,2]+90, scale=1/10, scale_units="dots",
          units="dots", color="k", pivot="mid", width=1, headwidth=1, headlength=0)
The result is pretty much the same, with the benefit of this plot only taking ~80 ms, which is again three times faster than the PathCollection.

How to eliminate connecting line when plotting unwrapped function

I am trying to figure out how to eliminate the spurious "connecting line" that occurs when a function is "chopped" up so that it is plotted only in a single interval. For example, suppose I have an angular function that extends from zero to 10 pi (or perhaps even larger) and I want to plot this function only in the range 0 to 2 pi. I can use a modulo operation to fix the data, but if I plot it I get a line that connects from 2 pi back to zero, which I do not want to plot. Here is some code that shows what I am talking about.
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0, 10*np.pi, 1000)
y = t + np.sin(t)
t2 = t%(2*np.pi)
plt.plot(t2, y)
plt.show()
The resulting plot has a series of horizontal lines that I don't want (see image below). I have done some research on this and have not found any simple way of dealing with this situation, but it seems like this would be somewhat common.
Any ideas?
By the way, I am dealing with a pretty large data set, so I can't very well do anything "by hand."
In general, you can use a NaN to insert a break into a line. In the particular case you have shown, you can use np.diff to identify the discontinuities and set the t2 values at those locations to NaN, resulting in the desired breaks.
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0, 10*np.pi, 1000)
y = t + np.sin(t)
t2 = t % (2*np.pi)
# Compute the difference between successive t2 values
diffs = np.append(np.diff(t2), 0)
# Find the differences that are greater than pi
discont_indices = np.abs(diffs) > np.pi
# Set those t2 values to NaN
t2[discont_indices] = np.nan
plt.plot(t2, y)
plt.show()
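If you'd rather not overwrite values with NaN, a masked array achieves the same effect, since matplotlib also breaks the line at masked points. A sketch along the same lines:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

t = np.linspace(0, 10 * np.pi, 1000)
y = t + np.sin(t)
t2 = t % (2 * np.pi)

# Mask the wrap-around points instead of assigning NaN; plot() skips
# masked values, leaving the same gaps in the line.
jumps = np.abs(np.append(np.diff(t2), 0)) > np.pi
t2_masked = np.ma.masked_where(jumps, t2)

plt.plot(t2_masked, y)
plt.savefig('unwrapped.png')
```

This keeps the original t2 data intact, which can matter if you need it for further computation afterwards.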
You can approach the same problem in a slightly different way: create an x-mesh from 0 to 2*pi and then add an offset to y to plot the five different curves. The key here is to exclude the last point of t using the index [0:-1] in order to avoid the continuation line.
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0, 2*np.pi, 1000)
t = (t%(2*np.pi))[0:-1]
for i in range(5):
    y = t + np.sin(t) + i*2*np.pi
    plt.plot(t, y, 'b')
