How to eliminate connecting line when plotting unwrapped function - python

I am trying to figure out how to eliminate the spurious "connecting line" that occurs when a function is "chopped" up so that it is plotted only in a single interval. For example, suppose I have an angular function that extends from zero to 10 pi (or perhaps even larger) and I want to plot this function only in the range 0 to 2 pi. I can use a modulo operation to fix the data, but if I plot it I get a line that connects from 2 pi back to zero, which I do not want to plot. Here is some code that shows what I am talking about.
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0, 10*np.pi, 1000)
y = t + np.sin(t)
t2 = t%(2*np.pi)
plt.plot(t2, y)
plt.show()
The resulting plot has a series of horizontal lines that I don't want (see image below). I have done some research on this and have not found any simple way of dealing with this situation, but it seems like this would be somewhat common.
Any ideas?
By the way, I am dealing with a pretty large data set, so I can't very well do anything "by hand."

In general, you can use NaN values to insert breaks into a plotted line. In the particular case you have shown, you can use np.diff to identify the discontinuities and set the t2 value at those locations to NaN, producing the desired breaks:
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0, 10*np.pi, 1000)
y = t + np.sin(t)
t2 = t % (2*np.pi)
# Compute the difference between successive t2 values
diffs = np.append(np.diff(t2), 0)
# Find the differences that are greater than pi
discont_indices = np.abs(diffs) > np.pi
# Set those t2 values to NaN
t2[discont_indices] = np.nan
plt.plot(t2, y)
plt.show()
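For reference, the same gaps can be produced without writing NaN into the data by masking the wrap-around samples instead; matplotlib leaves gaps at masked values when drawing a line. A minimal sketch along the same lines:
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0, 10*np.pi, 1000)
y = t + np.sin(t)
t2 = t % (2*np.pi)
# Mask the samples where t2 wraps around, rather than overwriting them
jumps = np.abs(np.append(np.diff(t2), 0)) > np.pi
plt.plot(np.ma.masked_where(jumps, t2), y)
plt.show()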

You can approach the same problem in a slightly different way: create an x-mesh from 0 to 2*pi and then add an offset to y to plot the five different curves. The key here is to exclude the last point of t using the index [0:-1]: the final linspace value is exactly 2*pi, which the modulo maps back to 0 and which would otherwise produce the continuation line.
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0, 2*np.pi, 1000)
t = (t % (2*np.pi))[0:-1]  # drop the final point, which wraps back to 0
for i in range(5):
    y = t + np.sin(t) + i*2*np.pi
    plt.plot(t, y, 'b')
plt.show()

Related

Plotting an intersection when graph touches an x-axis

So I'm making a Graphical Calculator, which shows intersections between graphs and axes. I found the method from Intersection of two graphs in Python, find the x value to work most of the time; however, trying to plot the x-axis intersection of x**2 as such
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-5, 5, 0.01)
g = x ** 2
plt.plot(x, g, '-')
idx = np.argwhere(np.diff(np.sign(g))).flatten()
plt.plot(x[idx], g[idx], 'ro')
plt.show()
doesn't put the dot at the (0, 0) point. I assumed it has something to do with the fact that 0 is not in g, so the graph doesn't actually pass through the point exactly and instead just gets really close to it. So I experimented with changing idx to
epsilon = 0.0001
# or another real small number
idx = g < epsilon
Unfortunately, that only seemed to make a lot of points near the actual x-intercept, instead of just one.
You are close. Instead, I just search for where the absolute value of the discrete derivative is at a minimum:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-5, 5, 0.01)
g = x**2
plt.plot(np.abs(np.diff(g)))
plt.show()
which shows that the minimum should be at index 500:
Then all you need to do is return the index of the minimum value with argmin and plot that point
idx = np.argmin(np.abs(np.diff(g)))
plt.plot(x, g, '-')
plt.scatter(x[idx],g[idx])
plt.show()
You'll need to modify the idx variable to return multiple roots, but for the question you posted, this should be sufficient.
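For completeness, here is a hedged sketch of one such modification: it keeps the ordinary sign-change roots and adds tangent points, taken as local minima of |g| that fall below a tolerance. The helper name find_roots and the tol value are made up for illustration:
import numpy as np
def find_roots(x, g, tol=1e-3):
    # Crossing roots: sign changes between neighbouring samples
    crossings = np.argwhere(np.diff(np.sign(g))).flatten()
    # Tangent roots: local minima of |g| that come close enough to zero
    absg = np.abs(g)
    interior = np.arange(1, len(g) - 1)
    is_local_min = (absg[interior] < absg[interior - 1]) & (absg[interior] <= absg[interior + 1])
    tangents = interior[is_local_min & (absg[interior] < tol)]
    return np.union1d(crossings, tangents)
Used as idx = find_roots(x, g), it plugs straight into plt.plot(x[idx], g[idx], 'ro') from the original snippet.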

Radial heatmap from similarity matrix in Python

Summary
I have a 2880x2880 similarity matrix (8.5 mil points). My attempt with Holoviews resulted in a 500 MB HTML file which never finishes "opening". So how do I make a round heatmap of the matrix?
Details
I had data from 10 different places, measured over 1 whole year. The hours of each month were turned into arrays, so each month had 24 arrays (one for all 00:00, one for all 01:00 ... 22:00, 23:00).
These were about 28-31 cells long, and each cell had the measurement of the thing I'm trying to analyze. So there are these 24 arrays for each month of 1 whole year, i.e. 24x12 = 288 arrays per place. And there are measurements from 10 places. So a total of 2880 arrays were created and all compared to each other, and saved in a 2880x2880 matrix with similarity coefficients.
I'm trying to turn it into a radial similarity matrix like the one from holoviews, but without the ticks and tags (since the format Place01Jan0800 would be cumbersome to look at for 2880 rows), just the shape and colors and divisions:
I managed to create the HTML file itself, but it ended up being 500 MB big, so it never shows up when I open it up. It's just blank. I've added a minimal example below of what I have, and replaced the loading of the datafile with some randomly generated data.
import sys
sys.setrecursionlimit(10000)
import random
import numpy as np
import pandas as pd
import holoviews as hv
from holoviews import opts
from bokeh.plotting import show
import gc
# Function creating dummy data for this example
def transformer():
    dimension = 2880
    dummy_matrix = [[random.random() for i in range(dimension)] for j in range(dimension)]  # Fake, similar data
    col_vals = [str(i) for i in range(dimension*dimension)]  # Placeholder
    row_vals = [str(i) for i in range(dimension*dimension)]  # Placeholder
    val_vals = np.reshape(np.array(dummy_matrix), -1).tolist()  # Turn the matrix into a flat array
    idx_vals = [i for i in range(dimension*dimension)]  # Placeholder
    return idx_vals, val_vals, row_vals, col_vals
idx_arr, val_arr, row_arr, col_arr = transformer()
df = pd.DataFrame({"values": val_arr, "x-label": row_arr, "y-label": col_arr}, index=idx_arr)
hv.extension('bokeh')
heatmap = hv.HeatMap(df, ["x-label", "y-label"])
heatmap.opts(opts.HeatMap(cmap="viridis", radial=True))
gc.collect() # Attempt to save memory, because this thing is huge
show(hv.render(heatmap))
I had a look at datashader to see if it would help, but I have no idea how to plug it into this radial heatmap (if that's even possible for this case), since it seems like the radial heatmap doesn't support datashading.
So I have no idea how to tackle this. I would be content with a broad overview too, I don't need the details nor the hover-infobox nor ability to zoom or any fancy extra features, I just need the general overview for a presentation. I'm open to any solution really.
I recommend you use a plain heatmap instead of a radial heatmap for showing the similarity matrix. The reasons are:
The radial heatmap is designed for periodic variables. The time variable (288 hours) can be considered periodic data; however, I think the 288*10 (288 hours, 10 places) is no longer periodic because of the existence of the "place".
Near the center of the radial heatmap, the color points will be too dense to be read by a human.
The following is a simple code to show a heatmap.
import matplotlib.cm
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import numpy as np
n = 2880
m = 2880
dummy_matrix = np.random.rand(m, n)
fig = plt.figure(figsize=(50,50)) # change the figsize to control the resolution
ax = fig.add_subplot(111)
cmap = matplotlib.cm.get_cmap("Blues") # you may use another built-in colormap or define your own
# if your data is not in range [0, 1], use a normalization; here it is normalized by the min and max values
norm = Normalize(vmin=np.amin(dummy_matrix), vmax=np.amax(dummy_matrix))
image = ax.imshow(dummy_matrix, cmap=cmap, norm=norm)
plt.colorbar(image)
plt.show()
Which gives:
Another idea that comes to me is that perhaps the computation of the similarity matrix is unnecessary, and you can plot the original 288 * 10 data using a radial heat map or just a normal heatmap, so that one can see the data similarity from the color distribution directly.
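A minimal sketch of that idea, assuming the raw measurements can be arranged into a (288, 10) hours-by-places array (the random data below is a stand-in for the real measurements):
import numpy as np
import matplotlib.pyplot as plt
raw = np.random.rand(288, 10)  # stand-in for the real 288 x 10 measurements
fig, ax = plt.subplots(figsize=(4, 12))
image = ax.imshow(raw, aspect='auto', cmap='Blues')
ax.set_xlabel('place')
ax.set_ylabel('hour array (24 hours x 12 months)')
plt.colorbar(image)
plt.show()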
Plain Matplotlib seems to be able to handle it, based on answers from here: How do I create radial heatmap in matplotlib?
import random
import matplotlib.pyplot as plt
import numpy as np
n = 2880
m = 2880
rad = np.linspace(0, 10, m)
a = np.linspace(0, 2 * np.pi, n)
r, th = np.meshgrid(rad, a)
dummy_matrix = [[random.random() for i in range(n)] for j in range(m)]
plt.subplot(projection="polar")
plt.pcolormesh(th, r, dummy_matrix, cmap='Blues')
plt.grid()
plt.colorbar()
plt.savefig("custom_radial_heatmap.png")
plt.show()
And it didn't even take an eternity; it took only about 20 seconds at most.
You would think the result would turn out monstrous and jagged, but the sheer number of points drowns out the jaggedness, WOOHOO!
There are some things left to be desired, like tags and ticks, but I think I'll figure that out.
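For what it's worth, hiding the ticks on the polar axes is short. A sketch, assuming it runs right after the plotting code above:
ax = plt.gca()     # the polar axes created by plt.subplot(projection="polar")
ax.set_xticks([])  # remove the angular ticks
ax.set_yticks([])  # remove the radial ticks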

Plot stochastic trajectories deviations from 'real' path using a colormesh in matplotlib (Python)

Hi, I created a program that generates deviations from a real trajectory; it is complicated and unfortunately I do not have a simple example.
It calculates a path with stochastic initial conditions from the real path and does this for x iterations; the goal is to show that the deviations become larger at greater times.
The real path and the deviations are showed below.
However, I want to show that the deviations become greater the further along in time we are. Of course I could just calculate the variance and plot mean+var and mean-var at each time step, but I was wondering if I could plot something like this, using hist2d
You see that the blocks are not as smooth as I'd like, and this is not that great to use.
Then I went and looked at Python's KDE and created the following.
This is also not preferable, as I think it bins more points at the minima and maxima. Also it is 'too smeared out'. Especially in the beginning all the points are the same, so I want there to be just a straight line there, to really show that the deviations start later on.
I guess my question is: is what I want even possible, and what package/command should I use? I haven't found what I am looking for in other questions. Or does anyone have a suggestion to nicely show what I want in any other way?
Here is an idea plotting multiple curves with transparency on top of each other:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 200)
for _ in range(1000):
    plt.plot(x, np.sin(x * np.random.normal(1, 0.1)) * np.random.normal(1, 0.1), color='r', alpha=0.02)
plt.plot(x, np.sin(x), color='b')
plt.margins(x=0)
plt.show()
Another option creates a 2d histogram:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 200)
all_curves = np.array([np.sin(x * np.random.normal(1, 0.1)) * np.random.normal(1, 0.1) for _ in range(100)])
plt.hist2d(x=np.tile(x, all_curves.shape[0]), y=all_curves.ravel(), bins=(100, 100), cmap='inferno')
plt.show()
Still another approach would use fill_between (as suggested by @bramb) between confidence intervals:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 200)
all_curves = np.array([np.sin(x * np.random.normal(1, 0.1)) * np.random.normal(1, 0.1) for _ in range(1000)])
confidence_interval1 = 95
confidence_interval2 = 80
confidence_interval3 = 50
for ci in [confidence_interval1, confidence_interval2, confidence_interval3]:
    low = np.percentile(all_curves, 50 - ci / 2, axis=0)
    high = np.percentile(all_curves, 50 + ci / 2, axis=0)
    plt.fill_between(x, low, high, color='r', alpha=0.2)
plt.plot(x, np.sin(x), color='b')
plt.margins(x=0)
plt.show()
You could use something like the matplotlib.pyplot.fill_between method. It fills everything between y1 (max) and y2 (min) for a given (common) x array. You would then be able to accentuate that the filled region keeps enlarging with increasing x value.
However, this would require you to find the minimal and maximal value of your deviations at each time point and save these to two separate arrays. The exact method of doing this will depend on how you are storing these individual runs.
In case they are separate lists / arrays, you can convert these to a numpy matrix / pandas dataframe and use the minimum / maximum methods along the relevant axis.
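As a hedged sketch of that conversion, assuming the runs are stored as a list of equal-length 1D arrays (runs and t below are placeholder names and data):
import numpy as np
import matplotlib.pyplot as plt
# Placeholder: 50 stochastic runs of 200 time steps each
runs = [np.cumsum(np.random.normal(0, 0.1, 200)) for _ in range(50)]
t = np.arange(200)
stacked = np.vstack(runs)  # shape (n_runs, n_times)
low, high = stacked.min(axis=0), stacked.max(axis=0)
plt.fill_between(t, low, high, alpha=0.3)
plt.plot(t, stacked.mean(axis=0))
plt.show()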

How to draw enveloping line with a shaded area which incorporates a large number of data points?

For the figure above, how can I draw an enveloping line with a shaded area, similar to the figure below?
Replicating your example is easy because it's possible to calculate the min and max at each x and fill between them, e.g.
import matplotlib.pyplot as plt
import numpy as np
#dummy data
y = [np.arange(20) + 3 * i for i in np.random.randn(3, 20)]
x = list(range(20))
#calculate the min and max series for each x
min_ser = [min(i) for i in np.transpose(y)]
max_ser = [max(i) for i in np.transpose(y)]
#initial plot
fig, axs = plt.subplots()
axs.plot(x, x)
for s in y:
    axs.scatter(x, s)
#plot the min and max series over the top
axs.fill_between(x, min_ser, max_ser, alpha=0.2)
giving
For your displayed data, that might prove problematic because the series do not share x values in all cases. If that's the case then you need some statistical technique to smooth the series somehow. One option is to use a package like seaborn, which provides functions to handle all the details for you.
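For instance, seaborn's lineplot aggregates repeated x values and shades a band around the mean line. A sketch with placeholder data (errorbar='sd' is the seaborn >= 0.12 spelling; older versions use ci='sd'):
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
rng = np.random.default_rng(0)
# Long-form data: three noisy series sharing the same x grid
df = pd.DataFrame({
    'x': np.tile(np.arange(20), 3),
    'y': np.concatenate([np.arange(20) + 3 * rng.standard_normal(20) for _ in range(3)]),
})
sns.lineplot(data=df, x='x', y='y', errorbar='sd')  # mean line plus shaded band
plt.show()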

Matplotlib Agg Rendering Complexity Error

I am trying to print a 600 dpi graph using Python matplotlib. However, Python plotted 2 out of 8 graphs and output the error:
OverflowError: Agg rendering complexity exceeded. Consider downsampling or decimating your data.
I am plotting a huge chunk of data (7,500,000 points per column), so I guess either there is some overloading problem or I need to set a larger cell_block_limit.
I tried searching Google for solutions on changing the cell_block_limit, but to no avail. What would be a good approach?
The code is as follows:
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, FormatStrFormatter
majorLocator = MultipleLocator(200)
majorFormatter = FormatStrFormatter('%d')
minorLocator = MultipleLocator(20)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_major_formatter(majorFormatter)
ax.xaxis.set_minor_locator(minorLocator)
ax.xaxis.set_ticks_position('bottom')
ax.xaxis.grid(True,which='minor')
ax.yaxis.grid(True)
plt.plot(timemat,fildata)
plt.xlabel(plotxlabel,fontsize=14)
plt.ylabel(plotylabel,fontsize=14)
plt.title(plottitle,fontsize=16)
fig.savefig(plotsavetitle,dpi=600)
In addition to @Lennart's point that there's no need for the full resolution, you might also consider a plot similar to the following.
Calculating the max/mean/min of a "chunked" version is very simple and efficient if you use a 2D view of the original array and the axis keyword arg to x.min(), x.max(), etc.
Even with the filtering, plotting this is much faster than plotting the full array.
(Note: to plot this many points, you'll have to tune down the noise level a bit. Otherwise you'll get the OverflowError you mentioned. If you want to compare plotting the "full" dataset, change the y += 0.3 * y.max() * np.random... line to more like 0.1, or remove it completely.)
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(1977)
# Generate some very noisy but interesting data...
num = int(1e7)  # np.linspace needs an integer sample count
x = np.linspace(0, 10, num)
y = np.random.random(num) - 0.5
y.cumsum(out=y)
y += 0.3 * y.max() * np.random.random(num)
fig, ax = plt.subplots()
# Wrap the array into a 2D array of chunks, truncating the last chunk if
# chunksize isn't an even divisor of the total size.
# (This part won't use _any_ additional memory)
chunksize = 10000
numchunks = y.size // chunksize
ychunks = y[:chunksize*numchunks].reshape((-1, chunksize))
xchunks = x[:chunksize*numchunks].reshape((-1, chunksize))
# Calculate the max, min, and means of chunksize-element chunks...
max_env = ychunks.max(axis=1)
min_env = ychunks.min(axis=1)
ycenters = ychunks.mean(axis=1)
xcenters = xchunks.mean(axis=1)
# Now plot the bounds and the mean...
ax.fill_between(xcenters, min_env, max_env, color='gray',
edgecolor='none', alpha=0.5)
ax.plot(xcenters, ycenters)
fig.savefig('temp.png', dpi=600)
With 600dpi you would have to make the plot 13 meters wide to plot that data without decimating it. :-)
I would suggest chunking the data into pieces a couple of hundred or maybe even a thousand samples long, and extracting the maximum value out of that.
Something like this:
from itertools import islice

def chunkmax(data, chunk_size):
    # Yield the maximum of each successive chunk_size-long slice of data
    source = iter(data)
    while True:
        chunk = list(islice(source, chunk_size))
        if not chunk:
            return
        yield max(chunk)
This would then, with a chunk_size of 1000, give you 7500 points to plot, where you can then easily see where in the data the shock comes. (Unless the data is so noisy you would have to average it to see if there is a shock or not. But that's also easily fixable.)
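If averaging does turn out to be necessary, the same chunking is cheap in numpy. A sketch, assuming data is array-like (as passed to chunkmax above) and truncating any leftover partial chunk:
import numpy as np
chunk_size = 1000
arr = np.asarray(data)  # `data` as in chunkmax above
usable = len(arr) // chunk_size * chunk_size
chunk_means = arr[:usable].reshape(-1, chunk_size).mean(axis=1)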
