How can I create a stacked line graph? - python

I would like to be able to produce a stacked line graph (similar to the method used here) with Python (preferably using matplotlib, but another library would be fine too). How can I do this?
This is similar to the stacked bar graph example on their website, except I'd like the top of each bar to be connected with a line segment and the area underneath to be filled. I might be able to approximate this by decreasing the gaps between bars and using lots of bars (but this seems like a hack, and besides I'm not sure if it is possible).

Newer versions of matplotlib contain the function plt.stackplot, which allows for several different "out-of-the-box" stacked area plots:
import numpy as np
import matplotlib.pyplot as plt
X = np.arange(0, 10, 1)
# five random series, one per row
Y = X + 5 * np.random.random((5, X.size))
baseline = ["zero", "sym", "wiggle", "weighted_wiggle"]
# one subplot per baseline option
for n, v in enumerate(baseline):
    plt.subplot(2, 2, n + 1)
    plt.stackplot(X, *Y, baseline=v)
    plt.title(v)
    plt.axis('tight')
plt.show()

I believe Area Plot is a common term for this type of plot, and in the specific instance recited in the OP, Stacked Area Plot.
Matplotlib does not have an "out-of-the-box" function that combines both the data processing and drawing/rendering steps to create this type of plot, but it's easy to roll your own from components supplied by Matplotlib and NumPy.
The code below first stacks the data, then draws the plot.
import numpy as NP
from matplotlib import pyplot as PLT
# just create some random data
fnx = lambda : NP.random.randint(3, 10, 10)
# vstack replaces row_stack, which is deprecated in recent NumPy releases
y = NP.vstack((fnx(), fnx(), fnx()))
x = NP.arange(10)
# this call to 'cumsum' (cumulative sum), passing in your y data,
# stacks each row on top of the previous ones, so you avoid
# having to manually order the datasets
y_stack = NP.cumsum(y, axis=0) # a 3x10 array
fig = PLT.figure()
ax1 = fig.add_subplot(111)
ax1.fill_between(x, 0, y_stack[0,:], facecolor="#CC6666", alpha=.7)
ax1.fill_between(x, y_stack[0,:], y_stack[1,:], facecolor="#1DACD6", alpha=.7)
ax1.fill_between(x, y_stack[1,:], y_stack[2,:], facecolor="#6E5160")
PLT.show()
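The three explicit fill_between calls above follow one pattern: each band runs from the previous cumulative row up to the current one. A minimal sketch of that pattern for any number of series follows; the helper name stacked_bounds is hypothetical and not part of Matplotlib or NumPy.

```python
import numpy as np

def stacked_bounds(y):
    """Return (lower, upper) edges of each band for series stacked row by row."""
    y = np.asarray(y, dtype=float)
    upper = np.cumsum(y, axis=0)  # top edge of every band
    # bottom edge of a band is the top edge of the band below it (0 for the first)
    lower = np.vstack([np.zeros(y.shape[1]), upper[:-1]])
    return lower, upper

# small example: three series sampled at four x positions
y = np.array([[1, 2, 4, 3],
              [5, 2, 1, 3],
              [2, 2, 2, 2]])
lower, upper = stacked_bounds(y)
```

Each pair of rows can then be drawn in a loop with ax1.fill_between(x, lower[i], upper[i]), choosing one colour per band.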

If you have a dataframe, it's quite easy:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.plot.area()
From: pandas documentation

A slightly less hackish way would be to use a line graph in the first place and matplotlib.pyplot.fill_between. To emulate the stacking you have to shift the points up yourself.
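import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 4)
y1 = np.array([1, 2, 4, 3])
y2 = np.array([5, 2, 1, 3])
# y2 should go on top, so shift it up by y1
y2s = y1 + y2
plt.plot(x, y1)
plt.plot(x, y2s)
# fill from 0 up to y1, then from y1 up to the shifted y2
plt.fill_between(x, y1, 0, color='blue')
plt.fill_between(x, y1, y2s, color='red')
plt.show()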

Related

Artifact in matplotlib.pyplot.imshow

I'm trying to make a colorplot of a function with matplotlib.pyplot.imshow. However, depending on the size of the plot, I get a vertical line as an artifact.
The code to generate the plot is:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
from matplotlib import cm
def double_vortex(X, Y):
    return np.angle((X + 25) + 1j*Y) - np.angle((X - 25) + 1j*Y)
X = np.arange(-50,50)
Y = np.arange(-50,50)
X, Y = np.meshgrid(X, Y)
phi0_vortex = double_vortex(X,Y)
fig = plt.figure(figsize=(16,8))
gs = gridspec.GridSpec(1, 3, width_ratios=[2.5, 1.5,1])
for i in range(3):
    ax = plt.subplot(gs[i])
    ax.imshow(phi0_vortex % (2*np.pi), cmap=cm.hsv, vmin=0, vmax=2*np.pi)
The resulting plot is this:
You can see that the two smaller plots exhibit a vertical line as an artefact. Is this a bug in matplotlib or somehow actually to be expected?
This is a consequence of matplotlib's downsampling algorithm, which operates in data space. In your case, a pair of neighbouring pixels with values like [359, 1] (thinking of the phase in degrees) gets averaged to 180, which the hsv colormap draws as cyan, and you get the cyan line. This is https://github.com/matplotlib/matplotlib/issues/18735, for which we are working on a solution to allow RGB-space downsampling (as well).
What can you do about this until that is improved in Matplotlib? The simple answer is: don't downsample in Matplotlib. Make a big PNG, and then resample it in post-processing software like ImageMagick.
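The averaging problem described above can be reproduced with plain arithmetic. The sketch below compares an ordinary mean of two phase values with a circular mean. The circular mean is only an illustration of why wrapped data should not be averaged directly; it is not what Matplotlib does, and it is not the RGB-space fix mentioned in the answer. The function names are hypothetical.

```python
import numpy as np

def naive_mean_deg(angles_deg):
    """Ordinary arithmetic mean, ignoring that 0 and 360 are the same angle."""
    return float(np.mean(angles_deg))

def circular_mean_deg(angles_deg):
    """Mean direction: average the unit vectors, then convert back to an angle."""
    a = np.deg2rad(angles_deg)
    mean = np.arctan2(np.sin(a).mean(), np.cos(a).mean())
    return float(np.rad2deg(mean) % 360.0)

pair = [359.0, 1.0]                 # two neighbouring pixels on either side of the wrap
naive = naive_mean_deg(pair)        # lands on the opposite side of the colour wheel
circular = circular_mean_deg(pair)  # stays next to the 0/360 wrap point
```

The naive result is the value that shows up as the cyan line, while the two inputs both sit at the red end of the hsv colormap.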

Control scatter plot y axis order in matplotlib

I'm trying to control the y axis order on a matplotlib scatter plot but the ordering of the x and y axes in the data I have is causing the plot to be displayed incorrectly.
Here's some code to illustrate the problem and one sub-optimal attempt to make a solution.
import pandas as pd
from numpy import random
import matplotlib.pyplot as plt
# make some fake data
axes = ['a', 'b', 'c', 'd']
pairs = pd.DataFrame([(x, y) for x in axes for y in axes], columns=['x', 'y'])
pairs['value'] = random.randint(100, size=16) + 100
# remove the diagonal
pairs_nodiag = pairs[pairs['x'] != pairs['y']]
# zero the values for the diagonal
pairs_diag = pairs.copy()
pairs_diag.loc[pairs_diag['x'] == pairs_diag['y'], 'value'] = 0
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(5, 3))
scatter = ax[0].scatter(x=pairs['x'], y=pairs['y'], s=pairs['value'])
scatter = ax[1].scatter(x=pairs_nodiag['x'], y=pairs_nodiag['y'], s=pairs_nodiag['value'])
scatter = ax[2].scatter(x=pairs_diag['x'], y=pairs_diag['y'], s=pairs_diag['value'])
plt.show()
The leftmost plot is the raw data. The middle is the plot with the problem; I want its y axis to be the same as in the leftmost plot. The rightmost plot is what I am after, using a sub-optimal workaround. I'm sure there is a way of controlling the ordering on the axes but I'm not expert enough in Python yet to know exactly how to do this.
You need to create your own StrCategoryConverter with your desired mapping (matplotlib by default maps strings to numbers in the sequence they occur).
import matplotlib.category as mcat
# insert the following before scatter = ax[1].scatter(...
units = mcat.UnitData(sorted(pairs_nodiag.y.unique()))
ax[1].yaxis.set_units(units)
ax[1].yaxis.set_major_locator(mcat.StrCategoryLocator(units._mapping))
ax[1].yaxis.set_major_formatter(mcat.StrCategoryFormatter(units._mapping))
UPDATE: The following is the official way to do it without using _mapping:
import matplotlib.category
# insert the following before scatter = ax[1].scatter(...
scc = matplotlib.category.StrCategoryConverter()
units = scc.default_units(sorted(pairs_nodiag.y.unique()), ax[1].yaxis)
axisinfo = scc.axisinfo(units, ax[1].yaxis)
ax[1].yaxis.set_major_locator(axisinfo.majloc)
ax[1].yaxis.set_major_formatter(axisinfo.majfmt)
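To see why the middle plot comes out in a different order, it can help to emulate the "sequence they occur" rule on the question's data. The sketch below is plain Python and only imitates that rule; first_seen_positions is a hypothetical helper, not Matplotlib's own code.

```python
labels = ['a', 'b', 'c', 'd']
# y values of the off-diagonal pairs, in the same row order as pairs_nodiag
y_nodiag = [y for x in labels for y in labels if x != y]

def first_seen_positions(values):
    """Assign 0, 1, 2, ... to each label in order of first appearance."""
    return {label: i for i, label in enumerate(dict.fromkeys(values))}

# order the axis ends up with when the strings are taken as they come
default_order = first_seen_positions(y_nodiag)
# order obtained when the unique labels are sorted first, as in the answer
sorted_order = first_seen_positions(sorted(set(y_nodiag)))
```

Because the ('a', 'a') row is gone, 'b' is the first y value encountered and 'a' only appears later, so 'a' ends up at the top of the axis unless the categories are registered in sorted order beforehand.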

Making a matplotlib graph partially invisible

Look at this pretty graph.
Is there a way, in matplotlib, to make parts of the red and green graph invisible (where f(x)=0)?
Not just those, but also the single line segment where the flat part connects to the sine curve.
Basically, is it possible to tell matplotlib to only plot graph on a certain interval and not draw the rest (or vice versa)?
You could try replacing your points of interest with np.nan as shown below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# here is some example data because none was provided in the question;
# it is a quadratic from x=-5:5
x = np.arange(-5, 6)
s = pd.Series(x**2, index=x)
# replace all y values less than 4 with np.nan and store in a new Series object
s_mod = s.apply(lambda y: np.nan if y < 4 else y)
# plot the modified data with the original data
fig, ax = plt.subplots()
s.plot(marker='o', markersize=16, ax=ax, label='original')
s_mod.plot(marker='s', ax=ax, label='modified')
ax.legend()
fig # displays as follows
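The same idea applied to the case in the question (hiding the parts where f(x)=0) might look like the sketch below. The curve is a stand-in, since the original data was not provided: sine arches separated by flat runs at zero.

```python
import numpy as np

x = np.linspace(0, 4 * np.pi, 200)
# stand-in curve (an assumption): positive sine arches with flat runs at 0 in between
y = np.maximum(np.sin(x), 0.0)

# keep the curve where it is above zero, put NaN everywhere else
y_masked = np.where(y > 0, y, np.nan)
```

Plotting y_masked instead of y leaves gaps over the flat runs. Matplotlib does not draw a line segment that touches a NaN point, so the segment joining the flat part to the curve disappears as well.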

Manually draw log-spaced tick marks and labels in matplotlib

I frequently find myself working in log units for my plots, for example taking np.log10(x) of data before binning it or creating contour plots. The problem is, when I then want to make the plots presentable, the axes are in ugly log units, and the tick marks are evenly spaced.
If I let matplotlib do all the conversions, i.e. by setting ax.set_xscale('log'), then I get very nice looking axes; however, I can't do that to my data since it is, e.g., already binned in log units. I could manually change the tick labels, but that wouldn't make the tick spacing logarithmic. I suppose I could also go and manually specify the position of every minor tick so that it has log spacing, but is that the only way to achieve this? That is a bit tedious, so it would be nice if there were a better way.
For concreteness, here is a plot:
I want to have the tick labels as 10^x and 10^y (so '1' is '10', '2' is '100', etc.), and I want the minor ticks to be drawn as ax.set_xscale('log') would draw them.
Edit: For further concreteness, suppose the plot is generated from an image, like this:
import matplotlib.pyplot as plt
import scipy.datasets  # SciPy >= 1.10; older versions use scipy.misc.face()
img = scipy.datasets.face()
x_range = [-5,3] # log10 units
y_range = [-55, -45] # log10 units
p = plt.imshow(img,extent=x_range+y_range)
plt.show()
and all we want to do is change the axes appearance as I have described.
Edit 2: Ok, ImportanceOfBeingErnest's answer is very clever but it is a bit more specific to images than I wanted. I have another example, of binned data this time. Perhaps their technique still works on this, though it is not clear to me if that is the case.
import numpy as np
import pandas as pd
import datashader as ds
from matplotlib import pyplot as plt
import scipy.stats as sps
v1 = sps.lognorm(loc=0, scale=3, s=0.8)
v2 = sps.lognorm(loc=0, scale=1, s=0.8)
x = np.log10(v1.rvs(100000))
y = np.log10(v2.rvs(100000))
x_range=[np.min(x),np.max(x)]
y_range=[np.min(y),np.max(y)]
df = pd.DataFrame.from_dict({"x": x, "y": y})
#------ Aggregate the data ------
cvs = ds.Canvas(plot_width=30, plot_height=30, x_range=x_range, y_range=y_range)
agg = cvs.points(df, 'x', 'y')
# Create contour plot
fig = plt.figure()
ax = fig.add_subplot(111)
ax.contourf(agg, extent=x_range+y_range)
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
The general answer to this question is probably given in this post:
Can I mimic a log scale of an axis in matplotlib without transforming the associated data?
However here an easy option might be to scale the content of the axes and then set the axes to a log scale.
A. image
You may plot your image on a logarithmic scale but make all pixels the same size in log units. Unfortunately imshow does not allow for this kind of image (any more), but one may use pcolormesh for that purpose.
import numpy as np
import matplotlib.pyplot as plt
import scipy.datasets  # SciPy >= 1.10; older versions use scipy.misc.face()
img = scipy.datasets.face()
extx = [-5,3] # log10 units
exty = [-45, -55] # log10 units
x = np.logspace(extx[0],extx[-1],img.shape[1]+1)
y = np.logspace(exty[0],exty[-1],img.shape[0]+1)
X,Y = np.meshgrid(x,y)
c = img.reshape((img.shape[0]*img.shape[1],img.shape[2]))/255.0
m = plt.pcolormesh(X,Y,X[:-1,:-1], color=c, linewidth=0)
m.set_array(None)
plt.gca().set_xscale("log")
plt.gca().set_yscale("log")
plt.show()
B. contour
The same concept can be used for a contour plot.
import numpy as np
from matplotlib import pyplot as plt
x = np.linspace(-1.1,1.9)
y = np.linspace(-1.4,1.55)
X,Y = np.meshgrid(x,y)
agg = np.exp(-(X**2+Y**2)*2)
fig, ax = plt.subplots()
plt.gca().set_xscale("log")
plt.gca().set_yscale("log")
exp = lambda x: 10.**(np.array(x))
cf = ax.contourf(exp(X), exp(Y),agg, extent=exp([x.min(),x.max(),y.min(),y.max()]))
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
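Both variants above rely on the same conversion: coordinates that are stored in log10 units are turned back into plain values with 10**x, and only then is the axis switched to a log scale. A small numeric sketch of that step, using the x range from the question:

```python
import numpy as np

log_edges = np.linspace(-5, 3, 9)   # bin edges in log10 units: -5, -4, ..., 3
edges = 10.0 ** log_edges           # the same edges as plain values

# equal steps in log10 units become equal ratios (one decade per step here)
ratios = edges[1:] / edges[:-1]
```

On a log-scaled axis these edges are evenly spaced again, which is why the plot keeps its shape while the ticks and labels become those of a normal log axis.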

How to draw enveloping line with a shaded area which incorporates a large number of data points?

For the figure above, how can I draw an enveloping line with a shaded area, similar to the figure below?
Replicating your example is easy because it's possible to calculate the min and max at each x and fill between them, e.g.
import matplotlib.pyplot as plt
import numpy as np
#dummy data
y = [range(20) + 3 * i for i in np.random.randn(3, 20)]
x = list(range(20))
#calculate the min and max series for each x
min_ser = [min(i) for i in np.transpose(y)]
max_ser = [max(i) for i in np.transpose(y)]
#initial plot
fig, axs = plt.subplots()
axs.plot(x, x)
for s in y:
    axs.scatter(x, s)
#plot the min and max series over the top
axs.fill_between(x, min_ser, max_ser, alpha=0.2)
giving
For your displayed data, that might prove problematic because the series do not share x values in all cases. If that's the case then you need some statistical technique to smooth the series somehow. One option is to use a package like seaborn, which provides functions to handle all the details for you.
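If the series do not share x values, one simple possibility is to resample them onto a common grid before taking the min and max. The sketch below uses linear interpolation with np.interp; this is an assumption for illustration, not the statistical smoothing mentioned above, and the three series are made up.

```python
import numpy as np

# hypothetical series, each sampled at its own (increasing) x values
series = [
    (np.array([0.0, 1.0, 2.0, 4.0]), np.array([0.0, 1.5, 2.0, 4.5])),
    (np.array([0.5, 1.5, 3.0, 4.0]), np.array([1.0, 1.0, 3.5, 3.0])),
    (np.array([0.0, 2.0, 3.0, 4.0]), np.array([0.5, 2.5, 2.0, 4.0])),
]

# restrict the grid to the range covered by every series,
# so that np.interp never has to extrapolate
x_lo = max(xs[0] for xs, _ in series)
x_hi = min(xs[-1] for xs, _ in series)
grid = np.linspace(x_lo, x_hi, 50)

# one row per series, all evaluated on the shared grid
resampled = np.array([np.interp(grid, xs, ys) for xs, ys in series])
min_ser = resampled.min(axis=0)
max_ser = resampled.max(axis=0)
```

The resulting grid, min_ser and max_ser can be passed to axs.fill_between(grid, min_ser, max_ser, alpha=0.2) in the same way as in the example above.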
