I'm trying to control the y axis order on a matplotlib scatter plot but the ordering of the x and y axes in the data I have is causing the plot to be displayed incorrectly.
Here's some code to illustrate the problem and one sub-optimal attempt to make a solution.
import pandas as pd
from numpy import random
import matplotlib.pyplot as plt
# make some fake data
axes = ['a', 'b', 'c', 'd']
pairs = pd.DataFrame([(x, y) for x in axes for y in axes], columns=['x', 'y'])
pairs['value'] = random.randint(100, size=16) + 100
# remove the diagonal
pairs_nodiag = pairs[pairs['x'] != pairs['y']]
# zero the values for the diagonal
pairs_diag = pairs.copy()
pairs_diag.loc[pairs_diag['x'] == pairs_diag['y'], 'value'] = 0
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(5, 3))
scatter = ax[0].scatter(x=pairs['x'], y=pairs['y'], s=pairs['value'])
scatter = ax[1].scatter(x=pairs_nodiag['x'], y=pairs_nodiag['y'], s=pairs_nodiag['value'])
scatter = ax[2].scatter(x=pairs_diag['x'], y=pairs_diag['y'], s=pairs_diag['value'])
plt.show()
The left most is the raw data. The middle is the plot with the problem; I want the y axis to be the same as the left most plot. The right most plot is what I am after using a sub-optimal workaround. I'm sure there is a way of controlling the ordering on the axes but I'm not expert enough in Python yet to know exactly how to do this.
You need to create your own StringCategoryConverter with your desired mapping (matplotlib by default maps strings to numbers in the sequence the occur).
import matplotlib.category as mcat
# insert the following before scatter = ax[1].scatter(...
units = mcat.UnitData(sorted(pairs_nodiag.y.unique()))
ax[1].yaxis.set_units(units)
ax[1].yaxis.set_major_locator(mcat.StrCategoryLocator(units._mapping))
ax[1].yaxis.set_major_formatter(mcat.StrCategoryFormatter(units._mapping))
UPDATE: The following is the official way to do it without using _mapping:
import matplotlib
# insert the following before scatter = ax[1].scatter(...
scc = matplotlib.category.StrCategoryConverter()
units = scc.default_units(sorted(pairs_nodiag.y.unique()), ax[1].yaxis)
axisinfo = scc.axisinfo(units, ax[1].yaxis)
ax[1].yaxis.set_major_locator(axisinfo.majloc)
ax[1].yaxis.set_major_formatter(axisinfo.majfmt)
Related
Please see code below. I am trying to get 2 exact same scatter plots, in ax2 I try to set the color after the scatter plot has been created. How can I achieve this?
I want this because in my interface I am trying to have the user (optionally) select data to color the scatterplot with. I could just redo the whole plot but I am guessing for large number of data points, it is better add colors to existing axis object. Is that correct?
import numpy as np
import matplotlib.pyplot as plt
fig, (ax1,ax2) = plt.subplots(1,2)
x = np.random.rand(100)
y = np.random.rand(100)
z = np.random.rand(100)
#in ax1, I set color using the 'c' argument
ax1.scatter(x,y, c=z)
sc = ax2.scatter(x,y)
#in ax2, I try to mimic the 'c' argument with set_color but it raises error
# sc.set_color(z)
Look at this pretty graph.
Is there a way, in matplotlib, to make parts of the red and green graph invisible (where f(x)=0)?
Not just those, but also the single line segment where the flat part connects to the sine curve.
Basically, is it possible to tell matplotlib to only plot graph on a certain interval and not draw the rest (or vice versa)?
You could try replacing your points of interest with np.nan as shown below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# here is some example data because none was provided in the question;
# it is a quadratic from x=-5:5
x = np.arange(-5, 6)
s = pd.Series(x**2, index=x)
# replace all y values less than 4 with np.nan and store in a new Series object
s_mod = s.apply(lambda y: np.nan if y < 4 else y)
# plot the modified data with the original data
fig, ax = plt.subplots()
s.plot(marker='o', markersize=16, ax=ax, label='original')
s_mod.plot(marker='s', ax=ax, label='modified')
ax.legend()
fig # displays as follows
I frequently find myself working in log units for my plots, for example taking np.log10(x) of data before binning it or creating contour plots. The problem is, when I then want to make the plots presentable, the axes are in ugly log units, and the tick marks are evenly spaced.
If I let matplotlib do all the conversions, i.e. by setting ax.set_xaxis('log') then I get very nice looking axes, however I can't do that to my data since it is e.g. already binned in log units. I could manually change the tick labels, but that wouldn't make the tick spacing logarithmic. I suppose I could also go and manually specify the position of every minor tick such it had log spacing, but is that the only way to achieve this? That is a bit tedious so it would be nice if there is a better way.
For concreteness, here is a plot:
I want to have the tick labels as 10^x and 10^y (so '1' is '10', 2 is '100' etc.), and I want the minor ticks to be drawn as ax.set_xaxis('log') would draw them.
Edit: For further concreteness, suppose the plot is generated from an image, like this:
import matplotlib.pyplot as plt
import scipy.misc
img = scipy.misc.face()
x_range = [-5,3] # log10 units
y_range = [-55, -45] # log10 units
p = plt.imshow(img,extent=x_range+y_range)
plt.show()
and all we want to do is change the axes appearance as I have described.
Edit 2: Ok, ImportanceOfBeingErnest's answer is very clever but it is a bit more specific to images than I wanted. I have another example, of binned data this time. Perhaps their technique still works on this, though it is not clear to me if that is the case.
import numpy as np
import pandas as pd
import datashader as ds
from matplotlib import pyplot as plt
import scipy.stats as sps
v1 = sps.lognorm(loc=0, scale=3, s=0.8)
v2 = sps.lognorm(loc=0, scale=1, s=0.8)
x = np.log10(v1.rvs(100000))
y = np.log10(v2.rvs(100000))
x_range=[np.min(x),np.max(x)]
y_range=[np.min(y),np.max(y)]
df = pd.DataFrame.from_dict({"x": x, "y": y})
#------ Aggregate the data ------
cvs = ds.Canvas(plot_width=30, plot_height=30, x_range=x_range, y_range=y_range)
agg = cvs.points(df, 'x', 'y')
# Create contour plot
fig = plt.figure()
ax = fig.add_subplot(111)
ax.contourf(agg, extent=x_range+y_range)
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
The general answer to this question is probably given in this post:
Can I mimic a log scale of an axis in matplotlib without transforming the associated data?
However here an easy option might be to scale the content of the axes and then set the axes to a log scale.
A. image
You may plot your image on a logarithmic scale but make all pixels the same size in log units. Unfortunately imshow does not allow for such kind of image (any more), but one may use pcolormesh for that purpose.
import numpy as np
import matplotlib.pyplot as plt
import scipy.misc
img = scipy.misc.face()
extx = [-5,3] # log10 units
exty = [-45, -55] # log10 units
x = np.logspace(extx[0],extx[-1],img.shape[1]+1)
y = np.logspace(exty[0],exty[-1],img.shape[0]+1)
X,Y = np.meshgrid(x,y)
c = img.reshape((img.shape[0]*img.shape[1],img.shape[2]))/255.0
m = plt.pcolormesh(X,Y,X[:-1,:-1], color=c, linewidth=0)
m.set_array(None)
plt.gca().set_xscale("log")
plt.gca().set_yscale("log")
plt.show()
B. contour
The same concept can be used for a contour plot.
import numpy as np
from matplotlib import pyplot as plt
x = np.linspace(-1.1,1.9)
y = np.linspace(-1.4,1.55)
X,Y = np.meshgrid(x,y)
agg = np.exp(-(X**2+Y**2)*2)
fig, ax = plt.subplots()
plt.gca().set_xscale("log")
plt.gca().set_yscale("log")
exp = lambda x: 10.**(np.array(x))
cf = ax.contourf(exp(X), exp(Y),agg, extent=exp([x.min(),x.max(),y.min(),y.max()]))
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
Example of scatterplot matrix
Is there such a function in matplotlib.pyplot?
For those who do not want to define their own functions, there is a great data analysis libarary in Python, called Pandas, where one can find the scatter_matrix() method:
from pandas.plotting import scatter_matrix
df = pd.DataFrame(np.random.randn(1000, 4), columns = ['a', 'b', 'c', 'd'])
scatter_matrix(df, alpha = 0.2, figsize = (6, 6), diagonal = 'kde')
Generally speaking, matplotlib doesn't usually contain plotting functions that operate on more than one axes object (subplot, in this case). The expectation is that you'd write a simple function to string things together however you'd like.
I'm not quite sure what your data looks like, but it's quite simple to just build a function to do this from scratch. If you're always going to be working with structured or rec arrays, then you can simplify this a touch. (i.e. There's always a name associated with each data series, so you can omit having to specify names.)
As an example:
import itertools
import numpy as np
import matplotlib.pyplot as plt
def main():
np.random.seed(1977)
numvars, numdata = 4, 10
data = 10 * np.random.random((numvars, numdata))
fig = scatterplot_matrix(data, ['mpg', 'disp', 'drat', 'wt'],
linestyle='none', marker='o', color='black', mfc='none')
fig.suptitle('Simple Scatterplot Matrix')
plt.show()
def scatterplot_matrix(data, names, **kwargs):
"""Plots a scatterplot matrix of subplots. Each row of "data" is plotted
against other rows, resulting in a nrows by nrows grid of subplots with the
diagonal subplots labeled with "names". Additional keyword arguments are
passed on to matplotlib's "plot" command. Returns the matplotlib figure
object containg the subplot grid."""
numvars, numdata = data.shape
fig, axes = plt.subplots(nrows=numvars, ncols=numvars, figsize=(8,8))
fig.subplots_adjust(hspace=0.05, wspace=0.05)
for ax in axes.flat:
# Hide all ticks and labels
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
# Set up ticks only on one side for the "edge" subplots...
if ax.is_first_col():
ax.yaxis.set_ticks_position('left')
if ax.is_last_col():
ax.yaxis.set_ticks_position('right')
if ax.is_first_row():
ax.xaxis.set_ticks_position('top')
if ax.is_last_row():
ax.xaxis.set_ticks_position('bottom')
# Plot the data.
for i, j in zip(*np.triu_indices_from(axes, k=1)):
for x, y in [(i,j), (j,i)]:
axes[x,y].plot(data[x], data[y], **kwargs)
# Label the diagonal subplots...
for i, label in enumerate(names):
axes[i,i].annotate(label, (0.5, 0.5), xycoords='axes fraction',
ha='center', va='center')
# Turn on the proper x or y axes ticks.
for i, j in zip(range(numvars), itertools.cycle((-1, 0))):
axes[j,i].xaxis.set_visible(True)
axes[i,j].yaxis.set_visible(True)
return fig
main()
You can also use Seaborn's pairplot function:
import seaborn as sns
sns.set()
df = sns.load_dataset("iris")
sns.pairplot(df, hue="species")
Thanks for sharing your code! You figured out all the hard stuff for us. As I was working with it, I noticed a few little things that didn't look quite right.
[FIX #1] The axis tics weren't lining up like I would expect (i.e., in your example above, you should be able to draw a vertical and horizontal line through any point across all plots and the lines should cross through the corresponding point in the other plots, but as it sits now this doesn't occur.
[FIX #2] If you have an odd number of variables you are plotting with, the bottom right corner axes doesn't pull the correct xtics or ytics. It just leaves it as the default 0..1 ticks.
Not a fix, but I made it optional to explicitly input names, so that it puts a default xi for variable i in the diagonal positions.
Below you'll find an updated version of your code that addresses these two points, otherwise preserving the beauty of your code.
import itertools
import numpy as np
import matplotlib.pyplot as plt
def scatterplot_matrix(data, names=[], **kwargs):
"""
Plots a scatterplot matrix of subplots. Each row of "data" is plotted
against other rows, resulting in a nrows by nrows grid of subplots with the
diagonal subplots labeled with "names". Additional keyword arguments are
passed on to matplotlib's "plot" command. Returns the matplotlib figure
object containg the subplot grid.
"""
numvars, numdata = data.shape
fig, axes = plt.subplots(nrows=numvars, ncols=numvars, figsize=(8,8))
fig.subplots_adjust(hspace=0.0, wspace=0.0)
for ax in axes.flat:
# Hide all ticks and labels
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
# Set up ticks only on one side for the "edge" subplots...
if ax.is_first_col():
ax.yaxis.set_ticks_position('left')
if ax.is_last_col():
ax.yaxis.set_ticks_position('right')
if ax.is_first_row():
ax.xaxis.set_ticks_position('top')
if ax.is_last_row():
ax.xaxis.set_ticks_position('bottom')
# Plot the data.
for i, j in zip(*np.triu_indices_from(axes, k=1)):
for x, y in [(i,j), (j,i)]:
# FIX #1: this needed to be changed from ...(data[x], data[y],...)
axes[x,y].plot(data[y], data[x], **kwargs)
# Label the diagonal subplots...
if not names:
names = ['x'+str(i) for i in range(numvars)]
for i, label in enumerate(names):
axes[i,i].annotate(label, (0.5, 0.5), xycoords='axes fraction',
ha='center', va='center')
# Turn on the proper x or y axes ticks.
for i, j in zip(range(numvars), itertools.cycle((-1, 0))):
axes[j,i].xaxis.set_visible(True)
axes[i,j].yaxis.set_visible(True)
# FIX #2: if numvars is odd, the bottom right corner plot doesn't have the
# correct axes limits, so we pull them from other axes
if numvars%2:
xlimits = axes[0,-1].get_xlim()
ylimits = axes[-1,0].get_ylim()
axes[-1,-1].set_xlim(xlimits)
axes[-1,-1].set_ylim(ylimits)
return fig
if __name__=='__main__':
np.random.seed(1977)
numvars, numdata = 4, 10
data = 10 * np.random.random((numvars, numdata))
fig = scatterplot_matrix(data, ['mpg', 'disp', 'drat', 'wt'],
linestyle='none', marker='o', color='black', mfc='none')
fig.suptitle('Simple Scatterplot Matrix')
plt.show()
Thanks again for sharing this with us. I have used it many times! Oh, and I re-arranged the main() part of the code so that it can be a formal example code or not get called if it is being imported into another piece of code.
While reading the question I expected to see an answer including rpy. I think this is a nice option taking advantage of two beautiful languages. So here it is:
import rpy
import numpy as np
def main():
np.random.seed(1977)
numvars, numdata = 4, 10
data = 10 * np.random.random((numvars, numdata))
mpg = data[0,:]
disp = data[1,:]
drat = data[2,:]
wt = data[3,:]
rpy.set_default_mode(rpy.NO_CONVERSION)
R_data = rpy.r.data_frame(mpg=mpg,disp=disp,drat=drat,wt=wt)
# Figure saved as eps
rpy.r.postscript('pairsPlot.eps')
rpy.r.pairs(R_data,
main="Simple Scatterplot Matrix Via RPy")
rpy.r.dev_off()
# Figure saved as png
rpy.r.png('pairsPlot.png')
rpy.r.pairs(R_data,
main="Simple Scatterplot Matrix Via RPy")
rpy.r.dev_off()
rpy.set_default_mode(rpy.BASIC_CONVERSION)
if __name__ == '__main__': main()
I can't post an image to show the result :( sorry!
I would like to be able to produce a stacked line graph (similar to the method used here) with Python (preferably using matplotlib, but another library would be fine too). How can I do this?
This similar to the stacked bar graph example on their website, except I'd like the top of bar to be connected with a line segment and the area underneath to be filled. I might be able to approximate this by decreasing the gaps between bars and using lots of bars (but this seems like a hack, and besides I'm not sure if it is possible).
Newer versions of matplotlib contain the function plt.stackplot, which allow for several different "out-of-the-box" stacked area plots:
import numpy as np
import pylab as plt
X = np.arange(0, 10, 1)
Y = X + 5 * np.random.random((5, X.size))
baseline = ["zero", "sym", "wiggle", "weighted_wiggle"]
for n, v in enumerate(baseline):
plt.subplot(2 ,2, n + 1)
plt.stackplot(X, *Y, baseline=v)
plt.title(v)
plt.axis('tight')
plt.show()
I believe Area Plot is a common term for this type of plot, and in the specific instance recited in the OP, Stacked Area Plot.
Matplotlib does not have an "out-of-the-box" function that combines both the data processing and drawing/rendering steps to create a this type of plot, but it's easy to roll your own from components supplied by Matplotlib and NumPy.
The code below first stacks the data, then draws the plot.
import numpy as NP
from matplotlib import pyplot as PLT
# just create some random data
fnx = lambda : NP.random.randint(3, 10, 10)
y = NP.row_stack((fnx(), fnx(), fnx()))
# this call to 'cumsum' (cumulative sum), passing in your y data,
# is necessary to avoid having to manually order the datasets
x = NP.arange(10)
y_stack = NP.cumsum(y, axis=0) # a 3x10 array
fig = PLT.figure()
ax1 = fig.add_subplot(111)
ax1.fill_between(x, 0, y_stack[0,:], facecolor="#CC6666", alpha=.7)
ax1.fill_between(x, y_stack[0,:], y_stack[1,:], facecolor="#1DACD6", alpha=.7)
ax1.fill_between(x, y_stack[1,:], y_stack[2,:], facecolor="#6E5160")
PLT.show()
If you have a dataframe, it's quite easy:
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.plot.area();
From: pandas documentation
A slightly less hackish way would be to use a line graph in the first place and matplotlib.pyplot.fill_between. To emulate the stacking you have to shift the points up yourself.
x = np.arange(0,4)
y1 = np.array([1,2,4,3])
y2 = np.array([5,2,1,3])
# y2 should go on top, so shift them up
y2s = y1+y2
plot(x,y1)
plot(x,y2s)
fill_between(x,y1,0,color='blue')
fill_between(x,y1,y2s,color='red')