How to show the distance between two curves more clearly in Matplotlib - Python

I wrote code that plots two curves passing through two data sets. However, after plotting, it turned out that the distance between the curves is small. I therefore need to rescale the y-axis so that the distance shows clearly in the resulting figure, but I do not know how to do this.
The code is this:
import numpy as np
from scipy.interpolate import make_interp_spline
import matplotlib.pyplot as plt
# Dataset
x = np.array([41/300,46/300,65/300,69/300,73/300,75/300,81/300,87/300,101/300,116/300,122/300,128/300])
y = np.array([186.492147/100,185.1351/100,181.1801/100,179.8990/100,178.0152/100,177.4235/100,176.8346/100,175.6332/100,173.8296/100 ,
172.0626/100,171.4815/100, 170.9044/100])
X_Y_Spline = make_interp_spline(x, y)
X_ = np.linspace(x.min(), x.max(), 500)
Y_ = X_Y_Spline(X_)
u = np.array([50/300,54/300,66/300,72/300,74/300,75/300,101/300,102/300,110/300,113/300,116/300,118/300,130/300])
v = np.array([183.724636117654/100,182.900/100,180.886911726436/100,178.160318153782/100,177.626286563695/100,
177.403672688541/100,173.695126295789/100,173.666369492423/100,172.52494955916/100,172.273125402593/100,
171.888413653382/100, 171.633213514319/100,170.670417094034/100])
uv_Spline = make_interp_spline(u,v)
U_ = np.linspace(u.min(), u.max(), 500)
V_ = uv_Spline(U_)
plt.plot(U_, V_,linewidth=0.5)
plt.plot(X_, Y_,linewidth=0.5)
plt.show()
I want to show clearly that these two curves are a noticeable distance apart somewhere in their domain. Is there any way to do so? Should I rescale the y-axis? If so, what is the syntax?
I would appreciate any hint.
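One possible approach, offered as a sketch rather than the only answer: evaluate both splines on the x-range where the two data sets overlap, then plot their difference on its own axes. That second y-axis autoscales to the size of the gap, so even a small separation becomes visible. Zooming the original plot with ax.set_xlim / ax.set_ylim also works. The shared grid t, the two-panel layout and the Agg backend line are choices made for this sketch; the data are the ones from the question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; remove it and call plt.show() to view
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline

# Data from the question (same values, divided by 300 and 100 as there)
x = np.array([41, 46, 65, 69, 73, 75, 81, 87, 101, 116, 122, 128]) / 300
y = np.array([186.492147, 185.1351, 181.1801, 179.8990, 178.0152, 177.4235,
              176.8346, 175.6332, 173.8296, 172.0626, 171.4815, 170.9044]) / 100
u = np.array([50, 54, 66, 72, 74, 75, 101, 102, 110, 113, 116, 118, 130]) / 300
v = np.array([183.724636117654, 182.900, 180.886911726436, 178.160318153782,
              177.626286563695, 177.403672688541, 173.695126295789,
              173.666369492423, 172.52494955916, 172.273125402593,
              171.888413653382, 171.633213514319, 170.670417094034]) / 100

xy_spline = make_interp_spline(x, y)
uv_spline = make_interp_spline(u, v)

# Common grid restricted to the overlap of both domains, so nothing is extrapolated
lo, hi = max(x.min(), u.min()), min(x.max(), u.max())
t = np.linspace(lo, hi, 500)
gap = uv_spline(t) - xy_spline(t)

fig, (ax_curves, ax_gap) = plt.subplots(2, 1, sharex=True, figsize=(6, 6))
ax_curves.plot(t, xy_spline(t), linewidth=0.8, label="(x, y)")
ax_curves.plot(t, uv_spline(t), linewidth=0.8, label="(u, v)")
ax_curves.legend()

# The difference gets its own y-axis, which autoscales to the size of the gap
ax_gap.plot(t, gap, color="k", linewidth=0.8)
ax_gap.axhline(0, color="grey", linewidth=0.5)
ax_gap.set_xlabel("x")
ax_gap.set_ylabel("v - y")

# To zoom the top panel on one region instead, something like:
# ax_curves.set_xlim(0.30, 0.35); ax_curves.set_ylim(1.72, 1.76)
fig.tight_layout()
```

Plotting the difference rather than stretching the y-axis keeps both curves readable while making the size of the separation explicit.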

Related

Scatter plot over 2D-histogram in matplotlib with log-scale

I have two sets of points with values (x, y). One is enormous (300k) and one is small (2k). I want to show a scatter plot of the latter over a 2D histogram of the former in log-log scale. plt.xscale('log')-like commands keep messing up the histogram, and when I just take logs of the x's and y's and then do all the plotting, my ticks read, say, -3 instead of 10^-3, and the nice logarithmic minor ticks are missing altogether. What's the most elegant solution in matplotlib? Do I have to dig into the artist layer?
If you'll forgive a bit of self-advertisement, you may use my library physt (see https://github.com/janpipek/physt). Then you can write code like this:
import numpy as np
import matplotlib.pyplot as plt
from physt import h2
# Data
r1 = np.random.normal(0, 1, 20000)
r2 = np.random.normal(0, .3, 20000) + r1
x = np.exp(r1)
y = np.exp(r2)
# Plot scatter
fig, ax = plt.subplots()
ax.scatter(x[:1000], y[:1000], s=2)
H = h2(x, y, "exponential")
H.plot(ax=ax, zorder=-1) # zorder=-1 puts the histogram behind the scatter points
plt.show()
I hope this solves your problem.
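For comparison, a plain-matplotlib route, sketched here as an alternative to physt rather than taken from the answer: build the 2D histogram with logarithmically spaced bin edges from np.geomspace, draw it with pcolormesh, and only then switch both axes to log scale. Because the bins are equal width in log space, they stay rectangular and evenly sized on the log axes, and matplotlib supplies the 10^n labels and minor ticks itself. The random data, the 60 bins per axis and the Agg backend line are placeholders.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; remove it and call plt.show() to view
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

rng = np.random.default_rng(0)
# Placeholder data: a large set for the histogram, a small set for the scatter
r1 = rng.normal(0, 1, 300_000)
big_x, big_y = np.exp(r1), np.exp(rng.normal(0, 0.3, r1.size) + r1)
r2 = rng.normal(0, 1, 2_000)
small_x, small_y = np.exp(r2), np.exp(rng.normal(0, 0.3, r2.size) + r2)

# Log-spaced edges; geomspace keeps the end points exactly at min and max
xedges = np.geomspace(big_x.min(), big_x.max(), 61)
yedges = np.geomspace(big_y.min(), big_y.max(), 61)
counts, _, _ = np.histogram2d(big_x, big_y, bins=[xedges, yedges])

fig, ax = plt.subplots()
# histogram2d indexes counts as [x_bin, y_bin]; pcolormesh wants [row=y, col=x], hence .T
# Empty bins are masked so that LogNorm never sees zeros
mesh = ax.pcolormesh(xedges, yedges, np.ma.masked_equal(counts, 0).T,
                     norm=LogNorm(), zorder=0)
ax.scatter(small_x, small_y, s=2, color="k", zorder=1)
ax.set_xscale("log")
ax.set_yscale("log")
fig.colorbar(mesh, ax=ax, label="points per bin")
```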

Manually draw log-spaced tick marks and labels in matplotlib

I frequently find myself working in log units for my plots, for example taking np.log10(x) of data before binning it or creating contour plots. The problem is, when I then want to make the plots presentable, the axes are in ugly log units, and the tick marks are evenly spaced.
If I let matplotlib do all the conversions, i.e. by setting ax.set_xscale('log'), then I get very nice-looking axes; however, I can't do that to my data since it is, e.g., already binned in log units. I could manually change the tick labels, but that wouldn't make the tick spacing logarithmic. I suppose I could also manually specify the position of every minor tick so that it had log spacing, but is that the only way to achieve this? That is a bit tedious, so it would be nice if there were a better way.
For concreteness, here is a plot:
I want to have the tick labels as 10^x and 10^y (so '1' is '10', '2' is '100', etc.), and I want the minor ticks to be drawn as ax.set_xscale('log') would draw them.
Edit: For further concreteness, suppose the plot is generated from an image, like this:
import matplotlib.pyplot as plt
import scipy.misc
img = scipy.misc.face()
x_range = [-5,3] # log10 units
y_range = [-55, -45] # log10 units
p = plt.imshow(img,extent=x_range+y_range)
plt.show()
and all we want to do is change the axes appearance as I have described.
Edit 2: OK, ImportanceOfBeingErnest's answer is very clever, but it is a bit more specific to images than I wanted. I have another example, of binned data this time. Perhaps their technique still works on this, though it is not clear to me whether that is the case.
import numpy as np
import pandas as pd
import datashader as ds
from matplotlib import pyplot as plt
import scipy.stats as sps
v1 = sps.lognorm(loc=0, scale=3, s=0.8)
v2 = sps.lognorm(loc=0, scale=1, s=0.8)
x = np.log10(v1.rvs(100000))
y = np.log10(v2.rvs(100000))
x_range=[np.min(x),np.max(x)]
y_range=[np.min(y),np.max(y)]
df = pd.DataFrame.from_dict({"x": x, "y": y})
#------ Aggregate the data ------
cvs = ds.Canvas(plot_width=30, plot_height=30, x_range=x_range, y_range=y_range)
agg = cvs.points(df, 'x', 'y')
# Create contour plot
fig = plt.figure()
ax = fig.add_subplot(111)
ax.contourf(agg, extent=x_range+y_range)
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
The general answer to this question is probably given in this post:
Can I mimic a log scale of an axis in matplotlib without transforming the associated data?
However, here an easy option might be to convert the plotted coordinates back out of log units (10**x) and then set the axes to a log scale.
A. image
You may plot your image on a logarithmic scale but make all pixels the same size in log units. Unfortunately, imshow no longer supports this kind of image, but pcolormesh can be used for that purpose.
import numpy as np
import matplotlib.pyplot as plt
import scipy.misc
img = scipy.misc.face()
extx = [-5,3] # log10 units
exty = [-45, -55] # log10 units
x = np.logspace(extx[0],extx[-1],img.shape[1]+1)
y = np.logspace(exty[0],exty[-1],img.shape[0]+1)
X,Y = np.meshgrid(x,y)
c = img.reshape((img.shape[0]*img.shape[1],img.shape[2]))/255.0
m = plt.pcolormesh(X,Y,X[:-1,:-1], color=c, linewidth=0)
m.set_array(None)
plt.gca().set_xscale("log")
plt.gca().set_yscale("log")
plt.show()
B. contour
The same concept can be used for a contour plot.
import numpy as np
from matplotlib import pyplot as plt
x = np.linspace(-1.1,1.9)
y = np.linspace(-1.4,1.55)
X,Y = np.meshgrid(x,y)
agg = np.exp(-(X**2+Y**2)*2)
fig, ax = plt.subplots()
ax.set_xscale("log")
ax.set_yscale("log")
# convert the log10-unit grid back to linear values before plotting
exp10 = lambda v: 10.**np.asarray(v)
# extent is not needed: contourf ignores it when X and Y are given
cf = ax.contourf(exp10(X), exp10(Y), agg)
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
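The question's fallback idea, placing every minor tick by hand, can also be automated. The sketch below is an alternative to the answer, not part of it: it leaves the data in log10 units on a linear axis and only changes the ticks, with majors at integers labelled as powers of ten and minors at log10(k * 10^n) for k = 2..9. label_as_log is a hypothetical helper name; MultipleLocator, FixedLocator and FuncFormatter are standard matplotlib.ticker classes. The random image stands in for scipy.misc.face().

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; remove it and call plt.show() to view
import matplotlib.pyplot as plt
from matplotlib.ticker import FixedLocator, FuncFormatter, MultipleLocator

def label_as_log(axis, vmin, vmax):
    """Make a linear axis holding log10 values look like a log axis (hypothetical helper).

    Majors sit at integers and are labelled 10^n; minors sit at log10(k * 10**n),
    k = 2..9, which is where a real log axis draws its minor ticks.
    """
    axis.set_major_locator(MultipleLocator(1))
    axis.set_major_formatter(FuncFormatter(lambda v, pos: rf"$10^{{{int(round(v))}}}$"))
    decades = np.arange(np.floor(vmin), np.ceil(vmax))
    minors = (decades[:, None] + np.log10(np.arange(2, 10))).ravel()
    minors = minors[(minors >= vmin) & (minors <= vmax)]
    axis.set_minor_locator(FixedLocator(minors))
    return minors

# Usage with the image example from the question (data already in log10 units)
img = np.random.default_rng(0).random((50, 80))  # placeholder for scipy.misc.face()
fig, ax = plt.subplots()
ax.imshow(img, extent=[-5, 3, -55, -45], aspect="auto")
x_minors = label_as_log(ax.xaxis, -5, 3)
y_minors = label_as_log(ax.yaxis, -55, -45)
```

Because only the tick machinery changes, the same helper should work for the contour/binned case as well, as long as vmin and vmax match the axis limits.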

Matplotlib: Coloring scatter plot by density relative to another data set

I'm new to Python and having some trouble with matplotlib. I currently have data contained in two numpy arrays, call them x and y, that I am plotting on a scatter plot with coordinates for each point (x, y) (i.e., I have points x[0], y[0] and x[1], y[1] and so on on my plot). I have been using the following code segment to color the points in my scatter plot based on the spatial density of nearby points (found this on another stackoverflow post):
http://prntscr.com/abqowk
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
x = np.random.normal(size=1000)
y = x*3 + np.random.normal(size=1000)
# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
# Sort the points by density, so that the densest points are plotted last
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]
fig,ax = plt.subplots()
ax.scatter(x,y,c=z,s=50,edgecolor='none')
plt.show()
I've been using it without being sure exactly how it works (in particular the point density calculation; an explanation of that would also be much appreciated).
However, now I'd like to color code by the ratio of the spatial density of points in x,y to that of the spatial density of points in another set of numpy arrays, call them x2, y2. That is, I would like to make a plot such that I can identify how the density of points in x,y compares to the points in x2,y2 on the same scatter plot. Could someone please explain how I could go about doing this?
Thanks in advance for your help!
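On the side question of how the point density is calculated: gaussian_kde(xy) places a 2D Gaussian kernel on every sample, with a covariance equal to the sample covariance scaled by a bandwidth factor (Scott's rule, n**(-1/6) in 2D, by default). Calling the result at xy averages all of those kernels at each point. The sketch below recomputes the same numbers by hand to illustrate this; it relies on the kde.covariance attribute that SciPy exposes and is meant as an explanation, not a replacement for gaussian_kde.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x * 3 + rng.normal(size=1000)
xy = np.vstack([x, y])                 # shape (2, n): one column per point

kde = gaussian_kde(xy)                 # Scott's rule bandwidth by default
z = kde(xy)                            # density evaluated at every point

# Same calculation by hand: average of Gaussian kernels centred on every sample
cov = kde.covariance                   # data covariance * kde.factor**2
inv_cov = np.linalg.inv(cov)
norm_const = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(cov)))
diff = xy.T[:, None, :] - xy.T[None, :, :]            # (n, n, 2) pairwise offsets
mahal = np.einsum("ijk,kl,ijl->ij", diff, inv_cov, diff)
z_manual = norm_const * np.exp(-0.5 * mahal).mean(axis=1)

print(np.allclose(z, z_manual))        # True
```

So a point's colour is high when many other points lie within roughly one bandwidth of it, which is why dense clusters light up.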
I've been trying to do the same thing based on that same earlier post, and I think I just figured it out! The trick is to use matplotlib.colors.Normalize() to define a scale and then weight it according to some data set (xnorm,ynorm):
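import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mplc
import matplotlib.cm as cm
from scipy.stats import gaussian_kde
def kdeplot(x,y,xnorm,ynorm):
    xy = np.vstack([x,y])
    z = gaussian_kde(xy)(xy)
    # weight the colour scale by the relative size of the two data sets
    wt = 1.0*len(x)/(len(xnorm)*1.0)
    norm = mplc.Normalize(vmin=0, vmax=8/wt)
    cmap = cm.gnuplot
    # sort so that the densest points are drawn on top
    idx = z.argsort()
    x, y, z = x[idx], y[idx], z[idx]
    args = (x,y)
    kwargs = {'c':z,'s':10,'edgecolor':'none','cmap':cmap,'norm':norm}
    return args, kwargs
# (x1,y1) is some data set whose density map coloring you
# want to scale to (xnorm,ynorm); random placeholders here
x1 = np.random.normal(size=500)
y1 = 3*x1 + np.random.normal(size=500)
xnorm = np.random.normal(size=2000)
ynorm = 3*xnorm + np.random.normal(size=2000)
args,kwargs = kdeplot(x1,y1,xnorm,ynorm)
plt.scatter(*args,**kwargs)
plt.show()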
I used trial and error to optimize my normalization for my particular data and choice of colormap. Here's what my data looks like scaled to itself; here's my data scaled to some comparison data (which is on the bottom of that image).
I'm not sure this method is entirely general, but it works in my case: I know that my data and the comparison data are in similar regions of parameter space, and they both have Gaussian scatter, so I can use a naive linear scaling determined by the number of data points, and it results in something that gives the right idea visually.
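The answer rescales the colour normalisation rather than computing the ratio the question asks for. A more direct sketch, under the assumption that a KDE is an acceptable density estimate for both sets: fit one gaussian_kde per data set, evaluate both at the (x, y) points, and colour by the log10 of the ratio with a diverging colormap centred on 0. Each KDE integrates to 1, so the ratio compares the shapes of the two distributions; multiply by len(x) / len(x2) if you want a ratio of point counts instead. The data and the RdBu_r colormap are placeholders.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; remove it and call plt.show() to view
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# Placeholder data: (x, y) is the set to colour, (x2, y2) the comparison set
x = rng.normal(size=1000)
y = 3 * x + rng.normal(size=1000)
x2 = rng.normal(0.5, 1.2, size=1500)
y2 = 3 * x2 + rng.normal(size=1500)

xy = np.vstack([x, y])
dens_own = gaussian_kde(xy)(xy)                   # density of (x, y) at its own points
dens_ref = gaussian_kde(np.vstack([x2, y2]))(xy)  # density of (x2, y2) at the same points
# Guard against underflow to 0 far away from the comparison data
dens_ref = np.maximum(dens_ref, np.finfo(float).tiny)
log_ratio = np.log10(dens_own / dens_ref)         # > 0: (x, y) denser; < 0: (x2, y2) denser

order = np.argsort(np.abs(log_ratio))             # strongest contrasts drawn last
lim = np.abs(log_ratio).max()                     # symmetric limits keep 0 at the colormap centre
fig, ax = plt.subplots()
sc = ax.scatter(x[order], y[order], c=log_ratio[order], s=10,
                cmap="RdBu_r", vmin=-lim, vmax=lim, edgecolor="none")
fig.colorbar(sc, ax=ax, label="log10(density of x,y / density of x2,y2)")
```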

Basic scatter plot with reference data on diagonal (identity line)

I have two arrays x, y obtained from a machine learning calculation, and I wish to make a scatter plot with the reference data x on the diagonal, to better visualize the predicted values y against the true values x. Can you suggest how to do it in Python or gnuplot?
import numpy as np
import matplotlib.pyplot as plt
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
plt.scatter(x, y, c=colors)
plt.plot( [0,1],[0,1] )
plt.savefig('a.png')
Check this page for more information.
A simple example:
import matplotlib.pyplot as plt
import numpy as np
x=np.linspace(0,100,101)
y=np.random.normal(x) # add some noise
plt.plot(x,y,'r.') # x vs y
plt.plot(x,x,'k-') # identity line
plt.xlim(0,100)
plt.ylim(0,100)
plt.show()
In matplotlib, you can also draw an "infinite" line in order to avoid having to define the exact coordinates. For example, if you have an axes ax, you can do:
pt = (0, 0)
ax.axline(pt, slope=1, color='black')
where pt is a point the line passes through. Note that if pt lies outside the current plot limits, the limits will be expanded to include it.
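Putting the pieces together, a self-contained sketch of a predicted-vs-true plot with ax.axline as the identity line. Setting identical limits and ax.set_aspect("equal") is an addition beyond the answers: it makes the y = x line sit on the visual diagonal. The x_true / y_pred arrays are hypothetical stand-ins for the machine-learning output.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; remove it and call plt.show() to view
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the true values x and the model predictions y
x_true = rng.uniform(0, 10, 200)
y_pred = x_true + rng.normal(scale=0.8, size=x_true.size)

fig, ax = plt.subplots()
ax.scatter(x_true, y_pred, s=10, alpha=0.7)
# Identity line y = x through (0, 0); axline spans the whole visible range
line = ax.axline((0, 0), slope=1, color="black", linewidth=1)
# Same limits on both axes plus equal aspect, so y = x is the visual diagonal
lo = min(x_true.min(), y_pred.min())
hi = max(x_true.max(), y_pred.max())
ax.set_xlim(lo, hi)
ax.set_ylim(lo, hi)
ax.set_aspect("equal")
ax.set_xlabel("true value (x)")
ax.set_ylabel("predicted value (y)")
```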

Plot staggered histograms/lines as in FACS

My question is basically exactly the same as this one, but for matplotlib. I'm sure it has something to do with axes or subplots, but I don't think I fully understand those paradigms (a fuller explanation would be great).
As I loop through a set of comparisons, I'd like the base y value of each new plot to be set slightly below the previous one to get something like this:
One other (potential) wrinkle is that I'm generating these plots in a loop, so I don't necessarily know how many plots there will be at the outset. I think this is one of the things that I'm getting hung up on with subplots/axes, because it seems like you need to set them ahead of time.
Any ideas would be greatly appreciated.
EDIT: I made a little progress I think:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x = np.random.random(100)
y = np.random.random(100)
fig = plt.figure()
ax = fig.add_axes([1,1,1,1])
ax2 = fig.add_axes([1.02,.9,1,1])
ax.plot(x, color='red')
ax.fill_between([i for i in range(len(x))], 0, x, color='red', alpha=0.5)
ax2.plot(y, color='green')
ax2.fill_between([i for i in range(len(y))], 0, y, color='green', alpha=0.5)
This gives me something close to what I want...
Is this the sort of thing you want?
What I did was define the y-distance between the baselines of each curve. For the ith curve, I calculated the minimum Y-value, then shifted the entire curve so that its minimum sits at i times the y-distance. I used a decreasing z-order to ensure that the filled part of each curve was not obscured by the baselines.
Here's the code:
import numpy as np
import matplotlib.pyplot as plt
delta_Y = .5
zorder = 0
# data is a list of 1-D arrays with 1000 samples each
for i, Y in enumerate(data):
    baseline = min(Y)
    #change needed for minimum of Y to be delta_Y above previous curve
    y_change = delta_Y * i - baseline
    Y = Y + y_change
    plt.fill_between(np.linspace(0, 1000, 1000), Y, np.ones(1000) * delta_Y * i, zorder = zorder)
    zorder -= 1
plt.show()
Code that generates dummy data:
import numpy as np
def gauss(X):
    return np.exp(-X**2 / 2.0)
#create data
X = np.linspace(-10, 10, 100)
data = []
for i in range(10):
    arr = np.zeros(1000)
    arr[i * 100: i * 100 + 100] = gauss(X)
    data.append(arr)
data.reverse()
You could also look into installing JoyPy through:
pip install joypy
Pretty dynamic tool created by Leonardo Taccari, if what you are looking into is "stacked" distribution plots like so:
Example 1 - Joy Plot using JoyPy:
Example 2 - Joy Plot on Iris dataset:
Leonardo also has a neat description of the package and how to use it here.
Alternatively, Seaborn can produce similar plots, but I found it less easy to use.
Hope that helps!
So I managed to get a little bit farther by adding an additional Axes instance in each loop.
import numpy as np
import matplotlib.pyplot as plt
#instantiate data sets
x = np.random.random(100)
y = np.random.random(100)
z = np.random.random(100)
plots = [x, y, z]
fig = plt.figure()
#Sets the default vertical position
pos = 1
def making_plot(ax, p):
    ax.plot(p)
    # Prevents the background from covering over the earlier plots
    ax.set_facecolor('none')
for p in plots:
    ax = fig.add_axes([1,pos,1,1])
    pos -= 0.3
    making_plot(ax, p)
plt.show()
Clearly, I could spend more time making this prettier, but this does the job.
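A variation that avoids creating one Axes per curve, sketched here under the assumption that a shared x-axis is acceptable: draw every curve on a single Axes and shift each new one down by a fixed step. Since nothing is laid out in advance, the loop works with a generator whose length is unknown, which was the concern in the question. The step of 0.3 mirrors the pos -= 0.3 above; the random data and the "set i" tick labels are placeholders.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; remove it and call plt.show() to view
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
curves = (rng.random(100) for _ in range(3))  # generator: count not known up front

fig, ax = plt.subplots()
step = 0.3
baselines = []
for i, s in enumerate(curves):
    base = -i * step                          # each new curve starts lower
    xs = np.arange(len(s))
    # Lower curves get a higher zorder, so they overlap the ones above them
    ax.fill_between(xs, base, base + s, alpha=0.6, zorder=2 * i)
    ax.plot(xs, base + s, color="k", linewidth=0.5, zorder=2 * i + 1)
    baselines.append(base)

# One y tick per baseline, labelled by curve index
ax.set_yticks(baselines)
ax.set_yticklabels([f"set {i}" for i in range(len(baselines))])
```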
