Basic scatter plot with reference data on diagonal (identity line) - python

I have two arrays x, y obtained from a machine learning calculation, and I wish to make a scatter plot with the reference data x on the diagonal, so as to better visualize the predicted values y against the true ones x. Can you please suggest how to do this in Python or gnuplot?

import numpy as np
import matplotlib.pyplot as plt
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
plt.scatter(x, y, c=colors)
plt.plot( [0,1],[0,1] )
plt.savefig('a.png')
This will produce:
Check this page for more information.

A simple example:
import matplotlib.pyplot as plt
import numpy as np
x=np.linspace(0,100,101)
y=np.random.normal(x) # add some noise
plt.plot(x,y,'r.') # x vs y
plt.plot(x,x,'k-') # identity line
plt.xlim(0,100)
plt.ylim(0,100)
plt.show()

In matplotlib, you can also draw an "infinite" line in order to avoid having to define the exact coordinates. For example, if you have an axes ax, you can do:
pt = (0, 0)
ax.axline(pt, slope=1, color='black')
where pt is an intersection point. Note if pt isn't included in the limits of the plot, the limits will be modified to include it.
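As a rough sketch of how axline fits the predicted-versus-true plot from the question: the arrays x_true and y_pred below are made-up stand-ins for the reference and predicted values, and ax.axline needs Matplotlib 3.3 or newer.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data: "true" values and noisy "predictions" (assumption for illustration)
rng = np.random.default_rng(0)
x_true = rng.uniform(0, 10, 50)
y_pred = x_true + rng.normal(scale=0.5, size=50)

fig, ax = plt.subplots()
ax.scatter(x_true, y_pred, s=15)

# Identity line through (0, 0) with slope 1; it extends across the whole axes
identity = ax.axline((0, 0), slope=1, color='black', linestyle='--')

ax.set_aspect('equal')  # same scale on both axes, so the diagonal sits at 45 degrees
ax.set_xlabel('true value')
ax.set_ylabel('predicted value')
# Call plt.show() or fig.savefig(...) to view the result
```

Points above the line are over-predicted and points below it are under-predicted.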

Related

How do I alternate the color of a scatter plot in matplotlib?

I have an array of data points of dimension n by 2, where the second dimension $2$ corresponds to the real part and imaginary part of a complex number.
Now I know the data points will intersect the unit circle on the plane a couple of times. What I want to implement is: suppose the path starts with some color, it changes to another color when it touches the unit circle on the plane and changes color again if it intersects the unit circle again. I am not sure whether there is an easy way to implement this.
You may want to try
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
X = [1,2,3,4]
Y1 = [4,8,12,16]
Y2 = [1,4,9,16]
plt.scatter(X,Y1,color='red')
plt.scatter(X,Y2,color='blue')
plt.show()
or try
x = np.arange(10)
ys = [i+x+(i*x)**2 for i in range(10)]
colors = cm.rainbow(np.linspace(0, 1, len(ys)))
for y, c in zip(ys, colors):
    plt.scatter(x, y, color=c)
plt.show()
You may also want to check out this thread as well:
Setting different color for each series in scatter plot on matplotlib
The simplest way to do this is probably to implement the logic outside the plot, by assigning a different group to each point defined by your circle-crossing concept.
Once you have these group-indexes it's a simple plt.scatter, using the c (stands for "color") input.
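A minimal sketch of that grouping idea, assuming the points are ordered along the path and that "crossing" means consecutive points lie on opposite sides of the unit circle. The helper name crossing_groups and the test path are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def crossing_groups(points):
    """Return one integer label per point. The label goes up by one each
    time the path moves from inside the unit circle to outside, or back."""
    points = np.asarray(points, dtype=float)
    radius = np.hypot(points[:, 0], points[:, 1])
    inside = radius < 1.0
    crossed = inside[1:] != inside[:-1]          # True where the side changes
    return np.concatenate(([0], np.cumsum(crossed)))

# Made-up path (real part, imaginary part) that wobbles around the unit circle
t = np.linspace(0, 4 * np.pi, 200)
r = 1 + 0.5 * np.sin(1.5 * t)
path = np.column_stack((r * np.cos(t), r * np.sin(t)))

groups = crossing_groups(path)

fig, ax = plt.subplots()
ax.scatter(path[:, 0], path[:, 1], c=groups, cmap='tab10', s=12)
ax.add_patch(plt.Circle((0, 0), 1.0, fill=False, color='gray'))  # unit circle for reference
ax.set_aspect('equal')
# Call plt.show() or fig.savefig(...) to view the result
```

If only two alternating colors are wanted, passing c=groups % 2 instead should do it.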
Good luck!
Try something along the lines of
# Import matplotlib module as plt
import matplotlib.pyplot as plt
import math
x = [3,7,1,9,5,3,5,8,math.sqrt(3)/2]
y = [4,7,8,2,3,4,5,1,1/2]
# Plot scatter Plot
for i in range(len(x)):
    if (round(x[i]**2+y[i]**2,2)) == 1: # equation of unit circle is x^2+y^2=1
        plt.scatter(x[i],y[i], color ='g',marker ='.')
    else:
        plt.scatter(x[i],y[i], color ='r',marker ='*')
plt.xlabel('x')
plt.ylabel('y')
plt.xlim([0,10])
plt.ylim([0,10])
plt.title('Scatter Plot')
plt.legend()
plt.show()

How can I change the parameters of gaussian_kde for a scatter plot colored by density in matplotlib

As explained by Joe Kington answering in this question : How can I make a scatter plot colored by density in matplotlib, I made a scatter plot colored by density. However, due to the complex distribution of my data, I would like to change the parameters used to calculate the density.
Here are the results with some fake data similar to mine:
I would like to calibrate the density calculation of gaussian_kde so that the left part of the plot looks like this:
I don't like the first plot because the groups of points influence the density of adjacent groups of points and that prevents me from analyzing the distribution within a group. In other words, even if each of the 8 groups have exactly the same distribution, that won't be visible on the graph.
I tried to modify the covariance_factor (like I once did for a 2d plot of density over x), but when gaussian_kde is used with multiple dimension arrays it returns a numpy.ndarray, not a "scipy.stats.kde.gaussian_kde" object. Plus, I don't even know if changing the covariance_factor will do it.
Here's my dummy code :
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate fake data
a = np.random.normal(size=1000)
b = np.random.normal(size=1000)
# Data for the first image
x = np.concatenate((a+10,a+10,a+20,a+20,a+30,a+30,a+40,a+40,a+80))
y = np.concatenate((b+10,b-10,b+10,b-10,b+10,b-10,b+10,b-10,b*4))
# Data for the second image
#x = np.concatenate((a+10,a+10,a+20,a+20,a+30,a+30,a+40,a+40))
#y = np.concatenate((b+10,b-10,b+10,b-10,b+10,b-10,b+10,b-10))
# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
# My unsuccessful attempt to modify the covariance, which would work in 1D with "z = gaussian_kde(x)"
#z.covariance_factor = lambda : 0.01
#z._compute_covariance()
# Sort the points by density, so that the densest points are plotted last
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=50, edgecolor='none')
plt.show()
The solution could use another density calculator, I don't mind.
The goal is to make a density plot like the ones shown above, where I can play with the density parameters.
I'm using python 3.4.3
Did you have a look at Seaborn? It's not exactly what you're asking for, but it already has functions for generating density plots:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Generate fake data
a = np.random.normal(size=1000)
b = np.random.normal(size=1000)
# Data for the first image
x = np.concatenate((a+10, a+10, a+20, a+20, a+30, a+30, a+40, a+40, a+80))
y = np.concatenate((b+10, b-10, b+10, b-10, b+10, b-10, b+10, b-10, b*4))
# Recent seaborn releases take x and y as keyword arguments and no longer accept stat_func
sns.jointplot(x=x, y=y, kind="hex")
sns.jointplot(x=x, y=y, kind="kde")
plt.show()
It gives a hexbin joint plot and a KDE joint plot.
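On the original question of tuning gaussian_kde itself: its constructor takes a bw_method argument, and passing a scalar sets the bandwidth factor directly. A minimal sketch, assuming fake data shaped like the question's and an arbitrary factor of 0.05 that would need tuning for real data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Fake data in the spirit of the question (smaller, for speed)
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = rng.normal(size=300)
x = np.concatenate((a + 10, a + 10, a + 20, a + 20, a + 80))
y = np.concatenate((b + 10, b - 10, b + 10, b - 10, b * 4))
xy = np.vstack([x, y])

kde_default = gaussian_kde(xy)                 # Scott's rule by default
kde_narrow = gaussian_kde(xy, bw_method=0.05)  # smaller factor = more local density (value is a guess)

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
titles = ("default bandwidth", "bw_method=0.05")
for ax, kde, title in zip(axes, (kde_default, kde_narrow), titles):
    z = kde(xy)                                # density evaluated at every point
    idx = z.argsort()                          # plot the densest points last
    ax.scatter(x[idx], y[idx], c=z[idx], s=10, edgecolor='none')
    ax.set_title(title)
# Call plt.show() or fig.savefig(...) to view the result
```

With a narrower kernel each group's coloring depends less on its neighbors, which is the effect the question is after. Each panel still uses its own color scale.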

Matplotlib: Coloring scatter plot by density relative to another data set

I'm new to Python and having some trouble with matplotlib. I currently have data that is contained in two numpy arrays, call them x and y, that I am plotting on a scatter plot with coordinates for each point (x, y) (i.e. I have points x[0], y[0] and x[1], y[1] and so on on my plot). I have been using the following code segment to color the points in my scatter plot based on the spatial density of nearby points (found this on another Stack Overflow post):
http://prntscr.com/abqowk
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
x = np.random.normal(size=1000)
y = x*3 + np.random.normal(size=1000)
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]
fig,ax = plt.subplots()
ax.scatter(x,y,c=z,s=50,edgecolor='none')
plt.show()
Output:
I've been using it without being sure exactly how it works (namely the point density calculation - if someone could explain how exactly that works, would also be much appreciated).
However, now I'd like to color code by the ratio of the spatial density of points in x,y to that of the spatial density of points in another set of numpy arrays, call them x2, y2. That is, I would like to make a plot such that I can identify how the density of points in x,y compares to the points in x2,y2 on the same scatter plot. Could someone please explain how I could go about doing this?
Thanks in advance for your help!
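On how the density calculation works: gaussian_kde places a Gaussian kernel on every data point and averages them, with a kernel covariance equal to the data covariance scaled by a bandwidth factor (Scott's rule by default). The sketch below redoes that sum by hand with NumPy on made-up data and compares it with SciPy's result. It is meant as an illustration of the idea, not as SciPy's actual implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Made-up data, same shape as in the question
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x * 3 + rng.normal(size=200)
xy = np.vstack([x, y])                  # shape (2, n): one column per point

kde = gaussian_kde(xy)
z_scipy = kde(xy)                       # density estimate at every data point

# The same quantity by hand
d, n = xy.shape
cov = kde.covariance                    # data covariance scaled by kde.factor**2
inv_cov = np.linalg.inv(cov)
diff = xy[:, :, None] - xy[:, None, :]  # diff[:, i, j] = point i - point j
# Squared Mahalanobis distance between every pair of points
maha = np.einsum('imn,ik,kmn->mn', diff, inv_cov, diff)
norm_const = np.sqrt(np.linalg.det(2 * np.pi * cov))
# Average of the n Gaussian kernels, evaluated at each point
z_manual = np.exp(-0.5 * maha).sum(axis=1) / (n * norm_const)

print(np.allclose(z_scipy, z_manual))
```

So z is large where many points sit within roughly one kernel width of each other, and that value is what drives the color.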
I've been trying to do the same thing based on that same earlier post, and I think I just figured it out! The trick is to use matplotlib.colors.Normalize() to define a scale and then weight it according to some data set (xnorm,ynorm):
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mplc
import matplotlib.cm as cm
from scipy.stats import gaussian_kde
def kdeplot(x,y,xnorm,ynorm):
    xy = np.vstack([x,y])
    z = gaussian_kde(xy)(xy)
    wt = 1.0*len(x)/(len(xnorm)*1.0)
    norm = mplc.Normalize(vmin=0, vmax=8/wt)
    cmap = cm.gnuplot
    idx = z.argsort()
    x, y, z = x[idx], y[idx], z[idx]
    args = (x,y)
    kwargs = {'c':z,'s':10,'edgecolor':'none','cmap':cmap,'norm':norm}
    return args, kwargs
# (x1,y1) is some data set whose density map coloring you
# want to scale to (xnorm,ynorm)
args,kwargs = kdeplot(x1,y1,xnorm,ynorm)
plt.scatter(*args,**kwargs)
I used trial and error to optimize my normalization for my particular data and choice of colormap. Here's what my data looks like scaled to itself; here's my data scaled to some comparison data (which is on the bottom of that image).
I'm not sure this method is entirely general, but it works in my case: I know that my data and the comparison data are in similar regions of parameter space, and they both have gaussian scatter, so I can use a naive linear scaling determined by the number of data points and it results in something that gives the right idea visually.
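A more direct reading of the question is to evaluate two separate KDEs at the plotted points and color by their ratio. This is a sketch of that alternative, not the normalization approach above. The arrays are made up, the helper name density_ratio is hypothetical, and the log scale is an assumption to keep large ratios readable.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

def density_ratio(x, y, x2, y2):
    """Density of (x, y) divided by the density of (x2, y2),
    both evaluated at the (x, y) points."""
    xy = np.vstack([x, y])
    own = gaussian_kde(xy)(xy)
    ref = gaussian_kde(np.vstack([x2, y2]))(xy)
    return own / ref

# Made-up data: the comparison set is broader than the plotted set
rng = np.random.default_rng(1)
x = rng.normal(size=400)
y = x * 3 + rng.normal(size=400)
x2 = rng.normal(scale=1.5, size=600)
y2 = x2 * 3 + rng.normal(scale=1.5, size=600)

ratio = density_ratio(x, y, x2, y2)

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=np.log10(ratio), s=15, cmap='coolwarm', edgecolor='none')
fig.colorbar(sc, ax=ax, label='log10(density ratio)')
# Call plt.show() or fig.savefig(...) to view the result
```

The ratio blows up wherever the comparison set has almost no points, so this only makes sense when both sets cover similar regions.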

Waterfall plot python?

Is there a python module that will do a waterfall plot like MATLAB does? I googled 'numpy waterfall', 'scipy waterfall', and 'matplotlib waterfall', but did not find anything.
You can do a waterfall in matplotlib using the PolyCollection class. See this specific example to have more details on how to do a waterfall using this class.
Also, you might find this blog post useful, since the author shows that you might obtain some 'visual bug' in some specific situation (depending on the view angle chosen).
Below is an example of a waterfall made with matplotlib (image from the blog post):
(source: austringer.net)
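A rough sketch of the PolyCollection approach, with made-up Gaussian curves standing in for real data. The variable names and axis limits are assumptions chosen for this example.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PolyCollection

x = np.linspace(0, 10, 200)
offsets = [1, 2, 3, 4]                  # position of each curve along the y axis

# Build one closed polygon per curve: the curve itself plus two corner
# points on the baseline, so the area under it can be filled.
verts = []
for k in offsets:
    z = np.exp(-(x - 5) ** 2 / k)       # made-up curve
    verts.append([(x[0], 0.0), *zip(x, z), (x[-1], 0.0)])

colors = plt.cm.viridis(np.linspace(0, 1, len(offsets)))
poly = PolyCollection(verts, facecolors=colors, edgecolors='k', alpha=0.7)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# Each 2D polygon lives in the x-z plane and is placed at its own y offset
ax.add_collection3d(poly, zs=offsets, zdir='y')
ax.set(xlim=(0, 10), ylim=(0, 5), zlim=(0, 1), xlabel='x', ylabel='y', zlabel='z')
# Call plt.show() or fig.savefig(...) to view the result
```

The axis limits are set by hand here so the result does not depend on autoscaling picking up the collection.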
Have a look at mplot3d:
# copied from
# http://matplotlib.sourceforge.net/mpl_examples/mplot3d/wire3d_demo.py
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
X, Y, Z = axes3d.get_test_data(0.05)
ax.plot_wireframe(X, Y, Z, rstride=10, cstride=10)
plt.show()
I don't know how to get results as nice as Matlab does.
If you want more, you may also have a look at MayaVi: http://mayavi.sourceforge.net/
The Wikipedia-style waterfall chart can also be obtained like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def waterfall(series):
    df = pd.DataFrame({'pos':np.maximum(series,0),'neg':np.minimum(series,0)})
    blank = series.cumsum().shift(1).fillna(0)
    df.plot(kind='bar', stacked=True, bottom=blank, color=['r','b'])
    step = blank.reset_index(drop=True).repeat(3).shift(-1)
    step[1::3] = np.nan
    plt.plot(step.index, step.values,'k')
test = pd.Series(-1 + 2 * np.random.rand(10), index=list('abcdefghij'))
waterfall(test)
I have generated a function that replicates the matlab waterfall behaviour in matplotlib. That is:
It generates the 3D shape as many independent and parallel 2D curves
Its color comes from a colormap in the z values
I started from two examples in the matplotlib documentation: multicolor lines and multiple lines in a 3D plot. From these examples, the only way I found to draw a line whose color follows a colormap according to its z value is to reshape the input array into segments of 2 points and set the color of each segment to the mean z value of its 2 points.
Thus, given the n-by-m input matrices X, Y and Z, the function loops over the smaller of the two dimensions n, m and plots each independent line of the waterfall as a line collection of 2-point segments, as explained above.
def waterfall_plot(fig,ax,X,Y,Z,**kwargs):
    '''
    Make a waterfall plot
    Input:
        fig,ax : matplotlib figure and axes to populate
        Z : n,m numpy array. Must be a 2d array even if only one line should be plotted
        X,Y : n,m array
        kwargs : kwargs are directly passed to the LineCollection object
    '''
    # Set normalization to the same values for all plots
    norm = plt.Normalize(Z.min().min(), Z.max().max())
    # Check sizes to loop always over the smallest dimension
    n,m = Z.shape
    if n>m:
        X=X.T; Y=Y.T; Z=Z.T
        m,n = n,m
    for j in range(n):
        # reshape the X,Z into pairs
        points = np.array([X[j,:], Z[j,:]]).T.reshape(-1, 1, 2)
        segments = np.concatenate([points[:-1], points[1:]], axis=1)
        # The values used by the colormap are the input to the array parameter
        lc = LineCollection(segments, cmap='plasma', norm=norm, array=(Z[j,1:]+Z[j,:-1])/2, **kwargs)
        line = ax.add_collection3d(lc,zs=(Y[j,1:]+Y[j,:-1])/2, zdir='y') # add line to axes
    fig.colorbar(lc) # add colorbar, as the normalization is the same for all
    # it doesn't matter which of the lc objects we use
    ax.auto_scale_xyz(X,Y,Z) # set axis limits
Therefore, plots that look like a MATLAB waterfall can easily be generated with the same input matrices as a matplotlib surface plot:
import numpy as np; import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from mpl_toolkits.mplot3d import Axes3D
# Generate data
x = np.linspace(-2,2, 500)
y = np.linspace(-2,2, 60)
X,Y = np.meshgrid(x,y)
Z = np.sin(X**2+Y**2)-.2*X
# Generate waterfall plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
waterfall_plot(fig,ax,X,Y,Z,linewidth=1.5,alpha=0.5)
ax.set_xlabel('X'); ax.set_ylabel('Y'); ax.set_zlabel('Z')
fig.tight_layout()
The function assumes that, when generating the meshgrid, the x array is the longest, so by default the lines have fixed y and it is the x coordinate that varies. However, if the y array is longer, the matrices are transposed, generating the lines with fixed x. Thus, generating the meshgrid with the sizes inverted (len(x)=60 and len(y)=500) yields:
To see what the **kwargs argument can do, refer to the LineCollection class documentation and its set_ methods.

how do you radially 'sweep out' a 1D array to plot 3d figure in python? (to represent a wavefunction)

Effectively I have a large 1D array of heights. As a small example consider:
u=array([0,1,2,1,0,2,4,6,4,2,1])
and a 1D array, the same size as u, of radial values which the heights correspond to, e.g.:
r=array([0,1,2,3,4,5,6,7,8,9,10])
Obviously plotting these with:
pylab.plot(r,u)
gives a nice 2D plot.
How can one sweep this out around 360 degrees, to give a 3D contour/surface plot?
If you can imagine it should look like a series of concentric, circular ridges, like for the wavefunction of an atom.
Any help would be much appreciated!
You're better off with something more 3D oriented than matplotlib, in this case...
Here's a quick example using mayavi:
from mayavi import mlab
import numpy as np
# Generate some random data along a straight line in the x-direction
num = 100
x = np.arange(num)
y, z = np.ones(num), np.ones(num)
s = np.cumsum(np.random.random(num) - 0.5)
# Plot using mayavi's mlab api
fig = mlab.figure()
# First we need to make a line source from our data
line = mlab.pipeline.line_source(x,y,z,s)
# Then we apply the "tube" filter to it, and vary the radius by "s"
tube = mlab.pipeline.tube(line, tube_sides=20, tube_radius=1.0)
tube.filter.vary_radius = 'vary_radius_by_scalar'
# Now we display the tube as a surface
mlab.pipeline.surface(tube)
# And finally visualize the result
mlab.show()
#!/usr/bin/python
from mpl_toolkits.mplot3d import Axes3D
import matplotlib
import numpy as np
from scipy.interpolate import interp1d
from matplotlib import cm
from matplotlib import pyplot as plt
step = 0.04
maxval = 1.0
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
u=np.array([0,1,2,1,0,2,4,6,4,2,1])
r=np.array([0,1,2,3,4,5,6,7,8,9,10])
f=interp1d(r,u)
# walk along the circle
p = np.linspace(0,2*np.pi,50)
R,P = np.meshgrid(r,p)
# transform them to cartesian system
X,Y = R*np.cos(P),R*np.sin(P)
Z=f(R)
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet)
ax.set_xticks([])
plt.show()
