I apologize in advance for my broken English.
I have an array of data for a moving mass and want to show how its position changes over time.
For example:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([0, 1, 2, 3])
y = np.array([3, 4, 5, 6])
plt.plot(x,y)
In this code, I want each point (row) to be drawn in a different color.
For example,
point (0,3) is a white dot,
point (3,6) is a black dot,
and (1,4) and (2,5) are gray dots of different brightness.
I have just started with Python. I searched the pyplot library but did not find any examples.
I also tried the Seaborn library and the pyplot 3D examples, but did not find a way to express what I want to do.
If your aim is to identify the order of your points, you can use a colormap, specifying that the vector that determines the coloring is simply the sequence of the indices of the points.
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(20201222)
x = np.random.randint(5, 20, 20)
y = np.random.randint(5, 20, 20)
r = range(len(x))
plt.scatter(x, y, s=80, c=r, cmap='plasma')
plt.grid()
cb = plt.colorbar()
cb.set_ticks(r)
cb.set_ticklabels(("%d"%(order+1) for order in r))
And finally, the long-awaited, over-engineered implementation:
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
np.random.seed(20201222)
x = np.random.rand(20)*5+np.arange(20)/3
y = np.random.rand(20)*5+np.arange(20)/3
tck, u = interpolate.splprep([x,y] ,s=0.05)
x1, y1 = interpolate.splev(np.linspace(0, 1, 3333), tck)
r = range(len(x))
plt.plot(x1, y1, color='k', lw=0.4, alpha=0.4)
plt.scatter(x, y, s=60, c=r, cmap='Greys', ec='grey', zorder=4)
plt.xlim((0,12)), plt.ylim((0,12))
plt.grid(1)
plt.gca().set_aspect(1)
cb = plt.colorbar()
cb.set_ticks((0, len(x)-1))
cb.set_ticklabels(('First', 'Last'))
Pass a color argument into a scatter function that displays given points and any defined features.
# array declaration
# ...
for x1, y1 in zip(x, y):
    if x1 == 0:
        color = "white"
    elif x1 == 1 or x1 == 2:
        color = "gray"
    elif x1 == 3:
        color = "black"
    plt.scatter(x1, y1, c=color)
# plot linear line from arrays
We use the built-in zip to iterate over both arrays at once, which lets us plot each point individually. The x-coordinate of each point determines the color of its dot. The scatter function draws the point on the graph and accepts options that change the appearance of the given dot(s).
--
The final code would look something like this:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([0, 1, 2, 3])
y = np.array([3, 4, 5, 6])
for x1, y1 in zip(x, y):
    if x1 == 0:
        color = "white"
    elif x1 == 1 or x1 == 2:
        color = "gray"
    elif x1 == 3:
        color = "black"
    plt.scatter(x1, y1, c=color)
plt.plot(x, y)
plt.show()
Documentation on matplotlib's scatter function can be found here
poiboi was on the right track. Here's an example which automatically sets a linear gradient for the colours of the dots.
import matplotlib.pyplot as plt
import numpy as np
x = [0, 1, 2, 3]
y = [3, 4, 5, 6]
plt.scatter(x, y, c=x[:: -1], cmap='gray', vmin=min(x), vmax=max(x))
plt.show()
The c keyword argument tells Matplotlib which colour to use for which point using a grey colourmap. By default, said colourmap goes from black to white, so we pass x reversed. vmin and vmax are the least and greatest values to be assigned colours. Note that the first point is white (hence invisible).
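Because the white point disappears against a white background, one possible workaround is to give every marker a dark outline. The following is only a sketch: `edgecolors` is a standard `scatter` argument, but the black outline and the marker size `s=80` are my own assumptions, not part of the answer above.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [3, 4, 5, 6]

fig, ax = plt.subplots()
# Same grey gradient as above; the outline keeps the white dot visible.
points = ax.scatter(x, y, c=x[::-1], cmap="gray",
                    vmin=min(x), vmax=max(x),
                    edgecolors="black", s=80)
# Call plt.show() (or fig.savefig(...)) to display or save the figure.
```

The fill colours are unchanged; only the outline is added, so the ordering from white to black still reads the same way.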
Say I have a simple pyplot:
import matplotlib.pyplot as plt
plt.plot([-1, -4.5, 3.14, 1])
plt.show()
Which generates the following:
How do I show all Integer points in the graph, so it looks like:
You can use plt.xlim/plt.ylim to get the limits and numpy.meshgrid to generate the points, then plt.scatter to plot them:
import matplotlib.pyplot as plt
import numpy as np
plt.plot([-1, -4.5, 3.14, 1])
x0, x1 = plt.xlim()
y0, y1 = plt.ylim()
X, Y = np.meshgrid(np.arange(round(x0), round(x1)+1),
                   np.arange(round(y0), round(y1)+1))
plt.scatter(X, Y)
output:
Here's a very primitive method.
import matplotlib.pyplot as plt
import numpy as np
plt.plot([-1, -4.5, 3.14, 1])
# Get integer points of x and y within the axes
xlim = np.round(plt.xlim(), 0)
list_x = np.arange(xlim[0], xlim[1]+1)
ylim = np.round(plt.ylim(), 0)
list_y = np.arange(ylim[0], ylim[1]+1)
# Get mesh grids for the points
mesh_x, mesh_y = np.meshgrid(list_x, list_y)
# Make grids to vectors
list_x = mesh_x.flatten()
list_y = mesh_y.flatten()
# Plot points
plt.plot(list_x, list_y, ls="none", marker=".")
plt.show()
import math
import matplotlib.pyplot as plt
l = [-1, -4.5, 3.14, 1]
plt.plot(l)
integers = [[x, y] for x in range(len(l)) for y in range(math.floor(min(l)), math.ceil(max(l)))]
x_int_points, y_int_points = list(zip(*integers))
plt.scatter(x=x_int_points, y=y_int_points)
plt.show()
Generalized:
def plot_with_grid(l):
    integers = [[x, y] for x in range(len(l))
                for y in range(math.floor(min(l)),
                               math.ceil(max(l)))]
    x_int_points, y_int_points = list(zip(*integers))
    plt.plot(l)
    plt.scatter(x=x_int_points, y=y_int_points)
    plt.show()
I have a sparse scatter plot to visualize the comparison of predicted vs. actual values. The values range from 1 to 4 and are all integers.
I have tried plotly so far with the following code (but I can also use a matplotlib solution):
my_scatter = go.Scatter(
x = y_actual, y = y_pred, mode = 'markers',
marker = dict(color = 'rgb(240, 189, 89)', opacity=0.5)
)
This prints the graph nicely (see below). I use opacity to see the density at each point, i.e. if two points lie on top of each other, the point is shown in a darker color. However, this is not explanatory enough. Is it possible to add the count at each point as a label? There are overlaps at certain intersections, and I want to display how many points coincide at each one. Can this be done automatically using matplotlib or plotly?
This answer uses matplotlib.
To answer the initial question first: to annotate the points, you need to find out how often the data produces a point at a given coordinate. If all values are integers, this can easily be done using a 2D histogram. From the histogram, one would then select only those bins where the count is nonzero and annotate the respective values in a loop:
x = [3, 0, 1, 2, 2, 0, 1, 3, 3, 3, 4, 1, 4, 3, 0]
y = [1, 0, 4, 3, 2, 1, 4, 0, 3, 0, 4, 2, 3, 3, 1]
import matplotlib.pyplot as plt
import numpy as np
x = np.array(x)
y = np.array(y)
hist, xbins,ybins = np.histogram2d(y,x, bins=range(6))
X,Y = np.meshgrid(xbins[:-1], ybins[:-1])
X = X[hist != 0]; Y = Y[hist != 0]
Z = hist[hist != 0]
fig, ax = plt.subplots()
ax.scatter(x,y, s=49, alpha=0.4)
for i in range(len(Z)):
    ax.annotate(str(int(Z[i])), xy=(X[i],Y[i]), xytext=(4,0),
                textcoords="offset points" )
plt.show()
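As an alternative to the 2D histogram, and not part of the original answer, the counts can also be gathered with `collections.Counter`, which does not require the values to be integers. This is a rough sketch; the helper name `count_points` is hypothetical.

```python
from collections import Counter

import matplotlib.pyplot as plt

x = [3, 0, 1, 2, 2, 0, 1, 3, 3, 3, 4, 1, 4, 3, 0]
y = [1, 0, 4, 3, 2, 1, 4, 0, 3, 0, 4, 2, 3, 3, 1]


def count_points(xs, ys):
    """Return a Counter mapping each (x, y) pair to its number of occurrences."""
    return Counter(zip(xs, ys))


counts = count_points(x, y)

fig, ax = plt.subplots()
ax.scatter(x, y, s=49, alpha=0.4)
# Annotate every distinct coordinate with how many points fall on it.
for (px, py), n in counts.items():
    ax.annotate(str(n), xy=(px, py), xytext=(4, 0), textcoords="offset points")
# Call plt.show() to display the figure.
```

Since the counter only stores coordinates that actually occur, no filtering of empty bins is needed.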
You may then decide not to plot all points but the result from the histogramming which offers the chance to change the color and size of the scatter points,
ax.scatter(X,Y, s=(Z*20)**1.4, c = Z/Z.max(), cmap="winter_r", alpha=0.4)
Since all values are integers, you may also opt for an image plot,
fig, ax = plt.subplots()
ax.imshow(hist, cmap="PuRd")
for i in range(len(Z)):
    ax.annotate(str(int(Z[i])), xy=(X[i],Y[i]), xytext=(0,0), color="w",
                ha="center", va="center", textcoords="offset points" )
Another option, which avoids the need to calculate the number of occurrences, is a hexbin plot. It gives slightly inaccurate positions for the dots, due to the hexagonal binning, but I still wanted to mention this option.
import matplotlib.pyplot as plt
import matplotlib.colors
import numpy as np
x = np.array(x)
y = np.array(y)
fig, ax = plt.subplots()
cmap = plt.cm.PuRd
cmaplist = [cmap(i) for i in range(cmap.N)]
cmaplist[0] = (1.0,1.0,1.0,1.0)
cmap = matplotlib.colors.LinearSegmentedColormap.from_list('mcm',cmaplist, cmap.N)
ax.hexbin(x,y, gridsize=20, cmap=cmap, linewidth=0 )
plt.show()
I'd like to make a scatter plot where each point is colored by the spatial density of nearby points.
I've come across a very similar question, which shows an example of this using R:
R Scatter Plot: symbol color represents number of overlapping points
What's the best way to accomplish something similar in python using matplotlib?
In addition to hist2d or hexbin as @askewchan suggested, you can use the same method that the accepted answer in the question you linked to uses.
If you want to do that:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate fake data
x = np.random.normal(size=1000)
y = x * 3 + np.random.normal(size=1000)
# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=100)
plt.show()
If you'd like the points to be plotted in order of density so that the densest points are always on top (similar to the linked example), just sort them by the z-values. I'm also going to use a smaller marker size here as it looks a bit better:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate fake data
x = np.random.normal(size=1000)
y = x * 3 + np.random.normal(size=1000)
# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
# Sort the points by density, so that the densest points are plotted last
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=50)
plt.show()
Plotting >100k data points?
The accepted answer, using gaussian_kde(), will take a lot of time. On my machine, 100k rows took about 11 minutes. Here I will add two alternative methods (mpl-scatter-density and datashader) and compare the given answers on the same dataset.
In the following, I used a test data set of 100k rows:
import matplotlib.pyplot as plt
import numpy as np
# Fake data for testing
x = np.random.normal(size=100000)
y = x * 3 + np.random.normal(size=100000)
Output & computation time comparison
Below is a comparison of different methods.
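The answer does not show how the timings were taken. A minimal sketch of one way to measure them, assuming `time.perf_counter` and a hypothetical helper `timed`, applied here to plain `np.histogram2d` rather than to any of the plotting functions below:

```python
import time

import numpy as np


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Same kind of fake data as above
x = np.random.normal(size=100000)
y = x * 3 + np.random.normal(size=100000)

(hist, x_edges, y_edges), elapsed = timed(np.histogram2d, x, y, bins=(50, 50))
print(f"np.histogram2d took {elapsed:.3f} s")
```

This measures only the computation, not the rendering, so the numbers are not directly comparable with the drawing times reported here.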
1: mpl-scatter-density
Installation
pip install mpl-scatter-density
Example code
import mpl_scatter_density # adds projection='scatter_density'
from matplotlib.colors import LinearSegmentedColormap
# "Viridis-like" colormap with white background
white_viridis = LinearSegmentedColormap.from_list('white_viridis', [
    (0, '#ffffff'),
    (1e-20, '#440053'),
    (0.2, '#404388'),
    (0.4, '#2a788e'),
    (0.6, '#21a784'),
    (0.8, '#78d151'),
    (1, '#fde624'),
], N=256)
def using_mpl_scatter_density(fig, x, y):
    ax = fig.add_subplot(1, 1, 1, projection='scatter_density')
    density = ax.scatter_density(x, y, cmap=white_viridis)
    fig.colorbar(density, label='Number of points per pixel')
fig = plt.figure()
using_mpl_scatter_density(fig, x, y)
plt.show()
Drawing this took 0.05 seconds:
And the zoom-in looks quite nice:
2: datashader
Datashader is an interesting project. It has added support for matplotlib in datashader 0.12.
Installation
pip install datashader
Code (source & parameter listing for dsshow):
import datashader as ds
from datashader.mpl_ext import dsshow
import pandas as pd
def using_datashader(ax, x, y):
    df = pd.DataFrame(dict(x=x, y=y))
    dsartist = dsshow(
        df,
        ds.Point("x", "y"),
        ds.count(),
        vmin=0,
        vmax=35,
        norm="linear",
        aspect="auto",
        ax=ax,
    )
    plt.colorbar(dsartist)
fig, ax = plt.subplots()
using_datashader(ax, x, y)
plt.show()
It took 0.83 s to draw this:
It is also possible to color by a third variable. The third parameter of dsshow controls the coloring. See more examples here and the source for dsshow here.
3: scatter_with_gaussian_kde
from scipy.stats import gaussian_kde
def scatter_with_gaussian_kde(ax, x, y):
    # https://stackoverflow.com/a/20107592/3015186
    # Answer by Joel Kington
    xy = np.vstack([x, y])
    z = gaussian_kde(xy)(xy)
    # edgecolor='none' draws the markers without outlines
    ax.scatter(x, y, c=z, s=100, edgecolor='none')
It took 11 minutes to draw this:
4: using_hist2d
import matplotlib.pyplot as plt
def using_hist2d(ax, x, y, bins=(50, 50)):
    # https://stackoverflow.com/a/20105673/3015186
    # Answer by askewchan
    ax.hist2d(x, y, bins, cmap=plt.cm.jet)
It took 0.021 s to draw this with bins=(50,50):
It took 0.173 s to draw this with bins=(1000,1000):
Cons: The zoomed-in data does not look as good as with mpl-scatter-density or datashader. Also, you have to determine the number of bins yourself.
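If choosing the bin count by hand is a nuisance, one option (my assumption, not something the compared answers do) is to let NumPy pick the bin edges per axis with `np.histogram_bin_edges` and the Freedman-Diaconis rule, then pass those edges to `hist2d`:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=100000)
y = x * 3 + rng.normal(size=100000)

# Let NumPy choose the bin edges for each axis ('fd' = Freedman-Diaconis rule)
x_edges = np.histogram_bin_edges(x, bins="fd")
y_edges = np.histogram_bin_edges(y, bins="fd")

fig, ax = plt.subplots()
counts, _, _, image = ax.hist2d(x, y, bins=[x_edges, y_edges], cmap=plt.cm.jet)
fig.colorbar(image, ax=ax)
# Call plt.show() to display the figure.
```

The rule is designed for 1D histograms, so treat the result as a starting point rather than an optimal 2D binning.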
5: density_scatter
The code is as in the answer by Guillaume.
It took 0.073 s to draw this with bins=(50,50):
It took 0.368 s to draw this with bins=(1000,1000):
Also, if the number of points makes the KDE calculation too slow, the color can be interpolated from np.histogram2d [Update in response to comments: If you wish to show the colorbar, use plt.scatter() instead of ax.scatter() followed by plt.colorbar()]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.colors import Normalize
from scipy.interpolate import interpn
def density_scatter(x, y, ax=None, sort=True, bins=20, **kwargs):
    """
    Scatter plot colored by 2d histogram
    """
    if ax is None:
        fig, ax = plt.subplots()
    else:
        # Reuse the figure that owns the axes passed in
        fig = ax.get_figure()
    data, x_e, y_e = np.histogram2d(x, y, bins=bins, density=True)
    z = interpn((0.5*(x_e[1:] + x_e[:-1]), 0.5*(y_e[1:] + y_e[:-1])), data, np.vstack([x, y]).T, method="splinef2d", bounds_error=False)
    # To be sure to plot all data
    z[np.where(np.isnan(z))] = 0.0
    # Sort the points by density, so that the densest points are plotted last
    if sort:
        idx = z.argsort()
        x, y, z = x[idx], y[idx], z[idx]
    ax.scatter(x, y, c=z, **kwargs)
    norm = Normalize(vmin=np.min(z), vmax=np.max(z))
    cbar = fig.colorbar(cm.ScalarMappable(norm=norm), ax=ax)
    cbar.ax.set_ylabel('Density')
    return ax
if "__main__" == __name__:
    x = np.random.normal(size=100000)
    y = x * 3 + np.random.normal(size=100000)
    density_scatter(x, y, bins=[30, 30])
You could make a histogram:
import numpy as np
import matplotlib.pyplot as plt
# fake data:
a = np.random.normal(size=1000)
b = a*3 + np.random.normal(size=1000)
plt.hist2d(a, b, (50, 50), cmap=plt.cm.jet)
plt.colorbar()
I have some surface data that is generated by an external program as XYZ values. I want to create the following graphs, using matplotlib:
Surface plot
Contour plot
Contour plot overlaid with a surface plot
I have looked at several examples for plotting surfaces and contours in matplotlib. However, in those examples the Z values seem to be a function of X and Y, i.e. Z ~ f(X,Y).
I assume that I will somehow need to transform my Y variables, but I have not seen any example yet, that shows how to do this.
So, my question is this: given a set of (X,Y,Z) points, how may I generate Surface and contour plots from that data?
BTW, just to clarify, I do NOT want to create scatter plots. Also although I mentioned matplotlib in the title, I am not averse to using rpy(2), if that will allow me to create these charts.
To make a contour plot, you need to interpolate your data onto a regular grid: http://www.scipy.org/Cookbook/Matplotlib/Gridding_irregularly_spaced_data
A quick example:
>>> xi = linspace(min(X), max(X))
>>> yi = linspace(min(Y), max(Y))
>>> zi = griddata(X, Y, Z, xi, yi)
>>> contour(xi, yi, zi)
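Note that `griddata` from `matplotlib.mlab` was removed in Matplotlib 3.1, so the snippet above needs an older release. Here is a sketch of the same idea with `scipy.interpolate.griddata`, which has a different call signature. The sample `X`, `Y`, `Z` data is synthetic and only for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

# Synthetic scattered (X, Y, Z) points standing in for the real data
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, 200)
Y = rng.uniform(-2, 2, 200)
Z = X * np.exp(-X**2 - Y**2)

# Regular grid spanning the data
xi = np.linspace(X.min(), X.max(), 50)
yi = np.linspace(Y.min(), Y.max(), 50)
xim, yim = np.meshgrid(xi, yi)

# Linear interpolation onto the grid; cells outside the convex hull become NaN
zi = griddata((X, Y), Z, (xim, yim), method="linear")

fig, ax = plt.subplots()
contours = ax.contour(xi, yi, zi)
# Call plt.show() to display the figure.
```

Grid cells outside the convex hull of the input points are left as NaN, and `contour` simply does not draw there.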
For the surface: http://matplotlib.sourceforge.net/examples/mplot3d/surface3d_demo.html
>>> from mpl_toolkits.mplot3d import Axes3D
>>> fig = figure()
>>> ax = Axes3D(fig)
>>> xim, yim = meshgrid(xi, yi)
>>> ax.plot_surface(xim, yim, zi)
>>> show()
>>> help(meshgrid)
Return coordinate matrices from two coordinate vectors.
[...]
Examples
--------
>>> X, Y = np.meshgrid([1,2,3], [4,5,6,7])
>>> X
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
>>> Y
array([[4, 4, 4],
       [5, 5, 5],
       [6, 6, 6],
       [7, 7, 7]])
Contour in 3D: http://matplotlib.sourceforge.net/examples/mplot3d/contour3d_demo.html
>>> fig = figure()
>>> ax = Axes3D(fig)
>>> ax.contour(xi, yi, zi) # ax.contourf for filled contours
>>> show()
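Gridding can also be skipped entirely: Matplotlib has triangulation-based functions that accept scattered points directly. Here is a sketch, assuming synthetic `X`, `Y`, `Z` arrays in place of the real data, that uses `tricontourf` for the contour plot and `plot_trisurf` for the surface:

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3D projection on older Matplotlib)

# Synthetic scattered (X, Y, Z) points standing in for the real data
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, 300)
Y = rng.uniform(-2, 2, 300)
Z = X * np.exp(-X**2 - Y**2)

fig = plt.figure(figsize=(10, 4))

# Filled contour plot built from a Delaunay triangulation of the points
ax2d = fig.add_subplot(1, 2, 1)
filled = ax2d.tricontourf(X, Y, Z, levels=10)
fig.colorbar(filled, ax=ax2d)

# Surface plot built from the same kind of triangulation
ax3d = fig.add_subplot(1, 2, 2, projection="3d")
surface = ax3d.plot_trisurf(X, Y, Z, cmap="viridis")
# Call plt.show() to display the figure.
```

The trade-off is that the surface follows the triangulation of the raw points, so it can look more faceted than an interpolated grid.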
With pandas and numpy to import and manipulate the data, and matplotlib.pyplot.contourf to plot the image:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.mlab import griddata  # removed in Matplotlib 3.1; requires an older release
PATH='/YOUR/CSV/FILE'
df=pd.read_csv(PATH)
#Get the original data
x=df['COLUMNONE']
y=df['COLUMNTWO']
z=df['COLUMNTHREE']
#Through the unstructured data get the structured data by interpolation
xi = np.linspace(x.min()-1, x.max()+1, 100)
yi = np.linspace(y.min()-1, y.max()+1, 100)
zi = griddata(x, y, z, xi, yi, interp='linear')
#Plot the contour mapping and edit the parameter setting according to your data (http://matplotlib.org/api/pyplot_api.html?highlight=contourf#matplotlib.pyplot.contourf)
CS = plt.contourf(xi, yi, zi, 5, levels=[0,50,100,1000],colors=['b','y','r'],vmax=abs(zi).max(), vmin=-abs(zi).max())
plt.colorbar()
#Save the mapping and save the image
plt.savefig('/PATH/OF/IMAGE.png')
plt.show()
Contour plot with rpy2 + ggplot2:
from rpy2.robjects.lib.ggplot2 import ggplot, aes_string, geom_contour
from rpy2.robjects.vectors import DataFrame
# Assume that data are in a .csv file with three columns X,Y,and Z
# read data from the file
dataf = DataFrame.from_csvfile('mydata.csv')
p = ggplot(dataf) + \
geom_contour(aes_string(x = 'X', y = 'Y', z = 'Z'))
p.plot()
Surface plot with rpy2 + lattice:
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
from rpy2.robjects.vectors import DataFrame
from rpy2.robjects import Formula
lattice = importr('lattice')
rprint = robjects.globalenv.get("print")
# Assume that data are in a .csv file with three columns X,Y,and Z
# read data from the file
dataf = DataFrame.from_csvfile('mydata.csv')
p = lattice.wireframe(Formula('Z ~ X * Y'), shade = True, data = dataf)
rprint(p)