SOLVED (see below)
On a 2D matplotlib scatter plot I can turn points on and off by accessing the _offsets property of the scatter plot object and setting its .mask attribute (True to hide, False to show) for the indexes of the points I want to hide or show, like this:
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.widgets import Button
import random
def TogglePoints(event, plot):
    plot._offsets.mask = [ [1, 1], [1, 1], [1, 1], [0, 0], [0, 0], [0, 0] ]
    plot.figure.canvas.draw()
x = [random.randint(-10, 10) for n in range(6)]
y = [random.randint(-10, 10) for n in range(6)]
ax = plt.axes()
sc = ax.scatter(x, y, marker='o', s=20, alpha=1)
ax_button = plt.axes([0.81, 0.01, 0.1, 0.05])
button= Button(ax_button, "Toggle")
button.on_clicked(lambda event: TogglePoints(event, sc))
plt.show()
When you click the "Toggle" button on the figure, points with indexes 0, 1, 2 will disappear. You can make them re-appear by setting _offsets.mask back to False and re-drawing plot.
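The masking idea itself can be illustrated on a plain numpy masked array, independent of matplotlib. This is only a sketch: the `offsets` array below is a hypothetical stand-in for the private `_offsets` attribute, which holds one (x, y) row per point.

```python
import numpy as np

# Hypothetical stand-in for sc._offsets: one (x, y) row per point.
offsets = np.ma.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])

# Mask both coordinates of point 0 to hide it.
offsets.mask = [[1, 1], [0, 0], [0, 0]]
hidden_rows = np.ma.getmaskarray(offsets).any(axis=1)

# Setting the mask back to False unmasks every point again.
offsets.mask = False
all_visible = not np.ma.getmaskarray(offsets).any()
```

A row counts as hidden here when any of its coordinates is masked, which matches masking both columns of a point as in the code above.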
This is what I want to achieve with matplotlib 3D scatter plot.
Using _offsets.mask = [ [1, 1], [1, 1], [1, 1], [0, 0], [0, 0], [0, 0] ] on 3D scatter plot doesn't seem to work.
Actually it alters type of underlying array from MaskedArray to numpy.ndarray for some reason (see: Numpy MaskedArray in matplotlib 3D scatter plot, turns into ndarray when called by PyQt5 button click).
I know that 3D scatter plots have an _offsets3d property, but I don't know how to use it to show/hide points on the plot. Or maybe there's some other way?
Does anyone know how I can do that?
Thanks to this post:
Get working alpha value of scatter points in mpl_toolkits.basemap.Basemap
I've found a workaround that serves my purpose.
It works by setting the alpha values of individual points with set_facecolors().
So the working code now looks like this:
...
import pandas as pd #added
def TogglePointsOFF(event, plot):
    for n in range(3): # n = index of point
        fc_colors[n, 3] = 0 # 4th value is alpha
    plot.set_facecolors(fc_colors)
    plot.figure.canvas.draw()
def TogglePointsON(event, plot):
    for n in range(3): # n = index of point
        fc_colors[n, 3] = 1 # 4th value is alpha
    plot.set_facecolors(fc_colors)
    plot.figure.canvas.draw()
# I've put the data into a DataFrame so it's easier to see
df = pd.DataFrame()
df['label'] = ["data_"+str(n) for n in range(6)]
df['id'] = [1, 1, 1, 2, 2, 2]
df['x'] = [random.randint(-10, 10) for n in range(6)]
df['y'] = [random.randint(-10, 10) for n in range(6)]
df['z'] = [random.randint(-10, 10) for n in range(6)]
colors = {1:'red', 2:'blue'} # to map colors with df 'id'
#plot points colored according to value of df['id']
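ax = plt.axes(projection='3d') # 3D axes are needed for an x, y, z scatter
sc = ax.scatter(df['x'], df['y'], df['z'], c=df['id'].map(colors), marker='o', s=20, depthshade=False)
global fc_colors #yeah yeah globals...
fc_colors = sc._face_colors # same name as used in the toggle functions above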
ax_button = plt.axes([0.81, 0.01, 0.1, 0.05])
ax_button_1 = plt.axes([0.68, 0.01, 0.12, 0.05])
button= Button(ax_button, "OFF")
button_1= Button(ax_button_1, "ON")
button.on_clicked(lambda event: TogglePointsOFF(event, sc))
button_1.on_clicked(lambda event: TogglePointsON(event, sc))
plt.show()
Clicking buttons "ON" and "OFF" will hide/show group of points based on index.
I've tried using set_alpha() and passing an iterable of alpha values like [0, 0, 0, 1, 1, 1], but it seemed to act on random points and set the alpha of the wrong ones.
Getting the face colors from get_facecolors() also seemed to return colors with a random index alignment. This may be connected to why passing an iterable of alpha values to set_alpha() didn't work. That's why I take the colors of the points from sc._face_colors.
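One possible way to avoid reading a private attribute at all is to build the RGBA array yourself before plotting and keep it around. The sketch below assumes the colors are given as names, one per point; `build_rgba` and `with_alpha` are hypothetical helper names, and only `matplotlib.colors.to_rgba_array` is an actual library call.

```python
import numpy as np
from matplotlib.colors import to_rgba_array

def build_rgba(color_names):
    """Return an (N, 4) float RGBA array, one row per point."""
    return to_rgba_array(list(color_names))

def with_alpha(rgba, indices, alpha):
    """Return a copy of rgba with the alpha column set for the given rows."""
    out = np.array(rgba, dtype=float, copy=True)
    out[list(indices), 3] = alpha
    return out

# Same grouping as above: three 'red' points, three 'blue' points.
rgba = build_rgba(["red", "red", "red", "blue", "blue", "blue"])
hidden = with_alpha(rgba, range(3), 0.0)  # points 0..2 fully transparent
```

The resulting array could then be passed as `c=rgba` when creating the scatter and as `sc.set_facecolors(hidden)` inside the button callback, in the same way `fc_colors` is used above. I have not checked this against every matplotlib version, so treat it as a starting point.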
Thank you for your time.
WARNING! Be advised.
This doesn't work when you use any 'official' colormap like this:
sc = ax.scatter(df['x'], df['y'], df['z'], cmap='tab10', vmin=10, vmax=10, marker='o', s=20, depthshade=False)
To set the alpha of points as described above you have to, kind of, make your own colormap mapping, as was done here:
c=df['id'].map(colors)
or use a Normalize object to map any colormap to some custom values like this:
from matplotlib.colors import Normalize #added
#let's assume we have some score values corresponding to the data points:
score = [random.uniform(0.101, 100.123) for n in range(6)]
#but we can use any iterable with numbers
norm = Normalize(min(score), max(score))
cmap = matplotlib.cm.get_cmap('Spectral') #get some built in colormap
colors = cmap(norm(score))
#now you can use colors as 'c' parameter:
sc = ax.scatter(df['x'], df['y'], df['z'], c=colors, marker='o', s=20, depthshade=False)
Remember: don't pass any alpha parameter, and use depthshade=False to prevent points at the back of the plot from fading.
I hope you found this useful.
Related
I would like to create something like the following graph in matplotlib:
I have x = [0, 1, ..., 10], and for each x I have values from the range [0, 60]. Let's say that the black line is a quantile of the values for a given i from x. For a selected i I want to add a horizontal histogram (with the parameter density=True) like in the picture, with the possibility to control the width of this histogram (in the picture it goes from 2 to 5, but I would like to set a fixed width). How can I do that?
Yes, this is relatively straightforward with inset_axes:
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
x = np.random.randn(100)
ax.plot(x)
ylim = ax.get_ylim()
histax = ax.inset_axes([0.3, 0, 0.2, 1], transform=ax.transAxes)
histax.hist(x, orientation='horizontal', alpha=0.5 )
histax.set_facecolor('none')
histax.set_ylim(ylim)
plt.show()
You will probably want to clean up the axes etc, but that is the general idea.
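If the histogram should start at a chosen x position and have a fixed width in data units, one alternative to an inset is to compute the histogram yourself and draw it with `barh`. This is a sketch under my own assumptions: `sideways_hist_bars` and `draw_sideways_hist` are hypothetical helper names, and the bars are rescaled so the longest one equals `width`, which keeps the shape of the density but not its absolute scale.

```python
import numpy as np

def sideways_hist_bars(values, bins=10, width=1.0):
    """Return (lengths, edges): a density histogram rescaled so that
    the longest bar has length `width`."""
    density, edges = np.histogram(values, bins=bins, density=True)
    lengths = density * (width / density.max())
    return lengths, edges

def draw_sideways_hist(ax, x0, values, bins=10, width=1.0, **bar_kwargs):
    """Draw the bars on `ax`, starting at x = x0 and extending to the right."""
    lengths, edges = sideways_hist_bars(values, bins=bins, width=width)
    ax.barh(edges[:-1], lengths, height=np.diff(edges), left=x0,
            align="edge", **bar_kwargs)
    return lengths, edges

# Example data: 500 values for one selected x position.
rng = np.random.default_rng(0)
sample = rng.normal(30, 10, size=500)
lengths, edges = sideways_hist_bars(sample, bins=12, width=0.5)
```

Calling `draw_sideways_hist(ax, 3, sample, width=0.5, alpha=0.5)` on an existing axes would then place the histogram between x = 3 and x = 3.5.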
I would like to create a contourf plot with an imposed maximum value and with everything above that value shaded with the last color of the colorbar. In the example code below, which reproduces my problem in my setup, I would like the colorbar to range between -1 and 1, with an extend arrow indicating that values above 1.0 will be shaded with the last color of the colorbar. However, although I have tried several solutions from various stackexchange discussions, the colorbar ranges between -4 and 4, and there is no extend arrow. Please see the minimum reproducible example below.
# import matplotlib (v 3.1.1)
import matplotlib.colors as colors
import matplotlib.pyplot as plt
import matplotlib.path as mpath
import matplotlib as mpl
# import numpy (v 1.17.2)
import numpy as np
# define grid
lon = np.linspace(start = 0, stop = 359, num = 360)
lat = np.linspace(start = -78, stop = -25, num = 52)
[X,Y] = np.meshgrid(lon, lat)
# generate random gaussian data for example purposes
mean = [0, 0]
cov = [[1, 0], [0, 100]]
zz = np.random.multivariate_normal(mean, cov, (np.size(lon),np.size(lat))).T
Z = zz[0,:,:]
# illustrate the maximum value of Z
np.max(Z)
# create plot
plt.figure(figsize=(10, 12))
# select plotting levels (missing min/max on purpose)
mylevs = [-1.0, -0.5, 0, 0.5, 1.0]
# colormap
cmap_cividis = plt.cm.get_cmap('cividis',len(mylevs))
mycolors = list(cmap_cividis(np.arange(len(mylevs))))
cmap = colors.ListedColormap(mycolors[:-1], "")
# set over-color to last color of list
cmap.set_over(mycolors[-1])
# contour plot: random pattern
C1 = plt.contourf(X, Y, Z, cmap = cmap, vmin=-1.0, vmax=1.0,
norm = colors.BoundaryNorm(mylevs, ncolors=len(mylevs)-1, clip=False))
# create colorbar
cbar = plt.colorbar(C1, orientation="horizontal", extend='max')
cbar.ax.tick_params(labelsize=20)
cbar.set_label('Random field', size='xx-large')
I would like the colorbar to stop at 1.0, with an extend arrow pointing to the right, shaded by the last color of the colorbar. Thanks in advance for any help you can provide.
Link to example image produced by the above code
Does this solve it?
fig,ax = plt.subplots()
mylevs = [-1.0, -0.5, 0, 0.5, 1.0]
C1 = ax.contourf(X, Y, Z, cmap = cmap, vmin=-1.0, vmax=1.0,levels=mylevs,extend='both')
fig.colorbar(C1)
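To see why values above the last level need an "over" color, it may help to look at what `BoundaryNorm` from the question returns for a few values. This is a small sketch using the same levels; the sample values are made up for illustration.

```python
import numpy as np
from matplotlib.colors import BoundaryNorm

mylevs = [-1.0, -0.5, 0, 0.5, 1.0]
norm = BoundaryNorm(mylevs, ncolors=len(mylevs) - 1, clip=False)

# Two values inside the levels and one above the last level.
idx = norm(np.array([-0.75, 0.25, 2.0]))
in_range = [int(idx[0]), int(idx[1])]
above_top = int(idx[2])
```

With `clip=False`, a value at or above the top level gets an index outside the regular colors of the colormap, and the colormap draws such indices with its "over" color (the one set by `set_over`).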
I'm trying to build a scatterplot of a large amount of data from multiple classes in python/matplotlib. Unfortunately, it appears that I have to choose between having my data randomised and having legend labels. Is there a way I can have both (preferably without manually coding the labels?)
Minimum reproducible example:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
X = np.random.normal(0, 1, [5000, 2])
Y = np.random.normal(0.5, 1, [5000, 2])
data = np.concatenate([X,Y])
classes = np.concatenate([np.repeat('X', X.shape[0]),
np.repeat('Y', Y.shape[0])])
Plotting with randomized points:
plot_idx = np.random.permutation(data.shape[0])
colors = pd.factorize(classes)[0] # factorize returns (codes, uniques); keep the integer codes
fig, ax = plt.subplots()
ax.scatter(data[plot_idx, 0],
data[plot_idx, 1],
c=colors[plot_idx],
label=classes[plot_idx],
alpha=0.4)
plt.legend()
plt.show()
This gives me the wrong legend.
Plotting with the correct legend:
from matplotlib import cm
unique_classes = np.unique(classes)
colors = cm.Set1(np.linspace(0, 1, len(unique_classes)))
fig, ax = plt.subplots()
for i, cls in enumerate(unique_classes): # 'class' is a reserved word in Python
    ax.scatter(data[classes == cls, 0],
               data[classes == cls, 1],
               color=colors[i],
               label=cls,
               alpha=0.4)
plt.legend()
plt.show()
But now the points are not randomized and the resulting plot is not representative of the data.
I'm looking for something that would give me a result like I get as follows in R:
library(ggplot2)
X <- matrix(rnorm(10000, 0, 1), ncol=2)
Y <- matrix(rnorm(10000, 0.5, 1), ncol=2)
data <- as.data.frame(rbind(X, Y))
data$classes <- rep(c('X', 'Y'), times=nrow(X))
plot_idx <- sample(nrow(data))
ggplot(data[plot_idx,], aes(x=V1, y=V2, color=classes)) +
geom_point(alpha=0.4, size=3)
You need to create the legend manually. This is not a big problem though. You can loop over the labels and create a legend entry for each. Here one may use a Line2D with a marker similar to the scatter as handle.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
X = np.random.normal(0, 1, [5000, 2])
Y = np.random.normal(0.5, 1, [5000, 2])
data = np.concatenate([X,Y])
classes = np.concatenate([np.repeat('X', X.shape[0]),
np.repeat('Y', Y.shape[0])])
plot_idx = np.random.permutation(data.shape[0])
colors,labels = pd.factorize(classes)
fig, ax = plt.subplots()
sc = ax.scatter(data[plot_idx, 0],
data[plot_idx, 1],
c=colors[plot_idx],
alpha=0.4)
h = lambda c: plt.Line2D([],[],color=c, ls="",marker="o")
plt.legend(handles=[h(sc.cmap(sc.norm(i))) for i in range(len(labels))],
labels=list(labels))
plt.show()
Alternatively you can use a special scatter handler, as shown in the question "Why doesn't the color of the points in a scatter plot match the color of the points in the corresponding legend?", but that seems a bit overkill here.
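The key step above is turning the class names into integer codes plus a list of unique labels, so that the codes can be shuffled together with the data while the legend is built from the labels. Below is a numpy-only sketch of that step. Note that it swaps in `np.unique(..., return_inverse=True)` for `pd.factorize`, so the labels come out sorted rather than in order of first appearance; the tiny `classes` array is made up for illustration.

```python
import numpy as np

classes = np.array(["X", "Y", "Y", "X", "Y"])

# labels: sorted unique class names; codes: index into labels for each point.
labels, codes = np.unique(classes, return_inverse=True)

# Shuffling the plotting order does not change which code belongs to which point.
plot_idx = np.random.permutation(len(classes))
shuffled_codes = codes[plot_idx]

# One legend entry per class: (integer code used for coloring, label text).
legend_entries = [(code, str(label)) for code, label in enumerate(labels)]
```

Each `code` in `legend_entries` is the value that would be passed through `sc.cmap(sc.norm(code))` to get the handle color, as in the legend code above.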
It's a bit of a hack, but you can save the axis limits, set the labels by drawing points well outside the limits of the plot, and then resetting the axis limits as follows:
plot_idx = np.random.permutation(data.shape[0])
color_idx, unique_classes = pd.factorize(classes)
colors = cm.Set1(np.linspace(0, 1, len(unique_classes)))
fig, ax = plt.subplots()
ax.scatter(data[plot_idx, 0],
data[plot_idx, 1],
c=colors[color_idx[plot_idx]],
alpha=0.4)
xlim = ax.get_xlim()
ylim = ax.get_ylim()
for i in range(len(unique_classes)):
    ax.scatter(xlim[1]*10,
               ylim[1]*10,
               c=colors[i],
               label=unique_classes[i])
ax.set_xlim(xlim)
ax.set_ylim(ylim)
plt.legend()
plt.show()
I have sparse scatter plot to visualize the comparison of predicted vs actual values. The range of the values are 1-4 and there are no decimal points.
I have tried plotly so far with the following code (but I can also use a matplotlib solution):
my_scatter = go.Scatter(
x = y_actual, y = y_pred, mode = 'markers',
marker = dict(color = 'rgb(240, 189, 89)', opacity=0.5)
)
This prints the graph nicely (see below). I use opacity to see the density at each point, i.e. if two points lie on top of each other, the point is shown in a darker color. However, this is not explanatory enough. Is it possible to add the count at each point as a label? There are overlaps at certain intersections, and I want to display how many points overlap there. Can this be done automatically using matplotlib or plotly?
This answer uses matplotlib.
To answer the initial question first: you need to find out how often the data produces a point at a given coordinate to be able to annotate the points. If all values are integers this can easily be done using a 2D histogram. Out of the histogram one would then select only those bins where the count is nonzero and annotate the respective values in a loop:
x = [3, 0, 1, 2, 2, 0, 1, 3, 3, 3, 4, 1, 4, 3, 0]
y = [1, 0, 4, 3, 2, 1, 4, 0, 3, 0, 4, 2, 3, 3, 1]
import matplotlib.pyplot as plt
import numpy as np
x = np.array(x)
y = np.array(y)
hist, xbins,ybins = np.histogram2d(y,x, bins=range(6))
X,Y = np.meshgrid(xbins[:-1], ybins[:-1])
X = X[hist != 0]; Y = Y[hist != 0]
Z = hist[hist != 0]
fig, ax = plt.subplots()
ax.scatter(x,y, s=49, alpha=0.4)
for i in range(len(Z)):
    ax.annotate(str(int(Z[i])), xy=(X[i],Y[i]), xytext=(4,0),
                textcoords="offset points" )
plt.show()
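For integer data like this, the same counts can also be obtained without a histogram by counting (x, y) pairs directly. This is just an alternative sketch using `collections.Counter` from the standard library, with the same x and y lists as above:

```python
from collections import Counter

x = [3, 0, 1, 2, 2, 0, 1, 3, 3, 3, 4, 1, 4, 3, 0]
y = [1, 0, 4, 3, 2, 1, 4, 0, 3, 0, 4, 2, 3, 3, 1]

# Map each distinct (x, y) coordinate to the number of points that fall on it.
counts = Counter(zip(x, y))

# Coordinates shared by more than one point, sorted for readability.
overlaps = sorted(pt for pt, n in counts.items() if n > 1)
```

Looping over `counts.items()` then gives the position and the label text for each `ax.annotate` call.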
You may then decide not to plot all points but the result from the histogramming which offers the chance to change the color and size of the scatter points,
ax.scatter(X,Y, s=(Z*20)**1.4, c = Z/Z.max(), cmap="winter_r", alpha=0.4)
Since all values are integers, you may also opt for an image plot,
fig, ax = plt.subplots()
ax.imshow(hist, cmap="PuRd")
for i in range(len(Z)):
    ax.annotate(str(int(Z[i])), xy=(X[i],Y[i]), xytext=(0,0), color="w",
                ha="center", va="center", textcoords="offset points" )
Without the need to calculate the number of occurrences, another option is to use a hexbin plot. This gives slightly inaccurate positions of the dots, due to the hexagonal binning, but I still wanted to mention this option.
import matplotlib.pyplot as plt
import matplotlib.colors
import numpy as np
x = np.array(x)
y = np.array(y)
fig, ax = plt.subplots()
cmap = plt.cm.PuRd
cmaplist = [cmap(i) for i in range(cmap.N)]
cmaplist[0] = (1.0,1.0,1.0,1.0)
cmap = matplotlib.colors.LinearSegmentedColormap.from_list('mcm',cmaplist, cmap.N)
ax.hexbin(x,y, gridsize=20, cmap=cmap, linewidth=0 )
plt.show()
I have two vectors, one with values and one with class labels like 1,2,3 etc.
I would like to plot all the points that belong to class 1 in red, to class 2 in blue, to class 3 in green etc. How can I do that?
The accepted answer has it spot on, but if you want to specify which color should be assigned to a specific class label, you could do the following. I did a little label gymnastics with the colorbar, but making the plot itself reduces to a nice one-liner. This works great for plotting the results from classifications done with sklearn. Each label matches an (x, y) coordinate.
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
x = [4,8,12,16,1,4,9,16]
y = [1,4,9,16,4,8,12,3]
label = [0,1,2,3,0,1,2,3]
colors = ['red','green','blue','purple']
fig = plt.figure(figsize=(8,8))
plt.scatter(x, y, c=label, cmap=matplotlib.colors.ListedColormap(colors))
cb = plt.colorbar()
loc = np.arange(0,max(label),max(label)/float(len(colors)))
cb.set_ticks(loc)
cb.set_ticklabels(colors)
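The tick locations computed above sit at the lower edge of each color band. If you would rather have each label centered on its band, the positions can be computed as below. This is a sketch that assumes the color limits run from the smallest to the largest label (0 to 3 here, which is what scatter picks by default); `band_centers` is a hypothetical helper name.

```python
def band_centers(n_colors, vmin, vmax):
    """Centers of n_colors equal-width bands spanning [vmin, vmax]."""
    band = (vmax - vmin) / n_colors
    return [vmin + (i + 0.5) * band for i in range(n_colors)]

# Four colors over labels 0..3: each band is 0.75 wide.
centers = band_centers(4, 0, 3)
```

Passing `centers` to `cb.set_ticks` instead of `loc` should then put the color names in the middle of their bands.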
Using a slightly modified version of this answer, one can generalise the above for N colors as follows:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
N = 23 # Number of labels
# setup the plot
fig, ax = plt.subplots(1,1, figsize=(6,6))
# define the data
x = np.random.rand(1000)
y = np.random.rand(1000)
tag = np.random.randint(0,N,1000) # Tag each point with a corresponding label
# define the colormap
cmap = plt.cm.jet
# extract all colors from the .jet map
cmaplist = [cmap(i) for i in range(cmap.N)]
# create the new map
cmap = cmap.from_list('Custom cmap', cmaplist, cmap.N)
# define the bins and normalize
bounds = np.linspace(0,N,N+1)
norm = mpl.colors.BoundaryNorm(bounds, cmap.N)
# make the scatter
scat = ax.scatter(x,y,c=tag,s=np.random.randint(100,500,len(x)),cmap=cmap, norm=norm) # one size per point
# create the colorbar
cb = plt.colorbar(scat, spacing='proportional',ticks=bounds)
cb.set_label('Custom cbar')
ax.set_title('Discrete color mappings')
plt.show()
Which gives:
Assuming that you have your data in a 2d array, this should work:
import numpy
import pylab
xy = numpy.zeros((2, 1000))
xy[0] = range(1000)
xy[1] = range(1000)
colors = [int(i % 23) for i in xy[0]]
pylab.scatter(xy[0], xy[1], c=colors)
pylab.show()
You can also set a cmap attribute to control which colors will appear through use of a colormap; i.e. replace the pylab.scatter line with:
pylab.scatter(xy[0], xy[1], c=colors, cmap=pylab.cm.cool)
A list of colormaps can be found in the matplotlib colormap reference.
A simple solution is to assign a color to each class explicitly. This way we control which color each class gets. For example:
arr1 = [1, 2, 3, 4, 5]
arr2 = [2, 3, 3, 4, 4]
labl = [0, 1, 1, 0, 0]
color= ['red' if l == 0 else 'green' for l in labl]
plt.scatter(arr1, arr2, color=color)
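For more than two classes, the conditional expression gets awkward, and a dictionary lookup may read better. A small sketch follows: the `class_colors` mapping and the gray fallback for unknown labels are my own assumptions, not part of the example above.

```python
labl = [0, 1, 1, 0, 0]

# Hypothetical mapping from class label to color; extend it as needed.
class_colors = {0: "red", 1: "green", 2: "blue"}

# Labels missing from the mapping fall back to gray.
color = [class_colors.get(l, "gray") for l in labl]
```

The resulting list can be passed to `plt.scatter(arr1, arr2, color=color)` exactly as before.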