How to scale legend elements down in a matplotlib scatterplot?

Let's say I have this scatterplot and would like to keep the size of the dots in the plot, but in the legend I would like the sizes to be denoted as 1, 2, ... instead of 50, 100, ...
import numpy as np
import matplotlib.pyplot as plt
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
a2 = 300*np.random.rand(N)
sc = plt.scatter(x, y, s=a2, alpha=0.5)
plt.legend(*sc.legend_elements("sizes", num=6))
plt.show()

It depends. If the numbers you want to show are just arbitrary, i.e. unrelated to the actual sizes, you can supply a list of numbers as labels.
import numpy as np
import matplotlib.pyplot as plt
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
a2 = 300*np.random.rand(N)
sc = plt.scatter(x, y, s=a2, alpha=0.5)
plt.legend(sc.legend_elements("sizes", num=6)[0], [1,2,3,4,5])
plt.show()
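One caveat with a hard-coded label list: the number of handles that legend_elements returns for a given num is picked by a tick locator, so it may differ from the length of your list. A minimal sketch that derives the labels from the handle count instead (the seed, the "size" legend title and the Agg backend line are assumptions, only there to make the example reproducible and headless):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the figure is reproducible
N = 50
x = rng.random(N)
y = rng.random(N)
a2 = 300 * rng.random(N)

fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=a2, alpha=0.5)

# Keep the handles, throw away the automatic labels
handles, _ = sc.legend_elements("sizes", num=6)
# Build exactly one arbitrary label per handle: "1", "2", ...
labels = [str(i) for i in range(1, len(handles) + 1)]
legend = ax.legend(handles, labels, title="size")
```

Call plt.show() afterwards if you run it interactively.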
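If, however, there is a relation between the numbers to show and the marker sizes (for example, the sizes are computed from some data), you can pass the inverse of that mapping as the func argument of legend_elements: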
import numpy as np
import matplotlib.pyplot as plt
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
a3 = np.random.randint(1,6, size=N)
f = lambda a: 12*a**2 # function to calculate size from data
g = lambda s: np.sqrt(s/12) # inverse function to calc. data from size
sc = plt.scatter(x, y, s=f(a3), alpha=0.5)
plt.legend(*sc.legend_elements("sizes", num=5, func=g))
plt.show()
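If you want exactly one legend entry per distinct data value, num=None tells legend_elements to use all unique sizes rather than locator-chosen ticks. A rough sketch under that assumption (size_from_data and data_from_size are just named versions of f and g above; the seed and the Agg backend are my additions):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np


def size_from_data(a):
    """Marker area (points^2) for a data value."""
    return 12 * a**2


def data_from_size(s):
    """Inverse mapping: recover the data value from a marker area."""
    return np.sqrt(s / 12)


rng = np.random.default_rng(1)
N = 50
x = rng.random(N)
y = rng.random(N)
a3 = rng.integers(1, 6, size=N)  # integer data values 1..5

fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=size_from_data(a3), alpha=0.5)

# num=None: one entry per unique marker size, labelled through the inverse function
handles, labels = sc.legend_elements("sizes", num=None, func=data_from_size)
ax.legend(handles, labels, title="data value")
```

This only stays readable while the data has a handful of distinct values; for continuous data the locator-based num=5 from the answer is the better fit.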

Related

Scatter plot markers color based on custom scale

I want to color my scatter points based on a custom color scale but I got this error
ValueError: 'c' argument has 150 elements, which is inconsistent with 'x' and 'y' with size 100.
For the example below, it seems like the length of t needs to be the same as x and y.
However, I want to color the points with a wider scale, for example -50 to 150 instead of 0 to 100.
How can I do this?
Thanks
import numpy as np
import matplotlib.pyplot as plt
x = np.random.rand(100)
y = np.random.rand(100)
t = np.arange(100)
plt.scatter(x, y, c=t)
plt.show()
I'm not really sure what your goal is, so I present you two answers: hopefully one is the one you are looking for.
First:
Simply create an array t of 100 elements ranging from -50 to 150:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.rand(100)
y = np.random.rand(100)
t = np.linspace(-50, 150, len(x))
fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=t)
fig.colorbar(sc, label="value")
plt.show()
Second:
Create an array t with 100 elements, ranging from 0 to 99, and ask matplotlib to map these values onto the color range from -50 to 150.
import numpy as np
import matplotlib.pyplot as plt
x = np.random.rand(100)
y = np.random.rand(100)
t = np.arange(100)
fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=t, vmin=-50, vmax=150)
fig.colorbar(sc, label="value")
plt.show()
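To see what vmin and vmax actually do, here is a small sketch using an explicit Normalize object, which is what scatter builds internally from those two arguments. The variable names low and high are mine; the point is that the data only occupies the middle half of the colormap:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

rng = np.random.default_rng(2)
x = rng.random(100)
y = rng.random(100)
t = np.arange(100)

# Linear mapping of [-50, 150] onto [0, 1]
norm = Normalize(vmin=-50, vmax=150)
low = float(norm(0))     # position of t = 0 on the colormap
high = float(norm(100))  # position of t = 100 on the colormap

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=t, norm=norm)  # same effect as vmin=-50, vmax=150
fig.colorbar(sc, label="value")
```

Here low is 0.25 and high is 0.75, so the points use only the central part of the color scale while the colorbar still spans -50 to 150.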

How to fix "not enough values to unpack" when trying to plot 3D data as colormesh?

With matplotlib I am trying to plot 3D data as a 2D colormap. Each point has a x and a y coordinate, and a 'height' z. This height should determine the color a certain x/y region is colored in.
Here is the code I have been trying:
import random
import numpy as np
import matplotlib.pyplot as plt
x = []
y = []
z = []
for index in range(100):
    a = random.random()
    b = random.random()
    c = np.exp(-a*a - b*b)
    x.append(a)
    y.append(b)
    z.append(c)
cmap = plt.get_cmap('PiYG')
fig, ax = plt.subplots()
ax.pcolormesh(x, y, z, cmap=cmap)
But it gives an error
ValueError: not enough values to unpack (expected 2, got 1)
Maybe I am trying the wrong thing?
Remark: The three lists x, y, z are calculated for the example above, but in reality I just have three lists with "random" numbers in them that I want to visualize. I cannot calculate z given x and y.
I could also use imshow to create the plot I want, but I have to convert my original data into a matrix first. Maybe there is a function I can use?
pcolormesh might not be the right choice for this kind of problem. pcolormesh expects ordered cell edges as data rather than random data points. You could use it if you know your grid beforehand, e.g.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 1, 51)
# meshgrid makes a 2D grid of points
xx, yy = np.meshgrid(x, x)
z = np.exp(-xx**2 - yy**2)
fig, ax = plt.subplots()
ax.pcolormesh(xx, yy, z, cmap="PiYG")
which will give you
Alternatively, you could use one of the tri functions such as tripcolor with your existing setup
import random
import numpy as np
import matplotlib.pyplot as plt
x = []
y = []
z = []
for index in range(100):
    a = random.random()
    b = random.random()
    c = np.exp(-a*a - b*b)
    x.append(a)
    y.append(b)
    z.append(c)
fig, ax = plt.subplots()
ax.tripcolor(x, y, z, cmap="PiYG")
which will give
Note it would be simpler to use np.random to generate your data
x, y = np.random.random(size=(2, 100))
z = np.exp(-x**2 - y**2)
fig, ax = plt.subplots()
ax.tripcolor(x, y, z, cmap="PiYG")
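On the question of converting the scattered points into a matrix: one option is to interpolate them onto a regular grid with scipy.interpolate.griddata and then hand that grid to pcolormesh (or imshow). This is a sketch, not the only way; the 50x50 grid resolution and the linear method are arbitrary choices of mine, and z is computed here only to have demo data:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(3)
x, y = rng.random(size=(2, 100))  # scattered sample positions
z = np.exp(-x**2 - y**2)          # stand-in for measured values

# Regular target grid covering the unit square
grid_x, grid_y = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))

# Linear interpolation; cells outside the convex hull of the points become NaN
grid_z = griddata((x, y), z, (grid_x, grid_y), method="linear")

fig, ax = plt.subplots()
mesh = ax.pcolormesh(grid_x, grid_y, np.ma.masked_invalid(grid_z),
                     cmap="PiYG", shading="auto")
fig.colorbar(mesh, label="z")
```

The NaN cells near the border are masked and stay blank; method="nearest" would fill them at the cost of a blockier picture.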
There is an issue with x, y and z shapes: they have to be 2D arrays (matrices) but they are 1-dimensional.
In order to generate x and y axis, you could use:
x = []
y = []
for index in range(100):
    x.append(random.random())
    y.append(random.random())
Then you have to create a meshgrid:
X, Y = np.meshgrid(np.sort(x), np.sort(y))  # pcolormesh expects monotonically ordered coordinates
Finally you can compute Z over the meshgrid:
Z = np.exp(-X**2 - Y**2)
In this way, your code:
cmap = plt.get_cmap('PiYG')
fig, ax = plt.subplots()
ax.pcolormesh(X, Y, Z, cmap=cmap)
gives:
If you cannot compute Z on the meshgrid, then you should not use pcolormesh.
Some alternatives could be:
3D scatterplot:
import random
import numpy as np
import matplotlib.pyplot as plt
x = []
y = []
z = []
for index in range(100):
    a = random.random()
    b = random.random()
    c = np.exp(-a*a - b*b)
    x.append(a)
    y.append(b)
    z.append(c)
cmap = plt.get_cmap('PiYG')
fig = plt.figure()
ax = fig.add_subplot(projection = '3d')
ax.scatter(x, y, z, c=z, cmap=cmap)  # cmap only takes effect when c is given
plt.show()
2D colored scatterplot:
import random
import numpy as np
import matplotlib.pyplot as plt
x = []
y = []
z = []
for index in range(100):
    a = random.random()
    b = random.random()
    c = np.exp(-a*a - b*b)
    x.append(a)
    y.append(b)
    z.append(c)
cmap = plt.get_cmap('PiYG')
plt.style.use('seaborn-v0_8-darkgrid')  # this style was named 'seaborn-darkgrid' before matplotlib 3.6
fig, ax = plt.subplots()
ax.scatter(x, y, c = z, cmap=cmap)
plt.show()

Attempting to create a color map for most overlapping points

I'm running into an issue trying to create a color map within a scatterplot. Here's the portion of my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
f, ax = plt.subplots()
xy = np.vstack([x, y])
xy = xy[~np.isnan(xy)]
z = gaussian_kde(xy)(xy)
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]
plt.scatter(x, y, c=z, cmap='Reds', alpha=0.5)
x and y are both columns within my pandas DataFrame and they both have NaN values. I tried taking out all the NaN values with ~np.isnan(xy) to keep only actual values, since I believe gaussian_kde() was throwing an error about infs or NaNs. Also, the columns don't align with each other in terms of where the NaN values are, and one column has more NaN values than the other. Both have the same number of elements. When I run my code, it just keeps running and I have to stop it. Any ideas what's possibly wrong?
You have to filter the NaNs in both vectors at the same positions using:
inds = ~np.logical_or(np.isnan(x), np.isnan(y))
x = x[inds]
y = y[inds]
From this example, I think your code should look like:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate fake data
x = np.random.normal(size=1000)
y = x * 3 + np.random.normal(size=1000)
# removing nans in both vectors at the same place
inds = ~np.logical_or(np.isnan(x), np.isnan(y))
x = x[inds]
y = y[inds]
# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=100, edgecolors='none')
plt.show()
Just keep in mind that if x and y are very large vectors, gaussian_kde can take a long time to run. For a vector length of 50000, it takes about 40.5 sec to run.
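Since the question starts from a pandas DataFrame, the same row-wise filtering can be done with dropna. Note that indexing the stacked 2-D array with a 2-D boolean mask, as in xy[~np.isnan(xy)], returns a flattened 1-D array, so the x/y pairing is lost. A minimal sketch, assuming the columns are called "x" and "y" (hypothetical names, as is the fake data):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Fake data with NaNs at different positions in each column
rng = np.random.default_rng(4)
df = pd.DataFrame({"x": rng.normal(size=500)})
df["y"] = 3 * df["x"] + rng.normal(size=500)
df.loc[::25, "x"] = np.nan
df.loc[10::40, "y"] = np.nan

# Drop every row where either column is NaN, so x and y stay paired
clean = df.dropna(subset=["x", "y"])
x = clean["x"].to_numpy()
y = clean["y"].to_numpy()

# Point density estimated at each sample location
xy = np.vstack([x, y])
z = gaussian_kde(xy)(xy)

# Sort by density so the densest points are drawn last (on top)
order = z.argsort()
x, y, z = x[order], y[order], z[order]

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=z, cmap="Reds", s=20)
fig.colorbar(sc, label="estimated density")
```

For very large frames you may want to evaluate the KDE on a random subsample first, for the run-time reason given above.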

Matplotlib: set "bad" colour in scatter plot

I'm having problems setting the color of np.nan values in my data set.
I already managed to get cmap.set_bad working in imshow plots, but it does not work in plt.scatter.
Anyways, my main goal is assigning a specific colour to bad values.
This is how I thought it would work (but it does not ;-)
import matplotlib.pyplot as plt
import numpy as np
n = 20
x = y = np.linspace(1, 10, n)
c = np.random.random_sample((n,))
c[4] = np.nan
c[8:12] = np.nan
cmap = plt.get_cmap('plasma')
cmap.set_bad(color='black', alpha = 1.)
plt.scatter(x, y, c=c, s=200, cmap=cmap)
This gives me the following output:
Of course, I could divide the dataset into two separate sets and overplot them, but I'm quite sure there is a much cleaner solution.
There is no black color in the plasma colormap.
The array c stores the values that are mapped to colors of the current colormap cmap. If an entry of c is NaN, no marker is drawn for that index (here 4 and 8:12) in the scatter plot.
The first variant is to set color for selected indices manually:
import matplotlib.pyplot as plt
import numpy as np
n = 20
x = y = np.linspace(1, 10, n)
c = np.random.random_sample((n,))
#c[4] = np.nan
#c[8:12] = np.nan
c[4]=c[8:12]=0 # first color use to mark 4 and 8:12 elements
cmap = plt.get_cmap('plasma')
plt.scatter(x, y, s=200, c=c, cmap=cmap)
plt.show()
The second variant is to draw two scatter-plots:
import matplotlib.pyplot as plt
import numpy as np
n = 20
x = y = np.linspace(1, 10, n)
c = np.random.random_sample((n,))
c[4] = np.nan
c[8:12] = np.nan
cmap = plt.get_cmap('plasma')
# plot good values
indices = ~np.isnan(c)
plt.scatter(x[indices], y[indices], s=200, c=c[indices], cmap=cmap)
# plot bad values
plt.scatter(x[~indices], y[~indices], s=200, c='k')
plt.show()
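A third variant, if your Matplotlib is recent enough: scatter has a plotnonfinite keyword (added in 3.1, as far as I know) that keeps NaN points and draws them in the colormap's "bad" color. A sketch under that assumption; copying the colormap before set_bad is my choice, to avoid modifying the registered plasma map:

```python
import copy

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)
n = 20
x = y = np.linspace(1, 10, n)
c = rng.random(n)
c[4] = np.nan
c[8:12] = np.nan

# Work on a copy so the globally registered colormap stays untouched
cmap = copy.copy(plt.get_cmap("plasma"))
cmap.set_bad(color="black", alpha=1.0)

fig, ax = plt.subplots()
# plotnonfinite=True: NaN entries of c are drawn with the "bad" color
sc = ax.scatter(x, y, c=c, s=200, cmap=cmap, plotnonfinite=True)
fig.colorbar(sc)
```

On older versions without plotnonfinite, the two-scatter variant above remains the fallback.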

Vertically fill 3d matplotlib plot

I have a 3d plot made using matplotlib. I now want to fill the vertical space between the drawn line and the x,y axis to highlight the height of the line on the z axis. On a 2d plot this would be done with fill_between but there does not seem to be anything similar for a 3d plot. Can anyone help?
here is my current code
from stravalib import Client
import matplotlib as mpl
import numpy as np
import matplotlib.pyplot as plt
... code to get the data ....
mpl.rcParams['legend.fontsize'] = 10
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) no longer works in recent matplotlib
zi = alt
x = df['x'].tolist()
y = df['y'].tolist()
ax.plot(x, y, zi, label='line')
ax.legend()
plt.show()
and the current plot
just to be clear I want a vertical fill to the x,y axis intersection NOT this...
You're right. It seems that there is no 3D equivalent of the 2D plot function fill_between. The solution I propose is to convert your data into 3D polygons. Here is the corresponding code:
import math as mt
import matplotlib.pyplot as pl
import numpy as np
import random as rd
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
# Parameter (reference height)
h = 0.0
# Code to generate the data
n = 200
alpha = 0.75 * mt.pi
theta = [alpha + 2.0 * mt.pi * (float(k) / float(n)) for k in range(0, n + 1)]
xs = [1.0 * mt.cos(k) for k in theta]
ys = [1.0 * mt.sin(k) for k in theta]
zs = [abs(k - alpha - mt.pi) * rd.random() for k in theta]
# Code to convert data in 3D polygons
v = []
for k in range(0, len(xs) - 1):
    x = [xs[k], xs[k+1], xs[k+1], xs[k]]
    y = [ys[k], ys[k+1], ys[k+1], ys[k]]
    z = [zs[k], zs[k+1], h, h]
    #list is necessary in python 3/remove for python 2
    v.append(list(zip(x, y, z)))
poly3dCollection = Poly3DCollection(v)
# Code to plot the 3D polygons
fig = pl.figure()
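ax = fig.add_subplot(projection='3d')  # Axes3D(fig) is no longer added to the figure automatically in recent matplotlib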
ax.add_collection3d(poly3dCollection)
ax.set_xlim([min(xs), max(xs)])
ax.set_ylim([min(ys), max(ys)])
ax.set_zlim([min(zs), max(zs)])
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
pl.show()
It produces the following figure:
I hope this will help you.
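The polygon construction can also be wrapped in a small helper so it works on any x, y, z sequences, such as the x, y and zi from the question. vertical_fill_polys is a hypothetical name of mine, and the demo curve is made up:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d.art3d import Poly3DCollection


def vertical_fill_polys(xs, ys, zs, h=0.0):
    """Return one quadrilateral per line segment, reaching down to height h."""
    polys = []
    for k in range(len(xs) - 1):
        polys.append([
            (xs[k], ys[k], zs[k]),              # top, segment start
            (xs[k + 1], ys[k + 1], zs[k + 1]),  # top, segment end
            (xs[k + 1], ys[k + 1], h),          # bottom, segment end
            (xs[k], ys[k], h),                  # bottom, segment start
        ])
    return polys


# Made-up demo line: a circle whose height oscillates
theta = np.linspace(0, 2 * np.pi, 100)
xs, ys = np.cos(theta), np.sin(theta)
zs = 1.0 + 0.5 * np.sin(3 * theta)

polys = vertical_fill_polys(xs, ys, zs)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot(xs, ys, zs, label="line")
ax.add_collection3d(Poly3DCollection(polys, alpha=0.3))
ax.set_zlim(0, zs.max())
ax.legend()
```

Drawing the line itself on top of the semi-transparent polygons keeps the original curve visible above the fill.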
