Ugly Outline When Rasterizing Errorbar - python

I'm plotting a large number of data points with errors using Matplotlib (version 2.2.5), and I'm rasterizing the data because there are a few thousand data points. I've found that when I rasterize the data and save as a PDF, however, the error bars produce an ugly white outline that isn't acceptable for publication. I've constructed an MWE that shows the problem:
import numpy as np
import random as rand
import matplotlib.pyplot as plt
rand.seed(10)
seeds = range(0, 1000)
data = np.empty((len(seeds), 2))
for n in seeds:
    data[n, 0] = rand.gauss(1, 0.01)
    data[n, 1] = rand.gauss(1, 0.01)
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
ax.scatter(data[:, 0], data[:, 1], s=10, facecolors="k", rasterized=True, zorder=1)
ax.errorbar(data[:, 0], data[:, 1], xerr=0.01, yerr=0.01, color="k", fmt="none", rasterized=True, zorder=2)
fig.savefig("Test.pdf", dpi=250)
This looks fine in the Jupyter Notebook output, and also as a saved PNG file. The output PDF file, however, looks like this:
How do I get rid of that white fuzz caused by the error bars? If I don't rasterize, the problem vanishes, but then the file takes annoyingly long to load in my paper, and the last thing I want to do is annoy my reader.

I found the solution thanks to an older question: I needed to add ax.set_rasterization_zorder(0) to the code and change the zorder of the plotted points to be below 0. This produced a perfect graph that has no ugly outlines of the data and retains a vectorized axis, exactly what I wanted. The working code is:
import numpy as np
import random as rand
import matplotlib.pyplot as plt
rand.seed(10)
seeds = range(0, 1000)
data = np.empty((len(seeds), 2))
for n in seeds:
    data[n, 0] = rand.gauss(1, 0.01)
    data[n, 1] = rand.gauss(1, 0.01)
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
ax.scatter(data[:, 0], data[:, 1], s=10, facecolors="k", rasterized=True, zorder=-2)
ax.errorbar(data[:, 0], data[:, 1], xerr=0.01, yerr=0.01, color="k", fmt="none", rasterized=True, zorder=-1)
ax.set_rasterization_zorder(0)
fig.savefig("Test.pdf", dpi=250)
and the output is:
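For reference, here is a minimal sketch of the same idea that relies on the zorder threshold alone. According to the Matplotlib documentation, set_rasterization_zorder rasterizes every artist whose zorder lies below the threshold, so the sketch leaves out rasterized=True; that this gives the same result on Matplotlib 2.2.5 is an assumption. The Agg backend, the NumPy random generator and the in-memory buffer are choices made only to keep the example self-contained.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(10)
data = rng.normal(1, 0.01, size=(1000, 2))

fig, ax = plt.subplots(figsize=(6, 6))
# Negative zorders put both artists below the rasterization threshold.
ax.scatter(data[:, 0], data[:, 1], s=10, facecolors="k", zorder=-2)
ax.errorbar(data[:, 0], data[:, 1], xerr=0.01, yerr=0.01,
            color="k", fmt="none", zorder=-1)
# Artists with zorder < 0 are rasterized; axes, ticks and labels stay vector.
ax.set_rasterization_zorder(0)

buffer = io.BytesIO()  # write the PDF to memory instead of to a file
fig.savefig(buffer, format="pdf", dpi=250)
pdf_bytes = buffer.getvalue()
```

Replacing the buffer with a file name such as "Test.pdf" writes the figure to disk as in the code above.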


coloring lines generated from numerical points

I calculate the eigenvalues of large matrices depending on a parameter and would like to plot the eigenvalues in different colors. So I do not have functions where I can conveniently plot different functions in different colors, but instead I just have a set of points which just get connected as interpolation. My problem is that the lines should be intersecting, but that cannot be achieved with this numerical approach.
Maybe it is best explained with a small example.
import numpy as np
import numpy.linalg
import matplotlib.pyplot as plt
def mat(x):
    #return np.array([[np.sin(x), 0], [0, -np.sin(x)]])
    return np.array([[np.sin(x), 0, 0], [0, -np.sin(x), 0], [0, 0, np.sin(10*x)+x]])
fig=plt.figure()
fig.suptitle('wrong colors')
ax=fig.add_subplot(111)
# x = np.linspace(-1,1,100) # no, not that easy, the intersection points are difficult to find
x = np.sort(np.random.uniform(low=-1, high=1, size=1000))
#evs = np.zeros((2, len(x)))
evs = np.zeros((3, len(x)))
for i in range(len(x)):
    evs[:, i] = np.linalg.eigvalsh(mat(x[i]))
print(np.shape(evs))
ax.plot(x, evs[0,:], color='C0')
ax.plot(x, evs[1,:], color='C1')
ax.plot(x, evs[2,:], color='C2')
# just reference plot, this is how it should look like
fig2 = plt.figure()
fig2.suptitle('correct colors')
ax2 = fig2.add_subplot(111)
ax2.plot(x, np.sin(x), color='C0')
ax2.plot(x, -np.sin(x), color='C1')
ax2.plot(x, np.sin(10*x)+x, color='C2')
plt.show()
So what I get is this:
What I would like to have is this:
One difficulty is that the intersection point is difficult to calculate and usually not included. That's ok, I don't need the point, as the graphics is purely informative. But the colors should be shown correctly. Any suggestions how I could achieve something like this easily?
To give you an idea of where this is to be used, have a look at the following picture.
Here, the straight lines in the middle should have a different color than the curved ones.
Besides the matrix being a lot more complex, the image is created in the same way as above.
EDIT: My example was not good and clear, I have come up with one which is closer to my real problem. The matrix is numeric and I cannot diagonalize it analytically, i.e. I cannot know whether it is sin, cos or maybe some mean np.sin(2*x+0.2)+np.cos(x)**2.
Here you go:
Just concatenate the first part of one signal with the last part of the other
import numpy as np
import numpy.linalg
import matplotlib.pyplot as plt
def mat(x):
    return np.array([[np.sin(x), 0], [0, -np.sin(x)]])
fig=plt.figure()
fig.suptitle('wrong colors')
ax=fig.add_subplot(111)
x = np.linspace(-1,1,100)
evs = np.zeros((2, len(x)))
for i in range(len(x)):
    evs[:, i] = np.linalg.eigvalsh(mat(x[i]))
print(np.shape(evs))
ax.plot(x, np.concatenate((evs[0,:int(len(x)//2)],evs[1,int(len(x)//2):])), color='C0')
ax.plot(x, np.concatenate((evs[1,:int(len(x)//2)],evs[0,int(len(x)//2):])), color='C1')
plt.show()
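The concatenation above needs the crossing position to be known in advance. For the case from the question's EDIT, where the matrix is only available numerically, one possible alternative (not the method of this answer) is to follow each eigenvalue through its eigenvector: between two neighbouring parameter values the eigenvector of a branch changes little, so the new eigenvalues can be reordered by largest overlap with the previous eigenvectors. The sketch below assumes a real symmetric matrix and a grid that does not hit a crossing exactly; track_branches is a hypothetical helper name.

```python
import numpy as np

def mat(x):
    # Same 3x3 test matrix as in the question.
    return np.array([[np.sin(x), 0, 0],
                     [0, -np.sin(x), 0],
                     [0, 0, np.sin(10 * x) + x]])

def track_branches(xs, matrix_func):
    """Return eigenvalues reordered so that each row follows one eigenvector."""
    n = matrix_func(xs[0]).shape[0]
    branches = np.empty((n, len(xs)))
    vals, prev_vecs = np.linalg.eigh(matrix_func(xs[0]))
    branches[:, 0] = vals
    for k in range(1, len(xs)):
        vals, vecs = np.linalg.eigh(matrix_func(xs[k]))
        # overlap[i, j] = |<previous vector i, current vector j>|
        overlap = np.abs(prev_vecs.T @ vecs)
        # Greedy matching: each old branch takes the most similar new vector.
        order = np.argmax(overlap, axis=1)
        branches[:, k] = vals[order]
        prev_vecs = vecs[:, order]
    return branches

xs = np.linspace(-1, 1, 400)  # even number of points, so x = 0 is not sampled
branches = track_branches(xs, mat)
```

Each row of branches can then be passed to ax.plot(xs, branches[i], color=f'C{i}'). The greedy argmax can assign the same eigenvector twice when the steps are too coarse or the eigenvectors rotate quickly; a one-to-one assignment (for example with scipy.optimize.linear_sum_assignment) or a finer grid may be needed in that case.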

I'm learning from Python Data Science Handbook and got different graph with same code. What's wrong?

Here is the exact code from the book and output:
plt.scatter(data_projected[:, 0], data_projected[:, 1], c=digits.target,
edgecolor='none', alpha=0.5,
cmap=plt.cm.get_cmap('spectral', 10))
plt.colorbar(label='digit label', ticks=range(10))
plt.clim(-0.5, 9.5);
Output:
Here is my code:
plt.scatter(data_projected[:, 0], data_projected[:, 1],
c=digits.target,
edgecolor = 'none', alpha= 0.5,
cmap = plt.cm.get_cmap('tab10', 10))
plt.colorbar(label='digit label', ticks = range(10))
plt.clim(-0.5, 9.5)
Output:
I guess I need to change something in settings or do something not so complicated but don't know what. Or maybe they changed the dataset?
P.S. I changed color on purpose, it doesn't make any impact on code.
Book chapter:
https://jakevdp.github.io/PythonDataScienceHandbook/05.02-introducing-scikit-learn.html
Concerning the mirroring, consider that the projection involves an arbitrary choice of basis vectors. I'm in no way a machine-learning expert, so I cannot go into detail about the exact working of the algorithm. But if you run the same code several times, you may get all possible orientations, e.g. for 25 runs:
import matplotlib.pyplot as plt
from sklearn.manifold import Isomap
from sklearn.datasets import load_digits
digits = load_digits()
fig, axs = plt.subplots(5,5, figsize=(16,9), sharex=True, sharey=True)
for ax in axs.flat:
    iso = Isomap(n_components=2)
    iso.fit(digits.data)
    data_projected = iso.transform(digits.data)
    im = ax.scatter(data_projected[:, 0], data_projected[:, 1], c=digits.target,
                    s=4,
                    edgecolor='none', alpha=0.5,
                    norm=plt.Normalize(-.5, 9.5),
                    cmap=plt.cm.get_cmap('tab10', 10))
fig.colorbar(im, label='digit label', ax=axs, ticks=range(10))
plt.show()
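If the goal is only to compare two such plots, the sign of each projected axis can be fixed by a convention after the fact. The following is a small sketch, not part of Isomap or scikit-learn: fix_signs is a hypothetical helper that flips each column so that its largest-magnitude entry is positive, which makes an embedding and its mirror image come out identical.

```python
import numpy as np

def fix_signs(embedding):
    """Flip each column so that its largest-magnitude entry is positive."""
    embedding = np.asarray(embedding, dtype=float)
    rows = np.argmax(np.abs(embedding), axis=0)
    signs = np.sign(embedding[rows, np.arange(embedding.shape[1])])
    signs[signs == 0] = 1.0  # leave an all-zero column unchanged
    return embedding * signs

# Stand-in data: a random 2D "embedding" and a copy mirrored along the first axis.
rng = np.random.default_rng(0)
demo = rng.normal(size=(100, 2))
mirrored = demo * np.array([-1.0, 1.0])
same_orientation = np.allclose(fix_signs(demo), fix_signs(mirrored))
```

In the code above this would be applied as data_projected = fix_signs(iso.transform(digits.data)) before plotting. It only removes mirror flips; it does not make the book's figure and a new run agree in any other respect.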

How to change marker size/scale in legend when marker is set to pixel

I am scatter plotting data points with a very small marker (see screengrab below). When I use the very small marker ',' the legend is very hard to read (example code taken from here).
(Python 3, Jupyter lab)
How can I increase the size of the marker in the legend? The two versions shown on the above-mentioned site do not work:
legend = ax.legend(frameon=True)
for legend_handle in legend.legendHandles:
    legend_handle._legmarker.set_markersize(9)
and
ax.legend(markerscale=6)
The two solutions do however work when the marker is set to '.'.
How can I show bigger markers in the legend?
Sample Code from intoli.com:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(12)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
for i in range(5):
    mean = [np.random.random()*10, np.random.random()*10]
    covariance = [ [1 + np.random.random(), np.random.random() - 1], [0, 1 + np.random.random()], ]
    covariance[1][0] = covariance[0][1] # must be symmetric
    x, y = np.random.multivariate_normal(mean, covariance, 3000).T
    plt.plot(x, y, ',', label=f'Cluster {i + 1}')
ax.legend(markerscale=12)
fig.tight_layout()
plt.show()
You can get 1 pixel sized markers for a plot by setting the markersize to 1 pixel. This would look like
plt.plot(x, y, marker='s', markersize=72./fig.dpi, mec="None", ls="None")
What the above does is set the marker to a square, set the markersize to one pixel, and remove lines and edges. Marker sizes are given in points (72 points per inch), so 72 divided by the figure dpi (dots per inch) is the size of one dot, i.e. one pixel, expressed in points.
Then the solution you tried using markerscale in the legend works nicely.
Complete example:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(12)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
for i in range(5):
    mean = [np.random.random()*10, np.random.random()*10]
    covariance = [ [1 + np.random.random(), np.random.random() - 1], [0, 1 + np.random.random()], ]
    covariance[1][0] = covariance[0][1] # must be symmetric
    x, y = np.random.multivariate_normal(mean, covariance, 3000).T
    plt.plot(x, y, marker='s', markersize=72./fig.dpi, mec="None", ls="None",
             label=f'Cluster {i + 1}')
ax.legend(markerscale=12)
fig.tight_layout()
plt.show()
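The unit conversion behind markersize=72./fig.dpi can be checked with plain arithmetic. This is only an illustration; one_pixel_in_points is a hypothetical helper name and the dpi values are arbitrary examples.

```python
POINTS_PER_INCH = 72.0  # marker sizes are measured in points

def one_pixel_in_points(dpi):
    """Size of a single pixel, in points, for a figure with the given dpi."""
    return POINTS_PER_INCH / dpi

# The marker size shrinks (in points) as the resolution grows,
# so that it always covers exactly one pixel.
for dpi in (72, 100, 200):
    print(f"dpi={dpi}: markersize={one_pixel_in_points(dpi):.2f} pt")
```

With markerscale=12 in the legend, the legend marker is then drawn 12 times as large as this one-pixel marker.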
According to this discussion, the markersize has no effect when using pixels (,) as marker. How about generating a custom legend instead? For example, by adapting the first example in this tutorial, one can get a pretty decent legend:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
np.random.seed(12)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
for i in range(5):
    mean = [np.random.random()*10, np.random.random()*10]
    covariance = [ [1 + np.random.random(), np.random.random() - 1], [0, 1 + np.random.random()], ]
    covariance[1][0] = covariance[0][1] # must be symmetric
    x, y = np.random.multivariate_normal(mean, covariance, 3000).T
    plt.plot(x, y, ',', label=f'Cluster {i + 1}')
##generating custom legend
handles, labels = ax.get_legend_handles_labels()
patches = []
for handle, label in zip(handles, labels):
    patches.append(mpatches.Patch(color=handle.get_color(), label=label))
legend = ax.legend(handles=patches)
fig.tight_layout()
plt.show()
The output would look like this:

how to generate a series of histograms on matplotlib?

I would like to generate a series of histogram shown below:
The above visualization was done in tensorflow but I'd like to reproduce the same visualization on matplotlib.
EDIT:
Using plt.fill_between suggested by @SpghttCd, I have the following code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
colors=cm.OrRd_r(np.linspace(.2, .6, 10))
plt.figure()
x = np.arange(100)
for i in range(10):
    y = np.random.rand(100)
    plt.fill_between(x, y + 10-i, 10-i,
                     facecolor=colors[i],
                     edgecolor='w')
plt.show()
This works great, but is it possible to use histogram instead of a continuous curve?
EDIT:
A joypy-based approach, as mentioned in the comment by october:
import pandas as pd
import joypy
import numpy as np
import matplotlib.cm as cm
df = pd.DataFrame()
for i in range(0, 400, 20):
    df[i] = np.random.normal(i/410*5, size=30)
joypy.joyplot(df, overlap=2, colormap=cm.OrRd_r, linecolor='w', linewidth=.5)
For finer control of colors, you can define a color gradient function which accepts a fractional index and start and stop color tuples:
def color_gradient(x=0.0, start=(0, 0, 0), stop=(1, 1, 1)):
    r = np.interp(x, [0, 1], [start[0], stop[0]])
    g = np.interp(x, [0, 1], [start[1], stop[1]])
    b = np.interp(x, [0, 1], [start[2], stop[2]])
    return (r, g, b)
Usage:
joypy.joyplot(df, overlap=2, colormap=lambda x: color_gradient(x, start=(.78, .25, .09), stop=(1.0, .64, .44)), linecolor='w', linewidth=.5)
Examples with different start and stop tuples:
original answer:
You could iterate over your dataarrays you'd like to plot with plt.fill_between, setting colors to some gradient and the line color to white:
creating some sample data:
import numpy as np
t = np.linspace(-1.6, 1.6, 11)
y = np.cos(t)**2
y2 = lambda : y + np.random.random(len(y))/5-.1
plot the series:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
colors = cm.OrRd_r(np.linspace(.2, .6, 10))
plt.figure()
for i in range(10):
    plt.fill_between(t+i, y2()+10-i/10, 10-i/10, facecolor = colors[i], edgecolor='w')
If you want it tailored more closely to your example, consider providing some sample data.
EDIT:
As I commented below, I'm not quite sure whether I understand what you want, or whether what you ask for is the best choice for your task. Therefore, here is code which plots, besides the approach from your edit, two samples of how to present a bunch of histograms so that they are easier to compare:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.cm as cm
N = 10
np.random.seed(42)
colors=cm.OrRd_r(np.linspace(.2, .6, N))
fig1 = plt.figure()
x = np.arange(100)
for i in range(10):
    y = np.random.rand(100)
    plt.fill_between(x, y + 10-i, 10-i,
                     facecolor=colors[i],
                     edgecolor='w')
data = np.random.binomial(20, .3, (N, 100))
fig2, axs = plt.subplots(N, figsize=(10, 6))
for i, d in enumerate(data):
    axs[i].hist(d, range(20), color=colors[i], label=str(i))
fig2.legend(loc='upper center', ncol=5)
fig3, ax = plt.subplots(figsize=(10, 6))
ax.hist(data.T, range(20), color=colors, label=[str(i) for i in range(N)])
fig3.legend(loc='upper center', ncol=5)
This leads to the following plots:
your plot from your edit:
N histograms in N subplots:
N histograms side by side in one plot:
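To come back to the question in the edit (histograms instead of continuous curves in the stacked layout), one possible sketch is to bin each series with np.histogram and pass step="post" to fill_between, so that every bin is drawn as a flat bar. The sample data, the bin range, the vertical scale factor of 2 and the Agg backend are assumptions made only for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
n_series = 10
bins = np.linspace(-4, 8, 41)  # 40 bins shared by all series
colors = plt.get_cmap("OrRd_r")(np.linspace(0.2, 0.6, n_series))

fig, ax = plt.subplots()
for i in range(n_series):
    sample = rng.normal(loc=i / 3, size=500)  # made-up data for series i
    density, edges = np.histogram(sample, bins=bins, density=True)
    baseline = n_series - i  # later series sit lower and are drawn on top
    # With step="post" each height is held until the next edge, so the
    # last height is repeated to give one value per bin edge.
    heights = np.append(density, density[-1])
    ax.fill_between(edges, baseline + 2 * heights, baseline, step="post",
                    facecolor=colors[i], edgecolor="w")
```

Calling plt.show() with an interactive backend, or fig.savefig, displays the result. Because every series uses the same bins, the stacked histograms remain comparable along the x axis.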

Scatterplot in matplotlib with legend and randomized point order

I'm trying to build a scatterplot of a large amount of data from multiple classes in python/matplotlib. Unfortunately, it appears that I have to choose between having my data randomised and having legend labels. Is there a way I can have both (preferably without manually coding the labels?)
Minimum reproducible example:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
X = np.random.normal(0, 1, [5000, 2])
Y = np.random.normal(0.5, 1, [5000, 2])
data = np.concatenate([X,Y])
classes = np.concatenate([np.repeat('X', X.shape[0]),
np.repeat('Y', Y.shape[0])])
Plotting with randomized points:
plot_idx = np.random.permutation(data.shape[0])
colors = pd.factorize(classes)[0]  # factorize returns (codes, uniques); keep the integer codes
fig, ax = plt.subplots()
ax.scatter(data[plot_idx, 0],
data[plot_idx, 1],
c=colors[plot_idx],
label=classes[plot_idx],
alpha=0.4)
plt.legend()
plt.show()
This gives me the wrong legend.
Plotting with the correct legend:
from matplotlib import cm
unique_classes = np.unique(classes)
colors = cm.Set1(np.linspace(0, 1, len(unique_classes)))
fig, ax = plt.subplots()
# "class" is a reserved word in Python, so the loop variable is called cls
for i, cls in enumerate(unique_classes):
    ax.scatter(data[classes == cls, 0],
               data[classes == cls, 1],
               color=colors[i],
               label=cls,
               alpha=0.4)
plt.legend()
plt.show()
But now the points are not randomized and the resulting plot is not representative of the data.
I'm looking for something that would give me a result like I get as follows in R:
library(ggplot2)
X <- matrix(rnorm(10000, 0, 1), ncol=2)
Y <- matrix(rnorm(10000, 0.5, 1), ncol=2)
data <- as.data.frame(rbind(X, Y))
data$classes <- rep(c('X', 'Y'), each=nrow(X))
plot_idx <- sample(nrow(data))
ggplot(data[plot_idx,], aes(x=V1, y=V2, color=classes)) +
geom_point(alpha=0.4, size=3)
You need to create the legend manually. This is not a big problem though. You can loop over the labels and create a legend entry for each. Here one may use a Line2D with a marker similar to the scatter as handle.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
X = np.random.normal(0, 1, [5000, 2])
Y = np.random.normal(0.5, 1, [5000, 2])
data = np.concatenate([X,Y])
classes = np.concatenate([np.repeat('X', X.shape[0]),
np.repeat('Y', Y.shape[0])])
plot_idx = np.random.permutation(data.shape[0])
colors,labels = pd.factorize(classes)
fig, ax = plt.subplots()
sc = ax.scatter(data[plot_idx, 0],
data[plot_idx, 1],
c=colors[plot_idx],
alpha=0.4)
h = lambda c: plt.Line2D([],[],color=c, ls="",marker="o")
plt.legend(handles=[h(sc.cmap(sc.norm(i))) for i in range(len(labels))],
labels=list(labels))
plt.show()
Alternatively you can use a special scatter handler, as shown in the question Why doesn't the color of the points in a scatter plot match the color of the points in the corresponding legend? but that seems a bit overkill here.
It's a bit of a hack, but you can save the axis limits, set the labels by drawing points well outside the limits of the plot, and then resetting the axis limits as follows:
plot_idx = np.random.permutation(data.shape[0])
color_idx, unique_classes = pd.factorize(classes)
colors = cm.Set1(np.linspace(0, 1, len(unique_classes)))
fig, ax = plt.subplots()
ax.scatter(data[plot_idx, 0],
data[plot_idx, 1],
c=colors[color_idx[plot_idx]],
alpha=0.4)
xlim = ax.get_xlim()
ylim = ax.get_ylim()
for i in range(len(unique_classes)):
    ax.scatter(xlim[1]*10,
               ylim[1]*10,
               c=colors[i],
               label=unique_classes[i])
ax.set_xlim(xlim)
ax.set_ylim(ylim)
plt.legend()
plt.show()
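Another possible route, assuming Matplotlib 3.1 or newer, is to let the scatter collection build the legend handles itself with legend_elements() and to supply only the class names. The sketch below uses np.unique instead of pd.factorize to get integer codes (so the labels come out sorted), smaller made-up samples, and the Agg backend; all of these are choices for the example only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (500, 2))
Y = rng.normal(0.5, 1, (500, 2))
data = np.concatenate([X, Y])
classes = np.concatenate([np.repeat("X", len(X)), np.repeat("Y", len(Y))])

plot_idx = rng.permutation(len(data))  # randomized drawing order
labels, codes = np.unique(classes, return_inverse=True)

fig, ax = plt.subplots()
sc = ax.scatter(data[plot_idx, 0], data[plot_idx, 1],
                c=codes[plot_idx], alpha=0.4)
# One handle per distinct color value; the numeric labels are discarded
# and replaced by the class names.
handles, _ = sc.legend_elements()
legend = ax.legend(handles, list(labels))
```

The handles are returned in ascending order of the color values, which matches the order of labels because both come from the same np.unique call.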
