How to get matplotlib to place lines accurately? - python

By default, matplotlib plot can place lines very inaccurately.
For example, see the placement of the left endpoint in the attached plot. There's at least a whole pixel of air that shouldn't be there. In fact I think the line center is 2 pixels off.
How to get matplotlib to draw accurately? I don't mind if there is some performance hit.
Inaccurately rendered line in matplotlib plot:
Inaccurately rendered line in matplotlib plot - detail magnified:
This was made with the default installations in Ubuntu 16.04 (Python 3), Jupyter notebook (similar result from command line).
Mathematica, for comparison, does subpixel-perfect rendering directly and by default:
Why can't we?

Consider the following to see what is going on:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 4], clip_on=False, lw=5, alpha=.5)
ax.set_xlim([1, 3])
fig.savefig('so.png', dpi=400)
You can also disable pixel snapping by passing snap=False to plot. However, once you get down to placing a line that is roughly a single pixel wide, you are going to have issues because the underlying rasterization is too coarse.
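As a minimal sketch of the snap=False suggestion, the same plot can be drawn with snapping turned off for that one line. Rendering into an in-memory buffer is only a convenience of this sketch; the variable names are illustrative.

```python
import io
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Same data as above, but with pixel snapping disabled for this line.
(line,) = ax.plot([1, 2, 3], [4, 5, 4], clip_on=False, lw=5, alpha=.5, snap=False)
ax.set_xlim([1, 3])

# Render to an in-memory PNG; pass a filename instead to write to disk.
buf = io.BytesIO()
fig.savefig(buf, format='png', dpi=400)
```

Comparing this output with the snapped version at the same dpi should show whether snapping is what moves the endpoint.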

The problem is even more noticeable when one plots functions that are symmetric with respect to the x axis and slowly approach zero, as for example here:
which is indeed embarrassing if you are telling the reader of a scientific paper that the two curves are symmetric!
I worked around this problem by exporting to PDF instead of PNG:
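A minimal sketch of that workaround, assuming a pair of curves ±exp(-x) as a stand-in for the original data (which is not shown here). PDF is a vector format, so the curves are stored as paths rather than rasterized at save time.

```python
import io
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: two curves symmetric about the x axis, decaying to zero.
x = np.linspace(0, 10, 500)
y = np.exp(-x)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.plot(x, -y)

# Save as PDF; pass a filename such as 'figure.pdf' to write to disk instead.
pdf_buffer = io.BytesIO()
fig.savefig(pdf_buffer, format='pdf')
```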

I completely agree that this should be worked on. I find that plt.plot gives (at least more or less) unshifted lines (in Jupyter) by calling plt.figure with dpi=144 (the default is 72). The figures do become twice as big though...
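For reference, a short sketch of the dpi=144 approach described above, reusing the data from the first answer:

```python
import matplotlib.pyplot as plt

# Create the figure at twice the 72 dpi mentioned above.
fig = plt.figure(dpi=144)
plt.plot([1, 2, 3], [4, 5, 4])
```

The figure size in inches is unchanged, so the rendered image has twice as many pixels in each direction.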

Related

plt.scatter() plots behaving like plt.plot() plots in Matplotlib

I'm trying to compare the GDP per capita of the world's countries to each country's COVID-19 death total. Every time I try to turn it into a scatter plot, it displays the same plot as would be displayed using the plt.plot() command. Here is my code:
import pandas as pd
from matplotlib import pyplot as plt
plt.style.use('seaborn-whitegrid')
data = pd.read_csv(r'/Users/john.smith/covid-data.csv')
gdp = data["gdp_per_capita"]
deaths = data["total_deaths"]
plt.scatter(gdp, deaths)
plt.title('GDP-per-Capita Compared to COVID-19 Death Total')
plt.xlabel('GDP-per-Capita')
plt.ylabel('Confirmed Deaths')
plt.tight_layout()
plt.show()
While running this code, the following graph is produced. This is obviously not the scatter plot I'm trying to get, and it's worth noting that the only thing that changes when I use the plt.scatter() command is that the points on the plot just get very large.
I also tested Matplotlib separately in a different file. When I use normal variables without importing from a CSV file, like this:
x = [7, 3, 8, 3]
y = [1, 5, 7, 4]
plt.scatter(x, y)
Then the code works perfectly fine and produces a scatter plot. I have been digging for hours online to try and find a solution, and have tried to use other methods of importing CSVs or creating scatter plots but nothing is working. Thank you for any tips.
Answer is courtesy of G. Anderson in the comments above.
As it turns out, I just didn't have experience with the xlim() and ylim() commands, so the individual points in the scatter plot overlapped very tightly in vertical lines. This happened simply because the original view window was too wide for a dataset this large.
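A minimal sketch of narrowing the view window, assuming randomly generated values in place of the CSV columns (the real data is not reproduced here, and the limits are only illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins for data["gdp_per_capita"] and data["total_deaths"].
rng = np.random.default_rng(0)
gdp = rng.uniform(500, 100000, 200)
deaths = rng.uniform(0, 150000, 200)

fig, ax = plt.subplots()
ax.scatter(gdp, deaths)
# Restrict the visible range so individual points can be told apart.
ax.set_xlim(10000, 20000)
ax.set_ylim(10000, 20000)
```

With the pyplot interface, plt.xlim(10000, 20000) and plt.ylim(10000, 20000) do the same for the current axes.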
I did some slight additional research to try and put two plots onto a single figure with one being zoomed in, here's the code:
fig, axs = plt.subplots(2)
fig.suptitle('GDP-per-Capita Compared to COVID-19 Death Total')
axs[0].scatter(gdp, deaths)  # full view
axs[1].scatter(gdp, deaths)  # same data, zoomed in below
# Limit the second axes to [xmin, xmax, ymin, ymax]
axs[1].axis([10000, 20000, 10000, 20000])
This produced some nice plots I can use:
I'm going to look into ways to make the two plots much more readable.

matplotlib legend performance issue

I am using Jupyter Notebook with Python 3.6.2 and matplotlib to plot some data.
When I plot my data, I want to add a legend to the plot (basically to know which line is which).
However calling plt.legend takes a lot of time (almost as much as the plot itself, which to my understanding should just be instant).
Minimal toy problem that reproduces the issue:
import numpy as np
import matplotlib.pyplot as plt
# Toy useless data (one million x 4)
my_data = np.random.rand(1000000,4)
plt.plot(my_data)
#plt.legend(['A','C','G','T'])
plt.show()
The data here is just random and useless, but it reproduces my problem:
If I uncomment the plt.legend line, the run takes almost double the time
Why? Shouldn't the legend just look at the plot, see that 4 plots have been made, and draw a box assigning each color to the corresponding string?
Why is a simple legend taking so much time?
Am I missing something?
Replicating the answer by @bnaecker, such that this question is answered:
By default, the legend will be placed in the "best" location, which requires computing how many points from each line are inside a potential legend box. If there are many points, this can take a while. Drawing is much faster when specifying a location other than "best", e.g. plt.legend(loc=3).
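A minimal sketch of the fix, with the toy data shrunk to 100,000 rows so it runs quickly (that size is a choice made for this sketch, not part of the original question):

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy data as in the question, reduced to 100,000 x 4.
my_data = np.random.rand(100000, 4)

fig, ax = plt.subplots()
ax.plot(my_data)
# loc='lower left' is the named equivalent of loc=3; a fixed location
# avoids the placement search that loc='best' performs.
legend = ax.legend(['A', 'C', 'G', 'T'], loc='lower left')
```

Any other fixed location, such as 'upper right', should avoid the search in the same way.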

SVG rendering issues using iPython inline plots

When I use inline plots in IPython (QtConsole), the first plot looks (more or less) fine, but then it gets weirder and weirder. When I plot something several times (so plot, see it displayed, plot again, see output etc.), it looks like it is being overlaid with the skewed previous picture. So after plotting a diagonal line (x=y) 4 times in a row I get something like this
If I right-click and export it as SVG, everything looks good
(an exported PNG picture remains wrecked like the first one).
I guess the problem is similar to https://github.com/ipython/ipython/issues/1866, but I didn't get the upshot of the discussion (it got too technical and complicated for me to follow).
Is there any solution or work around for this issue?
I'm using
python 2.7
matplotlib 1.4.1
IPython 2.1.0
Here is a working example:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
a=range(10)
fig,ax=plt.subplots()
ax.plot(a,a)
ax.axis('off')
If you remove the ax.axis('off') line, weird things happen only outside of the axes box.
P.S. Originally I encountered this problem in connection with drawing graphs with networkx. If I use draw from networkx this problem does not occur. If I use draw_networkx, same as described above happens. That might point to the core of the problem... I'm trying to figure out what line of code makes one work better than the other...
After tinkering around with the draw and draw_networkx functions from networkx module, I found the workaround which makes the difference between draw and draw_networkx in this case.
Adding fig.set_facecolor('w') overlays whatever is in the background, so the new plots are started with a white sheet (but not a blank one, I guess).
So new working example is:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
a=range(10)
fig,ax=plt.subplots()
fig.set_facecolor('w')
ax.plot(a,a)
ax.axis('off')

matplotlib tripcolor - removing edges

I'm using tripcolor from matplotlib.pyplot to generate color plots of some data. The plot works great, but I'd like to turn off the edges which are drawn between data points:
subtle but pretty noticeable if you zoom in. I tried to get rid of them via:
plt.tripcolor(1e4*data_z, data_phi, data_I/1e3, shading='flat', edgecolors='none')
but the edgecolors='none' keyword arg seems to have no effect. I can, however, change the color from white to something else. Is there a way to get rid of them altogether?
I tried with an example from the official documentation, and the edgecolors property seems to work well.
This is the result with edgecolors='w':
And this with edgecolors='none':
I am using WinPython 3.3.5 (with Matplotlib 1.3.1) under Windows 7, maybe you have a different version?
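A quick way to check this on your own installation is a sketch like the following, which uses random points as hypothetical stand-ins for data_z, data_phi and data_I:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical scattered data in place of the question's arrays.
rng = np.random.default_rng(0)
x = rng.random(200)
y = rng.random(200)
z = np.hypot(x - 0.5, y - 0.5)

fig, ax = plt.subplots()
# Flat shading with triangle edges switched off.
tpc = ax.tripcolor(x, y, z, shading='flat', edgecolors='none')
```

If edges still show up with this sketch, comparing matplotlib versions as suggested above is a reasonable next step.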

Sloppy SVG generated by matplotlib resulting in poor clipping of datapoint drawing in figures

Figures that I create with matplotlib do not properly clip points to the figure axes when rendered, but instead draw additional points, even though such figures look fine in some viewers.
For example (following an example from the documentation) using
import matplotlib
matplotlib.use('SVG')
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
fig = plt.figure()
ax = fig.add_subplot(111)
x, y = 12*np.random.rand(2, 1000)
ax.set(xlim=[2,10])
ax.plot(x, y, 'go') # plot some data in data coordinates
circ = patches.Circle((0.5, 0.5), 0.25, transform=ax.transAxes,
                      facecolor='yellow', alpha=0.5)
ax.add_patch(circ)
plt.savefig('figure.svg')  # savefig needs an output path; this filename is a placeholder
When I view the result in OS X Preview, for example, I seem to get
but when I view it in other editors, such as iDraw, I get a mess (where, weirdly, there is a combination of correct clipping of edge points, failed clipping of points outside the axes, and clipping of the canvas at a point that does not correspond to either the axes or the range of data):
I'm not experienced with SVG, but those I've asked tell me that
I looked at the SVG file and didn't like what I saw. Characters are
flattened, and definition sections are scattered throughout the file
instead of being at the top; some defs are inside graphics constructs.
There's a lot of cruft. It turns out the definition of the clip-path
is at the very end of the svg file -- after all the uses ...
How can I get matplotlib to generate SVG that does not have these issues? I know that I can edit the SVG, but I have no idea how, and doing so defeats the purpose. I hope it is not necessary to add a "by hand" step to my workflow.
I'm interested in understanding what the cause of the sloppy SVG generated by matplotlib is: whether it's something that can be avoided by coding a bit differently (though not, clearly, by simply checking whether every data point is in range), or whether it's a bug in matplotlib (or perhaps whether it's just a problem with ambiguities in the SVG standard). The goal is getting matplotlib to generate SVG that is not buggy.
This is probably related to a known issue that also comes up in PDFs (matplotlib data accessible outside of xlim range).
See Issues #2488 and #2423 (the latter of which includes a proposed fix for PDF). It is milestoned for 1.4.
