matplotlib legend performance issue - python

I am using Jupyter Notebook with Python 3.6.2 and matplotlib to plot some data.
When I plot my data, I want to add a legend to the plot (basically to know which line is which).
However, calling plt.legend takes a lot of time (almost as much as the plot itself, when to my understanding it should be nearly instant).
Minimal toy problem that reproduces the issue:
import numpy as np
import matplotlib.pyplot as plt
# Toy useless data (one million x 4)
my_data = np.random.rand(1000000,4)
plt.plot(my_data)
#plt.legend(['A','C','G','T'])
plt.show()
The data here is just random and useless, but it reproduces my problem:
if I uncomment the plt.legend line, the run takes almost double the time.
Why? Shouldn't the legend just look at the plot, see that 4 plots have been made, and draw a box assigning each color to the corresponding string?
Why is a simple legend taking so much time?
Am I missing something?

Reproducing the answer by @bnaecker, so that this question has an answer:
By default, the legend will be placed in the "best" location, which requires computing how many points from each line are inside a potential legend box. If there are many points, this can take a while. Drawing is much faster when specifying a location other than "best", e.g. plt.legend(loc=3).
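As a minimal sketch of the suggestion above, the legend can be given an explicit location so matplotlib does not have to search for the "best" one. The function name plot_with_fixed_legend, the reduced number of points, and the Agg backend are assumptions made only so that the example runs quickly outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs anywhere
import numpy as np
import matplotlib.pyplot as plt


def plot_with_fixed_legend(n_points=100_000, loc="lower left"):
    """Plot four random lines and attach a legend at a fixed location."""
    data = np.random.rand(n_points, 4)
    fig, ax = plt.subplots()
    lines = ax.plot(data)  # one Line2D per column
    # A fixed loc (here "lower left", the same as loc=3) skips the "best" search.
    legend = ax.legend(lines, ["A", "C", "G", "T"], loc=loc)
    fig.canvas.draw()  # force rendering, which is where legend placement happens
    return fig, legend


fig, legend = plot_with_fixed_legend()
labels = [text.get_text() for text in legend.get_texts()]
plt.close(fig)
```

Any of the named locations ("upper right", "lower left", and so on) should avoid the search; only loc="best" inspects the plotted data.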

Related

How to fix matplotlib overlapping graphs [duplicate]

I am trying to use a for loop to produce a figure for each set of data I have, but while the .show() command produces the correct figure, .savefig() keeps adding the previously plotted values to the new figure.
This is the relevant part of the code inside my for loop.
import matplotlib.pyplot as plt
plt.plot(X,Y[:,0],'o-')
plt.xlabel('x')
plt.savefig('plot'+str(i)+'.png')
As a comparison, here is the savefig plot and here is that shown by show(). As can be seen, the savefig() plot also plotted the previous result.
You have to close the current figure after saving, using plt.close(): http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.close
Alternatively, clear the current figure after saving with plt.clf(): http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.clf
I made some beautiful charts before I included plt.clf() to clear the plot each time through the loop.
scatterplot1
scatterplot2
In other words, my earlier plots kept being added to the same figure inside my for loop, as shown in the plots above. Calling plt.clf() on each pass through the loop fixed this, because it clears the figure so that every iteration starts from an empty one.
TL;DR: I included plt.clf() to clear the plot each time through the loop.
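The advice in these answers can be sketched as a loop that creates one figure per data set and closes it after saving. The arrays X and Y, the temporary output directory, and the Agg backend are placeholders assumed for illustration; they are not the asker's data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for a script run
import os
import tempfile
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: three columns of Y, each saved to its own file.
X = np.linspace(0, 1, 20)
Y = np.random.rand(20, 3)
out_dir = tempfile.mkdtemp()

saved = []
line_counts = []
for i in range(Y.shape[1]):
    fig, ax = plt.subplots()           # fresh figure for every iteration
    ax.plot(X, Y[:, i], "o-")
    ax.set_xlabel("x")
    path = os.path.join(out_dir, "plot" + str(i) + ".png")
    fig.savefig(path)
    line_counts.append(len(ax.lines))  # should stay at one line per figure
    plt.close(fig)                     # release the figure so nothing carries over
    saved.append(path)
```

Calling plt.clf() after plt.savefig() in the original loop has the same effect on the output files; creating and closing a figure per iteration also keeps memory use from growing in long loops.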

Python/Seaborn: What does the inside horizontal distribution of the data-points means or is it random?

It seems that the horizontal placement of the data points within each category is almost random every time you plot (using Seaborn). Is it for ease of readability, or does it serve some other meaningful purpose?
I am using Python 3.0 and the 'tips' dataset provided by Seaborn for this question.
import seaborn as sns
tips = sns.load_dataset("tips")
After running the same code below twice, I see differences in how the points are placed. Here is the code, which you can run a couple of times:
ax = sns.stripplot(x="day", y="total_bill", data=tips, alpha=.55,
palette='Set1', jitter=True, linewidth=1 )
Now, if you look at the plots (after running it twice, for example), you will notice that the placement of the points is not the same between the two plots:
Please explain why the points are not placed identically in two separate runs. Also, looking at those points on the horizontal scale: is there a reason why (for example) one red point is further left than another red point, or is it simply for readability?
Thank you in advance!
After a bit more research, I believe that the horizontal placement of the data points is random, drawn from a uniform distribution (thank you @ImportanceOfBeingErnest for pointing to the code). So, to answer my own question: the horizontal placement has no hidden meaning, and the horizontal spread is there only for visibility. Whether it changes between runs depends on whether a random seed is set.
I think both displays are identical along the vertical axis, since they are the same strip plot of the same dataset. The slight visual differences are in the position along the horizontal (categorical day) axis. They come from the jitter option (jitter=True), which adds a small random horizontal offset to each point around the center line of its category (day). Jitter helps to distinguish points with the same total_bill value, which would otherwise be drawn on top of each other, so the difference is due to jitter=True and exists only for readability.
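To illustrate the idea of uniform jitter and the effect of a seed, here is a hand-rolled sketch using only NumPy. It is not Seaborn's actual implementation; the function jittered_positions and its width parameter are hypothetical names chosen for this example.

```python
import numpy as np


def jittered_positions(category_index, n_points, width=0.1, seed=None):
    """Return x positions scattered uniformly around a category's center line.

    Hypothetical helper: mimics the idea of jitter, not Seaborn's code.
    """
    rng = np.random.RandomState(seed)
    offsets = rng.uniform(-width, width, size=n_points)
    return category_index + offsets


# Same seed: the horizontal positions repeat exactly.
a = jittered_positions(0, 5, seed=42)
b = jittered_positions(0, 5, seed=42)

# No seed: positions differ from run to run, but stay within +/- width.
c = jittered_positions(0, 5)
```

The y values (total_bill in the question) are never touched by this step, which is why the two plots agree vertically. With Seaborn itself, setting NumPy's global seed before calling stripplot is the approach usually suggested for repeatable jitter, though that depends on the Seaborn version.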

Superimposing some plots with a txt file

I am trying to reproduce the attached figure step by step. My first problem was how to plot the colorbar in the figure above from my data. My data is cosmological data and has 7 columns in total, with many rows. My main goal is to reproduce the figure step by step. You can see that there are three different plots superimposed on each other. First, I tried to plot the small colorful lines in the body of the figure using two columns of the data. I did this with scatter plots, and then I needed to reproduce the colorbar part of the figure. That was not possible at the first attempt, because the colorbar values were not part of the data. I then obtained the colorbar values by some calculations and added them to the data as additional columns. After that, I could use the simple colorbar function for the colorbar part, and it worked. For the next step, I need to turn the small curved lines into dark solid lines.
How can I do plots in matplotlib?
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
data1 = np.loadtxt("bei_predic.txt", unpack=True)
B = np.log10(data1[3]/(4.*(data1[2])))
R = np.vstack((data1,B))
R = np.transpose(R)
D = R[~np.isnan(R).any(axis=1)]
A = plt.scatter(D[:,3],D[:,2], c=D[:,8])
cbar= plt.colorbar()
cbar.set_label("file", labelpad=+1)
plt.show()
If you could start off by telling us a little bit about the data that you are using that would be great. In order to plot the figure that you want, we must first load the data into some variables. Have you managed to do this?
Check out this example in which the author plots multicolored lines for some guidance.
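One common way to draw a line whose color follows a data value, together with a matching colorbar, is matplotlib's LineCollection. The sketch below uses synthetic x, y, and values arrays as stand-ins, since the contents of bei_predic.txt are not known; it shows the general technique, not a reproduction of the target figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for a script run
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

# Synthetic stand-ins for two data columns and the colorbar quantity.
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)
values = np.cos(x)[:-1]  # one color value per line segment

# Build an (n_segments, 2, 2) array: each segment joins two consecutive points.
points = np.column_stack([x, y]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

fig, ax = plt.subplots()
lc = LineCollection(segments, cmap="viridis")
lc.set_array(values)   # values are mapped to colors through the colormap
lc.set_linewidth(2)
ax.add_collection(lc)
ax.autoscale()         # collections do not update the axis limits by themselves
cbar = fig.colorbar(lc, ax=ax)
cbar.set_label("value")

n_segments = len(segments)
n_colors = len(lc.get_array())
plt.close(fig)
```

A dark solid line can then be superimposed on the same axes with an ordinary ax.plot(x, y, color="k") call.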

SVG rendering issues using iPython inline plots

When I use inline plots in IPython (QtConsole), the first plot looks (more or less) fine, but then it gets weirder and weirder. When I plot something several times (plot, see it displayed, plot again, see the output, etc.), it looks as if it is being overlaid with a skewed copy of the previous picture. So after plotting a diagonal line (x=y) 4 times in a row I get something like this
If I right-click and export it as SVG, everything looks good
(an exported PNG picture remains as wrecked as the first one).
I guess the problem is similar to https://github.com/ipython/ipython/issues/1866, but I didn't get the upshot of the discussion (it got too technical and complicated for me to follow).
Is there any solution or work around for this issue?
I'm using
python 2.7
matplotlib 1.4.1
IPython 2.1.0
Here is a working example:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
a=range(10)
fig,ax=plt.subplots()
ax.plot(a,a)
ax.axis('off')
If you remove the ax.axis('off') line, weird things happen only outside of the axis box.
P.S. Originally I encountered this problem in connection with drawing graphs with networkx. If I use draw from networkx this problem does not occur. If I use draw_networkx, same as described above happens. That might point to the core of the problem... I'm trying to figure out what line of code makes one work better than the other...
After tinkering around with the draw and draw_networkx functions from networkx module, I found the workaround which makes the difference between draw and draw_networkx in this case.
Adding fig.set_facecolor('w') overlays whatever is in the background, so the new plots are started with a white sheet (but not a blank one, I guess).
So new working example is:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
a=range(10)
fig,ax=plt.subplots()
fig.set_facecolor('w')
ax.plot(a,a)
ax.axis('off')

Pyplot plot function

Given below is the code for plotting points using pyplot.
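import math
import matplotlib.pyplot as plt
# p, k, val and thetaval are defined earlier in the asker's code (not shown)
x1=300+p[k]*math.cos(val[thetaval])
y1=300+p[k]*math.sin(val[thetaval])
plt.plot(x1,y1,'k.')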
The plotting works fine. To plot each value as a point, I specify the dot in 'k.' inside the plot function. The output is something like:
The width of the black line/curve that I am plotting is much more than needed. How can I reduce it?
It seems that you are not plotting a line but a series of small points. Setting the markersize argument of the plot function might work.
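Looking into the documentation of plot() you can find "linewidth", but it only controls the line that connects the points. With the 'k.' format no connecting line is drawn, so the size of the dots is set by "markersize" instead.
So use:
plt.plot(x1,y1,'k.', markersize=1)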
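Following the markersize suggestion above, here is a small self-contained sketch. The circle of radius 100 centered at (300, 300) is an assumed stand-in for the asker's p, val, and thetaval, which are not shown in the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for a script run
import numpy as np
import matplotlib.pyplot as plt

# Assumed stand-in data: points on a circle around (300, 300).
theta = np.linspace(0, 2 * np.pi, 500)
x1 = 300 + 100 * np.cos(theta)
y1 = 300 + 100 * np.sin(theta)

fig, ax = plt.subplots()
# 'k.' draws black point markers with no connecting line;
# markersize (in points) controls how thick the dotted curve appears.
(line,) = ax.plot(x1, y1, "k.", markersize=1)
ax.set_aspect("equal")

marker = line.get_marker()
size = line.get_markersize()
plt.close(fig)
```

If a continuous curve is wanted rather than dots, the format 'k-' together with a small linewidth is the setting that applies.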
