Closing a figure in Python

I am making a lot of plots and saving them to a file, it all works, but during the compilation I get the following message:
RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
fig = self.plt.figure(figsize=self.figsize)
So I think I could improve the code by closing the figures. I googled it and found that I should use fig.close(). However, I get the following error: 'Figure' object has no attribute 'close'. How should I make it work?
This is the loop in which I create plots:
for i in years:
    ax = newdf.plot.barh(y=str(i), rot=0)
    fig = ax.get_figure()
    fig.savefig('C:\\Users\\rysza\\Desktop\\python data analysis\\zajecia3\\figure'+str(i)+'.jpeg',bbox_inches='tight')
    fig.close()

Replace fig.close() with plt.close(fig). close is a function defined directly in the matplotlib.pyplot module, not a method of the Figure object.
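As a rough sketch of the corrected loop: the data, labels, and output folder below are made-up stand-ins for the asker's newdf and Windows path, and the bars are drawn with plain matplotlib rather than pandas. The plt.close(fig) call is the same for a figure obtained through ax.get_figure().

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical stand-in data: one list of values per year.
data = {2019: [3, 5, 2], 2020: [4, 1, 6], 2021: [2, 2, 7]}
labels = ["a", "b", "c"]
out_dir = tempfile.mkdtemp()  # placeholder for the real output folder

for year, values in data.items():
    fig, ax = plt.subplots()
    ax.barh(labels, values)
    fig.savefig(os.path.join(out_dir, f"figure{year}.png"), bbox_inches="tight")
    plt.close(fig)  # pyplot function, not a Figure method

# Numbers of the figures pyplot still keeps open.
print(plt.get_fignums())
```

In a fresh session the printed list should be empty, which means no figures are being retained between iterations.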

Try matplotlib.pyplot.close(fig). For more information, refer to the documentation:
https://matplotlib.org/2.1.0/api/_as_gen/matplotlib.pyplot.close.html

Related

How to close Seaborn plots

I am running a loop to extract data and graph plots using Seaborn, Pandas and Python. I just want to save each plot as a graphic and close it but I am not able to figure out how to do this.
/usr/local/lib/python3.6/dist-packages/seaborn/axisgrid.py:311: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (matplotlib.pyplot.figure) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam figure.max_open_warning).
I had expected g.close() to work but I got the error:
AttributeError: 'FacetGrid' object has no attribute 'close'
for o in options:
    s = "SELECT * from options_yahoo where contract_name = '" + o + "'"
    SQL_Query = pd.read_sql_query(s, conn)
    df = pd.DataFrame(SQL_Query)
    g = sns.relplot(kind="line", data=df[['bid','ask','lastprice']])
    g.savefig(o + ".png")
    g.close()
I would like a more efficient solution that doesn't use up so much memory or trigger warnings. Some best practices would be much appreciated.
Seaborn plots respond to pyplot commands. You can call plt.close() to close the current figure, even if it was plotted by Seaborn.
If you want to close a specific figure corresponding to a seaborn plot (e.g. a FacetGrid) called sns_plot, use:
plt.close(sns_plot.fig)
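The figure behind a FacetGrid is an ordinary matplotlib Figure, so the difference between the two calls above can be sketched with plain matplotlib. The three empty figures here are only placeholders for Seaborn output; plt.close("all") is included as a third option for cleaning up everything at once.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig_a = plt.figure()
fig_b = plt.figure()
fig_c = plt.figure()  # most recently created, so it is the current figure

plt.close(fig_a)  # close one specific figure by reference
open_after_specific = plt.get_fignums()

plt.close()  # no argument: closes the current figure (fig_c)
open_after_current = plt.get_fignums()

plt.close("all")  # close every figure pyplot still tracks
open_after_all = plt.get_fignums()
```

Passing the figure explicitly is the safer choice inside a loop, because it does not depend on which figure happens to be current.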

Is there a way to prevent plotnine from printing user warnings when saving ggplot objects to a file?

I'm building a simulation tool in python that outputs a number of plots using plotnine. However, for each individual plot I save, I get the following error messages:
C:\Users\tarca\Anaconda3\lib\site-packages\plotnine\ggplot.py:706: UserWarning: Saving 10 x 3 in image.
from_inches(height, units), units))
C:\Users\tarca\Anaconda3\lib\site-packages\plotnine\ggplot.py:707: UserWarning: Filename: my_plot.png
warn('Filename: {}'.format(filename))
I've already tried manually setting all of the arguments, and I've tried saving the files using both plot.save() and ggsave() - both yield the same result. If you search the error, the only thing that comes up is that the author of the following tutorial gets the same errors, though they are not addressed therein:
https://monashdatafluency.github.io/python-workshop-base/modules/plotting_with_ggplot/
To save the plots, I'm using code similar to:
plot.save(filename = 'my_plot.png', width = 10, height = 3, dpi = 300)
I'm hoping to be able to save the plots without generating any annoying messages that may confuse anyone using the program.
I'm not sure why this warning is still displayed in the tutorial you linked to, because as soon as I do
import warnings
warnings.filterwarnings('ignore')
as described at the very beginning of that tutorial, the UserWarning that was previously printed when saving a plot to disk is successfully suppressed.
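Note that warnings.filterwarnings('ignore') silences every warning for the rest of the program. A narrower variant, sketched here with the standard library only, limits the suppression to the save call. The save_plot function is a hypothetical stand-in that merely emits a UserWarning, not plotnine's actual save method.

```python
import warnings


def save_plot(filename):
    """Hypothetical stand-in for a save call that emits a UserWarning."""
    warnings.warn(f"Filename: {filename}", UserWarning)
    return filename


def save_quietly(filename):
    """Call save_plot with UserWarnings ignored only for this call."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=UserWarning)
        return save_plot(filename)


save_quietly("my_plot.png")  # no warning is shown for this call
```

Once the with block exits, the previous warning filters are restored, so warnings raised elsewhere in the program remain visible.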
Yes there is, just use:
fig2.save(fig_dir + "/figure2.png", width = w, height = h, verbose = False)
If you don't specify verbose = False, plotnine will always display a warning. See the module on their GitHub for why.

How to save a nltk FreqDist plot?

I've tried different methods to save my plot, but everything I've tried has produced a blank image and I'm now out of ideas. Does anyone have other suggestions that could fix this? The code sample is below.
word_frequency = nltk.FreqDist(merged_lemmatizedTokens) #obtains frequency distribution for each token
print("\nMost frequent top-10 words: ", word_frequency.most_common(10))
word_frequency.plot(10, title='Top 10 Most Common Words in Corpus')
plt.savefig('img_top10_common.png')
I was able to save the NLTK FreqDist plot, when I first initialized a figure object, then called the plot function and finally saved the figure object.
import matplotlib.pyplot as plt
from nltk.probability import FreqDist
fig = plt.figure(figsize = (10,4))
plt.gcf().subplots_adjust(bottom=0.15) # to avoid x-ticks cut-off
fdist = FreqDist(merged_lemmatizedTokens)
fdist.plot(10, cumulative=False)
plt.show()
fig.savefig('freqDist.png', bbox_inches = "tight")
I think you can try the following:
plt.ion()
word_frequency.plot(10, title='Top 10 Most Common Words in Corpus')
plt.savefig('img_top10_common.png')
plt.ioff()
plt.show()
This is because inside nltk's plot function, plt.show() is called and once the figure is closed, plt.savefig() has no active figure to save anymore.
The workaround is to turn interactive mode on, such that the plt.show() from inside the nltk function does not block. Then savefig is called with a current figure available and saves the correct plot. To then show the figure, interactive mode needs to be turned off again and plt.show() be called externally - this time in a blocking mode.
Ideally, nltk would rewrite their plotting function to either allow setting the blocking status, or not show the plot and return the created figure, or take an Axes as input to plot to. Feel free to reach out to them with this request.
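The blank image can be reproduced without nltk. In this sketch, plt.close(fig) stands in for what happens once the window of a blocking plt.show() has been closed; the plotted data is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])

# Stand-in for the figure being destroyed after a blocking show().
plt.close(fig)

# With no figure left, pyplot silently creates a new, empty one.
current = plt.gcf()
is_same_figure = current is fig
axes_on_current = len(current.axes)

plt.close(current)
```

Here is_same_figure is False and axes_on_current is 0, so a plt.savefig() call at this point would write the empty replacement figure instead of the plot.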

Running code line by line versus as a selection (in Ipython using Spyder)

I'm slowly transitioning from R to Python, and some of the more subtle differences are driving me a bit nuts. I found a really interesting guide on Effectively Using Matplotlib in the blog Practical Business Python
Here, the author shows how to build a chart step by step using the following lines of code (short version):
# Modules
import pandas as pd
import matplotlib.pyplot as plt
# Get the data
df = pd.read_excel("https://github.com/chris1610/pbpython/blob/master/data/sample-salesv3.xlsx?raw=true")
df.head()
# Rearrange data
top_10 = (df.groupby('name')[['ext price', 'quantity']].agg({'ext price': 'sum', 'quantity': 'count'})
          .sort_values(by='ext price', ascending=False))[:10].reset_index()
top_10.rename(columns={'name': 'Name', 'ext price': 'Sales', 'quantity': 'Purchases'}, inplace=True)
# Customize plot
plt.style.use('ggplot')
# And here's the part that puzzles me:
fig, ax = plt.subplots()
top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
I'm messing around with this in Spyder, and I've noticed that there is a difference between running parts of this code line by line versus running the same lines as a selection.
[Screenshots not preserved: Option 1, step 1; Option 1, step 2; Option 2]
I was guessing that the result somehow would be the same "under the hood", and I've tried rendering the chart using plt.show, plt.draw and fig.draw. But no luck so far.
I assume that the answer to this has something to do with very basic functionality in IPython and/or how these elements are assigned to memory, but the whole thing just leaves me confused. I'm hoping some of you can find the time to explain this, and perhaps offer further suggestions on how to wrestle with these things.
Thanks!
Edit:
I'm using Spyder 2.3.8 with Python 3.5.1 on Windows
In the IPython console within Spyder, a figure is shown if a figure object is detected in the cell.
Because fig, ax = plt.subplots() contains a figure object, the (empty) figure is shown.
If a plotting command on the axes is executed afterwards, no figure object is detected, so only the return value of the cell is shown as text.
plt.show() will not help here (don't ask me why it hasn't been implemented).
However, at any point you can simply state the reference to the figure, fig, to obtain the image of the figure.
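A small sketch of why the reference is enough: the figure object keeps everything drawn on its axes, no matter which cell did the drawing. The bar data is made up, and the Agg backend is chosen only so the sketch also runs outside a console.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# "Cell 1": creates the (still empty) figure and axes.
fig, ax = plt.subplots()

# "Cell 2": draws onto the existing axes; no new figure object is created.
ax.barh(["A", "B"], [3, 5])

# In an IPython console, a final line consisting only of `fig` would display it.
# In a plain script, the same reference can be rendered into an in-memory PNG.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```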

Efficient memory usage in python

I wrote a Python script using pyodbc to transfer data from an Excel sheet into MS Access, and also using matplotlib to create plots from some of the data in the Excel sheet and save them to a folder. When I ran the script, it did what I expected it to do; however, I was monitoring it with Task Manager and it ended up using over 1500 MB of RAM!
I don't even understand how that is possible. It created 560 images but the total size of those images was only 17 MB. The Excel sheet is 8.5 MB. I understand that maybe you can't tell me exactly what the problem is without seeing all my code (I don't know exactly what the problem is so I would just have to post the whole thing and I don't think it is reasonable to ask you to read my entire code) but some general guidelines would suffice.
Thanks.
Update
I did just as @HYRY suggested and split my code up. I ran the script first with only the matplotlib functions and then afterwards without them. As those who have commented so far suspected, the memory hog is the matplotlib functions. Now that we have narrowed it down, I will post some of my code. Note that the code below executes inside two for loops. The inner for loop always executes four times, while the outer for loop executes as many times as necessary.
#Plot waveform and then relative harmonic orders on a bar graph.
#Remember that table is the sheet name which is named after the ExperimentID
cursorEx.execute('select ['+phase+' Time] from ['+table+']')
Time = cursorEx.fetchall()
cursorEx.execute('select ['+phase+' Waveform] from ['+table+']')
Current = np.asanyarray(cursorEx.fetchall())
experiment = table[ :-1]
plt.figure()
#A scale needs to be added to the primary current values
if line == 'P':
    ratioCurrent = Current / 62.5
    plt.plot(Time, ratioCurrent)
else:
    plt.plot(Time, Current)
plt.title(phaseTitle)
plt.xlabel('Time (s)')
plt.ylabel('Current (A)')
plt.savefig(os.getcwd()+'\\HarmonicsGraph\\'+line+'H'+experiment+'.png')
cursorEx.execute('select ['+phase+' Order] from ['+table+']')
cursorEx.fetchone() #the first row is zero
order = cursorEx.fetchmany(51)
cursorEx.execute('select ['+phase+' Harmonics] from ['+table+']')
cursorEx.fetchone()
percentage = np.asanyarray(cursorEx.fetchmany(51))
intOrder = np.arange(1, len(order) + 1, 1)
plt.figure()
plt.bar(intOrder, percentage, width = 0.35, color = 'g')
plt.title(orderTitle)
plt.xlabel('Harmonic Order')
plt.ylabel('Percentage')
plt.axis([1, 51, 0, 100])
plt.savefig(os.getcwd()+'\\HarmonicsGraph\\'+line+'O'+experiment+'.png')
I think that plt.close() is a heavy-handed solution when you make more than one plot in your script. Often, if you keep a reference to the figure, you can do all the work on that one figure by calling this before each new plot:
plt.clf()
You will see that your code runs faster (it is not the same as generating a new canvas every time!). Memory leaks are terrible when you create multiple axes or figures without proper cleanup!
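A minimal sketch of the reuse approach, assuming a single figure is enough for all plots. The data, titles, and output folder are made up, and fig.clf() is the method form of plt.clf() applied to a specific figure.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

out_dir = tempfile.mkdtemp()  # placeholder for the real output folder
open_counts = []

fig = plt.figure()  # created once, reused for every plot
for i in range(5):
    fig.clf()  # wipe the previous plot instead of opening a new figure
    ax = fig.add_subplot(111)
    ax.plot([0, 1, 2], [i, i + 1, i + 2])
    ax.set_title(f"Plot {i}")
    fig.savefig(os.path.join(out_dir, f"plot{i}.png"))
    open_counts.append(len(plt.get_fignums()))

plt.close(fig)  # release the single figure at the end
```

The number of open figures recorded in open_counts stays constant across iterations, which is what keeps memory use flat.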
I don't see a cleanup portion in your code, but I'd be willing to bet that the issue is that you are not calling
plt.close()
after you are finished with each plot. Add that one line after you are done with each figure and see if it helps.
