How to show timeline in matplotlib.axes.Axes.stem plot? - python

I am making a matplotlib.axes.Axes.stem plot whose x-axis is a date line showing days. Some days have data. Other days have none, because no records exist for them in my data.
Question 1: How do I make a timeline stem plot that shows my data and also includes the days with no data? Is this possible, and is there a way to auto-scale the x-axis to handle this situation?
Below are a sample data file, test.txt, and my Python script, which reads the data and draws a timeline stem plot. The script's output is also shown below.
Question 2 (presentation): How do I show a "-" symbol at each annotation, and how do I rotate the annotations by 30 degrees?
test.txt
No. Date
1 23/01/2020
2 24/01/2020
3 24/01/2020
4 26/01/2020
5 27/01/2020
6 28/01/2020
7 29/01/2020
8 29/01/2020
9 30/01/2020
10 30/01/2020
11 31/01/2020
12 31/01/2020
13 01/02/2020
14 01/02/2020
15 04/02/2020
16 04/02/2020
17 04/02/2020
18 05/02/2020
19 05/02/2020
20 05/02/2020
21 06/02/2020
22 07/02/2020
23 07/02/2020
24 07/02/2020
25 08/02/2020
26 08/02/2020
27 08/02/2020
28 08/02/2020
29 08/02/2020
30 09/02/2020
31 10/02/2020
32 10/02/2020
33 11/02/2020
34 11/02/2020
38 13/02/2020
39 13/02/2020
40 13/02/2020
41 13/02/2020
42 13/02/2020
43 13/02/2020
44 14/02/2020
45 14/02/2020
46 14/02/2020
47 14/02/2020
48 14/02/2020
49 14/02/2020
50 15/02/2020
51 15/02/2020
52 15/02/2020
53 15/02/2020
54 15/02/2020
57 18/02/2020
58 18/02/2020
59 18/02/2020
60 19/02/2020
61 21/02/2020
stem_plot.py
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates as mdates
from datetime import datetime
from pathlib import Path
#########################
#### DATA EXTRACTION ####
#########################
source = Path('./test.txt')
with source.open() as f:
    lines = f.readlines()
#print( lines )
# Store source data in dictionary with date shown as mm-dd.
data={}
for line in lines[1:]:
    case, cdate = line.strip().split()
    cdate = datetime.strptime(cdate, "%d/%m/%Y").strftime('%m-%d')
    data[case] = cdate
print( f'\ndata = {data}' )
# Collate data's y-axis for each date, i.e. history
history2={}
cdates = list(data.values())
sorted_dates = sorted( set( cdates ) )
for i in sorted_dates:
    cases=[]
    for case, date in data.items():
        if i == date:
            cases.append(case)
    #print( i, cases)
    history2[i] = cases
print( f'\nhistory2 = {history2}')
###########################
#### DATA PRESENTATION ####
###########################
# Create figure and plot a stem plot with the date
fig, ax = plt.subplots(figsize=(8.8, 5), constrained_layout=True)
ax.set(title="Test")
labels=list( history2.values() ) # For annotation
yy = [ len(i) for i in labels ] # y-axis
xx = list(history2.keys()) # x-axis
markerline, stemline, baseline = ax.stem(
    xx, yy, linefmt="C1:", basefmt="k-", use_line_collection=True)
plt.setp(markerline, marker="None" )
# annotate stem lines
for ann_x, label in list(history2.items()):
    print(ann_x, label)
    each_count=1
    for each in label:
        ax.annotate( each, xy=(ann_x, each_count), xycoords='data')
        each_count += 1
        #print(f'each_count = {each_count}' )
# format xaxis
plt.setp( ax.get_xticklabels(), rotation=30 )
# remove top and right spines
for spine in ["top", "right"]:
    ax.spines[spine].set_visible(False)
# show axis name
ax.get_yaxis().set_label_text(label='Y-axis')
ax.get_xaxis().set_label_text(label='X-axis')
plt.show()
Current output:

Regarding your first question: make a list of every day between your first and last dates, and use that instead of only the dates that occur in your data. Add this to the beginning of your code:
import pandas as pd
alldays = pd.date_range(start="20200123",
                        end="20200221",
                        normalize=True)
dates = []
for i in alldays:
    dates.append(f"{i.month:02}-{i.day:02}")
This builds a pandas date range between the two dates and converts it into a list of month-day strings ("01-23", "01-24", ..., "02-21"), the same format used in your data dictionary.
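If you would rather not add pandas as a dependency, the same list can be built with the standard library alone. A minimal sketch; month_day_range is a hypothetical helper name, and the start and end dates are hard-coded from test.txt:

```python
from datetime import date, timedelta


def month_day_range(start, end):
    """Return 'mm-dd' strings for every day from start to end, inclusive."""
    n_days = (end - start).days + 1  # +1 so the end date itself is included
    return [(start + timedelta(days=i)).strftime("%m-%d") for i in range(n_days)]


# Hard-coded first and last dates of test.txt (an assumption for this sketch).
dates = month_day_range(date(2020, 1, 23), date(2020, 2, 21))
print(dates[:3], dates[-1], len(dates))
```

Because the loop walks real date objects, the list stays in calendar order even though the "mm-dd" strings themselves carry no year.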
Then modify this part of your code like this:
# Collate data's y-axis for each date, i.e. history
history2={}
cdates = list(data.values())
sorted_dates = sorted( set( cdates ) )
for i in dates: # This is the only change!
    cases=[]
    for case, date in data.items():
        if i == date:
            cases.append(case)
    #print( i, cases)
    history2[i] = cases
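With this change, every day in the range gets an entry in history2. Days without cases get an empty list, so they still appear on the x-axis, with a zero-height stem.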
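The nested loop in the collation step rescans every case for every day. A single pass that groups cases by day and then walks the full calendar produces the same history2 structure, with an empty list for days that have no cases. A minimal sketch, using an inline excerpt of test.txt in place of the file; collate_by_day is a hypothetical helper:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Inline excerpt in the same shape as test.txt (header line, then "case dd/mm/yyyy").
rows = """No. Date
1 23/01/2020
2 24/01/2020
3 24/01/2020
4 26/01/2020"""


def collate_by_day(text):
    """Map every day between the first and last date to the list of cases on that day."""
    cases_by_day = defaultdict(list)
    for line in text.splitlines()[1:]:  # skip the header line
        case, cdate = line.split()
        day = datetime.strptime(cdate, "%d/%m/%Y").date()
        cases_by_day[day].append(case)

    first, last = min(cases_by_day), max(cases_by_day)
    history = {}
    for offset in range((last - first).days + 1):
        day = first + timedelta(days=offset)
        # .get() does not trigger the defaultdict factory; missing days map to [].
        history[day.strftime("%m-%d")] = cases_by_day.get(day, [])
    return history


history2 = collate_by_day(rows)
print(history2)
```

Iterating over date objects rather than "mm-dd" strings also keeps the order correct if the data ever spans a year boundary.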
About your second question, change your code to this:
# annotate stem lines
for ann_x, label in list(history2.items()):
    print(ann_x, label)
    each_count=1
    for each in label:
        ax.annotate(f"--{each}", xy=(ann_x, each_count), xycoords='data', rotation=30)
        each_count += 1
I just changed the ax.annotate line. The two changes are:
added a "--" to each of your annotation labels,
added a rotation parameter. rotation is not listed directly in the annotate documentation, but the documentation says any matplotlib.text.Text property can be passed as a keyword argument, and rotation is one of them.
This should give you what you asked for.
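Putting the two changes together, here is a minimal self-contained sketch. It uses a small hypothetical history2 in place of the data read from test.txt and a single "-" prefix, as the question asked (the answer above uses "--"). It renders with the non-interactive Agg backend so it runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; no window needed
import matplotlib.pyplot as plt

# Hypothetical sample in the same shape as history2: day -> list of case numbers.
history2 = {"01-23": ["1"], "01-24": ["2", "3"], "01-25": [], "01-26": ["4"]}

fig, ax = plt.subplots()
ax.stem(list(history2), [len(cases) for cases in history2.values()])

annotations = []
for ann_x, cases in history2.items():
    for level, case in enumerate(cases, start=1):  # stack labels at y = 1, 2, ...
        # Text properties such as rotation are passed straight through as kwargs.
        annotations.append(
            ax.annotate(f"-{case}", xy=(ann_x, level), xycoords="data", rotation=30)
        )

fig.canvas.draw()  # force a render so the categorical x positions are resolved
plt.close(fig)
```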

Adding to @SinanKurmus's answer to my first question:
Solution 1:
A time axis with a daily interval covering the entire history of the data can be built with Matplotlib's drange and num2date functions together with Python's datetime module, so pandas is not needed here.
First, express the start and end dates of the time axis as Python datetime objects. Note that you need to add one more day to the end date, because drange excludes the end date and data from the last day would otherwise be left out. Next, express the one-day interval as a datetime.timedelta object. Pass these to matplotlib.dates.drange, which returns a NumPy array of Matplotlib date numbers. matplotlib.dates.num2date in turn converts that array back into Python datetime objects.
from datetime import datetime, timedelta
import matplotlib.dates as mdates
def get_time_axis( data ):
    # Assumes the values in data are 'YYYY-MM-DD' date strings.
    start = datetime.strptime(min(data.values()), "%Y-%m-%d")
    end = datetime.strptime(max(data.values()), "%Y-%m-%d") + timedelta(days=1)
    delta = timedelta(days=1)
    time_axis_md = mdates.drange( start, end, delta )
    time_axis_py = mdates.num2date( time_axis_md, tz=None ) # Add tz when required
    return time_axis_py
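A quick check of the half-open behavior described above. The start and end dates of test.txt are hard-coded here as an assumption:

```python
from datetime import datetime, timedelta

import matplotlib.dates as mdates

start = datetime(2020, 1, 23)
# drange stops *before* its end argument, so pad the last day by one.
end = datetime(2020, 2, 21) + timedelta(days=1)

days_num = mdates.drange(start, end, timedelta(days=1))  # Matplotlib date numbers
days_py = mdates.num2date(days_num)  # datetime objects (UTC-aware by default)

print(len(days_num), days_py[0].date(), days_py[-1].date())
```

Without the extra day, the array would stop at 20 February and any data on 21 February would have no slot on the axis.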
Solution 2:
Apparently, Matplotlib also has a FAQ on how to skip dates where there is no data. I have included their sample code example below.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab  # note: mlab.csv2rec only exists in old Matplotlib releases
import matplotlib.ticker as ticker
r = mlab.csv2rec('../data/aapl.csv')
r.sort()
r = r[-30:] # get the last 30 days
N = len(r)
ind = np.arange(N) # the evenly spaced plot indices
def format_date(x, pos=None):
    thisind = np.clip(int(x+0.5), 0, N-1)
    return r.date[thisind].strftime('%Y-%m-%d')
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(ind, r.adj_close, 'o-')
ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
fig.autofmt_xdate()
plt.show()
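Because mlab.csv2rec is gone from current Matplotlib, the FAQ example above no longer runs as-is. Here is a sketch of the same technique, applied to a stem plot and using a small hypothetical dataset in place of aapl.csv: plot against evenly spaced integer positions, then relabel the ticks with a FuncFormatter.

```python
from datetime import date

import matplotlib

matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

# Hypothetical data with a gap: 25 and 26 January have no entries.
days = [date(2020, 1, 23), date(2020, 1, 24), date(2020, 1, 27), date(2020, 1, 28)]
values = [3, 5, 2, 6]

ind = np.arange(len(days))  # evenly spaced positions, so the gap is skipped


def format_date(x, pos=None):
    """Tick formatter: map a (possibly fractional) position back to its date label."""
    i = int(np.clip(round(x), 0, len(days) - 1))
    return days[i].strftime("%Y-%m-%d")


fig, ax = plt.subplots()
ax.stem(ind, values)
ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
fig.autofmt_xdate()
fig.canvas.draw()
plt.close(fig)
```

This is the opposite choice from Solution 1: it hides days without data instead of showing them.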

Related

How to plot multiple daily time series, aligned at specified trigger times?

The Problem:
I have a dataframe df that looks like this:
value msg_type
date
2022-03-15 08:15:10+00:00 122 None
2022-03-15 08:25:10+00:00 125 None
2022-03-15 08:30:10+00:00 126 None
2022-03-15 08:30:26.542134+00:00 127 ANNOUNCEMENT
2022-03-15 08:35:10+00:00 128 None
2022-03-15 08:40:10+00:00 122 None
2022-03-15 08:45:09+00:00 127 None
2022-03-15 08:50:09+00:00 133 None
2022-03-15 08:55:09+00:00 134 None
....
2022-03-16 09:30:09+00:00 132 None
2022-03-16 09:30:13.234425+00:00 135 ANNOUNCEMENT
2022-03-16 09:35:09+00:00 130 None
2022-03-16 09:40:09+00:00 134 None
2022-03-16 09:45:09+00:00 135 None
2022-03-16 09:50:09+00:00 134 None
The value data occurs in roughly 5 minute intervals, but messages can occur at any time. I am trying to plot one line of values per day, where the x-axis ranges from t=-2 hours to t=+8 hours, and the ANNOUNCEMENT occurs at t=0 (see image below).
So, for example, if an ANNOUNCEMENT occurs at 8:30AM on 3/15 and again at 9:30AM on 3/16, there should be two lines:
one line for 3/15 that plots data from 6:30AM to 4:30PM, and
one line for 3/16 that plots data from 7:30AM to 5:30PM,
both sharing the same x-axis ranging from -2 to +8, with ANNOUNCEMENT at t=0.
What I've Tried:
I am able to do this currently by finding the index position of an announcement (e.g. say it occurs at row 298 -> announcement_index = 298), generating an array of 120 numbers from -24 to 96 (representing 10 hours at 5 minutes per number -> x = np.arange(-24, 96, 1)), then plotting
sns.lineplot(x=x, y=df['value'].iloc[announcement_index-24:announcement_index+96])
While this does mostly work (see image below), I suspect it's not the correct way to go about it. Specifically, adding more information to the plot (like a different set of 'value' markers) at specific times is difficult, because I would need to convert each timestamp into this arbitrary -24 to 96 range.
How can I make this same plot but by utilizing the datetime index instead? Thank you so much!
Assuming the index has already been converted with to_datetime, create an IntervalArray spanning -2H to +8H around each index timestamp:
dl, dr = -2, 8
left = df.index + pd.Timedelta(hours=dl)
right = df.index + pd.Timedelta(hours=dr)
df['interval'] = pd.arrays.IntervalArray.from_arrays(left, right)
Then for each ANNOUNCEMENT, plot the window from interval.left to interval.right:
Set the x-axis as seconds since ANNOUNCEMENT
Set the labels as hours since ANNOUNCEMENT
fig, ax = plt.subplots()
for ann in df.loc[df['msg_type'] == 'ANNOUNCEMENT'].itertuples():
    window = df.loc[ann.interval.left:ann.interval.right] # extract interval.left to interval.right
    window.index -= ann.Index # compute time since announcement
    window.index = window.index.total_seconds() # convert to seconds since announcement
    window.plot(ax=ax, y='value', label=ann.Index.date())
deltas = np.arange(dl, dr + 1)
ax.set(xticks=deltas * 3600, xticklabels=deltas) # set tick labels to hours since announcement
ax.legend()
Here is the output with a smaller window -1H to +2H just so we can see the small sample data more clearly (full code below):
Full code:
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
s = '''
date,value,msg_type
2022-03-15 08:15:10+00:00,122,None
2022-03-15 08:25:10+00:00,125,None
2022-03-15 08:30:10+00:00,126,None
2022-03-15 08:30:26.542134+00:00,127,ANNOUNCEMENT
2022-03-15 08:35:10+00:00,128,None
2022-03-15 08:40:10+00:00,122,None
2022-03-15 08:45:09+00:00,127,None
2022-03-15 08:50:09+00:00,133,None
2022-03-15 08:55:09+00:00,134,None
2022-03-16 09:30:09+00:00,132,None
2022-03-16 09:30:13.234425+00:00,135,ANNOUNCEMENT
2022-03-16 09:35:09+00:00,130,None
2022-03-16 09:40:09+00:00,134,None
2022-03-16 09:45:09+00:00,135,None
2022-03-16 09:50:09+00:00,134,None
'''
df = pd.read_csv(io.StringIO(s), index_col=0, parse_dates=['date'])
# create intervals from -1H to +2H of the index
dl, dr = -1, 2
left = df.index + pd.Timedelta(hours=dl)
right = df.index + pd.Timedelta(hours=dr)
df['interval'] = pd.arrays.IntervalArray.from_arrays(left, right)
# plot each announcement's interval.left to interval.right
fig, ax = plt.subplots()
for ann in df.loc[df['msg_type'] == 'ANNOUNCEMENT'].itertuples():
    window = df.loc[ann.interval.left:ann.interval.right] # extract interval.left to interval.right
    window.index -= ann.Index # compute time since announcement
    window.index = window.index.total_seconds() # convert to seconds since announcement
    window.plot(ax=ax, y='value', label=ann.Index.date())
deltas = np.arange(dl, dr + 1)
ax.set(xticks=deltas * 3600, xticklabels=deltas) # set tick labels to hours since announcement
ax.grid()
ax.legend()
plt.show()
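An alternative sketch, not the approach used above: dividing the time offsets by a one-hour Timedelta puts the x-axis directly in hours. No tick relabelling (deltas * 3600) is needed, and any other timestamp can be placed on the same axis with the same subtraction. It uses a trimmed, hypothetical version of the sample data (whole seconds only), and aligned_windows is an illustrative helper name:

```python
import io

import matplotlib

matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import pandas as pd

# Trimmed, hypothetical sample in the question's layout.
s = """date,value,msg_type
2022-03-15 08:15:10+00:00,122,None
2022-03-15 08:30:26+00:00,127,ANNOUNCEMENT
2022-03-15 08:35:10+00:00,128,None
2022-03-16 09:25:09+00:00,132,None
2022-03-16 09:30:13+00:00,135,ANNOUNCEMENT
2022-03-16 09:40:09+00:00,134,None
"""
df = pd.read_csv(io.StringIO(s), index_col="date", parse_dates=["date"])


def aligned_windows(df, hours_before=2, hours_after=8, marker="ANNOUNCEMENT"):
    """Return {announcement date: 'value' Series indexed by hours since that announcement}."""
    windows = {}
    is_marker = (df["msg_type"] == marker).to_numpy()
    for t0 in df.index[is_marker]:
        lo = t0 - pd.Timedelta(hours=hours_before)
        hi = t0 + pd.Timedelta(hours=hours_after)
        window = df.loc[lo:hi, "value"]  # label slice on the sorted DatetimeIndex
        hours = (window.index - t0) / pd.Timedelta(hours=1)  # TimedeltaIndex -> float hours
        windows[t0.date()] = pd.Series(window.to_numpy(), index=hours)
    return windows


windows = aligned_windows(df)
fig, ax = plt.subplots()
for day, series in windows.items():
    series.plot(ax=ax, label=str(day))
ax.axvline(0, color="k", linewidth=0.8)  # announcement time
ax.set_xlabel("hours since announcement")
ax.legend()
plt.close(fig)
```

To mark any other timestamp t on this plot, the same expression applies: (t - t0) / pd.Timedelta(hours=1).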

Iterating rows and collecting values for output. Numpy and Python 3.6

Update 5/22/18: Answer by @aorr below the original question.
I am trying to collect each ID and the data for that ID for thousands of inputs.
I am trying to collect the rows for each individual ID, sort them by date, then plot each ID's data and export a chart for each ID.
Edited
Sample data:
Col names: Id Date O G Company Date2
aab72ffd-4d0b-4c62-b6fe-4c55b98be9a0 3/1/1999 180.66 673 A 1/1/1996
aab72ffd-4d0b-4c62-b6fe-4c55b98be9a0 3/1/1995 173.9 651 A 1/1/1996
a15961bc-0263-4c66-a825-1deb69bda8be 12/1/2010 55.14 542 C 1/1/2011
a15961bc-0263-4c66-a825-1deb69bda8be 5/1/2012 49.24 577 C 1/1/2011
10a1d17b-1f5c-4a4d-8186-e4dbf62e3bf2 12/1/2000 48.14 290 D 3/1/2002
10a1d17b-1f5c-4a4d-8186-e4dbf62e3bf2 3/1/2003 69.03 282.5 D 3/1/2002
Desired output arrays/charts, but sorted by date.
10a1d17b-1f5c-4a4d-8186-e4dbf62e3bf2 7/1/2005 28.24 327
10a1d17b-1f5c-4a4d-8186-e4dbf62e3bf2 10/1/1998 45.11 335
10a1d17b-1f5c-4a4d-8186-e4dbf62e3bf2 7/1/2001 28.22 348
10a1d17b-1f5c-4a4d-8186-e4dbf62e3bf2 7/1/1997 44.53 350.5
10a1d17b-1f5c-4a4d-8186-e4dbf62e3bf2 8/1/2001 28.4 333.5
10a1d17b-1f5c-4a4d-8186-e4dbf62e3bf2 10/1/2005 41.72 314
10a1d17b-1f5c-4a4d-8186-e4dbf62e3bf2 12/1/2001 29.53 313.5
10a1d17b-1f5c-4a4d-8186-e4dbf62e3bf2 8/1/2002 43.24 319
The code I have written so far successfully creates an indexed array of the different data types. Now I am trying to iterate over all rows and organize the data so that it prints or writes individual arrays/charts for each ID.
Here is what I have so far:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
#import data
mydataset = pd.read_csv('input_test.csv', dtype=None)
x = mydataset.iloc[:,:].values
y = mydataset.iloc[:,:].values
#Id
b = np.array((x[:,0]), dtype=str)
#Date
c = np.array((x[:,1]), dtype=str)
# O Var
d = np.array((x[:,2]), dtype=int)
# G var
e = np.array((x[:,3]), dtype=int)
#Stack
f = np.vstack((b,c,d,e))
#Transpose array
g = f.T
#Plot data
plt.figure()
plt.plot(x[:,2], y[:,3], label ='Rate over time')
plt.xlabel('m')
plt.ylabel('r/m')
#plt.legend()
Update based on @aorr's answer:
Thanks for helping us noobs.
This plots both O and G on the Y axis with Date on the X axis for each Id. And everything is sorted based on date. Great starting point to expand with this data. More to follow based on updates.
for Id in data['Id'].unique():
    fig, ax = plt.subplots(figsize=(5,3))
    plot_data = data.query("Id == @Id").sort_values('Date')
    _ = plot_data.plot(x='Date',y='O', ax=ax)
    _ = plot_data.plot(x='Date', y='G', ax=ax)
    #Plot Company name in each chart
    for Company in plot_data['Company']:
        _ = plt.title(Company)
    #Plot Date2 Event onto X-axis
    for Date2 in plot_data['Date2']:
        _ = plt.axvline(Date2)
Have you tried solving this with pandas? I don't think you need to create numpy arrays for every element, pandas already stores them as ndarrays internally.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('input_test.csv', parse_dates=['Date'])
for id_ in data['Id'].unique():
    fig, ax = plt.subplots(figsize=(5,3))
    plot_data = data.query("Id == @id_").sort_values('Date')
    _ = plot_data.plot(x='O',y='G', ax=ax)
That should get you nearly all the way there. The pandas visualization documentation has many other helpful options for exploring data quickly, but if you're picky about the look of the figure, you'll want to use plain matplotlib for the figure and axes layouts.
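For the full goal in the question (one chart per Id, sorted by Date, exported to a file), a groupby loop avoids the query call entirely. A minimal sketch using sample rows inline; export_charts, the PNG file names and the dashed Date2 marker are illustrative choices, not part of either post:

```python
import io
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # write files without opening windows
import matplotlib.pyplot as plt
import pandas as pd

# A few rows from the question's sample data, inlined instead of input_test.csv.
csv = """Id,Date,O,G,Company,Date2
aab72ffd-4d0b-4c62-b6fe-4c55b98be9a0,3/1/1999,180.66,673,A,1/1/1996
aab72ffd-4d0b-4c62-b6fe-4c55b98be9a0,3/1/1995,173.9,651,A,1/1/1996
a15961bc-0263-4c66-a825-1deb69bda8be,12/1/2010,55.14,542,C,1/1/2011
a15961bc-0263-4c66-a825-1deb69bda8be,5/1/2012,49.24,577,C,1/1/2011
"""
data = pd.read_csv(io.StringIO(csv))
for col in ("Date", "Date2"):
    data[col] = pd.to_datetime(data[col], format="%m/%d/%Y")


def export_charts(data, out_dir):
    """Save one chart per Id (O and G against Date, sorted by Date); return the file paths."""
    paths = []
    for id_, group in data.groupby("Id"):
        group = group.sort_values("Date")
        fig, ax = plt.subplots(figsize=(5, 3))
        ax.plot(group["Date"], group["O"], marker="o", label="O")
        ax.plot(group["Date"], group["G"], marker="o", label="G")
        ax.axvline(group["Date2"].iloc[0], color="k", linestyle="--")  # Date2 event
        ax.set_title(group["Company"].iloc[0])
        ax.legend()
        path = os.path.join(out_dir, f"{id_}.png")
        fig.savefig(path)
        plt.close(fig)  # release each figure; matters when looping over thousands of Ids
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as out_dir:
    written = export_charts(data, out_dir)
    names = [os.path.basename(p) for p in written]
print(names)
```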

ggplot multiple plots in one object

I've created a script that puts multiple plots in one object. I want two plots, one above the other, each with its own y-axis scale but sharing a fixed x-axis of dates. However, only the top plot is drawn correctly. The bottom plot is visible but empty, i.e. the geom_line is not shown. Furthermore, the y-axis of the second plot does not match the range of its values (min to max): it runs from 0 to 0.05. I also tried facet_grid(scales="free"), but the y-axis did not change.
I've limited the date range to the past few weeks. This is the code I am using:
df = df.set_index('date')
weekly = df.resample('w-mon',label='left',closed='left').sum()
data = weekly[-4:].reset_index()
data= pd.melt(data, id_vars=['date'])
pplot = ggplot(aes(x="date", y="value", color="variable", group="variable"), data)
#geom_line()
scale_x_date(labels = date_format('%d.%m'),
limits=(data.date.min() - dt.timedelta(2),
data.date.max() + dt.timedelta(2)))
#facet_grid("variable", scales="free_y")
theme_bw()
The dataframe sample (df) is a daily dataset containing values for each variable, x and a; in this case 'date' is the index:
date x a
2016-08-01 100 20
2016-08-02 50 0
2016-08-03 24 18
2016-08-04 0 10
The dataframe sample (to_plot) - weekly overview:
date variable value
0 2016-08-01 x 200
1 2016-08-08 x 211
2 2016-08-15 x 104
3 2016-08-22 x 332
4 2016-08-01 a 8
5 2016-08-08 a 15
6 2016-08-15 a 22
7 2016-08-22 a 6
Sorry for not adding the df dataframe before.
Your calls to the plot directives geom_line(), scale_x_date(), etc. are standing on their own in your script; you do not connect them to your plot object. Thus, they do not have any effect on your plot.
In order to apply a plot directive to an existing plot object, use the graphics language and "add" them to your plot object by connecting them with a + operator.
The result (as intended):
The full script:
from __future__ import print_function
import sys
import pandas as pd
import datetime as dt
from ggplot import *
if __name__ == '__main__':
    df = pd.DataFrame({
        'date': ['2016-08-01', '2016-08-08', '2016-08-15', '2016-08-22'],
        'x': [100, 50, 24, 0],
        'a': [20, 0, 18, 10]
    })
    df['date'] = pd.to_datetime(df['date'])
    data = pd.melt(df, id_vars=['date'])
    plt = ggplot(data, aes(x='date', y='value', color='variable', group='variable')) +\
        scale_x_date(
            labels=date_format('%y-%m-%d'),
            limits=(data.date.min() - dt.timedelta(2), data.date.max() + dt.timedelta(2))
        ) +\
        geom_line() +\
        facet_grid('variable', scales='free_y')
    plt.show()
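If you only need the stacked layout (one shared date axis, independent y scales) and not ggplot's syntax specifically, the same figure can be sketched in plain matplotlib with the answer's sample data:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2016-08-01", "2016-08-08", "2016-08-15", "2016-08-22"]),
    "x": [100, 50, 24, 0],
    "a": [20, 0, 18, 10],
})

# sharex=True gives both panels one date axis; each panel autoscales its own y range.
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(6, 4))
for ax, column in zip(axes, ["x", "a"]):
    ax.plot(df["date"], df[column], label=column)
    ax.set_ylabel(column)

axes[-1].xaxis.set_major_formatter(mdates.DateFormatter("%d.%m"))
pad = pd.Timedelta(days=2)  # same 2-day margin as the ggplot limits above
axes[-1].set_xlim(df["date"].min() - pad, df["date"].max() + pad)
fig.canvas.draw()
plt.close(fig)
```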

Plotting datetime output using matplotlib

So I have this code based on a simple data array that looks like this:
5020 : 2015 7 11 11 42 54 782705
5020 : 2015 7 11 11 44 55 575776
5020 : 2015 7 11 11 46 56 560755
5020 : 2015 7 11 11 48 57 104872
and the plot looks like the following:
import scipy as sp
import matplotlib.pyplot as plt
data = sp.genfromtxt("E:/Python/data.txt", delimiter=" : ")
x = data[:,0]
y = data[:,1]
plt.scatter(x,y)
plt.title("Instagram")
plt.xlabel("Time")
plt.ylabel("Followers")
plt.xticks([w*2*60 for w in range(10)],
['2-minute interval %i'%w for w in range(10)])
plt.autoscale(tight=True)
plt.grid()
plt.show()
I'm looking for a simple way to use the datetime values as the x-axis of the graph. I can't figure out how to make it understand them, and there's also this:
In [15]:sp.sum(sp.isnan(y))
Out[15]: 77
Which I guess is because of the spaces? I'm new to machine learning in Python, forgive my ignorance.
Thank you very much.
I would solve this by directly passing datetime.datetime objects to pyplot. Here is a short example:
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib
# Note: please figure out yourself the data input
x = [dt.datetime(2015,7,11,11,42,54),
dt.datetime(2015,7,11,11,44,56),
dt.datetime(2015,7,11,11,46,56),
dt.datetime(2015,7,11,11,48,57)]
#define the x limit:
xstart= dt.datetime(2015,7,11,11,40,54)
xstop = dt.datetime(2015,7,11,11,50,54)
y = [782705, 575776, 560755, 104872]
fig,ax= plt.subplots()
ax.scatter(x,y)
xfmt = matplotlib.dates.DateFormatter('%D %H:%M:%S')
ax.xaxis.set_major_formatter(xfmt)
ax.set_title("Instagram")
ax.set_xlabel("Time")
ax.set_ylabel("Followers")
ax.set_xlim(xstart,xstop)
plt.xticks(rotation='vertical')
plt.show()
Result:
Yes, it's because of the spaces: genfromtxt cannot convert the space-separated timestamp into a number, so it assigns NaN to those values.
Try this; it's a little longer but should work:
data = []
x = []
x_old = []
y = []
with open('data.txt', 'r') as f:
    for line in f:
        data.append(line.split(':'))
for i in data:
    y.append(int(i[0]))
    x_old.append(i[1])
for t in x_old:
    # minutes + (seconds.fraction) / 60, sliced from fixed positions in the line
    x.append(float(t[17:19]+'.'+t[20:])/60+int(t[14:16]))
Because of the spaces, I had to convert the data to float manually. I divided the seconds (including their fractional part) by 60 and added the result to the minutes, since I'm assuming you're only interested in that (2-minute intervals).
If the data were formatted more consistently, you could use datetime to extract the information more cleanly. For example:
from datetime import datetime
my_time = datetime.strptime('2015 7 11 11 42 54.782705', '%Y %m %d %H %M %S.%f')
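Following that suggestion, each line can be parsed straight into a datetime by giving strptime a format whose separators match the data: spaces throughout, with the last field read as microseconds. A sketch assuming the first column is the follower count; parse_line is a hypothetical helper:

```python
from datetime import datetime

# Lines in the format shown in the question: "<count> : <Y m d H M S microseconds>"
sample = """5020 : 2015 7 11 11 42 54 782705
5020 : 2015 7 11 11 44 55 575776"""


def parse_line(line):
    """Split one line into (timestamp, count)."""
    count, stamp = line.split(":")
    # %f accepts the 6-digit microsecond field; %m and %d accept non-padded numbers.
    when = datetime.strptime(stamp.strip(), "%Y %m %d %H %M %S %f")
    return when, int(count)


points = [parse_line(line) for line in sample.splitlines()]
times = [t for t, _ in points]
followers = [c for _, c in points]
print(times[0], followers)
```

The resulting times list can be passed to ax.scatter(times, followers) directly, as in the first answer.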

Memory leak when using matplotlib.collections.LineCollection

I am using the following code to create a collection of color coded line plots:
for j in idlist[i]:
    single_traj(lonarray, latarray, parray)
plt.savefig(savename, dpi = 400)
plt.close('all')
plt.clf()
where:
def single_traj(lonarray, latarray, parray, linewidth = 0.7):
    """
    Plots XY Plot of one trajectory, with color as a function of p
    Helper Function for DrawXYTraj
    """
    global lc
    x = lonarray
    y = latarray
    p = parray
    points = np.array([x,y]).T.reshape(-1,1,2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)
    lc = col.LineCollection(segments, cmap=plt.get_cmap('Spectral'),
                            norm=plt.Normalize(100, 1000), alpha = 0.8)
    lc.set_array(p)
    lc.set_linewidth(linewidth)
    plt.gca().add_collection(lc)
Somehow, this loop uses a lot of memory (> ~10GB), which is still being used after the plot is saved.
I used hpy to look at memory usage
Partition of a set of 27472988 objects. Total size = 10990671168 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 8803917 32 9226505016 84 9226505016 84 dict of matplotlib.path.Path
1 8888542 32 711083360 6 9937588376 90 numpy.ndarray
2 8803917 32 563450688 5 10501039064 96 matplotlib.path.Path
3 11 0 219679112 2 10720718176 98 guppy.sets.setsc.ImmNodeSet
4 25407 0 77593848 1 10798312024 98 list
5 89367 0 28232616 0 10826544640 99 dict (no owner)
6 7642 0 25615984 0 10852160624 99 dict of matplotlib.collections.LineCollection
7 15343 0 16079464 0 10868240088 99 dict of matplotlib.transforms.CompositeGenericTransform
8 15327 0 16062696 0 10884302784 99 dict of matplotlib.transforms.Bbox
9 53741 0 15047480 0 10899350264 99 dict of weakref.WeakValueDictionary
At this point the plot is already saved, so all matplotlib-related objects should be gone. But I can't find these objects, which means I don't know how to delete them.
EDIT:
Here is a stand-alone example which reproduces the leak (savefig throws an error for some reason but isn't relevant anyway):
# Memory leak test!
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.collections as col
def draw():
    x = range(1000)
    y = range(1000)
    p = range(1000)
    fig = plt.figure(figsize = (12,8))
    ax = plt.gca()
    ax.set_aspect('equal')
    for i in range(1000):
        if i%100 == 0:
            print(i)
        points = np.array([x,y]).T.reshape(-1,1,2)
        segments = np.concatenate([points[:-1], points[1:]], axis=1)
        lc = col.LineCollection(segments, cmap=plt.get_cmap('Spectral'),
                                norm=plt.Normalize(0, 1000), alpha = 0.8)
        lc.set_array(p)
        lc.set_linewidth(0.7)
        plt.gca().add_collection(lc)
    cb = fig.colorbar(lc, shrink = 0.7)
    cb.set_label('p')
    cb.ax.invert_yaxis()
    plt.tight_layout()
    #plt.savefig('./mem_test.png', dpi = 400)
    plt.close('all')
    plt.clf()
draw()
a = input('Wait...')
The draw() function should delete all plt objects, but they still use up memory after the function has returned. I checked this with top/htop.
It seems from your hpy dump that the memory hog consists of a large number of matplotlib.path.Paths. This may be due to your variable lc. Have you tried del lc? It may be that plt.close is not (at least should not be!) able to delete them, as they are in your global variable lc.
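A sketch of the loop restructured along those lines, assuming one saved figure per batch of trajectories. The LineCollection stays a local variable, the figure is created and closed explicitly, and plt.clf() is not called after closing (pyplot's clf() acts on the current figure and would create a new, empty one). draw_trajectories and its parameters are illustrative, with straight-line sample data in place of the real trajectories. Whether this alone cures the leak in your setup is something to confirm with hpy or top.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering, as when only saving files
import matplotlib.collections as mcoll
import matplotlib.pyplot as plt
import numpy as np


def draw_trajectories(n_traj=50, n_points=200, savename=None):
    """Draw n_traj colour-coded trajectories on one figure, save it, and release it."""
    x = np.arange(n_points)
    y = np.arange(n_points)
    p = np.linspace(100, 1000, n_points - 1)  # one colour value per segment
    points = np.column_stack([x, y]).reshape(-1, 1, 2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)

    fig, ax = plt.subplots(figsize=(12, 8))  # explicit figure instead of plt.gca()
    lc = None
    for _ in range(n_traj):
        # lc is local, so no global name keeps the last collection alive.
        lc = mcoll.LineCollection(segments, cmap="Spectral",
                                  norm=plt.Normalize(100, 1000), alpha=0.8)
        lc.set_array(p)
        lc.set_linewidth(0.7)
        ax.add_collection(lc)
    if lc is not None:
        fig.colorbar(lc, ax=ax, shrink=0.7).set_label("p")

    n_added = len(ax.collections)
    if savename:
        fig.savefig(savename, dpi=100)
    plt.close(fig)  # close this specific figure; no plt.clf() afterwards
    return n_added


print(draw_trajectories(n_traj=3, n_points=10))  # 3
print(plt.get_fignums())  # []
```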
