Why doesn't this code save my figures with titles? - python

I'm producing some figures with the following code:
def boxplot_data(self,parameters_file,figure_title):
    data = pandas.read_csv(parameters_file)
    header = data.keys()
    number_of_full_subplots = len(header)/16
    remainder = len(header)-(16*number_of_full_subplots)
    try:
        for i in range(number_of_full_subplots+1):
            fig =plt.figure(i)
            txt = fig.suptitle(figure_title+' (n='+str(len(data[header[0]]))+') '+'Page '+str(i)+' of '+str(number_of_full_subplots),fontsize='20')
            txt.set_text(figure_title+' (n='+str(len(data[header[0]]))+') '+'Page '+str(i)+' of '+str(number_of_full_subplots))
            for j in range(16):
                plt.ioff()
                plt.subplot(4,4,j)
                plt.boxplot(data[header[16*i+j]])
                plt.xlabel('')
                mng=plt.get_current_fig_manager()
                mng.window.showMaximized()
                plt.savefig(str(i)+'.png',bbox_inches='tight',orientation='landscape')
            plt.close(fig)
            plt.ion()
    except IndexError:
        txt = fig.suptitle(figure_title+' (n='+str(len(data[header[0]]))+') '+'Page '+str(i)+' of '+str(number_of_full_subplots),fontsize='20')
        txt.set_text(figure_title+' (n='+str(len(data[header[0]]))+') '+'Page '+str(i)+' of '+str(number_of_full_subplots))
        print '{} full figures were created and 1 partially filled \
figure containing {} subplots'.format(number_of_full_subplots,remainder)
This produces the figures and saves them to file, properly formatted. However, no matter what I do, the code seems to bypass the fig.suptitle line(s), so I can't give my figures a title. Apologies if there is a lot going on in this function that I haven't explained, but does anybody know why this code refuses to give my figures titles?

Your problem is not that suptitle is bypassed, but that you are never saving the figure that you call suptitle on. All your calls to savefig are within the inner loop and as such are saving only the subplots. You can actually watch this happening if you open the png file while your code is running - you see each of the 16 sub axes being added one by one.
Your code looks unnecessarily complicated. For instance, I don't think you need ion and ioff. Here is a simple example of how to do what I think you want, followed by a translation of your code to fit that. (Obviously I can't test it, because I don't have your data.)
import matplotlib.pyplot as plt
test_y=range(10)
test_x=[8,13,59,8,81,2,5,6,2,3]
def subplotsave_test():
    for i in range(5):
        fig = plt.figure(i)
        txt = fig.suptitle('Page '+str(i)+' of '+str(5),fontsize='20')
        for j in range(16):
            plt.subplot(4,4,j+1)
            plt.plot(test_y,test_x)
        plt.savefig(str(i)+'.png',bbox_inches='tight',orientation='landscape')
if __name__ == '__main__':
    subplotsave_test()
One tip that works for me: call plt.show() wherever you intend to save the figure, make sure it looks the way you want, and then replace that call with plt.savefig().
Possible translation of your function
import pandas
import matplotlib.pyplot as plt

def boxplot_data(self,parameters_file,figure_title):
    data = pandas.read_csv(parameters_file)
    header = data.keys()
    # // keeps this an integer on Python 3 as well as Python 2
    number_of_full_subplots = len(header)//16
    remainder = len(header)-(16*number_of_full_subplots)
    for i in range(number_of_full_subplots+1):
        fig =plt.figure(i)
        fig.suptitle(figure_title+' (n='+str(len(data[header[0]]))+') '+'Page '+str(i)+' of '+str(number_of_full_subplots),fontsize='20')
        for j in range(16):
            plt.subplot(4,4,j+1)
            if 16*i + j < len(header):
                plt.boxplot(data[header[16*i+j]])
                plt.xlabel('')
                #You might want the showMaximized() call here - does nothing
                #on my machine but YMMV
            else:
                print('{} full figures were created and 1 partially filled '
                      'figure containing {} subplots'.format(number_of_full_subplots,remainder))
                break
        # save once per figure, after the inner loop has added all the subplots
        plt.savefig(str(i)+'.png',bbox_inches='tight',orientation='landscape')
        plt.close(fig)
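The page bookkeeping above can also be done up front, which makes the "Page i of N" text and the partially filled last page easier to get right. This is only a minimal sketch, assuming 16 plots per page; `paginate` and its parameters are names made up for this illustration and are not part of pandas or matplotlib.

```python
def paginate(names, per_page=16):
    """Split a list of column names into pages of at most `per_page` items.

    Hypothetical helper: returns a list of (page_number, total_pages, chunk)
    tuples, with page numbers starting at 1.
    """
    names = list(names)
    # Ceiling division: a partially filled last page still counts as a page.
    total_pages = (len(names) + per_page - 1) // per_page
    pages = []
    for page in range(total_pages):
        chunk = names[page * per_page:(page + 1) * per_page]
        pages.append((page + 1, total_pages, chunk))
    return pages


if __name__ == '__main__':
    columns = ['col%d' % k for k in range(35)]
    for number, total, chunk in paginate(columns):
        print('Page %d of %d: %d subplots' % (number, total, len(chunk)))
```

Each chunk would then get one figure, one suptitle, and a single savefig call placed after its inner loop.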

Related

Multi-Line Graph: Stuck with the json-dict format for the lines

Hello I am totally new to using Python for data visualisation, I have this json response:
{
"max365": 83.87,
"current365": 83.87,
"min365": 75.29,
"max180": 76.94,
"current180": 76.94,
"min180": 56.43,
"max90": 98.66,
"current90": 98.66,
"min90": 63.29,
"max30": 138.14,
"current30": 136,
"min30": 66.77,
"max14": 156.93,
"current14": 122.88,
"min14": 72.56,
"max7": 168.9,
"current7": 122.68,
"min7": 74.08,
"max0": 267.5,
"current0": 81.28,
"min0": 36.07 }
max, current and min are the lines I would like to plot on a multi-line graph, but I am struggling with the data in this date/time-grouping format with the response.
I've added a screenshot of a graph here that I am essentially trying to reverse engineer:
I've seen some useful posts for generic line graphs, but my issue here is mostly linking all the max/current/mins onto their own lines whilst I have 0/7/14/30/90/180/365 grouping/intersecting each of them in the response.
Hope I've explained it well enough. Any help would be greatly appreciated.
Python's standard library has a json module; you only need to import its loads function and massage the data a little.
Here is the code. Note that we need to split each key to get the real label and the sequence information (are the numbers days? I made an educated guess). For each label we build a list of [time, value] pairs, then we sort each of those lists, and finally we plot the three lines.
from matplotlib.pyplot import subplots
from json import loads
def split_num(s):
    num = []
    for c in reversed(s):
        if c.isdigit():
            num += c
        else:
            break
    if num:
        return s[:-len(num)], ''.join(reversed(num))
    else:
        return s, ''
json = '''{
"max365": 83.87,"current365": 83.87,"min365": 75.29,
"max180": 76.94,"current180": 76.94,"min180": 56.43,
"max90": 98.66,"current90": 98.66,"min90": 63.29,
"max30": 138.14,"current30": 136,"min30": 66.77,
"max14": 156.93,"current14": 122.88,"min14": 72.56,
"max7": 168.9,"current7": 122.68,"min7": 74.08,
"max0": 267.5,"current0": 81.28,"min0": 36.07 }'''
jdict = loads(json)
data = {}
for k in jdict:
    name, num = split_num(k)
    data[name] = data.setdefault(name, []) + [[int(num), float(jdict[k])]]
for k in data: data[k] = sorted(data[k])
fig, ax = subplots()
for k in data:
    ax.plot(*zip(*data[k]), label=k)
ax.legend()
fig.show()
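The label-splitting step can also be written with a regular expression. This is only a sketch of the same idea: `group_series` is a hypothetical name, and it assumes every key is a word followed by a non-negative integer, as in the response above.

```python
import re
from json import loads


def group_series(json_text):
    """Group keys like "max30" into {"max": [(30, value), ...]}, sorted by number."""
    grouped = {}
    for key, value in loads(json_text).items():
        match = re.match(r'^([A-Za-z_]+)(\d+)$', key)
        if match is None:
            continue  # skip keys that do not follow the name+number pattern
        name, number = match.group(1), int(match.group(2))
        grouped.setdefault(name, []).append((number, float(value)))
    for name in grouped:
        grouped[name].sort()
    return grouped


if __name__ == '__main__':
    sample = '{"max7": 168.9, "min7": 74.08, "max0": 267.5, "min0": 36.07}'
    for name, points in group_series(sample).items():
        print(name, points)
```

Each list of points can then be passed to ax.plot(*zip(*points), label=name), as in the code above.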

Python - plt.savefig in a nested for loop

I would like to create a pdf containing 12 figures/subplot processed in a nested 'for' loop.
The 12 figures are produced from 4 metrics applied to 3 different variables (3*4=12).
Here is the code that I am using:
#########VARIABLES#############################
df = pd.read_csv('scores.csv',delimiter=',')
sta='Jony'
cvars=['Qle','Qh','NEE']
exps=['ctr','lai']
expsC={'ctr':'k','lai':'b'}
stats=['MBE','SD','NME','pcorr']
##################FUNCTION#########################
def extract_score(df,station,cvar,exp,score):
    a=df[(df.station == station) & (df.cvar == cvar) &
         (df.exp == exp) ]
    return a[score].values[0]
def extract_score_exp(df,station,cvar,score,expS=None):
    out=[]
    if expS is None:
        exps=np.unique(df.exp)
    else:
        expsf=np.unique(df.exp)
        exps=[]
        for ee in expsf:
            if expS in ee:
                exps.append(ee)
    for exp in exps:
        out.append(extract_score(df,station,cvar,exp,score))
    return exps,out
def get_val_exp(exps,pos=1):
    return [float(bb.split('_')[pos]) for bb in exps]
#####################PLOT###################################
for j,cvar in enumerate(cvars):
    fig=plt.figure()
    ax1 = plt.subplot2grid((3,1), (j,0))
    for i,stat in enumerate(stats):
        ax2 = plt.subplot2grid((1,4), (0,i))
        for exp in exps:
            xscore0=extract_score(df,sta,cvar,exp,stat)
            xx0=250
            expsP,xscore=extract_score_exp(df,sta,cvar,stat,expS='_'+exp)
            xx=get_val_exp(expsP)
            ax2.plot(xx,xscore,'.',c=expsC[exp])
            ax2.plot(xx0,xscore0,'x',c=expsC[exp])
        ax.set_title(stat,fontsize=8)
    plt.show()
    plt.savefig('/home/david/test_pals/result/output/test.pdf')
    plt.close()
The code runs fine and the plt.show() command actually shows what I want, as seen in the following picture:
But unfortunately, the plt.savefig command only saves the last 1*4 subplot produced:
I tried different places in the loop for plt.savefig, but I did not succeed in saving the 4*3 figures in one pdf. Does anyone have an indication why?

Python-PPTX : Data Label Positions not working for Doughnut Chart

I have a Chart Placeholder, into which I have inserted a chart of chart_type 'DOUGHNUT'. I've added data labels to it and want to change their positions. For some reason, the method given in the documentation has no effect on my chart.
Here is my code; please tell me if I'm doing something wrong:
from pptx import Presentation
from pptx.chart.data import ChartData
from pptx.enum.chart import XL_CHART_TYPE, XL_LABEL_POSITION, XL_DATA_LABEL_POSITION, XL_TICK_MARK, XL_TICK_LABEL_POSITION
chart_data = ChartData()
chart_data.add_series('', tuple(input_chart_data[x] for x in input_chart_data))
graphic_frame = content_placeholder.insert_chart(XL_CHART_TYPE.DOUGHNUT, chart_data)
chart = graphic_frame.chart
chart.has_legend = False
#Adding Data-Labels with custom text
chart.plots[0].has_data_labels = True
data_labels = chart.plots[0].data_labels
i = 0
series = chart.series[0]
for point in series.points:
    fill = point.format.fill
    fill.solid()
    fill.fore_color.rgb = RGBColor(<color_code>)
    point.data_label.has_text_frame = True
    #Assigning custom text for data label associated with each data-point
    point.data_label.text_frame.text = str(chart_data.categories[i].label) + "\n" + str(float(chart.series[0].values[i])) + "%"
    for run in point.data_label.text_frame.paragraphs[0].runs:
        run.font.size = Pt(10)
    i+=1
data_labels.position = XL_LABEL_POSITION.OUTSIDE_END
PowerPoint is finicky about where you place certain chart attributes and feels free to ignore them when it wants (although it does so consistently).
A quick option worth trying is to set the value individually, point-by-point in the series. So something like:
for point in series.points:
    point.data_label.position = XL_LABEL_POSITION.OUTSIDE_END
The most reliable method is to start by producing the effect you want by hand, using PowerPoint itself on an example chart, then inspecting the XML PowerPoint produces in the saved file, perhaps using opc-diag. Once you've identified what XML produces the desired effect (or discovered PowerPoint won't let you do it), then you can proceed to working out how to get the XML generated by python-pptx. That might make a good second question if you're able to get that far.
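If opc-diag is not at hand, the chart XML can also be read straight out of the saved file, since a .pptx is a zip archive. The following is a minimal sketch using only the standard library. `find_label_positions` is a made-up helper name, and the code assumes that chart parts are stored under ppt/charts/ and that label positions appear as c:dLblPos elements in the DrawingML chart namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML chart namespace used by chart parts
CHART_NS = 'http://schemas.openxmlformats.org/drawingml/2006/chart'


def find_label_positions(pptx_path):
    """Return (part_name, position) pairs for every c:dLblPos element found."""
    found = []
    with zipfile.ZipFile(pptx_path) as archive:
        for part_name in archive.namelist():
            # Assumption: chart parts are stored as ppt/charts/*.xml
            if not (part_name.startswith('ppt/charts/') and part_name.endswith('.xml')):
                continue
            root = ET.fromstring(archive.read(part_name))
            for element in root.iter('{%s}dLblPos' % CHART_NS):
                found.append((part_name, element.get('val')))
    return found
```

Running it on a file edited by hand in PowerPoint and on one generated by python-pptx shows whether, and where, the position element differs.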
I made it work by writing the below code.
def apply_data_labels(self, chart):
    plot = chart.plots[0]
    plot.has_data_labels = True
    for series in plot.series:
        values = series.values
        counter = 0
        for point in series.points:
            data_label = point.data_label
            data_label.has_text_frame = True
            data_label.text_frame.text = str(values[counter])
            counter = counter + 1
The cause of the error is setting the label position: no matter what you set, PowerPoint asks to repair the file. I will have to dig deeper to see why that is.
Also, to save you some time: the formatting (font color, size) doesn't work either.
If anybody has any leads, please help.
To add on Vibhanshu's response, I could get the formatting (font type, font color, size etc) to work using the following code:
for idx, point in enumerate(chart.series[0].points):
    # set position
    point.data_label.position = XL_LABEL_POSITION.OUTSIDE_END
    # set text
    point.data_label.has_text_frame = True
    point.data_label.text_frame.text = "This is an example"
    # set formatting
    for paragraph_idx, paragraph in enumerate(point.data_label.text_frame.paragraphs):
        paragraph.line_spacing = 0.6 # set paragraph line spacing
        for run in paragraph.runs:
            run.font.size = Pt(30) #set font size
            run.font.name = 'Poppins Medium' #set font name
            run.font.color.rgb = RGBColor.from_string("FF0000") #set font color

Python return statement in PlotAnimation function

I have a problem in a function to be used in matplotlib FuncAnimation.
I'm aware this is not 100% related to the PlotAnimation code. It is more related to how return works. Anyway, here it goes:
I need to plot several lines in the same graph - but I don't know in advance how many lines. So, I created a list of plots (called lines) like that:
fig = plt.figure()
ax = plt.axes(xlim=(0, 1), ylim=(0, 1), aspect='equal')
lines=[]
for i in range(len(disc)):
    line, = ax.plot([], [], '-', lw=1.5)
    lines.append(line)
Let's say len(disc) is equal to or greater than 3. If I do this:
def animate(i):
    global disc
    for i in range(len(disc)):
        lines[i].set_data(disc[i].x, disc[i].y)
    return lines[0], lines[1], lines[2]
anim = animation.FuncAnimation(fig, animate, blit=True)
plt.show()
All works perfectly (except for the fact that, if len(disc) is greater than 3, I miss some of the lines in the plot).
However, I don't know len(disc) beforehand. So, my question is: Is there a way to use something like that:
return lines
(or something like that; I know the above doesn't work), replacing the return statement in the animate function above?
The first thing I can think of is returning a slice of the whole list:
return lines[0:len(lines)]
I don't know, if I got this right, as I don't use matplotlib myself, but from the documentation I got an idea.
Why does return lines not work? It should be equal to
return lines[0], lines[1], ..., lines[len(lines)-1]
Maybe an iterator helps here. The keyword is yield:
def animate(i):
    global disc
    for i in range(len(disc)):
        lines[i].set_data(disc[i].x, disc[i].y)
    yield from lines # in Python 2.x use: for line in lines: yield line
Although I don't think this is the correct/desired way to go, I cannot think of another solution for now.
Do you really need to return something? The documentation doesn't mention a required return-value.
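For what it's worth, my reading of the FuncAnimation documentation is that with blit=True the function has to return an iterable of the artists it changed, so returning the list itself should be enough. The sketch below only illustrates that return logic: `FakeLine` and `Disc` are stand-ins invented for this example so that it runs without matplotlib, and they are not matplotlib classes.

```python
class FakeLine(object):
    """Stand-in for a matplotlib Line2D; only records the data it is given."""
    def __init__(self):
        self.data = ([], [])

    def set_data(self, x, y):
        self.data = (list(x), list(y))


class Disc(object):
    """Stand-in for the question's disc objects, which expose .x and .y."""
    def __init__(self, x, y):
        self.x, self.y = x, y


disc = [Disc([0, 1], [0, k]) for k in range(5)]   # any number of discs
lines = [FakeLine() for _ in disc]                # one line per disc


def animate(frame):
    # Update every line, however many there are
    for line, d in zip(lines, disc):
        line.set_data(d.x, d.y)
    # A list is already an iterable of artists; no need to spell out each element
    return lines


if __name__ == '__main__':
    updated = animate(0)
    print(len(updated))
```

With real lines created by ax.plot, the same animate function would be passed to animation.FuncAnimation(fig, animate, blit=True).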

How to move from user defined function directly to a chart?

I'm trying to build a module that will count all the color words in a text and then display them on a pie chart. I already have the counting and the chart coded separately, but I can't figure out how to connect them.
Right now I have:
def colorperc(text):
    y0 = text.count("red")...text.count("rust")
    y1 = ...
    return {y0...y10}
colorperc(mobydick.txt)
...
fracs y0, y1...
How do I take these return values and plug them into the piechart fracs? I just keep getting an error: NameError: name 'y0' is not defined
First of all, text.count is unlikely to do what you want.
>>> "Red wired tired".count("red")
2
You will probably want to read through the text line by line, lowercase and split each line, and update a Counter. Then as Matt said, you will need to assign the value returned by your function rather than trying to access local variables.
Since your comment says you're using nltk, perhaps something like this (untested). We'll assume you're happy having the whole text in memory, since that matches your problem description.
from collections import Counter
import matplotlib.pyplot as plt
import nltk
def countcolors(text):
    tokens = nltk.word_tokenize(text)
    tokens = map(lambda x: x.lower(), tokens)
    ctr = Counter(tokens)
    colorlist = ("red", "rust", "blue", "green")
    colorcounts = dict((color,ctr[color]) for color in colorlist)
    return colorcounts
#we'll call the text in memory `mytext`
#zip(*...) splits the (color, count) pairs into two parallel tuples
colors, counts = zip(*countcolors(mytext).items())
fig, ax = plt.subplots(1,1)
ax.pie(counts,labels=colors)
plt.show()
Note that we simply use assignment (with =) to get the value returned by the function.
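If nltk is not available, the same idea can be sketched with the standard library alone. This is an illustration rather than a drop-in replacement for the code above: `count_colors` and its default color list are assumptions, and words are split with a simple regular expression instead of a real tokenizer.

```python
import re
from collections import Counter


def count_colors(text, colors=("red", "rust", "blue", "green")):
    """Count whole-word occurrences of each color, ignoring case."""
    # \w+ keeps "wired" and "tired" as single tokens, so they no longer count as "red"
    tokens = re.findall(r"\w+", text.lower())
    counter = Counter(tokens)
    return dict((color, counter[color]) for color in colors)


if __name__ == '__main__':
    # Assign the returned dict, then split it into labels and fractions for the pie
    counts = count_colors("Red wired tired. A red, rust-red door.")
    labels, fracs = zip(*counts.items())
    print(labels, fracs)
```

The labels and fracs tuples are what would be handed to ax.pie(fracs, labels=labels).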
