I'm trying to build a module that will count all the color words in a text and then display them on a pie chart. I already have the counting and the chart coded separately, but I can't figure out how to connect them.
Right now I have:
def colorperc(text):
y0 = text.count("red")...text.count("rust")
y1 = ...
return {y0...y10}
colorperc(mobydick.txt)
...
fracs y0, y1...
How do I take these return values and plug them into the piechart fracs? I just keep getting an error: NameError: name 'y0' is not defined
First of all, text.count is unlikely to do what you want.
>>> "Red wired tired".count("red")
2
You will probably want to read through the text line by line, lowercase and split each line, and update a Counter. Then as Matt said, you will need to assign the value returned by your function rather than trying to access local variables.
Since your comment says you're using nltk, perhaps something like this (untested). We'll assume you're happy having the whole text in memory, since that matches your problem description.
from collections import Counter
import matplotlib.pyplot as plt
import nltk

def countcolors(text):
    # Tokenize, then lowercase so "Red" and "red" are counted together
    tokens = nltk.word_tokenize(text)
    tokens = map(lambda x: x.lower(), tokens)
    ctr = Counter(tokens)
    colorlist = ("red", "rust", "blue", "green")
    colorcounts = dict((color, ctr[color]) for color in colorlist)
    return colorcounts

# we'll call the text in memory `mytext`
# zip(*...) splits the (color, count) pairs into two parallel tuples
colors, counts = zip(*countcolors(mytext).items())
fig, ax = plt.subplots(1, 1)
ax.pie(counts, labels=colors)
plt.show()
Note that we simply use assignment (with =) to get the value returned by the function.
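The line-by-line approach mentioned at the start of this answer can also be sketched without nltk, using only the standard library. The function name `count_colors_by_line`, the regex tokenizer, and the sample lines are assumptions made for illustration; they are not part of the original code.

```python
import re
from collections import Counter

COLORS = ("red", "rust", "blue", "green")

def count_colors_by_line(lines, colors=COLORS):
    """Count whole-word occurrences of each color, one line at a time."""
    counter = Counter()
    for line in lines:
        # Lowercase, then keep only alphabetic runs so "Red," matches "red"
        # while "wired" and "tired" do not.
        counter.update(re.findall(r"[a-z]+", line.lower()))
    return {color: counter[color] for color in colors}

sample = ["Red wired tired", "A rust-red hull on a blue sea."]
counts = count_colors_by_line(sample)
print(counts)
```

An open file object can be passed as `lines`, since iterating over it yields one line at a time, so the whole text never has to be in memory.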
Hello, I am totally new to using Python for data visualisation. I have this JSON response:
{
"max365": 83.87,
"current365": 83.87,
"min365": 75.29,
"max180": 76.94,
"current180": 76.94,
"min180": 56.43,
"max90": 98.66,
"current90": 98.66,
"min90": 63.29,
"max30": 138.14,
"current30": 136,
"min30": 66.77,
"max14": 156.93,
"current14": 122.88,
"min14": 72.56,
"max7": 168.9,
"current7": 122.68,
"min7": 74.08,
"max0": 267.5,
"current0": 81.28,
"min0": 36.07 }
max, current and min are the lines I would like to plot on a multi-line graph, but I am struggling with the data in this date/time-grouping format with the response.
I've added a screenshot of a graph here that I am essentially trying to reverse engineer:
I've seen some useful posts for generic line graphs, but my issue here is mostly linking all the max/current/mins onto their own lines whilst I have 0/7/14/30/90/180/365 grouping/intersecting each of them in the response.
Hope I've explained it well enough. Any help would be greatly appreciated.
The Python standard library has a json module; you only need to import its loads function and massage the data a little.
Here is the code. Note that we need to split each key to separate the real label from the sequence information (are the numbers times? I made an educated guess). For each label we build a list of [time, value] pairs, then we sort each list, and finally we plot the three lines.
from matplotlib.pyplot import subplots
from json import loads

def split_num(s):
    num = []
    for c in reversed(s):
        if c.isdigit():
            num += c
        else:
            break
    if num:
        return s[:-len(num)], ''.join(reversed(num))
    else:
        return s, ''

json = '''{
"max365": 83.87,"current365": 83.87,"min365": 75.29,
"max180": 76.94,"current180": 76.94,"min180": 56.43,
"max90": 98.66,"current90": 98.66,"min90": 63.29,
"max30": 138.14,"current30": 136,"min30": 66.77,
"max14": 156.93,"current14": 122.88,"min14": 72.56,
"max7": 168.9,"current7": 122.68,"min7": 74.08,
"max0": 267.5,"current0": 81.28,"min0": 36.07 }'''

jdict = loads(json)
data = {}
for k in jdict:
    name, num = split_num(k)
    data[name] = data.setdefault(name, []) + [[int(num), float(jdict[k])]]
for k in data: data[k] = sorted(data[k])

fig, ax = subplots()
for k in data:
    ax.plot(*zip(*data[k]), label=k)
ax.legend()
fig.show()
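As a variation on the key-splitting step above, here is a minimal sketch that uses a regular expression instead of scanning the key backwards. The function name `group_series` and the decision to skip keys that don't match the pattern are assumptions for illustration, not part of the answer's code.

```python
import re
from collections import defaultdict

def group_series(flat):
    """Turn {"max7": 168.9, ...} into {"max": [(7, 168.9), ...], ...}."""
    series = defaultdict(list)
    for key, value in flat.items():
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", key)
        if match is None:
            continue  # skip keys that do not follow the name+number pattern
        name, number = match.groups()
        series[name].append((int(number), float(value)))
    # Sort each series by its numeric suffix so the line is drawn in order.
    return {name: sorted(points) for name, points in series.items()}

sample = {"max7": 168.9, "min7": 74.08, "max0": 267.5, "min0": 36.07}
grouped = group_series(sample)
print(grouped["max"])
```

Each value in the result is a sorted list of (number, value) pairs, so it can be passed to `ax.plot(*zip(*points), label=name)` exactly as in the code above.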
I have a Chart Placeholder, into which I have inserted a chart of chart_type 'DOUGHNUT'. I've added data labels to it and want to change their positions. For some reason, the method given in the documentation has no effect on my chart.
Here is my code, please help if I'm doing something wrong -
from pptx import Presentation
from pptx.chart.data import ChartData
from pptx.dml.color import RGBColor
from pptx.enum.chart import XL_CHART_TYPE, XL_LABEL_POSITION, XL_DATA_LABEL_POSITION, XL_TICK_MARK, XL_TICK_LABEL_POSITION
from pptx.util import Pt

chart_data = ChartData()
chart_data.add_series('', tuple(input_chart_data[x] for x in input_chart_data))
graphic_frame = content_placeholder.insert_chart(XL_CHART_TYPE.DOUGHNUT, chart_data)
chart = graphic_frame.chart
chart.has_legend = False

#Adding Data-Labels with custom text
chart.plots[0].has_data_labels = True
data_labels = chart.plots[0].data_labels
i = 0
series = chart.series[0]
for point in series.points:
    fill = point.format.fill
    fill.solid()
    fill.fore_color.rgb = RGBColor(<color_code>)
    point.data_label.has_text_frame = True
    #Assigning custom text for data label associated with each data-point
    point.data_label.text_frame.text = str(chart_data.categories[i].label) + "\n" + str(float(chart.series[0].values[i])) + "%"
    for run in point.data_label.text_frame.paragraphs[0].runs:
        run.font.size = Pt(10)
    i+=1
data_labels.position = XL_LABEL_POSITION.OUTSIDE_END
PowerPoint is finicky about where you place certain chart attributes and feels free to ignore them when it wants (although it does so consistently).
A quick option worth trying is to set the value individually, point-by-point in the series. So something like:
for point in series.points:
    point.data_label.position = XL_LABEL_POSITION.OUTSIDE_END
The most reliable method is to start by producing the effect you want by hand, using PowerPoint itself on an example chart, then inspecting the XML PowerPoint produces in the saved file, perhaps using opc-diag. Once you've identified what XML produces the desired effect (or discovered PowerPoint won't let you do it), then you can proceed to working out how to get the XML generated by python-pptx. That might make a good second question if you're able to get that far.
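If opc-diag isn't at hand, the inspection step can be roughly approximated with the standard library, since a .pptx file is a zip archive. This is only a sketch under stated assumptions: that chart parts live under `ppt/charts/`, that the chart XML uses the conventional `c:` namespace prefix, and that the label position is stored in a `c:dLblPos` element. The function name `find_label_positions` is hypothetical.

```python
import re
import zipfile

def find_label_positions(pptx_path):
    """Map each chart part in a .pptx to the dLblPos values found in its XML."""
    found = {}
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            # Assumption: chart parts are stored under ppt/charts/ as .xml files.
            if name.startswith("ppt/charts/") and name.endswith(".xml"):
                xml = archive.read(name).decode("utf-8")
                # Assumption: the conventional "c:" prefix is used for chart XML.
                found[name] = re.findall(r'<c:dLblPos val="([^"]+)"', xml)
    return found
```

Running this on a file saved by hand from PowerPoint and on one generated by python-pptx, then comparing the two results, shows whether and where the position value ends up in each.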
I made it work by writing the below code.
def apply_data_labels(self, chart):
    plot = chart.plots[0]
    plot.has_data_labels = True
    for series in plot.series:
        values = series.values
        counter = 0
        for point in series.points:
            data_label = point.data_label
            data_label.has_text_frame = True
            data_label.text_frame.text = str(values[counter])
            counter = counter + 1
The error is caused by setting the label position: no matter what value you set, PowerPoint asks to repair the file. I will have to dig further to see why.
Also, to save others some time: the formatting (font color, size) doesn't work either.
If anybody has any leads, please help.
To add on Vibhanshu's response, I could get the formatting (font type, font color, size etc) to work using the following code:
for idx, point in enumerate(chart.series[0].points):
    # set position
    point.data_label.position = XL_LABEL_POSITION.OUTSIDE_END
    # set text
    point.data_label.has_text_frame = True
    point.data_label.text_frame.text = "This is an example"
    # set formatting
    for paragraph_idx, paragraph in enumerate(point.data_label.text_frame.paragraphs):
        paragraph.line_spacing = 0.6 # set paragraph line spacing
        for run in paragraph.runs:
            run.font.size = Pt(30) #set font size
            run.font.name = 'Poppins Medium' #set font name
            run.font.color.rgb = RGBColor.from_string("FF0000") #set font color
I am trying to write an orientation routine for a 3-axis accelerometer. The part I am getting stuck on is this: I have one dict listing all my axes, and after taking the 'z-axis' reading, I want to remove that axis from the Available_axis list. Here is a portion of my code that demonstrates what I am trying to do.
import operator
Readings1 = { 0:{'x':0.1, 'y':-1, 'z':-0.1}, 1:{'x':.4, 'y':-.1, 'z':-0.1},
2:{'x':-0.4, 'y':-.8, 'z':-0.1}, 3:{'x':0.1, 'y':-.1, 'z':-0.6},
4:{'x':0.1, 'y':-.2, 'z':0.4}}
SetupValue = {'Axis':{'x-axis':'x','y-axis':'y','z-axis':'z'}}
Available_axis = [SetupValue['Axis']['x-axis'], SetupValue['Axis']['y-axis'], SetupValue['Axis']['z-axis']]
axes = Readings1[0]
print axes
for key in axes:
    axes[key] = abs(axes[key])
print axes
print (max(axes.iteritems(), key = operator.itemgetter(1))[0])
Available_axis.pop(max(axes.iteritems(), key = operator.itemgetter(1))[0],0)
Any help would be appreciated.
Available_axis is a list. When popping from a list, you must specify the integer location.
You can also have a short list comprehension that removes the target variable.
Available_axis = [x for x in Available_axis
if x != max(axes.iteritems(), key = operator.itemgetter(1))[0]]
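Another option, sketched here in Python 3 syntax, is `list.remove`, which deletes by value rather than by index. The variable names `available_axes`, `reading`, and `dominant` are assumptions for illustration and mirror the question's data rather than reuse its exact names.

```python
available_axes = ["x", "y", "z"]
reading = {"x": 0.1, "y": -1, "z": -0.1}

# Pick the axis whose reading has the largest magnitude.
dominant = max(reading, key=lambda axis: abs(reading[axis]))

# list.remove deletes by value; list.pop would need the integer index.
available_axes.remove(dominant)
print(dominant, available_axes)
```

Unlike the loop in the question, this leaves the original readings unmodified, because `abs` is applied only inside the key function.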
I'm producing some figures with the following code:
def boxplot_data(self,parameters_file,figure_title):
    data = pandas.read_csv(parameters_file)
    header = data.keys()
    number_of_full_subplots = len(header)/16
    remainder = len(header)-(16*number_of_full_subplots)
    try:
        for i in range(number_of_full_subplots+1):
            fig =plt.figure(i)
            txt = fig.suptitle(figure_title+' (n='+str(len(data[header[0]]))+') '+'Page '+str(i)+' of '+str(number_of_full_subplots),fontsize='20')
            txt.set_text(figure_title+' (n='+str(len(data[header[0]]))+') '+'Page '+str(i)+' of '+str(number_of_full_subplots))
            for j in range(16):
                plt.ioff()
                plt.subplot(4,4,j)
                plt.boxplot(data[header[16*i+j]])
                plt.xlabel('')
                mng=plt.get_current_fig_manager()
                mng.window.showMaximized()
                plt.savefig(str(i)+'.png',bbox_inches='tight',orientation='landscape')
                plt.close(fig)
                plt.ion()
    except IndexError:
        txt = fig.suptitle(figure_title+' (n='+str(len(data[header[0]]))+') '+'Page '+str(i)+' of '+str(number_of_full_subplots),fontsize='20')
        txt.set_text(figure_title+' (n='+str(len(data[header[0]]))+') '+'Page '+str(i)+' of '+str(number_of_full_subplots))
        print '{} full figures were created and 1 partially filled \
figure containing {} subplots'.format(number_of_full_subplots,remainder)
This produces the figures and saves them to file, properly formatted. However, no matter what I do, the code seems to bypass the fig.suptitle line(s), so I can't give my figures a title. Apologies if there is a lot going on in this function that I haven't explained, but does anybody know why this code refuses to give my figures titles?
Your problem is not that suptitle is bypassed, but that you are never saving the figure that you call suptitle on. All your calls to savefig are within the inner loop and as such are saving only the subplots. You can actually watch this happening if you open the png file while your code is running - you see each of the 16 sub axes being added one by one.
Your code looks unnecessarily complicated. For instance, I don't think you need to use ion and ioff. Here is a simple example of how to do what I think you want, followed by a translation of your code to fit that. (Obviously I can't test it, because I don't have your data.)
import matplotlib.pyplot as plt

test_y=range(10)
test_x=[8,13,59,8,81,2,5,6,2,3]

def subplotsave_test():
    for i in range(5):
        fig = plt.figure(i)
        txt = fig.suptitle('Page '+str(i)+' of '+str(5),fontsize='20')
        for j in range(16):
            plt.subplot(4,4,j+1)
            plt.plot(test_y,test_x)
        plt.savefig(str(i)+'.png',bbox_inches='tight',orientation='landscape')

if __name__ == '__main__':
    subplotsave_test()
One tip that works for me: call plt.show() wherever you intend to save the figure, check that it looks the way you want, and then replace that call with plt.savefig().
Possible translation of your function
def boxplot_data(self,parameters_file,figure_title):
    data = pandas.read_csv(parameters_file)
    header = data.keys()
    # // keeps the page count an integer on both Python 2 and Python 3
    number_of_full_subplots = len(header)//16
    remainder = len(header)-(16*number_of_full_subplots)
    for i in range(number_of_full_subplots+1):
        fig =plt.figure(i)
        fig.suptitle(figure_title+' (n='+str(len(data[header[0]]))+') '+'Page '+str(i)+' of '+str(number_of_full_subplots),fontsize='20')
        for j in range(16):
            plt.subplot(4,4,j+1)
            if 16*i + j < len(header):
                plt.boxplot(data[header[16*i+j]])
                plt.xlabel('')
                #You might want the showMaximized() call here - does nothing
                #on my machine but YMMV
            else:
                print('{} full figures were created and 1 partially filled \
figure containing {} subplots'.format(number_of_full_subplots,remainder))
                break
        # Save once per figure, after all of its subplots have been drawn
        plt.savefig(str(i)+'.png',bbox_inches='tight',orientation='landscape')
        plt.close(fig)
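The page arithmetic (a number of full pages plus one partially filled page) can also be handled by splitting the column names into chunks up front, which removes the need for the bounds check inside the inner loop. This is a minimal sketch; `paginate` and the made-up column names are hypothetical, and the plotting calls are left out so the pagination can be checked on its own.

```python
def paginate(columns, per_page=16):
    """Split column names into pages of at most per_page items."""
    return [columns[start:start + per_page]
            for start in range(0, len(columns), per_page)]

columns = ["col%d" % n for n in range(37)]
pages = paginate(columns)
for page_number, page in enumerate(pages, start=1):
    # One figure per page: suptitle once, one subplot per column, savefig once.
    title = "Page %d of %d" % (page_number, len(pages))
    print(title, len(page))
```

With 37 columns this gives two full pages of 16 and a final page of 5, and the "Page x of y" title reflects the real page count.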
I have the following functions in python:
def extractParam(word, varName, stringToReplace):
    if word.startswith(stringToReplace):
        varName=int (word.replace(stringToReplace, ''))
    return varName

def getMParams(line):
    l = r = b = t = 0
    sline = line.strip().split()
    for i in range(len(sline)):
        l = extractParam(sline[i], l, "l=")
        r = extractParam(sline[i], r, "r=")
        b = extractParam(sline[i], b, "b=")
        t = extractParam(sline[i], t, "t=")
    return l, r, b, t

def getIterParams (line):
    width = height = stride = x = y = 0
    sline = line.strip().split()
    for i in range(len(sline)):
        width = extractParam(sline[i], width, "width=")
        height = extractParam(sline[i], height,"height=")
        stride = extractParam(sline[i], stride,"stride=")
        x = extractParam(sline[i], x, "x=")
        y = extractParam(sline[i], y, "y=")
    return width, height , stride, x, y
The functions getMParams and getIterParams are nearly identical. My question is whether there is a way to write a single function that replaces both of them. I was thinking of something like this:
def func (line, params):
    # params is an array of parameters (i.e [l,r,b,t] or [width,height,stride,x,y])
    # init all params
    sline = line.strip().split()
    for i in range(len(sline)):
        # for everyParam:
        param = extractParam(sline[i],param,"param=")
Is this possible, or is there another way to do it?
First off, some style points:
The way you handle varName in extractParam is ugly and confusing. It took me a while to figure out what you are trying to do (i.e., allow for the fact that extractParam might not find any data). For now, this is better handled directly instead of trying to call out to a function.
That range(len( thing you're doing has to stop.
You do not need to strip the line before splitting it - any leading and trailing whitespace will disappear during the splitting operation. You will not end up with any extra empty strings in the result.
The name sline is just plain ugly. You've split the line up into words; why not refer to the words as, well, words? (And in any case, don't use abbreviations and jumble things up. Otherwise you get things like sline that are not actually words.)
We don't use namesLikeThis for functions (or anything else; although we do use NamesLikeThis, with a starting capital letter, for classes) in Python. We use names_like_this.
Also, it looks as though you are repeatedly trying to replace
That said, your proposed approach is fine. Note that since we don't know ahead of time how many items will be extracted, we can't just toss each one into a separate variable. But we can solve this easily by returning a dict.
My approach is as follows: I iterate over the names, and for each attempt to find the corresponding word in the line. Upon finding one, I replace the default value of 0, and after this check is done, I insert the corresponding key-value pair into the returned result. I also take a simpler approach to cutting up the word around the equals sign.
def extract_params(line, names):
    words = line.split()
    result = {}
    for name in names:
        value = 0
        for word in words:
            maybe_name, equals, maybe_value = word.partition('=')
            if maybe_name == name and equals == '=':
                # convert to int, as extractParam did in the question
                value = int(maybe_value)
        result[name] = value
    return result
This could potentially be improved quite a bit, but much depends on your exact specifications. I tried to create something that follows your basic logic as closely as possible.
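One possible improvement, offered only as a sketch, is to parse each word once and then look up the requested names, instead of rescanning all the words for every name. The name `extract_params_once`, the `default` parameter, and the sample line are assumptions for illustration; the sketch also assumes every matching value is an integer, as in the question.

```python
def extract_params_once(line, names, default=0):
    """Parse every key=value word once, then look up the requested names."""
    found = {}
    for word in line.split():
        key, equals, raw_value = word.partition("=")
        if equals and key in names:
            found[key] = int(raw_value)  # assumes integer values, as in the question
    return {name: found.get(name, default) for name in names}

params = extract_params_once("M l=3 r=10 b=2", ["l", "r", "b", "t"])
print(params)
```

The same function then covers both original cases by passing `["l", "r", "b", "t"]` or `["width", "height", "stride", "x", "y"]` as `names`.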