I'm sure I'm being a complete numpty here. I'm trying to create a couple of pie charts showing the demographics of respondents to a survey (in this case, Parents or Teachers). Obviously, at the moment, the columns contain strings, which can't be put into a pie chart. So I thought I'd do a count of the strings and put that into a variable. However, when I then try to use that in a pie chart, it fails.
I know this is probably something really simple, and I have Googled around, but I can't seem to find a way to get this working.
Code as follows:
respondents_pie=df.groupby('Respondents').size()
print(respondents_pie)
Output
Respondents
Parents 31
Teachers 20
dtype: int64
fig=plt.figure()
ax=fig.add_axes(0,0,1,1)
ax.axis('equal')
ax.pie(respondents_pie, autopct='%1.2f%%')
plt.show()
Error is: TypeError: from_bounds() argument after * must be an iterable, not int
Error is on line 2 of the code (ax=fig.add_axes(0,0,1,1))
How have I messed this one up?
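I found a solution. Changing ax=fig.add_axes(0,0,1,1) to fig, ax = plt.subplots() resolved the issue. The original call failed because add_axes expects the rectangle as a single sequence, so ax=fig.add_axes([0,0,1,1]) works as well.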
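For anyone landing here with the same TypeError, here is a minimal sketch of the working version. The DataFrame is made up to match the counts printed in the question (31 Parents, 20 Teachers), and the Agg backend line is only there so the snippet runs without a display; both are assumptions, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical survey data reproducing the counts from the question
df = pd.DataFrame({"Respondents": ["Parents"] * 31 + ["Teachers"] * 20})
counts = df.groupby("Respondents").size()

fig = plt.figure()
# add_axes takes ONE sequence [left, bottom, width, height], not four arguments
ax = fig.add_axes([0, 0, 1, 1])
ax.axis("equal")  # keep the pie circular
ax.pie(counts, labels=counts.index, autopct="%1.2f%%")
```

Using fig, ax = plt.subplots() instead of the two figure/add_axes lines gives the same kind of Axes with default margins.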
I have been trying to make a figure using plotly that combines multiple figures together. In order to do this, I have been trying to use the make_subplots function, but I have found it very difficult to have the plots added in such a way that they are properly formatted. I can currently make singular plots (as seen directly below):
However, whenever I try to combine these singular plots using make_subplots, I end up with this:
This figure has the subplots set up completely wrong, since I need each of the four subplots to contain data pertaining to the four methods (A, B, C, and D). In other words, I would like to have four subplots that look like my singular plot example above.
I have set up the code in the following way:
for sequence in sequences:
    #process for making sequence profile is done here
    sequence_df = pd.DataFrame(sequence_profile)
    row_number=1
    grand_figure = make_subplots(rows=4, cols=1)
    #there are four groups per sequence, so the grand figure should have four subplots in total
    for group in sequence_df["group"].unique():
        figure_df_group = sequence_df[(sequence_df["group"]==group)]
        figure_df_group.sort_values("sample", ascending=True, inplace=True)
        figure = px.line(figure_df_group, x = figure_df_group["sample"], y = figure_df_group["intensity"], color= figure_df_group["method"])
        figure.update_xaxes(title= "sample")
        figure.update_traces(mode='markers+lines')
        #note: the next line fails, since data must be extracted from the figure, hence why it is commented out
        #grand_figure.append_trace(figure, row = row_number, col=1)
        figure.update_layout(title_text="{} Profile Plot".format(sequence))
        grand_figure.append_trace(figure.data[0], row = row_number, col=1)
        row_number+=1
        figure.write_image(os.path.join(output_directory+"{}_profile_plot_subplots_in_{}.jpg".format(sequence, group)))
    grand_figure.write_image(os.path.join(output_directory+"grand_figure_{}_profile_plot_subplots.jpg".format(sequence)))
I have tried following directions (like for example, here: ValueError: Invalid element(s) received for the 'data' property) but I was unable to get my figures added as is as subplots. At first it seemed like I needed to use the graph object (go) module in plotly (https://plotly.com/python/subplots/), but I would really like to keep the formatting/design of my current singular plot. I just want the plots to be conglomerated in groups of four. However, when I try to add the subplots like I currently do, I need to use the data property of the figure, which causes the design of my scatter plot to be completely messed up. Any help for how I can ameliorate this problem would be great.
Ok, so I found a solution here. Rather than using the make_subplots function, I just instead exported all the figures onto an .html file (Plotly saving multiple plots into a single html) and then converted it into an image (HTML to IMAGE using Python). This isn't exactly the approach I would have preferred to have, but it does work.
UPDATE
I have found that plotly express offers another solution: px.line has the facet_col (and facet_row) parameters, which let you set up multiple subplots within a single plot. My code is set up like this; it differs from the code above in that the dataframe does not need to be iterated over in a for loop based on its groups:
sequence_df = pd.DataFrame(sequence_profile)
figure = px.line(sequence_df, x = sequence_df["sample"], y = sequence_df["intensity"], color= sequence_df["method"], facet_col= sequence_df["group"])
Although it still needs more formatting, my plot now looks like this, which works much better for my purposes:
Learning Python as an alternative to Excel. Was feeling quite proud of myself having constructed a pivot table and drawn a chart in just a day. But I can't move the legend on the chart. I've read the Matlab documentation and various other examples 50 times so time to ask.
What is wrong with the code?
How do I tell python/matlab to keep using the default column headings for the legend but simply to be outside the X axis?
Code is:
my_plot = windtable.plot(kind='line')
my_plot.set_title("Wind Power")
my_plot.set_ylabel("MW")
my_plot.set_xlabel("Time of day")
my_plot.legend('NSW','QLD','SA','TAS','VIC','Location','southoutside')
returns the chart with default legend and a type error
TypeError Traceback (most recent call last)
----> 5 my_plot.legend('NSW','QLD','SA','TAS','VIC','Location','southoutside', 'Orientation','horizontal')....TypeError: legend only accepts two non-keyword arguments
I've tried a lot of variations including assigning the legend to a variable, using a with statement, but no go.
Two things are going on here.
First, to assign values to a legend, they need to be passed as a list (it is not clear from your example but I guess you have 5 lines, from NSW to VIC).
Second, you want the location of your legend to be outside of the axes. To do so, you can use a combination of the loc and bbox_to_anchor keyword arguments, like this:
my_plot.set_title("Wind Power")
my_plot.set_ylabel("MW")
my_plot.set_xlabel("Time of day")
my_plot.legend(
    ['NSW','QLD','SA','TAS','VIC'],
    loc='center left',
    bbox_to_anchor=(1, 0.5)
)
This is what I get on my test dataframe:
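If the goal is the MATLAB-style 'southoutside' placement with the default column names, one option is to call legend() without a label list and anchor it just below the axes. A minimal sketch, assuming a made-up windtable (the numbers are invented) and a headless backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the pivot table in the question
windtable = pd.DataFrame(
    {"NSW": [1, 2, 3], "QLD": [2, 3, 4], "SA": [3, 2, 1],
     "TAS": [1, 1, 2], "VIC": [2, 2, 3]},
    index=[6, 12, 18],
)

my_plot = windtable.plot(kind="line")
my_plot.set_title("Wind Power")
my_plot.set_ylabel("MW")
my_plot.set_xlabel("Time of day")

# No labels passed: the legend keeps the column headings.
# loc names the legend corner/edge that is pinned to bbox_to_anchor,
# given in axes coordinates; y < 0 puts it below the x axis.
legend = my_plot.legend(
    loc="upper center",
    bbox_to_anchor=(0.5, -0.15),
    ncol=len(windtable.columns),  # one row, like 'Orientation','horizontal'
)
```

The -0.15 offset is a guess that usually clears the x label; when saving, savefig(..., bbox_inches="tight") helps keep the legend from being cropped.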
First of all,
I'm learning to use Python and sometimes it's a little tricky for me.
I'm using a Game of Thrones dataset from Kaggle to learn visualizations. Now I'm trying to see how many characters of each house died in each book.
So I wrote this code:
houses_deathbybook = data_deathsB.groupby(['Book_of_Death', 'Allegiances']).count()[['Name']]
to see a count of deaths by house and book.
And I used the subplot command to produce this graph.
I'm now trying to make that graph more useful using this code:
fig, axes = plt.subplots(nrows=1, ncols=1, gridspec_kw={'wspace': 0.1, 'hspace': 0.9})
data_deathsB.loc[data_deathsB['Allegiances']=='House Arryn'.groupby(['Book_of_Death']).agg('count').plot(x='Book of Death', y='Muertes',kind='bar',figsize=(20,15),color='limegreen',grid=True,ax=axes[1,0], title='House Arryn',fontsize=13)
The second part of the code will be replicated for each house.
But it doesn't seem to work. As a test, I set the grid to just 1 row and 1 column to check one house, and it gives me the following error: "unexpected EOF while parsing".
Could you help me?
The problem in your second approach is that you have defined a figure with two subplots: a single column and two rows. When there is only a single row or a single column, axes is a one-dimensional array, so you can't use two indices such as [0,0] to access the subplots. In that case, use a single index, as in the following
ax=axes[0],title='House Arryn')
and
ax=axes[1],title='House Arryn')
The two-index style [0,0], [0,1], etc. works when you have more than one row and more than one column.
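To make the indexing rule concrete, here is a small sketch of what plt.subplots returns for different grid sizes. One extra note: "unexpected EOF while parsing" is a syntax error rather than an indexing error, and in the question's line the .loc[ bracket is never closed, so that needs fixing too. The variable names below are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np

# 1x1 grid: a single Axes object, not an array, so no indexing at all
fig1, ax_single = plt.subplots(nrows=1, ncols=1)

# One row or one column: 1-D array, use axes[0], axes[1], ...
fig2, axes_1d = plt.subplots(nrows=2)

# Several rows AND columns: 2-D array, use axes[row, col]
fig3, axes_2d = plt.subplots(nrows=2, ncols=2)

# squeeze=False always returns a 2-D array, so axes[row, col] works for any grid
fig4, axes_always_2d = plt.subplots(nrows=2, squeeze=False)

print(type(ax_single).__name__, axes_1d.shape, axes_2d.shape, axes_always_2d.shape)
```

Passing squeeze=False is handy when the same plotting loop has to work for any number of houses.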
It worked!
This is the result of the following code (just one of the graphs):
fig, axes = plt.subplots(nrows=2,gridspec_kw={'hspace': 1})
data_deathsB.loc[data_deathsB['Allegiances']=='House Arryn',['Allegiances', 'Name', 'Book_of_Death']].groupby(['Book_of_Death'],as_index=False).agg('count').plot(x='Book_of_Death', kind='bar',figsize=(20,15),color='limegreen',grid=True,ax=axes[0],title='House Arryn')
data_deathsB.loc[data_deathsB['Allegiances']=='House Baratheon',['Allegiances', 'Name', 'Book_of_Death']].groupby(['Book_of_Death'],as_index=False).agg('count').plot(x='Book_of_Death', kind='bar',figsize=(20,15),color='limegreen',grid=True,ax=axes[1],title='House Baratheon')
The next step would be to make the graphs a little prettier.
Thanks to everyone!
I'm quite new to Python, pandas DataFrames and Seaborn. When I was trying to understand Seaborn better, particularly sns.lmplot, I came across a difference between two figures made of the same data, that I thought were supposed to look alike, and I wonder why that is.
Data: My data is a pandas DataFrame that has 454 rows and 19 columns. The data relevant to this question includes 4 columns and looks something like this:
Column        Variable type
Av_density    Continuous
pred2         Continuous
LOC           Categorical (1...4)
Year          Categorical (2012...2014)
There are no missing data points.
My aim is to draw a 2x2 figure panel describing the relationship between Av_density and pred2 separately for each LOC(=location) with years marked with different colours. I call seaborn with:
import seaborn as sns
sns.set(style="whitegrid")
np.random.seed(sum(map(ord, "linear_categorical")))
(Side point: for some reason calling "linear_quantitative" does not work, i.e. I get a "File "stdin", line 2
sns.lmplot("Av_density", "pred2", Data, col="LOC", hue="YEAR", col_wrap=2);
^
SyntaxError: invalid syntax")
Figure method 1, FacetGrid + scatter:
sur=sns.FacetGrid(Data,col="LOC", col_wrap=2,hue="YEAR")
sur.map(plt.scatter, "Av_density", "pred2" );
plt.legend()
This produces a nice scatter of the data accurately. You can see the picture here: https://drive.google.com/file/d/0B7h2wsx9mUBScEdUbGRlRk5PV1E/view?usp=sharing
Figure method 2, sns.lmplot:
sns.lmplot("Av_density", "pred2", Data, col="LOC", hue="YEAR", col_wrap=2);
This produces the figure panel divided by LOC accurately, with Years in different colours, but the scatter of the data points does not look right. Instead, it looks like lmplot has linearised the data points, and lost the original scatter points that it is supposed to be drawing in addition to the regression lines.
You can see the figure here: https://drive.google.com/file/d/0B7h2wsx9mUBSRkN5ZXhBeW9ob1E/view?usp=sharing
My data produces only three points per location per year, and I was first wondering if this is what makes the "mistake" in lmplot datapoint. Optimally I would have a shorter line describing the trend between years instead of a proper regression, but I have not figured out the code to this yet.
But before tackling that issue, I would really like to know if there is something I am doing wrong that I can fix, or if this is an issue of lmplot trying to handle my data?
Any help, comments and ideas on this are warmly welcome!
-TA-
Ps. I'm running Python 2.7.8 with Spyder 2.3.4
EDIT: I get shorter "trend lines" with the first method by adding:
sur.map(plt.plot,"Av_density", "pred2" );
Still would like to know what is messing the figure with lmplot.
The issue is probably only that the added regression line is messing up the y-axis, so that the variability in the data cannot be seen.
Try resetting the y-axis based on the variability in your original plot to see if they show the same thing, in your case e.g.
fig1 = sns.lmplot(x="Av_density", y="pred2", data=Data, col="LOC", hue="YEAR", col_wrap=2)
fig1.set(ylim=(-0.03, 0.05))
plt.show()
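Rather than hard-coding the limits, they can be derived from the data itself. The following is only a sketch on synthetic data (three points per location per year, as described in the question; the values are random and the 10% padding is an arbitrary choice), using keyword arguments, which recent seaborn versions require:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for Data: 4 locations x 3 years x 3 points
rng = np.random.default_rng(0)
rows = [
    {"LOC": loc, "YEAR": year, "Av_density": float(d),
     "pred2": float(rng.normal(0.01, 0.01))}
    for loc in (1, 2, 3, 4)
    for year in (2012, 2013, 2014)
    for d in (10, 20, 30)
]
data = pd.DataFrame(rows)

# y-limits taken from the raw points, with a little padding
pad = 0.1 * (data["pred2"].max() - data["pred2"].min())
ylim = (data["pred2"].min() - pad, data["pred2"].max() + pad)

grid = sns.lmplot(
    data=data, x="Av_density", y="pred2",
    col="LOC", hue="YEAR", col_wrap=2,
    ci=None,  # skip the confidence band; with 3 points it is not informative
)
grid.set(ylim=ylim)  # applies the same limits to every facet
```

With the limits pinned to the data range, the scatter should look like the FacetGrid version, with the regression lines clipped where they leave the range.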
I am new to pandas and matplotlib, but not to Python. I have two questions; a primary and a secondary one.
Primary:
I have a pandas boxplot with FICO score on the x-axis and interest rate on the y-axis.
My x-axis is all messed up since the FICO scores are overwriting each other.
I'd like to show only every 4th or 5th ticklabel on the x-axis for a couple of reasons:
in general it's less chart-junky
in this case it will allow the labels to actually be read.
My code snippet is as follows:
plt.figure()
loansmin = pd.read_csv('../datasets/loanf.csv')
p = loansmin.boxplot('Interest.Rate','FICO.Score')
I saved the return value in p as I thought I might need to manipulate the plot further which I do now.
Secondary:
How do I access the plot, subplot, axes objects from pandas boxplot.
p above is a matplotlib.axes.AxesSubplot object.
help(matplotlib.axes.AxesSubplot) gives a message saying:
AttributeError: 'module' object has no attribute 'AxesSubplot'
dir(matplotlib.axes) lists Axes, Subplot and SubplotBase as in that namespace but no AxesSubplot. How do I understand this returned object better?
As I explored further I found that one could explore the returned object p via dir().
Doing this I found a long list of useful methods, amongst which was set_xticklabels.
Doing help(p.set_xticklabels) gave some cryptic, but still useful, help - essentially suggesting passing in a list of strings for ticklabels.
I then tried doing the following - adding set_xticklabels to the end of the last line in the above code effectively chaining the invocations.
plt.figure()
loansmin = pd.read_csv('../datasets/loanf.csv')
p=loansmin.boxplot('Interest.Rate','FICO.Score').set_xticklabels(['650','','','','','700'])
This gave the desired result. I suspect there's a better way, such as the way matplotlib lets you show every nth label. But for immediate use this works, and it also allows setting labels where they are not periodic for whatever reason, if you need that.
As usual, writing out the question explicitly helped me find the answer. And if anyone can help me get to the underlying matplotlib object that is still an open question.
AxesSubplot (I think) is just another way to get at the Axes in matplotlib. set_xticklabels() is part of the matplotlib object oriented interface (on axes). So, if you were using something like pylab, you might use xticks(ticks, labels), but instead here you have to separate it into different calls ax.set_xticks(ticks), ax.set_xticklabels(labels). (where ax is an Axes object).
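On the secondary question: whatever the concrete class is called in a given matplotlib version, the object should be an instance of matplotlib.axes.Axes, so help(matplotlib.axes.Axes) documents its methods. A quick check, sketched here on a plain subplot rather than the boxplot from the question:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.axes
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# The concrete class name varies by version, but it derives from Axes
print(type(ax).__name__)
is_axes = isinstance(ax, matplotlib.axes.Axes)
has_setter = hasattr(ax, "set_xticklabels")
```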
Let's say you only want to set ticks at 650 and 700. You could do the following:
ticks = labels = [650, 700]
plt.figure()
loansmin = pd.read_csv('../datasets/loanf.csv')
p=loansmin.boxplot('Interest.Rate','FICO.Score')
p.set_xticks(ticks)
p.set_xticklabels(labels)
Similarly, you can use set_xlim and set_ylim to do the equivalent of xlim() and ylim() in plt.
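One caveat for the grouped boxplot: as far as I know the boxes are drawn at positions 1..N rather than at the FICO values themselves, so tick positions are box indices, not scores. To show only every nth label, one approach is to blank out the others. A sketch with synthetic data standing in for loanf.csv (the scores and rates are invented):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the loans file: 20 distinct scores, 10 loans each
rng = np.random.default_rng(0)
scores = np.repeat(np.arange(640, 740, 5), 10)
loansmin = pd.DataFrame({
    "FICO.Score": scores,
    "Interest.Rate": rng.uniform(5, 20, size=scores.size),
})

ax = loansmin.boxplot("Interest.Rate", "FICO.Score")

step = 5  # keep every 5th label
labels = [t.get_text() for t in ax.get_xticklabels()]
ax.set_xticklabels(
    [label if i % step == 0 else "" for i, label in enumerate(labels)]
)
```

The ticks themselves stay in place, one per box; only the text is thinned out.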