How do I make this subplotting work in Python?

First of all, I'm learning to use Python and sometimes it's a little tricky for me.
I'm using a Game of Thrones dataset from Kaggle to learn visualizations. Now I'm trying to see how many characters of each house died in each book.
So I wrote this code:
houses_deathbybook = data_deathsB.groupby(['Book_of_Death', 'Allegiances']).count()[['Name']]
to get a count of deaths by house and book.
I then used the subplot command to produce this graph.
I'm now trying to make that graph more useful using this code:
fig, axes = plt.subplots(nrows=1, ncols=1, gridspec_kw={'wspace': 0.1, 'hspace': 0.9})
data_deathsB.loc[data_deathsB['Allegiances']=='House Arryn'.groupby(['Book_of_Death']).agg('count').plot(x='Book of Death', y='Muertes',kind='bar',figsize=(20,15),color='limegreen',grid=True,ax=axes[1,0], title='House Arryn',fontsize=13)
The second line of the code will be repeated for each house.
But it does not seem to work. As a test, I set the grid to just 1 row and 1 column to check a single house, and it gives me the following error: "unexpected EOF while parsing".
Could you help me?

The problem in your second approach is that you have defined a figure with two subplots, arranged as a single column with two rows. When the grid has only a single row or a single column, plt.subplots returns a one-dimensional array of axes, so you can't use two indices such as [0,0] to access the subplots. In this case you have to use a single index, like the following:
ax=axes[0],title='House Arryn')
and
ax=axes[1],title='House Arryn')
The two-index style ([0,0], [0,1], etc.) works only when you have more than one row and more than one column.
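To make the indexing rule concrete, here is a small sketch of what plt.subplots returns for different grid shapes. The variable names are made up for illustration, and the Agg backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt

# 1 x 1 grid: a single Axes object, not an array (no indexing at all)
fig1, ax_single = plt.subplots(nrows=1, ncols=1)

# 2 x 1 grid: a 1-D array, so use one index: ax_column[0], ax_column[1]
fig2, ax_column = plt.subplots(nrows=2, ncols=1)

# 2 x 2 grid: a 2-D array, so use two indices: ax_grid[0, 0], ax_grid[1, 1]
fig3, ax_grid = plt.subplots(nrows=2, ncols=2)

# squeeze=False always returns a 2-D array, whatever the grid shape
fig4, ax_always_2d = plt.subplots(nrows=2, ncols=1, squeeze=False)

print(type(ax_single).__name__)
print(ax_column.shape, ax_grid.shape, ax_always_2d.shape)
```

Passing squeeze=False is one way to keep the [row, col] indexing style working even when the grid has a single row or column.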

It worked!
This is the result of the following code (just one of the graphs):
fig, axes = plt.subplots(nrows=2,gridspec_kw={'hspace': 1})
data_deathsB.loc[data_deathsB['Allegiances']=='House Arryn',['Allegiances', 'Name', 'Book_of_Death']].groupby(['Book_of_Death'],as_index=False).agg('count').plot(x='Book_of_Death', kind='bar',figsize=(20,15),color='limegreen',grid=True,ax=axes[0],title='House Arryn')
data_deathsB.loc[data_deathsB['Allegiances']=='House Baratheon',['Allegiances', 'Name', 'Book_of_Death']].groupby(['Book_of_Death'],as_index=False).agg('count').plot(x='Book_of_Death', kind='bar',figsize=(20,15),color='limegreen',grid=True,ax=axes[1],title='House Baratheon')
The next step would be to make the graphs a little prettier.
Thanks to everyone!
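Since the same plotting line is repeated once per house, a loop may be easier to maintain. The following is only a sketch: the small data_deathsB frame is invented for illustration (the real Kaggle data is not shown in the question), and only the column names Name, Allegiances and Book_of_Death are taken from the code above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the real dataset
data_deathsB = pd.DataFrame({
    "Name": ["a", "b", "c", "d", "e", "f", "g"],
    "Allegiances": ["House Arryn"] * 3 + ["House Baratheon"] * 4,
    "Book_of_Death": [1, 1, 3, 2, 2, 3, 5],
})

houses = sorted(data_deathsB["Allegiances"].unique())

# squeeze=False keeps axes 2-D, so the loop also works for a single house
fig, axes = plt.subplots(nrows=len(houses), squeeze=False,
                         figsize=(8, 3 * len(houses)))

for ax, house in zip(axes[:, 0], houses):
    # Number of deaths per book for this house
    counts = (data_deathsB.loc[data_deathsB["Allegiances"] == house]
              .groupby("Book_of_Death")["Name"].count())
    counts.plot(kind="bar", ax=ax, color="limegreen", grid=True, title=house)

fig.tight_layout()
```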

Related

Why am I unable to make a plot containing subplots in plotly using a px.scatter plot?

I have been trying to make a figure using plotly that combines multiple figures together. In order to do this, I have been trying to use the make_subplots function, but I have found it very difficult to have the plots added in such a way that they are properly formatted. I can currently make singular plots (as seen directly below):
However, whenever I try to combine these singular plots using make_subplots, I end up with this:
This figure has the subplots set up completely wrong, since I need each of the four subplots to contain data pertaining to the four methods (A, B, C, and D). In other words, I would like to have four subplots that look like my singular plot example above.
I have set up the code in the following way:
for sequence in sequences:
    #process for making sequence profile is done here
    sequence_df = pd.DataFrame(sequence_profile)
    row_number=1
    grand_figure = make_subplots(rows=4, cols=1)
    #there are four groups per sequence, so the grand figure should have four subplots in total
    for group in sequence_df["group"].unique():
        figure_df_group = sequence_df[(sequence_df["group"]==group)]
        figure_df_group.sort_values("sample", ascending=True, inplace=True)
        figure = px.line(figure_df_group, x = figure_df_group["sample"], y = figure_df_group["intensity"], color= figure_df_group["method"])
        figure.update_xaxes(title= "sample")
        figure.update_traces(mode='markers+lines')
        #note: the next line fails, since data must be extracted from the figure, hence why it is commented out
        #grand_figure.append_trace(figure, row = row_number, col=1)
        figure.update_layout(title_text="{} Profile Plot".format(sequence))
        grand_figure.append_trace(figure.data[0], row = row_number, col=1)
        row_number+=1
        figure.write_image(os.path.join(output_directory+"{}_profile_plot_subplots_in_{}.jpg".format(sequence, group)))
    grand_figure.write_image(os.path.join(output_directory+"grand_figure_{}_profile_plot_subplots.jpg".format(sequence)))
I have tried following directions (like for example, here: ValueError: Invalid element(s) received for the 'data' property) but I was unable to get my figures added as is as subplots. At first it seemed like I needed to use the graph object (go) module in plotly (https://plotly.com/python/subplots/), but I would really like to keep the formatting/design of my current singular plot. I just want the plots to be conglomerated in groups of four. However, when I try to add the subplots like I currently do, I need to use the data property of the figure, which causes the design of my scatter plot to be completely messed up. Any help for how I can ameliorate this problem would be great.
Ok, so I found a solution here. Rather than using the make_subplots function, I just instead exported all the figures onto an .html file (Plotly saving multiple plots into a single html) and then converted it into an image (HTML to IMAGE using Python). This isn't exactly the approach I would have preferred to have, but it does work.
UPDATE
I have found that plotly express offers another solution: px.line has a facet_col parameter that sets up multiple subplots within a single plot. My code is set up like this, and differs from the code above in that the dataframe does not need to be iterated over in a for loop based on its groups:
sequence_df = pd.DataFrame(sequence_profile)
figure = px.line(sequence_df, x = sequence_df["sample"], y = sequence_df["intensity"], color= sequence_df["method"], facet_col= sequence_df["group"])
Although it still needs more formatting, my plot now looks like this, which works much better for my purposes:

Pie Chart Creation issue

I'm sure I'm being a complete numpty here. I'm trying to create a couple of pie charts, showing the demographics of respondents to a survey (in this case, Parents or Teachers). Obviously, at the moment, the columns contain strings, which can't be put into a pie chart. So I thought I'd do a count of the strings and put that into a variable. However, when I try to then use that in pie chart, it's failing.
I know this is probably something really simple, and I have Googled around, but I can't seem to find a way to get this working.
Code as follows:
respondents_pie=df.groupby('Respondents').size()
print(respondents_pie)
Output
Respondents
Parents 31
Teachers 20
dtype: int64
fig=plt.figure()
ax=fig.add_axes(0,0,1,1)
ax.axis('equal')
ax.pie(respondents_pie, autopct='%1.2f%%')
plt.show()
Error is: TypeError: from_bounds() argument after * must be an iterable, not int
Error is on line 2 of the code (ax=fig.add_axes(0,0,1,1))
How have I messed this one up?
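I found a solution. Changing ax=fig.add_axes(0,0,1,1) to fig, ax = plt.subplots() resolved the issue. (add_axes expects the rectangle as a single sequence, so ax=fig.add_axes([0,0,1,1]) would also work.)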
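A minimal sketch of both working variants follows. The counts are the ones printed in the question; the Series is rebuilt by hand here because the original df is not shown.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Same values as the groupby(...).size() output in the question
respondents_pie = pd.Series([31, 20], index=["Parents", "Teachers"],
                            name="Respondents")

# Variant 1: let plt.subplots create the figure and the axes
fig, ax = plt.subplots()
ax.axis("equal")
wedges, texts, autotexts = ax.pie(respondents_pie,
                                  labels=respondents_pie.index,
                                  autopct="%1.2f%%")

# Variant 2: add_axes takes one sequence (left, bottom, width, height)
fig2 = plt.figure()
ax2 = fig2.add_axes([0, 0, 1, 1])
ax2.axis("equal")
ax2.pie(respondents_pie, labels=respondents_pie.index, autopct="%1.2f%%")
```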

Indexing my dataframe properly with pandas

I'm trying to plot a bar graph with error bars from my test results. I found some code on the internet that shows how to make it, but it does not produce the table layout I want.
I've tried leaving things out, but I don't understand the dataframe well enough to know what code I need to process the data correctly.
order=pd.MultiIndex.from_arrays([['402515','402515','402515','402510','402510','402510'],
['z','z','z','z','z','z']],names=['letter','word'])
datas=pd.DataFrame({'first cracking strength':[em1,em2,em3,em4,em5,em6],'flexural strength':[en1,en2,en3,en4,en5,en6]},index=order)
gp4 = datas.groupby(level=('letter', 'word'))
means = gp4.mean()
errors = gp4.std()
print(means)
fig, ax = plt.subplots()
means.plot.bar(yerr=errors, ax=ax, capsize=4);
The MultiIndex code uses two index levels (the 'z' and the '402515/402510'), but I only want one of them, the '402515/402510'. What code would do that?
How it looks when I run the code.
How I want it to look.
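One possible approach, sketched under assumptions, is to use a plain single-level index instead of a MultiIndex and group on that one level. The numeric values below are invented placeholders for em1..em6 and en1..en6, and the index name "mix" is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Single-level index: only the '402515' / '402510' labels
order = pd.Index(["402515"] * 3 + ["402510"] * 3, name="mix")

datas = pd.DataFrame(
    {
        "first cracking strength": [4.1, 4.3, 4.0, 3.2, 3.5, 3.1],  # placeholder values
        "flexural strength": [6.0, 6.4, 6.1, 5.0, 5.3, 4.9],        # placeholder values
    },
    index=order,
)

gp = datas.groupby(level="mix")
means = gp.mean()   # one row per label
errors = gp.std()   # used as error bar heights

fig, ax = plt.subplots()
means.plot.bar(yerr=errors, ax=ax, capsize=4)
```

With a single level, the x-axis tick labels are just the two labels rather than (label, 'z') tuples.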

How do I "reset the index" for a matplotlib plot?

I have the following code:
fig, ax = plt.subplots(1, 1)
calls["2016-12-24"].resample("1h").sum().plot(ax=ax)
calls["2016-12-25"].resample("1h").sum().plot(ax=ax)
calls["2016-12-26"].resample("1h").sum().plot(ax=ax)
which generates the following image:
How can I make this so the lines share the x-axis? In other words, how do I make them not switch days?
If you don't care about keeping the correct datetime as the index, you could just reset the index for each of the series, as you suggested. This will overlay all the time series on the same x-axis, if that is what you're trying to achieve.
# drop the datetime index so that each day is plotted against positions 0..23
calls["2016-12-24"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
calls["2016-12-25"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
calls["2016-12-26"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
Otherwise, you could also try resampling the three columns at the same time. Have a go with the code below, but without knowing what your original dataframe looks like, I'm not sure it will fit your case. You should post some more information about the input dataframe.
# have a try with the below
calls[["2016-12-24","2016-12-25","2016-12-26"]].resample('1h').sum().plot()
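Here is a self-contained sketch of the overlay idea. The calls series is synthetic (the asker's real data is not shown), and it is assumed to be a Series with a DatetimeIndex.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in: one value every 15 minutes over three days
idx = pd.date_range("2016-12-24", "2016-12-26 23:45", freq="15min")
calls = pd.Series(np.arange(len(idx)) % 7, index=idx)

hourly = calls.resample("1h").sum()

fig, ax = plt.subplots(1, 1)
# One line per calendar day, all plotted against hour positions 0..23
for day, series in hourly.groupby(hourly.index.date):
    series.reset_index(drop=True).plot(ax=ax, label=str(day))

ax.set_xlabel("hour of day")
ax.legend()
```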

Show only the n'th ticklabel in a pandas boxplot

I am new to pandas and matplotlib, but not to Python. I have two questions; a primary and a secondary one.
Primary:
I have a pandas boxplot with FICO score on the x-axis and interest rate on the y-axis.
My x-axis is all messed up since the FICO scores are overwriting each other.
I'd like to show only every 4th or 5th ticklabel on the x-axis for a couple of reasons:
in general it's less chart-junky
in this case it will allow the labels to actually be read.
My code snippet is as follows:
plt.figure()
loansmin = pd.read_csv('../datasets/loanf.csv')
p = loansmin.boxplot('Interest.Rate','FICO.Score')
I saved the return value in p as I thought I might need to manipulate the plot further which I do now.
Secondary:
How do I access the plot, subplot, axes objects from pandas boxplot.
p above is a matplotlib.axes.AxesSubplot object.
help(matplotlib.axes.AxesSubplot) gives a message saying:
AttributeError: 'module' object has no attribute 'AxesSubplot'
dir(matplotlib.axes) lists Axes, Subplot and SubplotBase in that namespace, but no AxesSubplot. How do I understand this returned object better?
As I explored further I found that one could explore the returned object p via dir().
Doing this I found a long list of useful methods, amongst which was set_xticklabels.
Doing help(p.set_xticklabels) gave some cryptic, but still useful, help - essentially suggesting passing in a list of strings for ticklabels.
I then tried doing the following - adding set_xticklabels to the end of the last line in the above code effectively chaining the invocations.
plt.figure()
loansmin = pd.read_csv('../datasets/loanf.csv')
p=loansmin.boxplot('Interest.Rate','FICO.Score').set_xticklabels(['650','','','','','700'])
This gave the desired result. I suspect there's a better way as in the way matplotlib does it which allows you to show every n'th label. But for immediate use this works, and also allows setting labels where they are not periodic for whatever reason, if you need that.
As usual, writing out the question explicitly helped me find the answer. And if anyone can help me get to the underlying matplotlib object that is still an open question.
AxesSubplot (I think) is just another way to get at the Axes in matplotlib. set_xticklabels() is part of the matplotlib object-oriented interface (on axes). So, if you were using something like pylab, you might use xticks(ticks, labels), but here you have to split it into separate calls: ax.set_xticks(ticks) and ax.set_xticklabels(labels), where ax is an Axes object.
Let's say you only want to set ticks at 650 and 700. You could do the following:
ticks = labels = [650, 700]
plt.figure()
loansmin = pd.read_csv('../datasets/loanf.csv')
p=loansmin.boxplot('Interest.Rate','FICO.Score')
p.set_xticks(ticks)
p.set_xticklabels(labels)
Similarly, you can use set_xlim and set_ylim to do the equivalent of xlim() and ylim() in plt.
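To show only every n'th label without typing the list by hand, the blank-string trick from the question can be generalized. This is a sketch on invented data: the FICO scores and interest rates below are synthetic, and only the column names come from the question. It relies on the boxes being drawn in sorted order of the grouping column, one tick per box.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for loanf.csv: 12 distinct scores, 10 loans each
rng = np.random.default_rng(0)
scores = np.repeat(np.arange(640, 700, 5), 10)
loansmin = pd.DataFrame({
    "FICO.Score": scores,
    "Interest.Rate": rng.uniform(5, 20, size=len(scores)),
})

fig, ax = plt.subplots()
loansmin.boxplot("Interest.Rate", by="FICO.Score", ax=ax)

n = 4  # keep every 4th label
all_labels = [str(s) for s in sorted(loansmin["FICO.Score"].unique())]
new_labels = [lab if i % n == 0 else "" for i, lab in enumerate(all_labels)]

# One label per existing tick; blanked labels leave the tick mark in place
ax.set_xticklabels(new_labels)
```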
