I already asked about labeling the axes and got an answer that satisfied me for the data I had at the time. But now I'm trying to plot the dataframe with kind=line to better see the evaluation of my values. So I'm using these pandas methods, which don't work in the same manner for kind=line as for kind=bar, and thus don't provide the labels for the axes. So my dataframe:
name Homework_note Class_note Behavior_note
Alice Ji 7 6 6
Eleonora LI 2 5 4
Mike The 6 5 3
Helen Wo 5 3 5
the script I use:
df=pd.read_csv(os.path.join(path,'class_notes.csv'), sep='\t|,', engine='python')
df.columns=['name', 'Homework_note', 'Class_note', 'Behavior_note']
ax=df.plot(kind='line', x='name', color=['red','blue', 'green'], figsize=(400,100))
ax.set_xlabel("Names", fontsize=56)
ax.set_ylabel("Notes", fontsize=56)
ax.set_title("Notes evaluation", fontsize=79)
plt.legend(loc=2,prop={'size':60})
plt.savefig(os.path.join(path,'notes_names.png'), bbox_inches='tight', dpi=100)
What else can I add to put labels on the axes (both x and y)? I prefer to stay with these pandas methods, because I find them more comfortable for working with dataframes, but I haven't found a way to add the labels when using this type of line plot.
The labels will show up for a smaller figsize (in inches!). For the legend, take a look here
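To illustrate the point about figsize, here is a minimal sketch that rebuilds the dataframe from the question inline (instead of reading class_notes.csv) and plots it at a more conventional size. The figsize of (10, 5) and the font sizes are assumptions chosen for illustration, not values from the original post. Since figsize is in inches, figsize=(400,100) at dpi=100 asks for a 40000 x 10000 pixel image, on which even a 56 pt label is tiny.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Inline copy of the data from the question (stands in for class_notes.csv).
txt = """name,Homework_note,Class_note,Behavior_note
Alice Ji,7,6,6
Eleonora LI,2,5,4
Mike The,6,5,3
Helen Wo,5,3,5"""
df = pd.read_csv(io.StringIO(txt))

# figsize is in inches; (10, 5) is an assumed, more typical size.
ax = df.plot(kind="line", x="name", color=["red", "blue", "green"], figsize=(10, 5))
ax.set_xlabel("Names", fontsize=14)
ax.set_ylabel("Notes", fontsize=14)
ax.set_title("Notes evaluation", fontsize=18)
ax.legend(loc="upper left", prop={"size": 12})

fig = ax.get_figure()
fig.tight_layout()
```

From here, fig.savefig(...) with bbox_inches='tight' can be called as in the question.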
I need to create separates plot based on a label. My dataset is
Label Word Frequency
439 10.0 glass 600
471 10.0 tv 34
463 10.0 screen 31
437 10.0 laptop 15
454 10.0 info 15
65 -1.0 dog 1
68 -1.0 cat 1
69 -1.0 win 1
70 -2.0 man 1
71 -2.0 woman 1
In this case I would expect three plots, one for 10, one for -1 and one for -2, with on the x axis Word column and on the y-axis the Frequency (it is already sorted in descending order by Label).
I have tried as follows:
df['Word'].hist(by=df['Label'])
But it seems to be wrong as the output is far away from the expected one.
Any help would be great
You don't want to be using a histogram here: a histogram plot is where the columns of your dataframe contain raw data, and the hist function buckets the raw values and finds the frequencies of each bucket, and then plots.
Your dataframe is already bucketed, with a column in which the frequencies have already been calculated; what you need is the df.plot.bar() method. Unfortunately, this is quite new, and does not yet allow a by parameter, so you have to deal with the subplots manually.
Full walkthrough code for the cut-down example you have provided follows. Obviously you can make it more generic by not hardcoding the number of subplots required in the line marked [1].
# Set up:
import matplotlib.pyplot as plt
import pandas as pd
import io
txt = """Label,Word,Frequency
10.0,glass,600
10.0,tv,34
10.0,screen,31
10.0,laptop,15
10.0,info,15
-1.0,dog,1
-1.0,cat,1
-1.0,win,1
-2.0,man,1
-2.0,woman,1"""
dfA = pd.read_csv((io.StringIO(txt)))
labels = dfA["Label"].unique()
# Set up subplots on which to plot.
# Make more generic by not hardcoding nrows and ncols in [1],
# but calculating them depending on how many labels you have.
fig, axes = plt.subplots(nrows=2, ncols=2) # [1]
ax_list = axes.flatten() # axes is a 2D NumPy array of Axes objects;
# ax_list is a flat 1D array, which is easier to index.
# Loop through labels and plot the bar chart to the corresponding axis object.
for i, label in enumerate(labels):
    dfA[dfA["Label"] == label].plot.bar(x="Word", y="Frequency", ax=ax_list[i])
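The more generic variant mentioned above could look roughly like the following sketch. It derives the number of rows from the number of labels, assuming a fixed ncols of 2 (an arbitrary choice), and hides any unused axes. The subplot titles are an addition for readability and were not part of the original walkthrough.

```python
import io
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

txt = """Label,Word,Frequency
10.0,glass,600
10.0,tv,34
10.0,screen,31
10.0,laptop,15
10.0,info,15
-1.0,dog,1
-1.0,cat,1
-1.0,win,1
-2.0,man,1
-2.0,woman,1"""
dfA = pd.read_csv(io.StringIO(txt))

# One (label, sub-dataframe) pair per distinct label, in order of appearance.
groups = list(dfA.groupby("Label", sort=False))

# Derive the grid shape from the number of labels; ncols=2 is an assumption.
ncols = 2
nrows = math.ceil(len(groups) / ncols)
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, squeeze=False)
ax_list = axes.flatten()

# Plot each group's bar chart on its own axis.
for ax, (label, group) in zip(ax_list, groups):
    group.plot.bar(x="Word", y="Frequency", ax=ax, title=f"Label {label}", legend=False)

# Hide axes that have no data (here: the fourth cell of the 2x2 grid).
for ax in ax_list[len(groups):]:
    ax.set_visible(False)

fig.tight_layout()
```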
How to assign different colors to the indices of a barh plot in pandas.DataFrame.plot ? I have a dataframe:
group clicks_per_visit bookings_per_visit rev_per_visit
test1 0.90 0.039 0.737
test2 0.87 0.034 0.726
I plot this using:
temp3.plot(kind='barh',subplots=True,grid=True,figsize=(10,7))
to get this plot:
I want the bars to have different colors to highlight the different test groups and I am also open to any other ideas or solutions to make a more 'fancy' visualization of this data.
The behavior of pandas' DataFrame.plot() can be complicated and not always intuitive.
In theory, you can pass an array of colors to plot(), which will be passed to the underlying plotting function.
In fact, since you are plotting 3 subplots, you would pass a list of 3 sub-lists, each containing the colors of each of your bars.
df.plot(kind='barh', subplots=True,grid=True,figsize=(10,7), color=[['C0','C1']]*3, legend=False)
However, doing this causes the labels on the y-axis to disappear. For some reason, you have to specify the names of the columns you want to use in the call to plot() to get them to appear again.
df.plot(kind='barh',x='group', y=['clicks_per_visit','bookings_per_visit','rev_per_visit'], subplots=True,grid=True,figsize=(10,7), color=[['C0','C1']]*3, legend=False)
Since you are asking for other visualization options, I can show you that you can get roughly the same output with an easier syntax using seaborn. The only "catch" is that you have to "stack" your dataframe to be long-form instead of wide-form:
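import matplotlib.pyplot as plt
import seaborn as sns
# Reshape from wide to long form: one row per (group, variable) pair.
df2 = df.melt(id_vars=['group'], value_vars=['clicks_per_visit', 'bookings_per_visit', 'rev_per_visit'])
plt.figure(figsize=(8,4))
sns.barplot(y='variable', x='value', hue='group', data=df2, orient='h')
plt.tight_layout()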
You can do this, it is a very manual way but it will work:
axes = temp3.plot(kind='barh',subplots=True,grid=True,figsize=(10,7))
axes[0].get_children()[0].set_color('r')
This will color the first bar (child index 0) of the first axis red; you can then set the other bars by indexing the other axes and their children.
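The manual approach can be made less tedious by looping over the bar patches of every subplot. The sketch below assumes the dataframe from the question (rebuilt inline here) and uses the default color-cycle names 'C0' and 'C1' for the two test groups; any other pair of colors would work the same way.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Inline copy of the data from the question.
df = pd.DataFrame({
    "group": ["test1", "test2"],
    "clicks_per_visit": [0.90, 0.87],
    "bookings_per_visit": [0.039, 0.034],
    "rev_per_visit": [0.737, 0.726],
})

# One subplot per metric; x="group" keeps the group names as y tick labels.
axes = df.plot(kind="barh", x="group", subplots=True, grid=True,
               figsize=(10, 7), legend=False)

# One color per test group (assumed palette).
group_colors = ["C0", "C1"]

# ax.patches holds the bar rectangles of each subplot, in row order of df.
for ax in axes:
    for bar, color in zip(ax.patches, group_colors):
        bar.set_color(color)

plt.tight_layout()
```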
I want to draw a multi-series histogram chart that looks like this:
multi-series histogram chart
I'm trying to add this to an existing Jupyter notebook that already had code in place to establish a double chart:
fig, (ax, ax2) = plt.subplots(2,1)
The existing code uses the style where the plotting is done using methods on the data objects themselves. For example, here's some of the existing code that plots line charts in one of the existing subplots:
ax = termstruct[i].T.plot.line(ax=ax, c=linecolor,
dashes=dash, grid=True, linewidth=width, figsize=FIGURE_SIZE)
The main point I'm making here is that the way the plotting is achieved is to use the .plot.line method on the Pandas pd.Series (termstruct). This is not at all consistent with the examples and tutorials I was able to find online for drawing charts with pyplot, but it works and it establishes a framework I'm trying to work within.
So I started by taking the obvious step of adding a 3rd subplot for my histogram by changing the subplots call to plt from above:
fig, (ax, ax2, ax3) = plt.subplots(3,1)
My data are in four separate pd.Series objects, where each one represents a series that should map to one of the colors in the chart example at the top of this post. But when I try following the same general coding style of using methods on the data objects to do the plotting, I always seem to wind up with the X and Y axes opposite what I want, like this:
what I wound up with!
The code that generated the above chart was:
ax3 = NakedPNLperMo.plot.hist(ax=ax3,grid=True, figsize=FIGURE_SIZE)
ax3 = H9PNLperMo.plot.hist(ax=ax3, grid=True, figsize=FIGURE_SIZE)
ax3 = H12PNLperMo.plot.hist(ax=ax3, grid=True, figsize=FIGURE_SIZE)
ax3 = H15PNLperMo.plot.hist(ax=ax3, grid=True, figsize=FIGURE_SIZE)
NakedPNLperMo and the other 3 pd.Series objects are full of arcane financial symbols, but a simplified version of their contents (to make this clear) would be:
NakedPNLperMo = pd.Series(data=[1.2,3.4,5.6,7.8,-2.3,-4.6],
                          index=['Month 1','Month 2','Month 3','Month 4',
                                 'Month 5','Month 6'])
My intention/goal is that the data are plotted on the Y axis and the index values ('Month 1', etc.) are like columns across the x axis, but I can't seem to get that output no matter what I try.
Clearly the problem is the axes are swapped. But when I went looking for how to fix that, I couldn't find any examples online that follow this approach of drawing the chart using methods on the data objects. Everything I found in online tutorials was using a bunch of calls to plt to set up the charts. And more to the point, I couldn't see any way to follow the style in those examples and still draw the chart as a 3rd subplot alongside the 2 subplots already defined by this program.
My first (and foremost) question is what I SHOULD be trying next... Does it make sense to figure out how to change the parameters of [data-object].plot.xxx to get the axes the way I need them, or would it make more sense to follow the completely different style of making a series of calls to plt to design and draw the charts? The former would be consistent with what I have, but I can't find any online help for using that coding style. (Should I infer that it's a deprecated style of doing things?)
If the answer to the above is to take the approach of calling plt like the online examples all seem to show, how can I use the ax3 that ties this chart into the existing subplots? If the answer to the above is to stick with the approach of [data-object].plot.xxx, where can I find help on using that style? All the online examples I could find followed a different coding style.
And of course the most obvious question: How do I swap the axes so the chart looks right? :)
Thanks!
I hope this code helps you. I have created three series to show how you can do it:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#jupyter notebook only
%matplotlib inline
s1 = pd.Series(data=[1.2,3.4,5.6,7.8,-2.3,-4.6],
index=['Month 1','Month 2','Month 3','Month 4',
'Month 5','Month 6'])
s2=pd.Series(data=[5,3.4,7.4,-5.1,-2.3,3],
index=['Month 1','Month 2','Month 3','Month 4',
'Month 5','Month 6'])
s3=pd.Series(data=[5,2,-2.4,0,1,3],
index=['Month 1','Month 2','Month 3','Month 4',
'Month 5','Month 6'])
df=pd.concat([s1,s2,s3],axis=1)
df.columns=['s1','s2','s3']
print(df)
ax=df.plot(kind='bar',figsize=(10,10),fontsize=15)
#------------------------------------------------#
plt.xticks(rotation=-45)
#grid on
plt.grid()
# set y=0
ax.axhline(0, color='black', lw=1)
#change size of legend
ax.legend(fontsize=20)
#hiding upper and right axis layout
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
#changing the thickness
ax.spines['bottom'].set_linewidth(3)
ax.spines['left'].set_linewidth(3)
Output:
s1 s2 s3
Month 1 1.2 5.0 5.0
Month 2 3.4 3.4 2.0
Month 3 5.6 7.4 -2.4
Month 4 7.8 -5.1 0.0
Month 5 -2.3 -2.3 1.0
Month 6 -4.6 3.0 3.0
Figure
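To tie this back to the existing three-subplot layout from the question, the same grouped bar chart can be drawn onto ax3 by passing it through the ax parameter, which the pandas .plot methods accept. This is a minimal sketch: the series values are the placeholder numbers used above, and the column names H9 and H12 simply echo the question's variable names rather than real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

months = ["Month 1", "Month 2", "Month 3", "Month 4", "Month 5", "Month 6"]

# Placeholder values; the real series come from the asker's notebook.
naked = pd.Series([1.2, 3.4, 5.6, 7.8, -2.3, -4.6], index=months)
h9 = pd.Series([5, 3.4, 7.4, -5.1, -2.3, 3], index=months)
h12 = pd.Series([5, 2, -2.4, 0, 1, 3], index=months)

# Combine the series into one dataframe: one column per series.
df = pd.concat([naked, h9, h12], axis=1)
df.columns = ["Naked", "H9", "H12"]

# Existing layout from the question: three stacked subplots.
fig, (ax, ax2, ax3) = plt.subplots(3, 1, figsize=(8, 12))

# Draw the grouped bars on the third subplot only.
df.plot.bar(ax=ax3, grid=True, rot=45)
ax3.axhline(0, color="black", lw=1)  # mark y=0

fig.tight_layout()
```

The first two subplots stay untouched, so the existing line-chart code can keep targeting ax and ax2.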
I have all the data I want to plot in one pandas data frame, e.g.:
date flower_color flower_count
0 2017-08-01 blue 1
1 2017-08-01 red 2
2 2017-08-02 blue 5
3 2017-08-02 red 2
I need a few different lines on one plot: x-value should be the date from the first column and y-value should be flower_count, and the y-value should depend on the flower_color given in the second column.
How can I do that without filtering the original df and saving it as a new object first? My only idea was to create a data frame for only red flowers and then specifying it like:
figure.line(x="date", y="flower_count", source=red_flower_ds)
figure.line(x="date", y="flower_count", source=blue_flower_ds)
You can try this
fig, ax = plt.subplots()
for name, group in df.groupby('flower_color'):
    group.plot('date', y='flower_count', ax=ax, label=name)
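An alternative to the groupby loop, sketched here under the assumption that each (date, flower_color) pair occurs only once, is to pivot the dataframe to wide form so that each color becomes its own column; a single plot() call then draws one line per color. The dataframe is rebuilt inline from the question's sample.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Inline copy of the sample data from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-08-01", "2017-08-01", "2017-08-02", "2017-08-02"]),
    "flower_color": ["blue", "red", "blue", "red"],
    "flower_count": [1, 2, 5, 2],
})

# Wide form: one row per date, one column per flower color.
wide = df.pivot(index="date", columns="flower_color", values="flower_count")

# One line per column, all on the same axes.
ax = wide.plot()
ax.set_ylabel("flower_count")
```

If a date/color pair can repeat, pivot raises an error; pivot_table with an aggregation function would be needed instead.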
If my understanding is right, you need a plot with two subplots. The X for both subplots are dates, and the Ys are the flower counts for each color?
In this case, you can employ the subplots in pandas visualization.
fig, axes = plt.subplots(2)
z[z.flower_color == 'blue'].plot(x='date', y='flower_count', ax=axes[0]).set_ylabel('blue')
z[z.flower_color == 'red'].plot(x='date', y='flower_count', ax=axes[1]).set_ylabel('red')
plt.show()
The output will be like:
Hope it helps.
I am struggling to set xlim for each histogram and to create a single column of graphs so that the x-axis ticks are aligned. Being new to pandas, I am unsure how the answer to this question applies: Overlaying multiple histograms using pandas.
from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd
df = DataFrame({'score0': [0.047771,0.044174,0.044169,0.042892,0.036862,0.036684,0.036451,0.035530,0.034657,0.033666],
                'score1': [0.061010,0.054999,0.048395,0.048327,0.047784,0.047387,0.045950,0.045707,0.043294,0.042243]})
print(df)
score0 score1
0 0.047771 0.061010
1 0.044174 0.054999
2 0.044169 0.048395
3 0.042892 0.048327
4 0.036862 0.047784
5 0.036684 0.047387
6 0.036451 0.045950
7 0.035530 0.045707
8 0.034657 0.043294
9 0.033666 0.042243
df.hist()
plt.xlim(-1.0,1.0)
The result sets the x-axis bounds to [-1,1] on only one of the histograms.
I'm very familiar with ggplot in R and am just trying out pandas/matplotlib in Python. I'm open to suggestions for better plotting ideas. Any help would be greatly appreciated.
update #1 (@ct-zhu):
I have tried the following, but the xlim edit on the subplot does not seem to translate the bin widths across the new x-axis values. As a result, the graph now has odd bin widths and still has more than one column of graphs:
for array in df.hist(bins=10):
    for subplot in array:
        subplot.set_xlim((-1,1))
update #2:
Getting closer with the use of layout, but the width of bins does not equal the interval length divided by bin count. In the example below, I set bins=10. Hence, the width of each bin over the interval from [-1,1] should be 2/10=0.20; however, the graph does not have any bins with a width of 0.20.
for array in df.hist(layout=(2,1),bins=10):
    for subplot in array:
        subplot.set_xlim((-1,1))
There are two subplots, and you can access each of them and modify them separately:
ax_list=df.hist()
ax_list[0][0].set_xlim((0,1))
ax_list[0][1].set_xlim((0.01, 0.07))
Calling plt.xlim changes the limits of the current working axis only. In this case, that is the second plot, because it was generated most recently.
Edit:
To arrange the plots in 2 rows and 1 column, use the layout argument. To make the bin edges align, use the bins argument. Setting the x limits to (-1, 1) is probably not a good idea, since your numbers are all smallish.
import numpy as np
ax_list=df.hist(layout=(2,1),bins=np.histogram(df.values.ravel())[1])
ax_list[0][0].set_xlim((0.01, 0.07))
ax_list[1][0].set_xlim((0.01, 0.07))
Or specify exactly 10 bins between (-1,1); note that 10 bins need 11 edges:
ax_list=df.hist(layout=(2,1),bins=np.linspace(-1,1,11))
ax_list[0][0].set_xlim((-1,1))
ax_list[1][0].set_xlim((-1,1))
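As a check on the bin-width concern from update #2, the following sketch builds explicit bin edges and confirms that each bin over [-1, 1] is 0.2 wide. The point it relies on is that n bins require n + 1 edges, so 10 bins need np.linspace(-1, 1, 11). The dataframe is the one from the question; the sharex=True argument is an optional addition to keep the tick labels aligned.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "score0": [0.047771, 0.044174, 0.044169, 0.042892, 0.036862,
               0.036684, 0.036451, 0.035530, 0.034657, 0.033666],
    "score1": [0.061010, 0.054999, 0.048395, 0.048327, 0.047784,
               0.047387, 0.045950, 0.045707, 0.043294, 0.042243],
})

# 10 bins over [-1, 1] need 11 edges; each bin is then 2 / 10 = 0.2 wide.
bin_edges = np.linspace(-1, 1, 11)
bin_widths = np.diff(bin_edges)

# Same explicit edges for both histograms, stacked in one column.
axes = df.hist(layout=(2, 1), bins=bin_edges, sharex=True)
for ax in axes.ravel():
    ax.set_xlim((-1, 1))
```

With bins this wide, all values in the sample fall into a single bin, which illustrates why limits closer to the data range give a more informative picture.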