Normalise stacked plot matplotlib - python

I've made a stacked plot I'd like to normalise. I made it with the following command:
df2=df.groupby(['month','Tranche']).sum('Balance').unstack().fillna(0)
y=df2.plot(kind='bar', stacked=True)
y.legend(bbox_to_anchor=(1.1, 1.05))
Which gives this plot. I can't compare between bars here because it isn't normalised - I tried dividing df2 by
df.groupby(['month']).sum('Balance').unstack().fillna(0)
But this throws the error 'cannot join without overlapping index'. If I leave the 'Tranche' in the groupby of course each segment just becomes 1.
Is there a way to deal with this without manually putting the percentages into the input df for the plot? I need each bar in the below plot to stop at 1.
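One possible approach, sketched here under assumptions: select the 'Balance' column before unstacking so the result has plain 'Tranche' columns, then divide each row by its own total with DataFrame.div(..., axis=0). The sample data below is hypothetical and only stands in for the df in the question.

```python
import pandas as pd

# Hypothetical data standing in for the df in the question
df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "Tranche": ["A", "B", "A", "B", "C"],
    "Balance": [100.0, 300.0, 50.0, 50.0, 100.0],
})

# Sum Balance per (month, Tranche), then pivot Tranche into columns
totals = df.groupby(["month", "Tranche"])["Balance"].sum().unstack(fill_value=0)

# Divide every row by its row total so each bar stacks up to 1
shares = totals.div(totals.sum(axis=1), axis=0)

ax = shares.plot(kind="bar", stacked=True)
ax.legend(bbox_to_anchor=(1.1, 1.05))
```

Dividing along axis=0 aligns the row totals with the 'month' index, so no second groupby result has to be joined against the first.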

Related

Why is matplotlib .plot(kind='bar') plot so different to .plot()

This may be a very stupid question, but when plotting a Pandas DataFrame using .plot() it is very quick and produces a graph with an appropriate index. As soon as I try to change this to a bar chart, it just seems to lose all formatting and the index goes wild. Why is this the case? And is there an easy way to just plot a bar chart with the same format as the line chart?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['Date'] = pd.date_range(start='2012-01-01', end='2018-12-31')
df['Value'] = np.random.randint(low=5, high=100, size=len(df))
df.set_index('Date', inplace=True)
df.plot()
plt.show()
df.plot(kind='bar')
plt.show()
Update:
For comparison, if I put the data into Excel and create a line plot and a bar ('column') plot, Excel converts between them instantly and keeps the axis labels as they were for the line plot. Producing many (thousands of) bar charts in Python from years of daily data takes a long time. Is there an equivalent way of doing this Excel transformation in Python?
Pandas bar plots are categorical in nature; i.e. each bar is a separate category and gets its own label. Plotting numeric bar plots (in the same manner as line plots) is not currently possible with pandas.
In contrast matplotlib bar plots are numerical if the input data is numbers or dates. So
plt.bar(df.index, df["Value"])
produces a bar chart on a true date axis.
Note, however, that your dataframe has 2557 data points distributed over only a few hundred pixels, so not all bars are actually drawn. Conversely, if you want every bar to be shown, each needs to be at least one pixel wide in the final image. With 5% margins on each side, that means the figure must be more than 2800 pixels wide, or be saved in a vector format.
So rather than showing daily data, maybe it makes sense to aggregate to monthly or quarterly data first.
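A minimal sketch of that aggregation, assuming monthly means are an acceptable summary (the choice of mean and the bar width of 20 days are illustrative assumptions, not part of the original answer):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same shape of data as in the question: one random value per day
idx = pd.date_range(start="2012-01-01", end="2018-12-31")
df = pd.DataFrame({"Value": np.random.randint(low=5, high=100, size=len(idx))},
                  index=idx)

# Aggregate daily values to one mean per month ("MS" = month start)
monthly = df["Value"].resample("MS").mean()

fig, ax = plt.subplots(figsize=(10, 4))
# On a date axis the bar width is given in days
ax.bar(monthly.index, monthly.values, width=20)
ax.set_ylabel("Mean value per month")
```

This leaves 84 bars instead of 2557, which fit comfortably in a figure of ordinary size while keeping the numeric date axis.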
The default .plot() connects all your data points with straight lines and produces a line plot.
On the other hand, .plot(kind='bar') plots each data point as a discrete bar. To get proper formatting on the x-axis, you will have to modify the tick labels after plotting.
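As a rough illustration of adjusting the ticks afterwards: pandas draws the bars at integer positions 0 to n-1, so one option is to keep only every Nth tick and label it from the index. The three-month sample and the step of 7 are assumptions chosen to keep the example small.

```python
import numpy as np
import pandas as pd

# Hypothetical three months of daily data
idx = pd.date_range("2018-01-01", "2018-03-31")
df = pd.DataFrame({"Value": np.random.randint(5, 100, size=len(idx))}, index=idx)

ax = df.plot(kind="bar", legend=False)

# Bars sit at positions 0..n-1; keep one tick per week
step = 7
positions = np.arange(0, len(df), step)
ax.set_xticks(positions)
ax.set_xticklabels(df.index[positions].strftime("%Y-%m-%d"),
                   rotation=45, ha="right")
```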

Python Pandas Matplotlib : How to Plot Graph without Numerics?

I want to plot a bar graph (or other graphs) in Python from a Pandas dataframe, using two columns that don't contain numeric data. One column is Operating System, the other is Computer Name. I want the graph to show how many systems each OS is running on. The sample data is like below.
How can I plot a bar graph or other graphs for these two columns? When I try the code below:
ax = dfdefault[['Operating System','Computer Name']].plot(kind='bar')
ax.set_xlabel("Hour", fontsize=12)
ax.set_ylabel("V", fontsize=12)
plt.show()
I get this error:
TypeError: Empty 'DataFrame': no numeric data to plot
You will need to count the occurrence of each operating system first and then plot using a bar graph or pie chart. bar expects numeric data already, which you don't have. Counting will take care of this. Here is an example using a pie chart:
import pandas as pd

df = pd.DataFrame(
    [['asd', 'win'],
     ['sdf', 'mac'],
     ['aww', 'win'],
     ['dd', 'linux']],
    columns=['computer', 'os']
)
# Count how many computers run each OS, then plot the counts
df['os'].value_counts().plot.pie()
A bar chart would work similarly. Just change pie to bar.

I want to create a pie chart using a dataframe column in python

I want to create a pie chart using a single column of my dataframe, say my column name is 'Score'. I have stored scores in this column as below:
Score
.92
.81
.21
.46
.72
.11
.89
Now I want to create a pie chart showing the share of scores in each range as a percentage.
Say 0-0.4 is 30%, 0.4-0.7 is 35%, 0.7+ is 35%.
I am using the code below:
df1['bins'] = pd.cut(df1['Score'],bins=[0,0.5,1], labels=["0-50%","50-100%"])
df1 = df.groupby(['Score', 'bins']).size().unstack(fill_value=0)
df1.plot.pie(subplots=True,figsize=(8, 3))
With the above code I am getting the pie chart, but I don't know how I can do this using percentages.
My pie chart looks like this for now.
Cutting the dataframe up into bins is the right first step. After that, you can use value_counts with normalize=True to get the relative frequency of each value in the bins column. This shows what percentage of the data falls into each of the ranges defined by the bins.
In terms of plotting the pie chart, I'm not sure if I understood correctly, but it seemed like you would like to display the correct legend values and the percentage values in each slice of the pie.
The pandas.DataFrame.plot documentation is a good place to see all parameters that can be passed into the plot method. You can specify which columns to use as x and y, and by default the dataframe index is used as the legend in the pie plot.
To show the percentage values per slice, you can use the autopct parameter as well. As mentioned in this answer, you can use all the normal matplotlib plt.pie() flags in the plot method as well.
Bringing everything together, this is the resultant code and the resultant chart:
df = pd.DataFrame({'Score': [0.92,0.81,0.21,0.46,0.72,0.11,0.89]})
df['bins'] = pd.cut(df['Score'], bins=[0,0.4,0.7,1], labels=['0-0.4','0.4-0.7','0.7-1'], right=True)
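# Relative frequency of each bin, scaled to percentages
bin_percent = df['bins'].value_counts(normalize=True) * 100
# Plot the Series directly, so the call does not depend on the name pandas gives the counts column
plot = bin_percent.plot.pie(figsize=(5, 5), autopct='%1.1f%%')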
Plot of Pie Chart

Plot not showing the exact values of my series

I am trying to plot my pandas Series with its values, but I am getting wrong values on my x-axis. I have done the same thing more than 3 times in the same workbook. What am I doing wrong here?
S1=rog1.groupby('Date')['availabi'].mean()
S1.index
# output
DatetimeIndex(['2018-05-10', '2018-06-10', '2018-07-10'],
dtype='datetime64[ns]', name='Date', freq=None)
But when I decide to plot the lot.
plt.figure(figsize=(10,4))
plt.plot(S1.index, S1)
The below is what I get
The y-axis values are fine. I dunno where the plotted values are coming from. I only have 3 lines in this Series
The issue is that matplotlib auto-detects the number and spacing of x-ticks to populate the x-axis without overlapping labels, and also without leaving too much white space.
The simplest workaround I can think of:
1. Create figure and axis handles
2. Plot your data in the axis
3. Manually set the xtick positions and labels
Code to replace your two lines of plotting:
fig, ax = plt.subplots(figsize=(10, 4))
S1.plot(ax=ax)
ax.set_xticks(S1.index);
ax.set_xticklabels(S1.index.strftime('%Y-%m-%d'));

MatPlotLib - Showing legend

I'm making a scatter plot from a Pandas DataFrame with 3 columns. The first two would be the x and y axes, and the third would be classification data that I want to visualize by giving the points different colors. My question is, how can I add a legend to this plot:
df= df.groupby(['Month', 'Price'])['Quantity'].sum().reset_index()
df.plot(kind='scatter', x='Month', y='Quantity', c=df.Price , s = 100, legend = True);
As you can see, I'd like to automatically color the dots based on their price, so adding labels manually is a bit of an inconvenience. Is there a way I could add something to this code, that would also show a legend to the Price values?
Also, this colors the scatter plot dots on a range from black to white. Can I add custom colors without giving up the easy usage of c=df.Price?
Thank you!
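One possible approach, offered as a sketch rather than a definitive answer: call matplotlib's Axes.scatter directly and build the legend from the returned collection with legend_elements() (available in matplotlib 3.1 and later). The sample data and the "viridis" colormap are assumptions; any named colormap can be passed via cmap while still using c=df.Price.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data in the shape produced by the groupby in the question
df = pd.DataFrame({
    "Month": [1, 1, 2, 2, 3, 3],
    "Price": [10, 20, 10, 20, 10, 30],
    "Quantity": [5, 3, 8, 2, 6, 1],
})

fig, ax = plt.subplots()
# cmap replaces the default black-to-white colouring
sc = ax.scatter(df["Month"], df["Quantity"], c=df["Price"], s=100, cmap="viridis")

# One legend entry per distinct Price value (when there are only a few)
handles, labels = sc.legend_elements()
ax.legend(handles, labels, title="Price")
ax.set_xlabel("Month")
ax.set_ylabel("Quantity")
```

With many distinct prices, legend_elements() picks a reduced set of representative values instead of one entry per price; a colorbar (fig.colorbar(sc)) may then be the more readable choice.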
