Plot not showing the exact values of my series - python

I am trying to plot my pandas Series with its values, but I get the wrong values on my x-axis. I have done the same thing more than 3 times in the same workbook. What am I doing wrong here?
S1=rog1.groupby('Date')['availabi'].mean()
S1.index
# output
DatetimeIndex(['2018-05-10', '2018-06-10', '2018-07-10'],
dtype='datetime64[ns]', name='Date', freq=None)
But when I decide to plot the lot.
plt.figure(figsize=(10,4))
plt.plot(S1.index, S1)
The below is what I get
The y-axis values are fine. I dunno where the plotted values are coming from. I only have 3 lines in this Series

The issue is that matplotlib auto-detects the number and spacing of x-ticks to populate the x-axis without overlapping labels, and also without leaving too much white space.
The simplest workaround I can think of:
1. Create figure and axis handles
2. Plot your data in the axis
3. Manually set the xtick positions and labels
Code to replace your two lines of plotting:
fig, ax = plt.subplots(figsize=(10, 4))
S1.plot(ax=ax)
ax.set_xticks(S1.index);
ax.set_xticklabels(S1.index.strftime('%Y-%m-%d'));
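For a version you can run on its own, here is a minimal sketch of the three steps above. The Series values are made up for illustration (the original rog1 data is not shown), it uses ax.plot rather than S1.plot, and the Agg backend is set only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for S1: the three dates from the question, made-up means
S1 = pd.Series(
    [0.91, 0.87, 0.95],
    index=pd.DatetimeIndex(["2018-05-10", "2018-06-10", "2018-07-10"], name="Date"),
    name="availabi",
)

# 1. Create figure and axis handles
fig, ax = plt.subplots(figsize=(10, 4))

# 2. Plot the data in the axis
ax.plot(S1.index, S1.to_numpy())

# 3. Put one tick on each data point and label it with that point's date
ax.set_xticks(S1.index)
ax.set_xticklabels(S1.index.strftime("%Y-%m-%d"))

tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```

With only one tick per data point, the x-axis shows exactly the dates in the index instead of the intermediate dates matplotlib would pick automatically.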

Related

Normalise stacked plot matplotlib

I've made a stacked plot I'd like to normalise. I made it with the following command:
df2=df.groupby(['month','Tranche']).sum('Balance').unstack().fillna(0)
y=df2.plot(kind='bar', stacked=True)
y.legend(bbox_to_anchor=(1.1, 1.05))
Which gives this plot. I can't compare between bars here because it isn't normalised - I tried dividing df2 by
df.groupby(['month']).sum('Balance').unstack().fillna(0)
But this throws the error 'cannot join without overlapping index'. If I leave the 'Tranche' in the groupby of course each segment just becomes 1.
Is there a way to deal with this without manually putting the percentages into the input df for the plot? I need each bar in the below plot to stop at 1.
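One possible approach, sketched under assumptions: divide each row of the unstacked frame by its own row total with DataFrame.div(..., axis=0), so no second groupby or index join is needed. The data below is hypothetical, and the groupby selects the Balance column explicitly, which may differ slightly from the asker's frame.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the asker's df
df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "Tranche": ["A", "B", "A", "B", "C"],
    "Balance": [100.0, 300.0, 50.0, 25.0, 25.0],
})

# Rows are months, columns are tranches
df2 = df.groupby(["month", "Tranche"])["Balance"].sum().unstack().fillna(0)

# Divide every row by its own total so each stacked bar ends at 1
normalised = df2.div(df2.sum(axis=1), axis=0)

ax = normalised.plot(kind="bar", stacked=True)
ax.legend(bbox_to_anchor=(1.1, 1.05))
```

Because the division is aligned on the row index of df2 itself, the 'cannot join without overlapping index' error does not come up.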

How to change tick labels for plot chart from 19:00-7:00 hours in matplotlib

I am trying to plot line charts for both nighttime and daytime to compare the differences in traffic volume in both time periods.
plt.subplot(2,1,1) #plot in grid chart to better compare differences
by_hour_business_night['traffic_volume'].plot.line()
plt.title('Business Nights Traffic Volume by Hours')
plt.ylabel('Traffic Volume')
plt.ylim(0,6500)
plt.show()
The chart for nighttime shows up alright, but the xtick labels are in [0,5,10,15,20,25], how can I change the labels to fit the hours? Something along the lines like: [0,1,2,3,4,5,6,19,20,21,22,23]
I have tried
x=[0,1,2,3,4,5,6,19,20,21,22,23]
plt.xticks(x)
But then I just got [0-6] on the left, and [19-23] on the right, both crammed on either side, leaving the middle of the xticks blank.
Or is there a better way to plot the chart? Since there will be a breaking point between 6 and 19 hours, is there a way to avoid this?
I am new to python and matplotlib, so forgive me if my wordings aren't precise enough.
xticks takes in two arguments: an array-like object of the placements and an array-like object of the labels. So you can do something like this:
plt.xticks(x, x)
This will set each label equal to the position of its xtick. For more info, see the documentation for plt.xticks.
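Note that plt.xticks(x, x) on its own keeps the hours at their numeric positions, so the gap between 6 and 19 remains. A different technique that avoids the gap is to plot against evenly spaced positions and use the hours only as labels. This is a minimal sketch with made-up traffic volumes, assuming the night runs from 19 through 6 in that order.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical night-time data, ordered the way the night actually runs
hours = [19, 20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6]
volume = [3300, 2800, 2700, 2200, 1500, 800, 500, 400, 450, 900, 2800, 5400]
by_hour = pd.Series(volume, index=hours, name="traffic_volume")

# Plot against 0..11 so the points are evenly spaced with no empty middle
positions = list(range(len(hours)))
fig, ax = plt.subplots()
ax.plot(positions, by_hour.to_numpy())

# Label each position with the hour it represents
ax.set_xticks(positions)
ax.set_xticklabels([str(h) for h in hours])
ax.set_title("Business Nights Traffic Volume by Hours")
ax.set_ylabel("Traffic Volume")
ax.set_ylim(0, 6500)

labels = [t.get_text() for t in ax.get_xticklabels()]
```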

MatPlotLib - Showing legend

I'm making a scatter plot from a Pandas DataFrame with 3 columns. The first two would be the x and y axes, and the third would be classification data that I want to visualize by giving the points different colors. My question is, how can I add the legend to this plot:
df= df.groupby(['Month', 'Price'])['Quantity'].sum().reset_index()
df.plot(kind='scatter', x='Month', y='Quantity', c=df.Price , s = 100, legend = True);
As you can see, I'd like to automatically color the dots based on their price, so adding labels manually is a bit of an inconvenience. Is there a way I could add something to this code, that would also show a legend to the Price values?
Also, this colors the scatter plot dots on a range from black to white. Can I add custom colors without giving up the easy usage of c=df.Price?
Thank you!
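One way this might be done, as a sketch rather than a definitive answer: call matplotlib's ax.scatter directly and build the legend from the returned collection with legend_elements() (available in matplotlib 3.1 and later). The data is made up, and the viridis colormap is just one example of a custom color scheme that still works with c=df.Price.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sales data standing in for the asker's DataFrame
raw = pd.DataFrame({
    "Month": [1, 1, 2, 2, 3, 3, 3],
    "Price": [10, 20, 10, 20, 10, 20, 20],
    "Quantity": [5, 3, 6, 2, 4, 1, 2],
})
df = raw.groupby(["Month", "Price"])["Quantity"].sum().reset_index()

fig, ax = plt.subplots()
# cmap picks the colors; c still maps each point's Price to a color
sc = ax.scatter(df["Month"], df["Quantity"], c=df["Price"], s=100, cmap="viridis")

# One legend entry per distinct color value (here, per Price)
handles, labels = sc.legend_elements()
legend = ax.legend(handles, labels, title="Price")
```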

grids of graphs in matplotlib

Using the AXIS notation for matplotlib has allowed me to manually plot a grid of 2x2 or 3x3 or whatever size grid (if I know what size grid I want beforehand.)
However, how do you determine automatically what size grid is needed? For example, what if you don't know how many unique values are in the column you want to graph?
I am thinking there must be a way of doing this in a loop and figuring out based on the number of unique values in the column this is how big the graph needs to be.
Example
When I plot this, for some reason it doesn't show month_name on the x-axis (as in Jan, Feb, Mar, etc.)
avg_all_account.plot(legend=False,subplots=True,x='month_date',figsize=(10,20))
plt.xlabel('month')
plt.ylabel('number of proposals')
Yet when I plot subplots on a figure and specify the x-axis parameter x='month_name', the month name appears on the plot:
f = plt.figure()
f.set_figheight(8)
f.set_figwidth(8)
f.sharex=True
f.sharey=True
#graph1 = f.add_subplot(2,2,1)
avg_all_account.loc[:,['month_date','number_open_proposals_all']].plot(ax=f.add_subplot(331),legend=False,subplots=True,x='month_date',y='number_open_proposals_all',title='open proposals') # .ix was removed in pandas 1.0; .loc selects the same columns
plt.xlabel('month')
plt.ylabel('number of proposals')
Since the subplot approach worked and showed the month name on the x-axis along with my x and y axis labels, how can I work out how many subplots I need without first calculating it by hand, then writing out each line and hard-coding the subplot position?
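A sketch of the loop idea described in the question: count the unique values, derive the number of rows from a chosen number of columns, and let plt.subplots build the grid. The account column, the data, and the choice of 3 columns are all assumptions made for illustration.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one line to draw per unique value in "account"
df = pd.DataFrame({
    "account": ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E"],
    "month": [1, 2] * 5,
    "proposals": [3, 4, 2, 5, 6, 1, 4, 4, 7, 2],
})

groups = df["account"].unique()
ncols = 3                                   # chosen grid width (assumption)
nrows = math.ceil(len(groups) / ncols)      # rows follow from the group count

fig, axes = plt.subplots(
    nrows, ncols, figsize=(4 * ncols, 3 * nrows),
    sharex=True, sharey=True, squeeze=False,  # squeeze=False keeps axes 2-D
)
flat_axes = axes.ravel()

# One subplot per unique value, with no hard-coded positions
for ax, name in zip(flat_axes, groups):
    subset = df[df["account"] == name]
    ax.plot(subset["month"], subset["proposals"])
    ax.set_title(str(name))
    ax.set_xlabel("month")
    ax.set_ylabel("number of proposals")

# Hide any leftover cells in the last row
for ax in flat_axes[len(groups):]:
    ax.set_visible(False)
```

With 5 unique values and 3 columns this gives a 2x3 grid with the last cell hidden; changing ncols is the only manual choice.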

Using a Pandas dataframe index as values for x-axis in matplotlib plot

I have a time series in a Pandas DataFrame with a number of columns which I'd like to plot. Is there a way to set the x-axis to always use the index from a DataFrame?
When I use the .plot() method from Pandas the x-axis is formatted correctly. However, when I pass my dates and the column(s) I'd like to plot directly to matplotlib, the graph doesn't plot correctly. Thanks in advance.
plt.plot(site2.index.values, site2['Cl'])
plt.show()
FYI: site2.index.values produces this (I've cut out the middle part for brevity):
array([
'1987-07-25T12:30:00.000000000+0200',
'1987-07-25T16:30:00.000000000+0200',
'2010-08-13T02:00:00.000000000+0200',
'2010-08-31T02:00:00.000000000+0200',
'2010-09-15T02:00:00.000000000+0200'
],
dtype='datetime64[ns]')
It seems the issue was that I had .values. Without it (i.e. site2.index) the graph displays correctly.
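You can use plt.xticks to set the x-axis tick locations and labels. Plot against integer positions, then label those positions with the index:
positions = range(len(site2))
plt.plot(positions, site2['Cl'].values)
plt.xticks(positions, site2.index.strftime('%Y-%m-%d'), rotation=90) # locations, labels
plt.show()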
see the documentation for more details: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.xticks
That's built right into the plot() method.
You can use yourDataFrame.plot(use_index=True) to put the DataFrame index on the x-axis.
Read More Here: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.plot.html
If, like me, you want matplotlib to choose a 'sensible' tick scale while still using the DataFrame index as the x-axis labels, here is one way to do it:
fig, ax = plt.subplots()
ax.plot(site2['Cl'].to_numpy()) # plot against integer row positions
x_ticks = [int(t) for t in ax.get_xticks() if t in range(len(site2))] # keep matplotlib's default ticks that match a row
ax.set_xticks(x_ticks)
ax.set_xticklabels(site2.index[x_ticks].astype(str)) # label each kept tick with its index value
