Adding a shaded box to a plot in python

I am looking to add a shaded box to my plot below. I want the box to go from Aug 25-Aug 30 and to run the length of the Y axis.
The following is my code for the two plots I have made...
import matplotlib.dates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_excel('salinity_temp.xlsx')
dates = df['Date']
sal = df['Salinity']
temp = df['Temperature']
fig, axes = plt.subplots(2, 1, figsize=(8,8), sharex=True)
axes[0].plot(dates, sal, lw=5, color="red")
axes[0].set_ylabel('Salinity (PSU)')
axes[0].set_title('Salinity', fontsize=14)
axes[1].set_title('Temperature', fontsize=14)
axes[1].plot(dates, temp, lw=5, color="blue")
axes[1].set_ylabel('Temperature (C)')
axes[1].set_xlabel('Dates, 2017', fontsize=12)
axes[1].xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b %d'))
axes[0].xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b %d'))
axes[1].xaxis_date()
axes[0].xaxis_date()
I want the shaded box to highlight when Hurricane Harvey hit Houston, Texas (Aug 25- Aug 30). My data looks like:
Date Salinity Temperature
20-Aug 15.88144647 31.64707184
21-Aug 18.83088846 31.43848419
22-Aug 19.51015264 31.47655487
23-Aug 23.41655369 31.198349
24-Aug 25.16410124 30.63014984
25-Aug 25.2273574 28.8677597
26-Aug 28.35557667 27.49458313
27-Aug 18.52829235 25.92834473
28-Aug 7.423231661 24.06635284
29-Aug 0.520394177 23.47881317
30-Aug 0.238508327 23.90857697
31-Aug 0.143210364 24.30892944
1-Sep 0.206473387 25.20442963
2-Sep 0.241343182 26.32663727
3-Sep 0.58000503 26.93431854
4-Sep 1.182055098 27.8212738
5-Sep 3.632014919 28.23947906
6-Sep 4.672006985 27.29686737
7-Sep 5.938766377 26.8693161
8-Sep 9.107671159 26.48963928
9-Sep 8.180587303 26.05213165
10-Sep 6.200532091 25.73104858
11-Sep 5.144526191 25.60035706
12-Sep 5.106032451 25.73139191
13-Sep 4.279492562 26.06132507
14-Sep 5.255868992 26.74919128
15-Sep 8.026764063 27.23724365
I have tried using the rectangle function from this link (https://discuss.analyticsvidhya.com/t/how-to-add-a-patch-in-a-plot-in-python/5518), but I can't seem to get it to work properly.

Independent of your specific data, it sounds like you need axvspan. Try running this after your plotting code:
for ax in axes:
    ax.axvspan('2017-08-25', '2017-08-30', color='black', alpha=0.5)
This will work if dates = df['Date'] is stored as type datetime64. It might not work with other datetime data types, and it won't work if dates contains date strings.
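If the string arguments give trouble, one option is to convert the dates explicitly and pass pandas Timestamp objects to axvspan. The following is a minimal sketch, not the asker's exact setup: the small DataFrame is a stand-in built from a few rows of the posted data, and the year 2017 is an assumption taken from the axis label.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the spreadsheet: a few rows from the question, year 2017 assumed
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2017-08-24", "2017-08-25", "2017-08-26", "2017-08-30", "2017-08-31"]
    ),
    "Salinity": [25.16, 25.23, 28.36, 0.24, 0.14],
    "Temperature": [30.63, 28.87, 27.49, 23.91, 24.31],
})

fig, axes = plt.subplots(2, 1, figsize=(8, 8), sharex=True)
axes[0].plot(df["Date"], df["Salinity"], color="red")
axes[1].plot(df["Date"], df["Temperature"], color="blue")

# Explicit datetime endpoints for the shaded interval
start = pd.Timestamp("2017-08-25")
end = pd.Timestamp("2017-08-30")
for ax in axes:
    # With no ymin/ymax given, the span covers the full height of the axes
    ax.axvspan(start, end, color="black", alpha=0.5)
```

Calling plt.show() afterwards displays the figure with the same grey band on both subplots.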

Related

Change matplotlib subplots to separate plots

I have put together some code to make plots from data covering multiple days. My data file contains over 40 days and 19k timestamps, and I need one plot per day. I want Python to generate them as separate figures.
Mr. T helped me a lot by providing the code, but I cannot get it to produce individual plots instead of putting everything into one figure as subplots. Can somebody help me with this?
Picture shows the current output:
My code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# read your data and create datetime index
df = pd.read_csv('test-februari.csv', sep=";")
df.index = pd.to_datetime(df["Date"] + df["Time"].str[:-5], format="%Y:%m:%d %H:%M:%S")

# group by date and hour, count entries
dfcounts = df.groupby([df.index.date, df.index.hour]).size().reset_index()
dfcounts.columns = ["Date", "Hour", "Count"]
maxcount = dfcounts.Count.max()

# group by date for plotting
dfplot = dfcounts.groupby(dfcounts.Date)

# plot each day into its own subplot
fig, axs = plt.subplots(dfplot.ngroups, figsize=(6,8))
for i, groupdate in enumerate(dfplot.groups):
    ax = axs[i]
    # the marker is not really necessary but has been added in case there is just one entry per day
    ax.plot(dfplot.get_group(groupdate).Hour, dfplot.get_group(groupdate).Count, color="blue", marker="o")
    ax.set_title(str(groupdate))
    ax.set_xlim(0, 24)
    ax.set_ylim(0, maxcount * 1.1)
    ax.xaxis.set_ticks(np.arange(0, 25, 2))
plt.tight_layout()
plt.show()

Welcome to Stack Overflow.
Instead of creating multiple subplots, you can create a figure on the fly and plot onto it in every loop separately. And at the end show all of them at the same time.
# iterating over the groupby object yields (key, sub-DataFrame) pairs
for groupdate, group in dfplot:
    fig = plt.figure()
    plt.plot(group.Hour, group.Count, color="blue", marker="o")
    plt.title(str(groupdate))
    plt.xlim(0, 24)
    plt.ylim(0, maxcount * 1.1)
    plt.xticks(np.arange(0, 25, 2))
    plt.tight_layout()
plt.show()
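The same idea can also be written with the object-oriented interface, which keeps a handle on every figure. This is only a sketch: the timestamps are hypothetical stand-ins for the CSV contents, and the `figures` dictionary is a name introduced here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical timestamps standing in for the CSV contents
stamps = pd.to_datetime([
    "2021-02-01 08:15:00", "2021-02-01 08:45:00", "2021-02-01 13:05:00",
    "2021-02-02 09:30:00",
    "2021-02-03 17:00:00", "2021-02-03 17:20:00",
])
df = pd.DataFrame({"value": 1}, index=stamps)

# Count entries per (date, hour), as in the question
dfcounts = df.groupby([df.index.date, df.index.hour]).size().reset_index()
dfcounts.columns = ["Date", "Hour", "Count"]
maxcount = dfcounts["Count"].max()

# One figure per day, stored by date so each can be shown or saved later
figures = {}
for groupdate, group in dfcounts.groupby("Date"):
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.plot(group["Hour"], group["Count"], color="blue", marker="o")
    ax.set_title(str(groupdate))
    ax.set_xlim(0, 24)
    ax.set_ylim(0, maxcount * 1.1)
    ax.set_xticks(np.arange(0, 25, 2))
    fig.tight_layout()
    figures[groupdate] = fig
```

Each entry in `figures` can then be written to its own file with `fig.savefig(...)`, or all of them can be displayed at once with plt.show().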

How to add monthly labels to x-axis using matplotlib?

For an assignment I need to plot record (min and max) temperatures over the period 2004-2014 using matplotlib. The figure is almost complete (see below) except for the x axis labelling. When plotting, I did not specify the x-axis value so it generated integers from 0-365, thus the number of days in a year. Now I want the months to appear as x-axis labels instead of integers (Jan, Feb, etc.). Can someone help me out?
Record low and high temperatures:

I generated source data as follows:
import numpy as np
import pandas as pd

np.random.seed(13)
dates = pd.date_range(start='2014-01-01', end='2014-12-31')
temp = pd.DataFrame({'tMin': np.random.normal(0, 0.5, dates.size).cumsum() - 10,
                     'tMax': np.random.normal(0, 0.5, dates.size).cumsum() + 10}, index=dates)
To get the picture with month labels, try the following code:
# Imports
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Drawing
fig, ax = plt.subplots(figsize=(10, 4))
plt.xlabel('Month')
plt.ylabel('Temp')
plt.title('Temperatures 2014')
ax.xaxis.set_major_locator(mdates.MonthLocator())
fmt = mdates.DateFormatter('%b %Y')
ax.xaxis.set_major_formatter(fmt)
ax.plot(temp.tMin)
ax.plot(temp.tMax)
ax.fill_between(temp.index, temp.tMin, temp.tMax, color='#A0E0A0', alpha=0.2)
plt.setp(ax.get_xticklabels(), rotation=30);
For the above source data I got the following picture:
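If the data stay on an integer day-of-year axis (0 to 364), as described in the question, the month names can instead be placed by hand at the first day of each month. This is a sketch under that assumption; the temperature arrays are randomly generated placeholders, and a non-leap year is assumed when computing the tick positions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder record temperatures indexed by day of year (0-364)
rng = np.random.default_rng(13)
days = np.arange(365)
t_min = rng.normal(0, 0.5, days.size).cumsum() - 10
t_max = rng.normal(0, 0.5, days.size).cumsum() + 10

# Zero-based day-of-year position of the first day of each month (non-leap year)
month_starts = pd.date_range("2014-01-01", periods=12, freq="MS")
positions = (month_starts.dayofyear - 1).tolist()
labels = month_starts.strftime("%b").tolist()

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(days, t_min)
ax.plot(days, t_max)
ax.fill_between(days, t_min, t_max, alpha=0.2)

# Replace the integer tick labels with abbreviated month names
ax.set_xticks(positions)
ax.set_xticklabels(labels)
```

This avoids converting the x values to dates, at the cost of hard-coding the month boundaries for one calendar year.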

How to fix Matplotlib plotting Pandas Series blank data

I am plotting a csv read in by pandas using matplotlib and the following code.
Image of CSV data:
fig, ax = plt.subplots(figsize=(10, 10))
plt.plot(dat['Forecast Hour'].iloc[0:45], dat['Forecasted End Time'].iloc[0:45],'b', marker='o')
plt.plot(dat['Forecast Hour'].iloc[0:46], dat['Forecasted Start Time'].iloc[0:46], 'r', marker='o')
bar = plt.bar(dat['Forecast Hour'].iloc[8:46], dat['Forecasted Event Length'].iloc[8:46], width=.8, color='gainsboro')
ax.tick_params(which='major',labelsize='12')
ax.grid(which='major', color='#CCCCCC', linestyle='-')
plt.xticks(rotation='90')
plt.xlabel('Forecast Run')
plt.ylabel('Forecasted Start/End Time')
plt.legend()
ax3 = ax.twinx()
mn, mx = ax.get_ylim()
ax3.set_ylim(0, 12)
ax3.set_ylabel('Forecasted Event Length')
When I run the code above, I receive the error message:
ValueError: could not convert string to float: '11:00 PM'
When I convert the NaN values to empty strings using:
dat = dat.replace(np.nan, '', regex=True)
the data will plot, but the plot also includes the blank entries, which leaves a gap between 9:00 PM and the x axis.
Ultimately, how do I a) stop matplotlib from plotting this "blank data" or b) make 9:00 PM the 0 point of my graph axes?
Any help is very much appreciated!

How to plot stacked bar chart using one of the variables in Pandas?

I am trying to work with this CSV file, which I have loaded as a pandas.DataFrame. It contains Black Friday purchase data for various shoppers, along with several variables for understanding their purchase patterns.
User_ID,Product_ID,Gender,Age,Occupation,City_Category,Stay_In_Current_City_Years,Marital_Status,Product_Category_1,Product_Category_2,Product_Category_3,Purchase
1000001,P00069042,F,0-17,10,A,2,0,3,,,8370
1000001,P00248942,F,0-17,10,A,2,0,1,6,14,15200
1000001,P00087842,F,0-17,10,A,2,0,12,,,1422
1000001,P00085442,F,0-17,10,A,2,0,12,14,,1057
1000002,P00285442,M,55+,16,C,4+,0,8,,,7969
1000003,P00193542,M,26-35,15,A,3,0,1,2,,15227
1000004,P00184942,M,46-50,7,B,2,1,1,8,17,19215
1000004,P00346142,M,46-50,7,B,2,1,1,15,,15854
1000004,P0097242,M,46-50,7,B,2,1,1,16,,15686
1000005,P00274942,M,26-35,20,A,1,1,8,,,7871
1000005,P00251242,M,26-35,20,A,1,1,5,11,,5254
1000005,P00014542,M,26-35,20,A,1,1,8,,,3957
1000005,P00031342,M,26-35,20,A,1,1,8,,,6073
1000005,P00145042,M,26-35,20,A,1,1,1,2,5,15665
1000006,P00231342,F,51-55,9,A,1,0,5,8,14,5378
1000006,P00190242,F,51-55,9,A,1,0,4,5,,2079
1000006,P0096642,F,51-55,9,A,1,0,2,3,4,13055
1000006,P00058442,F,51-55,9,A,1,0,5,14,,8851
1000007,P00036842,M,36-45,1,B,1,1,1,14,16,11788
1000008,P00249542,M,26-35,12,C,4+,1,1,5,15,19614
1000008,P00220442,M,26-35,12,C,4+,1,5,14,,8584
1000008,P00156442,M,26-35,12,C,4+,1,8,,,9872
1000008,P00213742,M,26-35,12,C,4+,1,8,,,9743
1000008,P00214442,M,26-35,12,C,4+,1,8,,,5982
1000008,P00303442,M,26-35,12,C,4+,1,1,8,14,11927
1000009,P00135742,M,26-35,17,C,0,0,6,8,,16662
1000009,P00039942,M,26-35,17,C,0,0,8,,,5887
1000009,P00161442,M,26-35,17,C,0,0,5,14,,6973
1000009,P00078742,M,26-35,17,C,0,0,5,8,14,5391
1000010,P00085942,F,36-45,1,B,4+,1,2,4,8,16352
1000010,P00118742,F,36-45,1,B,4+,1,5,11,,8886
1000010,P00297942,F,36-45,1,B,4+,1,8,,,5875
1000010,P00266842,F,36-45,1,B,4+,1,5,,,8854
1000010,P00058342,F,36-45,1,B,4+,1,3,4,,10946
1000010,P00032442,F,36-45,1,B,4+,1,5,,,5152
1000010,P00105942,F,36-45,1,B,4+,1,5,,,7089
1000010,P00182642,F,36-45,1,B,4+,1,2,4,9,12909
1000010,P00186942,F,36-45,1,B,4+,1,5,12,,8770
1000010,P00155442,F,36-45,1,B,4+,1,1,11,15,15212
1000010,P00221342,F,36-45,1,B,4+,1,1,2,5,15705
1000010,P00087242,F,36-45,1,B,4+,1,14,,,7947
1000010,P00111142,F,36-45,1,B,4+,1,1,15,16,18963
1000010,P00259342,F,36-45,1,B,4+,1,5,9,,8718
1000010,P0094542,F,36-45,1,B,4+,1,2,4,9,16406
1000010,P00148642,F,36-45,1,B,4+,1,6,10,13,12642
1000010,P00312142,F,36-45,1,B,4+,1,8,,,10007
1000010,P00113242,F,36-45,1,B,4+,1,1,6,8,11562
Now I want to create a stacked plot of the total purchase by city and gender which looks like this:
Here is what I have tried:
import pandas
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter
import numpy as np
with open('BlackFriday.csv') as csv_file:
    df = pandas.read_csv(csv_file, sep=',')
# Group by user id, city and gender
users_by_city_gender = df.groupby(['City_Category','Gender'])['Purchase'].agg('sum').to_frame()
ax3 = pandas.DataFrame({'City-A': users_by_city_gender.groupby('City_Category').get_group('A').Purchase,
'City-B': users_by_city_gender.groupby('City_Category').get_group('B').Purchase,
'City-C': users_by_city_gender.groupby('City_Category').get_group('C').Purchase}).plot.hist(stacked=True)
## Switch off ticks
ax3.tick_params(axis="both", which="both", bottom=False, top=False, labelbottom=False, left=False, right=False,
labelleft=True)
# Draw horizontal axis lines
# vals = ax.get_yticks()
# for tick in vals:
# ax.axhline(y=tick, linestyle='dashed', alpha=0.4, color='#eeeeee', zorder=1)
# Set title
ax3.set_title("Total purchase by city and gender")
# Set x-axis label
ax3.set_xlabel("City category", labelpad=20, weight='bold', size=12)
# Set y-axis label
ax3.set_ylabel("Total purchase [dollars]", labelpad=20, weight='bold', size=12)
# Format y-axis label
ax3.yaxis.set_major_formatter(StrMethodFormatter('{x:,g}'))
plt.show()
The resulting plot looks completely different from the plot I want. Debugging users_by_city_gender shows it to be a dataframe indexed by city (A, B and C), each containing the total purchase by gender (M and F). So I think that is the data I need to plot the chart properly.
I have looked at other questions on stackexchange for creating stacked bar plots for pandas dataframe but I have not been able to find a solution for my problem.

You can use pivot_table:
s = (df.pivot_table(
index='City_Category', columns='Gender', values='Purchase', aggfunc='sum'))
s.plot(kind='bar', stacked=True)
plt.show()
For explanation, here is what the result of the pivot looks like:
Gender                F         M
City_Category
A               55412.0   54047.0
B              201995.0   62543.0
C                   NaN  108604.0
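For a version that runs without the CSV file, here is a small sketch using six rows copied from the posted sample, reduced to the three columns involved. The `fill_value=0` argument and the axis labels are additions made here so that a missing city and gender combination shows up as zero instead of NaN.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Six rows taken from the posted CSV, reduced to the columns used here
df = pd.DataFrame({
    "City_Category": ["A", "A", "A", "B", "B", "C"],
    "Gender": ["F", "F", "M", "M", "F", "M"],
    "Purchase": [8370, 15200, 15227, 19215, 16352, 7969],
})

# Rows become cities, columns become genders, cells hold the summed purchases
totals = df.pivot_table(
    index="City_Category", columns="Gender", values="Purchase",
    aggfunc="sum", fill_value=0,
)

# Each column is drawn as one layer of the stacked bars
ax = totals.plot(kind="bar", stacked=True)
ax.set_xlabel("City category")
ax.set_ylabel("Total purchase [dollars]")
```

Each row of the pivoted table becomes one bar and each column becomes one coloured segment, which is why the pivot step is needed before plotting.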

pandas dataframe recession highlighting plot

I have a pandas dataframe, shown in the table below, whose index is in yyyy-mm format and which has two columns: the US recession indicator (USREC) and a time series variable M1.
Date     USREC  M1
2000-12         1088.4
2001-01         1095.08
2001-02         1100.58
2001-03         1108.1
2001-04  1      1116.36
2001-05  1      1117.8
2001-06  1      1125.45
2001-07  1      1137.46
2001-08  1      1147.7
2001-09  1      1207.6
2001-10  1      1166.64
2001-11  1      1169.7
2001-12         1182.46
2002-01         1190.82
2002-02         1190.43
2002-03         1194.85
2002-04         1186.82
2002-05         1186.9
2002-06         1194.55
2002-07         1199.26
2002-08         1183.7
2002-09         1197.1
2002-10         1203.47
I want to plot a chart in python that looks like the attached chart which was created in excel..
I have searched for various examples online, but none are able to show the chart like below. Can you please help? Thank you.
I would appreciate if there is any easier to use plotting library which has few inputs but easy to use for majority of plots similar to plots excel provides.
EDIT:
I checked out the example in the page https://matplotlib.org/examples/pylab_examples/axhspan_demo.html. The code I have used is below.
fig, axes = plt.subplots()
df['M1'].plot(ax=axes)
ax.axvspan(['USREC'],color='grey',alpha=0.5)
I didn't see any example on the matplotlib.org pages where I can pass another column as the axvspan range. With my code above I get the error
TypeError: axvspan() missing 1 required positional argument: 'xmax'

I figured it out. I created secondary Y axis for USREC and hid the axis label just like I wanted to, but it also hid the USREC from the legend. But that is a minor thing.
def plot_var(y1):
    fig0, ax0 = plt.subplots()
    ax1 = ax0.twinx()
    y1.plot(kind='line', stacked=False, ax=ax0, color='blue')
    df['USREC'].plot(kind='area', secondary_y=True, ax=ax1, alpha=.2, color='grey')
    ax0.legend(loc='upper left')
    ax1.legend(loc='upper left')
    plt.ylim(ymax=0.8)
    plt.axis('off')
    plt.xlabel('Date')
    plt.show()
    plt.close()

plot_var(df['M1'])

There is a problem with Zenvega's answer: The recession lines are not vertical, as they should be. What exactly goes wrong, I am not entirely sure, but I show below how to get vertical lines.
My answer uses the following syntax ax.fill_between(date_index, y1=ymin, y2=ymax, where=True/False), where I compute the y1 and y2 arguments manually from the axis object and where the where argument takes the recession data as a boolean of True or False values.
import pandas as pd
import matplotlib.pyplot as plt

# get data: see further down for `string_data`
df = pd.read_csv(string_data, skipinitialspace=True)
df['Date'] = pd.to_datetime(df['Date'])

# convenience function
def plot_series(ax, df, index='Date', cols=['M1'], area='USREC'):
    # convert area variable to boolean
    df[area] = df[area].astype(int).astype(bool)
    # set up an index based on date
    df = df.set_index(keys=index, drop=False)
    # line plot
    df.plot(ax=ax, x=index, y=cols, color='blue')
    # extract limits
    y1, y2 = ax.get_ylim()
    ax.fill_between(df[index].index, y1=y1, y2=y2, where=df[area], facecolor='grey', alpha=0.4)
    return ax

# set up figure, axis
f, ax = plt.subplots()
plot_series(ax, df)
ax.grid(True)
plt.show()
# copy-pasted data from OP
from io import StringIO
string_data=StringIO("""
Date,USREC,M1
2000-12,0,1088.4
2001-01,0,1095.08
2001-02,0,1100.58
2001-03,0,1108.1
2001-04,1,1116.36
2001-05,1,1117.8
2001-06,1,1125.45
2001-07,1,1137.46
2001-08,1,1147.7
2001-09,1,1207.6
2001-10,1,1166.64
2001-11,1,1169.7
2001-12,0,1182.46
2002-01,0,1190.82
2002-02,0,1190.43
2002-03,0,1194.85
2002-04,0,1186.82
2002-05,0,1186.9
2002-06,0,1194.55
2002-07,0,1199.26
2002-08,0,1183.7
2002-09,0,1197.1
2002-10,0,1203.47""")
# after formatting, the data would look like this:
>>> df.head(2)
                 Date  USREC       M1
Date
2000-12-01 2000-12-01  False  1088.40
2001-01-01 2001-01-01  False  1095.08
See how the lines are vertical:
An alternative approach would be to use plt.axvspan(), which spans the full height of the axes by default, so the y1 and y2 values do not need to be computed manually.
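A rough sketch of that axvspan alternative follows. It uses twelve rows from the posted data (2001-02 to 2002-01, with blank USREC entries read as 0, as in the answer above), and the helper `recession_spans` is a hypothetical name introduced here to find each contiguous run of recession months.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Twelve months from the posted table; blank USREC entries are treated as 0
index = pd.date_range("2001-02-01", periods=12, freq="MS")
df = pd.DataFrame({
    "USREC": [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    "M1": [1100.58, 1108.1, 1116.36, 1117.8, 1125.45, 1137.46,
           1147.7, 1207.6, 1166.64, 1169.7, 1182.46, 1190.82],
}, index=index)


def recession_spans(flags):
    """Return (start, end) index labels for each contiguous run of True values."""
    flags = flags.astype(bool)
    # A new run id starts whenever the flag differs from the previous row
    run_id = (flags != flags.shift(fill_value=False)).cumsum()
    spans = []
    for _, run in flags[flags].groupby(run_id[flags]):
        spans.append((run.index[0], run.index[-1]))
    return spans


spans = recession_spans(df["USREC"])

fig, ax = plt.subplots()
ax.plot(df.index, df["M1"], color="blue")
for start, end in spans:
    # axvspan shades from start to end over the full height of the axes
    ax.axvspan(start, end, facecolor="grey", alpha=0.4)
ax.grid(True)
```

Note that each band ends at the last flagged observation, so a recession lasting a single period would have zero width; extending `end` by one period is one way to handle that case.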
