Adding a shaded box to a plot in Python
I am looking to add a shaded box to my plot below. I want the box to go from Aug 25-Aug 30 and to run the length of the Y axis.
The following is my code for the two plots I have made...
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates

df = pd.read_excel('salinity_temp.xlsx')
dates = df['Date']
sal = df['Salinity']
temp = df['Temperature']
fig, axes = plt.subplots(2, 1, figsize=(8,8), sharex=True)
axes[0].plot(dates, sal, lw=5, color="red")
axes[0].set_ylabel('Salinity (PSU)')
axes[0].set_title('Salinity', fontsize=14)
axes[1].set_title('Temperature', fontsize=14)
axes[1].plot(dates, temp, lw=5, color="blue")
axes[1].set_ylabel('Temperature (C)')
axes[1].set_xlabel('Dates, 2017', fontsize=12)
axes[1].xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b %d'))
axes[0].xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b %d'))
axes[1].xaxis_date()
axes[0].xaxis_date()
I want the shaded box to highlight when Hurricane Harvey hit Houston, Texas (Aug 25- Aug 30). My data looks like:
Date Salinity Temperature
20-Aug 15.88144647 31.64707184
21-Aug 18.83088846 31.43848419
22-Aug 19.51015264 31.47655487
23-Aug 23.41655369 31.198349
24-Aug 25.16410124 30.63014984
25-Aug 25.2273574 28.8677597
26-Aug 28.35557667 27.49458313
27-Aug 18.52829235 25.92834473
28-Aug 7.423231661 24.06635284
29-Aug 0.520394177 23.47881317
30-Aug 0.238508327 23.90857697
31-Aug 0.143210364 24.30892944
1-Sep 0.206473387 25.20442963
2-Sep 0.241343182 26.32663727
3-Sep 0.58000503 26.93431854
4-Sep 1.182055098 27.8212738
5-Sep 3.632014919 28.23947906
6-Sep 4.672006985 27.29686737
7-Sep 5.938766377 26.8693161
8-Sep 9.107671159 26.48963928
9-Sep 8.180587303 26.05213165
10-Sep 6.200532091 25.73104858
11-Sep 5.144526191 25.60035706
12-Sep 5.106032451 25.73139191
13-Sep 4.279492562 26.06132507
14-Sep 5.255868992 26.74919128
15-Sep 8.026764063 27.23724365
I have tried using the rectangle function from this link (https://discuss.analyticsvidhya.com/t/how-to-add-a-patch-in-a-plot-in-python/5518), but I can't seem to get it to work properly.
Independent of your specific data, it sounds like you need axvspan. Try running this after your plotting code:
for ax in axes:
    ax.axvspan('2017-08-25', '2017-08-30', color='black', alpha=0.5)
This will work if dates = df['Date'] is stored as type datetime64. It might not work with other datetime data types, and it won't work if dates contains date strings.
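If the Date column comes in as strings such as "25-Aug", one way around the caveat above is to convert it to datetimes first and pass real timestamps to axvspan. The following is only a sketch under assumptions: it uses a few rounded rows from the question's table instead of the Excel file, and it takes the year 2017 from the x-axis label.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# A few rows from the question's table (values rounded); stands in for the Excel file.
df = pd.DataFrame({
    "Date": ["24-Aug", "25-Aug", "26-Aug", "27-Aug",
             "28-Aug", "29-Aug", "30-Aug", "31-Aug"],
    "Salinity": [25.16, 25.23, 28.36, 18.53, 7.42, 0.52, 0.24, 0.14],
    "Temperature": [30.63, 28.87, 27.49, 25.93, 24.07, 23.48, 23.91, 24.31],
})

# Assumption: the dates are strings without a year, so append 2017 and parse them.
df["Date"] = pd.to_datetime(df["Date"] + "-2017", format="%d-%b-%Y")

fig, axes = plt.subplots(2, 1, figsize=(8, 8), sharex=True)
axes[0].plot(df["Date"], df["Salinity"], color="red")
axes[1].plot(df["Date"], df["Temperature"], color="blue")

# Shade Aug 25 to Aug 30 over the full height of each subplot.
start = pd.Timestamp("2017-08-25")
end = pd.Timestamp("2017-08-30")
for ax in axes:
    ax.axvspan(start, end, color="black", alpha=0.5)
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
```

Call plt.show() afterwards to display the figure. Because axvspan spans the whole y axis by default, the box keeps its full height even if the y limits change later.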
Related
Change matplotlib subplots to separate plots
I have put together some code to make plots from data covering multiple days. My data file contains over 40 days and 19k timestamps, and I need one plot for each day. I want Python to generate them as separate plots. Mr. T helped me a lot by providing the code, but I cannot get it to produce individual plots instead of putting them all into one figure of subplots. Can somebody help me with this?
Picture shows the current output:
My code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

#read your data and create datetime index
df= pd.read_csv('test-februari.csv', sep=";")
df.index = pd.to_datetime(df["Date"]+df["Time"].str[:-5], format="%Y:%m:%d %H:%M:%S")

#group by date and hour, count entries
dfcounts = df.groupby([df.index.date, df.index.hour]).size().reset_index()
dfcounts.columns = ["Date", "Hour", "Count"]
maxcount = dfcounts.Count.max()

#group by date for plotting
dfplot = dfcounts.groupby(dfcounts.Date)

#plot each day into its own subplot
fig, axs = plt.subplots(dfplot.ngroups, figsize=(6,8))
for i, groupdate in enumerate(dfplot.groups):
    ax=axs[i]
    #the marker is not really necessary but has been added in case there is just one entry per day
    ax.plot(dfplot.get_group(groupdate).Hour, dfplot.get_group(groupdate).Count, color="blue", marker="o")
    ax.set_title(str(groupdate))
    ax.set_xlim(0, 24)
    ax.set_ylim(0, maxcount * 1.1)
    ax.xaxis.set_ticks(np.arange(0, 25, 2))
plt.tight_layout()
plt.show()
Welcome to Stack Overflow. Instead of creating multiple subplots, you can create a new figure on each pass through the loop and plot onto it separately, then show all of them at the end. Iterating over the groupby object yields both the date and that day's rows:
for groupdate, group in dfplot:
    fig = plt.figure()
    plt.plot(group.Hour, group.Count, color="blue", marker="o")
    plt.title(str(groupdate))
    plt.xlim(0, 24)
    plt.ylim(0, maxcount * 1.1)
    plt.xticks(np.arange(0, 25, 2))
    plt.tight_layout()
plt.show()
How to add monthly labels to x-axis using matplotlib?
For an assignment I need to plot record (min and max) temperatures over the period 2004-2014 using matplotlib. The figure is almost complete (see below) except for the x axis labelling. When plotting, I did not specify the x-axis value so it generated integers from 0-365, thus the number of days in a year. Now I want the months to appear as x-axis labels instead of integers (Jan, Feb, etc.). Can someone help me out? Record low and high temperatures:
I generated source data as follows:
import numpy as np
import pandas as pd

np.random.seed(13)
dates = pd.date_range(start='2014-01-01', end='2014-12-31')
temp = pd.DataFrame({'tMin': np.random.normal(0, 0.5, dates.size).cumsum() - 10,
                     'tMax': np.random.normal(0, 0.5, dates.size).cumsum() + 10},
                    index=dates)
To get the picture with month labels, try the following code:
# Imports
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Drawing
fig, ax = plt.subplots(figsize=(10, 4))
plt.xlabel('Month')
plt.ylabel('Temp')
plt.title('Temperatures 2014')
ax.xaxis.set_major_locator(mdates.MonthLocator())
fmt = mdates.DateFormatter('%b %Y')
ax.xaxis.set_major_formatter(fmt)
ax.plot(temp.tMin)
ax.plot(temp.tMax)
ax.fill_between(temp.index, temp.tMin, temp.tMax, color='#A0E0A0', alpha=0.2)
plt.setp(ax.get_xticklabels(), rotation=30);
For the above source data I got the following picture:
How to fix Matplotlib plotting Pandas Series blank data
I am plotting a CSV file read in by pandas using matplotlib and the following code.
Image of CSV data:
fig, ax = plt.subplots(figsize=(10, 10))
plt.plot(dat['Forecast Hour'].iloc[0:45], dat['Forecasted End Time'].iloc[0:45],'b', marker='o')
plt.plot(dat['Forecast Hour'].iloc[0:46], dat['Forecasted Start Time'].iloc[0:46], 'r', marker='o')
bar = plt.bar(dat['Forecast Hour'].iloc[8:46], dat['Forecasted Event Length'].iloc[8:46], width=.8, color='gainsboro')
ax.tick_params(which='major',labelsize='12')
ax.grid(which='major', color='#CCCCCC', linestyle='-')
plt.xticks(rotation='90')
plt.xlabel('Forecast Run')
plt.ylabel('Forecasted Start/End Time')
plt.legend()
ax3 = ax.twinx()
mn, mx = ax.get_ylim()
ax3.set_ylim(0, 12)
ax3.set_ylabel('Forecasted Event Length')
When I run this code I receive the error message:
ValueError: could not convert string to float: '11:00 PM'
When I convert the NaN values to blank strings using:
dat = dat.replace(np.nan, '', regex=True)
the data will plot, but the plot also includes the blank data, like so (note the space between 9:00 pm and the x axis):
Image of Graphed data
Ultimately, how do I a) stop matplotlib from plotting this "blank data" or b) make 9:00 pm the 0 point for my graph axes? Any help is very much appreciated!
How to plot stacked bar chart using one of the variables in Pandas?
I am trying to work with this CSV file, which I have read into a pandas.DataFrame. It gives Black Friday purchase data for various shoppers, along with several variables for understanding their purchase patterns.
User_ID,Product_ID,Gender,Age,Occupation,City_Category,Stay_In_Current_City_Years,Marital_Status,Product_Category_1,Product_Category_2,Product_Category_3,Purchase
1000001,P00069042,F,0-17,10,A,2,0,3,,,8370
1000001,P00248942,F,0-17,10,A,2,0,1,6,14,15200
1000001,P00087842,F,0-17,10,A,2,0,12,,,1422
1000001,P00085442,F,0-17,10,A,2,0,12,14,,1057
1000002,P00285442,M,55+,16,C,4+,0,8,,,7969
1000003,P00193542,M,26-35,15,A,3,0,1,2,,15227
1000004,P00184942,M,46-50,7,B,2,1,1,8,17,19215
1000004,P00346142,M,46-50,7,B,2,1,1,15,,15854
1000004,P0097242,M,46-50,7,B,2,1,1,16,,15686
1000005,P00274942,M,26-35,20,A,1,1,8,,,7871
1000005,P00251242,M,26-35,20,A,1,1,5,11,,5254
1000005,P00014542,M,26-35,20,A,1,1,8,,,3957
1000005,P00031342,M,26-35,20,A,1,1,8,,,6073
1000005,P00145042,M,26-35,20,A,1,1,1,2,5,15665
1000006,P00231342,F,51-55,9,A,1,0,5,8,14,5378
1000006,P00190242,F,51-55,9,A,1,0,4,5,,2079
1000006,P0096642,F,51-55,9,A,1,0,2,3,4,13055
1000006,P00058442,F,51-55,9,A,1,0,5,14,,8851
1000007,P00036842,M,36-45,1,B,1,1,1,14,16,11788
1000008,P00249542,M,26-35,12,C,4+,1,1,5,15,19614
1000008,P00220442,M,26-35,12,C,4+,1,5,14,,8584
1000008,P00156442,M,26-35,12,C,4+,1,8,,,9872
1000008,P00213742,M,26-35,12,C,4+,1,8,,,9743
1000008,P00214442,M,26-35,12,C,4+,1,8,,,5982
1000008,P00303442,M,26-35,12,C,4+,1,1,8,14,11927
1000009,P00135742,M,26-35,17,C,0,0,6,8,,16662
1000009,P00039942,M,26-35,17,C,0,0,8,,,5887
1000009,P00161442,M,26-35,17,C,0,0,5,14,,6973
1000009,P00078742,M,26-35,17,C,0,0,5,8,14,5391
1000010,P00085942,F,36-45,1,B,4+,1,2,4,8,16352
1000010,P00118742,F,36-45,1,B,4+,1,5,11,,8886
1000010,P00297942,F,36-45,1,B,4+,1,8,,,5875
1000010,P00266842,F,36-45,1,B,4+,1,5,,,8854
1000010,P00058342,F,36-45,1,B,4+,1,3,4,,10946
1000010,P00032442,F,36-45,1,B,4+,1,5,,,5152
1000010,P00105942,F,36-45,1,B,4+,1,5,,,7089
1000010,P00182642,F,36-45,1,B,4+,1,2,4,9,12909
1000010,P00186942,F,36-45,1,B,4+,1,5,12,,8770
1000010,P00155442,F,36-45,1,B,4+,1,1,11,15,15212
1000010,P00221342,F,36-45,1,B,4+,1,1,2,5,15705
1000010,P00087242,F,36-45,1,B,4+,1,14,,,7947
1000010,P00111142,F,36-45,1,B,4+,1,1,15,16,18963
1000010,P00259342,F,36-45,1,B,4+,1,5,9,,8718
1000010,P0094542,F,36-45,1,B,4+,1,2,4,9,16406
1000010,P00148642,F,36-45,1,B,4+,1,6,10,13,12642
1000010,P00312142,F,36-45,1,B,4+,1,8,,,10007
1000010,P00113242,F,36-45,1,B,4+,1,1,6,8,11562
Now I want to create a stacked plot of the total purchase by city and gender which looks like this:
Here is what I have tried:
import pandas
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter
import numpy as np

with open('BlackFriday.csv') as csv_file:
    df = pandas.read_csv(csv_file, sep=',')

# Group by user id, city and gender
users_by_city_gender = df.groupby(['City_Category','Gender'])['Purchase'].agg('sum').to_frame()
ax3 = pandas.DataFrame({'City-A': users_by_city_gender.groupby('City_Category').get_group('A').Purchase,
                        'City-B': users_by_city_gender.groupby('City_Category').get_group('B').Purchase,
                        'City-C': users_by_city_gender.groupby('City_Category').get_group('C').Purchase}).plot.hist(stacked=True)

## Switch off ticks
ax3.tick_params(axis="both", which="both", bottom=False, top=False, labelbottom=False, left=False, right=False, labelleft=True)

# Draw horizontal axis lines
# vals = ax.get_yticks()
# for tick in vals:
#     ax.axhline(y=tick, linestyle='dashed', alpha=0.4, color='#eeeeee', zorder=1)

# Remove title
ax3.set_title("Total purchase by city and gender")

# Set x-axis label
ax3.set_xlabel("City category", labelpad=20, weight='bold', size=12)

# Set y-axis label
ax3.set_ylabel("Total purchase [dollars]", labelpad=20, weight='bold', size=12)

# Format y-axis label
ax3.yaxis.set_major_formatter(StrMethodFormatter('{x:,g}'))
plt.show()
The resulting plot seems completely different from the plot I want. Inspecting users_by_city_gender in the debugger shows it to be a dataframe with one entry per city (A, B and C), each containing the total purchase by Gender (M and F). So I think that is the data I need to plot the chart properly. I have looked at other questions on Stack Exchange about creating stacked bar plots from a pandas dataframe, but I have not been able to find a solution to my problem.
You can use pivot_table:
s = (df.pivot_table(
        index='City_Category', columns='Gender', values='Purchase', aggfunc='sum'))
s.plot(kind='bar', stacked=True)
plt.show()
For explanation, here is what the result of the pivot looks like:
Gender                F         M
City_Category
A               55412.0   54047.0
B              201995.0   62543.0
C                   NaN  108604.0
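To try the pivot approach without the full CSV, here is a minimal self-contained sketch. The six rows are a hand-picked subset of the question's data (an assumption made for brevity), so the totals differ from the table printed above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small hand-picked subset of the question's rows; not the full file.
df = pd.DataFrame({
    "City_Category": ["A", "A", "A", "B", "B", "C"],
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "Purchase": [8370, 15227, 15200, 19215, 16352, 7969],
})

# One row per city, one column per gender, cells hold summed purchases.
totals = df.pivot_table(index="City_Category", columns="Gender",
                        values="Purchase", aggfunc="sum")

# Each column becomes one layer of the stacked bars.
ax = totals.plot(kind="bar", stacked=True)
ax.set_xlabel("City category")
ax.set_ylabel("Total purchase")
```

Call plt.show() to display the chart. City and gender combinations with no rows come out as NaN, as in the pivot result above; passing fill_value=0 to pivot_table would turn them into zeros instead.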
pandas dataframe recession highlighting plot
I have a pandas dataframe, shown below, which has a yyyy-mm index, a US recession indicator (USREC), and a time series variable M1. Please see the table below:
Date     USREC  M1
2000-12         1088.4
2001-01         1095.08
2001-02         1100.58
2001-03         1108.1
2001-04  1      1116.36
2001-05  1      1117.8
2001-06  1      1125.45
2001-07  1      1137.46
2001-08  1      1147.7
2001-09  1      1207.6
2001-10  1      1166.64
2001-11  1      1169.7
2001-12         1182.46
2002-01         1190.82
2002-02         1190.43
2002-03         1194.85
2002-04         1186.82
2002-05         1186.9
2002-06         1194.55
2002-07         1199.26
2002-08         1183.7
2002-09         1197.1
2002-10         1203.47
I want to plot a chart in Python that looks like the attached chart, which was created in Excel. I have searched for various examples online, but none of them show a chart like the one below. Can you please help? Thank you. I would also appreciate pointers to any plotting library that needs few inputs and makes it easy to produce most of the plots that Excel provides.
EDIT: I checked out the example on the page https://matplotlib.org/examples/pylab_examples/axhspan_demo.html. The code I have used is below.
fig, axes = plt.subplots()
df['M1'].plot(ax=axes)
ax.axvspan(['USREC'],color='grey',alpha=0.5)
I didn't see any example on the matplotlib.org site where I can pass another column as the axvspan range. With the code above I get the error
TypeError: axvspan() missing 1 required positional argument: 'xmax'
I figured it out. I created secondary Y axis for USREC and hid the axis label just like I wanted to, but it also hid the USREC from the legend. But that is a minor thing.
def plot_var(y1):
    fig0, ax0 = plt.subplots()
    ax1 = ax0.twinx()
    y1.plot(kind='line', stacked=False, ax=ax0, color='blue')
    df['USREC'].plot(kind='area', secondary_y=True, ax=ax1, alpha=.2, color='grey')
    ax0.legend(loc='upper left')
    ax1.legend(loc='upper left')
    plt.ylim(ymax=0.8)
    plt.axis('off')
    plt.xlabel('Date')
    plt.show()
    plt.close()

plot_var(df['M1'])
There is a problem with Zenvega's answer: the recession lines are not vertical, as they should be. I am not entirely sure what exactly goes wrong, but I show below how to get vertical lines. My answer uses the syntax ax.fill_between(date_index, y1=ymin, y2=ymax, where=True/False). I compute the y1 and y2 arguments manually from the axis object, and the where argument takes the recession data as a boolean series of True or False values.
import pandas as pd
import matplotlib.pyplot as plt

# get data: see further down for `string_data`
df = pd.read_csv(string_data, skipinitialspace=True)
df['Date'] = pd.to_datetime(df['Date'])

# convenience function
def plot_series(ax, df, index='Date', cols=['M1'], area='USREC'):
    # convert area variable to boolean
    df[area] = df[area].astype(int).astype(bool)
    # set up an index based on date
    df = df.set_index(keys=index, drop=False)
    # line plot
    df.plot(ax=ax, x=index, y=cols, color='blue')
    # extract limits
    y1, y2 = ax.get_ylim()
    ax.fill_between(df[index].index, y1=y1, y2=y2, where=df[area], facecolor='grey', alpha=0.4)
    return ax

# set up figure, axis
f, ax = plt.subplots()
plot_series(ax, df)
ax.grid(True)
plt.show()

# copy-pasted data from OP
from io import StringIO
string_data=StringIO("""
Date,USREC,M1
2000-12,0,1088.4
2001-01,0,1095.08
2001-02,0,1100.58
2001-03,0,1108.1
2001-04,1,1116.36
2001-05,1,1117.8
2001-06,1,1125.45
2001-07,1,1137.46
2001-08,1,1147.7
2001-09,1,1207.6
2001-10,1,1166.64
2001-11,1,1169.7
2001-12,0,1182.46
2002-01,0,1190.82
2002-02,0,1190.43
2002-03,0,1194.85
2002-04,0,1186.82
2002-05,0,1186.9
2002-06,0,1194.55
2002-07,0,1199.26
2002-08,0,1183.7
2002-09,0,1197.1
2002-10,0,1203.47""")

# after formatting, the data would look like this:
>>> df.head(2)
                 Date  USREC       M1
Date
2000-12-01 2000-12-01  False  1088.40
2001-01-01 2001-01-01  False  1095.08
See how the lines are vertical:
An alternative approach would be to use plt.axvspan(), which would automatically calculate the y1 and y2 values.
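The axvspan alternative could look roughly like the sketch below. It is not taken from either answer: the helper recession_spans is a hypothetical name introduced here, and the data is a 12-month slice of the question's table. The idea is to find each contiguous run of USREC == 1 and shade it with one axvspan call.

```python
import pandas as pd
import matplotlib.pyplot as plt

# 12-month slice of the question's data (2001-02 through 2002-01).
df = pd.DataFrame(
    {
        "USREC": [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
        "M1": [1100.58, 1108.1, 1116.36, 1117.8, 1125.45, 1137.46,
               1147.7, 1207.6, 1166.64, 1169.7, 1182.46, 1190.82],
    },
    index=pd.date_range("2001-02-01", periods=12, freq="MS"),
)


def recession_spans(flags):
    """Return (first, last) index labels of each contiguous run of truthy flags.

    Hypothetical helper written for this sketch.
    """
    flags = flags.astype(bool)
    # The run id increases by one every time the flag changes value.
    run_id = (flags != flags.shift(fill_value=False)).cumsum()
    spans = []
    for _, run in flags[flags].groupby(run_id[flags]):
        spans.append((run.index[0], run.index[-1]))
    return spans


spans = recession_spans(df["USREC"])

fig, ax = plt.subplots()
ax.plot(df.index, df["M1"], color="blue", label="M1")
for start, end in spans:
    # axvspan covers the full height of the axes, so no y limits are needed.
    ax.axvspan(start, end, color="grey", alpha=0.4)
ax.legend(loc="upper left")
```

Call plt.show() to display the figure. Each band here runs from the first flagged month to the start of the last flagged month; whether to extend it to the end of that month is a design choice left open in this sketch.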