I have a time series that I would like to plot year on year. I want the data to be daily, but the axis to show each month as "Jan", "Feb" etc.
At the moment I can get the daily data, BUT the axis is 1-366 (the day of the year).
Or I can get the monthly axis as 1, 2, 3 etc (by changing the index to df.index.month), BUT then the data is monthly.
How can I convert the day of year axis into months? Or how can I do this?
Code showing the daily data, but the axis is wrong:
# import
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create fake time series dataframe
index = pd.date_range(start='01-Jan-2012', end='31-12-2018', freq='D')
data = np.random.randn(len(index))
df = pd.DataFrame(data, index, columns=['Data'])
# pivot to get by day in rows, then year in columns
df_pivot = pd.pivot_table(df, index=df.index.dayofyear, columns=df.index.year, values='Data')
df_pivot.plot()
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()
This can be done using the xticks function. Simply add the following code before plt.show():
plt.xticks(np.linspace(0,365,13)[:-1], ('Jan', 'Feb' ... 'Nov', 'Dec'))
Or the following to have the month names appear in the middle of the month:
plt.xticks(np.linspace(15,380,13)[:-1], ('Jan', 'Feb' ... 'Nov', 'Dec'))
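As a hedged alternative to evenly spaced ticks, the tick positions can be derived from the actual first day of each month, so the labels line up with the day-of-year axis. This is only a sketch: it assumes a leap year (2012) as the reference calendar, and the names month_starts, tick_positions and tick_labels are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same fake data and pivot as in the question
index = pd.date_range(start='2012-01-01', end='2018-12-31', freq='D')
df = pd.DataFrame({'Data': np.random.randn(len(index))}, index=index)
df_pivot = pd.pivot_table(df, index=df.index.dayofyear,
                          columns=df.index.year, values='Data')

# First day of each month in a reference leap year (assumption: 2012),
# expressed as day-of-year so it matches the pivoted index
month_starts = pd.date_range('2012-01-01', periods=12, freq='MS')
tick_positions = month_starts.dayofyear.tolist()
tick_labels = month_starts.strftime('%b').tolist()

ax = df_pivot.plot()
ax.set_xticks(tick_positions)
ax.set_xticklabels(tick_labels)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
```

Call plt.show() afterwards to display the figure. In non-leap years the ticks from March onwards are off by one day, which is usually not visible at this scale.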
It may be more straightforward to simply add a datetime index to your pivoted dataframe.
df_pivot.index = pd.date_range(
df.index.max() - pd.Timedelta(days=df_pivot.shape[0]),
freq='D', periods=df_pivot.shape[0])
df_pivot.plot()
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()
The resulting plot has the axis as desired:
This method also has the advantage over the accepted answer of working irrespective of your start and end date. For example, if you change your index's end date to end='30-Jun-2018', the axis adapts nicely to fit the data:
Related
I am trying to plot a simple pandas Series object. It looks something like this:
2018-01-01 10
2018-01-02 90
2018-01-03 79
...
2020-01-01 9
2020-01-02 72
2020-01-03 65
It includes only the first month of each year, so it contains only January and the values for each of its days.
When I try to plot it
# suppose the name of the series is dates_and_values
dates_and_values.plot()
It returns a plot like this (made using my current data)
It is clearly plotting by year and then month, so it looks squished and small, since I don't have any months other than January. Is there a way to plot it by year and day so that the days are easier to see?
the x-axis is the index of the dataframe
dates are a continuous series, x-axis is continuous
change the index to strings of values, which means it is no longer continuous and squishes your graph
have generated some sample data that only has January to demonstrate
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
cf = pd.tseries.offsets.CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu Fri Sat",
holidays=[d for d in pd.date_range("01-jan-1990",periods=365*50, freq="D")
if d.month!=1])
d = pd.date_range("01-jan-2015", periods=200, freq=cf)
df = pd.DataFrame({"Values":np.random.randint(20,70,len(d))}, index=d)
fig, ax = plt.subplots(2, figsize=[14,6])
df.set_index(df.index.strftime("%Y %d")).plot(ax=ax[0])
df.plot(ax=ax[1])
I suggest that you convert the series to a dataframe and then pivot it to get one column for each year. This lets you plot the data for each year with a separate line, either in the same plot using different colors or in subplots. Here is an example:
import numpy as np # v 1.19.2
import pandas as pd # v 1.2.3
# Create sample series
rng = np.random.default_rng(seed=123) # random number generator
dt = pd.date_range('2018-01-01', '2020-01-31', freq='D')
dt_jan = dt[dt.month == 1]
series = pd.Series(rng.integers(20, 90, size=dt_jan.size), index=dt_jan)
# Convert series to dataframe and pivot it
df_raw = series.to_frame()
df_pivot = df_raw.pivot_table(index=df_raw.index.day, columns=df_raw.index.year)
df = df_pivot.droplevel(axis=1, level=0)
df.head()
# Plot all years together in different colors
ax = df.plot(figsize=(10,4))
ax.set_xlim(1, 31)
ax.legend(frameon=False, bbox_to_anchor=(1, 0.65))
ax.set_xlabel('January', labelpad=10, size=12)
for spine in ['top', 'right']:
    ax.spines[spine].set_visible(False)
# Plot years separately
axs = df.plot(subplots=True, color='tab:blue', sharey=True,
figsize=(10,8), legend=None)
for ax in axs:
    ax.set_xlim(1, 31)
    ax.grid(axis='x', alpha=0.3)
    handles, labels = ax.get_legend_handles_labels()
    ax.text(28.75, 80, *labels, size=14)
    # ax.is_last_row() was removed in newer matplotlib; ask the subplot spec instead
    if ax.get_subplotspec().is_last_row():
        ax.set_xlabel('January', labelpad=10, size=12)
ax.figure.subplots_adjust(hspace=0)
file_path is an excel file with a column 'Year' of year numbers ranging from 1940 to 2018 and another column 'Divide Year 1976' indicating Pre-1976 or 1976-Present.
# Load excel file as a pandas data_frame
data = pd.read_excel(file_path, sheet_name=5, skiprows=1)
data_frame = pd.DataFrame(data)
# create an extra column in data_frame with bin from 1930 to 2020 with 10 years interval
data_frame['bin Year'] = pd.cut(data_frame.Year, bins=np.arange(1930, 2030, 10, dtype=int))
# Plot stacked bar plot
color_table = pd.crosstab(index=data_frame['bin Year'], columns=data_frame['Divide Year 1976'])
color_table.plot(kind='bar', figsize=(6.5, 3.5), stacked=True, legend=None, edgecolor='black')
# Add xticks
plt.xticks(locs, ['1930s','1940s','1950s','1960s','1970s','1980s','1990s','2000s','2010s'], fontsize=8, rotation=45)
The problem here is that the color_table.plot() call automatically leaves out any interval that has zero counts, which in my case is 1940-1950. How can I force the code to display bars for intervals that have zero counts?
Use parameter dropna in crosstab.
color_table = pd.crosstab(
index=data_frame['bin Year'],
columns=data_frame['Divide Year 1976'],
dropna=False)
See the docs
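To illustrate the effect of dropna, here is a minimal sketch on made-up data (the years below are invented for the example and are not from the Excel file in the question). Because pd.cut returns a categorical, passing dropna=False should keep the empty 1940-1950 bin as a row of zeros.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: no year falls in the (1940, 1950] bin
data_frame = pd.DataFrame({
    'Year': [1935, 1938, 1955, 1968, 1975, 1982, 1991, 2004, 2015],
    'Divide Year 1976': ['Pre-1976'] * 5 + ['1976-Present'] * 4,
})
data_frame['bin Year'] = pd.cut(data_frame['Year'],
                                bins=np.arange(1930, 2030, 10))

# dropna=False keeps categories (bins) that have no observations
kept = pd.crosstab(index=data_frame['bin Year'],
                   columns=data_frame['Divide Year 1976'],
                   dropna=False)
print(kept)
```

Plotting kept with kind='bar' and stacked=True then leaves an empty slot for the 1940s, so the nine decade labels line up with nine bar positions.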
For a simple time series:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'dt':['2020-01-01', '2020-01-02', '2020-01-04', '2020-01-05', '2020-01-06'], 'foo':[1,2, 4,5,6]})
df['dt'] = pd.to_datetime(df.dt)
df['dt_label']= df['dt'].dt.strftime('%Y-%m-%d %a')
df = df.set_index('dt')
#display(df)
df['foo'].plot()
x =plt.xticks(ticks=df.reset_index().dt.values, labels=df.dt_label, rotation=90, horizontalalignment='right')
How can I highlight the x-axis labels for weekends?
edit
Pandas Plots: Separate color for weekends, pretty printing times on x axis
suggests:
def highlight_weekends(ax, timeseries):
    d = timeseries.dt
    ranges = timeseries[d.dayofweek >= 5].groupby(d.year * 100 + d.weekofyear).agg(['min', 'max'])
    for i, tmin, tmax in ranges.itertuples():
        ax.axvspan(tmin, tmax, facecolor='orange', edgecolor='none', alpha=0.1)
but applying it with
highlight_weekends(ax, df.reset_index().dt)
will not change the plot
I've extended your sample data a little so we can make sure that we can highlight more than a single weekend instance.
In this solution I create a column 'weekend', which is a column of bools indicating whether the corresponding date was at a weekend.
We then loop over these values and make a call to ax.axvspan
import pandas as pd
import matplotlib.pyplot as plt
# Add a couple of extra dates to sample data
df = pd.DataFrame({'dt': ['2020-01-01',
'2020-01-02',
'2020-01-04',
'2020-01-05',
'2020-01-06',
'2020-01-07',
'2020-01-09',
'2020-01-10',
'2020-01-11',
'2020-01-12']})
# Fill in corresponding observations
df['foo'] = range(df.shape[0])
df['dt'] = pd.to_datetime(df.dt)
df['dt_label']= df['dt'].dt.strftime('%Y-%m-%d %a')
df = df.set_index('dt')
ax = df['foo'].plot()
plt.xticks(ticks=df.reset_index().dt.values,
labels=df.dt_label,
rotation=90,
horizontalalignment='right')
# Create an extra column which highlights whether or not a date occurs at the weekend
df['weekend'] = df['dt_label'].apply(lambda x: x.endswith(('Sat', 'Sun')))
# Loop over weekend pairs (Saturdays and Sundays), and highlight
for i in range(df['weekend'].sum() // 2):
    ax.axvspan(df[df['weekend']].index[2*i],
               df[df['weekend']].index[2*i+1],
               alpha=0.5)
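Note that the loop above pairs up weekend rows, so it relies on every Saturday in the data being followed by its Sunday. A hedged variant that does not need that assumption is to shade each weekend day on its own, from midnight to the following midnight. The sample dates below are invented for illustration (they include a lone Saturday), and the plot is drawn with plain matplotlib so that the x-axis is in matplotlib date units.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample: 2020-01-04 is a Saturday with no matching Sunday
dt = pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-04', '2020-01-06',
                     '2020-01-10', '2020-01-11', '2020-01-12'])
df = pd.DataFrame({'foo': range(len(dt))}, index=dt)

fig, ax = plt.subplots()
ax.plot(df.index, df['foo'])

# Shade every weekend day present in the data, one span per day
weekend_days = df.index[df.index.dayofweek >= 5]
for day in weekend_days:
    ax.axvspan(day, day + pd.Timedelta(days=1), alpha=0.3)
```

This only highlights weekend days that actually appear in the index; weekend days missing from the data stay unshaded.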
Here is a solution that uses the fill_between plotting function and the x-axis units so that weekends can be highlighted independently from the DatetimeIndex and the frequency of the data.
The x-axis limits are used to compute the range of time covered by the plot in terms of days, which is the unit used for matplotlib dates. Then a weekends mask is computed and passed to the where argument of the fill_between function. The masks are processed as right-exclusive so in this case, they must contain Mondays for the highlights to be drawn up to Mondays 00:00. Because plotting these highlights can alter the x-axis limits when weekends occur near the limits, the x-axis limits are set back to the original values after plotting.
Note that contrary to axvspan, the fill_between function needs the y1 and y2 arguments. For some reason, using the default y-axis limits leaves a small gap between the plot frame and the tops and bottoms of the weekend highlights. This issue is solved by running ax.set_ylim(*ax.get_ylim()) just after creating the plot.
Here is a complete example based on the provided sample code and using an extended dataset similar to the answer provided by jwalton:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
import matplotlib.dates as mdates
# Create sample dataset
dt = pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-04', '2020-01-05',
'2020-01-06', '2020-01-07', '2020-01-09', '2020-01-10',
'2020-01-11', '2020-01-14'])
df = pd.DataFrame(dict(foo=range(len(dt))), index=dt)
# Draw pandas plot: setting x_compat=True converts the pandas x-axis units to
# matplotlib date units. This is not necessary for this particular example but
# it is necessary for all cases where the dataframe contains a continuous
# DatetimeIndex (for example ones created with pd.date_range) that uses a
# frequency other than daily
ax = df['foo'].plot(x_compat=True, figsize=(6,4), ylabel='foo')
ax.set_ylim(*ax.get_ylim()) # reset y limits to display highlights without gaps
# Highlight weekends based on the x-axis units
xmin, xmax = ax.get_xlim()
days = np.arange(np.floor(xmin), np.ceil(xmax)+2) # range of days in date units
weekends = [(dt.weekday()>=5)|(dt.weekday()==0) for dt in mdates.num2date(days)]
ax.fill_between(days, *ax.get_ylim(), where=weekends, facecolor='k', alpha=.1)
ax.set_xlim(xmin, xmax) # set limits back to default values
# Create and format x tick for each data point
plt.xticks(df.index.values, df.index.strftime('%d\n%a'), rotation=0, ha='center')
plt.title('Weekends are highlighted from SAT 00:00 to MON 00:00', pad=15, size=12);
You can find more examples of this solution in the answers I have posted here and here.
Suppose I have a random sample of data collected every minute for a month, and I want to use pandas to analyze it as a function of the time of day and compare weekends with weekdays. With a DatetimeIndex I can do this by calculating the time of day as a decimal value from 0 to 1, manually binning it into 10-minute intervals (or whatever), grouping on the bins column to average over each interval of the day, and then manually setting the tick positions and labels to something understandable.
However, this feels a little bit hacky and I am wondering if there are built-in pandas functions to achieve this same kind of analysis. I haven't been able to find them so far.
import numpy as np
import pandas as pd
dates = pd.date_range(start='2018-10-01', end='2018-11-01', freq='min')
vals = np.random.rand(len(dates))
df = pd.DataFrame(data={'dates': dates, 'vals': vals})
df.set_index('dates', inplace=True)
# set up a column to make the time of day a value from 0 to 1
df['day_fraction'] = (df.index.hour + df.index.minute / 60) / 24
# bin the time of day to analyze data during 10 minute intervals
df['day_bins'] = df['day_fraction'] - df['day_fraction'] % (1 / 24 / 6)
ax = df.plot('day_fraction', 'vals', marker='o', color='pink', alpha=0.05, label='')
df.groupby('day_bins')['vals'].mean().plot(ax=ax, label='average')
df[df.index.weekday < 5].groupby('day_bins')['vals'].mean().plot(ax=ax, label='weekday average')
df[df.index.weekday >= 5].groupby('day_bins')['vals'].mean().plot(ax=ax, label='weekend average')
xlabels = [label if label else 12 for label in [i % 12 for i in range(0, 25, 2)]]
xticks = [i / 24 for i in range(0, 25, 2)]
ax.set_xticks(xticks)
ax.set_xticklabels(xlabels)
ax.set_xlabel('time of day')
ax.legend()
I think you just need to use groupby with a lot of the built in .dt accessors. Group based on weekday or weekend and then form bins every 10 minutes (with .floor) and calculate the mean.
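As a small illustration of the two grouping keys, here is what they look like on three hand-picked timestamps (the timestamps are made up for the example): .floor('10min').time gives the start of each 10-minute bin as a time of day, and np.where on the weekday gives the weekday/weekend label.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps: two on a Monday, one on a Saturday
idx = pd.DatetimeIndex(['2018-10-01 00:07', '2018-10-01 00:13',
                        '2018-10-06 23:59'])

# Start of the 10-minute bin each timestamp falls into, as datetime.time
bins = idx.floor('10min').time

# Monday=0 ... Sunday=6, so values below 5 are weekdays
day_type = np.where(idx.weekday < 5, 'weekday', 'weekend')

for stamp, b, kind in zip(idx, bins, day_type):
    print(stamp, b, kind)
```

Grouping on both keys at once then yields one mean per day type and 10-minute bin.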
Setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dates = pd.date_range(start='2018-10-01', end='2018-11-01', freq='min')
vals = np.random.rand(len(dates))
df = pd.DataFrame(data={'dates': dates, 'vals': vals})
df.set_index('dates', inplace=True)
Plot
df1 = (df.groupby([np.where(df.index.weekday < 5, 'weekday', 'weekend'),
df.index.floor('10min').time])
.mean()
.rename(columns={'vals': 'average'}))
fig, ax = plt.subplots(figsize=(12,7))
df1.unstack(0).plot(ax=ax)
# Plot Full Average
df.groupby(df.index.floor('10min').time).mean().rename(columns={'vals': 'average'}).plot(ax=ax)
plt.show()
I am trying to plot a dataframe of multiple time series in pandas. Each series covers one year of daily points (length 365). The figure comes out fine, but I want to suppress the year tick shown on the x axis.
I want to suppress the 1950 label shown in the left corner of the x axis. Can anybody suggest something? My code:
dates = pandas.date_range('1950-01-01', '1950-12-31', freq='D')
data_to_plot12 = pandas.DataFrame(data=data_array, # values
index=homo_regions) # 1st column as index
dataframe1 = pandas.DataFrame.transpose(data_to_plot12)
dataframe1.index = dates
ax = dataframe1.plot(lw=1.5, marker='.', markersize=2, title='PRECT time series PI Slb Ocn CNTRL 60 years')
ax.set(xlabel="Months", ylabel="PRECT (mm/day)")
fig_name = 'dataframe1.pdf'
plt.savefig(fig_name)
You should be able to specify the xaxis major formatter like so
import matplotlib.dates as mdates
...
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
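For reference, a self-contained sketch of this formatter on made-up data. It assumes the line is drawn with matplotlib's own ax.plot, so that the x-axis uses matplotlib date units; with a pandas .plot() call on a regular daily index, passing x_compat=True may be needed for matplotlib.dates formatters to behave as expected. The random values stand in for the real precipitation data.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for one year of daily data
dates = pd.date_range('1950-01-01', '1950-12-31', freq='D')
values = np.random.default_rng(0).random(len(dates))

fig, ax = plt.subplots()
ax.plot(dates, values, lw=1.5)

# One tick per month, labelled with the abbreviated month name only
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax.set(xlabel='Months', ylabel='PRECT (mm/day)')
```

Because the format string contains only %b, no year appears anywhere on the axis.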