I want to draw a plot in matplotlib that shows the temperature in August 2016 and August 2017. The x-axis is time and the y-axis is temperature. I am trying to overlay the two series (one for 2016, one for 2017) on a shared x-axis that ranges from 2016-08-01 00:00:00 to 2016-08-31 23:00:00 and shows only the day of the month.
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
myFmt = mdates.DateFormatter('%d')
# times series from 2016-08-01 00:00:00 to 2016-08-31 23:00:00
x = stats_august_2016.MESS_DATUM
# temperature in 08.2016
y1 = stats_august_2016.TT_TU
# temperature in 08.2017
y2 = stats_august_2017.TT_TU
f, ax = plt.subplots()
# plot temp in 08.2016
ax.plot(x, y1, 'yellow', label = '2016')
# plot temp in 08.2017
ax.plot(x, y2, 'red', label = '2017')
# format x-axis to show only days of the month
ax.xaxis.set_major_formatter(myFmt)
ax.grid(True)
plt.rcParams["figure.figsize"] = (12, 8)
plt.xlabel("Day of the Month", fontsize = 20, color = 'Blue')
plt.xticks(fontsize = 15)
plt.ylabel(r"Temperature ($\degree$C)", fontsize = 20, color = 'Blue')
plt.yticks(fontsize = 15)
ax.set_ylim(5, 35)
plt.title("Temperature in August 2016 and 2017", fontsize = 30, color = 'DarkBlue')
plt.legend(prop = {'size': 20}, frameon = True, fancybox = True, shadow = True, framealpha = 1, bbox_to_anchor=(1.22, 1))
plt.show()
Everything looks fine except the last tick of x-axis is somehow 2016-09-01 00:00:00. And the result looks odd with the 1 at the end.
How can I fix this?
The problem is that your data extends until late on the 31st of August of each year:
# times series from 2016-08-01 00:00:00 to 2016-08-31 23:00:00
Matplotlib then autoscales the axis so that it reaches the first day of the next month, which your chosen format displays as a 1. If you want to avoid this, you can set the x limits of the axis to the first and last timestamps of your data (assuming x is a pandas Series, positional indexing with .iloc is the safe way to get them):
ax.set_xlim([x.iloc[0], x.iloc[-1]])
The whitespace margin to the left and right of the data will then disappear, though. If you want to keep this margin and still avoid the tick label for the 1st of September, you can hide the last x-tick label with
xticks = ax.xaxis.get_major_ticks()
xticks[-1].label1.set_visible(False)
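To try the first option without the original dataframes, here is a minimal sketch. The timestamps and temperatures are made up (a plain NumPy array stands in for the MESS_DATUM and TT_TU columns), and the Agg backend is only chosen so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np

# Hypothetical stand-in data: hourly timestamps for August 2016
x = np.arange("2016-08-01T00", "2016-09-01T00", dtype="datetime64[h]")
hours = np.arange(x.size)
y1 = 20 + 5 * np.sin(2 * np.pi * hours / 24)  # made-up "2016" temperatures
y2 = 22 + 4 * np.sin(2 * np.pi * hours / 24)  # made-up "2017" temperatures

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(x, y1, "gold", label="2016")
ax.plot(x, y2, "red", label="2017")
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d"))

# Clamp the axis to the first and last timestamps (this removes the margins)
ax.set_xlim(x[0], x[-1])

ax.legend()
fig.canvas.draw()  # use plt.show() instead with an interactive backend
```

With the limits clamped like this, the view ends at 2016-08-31 23:00, so there is no room left for a tick at 1 September.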
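Try:
ax.set_xlim(right=pd.Timestamp("2016-08-30 00:00:00"))
This sets the right-hand limit of the x-axis to the start of the 30th, so data after that point is not shown (it assumes pandas is imported as pd).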
Hi, I want to stack a time series by year.
https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2021JB022650
This is the paper I read. In Fig. 5 they show annual stacks (the paper describes it as "Each subplot of Figure 5 includes the annual stacks of normalized data").
I have a time series like the one below covering two years and want to do the same in Python.
2011-01-01 0.034
2011-01-02 -0.234
...
2012-12-30 0.363
2012-12-31 0.092
So I think I have to split the time series into the 2011 and 2012 parts and stack the two. However, I could not figure out how to stack time series.
What code should I use for annual stacking?
You want to stack timeseries data by year for a given number of years. To stack your data together, you can use matplotlib and repeatedly plot each year of data onto a particular plot/subplot.
To stack annual data together, there's also the question of how to treat leap days. The following code treats the leap day, Feb 29, as a value that always appears on the x-axis, so non-leap years are treated as not having a datapoint on that day.
I've also tried approximating the awesome layout of the graphs shown in your picture.
import matplotlib.pyplot as plt
import datetime
from calendar import isleap
import random
# Get day number (counted from start of year) for any datetime obj.
# Day numbers go all the way to 366 per year (counts leap day Feb 29).
def daysFromYearStart(dt):
    td = dt - datetime.datetime(dt.year,1,1)
    return td.days+2 if not isleap(dt.year) and td.days > 58 else td.days+1
t1 = datetime.datetime(2000,1,1)
t2 = datetime.datetime(2005,12,31)
tdelta = t2 - t1
# Days from t1 to t2 as datetime objs.
dates = [datetime.datetime(t1.year, 1, 1) + datetime.timedelta(days=k) for k in range(tdelta.days + 1)]
# Integer day numbers to plot as x-values.
x = list(map(daysFromYearStart, dates))
# Index positions of year starts + year end.
idx = [i for i,v in enumerate(x) if v==1] + [len(dates)]
# Random numeric y values.
y = list(map(lambda x: x+400*random.random(), range(tdelta.days + 1)))
fig, ax = plt.subplots(1,1)
color_cycler = ['green','blue','red','orange','purple','brown']
# This stacks lines together on each plot.
for k in range(len(idx) - 1):
    ax.plot(x[idx[k]:idx[k+1]], y[idx[k]:idx[k+1]], color=color_cycler[k], marker='_')
# Add a legend outside of the plot.
ax.legend([f'Year {k}' for k in range(t1.year,t2.year + 1)], bbox_to_anchor=(1.02, 1), loc='upper left')
# Set title and axis labels.
ax.set_title('Stacked Timeseries Data')
ax.set_xlabel('Months')
ax.set_ylabel('Data to be normalized')
# Set grid lines.
ax.grid(visible=True, which='major', axis='both', alpha=0.5)
# Set x-axis major and minor ticks and labels.
ax.set_xticks([1, 92, 183, 275], labels=['Jan','Apr','Jul','Oct'])
ax.set_xticks([1, 32, 61, 92, 122, 153, 183, 214, 245, 275, 306, 336, 366], minor=True)
# Set ticks to also display on the top and right sides of plot.
ax.xaxis.set_ticks_position('both')
ax.yaxis.set_ticks_position('both')
# Set ticks to face inward in plot.
ax.tick_params(axis='both', direction='in', length=10)
ax.tick_params(axis='both', which='minor', direction='in', length=5)
# Rotate xlabels.
ax.set_xticklabels(ax.get_xticklabels(), rotation=30, ha="left")
# Display properly and show plot.
fig.tight_layout()
plt.show()
Here's the output:
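If the data is already in pandas, the split-and-stack step can also be sketched with a pivot. This is an alternative to the loop above, not what the answer's code does: the series is random stand-in data for 2011 and 2012, and the names series and stacked are made up for illustration. It also uses the plain day of year, so dates after Feb 28 are shifted by one day between leap and non-leap years rather than being aligned as in daysFromYearStart.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series covering 2011 and 2012 (random stand-in values)
idx = pd.date_range("2011-01-01", "2012-12-31", freq="D")
rng = np.random.default_rng(0)
series = pd.Series(rng.normal(size=idx.size), index=idx)

# One row per day of year, one column per year
stacked = pd.DataFrame({
    "year": series.index.year,
    "doy": series.index.dayofyear,
    "value": series.to_numpy(),
}).pivot(index="doy", columns="year", values="value")

# Average across years for each day of year (NaN entries are skipped)
annual_mean = stacked.mean(axis=1)
```

Calling stacked.plot() would then draw one line per year on a shared day-of-year axis. The 2011 column has a single NaN at day 366 because 2011 is not a leap year.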
I am trying to plot a simple pandas Series object. It looks something like this:
2018-01-01 10
2018-01-02 90
2018-01-03 79
...
2020-01-01 9
2020-01-02 72
2020-01-03 65
It includes only the first month of each year, so it only contains the month January and all its values through the days.
When I try to plot it
# suppose the name of the series is dates_and_values
dates_and_values.plot()
It returns a plot like this (made using my current data)
It is clearly plotting by year and then by month, so it looks squished and small because I don't have any months other than January. Is there a way to plot it by year and day so that the days are easier to see?
the x-axis is the index of the dataframe
dates are a continuous series, so the x-axis is continuous
changing the index to strings means it is no longer continuous, so the gaps between the Januaries no longer squish your graph
I have generated some sample data that only has January to demonstrate
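import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Custom "business day" calendar that treats every day outside January as a holiday
cf = pd.tseries.offsets.CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu Fri Sat",
                                          holidays=[d for d in pd.date_range("01-jan-1990", periods=365*50, freq="D")
                                                    if d.month != 1])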
d = pd.date_range("01-jan-2015", periods=200, freq=cf)
df = pd.DataFrame({"Values":np.random.randint(20,70,len(d))}, index=d)
fig, ax = plt.subplots(2, figsize=[14,6])
df.set_index(df.index.strftime("%Y %d")).plot(ax=ax[0])
df.plot(ax=ax[1])
I suggest that you convert the series to a dataframe and then pivot it to get one column for each year. This lets you plot the data for each year with a separate line, either in the same plot using different colors or in subplots. Here is an example:
import numpy as np # v 1.19.2
import pandas as pd # v 1.2.3
# Create sample series
rng = np.random.default_rng(seed=123) # random number generator
dt = pd.date_range('2018-01-01', '2020-01-31', freq='D')
dt_jan = dt[dt.month == 1]
series = pd.Series(rng.integers(20, 90, size=dt_jan.size), index=dt_jan)
# Convert series to dataframe and pivot it
df_raw = series.to_frame()
df_pivot = df_raw.pivot_table(index=df_raw.index.day, columns=df_raw.index.year)
df = df_pivot.droplevel(axis=1, level=0)
df.head()
# Plot all years together in different colors
ax = df.plot(figsize=(10,4))
ax.set_xlim(1, 31)
ax.legend(frameon=False, bbox_to_anchor=(1, 0.65))
ax.set_xlabel('January', labelpad=10, size=12)
for spine in ['top', 'right']:
    ax.spines[spine].set_visible(False)
# Plot years separately
axs = df.plot(subplots=True, color='tab:blue', sharey=True,
              figsize=(10,8), legend=None)
for ax in axs:
    ax.set_xlim(1, 31)
    ax.grid(axis='x', alpha=0.3)
    handles, labels = ax.get_legend_handles_labels()
    ax.text(28.75, 80, *labels, size=14)
    # ax.is_last_row() was removed in Matplotlib 3.6; query the SubplotSpec instead
    if ax.get_subplotspec().is_last_row():
        ax.set_xlabel('January', labelpad=10, size=12)
ax.figure.subplots_adjust(hspace=0)
I was able to run the "mpl_finance" candlestick_ohlc function and the graph appeared as expected, using the following (only relevant) code:
mondays = WeekdayLocator(MONDAY) # major ticks on the mondays
alldays = DayLocator() # minor ticks on the days
weekFormatter = DateFormatter('%b %d') # e.g., Jan 12
dayFormatter = DateFormatter('%d') # e.g., 12
fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)
ax.xaxis.set_major_locator(mondays)
ax.xaxis.set_minor_locator(alldays)
ax.xaxis.set_major_formatter(weekFormatter)
candlestick_ohlc(ax, zip(mdates.date2num(quotes.index.to_pydatetime()),
                         quotes['open'], quotes['high'],
                         quotes['low'], quotes['close']),
                 width=0.6)
ax.xaxis_date()
ax.autoscale_view()
plt.setp(plt.gca().get_xticklabels(), rotation=45, horizontalalignment='right')
plt.title('PETR4 daily quotes')
plt.show()
Now I would like to "add" on this graph (say) a horizontal red line at y = 26.5 ... how should I proceed?
(My real question is: how/where should I type something like axvline(...) so that I am able to make new data appear inside the same graph?)
Thanks!
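In general, anything drawn on the same Axes object before plt.show() ends up in the same graph. For a horizontal line the method is axhline (axvline draws a vertical one). Below is a minimal sketch with made-up prices in place of the candlestick data, so the y-values and the Agg backend are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up closing prices standing in for the candlestick chart
days = [1, 2, 3, 4, 5]
close = [25.8, 26.9, 26.2, 27.1, 26.4]

fig, ax = plt.subplots()
ax.plot(days, close, marker="o")

# Draw on the same Axes, after the main plot and before showing the figure
line = ax.axhline(y=26.5, color="red", linewidth=1)

fig.canvas.draw()  # use plt.show() instead with an interactive backend
```

In the candlestick code above, the equivalent call would go right after candlestick_ohlc(...), using the same ax.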
Sure, DavidG. Thanks again for your help. Hope to see you in other posts.
The interested readers will be able to adapt this "real stuff" below (it's working)!
mondays = WeekdayLocator(MONDAY) # major ticks on the mondays
alldays = DayLocator() # minor ticks on the days
weekFormatter = DateFormatter('%b %d') # e.g., Jan 12
dayFormatter = DateFormatter('%d') # e.g., 12
fig, aux = plt.subplots()
fig.subplots_adjust(bottom=0.2)
aux.xaxis.set_major_locator(mondays)
aux.xaxis.set_minor_locator(alldays)
aux.xaxis.set_major_formatter(weekFormatter)
candlestick_ohlc(aux, zip(mdates.date2num(quotes.index.to_pydatetime()),
                          quotes['open'], quotes['high'],
                          quotes['low'], quotes['close']),
                 width=0.6)
for i in range(len(features_period.date)):
    plt.plot(quotes.index, quotes.close , 'd', color='blue')
aux.xaxis_date()
aux.autoscale_view()
plt.setp(plt.gca().get_xticklabels(), rotation=45, horizontalalignment='right')
plt.title('USIM5 daily quotes')
plt.rcParams['figure.figsize'] = [10, 10]
display(candlestick_ohlc);
(The blue dots were added to the graph created by the module used/mentioned.)
Regards,
fskilnik
I have a Pandas dataframe that contains columns representing year, month within year and a binary outcome (0/1). I want to plot a column of barcharts with one barchart per year. I've used the subplots() function in matplotlib.pyplot with sharex = True and sharey = True. The graphs look fine except the padding between the y-ticks and the y-tick labels is different on the final (bottom) graph.
An example dataframe can be created as follows:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Generate dataframe containing dates over several years and a random binary outcome
tempDF = pd.DataFrame()
tempDF['date'] = pd.date_range(start = pd.to_datetime('2014-01-01'),end = pd.to_datetime('2017-12-31'))
tempDF['case'] = np.random.choice([0,1],size = [len(tempDF.index)],p = [0.9,0.1])
# Create a dataframe that summarises proportion of cases per calendar month
tempGroupbyCalendarMonthDF = tempDF.groupby([tempDF['date'].dt.year,tempDF['date'].dt.month]).agg({'case': sum,
'date': 'count'})
tempGroupbyCalendarMonthDF.index.names = ['year','month']
tempGroupbyCalendarMonthDF = tempGroupbyCalendarMonthDF.reset_index()
# Rename columns to something more meaningful
tempGroupbyCalendarMonthDF = tempGroupbyCalendarMonthDF.rename(columns = {'case': 'numberCases',
'date': 'monthlyTotal'})
# Calculate percentage positive cases per month
tempGroupbyCalendarMonthDF['percentCases'] = (tempGroupbyCalendarMonthDF['numberCases']/tempGroupbyCalendarMonthDF['monthlyTotal'])*100
The final dataframe looks something like:
year month monthlyTotal numberCases percentCases
0 2014 1 31 5 16.129032
1 2014 2 28 5 17.857143
2 2014 3 31 3 9.677419
3 2014 4 30 1 3.333333
4 2014 5 31 4 12.903226
.. ... ... ... ... ...
43 2017 8 31 2 6.451613
44 2017 9 30 2 6.666667
45 2017 10 31 3 9.677419
46 2017 11 30 2 6.666667
47 2017 12 31 1 3.225806
Then the plots are produced as shown below. The subplots() function is used to return an array of axes. The code steps through each axis and plots the values. The x-axis ticks and labels are only displayed on the final (bottom) plot. Finally, to get a common y-axis label, an additional subplot is added that covers all the bar graphs, but none of its axes and labels (except the y-axis label) are displayed.
# Calculate minimum and maximum years in dataset
minYear = tempDF['date'].min().year
maxYear = tempDF['date'].max().year
# Set a few parameters
barWidth = 0.80
labelPositionX = 0.872
labelPositionY = 0.60
numberSubplots = maxYear - minYear + 1
fig, axArr = plt.subplots(numberSubplots,figsize = [8,10],sharex = True,sharey = True)
# Keep track of which year to plot, starting with first year in dataset.
currYear = minYear
# Step through each subplot
for ax in axArr:
    # Plot the data
    rects = ax.bar(tempGroupbyCalendarMonthDF.loc[tempGroupbyCalendarMonthDF['year'] == currYear,'month'],
                   tempGroupbyCalendarMonthDF.loc[tempGroupbyCalendarMonthDF['year'] == currYear,'percentCases'],
                   width = barWidth)
    # Format the axes
    ax.set_xlim([0.8,13])
    ax.set_ylim([0,40])
    ax.grid(True)
    ax.tick_params(axis = 'both',
                   left = 'on',
                   bottom = 'off',
                   top = 'off',
                   right = 'off',
                   direction = 'out',
                   length = 4,
                   width = 2,
                   labelsize = 14)
    # Turn on the x-axis ticks and labels for final plot only
    if currYear == maxYear:
        ax.tick_params(bottom = 'on')
        xtickPositions = [1,2,3,4,5,6,7,8,9,10,11,12]
        ax.set_xticks([x + barWidth/2 for x in xtickPositions])
        ax.set_xticklabels(['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])
    ytickPositions = [10,20,30]
    ax.set_yticks(ytickPositions)
    # Add label (in this case, the year) to each subplot
    # (The transform = ax.transAxes makes positions relative to current axis.)
    ax.text(labelPositionX, labelPositionY,currYear,
            horizontalalignment = 'center',
            verticalalignment = 'center',
            transform = ax.transAxes,
            family = 'sans-serif',
            fontweight = 'bold',
            fontsize = 18,
            color = 'gray')
    # Onto the next year...
    currYear = currYear + 1
# Fine-tune overall figure
# ========================
# Make subplots close to each other.
fig.subplots_adjust(hspace=0)
# To display a common y-axis label, create a large axis that covers all the subplots
fig.add_subplot(111, frameon=False)
# Hide ticks and tick labels of the big axes
plt.tick_params(labelcolor='none', top='off', bottom='off', left='off', right='off')
# Add y label that spans several subplots
plt.ylabel("Percentage cases (%)", fontsize = 16, labelpad = 20)
plt.show()
The figure that is produced is almost exactly what I want but the y-axis tick labels on the bottom plot are set further from the axis compared with all the other plots. If the number of plots produced is altered (by using a wider range of dates), the same pattern occurs, namely, it's only the final plot that appears different.
I'm almost certainly not seeing the wood for the trees but can anyone spot what I've done wrong?
EDIT
The above code was originally run on Matplotlib 1.4.3 (see comment by ImportanceOfBeingErnest). However, after updating to Matplotlib 2.0.2 the code failed to run (KeyError: 0). The reason appears to be that the default in Matplotlib 2.x is for bars to be center-aligned. To get the above code to run, either adjust the x-axis range and tick positions so that the bars don't extend beyond the y-axis, or set align='edge' in the plotting function, i.e.:
rects = ax.bar(tempGroupbyCalendarMonthDF.loc[tempGroupbyCalendarMonthDF['year'] == currYear,'month'],
tempGroupbyCalendarMonthDF.loc[tempGroupbyCalendarMonthDF['year'] == currYear,'percentCases'],
width = barWidth,
align = 'edge')
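The difference between the two alignments can be checked directly on the bar patches. This is a small sketch with made-up heights (the x positions and the width of 0.8 mirror the values above, and the Agg backend is only there so it runs headless).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

months = [1, 2, 3]
values = [5, 3, 4]   # made-up bar heights
barWidth = 0.80

fig, (ax_center, ax_edge) = plt.subplots(2)
bars_center = ax_center.bar(months, values, width=barWidth)               # default since Matplotlib 2.0
bars_edge = ax_edge.bar(months, values, width=barWidth, align="edge")     # pre-2.0 behaviour

# Left edge of the first bar in each case
left_center = bars_center[0].get_x()  # x - width/2
left_edge = bars_edge[0].get_x()      # x itself
```

With center alignment the January bar starts at 0.6, which is to the left of the x-limit of 0.8 used above. With align='edge' it starts at 1.0.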
I would like to plot a graph with the below sample, X-axis: 'Time', Y-axis: 'Celcius'. With the attached code, I got [09:00:00.000000 09:05:00.000000 ... 09:30:00.000000] at the x-axis, instead of [2013-01-02 09:00 2013-01-02 09:05 ... 2013-01-02 09:30].
Does anyone know what the correct way to format x-axis to the designated format is?
data = {'Celcius': [36.906441135554658, 51.286294403017202], 'Time': [datetime.datetime(2013, 1, 2, 9, 0), datetime.datetime(2013, 1, 2, 9, 30)]}
def plotTemperature(self, data):
    logging.debug(data)
    t = data.get('Time')
    T = data.get('Celcius')
    years = mdates.YearLocator() # every year
    months = mdates.MonthLocator() # every month
    days = mdates.DayLocator() # every day
    hours = mdates.HourLocator() # every hour
    minutes = mdates.MinuteLocator() # every minute
    yearsFmt = mdates.DateFormatter('%Y')
    hoursFmt = mdates.DateFormatter('%H')
    fig, ax = plt.subplots()
    ax.plot(t, T)
    # format the ticks
    # ax.xaxis.set_major_locator(hours)
    # ax.xaxis.set_major_formatter(hoursFmt)
    # ax.xaxis.set_minor_locator(minutes)
    datemin = datetime.datetime(min(t).year, min(t).month, min(t).day, min(t).hour, min(t).minute)
    datemax = datetime.datetime(max(t).year, max(t).month, max(t).day, max(t).hour, max(t).minute)
    ax.set_xlim(datemin, datemax)
    # format the coords message box
    def temperature(x): return '$%1.2f'%x
    ax.format_xdata = mdates.DateFormatter('%Y-%m-%d %H:%M')
    ax.format_ydata = temperature
    ax.grid(True)
    # rotates and right aligns the x labels, and moves the bottom of the
    # axes up to make room for them
    fig.autofmt_xdate()
    plt.show()
Add this:
import dateutil.rrule
import matplotlib.dates as mdates
ymdhFmt = mdates.DateFormatter('%Y-%m-%d %H:%M')
rule = mdates.rrulewrapper(dateutil.rrule.MINUTELY, interval=30)
loc = mdates.RRuleLocator(rule)
ax.xaxis.set_major_locator(loc)
ax.xaxis.set_major_formatter(ymdhFmt)
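For a standalone version, here is a sketch using the two sample points from the question. It swaps the RRuleLocator for mdates.MinuteLocator with a 5-minute step, which is my substitution to get the 09:00, 09:05, ... ticks asked for. The rounded temperatures and the Agg backend are also just for illustration.

```python
import datetime
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# The two sample points from the question (temperatures rounded)
data = {"Celcius": [36.9, 51.3],
        "Time": [datetime.datetime(2013, 1, 2, 9, 0),
                 datetime.datetime(2013, 1, 2, 9, 30)]}

fig, ax = plt.subplots()
ax.plot(data["Time"], data["Celcius"])
ax.set_xlim(data["Time"][0], data["Time"][-1])

# Major ticks every 5 minutes, labelled with the full date and time
ymdhFmt = mdates.DateFormatter("%Y-%m-%d %H:%M")
ax.xaxis.set_major_locator(mdates.MinuteLocator(byminute=range(0, 60, 5)))
ax.xaxis.set_major_formatter(ymdhFmt)

fig.autofmt_xdate()  # rotate the long labels so they do not overlap
fig.canvas.draw()    # use plt.show() instead with an interactive backend

# What the formatter produces for the first timestamp
first_label = ymdhFmt(mdates.date2num(data["Time"][0]))
```

The key point is the same as in the snippet above: the tick labels come from the formatter set with set_major_formatter, while format_xdata only affects the coordinate readout in the interactive window.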
This answer was posted as an edit to the question matplotlib Plot X-axis with '%Y-%m-%d %H:%M' by the OP twfx under CC BY-SA 3.0.