How to add monthly labels to x-axis using matplotlib? - python

For an assignment I need to plot record (min and max) temperatures over the period 2004-2014 using matplotlib. The figure is almost complete (see below) except for the x axis labelling. When plotting, I did not specify the x-axis value so it generated integers from 0-365, thus the number of days in a year. Now I want the months to appear as x-axis labels instead of integers (Jan, Feb, etc.). Can someone help me out?
Record low and high temperatures:

I generated source data as follows:
import numpy as np
import pandas as pd
np.random.seed(13)
dates = pd.date_range(start='2014-01-01', end='2014-12-31')
temp = pd.DataFrame({'tMin': np.random.normal(0, 0.5, dates.size).cumsum() - 10,
                     'tMax': np.random.normal(0, 0.5, dates.size).cumsum() + 10}, index=dates)
To get the picture with month labels, try the following code:
# Imports
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Drawing
fig, ax = plt.subplots(figsize=(10, 4))
plt.xlabel('Month')
plt.ylabel('Temp')
plt.title('Temperatures 2014')
ax.xaxis.set_major_locator(mdates.MonthLocator())
fmt = mdates.DateFormatter('%b %Y')
ax.xaxis.set_major_formatter(fmt)
ax.plot(temp.tMin)
ax.plot(temp.tMax)
ax.fill_between(temp.index, temp.tMin, temp.tMax, color='#A0E0A0', alpha=0.2)
plt.setp(ax.get_xticklabels(), rotation=30);
For the above source data I got the following picture:
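If the data stay plotted against the integer day index, as in the question, the month labels can also be placed by hand. The following is a minimal sketch under that assumption; the helper name month_tick_positions is made up for illustration and is not part of matplotlib.

```python
import calendar
from datetime import date

def month_tick_positions(year):
    """Return (positions, labels) for the first day of each month.

    Positions are zero-based day-of-year indices, matching an x-axis
    that matplotlib generated as 0, 1, 2, ... because no x values were given.
    """
    first_of_year = date(year, 1, 1)
    positions = [(date(year, m, 1) - first_of_year).days for m in range(1, 13)]
    labels = [calendar.month_abbr[m] for m in range(1, 13)]
    return positions, labels

positions, labels = month_tick_positions(2014)
```

The two lists can then be passed to ax.set_xticks(positions) and ax.set_xticklabels(labels). Note that the positions shift by one day from March onwards in a leap year, so the year should match the data.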

Related

How to set xlim in seaborn barplot?

I have created a barplot for given days of the year and the number of people born on this given day (figure a). I want to set the x-axes in my seaborn barplot to xlim = (0,365) to show the whole year.
But, once I use ax.set_xlim(0,365) the bar plot is simply moved to the left (figure b).
This is the code:
#data
df = pd.DataFrame()
df['day'] = np.arange(41,200)
df['born'] = np.random.randn(159)*100
#plot
f, axes = plt.subplots(4, 4, figsize = (12,12))
ax = sns.barplot(df.day, df.born, data = df, hue = df.time, ax = axes[0,0], color = 'skyblue')
ax.get_xaxis().set_label_text('')
ax.set_xticklabels('')
ax.set_yscale('log')
ax.set_ylim(0,10e3)
ax.set_xlim(0,366)
ax.set_title('SE Africa')
How can I set the x-axes limits to day 0 and 365 without the bars being shifted to the left?
IIUC, the expected output given the nature of data is difficult to obtain straightforwardly, because, as per the documentation of seaborn.barplot:
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
This means the function seaborn.barplot creates categories based on the data in x (here, df.day) and they are linked to integers, starting from 0.
Therefore, even though the data start at day 41, seaborn places the first category at x = 0, which makes it difficult to adjust the lower limit of the x-axis after the function call.
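The mapping can be imitated in plain Python. This is only an illustration of the behaviour described in the documentation, not seaborn's actual implementation; the name position_of is an assumption made for the sketch.

```python
# Day numbers as in the question: 41, 42, ..., 199
days = list(range(41, 200))

# One ordinal slot per distinct value, starting from 0,
# regardless of the numeric value of the day itself
position_of = {day: pos for pos, day in enumerate(sorted(set(days)))}

first_slot = position_of[41]   # slot of the first day in the data
last_slot = position_of[199]   # slot of the last day in the data
```

Day 41 lands in slot 0 and day 199 in slot 158, so with an x-axis limit of (0, 365) all bars sit in the left part of the axis.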
The following code and the corresponding plot illustrate what I explained above:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# data
rng = np.random.default_rng(101)
day = np.arange(41,200)
born = rng.integers(low=0, high=10e4, size=200-41)
df = pd.DataFrame({"day":day, "born":born})
# plot
f, ax = plt.subplots(figsize=(4, 4))
sns.barplot(data=df, x='day', y='born', ax=ax, color='b')
ax.set_xlim(0,365)
ax.set_xticks(ticks=np.arange(0, 365, 30), labels=np.arange(0, 365, 30))
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()
I suggest using matplotlib.axes.Axes.bar to overcome this issue, although coloring the bars by group is less straightforward than with sns.barplot(..., hue=..., ...):
# plot
f, ax = plt.subplots(figsize=(4, 4))
ax.bar(x=df.day, height=df.born) # instead of sns.barplot
ax.get_xaxis().set_label_text('')
ax.set_xlim(0,365)
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()
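One possible way to get hue-like coloring with Axes.bar is to build a list with one color per bar. The sketch below assumes a hypothetical list of group labels (the sample data above has no such column); the helper name colors_for_groups and the palette are illustrative choices.

```python
def colors_for_groups(groups, palette=("tab:blue", "tab:orange", "tab:green")):
    """Map each group label to a palette color, in order of first appearance."""
    assigned = {}
    for group in groups:
        if group not in assigned:
            # Cycle through the palette if there are more groups than colors
            assigned[group] = palette[len(assigned) % len(palette)]
    return [assigned[group] for group in groups]

# Hypothetical group label for each bar
groups = ["a", "a", "b", "a", "c"]
colors = colors_for_groups(groups)
```

The resulting list has one entry per bar and can be passed as the color argument, for example ax.bar(x=df.day, height=df.born, color=colors), provided it has the same length as the data.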

Flatten broken horizontal bar chart to line graph or heatmap

I have data for all the time I've spent coding. This data is represented as a dictionary where the key is the date and the value is a list of tuples containing the time I started a coding session and how long the coding session lasted.
I have successfully plotted this on a broken_barh using the below code, where the y-axis is the date, the x-axis is the time in that day and each broken bar is an individual session.
for i,subSessions in enumerate(sessions.values()):
    plt.broken_barh(subSessions, (i,1))
months = {}
start = getStartMonth()
for month in period_range(start=start,end=datetime.today(),freq="M"):
    month = str(month)
    months[month] = (datetime.strptime(month,'%Y-%m')-start).days
plt.yticks(list(months.values()),months.keys())
plt.xticks(range(0,24*3600,3600),[str(i)+":00" for i in range(24)],rotation=45)
plt.gca().invert_yaxis()
plt.show()
I want to use this data to discover what times of the day I spend the most time coding, but it isn't very clear from the above chart so I'd like to display it as a line graph or heatmap where the y-axis is the number of days I spent coding at the time on the x-axis (or, in other words, how many sessions are present in that column of the above chart). How do I accomplish this?
You can find some great examples of how to create a heatmap on the matplotlib website.
Here is a basic code with some random data:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
index_labels = np.arange(0,24)
column_labels = pd.date_range(start='1/1/2022', end='1/31/2022').strftime('%m/%d')
#random data
np.random.seed(12345)
data = np.random.randint(0,60, size=(len(index_labels), len(column_labels)))
df = pd.DataFrame(data=data, columns=column_labels, index=index_labels)
#heatmap function
def heatmap(df, ax, cbarlabel="", cmap="Greens", label_num_dec_place=0):
    df = df.copy()
    # Plotting a blank heatmap
    im = ax.imshow(df.values, cmap)
    # create a customized colorbar
    cbar = ax.figure.colorbar(im, ax=ax, fraction=0.05, extend='both', extendfrac=0.05)
    cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom", fontsize=14)
    # Setting ticks
    ax.set_xticks(np.arange(df.shape[1]), labels=df.columns, fontsize=12)
    ax.set_yticks(np.arange(df.shape[0]), labels=list(df.index), fontsize=12)
    # proper placement of ticks
    ax.tick_params(axis='x', top=True, bottom=False,
                   labeltop=True, labelbottom=False)
    ax.spines[:].set_visible(False)
    ax.grid(which="both", visible="False", color="white", linestyle='solid', linewidth=2)
    ax.grid(False)
    # Rotation of tick labels
    plt.setp(ax.get_xticklabels(), rotation=-60,
             ha="right", rotation_mode=None)
    plt.setp(ax.get_yticklabels(), rotation=30)
#plotting and saving
fig, ax = plt.subplots(facecolor=(1,1,1), figsize=(20,8), dpi=200)
heatmap(df=df, ax=ax, cbarlabel="time (min)", cmap="Greens", label_num_dec_place=0)
plt.savefig('time_heatmap.png',
bbox_inches='tight',
facecolor=fig.get_facecolor(),
transparent=True,
)
Output:
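The example above uses random numbers. To feed it the real data, the sessions dictionary has to be turned into an hour-by-date table first. The following is a rough sketch, assuming each session is a (start, length) tuple in seconds since midnight, as the broken_barh call in the question suggests; the helper name minutes_per_hour and the sample dictionary are made up for illustration.

```python
SECONDS_IN_HOUR = 3600

def minutes_per_hour(day_sessions):
    """Return 24 values: minutes of coding that fall into each hour of one day."""
    minutes = [0.0] * 24
    for start, length in day_sessions:
        # Clip sessions that run past midnight to the end of the day
        end = min(start + length, 24 * SECONDS_IN_HOUR)
        for hour in range(24):
            lo = max(start, hour * SECONDS_IN_HOUR)
            hi = min(end, (hour + 1) * SECONDS_IN_HOUR)
            if hi > lo:
                minutes[hour] += (hi - lo) / 60
    return minutes

# Hypothetical sample: date -> list of (start second, length in seconds)
sessions = {"2022-01-01": [(3000, 1200)], "2022-01-02": [(7200, 5400)]}
table = {day: minutes_per_hour(day_sessions) for day, day_sessions in sessions.items()}
```

Passing table to pd.DataFrame gives 24 rows (hours) and one column per date, which is the same layout as the df used in the heatmap example.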
One way to do it is to use sampling. Choose how many samples to take per day (the precision, for example 288 samples per day), split the day into that many intervals, and count how many sessions overlap each interval. The downside is that it can't be 100% precise, and increasing the precision increases the time it takes to generate (for me, it takes several minutes to generate a second-precise image, though this level of precision makes little to no difference to the result).
Here is some code which can produce both a heatmap and a line graph
# Configuration options
precisionPerDay = 288
timeTicksPerDay = 24
timeTickRotation = 60
timeTickFontSize = 6
heatmap = True
# Constants
hoursInDay = 24
secondsInHour = 3600
secondsInDay = hoursInDay*secondsInHour
xInterval = secondsInDay/precisionPerDay
timeTickSecondInterval = precisionPerDay/timeTicksPerDay
timeTickHourInterval = hoursInDay/timeTicksPerDay
# Calculating x-axis (time) ticks
xAxis = range(precisionPerDay)
timeTickLabels = []
timeTickLocations = []
for timeTick in range(timeTicksPerDay):
    timeTickLocations.append(int(timeTick*timeTickSecondInterval))
    hours = timeTick/timeTicksPerDay*hoursInDay
    hour = int(hours)
    minute = int((hours-hour)*60)
    timeTickLabels.append(f"{hour:02d}:{minute:02d}")
# Calculating y-axis (height)
heights = []
for dayX in xAxis:
    rangeStart = dayX*xInterval
    rangeEnd = rangeStart+xInterval
    y = 0
    # daySessions (not sessions) so the dictionary is not overwritten by the loop
    for date,daySessions in sessions.items():
        for session in daySessions:
            if session[0] < rangeEnd and session[0]+session[1] > rangeStart:
                y += 1
    heights.append(y)
# Plotting data
if heatmap:
    plt.yticks([])
    plt.imshow([heights], aspect="auto")
else:
    plt.plot(xAxis,heights)
    plt.ylim(ymin=0)
    plt.xlim(xmin=0,xmax=len(heights))
plt.xlabel("Time of day")
plt.ylabel("How often I've coded at that time")
plt.xticks(timeTickLocations,timeTickLabels,
fontsize=timeTickFontSize,rotation=timeTickRotation)
plt.show()
And here are some sample results
Graph produced by same configuration options shown in above code
Same data but as a line graph with a lower precision (24 per day) and more time ticks (48)
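If second-level precision is wanted without the long run time, one alternative to testing every session against every sample is a difference array: add 1 where a session starts, subtract 1 where it ends, and take the running sum. This is a sketch of that swapped-in technique, not the method used above. It assumes sessions are (start, length) tuples in whole seconds since midnight; the function name sessions_per_second and the sample data are illustrative.

```python
from itertools import accumulate

SECONDS_IN_DAY = 24 * 3600

def sessions_per_second(sessions):
    """Count, for every second of the day, how many sessions cover it."""
    diff = [0] * (SECONDS_IN_DAY + 1)
    for day_sessions in sessions.values():
        for start, length in day_sessions:
            first = max(0, int(start))
            last = min(SECONDS_IN_DAY, int(start + length))  # exclusive end
            if last > first:
                diff[first] += 1   # session becomes active here
                diff[last] -= 1    # session is no longer active here
    # Running sum turns the +1/-1 markers into counts per second
    return list(accumulate(diff[:-1]))

# Hypothetical sample: date -> list of (start second, length in seconds)
sample = {
    "2022-01-01": [(3600, 1800)],
    "2022-01-02": [(4000, 100), (80000, 10000)],
}
counts = sessions_per_second(sample)
```

The resulting list has 86400 entries and can be plotted with plt.plot(counts) or plt.imshow([counts], aspect="auto"). Like the sampling code, it counts sessions rather than distinct days.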

How to convert DOY (Day of year) to months (as text) in a plot?

I have plots of climate time series for daily mean temperature, precipitation and global radiation.
I generated plots like this:
https://i.ibb.co/w4x2FMN/temp-mean-1999-2018.png
On the x-axis I just generated a list of the numbers 1 - 365, which represent the day of year (DOY).
What I actually want is for the x-axis to be divided into month names (as strings) like this:
https://i.ibb.co/cL2zc87/rplot.jpg
I tried already a lot of different things but nothing worked.
fig = plt.figure(figsize=(10,10))
ax = plt.axes()
x = np.arange(1,366) # here I define the List with DOY
ax.fill_between(x, temp_cum['min'], temp_cum['max'], color='lightgray', label='1999-2017')
#ax.plot(x, merge_table_99_17_without, color='grey', linewidth=0.3)
ax.plot(x, temp_cum['2018'], color='black', label='2018');
ax.legend(loc='upper left')
ax.set_ylabel('daily mean temperature [°C]')
#ax.set_xlabel('DOY')
plt.show()
First you should convert your numbers to date objects as described in this post. You can use the following function.
import datetime
def serial_date_to_string(srl_no):
    new_date = datetime.datetime(2018,1,1,0,0) + datetime.timedelta(srl_no - 1)
    return new_date.strftime("%Y-%m-%d")
Then you have to format your x-axis to only show the month and not the full dates. This post describes how to do this in detail.
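Note that the function above returns strings. If actual date objects are wanted (matplotlib's date locators and formatters work on dates, while strings are treated as categories), a variant could look like the sketch below. The name doy_to_date and the default year 2018 are assumptions for illustration.

```python
from datetime import date, timedelta

def doy_to_date(doy, year=2018):
    """Convert a 1-based day of year into a datetime.date for the given year."""
    return date(year, 1, 1) + timedelta(days=doy - 1)

# One date per day of the (non-leap) year, replacing np.arange(1, 366)
x_dates = [doy_to_date(doy) for doy in range(1, 366)]
```

The list x_dates can be used in place of x in ax.plot and ax.fill_between, after which mdates.MonthLocator() and mdates.DateFormatter('%b') give month names on the axis.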
Thank you very much #AUBSieGUL.
Your second link finally helped me:
import numpy as np
import matplotlib.pyplot as plt
import datetime
import matplotlib.dates as mdates
fig = plt.figure(figsize=(12,12))
ax = plt.axes()
### I added this!
# Set the locator
locator = mdates.MonthLocator() # every month
# Specify the format - %b gives us Jan, Feb...
fmt = mdates.DateFormatter('%b')
numdays = 365
base = datetime.datetime(2018, 1, 1, 0, 0, 0, 0)
date_list = [base + datetime.timedelta(days=x) for x in range(0,numdays)]
###
###replaced all x with date_list
ax.fill_between(date_list, prec_cum['min'], prec_cum['max'], color='lightgray', label='1999-2017')
ax.plot(date_list, merge_table_99_17_cumsum_without, color='grey', linewidth=0.3)
ax.plot(date_list, prec_cum['2018'], color='black', label='2018');
ax.legend(loc='upper left')
ax.set_ylabel('cum. sums of global radiation [kW/m²]')
#ax.set_xlabel('DOY')
### I added this!
X = plt.gca().xaxis
X.set_major_locator(locator)
# Specify formatter
X.set_major_formatter(fmt)
###
plt.show()

Adjusting x-axis in matplotlib

I have a value for every hour of the year, which means there are 24 x 365 = 8760 values. I want to plot this information neatly with matplotlib, with the x-axis showing January, February, and so on.
Here is my current code:
from matplotlib import pyplot as plt
plt.plot(x_data,y_data,label=str("Plot"))
plt.xticks(rotation=45)
plt.xlabel("Time")
plt.ylabel("Y axis values")
plt.title("Y axis values vs Time")
plt.legend(loc='upper right')
axes = plt.gca()
axes.set_ylim([0,some_value * 3])
plt.show()
x_data is a list containing dates in datetime format. y_data contains values corresponding to the values in x_data. How can I get the plot neatly done with months on the X axis? An example:
You could create a scatter plot with horizontal lines as markers. The month is extracted by using the datetime module. In case the dates are not ordered, the plot sorts both lists first according to the date:
#creating a toy dataset for one year, random data points within month-specific limits
from datetime import date, timedelta
import random
x_data = [date(2017, 1, 1) + timedelta(days = i) for i in range(365)]
random.shuffle(x_data)
y_data = [random.randint(50 * (i.month - 1), 50 * i.month) for i in x_data]
#the actual plot starts here
from matplotlib import pyplot as plt
#get a scatter plot with horizontal markers for each data point
#in case the dates are not ordered, sort first the dates and the y values accordingly
plt.scatter([day.strftime("%b") for day in sorted(x_data)], [y for _xsorted, y in sorted(zip(x_data, y_data))], marker = "_", s = 900)
plt.show()
Output
The disadvantage is obviously that the lines have a fixed length. Also, if a month doesn't have a data point, it will not appear in the graph.
Edit 1:
You could also use Axes.hlines, as seen here.
This has the advantage that the line length changes with the window size. You also don't have to pre-sort the lists, because each start and end point is calculated separately.
The toy dataset is created as above.
from matplotlib import pyplot as plt
#prepare the axis with categories Jan to Dec
x_ax = [date(2017, 1, 1) + timedelta(days = 31 * i) for i in range(12)]
#create invisible bar chart to retrieve start and end points from automatically generated bars
Bars = plt.bar([month.strftime("%b") for month in x_ax], [month.month for month in x_ax], align = "center", alpha = 0)
start_1_12 = [plt.getp(item, "x") for item in Bars]
end_1_12 = [plt.getp(item, "x") + plt.getp(item, "width") for item in Bars]
#retrieve start and end point for each data point line according to its month
x_start = [start_1_12[day.month - 1] for day in x_data]
x_end = [end_1_12[day.month - 1] for day in x_data]
#plot hlines for all data points
plt.hlines(y_data, x_start, x_end, colors = "blue")
plt.show()
Output
Edit 2:
Now your description of the problem is totally different from what you show in your question. You want a simple line plot with specific axis formatting. This can be found easily in the matplotlib documentation and all over SO. An example of how to achieve this with the toy dataset created above:
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, MonthLocator
ax = plt.subplot(111)
ax.plot([day for day in sorted(x_data)], [y for _xsorted, y in sorted(zip(x_data, y_data))], "r.-")
ax.xaxis.set_major_locator(MonthLocator(bymonthday=15))
ax.xaxis.set_minor_locator(MonthLocator())
ax.xaxis.set_major_formatter(DateFormatter("%B"))
plt.show()
Output

Matplotlib: How to skip a range of hours when plotting with a datetime axis?

I have tick-by-tick data of a financial instrument, which I am trying to plot using matplotlib. I am working with pandas and the data is indexed with DatetimeIndex.
The problem is, when I try to plot multiple trading days I can't skip the range of time between the market closing time and next day's opening (see the example), which of course I am not interested in.
Is there a way to make matplotlib ignore this and just "stick" together the closing quote with the following day's opening? I tried to pass a custom range of time:
plt.xticks(time_range)
But the result is the same. Any ideas how to do this?
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
# Example data
instrument = pd.DataFrame(data={
'Datetime': [
dt.datetime.strptime('2018-01-11 11:00:11', '%Y-%m-%d %H:%M:%S'),
dt.datetime.strptime('2018-01-11 13:02:17', '%Y-%m-%d %H:%M:%S'),
dt.datetime.strptime('2018-01-11 16:59:14', '%Y-%m-%d %H:%M:%S'),
dt.datetime.strptime('2018-01-12 11:00:11', '%Y-%m-%d %H:%M:%S'),
dt.datetime.strptime('2018-01-12 13:15:24', '%Y-%m-%d %H:%M:%S'),
dt.datetime.strptime('2018-01-12 16:58:43', '%Y-%m-%d %H:%M:%S')
],
'Price': [127.6, 128.1, 127.95, 129.85, 129.7, 131.2],
'Volume': [725, 146, 48, 650, 75, 160]
}).set_index('Datetime')
plt.figure(figsize=(10,5))
top = plt.subplot2grid((4,4), (0, 0), rowspan=3, colspan=4)
bottom = plt.subplot2grid((4,4), (3,0), rowspan=1, colspan=4)
top.plot(instrument.index, instrument['Price'])
bottom.bar(instrument.index, instrument['Volume'], 0.005)
top.xaxis.get_major_ticks()
top.axes.get_xaxis().set_visible(False)
top.set_title('Example')
top.set_ylabel('Price')
bottom.set_ylabel('Volume')
TL;DR
Replace the matplotlib plotting functions:
top.plot(instrument.index, instrument['Price'])
bottom.bar(instrument.index, instrument['Volume'], 0.005)
With these ones:
top.plot(range(instrument.index.size), instrument['Price'])
bottom.bar(range(instrument.index.size), instrument['Volume'], width=1)
Or with these pandas plotting functions (only the x-axis limits will look different):
instrument['Price'].plot(use_index=False, ax=top)
instrument['Volume'].plot.bar(width=1, ax=bottom)
Align both plots by sharing the x-axis with sharex=True and set up the ticks as you would like them using the dataframe index, as shown in the example further below.
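Once the data is plotted against integers, the tick positions are indices into the DatetimeIndex rather than dates. As a rough sketch of how such ticks could be derived for the small example in the question, the helper below (day_start_ticks, a name made up here) marks the first sample of each day. It assumes the timestamps are already sorted.

```python
from datetime import datetime

def day_start_ticks(timestamps):
    """Return (positions, labels) marking the first sample of each calendar day.

    Positions are indices into `timestamps`, i.e. the integer x values used
    when plotting against range(len(timestamps)) instead of the datetimes.
    """
    positions, labels = [], []
    previous_day = None
    for i, ts in enumerate(timestamps):
        if ts.date() != previous_day:
            positions.append(i)
            labels.append(ts.strftime("%d-%b"))
            previous_day = ts.date()
    return positions, labels

# Timestamps from the question's example data
timestamps = [
    datetime(2018, 1, 11, 11, 0, 11),
    datetime(2018, 1, 11, 13, 2, 17),
    datetime(2018, 1, 11, 16, 59, 14),
    datetime(2018, 1, 12, 11, 0, 11),
    datetime(2018, 1, 12, 13, 15, 24),
    datetime(2018, 1, 12, 16, 58, 43),
]
positions, labels = day_start_ticks(timestamps)
```

The results can be applied with bottom.set_xticks(positions) and bottom.set_xticklabels(labels). With a pandas DatetimeIndex, instrument.index.to_pydatetime() should give a compatible sequence.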
Let me first create a sample dataset and show what it looks like if I plot it using matplotlib plotting functions like in your example where the DatetimeIndex is used as the x variable.
Create sample dataset
The sample data is created using the pandas_market_calendars package to create a realistic DatetimeIndex with a minute-by-minute frequency that spans several weekdays and a weekend.
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
import matplotlib.ticker as ticker
import pandas_market_calendars as mcal # v 1.6.1
# Create datetime index with a 'minute start' frequency based on the New
# York Stock Exchange trading hours (end date is inclusive)
nyse = mcal.get_calendar('NYSE')
nyse_schedule = nyse.schedule(start_date='2021-01-07', end_date='2021-01-11')
nyse_dti = mcal.date_range(nyse_schedule, frequency='1min', closed='left')\
.tz_convert(nyse.tz.zone)
# Remove timestamps of closing times to create a 'period start' datetime index
nyse_dti = nyse_dti.delete(nyse_dti.indexer_at_time('16:00'))
# Create sample of random data consisting of opening price and
# volume of financial instrument traded for each period
rng = np.random.default_rng(seed=1234) # random number generator
price_change = rng.normal(scale=0.1, size=nyse_dti.size)
price_open = 127.5 + np.cumsum(price_change)
volume = rng.integers(100, 10000, size=nyse_dti.size)
df = pd.DataFrame(data=dict(Price=price_open, Volume=volume), index=nyse_dti)
df.head()
# Price Volume
# 2021-01-07 09:30:00-05:00 127.339616 7476
# 2021-01-07 09:31:00-05:00 127.346026 3633
# 2021-01-07 09:32:00-05:00 127.420115 1339
# 2021-01-07 09:33:00-05:00 127.435377 3750
# 2021-01-07 09:34:00-05:00 127.521752 7354
Plot data with matplotlib using the DatetimeIndex
This sample data can now be plotted using matplotlib plotting functions like in your example, but note that the subplots are created by using plt.subplots with the sharex=True argument. This aligns the line with the bars correctly and makes it possible to use the interactive interface of matplotlib with both subplots.
# Create figure and plots using matplotlib functions
fig, (top, bot) = plt.subplots(2, 1, sharex=True, figsize=(10,5),
gridspec_kw=dict(height_ratios=[0.75,0.25]))
top.plot(df.index, df['Price'])
bot.bar(df.index, df['Volume'], 0.0008)
# Set title and labels
top.set_title('Matplotlib plots with unwanted gaps', pad=20, size=14, weight='semibold')
top.set_ylabel('Price', labelpad=10)
bot.set_ylabel('Volume', labelpad=10);
Plot data with matplotlib without any gaps by using a range of integers
The problem of these gaps can be solved by simply ignoring the DatetimeIndex and using a range of integers instead. Most of the work then lies in creating appropriate tick labels. Here is an example:
# Create figure and matplotlib plots with some additional formatting
fig, (top, bot) = plt.subplots(2, 1, sharex=True, figsize=(10,5),
gridspec_kw=dict(height_ratios=[0.75,0.25]))
top.plot(range(df.index.size), df['Price'])
top.set_title('Matplotlib plots without any gaps', pad=20, size=14, weight='semibold')
top.set_ylabel('Price', labelpad=10)
top.grid(axis='x', alpha=0.3)
bot.bar(range(df.index.size), df['Volume'], width=1)
bot.set_ylabel('Volume', labelpad=10)
# Set fixed major and minor tick locations
ticks_date = df.index.indexer_at_time('09:30')
ticks_time = np.arange(df.index.size)[df.index.minute == 0][::2] # step in hours
bot.set_xticks(ticks_date)
bot.set_xticks(ticks_time, minor=True)
# Format major and minor tick labels
labels_date = [maj_tick.strftime('\n%d-%b').replace('\n0', '\n')
for maj_tick in df.index[ticks_date]]
labels_time = [min_tick.strftime('%I %p').lstrip('0').lower()
for min_tick in df.index[ticks_time]]
bot.set_xticklabels(labels_date)
bot.set_xticklabels(labels_time, minor=True)
bot.figure.autofmt_xdate(rotation=0, ha='center', which='both')
Create dynamic ticks for interactive plots
If you like to use the interactive interface of matplotlib (with pan/zoom), you will need to use locators and formatters from the matplotlib ticker module. Here is an example of how to set the ticks, where the major ticks are fixed and formatted like above but the minor ticks are generated automatically as you zoom in/out of the plot:
# Set fixed major tick locations and automatic minor tick locations
ticks_date = df.index.indexer_at_time('09:30')
bot.set_xticks(ticks_date)
bot.xaxis.set_minor_locator(ticker.AutoLocator())
# Format major tick labels
labels_date = [maj_tick.strftime('\n%d-%b').replace('\n0', '\n')
for maj_tick in df.index[ticks_date]]
bot.set_xticklabels(labels_date)
# Format minor tick labels
def min_label(x, pos):
    if 0 <= x < df.index.size:
        return df.index[int(x)].strftime('%H:%M')
    return ''  # no label for ticks outside the data range
min_fmtr = ticker.FuncFormatter(min_label)
bot.xaxis.set_minor_formatter(min_fmtr)
bot.figure.autofmt_xdate(rotation=0, ha='center', which='both')
Documentation: example of an alternative solution; datetime string format codes
Maybe use https://pypi.org/project/mplfinance/
Allows mimicking the usual financial plots you see in most services.
When you call the mplfinance mpf.plot() function, there is a kwarg show_nontrading, which by default is set to False so that these unwanted gaps are automatically not plotted. (To plot them, set show_nontrading=True).
