I'm trying to plot two separate things from two pandas dataframes, but the x-axis is giving me trouble. When I use matplotlib.ticker to skip x-ticks, the date labels don't get skipped. The result is that the x-axis labels don't match what is plotted.
For example, when the x-ticks are set to a base of 2, you'll see that the dates are going up by 1.
But the graph has the same spacing when the base is set to 4, which you can see here:
For the second image, the goal is for the days to increase by 4 each tick, so it should read 22, 26, 30, etc.
Here is the code that I'm working with:
ax = plot2[['Date','change value']].plot(x='Date',color='red',alpha=1,linewidth=1.5)
plt.ylabel('Total Change')
plot_df[['Date','share change daily']].plot(x='Date',secondary_y=True,kind='bar',ax=ax,alpha=0.4,color='black',figsize=(6,2),label='Daily Change')
plt.ylabel('Daily Change')
ax.legend(['Total Change (L)','Daily Change'])
plt.xticks(plot_df.index,plot_df['Date'].values)
myLocator = mticker.MultipleLocator(base=4)
ax.xaxis.set_major_locator(myLocator)
Any help is appreciated! Thanks :)
First off, I suggest you set the date as the index of your dataframe. This lets pandas automatically format the date labels nicely when you create line plots and it lets you conveniently create a custom format with the strftime method.
This second point is relevant to this example, seeing as plotting a bar plot over a line plot prevents you from getting the pandas line plot date labels because the x-axis units switch to integer units starting at 0 (note that this is also the case when you use the dates as strings instead of datetime objects, aka timestamp objects in pandas). You can check this for yourself by running ax.get_xticks() after creating the line plot (with a DatetimeIndex) and again after creating the bar plot.
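A minimal sketch of that check, using made-up data (the DataFrame below is illustrative, not the asker's). It draws the two plots on separate Axes so that each set of ticks can be inspected on its own:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data with a DatetimeIndex
dti = pd.bdate_range(start="2020-07-22", periods=10)
df = pd.DataFrame({"value": np.arange(10.0)}, index=dti)

fig, (ax_line, ax_bar) = plt.subplots(1, 2)

# Line plot: the x-axis keeps date-based units
df.plot(y="value", ax=ax_line)
line_ticks = ax_line.get_xticks()

# Bar plot: pandas places the bars at integer positions 0, 1, 2, ...
df.plot(kind="bar", y="value", ax=ax_bar)
bar_ticks = ax_bar.get_xticks()

print("line plot ticks:", line_ticks)
print("bar plot ticks: ", bar_ticks)
```

The bar plot ticks come out as one integer per row, which is why date locators and formatters stop working once the bars are drawn.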
There are too many peculiarities regarding the tick locators and formatters, the pandas plotting defaults, and the various ways in which you could define your custom ticks and tick labels for me to go into more detail here. So let me suggest you refer to the documentation for more information (though for your case you don't really need any of this): Major and minor ticks, Date tick labels, Custom tick formatter for time series, more examples using ticks, and the ticker module which contains the list of tick locators and formatters and their parameters.
Furthermore, you can identify the default tick locators and formatters used by the plotting functions with ax.get_xaxis().get_major_locator() or ax.get_xaxis().get_major_formatter() (you can do the same for the y-axis, and for minor ticks) to get an idea of what is happening under the hood.
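As a rough illustration of that inspection, here is a sketch with two throwaway plots (the data is invented). It assumes the default rcParams, under which matplotlib picks the locators and formatter named in the comments:

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, (ax_num, ax_date) = plt.subplots(1, 2)

# Plain numeric x values
ax_num.plot([0, 1, 2], [0, 1, 4])

# datetime x values
dates = [dt.datetime(2020, 7, 22) + dt.timedelta(days=i) for i in range(3)]
ax_date.plot(dates, [0, 1, 4])

# Report the class of each default locator/formatter
num_locator = type(ax_num.get_xaxis().get_major_locator()).__name__
num_formatter = type(ax_num.get_xaxis().get_major_formatter()).__name__
date_locator = type(ax_date.get_xaxis().get_major_locator()).__name__

print(num_locator)    # AutoLocator
print(num_formatter)  # ScalarFormatter
print(date_locator)   # AutoDateLocator
```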
On to solving your problem. Seeing as you want a fixed frequency of ticks for a predefined range of dates, I suggest that you avoid explicitly selecting a ticker locator and formatter and that instead you simply create the list of ticks and tick labels you want. First, here is some sample data similar to yours:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
rng = np.random.default_rng(seed=1) # random number generator
dti = pd.bdate_range(start='2020-07-22', end='2020-09-03')
daily = rng.normal(loc=0, scale=250, size=dti.size)
total = -1900 + np.cumsum(daily)
df = pd.DataFrame({'Daily Change': daily,
                   'Total Change': total},
                  index=dti)
df.head()
            Daily Change  Total Change
2020-07-22     86.396048  -1813.603952
2020-07-23    205.404536  -1608.199416
2020-07-24     82.609269  -1525.590147
2020-07-27   -325.789308  -1851.379455
2020-07-28    226.338967  -1625.040488
The date is set as the index, which will simplify the code for creating the plots (no need to specify x). I use the same formatting arguments as in the example you gave, except for the figure size. Note that for setting the ticks and tick labels I do not use plt.xticks because this refers to the secondary Axes containing the bar plot and for some reason, the rotation and ha arguments get ignored.
label_daily, label_total = df.columns
# Create pandas line plot: note the 'use_index' parameter
ax = df.plot(y=label_total, color='red', alpha=1, linewidth=1.5,
             use_index=False, ylabel=label_total)
# Create pandas bar plot: note that the second ylabel must be created
# after, else it overwrites the previous label on the left
df.plot(kind='bar', y=label_daily, color='black', alpha=0.4,
        ax=ax, secondary_y=True, mark_right=False, figsize=(9, 4))
plt.ylabel(label_daily, labelpad=10)
# Place legend in a better location: note that because there are two
# Axes, the combined legend can only be edited with the fig.legend
# method, and the ax legend must be removed
ax.legend().remove()
plt.gcf().legend(loc=(0.11, 0.15))
# Create custom x ticks and tick labels
freq = 4 # business days
xticks = ax.get_xticks()
xticklabels = df.index[::freq].strftime('%b-%d')
ax.set_xticks(xticks[::freq])
ax.set_xticks(xticks, minor=True)
ax.set_xticklabels(xticklabels, rotation=0, ha='center')
plt.show()
The format codes accepted by strftime (such as %b and %d) are listed in the Python datetime documentation.
For the sake of completeness, here are two alternative ways of creating exactly the same ticks but this time by making explicit use of matplotlib tick locators and formatters.
This first alternative uses lists of ticks and tick labels like before, but this time passing them to FixedLocator and FixedFormatter:
import matplotlib.ticker as mticker
# Create custom x ticks and tick labels
freq = 4 # business days
maj_locator = mticker.FixedLocator(ax.get_xticks()[::freq])
min_locator = mticker.FixedLocator(ax.get_xticks())
ax.xaxis.set_major_locator(maj_locator)
ax.xaxis.set_minor_locator(min_locator)
maj_formatter = mticker.FixedFormatter(df.index[maj_locator.locs].strftime('%b-%d'))
ax.xaxis.set_major_formatter(maj_formatter)
plt.setp(ax.get_xticklabels(), rotation=0, ha='center')
This second alternative makes use of the option to create a tick at every nth position of the index when using IndexLocator, combining it with FuncFormatter (instead of IndexFormatter which is deprecated):
import matplotlib.ticker as mticker
# Create custom x ticks and tick labels
maj_freq = 4 # business days
min_freq = 1 # business days
maj_locator = mticker.IndexLocator(maj_freq, 0)
min_locator = mticker.IndexLocator(min_freq, 0)
ax.xaxis.set_major_locator(maj_locator)
ax.xaxis.set_minor_locator(min_locator)
maj_formatter = mticker.FuncFormatter(lambda x, pos=None:
                                      df.index[int(x)].strftime('%b-%d'))
ax.xaxis.set_major_formatter(maj_formatter)
plt.setp(ax.get_xticklabels(), rotation=0, ha='center')
As you can see, both of these alternatives are more verbose than the initial example.
I am querying COVID-19 data and building a dataframe of day-over-day changes for one of the data points (positive test results) where each row is a day, each column is a state or territory (there are 56 altogether). I can then generate a chart for every one of the states, but I can't get my x-axis labels (the dates) to behave like I want. There are two problems which I suspect are related. First, there are too many labels -- usually matplotlib tidily reduces the label count for readability, but I think the subplots are confusing it. Second, I would like the labels to read vertically; but this only happens on the last of the plots. (I tried moving the rotation='vertical' inside the for block, to no avail.)
The dates are the same for all the subplots, so -- this part works -- the x-axis labels only need to appear on the bottom row of the subplots. Matplotlib is doing this automatically. But I need fewer of the labels, and for all of them to align vertically. Here is my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# get current data
all_states = pd.read_json("https://covidtracking.com/api/v1/states/daily.json")
# convert the YYYYMMDD date to a datetime object
all_states[['gooddate']] = all_states[['date']].applymap(lambda s: pd.to_datetime(str(s), format = '%Y%m%d'))
# 'positive' is the cumulative total of COVID-19 test results that are positive
all_states_new_positives = all_states.pivot_table(index = 'gooddate', columns = 'state', values = 'positive', aggfunc='sum')
all_states_new_positives_diff = all_states_new_positives.diff()
fig, axes = plt.subplots(14, 4, figsize = (12,8), sharex = True )
plt.tight_layout
for i, ax in enumerate(axes.ravel()):
    # get the numbers for the last 28 days
    x = all_states_new_positives_diff.iloc[-28 :].index
    y = all_states_new_positives_diff.iloc[-28 : , i]
    ax.set_title(y.name, loc='left', fontsize=12, fontweight=0)
    ax.plot(x,y)
plt.xticks(rotation='vertical')
plt.subplots_adjust(left=0.5, bottom=1, right=1, top=4, wspace=2, hspace=2)
plt.show();
Suggestions:
Increase the height of the figure.
fig, axes = plt.subplots(14, 4, figsize = (12,20), sharex = True)
Rotate all the labels:
fig.autofmt_xdate(rotation=90)
Use tight_layout at the end instead of subplots_adjust:
fig.tight_layout()
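Put together, the three suggestions might look like the sketch below. It uses a small random dataset and a 3x2 grid as stand-ins for the real per-state data and the 14x4 grid, so the names and sizes here are illustrative only:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the last 28 days of per-state numbers
dates = pd.date_range("2020-06-01", periods=28, freq="D")
rng = np.random.default_rng(seed=0)
data = pd.DataFrame(rng.integers(0, 100, size=(28, 6)),
                    index=dates, columns=list("ABCDEF"))

# A taller figure gives each row of subplots room for its title
fig, axes = plt.subplots(3, 2, figsize=(8, 6), sharex=True)
for ax, name in zip(axes.ravel(), data.columns):
    ax.plot(data.index, data[name])
    ax.set_title(name, loc="left", fontsize=12)

# Rotate the date labels on the bottom row of subplots
fig.autofmt_xdate(rotation=90, ha="center")

# Let matplotlib compute the spacing instead of subplots_adjust
fig.tight_layout()
```

Calling autofmt_xdate on the figure rather than plt.xticks is what makes the rotation apply to every bottom-row subplot instead of only the last one drawn.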
I have to plot several curves with very high xtick density, say 1000 date strings. To prevent these tick labels overlapping each other I manually set them to be 60 dates apart. Code below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts_index = pd.period_range(start="20060429", periods=1000).strftime("%Y%m%d")
fig = plt.figure(1)
ax = plt.subplot(1, 1, 1)
tick_spacing = 60
for i in range(5):
    plt.plot(ts_index, 1 + i * 0.01 * np.arange(0, 1000), label="group %d"%i)
plt.legend(loc='best')
plt.title(r'net value curves')
xticks = ax.get_xticks()
xlabels = ax.get_xticklabels()
ax.set_xticks(xticks[::tick_spacing])
ax.set_xticklabels(xlabels[::tick_spacing])
plt.xticks(rotation="vertical")
plt.xlabel(r'date')
plt.ylabel('net value')
plt.grid(True)
plt.show()
fig.savefig(r".\net_value_curves.png", )
fig.clf()
I'm running this piece of code in PyCharm Community Edition 2017.2.2 with a Python 3.6 kernel. Now comes the funny thing: whenever I ran the code in the normal "run" mode (i.e. just hit the execution button and let the code run "freely" till interruption or termination), then the figure I got would always miss xticklabels:
However, if I ran the code in "debug" mode and ran it step by step then I would get an expected figure with complete xticklabels:
This is really weird. Anyway, I just hope to find a way that can ensure me getting the desired output (the second figure) in the normal "run" mode. How can I modify my current code to achieve this?
Thanks in advance!
Your x axis data are strings. Hence you will get one tick per data point. This is probably not what you want. Instead use the dates to plot. Because you are using pandas, this is easily converted,
dates = pd.to_datetime(ts_index, format="%Y%m%d")
You may then get rid of your manual xtick locating and formatting, because matplotlib will automatically choose some nice tick locations for you.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts_index = pd.period_range(start="20060429", periods=1000).strftime("%Y%m%d")
dates = pd.to_datetime(ts_index, format="%Y%m%d")
fig, ax = plt.subplots()
for i in range(5):
    plt.plot(dates, 1 + i * 0.01 * np.arange(0, 1000), label="group %d"%i)
plt.legend(loc='best')
plt.title(r'net value curves')
plt.xticks(rotation="vertical")
plt.xlabel(r'date')
plt.ylabel('net value')
plt.grid(True)
plt.show()
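To see the difference between string and datetime x values described above, the following sketch counts the ticks matplotlib creates in each case. It uses 200 points instead of 1000 to keep it quick, and the variable names are made up for the illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n = 200  # fewer points than in the question
labels = pd.period_range(start="20060429", periods=n).strftime("%Y%m%d")
dates = pd.to_datetime(labels, format="%Y%m%d")
y = np.arange(n)

fig, (ax_str, ax_date) = plt.subplots(1, 2)

# Strings are treated as categories: one tick per data point
ax_str.plot(list(labels), y)

# Datetimes get an automatic date locator that picks a few ticks
ax_date.plot(dates, y)

n_str_ticks = len(ax_str.get_xticks())
n_date_ticks = len(ax_date.get_xticks())
print(n_str_ticks, n_date_ticks)
```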
However in case you do want to have some manual control over the locations and formats you may use matplotlib.dates locators and formatters.
import matplotlib.dates as mdates
# tick every 3 months
plt.gca().xaxis.set_major_locator(mdates.MonthLocator((1,4,7,10)))
# format as "%Y%m%d"
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%Y%m%d"))
In general, the Axis object computes and places ticks using a Locator object. Locators and Formatters are meant to be easily replaceable through the appropriate methods of Axis. The default Locator does not seem to be doing the trick for you, so you can replace it with anything you want using ax.xaxis.set_major_locator. This problem is not complicated enough to warrant writing your own, so I would suggest MaxNLocator, which fits your needs fairly well. Your example seems to work well with nbins=16 (which is what you have in the picture, since there are 17 ticks).
You need to add an import:
from matplotlib.ticker import MaxNLocator
You need to replace the block
xticks = ax.get_xticks()
xlabels = ax.get_xticklabels()
ax.set_xticks(xticks[::tick_spacing])
ax.set_xticklabels(xlabels[::tick_spacing])
with
ax.xaxis.set_major_locator(MaxNLocator(nbins=16))
or just
ax.xaxis.set_major_locator(MaxNLocator(16))
You may want to play around with the other arguments (all of which have to be keywords, except nbins). Pay especial attention to integer.
Note that for the Locator and Formatter APIs we work with an Axis object, not Axes. Axes is the whole plot, while Axis is the thing with the spines on it. Axes usually contains two Axis objects and all the other stuff in your plot.
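A small sketch of the Axes/Axis distinction, on invented numeric data rather than the date strings from the question:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.axes
import matplotlib.axis
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

fig, ax = plt.subplots()
ax.plot(range(1000), range(1000))

# ax is the Axes (the whole plot); ax.xaxis and ax.yaxis are its Axis objects,
# and the locator is set on the Axis
ax.xaxis.set_major_locator(MaxNLocator(nbins=16))

is_axes = isinstance(ax, matplotlib.axes.Axes)
is_axis = isinstance(ax.xaxis, matplotlib.axis.Axis)
locator = ax.xaxis.get_major_locator()
print(is_axes, is_axis, type(locator).__name__)
```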
You can set the visibility of the xticks-labels to False
for label in plt.gca().xaxis.get_ticklabels()[::N]:
    label.set_visible(False)
This will set every Nth label invisible.
rcParams['date.autoformatter.month'] = "%b\n%Y"
I am using matplotlib to plot a time series, and if I set rcParams as above, the resulting plot has the month name and year labeled at each tick. How can I set it up so that the year is only shown at January of each year? I tried this, but it does not work:
rcParams['date.autoformatter.month'] = "%b"
rcParams['date.autoformatter.year'] = "%Y"
The formatters do not let you specify conditions. Depending on the span of the series, the AutoDateFormatter will fall into either the date.autoformatter.month range or the date.autoformatter.year range.
Also, the AutoDateLocator may not necessarily decide to actually tick the first of January at all.
I would hence suggest to specify the tickers directly to the desired format and locations. You may use the major ticks to show the years and the minor ticks to show the months. The format for the major ticks can then get a line break, in order not to overlap with the minor ticklabels.
import matplotlib.pyplot as plt
import matplotlib.dates
from datetime import datetime
t = [datetime(2016,1,1), datetime(2017,12,31)]
x = [0,1]
fig, ax = plt.subplots()
ax.plot(t,x)
ax.xaxis.set_major_locator(matplotlib.dates.YearLocator())
ax.xaxis.set_minor_locator(matplotlib.dates.MonthLocator((1,4,7,10)))
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter("\n%Y"))
ax.xaxis.set_minor_formatter(matplotlib.dates.DateFormatter("%b"))
plt.setp(ax.get_xticklabels(), rotation=0, ha="center")
plt.show()
You could then also adapt the minor ticks' lengths to match those of the major ones in case that is desired,
ax.tick_params(axis="x", which="both", length=4)
I am plotting aggregated data in Python, using pandas and Matplotlib.
My axis customization commands are failing as a function of which of two similar functions I'm calling to make bar plots. The working case is e.g.:
import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def format_x_date_month_day(ax):
    days = mdates.DayLocator()
    months = mdates.MonthLocator() # every month
    dayFmt = mdates.DateFormatter('%D')
    monthFmt = mdates.DateFormatter('%Y-%m')
    ax.figure.autofmt_xdate()
    ax.xaxis.set_major_locator(months)
    ax.xaxis.set_major_formatter(monthFmt)
    ax.xaxis.set_minor_locator(days)
span_days = 90
start = pd.to_datetime("1-1-2012")
idx = pd.date_range(start, periods=span_days).tolist()
df=pd.DataFrame(index=idx, data={'A':np.random.random(span_days), 'B':np.random.random(span_days)})
plt.close('all')
fig, ax = plt.subplots(1)
ax.bar(df.index, df.A) # loop over columns here to do stacked plot
format_x_date_month_day(ax)
plt.show()
(See matplotlib.org for example of looping to create a stacked bar plot.) This gives us
Another approach that should work and be much easier is to use df.plot.bar(ax=ax, stacked=True), however it does not admit date axis formatting with mdates:
plt.close('all')
fig, ax = plt.subplots(1)
df.plot.bar(ax=ax, stacked=True)
format_x_date_month_day(ax)
plt.show()
How can mdates and ax.figure.autofmt_xdate() be made to play nice with df.plot.bar?
Bar plots in pandas are designed to compare categories rather than to display time-series or other types of continuous variables, as stated in the docstring:
A bar plot shows comparisons among discrete categories. One axis of the plot shows the specific categories being compared, and the other axis represents a measured value.
This is why the scale of the x-axis of pandas bar plots is made of integers starting from zero, regardless of the data type of the x variable. When the same bar plot is created with matplotlib, the scale of the x-axis is made of matplotlib date numbers, so the tick locators and formatters of the matplotlib.dates module (mdates) can be used as expected.
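The difference in x-axis units can be checked directly by looking at where the first bar ends up in each case. This is a sketch on a made-up five-day dataset, and first_bar_center is a hypothetical helper written for the illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

idx = pd.date_range("2012-01-01", periods=5, freq="D")
df = pd.DataFrame({"A": np.arange(1.0, 6.0)}, index=idx)

fig, (ax_pd, ax_mpl) = plt.subplots(1, 2)
df.plot(kind="bar", y="A", ax=ax_pd, legend=False)  # pandas bar plot
ax_mpl.bar(df.index, df["A"])                       # matplotlib bar plot


def first_bar_center(ax):
    """Return the x coordinate of the center of the first bar (hypothetical helper)."""
    patch = ax.patches[0]
    return patch.get_x() + patch.get_width() / 2


pandas_center = first_bar_center(ax_pd)    # integer position 0
mpl_center = first_bar_center(ax_mpl)      # matplotlib date number
expected = mdates.date2num(idx[0].to_pydatetime())

print(pandas_center, mpl_center, expected)
```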
To be able to use a pandas bar plot with mdates, you need to move the bars along the x-axis to locations that match the matplotlib date numbers. This can be done thanks to the mdates.date2num function. This is illustrated in the following example based on the code you provided with a few modifications: the sample dataset contains 3 variables, the time series is limited to 45 days, and the tick formatting is adjusted to my preferences (and is not wrapped as a function).
This example works for any number of variables (with or without NaNs) and for any bar width that is passed to the pandas plot function:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.dates as mdates # v 3.3.2
# Create random dataset
rng = np.random.default_rng(seed=1) # random number generator
nperiods = 45
nvar = 3
idx = pd.date_range('2012-01-01', periods=nperiods, freq='D')
df = pd.DataFrame(rng.integers(11, size=(idx.size, nvar)),
                  index=idx, columns=list('ABC'))
# Draw pandas stacked bar chart
ax = df.plot(kind='bar', stacked=True, figsize=(10,5))
# Compute width of bars in matplotlib date units
pandas_width = ax.patches[0].get_width() # the default bar width is 0.5
mdates_x0 = mdates.date2num(df.index[0])
mdates_x1 = mdates.date2num(df.index[1])
mdates_width_default = (mdates_x1-mdates_x0)/2
mdates_width = pandas_width*mdates_width_default/0.5 # rule of three conversion
# Compute new x values for bars in matplotlib date units, adjusting the
# positions according to the bar width
mdates_x = mdates.date2num(df.index) - mdates_width/2
nvar = len(ax.get_legend_handles_labels()[1])
mdates_x_patches = np.ravel(nvar*[mdates_x])
# Set bars to new x positions: this loop works fine with NaN values as
# well because in bar plot NaNs are drawn with a rectangle of 0 height
# located at the foot of the bar, you can verify this with patch.get_bbox()
for patch, new_x in zip(ax.patches, mdates_x_patches):
    patch.set_x(new_x)
    patch.set_width(mdates_width)
# Set major and minor date tick locators
months = mdates.MonthLocator()
days = mdates.DayLocator(bymonthday=np.arange(31, step=3))
ax.xaxis.set_major_locator(months)
ax.xaxis.set_minor_locator(days)
# Set major date tick formatter
month_fmt = mdates.DateFormatter('\n%b\n%Y')
day_fmt = mdates.DateFormatter('%d')
ax.xaxis.set_major_formatter(month_fmt)
ax.xaxis.set_minor_formatter(day_fmt)
# Shift the plot frame to where the bars are now located
xmin = min(mdates_x) - mdates_width
xmax = max(mdates_x) + 2*mdates_width
ax.set_xlim(xmin, xmax)
# Adjust tick label format last, else it may produce unexpected results
ax.figure.autofmt_xdate(rotation=0, ha='center')
Up to you to decide if this is more convenient than plotting stacked bars from scratch with matplotlib.
This solution can be slightly modified to display appropriate tick labels for time series based on any frequency of time. Here is an example using a frequency of minutes, a custom bar width, and an automatic date tick locator and formatter. Only the new/modified code lines are shown:
import matplotlib.ticker as mtick
#...
idx = pd.date_range('2012-01-01 12', periods=nperiods, freq='T')
#...
ax = df.plot(kind='bar', stacked=True, figsize=(10,5), width=0.3)
#...
# Set adaptive tick locators and major tick formatter
maj_loc = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(maj_loc)
min_loc = mtick.FixedLocator(mdates_x + mdates_width/2)
ax.xaxis.set_minor_locator(min_loc) # draw minor tick under each bar
fmt = mdates.ConciseDateFormatter(maj_loc)
ax.xaxis.set_major_formatter(fmt)
#...
You may notice that the ticks are often not well aligned with the bars. There appears to be some issue with matplotlib when the figure elements are put together. I find this is usually only noticeable when plotting thinner-than-useful bars. You can check that the bars and ticks are indeed placed correctly by running ax.get_xticks() and comparing that to the values given by patch.get_bbox() when looping through ax.patches.
I am trying to produce a graph and I am having some issues annotating it.
My graph has a log scale on the x-axis, showing time. What I want to be able to do is keep the existing (but not predictable) numeric tick labels at 100 units, 1000 units, 10000 units, etc but also add custom tick labels to the x-axis that make it clear where more "human readable" time intervals occur---for instance I want to be able to label 'one week', 'one month', '6 months', etc.
I can use matplotlib.pyplot.annotate() to mark the points but it doesn't really do what I want. I don't really want text and arrows on top of my graph, I just want to add a few extra custom tick marks. Any ideas?
If you really want to add extra ticks, you can get the existing ones using axis.xaxis.get_majorticklocs(), add whatever you want to add, and then set the ticks using axis.xaxis.set_ticks(<your updated array>).
An alternative would be to add vertical lines using axvline. The advantage is that you don't have to worry about inserting your custom tick into the existing array, but you'll have to annotate the lines manually.
Yet another alternative would be to add a linked axis with your custom ticks.
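One way the linked-axis idea could look is sketched below, using twiny to put the custom ticks along the top while the automatic log ticks stay on the bottom. The data, the choice of seconds as the time unit, and the approximate month lengths are all assumptions made for the illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import NullLocator

# Hypothetical data: elapsed time in seconds on a log-scaled x-axis
t = np.logspace(2, 8, 50)
y = np.log10(t)

fig, ax = plt.subplots()
ax.plot(t, y)
ax.set_xscale("log")  # bottom axis keeps its automatic ticks

# Human-readable intervals in the same units as the data
# (a month is approximated as 30 days, six months as 182 days)
custom_locs = [7 * 86400, 30 * 86400, 182 * 86400]
custom_labels = ["one week", "one month", "6 months"]

# Linked axis along the top, sharing the y-axis with the original
ax_top = ax.twiny()
ax_top.set_xscale("log")
ax_top.set_xticks(custom_locs)
ax_top.set_xticklabels(custom_labels)
ax_top.xaxis.set_minor_locator(NullLocator())  # hide the log minor ticks
ax_top.set_xlim(ax.get_xlim())                 # mirror the bottom axis limits
```

Setting the limits last keeps the two x-axes aligned, since set_xticks may otherwise widen the view on the top axis.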
From http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.xticks:
# return locs, labels where locs is an array of tick locations and
# labels is an array of tick labels.
locs, labels = xticks()
So all you should need to do is obtain the locs and labels and then modify labels to your liking (dummy example):
labels = ['{0} (1 day)','{0} (1 week)', '{0} (1 year)']
new_labels = [x.format(locs[i]) for i,x in enumerate(labels)]
and then run:
xticks(locs, new_labels)
This is my solution. The main advantages are:
You can specify the axes (useful for twin axes or if working with multiple axes simultaneously)
You can specify the axis (put ticks on x-axis or y-axis)
You can easily add new ticks while keeping the automatic ones
It automatically replaces if you add a tick that already exists.
Code:
#!/usr/bin/python
from __future__ import division
import matplotlib.pyplot as plt
import numpy as np
#Function to add ticks
def addticks(ax,newLocs,newLabels,pos='x'):
    # Draw to get ticks
    plt.draw()
    # Get existing ticks
    if pos=='x':
        locs = ax.get_xticks().tolist()
        labels=[x.get_text() for x in ax.get_xticklabels()]
    elif pos =='y':
        locs = ax.get_yticks().tolist()
        labels=[x.get_text() for x in ax.get_yticklabels()]
    else:
        print("WRONG pos. Use 'x' or 'y'")
        return
    # Build dictionary of ticks
    Dticks=dict(zip(locs,labels))
    # Add/Replace new ticks
    for Loc,Lab in zip(newLocs,newLabels):
        Dticks[Loc]=Lab
    # Get back tick lists
    locs=list(Dticks.keys())
    labels=list(Dticks.values())
    # Generate new ticks
    if pos=='x':
        ax.set_xticks(locs)
        ax.set_xticklabels(labels)
    elif pos =='y':
        ax.set_yticks(locs)
        ax.set_yticklabels(labels)
#Get numpy arrays
x=np.linspace(0,2)
y=np.sin(4*x)
#Start figure
fig = plt.figure()
ax=fig.add_subplot(111)
#Plot Arrays
ax.plot(x,y)
#Add a twin axes
axr=ax.twinx()
#Add more ticks
addticks(ax,[1/3,0.75,1.0],['1/3','3/4','Replaced'])
addticks(axr,[0.5],['Miguel'],'y')
#Save figure
plt.savefig('MWE.pdf')
I like Miguel's answer above. Worked quite well. However, a small adjustment has to be made. The following:
# Get back tick lists
locs=Dticks.keys()
labels=Dticks.values()
must be changed to
# Get back tick lists
locs=list(Dticks.keys())
labels=list(Dticks.values())
since, in Python 3, dict.keys() and dict.values() return dict_keys and dict_values view objects rather than lists, which matplotlib does not like (apparently). More about those two objects in PEP 3106.
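The behaviour is easy to confirm in plain Python 3, without matplotlib. The dictionary below is a made-up stand-in for Dticks:

```python
# Hypothetical tick dictionary, standing in for Dticks
ticks = {0.0: "0", 0.5: "half", 1.0: "1"}

keys_view = ticks.keys()
print(type(keys_view).__name__)  # dict_keys

# A view cannot be indexed like a list
try:
    keys_view[0]
    indexable = True
except TypeError:
    indexable = False

# Converting to lists gives ordinary sequences
locs = list(ticks.keys())
labels = list(ticks.values())
print(indexable, locs, labels)
```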