I am querying COVID-19 data and building a dataframe of day-over-day changes for one of the data points (positive test results) where each row is a day, each column is a state or territory (there are 56 altogether). I can then generate a chart for every one of the states, but I can't get my x-axis labels (the dates) to behave like I want. There are two problems which I suspect are related. First, there are too many labels -- usually matplotlib tidily reduces the label count for readability, but I think the subplots are confusing it. Second, I would like the labels to read vertically; but this only happens on the last of the plots. (I tried moving the rotation='vertical' inside the for block, to no avail.)
The dates are the same for all the subplots, so -- this part works -- the x-axis labels only need to appear on the bottom row of the subplots. Matplotlib is doing this automatically. But I need fewer of the labels, and for all of them to align vertically. Here is my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# get current data
all_states = pd.read_json("https://covidtracking.com/api/v1/states/daily.json")
# convert the YYYYMMDD date to a datetime object
all_states[['gooddate']] = all_states[['date']].applymap(lambda s: pd.to_datetime(str(s), format = '%Y%m%d'))
# 'positive' is the cumulative total of COVID-19 test results that are positive
all_states_new_positives = all_states.pivot_table(index = 'gooddate', columns = 'state', values = 'positive', aggfunc='sum')
all_states_new_positives_diff = all_states_new_positives.diff()
fig, axes = plt.subplots(14, 4, figsize = (12,8), sharex = True )
plt.tight_layout
for i , ax in enumerate(axes.ravel()):
    # get the numbers for the last 28 days
    x = all_states_new_positives_diff.iloc[-28 :].index
    y = all_states_new_positives_diff.iloc[-28 : , i]
    ax.set_title(y.name, loc='left', fontsize=12, fontweight=0)
    ax.plot(x,y)
plt.xticks(rotation='vertical')
plt.subplots_adjust(left=0.5, bottom=1, right=1, top=4, wspace=2, hspace=2)
plt.show();
Suggestions:
Increase the height of the figure.
fig, axes = plt.subplots(14, 4, figsize = (12,20), sharex = True)
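Rotate all the labels. plt.xticks() only acts on the current Axes, which after plt.subplots() is the last subplot created, so only that one was rotated. fig.autofmt_xdate() rotates the labels on the bottom row of the figure instead: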
fig.autofmt_xdate(rotation=90)
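Use tight_layout at the end instead of subplots_adjust. Subplot parameters are fractions of the figure (0 to 1), so values like top=4 push subplots outside it. Also note that plt.tight_layout without parentheses, as in the question, is never actually called: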
fig.tight_layout()
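Putting the three suggestions together gives roughly the following sketch. To keep it runnable offline, it swaps the covidtracking.com download for a synthetic 28-day frame with 56 hypothetical columns (S00 to S55). Limiting the number of date ticks with AutoDateLocator(maxticks=6) is an extra assumption for the "too many labels" part, not one of the suggestions above. Because of sharex=True, setting the locator on one subplot should be enough.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for all_states_new_positives_diff: 28 days x 56 columns
# (hypothetical column names S00..S55 instead of real state codes)
rng = np.random.default_rng(0)
dates = pd.date_range("2020-10-01", periods=28, freq="D")
diff = pd.DataFrame(rng.integers(0, 500, size=(28, 56)),
                    index=dates,
                    columns=[f"S{i:02d}" for i in range(56)])

# 1. A taller figure gives each of the 14 rows some room
fig, axes = plt.subplots(14, 4, figsize=(12, 20), sharex=True)

for ax, name in zip(axes.ravel(), diff.columns):
    ax.plot(diff.index, diff[name])
    ax.set_title(name, loc="left", fontsize=12)

# Assumption: cap the number of date ticks; the x-axis is shared,
# so this locator is used by every subplot
axes[-1, 0].xaxis.set_major_locator(mdates.AutoDateLocator(maxticks=6))

# 2. Rotate the date labels on the bottom row (and hide them elsewhere)
fig.autofmt_xdate(rotation=90)

# 3. Let tight_layout compute the spacing last, instead of subplots_adjust
fig.tight_layout()
plt.show()
```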
Related
I am trying to include 2 seaborn countplots with different scales on the same plot but the bars display as different widths and overlap as shown below. Any idea how to get around this?
Setting dodge=False doesn't work, as the bars then appear on top of each other.
The main problem with the approach in the question is that the first countplot doesn't take hue into account, and the second countplot won't magically move the bars of the first. An additional categorical column could be added that only takes the 'weekend' value. Note that the column should be explicitly made categorical with two values, even if only one value is actually used.
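Why the explicit categorical matters can be checked in plain pandas. A filtered categorical column still declares both categories, while a plain string column only keeps the values that are left. As far as I know, seaborn takes the hue order from the declared categories, so each countplot reserves a slot for both hue levels. A small sketch with made-up values:

```python
import pandas as pd

values = ["weekday", "weekend", "weekday"]

# Categorical column: filtering keeps the full list of categories
cat = pd.Series(pd.Categorical(values, categories=["weekday", "weekend"]))
only_weekend = cat[cat == "weekend"]
print(list(only_weekend.cat.categories))      # both categories survive
print(only_weekend.value_counts().to_dict())  # the unused one is counted as 0

# Plain strings: only the remaining value is left
plain = pd.Series(values)
print(list(plain[plain == "weekend"].unique()))
```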
Things can be simplified a lot by starting from the original dataframe, which supposedly already has an 'is_weekend' column. Creating the twinx ax beforehand allows writing a loop (so the call to sns.countplot() is written only once, with parameters).
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
sns.set_style('dark')
# create some demo data
data = pd.DataFrame({'ride_hod': np.random.normal(13, 3, 1000).astype(int) % 24,
'is_weekend': np.random.choice(['weekday', 'weekend'], 1000, p=[5 / 7, 2 / 7])})
# now, make 'is_weekend' a categorical column (not just strings)
data['is_weekend'] = pd.Categorical(data['is_weekend'], ['weekday', 'weekend'])
fig, ax1 = plt.subplots(figsize=(16, 6))
ax2 = ax1.twinx()
for ax, category in zip((ax1, ax2), data['is_weekend'].cat.categories):
    sns.countplot(data=data[data['is_weekend'] == category], x='ride_hod', hue='is_weekend', palette='Blues', ax=ax)
    ax.set_ylabel(f'Count ({category})')
ax1.legend_.remove() # both axes got a legend, remove one
ax1.set_xlabel('Hour of Day')
plt.tight_layout()
plt.show()
Use plt.xticks() to set the ticks by hand: pass the tick positions and the label to put at each of them, e.g. plt.xticks(positions, labels).
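Thinning a crowded axis by hand then means choosing a subset of positions. The following is a minimal sketch, assuming a categorical axis with bars at positions 0 to n-1 and keeping every third label. The monthly data is made up:

```python
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series(range(3, 17), pd.date_range("2021", freq="MS", periods=14))
labels = [f"{ts.year}-{ts.month:02}" for ts in s.index]
positions = range(len(labels))

fig, ax = plt.subplots()
ax.bar(positions, s.values, 0.9)
# plt.xticks acts on the current Axes (ax here): every third position,
# with the matching label strings
plt.xticks(positions[::3], labels[::3], rotation="vertical")
plt.show()
```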
TLDR: What I'm looking for is a way to plot a list of timestamps as equidistant datapoints, with mpl deciding which labels to show.
If equidistant plotting of timestamped datapoints is only possible by turning the timestamps into strings (and so plotting them as a categorical axis), my question could also be phrased: how can one get mpl to automatically drop labels from an overcrowded categorical axis?
Details:
I have a timeseries with monthly data, that I'd like to plot as a bar graph. My issue is that matplotlib.pyplot automatically plots this data on a time axis:
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots()
s = pd.Series(range(3,7), pd.date_range('2021', freq='MS', periods=4))
ax.bar(s.index, s.values, 27) # width 27 days
ax.set_ylabel('income [Eur]')
Usually this is what I want, but with monthly data it looks weird because February is noticeably shorter. What I want is for the data to be plotted equidistantly. Is there a way to force this?
Importantly, I want to retain the behaviour where, e.g., only every second or third label is drawn once the x-axis becomes too crowded, without having to adjust it manually.
What I've tried or don't want to do:
I could make the gaps between the bars the same - by tweaking the width of the bars. However, I'm plotting revenue data [Eur], which means that an uneven bar width is misleading.
I could turn the timestamps into a string so that the data is plotted categorically:
s = pd.Series(range(3,7), pd.date_range('2021', freq='MS', periods=4))
x = [f'{ts.year}-{ts.month:02}' for ts in s.index]
ax.bar(x, s.values, 0.9) # width now as fraction of spacing between datapoints
However, this leads mpl to think each label must be plotted, which gets crowded:
s = pd.Series(range(3,17), pd.date_range('2021', freq='MS', periods=14))
x = [f'{ts.year}-{ts.month:02}' for ts in s.index]
ax.bar(x, s.values, 0.9) # width now as fraction of spacing between datapoints
You can space out your categorical ticks with the MaxNLocator.
Given your bigger Series sample with categorical labels:
s = pd.Series(range(3,17), pd.date_range('2021', freq='MS', periods=14))
x = [f'{ts.year}-{ts.month:02}' for ts in s.index]
fig, ax = plt.subplots()
ax.bar(x, s.values, 0.9)
ax.set_ylabel('income [Eur]')
Apply the MaxNLocator with a specified number of bins (or 'auto'):
from matplotlib.ticker import MaxNLocator
locator = MaxNLocator(nbins=5) # or nbins='auto'
ax.xaxis.set_major_locator(locator)
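One caveat, as far as I can tell from matplotlib's categorical formatter: it rounds each tick position to the nearest integer to look up the label. If MaxNLocator picks a fractional step such as 2.5, a label could therefore end up between bars. Passing integer=True restricts the locator to whole-number positions, so every tick sits on a bar. A sketch combining this with the example above:

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import MaxNLocator

s = pd.Series(range(3, 17), pd.date_range("2021", freq="MS", periods=14))
x = [f"{ts.year}-{ts.month:02}" for ts in s.index]

fig, ax = plt.subplots()
ax.bar(x, s.values, 0.9)  # categorical axis: bars at positions 0..13
ax.set_ylabel("income [Eur]")

# At most about 5 intervals, and only whole-number (on-a-bar) positions
ax.xaxis.set_major_locator(MaxNLocator(nbins=5, integer=True))
plt.show()
```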
I'm trying to plot two datasets into one plot with matplotlib. One of the two plots is misaligned by 1 on the x-axis.
This MWE pretty much sums up the problem. What do I have to adjust to bring the box-plot further to the left?
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
titles = ["nlnd", "nlmd", "nlhd", "mlnd", "mlmd", "mlhd", "hlnd", "hlmd", "hlhd"]
plotData = pd.DataFrame(np.random.rand(25, 9), columns=titles)
failureRates = pd.DataFrame(np.random.rand(9, 1), index=titles)
color = {'boxes': 'DarkGreen', 'whiskers': 'DarkOrange', 'medians': 'DarkBlue',
'caps': 'Gray'}
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
plotData.plot.box(ax=ax1, color=color, sym='+')
failureRates.plot(ax=ax2, color='b', legend=False)
ax1.set_ylabel('Seconds')
ax2.set_ylabel('Failure Rate in %')
plt.xlim(-0.7, 8.7)
ax1.set_xticks(range(len(titles)))
ax1.set_xticklabels(titles)
fig.tight_layout()
fig.show()
Actual result. Note that there are only 8 box plots instead of 9 and that they start at index 1.
The issue is a mismatch between how box() and plot() work: box() places the boxes starting at x-position 1, while plot() uses the index of the dataframe (a string index, as here, is drawn at 0, 1, 2, ...). There are only 8 boxes because the 9th one, at x = 9, is cut off by plt.xlim(-0.7, 8.7). There are several easy ways to fix this. As @Sheldore's answer indicates, you can explicitly set the positions of the boxplot. Another way is to make the index of the failureRates dataframe start at 1 when constructing it, i.e.
failureRates = pd.DataFrame(np.random.rand(9, 1), index=range(1, len(titles)+1))
Note that you need not specify the xticks or the xlim for the MCVE in the question, but you may need to in your complete code.
You can specify the positions on the x-axis where you want the box plots. Since you have 9 boxes, use the following, which generates the figure below:
plotData.plot.box(ax=ax1, color=color, sym='+', positions=range(9))
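Here is the positions fix in a complete sketch, with the color dict dropped for brevity. With the boxes at 0 to 8, they line up with the failure-rate line. pandas draws that line at 0 to 8 because the failureRates index is made of strings:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

titles = ["nlnd", "nlmd", "nlhd", "mlnd", "mlmd", "mlhd", "hlnd", "hlmd", "hlhd"]
plotData = pd.DataFrame(np.random.rand(25, 9), columns=titles)
failureRates = pd.DataFrame(np.random.rand(9, 1), index=titles)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

# Boxes at x = 0..8 instead of the boxplot default 1..9
plotData.plot.box(ax=ax1, sym="+", positions=range(9))
# String index: pandas draws the line at x = 0..8
failureRates.plot(ax=ax2, color="b", legend=False)

ax1.set_ylabel("Seconds")
ax2.set_ylabel("Failure Rate in %")
fig.tight_layout()
plt.show()
```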
I am plotting aggregated data in Python, using Pandas and Matplotlib.
My axis customization commands fail or succeed depending on which of two similar functions I call to make bar plots. The working case is e.g.:
import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def format_x_date_month_day(ax):
    days = mdates.DayLocator()
    months = mdates.MonthLocator()  # every month
    dayFmt = mdates.DateFormatter('%D')
    monthFmt = mdates.DateFormatter('%Y-%m')
    ax.figure.autofmt_xdate()
    ax.xaxis.set_major_locator(months)
    ax.xaxis.set_major_formatter(monthFmt)
    ax.xaxis.set_minor_locator(days)
span_days = 90
start = pd.to_datetime("1-1-2012")
idx = pd.date_range(start, periods=span_days).tolist()
df=pd.DataFrame(index=idx, data={'A':np.random.random(span_days), 'B':np.random.random(span_days)})
plt.close('all')
fig, ax = plt.subplots(1)
ax.bar(df.index, df.A) # loop over columns here to do stacked plot
format_x_date_month_day(ax)
plt.show()
(See matplotlib.org for example of looping to create a stacked bar plot.) This gives us
Another approach that should work and be much easier is to use df.plot.bar(ax=ax, stacked=True); however, it does not admit date axis formatting with mdates:
plt.close('all')
fig, ax = plt.subplots(1)
df.plot.bar(ax=ax, stacked=True)
format_x_date_month_day(ax)
plt.show()
How can mdates and ax.figure.autofmt_xdate() be made to play nice with df.plot.bar?
Bar plots in pandas are designed to compare categories rather than to display time-series or other types of continuous variables, as stated in the docstring:
A bar plot shows comparisons among discrete categories. One axis of the plot shows the specific categories being compared, and the other axis represents a measured value.
This is why the scale of the x-axis of pandas bar plots is made of integers starting from zero, regardless of the data type of the x variable. When the same bar plot is created with matplotlib, the scale of the x-axis is made of matplotlib date numbers, so the tick locators and formatters of the matplotlib.dates module (mdates) can be used as expected.
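The difference between the two scales can be seen directly. In the following sketch (made-up data), the pandas bars are centred on 0, 1 and 2 even though the index holds dates, while mdates.date2num maps the same dates to floats counted in days:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

idx = pd.date_range("2012-01-01", periods=3, freq="D")
df = pd.DataFrame({"A": [1, 2, 3]}, index=idx)

# pandas bar plot: bar centres are integer positions, not dates
ax = df.plot.bar()
centers = [p.get_x() + p.get_width() / 2 for p in ax.patches]
print(centers)

# matplotlib date numbers for the same dates: consecutive days differ by 1.0
nums = mdates.date2num(idx)
print(nums)
plt.close(ax.figure)
```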
To be able to use a pandas bar plot with mdates, you need to move the bars along the x-axis to locations that match the matplotlib date numbers. This can be done thanks to the mdates.date2num function. This is illustrated in the following example based on the code you provided with a few modifications: the sample dataset contains 3 variables, the time series is limited to 45 days, and the tick formatting is adjusted to my preferences (and is not wrapped as a function).
This example works for any number of variables (with or without NaNs) and for any bar width that is passed to the pandas plot function:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.dates as mdates # v 3.3.2
# Create random dataset
rng = np.random.default_rng(seed=1) # random number generator
nperiods = 45
nvar = 3
idx = pd.date_range('2012-01-01', periods=nperiods, freq='D')
df = pd.DataFrame(rng.integers(11, size=(idx.size, nvar)),
index=idx, columns=list('ABC'))
# Draw pandas stacked bar chart
ax = df.plot(kind='bar', stacked=True, figsize=(10,5))
# Compute width of bars in matplotlib date units
pandas_width = ax.patches[0].get_width() # the default bar width is 0.5
mdates_x0 = mdates.date2num(df.index[0])
mdates_x1 = mdates.date2num(df.index[1])
mdates_width_default = (mdates_x1-mdates_x0)/2
mdates_width = pandas_width*mdates_width_default/0.5 # rule of three conversion
# Compute new x values for bars in matplotlib date units, adjusting the
# positions according to the bar width
mdates_x = mdates.date2num(df.index) - mdates_width/2
nvar = len(ax.get_legend_handles_labels()[1])
mdates_x_patches = np.ravel(nvar*[mdates_x])
# Set bars to new x positions: this loop works fine with NaN values as
# well because in bar plot NaNs are drawn with a rectangle of 0 height
# located at the foot of the bar, you can verify this with patch.get_bbox()
for patch, new_x in zip(ax.patches, mdates_x_patches):
    patch.set_x(new_x)
    patch.set_width(mdates_width)
# Set major and minor date tick locators
months = mdates.MonthLocator()
days = mdates.DayLocator(bymonthday=np.arange(31, step=3))
ax.xaxis.set_major_locator(months)
ax.xaxis.set_minor_locator(days)
# Set major and minor date tick formatters
month_fmt = mdates.DateFormatter('\n%b\n%Y')
day_fmt = mdates.DateFormatter('%d')
ax.xaxis.set_major_formatter(month_fmt)
ax.xaxis.set_minor_formatter(day_fmt)
# Shift the plot frame to where the bars are now located
xmin = min(mdates_x) - mdates_width
xmax = max(mdates_x) + 2*mdates_width
ax.set_xlim(xmin, xmax)
# Adjust tick label format last, else it may produce unexpected results
ax.figure.autofmt_xdate(rotation=0, ha='center')
Up to you to decide if this is more convenient than plotting stacked bars from scratch with matplotlib.
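For comparison, the from-scratch route (also hinted at by the question's "loop over columns here to do stacked plot" comment) could look like the following sketch. Each column is stacked by passing the running total as bottom= to ax.bar. Because the x values are real dates, the mdates locators and formatters work directly. The dataset mirrors the 45-day, three-variable example above; the 0.8-day bar width and the month formatter are assumptions.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of random dataset as above
rng = np.random.default_rng(seed=1)
idx = pd.date_range("2012-01-01", periods=45, freq="D")
df = pd.DataFrame(rng.integers(11, size=(idx.size, 3)),
                  index=idx, columns=list("ABC"))

fig, ax = plt.subplots(figsize=(10, 5))
bottom = np.zeros(len(df))
for col in df.columns:
    # width is in days because the x values are dates
    ax.bar(df.index, df[col], width=0.8, bottom=bottom, label=col)
    bottom = bottom + df[col].to_numpy()  # running total for the next layer

ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b\n%Y"))
ax.legend()
plt.show()
```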
This solution can be slightly modified to display appropriate tick labels for time series based on any frequency of time. Here is an example using a frequency of minutes, a custom bar width, and an automatic date tick locator and formatter. Only the new/modified code lines are shown:
import matplotlib.ticker as mtick
#...
idx = pd.date_range('2012-01-01 12', periods=nperiods, freq='min')  # minutes ('T' is an older alias)
#...
ax = df.plot(kind='bar', stacked=True, figsize=(10,5), width=0.3)
#...
# Set adaptive tick locators and major tick formatter
maj_loc = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(maj_loc)
min_loc = mtick.FixedLocator(mdates_x + mdates_width/2)
ax.xaxis.set_minor_locator(min_loc) # draw minor tick under each bar
fmt = mdates.ConciseDateFormatter(maj_loc)
ax.xaxis.set_major_formatter(fmt)
#...
You may notice that the ticks are often not well aligned with the bars. There appears to be some issue with matplotlib when the figure elements are put together. I find this is usually only noticeable when plotting thinner-than-useful bars. You can check that the bars and ticks are indeed placed correctly by running ax.get_xticks() and comparing that to the values given by patch.get_bbox() when looping through ax.patches.
I am using matplotlib 1.2.x and Python 2.6.5 on Ubuntu 10.04. I am trying to create a SINGLE plot that consists of a top plot and a bottom plot.
The X axis is the date of the time series. The top plot contains a candlestick plot of the data, and the bottom plot should consist of a bar type plot - with its own Y axis (also on the left - same as the top plot). These two plots should NOT OVERLAP.
Here is a snippet of what I have done so far.
datafile = r'/var/tmp/trz12.csv'
r = mlab.csv2rec(datafile, delimiter=',', names=('dt', 'op', 'hi', 'lo', 'cl', 'vol', 'oi'))
mask = (r["dt"] >= datetime.date(startdate)) & (r["dt"] <= datetime.date(enddate))
selected = r[mask]
plotdata = zip(date2num(selected['dt']), selected['op'], selected['cl'], selected['hi'], selected['lo'], selected['vol'], selected['oi'])
# Setup charting
mondays = WeekdayLocator(MONDAY) # major ticks on the mondays
alldays = DayLocator() # minor ticks on the days
weekFormatter = DateFormatter('%b %d') # Eg, Jan 12
dayFormatter = DateFormatter('%d') # Eg, 12
monthFormatter = DateFormatter('%b %y')
# every Nth month
months = MonthLocator(range(1,13), bymonthday=1, interval=1)
fig = pylab.figure()
fig.subplots_adjust(bottom=0.1)
ax = fig.add_subplot(111)
ax.xaxis.set_major_locator(months)#mondays
ax.xaxis.set_major_formatter(monthFormatter) #weekFormatter
ax.format_xdata = mdates.DateFormatter('%Y-%m-%d')
ax.format_ydata = price
ax.grid(True)
candlestick(ax, plotdata, width=0.5, colorup='g', colordown='r', alpha=0.85)
ax.xaxis_date()
ax.autoscale_view()
pylab.setp( pylab.gca().get_xticklabels(), rotation=45, horizontalalignment='right')
# Add volume data
# Note: the code below OVERWRITES the bottom part of the first plot
# it should be plotted UNDERNEATH the first plot - but somehow, that's not happening
fig.subplots_adjust(hspace=0.15)
ay = fig.add_subplot(212)
volumes = [ x[-2] for x in plotdata]
ay.bar(range(len(plotdata)), volumes, 0.05)
pylab.show()
I have managed to display the two plots using the code above; however, there are two problems with the bottom plot:
It COMPLETELY OVERWRITES the bottom part of the first (top) plot, almost as though the second plot were drawing on the same 'canvas' as the first plot. I can't see where or why that is happening.
It OVERWRITES the existing X axis with its own indices; the X axis values (dates) should be SHARED between the two plots.
What am I doing wrong in my code? Can someone spot what is causing the 2nd (bottom) plot to overwrite the first (top) plot, and how can I fix this?
Here is a screenshot of the plot created by the code above:
[[Edit]]
After modifying the code as suggested by hwlau, this is the new plot. It is better than the first in that the two plots are separate; however, the following issues remain:
The X axis should be SHARED by the two plots (i.e. the X axis should be shown only for the 2nd [bottom] plot)
The Y values for the 2nd plot seem to be formatted incorrectly
I think these issues should be quite easy to resolve; however, my matplotlib-fu is not great at the moment, as I have only recently started programming with matplotlib. Any help will be much appreciated.
There seem to be a couple of problems with your code:
If you were using fig.add_subplot with the full signature subplot(nrows, ncols, plotNum), it might have been more apparent that your first plot was asking for 1 row and 1 column, while the second was asking for 2 rows and 1 column. Hence your first plot fills the whole figure.
Rather than fig.add_subplot(111) followed by fig.add_subplot(212)
use fig.add_subplot(211) followed by fig.add_subplot(212).
Sharing an axis should be done in the add_subplot command using sharex=first_axis_instance
I have put together an example which you should be able to run:
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import matplotlib.dates as mdates
import datetime as dt
n_pts = 10
dates = [dt.datetime.now() + dt.timedelta(days=i) for i in range(n_pts)]
ax1 = plt.subplot(2, 1, 1)
ax1.plot(dates, range(10))
ax2 = plt.subplot(2, 1, 2, sharex=ax1)
ax2.bar(dates, range(10, 20))
# Now format the x axis. This *MUST* be done after all sharex commands are run.
# put no more than 10 ticks on the date axis.
ax1.xaxis.set_major_locator(mticker.MaxNLocator(10))
# format the date in our own way.
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
# rotate the labels on both date axes
for label in ax1.xaxis.get_ticklabels():
label.set_rotation(30)
for label in ax2.xaxis.get_ticklabels():
label.set_rotation(30)
# tweak the subplot spacing to fit the rotated labels correctly
plt.subplots_adjust(hspace=0.35, bottom=0.125)
plt.show()
Hope that helps.
You should change this line:
ax = fig.add_subplot(111)
to
ax = fig.add_subplot(211)
The original command means one row and one column, so that subplot occupies the whole figure. Your second subplot, fig.add_subplot(212), therefore covers the lower part of the first one.
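Here is a minimal sketch of that change using the question's fig.add_subplot calls, with made-up data. A line plot stands in for candlestick, which is no longer part of matplotlib. Passing sharex= on the second call (an addition here) gives both subplots the same date axis, which the question's edit also asks for:

```python
import datetime as dt
import matplotlib.pyplot as plt

dates = [dt.date(2012, 1, 2) + dt.timedelta(days=i) for i in range(30)]
prices = [100 + i % 7 for i in range(30)]     # made-up prices
volumes = [1000 + 37 * i for i in range(30)]  # made-up volumes

fig = plt.figure()
ax = fig.add_subplot(211)              # 2 rows, 1 column, slot 1: top half
ax.plot(dates, prices)
ay = fig.add_subplot(212, sharex=ax)   # slot 2: bottom half, same x-axis
ay.bar(dates, volumes)                 # bars at the dates, not at range(len(...))
ax.tick_params(labelbottom=False)      # hide the date labels under the top plot
fig.autofmt_xdate()                    # rotate the remaining date labels
plt.show()
```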
Edit
If you don't want the gap between the two plots, use subplots_adjust() to change the spacing between the subplots, e.g. subplots_adjust(hspace=0).
The example from @Pelson, simplified:
import matplotlib.pyplot as plt
import datetime as dt
#Two subplots that share one x axis
fig,ax=plt.subplots(2,sharex=True)
#plot data
n_pts = 10
dates = [dt.datetime.now() + dt.timedelta(days=i) for i in range(n_pts)]
ax[0].bar(dates, range(10, 20))
ax[1].plot(dates, range(10))
#rotate and format the dates on the x axis
fig.autofmt_xdate()
The subplots sharing an x-axis are created in one line, which is convenient when you want more than two subplots:
fig, ax = plt.subplots(number_of_subplots, sharex=True)
To format the date correctly on the x axis, we can simply use fig.autofmt_xdate()
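The following sketch shows the same pattern with three panels; the number of subplots and the data are arbitrary. All axes returned by plt.subplots(..., sharex=True) share one x-axis, so a date formatter set on one of them applies to all:

```python
import datetime as dt
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

number_of_subplots = 3  # arbitrary for this sketch
n_pts = 10
dates = [dt.datetime(2021, 1, 1) + dt.timedelta(days=i) for i in range(n_pts)]

fig, ax = plt.subplots(number_of_subplots, sharex=True)
ax[0].plot(dates, range(10))
ax[1].bar(dates, range(10, 20))
ax[2].plot(dates, [v ** 2 for v in range(10)])

# Set once: the shared x-axes use the same formatter object
ax[-1].xaxis.set_major_formatter(mdates.DateFormatter("%m-%d"))
fig.autofmt_xdate()
plt.show()
```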
For additional information, see the shared axis demo and the date demo from the pylab examples.
This example was run on Python 3 with matplotlib 1.5.1.