I am trying to plot a time series with matplotlib. The problem is that there are too many observations, so the tick labels overlap and do not fit within a figure of fixed size.
I can think of three solutions: shrink the label font size, rotate the labels so they read vertically or at an angle, or label only the first and last few observations with dots between them. The code below demonstrates the problem.
Can anyone help? Thanks.
from datetime import date
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
N = 100
data = np.random.randn(N)
time_index = pd.date_range(date.today(), periods=len(data))
plt.plot(time_index, data)
For your simple plot, you could do
plt.xticks(rotation=90)
Alternatively, you could specify which ticks to display, together with their labels, in a single call:
plt.xticks(<certain range of values>, <labels for those values>)
Edit:
Personally, I would change to the object-oriented way of pyplot.
f = plt.figure()
ax = f.add_subplot(111)
ax.plot(<stuff>)
ax.tick_params(axis='x', labelsize=8)
plt.setp( ax.xaxis.get_majorticklabels(), rotation=90 )
# OR
xlabels = ax.get_xticklabels()
for label in xlabels:
    label.set_rotation(90)
plt.show()
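For reference, here is a minimal self-contained sketch of the object-oriented variant applied to data like the question's. The fixed start date, the seeded random data, and the Agg backend are assumptions made so that the example runs without a display; labelrotation is the tick_params counterpart of the setp call above.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove it to view the figure with plt.show()
import matplotlib.pyplot as plt

# Illustrative data: 100 daily observations (start date chosen arbitrarily)
rng = np.random.default_rng(0)
data = rng.standard_normal(100)
time_index = pd.date_range("2024-01-01", periods=len(data))

fig, ax = plt.subplots()
ax.plot(time_index, data)

# Shrink the x tick labels and rotate them in one call
ax.tick_params(axis="x", labelsize=8, labelrotation=90)

fig.tight_layout()   # make room for the rotated labels
fig.canvas.draw()    # force the ticks to be computed

rotations = {label.get_rotation() for label in ax.get_xticklabels()}
```

Using tick_params rather than looping over the label objects means the settings also apply to ticks that matplotlib creates later, for example after zooming.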
I am querying COVID-19 data and building a dataframe of day-over-day changes for one of the data points (positive test results) where each row is a day, each column is a state or territory (there are 56 altogether). I can then generate a chart for every one of the states, but I can't get my x-axis labels (the dates) to behave like I want. There are two problems which I suspect are related. First, there are too many labels -- usually matplotlib tidily reduces the label count for readability, but I think the subplots are confusing it. Second, I would like the labels to read vertically; but this only happens on the last of the plots. (I tried moving the rotation='vertical' inside the for block, to no avail.)
The dates are the same for all the subplots, so -- this part works -- the x-axis labels only need to appear on the bottom row of the subplots. Matplotlib is doing this automatically. But I need fewer of the labels, and for all of them to align vertically. Here is my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# get current data
all_states = pd.read_json("https://covidtracking.com/api/v1/states/daily.json")
# convert the YYYYMMDD date to a datetime object
all_states[['gooddate']] = all_states[['date']].applymap(lambda s: pd.to_datetime(str(s), format = '%Y%m%d'))
# 'positive' is the cumulative total of COVID-19 test results that are positive
all_states_new_positives = all_states.pivot_table(index = 'gooddate', columns = 'state', values = 'positive', aggfunc='sum')
all_states_new_positives_diff = all_states_new_positives.diff()
fig, axes = plt.subplots(14, 4, figsize = (12,8), sharex = True )
plt.tight_layout
for i, ax in enumerate(axes.ravel()):
    # get the numbers for the last 28 days
    x = all_states_new_positives_diff.iloc[-28:].index
    y = all_states_new_positives_diff.iloc[-28:, i]
    ax.set_title(y.name, loc='left', fontsize=12, fontweight=0)
    ax.plot(x, y)
plt.xticks(rotation='vertical')
plt.subplots_adjust(left=0.5, bottom=1, right=1, top=4, wspace=2, hspace=2)
plt.show();
Suggestions:
Increase the height of the figure.
fig, axes = plt.subplots(14, 4, figsize = (12,20), sharex = True)
Rotate all the labels:
fig.autofmt_xdate(rotation=90)
Use tight_layout at the end instead of subplots_adjust:
fig.tight_layout()
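The three suggestions can be combined as in the sketch below. It uses synthetic data instead of the COVID feed so that it runs offline; the 4x2 grid, the random counts, the series titles, and the Agg backend are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove it to view the figure with plt.show()
import matplotlib.pyplot as plt

# Synthetic stand-in for the last 28 days of per-state data
dates = pd.date_range("2020-04-01", periods=28)
rng = np.random.default_rng(0)

# A taller figure leaves room for every row of subplots
fig, axes = plt.subplots(4, 2, figsize=(8, 10), sharex=True)
for i, ax in enumerate(axes.ravel()):
    ax.plot(dates, rng.integers(0, 500, size=len(dates)))
    ax.set_title(f"series {i}", loc="left", fontsize=10)

# Rotate the date labels on the bottom row in one call
fig.autofmt_xdate(rotation=90)

# Let matplotlib work out the spacing instead of subplots_adjust
fig.tight_layout()
fig.canvas.draw()

bottom_rotations = {
    label.get_rotation() for ax in axes[-1] for label in ax.get_xticklabels()
}
```

plt.xticks only acts on the current axes, which is why the rotation in the question reached only the last subplot; fig.autofmt_xdate works on the whole figure.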
I frequently find myself working in log units for my plots, for example taking np.log10(x) of data before binning it or creating contour plots. The problem is, when I then want to make the plots presentable, the axes are in ugly log units, and the tick marks are evenly spaced.
If I let matplotlib do all the conversions, i.e. by calling ax.set_xscale('log'), then I get very nice-looking axes. However, I can't do that with my data since it is, e.g., already binned in log units. I could manually change the tick labels, but that wouldn't make the tick spacing logarithmic. I suppose I could also manually specify the position of every minor tick so that it has log spacing, but is that the only way to achieve this? That is a bit tedious, so it would be nice if there were a better way.
For concreteness, here is a plot:
I want the tick labels to read 10^x and 10^y (so '1' becomes '10', '2' becomes '100', etc.), and I want the minor ticks to be drawn as ax.set_xscale('log') would draw them.
Edit: For further concreteness, suppose the plot is generated from an image, like this:
import matplotlib.pyplot as plt
import scipy.misc
# Note: scipy.misc.face() was removed in SciPy 1.12; on newer versions use scipy.datasets.face()
img = scipy.misc.face()
x_range = [-5,3] # log10 units
y_range = [-55, -45] # log10 units
p = plt.imshow(img,extent=x_range+y_range)
plt.show()
and all we want to do is change the axes appearance as I have described.
Edit 2: Ok, ImportanceOfBeingErnest's answer is very clever but it is a bit more specific to images than I wanted. I have another example, of binned data this time. Perhaps their technique still works on this, though it is not clear to me if that is the case.
import numpy as np
import pandas as pd
import datashader as ds
from matplotlib import pyplot as plt
import scipy.stats as sps
v1 = sps.lognorm(loc=0, scale=3, s=0.8)
v2 = sps.lognorm(loc=0, scale=1, s=0.8)
x = np.log10(v1.rvs(100000))
y = np.log10(v2.rvs(100000))
x_range=[np.min(x),np.max(x)]
y_range=[np.min(y),np.max(y)]
df = pd.DataFrame.from_dict({"x": x, "y": y})
#------ Aggregate the data ------
cvs = ds.Canvas(plot_width=30, plot_height=30, x_range=x_range, y_range=y_range)
agg = cvs.points(df, 'x', 'y')
# Create contour plot
fig = plt.figure()
ax = fig.add_subplot(111)
ax.contourf(agg, extent=x_range+y_range)
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
The general answer to this question is probably given in this post:
Can I mimic a log scale of an axis in matplotlib without transforming the associated data?
However here an easy option might be to scale the content of the axes and then set the axes to a log scale.
A. image
You may plot your image on a logarithmic scale while making all pixels the same size in log units. Unfortunately, imshow no longer supports this kind of image, but pcolormesh can be used for that purpose.
import numpy as np
import matplotlib.pyplot as plt
import scipy.misc
img = scipy.misc.face()
extx = [-5,3] # log10 units
exty = [-45, -55] # log10 units
x = np.logspace(extx[0],extx[-1],img.shape[1]+1)
y = np.logspace(exty[0],exty[-1],img.shape[0]+1)
X,Y = np.meshgrid(x,y)
c = img.reshape((img.shape[0]*img.shape[1],img.shape[2]))/255.0
m = plt.pcolormesh(X,Y,X[:-1,:-1], color=c, linewidth=0)
m.set_array(None)
plt.gca().set_xscale("log")
plt.gca().set_yscale("log")
plt.show()
B. contour
The same concept can be used for a contour plot.
import numpy as np
from matplotlib import pyplot as plt
x = np.linspace(-1.1,1.9)
y = np.linspace(-1.4,1.55)
X,Y = np.meshgrid(x,y)
agg = np.exp(-(X**2+Y**2)*2)
fig, ax = plt.subplots()
ax.set_xscale("log")
ax.set_yscale("log")
exp = lambda x: 10.**(np.array(x))
cf = ax.contourf(exp(X), exp(Y),agg, extent=exp([x.min(),x.max(),y.min(),y.max()]))
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
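If rescaling the plotted content is not an option, the manual route mentioned in the question (keeping the data in log10 units and only restyling the ticks) can be wrapped in a small helper. The sketch below is one possible way to do it; log_style_axis is a hypothetical helper name, and the random image and the Agg backend are assumptions for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove it to view the figure with plt.show()
import matplotlib.pyplot as plt
from matplotlib.ticker import FixedLocator, FuncFormatter, MultipleLocator


def log_style_axis(axis, lo, hi):
    """Make a linear axis holding log10 values look like a log axis.

    Hypothetical helper: `axis` is ax.xaxis or ax.yaxis and
    (lo, hi) is the data range in log10 units.
    """
    decades = np.arange(np.floor(lo), np.ceil(hi))
    # Minor ticks at log10(2 * 10**k) ... log10(9 * 10**k) for each decade k
    minor = (decades[:, None] + np.log10(np.arange(2, 10))[None, :]).ravel()
    axis.set_major_locator(MultipleLocator(1))   # one major tick per decade
    axis.set_minor_locator(FixedLocator(minor))
    axis.set_major_formatter(
        FuncFormatter(lambda v, pos: rf"$10^{{{int(round(v))}}}$")
    )
    return minor


fig, ax = plt.subplots()
img = np.random.default_rng(0).random((40, 60))  # stand-in for the real image
ax.imshow(img, extent=[-5, 3, -55, -45], aspect="auto")
x_minor = log_style_axis(ax.xaxis, -5, 3)
y_minor = log_style_axis(ax.yaxis, -55, -45)
fig.canvas.draw()

x_labels = [t.get_text() for t in ax.get_xticklabels()]
```

The data and the axis scale stay linear in log10 units, so this works equally for imshow, contourf, or binned output; only the tick positions and labels change.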
I have to plot several curves with very high xtick density, say 1000 date strings. To prevent these tick labels overlapping each other I manually set them to be 60 dates apart. Code below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts_index = pd.period_range(start="20060429", periods=1000).strftime("%Y%m%d")
fig = plt.figure(1)
ax = plt.subplot(1, 1, 1)
tick_spacing = 60
for i in range(5):
    plt.plot(ts_index, 1 + i * 0.01 * np.arange(0, 1000), label="group %d"%i)
plt.legend(loc='best')
plt.title(r'net value curves')
xticks = ax.get_xticks()
xlabels = ax.get_xticklabels()
ax.set_xticks(xticks[::tick_spacing])
ax.set_xticklabels(xlabels[::tick_spacing])
plt.xticks(rotation="vertical")
plt.xlabel(r'date')
plt.ylabel('net value')
plt.grid(True)
plt.show()
fig.savefig(r".\net_value_curves.png", )
fig.clf()
I'm running this piece of code in PyCharm Community Edition 2017.2.2 with a Python 3.6 kernel. Now comes the funny thing: whenever I ran the code in the normal "run" mode (i.e. just hit the execution button and let the code run "freely" till interruption or termination), then the figure I got would always miss xticklabels:
However, if I ran the code in "debug" mode and ran it step by step then I would get an expected figure with complete xticklabels:
This is really weird. Anyway, I just hope to find a way that can ensure me getting the desired output (the second figure) in the normal "run" mode. How can I modify my current code to achieve this?
Thanks in advance!
Your x axis data are strings. Hence you will get one tick per data point. This is probably not what you want. Instead use the dates to plot. Because you are using pandas, this is easily converted,
dates = pd.to_datetime(ts_index, format="%Y%m%d")
You may then get rid of your manual xtick locating and formatting, because matplotlib will automatically choose some nice tick locations for you.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts_index = pd.period_range(start="20060429", periods=1000).strftime("%Y%m%d")
dates = pd.to_datetime(ts_index, format="%Y%m%d")
fig, ax = plt.subplots()
for i in range(5):
    plt.plot(dates, 1 + i * 0.01 * np.arange(0, 1000), label="group %d"%i)
plt.legend(loc='best')
plt.title(r'net value curves')
plt.xticks(rotation="vertical")
plt.xlabel(r'date')
plt.ylabel('net value')
plt.grid(True)
plt.show()
However in case you do want to have some manual control over the locations and formats you may use matplotlib.dates locators and formatters.
import matplotlib.dates as mdates
# tick every 3 months
plt.gca().xaxis.set_major_locator(mdates.MonthLocator((1,4,7,10)))
# format as "%Y%m%d"
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%Y%m%d"))
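Put together with the plotting code, the locator and formatter approach might look like the following sketch. It draws a single curve instead of five to stay short, and the Agg backend is an assumption so that it runs without a display.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove it to view the figure with plt.show()
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Real datetimes on the x axis, not strings
dates = pd.date_range("2006-04-29", periods=1000)

fig, ax = plt.subplots()
ax.plot(dates, 1 + 0.01 * np.arange(1000), label="group 1")

# One major tick on the first day of January, April, July and October
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 4, 7, 10)))
# Label each tick as YYYYMMDD
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y%m%d"))

fig.autofmt_xdate(rotation=90)
fig.canvas.draw()

labels = [t.get_text() for t in ax.get_xticklabels()]
```

Because the ticks are computed by the locator at draw time, the result does not depend on when get_xticks or get_xticklabels happens to be called.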
In general, an Axis object computes and places its ticks using a Locator object. Locators and Formatters are meant to be easily replaceable through the appropriate methods of Axis. The default Locator does not seem to be doing the trick for you, so you can replace it with anything you want using ax.xaxis.set_major_locator. This problem is not complicated enough to warrant writing your own, so I would suggest MaxNLocator, which fits your needs fairly well. Your example seems to work well with nbins=16 (which is what you have in the picture, since there are 17 ticks).
You need to add an import:
from matplotlib.ticker import MaxNLocator
You need to replace the block
xticks = ax.get_xticks()
xlabels = ax.get_xticklabels()
ax.set_xticks(xticks[::tick_spacing])
ax.set_xticklabels(xlabels[::tick_spacing])
with
ax.xaxis.set_major_locator(MaxNLocator(nbins=16))
or just
ax.xaxis.set_major_locator(MaxNLocator(16))
You may want to play around with the other arguments (all of which have to be keywords, except nbins). Pay particular attention to the integer argument.
Note that for the Locator and Formatter APIs we work with an Axis object, not Axes. Axes is the whole plot, while Axis is the thing with the spines on it. Axes usually contains two Axis objects and all the other stuff in your plot.
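As a rough end-to-end illustration, the sketch below applies MaxNLocator to string x data like the question's. The date strings are rebuilt with the standard library rather than pandas, and integer=True and the Agg backend are assumptions for this example.

```python
from datetime import date, timedelta

import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove it to view the figure with plt.show()
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

# 1000 date strings, one per day, as in the question
start = date(2006, 4, 29)
labels = [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(1000)]

fig, ax = plt.subplots()
ax.plot(labels, 1 + 0.01 * np.arange(1000))

# Replace the default one-tick-per-string locator on the x Axis.
# integer=True keeps the ticks on whole positions, which is where
# the category labels live.
ax.xaxis.set_major_locator(MaxNLocator(nbins=16, integer=True))
ax.tick_params(axis="x", labelrotation=90)

fig.canvas.draw()
tick_positions = ax.get_xticks()
```

The locator is attached to ax.xaxis (the Axis), not to ax (the Axes), which matches the distinction described above.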
You can set the visibility of the x tick labels to False:
for label in plt.gca().xaxis.get_ticklabels()[::N]:
    label.set_visible(False)
This will set every Nth label invisible.
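For thinning out crowded labels, the inverse is often what is wanted: keep every Nth label and hide the rest. A minimal sketch of that variant follows; the 50-point line, the name keep_every, and the Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove it to view the figure with plt.show()
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(range(50), range(50))
ax.set_xticks(range(50))  # one tick per data point, i.e. far too many labels

keep_every = 10  # hypothetical spacing: show labels 0, 10, 20, ...
for i, label in enumerate(ax.xaxis.get_ticklabels()):
    label.set_visible(i % keep_every == 0)

# Collect the visibility flags of all major tick labels
label_visibility = [tick.label1.get_visible() for tick in ax.xaxis.get_major_ticks()]
```

The tick marks themselves stay in place; only the label text is hidden.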
Changing the vertical distance between two subplots using tight_layout(h_pad=-1) changes the total figure size. How can I define the figure size when using tight_layout?
Here is the code:
#define figure
pl.figure(figsize=(10, 6.25))
ax1=subplot(211)
img=pl.imshow(np.random.random((10,50)), interpolation='none')
ax1.set_xticklabels(()) #hides the tickslabels of the first plot
subplot(212)
x=linspace(0,50)
pl.plot(x,x,'k-')
xlim( ax1.get_xlim() ) #same x-axis for both plots
And here is the result:
If I write
pl.tight_layout(h_pad=-2)
in the last line, then I get this:
As you can see, the figure is bigger...
You can use a GridSpec object to control precisely width and height ratios, as answered on this thread and documented here.
Experimenting with your code, I could produce something like what you want, by using a height_ratio that assigns twice the space to the upper subplot, and increasing the h_pad parameter to the tight_layout call. This does not sound completely right, but maybe you can adjust this further ...
import numpy as np
from matplotlib.pyplot import *
import matplotlib.pyplot as pl
import matplotlib.gridspec as gridspec
#define figure
fig = pl.figure(figsize=(10, 6.25))
gs = gridspec.GridSpec(2, 1, height_ratios=[2,1])
ax1=subplot(gs[0])
img=pl.imshow(np.random.random((10,50)), interpolation='none')
ax1.set_xticklabels(()) #hides the tickslabels of the first plot
ax2=subplot(gs[1])
x=np.linspace(0,50)
ax2.plot(x,x,'k-')
xlim( ax1.get_xlim() ) #same x-axis for both plots
fig.tight_layout(h_pad=-5)
show()
There were other issues, like correcting the imports, adding numpy, and plotting to ax2 instead of directly with pl. The output I see is this:
This case is peculiar because of the fact that the default aspect ratios of images and plots are not the same. So it is worth noting for people looking to remove the spaces in a grid of subplots consisting of images only or of plots only that you may find an appropriate solution among the answers to this question (and those linked to it): How to remove the space between subplots in matplotlib.pyplot?.
The aspect ratios of the subplots in this particular example are as follows:
# Default aspect ratio of images:
ax1.get_aspect()
# 1.0
# Which is as it is expected based on the default settings in rcParams file:
matplotlib.rcParams['image.aspect']
# 'equal'
# Default aspect ratio of plots:
ax2.get_aspect()
# 'auto'
The size of ax1 and the space beneath it are adjusted automatically based on the number of pixels along the x-axis (i.e. width) so as to preserve the 'equal' aspect ratio while fitting both subplots within the figure. As you mentioned, using fig.tight_layout(h_pad=xxx) or the similar fig.set_constrained_layout_pads(hspace=xxx) is not a good option as this makes the figure larger.
To remove the gap while preserving the original figure size, you can use fig.subplots_adjust(hspace=xxx) or the equivalent plt.subplots(gridspec_kw=dict(hspace=xxx)), as shown in the following example:
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.2
np.random.seed(1)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6.25),
                               gridspec_kw=dict(hspace=-0.206))
# For those not using plt.subplots, you can use this instead:
# fig.subplots_adjust(hspace=-0.206)
size = 50
ax1.imshow(np.random.random((10, size)))
ax1.xaxis.set_visible(False)
# Create plot of a line that is aligned with the image above
x = np.arange(0, size)
ax2.plot(x, x, 'k-')
ax2.set_xlim(ax1.get_xlim())
plt.show()
I am not aware of any way to define the appropriate hspace automatically so that the gap can be removed for any image width. As stated in the docstring for fig.subplots_adjust(), it corresponds to the height of the padding between subplots, as a fraction of the average axes height. So I attempted to compute hspace by dividing the gap between the subplots by the average height of both subplots like this:
# Extract axes positions in figure coordinates
ax1_x0, ax1_y0, ax1_x1, ax1_y1 = np.ravel(ax1.get_position())
ax2_x0, ax2_y0, ax2_x1, ax2_y1 = np.ravel(ax2.get_position())
# Compute negative hspace to close the vertical gap between subplots
ax1_h = ax1_y1-ax1_y0
ax2_h = ax2_y1-ax2_y0
avg_h = (ax1_h+ax2_h)/2
gap = ax1_y0-ax2_y1
hspace=-(gap/avg_h) # this divided by 2 also does not work
fig.subplots_adjust(hspace=hspace)
Unfortunately, this does not work. Maybe someone else has a solution for this.
It is also worth mentioning that I tried removing the gap between subplots by editing the y positions like in this example:
# Extract axes positions in figure coordinates
ax1_x0, ax1_y0, ax1_x1, ax1_y1 = np.ravel(ax1.get_position())
ax2_x0, ax2_y0, ax2_x1, ax2_y1 = np.ravel(ax2.get_position())
# Set new y positions: shift ax1 down over gap
gap = ax1_y0-ax2_y1
ax1.set_position([ax1_x0, ax1_y0-gap, ax1_x1, ax1_y1-gap])
ax2.set_position([ax2_x0, ax2_y0, ax2_x1, ax2_y1])
Unfortunately, this (and variations of this) produces seemingly unpredictable results, including a figure resizing similar to when using fig.tight_layout(). Maybe someone else has an explanation for what is happening here behind the scenes.
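One detail that may contribute to this: Axes.set_position expects [left, bottom, width, height], whereas the snippet above passes x1 and y1 as the last two entries. The sketch below is a variation worth trying under that reading. It draws the figure once so that the image axes has its aspect-adjusted position, then shifts ax1 down by the measured gap. It is an untested assumption that this behaves the same across matplotlib versions, and the Agg backend is used only so that it runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove it to view the figure with plt.show()
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6.25))
size = 50
ax1.imshow(np.random.default_rng(1).random((10, size)))
ax1.xaxis.set_visible(False)
x = np.arange(0, size)
ax2.plot(x, x, "k-")
ax2.set_xlim(ax1.get_xlim())

# Draw once so the image axes takes its final, aspect-adjusted position
fig.canvas.draw()
box1 = ax1.get_position()
box2 = ax2.get_position()
gap = box1.y0 - box2.y1  # vertical gap in figure coordinates

# set_position takes [left, bottom, width, height], not [x0, y0, x1, y1]
ax1.set_position([box1.x0, box1.y0 - gap, box1.width, box1.height])

fig.canvas.draw()
remaining_gap = ax1.get_position().y0 - ax2.get_position().y1
```

Since the new box already has the image's aspect ratio, the aspect adjustment at draw time should leave it where it was put, and the figure size is not touched.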
I wish to plot how my y-axis variable changes over time using Matplotlib. This is no problem for continuous data, but how should it be tackled for non-continuous data?
For example, if I wanted to visualise the times at which my car was stationary on the way to work, the x-axis would be time and the y-axis would consist of the values 'stationary' and 'moving' (a pretty useless example, I know).
The non-continuous data would need to be indexed somehow, but I don't know how to proceed. Any ideas?
Is this the type of thing you want? (If not, you might want to check out the matplotlib gallery page to give yourself some ideas, or maybe just draw a picture and post it.)
import matplotlib.pyplot as plt
data = [0]*5 + [1]*10 + [0]*3 +[1]*2
print(data)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(data)
ax.set_yticks((0, 1.))
ax.set_yticklabels(('stopped', 'moving'))
ax.set_ybound((-.2, 1.2))
ax.set_xlabel("time (minutes)")
plt.show()
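A small variation on the same idea, in case it is useful: a step plot draws each state as a flat segment with vertical jumps, which can read more naturally for two-state data than sloped lines. This sketch reuses the data above; the explicit minutes list and the Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove it to view the figure with plt.show()
import matplotlib.pyplot as plt

# 0 = stopped, 1 = moving, one sample per minute
state = [0] * 5 + [1] * 10 + [0] * 3 + [1] * 2
minutes = list(range(len(state)))

fig, ax = plt.subplots()
# where="post": each value holds until the next sample
ax.step(minutes, state, where="post")
ax.set_yticks([0, 1])
ax.set_yticklabels(["stopped", "moving"])
ax.set_ylim(-0.2, 1.2)
ax.set_xlabel("time (minutes)")
fig.canvas.draw()

ytick_text = [t.get_text() for t in ax.get_yticklabels()]
```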