I have to plot several curves with very high xtick density, say 1000 date strings. To prevent these tick labels overlapping each other I manually set them to be 60 dates apart. Code below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts_index = pd.period_range(start="20060429", periods=1000).strftime("%Y%m%d")
fig = plt.figure(1)
ax = plt.subplot(1, 1, 1)
tick_spacing = 60
for i in range(5):
    plt.plot(ts_index, 1 + i * 0.01 * np.arange(0, 1000), label="group %d"%i)
plt.legend(loc='best')
plt.title(r'net value curves')
xticks = ax.get_xticks()
xlabels = ax.get_xticklabels()
ax.set_xticks(xticks[::tick_spacing])
ax.set_xticklabels(xlabels[::tick_spacing])
plt.xticks(rotation="vertical")
plt.xlabel(r'date')
plt.ylabel('net value')
plt.grid(True)
plt.show()
fig.savefig(r".\net_value_curves.png")
fig.clf()
I'm running this piece of code in PyCharm Community Edition 2017.2.2 with a Python 3.6 kernel. Now comes the funny thing: whenever I ran the code in the normal "run" mode (i.e. just hit the execution button and let the code run "freely" till interruption or termination), then the figure I got would always miss xticklabels:
However, if I ran the code in "debug" mode and ran it step by step then I would get an expected figure with complete xticklabels:
This is really weird. Anyway, I just want a way to reliably get the desired output (the second figure) in the normal "run" mode. How can I modify my current code to achieve this?
Thanks in advance!
Your x axis data are strings. Hence you will get one tick per data point. This is probably not what you want. Instead, use the dates to plot. Because you are using pandas, this is easily converted:
dates = pd.to_datetime(ts_index, format="%Y%m%d")
You may then get rid of your manual xtick locating and formatting, because matplotlib will automatically choose some nice tick locations for you.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts_index = pd.period_range(start="20060429", periods=1000).strftime("%Y%m%d")
dates = pd.to_datetime(ts_index, format="%Y%m%d")
fig, ax = plt.subplots()
for i in range(5):
    plt.plot(dates, 1 + i * 0.01 * np.arange(0, 1000), label="group %d"%i)
plt.legend(loc='best')
plt.title(r'net value curves')
plt.xticks(rotation="vertical")
plt.xlabel(r'date')
plt.ylabel('net value')
plt.grid(True)
plt.show()
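To make the difference between string and date x data concrete, here is a minimal sketch that counts the ticks matplotlib creates in each case. The variable names and the 200-day range are made up for illustration, NumPy `datetime64` values stand in for the pandas dates, and the `Agg` backend is only selected so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

# 200 consecutive days, once as real dates and once as "%Y%m%d"-style strings
days = np.datetime64("2006-04-29") + np.arange(200)
labels = [str(d).replace("-", "") for d in days]
y = np.arange(200)

fig, (ax_str, ax_date) = plt.subplots(2, 1)
ax_str.plot(labels, y)   # strings are treated as categories: one tick per string
ax_date.plot(days, y)    # dates get an automatic date locator

n_string_ticks = len(ax_str.get_xticks())
n_date_ticks = len(ax_date.get_xticks())
print(n_string_ticks, n_date_ticks)
```

The string axis should report one tick per data point, while the date axis reports only a handful.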
However in case you do want to have some manual control over the locations and formats you may use matplotlib.dates locators and formatters.
import matplotlib.dates as mdates
# tick every 3 months
plt.gca().xaxis.set_major_locator(mdates.MonthLocator((1,4,7,10)))
# format as "%Y%m%d"
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%Y%m%d"))
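Put together, the manual locator and formatter could be used as in the sketch below. It assumes a single curve and NumPy `datetime64` dates in place of the pandas index, and it uses the `Agg` backend only so that it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# 1000 consecutive days starting on the date used in the question
dates = np.datetime64("2006-04-29") + np.arange(1000)

fig, ax = plt.subplots()
ax.plot(dates, 1 + 0.01 * np.arange(1000), label="group 1")

# One major tick on the first day of January, April, July and October
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 4, 7, 10)))
# Render each tick as e.g. 20070101
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y%m%d"))
ax.tick_params(axis="x", labelrotation=90)

fig.canvas.draw()  # force the tick labels to be generated
tick_text = [label.get_text() for label in ax.get_xticklabels()]
print(tick_text)
```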
In general, the Axis object computes and places ticks using a Locator object. Locators and Formatters are meant to be easily replaceable, with appropriate methods of Axis. The default Locator does not seem to be doing the trick for you, so you can replace it with anything you want using axes.xaxis.set_major_locator. This problem is not complicated enough to write your own, so I would suggest that MaxNLocator fits your needs fairly well. Your example seems to work well with nbins=16 (which is what you have in the picture, since there are 17 ticks).
You need to add an import:
from matplotlib.ticker import MaxNLocator
You need to replace the block
xticks = ax.get_xticks()
xlabels = ax.get_xticklabels()
ax.set_xticks(xticks[::tick_spacing])
ax.set_xticklabels(xlabels[::tick_spacing])
with
ax.xaxis.set_major_locator(MaxNLocator(nbins=16))
or just
ax.xaxis.set_major_locator(MaxNLocator(16))
You may want to play around with the other arguments (all of which have to be keywords, except nbins). Pay especial attention to integer.
Note that for the Locator and Formatter APIs we work with an Axis object, not Axes. Axes is the whole plot, while Axis is the thing with the spines on it. Axes usually contains two Axis objects and all the other stuff in your plot.
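As a rough illustration of the Axes/Axis distinction and of what `MaxNLocator` does, here is a small sketch. It uses numeric x data rather than the date strings from the question, so treat it as an assumption-laden demo rather than a drop-in fix; the `Agg` backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.axis import XAxis
from matplotlib.ticker import MaxNLocator

fig, ax = plt.subplots()
ax.plot(range(1000))

# ax is the Axes (the whole plot); ax.xaxis is the Axis that owns the locator
print(isinstance(ax, Axes), isinstance(ax.xaxis, XAxis))

# At most 16 intervals, and only integer tick positions
ax.xaxis.set_major_locator(MaxNLocator(nbins=16, integer=True))

lo, hi = ax.get_xlim()
in_view = [t for t in ax.get_xticks() if lo <= t <= hi]
print(in_view)
```

With `nbins=16` the locator should never place more than 17 ticks inside the view limits; it may well choose fewer if that gives rounder numbers.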
You can set the visibility of the xticks-labels to False
for label in plt.gca().xaxis.get_ticklabels()[::N]:
    label.set_visible(False)
This will set every Nth label invisible.
I noticed a 'strange' behaviour when running the following code:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import (MultipleLocator, AutoMinorLocator)
freqs = np.logspace(2,4)
freqs_ext = np.logspace(2, 10)
fig, ax = plt.subplots(1,2)
ax[0].plot(freqs, freqs**2)
#ax[0].xaxis.set_minor_locator(AutoMinorLocator(5))
ax[0].grid(which='both')
#ax[0].minorticks_on()
ax[0].set_xscale('log')
ax[1].plot(freqs_ext, freqs_ext**2)
#ax[1].xaxis.set_minor_locator(AutoMinorLocator(5))
ax[1].grid(which='both')
#ax[1].minorticks_on()
ax[1].set_xscale('log')
The output is the following:
I have tried more variants than I care to report, (some are commented out in the code above), but I cannot get matplotlib to draw minor gridlines for the plot on the right side, as it does for the one on the left.
I think I have understood that the "problem" lies in where the ticks are located for the second plot, which has a much larger span. They are every two decades and I believe this might be the source of the minor grid lines not displaying.
I have played with xaxis.set_xticks and obtained ticks every decade, but still cannot get this to correctly produce the gridlines.
It is probably something stupid but I can't see it.
NOTE : I know that matplotlib doesn't turn the minor ticks on by default, and in this case this action is "triggered" by changing the scale to log (that's why axis.grid(which='both') actually only acts on the x axis)
OK, I have found this answer:
Matplotlib: strange double-decade axis ticks in log plot
which actually shows how the issue is a design choice for matplotlib starting with v2. Answer was given in 2017 so, not the newest issue :)
The following code correctly plots the minor grids as wanted:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import LogLocator
freqs = np.logspace(2,4)
freqs_ext = np.logspace(2, 10)
fig, ax = plt.subplots(1,2)
ax[0].plot(freqs , freqs**2)
ax[0].grid(which='both')
ax[0].set_xscale( 'log')
ax[1].plot(freqs_ext,freqs_ext**2)
ax[1].set_xscale('log')
ax[1].xaxis.set_major_locator(LogLocator(numticks=15))
ax[1].xaxis.set_minor_locator(LogLocator(numticks=15,subs=np.arange(2,10)))
ax[1].grid(which='both')
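To check what the two locators produce without drawing anything, one can call `tick_values` on them directly. This is only a sketch: the range 1e2 to 1e10 mirrors `freqs_ext`, and the variable names are made up.

```python
import numpy as np
from matplotlib.ticker import LogLocator

# Same settings as in the fix above, queried for the range of freqs_ext
major = LogLocator(numticks=15).tick_values(1e2, 1e10)
minor = LogLocator(numticks=15, subs=np.arange(2, 10)).tick_values(1e2, 1e10)

print(major)      # should contain every decade between 1e2 and 1e10
print(minor[:8])  # multiples 2..9 of the first decade returned
```

Because `numticks` is larger than the number of decades, the major locator has no reason to skip decades, and the minor locator can then fill in the 2 to 9 multiples of each one.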
I've got an issue with matplotlib and the way it displays graphs.
In my Python Crash Course coursebook, one of the early graphs is meant to display values up to 1000 on the x axis and up to 1,000,000 on the y axis. Instead, it displays floats up to 2.0, with 1e6 at the top.
I use VSCode. I worry I haven't properly configured it. When displaying the course materials made by the developer, I have the same problem.
Here's the graph I want.
Here's the graph I've got.
And here's my code.
import matplotlib.pyplot as plt
x_values = range(1, 1001)
y_values = [x**2 for x in x_values]
plt.style.use('seaborn')
fig, ax = plt.subplots()
ax.scatter(x_values, y_values, c=y_values, cmap=plt.cm.Blues, s=10)
# Set chart title and label axes.
ax.set_title("Square Numbers", fontsize=24)
ax.set_xlabel("Value", fontsize=14)
ax.set_ylabel("Square of Value", fontsize=14)
# Set size of tick labels.
ax.tick_params(axis='both', which='major', labelsize=14)
# Set the range for each axis.
ax.axis([0, 1100, 0, 1100000])
plt.show()
If anyone has any experience with this, please let me know. I'm happy to change to another IDE that displays this properly, any recommendations would be welcome.
This is default matplotlib behaviour. You can turn this off by creating a custom ScalarFormatter object and turning scientific notation off. For more details, see the matplotlib documentation pages on tick formatters and on ScalarFormatter.
# additional import statement at the top
import matplotlib.pyplot as plt
from matplotlib import ticker
# additional code before plt.show()
formatter = ticker.ScalarFormatter()
formatter.set_scientific(False)
ax.yaxis.set_major_formatter(formatter)
Note that, most likely, the axis label will be slightly cut off. One way to fix this is by adding fig.tight_layout() before plt.show().
Responding to an old question in case it helps someone: place the following before plt.show().
ax.ticklabel_format(style='plain')
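For reference, a trimmed-down version of the square-numbers plot with this one-liner applied might look as follows. The styling from the question is left out, and the `Agg` backend is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt

x_values = range(1, 1001)
y_values = [x**2 for x in x_values]

fig, ax = plt.subplots()
ax.scatter(x_values, y_values, s=10)
ax.axis([0, 1100, 0, 1100000])

# Turn off scientific notation on both axes
ax.ticklabel_format(style="plain")
fig.tight_layout()  # leave room for the longer tick labels

fig.canvas.draw()  # force tick labels and offset text to be computed
offset_text = ax.yaxis.get_offset_text().get_text()
y_labels = [label.get_text() for label in ax.get_yticklabels()]
print(repr(offset_text), y_labels)
```

After this call the "1e6" offset text at the top of the y axis should be empty, and the tick labels should be written out in full.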
I have the following code for generating a time series plot
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
fig = plt.figure()
ax = fig.add_subplot(111)
series = pd.Series([np.sin(ii*np.pi) for ii in range(30)],
                   index=pd.date_range(start='2019-01-01', end='2019-12-31',
                                       periods=30))
series.plot(ax=ax)
I want to set an automatic limit for x and y, I tried using ax.margins() but it does not seem to work:
ax.margins(y=0.1, x=0.05)
# even with
# ax.margins(y=0.1, x=5)
What I am looking for is an automatic method like padding=0.1 (10% of whitespace around the graph)
Pandas and matplotlib seem to be confused rather often while collaborating when axes have dates. For some reason in this case ax.margins doesn't work as expected with the x-axis.
Here is a workaround which does seem to do the job, explicitly moving the xlims:
xmargins = 0.05
ymargins = 0.1
ax.margins(y=ymargins)
x0, x1 = plt.xlim()
plt.xlim(x0-xmargins*(x1-x0), x1+xmargins*(x1-x0))
Alternatively, you could work directly with matplotlib's plot, which does work as expected applying the margins to the date axis.
ax.plot(series.index, series)
ax.margins(y=0.1, x=0.05)
PS: This post talks about setting use_sticky_edges to False and calling autoscale_view after setting the margins, but also that doesn't seem to work here.
ax.use_sticky_edges = False
ax.autoscale_view(scaley=True, scalex=True)
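To verify that margins behave as expected when plotting through matplotlib directly, the padding can be measured against the data range. The sketch below is an illustration under assumptions: it uses NumPy `datetime64` dates instead of a pandas index, made-up sample values, and the `Agg` backend so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# 30 dates, 12 days apart, with some oscillating sample values
dates = np.datetime64("2019-01-01") + 12 * np.arange(30)
values = np.sin(np.linspace(0, 4 * np.pi, 30))

fig, ax = plt.subplots()
ax.plot(dates, values)
ax.margins(x=0.05, y=0.1)

# Compare the view limits with the data range, in matplotlib's date units
x0, x1 = ax.get_xlim()
d0, d1 = mdates.date2num(dates[0]), mdates.date2num(dates[-1])
span = d1 - d0
left_pad = (d0 - x0) / span
right_pad = (x1 - d1) / span
print(left_pad, right_pad)
```

Both padding fractions should come out at about 0.05, i.e. 5% of the data range on each side.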
You can use ax.set_xlim and ax.set_ylim to set the x and y limits of your plot respectively.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
fig = plt.figure()
ax = fig.add_subplot(111)
series = pd.Series([np.sin(ii*np.pi) for ii in range(30)],
                   index=pd.date_range(start='2019-01-01', end='2019-12-31',
                                       periods=30))
# set xlim to be between certain dates
ax.set_xlim((pd.to_datetime('2019-01-01'), pd.to_datetime('2019-01-31')))
# set ylim to be between certain values
ax.set_ylim((-0.5, 0.5))
series.plot(ax=ax)
I have been given some data for which I need to produce a histogram, so I used the pandas hist() function and plotted it with matplotlib. The code runs on a remote server, so I cannot view the plot directly and save it as an image instead. Here is what the image looks like
Here is my code below
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df_hist = pd.DataFrame(np.array(raw_data)).hist(bins=5)  # raw_data is the data supplied to me
plt.savefig('/path/to/file.png')
plt.close()
As you can see the x axis labels are overlapping. So I used this function plt.tight_layout() like so
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df_hist = pd.DataFrame(np.array(raw_data)).hist(bins=5)
plt.tight_layout()
plt.savefig('/path/to/file.png')
plt.close()
There is some improvement now
But still the labels are too close. Is there a way to ensure the labels do not touch each other and there is fair spacing between them? Also I want to resize the image to make it smaller.
I checked the documentation here https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html but not sure which parameter to use for savefig.
Since raw_data is not already a pandas dataframe there's no need to turn it into one to do the plotting. Instead you can plot directly with matplotlib.
There are many different ways to achieve what you'd like. I'll start by setting up some data which looks similar to yours:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gamma
raw_data = gamma.rvs(a=1, scale=1e6, size=100)
If we go ahead and use matplotlib to create the histogram we may find the xticks too close together:
fig, ax = plt.subplots(1, 1, figsize=[5, 3])
ax.hist(raw_data, bins=5)
fig.tight_layout()
The xticks are hard to read with all the zeros, regardless of spacing. So, one thing you may wish to do would be to use scientific formatting. This makes the x-axis much easier to interpret:
ax.ticklabel_format(style='sci', axis='x', scilimits=(0,0))
Another option, without using scientific formatting would be to rotate the ticks (as mentioned in the comments):
ax.tick_params(axis='x', rotation=45)
fig.tight_layout()
Finally, you also mentioned altering the size of the image. Note that this is best done when the figure is initialised. You can set the size of the figure with the figsize argument. The following would create a figure 5" wide and 3" in height:
fig, ax = plt.subplots(1, 1, figsize=[5, 3])
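Since the question is about the saved file, it may help to see how `figsize` and the `dpi` argument of `savefig` together determine the pixel size of the image. The sketch below writes the PNG to an in-memory buffer instead of a path, and generates stand-in data with NumPy; both are assumptions made for the demo.

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, as on a remote server
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for raw_data (hypothetical values, roughly on the same scale)
rng = np.random.default_rng(0)
raw_data = rng.gamma(shape=1.0, scale=1e6, size=100)

fig, ax = plt.subplots(figsize=(5, 3))  # size in inches
ax.hist(raw_data, bins=5)
fig.tight_layout()

buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)  # pixels = inches * dpi
plt.close(fig)

# A PNG stores width and height as big-endian 32-bit ints at bytes 16..24
width_px, height_px = struct.unpack(">II", buffer.getvalue()[16:24])
print(width_px, height_px)
```

So a smaller image can be had either by reducing `figsize` or by lowering `dpi`; the former also changes how large the text appears relative to the plot.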
I think the two best fixes were mentioned by Pam in the comments.
You can rotate the labels with
plt.xticks(rotation=45)
For more information, look here: Rotate axis text in python matplotlib
The real problem is too many zeros that don't provide any extra info. Numpy arrays are pretty easy to work with, so pd.DataFrame(np.array(raw_data)/1000).hist(bins=5) should get rid of three zeros off of both axes. Then just add a 'kilo' in the axes labels.
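An alternative to dividing the data itself is to leave the values untouched and only change how the ticks are printed. This is a sketch using `matplotlib.ticker.FuncFormatter`; the helper name `thousands` and the stand-in data are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter


def thousands(value, pos):
    """Format a tick value in thousands, e.g. 2000000 -> '2000k'."""
    return f"{value / 1000:g}k"


# Stand-in for raw_data (hypothetical values)
rng = np.random.default_rng(0)
raw_data = rng.gamma(shape=1.0, scale=1e6, size=100)

fig, ax = plt.subplots(figsize=(5, 3))
ax.hist(raw_data, bins=5)
ax.xaxis.set_major_formatter(FuncFormatter(thousands))
fig.tight_layout()

print(thousands(2_000_000, 0))
```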
To change the size of the graph use rcParams.
from matplotlib import rcParams
rcParams['figure.figsize'] = 7, 5.75 #the numbers are the dimensions
I was trying to plot a time series with matplotlib. The problem is that there are too many observations, so the tick labels overlap and don't fit within the figure.
I can think of three solutions: shrink the label size, rotate the label text so it is vertical or slanted, or label only the first and last few observations with dots between them. The code below demonstrates the problem.
Can anyone help? Thanks
from datetime import date
import numpy as np
from pandas import *
import matplotlib.pyplot as plt
N = 100
data = np.array(np.random.randn(N))
time_index = date_range(date.today(), periods = len(data))
plt.plot(time_index, data)
For your simple plot, you could do
plt.xticks(rotation=90)
Alternatively, you could specify which ticks you want to display, and their labels, with
plt.xticks(<certain range of values>, <labels for those values>)
Edit:
Personally, I would switch to matplotlib's object-oriented interface.
f = plt.figure()
ax = f.add_subplot(111)
ax.plot(<stuff>)
ax.tick_params(axis='x', labelsize='8')
plt.setp( ax.xaxis.get_majorticklabels(), rotation=90 )
# OR
xlabels = ax.get_xticklabels()
for label in xlabels:
    label.set_rotation(90)
plt.show()
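A self-contained version of the object-oriented approach, with the placeholder filled in, could look like the sketch below. It sets both the label size and the rotation through `tick_params`; the random data, the start date and the `Agg` backend are assumptions for the demo.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

N = 100
rng = np.random.default_rng(0)
data = rng.standard_normal(N)
# N consecutive days from an arbitrary start date
time_index = np.datetime64("2024-01-01") + np.arange(N)

fig, ax = plt.subplots()
ax.plot(time_index, data)

# Smaller, vertical tick labels on the x axis
ax.tick_params(axis="x", labelsize=8, labelrotation=90)
fig.tight_layout()

fig.canvas.draw()  # force the tick labels to be generated
rotations = {label.get_rotation() for label in ax.get_xticklabels()}
sizes = {label.get_fontsize() for label in ax.get_xticklabels()}
print(rotations, sizes)
```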