How can I force the x axis to use column entries - python

I am trying to create a chart using a data frame which has TimePeriod as 201811, 201812, 201901, ..., 202006 which I want to use as the x axis values and plot against the y values (Total lives). See figure here:
However, when I plot the figure the x axis shows up as 201825, 201850, 201875, 201925,..., 202025. This clearly makes no sense and I cannot figure out how to force python to plot the desired x axis.
I am assuming it is something in xticks but I haven't had any luck. I have also tried manually entering all x axis values as labels = ('201811', '201812', '201901', ...) but this did not work either.
Is there any way to achieve the desired outcome?
Code:
import numpy as np
import pyodbc
import matplotlib.pyplot as plt
aggregated_lives_plt = aggregated_lives.plot(x= 'TimePeriodId', y='TotalLives', kind = 'line')
plt.title('Aggregated Optional Benefit Certs Since Nov-2018')
plt.xlabel('Time Period')
plt.ylabel('Total Certs (Lives)')
plt.show()
Thank you for any help!

Your TimePeriodId column contains integers, so matplotlib treats it as a continuous numeric axis. You can convert it to strings:
aggregated_lives['TimePeriodId'] = aggregated_lives['TimePeriodId'].astype(str)
then use your plot command.
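As a minimal sketch of the idea above: the frame below is a made-up stand-in for aggregated_lives (its values are invented for illustration), and it plots with matplotlib's ax.plot directly rather than the DataFrame.plot call from the question. Once the column holds strings, matplotlib treats each entry as a category and places one tick per entry.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the aggregated_lives frame in the question.
aggregated_lives = pd.DataFrame({
    "TimePeriodId": [201811, 201812, 201901, 201902, 201903],
    "TotalLives": [120, 135, 150, 149, 162],
})

# Convert the integer periods to strings so they are plotted as categories.
aggregated_lives["TimePeriodId"] = aggregated_lives["TimePeriodId"].astype(str)

fig, ax = plt.subplots()
ax.plot(aggregated_lives["TimePeriodId"], aggregated_lives["TotalLives"])
ax.set_xlabel("Time Period")
ax.set_ylabel("Total Certs (Lives)")
ax.tick_params(axis="x", labelrotation=45)

# Draw once so the tick labels are populated, then read them back.
fig.canvas.draw()
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```

The trade-off is that the periods are now evenly spaced categories, so a missing month would not show up as a gap.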

Seaborn showing x-tick labels overlapping

I am trying to make a box plot that looks like this.
Now, there are a lot of tickmarks that I do not need and truly do not show any additional information.
The code I am using is the following:
plot=sns.boxplot(y=MSE, x=Sim,
width=0.5,
palette='colorblind')
plot=sns.stripplot(y=MSE, x=Sim,
jitter=True,
marker='o',
alpha=0.15,
color='black')
plt.xlabel('xlabel')
plt.ylabel('ylabel')
plt.gca().invert_xaxis()
Where MSE and SIM are two numpy arrays of 400 elements each.
I reviewed some solutions that use locator_params and set_xticklabels. However, I want to know:
why this happens, and
is there a simple transformation in the MSE and SIM arrays to solve this?
I hope my questions are clear enough.
Thanks in advance.
I'm not sure what you have as Sim. If it is an array of floats, the values are converted to categories before plotting, so every distinct value gets its own tick. Since the labels are not useful, you can instead use a range of values that is as long as the y-values.
Even so, the labels still overlap a lot, because you are trying to fit 400 x ticks onto the x-axis and the default font size is chosen to be readable. For example:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
fig,ax = plt.subplots(figsize=(15,6))
MSE = [np.random.normal(0,1,10) for i in range(100)]
Sim = np.arange(len(MSE))
g = sns.boxplot(y=MSE, x=Sim, width=0.5,palette='colorblind',ax=ax)
You can set the font size to be smaller so the labels don't overlap, but I guess it's hardly readable:
So if, as you said, the labels are not useful in your case, you can show only every tenth tick:
ax.set(xticks=Sim[0::10])
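The same tick-thinning idea can be sketched without seaborn. This version swaps in matplotlib's own ax.boxplot (not the seaborn call used above), uses invented random data, and passes manage_ticks=False, which is my choice here so that the default numeric tick labels stay in sync with the thinned tick positions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# 100 groups of 10 samples each (made-up data for illustration).
groups = [rng.normal(0, 1, 10) for _ in range(100)]
positions = np.arange(len(groups))

fig, ax = plt.subplots(figsize=(15, 6))
# manage_ticks=False leaves tick placement to us instead of one tick per box.
ax.boxplot(groups, positions=positions, widths=0.5, manage_ticks=False)

# Keep only every tenth position as a tick.
ax.set_xticks(positions[::10])
shown_ticks = ax.get_xticks().tolist()
```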

How to ensure even spacing between labels on x axis of matplotlib graph?

I have been given some data for which I need to plot a histogram, so I used the pandas hist() function and plotted it with matplotlib. The code runs on a remote server, so I cannot view the figure directly and save it as an image instead. Here is what the image looks like:
Here is my code below
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df_hist = pd.DataFrame(np.array(raw_data)).hist(bins=5)  # raw_data is the data supplied to me
plt.savefig('/path/to/file.png')
plt.close()
As you can see the x axis labels are overlapping. So I used this function plt.tight_layout() like so
import matplotlib.pyplot as plt
df_hist = pd.DataFrame(np.array(raw_data)).hist(bins=5)
plt.tight_layout()
plt.savefig('/path/to/file.png')
plt.close()
There is some improvement now
But still the labels are too close. Is there a way to ensure the labels do not touch each other and there is fair spacing between them? Also I want to resize the image to make it smaller.
I checked the documentation here https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html but not sure which parameter to use for savefig.
Since raw_data is not already a pandas dataframe there's no need to turn it into one to do the plotting. Instead you can plot directly with matplotlib.
There are many different ways to achieve what you'd like. I'll start by setting up some data which looks similar to yours:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gamma
raw_data = gamma.rvs(a=1, scale=1e6, size=100)
If we go ahead and use matplotlib to create the histogram we may find the xticks too close together:
fig, ax = plt.subplots(1, 1, figsize=[5, 3])
ax.hist(raw_data, bins=5)
fig.tight_layout()
The xticks are hard to read with all the zeros, regardless of spacing. So, one thing you may wish to do would be to use scientific formatting. This makes the x-axis much easier to interpret:
ax.ticklabel_format(style='sci', axis='x', scilimits=(0,0))
Another option, without using scientific formatting would be to rotate the ticks (as mentioned in the comments):
ax.tick_params(axis='x', rotation=45)
fig.tight_layout()
Finally, you also mentioned altering the size of the image. Note that this is best done when the figure is initialised. You can set the size of the figure with the figsize argument. The following would create a figure 5" wide and 3" in height:
fig, ax = plt.subplots(1, 1, figsize=[5, 3])
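Putting the pieces of this answer together, a sketch for the remote-server case might look like the following. It assumes NumPy's gamma generator in place of scipy.stats.gamma, combines scientific notation with rotated labels (you would normally pick one), and writes to an in-memory buffer where you would pass your own file path.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for a remote server
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Made-up data with large values, similar in spirit to the question's raw_data.
raw_data = rng.gamma(shape=1.0, scale=1e6, size=100)

# The figure size is fixed when the figure is created: 5" wide, 3" high.
fig, ax = plt.subplots(1, 1, figsize=(5, 3))
counts, bin_edges, _ = ax.hist(raw_data, bins=5)

# Scientific notation removes the long runs of zeros on the x-axis.
ax.ticklabel_format(style="sci", axis="x", scilimits=(0, 0))
# Rotating the labels gives them extra horizontal room.
ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()

# Save to a buffer here; replace it with a path such as "file.png" in practice.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
plt.close(fig)
```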
I think the two best fixes were mentioned by Pam in the comments.
You can rotate the labels with
plt.xticks(rotation=45)
For more information, look here: Rotate axis text in python matplotlib
The real problem is too many zeros that don't provide any extra info. Numpy arrays are pretty easy to work with, so pd.DataFrame(np.array(raw_data)/1000).hist(bins=5) should remove three zeros from the x-axis tick labels. Then just add a 'kilo' to the axis label.
To change the size of the graph use rcParams.
from matplotlib import rcParams
rcParams['figure.figsize'] = 7, 5.75 #the numbers are the dimensions

Pyplot how to reduce xticks *and* xticklabels density?

I have to plot several curves with very high xtick density, say 1000 date strings. To prevent these tick labels overlapping each other I manually set them to be 60 dates apart. Code below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts_index = pd.period_range(start="20060429", periods=1000).strftime("%Y%m%d")
fig = plt.figure(1)
ax = plt.subplot(1, 1, 1)
tick_spacing = 60
for i in range(5):
    plt.plot(ts_index, 1 + i * 0.01 * np.arange(0, 1000), label="group %d"%i)
plt.legend(loc='best')
plt.title(r'net value curves')
xticks = ax.get_xticks()
xlabels = ax.get_xticklabels()
ax.set_xticks(xticks[::tick_spacing])
ax.set_xticklabels(xlabels[::tick_spacing])
plt.xticks(rotation="vertical")
plt.xlabel(r'date')
plt.ylabel('net value')
plt.grid(True)
plt.show()
fig.savefig(r".\net_value_curves.png", )
fig.clf()
I'm running this piece of code in PyCharm Community Edition 2017.2.2 with a Python 3.6 kernel. Now comes the funny thing: whenever I ran the code in the normal "run" mode (i.e. just hit the execution button and let the code run "freely" till interruption or termination), then the figure I got would always miss xticklabels:
However, if I ran the code in "debug" mode and ran it step by step then I would get an expected figure with complete xticklabels:
This is really weird. Anyway, I just hope to find a way that can ensure me getting the desired output (the second figure) in the normal "run" mode. How can I modify my current code to achieve this?
Thanks in advance!
Your x axis data are strings. Hence you will get one tick per data point. This is probably not what you want. Instead use the dates to plot. Because you are using pandas, this is easily converted,
dates = pd.to_datetime(ts_index, format="%Y%m%d")
You may then get rid of your manual xtick locating and formatting, because matplotlib will automatically choose some nice tick locations for you.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts_index = pd.period_range(start="20060429", periods=1000).strftime("%Y%m%d")
dates = pd.to_datetime(ts_index, format="%Y%m%d")
fig, ax = plt.subplots()
for i in range(5):
    plt.plot(dates, 1 + i * 0.01 * np.arange(0, 1000), label="group %d"%i)
plt.legend(loc='best')
plt.title(r'net value curves')
plt.xticks(rotation="vertical")
plt.xlabel(r'date')
plt.ylabel('net value')
plt.grid(True)
plt.show()
However in case you do want to have some manual control over the locations and formats you may use matplotlib.dates locators and formatters.
import matplotlib.dates as mdates
# tick every 3 months
plt.gca().xaxis.set_major_locator(mdates.MonthLocator((1,4,7,10)))
# format as "%Y%m%d"
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%Y%m%d"))
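For reference, a self-contained sketch of the manual locator and formatter approach could look like this. The single curve and pd.date_range are stand-ins for the question's data, chosen only to keep the example short.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 1000 consecutive days as real datetimes rather than strings.
dates = pd.date_range(start="2006-04-29", periods=1000, freq="D")
values = 1 + 0.01 * np.arange(1000)

fig, ax = plt.subplots()
ax.plot(dates, values)

# One tick on the first day of January, April, July and October.
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 4, 7, 10)))
# Render each tick in the same "%Y%m%d" style as the original strings.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y%m%d"))
ax.tick_params(axis="x", labelrotation=90)

# Draw once so the tick labels are populated, then read them back.
fig.canvas.draw()
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```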
In general, the Axis object computes and places ticks using a Locator object. Locators and Formatters are meant to be easily replaceable, with appropriate methods of Axis. The default Locator does not seem to be doing the trick for you so you can replace it with anything you want using axes.xaxis.set_major_locator. This problem is not complicated enough to write your own, so I would suggest that MaxNLocator fits your needs fairly well. Your example seems to work well with nbins=16 (which is what you have in the picture, since there are 17 ticks).
You need to add an import:
from matplotlib.ticker import MaxNLocator
You need to replace the block
xticks = ax.get_xticks()
xlabels = ax.get_xticklabels()
ax.set_xticks(xticks[::tick_spacing])
ax.set_xticklabels(xlabels[::tick_spacing])
with
ax.xaxis.set_major_locator(MaxNLocator(nbins=16))
or just
ax.xaxis.set_major_locator(MaxNLocator(16))
You may want to play around with the other arguments (all of which have to be keywords, except nbins). Pay particular attention to integer, which restricts the ticks to integer values.
Note that for the Locator and Formatter APIs we work with an Axis object, not Axes. Axes is the whole plot, while Axis is the thing with the spines on it. Axes usually contains two Axis objects and all the other stuff in your plot.
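A compact sketch of the MaxNLocator replacement, applied to string dates like those in the question, is shown below. It plots a single curve instead of five, and integer=True is my addition so that ticks land on whole category positions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import MaxNLocator

# 1000 date strings, which matplotlib plots as categories at positions 0..999.
ts_index = list(pd.date_range(start="2006-04-29", periods=1000).strftime("%Y%m%d"))

fig, ax = plt.subplots()
ax.plot(ts_index, 1 + 0.01 * np.arange(1000))

# At most 16 intervals (so at most 17 ticks) across the visible range.
ax.xaxis.set_major_locator(MaxNLocator(nbins=16, integer=True))
ax.tick_params(axis="x", labelrotation=90)

fig.canvas.draw()
lo, hi = ax.get_xlim()
# The locator may return a tick just outside the view, so keep the visible ones.
visible_ticks = [t for t in ax.get_xticks() if lo <= t <= hi]
```

Because the locator is attached to the Axis, the tick count stays bounded even if the view limits change later.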
You can set the visibility of the xticks-labels to False
for label in plt.gca().xaxis.get_ticklabels()[::N]:
    label.set_visible(False)
This hides every Nth label, starting with the first.
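To thin out a dense axis you will usually want the inverse: keep every Nth label and hide the rest. A small sketch of that variant, with made-up data and N = 10 chosen arbitrarily:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(range(50), range(50))
ax.set_xticks(range(50))  # one tick per data point, deliberately too dense

N = 10  # keep every Nth label (arbitrary choice for illustration)
for i, label in enumerate(ax.xaxis.get_ticklabels()):
    # Labels at indices 0, N, 2N, ... stay visible; all others are hidden.
    label.set_visible(i % N == 0)

# get_ticklabels() only returns labels that are still visible.
visible_count = len(ax.xaxis.get_ticklabels())
```

Note that the tick marks themselves remain; only the label text is hidden.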

In a pandas hist() plot with sub-histograms, how to insert titles for the x and y axes, and overall title?

I am using the pandas hist() method with the 'by' option, specifically:
histos=data_ok._DiffPricePercent.hist(by=input_data._Category, sharex=True, sharey=True )
This command produces this plot:
How do I add titles for respectively the x and y axes on each of the sub-histograms, or alternatively, overall? Also, how to insert an overall title for the plot?
I tried the following, but it does not go through (with error "AttributeError: 'numpy.ndarray' object has no attribute 'set_ylabel'"):
histos.set_ylabel('Fréquence')
histos.set_xlabel('Variation prix suggérée, en %')
Many thanks in advance for any suggestions
What you actually get is an object-type numpy array with elements in it that are the AxesSubplot instances. Try
histos[0].set_xlabel('My x label 1')
histos[1].set_xlabel('My x label 2')
EDIT :
To change the format of ticks, use a Formatter:
from matplotlib.ticker import FormatStrFormatter
maj_frm = FormatStrFormatter('%.1f')
...
histos[0].xaxis.set_major_formatter(maj_frm)
The documentation on tick locating and formatting can be found here.
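A short sketch of labelling every sub-histogram and adding an overall title follows. The data frame, its column names and the label texts are invented for illustration; flattening with np.asarray(...).ravel() is one way to handle both 1D and 2D arrays of Axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical data: a numeric column and a category to group by.
data = pd.DataFrame({
    "diff_price_percent": rng.normal(0, 5, 200),
    "category": np.repeat(["A", "B", "C", "D"], 50),
})

# hist(by=...) returns a NumPy array of Axes, one per category.
histos = data["diff_price_percent"].hist(by=data["category"], sharex=True, sharey=True)

# Flatten the array so the loop works whatever its shape is.
axes_flat = np.asarray(histos).ravel()
for ax in axes_flat:
    ax.set_xlabel("Suggested price change (%)")
    ax.set_ylabel("Frequency")

# The overall title belongs to the Figure, not to any single Axes.
fig = axes_flat[0].get_figure()
overall_title = fig.suptitle("Price change by category")
```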
.plot or .hist on a pandas DataFrame is likely to return a NumPy array of Axes objects, one per subplot.
One way to set attributes at the subplot level is the following (assuming the array returned by the pandas plot call, named axes here, is 2D):
for (x, y), ax in numpy.ndenumerate(axes):
    ax.set_xlabel(...)
    ax.set_xticks([...])
and so on.

Using a Pandas dataframe index as values for x-axis in matplotlib plot

I have time series in a pandas DataFrame with a number of columns which I'd like to plot. Is there a way to set the x-axis to always use the index from a DataFrame?
When I use the .plot() method from pandas the x-axis is formatted correctly; however, when I pass my dates and the column(s) I'd like to plot directly to matplotlib, the graph doesn't plot correctly. Thanks in advance.
plt.plot(site2.index.values, site2['Cl'])
plt.show()
FYI: site2.index.values produces this (I've cut out the middle part for brevity):
array([
'1987-07-25T12:30:00.000000000+0200',
'1987-07-25T16:30:00.000000000+0200',
'2010-08-13T02:00:00.000000000+0200',
'2010-08-31T02:00:00.000000000+0200',
'2010-09-15T02:00:00.000000000+0200'
],
dtype='datetime64[ns]')
It seems the issue was that I had .values. Without it (i.e. site2.index) the graph displays correctly.
You can use plt.xticks to set the x-axis
try:
plt.xticks( site2['Cl'], site2.index.values ) # location, labels
plt.plot( site2['Cl'] )
plt.show()
see the documentation for more details: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.xticks
That's built right into the plot() method.
You can use yourDataFrame.plot(use_index=True) to put the DataFrame index on the x-axis.
Read More Here: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.plot.html
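Both routes mentioned in this thread can be sketched side by side: the pandas plot() method with use_index=True, and passing the index itself (not .values) to matplotlib. The small site2 frame below is invented, with timestamps loosely modelled on the ones in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for site2 with a DatetimeIndex.
site2 = pd.DataFrame(
    {"Cl": [3.1, 2.8, 3.5, 3.0, 2.9]},
    index=pd.to_datetime([
        "1987-07-25 12:30", "1987-07-25 16:30",
        "2010-08-13 02:00", "2010-08-31 02:00", "2010-09-15 02:00",
    ]),
)

# Route 1: let pandas use the index for the x-axis.
ax_pandas = site2.plot(y="Cl", use_index=True)

# Route 2: hand the index directly to matplotlib.
fig, ax_mpl = plt.subplots()
ax_mpl.plot(site2.index, site2["Cl"])

n_points_pandas = len(ax_pandas.get_lines()[0].get_xdata())
n_points_mpl = len(ax_mpl.get_lines()[0].get_xdata())
```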
If, like me, you want matplotlib to choose a 'sensible' scale and still label the ticks with the DataFrame index, one way is to plot against integer positions and then map the default ticks back to index values. Code:
fig, ax = plt.subplots()
ax.plot(site2['Cl'].to_numpy())  # plot against integer positions 0..len-1
x_ticks = [int(t) for t in ax.get_xticks() if t in range(len(site2))]  # keep default ticks that are valid positions
ax.set_xticks(x_ticks)
ax.set_xticklabels(site2.index[x_ticks].astype(str))
