Ticklabels in matplotlib don't match the plot values [duplicate] - python

I have an existing plot that was created with pandas like this:
df['myvar'].plot(kind='bar')
The y axis is formatted as float and I want to change it to percentages. All of the solutions I found use ax.xyz syntax, and I can only place code below the line above that creates the plot (I cannot add ax=ax to the line above).
How can I format the y axis as percentages without changing the line above?
Here is the solution I found, but it requires that I redefine the plot:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick
data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))
fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)
ax.plot(perc, data)
fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)
plt.show()
Link to the above solution: Pyplot: using percentage on x axis

This is a few months late, but I have created PR#6251 with matplotlib to add a new PercentFormatter class. With this class you just need one line to reformat your axis (two if you count the import of matplotlib.ticker):
import ...
import matplotlib.ticker as mtick
ax = df['myvar'].plot(kind='bar')
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
PercentFormatter() accepts three arguments, xmax, decimals, symbol. xmax allows you to set the value that corresponds to 100% on the axis. This is nice if you have data from 0.0 to 1.0 and you want to display it from 0% to 100%. Just do PercentFormatter(1.0).
The other two parameters allow you to set the number of digits after the decimal point and the symbol. They default to None and '%', respectively. decimals=None will automatically set the number of decimal points based on how much of the axes you are showing.
Update
PercentFormatter was introduced into Matplotlib proper in version 2.1.0.
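To tie this back to the constraint in the question (no ax=... on the plotting line), here is a minimal sketch that fetches the axes afterwards with plt.gca(). The data values and the choice of xmax=1.0 and decimals=0 are assumptions for illustration only; the Agg backend is used just so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import pandas as pd

# Hypothetical data: fractions between 0 and 1
df = pd.DataFrame({"myvar": [0.10, 0.25, 0.40, 0.85]})

df["myvar"].plot(kind="bar")   # the line that cannot be changed
ax = plt.gca()                 # fetch the axes pandas just drew on

# xmax=1.0 maps 1.0 to 100%; decimals=0 forces whole-number labels
ax.yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1.0, decimals=0))

ax.figure.canvas.draw()        # tick labels are only computed at draw time
labels = [t.get_text() for t in ax.get_yticklabels()]
print(labels)
```

If your data is already on a 0 to 100 scale, leaving xmax at its default should be enough.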

A pandas DataFrame plot returns the Axes for you, and you can then manipulate it however you want.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100,5))
# you get ax from here
ax = df.plot()
type(ax) # matplotlib.axes._subplots.AxesSubplot
# manipulate
vals = ax.get_yticks()
ax.set_yticklabels(['{:,.2%}'.format(x) for x in vals])

Jianxun's solution did the job for me but broke the y value indicator at the bottom left of the window.
I ended up using FuncFormatter instead (and also stripped the unnecessary trailing zeroes as suggested here):
import pandas as pd
import numpy as np
from matplotlib.ticker import FuncFormatter
df = pd.DataFrame(np.random.randn(100,5))
ax = df.plot()
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
Generally speaking I'd recommend using FuncFormatter for label formatting: it's reliable, and versatile.
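A small sketch of why a formatter tends to be more robust than fixed tick label strings: the labels are regenerated whenever the view changes. The data and the new y-limits below are made up for illustration, and tick_texts is a hypothetical helper, not part of matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

fig, ax = plt.subplots()
ax.plot([0.0, 0.5, 1.0])  # illustrative data in the 0 to 1 range
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: f"{y:.0%}"))


def tick_texts(axis):
    """Hypothetical helper: redraw, then return the current tick label strings."""
    fig.canvas.draw()
    return [t.get_text() for t in axis.get_ticklabels()]


before = tick_texts(ax.yaxis)
ax.set_ylim(0, 10)            # simulate zooming out
after = tick_texts(ax.yaxis)  # labels follow the new tick positions

print(before)
print(after)
```

With set_yticklabels, by contrast, the strings are fixed at the moment you set them and do not follow a later zoom or pan.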

For those who are looking for the quick one-liner:
plt.gca().set_yticklabels([f'{x:.0%}' for x in plt.gca().get_yticks()])
this assumes
import: from matplotlib import pyplot as plt
Python >=3.6 for f-String formatting. For older versions, replace f'{x:.0%}' with '{:.0%}'.format(x)

I'm late to the game, but I just realized this: ax can be replaced with plt.gca() for those who are not using axes and just subplots.
Echoing @Mad Physicist's answer, using PercentFormatter it would be:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1))
# if you already have ticks in the 0 to 1 range. Otherwise see their answer

I propose an alternative method using seaborn
Working code:
import numpy as np
import pandas as pd
import seaborn as sns
data = np.random.rand(10, 2) * 100
df = pd.DataFrame(data, columns=['A', 'B'])
ax = sns.lineplot(data=df, markers=True)
ax.set(xlabel='xlabel', ylabel='ylabel', title='title')
# change the y tick labels
y_value = ['{:,.2f}'.format(x) + '%' for x in ax.get_yticks()]
ax.set_yticklabels(y_value)

You can do this in one line without importing anything:
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter('{}%'.format))
If you want integer percentages, you can do:
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter('{:.0f}%'.format))
You can use either ax.yaxis or plt.gca().yaxis. FuncFormatter is still part of matplotlib.ticker, but you can also do plt.FuncFormatter as a shortcut.
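For anyone wondering why passing a bare str.format works: FuncFormatter calls its function with two arguments, the tick value and the tick position, and a format string with a single placeholder simply ignores the second one. A quick check outside of any plot (the value 12.3 is arbitrary):

```python
from matplotlib.ticker import FuncFormatter

# FuncFormatter calls func(value, position); '{:.0f}%' only consumes the value
fmt = FuncFormatter('{:.0f}%'.format)

label = fmt(12.3, 0)  # position 0 is passed through but unused by the template
print(label)          # 12%
```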

Based on the answer of @erwanp, you can use the formatted string literals of Python 3,
x = '2'
percentage = f'{x}%' # 2%
inside the FuncFormatter() and combined with a lambda expression.
All wrapped:
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: f'{y}%'))

Another one line solution if the yticks are between 0 and 1:
plt.yticks(plt.yticks()[0], ['{:,.0%}'.format(x) for x in plt.yticks()[0]])

Add one line of code (this assumes import matplotlib.ticker as ticker and an existing ax):
ax.yaxis.set_major_formatter(ticker.PercentFormatter())

Related

Changing the order of pandas/matplotlib line plotting without changing data order

Given the following example:
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
df.plot(linewidth=10)
The order of plotting puts the last column on top:
How can I make this keep the data & legend order but change the behaviour so that it plots X on top of Y on top of Z?
(I know I can change the data column order and edit the legend order but I am hoping for a simpler easier method leaving the data as is)
UPDATE: final solution used:
(Thanks to r-beginners) I used the get_lines to modify the z-order of each plot
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
fig = plt.figure()
ax = fig.add_subplot(111)
df.plot(ax=ax, linewidth=10)
lines = ax.get_lines()
for i, line in enumerate(lines, -len(lines)):
    line.set_zorder(abs(i))
fig
In a notebook produces:
Get the default zorder and sort it in the desired order.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
np.random.seed(2021)
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
ax = df.plot(linewidth=10)
l = ax.get_children()
print(l)
l[0].set_zorder(3)
l[1].set_zorder(1)
l[2].set_zorder(2)
Before definition
After defining zorder
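The zorder idea can be written so that it does not hard-code the number of columns. This is only a sketch under the assumption that ax.get_lines() returns one line per column in column order (which is what df.plot() produces here); the seed and data are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(2021)
df = pd.DataFrame(rng.integers(1, 10, size=(8, 3)), columns=list("XYZ"))

ax = df.plot(linewidth=10)
lines = ax.get_lines()          # one Line2D per column, in column order

# Give earlier columns a higher zorder so X is drawn over Y, and Y over Z.
base = lines[0].get_zorder()    # the zorder the lines started with
for i, line in enumerate(lines):
    line.set_zorder(base + len(lines) - i)

zorders = [line.get_zorder() for line in lines]
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(zorders, legend_labels)
```

The data and the legend order stay untouched; only the drawing order changes.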
I will just put this answer here because it is a solution to the problem, but probably not the one you are looking for.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# generate data
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
# read columns in reverse order and plot them
# so normally, the legend will be inverted as well, but if we invert it again, you should get what you want
df[df.columns[::-1]].plot(linewidth=10, legend="reverse")
Note that in this example, you don't change the order of your data, you just read it differently, so I don't really know if that's what you want.
You can also make it easier on the eyes by creating a corresponding method.
def plot_dataframe(df: pd.DataFrame) -> None:
    df[df.columns[::-1]].plot(linewidth=10, legend="reverse")
# then you just have to call this
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
plot_dataframe(df)

How to set major locator of secondary axis

I want to set a major locator with 24-hour intervals for a secondary axis, but it has no effect and does not raise any errors.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
dt=pd.DataFrame({'time':[100000,200000,500000,800000],'value':[1,2,4,6]})
plot= plt.subplot()
plot.plot(dt.time,dt.value)
x_major_locator=plt.MultipleLocator(100000)
plot.xaxis.set_major_locator(x_major_locator)
plot.set_xlabel("Second")
s2h=lambda s: s/3600
h2s=lambda h: h*3600
ax2=plot.secondary_xaxis("top",functions=(s2h,h2s))
x_major_locator=plt.MultipleLocator(24)
ax2.xaxis.set_major_locator(x_major_locator)
ax2.set_xlabel("Hour")
plt.show()
I am not sure why the ticks are not being modified; however, one way around this is to create a second x-axis with twiny(), which shares the y-axis. The following will work as long as you do not change the limits, because the two lines are plotted on top of each other. If you do need to change the limits, a hacky approach is to plot the second line in negative y space and set the y-limits so that it stays out of view, which preserves your top x-axis.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
dt=pd.DataFrame({'time':[100000,200000,500000,800000],'value':[1,2,4,6]})
plot= plt.subplot()
plot.plot(dt.time,dt.value)
x_major_locator=MultipleLocator(100000)
plot.xaxis.set_major_locator(x_major_locator)
plot.set_xlabel("Second")
s2h=lambda s: s/3600
h2s=lambda h: h*3600
#ax2=plot.secondary_xaxis("top",functions=(s2h,h2s))
ax2 = plot.twiny()
ax2.plot(s2h(dt.time),dt.value)
x_major_locator = MultipleLocator(24)
ax2.xaxis.set_major_locator(x_major_locator)
ax2.set_xlabel("Hour")
#ax2.set_xlim(0,200) #If you do this, you get 2 lines
plt.show()
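Another option that may be worth trying, if you want to keep secondary_xaxis, is to set explicit tick positions on the secondary axis with set_xticks instead of assigning a locator. This is a sketch, not a confirmed fix for the locator behaviour; the 0 to 240 hour range is an assumption chosen to cover the sample data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

seconds = [100000, 200000, 500000, 800000]
values = [1, 2, 4, 6]

fig, ax = plt.subplots()
ax.plot(seconds, values)
ax.xaxis.set_major_locator(MultipleLocator(100000))
ax.set_xlabel("Second")

# Secondary axis in hours, linked to the primary axis by the two conversions
ax2 = ax.secondary_xaxis("top", functions=(lambda s: s / 3600, lambda h: h * 3600))
ax2.set_xlabel("Hour")

# Explicit ticks every 24 hours (assumed range: 0 to 216 hours)
hour_ticks = np.arange(0, 240, 24)
ax2.set_xticks(hour_ticks)

fig.canvas.draw()
print(ax2.get_xticks())
```

Unlike the twiny() workaround, nothing is plotted twice, so changing the x-limits of the main axes should not produce a second line.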

Clustermapping in Python using Seaborn

I am trying to create a heatmap with dendrograms on Python using Seaborn and I have a csv file with about 900 rows. I'm importing the file as a pandas dataframe and attempting to plot that but a large number of the rows are not being represented in the heatmap. What am I doing wrong?
This is the code I have right now. But the heatmap only represents about 49 rows.
Here is an image of the clustermap I've obtained but it is not displaying all of my data.
import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt
# Data set
df = pd.read_csv('diff_exp_gene.csv', index_col = 0)
# Default plot
sns.clustermap(df, cmap = 'RdBu', row_cluster=True, col_cluster=True)
plt.show()
Thank you.
An alternative approach would be to use imshow in matplotlib. I'm not exactly sure what your question is, but here is a way to plot points on a plane from a csv file:
import csv
import numpy as np
import matplotlib.pyplot as plt
# 128x128 grid of counts, indexed as temp[y][x]
temp = np.zeros((128, 128), dtype=int)
# NOTE: `types` is not defined in the original answer; set it to the TYPE value you want to count
with open('diff_exp_gene.csv') as infile:
    reader = csv.DictReader(infile)
    for row in reader:
        if row['TYPE'] == types:
            temp[int(row['Y'])][int(row['X'])] += 1
plt.imshow(temp, cmap='hot', origin='lower')
plt.show()
As far as I know, keywords that apply to seaborn heatmaps also apply to clustermap, because sns.clustermap passes them on to sns.heatmap. In that case, all you need to do in your example is set yticklabels=True as a keyword argument in sns.clustermap(). That will make the labels for all 900 rows appear.
By default, it is set as "auto" to avoid overlap. The same applies to the xticklabels. See more here: https://seaborn.pydata.org/generated/seaborn.heatmap.html
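A minimal sketch of the yticklabels=True suggestion, using random data in place of the csv file from the question (the gene_N row names, the 60x5 shape and the figure size are made up). It needs seaborn with scipy installed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for diff_exp_gene.csv: 60 rows, 5 columns
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(60, 5)),
    index=[f"gene_{i}" for i in range(60)],
    columns=list("ABCDE"),
)

# yticklabels=True asks for one label per row instead of the "auto" thinning
g = sns.clustermap(df, cmap="RdBu", yticklabels=True, figsize=(6, 12))
g.ax_heatmap.tick_params(axis="y", labelsize=6)  # shrink labels to reduce overlap

n_labels = len(g.ax_heatmap.get_yticklabels())
print(n_labels)
```

With around 900 rows you will probably also need a much taller figure or a smaller font for the labels to stay legible.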

get the date format on a Matplotlib plot's x-axis

I generate a plot using the following code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
index=pd.date_range('2018-01-01',periods=200)
data=pd.Series(np.random.randn(200),index=index)
plt.figure()
plt.plot(data)
Which gives me a plot, looking as follows:
It looks like Matplotlib has decided to format the x-ticks as %Y-%m (source)
I am looking for a way to retrieve this date format. A function like ax.get_xtickformat(), which would then return %Y-%m. Which is the smartest way to do this?
There is no built-in way to obtain the date format used to label the axes. The reason is that this format is determined at drawtime and may even change as you zoom in or out of the plot.
However, you can still determine the format yourself. This requires drawing the figure first, so that the tick locations are fixed. You can then query the formats used by the automatic formatter and select the one that would be chosen for the current view.
Note that the following assumes that an AutoDateFormatter or a formatter subclassing this is in use (which should be the case by default).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
index=pd.date_range('2018-01-01',periods=200)
data=pd.Series(np.random.randn(200),index=index)
plt.figure()
plt.plot(data)
def get_fmt(axis):
    axis.axes.figure.canvas.draw()
    formatter = axis.get_major_formatter()
    locator_unit_scale = float(formatter._locator._get_unit())
    fmt = next((fmt for scale, fmt in sorted(formatter.scaled.items())
                if scale >= locator_unit_scale),
               formatter.defaultfmt)
    return fmt
print(get_fmt(plt.gca().xaxis))
plt.show()
This prints %Y-%m.
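To see the candidate formats that get_fmt chooses from, you can inspect the formatter's scaled dictionary, which maps a tick spacing (in days) to a format string. The sketch below uses plain datetime objects rather than a pandas Series and assumes the default date converter is active, so the formatter is an AutoDateFormatter.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# 200 consecutive days starting 2018-01-01, as in the question
dates = [dt.datetime(2018, 1, 1) + dt.timedelta(days=i) for i in range(200)]

fig, ax = plt.subplots()
ax.plot(dates, range(200))
fig.canvas.draw()  # make sure locator and formatter are set up

formatter = ax.xaxis.get_major_formatter()
print(type(formatter).__name__)

# Each entry: tick spacing in days -> format used at or below that spacing
for scale, fmt in sorted(formatter.scaled.items()):
    print(scale, fmt)
```

Only the selection step in get_fmt relies on private attributes (_locator and _get_unit); the scaled mapping itself is a documented attribute of AutoDateFormatter.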
If you want to set the date format yourself, use a DateFormatter, e.g. myFmt = DateFormatter("%d-%m-%Y"):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
index=pd.date_range('2018-01-01',periods=200)
data=pd.Series(np.random.randn(200),index=index)
fig, ax = plt.subplots()
ax.plot(index, data)
myFmt = DateFormatter("%d-%m-%Y")
ax.xaxis.set_major_formatter(myFmt)
fig.autofmt_xdate()
plt.show()

How to mark the beginning of a new year while plotting pandas Series?

I am plotting such data:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
a = pd.date_range(start='2010-01-01', end='2011-06-01', freq='M')  # pd.DatetimeIndex(start=..., end=...) was removed in pandas 1.0
b = pd.Series(np.random.randn(len(a)), index=a)
I would like the plot to be in the format of bars, so I use this:
b.plot(kind='bar')
This is what I get:
As you can see, the dates are formatted in full, which is very ugly and unreadable. I happened to test this command which creates a very nice Date format:
b.plot()
As you can see:
I like this format very much, it includes the months, marks the beginning of the year and is easily readable.
After doing some search, the closest I could get to that format is using this:
fig, ax = plt.subplots()
ax.plot(b.index, b)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
However the output looks like this:
I am able to have month names on x axis this way, but I like the first formatting much more. That is much more elegant. Does anyone know how I can get the same exact xticks for my bar plot?
Here's a solution that will get you the format you're looking for. You can edit the tick labels directly and use the set_major_formatter() method:
import matplotlib.ticker as mticker
fig, ax = plt.subplots()
ax.bar(b.index, b)
ticklabels = [item.strftime('%b') for item in b.index] #['']*len(b.index)
ticklabels[::12] = [item.strftime('%b\n%Y') for item in b.index[::12]]
ax.xaxis.set_major_formatter(mticker.FixedFormatter(ticklabels))
ax.set_xticks(b.index)
plt.gcf().autofmt_xdate()
Output:
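If you would rather keep the pandas bar plot from the question, a similar effect can be sketched by replacing its tick labels. pandas draws bars at integer positions 0..n-1, so one label per bar is enough. Here the year is added at the first bar and at every January, which is a slight variation on the [::12] slicing above, and the index uses month starts (freq="MS") as an assumption to keep the example version-independent.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

idx = pd.date_range("2010-01-01", "2011-06-01", freq="MS")  # month starts
b = pd.Series(np.random.randn(len(idx)), index=idx)

# Month abbreviation everywhere; add the year on the first bar and each January
labels = [
    d.strftime("%b\n%Y") if (i == 0 or d.month == 1) else d.strftime("%b")
    for i, d in enumerate(idx)
]

ax = b.plot(kind="bar")                 # bars sit at positions 0..n-1
ax.set_xticklabels(labels, rotation=0)  # one label per bar, in order
```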
