Same scale of Y axis on different figures - Python

I am trying to plot different data sets with similar representations but slightly different behaviours and different origins on several figures. As a result, both the min and max of the Y axis and the scale differ from one figure to the next.
For example, here are some extracts from my batch plotting:
Is there a simple way with matplotlib to constrain the same Y step on those different figures, for easy visual interpretation, while keeping an automatically determined Y min and Y max?
In other words, I'd like to have the same metric spacing between each Y-tick.

You could use a MultipleLocator from the ticker module on both axes to define the tick spacing:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
fig=plt.figure()
ax1=fig.add_subplot(211)
ax2=fig.add_subplot(212)
ax1.set_ylim(0,100)
ax2.set_ylim(40,70)
# set ticks every 10
tickspacing = 10
ax1.yaxis.set_major_locator(ticker.MultipleLocator(base=tickspacing))
ax2.yaxis.set_major_locator(ticker.MultipleLocator(base=tickspacing))
plt.show()
EDIT:
It seems like your desired behaviour was different to how I interpreted your question. Here is a function that will change the limits of the y axes to make sure ymax-ymin is the same for both subplots, using the larger of the two ylim ranges to change the smaller one.
import matplotlib.pyplot as plt
import numpy as np
fig=plt.figure()
ax1=fig.add_subplot(211)
ax2=fig.add_subplot(212)
ax1.set_ylim(40,50)
ax2.set_ylim(40,70)
def adjust_axes_limits(ax1,ax2):
    yrange1 = np.ptp(ax1.get_ylim())
    yrange2 = np.ptp(ax2.get_ylim())
    def change_limits(ax,yr):
        new_ymin = ax.get_ylim()[0] - yr/2.
        new_ymax = ax.get_ylim()[1] + yr/2.
        ax.set_ylim(new_ymin,new_ymax)
    if yrange1 > yrange2:
        change_limits(ax2,yrange1-yrange2)
    elif yrange2 > yrange1:
        change_limits(ax1,yrange2-yrange1)
    else:
        pass
adjust_axes_limits(ax1,ax2)
plt.show()
Note that the first subplot here has expanded from (40, 50) to (30, 60) to match the y range of the second subplot.
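The same idea extends to more than two axes. Below is a minimal sketch, not part of the answer above: the helper name equalize_y_spans is made up for illustration, and it assumes none of the y axes is inverted. Each axis is padded symmetrically until its span matches the largest one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def equalize_y_spans(axes):
    """Pad every axis symmetrically so all y spans equal the largest one.

    Hypothetical helper; assumes the y axes are not inverted.
    """
    spans = [np.ptp(ax.get_ylim()) for ax in axes]
    target = max(spans)
    for ax, span in zip(axes, spans):
        pad = (target - span) / 2.0
        ymin, ymax = ax.get_ylim()
        ax.set_ylim(ymin - pad, ymax + pad)


fig, axes = plt.subplots(3, 1)
axes[0].set_ylim(40, 50)
axes[1].set_ylim(40, 70)
axes[2].set_ylim(0, 5)
equalize_y_spans(axes)
```

The axes can come from different figures as well; the function only needs a list of Axes objects.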

Tom's answer works well, but I decided to use a simpler solution.
I define an arbitrary y range for all my plots, e.g.
yrang = 0.003
and for each plot, I do:
ymin, ymax = ax.get_ylim()
ymid = np.mean([ymin,ymax])
ax.set_ylim([ymid - yrang/2 , ymid + yrang/2])
and possibly:
ax.yaxis.set_major_locator(ticker.MultipleLocator(base=0.005))
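Wrapped into a function for batch plotting, this could look roughly like the sketch below. The name set_fixed_y_span and the test data are made up for illustration. The tick step is deliberately chosen smaller than the span so that several ticks fall inside the visible range.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np


def set_fixed_y_span(ax, span, tick_step=None):
    """Centre a y range of width `span` on the current limits (hypothetical helper)."""
    ymin, ymax = ax.get_ylim()
    ymid = 0.5 * (ymin + ymax)
    ax.set_ylim(ymid - span / 2, ymid + span / 2)
    if tick_step is not None:
        # same metric spacing between y ticks on every figure
        ax.yaxis.set_major_locator(ticker.MultipleLocator(base=tick_step))


x = np.linspace(0, 1, 50)
axes = []
for offset in (0.120, 0.457):  # two figures with different origins
    fig, ax = plt.subplots()
    ax.plot(x, offset + 0.001 * np.sin(2 * np.pi * x))
    set_fixed_y_span(ax, span=0.003, tick_step=0.001)
    axes.append(ax)
```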

Related

How to set y-scale when making a boxplot with dataframe

I have a column of data with a very large distribution and thus I log2-transform it before plotting and visualizing it. This works fine but I cannot seem to figure out how to set the y-scale to the exponential values of 2 (instead I have just the exponents themselves).
df['num_ratings_log2'] = df['num_ratings'] + 1
df['num_ratings_log2'] = np.log2(df['num_ratings_log2'])
df.boxplot(column = 'num_ratings_log2', figsize=(10,10))
As the scale, I would like to have 1 (2^0), 32 (2^5), 1024 (2^10) ... instead of 0, 5, 10 ...
I want everything else about the plot to stay the same. How can I achieve this?
Instead of taking the log of the data, you can create a normal boxplot and then set a log scale on the y-axis (ax.set_yscale('log'), or symlog to also represent zero). To get the ticks at powers of 2 (instead of powers of 10), use a LogLocator with base 2. A ScalarFormatter shows the values as regular numbers (instead of as powers such as 2^10). A NullLocator for the minor ticks suppresses undesired extra ticks.
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter, LogLocator, NullLocator
import pandas as pd
import numpy as np
np.random.seed(123)
df = pd.DataFrame({'num_ratings': (np.random.pareto(10, 10000) * 800).astype(int)})
ax = df.boxplot(column='num_ratings', figsize=(10, 10))
ax.set_yscale('symlog') # symlog also allows zero
# ax.yaxis.set_major_formatter(ScalarFormatter()) # show tick labels as regular numbers
ax.yaxis.set_major_formatter(lambda x, p: f'{int(x):,}')
ax.yaxis.set_minor_locator(NullLocator()) # remove minor ticks
plt.show()
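For strictly positive data, a plain log scale with base 2 may be enough to put the major ticks on powers of 2. The following is a rough sketch with made-up data. It assumes matplotlib 3.3 or later for the base keyword, and it adds 1 to the counts (as in the question) so that no value is zero.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import NullLocator, ScalarFormatter
import numpy as np

rng = np.random.default_rng(123)
# made-up ratings counts, shifted by 1 so every value is positive
num_ratings = (rng.pareto(10, 10000) * 800).astype(int) + 1

fig, ax = plt.subplots(figsize=(10, 10))
ax.boxplot(num_ratings)
ax.set_yscale('log', base=2)                     # major ticks at powers of 2
ax.yaxis.set_major_formatter(ScalarFormatter())  # plain numbers instead of 2^k
ax.yaxis.set_minor_locator(NullLocator())        # remove minor ticks
yticks = ax.get_yticks()
```

The formatter and the minor locator are set after set_yscale because changing the scale resets both.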
I hope this is what you are looking for:
Code
ax = df.boxplot(column='num_ratings_log2', figsize=(20,10))
ymin = 0
ymax = 20
ax.set_ylim(2**ymin, 2**ymax)

Is it possible to test if the legend is covering any data in matplotlib/pyplot

Python beginner so apologies if incorrect terminology at any point.
I am using the legend(loc='best', ...) method and it works 99% of the time. However, when stacking more than 9 plots (i.e. i>9 in example below) on a single figure, with individual labels, it defaults to center and covers the data.
Is there a way to run a test in the script that will give a true/false value if the legend is covering any data points?
Very simplified code:
fig = plt.figure()
for i in data:
    plt.plot(i[x, y], label=LABEL)
fig.legend(loc='best')
fig.savefig()
Example of legend covering data
One way is to add some extra space at the bottom, top, left or right side of the axes (in your case I would prefer top or bottom) by changing the limits slightly. Doing so makes the legend fit below the data. Add extra space by setting a different y-limit with ax.set_ylim(-3e-4, 1.5e-4) (the upper limit is approximately what it is in your figure, and -3e-4 is an estimate of what you need).
You also need to split the legend into more columns with the keyword ncol=N when creating the legend.
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111)
x = np.linspace(0, 1, 100)
y = 3.5 * x - 2
for i in range(9):
    ax.plot(x, y + i / 10., label='iiiiiiiiiiii={}'.format(i))
ax.set_ylim(-3, 1.5)
ax.legend(loc='lower center', ncol=3) # ncol=3 looked nice for me, maybe you need to change this
plt.show()
EDIT
Another solution is to put the legend in a separate axis like I do in the code below. The data-plot does not need to care about making space for the legend or anything and you should have enough space in the axis below to put all your line-labels. If you need more space, you can easily change the ratio of the upper axis to the lower axis.
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(211)
ax_leg = fig.add_subplot(212)
x = np.linspace(0, 1, 100)
y = 3.5 * x - 2
lines = []
for i in range(9):  # for plotting the actual data
    li, = ax.plot(x, y + i / 10., label='iiiiiiiiiiii={}'.format(i))
    lines.append(li)
for line in lines:  # just to make the legend plot
    ax_leg.plot([], [], color=line.get_color(), label=line.get_label())
ax_leg.legend(loc='center', ncol=3)  # ncol=3 looked nice for me, maybe you need to change this
ax_leg.axis('off')
plt.show()
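As for the true/false test asked about in the question, one rough approach is to compare the legend's bounding box with the plotted points in display coordinates. The sketch below is only an approximation. The function name legend_covers_data is made up, it assumes a reasonably recent matplotlib, and it only checks the vertices of each line, so a long segment crossing the legend between two vertices would be missed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def legend_covers_data(ax, legend):
    """Return True if any vertex of any line in ax lies inside the legend box."""
    ax.figure.canvas.draw()           # the legend is only positioned at draw time
    box = legend.get_window_extent()  # legend bounding box in display (pixel) coords
    for line in ax.get_lines():
        pts = ax.transData.transform(line.get_xydata())  # data -> display coords
        inside = ((pts[:, 0] >= box.x0) & (pts[:, 0] <= box.x1) &
                  (pts[:, 1] >= box.y0) & (pts[:, 1] <= box.y1))
        if inside.any():
            return True
    return False


x = np.linspace(0, 1, 200)
fig, ax = plt.subplots()
ax.plot(x, x, label="diagonal")
covered_center = legend_covers_data(ax, ax.legend(loc="center"))
covered_corner = legend_covers_data(ax, ax.legend(loc="lower right"))
```

With a check like this, a script could fall back to more columns or a separate legend axis only when the legend actually overlaps the data.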

How to change the x-axis unit in matplotlib? [duplicate]

I am creating a plot in python. Is there a way to re-scale the axis by a factor? The yscale and xscale commands only allow me to turn log scale off.
Edit:
For example, if I have a plot where the x scale goes from 1 nm to 50 nm, the x scale will range from 1x10^(-9) to 50x10^(-9), and I want it to range from 1 to 50. That is, I want the plot function to divide the x values placed on the plot by 10^(-9).
As you have noticed, xscale and yscale do not support a simple linear re-scaling (unfortunately). As an alternative to Hooked's answer, instead of messing with the data, you can trick the labels like so:
ticks = ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x*scale))
ax.xaxis.set_major_formatter(ticks)
A complete example showing both x and y scaling:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
# Generate data
x = np.linspace(0, 1e-9)
y = 1e3*np.sin(2*np.pi*x/1e-9) # one period, 1k amplitude
# setup figures
fig = plt.figure()
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)
# plot two identical plots
ax1.plot(x, y)
ax2.plot(x, y)
# Change only ax2
scale_x = 1e-9
scale_y = 1e3
ticks_x = ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x/scale_x))
ax2.xaxis.set_major_formatter(ticks_x)
ticks_y = ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x/scale_y))
ax2.yaxis.set_major_formatter(ticks_y)
ax1.set_xlabel("meters")
ax1.set_ylabel('volt')
ax2.set_xlabel("nanometers")
ax2.set_ylabel('kilovolt')
plt.show()
And finally I have the credits for a picture:
Note that, if you have text.usetex: true as I have, you may want to enclose the labels in $, like so: '${0:g}$'.
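If several axes need this treatment, the formatter can be built by a small factory. This is just a sketch. The name scaled_formatter is made up, and it divides by the scale as in the complete example above.

```python
import matplotlib.ticker as ticker


def scaled_formatter(scale):
    """Return a formatter that labels a tick at x with x / scale (hypothetical helper)."""
    return ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x / scale))


to_nm = scaled_formatter(1e-9)  # metres -> nanometres
to_kv = scaled_formatter(1e3)   # volts -> kilovolts
label_nm = to_nm(5e-9, 0)
label_kv = to_kv(2500, 0)
```

It would then be applied with ax.xaxis.set_major_formatter(scaled_formatter(1e-9)); the underlying data and axis limits stay in the original units.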
Instead of changing the ticks, why not change the units instead? Make a separate array X of x-values whose units are in nm. This way, when you plot the data it is already in the correct format! Just make sure you add a xlabel to indicate the units (which should always be done anyways).
from pylab import *
# Generate random test data in your range
N = 200
epsilon = 10**(-9.0)
X = epsilon*(50*random(N) + 1)
Y = random(N)
# X2 now has the "units" of nanometers by scaling X
X2 = (1/epsilon) * X
subplot(121)
scatter(X,Y)
xlim(epsilon,50*epsilon)
xlabel("meters")
subplot(122)
scatter(X2,Y)
xlim(1, 50)
xlabel("nanometers")
show()
To set the range of the x-axis, you can use set_xlim(left, right); see the matplotlib docs.
Update:
It looks like you want an identical plot with only the tick values changed. You can do that by getting the tick values and then changing them to whatever you want. For your needs it would look like this:
ticks = your_plot.get_xticks()*10**9
your_plot.set_xticklabels(ticks)

Discrete pyplot scatter colorbar

I am creating a scatterplot with a colorbar
plt.scatter(X, Y, c=Z)
plt.colorbar()
plt.show()
plt.close()
where X and Y are float arrays and Z is an integer array.
Even though Z is an integer array (here 1-14), the colorbar displays floats.
How can I display a discrete colorbar 1-14?
I found something attempting to answer a similar question here, but I don't understand the answer (containing some complications to make 0 be gray) well enough to apply it.
Check out the second answer to your linked question. If you discretize your colourmap before calling scatter, it will automatically work as you want it to:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
n = 14
X = np.random.rand(20)
Y = np.random.rand(20)
Z = np.random.randint(low=0,high=n,size=X.shape)
plt.figure()
plt.scatter(X,Y,c=Z,cmap=cm.hot)
plt.colorbar()
plt.figure()
plt.scatter(X,Y,c=Z,cmap=plt.get_cmap('hot',n))  # plt.get_cmap, since cm.get_cmap is gone from recent matplotlib releases
plt.colorbar()
Results for comparison:
Note that jet was the default colourmap only up to version 2.0, when viridis took over as the new (and wonderful) default.
If what's bothering you is that the numbers are floating-point on the colourbar, you can set manual ticks in it, irrespective of the discretization of colours:
plt.figure()
plt.scatter(X,Y,c=Z,cmap=cm.jet)
plt.colorbar(ticks=np.unique(Z))
#or
#plt.colorbar(ticks=range(Z.min(),Z.max()+1))
Result:
Note that since I used a few random-generated points, not every number is present in Z, so unique might not be the best approach (see the missing ticks in the above figure). This is why I also added a solution based on min/max. You can tailor the limits to your needs depending on your actual application.
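Another option, sketched here with made-up data, is to combine the discretized colourmap with a BoundaryNorm whose bin edges sit halfway between the integers. Each integer class then gets exactly one colour and the colourbar ticks land in the middle of each colour block. The choice of viridis and the class range 1 to 14 are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import BoundaryNorm

n = 14
rng = np.random.default_rng(0)
X = rng.random(50)
Y = rng.random(50)
Z = rng.integers(1, n + 1, size=50)  # integer classes 1..14

cmap = plt.get_cmap('viridis', n)                     # n discrete colours
norm = BoundaryNorm(np.arange(0.5, n + 1.5), cmap.N)  # edges at 0.5, 1.5, ..., 14.5

fig, ax = plt.subplots()
sc = ax.scatter(X, Y, c=Z, cmap=cmap, norm=norm)
cbar = fig.colorbar(sc, ax=ax, ticks=np.arange(1, n + 1))  # one tick per class
```

Because the ticks are given explicitly, all 14 labels appear even if some classes are missing from Z.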
Here is my discrete colorbar for land use types. It seems similar to your case, because the Z value is also an integer array from 1 to 14.
My method
Create the colormap and the colorbar labels manually (learned from here).
My Code
from matplotlib.colors import ListedColormap

cMap = ListedColormap(['white', '#8dd3c7', '#ffffb3', '#bebada',
                       '#b2182b', '#80b1d3', '#fdb462', '#b3de69', '#6a3d9a',
                       '#b2df8a', '#1f78b4', '#ccebc5', '#ffed6f'])
## If you want to use a colormap from plt.cm instead, you can use (taking 'jet' as an example)
# cMap = plt.get_cmap("jet", lut=13)
### here you can change your data in
lulc = plt.pcolormesh(lulc, cmap=cMap, alpha=0.7)
z_range = np.linspace(1, 14, 14)
labels = z_range.astype(int).astype(str)  # class labels '1' ... '14'
k = -0.05
for i in range(0, 13, 1):
    k = k + 1/13.0
    # ax is the axes that holds the pcolormesh
    ax.annotate(labels[i], xycoords='axes fraction', xy=(1.12, k), fontsize=14,
                fontstyle='italic', zorder=3)
cbar = plt.colorbar(lulc, ticks=[])
for label in cbar.ax.yaxis.get_ticklabels()[::-1]:
    label.set_visible(False)
My result
(source: tietuku.com)
Hope this helps!

What is wrong with this matplotlib code?

I am trying to plot datetime on y axis and time on x-axis using a bar graph. I need to specify the heights in terms of datetime of y-axis and I am not sure how to do that.
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import datetime as dt
# Make a series of events 1 day apart
y = mpl.dates.drange(dt.datetime(2009,10,1),
dt.datetime(2010,1,15),
dt.timedelta(days=1))
# Vary the datetimes so that they occur at random times
# Remember, 1.0 is equivalent to 1 day in this case...
y += np.random.random(x.size)
# We can extract the time by using a modulo 1, and adding an arbitrary base date
times = y % 1 + int(y[0]) # (The int is so the y-axis starts at midnight...)
# I'm just plotting points here, but you could just as easily use a bar.
fig = plt.figure()
ax = fig.add_subplot(111)
ax.bar(times, y, width = 10)
ax.yaxis_date()
fig.autofmt_ydate()
plt.show()
I think it could be your y range.
When I run your code I get a plot with the y axis ranging from about 11/Dec/2011 to 21/Dec/2011.
However, you've generated dates ranging from 1/10/2009 to 15/1/2010.
When I adjusted my y limits to take this into account it worked fine. I added this after the ax.bar.
plt.ylim( (min(y)-1,max(y)+1) )
Another reason the output is confusing is that since you've picked a width of 10, the bars are too wide and are actually obscuring each other.
Try using ax.plot(times,y,'ro') to see what I mean.
I produced the following plot using ax.bar(times,y,width=.1,alpha=.2) and ax.plot(times,y,'ro') to show you what I meant about bars overlapping each other:
And that's with a width of .1 for the bars, so if they had a width of 10 they'd be completely obscuring each other.
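On the original point of giving bar heights in terms of datetimes: one possible approach, sketched below, is to start every bar at a base date with bottom= and give the height as the offset from that base in days. This is not taken from the question's code. The variable names, the hour-of-day x axis and the bar width are assumptions for illustration.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# one event per day, as matplotlib date numbers (1.0 == 1 day)
days = mdates.drange(dt.datetime(2009, 10, 1),
                     dt.datetime(2010, 1, 15),
                     dt.timedelta(days=1))
stamps = days + rng.random(days.size)  # random time within each day
hour_of_day = (stamps % 1) * 24        # x position: time of day in hours
base = days[0]                         # every bar starts at the first date

fig, ax = plt.subplots()
# height is measured from `bottom`, so the top of each bar sits at its timestamp
bars = ax.bar(hour_of_day, stamps - base, bottom=base, width=0.05, alpha=0.3)
ax.yaxis_date()                        # interpret y values as dates
ax.set_ylim(base - 1, stamps.max() + 1)
ax.set_xlabel("hour of day")
```

The narrow width keeps neighbouring bars from hiding each other, in line with the remark above about width=10.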
