Matplotlib: how to locate ticks and show the min and max of the data

Good day,
I would like to dynamically locate my ticks and show the min and max of the data (which varies, so I really can't hardcode the conditions). I'm trying to use the matplotlib.ticker functions, and the best that I can find is MaxNLocator(), but unfortunately it does not consider the limits of my dataset.
What would be the best approach to my problem?
Thanks!
pseudocode as follows:
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
data1 = range(5)
ax1 = plt.subplot(2,1,1)
ax1.plot(data1)
data2 = range(63)
ax2 = plt.subplot(2,1,2)
ax2.plot(data2)
ax1.xaxis.set_major_locator(MaxNLocator(integer=True))
ax2.xaxis.set_major_locator(MaxNLocator(integer=True))
plt.show()
and the output is:

Not sure about best approach, but one possible way to do this would be to create a list of numbers between your minimum and maximum using numpy.linspace(start, stop, num). The third argument passed to this lets you control the number of points generated. You can then round these numbers using a list comprehension, and then set the ticks using ax.set_xticks().
Note: This will produce unevenly distributed ticks in some cases, which may be unavoidable in your case
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import numpy as np
data1 = range(5)
ax1 = plt.subplot(2,1,1)
ax1.plot(data1)
data2 = range(63) # max of this is 62, not 63 as in the question
ax2 = plt.subplot(2,1,2)
ax2.plot(data2)
ticks1 = np.linspace(min(data1),max(data1),5)
ticks2 = np.linspace(min(data2),max(data2),5)
int_ticks1 = [round(i) for i in ticks1]
int_ticks2 = [round(i) for i in ticks2]
ax1.set_xticks(int_ticks1)
ax2.set_xticks(int_ticks2)
plt.show()
This gives:
Update: This gives a maximum of 5 ticks; however, if the data covers a small range, say range(3), the number of ticks will be smaller. I have updated the creation of int_ticks1 and int_ticks2 so that only unique values are used, which avoids plotting the same tick repeatedly when the range is small.
Using the following data
data1 = range(3)
data2 = range(3063)
# below removes any duplicate ticks
int_ticks1 = sorted(set(int(round(i)) for i in ticks1))  # sorted keeps the ticks in ascending order
int_ticks2 = sorted(set(int(round(i)) for i in ticks2))
This produces the following figure:
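
As a possible alternative to rounding linspace values, the "nice" tick positions from MaxNLocator can be combined with the exact data limits. The sketch below is only one way to do that: the helper name ticks_with_endpoints and its min_gap_frac parameter are made up for illustration, and the 5% gap used to drop ticks that crowd the endpoints is an arbitrary assumption.

```python
import numpy as np
from matplotlib.ticker import MaxNLocator


def ticks_with_endpoints(lo, hi, nbins=5, min_gap_frac=0.05):
    """Return 'nice' integer ticks between lo and hi, plus lo and hi themselves.

    ticks_with_endpoints and min_gap_frac are hypothetical names for this sketch.
    """
    nice = MaxNLocator(nbins=nbins, integer=True).tick_values(lo, hi)
    gap = (hi - lo) * min_gap_frac
    # Keep only the nice ticks that lie inside the data range and are not
    # so close to an endpoint that the labels would overlap.
    interior = [t for t in nice if lo + gap < t < hi - gap]
    # np.unique sorts the ticks and collapses lo == hi into a single tick.
    return np.unique([lo, *interior, hi])


ticks_small = ticks_with_endpoints(0, 4)    # limits of data1 = range(5)
ticks_large = ticks_with_endpoints(0, 62)   # limits of data2 = range(63)
```

The resulting arrays can be passed to ax1.set_xticks(ticks_small) and ax2.set_xticks(ticks_large). The interior ticks stay evenly spaced, but the first and last intervals may be shorter than the others.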

Related

How to set y-scale when making a boxplot with dataframe

I have a column of data with a very large distribution and thus I log2-transform it before plotting and visualizing it. This works fine but I cannot seem to figure out how to set the y-scale to the exponential values of 2 (instead I have just the exponents themselves).
df['num_ratings_log2'] = df['num_ratings'] + 1
df['num_ratings_log2'] = np.log2(df['num_ratings_log2'])
df.boxplot(column = 'num_ratings_log2', figsize=(10,10))
As the scale, I would like to have 1 (2^0), 32 (2^5), 1024 (2^10) ... instead of 0, 5, 10 ...
I want everything else about the plot to stay the same. How can I achieve this?

Instead of taking the log of the data, you can create a normal boxplot and then set a log scale on the y-axis (ax.set_yscale('log'), or symlog to also represent zero). To get the ticks at powers of 2 (instead of powers of 10), use a LogLocator with base 2. A ScalarFormatter shows the values as regular numbers (instead of as powers such as 2^10). A NullLocator for the minor ticks suppresses undesired extra ticks.
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter, LogLocator, NullLocator
import pandas as pd
import numpy as np
np.random.seed(123)
df = pd.DataFrame({'num_ratings': (np.random.pareto(10, 10000) * 800).astype(int)})
ax = df.boxplot(column='num_ratings', figsize=(10, 10))
ax.set_yscale('symlog', base=2) # symlog also allows zero; base=2 places the ticks at powers of 2
# ax.yaxis.set_major_formatter(ScalarFormatter()) # show tick labels as regular numbers
ax.yaxis.set_major_formatter(lambda x, p: f'{int(x):,}')
ax.yaxis.set_minor_locator(NullLocator()) # remove minor ticks
plt.show()
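
If the log2-transformed column should stay as it is, another option is to leave the axis linear and only relabel the ticks with 2 raised to the tick value. The following is a minimal sketch under that assumption; pow2_label is a made-up helper name, and the sample data is random rather than the asker's. Because the question adds 1 before taking the logarithm, the labels show num_ratings + 1.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter, MultipleLocator


def pow2_label(y, pos=None):
    """Label a tick at exponent y with the value 2**y (hypothetical helper)."""
    return f"{2 ** y:,.0f}"


rng = np.random.default_rng(123)
log2_values = np.log2(rng.pareto(10, 1000) * 800 + 1)  # stand-in for num_ratings_log2

fig, ax = plt.subplots(figsize=(10, 10))
ax.boxplot(log2_values)
ax.yaxis.set_major_locator(MultipleLocator(5))            # ticks at exponents 0, 5, 10, ...
ax.yaxis.set_major_formatter(FuncFormatter(pow2_label))   # shown as 1, 32, 1,024, ...
```

Only the tick labels change here; the box positions and axis limits are the same as in the plot of the transformed data.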

I hope this is what you are looking for:
ax = df.boxplot(column='num_ratings_log2', figsize=(20,10))
ymin = 0
ymax = 20
ax.set_ylim(2**ymin, 2**ymax)

How can I create a list of the values on the y-axis without having to plot a graph in Python?

I have a piece of code that plots a random walk with a specified number of bins on my y-axis. Is there a way in Python to replicate/recreate the values on my y-axis without having to plot the graph? Below is the code I've been working on. The method I've tried is to divide the min-max range by the number of wanted bins and then create a list with these values. However, I find my method far from optimal, and its results are not close to what I get with the code below.
I am grateful for any help on this matter!
import matplotlib.pyplot as plt
import numpy as np
import random
dims = 1
step_n = 2000
step_set = [-1, 0, 1]
origin = np.zeros((1,dims))
np.random.seed(30) # seed NumPy's generator, because np.random.choice is used below
step_shape = (step_n,dims)
steps = np.random.choice(a=step_set, size=step_shape)
path = np.concatenate([origin, steps]).cumsum(0)
# create subplot
fig, ax = plt.subplots(1,1, figsize=(20, 11))
img = ax.plot(path)
plt.locator_params(axis='y', nbins=20)
y_values = ax.get_yticks() # y_values is a numpy array with my y values

I am not sure I understood your problem correctly.
Matplotlib chooses the spacing between ticks in a way that, I assume, mostly yields multiples of 5.
A general approach could be to calculate a padding based on the number of bins you want and add/subtract it from the data limits. For your given example, the following gives the same result as ax.get_yticks():
bins = 19
padding = np.ceil((np.max(path) - np.min(path)) / bins)
np.linspace(np.min(path) - padding, np.max(path) + padding, bins, dtype=np.int32)
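
Another way to get tick values without drawing anything is to ask a tick locator directly. The sketch below assumes that the axis in the question uses Matplotlib's default automatic locator (a MaxNLocator with steps 1, 2, 2.5, 5, 10) and the default 5% axis margin; yticks_without_plot is a made-up name, and the result may differ from ax.get_yticks() if those defaults have been changed.

```python
import numpy as np
from matplotlib.ticker import MaxNLocator


def yticks_without_plot(values, nbins=20, margin=0.05):
    """Approximate the y ticks Matplotlib would choose (hypothetical helper)."""
    lo, hi = float(np.min(values)), float(np.max(values))
    pad = (hi - lo) * margin  # assumed default axes margin of 5%
    locator = MaxNLocator(nbins=nbins, steps=[1, 2, 2.5, 5, 10])
    return locator.tick_values(lo - pad, hi + pad)


rng = np.random.default_rng(30)
steps = rng.choice([-1, 0, 1], size=2000)
path = np.concatenate([[0], steps]).cumsum()  # 1-D random walk starting at 0

y_values = yticks_without_plot(path)
```

The returned array is evenly spaced and extends at least to the padded data limits, which is also what plt.locator_params(axis='y', nbins=20) aims for on a real axis.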

Getting and setting maximum and minimum values from a Cartopy GeoAxesSubplot object

I have an array of subplots that I would like to share a colorbar through a post-processing step. When plotting 1-d data, I can do this by iterating over the axes after creating the data and using get_ylim() and set_ylim() to find, and then set, the correct global minimum and maximum values.
When working with Cartopy GeoAxesSubplot objects, however, I haven't been able to find functions to retrieve or set the z-axis (color) limits. The function get_ylim now corresponds to the plot extent rather than the data.
I am trying to avoid taking the extra step to calculate vmin and vmax beforehand, because the processing for each subplot takes quite a long time and I would not like to do it twice. I would much rather adjust the geoaxes in a post-processing step.
Simply, how do I get from the first figure to the second figure if I am only given the first figure?
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
# Create random data
data=[]
for i in range(4):
    data.append(i + np.random.random((10,10)))
# Plot with individual colorbars
fig,ax = plt.subplots(nrows=2, ncols=2, subplot_kw={'projection':ccrs.NorthPolarStereo()})
for _ax,_dat in zip(ax.flat,data):
    im = _ax.imshow(_dat)
    plt.colorbar(im,ax=_ax)
fig.suptitle('Before.')
plt.show()
# Plot with a shared colorbar
fig2,ax2 = plt.subplots(nrows=2, ncols=2, subplot_kw={'projection':ccrs.NorthPolarStereo()})
for _ax,_dat in zip(ax2.flat,data):
    im = _ax.imshow(_dat, vmin=0, vmax=4)
fig2.colorbar(im, ax=ax2.ravel().tolist())
fig2.suptitle('After.')
plt.show()

I ended up solving this by using the get_clim() and set_clim() functions of the matplotlib.collections.QuadMesh object.
I iterate over the axes, and then iterate over the components using get_children(). When I identify a QuadMesh object, I save it to a list. Finally, I iterate over that list twice, first to calculate the global minimum and maximum values, and then to set each subplot to those values.
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy as cpy
import matplotlib as mpl
geoaxes = figure.axes # `figure` is the existing Figure that holds the subplots
qms = [] # to store QuadMesh objects
for i in geoaxes: # iterate over axes and find QuadMesh objects
    for j in i.get_children(): # the artists that make up a single axes
        if isinstance(j, mpl.collections.QuadMesh):
            qms.append(j)
# Calculate global min and max values
vmin, vmax = qms[0].get_clim() # initialize min/max (named to avoid shadowing the built-ins)
for _qm in qms:
    _clim = _qm.get_clim()
    if _clim[0] < vmin:
        vmin = _clim[0]
    if _clim[1] > vmax:
        vmax = _clim[1]
    print(_clim)
# Set common bounds for each QuadMesh:
for _qm in qms:
    _qm.set_clim((vmin, vmax))
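
The same idea can be tried without Cartopy. The sketch below uses plain Matplotlib axes and random data as a stand-in, and it also accepts AxesImage objects, since imshow (used in the question) creates those rather than a QuadMesh. Treat it as an illustration of the get_clim()/set_clim() approach, not as tested Cartopy code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import QuadMesh
from matplotlib.image import AxesImage

rng = np.random.default_rng(0)
fig, axes = plt.subplots(nrows=2, ncols=2)
for i, ax in enumerate(axes.flat):
    ax.pcolormesh(i + rng.random((10, 10)))  # each panel has its own color range

# Collect every color-mapped artist from the finished figure.
mappables = [
    child
    for ax in fig.axes
    for child in ax.get_children()
    if isinstance(child, (QuadMesh, AxesImage))
]

# Global limits across all panels, then apply them to every artist.
lows, highs = zip(*(m.get_clim() for m in mappables))
vmin, vmax = min(lows), max(highs)
for m in mappables:
    m.set_clim(vmin, vmax)

# One colorbar shared by all panels.
fig.colorbar(mappables[0], ax=axes.ravel().tolist())
```

Because every artist now has the same limits, any one of them can serve as the source for the shared colorbar.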

x-axis labelling with matplotlib

I have a two-dimensional NumPy array and I plot the first column with plt.plot(wp[:, 0]). This shows exactly what I want, and there is nothing I want to change besides the x-axis labelling. For the x-axis I am looking for a command that marks the regions where the value of the second column stays the same and displays the y value of each region.
[x1,y1]
[x2,y2]
[x3,y2]
[x4,y3]
[x5,y3]
[x6,y3]
[x7,y4]
As you can see in my example matrix, the entries in the second column are not unique; instead there are "regions" with the same value.
Edit: plt.xticks(tx, wp[:,2], rotation='vertical') does work for smaller matrices but looks really ugly for larger ones.
So in my opinion it would be enough if each number occurred just once. Do you know how to do that?

You'll have to do two things:
1. Customize the number of ticks.
2. Customize what is printed for a given tick value.
The following is modified from the Matplotlib examples:
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MaxNLocator
fig = plt.figure()
ax = fig.add_subplot(111)
xs = range(100)
ys = range(100)
def format_fn(tick_val, tick_pos):
    return '{0}'.format(int(tick_val))[:1]
ax.xaxis.set_major_formatter(FuncFormatter(format_fn))
ax.xaxis.set_major_locator(MaxNLocator(nbins=6,integer=True))
ax.plot(xs, ys)
plt.show()
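
To label each region of equal values exactly once, one option is to compute a single tick position per run of identical entries in the second column. The following is a rough sketch of that idea; region_ticks is a made-up helper, the sample column is invented, and a non-empty input is assumed.

```python
import numpy as np


def region_ticks(labels):
    """Return one tick position (the run's center) and one label per run.

    region_ticks is a hypothetical helper; it assumes labels is non-empty.
    """
    labels = np.asarray(labels)
    # Indices where a new run of equal values starts.
    starts = np.flatnonzero(np.r_[True, labels[1:] != labels[:-1]])
    # Exclusive end index of each run.
    ends = np.r_[starts[1:], len(labels)]
    centers = (starts + ends - 1) / 2
    return centers, labels[starts]


second_col = np.array([1, 2, 2, 3, 3, 3, 4])  # stand-in for the second column of wp
positions, names = region_ticks(second_col)
```

The result can be passed to plt.xticks(positions, names), which places each label once, in the middle of its region.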

Pyplot: using percentage on x axis

I have a line chart based on a simple list of numbers. By default the x-axis is just an increment of 1 for each value plotted. I would like it to be a percentage instead but can't figure out how. So instead of having an x-axis from 0 to 5, it would go from 0% to 100% (but keeping reasonably spaced tick marks). Code below. Thanks!
from matplotlib import pyplot as plt
from mpl_toolkits.axes_grid.axislines import Subplot
data=[8,12,15,17,18,18.5]
fig=plt.figure(1,(7,4))
ax=Subplot(fig,111)
fig.add_subplot(ax)
plt.plot(data)

The code below gives you a simplified, percentage-based x-axis. It assumes that your values are spaced equally between 0% and 100%.
It creates a perc array that holds evenly spaced percentages to plot against. It then adjusts the formatting of the x-axis to include a percent sign using matplotlib.ticker.FormatStrFormatter. Unfortunately this uses old-style (%-based) string formatting rather than the new style.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick
data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))
fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)
ax.plot(perc, data)
fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)
plt.show()
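
If new-style formatting is preferred, matplotlib.ticker.StrMethodFormatter takes a str.format template in which the tick value is named x. The following is a small sketch of the same plot using it; the data is the list from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np

data = [8, 12, 15, 17, 18, 18.5]
perc = np.linspace(0, 100, len(data))  # evenly spaced percentages, as above

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(perc, data)

# New-style template: the tick value is passed to str.format as `x`.
formatter = mtick.StrMethodFormatter("{x:.0f}%")
ax.xaxis.set_major_formatter(formatter)

example_label = formatter(40.0)  # what a tick at 40 would display
```

The plot itself is unchanged; only the way the tick label string is built differs.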

This is a few months late, but I have created PR#6251 with matplotlib to add a new PercentFormatter class. With this class you can do as follows to set the axis:
import matplotlib.ticker as mtick
# Actual plotting code omitted
ax.xaxis.set_major_formatter(mtick.PercentFormatter(5.0))
This will display values from 0 to 5 on a scale of 0% to 100%. The formatter is similar in concept to what @Ffisegydd suggests doing except that it can take any arbitrary existing ticks into account.
PercentFormatter() accepts three arguments, xmax, decimals, and symbol. xmax allows you to set the value that corresponds to 100% on the axis (in your example, 5).
The other two parameters allow you to set the number of digits after the decimal point and the symbol. They default to None and '%', respectively. decimals=None will automatically set the number of decimal places based on how much of the axes you are showing.
Note that this formatter will use whatever ticks would normally be generated if you just plotted your data. It does not modify anything besides the strings that are output to the tick marks.
Update
PercentFormatter was accepted into Matplotlib in version 2.1.0.
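
A short runnable sketch of this approach, using the data from the question, might look as follows. It assumes Matplotlib 2.1.0 or later and a fixed decimals=0 to keep the labels short.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

data = [8, 12, 15, 17, 18, 18.5]

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(data)  # x runs from 0 to len(data) - 1 = 5

# The last x value (5) should be displayed as 100%.
formatter = mtick.PercentFormatter(xmax=len(data) - 1, decimals=0)
ax.xaxis.set_major_formatter(formatter)

# format_pct(x, display_range) returns the label text for a tick at x.
label = formatter.format_pct(2.5, 5)
```

Here the data is plotted against its plain index and only the labels are converted, so a tick at x = 2.5 is displayed as 50%.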

Totally late in the day, but I wrote this and thought it could be of use:
import pandas as pd

def transformColToPercents(x, rnd, navalue):
    # Returns a pandas Series, suitable for a new DataFrame column,
    # with all values min-max scaled to the range 0-100 (percent).
    # rnd = number of decimal places to round to
    # navalue = value substituted for NaN results (e.g. when max == min)
    hv = x.max(axis=0)
    lv = x.min(axis=0)
    pp = pd.Series(((x - lv) * 100) / (hv - lv)).round(rnd)
    return pp.fillna(navalue)

df['new column'] = transformColToPercents(df['a'], 2, 0)
