I have a log-log plot where the range goes from 10^-3 to 10^+3. I would like values ≥10^0 to have a + sign in the exponent analogous to how values <10^0 have a - sign in the exponent. Is there an easy way to do this in matplotlib?
I looked into FuncFormatter but it seems overly complex to achieve this and also I couldn't get it to work.
You can do this with a FuncFormatter from the matplotlib.ticker module. You need a condition on whether the tick's value is less than 1 or not. So, if log10 of the tick value is >= 0, add the + sign to the label string; if not, the minus sign appears automatically.
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
# sample data
x = y = np.logspace(-3,3)
# create a figure
fig,ax = plt.subplots(1)
# plot sample data
ax.loglog(x,y)
# this is the function the FuncFormatter will use
def mylogfmt(x, pos):
    logx = np.log10(x)  # to get the exponent
    if logx < 0:
        # negative sign is added automatically
        return u"$10^{{{:.0f}}}$".format(logx)
    else:
        # we need to explicitly add the positive sign
        return u"$10^{{+{:.0f}}}$".format(logx)
# Define the formatter
formatter = ticker.FuncFormatter(mylogfmt)
# Set the major_formatter on x and/or y axes here
ax.xaxis.set_major_formatter(formatter)
ax.yaxis.set_major_formatter(formatter)
plt.show()
Some explanation of the format string:
"$10^{{+{:.0f}}}$".format(logx)
The double braces {{ and }} produce literal braces in the output, which LaTeX uses to group everything that should be raised as the exponent. We need double braces because single braces are used by Python to delimit the format field, in this case {:.0f}. For more explanation of format specifications, see the docs here, but the TL;DR for your case is that we are formatting a float with a precision of 0 decimal places (i.e. printing it essentially as an integer); the exponent is a float in this case because np.log10 returns a float. (One could alternatively convert the output of np.log10 to an int and then format the string as an int; it's just a matter of preference.)
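For instance, here is what that format string produces for a couple of example exponents (the values are chosen here just for illustration):
print("$10^{{+{:.0f}}}$".format(2.0))     # -> $10^{+2}$
print("$10^{{{:.0f}}}$".format(-3.0))     # -> $10^{-3}$
# the integer alternative mentioned above:
print("$10^{{+{:d}}}$".format(int(2.0)))  # -> $10^{+2}$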
I hope this is what you mean:
def fmt(y, pos):
    # split the scientific-notation string into mantissa and exponent
    a, b = '{:.2e}'.format(y).split('e')
    b = int(b)  # only the exponent is needed for the label
    if b >= 0:
        format_example = r'$10^{{+{}}}$'.format(b)
    else:
        format_example = r'$10^{{{}}}$'.format(b)
    return format_example
Then use FuncFormatter, e.g. for a colorbar: plt.colorbar(name_of_plot, ticks=list_with_tick_locations, format=ticker.FuncFormatter(fmt)). I think you have to import matplotlib.ticker as ticker.
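For example, a minimal sketch of that colorbar usage (the sample data and the LogNorm scaling are my own additions for illustration):
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from matplotlib.colors import LogNorm

data = np.logspace(-3, 3, 400).reshape(20, 20)   # sample data spanning 10^-3 to 10^3
im = plt.imshow(data, norm=LogNorm(vmin=1e-3, vmax=1e3))
# fmt is the formatter function defined above
plt.colorbar(im, ticks=[1e-3, 1e-1, 1e1, 1e3], format=ticker.FuncFormatter(fmt))
plt.show()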
Regards
I was plotting a scatter plot to show null values in a dataframe. As you can see, the plt.scatter() call is not expressive enough: the relation between list(range(0,1200)) and a is not clear unless you read the previous lines. Can plt.scatter(x, y) be written in a more explicit way, so that it is easy to understand how x and y are related? That is, if somebody only saw the plt.scatter(x, y) call, they would still understand what it is about.
a = []
for i in range(0, 1200):
    feature_with_na = [feature for feature in df.columns if df[feature].isnull().sum() > i]
    a.append(len(feature_with_na))
plt.scatter(list(range(0, 1200)), a)
On your x-axis you have a threshold, and on the y-axis you want to plot the number of columns in your DataFrame that have more than that many null values.
Instead of your loop, you can count the number of null values within each column once and use NumPy broadcasting ([:, None]) to compare against an array of thresholds. This lets you specify an xarr of thresholds and then use that same array both in the comparison and as your x values.
Sample Data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.choice([1,2,3,4,5,np.NaN], (100,10)))
Code
# Range of 'x' values to consider
xarr = np.arange(0, 100)
plt.scatter(xarr, (df.isnull().sum().to_numpy()>xarr[:, None]).sum(axis=1))
ALollz's answer is good, but here's a less NumPy-heavy alternative if that's more your thing:
feature_null_counts = df.isnull().sum()
n_nulls = list(range(100))
features_with_n_nulls = [sum(feature_null_counts > n) for n in n_nulls]
plt.scatter(n_nulls, features_with_n_nulls)
I want to change x axis scale. For example, I am reading a data from txt file.
This data is like a = [1, 2, 5, 9, 12, 17] and I want to rescale these numbers to the range [0, 3]. I mean, a = [1, 2, 5, 9, 12, 17] has 6 numbers, but I need to scale them into [0, 3] so that my axis only spans c = [0, 3]. I have other data b = [1, 2, 3, 4, 5, 6]. I normally plot with plot(a, b), but I want to plot the scaled version, like plot(c, b). I don't know which function to use for that.
A second question: I used plt.axhline(y=0.005), and I want to change the linestyle because otherwise it gives a continuous line. How can I draw maximum and minimum thresholds with a dashed style?
Second question answer:
import matplotlib.pyplot as plt
plt.axhline(y=0.5, color='b', linestyle='--',linewidth=1)
plt.axhline(y=-0.5, color='b', linestyle='--',linewidth=1)
plt.show()
This is how I solved my second question.
If NumPy is available you can use the interp function to generate your scaled values (docs):
import numpy as np
scaled_a = np.interp(a, (min(a), max(a)), c)
The scaled_a variable is a NumPy array that can be passed to matplotlib's plot function in place of the original a variable.
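For example, using the values from the question:
import numpy as np
import matplotlib.pyplot as plt

a = [1, 2, 5, 9, 12, 17]
b = [1, 2, 3, 4, 5, 6]   # the other data series from the question
c = [0, 3]

scaled_a = np.interp(a, (min(a), max(a)), c)   # maps 1 -> 0 and 17 -> 3
plt.plot(scaled_a, b)
plt.show()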
If NumPy is not available you'll have to do a bit of arithmetic to calculate the new values:
def scaler(x, old_min, old_max, new_min, new_max):
    old_diff = old_max - old_min
    new_diff = new_max - new_min
    return ((x - old_min) * (new_diff / old_diff)) + new_min

old_min = min(a)
old_max = max(a)
scaled_a = [scaler(x, old_min, old_max, c[0], c[1]) for x in a]
The variable scaled_a is now a Python list, but it can still be passed to the plot function.
When using scipy.ndimage.interpolation.shift to shift a numpy data array along one axis with periodic boundary treatment (mode = 'wrap'), I get an unexpected behavior. The routine tries to force the first pixel (index 0) to be identical to the last one (index N-1) instead of the "last plus one (index N)".
Minimal example:
# module import
import numpy as np
from scipy.ndimage.interpolation import shift
import matplotlib.pyplot as plt
# print scipy.__version__
# 0.18.1
a = range(10)
plt.figure(figsize=(16,12))
for i, shift_pix in enumerate(range(10)):
    # shift the data via spline interpolation
    b = shift(a, shift=shift_pix, mode='wrap')
    # plotting the data
    plt.subplot(5, 2, i+1)
    plt.plot(a, marker='o', label='data')
    plt.plot(np.roll(a, shift_pix), marker='o', label='data, roll')
    plt.plot(b, marker='o', label='shifted data')
    if i == 0:
        plt.legend(loc=4, fontsize=12)
    plt.ylim(-1, 10)
    ax = plt.gca()
    ax.text(0.10, 0.80, 'shift %d pix' % i, transform=ax.transAxes)
Blue line: data before the shift
Green line: expected shift behavior
Red line: actual shift output of scipy.ndimage.interpolation.shift
Is there some error in how I call the function or how I understand its behavior with mode = 'wrap'? The current results are in contrast to the mode parameter description from the related scipy tutorial page and from another StackOverflow post. Is there an off-by-one-error in the code?
The SciPy version used is 0.18.1, distributed in Anaconda 2.2.0.
It seems that the behaviour you have observed is intentional.
The cause of the problem lies in the C function map_coordinate which translates the coordinates after shift to ones before shift:
map_coordinate(double in, npy_intp len, int mode)
The function is used as a subroutine in NI_ZoomShift, which does the actual shift. Its interesting part is the branch that handles coordinates falling below zero.
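Roughly, in Python terms, that branch does something like the following (a sketch reconstructed from the walkthrough below, not the actual C source):
# rough Python sketch of the wrap handling for coordinates below zero
def map_coordinate_wrap(in_, length):
    if in_ < 0:
        sz = length - 1                   # note: len - 1, not len
        in_ += sz * ((-in_ // sz) + 1)    # wrap back into range
    return in_

print(map_coordinate_wrap(1 - 4, 10))     # 6, as computed step by step below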
Example. Let's see how the output for output = shift(np.arange(10), shift=4, mode='wrap') (from the question) is computed.
NI_ZoomShift computes the edge values output[0] and output[9] in some special way, so let's take a look at the computation of output[1] (a bit simplified):
# input = [0,1,2,3,4,5,6,7,8,9]
# output = [ ,?, , , , , , , , ] '?' == computed position
# shift = 4
output_index = 1
in = output_index - shift # -3
sz = 10 - 1 # 9
in += sz * ((-in / sz) + 1)
#  += 9  * (( 3 / 9 ) + 1) == 9 * (0 + 1) == 9   (integer division)
# in == 6
return input[in] # 6
It is clear that sz = len - 1 is responsible for the behaviour you have observed. It was changed from sz = len in a suggestively named commit dating back to 2007: Fix off-by-on errors in ndimage boundary routines. Update tests.
I don't know why such a change was introduced. One possible explanation that comes to my mind is as follows:
Function 'shift' uses splines for interpolation.
A knot vector of a uniform spline on the interval [0, k] is simply [0,1,2,...,k]. When we say that the spline should wrap, it is natural to require equality of the values at knots 0 and k, so that many copies of the spline can be glued together, forming a periodic function:
0--1--2--3-...-k 0--1--2--3-...-k 0--1-- ...
0--1--2--3-...-k 0--1--2--3-...-k ...
Maybe shift just treats its input as a list of values for the spline's knots?
It is worth noting that this behavior appears to be a bug, as noted in this SciPy issue:
https://github.com/scipy/scipy/issues/2640
The issue appears to affect every extrapolation mode in scipy.ndimage other than mode='mirror'.
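A condensed way to see the mismatch for a single shift value (a boiled-down version of the reproduction in the question):
import numpy as np
from scipy.ndimage import shift

a = np.arange(10, dtype=float)
print(np.roll(a, 4))                   # the expected periodic behaviour
print(shift(a, shift=4, mode='wrap'))  # the actual output differs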
This is not a duplicate of this or this, as the answers there were not at all satisfactory for my problem; I don't want to deal with this per label. This is also not a duplicate of this, as it doesn't deal with my specific problem.
I want to set the angular axis labels of polar plots, not one by one, but by a single one-time initialization method. This must be possible, as there appear to be ways to do similar things with other axis types.
I knew how to do this beforehand, but hadn't seen the exact same question here, and good solutions were also not found here. While I'm not sure if this is the best method, it is certainly better than setting the format per label!
So the solution I've found uses FuncFormatter. The definition is short, so I'll just paste it here.
class FuncFormatter(Formatter):
    """
    Use a user-defined function for formatting.

    The function should take in two inputs (a tick value ``x`` and a
    position ``pos``), and return a string containing the corresponding
    tick label.
    """
    def __init__(self, func):
        self.func = func

    def __call__(self, x, pos=None):
        """
        Return the value of the user defined function.

        `x` and `pos` are passed through as-is.
        """
        return self.func(x, pos)
This formatter class allows us to create a function, pass it as an argument, and have the output of that function used as the text of our angular axis labels.
You can then use PolarAxis.xaxis.set_major_formatter(formatter) to apply your newly created formatter, and only the angular axis labels will be changed. The same thing can be done with the yaxis attribute instead, which will cause the inner radial labels to change as well.
Here is what the function we will pass looks like:
def radian_function(x, pos=None):
    # the formatter sends in the tick value x (in radians) and its position pos
    rad_x = x / math.pi
    return "{}π".format(str(rad_x if rad_x % 1 else int(rad_x)))
It uses a standard Python format string for the output, getting rid of unnecessary decimals and appending the π symbol to the end of the string to keep the label in terms of π.
The full program looks like this:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import math
def radian_function(x, pos=None):
    # the formatter sends in the tick value x (in radians) and its position pos
    rad_x = x / math.pi
    return "{}π".format(str(rad_x if rad_x % 1 else int(rad_x)))
ax = plt.subplot(111, projection='polar')
ax.set_rmax(4)
ax.set_rticks([1, 2, 3, 4])
ax.grid(True)
ax.set_title("Polar axis label example", va='bottom')
# sets the formatter for the entire set of angular axis labels
ax.xaxis.set_major_formatter(ticker.FuncFormatter(radian_function))
# sets the formatter for the radius inner labels.
#ax.yaxis.set_major_formatter(ticker.FuncFormatter(radian_function))
plt.show()
which outputs a polar plot whose angular labels are written in terms of π.
You could further improve the formatter to check for one (so that 1π is simply shown as π) or check for 0 in a similar fashion. You can even use the position variable (which I left out since it was unnecessary) to further improve visual formatting.
Such a function might look like this:
def radian_function(x, pos=None):
    # the formatter sends in the tick value x (in radians) and its position pos
    rad_x = x / math.pi
    if rad_x == 0:
        return "0"
    elif rad_x == 1:
        return "π"
    return "{}π".format(str(rad_x if rad_x % 1 else int(rad_x)))
I am currently implementing a 2D plot which is meant to relate the following two values as a visual "landscape":
x-axis: huge discrete binary values with a length of up to 3000 digits (i.e. up to 2^3000)
y-axis: calculated value (no problem)
It seems that matplotlib cannot handle such huge values.
As the plot represents a landscape, the values themselves are not important. What is important is a visual representation of the function itself.
I tried to log-scale the values, which did not solve the problem. This is the current code:
import numpy as np
import matplotlib.pyplot as plt
'''
convert binary list to gray code to maintain hamming distance
'''
def indtogray(self, ind):
    return ind[:1] + [i ^ ishift for i, ishift in zip(ind[:-1], ind[1:])]

'''
Create int from gray value
'''
def graytoint(self, gray):
    i = 0
    for bit in gray:
        i = i*2 + bit
    return i

'''
Create example list of binary lists
'''
def create(self, n, size):
    return [[np.random.randint(2) for _ in range(size)] for _ in range(n)]

def showPlot(self, toolbox, neval):
    individuals = self.create(100, 2000)
    fitnesses = map(np.sum, individuals)
    fig, ax = plt.subplots()
    values = map(self.graytoint, map(self.indtogray, individuals))
    full = zip(*sorted(zip(values, fitnesses)))
    line = ax.plot(full[0], full[1], 'r-')
    plt.show()

if __name__ == '__main__':
    show()
I get the following error:
OverflowError: long int too large to convert to float
Does anyone have an idea?
The error just means that your number is too big to be converted to a float. One thing you can do is take the logarithm of x.
Now, if you have up to 3000 binary digits, the largest decimal number is pow(2, 3000). If you take log(pow(2, 3000)), you should get about 2079.44154, which you should then be able to convert to a float. I would also double check whether the numbers you have are binary digits stored in decimal representation. Meaning, if x[0] = 10, make sure that it is ten and not two written in binary. Otherwise, a number like 2^3000 written out in binary format would be very large.
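For instance, a minimal sketch of that approach (the sample values below are made up; math.log2 works directly on arbitrarily large Python ints, so it avoids the float conversion):
import math
import matplotlib.pyplot as plt

# made-up stand-ins for the huge integers produced by graytoint()
values = [2**100, 3 * 2**2500, 2**3000]
fitnesses = [10, 20, 30]                      # placeholder y-values

# math.log2 accepts big Python ints directly, so no float overflow occurs
log_values = [math.log2(v) for v in values]   # roughly [100.0, 2501.6, 3000.0]

plt.plot(log_values, fitnesses, 'r-')
plt.show()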