Plot negative values on a log scale - python

I am doing some analysis to calculate the value of log_10(x) which is a negative number. I am now trying to plot these values, however, since the range of the answers is very large I would like to use a logarithmic scale for this. If I simply use plt.yscale('log') I get a message telling me UserWarning: Data has no positive values, and therefore cannot be log-scaled. I also cannot supply the values of x to plt.plot as the result of log_10(x) is so large and negative that the answer of x**(log_10(x)) is simply 0.
What might be the most straightforward way of plotting this data?

You can use
plt.yscale('symlog')
to set the scale to a symmetric log scale. This means that it scales logarithmically on both sides of 0, so using only the negative part of the symlog scale works just fine.
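A minimal sketch of the symlog approach, assuming matplotlib 3.3 or newer (where the keyword is `linthresh`; older versions spelled it `linthreshy`). The `log10_x` array is hypothetical stand-in data for the computed log_10(x) values, and `linthresh=1` is an arbitrary choice:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical stand-in for the computed log10(x) values:
# all negative, spanning -1 down to -1e6
log10_x = -np.logspace(0, 6, 200)
n = np.arange(log10_x.size)

fig, ax = plt.subplots()
ax.plot(n, log10_x)
# Symmetric log scale: logarithmic on both sides of 0,
# linear only inside (-linthresh, linthresh)
ax.set_yscale('symlog', linthresh=1)
ax.set_ylabel('log10(x)')
```

Unlike plt.yscale('log'), this does not warn about non-positive data, because every value has a finite position on a symlog axis.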

Two alternatives to ImportanceOfBeingErnest's solution:
Plot -log_10(x) on a semilog y axis and set the y-label to display negative units
Plot -log_10(-log_10(x)) on a linear scale
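Both alternatives can be sketched with hypothetical data as follows. For the first one, a `FuncFormatter` puts a minus sign in front of the major tick labels; inverting the axis is an optional extra (not part of the answer) that keeps more negative values lower on the plot:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

log10_x = -np.logspace(0, 6, 200)   # hypothetical, all negative
n = np.arange(log10_x.size)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Alternative 1: plot -log10(x) (all positive) on a log axis and
# relabel the major ticks as negative numbers
ax1.plot(n, -log10_x)
ax1.set_yscale('log')
ax1.yaxis.set_major_formatter(FuncFormatter(lambda v, pos: f"-{v:g}"))
ax1.invert_yaxis()  # optional: more negative values appear lower
ax1.set_ylabel('log10(x)')

# Alternative 2: plot -log10(-log10(x)) on an ordinary linear axis
ax2.plot(n, -np.log10(-log10_x))
ax2.set_ylabel('-log10(-log10(x))')
```

Only the major tick labels are rewritten here; minor tick labels, if matplotlib decides to show them, keep the default formatter.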
However, in all cases (including the solution proposed by ImportanceOfBeingErnest), the interpretation is not straightforward since you are displaying or calculating the log of a log.
Finally, in order to return the value for x, you need to calculate 10**(log_10(x)) not x**(log_10(x))
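A quick check of that inverse with hypothetical values. It also shows why x itself cannot be plotted here: a float64 cannot represent positive magnitudes below about 4.9e-324, so when log_10(x) is below roughly -323, 10**(log_10(x)) underflows to 0.0:

```python
import numpy as np

# Hypothetical log10(x) values
log10_x = np.array([-2.0, -300.0, -400.0])

x = 10.0 ** log10_x   # the inverse of log10 is 10**y, not x**y
print(x)              # the last entry underflows to 0.0
```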

Related

2D histogram colour by "label fraction" of data in each bin

Following on from the post found here: 2D histogram coloured by standard deviation in each bin
I would like to colour each bin in a 2D grid by the fraction of points whose label values are below a certain threshold in Python.
Note that, in this dataset, each point has a continuous label value between 0 and 1.
For example, here is a histogram I made in which the colour denotes the standard deviation of the label values of all points in each bin:
This was done using scipy.stats.binned_statistic_2d() (see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic_2d.html), with the statistic argument set to 'std'.
But is there a way to change this kind of plot so that the colouring represents the fraction of points in each bin whose label value is below, for example, 0.5?
It could be that the only way to do this is by explicitly defining a grid of some kind and calculating the fractions, but I'm not sure of the best way to do that, so any help on this matter would be greatly appreciated!
Perhaps being able to get the raw data values in each bin as a multidimensional array from scipy.stats.binned_statistic_2d or numpy.histogram2d would make it quick to compute the fractions explicitly.
The fraction of elements in an array below a threshold can be calculated as
fraction = lambda a, threshold: len(a[a<threshold])/len(a)
Hence you can call
scipy.stats.binned_statistic_2d(x, y, values, statistic=lambda a: fraction(a, 0.5))
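A self-contained sketch of that approach with hypothetical random data. It uses `np.mean(a < threshold)`, which computes the same fraction, and displays the result with `pcolormesh`; both are my choices rather than part of the answer. Bins that contain no points come back as NaN:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic_2d

# Hypothetical data: 2D positions with a continuous label in [0, 1]
rng = np.random.default_rng(0)
x, y = rng.random((2, 1000))
labels = rng.random(1000)

def fraction_below(a, threshold=0.5):
    """Fraction of the label values in one bin that are below threshold."""
    return np.mean(np.asarray(a) < threshold)

stat, xedges, yedges, _ = binned_statistic_2d(
    x, y, labels, statistic=fraction_below, bins=10)

fig, ax = plt.subplots()
# stat is indexed [x_bin, y_bin]; pcolormesh expects [row=y, col=x], hence .T
mesh = ax.pcolormesh(xedges, yedges, stat.T, vmin=0, vmax=1)
fig.colorbar(mesh, ax=ax, label='fraction of labels < 0.5')
```

Fixing `vmin=0, vmax=1` keeps the colour scale comparable between data sets, since a fraction is always in that range.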

matplotlib: get axis ratio of plot

I need to produce scatter plots for several 2D data sets automatically.
By default I set the aspect ratio with ax.set_aspect(aspect='equal'), which works most of the time because the x, y values are distributed more or less in a square region.
Sometimes, though, I encounter a data set that looks like this when plotted with the equal ratio:
i.e. too narrow along one axis. For the above image, the axes are approximately 1:8.
In such a case, an aspect ratio of ax.set_aspect(aspect='auto') would result in a much better plot:
Now, I don't want to set aspect='auto' as my default for all data sets because using aspect='equal' is actually the correct way of displaying such a scatter plot.
I need to fall back to using ax.set_aspect(aspect='auto') only for cases such as the one above.
The question: is there a way to know beforehand if the aspect ratio of a plot will be too narrow if aspect='equal' is used, for example by getting the actual aspect ratio of the plotted data set?
This way, based on such a number, I can adjust the aspect ratio to something more sane looking (i.e.: auto or some other aspect ratio) instead of 'equal'.
Something like this ought to do:
aspect = (max(x) - min(x)) / (max(y) - min(y))
The axes method get_data_ratio gives the aspect ratio of the bounds of your data as displayed.¹
ax.get_data_ratio()
for example:
M = 4.0
ax.set_aspect('equal' if 1/M < ax.get_data_ratio() < M else 'auto')
¹This is the reciprocal of farenorth's answer when the axes are zoomed right around the data, i.e. when max(y) == max(ax.get_ylim()), since it is calculated using the ranges in ax.get_ybound and ax.get_xbound.
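A minimal sketch wrapping that check into a helper. The function name `scatter_with_sane_aspect` and the test data are hypothetical; `max_ratio=4.0` mirrors `M` above. The data has to be plotted first, because `get_data_ratio` reads the current axis bounds:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def scatter_with_sane_aspect(ax, x, y, max_ratio=4.0):
    """Scatter x, y and use 'equal' aspect unless the data is too elongated.

    If the displayed (y range)/(x range) ratio falls outside
    (1/max_ratio, max_ratio), fall back to 'auto'.
    """
    ax.scatter(x, y)
    ratio = ax.get_data_ratio()   # (y range) / (x range) of the current bounds
    ax.set_aspect('equal' if 1 / max_ratio < ratio < max_ratio else 'auto')
    return ratio

rng = np.random.default_rng(0)
fig, (ax_sq, ax_narrow) = plt.subplots(1, 2)
r_sq = scatter_with_sane_aspect(ax_sq, rng.random(100), rng.random(100))
r_narrow = scatter_with_sane_aspect(ax_narrow, 8 * rng.random(100), rng.random(100))
```

With roughly 1:8 data, like the example in the question, the ratio is about 0.125, so the second axes falls back to 'auto'.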

How to set pivots on colorbar at customisable location in Matplotlib?

I have some trouble using the Matplotlib colorbar; perhaps I am not understanding the documentation (I am not a native English speaker) or its core concept.
Suppose I have a data matrix of shape (N, 2). I want to make a scatter plot of this data and colour the points by a float label column of shape (N, 1). I know how to use colorbar and ScalarMappable.
However, I am interested in certain pivot values in this label column, and I want to place them at specific positions on the colorbar. For example, I want the label value 0 to sit at 1/3 of the colorbar height, or in the middle, where the colormap I chose is white or grey.
If I understand correctly, the colorbar only takes data mapped to [0, 1] from the original range [min, max]. In that case, the pivot value I am interested in ends up at an arbitrary position unless I define my normalisation function very carefully.
So to put my preferred white colour at the pivot value in the middle of the colorbar, I have to define a normalisation function that not only normalises my data but also maps the pivot value to 0.5.
With my limited Matplotlib experience, this is the only solution I know.
Ideally, given a column of float data, I could pick some pivot values and assign them special positions, then normalise the data and pass it to the colormap. On the colorbar, I could then set special colours at those predefined positions and get a corresponding colorbar with the right tick locator and tick labels indicating my special pivot values.
I am looking for an easier way (from the standard library) to achieve this.
It would be very helpful if you could post the plot you wish to make. But based on my understanding, you just want to do something on the colorbar at one or more particular spots. That is easy; the following example writes a text string at the value 0.5.
import numpy as np
import matplotlib.pyplot as plt
x1 = np.random.random(1000)
x2 = np.random.random(1000)
x3 = np.random.random(1000)
plt.scatter(x1, x2, c=x3, marker='+')
cb = plt.colorbar()
# Map a data value to its fractional position (0 to 1) along the colorbar
color_norm = lambda x: (x - cb.norm.vmin) / (cb.norm.vmax - cb.norm.vmin)
# transform=cb.ax.transAxes interprets the coordinates as fractions of the colorbar axes
cb.ax.text(0.5, color_norm(0.5), 'Do something.\nRight here', color='r', transform=cb.ax.transAxes)
If you want the value 0.5 at exactly 1/3 of the colorbar height, you need to adjust the colour limits, e.g. with cb.mappable.set_clim((cmin, cmax)). Infinitely many (cmin, cmax) pairs fit this requirement, so an additional constraint is necessary, such as keeping the min constant, keeping the max constant, or keeping max - min constant.
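A sketch of the "keep the min constant" option, with hypothetical random data and a hypothetical helper `clim_for_pivot`. Because the default normalisation is linear, the condition (pivot - cmin) / (cmax - cmin) = f gives cmax = cmin + (pivot - cmin) / f:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def clim_for_pivot(pivot, fraction, cmin):
    """Return (cmin, cmax) so that `pivot` sits at `fraction` of the colorbar,
    keeping cmin fixed. Requires pivot > cmin and 0 < fraction <= 1."""
    if not (pivot > cmin and 0 < fraction <= 1):
        raise ValueError("need pivot > cmin and 0 < fraction <= 1")
    # Linear norm: (pivot - cmin) / (cmax - cmin) == fraction
    return cmin, cmin + (pivot - cmin) / fraction

rng = np.random.default_rng(0)
x1, x2, x3 = rng.random((3, 1000))
sc = plt.scatter(x1, x2, c=x3, marker='+')
cmin, cmax = clim_for_pivot(pivot=0.5, fraction=1/3, cmin=x3.min())
sc.set_clim((cmin, cmax))
cb = plt.colorbar(sc)
```

Keeping the max constant instead gives cmin = (pivot - f * cmax) / (1 - f). For the common case of a pivot exactly in the middle, matplotlib 3.2 and newer also provides matplotlib.colors.TwoSlopeNorm(vcenter, vmin, vmax), which maps vcenter to 0.5 even when it is not halfway between vmin and vmax.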

What is the difference between 'log' and 'symlog'?

In matplotlib, I can set the axis scaling using either pyplot.xscale() or Axes.set_xscale(). Both functions accept three different scales: 'linear' | 'log' | 'symlog'.
What is the difference between 'log' and 'symlog'? In a simple test I did, they both looked exactly the same.
I know the documentation says they accept different parameters, but I still don't understand the difference between them. Can someone please explain it? The answer will be the best if it has some sample code and graphics! (also: where does the name 'symlog' come from?)
I finally found some time to do some experiments in order to understand the difference between them. Here's what I discovered:
log only allows positive values, and lets you choose how to handle negative ones (mask or clip).
symlog means symmetrical log, and allows positive and negative values.
symlog lets you set a range around zero within which the plot is linear instead of logarithmic.
I think everything will get a lot easier to understand with graphics and examples, so let's try them:
import numpy
from matplotlib import pyplot
# Enable interactive mode
pyplot.ion()
# Draw the grid lines
pyplot.grid(True)
# Numbers from -50 to 50, with 0.1 as step
xdomain = numpy.arange(-50,50, 0.1)
# Plots a simple linear function 'f(x) = x'
pyplot.plot(xdomain, xdomain)
# Plots 'sin(x)'
pyplot.plot(xdomain, numpy.sin(xdomain))
# 'linear' is the default mode, so this next line is redundant:
pyplot.xscale('linear')
# How to treat non-positive values?
# 'mask' will treat them as invalid (they are not drawn)
pyplot.xscale('log', nonpositive='mask')
# 'clip' will map them to a very small positive value; this is the default
# in current matplotlib, so the next two lines are equivalent
pyplot.xscale('log')
pyplot.xscale('log', nonpositive='clip')
# 'symlog' scaling, however, handles negative values nicely
pyplot.xscale('symlog')
# And you can even set a linear range around zero
pyplot.xscale('symlog', linthresh=20)
# (matplotlib < 3.3 spelled these keywords nonposx= and linthreshx=,
# and older releases defaulted to 'mask' instead of 'clip')
Just for completeness, I've used the following code to save each figure:
# Default figure dpi is 100 (it was 80 before matplotlib 2.0)
pyplot.savefig('matplotlib_xscale_linear.png', dpi=50, bbox_inches='tight')
Remember you can change the figure size using:
fig = pyplot.gcf()
fig.set_size_inches([4., 3.])
# Default size: [6.4, 4.8] in matplotlib >= 2.0 ([8., 6.] before)
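The differences listed at the start of this answer can also be checked numerically, without looking at images. This sketch assumes matplotlib 3.3 or newer (for the `nonpositive` and `linthresh` keywords) and reads each scale's transform through `ax.xaxis.get_transform()`; the sample values are arbitrary:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; no window is opened
import matplotlib.pyplot as plt

values = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
fig, ax = plt.subplots()

def scaled(scale, **kwargs):
    """Set the x scale and map `values` into its scaled coordinate space."""
    ax.set_xscale(scale, **kwargs)
    return ax.xaxis.get_transform().transform(values)

log_mask = scaled('log', nonpositive='mask')  # negatives -> nan, 0 -> -inf
log_clip = scaled('log', nonpositive='clip')  # non-positive -> a large finite negative number
sym = scaled('symlog', linthresh=1)           # finite everywhere, symmetric about 0

for name, out in [('log/mask', log_mask), ('log/clip', log_clip), ('symlog', sym)]:
    print(name, out)
```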
symlog is like log but allows you to define a range of values near zero within which the plot is linear, to avoid having the plot go to infinity around zero.
From http://matplotlib.sourceforge.net/api/axes_api.html#matplotlib.axes.Axes.set_xscale
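A small sketch of that linear region, assuming matplotlib 3.3 or newer (`linthresh`; older versions used `linthreshx`/`linthreshy`) and an arbitrary `linthresh=20`. Inside ±20 the scaled position is proportional to the value; outside it, each factor of 10 adds the same distance:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_yscale('symlog', linthresh=20)
scale_tf = ax.yaxis.get_transform()   # maps data values to scaled coordinates

inside = scale_tf.transform(np.array([5.0, 10.0, -10.0]))          # |v| <= linthresh
outside = scale_tf.transform(np.array([200.0, 2000.0, 20000.0]))   # |v| > linthresh

print(inside[1] / inside[0])   # linear region: doubling the value doubles the position
print(np.diff(outside))        # log region: equal steps per decade
```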
In a log graph, you can never have a zero value, and if you have a value that approaches zero, it will spike down way off the bottom of your graph (infinitely downward), because log(x) approaches negative infinity as x approaches zero.
symlog helps in situations where you want a log graph but the values may sometimes go down towards, or to, zero, and you still want to show them on the graph in a meaningful way. If you need symlog, you'd know.
Here's an example of behaviour when symlog is necessary:
Initial plot, not scaled. Notice how many dots cluster at x~0
ax = sns.scatterplot(x= 'Score', y ='Total Amount Deposited', data = df, hue = 'Predicted Category')
Log scaled plot. Everything collapsed.
ax = sns.scatterplot(x= 'Score', y ='Total Amount Deposited', data = df, hue = 'Predicted Category')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set(xlabel='Score, log', ylabel='Total Amount Deposited, log')
Why did it collapse? Because some values on the x-axis are very close or equal to 0.
Symlog scaled plot. Everything is as it should be.
ax = sns.scatterplot(x= 'Score', y ='Total Amount Deposited', data = df, hue = 'Predicted Category')
ax.set_xscale('symlog')
ax.set_yscale('symlog')
ax.set(xlabel='Score, symlog', ylabel='Total Amount Deposited, symlog')
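Since `df` is not shown, here is a self-contained sketch of the same comparison. It uses hypothetical data and plain matplotlib `scatter` instead of seaborn. Many scores are exactly 0: on the log axis the automatic limits start above 0, so those points fall outside the plotted range, while the symlog axis keeps 0 inside its limits:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical data: many scores exactly 0, the rest spread over several decades
rng = np.random.default_rng(1)
score = np.concatenate([np.zeros(200), rng.lognormal(mean=2, sigma=2, size=300)])
deposited = rng.lognormal(mean=5, sigma=2, size=score.size)

fig, (ax_log, ax_sym) = plt.subplots(1, 2, figsize=(8, 3))
for ax, scale in [(ax_log, 'log'), (ax_sym, 'symlog')]:
    ax.scatter(score, deposited, s=5)
    ax.set_xscale(scale)
    ax.set_yscale(scale)
    ax.set(xlabel=f'Score, {scale}', ylabel=f'Total Amount Deposited, {scale}')

print(ax_log.get_xlim(), ax_sym.get_xlim())
```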
