Why error bars in log-scale matplotlib bar plot are lopsided?


I'm trying to plot some bar plots where each y-value is averaged over some series, so I'm also adding error bars (standard deviations) for each bar.
The magnitudes generally seem right, even on a log scale, but for several of the bars the error bar drops down (the - direction) almost indefinitely, while the + direction error has the right magnitude. I don't think it's just the log scaling, but any input is greatly appreciated. (The plot I linked is not reproduced here.)
I've checked, and the + direction error bars are correct; I'm just not sure why or how they occasionally drop down to the x-axis. Below is a simplified example.
import numpy as np
import matplotlib.pyplot as plt

y = [99.79999999999997, 0.11701249999999999, 0.00011250000000000004, 0.013393750000000001, 0.007743750000000001,
     0.01, 0.033906250000000006, 0.0009687500000000002, 0.04187500000000001, 0.0218, 0.0018062499999999997, 0.0005187500000000001]
std = [0.013662601021279521, 0.1500170651403811, 3.4156502553198664e-05, 0.001310709095617076, 0.0006239324215543433,
       0.0, 0.0021671698133741164, 0.0018750000000000001, 0.005302515126491074, 0.007984401459512583, 0.0006297817082132506, 4.0311288741492725e-05]
plt.figure()  # Powder plot
plt.bar(np.arange(len(y)), y, yerr=std)
plt.yscale('log')
plt.show()
In my full code, 'key_list' is just a list of strings that become the x-tick labels, 'width' is the bar offset so the bars fit in pairs, and 'cm' and 'kk' are just dictionaries of lists; none of them are needed for the simplified example above. This honestly seems like a rendering issue, but I'm mostly curious whether any of you have encountered this.

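As mentioned in the comments, this happens because some of your std values are larger than the corresponding y values (for example std[1] > y[1]). The lower end of those error bars, y - std, is then zero or negative, which a log scale cannot show, so the bar runs off the bottom of the axes. You can fix this by capping the lower error just below y, leaving a small tolerance: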
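tor = 1e-9  # small tolerance so the lower end of every bar stays strictly positive
# Where y <= std, shorten the lower error to y - tor; elsewhere keep std
lower_std = [a - tor if a <= b else b for a, b in zip(y, std)]
plt.figure()
# yerr=(lower, upper) draws asymmetric error bars
plt.bar(np.arange(len(y)), y, yerr=(lower_std, std))
plt.yscale('log')
plt.show()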
Output: (plot not reproduced here)
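A vectorized sketch of the same fix with NumPy, assuming the y and std lists from the question. It also reports which bars have std >= y (their lower end y - std would be zero or negative). The helper name asymmetric_yerr and the floor value are our own choices, not part of matplotlib.

```python
import numpy as np

def asymmetric_yerr(y, std, floor=1e-9):
    """Hypothetical helper: return ((2, N) yerr array, indices of affected bars).

    Where std >= y, the lower error is shrunk so the bar's lower end sits at
    `floor` instead of at a value <= 0 that a log axis cannot display.
    """
    y = np.asarray(y, dtype=float)
    std = np.asarray(std, dtype=float)
    too_big = std >= y                         # lower end y - std would be <= 0
    lower = np.where(too_big, y - floor, std)  # cap only the affected bars
    upper = std                                # upper error is left unchanged
    return np.vstack([lower, upper]), np.flatnonzero(too_big)

y = [99.79999999999997, 0.11701249999999999, 0.00011250000000000004, 0.013393750000000001, 0.007743750000000001,
     0.01, 0.033906250000000006, 0.0009687500000000002, 0.04187500000000001, 0.0218, 0.0018062499999999997, 0.0005187500000000001]
std = [0.013662601021279521, 0.1500170651403811, 3.4156502553198664e-05, 0.001310709095617076, 0.0006239324215543433,
       0.0, 0.0021671698133741164, 0.0018750000000000001, 0.005302515126491074, 0.007984401459512583, 0.0006297817082132506, 4.0311288741492725e-05]

yerr, affected = asymmetric_yerr(y, std)
print(affected)  # [1 7]
```

The result can be passed straight to plt.bar(np.arange(len(y)), y, yerr=yerr), since yerr accepts a (2, N) array of lower and upper errors.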

You should look at the relative error rather than trying to plot the standard deviation, or any other measure of variability.
To illustrate this with an example:
In your linear space, you will have x +/- delta_x to display.
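Projected into logarithmic space, the two ends of the bar become log(x + delta_x) and log(x - delta_x). But remember that log(a) - log(b) = log(a/b), so the upper half of the bar has length log(1 + delta_x/x) and the lower half has length -log(1 - delta_x/x).
Both depend only on the relative error delta_x/x, and they are not equal: the lower half is always the longer one, and it becomes infinite as delta_x approaches x. Hence your asymmetric error bars. If you read up on relative error, you will find an appropriate symmetric error bar.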
Enjoy your learning :)
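A small worked example of those two lengths, in decades on a base-10 log axis, using two bars from the question: bar 2 (y ≈ 0.0001125, std ≈ 3.4e-05, relative error about 0.30) and bar 1 (y ≈ 0.117, std ≈ 0.150, relative error about 1.28). The helper name log_half_lengths is ours.

```python
import math

def log_half_lengths(x, delta_x):
    """Upward and downward error-bar lengths (in decades) on a base-10 log axis.

    Both depend only on the relative error r = delta_x / x. The downward half
    is returned as math.inf when r >= 1, because then x - delta_x <= 0.
    """
    r = delta_x / x
    up = math.log10(1 + r)
    down = -math.log10(1 - r) if r < 1 else math.inf
    return up, down

print(log_half_lengths(0.0001125, 3.4156502553198664e-05))  # r ~ 0.30: lower half longer
print(log_half_lengths(0.1170125, 0.1500170651403811))      # r ~ 1.28: no finite lower end
```

Since (1 + r)(1 - r) = 1 - r² < 1, the downward half is longer for every 0 < r < 1, which is exactly the lopsidedness seen in the plot.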

Related

Seaborn distplot for data with high SD [duplicate]

In matplotlib, I can set the axis scaling using either pyplot.xscale() or Axes.set_xscale(). Both functions accept three different scales: 'linear' | 'log' | 'symlog'.
What is the difference between 'log' and 'symlog'? In a simple test I did, they both looked exactly the same.
I know the documentation says they accept different parameters, but I still don't understand the difference between them. Can someone please explain it? The answer will be the best if it has some sample code and graphics! (also: where does the name 'symlog' come from?)

I finally found some time to do some experiments in order to understand the difference between them. Here's what I discovered:
log only allows positive values, and lets you choose how to handle non-positive ones (mask or clip).
symlog means symmetrical log, and allows both positive and negative values.
symlog lets you set a range around zero within which the plot is linear instead of logarithmic.
I think everything will get a lot easier to understand with graphics and examples, so let's try them:
import numpy
from matplotlib import pyplot
# Enable interactive mode
pyplot.ion()
# Draw the grid lines
pyplot.grid(True)
# Numbers from -50 to 50, with 0.1 as step
xdomain = numpy.arange(-50,50, 0.1)
# Plots a simple linear function 'f(x) = x'
pyplot.plot(xdomain, xdomain)
# Plots 'sin(x)'
pyplot.plot(xdomain, numpy.sin(xdomain))
# 'linear' is the default mode, so this next line is redundant:
pyplot.xscale('linear')
# How to treat non-positive values?
# Plain log scale:
pyplot.xscale('log')
# 'mask' treats non-positive values as invalid
pyplot.xscale('log', nonpositive='mask')  # matplotlib < 3.3 spelled this nonposx='mask'
# 'clip' maps all non-positive values to a very small positive one
# (recent matplotlib versions use 'clip' by default; older ones used 'mask')
pyplot.xscale('log', nonpositive='clip')  # matplotlib < 3.3: nonposx='clip'
# 'symlog' scaling, however, handles negative values nicely
pyplot.xscale('symlog')
# And you can even set a linear range around zero (here from -20 to 20)
pyplot.xscale('symlog', linthresh=20)  # matplotlib < 3.3 spelled this linthreshx=20
Just for completeness, I've used the following code to save each figure:
# The default dpi depends on the matplotlib version (see rcParams['savefig.dpi'])
pyplot.savefig('matplotlib_xscale_linear.png', dpi=50, bbox_inches='tight')
Remember you can change the figure size using:
fig = pyplot.gcf()
fig.set_size_inches([4., 3.])
# Default size: [8., 6.] in old matplotlib, [6.4, 4.8] since 2.0 (see rcParams['figure.figsize'])
(If you are unsure about me answering my own question, read this)

symlog is like log but allows you to define a range of values near zero within which the plot is linear, to avoid having the plot go to infinity around zero.
From http://matplotlib.sourceforge.net/api/axes_api.html#matplotlib.axes.Axes.set_xscale
In a log graph, you can never have a zero value, and if you have a value that approaches zero, it will spike down way off the bottom of your graph (infinitely downward), because when you take "log(approaching zero)" you get "approaching negative infinity".
symlog would help you out in situations where you want to have a log graph, but when the value may sometimes go down towards, or to, zero, but you still want to be able to show that on the graph in a meaningful way. If you need symlog, you'd know.
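To make the idea concrete, here is a simplified symmetric-log mapping in NumPy: linear for |x| <= linthresh, logarithmic beyond it, continuous at ±linthresh, and defined at 0 and for negative values. This is our own illustrative sketch; matplotlib's actual symlog transform additionally rescales the linear region (its linscale parameter), so its numbers differ.

```python
import numpy as np

def symlog_sketch(x, linthresh=1.0, base=10.0):
    """Simplified symmetric log (illustrative only, not matplotlib's exact transform)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    ax = np.abs(x)
    out = x / linthresh  # linear region: [-linthresh, linthresh] -> [-1, 1]
    outside = ax > linthresh
    # Beyond the threshold, add log_base(|x| / linthresh) so the curve stays continuous
    out[outside] = np.sign(x[outside]) * (
        1.0 + np.log(ax[outside] / linthresh) / np.log(base)
    )
    return out

print(symlog_sketch([-100, -10, -1, 0, 0.5, 1, 10, 100]))
# approximately -3, -2, -1, 0, 0.5, 1, 2, 3
```

Unlike a plain log, nothing here goes to minus infinity near zero: values inside the threshold are simply scaled linearly.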

Here's an example of behaviour when symlog is necessary:
Initial plot, not scaled. Notice how many dots cluster at x~0
import seaborn as sns  # df is the asker's DataFrame (not shown)
ax = sns.scatterplot(x='Score', y='Total Amount Deposited', data=df, hue='Predicted Category')
Log scaled plot. Everything collapsed.
ax = sns.scatterplot(x= 'Score', y ='Total Amount Deposited', data = df, hue = 'Predicted Category')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set(xlabel='Score, log', ylabel='Total Amount Deposited, log')
Why did it collapse? Because some of the x values are equal or very close to 0.
Symlog scaled plot. Everything is as it should be.
ax = sns.scatterplot(x= 'Score', y ='Total Amount Deposited', data = df, hue = 'Predicted Category')
ax.set_xscale('symlog')
ax.set_yscale('symlog')
ax.set(xlabel='Score, symlog', ylabel='Total Amount Deposited, symlog')
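Before choosing between 'log' and 'symlog', it can help to count how many values a log axis cannot place at all (zero or negative). A minimal sketch with pandas; the helper name and the small DataFrame are made up, only the column names come from the example above.

```python
import pandas as pd

def count_nonpositive(df, columns):
    """Hypothetical helper: number of values <= 0 per column (a log axis cannot place these)."""
    return {col: int((df[col] <= 0).sum()) for col in columns}

# Made-up data standing in for the real DataFrame
df = pd.DataFrame({
    "Score": [0.0, 0.0, 0.001, 0.4, 0.9],
    "Total Amount Deposited": [0.0, 120.0, 35.5, 800.0, 15000.0],
})
counts = count_nonpositive(df, ["Score", "Total Amount Deposited"])
print(counts)  # {'Score': 2, 'Total Amount Deposited': 1}
```

A non-zero count is a strong hint that 'symlog' is the right choice. Values that are positive but tiny (like 0.001 here) can still be drawn on a log axis, but they stretch it over many decades.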

Understand log scale and actually taking np.log of a data

I am currently working up some experimental data and am having a hard time understanding whether I should use a log-scaled axis or actually apply np.log to the data.
Here is the plot I have made.
Blue represents using plt.yscale('log'), whereas orange is a new column created by applying np.log to the data.
My question
Why are their magnitudes so different? Which is correct? And if plt.yscale('log') is the better way to do it, is there a way to get those values? I need them for a curve fit afterwards.
Thanks in advance for anyone that can provide some answers!
edit(1)
I understand that plt.yscale('log') is in base 10 and np.log refers to the natural log. I have tried using np.log10 on the data instead and it gives a smaller value that does not correspond to using a log scale.

Your data is being log-transformed, but it "points" in the wrong direction.
Consider this toy data:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 1, 100)[:-1]  # 0 <= x < 1
y = np.log(1 - x) + 5
Then we plot
plt.plot(x, y)
If I log-scale the x-axis, the curve just becomes more exaggerated:
plt.plot(x, y)
plt.xscale('log')
You need to flip your data so it points the other way, like normal log data:
plt.plot(-x, y)
But you also have to make sure the x values are positive, because the log of zero or a negative number is undefined ... you know ... logs and stuff ¯\_(ツ)_/¯
plt.plot(-x + 1, y)
plt.xscale('log')
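To address the original question more directly: plt.yscale('log') (or ax.set_yscale('log')) only changes how the axis is drawn; the plotted values stay the same and the tick labels show the original numbers. A point's height on a base-10 log axis corresponds to np.log10(y), so that is the quantity to fit if you want a straight-line fit in log space (np.log differs from np.log10 by the constant factor ln(10) ≈ 2.303). A minimal sketch with made-up exponential data; the model and numbers are assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up data following y = 3 * 10**(0.5 * x)
x = np.linspace(0, 4, 20)
y = 3.0 * 10 ** (0.5 * x)

fig, ax = plt.subplots()
(line,) = ax.plot(x, y)
ax.set_yscale('log')  # changes the axis only, not the stored data

plotted = line.get_ydata()  # still the original y values, not log(y)

# A straight line on a base-10 log axis is a straight line in (x, log10(y))
slope, intercept = np.polyfit(x, np.log10(y), 1)
print(slope, 10 ** intercept)  # approximately 0.5 and 3.0
```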

matplotlib: get axis ratio of plot

I need to produce scatter plots for several 2D data sets automatically.
By default I set the aspect ratio with ax.set_aspect(aspect='equal'), which works most of the time because the x,y values are distributed more or less in a square region.
Sometimes though, I encounter a data set that, when plotted with the equal ratio, looks like this:
i.e. too narrow along one axis. For the above image, the axes are approximately 1:8.
In such a case, an aspect ratio of ax.set_aspect(aspect='auto') would result in a much better plot:
Now, I don't want to set aspect='auto' as my default for all data sets because using aspect='equal' is actually the correct way of displaying such a scatter plot.
I need to fall back to using ax.set_aspect(aspect='auto') only for cases such as the one above.
The question: is there a way to know beforehand whether the plot will be too narrow if aspect='equal' is used, for example by getting the actual aspect ratio of the plotted data set?
This way, based on that number, I can switch to something more sane-looking (i.e. 'auto' or some other aspect ratio) instead of 'equal'.

Something like this ought to do,
aspect = (max(x) - min(x)) / (max(y) - min(y))
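Building on that formula, a small sketch that picks the aspect setting from the raw data before plotting. The helper name choose_aspect is hypothetical, and the cutoff M = 4 is an arbitrary choice. Note that this ratio is width over height (x-range / y-range).

```python
import numpy as np

def choose_aspect(x, y, M=4.0):
    """Hypothetical helper: 'equal' if x-range / y-range lies in (1/M, M), else 'auto'."""
    x_range = np.max(x) - np.min(x)
    y_range = np.max(y) - np.min(y)
    if x_range == 0 or y_range == 0:
        return 'auto'  # ratio undefined for degenerate data, so don't force 'equal'
    aspect = x_range / y_range
    return 'equal' if 1 / M < aspect < M else 'auto'

# Made-up data: one roughly square cloud, one about 8:1 wide
print(choose_aspect([0, 1, 2, 3], [0, 2, 1, 3]))      # equal
print(choose_aspect([0, 8, 4, 2], [0, 1, 0.5, 0.2]))  # auto
```

The result can be passed straight to ax.set_aspect(choose_aspect(x, y)).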

The axes method get_data_ratio gives the aspect ratio of the bounds of your data as displayed.¹
ax.get_data_ratio()
for example:
M = 4.0
ax.set_aspect('equal' if 1/M < ax.get_data_ratio() < M else 'auto')
¹This is the reciprocal of farenorth's answer when the axes are zoomed right around the data, i.e., when max(y) == max(ax.get_ylim()), because get_data_ratio divides the y-range by the x-range, using the ranges from ax.get_ybound() and ax.get_xbound().
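A minimal runnable version of that snippet, with made-up data spanning 8 units in x and 1 unit in y. With the default margins, ax.get_data_ratio() (height over width of the current limits) comes out near 1/8, which is below 1/M, so the code falls back to 'auto'. The Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this example
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([0, 2, 4, 8], [0, 1, 0.5, 0.2])  # made-up, very wide data

M = 4.0
ratio = ax.get_data_ratio()  # y-span / x-span of the current data limits
ax.set_aspect('equal' if 1 / M < ratio < M else 'auto')
print(ratio, ax.get_aspect())
```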

Setting the color bounds for a matplotlib streamplot

When I create a grid of streamplots (using subplot) they all have their own color bounds. There doesn't appear to be an option for manually setting the color bounds and hence I can't figure out how to make multiple streamplots share the same color bounds.
For example, my plot of the wind in the upper atmosphere uses the following code to create the streamplots:
magnitude = (u ** 2 + v ** 2) ** 0.5
ax.streamplot(x, y, u, v, color=magnitude)
The wind speed/magnitude during winter (JJA) is much stronger than during summer (DJF), however you don't get that impression from the plot because each subplot has its own individual color bounds.
Does anyone know of a solution to this problem?

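You could use set_clim, which is similar to the caxis function in MATLAB and normalizes the colormap to the specified range. Note that set_clim is a method of the colormapped artist, not of the Axes: streamplot returns an object whose lines attribute is the colored LineCollection.
strm = ax.streamplot(x, y, u, v, color=magnitude)
strm.lines.set_clim(vmin=minvalue, vmax=maxvalue)
Alternatively, pass the same norm=matplotlib.colors.Normalize(vmin=minvalue, vmax=maxvalue) to every streamplot call, so the arrowheads are colored with the same range as the lines.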
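A sketch of sharing one color scale across several streamplots, with made-up wind fields standing in for DJF and JJA. It passes a single matplotlib.colors.Normalize instance through streamplot's norm parameter, spanning the global minimum and maximum speed, so every panel uses the same mapping and one colorbar is valid for all of them. The field shapes and values are illustrative only.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this example
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# Made-up rotating wind fields on an evenly spaced grid; "JJA" is 3x stronger
y, x = np.mgrid[-2:2:40j, -3:3:60j]
fields = {"DJF": (-y, x), "JJA": (-3 * y, 3 * x)}
speeds = {name: np.hypot(u, v) for name, (u, v) in fields.items()}

# One Normalize shared by every panel, spanning the global min/max speed
norm = Normalize(vmin=min(s.min() for s in speeds.values()),
                 vmax=max(s.max() for s in speeds.values()))

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
streams = []
for ax, (name, (u, v)) in zip(axes, fields.items()):
    strm = ax.streamplot(x, y, u, v, color=speeds[name], cmap="viridis", norm=norm)
    ax.set_title(name)
    streams.append(strm)

# A single colorbar is enough because all panels share the same norm
fig.colorbar(streams[0].lines, ax=axes)
```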

Boxplot on distance Data - set Box manually to values

I have a bunch of 2D points and angles. To visualise the amount of movement, I wanted to use a boxplot and plot the difference from the mean of the points.
I successfully visualised the angle jitter using Python and matplotlib in the following boxplot:
Now I want to do the same for my position data. After computing the Euclidean distance, all the data is positive, so a naive boxplot gives misleading results. For an example, see the boxplot at the bottom: points that are exactly on the mean have a distance of zero and are now outliers.
So my question is:
How can I manually set the bottom end of the box and the whiskers to zero?
If I should take another approach, like a bar chart, please tell me (I would like to keep the same style, though).
Edit:
At the moment it looks similar to the following plot (this is a plot of the distance of the angles from their mean).
As you can see, the boxplot doesn't cover zero. That is correct for the data, but not for the meaning behind it! Zero is perfect (it represents a point that was exactly in the middle of the angles), but it is not included in the boxplot.

I found out it has already been asked in this question on SO. While not an exact duplicate, the other question contains the answer!
In matplotlib 1.4 there will probably be a faster way to do it, but for now the answer in the other thread seems to be the best way to go.
Edit:
Well, it turned out that I couldn't use their approach, since I use plt.boxplot(data, patch_artist=True) to get all the other fancy stuff.
So i had to resort to the following ugly final solution:
import numpy as np
# data, ax and color (a list of per-box colors) come from the rest of my script
N = 12  # number of my plots
upperBoxPoints = []
for d in data:
    upperBoxPoints.append(np.percentile(d, 75))  # top of each box (3rd quartile)
w = 0.5  # I had to tune the width by hand
# compute the correct placement from the number of boxes and the width
ind = [x + 0.5 + (w / 2) for x in range(N)]
for i in range(N):
    rect = ax.bar(ind[i], upperBoxPoints[i], w, align='edge', color=color[i],
                  edgecolor='gray', linewidth=2, zorder=10)
    # ind[i]: left edge of the bar (align='edge'; matplotlib >= 2.0 centers bars by default)
    # upperBoxPoints[i]: height of the box, from 0 up to the 75th percentile
    # w: width
    # color=color[i]: as you can see I have a complex color scheme, e.g. hex strings like '#AAAAAA'
    # edgecolor='gray': just like the other boxes
    # linewidth=2: ditto
    # zorder: IMPORTANT, use at least 2 to draw it over the other stuff (but not so high that it covers your horizontal grid lines)
And the final result: (image not reproduced here)
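In recent matplotlib versions, another way to get this without drawing extra bars is to compute the box statistics with matplotlib.cbook.boxplot_stats, edit them, and draw them with Axes.bxp, which also accepts patch_artist=True. A sketch under the assumption that the distances are non-negative and that both the box and the lower whisker should start at zero; the data are made up.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this example
import matplotlib.pyplot as plt
from matplotlib import cbook

# Made-up distances to the mean (always >= 0), three data sets
rng = np.random.default_rng(0)
data = [np.abs(rng.normal(0.0, s, 200)) for s in (1.0, 2.0, 3.0)]

stats = cbook.boxplot_stats(data)
for st in stats:
    st['q1'] = 0.0      # bottom of the box at zero (a "perfect" distance)
    st['whislo'] = 0.0  # lower whisker also at zero
    # nothing lies below zero now, so keep only the outliers above the upper whisker
    st['fliers'] = st['fliers'][st['fliers'] > st['whishi']]

fig, ax = plt.subplots()
artists = ax.bxp(stats, patch_artist=True)
for box in artists['boxes']:
    box.set_facecolor('#AAAAAA')
```

Because the statistics are plain dictionaries, the same approach works for moving any box edge or whisker to a value of your choice.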
