matplotlib radar plot min values - python

I started with the matplotlib radar example but values below some min values disappear.
I have a gist here.
The result looks like
As you can see in the gist, the values for D and E in series A are both 3 but they don't show up at all.
There is some scaling going on.
To find out what the problem is, I started with the original values and removed them one by one.
When I removed one whole series, the scale would shrink.
Here is an example (removing Factor 5) where the scale in the [0, 0.2] range shrinks.
From
to
I don't care so much about the scaling but I would like my values at 3 score to show up.
Many thanks

Actually, the values for D and E in series A do show up, although they are plotted in the center of the plot. This is because the limits of your "y-axis" are autoscaled.
If you want to have a fixed "minimum radius", you can simply put ax.set_ylim(bottom=0) in your for-loop.
If you want the minimum radius to be a number relative to the lowest plotted value, you can include something like ax.set_ylim(np.asarray(list(data.values())).flatten().min() - margin) in the for-loop, where margin is the distance from the lowest plotted value to the center of the plot. (The list() call is needed on Python 3, where data.values() returns a view rather than a list.)
With fixed center at radius 0 (added markers to better show that the points are plotted):
By setting margin = 1, and using the relative y-limits, I get this output:
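The two options above can be sketched on a plain polar axes. This is a minimal illustration, not the code from the gist: the labels, the values, and the names ax_fixed and ax_rel are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: five factors for one series; D and E are the low scores.
labels = ["A", "B", "C", "D", "E"]
values = np.array([40.0, 25.0, 30.0, 3.0, 3.0])
theta = np.linspace(0, 2 * np.pi, len(values), endpoint=False)

fig, (ax_fixed, ax_rel) = plt.subplots(1, 2, subplot_kw={"projection": "polar"})

for ax in (ax_fixed, ax_rel):
    ax.plot(theta, values, marker="o")  # markers make the low points visible
    ax.set_xticks(theta)
    ax.set_xticklabels(labels)

# Option 1: pin the centre of the plot to radius 0.
ax_fixed.set_ylim(bottom=0)

# Option 2: put the centre `margin` units below the lowest plotted value.
margin = 1
ax_rel.set_ylim(bottom=values.min() - margin)

# Call plt.show() or fig.savefig(...) to look at the result.
```

In both cases only the lower limit is fixed; the outer radius keeps its autoscaled value.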


Why are the whiskers not displayed correctly with boxplots?

I would like to plot a boxplot for columns of a dataframe which have percentages and to set the
lower limit to 0 and the upper limit to 100 to detect visually the outliers. However I didn't succeed in plotting the whiskers correctly.
Here I created a column with random percentages with some outliers.
import random
from random import randint
import matplotlib.pyplot as plt
import pandas as pd
random.seed(42)
lst=[]
for x in range(140):
    x=randint(1,100)
    lst.append(x)
lst.append(-1)
lst.append(300)
lst.append(140)
print(lst)
df = pd.DataFrame({0:lst})
Here is my function:
def boxplot(df,var,lower_limit=None,upper_limit=None):
    q1=df[var].quantile(0.25)
    q3=df[var].quantile(0.75)
    iqr=q3-q1
    w1=w2=1.5
    if (q1!=q3) and (lower_limit is not None):
        w1=(q1-lower_limit)/iqr
    if (q1!=q3) and (upper_limit is not None):
        w2=(upper_limit-q3)/iqr
    plt.figure(figsize=(5,5))
    df.boxplot(column=var,whis=(w1,w2))
    plt.show()
    print(f'The minimum of {var} is',df[var].min(),'and its maximum is ',df[var].max(),"\n")
    print(f'The first quantile of {var} is ',q1,'its median is ',df[var].median(),'and its third quantile is ',q3,"\n")
I called boxplot(df,0,lower_limit=0,upper_limit=100) and got this result:
But the whiskers don't go to 100 and I would like to know why.
TLDR: I don't think you can do what you want to do. The whiskers must snap to values within your dataset, and cannot be set arbitrarily.
Here is a good reference post: https://stackoverflow.com/a/65390045/13386979.
First of all, kudos on a nice first post. It is great that you provided code to reproduce your problem 👏 There were a few small syntax errors, see my edit.
My impression is that what you want to do is not possible with the matplotlib boxplot (which is called by df.boxplot). One issue is that the units of the whis parameter (when you pass a pair of floats) are in percentiles. Taken from the documentation:
If a pair of floats, they indicate the percentiles at which to draw the whiskers (e.g., (5, 95)). In particular, setting this to (0, 100) results in whiskers covering the whole range of the data.
When you pass lower_limit=0, upper_limit=100 to your function, you end up with w1 == 0.5490196078431373 and w2 == 0.4117647058823529 (you can add a print statement to verify this). This tells the boxplot to extend whiskers to the 0.5th and 0.4th percentile, which are both very small (the boxplot edges are the 25th to 75th percentile). The latter is smaller than the 75th percentile, so the top whisker is drawn at the upper edge of the box.
It seems that you have based your calculation of w1 and w2 on this section from the documentation:
If a float, the lower whisker is at the lowest datum above Q1 - whis*(Q3-Q1), and the upper whisker at the highest datum below Q3 + whis*(Q3-Q1), where Q1 and Q3 are the first and third quartiles. The default value of whis = 1.5 corresponds to Tukey's original definition of boxplots.
I say this because if you also print q1 - w1 * iqr and q3 + w2 * iqr within your call, you get 0 and 100 (respectively). But this calculation is only relevant when a single float is passed (not a pair).
But okay, then what can you pass to whis to get the limits to be any arbitrary value? This is the real problem: I don't think this is possible. The percentiles will always be a value in your data set (there is no interpolation between points). Thus, the edges of the whiskers always snap to a point in your dataset. If you have a point near 0 and near 100, you could find the corresponding percentile to place the whisker there. But without a point there, you cannot hack the whis parameter to set the limits arbitrarily.
I think to fully implement what you want, you should look into drawing the boxes and whiskers manually. Though the caution shared in the other post I referenced is also relevant here:
But be aware that this is not a box and whiskers plot anymore, so you should clearly describe what you're plotting here, otherwise people will be mislead.
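As a rough sketch of the "draw it manually" route: matplotlib can compute the box statistics with matplotlib.cbook.boxplot_stats and draw a box from a precomputed dictionary with Axes.bxp, so the whisker ends can be overwritten in between. The data below is made up to resemble the question's, and the choice to treat everything outside the fixed limits as a flier is my assumption, not something the question specifies.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cbook

rng = np.random.default_rng(42)
# Hypothetical percentages plus three out-of-range values, as in the question.
values = np.concatenate([rng.integers(1, 101, size=140), [-1, 300, 140]])

# Standard Tukey statistics: the whisker ends snap to actual data points.
stats = cbook.boxplot_stats(values, whis=1.5)[0]
snapped = (stats["whislo"], stats["whishi"])

# Overwrite the whisker ends with fixed limits and recompute the fliers
# (assumption: anything outside the limits counts as an outlier).
lower_limit, upper_limit = 0, 100
stats["whislo"] = lower_limit
stats["whishi"] = upper_limit
stats["fliers"] = values[(values < lower_limit) | (values > upper_limit)]

fig, ax = plt.subplots(figsize=(5, 5))
ax.bxp([stats])  # draws one box from the precomputed dictionary
ax.set_title("Whiskers fixed at 0 and 100 (not a standard box plot)")
```

The title is there on purpose: once the whiskers are set by hand, the figure needs a label saying so.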

Boxplot on distance Data - set Box manually to values

I have a bunch of 2D points and angles. To visualise the amount of movement I wanted to use a boxplot and plot the difference to the mean of the points.
I successfully visualised the angle jitter using Python and matplotlib in the following boxplot:
Now I want to do the same for my position data. After computing the Euclidean distance all the data is positive, so a naive boxplot will give wrong results. For an example see the boxplot at the bottom: points that are exactly on the mean have a distance of zero and are now outliers.
So my question is:
How can I set the bottom end of the box and the whiskers manually to zero?
If I should take another approach like a bar chart please tell me (I would like to use the same style though).
Edit:
It looks similar to the following plot at the moment (this is a plot of the distance the angles have from their mean).
As you can see the boxplot doesn't cover the zero. That is correct for the data, but not for the meaning behind it! Zero is perfect (since it represents a point that was exactly in the middle of the angles) but it is not included in the boxplot.
I found out it has already been asked before in this question on SO. While not an exact duplicate, the other question contains the answer!
In matplotlib 1.4 there will probably be a faster way to do it, but for now the answer in the other thread seems to be the best way to go.
Edit:
Well, it turned out that I couldn't use their approach since I have plt.boxplot(data, patch_artist=True) to get all the other fancy stuff.
So I had to resort to the following ugly final solution:
N = 12 # number of my plots
upperBoxPoints = []
for d in data:
    upperBoxPoints.append(np.percentile(d, 75))
w = 0.5 # I had to tune the width by hand
ind = range(0,N) # compute the correct placement from number and width
ind = [x + 0.5+(w/2) for x in ind]
for i in range(N):
    rect = ax.bar(ind[i], upperBoxPoints[i], w, color=color[i], edgecolor='gray', linewidth=2, zorder=10)
# ind[i] position
# upperBoxPoints[i] height of the box (the 75th percentile)
# w width
# color=color[i] as you can see I have a complex color scheme; use hex strings like '#AAAAAA' for colors, html names won't work
# edgecolor='gray' just like the other one
# linewidth=2 ditto
# zorder=10 IMPORTANT: you have to use at least 2 to draw it over the other stuff (but not too high, or it is drawn over your horizontal orientation lines)
And the final result:

Avoid points on edges of plots when the last x-value equals the tick [duplicate]

Every time I try to plot some points the last point ends up right on the edge of the plot:
Note the point circled in red. Is it possible to avoid this? I'd like to have some space between the last point plotted and the right plot edge.
My desired output would be:
I obtained this by increasing the last x value by a tiny amount. I'd like to find a way of plotting that handles this automatically and doesn't depend on the size of the values (e.g. if I were plotting really small float values, then even the smallest change would be significant and the output wouldn't reflect the data).
I'm plotting using:
figure(1)
#X[-1] += 0.0001 #ugly way of obtaining some space at the right
errorbar(X, Y, yerr=error, fmt='bo')
(copied almost directly from How to autoscale y axis in matplotlib?)
You want margins (see the documentation for ax.margins).
For example:
ax.margins(y=.1, x=.1)
Also see Add margin when plots run against the edge of the graph
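A small self-contained sketch of the margins call in context. The X, Y and error values are made up; only errorbar and ax.margins come from the posts above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data; the last x value (5.0) is the one that would hug the edge.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 3.5, 3.0, 4.5, 4.0]
error = [0.2, 0.3, 0.1, 0.4, 0.2]

fig, ax = plt.subplots()
ax.errorbar(X, Y, yerr=error, fmt="bo")
ax.margins(x=0.1, y=0.1)  # pad each axis by 10% of its data range

x_left, x_right = ax.get_xlim()
print(x_left, x_right)  # both limits now lie outside the data range
```

Because the padding is a fraction of the data range, it scales with the data and works the same for very small float values.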

Selecting best range of values from histogram curve

Scenario :
I am trying to track two different colored objects. At the beginning, the user is prompted to hold the first colored object (say, a RED one) at a particular position in front of the camera (marked on screen by a rectangle) and press any key. My program then takes that portion of the frame (ROI) and analyzes the color in it to find what color to track. The same is done for the second object. Then, as usual, I use the cv.inRange function in the HSV color plane and track the object.
What is done :
I took the ROI of object to be tracked, converted it to HSV and checked the Hue histogram. I got two cases as below :
(Here there is only one major central peak. But in some cases I get two such peaks: one bigger peak with some pixel cluster around it, and a second peak, smaller than the first one but of significant size, also with a small cluster around it. I don't have a sample image of it now, but it looks almost like the one below (created in Paint).)
Question :
How can I get best range of hue values from these histograms?
By best range I mean, may be around 80-90% of the pixels in ROI lie in that range.
Or is there any better method than this to track different colored objects ?
If I understand right, the only thing you need here is to find a maximum in a graph, where the maximum is not necessarily the highest peak, but the area with largest density.
Here's a very simple, not too scientific, but fast O(n) approach. Run the histogram through a low-pass filter, e.g. a moving average. The length of your average can be, let's say, 20. In that case the 10th value of your new modified histogram would be:
mh10 = (h1 + h2 + ... + h20) / 20
where h1, h2... are values from your histogram. The next value:
mh11 = (h2 + h3 + ... + h21) / 20
which can be calculated much more easily from the previously calculated mh10, by dropping its first component and adding a new one to the end:
mh11 = mh10 - h1/20 + h21/20
Your only problem is how you handle numbers at the edge of your histogram. You could shrink your moving average's length to the length available, or you could add values before and after what you already have. But either way, you couldn't handle peaks right at the edge.
And finally, when you have this modified histogram, just get the maximum. This works because now every value in your histogram contains not only itself but its neighbors as well.
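The moving-average idea can be sketched in a few lines of plain Python. The function name densest_window and the synthetic histogram are made up for illustration. Since dividing by the window length does not change where the maximum is, the sketch tracks the running sum instead of the average, and it only considers windows that fit entirely inside the histogram.

```python
def densest_window(hist, window=20):
    """Return (start, end) of the run of `window` bins with the largest total.

    `end` is exclusive. Only full windows are considered, so peaks right at
    the edge of the histogram are not handled specially.
    """
    if window > len(hist):
        raise ValueError("window is longer than the histogram")
    running = sum(hist[:window])  # total of the first window
    best_sum, best_start = running, 0
    for start in range(1, len(hist) - window + 1):
        # Slide by one bin: add the bin that entered, drop the bin that left.
        running += hist[start + window - 1] - hist[start - 1]
        if running > best_sum:
            best_sum, best_start = running, start
    return best_start, best_start + window


# Hypothetical 180-bin hue histogram: a tall peak at bin 30, a smaller one at 120.
hist = [max(0, 100 - 10 * abs(h - 30)) + max(0, 40 - 8 * abs(h - 120))
        for h in range(180)]

start, end = densest_window(hist, window=20)
fraction = sum(hist[start:end]) / sum(hist)
print(start, end, fraction)
```

The fraction printed at the end is the share of pixels that fall inside the chosen range, which can be compared against the 80-90% target from the question; if it is too low, a wider window is one thing to try.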
A more sophisticated approach is to weight your average for example with a Gaussian curve. But that's not linear any more. It would be O(k*n), where k is the length of your average which is also the length of the Gaussian.

making small values visible on matplotlib colorbar in python

I'm using colorbar with the default "jet" map and use that with "hexbin". I have counts in my bins that range from 0 to about 1500. The problem is that the smallest values in some hexagonal bins are between 1 and 10, while some bins have counts of hundreds. This means that in the jet colormap, the 0 to 10 range comes up as the color 0 -- i.e. it is indistinguishable from a bin with 0 counts. I'd like those small values to be visible. How can I make colormap do something like: make sure that the bin values greater than or equal to N have a "visible", meaning different from the 0 bin, value in the color map?
thanks.
A quick fix could be to try plotting log(counts) instead of counts on the hexbin. This will spread the scale such that higher counts are compressed and lower counts are not.
Note though, you'd have to state somewhere that the value being visualised is log(counts), not counts, or else a casual reader would invariably misinterpret the graph.
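One way to get this effect without transforming the data by hand is a logarithmic colour normalisation. This is a variation on the suggestion above rather than exactly what it describes: the counts stay as they are and only the colour scale becomes logarithmic, so the colorbar still shows real counts. The data and gridsize are made up, and mincnt=1 is used here to leave empty bins blank so that only positive counts reach the log scale.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

rng = np.random.default_rng(0)
# Hypothetical data: a dense cluster plus a sparse uniform background.
x = np.concatenate([rng.normal(0, 0.3, 20000), rng.uniform(-3, 3, 500)])
y = np.concatenate([rng.normal(0, 0.3, 20000), rng.uniform(-3, 3, 500)])

fig, ax = plt.subplots()
# LogNorm maps counts to colours logarithmically; mincnt=1 drops empty bins.
hb = ax.hexbin(x, y, gridsize=30, norm=LogNorm(), mincnt=1)
fig.colorbar(hb, ax=ax, label="counts (log colour scale)")
```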
A better method might be to modify the colour map that you're using.
The in-built maps more or less change from the '0' colour to the '1' colour linearly.
In order to make lower values have more spread in colour than the higher values, you need a non-linear colour map.
To do this you might try matplotlib.colors, and in particular matplotlib.colors.LinearSegmentedColormap.from_list (http://matplotlib.sourceforge.net/api/colors_api.html#matplotlib.colors.LinearSegmentedColormap.from_list)
Basically, you input the '0' and '1' colours (like blue-->red) and a gamma value. Matplotlib applies gamma as a power to the normalised input (x**gamma), so gamma < 1.0 spreads the lower part of the scale over more colours, while gamma > 1.0 compresses it.
I haven't tried it, but something like:
import matplotlib.colors as colors
# colourmap from green to red; gamma < 1.0 gives the low end more of the colour range.
# Try out different gamma values.
cmap = colors.LinearSegmentedColormap.from_list('nameofcolormap',['g','r'],gamma=0.5)
# feed cmap into hexbin
hexbin( ...., cmap=cmap )
Also, there is the mincnt option to set the minimum count in hexbin, which leaves all bins with less than this number blank. This makes it very easy to distinguish between zero and one counts in the jet color scheme.
hexbin( ...., mincnt=1)
