Plotting histogram using clusters

I have tried to research this problem, but without success. I'm quite a beginner at Python, so bear with me.
I have a text file containing one number per line (they are angles in degrees).
I want to first cluster the angles using a cluster size of 20, and then plot the result as a histogram. I have the following code:
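import pylab
from cluster import HierarchicalClustering
# output_dir is assumed to be defined earlier in the script
with open(output_dir + '/chi_angle.txt', 'r') as f:
    angle = f.read().splitlines()
# convert every line to a float (works on both Python 2 and 3)
hello = [float(a) for a in angle]
# the distance between two angles is their absolute difference
cl = HierarchicalClustering(hello, lambda x, y: abs(x - y))
clusters = cl.getlevel(20)
frequency = [len(x) for x in clusters]
average = [1.0*sum(x)/len(x) for x in clusters]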
Now. My question is: How do I plot the histogram?
Doing the following:
pylab.hist(average, bins=50)
pylab.xlabel('Chi 1 Angle [degrees]')
pylab.ylabel('#')
pylab.show()
will show a histogram with bars correctly placed (i.e. at the average of each cluster), but it won't show how many angles each cluster contains.
Just for clarification. The clustered data looks like this:
clusters = [[-60.26, -30.26, -45.24], [163.24, 173.24], [133.2, 123.23, 121.23]]
I want the mean of each cluster and the number of angles in each cluster. On the histogram, the first bar will thus be located at around -50 and will have a height of 3. How do I plot this?
Thanks a lot!

Not sure I understood your question. Anyhow, try saving your histogram in this variable:
H = pylab.hist(average, bins=50)
If you want to plot it, then do:
pylab.plot(H[1][1:], H[0])
H[0] holds the counts in each bin and H[1] holds the bin edges (one more entry than H[0]), so H[1][1:] is the right edge of each bin. I hope this helped.

Why don't you just use a histogram right away?
A histogram of cluster centers is not a very sensible representation of your data.
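For what the question literally asks (one bar per cluster, placed at the cluster mean, with the cluster size as its height), a bar chart may be closer than hist. The following is only a minimal sketch using the example clusters from the question; the bar width of 20 is an assumption chosen to match the cluster level, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove this line to get a window
import matplotlib.pyplot as plt

# example data copied from the question
clusters = [[-60.26, -30.26, -45.24], [163.24, 173.24], [133.2, 123.23, 121.23]]

# one x position (mean) and one height (size) per cluster
means = [sum(c) / len(c) for c in clusters]
counts = [len(c) for c in clusters]

fig, ax = plt.subplots()
# width=20 is an assumed value, picked to match the cluster level of 20
bars = ax.bar(means, counts, width=20, align="center")
ax.set_xlabel("Chi 1 Angle [degrees]")
ax.set_ylabel("#")
```

Without the Agg line, calling plt.show() afterwards should display the chart. If the clustering step is not essential, plain pylab.hist(hello, bins=...) on the raw angles, as the second answer suggests, avoids the problem altogether.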

Related

Python contour plot vs pcolormesh for probability map

I have two arrays holding the x and y coordinates of points that I need to plot. At each of these points there is a probability of some event happening, so each point has a value ranging from 0 to 1. My idea was to assign these probabilities to their respective (x,y) coordinates and display the result as a heatmap. The code to plot this is as follows:
plt.pcolormesh(xcoord,ycoord,des_mag)
plt.show()
Where xcoord and ycoord are arrays. I could only make this run if I made des_mag a 2D array, in this case a 2000x2000 array with only entries on the diagonal since xcoord and ycoord each contain 2000 coordinates. All the des_mag values vary from 0 to 1. When I run this the output is simply a graph with a solid background and one tiny grid point in the corner with a different color. I'm 95% confident the issue is my lack of understanding on what it is I need to input for the plot, but I can't seem to find many examples for clarity on the issue. If anyone has any suggestions it would be greatly appreciated.
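A possible way around this, sketched under the assumption that the 2000 points are scattered rather than lying on a regular grid: pcolormesh expects a value for every cell of a grid, so a diagonal-only 2000x2000 array leaves almost every cell empty. A scatter plot colored by probability needs only the three 1D arrays. The random data below is made up for illustration; only the names xcoord, ycoord and des_mag come from the question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove this line to get a window
import matplotlib.pyplot as plt

# made-up stand-ins for the arrays in the question
rng = np.random.default_rng(0)
xcoord = rng.uniform(0, 10, 2000)
ycoord = rng.uniform(0, 10, 2000)
des_mag = rng.uniform(0, 1, 2000)  # one probability per point

fig, ax = plt.subplots()
# color each point by its probability; vmin/vmax pin the color scale to [0, 1]
sc = ax.scatter(xcoord, ycoord, c=des_mag, cmap="viridis", vmin=0, vmax=1, s=10)
fig.colorbar(sc, ax=ax, label="probability")
```

If a filled map is wanted instead of individual dots, matplotlib's tripcolor takes the same three 1D arrays and interpolates over a triangulation of the points.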

Plotting 3D points with Python Matplotlib

I want to create a small simulation, and I think I know how, but in order to actually see what happens I need to visualize it.
I started with a 5x5x5 array, which I want to fill up with values.
data = numpy.zeros(shape=(5,5,5))
data[:,:,0]=4
data[:,:,1]=3
data[:,:,2]=2
data[:,:,3]=1
data[:,:,4]=0
This should create a cube which has a gradient in the upward direction (if the third axis is z).
Now, how can I plot this? I don't want a surface plot or a wireframe, just points at each coordinate, maybe color-coded or with transparency by value.
As a test I tried plotting all coordinates using
ax.scatter(numpy.arange(5),numpy.arange(5),numpy.arange(5))
but this will only plot a line consisting of 5 dots.
So... how can I get the 125 dots, that I want to create?
Thx.

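You can encode the value in color like this:
import numpy as np
import matplotlib.pyplot as plt
ax = plt.figure().add_subplot(projection='3d')
x = np.arange(5)
X, Y, Z = np.meshgrid(x, x, x)
v = np.arange(125)  # one color value per point
ax.scatter(X.ravel(), Y.ravel(), Z.ravel(), c=v)
plt.show()
See the scatter documentation for details.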
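To color the points by the actual contents of the data array rather than by a running index, one option is to build the coordinates with np.indices so that they line up with data element by element. This is a sketch, not the answerer's code; the colormap choice is arbitrary.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove this line to get a window
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers the 3d projection on older matplotlib

# same cube as in the question: value 4 at z=0 down to 0 at z=4
data = np.zeros((5, 5, 5))
for k in range(5):
    data[:, :, k] = 4 - k

# X[i, j, k] = i, Y[i, j, k] = j, Z[i, j, k] = k, matching data[i, j, k]
X, Y, Z = np.indices(data.shape)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
pts = ax.scatter(X.ravel(), Y.ravel(), Z.ravel(), c=data.ravel(), cmap="viridis")
fig.colorbar(pts, ax=ax, label="value")
```

The original attempt with three numpy.arange(5) arguments gives only 5 points because scatter pairs the arrays element by element; the 125 points come from all combinations of the three index ranges.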

Getting correct XY axes when plotting numpy array

Beginning Python/NumPy user here. I am analysing a 2D function in the XY plane. Using two loops over x and y, I compute the function value and store it in an array for later plotting. I ran into a couple of problems.
Let's say my XY range is -10 to 10. How do I accommodate that when storing the computed values in my data array? (Only positive numbers are allowed as indices.) For now I just add an offset to x and y to make them positive.
From my data I know that the extreme is at x=-3 and y=2. When I plot the computed array, first of all the axis labels are wrong. I would like Y to go the mathematical way (up).
I would like the axis labels to run from -10 to 10. I tried 'extend' but that did not come out right.
Again, from my data I know that the extreme is at x=-3 and y=2. In the plot, when I hover the mouse over the graphic, the max value is shown at x=12 and y=7, so it seems x and y have been swapped. When I move the mouse, x grows larger when moving right (OK), but y runs the wrong way: it grows larger when moving DOWN.
As a side note, it would be nice to have the function value shown in the plot window as well, next to x and y.
Here is my code:
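import numpy as np
import matplotlib.pyplot as plt
size = 10
q = np.zeros((2*size, 2*size))
for xs in range(-size, +size):
    for ys in range(-size, +size):
        # shift by size so that the array indices start at 0
        q[xs+size, ys+size] = my_function_of_x_and_y(xs, ys)
im = plt.imshow(q, cmap='rainbow', interpolation='none')
plt.show()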
One more thing. I would like not to mess with the q array too badly as I later want to find the extreme spot in it.
idxmin = np.argmin(q)
xmin,ymin = np.unravel_index(idxmin, q.shape)
xmin= xmin-size
ymin= ymin-size
So that I get this:
>>> xmin,ymin
(-3, 2)
>>>
Here is my plot:
(source: dyndns.ws)
Here is the desired plot (made in Photoshop; axis lines would be nice):
(source: dyndns.ws)

Not too sure why setting extent did not work for you (note that the keyword is extent, not extend), but this is how I have implemented it:
import numpy as np
import matplotlib.pyplot as plt
q = np.random.randint(-10, 10, size=(20, 20))
im = plt.imshow(q, cmap='rainbow', interpolation='none', extent=[-10, 10, -10, 10])
plt.vlines(0, -10, 10)
plt.hlines(0, -10, 10)
plt.show()
Use the vlines and hlines methods to draw the center lines.
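The swapped and flipped axes in the question can be explained by how imshow reads an array: the first index is the row (vertical direction) and row 0 is drawn at the top by default. A sketch that keeps q indexed as q[x, y] and only changes how it is displayed is shown below. The function f is a hypothetical stand-in for my_function_of_x_and_y, chosen so that its minimum sits at (-3, 2), and the half-pixel offsets in extent are an assumption that puts each pixel center on its integer coordinate.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove this line to get a window
import matplotlib.pyplot as plt

size = 10
coords = np.arange(-size, size)  # -10 .. 9, same range as the loops in the question

def f(x, y):
    # hypothetical stand-in with its minimum at x=-3, y=2
    return (x + 3) ** 2 + (y - 2) ** 2

# q[i, j] = f(coords[i], coords[j]), i.e. first index is x, second is y
X, Y = np.meshgrid(coords, coords, indexing="ij")
q = f(X, Y)

fig, ax = plt.subplots()
# q.T puts x on the horizontal axis; origin="lower" makes y grow upward
im = ax.imshow(q.T, origin="lower", cmap="rainbow", interpolation="none",
               extent=[-size - 0.5, size - 0.5, -size - 0.5, size - 0.5])
ax.axhline(0, color="k", linewidth=0.5)  # axis lines through the origin
ax.axvline(0, color="k", linewidth=0.5)
fig.colorbar(im, ax=ax)

# q itself is untouched, so the minimum search from the question still works
i, j = np.unravel_index(np.argmin(q), q.shape)
xmin, ymin = coords[i], coords[j]
```

Because only the view passed to imshow is transposed, the later argmin and unravel_index code keeps returning x first and y second.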

How do I plot the point of max value on a line graph in matplotlib?

I am running Python 3.4 and was wondering how I could use matplotlib to plot the maximum value as a point on my current line graph. My current graph is very simple: it has two lines, with scores as y values and time as x values. I am trying to plot a point on each individual line at the time where the maximum score is reached, and also show its coordinates: (optimal time, max score). Does anyone know if there is a way to do this with matplotlib? Thanks in advance.

What I ended up doing was plotting twice, once for the line and once for the maximum point (time_list holds the x-axis values and score holds the y-values):
import matplotlib.pyplot as pl
# sort the times by score; the last entry is the time of the highest score
ordered_time = [t for (s, t) in sorted(zip(score, time_list))]
best_time = ordered_time[-1]
max_coords = '(' + str(best_time) + ', ' + "%.4f" % max(score) + ')'
max_point = pl.plot(best_time, max(score), 'bo', label="(Opt. Time, Max Score)")
pl.text(best_time, max(score), max_coords)
... (insert rest of stuff for your graph)
This finds the max point on a specific line, plots a point on it, and then labels the point with its coordinates.
If you want a text label other than the coordinates, just replace max_coords in the last line with whatever string you want.
If you want to find the max for EACH line, keep separate x and y lists and go through the same process for each (e.g. instead of time_list and score, use time_list_1, time_list_2, ... and score_1, score_2, ...).
Hope this helped someone.
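A shorter variant of the same idea, as a sketch: np.argmax returns the index of the largest score directly, so no sorting is needed, and annotate places the label with a small offset from the point. The sample numbers are made up for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove this line to get a window
import matplotlib.pyplot as plt

# made-up sample data standing in for one line of the graph
time_list = [0, 1, 2, 3, 4, 5]
score = [0.10, 0.35, 0.80, 0.65, 0.40, 0.20]

# index of the highest score, then the matching time and score
best = int(np.argmax(score))
best_time, best_score = time_list[best], score[best]

fig, ax = plt.subplots()
ax.plot(time_list, score, label="score")
ax.plot(best_time, best_score, "bo", label="(Opt. Time, Max Score)")
# label the point, shifted 5 points up and to the right so it does not cover the marker
ax.annotate("(%s, %.4f)" % (best_time, best_score),
            xy=(best_time, best_score),
            xytext=(5, 5), textcoords="offset points")
ax.legend()
```

For several lines, the same few statements can go in a loop over (time_list, score) pairs.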

Boxplot on distance Data - set Box manually to values

I have a bunch of 2D points and angles. To visualise the amount of movement, I wanted to use a boxplot and plot the difference from the mean of the points.
I successfully visualised the angle jitter using Python and matplotlib in the following boxplot:
Now I want to do the same for my position data. After computing the Euclidean distance, all the data is positive, so a naive boxplot will give wrong results. For an example, see the boxplot at the bottom: points that are exactly on the mean have a distance of zero and are now outliers.
So my question is:
How can I manually set the bottom end of the box and the whiskers to zero?
If I should take another approach, like a bar chart, please tell me (I would like to use the same style, though).
Edit:
It looks similar to the following plot at the moment (this is a plot of the distance the angles have from their mean).
As you can see, the boxplot doesn't cover zero. That is correct for the data, but not for the meaning behind it! Zero is perfect (since it represents a point that was exactly in the middle of the angles), but it is not included in the boxplot.

I found out that this has already been asked before in this question on SO. While it is not an exact duplicate, the other question contains the answer!
In matplotlib 1.4 there will probably be a faster way to do it, but for now the answer in the other thread seems to be the best way to go.
Edit:
Well, it turned out that I couldn't use their approach, since I call plt.boxplot(data, patch_artist=True) to get all the other fancy stuff.
So I had to resort to the following ugly final solution:
N = 12  # number of my plots
upperBoxPoints = []
for d in data:
    upperBoxPoints.append(np.percentile(d, 75))
w = 0.5  # I had to tune the width by hand
ind = range(0, N)  # compute the correct placement from number and width
ind = [x + 0.5 + (w/2) for x in ind]
for i in range(N):
    rect = ax.bar(ind[i], menMeans[i], w, color=color[i], edgecolor='gray', linewidth=2, zorder=10)
    # ind[i]: position
    # menMeans[i]: height of the box (presumably the upperBoxPoints computed above)
    # w: width
    # color=color[i]: as you can see, I have a complex color scheme; use '#AAAAAA' for colors, HTML names won't work
    # edgecolor='gray': just like the other one
    # linewidth=2: ditto
    # zorder=10: IMPORTANT, you have to use at least 2 to draw it over the other stuff (but not too high, or it is over your horizontal orientation lines)
And the final result:
