How to plot overlapping clusters in Python

I am trying to plot a visualization for clusters obtained from the Fuzzy C-means clustering algorithm. With crisp clusters like that obtained through k-means, it is easy to visualize through a normal scatter plot such as the one obtained through matplotlib. Is there a recommended way to plot fuzzy clusters to visualize the overlaps? If yes, how?

One option would be to divide your data into two groups: points that are part of a cluster with degree of belonging >= X, and those less than X. Call the points with degree of belonging >= X the crisp groups. For those less than X you would make groups for each of your different clusters, call these the fuzzy groups. Every fuzzy group would have all of the data points not in the crisp groups.
Now, when you go to plot, assign a color to each of your clusters. Say you have three clusters A, B, and C; assign them the colors blue, green, and red. Plot the crisp groups at 100% opacity in their group color, and then for each of the fuzzy groups look at the degree of belonging of the points and plot them at some scaled-back opacity in their cluster's color.
Since you would have to assign a color to each fuzzy group as a whole, it may be best to "bin" them like a histogram by degree of belonging, or you could skip the groups altogether and just plot each point separately.
E.g. say we have 2 clusters A and B, and
data = [(0.2,0.8),(0.5,0.5),(0.65,0.35),(0.25,0.75)]
where data represents the degree of belonging (A,B) for each of our points (whose coordinates I won't list, but assume they can be represented by ptn). Then if X is 0.7, pt1 and pt4 both belong to B with a degree of at least 0.7 and no point reaches 0.7 for A, so crisp_A = [] and crisp_B = [pt1, pt4]. Then fuzzy_A = [pt2, pt3] and fuzzy_B = [pt2, pt3]. Plot crisp_A and crisp_B as full colors, and then use cm.hsv or something akin to it to scale fuzzy_A and fuzzy_B by their respective degrees of belonging.
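The scheme above could be sketched roughly as follows. This is only one possible reading of it: the helper name layer_rgba, the point coordinates in xy, and the marker size are made up for illustration, and opacity is simply set equal to the degree of belonging for fuzzy points.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgb


def layer_rgba(memberships, k, color, threshold=0.7):
    """RGBA colors for cluster k's layer (hypothetical helper).

    Crisp points are fully opaque in their own cluster and invisible in
    every other layer; fuzzy points get alpha = degree of belonging to k.
    """
    memberships = np.asarray(memberships, dtype=float)
    strongest = memberships.argmax(axis=1)
    is_crisp = memberships.max(axis=1) >= threshold
    rgba = np.empty((len(memberships), 4))
    rgba[:, :3] = to_rgb(color)
    rgba[:, 3] = np.where(is_crisp,
                          (strongest == k).astype(float),
                          memberships[:, k])
    return rgba


# Degrees of belonging (A, B) from the example above.
data = np.array([(0.2, 0.8), (0.5, 0.5), (0.65, 0.35), (0.25, 0.75)])
# Made-up coordinates, since the example does not list any.
xy = np.array([[0.0, 1.0], [1.0, 1.0], [1.5, 0.5], [0.2, 1.4]])

fig, ax = plt.subplots()
layers = {}
for k, (name, color) in enumerate([("A", "blue"), ("B", "green")]):
    layers[name] = layer_rgba(data, k, color)
    # One scatter call per cluster; overlapping translucent markers blend.
    ax.scatter(xy[:, 0], xy[:, 1], color=layers[name], s=80)
```

Because every fuzzy point is drawn once per cluster, a point with memberships (0.5, 0.5) shows up as an even blend of both colors, while crisp points keep a single solid color.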

Related

Clustering on evenly spaced grid points

I have a 50 by 50 grid of evenly spaced (x,y) points. Each of these points has a third scalar value. This can be visualized using a contour plot, which I have added. I am interested in the regions indicated by the red circles. These regions of low "Z-values" are what I want to extract from this data.
2D contour plot of 50 x 50 evenly spaced grid points:
I want to do this by using clustering (machine learning), which can be lightning quick when applied correctly. The problem is, however, that the points are evenly spaced together and therefore the density of the entire dataset is equal everywhere.
I have tried using a DBSCAN algorithm with a custom distance metric which takes into account the Z values of each point. I have defined the distance between two points as follows:
import numpy as np

def custom_distance(point1, point2):
    # mean of the two Z values
    average_Z = (point1[2] + point2[2]) / 2
    # Euclidean distance in the (x, y) plane
    distance = np.sqrt(np.square(point1[0] - point2[0]) + np.square(point1[1] - point2[1]))
    distance = distance * average_Z
    return distance
This essentially determines the Euclidean distance between two points and multiplies it by the average of the Z values of both points. In the picture below I have tested this distance function in a DBSCAN algorithm. Each point in this 50 by 50 grid has a Z value of 1, except for four clusters that I have randomly placed. These points each have a Z value of 10. The algorithm is able to find the clusters in the data based on their Z value, as can be seen below.
DBSCAN clustering result using scalar value distance determination:
Positive about the results I tried to apply it to my actual data, only to be disappointed by the results. Since the x and y values of my data are very large, I have simply scaled them to be 0 to 49. The z values I have left untouched. The results of the clustering can be seen in the image below:
Clustering result on original data:
This does not come close to what I want and what I was expecting. For some reason the clusters that are found are of rectangular shape and the light regions of low Z values that I am interested in are not extracted with this approach.
Is there any way I can make the DBSCAN algorithm work in this way? I suspect the reason that it is currently not working has something to do with the differences in scale between the x, y and z values. I am also open to tips or recommendations on other approaches to define and find the lighter regions in the data.
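As a side note, the custom distance described above can be evaluated for all pairs of points at once with NumPy broadcasting, which makes it easier to inspect how the x, y and Z scales interact. The sketch below is only an illustration: custom_distance_matrix is a hypothetical helper, and the 10 by 10 toy grid with one block of Z = 10 is made up rather than taken from the real data.

```python
import numpy as np


def custom_distance(point1, point2):
    """Scalar version from the question: Euclidean (x, y) distance times mean Z."""
    average_z = (point1[2] + point2[2]) / 2
    euclid = np.sqrt((point1[0] - point2[0]) ** 2 + (point1[1] - point2[1]) ** 2)
    return euclid * average_z


def custom_distance_matrix(points):
    """All pairwise custom distances at once (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    xy = points[:, :2]
    z = points[:, 2]
    diff = xy[:, None, :] - xy[None, :, :]          # shape (n, n, 2)
    euclid = np.sqrt((diff ** 2).sum(axis=2))       # shape (n, n)
    average_z = (z[:, None] + z[None, :]) / 2       # shape (n, n)
    return euclid * average_z


# Toy grid: Z = 1 everywhere except a 2 x 2 block with Z = 10.
xs, ys = np.meshgrid(np.arange(10), np.arange(10))
zs = np.ones(xs.shape, dtype=float)
zs[4:6, 4:6] = 10.0
grid_points = np.column_stack([xs.ravel(), ys.ravel(), zs.ravel()])

dist_matrix = custom_distance_matrix(grid_points)
```

With scikit-learn, a matrix like this can be passed to DBSCAN(metric="precomputed"), which avoids calling the Python distance function once per pair. Note that the matrix needs memory proportional to the square of the number of points.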

Random colors in pyplot.scatter based on value

I have a large pandas dataframe that I clustered and the cluster id is stored in a column of a dataframe.
I would like to display clusters in such a way that each cluster has a different color.
I tried doing this with a colormap, but the problem is that I have too many points and clusters, so nearby clusters get assigned only slightly different colors, and when I plot all of them I just get a big mashup that looks like this:
Note that this image contains about 4000 clusters, but because the colors of the clusters are just assigned top to bottom, nearby clusters blend together.
I would like nearby clusters to be painted in different colors, so I tried making a random color for each cluster and then assigning each point a color based on its cluster label, like this:
# creating a color for each distinct cluster label
colors = [(random.random(), random.random(), random.random())
          for _ in range(len(set(data['labels'])))]
# assigning color to a point based on its cluster label
for index, row in data.iterrows():
    plt.scatter(row['x'], row['y'], color=colors[int(row['labels'])])
Now this program works, but it is much slower than the vectorized version above.
Is there a way to color each cluster in a clearly different color without writing a for loop?

This creates a random colormap of 256 colors that you can then pass to scatter:
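import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def segmentation_cmap():
    # 256 evenly spaced positions in the base colormap, in shuffled order
    vals = np.linspace(0, 1, 256)
    np.random.shuffle(vals)
    return ListedColormap(plt.cm.CMRmap(vals))

# pass the whole columns at once instead of one row at a time
ax.scatter(data['x'], data['y'], c=data['labels'], s=1, cmap=segmentation_cmap())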
You may add more colors, but at some point you would have trouble seeing the differences anyway!
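Another way to drop the for loop from the question, while keeping one independent random color per cluster, is to build a palette array and index it with the labels. This is a sketch under assumptions: random_label_colors is a hypothetical helper, and the coordinates and labels below are synthetic stand-ins for the dataframe columns.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch also runs without a display
import matplotlib.pyplot as plt


def random_label_colors(labels, seed=0):
    """One random RGB color per distinct label, returned per point (hypothetical helper)."""
    labels = np.asarray(labels).ravel()
    # inverse[i] is the position of labels[i] among the sorted unique labels
    unique_labels, inverse = np.unique(labels, return_inverse=True)
    rng = np.random.default_rng(seed)
    palette = rng.random((len(unique_labels), 3))   # RGB values in [0, 1)
    return palette[inverse]


# Synthetic stand-in for data['x'], data['y'], data['labels'].
rng = np.random.default_rng(1)
x = rng.random(1000)
y = rng.random(1000)
labels = rng.integers(0, 40, size=1000)

point_colors = random_label_colors(labels)

fig, ax = plt.subplots()
ax.scatter(x, y, c=point_colors, s=1)   # a single vectorized call
```

Unlike a 256-entry colormap, this gives every cluster its own color no matter how many clusters there are, although with thousands of clusters some random colors will still look similar.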

how to intersect two planes in python and export the coordinates of the intersection

I have a bunch of points (x, y and z) in a 3d space and want to extract some points out of them. I copied a simplified example with two arrays which are linked together:
all_points=[[np.array([[6.8,1.,0.1], [6.8,3.,0.1], [6.8,6.,0.1],
                       [4.8,1.,2.], [4.8,3.,2.], [4.8,6.,2.],
                       [3.8,1.,3.], [3.8,3.,3.], [3.8,6.,3.],
                       [2.8,1.,4.1], [2.8,3.,4.1], [2.8,6.,4.1]]),
             np.array([[5.,1.,2.], [5.,3.,2.], [5.,6.,2.],
                       [4.,1.,3.], [4.,3.,3.], [4.,6.,3.],
                       [6.,1.,3.], [6.,3.,3.], [6.,6.,3.],
                       [7.,1.,4.], [7.,3.,4.], [7.,6.,4.],
                       [3.,1.,4.], [3.,3.,4.], [3.,6.,4.]])]]
Firstly, I want to check whether an array is normal or not. If I sort a normal array by its z values, the x values of the sorted array will be only increasing or only decreasing. The first array (blue dots in the uploaded fig) clearly shows a normal set. For normal arrays I just do a simple task and export four points showing their corners (shown by yellow and green arrows in my fig). These points are found based on the minimum and maximum of x, y and z. The following code gives me the four corners of the normal sets:
four_corners=[]
for points in all_points:
    for sub_points in points:
        # sort rows by z, then y (the integer view only orders non-negative floats correctly)
        sorted_sub=np.sort(sub_points.view('i8,i8,i8'), order=['f2', 'f1'], axis=0).view('float')
        # number of rows sharing the lowest z value
        le_st=sorted_sub[np.where(sorted_sub[:,2] == sorted_sub[0,2])]
        le_st=len(le_st)
        # number of rows sharing the highest z value
        le_en=sorted_sub[np.where(sorted_sub[:,2] == sorted_sub[-1,2])]
        le_en=len(le_en)
        cor=np.array([sorted_sub[0,:], sorted_sub[int((le_st-1)),:], sorted_sub[-1,:], sorted_sub[-le_en,:]])
        four_corners.append(cor)
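The normal/abnormal check described above (sort by z, then see whether x only increases or only decreases) might be sketched like this. The function name is_normal is hypothetical, and ties in x between consecutive rows are treated as allowed, which is an assumption about what "increasing or decreasing" should mean for rows that share the same z.

```python
import numpy as np


def is_normal(points):
    """True if x is monotonic (non-strictly) once rows are sorted by z (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    order = np.argsort(points[:, 2], kind="stable")   # stable: keeps input order for equal z
    x_steps = np.diff(points[order, 0])
    return bool(np.all(x_steps >= 0) or np.all(x_steps <= 0))


# The two arrays from the example above.
first_set = np.array([[6.8, 1., 0.1], [6.8, 3., 0.1], [6.8, 6., 0.1],
                      [4.8, 1., 2.], [4.8, 3., 2.], [4.8, 6., 2.],
                      [3.8, 1., 3.], [3.8, 3., 3.], [3.8, 6., 3.],
                      [2.8, 1., 4.1], [2.8, 3., 4.1], [2.8, 6., 4.1]])
second_set = np.array([[5., 1., 2.], [5., 3., 2.], [5., 6., 2.],
                       [4., 1., 3.], [4., 3., 3.], [4., 6., 3.],
                       [6., 1., 3.], [6., 3., 3.], [6., 6., 3.],
                       [7., 1., 4.], [7., 3., 4.], [7., 6., 4.],
                       [3., 1., 4.], [3., 3., 4.], [3., 6., 4.]])

first_is_normal = is_normal(first_set)
second_is_normal = is_normal(second_set)
```

In the second array, x takes two different values at the same z (4 and 6 at z = 3), so no ordering of those rows can be monotonic and the set is flagged as abnormal.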
In abnormal sets (black squares in my fig), some points are usually very close to a normal set (a limit can be defined) and the rest move away from it. I want to extract four points here as well, but by creating two planes. The first plane is created from three of the four corner points found for the normal set. The second plane is created from any three points of the abnormal set that are not close to the normal points (highlighted by a red line in my fig). Then I want to find the intersection line of the two planes and find its x and z at the minimum and maximum of y (1 and 6). The y value of all my corner points (normal or abnormal) is either the minimum or the maximum value. The other two points are created by substituting the y and z values of the two corner points of the normal plane that have the higher z values (highlighted by yellow arrows) into the equation of the plane of the abnormal set. I only know how to create surfaces based on this solution. In reality I may have several normal and abnormal sets, and the abnormal sets are all linked to a normal one. I would appreciate any help with doing this in Python.
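The plane construction and intersection described above could be sketched as follows. The helper names are hypothetical, and the choice of which three points define each plane is an assumption made for illustration: three corners of the normal set, and three points of the abnormal set that lie away from it.

```python
import numpy as np


def plane_from_points(p1, p2, p3):
    """Plane through three points as (normal, offset) with normal . p = offset."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    normal = np.cross(p2 - p1, p3 - p1)
    if np.allclose(normal, 0.0):
        raise ValueError("the three points are collinear")
    return normal, float(normal @ p1)


def intersection_at_y(plane_a, plane_b, y):
    """Point on the intersection line of two planes at a given y.

    Fixing y leaves two linear equations in x and z. np.linalg.solve raises
    LinAlgError if they have no unique solution (e.g. parallel planes).
    """
    (na, da), (nb, db) = plane_a, plane_b
    coeffs = np.array([[na[0], na[2]],
                       [nb[0], nb[2]]])
    rhs = np.array([da - na[1] * y, db - nb[1] * y])
    x, z = np.linalg.solve(coeffs, rhs)
    return np.array([x, y, z])


# Assumed inputs: three corners of the normal set ...
normal_plane = plane_from_points([6.8, 1., 0.1], [6.8, 6., 0.1], [2.8, 6., 4.1])
# ... and three abnormal points that are far from the normal set.
abnormal_plane = plane_from_points([6., 1., 3.], [6., 6., 3.], [7., 1., 4.])

# x and z of the intersection line at the minimum and maximum y (1 and 6).
line_points = np.array([intersection_at_y(normal_plane, abnormal_plane, y)
                        for y in (1.0, 6.0)])
```

For these particular inputs the first plane reduces to x + z = 6.9 and the second to x - z = 3, so the intersection line runs along y at x = 4.95, z = 1.95.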

How to separate points and find corners of the points in python based on their distance

I have some points in the 3d space (x, y and z). These point sets are stored as arrays in lists. I copied a simplified example having two point sets:
all_points=[[np.array([[6.8,1.,0.1], [6.8,3.,0.1], [6.8,6.,0.1],
                       [5.8,1.,1.1], [5.8,3.,1.1], [5.8,6.,1.1],
                       [4.8,1.,2.], [4.8,3.,2.], [4.8,6.,2.],
                       [3.8,1.,3.], [3.8,3.,3.], [3.8,6.,3.],
                       [2.8,1.,4.1], [2.8,3.,4.1], [2.8,6.,4.1]]),
             np.array([[5.,1.,2.], [5.,3.,2.], [5.,6.,2.],[6.,1.,1.2],
                       [4.,1.,3.], [4.,3.,3.], [4.,6.,3.],[5.5,3.,1.5],
                       [6.,1.,3.], [6.,3.,3.], [6.,6.,3.],
                       [7.,1.,4.], [7.,3.,4.], [7.,6.,4.],
                       [3.,1.,4.], [3.,3.,4.], [3.,6.,4.]])]]
My point sets are either normal or abnormal. They are normal if, when I sort them by their z, the x values are only increasing or only decreasing. The blue dots in my fig clearly show the normal type, while the black squares show an abnormal point set. These two sets are linked because some points of the abnormal set are close to the normal one. The minimum and maximum of the y value in both normal and abnormal sets are fixed (1 and 6 in my example). For a normal set, I simply want its four corners (shown by green arrows in my fig). This code gives me the four corners:
four_corners=[]
for points in all_points:
    for sub_points in points:
        # sort rows by z, then y (the integer view only orders non-negative floats correctly)
        sorted_sub=np.sort(sub_points.view('i8,i8,i8'), order=['f2', 'f1'], axis=0).view('float')
        # number of rows sharing the lowest z value
        le_st=sorted_sub[np.where(sorted_sub[:,2] == sorted_sub[0,2])]
        le_st=len(le_st)
        # number of rows sharing the highest z value
        le_en=sorted_sub[np.where(sorted_sub[:,2] == sorted_sub[-1,2])]
        le_en=len(le_en)
        cor=np.array([sorted_sub[0,:], sorted_sub[int((le_st-1)),:], sorted_sub[-1,:], sorted_sub[-le_en,:]])
        four_corners.append(cor)
Abnormal point sets can be divided into two groups: one that is close to the normal point sets and another that is far from them. A threshold can separate them. I tried the following code to separate them (my normal and abnormal arrays should be passed in automatically here, but I have written them manually):
from scipy.spatial import distance
import numpy_indexed as npi
threshold=0.5
# cdist rows correspond to abnormal points, so take the minimum along axis=1
# to get each abnormal point's distance to its nearest normal point
close_points=abnormal[np.where(np.min(distance.cdist(abnormal, normal),axis=1)<threshold)[0],:]
far_points= npi.difference(abnormal, close_points)
After separation, I want two points from far_points and two points from close_points. In far_points I want the two points that have the highest z values and have the minimum of y (1) and the maximum of y (6). These two points are shown by yellow arrows in my fig and are:
[[7.,1.,4.], [7.,6.,4.]]
In close_points I want the points whose y value is again the minimum or the maximum (1 and 6). I call them the y_min and y_max subgroups, and from each subgroup I want the point that has the lowest z value. In my data they are shown by red arrows and are:
[[6.,1.,1.2],[5.,6.,2.]]
Finally, I want to find the two points of the normal point set that are closest to these two points from close_points of the abnormal group. They are:
[[5.8,1.,1.1], [4.8,6.,2.]]
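The selection steps above can be sketched end to end with plain NumPy. This is an illustration under assumptions, not a complete solution: the helper names are hypothetical, pairwise distances are computed by broadcasting instead of scipy's cdist, and the normal and abnormal arrays are typed in manually from the example.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distances, shape (len(a), len(b))."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))


def split_by_distance(abnormal, normal, threshold=0.5):
    """Split abnormal points into (close, far) by distance to the nearest normal point."""
    nearest = pairwise_distances(abnormal, normal).min(axis=1)
    close_mask = nearest < threshold
    return abnormal[close_mask], abnormal[~close_mask]


def extreme_z_at_y_ends(points, highest):
    """At y == min(y) and y == max(y), pick the row with the highest or lowest z."""
    picks = []
    for y_end in (points[:, 1].min(), points[:, 1].max()):
        subgroup = points[points[:, 1] == y_end]
        idx = subgroup[:, 2].argmax() if highest else subgroup[:, 2].argmin()
        picks.append(subgroup[idx])
    return np.array(picks)


normal = np.array([[6.8, 1., 0.1], [6.8, 3., 0.1], [6.8, 6., 0.1],
                   [5.8, 1., 1.1], [5.8, 3., 1.1], [5.8, 6., 1.1],
                   [4.8, 1., 2.], [4.8, 3., 2.], [4.8, 6., 2.],
                   [3.8, 1., 3.], [3.8, 3., 3.], [3.8, 6., 3.],
                   [2.8, 1., 4.1], [2.8, 3., 4.1], [2.8, 6., 4.1]])
abnormal = np.array([[5., 1., 2.], [5., 3., 2.], [5., 6., 2.], [6., 1., 1.2],
                     [4., 1., 3.], [4., 3., 3.], [4., 6., 3.], [5.5, 3., 1.5],
                     [6., 1., 3.], [6., 3., 3.], [6., 6., 3.],
                     [7., 1., 4.], [7., 3., 4.], [7., 6., 4.],
                     [3., 1., 4.], [3., 3., 4.], [3., 6., 4.]])

close_points, far_points = split_by_distance(abnormal, normal, threshold=0.5)

far_corners = extreme_z_at_y_ends(far_points, highest=True)      # yellow arrows
close_corners = extreme_z_at_y_ends(close_points, highest=False)  # red arrows
# For each close corner, the nearest point of the normal set.
nearest_normal = normal[pairwise_distances(close_corners, normal).argmin(axis=1)]
```

One thing to watch: the point [5.5, 3., 1.5] lies almost exactly 0.5 away from the normal set, so which group it falls into depends on floating-point rounding. It has y = 3, so it does not affect the selected corners here, but a threshold that sits on such a boundary may need adjusting for real data.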
So, I want a method that first distinguishes which array is normal and which is abnormal, then finds the four simple corners of my normal sets and the four points of the abnormal sets explained above. The method should also be able to distinguish which normal set is connected to which abnormal ones. I may have one normal set and two or three linked abnormal sets, or maybe two normal sets and one abnormal set that is connected to one of them. I would appreciate any help with doing this in Python.

How to plot coarse-grained average of a set of data points?

I have a set of discrete 2-dimensional data points. Each of these points has a measured value associated with it. I would like to get a scatter plot with points colored by their measured values. But the data points are so dense that points with different colors would overlap with each other, that may not be good for visualization. So I am thinking if I could associate the color for each point based on the coarse-grained average of measured values of some points near it. Does anyone know how to implement this in Python?
Thanks!

I got it done by using sklearn.neighbors.RadiusNeighborsRegressor(). The idea is to take the average of the values of the neighbors within a specific radius. Suppose the coordinates of the data points are in the list temp_coors and the values associated with these points are in coloring; then coloring can be coarse-grained in the following way:
from sklearn.neighbors import RadiusNeighborsRegressor

# smoothing_radius is the neighborhood radius, in the same units as temp_coors
r_neigh = RadiusNeighborsRegressor(radius=smoothing_radius, weights='uniform')
r_neigh.fit(temp_coors, coloring)
# each point's value becomes the uniform average over its neighbors
coloring = r_neigh.predict(temp_coors)
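To make explicit what this radius average amounts to, here is a plain NumPy sketch of the same idea for small datasets. It is a swapped-in illustration rather than the scikit-learn implementation: radius_average is a hypothetical helper, the three sample points are made up, and the full pairwise distance matrix it builds needs memory proportional to the square of the number of points.

```python
import numpy as np


def radius_average(coords, values, radius):
    """Replace each value by the mean of all values within `radius` (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    within = dist <= radius            # every point counts as its own neighbor
    # Sum of neighbor values divided by the number of neighbors.
    return (within @ values) / within.sum(axis=1)


# Made-up data: two nearby points and one isolated point.
sample_coords = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
sample_values = np.array([1.0, 3.0, 10.0])

smoothed = radius_average(sample_coords, sample_values, radius=1.5)
```

The two nearby points both end up with the average of their values, while the isolated point keeps its own value because it has no other neighbor within the radius.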
