So I created a really naive (probably inefficient) way of generating Hasse diagrams.
Question:
I have 4 dimensions: p, q, r, s.
I want to display it uniformly (as a tesseract), but I have no idea how to reshape it. How can one reshape a networkx graph in Python?
I've seen some examples of people using spring_layout() and draw_circular(), but they don't produce the shape I'm looking for because they aren't uniform.
Is there a way to reshape my graph and make it uniform, i.e. reshape my Hasse diagram into a tesseract shape (preferably using nx.draw())?
Here's what mine currently looks like:
Here's my code to generate the Hasse diagram of N dimensions:
#!/usr/bin/python
import networkx as nx
import matplotlib.pyplot as plt
import itertools
H = nx.DiGraph()
axis_labels = ['p','q','r','s']
D_len_node = {}
#Iterate through axis labels
for i in range(0,len(axis_labels)+1):
    #Create edge from empty set
    if i == 0:
        for ax in axis_labels:
            H.add_edge('O',ax)
    else:
        #Create all non-overlapping combinations
        combinations = [c for c in itertools.combinations(axis_labels,i)]
        D_len_node[i] = combinations
        #Create edge from len(i-1) to len(i) #eg. pq >>> pqr, pq >>> pqs
        if i > 1:
            for node in D_len_node[i]:
                for p_node in D_len_node[i-1]:
                    #if set.intersection(set(p_node),set(node)): Oops
                    if all(p in node for p in p_node): #should be this!
                        H.add_edge(''.join(p_node),''.join(node))
#Show Plot
nx.draw(H,with_labels = True,node_shape = 'o')
plt.show()
I want to reshape it like this:
If anyone knows of an easier way to make Hasse diagrams, please share some wisdom, but that's not the main aim of this post.
This is a pragmatic, rather than purely mathematical answer.
I think you have two issues - one with layout, the other with your network.
1. Network
You have too many edges in your network for it to represent the unit tesseract. Caveat: I'm not an expert on the maths here; I just came to this from the plotting angle (matplotlib tag). Please explain if I'm wrong.
Your desired projection and, for instance, the Wolfram MathWorld page for a Hasse diagram for n=4 have only 4 edges connected to each node, whereas you have 6 edges to the 2-bit nodes and 7 edges to the 3-bit nodes. Your graph connects far more of each "level" to the next than it should: for example, every 2-bit node (vector with two 1s) is joined to every 3-bit node, rather than only to the 3-bit nodes that contain it. This is most obvious in the projection based on the Wikipedia example (2nd image below).
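To make point 1 concrete, here is a minimal sketch (pure Python, no networkx needed to build the edge list) of the covering relation a Hasse diagram of the subsets of {p, q, r, s} uses: A is joined to B only when B is A plus exactly one extra label. The helper name hasse_edges is hypothetical, and 'O' stands for the empty set as in the question's code.

```python
import itertools

def hasse_edges(labels):
    """Edges of the Hasse diagram of all subsets of `labels`:
    A -> B exactly when B is A plus one extra label (a covering relation)."""
    edges = []
    for k in range(len(labels)):
        for subset in itertools.combinations(labels, k):
            for extra in labels:
                if extra not in subset:
                    # keep labels in the original axis order, e.g. 'pqr', not 'prq'
                    bigger = [l for l in labels if l in subset or l == extra]
                    src = ''.join(subset) or 'O'  # 'O' is the empty set, as in the question
                    edges.append((src, ''.join(bigger)))
    return edges

edges = hasse_edges(['p', 'q', 'r', 's'])
```

With this rule there are 4 × 2³ = 32 edges and every node has exactly 4 neighbours, as in the MathWorld picture; nx.DiGraph(edges) turns the list into a graph that can be passed to nx.draw().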
2. Projection
I couldn't find a pre-written algorithm or library to automatically project the 4D tesseract onto a 2D plane, but I did find a couple of examples, e.g. Wikipedia. From this, you can work out a co-ordinate set that would suit you and pass that into the nx.draw() call.
Here is an example. I've included two co-ordinate sets: one that looks like the projection you show above, and one that matches this one from Wikipedia.
import math
import itertools
import networkx as nx
import matplotlib.pyplot as plt
H = nx.DiGraph()
axis_labels = ['p','q','r','s']
D_len_node = {}
#Iterate through axis labels
for i in range(0,len(axis_labels)+1):
    #Create edge from empty set
    if i == 0:
        for ax in axis_labels:
            H.add_edge('O',ax)
    else:
        #Create all non-overlapping combinations
        combinations = [c for c in itertools.combinations(axis_labels,i)]
        D_len_node[i] = combinations
        #Create edge from len(i-1) to len(i) #eg. pq >>> pqr, pq >>> pqs
        if i > 1:
            for node in D_len_node[i]:
                for p_node in D_len_node[i-1]:
                    if set.intersection(set(p_node),set(node)):
                        H.add_edge(''.join(p_node),''.join(node))
#These are two manual options for projecting the tesseract onto a 2D plane
# - many projections are available!!
wikipedia_projection_coords = [(0.5,0),(0.85,0.25),(0.625,0.25),(0.375,0.25),
                               (0.15,0.25),(1,0.5),(0.8,0.5),(0.6,0.5),
                               (0.4,0.5),(0.2,0.5),(0,0.5),(0.85,0.75),
                               (0.625,0.75),(0.375,0.75),(0.15,0.75),(0.5,1)]
#Build the "two cubes" type example projection co-ordinates
half_coords = [(0,0.15),(0,0.6),(0.3,0.15),(0.15,0),
               (0.55,0.6),(0.3,0.6),(0.15,0.4),(0.55,1)]
#make the coords symmetric
example_projection_coords = half_coords + [(1-x,1-y) for (x,y) in half_coords][::-1]
print(example_projection_coords)
def powerset(s):
    ch = itertools.chain.from_iterable(itertools.combinations(s, r) for r in range(len(s)+1))
    return [''.join(t) for t in ch]
pos={}
#coordinates are assigned in powerset order: '', p, q, r, s, pq, pr, ...
for i,label in enumerate(powerset(axis_labels)):
    if label == '':
        label = 'O'
    pos[label]= example_projection_coords[i]
#Show Plot
nx.draw(H,pos,with_labels = True,node_shape = 'o')
plt.show()
Note: unless you change what I've mentioned in 1. above, these still have your edge structure, so they won't look exactly the same as the examples from the web. Here is what it looks like with your existing network generation code; you can see the extra edges if you compare it to your example (e.g. I don't think pr should be connected to pqs):
'Two cube' projection
Wikimedia example projection
Note
If you want to get into the maths of doing your own projections (and building up pos mathematically), you might look at this research paper.
EDIT:
Curiosity got the better of me and I had to search for a mathematical way to do this. I found this blog, the main result of which is the projection matrix:
This led me to develop this function for projecting each label, taking the label containing 'p' to mean the point has value 1 on the 'p' axis, i.e. we are dealing with the unit tesseract. Thus:
import math
def construct_projection(label):
    r1 = r2 = 0.5
    theta = math.pi / 6
    phi = math.pi / 3
    x = int( 'p' in label) + r1 * math.cos(theta) * int('r' in label) - r2 * math.cos(phi) * int('s' in label)
    y = int( 'q' in label) + r1 * math.sin(theta) * int('r' in label) + r2 * math.sin(phi) * int('s' in label)
    return (x,y)
This gives a nice projection onto a 2D octagon with all points distinct.
This will run in the above program, just replace
pos[label] = example_projection_coords[i]
with
pos[label] = construct_projection(label)
This gives the result:
Play with r1, r2, theta and phi to your heart's content :)
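Because construct_projection is linear in the 0/1 coordinates, it can equivalently be written as a 2×4 matrix applied to every vertex of the unit tesseract at once. The matrix below is read off from the function above (not copied from the blog), and label_of is a hypothetical helper that rebuilds the node labels used in the graph; treat this as a sketch.

```python
import math
import itertools
import numpy as np

r1 = r2 = 0.5
theta, phi = math.pi / 6, math.pi / 3
# Columns are the 2D images of the unit vectors along p, q, r, s
P = np.array([[1.0, 0.0, r1 * math.cos(theta), -r2 * math.cos(phi)],
              [0.0, 1.0, r1 * math.sin(theta),  r2 * math.sin(phi)]])

# All 16 vertices of the unit tesseract as 0/1 vectors (p, q, r, s)
vertices = np.array(list(itertools.product([0, 1], repeat=4)), dtype=float)
points_2d = vertices @ P.T  # shape (16, 2)

def label_of(v):
    """'pq' for (1, 1, 0, 0); 'O' for the empty set, matching the graph labels."""
    return ''.join(c for c, bit in zip('pqrs', v) if bit) or 'O'

pos = {label_of(v): tuple(xy) for v, xy in zip(vertices, points_2d)}
```

The resulting pos dictionary has the same keys as the graph and can be passed straight to nx.draw(H, pos, ...).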
I am trying to use graph-tool by Tiago Peixoto to build a graph (either directed or undirected) from a given weighted adjacency matrix with a block structure. So far, unsuccessfully. My question partly overlaps with this thread on SO, which, however, remains without a clear solution.
Suppose I have a function that generates my block matrix of weights J, which is in the form:
Each block J_ij is some random binary matrix with entries drawn from a given distribution. The scalars s and g respectively denote the weights for connections within diagonal blocks (i.e. when i = j) and within blocks off the diagonal (i.e. i ≠ j).
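The question does not show generate_adj_bmatrix, so here is a purely hypothetical stand-in that produces a matrix of the described form, handy for testing the graph-building code. The parameters sizes (block sizes), p (connection probability) and seed are assumptions, and Bernoulli(p) entries are just one choice of "given distribution".

```python
import numpy as np

def generate_adj_bmatrix(sizes, p=0.1, s=0.1, g=0.01, directed=False, seed=None):
    """Hypothetical generator: block (i, j) is a random 0/1 matrix with
    connection probability p, scaled by s when i == j and by g otherwise."""
    rng = np.random.default_rng(seed)
    n = sum(sizes)
    # block index of every node, e.g. sizes [2, 3] -> [0, 0, 1, 1, 1]
    block = np.repeat(np.arange(len(sizes)), sizes)
    A = (rng.random((n, n)) < p).astype(float)
    np.fill_diagonal(A, 0)  # no self-loops
    if not directed:
        A = np.triu(A, 1)
        A = A + A.T  # mirror the upper triangle so the matrix is symmetric
    # weight s inside diagonal blocks, g in off-diagonal blocks
    scale = np.where(block[:, None] == block[None, :], s, g)
    return A * scale

J = generate_adj_bmatrix([3, 4, 5], p=0.5, seed=0)
```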
I build my graph in graph_tool as follows:
import numpy as np
import graph_tool.all as gt
directed = False # True if we want the graph to be directed
J = generate_adj_bmatrix(...,s=0.1,g=0.01,directed=directed) # Some function to generate the weighted adjacency matrix (here the matrix will be symmetric since we want the graph undirected)
# Define graph
G = gt.Graph(directed=directed)
indexes = J.nonzero()
G.add_edge_list(np.transpose(indexes))
# Add weight information
G.ep['weight'] = G.new_ep("double", vals=J[indexes])
I can also add, if I want, a vertex property map to my graph G recording which block each node belongs to. But how do I include this information in the code that builds the circular graph? The code reads (pasted here from the graph-tool docs):
state = gt.minimize_blockmodel_dl(G) # or should I consider instead state = gt.minimize_nested_blockmodel_dl(G)?
gt.draw_hierarchy(state)
t = gt.get_hierarchy_tree(state)[0]
tpos = pos = gt.radial_tree_layout(t, t.vertex(t.num_vertices() - 1), weighted=True)
cts = gt.get_hierarchy_control_points(G, t, tpos)
pos = G.own_property(tpos)
b = state.levels[0].b
shape = b.copy()
shape.a %= 14 # Have not yet figured out what I need it for
gt.graph_draw(G, pos=pos, vertex_fill_color=b, vertex_shape=shape,
edge_control_points=cts,edge_color=[0, 0, 0, 0.3], vertex_anchor=0)
Note that the above code currently seems to hang: the minimize_blockmodel_dl(G) line appears to be stuck in an endless loop. Ideally, I should not need to search my graph for clusters, since this information could already be provided as a property of the vertices, based on my knowledge of the block structure of J. At the same time, minimize_blockmodel_dl(G) seems necessary in order to access the edge bundling option, doesn't it?
Here is the solution I came up with.
def visualize_network(J,N_sizes):
    """
    Visualize a network from weighted block adjacency matrix in a circular layout with FEB.
    Input arguments:
    -- J : Weighted adjacency matrix (in block-matrix form, but can be any, as long as it is square).
    -- N_sizes : {<block1_label>: size; <block2_label>: size,...} such that node indexes of block n follow immediately those of block n-1 (relies on dict insertion order).
    """
    import numpy as np
    import matplotlib.colors as mcolors
    import graph_tool.all as gt
    # Generate the graph
    G = gt.Graph(directed=True) # In my case, network edges are oriented
    eindexes = J.nonzero()
    G.add_edge_list(np.transpose(eindexes))
    # Add weight information
    weight = G.new_ep("double", vals = J[eindexes])
    # Assign color to each vertex based on the block it belongs to
    colors = {'B1' : 'k',
              'B2' : 'r',
              'B3' : 'g',
              'B4' : 'b'}
    regs = np.asarray(list(N_sizes.keys()))
    rindexes = np.cumsum(list(N_sizes.values()))
    # side='right' so that node index rindexes[k] falls into block k+1, not block k
    iidd = regs[np.searchsorted(rindexes,np.arange(np.shape(J)[0]),side='right')]
    region_id = G.new_vp("string",vals=iidd)
    vcolors = [colors[rid] for rid in iidd]
    vertex_color = G.new_vp("string",vals=vcolors)
    # Assigns edge colors by out-node.
    ecolors = [mcolors.to_hex(colors[rid]) for rid in regs[np.searchsorted(rindexes,eindexes[0],side='right')]]
    edge_color = G.new_ep("string",vals=ecolors)
    # Construct a graph in a circular layout with FEB
    G = gt.GraphView(G, vfilt=gt.label_largest_component(G))
    state = gt.minimize_nested_blockmodel_dl(G)
    t = gt.get_hierarchy_tree(state)[0]
    tpos = gt.radial_tree_layout(t, t.vertex(t.num_vertices() - 1, use_index=False), weighted=True)
    cts = gt.get_hierarchy_control_points(G, t, tpos)
    pos = G.own_property(tpos)
    gt.graph_draw(G,
                  pos = pos,
                  vertex_fill_color = vertex_color,
                  edge_control_points = cts,
                  edge_color = edge_color,
                  vertex_anchor = 0)
Additional documentation on the circular layout and this way of building the graph can be found at this graph-tool doc page.
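The block lookup in the answer relies on np.searchsorted over the cumulative block sizes. A standalone sketch of just that step (block_labels is a hypothetical helper name) shows why side='right' matters: with the default side='left', the first node of every block after the first would be assigned to the previous block.

```python
import numpy as np

def block_labels(N_sizes):
    """Map node index -> block label, assuming the nodes of each block are
    numbered consecutively in the dict's insertion order."""
    regs = np.asarray(list(N_sizes.keys()))
    bounds = np.cumsum(list(N_sizes.values()))  # exclusive end index of each block
    n = int(bounds[-1])
    # side='right': an index equal to a block boundary belongs to the *next* block
    return regs[np.searchsorted(bounds, np.arange(n), side='right')]

labels = block_labels({'B1': 2, 'B2': 3})
```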
I have a set of points in a text file: random_shape.dat.
The initial order of points in the file is random. I would like to sort these points in a counter-clockwise order as follows (the red dots are the xy data):
I tried to achieve that by using polar coordinates: I calculate the polar angle of each point (x,y) and then sort by ascending angle, as follows:
"""
Script: format_file.py
Description: This script will format the xy data file accordingly to be used with a program expecting CCW order of data points, By soting the points in Counterclockwise order
Example: python format_file.py random_shape.dat
"""
import sys
import numpy as np
# Read the file name
filename = sys.argv[1]
# Get the header name from the first line of the file (without the newline character)
with open(filename, 'r') as f:
header = f.readline().rstrip('\n')
angles = []
# Read the data from the file
x, y = np.loadtxt(filename, skiprows=1, unpack=True)
for xi, yi in zip(x, y):
angle = np.arctan2(yi, xi)
if angle < 0:
angle += 2*np.pi # map the angle to 0,2pi interval
angles.append(angle)
# create a numpy array
angles = np.array(angles)
# Get the arguments of sorted 'angles' array
angles_argsort = np.argsort(angles)
# Sort x and y
new_x = x[angles_argsort]
new_y = y[angles_argsort]
print("Length of new x:", len(new_x))
print("Length of new y:", len(new_y))
with open(filename.split('.')[0] + '_formatted.dat', 'w') as f:
print(header, file=f)
for xi, yi in zip(new_x, new_y):
print(xi, yi, file=f)
print("Done!")
By running the script:
python format_file.py random_shape.dat
Unfortunately, I don't get the expected results in random_shape_formatted.dat: the points are not sorted in the desired order.
Any help is appreciated.
EDIT: The expected results:
Create a new file named filename_formatted.dat that contains the data sorted according to the image above (the first line contains the starting point; the following lines contain the points in the counterclockwise direction shown by the blue arrows in the image).
EDIT 2: The xy data is added here instead of in a GitHub gist:
random_shape
0.4919261070361315 0.0861956168831175
0.4860816807027076 -0.06601587301587264
0.5023029456281289 -0.18238249845392662
0.5194784026079869 0.24347943722943777
0.5395164357511545 -0.3140611471861465
0.5570497147514262 0.36010146103896146
0.6074231036252226 -0.4142604617604615
0.6397066014669927 0.48590810704447085
0.7048302091822873 -0.5173701298701294
0.7499157837544145 0.5698170011806378
0.8000108666123336 -0.6199254449254443
0.8601249660418364 0.6500974025974031
0.9002010323281716 -0.7196585989767801
0.9703341483292582 0.7299242424242429
1.0104102146155935 -0.7931355765446666
1.0805433306166803 0.8102046438410078
1.1206193969030154 -0.865251869342778
1.1907525129041021 0.8909386068476981
1.2308285791904374 -0.9360074773711129
1.300961695191524 0.971219008264463
1.3410377614778592 -1.0076702085792988
1.4111708774789458 1.051499409681228
1.451246943765281 -1.0788793781975592
1.5213800597663678 1.1317798110979933
1.561456126052703 -1.1509956709956706
1.6315892420537896 1.2120602125147582
1.671665308340125 -1.221751279024005
1.7417984243412115 1.2923406139315234
1.7818744906275468 -1.2943211334120424
1.8520076066286335 1.3726210153482883
1.8920836729149686 -1.3596340023612745
1.9622167889160553 1.4533549783549786
2.0022928552023904 -1.4086186540731989
2.072425971203477 1.5331818181818184
2.1125020374898122 -1.451707005116095
2.182635153490899 1.6134622195985833
2.2227112197772345 -1.4884454939000387
2.292844335778321 1.6937426210153486
2.3329204020646563 -1.5192876820149541
2.403053518065743 1.774476584022039
2.443129584352078 -1.5433264462809912
2.513262700353165 1.8547569854388037
2.5533387666395 -1.561015348288075
2.6234718826405867 1.9345838252656438
2.663547948926922 -1.5719008264462806
2.7336810649280086 1.9858362849271942
2.7737571312143436 -1.5750757575757568
2.8438902472154304 2.009421487603306
2.883966313501766 -1.5687258953168035
2.954099429502852 2.023481896890988
2.9941754957891877 -1.5564797323888229
3.0643086117902745 2.0243890200708385
3.1043846780766096 -1.536523022432113
3.1745177940776963 2.0085143644234558
3.2145938603640314 -1.5088557654466737
3.284726976365118 1.9749508067689887
3.324803042651453 -1.472570838252656
3.39493615865254 1.919162731208186
3.435012224938875 -1.4285753640299088
3.5051453409399618 1.8343467138921687
3.545221407226297 -1.3786835891381335
3.6053355066557997 1.7260966810966811
3.655430589513719 -1.3197205824478546
3.6854876392284703 1.6130086580086582
3.765639771801141 -1.2544077134986225
3.750611246943765 1.5024152236652237
3.805715838087476 1.3785173160173163
3.850244800627849 1.2787337662337666
3.875848954088563 -1.1827449822904361
3.919007794704616 1.1336638361638363
3.9860581363759846 -1.1074537583628485
3.9860581363759846 1.0004485329485333
4.058012891753723 0.876878197560016
4.096267318663407 -1.0303482880755608
4.15638141809291 0.7443374218374221
4.206476500950829 -0.9514285714285711
4.256571583808748 0.6491902794175526
4.3166856832382505 -0.8738695395513574
4.36678076609617 0.593855765446675
4.426894865525672 -0.7981247540338443
4.476989948383592 0.5802489177489183
4.537104047813094 -0.72918339236521
4.587199130671014 0.5902272727272733
4.647313230100516 -0.667045454545454
4.697408312958435 0.6246979535615904
4.757522412387939 -0.6148858717040526
4.807617495245857 0.6754968516332154
4.8677315946753605 -0.5754260133805582
4.917826677533279 0.7163173947264858
4.977940776962782 -0.5500265643447455
5.028035859820701 0.7448917748917752
5.088149959250204 -0.5373268398268394
5.138245042108123 0.7702912239275879
5.198359141537626 -0.5445838252656432
5.2484542243955445 0.7897943722943728
5.308568323825048 -0.5618191656828015
5.358663406682967 0.8052154663518301
5.41877750611247 -0.5844972451790631
5.468872588970389 0.8156473829201105
5.5289866883998915 -0.6067217630853987
5.579081771257811 0.8197294372294377
5.639195870687313 -0.6248642266824076
5.689290953545233 0.8197294372294377
5.749405052974735 -0.6398317591499403
5.799500135832655 0.8142866981503349
5.859614235262157 -0.6493565525383702
5.909709318120076 0.8006798504525783
5.969823417549579 -0.6570670995670991
6.019918500407498 0.7811767020857934
6.080032599837001 -0.6570670995670991
6.13012768269492 0.7562308146399057
6.190241782124423 -0.653438606847697
6.240336864982342 0.7217601338055886
6.300450964411845 -0.6420995670995664
6.350546047269764 0.6777646595828419
6.410660146699267 -0.6225964187327819
6.4607552295571855 0.6242443919716649
6.520869328986689 -0.5922077922077915
6.570964411844607 0.5548494687131056
6.631078511274111 -0.5495730027548205
6.681173594132029 0.4686727666273125
6.7412876935615325 -0.4860743801652889
6.781363759847868 0.3679316979316982
6.84147785927737 -0.39541245791245716
6.861515892420538 0.25880333951762546
6.926639500135833 -0.28237987012986965
6.917336127605076 0.14262677798392165
6.946677533279001 0.05098957832291173
6.967431210462995 -0.13605442176870675
6.965045730326905 -0.03674603174603108
I find that an easy way to sort points with x,y-coordinates like that is to sort them by the angle (called alpha in the example) between the horizontal and the line from the center of mass of the whole polygon to each point. The coordinates of the center of mass (x0 and y0) can easily be calculated by averaging the x,y coordinates of all points. Then you calculate the angle using numpy.arccos, for instance. When y-y0 is larger than 0 you take the angle directly; otherwise you subtract the angle from 360° (2𝜋). I have used numpy.where for the calculation of the angle and then numpy.argsort to produce a mask for indexing the initial x,y-values. The following function sort_xy sorts all x and y coordinates with respect to this angle. If you want to start from any other point, you could add an offset angle for that. In your case that would be zero, though.
import numpy as np
def sort_xy(x, y):
    x0 = np.mean(x)
    y0 = np.mean(y)
    r = np.sqrt((x-x0)**2 + (y-y0)**2)
    angles = np.where((y-y0) > 0, np.arccos((x-x0)/r), 2*np.pi-np.arccos((x-x0)/r))
    mask = np.argsort(angles)
    x_sorted = x[mask]
    y_sorted = y[mask]
    return x_sorted, y_sorted
Plotting x, y before sorting using matplotlib.pyplot.plot (the points are obviously not sorted):
Plotting x, y using matplotlib.pyplot.plot after sorting with this method:
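The answer mentions that an offset can be used to start from a different point. A minimal sketch of a closely related approach, assuming you know the index of the point you want first: compute essentially the same angle around the mean with np.arctan2 (instead of arccos plus np.where), then rotate the sorted order with np.roll. The name sort_xy_from and its start_index parameter are hypothetical, for illustration only.

```python
import numpy as np

def sort_xy_from(x, y, start_index=0):
    """Sort points counter-clockwise around their mean, then rotate the
    result so that the point originally at start_index comes first."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x0, y0 = x.mean(), y.mean()
    # arctan2 handles all four quadrants; np.mod maps (-pi, pi] onto [0, 2*pi)
    angles = np.mod(np.arctan2(y - y0, x - x0), 2 * np.pi)
    order = np.argsort(angles)
    # position of the requested starting point inside the sorted order
    shift = int(np.nonzero(order == start_index)[0][0])
    order = np.roll(order, -shift)
    return x[order], y[order]

# Usage: eight points on the unit circle, shuffled
theta = np.linspace(0, 2 * np.pi, 8, endpoint=False)
perm = np.random.default_rng(0).permutation(8)
xs, ys = np.cos(theta)[perm], np.sin(theta)[perm]
sx, sy = sort_xy_from(xs, ys, start_index=int(np.argmax(xs)))
```

For random_shape.dat, start_index could for example be np.argmax(x) (the right-most point); whether that is the intended first point is an assumption.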
If it is certain that the curve does not cross the same X coordinate (i.e. any vertical line) more than twice, then you could visit the points in X-sorted order and append a point to one of two tracks you follow: to the one whose last end point is the closest to the new one. One of these tracks will represent the "upper" part of the curve, and the other, the "lower" one.
The logic would be as follows:
dist2 = lambda a,b: (a[0]-b[0])*(a[0]-b[0]) + (a[1]-b[1])*(a[1]-b[1])
z = list(zip(x, y)) # get the list of coordinate pairs
z.sort() # sort by x coordinate
cw = z[0:1] # first point in clockwise direction
ccw = z[1:2] # first point in counter clockwise direction
# reverse the above assignment depending on how first 2 points relate
if z[1][1] > z[0][1]:
    cw = z[1:2]
    ccw = z[0:1]
for p in z[2:]:
    # append to the list to which the next point is closest
    if dist2(cw[-1], p) < dist2(ccw[-1], p):
        cw.append(p)
    else:
        ccw.append(p)
cw.reverse()
result = cw + ccw
This would also work for a curve with steep fluctuations in the Y-coordinate, for which an angle-look-around from some central point would fail, like here:
No assumption is made about the range of either the X or the Y coordinate: for instance, the curve does not have to cross the X axis (Y = 0) for this to work.
Counter-clockwise order depends on the choice of a pivot point. From your question, one good choice of the pivot point is the center of mass.
Something like this:
# Find the Center of Mass: data is a numpy array of shape (Npoints, 2)
mean = np.mean(data, axis=0)
# Compute angles
angles = np.arctan2((data-mean)[:, 1], (data-mean)[:, 0])
# Transform angles from [-pi,pi] -> [0, 2*pi]
angles[angles < 0] = angles[angles < 0] + 2 * np.pi
# Sort
sorting_indices = np.argsort(angles)
sorted_data = data[sorting_indices]
Not really a Python question, I think, but you could still try sorting by -sign(y) * x, doing something like:
def counter_clockwise_sort(points):
    return sorted(points, key=lambda point: point['x'] * (-1 if point['y'] >= 0 else 1))
This should work fine, assuming you read your points into a list of dicts of the form {'x': 0.12312, 'y': 0.912}.
EDIT: This will work as long as you cross the X axis only twice, like in your example.
If the shape is arbitrarily complex and the point spacing is roughly random, then I think this is a really hard problem.
For what it's worth, I have faced a similar problem in the past, and I used a traveling salesman solver. In particular, I used the LKH solver. I see there is a Python repo for solving the problem, LKH-TSP. Once you have an order for the points, I don't think it will be too hard to decide on a clockwise vs counterclockwise ordering.
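Once a solver has produced a closed tour, one standard way to decide between clockwise and counterclockwise is the sign of the polygon's signed area (shoelace formula): positive means counterclockwise, negative means clockwise. A minimal sketch; the helper names are illustrative, not part of LKH or LKH-TSP.

```python
import numpy as np

def signed_area(points):
    """Shoelace formula: positive for counter-clockwise, negative for clockwise."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # sum of x_i * y_{i+1} - x_{i+1} * y_i; np.roll closes the polygon
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)

def make_counter_clockwise(points):
    """Return the points unchanged if already CCW, otherwise reversed."""
    pts = np.asarray(points, dtype=float)
    return pts if signed_area(pts) > 0 else pts[::-1]

square_cw = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(signed_area(square_cw))  # -1.0 (clockwise)
```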
If we want to answer your specific problem, we need to pick a pivot point.
Since you want to sort according to the starting point you picked, I would take a pivot in the middle (x=4,y=0 will do).
Since we're sorting counterclockwise, we'll take arctan2(-(y-pivot_y), -(x-pivot_x)) (negating both components rotates every angle by π, so the sweep starts on the right-hand side of the pivot).
We get the following, with a gradient-colored scatter plot to show correctness (FYI, I removed the first line of the dat file after downloading):
import numpy as np
import matplotlib.pyplot as plt
points = np.loadtxt('points.dat')
#oneliner for ordering points (transform, adjust for 0 to 2pi, argsort, index at points)
ordered_points = points[np.argsort(np.apply_along_axis(lambda x: np.arctan2(-x[1],-x[0]+4) + np.pi*2, axis=1,arr=points)),:]
#color coding 0-1 as str for gray colormap in matplotlib
plt.scatter(ordered_points[:,0], ordered_points[:,1],c=[str(x) for x in np.arange(len(ordered_points)) / len(ordered_points)],cmap='gray')
Result (in the gray colormap 1 is white and 0 is black; the points are numbered in the 0-1 range by their order):
For points with comparable distances between neighbouring points, we can use a KDTree to get the two closest points for each point. Then we draw lines connecting those to get a closed shape contour. Then we make use of OpenCV's findContours to get the contour traced, always in a counter-clockwise manner. Since OpenCV works on images, we need to sample the data from the provided float format into a uint8 image. Given comparable distances between neighbouring points, that should be pretty safe. Also, OpenCV handles sharp corners in curvatures well, i.e. smooth or non-smooth data would work just fine. And there's no pivot requirement, so all kinds of shapes should be fine to work with.
Here's the implementation:
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.spatial import cKDTree
import cv2
from scipy.ndimage import binary_fill_holes
def counter_clockwise_order(a, DEBUG_PLOT=False):
    b = a-a.min(0)
    d = pdist(b).min()
    c = np.round(2*b/d).astype(int)
    img = np.zeros(c.max(0)[::-1]+1, dtype=np.uint8)
    d1,d2 = cKDTree(c).query(c,k=3)
    b = c[d2]
    p1,p2,p3 = b[:,0],b[:,1],b[:,2]
    for i in range(len(b)):
        # recent OpenCV versions require plain Python ints for point coordinates
        pt1 = tuple(int(v) for v in p1[i])
        cv2.line(img,pt1,tuple(int(v) for v in p2[i]),255,1)
        cv2.line(img,pt1,tuple(int(v) for v in p3[i]),255,1)
    img = (binary_fill_holes(img==255)*255).astype(np.uint8)
    # OpenCV 3.x returns (image, contours, hierarchy); 2.x and 4.x return (contours, hierarchy)
    contours = cv2.findContours(img.copy(),cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)[-2]
    cont = contours[0][:,0]
    f1,f2 = cKDTree(cont).query(c,k=1)
    ordered_points = a[f2.argsort()[::-1]]
    if DEBUG_PLOT==1:
        NPOINTS = len(ordered_points)
        for i in range(NPOINTS):
            plt.plot(ordered_points[i:i+2,0],ordered_points[i:i+2,1],alpha=float(i)/(NPOINTS-1),color='k')
        plt.show()
    return ordered_points
Sample run -
# Load data in a 2D array with 2 columns
a = np.loadtxt('random_shape.csv',delimiter=' ')
ordered_a = counter_clockwise_order(a, DEBUG_PLOT=1)
Output -
I'm currently working on a way to find rectangles/polygons in up to 15 given points (Image below).
Given Points
My goal is to find polygons in that point array, like the ones I marked in the image below. The polygons are rectangles in the real world, but they are slightly distorted, which is why they can look like polygons or other shapes. I must find the best rectangle/polygon.
My idea was to check all connections between the points, but there are too many of them and it took too long.
Does anyone have an idea how to solve this? I researched on the web and found the k-nearest neighbors algorithm in sklearn for Python, but I don't have experience with it, so I don't know whether it is the right way to solve this or how to use it. Maybe I'll also need a method to filter out some of the outliers to make it even easier for the algorithm to find the right corner points of the polygon.
The code snippet below splits the given point string into separate arrays; the array coordinatesOnly contains just the x and y values of the points.
Many thanks for your help.
Polygon in Given Points
import math
import numpy as np
import matplotlib.pyplot as plt
import time
from sklearn.neighbors import NearestNeighbors
millis = round(int(time.time())) / 1000
####input String
print("2D to 3D convert")
resultString = "0,487.50,399.46,176.84,99.99;1,485.93,423.43,-4.01,95.43;2,380.53,433.28,1.52,94.90;3,454.47,397.68,177.07,90.63;4,490.20,404.10,-6.17,89.90;5,623.56,430.52,-176.09,89.00;6,394.66,385.44,90.22,87.74;7,625.61,416.77,-177.95,87.02;8,597.21,591.66,-91.04,86.49;9,374.03,540.89,-11.20,85.77;10,600.51,552.91,178.29,85.52;11,605.29,530.78,-179.89,85.34;12,583.73,653.92,-82.39,84.42;13,483.56,449.58,-91.12,83.37;14,379.01,451.62,-6.21,81.51"
resultString = resultString.split(";")
resultStringSplitted = list()
coordinatesOnly = list()
for i in range(len(resultString)):
    resultStringSplitted.append(resultString[i].split(","))
    newList = ((float(resultString[i].split(",")[1]),float(resultString[i].split(",")[2])))
    coordinatesOnly.append(newList)
    for j in range(len(resultStringSplitted[i])):
        resultStringSplitted[i][j] = float(resultStringSplitted[i][j])
#Check if score is valid
validScoreList = list()
for i in range(len(resultStringSplitted)):
    if resultStringSplitted[i][len(resultStringSplitted[i])-1] != 0:
        validScoreList.append(resultStringSplitted[i])
resultStringSplitted = validScoreList
#Result String array contains all 2D results
# [Point Number, X Coordinate, Y Coordinate, Angle, Point Score]
for i in range(len(resultStringSplitted)):
    plt.scatter(resultStringSplitted[i][1],resultStringSplitted[i][2])
plt.show(block=True)
Since you mentioned that you can have a maximum of 15 points, I suggest checking all possible combinations of 4 points and keeping all quadrilaterals that are close enough to perfect rectangles. For 15 points, that is "only" 15*14*13*12 = 32760 ordered quadruples (C(15,4) = 1365 sets of 4 points, times 4! = 24 orderings each).
import math
import itertools
import numpy as np
coordinatesOnly = ((0,0),(0,1),(1,0),(1,1),(2,0),(2,1),(1,3)) # Test data
rectangles = []
# Returns True if l0 and l1 are within 10% deviation
def isValid(l0, l1):
    if l0 == 0 or l1 == 0:
        return False
    return abs(max(l0,l1)/min(l0,l1) - 1) < 0.1
for p in itertools.combinations(np.array(coordinatesOnly),4):
    for r in itertools.permutations(p,4):
        l01 = np.linalg.norm(r[1]-r[0]) # Side
        l12 = np.linalg.norm(r[2]-r[1]) # Side
        l23 = np.linalg.norm(r[3]-r[2]) # Side
        l30 = np.linalg.norm(r[0]-r[3]) # Side
        l02 = np.linalg.norm(r[2]-r[0]) # Diagonal
        l13 = np.linalg.norm(r[3]-r[1]) # Diagonal
        areSidesEqual = isValid(l01,l23) and isValid(l12,l30)
        isDiag1Valid = isValid(math.sqrt(l01*l01+l30*l30),l13) # Pythagoras
        isDiag2Valid = isValid(math.sqrt(l01*l01+l12*l12),l02) # Pythagoras
        if areSidesEqual and isDiag1Valid and isDiag2Valid:
            rectangles.append(r)
            break
print(rectangles)
It takes about 1 second to run on 15 points on my computer. It really depends on what your requirements for computation time are, i.e. real time, interactive time, or "I just don't want to spend days waiting for the answer" time.
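If the permutation loop is too slow, one alternative (a different test from the answer's, named here plainly) avoids permutations entirely: four points form an exact rectangle if and only if they are all equidistant from their centroid, because the diagonals of a rectangle are equal and bisect each other. A minimal sketch, assuming the same 10% relative tolerance is acceptable for your distortions; find_rectangles is a hypothetical name and it returns index tuples, not corner orderings.

```python
import itertools
import numpy as np

def find_rectangles(points, tol=0.1):
    """Return index tuples of 4-point subsets whose corners are all
    (within tol) equidistant from the centroid of the four points."""
    pts = np.asarray(points, dtype=float)
    found = []
    for combo in itertools.combinations(range(len(pts)), 4):
        quad = pts[list(combo)]
        # distance of each corner to the centroid of the four corners
        d = np.linalg.norm(quad - quad.mean(axis=0), axis=1)
        if d.min() == 0:
            continue  # a corner sits on the centroid: degenerate, not a rectangle
        if d.max() / d.min() - 1 < tol:
            found.append(combo)
    return found

test_points = ((0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (1, 3))
found = find_rectangles(test_points)
```

This checks only C(15,4) = 1365 subsets for 15 points instead of 32760 orderings; the tolerance behaves differently from the side/diagonal test, so it is worth validating on your real data.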
At some point in my work, I came up with this kind of scatter plot.
I would like my script to be able to detect that it "loops" and to give me the point (or an approximation of it) where it does so: for instance, in this case it would be about [0.2, 0.1].
I tried to play around with some representative quantities of my points, like norm and/or argument, like in the following piece of code.
import numpy as np
x,y = np.genfromtxt('points.dat',unpack=True)
norm = np.sqrt(x**2+y**2)
arg = np.arctan2(y,x)
left,right = np.meshgrid(norm,norm)
norm_diff = np.fabs(left - right)
mask = norm_diff == 0.
norm_diff_ma = np.ma.masked_array(norm_diff,mask)
left,right = np.meshgrid(arg,arg)
arg_diff = np.fabs(left - right)
mask = arg_diff == 0.
arg_diff_ma = np.ma.masked_array(arg_diff,mask)
list_of_indices = np.ma.where((norm_diff_ma<1.0e-04)*(arg_diff_ma<1.0e-04))
But it does not work as intended. This might be because the dataset contains too many points, and the distance between two aligned points is of the same order of magnitude as the distance between the points in the "loop cluster"...
I was thinking about detecting clusters, or maybe even detecting lines in the scatter plot and then see if there are any intersections between any two lines, but I am afraid my skills in image processing only go so far.
Is there any algorithm or trick you can think of that would work here?
A representative data sample can be found here.
Edit 08/13/2015 16h18: after the short discussion with #DrBwts, I took a closer look at the data I obtained after a pyplot.contour() call, using the following routine to extract all the vertices:
def contour_points(contour, steps=1):
    # note: ContourSet.collections is deprecated in recent Matplotlib versions
    try:
        loc_arr = np.vstack([path.interpolated(steps).vertices for linecol in contour.collections for path in linecol.get_paths()])
    except ValueError:
        loc_arr = np.empty((0,2))
    return loc_arr
y,x = contour_points(CS,steps=1).T
It turns out the points with coordinates (x,y) are ordered, in the sense that a call to pyplot.plot() connects the dots correctly.
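Since the contour vertices turn out to be ordered, the "loop" point can be looked for as a place where the polyline crosses itself. A minimal brute-force sketch (O(n²) segment-pair tests, vectorised per segment) using the standard 2D cross-product intersection formula; self_intersections is a hypothetical name, and collinear overlapping segments are deliberately not reported.

```python
import numpy as np

def self_intersections(x, y):
    """Return the points where the polyline (x[0],y[0]) -> (x[1],y[1]) -> ...
    crosses itself, testing every pair of non-adjacent segments."""
    p = np.column_stack([x, y]).astype(float)
    a, d = p[:-1], p[1:] - p[:-1]  # segment i runs from a[i] to a[i] + d[i]
    n = len(a)
    hits = []
    for i in range(n - 2):
        j = np.arange(i + 2, n)  # later segments that do not share an endpoint with i
        # solve a[i] + t*d[i] == a[j] + u*d[j] using 2D cross products
        denom = d[i, 0] * d[j, 1] - d[i, 1] * d[j, 0]
        diff = a[j] - a[i]
        with np.errstate(divide="ignore", invalid="ignore"):
            t = (diff[:, 0] * d[j, 1] - diff[:, 1] * d[j, 0]) / denom
            u = (diff[:, 0] * d[i, 1] - diff[:, 1] * d[i, 0]) / denom
        # parallel segments (denom == 0) are skipped
        ok = (denom != 0) & (t >= 0) & (t <= 1) & (u >= 0) & (u <= 1)
        for tk in t[ok]:
            hits.append(a[i] + tk * d[i])
    return np.array(hits).reshape(-1, 2)

hits = self_intersections([0, 2, 2, 0], [0, 2, 0, 2])  # a "Z" whose ends cross
```

On the contour data this would be called as self_intersections(x, y); for very large point sets a spatial index would be needed to keep it fast.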
I was working on clustering a lot of data, which has two different clusters.
The first type is a 6-dimensional cluster, whereas the second type is a 12-dimensional cluster. For now I have decided to use kmeans (as it seems the most intuitive clustering algorithm to start with).
The question is: how can I map these clusters onto a 2D plot so that I can infer whether kmeans is working or not? I would like to use matplotlib, but any other Python package is fine.
Cluster 1 is a cluster made up of these data types (int,float,float,int,float,int)
Cluster 2 is a cluster made up of 12 float types.
I'm trying to get an output similar to this:
Any tips will be useful.
Well, after searching the internet and finding lots of weird, comment-less solutions, I was able to figure out how to do it. Here's the code if you are trying to do something similar. It contains code from various sources, a lot of it written or edited by me. I hope it's easier to understand than others out there.
The function is based on kmeans2 from scipy, which returns the centroid_list and label_list. kmeansdata is the numpy array passed to kmeans2 for clustering, and num_cluster is the number of clusters passed to kmeans2.
The code writes a new png file, making sure it doesn't overwrite an existing one. It also plots only 50 clusters (if you have thousands of clusters, don't try to output all of them).
(It was written for Python 2.7; it should work for other versions too, I guess.)
import os
import random
import colorsys
import numpy
from matplotlib import pyplot as plt
# matplotlib.mlab.PCA was removed in Matplotlib 3.1; scikit-learn's PCA is used instead
from sklearn.decomposition import PCA
def get_colors(num_colors):
    """
    Function to generate a list of randomly generated colors
    The function first generates 256 different colors and then
    we randomly select the number of colors required from it
    num_colors -> Number of colors to generate
    colors -> Consists of 256 different colors
    random_colors -> Randomly returns required(num_color) colors
    """
    colors = []
    random_colors = []
    # Generate 256 different colors and choose num_colors randomly
    for i in numpy.arange(0., 360., 360. / 256.):
        hue = i / 360.
        lightness = (50 + numpy.random.rand() * 10) / 100.
        saturation = (90 + numpy.random.rand() * 10) / 100.
        colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
    for i in range(0, num_colors):
        random_colors.append(colors[random.randint(0, len(colors) - 1)])
    return random_colors
def random_centroid_selector(total_clusters , clusters_plotted):
    """
    Function to generate a list of randomly selected
    centroids to plot on the output png
    total_clusters -> Total number of clusters
    clusters_plotted -> Number of clusters to plot
    random_list -> Contains the index of clusters
    to be plotted
    """
    # sample without replacement so that no cluster is picked twice
    return random.sample(range(total_clusters), min(clusters_plotted, total_clusters))
def plot_cluster(kmeansdata, centroid_list, label_list , num_cluster):
    """
    Function to convert the n-dimensional cluster to
    2-dimensional cluster and plotting 50 random clusters
    name%d.png -> file where the output is stored indexed
    by first available file index
    e.g. name0.png , name1.png ...
    """
    # Project data and centroids onto the first two principal components
    # (unlike mlab.PCA, this does not standardize the columns first)
    pca = PCA(n_components=2).fit(kmeansdata)
    users_2d = pca.transform(kmeansdata)
    centroids_2d = pca.transform(centroid_list)
    colors = get_colors(num_cluster)
    plt.figure()
    plt.xlim([users_2d[:, 0].min() - 3, users_2d[:, 0].max() + 3])
    plt.ylim([users_2d[:, 1].min() - 3, users_2d[:, 1].max() + 3])
    # Plotting 50 clusters only for now
    random_list = random_centroid_selector(num_cluster , 50)
    # Plotting only the centroids which were randomly_selected
    # Centroids are represented as a large 'o' marker
    for i, position in enumerate(centroids_2d):
        if i in random_list:
            plt.scatter(centroids_2d[i, 0], centroids_2d[i, 1], marker='o', color=colors[i], s=100)
    # Plotting only the points whose centers were plotted
    # Points are represented as a small '+' marker
    for i, position in enumerate(label_list):
        if position in random_list:
            plt.scatter(users_2d[i, 0], users_2d[i, 1] , marker='+' , color=colors[position])
    filename = "name"
    i = 0
    while True:
        if not os.path.isfile(filename + str(i) + ".png"):
            #new index found write file and return
            plt.savefig(filename + str(i) + ".png")
            break
        else:
            #Changing index to next number
            i = i + 1
    return
plot_cluster(X[:], kmean.cluster_centers_, kmean.labels_, clusters)
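If neither matplotlib.mlab.PCA (removed in Matplotlib 3.1) nor scikit-learn is available, the projection step can be sketched with plain numpy: center the data, take the top two right-singular vectors from an SVD, and project both the points and the k-means centroids onto them. project_2d and plot_clusters are hypothetical names; this only centers the data, whereas mlab.PCA also scaled each column to unit variance.

```python
import numpy as np

def project_2d(data, centroids):
    """Project n-dimensional points and their k-means centroids onto the
    first two principal components of the data (SVD-based PCA)."""
    data = np.asarray(data, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    mean = data.mean(axis=0)
    # rows of vt are principal directions, sorted by explained variance
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    components = vt[:2]
    return (data - mean) @ components.T, (centroids - mean) @ components.T

def plot_clusters(data_2d, centroids_2d, labels, filename="clusters.png"):
    """Scatter the projected points colored by cluster label, centroids in black."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.scatter(data_2d[:, 0], data_2d[:, 1], c=labels, cmap="tab10", marker="+")
    ax.scatter(centroids_2d[:, 0], centroids_2d[:, 1], c="k", marker="o", s=100)
    fig.savefig(filename)
    plt.close(fig)

# Usage with two synthetic 6-dimensional blobs; the blob means stand in for k-means centroids
rng = np.random.default_rng(1)
blob0, blob1 = rng.normal(0, 1, (50, 6)), rng.normal(5, 1, (50, 6))
data = np.vstack([blob0, blob1])
labels = np.repeat([0, 1], 50)
centroids = np.array([blob0.mean(axis=0), blob1.mean(axis=0)])
d2d, c2d = project_2d(data, centroids)
```

If the clusters are well separated, their projected centroids should sit in the middle of their own point clouds; calling plot_clusters(d2d, c2d, labels) writes the picture to a file.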