I want to build an algorithm in Python to flip linestrings (arrays of coordinates) in a linestring collection that represents segments along a road, so that I can merge all coordinates into a single array in which the coordinates follow the road monotonically in one direction.
My segment collection looks something like this:
segmentCollection = [['1,1', '1,3', '2,3'],
                     ['4,3', '2,3'],
                     ['4,3', '7,10', '5,5']]
EDIT: So the structure is a list of lists of 2D Cartesian coordinate tuples ('1,1', for example, is a point at x=1 and y=1, '7,10' is a point at x=7 and y=10, and so on). The whole problem is to merge all these lists into one list of coordinate tuples that are ordered in the sense of following a road in one direction. These are segments that I get from a road network routing service, but I only get segments, and each segment is directed the way it was digitized in the database, not in the direction you have to drive. I would like to build a single polyline for the navigation route out of them.
So:
- I can assume, that all segments are in the right order
- I cannot assume that the Coordinates of each segment are in the right order
- Therefore I also cannot assume that the first coordinate of the first segment is the beginning
- And I also cannot assume that the last coordinate of the last segment is the end
- (EDIT) Even though I know where the start and end points of my navigation request are located, they do not have to be identical to any of the coordinate tuples in these lists, because they only have to be somewhere near a routing graph element.
The algorithm should iterate through every segment, flip it if necessary, and then append it to the resulting array. For the first segment, the challenge is to find the starting point (the point which is NOT connected to the next segment). Every other segment is then connected by one point to the previous segment in the order (a directed graph).
I'd wonder if there isn't some kind of sorting data structure (sorting tree or anything) which does exactly that. Could you please give some ideas? After messing around a while with loops and array comparisons my brain is knocked out, and I just need a kick into the right direction in the true sense of the word.
If I understand correctly, you don't even need to sort things. I just translated your English text into Python:
def joinSegments( s ):
    if s[0][0] == s[1][0] or s[0][0] == s[1][-1]:
        s[0].reverse()
    c = s[0][:]
    for x in s[1:]:
        if x[-1] == c[-1]:
            x.reverse()
        c += x
    return c
It still contains duplicate points, but removing those should be straightforward.
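For what it's worth, here is a minimal sketch of the same idea that also drops the duplicated joint points and leaves the input untouched. The name join_segments is mine, and it assumes that consecutive segments always share exactly one endpoint and that the first segment is not a closed loop:

```python
def join_segments(segments):
    """Flip segments where needed and merge them into one coordinate list."""
    # Work on copies so the caller's lists are not reversed in place.
    segs = [list(seg) for seg in segments]
    if len(segs) > 1:
        first, second = segs[0], segs[1]
        # If the first point of the first segment touches the second segment,
        # the first segment is digitized against the driving direction.
        if first[0] in (second[0], second[-1]):
            first.reverse()
    route = list(segs[0])
    for seg in segs[1:]:
        if seg[-1] == route[-1]:
            seg.reverse()
        # Skip the shared joint point if the segment really connects.
        route.extend(seg[1:] if seg[0] == route[-1] else seg)
    return route


segmentCollection = [['1,1', '1,3', '2,3'],
                     ['4,3', '2,3'],
                     ['4,3', '7,10', '5,5']]
print(join_segments(segmentCollection))
```

If a segment does not connect to the route at all, this sketch simply appends it unchanged; you may prefer to raise an error there.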
def merge_seg(s):
    index_i = 0
    while index_i+1<len(s):
        index_j=index_i+1
        while index_j<len(s):
            if s[index_i][-1] == s[index_j][0]:
                s[index_i].extend(s[index_j][1:])
                del s[index_j]
            elif s[index_i][-1] == s[index_j][-1]:
                # list.reverse() returns None, so slice a reversed copy instead
                s[index_i].extend(s[index_j][::-1][1:])
                del s[index_j]
            else:
                index_j+=1
        index_i+=1
    result = []
    s.reverse()
    for seg_index in range(len(s)-1):
        result+=s[seg_index][:-1]#use [:-1] to delete the duplicate items
    result+=s[-1]
    return result
In the inner while loop, every segment that continues s[index_i] is appended to s[index_i]; index_i is then incremented until every segment has been processed.
It is therefore easy to prove that after these while loops s[0][0] == s[1][-1], s[1][0] == s[2][-1], etc., so just reverse the list and join the pieces to get your result.
Note: this is the simplest and most straightforward way, but not the most time-efficient.
For more algorithms see: http://en.wikipedia.org/wiki/Sorting_algorithm
You say that you can assume that all segments are in the right order, which means that independently of the coordinates order, your problem is basically to merge sorted arrays.
You would have to flip a segment if it's not defined in the right order, but this doesn't have a single impact on the main algorithm.
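Simply define this reordering function:
def reorder(seg):
    s1 = min(seg)
    e1 = max(seg)
    return (s1, e1)
and this comparison function (Python 2; it is named cmp_segments so that it does not shadow the built-in cmp and recurse into itself):
def cmp_segments(seg1, seg2):
    return cmp(reorder(seg1), reorder(seg2))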
and you are all set, just run a typical merge algorithm:
http://en.wikipedia.org/wiki/Merge_algorithm
And in case, I didn't really understand your problem statement, here's another idea:
Use a segment tree which is a structure that is made exactly to store segments :)
I have two lists of coordinates which should have some overlap (within a certain range) and I'm trying to output a new list which contains all of the coords which are contained in one list but not the other. See first image below for plot of these lists.
Points in each list might be slightly different so I'm allowing for a small amount around each point.
So far I have something like what's shown below which outputs the opposite of what I want - all of the points which are the in common between the two lists.
tolerance = 0.33  # renamed from `range`, which shadowed the built-in range() used below
different_points = [[],[]]
for i in range(len(All_points[0])):
    for j in range(len(Initial_points[1][0])):
        if Initial_points[0][j] - tolerance <= All_points[0][i] <= Initial_points[0][j] + tolerance and Initial_points[1][j] - tolerance <= All_points[1][i] <= Initial_points[1][j] + tolerance:
            different_points[0].append((All_points[1][0][i]))
            different_points[1].append((All_points[1][1][i]))
I'm struggling how to find the opposite list or if there's a much simpler way of doing this as a whole which I'm missing.
Thanks in advance for the help.
Use sets. In particular, intersection and difference.
Either it will help you, or I misunderstood your question completely.
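To make that a bit more concrete: a plain set difference only removes exact matches, so with a tolerance you need a nearest-match test instead. Below is a rough sketch, assuming the points are stored as (x, y) tuples (not the parallel-list layout from the question); points_not_in and the sample data are made up for illustration:

```python
def points_not_in(all_points, initial_points, tolerance=0.33):
    """Return the points of all_points that have no partner in
    initial_points within +/- tolerance on both axes."""
    def close(p, q):
        return abs(p[0] - q[0]) <= tolerance and abs(p[1] - q[1]) <= tolerance

    return [p for p in all_points
            if not any(close(p, q) for q in initial_points)]


all_points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
initial_points = [(0.1, -0.1), (1.2, 0.9)]

# Tolerance-based difference: only the point without a nearby partner remains.
different = points_not_in(all_points, initial_points)
print(different)

# Exact difference with sets: nothing matches exactly, so nothing is removed.
exact = set(all_points) - set(initial_points)
print(exact)
```

This is quadratic in the number of points, which is usually fine for a few thousand points; for much larger lists a spatial index would be the next step.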
If I have this following code that takes in points a,b,c,d and says if a/b intersect with c/d, then we add the intersection point to a list (it should be noted that this is all in a nested for loop and points c/d are more like c[i] and d[i+1] being pairs of subsequent points chosen in a larger list of point values). So basically, a/b is constant, but c/d changes each time to test if a/b intersects it.
This image shows the line a->b (red and green points respectively) and the black points and brown lines represent the lines created by subsequent points. In the left image, the line from a->b will intersect two brown lines, but i only want the code below to return true to the orange circled one (the one closest to the point a) and then append that to sectlist. I also want it to work if there is only one intersection like in the right example
And here is the code:
for i in range(400): #number of total points that together create a polygon
    a = (x[0], y[0])
    b = (0, testpoint1)
    #so a line is made from the first value in x,y and testpoint1
    c = (x[i], y[i])
    d = (x[i+1], y[i+1])
    #and subsequent lines are made for every subsequent value in the original list of values
    if intersect_bool(a,b,c,d) == True:
        sectlist.append(intersect_point(a,b,c,d))
This works fine if a/b intersects with only one pair of values c/d, but how can I only have it return True once (and append only the first intersection made and continue with the script)?
Edit: I added a ton to the code so the question is more intelligible + an image with a description
I didn't quite understand the question but you might be able to use something like
if any([intersect_bool(a, b, c, d) for c, d in *insert iterable here*])
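If the goal is the intersection closest to a, one option is to collect every hit first and then take the minimum by distance. Here is a self-contained sketch; segment_intersection is my own stand-in for the question's intersect_bool/intersect_point pair (which were not shown), and it treats parallel segments as non-intersecting:

```python
import math


def segment_intersection(a, b, c, d):
    """Return the point where segment a-b crosses segment c-d, or None."""
    r = (b[0] - a[0], b[1] - a[1])
    s = (d[0] - c[0], d[1] - c[1])
    denom = r[0] * s[1] - r[1] * s[0]
    if denom == 0:  # parallel or collinear: treated as "no intersection"
        return None
    t = ((c[0] - a[0]) * s[1] - (c[1] - a[1]) * s[0]) / denom
    u = ((c[0] - a[0]) * r[1] - (c[1] - a[1]) * r[0]) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (a[0] + t * r[0], a[1] + t * r[1])
    return None


def first_intersection(a, b, xs, ys):
    """Return the intersection of a-b with the polyline (xs, ys) nearest to a."""
    points = list(zip(xs, ys))
    hits = []
    for c, d in zip(points, points[1:]):
        p = segment_intersection(a, b, c, d)
        if p is not None:
            hits.append(p)
    return min(hits, key=lambda p: math.hypot(p[0] - a[0], p[1] - a[1]),
               default=None)


# The line from (0, 0) to (10, 0) crosses two edges of this polyline.
nearest = first_intersection((0, 0), (10, 0), [2, 2, 6, 6], [-1, 1, 1, -1])
print(nearest)
```

Note that if a itself is a vertex of the polygon, the edges touching a will report a as an intersection, so you may want to skip those edges.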
Update: I Have edited my answer to provide an example.
Since I am still not clear on what some of the points look like, I have created a sample to illustrate one way you can do this, using Shapely.
from shapely.geometry import Point, MultiPoint, LineString
from shapely.ops import nearest_points
# prepare a sample line and list of points
some_line = LineString([(0, 0), (1, 1), (0,3)])
some_points = [Point(1,1), Point(1,2), Point(1,3)]
some_list = list()
# Get all intersecting points
for p in some_points:
    if p.intersects(some_line):
        some_list.append(p)
# convert gathered points to a list of destinations
destinations = MultiPoint(some_list)
# Find nearest items to a given origin point
nearest_geoms = nearest_points(some_line, destinations) # nearest_points(origin, destinations)
# print the nearest item
print(nearest_geoms[0])
I hope this helps in answering your question.
I am trying to find closely lying points and remove duplicate points from some shape data (coordinates) in Python. I name the coordinate nodes 1, 2, 3, and so on, and I'm using the shapely package to create polygons around the node points with
polygons = [Point(nodes[i]).buffer(1) for i in range(len(nodes))]
and to find the cascading ones I use
cascade = cascaded_union(polygons)
The cascade that is returned is a multipolygon with many coordinates listed. I want to know exactly which of my nodes were cascaded (based on the buffer value of 1) so that I can replace them with a new node. How can I find this out?
Instead of using the cascaded_union method, it might be easier to write your own method to check if any two polygons intersect. If I'm understanding what you want to do correctly, you want to find if two polygons overlap, and then delete one of them and edit another accordingly.
You could do something like this (not the best solution, I'll explain why):
def clean_closely_lying_points(nodes):
    polygons = [Point(nodes[i]).buffer(1) for i in range(len(nodes))]
    for i in range(len(polygons) - 1):
        if polygons[i] is None:
            continue
        for j in range(i + 1, len(polygons)):
            if polygons[j] is None:
                continue
            if polygons[i].intersects(polygons[j]):
                polygons[j] = None
                nodes[j] = None
                # now overwrite 'i' so that it's whatever you want it to be, based on the fact that polygons[i] and polygons[j] intersect
                polygons[i] = polygons[i]  # placeholder: put your replacement polygon here
                nodes[i] = nodes[i]  # placeholder: put your replacement node here
However, overall, I feel like the creation of polygons is time intensive and unnecessary. It's also tedious to update the polygons list and the nodes list together. Instead, you could just use the nodes themselves, and use shapely's distance method to check if two points are within 2 units of each other.
This should be mathematically equivalent since the intersection between two circles both of radius 1 means that their center points are at most distance 2 away. In this scenario, your for loops would take a similar structure except they would iterate over the nodes.
def clean_closely_lying_points(nodes):
    point_nodes = [Point(node) for node in nodes] # Cast each of the nodes (which I assume are in tuple form like (x,y), to shapely Points)
    for i in range(len(point_nodes) - 1):
        if point_nodes[i] is None:
            continue
        for j in range(i + 1, len(point_nodes)):
            if point_nodes[j] is None:
                continue
            if point_nodes[i].distance(point_nodes[j]) < 2:
                point_nodes[j] = None
                # placeholder: whatever you want point_nodes[i] to be now that you know that point_nodes[j] was within a distance of 2 (could remain itself)
                point_nodes[i] = point_nodes[i]
    return [node for node in point_nodes if node is not None]
The result of this method would be a list of shapely point objects, with closely lying points eliminated.
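If you mainly want to know which nodes ended up close to each other, you can also list the index pairs directly. Here is a small sketch without shapely, assuming the nodes are plain (x, y) tuples; close_pairs and the sample nodes are just illustrative names:

```python
import math
from itertools import combinations


def close_pairs(nodes, max_dist=2.0):
    """Return index pairs (i, j) of nodes whose distance is at most max_dist.

    Two buffers of radius 1 overlap or touch exactly when their centres
    are at most 2 apart, so max_dist=2.0 mirrors buffer(1).
    """
    return [(i, j)
            for (i, p), (j, q) in combinations(enumerate(nodes), 2)
            if math.hypot(p[0] - q[0], p[1] - q[1]) <= max_dist]


nodes = [(0, 0), (1, 0), (10, 10), (10.5, 10)]
pairs = close_pairs(nodes)
print(pairs)
```

From those pairs you can decide which node of each pair to keep, or build groups of mutually close nodes and replace each group with a single new node.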
Having not worked with Cartesian graphs since high school, I have actually found a need for them relevant to real life. It may be a strange need, but I have to allocate data to points on a Cartesian graph that will be accessible by calling Cartesian coordinates. There need to be infinitely many points on the graph. For example:
^
[-2-2,a ][ -1-2,f ][0-2,k ][1-2,p][2-2,u]
[-2-1,b ][ -1-1,g ][0-1,l ][1-1,q][1-2,v]
<[-2-0,c ][ -1-0,h ][0-0,m ][1-0,r][2-0,w]>
[-2--1,d][-1--1,i ][0--1,n][1-1,s][2-1,x]
[-2--2,e][-1--2,j ][0--2,o][1-2,t][2-2,y]
v
The actual values aren't important. But, say I am on variable m, this would be 0-0 on the cartesian graph. I need to calculate the cartesian coordinates for if I moved up one space, which would leave me on l.
Theoretically, say I have a python variable which == ("0-1"), I believe I need to split it at the -, which would leave x=0, y=1. Then, I would need to perform (int(y)+1), then re-attach x to y with a '-' in between.
What I want to be able to do is call a function with the argument (x+1,y+0), and for the program to perform the above, and then return the cartesian coordinate it has calculated.
I don't actually need to retrieve the value of the space, just the cartesian coordinate. I imagine I could utilise re.sub(), however I am not sure how to format this function correctly to split around the '-', and I'm also not sure how to perform the calculation correctly.
How would I do this?
To represent an infinite lattice, use a dictionary which maps tuples (x,y) to values.
grid = {}
grid[(0,0)] = 'm'
grid[(0,1)] = 'l'
print(grid[(0,0)])
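With tuples as keys, "moving" is just adding an offset to each component. A minimal sketch follows; the move helper is my own name, and the two grid entries are taken from the picture in the question:

```python
# Sparse "infinite" grid: only the cells you actually use are stored.
grid = {(0, 0): 'm', (0, 1): 'l'}


def move(position, dx, dy):
    """Return the coordinate reached from position by the offset (dx, dy)."""
    x, y = position
    return (x + dx, y + dy)


here = (0, 0)
up = move(here, 0, 1)
print(up)            # the new coordinate
print(grid.get(up))  # the value stored there, or None if the cell is empty
```

Using grid.get instead of grid[...] avoids a KeyError for cells that were never filled in.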
I'm not sure I fully understand the problem but I would suggest using a list of lists to get the 2D structure.
Then to look up a particular value you could do coords[x-minX][y-minY] where x,y are the integer indices you want, and minX and minY are the minimum values (-2 in your example).
You might also want to look at NumPy which provides an n-dim object array type that is much more flexible, allowing you to 'slice' each axis or get subranges. The NumPy documentation might be helpful if you are new to working with arrays like this.
EDIT:
To split a string like 0-1 into the constituent integers you can use:
s = '0-1'
[int(x) for x in s.split('-')]
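One caveat: a plain split('-') breaks as soon as a coordinate is negative (for example '-2--1'), because the minus sign and the separator are the same character. A hedged sketch using a regular expression instead; the helper names parse, fmt and shift are mine:

```python
import re

# Two optionally negative integers separated by a single '-'.
COORD = re.compile(r'(-?\d+)-(-?\d+)')


def parse(label):
    """Turn a label such as '0-1' or '-2--1' into an (x, y) tuple of ints."""
    match = COORD.fullmatch(label)
    if match is None:
        raise ValueError('not a coordinate label: %r' % (label,))
    return int(match.group(1)), int(match.group(2))


def fmt(x, y):
    """Turn x and y back into the 'x-y' label format."""
    return '{}-{}'.format(x, y)


def shift(label, dx, dy):
    """Return the label reached by moving (dx, dy) from label."""
    x, y = parse(label)
    return fmt(x + dx, y + dy)


print(shift('0-0', 0, 1))
print(shift('0-0', -1, -1))
```

If you control the format, a separator that cannot appear in a number (such as a comma) would make the parsing simpler.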
You want to create a bidirectional mapping between the variable names and the coordinates, then you can look up coordinates by variable name, apply your function to it, then find the next variable using the new set of coordinates produced by your function.
Mapping between numeric tuples you can apply your function to, and strings usable as keys in a dict, and back, is easy.
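As a rough illustration of such a bidirectional mapping, two dictionaries are enough. The three sample entries and the neighbour function are assumptions of mine, not part of the question:

```python
# Forward mapping: variable name -> coordinate.
name_to_coord = {'m': (0, 0), 'l': (0, 1), 'r': (1, 0)}
# Reverse mapping, built once from the forward one.
coord_to_name = {coord: name for name, coord in name_to_coord.items()}


def neighbour(name, dx, dy):
    """Return the name found (dx, dy) away from name, or None if unmapped."""
    x, y = name_to_coord[name]
    return coord_to_name.get((x + dx, y + dy))


print(neighbour('m', 0, 1))
print(neighbour('m', 1, 0))
```

Both dictionaries have to be updated together whenever an entry changes, otherwise the two directions drift apart.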
I am writing a piece of code which models the evolution of a social network. The idea is that each person is assigned to a node and relationships between people (edges on the network) are given a weight of +1 or -1 depending on whether the relationship is friendly or unfriendly.
Using this simple model you can say that a triad of three people is either "balanced" or "unbalanced" depending on whether the product of the edges of the triad is positive or negative.
So finally, what I am trying to do is implement an Ising-type model, i.e. random edges are flipped and the new relationship is kept if the new network has more balanced triangles (a lower energy) than the network before the flip. If that is not the case, the new relationship is only kept with a certain probability.
Ok so finally onto my question: I have written the following code, however the dataset I have contains ~120k triads, as a result it will take 4 days to run!
Could anyone offer any tips on how I might optimise the code?
Thanks.
#Importing required libraries
try:
    import matplotlib.pyplot as plt
except:
    raise
import networkx as nx
import csv
import random
import math
def prod(iterable):
    p= 1
    for n in iterable:
        p *= n
    return p
def Sum(iterable):
    p= 0
    for n in iterable:
        p += n[3]
    return p
def CalcTriads(n):
    firstgen=G.neighbors(n)
    Edges=[]
    Triads=[]
    for i in firstgen:
        Edges.append(G.edges(i))
    for i in xrange(len(Edges)):
        for j in range(len(Edges[i])):# For node n go through the list of edges (j) for the neighboring nodes (i)
            if set([Edges[i][j][1]]).issubset(firstgen):# If the second node on the edge is also a neighbor of n (its in firstgen) then keep the edge.
                t=[n,Edges[i][j][0],Edges[i][j][1]]
                t.sort()
                Triads.append(t)# Add found nodes to Triads.
    new_Triads = []# Delete duplicate triads.
    for elem in Triads:
        if elem not in new_Triads:
            new_Triads.append(elem)
    Triads = new_Triads
    for i in xrange(len(Triads)):# Go through list of all Triads finding the weights of their edges using G[node1][node2]. Multiply the three weights and append value to each triad.
        a=G[Triads[i][0]][Triads[i][1]].values()
        b=G[Triads[i][1]][Triads[i][2]].values()
        c=G[Triads[i][2]][Triads[i][0]].values()
        Q=prod(a+b+c)
        Triads[i].append(Q)
    return Triads
###### Import sorted edge data ######
li=[]
with open('Sorted Data.csv', 'rU') as f:
    reader = csv.reader(f)
    for row in reader:
        li.append([float(row[0]),float(row[1]),float(row[2])])
G=nx.Graph()
G.add_weighted_edges_from(li)
for i in xrange(800000):
    e = random.choice(li) # Choose random edge
    TriNei=[]
    a=CalcTriads(e[0]) # Find triads of first node in the chosen edge
    for i in xrange(0,len(a)):
        if set([e[1]]).issubset(a[i]): # Keep triads which contain the whole edge (i.e. both nodes on the edge)
            TriNei.append(a[i])
    preH=-Sum(TriNei) # Save the "energy" of all the triads of which the edge is a member
    e[2]=-1*e[2]# Flip the weight of the random edge and create a new graph with the flipped edge
    G.clear()
    G.add_weighted_edges_from(li)
    TriNei=[]
    a=CalcTriads(e[0])
    for i in xrange(0,len(a)):
        if set([e[1]]).issubset(a[i]):
            TriNei.append(a[i])
    postH=-Sum(TriNei)# Calculate the post flip "energy".
    if postH<preH:# If the post flip energy is lower then the pre flip energy keep the change
        continue
    elif random.random() < 0.92: # If the post flip energy is higher then only keep the change with some small probability. (0.92 is an approximate placeholder for exp(-DeltaH)/exp(1) at the moment)
        e[2]=-1*e[2]
The following suggestions won't boost your performance that much because they are not on the algorithmic level, i.e. not very specific to your problem. However, they are generic suggestions for slight performance improvements:
Unless you are using Python 3, change
for i in range(800000):
to
for i in xrange(800000):
The latter one just iterates numbers from 0 to 800000, the first one creates a huge list of numbers and then iterates that list. Do something similar for the other loops using range.
Also, change
j=random.choice(range(len(li)))
e=li[j] # Choose random edge
to
e = random.choice(li)
and use e instead of li[j] subsequently. If you really need an index number, use random.randint(0, len(li)-1).
There are syntactic changes you can make to speed things up, such as replacing your Sum and prod functions with the built-in equivalents sum(x[3] for x in iterable) and reduce(operator.mul, iterable). It is generally faster to use built-in functions or generator expressions than explicit loops.
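For illustration, this is roughly what those two replacements look like. The sample triads are made up, each one carrying its weight product in position 3 as in the question; on Python 3, reduce has to be imported from functools:

```python
import operator
from functools import reduce  # built in on Python 2, in functools on Python 3

# Hypothetical triads: three node ids followed by the product of the weights.
triads = [[1, 2, 3, 1.0], [1, 2, 4, -1.0]]

# Replacement for Sum(): add up element 3 of every triad.
total = sum(t[3] for t in triads)

# Replacement for prod(): multiply all weights, starting from 1.
product = reduce(operator.mul, [1.0, -1.0, -1.0], 1)

print(total, product)
```

Passing 1 as the initial value keeps reduce from raising an error on an empty list.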
As far as I can tell the line:
if set([e[1]]).issubset(a[i]): # Keep triads which contain the whole edge (i.e. both nodes on the edge)
is testing if a float is in a list of floats. Replacing it with if e[1] in a[i]: will remove the overhead of creating two set objects for each comparison.
Incidentally, you do not need to loop through the index values of an array, if you are only going to use that index to access the elements. e.g. replace
for i in range(0,len(a)):
    if set([e[1]]).issubset(a[i]): # Keep triads which contain the whole edge (i.e. both nodes on the edge)
        TriNei.append(a[i])
with
for x in a:
    if set([e[1]]).issubset(x): # Keep triads which contain the whole edge (i.e. both nodes on the edge)
        TriNei.append(x)
However I suspect that changes like this will not make a big difference to the overall runtime. To do that you either need to use a different algorithm or switch to a faster language. You could try running it in pypy - for some cases it can be significantly faster than CPython. You could also try cython, which will compile your code to C and can sometimes give a big performance gain especially if you annotate your code with cython type information. I think the biggest improvement may come from changing the algorithm to one that does less work, but I don't have any suggestions for that.
BTW, why loop 800000 times? What is the significance of that number?
Also, please use meaningful names for your variables. Using single character names or shrtAbbrv does not speed the code up at all, and makes it very hard to follow what it is doing.
There are quite a few things you can improve here. Start by profiling your program using a tool like cProfile. This will tell you where most of the program's time is being spent and thus where optimization is likely to be most helpful. As a hint, you don't need to generate all the triads at every iteration of the program.
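In case it helps, here is a minimal sketch of profiling a piece of code with cProfile and pstats from the standard library. slow_step is only a stand-in for your own loop body:

```python
import cProfile
import io
import pstats


def slow_step(n):
    """Stand-in for one expensive iteration of the simulation."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = slow_step(100000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```

For a whole script you can also run python -m cProfile -s cumulative yourscript.py from the command line without changing the code.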
You also need to fix your indentation before you can expect a decent answer.
Regardless, this question might be better suited to Code Review.
I'm not sure I understand exactly what you are aiming for, but there are at least two changes that might help. You probably don't need to destroy and create the graph every time in the loop since all you are doing is flipping one edge weight sign. And the computation to find the triangles can be improved.
Here is some code that generates a complete graph with random weights, picks a random edge in a loop, finds the triads and flips the edge weight...
import random
import networkx as nx
# complete graph with random 1/-1 as weight
G=nx.complete_graph(5)
for u,v,d in G.edges(data=True):
    d['weight']=random.randrange(-1,2,2) # -1 or 1
edges=list(G.edges()) # a list, so that random.choice works on newer networkx versions too
for i in range(10):
    u,v = random.choice(edges) # random edge
    nbrs = set(G[u]) & set(G[v]) - set([u,v]) # nodes in triads
    triads = [(u,v,n) for n in nbrs]
    print("triads",triads)
    for u,v,w in triads:
        print((u,v,G[u][v]['weight']),(u,w,G[u][w]['weight']),(v,w,G[v][w]['weight']))
    G[u][v]['weight']*=-1
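Going one step further, the energy change of a flip only depends on the triads that contain the flipped edge, so it can be computed locally. The sketch below uses a plain dict of dicts instead of networkx to stay self-contained. It assumes the energy is minus the sum of the triad products, as in the question, and uses the usual Metropolis acceptance rule exp(-delta/temperature), which is my assumption for the "certain probability"; all function names are mine:

```python
import math
import random


def add_edge(weights, u, v, w):
    """Store an undirected edge u-v with weight w in a dict of dicts."""
    weights.setdefault(u, {})[v] = w
    weights.setdefault(v, {})[u] = w


def flip_delta(weights, u, v):
    """Energy change caused by flipping the sign of edge u-v.

    With H = -(sum of triad products), a flip negates the product of every
    triad containing u-v, so delta H = 2 * (sum of those products).
    """
    common = set(weights[u]) & set(weights[v])
    triad_sum = sum(weights[u][v] * weights[u][n] * weights[v][n]
                    for n in common)
    return 2 * triad_sum


def metropolis_step(weights, edges, temperature=1.0, rng=random):
    """Try to flip one random edge; return True if the flip was kept."""
    u, v = rng.choice(edges)
    delta = flip_delta(weights, u, v)
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        weights[u][v] *= -1
        weights[v][u] *= -1
        return True
    return False


# One unbalanced triangle: the product of its weights is -1.
weights = {}
add_edge(weights, 1, 2, 1)
add_edge(weights, 1, 3, 1)
add_edge(weights, 2, 3, -1)
delta_before = flip_delta(weights, 1, 2)
accepted = metropolis_step(weights, [(1, 2)])
print(delta_before, accepted, weights[1][2])
```

Each step then costs only as much as the number of common neighbours of the two end nodes, rather than a rebuild of the whole graph.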