Finding points of a cascaded union of polygons in Python

I am trying to find closely lying points and remove duplicate points in some shape data (coordinates) in Python. I number the coordinate nodes 1, 2, 3, and so on, and I'm using the Shapely package to create polygons around the node points by saying
polygons = [Point(nodes[i]).buffer(1) for i in range(len(nodes))]
and to find the cascading ones I use
cascade = cascaded_union(polygons)
The cascade which is returned is a MultiPolygon with many coordinates listed. I want to know exactly which of the points from my nodes were cascaded (based on the buffer value of 1) so that I can replace them with a new node. How can I find this out?

Instead of using the cascaded_union method, it might be easier to write your own method to check if any two polygons intersect. If I'm understanding what you want to do correctly, you want to find if two polygons overlap, and then delete one of them and edit the other accordingly.
You could do something like this (not the best solution; I'll explain why):
def clean_closely_lying_points(nodes):
    polygons = [Point(nodes[i]).buffer(1) for i in range(len(nodes))]
    for i in range(len(polygons) - 1):
        if polygons[i] is None:
            continue
        for j in range(i + 1, len(polygons)):
            if polygons[j] is None:
                continue
            if polygons[i].intersects(polygons[j]):
                polygons[j] = None
                nodes[j] = None
                # now overwrite entry 'i' so that it's whatever you want it to be,
                # based on the fact that polygons[i] and polygons[j] intersect:
                # polygons[i] = ...
                # nodes[i] = ...
However, overall, I feel like the creation of polygons is time intensive and unnecessary. It's also tedious to update the polygons list and the nodes list together. Instead, you could just use the nodes themselves, and use shapely's distance method to check if two points are within 2 units of each other.
This should be mathematically equivalent since the intersection between two circles both of radius 1 means that their center points are at most distance 2 away. In this scenario, your for loops would take a similar structure except they would iterate over the nodes.
def clean_closely_lying_points(nodes):
    # cast each of the nodes (which I assume are in tuple form like (x,y)) to shapely Points
    point_nodes = [Point(node) for node in nodes]
    for i in range(len(point_nodes) - 1):
        if point_nodes[i] is None:
            continue
        for j in range(i + 1, len(point_nodes)):
            if point_nodes[j] is None:
                continue
            if point_nodes[i].distance(point_nodes[j]) < 2:
                point_nodes[j] = None
                # overwrite point_nodes[i] with whatever you want it to be, now that
                # you know point_nodes[j] was within a distance of 2 (it could remain itself)
    return [node for node in point_nodes if node is not None]
The result of this method would be a list of shapely point objects, with closely lying points eliminated.
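To answer the original question more directly, here is a sketch of one way to recover which input nodes were merged: buffer the points, union the buffers with unary_union (the successor of cascaded_union in recent Shapely releases), and test which original points fall inside each component of the result. The helper name group_merged_nodes is made up for illustration:

```python
from shapely.geometry import Point
from shapely.ops import unary_union

def group_merged_nodes(nodes, buffer_dist=1):
    """Group input coordinates whose buffers of radius buffer_dist overlap.

    Returns a list of groups; each group holds the original nodes whose
    buffered circles merged into one component of the union.
    """
    points = [Point(xy) for xy in nodes]
    merged = unary_union([p.buffer(buffer_dist) for p in points])
    # unary_union returns a single Polygon when everything merged into one
    # component, and a MultiPolygon (with .geoms) otherwise
    components = getattr(merged, "geoms", [merged])
    groups = []
    for comp in components:
        groups.append([n for n, p in zip(nodes, points) if comp.contains(p)])
    return groups

# nodes closer than 2 * buffer_dist end up in the same group
groups = group_merged_nodes([(0, 0), (1, 0), (10, 10)])
```

Each group with more than one member is a cluster of nodes you could replace with a single new node (for example their centroid).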

Program to generate an undirected graph

I'm a beginner in Python, and I'm having trouble solving an exercise on graphs. The exercise is as follows:
A graph G = (V, A) stores the information of a set of vertices and a set of edges. The degree of a vertex is the number of edges incident on it. The degree of a graph is the maximum value of the degrees of its vertices. The "D(degree)" is the inverse idea: the minimum value of the degrees of the vertices. Write a program that takes a series of instructions and processes them to generate an undirected graph.
IV A inserts the vertex with id==A into the graph;
IA A B inserts an edge from the vertex of id==A to the vertex of id==B, if the vertices exist;
RV A removes the vertex of id==A, if it exists, and all edges related to it; and
RA A B removes the edge from the vertex of id==A to the vertex of id==B, if it exists;
Input:
The input consists of a line containing the number 0 ≤ n ≤ 100 indicating the number of operations on the graph, followed by n lines, each containing an instruction as shown. Each id is a string with a maximum of 10 characters.
Output:
Present, in one line, the "D(degree)" of the graph.
Note:
Insert operations overwrite existing information. In the first example, the two vertices have the least number of edges. In the second case, vertex A has the fewest edges. In the last example, vertices A and B have only one edge while C has two.
This is the beginning of my code; I couldn't develop it further:
n = int(input())
G = {}
for i in range(n):
    l = input().split()
    if l[0] == 'IV':
        if l[1] not in G:
            G[l[1]] = []
It's easy actually. You have mentioned that n <= 100, so at most 200 different ids can be introduced. We will maintain a 200x200 array.
So you will do the following things:
Map those ids to integers: keep a dictionary that associates a number with each id. So if they mention IV A, populate the dictionary with {'A': 0}; then on IV B extend it to {'A': 0, 'B': 1}, etc.
When you get an edge removal or addition, first check whether both vertices exist in the above-mentioned dictionary.
In case they don't, ignore the instruction.
In case they do, and it is an edge addition, increase the corresponding entry in the 2d array (which was initialized with 0).
IA A B --> add 1 to positions [0][1] and [1][0] of the 2d array.
RA A B --> subtract 1 from positions [0][1] and [1][0] of the 2d array.
In the end, count how many non-zero entries each row has (considering only the rows corresponding to ids which appeared in the dictionary) and return the minimum of them.
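A minimal sketch of the approach described above, using a dict of sets instead of the 200x200 matrix (equivalent here, and it skips the id-to-integer mapping). The helper name min_degree is an assumption:

```python
def min_degree(ops):
    """Process IV/IA/RV/RA instructions and return the minimum vertex degree.

    ops is a list of instruction strings; adjacency is kept as a dict of sets,
    so the degree of a vertex is just the size of its neighbor set.
    """
    adj = {}
    for line in ops:
        parts = line.split()
        cmd = parts[0]
        if cmd == 'IV':
            adj.setdefault(parts[1], set())
        elif cmd == 'IA':
            a, b = parts[1], parts[2]
            if a in adj and b in adj:  # only if both vertices exist
                adj[a].add(b)
                adj[b].add(a)
        elif cmd == 'RV':
            v = parts[1]
            if v in adj:
                for u in adj.pop(v):  # drop the vertex and all incident edges
                    adj[u].discard(v)
        elif cmd == 'RA':
            a, b = parts[1], parts[2]
            if a in adj and b in adj:
                adj[a].discard(b)
                adj[b].discard(a)
    return min(len(nbrs) for nbrs in adj.values()) if adj else 0

print(min_degree(['IV A', 'IV B', 'IV C', 'IA A B', 'IA B C']))  # 1
```

Here A and C each have one edge while B has two, so the "D(degree)" is 1.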

Python: How to return only the first true value of this if statement?

I have the following code that takes in points a, b, c, d and checks whether segment a/b intersects segment c/d; if so, we add the intersection point to a list. (It should be noted that this is all in a nested for loop, and points c/d are really c[i] and d[i+1]: pairs of subsequent points chosen from a larger list of point values.) So basically a/b is constant, but c/d changes each time, to test whether a/b intersects it.
This image shows the line a->b (red and green points respectively); the black points and brown lines represent the lines created by subsequent points. In the left image, the line from a->b will intersect two brown lines, but I only want the code below to return true for the orange-circled one (the one closest to point a) and then append that to sectlist. I also want it to work if there is only one intersection, as in the right example.
And here is the code:
for i in range(400):  # number of total points that together create a polygon
    a = (x[0], y[0])
    b = (0, testpoint1)
    # so a line is made from the first value in x,y and testpoint1
    c = (x[i], y[i])
    d = (x[i+1], y[i+1])
    # and subsequent lines are made for every subsequent value in the original list of values
    if intersect_bool(a, b, c, d) == True:
        sectlist.append(intersect_point(a, b, c, d))
This works fine if a/b intersects only one pair of values c/d, but how can I have it return True only once (and append only the first intersection made, then continue with the script)?
Edit: I added a ton to the code so the question is more intelligible + an image with a description
I didn't quite understand the question but you might be able to use something like
if any([intersect_bool(a, b, c, d) for c, d in *insert iterable here*])
Update: I have edited my answer to provide an example.
Given that I am still not clear on how some of the points would look, I have created a sample example to illustrate one way you can do this, using Shapely.
from shapely.geometry import Point, MultiPoint, LineString
from shapely.ops import nearest_points

# prepare a sample line and list of points
some_line = LineString([(0, 0), (1, 1), (0, 3)])
some_points = [Point(1, 1), Point(1, 2), Point(1, 3)]
some_list = list()

# gather all intersecting points
for p in some_points:
    if p.intersects(some_line):
        some_list.append(p)

# convert gathered points to a list of destinations
destinations = MultiPoint(some_list)

# find nearest items to a given origin geometry
nearest_geoms = nearest_points(some_line, destinations)  # nearest_points(origin, destinations)

# print the nearest item
print(nearest_geoms[0])
I hope this helps in answering your question.
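Applying the same idea to the original question: gather every crossing of a->b with the polygon's edges and keep only the one nearest point a. This is a sketch with Shapely standing in for the asker's intersect_bool/intersect_point helpers; first_intersection is a made-up name:

```python
from shapely.geometry import LineString, Point

def first_intersection(a, b, poly_pts):
    """Return the crossing of segment a->b with the polygon edges that lies
    closest to a, or None if there is no crossing."""
    ab = LineString([a, b])
    origin = Point(a)
    hits = []
    # walk consecutive point pairs, mirroring the c[i], d[i+1] loop in the question
    for c, d in zip(poly_pts, poly_pts[1:]):
        cross = ab.intersection(LineString([c, d]))
        if not cross.is_empty and cross.geom_type == 'Point':
            hits.append(cross)
    if not hits:
        return None
    nearest = min(hits, key=origin.distance)  # the hit closest to a wins
    return (nearest.x, nearest.y)

# the segment from (0, 0) to (4, 0) crosses two edges; the hit at x=1 wins
pts = [(1, -1), (1, 1), (3, 1), (3, -1)]
print(first_intersection((0, 0), (4, 0), pts))  # (1.0, 0.0)
```

This also covers the single-intersection case in the right-hand image: with one hit, min simply returns it.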

Creating a random edge to a neighbor that excludes one particular node in Networkx

I'm trying to write a function that will remove 2 random edges from a specific node in my graph except an edge with another particular node. I'm using the Python Networkx package. Here's my code:
close = nx.closeness_centrality(self.combined_network)  # calculates closeness
self.most_central_node = max(close.items(), key=itemgetter(1))[0]  # identifies the maximum centrality
self.combined_network.add_edge(self.most_central_node, 'bad_apple_Bad_apple')  # adds an edge from the most central node to a specific node
What I'd like to do next is ask self.most_central_node to break 2 edges (randomly), but not the one I just created with 'bad_apple_Bad_apple'. I tried first selecting most_central_node's neighbors using:
Neighb = (B.neighbors(most_central_node))
That worked, but it returns a list. Now I'm trying to build that list without "bad_apple_Bad_apple"
I tried:
n2 = B.node['bad_apple_0']
n3 = (item for item in Neighb if item not in n2)
That doesn't work, I think. When I print n3 in the console I get:
<generator object <genexpr> at 0x00000000203845E8>
What am I doing wrong? I will then need to select 2 of those at random to ask most_central_node to break its edges with them. Can someone point the way on that too?
I solved it:
if 'bad_apple_0' in Neighb:
    Neighb.remove('bad_apple_0')
Thanks!
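For the second part, picking 2 random neighbors to disconnect while protecting the new edge, here is a sketch along the lines of the fix above (the helper name break_random_edges is invented for illustration):

```python
import random
import networkx as nx

def break_random_edges(G, node, protected, k=2):
    """Remove up to k random edges at `node`, never touching the edge
    to `protected`."""
    # list of neighbors excluding the protected one
    candidates = [n for n in G.neighbors(node) if n != protected]
    # random.sample picks k distinct neighbors without replacement
    for nbr in random.sample(candidates, min(k, len(candidates))):
        G.remove_edge(node, nbr)

G = nx.Graph()
G.add_edges_from([('hub', 'a'), ('hub', 'b'), ('hub', 'c'),
                  ('hub', 'bad_apple_Bad_apple')])
break_random_edges(G, 'hub', 'bad_apple_Bad_apple')
print(G.degree('hub'))  # 2: two of a/b/c removed, the protected edge kept
```

random.sample avoids picking the same neighbor twice, which random.choice called twice could do.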

Ordering linestring direction algorithm

I want to build an algorithm in Python to flip linestrings (arrays of coordinates) in a linestring collection which represents segments along a road, so that I can merge all coordinates into a single array in which the coordinates rise monotonically.
So my segment collection looks something like this:
segmentCollection = [['1,1', '1,3', '2,3'],
                     ['4,3', '2,3'],
                     ['4,3', '7,10', '5,5']]
EDIT: So the structure is a list of lists of 2D cartesian coordinate tuples ('1,1', for example, is a point at x=1 and y=1; '7,10' is a point at x=7 and y=10; and so on). The whole problem is to merge all these lists into one list of coordinate tuples ordered in the sense of following a road in one direction. In fact these are segments which I get from a road network routing service, but I only get segments, where each segment is directed the way it was digitized in the database, not in the direction you have to drive. I would like to get a single polyline for the navigation route out of it.
So:
- I can assume that all segments are in the right order
- I cannot assume that the coordinates of each segment are in the right order
- Therefore I also cannot assume that the first coordinate of the first segment is the beginning
- And I also cannot assume that the last coordinate of the last segment is the end
- (EDIT) Even though I know where the start and end points of my navigation request are located, they do not have to be identical to any of the coordinate tuples in these lists, because they only have to be somewhere near a routing graph element.
The algorithm should iterate through every segment, flip it if necessary, and then append it to the resulting array. For the first segment, the challenge is to find the starting point (the point which is NOT connected to the next segment). All other segments are then connected by one point to the last segment in the order (a directed graph).
I wonder if there isn't some kind of sorting data structure (a sorting tree or anything) which does exactly that. Could you please give some ideas? After messing around a while with loops and array comparisons my brain is knocked out, and I just need a push in the right direction, in the true sense of the word.
If I understand correctly, you don't even need to sort things. I just translated your English text into Python:
def joinSegments(s):
    if s[0][0] == s[1][0] or s[0][0] == s[1][-1]:
        s[0].reverse()
    c = s[0][:]
    for x in s[1:]:
        if x[-1] == c[-1]:
            x.reverse()
        c += x
    return c
It still contains duplicate points, but removing those should be straightforward.
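Removing those duplicates could look like this (a sketch; dedupe_consecutive is a made-up name). Since the join points are duplicated back-to-back, it is enough to drop points that repeat their immediate predecessor:

```python
def dedupe_consecutive(coords):
    """Drop points that repeat their immediate predecessor."""
    out = coords[:1]  # keep the first point (handles the empty list too)
    for p in coords[1:]:
        if p != out[-1]:
            out.append(p)
    return out

print(dedupe_consecutive(['1,1', '2,3', '2,3', '4,3']))  # ['1,1', '2,3', '4,3']
```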
def merge_seg(s):
    index_i = 0
    while index_i + 1 < len(s):
        index_j = index_i + 1
        while index_j < len(s):
            if s[index_i][-1] == s[index_j][0]:
                s[index_i].extend(s[index_j][1:])
                del s[index_j]
            elif s[index_i][-1] == s[index_j][-1]:
                # use a reversed copy: list.reverse() reverses in place and returns None
                s[index_i].extend(s[index_j][::-1][1:])
                del s[index_j]
            else:
                index_j += 1
        index_i += 1
    result = []
    s.reverse()
    for seg_index in range(len(s) - 1):
        result += s[seg_index][:-1]  # use [:-1] to drop the duplicated join points
    result += s[-1]
    return result
In the inner while loop, every successive segment of s[index_i] is appended to s[index_i]; then index_i is incremented until every segment is processed.
Therefore it is easy to prove that after these while loops, s[0][0] == s[1][-1], s[1][0] == s[2][-1], etc., so just reverse the list and put the segments together; finally you will get your result.
Note: it is the most simple and straightforward way, but not the most time efficient.
For more algorithms see: http://en.wikipedia.org/wiki/Sorting_algorithm
You say that you can assume that all segments are in the right order, which means that independently of the coordinates order, your problem is basically to merge sorted arrays.
You would have to flip a segment if it's not defined in the right order, but this doesn't have a single impact on the main algorithm.
Simply define this reordering function:
def reorder(seg):
    s1 = min(seg)
    e1 = max(seg)
    return (s1, e1)
and this comparison function:
def compare(seg1, seg2):
    # don't name this cmp: that would shadow the builtin and recurse forever
    return cmp(reorder(seg1), reorder(seg2))
and you are all set, just run a typical merge algorithm:
http://en.wikipedia.org/wiki/Merge_algorithm
And in case, I didn't really understand your problem statement, here's another idea:
Use a segment tree which is a structure that is made exactly to store segments :)

Optimising model of social network evolution

I am writing a piece of code which models the evolution of a social network. The idea is that each person is assigned to a node and relationships between people (edges on the network) are given a weight of +1 or -1 depending on whether the relationship is friendly or unfriendly.
Using this simple model you can say that a triad of three people is either "balanced" or "unbalanced" depending on whether the product of the edges of the triad is positive or negative.
So finally, what I am trying to do is implement an Ising-type model, i.e. random edges are flipped, and the new relationship is kept if the new network has more balanced triangles (a lower energy) than the network before the flip; if that is not the case, the new relationship is kept only with a certain probability.
OK, so finally on to my question: I have written the following code; however, the dataset I have contains ~120k triads, and as a result it will take 4 days to run!
Could anyone offer any tips on how I might optimise the code?
Thanks.
# importing required libraries
try:
    import matplotlib.pyplot as plt
except:
    raise
import networkx as nx
import csv
import random
import math

def prod(iterable):
    p = 1
    for n in iterable:
        p *= n
    return p

def Sum(iterable):
    p = 0
    for n in iterable:
        p += n[3]
    return p

def CalcTriads(n):
    firstgen = G.neighbors(n)
    Edges = []
    Triads = []
    for i in firstgen:
        Edges.append(G.edges(i))
    for i in xrange(len(Edges)):
        for j in range(len(Edges[i])):  # for node n go through the list of edges (j) for the neighboring nodes (i)
            if set([Edges[i][j][1]]).issubset(firstgen):  # if the second node on the edge is also a neighbor of n (it's in firstgen) then keep the edge
                t = [n, Edges[i][j][0], Edges[i][j][1]]
                t.sort()
                Triads.append(t)  # add found nodes to Triads
    new_Triads = []  # delete duplicate triads
    for elem in Triads:
        if elem not in new_Triads:
            new_Triads.append(elem)
    Triads = new_Triads
    for i in xrange(len(Triads)):  # go through the list of all triads, finding the weights of their edges using G[node1][node2]; multiply the three weights and append the value to each triad
        a = G[Triads[i][0]][Triads[i][1]].values()
        b = G[Triads[i][1]][Triads[i][2]].values()
        c = G[Triads[i][2]][Triads[i][0]].values()
        Q = prod(a + b + c)
        Triads[i].append(Q)
    return Triads

###### Import sorted edge data ######
li = []
with open('Sorted Data.csv', 'rU') as f:
    reader = csv.reader(f)
    for row in reader:
        li.append([float(row[0]), float(row[1]), float(row[2])])

G = nx.Graph()
G.add_weighted_edges_from(li)

for i in xrange(800000):
    e = random.choice(li)  # choose a random edge
    TriNei = []
    a = CalcTriads(e[0])  # find triads of the first node in the chosen edge
    for i in xrange(0, len(a)):
        if set([e[1]]).issubset(a[i]):  # keep triads which contain the whole edge (i.e. both nodes on the edge)
            TriNei.append(a[i])
    preH = -Sum(TriNei)  # save the "energy" of all the triads of which the edge is a member
    e[2] = -1 * e[2]  # flip the weight of the random edge and create a new graph with the flipped edge
    G.clear()
    G.add_weighted_edges_from(li)
    TriNei = []
    a = CalcTriads(e[0])
    for i in xrange(0, len(a)):
        if set([e[1]]).issubset(a[i]):
            TriNei.append(a[i])
    postH = -Sum(TriNei)  # calculate the post-flip "energy"
    if postH < preH:  # if the post-flip energy is lower than the pre-flip energy, keep the change
        continue
    elif random.random() < 0.92:  # if the post-flip energy is higher, keep the change only with some small probability (0.92 is an approximate placeholder for exp(-DeltaH)/exp(1) at the moment)
        e[2] = -1 * e[2]
The following suggestions won't boost your performance that much because they are not on the algorithmic level, i.e. not very specific to your problem. However, they are generic suggestions for slight performance improvements:
Unless you are using Python 3, change
for i in range(800000):
to
for i in xrange(800000):
The latter just iterates the numbers from 0 to 799999; the former creates a huge list of those numbers first and then iterates that list. Do something similar for the other loops that use range.
Also, change
j=random.choice(range(len(li)))
e=li[j] # Choose random edge
to
e = random.choice(li)
and use e instead of li[j] subsequently. If you really need an index number, use random.randint(0, len(li)-1).
There are syntactic changes you can make to speed things up, such as replacing your Sum and prod functions with the built-in equivalents sum(x[3] for x in iterable) and reduce(operator.mul, iterable) - it is generally faster to use built-in functions or generator expressions than explicit loops.
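For example, the two replacements might look like this (a sketch of the suggested built-in equivalents, not tested against the asker's data):

```python
import operator
from functools import reduce

def Sum(iterable):
    # built-in equivalent of the explicit accumulation loop:
    # sum the fourth element of each triad entry
    return sum(x[3] for x in iterable)

def prod(iterable):
    # multiply all elements together, starting from 1
    return reduce(operator.mul, iterable, 1)

print(Sum([[0, 0, 0, 2], [0, 0, 0, 3]]))  # 5
print(prod([2, 3, 4]))                    # 24
```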
As far as I can tell the line:
if set([e[1]]).issubset(a[i]): # Keep triads which contain the whole edge (i.e. both nodes on the edge)
is testing if a float is in a list of floats. Replacing it with if e[1] in a[i]: will remove the overhead of creating two set objects for each comparison.
Incidentally, you do not need to loop through the index values of an array, if you are only going to use that index to access the elements. e.g. replace
for i in range(0, len(a)):
    if set([e[1]]).issubset(a[i]):  # keep triads which contain the whole edge (i.e. both nodes on the edge)
        TriNei.append(a[i])
with
for x in a:
    if set([e[1]]).issubset(x):  # keep triads which contain the whole edge (i.e. both nodes on the edge)
        TriNei.append(x)
However I suspect that changes like this will not make a big difference to the overall runtime. To do that you either need to use a different algorithm or switch to a faster language. You could try running it in pypy - for some cases it can be significantly faster than CPython. You could also try cython, which will compile your code to C and can sometimes give a big performance gain especially if you annotate your code with cython type information. I think the biggest improvement may come from changing the algorithm to one that does less work, but I don't have any suggestions for that.
BTW, why loop 800000 times? What is the significance of that number?
Also, please use meaningful names for your variables. Using single character names or shrtAbbrv does not speed the code up at all, and makes it very hard to follow what it is doing.
There are quite a few things you can improve here. Start by profiling your program using a tool like cProfile. This will tell you where most of the program's time is being spent and thus where optimization is likely to be most helpful. As a hint, you don't need to generate all the triads at every iteration of the program.
You also need to fix your indentation before you can expect a decent answer.
Regardless, this question might be better suited to Code Review.
I'm not sure I understand exactly what you are aiming for, but there are at least two changes that might help. You probably don't need to destroy and create the graph every time in the loop since all you are doing is flipping one edge weight sign. And the computation to find the triangles can be improved.
Here is some code that generates a complete graph with random weights, picks a random edge in a loop, finds the triads and flips the edge weight...
import random
import networkx as nx

# complete graph with random 1/-1 as weight
G = nx.complete_graph(5)
for u, v, d in G.edges(data=True):
    d['weight'] = random.randrange(-1, 2, 2)  # -1 or 1

edges = G.edges()
for i in range(10):
    u, v = random.choice(edges)  # random edge
    nbrs = set(G[u]) & set(G[v]) - set([u, v])  # nodes in triads
    triads = [(u, v, n) for n in nbrs]
    print "triads", triads
    for u, v, w in triads:
        print (u, v, G[u][v]['weight']), (u, w, G[u][w]['weight']), (v, w, G[v][w]['weight'])
    G[u][v]['weight'] *= -1
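Taking the no-rebuild idea one step further, the energy change of a flip can be computed from just the affected triads: since H = -sum(triad products) in the question's code, negating one edge weight negates exactly the products of the triads containing that edge. A sketch (flip_delta is a hypothetical helper):

```python
import networkx as nx

def flip_delta(G, u, v):
    """Energy change from flipping the sign of edge (u, v).

    Only the triads containing (u, v) change, and each of their products
    is simply negated, so no graph rebuild or full recount is needed.
    """
    common = set(G[u]) & set(G[v])  # third nodes of the triads on (u, v)
    s = sum(G[u][v]['weight'] * G[u][w]['weight'] * G[v][w]['weight']
            for w in common)
    # preH = -s over these triads, postH = +s, so delta = postH - preH = 2 * s
    return 2 * s

# all-positive triangle: flipping any edge unbalances it, raising the energy
G = nx.complete_graph(3)
for u, v, d in G.edges(data=True):
    d['weight'] = 1
print(flip_delta(G, 0, 1))  # 2
```

Accept the flip when flip_delta is negative, or with the Metropolis probability otherwise; this removes both G.clear()/add_weighted_edges_from calls and both CalcTriads calls from the inner loop.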