I carefully read the docs, but it still is unclear to me how to use G.forEdges(), described as an "experimental edge iterator interface".
Let's say that I want to decrease the density of my graph. I have a sorted list of weights, and I want to remove edges based on their weight until the graph splits into two connected components. Then I'll select the minimum number of links that keeps the graph connected. I would do something like this:
cc = components.ConnectedComponents(G).run()
while cc.numberOfComponents() == 1:
    for weight in weightlist:
        for (u, v) in G.edges():
            if G.weight(u, v) == weight:
                G.removeEdge(u, v)
By the way, I know from the docs that there is this edge iterator, which probably does the iteration in a more efficient way. But from the docs I really can't understand how to use this forEdges correctly, and I can't find a single example on the internet. Any ideas?
Or maybe an alternative idea for doing what I want: since it's a huge graph (125 million links) the iteration will take forever, even though I am working on a cluster.
NetworKit iterators accept a callback function, so if you want to iterate over edges (or nodes) you have to define a function and then pass it to the iterator as a parameter. You can find more information here. For example, a simple function that just prints all edges is:
# Callback function.
# To iterate over edges it must accept 4 parameters:
def myFunction(u, v, weight, edgeId):
    print("Edge from {} to {} has weight {} and id {}".format(u, v, weight, edgeId))

# Using the iterator with the callback function
G.forEdges(myFunction)
Now, if you want to keep removing edges whose weight is in your weightlist until the graph splits into two connected components, you also have to update the connected components of the graph, since ConnectedComponents will not do that for you automatically (this may also be one of the reasons why the iteration takes forever). To do this efficiently, you can use the DynConnectedComponents class (see my example below). In this case, I think the edge iterator will not help you much, so I would suggest keeping the for loop.
from networkit import *

# Efficiently updates connected components after edge updates
cc = components.DynConnectedComponents(G).run()

# Removes edges with weight equal to w until the components split
def removeEdges(w):
    for (u, v) in G.edges():
        if G.weight(u, v) == w:
            G.removeEdge(u, v)
            # Updating connected components
            event = dynamic.GraphEvent(dynamic.GraphEvent.EDGE_REMOVAL, u, v, w)
            cc.update(event)
            if cc.numberOfComponents() > 1:
                # Components did split
                return True
    # Components did not split
    return False

if cc.numberOfComponents() == 1:
    for weight in weightlist:
        if removeEdges(weight):
            break
This should speed up your original code a bit. However, it is still sequential code, so even if you run it on a multi-core machine it will use only one core.
Related
My network has several connected components.
I would need to calculate, for each of them, the diameter and radius.
I have tried to loop through each of them, but something is going wrong (error: AttributeError: 'set' object has no attribute 'order')
N_comp = [len(c) for c in sorted(nx.connected_components(G), key=len, reverse=True)]  # this should return the maximum number of connected components in the network
for i in range(N_comp):
    # Calculate average diameter and average path length of connected components
    print('Average diameter:{}'.format(nx.diameter(nx.connected_components(G), key=len)))
    print('average path length: {}'.format(nx.average_shortest_path_length(nx.connected_components(G), key=len)))
From the documentation, connected_components returns "A generator of sets of nodes, one for each component of G."
But these are not subgraphs. You'll need to take the induced subgraph of G on each of these sets of nodes.
From the documentation: "to create the induced subgraph of each component use: S = [G.subgraph(c).copy() for c in nx.connected_components(G)]"
Once you've got S, you can do the calculations you want on the elements of S (which is something similar to what I think you were trying to create in the list N_comp)
In your code there are additional issues. It looks to me like you've set up a for loop over range(N_comp), but N_comp is a list of component sizes, not an integer. You also never use the thing you're looping over, and I don't see any attempt to calculate the diameter of the individual components and then average them.
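Putting that together, a minimal sketch of the per-component calculation (the toy graph here is just an illustration; use your own G):
import networkx as nx

# Toy graph with two connected components; replace with your own G.
G = nx.Graph([(1, 2), (2, 3), (4, 5), (5, 6)])

# Induced subgraph of each component, as in the documentation quote above
S = [G.subgraph(c).copy() for c in nx.connected_components(G)]

diameters = [nx.diameter(g) for g in S]
path_lengths = [nx.average_shortest_path_length(g) for g in S]

print('Average diameter: {}'.format(sum(diameters) / len(diameters)))
print('Average path length: {}'.format(sum(path_lengths) / len(path_lengths)))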
Description of the problem:
The objective is to extract the component that a certain vertex belongs to in order to calculate its size.
Steps of the code:
Use the igraph method clusters() to get the list of all connected components (c.c) in the graph.
Then, iterate over the list of connected components, checking each time whether that node belongs to the current component.
When it is found, calculate the component's size.
The code is as follows:
def sizeofcomponent(clusters, vertex):
    for i in range(len(clusters)):
        if str(vertex) in clusters.subgraphs()[i].vs["name"]:
            return len(clusters.subgraphs()[i].vs["name"])
The problem is that this code will be used with extremely large graphs, and this way of doing things slows my code down a lot. Is there a way to improve it?
EDIT 01: Explanation of how the algorithm works
Suppose that the following graph is the main graph:
The Maximal Independent Set (MIS) is calculated and we get the following graph that I call components:
Randomly add a node from the main graph such that the node belongs to the main graph but not to components (i.e. it isn't part of the MIS). Example: adding node 10 to components.
Calculate the size of the component it forms.
The process is repeated for all nodes that don't belong to components (the MIS).
In the end, the node that forms the smallest component (smallest size) is the one added permanently to components.
Your solution:
When the following code is executed (i being the vertex):
cls = components.clusters()
c = cls.membership[i]
The value of c would come from the membership list. Example: node (2) belongs to the component with id 1.
Why it wouldn't work for me:
The following line of code wouldn't give me the correct result:
cls = components.clusters()
c = cls.membership[i]
because the ids of the nodes in the membership list don't match up with the names of the nodes. Example: cls.membership[i] would raise an IndexError (list index out of range) instead of returning the correct result, which is 4.
Also, from your code, the size in this case would be calculated in the following way:
c = components.membership[i]
s = components.membership.count(c)
You can simply get the component vertex i belongs to by doing
components = G.clusters()
c = components.membership[i]
You can then get the size of component c using
s = components.size(c)
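If your vertices are identified by a "name" attribute rather than by their index (which seems to be the issue described in the edit above), a minimal sketch of the same idea; the toy graph and the name lookup here are assumptions, not your actual data:
from igraph import Graph

# Toy graph; in practice this is your (much larger) graph.
G = Graph.Famous("Zachary")
G.vs["name"] = [str(v.index) for v in G.vs]

components = G.clusters()

def size_of_component(graph, components, vertex_name):
    # Translate the vertex "name" into its index, then look up its component id.
    idx = graph.vs.find(name=vertex_name).index
    c = components.membership[idx]
    # size(c) avoids building any subgraphs, so it stays fast on large graphs.
    return components.size(c)

print(size_of_component(G, components, "10"))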
I am looking for a way to get the indices of the local maxima of a tensor using TensorFlow exclusively.
tl;dr
I am not a data scientist. I don't know much about the theory behind much of computer vision, but I am trying to build a computer vision app using TensorFlow. I plan on saving my model and calling it as a service using TF Serving, so I can't depend on external libraries such as numpy, scipy, etc. What I want to accomplish is algorithmically the same as scipy's signal.argrelextrema, but in a way that can be saved with my model and rerun. Other algorithms for this have been shown here, but none execute within TensorFlow. Can anyone point me in the right direction?
EDIT
My first solution was functional, but inefficient. It required five passes over the tensor (zero-trail, reverse, zero-trail, reverse, where). I now have a solution that requires only two passes, and is flexible enough to quickly identify local minima as well...
import numpy as np
import tensorflow as tf

def get_slope(prev, cur):
    # A: Ascending
    # D: Descending
    # P: PEAK (on previous node)
    # V: VALLEY (on previous node)
    return tf.cond(prev[0] < cur,
                   lambda: (cur, ascending_or_valley(prev, cur)),
                   lambda: (cur, descending_or_peak(prev, cur)))

def ascending_or_valley(prev, cur):
    return tf.cond(tf.logical_or(tf.equal(prev[1], 'A'), tf.equal(prev[1], 'V')),
                   lambda: np.array('A'), lambda: np.array('V'))

def descending_or_peak(prev, cur):
    return tf.cond(tf.logical_or(tf.equal(prev[1], 'A'), tf.equal(prev[1], 'V')),
                   lambda: np.array('P'), lambda: np.array('D'))

def label_local_extrema(tens):
    """Return a vector of chars indicating ascending, descending, peak, or valley slopes"""
    # initializer element values don't matter, just the type.
    initializer = (np.array(0, dtype=np.float32), np.array('A'))
    # First, get the slope for each element
    slope = tf.scan(get_slope, tens, initializer)
    # shift by one, since each slope indicator is the slope
    # of the previous node (necessary to identify peaks and valleys)
    return slope[1][1:]

def find_local_maxima(tens):
    """Return the indices of the local maxima of the first dimension of the tensor"""
    return tf.squeeze(tf.where(tf.equal(label_local_extrema(tens), 'P')))
End EDIT
Ok, I've managed to find a solution, but it's not pretty. The following function takes a 1D tensor, and reduces all points that are not local maxima to zero. This will work only for positive numbers, and would require modification for datatypes other than float32, but it meets my needs.
There has to be a better way to do this, though.
def zero_descent(prev, cur):
    """Reduces all descent steps to zero"""
    return tf.cond(prev[0] < cur, lambda: (cur, cur), lambda: (cur, 0.0))

def skeletonize_1d(tens):
    """Reduces all points other than local maxima to zero"""
    # initializer element values don't matter, just the type.
    initializer = (np.array(0, dtype=np.float32), np.array(0, dtype=np.float32))
    # First, zero out the trailing side
    trail = tf.scan(zero_descent, tens, initializer)
    # Next, let's make the leading side the trailing side
    trail_rev = tf.reverse(trail[1], [0])
    # Now zero out the leading (now trailing) side
    lead = tf.scan(zero_descent, trail_rev, initializer)
    # Finally, undo the reversal for the result
    return tf.reverse(lead[1], [0])

def find_local_maxima(tens):
    return tf.where(skeletonize_1d(tens) > 0)
Pseudo:
input_matrix == max_pool(input_matrix)
Explanation:
When an input value is the same as the one produced by max pooling over its neighborhood, it is the greatest value in that neighborhood.
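A minimal sketch of that idea for a 1-D tensor, using tf.nn.max_pool2d and TF 2 eager execution for brevity; the window size of 3 is an assumption, and plateau values are flagged as maxima too:
import tensorflow as tf

def local_maxima_indices(x, window=3):
    # Compare each element with the maximum over a sliding window;
    # positions where the value equals the window maximum are local maxima.
    x4d = tf.reshape(x, [1, -1, 1, 1])  # max_pool2d expects a 4-D NHWC tensor
    pooled = tf.nn.max_pool2d(x4d, ksize=[1, window, 1, 1],
                              strides=[1, 1, 1, 1], padding="SAME")
    mask = tf.reshape(tf.equal(x4d, pooled), [-1])
    return tf.squeeze(tf.where(mask), axis=-1)

x = tf.constant([1.0, 3.0, 2.0, 5.0, 4.0])
print(local_maxima_indices(x))  # -> [1, 3] for this example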
I don't think you are giving enough information to say much. First of all, I'm not sure whether you want to get the maximum element of a tensor (there is a function in tf for this) or to find the local maxima of a function (not a tensor). In the latter case, you can negate the function and find its local minima, which gives you what you are looking for.
I have a large network to analyze. For example:
import networkx as nx
import random
BA = nx.random_graphs.barabasi_albert_graph(1000000, 3)
nx.info(BA)
I have to shuffle the edges while keeping the degree distribution unchanged. The basic idea was introduced by Maslov. Thus, my colleague and I wrote a shuffleNetwork function in which we operate on a network object G for Num iterations. edges is a list object.
The problem is that this function runs too slowly for large networks. I tried using a set or dict instead of a list for the edges object (sets and dicts are hash tables). However, since we also need to delete and add elements, the overhead becomes even larger.
Do you have any suggestions on further optimising this function?
def shuffleNetwork(G, Num):
    edges = G.edges()
    l = range(len(edges))
    for n in range(Num):
        i, j = random.sample(l, 2)
        a, b = edges[i]
        c, d = edges[j]
        if a != d and c != b:
            if not ((a, d) in edges or (d, a) in edges or (c, b) in edges or (b, c) in edges):
                edges[i] = (a, d)
                edges[j] = (c, b)
    K = nx.from_edgelist(edges)
    return K
import timeit
start = timeit.default_timer()
#Your statements here
gr = shuffleNetwork(BA, 1000)
stop = timeit.default_timer()
print stop - start
You should consider using nx.double_edge_swap
The documentation is here. It looks like it does exactly what you want, but modifies the graph in place.
I'm not sure whether it will solve the speed issues, but it does avoid generating the list, so I think it will do better than what you've got.
You would call it with nx.double_edge_swap(G,nswap=number)
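For example, a minimal sketch of how it could be called on the graph from the question; the nswap and max_tries values here are placeholders you would tune:
import networkx as nx

BA = nx.random_graphs.barabasi_albert_graph(1000000, 3)

# Swaps edges in place; every node keeps its degree.
# max_tries bounds how many attempts are made before giving up.
nx.double_edge_swap(BA, nswap=1000, max_tries=100000)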
I am writing a piece of code which models the evolution of a social network. The idea is that each person is assigned to a node and relationships between people (edges on the network) are given a weight of +1 or -1 depending on whether the relationship is friendly or unfriendly.
Using this simple model you can say that a triad of three people is either "balanced" or "unbalanced" depending on whether the product of the edges of the triad is positive or negative.
So finally, what I am trying to do is implement an Ising-type model: random edges are flipped, and the new relationship is kept if the new network has more balanced triangles (a lower energy) than the network before the flip; if that is not the case, the new relationship is only kept with a certain probability.
OK, so finally onto my question: I have written the following code; however, the dataset I have contains ~120k triads, and as a result it will take 4 days to run!
Could anyone offer any tips on how I might optimise the code?
Thanks.
# Importing required libraries
try:
    import matplotlib.pyplot as plt
except:
    raise
import networkx as nx
import csv
import random
import math

def prod(iterable):
    p = 1
    for n in iterable:
        p *= n
    return p

def Sum(iterable):
    p = 0
    for n in iterable:
        p += n[3]
    return p

def CalcTriads(n):
    firstgen = G.neighbors(n)
    Edges = []
    Triads = []
    for i in firstgen:
        Edges.append(G.edges(i))
    for i in xrange(len(Edges)):
        for j in range(len(Edges[i])):  # For node n go through the list of edges (j) for the neighboring nodes (i)
            if set([Edges[i][j][1]]).issubset(firstgen):  # If the second node on the edge is also a neighbor of n (it's in firstgen) then keep the edge.
                t = [n, Edges[i][j][0], Edges[i][j][1]]
                t.sort()
                Triads.append(t)  # Add found nodes to Triads.
    new_Triads = []  # Delete duplicate triads.
    for elem in Triads:
        if elem not in new_Triads:
            new_Triads.append(elem)
    Triads = new_Triads
    for i in xrange(len(Triads)):  # Go through the list of all Triads, finding the weights of their edges using G[node1][node2]. Multiply the three weights and append the value to each triad.
        a = G[Triads[i][0]][Triads[i][1]].values()
        b = G[Triads[i][1]][Triads[i][2]].values()
        c = G[Triads[i][2]][Triads[i][0]].values()
        Q = prod(a + b + c)
        Triads[i].append(Q)
    return Triads

###### Import sorted edge data ######
li = []
with open('Sorted Data.csv', 'rU') as f:
    reader = csv.reader(f)
    for row in reader:
        li.append([float(row[0]), float(row[1]), float(row[2])])

G = nx.Graph()
G.add_weighted_edges_from(li)

for i in xrange(800000):
    e = random.choice(li)  # Choose a random edge
    TriNei = []
    a = CalcTriads(e[0])  # Find triads of the first node in the chosen edge
    for i in xrange(0, len(a)):
        if set([e[1]]).issubset(a[i]):  # Keep triads which contain the whole edge (i.e. both nodes on the edge)
            TriNei.append(a[i])
    preH = -Sum(TriNei)  # Save the "energy" of all the triads of which the edge is a member

    e[2] = -1 * e[2]  # Flip the weight of the random edge and create a new graph with the flipped edge
    G.clear()
    G.add_weighted_edges_from(li)
    TriNei = []
    a = CalcTriads(e[0])
    for i in xrange(0, len(a)):
        if set([e[1]]).issubset(a[i]):
            TriNei.append(a[i])
    postH = -Sum(TriNei)  # Calculate the post-flip "energy".

    if postH < preH:  # If the post-flip energy is lower than the pre-flip energy, keep the change
        continue
    elif random.random() < 0.92:  # If the post-flip energy is higher, only keep the change with some small probability. (0.92 is an approximate placeholder for exp(-DeltaH)/exp(1) at the moment)
        e[2] = -1 * e[2]
The following suggestions won't boost your performance that much because they are not on the algorithmic level, i.e. not very specific to your problem. However, they are generic suggestions for slight performance improvements:
Unless you are using Python 3, change
for i in range(800000):
to
for i in xrange(800000):
The latter just iterates over the numbers from 0 to 800000, while the former creates a huge list of numbers and then iterates over that list. Do something similar for the other loops using range.
Also, change
j=random.choice(range(len(li)))
e=li[j] # Choose random edge
to
e = random.choice(li)
and use e instead of li[j] subsequently. If you really need an index number, use random.randint(0, len(li)-1).
There are syntactic changes you can make to speed things up, such as replacing your Sum and prod functions with the built-in equivalents sum(x[3] for x in iterable) and reduce(operator.mul, iterable); it is generally faster to use builtin functions or generator expressions than explicit loops.
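For instance, a rough sketch of those two replacements (the sample data here is made up):
import operator
from functools import reduce  # in Python 2, reduce is also available as a builtin

triad_weights = [1, -1, 1]
product = reduce(operator.mul, triad_weights)  # replaces prod()

# each triad stores its weight product at index 3, as in the question's code
triads = [[1, 2, 3, 1], [2, 3, 4, -1]]
energy = sum(t[3] for t in triads)             # replaces Sum()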
As far as I can tell the line:
if set([e[1]]).issubset(a[i]): # Keep triads which contain the whole edge (i.e. both nodes on the edge)
is testing if a float is in a list of floats. Replacing it with if e[1] in a[i]: will remove the overhead of creating two set objects for each comparison.
Incidentally, you do not need to loop through the index values of an array, if you are only going to use that index to access the elements. e.g. replace
for i in range(0, len(a)):
    if set([e[1]]).issubset(a[i]):  # Keep triads which contain the whole edge (i.e. both nodes on the edge)
        TriNei.append(a[i])
with
for x in a:
    if set([e[1]]).issubset(x):  # Keep triads which contain the whole edge (i.e. both nodes on the edge)
        TriNei.append(x)
However I suspect that changes like this will not make a big difference to the overall runtime. To do that you either need to use a different algorithm or switch to a faster language. You could try running it in pypy - for some cases it can be significantly faster than CPython. You could also try cython, which will compile your code to C and can sometimes give a big performance gain especially if you annotate your code with cython type information. I think the biggest improvement may come from changing the algorithm to one that does less work, but I don't have any suggestions for that.
BTW, why loop 800000 times? What is the significance of that number?
Also, please use meaningful names for your variables. Using single character names or shrtAbbrv does not speed the code up at all, and makes it very hard to follow what it is doing.
There are quite a few things you can improve here. Start by profiling your program using a tool like cProfile. This will tell you where most of the program's time is being spent and thus where optimization is likely to be most helpful. As a hint, you don't need to generate all the triads at every iteration of the program.
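For instance, a minimal profiling sketch, assuming the main loop has been wrapped in a function (run_simulation is a hypothetical name):
import cProfile
import pstats

# Profile the hypothetical run_simulation() and show the 10 most expensive calls
cProfile.run('run_simulation()', 'profile.out')
pstats.Stats('profile.out').sort_stats('cumulative').print_stats(10)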
You also need to fix your indentation before you can expect a decent answer.
Regardless, this question might be better suited to Code Review.
I'm not sure I understand exactly what you are aiming for, but there are at least two changes that might help. You probably don't need to destroy and create the graph every time in the loop since all you are doing is flipping one edge weight sign. And the computation to find the triangles can be improved.
Here is some code that generates a complete graph with random weights, picks a random edge in a loop, finds the triads and flips the edge weight...
import random
import networkx as nx

# complete graph with random 1/-1 as weight
G = nx.complete_graph(5)
for u, v, d in G.edges(data=True):
    d['weight'] = random.randrange(-1, 2, 2)  # -1 or 1

edges = G.edges()
for i in range(10):
    u, v = random.choice(edges)  # random edge
    nbrs = set(G[u]) & set(G[v]) - set([u, v])  # nodes in triads
    triads = [(u, v, n) for n in nbrs]
    print "triads", triads
    for u, v, w in triads:
        print (u, v, G[u][v]['weight']), (u, w, G[u][w]['weight']), (v, w, G[v][w]['weight'])
    G[u][v]['weight'] *= -1