NetworkX largest component no longer working? - python

According to the networkx documentation, connected_component_subgraphs(G) returns a sorted list of all components, so the very first one should be the largest component.
However, when I try to get the largest component of a graph G using the example code on the documentation page
G=nx.path_graph(4)
G.add_edge(5,6)
H=nx.connected_component_subgraphs(G)[0]
I get
TypeError: 'generator' object has no attribute '__getitem__'
It used to work on my other computer with an earlier version of networkx (1.7, I think, but I am not 100% sure).
Now I am using a different computer with Python 2.7.7 and networkx 1.9. Is it a version problem?
I have written a small function of a couple of lines myself to find the largest component; I am just wondering why this error came up.
BTW, I can get the components by converting the generator object to a list.
components = [comp for comp in nx.connected_components(G)]
But the list is not sorted by component size as stated in the documentation.
example:
G = nx.Graph()
G.add_edges_from([(1,2),(1,3),(4,5)])
G.add_nodes_from(range(6,20))
components = [comp for comp in nx.connected_components(G)]
component_size = [len(comp) for comp in components]
print G.number_of_nodes(), G.number_of_edges(), component_size
G = nx.Graph()
G.add_edges_from([(1000,2000),(1000,3000),(4000,5000)])
G.add_nodes_from(range(6,20))
components = [comp for comp in nx.connected_components(G)]
component_size = [len(comp) for comp in components]
print G.number_of_nodes(), G.number_of_edges(), component_size
output:
19 3 [3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
19 3 [2, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
It looks like when the node names are large numbers and there are a bunch of isolated nodes, the returned components are not sorted by size.

The networkx-1.9 documentation is here http://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.algorithms.components.connected.connected_components.html#networkx.algorithms.components.connected.connected_components
The interface was changed to return a generator (as you figured out). The example in the documentation shows how to do what you ask.
Generate a sorted list of connected components, largest first.
>>> G = nx.path_graph(4)
>>> G.add_path([10, 11, 12])
>>> sorted(nx.connected_components(G), key=len, reverse=True)
[[0, 1, 2, 3], [10, 11, 12]]
or
>>> sorted(nx.connected_component_subgraphs(G), key=len, reverse=True)
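So under 1.9 the original one-liner can be replaced by, for example (a sketch of the same idea; max with key=len works because the length of a graph is its number of nodes):
>>> Gc = max(nx.connected_component_subgraphs(G), key=len)  # largest component as a subgraph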

As of version 2.4:
nx.connected_component_subgraphs(G) has been removed.
Instead, to get the same result, use:
connected_subgraphs = [G.subgraph(cc) for cc in nx.connected_components(G)]
And to get the giant component:
gcc = max(nx.connected_components(G), key=len)
giantC = G.subgraph(gcc)
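Applied to the original example, that looks something like the following (a sketch; .copy() is only needed if you want an independent graph rather than a read-only subgraph view):
import networkx as nx

G = nx.path_graph(4)
G.add_edge(5, 6)
gcc = max(nx.connected_components(G), key=len)  # node set of the largest component
giantC = G.subgraph(gcc).copy()                 # independent copy of that subgraph
print(sorted(giantC.nodes()))                   # [0, 1, 2, 3]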

Related

Accessing items in a list, and forming graphs

I have a list of 2D numpy arrays:
linelist = [[[0,0],[1,0]],[[0,0],[0,1]],[[1,0],[1,1]],[[0,1],[1,1]],[[1,2],[3,1]],[[1,2],[2,2]],[[3,1],[3,2]],[[2,2],[3,2]]]
Each line in linelist is the array of the two vertices that the edge connects.
These elements are the lines that form two squares:
-----
|   |
-----

-----
|   |
-----
I want to form two graphs, one for each square. To do this, I use a for loop. If neither vertex of a line is present in any existing graph, then a new graph is started. If one of its vertices is already present in an existing graph, then the line is added to that graph. In order for two lines to be connected, they need to share a vertex in common. However, I am having trouble coding this.
This is what I have so far:
graphs = [[]]
i = 0
for elements in linelist:
    for graph in graphs:
        if elements[0] not in graph[i] and elements[1] not in graph[i]:
            graphs.append([])
            graphs[i].append(elements)
            i = i + 1
        else:
            graphs[i].append(elements)
I suggest doing a 'diffusion-like' process over the graph to find the disjoint subgraphs. One algorithm that comes to mind is breadth-first search; it works by looking for what nodes can be reached from a start node.
linelist = [[[0,0],[1,0]],[[0,0],[0,1]],[[1,0],[1,1]],[[0,1],[1,1]],[[1,2],[3,1]],[[1,2],[2,2]],[[3,1],[3,2]],[[2,2],[3,2]]]

# edge list usually reads v1 -> v2
graph = {}
# however these are lines so symmetry is assumed
for l in linelist:
    v1, v2 = map(tuple, l)
    graph[v1] = graph.get(v1, ()) + (v2,)
    graph[v2] = graph.get(v2, ()) + (v1,)

def BFS(graph):
    """
    Implement breadth-first search
    """
    # get nodes
    nodes = list(graph.keys())
    graphs = []
    # check all nodes
    while nodes:
        # initialize BFS
        toCheck = [nodes[0]]
        discovered = []
        # run bfs
        while toCheck:
            startNode = toCheck.pop()
            for neighbor in graph.get(startNode):
                if neighbor not in discovered:
                    discovered.append(neighbor)
                    toCheck.append(neighbor)
                    nodes.remove(neighbor)
        # add discovered graphs
        graphs.append(discovered)
    return graphs

print(BFS(graph))
for idx, graph in enumerate(BFS(graph)):
    print(f"This is {idx} graph with nodes {graph}")
Output
This is 0 graph with nodes [(1, 0), (0, 1), (0, 0), (1, 1)]
This is 1 graph with nodes [(3, 1), (2, 2), (1, 2), (3, 2)]
You may be interested in the package networkx for analyzing graphs. For instance finding the disjoint subgraphs is pretty trivial:
import networkx as nx

tmp = [tuple(tuple(j) for j in i) for i in linelist]
graph = nx.Graph(tmp)
for idx, graph in enumerate(nx.connected_components(graph)):
    print(idx, graph)
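If you want the actual lines of each square back rather than just the node sets, one way (a sketch, not part of the original answer) is to take the subgraph induced by each component and read off its edges:
G = nx.Graph(tmp)
for idx, cc in enumerate(nx.connected_components(G)):
    square = G.subgraph(cc)        # subgraph induced by this component
    print(idx, list(square.edges()))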
My approach involves two passes over the list. In the first pass, I will look at the vertices and assign a graph number (1, 2, ...) to each. If neither vertex has been seen before, I will assign a new graph number; otherwise, I assign an existing one.
In the second pass, I go through the list and group the edges that belong to the same graph number together. Here is the code:
import collections
import itertools
import pprint

linelist = [[[0,0],[1,0]],[[0,0],[0,1]],[[1,0],[1,1]],[[0,1],[1,1]],[[1,2],[3,1]],[[1,2],[2,2]],[[3,1],[3,2]],[[2,2],[3,2]]]

# First pass: Look at the vertices and figure out which graph they
# belong to
vertices = {}
graph_numbers = itertools.count(1)
for v1, v2 in linelist:
    v1 = tuple(v1)
    v2 = tuple(v2)
    graph_number = vertices.get(v1) or vertices.get(v2) or next(graph_numbers)
    vertices[v1] = graph_number
    vertices[v2] = graph_number

print('Vertices:')
pprint.pprint(vertices)

# Second pass: Sort edges
graphs = collections.defaultdict(list)
for v1, v2 in linelist:
    graph_number = vertices[tuple(v1)]
    graphs[graph_number].append([v1, v2])

print('Graphs:')
pprint.pprint(graphs)
Output:
Vertices:
{(0, 0): 1,
(0, 1): 1,
(1, 0): 1,
(1, 1): 1,
(1, 2): 2,
(2, 2): 2,
(3, 1): 2,
(3, 2): 2}
Graphs:
defaultdict(<type 'list'>, {1: [[[0, 0], [1, 0]], [[0, 0], [0, 1]], [[1, 0], [1, 1]], [[0, 1], [1, 1]]], 2: [[[1, 2], [3, 1]], [[1, 2], [2, 2]], [[3, 1], [3, 2]], [[2, 2], [3, 2]]]})
Notes
I have to convert each vertex from a list to a tuple because a list cannot be used as a dictionary key.
graphs behaves like a dictionary: the keys are graph numbers (1, 2, ...) and the values are lists of edges.
A little explanation of the line
graph_number = vertices.get(v1) or vertices.get(v2) or next(graph_numbers)
That line is roughly equal to:
number1 = vertices.get(v1)
number2 = vertices.get(v2)
if number1 is None and number2 is None:
    graph_number = next(graph_numbers)
elif number1 is not None:
    graph_number = number1
else:
    graph_number = number2
Which says: if neither v1 nor v2 is in vertices, generate a new number (i.e. next(graph_numbers)). Otherwise, assign graph_number to whichever value is not None.
Not only is that line succinct, it takes advantage of Python's short-circuit evaluation: the interpreter first evaluates vertices.get(v1). If this returns a number (1, 2, ...), the interpreter uses that number and skips evaluating the vertices.get(v2) or next(graph_numbers) part.
If vertices.get(v1) returns None, which is falsy in Python, then the interpreter evaluates the next segment of the or: vertices.get(v2). Again, if this returns a number, the evaluation stops and that number is returned. If vertices.get(v2) also returns None, then the interpreter evaluates the last segment, next(graph_numbers), and returns that value.

Find first and second order contacts of each node in a network

I have a graph with 602647 nodes and 982982 edges. I wanted to find the first- and second-order contacts (i.e. 1-hop and 2-hop contacts) for each node in the graph in Networkx.
I built the following code, which worked fine for smaller graphs but never finished running for larger graphs (such as the one above):
hop_1 = {}
hop_2 = {}
row_1 = {}
row_2 = {}
for u, g in G.nodes(data=True):
    row_1.setdefault(u, nx.single_source_shortest_path_length(G, u, cutoff=1))
    row_2.setdefault(u, nx.single_source_shortest_path_length(G, u, cutoff=2))
    hop_1.update(row_1)
    hop_2.update(row_2)
Some notes:
results are stored first in a dict (hop_1 and hop_2)
row_1 and row_2 are temporary holding variables
hop_1 will include nodes reached after one jump
hop_2 will include nodes that are located at both one jump and two jumps
Is there a way to optimize/improve this code so that it finishes running?
To find first and second-order neighbors you can use functions all_neighbors() and node_boundary():
hop1 = {}
hop2 = {}
for n in G.nodes():
    neighbs1 = list(nx.all_neighbors(G, n))
    hop1[n] = neighbs1
    hop2[n] = list(nx.node_boundary(G, neighbs1 + [n])) + neighbs1
print(hop1)
# {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3, 4], 3: [0, 1, 2, 4], 4: [2, 3]}
print(hop2)
# {0: [4, 1, 2, 3], 1: [4, 0, 2, 3], 2: [0, 1, 3, 4], 3: [0, 1, 2, 4], 4: [0, 1, 2, 3]}
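For reference, the printed results above correspond to a small test graph that the answer does not show; something like the following (an assumed example for illustration, and the ordering inside each list may differ) reproduces neighbourhoods of that shape:
import networkx as nx

# small 5-node example graph assumed to match the output above
G = nx.Graph([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)])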
I don't know networkx, but by definition a node that is reachable in one hop is also reachable in <=2 hops, which is what the docs (and source) of single_source_shortest_path_length give you. You can therefore remove the first call to single_source_shortest_path_length.
Second, your use of dictionaries is very strange! Why are you using setdefault rather than just setting elements? Also, you're copying things a lot with update, which doesn't do anything useful and just wastes time.
I'd do something like:
hop_1 = {}
hop_2 = {}
for u in G.nodes():
    d1 = []
    d2 = []
    for v, n in nx.single_source_shortest_path_length(G, u, cutoff=2).items():
        if n == 1:
            d1.append(v)
        elif n == 2:
            d2.append(v)
    hop_1[u] = d1
    hop_2[u] = d2
which takes about a minute on my laptop with a G_nm graph as generated by:
import networkx as nx
G = nx.gnm_random_graph(602647, 982982)
Note that tqdm is nice for showing the progress of long-running loops: just add from tqdm import tqdm and change the outer for loop to be:
for u in tqdm(G.nodes()):
    ...
and you'll get a nice bar reporting progress.
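An alternative that avoids the shortest-path bookkeeping altogether is to work with adjacency sets directly; a minimal sketch (not from the original answers, following the asker's convention that hop_2 also includes the 1-hop nodes):
hop_1 = {}
hop_2 = {}
for u in G.nodes():
    first = set(G[u])            # direct neighbours of u
    second = set(first)          # hop_2 starts from the 1-hop nodes
    for v in first:
        second.update(G[v])      # add neighbours of neighbours
    second.discard(u)            # do not count the start node itself
    hop_1[u] = first
    hop_2[u] = second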

How to store calculated values?

I have been trying to write code that gives the number of ways of reaching a specified sum. This is very similar to the subset sum problem which I have found online (Finding all possible combinations of numbers to reach a given sum).
I modified the code slightly so that it reuses numbers multiple times.
object_list = [(2, 50), (3, 100), (5, 140)]  # the first number in the tuple is my weight and the second is my cost
max_weight = 17
weight_values = [int(i[0]) for i in object_list]
cost_values = [int(i[1]) for i in object_list]

def subset_sum(objects, max_weight, weights=[]):
    w = sum(weights)
    if w == max_weight:
        print("sum(%s)=%s" % (weights, max_weight))
    if w >= max_weight:
        return
    for i in range(len(objects)):
        o = objects[i]
        subset_sum(objects, max_weight, weights + [o])

if __name__ == "__main__":
    subset_sum(weight_values, max_weight)
    print(subset_sum(weight_values, max_weight))
This gives the solution:
sum([2, 2, 2, 2, 2, 2, 2, 3])=17
sum([2, 2, 2, 2, 2, 2, 3, 2])=17
sum([2, 2, 2, 2, 2, 2, 5])=17
sum([2, 2, 2, 2, 2, 3, 2, 2])=17
...
And so on.
Unlike the original, I am using a list of tuples and taking the first value of each tuple to make one list; I do the same with the second value.
The part I am currently stuck on is how to store these values and reuse them in the next part of the code. I had a look at this post but I couldn't understand it (Python: How to store the result of an executed function and re-use later?).
So I want to store the [2, 2, 2, 2, 2, 2, 2, 3] part of the solution sum([2, 2, 2, 2, 2, 2, 2, 3])=17. I want to do this for all solutions, because in the next step I am going to replace each number with the second part of its tuple (so 2 will be replaced by 50, because 2 comes from the tuple (2, 50)). Then I am going to sum the replaced numbers for each solution and print the highest value (probably by sorting the solutions from highest to lowest and printing the first one).
I tried using a dictionary to replace the values after the calculation, but I couldn't manage to do it.
I tried:
dictionary = dict(zip(weight_values, cost_values))
Any help is appreciated. Before anyone asks, I have looked online for solutions and have no one else to ask for help, since the only person I know with a background in coding is my brother, who isn't at home.
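To make the intent concrete, here is one possible sketch (not an answer from the original thread; it keeps the recursion above but collects each solution in a list instead of printing it, then scores every solution with the weight-to-cost dictionary; the name subset_sum_collect is just an illustrative variant):
def subset_sum_collect(objects, max_weight, weights=[], solutions=None):
    # hypothetical variant of the poster's function that returns the solutions
    if solutions is None:
        solutions = []
    w = sum(weights)
    if w == max_weight:
        solutions.append(weights)
    if w >= max_weight:
        return solutions
    for o in objects:
        subset_sum_collect(objects, max_weight, weights + [o], solutions)
    return solutions

weight_to_cost = dict(zip(weight_values, cost_values))          # {2: 50, 3: 100, 5: 140}
solutions = subset_sum_collect(weight_values, max_weight)
costs = [sum(weight_to_cost[w] for w in s) for s in solutions]  # cost of each weight combination
best_cost, best_solution = max(zip(costs, solutions), key=lambda pair: pair[0])
print(best_cost, best_solution)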

Python: Dictionary to Sparse Vector

I am new to Python and programming in general. I was working on Pyschool exercises Topic 8, Q 11 on converting a dictionary to a sparse vector.
I was asked to write a function that converts a dictionary back to its sparse vector representation.
Examples
>>> convertDictionary({0: 1, 3: 2, 7: 3, 12: 4})
[1, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 4]
>>> convertDictionary({0: 1, 2: 1, 4: 2, 6: 1, 9: 1})
[1, 0, 1, 0, 2, 0, 1, 0, 0, 1]
>>> convertDictionary({})
[]
I have attempted many times. Below is the latest code I have:
def convertDictionary(dictionary):
    k=dictionary.keys()
    v=dictionary.values()
    result=[]
    for i in range(0,max(k)):
        result.append(0)
        for j in k:
            result[j]=v[k.index(j)]
    return result
The returned error is:
Traceback (most recent call last):
File "Code", line 8, in convertDictionary
IndexError: list assignment index out of range
Could anyone help me? Thank you so much!
Something like this should suffice:
M = max(dictionary, default=-1)
vector = [dictionary.get(i, 0) for i in range(M + 1)]
Translated into a plain old for-loop:
M = max(dictionary, default=-1)
vector = []
for i in range(M + 1):
    vector.append(dictionary.get(i, 0))
The get method lets you provide a default as a second argument in case the key is missing. Once you get more advanced you could use a defaultdict.
Edit: the default parameter for max requires Python 3.4 or later. On earlier versions you can either use exception handling (generally preferred) or an explicit check for an empty dictionary to deal with that case.
Your code is logically close, but you have a problem of indentation, and the result list needs max(k) + 1 slots rather than max(k). Your function should be:
def convertDictionary(dictionary):
    k = list(dictionary.keys())
    v = list(dictionary.values())
    result = []
    for i in range(0, max(k) + 1):
        result.append(0)
    for j in k:
        result[j] = v[k.index(j)]
    return result
The problem is that your second for was inside the first. What you want is to build a list with max(k) + 1 elements and then put the right values into it, so the two for loops should come one after the other rather than one inside the other.

Creating a diff array using lambda functions in python

I wish to create a diff array in python as follows
>>> a = [1,5,3,8,2,4,7,6]
>>> diff = []
>>> a = sorted(a,reverse=True)
>>> for i in xrange(len(a)-1):
...     diff.append(a[i]-a[i+1])
I wanted to refactor the above code and tried to achieve it using lambda functions, but failed to get the result.
>>> [i for i in lambda x,y:y-x,sorted(a,reverse=True)]
The above code returns
[<function <lambda> at 0x00000000023B9C18>, [1, 2, 3, 4, 5, 6, 7, 8]]
I wished to know whether the required functionality can be achieved using lambda functions or any other technique. Thanks in advance for any help!
NOTES:
1) Array 'a' can be huge. Just for the sake of example I have taken a small array.
2) The result must be achieved in minimum time.
If you can use numpy:
import numpy as np

a = [1,5,3,8,2,4,7,6]

j = np.diff(np.sort(a))  # array([1, 1, 1, 1, 1, 1, 1])
print list(j)
# [1, 1, 1, 1, 1, 1, 1]

k = np.diff(a)  # array([ 4, -2,  5, -6,  2,  3, -1])
print list(k)
# [4, -2, 5, -6, 2, 3, -1]
Timing comparisons with one-hundred-thousand random ints - numpy is faster if the data needs to be sorted:
import random
from timeit import Timer

a = [random.randint(0, 1000000) for _ in xrange(100000)]
##print a[:100]

def foo(a):
    a = sorted(a, reverse=True)
    return [a[i]-a[i+1] for i in xrange(len(a)-1)]

def bar(a):
    return np.diff(np.sort(a))

t = Timer('foo(a)', 'from __main__ import foo, bar, np, a')
print t.timeit(10)
# 0.86916993838

t = Timer('bar(a)', 'from __main__ import foo, bar, np, a')
print t.timeit(10)
# 0.28586356791
You can use list comprehension, as follows:
>>> a = sorted([1,5,3,8,2,4,7,6], reverse=True)
>>> diff = [a[i]-a[i+1] for i in xrange(len(a)-1)]
>>> diff
[1, 1, 1, 1, 1, 1, 1]
>>>
You said or any other technique, so I take this to be valid. However, I haven't found a working lambda solution yet :)
Comparing the time of this answer with all of the below:
Mine:
1.59740447998e-05 seconds
#Marcin's
0.00110197067261 seconds
#roippi's
0.000382900238037
#wwii's
0.00154685974121
Therefore, mine was clearly the fastest by more than twice, followed by #roippi, followed by #Marcin, followed by #wwii.
P.S. I was completely unbiased here, my timing method was using current time.time() minus previous time.time().
a = [1,5,3,8,2,4,7,6]
a = sorted(a,reverse=True)
Can't really improve these lines. You need to transform your data by sorting it, no sense changing what you've done.
from itertools import islice, izip, starmap
from operator import sub
list(starmap(sub,izip(a,a[1:])))
Out[12]: [1, 1, 1, 1, 1, 1, 1]
If a is really massive, you can replace the a[1:] slice with islice to save on memory overhead:
list(starmap(sub,izip(a,islice(a,1,None))))
Though if it is really that massive, you should probably be using numpy anyway.
np.diff(a) * -1
Out[24]: array([1, 1, 1, 1, 1, 1, 1])
You could do as follows:
diff = [v[0] - v[1] for v in zip(sorted(a,reverse=True)[0:-1], sorted(a,reverse=True)[1:])]
#gives: diff = [1, 1, 1, 1, 1, 1, 1]
Though here you use sorting twice. Not sure if this matters to you or not.
As #aj8uppal suggested, it's better to sort a first, so in this case you do:
a = sorted([1,5,3,8,2,4,7,6], reverse=True)
diff = [v[0] - v[1] for v in zip(a[0:-1], a[1:])]
#gives: diff = [1, 1, 1, 1, 1, 1, 1]
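For anyone on Python 3 (where xrange and itertools.izip no longer exist), the same idea can be written with plain zip, which is already lazy; a minimal sketch:
a = sorted([1, 5, 3, 8, 2, 4, 7, 6], reverse=True)
diff = [x - y for x, y in zip(a, a[1:])]  # pairwise differences of the descending sort
print(diff)  # [1, 1, 1, 1, 1, 1, 1]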
