I have a list of 2D numpy arrays:
linelist = [[[0,0],[1,0]],[[0,0],[0,1]],[[1,0],[1,1]],[[0,1],[1,1]],[[1,2],[3,1]],[[1,2],[2,2]],[[3,1],[3,2]],[[2,2],[3,2]]]
Each line in linelist is the pair of vertices that forms an edge.
These elements are the lines that form two squares:
-----
| |
-----
-----
| |
-----
I want to form two graphs, one for each square. To do this, I loop over the lines: if neither vertex is present in any existing graph, I start a new graph; if one vertex is already present in an existing graph, the line gets added to that graph. Two lines are connected when they share a vertex. However, I am having trouble coding this.
This is what I have so far:
graphs = [[]]
i = 0
for elements in linelist:
    for graph in graphs:
        if elements[0] not in graph[i] and elements[1] not in graph[i]:
            graphs.append([])
            graphs[i].append(elements)
            i = i + 1
        else:
            graphs[i].append(elements)
I suggest doing a 'diffusion-like' process over the graph to find the disjoint subgraphs. One algorithm that comes to mind is breadth-first search; it works by finding which nodes can be reached from a start node.
linelist = [[[0,0],[1,0]],[[0,0],[0,1]],[[1,0],[1,1]],[[0,1],[1,1]],[[1,2],[3,1]],[[1,2],[2,2]],[[3,1],[3,2]],[[2,2],[3,2]]]
# edge list usually reads v1 -> v2
graph = {}
# however these are lines so symmetry is assumed
for l in linelist:
    v1, v2 = map(tuple, l)
    graph[v1] = graph.get(v1, ()) + (v2,)
    graph[v2] = graph.get(v2, ()) + (v1,)
def BFS(graph):
    """
    Implement breadth-first search
    """
    # get nodes
    nodes = list(graph.keys())
    graphs = []
    # check all nodes
    while nodes:
        # initialize BFS
        toCheck = [nodes[0]]
        discovered = []
        # run bfs
        while toCheck:
            startNode = toCheck.pop()
            for neighbor in graph.get(startNode):
                if neighbor not in discovered:
                    discovered.append(neighbor)
                    toCheck.append(neighbor)
                    nodes.remove(neighbor)
        # add discovered graphs
        graphs.append(discovered)
    return graphs

print(BFS(graph))

for idx, graph in enumerate(BFS(graph)):
    print(f"This is {idx} graph with nodes {graph}")
Output
This is 0 graph with nodes [(1, 0), (0, 1), (0, 0), (1, 1)]
This is 1 graph with nodes [(3, 1), (2, 2), (1, 2), (3, 2)]
You may be interested in the package networkx for analyzing graphs. For instance, finding the disjoint subgraphs is pretty trivial:
import networkx as nx
tmp = [tuple(tuple(j) for j in i) for i in linelist]
graph = nx.Graph(tmp)
for idx, graph in enumerate(nx.connected_components(graph)):
    print(idx, graph)
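If you also want the edges of each square rather than just its vertex set, the induced subgraph of each component gives them back. A minimal sketch reusing the tmp list from above (rebuilt under a new name, since the loop above rebinds graph to each component's node set):
G = nx.Graph(tmp)
for idx, comp in enumerate(nx.connected_components(G)):
    # the induced subgraph of a component contains exactly that square's edges
    print(idx, list(G.subgraph(comp).edges()))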
My approach involves two passes over the list. In the first pass, I look at the vertices and assign a graph number to each (1, 2, ...). If neither vertex has been seen before, I assign a new graph number; otherwise, I assign the vertex to an existing one.
In the second pass, I go through the list and group the edges that belong to the same graph number together. Here is the code:
import collections
import itertools
import pprint
linelist = [[[0,0],[1,0]],[[0,0],[0,1]],[[1,0],[1,1]],[[0,1],[1,1]],[[1,2],[3,1]],[[1,2],[2,2]],[[3,1],[3,2]],[[2,2],[3,2]]]
# First pass: Look at the vertices and figure out which graph they
# belong to
vertices = {}
graph_numbers = itertools.count(1)
for v1, v2 in linelist:
    v1 = tuple(v1)
    v2 = tuple(v2)
    graph_number = vertices.get(v1) or vertices.get(v2) or next(graph_numbers)
    vertices[v1] = graph_number
    vertices[v2] = graph_number
print('Vertices:')
pprint.pprint(vertices)
# Second pass: Sort edges
graphs = collections.defaultdict(list)
for v1, v2 in linelist:
    graph_number = vertices[tuple(v1)]
    graphs[graph_number].append([v1, v2])
print('Graphs:')
pprint.pprint(graphs)
Output:
Vertices:
{(0, 0): 1,
(0, 1): 1,
(1, 0): 1,
(1, 1): 1,
(1, 2): 2,
(2, 2): 2,
(3, 1): 2,
(3, 2): 2}
Graphs:
defaultdict(<type 'list'>, {1: [[[0, 0], [1, 0]], [[0, 0], [0, 1]], [[1, 0], [1, 1]], [[0, 1], [1, 1]]], 2: [[[1, 2], [3, 1]], [[1, 2], [2, 2]], [[3, 1], [3, 2]], [[2, 2], [3, 2]]]})
Notes
I have to convert each vertex from a list to a tuple because a list cannot be a dictionary key.
graphs behaves like a dictionary: the keys are graph numbers (1, 2, ...) and the values are lists of edges.
A little explanation of the line
graph_number = vertices.get(v1) or vertices.get(v2) or next(graph_numbers)
That line is roughly equivalent to:
number1 = vertices.get(v1)
number2 = vertices.get(v2)
if number1 is None and number2 is None:
    graph_number = next(graph_numbers)
elif number1 is not None:
    graph_number = number1
else:
    graph_number = number2
Which says: if neither v1 nor v2 is in vertices, generate a new number (i.e. next(graph_numbers)). Otherwise, assign graph_number to whichever value is not None.
Not only is that line succinct, it takes advantage of Python's short-circuit evaluation: the interpreter first evaluates vertices.get(v1). If this returns a number (1, 2, ...), the interpreter returns that number and skips evaluating the vertices.get(v2) or next(graph_numbers) part.
If vertices.get(v1) returns None, which is falsy in Python, the interpreter evaluates the next segment of the or: vertices.get(v2). Again, if this returns a number, the evaluation stops and that number is returned. If vertices.get(v2) also returns None, the interpreter evaluates the last segment, next(graph_numbers), and returns that value.
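One caveat worth adding (my note, not part of the original answer): the idiom only works because the graph numbers start at 1. A falsy number such as 0 would be skipped by or, as a quick experiment shows:
import itertools

# Hypothetical counter starting at 0: the first value is falsy, so an
# `existing or next(counter)` expression would silently discard it.
counter = itertools.count(0)
first = next(counter)            # 0
print(first or next(counter))    # prints 1, not 0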
Related
I have a simple piece of code to create a graph G in networkx.
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib notebook
G = nx.DiGraph()
G.add_edge(1,2); G.add_edge(1,4)
G.add_edge(3,1); G.add_edge(3,2)
G.add_edge(3,4); G.add_edge(2,3)
G.add_edge(4,3)
I want to find "which node in G is connected to the other nodes by a shortest path of length equal to the diameter of G ".
There are two of these combinations, [1,3] and [2,4], which can be found by nx.shortest_path(G, 1) and nx.shortest_path(G, 2), respectively.
Or, for example,
if I use nx.shortest_path_length(G, source=2) then I get {2: 0, 3: 1, 1: 2, 4: 2}, so the length 2 is from node 2 to node 4, which is OK.
Now, I'm trying to generalise it for all of the nodes to see if I can find the target nodes.
for node in G.nodes():
    target = [k for k, v in nx.shortest_path_length(G, node).items() if v == nx.diameter(G)]
    print(target)
and I get this odd result:
[3]
[1, 4]
[1, 2]
[]
Can anybody explain what this result means? I'm trying to apply this method to solve a bigger problem.
For the graph you provided:
G = nx.DiGraph()
G.add_edge(1,2); G.add_edge(1,4)
G.add_edge(3,1); G.add_edge(3,2)
G.add_edge(3,4); G.add_edge(2,3)
G.add_edge(4,3)
The following:
for node in G.nodes():
    target = [k for k, v in nx.shortest_path_length(G, node).items() if v == nx.diameter(G)]
    print(target)
will print, for each node, the targets whose shortest-path distance from that node equals nx.diameter(G).
I would advise not calculating the diameter inside the for loop since that can turn out quite expensive.
In comparison, for a 200-node graph (nx.barabasi_albert_graph(200, 2, seed=1)), with the diameter calculation outside the for loop it takes ~74 ms. The other option (with the diameter calculation inside the for loop) takes... well, it's still running :´) but I'd say it will take way too long.
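A rough sketch of how that comparison can be reproduced (my own setup, so the numbers will differ from the ~74 ms quoted above):
import timeit
import networkx as nx

G = nx.barabasi_albert_graph(200, 2, seed=1)

def outside_loop():
    diameter = nx.diameter(G)  # computed once, up front
    for node in G.nodes():
        [k for k, v in nx.shortest_path_length(G, node).items() if v == diameter]

# The inside-the-loop variant re-evaluates nx.diameter(G) for every (node, length)
# pair in the comprehension, which is why it appears to run forever.
print(timeit.timeit(outside_loop, number=1))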
Also, instead of just the targets, print the start and end nodes for readability:
diameter = nx.diameter(G)
for node in G.nodes():
    start_end_nodes = [(node, k) for k, v in nx.shortest_path_length(G, node).items() if v == diameter]
    print(start_end_nodes)
yielding:
[(1, 3)]         # the path from 1 to 3 has length 2 = nx.diameter(G)
[(2, 1), (2, 4)] # the paths from 2 to 1 and 2 to 4 have length 2
[(4, 1), (4, 2)] # the paths from 4 to 1 and 4 to 2 have length 2
[]               # there is no path starting at 3 that has length 2
A slight modification of the code from the reply of willcrack above (note the addition of calls to sorted):
diameter = nx.diameter(G)
for node in sorted(G.nodes()):
    start_end_nodes = sorted([(node, k) for k, v in nx.shortest_path_length(G, node).items()
                              if v == diameter])
    print(node, ":", start_end_nodes)
will produce:
1 : [(1, 3)]
2 : [(2, 1), (2, 4)]
3 : []
4 : [(4, 1), (4, 2)]
The point is that G.nodes() may return the nodes in an arbitrary order, depending on the internal representation of the graph, which may store the nodes in an unsorted, set-like structure.
I have a graph with 602647 nodes and 982982 edges. I want to find the first- and second-order contacts (i.e. 1-hop and 2-hop contacts) for each node in the graph in NetworkX.
I built the following code, which worked fine for smaller graphs but never finished running for larger ones (such as the graph above):
hop_1 = {}
hop_2 = {}
row_1 = {}
row_2 = {}
for u, g in G.nodes(data=True):
    row_1.setdefault(u, nx.single_source_shortest_path_length(G, u, cutoff=1))
    row_2.setdefault(u, nx.single_source_shortest_path_length(G, u, cutoff=2))
    hop_1.update(row_1)
    hop_2.update(row_2)
some notes:
results are stored first in a dict (hop_1 and hop_2)
row_1 and row_2 are temporary holding variables
hop_1 will include nodes reachable after one jump
hop_2 will include nodes reachable within one or two jumps
Is there a way to optimize/improve this code so that it finishes running?
To find first- and second-order neighbors you can use the functions all_neighbors() and node_boundary():
hop1 = {}
hop2 = {}
for n in G.nodes():
    neighbs1 = list(nx.all_neighbors(G, n))
    hop1[n] = neighbs1
    hop2[n] = list(nx.node_boundary(G, neighbs1 + [n])) + neighbs1
print(hop1)
# {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3, 4], 3: [0, 1, 2, 4], 4: [2, 3]}
print(hop2)
# {0: [4, 1, 2, 3], 1: [4, 0, 2, 3], 2: [0, 1, 3, 4], 3: [0, 1, 2, 4], 4: [0, 1, 2, 3]}
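For reference, one small graph that reproduces the output printed above is the following; this is my own reconstruction, since the answer does not show the graph it was run on:
import networkx as nx

# Complete graph on {0, 1, 2, 3} plus a node 4 attached to 2 and 3 (a guess
# that happens to match the hop1/hop2 dictionaries shown above).
G = nx.Graph([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)])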
I don't know networkx, but by definition a node that is reachable in one hop is also reachable in <= 2 hops, which is what the docs (and source) of single_source_shortest_path_length give you. You can therefore remove the first call to single_source_shortest_path_length.
Second, your use of dictionaries is very strange! Why are you using setdefault rather than just setting elements? Also, you're copying things a lot with update, which doesn't do anything useful and just wastes time.
I'd do something like:
hop_1 = {}
hop_2 = {}
for u in G.nodes():
    d1 = []
    d2 = []
    for v, n in nx.single_source_shortest_path_length(G, u, cutoff=2).items():
        if n == 1:
            d1.append(v)
        elif n == 2:
            d2.append(v)
    hop_1[u] = d1
    hop_2[u] = d2
which takes about a minute on my laptop with a G_nm graph as generated by:
import networkx as nx
G = nx.gnm_random_graph(602647, 982982)
Note that tqdm is nice for showing the progress of long-running loops; just add from tqdm import tqdm and change the outer for loop to:
for u in tqdm(G.nodes()):
    ...
and you'll get a nice progress bar.
adj_list={1:[2,4],2:[1,3,4,8],3:[2,6,8,7],4:[1,5,2],5:[4,6],6:[3,9,5],7:[3,8,9,10],8:[2,3,7],9:[6,7,10],10:[7,9]}
def func(x, y):
    t = 0
    xx = x
    global i
    for i in range(len(adj_list[xx])):
        if y in adj_list[xx]:
            t = t + 1
            # print(x,y,t)
            break
        else:
            if xx < y:
                t = t + 1
                xx = xx + 1
                i = 0
    print(x, y, t)

func(1,6)
I expect output like:
func(1,10): 1-2-3-7-10 (4) or 1-2-8-7-10 (4)
where 4 is the number of steps from 1 to 10.
If you want a quick and easy implementation in pure Python, you can use recursion to traverse the adjacency list and count the number of steps it takes to get to the destination from each node, then only record whichever path took the fewest steps.
def count_steps(current_vertex, destination, steps=0, calling=0):
    """
    Counts the number of steps between two nodes in an adjacency list
    :param current_vertex: Vertex to start looking from
    :param destination: Node we want to count the steps to
    :param steps: Number of steps taken so far (only used from this method, ignore when calling)
    :param calling: The node that called this function (only used from this method, ignore when calling)
    :return:
    """
    # Start at illegal value so we know it can be changed
    min_steps = -1
    # Check every vertex at the current index
    for vertex in adj_list[current_vertex]:
        if destination in adj_list[current_vertex]:
            # We found the destination in the current array of vertexes
            return steps + 1
        elif vertex > calling:
            # Only move to a vertex greater than the one we came from so we don't end up in a loop
            counted = count_steps(vertex, destination, steps + 1, current_vertex)
            if counted != -1 and (min_steps == -1 or counted < min_steps):
                # If this is actually the least amount of steps we have found
                min_steps = counted
    return min_steps
Note that when we find the destination in the current vertex's array we add one. This is because one more step would be needed to actually get to the node we found.
If you're looking for the least number of steps to get from a specific node to any other node, I would suggest Dijkstra's algorithm. This isn't a problem that is solvable in a single loop; it requires a queue of values that keeps track of the smallest number of steps found so far.
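Since the adjacency list is unweighted, a plain breadth-first search (effectively Dijkstra with unit weights) already gives the minimum number of steps. A minimal sketch, using the adj_list from the question and a hypothetical helper name min_steps:
from collections import deque

def min_steps(adj_list, start, goal):
    """Return the fewest hops from start to goal, or -1 if goal is unreachable."""
    visited = {start}
    queue = deque([(start, 0)])  # (node, distance travelled so far)
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in adj_list[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, dist + 1))
    return -1

print(min_steps(adj_list, 1, 10))  # 4, matching the expected 1-2-3-7-10 path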
You can use networkx for this. Start by building a network using the keys as nodes and the values as edges. A little extra work is necessary for the edges, however, given that edges must be a list of tuples of the form (source_node, dest_node).
So a way to deal with this is to get all key-value combinations from all entries in the dictionary.
For the nodes you'll simply need:
nodes = list(adj_list.keys())
Now lets get the list of edges from the dictionary. For that you can use the following list comprehension:
edges = [(k,val) for k, vals in adj_list.items() for val in vals]
# [(1, 2), (1, 4), (2, 1), (2, 3), (2, 4)...
So, this list contains the entries in the dict as a flat list of tuples:
1: [2, 4] -> (1, 2), (1, 4)
2: [1, 3, 4, 8] -> (2, 1), (2, 3), (2, 4), (2, 8)
...
Now lets build the network with the corresponding nodes and edges:
import networkx as nx
G=nx.Graph()
G.add_edges_from(edges)
G.add_nodes_from(nodes)
Having built the network, in order to find the steps between different nodes, you can use shortest_path, which will give you precisely the shortest path between two given nodes. So if you wanted to find the shortest path between nodes 1 and 10:
nx.shortest_path(G, 1,10)
# [1, 2, 3, 7, 10]
If you're interested in the number of steps, it is the length of this list minus one (or use nx.shortest_path_length, which returns it directly). Let's look at another example:
nx.shortest_path(G, 1,6)
# [1, 2, 3, 6]
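If you only need the number of steps, nx.shortest_path_length returns it directly; it counts edges, i.e. one less than the length of the node list above:
print(nx.shortest_path_length(G, 1, 10))  # 4
print(nx.shortest_path_length(G, 1, 6))   # 3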
This can be checked more easily by plotting the network directly:
import matplotlib.pyplot as plt

nx.draw(G, with_labels=True)
plt.show()
Looking at the plot, for the shortest path between nodes 1 and 10 you can see that the path goes through the nodes [1, 2, 3, 7, 10].
I'm working on a problem that finds the distance (the number of distinct items) between two consecutive uses of an item, in real time. The input is read from a large file (~10G), but for illustration I'll use a small list.
from collections import OrderedDict
unique_dist = OrderedDict()
input = [1, 4, 4, 2, 4, 1, 5, 2, 6, 2]
for item in input:
    if item in unique_dist:
        indx = unique_dist.keys().index(item)  # find the index
        unique_dist.pop(item)                  # pop the item
        size = len(unique_dist)                # find the size of the dictionary
        unique_dist[item] = size - indx        # update the distance value
    else:
        unique_dist[item] = -1                 # -1 if it is new
print input
print unique_dist
As we can see, for each item I first check whether the item is already present in the dictionary; if it is, I update the distance value, otherwise I insert it at the end with the value -1. The problem is that this seems to be very inefficient as the size grows. Memory isn't a problem, but the pop function seems to be. I say that because, purely as a comparison, if I do:
for item in input:
    unique_dist[item] = random.randint(1, 99999)
the program runs really fast. My question is: is there any way I could make my program more efficient (faster)?
EDIT:
It seems that the actual culprit is indx = unique_dist.keys().index(item). When I replaced that with indx = 1, the program was orders of magnitude faster.
According to a simple analysis I did with the cProfile module, the most expensive operations by far are OrderedDict.__iter__() and OrderedDict.keys().
The following implementation is roughly 7 times as fast as yours (according to the limited testing I did).
It avoids the call to unique_dist.keys() by maintaining a separate list of keys. I'm not entirely sure, but I think this also avoids the call to OrderedDict.__iter__().
It avoids the call to len(unique_dist) by incrementing the size variable whenever necessary. (I'm not sure how expensive an operation len(OrderedDict) is, but whatever.)
from collections import OrderedDict

def distance(input):
    dist = []
    key_set = set()
    keys = []
    size = 0
    for item in input:
        if item in key_set:
            index = keys.index(item)
            del keys[index]
            del dist[index]
            keys.append(item)
            dist.append(size - index - 1)
        else:
            key_set.add(item)
            keys.append(item)
            dist.append(-1)
            size += 1
    return OrderedDict(zip(keys, dist))
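Called on the input list from the question, this gives the same result as the original code (a quick sanity check):
print(distance([1, 4, 4, 2, 4, 1, 5, 2, 6, 2]))
# OrderedDict([(4, 1), (1, 2), (5, -1), (6, -1), (2, 1)])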
I modified @Rawing's answer to avoid the overhead of the lookups and insertions in the set data structure.
from random import randint

dist = {}
input = []
for x in xrange(1, 10):
    input.append(randint(1, 5))

keys = []
size = 0
for item in input:
    if item in dist:
        index = keys.index(item)
        del keys[index]
        keys.append(item)
        dist[item] = size - index - 1
    else:
        keys.append(item)
        dist[item] = -1
        size += 1

print input
print dist
How about this:
from collections import OrderedDict

unique_dist = OrderedDict()
input = [1, 4, 4, 2, 4, 1, 5, 2, 6, 2]

for item in input:
    if item in unique_dist:
        indx = unique_dist.keys().index(item)
        # unique_dist.pop(item)              # don't pop the item
        size = len(unique_dist)              # now the dictionary is one element too big
        unique_dist[item] = size - indx - 1  # therefore decrement the value here
    else:
        unique_dist[item] = -1               # -1 if it is new

print input
print unique_dist
[1, 4, 4, 2, 4, 1, 5, 2, 6, 2]
OrderedDict([(1, 2), (4, 1), (2, 2), (5, -1), (6, -1)])
Beware that the entries in unique_dist are now ordered by the first occurrence of each item in the input; yours were ordered by the last occurrence:
[1, 4, 4, 2, 4, 1, 5, 2, 6, 2]
OrderedDict([(4, 1), (1, 2), (5, -1), (6, -1), (2, 1)])
The short question: is there an off-the-shelf function to make a graph from a collection of Python sets?
The longer question: I have several Python sets. They overlap, or some are subsets of others. I would like to make a graph (as in nodes and edges): the nodes are the sets, and the edges are the intersections of the sets, weighted by the number of elements in each intersection. There are several graphing packages for Python (NetworkX, igraph, ...). I am not familiar with the use of any of them. Will any of them make a graph directly from a list of sets, i.e. MakeGraphfromSets(alistofsets)?
If not, do you know of an example of how to take the list of sets and define the edges? It actually looks like it might be straightforward, but an example is always good to have.
It's not too hard to code yourself:
def intersection_graph(sets):
    adjacency_list = {}
    for i, s1 in enumerate(sets):
        for j, s2 in enumerate(sets):
            if j == i:
                continue
            try:
                lst = adjacency_list[i]
            except KeyError:
                adjacency_list[i] = lst = []
            weight = len(s1.intersection(s2))
            lst.append((j, weight))
    return adjacency_list
This function numbers each set with its index within sets. We do this because dict keys must be immutable, which is true of integers but not sets.
Here's an example of how to use this function, and its output:
>>> sets = [set([1,2,3]), set([2,3,4]), set([4,2])]
>>> intersection_graph(sets)
{0: [(1, 2), (2, 1)], 1: [(0, 2), (2, 2)], 2: [(0, 1), (1, 2)]}
def MakeGraphfromSets(sets):
    egs = []
    l = len(sets)
    for i in range(l):
        for j in range(i, l):
            w = sets[i].intersection(sets[j])
            egs.append((i, j, len(w)))
    return egs
    # (source set index, destination set index, length of intersection)

sets = [set([1,2,3]), set([2,3,4]), set([4,2])]

edges = MakeGraphfromSets(sets)

for e in edges:
    print e
OUTPUT:
(0, 0, 3)
(0, 1, 2)
(0, 2, 1)
(1, 1, 3)
(1, 2, 2)
(2, 2, 2)
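If you do want an actual networkx graph from those weighted tuples, add_weighted_edges_from accepts (u, v, weight) triples directly. A small sketch, filtering out the self-loop entries produced above:
import networkx as nx

G = nx.Graph()
# Keep only pairs of distinct sets with a non-empty intersection.
G.add_weighted_edges_from((i, j, w) for i, j, w in edges if i != j and w > 0)
print(G.edges(data=True))
# e.g. [(0, 1, {'weight': 2}), (0, 2, {'weight': 1}), (1, 2, {'weight': 2})]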