I have a (un-directed) graph represented using adjacency lists, e.g.
a: b, c, e
b: a, d
c: a, d
d: b, c
e: a
where each node of the graph is linked to a list of other node(s)
I want to update such a graph given some new list(s) for certain node(s), e.g.
a: b, c, d
where a is no longer connected to e, and is connected to a new node d
What would be an efficient (both time and space wise) algorithm for performing such updates to the graph?
Maybe I'm missing something, but wouldn't it be fastest to use a dictionary (or default dict) of node-labels (strings or numbers) to sets? In this case update could look something like this:
def update(graph, node, edges, undirected=True):
    # graph: dict(str->set(str)), node: str, edges: set(str), undirected: bool
    if undirected:
        for e in graph[node]:
            graph[e].remove(node)
        for e in edges:
            graph[e].add(node)
    graph[node] = set(edges)  # copy, so later changes to edges don't alias the graph
Using sets and dicts, adding and removing the node to/from the edge-sets of the other nodes should be O(1), same as replacing the edge-set for the node itself, so this should be only O(n) for the two loops, with n being the average number of edges of a node.
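A quick self-contained check of this approach, using the example graph from the question (the function is restated so the snippet runs on its own; discard is used so a missing back-edge doesn't raise):

```python
def update(graph, node, edges, undirected=True):
    # graph: dict(str -> set(str)); edges: the new neighbour set for node
    if undirected:
        for e in graph[node]:
            graph[e].discard(node)  # drop old back-edges to node
        for e in edges:
            graph[e].add(node)      # add back-edges for the new list
    graph[node] = set(edges)        # copy, so the caller's set isn't aliased

graph = {'a': {'b', 'c', 'e'}, 'b': {'a', 'd'}, 'c': {'a', 'd'},
         'd': {'b', 'c'}, 'e': {'a'}}
update(graph, 'a', {'b', 'c', 'd'})
# now e has no neighbours, and d gained a
```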
Using an adjacency grid (matrix) would make it O(n) to update, but would take n^2 space, regardless of how sparse the graph is. (Each changed relationship is trivially updated by flipping both the (i,j) and the (j,i) entry.)
Using lists would put the time up to O(n^2) for updating, but for sparse graphs would not take a huge time penalty, and would save a lot of space.
A typical update is del edge a,e; add edge a,d, but your update looks like a new adjacency list for vertex a. So simply find the a adjacency list and replace it. That should be O(log n) time (assuming sorted array of adjacency lists, like in your description).
I am looking for a way to generate all possible directed graphs from an undirected template. For example, given this graph "template":
I want to generate all six of these directed versions:
In other words, for each edge in the template, choose LEFT, RIGHT, or BOTH direction for the resulting edge.
There is a huge number of outputs for even a small graph, because there are 3^E valid permutations (where E is the number of edges in the template graph), but many of them are duplicates (specifically, they are isomorphic to another output). Take these two, for example:
I only need one.
I'm curious first: is there a term for this operation? This must be a formal and well-understood process already?
And second, is there a more efficient algorithm to produce this list? My current code (Python, NetworkX, though that's not important for the question) looks like this, which has two things I don't like:
I generate all permutations even if they are isomorphic to a previous graph
I check isomorphism at the end, so it adds additional computational cost
Results := empty list
T := the template (undirected graph)
For i in range(3^E):
    Create an empty directed graph G
    Convert i to ternary
    For each nth edge (A, B) in T:
        If the nth digit of i in ternary is 1:
            Add the edge to G as (A, B)
        If the nth digit of i in ternary is 2:
            Add the edge to G as (B, A)
        If the nth digit of i in ternary is 0:
            Add both (A, B) and (B, A) to G
    If G is isomorphic to any graph already in Results:
        Skip G
    Else:
        Add G to Results
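The manual base-3 conversion can be replaced by itertools.product, which enumerates the digit choices directly. This sketch (plain Python, no NetworkX; the 0=both / 1=forward / 2=reverse encoding follows the pseudocode above) only does the enumeration step, leaving the isomorphism filtering as a separate pass:

```python
from itertools import product

def orientations(edges):
    # edges: undirected edge list [(a, b), ...]
    # each edge independently gets one of three choices, so
    # product(range(3), repeat=E) replaces converting i to ternary
    for choice in product(range(3), repeat=len(edges)):
        g = []
        for (a, b), c in zip(edges, choice):
            if c == 1:
                g.append((a, b))             # forward only
            elif c == 2:
                g.append((b, a))             # reverse only
            else:
                g.extend([(a, b), (b, a)])   # both directions
        yield g

# a two-edge path template has 3**2 == 9 orientations
all_orients = list(orientations([("A", "B"), ("B", "C")]))
```

Each yielded edge list could then be loaded into a directed-graph object and compared against the kept results with an isomorphism test, as in the pseudocode.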
Here is the situation:
I have a graph type structure, an adjacency list, and each element of this adjacency list is a 1 dimensional array (either numpy, or bcolz.. not sure if I will use bcolz or not).
Each 1-dimensional array represents graph elements that could possibly connect, in the form of binary sequences. For them to connect, they need to have a specific bitwise intersection value.
Therefore, for each 1 dimensional array in my adjacency list, I want to do the bitwise "and" between every combination of two elements in the given array.
This will possibly be used for huge graph breadth-first traversal, so we may be talking a very very large number of elements.
Is this something I can do with vectorized operations? Should I be using a different structure? What is a good way to do this? I am willing to completely restructure everything if there could be a significant performance boost.
Is it as simple as looping through the individual elements and then broadcasting (correct terminology?) & against the entire array? Thanks.
quick edit
As an extra note, I am using Python integers for my byte sequences, which from my understanding don't play well with numpy (the integers get too big for fixed-width types like long long), so I have to create arrays of object type. Does this potentially cause a huge slowdown? Is it a reason to use a different structure?
An Example
# create an n x n adjacency list, where n is the number of graph nodes
# map each graph node to a value 2^k:
nodevals = {}
for i in xrange(n):
    nodevals[i] = 2**(i+1)

# each edge in our graph is comprised of two nodes, which are mapped as
# powers of two; take their bitwise OR (equal to their sum here, since the
# values are distinct powers of two) and place it in the adjacency list:
adjlist = [[[] for _ in xrange(n)] for _ in xrange(n)]
for i in xrange(n):
    for j in xrange(n):
        adjlist[i][j].append(nodevals[i] | nodevals[j])

# We now have our first adjacency list, which is just bare edges. These
# edges can be connected by row or column, by taking the intersection of
# the sum (nodevals[i] | nodevals[j]) with the other edge
# (nodevals[i2] | nodevals[j2]) and checking if it equals the connection
# point for each.
# This may not seem useful for individual edges, but in future iterations
# we can do this:

# After 3 iterations: (5,1) connected to (1,9), and then this connected
# to (7,5), for example (operating on the stored values, i.e. the list
# elements, not the lists themselves):
adjlist[5][1][0] & adjlist[1][9][0] == nodevals[1]
adjlist2[5][9][0] == adjlist[5][1][0] | adjlist[1][9][0]
adjlist[7][5][0] & adjlist2[5][9][0] == nodevals[5]
adjlist3[7][9][0] == adjlist[7][5][0] | adjlist2[5][9][0]

# So you may see how this could be useful for efficient traversal.
# However, it becomes more complicated: as we increase the length of our
# subpaths (or "pseudo-edges", or whatever you want to call them), the
# arrays for a given (i, j) hold more and more subpath sums that can
# potentially be connected.
# Soon the arrays can become very large, which is when I would want to be
# able to efficiently calculate intersections.
# AND, for this problem in particular, I want to be able to connect edges
# against the SAME edge, so I want to do the bitwise intersection between
# all pairs of elements in the given array (i.e. at the given indices
# [i][j] of that adjacency list).
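For fixed-width integers, the all-pairs bitwise AND the question asks about is a single broadcast expression. A minimal sketch (the three-element array is made up; with object-dtype arrays of arbitrarily large Python ints the same expression still works, but falls back to per-element Python calls, losing most of the speed benefit):

```python
import numpy as np

# three bit-pattern "edges" packed into a fixed-width dtype
arr = np.array([0b0110, 0b0011, 0b1100], dtype=np.uint64)

# arr[:, None] has shape (3, 1), arr[None, :] has shape (1, 3);
# broadcasting the & gives every pairwise intersection at once
pair_and = arr[:, None] & arr[None, :]   # shape (3, 3)
```

If the bit sequences grow past 64 bits, an alternative to object arrays is splitting each sequence across several fixed-width columns and AND-ing column-wise, which keeps the operation vectorized.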
My problem involves creating a directed graph, checking if it is unique by comparing it to a text file containing graphs, and, if it is unique, appending it to the file. What would be the best representation of a graph to use in that case?
I'm using Python and I'll be using brute-force to check if graphs are isomorphic, since the graphs are small and have some restrictions.
There is a standard text-based format called DOT which allows you to work with directed and undirected graphs, and would give you the benefit of using a variety of different libraries to work with your graphs. Notably Graphviz, which allows you to read, write, and render DOT files (NetworkX can also load DOT graphs and plot them using matplotlib).
Assuming that this is a simple case of how the graphs are represented, you might be OK with a simple CSV format where each line is a single edge and there's some separator between graphs, e.g.:
graph_4345345
A,B
B,C
C,E
E,B
graph_3234766
F,D
B,C
etc.
You could then make use of https://docs.python.org/3/library/csv.html
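A sketch of parsing that layout with the csv module (the convention that a line starting with graph_ opens a new section is my reading of the example above):

```python
import csv
from collections import defaultdict

def read_graphs(lines):
    # lines starting with "graph_" open a new section;
    # every other non-empty line "A,B" is one edge of the current graph
    graphs = defaultdict(list)
    current = None
    for row in csv.reader(lines):
        if len(row) == 1 and row[0].startswith("graph_"):
            current = row[0]
        elif row:
            graphs[current].append((row[0], row[1]))
    return dict(graphs)

graphs = read_graphs(["graph_4345345", "A,B", "B,C", "C,E", "E,B",
                      "graph_3234766", "F,D", "B,C"])
```

The same function works on a file handle, since csv.reader accepts any iterable of lines.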
I guess it depends on how you are going to represent your graph as a data structure.
The two most known graph representations as data structures are:
Adjacency matrices
Adjacency lists
Adjacency matrices
For a graph with |V| vertices, an adjacency matrix is a |V| × |V| matrix of 0s and 1s, where the entry in row i and column j is 1 if and only if the edge (i,j) is in the graph. If you want to indicate an edge weight, put it in the row i, column j entry, and reserve a special value (perhaps null) to indicate an absent edge.
With an adjacency matrix, we can find out whether an edge is present in constant time, by just looking up the corresponding entry in the matrix. For example, if the adjacency matrix is named graph, then we can query whether edge (i,j) is in the graph by looking at graph[i][j].
For an undirected graph, the adjacency matrix is symmetric: the row i, column j entry is 1 if and only if the row j, column i entry is 1. For a directed graph, the adjacency matrix need not be symmetric.
Adjacency lists
Representing a graph with adjacency lists combines adjacency matrices with edge lists. For each vertex i, store an array of the vertices adjacent to it. We typically have an array of |V| adjacency lists, one adjacency list per vertex.
Vertex numbers in an adjacency list are not required to appear in any particular order, though it is often convenient to list them in increasing order.
We can get to each vertex's adjacency list in constant time, because we just have to index into an array. To find out whether an edge (i,j) is present in the graph, we go to i's adjacency list in constant time and then look for j in i's adjacency list.
In an undirected graph, vertex j is in vertex i's adjacency list if and only if i is in j's adjacency list. If the graph is weighted, then each item in each adjacency list is either a two-item array or an object, giving the vertex number and the edge weight.
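The two representations side by side, as a small sketch (the node numbering and the triangle of example edges are mine):

```python
# the undirected triangle 0-1, 0-2, 1-2 in both representations
n = 3
matrix = [[0] * n for _ in range(n)]   # adjacency matrix of 0s and 1s
adj = [[] for _ in range(n)]           # one adjacency list per vertex
for i, j in [(0, 1), (0, 2), (1, 2)]:
    matrix[i][j] = matrix[j][i] = 1    # symmetric, since undirected
    adj[i].append(j)
    adj[j].append(i)

# edge query: constant-time entry lookup vs. a scan of i's list
matrix_says = matrix[0][2] == 1
list_says = 2 in adj[0]
```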
Export to file
How to export the data structure to a text file? Well, that's up to you based on how you would read the text file and import it into the data structure you decided to work with.
If I were to do it, I'd probably dump it in the simplest way possible, so that later it's easy to read and parse it back into the data structure.
Adjacency list
Store graphs in this format:
The first line contains two integers: N (number of nodes) and E (number of edges).
Then E lines follow, each containing two integers U and V; each such line represents an edge going from U to V.
This is how a cycle graph of four nodes would look:
4 4
1 2
2 3
3 4
4 1
To represent graphs in python you can use a list of lists.
N, E = map(int, input().split())  # first line: number of nodes and edges
graph = [[] for x in range(N + 1)]  # initially no edge is inserted
for x in range(E):  # read E edges
    u, v = map(int, input().split())
    # inserting edge u -> v
    graph[u].append(v)
I have three graphs represented as python dictionaries
A: {1: [2], 2: [1, 3], 3: []}
B: {1: {'neighbours': [2]}, 2: {'neighbours': [1, 3]}, 3: {'neighbours': []}}
C: {1: {2: None}, 2: {1: None, 3: None}, 3: {}}
I have a hasEdge and addEdge function
def addEdge(self, source, target):
    assert self.hasNode(source) and self.hasNode(target)
    if not self.hasEdge(source, target):
        self.graph[source][target] = None

def hasEdge(self, source, target):
    assert self.hasNode(source) and self.hasNode(target)
    return target in self.graph[source]
I am not sure which structure will be most efficient for each function. My immediate thought is that A will be the most efficient for adding an edge, and C the most efficient for checking whether an edge exists.
A and B are classic adjacency lists. C is an adjacency list, but uses an O(1) structure instead of an O(N) structure for the list. But really, you should use D, the adjacency set.
In Python, membership testing on a set (x in s) is an average O(1) operation.
So we can do
graph = {1: set([2]), 2: set([1, 3]), 3: set()}
Then our addEdge(src, dst) is (from is a reserved word in Python, so the parameters are renamed)
graph[src].add(dst)
graph[dst].add(src)
and our hasEdge(src, dst) is just
dst in graph[src]
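Putting the pieces together as a runnable sketch (the function names mirror the question's addEdge/hasEdge; the parameter names are mine, since from is reserved in Python):

```python
def add_edge(graph, src, dst):
    # undirected: record the edge in both endpoint sets, O(1) each
    graph[src].add(dst)
    graph[dst].add(src)

def has_edge(graph, src, dst):
    return dst in graph[src]   # average O(1) set membership

graph = {1: set(), 2: set(), 3: set()}
add_edge(graph, 1, 2)
add_edge(graph, 2, 3)
```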
C seems to be the most efficient to me, since you are doing lookups that are O(1) on average. (Note that this is the average case, not the worst case.) With adjacency lists, you have worst-case linear search.
For a sparse graph, you may wish to use Adjacency Lists (A), as they will take up less space. However, for a dense graph, option C should be the most efficient.
A and B will have very similar runtimes - asymptotically the same. Unless there is data besides neighbors that you wish to add to these nodes, I would choose A.
I am not familiar with python; however, for Java, option C can be improved by using a HashSet (set) which would reduce your space requirements. Runtime would be the same as using a HashMap, but sets do not store values - only keys, which is what you want for checking if there is an edge between two nodes.
So, to clarify:
For runtime, choose C. You will have average case O(1) edge adds. To improve C in order to consume less memory, use sets instead of maps, so you do not have to allocate space for values.
For memory, choose A if you have a sparse graph. You'll save a good amount of memory, and won't lose too much in terms of runtime. For reference, sparse is when nodes don't have too many neighbors; for example, when each node has about 2 neighbors in a graph with 20 nodes.
I'm new to programming, Python and networkx (ouch!) and trying to merge four graphml-files into one and removing the duplicate nodes, following the excellent instructions here
However, I can't figure out how to keep track of the duplicate nodes when there are FOUR files to compare, instead of two. The code I've written below won't work, but you can hopefully see how I'm thinking wrong and help me.
# script to merge one or more graphml files into one single graphml file
# First read graphml-files into Python and Networkx (duplicate variables as necessary)
A = nx.read_graphml("file1.graphml")
B = nx.read_graphml("file2.graphml")
C = nx.read_graphml("file3.graphml")
D = nx.read_graphml("file4.graphml")
# Create a new graph variable containing all the previous graphs
H = nx.union(A,B,C,D, rename=('1-','2-','3-','4-'))
# Check what nodes are in two or more of the original graphml-files
duplicate_nodes_a_b = [n for n in A if n in B]
duplicate_nodes_b_c = [n for n in B if n in C]
duplicate_nodes_c_d = [n for n in C if n in D]
all_duplicate_nodes = # How should I get this?
# remove duplicate nodes
for n in all_duplicate_nodes:
    n1 = '1-' + str(n)
    n2 = '2-' + str(n)
    n3 = '3-' + str(n)
    n4 = '4-' + str(n)
    H.add_edges_from([(n1, nbr) for nbr in H[n2]])  # How can I take care of duplicate_nodes_b_c, duplicate_nodes_c_d?
    H.remove_node(n2)

# write the merged graphml files-variable into a new merged graphml file
nx.write_graphml(H, "merged_file.graphml", encoding="utf-8", prettyprint=True)
First, note that the way you use nx.union is not what you want. You really need to call it with just two graphs. But how to deal with the duplicates gets complicated this way, because you have to consider all possible pairs of graphs to see how a node could be duplicated.
Instead, let's be more direct and just count up in how many graphs each node appears. This is easy using a Counter:
import collections
ctr = collections.Counter()
for G in [A, B, C, D]:
    ctr.update(G)
Now determine which nodes just appear once, using the counter:
singles = {x for (x, n) in ctr.items() if n == 1}
With that set of nodes, we can then compute the subgraphs containing only nodes that are not duplicated:
E = nx.union(A.subgraph(singles), B.subgraph(singles))
F = nx.union(C.subgraph(singles), D.subgraph(singles))
H = nx.union(E, F)
The graph H has all four initial graphs merged with duplicates removed.
The approach I've shown makes several intermediate graphs, so it is possible that, for large input graphs, you'll run into memory problems. If so, a similar approach could be done where you determine the set of duplicated nodes, delete those nodes from the original graphs, and then find the union without keeping all the intermediates. It looks like:
import collections
import networkx as nx
ctr = collections.Counter()
for G in [A, B, C, D]:
    ctr.update(G)
duplicates = {x for (x, n) in ctr.items() if n > 1}
H = nx.Graph()
for G in [A, B, C, D]:
    G.remove_nodes_from(duplicates)  # change graphs in-place
    H = nx.union(H, G)
Both approaches take advantage of the way that NetworkX functions often allow extra nodes to be given and silently ignored.
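The counting step works the same with any iterables of node labels, since Counter.update just tallies whatever nodes it is handed, so it can be checked without NetworkX. A toy sketch (the four node sets are made up):

```python
from collections import Counter

# stand-ins for the four graphs: iterating a graph yields its nodes,
# so plain sets of node labels behave the same for counting purposes
A, B, C, D = {"a", "b"}, {"b", "c"}, {"d"}, {"e", "a"}

ctr = Counter()
for G in [A, B, C, D]:
    ctr.update(G)

singles = {x for x, n in ctr.items() if n == 1}      # appear once
duplicates = {x for x, n in ctr.items() if n > 1}    # appear in 2+ graphs
```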
If the graphml files are simple (no weights, properties, etc.), then it may be easier to work at the text level. For instance,
cat A.graphml B.graphml C.graphml | sort -r | uniq > D.graphml
This will keep unique sets of nodes and edges from three graphml files. You can rearrange <graph>, </graph>, <graphml ...>, </graphml> tags in D.graphml later with a text editor.