Number of Vertices Within a Distance from Start Vertex - python

I was working on a question on an online judge that asks, for every vertex in a graph, for the number of vertices within a certain distance of it. The full question specifications can be seen here. I have some Python code that solves the problem, but it is too slow.
import sys, collections
raw_input = sys.stdin.readline

n, m, k = map(int, raw_input().split())
dict1 = collections.defaultdict(set)
ans = {i: [set([i]) for j in xrange(k)] for i in xrange(1, n+1)}
for i in xrange(m):
    x, y = map(int, raw_input().split())
    dict1[x].add(y)
    dict1[y].add(x)
for point in dict1:
    ans[point][0].update(dict1[point])
for i in xrange(1, k):
    for point in dict1:
        for neighbour in dict1[point]:
            ans[point][i].update(ans[neighbour][i-1])
for i in xrange(1, n+1):
    print len(ans[i][-1])
My code initially creates, for each vertex, the set of its direct neighbours (distance 0 to 1). It then creates a new set of neighbours for each vertex from the previously found neighbours of its neighbours (distance up to 2), and keeps doing this, creating a new set and incrementing the distance each time, until the final distance is reached. Is there a better way to solve this problem?

There are plenty of good, fast solutions.
One of them (not the fastest, but fast enough) is to use a BFS limited to distance K: just run a BFS from every vertex that does not add a neighbour to the queue once the distance would exceed K. K is the parameter from the exercise specification.
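A minimal sketch of such a distance-limited BFS (names are mine; adj is assumed to be a 0-indexed adjacency list built from the input):

import collections

def count_within_k(adj, start, k):
    # BFS that stops expanding once distance k is reached
    dist = {start: 0}
    queue = collections.deque([start])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not enqueue neighbours beyond distance k
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist)  # counts the start vertex itself; adjust if the judge doesn't

Running this once per vertex then produces the required output, e.g. print(count_within_k(adj, s, k)) for each s.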

I would use adjacency matrix multiplication. The adjacency matrix is a boolean square n * n matrix, where n is the number of vertices. The value of adjacency_matrix[i][j] equals 1 if the edge from i to j exists and 0 otherwise. If we multiply the adjacency matrix by itself, we get the pairs connected by walks of length 2. If we do that again, we get walks of length 3, and so on. In your case K <= 5, so there won't be too much of that multiplication. You can use numpy for it, and it will be very fast. So in pseudocode, the solution to your problem would look like this:
adjacency_matrix = build_adjacency_matrix_from_input()
initial_adjacency_matrix = adjacency_matrix
result_matrix = adjacency_matrix
for i = 2 to K:
    adjacency_matrix = adjacency_matrix * initial_adjacency_matrix
    result_matrix = result_matrix + adjacency_matrix
for each row of result_matrix: print how many values higher than 0 are in it
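As a sketch of how this could look with numpy (the function name and the 0-indexed edges list are my own assumptions, not part of the original answer):

import numpy as np

def counts_within_k(n, edges, k):
    # 0/1 adjacency matrix
    adj = np.zeros((n, n), dtype=np.int64)
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1
    power = adj.copy()
    reach = adj.copy()
    for _ in range(2, k + 1):
        power = power @ adj                    # walks one step longer
        power = (power > 0).astype(np.int64)   # clamp to 0/1 to avoid overflow
        reach += power
    # vertex j is within distance k of vertex i iff reach[i, j] > 0;
    # add 1 per row if the exercise counts the vertex itself
    return (reach > 0).sum(axis=1)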

You want paths of length <= K. In this case, BFS can be used to find paths of a given length easily. Or, if your graph uses an adjacency matrix representation, then matrix multiplication can also be used for this purpose.
If using BFS:
This is equivalent to performing a level-by-level traversal starting from a given source vertex. Here is pseudo code that computes all the vertices within distance K of a given source vertex:
Start: Let s be your source vertex and let K be the maximum path length required.
Create two queues Q1 and Q2, and insert the source vertex s into Q1.
Let queueTobeEmptied = Q1  // the queue that is to be emptied
Let queueTobeFilled = Q2   // the queue that receives newly discovered vertices
Let Result be a vector of vertices; initialize it to be empty.
Note: the source vertex s is at level 0; push it to the Result vector if that is also required.
for (current_level = 1; current_level <= K; current_level++) {
    while (queueTobeEmptied is not empty) {
        remove the vertex from queueTobeEmptied and call it u
        for each adjacent vertex v of u {
            if v is not already visited {
                mark v as visited
                insert v into queueTobeFilled
                push v to Result
            }
        }
    }
    swap(queueTobeEmptied, queueTobeFilled)  // swap the queues for the next iteration
}
Empty Q1 and Q2
End: Result is the vector that contains all the vertices at distance <= K
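A runnable Python version of this two-queue scheme might look like the following (names are mine; adj is assumed to be a 0-indexed adjacency list):

import collections

def vertices_within_k(adj, s, k):
    visited = {s}
    result = []  # excludes s itself; append s here if that is required
    queue_to_be_emptied = collections.deque([s])
    queue_to_be_filled = collections.deque()
    for _ in range(k):  # levels 1 .. k
        while queue_to_be_emptied:
            u = queue_to_be_emptied.popleft()
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    queue_to_be_filled.append(v)
                    result.append(v)
        # swap the queues for the next level
        queue_to_be_emptied, queue_to_be_filled = queue_to_be_filled, queue_to_be_emptied
    return result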

Analyzing the complexity of a matrix path-finding algorithm

Recently in my homework, I was assigned to solve the following problem:
Given an n x n matrix of zeros and ones, find the number of paths from [0,0] to [n-1,n-1] that go only through zeros (they are not necessarily disjoint), where you may only walk down or to the right, never up or left. Return a matrix of the same order where the [i,j] entry is the number of such paths in the original matrix that go through [i,j]. The solution has to be recursive.
My solution in python:
def find_zero_paths(M):
    n, m = len(M), len(M[0])
    dict = {}
    for i in range(n):
        for j in range(m):
            M_top, M_bot = blocks(M, i, j)
            X, Y = find_num_paths(M_top), find_num_paths(M_bot)
            dict[(i, j)] = X * Y
    L = [[dict[(i, j)] for j in range(m)] for i in range(n)]
    return L[0][0], L

def blocks(M, k, l):
    n, m = len(M), len(M[0])
    assert k < n and l < m
    M_top = [[M[i][j] for i in range(k+1)] for j in range(l+1)]
    M_bot = [[M[i][j] for i in range(k, n)] for j in range(l, m)]
    return [M_top, M_bot]

def find_num_paths(M):
    dict = {(1, 1): 1}
    X = find_num_mem(M, dict)
    return X

def find_num_mem(M, dict):
    n, m = len(M), len(M[0])
    if M[n-1][m-1] != 0:
        return 0
    elif (n, m) in dict:
        return dict[(n, m)]
    elif n == 1 and m > 1:
        new_M = [M[0][:m-1]]
        X = find_num_mem(new_M, dict)
        dict[(n, m-1)] = X
        return X
    elif m == 1 and n > 1:
        new_M = M[:n-1]
        X = find_num_mem(new_M, dict)
        dict[(n-1, m)] = X
        return X
    new_M1 = M[:n-1]
    new_M2 = [M[i][:m-1] for i in range(n)]
    X, Y = find_num_mem(new_M1, dict), find_num_mem(new_M2, dict)
    dict[(n-1, m)], dict[(n, m-1)] = X, Y
    return X + Y
My code is based on the idea that the number of paths that go through [i,j] in the original matrix is equal to the product of the number of paths from [0,0] to [i,j] and the number of paths from [i,j] to [n-1,n-1]. Another idea is that the number of paths from [0,0] to [i,j] is the sum of the number of paths from [0,0] to [i-1,j] and from [0,0] to [i,j-1]. Hence I decided to use a dictionary whose keys are matrices of the form [[M[i][j] for j in range(k)] for i in range(l)] or [[M[i][j] for j in range(k+1,n)] for i in range(l+1,n)] for some 0<=k,l<=n-1 (where M is the original matrix), and whose values are the number of paths from the top of the matrix to the bottom. After analyzing the complexity of my code I arrived at the conclusion that it is O(n^6).
Now, my instructor says this code is exponential (for find_zero_paths); however, I disagree.
The size of the recursion tree (for find_num_paths) is bounded by the number of submatrices of the form above, which is O(n^2). Also, each time we add a new matrix to the dictionary we do it in polynomial time (only slicing lists), so the total complexity is polynomial (poly * poly = poly). The function 'blocks' also runs in polynomial time, and hence 'find_zero_paths' runs in polynomial time (2 lists of polynomial size times a function which runs in polynomial time), so all in all the code runs in polynomial time.
My question: Is the code polynomial and my O(n^6) bound is wrong or is it exponential and I am missing something?
Unfortunately, your instructor is right.
There is a lot to unpack here:
Before we start, a quick note: please don't use dict as a variable name. It hurts ^^. dict is not a reserved keyword, but it is the name of the built-in dictionary constructor in Python, and shadowing it with your own variable is bad practice.
First, your approach of computing M_top * M_bottom is good if you only had to compute one cell of the matrix. The way you go about it, though, you unnecessarily recompute some blocks over and over again; that is why I wondered about the recursion requirement. I would use dynamic programming for this one: one pass from start to end, one pass from end to start, then compute the products and be done with it. No need for O(n^6) worth of separate computations. Since you have to use recursion, I would recommend caching the partial results and reusing them wherever possible.
Second, the root of the issue and the cause of your invisible-ish exponential growth: it is hidden in the find_num_mem function. Say you compute the last element of the matrix, the result[N][N] field, and consider the simplest case, where the matrix is full of zeroes so that every possible path exists.
In the first step, your recursion creates the branches [N][N-1] and [N-1][N].
In the second step, [N-1][N-1], [N][N-2], [N-2][N], [N-1][N-1] (note that [N-1][N-1] is reached twice).
In the third step, you once again create two branches from every branch of the previous step: a beautiful example of an exponential explosion.
Now how to go about it: you will quickly notice that some of the branches are duplicated over and over. Cache the results; a sketch of the two-pass approach follows.
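Here is that sketch (all names are mine, and the recursion requirement is dropped for clarity):

def count_paths_through(M):
    n, m = len(M), len(M[0])
    # from_start[i][j] = number of zero-paths from (0,0) to (i,j)
    from_start = [[0] * m for _ in range(n)]
    # to_end[i][j] = number of zero-paths from (i,j) to (n-1,m-1)
    to_end = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if M[i][j] != 0:
                continue
            if i == 0 and j == 0:
                from_start[i][j] = 1
            else:
                if i > 0:
                    from_start[i][j] += from_start[i-1][j]
                if j > 0:
                    from_start[i][j] += from_start[i][j-1]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if M[i][j] != 0:
                continue
            if i == n - 1 and j == m - 1:
                to_end[i][j] = 1
            else:
                if i < n - 1:
                    to_end[i][j] += to_end[i+1][j]
                if j < m - 1:
                    to_end[i][j] += to_end[i][j+1]
    # paths through (i,j) = paths into it times paths out of it
    return [[from_start[i][j] * to_end[i][j] for j in range(m)] for i in range(n)]

Each pass visits every cell once, so the whole thing is O(n*m) instead of exponential.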

Dijkstra's algorithm not working even though it passes the sample test cases

So I have followed Wikipedia's pseudocode for Dijkstra's algorithm as well as Brilliant's: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm#Pseudocode https://brilliant.org/wiki/dijkstras-short-path-finder/. Here is my code, which doesn't work. Can anyone point out the flaw in my code?
# Uses python3
from queue import Queue
n, m = map(int, input().split())
adj = [[] for i in range(n)]
for i in range(m):
    u, v, w = map(int, input().split())
    adj[u-1].append([v, w])
    adj[v-1].append([u, w])
x, y = map(int, input().split())
x, y = x-1, y-1
q = [i for i in range(n, 0, -1)]
# visited = set()
# visited.add(x+1)
dist = [float('inf') for i in range(len(adj))]
dist[x] = 0
# print(adj[visiting])
while len(q) != 0:
    visiting = q.pop()-1
    for i in adj[visiting]:
        u, v = i
        dist[u-1] = dist[visiting]+v if dist[visiting]+v < dist[u-1] else dist[u-1]
    # print(dist)
if dist[y] != float('inf'):
    print(dist[y])
else:
    print(-1)
Your code is not implementing Dijkstra's algorithm correctly. You are just iterating over all nodes in their input order and updating the distances to their neighbors based on each node's current distance. But that distance is not guaranteed to be the shortest distance, because you process some nodes before their "turn". Dijkstra's algorithm specifies a particular order of processing nodes, which is not necessarily the input order.
The main ingredient missing from your code is a priority queue. You did import Queue, but never use it. It also lacks the marking of nodes as visited, a concept you seem to have started implementing but then commented out.
The outline of the algorithm on Wikipedia explains the use of this priority queue in the last step of each iteration:
Otherwise, select the unvisited node that is marked with the smallest tentative distance, set it as the new "current node", and go back to step 3.
There is currently no mechanism in your code that selects the unvisited node with the smallest tentative distance. Instead it picks the next node based on its order in the input.
To correct your code, please consult the pseudo code available on that same Wikipedia page; I would advise going for the variant with a priority queue.
In Python you can use heapq for performing the actions on the priority queue (heappush, heappop), as in the sketch below.
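A minimal sketch of such a heapq-based Dijkstra (the function name is mine; adj is assumed to be a 0-indexed adjacency list of (neighbor, weight) pairs, unlike the mixed 0/1-based indexing in the question's code):

import heapq

def dijkstra(adj, source):
    dist = [float('inf')] * len(adj)
    dist[source] = 0
    pq = [(0, source)]  # entries are (tentative distance, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale entry; u was already settled with a shorter distance
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

The stale-entry check plays the role of the visited set: a popped entry whose distance exceeds the recorded one belongs to a node that was already finalized.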

Given the edges of an undirected graph, what is an algorithm for limiting the maximum degree of the graph while keeping it as connected as possible?

This is for my research in protein folding (so I guess technically a school project).
Summary:
I have the edges of a weighted undirected graph. Each vertex of the graph has anywhere from 1 to 20-ish edges. I would like to trim this graph down so that no vertex has more than 6 edges, while retaining as much connectivity as possible (keeping as many of the original edges as possible).
Background:
I have a Delaunay tessellation of the atoms (essentially a point cloud) in a protein, built using the scipy library. I use this to create a list of all pairs of residues that are in contact with each other, along with the distance between them. This list contains every pair twice, together with the distance between its members. (A residue contains many atoms, so I use their average position as the position of the residue.)
pairs
[(ALA 1, GLU 2, 2.7432), (ALA 1, GLU 2, 2.7432), (ALA 4, ASP 27, 4.8938), (ALA 4, ASP 27, 4.8938) ... ]
What I have tried (which works but isn't exactly what I want) is to only store the six closest contacts. (I sort the residue names so I can use collections later)
for contact in residue.contacts[:6]:
    pairs.append(tuple(sorted([residue.name, contact.name], key=lambda r: r.name) + [residue.dist[contact]]))
And then remove any contacts that are not reciprocated. (I guess technically add contacts that are)
new_pairs = []
counter = collections.Counter(pairs)
for key, val in counter.items():
    if val == 2:
        new_pairs.append(key)
This works, but I lose some information that I would like to keep. I phrased the question as a graph theory problem because I feel like this problem has already been solved in that field.
I was thinking that a greedy algorithm might work:
while run_greedy:
    # find the residue with the maximum number of neighbors
    # find that residue's pair with the maximum number of neighbors, but only if the pair exists in pairs
    # remove that pair from pairs
    # if maximum_degree <= 6: run_greedy = False
Does the greedy algorithm work? Are there known algorithms that do this well? Is there a library that can do this (I am more than willing to change the format of the data to fit the library)?
I hope this is enough information, Thanks in advance for the help.
EDIT: this is a variant of the knapsack problem: you add edges one by one and want to maximize the number of edges, while the graph being built never exceeds a given maximum degree.
The following solution uses dynamic programming.
Let m[i, d] be a maximum subset of edges among e_0, ..., e_{i-1} forming a subgraph of maximum degree <= d. Then:
m[i, 0] = {}
m[0, d] = {}
m[i, d] = m[i-1, d] + {e_i} if the resulting graph still has maximum degree <= d;
otherwise, m[i, d] = m[i-1, d-1] + {e_i} if that set has more edges than m[i-1, d], else m[i, d] = m[i-1, d].
Hence the algorithm (not tested):
for i in 0..N:
    m[i][0] = {}
for d in 1..K:
    m[0][d] = {}
for d in 1..K:
    for i in 1..N:
        G1 = m[i-1][d] + {e_i}
        if D(G1) == d:  # we can add e_i and stay within degree d
            m[i][d] = G1
        else:
            m[i][d] = max(m[i-1][d-1] + {e_i}, m[i-1][d])  # key = cardinality
The solution is m[N-1][K-1]. Time complexity is O(K * N^2): the nested loops give K * N iterations, and each iteration computes the maximum degree of the graph, which is O(N) or less.
Previous answer
TLDR; I don't know how to find an optimal solution, but a greedy algorithm might give you acceptable result.
The problem
Let me rephrase the problem, based on your question and your code: you want to remove a minimum number of edges from your graph in order to reduce the maximum degree of the graph to 6. That is, you want the maximal subgraph G' of G with d(u) <= 6 for all u in G'.
The closest idea I found is the K-core of a graph, but that's not exactly the same problem.
Your method
Your method is clearly not optimal, since you keep at most 6 edges of every vertex and then recreate the graph from those edges. Take the triangle graph A-B-C, where each vertex lists its neighbors in order of closeness:
A -> 1. B, 2. C
B -> 1. C, 2. A
C -> 1. A, 2. B
If you try to reduce the maximum degree of this graph to 1 using your method, the first pass will remove A-C (C is the 2nd neighbor of A), B-A (A is the 2nd neighbor of B) and C-B (B is the 2nd neighbor of C):
A -> 1. B
B -> 1. C
C -> 1. A
The second pass, which ensures that the graph is undirected, will then remove all the remaining edges (and vertices), since none of them is reciprocated.
An optimal reduction would be:
A -> 1. B
B -> 1. A
Or any other pair of vertices in A, B, C.
Some strategy
Let:
k = 6
D(u) = max(d(u)-k, 0): the number of neighbors above k, or 0
w(u-v) (resp s(u-v)) = the weak (resp. strong) endpoint of the edge: having the lowest (resp. highest) degree
m(u-v) = min(D(u), D(v))
M(u-v) = max(D(u), D(v))
Let S = sum(D(u) for u in G). The goal is to make S = 0 while removing a minimum number of edges. If you remove:
(1) a floating edge (m(u-v) > 0), then S is decreased by 2 (both endpoints lose 1 degree);
(2) a sinking edge (m(u-v) = 0 and M(u-v) > 0), then S is decreased by 1 (the degree of the weak endpoint is already <= k);
(3) a sunk edge (M(u-v) = 0), then S is unchanged.
Note that a floating edge may become a sinking edge if: 1. its weak endpoint has a degree of k+1; and 2. you remove another edge connected to this endpoint. Similarly, a sinking edge can become sunk.
You have to remove floating edges while avoiding the creation of sinking edges, because removing a floating edge is more efficient at reducing S. Let K be the number of floating edges removed and L the number of sinking edges removed (we never remove sunk edges) to make S = 0. We need 2*K + L >= S. Obviously, the idea is to make L as small as possible, because we want a small total number of removed edges (K + L).
I doubt you'll find an optimal greedy algorithm, because everything depends on the order of removal, and the remote consequences of the current removal are hard to predict.
But you can use a general strategy to limit the creation of sinking edges:
do not remove edges with m(u-v) = 1 unless you have no choice.
if you have to remove an edge with m(u-v) = 1, choose the one whose weak endpoint has the fewest floating edges (those edges will become sinking edges).
An algorithm
Here's a greedy algorithm that implements this strategy:
while {u-v in G | m(u-v) > 0} is not empty:  // remove floating edges first
    remove the edge u-v with:
        1. the maximum m(u-v)
        2. w(u-v) having the minimum number of neighbors t with D(t) > 0
        3. s(u-v) having the minimum number of neighbors t with D(t) > 0
remove all edges from {u-v in G | M(u-v) > 0}  // clean up the sinking edges
clean up orphan vertices
Termination: the algorithm terminates because we remove an edge on each iteration, so {u in G | D(u) > 0} becomes empty at some point.
Note: you can use a heap and update m(u-v) after each removal. A plain-Python sketch of this greedy strategy follows.
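Here is that sketch (names and the dict-of-sets adjacency are mine; vertices are assumed to be comparable, e.g. residue name strings):

def trim_to_max_degree(edges, k=6):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def excess(u):  # D(u) = max(d(u) - k, 0)
        return max(len(adj[u]) - k, 0)

    while True:
        over = [u for u in adj if excess(u) > 0]
        if not over:
            break  # every vertex now has degree <= k
        # candidate edges touch at least one over-degree vertex
        candidates = {(u, v) for u in over for v in adj[u]}
        # prefer floating edges: maximize m(u-v) = min(D(u), D(v))
        u, v = max(candidates, key=lambda e: min(excess(e[0]), excess(e[1])))
        adj[u].discard(v)
        adj[v].discard(u)
    return [(u, v) for u in adj for v in adj[u] if u < v]

This implements only the primary rule; the secondary tie-breaks on w(u-v) and s(u-v) would be folded into the key function.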

Creating lists of mutual neighbor elements

Say, I have a set of unique, discrete parameter values, stored in a variable 'para'.
para=[1,2,3,4,5,6,7,8,9,10]
Each element in this list has some number 'K' of neighbors (given: each neighbor ∈ para).
EDIT: This 'K' is obviously not the same for each element.
And to clarify the actual size of my problem: I need a neighborhood of close to 50-100 neighbors on average, given that my para list is around 1000 elements large.
NOTE: A neighbor of an element, is another possible 'element value' to which it can jump, by a single mutation.
neighbors_of_1 = [2,4,5,9] #contains all possible neighbors of 1 (i.e para[0])
Question: How can I define each of the other elements' neighbors randomly from 'para', while keeping the previously assigned neighbors/relations in mind?
eg:
neighbors_of_5=[1,3,7,10] #contains all possible neighbors of 5 (i.e para[4])
NOTE: '1' has been assigned as a neighbor of '5', keeping the values of 'neighbors_of_1' in mind. They are 'mutual' neighbors.
I know the inefficient way of doing this: keep looping through the previously assigned lists, check whether the current element is a neighbor of another element, and if so, store that element's value as one of the new neighbors.
Is there a cleaner/more pythonic way of doing this? (By maybe using the concept of linked-lists or any other method? Or are lists redundant?)
This solution does what you want, I believe. It is not the most efficient, as it generates quite a lot of extra elements and data, but the run time was still short on my computer, and I assume you won't run this repeatedly in a tight inner loop?
import itertools as itools
import random

# Generate a random para variable:
# para = [1,2,3,4,5,6,7,8,9,10]
para = list(range(10000))
random.shuffle(para)
para = para[:1000]

# Generate all pairs in para (in random order)
pairs = [(a, b) for a, b in itools.product(para, para) if a < b]
random.shuffle(pairs)

K = 50                   # average number of neighbors
N = len(para) * K // 2   # total number of connections

# Generate a neighbors dict, holding all the neighbors of an element
neighbors = dict()
for elem in para:
    neighbors[elem] = []

# append the neighbors to each other
for pair in pairs[:N]:
    neighbors[pair[0]].append(pair[1])
    neighbors[pair[1]].append(pair[0])

# sort each neighbor list
for neighbor in neighbors.values():
    neighbor.sort()
I hope you understand my solution. Otherwise feel free to ask for a few pointers.
A neighborhood can be represented by a graph. If A being a neighbor of B does not necessarily imply that B is a neighbor of A, the graph is directed; otherwise it is undirected. I'm guessing you want an undirected graph, since you want to "keep in mind the relationship between the nodes".
Besides the obvious choice of using a third-party graph library, you can solve your issue by using a set of edges between the graph vertices. An edge can be represented by the pair of its two endpoints. Since edges are undirected, either use a tuple (A, B) such that A < B, or use a frozenset((A, B)).
Note that there are considerations about which neighbor to randomly pick in the middle of the algorithm, such as discouraging the choice of nodes that already have many neighbors, to avoid going over your limits.
Here is a pseudo-code of what I'd do.
import random

edges = set()
arities = [0 for p in para]
for i in range(len(para)):
    n = random.randrange(50, 100)
    k = n
    while k > 0:
        # weight inversely by current arity (+1 avoids division by zero)
        w = [1 / (x + 1) for x in arities]
        # note: test which random scheme suits you best
        j = random.choices(range(len(para)), weights=w)[0]
        # note: I'm storing the vertex indices in the edges rather than the nodes.
        # But if the nodes are unique, you could store the nodes.
        e = frozenset((i, j))
        if i != j and e not in edges:
            edges.add(e)
            # instead of arities, you could keep a list of lists of the neighbours;
            # arities[i] would then be len(neighbors[i])
            arities[i] += 1
            arities[j] += 1
        k -= 1

Angle between planes algorithm is too slow

I have written working code calculating the angle between the adjacent planes.
Here's what I already tried to optimise:
1) I got rid of a couple of np built-in functions, e.g. np.cross() and np.linalg.norm(); that gained me a couple of seconds.
2) The loop was for z in range(1, n); I changed the 1 to k in order not to reconsider already-calculated triangles.
I also tried to make the input reading faster, but to no avail.
Please, can someone tell me how to make it significantly faster?
I'm not well-acquainted with graphs, and I have a bad feeling about this...
You determine the adjacency of the triangles by matching all triangles against each other. If you create a dictionary of edges instead, you can find adjacent triangles much more efficiently.
Use the two nodes of an edge as the key. To make the key unique, put the node with the lower index first. You can build the dict while you read the triangle indices:
edge = {}
for i in range(n):
    a, b, c = [int(j) for j in raw_input().split()]
    ind.append((a, b, c))
    k = (min(a, b), max(a, b))
    edge[k] = edge.get(k, []) + [i]
    k = (min(b, c), max(b, c))
    edge[k] = edge.get(k, []) + [i]
    k = (min(c, a), max(c, a))
    edge[k] = edge.get(k, []) + [i]
Use the dict like so:
def calculate_angle():
    # the triangle indices sharing an edge are stored in the dict's values
    for tris in edge.values():
        if len(tris) == 2:  # exactly two triangles share this edge
            i1, i2 = tris
            n1 = norm[i1]
            n2 = norm[i2]
            a = abs(math.acos(max(-1, min(1, dot(n1, n2)))))
            angles_list.append(a)
    return max(angles_list)
The drawback here is that the angles appear in an arbitrary order in the list, but that is what happens in your original code, too.
You can speed up the program by precalculating the normal, as a unit vector, of each triangle only once and storing it in the list norm; that is what I have done above. The angle calculation is then just the arc cosine of a dot product.
And do you only need the maximum value? Then don't build a list at all; keep a running maximum which you update whenever the current angle is greater.
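For reference, a sketch of that normal precomputation (function names are mine; the question's point and index arrays are not shown, so the last line is only indicative):

import math

def unit_normal(p1, p2, p3):
    # two edge vectors of the triangle
    ux, uy, uz = p2[0]-p1[0], p2[1]-p1[1], p2[2]-p1[2]
    vx, vy, vz = p3[0]-p1[0], p3[1]-p1[1], p3[2]-p1[2]
    # their cross product is normal to the triangle's plane
    nx, ny, nz = uy*vz - uz*vy, uz*vx - ux*vz, ux*vy - uy*vx
    length = math.sqrt(nx*nx + ny*ny + nz*nz)
    return (nx/length, ny/length, nz/length)

def dot(n1, n2):
    return n1[0]*n2[0] + n1[1]*n2[1] + n1[2]*n2[2]

# one unit normal per triangle, computed once, e.g.:
# norm = [unit_normal(pts[a], pts[b], pts[c]) for a, b, c in ind]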
