Say, I have a set of unique, discrete parameter values, stored in a variable 'para'.
para=[1,2,3,4,5,6,7,8,9,10]
Each element in this list has 'K' neighbors (given: each neighbor ∈ para).
EDIT: This 'K' is obviously not the same for each element.
And to clarify the actual size of my problem: I need a neighborhood of close to 50-100 neighbors on average, given that my para list is around 1000 elements large.
NOTE: A neighbor of an element is another possible 'element value' to which it can jump by a single mutation.
neighbors_of_1 = [2,4,5,9] #contains all possible neighbors of 1 (i.e para[0])
Question: How can I define each of the other elements' neighbors randomly from 'para', while keeping in mind the previously assigned neighbors/relations?
eg:
neighbors_of_5=[1,3,7,10] #contains all possible neighbors of 5 (i.e para[4])
NOTE: '1' has been assigned as a neighbor of '5', keeping the values of 'neighbors_of_1' in mind. They are 'mutual' neighbors.
I know the inefficient way of doing this would be to keep looping through the previously assigned lists, checking whether the current state is a neighbor of another state, and if so, storing that state's value as one of the new neighbors.
Is there a cleaner/more pythonic way of doing this? (Maybe by using the concept of linked lists or some other method? Or are lists redundant here?)
This solution does what you want, I believe. It is not the most efficient, as it generates quite a lot of extra elements and data, but the run time was still short on my computer, and I assume you won't run this repeatedly in a tight inner loop.
import itertools as itools
import random
# Generating a random para variable:
#para=[1,2,3,4,5,6,7,8,9,10]
para = list(range(10000))
random.shuffle(para)
para = para[:1000]
# Generate all pairs in para (in random order)
pairs = [(a,b) for a, b in itools.product(para, para) if a < b]
random.shuffle(pairs)
K = 50 # average number of neighbors
N = len(para)*K//2 # total connections
# Generating a neighbors dict, holding all the neighbors of an element
neighbors = dict()
for elem in para:
    neighbors[elem] = []
# append the neighbors to each other
for pair in pairs[:N]:
    neighbors[pair[0]].append(pair[1])
    neighbors[pair[1]].append(pair[0])
# sort each neighbor list
for neighbor in neighbors.values():
    neighbor.sort()
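A quick sanity check of the result, assuming the neighbors dict built above: every relation should be mutual, and the average neighborhood size should be close to K.

# Sanity check: all neighbor relations are mutual (a in neighbors[b] <=> b in neighbors[a])
assert all(a in neighbors[b] for a, bs in neighbors.items() for b in bs)
# The average number of neighbors should be close to K
print(sum(len(v) for v in neighbors.values()) / len(neighbors))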
I hope you understand my solution. Otherwise feel free to ask for a few pointers.
A neighborhood can be represented by a graph. If A being a neighbor of B does not necessarily imply that B is a neighbor of A, the graph is directed; otherwise it is undirected. I'm guessing you want an undirected graph, since you want to "keep in mind the relationship between the nodes".
Besides the obvious choice of using a third-party library for graphs, you can solve your issue by using a set of edges between the graph vertices. An edge can be represented by the pair of its two endpoints. Since edges are undirected, either use a tuple (A, B) such that A < B, or use a frozenset((A, B)).
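For instance, a minimal sketch of both representations, with made-up endpoint values:

a, b = 3, 7
edge_tuple = (min(a, b), max(a, b))   # canonical ordered tuple: (3, 7)
edge_set = frozenset((a, b))          # order-insensitive and hashable
assert edge_set == frozenset((b, a))  # (A,B) and (B,A) are the same edge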
Note there are decisions to make about which node to pick randomly in the middle of the algorithm, such as discouraging the choice of nodes that already have many neighbors, to avoid exceeding your limits.
Here is a pseudo-code of what I'd do.
import random

edges = set()
arities = [0 for p in para]
for i in range(len(para)):
    n = random.randrange(50, 100)
    k = n
    while k > 0:
        # note: test which random scheme suits you best
        w = [1 / (1 + a) for a in arities]  # 1 + a avoids division by zero
        # note: I'm storing the vertex indices in the edges rather than the nodes.
        # But if the nodes are unique, you could store the nodes.
        j = random.choices(range(len(para)), weights=w)[0]
        if i != j:  # avoid self-loops
            e = frozenset((i, j))
            if e not in edges:
                edges.add(e)
                # instead of arities, you could keep a list of lists of the neighbours;
                # arities[i] would then be len(neighbors[i])
                arities[i] += 1
                arities[j] += 1
        k -= 1
Recently in my homework, I was assigned the following problem:
Given an nxn matrix of zeros and ones, find the number of paths from [0,0] to [n-1,n-1] that pass only through zeros (the paths are not necessarily disjoint), where you may only move down or to the right, never up or left. Return a matrix of the same order in which the [i,j] entry is the number of such paths in the original matrix that pass through [i,j]. The solution has to be recursive.
My solution in python:
def find_zero_paths(M):
    n,m = len(M),len(M[0])
    dict = {}
    for i in range(n):
        for j in range(m):
            M_top,M_bot = blocks(M,i,j)
            X,Y = find_num_paths(M_top),find_num_paths(M_bot)
            dict[(i,j)] = X*Y
    L = [[dict[(i,j)] for j in range(m)] for i in range(n)]
    return L[0][0],L

def blocks(M,k,l):
    n,m = len(M),len(M[0])
    assert k<n and l<m
    M_top = [[M[i][j] for i in range(k+1)] for j in range(l+1)]
    M_bot = [[M[i][j] for i in range(k,n)] for j in range(l,m)]
    return [M_top,M_bot]

def find_num_paths(M):
    dict = {(1, 1): 1}
    X = find_num_mem(M, dict)
    return X

def find_num_mem(M,dict):
    n, m = len(M), len(M[0])
    if M[n-1][m-1] != 0:
        return 0
    elif (n,m) in dict:
        return dict[(n,m)]
    elif n == 1 and m > 1:
        new_M = [M[0][:m-1]]
        X = find_num_mem(new_M,dict)
        dict[(n,m-1)] = X
        return X
    elif m == 1 and n>1:
        new_M = M[:n-1]
        X = find_num_mem(new_M, dict)
        dict[(n-1,m)] = X
        return X
    new_M1 = M[:n-1]
    new_M2 = [M[i][:m-1] for i in range(n)]
    X,Y = find_num_mem(new_M1, dict),find_num_mem(new_M2, dict)
    dict[(n-1,m)],dict[(n,m-1)] = X,Y
    return X+Y
My code is based on the idea that the number of paths that go through [i,j] in the original matrix equals the product of the number of paths from [0,0] to [i,j] and the number of paths from [i,j] to [n-1,n-1]. Another idea is that the number of paths from [0,0] to [i,j] is the sum of the number of paths from [0,0] to [i-1,j] and from [0,0] to [i,j-1]. Hence I decided to use a dictionary whose keys correspond to submatrices of the form [[M[i][j] for j in range(k)] for i in range(l)] or [[M[i][j] for j in range(k+1,n)] for i in range(l+1,n)] for some 0<=k,l<=n-1, where M is the original matrix, and whose values are the number of paths from the top of the submatrix to its bottom. After analyzing the complexity of my code, I arrived at the conclusion that it is O(n^6).
Now, my instructor says this code (specifically find_zero_paths) is exponential; however, I disagree.
The size of the recursion tree (for find_num_paths) is bounded by the number of submatrices of the form above, which is O(n^2). Also, each time we add a new matrix to the dictionary, we do so in polynomial time (only slicing lists), so the total complexity is polynomial (poly * poly = poly). The function 'blocks' also runs in polynomial time, and hence 'find_zero_paths' runs in polynomial time (2 lists of polynomial size times a function which runs in polynomial time). So, all in all, the code runs in polynomial time.
My question: Is the code polynomial and my O(n^6) bound wrong, or is it exponential and I am missing something?
Unfortunately, your instructor is right.
There is a lot to unpack here:
Before we start, a quick note: please don't use dict as a variable name. It hurts ^^. dict is the built-in dictionary constructor in Python, and it is bad practice to shadow it with your own variable.
First, your approach of counting M_top * M_bottom is good, if you were computing only one cell of the matrix. The way you go about it, though, you are unnecessarily computing some blocks over and over again; that is why I wondered about the recursion requirement. I would use dynamic programming for this one: one pass from start to end, one pass from end to start, then compute the products and be done with it. No need for O(n^6) separate computations. Since you have to use recursion, I recommend caching the partial results and reusing them wherever possible.
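Here is a minimal sketch of that two-pass dynamic programming, assuming M is a 0/1 matrix where 0 marks a walkable cell (my own illustration, not your code):

def paths_through_each_cell(M):
    n, m = len(M), len(M[0])
    # from_start[i][j]: number of paths from [0,0] to [i,j]
    from_start = [[0] * m for _ in range(n)]
    # from_end[i][j]: number of paths from [i,j] to [n-1,m-1]
    from_end = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if M[i][j] == 0:
                if i == 0 and j == 0:
                    from_start[i][j] = 1
                else:
                    from_start[i][j] = (from_start[i-1][j] if i > 0 else 0) + \
                                       (from_start[i][j-1] if j > 0 else 0)
    for i in reversed(range(n)):
        for j in reversed(range(m)):
            if M[i][j] == 0:
                if i == n-1 and j == m-1:
                    from_end[i][j] = 1
                else:
                    from_end[i][j] = (from_end[i+1][j] if i < n-1 else 0) + \
                                     (from_end[i][j+1] if j < m-1 else 0)
    # paths through [i,j] = paths reaching it * paths leaving it
    return [[from_start[i][j] * from_end[i][j] for j in range(m)]
            for i in range(n)]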
Second, the root of the issue and the cause of your hidden exponent: it lies in the find_num_mem function. Say you compute the last element in the matrix, the result[N][N] field, and consider the simplest case, where the matrix is full of zeros so every possible path exists.
In the first step, your recursion creates branches [N][N-1] and [N-1][N].
In the second step, those two create [N-1][N-1], [N][N-2], [N-2][N], and [N-1][N-1] again (note the duplicate).
In the third step, you once again create two branches from every branch of the previous step - a beautiful example of exponential explosion.
Now, how to go about it: you will quickly notice that some of the branches are duplicated over and over. Cache the results.
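For instance, a hedged sketch of what that caching could look like if you memoize on (row, column) indices instead of on whole submatrices (so no list slicing at all):

from functools import lru_cache

def count_paths(M):
    """Number of down/right paths of zeros from [0,0] to the bottom-right."""
    n, m = len(M), len(M[0])

    @lru_cache(maxsize=None)
    def f(i, j):
        if M[i][j] != 0:
            return 0
        if i == 0 and j == 0:
            return 1
        return (f(i - 1, j) if i > 0 else 0) + (f(i, j - 1) if j > 0 else 0)

    return f(n - 1, m - 1)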
This is for my research in protein folding (so I guess technically a school project).
Summary:
I have the edges of a weighted undirected graph. Each vertex of the graph has anywhere from 1 to 20-ish edges. I would like to trim this graph down so that no vertex has more than 6 edges, while retaining as much connectivity as possible (maximizing the number of edges kept).
Background:
I have a Delaunay tessellation of the atoms (essentially a point cloud) in a protein, computed using the scipy library. I use this to create a list of all pairs of residues that are in contact with each other, along with the distance between them. This list contains every pair twice. (A residue contains many atoms, so I use their average position as the position of the residue.)
pairs
[(ALA 1, GLU 2, 2.7432), (ALA 1, GLU 2, 2.7432), (ALA 4, ASP 27, 4.8938), (ALA 4, ASP 27, 4.8938) ... ]
What I have tried (which works, but isn't exactly what I want) is to keep only the six closest contacts. (I sort the residue names so I can use collections later.)
for contact in residue.contacts[:6]:
    pairs.append(tuple(sorted([residue.name, contact.name], key=lambda r: r.name) + [residue.dist[contact]]))
And then remove any contacts that are not reciprocated (or rather, keep only the contacts that are):
new_pairs = []
counter = collections.Counter(pairs)
for key, val in counter.items():
    if val == 2:
        new_pairs.append(key)
This works, but I lose some information that I would like to keep. I phrased the question as a graph theory problem because I feel like this problem has already been solved in that field.
I was thinking that a greedy algorithm might work:
while run_greedy:
    # find the residue with the maximum number of neighbors
    # find that residue's pair with the maximum number of neighbors, but only if the pair exists in pairs
    # remove that pair from pairs
    # if maximum_degree <= 6: run_greedy = False
Does the greedy algorithm work? Are there known algorithms that do this well? Is there a library that can do this? (I am more than willing to change the format of the data to fit the library.)
I hope this is enough information. Thanks in advance for the help.
EDIT: this is a variant of the knapsack problem: you add edges one by one, and want to maximize the number of edges while the graph you build doesn't exceed a given degree.
The following solution uses dynamic programming.
Let m[i, d] be the maximum subset of edges among e_0, ..., e_{i-1} creating a subgraph of maximum degree <= d.
m[i, 0] = {}
m[0, d] = {}
m[i, d] = m[i-1, d] + {e_i} if the resulting graph still has degree <= d;
          otherwise, m[i-1, d-1] + {e_i} if it has more edges than m[i-1, d], else m[i-1, d]
Hence the algorithm (not tested):
for i in 0..N:
    m[i][0] = {}
for d in 1..K:
    m[0][d] = {}
for d in 1..K:
    for i in 1..N:
        G1 = m[i-1][d] + {e_i}
        if D(G1) <= d:  # we can add e_i and keep the degree <= d
            m[i][d] = G1
        else:
            m[i][d] = max(m[i-1][d-1] + {e_i}, m[i-1][d])  # key = cardinality
The solution is m[N-1][K-1]. Time complexity is O(K N^2) (nested loops over K and N, times set operations bounded by the maximum degree of the graph, which is N or less).
Previous answer
TL;DR: I don't know how to find an optimal solution, but a greedy algorithm might give you an acceptable result.
The problem
Let me rephrase the problem, based on your question and your code: you want to remove a minimum number of edges from your graph in order to reduce the maximum degree of the graph to 6. That is, you want the maximal subgraph G' of G with d(u) <= 6 for all u in G'.
The closest idea I found is the K-core of a graph, but that's not exactly the same problem.
Your method
Your method is clearly not optimal, since you keep at most 6 edges of every vertex and then recreate the graph from those edges. Take the triangle graph A-B-C:
A -> 1. B, 2. C
B -> 1. C, 2. A
C -> 1. A, 2. B
If you try to reduce the maximum degree of this graph to 1 using your method, the first pass will remove A-C (C is the 2nd neighbor of A), B-A (A is the 2nd neighbor of B) and C-B (B is the 2nd neighbor of C):
A -> 1. B
B -> 1. C
C -> 1. A
The second pass, to ensure that the graph is undirected, will remove all the remaining edges (and vertices), since none of them is reciprocated.
An optimal reduction would be:
A -> 1. B
B -> 1. A
Or any other pair of vertices in A, B, C.
Some strategy
Let:
k = 6
D(u) = max(d(u)-k, 0): the number of neighbors above k, or 0
w(u-v) (resp. s(u-v)) = the weak (resp. strong) endpoint of the edge: the one with the lower (resp. higher) degree
m(u-v) = min(D(u), D(v))
M(u-v) = max(D(u), D(v))
Let S = sum(D(u) for u in G). The goal is to make S = 0 while removing a minimum number of edges. If you remove:
(1) a floating edge: m(u-v) > 0, then S decreases by 2 (both endpoints lose 1 degree)
(2) a sinking edge: m(u-v) = 0 and M(u-v) > 0, then S decreases by 1 (the degree of the weak endpoint is already <= 6)
(3) a sunk edge: M(u-v) = 0, then S is unchanged
Note that a floating edge may become a sinking edge if: 1. its weak endpoint has a degree of k+1; and 2. you remove another edge connected to this endpoint. Similarly, a sinking edge can become sunk.
You have to remove floating edges while avoiding the creation of sinking edges, because removing a floating edge is more efficient at reducing S. Let K be the number of floating edges removed, and L the number of sinking edges removed (we don't remove sunk edges), in order to make S = 0. We need 2*K + L >= S. Obviously, the idea is to make L as small as possible, because we want a small total number of edges removed (K + L).
I doubt you'll find an optimal greedy algorithm, because everything depends on the order of removal, and the remote consequences of the current removal are hard to predict.
But you can use a general strategy to limit the creation of sinking edges:
do not remove edges with m(u-v) = 1 unless you have no choice.
if you have to remove an edge with m(u-v) = 1, choose the one whose weak endpoint has the fewest floating edges (they will become sinking edges).
An algorithm
Here's a greedy algorithm that implements this strategy:
while {u-v in G | m(u-v) > 0} is not empty: // remove floating edges first
    remove the edge u-v with:
        1. the maximum m(u-v)
        2. w(u-v) having the minimum number of neighbors t with D(t) > 0
        3. s(u-v) having the minimum number of neighbors t with D(t) > 0
remove all edges of {u-v in G | M(u-v) > 0} // clean up the sinking edges
clean orphan vertices
Termination: the algorithm terminates because we remove an edge on each iteration, hence the set of floating edges becomes empty at some point.
Note: you can use a heap and update m(u-v) after each removal.
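For illustration, here is a rough Python sketch of this greedy strategy. It is a simplification: it applies only criterion 1 (skipping tie-breaks 2 and 3), and it assumes graph is a dict mapping each vertex to the set of its neighbours:

def trim_to_max_degree(graph, k=6):
    def excess(u):                       # D(u) in the notation above
        return max(len(graph[u]) - k, 0)

    def remove_edge(u, v):
        graph[u].discard(v)
        graph[v].discard(u)

    while True:
        # collect the floating edges: both endpoints exceed the limit
        floating, seen = [], set()
        for u in graph:
            for v in graph[u]:
                if v not in seen and excess(u) > 0 and excess(v) > 0:
                    floating.append((u, v))
            seen.add(u)
        if not floating:
            break
        # criterion 1: remove the floating edge with maximum m(u-v)
        u, v = max(floating, key=lambda e: min(excess(e[0]), excess(e[1])))
        remove_edge(u, v)
    # clean up the remaining sinking edges
    for u in list(graph):
        while excess(u) > 0:
            remove_edge(u, next(iter(graph[u])))
    return graph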
I am learning to solve a topological sort problem on LeetCode:
There are a total of n courses you have to take, labeled from 0 to n-1.
Some courses may have prerequisites, for example to take course 0 you have to first take course 1, which is expressed as a pair: [0,1]
Given the total number of courses and a list of prerequisite pairs, is it possible for you to finish all courses?
Example 1:
Input: 2, [[1,0]]
Output: true
Explanation: There are a total of 2 courses to take.
To take course 1 you should have finished course 0. So it is possible.
Example 2:
Input: 2, [[1,0],[0,1]]
Output: false
Explanation: There are a total of 2 courses to take.
To take course 1 you should have finished course 0, and to take course 0 you should
also have finished course 1. So it is impossible.
Note:
The input prerequisites is a graph represented by a list of edges, not adjacency matrices. Read more about how a graph is represented.
You may assume that there are no duplicate edges in the input prerequisites.
I read the following toposort solution in the discussion area:
from collections import defaultdict

class Solution5:
    def canFinish(self, numCourses, prerequirements):
        """
        :type numCourses: int
        :type prerequirements: List[List[int]]
        :rtype: bool
        """
        if not prerequirements: return True
        count = []
        in_degrees = defaultdict(int)
        graph = defaultdict(list)
        for u, v in prerequirements:
            graph[v].append(u)
            in_degrees[u] += 1  # Confused here
        queue = [u for u in graph if in_degrees[u] == 0]
        while queue:
            s = queue.pop()
            count.append(s)
            for v in graph[s]:
                in_degrees[v] -= 1
                if in_degrees[v] == 0:
                    queue.append(v)
        # check whether a cycle exists
        for u in in_degrees:
            if in_degrees[u]:
                return False
        return True
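A quick check of the solution against the two examples from the problem statement:

s = Solution5()
print(s.canFinish(2, [[1, 0]]))           # True
print(s.canFinish(2, [[1, 0], [0, 1]]))   # False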
I am confused about in_degrees[u] += 1:
for u, v in prerequirements:
    graph[v].append(u)
    in_degrees[u] += 1  # Confused here
For a directed edge (u,v), u -----> v, node u has one outdegree while node v has one indegree.
So I think in_degrees[u] += 1 should be changed to in_degrees[v] += 1, because if the edge (u,v) exists, then v has at least one incoming edge and hence one indegree.
In-degree: this is applicable only to directed graphs. It represents the number of edges incoming to a vertex.
However, the original solution works.
What's the problem with my understanding?
Look at the line above it: graph[v].append(u). The edges actually go in the reverse direction to your assumption and to the input format. This is because for topological sort we want the items with no dependencies/incoming edges to end up at the front of the resulting order, so we direct the edges according to the interpretation "is a requirement for" rather than "requires". E.g. the input pair (0,1) means 0 requires 1, so in the graph we draw a directed edge (1,0) so that 1 can precede 0 in our sort. Thus 0 gains indegree from this pair.
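A tiny illustration of that reversal, using the pair (0,1) from the examples:

from collections import defaultdict

prerequisites = [[0, 1]]            # 0 requires 1
graph = defaultdict(list)
in_degrees = defaultdict(int)
for u, v in prerequisites:
    graph[v].append(u)              # draw the edge 1 -> 0 ("1 is required for 0")
    in_degrees[u] += 1              # 0 gains the incoming edge
print(dict(graph))                  # {1: [0]}
print(dict(in_degrees))             # {0: 1}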
I am learning about topological sort, and graphs in general. I implemented a version below using DFS, but I am having trouble understanding why the Wikipedia page says it is O(|V|+|E|), how to analyze its time complexity, and the difference between |V|+|E| and n^2 in general.
Firstly, I have two for loops, so logic says it would be O(n^2). But isn't it also true that any DAG (or tree) has n-1 edges and n vertices? How is that any different from n^2, if we can drop the '-1' as insignificant?
graph = {
1:[4, 5, 7],
2:[3,5,6],
3:[4],
4:[5],
5:[6,7],
6:[7],
7:[]
}
from collections import defaultdict

def topological_sort(graph):
    ordered, marked = [], defaultdict(int)
    while len(ordered) < len(graph):
        for vertex in graph:
            if marked[vertex] == 0:
                visit(graph, vertex, ordered, marked)
    return ordered

def visit(graph, n, ordered, marked):
    if marked[n] == 1:
        raise ValueError('Not a DAG')
    marked[n] = 1
    for neighbor in graph.get(n):
        if marked[neighbor] != 2:
            visit(graph, neighbor, ordered, marked)
    marked[n] = 2
    ordered.insert(0, n)

def main():
    print(topological_sort(graph))

main()
The proper implementation works in O(|V| + |E|) time because it goes through every edge and every vertex at most once. That is the same as O(|V|^2) for a complete (or almost complete) graph, but it is much better when the graph is sparse.
Your implementation is O(|V|^2), not O(|V| + |E|). These two nested loops:
while len(ordered) < len(graph):
    for vertex in graph:
        if marked[vertex] == 0:
            visit(graph, vertex, ordered, marked)
do 1 + 2 + ... + |V| = O(|V|^2) iterations in the worst case (for instance, for a graph with no edges). You can fix this easily, and it's that simple: just remove the outer while loop. You don't need it.
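For reference, a sketch of topological_sort with that fix applied (visit and the imports stay as in your code):

def topological_sort(graph):
    ordered, marked = [], defaultdict(int)
    for vertex in graph:             # each vertex is considered exactly once
        if marked[vertex] == 0:
            visit(graph, vertex, ordered, marked)
    return ordered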
I was working on a question on a judge that asks, for every vertex of a graph, how many vertices lie within a certain distance of it. The full question specification can be seen here. I have some Python code that solves the problem, but it is too slow.
import sys, collections
raw_input = sys.stdin.readline

n, m, k = map(int, raw_input().split())
dict1 = collections.defaultdict(set)
ans = {i: [set([i]) for j in xrange(k)] for i in xrange(1, n+1)}
for i in xrange(m):
    x, y = map(int, raw_input().split())
    dict1[x].add(y)
    dict1[y].add(x)
for point in dict1:
    ans[point][0].update(dict1[point])
for i in xrange(1, k):
    for point in dict1:
        for neighbour in dict1[point]:
            ans[point][i].update(ans[neighbour][i-1])
for i in xrange(1, n+1):
    print len(ans[i][-1])
What my code does is initially create, for each vertex, a set of its direct neighbours (distance 0 to 1). After that, it creates a new set of neighbours for each vertex from all the previously found neighbours of neighbours (distance 2). It keeps doing this, creating a new set of neighbours and incrementing the distance, until the final distance is reached. Is there a better way to solve this problem?
There are plenty of good and fast solutions.
One of them (not the fastest, but fast enough) is to use the BFS algorithm up to distance K: for every vertex, run a BFS that does not add neighbours to the queue once the distance exceeds K. K is the parameter from the exercise specification.
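A minimal sketch of such a truncated BFS, assuming graph is a dict of adjacency sets like dict1 in the question:

from collections import deque

def within_k(graph, source, k):
    """Vertices reachable from source in at most k steps (source excluded)."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        u, d = queue.popleft()
        if d == k:                   # do not expand past distance k
            continue
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                queue.append((v, d + 1))
    return seen - {source}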
I would use adjacency matrix multiplication. The adjacency matrix is a boolean square matrix of size n * n, where n is the number of vertices; adjacency_matrix[i][j] equals 1 if the edge from i to j exists and 0 otherwise. If we multiply the adjacency matrix by itself, we get the walks of length 2; if we do that again, we get the walks of length 3, and so on. In your case K <= 5, so there won't be too many of those multiplications. You can use numpy for this, and it will be very fast. In pseudocode, the solution to your problem would look like this:
adjacency_matrix = build_adjacency_matrix_from_input()
initial_adjacency_matrix = adjacency_matrix
result_matrix = adjacency_matrix
for i = 2 to K:
    adjacency_matrix = adjacency_matrix * initial_adjacency_matrix
    result_matrix = result_matrix + adjacency_matrix
for each row of result_matrix, print how many values greater than 0 it contains
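And a hedged numpy version of that pseudocode; adjacency_matrix is assumed to be an n x n 0/1 numpy array built from the input (the build step is left out, as in the pseudocode):

import numpy as np

def count_within_k(adjacency_matrix, K):
    initial = adjacency_matrix
    power = adjacency_matrix
    result = adjacency_matrix.copy()
    for _ in range(2, K + 1):
        power = power @ initial               # walks one step longer
        result = result + power
    # one count per vertex: how many vertices lie within distance K of it
    return [int(np.count_nonzero(row)) for row in result]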
You want paths of length <= K. In this case, BFS can easily be used to find paths up to a given length. Alternatively, if your graph uses an adjacency matrix representation, matrix multiplication can also be used for this purpose.
If using BFS:
This is equivalent to performing a level-by-level traversal starting from a given source vertex. Here is pseudocode that computes all the vertices at distance at most K from a given source vertex:
Start: Let s be your source vertex and let K be the maximum path length required
Create two queues Q1 and Q2, and insert the source vertex s into Q1
Let queueTobeEmptied = Q1 // the queue that is to be emptied
Let queueTobeFilled = Q2 // the queue used for inserting newly discovered vertices
Let Result be a vector of vertices, initialized to be empty
Note: the source vertex s is at level 0; push it to the Result vector too if required
for (current_level = 1; current_level <= K; current_level++) {
    while (queueTobeEmptied is not empty) {
        remove a vertex from queueTobeEmptied and call it u
        for each adjacent vertex v of u {
            if v is not already visited {
                mark v as visited
                insert v into queueTobeFilled
                push v to Result
            }
        }
    }
    swap the queues for the next iteration of the for loop: swap(queueTobeEmptied, queueTobeFilled)
}
Empty Q1 and Q2
End: Result is the vector that contains all the vertices at distance <= K from s
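For completeness, a Python sketch of the pseudocode above, assuming graph is a dict of adjacency lists:

from collections import deque

def vertices_within_k(graph, s, K):
    visited = {s}
    result = []                          # vertices at distance 1..K from s
    to_be_emptied, to_be_filled = deque([s]), deque()
    for level in range(1, K + 1):
        while to_be_emptied:
            u = to_be_emptied.popleft()
            for v in graph[u]:
                if v not in visited:
                    visited.add(v)
                    to_be_filled.append(v)
                    result.append(v)
        # swap the queues for the next level
        to_be_emptied, to_be_filled = to_be_filled, to_be_emptied
    return result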