Distance between two nodes using breadth-first search in Python

How can I get just the distance (number of edges) between any two nodes of a graph using BFS?
I do not want to store the path as a list (as the code below does), to reduce the runtime of the code.
from collections import deque

def check_distance(self, start, end, max_distance):
    # each queue entry is a whole path (list of nodes) starting at start
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return len(path) - 1  # number of edges on the path
        elif len(path) - 1 > max_distance:
            return False
        else:
            for adjacent in self.graph.get(node, []):
                queue.append(list(path) + [adjacent])

You can increase performance with two changes:
As you said, replace paths with distances. This will save memory, more so when the distances are large.
Maintain a set of already seen nodes. This will drastically cut the number of possible paths, especially when there are multiple edges per node. If you don't do this, then the algorithm will walk in circles and back-and-forth between nodes.
I would try something like this:
from collections import deque

class Foo:
    def __init__(self, graph):
        self.graph = graph

    def check_distance(self, start, end, max_distance):
        queue = deque([(start, 0)])
        seen = set()
        while queue:
            node, distance = queue.popleft()
            if node in seen or max_distance < distance:
                continue
            seen.add(node)
            if node == end:
                return distance
            for adjacent in self.graph.get(node, []):
                queue.append((adjacent, distance + 1))
        # falls through and returns None if end is unreachable within max_distance

graph = {}
graph[1] = [2, 3]
graph[2] = [4]
graph[4] = [5]
foo = Foo(graph)
assert foo.check_distance(1, 2, 10) == 1
assert foo.check_distance(1, 3, 10) == 1
assert foo.check_distance(1, 4, 10) == 2
assert foo.check_distance(1, 5, 10) == 3
assert foo.check_distance(2, 2, 10) == 0
assert foo.check_distance(2, 1, 10) is None
assert foo.check_distance(2, 4, 10) == 1
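A common variant marks a node as seen when it is enqueued rather than when it is dequeued, so each node enters the queue at most once and the search can stop as soon as the target shows up as a neighbour. The following is a minimal sketch of that variant, not the answer's code; the function name `bfs_distance` and the optional `max_distance=None` parameter are my own choices, and the graph is assumed to be the same adjacency dict used above.

```python
from collections import deque


def bfs_distance(graph, start, end, max_distance=None):
    """Return the number of edges on a shortest start->end path, or None."""
    if start == end:
        return 0
    seen = {start}                       # nodes are marked when enqueued
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if max_distance is not None and distance >= max_distance:
            continue                     # any child would exceed the limit
        for adjacent in graph.get(node, []):
            if adjacent == end:
                return distance + 1      # first time we reach end is shortest
            if adjacent not in seen:
                seen.add(adjacent)
                queue.append((adjacent, distance + 1))
    return None                          # unreachable (within max_distance)


graph = {1: [2, 3], 2: [4], 4: [5]}
print(bfs_distance(graph, 1, 5))         # 3
```

Returning `None` for "not found" matches what the answer's `check_distance` does implicitly.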


Finding a way through a matrix using Python

I'm trying to solve a problem for my university homework. The task is to find the cheapest path through an NxN matrix where every point in the matrix stores a random integer between 0 and 9. The start is at (0, 0) and the end is the bottom-right corner (N-1, N-1). The output should consist of the cheapest path as a list of tuples and the cost of the path (the sum of the values at each step).
I have tried using a tree where (0, 0) is the root and its children are its neighbours in the matrix, the children of the children are their neighbours, and so on. Then I wanted to add up all the nodes on each branch that ends with the bottom-right corner as the last child, but I didn't get the tree working in the first place. We haven't covered trees in our lectures yet, so I'm open to any other solution for this problem. Thank you :)
import random
import math

def Matrix_gen(n):
    # Generate an n*n matrix with random values
    matrix = []
    for i in range(n):
        matrix.append([])
        for j in range(n):
            matrix[i].append(random.randint(0, 9))
    return matrix

MATRIX = Matrix_gen(5)

def get_neighbour(i, j, matrix):
    neighbours = []
    n = len(matrix) - 1
    for x in range(len(matrix)-1):
        for y in range(len(matrix)-1):
            if x != n:
                if matrix[x+1][y] == matrix[i][j]:
                    neighbours.append((x + 1, y))
            if x != 0:
                if matrix[x-1][y] == matrix[i][j]:
                    neighbours.append((x - 1, y))
            if y != n:
                if matrix[x][y + 1] == matrix[i][j]:
                    neighbours.append((x, y + 1))
            if y != 0:
                if matrix[x][y - 1] == matrix[i][j]:
                    neighbours.append((x, y - 1))
    if matrix[i][j] == matrix[n][n]:
        return []
    return neighbours

# create a class that stores a tree
class Tree:
    def __init__(self, value, Children=None):
        self.value = value
        # a mutable default ([]) would be shared by every Tree instance
        self.Children = [] if Children is None else Children

    # the root of the tree is the first element of the matrix
    def root(self):
        # add (0,0) as the value of the root
        self.value = (0,0)
        return self.value

    # add the neighbours of the root as the children of the root
    def add_children(self, matrix):
        # add the neighbours of the lowest node as the children of the lowest node until
        # a node has no neighbours
        while get_neighbour(self.value[0], self.value[1], matrix) != []:
            self.Children.append(get_neighbour(self.value[0], self.value[1], matrix))
            self.value = self.Children[-1]
        return self.Children

    # print the tree
    def print_tree(self):
        print(self.value)
        for i in self.Children:
            print(i)
        return

# Create the tree (an instance of the class Tree)
tree = Tree((0,0))
tree.add_children(MATRIX)
tree.print_tree()
Please read the open letter to students before copying and pasting any of this. Seek help from your tutor if things are unclear.
Disclaimer: because this is homework, this is (intentionally) not a complete answer. It assumes that we are NOT allowed to move diagonally, and the code below only considers moves down and to the right. Allowing diagonal moves adds complexity to the path generation and is left as an exercise (the needed flexibility is there).
The code takes longer and longer the bigger N gets, because the number of paths through a grid grows combinatorially with N (see the number of lattice paths on a grid). See the benchmark below.
I tried to keep the code readable and understandable; there are more compact and probably better-optimized ways to do this (happy to take comments, as long as readability is maintained).
Let's start with a set of functions.
from itertools import permutations
import numpy as np

DOWN = 'D'
RIGHT = 'R'

def random_int_matrix(size: int) -> np.ndarray:
    """Generates a size x size matrix with random integers from 0 to 9"""
    mat = np.random.random((size, size)) * 10
    return mat.astype(int)

def find_all_paths(size: int):
    """Creates all possible paths going down and right"""
    return [gen_path(perm) for perm in permutations([DOWN] * (size-1) + [RIGHT] * (size-1))]

def gen_path(permutation: tuple) -> list:
    """Turns a sequence of DOWN/RIGHT moves into a list of (row, col) positions"""
    track = [(0, 0)]
    for entry in permutation:
        if entry == DOWN:
            track.append((track[-1][0] + 1, track[-1][1]))
        else:
            track.append((track[-1][0], track[-1][1] + 1))
    return track

def sum_track_values(mat: np.ndarray, track: list) -> int:
    """Computes the value sum for the given path"""
    return sum([mat[e[0], e[1]] for e in track])
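Note that `permutations()` treats the repeated 'D' and 'R' entries as distinct, so `find_all_paths` returns every path (N-1)!·(N-1)! times: 720 lists for N = 4, although only 20 distinct down/right paths exist. A sketch that yields each path exactly once, by choosing which of the 2(N-1) steps go down with `itertools.combinations`, could look like this; the name `find_distinct_paths` is hypothetical, and it produces the same `[(row, col), ...]` lists as `gen_path`, so it could stand in for `find_all_paths` when computing the costs.

```python
from itertools import combinations


def find_distinct_paths(size):
    """Yield each down/right path from (0, 0) to (size-1, size-1) exactly once."""
    steps = 2 * (size - 1)
    # choose the step indices that move down; all other steps move right
    for down_positions in combinations(range(steps), size - 1):
        downs = set(down_positions)
        track = [(0, 0)]
        for k in range(steps):
            row, col = track[-1]
            track.append((row + 1, col) if k in downs else (row, col + 1))
        yield track


print(len(list(find_distinct_paths(4))))   # 20
```

The count of distinct paths is the binomial coefficient C(2(N-1), N-1), e.g. 252 for N = 6, so this still grows quickly but far more slowly than (2N-2)!.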
OK, now we can run the program:
MATRIX_SIZE = 4
matrix = random_int_matrix(MATRIX_SIZE)
print('Randomly generated matrix:\n', matrix)
paths = find_all_paths(MATRIX_SIZE)
costs = np.array([sum_track_values(matrix, p) for p in paths])
min_idx = costs.argmin()
print('Best path:', paths[min_idx])
print('Costs:', costs[min_idx])
In my case the result was:
Randomly generated matrix:
[[3 8 6 6]
[2 4 1 4]
[7 4 0 4]
[9 6 8 4]]
Best path: [(0, 0), (1, 0), (1, 1), (1, 2), (2, 2), (2, 3), (3, 3)]
Costs: 18
Small benchmark:
Runtime for N=1: 0.0000 sec (1 possible paths)
Runtime for N=2: 0.0000 sec (2 possible paths)
Runtime for N=3: 0.0001 sec (24 possible paths)
Runtime for N=4: 0.0016 sec (720 possible paths)
Runtime for N=5: 0.1344 sec (40,320 possible paths)
Runtime for N=6: 19.9810 sec (3,628,800 possible paths)
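Under the same down/right-only assumption, the brute force can be replaced by a standard dynamic-programming recurrence that runs in O(N²) time: the cheapest cost of reaching (i, j) is matrix[i][j] plus the smaller of the cheapest costs of reaching the cell above and the cell to the left. The function below is a sketch with a hypothetical name, not part of the answer; it accepts a list of lists or a NumPy array. If moves in all four directions were allowed, this recurrence would no longer apply and a shortest-path algorithm such as Dijkstra's would be needed instead.

```python
def cheapest_down_right_path(matrix):
    """Return (path, cost) of the cheapest down/right path in a square matrix."""
    n = len(matrix)
    # cost[i][j]: cheapest sum of a down/right path from (0, 0) to (i, j)
    cost = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == 0 and j == 0:
                best_prev = 0
            elif i == 0:
                best_prev = cost[i][j - 1]
            elif j == 0:
                best_prev = cost[i - 1][j]
            else:
                best_prev = min(cost[i - 1][j], cost[i][j - 1])
            cost[i][j] = best_prev + matrix[i][j]

    # walk back from the bottom-right corner through the cheaper predecessor
    i, j = n - 1, n - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        if i == 0:
            j -= 1
        elif j == 0 or cost[i - 1][j] <= cost[i][j - 1]:
            i -= 1
        else:
            j -= 1
        path.append((i, j))
    path.reverse()
    return path, cost[n - 1][n - 1]


example = [[3, 8, 6, 6],
           [2, 4, 1, 4],
           [7, 4, 0, 4],
           [9, 6, 8, 4]]
print(cheapest_down_right_path(example))
```

For the 4x4 matrix printed above this gives the same best path and a cost of 18.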

Shortest path in a graph using BFS in Python

I'm trying to return the length (an int) of the shortest path in a graph using BFS. The idea is to use a queue q, append [node, distance] pairs to it, increase the distance as we traverse, and return the distance the first time we hit the destination, since that is the shortest path. But I get this error:
currNode,distance = q.popleft()
ValueError: not enough values to unpack (expected 2, got 1)
def shortestPath(graph,nodeA,nodeB):
    q = deque((nodeA,0))
    visited = set(nodeA)
    while q:
        currNode,distance = q.popleft()
        if currNode == nodeB:
            return distance
        for neighbor in graph[currNode]:
            if neighbor not in visited:
                visited.add(neighbor)
                q.append([neighbor,distance+1])
    return -1

graph_three = {
    'w':['x','v'],
    'x':['w','y'],
    'y':['x','z'],
    'z':['y','v'],
    'v':['z','w']
}
print(shortestPath(graph_three,'w','z'))
deque takes an iterable of elements as input. You gave it a tuple, so your deque contains two elements (nodeA and 0) instead of the single (node, distance) tuple you expected.
Change the second line of the function to:
q = deque([(nodeA,0)])
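The difference is easy to see in isolation. This small sketch (the variable names are illustrative only) shows what each constructor call stores and reproduces the unpacking error from the question.

```python
from collections import deque

node_a = 'w'

wrong = deque((node_a, 0))     # iterates the tuple: two separate items
right = deque([(node_a, 0)])   # iterates the list: one (node, distance) item

print(list(wrong))   # ['w', 0]
print(list(right))   # [('w', 0)]

node, distance = right[0]      # unpacks fine: 'w', 0
try:
    node, distance = wrong[0]  # wrong[0] is the one-character string 'w'
except ValueError as exc:
    print(exc)                 # not enough values to unpack (expected 2, got 1)
```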
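Similarly, visited = set(nodeA) only works here because every node name is a single character (set('ab') is {'a', 'b'}, not {'ab'}); visited = {nodeA} is safer. Here is also a cleaner implementation of BFS: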
import collections

def shortestPath(graph, root, target):
    if root == target:
        return 0
    q = collections.deque([root])
    visited = {root}
    distance = 0
    while q:
        # process the queue one BFS level at a time
        for _ in range(len(q)):
            node = q.popleft()
            for neighbor in graph[node]:
                if neighbor == target:
                    return distance + 1
                elif neighbor not in visited:
                    visited.add(neighbor)
                    q.append(neighbor)
        distance += 1
    return -1

Python graph: how to get a sorted list of the nodes reachable from a node

I am faced with the question below. Any ideas? Please help.
Two functions are given: one creates a random digraph, and the other uses DFS to find a path between two nodes. The problem is to return, in increasing order, all nodes m that can be reached from a node n.
Example:
g = {0: [2], 1: [8, 3], 2: [4, 3, 8], 3: [4, 2, 0], 4: [8, 0], 5: [4, 1, 3], 8: [2, 0, 5, 3, 1]}
n = 8
Output
[0, 1, 2, 3, 4, 5]
import random

# You are given this function - do not modify
def createRandomGraph():
    """Creates a digraph with 7 randomly chosen integer nodes from 0 to 9 and
    randomly chosen directed edges (between 10 and 20 edges)
    """
    g = {}
    n = random.sample([0,1,2,3,4,5,6,7,8,9], 7)
    for i in n:
        g[i] = []
    edges = random.randint(10,20)
    count = 0
    while count < edges:
        a = random.choice(n)
        b = random.choice(n)
        if b not in g[a] and a != b:
            g[a].append(b)
            count += 1
    return g

# You are given this function - do not modify
def findPath(g, start, end, path=[]):
    """ Uses DFS to find a path between a start and an end node in g.
    If no path is found, returns None. If a path is found, returns the
    list of nodes """
    path = path + [start]
    if start == end:
        return path
    if not start in g:
        return None
    for node in g[start]:
        if node not in path:
            newpath = findPath(g, node, end, path)
            if newpath: return newpath
    return None

#########################
## WRITE THIS FUNCTION ##
#########################
def allReachable(g, n):
    """
    Assumes g is a directed graph and n a node in g.
    Returns a sorted list (increasing by node number) containing all
    nodes m such that there is a path from n to m in g.
    Does not include the node itself.
    """
    #TODO
I am very new to python graphs and I really need some help here.
You can choose a random node m and check whether it's reachable from n using the supplied DFS function.
If a path is found, you receive that path (all the nodes on that path are also reachable from n).
You then repeat the process with a node that isn't on any returned path, until no nodes remain to be checked.
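In Python, using the supplied findPath:
def allReachable(g, n):
    unchecked = set(g) - {n}        # candidate nodes not yet tested
    reachable = set()
    while unchecked:
        node = unchecked.pop()
        path = findPath(g, n, node)  # DFS; None if node is unreachable
        if path is not None:
            # every node on the returned path is reachable from n
            reachable.update(path)
            unchecked -= set(path)
    reachable.discard(n)
    return sorted(reachable)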
def allReachable(g, n):
    nodes = []
    for key in g:
        path = findPath(g,n,key)
        if path is not None:
            for i in path:
                if i not in nodes:
                    nodes.append(i)
    nodes.remove(n)
    return sorted(nodes)
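Calling findPath once per candidate node can repeat a lot of work. Since the task only asks which nodes are reachable, a single depth-first traversal from n visits every reachable node once. This is a sketch that does not use the supplied findPath (check whether the assignment allows that); the name `all_reachable_iterative` is hypothetical.

```python
def all_reachable_iterative(g, n):
    """Sorted list of nodes reachable from n in digraph g, excluding n itself."""
    seen = {n}
    stack = [n]                          # explicit stack: iterative DFS
    while stack:
        node = stack.pop()
        for succ in g.get(node, []):     # nodes without an entry have no out-edges
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    seen.discard(n)
    return sorted(seen)


g = {0: [2], 1: [8, 3], 2: [4, 3, 8], 3: [4, 2, 0], 4: [8, 0], 5: [4, 1, 3], 8: [2, 0, 5, 3, 1]}
print(all_reachable_iterative(g, 8))     # [0, 1, 2, 3, 4, 5]
```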

How to find the kth smallest node in BST? (revisited)

I asked a similar question yesterday, but I reached a different solution from the one posted in the original question, so I am reposting with new code. I am now keeping track of the number of left and right children of each node. The code works for some cases, but it fails, for example, when finding the 6th smallest element. The problem is that I somehow need to carry the number of children down the tree. For example, for node 5, I need to carry over the rank of node 4, and I am not able to do that.
This is not homework; I am preparing for interviews, this is one of the classic questions, and I can't solve it.
class Node:
    """docstring for Node"""
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None
        self.numLeftChildren = 0
        self.numRightChildren = 0

class BSTree:
    def __init__(self):
        # initializes the root member
        self.root = None

    def addNode(self, data):
        # creates a new node and returns it
        return Node(data)

    def insert(self, root, data):
        # inserts a new data
        if root is None:
            # it there isn't any data
            # adds it and returns
            return self.addNode(data)
        else:
            # enters into the tree
            if data <= root.data:
                root.numLeftChildren += 1
                # if the data is less than the stored one
                # goes into the left-sub-tree
                root.left = self.insert(root.left, data)
            else:
                # processes the right-sub-tree
                root.numRightChildren += 1
                root.right = self.insert(root.right, data)
            return root

    def getRankOfNumber(self, root, x, rank):
        if root is None:
            return 0
        if rank == x:
            return root.data
        else:
            if x > rank:
                return self.getRankOfNumber(root.right, x, rank+1+root.right.numLeftChildren)
            if x <= rank:
                return self.getRankOfNumber(root.left, x, root.left.numLeftChildren+1)

# main
btree = BSTree()
root = btree.addNode(13)
btree.insert(root, 3)
btree.insert(root, 14)
btree.insert(root, 1)
btree.insert(root, 4)
btree.insert(root, 18)
btree.insert(root, 2)
btree.insert(root, 12)
btree.insert(root, 10)
btree.insert(root, 5)
btree.insert(root, 11)
btree.insert(root, 8)
btree.insert(root, 7)
btree.insert(root, 9)
btree.insert(root, 6)
print(btree.getRankOfNumber(root, 8, rank=root.numLeftChildren+1))
You have the rank of a node. You need to find the rank of its left or right child. Well, how many nodes are between the node and its child?
        a
       / \
      /   \
     b     c
    / \   / \
   W   X Y   Z
Here's an example BST. Lowercase letters are nodes; uppercase are subtrees. The number of nodes between a and b is the number of nodes in X. The number of nodes between a and c is the number of nodes in Y. Thus, you can compute the rank of b or c from the rank of a and the size of X or Y.
rank(b) == rank(a) - size(X) - 1
rank(c) == rank(a) + size(Y) + 1
You had the c formula, but the wrong b formula.
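Applied repeatedly, these formulas give the usual k-th-smallest search. Below is a minimal sketch, assuming each node stores the size of its own subtree (a `size` field instead of the question's numLeftChildren/numRightChildren); `Node`, `insert`, `size` and `kth_smallest` here are hypothetical names, not the question's classes. Working with k relative to the current subtree is equivalent to the two formulas above: moving to the left child keeps k, and moving to the right child subtracts size(left subtree) + 1.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None
        self.size = 1                    # number of nodes in this subtree


def insert(root, data):
    """Insert data into the BST and return the (possibly new) root."""
    if root is None:
        return Node(data)
    root.size += 1                       # the new node ends up below root
    if data <= root.data:
        root.left = insert(root.left, data)
    else:
        root.right = insert(root.right, data)
    return root


def size(node):
    return node.size if node else 0


def kth_smallest(root, k):
    """Return the k-th smallest value (1-based), or None if k is out of range."""
    node = root
    while node is not None:
        rank = size(node.left) + 1       # rank of node within its own subtree
        if k == rank:
            return node.data
        if k < rank:
            node = node.left             # answer lies in the left subtree
        else:
            k -= rank                    # skip the left subtree and this node
            node = node.right
    return None


root = None
for value in [13, 3, 14, 1, 4, 18, 2, 12, 10, 5, 11, 8, 7, 9, 6]:
    root = insert(root, value)
print(kth_smallest(root, 6))             # 6
```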

Code is taking too much time

I wrote code that arranges the numbers from 1 to n (n is taken from user input) so that the sum of every two adjacent numbers is prime. For inputs up to 10 the code works fine; beyond that the system hangs. Please let me know how to optimize it.
Example input: 8
Expected answer: (1, 2, 3, 4, 7, 6, 5, 8)
Code:
import itertools

x = raw_input("please enter a number")
range_x = range(int(x)+1)
del range_x[0]
result = list(itertools.permutations(range_x))

def prime(x):
    # trial division by 2 and by odd numbers up to sqrt(x)
    if x < 2:
        return False
    if x % 2 == 0:
        return x == 2
    for i in xrange(3, int(x ** 0.5) + 1, 2):
        if x % i == 0:
            return False
    return True

def is_prime(a):
    # True if every pair of adjacent numbers in a sums to a prime
    for i in xrange(len(a) - 1):
        if not prime(a[i] + a[i+1]):
            return False
    return True

for candidate in result:
    if is_prime(candidate):
        print 'result is:'
        print candidate
        break
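The requirement in the question can be checked mechanically, which is handy for testing any of the answers below. This is a small sketch with hypothetical names (`is_valid_arrangement`, plus a trial-division `is_prime` that is adequate for the small sums involved).

```python
def is_prime(k):
    """Trial division; fine for the small sums in this problem."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def is_valid_arrangement(seq, n):
    """True if seq is a permutation of 1..n and every adjacent sum is prime."""
    if sorted(seq) != list(range(1, n + 1)):
        return False
    return all(is_prime(a + b) for a, b in zip(seq, seq[1:]))


print(is_valid_arrangement((1, 2, 3, 4, 7, 6, 5, 8), 8))   # True
print(is_valid_arrangement((1, 2, 3, 4, 5, 6, 7, 8), 8))   # False: 4 + 5 = 9
```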
For posterity ;-), here's one more based on finding a Hamiltonian path. It's Python3 code. As written, it stops upon finding the first path, but can easily be changed to generate all paths. On my box, it finds a solution for all n in 1 through 900 inclusive in about one minute total. For n somewhat larger than 900, it exceeds the maximum recursion depth.
The prime generator (psieve()) is vast overkill for this particular problem, but I had it handy and didn't feel like writing another ;-)
The path finder (ham()) is a recursive backtracking search, using what's often (but not always) a very effective ordering heuristic: of all the vertices adjacent to the last vertex in the path so far, look first at those with the fewest remaining exits. For example, this is "the usual" heuristic applied to solving Knight's Tour problems. In that context, it often finds a tour with no backtracking needed at all. Your problem appears to be a little tougher than that.
def psieve():
    import itertools
    yield from (2, 3, 5, 7)
    D = {}
    ps = psieve()
    next(ps)
    p = next(ps)
    assert p == 3
    psq = p*p
    for i in itertools.count(9, 2):
        if i in D:      # composite
            step = D.pop(i)
        elif i < psq:   # prime
            yield i
            continue
        else:           # composite, = p*p
            assert i == psq
            step = 2*p
            p = next(ps)
            psq = p*p
        i += step
        while i in D:
            i += step
        D[i] = step

def build_graph(n):
    primes = set()
    for p in psieve():
        if p > 2*n:
            break
        else:
            primes.add(p)

    np1 = n+1
    adj = [set() for i in range(np1)]
    for i in range(1, np1):
        for j in range(i+1, np1):
            if i+j in primes:
                adj[i].add(j)
                adj[j].add(i)
    return set(range(1, np1)), adj

def ham(nodes, adj):
    class EarlyExit(Exception):
        pass

    def inner(index):
        if index == n:
            raise EarlyExit
        avail = adj[result[index-1]] if index else nodes
        for i in sorted(avail, key=lambda j: len(adj[j])):
            # Remove vertex i from the graph. If this isolates
            # more than 1 vertex, no path is possible.
            result[index] = i
            nodes.remove(i)
            nisolated = 0
            for j in adj[i]:
                adj[j].remove(i)
                if not adj[j]:
                    nisolated += 1
                    if nisolated > 1:
                        break
            if nisolated < 2:
                inner(index + 1)
            nodes.add(i)
            for j in adj[i]:
                adj[j].add(i)

    n = len(nodes)
    result = [None] * n
    try:
        inner(0)
    except EarlyExit:
        return result

def solve(n):
    nodes, adj = build_graph(n)
    return ham(nodes, adj)
This answer is based on @Tim Peters' suggestion about Hamiltonian paths.
There are many possible solutions. To avoid excessive memory consumption for intermediate solutions, a random path can be generated. This also makes it easy to use multiple CPUs (each CPU generates its own paths in parallel).
import multiprocessing as mp
import sys

def main():
    number = int(sys.argv[1])

    # directed graph, vertices: 1..number (including ends)
    # there is an edge between i and j if (i+j) is prime
    vertices = range(1, number+1)
    G = {} # vertex -> adjacent vertices
    is_prime = sieve_of_eratosthenes(2*number+1)
    for i in vertices:
        G[i] = []
        for j in vertices:
            if is_prime[i + j]:
                G[i].append(j) # there is an edge from i to j in the graph

    # utilize multiple cpus
    q = mp.Queue()
    for _ in range(mp.cpu_count()):
        p = mp.Process(target=hamiltonian_random, args=[G, q])
        p.daemon = True # do not survive the main process
        p.start()
    print(q.get())

if __name__=="__main__":
    main()
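where sieve_of_eratosthenes is defined as follows (in a single script, define it and hamiltonian_random above the if __name__=="__main__": guard):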
def sieve_of_eratosthenes(limit):
    is_prime = [True]*limit
    is_prime[0] = is_prime[1] = False # zero and one are not primes
    # every composite below limit has a prime factor <= int(limit**.5)
    for n in range(int(limit**.5) + 1):
        if is_prime[n]:
            for composite in range(n*n, limit, n):
                is_prime[composite] = False
    return is_prime
and:
import random

def hamiltonian_random(graph, result_queue):
    """Build random paths until Hamiltonian path is found."""
    vertices = list(graph.keys())
    while True:
        # build random path
        path = [random.choice(vertices)] # start with a random vertex
        while True: # until path can be extended with a random adjacent vertex
            neighbours = graph[path[-1]]
            random.shuffle(neighbours)
            for adjacent_vertex in neighbours:
                if adjacent_vertex not in path:
                    path.append(adjacent_vertex)
                    break
            else: # can't extend path
                break

        # check whether it is hamiltonian
        if len(path) == len(vertices):
            assert set(path) == set(vertices)
            result_queue.put(path) # found hamiltonian path
            return
Example
$ python order-adjacent-prime-sum.py 20
Output
[19, 18, 13, 10, 1, 4, 9, 14, 5, 6, 17, 2, 15, 16, 7, 12, 11, 8, 3, 20]
The output is a random sequence that satisfies the conditions:
it is a permutation of the numbers from 1 to 20 (inclusive)
the sum of any two adjacent numbers is prime
Time performance
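It takes around 10 seconds on average to get a result for n = 900; extrapolating the time as an exponential function, it should take around 20 seconds for n = 1000.
[Figure: log2(time in seconds) versus N, showing the time measurements and two exponential fits]
The figure was generated using this code: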
import numpy as np
figname = 'hamiltonian_random_noset-noseq-900-900'
Ns, Ts = np.loadtxt(figname+'.xy', unpack=True)
# use polyfit to fit the data
# y = c*a**n
# log y = log (c * a ** n)
# log Ts = log c + Ns * log a
coeffs = np.polyfit(Ns, np.log2(Ts), deg=1)
poly = np.poly1d(coeffs, variable='Ns')
# use curve_fit to fit the data
from scipy.optimize import curve_fit
def func(x, a, c):
    return c*a**x
popt, pcov = curve_fit(func, Ns, Ts)
aa, cc = popt
a, c = 2**coeffs
# plot it
import matplotlib.pyplot as plt
plt.figure()
plt.plot(Ns, np.log2(Ts), 'ko', label='time measurements')
plt.plot(Ns, np.polyval(poly, Ns), 'r-',
         label=r'$time = %.2g\times %.4g^N$' % (c, a))
plt.plot(Ns, np.log2(func(Ns, *popt)), 'b-',
         label=r'$time = %.2g\times %.4g^N$' % (cc, aa))
plt.xlabel('N')
plt.ylabel('log2(time in seconds)')
plt.legend(loc='upper left')
plt.show()
Fitted values:
>>> c*a**np.array([900, 1000])
array([ 11.37200806, 21.56029156])
>>> func([900, 1000], *popt)
array([ 14.1521409 , 22.62916398])
A recursive backtracking search, to the rescue:
def is_prime(n):
    # trial division; 0 and 1 are not prime
    return n > 1 and all(n % i != 0 for i in range(2, int(n ** 0.5) + 1))

def order(numbers, current=()):
    if not numbers:
        return list(current)
    for i, n in enumerate(numbers):
        if current and not is_prime(n + current[-1]):
            continue
        result = order(numbers[:i] + numbers[i + 1:], current + (n,))
        if result:
            return result
    return False

result = order(list(range(1, 500)))
for i in range(len(result) - 1):
    assert is_prime(result[i] + result[i + 1])
You can force it to work for even larger lists by increasing the maximum recursion depth.
Here's my take on a solution. As Tim Peters pointed out, this is a Hamiltonian path problem.
So the first step is to generate the graph in some form.
Well, the zeroth step in this case is to generate prime numbers. I'm going to use a sieve, but any primality test is fine. We need primes up to 2 * n, since that is the largest sum any two of the numbers can have.
m = 8
n = m + 1 # Just so I don't have to worry about zero indexes and random +/- 1's
primelen = 2 * m
prime = [True] * primelen
prime[0] = prime[1] = False
for i in range(4, primelen, 2):
    prime[i] = False
for i in range(3, primelen, 2):
    if not prime[i]:
        continue
    for j in range(i * i, primelen, i):
        prime[j] = False
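OK, now we can test for primality with prime[i], and it's easy to make the graph edges: if I have a number i, which numbers can come next? I'll also make use of the fact that i and j must have opposite parity (two distinct odd or two distinct even numbers sum to an even number greater than 2, which is not prime).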
pairs = [set(j for j in range(i%2+1, n, 2) if prime[i+j])
         for i in range(n)]
So here pairs[i] is a set whose elements are the integers j such that i+j is prime.
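To see what pairs holds, here is a self-contained sketch for m = 8 that rebuilds it with a compact sieve (a slight rewrite of the sieve above, not the answer's exact code). Every consecutive pair in the expected answer (1, 2, 3, 4, 7, 6, 5, 8) is an edge in this graph.

```python
m = 8
n = m + 1
limit = 2 * m                            # sums of distinct numbers <= m are < 2m
prime = [k > 1 for k in range(limit)]
for i in range(2, int(limit ** 0.5) + 1):
    if prime[i]:
        for k in range(i * i, limit, i):
            prime[k] = False

# pairs[i]: the numbers j of opposite parity with i + j prime (index 0 unused)
pairs = [set(j for j in range(i % 2 + 1, n, 2) if prime[i + j]) for i in range(n)]

print(sorted(pairs[1]))                  # [2, 4, 6]
print(sorted(pairs[8]))                  # [3, 5]
```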
Now we need to walk the graph. This is where most of the time is spent, so all further optimizations will be done here.
chains = [
    ([], set(range(1, n)))
]
chains is going to keep track of the valid paths as we walk them. The first element in the tuple will be your result. The second element is all the unused numbers, or unvisited nodes. The idea is to take one chain out of the queue, take a step down the path and put it back.
while chains:
    chain, unused = chains.pop()
    if not chain:
        # we haven't even started, all unused are valid
        valid_next = unused
    else:
        # We need numbers that are both unused and paired with the last node
        # Using sets makes this easy
        valid_next = unused & pairs[chain[-1]]
    for num in valid_next:
        # Take a step to the new node and add the new path back to chains
        # Reminder: it's important not to mutate anything here, always make new objects
        newchain = chain + [num]
        newunused = unused - set([num])
        chains.append( (newchain, newunused) )
        # are we done?
        if not newunused:
            print(newchain)
            chains = False
Notice that if there is no valid next step, the path is removed without a replacement.
This is really memory inefficient, but runs in a reasonable time. The biggest performance bottleneck is walking the graph, so the next optimization would be popping and inserting paths in intelligent places to prioritize the most likely paths. It might be helpful to use a collections.deque or different container for your chains in that case.
EDIT
Here is an example of how you can implement path priority. We assign each path a score and keep the chains list sorted by this score. As a simple example, I suggest that paths containing "harder to use" nodes are worth more: for each step on a path, the score increases by n - len(valid_next). The modified code looks something like this:
import bisect

chains = ...
chains_score = [0]
while chains:
    chain, unused = chains.pop()
    score = chains_score.pop()
    ...
    for num in valid_next:
        newchain = chain + [num]
        newunused = unused - set([num])
        newscore = score + n - len(valid_next)
        index = bisect.bisect(chains_score, newscore)
        chains.insert(index, (newchain, newunused))
        chains_score.insert(index, newscore)
Remember that list insertion is O(n), so the overhead of adding this can be rather large. It's worth doing some analysis of your scoring algorithm to keep the queue length len(chains) manageable.
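One way to avoid the O(len(chains)) cost of list.insert is a priority queue from the standard heapq module, which makes each push and pop O(log len). This is a swapped-in technique, not the answer's code: a self-contained sketch with the hypothetical name `prime_chain`, using the same chain/unused representation and the same score (n - len(valid_next) per step). heapq is a min-heap, so scores are stored negated, and a counter breaks ties so that chains themselves are never compared.

```python
import heapq
from itertools import count


def prime_chain(m):
    """Order 1..m so adjacent sums are prime, via best-first search (or None)."""
    n = m + 1
    limit = 2 * m                        # distinct numbers <= m sum to at most 2m - 1
    prime = [k > 1 for k in range(limit)]
    for i in range(2, int(limit ** 0.5) + 1):
        if prime[i]:
            for k in range(i * i, limit, i):
                prime[k] = False
    # pairs[i]: numbers j of opposite parity with i + j prime (index 0 unused)
    pairs = [{j for j in range(i % 2 + 1, n, 2) if prime[i + j]} for i in range(n)]

    tiebreak = count()                   # unique second key: chains never compared
    heap = [(0, next(tiebreak), [], frozenset(range(1, n)))]
    while heap:
        neg_score, _, chain, unused = heapq.heappop(heap)
        if not unused:
            return chain                 # every number has been placed
        valid_next = unused & pairs[chain[-1]] if chain else unused
        for num in valid_next:
            # same scoring idea as the bisect version: constrained steps score higher
            new_score = -neg_score + n - len(valid_next)
            heapq.heappush(heap, (-new_score, next(tiebreak), chain + [num], unused - {num}))
    return None                          # no ordering exists


print(prime_chain(8))
```

The search is still exhaustive in the worst case; the heap only makes reordering the frontier cheaper.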
