Count steps between nodes using adjacency list - python

adj_list={1:[2,4],2:[1,3,4,8],3:[2,6,8,7],4:[1,5,2],5:[4,6],6:[3,9,5],7:[3,8,9,10],8:[2,3,7],9:[6,7,10],10:[7,9]}
def func(x,y):
    t=0
    xx=x
    global i
    for i in range(len(adj_list[xx])):
        if y in adj_list[xx]:
            t=t+1
            # print(x,y,t)
            break
        else:
            if xx<y:
                t = t + 1
                xx = xx + 1
                i=0
    print(x,y,t)
func(1,6)
func(1,6)
I expect output like:
func(1,10): 1-2-3-7-10 (4) or 1-2-8-7-10 (4)
where 4 is the number of steps from 1 to 10.

If you want a quick and easy pure-Python implementation, you can use recursion to traverse the adjacency list, count the number of steps it takes to reach the destination from each node, and record whichever path took the least number of steps.
def count_steps(current_vertex, destination, steps=0, calling=0):
    """
    Counts the number of steps between two nodes in an adjacency list
    :param current_vertex: Vertex to start looking from
    :param destination: Node we want to count the steps to
    :param steps: Number of steps taken so far (only used by the recursion, ignore when calling)
    :param calling: The node that called this function (only used by the recursion, ignore when calling)
    :return: The least number of steps found, or -1 if the destination was never reached
    """
    # Start at an illegal value so we know whether it was ever changed
    min_steps = -1
    # Check every vertex adjacent to the current one
    for vertex in adj_list[current_vertex]:
        if destination in adj_list[current_vertex]:
            # We found the destination among the current vertex's neighbours
            return steps + 1
        elif vertex > calling:
            # Only move to a vertex greater than the one we came from so we don't end up in a loop
            counted = count_steps(vertex, destination, steps + 1, current_vertex)
            if counted != -1 and (min_steps == -1 or counted < min_steps):
                # This is the least number of steps found so far
                min_steps = counted
    return min_steps
Note that when we find the destination in the current vertex's array we add one, because one more step is needed to actually reach the node we found.

If you're looking for the least number of steps from a specific node to any other node, I would suggest Dijkstra's algorithm. This isn't a problem that can be solved in a single loop; it requires a queue that keeps track of the smallest number of steps found so far.
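Since every edge here has the same cost, Dijkstra's algorithm reduces to breadth-first search. A minimal BFS sketch, reusing the adj_list from the question:

```python
from collections import deque

adj_list = {1: [2, 4], 2: [1, 3, 4, 8], 3: [2, 6, 8, 7], 4: [1, 5, 2],
            5: [4, 6], 6: [3, 9, 5], 7: [3, 8, 9, 10], 8: [2, 3, 7],
            9: [6, 7, 10], 10: [7, 9]}

def bfs_steps(start, goal):
    """Return the minimum number of steps from start to goal, or -1 if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, steps = queue.popleft()
        if node == goal:
            return steps
        for neighbor in adj_list[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, steps + 1))
    return -1

print(bfs_steps(1, 10))  # 4, e.g. 1-2-3-7-10
print(bfs_steps(1, 6))   # 3
```

Because BFS explores nodes in order of distance from the start, the first time it reaches the goal is guaranteed to be along a shortest path.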

You can use networkx for this. Start by building a network using the keys as nodes and the values as edges. A little extra work is necessary for the edges, however, given that edges must be lists of tuples of the form (source_node, dest_node).
One way to deal with this is to flatten all key-value combinations from the entries in the dictionary.
For the nodes you'll simply need:
nodes = list(adj_list.keys())
Now let's get the list of edges from the dictionary. For that you can use the following list comprehension:
edges = [(k,val) for k, vals in adj_list.items() for val in vals]
# [(1, 2), (1, 4), (2, 1), (2, 3), (2, 4)...
So, this list contains the entries in the dict as a flat list of tuples:
1: [2, 4] -> (1, 2), (1, 4)
2: [1, 3, 4, 8] -> (2, 1), (2, 3), (2, 4), (2, 8)
...
Now let's build the network with the corresponding nodes and edges:
import networkx as nx

G = nx.Graph()
G.add_edges_from(edges)
G.add_nodes_from(nodes)
Having built the network, in order to find the steps between different nodes, you can use shortest_path, which will give you precisely the shortest path between two given nodes. So if you wanted to find the shortest path between nodes 1 and 10:
nx.shortest_path(G, 1, 10)
# [1, 2, 3, 7, 10]
If you're interested in the number of steps, take the len of the returned list minus one (the path list includes both endpoints). Let's look at another example:
nx.shortest_path(G, 1, 6)
# [1, 2, 3, 6]
This can be checked more easily by plotting the network directly:
import matplotlib.pyplot as plt

nx.draw(G, with_labels=True)
plt.show()
In the resulting plot you can verify that the shortest path between nodes 1 and 10 runs through the nodes [1, 2, 3, 7, 10].


How to find a node with a shortest path of length equal to some number in networkx python?

I have some simple code to create a graph G in networkx:
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib notebook
G = nx.DiGraph()
G.add_edge(1,2); G.add_edge(1,4)
G.add_edge(3,1); G.add_edge(3,2)
G.add_edge(3,4); G.add_edge(2,3)
G.add_edge(4,3)
I want to find "which node in G is connected to the other nodes by a shortest path of length equal to the diameter of G ".
There are two of these combinations, [1,3] and [2,4], which can be found by nx.shortest_path(G, 1) and nx.shortest_path(G, 2), respectively.
Or, for example,
if I use nx.shortest_path_length(G, source=2) then I get {2: 0, 3: 1, 1: 2, 4: 2}, so the length 2 is from node 2 to node 4, which is OK.
Now I'm trying to generalise it for all of the nodes to see if I can find the target nodes.
for node in G.nodes():
    target = [k for k,v in nx.shortest_path_length(G, node).items() if v == nx.diameter(G)]
    print(target)
and I get this odd result:
[3]
[1, 4]
[1, 2]
[]
Can anybody explain what this result means? I'm trying to apply this method to solve a bigger problem.
For the graph you provided:
G = nx.DiGraph()
G.add_edge(1,2); G.add_edge(1,4)
G.add_edge(3,1); G.add_edge(3,2)
G.add_edge(3,4); G.add_edge(2,3)
G.add_edge(4,3)
The following:
for node in G.nodes():
    target = [k for k,v in nx.shortest_path_length(G, node).items() if v == nx.diameter(G)]
    print(target)
will, for each node, print the targets whose distance from that node is equal to nx.diameter(G).
I would advise not calculating the diameter inside the for loop since that can turn out quite expensive.
In comparison, for a 200 node graph (nx.barabasi_albert_graph(200, 2, seed=1)) with the diameter calculation outside the for loop it takes ~74 ms. The other option (with the diameter calculation inside the for loop) takes... well, it's still running :´) but I'd say it will take way too long.
Also, instead of just the targets, print the start and end nodes for readability:
diameter = nx.diameter(G)
for node in G.nodes():
    start_end_nodes = [(node, k) for k,v in nx.shortest_path_length(G, node).items() if v == diameter]
    print(start_end_nodes)
yielding:
[(1, 3)]         # the path from 1 to 3 has length 2 = nx.diameter(G)
[(2, 1), (2, 4)] # the paths from 2 to 1 and 2 to 4 have length 2
[(4, 1), (4, 2)] # the paths from 4 to 1 and 4 to 2 have length 2
[]               # there is no path starting at 3 that has length 2
A slight modification of the code from the reply of willcrack above (note the addition of calls to sorted):
diameter = nx.diameter(G)
for node in sorted(G.nodes()):
    start_end_nodes = sorted([(node, k) for k,v in nx.shortest_path_length(G, node).items()
                              if v == diameter])
    print(node, ":", start_end_nodes)
will produce:
1 : [(1, 3)]
2 : [(2, 1), (2, 4)]
3 : []
4 : [(4, 1), (4, 2)]
The point is that G.nodes() returns the nodes in an arbitrary fashion based on the internal representation of the graph which likely stores the nodes in an unsorted set-like structure.

Accessing items in a list, and forming graphs

I have a list of 2D numpy arrays:
linelist = [[[0,0],[1,0]],[[0,0],[0,1]],[[1,0],[1,1]],[[0,1],[1,1]],[[1,2],[3,1]],[[1,2],[2,2]],[[3,1],[3,2]],[[2,2],[3,2]]]
Each entry in linelist is the pair of vertices that the edge connects.
These elements are the lines that form two squares:
-----
| |
-----
-----
| |
-----
I want to form two graphs, one for each square. To do this, I use a for loop. If neither vertex is present in any existing graph, we form a new graph. If one vertex is already present in a graph, the line gets added to that graph. For two lines to be connected, they need to share a vertex. However, I am having trouble coding this.
This is what I have so far:
graphs = [[]]
i = 0
for elements in linelist:
    for graph in graphs:
        if elements[0] not in graph[i] and elements[1] not in graph[i]:
            graphs.append([])
            graphs[i].append(elements)
            i = i + 1
        else:
            graphs[i].append(elements)
I suggest doing a 'diffusion-like' process over the graph to find the disjoint subgraphs. One algorithm that comes to mind is breadth-first search; it works by looking for what nodes can be reached from a start node.
linelist = [[[0,0],[1,0]],[[0,0],[0,1]],[[1,0],[1,1]],[[0,1],[1,1]],[[1,2],[3,1]],[[1,2],[2,2]],[[3,1],[3,2]],[[2,2],[3,2]]]

# edge list usually reads v1 -> v2
graph = {}
# however these are lines so symmetry is assumed
for l in linelist:
    v1, v2 = map(tuple, l)
    graph[v1] = graph.get(v1, ()) + (v2,)
    graph[v2] = graph.get(v2, ()) + (v1,)

def BFS(graph):
    """
    Implement breadth-first search
    """
    # get nodes
    nodes = list(graph.keys())
    graphs = []
    # check all nodes
    while nodes:
        # initialize BFS
        toCheck = [nodes[0]]
        discovered = []
        # run bfs
        while toCheck:
            startNode = toCheck.pop()
            for neighbor in graph.get(startNode):
                if neighbor not in discovered:
                    discovered.append(neighbor)
                    toCheck.append(neighbor)
                    nodes.remove(neighbor)
        # add discovered graphs
        graphs.append(discovered)
    return graphs

for idx, component in enumerate(BFS(graph)):
    print(f"This is {idx} graph with nodes {component}")
Output
This is 0 graph with nodes [(1, 0), (0, 1), (0, 0), (1, 1)]
This is 1 graph with nodes [(3, 1), (2, 2), (1, 2), (3, 2)]
You may be interested in the package networkx for analyzing graphs. For instance finding the disjoint subgraphs is pretty trivial:
import networkx as nx
tmp = [tuple(tuple(j) for j in i) for i in linelist]
graph = nx.Graph(tmp)
for idx, component in enumerate(nx.connected_components(graph)):
    print(idx, component)
My approach involves 2 passes over the list. In the first pass, I will look at the vertices and assign a graph number to each (1, 2, ...) If both vertices have not been seen, I will assign a new graph number. Otherwise, assign it to an existing one.
In the second pass, I go through the list and group the edges that belong to the same graph number together. Here is the code:
import collections
import itertools
import pprint

linelist = [[[0,0],[1,0]],[[0,0],[0,1]],[[1,0],[1,1]],[[0,1],[1,1]],[[1,2],[3,1]],[[1,2],[2,2]],[[3,1],[3,2]],[[2,2],[3,2]]]

# First pass: Look at the vertices and figure out which graph they
# belong to
vertices = {}
graph_numbers = itertools.count(1)
for v1, v2 in linelist:
    v1 = tuple(v1)
    v2 = tuple(v2)
    graph_number = vertices.get(v1) or vertices.get(v2) or next(graph_numbers)
    vertices[v1] = graph_number
    vertices[v2] = graph_number

print('Vertices:')
pprint.pprint(vertices)

# Second pass: Sort edges
graphs = collections.defaultdict(list)
for v1, v2 in linelist:
    graph_number = vertices[tuple(v1)]
    graphs[graph_number].append([v1, v2])

print('Graphs:')
pprint.pprint(graphs)
Output:
Vertices:
{(0, 0): 1,
 (0, 1): 1,
 (1, 0): 1,
 (1, 1): 1,
 (1, 2): 2,
 (2, 2): 2,
 (3, 1): 2,
 (3, 2): 2}
Graphs:
defaultdict(<class 'list'>, {1: [[[0, 0], [1, 0]], [[0, 0], [0, 1]], [[1, 0], [1, 1]], [[0, 1], [1, 1]]], 2: [[[1, 2], [3, 1]], [[1, 2], [2, 2]], [[3, 1], [3, 2]], [[2, 2], [3, 2]]]})
Notes
I have to convert each vertex from a list to a tuple because a list cannot be a dictionary key.
graphs behaves like a dictionary: the keys are graph numbers (1, 2, ...) and the values are lists of edges.
A little explanation of the line
graph_number = vertices.get(v1) or vertices.get(v2) or next(graph_numbers)
That line is roughly equal to:
number1 = vertices.get(v1)
number2 = vertices.get(v2)
if number1 is None and number2 is None:
    graph_number = next(graph_numbers)
elif number1 is not None:
    graph_number = number1
else:
    graph_number = number2
Which says: if both v1 and v2 are not in vertices, generate a new number (i.e. next(graph_numbers)). Otherwise, assign graph_number to whichever value is not None.
Not only is that line succinct, it takes advantage of Python's short-circuit evaluation: the interpreter first evaluates vertices.get(v1). If this returns a number (1, 2, ...), the interpreter returns that number and skips evaluating the vertices.get(v2) or next(graph_numbers) part.
If vertices.get(v1) returns None, which is falsy in Python, the interpreter evaluates the next segment of the or: vertices.get(v2). Again, if this returns a number, the evaluation stops and that number is returned. If vertices.get(v2) also returns None, the interpreter evaluates the last segment, next(graph_numbers), and returns that value.
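A tiny demonstration of this short-circuiting (the keys and fallback value are hypothetical):

```python
vertices = {(0, 0): 1}

# The first get() returns 1 (truthy), so the rest of the chain never runs
graph_number = vertices.get((0, 0)) or vertices.get((9, 9)) or 99
print(graph_number)  # 1

# Both get() calls return None (falsy), so the fallback is used
graph_number = vertices.get((9, 9)) or vertices.get((8, 8)) or 99
print(graph_number)  # 99
```

One caveat: this pattern relies on valid graph numbers being truthy, which holds here because itertools.count(1) starts at 1; a graph number of 0 would be skipped as falsy.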

Repeating items when implementing a solution for a similar situation to the classic 0-1 knapsack

This problem is largely the same as a classic 0-1 knapsack problem, but with some minor rule changes and a large dataset to play with.
Dataset (product ID, price, length, width, height, weight):
(20,000 rows)
Problem:
A company is closing in fast on delivering its 1 millionth order. The marketing team decides to give the customer who makes that order a prize as a gesture of appreciation. The prize is: the lucky customer gets a delivery tote and 1 hour in the warehouse. Use the hour to fill up the tote with any products you desire and take them home for free.
Rules:
1 of each item
Combined volume < tote capacity (45 * 35 * 30 = 47250)
Item must fit individually (Dimensions are such that it can fit into the tote, e.g. 45 * 45 * 1 wouldn't fit)
Maximize value of combined products
Minimize weight on draws
Solution (using dynamic programming):
from functools import reduce

# The main solver function
def Solver(myItems, myCapacity):
    dp = {myCapacity: (0, (), 0)}
    getKeys = dp.keys
    for i in range(len(myItems)):
        itemID, itemValue, itemVolume, itemWeight = myItems[i]
        for oldVolume in list(getKeys()):
            newVolume = oldVolume - itemVolume
            if newVolume >= 0:
                myValue, ListOfItems, myWeight = dp[oldVolume]
                node = (myValue + itemValue, ListOfItems + (itemID,), myWeight + itemWeight)
                if newVolume not in dp:
                    dp[newVolume] = node
                else:
                    currentValue, loi, currentWeight = dp[newVolume]
                    if currentValue < node[0] or (currentValue == node[0] and node[-1] < currentWeight):
                        dp[newVolume] = node
    return max(dp.values())

# Generate the product of all elements within a given list
def List_Multiply(myList):
    return reduce(lambda x, y: x * y, myList)

toteDims = [30, 35, 45]
totalVolume = List_Multiply(toteDims)
productsList = []
with open('products.csv', 'r') as myFile:
    for myLine in myFile:
        myData = [int(x) for x in myLine.strip().split(',')]
        itemDims = [myDim for myDim, maxDim in zip(sorted(myData[2:5]), toteDims) if myDim <= maxDim]
        if len(itemDims) == 3:
            productsList.append((myData[0], myData[1], List_Multiply(myData[2:5]), myData[5]))

print(Solver(productsList, totalVolume))
Issue:
The output contains repeated items,
i.e. (14018, (26, 40, 62, 64, 121, 121, 121, 152, 152), 13869)
How can I correct this to make it choose only 1 of each item?
It seems that the reason your code may produce answers with duplicate items is that in the inner loop, which iterates over all of the volumes generated so far, the solution stored for an existing volume value can be replaced before the iteration gets to it.
E.g. if your productsList contained the following
productsList = [
    # id, value, volume, weight
    [1, 1, 2, 1],
    [2, 1, 3, 2],
    [3, 3, 5, 1]
]
and
totalVolume = 10
then by the time you got to the third item, dp.keys() would contain:
10, 8, 7, 5
The order of iteration is not guaranteed, but for the sake of this example, let's assume it is as given above. Then dp[5] would be replaced by a new solution containing item #3, and later in the iteration, we would be using that as a base for a new, even better solution (except now with a duplicate item).
To overcome the above problem, you could sort the keys before the iteration (in ascending order, which is the default), like for oldVolume in sorted(getKeys()). Assuming all items have a non-negative volume, this should guarantee that we never replace a solution in dp before we have iterated over it.
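A minimal sketch of the fix on the three-item example above (the body mirrors the Solver from the question; only the iteration order changes):

```python
def solver(items, capacity):
    dp = {capacity: (0, (), 0)}
    for item_id, value, volume, weight in items:
        # sorted() is the fix: ascending order guarantees we never read a
        # state that was already replaced while processing this same item,
        # because any replacement happens at a strictly smaller volume key
        for old_volume in sorted(dp.keys()):
            new_volume = old_volume - volume
            if new_volume >= 0:
                cur_value, ids, cur_weight = dp[old_volume]
                node = (cur_value + value, ids + (item_id,), cur_weight + weight)
                if (new_volume not in dp
                        or dp[new_volume][0] < node[0]
                        or (dp[new_volume][0] == node[0] and node[2] < dp[new_volume][2])):
                    dp[new_volume] = node
    return max(dp.values())

products = [
    # id, value, volume, weight
    (1, 1, 2, 1),
    (2, 1, 3, 2),
    (3, 3, 5, 1),
]
print(solver(products, 10))  # (5, (1, 2, 3), 4) -- no duplicate items
```

With ascending volumes, a state at volume v can only be overwritten while visiting some larger volume, by which point v itself has already been expanded, so no item is ever counted twice.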
Another possible problem I see is the way we get the optimal solution at the end using max(dp.values()). The problem statement says we want to minimize weight in the case of a draw. If I'm reading the code correctly, the elements of the tuple are value, list of items, weight in that order, so below we're tied for value and the second choice would be preferable because of its smaller weight. However, max returns the first one:
>>> max([(4, (2, 3), 3), (4, (1, 3), 2)])
(4, (2, 3), 3)
It's possible to specify the sorting key to max so something like this might work:
>>> max([(4, (2, 3), 3), (4, (1, 3), 2)], key=lambda x: (x[0], -x[-1]))
(4, (1, 3), 2)

find a path using all given edges python

I have a list of edges. I need to decode a path from source node to sink node from them. There might be loops in my paths, but I should only use each of the edges once. In my list, I might also have the same edge for more than one time, which means in my path I should pass it more than once.
Let's say my edges list is the following:
[(1, 16), (9, 3), (8, 9), (15, 8), (5, 1), (8, 15), (3, 5)]
so my path is:
8->15->8->9->3->5->1->16 equivalent to [8,15,8,9,3,5,1,16]
I know the sink node and the source node. (In the above sample I knew that 8 is the source and 16 is the sink.) Here is another sample with more than one usage of the same edge:
[(1,2),(2,1),(2,3),(1,2)]
the path is:
1->2->1->2->3 equivalent to [1,2,1,2,3]
Basically it is a type of topological sorting, but topological sorting doesn't allow loops. I have the following code, but it does not use the nodes in the loops!
def find_all_paths(graph, start, end):
    path = []
    paths = []
    queue = [(start, end, path)]
    while queue:
        start, end, path = queue.pop()
        print('PATH', path)
        path = path + [start]
        if start == end:
            paths.append(path)
        for node in set(graph[start]).difference(path):
            queue.append((node, end, path))
    return paths
Simply put, you may need to do more than one pass over the edges to assemble a path that uses all of them.
The included code operates on the following assumptions:
A solution exists. Namely all vertices belong to a single connected component of an underlying graph and
in_degree = out_degree for either all or all but 2 vertices. In the latter case one of the vertices has in_degree - out_degree = 1 and the other has in_degree - out_degree = -1.
Furthermore even with these conditions, there is not necessarily a unique solution to the problem of finding a path from source to sink utilizing all edges. This code only finds one solution and not all solutions. (An example where multiple solutions exist is a 'daisy' [(1,2),(2,1),(1,3),(3,1),(1,4),(4,1),(1,5),(5,1)] where the start and end are the same.)
The idea is to create a dictionary of all edges for the path indexed by the starting node for the edge and then remove edges from the dictionary as they are added to the path. Rather than trying to get all of the edges in the path in the first pass, we go over the dictionary multiple times until all of the edges are used. The first pass creates a path from source to sink. Subsequent passes add in loops.
Warning: There is almost no consistency checking or validation. If the start is not a valid source for the edges then the 'path' returned will be disconnected!
"""
This is a basic implementatin of Hierholzer's algorithm as applied to the case of a
directed graph with perhaps multiple identical edges.
"""
import collections
def node_dict(edge_list):
s_dict = collections.defaultdict(list)
for edge in edge_list:
s_dict[edge[0]].append(edge)
return s_dict
def get_a_path(n_dict,start):
"""
INPUT: A dictionary whose keys are nodes 'a' and whose values are lists of
allowed directed edges (a,b) from 'a' to 'b', along with a start WHICH IS
ASSUMED TO BE IN THE DICTIONARY.
OUTPUT: An ordered list of initial nodes and an ordered list of edges
representing a path starting at start and ending when there are no other
allowed edges that can be traversed from the final node in the last edge.
NOTE: This function modifies the dictionary n_dict!
"""
cur_edge = n_dict[start][0]
n_dict[start].remove(cur_edge)
trail = [cur_edge[0]]
path = [cur_edge]
cur_node = cur_edge[1]
while len(n_dict[cur_node]) > 0:
cur_edge = n_dict[cur_node][0]
n_dict[cur_node].remove(cur_edge)
trail.append(cur_edge[0])
path.append(cur_edge)
cur_node = cur_edge[1]
return trail, path
def find_a_path_with_all_edges(edge_list,start):
"""
INPUT: A list of edges given by ordered pairs (a,b) and a starting node.
OUTPUT: A list of nodes and an associated list of edges representing a path
where each edge is represented once and if the input had a valid Eulerian
trail starting from start, then the lists give a valid path through all of
the edges.
EXAMPLES:
In [2]: find_a_path_with_all_edges([(1,2),(2,1),(2,3),(1,2)],1)
Out[2]: ([1, 2, 1, 2, 3], [(1, 2), (2, 1), (1, 2), (2, 3)])
In [3]: find_a_path_with_all_edges([(1, 16), (9, 3), (8, 9), (15, 8), (5, 1), (8, 15), (3, 5)],8)
Out[3]:
([8, 15, 8, 9, 3, 5, 1, 16],
[(8, 15), (15, 8), (8, 9), (9, 3), (3, 5), (5, 1), (1, 16)])
"""
s_dict = node_dict(edge_list)
trail, path_check = get_a_path(s_dict,start)
#Now add in edges that were missed in the first pass...
while max([len(s_dict[x]) for x in s_dict]) > 0:
#Note: there may be a node in a loop we don't have on trail yet
add_nodes = [x for x in trail if len(s_dict[x])>0]
if len(add_nodes) > 0:
skey = add_nodes[0]
else:
print "INVALID EDGE LIST!!!"
break
temp,ptemp = get_a_path(s_dict,skey)
i = trail.index(skey)
if i == 0:
trail = temp + trail
path_check = ptemp + path_check
else:
trail = trail[:i] + temp + trail[i:]
path_check = path_check[:i] + ptemp + path_check[i:]
#Add the final node to trail.
trail.append(path_check[-1][1])
return trail, path_check

iterate through an array looking at non-consecutive values

for i, (x, y, z) in enumerate(zip(analysisValues, analysisValues[1:], analysisValues[2:])):
    if all(k < 0.5 for k in (x, y, z)):
        instance = i
        break
This code iterates through an array and looks for the first 3 consecutive values that meet the condition '< 0.5'.
==============================
I'm working with 'timeseries' data and comparing the values at t, t+1s and t+2s.
If the data is sampled at 1Hz then 3 consecutive values are compared and the code above is correct (points 0,1,2).
If the data is sampled at 2Hz then every other point must be compared (points 0,2,4), or
if the data is sampled at 3Hz then every third point must be compared (points 0,3,6).
The sample rate of the input data can vary, but is known and recorded as the variable 'SRate'.
==============================
Please can you help me incorporate 'time' into this point-by-point analysis?
You can use slice notation with offsets of SRate and 2 * SRate, so that window i compares the points at indices i, i + SRate and i + 2 * SRate:
for i, (x, y, z) in enumerate(zip(analysisValues,
                                  analysisValues[SRate:],
                                  analysisValues[2 * SRate:])):
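A self-contained sketch of this approach (the data and the SRate value are hypothetical sample inputs):

```python
analysisValues = [1, 0.4, 1, 0.4, 1, 0.4, 1, 0.4, 1]
SRate = 2  # samples per second, so t + 1s is SRate points away

instance = None
# window i pairs up analysisValues[i], [i + SRate] and [i + 2 * SRate]
for i, (x, y, z) in enumerate(zip(analysisValues,
                                  analysisValues[SRate:],
                                  analysisValues[2 * SRate:])):
    if all(k < 0.5 for k in (x, y, z)):
        instance = i
        break

print(instance)  # 1 -- the values at indices 1, 3 and 5 are all below 0.5
```

zip stops at the shortest of the three slices, so the loop never reads past the end of the array.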
Let us first construct a helper generator which does the following:
from itertools import tee

def sparsed_window(iterator, elements=2, step=1):
    its = tee(iterator, elements)
    for i, it in enumerate(its):
        for _ in range(i * step):
            next(it, None)  # wind forward each iterator by the needed number of items
    return zip(*its)

print(list(sparsed_window([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3, 2)))
Output:
>>>
[(1, 3, 5), (2, 4, 6), (3, 5, 7), (4, 6, 8), (5, 7, 9), (6, 8, 10)]
This helper avoids creating several nearly identical lists in memory. It uses tee to cleverly cache only the parts that are needed.
The helper code is based on the pairwise recipe from the itertools documentation.
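The pairwise recipe that this generalises can be sketched as:

```python
from itertools import tee

def pairwise(iterable):
    # s -> (s0, s1), (s1, s2), (s2, s3), ...
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one
    return zip(a, b)

print(list(pairwise([1, 2, 3, 4])))  # [(1, 2), (2, 3), (3, 4)]
```

sparsed_window does the same thing with `elements` copies of the iterator, each wound forward by `step` more items than the previous one.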
Then we can use this helper to get what we want:
def find_instance(iterator, hz=1):
    iterated_in_sparsed_window = sparsed_window(iterator, elements=3, step=hz)
    fitting_values = filter(lambda pair: all(el < 0.5 for el in pair[1]),
                            enumerate(iterated_in_sparsed_window))
    i, first_fitting = next(fitting_values, (None, None))
    return i

print(find_instance([1, 0.4, 1, 0.4, 1, 0.4, 1, 0.4, 1], hz=2))
Output:
>>>
1
