Which data structure do you suggest for handling this data? - python

I have a grid network with a number of nodes with (x, y) coordinates, and I have a couple of individuals that visit these nodes in the network. For instance, individual 1 visits nodes (1,3), (4,5), (8,9) and individual 2 visits (4,3), (2,5).
I need to access these nodes for each individual (let's say in a for loop over all individuals), but I do not know the best way of doing this in Python.

You can create a class called Individual to hold all the relevant information for that individual. You can then put those Individual objects in a list or whatever data structure you want.
class Individual:
    def __init__(self, visited):
        self.visited = visited  # type: list[tuple]

    def add_visit(self, node):
        self.visited.append(node)

individuals = [
    Individual([(1, 3), (4, 5), (8, 9)]),
    Individual([(4, 3), (2, 5)]),
]

for individual in individuals:
    pass  # do stuff
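A visit can then be recorded later with the add_visit method, e.g. (a small usage sketch):
individuals[0].add_visit((6, 7))  # individual 1 now also visits (6, 7)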

Others suggest a class for this task, but I think you would be better off with just a normal dictionary (dict) or a defaultdict. There is no need to create a class for a thing that has no methods and only contains a list of nodes, especially when Python has such a fantastic arsenal of containers.
Solution with a dict in Python 3:
individuals = {}
individuals["1"] = [(1, 3), (4, 5), (8, 9)]
individuals["2"] = [(4, 3), (2, 5)]

for ind, nodes in individuals.items():
    print(ind, nodes)

individuals["2"].append((6, 7))

Related

Most efficient way to find spatial order from a list of tuples?

I have a circle-growth algorithm (line-growth with closed links) where new points are added between existing points at each iteration.
The linkage information of each point is stored as a tuple in a list. That list is updated iteratively.
QUESTIONS:
What would be the most efficient way to return the spatial order of these points as a list?
Do I need to compute the whole order at each iteration, or is there a way to cumulatively insert the new points in an orderly manner into that list?
All I could come up with is the following:
tuples = [(1, 4), (2, 5), (3, 6), (1, 6), (0, 7), (3, 7), (0, 8), (2, 8), (5, 9), (4, 9)]

starting_tuple = [e for e in tuples if e[0] == 0 or e[1] == 0][0]
## note: 'starting_tuple' could be either (0, 7) or (0, 8), starting direction doesn't matter

order = list(starting_tuple) if starting_tuple[0] == 0 else [starting_tuple[1], starting_tuple[0]]
## order will always start from point 0

idx = tuples.index(starting_tuple)
## index of the starting tuple

def findNext():
    global idx
    for i, e in enumerate(tuples):
        if order[-1] in e and i != idx:
            ind = e.index(order[-1])
            c = 0 if ind == 1 else 1
            order.append(e[c])
            idx = tuples.index(e)

for i in range(len(tuples)/2):
    findNext()

print order
It works, but it is neither elegant (not Pythonic) nor efficient.
It seems to me that a recursive algorithm may be more suitable, but unfortunately I don't know how to implement such a solution.
Also, please note that I'm using Python 2 and can only use plain Python packages (no numpy).
Rather than recursion, this seems more like a dictionary and generator problem to me:
from collections import defaultdict

def findNext(tuples):
    previous = 0
    yield previous  # our first result
    dictionary = defaultdict(list)
    # [(1, 4), (2, 5), (3, 6), ...] -> {0: [7, 8], 1: [4, 6], 2: [5, 8], ...}
    for a, b in tuples:
        dictionary[a].append(b)
        dictionary[b].append(a)
    current = dictionary[0][0]  # dictionary[0][1] should also work
    yield current  # our second result
    while True:
        a, b = dictionary[current]  # possible connections
        following = a if a != previous else b  # only one will move us forward
        if following == 0:  # have we come full circle?
            break
        yield following  # our next result
        previous, current = current, following  # reset for next iteration

tuples = [(1, 4), (2, 5), (3, 6), (1, 6), (7, 0), (3, 7), (8, 0), (2, 8), (5, 9), (4, 9)]

generator = findNext(tuples)

for n in generator:
    print n
OUTPUT
% python test.py
0
7
3
6
1
4
9
5
2
8
%
The algorithm currently assumes we have more than two nodes.
Since each node links to exactly two other nodes, you can bin them by number and then follow the numbers around. This is O(n), which is pretty solid, but it's not a true sort in the <, >, = sense.
def bin_nodes(node_list):
    # figure out the in and out nodes for each node, and put those into a dictionary
    node_bins = {}  # init the bins
    for node_pair in node_list:  # go once through the list
        for i in range(len(node_pair)):  # put each node into the other's bin
            if node_pair[i] not in node_bins:  # initialize the bin for unseen nodes
                node_bins[node_pair[i]] = []
            node_bins[node_pair[i]].append(node_pair[(i + 1) % 2])
    return node_bins

def sort_bins(node_bins):
    # go from bin to bin, following the numbers
    nodes = [0] * len(node_bins)  # allocate a list
    nodes[0] = next(iter(node_bins))  # pick an arbitrary one to start
    nodes[1] = node_bins[nodes[0]][0]  # pick a direction to go
    for i in range(2, len(node_bins)):
        # one of the two nodes in the bin is the horse we rode in on;
        # the other is the next stop
        j = 1 if node_bins[nodes[i - 1]][0] == nodes[i - 2] else 0  # figure out which one ISN'T the one we came in on
        nodes[i] = node_bins[nodes[i - 1]][j]  # pick the next node, then go to its bin, rinse, repeat
    return nodes

if __name__ == "__main__":
    # test
    test = [(1, 2), (3, 4), (2, 4), (1, 3)]  # should give 1,3,4,2 or some rotation or reversal thereof
    print(bin_nodes(test))
    print(sort_bins(bin_nodes(test)))

Python 3 - Reducing a list by cyclical reconstructing a given set

I am looking for an algorithm that cyclically reduces a list of tuples by reconstructing a given set as a pattern.
Each tuple contains an id and a set, like (1, {'xy'}).
Example
query = {'xyz'}
my_dict = [(1, {'x'}), (2, {'yx'}), (3, {'yz'}),
           (4, {'z'}), (5, {'x'}), (6, {'y'}),
           (7, {'xyz'}), (8, {'xy'}), (9, {'x'})]
The goal is to recreate the pattern xyz as often as possible from the second values of the tuples in my_dict. Remaining elements from which the query set cannot be completely reconstructed shall be cut off, hence 'reduce'.
my_dict contains in total: 6 times x, 5 times y, 3 times z.
Considering my_dict above, valid solutions would be, for example:
result_1 = [(7, {'xyz'}), (8, {'xy'}), (4, {'z'}), (1, {'x'}), (3, {'yz'})]
result_2 = [(7, {'xyz'}), (2, {'yx'}), (4, {'z'}), (1, {'x'}), (3, {'yz'})]
result_3 = [(7, {'xyz'}), (9, {'x'}), (6, {'y'}), (4, {'z'}), (1, {'x'}), (3, {'yz'})]
The order of the tuples in the list is NOT important; I sorted them in the order of the query pattern xyz for the purpose of illustration.
Goal
The goal is to have a list of tuples in which the total occurrences of the elements from the query set are as evenly distributed as possible.
result_1, result_2 and result_3 all contain in total: 3 times x, 3 times y, 3 times z.
Does anyone know a way or an approach to do this?
Thanks for your help!
Depending on your application context, a naive brute-force approach might be enough: using the powerset function from this SO answer,
def find_solutions(query, supply):
    for subset in powerset(supply):
        if is_solution(query, subset):
            yield subset
You would need to implement a function is_solution(query, subset) that returns True when the given subset of the supply (my_dict.values()) is a valid solution for the given query.
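For concreteness, here is a sketch of how those pieces might fit together. powerset is the standard itertools recipe; is_solution encodes one possible reading of "valid" (every query letter occurs equally often and no other letters are left over), which is an assumption on my part rather than something the question pins down:
from collections import Counter
from itertools import chain, combinations

def powerset(iterable):
    # standard itertools recipe: all subsets of all sizes
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def is_solution(query, subset):
    # assumed criterion: each letter of the query pattern occurs equally
    # often in the subset, and no letter outside the pattern appears
    letters = set(''.join(query))                        # {'x', 'y', 'z'}
    counts = Counter(c for _, s in subset for c in ''.join(s))
    return set(counts) == letters and len(set(counts.values())) == 1
With these in place, list(find_solutions(query, my_dict)) enumerates every qualifying subset of the supply; with only 9 tuples the 512-subset search space is trivially small.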

Remove element from itertools.combinations while iterating?

Given a list l and all combinations of its elements, is it possible to remove any combination containing x while iterating over all combinations, so that you never consider a combination containing x after it is removed?
for a, b in itertools.combinations(l, 2):
    if some_function(a, b):
        remove_any_tup_with_a_or_b(a, b)
My list l is pretty big so I don't want to keep the combinations in memory.
A cheap trick to accomplish this is to filter with a dynamically updated set of excluded values. It won't actually avoid generating the combinations you wish to exclude, so it's not a major performance benefit, but filtering with a C built-in like isdisjoint is typically faster than Python-level if checks with continue statements, since it pushes the filter work to the C layer:
from future_builtins import filter  # Only on Py2, for generator-based filter
import itertools

blacklist = set()
for a, b in filter(blacklist.isdisjoint, itertools.combinations(l, 2)):
    if some_function(a, b):
        blacklist.update((a, b))
If you want to remove all tuples containing the number x from itertools.combinations(l, 2), consider that there is a one-to-one mapping (mathematically speaking) from the set itertools.combinations(range(1, len(l)), 2) to the elements of itertools.combinations(l, 2) that don't contain the number x.
Example:
The set of all combinations from itertools.combinations([1,2,3,4], 2) that don't contain the number 1 is [(2, 3), (2, 4), (3, 4)]. Notice that this list has the same number of elements as itertools.combinations([1,2,3], 2) = [(1, 2), (1, 3), (2, 3)].
Since order doesn't matter in combinations, you can map 1 to 4 in [(1, 2), (1, 3), (2, 3)] to get [(4, 2), (4, 3), (2, 3)] = [(2, 4), (3, 4), (2, 3)] = [(2, 3), (2, 4), (3, 4)].
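As a quick sanity check of that counting argument (my own illustration, not from the answer):
import itertools

l = [1, 2, 3, 4]
x = 1
# combinations of l that avoid x ...
without_x = [c for c in itertools.combinations(l, 2) if x not in c]
# ... are exactly as numerous as combinations of a list one element shorter
shorter = list(itertools.combinations(l[:3], 2))
print(without_x)   # [(2, 3), (2, 4), (3, 4)]
print(shorter)     # [(1, 2), (1, 3), (2, 3)]
assert len(without_x) == len(shorter)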

Chaining data without iteration

I have a large set of From/To pairs that represent a hierarchy of connected nodes. As an example, the hierarchy:
    4 -- 5 -- 8
   /
  2 --- 6 - 9 -- 10
 /           \
1             11
 \
  3 ---- 7
is encapsulated as:
{(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}
I'd like to be able to create a function that returns all nodes upstream of a given node, e.g.:
nodes[2].us
> [4, 5, 6, 8, 9, 10, 11]
My actual set of nodes is in the tens of thousands, so I'd like to be able to very quickly return a list of all upstream nodes without having to perform recursion over the entire set each time I want to get an upstream set.
This is my best attempt so far, but it doesn't get beyond two levels up.
class Node:
    def __init__(self, fr, to):
        self.fr = fr
        self.to = to
        self.us = set()

def build_hierarchy(nodes):
    for node in nodes.values():
        if node.to in nodes:
            nodes[node.to].us.add(node)
    for node in nodes.values():
        for us_node in node.us.copy():
            node.us |= us_node.us
    return nodes

from_to = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3), (1, 0)}
nodes = {fr: Node(fr, to) for fr, to in from_to}  # node objects indexed by "from"
nodes = build_hierarchy(nodes)
print [node.fr for node in nodes[2].us]
> [4, 6, 5, 9]
I'll show two ways of doing this. First, we'll simply modify your us attribute to intelligently compute and cache the results of a descendant lookup. Second, we'll use a graph library, networkx.
I'd really recommend you go with the graph library if your data naturally has graph structure. You'll save yourself a lot of hassle that way.
Caching the upstream nodes property
You can make your us attribute a property, and cache the results of previous lookups:
class Node(object):
    def __init__(self):
        self.name = None
        self.parent = None
        self.children = set()
        self._upstream = set()

    def __repr__(self):
        return "Node({})".format(self.name)

    @property
    def upstream(self):
        if self._upstream:
            return self._upstream
        else:
            for child in self.children:
                self._upstream.add(child)
                self._upstream |= child.upstream
            return self._upstream
Note that I'm using a slightly different representation than you. I'll create the graph:
import collections

edges = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}
nodes = collections.defaultdict(lambda: Node())

for node, parent in edges:
    nodes[node].name = node
    nodes[parent].name = parent
    nodes[node].parent = nodes[parent]
    nodes[parent].children.add(nodes[node])
and I'll look up the upstream nodes for node 2:
>>> nodes[2].upstream
{Node(5), Node(4), Node(11), Node(9), Node(6), Node(8), Node(10)}
Once the nodes upstream of 2 are computed, they won't be recomputed if you call, for example, nodes[1].upstream. If you later make any changes to your graph, however, the cached upstream sets will be incorrect.
Using networkx
If we use networkx to represent our graph, a lookup of all of the descendants of a node is very simple:
>>> import networkx as nx
>>> from_to = [(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2),
...            (2, 1), (3, 1), (7, 3), (1, 0)]
>>> graph = nx.DiGraph(from_to).reverse()
>>> nx.descendants(graph, 2)
{4, 5, 6, 8, 9, 10, 11}
This doesn't fully answer your question, which seemed to be about optimizing the lookup of descendants so work wasn't repeated on subsequent calls. However, for all we know, networkx.descendants might do some intelligent caching.
So this is what I'd suggest: avoid optimizing prematurely and use the libraries. If networkx.descendants is too slow, then you might investigate the networkx code to see if it caches lookups. If not, you can build your own caching lookup using more primitive networkx functions. My bet is that networkx.descendants will work just fine, and you won't need to go through the extra work.
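If networkx.descendants does turn out to be the bottleneck and your graph is static, a minimal caching sketch (my suggestion, not a networkx feature) is to memoize the lookup with functools.lru_cache; the cache is only valid as long as the graph isn't mutated:
import functools
import networkx as nx

graph = nx.DiGraph([(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4),
                    (4, 2), (2, 1), (3, 1), (7, 3), (1, 0)]).reverse()

@functools.lru_cache(maxsize=None)
def cached_descendants(node):
    # frozenset so callers can't mutate the cached value
    return frozenset(nx.descendants(graph, node))

print(cached_descendants(2))  # computed once, then served from the cache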
Here's a function that will calculate the entire upstream list for a single node:
def upstream_nodes(start_node):
    result = []
    current = start_node
    while current.to:  # current.to == 0 means we're at the root node
        result.append(current.to)
        current = nodes[current.to]
    return result
You've said that you don't want to iterate over the entire set of nodes each time you query an upstream set, and this won't: it just queries that node's parent, then its parent, and so on up to the root. So if the node is four levels down, it makes four dictionary lookups.
Or, if you want to be really clever, here's a version that makes each parent lookup only once, then stores the result in the Node object's .us attribute so you never have to calculate the value again. (This works as long as your nodes' parent links don't change after the graph has been created; if you change your graph, of course, it won't.)
def caching_upstream_nodes(start_node, nodes):
    # start_node is the Node object whose upstream set you want
    # nodes is the dictionary you created mapping ints to Node objects
    if start_node.us:
        # We already calculated this once, no need to re-calculate
        return start_node.us
    parent = nodes.get(start_node.to)
    if parent is None:
        # We're at the root node
        start_node.us = set()
        return start_node.us
    # Otherwise, our upstream is our parent's upstream, plus the parent
    parent_upstream = caching_upstream_nodes(parent, nodes)
    start_node.us = parent_upstream.copy()
    start_node.us.add(start_node.to)
    return start_node.us
One of those two functions should be what you're looking for. (NOTE: Exercise a little bit of caution when running these, as I just wrote them but haven't invested the time to test them. I believe the algorithm is correct, but there's always a chance that I made a basic error in writing it.)
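As a quick smoke test (my addition, assuming the question's original Node class with .fr/.to/.us and the module-level nodes dict that upstream_nodes reads):
from_to = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2),
           (2, 1), (3, 1), (7, 3), (1, 0)}
nodes = {fr: Node(fr, to) for fr, to in from_to}

print upstream_nodes(nodes[8])                         # walks the .to chain: [5, 4, 2, 1]
print sorted(caching_upstream_nodes(nodes[8], nodes))  # same nodes, now cached: [1, 2, 4, 5]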

find a path using all given edges python

I have a list of edges. I need to decode a path from the source node to the sink node from them. There might be loops in my path, but I should use each edge only once. My list might also contain the same edge more than once, which means my path should traverse it more than once.
Let's say my edge list is the following:
[(1, 16), (9, 3), (8, 9), (15, 8), (5, 1), (8, 15), (3, 5)]
so my path is:
8->15->8->9->3->5->1->16 equivalent to [8,15,8,9,3,5,1,16]
I know the sink node and the source node. (In the above sample I knew that 8 is the source and 16 is the sink.) Here is another sample with more than one usage of the same edge:
[(1,2),(2,1),(2,3),(1,2)]
the path is:
1->2->1->2->3 equivalent to [1,2,1,2,3]
Basically it is a type of topological sorting, except that we don't have loops in topological sorting. I have the following code, but it does not use the nodes in the loops!
def find_all_paths(graph, start, end):
    path = []
    paths = []
    queue = [(start, end, path)]
    while queue:
        start, end, path = queue.pop()
        print 'PATH', path
        path = path + [start]
        if start == end:
            paths.append(path)
        for node in set(graph[start]).difference(path):
            queue.append((node, end, path))
    return paths
Simply put, you may need to do more than one pass over the edges to assemble a path that uses all of them.
The included code operates on the following assumptions:
A solution exists; namely, all vertices belong to a single connected component of the underlying graph, and
in_degree == out_degree for either all vertices or all but two. In the latter case, one of those two vertices has in_degree - out_degree = 1 and the other has in_degree - out_degree = -1.
Furthermore, even with these conditions, there is not necessarily a unique solution to the problem of finding a path from source to sink that uses all the edges. This code finds one solution, not all solutions. (An example where multiple solutions exist is a 'daisy' [(1,2),(2,1),(1,3),(3,1),(1,4),(4,1),(1,5),(5,1)] where the start and end are the same.)
The idea is to create a dictionary of all edges for the path indexed by the starting node for the edge and then remove edges from the dictionary as they are added to the path. Rather than trying to get all of the edges in the path in the first pass, we go over the dictionary multiple times until all of the edges are used. The first pass creates a path from source to sink. Subsequent passes add in loops.
Warning: there is almost no consistency checking or validation. If the start is not a valid source for the edges, the 'path' returned will be disconnected!
"""
This is a basic implementatin of Hierholzer's algorithm as applied to the case of a
directed graph with perhaps multiple identical edges.
"""
import collections
def node_dict(edge_list):
s_dict = collections.defaultdict(list)
for edge in edge_list:
s_dict[edge[0]].append(edge)
return s_dict
def get_a_path(n_dict,start):
"""
INPUT: A dictionary whose keys are nodes 'a' and whose values are lists of
allowed directed edges (a,b) from 'a' to 'b', along with a start WHICH IS
ASSUMED TO BE IN THE DICTIONARY.
OUTPUT: An ordered list of initial nodes and an ordered list of edges
representing a path starting at start and ending when there are no other
allowed edges that can be traversed from the final node in the last edge.
NOTE: This function modifies the dictionary n_dict!
"""
cur_edge = n_dict[start][0]
n_dict[start].remove(cur_edge)
trail = [cur_edge[0]]
path = [cur_edge]
cur_node = cur_edge[1]
while len(n_dict[cur_node]) > 0:
cur_edge = n_dict[cur_node][0]
n_dict[cur_node].remove(cur_edge)
trail.append(cur_edge[0])
path.append(cur_edge)
cur_node = cur_edge[1]
return trail, path
def find_a_path_with_all_edges(edge_list,start):
"""
INPUT: A list of edges given by ordered pairs (a,b) and a starting node.
OUTPUT: A list of nodes and an associated list of edges representing a path
where each edge is represented once and if the input had a valid Eulerian
trail starting from start, then the lists give a valid path through all of
the edges.
EXAMPLES:
In [2]: find_a_path_with_all_edges([(1,2),(2,1),(2,3),(1,2)],1)
Out[2]: ([1, 2, 1, 2, 3], [(1, 2), (2, 1), (1, 2), (2, 3)])
In [3]: find_a_path_with_all_edges([(1, 16), (9, 3), (8, 9), (15, 8), (5, 1), (8, 15), (3, 5)],8)
Out[3]:
([8, 15, 8, 9, 3, 5, 1, 16],
[(8, 15), (15, 8), (8, 9), (9, 3), (3, 5), (5, 1), (1, 16)])
"""
s_dict = node_dict(edge_list)
trail, path_check = get_a_path(s_dict,start)
#Now add in edges that were missed in the first pass...
while max([len(s_dict[x]) for x in s_dict]) > 0:
#Note: there may be a node in a loop we don't have on trail yet
add_nodes = [x for x in trail if len(s_dict[x])>0]
if len(add_nodes) > 0:
skey = add_nodes[0]
else:
print "INVALID EDGE LIST!!!"
break
temp,ptemp = get_a_path(s_dict,skey)
i = trail.index(skey)
if i == 0:
trail = temp + trail
path_check = ptemp + path_check
else:
trail = trail[:i] + temp + trail[i:]
path_check = path_check[:i] + ptemp + path_check[i:]
#Add the final node to trail.
trail.append(path_check[-1][1])
return trail, path_check
