I currently have a list of connections stored in a list where each connection is a directed link that connects two points and no point ever links to more than one point or is linked to by more than one point. For example:
connections = [ (3, 7), (6, 5), (4, 6), (5, 3), (7, 8), (1, 2), (2, 1) ]
Should produce:
ordered = [ [ 4, 6, 5, 3, 7, 8 ], [ 1, 2, 1 ] ]
I have attempted to do this using an algorithm that takes an input point and a list of connections and recursively calls itself to find the next point and add it to the growing ordered list. However, my algorithm breaks down when I don't start with the correct point (though this should just be a matter of repeating the same algorithm in reverse), and also when there are multiple unconnected strands.
What would be the best way of writing an efficient algorithm to order these connections?
Algorithm for a Solution
You're looking for a topological sort algorithm:
from collections import defaultdict
def topological_sort(dependency_pairs):
    'Sort values subject to dependency constraints'
    num_heads = defaultdict(int)   # num arrows pointing in
    tails = defaultdict(list)      # list of arrows going out
    for h, t in dependency_pairs:
        num_heads[t] += 1
        tails[h].append(t)
    ordered = [h for h in tails if h not in num_heads]
    for h in ordered:
        for t in tails[h]:
            num_heads[t] -= 1
            if not num_heads[t]:
                ordered.append(t)
    cyclic = [n for n, heads in num_heads.iteritems() if heads]
    return ordered, cyclic
if __name__ == '__main__':
    connections = [(3, 7), (6, 5), (4, 6), (5, 3), (7, 8), (1, 2), (2, 1)]
    print topological_sort(connections)
Here is the output for your sample data:
([4, 6, 5, 3, 7, 8], [1, 2])
The runtime is linearly proportional to the number of edges (dependency pairs).
HOW IT WORKS
The algorithm is organized around a lookup table called num_heads that keeps a count of the number of predecessors (incoming arrows) for each node. Consider an example with the following connections: a->h b->g c->f c->h d->i e->d f->b f->g h->d h->e i->b. The counts are:
node  number of incoming edges
----  ------------------------
a     0
b     2
c     0
d     2
e     1
f     1
g     2
h     2
i     1
The algorithm works by "visiting" nodes with no predecessors. For example, nodes a and c have no incoming edges, so they are visited first.
Visiting means that the nodes are output and removed from the graph. When a node is visited, we loop over its successors and decrement their incoming count by one.
For example, in visiting node a, we go to its successor h to decrement its incoming count by one (so that h 2 becomes h 1).
Likewise, when visiting node c, we loop over its successors f and h, decrementing their counts by one (so that f 1 becomes f 0 and h 1 becomes h 0).
The nodes f and h no longer have incoming edges, so we repeat the process of outputting them and removing them from the graph until all the nodes have been visited. In the example, the visitation order (the topological sort) is:
a c f h e d i b g
If num_heads ever arrives at a state when there are no nodes without incoming edges, then it means there is a cycle that cannot be topologically sorted and the algorithm exits to show the requested results.
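The walkthrough above can be checked with a small Python 3 sketch. It follows the same counting idea as the answer's function, but the names visit_order, src and dst are illustrative choices rather than part of the original answer, and the printed order relies on dictionaries preserving insertion order (Python 3.7+).

```python
from collections import defaultdict

def visit_order(edges):
    """Visit nodes with no remaining predecessors, in discovery order."""
    incoming = defaultdict(int)       # node -> number of incoming edges
    successors = defaultdict(list)    # node -> nodes it points to
    for src, dst in edges:
        incoming[dst] += 1
        successors[src].append(dst)
    initial_counts = dict(incoming)   # snapshot of the table before visiting
    # Start with the nodes that nothing points to.
    order = [node for node in successors if node not in incoming]
    for node in order:                # the list grows while we walk it
        for nxt in successors[node]:
            incoming[nxt] -= 1
            if incoming[nxt] == 0:
                order.append(nxt)
    return order, initial_counts

edges = [tuple(pair.split("->")) for pair in
         "a->h b->g c->f c->h d->i e->d f->b f->g h->d h->e i->b".split()]
order, counts = visit_order(edges)
print(" ".join(order))   # a c f h e d i b g
```

Nodes that never reach a count of zero (the members of a cycle) simply never appear in the returned order.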
Something like this:
from collections import defaultdict
lis = [ (3, 7), (6, 5), (4, 6), (5, 3), (7, 8), (1, 2), (2, 1) ]
dic = defaultdict(list)
for k,v in lis:
    if v not in dic:
        dic[k].append(v)
    else:
        dic[k].extend([v]+dic[v])
        del dic[v]
for k,v in dic.items():
    for x in v:
        if x in dic and x!=k:
            dic[k].extend(dic[x])
            del dic[x]
print dic
print [[k]+v for k,v in dic.items()]
output:
defaultdict(<type 'list'>, {2: [1, 2], 4: [6, 5, 3, 7, 8]})
[[2, 1, 2], [4, 6, 5, 3, 7, 8]]
I think you can probably do it in O(n) with something like this:
ordered = {}
for each connection (s,t):
    if t exists in ordered :
        ordered[s] = [t] + ordered[t]
        del ordered[t]
    else:
        ordered[s] = [t]
# Now merge...
for each ordered (s : [t1, t2, ... tN]):
    ordered[s] = [t1, t2, ... tN] + ordered[tN]
    del ordered[tN]
At the end you will be left with something like this, which you can convert to the desired list-of-lists form rather easily:
ordered = {
4 : [6, 5, 3, 7, 8],
2 : [1, 2]
}
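For comparison, here is a minimal runnable Python 3 sketch of the same "follow the links" idea. It is not the code from any of the answers: the function name order_connections is hypothetical, and it assumes, as the question states, that no point has more than one outgoing or more than one incoming link. Open strands are started at points nothing links to; whatever remains must be a closed loop, which is written with its start repeated at the end, as in the question's [1, 2, 1].

```python
def order_connections(connections):
    """Group directed links into ordered strands (open chains, then loops)."""
    nxt = dict(connections)            # point -> the single point it links to
    targets = set(nxt.values())        # points that something links to
    visited = set()
    strands = []

    # Open strands begin at a point with no incoming link.
    for start in nxt:
        if start not in targets:
            strand = [start]
            while strand[-1] in nxt:
                strand.append(nxt[strand[-1]])
            visited.update(strand)
            strands.append(strand)

    # Anything not yet visited lies on a closed loop.
    for start in nxt:
        if start not in visited:
            strand = [start]
            node = nxt[start]
            while node != start:
                strand.append(node)
                node = nxt[node]
            visited.update(strand)
            strand.append(start)       # repeat the start to close the loop
            strands.append(strand)
    return strands

connections = [(3, 7), (6, 5), (4, 6), (5, 3), (7, 8), (1, 2), (2, 1)]
print(order_connections(connections))
# [[4, 6, 5, 3, 7, 8], [1, 2, 1]]
```

Each point is looked up a constant number of times, so under the stated assumption the work grows linearly with the number of connections.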
Related
For example, suppose we have NextInOrder(10,(1,2,4,7)). With these two inputs, I wish to write a Python function that returns (1,2,4,8) by finding the next permutation in lexicographic order, where the elements of the permutation are in the range 1-10.
So as another example NextInOrder(10, (5,3,2,10)) would return (5,3,4,1)
You can use a digit counter approach starting from the last position. Increase the last position to a value not in the previous positions. Backtrack to previous position(s) when the next value is out of range.
For example:
def nextPerm(N,P):
    result = list(P)                      # mutable permutation
    i = len(P)-1                          # position to advance (start with last)
    while i in range(len(P)):             # advance/backtrack loop
        result[i] += 1                    # next value at position
        if result[i] > N:                 # value beyond range
            result[i]=0
            i -= 1                        # backtrack
        elif result[i] not in result[:i]: # distinct values only
            i += 1                        # next position to advance
    return None if i<0 else tuple(result)
output:
P = (1,2,4,7)
while P:
    P = nextPerm(10,P)
    print(P)
(1, 2, 4, 8)
(1, 2, 4, 9)
(1, 2, 4, 10)
(1, 2, 5, 3)
(1, 2, 5, 4)
(1, 2, 5, 6)
(1, 2, 5, 7)
(1, 2, 5, 8)
(1, 2, 5, 9)
(1, 2, 5, 10)
(1, 2, 6, 3)
...
You could use itertools:
from itertools import permutations
def NextInOrder(n, current):
    perms = permutations(range(1, n+1), len(current))
    for perm in perms:
        if perm == current:
            return next(perms)
Demo:
>>> NextInOrder(10,(1,2,4,7))
(1, 2, 4, 8)
>>> NextInOrder(10, (5,3,2,10))
(5, 3, 4, 1)
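The itertools version scans every permutation up to the current one, which can be slow for large n. As an alternative sketch (the name next_in_order is hypothetical and this is not code from either answer), the successor can be built directly: find the rightmost position that can be raised to a larger unused value, then fill the positions after it with the smallest unused values in ascending order.

```python
def next_in_order(n, current):
    """Return the next k-permutation of 1..n in lexicographic order, or None."""
    cur = list(current)
    for i in range(len(cur) - 1, -1, -1):
        used = set(cur[:i])                       # values fixed to the left of i
        # Smallest value larger than cur[i] that is not used to the left.
        bigger = [v for v in range(cur[i] + 1, n + 1) if v not in used]
        if bigger:
            cur[i] = bigger[0]
            used.add(cur[i])
            # Refill the tail with the smallest remaining values.
            fill = [v for v in range(1, n + 1) if v not in used]
            cur[i + 1:] = fill[:len(cur) - i - 1]
            return tuple(cur)
    return None                                   # current was the last one

print(next_in_order(10, (1, 2, 4, 7)))    # (1, 2, 4, 8)
print(next_in_order(10, (5, 3, 2, 10)))   # (5, 3, 4, 1)
```

It returns None when the input is already the last permutation, matching the behaviour of nextPerm above.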
I have a list of 100 tuples. Each tuple contains 5 unique integers. I want to know the fastest way to find all the groups that have exactly the same N = 2 intersections. If a tuple has multiple pairs of elements that have 2 intersections with other tuples, then find all of them and store them in different groups. The expected output is a list of unique lists ([(1,2,3,4,5),(4,5,6,7,8)] is the same as [(4,5,6,7,8),(1,2,3,4,5)]), where each list is a group that has all tuples with the same N=2 intersections. Below is my code:
from collections import defaultdict
from random import sample, choice
lst = [tuple(sample(range(10), 5)) for _ in range(100)]
dct = defaultdict(list)
N = 2
for i in lst:
    for j in lst:
        if len(set(i).intersection(set(j))) == N:
            dct[i].append(j)
key = choice(list(dct))
print([key] + dct[key])
>>> [(4, 5, 2, 3, 7), (4, 6, 2, 5, 0), (9, 4, 2, 1, 8), (7, 6, 5, 2, 0), (2, 4, 0, 7, 8)]
Obviously, each of the last 4 tuples has 2 intersections with the first tuple, but not necessarily the same 2 elements. So how should I get the tuples that have the same 2 intersections?
An obvious solution is to brute-force enumerate all possible (x, y) integer pairs and group the tuples that have this (x, y) intersection accordingly, but is there a faster algorithm to do this?
Edit: [(1, 2, 3, 4, 5), (4, 5, 6, 7, 8), (4, 5, 9, 10, 11)] is allowed to be in the same group, but [(1, 2, 3, 4, 5),(4, 5, 6, 7, 8), (4, 5, 6, 10, 11)] is not, because (4, 5, 6, 7, 8) has 3 intersections with (4, 5, 6, 10, 11). In this case, it should be divided into 2 groups [(1, 2, 3, 4, 5), (4, 5, 6, 7, 8)] and [(1, 2, 3, 4, 5), (4, 5, 6, 10, 11)]. The final result will of course contain groups of various sizes, including many short lists with only two tuples, but this is what I want.
A simple combinations-based approach will suffice:
from collections import defaultdict
from itertools import combinations
# `tuples` is the input list of tuples (called lst in the question)
res = defaultdict(set)
for t1, t2 in combinations(tuples, 2):
    overlap = set(t1) & set(t2)
    if len(overlap) == 2:
        cur = res[frozenset(overlap)]
        cur.add(t1)
        cur.add(t2)
result:
defaultdict(set,
{frozenset({2, 4}): {(2, 4, 0, 7, 8),
(4, 5, 2, 2, 4),
(4, 6, 2, 6, 0),
(8, 4, 2, 1, 8)},
frozenset({2, 5}): {(4, 5, 2, 2, 4), (7, 6, 5, 2, 0)}})
I like how clean @acushner's solution looks, but I wrote one that's substantially faster:
def all_n_intersections2(xss, n):
    xss = [frozenset(xs) for xs in xss]
    result = {}
    while xss:
        xsa = xss.pop()
        for xsb in xss:
            ixs = xsa.intersection(xsb)
            if len(ixs) == n:
                if ixs not in result:
                    result[ixs] = [xsa, xsb]
                else:
                    result[ixs].append(xsb)
    return result
If I pit them against each other:
from timeit import timeit
from random import sample
from collections import defaultdict
from itertools import combinations
def all_n_intersections1(xss, n):
    res = defaultdict(set)
    for t1, t2 in combinations(xss, 2):
        overlap = set(t1) & set(t2)
        if len(overlap) == n:
            cur = res[frozenset(overlap)]
            cur.add(t1)
            cur.add(t2)
def all_n_intersections2(xss, n):
    xss = [frozenset(xs) for xs in xss]
    result = {}
    while xss:
        xsa = xss.pop()
        for xsb in xss:
            ixs = xsa.intersection(xsb)
            if len(ixs) == n:
                if ixs not in result:
                    result[ixs] = [xsa, xsb]
                else:
                    result[ixs].append(xsb)
    return result
data = [tuple(sample(range(10), 5)) for _ in range(100)]
print(timeit(lambda: all_n_intersections1(data, 2), number=1000))
print(timeit(lambda: all_n_intersections2(data, 2), number=1000))
Results:
3.4294801999999995
1.4871790999999999
With some commentary:
def all_n_intersections2(xss, n):
    # using frozensets to be able to use them as dict keys, convert only once
    xss = [frozenset(xs) for xs in xss]
    result = {}
    # keep going until there are no more items left to combine
    while xss:
        # popping to compare against all others remaining, intersect each pair only once
        xsa = xss.pop()
        for xsb in xss:
            # using library intersection, assuming the native implementation is fastest
            ixs = xsa.intersection(xsb)
            if len(ixs) == n:
                if ixs not in result:
                    # not using default dict, initialising with these two
                    result[ixs] = [xsa, xsb]
                else:
                    # otherwise, xsa was already in there, appending xsb
                    result[ixs].append(xsb)
    return result
What the solution does:
for each combination of xsa, xsb from xss, it computes the intersection
if the intersection ixs is the target length n, xsa and xsb are added to a list in a dictionary using ixs as a key
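each pair is intersected only once, which avoids most duplicate appends; however, a tuple can still be appended more than once when three or more tuples share the same intersection, and duplicate tuples in the source data are not removed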
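One caveat with the list-based version: when three or more tuples share the same intersection, a tuple that was already appended as xsb can be appended again once a later xsa is popped. For the question's example [(1, 2, 3, 4, 5), (4, 5, 6, 7, 8), (4, 5, 9, 10, 11)] the group for {4, 5} ends up with four entries instead of three. A minimal sketch of a variant that stores each group as a set is shown below; the name all_n_intersections_unique is hypothetical, and no timing claim is made for it.

```python
def all_n_intersections_unique(xss, n):
    """Like all_n_intersections2, but each group is a set (no repeats)."""
    xss = [frozenset(xs) for xs in xss]
    result = {}
    while xss:
        xsa = xss.pop()                 # compare against the remaining items only
        for xsb in xss:
            ixs = xsa & xsb
            if len(ixs) == n:
                # setdefault creates the group on first use; update adds both
                result.setdefault(ixs, set()).update((xsa, xsb))
    return result

groups = all_n_intersections_unique(
    [(1, 2, 3, 4, 5), (4, 5, 6, 7, 8), (4, 5, 9, 10, 11)], 2)
print(len(groups[frozenset({4, 5})]))   # 3
```

Like the original, this groups tuples by shared intersection only; it does not enforce the question's later requirement that every pair inside a group intersects in exactly n elements.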
How to go about finding the number of non-coprimes in a given array?
Suppose
a = [2, 5, 6, 7]
b = [4, 9, 10, 12]
Then the number of non-coprimes will be 3, since you can remove:
(2, 4)
(5, 10)
(6, 9)
import math  # needed for math.gcd below

n = int(input())
a = list(map(int, input().split()))
b = list(map(int, input().split()))
count = 0
len_a = len(a)
len_b = len(b)
for i in range(len_a):
    for j in range(len_b):
        x = a[i]
        y = b[j]
        if(math.gcd(x,y) != 1):
            count += 1
print(count)
This is in reference to: https://www.hackerrank.com/challenges/computer-game/problem
I am receiving 8 as output.
Why do you expect the answer to be 3?
You're pairing 5 and 10, so you're obviously looking at pairs of elements from a and b disregarding their position.
Just print out the pairs and you'll see why you're getting 8...
import math
from itertools import product
a=[2, 5, 6, 7]
b=[4, 9, 10, 12]
print(sum([math.gcd(x, y) != 1 for x, y in product(a, b)])) # 8
print([(x, y) for x, y in product(a, b) if math.gcd(x, y) != 1]) # the pairs
Update: After reading the problem the OP is trying to handle, it's worth pointing out that the expected output (3) is the answer to a different question!
Not how many pairs of elements are not coprime, but rather how many non-coprime pairs can be removed without returning them into the arrays.
This question is actually an order of magnitude more difficult, and is not a matter of fixing one's code, but rather about giving the actual problem a lot of mathematical and algorithmic thought.
See some discussion here
Last edit, a sort-of solution, albeit an extremely inefficient one. The only point is to suggest some code that can help the OP understand the point of the original question, by seeing some form of solution, however low-quality or bad-runtime it is.
import math
from itertools import product, permutations
n = 4
def get_pairs_list_not_coprime_count(pairs_list):
    x, y = zip(*pairs_list)
    return min(i for i in range(n) if math.gcd(x[i], y[i]) == 1) # number of pairs before hitting a coprime pair
a = [2, 5, 6, 7]
b = [4, 9, 10, 12]
a_perms = permutations(a) # so that the pairing product with b includes all pairing options
b_perms = permutations(b) # so that the pairing product with a includes all pairing options
pairing_options = product(a_perms, b_perms) # pairs-off different orderings of a and b
actual_pairs = [zip(*p) for p in pairing_options] # turn a pair of a&b orderings into number-pairs (for each of the orderings possible as realized by the product)
print(max(get_pairs_list_not_coprime_count(pairs_list) for pairs_list in actual_pairs)) # The most pairings managed over all possible options: 3 for this example
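Counting how many non-coprime pairs can be removed without reusing an element is a maximum bipartite matching problem: elements of a on one side, elements of b on the other, with an edge wherever the gcd is greater than 1. The following is a minimal sketch using the standard augmenting-path approach; the function names are hypothetical, and it is meant to illustrate the idea on small inputs rather than to meet the constraints of the linked challenge (the recursion and the quadratic edge construction would both need rethinking for large arrays).

```python
from math import gcd

def max_noncoprime_pairs(a, b):
    """Largest number of disjoint (x, y) pairs with gcd(x, y) > 1."""
    # adj[i] lists the indices j in b that a[i] may be paired with.
    adj = [[j for j, y in enumerate(b) if gcd(x, y) != 1] for x in a]
    match_b = [-1] * len(b)            # match_b[j] = index in a paired with b[j]

    def try_assign(i, seen):
        """Try to pair a[i], re-routing earlier pairs if necessary."""
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            # b[j] is free, or its current partner can move elsewhere.
            if match_b[j] == -1 or try_assign(match_b[j], seen):
                match_b[j] = i
                return True
        return False

    return sum(try_assign(i, set()) for i in range(len(a)))

print(max_noncoprime_pairs([2, 5, 6, 7], [4, 9, 10, 12]))   # 3
```

For the sample data, 7 is coprime to every element of b, so at most three pairs can be formed, for instance (2, 4), (5, 10) and (6, 9).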
I believe the answer should be 8 itself. Out of the 4*4 possible combinations of numbers that you are comparing, there are 8 coprimes and 8 non-coprimes.
Here is an implementation with a hand-written gcd function (without using math), which uses broadcasting to avoid multiple explicit loops.
import numpy as np  # the code below refers to numpy as np
a = '2 5 6 7'
b = '4 9 10 12'
a = np.array(list(map(int,a.split())))
b = np.array(list(map(int,b.split())))
def gcd(p,q):
    while q != 0:
        p, q = q, p%q
    return p
def is_coprime(x, y):
    return gcd(x, y) == 1
is_coprime_v = np.vectorize(is_coprime)
compare = is_coprime_v(a[:, None], b[None, :])
noncoprime_pairs = [(a[i],b[j]) for i,j in np.argwhere(~compare)]
coprime_pairs = [(a[i],b[j]) for i,j in np.argwhere(compare)]
print('non-coprime',noncoprime_pairs)
print('coprime',coprime_pairs)
non-coprime [(2, 4), (2, 10), (2, 12), (5, 10), (6, 4), (6, 9), (6, 10), (6, 12)]
coprime [(2, 9), (5, 4), (5, 9), (5, 12), (7, 4), (7, 9), (7, 10), (7, 12)]
The same solution, but using math.gcd():
import math
import numpy as np  # the code below refers to numpy as np
a = '2 5 6 7'
b = '4 9 10 12'
a = np.array(list(map(int,a.split())))
b = np.array(list(map(int,b.split())))
def f(x,y):
    return math.gcd(x, y) == 1
fv = np.vectorize(f)
compare = fv(a[:, None], b[None, :])
noncoprime_pairs = [(a[i],b[j]) for i,j in np.argwhere(~compare)]
print(noncoprime_pairs)
[(2, 4), (2, 10), (2, 12), (5, 10), (6, 4), (6, 9), (6, 10), (6, 12)]
If you are looking for the answer to be 3 in your example, I would assume you are counting the number of values in a that have at least one non-coprime in b.
If that is the case you could do it like this:
from math import gcd
def nonCoprimes(A,B):
    return sum(any(gcd(a,b)>1 for b in B) for a in A)
print(nonCoprimes([2,5,6,7],[4,9,10,12])) # 3
So, for each value in a check if there are any values of b that don't have a gcd of 1 with the value in a
I am just a beginner in python and this may seem like an easy fix but I have been stuck at it given my limited knowledge of python.
I have two lists that are paired together:
s = [0,1,2,3,4,5,6,7,3,5,7]
t = [2,4,6,2,1,6,3,1,7,4,1]
This can be interpreted as start nodes and end nodes of lines, so 0 is connected to 2 and 1 is connected to 4 and so on.
I would like to remove all duplicate "lines" or pairs of nodes, in this example 7 -> 1 is repeated twice and 1 -> 4 is duplicated in the other direction 4 -> 1. I want to remove both types of duplicates and get the results:
S = [0,1,2,3,5,6,7,3,5]
T = [2,4,6,2,6,3,1,7,4]
Preserving the order and pairs of start and end is required.
I hope this makes sense, any help is greatly appreciated!
You can use a paired set and deduplicate the lists in order such as:
s = [0,1,2,3,4,5,6,7,3,5,7]
t = [2,4,6,2,1,6,3,1,7,4,1]
seen=set()
li=[]
for pair in zip(s,t):              # named pair so the list t is not overwritten
    if frozenset(pair) not in seen:
        li.append(pair)
        seen.add(frozenset(pair))
S,T=map(list,(zip(*li)))
Result:
>>> S
[0, 1, 2, 3, 5, 6, 7, 3, 5]
>>> T
[2, 4, 6, 2, 6, 3, 1, 7, 4]
Note: This can be reduced to:
seen=set()
S,T=zip(*[pair for pair in zip(s,t) if frozenset(pair) not in seen and not seen.add(frozenset(pair))])
But some will object to the use of a side effect in a list comprehension. I personally think it is OK in this use, but the loop form is considered by many to be better because it is far easier to read.
You can zip these lists together and use set comprehension
u = {tuple({a,b}) for (a,b) in (zip(s,t))}
# u: {(0, 2), (1, 4), (1, 7), (2, 3), (2, 6), (3, 6), (3, 7), (4, 5), (5, 6)}
first, sec = zip(*u)
# first: (6, 6, 5, 4, 3, 6, 7, 7, 2)
# sec : (2, 5, 4, 1, 2, 3, 1, 3, 0)
We use tuple to make the objects hashable.
Just note that sets are unordered, so if order is important, please highlight that in your question.
To preserve order, check @Dawg's answer. My solution for this case was very similar to his after he undeleted ;)
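Because the set comprehension loses the original order, an order-preserving variant can be sketched with a dictionary keyed by the unordered pair. The function name dedupe_edges is hypothetical, and the sketch assumes Python 3.7+, where dictionaries keep insertion order.

```python
def dedupe_edges(s, t):
    """Drop repeated lines, treating (a, b) and (b, a) as the same line."""
    first_seen = {}
    for pair in zip(s, t):
        # setdefault keeps the first pair stored for each unordered key
        first_seen.setdefault(frozenset(pair), pair)
    if not first_seen:
        return [], []
    S, T = map(list, zip(*first_seen.values()))
    return S, T

s = [0, 1, 2, 3, 4, 5, 6, 7, 3, 5, 7]
t = [2, 4, 6, 2, 1, 6, 3, 1, 7, 4, 1]
S, T = dedupe_edges(s, t)
print(S)   # [0, 1, 2, 3, 5, 6, 7, 3, 5]
print(T)   # [2, 4, 6, 2, 6, 3, 1, 7, 4]
```

The first occurrence of each line is kept with its original direction, so 1 -> 4 survives and the later 4 -> 1 is dropped.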
I have a RDD of this type:
[(1, [3, 10, 11]), (2, [3, 4, 10, 11]), (3, [1, 4]), (4, [2, 3, 10])...]
And I need a function that follows this rule:
if the key x does not contain the key y (and vice versa) in its value-list, then output a tuple having the following syntax:
[(x, [y, len(values_x & values_y)]), ...]
where len(values_x & values_y) is the number of values in common between the two keys (the size of the intersection of their value-lists). If this value is 0 (i.e., no values in common), just skip this pair of keys.
E.g., from the sample above, the output should be:
(1, [2, 3]) # because keys 1 and 2 share the values 3, 10, 11
(1, [4, 2]) # because keys 1 and 4 share the values 3, 10
skipping: (2, [1, 3]) is the inverse of (1, [2, 3]), so it can be skipped
(2, [3, 1]) # because keys 2 and 3 share the value 4
...
The pair of keys 1 and 3 (and other similar cases) is skipped because key 3 is included in the list-value of key 1 and vice versa.
A solution that I've implemented (but I don't like at all), is using the cartesian function to create all the combinations between keys and then a mapping and a filtering to delete unnecessary pairs.
Is there a better solution without using cartesian?
First, let's define some helpers:
def swap(x):
    """Given a tuple (x1, x2) return (x2, 1)"""
    return (x[1], 1)
def filter_source(x):
    """Check if s1 < s2 in (x, (s1, s2))"""
    return x[1][0] < x[1][1]
def reshape(kv):
    """Reshape ((k1, k2), v) to get final result"""
    ((k1, k2), v) = kv
    return (k1, (k2, v))
and create an example RDD:
rdd = sc.parallelize([
    (1, [3, 10, 11]), (2, [3, 4, 10, 11]),
    (3, [1, 4]), (4, [2, 3, 10])])
Finally you can do something like this:
from operator import add
flattened = rdd.flatMap(lambda kv: ((v, kv[0]) for v in kv[1])) # Flatten input
flattened.first()
# (1, 3) <- from (3, [1, 4])
result = (flattened
    .join(flattened) # Perform self join using value from input as key
    .filter(filter_source) # Remove pairs from the same source
    .map(swap)
    .reduceByKey(add)
    .map(reshape)) # Get final output
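To see what the pipeline above computes without a Spark cluster, the same steps can be mimicked in plain Python. This is only an illustrative sketch: the function name shared_value_counts is hypothetical, it assumes the values within each list are unique, and, like the pipeline shown, it counts shared values for every pair of keys without applying the question's rule about skipping keys that appear in each other's lists.

```python
from collections import Counter, defaultdict
from itertools import combinations

def shared_value_counts(records):
    """Return (k1, (k2, n)) for each key pair k1 < k2 sharing n values."""
    # Step 1 (flatMap): index the keys by each value they contain.
    keys_by_value = defaultdict(list)
    for key, values in records:
        for v in values:
            keys_by_value[v].append(key)
    # Step 2 (self join + filter): every ordered key pair k1 < k2 per value.
    # Step 3 (map + reduceByKey): count how often each pair occurs.
    counts = Counter()
    for keys in keys_by_value.values():
        for k1, k2 in combinations(sorted(keys), 2):
            counts[(k1, k2)] += 1
    # Step 4 (reshape): ((k1, k2), n) -> (k1, (k2, n))
    return [(k1, (k2, n)) for (k1, k2), n in sorted(counts.items())]

data = [(1, [3, 10, 11]), (2, [3, 4, 10, 11]), (3, [1, 4]), (4, [2, 3, 10])]
print(shared_value_counts(data))
# [(1, (2, 3)), (1, (4, 2)), (2, (3, 1)), (2, (4, 2))]
```

Grouping by value first means only keys that actually share something are ever paired, which is the reason the join-based approach avoids the full cartesian product.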