Two way removal of duplicate start and end nodes - python

I am just a beginner in python and this may seem like an easy fix but I have been stuck at it given my limited knowledge of python.
I have two lists that are paired together:
s = [0,1,2,3,4,5,6,7,3,5,7]
t = [2,4,6,2,1,6,3,1,7,4,1]
This can be interpreted as start nodes and end nodes of lines, so 0 is connected to 2 and 1 is connected to 4 and so on.
I would like to remove all duplicate "lines" or pairs of nodes, in this example 7 -> 1 is repeated twice and 1 -> 4 is duplicated in the other direction 4 -> 1. I want to remove both types of duplicates and get the results:
S = [0,1,2,3,5,6,7,3,5]
T = [2,4,6,2,6,3,1,7,4]
Preserving the order and pairs of start and end is required.
I hope this makes sense, any help is greatly appreciated!

You can use a set of unordered pairs and deduplicate the lists in order, such as:
s = [0,1,2,3,4,5,6,7,3,5,7]
t = [2,4,6,2,1,6,3,1,7,4,1]
seen = set()
li = []
for pair in zip(s, t):
    # frozenset ignores direction, so (1, 4) and (4, 1) compare equal
    key = frozenset(pair)
    if key not in seen:
        li.append(pair)
        seen.add(key)
S, T = map(list, zip(*li))
Result:
>>> S
[0, 1, 2, 3, 5, 6, 7, 3, 5]
>>> T
[2, 4, 6, 2, 6, 3, 1, 7, 4]
Note: This can be reduced to:
seen = set()
S, T = map(list, zip(*[p for p in zip(s, t) if frozenset(p) not in seen and not seen.add(frozenset(p))]))
But some will object to the use of a side effect in a list comprehension. I personally think it is OK in this use, but the loop form is considered by many to be better because it is far easier to read.

You can zip these lists together and use a set comprehension:
u = {tuple({a,b}) for (a,b) in (zip(s,t))}
# u: {(0, 2), (1, 4), (1, 7), (2, 3), (2, 6), (3, 6), (3, 7), (4, 5), (5, 6)}
first, sec = zip(*u)
# first: (6, 6, 5, 4, 3, 6, 7, 7, 2)
# sec : (2, 5, 4, 1, 2, 3, 1, 3, 0)
We convert each set to a tuple because sets are not hashable.
Note that sets are unordered, so if order is important, please highlight that in your question.
To preserve order, check @dawg's answer. My solution for this case was very similar to his after he undeleted ;)
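For the order-preserving case, one possible sketch keeps the first occurrence of each undirected pair in a dict keyed by a frozenset. It assumes Python 3.7+, where dicts keep insertion order; the name first_seen is only illustrative and does not come from either answer.

```python
s = [0, 1, 2, 3, 4, 5, 6, 7, 3, 5, 7]
t = [2, 4, 6, 2, 1, 6, 3, 1, 7, 4, 1]

# Map each undirected pair to the first directed pair that produced it.
# setdefault leaves an existing entry alone, so later duplicates are ignored.
first_seen = {}
for pair in zip(s, t):
    first_seen.setdefault(frozenset(pair), pair)

S, T = map(list, zip(*first_seen.values()))
print(S)  # [0, 1, 2, 3, 5, 6, 7, 3, 5]
print(T)  # [2, 4, 6, 2, 6, 3, 1, 7, 4]
```

This behaves like the loop with a seen set, but stores the kept pair alongside its key instead of in a separate list.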

Related

Print every two pairs in a new line from array of elements in Python

How can I print from an array of elements in Python every second pair of elements one below another, without commas and brackets?
My array looks like this:
m=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
And, I want to print in one of the cases:
1 2
5 6
9 10
or in another case:
3 4
7 8
11 12
I didn't know how to do that, so I created two separate arrays, but when I try to print the elements in separate rows, each pair has brackets and a comma. Is there an easier way to solve this and make the output look as I wrote it?
What I've tried:
a=[m[j:j+2] for j in range(0,len(m),2)]
a1=m[::2]
a2=m[1::2]
if s1>s2:
    print("\n".join(map(str,a1)))
elif s1<s2:
    print("\n".join(map(str,a2)))
My current output:
[3, 4]
[7, 8]
[11, 12]
You could use a while loop:
m = [1,2,3,4,5,6,7,8,9]
idx = 0
try:
    while idx < len(m):
        print(m[idx], m[idx+1])
        idx += 4  # skip the pair that belongs to the other case
except IndexError:
    print("Index out of bounds")
To get the other case, just change the start index (idx) to 2.
Another way to do it:
m=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
# slice the list itself, so this also works when the values are not consecutive
pairs = [m[i:i+2] for i in range(0, len(m), 2)]
results = [], []
for i, e in enumerate(pairs):
    results[i%2].append(e)
for i in results:
    for p in i:
        print(*p)
    print("-----")
Output:
1 2
5 6
9 10
-----
3 4
7 8
11 12
-----
What you are trying to achieve is to split the array into pairs and save alternate pairs in different lists. One way of achieving this, step by step, is the code below.
m=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
make_pair = [(m[i], m[i+1]) for i in range(0, len(m), 2)]
res1 = []
res2 = []
print(make_pair)
# [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12)]
for i in range(len(make_pair)):
    if i%2:
        res2.append(make_pair[i])
    else:
        res1.append(make_pair[i])
print(res1)
# [(1, 2), (5, 6), (9, 10)]
print(res2)
# [(3, 4), (7, 8), (11, 12)]
If you want to do it in one pass, i.e. without creating the pair list, you can achieve the same result using a temporary stack.
I don't understand why all the solutions are so complex, this is as simple as follows (case 1):
len_m = len(m)
for i in range(0, len_m - 1 if len_m & 1 else len_m, 4):
    print(f"{m[i]} {m[i + 1]}")
For case 2, just start the range with 2.
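Both cases can be folded into one small helper. This is only a sketch: the function name alternate_pairs and its start parameter are made up for illustration, and it assumes the pairs are taken at positions 0-1, 2-3, and so on.

```python
def alternate_pairs(m, start=0):
    """Return every second pair of m, beginning at index `start` (0 or 2)."""
    # Stopping at len(m) - 1 guarantees that m[i + 1] always exists.
    return [(m[i], m[i + 1]) for i in range(start, len(m) - 1, 4)]

m = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

for pair in alternate_pairs(m):           # case 1
    print(*pair)
print("-----")
for pair in alternate_pairs(m, start=2):  # case 2
    print(*pair)
```

print(*pair) unpacks the tuple, so each pair is printed without brackets or commas.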

Fastest algorithm to get all the tuple groups that has the same N intersections from a list of tuples without duplicates

I have a list of 100 tuples. Each tuple contains 5 unique integers. I want to know the fastest way to find all the groups that have exactly the same N = 2 intersections. If a tuple has multiple pairs of elements that has 2 intersections with other tuples, then find all of them and store in different groups. The expected output is a list of unique lists ([(1,2,3,4,5),(4,5,6,7,8)] is the same as [(4,5,6,7,8),(1,2,3,4,5)]), where each list is a group that has all tuples with the same N=2 intersections. Below is my code:
from collections import defaultdict
from random import sample, choice
lst = [tuple(sample(range(10), 5)) for _ in range(100)]
dct = defaultdict(list)
N = 2
for i in lst:
    for j in lst:
        if len(set(i).intersection(set(j))) == N:
            dct[i].append(j)
key = choice(list(dct))
print([key] + dct[key])
>>> [(4, 5, 2, 3, 7), (4, 6, 2, 5, 0), (9, 4, 2, 1, 8), (7, 6, 5, 2, 0), (2, 4, 0, 7, 8)]
Obviously, all last 4 tuples have 2 intersections with the first tuple, but not necessarily the same 2 elements. So how should I get the tuples that has the same 2 intersections?
An obvious solution is to brute force enumerate all possible (x, y) integer pairs and group tuples that has this (x, y) intersection accordingly, but is there a faster algorithm to do this?
Edit: [(1, 2, 3, 4, 5), (4, 5, 6, 7, 8), (4, 5, 9, 10, 11)] is allowed to be in a same group, but [(1, 2, 3, 4, 5),(4, 5, 6, 7, 8), (4, 5, 6, 10, 11)] is not, because (4, 5, 6, 7, 8) has 3 intersections with (4, 5, 6, 10, 11). In this case, it should be divided into 2 groups [(1, 2, 3, 4, 5), (4, 5, 6, 7, 8)] and [(1, 2, 3, 4, 5), (4, 5, 6, 10, 11)]. The final result will of course contains groups with various sizes, including many short lists with only two tuples, but this is what I want.
A simple combinations-based approach will suffice:
from collections import defaultdict
from itertools import combinations
res = defaultdict(set)
for t1, t2 in combinations(lst, 2):  # lst is the list of tuples from the question
    overlap = set(t1) & set(t2)
    if len(overlap) == 2:
        cur = res[frozenset(overlap)]
        cur.add(t1)
        cur.add(t2)
result:
defaultdict(set,
{frozenset({2, 4}): {(2, 4, 0, 7, 8),
(4, 5, 2, 2, 4),
(4, 6, 2, 6, 0),
(8, 4, 2, 1, 8)},
frozenset({2, 5}): {(4, 5, 2, 2, 4), (7, 6, 5, 2, 0)}})
I like how clean @acushner's solution looks, but I wrote one that's substantially faster:
def all_n_intersections2(xss, n):
    xss = [frozenset(xs) for xs in xss]
    result = {}
    while xss:
        xsa = xss.pop()
        for xsb in xss:
            ixs = xsa.intersection(xsb)
            if len(ixs) == n:
                if ixs not in result:
                    result[ixs] = [xsa, xsb]
                else:
                    result[ixs].append(xsb)
    return result
If I pit them against each other:
from timeit import timeit
from random import sample
from collections import defaultdict
from itertools import combinations
def all_n_intersections1(xss, n):
    res = defaultdict(set)
    for t1, t2 in combinations(xss, 2):
        overlap = set(t1) & set(t2)
        if len(overlap) == n:
            cur = res[frozenset(overlap)]
            cur.add(t1)
            cur.add(t2)
    return res
def all_n_intersections2(xss, n):
    xss = [frozenset(xs) for xs in xss]
    result = {}
    while xss:
        xsa = xss.pop()
        for xsb in xss:
            ixs = xsa.intersection(xsb)
            if len(ixs) == n:
                if ixs not in result:
                    result[ixs] = [xsa, xsb]
                else:
                    result[ixs].append(xsb)
    return result
data = [tuple(sample(range(10), 5)) for _ in range(100)]
print(timeit(lambda: all_n_intersections1(data, 2), number=1000))
print(timeit(lambda: all_n_intersections2(data, 2), number=1000))
Results:
3.4294801999999995
1.4871790999999999
With some commentary:
def all_n_intersections2(xss, n):
    # using frozensets to be able to use them as dict keys, convert only once
    xss = [frozenset(xs) for xs in xss]
    result = {}
    # keep going until there are no more items left to combine
    while xss:
        # popping to compare against all others remaining, intersect each pair only once
        xsa = xss.pop()
        for xsb in xss:
            # using library intersection, assuming the native implementation is fastest
            ixs = xsa.intersection(xsb)
            if len(ixs) == n:
                if ixs not in result:
                    # not using default dict, initialising with these two
                    result[ixs] = [xsa, xsb]
                else:
                    # otherwise, xsa was already in there, appending xsb
                    result[ixs].append(xsb)
    return result
What the solution does:
for each combination of xsa, xsb from xss, it computes the intersection
if the intersection ixs is the target length n, xsa and xsb are added to a list in a dictionary using ixs as a key
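each pair is intersected only once, but a tuple can still be appended again when a later pair yields the same intersection (and duplicate tuples in the source data are repeated as well)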
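One caveat: tracing all_n_intersections2 by hand on the three tuples from the question's edit, (1, 2, 3, 4, 5) ends up in the list for {4, 5} twice, because it is appended once for each other tuple it shares that intersection with. If that matters, a variant that stores sets instead of lists sidesteps it. The following is only a sketch (the name all_n_intersections_sets is made up here), and it has not been benchmarked against the versions above.

```python
from collections import defaultdict

def all_n_intersections_sets(xss, n):
    """Group tuples (as frozensets) by each exact n-element intersection."""
    xss = [frozenset(xs) for xs in xss]
    result = defaultdict(set)
    while xss:
        xsa = xss.pop()
        for xsb in xss:
            ixs = xsa & xsb
            if len(ixs) == n:
                # sets silently ignore members that are already present
                result[ixs].update((xsa, xsb))
    return dict(result)

data = [(1, 2, 3, 4, 5), (4, 5, 6, 7, 8), (4, 5, 9, 10, 11)]
groups = all_n_intersections_sets(data, 2)
print(groups)
```

Like the versions above, this groups by shared intersection only; it does not split a group when two of its members overlap in more than n elements, which the question's edit asks for.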

Return various sum totals of integer list

Is there a way to return various sums of a list of integers? Pythonic or otherwise.
For e.g. various sum totals from [1, 2, 3, 4] would produce 1+2=3, 1+3=4, 1+4=5, 2+3=5, 2+4=6, 3+4=7. Integers to be summed could by default be stuck to two integers only or more I guess.
Can't seem to wrap my head around how to tackle this and can't seem to find an example or explanation on the internet as they all lead to "Sum even/odd numbers in list" and other different problems.
You can use itertools.combinations and sum:
from itertools import combinations
li = [1, 2, 3, 4]
# assuming we don't need to sum the entire list or single numbers,
# and that x + y is the same as y + x
for sum_size in range(2, len(li)):
    for comb in combinations(li, sum_size):
        print(comb, sum(comb))
outputs
(1, 2) 3
(1, 3) 4
(1, 4) 5
(2, 3) 5
(2, 4) 6
(3, 4) 7
(1, 2, 3) 6
(1, 2, 4) 7
(1, 3, 4) 8
(2, 3, 4) 9
Is this what you are looking for?
A=[1,2,3,4]
for i in A:
    for j in A:
        if i!=j:
            print(i+j)
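Note that the nested loop above visits every ordered pair, so each sum is printed twice (1+2 and 2+1), and it compares values rather than positions. A small sketch that pairs each position only with later positions, assuming x + y and y + x should count once:

```python
A = [1, 2, 3, 4]

# j always starts after i, so every unordered pair of positions appears once
pair_sums = [A[i] + A[j] for i in range(len(A)) for j in range(i + 1, len(A))]
print(pair_sums)  # [3, 4, 5, 5, 6, 7]
```

Because it works on indices, repeated values in the list are still paired with each other.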

loop through 3 lists to Get substrings from elements of a list by using the start and end point from the other 2 lists

I have 3 different lists. Using the corresponding start point and end point from 2 lists I want to make a new list that has the substring from the 1st list.
for i in string_list:
    for x in Start_Point_list:
        for y in End_Point_list:
            m= string_list[(i for i in Start_Point_list): (y for y in End_Point_list)]
I want to get a list of [m]
Do this:
m = [string_list[s:e] for s,e in zip(Start_Point_list, End_Point_list)]
zip() returns an iterator of the corresponding elements in its arguments. That is, zip([1, 2, 3, 4], [3, 2, 1, 0]) would yield (1, 3), (2, 2), (3, 1), and (4, 0)
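If string_list is a list of strings and each string should be cut with its own start and end point (an assumption based on the question's title; the sample data below is invented), zip can walk all three lists together:

```python
string_list = ["hello world", "stackoverflow", "python"]
Start_Point_list = [0, 5, 2]
End_Point_list = [5, 13, 4]

# Take the i-th string with the i-th start and the i-th end point.
m = [text[s:e] for text, s, e in zip(string_list, Start_Point_list, End_Point_list)]
print(m)  # ['hello', 'overflow', 'th']
```

zip stops at the shortest input, so any extra entries in a longer list are silently ignored.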

How can I order a list of connections

I currently have a list of connections stored in a list where each connection is a directed link that connects two points and no point ever links to more than one point or is linked to by more than one point. For example:
connections = [ (3, 7), (6, 5), (4, 6), (5, 3), (7, 8), (1, 2), (2, 1) ]
Should produce:
ordered = [ [ 4, 6, 5, 3, 7, 8 ], [ 1, 2, 1 ] ]
I have attempted to do this using an algorithm that takes an input point and a list of connections and recursively calls itself to find the next point and add it to the growing ordered list. However, my algorithm breaks down when I don't start with the correct point (though this should just be a matter of repeating the same algorithm in reverse), and also when there are multiple unconnected strands.
What would be the best way of writing an efficient algorithm to order these connections?
Algorithm for a Solution
You're looking for a topological sort algorithm:
from collections import defaultdict
def topological_sort(dependency_pairs):
    'Sort values subject to dependency constraints'
    num_heads = defaultdict(int)   # num arrows pointing in
    tails = defaultdict(list)      # list of arrows going out
    for h, t in dependency_pairs:
        num_heads[t] += 1
        tails[h].append(t)
    # start with the nodes that have no incoming arrows
    ordered = [h for h in tails if h not in num_heads]
    for h in ordered:
        for t in tails[h]:
            num_heads[t] -= 1
            if not num_heads[t]:
                ordered.append(t)
    # anything still waiting on a predecessor is part of a cycle
    cyclic = [n for n, heads in num_heads.items() if heads]
    return ordered, cyclic
if __name__ == '__main__':
    connections = [(3, 7), (6, 5), (4, 6), (5, 3), (7, 8), (1, 2), (2, 1)]
    print(topological_sort(connections))
Here is the output for your sample data (Python 3.7+, where dicts keep insertion order):
([4, 6, 5, 3, 7, 8], [2, 1])
The runtime is linearly proportional to the number of edges (dependency pairs).
HOW IT WORKS
The algorithm is organized around a lookup table called num_heads that keeps a count of the number of predecessors (incoming arrows). Consider an example with the following connections: a->h b->g c->f c->h d->i e->d f->b f->g h->d h->e i->b. The counts are:
node  number of incoming edges
----  ------------------------
a     0
b     2
c     0
d     2
e     1
f     1
g     2
h     2
i     1
The algorithm works by "visiting" nodes with no predecessors. For example, nodes a and c have no incoming edges, so they are visited first.
Visiting means that the nodes are output and removed from the graph. When a node is visited, we loop over its successors and decrement their incoming count by one.
For example, in visiting node a, we go to its successor h to decrement its incoming count by one (so that h 2 becomes h 1).
Likewise, when visiting node c, we loop over its successors f and h, decrementing their counts by one (so that f 1 becomes f 0 and h 1 becomes h 0).
The nodes f and h no longer have incoming edges, so we repeat the process of outputting them and removing them from the graph until all the nodes have been visited. In the example, the visitation order (the topological sort) is:
a c f h e d i b g
If num_heads ever arrives at a state when there are no nodes without incoming edges, then it means there is a cycle that cannot be topologically sorted and the algorithm exits to show the requested results.
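The walkthrough can be replayed on the lettered example. The sketch below restates the same counting idea with an explicit queue; the function name visit_order and the use of collections.deque are choices made for this illustration, not part of the answer's code.

```python
from collections import defaultdict, deque

def visit_order(edges):
    """Return nodes in the order they are visited (no-predecessor nodes first)."""
    incoming = defaultdict(int)      # number of arrows pointing at each node
    successors = defaultdict(list)   # nodes each node points to
    for head, tail in edges:
        incoming[tail] += 1
        successors[head].append(tail)
    # Nodes that never appear as a target can be visited immediately.
    queue = deque(n for n in successors if n not in incoming)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in successors.get(node, []):
            incoming[nxt] -= 1
            if incoming[nxt] == 0:
                queue.append(nxt)
    return order

text = "a->h b->g c->f c->h d->i e->d f->b f->g h->d h->e i->b"
edges = [tuple(e.split("->")) for e in text.split()]
print(" ".join(visit_order(edges)))  # a c f h e d i b g
```

Nodes that sit on a cycle never reach a count of zero, so they simply do not appear in the returned order.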
Something like this:
from collections import defaultdict
lis = [ (3, 7), (6, 5), (4, 6), (5, 3), (7, 8), (1, 2), (2, 1) ]
dic = defaultdict(list)
for k,v in lis:
    if v not in dic:
        dic[k].append(v)
    else:
        dic[k].extend([v]+dic[v])
        del dic[v]
# iterate over a snapshot, because entries are deleted while merging
for k,v in list(dic.items()):
    for x in v:
        if x in dic and x!=k:
            dic[k].extend(dic[x])
            del dic[x]
print(dic)
print([[k]+v for k,v in dic.items()])
output (Python 3.7+):
defaultdict(<class 'list'>, {4: [6, 5, 3, 7, 8], 2: [1, 2]})
[[4, 6, 5, 3, 7, 8], [2, 1, 2]]
I think you can probably do it in O(n) with something like this:
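ordered = {}
for s, t in connections:  # connections is the list from the question
    if t in ordered:
        ordered[s] = [t] + ordered.pop(t)
    else:
        ordered[s] = [t]
# Now merge chains whose last node is the start of another chain...
for s in list(ordered):
    if s not in ordered:
        continue  # already merged into another chain
    while ordered[s][-1] in ordered and ordered[s][-1] != s:
        ordered[s] += ordered.pop(ordered[s][-1])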
At the end you will be left with something like this, but you can convert that rather easily
ordered = {
4 : [6, 5, 3, 7, 8],
2 : [1, 2]
}
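The conversion mentioned above could look like the following sketch, which prepends each key to its chain. It starts from the dictionary shown above, so it says nothing about how that dictionary was built.

```python
ordered = {
    4: [6, 5, 3, 7, 8],
    2: [1, 2],
}

# Each key is the first node of its chain, so put it in front of the rest.
chains = [[start] + rest for start, rest in ordered.items()]
print(chains)  # [[4, 6, 5, 3, 7, 8], [2, 1, 2]]
```

The cycle comes out as [2, 1, 2] rather than the [1, 2, 1] shown in the question; both describe the same loop, just starting from a different node.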
