Find duplicate values in list of tuples in Python

Find duplicate values in list of tuples in Python - python

How do I find duplicate values in the following list of tuples?
[(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)]
I want to get a list like:
[4081, 4082, 4086, 4090]
I have tried using itemgetter then group by option but didn't work.
How can one do this?

Use an ordered dictionary with first items as its keys and list of second items as values (for duplicates which created using dict.setdefalt()) then pick up those that have a length more than 1:
>>> from itertools import chain
>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> for i, j in lst:
... d.setdefault(i,[]).append(j)
...
>>>
>>> list(chain.from_iterable([j for i, j in d.items() if len(j)>1]))
[4081, 4082, 4086, 4090]

As an alternative, if you want to use groupby, here is a way to do it:
In [1]: from itertools import groupby
In [2]: ts = [(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)]
In [3]: dups = []
In [4]: for _, g in groupby(ts, lambda x: x[0]):
...: grouped = list(g)
...: if len(grouped) > 1:
...: dups.extend([dup[1] for dup in grouped])
...:
In [5]: print(dups)
[4081, 4082, 4086, 4090]
You use groupby to group from the first element of the tuple, and add the duplicate value into the list from the tuple.

Yet another approach (without any imports):
In [896]: lot = [(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)]
In [897]: d = dict()
In [898]: for key, value in lot:
...: d[key] = d.get(key, []) + [value]
...:
...:
In [899]: d
Out[899]: {1622: [4081, 4082], 1624: [4083], 1626: [4085], 1650: [4086, 4090]}
In [900]: [d[key] for key in d if len(d[key]) > 1]
Out[900]: [[4086, 4090], [4081, 4082]]
In [901]: sorted([num for num in lst for lst in [d[key] for key in d if len(d[key]) > 1]])
Out[901]: [4081, 4081, 4082, 4082]

Haven't tested this.... (edit: yup, it works)
l = [(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)]
dup = []
for i, t1 in enumerate(l):
for t2 in l[i+1:]:
if t1[0]==t2[0]:
dup.extend([t1[1], t2[1]])
print dup

Related

list method to group elements in Python

Thanks for all your answers but I edit my question because it was not clear for all.
I have the following list of tuples:
[("ok",1),("yes",1),("no",0),("why",1),("some",1),("eat",0),("give",0),("about",0),("tell",1),("ask",0),("be",0)]
I would like to have :
[("ok yes","no"),("why some","eat give about"),("tell","ask be")]
Thank you !
So I want to regroup all 1 and when a 0 appears I add the value in my list and I create a new element for the next values.

You can use itertools.groupby:
from itertools import groupby
d = [("ok",1),("yes",1),("no",0),("why",1),("some",1),("eat",0),("give",0),("about",0),("tell",1),("ask",0),("be",0)]
new_d = [' '.join(j for j, _ in b) for _, b in groupby(d, key=lambda x:x[-1])]
result = [(new_d[i], new_d[i+1]) for i in range(0, len(new_d), 2)]
Output:
[('ok yes', 'no'), ('why some', 'eat give about'), ('tell', 'ask be')]

As per my understanding following code should work for your above question
list_tuples = [("ok",1),("yes",1),("no",0),("why",1),("some",1),("eat",0)]
tups=[]
updated_list=[]
for elem in list_tuples:
if elem[1] == 0:
updated_list.append(tuple([' '.join(tups), elem[0]]))
tups=[]
else:
tups.append(elem[0])
print updated_list

One possible solution using itertools.groupby:
from operator import itemgetter
from itertools import groupby
lst = [("ok",1), ("yes",1), ("no",0), ("why",1), ("some",1), ("eat",0)]
def generate(lst):
rv = []
for v, g in groupby(lst, itemgetter(1)):
if v:
rv.append(' '.join(map(itemgetter(0), g)))
else:
for i in g:
rv.append(i[0])
yield tuple(rv)
rv = []
# yield last item if==1:
if v:
yield tuple(rv)
print([*generate(lst)])
Prints:
[('ok yes', 'no'), ('why some', 'eat')]

Python: Adding integer elements of a nested list to a list

So, I have two lists whose integer elements need to be added.
nested_lst_1 = [[6],[7],[8,9]]
lst = [1,2,3]
I need to add them such that every element in the nested list, will be added to its corresponding integer in 'lst' to obtain another nested list.
nested_list_2 = [[6 + 1],[7 + 2],[8 + 3,9 + 3]]
or
nested_list_2 = [[7],[9],[11,12]]
Then, I need to use the integers from nested_list_1 and nested_list_2 as indices to extract a substring from a string.
nested_list_1 = [[6],[7],[8,9]] *obtained above*
nested_list_2 = [[7],[9],[11,12]] *obtained above*
string = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'
string[6:7] = 'CG'
string[7:9] = 'GTA'
string[8:11] = 'TACG'
string[9:12] = 'ACGA'
Then, I need to create a nested list of the substrings obtained:
nested_list_substrings = [['CG'],['GTA'],['TACG','ACGA']]
Finally, I need to use these substrings as key values in a dictionary which also possesses keys of type string.
keys = ['GG', 'GTT', 'TCGG']
nested_list_substrings = [['CG'],['GTA'],['TACG','ACGA']]
DNA_mutDNA = {'GG':['CG'], 'GTT':['GTA'], 'TCGG':['TACG','ACGA']}
I understand that this is a multi-step problem, but if you could assist in any way, I really appreciate it.

Assuming you don't need the intermediate variables, you can do all this with a dictionary comprehension:
a = [[6],[7],[8,9]]
b = [1,2,3]
keys = ['GG', 'GTT', 'TCGG']
s = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'
DNA_mutDNA = {k: [s[start:start+length+1] for start in starts]
for k, starts, length in zip(keys, a, b)}

You can produce the substring list directly with a nested list comprehension, nested_lst_2 isn't necessary.
nested_lst_1 = [[6],[7],[8,9]]
lst = [1,2,3]
string = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'
keys = ['GG', 'GTT', 'TCGG']
substrings = [[string[v:i+v+1] for v in u] for i, u in zip(lst, nested_lst_1)]
print(substrings)
DNA_mutDNA = dict(zip(keys, substrings))
print(DNA_mutDNA)
output
[['CG'], ['GTA'], ['TACG', 'ACGA']]
{'GG': ['CG'], 'GTT': ['GTA'], 'TCGG': ['TACG', 'ACGA']}

In[2]: nested_lst_1 = [[6],[7],[8,9]]
...: lst = [1,2,3]
...: string = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'
...: keys = ['GG', 'GTT', 'TCGG']
In[3]: nested_lst_2 = [[elem + b for elem in a] for a, b in zip(nested_lst_1, lst)]
In[4]: nested_list_substrings = []
...: for a, b in zip(nested_lst_1, nested_lst_2):
...: nested_list_substrings.append([string[c:d + 1] for c, d in zip(a, b)])
...:
In[5]: {k: v for k, v in zip(keys, nested_list_substrings)}
Out[5]: {'GG': ['CG'], 'GTT': ['GTA'], 'TCGG': ['TACG', 'ACGA']}

Surely not the most readable way to do it, here is a bit of functional style fun:
nested_lst_1 = [[6], [7], [8,9]]
lst = [1, 2, 3]
nested_lst_2 = list(map(
list,
map(map, map(lambda n: (lambda x: n+x), lst), nested_lst_1)))
nested_lst_2
Result looks as expected:
[[7], [9], [11, 12]]
Then:
from itertools import starmap
from operator import itemgetter
make_slices = lambda l1, l2: starmap(slice, zip(l1, map(lambda n: n+1, l2)))
string = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'
get_slice = lambda s: itemgetter(s)(string)
nested_list_substrings = list(map(
lambda slices: list(map(get_slice, slices)),
starmap(make_slices, zip(nested_lst_1, nested_lst_2))))
nested_list_substrings
Result:
[['CG'], ['GTA'], ['TACG', 'ACGA']]
And finally:
keys = ['GG', 'GTT', 'TCGG']
DNA_mutDNA = dict(zip(keys, nested_list_substrings))
DNA_mutDNA
Final result:
{'GG': ['CG'], 'GTT': ['GTA'], 'TCGG': ['TACG', 'ACGA']}

Computing mean of all tuple values where 1st number is similar

Consider list of tuples
[(7751, 0.9407466053962708), (6631, 0.03942129), (7751, 0.1235432)]
how to compute mean of all tuple values in pythonic way where 1st number is similar? for example the answer has to be
[(7751, 0.532144902698135), (6631, 0.03942129)]

One way is using collections.defaultdict
from collections import defaultdict
lst = [(7751, 0.9407466053962708), (6631, 0.03942129), (7751, 0.1235432)]
d_dict = defaultdict(list)
for k,v in lst:
d_dict[k].append(v)
[(k,sum(v)/len(v)) for k,v in d_dict.items()]
#[(7751, 0.5321449026981354), (6631, 0.03942129)]

You do with groupby ,
from itertools import groupby
result = []
for i,g in groupby(sorted(lst),key=lambda x:x[0]):
grp = list(g)
result.append((i,sum(i[1] for i in grp)/len(grp)))
Using, list comprehension,
def get_avg(g):
grp = list(g)
return sum(i[1] for i in grp)/len(grp)
result = [(i,get_avg(g)) for i,g in groupby(sorted(lst),key=lambda x:x[0])]
Result
[(6631, 0.03942129), (7751, 0.5321449026981354)]

groupby from itertools is your friend:
>>> l=[(7751, 0.9407466053962708), (6631, 0.03942129), (7751, 0.1235432)]
>>> #importing libs:
>>> from itertools import groupby
>>> from statistics import mean #(only python >= 3.4)
>>> # mean=lambda l: sum(l) / float(len(l)) #(for python < 3.4) (*1)
>>> #set the key to group and sort and sorting
>>> k=lambda x: x[0]
>>> data = sorted(l, key=k)
>>> #here it is, pythonic way:
>>> [ (k, mean([m[1] for m in g ])) for k, g in groupby(data, k) ]
Results:
[(6631, 0.03942129), (7751, 0.5321449026981354)]
EDITED (*1) Thanks Elmex80s to refer me to mean.

Max index of repeating elements divisible by 'n' in a List in Python

I have a sorted list of numbers like:
a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
I need to find the max index of each values which is divisible by 100.
Output should be like: 4,10,15
My Code:
a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
idx = 1
for i in (a):
if i%100 == 0:
print idx
idx = idx+1
Output of above code:
4
9
10
13
14
15

In case people are curious, I benchmarked the dict comprehension technique against the backward iteration technique. Dict comprehension is about twice the speed. Changing to OrderedDict resulted in MASSIVE slowdown. About 15x slower than the dict comprehension.
def test1():
a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
max_index = {}
for i, item in enumerate(a[::-1]):
if item not in max_index:
max_index[item] = len(a) - (i + 1)
return max_index
def test2():
a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
return {item: index for index, item in enumerate(a, 1)}
def test3():
a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
OrderedDict((item, index) for index, item in enumerate(a, 1))
if __name__ == "__main__":
import timeit
print(timeit.timeit("test1()", setup="from __main__ import test1"))
print(timeit.timeit("test2()", setup="from __main__ import test2"))
print(timeit.timeit("test3()", setup="from __main__ import test3; from collections import OrderedDict"))
3.40622282028
1.97545695305
26.347012043

Use a simple dict-comprehension or OrderedDict with divisible items as the keys, old values will be replaced by newest values automatically.
>>> {item: index for index, item in enumerate(lst, 1) if not item % 100}.values()
dict_values([4, 10, 15])
# if order matters
>>> from collections import OrderedDict
>>> OrderedDict((item, index) for index, item in enumerate(lst, 1) if not item % 100).values()
odict_values([4, 10, 15])
Another way will be to loop over reversed list and use a set to keep track of items seen so far(lst[::-1] may be slightly faster than reversed(lst) for tiny lists).
>>> seen = set()
>>> [len(lst) - index for index, item in enumerate(reversed(lst))
if not item % 100 and item not in seen and not seen.add(item)][::-1]
[4, 10, 15]
You can see the sort-of equivalent code of the above here.

You could use itertools.groupby since your data is sorted:
>>> a = [77,98,99,100,101,102,198,199,200,200,278,299,300,300,300]
>>> from itertools import groupby
>>> [list(g)[-1][0] for k,g in groupby(enumerate(a), lambda t: (t[1] % 100, t[1])) if k[0] == 0]
[3, 9, 14]
Although this is a little cryptic.
Here's a complicated approach using only a list-iterator and accumulating into a list:
>>> run, prev, idx = False, None, []
>>> for i, e in enumerate(a):
... if not (e % 100 == 0):
... if not run:
... prev = e
... continue
... idx.append(i - 1)
... run = False
... else:
... if prev != e and run:
... idx.append(i - 1)
... run = True
... prev = e
...
>>> if run:
... idx.append(i)
...
>>> idx
[3, 9, 14]
I think this is best dealt with a dictionary approach like #AshwiniChaudhary It is more straightforward, and much faster:
>>> timeit.timeit("{item: index for index, item in enumerate(a, 1)}", "from __main__ import a")
1.842843743012054
>>> timeit.timeit("[list(g)[-1][0] for k,g in groupby(enumerate(a), lambda t: (t[1] % 100, t[1])) if k[0] == 0]", "from __main__ import a, groupby")
8.479677081981208
The groupby approach is pretty slow, note, the complicated approach is faster, and not far-off form the dict-comprehension approach:
>>> def complicated(a):
... run, prev, idx = False, None, []
... for i, e in enumerate(a):
... if not (e % 100 == 0):
... if not run:
... prev = e
... continue
... idx.append(i - 1)
... run = False
... else:
... if prev != e and run:
... idx.append(i - 1)
... run = True
... prev = e
... if run:
... idx.append(i)
... return idx
...
>>> timeit.timeit("complicated(a)", "from __main__ import a, complicated")
2.6667005629860796
Edit Note, the performance difference narrows if we call list on the dict-comprehension .values():
>>> timeit.timeit("list({item: index for index, item in enumerate(a, 1)}.values())", "from __main__ import a")
2.3839886570058297
>>> timeit.timeit("complicated(a)", "from __main__ import a, complicated")
2.708565960987471

it seemed like a good idea at the start, got a bit twisty, had to patch a couple of cases...
a = [0,77,98,99,100,101,102,198,199,200,200,278,299,300,300,300, 459, 700,700]
bz = [*zip(*((i, d//100) for i, d in enumerate(a) if d%100 == 0 and d != 0))]
[a for a, b, c in zip(*bz, bz[1][1:]) if c-b != 0] + [bz[0][-1]]
Out[78]: [4, 10, 15, 18]
enumerate, zip to create bz which mates 100's numerator(s) with indices
bz = [*zip(*((i, d//100) for i, d in enumerate(a) if d%100 == 0 and d != 0))]
print(*bz, sep='\n')
(4, 9, 10, 13, 14, 15, 17, 18)
(1, 2, 2, 3, 3, 3, 7, 7)
then zip again, zip(*bz, bz[1][1:]) lagging the numerator tuple to allow the lagged difference to give a selection logic if c-b != 0for the last index of each run but the last
add the last 100's match because its always the end of the last run + [bz[0][-1]]

how to apply a groupby on list of tuples in python?

In my function I will create different tuples and add to an empty list :
tup = (pattern,matchedsen)
matchedtuples.append(tup)
The patterns have format of regular expressions. I am looking for apply groupby() on matchedtuples in following way:
For example :
matchedtuples = [(p1, s1) , (p1,s2) , (p2, s5)]
And I am looking for this result:
result = [ (p1,(s1,s2)) , (p2, s5)]
So, in this way I will have groups of sentences with the same pattern. How can I do this?

My answer for your question will work for any input structure you will use and print the same output as you gave. And i will use only groupby from itertools module:
# Let's suppose your input is something like this
a = [("p1", "s1"), ("p1", "s2"), ("p2", "s5")]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
b = tuple(values)
if len(b) >= 2:
result.append((key, tuple(j[1] for j in b)))
else:
result.append(tuple(j for j in b)[0])
print(result)
Output:
[('p1', ('s1', 's2')), ('p2', 's5')]
The same solution work if you add more values to your input:
# When you add more values to your input
a = [("p1", "s1"), ("p1", "s2"), ("p2", "s5"), ("p2", "s6"), ("p3", "s7")]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
b = tuple(values)
if len(b) >= 2:
result.append((key, tuple(j[1] for j in b)))
else:
result.append(tuple(j for j in b)[0])
print(result)
Output:
[('p1', ('s1', 's2')), ('p2', ('s5', 's6')), ('p3', 's7')]
Now, if you modify your input structure:
# Let's suppose your modified input is something like this
a = [(["p1"], ["s1"]), (["p1"], ["s2"]), (["p2"], ["s5"])]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
b = tuple(values)
if len(b) >= 2:
result.append((key, tuple(j[1] for j in b)))
else:
result.append(tuple(j for j in b)[0])
print(result)
Output:
[(['p1'], (['s1'], ['s2'])), (['p2'], ['s5'])]
Also, the same solution work if you add more values to your new input structure:
# When you add more values to your new input
a = [(["p1"], ["s1"]), (["p1"], ["s2"]), (["p2"], ["s5"]), (["p2"], ["s6"]), (["p3"], ["s7"])]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
b = tuple(values)
if len(b) >= 2:
result.append((key, tuple(j[1] for j in b)))
else:
result.append(tuple(j for j in b)[0])
print(result)
Output:
[(['p1'], (['s1'], ['s2'])), (['p2'], (['s5'], ['s6'])), (['p3'], ['s7'])]
Ps: Test this code and if it breaks with any other kind of inputs please let me know.

If you require the output you present, you'll need to manually loop through the grouping of matchedtuples and build your list.
First, of course, if the matchedtuples list isn't sorted, sort it with itemgetter:
from operator import itemgetter as itmg
li = sorted(matchedtuples, key=itmg(0))
Then, loop through the result supplied by groupby and append to the list r based on the size of the group:
r = []
for i, j in groupby(matchedtuples, key=itmg(0)):
j = list(j)
ap = (i, j[0][1]) if len(j) == 1 else (i, tuple(s[1] for s in j))
r.append(ap)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Find duplicate values in list of tuples in Python - python

How do I find duplicate values in the following list of tuples? [(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)] I want to get a list like: [4081, 4082, 4086, 4090] I have tried using itemgetter then group by option but didn't work. How can one do this?

Haven't tested this.... (edit: yup, it works) l = [(1622, 4081), (1622, 4082), (1624, 4083), (1626, 4085), (1650, 4086), (1650, 4090)] dup = [] for i, t1 in enumerate(l): for t2 in l[i+1:]: if t1[0]==t2[0]: dup.extend([t1[1], t2[1]]) print dup

Related

list method to group elements in Python

Python: Adding integer elements of a nested list to a list

Computing mean of all tuple values where 1st number is similar

Max index of repeating elements divisible by 'n' in a List in Python

how to apply a groupby on list of tuples in python?

Categories

Resources