list method to group elements in Python

Thanks for all your answers, but I have edited my question because it was not clear to everyone.
I have the following list of tuples:
[("ok",1),("yes",1),("no",0),("why",1),("some",1),("eat",0),("give",0),("about",0),("tell",1),("ask",0),("be",0)]
I would like to have:
[("ok yes","no"),("why some","eat give about"),("tell","ask be")]
Thank you!
So I want to join the consecutive words flagged 1 into one string; when 0s appear, I add their values to the same tuple and start a new element for the values that follow.

You can use itertools.groupby:
from itertools import groupby
d = [("ok",1),("yes",1),("no",0),("why",1),("some",1),("eat",0),("give",0),("about",0),("tell",1),("ask",0),("be",0)]
new_d = [' '.join(j for j, _ in b) for _, b in groupby(d, key=lambda x:x[-1])]
result = [(new_d[i], new_d[i+1]) for i in range(0, len(new_d), 2)]
Output:
[('ok yes', 'no'), ('why some', 'eat give about'), ('tell', 'ask be')]
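The pairing step above indexes new_d[i+1], so it assumes the runs strictly alternate (1s, then 0s) and that the list ends with a run of 0s. Here is a minimal sketch that also tolerates a leading run of 0s or a trailing run of 1s, assuming the flags are only 0 and 1. Padding the missing side with an empty string is my assumption, not something the question specifies, and pair_runs is a hypothetical name:

```python
from itertools import groupby


def pair_runs(items):
    """Pair each run of 1-flagged words with the run of 0-flagged words after it."""
    # Collapse consecutive tuples with the same flag into (flag, "joined words").
    runs = [(flag, ' '.join(word for word, _ in group))
            for flag, group in groupby(items, key=lambda t: t[1])]
    pairs = []
    i = 0
    while i < len(runs):
        flag, text = runs[i]
        if flag == 1 and i + 1 < len(runs):
            # A 1-run followed by a 0-run: the normal case.
            pairs.append((text, runs[i + 1][1]))
            i += 2
        elif flag == 1:
            # Trailing 1-run with no 0-run after it (assumed padding: '').
            pairs.append((text, ''))
            i += 1
        else:
            # 0-run with no 1-run before it (assumed padding: '').
            pairs.append(('', text))
            i += 1
    return pairs


d = [("ok", 1), ("yes", 1), ("no", 0), ("why", 1), ("some", 1), ("eat", 0),
     ("give", 0), ("about", 0), ("tell", 1), ("ask", 0), ("be", 0)]
print(pair_runs(d))
# [('ok yes', 'no'), ('why some', 'eat give about'), ('tell', 'ask be')]
```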

As I understand the question, the following code should work:
list_tuples = [("ok",1),("yes",1),("no",0),("why",1),("some",1),("eat",0)]
tups = []
updated_list = []
for elem in list_tuples:
    if elem[1] == 0:
        # A 0 closes the current group: pair the joined 1-words with this word
        updated_list.append((' '.join(tups), elem[0]))
        tups = []
    else:
        tups.append(elem[0])
print(updated_list)
# [('ok yes', 'no'), ('why some', 'eat')]

One possible solution using itertools.groupby:
from operator import itemgetter
from itertools import groupby
lst = [("ok",1), ("yes",1), ("no",0), ("why",1), ("some",1), ("eat",0)]
def generate(lst):
    rv = []
    for v, g in groupby(lst, itemgetter(1)):
        if v:
            rv.append(' '.join(map(itemgetter(0), g)))
        else:
            for i in g:
                rv.append(i[0])
                yield tuple(rv)
                rv = []
    # yield the last group if the list ends with 1s:
    if v:
        yield tuple(rv)
print([*generate(lst)])
Prints:
[('ok yes', 'no'), ('why some', 'eat')]

Related

What is an easy way to remove duplicates from only part of the string in Python?

I have a list of strings that goes like this:
1;213;164
2;213;164
3;213;164
4;213;164
5;213;164
6;213;164
7;213;164
8;213;164
9;145;112
10;145;112
11;145;112
12;145;112
13;145;112
14;145;112
15;145;112
16;145;112
17;145;112
1001;1;151
1002;2;81
1003;3;171
1004;4;31
I would like to remove all duplicates where the last two numbers are the same. So after running the list through the program I would get something like this:
1;213;164
9;145;112
1001;1;151
1002;2;81
1003;3;171
1004;4;31
But something like
8;213;164
15;145;112
1001;1;151
1002;2;81
1003;3;171
1004;4;31
would also be correct.
Here is a nice and fast trick you can use (assuming l is your list):
list({ s.split(';', 1)[1] : s for s in l }.values())
No imports are needed, and it makes a single pass over the list.
In general you can define:
def custom_unique(L, keyfunc):
    return list({ keyfunc(li): li for li in L }.values())
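Because a dict comprehension overwrites earlier entries, the one-liner returns the last string seen for each key. If the first occurrence is wanted (as in the first expected output in the question), a minimal sketch with a set of seen keys could look like this. unique_by_key and rows are hypothetical names used only for illustration:

```python
def unique_by_key(lines, keyfunc):
    """Keep the first line seen for each key, preserving input order."""
    seen = set()
    kept = []
    for line in lines:
        key = keyfunc(line)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


rows = ['1;213;164', '2;213;164', '9;145;112', '10;145;112', '1001;1;151']
# The key is everything after the first ';', e.g. '213;164'.
print(unique_by_key(rows, lambda s: s.split(';', 1)[1]))
# ['1;213;164', '9;145;112', '1001;1;151']
```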
You can group the items by this key and then use the first item in each group (assuming l is your list).
import itertools
keyfunc = lambda x: x.split(";", 1)[1]
[next(g) for k, g in itertools.groupby(sorted(l, key=keyfunc), keyfunc)]
Here is the code run on the first few items; just swap my list for yours:
x = [
'7;213;164',
'8;213;164',
'9;145;112',
'10;145;112',
'11;145;112',
]
new_list = []
for i in x:
    check = True
    s_part = i[i.find(';'):]  # everything from the first ';', e.g. ';213;164'
    for j in new_list:
        # Compare suffixes: a plain substring test would also match ';1;15' inside '1001;1;151'
        if j.endswith(s_part):
            check = False
            break
    if check:
        new_list.append(i)
print(new_list)
Output:
['7;213;164', '9;145;112']

Grouping the nested attribute list in Python

I have a list
lst = ['orb|2|3|4', 'obx|2|3|4', 'orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3','obx|1|2|3']
How can I group the list by the first three characters of each line, so that the result looks like the one below? If a line starts with "orb", the lines that follow are added to the list that begins with that line. Thanks for the answer.
result = [['orb|2|3|4', 'obx|2|3|4'], ['orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3','obx|1|2|3']]
Here is an algorithm of O(N) complexity:
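res = []
tmp = []
for x in lst:
    if x.startswith('orb'):
        # A new 'orb' line closes the previous group and starts a new one
        if tmp:
            res.append(tmp)
        tmp = [x]
    elif tmp:
        tmp.append(x)
if tmp:  # avoid appending an empty group when no 'orb' line was found
    res.append(tmp)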
result:
In [133]: res
Out[133]:
[['orb|2|3|4', 'obx|2|3|4'],
['orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3', 'obx|1|2|3']]
You can use itertools.groupby:
import itertools, re
lst = ['orb|2|3|4', 'obx|2|3|4', 'orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3','obx|1|2|3']
new_result = [list(b) for _, b in itertools.groupby(lst, key=lambda x: re.findall(r'^\w+', x)[0])]
final_result = [new_result[i]+new_result[i+1] for i in range(0, len(new_result), 2)]
Output:
[['orb|2|3|4', 'obx|2|3|4'], ['orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3', 'obx|1|2|3']]
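The groupby version merges groups two at a time, so it relies on runs of 'orb' and 'obx' lines strictly alternating. Here is a small sketch that looks only at the prefix and therefore also copes with two 'orb' lines in a row. split_at_prefix is a hypothetical helper, and keeping any lines that appear before the first 'orb' in a group of their own is an assumption (the loop above drops them):

```python
def split_at_prefix(lines, prefix='orb'):
    """Start a new group at every line that begins with `prefix`."""
    groups = []
    for line in lines:
        if line.startswith(prefix) or not groups:
            # A marker line (or the very first line) opens a new group.
            groups.append([line])
        else:
            groups[-1].append(line)
    return groups


lst = ['orb|2|3|4', 'obx|2|3|4', 'orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3', 'obx|1|2|3']
print(split_at_prefix(lst))
# [['orb|2|3|4', 'obx|2|3|4'], ['orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3', 'obx|1|2|3']]
```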

Create sublists with duplicate list elements

I'm new to Python and I'm trying to create sublists for list elements sharing the same base:
listRaw = ['AKS/STB', 'SBHS/AME', 'SBJ/OAK', 'SBJ/ALS', 'AKS/OSMX', 'SBHS/ABNX', 'AKS/AKX']
desiredOutput = [['AKS/STB', 'AKS/OSMX', 'AKS/AKX'], ['SBHS/AME', 'SBHS/ABNX'], ['SBJ/OAK', 'SBJ/ALS']]
I've tried to first isolate the base from each list element using:
def commonNumerator(self):
    checkPosition = self.find('/')
    commonNumerator = self[:checkPosition]
    return commonNumerator
listRawModified = [commonNumerator(x) for x in listRaw]
print(listRawModified)
which gets me:
['AKS', 'SBHS', 'SBJ', 'SBJ', 'AKS', 'SBHS', 'AKS']
but from there I don't know how to proceed to get the desired output.
Can someone explain to me how to do it?
This is a typical use case for itertools.groupby():
from itertools import groupby
listRaw = ['AKS/STB', 'SBHS/AME', 'SBJ/OAK', 'SBJ/ALS', 'AKS/OSMX', 'SBHS/ABNX', 'AKS/AKX']
def key(s):
    return s.split('/')[0]
[list(g) for k, g in groupby(sorted(listRaw, key=key), key=key)]
# [['AKS/STB', 'AKS/OSMX', 'AKS/AKX'], ['SBHS/AME', 'SBHS/ABNX'], ['SBJ/OAK', 'SBJ/ALS']]
The key() function helps in extracting the sorting/grouping key: key('AKS/STB') == 'AKS'.
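The sorted() call matters because groupby only merges adjacent items with equal keys. A short illustration on a three-item list (items and base are names I made up for this example):

```python
from itertools import groupby

items = ['AKS/STB', 'SBJ/OAK', 'AKS/OSMX']


def base(s):
    # Part before the slash, e.g. 'AKS' for 'AKS/STB'.
    return s.split('/')[0]


# Without sorting, the two AKS entries are not adjacent, so they land in separate groups.
unsorted_groups = [list(g) for _, g in groupby(items, key=base)]
print(unsorted_groups)
# [['AKS/STB'], ['SBJ/OAK'], ['AKS/OSMX']]

# Sorting by the same key first makes equal keys adjacent.
sorted_groups = [list(g) for _, g in groupby(sorted(items, key=base), key=base)]
print(sorted_groups)
# [['AKS/STB', 'AKS/OSMX'], ['SBJ/OAK']]
```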
Another way to do this would be to split each element and create a dictionary and then construct your desired output from that dictionary, e.g.:
In []:
d = {}
for i in listRaw:
    k, v = i.split('/')
    d.setdefault(k, []).append(v)
[['/'.join([k, v]) for v in d[k]] for k in d]
Out[]:
[['AKS/STB', 'AKS/OSMX', 'AKS/AKX'], ['SBHS/AME', 'SBHS/ABNX'], ['SBJ/OAK', 'SBJ/ALS']]
This is a typical use case for itertools. But you could also consider storing the values in a dictionary:
from collections import defaultdict
d = defaultdict(list)
listRaw = ['AKS/STB', 'SBHS/AME', 'SBJ/OAK', 'SBJ/ALS', 'AKS/OSMX', 'SBHS/ABNX', 'AKS/AKX']
for item in listRaw:
    i, y = item.split('/')
    d[i].append(y)
print(dict(d))
# {'AKS': ['STB', 'OSMX', 'AKX'], 'SBHS': ['AME', 'ABNX'], 'SBJ': ['OAK', 'ALS']}
You can then look up the values for AKS directly by key:
d['AKS'] # ['STB', 'OSMX', 'AKX']
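If the goal is the list of lists from the question rather than a dictionary, the same idea can keep the full strings as values. This is only a sketch: group_by_base is a hypothetical name, and it relies on dicts preserving insertion order (Python 3.7+), so groups come out in the order their base first appears:

```python
from collections import defaultdict


def group_by_base(items, sep='/'):
    """Group full strings by the part before `sep`, in first-seen order."""
    groups = defaultdict(list)
    for item in items:
        base = item.split(sep, 1)[0]
        groups[base].append(item)
    return list(groups.values())


listRaw = ['AKS/STB', 'SBHS/AME', 'SBJ/OAK', 'SBJ/ALS', 'AKS/OSMX', 'SBHS/ABNX', 'AKS/AKX']
print(group_by_base(listRaw))
# [['AKS/STB', 'AKS/OSMX', 'AKS/AKX'], ['SBHS/AME', 'SBHS/ABNX'], ['SBJ/OAK', 'SBJ/ALS']]
```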

how to apply a groupby on list of tuples in python?

In my function I create different tuples and add them to an initially empty list:
tup = (pattern,matchedsen)
matchedtuples.append(tup)
The patterns are regular expressions. I would like to apply groupby() to matchedtuples in the following way:
For example:
matchedtuples = [(p1, s1) , (p1,s2) , (p2, s5)]
And I am looking for this result:
result = [ (p1,(s1,s2)) , (p2, s5)]
So, in this way I will have groups of sentences with the same pattern. How can I do this?
My answer works for the input structures shown below, as long as tuples with the same pattern are adjacent, and it prints the same output as you gave. I use only groupby from the itertools module:
# Let's suppose your input is something like this
a = [("p1", "s1"), ("p1", "s2"), ("p2", "s5")]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
    b = tuple(values)
    if len(b) >= 2:
        result.append((key, tuple(j[1] for j in b)))
    else:
        result.append(b[0])  # single match: keep the original tuple
print(result)
Output:
[('p1', ('s1', 's2')), ('p2', 's5')]
The same solution works if you add more values to your input:
# When you add more values to your input
a = [("p1", "s1"), ("p1", "s2"), ("p2", "s5"), ("p2", "s6"), ("p3", "s7")]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
    b = tuple(values)
    if len(b) >= 2:
        result.append((key, tuple(j[1] for j in b)))
    else:
        result.append(b[0])  # single match: keep the original tuple
print(result)
Output:
[('p1', ('s1', 's2')), ('p2', ('s5', 's6')), ('p3', 's7')]
Now, if you modify your input structure:
# Let's suppose your modified input is something like this
a = [(["p1"], ["s1"]), (["p1"], ["s2"]), (["p2"], ["s5"])]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
    b = tuple(values)
    if len(b) >= 2:
        result.append((key, tuple(j[1] for j in b)))
    else:
        result.append(b[0])  # single match: keep the original tuple
print(result)
Output:
[(['p1'], (['s1'], ['s2'])), (['p2'], ['s5'])]
Also, the same solution works if you add more values to your new input structure:
# When you add more values to your new input
a = [(["p1"], ["s1"]), (["p1"], ["s2"]), (["p2"], ["s5"]), (["p2"], ["s6"]), (["p3"], ["s7"])]
from itertools import groupby
result = []
for key, values in groupby(a, lambda x : x[0]):
    b = tuple(values)
    if len(b) >= 2:
        result.append((key, tuple(j[1] for j in b)))
    else:
        result.append(b[0])  # single match: keep the original tuple
print(result)
Output:
[(['p1'], (['s1'], ['s2'])), (['p2'], (['s5'], ['s6'])), (['p3'], ['s7'])]
PS: Test this code, and if it breaks with any other kind of input, please let me know.
If you require the output you present, you'll need to manually loop through the groups of matchedtuples and build your list.
First, if the matchedtuples list isn't sorted, sort it with itemgetter:
from itertools import groupby
from operator import itemgetter as itmg
li = sorted(matchedtuples, key=itmg(0))
Then, loop through the result supplied by groupby and append to the list r based on the size of the group:
r = []
for i, j in groupby(li, key=itmg(0)):  # group the sorted list, not the original
    j = list(j)
    ap = (i, j[0][1]) if len(j) == 1 else (i, tuple(s[1] for s in j))
    r.append(ap)
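If sorting is undesirable, a dictionary can collect the sentences per pattern in one pass. This is a sketch under two assumptions: the patterns are hashable (strings or compiled regexes, not lists), and a uniform result shape is acceptable, so a pattern with one sentence gets a one-element tuple instead of the bare sentence the question shows. group_sentences is a hypothetical name:

```python
from collections import defaultdict


def group_sentences(pairs):
    """Collect the sentences for each pattern, in first-seen pattern order."""
    grouped = defaultdict(list)
    for pattern, sentence in pairs:
        grouped[pattern].append(sentence)
    # Every value is a tuple, even when it holds a single sentence.
    return [(pattern, tuple(sentences)) for pattern, sentences in grouped.items()]


matchedtuples = [("p1", "s1"), ("p2", "s5"), ("p1", "s2")]  # deliberately unsorted
print(group_sentences(matchedtuples))
# [('p1', ('s1', 's2')), ('p2', ('s5',))]
```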

Splitting Nested List at every ['-1']

If I have a nested list like this:
[['01'], ['02'], ['-1'], ['03'], ['04']]
Is there a way to split this nested list at every ['-1']?
So that it looks like this:
[[['01'], ['02']], [['03'], ['04']]]
Any sort of help would be appreciated :)
You can use itertools.groupby to group at every occurrence of your split value (here ['-1']). The condition if not k ensures that the split value itself is left out.
orig = [['01'], ['02'], ['-1'], ['03'], ['04']]
from itertools import groupby
n = [list(g) for k, g in groupby(orig, lambda x: x == ['-1']) if not k]
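To see why the filter works, it can help to look at the intermediate (key, group) pairs that groupby produces before anything is dropped. The names runs and is_sep are mine, used only for this illustration:

```python
from itertools import groupby

orig = [['01'], ['02'], ['-1'], ['03'], ['04']]

# The key is True for the separator and False for everything else,
# so consecutive non-separator items are collected into one run.
runs = [(k, list(g)) for k, g in groupby(orig, lambda x: x == ['-1'])]
print(runs)
# [(False, [['01'], ['02']]), (True, [['-1']]), (False, [['03'], ['04']])]

# Keeping only the False runs discards the separators.
n = [group for is_sep, group in runs if not is_sep]
print(n)
# [[['01'], ['02']], [['03'], ['04']]]
```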
Try this:
lists = [['01'], ['02'], ['-1'], ['03'], ['04'], ['-1'], ['05'], ['-1']]
results = list()
prev_idx = 0
for idx, l in enumerate(lists):
    if l == ['-1']:
        results.append(lists[prev_idx:idx])
        prev_idx = idx+1
if prev_idx <= idx: # the last group might be [] as shown in this case
    results.append(lists[prev_idx:])
print(results)
# Output
[[['01'], ['02']], [['03'], ['04']], [['05']]]
This seems like a use case for groupby:
>>> from itertools import groupby
>>> l = [['01'], ['02'], ['-1'], ['03'], ['04'], ['-1'], ['05'], ['06']]
>>> [list(g) for k,g in groupby(l, lambda x: x == ['-1']) if not k]
[[['01'], ['02']], [['03'], ['04']], [['05'], ['06']]]
itertools.groupby docs
A good old-fashioned loop should do it:
l = [['01'], ['02'], ['-1'], ['03'], ['04']]
new = []
current = [] # Build a new list here
for i, item in enumerate(l):
    if item != ['-1']:
        current.append(item)
        if i == len(l) - 1: # If the item is the last in the list
            new.append(current)
    else:
        new.append(current)
        current = []
print(new)
# [[['01'], ['02']], [['03'], ['04']]]
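The loop above appends an empty list when the input starts with ['-1'] or contains two separators in a row. If empty chunks are not wanted (an assumption on my part; the question does not say), a generator sketch along these lines skips them. split_on is a hypothetical name:

```python
def split_on(items, sep):
    """Yield the non-empty chunks of `items` found between occurrences of `sep`."""
    chunk = []
    for item in items:
        if item == sep:
            if chunk:  # skip empty chunks from leading or repeated separators
                yield chunk
            chunk = []
        else:
            chunk.append(item)
    if chunk:  # trailing chunk after the last separator
        yield chunk


data = [['-1'], ['01'], ['02'], ['-1'], ['-1'], ['03'], ['-1']]
print(list(split_on(data, ['-1'])))
# [[['01'], ['02']], [['03']]]
```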
