Grouping the nested attribute list in Python - python

I have a list
lst = ['orb|2|3|4', 'obx|2|3|4', 'orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3','obx|1|2|3']
How can I group the list by the first three characters of each string, so that the result looks like the one below? Whenever a string starts with "orb", it starts a new sublist, and the strings that follow are added to that sublist. Thanks for the answer.
result = [['orb|2|3|4', 'obx|2|3|4'], ['orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3','obx|1|2|3']]

Here is an algorithm of O(N) complexity:
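res = []
tmp = []
for x in lst:
    if x.startswith('orb'):
        if tmp:
            res.append(tmp)   # close the previous group
        tmp = [x]             # start a new group at each 'orb' line
    elif tmp:
        tmp.append(x)
if tmp:
    res.append(tmp)           # append the last group, if any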
result:
In [133]: res
Out[133]:
[['orb|2|3|4', 'obx|2|3|4'],
['orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3', 'obx|1|2|3']]

You can use itertools.groupby:
import itertools, re
lst = ['orb|2|3|4', 'obx|2|3|4', 'orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3','obx|1|2|3']
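# Raw string for the regex; the key is the leading word ('orb' or 'obx')
new_result = [list(b) for _, b in itertools.groupby(lst, key=lambda x: re.findall(r'^\w+', x)[0])]
# Pair each 'orb' run with the 'obx' run after it (assumes the runs strictly alternate)
final_result = [new_result[i]+new_result[i+1] for i in range(0, len(new_result), 2)]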
Output:
[['orb|2|3|4', 'obx|2|3|4'], ['orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3', 'obx|1|2|3']]
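The groupby pairing relies on 'orb' and 'obx' runs alternating strictly: two 'orb' lines in a row end up in the same run, and an odd number of runs raises an IndexError. A minimal sketch that avoids this assumption is shown here; the helper name `group_by_header` and its `header` parameter are hypothetical, not part of the answers above.

```python
def group_by_header(lines, header='orb'):
    """Start a new sublist at every line beginning with `header`."""
    groups = []
    for line in lines:
        if line.startswith(header):
            groups.append([line])      # every header line opens its own group
        elif groups:
            groups[-1].append(line)    # other lines join the most recent group
        # lines that appear before the first header are skipped
    return groups


print(group_by_header(['orb|1', 'orb|2', 'obx|a', 'obx|b']))
# [['orb|1'], ['orb|2', 'obx|a', 'obx|b']]
```

Like the plain loop in the first answer, this makes a single pass over the list.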

Related

What is an easy way to remove duplicates from only part of the string in Python?

I have a list of strings that goes like this:
1;213;164
2;213;164
3;213;164
4;213;164
5;213;164
6;213;164
7;213;164
8;213;164
9;145;112
10;145;112
11;145;112
12;145;112
13;145;112
14;145;112
15;145;112
16;145;112
17;145;112
1001;1;151
1002;2;81
1003;3;171
1004;4;31
I would like to remove all duplicates where the last two numbers are the same. So after running the list through the program I would get something like this:
1;213;164
9;145;112
1001;1;151
1002;2;81
1003;3;171
1004;4;31
But something like
8;213;164
15;145;112
1001;1;151
1002;2;81
1003;3;171
1004;4;31
would also be correct.
Here is a nice and fast trick you can use (assuming l is your list):
list({ s.split(';', 1)[1] : s for s in l }.values())
No need to import anything, and it runs in a single pass. When several strings share a key, the last one is kept.
In general you can define:
def custom_unique(L, keyfunc):
    return list({ keyfunc(li): li for li in L }.values())
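A dict comprehension overwrites the value each time a key repeats, so it keeps the last string per key. If the first occurrence is wanted instead, `dict.setdefault` can be used. This is a small sketch; the name `unique_keep_first` is made up for illustration, and it relies on dicts preserving insertion order (Python 3.7+).

```python
def unique_keep_first(items, keyfunc):
    """Keep the first item seen for each key, in original order."""
    seen = {}
    for item in items:
        # setdefault stores the item only if the key is not present yet
        seen.setdefault(keyfunc(item), item)
    return list(seen.values())


rows = ['1;213;164', '2;213;164', '9;145;112', '10;145;112', '1001;1;151']
print(unique_keep_first(rows, lambda s: s.split(';', 1)[1]))
# ['1;213;164', '9;145;112', '1001;1;151']
```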
You can group the items by this key and then use the first item in each group (assuming l is your list).
import itertools
keyfunc = lambda x: x.split(";", 1)[1]
[next(g) for k, g in itertools.groupby(sorted(l, key=keyfunc), keyfunc)]
Here is the code run on the first few items; just swap in your own list:
x = [
'7;213;164',
'8;213;164',
'9;145;112',
'10;145;112',
'11;145;112',
]
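new_list = []
for i in x:
    check = True
    s_part = i[i.find(';'):]   # everything from the first ';' onwards
    for j in new_list:
        # endswith avoids false matches such as ';1;15' inside '1001;1;151'
        if j.endswith(s_part):
            check = False
    if check:
        new_list.append(i)
print(new_list)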
Output:
['7;213;164', '9;145;112']

list method to group elements in Python

Thanks for all your answers. I have edited my question because it was not clear to everyone.
I have the following list of tuples:
[("ok",1),("yes",1),("no",0),("why",1),("some",1),("eat",0),("give",0),("about",0),("tell",1),("ask",0),("be",0)]
I would like to have :
[("ok yes","no"),("why some","eat give about"),("tell","ask be")]
Thank you !
So I want to join all consecutive words flagged 1 into one string, then join the words flagged 0 that follow into a second string, and start a new tuple when the next 1 appears.
You can use itertools.groupby:
from itertools import groupby
d = [("ok",1),("yes",1),("no",0),("why",1),("some",1),("eat",0),("give",0),("about",0),("tell",1),("ask",0),("be",0)]
new_d = [' '.join(j for j, _ in b) for _, b in groupby(d, key=lambda x:x[-1])]
result = [(new_d[i], new_d[i+1]) for i in range(0, len(new_d), 2)]
Output:
[('ok yes', 'no'), ('why some', 'eat give about'), ('tell', 'ask be')]
As I understand your question, the following code should work:
list_tuples = [("ok",1),("yes",1),("no",0),("why",1),("some",1),("eat",0)]
tups=[]
updated_list=[]
for elem in list_tuples:
    if elem[1] == 0:
        # a 0 closes the current group: join the collected 1-words and pair them with this word
        updated_list.append(tuple([' '.join(tups), elem[0]]))
        tups=[]
    else:
        tups.append(elem[0])
print(updated_list)
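The loop above closes a tuple at every single 0, so a run of several 0s (such as "eat", "give", "about" in the edited question) would be split across tuples. A sketch of a variant that collects the whole run first is given here; the function name `regroup` is hypothetical.

```python
def regroup(pairs):
    """Pair each run of 1-flagged words with the run of 0-flagged words after it."""
    result = []
    ones, zeros = [], []
    for word, flag in pairs:
        if flag == 1:
            if zeros:
                # a 1 after some 0s means the previous pair is complete
                result.append((' '.join(ones), ' '.join(zeros)))
                ones, zeros = [], []
            ones.append(word)
        else:
            zeros.append(word)
    if ones or zeros:
        result.append((' '.join(ones), ' '.join(zeros)))  # flush the last pair
    return result


d = [("ok", 1), ("yes", 1), ("no", 0), ("why", 1), ("some", 1), ("eat", 0),
     ("give", 0), ("about", 0), ("tell", 1), ("ask", 0), ("be", 0)]
print(regroup(d))
# [('ok yes', 'no'), ('why some', 'eat give about'), ('tell', 'ask be')]
```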
One possible solution using itertools.groupby:
from operator import itemgetter
from itertools import groupby
lst = [("ok",1), ("yes",1), ("no",0), ("why",1), ("some",1), ("eat",0)]
def generate(lst):
    rv = []
    for v, g in groupby(lst, itemgetter(1)):
        if v:
            rv.append(' '.join(map(itemgetter(0), g)))
        else:
            for i in g:
                rv.append(i[0])
                yield tuple(rv)
                rv = []
    # yield the last group if the list ends on a run of 1s
    if v:
        yield tuple(rv)
print([*generate(lst)])
Prints:
[('ok yes', 'no'), ('why some', 'eat')]

how to separate cam1,2,3,4,5,6 first images from the list

lst = ['Cam218-10-03_16-05-21-54.jpg',
'Cam318-10-03_17-04-21-54.jpg',
'Cam418-10-03_16-04-21-54.jpg',
'Cam218-10-02_16-05-21-54.jpg',
'Cam318-10-02_17-04-21-54.jpg',
'Cam418-10-02_16-04-21-54.jpg',
'Cam218-10-02_16-04-08-31.jpg',
'Cam318-10-02_16-04-08-30.jpg',
'Cam418-10-02_16-04-08-30.jpg',
'Cam518-10-02_16-04-08-35.jpg',
'Cam618-10-02_16-04-08-36.jpg',
'Cam118-10-02_16-04-09-33.jpg',
'Cam218-10-02_16-04-09-33.jpg',
'Cam318-10-02_16-04-09-33.jpg',
'Cam418-10-02_16-04-09-33.jpg',
'Cam518-10-02_16-04-09-33.jpg',
'Cam618-10-02_16-04-09-33.jpg',
'Cam118-10-02_16-04-11-53.jpg',
'Cam218-10-02_16-04-11-53.jpg',
'Cam318-10-02_16-04-11-53.jpg',
'Cam418-10-02_16-04-08-30.jpg',
'Cam118-10-02_16-04-08-31.jpg',
'Cam518-10-02_16-04-11-53.jpg',
'Cam118-10-02_16-04-11-53.jpg']
From this list I want the output:
['Cam118-10-02_16-04-08-31.jpg',
'Cam218-10-02_16-04-08-31.jpg',
'Cam318-10-02_16-04-08-30.jpg',
'Cam418-10-02_16-04-08-30.jpg',
'Cam518-10-02_16-04-08-35.jpg',
'Cam618-10-02_16-04-08-36.jpg']
by using Python. Could anybody help me?
With itertools.groupby - O(n*log(n))
>>> from itertools import groupby
>>> [next(g) for _, g in groupby(sorted(lst), key=lambda cam: cam.partition('-')[0])]
['Cam118-10-02_16-04-08-31.jpg',
'Cam218-10-02_16-04-08-31.jpg',
'Cam318-10-02_16-04-08-30.jpg',
'Cam418-10-02_16-04-08-30.jpg',
'Cam518-10-02_16-04-08-35.jpg',
'Cam618-10-02_16-04-08-36.jpg']
With keeping track of duplicates manually (output not sorted, but potentially useful to other readers) - O(n)
>>> seen = set()
>>> result = []
>>>
>>> for cam in lst:
...     model, *_ = cam.partition('-')
...     if model not in seen:
...         result.append(cam)
...         seen.add(model)
...
>>> result
['Cam218-10-03_16-05-21-54.jpg',
'Cam318-10-03_17-04-21-54.jpg',
'Cam418-10-03_16-04-21-54.jpg',
'Cam518-10-02_16-04-08-35.jpg',
'Cam618-10-02_16-04-08-36.jpg',
'Cam118-10-02_16-04-09-33.jpg']
You can sort the list and then use an if condition to pick the first entry for each camera number:
lst.sort()
i = 1
for item in lst:
    if item[3] == str(i):  # item[3] is the camera digit, so this only works for Cam1 to Cam9
        i = i + 1
        print(item)
the result is
Cam118-10-02_16-04-08-31.jpg
Cam218-10-02_16-04-08-31.jpg
Cam318-10-02_16-04-08-30.jpg
Cam418-10-02_16-04-08-30.jpg
Cam518-10-02_16-04-08-35.jpg
Cam618-10-02_16-04-08-36.jpg
If you do not need the earliest image per camera, you can drop the sort call; the loop then prints the first Cam1 entry, followed by the first Cam2 entry that comes after it, and so on.
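The sorted output can also be obtained without sorting the whole list, by tracking the smallest file name per camera in a dict. This is a sketch under two assumptions: "first image" means the lexicographically smallest name (which works here because the timestamps are zero-padded), and the camera key is the text before the first '-', as in the groupby answer. The function name `first_image_per_camera` is invented for this example.

```python
def first_image_per_camera(names):
    """Return the smallest file name for each camera prefix, ordered by prefix."""
    earliest = {}
    for name in names:
        cam = name.partition('-')[0]          # e.g. 'Cam218'
        if cam not in earliest or name < earliest[cam]:
            earliest[cam] = name
    # only the (few) camera keys are sorted, not the whole list
    return [earliest[cam] for cam in sorted(earliest)]


names = ['Cam218-10-03_16-05-21-54.jpg',
         'Cam118-10-02_16-04-09-33.jpg',
         'Cam218-10-02_16-04-08-31.jpg',
         'Cam118-10-02_16-04-08-31.jpg']
print(first_image_per_camera(names))
# ['Cam118-10-02_16-04-08-31.jpg', 'Cam218-10-02_16-04-08-31.jpg']
```

The keys are compared as strings, so a prefix like 'Cam10...' would sort before 'Cam2...'; a numeric sort key would be needed for more than nine cameras.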

Slice a list into a nested list based on special characters using Python

I have a list of strings like this:
lst = ['23532','user_name=app','content=123',
'###########################',
'54546','user_name=bee','content=998 hello','source=fb',
'###########################',
'12/22/2015']
I want something like string.split('#') that gives me output like this:
[['23532','user_name=app','content=123'],
['54546','user_name=bee','content=998 hello','source=fb'],
['12/22/2015']]
but I know a list has no split method. I cannot use ''.join(lst) either, because this list comes from part of a txt file I read in, and the file is so big that joining throws a MemoryError.
I don't think there's a one-liner for this, but you can easily write a generator to do what you want:
def sublists(lst):
    x = []
    for item in lst:
        if item == '###########################': # or whatever condition you like
            if x:
                yield x
                x = []
        else:
            x.append(item)
    if x:
        yield x
new_list = list(sublists(old_list))
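Because the generator only needs an iterable, it can in principle be fed the file object directly, so the full list never has to be held in memory. The sketch below assumes one item per line in the file, which the question does not confirm; `sublists_from_lines` and `is_separator` are hypothetical names, and `io.StringIO` stands in for the real file.

```python
import io


def sublists_from_lines(lines, is_separator=lambda s: s.startswith('#')):
    """Yield one list per record, reading the input lazily."""
    chunk = []
    for raw in lines:
        item = raw.rstrip('\n')
        if is_separator(item):
            if chunk:
                yield chunk
                chunk = []
        else:
            chunk.append(item)
    if chunk:
        yield chunk  # last record has no trailing separator


# StringIO simulates an open text file; with a real file use open(path)
fake_file = io.StringIO('23532\nuser_name=app\n' + '#' * 27 + '\n12/22/2015\n')
for record in sublists_from_lines(fake_file):
    print(record)
# ['23532', 'user_name=app']
# ['12/22/2015']
```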
If you can't use .join(), you can loop through the list and save the index of any string that contains # then loop again to slice the list:
lst = ['23532', 'user_name=app', 'content=123', '###########################' ,'54546','user_name=bee','content=998 hello','source=fb','###########################','12/22/2015']
idx = []
new_lst = []
for i,val in enumerate(lst):
    if '#' in val:
        idx.append(i)
j = 0
for x in idx:
    new_lst.append(lst[j:x])
    j = x+1
new_lst.append(lst[j:])
print(new_lst)
output:
[['23532', 'user_name=app', 'content=123'], ['54546', 'user_name=bee', 'content=998 hello', 'source=fb'], ['12/22/2015']]
from pprint import pprint
sep = '###########################'
def split_list(_list):
    # sep is only read here, so no global declaration is needed
    lists = list()
    sub_list = list()
    for x in _list:
        if x == sep:
            lists.append(sub_list)
            sub_list = list()
        else:
            sub_list.append(x)
    lists.append(sub_list)
    return lists
l = ['23532','user_name=app','content=123',
'###########################',
'54546','user_name=bee','content=998 hello','source=fb',
'###########################',
'12/22/2015']
pprint(split_list(l))
Output:
[['23532', 'user_name=app', 'content=123'],
['54546', 'user_name=bee', 'content=998 hello', 'source=fb'],
['12/22/2015']]
You can achieve this by itertools.groupby
from itertools import groupby
lst = ['23532','user_name=app','content=123',
'###########################','54546','user_name=bee','content=998 hello','source=fb',
'###########################','12/22/2015']
[list(g) for k, g in groupby(lst, lambda x: x == '###########################') if not k ]
Output
[['23532', 'user_name=app', 'content=123'],
['54546', 'user_name=bee', 'content=998 hello', 'source=fb'],
['12/22/2015']]

Splitting Nested List at every ['-1']

If I have a nested list like this:
[['01'], ['02'], ['-1'], ['03'], ['04']]
Is there a way I split this nested list at every ['-1']?
So that it looks like this:
[[['01'], ['02']], [['03'], ['04']]]
Any sort of help would be appreciated :)
You can use itertools.groupby to group at every occurrence of your split value (here ['-1']). if not k ensures that we leave out the split value itself.
orig = [['01'], ['02'], ['-1'], ['03'], ['04']]
from itertools import groupby
n = [list(g) for k, g in groupby(orig, lambda x: x == ['-1']) if not k]
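One consequence of the `if not k` filter is that consecutive, leading, or trailing split values never produce empty sublists, unlike `str.split` with an explicit separator. A short sketch illustrating this follows; the wrapper name `split_on` is introduced only for this example.

```python
from itertools import groupby


def split_on(items, sep):
    """Split `items` at every element equal to `sep`, dropping the separators."""
    return [list(g) for is_sep, g in groupby(items, lambda x: x == sep) if not is_sep]


data = [['-1'], ['01'], ['-1'], ['-1'], ['02'], ['-1']]
print(split_on(data, ['-1']))
# [[['01']], [['02']]]
```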
Try this,
lists = [['01'], ['02'], ['-1'], ['03'], ['04'], ['-1'], ['05'], ['-1']]
results = list()
prev_idx = 0
for idx, l in enumerate(lists):
    if l == ['-1']:
        results.append(lists[prev_idx:idx])
        prev_idx = idx+1
if prev_idx <= idx: # the last group might be [] as shown in this case
    results.append(lists[prev_idx:])
print(results)
# Output
[[['01'], ['02']], [['03'], ['04']], [['05']]]
Seems like a usecase for groupby
>>> from itertools import groupby
>>> l = [['01'], ['02'], ['-1'], ['03'], ['04'], ['-1'], ['05'], ['06']]
>>> [list(g) for k,g in groupby(l, lambda x: x == ['-1']) if not k]
[[['01'], ['02']], [['03'], ['04']], [['05'], ['06']]]
itertools.groupby docs
A good old fashioned loop should do it:
l = [['01'], ['02'], ['-1'], ['03'], ['04']]
new = []
current = [] # Build a new list here
for i, item in enumerate(l):
    if item != ['-1']:
        current.append(item)
        if i == len(l) - 1: # If the item is the last in the list
            new.append(current)
    else:
        new.append(current)
        current = []
print(new)
# Output
[[['01'], ['02']], [['03'], ['04']]]
