Python groupby to split list by delimiter - python

I am pretty new to Python (3.6) and struggling to understand itertools groupby.
I've got the following list containing integers:
list1 = [1, 2, 0, 2, 3, 0, 4, 5, 0]
But the list could also be much longer and the '0' doesn't have to appear after every pair of numbers. It can also be after 3, 4 or more numbers. My goal is to split this list into sublists where the '0' is used as a delimiter and doesn't appear in any of these sublists.
list2 = [[1, 2], [2, 3], [4, 5]]
A similar problem has been solved here already:
Python spliting a list based on a delimiter word
Answer 2 seemed to help me a lot but unfortunately it only gave me a TypeError.
import itertools as it
list1 = [1, 2, 0, 2, 3, 0, 4, 5, 0]
list2 = [list(group) for key, group in it.groupby(list1, lambda x: x == 0) if not key]
print(list2)
File "H:/Python/work/ps0001/example.py", line 13, in <module>
list2 = [list(group) for key, group in it.groupby(list, lambda x: x == '0') if not key]
TypeError: 'list' object is not callable
I would appreciate any help and be very happy to finally understand groupby.

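You were checking for "0" (str) but you only have 0 (int) in your list. Also, you were using list as a variable name for your first list, which shadows the built-in list type. That is what causes the TypeError: list(group) then tries to call your list object instead of the built-in.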
from itertools import groupby
list1 = [1, 2, 0, 2, 7, 3, 0, 4, 5, 0]
list2 = [list(group) for key, group in groupby(list1, lambda x: x == 0) if not key]
print(list2)
This should give you:
[[1, 2], [2, 7, 3], [4, 5]]
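To see why the `if not key` filter works, it can help to look at what groupby yields before anything is filtered out. The following is a minimal sketch; the name `runs` is only used here for illustration.

```python
from itertools import groupby

list1 = [1, 2, 0, 2, 7, 3, 0, 4, 5, 0]

# groupby yields one (key, group) pair per run of consecutive items
# that share the same key. Here the key is True for the delimiter 0.
runs = [(key, list(group)) for key, group in groupby(list1, lambda x: x == 0)]
for key, items in runs:
    print(key, items)

# Keeping only the runs whose key is False drops the delimiters.
list2 = [items for key, items in runs if not key]
print(list2)
```

The runs alternate between non-delimiter groups (key False) and delimiter groups (key True), so filtering on the key is enough to discard every 0.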

In your code, you need to change lambda x: x == '0' to lambda x: x == 0, since you're working with a list of int, not a list of str.
Since others have shown how to improve your solution with itertools.groupby, you can also do this task with no libraries:
>>> list1 = [1, 2, 0, 2, 3, 0, 4, 5, 0]
>>> zeroes = [-1] + [i for i, e in enumerate(list1) if e == 0]
>>> result = [list1[zeroes[i] + 1: zeroes[i + 1]] for i in range(len(zeroes) - 1)]
>>> print(result)
[[1, 2], [2, 3], [4, 5]]

You can use string replacement together with ast.literal_eval for this (note that the result is a tuple of lists):
>>> import ast
>>> your_list = [1, 2, 0, 2, 3, 0, 4, 5, 0]
>>> a_list = str(your_list).replace(', 0,', '], [').replace(', 0]', ']')
>>> your_result = ast.literal_eval(a_list)
>>> your_result
([1, 2], [2, 3], [4, 5])
>>> your_result[0]
[1, 2]
>>>
Or a single line solution:
ast.literal_eval(str(your_list).replace(', 0,', '], [').replace(', 0]', ']'))

You could do that within a loop, as shown in the commented snippet below:
list1 = [1, 2, 0, 2, 3, 0, 4, 5, 0]
tmp, result = ([], [])  # tmp HOLDS A TEMPORARY LIST :: result => RESULT
for i in list1:
    if not i:
        # CURRENT VALUE IS 0 SO WE BUILD THE SUB-LIST
        result.append(tmp)
        # RE-INITIALIZE THE tmp VARIABLE
        tmp = []
    else:
        # SINCE CURRENT VALUE IS NOT 0, WE POPULATE THE tmp LIST
        tmp.append(i)
print(result)  # [[1, 2], [2, 3], [4, 5]]
Effectively:
list1 = [1, 2, 0, 2, 3, 0, 4, 5, 0]
tmp, result = ([], [])  # tmp HOLDS A TEMPORARY LIST
for i in list1:
    if not i:
        result.append(tmp); tmp = []
    else:
        tmp.append(i)
print(result)  # [[1, 2], [2, 3], [4, 5]]
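One thing to watch with the loop above: a final group is only appended when a 0 follows it, so a list that does not end with the delimiter loses its last group. A possible variation is sketched below. The helper name `split_on` and the choice to skip empty groups are assumptions of this sketch, not part of the original answer.

```python
def split_on(items, delimiter=0):
    """Split items into sublists at each delimiter (hypothetical helper)."""
    result, tmp = [], []
    for item in items:
        if item == delimiter:
            if tmp:  # skip empty groups from leading or repeated delimiters
                result.append(tmp)
            tmp = []
        else:
            tmp.append(item)
    if tmp:  # keep a final group that is not followed by a delimiter
        result.append(tmp)
    return result


print(split_on([1, 2, 0, 2, 3, 0, 4, 5]))  # no trailing 0 this time
```

Comparing with `item == delimiter` instead of `not i` also avoids treating other falsy values as delimiters.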

Use zip to build tuples and convert them to lists afterwards. Note that this only works when the 0 appears after every pair of numbers:
>>> a
[1, 2, 0, 2, 3, 0, 4, 5, 0]
>>> a[0::3]
[1, 2, 4]
>>> a[1::3]
[2, 3, 5]
>>> list(zip(a[0::3], a[1::3]))
[(1, 2), (2, 3), (4, 5)]
>>> [list(i) for i in zip(a[0::3],a[1::3])]
[[1, 2], [2, 3], [4, 5]]

Try using join and then splitting on '0' (this only works when every number is a single non-zero digit):
lst = [1, 2, 0, 2, 3, 0, 4, 5, 0]
lst_string = "".join([str(x) for x in lst])
lst2 = lst_string.split('0')
lst3 = [list(y) for y in lst2]
lst4 = [list(map(int, z)) for z in lst3]
print(lst4)
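Running this on my console prints the following (the trailing empty list comes from the final 0):
[[1, 2], [2, 3], [4, 5], []]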

Related

How to pack consecutive duplicates of list elements into sublists?

How can I "pack" consecutive duplicated elements in a list into sublists of the repeated element?
What I mean is:
l = [1, 1, 1, 2, 2, 3, 4, 4, 1]
pack(l) -> [[1,1,1], [2,2], [3], [4, 4], [1]]
I want to do this problem in a very basic way as I have just started i.e using loops and list methods. I have looked for other methods but they were difficult for me to understand
For removing the duplicates instead of packing them, see Removing elements that have consecutive duplicates
You can use groupby:
from itertools import groupby
def pack(List):
    result = []
    for key, group in groupby(List):
        result.append(list(group))
    return result
l = [1, 1, 1, 2, 2, 3, 4, 4, 1]
print(pack(l))
Or one-line:
l = [1, 1, 1, 2, 2, 3, 4, 4, 1]
result = [list(group) for key,group in groupby(l)]
# [[1, 1, 1], [2, 2], [3], [4, 4], [1]]
You can use:
lst = [1, 1, 1, 2, 2, 3, 4, 4, 1]
# bootstrap: initialize a sublist with the first element of lst
out = [[lst[0]]]
for it1, it2 in zip(lst, lst[1:]):
    # if previous item and current one are equal, append the current item to the last sublist
    if it1 == it2:
        out[-1].append(it2)
    # else start a new sublist containing the current item
    else:
        out.append([it2])
Output:
>>> out
[[1, 1, 1], [2, 2], [3], [4, 4], [1]]
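The bootstrap step above indexes lst[0], so it raises an IndexError for an empty list. A variation that avoids the bootstrap by comparing each item with the last element of the current run might look like this (a sketch using only loops and list methods, as the question asked; the function name `pack` is taken from the question):

```python
def pack(items):
    """Group consecutive equal elements into sublists (plain-loop sketch)."""
    out = []
    for item in items:
        # Extend the current run if the item repeats the previous one,
        # otherwise start a new run with this item.
        if out and out[-1][-1] == item:
            out[-1].append(item)
        else:
            out.append([item])
    return out


print(pack([1, 1, 1, 2, 2, 3, 4, 4, 1]))
print(pack([]))
```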
This code will do:
data = [0,0,1,2,3,4,4,5,6,6,6,7,8,9,4,4,9,9,9,9,9,3,3,2,45,2,11,11,11]
newdata=[]
for i,l in enumerate(data):
    if i==0 or l!=data[i-1]:
        newdata.append([l])
    else:
        newdata[-1].append(l)
#Output
[[0,0],[1],[2],[3],[4,4],[5],[6,6,6],[7],[8],[9],[4,4],[9,9,9,9,9],[3,3],[2],[45],[2],[11,11,11]]

Is there a str.replace equivalent for sequence in general?

Is there a method similar to str.replace which can do the following:
>>> replace(sequence=[0,1,3], old=[0,1], new=[1,2])
[1,2,3]
It should really act like str.replace : replacing a "piece" of a sequence by another sequence, not map elements of "old" with "new" 's ones.
Thanks :)
No, I'm afraid there is no built-in function that does this, however you can create your own!
The steps are really easy: we just need to slide a window over the list, where the width of the window is len(old). At each position, we check whether the window is equal to old, and if it is, we slice before the window, insert new and concatenate the rest of the list afterwards. This can be done simply by assigning directly to the old slice, as pointed out by @OmarEinea.
def replace(seq, old, new):
    seq = seq[:]
    w = len(old)
    i = 0
    while i < len(seq) - w + 1:
        if seq[i:i+w] == old:
            seq[i:i+w] = new
            i += len(new)
        else:
            i += 1
    return seq
and some tests show it works:
>>> replace([0, 1, 3], [0, 1], [1, 2])
[1, 2, 3]
>>> replace([0, 1, 3, 0], [0, 1], [1, 2])
[1, 2, 3, 0]
>>> replace([0, 1, 3, 0, 1], [0, 1], [7, 8])
[7, 8, 3, 7, 8]
>>> replace([1, 2, 3, 4, 5], [1, 2, 3], [1, 1, 2, 3])
[1, 1, 2, 3, 4, 5]
>>> replace([1, 2, 1, 2], [1, 2], [3])
[3, 3]
As pointed out by @user2357112, using a for-loop leads to re-evaluating replaced sections of the list, so I updated the answer to use a while instead.
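The same sliding-window idea can also be written so that it builds a new list instead of assigning to slices. The sketch below is one possible variant; the name `replace_seq` and the decision to reject an empty `old` are assumptions of this sketch and are not part of the answer above.

```python
def replace_seq(seq, old, new):
    """Return a new list with each non-overlapping occurrence of old replaced by new."""
    if not old:
        # An empty pattern would match at every position, so reject it here.
        raise ValueError("old must not be empty")
    result = []
    w = len(old)
    i = 0
    while i < len(seq):
        if seq[i:i + w] == old:
            result.extend(new)  # emit the replacement
            i += w              # skip past the matched window
        else:
            result.append(seq[i])
            i += 1
    return result


print(replace_seq([0, 1, 3, 0, 1], [0, 1], [7, 8]))
```

Because the replacement is written to a separate list, replaced sections are never scanned again.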
I tried this, but before using this method read what Ned has written about eval():
import re
import ast
def replace(sequence, old, new):
    sequence = str(sequence)
    replace_s=str(str(old).replace('[', '').replace(']', ''))
    if '.' in replace_s:
        replace_ss=list(replace_s)
        for j,i in enumerate(replace_ss):
            if i=='.':
                try:
                    replace_ss[0]=r"\b"+ replace_ss[0]
                    replace_ss[j]=r".\b"
                except IndexError:
                    pass
        replace_s="".join(replace_ss)
    else:
        replace_s = r"\b" + replace_s + r"\b"
    final_ = str(new).replace('[', '').replace(']', '')
    return ast.literal_eval(re.sub(replace_s, final_, sequence))
print(replace([0, 1, 3], [0, 1], [1, 2]))
output:
[1, 2, 3]

Splitting a list of arbitrary size into N-not-equal parts [duplicate]

This question already has answers here:
How to group a list of tuples/objects by similar index/attribute in python?
(3 answers)
Closed 8 years ago.
I have seen "Splitting a list of arbitrary size into only roughly N-equal parts". How about unequal splitting? I have a list of items that each have some attribute (a value that can be retrieved by running the same function against every item). How can I split the list so that items with the same attribute end up together in a new sublist? Could something lambda-related work here?
Simple example could be:
list = [1, 1, 1, 2, 3, 3, 3, 3, 4, 4]
After fancy operation we could have:
list = [[1, 1, 1], [2], [3, 3, 3, 3], [4, 4]]
>>> import itertools, operator
>>> L = [1, 1, 1, 2, 3, 3, 3, 3, 4, 4]
>>> [list(g) for i, g in itertools.groupby(L)]
[[1, 1, 1], [2], [3, 3, 3, 3], [4, 4]]
>>> L2 = ['apple', 'aardvark', 'banana', 'coconut', 'crow']
>>> [list(g) for i, g in itertools.groupby(L2, operator.itemgetter(0))]
[['apple', 'aardvark'], ['banana'], ['coconut', 'crow']]
You should use the itertools.groupby function from the standard library.
This function groups the elements in the iterable it receives (by default using the identity function, i.e., checking consecutive elements for equality), and for each streak of grouped elements, it returns a 2-tuple consisting of the streak representative (the element itself), and an iterator over the elements within the streak.
Indeed:
from itertools import groupby
l = [1, 1, 1, 2, 3, 3, 3, 3, 4, 4]
print([list(k[1]) for k in groupby(l)])
# [[1, 1, 1], [2], [3, 3, 3, 3], [4, 4]]
P.S. you should avoid using list as a variable name, as it would conflict with the built-in type/function.
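Since groupby only merges consecutive elements, items with the same attribute that are not adjacent end up in separate groups. If the goal is one sublist per attribute value, sorting by the same key first is a common approach. A small sketch, where `first_letter` and the sample words are made up for illustration:

```python
from itertools import groupby

words = ['apple', 'banana', 'aardvark', 'crow', 'coconut']


def first_letter(word):
    """The 'attribute' used for grouping in this example."""
    return word[0]


# Without sorting, 'apple' and 'aardvark' are not adjacent,
# so they land in separate groups.
unsorted_groups = [list(g) for _, g in groupby(words, key=first_letter)]
print(unsorted_groups)

# Sorting by the same key first brings equal attributes together.
by_letter = sorted(words, key=first_letter)
sorted_groups = [list(g) for _, g in groupby(by_letter, key=first_letter)]
print(sorted_groups)
```

sorted() is stable, so within each group the items keep their original relative order.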
Here's a pretty simple roll your own solution. If the 'attribute' in question is simply the value of the item, there are more straightforward approaches.
def split_into_sublists(data_list, sizes_list):
    if sum(sizes_list) != len(data_list):
        raise ValueError
    count = 0
    output = []
    for size in sizes_list:
        output.append(data_list[count:count+size])
        count += size
    return output
if __name__ == '__main__':
    data_list = [1, 1, 1, 2, 3, 3, 3, 3, 4, 4]
    sizes_list = [3,1,4,2]
    list2 = [[1, 1, 1], [2], [3, 3, 3, 3], [4, 4]]
    print(split_into_sublists(data_list, sizes_list) == list2) # True

Combinations Including Select Elements (Python)

In order to make the set of all combinations of numbers 0 to x, with length y, we do:
list_of_combinations=list(combinations(range(0,x+1),y))
list_of_combinations=map(list,list_of_combinations)
print list_of_combinations
This will output the result as a list of lists.
For example, x=4, y=3:
[[0, 1, 2], [0, 1, 3], [0, 1, 4], [0, 2, 3], [0, 2, 4], [0, 3, 4], [1, 2, 3], [1, 2, 4],
[1, 3, 4], [2, 3, 4]]
I am trying to do the above, but only outputting lists that have 2 members chosen beforehand.
For instance, I would like to only output the set of the combos that has 1 and 4 inside it. The output would then be (for x=4, y=3):
[[0, 1, 4], [1, 2, 4], [1, 3, 4]]
The best approach I have now is to make a list that is y-2 length with all numbers of the set without the chosen numbers, and then append the chosen numbers, but this seems very inefficient. Any help appreciated.
*Edit: I am doing this for large x and y, so I can't just write out all the combos and then search for the selected elements, I need to find a better method.
combinations() returns an iterable, so loop over that while producing the list:
[list(combo) for combo in combinations(range(x + 1), y) if 1 in combo]
This produces one list, the list of all combinations that match the criteria.
Demo:
>>> from itertools import combinations
>>> x, y = 4, 3
>>> [list(combo) for combo in combinations(range(x + 1), y) if 1 in combo]
[[0, 1, 2], [0, 1, 3], [0, 1, 4], [1, 2, 3], [1, 2, 4], [1, 3, 4]]
The alternative would be to produce y - 1 combinations of range(x + 1) with 1 removed, then adding 1 back in (using bisect.insort() to avoid having to sort afterwards):
import bisect
def combinations_with_guaranteed(x, y, *guaranteed):
    values = set(range(x + 1))
    values.difference_update(guaranteed)
    for combo in combinations(sorted(values), y - len(guaranteed)):
        combo = list(combo)
        for value in guaranteed:
            bisect.insort(combo, value)
        yield combo
then loop over that generator:
>>> list(combinations_with_guaranteed(4, 3, 1))
[[0, 1, 2], [0, 1, 3], [0, 1, 4], [1, 2, 3], [1, 2, 4], [1, 3, 4]]
>>> list(combinations_with_guaranteed(4, 3, 1, 2))
[[0, 1, 2], [1, 2, 3], [1, 2, 4]]
This won't produce as many combinations for filtering to discard again.
It may well be that for larger values of y and more guaranteed numbers, just using yield sorted(combo + guaranteed) (with combo left as a tuple) is going to beat repeated bisect.insort() calls.
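The sorted() variant mentioned above could be sketched as follows. The function name `combinations_with_guaranteed_sorted` is made up for this example, and the sketch makes no claim about which version is faster for a given input.

```python
from itertools import combinations


def combinations_with_guaranteed_sorted(x, y, *guaranteed):
    """Yield sorted y-length combinations of 0..x that contain all guaranteed values."""
    values = sorted(set(range(x + 1)) - set(guaranteed))
    for combo in combinations(values, y - len(guaranteed)):
        # combo and guaranteed are both tuples, so + concatenates them;
        # sorted() then returns the merged values as a list.
        yield sorted(combo + guaranteed)


print(list(combinations_with_guaranteed_sorted(4, 3, 1, 4)))
```

For x=4, y=3 with 1 and 4 guaranteed, this yields the three lists asked for in the question.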
This should do the trick:
filtered_list = filter(lambda x: 1 in x and 4 in x, list_of_combinations)
To make your code nicer (use more generators), I'd use this
combs = combinations(xrange(0, x+1), y)
filtered_list = map(list, filter(lambda x: 1 in x and 4 in x, combs))
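If you don't need filtered_list to be a list and an iterator is enough, you could even do the following (this is Python 2; in Python 3 the built-in filter and map are already lazy, xrange is range, and you call next(filtered_list) instead of filtered_list.next()):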
from itertools import ifilter, imap, combinations
combs = combinations(xrange(0, x+1), y)
filtered_list = imap(list, ifilter(lambda x: 1 in x and 4 in x, combs))
filtered_list.next()
> [0, 1, 4]
filtered_list.next()
> [1, 2, 4]
filtered_list.next()
> [1, 3, 4]
filtered_list.next()
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> StopIteration

split a list by a lambda function in python

Is there any version of split that works on generic list types? For example, in Haskell
Prelude> import Data.List.Split
Prelude Data.List.Split> splitWhen (==2) [1, 2, 3]
[[1],[3]]
Nope. But you can use itertools.groupby() to mimic it.
>>> import itertools
>>> [list(x[1]) for x in itertools.groupby([1, 2, 3], lambda x: x == 2) if not x[0]]
[[1], [3]]
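The groupby expression above can be wrapped in a small reusable function. This is only a sketch: the name `split_when` is borrowed from the Haskell function in the question, and this version drops empty groups between adjacent delimiters, which is a property of the groupby approach rather than a claim about how the Haskell function behaves.

```python
from itertools import groupby


def split_when(predicate, items):
    """Split items wherever predicate(item) is true, discarding those items."""
    # bool() keeps the keys to exactly True or False, so consecutive
    # delimiters (or non-delimiters) always fall into the same run.
    keyed = groupby(items, lambda x: bool(predicate(x)))
    return [list(group) for is_delimiter, group in keyed if not is_delimiter]


print(split_when(lambda x: x == 2, [1, 2, 3]))
print(split_when(lambda x: x == 2, [1, 2, 2, 2, 2, 3, 4]))
```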
One more solution:
output = [[]]
valueToSplit = 2
data = [1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 2, 5, 2]
for i, val in enumerate(data):
    # a delimiter in the last position must not open a new empty sublist
    if val == valueToSplit and i == len(data)-1:
        break
    if val == valueToSplit:
        output.append([])
    else:
        output[-1].append(val)
print(output)  # [[1], [3, 4, 1], [3, 4, 5, 6], [5]]
You can also create an iterator and use itertools.takewhile to include all matching items and discard the delimiter:
>>> import itertools
>>> l = [1, 2, 3]
>>> a = iter(l)
>>> [[_]+list(itertools.takewhile(lambda x: x!=2, a)) for _ in a]
[[1], [3]]
@TigerhawkT3's answer isn't flawless, but I can't add a comment.
When l = [1, 2, 2, 2, 2, 3, 4], the output [[1], [2], [2, 3, 4]] seems wrong, but the answer inspired me:
import itertools
l = [1, 2, 2, 2, 2, 3, 4]
a = iter(l)
output = [[_] + list(itertools.takewhile(lambda x: x != 2, a)) for _ in a if _ != 2]
print(output)  # [[1], [3, 4]]
