I have a list in a for loop and it uses itertools.product() to find different combinations of letters. I want to use collections.Counter() to count the number of occurrences of an item, however, right now it prints all the different combinations of "A"'s and "G"'s:
['a', 'A', 'G', 'G']
['a', 'A', 'G', 'g']
['a', 'A', 'G', 'G']
['a', 'A', 'G', 'g']
['a', 'A', 'G', 'g']
#...
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'g']
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'G']
['a', 'G', 'a', 'G']
#...
['a', 'G', 'a', 'G']
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'g']
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'G']
#...
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'G']
['a', 'G', 'a', 'G']
# etc.
Now, this isn't all of them, but as you can see, there are some occurrences that are the same although ordered differently, for example:
['a', 'G', 'A', 'G']
['a', 'A', 'G', 'G']
I would much prefer the latter ordering, so I want to find a way to print all of the combinations with capital letters before lower case, and because 'a' is before 'g', also alphabetically. The final product should look like ['AaGG', 'aaGg', etc]. What function or functions should I use?
This is the code that generates the data. The section marked "Counting" is what I'm having trouble with.
import itertools
from collections import Counter
parent1 = 'aaGG'
parent2 = 'AaGg'
f1 = []
f1_ = []
genotypes = []
b = []
genetics = []
g = []
idx = []
parent1 = list(itertools.combinations(parent1, 2))
del parent1[0]
del parent1[4]
parent2 = list(itertools.combinations(parent2, 2))
del parent2[0]
del parent2[4]
for x in parent1:
f1.append(''.join(x))
for x in parent2:
f1_.append(''.join(x))
y = list(itertools.product(f1, f1_))
for x in y:
genotypes.append(''.join(x))
break
genotypes = [
thingies[0][0] + thingies[1][0] + thingies[0][1] + thingies[1][1]
for thingies in zip(parent1, parent2)
] * 4
print 'F1', Counter(genotypes)
# Counting
for genotype in genotypes:
alleles = list(itertools.combinations(genotype,2))
del alleles[1]
del alleles[3]
for x in alleles:
g.append(''.join(x))
for idx in g:
if idx.lower().count("a") == idx.lower().count("g") == 1:
break
f2 = list(itertools.product(g, g))
for x in f2:
genetics.append(''.join(x))
for genes in genetics:
if genes.lower().count("a") == genes.lower().count("g") == 2:
genes = ''.join(genes)
print Counter(genes)
I think you're looking for a customized way to define precedence; the lists are currently being ordered by ASCII numbering, which defines uppercase letters as always preceding lowercase letters. I would define customized precedence using a dictionary:
>>> test_list = ['a', 'A', 'g', 'G']
>>> precedence_dict = {'A':0, 'a':1, 'G':2,'g':3}
>>> test_list.sort(key=lambda x: precedence_dict[x])
>>> test_list
['A', 'a', 'G', 'g']
Edit:
Your last few lines:
for genes in genetics:
if genes.lower().count("a") == genes.lower().count("g") == 2:
genes = ''.join(genes)
print Counter(genes)
were not doing what you wanted them to.
Replace those lines with:
precedence_dict = {'A':0, 'a':1, 'G':2,'g':3}
for i in xrange(len(genetics)):
    genetics[i] = list(genetics[i])
    genetics[i].sort(key=lambda x: precedence_dict[x])
    genetics[i] = ''.join(genetics[i])
# the built-in set replaces the deprecated sets.Set
genetics = list(set(genetics))
genetics.sort()
print genetics
and I think you have the correct solution. In a for loop, the loop variable is only a name bound to each item in turn; assigning to it rebinds the name and leaves the list untouched. So the string 'genes' was never modified in the original list.
I know you didn't ask for a code review, but you might be better off just generating the strings in the order you want in the first place instead of trying to filter them afterwards. Something like this might work.
def cross(parent1, parent2):
    out = []
    alleles = len(parent1)/2
    # iterate parent 1 possible genotypes
    for i in range(2):
        # iterate loci
        for k in range(alleles):
            child = []
            # iterate parent 2 possible genotypes
            for j in range(2):
                p1 = parent1[j * 2 + i]
                p2 = parent2[j * 2 + k]
                c = [p1, p2]
                # get each genotype pair into capitalization order
                c.sort()
                c.reverse()
                child += c
            out.append("".join(child))
    return out
if __name__ == "__main__":
    parent1 = 'aaGG'
    parent2 = 'AaGg'
    # F1
    f1 = cross(parent1, parent2)
    print f1
    # F2
    f2 = []
    for p1 in f1:
        for p2 in f1:
            f2 += cross(p1, p2)
    print f2
Here's one way to get all combinations from a single parent. Start with the empty string and add the possibilities one by one.
def get_all_combos(allele_pair, gametes):
    # Take a list of genotypes. Return an updated list with each possibility from an allele pair
    updated_gametes = []
    for z in gametes:
        updated_gametes.append(z + allele_pair[0])
        updated_gametes.append(z + allele_pair[1])
    return updated_gametes
if __name__ == "__main__":
    parent1 = 'aaGG'
    parent2 = 'AaGg'
    alleles = len(parent2)/2
    gametes = [""]
    for a in range(alleles):
        allele_pair = parent2[a*2:a*2+2]
        gametes = get_all_combos(allele_pair, gametes)
    print gametes
Maybe you can figure out how to combine these two solutions to get what you want.
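As a possible shortcut for the gamete step, itertools.product can pick one allele per locus directly. This is a Python 3 sketch; the function name `gametes` is an assumption, and it relies on the parent string storing the two alleles of each locus next to each other, as 'AaGg' does.

```python
from itertools import product

def gametes(parent):
    """List every gamete a parent can produce, one allele per locus."""
    # Split 'AaGg' into per-locus pairs: ['Aa', 'Gg'].
    loci = [parent[i:i + 2] for i in range(0, len(parent), 2)]
    # product picks one allele from each locus in every possible way.
    return [''.join(alleles) for alleles in product(*loci)]

print(gametes('AaGg'))
print(gametes('aaGG'))
```

Duplicates are kept on purpose, so the relative frequencies of the gametes are preserved for later counting.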
You can try using the sort function.
Example of what I mean:
parent1 = "absdksakjcvjvugoh"
parent1sorted = list(parent1)
parent1sorted.sort()
print (parent1sorted)
The result you get is this : ['a', 'a', 'b', 'c', 'd', 'g', 'h', 'j', 'j', 'k', 'k', 'o', 's', 's', 'u', 'v', 'v']
Does this help you?
tldr:
Convert string into list, Sort list
Say I have a list of options and I want to pick a certain number randomly.
In my case, say the options are in a list ['a', 'b', 'c', 'd', 'e'] and I want my script to return 3 elements.
However, there is also the case of two options that cannot appear at the same time. That is, if option 'a' is picked randomly, then option 'b' cannot be picked. And the same applies the other way round.
So valid outputs are: ['a', 'c', 'd'] or ['c', 'd', 'b'], while things like ['a', 'b', 'c'] would not because they contain both 'a' and 'b'.
To fulfil these requirements, I am fetching 3 options plus another one to compensate a possible discard. Then, I keep a set() with the mutually exclusive condition and keep removing from it and check if both elements have been picked or not:
import random
mutually_exclusive = set({'a', 'b'})
options = ['a', 'b', 'c', 'd', 'e']
num_options_to_return = 3
shuffled_options = random.sample(options, num_options_to_return + 1)
elements_returned = 0
for item in shuffled_options:
    if elements_returned >= num_options_to_return:
        break
    if item in mutually_exclusive:
        mutually_exclusive.remove(item)
        if not mutually_exclusive:
            # if both elements have appeared, then the set is empty so we cannot return the current value
            continue
    print(item)
    elements_returned += 1
However, I may be overcoding and Python may have better ways to handle these requirements. Going through random's documentation I couldn't find ways to do this out of the box. Is there a better solution than my current one?
One way to do this is use itertools.combinations to create all of the possible results, filter out the invalid ones and make a random.choice from that:
>>> from itertools import combinations
>>> from random import choice
>>> def is_valid(t):
...     return 'a' not in t or 'b' not in t
...
>>> choice([
...     t
...     for t in combinations('abcde', 3)
...     if is_valid(t)
... ])
...
('c', 'd', 'e')
Maybe a bit naive, but you could generate samples until your condition is met:
import random
options = ['a', 'b', 'c', 'd', 'e']
num_options_to_return = 3
mutually_exclusive = set({'a', 'b'})
while True:
    shuffled_options = random.sample(options, num_options_to_return)
    # accept the sample unless it contains both mutually exclusive items
    if not mutually_exclusive.issubset(shuffled_options):
        break
print(shuffled_options)
You can restructure your options.
import random
options = [('a', 'b'), 'c', 'd', 'e']
n_options = 3
selected_option = random.sample(options, n_options)
result = [item if not isinstance(item, tuple) else random.choice(item)
          for item in selected_option]
print(result)
I would implement it with sets:
import random
mutually_exclusive = {'a', 'b'}
options = ['a', 'b', 'c', 'd', 'e']
num_options_to_return = 3
while True:
    s = random.sample(options, num_options_to_return)
    print('Sample is', s)
    if not mutually_exclusive.issubset(s):
        break
    print('Discard!')
print('Final sample:', s)
Prints (for example):
Sample is ['a', 'b', 'd']
Discard!
Sample is ['b', 'a', 'd']
Discard!
Sample is ['e', 'a', 'c']
Final sample: ['e', 'a', 'c']
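The retry loop above could be wrapped in a reusable function. The following is a minimal Python 3 sketch; the name `sample_excluding` and its parameters are assumptions made for illustration.

```python
import random

def sample_excluding(options, k, exclusive):
    """Draw k distinct options, retrying while the draw contains the whole exclusive set."""
    exclusive = set(exclusive)
    while True:
        picked = random.sample(options, k)
        # Reject only when every mutually exclusive item was picked together.
        if not exclusive.issubset(picked):
            return picked

picked = sample_excluding(['a', 'b', 'c', 'd', 'e'], 3, {'a', 'b'})
print(picked)
```

Note that the loop never ends if no valid sample exists (for example, k equal to the number of options), so a caller would need to rule that case out first.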
I created the below function and I think it's worth sharing it too ;-)
import random
def random_picker(options, n, mutually_exclusives=None):
    if mutually_exclusives is None:
        return random.sample(options, n)
    elif any(len(pair) != 2 for pair in mutually_exclusives):
        raise ValueError('Length of pairs of mutually_exclusives iterable, must be 2')
    res = []
    while len(res) < n:
        item_index = random.randint(0, len(options) - 1)
        item = options[item_index]
        # skip the item if its mutually exclusive partner was already picked
        if any(item == exc and pair[-(i - 1)] in res for pair in mutually_exclusives
               for i, exc in enumerate(pair)):
            continue
        # note: pop() removes the item from the caller's list
        res.append(options.pop(item_index))
    return res
Where:
options is the list of available options to pick from.
n is the number of items you want to be picked from options
mutually_exclusives is an iterable of 2-tuples, each holding a pair of mutually exclusive items
You can use it as follows:
>>> random_picker(['a', 'b', 'c', 'd', 'e'], 3)
['c', 'e', 'a']
>>> random_picker(['a', 'b', 'c', 'd', 'e'], 3, [('a', 'b')])
['d', 'b', 'e']
>>> random_picker(['a', 'b', 'c', 'd', 'e'], 3, [('a', 'b'), ('a', 'c')])
['e', 'd', 'a']
import random
l = [['a','b'], ['c'], ['d'], ['e']]
x = [random.choice(i) for i in random.sample(l,3)]
Here l is a two-dimensional list, where the first level reflects an AND relation (several groups are sampled) and the second level an OR relation (one item is chosen from each sampled group).
I am a beginner at programming and I'm trying to figure out how list methods work. I wrote a tiny string scrambler and decoder as an exercise.
import random
sliced = []
keyholder = []
scrambled = []
decoded = []
def string_slicer(string):
    i = 0
    while i < len(string):
        sliced.append(string[i])
        i += 1
def string_scrambler(string):
    string = string_slicer(string)
    a = 0
    while len(scrambled) != len(sliced):
        value = len(sliced) - 1
        key = random.randint(0,value)
        if key in keyholder:
            continue
        else:
            scrambled.append(sliced[key])
            keyholder.append(key)
            continue
def string_decoder():
    x = 0
    for item in keyholder:
        decoded.insert(keyholder[x], scrambled[x])
        x += 1
string_scrambler('merhaba')
string_decoder()
print sliced
print keyholder
print scrambled
print decoded
When I test it, string_scrambler() works fine but string_decoder() gives inconsistent results. Here are some examples:
C:\Python27\Exercises>python scrambler.py
['m', 'e', 'r', 'h', 'a', 'b', 'a']
[2, 6, 0, 1, 3, 5, 4]
['r', 'a', 'm', 'e', 'h', 'b', 'a']
['m', 'e', 'r', 'h', 'a', 'a', 'b']
C:\Python27\Exercises>python scrambler.py
['m', 'e', 'r', 'h', 'a', 'b', 'a']
[4, 5, 1, 0, 3, 2, 6]
['a', 'b', 'e', 'm', 'h', 'r', 'a']
['m', 'a', 'r', 'e', 'h', 'b', 'a']
C:\Python27\Exercises>python scrambler.py
['m', 'e', 'r', 'h', 'a', 'b', 'a']
[1, 4, 5, 2, 3, 0, 6]
['e', 'a', 'b', 'r', 'h', 'm', 'a']
['m', 'e', 'a', 'r', 'h', 'b', 'a']
I think trying to add items to an empty list with the .insert method may cause this problem, but I can't figure out exactly why.
Note that a lot of your functions aren't necessary at all.
>>> list("some string")
['s', 'o', 'm', 'e', ' ', 's', 't', 'r', 'i', 'n', 'g']
# just like your `string_slicer` function.
Notably the problem with your approach is that you might try to do, for instance:
>>> lst = []
>>> lst.insert(3, "after")
>>> lst.insert(2, "before")
>>> lst
['after', 'before']
Since the list is initially length zero, inserting past the end point just sends it to the end of the list. Even though 3 is a more distant index than 2, it doesn't order correctly since essentially you've done
lst.append("after")
lst.append("before")
Instead, you could do something like:
scrambled = [''] * len(sliced)
# build a list of the same length as the cleartext sliced string
for idx, dest in enumerate(keyholder):
    scrambled[dest] = sliced[idx]
Then to descramble, do the opposite
deciphered = [''] * len(scrambled)
for idx, dest in enumerate(keyholder):
    deciphered[idx] = scrambled[dest]
The full solution I'd use, including some other tricks, is:
import random
def make_key(lst):
    # random.shuffle works in place and returns None, so build the list first
    key = list(range(len(lst)))
    random.shuffle(key)
    return key
def scramble(lst, key):
    result = [''] * len(lst)
    for idx, dst in enumerate(key):
        result[dst] = lst[idx]
    return result
def unscramble(scrambled, key):
    return [scrambled[idx] for idx in key]
s = "merhaba"
key = make_key(list(s))
scrambled = scramble(list(s), key)
deciphered = unscramble(scrambled, key)
print(list(s))
print(key)
print(scrambled)
print(deciphered)
N.B. this removes every single list method you were trying to learn in the first place! You should notice this, because it's indicative of the fact that many list methods are slow: insert and remove have to shift every later element, while append and pop (from the end) do not. You should probably avoid the slow ones if another equally-readable solution exists.
I think it's better not to use a list as the data structure for decoded.
I'd use a dict as a temporary variable. Here is my version of string_decoder:
_decoded = dict() # changed
def string_decoder():
    x = 0
    for item in keyholder:
        _decoded[keyholder[x]] = scrambled[x] #changed
        x += 1
    return [value for key, value in sorted(_decoded.items())] #changed
decoded = string_decoder()
BTW, you have issues with list.insert() because you insert values at positions that do not exist yet, e.g. inserting at index 4 in a list of 2 elements.
Example of the behavior:
>>> decoded = []
>>> decoded.insert(100, 'b')
>>> decoded
['b']
>>> decoded.insert(99, 'a')
>>> decoded
['b', 'a'] # according to your code, you expect ['a', 'b'] because 99 is less than 100, but the list has not enough entries. So, the item is just appended to the end
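A preallocated list avoids the problem as well, because every target index exists before anything is written to it. The Python 3 sketch below assumes the question's convention, where keyholder[x] is the original index of scrambled[x]; the function name `decode` is hypothetical.

```python
def decode(scrambled, keyholder):
    """Put each scrambled character back at its original index."""
    # Allocate all slots up front so assignment by index is always valid.
    decoded = [None] * len(scrambled)
    for position, original_index in enumerate(keyholder):
        decoded[original_index] = scrambled[position]
    return decoded

# Data taken from the first run shown in the question.
keyholder = [2, 6, 0, 1, 3, 5, 4]
scrambled = ['r', 'a', 'm', 'e', 'h', 'b', 'a']
print(decode(scrambled, keyholder))
```

Assignment by index does not depend on the order in which positions are filled, which is exactly what insert() on a growing list cannot guarantee.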
I'm trying to write a program that finds all palindromes in a word. For example, the word "radar" contains 2 palindromes: radar and ada. Single letters are skipped, so r, a, d, etc. don't count as palindromes.
import copy
def ILEP(word):
    lista = list(word)
    counter = 0
    pali = []
    def isPalindrome(listaWord):
        backup = copy.deepcopy(listaWord)
        backup.reverse()
        a = ''.join(backup)
        b = ''.join(listaWord)
        if(a == b):
            return True
        else:
            return False
    for i in range(len(lista)):
        current = [lista[i]]
        for j in range(i+1, len(lista)):
            current.append(lista[j])
            if(isPalindrome(current)):
                print(current)
                pali.append(current)
                counter+=1
    print(pali)
    return counter
print(ILEP("radar"))
The program finds all palindromes correctly, but it stores them incorrectly in the list pali. Console:
['r', 'a', 'd', 'a', 'r']
['a', 'd', 'a']
[['r', 'a', 'd', 'a', 'r'], ['a', 'd', 'a', 'r']]
2
As you can see, it prints the palindromes ['r', 'a', 'd', 'a', 'r'] and ['a', 'd', 'a'], but the list pali has the wrong value [['r', 'a', 'd', 'a', 'r'], ['a', 'd', 'a', 'r']]
You are changing your list current after appending to pali. You have to make a copy:
def is_palindrome(word):
    return word[::-1] == word
def ILEP(word):
    pali = []
    for i, ch in enumerate(word):
        current = [ch]
        for ch in word[i+1:]:
            current.append(ch)
            if is_palindrome(current):
                print(current)
                # store a copy, because current keeps growing afterwards
                pali.append(current[:])
    print(pali)
    return len(pali)
print(ILEP("radar"))
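Working with string slices sidesteps the aliasing problem altogether, since every slice is a new, immutable string. This is a small sketch with a hypothetical function name, not a change to the answer's approach.

```python
def palindromic_substrings(word):
    """Return every substring of length >= 2 that reads the same both ways."""
    found = []
    for start in range(len(word)):
        for end in range(start + 2, len(word) + 1):
            piece = word[start:end]  # slicing creates a new string each time
            if piece == piece[::-1]:
                found.append(piece)
    return found

print(palindromic_substrings("radar"))
```

Nothing stored in `found` can change later, because no list is shared between iterations.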
I wrote a function to create a nested list.
For example:
input= ['a','b','c','','d','e','f','g','','d','s','d','a','']
I want to create a sublist before ''
As a return I want a nested list like:
[['a','b','c'],['d','e','f','g'],['d','s','d','a']]
Try the following implementation
>>> def foo(inlist, delim = ''):
    start = 0
    try:
        while True:
            stop = inlist.index(delim, start)
            yield inlist[start:stop]
            start = stop + 1
    except ValueError:
        # if '' may not be the end delimiter
        if start < len(inlist):
            yield inlist[start:]
    return
>>> list(foo(inlist))
[['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['d', 's', 'd', 'a']]
Another possible implementation uses itertools.groupby, but then you have to filter the result to remove the [''] groups. Although it looks like a one-liner, the implementation above is more Pythonic because it is intuitive and readable.
>>> from itertools import ifilter, groupby
>>> list(ifilter(lambda e: '' not in e,
(list(v) for k,v in groupby(inlist, key = lambda e:e == ''))))
[['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['d', 's', 'd', 'a']]
I'd use itertools.groupby:
l = ['a','b','c','','d','e','f','g','','d','s','d','a','']
from itertools import groupby
[list(g) for k, g in groupby(l, bool) if k]
gives
[['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['d', 's', 'd', 'a']]
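To see why this works, it can help to look at the intermediate (key, group) pairs that groupby produces. The sketch below uses a shortened, made-up input list for readability.

```python
from itertools import groupby

items = ['a', 'b', '', 'c', '']

# bool('') is False and bool of any non-empty string is True, so consecutive
# items are grouped into runs of delimiters (False) and non-delimiters (True).
runs = [(key, list(group)) for key, group in groupby(items, bool)]
print(runs)

# Keeping only the True runs drops the delimiters.
nested = [group for key, group in runs if key]
print(nested)
```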
def nester(nput):
    out = [[]]
    for n in nput:
        if n == '':
            out.append([])
        else:
            out[-1].append(n)
    if out[-1] == []:
        out = out[:-1]
    return out
edited to add check for empty list at end
I have a list of items that I want to split based on a delimiter. I want all delimiters to be removed and the list to be split when a delimiter occurs twice. For example, if the delimiter is 'X', then the following list:
['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
Would turn into:
[['a', 'b'], ['c', 'd'], ['f', 'g']]
Notice that the last set is not split.
I've written some ugly code that does this, but I'm sure there is something nicer. Extra points if you can set an arbitrary length delimiter (i.e. split the list after seeing N delimiters).
I don't think there's going to be a nice, elegant solution to this (I'd love to be proven wrong of course) so I would suggest something straightforward:
def nSplit(lst, delim, count=2):
    output = [[]]
    delimCount = 0
    for item in lst:
        if item == delim:
            delimCount += 1
        elif delimCount >= count:
            output.append([item])
            delimCount = 0
        else:
            output[-1].append(item)
            delimCount = 0
    return output
>>> nSplit(['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], 'X', 2)
[['a', 'b'], ['c', 'd'], ['f', 'g']]
Here's a way to do it with itertools.groupby():
import itertools
class MultiDelimiterKeyCallable(object):
    def __init__(self, delimiter, num_wanted=1):
        self.delimiter = delimiter
        self.num_wanted = num_wanted
        self.num_found = 0
    def __call__(self, value):
        if value == self.delimiter:
            self.num_found += 1
            if self.num_found >= self.num_wanted:
                self.num_found = 0
                return True
        else:
            self.num_found = 0
def split_multi_delimiter(items, delimiter, num_wanted):
    keyfunc = MultiDelimiterKeyCallable(delimiter, num_wanted)
    return (list(item
                 for item in group
                 if item != delimiter)
            for key, group in itertools.groupby(items, keyfunc)
            if not key)
items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
print list(split_multi_delimiter(items, "X", 2))
I must say that cobbal's solution is much simpler for the same results.
Use a generator function to maintain state of your iterator through the list, and the count of the number of separator chars seen so far:
l = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
def splitOn(ll, x, n):
    cur = []
    splitcount = 0
    for c in ll:
        if c == x:
            splitcount += 1
            if splitcount == n:
                yield cur
                cur = []
                splitcount = 0
        else:
            cur.append(c)
            splitcount = 0
    yield cur
print list(splitOn(l, 'X', 2))
print list(splitOn(l, 'X', 1))
print list(splitOn(l, 'X', 3))
l += ['X','X']
print list(splitOn(l, 'X', 2))
print list(splitOn(l, 'X', 1))
print list(splitOn(l, 'X', 3))
prints:
[['a', 'b'], ['c', 'd'], ['f', 'g']]
[['a', 'b'], [], ['c', 'd'], [], ['f'], ['g']]
[['a', 'b', 'c', 'd', 'f', 'g']]
[['a', 'b'], ['c', 'd'], ['f', 'g'], []]
[['a', 'b'], [], ['c', 'd'], [], ['f'], ['g'], [], []]
[['a', 'b', 'c', 'd', 'f', 'g']]
EDIT: I'm also a big fan of groupby, here's my go at it:
from itertools import groupby
def splitOn(ll, x, n):
    cur = []
    for isdelim,grp in groupby(ll, key=lambda c:c==x):
        if isdelim:
            nn = sum(1 for c in grp)
            while nn >= n:
                yield cur
                cur = []
                nn -= n
        else:
            cur.extend(grp)
    yield cur
Not too different from my earlier answer, just lets groupby take care of iterating over the input list, creating groups of delimiter-matching and not-delimiter-matching characters. The non-matching characters just get added onto the current element, the matching character groups do the work of breaking up new elements. For long lists, this is probably a bit more efficient, as groupby does all its work in C, and still only iterates over the list once.
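The grouping step can be inspected on its own. This Python 3 sketch lists each run that groupby produces for the example input together with its length, which is the quantity the function compares against n; the variable names are illustrative.

```python
from itertools import groupby

items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']

# Each run is (is_delimiter, length). A split happens only where a
# delimiter run is at least n items long.
runs = [(is_delim, len(list(group)))
        for is_delim, group in groupby(items, key=lambda c: c == 'X')]
print(runs)
```

With n = 2, the two delimiter runs of length 2 trigger a split, while the single 'X' between 'f' and 'g' is simply dropped.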
a = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
b = [[b for b in q if b != 'X'] for q in "".join(a).split("".join(['X' for i in range(2)]))]
this gives
[['a', 'b'], ['c', 'd'], ['f', 'g']]
where the 2 is the number of consecutive delimiters to split on. There is most likely a better way to do this.
Very ugly, but I wanted to see if I could pull this off as a one-liner and I thought I would share. I beg you not to actually use this solution for anything of any importance though. The ('X', 2) at the end is the delimiter and the number of times it should be repeated.
(lambda delim, count: map(lambda x:filter(lambda y:y != delim, x), reduce(lambda x, y: (x[-1].append(y) if y != delim or x[-1][-count+1:] != [y]*(count-1) else x.append([])) or x, ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], [[]])))('X', 2)
EDIT
Here's a breakdown. I also eliminated some redundant code that was far more obvious when written out like this. (changed above also)
# Wrap everything in a lambda form to avoid repeating values
(lambda delim, count:
    # Filter all sublists after construction
    map(lambda x: filter(lambda y: y != delim, x), reduce(
        lambda x, y: (
            # Add the value to the current sub-list
            x[-1].append(y) if
            # but only if we have accumulated the
            # specified number of delimiters
            y != delim or x[-1][-count+1:] != [y]*(count-1) else
            # Start a new sublist
            x.append([])) or x,
        ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], [[]])
    )
)('X', 2)
Here's a clean nice solution using zip and generators
#1 define traditional sequence split function
#if you only want it for lists, you can use indexing to make it shorter
def split(it, x):
    to_yield = []
    for y in it:
        if x == y:
            yield to_yield
            to_yield = []
        else:
            to_yield.append(y)
    if to_yield:
        yield to_yield
#2 zip the sequence with its tail
#you could use itertools.chain to avoid creating unnecessary lists
zipped = zip(l, l[1:] + [''])
#3. remove ('X',not 'X')'s from the resulting sequence, and leave only the first position of each
# you can use list comprehension instead of generator expression
filtered = (x for x,y in zipped if not (x == 'X' and y != 'X'))
#4. split the result using traditional split
result = [x for x in split(filtered, 'X')]
This way split() is more reusable.
It's surprising python doesn't have one built in.
edit:
You can easily adjust it for longer split sequences, repeating steps 2-3 and zipping filtered with l[i:] for 0< i <= n.
import re
map(list, re.sub('(?<=[a-z])X(?=[a-z])', '', ''.join(lst)).split('XX'))
This does a list -> string -> list conversion and assumes that the non-delimiter characters are all lower case letters.
Here's another way of doing this:
def split_multi_delimiter(items, delimiter, num_wanted):
    def remove_delimiter(objs):
        return [obj for obj in objs if obj != delimiter]
    ranges = [(index, index+num_wanted) for index in xrange(len(items))
              if items[index:index+num_wanted] == [delimiter] * num_wanted]
    last_end = 0
    for range_start, range_end in ranges:
        yield remove_delimiter(items[last_end:range_start])
        last_end = range_end
    yield remove_delimiter(items[last_end:])
items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
print list(split_multi_delimiter(items, "X", 2))
In [6]: input = ['a', 'b', 'X', 'X', 'cc', 'XX', 'd', 'X', 'ee', 'X', 'X', 'f']
In [7]: [s.strip('_').split('_') for s in '_'.join(input).split('X_X')]
Out[7]: [['a', 'b'], ['cc', 'XX', 'd', 'X', 'ee'], ['f']]
This assumes you can use a reserved character such as _ which is not found in the input.
Too clever by half, and only offered because the obvious right way to do it seems so brute-force and ugly:
import itertools
class joiner(object):
    def __init__(self, N, data = (), gluing = False):
        self.data = data
        self.N = N
        self.gluing = gluing
    def __add__(self, to_glue):
        # Process an item from itertools.groupby, by either
        # appending the data to the last item, starting a new item,
        # or changing the 'gluing' state according to the number of
        # consecutive delimiters that were found.
        N = self.N
        data = self.data
        item = list(to_glue[1])
        # A chunk of delimiters;
        # return a copy of self with the appropriate gluing state.
        if to_glue[0]: return joiner(N, data, len(item) < N)
        # Otherwise, handle the gluing appropriately, and reset gluing state.
        a, b = (data[:-1], data[-1] if data else []) if self.gluing else (data, [])
        return joiner(N, a + (b + item,))
def split_on_multiple(data, delimiter, N):
    # Split the list into alternating groups of delimiters and non-delimiters,
    # then use the joiner to join non-delimiter groups when the intervening
    # delimiter group is short.
    return sum(itertools.groupby(data, delimiter.__eq__), joiner(N)).data
Regex, I choose you!
import re
def split_multiple(delimiter, input):
    pattern = ''.join(map(lambda x: ',' if x == delimiter else ' ', input))
    filtered = filter(lambda x: x != delimiter, input)
    result = []
    for k in map(len, re.split(';', ''.join(re.split(',',
                 ';'.join(re.split(',{2,}', pattern)))))):
        result.append([])
        for n in range(k):
            result[-1].append(next(filtered))
    return result
print(split_multiple('X',
['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']))
Oh, you said Python, not Perl.