I am a beginner at programming and I'm trying to figure out how list methods work. I wrote a tiny string scrambler and decoder for exercise purposes.
import random
sliced = []
keyholder = []
scrambled = []
decoded = []
def string_slicer(string):
    i = 0
    while i < len(string):
        sliced.append(string[i])
        i += 1
def string_scrambler(string):
    string = string_slicer(string)
    a = 0
    while len(scrambled) != len(sliced):
        value = len(sliced) - 1
        key = random.randint(0,value)
        if key in keyholder:
            continue
        else:
            scrambled.append(sliced[key])
            keyholder.append(key)
            continue
def string_decoder():
    x = 0
    for item in keyholder:
        decoded.insert(keyholder[x], scrambled[x])
        x += 1
string_scrambler('merhaba')
string_decoder()
print sliced
print keyholder
print scrambled
print decoded
When I test it, string_scrambler() works fine but string_decoder() gives random results. Here are some examples:
C:\Python27\Exercises>python scrambler.py
['m', 'e', 'r', 'h', 'a', 'b', 'a']
[2, 6, 0, 1, 3, 5, 4]
['r', 'a', 'm', 'e', 'h', 'b', 'a']
['m', 'e', 'r', 'h', 'a', 'a', 'b']
C:\Python27\Exercises>python scrambler.py
['m', 'e', 'r', 'h', 'a', 'b', 'a']
[4, 5, 1, 0, 3, 2, 6]
['a', 'b', 'e', 'm', 'h', 'r', 'a']
['m', 'a', 'r', 'e', 'h', 'b', 'a']
C:\Python27\Exercises>python scrambler.py
['m', 'e', 'r', 'h', 'a', 'b', 'a']
[1, 4, 5, 2, 3, 0, 6]
['e', 'a', 'b', 'r', 'h', 'm', 'a']
['m', 'e', 'a', 'r', 'h', 'b', 'a']
I think adding items to an empty list with the .insert method may be causing this problem, but I can't figure out exactly why.
Note that a lot of your functions aren't necessary at all.
>>> list("some string")
['s', 'o', 'm', 'e', ' ', 's', 't', 'r', 'i', 'n', 'g']
# just like your `string_slicer` function.
Notably the problem with your approach is that you might try to do, for instance:
>>> lst = []
>>> lst.insert(3, "after")
>>> lst.insert(2, "before")
>>> lst
["after", "before"]
Since the list is initially length zero, inserting past the end point just sends it to the end of the list. Even though 3 is a more distant index than 2, it doesn't order correctly since essentially you've done
lst.append("after")
lst.append("before")
Instead, you could do something like:
scrambled = [''] * len(sliced)
# build a list of the same length as the cleartext sliced string
for idx, dest in enumerate(keyholder):
    scrambled[dest] = sliced[idx]
Then to descramble, do the opposite
deciphered = [''] * len(scrambled)
for idx, dest in enumerate(keyholder):
    deciphered[idx] = scrambled[dest]
The full solution I'd use, including some other tricks, is:
import random
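def make_key(lst):
    # random.shuffle works in place and returns None,
    # so build the index list first and return it afterwards
    key = list(range(len(lst)))
    random.shuffle(key)
    return key
def scramble(lst, key):
    result = [''] * len(lst)
    for idx, dst in enumerate(key):
        result[dst] = lst[idx]
    return result
def unscramble(scrambled, key):
    return [scrambled[idx] for idx in key]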
s = "merhaba"
key = make_key(list(s))
scrambled = scramble(list(s), key)
deciphered = unscramble(scrambled, key)
print(list(s))
print(key)
print(scrambled)
print(deciphered)
N.B. this removes every single list method you were trying to learn in the first place! You should notice this, because it points to something useful: list methods that work in the middle of a list (insert, remove, pop with an index) have to shift every later element and so get slow on long lists, while append and pop from the end are fast. Avoid the slow ones if another equally readable solution exists.
I think a list is not the best data structure for decoded.
I'd use a dict as a temporary variable; here is my version of string_decoder:
_decoded = dict() # changed
def string_decoder():
    x = 0
    for item in keyholder:
        _decoded[keyholder[x]] = scrambled[x] #changed
        x += 1
    return [value for key, value in sorted(_decoded.items())] #changed
decoded = string_decoder()
BTW, you have issues with list.insert() because you insert values at positions that do not exist yet, e.g. inserting at index 4 in a list of 2 elements.
Example of the behavior:
>>> decoded = []
>>> decoded.insert(100, 'b')
>>> decoded
['b']
>>> decoded.insert(99, 'a')
>>> decoded
['b', 'a'] # according to your code, you expect ['a', 'b'] because 99 is less than 100, but the list has not enough entries. So, the item is just appended to the end
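The clamping behaviour shown above suggests a simpler decoder: pre-size the list and assign by index instead of inserting. The following is only a sketch, assuming the same key convention as the question (key[i] is the original position of scrambled[i]); the function names are illustrative and not taken from the original code.

```python
import random

def scramble(text):
    # key[i] records which original position scrambled[i] came from
    key = list(range(len(text)))
    random.shuffle(key)          # shuffles in place and returns None
    scrambled = [text[k] for k in key]
    return scrambled, key

def decode(scrambled, key):
    # Pre-size the result so every target index already exists
    decoded = [None] * len(scrambled)
    for char, position in zip(scrambled, key):
        decoded[position] = char
    return decoded

scrambled, key = scramble("merhaba")
print("".join(decode(scrambled, key)))   # prints merhaba
```

Because every index is assigned exactly once, the order in which the key is walked no longer matters.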
Say I have a list of options and I want to pick a certain number randomly.
In my case, say the options are in a list ['a', 'b', 'c', 'd', 'e'] and I want my script to return 3 elements.
However, there is also the case of two options that cannot appear at the same time. That is, if option 'a' is picked randomly, then option 'b' cannot be picked. And the same applies the other way round.
So valid outputs are: ['a', 'c', 'd'] or ['c', 'd', 'b'], while things like ['a', 'b', 'c'] would not because they contain both 'a' and 'b'.
To fulfil these requirements, I am fetching 3 options plus another one to compensate a possible discard. Then, I keep a set() with the mutually exclusive condition and keep removing from it and check if both elements have been picked or not:
import random
mutually_exclusive = set({'a', 'b'})
options = ['a', 'b', 'c', 'd', 'e']
num_options_to_return = 3
shuffled_options = random.sample(options, num_options_to_return + 1)
elements_returned = 0
for item in shuffled_options:
    if elements_returned >= num_options_to_return:
        break
    if item in mutually_exclusive:
        mutually_exclusive.remove(item)
        if not mutually_exclusive:
            # if both elements have appeared, then the set is empty so we cannot return the current value
            continue
    print(item)
    elements_returned += 1
However, I may be overcoding and Python may have better ways to handle these requirements. Going through random's documentation I couldn't find ways to do this out of the box. Is there a better solution than my current one?
One way to do this is use itertools.combinations to create all of the possible results, filter out the invalid ones and make a random.choice from that:
>>> from itertools import combinations
>>> from random import choice
>>> def is_valid(t):
...     return 'a' not in t or 'b' not in t
...
>>> choice([
...     t
...     for t in combinations('abcde', 3)
...     if is_valid(t)
... ])
...
('c', 'd', 'e')
Maybe a bit naive, but you could generate samples until your condition is met:
import random
options = ['a', 'b', 'c', 'd', 'e']
num_options_to_return = 3
mutually_exclusive = set({'a', 'b'})
while True:
    shuffled_options = random.sample(options, num_options_to_return)
    # accept the sample unless it contains both mutually exclusive items
    if not mutually_exclusive.issubset(shuffled_options):
        break
print(shuffled_options)
You can restructure your options.
import random
options = [('a', 'b'), 'c', 'd', 'e']
n_options = 3
selected_option = random.sample(options, n_options)
result = [item if not isinstance(item, tuple) else random.choice(item)
for item in selected_option]
print(result)
I would implement it with sets:
import random
mutually_exclusive = {'a', 'b'}
options = ['a', 'b', 'c', 'd', 'e']
num_options_to_return = 3
while True:
    s = random.sample(options, num_options_to_return)
    print('Sample is', s)
    if not mutually_exclusive.issubset(s):
        break
    print('Discard!')
print('Final sample:', s)
Prints (for example):
Sample is ['a', 'b', 'd']
Discard!
Sample is ['b', 'a', 'd']
Discard!
Sample is ['e', 'a', 'c']
Final sample: ['e', 'a', 'c']
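The retry loop generalizes to several exclusive groups. The sketch below is one possible way to wrap it; `sample_with_exclusions` and its parameters are hypothetical names, and it assumes at least one valid sample exists (otherwise the loop never terminates).

```python
import random

def sample_with_exclusions(options, k, exclusive_groups):
    # Each group is a collection of items of which at most one may be picked
    groups = [set(group) for group in exclusive_groups]
    while True:
        picked = random.sample(options, k)
        # Accept only if no group contributes more than one item
        if all(len(group.intersection(picked)) <= 1 for group in groups):
            return picked

print(sample_with_exclusions(['a', 'b', 'c', 'd', 'e'], 3, [('a', 'b')]))
```

Because a rejected draw is simply retried, every valid selection stays equally likely.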
I created the below function and I think it's worth sharing it too ;-)
import random
def random_picker(options, n, mutually_exclusives=None):
    if mutually_exclusives is None:
        return random.sample(options, n)
    elif any(len(pair) != 2 for pair in mutually_exclusives):
        raise ValueError('Length of each pair in mutually_exclusives must be 2')
    res = []
    while len(res) < n:
        item_index = random.randint(0, len(options) - 1)
        item = options[item_index]
        # skip the item if its mutually exclusive partner was already picked
        if any(item == exc and pair[-(i - 1)] in res for pair in mutually_exclusives
               for i, exc in enumerate(pair)):
            continue
        # note: pop() removes the picked item from the caller's list
        res.append(options.pop(item_index))
    return res
Where:
options is the list of available options to pick from (picked items are removed from it when mutually_exclusives is given).
n is the number of items you want to be picked from options.
mutually_exclusives is an iterable of 2-tuples, each holding a pair of mutually exclusive items.
You can use it as follows:
>>> random_picker(['a', 'b', 'c', 'd', 'e'], 3)
['c', 'e', 'a']
>>> random_picker(['a', 'b', 'c', 'd', 'e'], 3, [('a', 'b')])
['d', 'b', 'e']
>>> random_picker(['a', 'b', 'c', 'd', 'e'], 3, [('a', 'b'), ('a', 'c')])
['e', 'd', 'a']
import random
l = [['a','b'], ['c'], ['d'], ['e']]
x = [random.choice(i) for i in random.sample(l,3)]
Here l is a two-dimensional list, where the first level reflects an AND relation and the second level an OR relation.
I am trying to understand how to write a function that replaces duplicate strings in a list of strings. For example, I want to convert this list
mylist = ['a', 'b', 'b', 'a', 'c', 'a']
to this
mylist = ['a', 'b', 'x', 'x', 'c', 'x']
Initially, I know I need to create my function and iterate through the list:
def replace(foo):
    newlist= []
    for i in foo:
        if foo[i] == foo[i+1]:
            foo[i].replace('x')
    return foo
However, I know there are two problems with this. The first is that I get an error stating
list indices must be integers or slices, not str
so I believe I should instead be operating on the range of this list, but I'm not sure how to implement it. The other is that this would only help if the duplicate letter comes directly after the current element (i).
Unfortunately, that's as far as my understanding of the problem reaches. If anyone can provide some clarification on this procedure for me, I would be very grateful.
Go through the list, and keep track of what you've seen in a set. Replace things you've seen before in the list with 'x':
mylist = ['a', 'b', 'b', 'a', 'c', 'a']
seen = set()
for i, e in enumerate(mylist):
    if e in seen:
        mylist[i] = 'x'
    else:
        seen.add(e)
print(mylist)
# ['a', 'b', 'x', 'x', 'c', 'x']
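For context on the error in the question: `for i in foo` yields the elements themselves, so `foo[i]` tries to index the list with a string. A minimal sketch of the difference (the variable names are illustrative only):

```python
foo = ['a', 'b', 'b']

# Iterating a list directly yields its elements, not positions
elements = [item for item in foo]

# Indexing with an element (a str) raises the TypeError from the question
try:
    foo[elements[0]]
except TypeError as exc:
    message = str(exc)

# enumerate() yields (index, element) pairs, which allows in-place updates
pairs = list(enumerate(foo))
print(pairs)   # [(0, 'a'), (1, 'b'), (2, 'b')]
```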
Simple Solution.
my_list = ['a', 'b', 'b', 'a', 'c', 'a']
new_list = []
for i in range(len(my_list)):
    if my_list[i] in new_list:
        new_list.append('x')
    else:
        new_list.append(my_list[i])
print(my_list)
print(new_list)
# output
#['a', 'b', 'b', 'a', 'c', 'a']
#['a', 'b', 'x', 'x', 'c', 'x']
The other solutions use indexing, which isn't necessarily required.
Really simply, you could check if the value is in the new list, else you can append x. If you wanted to use a function:
old = ['a', 'b', 'b', 'a', 'c']
def replace_dupes_with_x(l):
    tmp = list()
    for char in l:
        if char in tmp:
            tmp.append('x')
        else:
            tmp.append(char)
    return tmp
new = replace_dupes_with_x(old)
You can use the following solution:
from collections import defaultdict
mylist = ['a', 'b', 'b', 'a', 'c', 'a']
ret, appear = [], defaultdict(int)
for c in mylist:
    appear[c] += 1
    ret.append(c if appear[c] == 1 else 'x')
Which will give you:
['a', 'b', 'x', 'x', 'c', 'x']
I have two arrays of objects:
a = ['a', 'b', 'c', 'd', 'e', 'f', 'e', 'f']
b = ['a', 'b', 'd', 'f', 'e', 'f']
I would like to identify the repeated patterns of more than one object and their occurrences like
['a', 'b']: 2
['e', 'f']: 3
['f', 'e', 'f']: 2
The first sequence ['a', 'b'] appeared once in a and once in b, so total count 2. The 2nd sequence ['e', 'f'] appeared twice in a, once in b, so total 3. The 3rd sequence ['f', 'e', 'f'] appeared once in a, and once in b, so total 2.
Is there a good way to do this in Python?
Also the universe of objects is limited. Was wondering if there's an efficient solution that utilizes hash table?
If the approach is only for two lists, the following approach should work. I am not sure if this is the most efficient solution though.
A nice description of finding n-grams is given in this blog post.
This approach takes a minimum sequence length and tries every length from that minimum up to the full length of each list.
We collect all the sequences of each list, combine the two collections, and count how often every sequence occurs.
Finally we return a dictionary of all the sequences that occur more than once.
def find_repeating(list_a, list_b):
    min_len = 2
    def find_ngrams(input_list, n):
        return zip(*[input_list[i:] for i in range(n)])
    seq_list_a = []
    for seq_len in range(min_len, len(list_a) + 1):
        seq_list_a += [val for val in find_ngrams(list_a, seq_len)]
    seq_list_b = []
    for seq_len in range(min_len, len(list_b) + 1):
        seq_list_b += [val for val in find_ngrams(list_b, seq_len)]
    all_sequences = seq_list_a + seq_list_b
    counter = {}
    for seq in all_sequences:
        counter[seq] = counter.get(seq, 0) + 1
    filtered_counter = {k: v for k, v in counter.items() if v > 1}
    return filtered_counter
Do let me know if you are unsure about anything.
>>> list_a = ['a', 'b', 'c', 'd', 'e', 'f', 'e', 'f']
>>> list_b = ['a', 'b', 'd', 'f', 'e', 'f']
>>> print find_repeating(list_a, list_b)
{('f', 'e'): 2, ('e', 'f'): 3, ('f', 'e', 'f'): 2, ('a', 'b'): 2}
When you mentioned that you were looking for an efficient solution, my first thought was of the approaches to solving the longest common subsequence problem. But in your case, we actually do need to enumerate all common subsequences so that we can count them, so a dynamic programming solution will not do. Here's my solution. It's certainly shorter than SSSINISTER's solution (mostly because I use the collections.Counter class).
#!/usr/bin/env python3
def find_repeating(sequence_a, sequence_b, min_len=2):
    from collections import Counter
    # Find all subsequences
    subseq_a = [tuple(sequence_a[start:stop]) for start in range(len(sequence_a)-min_len+1)
                for stop in range(start+min_len,len(sequence_a)+1)]
    subseq_b = [tuple(sequence_b[start:stop]) for start in range(len(sequence_b)-min_len+1)
                for stop in range(start+min_len,len(sequence_b)+1)]
    # Find common subsequences
    common = set(tup for tup in subseq_a if tup in subseq_b)
    # Count common subsequences
    return Counter(tup for tup in (subseq_a + subseq_b) if tup in common)
Resulting in ...
>>> list_a = ['a', 'b', 'c', 'd', 'e', 'f', 'e', 'f']
>>> list_b = ['a', 'b', 'd', 'f', 'e', 'f']
>>> print(find_repeating(list_a, list_b))
Counter({('e', 'f'): 3, ('f', 'e'): 2, ('a', 'b'): 2, ('f', 'e', 'f'): 2})
The advantage to using collections.Counter is that not only do you not need to produce the actual code to iterate and count, you get access to all of the dict methods as well as a few specialized methods for using those counts.
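As an illustration of those specialized methods, here is a small sketch that counts only the length-2 windows of both lists. The helper `ngrams` is a hypothetical name introduced for this example. Counters can be added together, and `most_common` returns the highest counts first.

```python
from collections import Counter

def ngrams(seq, n):
    # All contiguous windows of length n, as hashable tuples
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

list_a = ['a', 'b', 'c', 'd', 'e', 'f', 'e', 'f']
list_b = ['a', 'b', 'd', 'f', 'e', 'f']

# Adding two Counters sums the counts key by key
counts = Counter(ngrams(list_a, 2)) + Counter(ngrams(list_b, 2))
print(counts.most_common(1))   # [(('e', 'f'), 3)]
```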
I have a list in a for loop and it uses itertools.product() to find different combinations of letters. I want to use collections.Counter() to count the number of occurrences of an item, however, right now it prints all the different combinations of "A"'s and "G"'s:
['a', 'A', 'G', 'G']
['a', 'A', 'G', 'g']
['a', 'A', 'G', 'G']
['a', 'A', 'G', 'g']
['a', 'A', 'G', 'g']
#...
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'g']
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'G']
['a', 'G', 'a', 'G']
#...
['a', 'G', 'a', 'G']
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'g']
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'G']
#...
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'G']
['a', 'G', 'a', 'G']
# etc.
Now, this isn't all of them, but as you can see, there are some occurrences that are the same although ordered differently, for example:
['a', 'G', 'A', 'G']
['a', 'A', 'G', 'G']
I would much prefer the latter ordering, so I want to find a way to print all of the combinations with capital letters before lower case, and because 'a' is before 'g', also alphabetically. The final product should look like ['AaGG', 'aaGg', etc]. What function or functions should I use?
This is the code that generates the data. The section marked "Counting" is what I'm having trouble with.
import itertools
from collections import Counter
parent1 = 'aaGG'
parent2 = 'AaGg'
f1 = []
f1_ = []
genotypes = []
b = []
genetics = []
g = []
idx = []
parent1 = list(itertools.combinations(parent1, 2))
del parent1[0]
del parent1[4]
parent2 = list(itertools.combinations(parent2, 2))
del parent2[0]
del parent2[4]
for x in parent1:
    f1.append(''.join(x))
for x in parent2:
    f1_.append(''.join(x))
y = list(itertools.product(f1, f1_))
for x in y:
    genotypes.append(''.join(x))
    break
genotypes = [
    thingies[0][0] + thingies[1][0] + thingies[0][1] + thingies[1][1]
    for thingies in zip(parent1, parent2)
] * 4
print 'F1', Counter(genotypes)
# Counting
for genotype in genotypes:
    alleles = list(itertools.combinations(genotype,2))
    del alleles[1]
    del alleles[3]
    for x in alleles:
        g.append(''.join(x))
    for idx in g:
        if idx.lower().count("a") == idx.lower().count("g") == 1:
            break
f2 = list(itertools.product(g, g))
for x in f2:
    genetics.append(''.join(x))
for genes in genetics:
    if genes.lower().count("a") == genes.lower().count("g") == 2:
        genes = ''.join(genes)
print Counter(genes)
I think you're looking for a customized way to define precedence; the lists are currently being ordered by ASCII numbering, which defines uppercase letters as always preceding lowercase letters. I would define customized precedence using a dictionary:
>>> test_list = ['a', 'A', 'g', 'G']
>>> precedence_dict = {'A':0, 'a':1, 'G':2,'g':3}
>>> test_list.sort(key=lambda x: precedence_dict[x])
>>> test_list
['A', 'a', 'G', 'g']
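The same precedence table can normalize whole genotype strings before counting them. This is only a sketch, written for Python 3 and using made-up sample data; `normalize` is a hypothetical helper, not part of the question's code.

```python
from collections import Counter

PRECEDENCE = {'A': 0, 'a': 1, 'G': 2, 'g': 3}

def normalize(genotype):
    # Reorder the letters so equivalent genotypes compare equal
    return ''.join(sorted(genotype, key=PRECEDENCE.__getitem__))

samples = ['aGAG', 'aAGG', 'aGag', 'agaG']   # made-up input
counts = Counter(normalize(s) for s in samples)
print(counts)
```

Once every string is in the same canonical order, Counter treats differently ordered spellings of one genotype as a single key.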
Edit:
Your last few lines:
for genes in genetics:
    if genes.lower().count("a") == genes.lower().count("g") == 2:
        genes = ''.join(genes)
print Counter(genes)
were not doing what you wanted them to.
Replace those lines with:
precedence_dict = {'A':0, 'a':1, 'G':2,'g':3}
for i in xrange(len(genetics)):
    genetics[i] = list(genetics[i])
    genetics[i].sort(key=lambda x: precedence_dict[x])
    genetics[i] = ''.join(genetics[i])
# the built-in set replaces the deprecated sets.Set
genetics = list(set(genetics))
genetics.sort()
print genetics
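and I think you have the correct solution. When you iterate over a list in a for loop, the loop variable is just a name bound to each element in turn; assigning to it rebinds the name and leaves the list unchanged (and strings are immutable anyway). So the string 'genes' was never actually modified in the original list.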
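A small sketch of the effect described above, written for Python 3 with made-up data: assigning to the loop variable leaves the list as it was, while assigning through the index updates it.

```python
genetics = ['ga', 'Ag']   # made-up sample data

# Assigning to the loop variable only rebinds the name `genes`
for genes in genetics:
    genes = ''.join(sorted(genes))
after_rebinding = list(genetics)

# Assigning through the index replaces the element in the list
for i, genes in enumerate(genetics):
    genetics[i] = ''.join(sorted(genes))

print(after_rebinding)   # ['ga', 'Ag']
print(genetics)          # ['ag', 'Ag']
```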
I know you didn't ask for a code review, but you might be better off just generating the strings in the order you want in the first place instead of trying to filter them afterwards. Something like this might work.
def cross(parent1, parent2):
    out = []
    alleles = len(parent1)/2
    # iterate parent 1 possible genotypes
    for i in range(2):
        # iterate loci
        for k in range(alleles):
            child = []
            # iterate parent 2 possible genotypes
            for j in range(2):
                p1 = parent1[j * 2 + i]
                p2 = parent2[j * 2 + k]
                c = [p1, p2]
                # get each genotype pair into capitalization order
                c.sort()
                c.reverse()
                child += c
            out.append("".join(child))
    return out
if __name__ == "__main__":
    parent1 = 'aaGG'
    parent2 = 'AaGg'
    # F1
    f1 = cross(parent1, parent2)
    print f1
    # F2
    f2 = []
    for p1 in f1:
        for p2 in f1:
            f2 += cross(p1, p2)
    print f2
Here's one way to get all combinations from a single parent. Start with the empty string and add the possibilities one by one.
def get_all_combos(allele_pair, gametes):
    # Take a list of genotypes. Return an updated list with each possibility from an allele pair
    updated_gametes = []
    for z in gametes:
        updated_gametes.append(z + allele_pair[0])
        updated_gametes.append(z + allele_pair[1])
    return updated_gametes
if __name__ == "__main__":
    parent1 = 'aaGG'
    parent2 = 'AaGg'
    alleles = len(parent2)/2
    gametes = [""]
    for a in range(alleles):
        allele_pair = parent2[a*2:a*2+2]
        gametes = get_all_combos(allele_pair, gametes)
    print gametes
Maybe you can figure out how to combine these two solutions to get what you want.
You can try using the sort function.
Example of what I mean:
parent1 = "absdksakjcvjvugoh"
parent1sorted = list(parent1)
parent1sorted.sort()
print (parent1sorted)
The result you get is this : ['a', 'a', 'b', 'c', 'd', 'g', 'h', 'j', 'j', 'k', 'k', 'o', 's', 's', 'u', 'v', 'v']
Does this help you?
tldr:
Convert string into list, Sort list
I have a list of items that I want to split based on a delimiter. I want all delimiters to be removed and the list to be split when a delimiter occurs twice. For example, if the delimiter is 'X', then the following list:
['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
Would turn into:
[['a', 'b'], ['c', 'd'], ['f', 'g']]
Notice that the last set is not split.
I've written some ugly code that does this, but I'm sure there is something nicer. Extra points if you can set an arbitrary length delimiter (i.e. split the list after seeing N delimiters).
I don't think there's going to be a nice, elegant solution to this (I'd love to be proven wrong of course) so I would suggest something straightforward:
def nSplit(lst, delim, count=2):
    output = [[]]
    delimCount = 0
    for item in lst:
        if item == delim:
            delimCount += 1
        elif delimCount >= count:
            output.append([item])
            delimCount = 0
        else:
            output[-1].append(item)
            delimCount = 0
    return output
>>> nSplit(['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], 'X', 2)
[['a', 'b'], ['c', 'd'], ['f', 'g']]
Here's a way to do it with itertools.groupby():
import itertools
class MultiDelimiterKeyCallable(object):
    def __init__(self, delimiter, num_wanted=1):
        self.delimiter = delimiter
        self.num_wanted = num_wanted
        self.num_found = 0
    def __call__(self, value):
        if value == self.delimiter:
            self.num_found += 1
            if self.num_found >= self.num_wanted:
                self.num_found = 0
                return True
        else:
            self.num_found = 0
def split_multi_delimiter(items, delimiter, num_wanted):
    keyfunc = MultiDelimiterKeyCallable(delimiter, num_wanted)
    return (list(item
                 for item in group
                 if item != delimiter)
            for key, group in itertools.groupby(items, keyfunc)
            if not key)
items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
print list(split_multi_delimiter(items, "X", 2))
I must say that cobbal's solution is much simpler for the same results.
Use a generator function to maintain state of your iterator through the list, and the count of the number of separator chars seen so far:
l = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
def splitOn(ll, x, n):
    cur = []
    splitcount = 0
    for c in ll:
        if c == x:
            splitcount += 1
            if splitcount == n:
                yield cur
                cur = []
                splitcount = 0
        else:
            cur.append(c)
            splitcount = 0
    yield cur
print list(splitOn(l, 'X', 2))
print list(splitOn(l, 'X', 1))
print list(splitOn(l, 'X', 3))
l += ['X','X']
print list(splitOn(l, 'X', 2))
print list(splitOn(l, 'X', 1))
print list(splitOn(l, 'X', 3))
prints:
[['a', 'b'], ['c', 'd'], ['f', 'g']]
[['a', 'b'], [], ['c', 'd'], [], ['f'], ['g']]
[['a', 'b', 'c', 'd', 'f', 'g']]
[['a', 'b'], ['c', 'd'], ['f', 'g'], []]
[['a', 'b'], [], ['c', 'd'], [], ['f'], ['g'], [], []]
[['a', 'b', 'c', 'd', 'f', 'g']]
EDIT: I'm also a big fan of groupby, here's my go at it:
from itertools import groupby
def splitOn(ll, x, n):
    cur = []
    for isdelim,grp in groupby(ll, key=lambda c:c==x):
        if isdelim:
            nn = sum(1 for c in grp)
            while nn >= n:
                yield cur
                cur = []
                nn -= n
        else:
            cur.extend(grp)
    yield cur
Not too different from my earlier answer, just lets groupby take care of iterating over the input list, creating groups of delimiter-matching and not-delimiter-matching characters. The non-matching characters just get added onto the current element, the matching character groups do the work of breaking up new elements. For long lists, this is probably a bit more efficient, as groupby does all its work in C, and still only iterates over the list once.
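To make the grouping step concrete, here is a minimal Python 3 sketch that only lists the runs groupby produces for the sample input; it does no splitting, and the variable names are illustrative.

```python
from itertools import groupby

items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']

# Each run of consecutive equal keys becomes one group:
# (is this a delimiter run?, how many items are in the run)
runs = [(is_delim, len(list(group)))
        for is_delim, group in groupby(items, key=lambda c: c == 'X')]
print(runs)
```

Delimiter runs of length 2 mark the split points, while the single 'X' between 'f' and 'g' forms a run of length 1 and is dropped without splitting.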
a = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
b = [[b for b in q if b != 'X'] for q in "".join(a).split("".join(['X' for i in range(2)]))]
this gives
[['a', 'b'], ['c', 'd'], ['f', 'g']]
where the 2 is the number of consecutive delimiters that trigger a split. Note that this only works when every item is a single-character string. There is most likely a better way to do this.
Very ugly, but I wanted to see if I could pull this off as a one-liner and I thought I would share. I beg you not to actually use this solution for anything of any importance though. The ('X', 2) at the end is the delimiter and the number of times it should be repeated.
(lambda delim, count: map(lambda x:filter(lambda y:y != delim, x), reduce(lambda x, y: (x[-1].append(y) if y != delim or x[-1][-count+1:] != [y]*(count-1) else x.append([])) or x, ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], [[]])))('X', 2)
EDIT
Here's a breakdown. I also eliminated some redundant code that was far more obvious when written out like this. (changed above also)
# Wrap everything in a lambda form to avoid repeating values
(lambda delim, count:
    # Filter all sublists after construction
    map(lambda x: filter(lambda y: y != delim, x), reduce(
        lambda x, y: (
            # Add the value to the current sub-list
            x[-1].append(y) if
            # unless we have already accumulated the
            # specified number of delimiters
            y != delim or x[-1][-count+1:] != [y]*(count-1) else
            # Start a new sublist
            x.append([])) or x,
        ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], [[]])
    )
)('X', 2)
Here's a clean nice solution using zip and generators
#1 define traditional sequence split function
#if you only want it for lists, you can use indexing to make it shorter
def split(it, x):
    to_yield = []
    for y in it:
        if x == y:
            yield to_yield
            to_yield = []
        else:
            to_yield.append(y)
    if to_yield:
        yield to_yield
#2 zip the sequence with its tail
#you could use itertools.chain to avoid creating unnecessary lists
zipped = zip(l, l[1:] + [''])
#3. remove ('X',not 'X')'s from the resulting sequence, and leave only the first position of each
# you can use list comprehension instead of generator expression
filtered = (x for x,y in zipped if not (x == 'X' and y != 'X'))
#4. split the result using traditional split
result = [x for x in split(filtered, 'X')]
This way split() is more reusable.
It's surprising Python doesn't have one built in.
edit:
You can easily adjust it for longer split sequences, repeating steps 2-3 and zipping filtered with l[i:] for 0< i <= n.
import re
map(list, re.sub('(?<=[a-z])X(?=[a-z])', '', ''.join(lst)).split('XX'))
This does a list -> string -> list conversion and assumes that the non-delimiter characters are all lower case letters.
Here's another way of doing this:
def split_multi_delimiter(items, delimiter, num_wanted):
    def remove_delimiter(objs):
        return [obj for obj in objs if obj != delimiter]
    ranges = [(index, index+num_wanted) for index in xrange(len(items))
              if items[index:index+num_wanted] == [delimiter] * num_wanted]
    last_end = 0
    for range_start, range_end in ranges:
        yield remove_delimiter(items[last_end:range_start])
        last_end = range_end
    yield remove_delimiter(items[last_end:])
items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
print list(split_multi_delimiter(items, "X", 2))
In [6]: input = ['a', 'b', 'X', 'X', 'cc', 'XX', 'd', 'X', 'ee', 'X', 'X', 'f']
In [7]: [s.strip('_').split('_') for s in '_'.join(input).split('X_X')]
Out[7]: [['a', 'b'], ['cc', 'XX', 'd', 'X', 'ee'], ['f']]
This assumes you can use a reserved character such as _ which is not found in the input.
Too clever by half, and only offered because the obvious right way to do it seems so brute-force and ugly:
import itertools
class joiner(object):
    def __init__(self, N, data = (), gluing = False):
        self.data = data
        self.N = N
        self.gluing = gluing
    def __add__(self, to_glue):
        # Process an item from itertools.groupby, by either
        # appending the data to the last item, starting a new item,
        # or changing the 'gluing' state according to the number of
        # consecutive delimiters that were found.
        N = self.N
        data = self.data
        item = list(to_glue[1])
        # A chunk of delimiters;
        # return a copy of self with the appropriate gluing state.
        if to_glue[0]: return joiner(N, data, len(item) < N)
        # Otherwise, handle the gluing appropriately, and reset gluing state.
        a, b = (data[:-1], data[-1] if data else []) if self.gluing else (data, [])
        return joiner(N, a + (b + item,))
def split_on_multiple(data, delimiter, N):
    # Split the list into alternating groups of delimiters and non-delimiters,
    # then use the joiner to join non-delimiter groups when the intervening
    # delimiter group is short.
    return sum(itertools.groupby(data, delimiter.__eq__), joiner(N)).data
Regex, I choose you!
import re
def split_multiple(delimiter, input):
    pattern = ''.join(map(lambda x: ',' if x == delimiter else ' ', input))
    filtered = filter(lambda x: x != delimiter, input)
    result = []
    for k in map(len, re.split(';', ''.join(re.split(',',
            ';'.join(re.split(',{2,}', pattern)))))):
        result.append([])
        for n in range(k):
            result[-1].append(next(filtered))
    return result
print(split_multiple('X',
['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']))
Oh, you said Python, not Perl.