How to split this string with python? - python

I have strings that look like this example:
"AAABBBCDEEEEBBBAA"
Any character is possible in the string.
I want to split it to a list like:
['AAA','BBB','C','D','EEEE','BBB','AA']
so that every continuous stretch of the same character becomes a separate element of the resulting list.
I know that I can iterate over the characters in the string and compare every i and i-1 pair to see whether they hold the same character, but is there a simpler solution out there?

We could use Regex:
>>> import re
>>> r = re.compile(r'(.)\1*')
>>> [m.group() for m in r.finditer('AAABBBCDEEEEBBBAA')]
['AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']
Alternatively, we could use itertools.groupby.
>>> import itertools
>>> [''.join(g) for k, g in itertools.groupby('AAABBBCDEEEEBBBAA')]
['AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']
timeit shows that the regex is faster (for this particular string, on Python 2.6 and Python 3.1). But regexes are specialized for strings, whereas groupby is a generic function, so this is not so unexpected.
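For anyone who wants to repeat the comparison, here is a minimal timing sketch. The function names `split_regex` and `split_groupby` and the run count are my own choices, not from the answer above, and the numbers will differ between machines and Python versions. One assumption to flag: I added `re.DOTALL` so that `.` also matches newlines, since the question says any character is possible.

```python
import itertools
import re
import timeit

SAMPLE = "AAABBBCDEEEEBBBAA"

# One character followed by zero or more repeats of that same character.
# re.DOTALL is an addition here so that runs of newlines are matched too.
RUNS = re.compile(r"(.)\1*", re.DOTALL)


def split_regex(s):
    """Split s into runs of identical characters using the regex."""
    return [m.group() for m in RUNS.finditer(s)]


def split_groupby(s):
    """Split s into runs of identical characters using itertools.groupby."""
    return ["".join(group) for _, group in itertools.groupby(s)]


if __name__ == "__main__":
    for func in (split_regex, split_groupby):
        seconds = timeit.timeit(lambda: func(SAMPLE), number=100_000)
        print(f"{func.__name__}: {seconds:.3f} s for 100,000 calls")
```

Both functions should return the same list for any input, so the timing is the only difference being measured.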

>>> from itertools import groupby
>>> [''.join(g) for k, g in groupby('AAAABBBCCD')]
['AAAA', 'BBB', 'CC', 'D']
And by normal string manipulation
>>> a=[];S="";p=""
>>> s
'AAABBBCDEEEEBBBAA'
>>> for c in s:
...     if c != p: a.append(S);S=""
...     S=S+c
...     p=c
...
>>> a.append(S)
>>> a
['', 'AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']
>>> list(filter(None,a))
['AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']

import itertools
s = "AAABBBCDEEEEBBBAA"
["".join(chars) for _, chars in itertools.groupby(s)]

Just another way of solving your problem:
#!/usr/bin/python
string = 'AAABBBCDEEEEBBBAA'
memory = str()
List = list()
for index, element in enumerate(string):
    if index > 0:
        if string[index] == string[index - 1]:
            memory += string[index]
        else:
            List.append(memory)
            memory = element
    else:
        memory += element
# Store the final run, which the loop itself never appends
if memory:
    List.append(memory)
print(List)
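The manual loop can also be wrapped in a small reusable function. This is only a sketch; the name `split_runs` is made up for illustration. It compares each character with the last character of the run being built, and it makes sure the final run is stored after the loop ends.

```python
def split_runs(text):
    """Return the runs of identical consecutive characters in text."""
    runs = []
    current = ""
    for char in text:
        if current and char != current[-1]:
            runs.append(current)  # the run ended, so store it
            current = ""
        current += char
    if current:
        runs.append(current)      # store the final run
    return runs


print(split_runs('AAABBBCDEEEEBBBAA'))
```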

Related

Remove string shorter than k from a list of strings [duplicate]

This question already has an answer here:
How to return a subset of a list that matches a condition [duplicate]
(1 answer)
Closed 2 years ago.
I have a list of strings like the following:
mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']
And I need to extract only the strings with k=4 characters, so the output would be:
minlist = ['aaaa', 'bbbb', 'cccc']
How can this be implemented efficiently?
This is exactly the type of situation the filter function is intended for:
>>> mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']
>>> minlist = list(filter(lambda i: len(i) == 4, mylist))
>>> minlist
['aaaa', 'bbbb', 'cccc']
filter takes two arguments: the first is a function, and the second is an iterable. The function is applied to each element of the iterable; if it returns a true value the element is kept, and if it returns a false value the element is excluded. filter returns the elements that passed this test.
As a side note, in Python 3 the filter function returns a filter object, which is an iterator rather than a list (which is why the explicit list call is included). So, if you're simply iterating over the values, you don't need to convert the result to a list, and skipping the conversion is more efficient.
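To illustrate the point about the iterator, here is a small sketch under Python 3. The variable names `four_chars`, `seen`, and `leftover` are just for illustration. It loops over the filter object directly, and then shows that the object is used up after one pass.

```python
mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']

# In Python 3, filter() returns a lazy iterator; no element is tested yet.
four_chars = filter(lambda item: len(item) == 4, mylist)

# Iterating consumes the matches one at a time, without building a list first.
seen = []
for item in four_chars:
    seen.append(item.upper())

# The iterator is now exhausted, so a second pass yields nothing.
leftover = list(four_chars)

print(seen)
print(leftover)
```

If you need to go over the result more than once, that is the case where converting to a list (or using a list comprehension) makes sense.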
Try this:
def get_minlist(my_list, k):
    return [item for item in my_list if len(item) == k]
You can use this as:
print(get_minlist(["abc", "ab", "a"], 2))
Result:
['ab']
The code is Pythonic, fast, and very easy to understand. It goes through the items in the list, checks whether each one is k characters long, and keeps those that are.
You can check the length of a string using len().
mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']
minlist = [x for x in mylist if len(x) == 4]
Result:
['aaaa', 'bbbb', 'cccc']
Try this:
mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']
minilist=[]
for i in range(len(mylist)):
    if len(mylist[i]) == 4:
        minilist.append(mylist[i])
print(minilist)
Like I said in the comment, you could try something like this:
mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']
newlst=[]
for item in mylist:
    if len(item) == 4:
        newlst.append(item)
print (newlst)
mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']
Here we are using a concept called list comprehension, which is an easy way to create a list from an iterable.
Note: an iterable is anything that can be looped over.
While the list comprehension is evaluated, elements from the iterable (e.g. mylist) can be conditionally included in the new list and transformed as needed.
Syntax of a list comprehension:
Note: the '|' symbol below separates the syntax into three parts; the first two parts are mandatory and the last part is optional.
[give me this | from the collection | with this condition ]
[mandatory | mandatory | optional ]
[expression | for var in iterable | if condition ]
filtered_list=[item for item in mylist if len(item)==4]
print(filtered_list)
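A short sketch of the three parts in action, using the same mylist. The variable names `lengths`, `four_chars`, and `shouted` are only illustrative. The first comprehension has no condition, the second has a condition only, and the third has both a condition and a transformation.

```python
mylist = ['a', 'b', 'c', 'aa', 'bb', 'cc', 'aaa', 'bbb', 'ccc', 'aaaa', 'bbbb', 'cccc']

# [expression  for variable in iterable  if condition]

# No condition: every item is kept, and the expression transforms it.
lengths = [len(item) for item in mylist]

# Condition only: the expression is the item itself.
four_chars = [item for item in mylist if len(item) == 4]

# Condition plus transformation.
shouted = [item.upper() for item in mylist if len(item) == 4]

print(lengths)
print(four_chars)
print(shouted)
```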

match the pattern at the end of a string?

Imagine I have the following strings:
['a','b','c_L1', 'c_L2', 'c_L3', 'd', 'e', 'e_L1', 'e_L2']
Where the "c" string has important sub-categories (L1, L2, L3). These indicate special data for our purposes that have been generated in a program based on a pre-designated string "L". In other words, I know that the special entries should have the form:
name_Lnumber
Knowing that I'm looking for this pattern, and that I am using "L" or more specifically "_L" as my designation of these objects, how could I return a list of entries that meet this condition? In this case:
['c', 'e']
Use a simple filter:
>>> l = ['a','b','c_L1', 'c_L2', 'c_L3', 'd', 'e', 'e_L1', 'e_L2']
>>> filter(lambda x: "_L" in x, l)
['c_L1', 'c_L2', 'c_L3', 'e_L1', 'e_L2']
Alternatively, use a list comprehension
>>> [s for s in l if "_L" in s]
['c_L1', 'c_L2', 'c_L3', 'e_L1', 'e_L2']
Since you need the prefix only, you can just split it:
>>> set(s.split("_")[0] for s in l if "_L" in s)
set(['c', 'e'])
You can use the following generator expression inside set():
>>> set(i.split('_')[0] for i in l if '_L' in i)
set(['c', 'e'])
Or, if you want to match only the elements that end with _L followed by one or more digits, and not something like _Lm, you can use a regex:
>>> import re
>>> set(i.split('_')[0] for i in l if re.match(r'.*?_L\d+$',i))
set(['c', 'e'])
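A possible variation, sketched under a couple of assumptions: the pattern captures the name directly instead of splitting on "_", so a name that itself contains an underscore is kept whole. The helper name `special_names` and the extra entries 'f_Lm' and 'g_h_L10' are hypothetical and only there to show the edge cases.

```python
import re

# 'f_Lm' and 'g_h_L10' are made-up entries added to exercise the edge cases.
entries = ['a', 'b', 'c_L1', 'c_L2', 'c_L3', 'd', 'e', 'e_L1', 'e_L2', 'f_Lm', 'g_h_L10']

# Anchor "_L<digits>" to the end of the string and capture everything before it.
SUFFIX = re.compile(r'^(.*)_L\d+$')


def special_names(items):
    """Return the sorted, de-duplicated names of entries ending in _L<number>."""
    names = set()
    for item in items:
        match = SUFFIX.match(item)
        if match:
            names.add(match.group(1))
    return sorted(names)


print(special_names(entries))
```

Returning a sorted list rather than a set just makes the output order predictable; a set works equally well if order does not matter.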

python: how to sort lists alphabetically with respect to capitalized letters

I'm trying to sort a list alphabetically, where capital letters should come before lower case letters.
l = ['a', 'b', 'B', 'A']
sorted(l) should result in ['A','a','B','b']
I've tried these two forms, but to no avail;
>>> sorted(l, key=lambda s: s.lower())
['a', 'A', 'b', 'B']
>>> sorted(l, key=str.lower)
['a', 'A', 'b', 'B']
Create a tuple as your key instead:
>>> sorted(l, key=lambda L: (L.lower(), L))
['A', 'a', 'B', 'b']
The first element of the tuple puts each upper-case letter level with its lower-case equivalent, and the second element breaks the tie. For a lower-case letter the key is simply ('a', 'a'); for an upper-case letter it is ('a', 'A'), which sorts first because 'A' compares less than 'a': ('a', 'A') < ('a', 'a').
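To see the keys that are actually being compared, here is a small sketch. The names `case_key`, `keys`, and `result` are just for illustration.

```python
letters = ['a', 'b', 'B', 'A']


def case_key(s):
    """Sort key: case-insensitive text first, original text as the tie-breaker."""
    return (s.lower(), s)


# The tuples that sorted() compares, in the original order of the list.
keys = [case_key(s) for s in letters]

# 'A' has a lower code point than 'a', so ('a', 'A') sorts before ('a', 'a').
result = sorted(letters, key=case_key)

print(keys)
print(result)
```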
It is interesting to consider how such a key should sort the following list:
lst = ['abb', 'ABB', 'aBa', 'AbA']
The proposed solution produces the following result:
>>> sorted(lst, key=lambda L: (L.lower(), L))
['AbA', 'aBa', 'ABB', 'abb']
I can propose a more complicated solution that gives a different result:
>>> sorted(lst, key=lambda a: sum(([a[:i].lower(),
...                                 a[:i]] for i in range(1, len(a)+1)),[]))
['ABB', 'AbA', 'aBa', 'abb']

Sort list of strings by length and alphabetically [duplicate]

This question already has answers here:
How to sort a list by length of string followed by alphabetical order?
(6 answers)
Closed 8 months ago.
I need to sort a list of words based on two criteria given. I need to return a list with the same words in order of length (longest to shortest) and the second sort criteria should be alphabetical.
Example list :
l = ['aa','aaa','aaaa','b','bb','z','ccc']
Desired output:
['aaaa', 'aaa', 'ccc', 'aa', 'bb', 'b', 'z']
You only need one call to sort, because Python automatically sorts tuples lexicographically. That is, if you ask Python to compare two tuples it will order them by their first element, except if those compare equal in which case it will order them by their second element, except if those compare equal in which case...
You want to sort the list of elements by minus their length and then alphabetically, so you want the key of a string s to be the tuple (-len(s), s). Hence:
>>> l = ['aa','aaa','aaaa','b','bb','z','ccc']
>>> sort_key = lambda s: (-len(s), s)
>>> l.sort(key=sort_key)
>>> l
['aaaa', 'aaa', 'ccc', 'aa', 'bb', 'b', 'z']
Design
The keyword here is "stable sorting algorithm". Think of two stable sorting functions:
one sorting according to length (maintaining the relative order of entries with equal lengths),
the other sorting alphabetically.
In which order would you combine them in order to get the desired order?
Implementation
As others have mentioned, the first sorting function can be called like this:
list.sort(key=len, reverse=True)
The second sorting function can be called like this:
list.sort()
This should be enough to write a complete solution.
Result
If you combine the function in the right way, you should get the following:
>>> l = ['aaa', 'fff', 'bbb', 'ddd', 'e', 'cccc']
# ... sorting functions combined in the right way ...
>>> l
['cccc', 'aaa', 'bbb', 'ddd', 'fff', 'e']
In Python, a list's sort method takes a key argument that specifies the sorting criterion.
For the problem you describe, I would combine a plain alphabetical sort with a sort by key. Because the sort is stable, the less significant criterion (alphabetical) goes first and the more significant one (length) goes last:
>>> l = ['aa','aaa','aaaa','b','bb','z','ccc']
>>> l.sort()
>>> l
['aa', 'aaa', 'aaaa', 'b', 'bb', 'ccc', 'z']
>>> l.sort(key=len,reverse=True)
>>> l
['aaaa', 'aaa', 'ccc', 'aa', 'bb', 'b', 'z']
>>>
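The two approaches in this thread, a single tuple key and two stable passes, can be checked against each other. This is only a sketch; the names `one_pass` and `two_pass` are made up. It relies on the fact that Python's sort is stable and that reverse=True still keeps equal elements in their existing order.

```python
words = ['aa', 'aaa', 'aaaa', 'b', 'bb', 'z', 'ccc']

# Approach 1: one pass with a tuple key (longest first, then alphabetical).
one_pass = sorted(words, key=lambda s: (-len(s), s))

# Approach 2: two stable passes, least significant criterion first.
two_pass = sorted(words)                             # alphabetical
two_pass = sorted(two_pass, key=len, reverse=True)   # by length; ties keep alphabetical order

print(one_pass)
print(two_pass)
```

If the two passes are done in the opposite order, the final alphabetical sort simply overrides the length ordering.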
Hints:
mylist = ['one', 'three', 'zero']
mylist.sort(key=len)
print(mylist)
mylist.reverse()
print(mylist)
mylist.sort()
print(mylist)
otherlist = [(2, 'a'), (1, 'a'), (3, 'b'), (3, 'a')]
otherlist.sort()
print(otherlist)
a = sorted([["foo", "o"], ["bar2", "yadda"], ["allo","as3r"]], key=len)
b = sorted(a)

Filtering a list of strings based on contents

Given the list ['a','ab','abc','bac'], I want to compute a list with strings that have 'ab' in them. I.e. the result is ['ab','abc']. How can this be done in Python?
This simple filtering can be achieved in many ways with Python. The best approach is to use "list comprehensions" as follows:
>>> lst = ['a', 'ab', 'abc', 'bac']
>>> [k for k in lst if 'ab' in k]
['ab', 'abc']
Another way is to use the filter function. In Python 2:
>>> filter(lambda k: 'ab' in k, lst)
['ab', 'abc']
In Python 3, it returns an iterator instead of a list, but you can convert it with list():
>>> list(filter(lambda k: 'ab' in k, lst))
['ab', 'abc']
Though it's better practice to use a comprehension.
[x for x in L if 'ab' in x]
# To support matches from the beginning, not any matches:
items = ['a', 'ab', 'abc', 'bac']
prefix = 'ab'
filter(lambda x: x.startswith(prefix), items)
Tried this out quickly in the interactive shell:
>>> l = ['a', 'ab', 'abc', 'bac']
>>> [x for x in l if 'ab' in x]
['ab', 'abc']
>>>
Why does this work? Because the in operator is defined for strings to mean: "is substring of".
Also, you might want to consider writing out the loop as opposed to using the list comprehension syntax used above:
l = ['a', 'ab', 'abc', 'bac']
result = []
for s in l:
    if 'ab' in s:
        result.append(s)
mylist = ['a', 'ab', 'abc']
assert 'ab' in mylist
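One thing worth keeping apart is what `in` means on a string versus on a list. A small sketch (the variable names are only illustrative): on a string it is a substring test, while on a list it checks whether some element is equal to the left operand, so it does not filter by contents on its own.

```python
mylist = ['a', 'ab', 'abc', 'bac']

# On a string, `in` tests for a substring.
substring_hit = 'ab' in 'abc'

# On a list, `in` tests for an element equal to the left operand.
element_hit = 'ab' in mylist           # 'ab' is itself an element
element_miss = 'bc' in ['a', 'abc']    # 'bc' is only a substring of an element

# Filtering therefore applies the substring test to each element in turn.
matches = [s for s in mylist if 'ab' in s]

print(substring_hit, element_hit, element_miss)
print(matches)
```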
