Looping with regex (Python) appending after match to a list - python

I'm trying to append every regex match in a string to a list.
Here is my code. Unfortunately, it doesn't work properly:
import re
seq = 'ABABABBBASHDBDHBEHDBEDH'
Empty_list = []
regex_ex = re.finditer(r'.{3}', seq)
for x in regex_ex:
    Empty_list.append(x)

To access the value of a match, use the re.Match.group method:
for x in regex_ex:
    Empty_list.append(x.group())
You could replace the for loop with a list comprehension:
Empty_list = [x.group() for x in re.finditer(r'.{3}', seq)]
print(Empty_list)
Output:
['ABA', 'BAB', 'BBA', 'SHD', 'BDH', 'BEH', 'DBE']
If you want more compact code:
list(map(re.Match.group, re.finditer(r'.{3}', seq)))
Output:
['ABA', 'BAB', 'BBA', 'SHD', 'BDH', 'BEH', 'DBE']

You're saving the match object instead of the matched string:
import re
seq = 'ABABABBBASHDBDHBEHDBEDH'
Empty_list = []
regex_ex = re.finditer(r'.{3}', seq)
for x in regex_ex:
    Empty_list.append(x.group(0)) # saves matched string
print(Empty_list)
Output:
['ABA', 'BAB', 'BBA', 'SHD', 'BDH', 'BEH', 'DBE']
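When the pattern contains no capture groups, re.findall returns the matched strings directly, so no loop over match objects is needed. A minimal sketch reusing the seq value from the question (the name chunks is only illustrative):

```python
import re

seq = 'ABABABBBASHDBDHBEHDBEDH'

# With no capture groups in the pattern, findall returns a list of the
# matched substrings (non-overlapping, scanned left to right).
chunks = re.findall(r'.{3}', seq)
print(chunks)
```

The trailing 'DH' is dropped in both approaches because fewer than three characters remain.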

Related

Python: The resulting array of type string does not match

I want to write a program that groups words made up of the same characters, but my Python code doesn't produce the expected result.
Example problem: suppose I have three lists of strings as follows:
oe1 = ["abc", "def"];
oe2 = ["asd", "cab", "fed", "eqw"];
oe3 = ["qwe", "efd", "bca"];
Note: The order of the lists or of their elements doesn't matter; the important thing is that the words are grouped.
Output example:
[abc, cab, bca]
[asd]
[def, fed, efd]
[eqw, qwe]
But when I try the following code, the results are not what I expect:
oe1 = ["abc", "def"];
oe2 = ["asd", "cab", "fed", "eqw"];
oe3 = ["qwe", "efd", "bca"];
anagram_list = []
for word_1 in oe1:
    for word_2 in oe2:
        for word_3 in oe3:
            if word_1 != word_2 != word_3 and (sorted(word_1)==sorted(word_2)==sorted(word_3)):
                anagram_list.append(word_1 + word_2 + word_3)
print(anagram_list)
My output is:
['abccabbca', 'deffedefd']
How do I make it match the example output above?
First off, let's combine those lists and sort them using a lambda that converts each string to a list of characters, then sorts that.
>>> sorted(oe1 + oe2 + oe3, key=lambda s: sorted(list(s)))
['abc', 'cab', 'bca', 'asd', 'def', 'fed', 'efd', 'eqw', 'qwe']
Then use itertools.groupby to group them based on the same lambda.
>>> from itertools import groupby
>>> k = lambda s: sorted(list(s))
>>> [list(v) for _, v in groupby(sorted(oe1 + oe2 + oe3, key=k), key=k)]
[['abc', 'cab', 'bca'], ['asd'], ['def', 'fed', 'efd'], ['eqw', 'qwe']]
This can be simplified a bit further by not first converting to a list and just sorting each string.
>>> sorted(oe1 + oe2 + oe3, key=sorted)
['abc', 'cab', 'bca', 'asd', 'def', 'fed', 'efd', 'eqw', 'qwe']
>>> [list(v) for _, v in groupby(sorted(oe1 + oe2 + oe3, key=sorted), key=sorted)]
[['abc', 'cab', 'bca'], ['asd'], ['def', 'fed', 'efd'], ['eqw', 'qwe']]
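If sorting the whole combined list is not wanted, the same grouping can be sketched with a dictionary keyed on each word's sorted letters. This is an alternative to the groupby approach, not part of the original answer; the names groups, key and result are illustrative:

```python
from collections import defaultdict

oe1 = ["abc", "def"]
oe2 = ["asd", "cab", "fed", "eqw"]
oe3 = ["qwe", "efd", "bca"]

groups = defaultdict(list)
for word in oe1 + oe2 + oe3:
    # Anagrams share the same sorted-letter string, so it works as a key.
    key = ''.join(sorted(word))
    groups[key].append(word)

# Groups come out in order of first occurrence of each key.
result = list(groups.values())
print(result)
```

Each word is visited once, and the groups appear in the order their first member was seen rather than in sorted order.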
To make this easier, merge all the lists together. Then use set() to check whether two words use the same set of characters (note that set() ignores how often a letter occurs).
My solution doesn't require any sort(); I added it only so that the output matches your desired output.
oe = oe1 + oe2 + oe3
oe.sort() # optional, no need for sorting
oe_group = []
for i in range(len(oe)):
    if i == 0:
        oe_group.append([oe[i]])
    else:
        for j in range(len(oe_group)):
            if set(oe[i]) == set(oe_group[j][0]):
                oe_group[j].append(oe[i])
                break
            if j == len(oe_group) - 1:
                oe_group.append([oe[i]])
Output:
[['abc', 'bca', 'cab'], ['asd'], ['def', 'efd', 'fed'], ['eqw', 'qwe']]
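One caveat with the set() comparison: a set discards repeated letters, so words that use the same letters a different number of times still compare equal. A small sketch of the difference, with collections.Counter and sorted() as two possible stricter checks (the sample words are made up for illustration):

```python
from collections import Counter

a, b = "aab", "abb"

same_set = set(a) == set(b)              # both reduce to {'a', 'b'}
same_counts = Counter(a) == Counter(b)   # letter counts differ
same_sorted = sorted(a) == sorted(b)     # sorted letters differ

print(same_set, same_counts, same_sorted)
```

For the lists in the question every word has distinct letters, so the set() check happens to give the same grouping.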

How to add elements in list which is value of dictionary and those elements not be repeated as another keys of that dictionary?

Suppose I have one list which contains anagram strings. For example,
anList = ['aba','baa','aab','cat','tac','act','sos','oss']
I want to construct a dictionary whose keys are elements of that list and whose values are lists of the anagrams of each key. An element that has already been added to a value list must not appear again as a key. For example, if 'baa' has been added to the list stored under the key 'aba', then 'baa' cannot be added as a key later. The output dictionary should look like:
anDict = {'aba' : ['baa','aab'],'cat' : ['tac','act'],'sos' : ['oss']}
I have tried many approaches, but the problem is that elements already added to a value list are added again as keys of the dictionary.
How can I do this?
You can group your words by their letter counts using the Counter object. Counter objects have no total order in Python 3, so sort with key=sorted and use Counter only as the grouping key:
from collections import Counter
from itertools import groupby
sortedList = sorted(anList, key=sorted)
groups = [list(y) for x, y in groupby(sortedList, key=Counter)]
#[['aba', 'baa', 'aab'], ['cat', 'tac', 'act'], ['sos', 'oss']]
Now, convert the list of lists of anagrams into a dictionary:
{words[0]: words[1:] for words in groups}
#{'aba': ['baa', 'aab'], 'cat': ['tac', 'act'], 'sos': ['oss']}
Here is a version that keeps the order of occurrence and also handles anagrams that are not adjacent in the list:
anagram_list = ['cat','aba','baa','aab','tac','sos','oss','act']
first_anagrams = {}
anagram_dict = {}
for word in anagram_list:
    sorted_word = ''.join(sorted(word))
    if sorted_word in first_anagrams:
        anagram_dict[first_anagrams[sorted_word]].append(word)
    else:
        first_anagrams[sorted_word] = word
        anagram_dict[word] = []
print(anagram_dict)
The output is
{'aba': ['baa', 'aab'], 'sos': ['oss'], 'cat': ['tac', 'act']}
where the key is always the first anagram in order of occurrence, and the algorithm is strictly O(n) for n words of negligible length.
Should you want all anagrams in the list, including the first one, it becomes much easier:
from collections import defaultdict
anagram_list = ['cat','aba','baa','aab','tac','sos','oss','act']
first_anagrams = {}
anagram_dict = defaultdict(list)
for word in anagram_list:
    anagram_dict[first_anagrams.setdefault(''.join(sorted(word)), word)].append(word)
The result is
defaultdict(<type 'list'>,
{'aba': ['aba', 'baa', 'aab'], 'sos': ['sos', 'oss'], 'cat': ['cat', 'tac', 'act']})
The answers from @DYZ and @AnttiHaapala handle the expected output posted in the question much better than this one.
Following is an approach that comes with some caveats using collections.defaultdict. Sort each list element to compare it to the anagram key and append any anagrams that are not the same as the key.
from collections import defaultdict
anagrams = ['aba','baa','aab','cat','tac','act','sos','oss']
d = defaultdict(list)
for a in anagrams:
    key = ''.join(sorted(a))
    if key != a:
        d[key].append(a)
print(d)
# {'aab': ['aba', 'baa'], 'act': ['cat', 'tac'], 'oss': ['sos']}
Caveats:
always uses the ascending sorted version of the anagram as the dict key, which is not an exact match for the example output in the question
if the ascending sorted version of the anagram is not in the list, this approach will add a previously non-existent anagram as the dict key
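The second caveat shows up with a list that does not contain the sorted spelling itself. A short sketch using the same approach on a made-up input:

```python
from collections import defaultdict

anagrams = ['cat', 'tac']  # the sorted spelling 'act' is not in the input

d = defaultdict(list)
for a in anagrams:
    key = ''.join(sorted(a))
    if key != a:
        d[key].append(a)

# The key 'act' is introduced even though it never appeared in the list.
print(dict(d))
```

Here both input words end up as values under a key that was never part of the data.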
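You can use the function groupby() on a presorted list. The function sorted can be used as the key for both sorting and grouping (Counter also works as a grouping key, but Counter objects have no total order in Python 3, so it is not suitable as a sort key):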
from itertools import groupby
anList = ['aba', 'baa', 'aab', 'cat', 'tac', 'act', 'sos', 'oss']
{k: v for _, (k, *v) in groupby(sorted(anList, key=sorted), key=sorted)}
# {'aba': ['baa', 'aab'], 'cat': ['tac', 'act'], 'sos': ['oss']}
Here is slow but working code:
anList = ['aba', 'baa', 'aab', 'cat', 'tac', 'act', 'sos', 'oss']
anDict = {}
for i in anList:
    in_dict = False
    for j in anDict.keys():
        if sorted(i) == sorted(j):
            in_dict = True
            anDict[j].append(i)
            break
    if not in_dict:
        anDict[i] = []
You may use else with a for loop to achieve this:
anList = ['aba','baa','aab','cat','tac','act','sos','oss']
anDict = dict()
for k in anList:
    for ok in anDict:
        if (ok == k): break
        if (sorted(ok) == sorted(k)):
            anDict[ok].append(k)
            break
    else:
        # runs only when the inner loop finished without a break
        anDict[k] = []
print(anDict)
# {'aba': ['baa', 'aab'], 'cat': ['tac', 'act'], 'sos': ['oss']}
A simple version without itertools.
Create a multimap sorted string -> [anagram string]:
>>> L = ['aba', 'baa', 'aab', 'cat', 'tac', 'act', 'sos', 'oss']
>>> d = {}
>>> for v in L:
...     d.setdefault("".join(sorted(v)), []).append(v)
...
>>> d
{'aab': ['aba', 'baa', 'aab'], 'act': ['cat', 'tac', 'act'], 'oss': ['sos', 'oss']}
Now that you've grouped the anagrams, use the first value of each group as the key of the returned dict:
>>> {v[0]:v[1:] for v in d.values()}
{'aba': ['baa', 'aab'], 'cat': ['tac', 'act'], 'sos': ['oss']}
anList = ['aba', 'baa', 'aab', 'cat', 'tac', 'act', 'sos', 'oss']
anDict = {}
for word in anList:
    sorted_word = ''.join(sorted(word))
    found_key = [key for key in anDict.keys() if sorted_word == ''.join(sorted(key))]
    if found_key:
        anDict[found_key[0]].append(word)
    else:
        anDict[word]=[]
>>> anDict
{'aba': ['baa', 'aab'], 'cat': ['tac', 'act'], 'sos': ['oss']}

Generate permutations of string-code error

This is the code that I have a problem with:
def permute(word):
    letters = list(word)
    print(type(letters))
    for letter in letters:
        letter_copy = letters.remove(letter)
        rtrn_list = letter + permute(letter_copy)
    return rtrn_list
w = 'ABC'
print(permute(w))
I am new to programming. Could someone please tell me where the problem is? Thanks in advance.
Find your problem by comparing your code to this implementation.
def permute(string):
    '''
    Recursively finds all possible combinations of the
    elements -- or permutations -- of an input string and
    returns them as a list.
    >>> permute('abc')
    ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
    '''
    output = []
    if len(string) == 1:
        output = [string]
    else:
        for i, let in enumerate(string):
            for perm in permute(string[:i] + string[i + 1:]):
                #print('Let is', let)
                #print('Perm is', perm)
                output += [let + perm]
    return output
permute('abc')
Out[ ]:
['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
For permutations you can use permutations from the standard-library itertools module:
from itertools import permutations
p = []
for t in permutations('abc'):
    p.append(''.join(t))
print(p)
Output is:
['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
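As for why the code in the question fails: list.remove modifies the list in place and returns None, so letter_copy is None and the recursive call breaks. The function also has no base case, and it removes items from the list it is iterating over. A minimal corrected sketch that stays close to the original structure (the name permute_fixed is only illustrative):

```python
def permute_fixed(word):
    # Base case: a word of length 0 or 1 has exactly one arrangement.
    if len(word) <= 1:
        return [word]
    results = []
    for i, letter in enumerate(word):
        # Build the remainder with slicing instead of list.remove(),
        # which mutates in place and returns None.
        rest = word[:i] + word[i + 1:]
        for perm in permute_fixed(rest):
            results.append(letter + perm)
    return results

print(permute_fixed('ABC'))
```

Collecting into results inside the loop also avoids overwriting the return value on every iteration, which the original rtrn_list assignment did.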

String generation based on the other string in Python

I want to create a simple string generator, and here is how it should work:
I declare a pattern_string = "abcdefghijklmnopqrstuvwxyz"
My starting string is, let's say, starting_string = "qywtx"
Now I want to generate strings as follows:
Check the last character of my starting_string against the pattern string.
The last character is x. We find this character in the pattern_string:
abcdefghijklmnopqrstuvw x yz
and see that the next character is y, so I want the output qywty.
...
However, when I reach z, I want to increment the second-to-last character and reset the last character to the first character of the pattern_string, so the result will be qywua, and so on.
Now questions:
Can I use REGEX to achieve that?
Are there any libraries out there that already handle such generation?
The following will generate the next string according to your description.
def next(s, pat):  # note: this name shadows the built-in next()
    l = len(s)
    for i in range(len(s) - 1, -1, -1): # find the first non-'z' from the back
        if s[i] != pat[-1]: # if you find it
            # leave everything before i as is, increment at i, reset rest to all 'a's
            return s[:i] + pat[pat.index(s[i]) + 1] + (l - i - 1) * pat[0]
    else: # this is only reached for s == 'zzzzz'
        return (l + 1) * pat[0] # and generates 'aaaaaa' (just my assumption)
>>> import string
>>> pattern = string.ascii_lowercase # 'abcde...xyz'
>>> s = 'qywtx'
>>> s = next(s, pattern) # 'qywty'
>>> s = next(s, pattern) # 'qywtz'
>>> s = next(s, pattern) # 'qywua'
>>> s = next(s, pattern) # 'qywub'
For multiple 'z' at the end:
>>> s = 'foozz'
>>> s = next(s, pattern) # 'fopaa'
For all 'z', start over with 'a' of incremented length:
>>> s = 'zzz'
>>> s = next(s, pattern) # 'aaaa'
To my knowledge there is no library function to do that. One that comes close is itertools.product:
>>> from itertools import product
>>> list(map(''.join, product('abc', repeat=3)))
['aaa', 'aab', 'aac', 'aba', 'abb', 'abc', 'aca', 'acb', 'acc', 'baa',
'bab', 'bac', 'bba', 'bbb', 'bbc', 'bca', 'bcb', 'bcc', 'caa', 'cab',
'cac', 'cba', 'cbb', 'cbc', 'cca', 'ccb', 'ccc']
But that does not work with an arbitrary start string. This behaviour could be mimicked by combining it with itertools.dropwhile, but that has the serious overhead of skipping all the combinations before the start string (which, with an alphabet of 26 and a start string towards the end, pretty much renders that approach useless):
>>> from itertools import dropwhile
>>> list(dropwhile(lambda s: s != 'bba', map(''.join, product('abc', repeat=3))))
['bba', 'bbb', 'bbc', 'bca', 'bcb', 'bcc', 'caa', 'cab', 'cac', 'cba', 'cbb', 'cbc', 'cca', 'ccb', 'ccc']
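To get a stream of successive strings without skipping through itertools.product, the successor rule can be wrapped in a generator. A sketch under the same assumption as the answer (an all-'z' string rolls over to a longer all-'a' string); successor and iter_strings are hypothetical names, not library functions:

```python
from itertools import islice
import string

def successor(s, pat):
    # Walk from the right to the first character that is not the last symbol.
    for i in range(len(s) - 1, -1, -1):
        if s[i] != pat[-1]:
            bumped = pat[pat.index(s[i]) + 1]
            # Keep the prefix, bump position i, reset the tail to the first symbol.
            return s[:i] + bumped + pat[0] * (len(s) - i - 1)
    # Every character was the last symbol: grow by one (an assumption).
    return pat[0] * (len(s) + 1)

def iter_strings(start, pat):
    # Yield the strings that follow start, one at a time, indefinitely.
    current = start
    while True:
        current = successor(current, pat)
        yield current

first_four = list(islice(iter_strings('qywtx', string.ascii_lowercase), 4))
print(first_four)
```

Because the generator is infinite, it needs to be consumed with something like islice or a loop with its own stopping condition.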

python get list element according to alphabet

I have a list of names in alphabetical order, like:
list = ['ABC', 'ACE', 'BED', 'BRT', 'CCD', ..]
How can I get a particular element for each starting letter? Do I have to iterate over the list once, or does Python have a function for this? I'm new to Python, so this may be a really naive question.
Suppose I want to get the second element among the names that start with 'A'; in this case I get 'ACE'.
If you're going to do multiple searches, you should take the one-time hit of iterating through everything and build a dictionary (or, to make it simpler, collections.defaultdict):
from collections import defaultdict
d = defaultdict(list)
words = ['ABC', 'ACE', 'BED', 'BRT', 'CCD', ...]
for word in words:
    d[word[0]].append(word)
(Note that you shouldn't name your own variable list, as it shadows the built-in.)
Now you can easily query for the second word starting with "A":
d["A"][1] == "ACE"
or the first two words for each letter:
first_two = {c: w[:2] for c, w in d.items()}
Using generator expression and itertools.islice:
>>> import itertools
>>> names = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']
>>> next(itertools.islice((name for name in names if name.startswith('A')), 1, 2), 'no-such-name')
'ACE'
>>> names = ['ABC', 'BBD', 'BED', 'BRT', 'CCD']
>>> next(itertools.islice((name for name in names if name.startswith('A')), 1, 2), 'no-such-name')
'no-such-name'
Simply group all the elements by their first character (the list must already be sorted by first letter, as it is in the question):
from itertools import groupby
from operator import itemgetter
example = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']
d = {g:list(values) for g, values in groupby(example, itemgetter(0))}
Now to get the values starting with 'A':
print(d.get('A', []))
This is most useful when you have a static list and will run multiple queries since, as you can see, getting the 3rd item starting with 'A' is done in O(1).
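Note that itertools.groupby only merges adjacent items, so the dictionary above relies on the list already being sorted by first letter. A short sketch of what happens otherwise, on a made-up unsorted list:

```python
from itertools import groupby
from operator import itemgetter

unsorted_names = ['ABC', 'BED', 'ACE']

# 'A' appears in two separate runs, so the second run overwrites the first.
broken = {g: list(values) for g, values in groupby(unsorted_names, itemgetter(0))}

# Sorting first brings equal keys together.
fixed = {g: list(values) for g, values in groupby(sorted(unsorted_names), itemgetter(0))}

print(broken)
print(fixed)
```

If the input order cannot be guaranteed, sorting first or using a defaultdict(list) avoids silently losing entries.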
You might want to use list comprehensions:
mylist = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']
elements_starting_with_A = [i for i in mylist if i[0] == 'A']
# ['ABC', 'ACE']
second = elements_starting_with_A[1]
# 'ACE'
In addition to list comprehension as others have mentioned, lists also have a sort() method.
mylist = ['AA', 'BB', 'AB', 'CA', 'AC']
newlist = [i for i in mylist if i[0] == 'A']
newlist.sort()
newlist
>>> ['AA', 'AB', 'AC']
The simple solution is to iterate over the whole list in O(n):
(name for name in names if name.startswith('A'))
However, you could sort the names and then search in O(log(n)) for the position where the names with a given first letter begin (using lexicographic comparison). The bisect module will help you find the bounds:
from bisect import bisect_left
names = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']
names.sort()
lower = bisect_left(names, 'B')
upper = bisect_left(names, chr(1+ord('B')))
print([names[i] for i in range(lower, upper)])
# ['BED', 'BRT']
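Applied to the original question (the n-th name starting with a given letter), the bisect idea might look like this. It is a sketch only, assuming a sorted list of names and a single-character prefix; nth_with_letter is a hypothetical helper, not a library function:

```python
from bisect import bisect_left

def nth_with_letter(sorted_names, letter, n, default=None):
    # Locate the slice of names whose first character is `letter`.
    lower = bisect_left(sorted_names, letter)
    upper = bisect_left(sorted_names, chr(ord(letter) + 1))
    index = lower + n
    # Fall back to `default` when fewer than n + 1 names start with `letter`.
    return sorted_names[index] if index < upper else default

names = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']
print(nth_with_letter(names, 'A', 1))
print(nth_with_letter(names, 'C', 1))
```

The index n is zero-based here, so n=1 asks for the second name with that letter.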
