finding all possible subsequences in a given string - python

I have written this piece of code and it prints all substrings of a given string but I want it to print all the possible subsequences.
from itertools import combinations_with_replacement
s = 'MISSISSIPPI'
lst = []
for i,j in combinations_with_replacement(range(len(s)), 2):
    print(s[i:(j+1)])

Use combinations to get subsequences. That's what combinations is for.
from itertools import combinations
def all_subsequences(s):
    out = set()
    for r in range(1, len(s) + 1):
        for c in combinations(s, r):
            out.add(''.join(c))
    return sorted(out)
Example:
>>> all_subsequences('HELLO')
['E', 'EL', 'ELL', 'ELLO', 'ELO', 'EO', 'H', 'HE', 'HEL', 'HELL', 'HELLO', 'HELO',
'HEO', 'HL', 'HLL', 'HLLO', 'HLO', 'HO', 'L', 'LL', 'LLO', 'LO', 'O']
>>> all_subsequences('WORLD')
['D', 'L', 'LD', 'O', 'OD', 'OL', 'OLD', 'OR', 'ORD', 'ORL', 'ORLD', 'R', 'RD',
'RL', 'RLD', 'W', 'WD', 'WL', 'WLD', 'WO', 'WOD', 'WOL', 'WOLD', 'WOR', 'WORD',
'WORL', 'WORLD', 'WR', 'WRD', 'WRL', 'WRLD']
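
To make the distinction in the question concrete, here is a minimal sketch that contrasts substrings (contiguous slices) with subsequences (characters picked in order, gaps allowed) for a short string. The helper names substrings and subsequences are illustrative and do not come from the original posts.

```python
from itertools import combinations

def substrings(s):
    """Distinct contiguous slices of s."""
    return {s[i:j] for i, j in combinations(range(len(s) + 1), 2)}

def subsequences(s):
    """Distinct non-empty subsequences of s (order kept, gaps allowed)."""
    return {''.join(c) for r in range(1, len(s) + 1) for c in combinations(s, r)}

# Anything in this set is a subsequence that is not a substring.
only_subseq = subsequences('ABC') - substrings('ABC')
print(sorted(only_subseq))  # ['AC']
```

For 'ABC' the only difference is 'AC', which skips the middle character and so can never be produced by slicing.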

One simple way to avoid printing duplicates is to check whether the list you are building already contains the case you are iterating over. If you have already seen it, skip it; if not, append it to your list of seen combinations.
from itertools import combinations_with_replacement
s = 'MISSISSIPPI'
lst = []
for i,j in combinations_with_replacement(range(len(s)), 2):
    if s[i:(j+1)] not in lst:
        lst.append(s[i:(j+1)])  # save new combination into list
        print(lst[-1])  # print new combination
To be sure that all cases are covered, it helps to draw the combinations that the loop will go over. Take a generic string whose letters are represented by their positions in the Python sequence, for example 0 to 3.
Here are the index pairs generated by combinations_with_replacement:
00, 01, 02, 03,
11, 12, 13,
22, 23,
33
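
The grid above can be reproduced directly, which is a quick way to check which slices the loop visits. This sketch assumes a four-character string, so the indices run from 0 to 3; the variable name pairs is only for illustration.

```python
from itertools import combinations_with_replacement

n = 4  # length of the hypothetical string
pairs = list(combinations_with_replacement(range(n), 2))

# Print one row per starting index, matching the layout of the drawing.
for start in range(n):
    print(', '.join(f'{i}{j}' for i, j in pairs if i == start))
```

Each pair (i, j) has i <= j and maps to the slice s[i:j+1], so the loop only ever visits contiguous pieces of the string.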

Related

How to get each letter of word list python

l = ['hello', 'world', 'monday']
for i in range(n):
    word = input()
    l.append(word)
for j in l[0]:
    print(j)
Output : h e l l o
I would like to do this for every word in l.
I want to keep my list intact because I need to get the len() of each word, and I won't know in advance how many words I will get.
I don't know if I'm being clear enough. If you need more information, let me know. Thanks!
def split_into_letters(word):
    return ' '.join(word)
lst = ['hello', 'world', 'monday']
lst_2 = list(map(split_into_letters, lst))
print(lst_2)
You can map each word to a function that splits it into letters
l = ['hello', 'world', 'monday']
list(map(list, l))
#output
[['h', 'e', 'l', 'l', 'o'],
['w', 'o', 'r', 'l', 'd'],
['m', 'o', 'n', 'd', 'a', 'y']]
from itertools import chain
lst = ['hello', 'world', 'monday']
# Print all letters of all words separated by spaces, on one line
print(*chain.from_iterable(lst))
# Print the letters of each word separated by spaces,
# with each word on a new line
for word in lst:
    print(*word)
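
Since the question also asks for the length of each word while leaving the list untouched, a small sketch along these lines may help. It only reads from the list; the name letters_by_word is an assumption made for illustration.

```python
words = ['hello', 'world', 'monday']

# Build a separate mapping; the original list is never modified.
letters_by_word = {word: list(word) for word in words}

for word, letters in letters_by_word.items():
    print(len(word), ' '.join(letters))
```

Note that a dict keeps one entry per distinct word, so a list of (word, letters) tuples would be a better fit if the input can contain repeated words.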

Generate all possible unique samples with n-elements from

Is there a simple way to generate all possible unique samples from a given sample frame? For example, I have a list with 5 elements, members = ['P', 'V', 'S', 'T', 'A'], and would like to draw all possible 2-element combinations, disregarding order, i.e. 'PV' is equivalent to 'VP'. So from the list ['P', 'V', 'S', 'T', 'A'] I should get 10 two-element samples.
I created something that does the trick, but I wonder whether an existing method or function already does this, so that I could simply provide the sample frame and the sample size and get back all possible combinations.
members = list('PVSTA')
ms = []
for i in members:
    for j in members:
        if i != j and i+j not in ms and j+i not in ms:
            ms.append(i+j)
        else:
            continue
print(ms)
['PV', 'PS', 'PT', 'PA', 'VS', 'VT', 'VA', 'ST', 'SA', 'TA']
You can use itertools.combinations(iterable, r), which returns r-length subsequences of elements from the input iterable. In your case, with the iterable ['P', 'V', 'S', 'T', 'A'] and r=2, it returns 5C2 = 10 combinations.
Use:
from itertools import combinations
ms = ["".join(c) for c in combinations(list("PVSTA"), r=2)]
print(ms)
Output:
['PV', 'PS', 'PT', 'PA', 'VS', 'VT', 'VA', 'ST', 'SA', 'TA']
What you want are called combinations. You can generate them with the itertools module from the Python standard library.
from itertools import combinations
members = list('PVSTA')
comb_2 = combinations(members, 2)
result = ["".join(c) for c in comb_2]
print(result)
Others have already posted the itertools.combinations route (the best approach), but here is the manual way to do it for anyone interested:
members = list('PVSTA')
ms = []
for i in range(len(members)-1):
    for j in range(i+1, len(members)):
        ms.append(members[i] + members[j])
print(ms) # ['PV', 'PS', 'PT', 'PA', 'VS', 'VT', 'VA', 'ST', 'SA', 'TA']
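
The question asks for something that takes a sample frame and a sample size. A thin wrapper around itertools.combinations might look like the sketch below; the function name all_samples is hypothetical, and math.comb (Python 3.8+) is used only to check the expected counts.

```python
from itertools import combinations
from math import comb

def all_samples(frame, size):
    """Return every unordered sample of `size` elements drawn from `frame`."""
    return [''.join(c) for c in combinations(frame, size)]

members = list('PVSTA')
pairs = all_samples(members, 2)
triples = all_samples(members, 3)

print(len(pairs), comb(len(members), 2))    # 10 10
print(len(triples), comb(len(members), 3))  # 10 10
```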

Finding word in a matrix

I have a matrix file (which python reads like a list of lists) and I need to tell if a word from a different file appears in that matrix, in a given direction.
for example: given this matrix:
c,a,T,e
o,a,t,s
w,o,t,e
n,o,l,e
the words:
caT, cow, own, cat
and the directions:
downright (diagonal)
I expect an output:
cat:1
and for the direction:
down, right
I expect:
cow:1
own:1
cat:1
my function is set like so:
def word_finder(word_list,matrix, directions):
What I find hard to do is go through a list of lists and run over indexes that are horizontal for example or diagonal :(
thx for the help
There already seem to be several partial answers to your question. While it would not be efficient, with some simple parsing of the directions you could easily chain together the following separate solutions to come up with an answer to your problem.
Diagonal Traversal: In Python word search, searching diagonally, printing result of where word starts and ends
Linear Traversal: How to find words in a matrix - Python
Try this:
from itertools import chain
from collections import defaultdict
matrix = [['c', 'a', 'T', 'e'],
          ['o', 'a', 't', 's'],
          ['w', 'o', 't', 'e'],
          ['n', 'o', 'l', 'e']]
words = ['caT', 'cow', 'own', 'cat']
d = defaultdict(int)
for i in chain(matrix, list(zip(*matrix))):  # list(zip(*matrix)) is for down direction
    for word in words:
        if word in ''.join(i):
            d[word] = d[word] + 1
d then holds the counts for the right and down directions: {'caT': 1, 'cow': 1, 'own': 1}
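
The snippet above covers the right and down directions only. For the downright (diagonal) direction from the question, one possible sketch collects each down-right diagonal as a string and applies the same substring test. The helper name down_right_diagonals is an assumption for illustration, and the matrix is assumed to be rectangular and non-empty.

```python
from collections import defaultdict

def down_right_diagonals(matrix):
    """Yield each down-right diagonal of a rectangular matrix as a string."""
    rows, cols = len(matrix), len(matrix[0])
    # Diagonals start on the top row or on the left column.
    starts = [(0, c) for c in range(cols)] + [(r, 0) for r in range(1, rows)]
    for r, c in starts:
        letters = []
        while r < rows and c < cols:
            letters.append(matrix[r][c])
            r += 1
            c += 1
        yield ''.join(letters)

matrix = [['c', 'a', 'T', 'e'],
          ['o', 'a', 't', 's'],
          ['w', 'o', 't', 'e'],
          ['n', 'o', 'l', 'e']]
words = ['caT', 'cow', 'own', 'cat']

counts = defaultdict(int)
for line in down_right_diagonals(matrix):
    for word in words:
        if word in line:  # counts a word at most once per diagonal
            counts[word] += 1

print(dict(counts))  # {'cat': 1}
```

The main diagonal reads 'cate', which is why only 'cat' is found in this direction.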

Permutations Python

How do I take, for example, this tuple ("A", "E", "L") and generate all possible words without repeating the letters? The result would be 3 words with only one letter, 6 words with two letters and 6 words with 3 letters.
I tried this:
import itertools
def generate(tuplo_letras):
    return [i for i in itertools.permutations(tuplo_letras)]
def final(arg):
    return generate(list(map(''.join, itertools.permutations(arg))))
You can use itertools.permutations and iterate over all the lengths of the permutations you want to cover. Note that permutations takes two arguments: the iterable and the desired length of the permutations.
from itertools import permutations, chain
tpl = ("A", "E", "L")
[''.join(p) for p in chain(*(permutations(tpl, l+1) for l in range(len(tpl))))]
# ['A', 'E', 'L', 'AE', 'AL', 'EA', 'EL', 'LA', 'LE', 'AEL', 'ALE', 'EAL', 'ELA', 'LAE', 'LEA']
If you need them grouped you can nest the comprehensions accordingly:
[[''.join(p) for p in (permutations(tpl, l+1))] for l in range(len(tpl))]
# [['A', 'E', 'L'], ['AE', 'AL', 'EA', 'EL', 'LA', 'LE'], ['AEL', 'ALE', 'EAL', 'ELA', 'LAE', 'LEA']]
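
As a quick sanity check on the counts stated in the question (3, 6 and 6 words), the sketch below compares the number of generated words with math.perm (Python 3.8+). The names letters and counts are illustrative.

```python
from itertools import permutations
from math import perm

letters = ("A", "E", "L")

# Number of words of each length k, built from the generated permutations.
counts = {
    k: len([''.join(p) for p in permutations(letters, k)])
    for k in range(1, len(letters) + 1)
}

for k, n_words in counts.items():
    # perm(n, k) = n! / (n - k)! is the number of k-length arrangements.
    print(k, n_words, perm(len(letters), k))
```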

Python: Pair alphabets after loop is completed

I tried to pair letters of the alphabet like this:
import string
a = string.ascii_uppercase
for i in range(0, 30):
    print(a[i % 26] * (i // 26 + 1))
This prints A-Z, and after Z it prints strings like AA, BB.
What I need instead is AA AB AC AD AE and so on, up to the end of the range, after A-Z has been printed. The result should look like:
A-Z, then AA AB AC ....
You can take advantage of the itertools module and use a generator to handle this pretty cleanly:
from itertools import count, product, islice
from string import ascii_uppercase
def multiletters(seq):
    for n in count(1):
        for s in product(seq, repeat=n):
            yield ''.join(s)
gives
>>> list(islice(multiletters('ABC'), 20))
['A', 'B', 'C', 'AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC', 'AAA', 'AAB', 'AAC', 'ABA', 'ABB', 'ABC', 'ACA', 'ACB']
>>> list(islice(multiletters(ascii_uppercase), 30))
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'AA', 'AB', 'AC', 'AD']
and you can make an object and get them one by one, if you'd prefer:
>>> m = multiletters(ascii_uppercase)
>>> next(m)
'A'
>>> next(m)
'B'
>>> next(m)
'C'
[Update: I should note though that I pass data between Python and Excel all the time -- am about to do so, actually -- and never need this function. But if you have a specific question about the best way to exchange data, it's probably better to ask a separate question than to edit this one now that there are several answers to the current question.]
I think what you are looking for is a nested for loop, like this:
import string
def get_string(val):
    return string.ascii_uppercase[val % 26] * (val // 26 + 1)
for i in range(0, 26):
    for j in range(0, 26):
        print(get_string(i) + get_string(j))
Note that I moved your letter-indexing expression into a function (get_string) so that its code is not repeated.
I think what you want is something like this:
import string
def get_string(val):
    return string.ascii_uppercase[val % 26] * (val // 26 + 1)
for i in range(-1, 26):
    for j in range(0, 26):
        if i == -1:
            print(get_string(j))
        else:
            print(get_string(i) + get_string(j))
On the first pass through the outer loop (i == -1), no leading character is printed, which gives the first 26 single-letter columns, as in Excel. Every later pass prints a leading letter followed by a second letter.
Working example available on ideone.com -> http://ideone.com/M862Ra
