Is there a simple way to generate all possible unique samples from a given sample frame? For example, I have a list of 5 elements, members = ['P', 'V', 'S', 'T', 'A'], and would like to draw all possible 2-element combinations, disregarding order, i.e. 'PV' is equivalent to 'VP'. So from the list ['P', 'V', 'S', 'T', 'A'] I should get 10 two-element samples.
I wrote something that does the trick, but I wonder whether there is an existing method or function that takes the sample frame and the sample size and returns all possible combinations.
members = list('PVSTA')
ms = []
for i in members:
    for j in members:
        if i != j and i+j not in ms and j+i not in ms:
            ms.append(i+j)
        else:
            continue
print(ms)
['PV', 'PS', 'PT', 'PA', 'VS', 'VT', 'VA', 'ST', 'SA', 'TA']
You can use itertools.combinations(iterable, r), which returns r-length subsequences of elements from the input iterable. In your case, with the iterable ['P', 'V', 'S', 'T', 'A'] and r=2, it returns 5C2 = 10 combinations.
Use:
from itertools import combinations
ms = ["".join(c) for c in combinations(list("PVSTA"), r=2)]
print(ms)
Output:
['PV', 'PS', 'PT', 'PA', 'VS', 'VT', 'VA', 'ST', 'SA', 'TA']
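For the general case the question asks about (supply a sample frame and a sample size), a minimal sketch could wrap combinations in a small helper. The name all_samples is hypothetical, not part of itertools, and math.comb assumes Python 3.8 or later.

```python
from itertools import combinations
from math import comb


def all_samples(frame, size):
    """Return every unordered sample of `size` elements drawn from `frame`.

    `all_samples` is an illustrative helper name, not a library function.
    It assumes the elements of `frame` are strings, since they are joined.
    """
    return ["".join(c) for c in combinations(frame, size)]


members = list("PVSTA")
samples = all_samples(members, 2)
print(samples)
# The number of samples matches the binomial coefficient C(5, 2).
print(len(samples) == comb(len(members), 2))
```

For non-string elements, returning the tuples from combinations directly would avoid the join.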
What you want is called combinations; you can generate them with the itertools module in Python.
from itertools import combinations
members = list('PVSTA')
comb_2 = combinations(members, 2)
result = ["".join(c) for c in comb_2]
print(result)
Others have already posted the itertools.combinations route (the best approach), but here is the manual way to do it for anyone interested:
members = list('PVSTA')
ms = []
for i in range(len(members)-1):
    for j in range(i+1, len(members)):
        ms.append(members[i] + members[j])
print(ms) # ['PV', 'PS', 'PT', 'PA', 'VS', 'VT', 'VA', 'ST', 'SA', 'TA']
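The nested loops above are fixed at two elements. As a sketch of how the same manual idea might extend to any sample size, a recursive generator can pick a first element and combine it with smaller combinations of the elements after it. The name manual_combinations is hypothetical, and the function assumes a sliceable sequence such as a list.

```python
def manual_combinations(items, r):
    """Yield r-element combinations of `items` as tuples, in input order."""
    if r == 0:
        yield ()
        return
    # Pick each possible first element, then combine it with
    # (r - 1)-element combinations of the items that follow it.
    for i in range(len(items) - r + 1):
        for rest in manual_combinations(items[i + 1:], r - 1):
            yield (items[i],) + rest


members = list('PVSTA')
pairs = [''.join(c) for c in manual_combinations(members, 2)]
print(pairs)
```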
I have a list of alphanumeric strings, as shown below:
l1 = ['G1','L1']
I would like to know whether something like the following is possible:
for i in range(l1): #this doesn't work because range is only for numeric values
for i in range(G1:L1): #this also doesn't work
However, I want the value of i to change on each iteration from G1 to H1 to I1 to J1 to K1 to L1.
range() always expects integers and cannot work with strings.
However, you can use the built-in ord() function to convert letters to numbers and then use the chr() function to convert them back from numbers to ASCII characters.
Code
a = [chr(c)+'1' for c in range(ord('G'), ord('M'))]
print(a)
Output
['G1', 'H1', 'I1', 'J1', 'K1', 'L1']
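To iterate directly from the two endpoints in the question's list, the ord()/chr() idea can be wrapped in a small helper. This is only a sketch: cell_range is a hypothetical name, and it assumes a single uppercase column letter and the same row suffix in both endpoints.

```python
def cell_range(start, end):
    """Return cell names from `start` to `end` inclusive, e.g. 'G1' .. 'L1'.

    Assumes one uppercase column letter followed by a row suffix that is
    identical in both arguments; `cell_range` is an illustrative name.
    """
    row = start[1:]
    if end[1:] != row:
        raise ValueError("start and end must share the same row")
    # range() excludes its stop value, so add 1 to include the end letter.
    return [chr(c) + row for c in range(ord(start[0]), ord(end[0]) + 1)]


l1 = ['G1', 'L1']
for i in cell_range(*l1):
    print(i)
```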
Update: Solution for double characters.
Doing it for double characters is a little more complicated, but this StackOverflow answer has a solution for that. You can take the from_excel() and to_excel() functions from that answer and substitute them into my code above as follows.
Code
a = [to_excel(i) for i in range(from_excel('G'), from_excel('AG'))]
print(a)
Output
['G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'AA', 'AB', 'AC', 'AD', 'AE', 'AF']
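The from_excel() and to_excel() helpers come from the referenced answer, which is not reproduced here. As an assumption about what they do, a minimal sketch that treats column letters as bijective base-26 numbers (A=1, ..., Z=26, AA=27) might look like this:

```python
def from_excel(col):
    """Convert column letters to a 1-based number (assumed behaviour)."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord('A') + 1)
    return n


def to_excel(n):
    """Convert a 1-based number back to column letters (assumed behaviour)."""
    letters = []
    while n > 0:
        # Subtract 1 first because there is no zero digit in this numbering.
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord('A') + rem))
    return ''.join(reversed(letters))


a = [to_excel(i) for i in range(from_excel('G'), from_excel('AG'))]
print(a)
```

As with any range(), the end column ('AG' here) is excluded, which is why the output above stops at 'AF'.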
You can use:
from openpyxl.utils import coordinate_to_tuple, get_column_letter
def excel_range(start, end):
    t1 = coordinate_to_tuple(start)
    t2 = coordinate_to_tuple(end)
    rows, cols = zip(t1, t2)
    cells = []
    for r in range(rows[0], rows[1]+1):
        for c in range(cols[0], cols[1]+1):
            cells.append(f'{get_column_letter(c)}{r}')
    return cells
cells = excel_range('AA1', 'AC4')
Output:
>>> cells
['AA1',
'AB1',
'AC1',
'AA2',
'AB2',
'AC2',
'AA3',
'AB3',
'AC3',
'AA4',
'AB4',
'AC4']
I have written this piece of code and it prints all substrings of a given string but I want it to print all the possible subsequences.
from itertools import combinations_with_replacement
s = 'MISSISSIPPI'
lst = []
for i,j in combinations_with_replacement(range(len(s)), 2):
    print(s[i:(j+1)])
Use combinations to get subsequences. That's what combinations is for.
from itertools import combinations
def all_subsequences(s):
    out = set()
    for r in range(1, len(s) + 1):
        for c in combinations(s, r):
            out.add(''.join(c))
    return sorted(out)
Example:
>>> all_subsequences('HELLO')
['E', 'EL', 'ELL', 'ELLO', 'ELO', 'EO', 'H', 'HE', 'HEL', 'HELL', 'HELLO', 'HELO',
'HEO', 'HL', 'HLL', 'HLLO', 'HLO', 'HO', 'L', 'LL', 'LLO', 'LO', 'O']
>>> all_subsequences('WORLD')
['D', 'L', 'LD', 'O', 'OD', 'OL', 'OLD', 'OR', 'ORD', 'ORL', 'ORLD', 'R', 'RD',
'RL', 'RLD', 'W', 'WD', 'WL', 'WLD', 'WO', 'WOD', 'WOL', 'WOLD', 'WOR', 'WORD',
'WORL', 'WORLD', 'WR', 'WRD', 'WRL', 'WRLD']
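To make the difference between substrings and subsequences concrete, here is a small sketch on a hypothetical three-letter string. Substrings are contiguous slices, while subsequences may skip characters as long as the original order is kept.

```python
from itertools import combinations

s = 'ABC'  # hypothetical short input, chosen so the output stays readable

# Substrings: contiguous slices s[i:j].
substrings = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}

# Subsequences: any selection of characters that keeps the original order.
subsequences = {''.join(c)
                for r in range(1, len(s) + 1)
                for c in combinations(s, r)}

print(sorted(substrings))
print(sorted(subsequences))
# 'AC' skips the 'B', so it is a subsequence but not a substring.
print(sorted(subsequences - substrings))
```

A string of n characters has up to 2^n - 1 non-empty subsequences, so this approach is only practical for short strings.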
One simple way to do so is to check whether the list you are building already contains the case you are iterating over. If you have already seen it, skip it; if not, append it to your list of seen combinations.
from itertools import combinations_with_replacement
s = 'MISSISSIPPI'
lst = []
for i,j in combinations_with_replacement(range(len(s)), 2):
    if s[i:(j+1)] not in lst:
        lst.append(s[i:(j+1)]) # save new combination into list
        print(lst[-1]) # print new combination
To be sure that all cases are covered, it really helps to draw the combinations that the loop will go over. Take a generic string whose letters are represented by their positions in the Python sequence, for example 0 to 3.
Here are the index pairs generated by "combinations_with_replacement":
00, 01, 02, 03,
11, 12, 13,
22, 23,
33
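The same picture can be reproduced in code. This sketch uses a hypothetical 4-letter string and prints each index pair next to the slice it selects, which shows that every pair (i, j) with i <= j corresponds to one contiguous substring.

```python
from itertools import combinations_with_replacement

s = 'ABCD'  # hypothetical 4-letter string, positions 0 to 3

pairs = list(combinations_with_replacement(range(len(s)), 2))
for i, j in pairs:
    # j + 1 because the slice end is exclusive.
    print(f'{i}{j} -> {s[i:j + 1]}')
```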
How can we use recursion to calculate all DNA sequences of length n in a function?
For instance, if the function is given 2, it returns ['AA', 'AC', 'AT', 'AG', 'CA', 'CC', 'CT', 'CG', 'TA', 'TC', 'TT', 'TG', 'GA', 'GC', 'GT', 'GG'],
and so on.
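itertools.permutations returns all orderings of r distinct elements taken from a given iterable; the second argument, r, is the length of each ordering. Note that it never reuses an element, so sequences with a repeated base such as 'AA' are not produced.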
itertools.permutations('ACGT', length)
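As a quick check of what this call returns for length 2, the following sketch compares it with itertools.product, which does allow repeated elements. The variable names are illustrative only.

```python
from itertools import permutations, product

length = 2
perms = [''.join(p) for p in permutations('ACGT', length)]
prods = [''.join(p) for p in product('ACGT', repeat=length)]

print(len(perms))  # 12: each base is used at most once per sequence
print(len(prods))  # 16: bases may repeat
print('AA' in perms, 'AA' in prods)
```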
Here is one way:
def all_seq(n, curr, e, ways):
    """All possible sequences of size n given elements e.
    ARGS
    n: size of sequence
    curr: a list used for constructing sequences
    e: the list of possible elements (could have been a global list instead)
    ways: the final list of sequences
    """
    if len(curr) == n:
        ways.append(''.join(curr))
        return
    for element in e:
        all_seq(n, list(curr) + [element], e, ways)
perms = []
all_seq(2, [], ['A', 'C', 'T', 'G'], perms)
print(perms)
The output:
['AA', 'AC', 'AT', 'AG', 'CA', 'CC', 'CT', 'CG', 'TA', 'TC', 'TT', 'TG', 'GA', 'GC', 'GT', 'GG']
You actually want itertools.product('ACGT', repeat=n). Note that the result grows enormously fast (4^n sequences, each of length n).
If your assignment is to do it recursively, consider how you would get all (n+1)-length options that start with an n-length prefix. The naive recursive option might be rather slow compared to itertools, if you need to use it in anger.
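The prefix idea can be sketched as follows: every sequence of length n is a sequence of length n-1 with one more base appended. The function name dna_sequences and the default alphabet order 'ACTG' (chosen to match the ordering in the question) are assumptions for illustration.

```python
from itertools import product


def dna_sequences(n, alphabet='ACTG'):
    """Return all sequences of length n over `alphabet`, built recursively."""
    if n == 0:
        return ['']  # one empty sequence, so the recursion has a base to extend
    return [prefix + base
            for prefix in dna_sequences(n - 1, alphabet)
            for base in alphabet]


seqs = dna_sequences(2)
print(seqs)
# Cross-check against the itertools.product approach.
print(seqs == [''.join(p) for p in product('ACTG', repeat=2)])
```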
How do I take, for example, this tuple ("A", "E", "L") and generate all possible words without repeating the letters? The result would be 3 words with only one letter, 6 words with two letters and 6 words with 3 letters.
I tried this:
def generate(tuplo_letras):
    return [i for i in itertools.permutations(tuplo_letras)]
def final(arg):
    return generate(list(map(''.join, itertools.permutations(arg))))
You can use itertools.permutations and iterate over all the lengths of the permutations you want to cover. Note that permutations takes two arguments: the iterable and the desired length of the permutations:
from itertools import permutations, chain
tpl = ("A", "E", "L")
[''.join(p) for p in chain(*(permutations(tpl, l+1) for l in range(len(tpl))))]
# ['A', 'E', 'L', 'AE', 'AL', 'EA', 'EL', 'LA', 'LE', 'AEL', 'ALE', 'EAL', 'ELA', 'LAE', 'LEA']
If you need them grouped you can nest the comprehensions accordingly:
[[''.join(p) for p in (permutations(tpl, l+1))] for l in range(len(tpl))]
# [['A', 'E', 'L'], ['AE', 'AL', 'EA', 'EL', 'LA', 'LE'], ['AEL', 'ALE', 'EAL', 'ELA', 'LAE', 'LEA']]
I tried to pair letters of the alphabet like this:
import string
a=string.uppercase
for i in range(0,30):
    print a[i%26]*(i / 26+1)
This prints A-Z, and after Z it prints strings like AA, BB.
But I need it to print AA AB AC AD AE and so on after A-Z, up to the end of the defined range, so the result would be:
A-Z, then AA AB AC ....
You can take advantage of the itertools module and use a generator to handle this pretty cleanly:
from itertools import count, product, islice
from string import ascii_uppercase
def multiletters(seq):
    for n in count(1):
        for s in product(seq, repeat=n):
            yield ''.join(s)
gives
>>> list(islice(multiletters('ABC'), 20))
['A', 'B', 'C', 'AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC', 'AAA', 'AAB', 'AAC', 'ABA', 'ABB', 'ABC', 'ACA', 'ACB']
>>> list(islice(multiletters(ascii_uppercase), 30))
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'AA', 'AB', 'AC', 'AD']
and you can make an object and get them one by one, if you'd prefer:
>>> m = multiletters(ascii_uppercase)
>>> next(m)
'A'
>>> next(m)
'B'
>>> next(m)
'C'
[Update: I should note though that I pass data between Python and Excel all the time -- am about to do so, actually -- and never need this function. But if you have a specific question about the best way to exchange data, it's probably better to ask a separate question than to edit this one now that there are several answers to the current question.]
I think what you are looking for is a nested for loop, like this:
import string
def get_string(val):
    return string.uppercase[val%26]*(val / 26+1)
for i in range(0,26):
    for j in range(0, 26):
        print get_string(i) + get_string(j)
Note that I moved your indexing of string.uppercase into a function (get_string) so that the code is not repeated.
I think what you want is something like this
import string
def get_string(val):
    return string.uppercase[val%26]*(val / 26+1)
for i in range(-1,26):
    for j in range(0, 26):
        if i==-1:
            print get_string(j)
        else:
            print get_string(i) + get_string(j)
On the first pass through the outer loop, no leading character is printed (giving the first 26 Excel columns); on each later pass, a leading letter is printed followed by a second letter.
Working example available on ideone.com -> http://ideone.com/M862Ra
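The nested-loop snippets above use Python 2 syntax (the print statement, string.uppercase, and integer division with /). A rough Python 3 sketch of the same idea, limited to one- and two-letter labels, might look like this; two_letter_labels is a hypothetical name.

```python
from string import ascii_uppercase


def two_letter_labels():
    """Return 'A'..'Z' followed by 'AA'..'ZZ' (illustrative helper)."""
    labels = list(ascii_uppercase)
    # Each leading letter is paired with every second letter, in order.
    labels += [a + b for a in ascii_uppercase for b in ascii_uppercase]
    return labels


labels = two_letter_labels()
print(labels[:30])
```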