How can I start itertools combinations from a specific letter/digit? - python

I have code like this:
import itertools
import string
start = 1
end = 2
for length in range(start, end+1):
    for c in itertools.combinations_with_replacement(string.ascii_letters + string.digits, length):
        print(''.join(c))
This prints every lowercase and uppercase letter from a to z and A to Z, and once the letters are finished it prints the digits from 0 to 9, so the order is: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
After the first cycle is done, it moves on to the second one and does the same thing, like this:
First cycle starting: a
Second cycle starting: aa
and so on.
What I want it to do:
I want to start the combinations from a specific letter or digit, like this:
Combination 1:
First cycle starting: b
Second cycle starting: ba
If that is not possible, then like this:
Combination 2:
First cycle starting: b
Second cycle starting: bb

I think you're asking for all possible permutations after a certain index, in which case you can do:
import string
import itertools
chars = string.ascii_letters + string.digits
def alpha_perms(index):
    return list(itertools.permutations(chars[index:]))
print(alpha_perms(0)) #0th index to go along with the below statement
Although... this will almost certainly freeze your machine. There are 62 characters in chars, so that is 62! (62 factorial), approximately 3.1469973e+85 (about π × 10^85) possible permutations, assuming you pass in index 0. Even a more reasonable index is going to take a long time.
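The size estimate can be checked with the standard library. This is only a quick sanity check of the arithmetic, not part of the code above:

```python
import math

# Number of permutations of all 62 characters (index 0 in alpha_perms).
total = math.factorial(62)

# Print in scientific notation and show how many decimal digits the count has.
print(f"{total:.7e}")
print(len(str(total)), "digits")
```

Even at a billion permutations per second, a count with this many digits is far out of reach, which is why slicing off only a few leading characters does not help much.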
Alternatively, since return list(...) builds every permutation in memory and will cause problems for large inputs, you could yield the values one at a time:
import string
import itertools
chars = string.ascii_letters + string.digits
def alpha_perms(index):
    for perm in itertools.permutations(chars[index:]):
        yield perm
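The same chars[index:] slicing can also be applied to the combinations_with_replacement call from the question. The sketch below assumes the "Combination 2" behaviour is acceptable (b, then bb); the function name combos_from and its parameters are made up for illustration:

```python
import itertools
import string

chars = string.ascii_letters + string.digits

def combos_from(start_char, min_len=1, max_len=2):
    """Yield combinations with replacement, ignoring characters before start_char.

    combos_from is a hypothetical helper, not part of itertools.
    """
    tail = chars[chars.index(start_char):]  # drop everything before start_char
    for length in range(min_len, max_len + 1):
        for combo in itertools.combinations_with_replacement(tail, length):
            yield "".join(combo)

# The first length-1 result is 'b'; the first length-2 result is 'bb'.
for i, value in enumerate(combos_from("b")):
    if i < 3:
        print(value)
```

Note that this drops the skipped characters entirely, so a string such as ba never appears. Getting "Combination 1" would need a different approach, for example itertools.product over the full alphabet combined with skipping results until the desired starting point.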

Related

create your own number system using python

I am making a small program that guesses a password.
I am making this program purely to learn: I want to improve my Python skills by writing a program that does something meaningful.
For example:
using_characts = "abcdefghijklmnopqrstuvwxyz" # I have other characters in my alphabet as well
What I want to do is something like this:
for char in using_characts:
    print(char)
    for char_1 in using_characts:
        print(char + char_1)
        for char_2 in using_characts:
            print(char + char_1 + char_2)
            # ...etc
which is not dynamic, and is hard to write as well.
The output should be something like this:
a
b
c
d
e
f
..etc
aa
ab
ac
..etc
ba
bb
bc
..etc
You can use itertools.product, but you should limit yourself to a small maximum length. Generating the Cartesian product for longer strings can take a really long time:
from itertools import chain, product
chars = "abcdefghijklmnopqrstuvwxyz"
limit = 2
for perm in chain.from_iterable(product(chars, repeat=i) for i in range(1, limit+1)):
    print("".join(perm))
a
b
c
.
.
.
aa
ab
ac
.
.
.
zy
zz
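To see why the limit matters, the number of candidates can be computed without generating them. This is a small sketch; count_candidates is a name invented here:

```python
def count_candidates(alphabet_size: int, limit: int) -> int:
    """Number of strings of length 1..limit over an alphabet of the given size."""
    return sum(alphabet_size ** length for length in range(1, limit + 1))

# Growth for a 26-letter alphabet at a few maximum lengths.
for limit in (2, 4, 8):
    print(limit, count_candidates(26, limit))
```

For limit = 2 this gives 26 + 676 = 702 strings, and every additional character multiplies the work by roughly the alphabet size.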
Here you go, this will work. Let me know if you want me to explain any part.
import itertools
using_characts = "abc"
for str_length in range(1,len(using_characts)+1):
    for q in itertools.product(using_characts,repeat=str_length):
        print("".join(q))
So, the other answers have given you code that will probably work, but I wanted to explain a general approach. This algorithm keeps a queue of generators (the variable is named stack, but it is consumed first-in, first-out via popleft) to track the next things that need to be generated, and continues generating until it reaches the maximum length that you've specified.
from collections import deque
from typing import Deque, Iterator, Optional
def generate_next_strings(chars: str, base: str = "") -> Iterator[str]:
    # This function appends each letter of a given alphabet to the given base.
    # At its first run, it will generate all the single-length letters of the
    # alphabet, since the default base is the empty string.
    for c in chars:
        yield f"{base}{c}"
def generate_all_strings(chars: str, maxlen: Optional[int] = None) -> Iterator[str]:
    # We "seed" the stack with a generator. This generator will produce all the
    # single-length letters of the alphabet, as noted above.
    stack: Deque[Iterator[str]] = deque([generate_next_strings(chars)])
    # While there are still items (generators) in the stack...
    while stack:
        # ...pop the next one off for processing.
        next_strings: Iterator[str] = stack.popleft()
        # Take each item from the generator that we popped off,
        for string in next_strings:
            # and send it back to the caller. This is a single "result."
            yield string
            # If we're still generating strings -- that is, we haven't reached
            # our maximum length -- we add more generators to the stack for the
            # next length of strings.
            if maxlen is None or len(string) < maxlen:
                stack.append(generate_next_strings(chars, string))
You can try it using print("\n".join(generate_all_strings("abc", maxlen=5))).
The following code will give you all combinations with lengths between 1 and max_length - 1:
import itertools
combs = []
for i in range(1, max_length):
    c = [list(x) for x in itertools.combinations(using_characts, i)]
    combs.extend(c)
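Keep in mind that itertools.combinations never repeats a character and never reorders the alphabet, so it produces fewer strings than the itertools.product answers. A small comparison, using a three-letter alphabet chosen only for illustration:

```python
from itertools import combinations, product

alphabet = "abc"  # small example alphabet, chosen for illustration

# combinations: each position comes later in the alphabet than the previous one.
pairs_combinations = ["".join(p) for p in combinations(alphabet, 2)]
# product: every character is allowed at every position.
pairs_product = ["".join(p) for p in product(alphabet, repeat=2)]

print(pairs_combinations)  # ['ab', 'ac', 'bc']
print(pairs_product)       # ['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```

For a password-style enumeration such as a, b, ..., aa, ab, ..., ba, the product form is the one that matches the output asked for in the question.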

Is there an easy way to get the number of repeating character in a word?

I'm trying to get how many any character repeats in a word. The repetitions must be sequential.
For example, the method with input "loooooveee" should return 6 (4 times 'o', 2 times 'e').
I'm trying to implement string level functions and I can do it this way but, is there an easy way to do this? Regex, or some other sort of things?
Original question: order of repetition does not matter
You can subtract the number of unique letters from the total number of letters. Applying set to a string returns the collection of its unique letters.
x = "loooooveee"
res = len(x) - len(set(x)) # 6
Or you can use collections.Counter, subtract 1 from each value, then sum:
from collections import Counter
c = Counter("loooooveee")
res = sum(i-1 for i in c.values()) # 6
New question: repetitions must be sequential
You can use itertools.groupby to group sequential identical characters:
from itertools import groupby
g = groupby("aooooaooaoo")
res = sum(sum(1 for _ in j) - 1 for i, j in g) # 5
To avoid the nested sum calls, you can use itertools.islice:
from itertools import groupby, islice
g = groupby("aooooaooaoo")
res = sum(1 for _, j in g for _ in islice(j, 1, None)) # 5
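If the one-liners are hard to read, the same groupby idea can be split into two steps. This is a sketch with helper names (run_lengths, sequential_repeats) invented for the example:

```python
from itertools import groupby

def run_lengths(word):
    """Return (character, run length) pairs for consecutive identical characters."""
    return [(char, len(list(group))) for char, group in groupby(word)]

def sequential_repeats(word):
    """Count characters that repeat the character immediately before them."""
    return sum(length - 1 for _, length in run_lengths(word))

print(run_lengths("loooooveee"))          # [('l', 1), ('o', 5), ('v', 1), ('e', 3)]
print(sequential_repeats("loooooveee"))   # 6
print(sequential_repeats("aooooaooaoo"))  # 5
```

Materializing each group with list() costs a little memory per run, which is usually negligible for words.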
You could use a regular expression if you want:
import re
rx = re.compile(r'(\w)\1+')
repeating = sum(x[1] - x[0] - 1
                for m in rx.finditer("loooooveee")
                for x in [m.span()])
print(repeating)
This correctly yields 6 and makes use of the .span() function.
The expression is
(\w)\1+
which captures a word character (one of a-zA-Z0-9_ in ASCII text; Python 3 also treats other Unicode letters and digits as word characters) and tries to repeat it as often as possible.
See a demo on regex101.com for the repeating pattern.
If you want to match any character (that is, not only word characters), change your expression to:
(.)\1+
See another demo on regex101.com.
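Both patterns can be wrapped in one helper that uses the length of each match instead of .span(). This is a sketch; the function name and the any_char flag are assumptions made for the example:

```python
import re

def repeats_via_regex(text, any_char=False):
    """Count sequential repeats; any_char=True also counts non-word characters."""
    pattern = r'(.)\1+' if any_char else r'(\w)\1+'
    # Each match is a whole run, so its length minus one is the number of repeats.
    return sum(len(m.group(0)) - 1 for m in re.finditer(pattern, text))

print(repeats_via_regex("loooooveee"))            # 6
print(repeats_via_regex("a!!!b"))                 # 0
print(repeats_via_regex("a!!!b", any_char=True))  # 2
```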
Try this:
word = input('something:')
total = 0
chars = set(word) #get the set of unique characters
for item in chars: #iterate over the set and output the count for each item
    if word.count(item) > 1:
        total += word.count(item)
    print('{}|{}'.format(item, word.count(item)))
print('Total:' + str(total))
EDIT:
added total count of repetitions
Since it doesn't matter where the repetition is occurring or which characters are being repeated, you can make use of the set data structure provided in Python. It will discard the duplicate occurrences of any character or an object.
Therefore, the solution would look something like this:
def measure_normalized_emphasis(text):
    return len(text) - len(set(text))
This will give you the exact result.
Also, make sure to check edge cases, which is good practice anyway.
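Two edge cases worth checking are the empty string and characters that recur without being adjacent. The sketch below contrasts the set-based count with a count of adjacent repeats; both function names are made up for the comparison:

```python
from itertools import groupby

def total_repeats(text):
    """Repeats anywhere in the text (order does not matter)."""
    return len(text) - len(set(text))

def adjacent_repeats(text):
    """Repeats only where a character directly follows an identical one."""
    return sum(len(list(group)) - 1 for _, group in groupby(text))

for sample in ["loooooveee", "aooooaooaoo", ""]:
    print(repr(sample), total_repeats(sample), adjacent_repeats(sample))
```

The two agree on "loooooveee" (6 and 6) but differ on "aooooaooaoo" (9 and 5), so the set-based version only fits the original form of the question.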
I think your code is comparing the wrong things
You start by finding the last character:
char = text[-1]
Then you compare this to itself:
for i in range(1, len(text)):
    if text[-i] == char: #<-- surely this is text[-1] to begin with?
Why not just run through the characters:
def measure_normalized_emphasis(text):
    char = text[0]
    emphasis_size = 0
    for i in range(1, len(text)):
        if text[i] == char:
            emphasis_size += 1
        else:
            char = text[i]
    return emphasis_size
This seems to work.

Python strings: quickly summarize the character count in order of appearance

Let's say I have the following strings in Python3.x
string1 = 'AAAAABBBBCCCDD'
string2 = 'CCBADDDDDBACDC'
string3 = 'DABCBEDCCAEDBB'
I would like to create a summary "frequency string" that counts the number of characters in the string in the following format:
string1_freq = '5A4B3C2D' ## 5 A's, followed by 4 B's, 3 C's, and 2D's
string2_freq = '2C1B1A5D1B1A1C1D1C'
string3_freq = '1D1A1B1C1B1E1D2C1A1E1D2B'
My problem:
How would I quickly create such a summary string?
My idea would be: create an empty list to keep track of the count. Then create a for loop which checks the next character. If there's a match, increase the count by +1 and move to the next character. Otherwise, append to end of the string 'count' + 'character identity'.
That's very inefficient in Python. Is there a quicker way (maybe using the functions below)?
There are several ways to count the elements of a string in python. I like collections.Counter, e.g.
from collections import Counter
counter_str1 = Counter(string1)
print(counter_str1['A']) # 5
print(counter_str1['B']) # 4
print(counter_str1['C']) # 3
print(counter_str1['D']) # 2
There's also str.count(sub[, start[, end]]):
Return the number of non-overlapping occurrences of substring sub in
the range [start, end]. Optional arguments start and end are
interpreted as in slice notation.
As an example:
print(string1.count('A')) ## 5
The following code accomplishes the task without importing any modules.
def freq_map(s):
    num = 0 # number of adjacent, identical characters
    curr = s[0] # current character being processed
    result = '' # result of function
    for i in range(len(s)):
        if s[i] == curr:
            num += 1
        else:
            result += str(num) + curr
            curr = s[i]
            num = 1
    result += str(num) + curr
    return result
Note: Since you asked for performance, I suggest you use this code or a modified version of it.
I ran a rough performance test against the code provided by CoryKramer for reference. This code did the same job in 58% of the time, without importing any modules. The snippet can be found here.
I would use itertools.groupby to group consecutive runs of the same letter. Then use a generator expression within join to create a string representation of the count and letter for each run.
from itertools import groupby
def summarize(s):
    return ''.join(str(sum(1 for _ in i[1])) + i[0] for i in groupby(s))
Examples
>>> summarize(string1)
'5A4B3C2D'
>>> summarize(string2)
'2C1B1A5D1B1A1C1D1C'
>>> summarize(string3)
'1D1A1B1C1B1E1D2C1A1E1D2B'
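To compare the loop-based and groupby-based approaches on your own machine, a rough timeit harness along these lines could be used. The function names, the sample input and the repetition count are all arbitrary choices made for this sketch, and timings will vary between machines and Python versions:

```python
import timeit
from itertools import groupby

def freq_loop(s):
    """Run-length summary with an explicit loop (no imports needed)."""
    if not s:
        return ''
    parts = []
    curr, num = s[0], 0
    for ch in s:
        if ch == curr:
            num += 1
        else:
            parts.append(f"{num}{curr}")
            curr, num = ch, 1
    parts.append(f"{num}{curr}")
    return ''.join(parts)

def freq_groupby(s):
    """Run-length summary using itertools.groupby."""
    return ''.join(f"{sum(1 for _ in group)}{char}" for char, group in groupby(s))

sample = 'CCBADDDDDBACDC' * 1000  # arbitrary test input
assert freq_loop(sample) == freq_groupby(sample)

for func in (freq_loop, freq_groupby):
    seconds = timeit.timeit(lambda: func(sample), number=100)
    print(func.__name__, round(seconds, 3))
```

Collecting the pieces in a list and joining once avoids repeated string concatenation, which can matter for long inputs.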

How can I modify the following program so that I can make sure that each letter it replaces is unique?

I wrote a program that takes a list of 3-letter strings (also called codons, for those who know biology) and, for each string, picks one of the 3 positions at random and replaces the letter there with either A, G, C, or T (also at random). For example, for the string GCT it might pick the middle position, C, and randomly change it to T. The new string (or codon) generated would then be GTT, and so on for the next string in the list.
However, there is one problem. The way I have written it, nothing checks that the new string differs from the old one. If the program randomly chooses to change a letter to the same letter it already was, it outputs the same string, e.g. switching the C in GCT to a C again and producing GCT. I want to make sure this doesn't happen, because it does occur by chance when analyzing hundreds of thousands of these codons/strings. I tried to do this by using list(A, G, T, C) - codon[index] in the second line of my 'for' loop, but it didn't work.
I won't bother you with the entire code, but initially I just opened the file where my codons/strings are listed (in a column) and appended all of them into a list and named it 'codon'. Here's the remainder:
import random
def string_replace(s,index,char):
    return s[:index] + char + s[index+1:]
for x in range(1,10): # I set the range to 10 so that I can manually check if the program worked properly
    index = random.randrange(3)
    letter_to_replace = random.choice(list({"A", "G", "T", "C"} - {codon[index]}))
    mutated_codon = [string_replace(codon[x], index, letter_to_replace)]
    print(mutated_codon)
- {codon[index]}
--> this will be a whole 3-letter codon if codon is a list of 3-letter strings.
I think you want codon[x][index].
Edit the function so that it holds the set of bases itself and picks the replacement letter there, rather than down in the loop; the loop then only passes in the index to replace.
I don't know how you will create your list of codons, but here is one example:
import random
def string_replace(s, index):
    codonset = set(["A", "C", "G", "T"])
    toreplace = s[index]
    codonset.remove(toreplace)  # never pick the original letter again
    char = random.choice(list(codonset))  # random.choice needs a sequence, not a set
    return s[:index] + char + s[index+1:]
listofcodons = ["ATC", "AGT", "ACC"]
for s in listofcodons:
    index = random.randrange(3)
    mutated = string_replace(s, index)
    print(mutated)
So I was bored and decided to code golf it (could be shorter but was satisfied here).
from random import choice as c
a=['ACT','ATT','GCT'] # as many as you want
f=lambda s,i:s[:i]+c(list(set(['A','G','C','T'])-set(s[i])))+s[i+1:]
b=[f(s,c([0,1,2]))for s in a]
print(b)
a can be your list of codons and b will be a list of codons with a random index replaced by a random (never the same) letter.
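For readers who prefer the golfed lambda unrolled, a more readable sketch of the same idea follows. The names BASES and mutate_codon are invented here, and the sketch assumes every codon uses only the letters A, G, C and T:

```python
import random

BASES = "AGCT"  # assumed alphabet for all codons

def mutate_codon(codon):
    """Replace one randomly chosen position with a different base."""
    index = random.randrange(len(codon))
    # Exclude the current letter so the result always differs from the input.
    choices = [base for base in BASES if base != codon[index]]
    return codon[:index] + random.choice(choices) + codon[index + 1:]

codons = ['ACT', 'ATT', 'GCT']  # as many as you want
mutated = [mutate_codon(c) for c in codons]
print(mutated)
```

Because the original letter is removed from the choices up front, no retry loop is needed.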
OK, to answer your new question:
from random import choice as c
codons = ['ACT','ATT','GCT']
f=lambda s,i:s[:i]+c(list(set(['A','G','C','T'])-set(s[i])))+s[i+1:]
mutated_codons = [f(s,c([0,1,2]))for s in codons]
for codon in mutated_codons:
    try:
        print(codon, codon_lookup[codon])
    except KeyError as e:
        print(e)
Assuming your dictionary is called codon_lookup, this will print each mutated codon followed by its amino acid lookup. Your old code was looping over the letters in each mutated codon instead of looping through a list of codons like you intended.
You could use a while loop:
import random
mutated_codon = codon = 'GCT'
while mutated_codon == codon:
    li = list(mutated_codon)
    li[random.choice([0,1,2])] = random.choice(["A", "G", "T", "C"])
    mutated_codon = ''.join(li)
print(codon, mutated_codon)
How about something like this?
#!/usr/local/cpython-3.3/bin/python
import random
def yield_mutated_codons(codon):
    possible_set = {"A", "G", "T", "C"}
    for x in range(1, 10):
        index = random.randrange(3)
        letter_to_replace = codon[index]
        current_set = possible_set - set(letter_to_replace)
        codon[index] = random.choice(list(current_set))
        yield ''.join(codon)
def main():
    codon = list('GAT')
    for mutated_codon in yield_mutated_codons(codon):
        print(mutated_codon)
main()

Python - build new string of specific length with n replacements from specific alphabet

I have been working on a fast, efficient way to solve the following problem, but so far I have only been able to solve it with a rather slow nested-loop solution. Anyway, here is the description:
I have a string of length L, let's say 'BBBX'. I want to find all possible strings of length L, starting from 'BBBX', that differ from it in at most 2 positions and at least 0 positions. On top of that, when building the new strings, the new characters must be selected from a specific alphabet.
I guess the size of the alphabet doesn't matter, so let's say in this case the alphabet is ['B', 'G', 'C', 'X'].
So, some sample output would be, 'BGBG', 'BGBC', 'BBGX', etc. For this example with a string of length 4 with up to 2 substitutions, my algorithm finds 67 possible new strings.
I have been trying to use itertools to solve this problem, but I am having a bit of difficulty finding a solution. I tried using itertools.combinations(range(4), 2) to find all the possible positions. I am then thinking of using product() from itertools to build all of the possibilities, but I am not sure how to connect it to the indices produced by combinations().
Here's my solution.
The first for loop tells us how many replacements we will perform. (0, 1 or 2 - we go through each)
The second loop tells us which letters we will change (by their indexes).
The third loop goes through all of the possible letter changes for those indexes. There's some logic to make sure we actually change the letter (changing "C" to "C" doesn't count).
import itertools
def generate_replacements(lo, hi, alphabet, text):
    for count in range(lo, hi + 1):
        for indexes in itertools.combinations(range(len(text)), count):
            for letters in itertools.product(alphabet, repeat=count):
                new_text = list(text)
                actual_count = 0
                for index, letter in zip(indexes, letters):
                    if new_text[index] == letter:
                        continue
                    new_text[index] = letter
                    actual_count += 1
                if actual_count == count:
                    yield ''.join(new_text)
for text in generate_replacements(0, 2, 'BGCX', 'BBBX'):
    print(text)
Here's its output:
BBBX GBBX CBBX XBBX BGBX BCBX BXBX BBGX BBCX BBXX BBBB BBBG BBBC GGBX
GCBX GXBX CGBX CCBX CXBX XGBX XCBX XXBX GBGX GBCX GBXX CBGX CBCX CBXX
XBGX XBCX XBXX GBBB GBBG GBBC CBBB CBBG CBBC XBBB XBBG XBBC BGGX BGCX
BGXX BCGX BCCX BCXX BXGX BXCX BXXX BGBB BGBG BGBC BCBB BCBG BCBC BXBB
BXBG BXBC BBGB BBGG BBGC BBCB BBCG BBCC BBXB BBXG BBXC
Not tested much, but it does find 67 for the example you gave. The easy way to connect the indices to the products is via zip():
def sub(s, alphabet, minsubs, maxsubs):
    from itertools import combinations, product
    origs = list(s)
    alphabet = set(alphabet)
    for nsubs in range(minsubs, maxsubs + 1):
        for ix in combinations(range(len(s)), nsubs):
            prods = [alphabet - set(origs[i]) for i in ix]
            s = origs[:]
            for newchars in product(*prods):
                for i, char in zip(ix, newchars):
                    s[i] = char
                yield "".join(s)
count = 0
for s in sub('BBBX', 'BGCX', 0, 2):
    count += 1
    print(s)
print(count)
Note: the major difference from FogleBird's is that I posted first - LOL ;-) The algorithms are very similar. Mine constructs the inputs to product() so that no substitution of a letter for itself is ever attempted; FogleBird's allows "identity" substitutions, but counts how many valid substitutions are made and then throws the result away if any identity substitutions occurred. On longer words and a larger number of substitutions, that can be much slower (potentially the difference between len(alphabet)**nsubs and (len(alphabet)-1)**nsubs times around the ... in product(): loop).
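The expected number of results can also be checked with a closed-form count, which is handy for testing either implementation. The sketch assumes every character of the original string belongs to the alphabet, so each chosen position has exactly len(alphabet) - 1 valid replacements. The name count_variants is made up, and math.comb needs Python 3.8 or later:

```python
from math import comb  # available from Python 3.8

def count_variants(length, alphabet_size, min_subs, max_subs):
    """Count strings differing from the original in min_subs..max_subs positions."""
    # Choose k positions, then one of (alphabet_size - 1) new letters for each.
    return sum(comb(length, k) * (alphabet_size - 1) ** k
               for k in range(min_subs, max_subs + 1))

print(count_variants(4, 4, 0, 2))  # 67
```

For the example this is 1 + 4·3 + 6·9 = 67, matching the count reported in the question.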
