I want to generate 12-character hexadecimal strings, but with no more than two identical digits in a row: 00 is allowed, 000 is not.
I know how to generate all possibilities, from 000000000000 to FFFFFFFFFFFF, but I won't use all of those values, and the file generated with all possibilities is many GB in size, so I want to reduce it by skipping the strings that are not useful.
So my goal is to get results like 00A300BF8911 and not like 000300BF8911.
Could you please help me to do so?
Many thanks in advance!
If the same digit was just picked twice in a row, remove it from the choices for one round:
import random
hex_digits = '0123456789ABCDEF'
result = ""
for digit in range(12):
    # after two identical digits in a row, exclude that digit for this pick
    if len(result) >= 2 and result[-1] == result[-2]:
        pick_from = hex_digits.replace(result[-1], '')
    else:
        pick_from = hex_digits
    result += random.choice(pick_from)
print(result)
Since the title mentions generators, here's the above as a generator:
import random
hex_digits = '0123456789ABCDEF'
def hexGen():
    while True:
        result = ""
        for digit in range(12):
            # after two identical digits in a row, exclude that digit for this pick
            if len(result) >= 2 and result[-1] == result[-2]:
                pick_from = hex_digits.replace(result[-1], '')
            else:
                pick_from = hex_digits
            result += random.choice(pick_from)
        yield result
my_hex_gen = hexGen()
counter = 0
for result in my_hex_gen:
    print(result)
    counter += 1
    if counter > 10:
        break
Results:
1ECC6A83EB14
D0897DE15E81
9C3E9028B0DE
CE74A2674AF0
9ECBD32C003D
0DF2E5DAC0FB
31C48E691C96
F33AAC2C2052
CD4CEDADD54D
40A329FF6E25
5F5D71F823A4
You could also change the while True loop so that it only produces a certain number of these, based on a number passed into the function.
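A minimal sketch of that idea, assuming a hypothetical `count` parameter (the names `hex_gen`, `count` and `length` are illustrative, not from the answer above) and the rule that a digit is excluded only when it already ends the string twice in a row:

```python
import random

HEX_DIGITS = "0123456789ABCDEF"

def hex_gen(count, length=12):
    """Yield `count` random hex strings with no digit three times in a row."""
    for _ in range(count):
        result = ""
        for _ in range(length):
            # Exclude a digit only when it already ends the string twice.
            if len(result) >= 2 and result[-1] == result[-2]:
                choices = HEX_DIGITS.replace(result[-1], "")
            else:
                choices = HEX_DIGITS
            result += random.choice(choices)
        yield result

for value in hex_gen(3):
    print(value)
```

Because the generator is finite, the caller no longer needs a counter and a break.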
I interpret this question as, "I want to construct a rainbow table by iterating through all strings that have the following qualities. The string has a length of 12, contains only the characters 0-9 and A-F, and it never has the same character appearing three times in a row."
def iter_all_strings_without_triplicates(size, last_two_digits = (None, None)):
    a,b = last_two_digits
    if size == 0:
        yield ""
    else:
        for c in "0123456789ABCDEF":
            if a == b == c:
                continue
            else:
                for rest in iter_all_strings_without_triplicates(size-1, (b,c)):
                    yield c + rest
for s in iter_all_strings_without_triplicates(12):
    print(s)
Result:
001001001001
001001001002
001001001003
001001001004
001001001005
001001001006
001001001007
001001001008
001001001009
00100100100A
00100100100B
00100100100C
00100100100D
00100100100E
00100100100F
001001001010
001001001011
...
Note that the output will still contain roughly 2.7 × 10^14 strings (about 96% of all 16^12 possibilities, i.e. several petabytes as text), so you aren't saving much room compared to just saving every single string, triplicates or not.
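To size the output without generating anything, one option is a small counting recurrence. This is a sketch with a hypothetical helper name, `count_without_triplicates`; it tracks how many valid strings end in a run of one symbol and how many end in a run of two:

```python
def count_without_triplicates(length, alphabet_size=16):
    """Count strings of `length` symbols with no symbol three times in a row."""
    if length == 0:
        return 1
    # single: valid strings ending in a run of exactly one symbol
    # double: valid strings ending in a run of exactly two identical symbols
    single, double = alphabet_size, 0
    for _ in range(length - 1):
        # A different symbol may follow any valid string; the same symbol
        # may only follow a string that ends in a run of one.
        single, double = (single + double) * (alphabet_size - 1), single
    return single + double

valid = count_without_triplicates(12)
total = 16 ** 12
print(valid, total, valid / total)
```

The ratio printed at the end shows how small the saving from dropping triplicates is.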
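import string, random
source = string.hexdigits[:16]
result = ''
while len(result) < 12:
    # randrange excludes the upper bound, so idx is always a valid index
    idx = random.randrange(len(source))
    # accept the digit unless it would extend a run of two identical digits
    if len(result) < 2 or result[-1] != result[-2] or result[-1] != source[idx]:
        result += source[idx]
print(result)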
You could take a random sequence from a list that contains each hexadecimal digit twice, so no digit can occur more than twice in the whole string:
import random
digits = list('1234567890ABCDEF') * 2
random.shuffle(digits)
hex_number = ''.join(digits[:12])
If you wanted to allow shorter sequences, you could randomize that too, and left fill the blanks with zeros.
import random
digits = list('1234567890ABCDEF') * 2
random.shuffle(digits)
num_digits = random.randrange(3, 13)
hex_number = ''.join(['0'] * (12-num_digits)) + ''.join(digits[:num_digits])
print(hex_number)
You could use a generator that slides a window over the strings your current implementation yields, something like (hex_str[i:i + window_size] for i in range(len(hex_str) - window_size + 1)) with window_size = 3. Using len and set you could count the number of distinct characters in each slice, although in your case it might be easier to just compare the 3 characters directly.
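The window idea can be sketched as follows. The function name `has_triple` is an assumption for illustration; a window made of one repeated character is detected by its set having size 1:

```python
def has_triple(hex_str, window_size=3):
    """Return True if some window of `window_size` characters is one repeated character."""
    windows = (hex_str[i:i + window_size]
               for i in range(len(hex_str) - window_size + 1))
    return any(len(set(window)) == 1 for window in windows)

print(has_triple("000300BF8911"))  # True
print(has_triple("00A300BF8911"))  # False
```

Filtering an existing generator is then a matter of skipping every string for which `has_triple` returns True.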
You can create a list of the values 0 to 255 and use random.sample on that list to pick your values.
I am having difficulty with the task of printing the numbers in a given range that contain only odd digits.
For example, the first number is 2345 and the second number is 6789. There is one more thing: each digit of the printed numbers is limited to the range given by the digits at the same position, i.e. 2 to 6 (3, 5), 3 to 7 (3, 5, 7), 4 to 8 (5, 7), 5 to 9 (5, 7, 9). So the first numbers should be 3355, 3357, 3359, 3375, 3377, 3379, 3555, 3557, ...
My code does not produce the output the way it should look:
number_one=int(input())
number_two=int(input())
list_one=[]
list_two=[]
number_one=str(number_one)
number_two=str(number_two)
for i in number_one:
    if int(i)==0 or int(i)%2==0:
        i=int(i)+1
    list_one.append(int(i))
for i in number_two:
    list_two.append(int(i))
a=0
b=0
c=0
d=0
for j in range(list_one[0],list_two[0]+1):
    if j%2==1:
        a=j
    for p in range(list_one[1],list_two[1]+1):
        if p%2==1:
            b=p
        for x in range(list_one[2],list_two[2]+1):
            if x%2==1:
                c=x
            for y in range(list_one[3],list_two[3]+1):
                if y%2==1:
                    d=y
                print(f"{a}{b}{c}{d}",end=" ")
There are a lot of repetitions in the output that I would like to avoid.
Thank you in advance!
This may not be an optimal solution, but it works for positive integers of the same length.
number_one=int(input())
number_two=int(input())
if len(str(number_one)) != len(str(number_two)):
    raise Exception("numbers should be of same length")
def print_num(num_one, num_two):
    res = []
    for i,j in zip(num_one, num_two):
        next_odd_for_i = int(i) + (not (int(i)%2))
        prev_odd_for_j = int(j) - (not (int(j)%2))
        temp_str = ""
        for i_next in range(next_odd_for_i, prev_odd_for_j+1, 2):
            temp_str += str(i_next)
        res.append(temp_str)
    return res
def print_perm(li_of_str):
    if len(li_of_str) == 1:
        return [li_of_str[-1]]
    res = []
    first = li_of_str[0]
    for j in first:
        tmp = [j+k for n in print_perm(li_of_str[1:]) for k in n ]
        res.append(tmp)
    return res
print(print_num(str(number_one), str(number_two)))
print(print_perm(print_num(str(number_one), str(number_two))))
One way to solve this problem is with recursion. This function takes in two strings representing numbers and returns all the odd numbers (as strings) that satisfy the conditions you specified:
def odd_digits(num1, num2):
    # split off first digit of string
    msd1, rest1 = int(num1[0]), num1[1:]
    # make the digit odd if required
    msd1 += msd1 % 2 == 0
    # split off first digit of string
    msd2, rest2 = int(num2[0]), num2[1:]
    # make the digit odd if required
    msd2 -= msd2 % 2 == 0
    # if no more digits, just return the values between msd1 and msd2
    if not rest1:
        return [str(i) for i in range(msd1, msd2+1, 2)]
    # otherwise, append the results of a recursive call to each
    # odd digit between msd1 and msd2
    result = []
    for i in range(msd1, msd2+1, 2):
        result += [str(i) + o for o in odd_digits(rest1, rest2)]
    return result
print(odd_digits('2345', '6789'))
Output:
[
'3355', '3357', '3359',
'3375', '3377', '3379',
'3555', '3557', '3559',
'3575', '3577', '3579',
'3755', '3757', '3759',
'3775', '3777', '3779',
'5355', '5357', '5359',
'5375', '5377', '5379',
'5555', '5557', '5559',
'5575', '5577', '5579',
'5755', '5757', '5759',
'5775', '5777', '5779'
]
If you want to use integer values just use (for example)
print(list(map(int, odd_digits(str(2345), str(6789)))))
The output will be as above but all values will be integers rather than strings.
If you can use libraries, you can generate ranges for each digit and then use itertools.product to find all the combinations:
import itertools
def odd_digits(num1, num2):
    ranges = []
    for d1, d2 in zip(num1, num2):
        d1 = int(d1) + (int(d1) % 2 == 0)
        d2 = int(d2) - (int(d2) % 2 == 0)
        ranges.append(list(range(d1, d2+1, 2)))
    return [''.join(map(str, t)) for t in itertools.product(*ranges)]
This function takes string inputs and produces string outputs, which will be the same as the first function above.
Goal: I have a string which usually looks like "010", and I need to replace the zeros with 1 in all possible ways, like this: ["010", "110", "111", "011"]
Problem: when I replace the zeros with 1s, I iterate through the characters of the string from left to right and then from right to left, as you can see in the code where I did number = number[::-1]. This method does not actually cover all the possibilities.
I may also need to start from the middle, or maybe use a permutation method, but I am not sure how to apply that in Python.
mathematically there is something like factorial of the number of places/(2)!
A = '0111011110000'
B = '010101'
C = '10000010000001101'
my_list = [A,B,C]
for number in [A,B,C]:
    number = number[::-1]
    for i , n in enumerate(number):
        number = list(number)
        number[i] = '1'
        number = ''.join(number)
        if number not in my_list: my_list.append(number)
for number in [A,B,C]:
    for i , n in enumerate(number):
        number = list(number)
        number[i] = '1'
        number = ''.join(number)
        if number not in my_list: my_list.append(number)
print(len(my_list))
print(my_list)
You can separate out the zeros and then use itertools.product:
from itertools import product
x = '0011'
perm_elements = [('0', '1') if digit == '0' else ('1', ) for digit in x]
print([''.join(x) for x in product(*perm_elements)])
['0011', '0111', '1011', '1111']
If you only need the number of such combinations, and not the list itself - that should just be 2 ** x.count('0')
You will probably get other answers with a traditional implementation of combinations over fixed indexes, but since we're working with just "0" and "1", you can use the following hack:
source = "010100100001100011"
pattern = source.replace("0", "{}")
count = source.count("0")
combinations = [pattern.format(*f"{i:0{count}b}") for i in range(1 << count)]
Basically, we count the zeros in source, then iterate over a range whose limit is 2 to the power of that count (1 << count), and unpack the binary form of every number into the pattern.
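To make the intermediate values visible, here is the same technique unrolled on the short string "010" from the question. This is only an illustrative walk-through; the explicit loop and the `bits` variable are not part of the answer's code:

```python
source = "010"
pattern = source.replace("0", "{}")   # "{}1{}"
count = source.count("0")             # 2 zeros to fill

combinations = []
for i in range(1 << count):           # i = 0, 1, 2, 3
    bits = f"{i:0{count}b}"           # "00", "01", "10", "11"
    # Each character of `bits` fills one "{}" placeholder, left to right.
    combinations.append(pattern.format(*bits))

print(combinations)  # ['010', '011', '110', '111']
```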
It should be slightly faster if we predefine pattern for binary transformation too:
source = "010100100001100011"
pattern = source.replace("0", "{}")
count = source.count("0")
fmt = f"{{:0{count}b}}"
result = [pattern.format(*fmt.format(i)) for i in range(1 << count)]
Update: It's not clear whether you need to generate all possible combinations or just count them, so originally I provided code to generate them. If you look closely at my method, I get the number of all possible combinations using 1 << count, where count is the number of '0' characters in the source string. So if you only need the number, the code is:
source = "010100100001100011"
number_of_combinations = 1 << source.count("0")
Alternatively, you can use 2 ** source.count("0"), but exponentiation is generally slower than a bit shift, so I'd recommend the option I originally suggested.
We can also use a recursive solution for this problem: we iterate over the string, and whenever we see a "0" we change it to "1" and start another branch on the new string:
s = "010100100001100011"
def perm(s, i=0, result=[]):
    if i < len(s):
        if s[i] == "0":
            t = s[:i]+"1"+s[i+1:]
            result.append(t)
            perm(t, i+1, result)
        perm(s, i+1, result)
res = [s]
perm(s, 0, res)
print(res)
For each position in the string that has a zero, you can either replace it with a 1 or not. This creates the combinations. So you can progressively build the resulting list of strings by adding the replacements of each '0' position with a '1' based on the previous replacement results:
def zeroTo1(S):
    result = [S] # start with no replacement
    for i,b in enumerate(S):
        if b != '0': continue # only for '0' positions
        result += [r[:i]+'1'+r[i+1:] for r in result] # add replacements
    return result
print(zeroTo1('010'))
['010', '110', '011', '111']
If you're allowed to use libraries, the product function from itertools can be used to combine the zero replacements directly for you:
from itertools import product
def zeroTo1(S):
    return [*map("".join,product(*("01"[int(b):] for b in S)))]
The tuples of 1s and 0s generated by the product function are assembled into individual strings by mapping the string join function onto its output.
Based on your objective you can do this to obtain the expected results.
A = '0111011110000'
B = '010'
C = '10000010000001101'
my_list = [A, B, C]
new_list = []
for key, number in enumerate(my_list):
    for key_item, num in enumerate(number):
        item_list = [i for i in number]
        item_list[key_item] = "1"
        new_list.append(''.join(item_list))
print(len(new_list))
print(new_list)
yamxxopd
yndfyamxx
Output: 5
I am not sure how to find the length of the longest run of characters shared between two strings. For example, for the strings above the longest shared run is "yamxx", which is 5 characters long.
xx would not be a solution because it is not the longest shared run. Here the longest is yamxx, so the output would be 5.
I am quite new to Python and Stack Overflow, so any help would be much appreciated!
Note: the characters should be in the same order in both strings.
Here is a simple, efficient solution using dynamic programming.
def longest_substring(X, Y):
    m,n = len(X), len(Y)
    # LCSuff[i][j] is the length of the longest common suffix of X[:i] and Y[:j]
    LCSuff = [[0 for k in range(n+1)] for l in range(m+1)]
    result = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if (i == 0 or j == 0):
                LCSuff[i][j] = 0
            elif (X[i-1] == Y[j-1]):
                LCSuff[i][j] = LCSuff[i-1][j-1] + 1
                result = max(result, LCSuff[i][j])
            else:
                LCSuff[i][j] = 0
    print (result )
longest_substring("abcd", "arcd") # prints 2
longest_substring("yammxdj", "nhjdyammx") # prints 5
This solution starts with sub-strings of longest possible lengths. If, for a certain length, there are no matching sub-strings of that length, it moves on to the next lower length. This way, it can stop at the first successful match.
s_1 = "yamxxopd"
s_2 = "yndfyamxx"
l_1, l_2 = len(s_1), len(s_2)
found = False
sub_length = l_1 # Let's start with the longest possible sub-string
while (not found) and sub_length: # Loop, over decreasing lengths of sub-string
    for start in range(l_1 - sub_length + 1): # Loop, over all start-positions of sub-string
        sub_str = s_1[start:(start+sub_length)] # Get the sub-string at that start-position
        if sub_str in s_2: # If found a match for the sub-string, in s_2
            found = True # Stop trying with smaller lengths of sub-string
            break # Stop trying with this length of sub-string
    else: # If no matches found for this length of sub-string
        sub_length -= 1 # Let's try a smaller length for the sub-strings
print (f"Answer is {sub_length}" if found else "No common sub-string")
Output:
Answer is 5
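If the standard library is acceptable, `difflib.SequenceMatcher` offers a different route to the same number. This is a swapped-in technique, not what the answers above implement; the sketch assumes the two example strings and passes `autojunk=False` so that no characters are treated as junk:

```python
from difflib import SequenceMatcher

s_1 = "yamxxopd"
s_2 = "yndfyamxx"

# autojunk=False disables the heuristic that ignores very frequent characters.
matcher = SequenceMatcher(None, s_1, s_2, autojunk=False)
match = matcher.find_longest_match(0, len(s_1), 0, len(s_2))

print(match.size)                          # 5
print(s_1[match.a:match.a + match.size])   # yamxx
```

`match.a` and `match.b` give the start positions of the shared block in each string, which is handy when the substring itself is needed and not only its length.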
s1 = "yamxxopd"
s2 = "yndfyamxx"
# initializing counter
counter = 0
# build a string of the distinct characters of s1
s = ""
for x in s1:
    if x not in s:
        s = s + x
for x in s:
    if x in s2:
        counter = counter + 1
# display how many distinct characters of s1 also occur in s2
print(counter) # display 5
I have a version number in a file like this:
Testing x.x.x.x
So I am grabbing it off like this:
import re
def increment(match):
    # convert the four matches to integers
    a,b,c,d = [int(x) for x in match.groups()]
    # return the replacement string
    return f'{a}.{b}.{c}.{d}'
lines = open('file.txt', 'r').readlines()
lines[3] = re.sub(r"\b(\d+)\.(\d+)\.(\d+)\.(\d+)\b", increment, lines[3])
I want it so that if the last digit is a 9, it changes to 0 and the previous digit is incremented by 1. So 1.1.1.9 changes to 1.1.2.0.
I did that by doing:
def increment(match):
    # convert the four matches to integers
    a,b,c,d = [int(x) for x in match.groups()]
    # return the replacement string
    if (d == 9):
        return f'{a}.{b}.{c+1}.{0}'
    elif (c == 9):
        return f'{a}.{b+1}.{0}.{0}'
    elif (b == 9):
        return f'{a+1}.{0}.{0}.{0}'
The issue occurs when it's 1.1.9.9 or 1.9.9.9, where multiple digits need to roll over. How can I handle this?
Use integer addition?
def increment(match):
    # convert the four matches to integers
    a,b,c,d = [int(x) for x in match.groups()]
    # add 1 to the combined number; zfill keeps at least four digits for unpacking
    *a,b,c,d = [int(x) for x in str(a*1000 + b*100 + c*10 + d + 1).zfill(4)]
    a = ''.join(map(str,a)) # fix for 2 digit 'a'
    # return the replacement string
    return f'{a}.{b}.{c}.{d}'
If no component of your version is ever going to go beyond 9, it is better to just convert it to an integer, increment it and then convert it back to a string.
This allows you to go up to as many version numbers as you require and you are not limited to thousands.
def increment(match):
    # join the four captured groups into one digit string, e.g. '1199'
    digits = ''.join(match.groups())
    # increment as an integer, keeping any leading zeros
    incremented = str(int(digits) + 1).zfill(len(digits))
    # put the dots back between the digits
    return '.'.join(incremented)
Add 1 to the last element. If it's more than 9, set it to 0 and do the same for the previous element. Repeat as necessary:
import re
def increment(match):
    # convert the four matches to integers
    g = [int(x) for x in match.groups()]
    # increment, last one first
    pos = len(g)-1
    g[pos] += 1
    while pos > 0:
        if g[pos] > 9:
            g[pos] = 0
            pos -= 1
            g[pos] += 1
        else:
            break
    # return the replacement string
    return '.'.join(str(x) for x in g)
print (re.sub(r"\b(\d+)\.(\d+)\.(\d+)\.(\d+)\b", increment, '1.8.9.9'))
print (re.sub(r"\b(\d+)\.(\d+)\.(\d+)\.(\d+)\b", increment, '1.9.9.9'))
print (re.sub(r"\b(\d+)\.(\d+)\.(\d+)\.(\d+)\b", increment, '9.9.9.9'))
Result:
1.9.0.0
2.0.0.0
10.0.0.0
I would like to create a program that generates a particular 7-character string.
It must follow these rules:
0-9 are before a-z which are before A-Z
Length is 7 characters.
Each character must be different from its neighbours (for example, 'NN' is not allowed)
I need all the possible combinations, incrementing from 0000000 to ZZZZZZZ, not in a random sequence
I have already done it with this code:
from string import digits, ascii_uppercase, ascii_lowercase
from itertools import product
chars = digits + ascii_lowercase + ascii_uppercase
for n in range(7, 8):
    for comb in product(chars, repeat=n):
        if (comb[6] != comb[5] and comb[5] != comb[4] and comb[4] != comb[3] and comb[3] != comb[2] and comb[2] != comb[1] and comb[1] != comb[0]):
            print(''.join(comb))
But it is not performant at all, because I have to wait a long time before the next combination.
Can someone help me?
Edit: I've updated the solution to use cached short sequences for lengths greater than 4. This significantly speeds up the calculations. With the simple version, it'd take 18.5 hours to generate all sequences of length 7, but with the new method only 4.5 hours.
I'll let the docstring do all of the talking for describing the solution.
"""
Problem:
Generate a string of N characters that only contains alphanumerical
characters. The following restrictions apply:
* 0-9 must come before a-z, which must come before A-Z
* it's valid to not have any digits or letters in a sequence
* no neighbouring characters can be the same
* the sequences must be in an order as if the string is base62, e.g.,
01010...01019, 0101a...0101z, 0101A...0101Z, 01020...etc
Solution:
Implement a recursive approach which discards invalid trees. For example,
for "---" start with "0--" and recurse. Try "00-", but discard it for
"01-". The first and last sequences would then be "010" and "ZYZ".
If the previous character in the sequence is a lowercase letter, such as
in "02f-", shrink the pool of available characters to a-zA-Z. Similarly,
for "9gB-", we should only be working with A-Z.
The input also allows to define a specific sequence to start from. For
example, for "abGH", each character will have access to a limited set of
its pool. In this case, the last letter can iterate from H to Z, at which
point it'll be free to iterate its whole character pool next time around.
When specifying a starting sequence, if it doesn't have enough characters
compared to `length`, it will be padded to the right with characters free
to explore their character pool. For example, for length 4, the starting
sequence "29" will be transformed to "29 ", where we will deal with two
restricted characters temporarily.
For long lengths the function internally calls a routine which relies on
fewer recursions and cached results. Length 4 has been chosen as optimal
in terms of precomputing time and memory demands. Briefly, the sequence is
broken into a remainder and chunks of 4. For each preceding valid
subsequence, all valid following subsequences are fetched. For example, a
sequence of six would be split into "--|----" and for "fB|----" all
subsequences of 4 starting A, C, D, etc would be produced.
Examples:
>>> for i, x in enumerate(generate_sequences(7)):
... print i, x
0, 0101010
1, 0101012
etc
>>> for i, x in enumerate(generate_sequences(7, '012abcAB')):
... print i, x
0, 012abcAB
1, 012abcAC
etc
>>> for i, x in enumerate(generate_sequences(7, 'aB')):
... print i, x
0, aBABABA
1, aBABABC
etc
"""
import string
ALLOWED_CHARS = (string.digits + string.ascii_letters,
                 string.ascii_letters,
                 string.ascii_uppercase,
                 )
CACHE_LEN = 4
def _generate_sequences(length, sequence, previous=''):
    char_set = ALLOWED_CHARS[previous.isalpha() * (2 - previous.islower())]
    if sequence[-length] != ' ':
        char_set = char_set[char_set.find(sequence[-length]):]
        sequence[-length] = ' '
    char_set = char_set.replace(previous, '')
    if length == 1:
        for char in char_set:
            yield char
    else:
        for char in char_set:
            for seq in _generate_sequences(length-1, sequence, char):
                yield char + seq
def _generate_sequences_cache(length, sequence, cache, previous=''):
    sublength = length if length == CACHE_LEN else min(CACHE_LEN, length-CACHE_LEN)
    subseq = cache[sublength != CACHE_LEN]
    char_set = ALLOWED_CHARS[previous.isalpha() * (2 - previous.islower())]
    if sequence[-length] != ' ':
        char_set = char_set[char_set.find(sequence[-length]):]
        index = len(sequence) - length
        subseq0 = ''.join(sequence[index:index+sublength]).strip()
        sequence[index:index+sublength] = [' '] * sublength
        if len(subseq0) > 1:
            subseq[char_set[0]] = tuple(
                s for s in subseq[char_set[0]] if s.startswith(subseq0))
    char_set = char_set.replace(previous, '')
    if length == CACHE_LEN:
        for char in char_set:
            for seq in subseq[char]:
                yield seq
    else:
        for char in char_set:
            for seq1 in subseq[char]:
                for seq2 in _generate_sequences_cache(
                        length-sublength, sequence, cache, seq1[-1]):
                    yield seq1 + seq2
def precompute(length):
    char_set = ALLOWED_CHARS[0]
    if length > 1:
        sequence = [' '] * length
        result = {}
        for char in char_set:
            result[char] = tuple(char + seq for seq in _generate_sequences(
                length-1, sequence, char))
    else:
        result = {char: tuple(char) for char in ALLOWED_CHARS[0]}
    return result
def generate_sequences(length, sequence=''):
    # -------------------------------------------------------------------------
    # Error checking: consistency of the value/type of the arguments
    if not isinstance(length, int):
        msg = 'The sequence length must be an integer: {}'
        raise TypeError(msg.format(type(length)))
    if length < 0:
        msg = 'The sequence length must be greater or equal than 0: {}'
        raise ValueError(msg.format(length))
    if not isinstance(sequence, str):
        msg = 'The sequence must be a string: {}'
        raise TypeError(msg.format(type(sequence)))
    if len(sequence) > length:
        msg = 'The sequence has length greater than {}'
        raise ValueError(msg.format(length))
    # -------------------------------------------------------------------------
    if not length:
        yield ''
    else:
        # ---------------------------------------------------------------------
        # Error checking: the starting sequence, if provided, must be valid
        if any(s not in ALLOWED_CHARS[0]+' ' for s in sequence):
            msg = 'The sequence contains invalid characters: {}'
            raise ValueError(msg.format(sequence))
        if sequence.strip() != sequence.replace(' ', ''):
            msg = 'Uninitiated characters in the middle of the sequence: {}'
            raise ValueError(msg.format(sequence.strip()))
        sequence = sequence.strip()
        if any(a == b for a, b in zip(sequence[:-1], sequence[1:])):
            msg = 'No neighbours must be the same character: {}'
            raise ValueError(msg.format(sequence))
        char_type = [s.isalpha() * (2 - s.islower()) for s in sequence]
        if char_type != sorted(char_type):
            msg = '0-9 must come before a-z, which must come before A-Z: {}'
            raise ValueError(msg.format(sequence))
        # ---------------------------------------------------------------------
        sequence = list(sequence.ljust(length))
        if length <= CACHE_LEN:
            for s in _generate_sequences(length, sequence):
                yield s
        else:
            remainder = length % CACHE_LEN
            if not remainder:
                cache = tuple((precompute(CACHE_LEN),))
            else:
                cache = tuple((precompute(CACHE_LEN), precompute(remainder)))
            for s in _generate_sequences_cache(length, sequence, cache):
                yield s
I've included thorough error checks in the generate_sequences() function. For the sake of brevity you can remove them if you can guarantee that whoever calls the function will never do so with invalid input. Specifically, invalid starting sequences.
Counting number of sequences of specific length
While the function generates the sequences one by one, there is a simple combinatorial calculation we can perform to compute how many valid sequences exist in total.
The sequences can effectively be broken down to 3 separate subsequences. Generally speaking, a sequence can contain anything from 0 to 7 digits, followed by from 0 to 7 lowercase letters, followed by from 0 to 7 uppercase letters. As long as the sum of those is 7. This means we can have the partition (1, 3, 3), or (2, 1, 3), or (6, 0, 1), etc. We can use the stars and bars to calculate the various combinations of splitting a sum of N into k bins. There is already an implementation for python, which we'll borrow. The first few partitions are:
[0, 0, 7]
[0, 1, 6]
[0, 2, 5]
[0, 3, 4]
[0, 4, 3]
[0, 5, 2]
[0, 6, 1]
...
Next, we need to calculate how many valid sequences we have within a partition. Since the digit subsequences are independent of the lowercase letters, which are independent of the uppercase letters, we can calculate them individually and multiply them together.
So, how many digit combinations can we have for a length of 4? The first character can be any of the 10 digits, but the second character has only 9 options (ten minus the one used by the previous character). The same holds for the third character and so on, so the total number of valid subsequences is 10*9*9*9. Similarly, for letters with length 3 we get 26*25*25. Overall, for the partition (2, 3, 2), say, we have 10*9*26*25*25*26*25 = 950625000 combinations.
import itertools as it
def partitions(n, k):
    for c in it.combinations(range(n+k-1), k-1):
        yield [b-a-1 for a, b in zip((-1,)+c, c+(n+k-1,))]
def count_subsequences(pool, length):
    if length < 2:
        return pool**length
    return pool * (pool-1)**(length-1)
def count_sequences(length):
    counts = [[count_subsequences(i, j) for j in range(length+1)]
              for i in [10, 26]]
    print('Partition {:>18}'.format('Sequence count'))
    total = 0
    for a, b, c in partitions(length, 3):
        subtotal = counts[0][a] * counts[1][b] * counts[1][c]
        total += subtotal
        print('{} {:18}'.format((a, b, c), subtotal))
    print('\nTOTAL {:22}'.format(total))
Overall, generating the sequences quickly isn't the problem; there are simply so many of them that it takes a long time. Length 7 has 78550354750 (78.5 billion) valid sequences, and this number grows by a factor of roughly 25 with each additional character.
Try this
import string
import random
a = ''.join(random.choice(string.ascii_lowercase + string.ascii_uppercase + string.digits) for _ in range(7))
print(a)
If it's a random string you want that sticks to the above rules you can use something like this:
import random
from string import digits, ascii_lowercase, ascii_uppercase
def f():
    digitLen = random.randrange(8)
    smallCharLen = random.randint(0, 7 - digitLen)
    capCharLen = 7 - (smallCharLen + digitLen)
    # picking digits one by one keeps the total length at 7 even when digitLen is 0
    print ("".join([random.choice(digits) for i in range(digitLen)]) +
           "".join([random.choice(ascii_lowercase) for i in range(smallCharLen)]) +
           "".join([random.choice(ascii_uppercase) for i in range(capCharLen)]))
I haven't added the repeated character rule, but once you have the string it's easy to filter out the unwanted strings using dictionaries. You can also fix the length of each segment by putting conditions on the segment lengths.
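One possible filter for the repeated-character rule is sketched below. It uses a pairwise comparison of neighbours, not the dictionaries mentioned above, and the helper name `has_adjacent_repeat` and the sample strings are made up for illustration:

```python
def has_adjacent_repeat(s):
    """Return True if any character equals its immediate neighbour."""
    return any(a == b for a, b in zip(s, s[1:]))

candidates = ["0101abA", "01aabBC", "9gBCDEF"]
valid = [s for s in candidates if not has_adjacent_repeat(s)]
print(valid)  # ['0101abA', '9gBCDEF']
```

For random generation, the same check can drive a retry loop that draws a new string until one passes.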
Edit: a minor bug.
Extreme cases are not handled here but can be done this way
import random
from string import digits, ascii_uppercase, ascii_lowercase
len1 = random.randint(1, 7)
len2 = random.randint(1, 7-len1)
len3 = 7 - len1 - len2
print(len1, len2, len3)
result = ''.join(random.sample(digits, len1) + random.sample(ascii_lowercase, len2) + random.sample(ascii_uppercase, len3))
With an approach similar to @julian's:
from string import digits, ascii_uppercase, ascii_lowercase
from itertools import product, tee, chain
def flatten(listOfLists):
    "Flatten one level of nesting"
    #recipe of itertools
    return chain.from_iterable(listOfLists)
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    #recipe of itertools
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)
def eq_pair(x):
    return x[0]==x[1]
def comb_noNN(alfa,size):
    if size>0:
        for candidato in product(alfa,repeat=size):
            if not any( map(eq_pair,pairwise(candidato)) ):
                yield candidato
    else:
        yield tuple()
def my_string(N=7):
    for a in range(N+1):
        for b in range(N-a+1):
            for c in range(N-a-b+1):
                if sum([a,b,c])==N:
                    for letras in product(
                            comb_noNN(digits,c),
                            comb_noNN(ascii_lowercase,b),
                            comb_noNN(ascii_uppercase,a)
                            ):
                        yield "".join(flatten(letras))
comb_noNN generates all combinations of characters of a given size that follow rule 3. my_string then goes through all combinations of segment lengths that add up to N and generates every string that follows rule 1 by generating the digit, lowercase and uppercase segments individually.
Some output of for i,x in enumerate(my_string())
0, '0101010'
...
100, '0101231'
...
491041580, '936gzrf'
...
758790032, '27ktxfi'
...
The reason it takes a long time to generate the first result with the original implementation is that product starts from 0000000 and has to go through a very large number of invalid candidates before it reaches the first valid value, 0101010.
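To put a number on that delay, the position of a string in the order used by `product` can be computed by reading it as a base-62 number. This is a sketch; `product_index` is a hypothetical helper name, and the alphabet is assumed to be the same `digits + ascii_lowercase + ascii_uppercase` as in the question:

```python
from string import digits, ascii_lowercase, ascii_uppercase

chars = digits + ascii_lowercase + ascii_uppercase  # 62 symbols, in product() order

def product_index(s, alphabet=chars):
    """Position of `s` in product(alphabet, repeat=len(s)), counting from 0."""
    index = 0
    for ch in s:
        # Treat the string as a number written in base len(alphabet).
        index = index * len(alphabet) + alphabet.index(ch)
    return index

print(product_index("0101010"))  # 916371222
```

So the filtering loop has to discard more than 900 million candidates before it prints anything.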
Here's a recursive version which generates valid sequences rather than discarding invalid ones:
from string import digits, ascii_uppercase, ascii_lowercase
from sys import argv
from itertools import combinations_with_replacement, product
all_chars=[digits, ascii_lowercase, ascii_uppercase]
def seq(char_sets, start=None):
    for char_set in char_sets:
        for val in seqperm(char_set, start):
            yield val
def seqperm(char_set, start=None, exclude=None):
    left_chars, remaining_chars=char_set[0], char_set[1:]
    if start:
        try:
            left_chars=left_chars[left_chars.index(start[0]):]
            start=start[1:]
        except ValueError:
            left_chars=''
    for left in left_chars:
        if left != exclude:
            if len(remaining_chars) > 0:
                for right in seqperm(remaining_chars, start, left):
                    yield left + right
            else:
                yield left
if __name__ == "__main__":
    count=int(argv[1])
    start=None
    if len(argv) == 3:
        start=argv[2]
    # char_sets=list(combinations_with_replacement(all_chars, 7))
    char_sets=[[''.join(all_chars)] * 7]
    for idx, val in enumerate(seq(char_sets, start)):
        if idx == count:
            break
        print(idx, val)
Run as follows:
./permute.py 10
Output:
0 0101010
1 0101012
2 0101013
3 0101014
4 0101015
5 0101016
6 0101017
7 0101018
8 0101019
9 010101a
If you pass an additional argument, the script skips to the portion of the sequence which starts with that argument, like this:
./permute.py 10 01234Z
If the requirement is to generate only sequences in which lowercase letters come after digits and uppercase letters come after both digits and lowercase letters, comment out the line char_sets=[[''.join(all_chars)] * 7] and use the line char_sets=list(combinations_with_replacement(all_chars, 7)) instead.
Sample output for the above command line with char_sets=list(combinations_with_replacement(all_chars, 7)):
0 01234ZA
1 01234ZB
2 01234ZC
3 01234ZD
4 01234ZE
5 01234ZF
6 01234ZG
7 01234ZH
8 01234ZI
9 01234ZJ
Sample output for the same command line with char_sets=[[''.join(all_chars)] * 7]:
0 01234Z0
1 01234Z1
2 01234Z2
3 01234Z3
4 01234Z4
5 01234Z5
6 01234Z6
7 01234Z7
8 01234Z8
9 01234Z9
It's possible to implement the above without recursion as below. Performance characteristics don't change much:
from string import digits, ascii_uppercase, ascii_lowercase
from sys import argv
from itertools import combinations_with_replacement, product
all_chars=[digits, ascii_lowercase, ascii_uppercase]
def seq(char_sets, start=''):
    for char_set in char_sets:
        for val in seqperm(char_set, start):
            yield val
def seqperm(char_set, start=''):
    iters=[iter(chars) for chars in char_set]
    # move to starting point in sequence if specified
    for char, citer, chars in zip(list(start), iters, char_set):
        try:
            for _ in range(0, chars.index(char)):
                next(citer)
        except ValueError:
            return
    pos=0
    val=''
    while True:
        citer=iters[pos]
        try:
            char=next(citer)
            if val and val[-1] == char:
                char=next(citer)
            if pos == len(char_set) - 1:
                yield val+char
            else:
                val = val + char
                pos += 1
        except StopIteration:
            if pos == 0:
                return
            # this position is exhausted: reset it and back up one position
            iters[pos] = iter(char_set[pos])
            pos -= 1
            val=val[:pos]
if __name__ == "__main__":
    count=int(argv[1])
    start=''
    if len(argv) == 3:
        start=argv[2]
    # char_sets=list(combinations_with_replacement(all_chars, 7))
    char_sets=[[''.join(all_chars)] * 7]
    for idx, val in enumerate(seq(char_sets, start)):
        if idx == count:
            break
        print(idx, val)
A recursive version with caching is also possible and that generates results faster but is less flexible.