How to find total number of possible combinations for a string? - python

How can I find the total number of subsequences of a given string that start with a particular character, say 'a', and end with a particular character, say 'b'?
EXAMPLE:
For the string 'aabb', the valid subsequences that start with 'a' and end with 'b' are: (ab) from indices (0,2), (ab) from indices (0,3), (ab) from indices (1,2), (ab) from indices (1,3), (aab) from indices (0,1,2), (aab) from indices (0,1,3), (abb) from indices (0,2,3), (abb) from indices (1,2,3), and aabb itself.
So the total is 9. I can solve this for a short string, but how do I solve it for a long string where brute force doesn't work?
Note: We consider two substrings to be different if they start or end at different indices of the given string.
def count(str, str1, str2):
    l = len(str)
    count = 0
    for i in range(0, l+1):
        for j in range(i+1, l+1):
            if str[i] == str1 and str[j-1] == str2:
                count += 1
    return count

Before I post my main code I'll try to explain how it works. Let the source string be 'a123b'. The valid subsequences consist of all the subsets of '123' prefixed with 'a' and suffixed with 'b'. The set of all subsets is called the powerset, and the itertools docs have code showing how to produce the powerset using combinations in the Itertools Recipes section.
# Print all subsequences of '123', prefixed with 'a' and suffixed with 'b'
from itertools import combinations
src = '123'
for i in range(len(src) + 1):
    for s in combinations(src, i):
        print('a' + ''.join(s) + 'b')
output
ab
a1b
a2b
a3b
a12b
a13b
a23b
a123b
Here's a brute-force solution which uses that recipe.
from itertools import combinations
def count_bruteforce(src, targets):
    c0, c1 = targets
    count = 0
    for i in range(2, len(src) + 1):
        for t in combinations(src, i):
            if t[0] == c0 and t[-1] == c1:
                count += 1
    return count
It can be easily shown that the number of subsets of a set of n items is 2**n. So rather than producing the subsets one by one we can speed up the process by using that formula, which is what my count_fast function does.
from itertools import combinations
def count_bruteforce(src, targets):
    c0, c1 = targets
    count = 0
    for i in range(2, len(src) + 1):
        for t in combinations(src, i):
            if t[0] == c0 and t[-1] == c1:
                count += 1
    return count
def count_fast(src, targets):
    c0, c1 = targets
    # Find indices of the target chars
    idx = {c: [] for c in targets}
    for i, c in enumerate(src):
        if c in targets:
            idx[c].append(i)
    idx0, idx1 = idx[c0], idx[c1]
    count = 0
    for u in idx0:
        for v in idx1:
            if v < u:
                continue
            # Calculate the number of valid subsequences
            # which start at u+1 and end at v-1.
            n = v - u - 1
            count += 2 ** n
    return count
# Test
funcs = (
    count_bruteforce,
    count_fast,
)
targets = 'ab'
data = (
    'ab', 'aabb', 'a123b', 'aacbb', 'aabbb',
    'zababcaabb', 'aabbaaabbb',
)
for src in data:
    print(src)
    for f in funcs:
        print(f.__name__, f(src, targets))
    print()
output
ab
count_bruteforce 1
count_fast 1
aabb
count_bruteforce 9
count_fast 9
a123b
count_bruteforce 8
count_fast 8
aacbb
count_bruteforce 18
count_fast 18
aabbb
count_bruteforce 21
count_fast 21
zababcaabb
count_bruteforce 255
count_fast 255
aabbaaabbb
count_bruteforce 730
count_fast 730
There may be a way to make this even faster by starting the inner loop at the correct place rather than using continue to skip unwanted indices.
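As a rough sketch of how the double loop could be avoided altogether: keep a running total of 2 ** (v - u - 1) over every start index u seen so far, and double it on each step. The function name count_linear is made up for this illustration, and the sketch assumes the same counting rule as count_fast (a start index strictly before an end index).

```python
def count_linear(src, targets):
    """Single-pass count of subsequences starting with c0 and ending with c1.

    Hypothetical helper, not part of the answer above. `acc` holds the sum
    of 2 ** (v - u - 1) over every earlier index u where src[u] == c0,
    with v being the current index.
    """
    c0, c1 = targets
    count = 0
    acc = 0
    for ch in src:
        if ch == c1:
            # Every earlier start index contributes its weight here.
            count += acc
        # Moving one step right doubles every existing weight; a start
        # character at this position adds a new weight of 1.
        acc = 2 * acc + (1 if ch == c0 else 0)
    return count


for s in ('ab', 'aabb', 'a123b', 'aacbb', 'aabbb', 'zababcaabb', 'aabbaaabbb'):
    print(s, count_linear(s, 'ab'))
```

This runs in one pass over the string, although the integers themselves still grow with the string length.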

Easy, it should just be the number of letters to the power of two. I.e, n^2
Python implementation would just be n_substrings = n ** 2

Related

Replace Adjacent Elements of Circular Array to Make All Elements Equal

You are given a circular array A containing N integers. You can perform the
following operation on this array any number of times:
• For each i, replace A[i] by A[i-1], A[i] or A[i+1], i.e. you can keep the
current element or replace it by an adjacent element. Note that due to
circularity of the array adjacent elements exist even for the first and
the last element. In particular, A[i-1] for i=0 is the last element.
Determine the minimum number of steps needed to make all the elements of
the array equal.
Input Format
The first line contains an integer, N, denoting the number of elements in A.
Each line i of the N subsequent lines (where 0 <= i < N) contains an integer
describing A[i].
Constraints
1 <= N <= 10^3
Sample input: 4 2 2 1 1 => Sample output: 1
Sample input:3 1 1 1 => Sample output: 0
Sample input:4 1 2 3 4 => Sample output: 2
I wrote the following code. It passes all visible test cases on the platform but fails the hidden ones (I don't know what they are). Please help me find any edge cases I may have missed.
from collections import Counter
def make_equal(A):
    count = 0
    idx = []
    map = Counter(A)
    value = sorted(map.values(), reverse=True)[0]
    for k, v in map.items():
        if v == value:
            key = k
    for i, val in enumerate(A):
        if val == key:
            idx.append(i)
    new_set = set(idx)
    while len(new_set) < len(A):
        for j in idx[:]:
            l = (j + 1) % len(A)
            m = (j - 1) % len(A)
            idx.append(m)
            idx.append(j)
            idx.append(l)
        count += 1
        new_set = set(idx)
    return count
print(make_equal(A))
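One edge case that may be worth checking is that the code above only ever spreads the most frequent value, which is not necessarily the value that spreads fastest. The following is a minimal sketch, not a verified solution for the hidden tests. It assumes all positions are updated simultaneously in one step, so a value grows by one position on each side per step. The name min_steps_to_equalize is made up for this illustration.

```python
from collections import defaultdict


def min_steps_to_equalize(a):
    """Try every distinct value and keep the one that needs the fewest steps.

    Assumption: in one step every element may copy a neighbour, so a run of
    g foreign elements between two occurrences of a value needs
    ceil(g / 2) steps to be overwritten from both ends.
    """
    n = len(a)
    positions = defaultdict(list)
    for i, v in enumerate(a):
        positions[v].append(i)

    best = None
    for idxs in positions.values():
        max_gap = 0
        for k, p in enumerate(idxs):
            nxt = idxs[(k + 1) % len(idxs)]
            # Number of foreign elements between p and the next
            # occurrence, walking around the circle.
            gap = (nxt - p - 1) % n
            max_gap = max(max_gap, gap)
        steps = (max_gap + 1) // 2
        if best is None or steps < best:
            best = steps
    return best


print(min_steps_to_equalize([2, 2, 1, 1]))
print(min_steps_to_equalize([1, 1, 1, 2, 3, 4, 2, 5]))
```

In the second call the value 1 is the most frequent, but its three occurrences are clustered together, while the two occurrences of 2 are spread out and cover the circle in fewer steps.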

How can I count the number of ways to divide a string into N parts of any size?

I'm trying to count the number of ways you can divide a given string into three parts in Python.
Example: "bbbbb" can be divided into three parts 6 ways:
b|b|bbb
b|bb|bb
b|bbb|b
bb|b|bb
bb|bb|b
bbb|b|b
My first line of thinking was N choose K, where N = the string's length and K = the number of ways to split (3), but that only works for 3 and 4.
My next idea was to iterate through the string and count the number of spots the first third could be segmented and the number of spots the second third could be segmented, then multiply the two counts, but I'm having trouble implementing that, and I'm not even too sure if it'd work.
How can I count the ways to split a string into N parts?
Think of it in terms of the places of the splits as the elements you're choosing:
b ^ b ^ b ^ ... ^ b
^ is where you can split, and there are N - 1 places where you can split (N is the length of the string), and, if you want to split the string into M parts, you need to choose M - 1 split places, so it's N - 1 choose M - 1.
For your example, N = 5, M = 3. (N - 1 choose M - 1) = (4 choose 2) = 6.
An implementation:
import scipy.special
s = 'bbbbb'
n = len(s)
m = 3
res = scipy.special.comb(n - 1, m - 1, exact=True)
print(res)
Output:
6
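If SciPy is not available, the same count can be sketched with the standard library alone (math.comb exists in Python 3.8 and later). The helper names count_splits and list_splits are made up for this illustration; list_splits enumerates the cut positions so the formula can be checked against an explicit list.

```python
from itertools import combinations
from math import comb


def count_splits(s, m):
    """Number of ways to cut s into m non-empty parts: C(len(s) - 1, m - 1)."""
    return comb(len(s) - 1, m - 1)


def list_splits(s, m):
    """Enumerate the splits by choosing m - 1 of the len(s) - 1 cut positions."""
    n = len(s)
    result = []
    for cuts in combinations(range(1, n), m - 1):
        bounds = (0, *cuts, n)
        result.append('|'.join(s[a:b] for a, b in zip(bounds, bounds[1:])))
    return result


print(count_splits('bbbbb', 3))
print(list_splits('ABCDE', 3))
```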
I came up with a loop-based solution that counts the ways to split a string into three parts in Python, which I think is easier to understand.
def slitStr(s):
    i = 1
    j = 2
    count = 0
    while i <= len(s)-2:
        # a, b, c are the split strings
        a = s[:i]
        b = s[i:j]
        c = s[j:]
        # increase j until it reaches the end of the string
        # each time j reaches the end of the string, increment i
        # and set j to i + 1
        if j < len(s):
            j += 1
        if j == len(s):
            i += 1
            j = i+1
        # increment count after each iteration
        count += 1
    return count
You can customize the solution to fit your needs. I hope this helps.
Hope this helps you too:
string = "ABCDE"
div = "|"
out = []
for i in range(len(string)):
    temp1 = ''
    if 1 < i < len(string):
        temp1 += string[0:i-1] + div
        for j in range(len(string) + 1):
            temp2 = ""
            if j > i:
                temp2 += string[i-1:j-1] + div + string[j-1:]
                out.append(temp1 + temp2)
print(out)
Result :
['A|B|CDE', 'A|BC|DE', 'A|BCD|E', 'AB|C|DE', 'AB|CD|E', 'ABC|D|E']

Similarity Measure in Python

I am working on a coding challenge named Similarity Measure. My code works for some test cases but fails others with Time Limit Exceeded. The code is not wrong, but it takes more than 25 seconds for inputs in the range of 10^4.
I need to know what I can do to make it more efficient; I cannot think of a better solution than my current code.
The question goes like this:
The problem gives an array of positive integers, and we have to answer Q queries on it.
Query: Given two indices L and R, determine the maximum absolute difference between the indices of two equal elements that lie between L and R.
If no two elements in the range are equal, return 0.
INPUT FORMAT
The first line contains N, no. of elements in the array A
The Second line contains N space separated integers that are elements of the array A
The third line contains Q the number of queries
Each of the Q lines contains L, R
CONSTRAINTS
1 <= N, Q <= 10^4
1 <= Ai <= 10^4
1 <= L, R <= N
OUTPUT FORMAT
For each query, print the ans in a new line
Sample Input
5
1 1 2 1 2
5
2 3
3 4
2 4
3 5
1 5
Sample Output
0
0
2
2
3
Explanation
[2,3] - No two elements are same
[3,4] - No two elements are same
[2,4] - there are two 1's so ans = |4-2| = 2
[3,5] - there are two 2's so ans = |5-3| = 2
[1,5] - there are three 1's and two 2's so ans = max(|4-2|, |5-3|, |4-1|, |2-1|) = 3
Here is my algorithm:
Take the input, and test each range in a separate function.
The inputs to the function are L, R, and the array.
If the difference between L and R is 1, check whether the two elements are equal: return 1 if so, else 0.
If the difference is more than 1, loop through the array.
Use a nested loop to look for equal elements; when a pair is found, store the index difference in maxVal if it is larger.
Return maxVal.
My Code:
def ansArray(L, R, arr):
    maxVal = 0
    if abs(R - L) == 1:
        if arr[L-1] == arr[R-1]: return 1
        else: return 0
    else:
        for i in range(L-1, R):
            for j in range(i+1, R):
                if arr[i] == arr[j]:
                    if (j-i) > maxVal: maxVal = j-i
        return maxVal
if __name__ == '__main__':
    input()
    arr = (input().split())
    for i in range(int(input())):
        L, R = input().split()
        print(ansArray(int(L), int(R), arr))
Please help me with this. I really want to learn a different and a more efficient way to solve this problem. Need to pass all the TEST CASES. :)
You can try this code:
import collections
def ansArray(L, R, arr):
    dct = collections.defaultdict(list)
    for index in range(L - 1, R):
        dct[arr[index]].append(index)
    return max(lst[-1] - lst[0] for lst in dct.values())
if __name__ == '__main__':
    input()
    arr = (input().split())
    for i in range(int(input())):
        L, R = input().split()
        print(ansArray(int(L), int(R), arr))
Explanation:
dct is a dictionary that keeps, for every number seen, a list of its indices. Each list is sorted because indices are appended in increasing order, so lst[-1] - lst[0] gives the maximum absolute difference for that number. Applying max to all these differences gives the answer. The complexity is O(R - L) per query.
This can be solved in roughly O(N) per query as follows:
from collections import defaultdict
def ansArray(L, R, arr):
    # collect the positions and save them into the dictionary
    # (L and R are 1-based and inclusive, hence the slice arr[L-1:R])
    positions = defaultdict(list)
    for i, j in enumerate(arr[L-1:R]):
        positions[j].append(i)
    # create the list of the max differences in index
    max_diff = list()
    for vals in positions.values():
        max_diff.append(max(vals) - min(vals))
    # now return the max element from the list we have just created
    if len(max_diff):
        return max(max_diff)
    else:
        return 0
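As a small variation on the dictionary idea, here is a sketch that stores only the first index of each value instead of a full list, then checks it against the sample input from the question. The name max_same_distance is made up for this illustration; L and R are assumed to be 1-based and inclusive, as in the problem statement.

```python
def max_same_distance(L, R, arr):
    """Largest index distance between two equal elements in arr[L-1:R]."""
    first = {}  # value -> first index at which it was seen in the range
    best = 0
    for i in range(L - 1, R):
        v = arr[i]
        if v in first:
            # i only grows, so the last occurrence yields the largest distance.
            best = max(best, i - first[v])
        else:
            first[v] = i
    return best


arr = [1, 1, 2, 1, 2]
for L, R in [(2, 3), (3, 4), (2, 4), (3, 5), (1, 5)]:
    print(max_same_distance(L, R, arr))
```

Each query is still linear in R - L, so this only trims memory and constant factors; it does not change the overall complexity.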

Counting non matching, shared digits between two numbers in Python

I am trying to figure out a way to determine the total number of non-matching, common digits between two numbers in Python.
So far I can get the number of matching digits between the two numbers. The end goal is to have a function that takes two numbers, e.g. 6621 and 6662, and returns 2 for the number of matching digits and 1 for the number of non-matching shared digits.
I have tried using nested while loops to do this, but the count is not always accurate, depending on the numbers being compared.
while i < n:  # check 2. Nested while statements
    j = 0
    while j < n:
        if g_list[i] == p_list[j] and j == i:
            x
        elif g_list[i] == p_list[j]:
            z += 1
            print(g_list[i], p_list[j], z, i, j)
        j += 1
    i += 1
You could do it this way:
a = 6661
b = 6662
def find_difference(first, second):
    first_list = list(str(first))
    second_list = list(str(second))
    c = set(first_list)
    d = set(second_list)
    # (distinct digits found in only one number, distinct digits found in both)
    print((len(c.symmetric_difference(d)), len(c.intersection(d))))
find_difference(a, b)
Output:
(2, 1)
You can count the number of occurrence of each digit in each number and take the minimum of those. This will get you the number of common digits, then you subtract the number of matching digits.
def get_count_of_digit(number):
    l = [0 for i in range(10)]
    list_digit = str(number)
    for d in list_digit:
        l[int(d)] += 1
    return l
def number_of_common_digit(number1, number2):
    l1, l2 = [get_count_of_digit(n) for n in (number1, number2)]
    return sum([min(l1[i], l2[i]) for i in range(10)])
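The subtraction step described above is not shown in the code, so here is a minimal sketch of the whole calculation. The name compare_digits is made up for this illustration, and the sketch assumes both numbers have the same number of digits (zip stops at the shorter one).

```python
from collections import Counter


def compare_digits(first, second):
    """Return (matching, shared_but_not_matching) digit counts."""
    a, b = str(first), str(second)
    # Digits that are equal at the same position.
    matching = sum(x == y for x, y in zip(a, b))
    # Counter & Counter keeps the minimum count of each digit,
    # i.e. the number of digits the two numbers have in common.
    common = sum((Counter(a) & Counter(b)).values())
    return matching, common - matching


print(compare_digits(6621, 6662))
```

For 6621 and 6662 this gives 2 matching digits and 1 shared digit in a different position, which is the result the question asks for.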

How to generate 5-character strings combinations (2 different digits, two equal letters and 1 letter) without duplication

I already published a similar question, but this is a DIFFERENT question.
I am trying to generate all 5-character strings consisting of three letters (exactly two equal, plus one different letter) and two different digits, but I get duplicates when I try to do so.
Examples of CORRECT combinations:
82ccb
b8cc7
7c6dc
Examples of INCORRECT combinations:
22ddc -> incorrect because the digits are equal and should be different
46ddd -> incorrect because there are more than 2 equal letters
2t4cd -> incorrect because no two letters are equal (all three letters are different)
This is the code I am using:
from itertools import combinations, permutations, product
LETTERS = 'bcdfghjklmnpqrstvwxz'
DIGITS = '2456789'
def aab12(letters=LETTERS, digits=DIGITS):
    """Generate the distinct 5-character strings consisting of three
    letters (two are equal and one is different) and two digits (each one is different from the other).
    """
    letterdxs = set(range(5))
    combs = []
    for (d1, d2), (i, j), (l1, l2) in product(
            permutations(digits, 2),       # 2 different digits.
            combinations(range(5), 2),     # Positions for the 1st and 2nd digits.
            permutations(letters, 2)):     # 2 different letters (l1 is the repeated one).
        x, y, z = letterdxs.difference((i, j))
        s = set((x, y, z))
        # Choosing 2 positions for the repeated letters
        c1 = combinations((x, y, z), 2)
        for c in c1:
            result = []
            result[i:i] = d1,
            result[j:j] = d2,
            result[c[0]:c[0]] = l1,
            result[c[1]:c[1]] = l1,
            # Choosing position for the last letter. This is the position that was left
            letter_indx = (s.difference(c)).pop()
            result[letter_indx:letter_indx] = l2,
            combs.append(''.join(result))
    # Should be 478,800
    print(len(combs))
    return combs
def is_contain_dup(combos):
    s = set(combos)
    if len(s) != len(combos):
        print('found duplicates !')
is_contain_dup(aab12())
I get duplicates, although the length is correct.
This function is based on this math:
Choosing 2 places for the different digits
Choosing 2 places for the repeated letter
Choosing a different letter for the remaining place
I am not sure what is causing the duplication, but it probably has something to do with how the two equal letters and the different letter are placed.
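One likely cause, offered as a hedged sketch rather than a confirmed diagnosis, is the slice assignment into an initially empty list. A statement like result[i:i] = d1, inserts at position i only if the list is already that long; otherwise the index is clamped to the end, and every later insertion shifts the elements after it. The final string can therefore differ from the chosen layout. The two helper functions below are made up for this illustration and compare insertion with assignment into a pre-sized list.

```python
def build_by_insertion(i, j, c, last, d1='2', d2='4', l1='b', l2='c'):
    """Mimic the slice-insertion steps used in the question's loop body."""
    result = []
    result[i:i] = d1,
    result[j:j] = d2,
    result[c[0]:c[0]] = l1,
    result[c[1]:c[1]] = l1,
    result[last:last] = l2,
    return ''.join(result)


def build_by_index(i, j, c, last, d1='2', d2='4', l1='b', l2='c'):
    """Place each character at its fixed position in a pre-sized list."""
    result = [None] * 5
    result[i] = d1
    result[j] = d2
    result[c[0]] = l1
    result[c[1]] = l1
    result[last] = l2
    return ''.join(result)


# Digits at positions 2 and 3, repeated letter at 1 and 4, single letter at 0.
print(build_by_insertion(2, 3, (1, 4), 0))
print(build_by_index(2, 3, (1, 4), 0))
```

The two calls print different strings for the same layout, so some layouts appear to be produced more than once and others not at all, even though the total number of generated strings is unchanged.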
Here is a pure brute-force, naive method with 4 nested loops:
LETTERS = 'bcdfghjklmnpqrstvwxz'
DIGITS = '2456789'
from itertools import permutations
def aab12_1(letters=LETTERS, digits=DIGITS):
    st=[]
    for fc in letters:
        for sc in letters:
            if sc==fc: continue
            for n1 in digits:
                for n2 in digits:
                    if n1==n2: continue
                    st.append(''.join((fc,fc,sc,n1,n2)))
    di={e:[''.join(t) for t in permutations(e)] for e in st}
    return {s for sl in di.values() for s in sl}
>>> r=aab12_1()
>>> len(r)
478800
This has O(n**4) complexity; ie, really bad for longer strings. However, the example strings are not so long and this is a doable approach for shorter strings.
You can cut the complexity a bit by sorting the generated base strings to cut the duplicate calls to permutations:
def aab12_2(letters=LETTERS, digits=DIGITS):
    st=set()
    for fc in letters:
        for sc in letters:
            if sc==fc: continue
            for n1 in digits:
                for n2 in digits:
                    if n1==n2: continue
                    st.add(''.join(sorted((fc,fc,sc,n1,n2))))
    di={e:[''.join(t) for t in permutations(e)] for e in st}
    return {s for sl in di.values() for s in sl}
That can be streamlined a bit further to:
from itertools import permutations, product, combinations
def aab12_3(letters=LETTERS, digits=DIGITS):
    let_combo=[x+y for x,y in product([e+e for e in letters],letters) if x[0]!=y]
    n_combos={a+b for a,b in combinations(digits,2)}
    di={e:[''.join(t) for t in permutations(e)] for e in (x+y for x,y in product(let_combo, n_combos))}
    return {s for sl in di.values() for s in sl}
That still has an implied O(n**3) with 3 products() which is the equivalent of a nested loop for each. Each O is faster however and the overall time here is now about 350 ms.
So, let's benchmark. Here are the 3 functions from above, Ajax1234's recursive function, and Rory Daulton's itertools function:
from collections import Counter
from itertools import combinations, permutations, product
def aab12_1(letters=LETTERS, digits=DIGITS):
    st=[]
    for fc in letters:
        for sc in letters:
            if sc==fc: continue
            for n1 in digits:
                for n2 in digits:
                    if n1==n2: continue
                    st.append(''.join((fc,fc,sc,n1,n2)))
    di={e:[''.join(t) for t in permutations(e)] for e in st}
    return {s for sl in di.values() for s in sl}
def aab12_2(letters=LETTERS, digits=DIGITS):
    st=set()
    for fc in letters:
        for sc in letters:
            if sc==fc: continue
            for n1 in digits:
                for n2 in digits:
                    if n1==n2: continue
                    st.add(''.join(sorted((fc,fc,sc,n1,n2))))
    di={e:[''.join(t) for t in permutations(e)] for e in st}
    return {s for sl in di.values() for s in sl}
def aab12_3(letters=LETTERS, digits=DIGITS):
    let_combo=[x+y for x,y in product([e+e for e in letters],letters) if x[0]!=y]
    n_combos={a+b for a,b in combinations(digits,2)}
    di={e:[''.join(t) for t in permutations(e)] for e in (x+y for x,y in product(let_combo, n_combos))}
    return {s for sl in di.values() for s in sl}
def aab12_4():
    # Ajax1234 recursive approach
    def validate(val, queue, counter):
        if not queue:
            return True
        if val.isdigit():
            return sum(i.isdigit() for i in queue) < 2 and val not in queue
        _sum = sum(i.isalpha() for i in counter)
        return _sum < 3 and counter.get(val, 0) < 2
    def is_valid(_input):
        d = Counter(_input)
        return sum(i.isdigit() for i in d) == 2 and sum(i.isalpha() for i in d) == 2
    def combinations(d, current = []):
        if len(current) == 5:
            yield ''.join(current)
        else:
            for i in d:
                if validate(i, current, Counter(current)):
                    yield from combinations(d, current+[i])
    return [i for i in combinations(DIGITS+LETTERS) if is_valid(i)]
def aab12_5(letters=LETTERS, digits=DIGITS):
    """ Rory Daulton
    Generate the distinct 5-character strings consisting of three
    letters (two are equal and one is different) and two digits (each
    one is different from the other).
    """
    indices = range(5) # indices for the generated 5-char strings
    combs = []
    for (letterdbl, lettersngl), (digit1, digit2), (indx1, indx2, indx3) in (
            product(permutations(letters, 2),
                    combinations(digits, 2),
                    permutations(indices, 3))):
        charlist = [letterdbl] * 5
        charlist[indx1] = lettersngl
        charlist[indx2] = digit1
        charlist[indx3] = digit2
        combs.append(''.join(charlist))
    return combs
if __name__=='__main__':
    import timeit
    funcs=(aab12_1,aab12_2,aab12_3,aab12_4,aab12_5)
    di={f.__name__:len(set(f())) for f in funcs}
    print(di)
    for f in funcs:
        print(" {:^10s}{:.4f} secs".format(f.__name__, timeit.timeit("f()", setup="from __main__ import f", number=1)))
Prints:
{'aab12_1': 478800, 'aab12_2': 478800, 'aab12_3': 478800, 'aab12_4': 478800, 'aab12_5': 478800}
aab12_1 0.6230 secs
aab12_2 0.3433 secs
aab12_3 0.3292 secs
aab12_4 50.4786 secs
aab12_5 0.2094 secs
The fastest here is Rory Daulton's itertools function. Nicely done!
You can create a recursive function:
from collections import Counter
LETTERS = 'bcdfghjklmnpqrstvwxz'
DIGITS = '2456789'
def validate(val, queue, counter):
    if not queue:
        return True
    if val.isdigit():
        return sum(i.isdigit() for i in queue) < 2 and val not in queue
    _sum = sum(i.isalpha() for i in counter)
    return _sum < 3 and counter.get(val, 0) < 2
def is_valid(_input):
    d = Counter(_input)
    return sum(i.isdigit() for i in d) == 2 and sum(i.isalpha() for i in d) == 2
def combinations(d, current = []):
    if len(current) == 5:
        yield ''.join(current)
    else:
        for i in d:
            if validate(i, current, Counter(current)):
                yield from combinations(d, current+[i])
_r = [i for i in combinations(DIGITS+LETTERS) if is_valid(i)]
print(len(_r))
Output:
478800
Here is one answer that uses multiple functions in itertools. My strategy is stated in the comments. As much looping as possible is done in the itertools functions, to maximize speed. This returns the desired 478,800 strings, which are all distinct.
Running %timeit on aab12() on my system gives the time result 391 ms ± 2.34 ms per loop.
"""
Strategy: Generate a permutation of 2 distinct characters, a
combination of 2 distinct digits, and a permutation of 3 distinct
indices in range(5). Then build a 5-char string of the first character
(which will be the repeated one), use the first two indices to place
the digits and the last index to place the non-repeated character.
This yields a total of (20*19) * (7/1*6/2) * (5*4*3) = 478,800
items.
"""
from itertools import combinations, permutations, product
LETTERS = 'bcdfghjklmnpqrstvwxz'
DIGITS = '2456789'
def aab12(letters=LETTERS, digits=DIGITS):
    """Generate the distinct 5-character strings consisting of three
    letters (two are equal and one is different) and two digits (each
    one is different from the other).
    """
    indices = range(5) # indices for the generated 5-char strings
    combs = []
    for (letterdbl, lettersngl), (digit1, digit2), (indx1, indx2, indx3) in (
            product(permutations(letters, 2),
                    combinations(digits, 2),
                    permutations(indices, 3))):
        charlist = [letterdbl] * 5
        charlist[indx1] = digit1
        charlist[indx2] = digit2
        charlist[indx3] = lettersngl
        combs.append(''.join(charlist))
    return combs
The first fifteen strings in the list are
['24cbb',
'24bcb',
'24bbc',
'2c4bb',
'2b4cb',
'2b4bc',
'2cb4b',
'2bc4b',
'2bb4c',
'2cbb4',
'2bcb4',
'2bbc4',
'42cbb',
'42bcb',
'42bbc']
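The count given in the strategy can be cross-checked with the standard library (math.perm and math.comb exist in Python 3.8 and later). The sketch below computes the closed-form count, then runs the same product-based strategy on a deliberately tiny alphabet to confirm that every generated string is distinct. The names expected_count and generate are made up for this illustration.

```python
from itertools import combinations, permutations, product
from math import comb, perm


def expected_count(n_letters, n_digits):
    """(ordered letter pair) * (unordered digit pair) * (3 ordered positions)."""
    return perm(n_letters, 2) * comb(n_digits, 2) * perm(5, 3)


def generate(letters, digits):
    """Same strategy as aab12 above, usable with a small alphabet."""
    out = []
    for (dbl, sngl), (d1, d2), (i1, i2, i3) in product(
            permutations(letters, 2),
            combinations(digits, 2),
            permutations(range(5), 3)):
        chars = [dbl] * 5   # start with the repeated letter everywhere
        chars[i1] = d1      # then overwrite three positions
        chars[i2] = d2
        chars[i3] = sngl
        out.append(''.join(chars))
    return out


print(expected_count(20, 7))
small = generate('bc', '24')
print(len(small), len(set(small)))
```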
