How do you separate all possible substrings in a string? - python

For example, let's say:
Str = "abc"
The desired output I am looking for is:
a, b, c, ab, bc, abc
so far I have:
#input
Str = input("Please enter a word: ")
#len of word
n = len(Str)
#nested for loops to separate the string into substrings
for Len in range(1,n + 1):
    for i in range(n - Len + 1):
        j = i + Len - 1
        for k in range(i,j + 1):
            #printing all the substrings
            print(Str[k],end="")
This gets me:
abcabbcabc
which has all the correct substrings, but not separated. What do I do to get my desired output? I thought end='' would do the trick and put each substring on its own line, but it doesn't. Any suggestions?

You could add an extra print() in the i loop, but it's easier to use a slice instead:
s = "abc"
n = len(s)
for size in range(1, n+1):
    for start in range(n-size+1):
        stop = start + size
        print(s[start:stop])
Output:
a
b
c
ab
bc
abc
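For comparison, the first option mentioned above (keeping the character-by-character loop and adding an extra print() in the i loop) might look like the sketch below. The function name print_substrings is only an illustrative choice, not something from the question.

```python
def print_substrings(word):
    """Print every contiguous substring of word, one per line."""
    n = len(word)
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            # Print the characters of the current substring on one line...
            for k in range(i, i + length):
                print(word[k], end="")
            # ...then end the line once the substring is complete.
            print()


print_substrings("abc")
```

The only change from the question's code is the bare print() after the innermost loop, which emits the missing newline.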
On the other hand, if you want them literally joined on comma-spaces as you wrote, the simplest way is to save them in a list then join at the end.
s = "abc"
n = len(s)
L = []
for size in range(1, n+1):
    for start in range(n-size+1):
        stop = start + size
        L.append(s[start:stop])
print(*L, sep=', ')
Or, I would probably use a list comprehension for this:
s = "abc"
n = len(s)
L = [s[j:j+i] for i in range(1, n+1) for j in range(n-i+1)]
print(*L, sep=', ')
Output:
a, b, c, ab, bc, abc

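A more compact solution with itertools. Note that combinations() picks characters from any positions, so the result also contains 'ac', which is not a contiguous substring.
Code:
import itertools
s = "abc"
for i in range(1,len(s)+1):
    print(["".join(word) for word in list(itertools.combinations(s,i))])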
result:
['a', 'b', 'c']
['ab', 'ac', 'bc']
['abc']
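Since 'ac' in the result above is not a contiguous slice of "abc", one possible adjustment is to apply combinations to slice boundaries instead of characters. This is only a sketch; the helper name substrings and the sort key are assumptions made for illustration.

```python
from itertools import combinations


def substrings(s):
    """Return all contiguous substrings of s, shortest first."""
    # Every pair (start, stop) with start < stop selects one slice s[start:stop].
    pairs = combinations(range(len(s) + 1), 2)
    # Order by length, then by start index, to match the order in the question.
    ordered = sorted(pairs, key=lambda p: (p[1] - p[0], p[0]))
    return [s[start:stop] for start, stop in ordered]


print(*substrings("abc"), sep=", ")  # a, b, c, ab, bc, abc
```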

Related

I have troubles with implementing distributions in python

I want to write the function all_possible(word, n), where word is a string with no spaces in it and n is the number of |'s. I need to insert the |'s into word so that the output is a list of all possible strings with |'s inserted between characters.
Example:
letters = 'abcdefghijklmnopqrstuvwxyz'
print(all_possible(letters, 7))
>> ['a|b|c|d|e|f|g|hijklmnopqrstuvwxyz', 'a|b|c|d|e|f|gh|ijklmnopqrstuvwxyz', 'a|b|c|d|e|f|ghi|jklmnopqrstuvwxyz'...]
This is what I have so far:
def all_possible(word, n):
    word = list(word)
    l = ['|'] * n
    k = 1
    for c in l:
        word.insert(k, c)
        k += 2
    word = ''.join(word)
    return word
Any help now?
You are not going to be able to do this with a single loop. You can try using itertools.combinations(). For example, combos = combinations(range(1, 26), 7) will give you an iterator that lists all the indices of the letters you should insert a bar before (with the help of enumerate to keep track of how many bars you have already added):
from itertools import combinations
letters = 'abcdefghijklmnopqrstuvwxyz'
combos = combinations(range(1, 26), 2)
for indices in combos:
    l = list(letters)
    for i, n in enumerate(indices):
        l.insert(i + n, '|')
    print("".join(l))
Prints:
a|b|cdefghijklmnopqrstuvwxyz
a|bc|defghijklmnopqrstuvwxyz
a|bcd|efghijklmnopqrstuvwxyz
...
abcdefghijklmnopqrstuvwx|y|z
You can change the 2 to 7, but be warned, it is a lot of combinations.
You can also do this recursively, with the insight that inserting a single bar between letters is a single loop. Inserting 2 bars is the same as inserting one bar and then doing the same for the part of the string left after that bar.
def insert_bars(s, n):
    if n == 0:
        yield s
    else:
        for i in range(1, len(s) - n+1):
            for rest in insert_bars(s[i:], n-1):
                yield s[:i] + '|' + rest
l = list(insert_bars(letters, 7))
l is length 480700:
['a|b|c|d|e|f|gh|ijklmnopqrstuvwxyz',
'a|b|c|d|e|f|ghi|jklmnopqrstuvwxyz',
'a|b|c|d|e|f|ghij|klmnopqrstuvwxyz',
'a|b|c|d|e|f|ghijk|lmnopqrstuvwxyz',
'a|b|c|d|e|f|ghijkl|mnopqrstuvwxyz',
'a|b|c|d|e|f|ghijklm|nopqrstuvwxyz',
'a|b|c|d|e|f|ghijklmn|opqrstuvwxyz',
'a|b|c|d|e|f|ghijklmno|pqrstuvwxyz',
...
'abcdefghijklmnopqr|s|t|u|v|w|x|yz',
'abcdefghijklmnopqr|s|t|u|v|w|xy|z',
'abcdefghijklmnopqr|s|t|u|v|wx|y|z',
'abcdefghijklmnopqr|s|t|u|vw|x|y|z',
'abcdefghijklmnopqr|s|t|uv|w|x|y|z',
'abcdefghijklmnopqr|s|tu|v|w|x|y|z',
'abcdefghijklmnopqr|st|u|v|w|x|y|z',
'abcdefghijklmnopqrs|t|u|v|w|x|y|z']
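For reference, the combinations idea could also be wrapped into the all_possible(word, n) signature from the question. This is a sketch rather than a definitive implementation; the slicing via bounds is one way to do it, and math.comb assumes Python 3.8 or later.

```python
from itertools import combinations
from math import comb  # Python 3.8+


def all_possible(word, n):
    """Return every way to insert n bars between the characters of word."""
    results = []
    # A bar can sit before index 1 .. len(word)-1; choose n distinct positions.
    for cuts in combinations(range(1, len(word)), n):
        bounds = (0, *cuts, len(word))
        # Slice the word between consecutive boundaries and join with bars.
        pieces = [word[a:b] for a, b in zip(bounds, bounds[1:])]
        results.append("|".join(pieces))
    return results


print(all_possible("abcd", 2))  # ['a|b|cd', 'a|bc|d', 'ab|c|d']
print(comb(25, 7))              # 480700
```

The second print shows where the count of 480700 comes from: 26 letters have 25 gaps, and 7 of them are chosen.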

How to compute the weight of a string in Python?

Given a string S, we define its weight, weight(S), as the product of the positions of the vowels in the string (counting from 1). Examples: weight(“e”) = 1; weight(“age”) = 3; weight(“pippo”) = 10.
I tried this:
def weight(s):
    vowels = ['a','e','i','o','u']
    numbers = []
    for c in s:
        if c in vowels:
            n = s.index(c)+1
            numbers.append(n)
    result = 1
    for x in numbers:
        result = result*x
    print(result)
But it only works when all the vowels are different. If the same vowel occurs more than once in the string, the result is wrong.
What am I missing?
Thank you all.
You can use this:
import numpy as np
s = 'pippo'
np.prod([i+1 for i,v in enumerate(s) if v in ['a','e','i','o','u']])
10
str.index() works like str.find() in that it will:
Return the lowest index in the string where substring sub is found [...]
(Source: the documentation for str.index, which refers to str.find.)
So it only returns the index of the first occurrence.
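To make the effect concrete, here is a small sketch with a word that repeats a vowel. The word "banana" is an illustrative choice and does not come from the question.

```python
s = "banana"
vowels = "aeiou"

# str.index always reports the first 'a', so every vowel gets position 2.
via_index = [s.index(ch) + 1 for ch in s if ch in vowels]

# enumerate tracks the real position of each character as it is visited.
via_enumerate = [i for i, ch in enumerate(s, start=1) if ch in vowels]

print(via_index)      # [2, 2, 2]
print(via_enumerate)  # [2, 4, 6]
```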
functools.reduce and operator.mul together with enumerate (from 1) makes this a one-liner:
from operator import mul
from functools import reduce
value = reduce(mul, (i for i,c in enumerate("pippo",1) if c in "aeiou"))
Or for all your strings:
for t in ["e","age","pippo"]:
    # oneliner (if you omit the imports and iterating over all your given examples)
    print(t, reduce(mul, (i for i,c in enumerate(t,1) if c in "aeiou")))
Output:
e 1
age 3
pippo 10
Maybe not an optimal way to do it, but this works.
vowels = ['a', 'e', 'i', 'o', 'u', 'y']
mystring = 'pippo'
weight = 1
i = 0
while i < len(mystring):
    if mystring[i] in vowels:
        weight *= i+1
    i += 1
if weight == 1 and mystring[0] not in vowels:
    weight = 0
print(weight)
The final if statement handles the one exceptional case where the string contains no vowels.
You may want to use enumerate. It makes the job easy.
The code becomes:
def weight(s):
    vowels = ['a','e','i','o','u']
    wt=1
    for i,c in enumerate(s):
        if c in vowels:
            wt*=i+1
    return wt
print(weight("asdew"))
When you call s.index(c), it returns the index of the first occurrence of the character in the string.
You should use enumerate to iterate through the string. enumerate gives you both the index and the value of each element while iterating over an iterable.
def weight(s):
    vowels = ['a','e','i','o','u']
    numbers = []
    for ind, c in enumerate(s):
        if c in vowels:
            n = ind+1
            numbers.append(n)
    result = 1
    for x in numbers:
        result = result*x
    print(result)
You can read more about enumerate at the link below:
http://book.pythontips.com/en/latest/enumerate.html
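As a further variation on the enumerate approach, the product can be taken with math.prod. This sketch assumes Python 3.8 or later (where math.prod exists); the vowels parameter is an illustrative addition.

```python
from math import prod  # available from Python 3.8


def weight(s, vowels="aeiou"):
    """Product of the 1-based positions of the vowels in s."""
    # enumerate(s, start=1) yields positions counted from 1.
    return prod(i for i, ch in enumerate(s, start=1) if ch in vowels)


for word in ["e", "age", "pippo"]:
    print(word, weight(word))
```

Note that math.prod returns 1 for an empty sequence, so a string without vowels gets weight 1 in this sketch; whether that is the desired value is not specified in the question.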

Loop to find space with python

c = "ab cd ef gf"
n = []
for x in c:
    if x == " ":
        d = c.find(x)
        n.append(d)
print(n)
I want this code to give me something like this. [2,5,8]
But instead it is giving me this. [2,2,2]
Please help me find the mistake. Thank you.
find() will find the first instance, so it always finds the space at index 2. You could keep track of the index as you go with enumerate() so you don't need find():
c = "ab cd ef gf"
n = []
for i, x in enumerate(c):
    if x == " ":
        n.append(i)
print(n)
Alternatively as a list comprehension:
[i for i, x in enumerate(c) if x == " "]
One way to do it would be:
space_idxs = []
for idx, char in enumerate(c):
    if char == ' ':
        space_idxs.append(idx)
That's because find(pattern) returns the first occurrence of the pattern. Let me supplement your code with the function you need, find_all(string, pattern):
def find_all(string, pattern):
    start = 0
    indexes = []
    for char in string:
        # search again, starting just after the previous match
        start = string.find(pattern, start)
        if start == -1:
            return indexes
        indexes.append(start)
        start += len(pattern)
    # reached when the loop runs out before find() returns -1
    return indexes
c = "ab cd ef gf"
n = []
n = find_all(c, " ")
print(n)
Try:
c="ab cd ef gh"
x=" "
print([t for t, k in enumerate(c) if k==x])
This prints [2, 5, 8].
Your code searches for the index of x in c three times. The for loop takes the characters of the string one by one, and the if statement checks whether the current character is a space. Each time it is, c.find(x) looks for the first space in c, which is at index 2. This happens three times, so 2 is appended to n three times.
If you want the result in a list:
n=[t for t, k in enumerate(c) if k==x]
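Another option, not used in the answers above, is re.finditer from the standard library, which yields one match object per occurrence. This is a sketch; the variable name positions is an illustrative choice.

```python
import re

c = "ab cd ef gf"

# Each match object knows where it starts, so no manual index tracking is needed.
positions = [m.start() for m in re.finditer(" ", c)]

print(positions)  # [2, 5, 8]
```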

Python optimization: find the most frequent sequence of 4 letters inside a randomly generated 1000-letter string

I'm here to ask for help with my program.
I wrote a program whose purpose is to find the most frequent four-letter sequence in a longer, randomly generated string of x letters.
For example, in 'abcdeabcdef' it's easy to see that the most frequent four-letter sequence is 'abcd', so the program should return this.
Unfortunately, my program is very slow: it takes 119.7 seconds to analyze all the possibilities and display the results for a string of only 1000 letters.
This is my program, right now :
import random
chars = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
string = ''
for _ in range(1000):
    string += str(chars[random.randint(0, 25)])
print(string)
number = []
for ____ in range(0,26):
    print(____)
    for ___ in range(0,26):
        for __ in range(0, 26):
            for _ in range(0, 26):
                test = chars[____] + chars[___] + chars[__] + chars[_]
                print('trying :',test, end = ' ')
                number.append(0)
                for i in range(len(string) -3):
                    if string[i: i+4] == test:
                        number[len(number) -1] += 1
                print('>> finished')
_max = max(number)
for i in range(len(number)-1):
    if number[i] == _max :
        j, k, l, m = i, 0, 0, 0
        while j > 25:
            j -= 26
            k += 1
        while k > 25:
            k -= 26
            l += 1
        while l > 25:
            l -= 26
            m += 1
        Result = chars[m] + chars[l] + chars[k] + chars[j]
        print(str(Result),'occured',_max, 'times' )
I think there are ways to optimize it, but at my level I really don't know how. Maybe the structure itself is not the best. I hope you can help me :D
You only need to loop through your list once to count the 4-letter sequences. You are currently looping n*n*n*n. You can use zip to make a four letter sequence that collects the 997 substrings, then use Counter to count them:
from collections import Counter
import random
chars = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
s = "".join([chars[random.randint(0, 25)] for _ in range(1000)])
it = zip(s, s[1:], s[2:], s[3:])
counts = Counter(it)
counts.most_common(1)
Edit:
.most_common(x) returns a list of the x most common strings. counts.most_common(1) returns a single item list with the tuple of letters and number of times it occurred like; [(('a', 'b', 'c', 'd'), 2)]. So to get a string, just index into it and join():
''.join(counts.most_common(1)[0][0])
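To see what the zip call produces, here is a small deterministic sketch on a five-letter string (the string "abcde" is an illustrative input, not from the question):

```python
s = "abcde"

# zip stops at the shortest input, so only full four-letter windows are produced.
windows = list(zip(s, s[1:], s[2:], s[3:]))

print(windows)                        # [('a', 'b', 'c', 'd'), ('b', 'c', 'd', 'e')]
print(["".join(w) for w in windows])  # ['abcd', 'bcde']
```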
Even with your current approach of iterating through every possible 4-letter combination, you can speed up a lot by keeping a dictionary instead of a list, and testing whether the sequence occurs at all first before trying to count the occurrences:
counts = {}
for a in chars:
    for b in chars:
        for c in chars:
            for d in chars:
                test = a + b + c + d
                print('trying :',test, end = ' ')
                if test in s: # if it occurs at all
                    # then record how often it occurs
                    # (len(s)-3 start positions, so the last window is included)
                    counts[test] = sum(1 for i in range(len(s)-3)
                                       if test == s[i:i+4])
The multiple loops can be replaced with itertools.product, though this improves readability rather than performance:
import itertools
length = 4
for sequence in itertools.product(chars, repeat=length):
    test = "".join(sequence)
    if test in s:
        counts[test] = sum(1 for i in range(len(s)-length+1) if test == s[i:i+length])
You can then display the results like this:
_max = max(counts.values())
for k, v in counts.items():
    if v == _max:
        print(k, "occurred", _max, "times")
Provided that the string is shorter or around the same length as 26**4 characters, then it is much faster still to iterate through the string rather than through every combination:
length = 4
counts = {}
for i in range(len(s) - length + 1):
    sequence = s[i:i+length]
    if sequence in counts:
        counts[sequence] += 1
    else:
        counts[sequence] = 1
This is equivalent to the Counter approach already suggested.
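Putting the pieces together, a compact sketch with Counter over string slices could look like this. The function name most_frequent and the choice to return every tied sequence are assumptions made for illustration.

```python
from collections import Counter


def most_frequent(s, length=4):
    """Return (sequences, count) for the most frequent windows of the given length."""
    # One pass over the string: count every window of `length` characters.
    counts = Counter(s[i:i + length] for i in range(len(s) - length + 1))
    if not counts:
        return [], 0
    best = max(counts.values())
    # Keep all sequences that reach the maximum, in order of first appearance.
    return [seq for seq, c in counts.items() if c == best], best


print(most_frequent("abcdeabcdef"))  # (['abcd', 'bcde'], 2)
```

In the example string from the question, 'bcde' occurs as often as 'abcd', which is why this sketch reports ties instead of a single winner.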

Finding Subarrays of Vowels from a given String

You are given a string S, and you have to find all the amazing substrings of S.
Amazing Substring is one that starts with a vowel (a, e, i, o, u, A, E, I, O, U).
Input
The only argument given is string S.
Output
Return a single integer X mod 10003, here X is number of Amazing Substrings in given string.
Constraints
1 <= length(S) <= 1e6
S can have special characters
Example
Input
ABEC
Output
6
Explanation
Amazing substrings of given string are :
1. A
2. AB
3. ABE
4. ABEC
5. E
6. EC
here number of substrings are 6 and 6 % 10003 = 6.
I have implemented the following algo for the above Problem.
class Solution:
    # @param A : string
    # @return an integer
    def solve(self, A):
        x = ['a', 'e','i','o', 'u', 'A', 'E', 'I', 'O', 'U']
        y = []
        z = len(A)
        for i in A:
            if i in x:
                n = A.index(i)
                m = z
                while m > n:
                    y.append(A[n:m])
                    m -= 1
        if y:
            return len(y)%10003
        else:
            return 0
The solution above works for short strings but gives wrong results for longer ones.
For example,
A = "pGpEusuCSWEaPOJmamlFAnIBgAJGtcJaMPFTLfUfkQKXeymydQsdWCTyEFjFgbSmknAmKYFHopWceEyCSumTyAFwhrLqQXbWnXSn"
The algorithm above outputs 1630 subarrays, but the expected answer is 1244.
Please help me improve the algorithm. Thanks for the help.
Focus on the required output: you do not need to find all of those substrings. All you need is the quantity of substrings.
Look again at your short example, ABEC. There are two vowels, A and E.
A is at location 0. There are 4 substrings starting there: one ending at A itself and one ending at each following location.
E is at location 2. There are 2 substrings starting there: one ending at E itself and one ending at each following location.
2+4 => 6
All you need do is to find the position of each vowel, subtract from the string length, and accumulate those differences:
A = "pGpEusuCSWEaPOJmamlFAnIBgAJGtcJaMPFTLfUfkQKXeymydQsdWCTyEFjFgbSmknAmKYFHopWceEyCSumTyAFwhrLqQXbWnXSn"
lenA = len(A)
vowel = "aeiouAEIOU"
count = 0
for idx, char in enumerate(A):
    if char in vowel:
        count += lenA - idx
print(count%10003)
Output:
1244
In a single command:
print( sum(len(A) - idx if char.lower() in "aeiou" else 0
           for idx, char in enumerate(A)) % 10003 )
When you hit a vowel in a string, all sub-strings that start with this vowel are 'amazing' so you can just count them:
def solve(A):
    x = ['a', 'e','i','o', 'u', 'A', 'E', 'I', 'O', 'U']
    ans = 0
    for i in range(len(A)):
        if A[i] in x:
            ans = (ans + len(A)-i)%10003
    return ans
When you are looking for the index of the element n = A.index(i), you get the index of the first occurrence of the element. By using enumerate you can loop through indices and elements simultaneously.
def solve(A):
    x = ['a', 'e','i','o', 'u', 'A', 'E', 'I', 'O', 'U']
    y = []
    z = len(A)
    for n,i in enumerate(A):
        if i in x:
            m = z
            while m > n:
                y.append(A[n:m])
                m -= 1
    if y:
        return len(y)%10003
    else:
        return 0
A more general solution is to find all amazing substrings and then count them:
string = "pGpEusuCSWEaPOJmamlFAnIBgAJGtcJaMPFTLfUfkQKXeymydQsdWCTyEFjFgbSmknAmKYFHopWceEyCSumTyAFwhrLqQXbWnXSn"
amazing_substring_start = ['a','e','i','o','u','A','E','I','O','U']
amazing_substrings = []
for i in range(len(string)):
    if string[i] in amazing_substring_start:
        for j in range(len(string[i:])+1):
            amazing_substring = string[i:i+j]
            if amazing_substring!='':
                amazing_substrings += [amazing_substring]
print(amazing_substrings,len(amazing_substrings)%10003)
Create a loop to calculate the number of amazing subarrays contributed by each vowel:
def Solve(A):
    sumn = 0
    for i in range(len(A)):
        if A[i] in "aeiouAEIOU":
            sumn += len(A[i:])
    return sumn%10003
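One way to gain confidence in the counting shortcut is to compare it with explicit enumeration on short inputs. The sketch below does that; the names count_fast and count_brute and the extra sample strings are illustrative assumptions, not part of the original problem.

```python
VOWELS = set("aeiouAEIOU")


def count_fast(s):
    """Each vowel at index i starts len(s) - i substrings."""
    n = len(s)
    return sum(n - i for i, ch in enumerate(s) if ch in VOWELS) % 10003


def count_brute(s):
    """Enumerate every substring and keep those starting with a vowel."""
    n = len(s)
    total = sum(
        1
        for start in range(n)
        for stop in range(start + 1, n + 1)
        if s[start:stop][0] in VOWELS
    )
    return total % 10003


# The two approaches should agree on any input; a few short samples:
for sample in ["ABEC", "banana", "xyz", "aaa"]:
    assert count_fast(sample) == count_brute(sample)

print(count_fast("ABEC"))  # 6
```

The brute-force version is quadratic in the string length, so it is only suitable for checking small cases, not for inputs near the 1e6 limit.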
