Finding Subarrays of Vowels from a given String


You are given a string S, and you have to find all the amazing substrings of S.
Amazing Substring is one that starts with a vowel (a, e, i, o, u, A, E, I, O, U).
Input
The only argument given is string S.
Output
Return a single integer X mod 10003, here X is number of Amazing Substrings in given string.
Constraints
1 <= length(S) <= 1e6
S can have special characters
Example
Input
ABEC
Output
6
Explanation
Amazing substrings of given string are :
1. A
2. AB
3. ABE
4. ABEC
5. E
6. EC
here number of substrings are 6 and 6 % 10003 = 6.
I have implemented the following algo for the above Problem.
class Solution:
    # @param A : string
    # @return an integer
    def solve(self, A):
        x = ['a', 'e','i','o', 'u', 'A', 'E', 'I', 'O', 'U']
        y = []
        z = len(A)
        for i in A:
            if i in x:
                n = A.index(i)
                m = z
                while m > n:
                    y.append(A[n:m])
                    m -= 1
        if y:
            return len(y)%10003
        else:
            return 0
The above solution works fine for short strings but gives wrong answers for longer ones.
For example,
A = "pGpEusuCSWEaPOJmamlFAnIBgAJGtcJaMPFTLfUfkQKXeymydQsdWCTyEFjFgbSmknAmKYFHopWceEyCSumTyAFwhrLqQXbWnXSn"
The above algorithm outputs 1630 substrings, but the expected answer is 1244.
Please help me improve it. Thanks for the help.

Focus on the required output: you do not need to find all of those substrings. All you need is the quantity of substrings.
Look again at your short example, ABEC. There are two vowels, A and E.
A is at location 0. There are 4 total substrings, ending there and at each following location.
E is at location 2. There are 2 total substrings, ending there and at each following location.
2+4 => 6
All you need do is to find the position of each vowel, subtract from the string length, and accumulate those differences:
A = "pGpEusuCSWEaPOJmamlFAnIBgAJGtcJaMPFTLfUfkQKXeymydQsdWCTyEFjFgbSmknAmKYFHopWceEyCSumTyAFwhrLqQXbWnXSn"
lenA = len(A)
vowel = "aeiouAEIOU"
count = 0
for idx, char in enumerate(A):
    if char in vowel:
        count += lenA - idx
print(count%10003)
Output:
1244
In a single command:
print(sum(len(A) - idx if char.lower() in "aeiou" else 0
          for idx, char in enumerate(A)) % 10003)
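As a sanity check on the counting argument, the formula can be compared against a brute-force enumeration on short strings. This is only a sketch; the names count_amazing and count_amazing_brute are made up for illustration and are not part of the original problem.

```python
VOWELS = set("aeiouAEIOU")
MOD = 10003

def count_amazing(s):
    """Count substrings that start with a vowel, modulo MOD, in one pass."""
    n = len(s)
    # A vowel at position idx starts exactly n - idx substrings.
    return sum(n - idx for idx, ch in enumerate(s) if ch in VOWELS) % MOD

def count_amazing_brute(s):
    """Visit every (start, stop) pair explicitly; only for short inputs."""
    n = len(s)
    total = 0
    for start in range(n):
        for stop in range(start + 1, n + 1):
            if s[start] in VOWELS:
                total += 1
    return total % MOD

print(count_amazing("ABEC"))        # 6, as in the question's example
print(count_amazing_brute("ABEC"))  # 6
```

The brute-force version is quadratic, so it is useful for testing but not for strings near the 1e6 limit.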

When you hit a vowel in a string, all sub-strings that start with this vowel are 'amazing' so you can just count them:
def solve(A):
    x = ['a', 'e','i','o', 'u', 'A', 'E', 'I', 'O', 'U']
    ans = 0
    for i in range(len(A)):
        if A[i] in x:
            ans = (ans + len(A)-i)%10003
    return ans

When you look up the index with n = A.index(i), you get the index of the first occurrence of that character, so every repeated vowel is counted from the same position. By using enumerate you can loop through indices and elements simultaneously.
def solve(A):
    x = ['a', 'e','i','o', 'u', 'A', 'E', 'I', 'O', 'U']
    y = []
    z = len(A)
    for n,i in enumerate(A):
        if i in x:
            m = z
            while m > n:
                y.append(A[n:m])
                m -= 1
    if y:
        return len(y)%10003
    else:
        return 0
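To see the difference between str.index and enumerate on a repeated vowel, here is a small illustration. The input "EBE" is a made-up example, not taken from the question.

```python
word = "EBE"  # hypothetical input with a repeated vowel

# str.index always reports the first occurrence, so both E's map to index 0.
positions_via_index = [word.index(ch) for ch in word if ch in "aeiouAEIOU"]

# enumerate pairs each character with its own position.
positions_via_enumerate = [idx for idx, ch in enumerate(word) if ch in "aeiouAEIOU"]

print(positions_via_index)      # [0, 0]
print(positions_via_enumerate)  # [0, 2]
```

With the index-based positions, the second E would be credited with the substrings of the first one, which is how the count ends up too high.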

A more general solution is to find all amazing substrings and then count them :
string = "pGpEusuCSWEaPOJmamlFAnIBgAJGtcJaMPFTLfUfkQKXeymydQsdWCTyEFjFgbSmknAmKYFHopWceEyCSumTyAFwhrLqQXbWnXSn"
amazing_substring_start = ['a','e','i','o','u','A','E','I','O','U']
amazing_substrings = []
for i in range(len(string)):
    if string[i] in amazing_substring_start:
        for j in range(len(string[i:])+1):
            amazing_substring = string[i:i+j]
            if amazing_substring!='':
                amazing_substrings += [amazing_substring]
print(amazing_substrings, len(amazing_substrings)%10003)

Create a loop that adds up the number of amazing substrings contributed by each vowel:
def Solve(A):
    sumn = 0
    for i in range(len(A)):
        if A[i] in "aeiouAEIOU":
            sumn += len(A[i:])
    return sumn%10003

Related

How do you separate all possible substrings in a string?

For example, let's say:
Str = "abc"
The desired output I am looking for is:
a, b, c, ab, bc, abc
so far I have:
#input
Str = input("Please enter a word: ")
#len of word
n = len(Str)
#loops to separate the string into substrings
for Len in range(1,n + 1):
    for i in range(n - Len + 1):
        j = i + Len - 1
        for k in range(i,j + 1):
            #printing all the substrings
            print(Str[k],end="")
this would get me:
abcabbcabc
which has all the correct substrings, but not separated. What do I do to get my desired output? I thought end='' would do the trick of separating the substrings onto individual lines, but it doesn't. Any suggestions?

You could add an extra print() in the i loop, but it's easier to use a slice instead:
s = "abc"
n = len(s)
for size in range(1, n+1):
    for start in range(n-size+1):
        stop = start + size
        print(s[start:stop])
Output:
a
b
c
ab
bc
abc
On the other hand, if you want them literally joined on comma-spaces as you wrote, the simplest way is to save them in a list then join at the end.
s = "abc"
n = len(s)
L = []
for size in range(1, n+1):
    for start in range(n-size+1):
        stop = start + size
        L.append(s[start:stop])
print(*L, sep=', ')
Or, I would probably use a list comprehension for this:
s = "abc"
n = len(s)
L = [s[j:j+i] for i in range(1, n+1) for j in range(n-i+1)]
print(*L, sep=', ')
Output:
a, b, c, ab, bc, abc

A more Pythonic solution (note that combinations also yields non-contiguous picks such as 'ac'):
code:
import itertools
s = "abc"
for i in range(1,len(s)+1):
    print(["".join(word) for word in itertools.combinations(s,i)])
result:
['a', 'b', 'c']
['ab', 'ac', 'bc']
['abc']
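If only contiguous substrings are wanted, one option is to apply itertools.combinations to slice boundaries rather than to the characters themselves, since characters picked by combinations need not be adjacent. A minimal sketch of that idea:

```python
from itertools import combinations

s = "abc"
# Each pair (start, stop) with start < stop selects one contiguous slice.
substrings = [s[start:stop] for start, stop in combinations(range(len(s) + 1), 2)]
# Sort by length to match the order asked for in the question;
# sorted() is stable, so equal-length slices keep their left-to-right order.
substrings = sorted(substrings, key=len)
print(", ".join(substrings))  # a, b, c, ab, bc, abc
```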

Is there a way to improve a function that finds substrings of a very large string?

I tried to run this code, but the function takes too long. I want to improve it:
def minion_game(string):
    k = 0
    s = 0
    for i in range(len(string)):
        for j in range(i + 1, len(string) + 1):
            ss = string[i:j]
            if ss[0] in ['A', 'E', 'I', 'O', 'U']:
                k += 1
            else:
                s += 1
    if len(string) in range(0, 10 ** 6):
        if string.isupper():
            if k > s:
                print(f"Kevin {k}")
            if s > k:
                print(f"Stuart {s}")
            if k == s:
                print("Draw")

Using the Counter class is usually pretty efficient in a case like this. This should be mostly similar to what you have done in terms of results, but hopefully much quicker.
from collections import Counter
k_and_s = Counter('k' if c in 'AEIOU' else 's' for c in string)
k, s = k_and_s['k'], k_and_s['s']
if k > s:
    print(f'Kevin {k}')
elif k < s:
    print(f'Stuart {s}')
else:
    print('Draw')
Zooming in on k_and_s = Counter('k' if c in 'AEIOU' else 's' for c in string), this uses a generator expression in place of a loop. It is roughly equivalent to this:
k_and_s = Counter()
for c in string:
    if c in 'AEIOU':
        k_and_s['k'] += 1
    else:
        k_and_s['s'] += 1

The answer by @jamie-deith is good and fast. It will process the complete works of Shakespeare in about 0.56 seconds on my computer. I gave up timing the original answer and modifications of it, as it simply goes on and on.
This version is simple and produces the same answer in 0.26 seconds. I'm sure there are even faster answers:
with open("shakespeare.txt", encoding="utf-8") as file_in:
    shakespeare = file_in.read().upper()
kevin = len([character for character in shakespeare if character in 'AEIOU'])
stuart = len(shakespeare) - kevin
if kevin > stuart:
    print(f'Kevin {kevin}')
elif kevin < stuart:
    print(f'Stuart {stuart}')
else:
    print('Draw')

Taking the (perhaps doubtful) position that your code is doing what you intend, but slowly, I note:
The amount added to k or s for any value of i depends on how many times we go around the j loop. You're repeatedly testing the character at i (with the same result every time of course) and adding one to either s or k, as many times as we go around the loop.
So we don't need to actually go around the j loop; we can just add that amount on a single test. For the first character you go the same number of times around the loop as the length of the string, then reducing by one as you shift along the string.
So we can lose i and iterate through the string characters directly.
Then finally we don't report anything if the string is too long or not upper case, so we can just do that test first, and not even calculate in those circumstances.
def minion_game(string):
    if len(string) < 10**6 and string.isupper():
        k = 0
        s = 0
        j = len(string)
        for ss in string:
            if ss in 'AEIOU':
                k += j
            else:
                s += j
            j -= 1  # reducing amount to add
        if k > s:
            print(f"Kevin {k}")
        elif s > k:
            print(f"Stuart {s}")
        else:
            print("Draw")
As a hint for even faster options, I'll note that k+s is constant depending on the length of the string.
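One way to read that hint: a string of length n has n(n+1)/2 non-empty substrings in total, so only one of the two scores needs to be accumulated and the other follows by subtraction. A minimal sketch, where the function name minion_scores and the test word "BANANA" are illustrative choices rather than anything from the question:

```python
def minion_scores(string):
    """Return (kevin, stuart); kevin counts substrings starting with a vowel."""
    n = len(string)
    total = n * (n + 1) // 2  # number of non-empty substrings
    # A vowel at index idx starts n - idx substrings.
    kevin = sum(n - idx for idx, ch in enumerate(string) if ch in "AEIOU")
    stuart = total - kevin
    return kevin, stuart

print(minion_scores("BANANA"))  # (9, 12)
```

Whether this is measurably faster than the loop above depends on the input; the main gain is dropping the else branch.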

def minion_game(string):
    k = 0
    s = 0
    l = len(string)  # save length in a variable
    for i in range(l):
        for j in range(i + 1, l + 1):
            ss = string[i]  # take only the first
            if ss in ['A', 'E', 'I', 'O', 'U']:
                k += 1
            else:
                s += 1
    if l in range(0, 10 ** 6):
        if string.isupper():
            if k > s:  # change from three if's to if, elif, else
                print(f"Kevin {k}")
            elif s > k:
                print(f"Stuart {s}")
            else:
                print("Draw")
I made a few edits that should speed up your code. They are described in comments on the lines. There seems to be some logic missing in the j-loop.
I'm not sure what you are doing on the line if l in range(0, 10 ** 6):. If you wanted to remove it, then it'd look like:
def minion_game(string):
    k = 0
    s = 0
    l = len(string)  # save length in a variable
    for i in range(l):
        for j in range(i + 1, l + 1):
            ss = string[i]  # take only the first
            if ss in ['A', 'E', 'I', 'O', 'U']:
                k += 1
            else:
                s += 1
    # removed the range check on the length
    if string.isupper():
        if k > s:  # change from three if's to if, elif, else
            print(f"Kevin {k}")
        elif s > k:
            print(f"Stuart {s}")
        else:
            print("Draw")

How to compute the weight of a string in Python?

Given a string S, we define its weight, weight(S), as the product of the positions of the vowels in the string (counting from 1). Ex: weight("e") = 1; weight("age") = 3; weight("pippo") = 10.
I tried this:
def weight(s):
    vowels = ['a','e','i','o','u']
    numbers = []
    for c in s:
        if c in vowels:
            n = s.index(c)+1
            numbers.append(n)
    result = 1
    for x in numbers:
        result = result*x
    print(result)
But it works only with different vowels. If there is the same vowel in the string, the number is wrong.
What am I missing?
Thank you all.

You can use this:
import numpy as np
s = 'pippo'
np.prod([i+1 for i,v in enumerate(s) if v in ['a','e','i','o','u']])
10

str.index() works like str.find in that it returns only the index of the first occurrence:
Return the lowest index in the string where substring sub is found [...]
(Source: str.index -> str.find)
functools.reduce and operator.mul, together with enumerate (starting from 1), make this a one-liner:
from operator import mul
from functools import reduce
value = reduce(mul, (i for i,c in enumerate("pippo",1) if c in "aeiou"))
Or for all your strings:
for t in ["e","age","pippo"]:
    # oneliner (if you omit the imports and iterating over all your given examples)
    print(t, reduce(mul, (i for i,c in enumerate(t,1) if c in "aeiou")))
Output:
e 1
age 3
pippo 10
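One caveat with the reduce version is that it raises TypeError when the string contains no vowels, because no initial value is supplied. As a sketch of an alternative, assuming Python 3.8 or later, math.prod returns 1 for an empty sequence. Whether 1 is the right weight for a vowel-free string is an assumption here; the question does not say.

```python
from math import prod  # available in Python 3.8 and later

def weight(s, vowels="aeiou"):
    """Multiply the 1-based positions of the vowels in s."""
    return prod(i for i, c in enumerate(s, 1) if c in vowels)

for t in ["e", "age", "pippo"]:
    print(t, weight(t))
```

For the three examples from the question this prints the same weights as above: 1, 3 and 10.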

Maybe not an optimal way to do it, but this works.
vowels = ['a', 'e', 'i', 'o', 'u', 'y']
mystring = 'pippo'
weight = 1
i = 0
while i < len(mystring):
    if mystring[i] in vowels:
        weight *= i+1
    i += 1
if weight == 1 and mystring[0] not in vowels:
    weight = 0
print(weight)
The final if statement handles the one exceptional case where the string contains no vowels.

You may want to use enumerate. It makes the job easy.
The code becomes:
def weight(s):
    vowels = ['a','e','i','o','u']
    wt=1
    for i,c in enumerate(s):
        if c in vowels:
            wt*=i+1
    return wt
print(weight("asdew"))

When you call s.index(c), it returns the index of the first occurrence of the character in the string.
You should use enumerate to iterate through the string. enumerate gives you both the index and the value of each element while iterating over an iterable.
def weight(s):
    vowels = ['a','e','i','o','u']
    numbers = []
    for ind, c in enumerate(s):
        if c in vowels:
            n = ind+1
            numbers.append(n)
    result = 1
    for x in numbers:
        result = result*x
    print(result)
You can read more about enumerate at the link below:
http://book.pythontips.com/en/latest/enumerate.html

Find Repeating Substring In a List

I have a long list of sub-strings (close to 16000) in which I want to find where the repeating cycle starts and stops. I have come up with this code as a starting point:
strings= ['1100100100000010',
'1001001000000110',
'0010010000001100',
'0100100000011011',
'1001000000110110',
'0010000001101101',
'1100100100000010',
'1001001000000110',
'0010010000001100',
'0100100000011011',]
pat = [ '1100100100000010',
'1001001000000110',
'0010010000001100',]
for i in range(0,len(strings)-1):
    for j in range(0,len(pat)):
        if strings[i] == pat[j]:
            continue
        if strings[i+1] == pat[j]:
            print('match', strings[i])
            break
        break
The problem with this method is that you have to know what pat is in order to search for it. I would like to start with the first n sub-strings (in this case 3) and search for them; if there is no match, move down one sub-string to the next 3, until it has gone through the entire list or found the repeat. I believe that if n is high enough (maybe 10), it will find the repeat without being too time-consuming.

strings= ['1100100100000010',
'1001001000000110',
'0010010000001100',
'0100100000011011',
'1001000000110110',
'0010000001101101',
'1100100100000010',
'1001001000000110',
'0010010000001100',
'0100100000011011',]
n = 3
patt_dict = {}
for i in range(0, len(strings) - n + 1, 1):  # + 1 so the last window is included
    patt = (' '.join(strings[i:i + n]))
    if patt not in patt_dict.keys(): patt_dict[patt] = 1
    else: patt_dict[patt] += 1
for key in patt_dict.keys():
    if patt_dict[key] > 1:
        print('Found ' + str(patt_dict[key]) + ' repeating instances of ' + str(key) + '.')
Give this a shot. Runs in linear time. Basically uses a dictionary to count the number of times that an n-size pattern occurs in a subset. If it exceeds 1, then we have a repeating pattern :)
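Since the question asks where the cycle starts and stops, a variation on the same dictionary idea is to record the start index of each window instead of only a count. This is a sketch under assumptions: the function name repeated_windows and the sample data are made up for illustration.

```python
def repeated_windows(items, n):
    """Map each n-item window that occurs more than once to its start indices."""
    starts = {}
    for i in range(len(items) - n + 1):
        window = tuple(items[i:i + n])  # tuples are hashable, lists are not
        starts.setdefault(window, []).append(i)
    return {w: idxs for w, idxs in starts.items() if len(idxs) > 1}

sample = ["a", "b", "c", "d", "a", "b", "c"]  # made-up data
print(repeated_windows(sample, 3))  # {('a', 'b', 'c'): [0, 4]}
```

The difference between two start indices of the same window then gives a candidate cycle length.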

Here's a reasonably simple way that finds all matches of all lengths >= 1:
def findall(xs):
    from itertools import combinations
    # x2i maps each member of xs to a list of all the
    # indices at which that member appears.
    x2i = {}
    for i, x in enumerate(xs):
        x2i.setdefault(x, []).append(i)
    n = len(xs)
    for ixs in x2i.values():
        if len(ixs) > 1:
            for i, j in combinations(ixs, 2):
                length = 1  # xs[i] == xs[j]
                while (i + length < n and
                       j + length < n and
                       xs[i + length] == xs[j + length]):
                    length += 1
                yield i, j, length
Then:
for i, j, n in findall(strings):
    print("match of length", n, "at indices", i, "and", j)
displays:
match of length 4 at indices 0 and 6
match of length 1 at indices 3 and 9
match of length 3 at indices 1 and 7
match of length 2 at indices 2 and 8
What you do and don't want hasn't been precisely specified, so this lists all matches. You probably don't really want some of them. For example, the match of length 3 at indices 1 and 7 is just the tail end of the match of length 4 at indices 0 and 6.
So you'll need to alter the code to compute what you really want. Perhaps you only want a single, maximal match? All maximal matches? Only matches of a particular length? Etc.
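As one possible reading of "maximal", a match can be dropped when the elements just before both of its positions are also equal, because it is then the tail of a longer match. The sketch below assumes (i, j, length) tuples with i < j, like those produced above; the name maximal_matches and the sample data are hypothetical.

```python
def maximal_matches(xs, matches):
    """Keep only matches that cannot be extended one step to the left."""
    for i, j, length in matches:
        # Equal predecessors mean this match is the tail of a longer one.
        if i > 0 and xs[i - 1] == xs[j - 1]:
            continue
        yield i, j, length

xs = list("abcdXabcd")  # made-up data: "abcd" occurs at indices 0 and 5
all_matches = [(0, 5, 4), (1, 6, 3), (2, 7, 2), (3, 8, 1)]
print(list(maximal_matches(xs, all_matches)))  # [(0, 5, 4)]
```

This filter only removes tails of longer matches; other definitions of what to keep would need different logic.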

Here's something that will find all subarrays that match within the strings array.
strings = ['A', 'B', 'C', 'D', 'Z', 'B', 'B', 'C', 'A', 'B', 'C']
pat = ['A', 'B', 'C', 'D']
i = 0
while i < len(strings):
    if strings[i] not in pat:
        i += 1
        continue
    matches = 0
    for j in range(pat.index(strings[i]), len(pat)):
        if i + j - pat.index(strings[i]) >= len(strings):
            break
        if strings[i + j - pat.index(strings[i])] == pat[j]:
            matches += 1
        else:
            break
    if matches:
        print('matched at index %d subsequence length: %d value %s' % (i, matches, strings[i]))
        i += matches
    else:
        i += 1
Output:
matched at index 0 subsequence length: 4 value A
matched at index 5 subsequence length: 1 value B
matched at index 6 subsequence length: 2 value B
matched at index 8 subsequence length: 3 value A

Is it possible to make a letter range in python?

Is there a way to do a letter range in python like this:
for x in range(a,h,)

Something like:
[chr(i) for i in range(ord('a'),ord('h'))]
This will give a list of alphabetical characters to iterate through, which you can then use in a loop:
for x in [chr(i) for i in range(ord('a'),ord('h'))]:
    print(x)
or this will do the same:
for x in map(chr, range(*map(ord,['a', 'h']))):
    print(x)

You can use ord() to convert the letters into character ordinals and back:
def char_range(start, end, step=1):
    for char in range(ord(start), ord(end), step):
        yield chr(char)
It seems to work just fine:
>>> ''.join(char_range('a', 'z'))
'abcdefghijklmnopqrstuvwxy'
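Note that, like range itself, char_range stops before end, which is why 'z' is missing from the output above. If an inclusive range is wanted, a minimal variant could look like this. The name char_range_inclusive is made up here, and a positive step is assumed.

```python
def char_range_inclusive(start, end, step=1):
    """Yield characters from start through end, including end itself.

    Assumes step is positive; a negative step would need end - 1 instead.
    """
    for code in range(ord(start), ord(end) + 1, step):
        yield chr(code)

print("".join(char_range_inclusive("a", "z")))  # abcdefghijklmnopqrstuvwxyz
```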

There is no built-in letter range, but you can write one:
def letter_range(start, stop):
    for c in range(ord(start), ord(stop)):
        yield chr(c)
for x in letter_range('a', 'h'):
    print(x, end=' ')
prints:
a b c d e f g

Emanuele's solution is great as long as one is only asking for a range of single characters, which I will admit is what the original questioner posed. There are also solutions out there to generate all multi-character combinations: How to generate a range of strings from aa... to zz. However, I suspect that someone who wants a range-like function for characters might want to generate an arbitrary range from, say, 'y' to 'af' (rolling over from 'z' to 'aa'). So here is a more general solution that lets you specify either the last member of the range or its length.
def strange(start, end_or_len, sequence='ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    """Create a generator of a range of 'sequential' strings from
    start to end_or_len if end_or_len is a string or containing
    end_or_len entries if end_or_len is an integer.
    >>> list(strange('D', 'F'))
    ['D', 'E', 'F']
    >>> list(strange('Y', 'AB'))
    ['Y', 'Z', 'AA', 'AB']
    >>> list(strange('Y', 4))
    ['Y', 'Z', 'AA', 'AB']
    >>> list(strange('A', 'BAA', sequence='AB'))
    ['A', 'B', 'AA', 'AB', 'BA', 'BB', 'AAA', 'AAB', 'ABA', 'ABB', 'BAA']
    >>> list(strange('A', 11, sequence='AB'))
    ['A', 'B', 'AA', 'AB', 'BA', 'BB', 'AAA', 'AAB', 'ABA', 'ABB', 'BAA']
    """
    seq_len = len(sequence)
    start_int_list = [sequence.find(c) for c in start]
    if isinstance(end_or_len, int):
        inclusive = True
        end_int_list = list(start_int_list)
        i = len(end_int_list) - 1
        end_int_list[i] += end_or_len - 1
        while end_int_list[i] >= seq_len:
            j = end_int_list[i] // seq_len
            end_int_list[i] = end_int_list[i] % seq_len
            if i == 0:
                end_int_list.insert(0, j-1)
            else:
                i -= 1
                end_int_list[i] += j
    else:
        end_int_list = [sequence.find(c) for c in end_or_len]
    while len(start_int_list) < len(end_int_list) or start_int_list <= end_int_list:
        yield ''.join([sequence[i] for i in start_int_list])
        i = len(start_int_list)-1
        start_int_list[i] += 1
        while start_int_list[i] >= seq_len:
            start_int_list[i] = 0
            if i == 0:
                start_int_list.insert(0,0)
            else:
                i -= 1
                start_int_list[i] += 1
if __name__ =='__main__':
    import doctest
    doctest.testmod()

import string
def letter_range(f,l,al = string.ascii_lowercase):
    for x in al[al.index(f):al.index(l)]:
        yield x
print(' '.join(letter_range('a','h')))
result
a b c d e f g

This is easier for me, at least, to read and understand (and you can easily customize which letters are included, and in what order):
letters = 'abcdefghijklmnopqrstuvwxyz'
for each in letters:
    print(each)
result:
a
b
c
...
z

how about slicing an already pre-arranged list?
import string
s = string.ascii_lowercase
print( s[ s.index('b'):s.index('o')+1 ] )

Malcom's example works great, but there is a small problem caused by how Python's list comparison works. Ranges such as 'A' to "Z", or from some character to "ZZ" or "ZZZ", iterate incorrectly.
The comparison treats "AA" as less than "Z" and "AAA" as less than "ZZ", which is wrong for this ordering.
In Python, [0,0,0] is smaller than [1,1] when compared with the "<" or ">" operator.
So the line below
while len(start_int_list) < len(end_int_list) or start_int_list <= end_int_list:
should be rewritten as
while len(start_int_list) < len(end_int_list) or\
        (len(start_int_list) == len(end_int_list) and start_int_list <= end_int_list):
This is well explained here:
https://docs.python.org/3/tutorial/datastructures.html#comparing-sequences-and-other-types
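The comparison behaviour can be checked directly. The short sketch below also shows a length-first comparison of the kind the corrected loop condition relies on; the helper name precedes_or_equals is made up for illustration.

```python
# Lists compare element by element; length only matters when one list
# is a prefix of the other.
print([0, 0, 0] < [1, 1])  # True: decided by the first elements, 0 < 1
print([0, 0] <= [25])      # True: the indices for "AA" sort before those for "Z"
print([1, 2] < [1, 2, 0])  # True: a proper prefix is the smaller list

def precedes_or_equals(a, b):
    """Compare index lists by length first, then element by element."""
    return len(a) < len(b) or (len(a) == len(b) and a <= b)

print(precedes_or_equals([0, 0], [25]))  # False: "AA" comes after "Z"
print(precedes_or_equals([24], [0, 1]))  # True: "Y" comes before "AB"
```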
I rewrote the code example below.
def strange(start, end_or_len, sequence='ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    seq_len = len(sequence)
    start_int_list = [sequence.find(c) for c in start]
    if isinstance(end_or_len, int):
        inclusive = True
        end_int_list = list(start_int_list)
        i = len(end_int_list) - 1
        end_int_list[i] += end_or_len - 1
        while end_int_list[i] >= seq_len:
            j = end_int_list[i] // seq_len
            end_int_list[i] = end_int_list[i] % seq_len
            if i == 0:
                end_int_list.insert(0, j-1)
            else:
                i -= 1
                end_int_list[i] += j
    else:
        end_int_list = [sequence.find(c) for c in end_or_len]
    while len(start_int_list) < len(end_int_list) or\
            (len(start_int_list) == len(end_int_list) and start_int_list <= end_int_list):
        yield ''.join([sequence[i] for i in start_int_list])
        i = len(start_int_list)-1
        start_int_list[i] += 1
        while start_int_list[i] >= seq_len:
            start_int_list[i] = 0
            if i == 0:
                start_int_list.insert(0,0)
            else:
                i -= 1
                start_int_list[i] += 1
Anyway, Malcom's code example is a great illustration of how generators in Python work.

Sometimes one can over-design what can be a simple solution.
If you know the range of letters you want, why not just use:
for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
    print(letter)
Or even:
start = 4
end = 9
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
for letter in alphabet[start:end]:
    print(letter)
In the second example I illustrate an easy way to pick how many letters you want from a fixed list.
