How to compute the weight of a string in Python? - python

Given a string S, we define its weight, weight(S), as the product of the positions of the vowels in the string (positions start at 1). Examples: weight("e") = 1; weight("age") = 3; weight("pippo") = 10.
I tried this:
def weight(s):
    vowels = ['a','e','i','o','u']
    numbers = []
    for c in s:
        if c in vowels:
            n = s.index(c)+1
            numbers.append(n)
    result = 1
    for x in numbers:
        result = result*x
    print(result)
But it works only with different vowels. If there is the same vowel in the string, the number is wrong.
What am I missing?
Thank you all.

You can use this:
import numpy as np
s = 'pippo'
np.prod([i+1 for i,v in enumerate(s) if v in ['a','e','i','o','u']])
10

str.index() works like str.find() in that it returns only the index of the first occurrence:
Return the lowest index in the string where substring sub is found [...]
(Source: the documentation of str.index, which refers to str.find.)
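To see the effect on a string with a repeated vowel, here is a small sketch. The helper names and the sample word "banana" are illustrative only and do not come from the question.

```python
VOWELS = "aeiou"

def positions_via_index(s):
    # Mirrors the question's approach: str.index always reports the first match.
    return [s.index(c) + 1 for c in s if c in VOWELS]

def positions_via_enumerate(s):
    # enumerate(s, 1) yields the real 1-based position of every character.
    return [i for i, c in enumerate(s, 1) if c in VOWELS]

print(positions_via_index("banana"))      # [2, 2, 2]
print(positions_via_enumerate("banana"))  # [2, 4, 6]
```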
functools.reduce and operator.mul together with enumerate (from 1) makes this a one-liner:
from operator import mul
from functools import reduce
value = reduce(mul, (i for i,c in enumerate("pippo",1) if c in "aeiou"))
Or for all your strings:
for t in ["e","age","pippo"]:
    # oneliner (if you omit the imports and iterating over all your given examples)
    print(t, reduce(mul, (i for i,c in enumerate(t,1) if c in "aeiou")))
Output:
e 1
age 3
pippo 10
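One caveat with the reduce one-liner: without an initial value it raises TypeError when the string has no vowels. A possible variant, assuming Python 3.8+ (for math.prod) and assuming that a vowel-free string should get weight 1 (the empty product), which the question does not specify:

```python
from math import prod

def weight(s, vowels="aeiou"):
    # prod() of an empty iterable is 1, so a string without vowels gets weight 1.
    return prod(i for i, c in enumerate(s, 1) if c in vowels)

for t in ["e", "age", "pippo", "xyz"]:
    print(t, weight(t))
```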

Maybe not an optimal way to do it, but this works.
vowels = ['a', 'e', 'i', 'o', 'u', 'y']
mystring = 'pippo'
weight = 1
i = 0
while i < len(mystring):
    if mystring[i] in vowels:
        weight *= i+1
    i += 1
if weight == 1 and mystring[0] not in vowels:
    weight = 0
print(weight)
The final if statement handles the one exceptional case where the string contains no vowels: the weight is then set to 0.

You may want to use enumerate. It makes the job easy.
The code becomes:
def weight(s):
    vowels = ['a','e','i','o','u']
    wt=1
    for i,c in enumerate(s):
        if c in vowels:
            wt*=i+1
    return wt
print(weight("asdew"))

When you call s.index(c), it returns the index of the first occurrence of the character in the string.
You should use enumerate to iterate through the string. enumerate gives you both the index and the value of each element while iterating over an iterable.
def weight(s):
    vowels = ['a','e','i','o','u']
    numbers = []
    for ind, c in enumerate(s):
        if c in vowels:
            n = ind+1
            numbers.append(n)
    result = 1
    for x in numbers:
        result = result*x
    print(result)
You can read about enumerate at the link below:
http://book.pythontips.com/en/latest/enumerate.html

Related

Find the difference between two strings of uneven length in python

a = 'abcdfjghij'
b = 'abcdfjghi'
Output : j
def diff(a, b):
    string=''
    for val in a:
        if val not in b:
            string=val
    return string
a = 'abcdfjghij'
b = 'abcdfjghi'
print(diff(a,b))
This code returns an empty string.
Any solution for this?
collections.Counter from the standard library can be used to model multi-sets, so it keeps track of repeated elements. It's a subclass of dict which is performant and extends its functionality for counting purposes. To find differences between two strings you can mimic a symmetric difference between sets.
from collections import Counter
a = 'abcdfjghij'
b = 'abcdfjghi'
ca = Counter(a)
cb = Counter(b)
diff = (cb-ca)+(ca-cb) # symmetric difference
print(diff)
#Counter({'j': 1})
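If a plain string is wanted rather than a Counter, the result can be flattened with Counter.elements(). A small sketch along those lines; the function name symmetric_diff is made up for illustration, and sorting the characters is an arbitrary choice to make the output deterministic:

```python
from collections import Counter

def symmetric_diff(a, b):
    ca, cb = Counter(a), Counter(b)
    # Counter subtraction keeps only positive counts, so each side
    # contributes the characters it has in excess of the other.
    extra = (ca - cb) + (cb - ca)
    return ''.join(sorted(extra.elements()))

print(symmetric_diff('abcdfjghij', 'abcdfjghi'))  # j
print(symmetric_diff('abcaa', 'aaa'))             # bc
```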
It's hard to know exactly what you want based on your question. For example, should
'abc'
'efg'
return 'abc' or 'efg', or is there always going to be just one character added?
Here is a solution that accounts for multiple differing characters, but it still might not give your exact output.
def diff(a, b):
    string = ''
    if(len(a) >= len(b)):
        longString = a
        shortString = b
    else:
        longString = b
        shortString = a
    for i in range(len(longString)):
        if(i >= len(shortString) or longString[i] != shortString[i]):
            string += longString[i]
    return string
a = 'abcdfjghij'
b = 'abcdfjghi'
print(diff(a,b))
If one string just has one character added and it could be anywhere in the string, you could change
string += longString[i]
to
string = longString[i]
In your example, there are 2 differences between the 2 strings:
the letters g and j.
I tested your code and it returns g, because all the other letters from a are in b:
a = 'abcdfjghij'
b = 'abcdfjhi'
def diff(a, b):
    string=''
    for val in a:
        if val not in b:
            string=val
    return string
print(diff(a,b))
updated
But you have j twice in a. So the first time it sees j it looks at b and sees a j, all good. For the second j it looks again and still sees a j, all good.
If you want to check whether each letter matches the letter at the same position in the other string, then you should try this:
a = 'abcdfjghij'
b = 'abcdfjghi'
def diff(a, b):
    if len(a)>len(b):
        smallest_len = len(b)
        for index, value in enumerate(a[:smallest_len]):
            if a[index] != b[index]:
                print(f'a value {a[index]} at index {index} does not match b value {b[index]}')
        if len(a) == len(b):
            pass
        else:
            print(f'Extra Values in A Are {a[smallest_len:]}')
    else:
        smallest_len = len(a)
        for index, value in enumerate(b[:smallest_len]):
            if a[index] != b[index]:
                print(f'a value {a[index]} at index {index} does not match b value {b[index]}')
        if len(a) == len(b):
            pass
        else:
            print(f'Extra Values in B Are {b[smallest_len:]}')
diff(a, b)
If I understand correctly, your question is:
"given 2 strings of different length, how can I find the characters
that are different between them?"
Judging by your example, this means you want either the characters that are present in only one of the strings, or the characters that occur in both but a different number of times.
Here's a simple solution (maybe not the most efficient one), but one that's short and does not require any extra packages:
UPDATED:
a = 'abcdfjghij'
b = 'abcdfjghi'
dict_a = dict( (char, a.count(char)) for char in a)
dict_b = dict( (char, b.count(char)) for char in b)
idx_longest = [dict_a, dict_b].index(max([dict_a, dict_b], key = len))
results = [ k for (k,v) in [dict_a, dict_b][idx_longest].items() if k not in [dict_a, dict_b][1-idx_longest].keys() or v!=[dict_a, dict_b][1-idx_longest][k] ]
print(results)
> ['j']
Or you can try another pair of strings, such as
a = 'abcaa'
b = 'aaa'
# rebuild dict_a, dict_b and results with the lines above, then:
print(results)
> ['b', 'c']
as 'a' occurs in both strings an equal number of times.

count substrings of a string with limitation

I have a string and a dictionary. I need to count the substrings of the given string that use only letters from the dict, each no more often than the dict allows. By hand I counted 15 substrings (2 a + 4 b + 1 d + 2 ba + 2 ab + bd + db + abd + dba), but I cannot write the program. My attempt needs improving (I hope it only requires an else condition):
string = 'babdbabcce'
dict= {'a':1,'b':1,'d':1}
counter= 0
answer = 0
for i in range(len(string)):
    for j in dict:
        if string[i] == j:
            if dict[j] > 0:
                dict[j] = dict[j] - 1
                counter+= 1
                answer+= counter
            # else:
print(answer)
It seems like you're looking for permutations of strings (including the substrings within them) inside another string,
so build the string from the dictionary, generate its permutations, then
count the occurrences of those permutations in the other string. Note that this is probably not the most efficient solution, but it's effective.
Example code:
import itertools
import re
string_to_look_into = 'babdbabcce'
dict= {'a':1,'b':1,'d':1}
permutation_string = ''
for c, n in dict.items():
    permutation_string += c * n
permutations = itertools.permutations(permutation_string)
matches_to_count = set()
for perm in permutations:
    for i in range(1, len(perm)+1):
        matches_to_count.add(''.join(perm[:i]))
sum_dict = {} # to verify matches
sum = 0
for item in matches_to_count:
    count = len(re.findall(item, string_to_look_into))
    sum_dict[item] = count
    sum += count
print(sum)
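The number of permutations grows factorially with the number of letters, and re.findall counts only non-overlapping matches. A direct alternative, sketched under the assumption that a valid substring may contain only letters present in the dict, each at most as often as its value allows, is to extend a window from every start position until a limit is exceeded. The function name count_limited_substrings is hypothetical:

```python
from collections import Counter

def count_limited_substrings(s, limits):
    total = 0
    for start in range(len(s)):
        used = Counter()
        for ch in s[start:]:
            used[ch] += 1
            # Stop as soon as a letter is missing from limits or overused;
            # every longer substring from this start is invalid too.
            if used[ch] > limits.get(ch, 0):
                break
            total += 1
    return total

print(count_limited_substrings('babdbabcce', {'a': 1, 'b': 1, 'd': 1}))  # 15
```

This is quadratic in the worst case, but each start position stops early once a limit is hit.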

Python optimization: find the most frequent sequence of 4 letters inside a randomly generated 1000-letter string

I'm here to ask for help with my program.
I wrote a program whose purpose is to find the most frequent four-letter sequence in a longer, randomly generated string of x letters.
For example, if you want to know the most frequent four-letter sequence in 'abcdeabcdef', it is easy to see that it is 'abcd', so the program will return that.
Unfortunately, my program is very slow: it takes 119.7 seconds to analyze all the possibilities and display the results for a string of only 1000 letters.
This is my program right now:
import random
chars = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
string = ''
for _ in range(1000):
    string += str(chars[random.randint(0, 25)])
print(string)
number = []
for ____ in range(0,26):
    print(____)
    for ___ in range(0,26):
        for __ in range(0, 26):
            for _ in range(0, 26):
                test = chars[____] + chars[___] + chars[__] + chars[_]
                print('trying :',test, end = ' ')
                number.append(0)
                for i in range(len(string) -3):
                    if string[i: i+4] == test:
                        number[len(number) -1] += 1
                print('>> finished')
_max = max(number)
for i in range(len(number)-1):
    if number[i] == _max :
        j, k, l, m = i, 0, 0, 0
        while j > 25:
            j -= 26
            k += 1
        while k > 25:
            k -= 26
            l += 1
        while l > 25:
            l -= 26
            m += 1
        Result = chars[m] + chars[l] + chars[k] + chars[j]
        print(str(Result),'occured',_max, 'times' )
I think there are ways to optimize it, but at my level I really don't know how. Maybe the structure itself is not the best. I hope you can help me :D
You only need to loop through your list once to count the 4-letter sequences. You are currently looping n*n*n*n. You can use zip to make a four letter sequence that collects the 997 substrings, then use Counter to count them:
from collections import Counter
import random
chars = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
s = "".join([chars[random.randint(0, 25)] for _ in range(1000)])
it = zip(s, s[1:], s[2:], s[3:])
counts = Counter(it)
counts.most_common(1)
Edit:
.most_common(x) returns a list of the x most common items. counts.most_common(1) returns a single-item list containing a tuple of the letters and the number of times they occurred, like [(('a', 'b', 'c', 'd'), 2)]. So to get a string, just index into it and join():
''.join(counts.most_common(1)[0][0])
Even with your current approach of iterating through every possible 4-letter combination, you can speed up a lot by keeping a dictionary instead of a list, and testing whether the sequence occurs at all first before trying to count the occurrences:
counts = {}
for a in chars:
    for b in chars:
        for c in chars:
            for d in chars:
                test = a + b + c + d
                print('trying :',test, end = ' ')
                if test in s: # if it occurs at all
                    # then record how often it occurs
                    # (the last window starts at len(s)-4, so the range stops at len(s)-3)
                    counts[test] = sum(1 for i in range(len(s)-3)
                                       if test == s[i:i+4])
The multiple loops can be replaced with itertools.product (not itertools.permutations, which would skip sequences that repeat a letter), though this improves readability rather than performance:
import itertools
length = 4
for sequence in itertools.product(chars, repeat=length):
    test = "".join(sequence)
    if test in s:
        counts[test] = sum(1 for i in range(len(s)-length+1) if test == s[i:i+length])
You can then display the results like this:
_max = max(counts.values())
for k, v in counts.items():
    if v == _max:
        print(k, "occurred", _max, "times")
Provided that the string is shorter than, or about as long as, 26**4 characters, it is much faster still to iterate through the string rather than through every combination:
length = 4
counts = {}
for i in range(len(s) - length + 1):
    sequence = s[i:i+length]
    if sequence in counts:
        counts[sequence] += 1
    else:
        counts[sequence] = 1
This is equivalent to the Counter approach already suggested.
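As a compact sketch of that single-pass idea, the windows can be fed straight into Counter. The helper name most_common_ngrams is hypothetical, and returning every sequence tied for the top count is a choice made here, not something the question asks for:

```python
from collections import Counter

def most_common_ngrams(s, length=4):
    # One pass over the string: the last window starts at len(s) - length.
    counts = Counter(s[i:i + length] for i in range(len(s) - length + 1))
    if not counts:
        return [], 0
    best = max(counts.values())
    # Counter keeps insertion order, so ties come out in order of first appearance.
    return [seq for seq, n in counts.items() if n == best], best

print(most_common_ngrams('abcdeabcdef'))  # (['abcd', 'bcde'], 2)
```

On the question's own example, 'bcde' ties with 'abcd' at two occurrences each, which is why reporting ties can be useful.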

Finding Subarrays of Vowels from a given String

You are given a string S, and you have to find all the amazing substrings of S.
Amazing Substring is one that starts with a vowel (a, e, i, o, u, A, E, I, O, U).
Input
The only argument given is string S.
Output
Return a single integer X mod 10003, here X is number of Amazing Substrings in given string.
Constraints
1 <= length(S) <= 1e6
S can have special characters
Example
Input
ABEC
Output
6
Explanation
Amazing substrings of given string are :
1. A
2. AB
3. ABE
4. ABEC
5. E
6. EC
here number of substrings are 6 and 6 % 10003 = 6.
I have implemented the following algo for the above Problem.
class Solution:
    # @param A : string
    # @return an integer
    def solve(self, A):
        x = ['a', 'e','i','o', 'u', 'A', 'E', 'I', 'O', 'U']
        y = []
        z = len(A)
        for i in A:
            if i in x:
                n = A.index(i)
                m = z
                while m > n:
                    y.append(A[n:m])
                    m -= 1
        if y:
            return len(y)%10003
        else:
            return 0
The above solution works fine for short strings but not for longer ones.
For example,
A = "pGpEusuCSWEaPOJmamlFAnIBgAJGtcJaMPFTLfUfkQKXeymydQsdWCTyEFjFgbSmknAmKYFHopWceEyCSumTyAFwhrLqQXbWnXSn"
The above algorithm outputs 1630 subarrays, but the expected answer is 1244.
Please help me improve it. Thanks for the help.
Focus on the required output: you do not need to find all of those substrings. All you need is the quantity of substrings.
Look again at your short example, ABEC. There are two vowels, A and E.
A is at location 0. There are 4 total substrings, ending there and at each following location.
E is at location 2. There are 2 total substrings, ending there and at each following location.
2+4 => 6
All you need do is to find the position of each vowel, subtract from the string length, and accumulate those differences:
A = "pGpEusuCSWEaPOJmamlFAnIBgAJGtcJaMPFTLfUfkQKXeymydQsdWCTyEFjFgbSmknAmKYFHopWceEyCSumTyAFwhrLqQXbWnXSn"
lenA = len(A)
vowel = "aeiouAEIOU"
count = 0
for idx, char in enumerate(A):
    if char in vowel:
        count += lenA - idx
print(count%10003)
Output:
1244
In a single command:
print( sum(len(A) - idx if char.lower() in "aeiou" else 0
           for idx, char in enumerate(A)) % 10003 )
When you hit a vowel in a string, all sub-strings that start with this vowel are 'amazing' so you can just count them:
def solve(A):
    x = ['a', 'e','i','o', 'u', 'A', 'E', 'I', 'O', 'U']
    ans = 0
    for i in range(len(A)):
        if A[i] in x:
            ans = (ans + len(A)-i)%10003
    return ans
When you are looking for the index of the element n = A.index(i), you get the index of the first occurrence of the element. By using enumerate you can loop through indices and elements simultaneously.
def solve(A):
    x = ['a', 'e','i','o', 'u', 'A', 'E', 'I', 'O', 'U']
    y = []
    z = len(A)
    for n,i in enumerate(A):
        if i in x:
            m = z
            while m > n:
                y.append(A[n:m])
                m -= 1
    if y:
        return len(y)%10003
    else:
        return 0
A more general solution is to find all amazing substrings and then count them :
string = "pGpEusuCSWEaPOJmamlFAnIBgAJGtcJaMPFTLfUfkQKXeymydQsdWCTyEFjFgbSmknAmKYFHopWceEyCSumTyAFwhrLqQXbWnXSn"
amazing_substring_start = ['a','e','i','o','u','A','E','I','O','U']
amazing_substrings = []
for i in range(len(string)):
    if string[i] in amazing_substring_start:
        for j in range(len(string[i:])+1):
            amazing_substring = string[i:i+j]
            if amazing_substring!='':
                amazing_substrings += [amazing_substring]
print(amazing_substrings, len(amazing_substrings)%10003)
Create a loop that adds up the number of amazing substrings contributed by each vowel:
def Solve(A):
    sumn = 0
    for i in range(len(A)):
        if A[i] in "aeiouAEIOU":
            sumn += len(A[i:])
    return sumn%10003

Code to output the first repeated character in given string?

I'm trying to find the first repeated character in my string and output that character using Python. When checking my code, I can see that I'm not indexing the last character of my string.
What am I doing wrong?
letters = 'acbdc'
for a in range (0,len(letters)-1):
    #print(letters[a])
    for b in range(0, len(letters)-1):
        #print(letters[b])
        if (letters[a]==letters[b]) and (a!=b):
            print(b)
            b=b+1
a=a+1
You can do this in an easier way:
letters = 'acbdc'
found_dict = {}
for i in letters:
    if i in found_dict:
        print(i)
        break
    else:
        found_dict[i]= 1
Output:
c
Here's a solution with sets, it should be slightly faster than using dicts.
letters = 'acbdc'
seen = set()
for letter in letters:
    if letter in seen:
        print(letter)
        break
    else:
        seen.add(letter)
Here is a solution that would stop iteration as soon as it finds a dup
>>> from itertools import dropwhile
>>> s=set(); next(dropwhile(lambda c: not (c in s or s.add(c)), letters))
'c'
You should use range(0, len(letters)) instead of range(0, len(letters) - 1) because range already stops counting at one less than the designated stop value. Subtracting 1 from the stop value simply makes you skip the last character of letters in this case.
Please read the documentation of range:
https://docs.python.org/3/library/stdtypes.html#range
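A quick check of what the two ranges produce for the question's string, purely as an illustration of the off-by-one described above:

```python
letters = 'acbdc'

# range() excludes its stop value, so len(letters) already covers the last index.
full = list(range(0, len(letters)))
short = list(range(0, len(letters) - 1))

print(full)   # [0, 1, 2, 3, 4]
print(short)  # [0, 1, 2, 3], index 4 (the second 'c') is never visited
```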
There were a few issues with your code:
1. Remove -1 from len(letters).
2. Move back one indent and do b = b + 1 even if you don't go into the if statement.
3. Indent and do a = a + 1 in the first for loop.
See below for how to fix your code:
letters = 'acbdc'
for a in range(0, len(letters)):
    # print(letters[a])
    for b in range(0, len(letters)):
        # print(letters[b])
        if (letters[a] == letters[b]) and (a != b):
            print(b)
        b = b + 1
    a = a + 1
Nice one-liner generator:
l = 'acbdc'
next(e for e in l if l.count(e)>1)
Or following the rules in the comments to fit the "abba" case:
l = 'acbdc'
next(e for c,e in enumerate(l) if l[:c+1].count(e)>1)
If complexity is not an issue then this will work fine.
letters = 'acbdc'
found = False
for i in range(0, len(letters)-1):
    for j in range(i+1, len(letters)):
        if (letters[i] == letters[j]):
            print (letters[j])
            found = True
            break
    if (found):
        break
The below code prints the first repeated character in a string. I used the functionality of the list to solve this problem.
def findChar(inputString):
    list = []
    for c in inputString:
        if c in list:
            return c
        else:
            list.append(c)
    return 'None'
print (findChar('gotgogle'))
Working fine as well. It gives the result as 'g'.
def first_repeated_char(str1):
    for index,c in enumerate(str1):
        if str1[:index+1].count(c) > 1:
            return c
    return "None"
print(first_repeated_char("abcdabcd"))
str_24 = input("Enter the string:")
for i in range(0,len(str_24)):
    first_repeated_count = str_24.count(str_24[i])
    if(first_repeated_count > 1):
        break
# the first placeholder is the count, the second is the character itself
print("Repeat count is: {} and first repeated char is: {}".format(first_repeated_count,str_24[i]))
