count substrings of a string with limitation

I have a string and a dictionary. I need to count the substrings of the given string whose letters (and the number of times each letter occurs) do not exceed what the dict allows. By hand I counted 15 substrings (2 a + 4 b + 1 d + 2 ba + 2 ab + bd + db + abd + dba), but I cannot write the program. My attempt below needs to be improved (I hope it only requires an ELSE condition).
string = 'babdbabcce'
dict = {'a':1,'b':1,'d':1}
counter = 0
answer = 0
for i in range(len(string)):
    for j in dict:
        if string[i] == j:
            if dict[j] > 0:
                dict[j] = dict[j] - 1
                counter += 1
                answer += counter
            # else:
print(answer)

It seems like you're looking for permutations of strings (including substrings within them) within another string,
so build a string from the dictionary, generate its permutations, then
count every prefix of each permutation in the other string. Note that this is probably not the most efficient solution, but it's effective.
Example code:
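import itertools
import re
string_to_look_into = 'babdbabcce'
dict = {'a':1,'b':1,'d':1}
# build one string containing every allowed letter the allowed number of times
permutation_string = ''
for c, n in dict.items():
    permutation_string += c * n
permutations = itertools.permutations(permutation_string)
# collect every prefix of every permutation as a candidate substring
matches_to_count = set()
for perm in permutations:
    for i in range(1, len(perm)+1):
        matches_to_count.add(''.join(perm[:i]))
sum_dict = {} # to verify matches
total = 0 # named total to avoid shadowing the built-in sum
for item in matches_to_count:
    # re.findall counts non-overlapping matches only
    count = len(re.findall(re.escape(item), string_to_look_into))
    sum_dict[item] = count
    total += count
print(total)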
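
For comparison, the same count can be obtained without enumerating permutations by using a sliding window. This is a minimal sketch, not part of the original answer: the function name count_limited_substrings and its parameter names are made up for illustration, and it assumes that letters missing from the dictionary have a limit of 0. For every right end it shrinks the window from the left until no letter exceeds its limit; every substring that ends at that position and starts inside the window is then valid.

```python
from collections import Counter

def count_limited_substrings(text, limits):
    """Count substrings in which no letter occurs more often than limits allows.

    Assumption: a letter that is absent from limits has a limit of 0.
    """
    window = Counter()   # letter counts inside text[left:right + 1]
    left = 0
    total = 0
    for right, ch in enumerate(text):
        window[ch] += 1
        # shrink from the left until ch no longer exceeds its limit
        while window[ch] > limits.get(ch, 0):
            window[text[left]] -= 1
            left += 1
        # every start index in [left, right] gives a valid substring
        total += right - left + 1
    return total

print(count_limited_substrings('babdbabcce', {'a': 1, 'b': 1, 'd': 1}))  # 15
```

Each character enters and leaves the window at most once, so the work grows linearly with the length of the string, and overlapping matches are counted as well.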

Related

How would I count all pairs in a list when they are all the same?

I am trying to count all pairs of numbers in a list. A pair is just two numbers that are the same. My current code looks like this.
def pairs(lst):
    lst.sort()
    count = 0
    for x in range(len(lst)):
        if x+1 < len(lst):
            if lst[x] == lst[x+1]:
                count +=1
    return count
pairs([1, 1, 1, 1, 1])
What do I need to change to be able to have it count each pair of 1's?

The function gives the wrong value because it takes each item in the list and checks whether the next value matches it, so every non-endpoint value takes part in two comparisons and is double counted. Looping with nested conditional statements is also inefficient. It may be better to think of the problem as summing, over each distinct item in the list, the count of that item integer-divided by 2 (rounded up if incomplete pairs should count as well).
Try this:
Include incomplete pairs
import math
def count_pairs(a_list):
    counter = 0
    for x in set(a_list):
        # round up, so a leftover single item counts as a pair
        counter += math.ceil(a_list.count(x)/2)
    print(counter)
Include only complete pairs
import math
def count_pairs(a_list):
    counter = 0
    for x in set(a_list):
        # round down, so only complete pairs are counted
        counter += math.floor(a_list.count(x)/2)
    print(counter)
Example:
lst=[1,1,1,1,1,2,2,2,2,2,3,3,3,4,4,5,6,5]
count_pairs(lst)
Output 1
11
Output 2
7

You can try this approach:
values = [1,1,1,1,1,1,2,2,3,3,4]
values.sort()
# remove last element if len(values) is odd
if len(values) % 2 != 0:
    values.pop()
c = 0
# create an `iter` object to simplify comparisons
it = iter(values)
for x1 in it:
    x2 = next(it)
    if x1 == x2:
        c += 1
print(c)
It wasn't clear to me whether you only want to count pairs of "1". If that is the case, add a check for x1 or x2 being greater than 1 and break out of the loop. Note that this compares elements in fixed positions (0 and 1, 2 and 3, ...), so it can miss a pair that straddles those positions, for example in [1, 2, 2].

Code
def count_pairs(lst):
    ' Using generator with Walrus operator '
    return sum(cnt*(cnt-1)//2 for element in set(lst) if (cnt:=lst.count(element)))
Test
print(count_pairs([1, 1, 1, 1, 1])) # Output: 10
print(count_pairs([1,1,1,1,1,2,2,2,2,2,3,3,3,4,4,5,6,5])) # Output: 25
Explanation
The number of pairs of a number in the list is found by:
counting the frequency of the number in the list
counting its combinations taking 2 at a time (i.e. for frequency k, combinations = k*(k-1)//2)
We sum the pairs count for each unique number in the list (i.e. set(lst))
For clarity, the oneliner solution can be expanded to the following.
def count_pairs(lst):
    cnt = 0
    for element in set(lst):
        frequency = lst.count(element) # frequency of element
        cnt += frequency * (frequency - 1) //2 # accumulate count of pairs of element
    return cnt
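
Because lst.count is called once per distinct element, the versions above rescan the list repeatedly. As a sketch of an alternative (the name count_pairs_counter is hypothetical and not from the answer), collections.Counter tallies all frequencies in a single pass, and the same k*(k-1)//2 formula is then applied to each frequency:

```python
from collections import Counter

def count_pairs_counter(values):
    """Count unordered pairs of equal items: k*(k-1)//2 for each frequency k."""
    return sum(k * (k - 1) // 2 for k in Counter(values).values())

print(count_pairs_counter([1, 1, 1, 1, 1]))  # 10
```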

count character occurrences in string

I want to find out how often "reindeer" (with its letters in any order) occurs in a random string, and what string is left over after the "reindeer" letters are removed. I need to preserve the order of the left-over string.
For example:
"erindAeer" -> A (reindeer occurs 1 time)
"ierndeBeCrerindAeer" -> BCA (reindeer occurs 2 times)
I thought of sorting and removing "reindeer", but I need to preserve the order. What's a good way to do this?

We can replace those letters after knowing how many times they repeat, and Counter is convenient for counting elements.
from collections import Counter
def leftover(letter_set, string):
    lcount, scount = Counter(letter_set), Counter(string)
    # number of complete copies of letter_set that string can supply
    repeat = min(scount[l] // lcount[l] for l in lcount)
    for l in lcount:
        # remove the first lcount[l] * repeat occurrences of each letter
        string = string.replace(l, "", lcount[l] * repeat)
    return f"{repeat} {letter_set}, left over is {string}"
print(leftover("reindeer", "ierndeBeCrerindAeer"))
print(leftover("reindeer", "ierndeBeCrerindAeere"))
print(leftover("reindeer", "ierndeBeCrerindAee"))
Output:
2 reindeer, left over is BCA
2 reindeer, left over is BCAe
1 reindeer, left over is BCerindAee

Here is a rather simple approach using collections.Counter:
from collections import Counter
def purge(pattern, string):
    scount, pcount = Counter(string), Counter(pattern)
    cnt = min(scount[x] // pcount[x] for x in pcount)
    # after this, scount holds how many of each character are left over
    scount.subtract(pattern * cnt)
    return cnt, "".join(scount.subtract(c) or c for c in string if scount[c])
>>> purge("reindeer", "ierndeBeCrerindAeer")
(2, 'BCA')

Here is the code in Python:
def find_reindeers(s):
    # count how often each letter occurs in "reindeer"
    rmap = {}
    for x in "reindeer":
        if x not in rmap:
            rmap[x] = 0
        rmap[x] += 1
    # count how often each of those letters occurs in s
    hmap = {key: 0 for key in "reindeer"}
    for x in s:
        if x in "reindeer":
            hmap[x] += 1
    total_occ = min([hmap[x]//rmap[x] for x in "reindeer"])
    # number of occurrences of each letter that still has to be removed
    to_remove = {x: total_occ * rmap[x] for x in rmap}
    left_over = ""
    for x in s:
        if x in to_remove and to_remove[x] > 0:
            to_remove[x] -= 1
        else:
            left_over += x
    return total_occ, left_over
print(find_reindeers("ierndeBeCrerindAeer"))
Output for ierndeBeCrerindAeer:
(2, 'BCA')

You can do it by using a queue:
import queue
word = "reindeer"
given_string = "ierndeBeCrerindAeer"
new_string = ""
counter = 0
tmp = ""
letters = queue.Queue()
for i in given_string:
    if not i in word:
        new_string += i
    else:
        letters.put(i)
x = 0
while x < len(word):
    while not letters.empty():
        j = letters.get()
        if j == word[x]:
            tmp += j
            # print(tmp)
            break
        else:
            letters.put(j)
    x = x +1
    if tmp == word:
        counter += 1
        tmp = ""
        x = 0
print(f"The word {word} occurs {counter} times in the string {given_string}.")
print("The left over word is",new_string)
Output will be:
The word reindeer occurs 2 times in the string ierndeBeCrerindAeer.
The left over word is BCA
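A queue is convenient here because letters that have already been matched are consumed and never revisited. Note that this version assumes the letters of the word occur in exact multiples: if a needed letter is missing while other letters remain queued, the inner loop never terminates.
Hope this answers your question, thank you!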

Python Optimization: Find the most frequent sequence of 4 letters inside a randomly generated 1000-letter string

I'm here to ask for help with my program.
I wrote a program whose purpose is to find the most frequent four-letter substring in a longer string of x letters that has been generated randomly.
For example, if you want to know the most frequent sequence of four letters in 'abcdeabcdef', it's pretty easy to see that it is 'abcd', so the program will return this.
Unfortunately, my program is very slow: it takes 119.7 seconds to analyze all possibilities and display the results for a string of only 1000 letters.
This is my program right now:
import random
chars = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
string = ''
for _ in range(1000):
    string += str(chars[random.randint(0, 25)])
print(string)
number = []
for ____ in range(0,26):
    print(____)
    for ___ in range(0,26):
        for __ in range(0, 26):
            for _ in range(0, 26):
                test = chars[____] + chars[___] + chars[__] + chars[_]
                print('trying :',test, end = ' ')
                number.append(0)
                for i in range(len(string) -3):
                    if string[i: i+4] == test:
                        number[len(number) -1] += 1
                print('>> finished')
_max = max(number)
for i in range(len(number)-1):
    if number[i] == _max :
        j, k, l, m = i, 0, 0, 0
        while j > 25:
            j -= 26
            k += 1
        while k > 25:
            k -= 26
            l += 1
        while l > 25:
            l -= 26
            m += 1
        Result = chars[m] + chars[l] + chars[k] + chars[j]
        print(str(Result),'occured',_max, 'times' )
I think there are ways to optimize it, but at my level I really don't know how. Maybe the structure itself is not the best. I hope you can help me :D

You only need to loop through your string once to count the 4-letter sequences. You are currently scanning the whole string for each of the 26*26*26*26 candidate sequences. You can use zip to build the 997 four-letter windows of the string, then use Counter to count them:
from collections import Counter
import random
chars = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
s = "".join([chars[random.randint(0, 25)] for _ in range(1000)])
it = zip(s, s[1:], s[2:], s[3:])
counts = Counter(it)
counts.most_common(1)
Edit:
.most_common(x) returns a list of the x most common items together with their counts. counts.most_common(1) returns a single-item list containing the tuple of letters and the number of times it occurred, like [(('a', 'b', 'c', 'd'), 2)]. So to get a string, just index into it and join():
''.join(counts.most_common(1)[0][0])

Even with your current approach of iterating through every possible 4-letter combination, you can speed up a lot by keeping a dictionary instead of a list, and testing whether the sequence occurs at all first before trying to count the occurrences:
counts = {}
for a in chars:
    for b in chars:
        for c in chars:
            for d in chars:
                test = a + b + c + d
                print('trying :',test, end = ' ')
                if test in s: # if it occurs at all
                    # then record how often it occurs
                    # (len(s)-3 start positions, so the last window is included)
                    counts[test] = sum(1 for i in range(len(s)-3)
                                       if test == s[i:i+4])
The multiple loops can be replaced with itertools.product, which, unlike itertools.permutations, also yields sequences with repeated letters such as 'aabb'; this improves readability rather than performance:
import itertools
length = 4
for sequence in itertools.product(chars, repeat=length):
    test = "".join(sequence)
    if test in s:
        counts[test] = sum(1 for i in range(len(s)-length+1) if test == s[i:i+length])
You can then display the results like this:
_max = max(counts.values())
for k, v in counts.items():
    if v == _max:
        print(k, "occurred", _max, "times")
Provided that the string is shorter than or around the same length as 26**4 characters, it is much faster still to iterate through the string rather than through every combination:
length = 4
counts = {}
# len(s) - length + 1 start positions, so the last window is included
for i in range(len(s) - length + 1):
    sequence = s[i:i+length]
    if sequence in counts:
        counts[sequence] += 1
    else:
        counts[sequence] = 1
This is equivalent to the Counter approach already suggested.
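
The counting loop above can be wrapped in a small reusable function. The sketch below is an illustration rather than part of the answer: the name most_common_ngrams and its parameters are hypothetical. It counts every window of length n, including the last one (hence len(text) - n + 1), and returns all sequences tied for the highest count.

```python
from collections import Counter

def most_common_ngrams(text, n=4):
    """Return (sorted list of the most frequent n-letter windows, their count)."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    if not counts:          # text is shorter than n
        return [], 0
    best = max(counts.values())
    return sorted(seq for seq, c in counts.items() if c == best), best

print(most_common_ngrams('abcdeabcdef'))  # (['abcd', 'bcde'], 2)
```

On the question's example string, 'bcde' occurs as often as 'abcd', which is why the sketch reports ties instead of a single winner.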

Alternate letters in a string - code not working

I am trying to make a string alternate between upper and lower case letters. My current code is this:
def skyline (str1):
    result = ''
    index = 0
    for i in str1:
        result += str1[index].upper() + str1[index + 1].lower()
        index += 2
    return result
When I run the above code I get an error saying String index out of range. How can I fix this?

One way is to use join with enumerate:
s = 'asdfghjkl'
''.join(v.upper() if i%2==0 else v.lower() for i, v in enumerate(s))
#'AsDfGhJkL'

This is the way I would rewrite your logic:
from itertools import islice, zip_longest
def skyline(str1):
    result = ''
    for i, j in zip_longest(str1[::2], islice(str1, 1, None, 2), fillvalue=''):
        result += i.upper() + j.lower()
    return result
res = skyline('hello')
# 'HeLlO'
Explanation
Use itertools.zip_longest to iterate chunks of your string.
Use itertools.islice to extract every second character without building a separate string.
Now just iterate through your zipped iterable and append as before.

Try for i in range(len(str1)): and replace index with i in the code. Then, inside the loop, you could do
if i % 2 == 0: result += str1[i].upper()
else: result += str1[i].lower()

For every character in your input string, you are incrementing the index by 2. That's why you are going out of bounds.
Bound the loop by the length of the string instead.

You do not check whether your index is still within the bounds of the string.
You need a condition that verifies that i + 1 is still smaller than the length of the string before reading str1[i + 1]. With i % 2 == 0 (which also covers i == 0, so the first character is uppercased) the upper/lower pair is applied at every second position.
for i, __ in enumerate(str1):
    if i % 2 == 0:
        result += str1[i].upper()
        if i + 1 < len(str1):
            result += str1[i + 1].lower()

I tried to modify your code as little as possible so that you can follow the change. I just replaced the loop with a for loop with step 2 so that you don't end up with an index out of range, and I handled the final character of an odd-length string separately.
def skyline (str1):
    result = ''
    length = len(str1)
    for index in range(0, length - 1, 2):
        result += str1[index].upper() + str1[index + 1].lower()
    if length % 2 == 1:
        result += str1[length - 1].upper()
    return result

You can use the following code:
def myfunc(str1):
    result=''
    for i in range(0,len(str1)):
        if i % 2 == 0:
            result += str1[i].upper()
        else:
            result += str1[i].lower()
    return result

In your code you take two characters at a time, so the loop should run only half as many times as the string has characters. Make a variable such as peak, set it to len(your input string), then peak = int(peak/2); it will solve your problem. Note that for an odd-length input the final character is left out.
def func(name):
    counter1 = 0
    counter2 = 1
    string = ''
    peak = len(name)
    peak = int(peak/2)
    for letter in range(1,peak+1):
        string += name[counter1].lower() + name[counter2].upper()
        counter1 +=2
        counter2 +=2
    return string

Return the number of words over min length and Replace ones that arent with word + spaces

For any name whose length is less than min_length, replace that item of the list with a new string containing the original name with space(s) added to the right-hand side to reach the minimum length.
Example: with min_length = 5, 'dog' becomes 'dog  ' after the change.
Also return the number of names in the list that were originally longer than min_length.
def pad_names(list_of_names, min_length):
    '''(list of str, int) -> int
    Return the number of names that are longer than the minimum length.
    >>> pad_names(list_of_names, 5)
    2
    '''
    new_list = []
    counter = 0
    for word in list_of_names:
        if len(word) > min_length:
            counter = counter + 1
    for word in list_of_names:
        if len(word) < min_length:
            new_list.append(word.ljust(min_length))
        else:
            new_list.append(word)
    return(counter)
I am currently getting 5 errors on this function; all of them are reported as "incorrect return value":
test_02: pad_names_one_name_much_padding_changes_list
Cause: The list is not changed correctly for the case of 1 name needing padding
test_04: pad_names_one_name_needs_one_pad_changes_list
Cause: one name needs one pad changes list
test_10: pad_names_all_padded_by_3_changes_list
Cause: all padded by 3 changes list
test_12: pad_names_all_padded_different_amounts_changes_list
Cause: all padded different amounts changes list
test_20: pad_names_one_empty_name_list_changed
Cause: one empty name list changed
This function does not need to be efficient; it just needs to pass those tests without creating more problems.

Keeping in spirit with how you've written this, I'm guessing you want to be modifying the list of names so the result can be checked (as well as returning the count)...
def test(names, minlen):
    counter = 0
    for idx, name in enumerate(names):
        newname = name.ljust(minlen)
        if newname != name:
            names[idx] = newname
        else:
            counter += 1
    return counter

Not sure I completely understand your question, but here is a fairly pythonic way to do what I think you want:
data = ['cat', 'snake', 'zebra']
# pad all strings with spaces to 5 chars minimum
# uses string.ljust for padding and a list comprehension
padded_data = [d.ljust( 5, ' ' ) for d in data]
# get the count of original elements whose length was 5 or greater
ct = sum( len(d) >= 5 for d in data )

You mean this?
def pad_names(list_of_names, min_length):
    counter = 0
    for i, val in enumerate(list_of_names):
        if len(val) >= min_length:
            counter += 1
        elif len(val) < min_length:
            list_of_names[i] = val.ljust(min_length)
    return counter
OR:
def pad_names(words, min_length):
    i, counter = 0, 0
    for val in words:
        if len(val) >= min_length:
            counter += 1
        elif len(val) < min_length:
            words[i] = val.ljust(min_length)
        i += 1
    return counter

str.ljust() is a quick and simple way to pad strings to a minimum length.
Here's how I would write the function:
def pad_names(list_of_names, min_length):
    # count strings that exceed min_length
    count = sum(1 for s in list_of_names if len(s) > min_length)
    # replace the list's contents in place, using ljust() to pad strings
    list_of_names[:] = [s.ljust(min_length) for s in list_of_names]
    return count # return number of strings that exceed min_length
While succinct, that may not be the most efficient approach for large datasets since we're essentially creating a new list and copying it over the old one.
If performance is an issue, I'd go with a loop and only update the relevant entries.
def pad_names(list_of_names, min_length):
    count = 0
    for i, s in enumerate(list_of_names):
        if len(s) > min_length: # should it be >= ??
            count += 1 # increment counter
        else:
            list_of_names[i] = s.ljust(min_length) # update original list
    return count
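
The question's version builds new_list but never stores it back into list_of_names, so the caller's list stays unchanged, which would be consistent with the failing "changes_list" tests. The sketch below contrasts the two behaviours; the function names pad_copy and pad_in_place are made up for illustration, and it assumes (as the docstring in the question suggests) that only names strictly longer than min_length are counted.

```python
def pad_copy(names, min_length):
    """Builds a padded copy but never stores it back (like the question's code)."""
    new_list = [s.ljust(min_length) for s in names]  # discarded on return
    return sum(1 for s in names if len(s) > min_length)

def pad_in_place(names, min_length):
    """Pads short names inside the caller's list via slice assignment."""
    count = sum(1 for s in names if len(s) > min_length)
    names[:] = [s.ljust(min_length) for s in names]
    return count

a = ['dog', 'giraffe']
b = ['dog', 'giraffe']
pad_copy(a, 5)
pad_in_place(b, 5)
print(a)  # ['dog', 'giraffe']
print(b)  # ['dog  ', 'giraffe']
```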
