I have a string, for example: string1 = 'abcdbcabdcabb'.
And I have another string, for example: string2 = 'cab'
I need to count all occurrences of permutations of string2 in string1.
Currently I'm adding all permutations of string2 to a list,
and then iterating through string1 by index, taking the substring of length len(string2) at each index and checking
whether that substring of string1 is in the list of permutations.
I'm sure there is a better-optimized way to do it.
In my opinion you do not need DP, just a sliding-window technique. A permutation of string2 is a string that has exactly the same length and the same distribution of characters. In your example, a permutation of string2 is a string of length 3 with this distribution of characters: {a:1,b:1,c:1}.
So you can write a script that considers a window of size N (the size of string2), starting from the beginning of string1 (index=0). If the current window has exactly the same distribution of characters, you count it as a permutation; if not, you do not count it. Either way, you then move on to index+1.
A trick to avoid recalculating the character distribution for each window: keep a dictionary of character counts, fill it for the very first window, and then, each time you slide the window one position to the right, decrease the count of the character that leaves the window by one and increase the count of the character that enters it by one.
The code should be something like this; you should verify it for edge cases:
def get_permut(string1, string2):
    N = len(string2)
    M = len(string1)
    if M < N:
        return 0
    valid_dist = dict()
    for ch in string2:
        valid_dist.setdefault(ch, 0)
        valid_dist[ch] += 1
    current_dist = dict()
    for ch in string1[:N]:
        current_dist.setdefault(ch, 0)
        current_dist[ch] += 1
    ct = 0
    if current_dist == valid_dist:  # check the very first window
        ct += 1
    for i in range(M - N):
        # slide the window: string1[i] leaves, string1[i + N] enters
        current_dist.setdefault(string1[i + N], 0)
        current_dist[string1[i + N]] += 1
        current_dist[string1[i]] -= 1
        if current_dist[string1[i]] == 0:
            del current_dist[string1[i]]
        if current_dist == valid_dist:
            ct += 1
    return ct
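The same sliding-window idea can be sketched more compactly with collections.Counter. This is only an illustrative variant, not the answer's exact code: the function name count_permutations is made up here, and treating an empty pattern as 0 matches is an assumption.

```python
from collections import Counter


def count_permutations(text, pattern):
    """Count windows of `text` that are permutations of `pattern`."""
    n, m = len(pattern), len(text)
    if n == 0 or m < n:  # assumption: an empty pattern counts as 0 matches
        return 0
    target = Counter(pattern)
    window = Counter(text[:n])
    count = int(window == target)
    for i in range(n, m):
        window[text[i]] += 1        # character entering the window
        leaving = text[i - n]
        window[leaving] -= 1        # character leaving the window
        if window[leaving] == 0:
            del window[leaving]     # keep zero counts out of the comparison
        count += window == target
    return count


print(count_permutations('abcdbcabdcabb', 'cab'))  # 4
```

Each slide costs a constant number of dictionary updates plus one comparison of the two counters, so nothing is recounted from scratch.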
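You can use the str.count() method here. One way to do it:
import itertools
string1 = 'abcdbcabdcabb'
string2 = 'cab'
# a set drops duplicate permutations when string2 has repeated letters
perms = {''.join(p) for p in itertools.permutations(string2)}
res = 0
for p in perms:
    res += string1.count(p)  # str.count counts non-overlapping matches
print(res)
# 4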
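Two caveats apply to this approach: itertools.permutations yields duplicate strings when string2 has repeated letters (unless they are deduplicated), and str.count only counts non-overlapping matches. Comparing sorted windows sidesteps both, at the cost of sorting every window. A minimal sketch, assuming overlapping matches should be counted; the function name is hypothetical:

```python
def count_permutations_sorted(text, pattern):
    n = len(pattern)
    if n == 0:  # assumption: an empty pattern counts as 0 matches
        return 0
    target = sorted(pattern)
    # compare the sorted characters of every window of length n
    return sum(sorted(text[i:i + n]) == target
               for i in range(len(text) - n + 1))


print(count_permutations_sorted('abcdbcabdcabb', 'cab'))  # 4
print(count_permutations_sorted('aaa', 'aa'))             # 2
```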
You can use regex for that.
import re
from itertools import permutations
def lst_permutation(text):
    lst_p = []
    for i in permutations(text):
        lst_p.append(''.join(i))
    return lst_p
def total_permutation(string1, string2):
    total = 0
    for perm in lst_permutation(string2):
        # search for each permutation, not for string2 itself
        res = re.findall(re.escape(perm), string1)
        total += len(res)
    return total
string1 = 'abcdbcabdcabb'
string2 = 'cab'
print(total_permutation(string1, string2))
# 4
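Here's a dumb way to do it with a regex (don't actually do this).
Use a non-capturing group that contains one alternative per letter of the search text, each followed by an empty capturing group, and then require a backreference to every one of those groups, so that each letter must have been matched: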
import re
string1 = 'abcdbcabdcabb'
string2 = r'(?:c()|a()|b()){3}\1\2\3'
pos = 0
r = re.compile(string2)
while m := r.search(string1, pos=pos):
    print(m.group())
    pos = m.start() + 1
abc
bca
cab
cab
You can also generate the pattern dynamically:
import re
string1 = 'abcdbcabdcabb'
string2 = 'cab'
before = "|".join([f"{re.escape(l)}()" for l in string2])
matches = "".join([f"\\{i + 1}" for i in range(len(string2))])
r = re.compile(f"(?:{before}){{{len(string2)}}}{matches}")
pos = 0
while m := r.search(string1, pos=pos):
    print(m.group())
    pos = m.start() + 1
Below is my code:
def count_substring(string, sub_string):
    counter = 0
    for x in range(0,len(string)):
        if string[x]+string[x+1]+string[x+2] == sub_string:
            counter +=1
    return counter
When I run the code it throws an error - "IndexError: string index out of range"
Please help me in understanding what is wrong with my code and also with the solution.
I am a beginner in Python. Please explain this to me like I am 5.
Can't you simply use str.count for non-overlapping matches?
str.count(substring[, start_index[, end_index]])
full_str = 'Test for substring, check for word check'
sub_str = 'check'
print(full_str.count(sub_str))
Returns 2
If you have overlapping matches of your substring you could try re.findall with a positive lookahead:
import re
full_str = 'bobob'
sub_str = 'bob'
print(len(re.findall('(?='+sub_str+')',full_str)))
If you have the third-party regex module installed and you want to count overlapping matches, you can use the overlapped parameter of regex.findall and set it to True:
import regex as re
full_str = 'bobob'
sub_str = 'bob'
print(len(re.findall(sub_str, full_str, overlapped=True)))
Both options will return: 2
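If the substring may contain regex metacharacters such as + or ., the lookahead pattern is safer when built with re.escape. A minimal sketch, with count_overlapping as a made-up helper name:

```python
import re


def count_overlapping(text, sub):
    # re.escape makes `sub` match literally inside the lookahead
    return len(re.findall('(?=' + re.escape(sub) + ')', text))


print(count_overlapping('bobob', 'bob'))  # 2
print(count_overlapping('a+a+a', 'a+a'))  # 2
```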
Couldn't you just use count? It uses way less code. See JvdV's answer. Also, by the way, this is how I would do it:
def count_substring(string, substring):
    print(string.count(substring))
This simplifies code by a lot and also you could just get rid of the function entirely and do this:
print(string.count(substring)) # by the way you have to define string and substring first
If you want to include overlapping strings, then do this:
def count(string, substring):
    string_size = len(string)
    substring_size = len(substring)
    count = 0
    # range replaces Python 2's xrange
    for i in range(0, string_size - substring_size + 1):
        if string[i:i + substring_size] == substring:
            count += 1
    return count
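As for the IndexError in the question: range(0, len(string)) lets x reach the last index, so string[x+1] and string[x+2] read past the end of the string. Stopping the loop at len(string) - len(sub_string), as the slicing loop above does, avoids that. Here is a sketch of the question's function with that bound, using str.startswith with a start offset. It is one possible fix, not the only one, and the sample string is just an illustration.

```python
def count_substring(string, sub_string):
    counter = 0
    # the last valid start index is len(string) - len(sub_string)
    for x in range(len(string) - len(sub_string) + 1):
        if string.startswith(sub_string, x):
            counter += 1
    return counter


print(count_substring('ABCDCDC', 'CDC'))  # 2
```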
Strings have a built-in count method for this purpose.
string = 'This is the way to do it.'
string.count('is')
Output: 2
I'm trying to count how many times any character repeats in a word. The repetitions must be sequential.
For example, the method with input "loooooveee" should return 6 ('o' is repeated 4 times, 'e' is repeated 2 times).
I'm trying to implement this with string-level functions, and I can do it that way, but is there an easier way? Regex, or something else?
Original question: order of repetition does not matter
You can subtract the number of unique letters from the total number of letters. set applied to a string returns the collection of its unique letters.
x = "loooooveee"
res = len(x) - len(set(x)) # 6
Or you can use collections.Counter, subtract 1 from each value, then sum:
from collections import Counter
c = Counter("loooooveee")
res = sum(i-1 for i in c.values()) # 6
New question: repetitions must be sequential
You can use itertools.groupby to group sequential identical characters:
from itertools import groupby
g = groupby("aooooaooaoo")
res = sum(sum(1 for _ in j) - 1 for i, j in g) # 5
To avoid the nested sum calls, you can use itertools.islice:
from itertools import groupby, islice
g = groupby("aooooaooaoo")
res = sum(1 for _, j in g for _ in islice(j, 1, None)) # 5
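To see which characters contribute to the total, the same groupby idea can return a per-character breakdown. This is a sketch only; sequential_repeats is a hypothetical helper name.

```python
from collections import Counter
from itertools import groupby


def sequential_repeats(text):
    """Map each character to its number of sequential repeats (run length - 1)."""
    repeats = Counter()
    for char, run in groupby(text):
        extra = sum(1 for _ in run) - 1
        if extra:
            repeats[char] += extra
    return repeats


r = sequential_repeats("loooooveee")
print(dict(r), sum(r.values()))  # {'o': 4, 'e': 2} 6
print(sum(sequential_repeats("aooooaooaoo").values()))  # 5
```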
You could use a regular expression if you want:
import re
rx = re.compile(r'(\w)\1+')
repeating = sum(x[1] - x[0] - 1
                for m in rx.finditer("loooooveee")
                for x in [m.span()])
print(repeating)
This correctly yields 6 and makes use of the .span() function.
The expression is
(\w)\1+
which captures a word character (a-zA-Z0-9_, plus other Unicode word characters by default in Python 3) and then matches as many further repetitions of it as possible.
See a demo on regex101.com for the repeating pattern.
If you want to match any character (that is, not only word characters), change your expression to:
(.)\1+
See another demo on regex101.com.
Try this:
word = input('something:')
total = 0
chars = set(word)  # get the set of unique characters
for item in chars:  # iterate over the set and output the count for each repeated item
    if word.count(item) > 1:
        total += word.count(item)
        print('{}|{}'.format(item, word.count(item)))
print('Total:' + str(total))
EDIT:
added total count of repetitions
Since it doesn't matter where the repetition occurs or which characters are repeated, you can make use of the set data structure provided in Python. It discards duplicate occurrences of any character or object.
Therefore, the solution would look something like this:
def measure_normalized_emphasis(text):
    return len(text) - len(set(text))
This gives the exact result as long as the position of the repetitions does not matter.
Also, make sure to check the edge cases, which is good practice in any case.
I think your code is comparing the wrong things.
You start by finding the last character:
char = text[-1]
Then you compare this to itself:
for i in range(1, len(text)):
    if text[-i] == char:  # <-- surely this is text[-1] to begin with?
Why not just run through the characters:
def measure_normalized_emphasis(text):
    char = text[0]
    emphasis_size = 0
    for i in range(1, len(text)):
        if text[i] == char:
            emphasis_size += 1
        else:
            char = text[i]
    return emphasis_size
This seems to work.
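An equivalent way to express that loop, sketched here as an alternative rather than as the answer's own code, is to compare each character with its successor using zip. It also returns 0 for an empty string, where text[0] would raise an IndexError. The name count_adjacent_repeats is an assumption.

```python
def count_adjacent_repeats(text):
    # pair every character with the one after it and count the equal pairs
    return sum(a == b for a, b in zip(text, text[1:]))


print(count_adjacent_repeats("loooooveee"))  # 6
print(count_adjacent_repeats(""))            # 0
```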
I am writing a simple program to replace the repeating characters in a string with an *(asterisk). But the thing here is I can print the 1st occurrence of a repeating character in a string, but not the other occurrences.
For example,
if my input is Google, my output should be Go**le.
I am able to replace the characters that repeat with an asterisk, but I just can't find a way to keep the 1st occurrence of the character. In other words, my output right now is ****le.
Have a look at my Python3 code for this:
s = 'Google'
s = s.lower()
for i in s:
    if s.count(i)>1:
        s = s.replace(i,'*')
print(s)
Can someone suggest what should be done to get the required output?
replace will replace ALL occurrences of the char. You need to keep track of the characters you have already seen and, when one of them is repeated, replace JUST that character (at its specific index).
Strings don't support index assignment, so we can build a list that represents the new string and ''.join() it afterwards.
Using a set, you can keep track of the items you have already seen.
It would look like this:
s = 'Google'
seen = set()
new_string = []
for c in s:
    if c.lower() in seen:
        new_string.append('*')
    else:
        new_string.append(c)
        seen.add(c.lower())
new_string = ''.join(new_string)
print(new_string)
Go**le
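Wrapped as a function, the same seen-set idea also handles characters that occur more than twice. A sketch; mask_repeats and its mask parameter are names invented for this example:

```python
def mask_repeats(s, mask='*'):
    seen = set()
    out = []
    for c in s:
        key = c.lower()  # case-insensitive comparison, as in the answer
        out.append(mask if key in seen else c)
        seen.add(key)
    return ''.join(out)


print(mask_repeats('Google'))       # Go**le
print(mask_repeats('Mississippi'))  # Mis*****p**
```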
This is my approach:
First, you need to find the nth occurrence of the character. Then, you can replace other occurrences by using this snippet:
s = s[:position] + '*' + s[position+1:]
Full example code:
def find_nth(haystack, needle, n):
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start+len(needle))
        n -= 1
    return start
s = 'Google'
s_lower = s.lower()
for c in s_lower:
    if s_lower.count(c) > 1:
        position = find_nth(s_lower, c, 2)
        s = s[:position] + '*' + s[position+1:]
print(s)
Runnable link: https://repl.it/Mc4U/4
Regex approach:
import re
s = 'Google'
s_lower = s.lower()
for c in s_lower:
    if s_lower.count(c) > 1:
        # re.escape keeps characters such as '.' or '+' from being read as regex syntax
        position = [m.start() for m in re.finditer(re.escape(c), s_lower)][1]
        s = s[:position] + '*' + s[position+1:]
print(s)
Runnable link: https://repl.it/Mc4U/3
How about using list comprehensions? When constructing a list from another list (which is kind of what you are doing here, since we're treating strings as lists), a list comprehension is a great tool:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
inputstring = 'Google'.lower()
outputstring = ''.join(
    [char if inputstring.find(char, 0, index) == -1 else '*'
     for index, char in enumerate(inputstring)])
print(outputstring)
This results in go**le.
Hope this helps!
(edited to use '*' as the replacement character instead of '#')
I'm trying to write a program which counts how many times a substring appears within a string.
word = "wejmfoiwstreetstreetskkjoih"
streets = "streets"
count = 0
if streets in word:
    count += 1
print(count)
As you can see, "streets" appears twice, but the last s of the first "streets" is also the beginning of the second one. I can't think of a way to loop this.
Thanks!
Can be done using a regex
>>> import re
>>> text = 'streetstreets'
>>> len(re.findall('(?=streets)', text))
2
From the docs:
(?=...)
Matches if ... matches next, but doesn’t consume any of the
string. This is called a lookahead assertion. For example, Isaac
(?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
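Because the lookahead consumes nothing, each match is an empty string located at a start position, and re.finditer exposes those positions. A small sketch:

```python
import re

word = "wejmfoiwstreetstreetskkjoih"
# every match is zero-width, so m.start() is where "streets" begins
starts = [m.start() for m in re.finditer('(?=streets)', word)]
print(starts)       # [8, 14]
print(len(starts))  # 2
```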
Quick and dirty:
>>> word = "wejmfoiwstreetstreetskkjoih"
>>> streets = "streets"
>>> sum(word[start:].startswith(streets) for start in range(len(word)))
2
A generic (though not as elegant) way would be a loop like this:
def count_substrings(stack, needle):
    idx = 0
    count = 0
    while True:
        idx = stack.find(needle, idx) + 1  # next time look after this idx
        if idx <= 0:
            break
        count += 1
    return count
My measurement shows that it's ~8.5 times faster than the solution with startswith for every substring.
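The exact speed-up depends on the input and on the machine, so it is worth measuring yourself. A rough timeit sketch follows; the two function names are hypothetical, the test string is simply the question's word repeated, and no particular ratio is implied.

```python
import timeit


def count_startswith(stack, needle):
    # slices the string at every position, then checks the prefix
    return sum(stack[start:].startswith(needle) for start in range(len(stack)))


def count_find(stack, needle):
    idx = count = 0
    while True:
        idx = stack.find(needle, idx) + 1  # resume just after the last match start
        if idx <= 0:
            return count
        count += 1


word = "wejmfoiwstreetstreetskkjoih" * 20
assert count_startswith(word, "streets") == count_find(word, "streets")

for fn in (count_startswith, count_find):
    seconds = timeit.timeit(lambda: fn(word, "streets"), number=200)
    print(f"{fn.__name__}: {seconds:.4f}s")
```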
Is there a Pythonic way to split a string after the nth occurrence of a given delimiter?
Given a string:
'20_231_myString_234'
It should be split into (with the delimiter being '_', after its second occurrence):
['20_231', 'myString_234']
Or is the only way to accomplish this to count, split and join?
>>> n = 2
>>> text = '20_231_myString_234'
>>> groups = text.split('_')
>>> '_'.join(groups[:n]), '_'.join(groups[n:])
('20_231', 'myString_234')
This seems like the most readable way; the alternative is a regex.
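As a reusable function, the split-and-join approach might look like the sketch below. The name split_after_nth, and the choice to return the whole string plus an empty second part when there are fewer than n delimiters, are assumptions made for this example.

```python
def split_after_nth(text, delim, n):
    groups = text.split(delim)
    if len(groups) <= n:  # fewer than n delimiters: nothing to split off
        return text, ''
    return delim.join(groups[:n]), delim.join(groups[n:])


print(split_after_nth('20_231_myString_234', '_', 2))  # ('20_231', 'myString_234')
print(split_after_nth('20_231', '_', 2))               # ('20_231', '')
```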
Using re to get a regex of the form ^((?:[^_]*_){n-1}[^_]*)_(.*) where n is a variable:
import re
n = 2
s = '20_231_myString_234'
m = re.match(r'^((?:[^_]*_){%d}[^_]*)_(.*)' % (n-1), s)
if m: print(m.groups())
or have a nice function:
import re
def nthofchar(s, c, n):
    regex = r'^((?:[^%c]*%c){%d}[^%c]*)%c(.*)' % (c, c, n-1, c, c)
    l = ()
    m = re.match(regex, s)
    if m: l = m.groups()
    return l
s = '20_231_myString_234'
print(nthofchar(s, '_', 2))
Or without regexes, using iterative find:
def nth_split(s, delim, n):
    p, c = -1, 0
    while c < n:
        p = s.index(delim, p + 1)
        c += 1
    return s[:p], s[p + 1:]
s1, s2 = nth_split('20_231_myString_234', '_', 2)
print(s1, ":", s2)
I like this solution because it needs only a trivial pattern (the delimiter itself) and can easily be adapted to another "nth" or delimiter.
import re
string = "20_231_myString_234"
occur = 2 # the occurrence at which you want to split
indices = [x.start() for x in re.finditer("_", string)]
part1 = string[0:indices[occur-1]]
part2 = string[indices[occur-1]+1:]
print(part1, ' ', part2)
I thought I would contribute my two cents. The second parameter to split() (maxsplit) lets you limit the number of splits:
def split_at(s, delim, n):
    r = s.split(delim, n)[n]
    return s[:-len(r)-len(delim)], r
On my machine, the two good answers by @perreal, iterative find and regular expressions, actually measure 1.4 and 1.6 times slower (respectively) than this method.
It's worth noting that it can become even quicker if you don't need the initial bit. Then the code becomes:
def remove_head_parts(s, delim, n):
    return s.split(delim, n)[n]
Not so sure about the naming, I admit, but it does the job. Somewhat surprisingly, it is 2 times faster than iterative find and 3 times faster than regular expressions.
I put up my testing script online. You are welcome to review and comment.
>>> import re
>>> s = '20_231_myString_234'
>>> occurrences = [m.start() for m in re.finditer('_', s)]  # this will give you a list of '_' positions
>>> occurrences
[2, 6, 15]
>>> result = [s[:occurrences[1]], s[occurrences[1]+1:]]  # [s[:6], s[7:]]
>>> result
['20_231', 'myString_234']
It depends on the pattern of your split. If the first two elements are always numbers, for example, you can build a regular expression and use the re module, which can split your string as well.
I had a larger string to split at every nth separator, and ended up with the following code:
# Split every 6 spaces
n = 6
sep = ' '
n_split_groups = []
groups = err_str.split(sep)
while len(groups):
    n_split_groups.append(sep.join(groups[:n]))
    groups = groups[n:]
print(n_split_groups)
Thanks @perreal!
@AllBlackt's solution in function form:
def split_nth(s, sep, n):
    n_split_groups = []
    groups = s.split(sep)
    while len(groups):
        n_split_groups.append(sep.join(groups[:n]))
        groups = groups[n:]
    return n_split_groups
s = "aaaaa bbbbb ccccc ddddd eeeeeee ffffffff"
print (split_nth(s, " ", 2))
['aaaaa bbbbb', 'ccccc ddddd', 'eeeeeee ffffffff']
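The same grouping can be written without repeatedly re-slicing the list, by stepping through the split parts n at a time. A sketch with a hypothetical function name:

```python
def split_every_nth(s, sep, n):
    parts = s.split(sep)
    # take the parts n at a time and glue each chunk back together
    return [sep.join(parts[i:i + n]) for i in range(0, len(parts), n)]


s = "aaaaa bbbbb ccccc ddddd eeeeeee ffffffff"
print(split_every_nth(s, " ", 2))
# ['aaaaa bbbbb', 'ccccc ddddd', 'eeeeeee ffffffff']
```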
As @Yuval has noted in his answer, and @jamylak commented in his answer, the split and rsplit methods accept a second (optional) parameter maxsplit to avoid making splits beyond what is necessary. Thus, I find the better solution (both for readability and performance) is this:
s = '20_231_myString_234'
first_part = s.rsplit('_', 2)[0] # Gives '20_231'
second_part = s.split('_', 2)[2] # Gives 'myString_234'
This is not only simple, but also avoids the performance hits of regex solutions and of other solutions that use join to undo unnecessary splits.
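One caveat: rsplit('_', 2)[0] yields the part before the second delimiter only when the string contains exactly three delimiters. The sketch below takes both parts from the left with maxsplit, so it does not depend on the total number of delimiters. split_at_nth is a made-up name, and the sketch assumes the string has at least n delimiters.

```python
def split_at_nth(s, delim, n):
    parts = s.split(delim, n)  # at most n splits, counted from the left
    return delim.join(parts[:n]), parts[n]


print(split_at_nth('20_231_myString_234', '_', 2))  # ('20_231', 'myString_234')
print(split_at_nth('a_b_c_d_e', '_', 2))            # ('a_b', 'c_d_e')
print('a_b_c_d_e'.rsplit('_', 2)[0])                # a_b_c
```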