I want to replace all substring occurrences in a string, but I wish not to use the replace method. At the moment, experiments have led me to this:
def count_substrings_and_replace(string, substring, rpl=None):
    string_size = len(string)
    substring_size = len(substring)
    count = 0
    _o = string
    for i in range(0, string_size - substring_size + 1):
        if string[i:i + substring_size] == substring:
            if rpl:
                print(_o[:i] + rpl + _o[i + substring_size:])
            count += 1
    return count, _o

count_substrings_and_replace("aaabaaa", "aaa", "ddd")
but I have output like this:
dddbaaa
aaabddd
not dddbddd.
Update 1:
I figured out that the replacement only works correctly when rpl has the same length as substring. For example, count_substrings_and_replace("aaabaaa", "aaa", "d") returns (2, 'dbaad') instead of (2, 'dbd').
Update 2:
The issue described in update 1 appeared because the comparison string[i:i + substring_size] == substring was made against the original string, which never changes during the loop, instead of against _o.
Fixed:
def count_substrings_and_replace(string, substring, rpl=None):
    string_size = len(string)
    substring_size = len(substring)
    count = 0
    _o = string
    for i in range(0, string_size - substring_size + 1):
        if _o[i:i + substring_size] == substring:
            if rpl:
                _o = _o[:i] + rpl + _o[i + substring_size:]
            count += 1
    return count, _o

count_substrings_and_replace("aaabaaa", "aaa", "d")
Output: (2, 'dbd')
You never update the value of _o when a match is found; you only print what it would look like if the match were replaced. Instead, the innermost if statement should contain two lines like:
_o = _o[:i] + rpl + _o[i + substring_size:]
print(_o)
That prints the string every time a match is found and replaced. Moving the print statement after the for loop makes it run only once, after the entire string has been processed and replaced.
Just my mistake. I had to assign the value to the variable on each iteration instead of printing it:
_o = _o[:i] + rpl + _o[i + substring_size:]
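A note on the fixed version in the question: its loop bound is computed once from the original length while _o changes length, so it can miss matches when rpl is longer than substring, and `if rpl:` skips an empty replacement string. Below is a minimal sketch of one way around both, assuming non-overlapping matches are wanted. The name count_and_replace is hypothetical and not from the original post.

```python
def count_and_replace(string, substring, rpl=None):
    """Return (count, new_string) without calling str.replace.

    Assumption: matches are non-overlapping and scanned left to right.
    If rpl is None, occurrences are only counted and the text is unchanged.
    """
    if not substring:
        # An empty substring would match everywhere; treat it as "no match".
        return 0, string

    size = len(substring)
    pieces = []   # output is built from pieces, so indices never shift
    count = 0
    i = 0
    while i < len(string):
        if string[i:i + size] == substring:
            count += 1
            pieces.append(substring if rpl is None else rpl)
            i += size          # jump past the whole match
        else:
            pieces.append(string[i])
            i += 1
    return count, "".join(pieces)


print(count_and_replace("aaabaaa", "aaa", "ddd"))
print(count_and_replace("aaabaaa", "aaa", "dddd"))
```

Because the scan always reads from the untouched input and writes to a separate list, the length of rpl no longer affects which positions are checked.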
I have made a script:
our_word = "Success"

def duplicate_encode(word):
    char_list = []
    final_str = ""
    changed_index = []
    base_wrd = word.lower()
    for k in base_wrd:
        char_list.append(k)
    for i in range(0, len(char_list)):
        count = 0
        for j in range(i + 1, len(char_list)):
            if j not in changed_index:
                if char_list[j] == char_list[i]:
                    char_list[j] = ")"
                    changed_index.append(j)
                    count += 1
                else:
                    continue
        if count > 0:
            char_list[i] = ")"
        else:
            char_list[i] = "("
    print(changed_index)
    print(char_list)
    final_str = "".join(char_list)
    return final_str

print(duplicate_encode(our_word))
Essentially, the purpose of this script is to convert a string to a new string where each character is "(" if that character appears only once in the original string, or ")" if it appears more than once. I have written a rather layered script (I am relatively new to Python, so I didn't want to use any helpful built-in functions) that attempts to do this. My issue is that the check for whether the current index has already been edited (to prevent it from being changed again) seems to be ignored. So instead of the intended )())()) I get )()((((. I'd really appreciate an insightful answer explaining why I am getting this issue and ways to work around it, since I'm trying to build an intuitive understanding of Python. Thanks!
word = "Success"
print(''.join([')' if word.lower().count(c) > 1 else '(' for c in word.lower()]))
The issue here has nothing to do with your understanding of Python. It's purely algorithmic. If you retain this 'layered' algorithm, it is essential that you add one more check in the "i" loop.
our_word = "Success"

def duplicate_encode(word):
    char_list = list(word.lower())
    changed_index = []
    for i in range(len(word)):
        count = 0
        for j in range(i + 1, len(word)):
            if j not in changed_index:
                if char_list[j] == char_list[i]:
                    char_list[j] = ")"
                    changed_index.append(j)
                    count += 1
        # the new, important check: avoid flipping an already assigned ')' back to '('
        if i not in changed_index:
            char_list[i] = ")" if count > 0 else "("
    return "".join(char_list)

print(duplicate_encode(our_word))
Your algorithm can be greatly simplified if you avoid using char_list as both the input and output. Instead, you can create an output list of the same length filled with ( by default, and then only change an element when a duplicate is found. The loops will simply walk along the entire input list once for each character looking for any matches (other than self-matches). If one is found, the output list can be updated and the inner loop will break and move on to the next character.
The final code should look like this:
def duplicate_encode(word):
    char_list = list(word.lower())
    output = list('(' * len(word))
    for i in range(len(char_list)):
        for j in range(len(char_list)):
            if i != j and char_list[i] == char_list[j]:
                output[i] = ')'
                break
    return ''.join(output)

for our_word in (
    'Success',
    'ChJsTk(u cIUzI htBp#qX)OTIHpVtHHhQ',
):
    result = duplicate_encode(our_word)
    print(our_word)
    print(result)
Output:
Success
)())())
ChJsTk(u cIUzI htBp#qX)OTIHpVtHHhQ
))(()(()))))())))()()((())))()))))
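The nested-loop answers above compare every character with every other one. If the standard library is acceptable (the asker preferred to avoid helpers, so this is only an alternative sketch), the counts can be gathered in a single pass with collections.Counter. The function name duplicate_encode_counter is hypothetical.

```python
from collections import Counter


def duplicate_encode_counter(word):
    """Encode each character as ')' if it occurs more than once, else '('."""
    lowered = word.lower()
    counts = Counter(lowered)  # character -> number of occurrences
    return "".join(")" if counts[c] > 1 else "(" for c in lowered)


print(duplicate_encode_counter("Success"))
```

Counting first and encoding second also sidesteps the original problem of reading from and writing to the same list.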
I would like to remove the last two substrings from a string, as in the following example:
str="Dev.TTT.roker.{i}.ridge.{i}."
str1="Dev.TTT.roker.{i}.ridge.{i}.obj."
If a {i} appears between the dots among those trailing parts, it has to be removed as well (it does not count as one of the two substrings).
So the result of the Python script should look like this:
the expected result for str is: Dev.TTT.
the expected result for str1 is: Dev.TTT.roker.{i}.
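You can simply split on . and skip segments that are empty or equal to {i}.
Also, do not shadow built-in names with your variables. In your case, don't use str as a variable name (it is a built-in type rather than a keyword, but reassigning it hides the built-in).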
def solve(s):
    x = s.split('.')
    cnt = 2
    l = len(x) - 1
    while cnt and l:
        if x[l] == '' or x[l] == '{i}':
            l -= 1
            continue
        else:
            cnt -= 1
        l -= 1
    return '.'.join(x[:l+1]) + '.'

str1="Dev.TTT.roker.{i}.ridge.{i}."
str2="Dev.TTT.roker.{i}.ridge.{i}.obj."
print(solve(str1))
print(solve(str2))
output:
Dev.TTT.
Dev.TTT.roker.{i}.
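The same idea can also be sketched with a regular expression, for comparison. This assumes every segment ends with a dot and that segment names never contain ".", "{" or "}". The names strip_last_two and _LAST_TWO are hypothetical.

```python
import re

# One "segment" is a name, a dot, and any number of trailing "{i}." markers.
# The pattern matches exactly two such segments anchored at the end.
_LAST_TWO = re.compile(r"(?:[^.{}]+\.(?:\{i\}\.)*){2}$")


def strip_last_two(path):
    """Remove the last two named segments together with their {i} markers."""
    return _LAST_TWO.sub("", path)


print(strip_last_two("Dev.TTT.roker.{i}.ridge.{i}."))
print(strip_last_two("Dev.TTT.roker.{i}.ridge.{i}.obj."))
```

Unlike the loop version, this sketch leaves the string untouched when fewer than two named segments are present.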
I'm trying to make my program return the exact same string but with ** between each character. Here's my code.
def separate(st):
    total = " "
    n = len(st + st[-1])
    for i in range(n):
        total = str(total) + str(i) + str("**")
    return total

x = separate("12abc3")
print(x)
This should return:
1**2**a**b**c**3**
However, I'm getting 0**1**2**3**4**5**6**.
You can join the characters of the string with "**" as the separator (this works because a string is an iterable of its characters, so str.join accepts it directly). To get the additional "**" at the end, just concatenate it.
Here's an example:
def separate(st):
    return "**".join(st) + "**"
Sample:
x = separate("12abc3")
print(x) # "1**2**a**b**c**3**"
A note on your posted code:
The reason you get that output is that you loop with for i in range(n):, so the iteration variable i is an integer index, not a character. (Because n is len(st) + 1, i runs from 0 to len(st), which is why 6 also appears.) When you evaluate str(total) + str(i) + str("**"), you convert that index i to a string and append it.
To fix that you could iterate over the characters in st directly, like this:
for c in st:
    total = total + c + "**"
or use the index i to get the character at each position in st, like this:
for i in range(len(st)):
    total = total + st[i] + "**"
Welcome to Stack Overflow!
I will explain part of your code line by line.
for i in range(n): since you provide only one argument (the stopping point), i takes the values 0, 1, 2, ..., n-1.
total = str(total) + str(i) + str("**"): this appends i (the zero-based iteration number) and ** to the current total string. Hence it adds those numbers sequentially to the result.
What you should do instead is total = str(total) + st[i] + str("**"), so that it adds each character of st one by one.
In addition, you should initialize n as n = len(st); otherwise st[i] raises an IndexError on the last iteration.
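Putting the suggestions from the answers together, a corrected loop version might look like the sketch below. It also starts total as an empty string, since the original's " " would leave a leading space in the result.

```python
def separate(st):
    """Return st with '**' appended after every character."""
    total = ""               # start empty so there is no leading space
    for c in st:             # iterate over characters, not indices
        total += c + "**"
    return total


print(separate("12abc3"))  # 1**2**a**b**c**3**
```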
I have a string:
a = 'babababbaaaaababbbab'
And it needs to be shortened so it looks like this:
(ba)3(b)2(a)5ba(b)3ab
So basically it needs to take all repeating characters and write how many times they are repeating instead of printing them.
I managed to do half of this:
from itertools import groupby
a = 'babababbaaaaababbbab'
grouped = ["".join(grp) for patt,grp in groupby(a)]
solved = [str(len(i)) + i[0] for i in grouped if len(i) >= 2]
but this only does this for characters that are repeating but not patterns. I get it that I could do this by finding 'ab' pattern in string but this needs to be viable for every possible string. Has anyone encountered something similar?
You can easily do this with regex:
>>> import re
>>> repl = lambda match: '({}){}'.format(match.group(1), len(match.group()) // len(match.group(1)))
>>> re.sub(r'(.+?)\1+', repl, 'babababbaaaaababbbab')
'(ba)3(b)2(a)5ba(b)3ab'
Not much to explain here. The pattern (.+?)\1+ matches repeating character sequences, and the lambda function rewrites them to the form (sequence)number.
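For readers who want the pieces spelled out, here is the same regex approach written as a named function. The name compress_repeats is hypothetical; the pattern and the counting logic are the ones from the answer above.

```python
import re


def compress_repeats(text):
    """Rewrite immediately repeated sequences as (sequence)count."""

    def repl(match):
        unit = match.group(1)                      # shortest repeating unit
        times = len(match.group(0)) // len(unit)   # whole match / unit length
        return "({}){}".format(unit, times)

    # (.+?) lazily captures a unit; \1+ requires one or more further copies.
    return re.sub(r"(.+?)\1+", repl, text)


print(compress_repeats("babababbaaaaababbbab"))  # (ba)3(b)2(a)5ba(b)3ab
```

The lazy quantifier is what makes the engine prefer the shortest unit, so 'aaaaa' becomes (a)5 rather than (aa)2a.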
This is what I came up with, the code is a mess, but I just wanted to have a quick fun, so I let it be like this
a = 'babababbaaaaababbbab'

def compress(text):
    for i in range(1, len(text) // 2):
        for j, c in enumerate(text[:-i if i > 0 else len(text)]):
            pattern = text[j:i+j]
            new_text = pattern_repeats_processor(pattern, text, j)
            if new_text != text:
                return compress(new_text)
    return text

def pattern_repeats_processor(pattern, text, i):
    chunk = pattern
    count = 1
    while chunk == pattern and i + (count + 1) * len(pattern) < len(text):
        chunk = text[i + count * len(pattern): i + (count + 1) * len(pattern)]
        if chunk == pattern:
            count = count + 1
        else:
            break
    if count > 1:
        return text[:i] + '(' + pattern + ')' + str(count) + text[i + (count + 0) * len(pattern):]
    return text

print(compress(a))
print(a)
It makes
babababbaaaaababbbab =>
(ba)3(b)2(a)5ba(b)3ab
P.S. Of course, Rowing's answer is miles better, pretty impressive even.
I'm not sure exactly what you're looking for, but I hope this helps.
A=a.count('a')
B=a.count('b')
AB=a.count('ab')
BAB=a.count('bab')
BA=a.count('ba')
print(A,'(a)',B,'(b)',AB,'(ab)',BAB,'(bab)',BA,'(ba)')
Let
s = 'hello you blablablbalba qyosud'
i = 17
How to get the word around position i? i.e. blablablbalba in my example.
I was thinking about this, but it seems unpythonic:
for j, c in enumerate(s):
    if c == ' ':
        if j < i:
            start = j
        else:
            end = j
            break
print(start, end)
print(s[start+1:end])
Here is another simple approach with regex,
import re
s = 'hello you blablablbalba qyosud'
i = 17
string_at_i = re.findall(r"(\w+)", s[i:])[0]
print(re.findall(r"\w*%s\w*" % string_at_i, s))
Update: the previous pattern failed when there was a space. The current pattern takes care of it.
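One caveat with the snippet above: the fragment is interpolated into the pattern without re.escape, and findall returns every word containing that fragment, not only the one at position i. Below is a hedged alternative sketch that checks match positions instead. The name word_at_regex is hypothetical, and a "word" is assumed to be any run of non-whitespace characters.

```python
import re


def word_at_regex(s, i):
    """Return the whitespace-delimited word covering index i, or None."""
    for m in re.finditer(r"\S+", s):
        if m.start() <= i < m.end():   # index i falls inside this word
            return m.group()
    return None                        # i points at whitespace or is out of range


s = 'hello you blablablbalba qyosud'
print(word_at_regex(s, 17))
```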
To answer your first question,
p = s[0 : i].rfind(' ')
Output: 9
For your second question,
s[ p + 1 : (s[p + 1 : ].find(' ') + p + 1) ]
Output: 'blablablbalba'
Description:
Extract the string from the starting to the ith position.
Find the index of the last occurrence of space. This will be your starting point for your required word (the second question).
Go from here to the next occurrence of space and extract the word in between.
The following consolidated code should work in all scenarios:
s = s + ' '
p = s[0 : i].rfind(' ')
s[ p + 1 : (s[p + 1 : ].find(' ') + p + 1) ]
You can split the string on spaces; then count the number of spaces before the threshold position (i), and that count is the index of the word in the split list.
Solution:
print (s.split()[s[:i].count(" ")])
EDIT:
If we have more than one space between words and we want to consider two spaces (or more) as one space we can do:
print (s.split()[" ".join(s[:i].split()).count(" ")])
Output:
blablablbalba
Explanation:
s[:i].count(" ") returns 2, as there are two spaces before index 17:
s[:i].count(" ") # returns 2
s.split() returns a list of the words, split on whitespace:
s.split()
['hello', 'you', 'blablablbalba', 'qyosud']
What you need is the index of the relevant item in that list, which you get from s[:i].count(" ").
def func(s, i):
    s1 = s[0:i]
    k = s1.rfind(' ')
    pos1 = k
    s1 = s[k+1:len(s)]
    k = s1.find(' ')
    k = s[pos1+1:pos1+k+1]
    return k

s = 'hello you blablablbalba qyosud'
i = 17
k = func(s, i)
print(k)
output:
blablablbalba
You can use index or find to locate the next space from a given position. Here the function looks for a space starting from start_index + 1; if it finds one, it returns the word between the two indexes start_index and end, and otherwise it returns None.
s = 'hello you blablablbalba qyosud'

def get_word(my_string, start_index):
    try:
        # str.index raises ValueError when no space follows (str.find would return -1)
        end = my_string.index(' ', start_index + 1)
    except ValueError:
        # no following space was found
        return None
    return my_string[start_index:end]

# 10 is the index where the word starts (chosen here for the example)
print(get_word(s, 10))
Output: 'blablablbalba'
You can use rfind to search for the previous whitespace including s[i]:
>>> s = 'hello you blablablbalba qyosud'
>>> i = 17
>>> start = s.rfind(' ', 0, i + 1)
>>> start
9
Then you can use find to search the following whitespace again including s[i]:
>>> end = s.find(' ', i)
>>> end
23
And finally use slice to generate the word:
>>> s[start+1:(end if end != -1 else None)]
'blablablbalba'
The above yields the word when s[i] is not whitespace. When s[i] is whitespace, the result is an empty string.
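The rfind/find steps above can be wrapped into a small function, sketched here under the same assumption that words are separated by single space characters. The name word_at is hypothetical.

```python
def word_at(s, i):
    """Return the space-delimited word that covers index i ('' if s[i] is a space)."""
    start = s.rfind(" ", 0, i + 1)   # last space at or before i, or -1
    end = s.find(" ", i)             # first space at or after i, or -1
    if end == -1:
        end = len(s)                 # no later space: the word runs to the end
    return s[start + 1:end]


s = 'hello you blablablbalba qyosud'
print(word_at(s, 17))
```

When rfind returns -1 for the first word, start + 1 is 0, so the slice still begins at the start of the string without a special case.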