Shortening a string - python

I have a string:
a = 'babababbaaaaababbbab'
And it needs to be shortened so it looks like this:
(ba)3(b)2(a)5ba(b)3ab
So basically it needs to take all repeating characters and write how many times they are repeating instead of printing them.
I managed to do half of this:
from itertools import groupby
a = 'babababbaaaaababbbab'
grouped = ["".join(grp) for patt,grp in groupby(a)]
solved = [str(len(i)) + i[0] for i in grouped if len(i) >= 2]
but this only handles repeated single characters, not repeated patterns. I understand that I could do this by searching for the 'ab' pattern in the string, but the solution needs to work for any string. Has anyone encountered something similar?

You can easily do this with regex:
>>> import re
>>> repl = lambda match: '({}){}'.format(match.group(1), len(match.group()) // len(match.group(1)))
>>> re.sub(r'(.+?)\1+', repl, 'babababbaaaaababbbab')
'(ba)3(b)2(a)5ba(b)3ab'
Not much to explain here. The pattern (.+?)\1+ matches repeating character sequences, and the lambda function rewrites them to the form (sequence)number.
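To make the moving parts of that one-liner easier to follow, here is the same substitution spelled out as a named function. This is only a sketch: the names REPEAT, _rewrite and compress are illustrative and do not come from the answer above. It inherits the regex's behaviour of taking the leftmost match with the shortest repeating unit, so it is not guaranteed to produce the shortest possible output ('aabaab' becomes '(a)2b(a)2b' rather than '(aab)2').

```python
import re

# (.+?) lazily captures the shortest candidate unit; \1+ then requires
# at least one more back-to-back copy of that same unit.
REPEAT = re.compile(r'(.+?)\1+')

def _rewrite(match):
    # Hypothetical helper: turn one repeated run into "(unit)count".
    unit = match.group(1)
    repeats = len(match.group(0)) // len(unit)
    return '({}){}'.format(unit, repeats)

def compress(text):
    # Scan left to right, rewriting each repeated run the regex finds.
    return REPEAT.sub(_rewrite, text)

print(compress('babababbaaaaababbbab'))  # (ba)3(b)2(a)5ba(b)3ab
```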

This is what I came up with. The code is a mess, but I just wanted some quick fun, so I left it as it is:
a = 'babababbaaaaababbbab'
def compress(text):
    for i in range(1, len(text) // 2):
        for j, c in enumerate(text[:-i if i > 0 else len(text)]):
            pattern = text[j:i+j]
            new_text = pattern_repeats_processor(pattern, text, j)
            if new_text != text:
                return compress(new_text)
    return text
def pattern_repeats_processor(pattern, text, i):
    chunk = pattern
    count = 1
    while chunk == pattern and i + (count + 1) * len(pattern) < len(text):
        chunk = text[i + count * len(pattern): i + (count + 1) * len(pattern)]
        if chunk == pattern:
            count = count + 1
        else:
            break
    if count > 1:
        return text[:i] + '(' + pattern + ')' + str(count) + text[i + (count + 0) * len(pattern):]
    return text
print(compress(a))
print(a)
It makes
babababbaaaaababbbab =>
(ba)3(b)2(a)5ba(b)3ab
P.S. Of course, Rowing's answer is miles better, and pretty impressive too.

I'm not sure exactly what you're looking for, but I hope this helps.
A=a.count('a')
B=a.count('b')
AB=a.count('ab')
BAB=a.count('bab')
BA=a.count('ba')
print(A,'(a)',B,'(b)',AB,'(ab)',BAB,'(bab)',BA,'(ba)')

Related

Ignoring Changed Index Check (Python)

I have made a script:
our_word = "Success"
def duplicate_encode(word):
    char_list = []
    final_str = ""
    changed_index = []
    base_wrd = word.lower()
    for k in base_wrd:
        char_list.append(k)
    for i in range(0, len(char_list)):
        count = 0
        for j in range(i + 1, len(char_list)):
            if j not in changed_index:
                if char_list[j] == char_list[i]:
                    char_list[j] = ")"
                    changed_index.append(j)
                    count += 1
                else:
                    continue
        if count > 0:
            char_list[i] = ")"
        else:
            char_list[i] = "("
    print(changed_index)
    print(char_list)
    final_str = "".join(char_list)
    return final_str
print(duplicate_encode(our_word))
Essentially, the purpose of this script is to convert a string to a new string where each character is "(" if that character appears only once in the original string, or ")" if it appears more than once. I have written a rather layered script (I am relatively new to Python, so I didn't want to use any helpful built-in functions) that attempts to do this. My issue is that the check for whether the current index has already been edited (to prevent it from being changed again) seems to be ignored. So instead of the intended )())()) I get )()((((. I'd really appreciate an insightful answer explaining why I am getting this issue and how to work around it, since I'm trying to build an intuitive understanding of Python. Thanks!
word = "Success"
print(''.join([')' if word.lower().count(c) > 1 else '(' for c in word.lower()]))
The issue here has nothing to do with your understanding of Python. It's purely algorithmic. If you retain this 'layered' algorithm, it is essential that you add one more check in the "i" loop.
our_word = "Success"
def duplicate_encode(word):
    char_list = list(word.lower())
    changed_index = []
    for i in range(len(word)):
        count = 0
        for j in range(i + 1, len(word)):
            if j not in changed_index:
                if char_list[j] == char_list[i]:
                    char_list[j] = ")"
                    changed_index.append(j)
                    count += 1
        if i not in changed_index:  # the new, important check: do not flip an already assigned ')' back to '('
            char_list[i] = ")" if count > 0 else "("
    return "".join(char_list)
print(duplicate_encode(our_word))
Your algorithm can be greatly simplified if you avoid using char_list as both the input and output. Instead, you can create an output list of the same length filled with ( by default, and then only change an element when a duplicate is found. The loops will simply walk along the entire input list once for each character looking for any matches (other than self-matches). If one is found, the output list can be updated and the inner loop will break and move on to the next character.
The final code should look like this:
def duplicate_encode(word):
    char_list = list(word.lower())
    output = list('(' * len(word))
    for i in range(len(char_list)):
        for j in range(len(char_list)):
            if i != j and char_list[i] == char_list[j]:
                output[i] = ')'
                break
    return ''.join(output)
for our_word in (
    'Success',
    'ChJsTk(u cIUzI htBp#qX)OTIHpVtHHhQ',
):
    result = duplicate_encode(our_word)
    print(our_word)
    print(result)
Output:
Success
)())())
ChJsTk(u cIUzI htBp#qX)OTIHpVtHHhQ
))(()(()))))())))()()((())))()))))
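If built-in helpers are acceptable after all, the nested loops can be replaced by a single counting pass. The sketch below swaps in a different technique from the answers above: it uses collections.Counter from the standard library, and the name duplicate_encode_counted is made up for this illustration.

```python
from collections import Counter

def duplicate_encode_counted(word):
    # Count every character once (case-insensitively), then map each
    # character to ')' if it occurs more than once, else '('.
    lowered = word.lower()
    counts = Counter(lowered)
    return ''.join(')' if counts[ch] > 1 else '(' for ch in lowered)

print(duplicate_encode_counted('Success'))  # )())())
```

Because the counts are computed up front, the output never has to be patched afterwards, so the "already changed index" bookkeeping is not needed.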

Inserting a string into a string at regular intervals (1234567891234 -> 1,2345,6789,1234)

How can I insert '#' after every n characters, counting from the end of the string?
Example with n = 4:
evil = '01234567891234oooooooooooooooo321'
to
stan = '0#1234#5678#9123#4ooo#oooo#oooo#oooo#o321'
I tried using a list with for and if statements, but got stuck. I ended up with something shameful like this:
a = 1234567891234
b = [ a[-i] for i in range(1,len(a)+1)]
for i in range(len(b)):
    c += b[i]
    if i%4==0: #stuck
        c += ','
c.reverse()
What is the optimum way?
You can use a lookahead pattern that asserts zero or more groups of 4 characters up to the end of the string, and insert # at every position where it matches:
import re
pattern = r"(?=(?:.{4})*$)"
s = "01234567891234oooooooooooooooo321"
print(re.sub(pattern, "#", s))
Output
0#1234#5678#9123#4ooo#oooo#oooo#oooo#o321#
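The trailing # appears because "zero or more groups" also matches at the very end of the string. A possible variation, sketched below, requires at least one full group to the right and refuses to match at the start of the string. The function name insert_sep_regex and its parameters are hypothetical, and the sketch assumes the input has no newlines (the dot does not match them by default).

```python
import re

def insert_sep_regex(s, n=4, sep='#'):
    # (?<!^)          never insert at the very start of the string
    # (?=(?:.{n})+$)  one or more complete groups of n characters remain
    pattern = r'(?<!^)(?=(?:.{%d})+$)' % n
    return re.sub(pattern, sep, s)

print(insert_sep_regex('01234567891234oooooooooooooooo321'))
# 0#1234#5678#9123#4ooo#oooo#oooo#oooo#o321
```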
Cut the string into chunks (working backwards) and then join them with the separator:
evil = '01234567891234oooooooooooooooo321'
l = 4
sep = '#'
sep.join([evil[max(i-l,0):i] for i in range(len(evil), 0, -l)][::-1])
'0#1234#5678#9123#4ooo#oooo#oooo#oooo#o321'
Using a chunks helper function:
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
evil = '01234567891234oooooooooooooooo321'
n = 4
stan = "#".join(chunks(evil[::-1], n))[::-1]
print(stan) # Output: 0#1234#5678#9123#4ooo#oooo#oooo#oooo#o321
Input string is reversed ([::-1]), split into chunks, joined by "#" and then reversed back again. (It's possible to skip reverses if you calculate how many characters there will be in the first set of characters)
A naive solution would be using parts of evil string:
evil = '01234567891234oooooooooooooooo321'
n = 4
start = len(evil) % n
insert = '#'
stan = evil[:start] + insert
for i in range(start, len(evil) - n, n):
    stan += evil[i:i+n] + insert
stan += evil[-n:]
For this, I would go backwards through your string evil by reversing the string and iterating through it in a for loop. Then I set a count variable to keep track of how many loops it's done, and reset to 0 when it equals 4. All of this looks like the below:
count = 0
for char in evil[::-1]:
    if count == 4:
        count = 0
    count += 1
You can then create a new empty string (new_str) and append each character of evil to it, each time checking whether count is 4 and, if so, adding a # to the string before resetting the count. Full code:
count = 0
new_str = ''
for char in evil[::-1]:
    if count == 4:
        new_str += '#'
        count = 0
    count += 1
    new_str += char
This will produce the new string reversed, so you need to reverse it again to get the desired result:
new_str = new_str[::-1]
Output:
'0#1234#5678#9123#4ooo#oooo#oooo#oooo#o321'
You can do it like this:
evil = '01234567891234oooooooooooooooo321'
''.join(j if i%4 else f'#{j}' for i, j in enumerate(evil[::-1]))[::-1][:-1]
Output:
'0#1234#5678#9123#4ooo#oooo#oooo#oooo#o321'
An exact method: use divmod to get the quotient and remainder of the string's length when divided into "blocks" of size 4, then slice.
evil = '01234567891234oooooooooooooooo321'
size = 4
q, r = divmod(len(evil), size)
sep = '#'
stan = f"{evil[:r]}{sep}{sep.join(evil[r+i*size: r+(i+1)*size] for i in range(q))}"
print(stan)
Remark: if the length of the string is a multiple of the block size, the new string will start with sep. I assumed this as the default behavior, given the lack of explanation.
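If a leading separator is not wanted when the length is an exact multiple of the block size, one option is to collect the pieces in a list and only include the short leading piece when it is non-empty. This is a sketch under that assumption about the desired behavior. The name insert_sep_slices is hypothetical.

```python
def insert_sep_slices(s, size=4, sep='#'):
    # q full blocks, preceded by r leftover characters at the front.
    q, r = divmod(len(s), size)
    parts = [s[:r]] if r else []
    parts.extend(s[r + i * size: r + (i + 1) * size] for i in range(q))
    return sep.join(parts)

print(insert_sep_slices('01234567891234oooooooooooooooo321'))
# 0#1234#5678#9123#4ooo#oooo#oooo#oooo#o321
print(insert_sep_slices('12345678'))
# 1234#5678
```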

Replace all occurrences of the substring in string using string slicing

I want to replace all substring occurrences in a string, but I wish not to use the replace method. At the moment, experiments have led me to this:
def count_substrings_and_replace(string, substring, rpl=None):
    string_size = len(string)
    substring_size = len(substring)
    count = 0
    _o = string
    for i in range(0, string_size - substring_size + 1):
        if string[i:i + substring_size] == substring:
            if rpl:
                print(_o[:i] + rpl + _o[i + substring_size:])
            count += 1
    return count, _o
count_substrings_and_replace("aaabaaa", "aaa", "ddd")
but I have output like this:
dddbaaa
aaabddd
not dddbddd.
Update 1:
I figured out that I can only replace correctly with a string of the same length of substring. For example for count_substrings_and_replace("aaabaaa", "aaa", "d") I got output: (2, 'dbaad') not dbd
Update 2:
The issue described in Update 1 appeared because the comparison was made against the original string (the if string[i:i + substring_size] == substring line), which does not change throughout the process.
Fixed:
def count_substrings_and_replace(string, substring, rpl=None):
    string_size = len(string)
    substring_size = len(substring)
    count = 0
    _o = string
    for i in range(0, string_size - substring_size + 1):
        if _o[i:i + substring_size] == substring:
            if rpl:
                _o = _o[:i] + rpl + _o[i + substring_size:]
            count += 1
    return count, _o
count_substrings_and_replace("aaabaaa", "aaa", "d")
Output: (2, 'dbd')
You never update the value of _o when a match is found, you're only printing out what it'd look like if it was to be replaced. Instead, inside that innermost if statement should be two lines like:
_o = _o[:i] + rpl + _o[i + substring_size:]
print(_o)
That would print the string every time a match is found and replaced. Moving the print statement after the for loop would make it run only once, after the entire string has been parsed and replaced appropriately.
Just my mistake. I had to assign the value to the variable on each iteration instead of printing it:
_o = _o[:i] + rpl + _o[i + substring_size:]
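One caveat with mutating _o while looping over indices computed from the original length: when the replacement is longer than the substring, later matches shift to the right and can fall outside the loop range. For example, the fixed function above appears to return (1, 'bbXa') for ("aXa", "a", "bb"). A sketch that sidesteps this by scanning the original string once and building the result from pieces is shown below. The name replace_all is hypothetical, and rejecting an empty substring is an assumption made for this illustration.

```python
def replace_all(string, substring, rpl):
    # Assumption for this sketch: an empty substring is not meaningful.
    if not substring:
        raise ValueError("substring must not be empty")
    size = len(substring)
    pieces = []
    count = 0
    i = 0
    while i < len(string):
        if string[i:i + size] == substring:
            # Emit the replacement and jump past the whole match, so the
            # replacement text itself is never re-scanned.
            pieces.append(rpl)
            count += 1
            i += size
        else:
            pieces.append(string[i])
            i += 1
    return count, ''.join(pieces)

print(replace_all("aaabaaa", "aaa", "ddd"))  # (2, 'dddbddd')
print(replace_all("aaabaaa", "aaa", "d"))    # (2, 'dbd')
print(replace_all("aXa", "a", "bb"))         # (2, 'bbXbb')
```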

Explanation of `ch[:prefix_len%len(ch)]` in python program

I am looking at this python program and almost understood its flow but I am unable to understand ch[:prefix_len%len(ch)] in the following part:
else:
    prefix = ch * (prefix_len/len(ch)) + ch[:prefix_len%len(ch)]
    suffix = ch * (suffix_len/len(ch)) + ch[:suffix_len%len(ch)]
Here is the context:
def banner(text, ch='=', length=78):
    if text is None:
        return ch * length
    elif len(text) + 2 + len(ch)*2 > length:
        # Not enough space for even one line char (plus space) around text.
        return text
    else:
        remain = length - (len(text) + 2)
        prefix_len = remain / 2
        suffix_len = remain - prefix_len
        if len(ch) == 1:
            prefix = ch * prefix_len
            suffix = ch * suffix_len
        else:
            prefix = ch * (prefix_len/len(ch)) + ch[:prefix_len%len(ch)]
            suffix = ch * (suffix_len/len(ch)) + ch[:suffix_len%len(ch)]
        return prefix + ' ' + text + ' ' + suffix
Could somebody please help me to understand this. Thank you.
Sure!
ch[:prefix_len % len(ch)] takes a slice of the ch sequence, starting from the beginning (since there's no value before the :) and going up to, but not including, the index given by prefix_len % len(ch).
That index is prefix_len (defined earlier as the length of the prefix, not surprisingly) modulo the length of ch. (Think of it as the remainder left over after the integer division prefix_len / len(ch).)
I ran the function like: print(banner("Hello everyone!", "1234")) and got:
123412341234123412341234123412 Hello everyone! 1234123412341234123412341234123
so you can see it's fitting the ch value (1234 in my case) in the space it has.
They're adding the remainder.
Say prefix_len = 10 and ch = '#&+'.
If you just multiply ch by prefix_len / len(ch), which is 3 under integer division, you'll get 9 characters, but you know you need 10.
So ch[:prefix_len % len(ch)] is just indexing into ch string for the remainder.
Make sense?
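The "whole repeats plus remainder" idea can be tried on its own. The sketch below is not part of the banner program: fill is a hypothetical helper, and it uses divmod so that it also runs on Python 3, where / on integers returns a float and cannot be used to repeat a string.

```python
def fill(ch, width):
    # whole: how many complete copies of ch fit into width
    # extra: how many leftover characters are still needed
    whole, extra = divmod(width, len(ch))
    return ch * whole + ch[:extra]

print(fill('#&+', 10))   # #&+#&+#&+#
print(fill('1234', 30))  # 123412341234123412341234123412
```

The second call reproduces the 30-character prefix from the banner("Hello everyone!", "1234") run above.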

Shift cipher in Python: error using ord

I want to replace each character of a string by a different one, shifted over in the alphabet. I'm shifting by 2 in the example below, so a -> c, b -> d, etc.
I'm trying to use a regular expression and the sub function to accomplish this, but I'm getting an error.
This is the code that I have:
import re
p = re.compile(r'(\w)')
test = p.sub(chr(ord('\\1') + 2), text)
print(test)
where the variable text is an input string.
And I'm getting this error:
TypeError: ord() expected a character, but string of length 2 found
I think the problem is that the ord function is being called on the literal string "\1" and not on the \w character matched by the regular expression. What is the right way to do this?
It won't work like this. Python first evaluates chr(ord('\\1') + 2) and only then passes that result to p.sub.
You need to put it in a separate function or use an anonymous function (lambda):
p = re.compile(r'(\w)')
test = p.sub(lambda m: chr(ord(m.group(1)) + 2), text)
print(test)
Or, better yet, use maketrans instead of regular expressions (this is the Python 2 string module API):
import string
shift = 2
t = string.maketrans(string.ascii_lowercase, string.ascii_lowercase[shift:] +
                     string.ascii_lowercase[:shift])
string.translate(text, t)
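On Python 3, string.maketrans and string.translate no longer exist. The equivalent is the static method str.maketrans together with the str.translate method. A minimal sketch, assuming only lowercase ASCII letters should be shifted (shift_lowercase is a made-up name):

```python
import string

def shift_lowercase(text, shift=2):
    lower = string.ascii_lowercase
    # Map each lowercase letter to the letter `shift` places later,
    # wrapping around at the end of the alphabet.
    table = str.maketrans(lower, lower[shift:] + lower[:shift])
    return text.translate(table)

print(shift_lowercase('abc xyz'))  # cde zab
```

Characters that are not in the table, such as spaces, digits and uppercase letters, pass through unchanged.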
Full version
import string
def shouldShift(char):
    return char in string.ascii_lowercase
def caesarShift(string, n):
    def letterToNum(char):
        return ord(char)-ord('a')
    def numToLetter(num):
        return chr(num+ord('a'))
    def shiftByN(char):
        return numToLetter((letterToNum(char)+n) % 26)
    return ''.join((shiftByN(c) if shouldShift(c) else c) for c in string.lower())
One-liner
If you really want a one-liner, it would be this, but I felt it was uglier:
''.join(chr((ord(c)-ord('a')+n)%26 + ord('a')) for c in string)
Demo
>>> caesarShift(string.ascii_lowercase, 3)
'defghijklmnopqrstuvwxyzabc'
Try this, using a generator expression:
input = 'ABC'
''.join(chr(ord(c)+2) for c in input)
> 'CDE'
It's simpler than using regular expressions.
def CaesarCipher(s1,num):
    new_str = ''
    for i in s1:
        asc_V = ord(i)
        if asc_V in range(65, 91):
            if asc_V + num > 90:
                asc_val = 65 + (num - 1 - (90 - asc_V))
            else:
                asc_val = asc_V + num
            new_str = new_str + chr(asc_val)
        elif (asc_V in range(97, 123)):
            if asc_V + num > 122:
                asc_val = 97 + (num - 1 - (122 - asc_V))
            else:
                asc_val = asc_V + num
            new_str = new_str + chr(asc_val)
        else:
            new_str = new_str + i
    return new_str
print (CaesarCipher("HEllo", 4))
print (CaesarCipher("xyzderBYTE", 2))
