Avoiding Python Off-by-One Error in RLE Algorithm


EDIT: there's more wrong with this than just an off-by-one error, it seems.
I've got an off-by-one error in the following simple algorithm which is supposed to display the count of letters in a string, along the lines of run-length encoding.
I can see why the last character is not added to the result string, but if I increase the range of i I get index out of range for obvious reasons.
I want to know what the conceptual issue is here from an algorithm design perspective, as well as just getting my code to work.
Do I need some special case code to handle the last item in the original string? Or maybe it makes more sense to be comparing the current character with the previous character, although that poses a problem at the beginning of the algorithm?
Is there a general approach to this kind of algorithm, where current elements are compared to previous/next elements, which avoids index out of range issues?
def encode(text):
    # stores output string
    encoding = ""
    i = 0
    while i < len(text) - 1:
        # count occurrences of character at index i
        count = 1
        while text[i] == text[i + 1]:
            count += 1
            i += 1
        # append current character and its count to the result
        encoding += text[i] + str(count)
        i += 1
    return encoding

text = "Hello World"
print(encode(text))
# Gives H1e1l2o1 1W1o1r1l1

You're right: the external loop should be while i < len(text), so that the last character is processed when it differs from the previous one (d in your case).
With that change your algorithm is essentially fine, but it will crash while counting occurrences of the last character, because at that point text[i+1] is out of range.
To solve this, just add a safety check in the internal loop: while i+1 < len(text)
def encode(text):
    # stores output string
    encoding = ""
    i = 0
    while i < len(text):
        # count occurrences of character at index i
        count = 1
        # FIX: check that we did not reach the end of the string
        # while looking for occurrences
        while i+1 < len(text) and text[i] == text[i + 1]:
            count += 1
            i += 1
        # append current character and its count to the result
        encoding += text[i] + str(count)
        i += 1
    return encoding

text = "Hello World"
print(encode(text))
# Gives H1e1l2o1 1W1o1r1l1d1

If you keep your strategy, you'll have to check i+1 < len(text).
This gives something like:
def encode(text):
    L = len(text)
    start = 0
    encoding = ''
    while start < L:
        c = text[start]
        stop = start + 1
        while stop < L and text[stop] == c:
            stop += 1
        encoding += c + str(stop - start)
        start = stop
    return encoding
Another way to do things is to remember the start of each run:
def encode2(text):
    start = 0
    encoding = ''
    for i,c in enumerate(text):
        if c != text[start]:
            encoding += text[start] + str(i-start)
            start = i
    if text:
        encoding += text[start] + str(len(text)-start)
    return encoding
This allows you to just enumerate the input which feels more pythonic.
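On the more general question of comparing each element with its neighbour without indexing past either end, one common pattern is to compare with the previous element and flush the last run after the loop. The sketch below is only an illustration of that pattern; the names runs and encode_runs are hypothetical and do not come from the answers above.

```python
def runs(text):
    """Yield (character, run_length) pairs for consecutive equal characters."""
    previous = None   # nothing seen yet; count == 0 marks the "start" case
    count = 0
    for char in text:
        if count and char == previous:
            count += 1
        else:
            if count:                  # close the run that just ended
                yield previous, count
            previous, count = char, 1
    if count:                          # flush the final run after the loop
        yield previous, count


def encode_runs(text):
    # Join each run as "<char><count>", matching the format used above.
    return "".join(f"{char}{count}" for char, count in runs(text))


print(encode_runs("Hello World"))  # H1e1l2o1 1W1o1r1l1d1
```

Because the loop never looks at text[i + 1] or text[i - 1], there is no index to go out of range; the only special cases are "no run open yet" and "one run still open when the input ends".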

Related

Count the number of characters in a string and displaying the frequency count

Word Problem:
Create a function that applies a compression technique to a string and returns the resultant compressed string. More formally, a block is a substring of identical symbols that is as long as possible. A block will be represented in compressed form as the length of the block followed by the symbol in that block. The encoding of a string is the representation of each block in the string in the order in which they appear in the string. Given a sequence of characters, write a program to encode them in this format.
Example Input:
print(rleEncode('WWWWWWWBWWWWWWWBBW'))
Example Output:
'W7B1W7B2W1'
So far, I have created a counter and a for loop that goes through every character in the string, but I don't know how to finish it:
def rleEncode(s: str) -> str:
    count = 0
    index = ""
    for i in range(len(s)):
        if s[i] in index:
            index.append(i)
            count += 1
    return count, index

I think this is probably what you're looking for. In pure Python:
from itertools import groupby
s = '...your string....'
ans = ''
for k, g in groupby(s):
    ans += k + str(len(list(g)))
print(ans)
# prints W7B1W7B2W1 for s = 'WWWWWWWBWWWWWWWBBW'
Here is another solution, written as a pure function without using itertools.groupby. As you can see, it takes more lines of code and some extra logic to decide where each count starts and stops.
def encode(s: str) -> str:
    if not s:                      # guard: s[0] below would fail on an empty string
        return ''
    count = 1
    res = ''
    # the first character
    res += s[0]
    # loop, skipping last one
    for i, char in enumerate(s[:-1]):
        if s[i] == s[i+1]:         # current == next char
            count += 1             # increment count
        else:                      # char changing
            if count >= 1:
                res += str(count)  # convert int to string and add
            res += s[i+1]
            count = 1              # reset the count
    # finally the last one
    if count >= 1:                 # the last run, even if it is a single char
        res += str(count)
    return res

print(encode('WWWWWWWBWWWWWWWBBW'))  # W7B1W7B2W1
print(encode('ABBA'))                # A1B2A1
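One way to sanity-check any of the encoders above is to invert the result and compare it with the input. The sketch below assumes the symbol-then-count format shown in the example output ('W7B1...') and that the symbols themselves are never digits; rle_decode is a hypothetical helper name, not part of the exercise.

```python
import re


def rle_decode(encoded: str) -> str:
    """Expand 'W7B1' style text: one non-digit symbol followed by its count."""
    pairs = re.findall(r"(\D)(\d+)", encoded)   # list of (symbol, count) tuples
    return "".join(symbol * int(count) for symbol, count in pairs)


original = "WWWWWWWBWWWWWWWBBW"
print(rle_decode("W7B1W7B2W1") == original)  # True
```

If the input may contain digits, this format becomes ambiguous and a different encoding (for example, a list of pairs) would be needed.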

Python 3 Vigenère decoder issue: spaces and punctuation are output correctly, but the decoder stops working after a punctuation mark

I'm working on a Vigenère decoder in Python 3. Punctuation marks are output correctly, but after one has been processed the decoding is thrown off. I assume this is because the wrong index values are being compared (between the encoded message string and the repeated key phrase string).
To get around this I thought of including a punctuation counter as a correction, but I do not know where in the maths of my nested if statements to apply it, or whether a punctuation counter is even a fix that will work with the logic I am using.
I'm only working with basic tools in Python, as this is supposed to be a challenge based on what I've learnt so far, so simpler or more advanced ways to program a Vigenère decoder might not be something I know, in case anyone sees my attempt and thinks what the hell are they doing!
import string
alphabet = (string.ascii_lowercase)
punctuation = (string.punctuation)

def vignieredecoder(message, key):
    keycompare = ''
    index = 0
    decodedmessage = ''
    #makes key repeat to length of message w/ spaces integrity the same
    for character in message:
        if character == ' ':
            keycompare += ' '
        else:
            keycompare += key[index % len(key)]
            index += 1
    key = keycompare
    # decoding loop
    for characters in range(len(message)):
        punctcount = 0
        if message[characters] in punctuation:
            decodedmessage += message[characters]
            punctcount += 1
        elif message[characters] == ' ':
            decodedmessage += ' '
        else:
            # store index value of character in message, e.g. a = 0, b = 1, etc
            messageletterindexval = alphabet.index(message[characters])
            # store index value of character in keycompare[place of character in message], e.g. if key is friends and first iteration being 0 = f
            keycompareindex = key[characters]
            # access character in alphabet at indexvalue of keycompare, e.g. if key is friends and first iteration of for loop, finds f in alphabet and returns indexval 5
            keycompareindexval = alphabet.index(keycompareindex)
            decodedletterval = messageletterindexval - keycompareindexval
            if decodedletterval < 0:
                decodedletterval += 26
                decodedmessage += alphabet[decodedletterval]
            else:
                decodedmessage += alphabet[decodedletterval]
    return decodedmessage

print(vignieredecoder('dfc aruw fsti gr vjtwhr wznj? vmph otis! cbx swv jipreneo uhllj kpi rahjib eg fjdkwkedhmp!', 'friends' ))
terminal output: you were able to decode this? rzmp jcao! zjs bor wfxmnfab rpgub gcf zvqbeo bo asvgjhmyqel!
expected output: you were able to decode this? nice work! you are becoming quite the expert at crytography!
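The outputs above are consistent with the key-building loop skipping only spaces, so that each punctuation mark also consumes one key letter and everything after it is shifted by the wrong amount. A minimal sketch of one possible fix follows, assuming a lowercase-only message and key; vigenere_decode and key_index are hypothetical names, and the key position advances only when a letter is actually decoded.

```python
import string

ALPHABET = string.ascii_lowercase


def vigenere_decode(message, key):
    """Decode lowercase text; the key advances only on letters."""
    decoded = []
    key_index = 0                       # counts letters consumed, not characters
    for char in message:
        if char in ALPHABET:
            shift = ALPHABET.index(key[key_index % len(key)])
            decoded.append(ALPHABET[(ALPHABET.index(char) - shift) % 26])
            key_index += 1
        else:
            decoded.append(char)        # spaces and punctuation pass through
    return "".join(decoded)


print(vigenere_decode("dfc aruw fsti gr vjtwhr wznj? vmph otis!", "friends"))
# you were able to decode this? nice work!
```

Using % 26 also removes the need for the separate "if the value is negative, add 26" branch, since Python's modulo of a negative number by 26 is already in the range 0 to 25.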

Can you explain me the RLE algorithm code in python

I finally found out how to write an RLE algorithm by watching a tutorial, but the tutorial did not explain a few things in the code. I don't understand why we write j = i instead of j = 0. Since i = 0, isn't it the same?
I don't understand i = j + 1 at the end of the function either. Why not simply i += 1? Is j + 1 what you use when you want to repeat a loop inside a loop?
Is the first while loop supposed to repeat the second while loop until the string is finished?
And finally, why does encoded_message appear twice instead of once? We return encoded_message, so isn't that enough? Couldn't we simply write print(encode(text)) instead of
"print('The encoded message is the output ',encoded_message)" (after storing encode(text) in encoded_message)?
I know I'm asking a lot of questions, but I can't memorize the code without understanding it; that would be useless and unproductive.
def encode(message):
    encoded_message = ""
    i = 0
    while(i<len(message)):
        count = 1
        ch = message[i]
        j = i # ???
        while(j<len(message)-1): # GET IT -----------------------------------------------------------
            if message[j] == message[j+1]: # if the previous and next characters are the same
                count = count + 1 # we increase count variable
                j += 1 # we increase j position
                # GET IT ----------------------------------------------------------------------------
            else:
                break
        encoded_message = encoded_message + str(count) + ch # "" + count converted to string + character (ch)
        i = j + 1 # ???
    return encoded_message

text = input('enter your charcter chain...')
encoded_message = encode(text)
print('The encoded message is the output ',encoded_message)
When I replaced j = i with j = 0, nothing is displayed in the terminal.

There is an outer loop and an inner loop. The outer loop with the variable i starts iterating over the message. The inner loop uses the variable j and starts at the current position of i.
That is: when i=0 then j=0. But when i=5 (for example) then j=5 also.
The inner loop's task is to check whether two or more identical characters follow one another. If they do, j advances past them, and after the inner loop i is set to j + 1, so that each letter of the message is only looked at once.
That is why j should not be set to a constant value. Setting j = 0 would cause the inner loop to start at the beginning of the message on every iteration.
I added two simple print() statements to your code to clarify:
def encode(message):
    encoded_message = ""
    i = 0
    while(i<len(message)):
        print(f'outer loop: i={i}')
        count = 1
        ch = message[i]
        j = i
        while(j<len(message)-1):
            print(f'\tinner loop: j={j}')
            if message[j] == message[j+1]: # if the previous and next characters are the same
                count = count + 1 # we increase count variable
                j += 1 # we increase j position
            else:
                break
        encoded_message = encoded_message + str(count) + ch # "" + count converted to string + character (ch)
        i = j + 1
    return encoded_message

text = 'Hello World'
encoded_message = encode(text)
print('The encoded message is the output ', encoded_message)
(Please note: I do not know the RLE algorithm but just looked at your code.)
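To see why j = 0 produces no output at all, it can help to record the values i takes in the outer loop. The sketch below is a stripped-down, hypothetical variant of the loop structure above (outer_positions, reset_j_to_zero and max_steps are made-up names), with a step cap so that it terminates even when i stops advancing.

```python
def outer_positions(message, reset_j_to_zero=False, max_steps=6):
    """Return the values of i seen by the outer loop, capped at max_steps."""
    seen = []
    i = 0
    while i < len(message) and len(seen) < max_steps:
        seen.append(i)
        j = 0 if reset_j_to_zero else i
        # advance j to the last character of the run that starts at j
        while j < len(message) - 1 and message[j] == message[j + 1]:
            j += 1
        i = j + 1       # continue right after that run
    return seen


print(outer_positions("aab"))                        # [0, 2]
print(outer_positions("aab", reset_j_to_zero=True))  # [0, 2, 2, 2, 2, 2]
```

With j = i the outer loop jumps from one run to the next and finishes. With j = 0 the inner loop always rescans the first run, so i is set to the same value every time; without the cap the loop would never end, which would explain why nothing is printed.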

Python, how to not require an additional space at the end of the list?

I've written a program which can compress a sequence of characters.
def compress(string):
    output = ""
    counter = 1
    firstLoop = True
    for element in range(0, len(string)):
        # if statement checking if current character was last character
        if string[element] == string[element - 1]:
            # if it was, then the character has been written more than one
            # time in a row, so increase counter
            counter = counter + 1
        else:
            # when we detect a new character reset the counter
            # and also record the character and how many times it was repeated
            if not firstLoop:
                output = output + string[element - 1] + str(counter)
            counter = 1
        firstLoop = False
    return output

data = "aaaabbbchhtttttttf"
print(data)
compressedData = compress(data)
print(compressedData)
The program outputs:
aaaabbbchhtttttttf
a4b3c1h2t7
So, it finds that there's '4' entries of 'a' so it writes 'a4', then 'b3' for three entries of b.
The issue is that it forgets about the 'f1' at the end of the string. I know this is because of the line:
output = output + string[element - 1] + str(counter)
Since string[element-1] refers to the position in the string before the current element, thus, it will never reach the final position which is where 'f' is. The program doesn't work without the '-1' since it doesn't write the correct letter.
How can I get around this problem and make it able to include f?
The correct output should be a4b3c1h2t7f1.
Thanks :)
Edit: I forgot to mention that the program works if I include an additional character after the 'f', such as just a blank space. But that's of course because the final character in my string is just a space rather than a letter.

You could do this all with itertools.groupby and sum and avoid all counting and keeping track of indexes:
from itertools import groupby
def compress(string):
    return ''.join(k + str(sum(1 for _ in g)) for k, g in groupby(string))
>>> compress("aaaabbbchhtttttttf")
'a4b3c1h2t7f1'
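For readers who have not used itertools.groupby before, this small sketch shows what it yields for the sample input: one (key, group) pair per run of consecutive equal characters, where the group is an iterator over that run. The variable name pairs is only for illustration.

```python
from itertools import groupby

data = "aaaabbbchhtttttttf"
# Each group is an iterator, so it is turned into a list to measure its length.
pairs = [(key, len(list(group))) for key, group in groupby(data)]
print(pairs)
# [('a', 4), ('b', 3), ('c', 1), ('h', 2), ('t', 7), ('f', 1)]
```

Since groupby closes the final group itself when the input ends, the trailing 'f' needs no special handling.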

You could make it simpler and add a character at the end:
def compress(string):
    output = ""
    counter = 0
    string = string + '|'
    for element in range(0, len(string)):
        # if statement checking if current character was last character
        if string[element] == string[element - 1]:
            # if it was, then the character has been written more than one
            # time in a row, so increase counter
            counter = counter + 1
        elif element != len(string):
            output = output + string[element - 1] + str(counter)
            counter = 1
    return output[2:]

data = "aaaabbbchhtttttttf"
print(data)
compressedData = compress(data)
print(compressedData)

def compress(string):
    output = ""
    counter = 1
    for element in range(1, len(string)):
        # if statement checking if current character was last character
        if string[element] == string[element - 1]:
            # if it was, then the character has been written more than one
            # time in a row, so increase counter
            counter = counter + 1
        else:
            # when we detect a new character reset the counter
            # and also record the character and how many times it was repeated
            output = output + string[element - 1] + str(counter)
            counter = 1
    return output + string[-1] + str(counter)
Also note that you need to start counting from 1, not 0, and get rid of firstLoop.

Try changing the loop to for element in range(0, len(string) + 1) and adding an extra if condition:
for element in range(0, len(string) + 1):
    if element == len(string):
        output = output + string[element-1] + str(counter)
    # if statement checking if current character was last character
    elif string[element] == string[element - 1]: ...

In the spirit of fixing your code: you just needed to add the element to output first, before adding the counter on a change. You can use a neat trick, an else clause on the for loop, which runs once the loop finishes without a break and here adds the final counter for f. No need to buffer or import anything special; you were fairly close:
def compress(string):
    output = ""
    counter = 0
    firstLoop = True
    for i in range(len(string)):
        # if statement checking if current character was last character
        if firstLoop:
            counter += 1
            output += string[i]
        else:
            if string[i] == string[i - 1]:
                counter += 1
            else:
                output += str(counter) + string[i]
                counter = 1
        firstLoop = False
    else:
        output += str(counter)
    return output

data = "aaaabbbchhtttttttf"
print(data)
compressedData = compress(data)
print(compressedData)
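The else clause on a for loop is easy to misread, so here is a small separate illustration of its behaviour: the else block runs when the loop finishes without hitting break, and is skipped when break is executed. The function find_first_digit is a hypothetical example, unrelated to the compression code.

```python
def find_first_digit(text):
    for char in text:
        if char.isdigit():
            result = char
            break             # leaving via break skips the else block
    else:
        result = None         # runs only when the loop was not broken out of
    return result


print(find_first_digit("ab3c"))  # 3
print(find_first_digit("abc"))   # None
```

In the compress function above there is no break, so the else block always runs once after the last iteration; placing the same statement directly after the loop would behave identically.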

How do I go about ending this loop?

I am trying to count the longest length of string in alphabetical order
s = 'abcv'
longest = 1
current = 1
for i in range (len(s) - 1):
    if s[i] <= s[i+1]:
        current += 1
    else:
        if current > longest:
            longest = current
        current = 0
    i += 1
print longest
For this specific string, 'Current' ends up at the correct length, 4, but never modifies longest.
EDIT: The following code now runs into an error
s = 'abcv'
current = 1
biggest = 0
for i in range(len(s) - 1):
    while s[i] <= s[i+1]:
        current += 1
        i += 1
    if current > biggest:
        biggest = current
    current = 0
print biggest
It seems my logic is correct , but I run into errors for certain strings. :(
Although code sources are available on the internet which print the longest string, I can't seem to find how to print the longest length.

break jumps to the first statement after the loop (at the same indentation as the for statement). continue jumps back to the start of the loop and begins the next iteration.
Your logic in the else: statement does not work - you need to indent it one less.
if s[i] <= s[i+1]:
checks whether the current character is less than or equal to the next character. If this is the case, you need to increment your internal counter and update longest if the counter is now larger.
You might get into trouble with if s[i] <= s[i+1]:. In the for loop on its own it is safe: for "jfjfjf" (length 6), range(len(s) - 1) yields 0 to 4, so the last comparison is s[4] with s[5]. In your edited version, though, the inner while keeps incrementing i with no bounds check; for 'abcv' it reaches i = 3 and then reads s[4], which is more than there are items.
A different approach, without going over explicit indexes and split into two responsibilities (get the list of alphabetical substrings, then order them longest first):
# split string into list of substrings that internally are alphabetically ordered (<=)
def getAlphabeticalSplits(s):
    result = []
    temp = ""
    for c in s:  # just use all characters in s
        # if temp is empty or the last char in it is less/equal to current char
        if temp == "" or temp[-1] <= c:
            temp += c  # append it to the temp substring
        else:
            result.append(temp)  # else add it to the list of substrings
            temp = c  # and start the next substring with the current char
    if temp:
        result.append(temp)  # do not forget the last substring
    # done with all chars, return list of substrings
    return result

# return the split list as a copy after sorting reverse by length
def SortAlphSplits(sp, rev = True):
    return sorted(sp, key=lambda x: len(x), reverse=rev)

splitter = getAlphabeticalSplits("akdsfabcdemfjklmnopqrjdhsgt")
print(splitter)
sortedSplitter = SortAlphSplits(splitter)
print (sortedSplitter)
print(len(sortedSplitter[0]))
Output:
['ak', 'ds', 'f', 'abcdem', 'fjklmnopqr', 'j', 'dhs', 'gt']
['fjklmnopqr', 'abcdem', 'dhs', 'ak', 'ds', 'gt', 'f', 'j']
10
This one returns the list of splits and sorts them by length, descending. In a memory-critical environment this costs more than your approach: you only keep a few numbers, whereas this approach fills a list and copies it into a sorted one.
To solve the index problem in your code, change your logic slightly:
Start at the second character and test whether the one before it is less than or equal to it. That way you only ever compare the current character with the one before it, and never index past the end.
s = 'abcvabcdefga'
current = 1  # a non-empty string always starts with a run of length 1
biggest = 1
for i in range(1,len(s)):  # compares the index[1] with [0] , 2 with 1 etc
    if s[i] >= s[i-1]:  # this char is bigger/equal last char
        current += 1
        biggest = max(current,biggest)
    else:
        current = 1
print(biggest)

You have to edit out the else statement. Because consider the case where the current just exceeds longest, i.e, from current = 3 and longest =3 , current becomes 4 by incrementing itself. Now here , you still want it to go inside the if current > longest statement
s = 'abcv'
longest = 1
current = 1
for i in range (len(s) - 1):
if s[i] <= s[i+1]:
current += 1
#else:
if current > longest:
longest = current
current = 0
i += 1
longest = current
print longest

Use a while loop with an explicit condition; then you can easily define the condition under which your loop is done.
If you want code quality for the long term:
A while loop is better practice than a break, because you see the looping condition in one place. A bare break is often harder to spot in the middle of the loop body.

At the end of the loop, current is the length of the last substring in ascending order. Assigning it to longest is not right as the last substring in ascending is not necessarily the longest.
So longest=max(current,longest) instead of longest=current after the loop, should solve it for you.
Edit: ^ was for before the edit. You just need to add longest=max(current,longest) after the for loop, for the same reason (the last ascending substring is not considered). Something like this:
s = 'abcv'
longest = 1
current = 1
for i in range (len(s) - 1):
    if s[i] <= s[i+1]:
        current += 1
    else:
        if current > longest:
            longest = current
        current = 1  # the next character starts a new run of length 1
longest=max(current,longest) #extra
print(longest)
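The same idea can be written without explicit indexes by pairing each character with its successor. This is a sketch rather than any poster's code; longest_nondecreasing_run is a hypothetical name, and an empty string is assumed to give 0.

```python
def longest_nondecreasing_run(s):
    """Length of the longest run where each character is >= the one before it."""
    if not s:
        return 0
    longest = current = 1
    # zip stops at the shorter input, so the pairs never run past the end
    for previous, char in zip(s, s[1:]):
        current = current + 1 if previous <= char else 1
        longest = max(longest, current)
    return longest


print(longest_nondecreasing_run("abcv"))  # 4
print(longest_nondecreasing_run("bacd"))  # 3
```

Updating longest on every step means the last run is already accounted for when the loop ends, so no extra max call is needed afterwards.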

The loop ends when there is no code after the tab space so technically your loop has already ended
