Can you explain the RLE algorithm code in Python?

I've finally found out how to write an RLE algorithm by watching a tutorial, but the tutorial didn't explain some things in the code. I didn't get why we write j = i instead of j = 0 (knowing that i = 0). Isn't it the same?
I didn't get i = j + 1 either. Why i = j + 1 at the end of the function? Why not simply i += 1? Or, if we want to repeat a loop inside a loop, do we use j + 1?
Is the first while loop supposed to repeat the second while loop until the string is finished?
And finally, why does encoded_message appear twice instead of once? We return encoded_message, so isn't that enough? Can't we simply do print(encode(text)) instead of
"print('The encoded message is the output ',encoded_message)" (after storing encode(text) in encoded_message)?
I know I'm asking a lot of questions, but I just can't memorize the code without understanding it; that would be totally useless and unproductive.
def encode(message):
    encoded_message = ""
    i = 0
    while i < len(message):
        count = 1
        ch = message[i]
        j = i  # ???
        while j < len(message) - 1:  # I get this part
            if message[j] == message[j+1]:  # if the current and next characters are the same
                count = count + 1  # we increase count variable
                j += 1  # we increase j position
            else:
                break
        encoded_message = encoded_message + str(count) + ch  # "" + count converted to string + character (ch)
        i = j + 1  # ???
    return encoded_message

text = input('enter your character chain...')
encoded_message = encode(text)
print('The encoded message is the output ', encoded_message)
When I replaced j = i with j = 0, nothing was displayed in the terminal.

There is an outer loop and an inner loop. The outer loop uses the variable i and iterates over the message. The inner loop uses the variable j and starts at the current position of i.
That is: when i=0 then j=0. But when i=5 (for example) then j=5 also.
The inner loop's task is to check whether two or more identical characters follow one another. If they do, j advances past them, and after the inner loop i is set to j + 1, so that each run in the message is only encoded once.
That is why j should not be set to a constant value. Setting it to j=0 would make the inner loop start at the beginning of the message on every iteration; i = j + 1 can then set i to the same position again and again, so the outer loop may never finish, which is why nothing is displayed.
I added two simple print() statements to your code to clarify:
def encode(message):
    encoded_message = ""
    i = 0
    while i < len(message):
        print(f'outer loop: i={i}')
        count = 1
        ch = message[i]
        j = i
        while j < len(message) - 1:
            print(f'\tinner loop: j={j}')
            if message[j] == message[j+1]:  # if the current and next characters are the same
                count = count + 1  # we increase count variable
                j += 1  # we increase j position
            else:
                break
        encoded_message = encoded_message + str(count) + ch  # "" + count converted to string + character (ch)
        i = j + 1
    return encoded_message

text = 'Hello World'
encoded_message = encode(text)
print('The encoded message is the output ', encoded_message)
(Please note: I do not know the RLE algorithm but just looked at your code.)
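To see what happens with j = 0, it can help to run that variant with a safety limit. The following is only a minimal sketch, not part of the tutorial: the function name encode_with_j_reset and the max_steps guard are made up for illustration. It records the value of i after each pass of the outer loop.

```python
def encode_with_j_reset(message, max_steps=5):
    """Same logic as encode(), but with j = 0 instead of j = i.

    max_steps is a hypothetical guard so the demo cannot loop forever.
    Returns (encoded_message or None if the guard was hit, list of i values).
    """
    encoded_message = ""
    i = 0
    steps = 0
    trace = []
    while i < len(message):
        steps += 1
        if steps > max_steps:
            return None, trace  # gave up: the outer loop is not progressing
        count = 1
        ch = message[i]
        j = 0  # the change under test: always restart at the beginning
        while j < len(message) - 1:
            if message[j] == message[j + 1]:
                count += 1
                j += 1
            else:
                break
        encoded_message += str(count) + ch
        i = j + 1
        trace.append(i)
    return encoded_message, trace


result, trace = encode_with_j_reset("aab")
print(result, trace)
```

With "aab", i is set to 2 on every pass, so the outer loop never reaches the end of the string. Without the guard, the program would keep looping and never get to the print call.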

Related

Avoiding Python Off-by-One Error in RLE Algorithm

EDIT: there's more wrong with this than just an off-by-one error, it seems.
I've got an off-by-one error in the following simple algorithm which is supposed to display the count of letters in a string, along the lines of run-length encoding.
I can see why the last character is not added to the result string, but if I increase the range of i I get index out of range for obvious reasons.
I want to know what the conceptual issue is here from an algorithm design perspective, as well as just getting my code to work.
Do I need some special case code to handle the last item in the original string? Or maybe it makes more sense to be comparing the current character with the previous character, although that poses a problem at the beginning of the algorithm?
Is there a general approach to this kind of algorithm, where current elements are compared to previous/next elements, which avoids index out of range issues?
def encode(text):
    # stores output string
    encoding = ""
    i = 0
    while i < len(text) - 1:
        # count occurrences of character at index i
        count = 1
        while text[i] == text[i + 1]:
            count += 1
            i += 1
        # append current character and its count to the result
        encoding += text[i] + str(count)
        i += 1
    return encoding

text = "Hello World"
print(encode(text))
# Gives H1e1l2o1 1W1o1r1l1

You're right, you should have while i < len(text) for the outer loop, so that the last character is processed when it differs from the previous one (d in your case).
Your algorithm is then broadly fine, but it will crash when looking for occurrences of the last character: at that point, text[i+1] is out of range.
To solve this, just add a safety check in the inner loop: while i+1 < len(text)
def encode(text):
    # stores output string
    encoding = ""
    i = 0
    while i < len(text):
        # count occurrences of character at index i
        count = 1
        # FIX: check that we did not reach the end of the string
        # while looking for occurrences
        while i+1 < len(text) and text[i] == text[i + 1]:
            count += 1
            i += 1
        # append current character and its count to the result
        encoding += text[i] + str(count)
        i += 1
    return encoding

text = "Hello World"
print(encode(text))
# Gives H1e1l2o1 1W1o1r1l1d1

If you keep your strategy, you'll have to check i+1 < len(text).
This gives something like:
def encode(text):
    L = len(text)
    start = 0
    encoding = ''
    while start < L:
        c = text[start]
        stop = start + 1
        while stop < L and text[stop] == c:
            stop += 1
        encoding += c + str(stop - start)
        start = stop
    return encoding
Another way to do things is to remember the start of each run:
def encode2(text):
    start = 0
    encoding = ''
    for i, c in enumerate(text):
        if c != text[start]:
            encoding += text[start] + str(i-start)
            start = i
    if text:
        encoding += text[start] + str(len(text)-start)
    return encoding
This allows you to just enumerate the input, which feels more Pythonic.
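As a quick sanity check, the two versions above can be run side by side on a few edge cases (empty string, single character, a run at the end). This is just a sketch that repeats both functions so it runs on its own; the sample inputs are arbitrary choices.

```python
def encode(text):
    """Index-based version: scan forward to the end of each run."""
    length = len(text)
    start = 0
    encoding = ''
    while start < length:
        c = text[start]
        stop = start + 1
        while stop < length and text[stop] == c:
            stop += 1
        encoding += c + str(stop - start)
        start = stop
    return encoding


def encode2(text):
    """enumerate-based version: remember where the current run started."""
    start = 0
    encoding = ''
    for i, c in enumerate(text):
        if c != text[start]:
            encoding += text[start] + str(i - start)
            start = i
    if text:
        # flush the final run, which the loop never emits
        encoding += text[start] + str(len(text) - start)
    return encoding


for sample in ["", "a", "aaa", "abb", "Hello World"]:
    assert encode(sample) == encode2(sample)
    print(repr(sample), '->', repr(encode(sample)))
```

Both approaches need some handling for the last run: the first through the stop < length check, the second through the final if text block.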

How to fix a String index out of range exception in Python

There is some issue with my python code. I am making a program that finds the occurrences of the letter A in a word and if that letter is found and the next letter is not the letter A the A is swapped with the next letter.
As an example TAN being TNA but WHOA staying as WHOA
AARDVARK being ARADVRAK
The issue is that when I input ABRACADABRA I get a string index out of range exception. Before I had that exception, the word was printed as BRACADABRI. I'm not sure why, or whether I have to add another loop to my program.
If you also have any more efficient way to write this code than the way I have, please let me know!
def scrambleWord(userInput):
    count = 0
    scramble = ''
    while count < len(userInput):
        if userInput[count] =='A' and userInput[count+1] != 'A':
            scramble+= userInput[count+1] + userInput[count]
            count+=2
        elif userInput[count] != 'A':
            scramble += userInput[count]
            count+=1
    if count < len(userInput):
        scramble += userInput(len(userInput)-1)
    return scramble
    #if a is found switch the next letter index with a's index

def main():
    userInput = input("Enter a word: ")
    finish = scrambleWord(userInput.upper())
    print(finish)

main()

When you get to the end of the string and the last character is an 'A', your program asks for the next character, which is off the end of the string.
Change the loop so it doesn't include the last character:
    while count < len(userInput)-1:
        if ...

You can modify your code as below:
def scrambleWord(userInput):
    count = 0
    scramble = ''
    while count < len(userInput):
        if count < len(userInput)-1 and userInput[count] =='A' and userInput[count+1] != 'A':
            scramble+= userInput[count+1] + userInput[count]
            count+=2
        else:
            scramble += userInput[count]
            count+=1
    return scramble
Your code does not check the condition count < len(userInput)-1 before it looks at the letter after an 'A' to swap them, which is why it throws the string index out of range exception.
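The bounds check described above can be tried against the examples from the question. This is a minimal sketch: the name scramble_word and the has_next variable are illustrative choices, and the expected behaviour is taken from the TAN, WHOA and AARDVARK examples in the question.

```python
def scramble_word(word):
    """Swap each 'A' with the following letter unless that letter is also 'A'."""
    result = ''
    i = 0
    while i < len(word):
        # Only look at word[i + 1] when it actually exists.
        has_next = i + 1 < len(word)
        if has_next and word[i] == 'A' and word[i + 1] != 'A':
            result += word[i + 1] + word[i]
            i += 2  # both letters are consumed by the swap
        else:
            result += word[i]
            i += 1
    return result


for word in ["TAN", "WHOA", "AARDVARK", "ABRACADABRA"]:
    print(word, '->', scramble_word(word))
```

Because has_next is tested first, Python's short-circuiting and never evaluates word[i + 1] when i is the last index, so a trailing 'A' is simply copied.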

The issue arises in your code when the last character of the input is 'A'.
This is because the first if in the loop tries to access the character at count + 1 during the last iteration.
Since there's no character at that position, you get an index error.
The simplest solution would be to add a separate if condition for that case.
The updated while loop might look like this:
# while start
while count < len_:  # len_ is length of input
    if count + 1 >= len_:
        break  # break out of the loop, copy last character
    current = inp[count]
    next_ = inp[count + 1]
    if current == 'A':
        op += (next_ + current)  # op is result
        count += 1
    else:
        op += current
    # increment counter by 1
    count += 1
# rest of the code after while is same
Another small issue in your code: when copying the last character (after the loop ends), you should use [ ] instead of ( ) to index the last character of the input string.

Just for fun :
from functools import reduce

def main():
    word = input("Enter a word: ").lower()
    scramble = reduce((lambda x,y : x[:-1]+y+'A' \
                       if (x[-1]=='a' and y!=x[-1]) \
                       else x+y),word)
    print(scramble.upper())

main()

Python, how to not require an additional space at the end of the list?

I've written a program which can compress a sequence of characters.
def compress(string):
    output = ""
    counter = 1
    firstLoop = True
    for element in range(0, len(string)):
        # if statement checking if current character was last character
        if string[element] == string[element - 1]:
            # if it was, then the character has been written more than one
            # time in a row, so increase counter
            counter = counter + 1
        else:
            # when we detect a new character reset the counter
            # and also record the character and how many times it was repeated
            if not firstLoop:
                output = output + string[element - 1] + str(counter)
            counter = 1
        firstLoop = False
    return output

data = "aaaabbbchhtttttttf"
print(data)
compressedData = compress(data)
print(compressedData)
The program outputs:
aaaabbbchhtttttttf
a4b3c1h2t7
So, it finds that there's '4' entries of 'a' so it writes 'a4', then 'b3' for three entries of b.
The issue is that it forgets about the 'f1' at the end of the string. I know this is because of the line:
output = output + string[element - 1] + str(counter)
Since string[element-1] refers to the position in the string before the current element, thus, it will never reach the final position which is where 'f' is. The program doesn't work without the '-1' since it doesn't write the correct letter.
How can I get around this problem and make it able to include f?
The correct output should be a4b3c1h2t7f1.
Thanks :)
Edit: I forgot to mention that the program works if I include an additional character after the 'f', such as just a blank space. But that's of course because the final character in my string is just a space rather than a letter.

You could do this all with itertools.groupby and sum and avoid all counting and keeping track of indexes:
from itertools import groupby

def compress(string):
    return ''.join(k + str(sum(1 for _ in g)) for k, g in groupby(string))

>>> compress("aaaabbbchhtttttttf")
'a4b3c1h2t7f1'
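If the one-liner is hard to read, it may help to look at what itertools.groupby produces before the pieces are joined. A small sketch follows; the helper name runs is just for illustration.

```python
from itertools import groupby


def runs(text):
    """Return (character, run_length) pairs for consecutive equal characters."""
    # groupby yields one (key, group_iterator) pair per run of equal items;
    # the group iterator has to be consumed to count its items.
    return [(char, len(list(group))) for char, group in groupby(text)]


print(runs("aaabcc"))
```

groupby only groups adjacent equal items, which is exactly what run-length encoding needs, and the last run is included without any special case.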

You could make it simpler and add a character at the end:
def compress(string):
    output = ""
    counter = 0
    string = string + '|'
    for element in range(0, len(string)):
        # if statement checking if current character was last character
        if string[element] == string[element - 1]:
            # if it was, then the character has been written more than one
            # time in a row, so increase counter
            counter = counter + 1
        elif element != len(string):
            output = output + string[element - 1] + str(counter)
            counter = 1
    return output[2:]

data = "aaaabbbchhtttttttf"
print(data)
compressedData = compress(data)
print(compressedData)

def compress(string):
    output = ""
    counter = 1
    for element in range(1, len(string)):
        # if statement checking if current character was last character
        if string[element] == string[element - 1]:
            # if it was, then the character has been written more than one
            # time in a row, so increase counter
            counter = counter + 1
        else:
            # when we detect a new character reset the counter
            # and also record the character and how many times it was repeated
            output = output + string[element - 1] + str(counter)
            counter = 1
    return output + string[-1] + str(counter)
Also note that you need to start the loop from 1, not 0, and get rid of firstLoop.

Try changing the loop to for element in range(0, len(string) + 1) and adding an extra if condition:
for element in range(0, len(string) + 1):
    if element == len(string):
        output = output + string[element-1] + str(counter)
    # if statement checking if current character was last character
    elif string[element] == string[element - 1]: ...

In the spirit of fixing your code: you just need to add the element to output first, and then add the counter when the character changes. You can use a neat trick, an else clause on the for loop, which runs once the loop has finished and appends the final counter for f. No need to buffer or import anything special; you were fairly close:
def compress(string):
    output = ""
    counter = 0
    firstLoop = True
    for i in range(len(string)):
        # if statement checking if current character was last character
        if firstLoop:
            counter += 1
            output += string[i]
        else:
            if string[i] == string[i - 1]:
                counter += 1
            else:
                output += str(counter) + string[i]
                counter = 1
        firstLoop = False
    else:
        output += str(counter)
    return output

data = "aaaabbbchhtttttttf"
print(data)
compressedData = compress(data)
print(compressedData)
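For readers who have not met the else clause on a for loop: it runs when the loop finishes without hitting break, including when the loop body never runs at all. A tiny sketch with a made-up function, unrelated to the compression code, shows both cases.

```python
def find_first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            result = n
            break  # leaving via break skips the else clause
    else:
        # reached only when the loop ran to completion (or the list was empty)
        result = None
    return result


print(find_first_negative([1, -2, 3]))
print(find_first_negative([1, 2, 3]))
```

In the compress function above there is no break, so the else clause always runs after the loop; placing the same statement directly after the loop would behave the same way.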

My code when run prints the output before it. I want it to print only what is required of that specific count number

Why does my code print the earlier output again on each pass of the while loop? I want to print only the count number and the encrypted text for that count.

Put list_text2 inside your loop.
while count < 26:
    list_text2 = []
    for letter in list_text1:
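The suggestion above, re-creating the list at the start of each pass, can be sketched as follows. The asker's code is not shown here, so everything in this example is an assumption: the function name caesar_all_shifts, the shifted variable, and the idea that the program applies a Caesar-style shift for each count.

```python
import string


def caesar_all_shifts(text):
    """Return a list of (shift, shifted_text) pairs for shifts 1 to 25."""
    alphabet = string.ascii_lowercase
    results = []
    for shift in range(1, 26):
        shifted = []  # reset on every pass, so earlier output is not carried over
        for letter in text.lower():
            if letter in alphabet:
                # wrap around the end of the alphabet with modulo
                shifted.append(alphabet[(alphabet.index(letter) + shift) % 26])
            else:
                shifted.append(letter)  # leave spaces and punctuation unchanged
        results.append((shift, ''.join(shifted)))
    return results


for shift, encrypted in caesar_all_shifts("abc xyz")[:3]:
    print(shift, encrypted)
```

If the list were created once before the outer loop instead, each pass would append to the previous results, which matches the symptom described in the question.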

I'm assuming that you're appending to list_text2 each time for a reason, so rather than moving it, the following code just slices the latest update off the tail end of it and prints only that part:
import string

text_to_encrypt = input('Please enter some text to encrypt: ')
offset = list(range(1,27))
alphabet = list(string.ascii_lowercase)
list_text2 = []
encrypted_text = ''
list_text1 = list(text_to_encrypt.lower())
adjusted_list = 2*alphabet
count = 1
while count < 26:
    for letter in list_text1:
        if letter not in alphabet:
            list_text2 += letter
        else:
            index = adjusted_list.index(letter)
            list_text2 += adjusted_list[index + offset[count]]
    list_text2_len = len(list_text2)
    slice_start_index = list_text2_len - len(text_to_encrypt)
    print(''.join(list_text2[slice_start_index:list_text2_len]))
    encrypted_text = ''.join(list_text2)
    count += 1

How to count specific substrings using slice notation

I want to count the number of occurrences of the substring "bob" within the string s. I do this exercise for an edX Course.
s = 'azcbobobegghakl'
counter = 0
numofiterations = len(s)
position = 0
#loop that goes through the string char by char
for iteration in range(numofiterations):
    if s[position] == "b":  # search pos. for starting point
        if s[position+1:position+2] == "ob":  # check if complete
            counter += 1
    position +=1
print("Number of times bob occurs is: " + str(counter))
However, it seems that the s[position+1:position+2] expression is not working properly. How do I address the two characters after a "b"?

The end index of a slice is not included. This means that s[position+1:position+2] is the single character at index position + 1, and that substring can never be equal to "ob". You need [position+1:position+3]:
s = 'azcbobobegghakl'
counter = 0
numofiterations = len(s)
position = 0
#loop that goes through the string char by char
for iteration in range(numofiterations - 2):
    if s[position] == "b":  # search pos. for starting point
        if s[position+1:position+3] == "ob":  # check if complete
            counter += 1
    position +=1
print("Number of times bob occurs is: " + str(counter))
# 2
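The end-exclusive rule is easy to check directly on the string from the question. A short sketch, with variable names chosen only for illustration:

```python
s = 'azcbobobegghakl'
position = 3  # index of the first "b" in "bob"

one_char = s[position + 1:position + 2]   # end index excluded: one character
two_chars = s[position + 1:position + 3]  # the two characters after the "b"
whole = s[position:position + 3]          # the full three-character window

print(repr(one_char), repr(two_chars), repr(whole))

# A slice that runs past the end is truncated instead of raising an error.
tail = s[14:17]
print(repr(tail))
```

A slice s[a:b] has length b - a as long as both indices are inside the string, which is a handy way to double-check the bounds.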

You could use .find with an index:
s = 'azcbobobegghakl'
needle = 'bob'
idx = -1; cnt = 0
while True:
    idx = s.find(needle, idx+1)
    if idx >= 0:
        cnt += 1
    else:
        break
print("{} was found {} times.".format(needle, cnt))
# bob was found 2 times.

Eric's answer explains perfectly why your approach didn't work (slicing in Python is end-exclusive), but let me propose another option:
s = 'azcbobobegghakl'
substrings = [s[i:] for i in range(0, len(s))]
# filter() takes the function first, and returns an iterator in Python 3
filtered_s = list(filter(lambda sub: sub.startswith("bob"), substrings))
result = len(filtered_s)
or simply
s = 'azcbobobegghakl'
result = sum(1 for ss in [s[i:] for i in range(0, len(s))] if ss.startswith("bob"))
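A variation on the same idea avoids building all the suffixes: str.startswith accepts an optional start index, so each position can be tested in place. This is a sketch, and the function name count_overlapping is made up for illustration.

```python
def count_overlapping(haystack, needle):
    """Count occurrences of needle in haystack, including overlapping ones."""
    # startswith(needle, i) checks whether needle occurs at index i;
    # True counts as 1 when summed.
    return sum(haystack.startswith(needle, i) for i in range(len(haystack)))


s = 'azcbobobegghakl'
print(count_overlapping(s, 'bob'))
print(s.count('bob'))  # the built-in only counts non-overlapping matches
```

The comparison with str.count shows why the loop is needed here: the two occurrences of "bob" share a "b", and the built-in method skips the second one.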
