Can't seem to find the source of my KeyError - python

I'm working on a LeetCode question, and I believe my logic is sound, but I get KeyError: 'B' on the line that says if window_frequency[lefty] == 1:
and cannot figure out why. Does anyone have any ideas?
The purpose of the code is to find the shortest substring of string s that contains all the letters of the second input string (string t).
Code:
def minWindow(s, t):
    """
    :type s: str
    :type t: str
    :rtype: str
    """
    # dictionary of character frequencies
    # every time we move the right pointer, we check if the letter frequency is a subset of the
    # window frequency dictionary, if it is, we set window_valid to true.
    # when the window is no longer valid, we move it, editing the frequencies of the window
    letter_frequency = {}
    window_frequency = {}
    left = 0
    current_result = ""
    result = ""
    # fill dictionary for letter frequency for t
    for each in t:
        if each not in letter_frequency:
            letter_frequency[each] = 1
        else:
            letter_frequency[each] += 1
    for right in range(len(s)):
        letter = s[right]
        # adding to frequency
        if letter not in window_frequency:
            window_frequency[letter] = 1
        else:
            window_frequency[letter] += 1
        # setting the result value if the window is valid
        window_valid = all(letter_frequency.get(key, None) == val for key, val in window_frequency.items())
        while window_valid == True:
            current_result = s[left:right]
            if len(result) == 0:
                result = current_result
            elif (len(current_result) < len(result)):
                result = current_result
            # now we decrease the size of the window till it is no longer valid
            lefty = s[left]
            if window_frequency[lefty] == 1:
                del window_frequency[lefty]
            else:
                window_frequency[lefty] -= 1
            left += 1
    return result
print(minWindow("ABAABANCHQR", "ABC"))
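A minimal sketch of the sliding-window idea described in the question, not a definitive fix. In the code above, window_valid is computed once before the while loop and never updated inside it, so the loop keeps shrinking the window until it reaches a letter that is no longer in window_frequency, which is where the KeyError comes from. The sketch below re-checks validity on every step by tracking how many required letters are still missing. The name min_window and the need/missing bookkeeping are illustrative assumptions, not part of the original code.

```python
from collections import Counter


def min_window(s: str, t: str) -> str:
    """Return the shortest substring of s containing every letter of t (with multiplicity)."""
    if not s or not t:
        return ""
    need = Counter(t)   # how many more of each letter the window still needs
    missing = len(t)    # total number of required letters not yet in the window
    best = ""
    left = 0
    for right, ch in enumerate(s):
        # Taking ch into the window only helps if it was still needed.
        if need[ch] > 0:
            missing -= 1
        need[ch] -= 1
        # While the window is valid, record it and shrink it from the left.
        while missing == 0:
            if not best or right - left + 1 < len(best):
                best = s[left:right + 1]
            need[s[left]] += 1
            if need[s[left]] > 0:   # we just dropped a letter that was required
                missing += 1
            left += 1
    return best


print(min_window("ABAABANCHQR", "ABC"))  # BANC
```

Using a count of missing letters avoids comparing two dictionaries on every step, and the slice s[left:right + 1] includes the character at the right pointer.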

Related

Count the number of characters in a string and displaying the frequency count

Word Problem:
Create a function that applies a compression technique to a string and returns the resultant compressed string. More formally, a block is a substring of identical symbols that is as long as possible. A block will be represented in compressed form as the length of the block followed by the symbol in that block. The encoding of a string is the representation of each block in the string in the order in which they appear in the string. Given a sequence of characters, write a program to encode them in this format.
Example Input:
print(rleEncode('WWWWWWWBWWWWWWWBBW'))
Example Output:
'W7B1W7B2W1'
So far, I have created a counter and a for loop that goes through every character in the string, but I don't know how to finish it.
def rleEncode(s: str) -> str:
    count = 0
    index = ""
    for i in range(len(s)):
        if s[i] in index:
            index.append(i)
            count += 1
    return count, index
I think this is probably what you're looking for. In pure Python:
from itertools import groupby
s = '...your string....'
ans = ''
for k, g in groupby(s):
    # k is the symbol, g iterates over the block of consecutive copies of it
    ans += k + str(len(list(g)))
print(ans)
# prints W7B1W7B2W1 for the example input
Here is another solution, a pure function that does not use groupby from the standard library. As you can see, it takes more lines of code and some extra logic to decide where each count starts and stops.
def encode(s: str) -> str:
    if not s:  # nothing to encode; also avoids an IndexError on s[0]
        return ''
    count = 1
    res = s[0]  # start with the first character
    # compare each character with the next one, so the loop stops before the last
    for i in range(len(s) - 1):
        if s[i] == s[i+1]:  # same block: increment the count
            count += 1
        else:  # the character changes: write the count, then start the next block
            res += str(count)
            res += s[i+1]
            count = 1  # reset the count
    # finally, write the count of the last block
    res += str(count)
    return res
print(encode(s)) # W7B1W7B2W1 when s is the example input
print(encode('ABBA')) # A1B2A1

Having trouble identifying matching characters in two strings

I am able to iterate through two given strings of the same length.
I am supposed to output a green emoji if a letter in guess_word is also in secret_word at the same position. If a letter in guess_word is in secret_word but in the wrong position, I should output a yellow emoji.
This is where my issue is: the yellow emoji never shows up in my output, only the green and white boxes. I have a picture below of what the output should look like.
I have to stick to two different functions because this is what my homework is asking me to do.
def contains_char(any_length: str, single_character: str) -> bool:
    """Loop iterates through each character in string to find matching character."""
    assert len(single_character) == 1
    if single_character in any_length:
        return True
    else:
        return False
def emojified(guess_word: str, secret_word: str) -> str:
    """A way to match letters to its corresponding emoji color output. """
    assert len(guess_word) == len(secret_word)
    WHITE_BOX: str = "\U00002B1C"
    GREEN_BOX: str = "\U0001F7E9"
    YELLOW_BOX: str = "\U0001F7E8"
    emoji_color: str = ""
    i: int = 0
    while i < len(secret_word):
        i += 1
        if guess_word[0] in secret_word[0]:
            emoji_color += GREEN_BOX
            i += 1
        else:
            emoji_color += WHITE_BOX
            i += 1
        if contains_char is True:
            emoji_color += YELLOW_BOX
        else:
            emoji_color += WHITE_BOX
    return emoji_color
Output: (image of the expected output not preserved)
Without changing too much, here's how I would write your while-loop:
while i < len(secret_word):
    guess_char = guess_word[i]
    secret_char = secret_word[i]
    current_emoji = WHITE_BOX # By default, we display a white box for this position
    if contains_char(secret_word, guess_char):
        current_emoji = YELLOW_BOX # guess_char is in secret_word. guess_char may even be in the correct position, but we don't care about that yet.
    if guess_char == secret_char:
        current_emoji = GREEN_BOX # guess_char is in the correct position.
    emoji_color += current_emoji
    i += 1
return emoji_color
To process the same-position matches properly without interfering with the different-position matches, you need to check for same-position matches first and fall back on different-position matches otherwise.
You can do this by choosing an emoji for each character pair in a comprehension and joining the results into the final string.
For example:
def emojified(guess,secret):
    WHITE_BOX = "\U00002B1C"
    GREEN_BOX = "\U0001F7E9"
    YELLOW_BOX = "\U0001F7E8"
    return "".join( GREEN_BOX if s==g            # same-position
                    else YELLOW_BOX if g in secret  # different-position
                    else WHITE_BOX                  # no match
                    for s,g in zip(secret,guess))   # character pairs
Output:
print(emojified("hello","world")) # ⬜️⬜️🟨🟩🟨
print(emojified("elloh","hello")) # 🟨🟨🟩🟨🟨
print(emojified("python","woohoo")) # ⬜️⬜️⬜️🟩🟩⬜️
print(emojified("python","python")) # 🟩🟩🟩🟩🟩🟩
print(emojified("yikyak","tiktok")) # ⬜️🟩🟩⬜️⬜️🟩

how do i run length encode a pattern, rather than a character?

Here's my current RLE code:
import re
def decode(string):
    if string == '':
        return ''
    multiplier = 1
    count = 0
    rle_decoding = []
    rle_encoding = []
    rle_encoding = re.findall(r'[A-Za-z]|-?\d+\.\d+|\d+|[\w\s]', string)
    for item in rle_encoding:
        if item.isdigit():
            multiplier = int(item)
        elif item.isalpha() or item.isspace():
            while count < multiplier:
                rle_decoding.append('{0}'.format(item))
                count += 1
            multiplier = 1
            count = 0
    return(''.join(rle_decoding))
def encode(string):
    if string == '':
        return ''
    i = 0
    count = 0
    letter = string[i]
    rle = []
    while i <= len(string) - 1:
        while string[i] == letter:
            i+= 1
            count +=1
            #catch the loop on last character so it doesn't go to top and access out of bounds
            if i > len(string) - 1:
                break
        if count == 1:
            rle.append('{0}'.format(letter))
        else:
            rle.append('{0}{1}'.format(count, letter))
        if i > len(string) - 1: #ugly that I have to do it twice
            break
        letter = string[i]
        count = 0
    final = ''.join(rle)
    return final
The code might have gotten messed up when I removed all my comments, but the current code isn't too important. The problem is that I am running RLE on hexadecimal values that have all been converted to letters, so that 0-9 becomes g-p. There are a lot of patterns like 'kjkjkjkjkjkjkjkjlmlmlmlmlmlmlm' which don't compress at all, because the repeats are not single characters. How could I, if it is even possible, make my program encode patterns as well?
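One possible direction, sketched under assumptions rather than offered as a definitive answer: a regular expression with a backreference can find a unit that repeats back to back, and each run can then be written as a count followed by the unit. The names encode_patterns and decode_patterns and the count(unit) output format are hypothetical, and the input is assumed to contain only lowercase letters, as in the g-p mapping described above. The search is greedy from left to right and picks the shortest repeating unit, so it is not guaranteed to give the smallest possible encoding.

```python
import re

# A unit of one or more letters (shortest first) followed by at least one more copy of itself.
_REPEAT = re.compile(r'([a-z]+?)\1+')
# Tokens of the encoded form: "8(kj)", "3a", or a single literal letter.
_TOKEN = re.compile(r'(\d+)\(([a-z]+)\)|(\d+)([a-z])|([a-z])')


def encode_patterns(text: str) -> str:
    """Replace each repeated unit with count+unit; multi-letter units go in parentheses."""
    def replace(match):
        unit = match.group(1)
        count = len(match.group(0)) // len(unit)
        return f"{count}{unit}" if len(unit) == 1 else f"{count}({unit})"
    return _REPEAT.sub(replace, text)


def decode_patterns(encoded: str) -> str:
    """Invert encode_patterns for lowercase-letter input."""
    out = []
    for m in _TOKEN.finditer(encoded):
        if m.group(1) is not None:      # count(unit)
            out.append(m.group(2) * int(m.group(1)))
        elif m.group(3) is not None:    # count + single letter
            out.append(m.group(4) * int(m.group(3)))
        else:                           # literal letter
            out.append(m.group(5))
    return ''.join(out)


sample = 'kj' * 8 + 'lm' * 7
print(encode_patterns(sample))  # 8(kj)7(lm)
```

Because the converted data contains no digits or parentheses, those characters are free to act as markers, which keeps decoding unambiguous.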

CS50 DNA works for small.csv but not for large

I am having problems with CS50 pset6 DNA. It gets all the right values and gives correct answers when I use the small.csv file, but not when I use the large one. I have been going through it with debug50 for over a week and can't figure out the problem. I assume the problem is somewhere in the loop through the samples to find the STRs, but I just don't see what it is doing wrong when walking through it.
If you are unfamiliar with the CS50 DNA problem set, the code is supposed to look through a DNA sequence (argv[2]) and compare it with a CSV file (argv[1]) containing people's DNA STR counts to figure out which person (if any) it belongs to.
Note: my code fails for the case python dna.py databases/large.csv sequences/5.txt, if this helps.
from sys import argv
from csv import reader
#ensures correct number of arguments
if (len(argv) != 3):
    print("usage: python dna.py data sample")
#dict for storage
peps = {}
#storage for strands we look for.
types = []
#opens csv table
with open(argv[1],'r') as file:
    data = reader(file)
    line = 0
    number = 0
    for l in data:
        if line == 0:
            for col in l:
                if col[2].islower() and col != 'name':
                    break
                if col == 'name':
                    continue
                else:
                    types.append(col)
            line += 1
        else:
            row_mark = 0
            for col in l:
                if row_mark == 0:
                    peps[col] = []
                    row_mark += 1
                else:
                    peps[l[0]].append(col)
#convert sample to string
samples = ""
with open(argv[2], 'r') as sample:
    for c in sample:
        samples = samples + c
#DNA STR GROUPS
dna = { "AGATC" : 0,
        "AATG" : 0,
        "TATC" : 0,
        "TTTTTTCT" : 0,
        "TCTAG" : 0,
        "GATA" : 0,
        "GAAA" : 0,
        "TCTG" : 0 }
#go through all the strs in dna
for keys in dna:
    #the longest run of the sequence
    longest = 0
    #the current run of the sequence
    run = 0
    size = len(keys)
    #look through sample for longest
    i = 0
    while i < len(samples):
        hold = samples[i:(i + size)]
        if hold == keys:
            run += 1
            #ensure the code does not go outside len of samples
            if ((i + size) < len(samples)):
                i = i + size
                continue
        if run > longest:
            longest = run
            run = 0
        i += 1
    dna[keys] = longest
#see who it is
positive = True
person = ''
for key in peps:
    positive = True
    for entry in types:
        x = types.index(entry)
        test = dna.get(entry)
        can = int(peps.get(key)[x])
        if (test != can):
            positive = False
    if positive == True:
        person = key
        break
if person != '':
    print(person)
else:
    print("No match")
Problem is in this while loop. Look at this code carefully.
while i < len(samples):
    hold = samples[i:(i + size)]
    if hold == keys:
        run += 1
        #ensure the code does not go outside len of samples
        if ((i + size) < len(samples)):
            i = i + size
            continue
    if run > longest:
        longest = run
        run = 0
    i += 1
Some logic is missing here. You are supposed to find the longest run of consecutive repeats of each STR. So when the STR repeats back to back, you need to count how many times it repeats, and only when it stops repeating should you check whether this run is the longest so far. The run counter must also be reset to zero at that point, even if the run was not a new longest.
Solution
You need to add an else branch after the if hold == keys: statement and reset run in both cases inside it. This would be the right fix:
while i < len(samples):
    hold = samples[i:(i + size)]
    if hold == keys:
        run += 1
        #ensure the code does not go outside len of samples
        if ((i + size) < len(samples)):
            i = i + size
            continue
    else: #only if there is no longer sequence match, check this.
        if run > longest:
            longest = run
            run = 0
        else: #if the number of sequence match is still smaller then longest, then make run zero.
            run = 0
        i += 1
earik87 is absolutely right! I would just like to add that the code is missing an = to work in all cases, especially when you have redundant sequences.
while i < len(samples):
    hold = samples[i:(i + size)]
    if hold == keys:
        run += 1
        #ensure the code does not go outside len of samples **( I added =)**
        if ((i + size) <= len(samples)):
            i = i + size
            continue
    else: #only if there is no longer sequence match, check this.
        if run > longest:
            longest = run
            run = 0
        else: #if the number of sequence match is still smaller then longest, then make run zero.
            run = 0
        i += 1
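As a cross-check for the loops above, here is a minimal self-contained sketch of the same idea: count the run of back-to-back repeats starting at every position and keep the maximum. The function name longest_run and its parameters are illustrative assumptions, not part of the CS50 distribution code. Because the maximum is updated after every run is counted, a run that ends exactly at the end of the sequence is still recorded.

```python
def longest_run(sequence: str, unit: str) -> int:
    """Return the largest number of consecutive copies of unit found in sequence."""
    if not unit:
        return 0  # an empty unit would otherwise match forever
    size = len(unit)
    longest = 0
    for start in range(len(sequence)):
        run = 0
        # Count how many copies of unit sit back to back starting at this position.
        while sequence[start + run * size : start + (run + 1) * size] == unit:
            run += 1
        longest = max(longest, run)
    return longest


print(longest_run("AGATCAGATCTTAGATC", "AGATC"))  # 2
```

Checking every start position is slower than skipping ahead by len(unit) after each match, but it is easier to verify and fast enough for inputs of a few thousand characters.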

String index out of range (Python)

I'm writing a program to encode, decode and crack with the Caesar Cipher.
I have this function that shifts the letters in a string along by a specified amount:
def shift(data, shifter):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    data = list(data)
    counter = 0 # we will use this to modify the list while we iterate over it
    for letter in data:
        letter = letter.lower()
        if letter not in alphabet:
            counter += 1
            continue
        lPos = alphabet.find(letter)
        if shifter >= 0:
            shiftedPos = lPos + (0 - shifter)
        else:
            shiftedPos = lPos + abs(shifter)
        if shiftedPos >= len(alphabet) - 1: shiftedPos -= len(alphabet)
        data[counter] = alphabet[shiftedPos] # update the letter
        counter += 1 # advance
    data = ''.join(data) # make it into a string again
    return data
And I have this function to crack a ciphered string:
def crack(decryptor=None, tries=None):
    if decryptor is None and tries is None:
        task = getValidInput("Get data from a [f]ile or [s]tdin? >", "Please give either 'f' or 's'.", 'f', 's')
        if task == "f": # it's a file
            dataFile = getValidFile() # get an open file object
            data = dataFile.read() # get the data from the text file. hopefully it's ascii text!
        elif task == "s": # we need to get data from stdin
            data = input("Enter data to crack >")
        tries = getValidInt("Enter tries per sweep >")
    else:
        data = decryptor
    retry = True
    shifter = 0
    while retry:
        for i in range(0, tries):
            oput = "Try " + str(i) + ": "
            posData = shift(data, shifter)
            negData = shift(data, 0 - shifter)
            # semitry 1 - positive
            oput += posData + ", "
            # semitry 2 - negative
            negData = ''.join(negData) # make it into a string again
            oput += negData
            print(oput)
            shifter += 1
        doRetry = getValidInput("Keep trying (y/n)? > ", "Invalid!", 'y', 'n')
        if doRetry == 'n': retry = False
However, after selecting 'y' to continue a few times, I get the following IndexError:
Traceback (most recent call last):
  File "CeaserCypher.py", line 152, in <module>
    crack()
  File "CeaserCypher.py", line 131, in crack
    negData = shift(data, 0 - shifter)
  File "CeaserCypher.py", line 60, in shift
    print(alphabet[shiftedPos])
IndexError: string index out of range
Why am I getting this error and how can I fix it?
IndexError means that the index you are trying to access does not exist. In a string, that means you're asking for the character at a position that is not inside the string.
"0123456"[7] asks for the character at index 7, but the string has only seven characters (indexes 0 to 6), so "IndexError" is raised.
All valid non-negative indexes on a string are less than the length of the string (len(string)), and negative indexes are valid only down to -len(string). In your case, alphabet[shiftedPos] raises IndexError because shiftedPos has fallen outside that range: shifter keeps growing on every try, so subtracting len(alphabet) once is eventually not enough to bring shiftedPos back.
To my understanding, what you want to do is wrap around the string when you go out of bounds like this. "z" (index 25) gets incremented by, say, 2 and becomes index 27. You want that to become index 1 (letter "b") instead. Hence, you should use modulo: replace "alphabet[shiftedPos]" with "alphabet[shiftedPos%len(alphabet)]" and I believe this will solve the problem.
Modulo, by the way, divides a number by n and gives you the remainder. For a non-negative number it effectively subtracts n until the result is less than n. In Python the result for a positive n is always in the range 0 to n - 1, even when the number is negative, so the index is always valid.
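A minimal sketch of the modulo approach applied to the whole function, assuming the same convention as the question's code (a positive shifter moves letters backward, a negative one forward, and letters are lowercased). The constant name ALPHABET and the list-building style are illustrative choices, not the original author's code.

```python
import string

ALPHABET = string.ascii_lowercase  # "abcdefghijklmnopqrstuvwxyz"


def shift(data: str, shifter: int) -> str:
    """Shift each letter back by shifter positions, wrapping around the alphabet."""
    shifted = []
    for letter in data:
        pos = ALPHABET.find(letter.lower())
        if pos == -1:
            # Not a letter of the alphabet: keep the character unchanged.
            shifted.append(letter)
        else:
            # The modulo keeps the index in 0..25 for any shifter, positive or negative.
            shifted.append(ALPHABET[(pos - shifter) % len(ALPHABET)])
    return "".join(shifted)


print(shift("abc", 1))    # zab
print(shift("abc", 53))   # zab, because 53 % 26 == 1
print(shift("xyz", -2))   # zab
```

Building a new list instead of modifying data in place removes the need for a separate counter variable.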
