why does the replace character remove both brackets? - python

Hey guys in new to Python. And I was playing around writing python and I'm stuck.
words = ['w','hello.','my','.name.','(is)','james.','whats','your','name?']
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
'''#position of the invalid words i.e the ones that '''
inpos = -1
for word in words:
inpos = inpos + 1
#pass
for letter in word:
#print(letter)
if letter in alphabet:
pass
#print('Valid')
elif letter not in alphabet:
new_word = word.replace(letter,"")
print(word)
print(new_word)
words[inpos] = new_word
print(words)
This code is meant to clean the text (remove all full stops, commas, and other characters)
The problem is when I run it removes the adds the brackets
Heres the output:
Image of output
Can anyone explain why this is happening?

No, it does not add anything. You're printing both old and new word:
print(word)
print(new_word)
so when the new_word is (is the word is still (is).
BTW your code has a logical error: when you remove a character you put back new_word in the list, but word is still the old value. So only the last change for every word will be saved in the list words.

Related

Loop over string based on keys in dictionary

Instead of looping over each separate character of a string, I want to loop over parts of a string (multiple characters). Those parts are defined by the keys of a dictionary.
Example:
my_dict = {'010': 'a', '000': 'e', '1101': 'f', '1010': 'h', '1000': 'i', '0111': 'm', '0010': 'n', '1011': 's', '0110': 't', '11001': 'l', '00110': 'o', '10011': 'p', '11000': 'r', '00111': 'u', '10010': 'x'}
word = "1000001001100001100000100000110"
output = ""
What I've tried (looping over each character separately, indeed):
for i in word:
letter = my_dict[i]
output += letter
word = word.lstrip(letter)
My output:
"KeyError: '1'"
But I want to get key "1000" and its value "i", and then continue with key "0010" and get its value "n", etc...
Expected output:
# Expected output:
output = "internet"
Assuming it's a prefix code (otherwise you'd need to define how to deal with ambiguities), accumulate the bits until you have a match, then output the letter and clear the bits:
output = ""
bits = ""
for bit in word:
bits += bit
if bits in my_dict:
letter = my_dict[bits]
output += letter
bits = ""
Try it online!
Slight variation of it the lookup, reminded by Jnevill's answer:
if letter := my_dict.get(bits):
output += letter
You could use a regular expression to substitutes the patterns with the corresponding letters. re.sub allows use of a function for the replacement which could be access to the dictionary to get the letters. The search pattern would need to have the longer values first so that they are "consumed" in priority over shorter patterns that could start with the same bits:
my_dict = {'010': 'a', '000': 'e', '1101': 'f', '1010': 'h', '1000': 'i', '0111': 'm', '0010': 'n', '1011': 's', '0110': 't', '11001': 'l', '00110': 'o', '10011': 'p', '11000': 'r', '00111': 'u', '10010': 'x'}
word = "1000001001100001100000100000110"
import re
pattern = "|".join(sorted(my_dict.keys(),key=len,reverse=True))
output = re.sub(pattern,lambda m:my_dict[m.group(0)],word)
print(output) # internet
[EDIT]
If there are no conflicts between short and long bit patterns, the sort is not needed (as Kelly pointed out), the solution could be a single line:
output = re.sub('|'.join(my_dict),lambda m:my_dict[m[0]],word)
Issue with your code:
for i in word: # here, i is a single character
# so you can't get corresponding value since it's multiple character keys
letter = my_dict[i]
output += letter # this would work fine
word = word.lstrip(letter)
You can do a while loop on word, and remove the part you found in the dict each time. When words is empty, you will stop looping and the program ends.
You can iterate over each key in the dict and test if it match the beginning of the word. If it does, you have the letter you are looking for. Do what you want instead of the print, and repeat.
translate_table = {'010': 'a', '000': 'e', '1101': 'f', '1010': 'h', '1000': 'i', '0111': 'm', '0010': 'n', '1011': 's', '0110': 't', '11001': 'l', '00110': 'o', '10011': 'p', '11000': 'r', '00111': 'u', '10010': 'x'}
message = "1000001001100001100000100000110"
while message:
for code, letter in translate_table.items():
if message.startswith(code):
# replace this with whatever you want to do with the letter
print(letter, end="")
# "Cut" the word to keep the remaining characters
message = message[len(code):]
break # a letter was found, move to next while iteration
While iterating my_dict (as DorianTurba suggests) feels like a more elegant solution, your gut was suggesting that you should iterate word. To do this you can use a while loop and then manage the length of characters you jump in each iteration depending on the size of the my_dict key that matches the first 3, 4, or 5 characters in word.
Consider:
my_dict = {'010': 'a', '000': 'e', '1101': 'f', '1010': 'h', '1000': 'i', '0111': 'm', '0010': 'n', '1011': 's', '0110': 't', '11001': 'l', '00110': 'o', '10011': 'p', '11000': 'r', '00111': 'u', '10010': 'x'}
word = "1000001001100001100000100000110"
i=0
while len(word) > i:
for size in [3,4,5]:
if my_dict.get(word[i:i+size]):
print(my_dict[word[i:i+size]])
i += size
break

Making a Pangram function in Python but having a problem with the letter T

first time posting, new to programming. Issue I'm having is when I run my Pangram function, and turn the input string into a set list, the list still has multiple 't' but none of the other letters. when I input "The quick brown fox jumps over the lazy dog" when my code turns this to an organized list to match the alphabet, there are 2 t's. As you can see i'm using print to see what everything is doing
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 't', 'u', 'v', 'w', 'x', 'y', 'z']
above is what my input string converts to and there are 2 t's but all other multiple letters are gone. I also tried making the upper T a lower T manually, and also making other random letter upper and it has no problem with other letters.
def ispangram(str1, alphabet=string.ascii_lowercase):
str1 = str1.replace(' ','')
str1 = list(set(str1))
str1 = [letter.lower() for letter in str1]
str1.sort()
print(str1)
alphabet = list(set(alphabet))
alphabet.sort()
print(alphabet)
if str1 == alphabet:
return 'Is Pangram!'
else:
return 'Is not Pangram!'
You're converting your string to lowercase after collecting a set, you should make it lowercase before the set
str1 = list(set(str1.lower()))
You are lowercasing the characters after building the set, so T and t are considered different.
Instead, lowercase before building the set:
def ispangram(str1, alphabet=string.ascii_lowercase):
str1 = str1.replace(' ','')
str1 = [letter.lower() for letter in str1]
str1 = list(set(str1))
str1.sort()
print(str1)
Or in a much shorter way:
def ispangram(str1, alphabet=string.ascii_lowercase):
str1 = sorted(set(str1.replace(' ','').lower()))

Pig_Latin Translator & for loop

Pig Latin translator is basically an introductory code for many projects, and it involves getting words that start with consonants to move the first letter to the end of the word and add 'ay', and adding only 'ay' to a word that starts with a vowel. I have basically finished the whole project except for some places, and one main thing is getting the program to complete a whole sentence with a for loop, especially in moving to the next list item
I have tried to do simple things such as n+1 at the end of the for loop, and researched online (majority on Stackoverflow)
latin = ""
n = 0
sentence = input ("Insert your sentence ")
word = sentence.split()
first_word = word [n]
first_letter = first_word [0]
rest = first_word [1:]
consonant = ['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'z')
vowel = ['a', 'e', 'i', 'o', 'u')
for one in consonant :
latin = latin + " " + rest + one + "ay"
print (latin)
n + 1
for one in vowel :
latin = latin + " " + first_word +ay
print (latin)
n + 1
There was no error message, however, the computer ran the variable 'one' not as the first (zeroth) letter of the variable first_word, rather, just running it from a - z. Is there anyway to fix this? Thank you
#!/usr/bin/env python
sentence = input("Insert your sentence ")
# don't forget "y" :)
consonants = ['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'y', 'z']
# this empty list will hold our output
output_words = list()
# we'll take the user's input, strip any trailing / leading white space (eg.: new lines), and split it into words, using spaces as separators.
for word in sentence.strip().split():
first_letter = word[0]
# check if the first letter (converted to lower case) is in the consonants list
if first_letter.lower() in consonants:
# if yes, take the rest of the word, add the lower case first_letter, then add "ay"
pig_word = word[1:] + first_letter.lower() + "ay"
# if it's not a consonant, do nothing :(
else:
pig_word = word
# add our word to the output list
output_words.append(pig_word)
# print the list of output words, joining them together with a space
print(" ".join(output_words))
The loop loops over each word in the sentence - no need to count anything with n. There's also no need to loop over the consonants or the vowels, all we care about is "does this word start with a consonant" - if yes, modify it as per your rules.
We store our (possibly) modified words in an output list, and when we're done, we print all of them, joined by a space.
Note that this code is very buggy - what happens if your user's input contains punctuation?
I opehay hattay elpshay, eyuleochoujay

How can I pop this letter if it exists based on this algorithm?

I'm trying to pop any letters given by the user, for example if they give you a keyword "ROSES" then these letters should be popped out of the list.
Note: I have a lot more explanation after the SOURCE CODE
SOURCE CODE
alphabet = ["A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]
encrypted_message = []
key_word = str(input("Please enter a word:"))
key_word = list(key_word)
#print(key_word)
check_for_dup = 0
for letter in key_word:
for character in alphabet:
if letter in character:
check_for_dup +=1
alphabet.pop(check_for_dup)
print(alphabet)
print(encrypted_message)
SAMPLE INPUT
Let's say keyword is "Roses"
this what it gives me a list of the following ['A', 'C', 'E', 'G', 'I', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
But that's wrong It should of just removed the characters that existed in the keyword given by the user, like the word "Roses" each letter should of been removed and not in the list being popped. as you can see in the list the letters "B","D","F","H",etc were gone. What I'm trying to do is pop the index of the alphabet letters that the keyword exists.
this is what should of happened.
["A","B","C","D","F","G","H","I","J","K","L","M","N","P","Q","T","U","V","W","X","Y","Z"]
The letters of the keyword "ROSES" were deleted of the list
There is some shortcomings in your code here is an implementation that works:
alphabet = ["A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]
key_word = str(input("Please enter a word:"))
for letter in key_word.upper():
if letter in alphabet:
alphabet.remove(letter)
print(alphabet)
Explanations
You can iterate on a string, no need to cast it as a list
Use remove since you can use the str type directly
You need to .upper() the input because you want to remove A if the user input a
Note that I did not handle encrypted_message since it is unused at the moment.
Also, as some comments says you could use a set instead of a list since lookups are faster for sets.
alphabet = {"A","B","C","D",...}
EDIT
alphabet = ["A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]
key_word = str(input("Please enter a word:"))
encrypted_message = []
for letter in key_word.upper():
if letter in alphabet:
alphabet.remove(letter)
encrypted_message.append(letter)
encrypted_message.extend(alphabet)
This is a new implementation with the handling of your encrypted_message. This will keep the order of the alphabet after the input of the user. Also, if you're wondering why there's no duplicate, you will be appending only if letter is in alphabet which means the second time it won't be in alphabet and therefore not added to your encrypted_message.
you can directly check with input key
iterate all the letters in data, and check whether or not the letter in the input_key, if yes discard it
data = ['A', 'C', 'E', 'G', 'I', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
input_key = 'Roses'
output = [l for l in data if l not in input_key.upper()]
print output
['A', 'C', 'G', 'I', 'K', 'L', 'M', 'N', 'P', 'Q', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
You can do something like this....
import string
alphabet = list(string.ascii_lowercase)
user_word = 'Roses'
user_word = user_word.lower()
letters_to_remove = set(list(user_word)) # Create a unique set of characters to remove
for letter in letters_to_remove:
alphabet.remove(letter)
alphabet = ["A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]
# this can be accepted as a user input as well
userinput = "ROSES"
# creates a list with unique values in uppercase
l = set(list(userinput.upper()))
# iterates over items in l to remove
# the corresponding items in alphabet
for x in l:
alphabet.remove(x)
' '.join(alphabet).translate({ord(c): "" for c in keyword}).split()
Try this:
import string
key_word = set(input("Please enter a word:").lower())
print(sorted(set(string.ascii_lowercase) - key_word))
Explanation:
When checking for (in)existence, it's better to use set. https://docs.python.org/3/tutorial/datastructures.html#sets
Instead of iterating over it N times (N = number of input characters), you hash the character N times and check if there's already something at the result hash or not.
If you check the speed, try with "WXYZZZYW" and you'll see that it'll be a lot slower than if it were "ABCDDDCA" with the list way. With set, it will be always the same time.
The rest is pretty trivial. Casting to lowercase (or uppercase), to make sure it hits a match, case insensitive.
And then, we end by doing a set difference (-). It's all the items that are in the first set but not in the second one.

Why is my implementation of binary search very inefficient?

I am doing a Python exercise to search a word from a given sorted wordlist, containing more than 100,000 words.
When using bisect_left from the Python bisect module, it is very efficient, but using the binary method created by myself is very inefficient. Could anyone please clarify why?
This is the searching method using the Python bisect module:
def in_bisect(word_list, word):
"""Checks whether a word is in a list using bisection search.
Precondition: the words in the list are sorted
word_list: list of strings
word: string
"""
i = bisect_left(word_list, word)
if i != len(word_list) and word_list[i] == word:
return True
else:
return False
My implementation is really very inefficient (don't know why):
def my_bisect(wordlist,word):
"""search the given word in a wordlist using
bisection search, also known as binary search
"""
if len(wordlist) == 0:
return False
if len(wordlist) == 1:
if wordlist[0] == word:
return True
else:
return False
if word in wordlist[len(wordlist)/2:]:
return True
return my_bisect(wordlist[len(wordlist)/2:],word)
if word in wordlist[len(wordlist)/2:]
will make Python search through half of your wordlist, which is kinda defeating the purpose of writing a binary search in the first place. Also, you are not splitting the list in half correctly. The strategy for binary search is to cut the search space in half each step, and then only apply the same strategy to the half which your word could be in. In order to know which half is the right one to search, it is critical that the wordlist is sorted. Here's a sample implementation which keeps track of the number of calls needed to verify whether a word is in wordlist.
import random
numcalls = 0
def bs(wordlist, word):
# increment numcalls
print('wordlist',wordlist)
global numcalls
numcalls += 1
# base cases
if not wordlist:
return False
length = len(wordlist)
if length == 1:
return wordlist[0] == word
# split the list in half
mid = int(length/2) # mid index
leftlist = wordlist[:mid]
rightlist = wordlist[mid:]
print('leftlist',leftlist)
print('rightlist',rightlist)
print()
# recursion
if word < rightlist[0]:
return bs(leftlist, word) # word can only be in left list
return bs(rightlist, word) # word can only be in right list
alphabet = 'abcdefghijklmnopqrstuvwxyz'
wl = sorted(random.sample(alphabet, 10))
print(bs(wl, 'm'))
print(numcalls)
I included some print statements so you can see what is going on. Here are two sample outputs. First: word is in the wordlist:
wordlist ['b', 'c', 'g', 'i', 'l', 'm', 'n', 'r', 's', 'v']
leftlist ['b', 'c', 'g', 'i', 'l']
rightlist ['m', 'n', 'r', 's', 'v']
wordlist ['m', 'n', 'r', 's', 'v']
leftlist ['m', 'n']
rightlist ['r', 's', 'v']
wordlist ['m', 'n']
leftlist ['m']
rightlist ['n']
wordlist ['m']
True
4
Second: word is not in the wordlist:
wordlist ['a', 'c', 'd', 'e', 'g', 'l', 'o', 'q', 't', 'x']
leftlist ['a', 'c', 'd', 'e', 'g']
rightlist ['l', 'o', 'q', 't', 'x']
wordlist ['l', 'o', 'q', 't', 'x']
leftlist ['l', 'o']
rightlist ['q', 't', 'x']
wordlist ['l', 'o']
leftlist ['l']
rightlist ['o']
wordlist ['l']
False
4
Note that if you double the size of the wordlist, i.e. use
wl = sorted(random.sample(alphabet, 20))
numcalls on average will be only one higher than for a wordlist of length 10, because wordlist has to be split in half only once more.
to search if a word is in a wordlist simply (python 2.7):
def bisect_fun(listfromfile, wordtosearch):
bi = bisect.bisect_left(listfromfile, wordtosearch)
if listfromfile[bi] == wordtosearch:
return listfromfile[bi], bi

Categories