How to make dictionary of alphabet and scrambled alphabet? - python

I need to create multiple dictionaries for another program using the alphabet as the key and a scrambled alphabet (the first one I have to do is BDFHJLCPRTXVZNYEIWGAKMUSQO) as the value. This is what I have so far but it says "unhashable type: 'list'"
_list = input("Enter list: ")
alpha = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
dict1 = {}
for i in range(0,len(_list)):
dict1[alpha][i] == _list[i]
print(dict1)

There are two errors in your code.
dict1[alpha][i] should be dict1[alpha[i]]
As pointed out by Paul Rooney, you cannot use a list as a dictionary key, but I don't think that is what you are trying to do. Instead, you just want the ith element of the list alpha.
the == should be =.
== is used to compare, = is used to assign a value.
Also, the pythonic way to do your iteration is as follows:
dict1 = {alpha[idx]: character for idx, character in enumerate(_list)}
which is the equivalent of
dict1 = {}
for idx, character in enumerate(_list):
dict1[alpha[idx]] = character

Related

Loop over string based on keys in dictionary

Instead of looping over each separate character of a string, I want to loop over parts of a string (multiple characters). Those parts are defined by the keys of a dictionary.
Example:
my_dict = {'010': 'a', '000': 'e', '1101': 'f', '1010': 'h', '1000': 'i', '0111': 'm', '0010': 'n', '1011': 's', '0110': 't', '11001': 'l', '00110': 'o', '10011': 'p', '11000': 'r', '00111': 'u', '10010': 'x'}
word = "1000001001100001100000100000110"
output = ""
What I've tried (looping over each character separately, indeed):
for i in word:
letter = my_dict[i]
output += letter
word = word.lstrip(letter)
My output:
"KeyError: '1'"
But I want to get key "1000" and its value "i", and then continue with key "0010" and get its value "n", etc...
Expected output:
# Expected output:
output = "internet"
Assuming it's a prefix code (otherwise you'd need to define how to deal with ambiguities), accumulate the bits until you have a match, then output the letter and clear the bits:
output = ""
bits = ""
for bit in word:
bits += bit
if bits in my_dict:
letter = my_dict[bits]
output += letter
bits = ""
Try it online!
Slight variation of it the lookup, reminded by Jnevill's answer:
if letter := my_dict.get(bits):
output += letter
You could use a regular expression to substitutes the patterns with the corresponding letters. re.sub allows use of a function for the replacement which could be access to the dictionary to get the letters. The search pattern would need to have the longer values first so that they are "consumed" in priority over shorter patterns that could start with the same bits:
my_dict = {'010': 'a', '000': 'e', '1101': 'f', '1010': 'h', '1000': 'i', '0111': 'm', '0010': 'n', '1011': 's', '0110': 't', '11001': 'l', '00110': 'o', '10011': 'p', '11000': 'r', '00111': 'u', '10010': 'x'}
word = "1000001001100001100000100000110"
import re
pattern = "|".join(sorted(my_dict.keys(),key=len,reverse=True))
output = re.sub(pattern,lambda m:my_dict[m.group(0)],word)
print(output) # internet
[EDIT]
If there are no conflicts between short and long bit patterns, the sort is not needed (as Kelly pointed out), the solution could be a single line:
output = re.sub('|'.join(my_dict),lambda m:my_dict[m[0]],word)
Issue with your code:
for i in word: # here, i is a single character
# so you can't get corresponding value since it's multiple character keys
letter = my_dict[i]
output += letter # this would work fine
word = word.lstrip(letter)
You can do a while loop on word, and remove the part you found in the dict each time. When words is empty, you will stop looping and the program ends.
You can iterate over each key in the dict and test if it match the beginning of the word. If it does, you have the letter you are looking for. Do what you want instead of the print, and repeat.
translate_table = {'010': 'a', '000': 'e', '1101': 'f', '1010': 'h', '1000': 'i', '0111': 'm', '0010': 'n', '1011': 's', '0110': 't', '11001': 'l', '00110': 'o', '10011': 'p', '11000': 'r', '00111': 'u', '10010': 'x'}
message = "1000001001100001100000100000110"
while message:
for code, letter in translate_table.items():
if message.startswith(code):
# replace this with whatever you want to do with the letter
print(letter, end="")
# "Cut" the word to keep the remaining characters
message = message[len(code):]
break # a letter was found, move to next while iteration
While iterating my_dict (as DorianTurba suggests) feels like a more elegant solution, your gut was suggesting that you should iterate word. To do this you can use a while loop and then manage the length of characters you jump in each iteration depending on the size of the my_dict key that matches the first 3, 4, or 5 characters in word.
Consider:
my_dict = {'010': 'a', '000': 'e', '1101': 'f', '1010': 'h', '1000': 'i', '0111': 'm', '0010': 'n', '1011': 's', '0110': 't', '11001': 'l', '00110': 'o', '10011': 'p', '11000': 'r', '00111': 'u', '10010': 'x'}
word = "1000001001100001100000100000110"
i=0
while len(word) > i:
for size in [3,4,5]:
if my_dict.get(word[i:i+size]):
print(my_dict[word[i:i+size]])
i += size
break

How can I reference a string (e.g. 'A') to the index of a larger list (e.g. ['A', 'B', 'C', 'D', ...])?

I have been racking my brain and scouring the internet for some hours now, please help.
Effectively I am trying to create a self-contained function (in python) for producing a caesar cipher. I have a list - 'cache' - of all letters A-Z.
def caesarcipher(text, s):
global rawmessage #imports a string input - the 'raw message' which is to be encrypted.
result = ''
cache = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
Is it possible to analyze the string input (the 'rawmessage') and attribute each letter to its subsequent index position in the list 'cache'? e.g. if the input was 'AAA' then the console would recognise it as [0,0,0] in the list 'cache'. Or if the input was 'ABC' then the console would recognise it as [0,1,2] in the list 'cache'.
Thank you to anyone who makes the effort to help me out here.
Use a list comprehension:
positions = [cache.index(letter) for letter in rawmessage if letter in cache]
You can with a list comprehension. Also you can get the letter from string.
import string
print([string.ascii_uppercase.index(c) for c in "AAA"])
# [0, 0, 0]
print([string.ascii_uppercase.index(c) for c in "ABC"])
# [0, 1, 2]
result = []
for i in list(rawmessage):
result.append(cache.index(i))

Python: How to replace string elements using an array of tuples?

I'm trying to replace the characters of the reversed alphabet with those of the alphabet. This is what I've got:
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
rev_alphabet = alphabet[::-1]
sample = "wrw blf hvv ozhg mrtsg'h vkrhlwv?"
def f(alph, rev_alph):
return (alph, rev_alph)
char_list_of_tups = list(map(f, alphabet, rev_alphabet))
for alph, rev_alph in char_list_of_tups:
sample = sample.replace(rev_alph, alph)
print(sample)
expected output: did you see last night's episode?
actual output: wrw you svv ozst nrtst's vprsowv?
I understand that I'm printing the last "replacement" of the whole iteration. How can I avoid this without appending it to a list and then running into problems with the spacing of the words?
Your problem here is that you lose data as you perform each replacement; for a simple example, consider an input of "az". On the first replacement pass, you replace 'z' with 'a', and now have "aa". When you get to replacing 'a' with 'z', it becomes "zz", because you can't tell the difference between an already replaced character and one that's still unchanged.
For single character replacements, you want to use the str.translate method (and the not strictly required, but useful helper function, str.maketrans), to do character by character transliteration across the string in a single pass.
from string import ascii_lowercase # No need to define the alphabet; Python provides it
# You can use the original str form, no list needed
# Do this once up front, and reuse it for as many translate calls as you like
trans_map = str.maketrans(ascii_lowercase[::-1], ascii_lowercase)
sample = sample.translate(trans_map)
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
# or
alphabet = [chr(97 + i) for i in range(0,26)]
sample = "wrw blf hvv ozhg mrtsg'h vkrhlwv?"
res = []
for ch in sample:
if ch in alphabet:
res.append(alphabet[-1 - alphabet.index(ch)])
else:
res.append(ch)
print("".join(res))
Another Way if you are ok with creating a new string instead.
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
dictRev = dict(zip(alphabet, alphabet[::-1]))
sample = "wrw blf hvv ozhg mrtsg'h vkrhlwv?"
s1="".join([dictRev.get(char, char) for char in sample])
print(s1)
"did you see last night's episode?"

Remove words from list containing certain characters

I have a long list of words that I'm trying to go through and if the word contains a specific character remove it. However, the solution I thought would work doesn't and doesn't remove any words
l3 = ['b', 'd', 'e', 'f', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y']
firstcheck = ['poach', 'omnificent', 'aminoxylol', 'teetotaller', 'kyathos', 'toxaemic', 'herohead', 'desole', 'nincompoophood', 'dinamode']
validwords = []
for i in l3:
for x in firstchect:
if i not in x:
validwords.append(x)
continue
else:
break
If a word from firstcheck has a character from l3 I want it removed or not added to this other list. I tried it both ways. Can anyone offer insight on what could be going wrong? I'm pretty sure I could use some list comprehension but I'm not very good at that.
The accepted answer makes use of np.sum which means importing a huge numerical library to perform a simple task that the Python kernel can easily do by itself:
validwords = [w for w in firstcheck if all(c not in w for c in l3)]
you can use a list comprehension:
import numpy as np
[w for w in firstcheck if np.sum([c in w for c in l3])==0]
It seems all the words contain at least 1 char from l3 and the output of above is an empty list.
If firstcheck is defined as below:
firstcheck = ['a', 'z', 'poach', 'omnificent']
The code should output:
['a', 'z']
If you want to avoid all loops etc, you can use re directly.
import re
l3 = ['b', 'd', 'e', 'f', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y']
firstcheck = ['azz', 'poach', 'omnificent', 'aminoxylol', 'teetotaller', 'kyathos', 'toxaemic', 'herohead', 'desole', 'nincompoophood', 'dinamode']
# Create a regex string to remove.
strings_to_remove = "[{}]".format("".join(l3))
validwords = [x for x in firstcheck if re.sub(strings_to_remove, '', x) == x]
print(validwords)
Output:
['azz']
Ah, there was some mistake in code, rest was fine:
l3 = ['b', 'd', 'e', 'f', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y']
firstcheck = ['aza', 'ca', 'poach', 'omnificent', 'aminoxylol', 'teetotaller', 'kyathos', 'toxaemic', 'herohead', 'desole', 'nincompoophood', 'dinamode']
validwords = []
flag=1
for x in firstcheck:
for i in l3:
if i not in x:
flag=1
else:
flag=0
break
if(flag==1):
validwords.append(x)
print(validwords)
So, here the first mistake was, the for loops, we need to iterate through words first then, through l3, to avoid the readdition of elements.
Next, firstcheck spelling was wrong in 'for x in firstcheck` due to which error was there.
Also, I added a flag, such that if flag value is 1 it will add the element in validwords.
To, check I added new elements as 'aza' and 'ca', due to which, now it shows correct o/p as 'aza' and 'ca'.
Hope this helps you.

Is that a tag list or something else?

I am new to NLP and NLTK, and I want to find ambiguous words, meaning words with at least n different tags. I have this method, but the output is more than confusing.
Code:
def MostAmbiguousWords(words, n):
# wordsUniqeTags holds a list of uniqe tags that have been observed for a given word
wordsUniqeTags = {}
for (w,t) in words:
if wordsUniqeTags.has_key(w):
wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
else:
wordsUniqeTags[w] = set([t])
# Starting to count
res = []
for w in wordsUniqeTags:
if len(wordsUniqeTags[w]) >= n:
res.append((w, wordsUniqeTags[w]))
return res
MostAmbiguousWords(brown.tagged_words(), 13)
Output:
[("what's", set(['C', 'B', 'E', 'D', 'H', 'WDT+BEZ', '-', 'N', 'T', 'W', 'V', 'Z', '+'])),
("who's", set(['C', 'B', 'E', 'WPS+BEZ', 'H', '+', '-', 'N', 'P', 'S', 'W', 'V', 'Z'])),
("that's", set(['C', 'B', 'E', 'D', 'H', '+', '-', 'N', 'DT+BEZ', 'P', 'S', 'T', 'W', 'V', 'Z'])),
('that', set(['C', 'D', 'I', 'H', '-', 'L', 'O', 'N', 'Q', 'P', 'S', 'T', 'W', 'CS']))]
Now I have no idea what B,C,Q, ect. could represent. So, my questions:
What are these?
What do they mean? (In case they are tags)
I think they are not tags, because who and whats don't have the WH tag indicating "wh question words".
I'll be happy if someone could post a link that includes a mapping of all possible tags and their meaning.
It looks like you have a typo. In this line:
wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
you should have set([t]) (not set(t)), like you do in the else case.
This explains the behavior you're seeing because t is a string and set(t) is making a set out of each character in the string. What you want is set([t]) which makes a set that has t as its element.
>>> t = 'WHQ'
>>> set(t)
set(['Q', 'H', 'W']) # bad
>>> set([t])
set(['WHQ']) # good
By the way, you can correct the problem and simplify things by just changing that line to:
wordsUniqeTags[w].add(t)
But, really, you should make use of the setdefault method on dict and list comprehension syntax to improve the method overall. So try this instead:
def most_ambiguous_words(words, n):
# wordsUniqeTags holds a list of uniqe tags that have been observed for a given word
wordsUniqeTags = {}
for (w,t) in words:
wordsUniqeTags.setdefault(w, set()).add(t)
# Starting to count
return [(word,tags) for word,tags in wordsUniqeTags.iteritems() if len(tags) >= n]
You are splitting your POS tags into single characters in this line:
wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
set('AT') results in set(['A', 'T']).
How about making use of the Counter and defaultdict functionality in the collections module?
from collection import defaultdict, Counter
def most_ambiguous_words(words, n):
counts = defaultdict(Counter)
for (word,tag) in words:
counts[word][tag] += 1
return [(w, counts[w].keys()) for w in counts if len(counts[word]) > n]

Categories