How to find if the keys within two nested dictionaries match? - python

Objective of the project:
Compare an input to a pre-existing index and return the closest match in terms of letter frequencies.
Basically, the comparison function would work with an index like this:
index = {'nino': {'n': '0.50', 'o': '0.25', 'i': '0.25'},
'pablo': {'l': '0.20', 'p': '0.20', 'o': '0.20', 'b': '0.20', 'a': '0.20'}}
I would then compare it with the input string, whose letter frequencies I would calculate in the same way, giving a similar dictionary:
{'y': '0.20', 'k': '0.20', 'o': '0.20', 'c': '0.20', 'r': '0.20'}
Once I have that, I would iterate through both dictionaries and check, for each indexed word, which letters are present in both.
For the letters that are present, I would compare the frequencies and award points, then compare the totals and return the word that scores the most points.
I have had no trouble with the end of the code.
However, what I cannot seem to get right is the iteration over the two dictionaries and their nested elements (each value is a dictionary, after all).
I have tried converting both to sets and taking their union, but then I am unable to do the next part: I get an error saying that sets are immutable.
Then I tried adapting code from an answer I found here:
python dictionary match key values in two dictionaries
Then I tried this option, inspired by the answer above:
if all(string_index[k] == v for k, v in index.items() if k in index):
But then I get a KeyError for 'rocky' (the first key), which tells me that somewhere it is not iterating over and comparing what I want it to compare.
So I am stuck on the iteration part.
Once I get it right I know I can finish it.
Thanks very much for any hints or tips!
index={}
#Get frequency of a letter
def frequency_return(string,letter):
    count=0
    for letters in string:
        if letters==letter:
            count+=1
    return count
#Scan all letters: if a letter has not been searched then count
def get_frequency(string):
    range_string=string
    length_string=len(string)
    datastore={}
    target=0
    frequency=0
    while len(range_string)!=0:
        # datastore.append(range_string[target])
        frequency = (int(frequency_return(range_string,range_string[target]))/length_string)
        frequency = format(frequency, '.2f')
        datastore.update({range_string[target]:frequency})
        range_string = range_string.replace(range_string[target],'')
    return datastore
def index_string(string):
    if string not in index:
        index.update({string: (get_frequency(string))})
    return index
index_string("pablo")
index_string("rocky")
index_string("rigo")
index_string("nino")
print (index)
###############################################################################################
def comparator (string, index):
    string_index=get_frequency(string)
    result={}
    if all(string_index[k] == v for k, v in index.items() if k in index):
        result.update(string_index)
    return result
print(comparator("baobab", index))

I think you've misunderstood what you're iterating over. You get the KeyError for one specific reason, in this line:
if all(string_index[k] == v for k, v in index.items() if k in index):
In the for loop, you're not iterating over the keys of the 'rigo' or 'pablo' dictionary. Instead, you're iterating over the outer dictionary, whose keys are 'rigo', 'nino', 'rocky', 'pablo' (this is k in that code) and whose values are {'a': '0.20', 'p': '0.20', 'b': '0.20', 'l': '0.20', 'o': '0.20'}, {'i': '0.25', 'r': '0.25', 'g': '0.25', 'o': '0.25'}, etc.
You can try it with this little snippet:
>>> for k,v in index.items():
...     print("key is:{}, value is:{}".format(k,v))
...
key is:pablo, value is:{'a': '0.20', 'p': '0.20', 'b': '0.20', 'l': '0.20', 'o': '0.20'}
key is:rigo, value is:{'i': '0.25', 'r': '0.25', 'g': '0.25', 'o': '0.25'}
key is:nino, value is:{'i': '0.25', 'o': '0.25', 'n': '0.50'}
key is:rocky, value is:{'y': '0.20', 'c': '0.20', 'r': '0.20', 'k': '0.20', 'o': '0.20'}
What's more, the if condition doesn't make much sense: since you're iterating over index.items(), k is always in index.
Finally, as k is one of the values 'rigo', 'rocky', 'pablo', 'nino', this part:
string_index[k] == v
...tries to look up the key 'rigo' in string_index, which is not one of string_index.keys(), so the program raises a KeyError.
As suggested, try to rewrite your code or use a more suitable data structure from the collections module.
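One way to iterate over the nested dictionaries, as a minimal sketch: loop over each indexed word together with its frequency table, and only then look at the letters. The function name shared_letters and the sample data are illustrative assumptions, not part of the question's code.

```python
# Sample data in the same shape as the question's index (frequencies stored as strings).
index = {
    'nino': {'n': '0.50', 'o': '0.25', 'i': '0.25'},
    'pablo': {'l': '0.20', 'p': '0.20', 'o': '0.20', 'b': '0.20', 'a': '0.20'},
}
string_index = {'b': '0.50', 'a': '0.33', 'o': '0.17'}  # frequencies for "baobab"


def shared_letters(string_index, index):
    """Map each indexed word to the sorted letters it shares with the input."""
    result = {}
    for word, frequencies in index.items():
        # The outer loop yields words; the letters live one level down.
        common = string_index.keys() & frequencies.keys()
        result[word] = sorted(common)
    return result


print(shared_letters(string_index, index))
# {'nino': ['o'], 'pablo': ['a', 'b', 'o']}
```

From here the scoring step can compare string_index[letter] with frequencies[letter] for each shared letter, without ever using a word such as 'rocky' as a key into string_index.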

It's not quite clear what your desired output is, but I've had a go at sorting it out.
First of all, we can tidy up your calculation of letter proportions for each word by simply using a Counter:
from collections import Counter
def get_proportions(word):
    frequencies = dict(Counter(word))
    for letter, value in frequencies.items():
        frequencies[letter] = float(value)/len(word)
    return frequencies
A Counter returns the number of times it finds each letter in the word. To get this into proportions, we simply divide each value by the length of the word. To demonstrate this in use, if we do:
comparison_dict = {}
for word in ['pablo', 'rocky', 'rigo', 'nino']:
    comparison_dict[word] = get_proportions(word)
print(comparison_dict)
We print out:
{'rigo': {'i': 0.25, 'r': 0.25, 'g': 0.25, 'o': 0.25}, 'rocky': {'y': 0.2, 'c': 0.2, 'r': 0.2, 'k': 0.2, 'o': 0.2}, 'nino': {'i': 0.25, 'o': 0.25, 'n': 0.5}, 'pablo': {'a': 0.2, 'p': 0.2, 'b': 0.2, 'l': 0.2, 'o': 0.2}}
The final part of your code I assume is aiming to work out some kind of "distance" between a provided word and each word in the comparison dictionary? I've assumed you want the total difference between the given word's letter values and the dictionary word's letter values, which gives the following function:
def compare_to_dict(word, compare_to):
    props = get_proportions(word)
    comparison_scores = []
    for key in compare_to.keys():
        word_distance = sum(abs(props.get(letter, 0) - compare_to[key].get(letter, 0))
                            for letter in set(word + key))
        comparison_scores.append((key, word_distance))
    return sorted(comparison_scores, key=lambda x: x[1])
For each letter in the given word and dictionary word, we calculate the (absolute) difference between the proportions for the two words, i.e. if our given word is 'baobab' and our dictionary word is 'rigo', the letter r contributes 0.25 (0.25 - 0) while the letter o contributes 0.083333 (0.25 - 0.166667). We sort by the total of these differences, so the first entry in our returned list is the "closest" word in the dictionary to our given word.
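The per-letter arithmetic can be checked with exact fractions. This is only a verification sketch; exact_proportions is a hypothetical helper that is not part of the answer's code.

```python
from collections import Counter
from fractions import Fraction


def exact_proportions(word):
    """Letter proportions as exact fractions (hypothetical helper for this check)."""
    return {letter: Fraction(n, len(word)) for letter, n in Counter(word).items()}


given = exact_proportions('baobab')  # b: 1/2, a: 1/3, o: 1/6
entry = exact_proportions('rigo')    # r, i, g, o: 1/4 each

for letter in 'ro':
    # A letter missing from one word counts as proportion 0 there.
    diff = abs(given.get(letter, Fraction(0)) - entry.get(letter, Fraction(0)))
    print(letter, diff)
```

The letter r gives 1/4 (it is absent from 'baobab') and the letter o gives 1/12, i.e. about 0.0833.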
For example, if we print(compare_to_dict('baobab', comparison_dict)) we get:
[('pablo', 0.8666666666666666), ('rigo', 1.6666666666666665), ('rocky', 1.6666666666666665), ('nino', 1.6666666666666665)]
suggesting that 'pablo' is the closest word to 'baobab'.
I'm not sure if this is exactly what you're after, so please let me know if it isn't. Full code is as follows:
from collections import Counter
def get_proportions(word):
    frequencies = dict(Counter(word))
    for letter, value in frequencies.items():
        frequencies[letter] = float(value) / len(word)
    return frequencies
def compare_to_dict(word, compare_to):
    props = get_proportions(word)
    comparison_scores = []
    for key in compare_to.keys():
        word_distance = sum(abs(props.get(letter, 0) - compare_to[key].get(letter, 0))
                            for letter in set(word + key))
        comparison_scores.append((key, word_distance))
    return sorted(comparison_scores, key=lambda x: x[1])
comparison_dict = {}
for word in ['pablo', 'rocky', 'rigo', 'nino']:
    comparison_dict[word] = get_proportions(word)
print(comparison_dict)
print(compare_to_dict('baobab', comparison_dict))
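If only the single closest word is needed rather than the full ranking, the same distance can be passed to min. This is a sketch under the same assumption about what "closest" means (smallest total difference in letter proportions); the names proportions and closest_word are illustrative, and Python 3 division is assumed.

```python
from collections import Counter


def proportions(word):
    """Proportion of the word made up by each letter."""
    counts = Counter(word)
    return {letter: n / len(word) for letter, n in counts.items()}


def closest_word(word, candidates):
    """Return the candidate with the smallest total difference in letter proportions."""
    props = proportions(word)

    def distance(candidate):
        other = proportions(candidate)
        # Letters missing from one word count as proportion 0 there.
        return sum(abs(props.get(letter, 0) - other.get(letter, 0))
                   for letter in set(word + candidate))

    return min(candidates, key=distance)


print(closest_word('baobab', ['pablo', 'rocky', 'rigo', 'nino']))  # pablo
```

When several candidates tie, min returns the first one encountered, so the order of the candidate list decides.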

Related

how to use for i loop with strings and array [closed]

I'm trying to make a function that takes a string as input and gives back the NATO alphabet word for each corresponding letter. I've been trying to do this for days and I'm so frustrated; I don't know whether to treat them as an array or as a string.
How I imagine it would be done is by making the elements of both alphabets correspond to each other,
like Nato=alphabet,
and using a for i loop to print out only the words for the input letters.
Any hints or ideas on how/what to do?
import numpy as np
def textToNato(plaintext):
    plaintext=plaintext.lower()
    plaintext="-".join(plaintext)
    plaintext=plaintext.split()
    Nato=np.array(["Alpha","Bravo","Charlie","Delta","Echo","Foxtrot",
                   "Golf","Kilo","Lima","Mike","November","Oscar",
                   "Papa","Quebec","Romeo","Sierra","Tango","Uniform"
                   "Victor","Whiskey","Xray","Yankee","Zulu"])
    alphabet=np.array(["a","b","c","d","e","f","g","k","l","m","n","o"
                       ,"p","q","r","s","t","u","v","w","x","y","z"])
    new=plaintext
    for i in range(len(Nato)): # i have no idea what i'am trying to do here
        new=np.append(alphabet[i],Nato[i])
    return new
I would create a different data structure for doing the lookup. You can make a dictionary that maps each letter of the alphabet to its NATO word. Then loop through each letter in the word, look up the letter in the dictionary, and add the NATO word to a list.
nato_alphabet = {
    'a': 'Alpha', 'b': 'Bravo', 'c': 'Charlie', 'd': 'Delta', 'e': 'Echo',
    'f': 'Foxtrot', 'g': 'Golf', 'h': 'Hotel', 'i': 'India', 'j': 'Juliet',
    'k': 'Kilo', 'l': 'Lima', 'm': 'Mike', 'n': 'November', 'o': 'Oscar',
    'p': 'Papa', 'q': 'Quebec', 'r': 'Romeo', 's': 'Sierra', 't': 'Tango',
    'u': 'Uniform', 'v': 'Victor', 'w': 'Whiskey', 'x': 'Xray', 'y': 'Yankee',
    'z': 'Zulu'
}
def word_to_nato(word):
    output = []
    for letter in word:
        output.append(nato_alphabet[letter.lower()])
    return output
print(word_to_nato("foobar"))
['Foxtrot', 'Oscar', 'Oscar', 'Bravo', 'Alpha', 'Romeo']
I see two problems with your code:
# ...
plaintext = plaintext.split()
# ...
# "plaintext"'s value here is overwritten by
# each element of alphabet
for plaintext in alphabet:
    # ...
    # Below compares each element of the alphabet
    # with the alphabet list.
    # You want to check each element of plaintext against
    # each element of alphabet, to find the index and get the
    # corresponding element of Nato
    if plaintext == alphabet:
        # ...
I think what you want to do is:
loop through each element of your input, and
loop through each letter of the alphabet to find the index, and
use that index to get the corresponding phonetic alphabet word.
That could be done like this:
output = ''
for char1 in plaintext:
    found = False
    for i, char2 in enumerate(alphabet):
        if char1 == char2:
            output += Nato[i] + ', '
            found = True
            break
    if not found:
        output += 'not found'
return output
An easier and more efficient way is to use a dictionary (aka a hashmap):
nato_map = {
    'a' : 'Alpha',
    'b' : 'Bravo',
    # ...
}
output = ''
for char in plaintext:
    output += nato_map[char] + ', '
That way, the lookup is in constant time, rather than needing to loop through every element of the Nato list.
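A complete version of the dictionary approach might look like the sketch below. The name text_to_nato, the unknown parameter and its '?' placeholder are assumptions for illustration; using dict.get avoids a KeyError on spaces, digits or punctuation.

```python
NATO_WORDS = [
    'Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo', 'Foxtrot', 'Golf', 'Hotel',
    'India', 'Juliet', 'Kilo', 'Lima', 'Mike', 'November', 'Oscar', 'Papa',
    'Quebec', 'Romeo', 'Sierra', 'Tango', 'Uniform', 'Victor', 'Whiskey',
    'Xray', 'Yankee', 'Zulu',
]
# Each word starts with the letter it stands for, so the key can be derived.
nato_map = {word[0].lower(): word for word in NATO_WORDS}


def text_to_nato(plaintext, unknown='?'):
    """Spell out plaintext; characters without a NATO word become `unknown`."""
    return ', '.join(nato_map.get(char, unknown) for char in plaintext.lower())


print(text_to_nato('Foo bar'))
# Foxtrot, Oscar, Oscar, ?, Bravo, Alpha, Romeo
```

Joining with ', ' also avoids the trailing separator that repeated += concatenation leaves behind.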

Use a for loop to create a new dictionary using keys from another dictionary

I am writing a program for a python class. I have been given a dictionary to start with like so:
aa2mw = {
'A': 89.093, 'G': 75.067, 'M': 149.211, 'S': 105.093, 'C': 121.158,
'H': 155.155, 'N': 132.118, 'T': 119.119, 'D': 133.103, 'I': 131.173,
'P': 115.131, 'V': 117.146, 'E': 147.129, 'K': 146.188, 'Q': 146.145,
'W': 204.225, 'F': 165.189, 'L': 131.173, 'R': 174.201, 'Y': 181.189
}
I want to create a new dictionary using all of the keys from aa2mw, but with new values. The values would be calculated using a string of arbitrary length like this: (inString.count(A) / len(inString)) where A would be a single letter that matches the keys. Rather than type in each key one by one, is it possible to use a loop to make all the keys in the new dictionary the same as aa2mw? I tried to write this, but kept running into syntax errors because I wasn't sure how to combine a loop with a dictionary. My best, yet messy attempt looks like this:
aaCompositionDict = {key for key in aa2mw.items(): (self.inString.count(key) / len(self.inString))}
You almost got it, but the syntax is a bit off. Try this:
aaCompositionDict = {key: (self.inString.count(key) / len(self.inString)) for key, _ in aa2mw.items()}
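As a self-contained sketch of the same comprehension outside a class: iterating over a dictionary yields its keys directly, so .items() is not needed. The function name aa_composition and the guard for an empty string are assumptions added for illustration, and Python 3 division is assumed.

```python
aa2mw = {
    'A': 89.093, 'G': 75.067, 'M': 149.211, 'S': 105.093, 'C': 121.158,
    'H': 155.155, 'N': 132.118, 'T': 119.119, 'D': 133.103, 'I': 131.173,
    'P': 115.131, 'V': 117.146, 'E': 147.129, 'K': 146.188, 'Q': 146.145,
    'W': 204.225, 'F': 165.189, 'L': 131.173, 'R': 174.201, 'Y': 181.189
}


def aa_composition(in_string, residues):
    """Fraction of in_string made up by each residue key; all zeros for an empty string."""
    length = len(in_string)
    if length == 0:
        # Avoid ZeroDivisionError when there is nothing to count.
        return {residue: 0.0 for residue in residues}
    return {residue: in_string.count(residue) / length for residue in residues}


composition = aa_composition('AAGM', aa2mw)
print(composition['A'], composition['G'], composition['W'])  # 0.5 0.25 0.0
```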

Creating a function in Python that counts number of letters in a dictionary [duplicate]

How do I create a function that lets me input a word and builds a dictionary counting the individual letters in the word? I would want it to display as a dictionary; for example, inputting 'hello' would display {'e': 1, 'h': 1, 'l': 2, 'o': 1}
I am also required to have 2 arguments in the function, one for the string and one for the dictionary. This is different from the "Counting each letter's frequency in a string" question.
For example, I think I would have to start with:
d = {}
def count(text, d ={}):
    count = 0
    for l in text:
        if l in d:
            count +=1
        else:
            d.append(l)
    return count
But this is incorrect? Also, would I need to set a default value for text, by writing text="" in case the user does not actually enter any word?
Furthermore, if there were existing values already in the dictionary, I want it to add to those existing counts. How would this be achieved?
For example, with dct = {'e': 1, 'h': 1, 'l': 2, 'o': 1}, if I now run >>> count_letters('hello', dct) in the terminal, the result would be {'e': 2, 'h': 2, 'l': 4, 'o': 2}
If you can use Pandas, you can use value_counts():
import pandas as pd
word = "hello"
letters = [letter for letter in word]
pd.Series(letters).value_counts().to_dict()
Output:
{'e': 1, 'h': 1, 'l': 2, 'o': 1}
Otherwise, use a dict comprehension and a loop:
letter_ct = {letter:0 for letter in word}
for letter in word:
    letter_ct[letter] += 1
letter_ct
You can use Python's defaultdict:
from collections import defaultdict
def word_counter(word):
    word_dict = defaultdict(int)
    for letter in word:
        word_dict[letter] += 1
    return(word_dict)
print(word_counter('hello'))
Output:
defaultdict(<class 'int'>, {'h': 1, 'e': 1, 'l': 2, 'o': 1})
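def count_freqs(string, dictionary=None):
    # Use None as the default so that separate calls do not share one dictionary
    if dictionary is None:
        dictionary = {}
    for letter in string:
        if letter not in dictionary:
            dictionary[letter] = 1
        else:
            dictionary[letter] += 1
    return dictionary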
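For the two-argument requirement in the question, collections.Counter can add new counts onto existing ones. This is a sketch only: the name count_letters comes from the question's example call, while the defaults and the decision to leave the caller's dictionary untouched are assumptions.

```python
from collections import Counter


def count_letters(text='', counts=None):
    """Return letter counts for text, added onto any existing counts."""
    totals = Counter(counts or {})  # copy, so the caller's dict is not modified
    totals.update(text)             # adds one per character of text
    return dict(totals)


dct = {'e': 1, 'h': 1, 'l': 2, 'o': 1}
print(count_letters('hello', dct))
print(count_letters('hello'))
```

The first call gives counts of 2, 2, 4 and 2 for e, h, l and o; the second starts from an empty dictionary because no existing counts were passed in.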

How to write String template file with multiple loop elements?

I have a small issue writing for-loop elements into a string template: when I try to build a string template from three loop elements, only the last element is printed, not the first two. I believe the error comes from how I write the file, but I couldn't work out what the actual problem with my code is. Could someone kindly help me with this?
My script:
from string import Template
import os
AMONIOACIDS = {'A': 'ALA', 'C': 'CYS', 'E': 'GLU', 'D': 'ASP', 'G': 'GLY',
               'F': 'PHE', 'I': 'ILE', 'H': 'HIS', 'K': 'LYS', 'M': 'MET',
               'L': 'LEU', 'N': 'ASN', 'Q': 'GLN', 'P': 'PRO', 'S': 'SER',
               'R': 'ARG', 'T': 'THR', 'W': 'TRP', 'V': 'VAL', 'Y': 'TYR'}
rPrS={'C': '102', 'A': '104','H': '12'}
a=[]
b=[]
count=1
for single, third in AMONIOACIDS.iteritems():
    for rS,rP in rPrS.iteritems():
        if rS == single:
            a.append(["s"+str(count)+"=selection(mdl1.chains["+chain+"].residues["+rP+"])"])
            b.append(["s"+str(count)+".mutate(residue_type='"+third+"')"])
str='''Loop elements\n'''
for i,j in zip (a,b):
    i=''.join(i)
    j=''.join(j)
    str+='''$i\n'''
    str+='''$j\n'''
str=Template(str)
str.substitute(i=i, j=j)
file = open(os.getcwd() + '/' + 'model.py', 'w')
file.write(str.substitute(i=i,j=j))
file.close()
Expected output:
Loop elements
s1=selection(mdl1.chains[A].residues[104])
s1.mutate(residue_type='ALA')
s2=selection(mdl1.chains[A].residues[102])
s2.mutate(residue_type='CYS')
s3=selection(mdl1.chains[A].residues[12])
s3.mutate(residue_type='HIS')
What I am getting:
Loop elements
s3=selection(mdl1.chains[A].residues[12])
s3.mutate(residue_type='HIS')
s3=selection(mdl1.chains[A].residues[12])
s3.mutate(residue_type='HIS')
s3=selection(mdl1.chains[A].residues[12])
s3.mutate(residue_type='HIS')
Your template is getting its substitution values from the last values of i and j from the for loop. You need to persist values from the previous iteration. How? You could use a dictionary and a count to store and distinguish values at each iteration.
You can substitute values in a template using a dictionary. I have used the count variable to create corresponding dictionary keys at each iteration: i_0, i_1, i_2, and j_0, j_1, j_2. These same names are used as identifiers in the template $i_0, $i_1, $i_2, and $j_0, $j_1, $j_2.
safe_substitute substitutes the value at each key into the template, e.g. key i_0 into the template identifier $i_0, and leaves any placeholder without a matching key unchanged instead of raising an error.
The dictionary stores all values of i and j at each iteration, and the substitution in your template is done taking the appropriate values at each key in the dictionary. This part should fix it:
# your previous lines of code
count = 0
d = {}
s='''Loop elements\n'''
for i,j in zip (a,b):
    d['i_{}'.format(count)] = ''.join(i)
    d['j_{}'.format(count)] = ''.join(j)
    s+='$i_{}\n'.format(count)
    s+='$j_{}\n'.format(count)
    count += 1
print(s)
print(d)
s=Template(s)
file = open(os.getcwd() + '/' + 'model.py', 'w')
file.write(s.safe_substitute(d))
file.close()
I have replaced the name str with s to avoid shadowing the builtin str. No other changes are required in the preceding code blocks before the fix.
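An alternative, offered only as a sketch: keep one small template for a single pair of lines and substitute it once per iteration, then join the results. The sample pairs below stand in for the question's lists a and b, and the names pairs, pair_template and body are illustrative.

```python
from string import Template

# Sample statement pairs standing in for the question's lists a and b.
pairs = [
    ("s1=selection(mdl1.chains[A].residues[104])", "s1.mutate(residue_type='ALA')"),
    ("s2=selection(mdl1.chains[A].residues[102])", "s2.mutate(residue_type='CYS')"),
]

pair_template = Template("$i\n$j\n")  # one small template, reused for every pair

# Substituting inside the loop captures the values of each iteration,
# instead of only the last i and j.
body = "Loop elements\n" + "".join(
    pair_template.substitute(i=i, j=j) for i, j in pairs
)
print(body)
```

The resulting string can then be written to model.py exactly as before; no numbered placeholders or dictionary are needed with this layout.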

Translating characters in a string to multiple characters using Python

I have a list of strings with prefix characters representing the multiplying factor for the number. So if I have data like:
data = ['101n', '100m', '100.100f']
I want to use the dictionary
prefix_dict = {'y': 'e-24', 'z': 'e-21', 'a': 'e-18', 'f': 'e-15', 'p': 'e-12',
'n': 'e-9', 'u': 'e-6', 'm': 'e-3', 'c': 'e-2', 'd': 'e-1',
'da': 'e1', 'h': 'e2', 'k': 'e3', 'M': 'e6', 'G': 'e9',
'T': 'e12', 'P': 'e15', 'E': 'e18', 'Z': 'e21', 'Y': 'e24'}
to insert their corresponding strings. When I look at the other questions similar to mine, one character is being translated into another single character. Is there a way to use the translate function to translate one character into multiple characters, or should I be approaching this differently?
You can use a regex for this; it works for 'da' as well:
>>> data = ['101n', '100m', '100.100f', '1d', '1da']
>>> import re
>>> r = re.compile(r'([a-zA-Z]+)$')
>>> for d in data:
...     print(r.sub(lambda m: prefix_dict.get(m.group(1), m.group(1)), d))
...
101e-9
100e-3
100.100e-15
1e-1
1e1
And a non-regex version using itertools.takewhile:
>>> from itertools import takewhile
>>> def find_suffix(s):
...     return ''.join(takewhile(str.isalpha, s[::-1]))[::-1]
...
>>> for d in data:
...     sfx = find_suffix(d)
...     print(d.replace(sfx, prefix_dict.get(sfx, sfx)))
...
101e-9
100e-3
100.100e-15
1e-1
1e1
Try:
for i, entry in enumerate(data):
    for key, value in sorted(prefix_dict.items(),
                             key = lambda x: len(x[0]), reverse=True):
        # need to sort the dictionary so that 'da' always comes before 'a'
        if key in entry:
            data[i] = entry.replace(key, value)
            # stop at the first (longest) match so that 'a' and 'd' do not overwrite 'da'
            break
print(data)
This works for arbitrary combinations in the dictionary and the data. If the dictionary key is always only 1 character long, you have lots of other solutions posted here.
import re
data = ['101da', '100m', '100.100f']
prefix_dict = {'y': 'e-24', 'z': 'e-21', 'a': 'e-18', 'f': 'e-15', 'p': 'e-12',
               'n': 'e-9', 'u': 'e-6', 'm': 'e-3', 'c': 'e-2', 'd': 'e-1',
               'da': 'e1', 'h': 'e2', 'k': 'e3', 'M': 'e6', 'G': 'e9',
               'T': 'e12', 'P': 'e15', 'E': 'e18', 'Z': 'e21', 'Y': 'e24'}
comp = re.compile(r"[^\[A-Za-z]")
for ind,d in enumerate(data):
    pre = re.sub(comp,"",d)
    data[ind] = d.replace(pre,prefix_dict.get(pre))
print(data)
['101e1', '100e-3', '100.100e-15']
You can use pre = [x for x in d if x.isalpha()][0] instead of using re, although that picks up only the first letter, so it works only for single-character prefixes (not 'da').
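Another option, sketched here under the assumption that the prefix always sits at the very end of the string, is to build the pattern from the dictionary keys themselves so that only known prefixes are replaced. The names suffix_pattern and expand_prefix are illustrative.

```python
import re

prefix_dict = {'y': 'e-24', 'z': 'e-21', 'a': 'e-18', 'f': 'e-15', 'p': 'e-12',
               'n': 'e-9', 'u': 'e-6', 'm': 'e-3', 'c': 'e-2', 'd': 'e-1',
               'da': 'e1', 'h': 'e2', 'k': 'e3', 'M': 'e6', 'G': 'e9',
               'T': 'e12', 'P': 'e15', 'E': 'e18', 'Z': 'e21', 'Y': 'e24'}

# Longest keys first so that 'da' is tried before 'd' and 'a'.
alternatives = '|'.join(sorted(map(re.escape, prefix_dict), key=len, reverse=True))
suffix_pattern = re.compile(r'(?:%s)$' % alternatives)


def expand_prefix(entry):
    """Replace a trailing SI prefix with its exponent; other strings are unchanged."""
    return suffix_pattern.sub(lambda m: prefix_dict[m.group(0)], entry)


data = ['101n', '100m', '100.100f', '1d', '1da']
print([expand_prefix(d) for d in data])
# ['101e-9', '100e-3', '100.100e-15', '1e-1', '1e1']
```

The resulting strings are valid float literals, so float(expand_prefix('101n')) can be used if numbers rather than strings are wanted.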
