python position frequency dictionary of letters in words - python

To efficiently get the frequencies of the letters of a given alphabet (here ABC) in a string code, as a dictionary, I can write a function like this (Python 3):
def freq(code):
    return {n: code.count(n) / float(len(code)) for n in 'ABC'}
Then
code='ABBBC'
freq(code)
Gives me
{'A': 0.2, 'C': 0.2, 'B': 0.6}
But how can I get the frequencies for each position across a list of strings of unequal lengths? For instance, mcode=['AAB', 'AA', 'ABC', ''] should give me a nested structure like a list of dicts (where each dict holds the frequencies for one position):
[{'A': 1.0, 'C': 0.0, 'B': 0.0},
{'A': 0.66, 'C': 0.0, 'B': 0.33},
{'A': 0.0, 'C': 0.5, 'B': 0.5}]
I cannot figure out how to compute the frequencies per position across all strings and wrap this in a list comprehension. Inspired by other SO posts on word counts, e.g. the well-discussed post "Python: count frequency of words in a list", I thought Counter from the collections module might help.
Understand it like this - write the mcode strings on separate lines:
AAB
AA
ABC
Then what I need is the column-wise frequencies (columns AAA, AAB, BC) of the alphabet ABC, as a list of dicts where each list element holds the frequencies of ABC for one column.

A much shorter solution:
from itertools import zip_longest
def freq(code):
    # zip_longest pads shorter words with None; leave the padding out of the total.
    l = len(code) - code.count(None)
    return {n: code.count(n) / l for n in 'ABC'}
mcode = ['AAB', 'AA', 'ABC', '']
results = [freq(code) for code in zip_longest(*mcode)]
print(results)
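To see why the None count is subtracted, it may help to look at what zip_longest(*mcode) yields: one tuple per column, with shorter words padded by None. A minimal illustration, using the mcode list from the question:

```python
from itertools import zip_longest

mcode = ['AAB', 'AA', 'ABC', '']

# Each tuple is one column across all words; missing letters become None.
columns = list(zip_longest(*mcode))
print(columns)
# [('A', 'A', 'A', None), ('A', 'A', 'B', None), ('B', None, 'C', None)]
```

Since the longest word decides how many columns there are, every column contains at least one real letter, so the divisor in freq is never zero.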

Here is an example; the steps are briefly explained in the comments. Counter from the collections module is not used, because the mapping for each position must also contain characters that do not occur at that position, and the order of the frequencies does not seem to matter.
def freq(*words):
    # All dictionaries contain all characters as keys, even
    # if a character is not present at a position.
    # Create a sorted list of characters in chars.
    chars = set()
    for word in words:
        chars |= set(word)
    chars = sorted(chars)
    # Get the number of positions.
    max_position = max(len(word) for word in words)
    # Initialize the result list of dictionaries.
    result = [
        dict((char, 0) for char in chars)
        for position in range(max_position)
    ]
    # Count characters.
    for word in words:
        for position in range(len(word)):
            result[position][word[position]] += 1
    # Change to frequencies
    for position in range(max_position):
        count = sum(result[position].values())
        for char in chars:
            result[position][char] /= count  # float(count) for Python 2
    return result
# Testing
from pprint import pprint
mcode = ['AAB', 'AA', 'ABC', '']
pprint(freq(*mcode))
Result (Python 3):
[{'A': 1.0, 'B': 0.0, 'C': 0.0},
{'A': 0.6666666666666666, 'B': 0.3333333333333333, 'C': 0.0},
{'A': 0.0, 'B': 0.5, 'C': 0.5}]
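In CPython 3.6 the dictionaries also keep their insertion order (the language guarantees this from Python 3.7 on), so the keys stay in sorted order here; earlier versions can use OrderedDict from collections instead of dict.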

Your code isn't efficient at all:
- You first need to define which letters you'd like to count.
- You need to scan the string once for each distinct letter.
You could just use Counter:
import itertools
from collections import Counter
mcode = ['AAB', 'AA', 'ABC', '']
all_letters = set(''.join(mcode))
def freq(code):
    code = [letter for letter in code if letter is not None]
    n = len(code)
    counter = Counter(code)
    return {letter: counter[letter] / n for letter in all_letters}
print([freq(x) for x in itertools.zip_longest(*mcode)])
# [{'A': 1.0, 'C': 0.0, 'B': 0.0}, {'A': 0.6666666666666666, 'C': 0.0, 'B': 0.3333333333333333}, {'A': 0.0, 'C': 0.5, 'B': 0.5}]
For Python2, you could use itertools.izip_longest.
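If the same script has to run on both versions, one option is to pick the import at runtime. The sketch below is only an illustration: freq_columns is a hypothetical name, and the explicit float() is there because / on two integers truncates in Python 2.

```python
from collections import Counter

try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2


def freq_columns(words):
    """Return one {letter: frequency} dict per column of the given words."""
    letters = sorted(set(''.join(words)))
    result = []
    for column in zip_longest(*words):
        # Drop the None padding added for words shorter than this column.
        present = [c for c in column if c is not None]
        counter = Counter(present)
        total = float(len(present))  # float() keeps true division on Python 2
        result.append({letter: counter[letter] / total for letter in letters})
    return result


print(freq_columns(['AAB', 'AA', 'ABC', '']))
```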

Related

How to get a value in a tuple in a dictionary?

I want to access the values in a tuple within a dictionary using a lambda function.
I need to order the subjects by average GPA, computed from the grades of the students in each class.
I have tried using a lambda but I could not figure it out.
grade = {'A': 4.0, 'B': 3.0, 'C': 2.0, 'D': 1.0, 'F' : 0.0}
subjects = {'math': {('Jack', 'A'),('Larry', 'C')}, 'English': {('Kevin', 'C'),('Tom','B')}}
def highestAverageOfSubjects(subjects):
    return
The output needs to be ['math','English'], since the average GPA of math (3.0) is greater than the average GPA of English (2.5).
You can easily sort everything by using sorted with a key function:
Grade = {'A': 4.0, 'B': 3.0, 'C': 2.0, 'D': 1.0, 'F' : 0.0}
subject = {'math': {('Jack', 'A'),('Larry', 'C')}, 'English': {('Kevin', 'C'),('Tom','B')}}
result = sorted(subject, key=lambda x: sum(Grade[g] for _, g in subject[x]) / len(subject[x]), reverse=True)
print(result)
Output:
['math','English']
If, as a secondary criterion, you also want to sort by the number of students:
result = sorted(subject, key=lambda x: (sum(Grade[g] for _, g in subject[x]) / len(subject[x]), len(subject[x])), reverse=True)
print(result)
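The lambda packs a lot into one expression. As an illustration only, the same sort key can be written as a named helper (average_gpa is a hypothetical name, not part of the answer), which makes the per-subject averages easy to inspect:

```python
Grade = {'A': 4.0, 'B': 3.0, 'C': 2.0, 'D': 1.0, 'F': 0.0}
subject = {'math': {('Jack', 'A'), ('Larry', 'C')},
           'English': {('Kevin', 'C'), ('Tom', 'B')}}


def average_gpa(name):
    # Each entry is a (student, letter_grade) tuple; only the grade is needed.
    students = subject[name]
    return sum(Grade[g] for _, g in students) / len(students)


averages = {name: average_gpa(name) for name in subject}
print(averages)  # {'math': 3.0, 'English': 2.5}
print(sorted(subject, key=average_gpa, reverse=True))  # ['math', 'English']
```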
One issue with your implementation is that you have used a set as the values in your subject dict. This means you have to loop over each element. But once you have an element, the grade is simply elem[1].
For example:
Grade = {'A': 4.0, 'B': 3.0, 'C': 2.0, 'D': 1.0, 'F' : 0.0}
subject = {'math': {('Jack', 'A'),('Larry', 'C')}, 'English': {('Kevin', 'C'),('Tom','B')}}
for elem in subject['math']:
    print(elem[1])
Output:
C
A
If in the print above you just print(elem) then you'd see something like:
('Larry', 'C')
('Jack', 'A')
So this way you could easily extend your highAveSub(subject) implementation to get what you want.
To find the average grade of a subject:
def highAveSub(subname):
    total = 0
    for elem in subject[subname]:  # Because your values are of type set, not dict.
        # Cross-reference the numerical value of the grade. You could also
        # simply use enums; I'll leave that to you to find out.
        total = total + Grade[elem[1]]
    avg = total / len(subject[subname])
    return avg

Multiplying values from a dictionary if it exists in a list

I am trying to calculate some sentence probabilities.
I have a dictionary that contains some values for different letters:
{'a': 0.2777777777777778, 'b': 0.3333333333333333, 'c': 0.3888888888888889}
I then have separate sentences in a list such as:
['aabc', 'abbcc', 'cba', 'abcd', 'adeb']
What I am trying to do is a probability calculation: for each sentence in the list, multiply together the values of its letters. For example,
aabc would be 0.2777*0.2777*0.3333*0.388888
How would I go through this list and do this multiplication for each string?
You can use reduce to reduce your sentence into its final probability (note that if a character does not have a probability, I just use 1 to multiply):
from functools import reduce
probs = {'a': 0.2777777777777778, 'b': 0.3333333333333333, 'c': 0.3888888888888889}
sentences = ['aabc', 'abbcc', 'cba', 'abcd', 'adeb']
result = [reduce(lambda acc, curr: probs.get(curr, 1) * acc, s, 1) for s in sentences]
print(result)
# [0.010002286236854138, 0.004667733577198597, 0.0360082304526749, 0.03600823045267489, 0.09259259259259259]
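On Python 3.8 or later, math.prod is an alternative to reduce for the same calculation. A minimal sketch, keeping the same choice of multiplying by 1 for characters without a probability (sentence_probability and its default parameter are hypothetical names):

```python
from math import prod  # available from Python 3.8

probs = {'a': 0.2777777777777778, 'b': 0.3333333333333333, 'c': 0.3888888888888889}
sentences = ['aabc', 'abbcc', 'cba', 'abcd', 'adeb']


def sentence_probability(sentence, probs, default=1):
    # Characters missing from probs contribute `default` to the product.
    return prod(probs.get(char, default) for char in sentence)


result = [sentence_probability(s, probs) for s in sentences]
print(result)
```

Passing a small default such as 0.05 instead of 1 would penalise unknown characters rather than ignore them.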
This is a pretty non-fanciful way of doing it:
values = {'a': 0.2777777777777778, 'b': 0.3333333333333333, 'c': 0.3888888888888889, 'd':0.1234, 'e':0.5678}
strings = ['aabc', 'abbcc', 'cba', 'abcd', 'adeb']
for string in strings:
    product = 1
    for char in string:
        product *= values[char]
    print(product)
EDIT :
If we want to check whether the dictionary has a value for each character, we can do the following and use unk for the missing ones:
values = {'a': 0.2777777777777778, 'b': 0.3333333333333333, 'c': 0.3888888888888889}
strings = ['aabc', 'abbcc', 'cba', 'abcd', 'adeb']
unk = 0.05
for string in strings:
    product = 1
    for char in string:
        if char in values:
            product *= values[char]
        else:
            product *= unk
    print(product)
You could use a double for loop. The outer for would iterate over the sentence list, and the inner for can iterate over each letter in the sentence. Python for loop syntax is for item in iterable_object: <code to run>. Try using this information and see how far you can get.
You can use a list comprehension and a for loop to do this.
def prob(string, prob):
    out = 1
    probs = [prob[char] for char in string]
    for x in probs:
        out *= x
    return out
prob is a dictionary of probabilities and string is the string. for char in string iterates over each character in the string.

How to find if the keys within two nested dictionaries match?

Objective of the project:
Compare an input to a pre-existing index and return the closest match in terms of letter frequencies.
Basically, the comparison function would take an index like this:
index = {'nino': {'n': '0.50', 'o': '0.25', 'i': '0.25'},
'pablo': {'l': '0.20', 'p': '0.20', 'o': '0.20', 'b': '0.20', 'a': '0.20'}}
and compare it with the input string from which I would calculate the letters frequencies as well to return a similar output, the letters frequencies:
{'y': '0.20', 'k': '0.20', 'o': '0.20', 'c': '0.20', 'r': '0.20'}
Once I have that, I would iterate through both dictionaries and check for each item what letters are present.
Once they are present, I would compare the frequencies in the word and attribute points, then compare the results and return the one that scores most points.
I have had no trouble with the end of the code.
However, what I cannot seem to get right is the iteration over the two dictionaries and their nested elements (each value is a dictionary, after all).
I have tried the two-sets approach, taking the union of both sets, but then I am unable to do the next part: it says that sets are immutable.
Then I tried adapting code from an answer I found here:
python dictionary match key values in two dictionaries
Then I tried this option inspired from the answer above:
if all(string_index[k] == v for k, v in index.items() if k in index):
But then I get a key error, rocky (first key), which tells me that somewhere it is not iterating and comparing what I want it to compare.
And there I am stuck in the iteration part.
Once I get it right I know I can finish it.
Thanks very much for any hint or tips!
index={}
#Get frequency of a letter
def frequency_return(string,letter):
    count=0
    for letters in string:
        if letters==letter:
            count+=1
    return count
#Scan all letters: if a letter has not been searched then count
def get_frequency(string):
    range_string=string
    length_string=len(string)
    datastore={}
    target=0
    frequency=0
    while len(range_string)!=0:
        # datastore.append(range_string[target])
        frequency = (int(frequency_return(range_string,range_string[target]))/length_string)
        frequency = format(frequency, '.2f')
        datastore.update({range_string[target]:frequency})
        range_string = range_string.replace(range_string[target],'')
    return datastore
def index_string(string):
    if string not in index:
        index.update({string: (get_frequency(string))})
    return index
index_string("pablo")
index_string("rocky")
index_string("rigo")
index_string("nino")
print (index)
###############################################################################################
def comparator (string, index):
    string_index=get_frequency(string)
    result={}
    if all(string_index[k] == v for k, v in index.items() if k in index):
        result.update(string_index)
    return result
print(comparator("baobab", index))
I think you've misunderstood what you're iterating over. You get the KeyError for one specific reason, in this line:
if all(string_index[k] == v for k, v in index.items() if k in index):
In the generator expression, you're not iterating over the keys of the 'rigo' or 'pablo' dictionary. Instead, you're iterating over the outer dictionary, whose keys are 'rigo', 'nino', 'rocky', 'pablo' (this is k in that code) and whose values are {'a': '0.20', 'p': '0.20', 'b': '0.20', 'l': '0.20', 'o': '0.20'}, {'i': '0.25', 'r': '0.25', 'g': '0.25', 'o': '0.25'}, etc.
You can try it with this little snippet:
>>> for k,v in index.items():
...     print("key is:{}, value is:{}".format(k,v))
...
"key is:pablo, value is:{'a': '0.20', 'p': '0.20', 'b': '0.20', 'l': '0.20', 'o': '0.20'}"
"key is:rigo, value is:{'i': '0.25', 'r': '0.25', 'g': '0.25', 'o': '0.25'}"
"key is:nino, value is:{'i': '0.25', 'o': '0.25', 'n': '0.50'}"
"key is:rocky, value is:{'y': '0.20', 'c': '0.20', 'r': '0.20', 'k': '0.20', 'o': '0.20'}"
What's more, the if k in index filter doesn't make much sense: since you're iterating over index.items(), k is always in index.
Finally, as k is one of the values 'rigo', 'rocky', 'pablo', 'nino', this part:
string_index[k] == v
...tries to look up a key such as 'rigo' in string_index, which is not an element of string_index.keys(), so the program raises a KeyError.
As suggested, try to rewrite your code or use some better data structures from collections.
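To reach the letter frequencies, the loop needs two levels: one over the words in index, and one over each word's own frequency dictionary. A minimal sketch, assuming the string-formatted frequencies from the question; shared_letters is a hypothetical helper name, and what to do with the matches (the scoring) is left open:

```python
index = {'nino': {'n': '0.50', 'o': '0.25', 'i': '0.25'},
         'pablo': {'l': '0.20', 'p': '0.20', 'o': '0.20', 'b': '0.20', 'a': '0.20'}}
# Frequencies of "baobab", formatted as in the question's get_frequency.
string_index = {'b': '0.50', 'a': '0.33', 'o': '0.17'}


def shared_letters(string_index, index):
    result = {}
    for word, frequencies in index.items():        # outer level: indexed words
        matches = {}
        for letter, freq in frequencies.items():   # inner level: that word's letters
            if letter in string_index:             # membership test avoids the KeyError
                matches[letter] = (string_index[letter], freq)
        result[word] = matches
    return result


print(shared_letters(string_index, index))
```

Each entry maps an indexed word to the letters it shares with the input, paired as (input frequency, indexed frequency).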
It's not quite clear what your desired output is, but I've had a go at sorting it out.
First of all, we can tidy up your calculation of letter proportions for each word by simply using a Counter:
from collections import Counter
def get_proportions(word):
    frequencies = dict(Counter(word))
    for letter, value in frequencies.items():
        frequencies[letter] = float(value)/len(word)
    return frequencies
A Counter returns the number of times it finds each letter in the word. To get this into proportions, we simply divide each value by the length of the word. To demonstrate this in use, if we do:
comparison_dict = {}
for word in ['pablo', 'rocky', 'rigo', 'nino']:
    comparison_dict[word] = get_proportions(word)
print(comparison_dict)
We print out:
{'rigo': {'i': 0.25, 'r': 0.25, 'g': 0.25, 'o': 0.25}, 'rocky': {'y': 0.2, 'c': 0.2, 'r': 0.2, 'k': 0.2, 'o': 0.2}, 'nino': {'i': 0.25, 'o': 0.25, 'n': 0.5}, 'pablo': {'a': 0.2, 'p': 0.2, 'b': 0.2, 'l': 0.2, 'o': 0.2}}
The final part of your code I assume is aiming to work out some kind of "distance" between a provided word and each word in the comparison dictionary? I've assumed you want the total difference between the given word's letter values and the dictionary word's letter values, which gives the following function:
def compare_to_dict(word, compare_to):
    props = get_proportions(word)
    comparison_scores = []
    for key in compare_to.keys():
        word_distance = sum(abs(props.get(letter, 0) - compare_to[key].get(letter, 0))
                            for letter in set(word + key))
        comparison_scores.append((key, word_distance))
    return sorted(comparison_scores, key=lambda x: x[1])
For each letter in the given word and dictionary word, we calculate the (absolute) difference between the proportions for the two words - i.e. if our given word is 'baobab' and our dictionary word is 'rigo', the letter r contributes 0.25 (0.25 - 0) while the letter o contributes 0.083333 (0.25 - 0.166666). We sum these differences for each dictionary word and sort by the total, so the first entry in our returned list is the "closest" word in the dictionary to our given word.
For example, if we print(compare_to_dict('baobab', comparison_dict)) we get:
[('pablo', 0.8666666666666666), ('rigo', 1.6666666666666665), ('rocky', 1.6666666666666665), ('nino', 1.6666666666666665)]
suggesting that 'pablo' is the closest word to 'baobab'.
I'm not sure if this is exactly what you're after, so please let me know if it isn't. Full code is as follows:
from collections import Counter
def get_proportions(word):
    frequencies = dict(Counter(word))
    for letter, value in frequencies.items():
        frequencies[letter] = float(value) / len(word)
    return frequencies
def compare_to_dict(word, compare_to):
    props = get_proportions(word)
    comparison_scores = []
    for key in compare_to.keys():
        word_distance = sum(abs(props.get(letter, 0) - compare_to[key].get(letter, 0))
                            for letter in set(word + key))
        comparison_scores.append((key, word_distance))
    return sorted(comparison_scores, key=lambda x: x[1])
comparison_dict = {}
for word in ['pablo', 'rocky', 'rigo', 'nino']:
    comparison_dict[word] = get_proportions(word)
print(comparison_dict)
print(compare_to_dict('baobab', comparison_dict))

iterating through list of list and dictionary python

I have this code in Python:
from pprint import pprint
def addDictionary(States,Transition,Languaje,Tr):
    for s in States :
        D = {}
        Transition[s] = D # this create {"state1":{"symbol1":}}
        for l in Languaje:
            for i in range(len(Tr)):
                D[l] = Tr[i][0]
def addStates(States):
    cant = int(raw_input("how many states?: "))
    for i in range(cant):
        c = "q"+str(i)
        States.append(c)
def addLan(Languaje):
    c = int(raw_input("how many symbols?: "))
    for j in range(c):
        l = raw_input("symbol: ")
        Languaje.append(l)
if __name__ == "__main__":
    States=[]
    Languaje=[]
    Transition={} #{"state":{"symbol1":"transition value","symbol2":"transition value"}}
    Tr=[["q2","q1"],["","q2"]] #transition values
    addStates(States)
    addLan(Languaje)
    addDictionary(States,Transition,Languaje,Tr)
    pprint(Transition)
and this is the output:
{'q0': {'a': '', 'b': ''}, 'q1': {'a': '', 'b': ''}}
what I want is something like this:
{'q0': {'a': 'q2', 'b': 'q1'}, 'q1': {'a': '', 'b': 'q2'}}
I want to put the values of the list Tr into my dictionary.
This is only example code. I want to implement a deterministic finite automaton that I developed for a class at my university.
I forgot to mention: to test the code, first enter 2, then 2, and then a and b, because I only want to test my code with a 2x2 list. Later I will change it to an n x m list. (Sorry for my "medium" skills in English :V)
One more thing: the problem is in the function addDictionary().
This:
def addDictionary(States, Transition, Languaje, Tr):
    for s, t in zip(States, Tr):
        Transition[s] = dict(zip(Languaje, t))
generates this output:
{'q0': {'a': 'q2', 'b': 'q1'}, 'q1': {'a': '', 'b': 'q2'}}
for two states and symbols a and b.
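This works because zip pairs each state with its row of Tr, and dict(zip(Languaje, t)) pairs each symbol with the entry in the same column of that row. A small non-interactive illustration, with the inputs hard-coded instead of read via raw_input (an assumption made here so that the snippet runs on its own):

```python
States = ['q0', 'q1']
Languaje = ['a', 'b']
Tr = [['q2', 'q1'], ['', 'q2']]

# Outer zip: one (state, row) pair per state.
print(list(zip(States, Tr)))
# [('q0', ['q2', 'q1']), ('q1', ['', 'q2'])]

# Inner zip: one (symbol, target) pair per column of the row.
Transition = {s: dict(zip(Languaje, row)) for s, row in zip(States, Tr)}
print(Transition)
# {'q0': {'a': 'q2', 'b': 'q1'}, 'q1': {'a': '', 'b': 'q2'}}
```

The same pattern carries over to an n x m table, as long as Tr has one row per state and each row has one entry per symbol; zip silently stops at the shorter input otherwise.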

Rearranging levels of a nested dictionary in python

Is there a library that would help me rearrange the levels of a nested dictionary?
E.g., from this:
{1:{"A":"i","B":"ii","C":"i"},2:{"B":"i","C":"ii"},3:{"A":"iii"}}
To this:
{"A":{1:"i",3:"iii"},"B":{1:"ii",2:"i"},"C":{1:"i",2:"ii"}}
i.e. the first two levels of a three-level dictionary are swapped. So instead of 1 mapping to A and 3 mapping to A, we have A mapping to 1 and 3.
The solution should be practical for an arbitrary depth and move from one level to any other within.
>>> d = {1:{"A":"i","B":"ii","C":"i"},2:{"B":"i","C":"ii"},3:{"A":"iii"}}
>>> keys = ['A','B','C']
>>> e = {key:{k:d[k][key] for k in d if key in d[k]} for key in keys}
>>> e
{'C': {1: 'i', 2: 'ii'}, 'B': {1: 'ii', 2: 'i'}, 'A': {1: 'i', 3: 'iii'}}
thank god for dict comprehension
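The comprehension above needs the list of inner keys up front. If those are not known in advance, a plain loop with setdefault can collect them along the way. A minimal sketch for swapping the first two levels only (swap_levels is a hypothetical name):

```python
def swap_levels(d):
    """Swap the first two levels of a nested dict: d[x][y] becomes result[y][x]."""
    swapped = {}
    for outer_key, inner in d.items():
        for inner_key, value in inner.items():
            # Create the new outer entry on first sight of inner_key.
            swapped.setdefault(inner_key, {})[outer_key] = value
    return swapped


d = {1: {"A": "i", "B": "ii", "C": "i"}, 2: {"B": "i", "C": "ii"}, 3: {"A": "iii"}}
print(swap_levels(d))
# {'A': {1: 'i', 3: 'iii'}, 'B': {1: 'ii', 2: 'i'}, 'C': {1: 'i', 2: 'ii'}}
```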
One way to think about this would be to consider your data as a (named) array and to take the transpose. An easy way to achieve this would be to use the data analysis package Pandas:
import pandas as pd
df = pd.DataFrame({1: {"A":"i","B":"ii","C":"i"},
                   2: {"B":"i","C":"ii"},
                   3: {"A":"iii"}})
df.transpose().to_dict()
{'A': {1: 'i', 2: nan, 3: 'iii'},
'B': {1: 'ii', 2: 'i', 3: nan},
'C': {1: 'i', 2: 'ii', 3: nan}}
I don't really care about performance for my application of this, so I haven't bothered checking how efficient it is. It's based on bubble sort, so my guess is ~O(N^2).
Maybe this is convoluted, but essentially the code below works by:
- providing dict_swap_index a nested dictionary and a list. The list should be of the format [i,j,k]. Its length should be the depth of the dictionary. Each element gives the position you'd like to move the corresponding level to, e.g. [2,0,1] would indicate move level 0 to position 2, level 1 to position 0 and level 2 to position 1.
- this function performs a bubble sort on the order list and dict_, calling deep_swap to swap the levels of the dictionary which are being swapped in the order list
- deep_swap recursively calls itself to find the level provided and returns a dictionary which has been re-ordered
- swap_two_level_dict is called to swap any two levels in a dictionary.
Essentially the idea is to perform a bubble sort on the dictionary, but instead of swapping elements in a list swap levels in a dictionary.
from collections import defaultdict
from copy import deepcopy  # used by deep_swap
def dict_swap_index(dict_, order):
    for pas_no in range(len(order)-1,0,-1):
        for i in range(pas_no):
            if order[i] > order[i+1]:
                temp = order[i]
                order[i] = order[i+1]
                order[i+1] = temp
                dict_ = deep_swap(dict_, i)
    return dict_, order
def deep_swap(dict_, level):
    dict_ = deepcopy(dict_)
    if level==0:
        dict_ = swap_two_level_dict(dict_)
    else:
        for key in dict_:
            dict_[key] = deep_swap(dict_[key], level-1)
    return dict_
def swap_two_level_dict(a):
    b = defaultdict(dict)
    for key1, value1 in a.items():
        for key2, value2 in value1.items():
            b[key2].update({key1: value2})
    return b
e.g.
test_dict = {'a': {'c': {'e':0, 'f':1}, 'd': {'e':2,'f':3}}, 'b': {'c': {'g':4,'h':5}, 'd': {'j':6,'k':7}}}
result = dict_swap_index(test_dict, [2,0,1])
result
(defaultdict(dict,
{'c': defaultdict(dict,
{'e': {'a': 0},
'f': {'a': 1},
'g': {'b': 4},
'h': {'b': 5}}),
'd': defaultdict(dict,
{'e': {'a': 2},
'f': {'a': 3},
'j': {'b': 6},
'k': {'b': 7}})}),
[0, 1, 2])
