So currently I'm stuck on a question in my assignment.
The assignment question is:
Define the print_most_frequent() function which is passed two parameters, a dictionary containing words and their corresponding frequencies (how many times they occurred in a string of text), e.g.,
{"fish":9, "parrot":8, "frog":9, "cat":9, "stork":1, "dog":4, "bat":9, "rat":4}
and, an integer, the length of the keywords in the dictionary which are to be considered.
The function prints the keyword length, followed by " letter keywords: ", then prints a sorted list of all the dictionary keywords of the required length, which have the highest frequency, followed by the frequency. For example, the following code:
word_frequencies = {"fish":9, "parrot":8, "frog":9, "cat":9, "stork":1, "dog":4, "bat":9, "rat":4}
print_most_frequent(word_frequencies,3)
print_most_frequent(word_frequencies,4)
print_most_frequent(word_frequencies,5)
print_most_frequent(word_frequencies,6)
print_most_frequent(word_frequencies, 7)
prints the following:
3 letter keywords: ['bat', 'cat'] 9
4 letter keywords: ['fish', 'frog'] 9
5 letter keywords: ['stork'] 1
6 letter keywords: ['parrot'] 8
7 letter keywords: [] 0
I have written code that produces the output above, but the checker says it's wrong. Maybe it needs simplifying, but I'm struggling with how to do that. Could someone help? Thank you.
def print_most_frequent(words_dict, word_len):
    word_list = []
    freq_list = []
    for word,freq in words_dict.items():
        if len(word) == word_len:
            word_list += [word]
            freq_list += [freq]
    new_list1 = []
    new_list2 = []
    if word_list == [] and freq_list == []:
        new_list1 += []
        new_list2 += [0]
        return print(new_list1, max(new_list2))
    else:
        maximum_value = max(freq_list)
        for i in range(len(freq_list)):
            if freq_list[i] == maximum_value:
                new_list1 += [word_list[i]]
                new_list2 += [freq_list[i]]
        new_list1.sort()
        return print(new_list1, max(new_list2))
You can use:
def print_most_frequent(words_dict, word_len):
    max_freq = 0
    words = list()
    for word, frequency in words_dict.items():
        if len(word) == word_len:
            if frequency > max_freq:
                max_freq = frequency
                words = [word]
            elif frequency == max_freq:
                words.append(word)
    print("{} letter keywords:".format(word_len), sorted(words), max_freq)
It iterates over the dictionary once, considers only the words of the wanted length, and builds the list of the most frequent words, resetting that list as soon as a greater frequency is found.
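For comparison, here is a two-pass variant (my own sketch, not the answer above): collect the words of the requested length first, then take their highest frequency with max(..., default=0), which needs Python 3.4+ and covers the case where no word has that length.

```python
def print_most_frequent(words_dict, word_len):
    # Keep only the words of the requested length.
    candidates = {w: f for w, f in words_dict.items() if len(w) == word_len}
    # default=0 handles the "no words of this length" case (Python 3.4+).
    max_freq = max(candidates.values(), default=0)
    # Every candidate that reaches the maximum frequency, alphabetically.
    words = sorted(w for w, f in candidates.items() if f == max_freq)
    print("{} letter keywords:".format(word_len), words, max_freq)


word_frequencies = {"fish": 9, "parrot": 8, "frog": 9, "cat": 9,
                    "stork": 1, "dog": 4, "bat": 9, "rat": 4}
for length in range(3, 8):
    print_most_frequent(word_frequencies, length)
```

The loop prints the five lines from the assignment, ending with `7 letter keywords: [] 0`.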
One way to do this is to swap keys and values, mapping each frequency to the list of words that have it; this way you can easily get the most frequent words:
a = {"fish":9, "parrot":8, "frog":9, "cat":9, "stork":1, "dog":4, "bat":9, "rat":4}
getfunc = lambda x, dct: [i for i in dct if dct[i] == x]
new_dict = { k : getfunc(k, a) for k in a.values() }
print (new_dict)
output:
{8: ['parrot'], 1: ['stork'], 4: ['rat', 'dog'], 9: ['bat', 'fish', 'frog', 'cat']}
So now, if you want the words with frequency 9, simply write:
b = new_dict[9]
print (b, len(b))
which will give:
['cat', 'fish', 'bat', 'frog'] 4
You get to reuse the dictionary instead of calling the function again and again, so each later lookup is cheap (building new_dict this way still scans the whole dictionary once per value). If you still need a function, you could use a one-line lambda:
print_most_frequent = lambda freq, x: print (freq[x])
print_most_frequent(new_dict, 9)
print_most_frequent(new_dict, 4)
which gives:
['fish', 'bat', 'frog', 'cat']
['rat', 'dog']
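Building new_dict with the comprehension above calls getfunc once for every value, and each call scans the whole dictionary, so the construction is quadratic. A sketch of a single-pass inversion with collections.defaultdict (the variable names follow the answer; the list order shown assumes Python 3.7+, where dictionaries keep insertion order):

```python
from collections import defaultdict

a = {"fish": 9, "parrot": 8, "frog": 9, "cat": 9,
     "stork": 1, "dog": 4, "bat": 9, "rat": 4}

# One pass over the items: file each word under its frequency.
new_dict = defaultdict(list)
for word, freq in a.items():
    new_dict[freq].append(word)

print(dict(new_dict))
# {9: ['fish', 'frog', 'cat', 'bat'], 8: ['parrot'], 1: ['stork'], 4: ['dog', 'rat']}
```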
This is what I have so far:
wlist = [word for word in wlist if not any(map(lambda x: x in word, 'c'))]
This code works; however, in its current state it removes every string in wlist that contains 'c'. I would like to be able to specify an index position. For example, if
wlist = ['snake', 'cat', 'shock']
wlist = [word for word in wlist if not any(map(lambda x: x in word, 'c'))]
and I select index position 3, then only 'shock' will be removed, since it is the only string with 'c' at index 3. The current code removes both 'cat' and 'shock'. I have no idea how to integrate this; I would appreciate any help, thanks.
Simply use slicing:
out = [w for w in wlist if w[3:4] != 'c']
Output: ['snake', 'cat']
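Slicing works where plain indexing would fail because an out-of-range slice returns an empty string, while an out-of-range index raises IndexError. A small sketch wrapping the slicing answer in a function; drop_if_char_at is a hypothetical name:

```python
def drop_if_char_at(words, index, char):
    # w[index:index + 1] is '' when w is too short, so short words are
    # simply kept instead of raising IndexError.
    return [w for w in words if w[index:index + 1] != char]


wlist = ['snake', 'cat', 'shock']
print(repr('cat'[3:4]))                  # '' (slicing past the end is safe)
try:
    'cat'[3]                             # plain indexing is not
except IndexError as exc:
    print('IndexError:', exc)            # IndexError: string index out of range

print(drop_if_char_at(wlist, 3, 'c'))    # ['snake', 'cat']
print(drop_if_char_at(wlist, 0, 'c'))    # ['snake', 'shock']
```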
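You could probably use regular expressions, but I don't know them, so this just iterates through the list of words. It loops over a copy (wlist[:]), because removing items from a list while iterating over it skips elements.
for i in wlist[:]:  # iterate over a copy so that remove() doesn't skip items
    try:
        if i[3] == 'c':
            wlist.remove(i)
    except IndexError:
        continue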
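Since regular expressions came up, here is a sketch of that route (Python 3.6+ for the f-string): a pattern such as .{3}c, used with re.match, matches only when the character at index 3 is 'c', and it simply fails to match on strings that are too short. The function name drop_if_char_at_re is hypothetical:

```python
import re


def drop_if_char_at_re(words, index, char):
    # Builds e.g. '.{3}c'; re.match anchors the pattern at the start,
    # so it matches only if `char` sits exactly at position `index`.
    pattern = re.compile(f".{{{index}}}{re.escape(char)}")
    return [w for w in words if not pattern.match(w)]


wlist = ['snake', 'cat', 'shock']
print(drop_if_char_at_re(wlist, 3, 'c'))   # ['snake', 'cat']
```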
You should check only the character at index 3 of each string. If you use the slice [<selected_char>:<selected_char>+1], only the selected character is checked, and strings that are too short give an empty string instead of raising an IndexError. More about slicing: https://stackoverflow.com/a/509295/11502612
Example code:
checking_char_index = 3
wlist = ["snake", "cat", "shock", "very_long_string_with_c_char", "x", "c"]
out = [word for word in wlist if word[checking_char_index : checking_char_index + 1] != "c"]
print(out)
I have extended your list with some corner-case strings, and it works as expected:
Output:
$ python3 test.py
['snake', 'cat', 'very_long_string_with_c_char', 'x', 'c']
Write a function unique(one, two) that counts the unique words in sentence one and sentence two together. The function should also combine repeated occurrences of the same word into one string (for example, 'I' appearing twice becomes 'II').
def unique(one, two):
    ...  # to be implemented
result = unique('I like food', 'I like cat')
print(len(result))
print(sorted(result))
Output
4
['II', 'cat', 'food', 'likelike']
def unique(one, two):
    words = {}
    sentence = one.split() + two.split()
    for word in sentence:
        if word in words.keys():
            words[word] += word
        else:
            words[word] = word
    return [word for word in words.values()]
print(unique("I like food", "I like cat"))
will print ['II', 'likelike', 'food', 'cat'] (in Python 3.7+, where dictionaries keep insertion order)
This might help you:
from collections import defaultdict
def unique(s1, s2):
    d = defaultdict(list)
    for word in s1.split(' '):
        d[word].append(word)
    for word in s2.split(' '):
        d[word].append(word)
    return [''.join(word) for _, word in d.items()]
from collections import defaultdict
def unique (one, two):
    uniques = defaultdict(int)
    for word in one.split():
        uniques[word] = uniques[word] + 1
    for word in two.split():
        uniques[word] = uniques[word] + 1
    list_of_words = [word * count for word, count in uniques.items()]
    return list_of_words
result= unique('I like food', 'I like cat')
print(len(result))
print(sorted(result))
And the output:
4
['II', 'cat', 'food', 'likelike']
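The counting approach can also be written with collections.Counter, which counts every word in one call. This is a sketch rather than a required solution; it relies on Counter keeping first-seen order (Python 3.7+) only for the unsorted list, and the sorted output does not depend on it:

```python
from collections import Counter


def unique(one, two):
    # Count every word across both sentences.
    counts = Counter(one.split() + two.split())
    # Repeat each word as often as it occurred: 'I' twice -> 'II'.
    return [word * count for word, count in counts.items()]


result = unique('I like food', 'I like cat')
print(len(result))      # 4
print(sorted(result))   # ['II', 'cat', 'food', 'likelike']
```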
I've opened another thread on exactly this subject, but I think I posted too much code and didn't really know where my problem was. Now I think I have a better idea, but I still need help. What we have is a text file containing only 3-letter words. I also have a Word (node) class and a queue class. My findchildren method is supposed to find all the children of one single word; say I enter "fan", then I'm supposed to get something like ["kan", "man", ...]. The code currently looks like this:
def findchildren(mangd,parent):
    children=set()
    lparent=list(parent)
    mangd.remove(parent)
    for word in mangd:
        letters=list(word)
        count=0
        i=0
        for a in letters:
            if a==lparent[i]:
                count+=1
                i+=1
            else:
                i+=1
            if count==2:
                if word not in children:
                    children.add(word)
            if i>2:
                break
    return children
The findchildren code above currently works fine, but when I use it in my other methods (to implement the BFS search) everything takes far too long. Therefore, I would like to gather all the children in a dictionary that maps each word to a list of its children. It feels like this assignment is out of my league right now, but is this possible to do? I tried to create something like this:
def findchildren2(mangd):
    children=[]
    for word in mangd:
        lparent=list(word)
        mangd.remove(word)
        letters=list(word)
        count=0
        i=0
        for a in letters:
            if a==lparent[i]:
                count+=1
                i+=1
            else:
                i+=1
            if count==2:
                if word not in children:
                    children.append(word)
            if i>2:
                break
    return children
I suppose my last try is simply garbage; I get the error message "Set changed size during iteration".
from collections import defaultdict
def findchildren3(mangd,parent):
    children=defaultdict(list)
    lparent=list(parent)
    mangd.remove(parent)
    for word in mangd:
        letters=list(word)
        count=0
        i=0
        for a in letters:
            if a==lparent[i]:
                count+=1
                i+=1
            else:
                i+=1
            if count==2:
                children[0].append(word)
            if i>2:
                break
    return children
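The "Set changed size during iteration" error comes from findchildren2 calling mangd.remove(word) while the for loop is still iterating over mangd. A sketch that builds the whole dictionary up front without mutating the set, assuming (as findchildren does) that a child is a different word with the same letter in at least two positions; build_children and is_child are hypothetical names, and this is still O(n^2):

```python
def is_child(parent, word):
    # Count the positions where both words have the same letter.
    same = sum(p == w for p, w in zip(parent, word))
    # A word is not its own child.
    return word != parent and same >= 2


def build_children(mangd):
    # Only read from the set (never remove from it), so the
    # "Set changed size during iteration" error cannot happen.
    return {parent: [w for w in mangd if is_child(parent, w)] for parent in mangd}


words = {'fan', 'kan', 'man', 'fat', 'dog'}
children = build_children(words)
print(sorted(children['fan']))   # ['fat', 'kan', 'man']
print(children['dog'])           # []
```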
There are more efficient ways to do this (the below is O(n^2) so not great) but here is a simple algorithm to get you started:
import itertools
from collections import defaultdict
words = ['abc', 'def', 'adf', 'adc', 'acf', 'dec']
bigrams = {k: {''.join(x) for x in itertools.permutations(k, 2)} for k in words}
result = defaultdict(list)
for k, v in bigrams.iteritems():
    for word in words:
        if k == word:
            continue
        if len(bigrams[k] & bigrams[word]):
            result[k].append(word)
print result
Produces:
defaultdict(<type 'list'>, {'abc': ['adc', 'acf'], 'acf': ['abc', 'adf', 'adc'], 'adf': ['def', 'adc', 'acf'], 'adc': ['abc', 'adf', 'acf', 'dec'], 'dec': ['def', 'adc'], 'def': ['adf', 'dec']})
Here is a more efficient version with some commentary:
import itertools
from collections import defaultdict
words = ['abc', 'def', 'adf', 'adc', 'acf', 'dec']
# Build a map of {word: {bigrams}} i.e. {'abc': {'ab', 'ba', 'bc', 'cb', 'ac', 'ca'}}
bigramMap = {k: {''.join(x) for x in itertools.permutations(k, 2)} for k in words}
# 'Invert' the map so it is {bigram: {words}} i.e. {'ab': {'abc', 'bad'}, 'bc': {...}}
wordMap = defaultdict(set)
for word, bigramSet in bigramMap.iteritems():
    for bigram in bigramSet:
        wordMap[bigram].add(word)
# Create a final map of {word: {other words}} i.e. {'abc': {'bad'}, 'bad': {'abc'}}
result = defaultdict(set)
for k, v in wordMap.iteritems():
    for word in v:
        result[word] |= v ^ {word}
# Display all 'children' of each word from the original list
for word in words:
    print "The 'children' of word {} are {}".format(word, result[word])
Produces:
The 'children' of word abc are set(['acf', 'adc'])
The 'children' of word def are set(['adf', 'dec'])
The 'children' of word adf are set(['adc', 'def', 'acf'])
The 'children' of word adc are set(['adf', 'abc', 'dec', 'acf'])
The 'children' of word acf are set(['adf', 'abc', 'adc'])
The 'children' of word dec are set(['adc', 'def'])
Solution (which is, sadly, O(n^2)) for the updated requirement, in Python 3:
from collections import defaultdict
words = ['fan', 'ban', 'fbn', 'ana', 'and', 'ann']
def isChildOf(a, b):
    return sum(map(lambda xy: xy[0] == xy[1], zip(a, b))) >= 2
result = defaultdict(set)
for word in words:
    result[word] = {x for x in words if isChildOf(word, x) and x != word}
# Display all 'children' of each word from the original list
for word in words:
    print("The children of word {0} are {1}".format(word, result[word]))
Produces (the order of the elements within each set may vary between runs):
The children of word fan are {'ban', 'fbn'}
The children of word ban are {'fan'}
The children of word fbn are {'fan'}
The children of word ana are {'and', 'ann'}
The children of word and are {'ann', 'ana'}
The children of word ann are {'and', 'ana'}
The algorithm here is very simple and not very efficient, but let me try to break it down.
The isChildOf function takes two words as input and does the following:
1. It zips a and b together; both are treated as iterables, with each character being one item. For example, if a is 'fan' and b is 'ban', then zip('fan', 'ban') produces the pairs [('f', 'b'), ('a', 'a'), ('n', 'n')].
2. Next, it uses map to apply the lambda function (a fancy name for an anonymous function) to each pair produced in step 1. The function takes a pair of characters (e.g. 'f' and 'b') and returns True if they match and False otherwise. For our example this gives [False, True, True], as the first pair of characters does not match but the remaining two pairs do.
3. Finally, it calls sum on the result of step 2. True counts as 1 in Python and False as 0, so the sum for our example is 2. The function then returns whether that number is greater than or equal to 2.
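The three steps can be reproduced one at a time in Python 3. list() is used only to display the intermediate results, because zip and map return lazy iterators there:

```python
a, b = 'fan', 'ban'

# Step 1: pair up the characters in the same position.
pairs = list(zip(a, b))
print(pairs)              # [('f', 'b'), ('a', 'a'), ('n', 'n')]

# Step 2: True where the two characters of a pair match.
matches = list(map(lambda xy: xy[0] == xy[1], pairs))
print(matches)            # [False, True, True]

# Step 3: True counts as 1 and False as 0, so sum() counts the matches.
print(sum(matches))       # 2
print(sum(matches) >= 2)  # True
```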
The for word in words loop simply compares each input word against all the other words and keeps the ones for which isChildOf evaluates to True, taking care not to add the word itself.
I hope that is clear!
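All of the answers above compare every pair of words, which is O(n^2). A common alternative, not taken from these answers, is to bucket words by wildcard patterns: 'fan' is filed under '_an', 'f_n' and 'fa_', and two different 3-letter words share a bucket exactly when they match in at least two positions. For short words this is roughly linear in the number of words. A sketch, assuming words of equal length that never contain '_'; neighbour_map is a hypothetical name:

```python
from collections import defaultdict


def neighbour_map(words):
    # File each word under every pattern with one letter replaced by '_'.
    buckets = defaultdict(set)
    for word in words:
        for i in range(len(word)):
            buckets[word[:i] + '_' + word[i + 1:]].add(word)

    # A word's children are the other words sharing one of its buckets.
    children = {}
    for word in words:
        found = set()
        for i in range(len(word)):
            found |= buckets[word[:i] + '_' + word[i + 1:]]
        found.discard(word)
        children[word] = found
    return children


words = ['fan', 'ban', 'fbn', 'ana', 'and', 'ann']
result = neighbour_map(words)
for word in words:
    print("The children of word {0} are {1}".format(word, sorted(result[word])))
```

The result matches the pairwise version above, for example the children of fan are ['ban', 'fbn'].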
Possible Duplicate:
Checking if a string's characters are ascending alphabetically and its ascent is evenly spaced python
This is what I currently have:
wordlist = ['fox', 'aced', 'definite', 'ace']
for word in wordlist:
    a = len(word)
    if (ord(word[a-(a-1)] - ord(word[(a-a)])) == ord(word[a-(a-2)])-ord(word[a-(a-1)]:
        print "success", word
    else:
        print "fail", word
What I'm trying to do is calculate the differences between the ASCII values of consecutive letters in the word, and check whether the ord values of the letters increase by the same amount each time.
So for fox, it would check whether the difference between the ords of the 2nd and 1st letters equals the difference between the ords of the 3rd and 2nd letters.
However, with my current 'if' statement, only the first 3 letters of a word are compared. How can I rewrite this statement to cover every letter in a word of length greater than 3?
Sorry if I can't present this clearly, thanks for your time.
Note the use of len(set(...)) < 2:
def check(word):
    return len(set(ord(a) - ord(b) for a,b in zip(word,word[1:]))) < 2
wordlist = ['fox', 'aced', 'definite', 'ace']
print filter(check, wordlist)
Prints:
['fox', 'ace']
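A Python 3 sketch of the same check (in Python 3 filter returns an iterator, so a list comprehension is used here). Comparing with < 2 rather than == 1 also accepts one-letter and empty words, since they have no consecutive pairs and give an empty set of differences:

```python
def check(word):
    # Distinct differences between consecutive characters; at most one
    # distinct value means the letters are evenly spaced.
    return len({ord(b) - ord(a) for a, b in zip(word, word[1:])}) < 2


wordlist = ['fox', 'aced', 'definite', 'ace']
print([w for w in wordlist if check(w)])   # ['fox', 'ace']
print(check('x'), check(''))               # True True
```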
Consider
import operator
def diff(s):
    return map(operator.sub, s[1:], s[:-1])
wordlist = ['fox', 'aced', 'definite', 'ace']
print [w for w in wordlist if len(set(diff(map(ord, w)))) == 1]
## ['fox', 'ace']
The latter expression decomposed:
w = 'fox'
print map(ord, w) # [102, 111, 120]
print diff(map(ord, w)) # [9, 9]
print set(diff(map(ord, w))) # set([9])
print len(set(diff(map(ord, w)))) # 1
I believe you are looking to see if the ord difference between each pair of consecutive letters in a word is the same.
def check(word):
    return all((ord(ele_1) - ord(ele_2)) == (ord(word[0]) - ord(word[1])) for ele_1,ele_2 in zip(word,word[1:]) )
Result:
>>> check('abcde')
True
>>> check('wdjhrd')
False
Applying to your list:
wordlist = ['fox', 'aced', 'definite', 'ace']
new_list = filter(check, wordlist)
Result:
>>> new_list
['fox', 'ace']
Here's my version, which I feel is a little more readable and extends easily into a list comprehension.
>>> a = 'ace'
>>> all(ord(a[n])-ord(a[n-1]) == (ord(a[1])-ord(a[0])) for n in xrange(len(a)-1,0,-1))
True
And to iterate through the list of words, use a list comprehension:
wordlist = ['fox', 'aced', 'definite', 'ace']
[a for a in wordlist if all(ord(a[n])-ord(a[n-1]) == (ord(a[1])-ord(a[0])) for n in xrange(len(a) -1,0,-1))]
Returns:
['fox', 'ace']
Try this:
wordlist = ['fox', 'aced', 'definite', 'ace', 'x']
for word in wordlist:
    if len(word) < 2:
        print "fail", word
        continue
    diff = ord(word[1]) - ord(word[0])
    if all(ord(word[i+1])-ord(word[i])==diff for i in xrange(1, len(word)-1)):
        print "success", word
    else:
        print "fail", word
Notice that this solution is efficient: it doesn't generate any intermediate lists or word slices, the processing inside all() is done with iterators, and all() is "short-circuiting", so it stops at the first False it finds.
Maybe try changing ord(word[a-(a-2)]) to ord(word[a-(a-(len(word)-1))]). I'm not sure if that's your exact code, but there also seem to be some EOF errors.
For example, umbellar = umbrella: both count as equal words.
Input = ["umbellar","goa","umbrella","ago","aery","alem","ayre","gnu","eyra","egma","game","leam","amel","year","meal","yare","gun","alme","ung","male","lame","mela","mage" ]
So the output should be:
output=[
["umbellar","umbrella"],
["ago","goa"],
["aery","ayre","eyra","yare","year"],
["alem","alme","amel","lame","leam","male","meal","mela"],
["gnu","gun","ung"],
["egma","game","mage"],
]
from itertools import groupby
def group_words(word_list):
    sorted_words = sorted(word_list, key=sorted)
    grouped_words = groupby(sorted_words, sorted)
    for key, words in grouped_words:
        group = list(words)
        if len(group) > 1:
            yield group
Example:
>>> group_words(["umbellar","goa","umbrella","ago","aery","alem","ayre","gnu","eyra","egma","game","leam","amel","year","meal","yare","gun","alme","ung","male","lame","mela","mage" ])
<generator object group_words at 0x0297B5F8>
>>> list(_)
[['umbellar', 'umbrella'], ['egma', 'game', 'mage'], ['alem', 'leam', 'amel', 'meal', 'alme', 'male', 'lame', 'mela'], ['aery', 'ayre', 'eyra', 'year', 'yare'], ['goa', 'ago'], ['gnu', 'gun', 'ung']]
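itertools.groupby only merges consecutive items with the same key, which is why the words are first sorted by the same key (sorted) before grouping. A quick illustration on a few of the words:

```python
from itertools import groupby

words = ['goa', 'gnu', 'ago']

# Without sorting first, 'goa' and 'ago' are not adjacent,
# so they end up in separate groups.
unsorted_groups = [list(g) for _, g in groupby(words, key=sorted)]
print(unsorted_groups)   # [['goa'], ['gnu'], ['ago']]

# Sorting by the same key makes anagrams adjacent.
sorted_groups = [list(g) for _, g in groupby(sorted(words, key=sorted), key=sorted)]
print(sorted_groups)     # [['goa', 'ago'], ['gnu']]
```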
They're not equal words, they're anagrams.
Anagrams can be found by sorting by character:
sorted('umbellar') == sorted('umbrella')
collections.defaultdict comes in handy:
from collections import defaultdict
words = ["umbellar","goa","umbrella","ago","aery","alem","ayre","gnu",
         "eyra","egma","game","leam","amel","year","meal","yare","gun",
         "alme","ung","male","lame","mela","mage" ]
D = defaultdict(list)
for w in words:
    key = ''.join(sorted(w))  # anagrams share the same sorted letters
    D[key].append(w)
output = list(D.values())
And output is (Python 3.7+, where dictionaries keep insertion order) [['umbellar', 'umbrella'], ['goa', 'ago'], ['aery', 'ayre', 'eyra', 'year', 'yare'], ['alem', 'leam', 'amel', 'meal', 'alme', 'male', 'lame', 'mela'], ['gnu', 'gun', 'ung'], ['egma', 'game', 'mage']]
As others point out, you're looking for all the groups of anagrams in your list of words. Here is a possible solution. This algorithm looks for candidates, selects one (the first element) as the canonical word, and deletes the rest as possible words: anagrams are transitive, so once you find that a word belongs to an anagram group you don't need to recompute it.
input = ["umbellar","goa","umbrella","ago","aery","alem","ayre","gnu",
         "eyra","egma","game","leam","amel","year","meal","yare","gun",
         "alme","ung","male","lame","mela","mage" ]
res = dict()
for word in input:
    res[word]=[word]
for word in input:
    #the len test is just to avoid sorting and comparing words of different len
    candidates = filter(lambda x: len(x) == len(word) and\
                        sorted(x) == sorted(word),res.keys())
    if len(candidates):
        canonical = candidates[0]
        for c in candidates[1:]:
            #we delete all candidates except the canonical
            del res[c]
            #we add the others to the canonical member
            res[canonical].append(c)
print res.values()
This algorithm outputs:
[['year', 'ayre', 'aery', 'yare', 'eyra'], ['umbellar', 'umbrella'],
['lame', 'leam', 'mela', 'amel', 'alme', 'alem', 'male', 'meal'],
['goa', 'ago'], ['game', 'mage', 'egma'], ['gnu', 'gun', 'ung']]
The answer from Shang is right, but I was challenged to do the same thing without using groupby().
Here it is.
Adding print statements will help you follow the code and its runtime output while debugging.
def group_words(word_list):
    global new_list
    list1 = []
    _list0 = []
    _list1 = []
    new_list = []
    for elm in word_list:
        list_elm = list(elm)
        list1.append(list(list_elm))
    for ee in list1:
        ee = sorted(ee)
        ee = ''.join(ee)
        _list1.append(ee)
    _list1 = list(set(_list1))
    for _e1 in _list1:
        for e0 in word_list:
            if len(e0) == len(_e1):
                list_e0 = ''.join(sorted(e0))
                if _e1 == list_e0:
                    _list0.append(e0)
                    _list0 = list(_list0)
        new_list.append(_list0)
        _list0 = []
    return new_list
And the output is:
[['umbellar', 'umbrella'], ['goa', 'ago'], ['gnu', 'gun', 'ung'], ['alem', 'leam', 'amel', 'meal', 'alme', 'male', 'lame', 'mela'], ['egma', 'game', 'mage'], ['aery', 'ayre', 'eyra', 'year', 'yare']]