Anagrams in Python using lists

Imagine we have the following list of strings:
Input: ["eat", "tea", "tan", "ate", "nat", "bat"]
The output of our program should group each set of anagrams and return them all together as a list, like this:
Output:
[
["ate","eat","tea"],
["nat","tan"],
["bat"]
]
My current solution finds the first set of anagrams but fails to detect the other two; instead, it duplicates the first group in the list:
class Solution(object):
    def groupAnagrams(self, strs):
        allResults = []
        results = []
        temp = ''
        for s in strs:
            temp = s[1:] + s[:1]
            for i in range(0, len(strs)):
                if temp == strs[i]:
                    results.append(strs[i])
            allResults.append(results)
        return allResults
and the output is:
[["ate","eat","tea"],["ate","eat","tea"],["ate","eat","tea"],["ate","eat","tea"],["ate","eat","tea"],["ate","eat","tea"]]
How to fix this issue?
EDIT:
I have fixed the duplication by moving the append to allResults outside of the second loop:
class Solution(object):
    def groupAnagrams(self, strs):
        allResults = []
        results = []
        temp = ''
        for s in strs:
            temp = s[1:] + s[:1]
            for i in range(0, len(strs)):
                if temp == strs[i]:
                    results.append(strs[i])
        allResults.append(results)
        print(results)
        return allResults
Yet, it does not detect the other two sets of anagrams.

You can do it using defaultdict from Python's built-in collections library together with sorted:
In [1]: l = ["eat", "tea", "tan", "ate", "nat", "bat"]
In [2]: from collections import defaultdict
In [3]: d = defaultdict(list)
In [4]: for x in l:
   ...:     d[str(sorted(x))].append(x)
In [5]: d.values()
Out[5]: dict_values([['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']])
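If you need the expected list-of-lists output rather than a dict view, a small continuation of the session above (my own addition) converts it:
In [6]: list(d.values())
Out[6]: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]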
To fix your solution, you need to add a variable that tracks which groups have already been added, for example (and while walking through strs I use enumerate to speed up the search for the anagrams a little):
class Solution(object):
    def groupAnagrams(self, strs):
        allResults = []
        added = set()
        for i, s in enumerate(strs):
            results = []
            unique_s = "".join(sorted(s))
            if unique_s in added:
                continue
            else:
                added.add(unique_s)
            for x in strs[i:]:
                if unique_s == "".join(sorted(x)):
                    results.append(x)  # append the matching word x, not strs[i]
            allResults.append(results)
        print(added)
        return allResults

Use itertools.groupby (the list must be sorted by the same key first, because groupby only groups consecutive items with equal keys):
>>> lst = ["eat", "tea", "tan", "ate", "nat", "bat"]
>>>
>>> from itertools import groupby
>>> f = lambda w: sorted(w)
>>> [list(v) for k,v in groupby(sorted(lst, key=f), f)]
[['bat'], ['eat', 'tea', 'ate'], ['tan', 'nat']]

Using only lists, as requested in the title of the question:
The second line, s_words, takes all the letters of each word in words, sorts them, and recreates a string composed of the sorted letters of the word; it builds a list of all these sorted-letter strings, in the same order as the original sequence of words. This list is used to compare possible anagrams, because the letters of anagrams produce the same string when sorted.
The third line, indices, holds True or False values to indicate whether the corresponding word has already been extracted, to avoid duplicates.
The following code is a double loop that, for each s_word, determines which other s_word is identical and uses its index to retrieve the corresponding word in the original list of words; it also updates the truth values in indices.
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
s_words = [''.join(sorted(list(word))) for word in words]
indices = [False for _ in range(len(words))]
anagrams = []
for idx, s_word in enumerate(s_words):
    if indices[idx]:
        continue
    ana = [words[idx]]
    for jdx, word in enumerate(words):
        if idx != jdx and not indices[jdx] and s_word == s_words[jdx]:
            ana.append(words[jdx])
            indices[jdx] = True
    anagrams.append(ana)
print(anagrams)
output:
[['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]

The way you implemented your function, you are only looking at rotations of the strings (that is, you shift a letter from the beginning to the end, e.g. a-t-e -> t-e-a -> e-a-t). What your algorithm cannot detect are permutations where you only switch two letters (n-a-t -> t-a-n). In mathematical language, for three-letter strings you only consider the even permutations and not the odd permutations.
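A quick check of that point (my own illustration, not part of the original answer): the rotations of "tan" never include "nat", so a rotation-based comparison cannot group that pair:
s = "tan"
rotations = [s[i:] + s[:i] for i in range(len(s))]
print(rotations)           # ['tan', 'ant', 'nta']
print("nat" in rotations)  # False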
A modification of your code could for example be:
def get_list_of_permutations(input_string):
    list_out = []
    if len(input_string) > 1:
        first_char = input_string[0]
        remaining_string = input_string[1:]
        remaining_string_permutations = get_list_of_permutations(remaining_string)
        for i in range(len(remaining_string)+1):
            for permutation in remaining_string_permutations:
                list_out.append(permutation[0:i]+first_char+permutation[i:])
    else:
        return [input_string]
    return list_out

def groupAnagrams(strs):
    allResults = []
    for s in strs:
        results = []
        list_of_permutations = get_list_of_permutations(s)
        for i in range(0, len(strs)):
            if strs[i] in list_of_permutations:
                results.append(strs[i])
        if results not in allResults:
            allResults.append(results)
    return allResults
The output is
Out[218]: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
Edit: modified the code to work with all lengths of strings.
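For example (my own quick check, calling the helper defined above), all six permutations of a three-letter word are now generated, including 'nat':
print(get_list_of_permutations("tan"))
# ['tan', 'tna', 'atn', 'nta', 'ant', 'nat']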

https://docs.python.org/3/library/itertools.html#itertools.permutations
from itertools import permutations

word_list = ["eat", "tea", "tan", "ate", "nat", "bat"]
anagram_group_list = []
for word in word_list:
    if word == None:
        pass
    else:
        anagram_group_list.append([])
        for anagram in permutations(word):
            anagram = ''.join(anagram)
            try:
                idx = word_list.index(anagram)
                word_list[idx] = None
                anagram_group_list[-1].append(anagram)
            except ValueError:
                pass  # this anagram is not present in word_list
print(anagram_group_list)
# [['eat', 'ate', 'tea'], ['tan', 'nat'], ['bat']]
After refactoring your code and stopping it from producing redundant results, it still doesn't give the expected output, because the logic for producing anagrams is not completely correct:
def groupAnagrams(word_list):
    allResults = []
    results = []
    for idx, s in enumerate(word_list):
        if s == None:
            pass
        else:
            results = [s]  # word s is added to its anagram list
            # you were generating only one anagram, e.g. for 'tan' only 'ant', but word_list contains 'nat'
            for i in range(1, len(s), 1):
                temp = s[i:] + s[:i]  # rotation "anagram"
                # for s = 'tan' this generates only 'ant' and 'nta',
                # when it should consider all six: tna ant nta _nat_ atn tan
                if temp in word_list:
                    results.append(temp)
                    word_list[word_list.index(temp)] = None
            allResults.append(results)
    return allResults

print(groupAnagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat', 'ate', 'tea'], ['tan'], ['nat'], ['bat']]

The detection of anagrams of words consisting of unique characters can be done by comparison between sets. [See the EDIT below for a general solution.]
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
anagrams = []
for w in words:
m = [w2 for w2 in words if set(w2) == set(w)]
if m not in anagrams:
anagrams += [m]
print(anagrams)
Output
[['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
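A quick counterexample (my own) showing why plain sets are no longer enough once characters repeat, which is what the EDIT below addresses:
print(set("aab") == set("abb"))  # True, although "aab" and "abb" are not anagrams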
EDIT
For words with duplicate characters a multi-set approach can be used. A multi-set can be modeled with collections.Counter.
from collections import Counter
words = ["eat", "tea", "tan", "ate", "nat", "bat", "cia", "aci"]
# group per index
d = {}
multi_sets = list(map(Counter, words))
for i, w in enumerate(words):
    i_reference = multi_sets.index(Counter(w))  # always 1st match
    d.setdefault(i_reference, []).append(words[i])
anagrams = list(d.values())
# inplace sort: group per size of family of anagrams
anagrams.sort(key=len, reverse=True)
print(anagrams)
Remark: ordering a multi-set is highly non-trivial and the usual methods __lt__, __gt__, ... are not implemented. As a consequence sorted cannot be used. Comparison is still possible with __eq__ or __ne__ which are both naturally supported by Counter.
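To illustrate that remark (my own example): Counters built from anagrams compare equal, but a list of Counters cannot be passed to sorted; a sorted-letter string is a common orderable alternative if one is needed:
from collections import Counter

print(Counter("eat") == Counter("tea"))  # True: the multi-sets match for anagrams
# sorted([Counter("eat"), Counter("bat")])  # would raise TypeError: '<' not supported
print("".join(sorted("eat")))  # 'aet' -- an orderable (and hashable) canonical form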

Related

How can I find the most repeated word and how many times it is repeated [duplicate]

I am using Python 3.3
I need to create two lists, one for the unique words and the other for the frequencies of the word.
I have to sort the unique word list based on the frequencies list so that the word with the highest frequency is first in the list.
I have the design in text but am uncertain how to implement it in Python.
The methods I have found so far use either Counter or dictionaries which we have not learned. I have already created the list from the file containing all the words but do not know how to find the frequency of each word in the list. I know I will need a loop to do this but cannot figure it out.
Here's the basic design:
original list = ["the", "car",....]
newlst = []
frequency = []
for word in the original list
    if word not in newlst:
        newlst.append(word)
        set frequency = 1
    else
        increase the frequency
sort newlst based on frequency list
Use this:
from collections import Counter
list1=['apple','egg','apple','banana','egg','apple']
counts = Counter(list1)
print(counts)
# Counter({'apple': 3, 'egg': 2, 'banana': 1})
You can use
from collections import Counter
It is supported from Python 2.7 onwards; see the collections documentation for more information.
1.
>>> c = Counter('abracadabra')
>>> c.most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
use a dict
>>> d = {1: 'one', 2: 'one', 3: 'two'}
>>> c = Counter(d.values())
>>> c.most_common()
[('one', 2), ('two', 1)]
But you have to read the file first and convert it to a dict.
2.
It's the Python docs example, using re and Counter:
# Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
('you', 554), ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]
words = file("test.txt", "r").read().split() #read the words into a list.
uniqWords = sorted(set(words)) #remove duplicate words and sort
for word in uniqWords:
print words.count(word), word
Pandas answer:
import pandas as pd
original_list = ["the", "car", "is", "red", "red", "red", "yes", "it", "is", "is", "is"]
pd.Series(original_list).value_counts()
If you wanted it in ascending order instead, it is as simple as:
pd.Series(original_list).value_counts().sort_values(ascending=True)
Yet another solution, with a different algorithm and without using collections:
def countWords(A):
    dic = {}
    for x in A:
        if x not in dic:  # Python 2.7: if not dic.has_key(x):
            dic[x] = A.count(x)
    return dic

dic = countWords(['apple', 'egg', 'apple', 'banana', 'egg', 'apple'])
sorted_items = sorted(dic.items())  # if you want it sorted
One way would be to make a list of lists, with each sub-list in the new list containing a word and a count:
list1 = []  # this is your original list of words
list2 = []  # this is a new list of [word, count] pairs
for word in list1:
    words_so_far = [pair[0] for pair in list2]
    if word in words_so_far:
        list2[words_so_far.index(word)][1] += 1
    else:
        list2.append([word, 1])
Or, with try/except:
for word in list1:
    try:
        idx = [pair[0] for pair in list2].index(word)
        list2[idx][1] += 1
    except ValueError:
        list2.append([word, 1])
This would be less efficient than using a dictionary, but it uses more basic concepts.
You can use reduce() - a functional way (in Python 3, reduce lives in functools):
from functools import reduce

words = "apple banana apple strawberry banana lemon"
reduce(lambda d, c: d.update([(c, d.get(c, 0) + 1)]) or d, words.split(), {})
returns:
{'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2}
Using Counter would be the best way, but if you don't want to do that, you can implement it yourself this way.
# The list you already have
word_list = ['words', ..., 'other', 'words']
# Get a set of unique words from the list
word_set = set(word_list)
# create your frequency dictionary
freq = {}
# iterate through them, once per unique word.
for word in word_set:
    freq[word] = word_list.count(word) / float(len(word_list))
freq will end up with the frequency of each word in the list you already have.
You need float in there to convert one of the integers to a float, so the resulting value will be a float (in Python 3 the / operator already returns a float, so this matters mainly for Python 2).
Edit:
If you can't use a dict or set, here is another, less efficient way:
# The list you already have
word_list = ['words', ..., 'other', 'words']
unique_words = []
for word in word_list:
    if word not in unique_words:
        unique_words += [word]

word_frequencies = []
for word in unique_words:
    word_frequencies += [float(word_list.count(word)) / len(word_list)]

for i in range(len(unique_words)):
    print(unique_words[i] + ": " + str(word_frequencies[i]))
The indices of unique_words and word_frequencies will match.
The ideal way is to use a dictionary that maps a word to its count. But if you can't use that, you might want to use two lists: one storing the words, and the other storing the counts of the words. Note that the order of words and counts matters here. Implementing this would be hard and not very efficient.
Try this:
words = []
freqs = []
for line in sorted(original_list):  # take all the lines in a text and sort them
    line = line.rstrip()  # strip them of their trailing whitespace
    if line not in words:  # check whether line is already in words
        words.append(line)  # if not, add it to the end of words
        freqs.append(1)  # and add 1 to the end of freqs
    else:
        index = words.index(line)  # if it is, find where it is in words
        freqs[index] += 1  # and add 1 to the matching index in freqs
Here is code to support your question.
is_word() validates a string so that only those strings are counted; a hashmap is a dictionary in Python.
def is_word(word):
    cnt = 0
    for c in word:
        if 'a' <= c <= 'z' or 'A' <= c <= 'Z' or '0' <= c <= '9' or c == '$':
            cnt += 1
    if cnt == len(word):
        return True
    return False

def words_freq(s):
    d = {}
    for i in s.split():
        if is_word(i):
            if i in d:
                d[i] += 1
            else:
                d[i] = 1
    return d

print(words_freq('the the sky$ is blue not green'))
words_dict = {}  # assumes an existing original_list of words
for word in original_list:
    words_dict[word] = words_dict.get(word, 0) + 1

sorted_dt = {key: value for key, value in sorted(words_dict.items(), key=lambda item: item[1], reverse=True)}
keys = list(sorted_dt.keys())
values = list(sorted_dt.values())
print(keys)
print(values)
Simple way:
d = {}
l = ['Hi', 'Hello', 'Hey', 'Hello']
for a in l:
    d[a] = l.count(a)
print(d)
Output : {'Hi': 1, 'Hello': 2, 'Hey': 1}
Word and frequency, if you need both:
def counter_(input_list_):
    lu = []
    for v in input_list_:
        # if you don't want relative frequencies, remove the division by len(input_list_)
        ele = (v, input_list_.count(v) / len(input_list_))
        if ele not in lu:
            lu.append(ele)
    return lu

counter_(['a', 'n', 'f', 'a'])
output:
[('a', 0.5), ('n', 0.25), ('f', 0.25)]
The best thing to do is:
def wordListToFreqDict(wordlist):
    wordfreq = [wordlist.count(p) for p in wordlist]
    return dict(zip(wordlist, wordfreq))
then try:
wordListToFreqDict(originallist)

Filter a list of sets with specific criteria

I have a list of sets:
a = [{'foo','cpu','phone'},{'foo','mouse'}, {'dog','cat'}, {'cpu'}]
Expected outcome:
I want to look at each individual string, count its occurrences across all the sets, and return only the strings that occur at least twice (x >= 2), keeping the original format:
a = [{'foo','cpu'}, {'foo'}, {'cpu'}]
Here's what I have so far but I'm stuck on the last part where I need to append the new list:
from collections import Counter

counter = Counter()
for a_set in a:
    # Create a counter to count the occurrences of each word
    counter.update(a_set)

result = []
for a_set in a:
    for word in a_set:
        if counter[word] >= 2:
            # Not sure how I should append my new set below.
            result.append(a_set)
            break
print(result)
You are just appending the original set. So you should create a new set with the words that occur at least twice.
result = []
for a_set in a:
    new_set = {
        word for word in a_set
        if counter[word] >= 2
    }
    if new_set:  # check that the new set is not empty
        result.append(new_set)
Instead, use the following short approach based on set intersection:
from collections import Counter
a = [{'foo','cpu','phone'},{'foo','mouse'}, {'dog','cat'}, {'cpu'}]
c = Counter([i for s in a for i in s])
valid_keys = {k for k,v in c.items() if v >= 2}
res = [s & valid_keys for s in a if s & valid_keys]
print(res) # [{'cpu', 'foo'}, {'foo'}, {'cpu'}]
Here's what I ended up doing:
Build a counter, then iterate over the original list of sets filtering out items with counts < 2, and finally filter out any empty sets:
from itertools import chain
from collections import Counter
a = [{'foo','cpu','phone'},{'foo','mouse'}, {'dog','cat'}, {'cpu'}]
c = Counter(chain.from_iterable(map(list, a)))
res = list(filter(None, ({item for item in s if c[item] >= 2} for s in a)))
print(res)
Out: [{'foo', 'cpu'}, {'foo'}, {'cpu'}]
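A small simplification note (my own, not from the answer): chain.from_iterable can consume the sets directly, so the map(list, ...) step is not required:
from itertools import chain
from collections import Counter

a = [{'foo', 'cpu', 'phone'}, {'foo', 'mouse'}, {'dog', 'cat'}, {'cpu'}]
c = Counter(chain.from_iterable(a))
print(c['foo'], c['cpu'])  # 2 2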

Find group of strings that are anagrams

This question refers to this problem on lintcode. I have a working solution, but it takes too long for the huge testcase. I am wondering how can it be improved? Maybe I can decrease the number of comparisons I make in the outer loop.
class Solution:
    # @param strs: A list of strings
    # @return: A list of strings
    def anagrams(self, strs):
        # write your code here
        ret = set()
        for i in range(0, len(strs)):
            for j in range(i+1, len(strs)):
                if i in ret and j in ret:
                    continue
                if Solution.isanagram(strs[i], strs[j]):
                    ret.add(i)
                    ret.add(j)
        return [strs[i] for i in list(ret)]

    @staticmethod
    def isanagram(s, t):
        if len(s) != len(t):
            return False
        chars = {}
        for i in s:
            if i in chars:
                chars[i] += 1
            else:
                chars[i] = 1
        for i in t:
            if i not in chars:
                return False
            else:
                chars[i] -= 1
                if chars[i] < 0:
                    return False
        for i in chars:
            if chars[i] != 0:
                return False
        return True
Update: Just to add, I'm not looking for built-in Pythonic solutions such as Counter, which are already optimized. I have added Mike's suggestions, but it is still exceeding the time limit.
Skip strings you already placed in the set. Don't test them again.
# @param strs: A list of strings
# @return: A list of strings
def anagrams(self, strs):
    # write your code here
    ret = set()
    for i in range(0, len(strs)):
        for j in range(i+1, len(strs)):
            # If both anagrams exist in the set, there is no need to compare them.
            if i in ret and j in ret:
                continue
            if Solution.isanagram(strs[i], strs[j]):
                ret.add(i)
                ret.add(j)
    return [strs[i] for i in list(ret)]
You can also do a length comparison in your anagram test before iterating through the letters. Whenever the strings aren't the same length, they can't be anagrams anyway. Also, when a counter in chars reaches -1 when comparing values in t, just return false. Don't iterate through chars again.
@staticmethod
def isanagram(s, t):
    # Test that the strings are the same length
    if len(s) != len(t):
        return False
    chars = {}
    for i in s:
        if i in chars:
            chars[i] += 1
        else:
            chars[i] = 1
    for i in t:
        if i not in chars:
            return False
        else:
            chars[i] -= 1
            # If this goes below 0, return False
            if chars[i] < 0:
                return False
    for i in chars:
        if chars[i] != 0:
            return False
    return True
Instead of comparing all pairs of strings, you can just create a dictionary (or collections.defaultdict) mapping each of the letter-counts to the words having those counts. For getting the letter-counts, you can use collections.Counter. Afterwards, you just have to get the values from that dict. If you want all words that are anagrams of any other words, just merge the lists that have more than one entry.
strings = ["cat", "act", "rat", "hut", "tar", "tact"]
anagrams = defaultdict(list)
for s in strings:
anagrams[frozenset(Counter(s).items())].append(s)
print([v for v in anagrams.values()])
# [['hut'], ['rat', 'tar'], ['cat', 'act'], ['tact']]
print([x for v in anagrams.values() if len(v) > 1 for x in v])
# ['cat', 'act', 'rat', 'tar']
Of course, if you prefer not to use built-in functionality, you can just as well use a regular dict instead of defaultdict and write your own Counter with a few more lines, similar to what you have in your isanagram method, just without the comparison part.
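A minimal sketch of that hand-rolled variant (my own names, assuming a plain dict and a home-made letter count instead of Counter/defaultdict):
def letter_counts(word):
    # build a hashable letter-count key by hand, without collections.Counter
    counts = {}
    for ch in word:
        counts[ch] = counts.get(ch, 0) + 1
    return frozenset(counts.items())

def group_anagrams(strings):
    groups = {}  # plain dict instead of defaultdict
    for s in strings:
        groups.setdefault(letter_counts(s), []).append(s)
    return [words for words in groups.values() if len(words) > 1]

print(group_anagrams(["cat", "act", "rat", "hut", "tar", "tact"]))
# [['cat', 'act'], ['rat', 'tar']]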
Your solution is slow because you're not taking advantage of python's data structures.
Here's a solution that collects results in a dict:
class Solution:
    def anagrams(self, strs):
        d = {}
        for word in strs:
            key = tuple(sorted(word))
            try:
                d[key].append(word)
            except KeyError:
                d[key] = [word]
        return [w for ws in d.values() for w in ws if len(ws) > 1]
As an addition to @Mike's great answer, here is a nice Pythonic way to do it:
import collections

class Solution:
    # @param strs: A list of strings
    # @return: A list of strings
    def anagrams(self, strs):
        patterns = Solution.find_anagram_words(strs)
        return [word for word in strs if ''.join(sorted(word)) in patterns]

    @staticmethod
    def find_anagram_words(strs):
        anagrams = collections.Counter(''.join(sorted(word)) for word in strs)
        return {word for word, times in anagrams.items() if times > 1}
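A quick usage check (my own example, reusing the test strings from an earlier answer):
print(Solution().anagrams(["cat", "act", "rat", "hut", "tar", "tact"]))
# ['cat', 'act', 'rat', 'tar']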
Why not this?
str1 = "cafe"
str2 = "face"
def isanagram(s1,s2):
return all(sorted(list(str1)) == sorted(list(str2)))
if isanagram(str1, str2):
print "Woo"
The same can be done with a single line of code if you are using LINQ in C#:
string[] strs; // input string array
var result = strs.GroupBy(x => new string(x.ToCharArray().OrderBy(z => z).ToArray())).Select(g => g.ToList()).ToList();
Now, to group anagrams in Python, we have to sort the letters of each word and create a dictionary keyed by those sorted strings. The dictionary then tells us where the anagrams are: its values are the actual indices of the anagrams in the original list.
def groupAnagrams(words):
    # sort the letters of each word in the list
    A = [''.join(sorted(word)) for word in words]
    dict = {}
    for indexofsamewords, names in enumerate(A):
        dict.setdefault(names, []).append(indexofsamewords)
    print(dict)
    # {'AOOPR': [0, 2, 5, 11, 13], 'ABTU': [1, 3, 4], 'Sorry': [6], 'adnopr': [7], 'Sadioptu': [8, 16], ' KPaaehiklry': [9], 'Taeggllnouy': [10], 'Leov': [12], 'Paiijorty': [14, 18], 'Paaaikpr': [15], 'Saaaabhmryz': [17], ' CNaachlortttu': [19], 'Saaaaborvz': [20]}
    for index in dict.values():
        print([words[i] for i in index])

if __name__ == '__main__':
    # list of words
    words = ["ROOPA", "TABU", "OOPAR", "BUTA", "BUAT", "PAROO", "Soudipta",
             "Kheyali Park", "Tollygaunge", "AROOP", "Love", "AOORP", "Protijayi", "Paikpara", "dipSouta", "Shyambazaar",
             "jayiProti", "North Calcutta", "Sovabazaar"]
    groupAnagrams(words)
The Output :
['ROOPA', 'OOPAR', 'PAROO', 'AROOP', 'AOORP']
['TABU', 'BUTA', 'BUAT']
['Soudipta', 'dipSouta']
['Kheyali Park']
['Tollygaunge']
['Love']
['Protijayi', 'jayiProti']
['Paikpara']
['Shyambazaar']
['North Calcutta']
['Sovabazaar']


I'd like someone to help me understand a few lines of code

Can anyone here please explain this code, with an example if possible? What is this code doing?
def sort_by_length(words):
    t = []
    for word in words:
        t.append((len(word), word))
    t.sort(reverse=True)
    res = []
    for length, word in t:
        res.append(word)
    return res
And what is the meaning of reverse=True? What does reverse do? I understand what the len and append methods and return mean, but what is meant by reverse?
It is returning a list of words, sorted longest-to-shortest then z-to-a.
You could do the same thing with just
def sort_by_length(words):
    return sorted(words, key=lambda w: (len(w), w), reverse=True)
It might make more sense to sort longest-to-shortest, a-to-z, which would be
def sort_by_length(words):
    return sorted(words, key=lambda w: (-len(w), w))
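A quick check of the difference between the two variants (my own example):
words = ["abcd", "za", "wyya", "dssffgdg"]
print(sorted(words, key=lambda w: (len(w), w), reverse=True))  # ['dssffgdg', 'wyya', 'abcd', 'za']
print(sorted(words, key=lambda w: (-len(w), w)))               # ['dssffgdg', 'abcd', 'wyya', 'za']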
def sort_by_length(words):
    t = []  # empty list
    for word in words:  # iterate over the given words
        t.append((len(word), word))  # append each word to the list t as a tuple, e.g. the word "hello" becomes (5, "hello")
    t.sort(reverse=True)  # sort all tuples in reverse order
    res = []
    for length, word in t:
        res.append(word)  # extract just the words out of the tuples, e.g. (5, "hello") => "hello"
    return res  # return the words, ordered
Sort words by length:
w = ["abcd", "za", "wyya", "dssffgdg"]
print(sort_by_length(w))
http://ideone.com/jRzatV
