Python Dictionary frequency - python

I have a dictionary of dictionaries and need to count how many times letter pairs occur in a given string. I got the dictionaries to work, but I am completely stuck on how to make a counter work for this.
Anyway, here is what I have. Any help is appreciated.
from string import ascii_lowercase
test = 'how now, brown cow, ok?'
def make_letter_pairs(text):
    di = {}
    total = len(text)
    for i in range(len(text)-1):
        ch = text[i]
        ach = text[i+1]
        if ch in ascii_lowercase and ach in ascii_lowercase:
            if ch not in di:
                row = di.setdefault(ch, {})
                row.setdefault(ach, 0)
    return di
make_letter_pairs(test)

Counter from the collections module is the way to go on this:
Code:
from collections import Counter
from string import ascii_lowercase
def make_letter_pairs(text):
    return Counter([t for t in [text[i:i+2] for i in range(len(text) - 1)]
                    if t[0] in ascii_lowercase and t[1] in ascii_lowercase])
test = 'how now, brown cow, ok?'
print(make_letter_pairs(test))
Results:
Counter({'ow': 4, 'co': 1, 'no': 1, 'wn': 1, 'ho': 1, 'br': 1, 'ok': 1, 'ro': 1})
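The question asked for a dictionary of dictionaries rather than a flat Counter keyed by two-letter strings. A minimal sketch of that shape, assuming the same lowercase-only filter as the answer above; the function name make_letter_pair_table is hypothetical and not from the original posts:

```python
from collections import Counter, defaultdict
from string import ascii_lowercase

def make_letter_pair_table(text):
    # Hypothetical helper: maps each first letter to the counts of the letters that follow it.
    table = defaultdict(Counter)
    for first, second in zip(text, text[1:]):
        # Only count pairs where both characters are lowercase ASCII letters.
        if first in ascii_lowercase and second in ascii_lowercase:
            table[first][second] += 1
    # Convert to plain nested dicts so the result prints cleanly.
    return {first: dict(followers) for first, followers in table.items()}

pairs = make_letter_pair_table('how now, brown cow, ok?')
print(pairs['o'])  # counts of the letters that follow 'o'
```

Using defaultdict(Counter) avoids the explicit "is the key there yet" checks that the question's code was struggling with.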

Related

Python: Nested dictionary - create if key doesn't exist, else sum 1

SCENARIO
I am trying to count the number of times a word appears in a sentence, for a list of sentences.
Each sentence is a list of words.
I want the final dictionary to have a key for each word in the entire corpus, and a second key indicating the sentences in which they appear, with the value being the number of times it appears in it.
CURRENT SOLUTION
The following code works correctly:
dfm = dict()
for i,sentence in enumerate(sentences):
    for word in sentence:
        if word not in dfm.keys():
            dfm[word] = dict()
        if i not in dfm[word].keys():
            dfm[word][i] = 1
        else:
            dfm[word][i] += 1
QUESTION
Is there any cleaner way to do it with python?
I have already gone through this and this where they suggest using:
dic.setdefault(key,[]).append(value)
and,
d = defaultdict(lambda: defaultdict(dict))
I think they are good solutions, but I can't figure out how to adapt them to my particular problem.
Thanks !
Say you have this input:
sentences = [['dog','is','big'],['cat', 'is', 'big'], ['cat', 'is', 'dark']]
Your solution:
dfm = dict()
for i,sentence in enumerate(sentences):
    for word in sentence:
        if word not in dfm.keys():
            dfm[word] = dict()
        if i not in dfm[word].keys():
            dfm[word][i] = 1
        else:
            dfm[word][i] += 1
Defaultdict int:
from collections import defaultdict
dfm2 = defaultdict(lambda: defaultdict(int))
for i,sentence in enumerate(sentences):
    for word in sentence:
        dfm2[word][i] += 1
Test:
dfm2 == dfm # True
#{'dog': {0: 1},
# 'is': {0: 1, 1: 1, 2: 1},
# 'big': {0: 1, 1: 1},
# 'cat': {1: 1, 2: 1},
# 'dark': {2: 1}}
For a cleaner version, use Counter:
from collections import Counter
string = 'this is america this is america'
x=Counter(string.split())
print(x)
output
Counter({'this': 2, 'is': 2, 'america': 2})
If you want to write your own code instead
(input data copied from @rassar's answer):
def func(list_: list):
    dic = {}
    for sub_list in list_:
        for word in sub_list:
            if word not in dic.keys():
                dic.update({word: 1})
            else:
                dic[word] += 1
    return dic
sentences = [['dog','is','big'],['cat', 'is', 'big'], ['cat', 'is', 'dark']]
print(func(sentences))
output
{'dog': 1, 'is': 3, 'big': 2, 'cat': 2, 'dark': 1}
Use counters
from collections import Counter
sentences = ["This is Day", "Never say die", "Chat is a good bot", "Hello World", "Two plus two equals four","A quick brown fox jumps over the lazy dog", "Young chef, bring whisky with fifteen hydrogen ice cubes"]
sentenceWords = ( Counter(x.lower() for x in sentence.split()) for sentence in sentences)
#print result
print("\n".join(str(c) for c in sentenceWords))

Creating a dictionary for each word in a file and counting the frequency of words that follow it

I am trying to solve a difficult problem and am getting lost.
Here's what I'm supposed to do:
INPUT: file
OUTPUT: dictionary
Return a dictionary whose keys are all the words in the file (broken by
whitespace). The value for each word is a dictionary containing each word
that can follow the key and a count for the number of times it follows it.
You should lowercase everything.
Use strip and string.punctuation to strip the punctuation from the words.
Example:
>>> #example.txt is a file containing: "The cat chased the dog."
>>> with open('../data/example.txt') as f:
...     word_counts(f)
{'the': {'dog': 1, 'cat': 1}, 'chased': {'the': 1}, 'cat': {'chased': 1}}
Here's all I have so far, in trying to at least pull out the correct words:
def word_counts(f):
    i = 0
    orgwordlist = f.split()
    for word in orgwordlist:
        if i<len(orgwordlist)-1:
            print(orgwordlist[i])
            print(orgwordlist[i+1])
with open('../data/example.txt') as f:
    word_counts(f)
I'm thinking I need to somehow use the .count method and eventually zip some dictionaries together, but I'm not sure how to count the second words for each first word.
I know I'm nowhere near solving the problem, but trying to take it one step at a time. Any help is appreciated, even just tips pointing in the right direction.
We can solve this in two passes:
in a first pass, we construct a Counter and count the tuples of two consecutive words using zip(..); and
then we turn that Counter into a dictionary of dictionaries.
This results in the following code:
from collections import Counter, defaultdict
def word_counts(f):
    st = f.read().lower().split()
    ctr = Counter(zip(st,st[1:]))
    dc = defaultdict(dict)
    for (k1,k2),v in ctr.items():
        dc[k1][k2] = v
    return dict(dc)
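The assignment also asks to strip punctuation with strip and string.punctuation, which the code above leaves out. A minimal sketch of one way to add that step, assuming f is any file-like object with a read() method; the name word_counts_stripped is hypothetical, and io.StringIO stands in for the real file:

```python
import io
import string
from collections import Counter, defaultdict

def word_counts_stripped(f):
    # Lowercase everything, split on whitespace, strip punctuation from both ends of each word.
    words = [w.strip(string.punctuation) for w in f.read().lower().split()]
    # Drop tokens that consisted only of punctuation.
    words = [w for w in words if w]
    followers = defaultdict(Counter)
    for first, second in zip(words, words[1:]):
        followers[first][second] += 1
    # Return plain nested dicts, as in the expected output of the assignment.
    return {first: dict(counts) for first, counts in followers.items()}

example = io.StringIO("The cat chased the dog.")
print(word_counts_stripped(example))
```

Note that strip only removes punctuation at the ends of a word, so an apostrophe inside a word such as "don't" is kept.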
We can do this in one pass:
Use a defaultdict as a counter.
Iterate over bigrams, counting in-place
So... For the sake of brevity, we'll leave the normalization and cleaning out:
>>> from collections import defaultdict
>>> counter = defaultdict(lambda: defaultdict(int))
>>> s = 'the dog chased the cat'
>>> tokens = s.split()
>>> from itertools import islice
>>> for a, b in zip(tokens, islice(tokens, 1, None)):
...     counter[a][b] += 1
...
>>> counter
defaultdict(<function <lambda> at 0x102078950>, {'the': defaultdict(<class 'int'>, {'cat': 1, 'dog': 1}), 'dog': defaultdict(<class 'int'>, {'chased': 1}), 'chased': defaultdict(<class 'int'>, {'the': 1})})
And a more readable output:
>>> {k:dict(v) for k,v in counter.items()}
{'the': {'cat': 1, 'dog': 1}, 'dog': {'chased': 1}, 'chased': {'the': 1}}
>>>
Firstly that is some brave cat who chased a dog! Secondly it is a little tricky because we don't interact with this type of parsing every day. Here's the code:
k = "The cat chased the dog."
sp = k.split()
res = {}
prev = ''
for w in sp:
    word = w.lower().replace('.', '')
    if prev in res:
        if word.lower() in res[prev]:
            res[prev][word] += 1
        else:
            res[prev][word] = 1
    elif not prev == '':
        res[prev] = {word: 1}
    prev = word
print(res)
You could:
Create a list of stripped words;
Create word pairs with either zip(list_, list_[1:]) or any method that iterates by pairs;
Create a dict of first words in the pair followed by a list of the second word of the pair;
Count the words in the list.
Like so:
from collections import Counter
s="The cat chased the dog."
li=[w.lower().strip('.,') for w in s.split()] # list of the words
di={}
for a,b in zip(li,li[1:]):                       # words by pairs
    di.setdefault(a,[]).append(b)                # list of the words following first
di={k:dict(Counter(v)) for k,v in di.items()}    # count the words
>>> di
{'the': {'dog': 1, 'cat': 1}, 'chased': {'the': 1}, 'cat': {'chased': 1}}
If you have a file, just read from the file into a string and proceed.
Alternatively, you could
Same first two steps
Use a defaultdict with a Counter as a factory.
Like so:
from collections import Counter, defaultdict
li=[w.lower().strip('.,') for w in s.split()]
dd=defaultdict(Counter)
for a,b in zip(li, li[1:]):
    dd[a][b]+=1
>>> dict(dd)
{'the': Counter({'dog': 1, 'cat': 1}), 'chased': Counter({'the': 1}), 'cat': Counter({'chased': 1})}
Or,
>>> {k:dict(v) for k,v in dd.items()}
{'the': {'dog': 1, 'cat': 1}, 'chased': {'the': 1}, 'cat': {'chased': 1}}
I think this is a one pass solution without importing defaultdict. Also it has punctuation stripping. I have tried to optimize it for large files or repeated opening of files
from itertools import islice
class defaultdictint(dict):
    def __missing__(self,k):
        r = self[k] = 0
        return r
class defaultdictdict(dict):
    def __missing__(self,k):
        r = self[k] = defaultdictint()
        return r
# characters to keep; everything else (punctuation, newlines) is dropped
keep = set('1234567890abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ')
def count_words(file):
    d = defaultdictdict()
    with open(file,"r") as f:
        for line in f:
            line = ''.join(filter(keep.__contains__,line)).strip().lower().split()
            for one,two in zip(line,islice(line,1,None)):
                d[one][two] += 1
    return d
print (count_words("example.txt"))
will output:
{'chased': {'the': 1}, 'cat': {'chased': 1}, 'the': {'dog': 1, 'cat': 1}}

python programs to count letters in each word of a sentence

I'm pretty new to python and I need a program that not only counts the words from an input sentence but also counts the number of letters in each word. This is what I have so far. Any help would be very much appreciated!
def main():
    s = input("Please enter your sentence: ")
    words = s.split()
    wordCount = len(words)
    print ("Your word and letter counts are:", wordCount)
main()
You can generate a mapping from words to word lengths, as follows:
s = "this is a sentence"
words = s.split()
letter_count_per_word = {w:len(w) for w in words}
This yields
letter_count_per_word == {'this': 4, 'a': 1, 'is': 2, 'sentence': 8}
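One caveat with a dict keyed by word is that a repeated word collapses into a single entry. If every occurrence should be reported, a list of (word, length) pairs keeps them all. This is only a sketch with a made-up sentence; the variable names are illustrative:

```python
sentence = "the cat saw the other cat"
words = sentence.split()

# Number of words in the sentence.
word_count = len(words)

# One (word, letter count) pair per word, in order, duplicates included.
letter_counts = [(word, len(word)) for word in words]

print("Word count:", word_count)
for word, n_letters in letter_counts:
    print(word, n_letters)
```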
Actually, Python has a collections class called Counter which will count the number of occurrences of each word for you.
from collections import Counter
my_sentence = 'Python is a widely used programming language'
print(Counter(my_sentence.split()))
Output
Counter({'a': 1, 'used': 1, 'language': 1, 'Python': 1, 'is': 1, 'programming': 1, 'widely': 1})
Try following code
words = str(input("Please enter your sentence. "))
print (len(words))

How to form a dictionary from a string?

word = 'stacks'
word_dict = {} # to form new dictionary formed from
for letter in word:
    word_dict[letter] += 1
print word_dict
I want to create a new dictionary from a string, tracking the count of the letters from word. So what I'm trying to get is:
> word_dict = {'s':2, 't':1, 'a':1, 'c':1, 'k':1}
But I can't figure out how to do this. I get KeyError with my current code
Use the collections.Counter() class instead:
from collections import Counter
word_dict = Counter(word)
The Counter does the exact same thing; count occurrences of each letter in word.
In your specific case you didn't first check if the key already exists or provide a default if it doesn't. You could use dict.get() to do that:
word = 'stacks'
word_dict = {} # to form new dictionary formed from
for letter in word:
    word_dict[letter] = word_dict.get(letter, 0) + 1
print(word_dict)
or use dict.setdefault() separately to explicitly set a default before incrementing:
word = 'stacks'
word_dict = {} # to form new dictionary formed from
for letter in word:
    word_dict.setdefault(letter, 0)
    word_dict[letter] += 1
print(word_dict)
or test for the key yourself:
word = 'stacks'
word_dict = {} # to form new dictionary formed from
for letter in word:
    if letter not in word_dict:
        word_dict[letter] = 0
    word_dict[letter] += 1
print(word_dict)
in decreasing order of efficiency.
Or you could use a collections.defaultdict() object to automatically insert a 0 if the key doesn't yet exist:
from collections import defaultdict
word_dict = defaultdict(int)
for letter in word:
    word_dict[letter] += 1
print(word_dict)
This is essentially what the Counter class does, but the type adds some other niceties such as listing the most common keys or combining counters.
Demo:
>>> from collections import defaultdict, Counter
>>> word = 'stacks'
>>> word_dict = {} # to form new dictionary formed from
>>> for letter in word:
...     word_dict[letter] = word_dict.get(letter, 0) + 1
...
>>> word_dict
{'a': 1, 'c': 1, 's': 2, 't': 1, 'k': 1}
>>> word_dict = defaultdict(int)
>>> for letter in word:
...     word_dict[letter] += 1
...
>>> word_dict
defaultdict(<type 'int'>, {'a': 1, 'c': 1, 's': 2, 't': 1, 'k': 1})
>>> Counter(word)
Counter({'s': 2, 'a': 1, 'c': 1, 't': 1, 'k': 1})
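The niceties mentioned above, listing the most common keys and combining counters, can be sketched as follows. The second word, 'sticks', is just an illustrative input:

```python
from collections import Counter

letters = Counter('stacks')

# most_common(n) returns the n highest counts as (key, count) pairs.
top = letters.most_common(1)

# Adding two Counters sums the counts key by key.
combined = Counter('stacks') + Counter('sticks')

print(top)
print(combined['s'], combined['i'])
```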
Try this
from collections import Counter
>>>Counter(word)
Counter({'s': 2, 'a': 1, 'c': 1, 't': 1, 'k': 1})

Counting word frequency and making a dictionary from it

I want to take every word from a text file, and count the word frequency in a dictionary.
Example: 'this is the textfile, and it is used to take words and count'
d = {'this': 1, 'is': 2, 'the': 1, ...}
I am not that far, but I just can't see how to complete it. My code so far:
import sys
argv = sys.argv[1]
data = open(argv)
words = data.read()
data.close()
wordfreq = {}
for i in words:
    # there should be a counter and somehow it must fill the dict.
If you don't want to use collections.Counter, you can write your own function:
import sys
filename = sys.argv[1]
fp = open(filename)
data = fp.read()
words = data.split()
fp.close()
unwanted_chars = ".,-_ (and so on)"
wordfreq = {}
for raw_word in words:
    word = raw_word.strip(unwanted_chars)
    if word not in wordfreq:
        wordfreq[word] = 0
    wordfreq[word] += 1
For finer control, look at regular expressions.
Although using Counter from the collections library, as suggested by @Michael, is a better approach, I am adding this answer just to improve your code. (I believe this will be a good answer for a new Python learner.)
From the comment in your code, it seems you want to improve it. You are already able to read the file content into a string (although I usually avoid read() and use a for line in file_descriptor: loop instead).
Because words is a string, in the loop for i in words: the loop variable i is not a word but a single character. You are iterating over the characters of the string instead of over its words. To see this, look at the following snippet:
>>> for i in "Hi, h r u?":
...     print(i)
...
H
i
,
h
r
u
?
>>>
Iterating over the string character by character is not what you want. To iterate word by word, use the split method of Python's str class.
str.split(sep=None, maxsplit=-1) returns a list of the words in the string, using sep as the separator (it splits on runs of whitespace if sep is left unspecified), optionally limiting the number of splits to maxsplit.
Notice the code examples below:
Split:
>>> "Hi, how are you?".split()
['Hi,', 'how', 'are', 'you?']
loop with split:
>>> for i in "Hi, how are you?".split():
...     print(i)
...
Hi,
how
are
you?
This is close to what you need, except for the word Hi, because split() splits on whitespace by default, so Hi, is kept as a single string, and you don't want that.
To count the frequency of words in the file, one good solution is to use a regex. But first, to keep the answer simple, I will use the replace() method. str.replace(old, new[, count]) returns a copy of the string in which occurrences of old have been replaced with new, optionally limiting the number of replacements to count.
Now check the code example below to see what I suggested:
>>> "Hi, how are you?".split()
['Hi,', 'how', 'are', 'you?'] # it has , with Hi
>>> "Hi, how are you?".replace(',', ' ').split()
['Hi', 'how', 'are', 'you?'] # , replaced by space then split
loop:
>>> for word in "Hi, how are you?".replace(',', ' ').split():
...     print(word)
...
Hi
how
are
you?
Now, how to count frequency:
One way is to use Counter, as @Michael suggested, but to keep your approach of starting from an empty dict, do something like the code sample below:
words = f.read()
wordfreq = {}
for word in words.replace(', ',' ').split():
    wordfreq[word] = wordfreq.setdefault(word, 0) + 1
    #                ^^ add 1 to 0 or old value from dict
What am I doing? Because wordfreq is initially empty, you can't read wordfreq[word] the first time a word appears (it raises a KeyError). So I used the setdefault dict method.
dict.setdefault(key, default=None) is similar to get(), but will set dict[key]=default if key is not already in dict. So for the first time when a new word comes, I set it with 0 in dict using setdefault then add 1 and assign to the same dict.
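The difference between get() and setdefault() described above can be seen in a small sketch. The dictionary and key names here are only illustrative:

```python
counts = {}

# get() returns the default but leaves the dict untouched.
value_from_get = counts.get('hello', 0)
after_get = dict(counts)

# setdefault() returns the default and also stores it under the key.
value_from_setdefault = counts.setdefault('hello', 0)
after_setdefault = dict(counts)

print(value_from_get, after_get)
print(value_from_setdefault, after_setdefault)
```

For plain counting, either works, because the assignment wordfreq[word] = ... + 1 stores the key anyway.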
I have written equivalent code using with open instead of a bare open:
with open('~/Desktop/file') as f:
    words = f.read()
wordfreq = {}
for word in words.replace(',', ' ').split():
    wordfreq[word] = wordfreq.setdefault(word, 0) + 1
print(wordfreq)
That runs like this:
$ cat file # file is
this is the textfile, and it is used to take words and count
$ python work.py # indented manually
{'and': 2, 'count': 1, 'used': 1, 'this': 1, 'is': 2,
'it': 1, 'to': 1, 'take': 1, 'words': 1,
'the': 1, 'textfile': 1}
Using re.split(pattern, string, maxsplit=0, flags=0)
Just change the for loop to for i in re.split(r"[,\s]+", words): and that should produce the correct output.
Edit: it is better to find all runs of alphanumeric characters, because the text may contain more than one kind of punctuation symbol.
>>> re.findall(r'[\w]+', words) # manually indent output
['this', 'is', 'the', 'textfile', 'and',
'it', 'is', 'used', 'to', 'take', 'words', 'and', 'count']
Use the for loop as: for word in re.findall(r'[\w]+', words):
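Putting the findall step together with counting, a minimal self-contained sketch could look like this. It uses the example sentence from the question and swaps in collections.Counter for the hand-written dict loop:

```python
import re
from collections import Counter

text = 'this is the textfile, and it is used to take words and count'

# \w+ matches runs of letters, digits and underscores, so punctuation is left out.
words = re.findall(r'\w+', text.lower())

# Counter builds the word -> frequency mapping in one step.
wordfreq = Counter(words)

print(dict(wordfreq))
```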
How would I write code without using read():
File is:
$ cat file
This is the text file, and it is used to take words and count. And multiple
Lines can be present in this file.
It is also possible that Same words repeated in with capital letters.
Code is:
$ cat work.py
import re
wordfreq = {}
with open('file') as f:
    for line in f:
        for word in re.findall(r'[\w]+', line.lower()):
            wordfreq[word] = wordfreq.setdefault(word, 0) + 1
print(wordfreq)
I used lower() to convert uppercase letters to lowercase.
Output:
$ python work.py  # output wrapped manually
{'and': 3, 'letters': 1, 'text': 1, 'is': 3,
'it': 2, 'file': 2, 'in': 2, 'also': 1, 'same': 1,
'to': 1, 'take': 1, 'capital': 1, 'be': 1, 'used': 1,
'multiple': 1, 'that': 1, 'possible': 1, 'repeated': 1,
'words': 2, 'with': 1, 'present': 1, 'count': 1, 'this': 2,
'lines': 1, 'can': 1, 'the': 1}
from collections import Counter
t = 'this is the textfile, and it is used to take words and count'
dict(Counter(t.split()))
>>> {'and': 2, 'is': 2, 'count': 1, 'used': 1, 'this': 1, 'it': 1, 'to': 1, 'take': 1, 'words': 1, 'the': 1, 'textfile,': 1}
Or better with removing punctuation before counting:
dict(Counter(t.replace(',', '').replace('.', '').split()))
>>> {'and': 2, 'is': 2, 'count': 1, 'used': 1, 'this': 1, 'it': 1, 'to': 1, 'take': 1, 'words': 1, 'the': 1, 'textfile': 1}
The following takes the string, splits it into a list with split(), loops over the list, and counts the frequency of each item with the list's count() method. Each word, i, and its frequency are placed as a tuple in a list, ls, which is then converted into key and value pairs with dict().
sentence = 'this is the textfile, and it is used to take words and count'.split()
ls = []
for i in sentence:
    word_count = sentence.count(i)  # the list's count() method
    ls.append((i,word_count))
dict_ = dict(ls)
print(dict_)
Output: {'and': 2, 'count': 1, 'used': 1, 'this': 1, 'is': 2, 'it': 1, 'to': 1, 'take': 1, 'words': 1, 'the': 1, 'textfile,': 1}
sentence = "this is the textfile, and it is used to take words and count"
# split the sentence into words.
# iterate through every word
counter_dict = {}
for word in sentence.lower().split():
    # add the word to counter_dict, initialized with 0
    if word not in counter_dict:
        counter_dict[word] = 0
    # increase its count by 1
    counter_dict[word] += 1
# Open your text file and count word frequency
File_obj=open("Counter.txt",'r')
w_list=File_obj.read()
print(w_list.split())
di=dict()
for word in w_list.split():
    if word in di:
        di[word]=di[word] + 1
    else:
        di[word]=1
max_count=max(di.values())
largest=-1
maxusedword=''
for k,v in di.items():
    print(k,v)
    if v>largest:
        largest=v
        maxusedword=k
print(maxusedword,largest)
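The search for the most used word can also be written more compactly. The sketch below uses the question's example sentence instead of a file so that it is self-contained, and relies on Counter.most_common and on max with a key function:

```python
from collections import Counter

text = 'this is the textfile, and it is used to take words and count'
counts = Counter(text.split())

# most_common(1) returns a one-element list holding the (word, count) pair with the highest count.
most_used_word, largest = counts.most_common(1)[0]

# The same lookup on any plain dict of counts, without Counter-specific methods.
also_most_used = max(counts, key=counts.get)

print(most_used_word, largest)
```

When several words share the highest count, as 'is' and 'and' do here, only one of them is reported.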
you can also use default dictionaries with int type.
from collections import defaultdict
wordDict = defaultdict(int)
text = 'this is the textfile, and it is used to take words and count'.split(" ")
for word in text:
    wordDict[word]+=1
Explanation:
We initialize a default dictionary whose values are of type int. This way the default value for any key is 0, and we don't need to check whether a key is present in the dictionary. We then split the text on spaces into a list of words, iterate through the list, and increment each word's count.
wordList = 'this is the textfile, and it is used to take words and count'.split()
wordFreq = {}
# Logic: word not in the dict, give it a value of 1. if key already present, +1.
for word in wordList:
    if word not in wordFreq:
        wordFreq[word] = 1
    else:
        wordFreq[word] += 1
print(wordFreq)
My approach is to do a few things from the ground up:
Remove punctuation from the text input.
Make a list of words.
Remove empty strings.
Iterate through the list.
Make each new word a key in the dictionary with value 1.
If a word already exists as a key, increment its value by one.
text = '''this is the textfile, and it is used to take words and count'''
word = '' #This will hold each word
wordList = [] #This will be collection of words
for ch in text:                       #traverse the text character by character
    #if the character is in a-z, A-Z or 0-9 then it is valid and is added to the word string
    if (ch >= 'a' and ch <= 'z') or (ch >= 'A' and ch <= 'Z') or (ch >= '0' and ch <= '9'):
        word += ch
    elif ch == ' ':                   #a single space is a separator
        wordList.append(word)         #append the word to the list
        word = ''                     #empty the word to collect the next word
wordList.append(word)                 #append the last word, since the loop ends before adding it
print(wordList)
wordCountDict = {}                    #empty dictionary which will hold the word counts
for word in wordList:                 #traverse the word list
    if wordCountDict.get(word.lower(), 0) == 0:    #if the word doesn't exist yet, add it with value 1
        wordCountDict[word.lower()] = 1
    else:                                          #if the word exists, increment its value by one
        wordCountDict[word.lower()] = wordCountDict[word.lower()] + 1
print(wordCountDict)
Another approach:
text = '''this is the textfile, and it is used to take words and count'''
for ch in '.\'!")(,;:?-\n':
    text = text.replace(ch, ' ')
wordsArray = text.split(' ')
wordDict = {}
for word in wordsArray:
    if len(word) == 0:
        continue
    else:
        wordDict[word.lower()] = wordDict.get(word.lower(), 0) + 1
print(wordDict)
One more function:
def wcount(filename):
    counts = dict()
    with open(filename) as file:
        a = file.read().split()
        # words = [b.rstrip() for b in a]
        for word in a:
            if word in counts:
                counts[word] += 1
            else:
                counts[word] = 1
    return counts
def play_with_words(input):
    input_split = input.split(",")
    input_split.sort()
    count = {}
    for i in input_split:
        if i in count:
            count[i] += 1
        else:
            count[i] = 1
    return count
input ="i,am,here,where,u,are"
print(play_with_words(input))
Write a Python program to create a list of strings by taking input from the user and then create a dictionary containing each string along with its frequency (e.g. if the list is ['apple', 'banana', 'fig', 'apple', 'fig', 'banana', 'grapes', 'fig', 'grapes', 'apple'], then the output should be {'apple': 3, 'banana': 2, 'fig': 3, 'grapes': 2}).
lst = []
d = dict()
print("ENTER ZERO NUMBER FOR EXIT !!!!!!!!!!!!")
while True:
    user = input('enter string element :: -- ')
    if user == "0":
        break
    else:
        lst.append(user)
print("LIST ELEMENTS ARE :: ",lst)
l = len(lst)
for i in range(l):
    c = 0
    for j in range(l):
        if lst[i] == lst[j]:
            c += 1
    d[lst[i]] = c
print("dictionary is :: ",d)
You can also go with this approach, but you first need to read the file and store its content in a variable as a string.
This way, you don't need to import any libraries.
s = "this is the textfile, and it is used to take words and count"
s = s.split(" ")
d = dict()
for i in s:
    c = ""
    if i.isalpha() == True:
        if i not in d:
            d[i] = 1
        else:
            d[i] += 1
    else:
        for j in i:
            l = len(j)
            if j.isalpha() == True:
                c+=j
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
print(d)
Result:
