Count character frequencies in a phrase-frequency dict in Python 3

In my experience this is an unusual task. I searched in many different ways but still can't find an answer.
Here is the question.
I have a dict of Chinese phrase frequencies. It looks like:
{'中国':18950, '我们':16734, '我国':15400, ...}
What I need to do is count every single character's frequency. For example,
the character '国' appears in two phrases ('中国' and '我国'), so this character's frequency should be:
{'国':(18950+15400)}
How can I achieve this?

A simple example:
d = {'abd':2, 'afd':3}
f = {}
for key in d:
    strlen = len(key)
    for i in range(strlen):
        if key[i] in f:
            f[key[i]] += d[key]
        else:
            f[key[i]] = d[key]
print(f) # gives {'a': 5, 'b': 2, 'd': 5, 'f': 3}

My way:
from collections import Counter
c={'中国':18950, '我们':16734, '我国':15400}
print(Counter([j for k,v in c.items() for i in k for j in [i]*v]))
Output:
Counter({'国': 34350, '我': 32134, '中': 18950, '们': 16734})
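The list comprehension above creates one list element per counted occurrence (over 100,000 elements for this three-phrase sample), which can get expensive for a large dictionary. A minimal sketch of the same result that adds each phrase's count directly to a Counter instead; the function name char_frequencies is only illustrative:

```python
from collections import Counter

def char_frequencies(phrase_freq):
    """Add each phrase's frequency to every character the phrase contains."""
    totals = Counter()
    for phrase, freq in phrase_freq.items():
        for ch in phrase:
            # A character that repeats inside one phrase is counted once per occurrence.
            totals[ch] += freq
    return totals

c = {'中国': 18950, '我们': 16734, '我国': 15400}
print(char_frequencies(c))
```

This keeps memory use proportional to the number of distinct characters rather than to the sum of all counts.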

Something like this should work (phrase_dict is your phrase-frequency dict; in Python 3 use items() rather than iteritems()):
from collections import defaultdict
char_dict = defaultdict(int)
for phrase, count in phrase_dict.items():
    for char in phrase:
        char_dict[char] += count

d = {'中国':18950, '我们':16734, '我国':15400, ...}
q = 0
for i in d:
    if '国' in i:
        a = (d[i])
        q += a
print(q)

Related

How to create function that takes a text string and returns a dictionary containing how many times some defined characters occur even if not present?

Hello, I asked this question previously and now want to adjust the code I have. I want to change it so that a letter that is not present in the text string is still returned with the value 0 assigned to it.
def func(text, let):
    count = {}
    for l in text.lower():
        if l in let:
            if l in count.keys():
                count[l] += 1
            else:
                count[l] = 1
    return count
It currently returns this:
example = "Sample String"
print(func(example, "sao"))
{'s': 2, 'a' : 1}
This would be my desired output
example = "Sample String"
print(func(example, "sao"))
{'s': 2, 'a' : 1, 'o' :0}
If you don't mind using tools designed especially for your purpose, then the following will do:
from collections import Counter
def myfunc(inp, vals):
    c = Counter(inp.lower())  # lowercase first so 'S' and 's' are counted together
    return {e: c[e] for e in vals}
s = 'Sample String'
print(myfunc(s, 'sao'))
Otherwise you can explicitly initialize every requested letter to 0 in your function.
def func(inp, vals):
    count = {e:0 for e in vals}
    for s in inp.lower():
        if s in count:
            count[s] += 1
    return count
# create a function
def stringFunc(string, letters):
    # convert string of letters to a list of letters
    letter_list = list(letters)
    # dictionary comprehension to count the number of times a letter is in the string
    d = {letter: string.lower().count(letter) for letter in letter_list}
    return d
stringFunc('Hello World', 'lohdx')
# {'l': 3, 'o': 2, 'h': 1, 'd': 1, 'x': 0}
You can use a dict comprehension and str.count:
def count_letters(text, letters):
    lower_text = text.lower()
    return {c: lower_text.count(c) for c in letters}
print(count_letters("Sample String", "sao"))
Result: {'s': 2, 'a': 1, 'o': 0}
You can use collections.Counter and obtain character counts via the get method:
from collections import Counter
def func(string, chars):
    counts = Counter(string.lower())
    return {c: counts.get(c, 0) for c in chars}
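As a side note on the Counter-based answers: a Counter reports 0 for a key it has never seen, so plain indexing gives the same result as get(c, 0). A small sketch of that behavior; count_chars is a name made up for this example:

```python
from collections import Counter

def count_chars(text, chars):
    counts = Counter(text.lower())
    # Indexing a Counter with a missing key returns 0 and does not insert the key.
    return {c: counts[c] for c in chars}

print(count_chars("Sample String", "sao"))  # {'s': 2, 'a': 1, 'o': 0}
```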

How to count the occurrences of a list item without using count or counter?

I have a list of words and want to know how many unique words there are. I will eventually import the list into a dictionary showing how many of each word there is.
Right now I have
while i < len(list_words):
    if list_words[i] in list_words:
        repetitions += 1
    i += 1
print(repetitions)
But this just returns the length of the list.
Try this:
word_counts = dict.fromkeys(list_words, 0)
for word in list_words:
    word_counts[word] += 1
Using defaultdict with int:
from collections import defaultdict
l = ['apple','banana','pizza','apple','banana']
d = defaultdict(int)
for k in l:
    d[k] += 1
print(d)
defaultdict(<class 'int'>, {'apple': 2, 'banana': 2, 'pizza': 1})
If you want to know the words which are unique use:
keys = list(d.keys())
[keys[index] for index, value in enumerate(d.values()) if value == 1]
['pizza']
To get the count of unique words use:
sum([True for value in d.values() if value == 1])
1
If you only need totals, there is no need for a loop: len(set(list_words)) is the number of distinct words, and len(list_words) - len(set(list_words)) is the number of repeated entries.
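The set-based shortcut can be checked on a small list. In this sketch (variable names are illustrative, and a plain dict is used so neither count nor Counter is needed), the difference between the list length and the set length is the number of repeated entries, which is not the same thing as the number of words that occur exactly once:

```python
list_words = ['apple', 'banana', 'pizza', 'apple', 'banana']

# Tally each word with a plain dict.
counts = {}
for w in list_words:
    counts[w] = counts.get(w, 0) + 1

distinct = len(set(list_words))               # different words in the list
repeats = len(list_words) - distinct          # entries that duplicate an earlier word
once = sum(1 for n in counts.values() if n == 1)  # words that occur exactly once

print(distinct, repeats, once)  # 3 2 1
```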
Apply len to a list comprehension:
>>> l = ['apple','banana','pizza','apple','banana']
>>> len([i for i in l if i == 'apple']) # for example we want "apple" to be the one to count.
2
This is for Python 2.7:
list_words = ["a","b","c","a","b","b","a"]
d = {}
for word in list_words:
    if word in d.keys():
        d[word]+=1
    else:
        d[word]=1
print "There are %d different words and they are: %s"%(len(d.keys()), d.keys())
print d
One possibility would be this:
list_words = ["cat", "mouse", "cat", "rat"]
i = 0
dictionary = {}
result = 0
for unique_word in set(list_words):
    word_occurances = 0
    for word in list_words:
        if word == unique_word:
            word_occurances += 1
    dictionary[unique_word] = word_occurances
for word in dictionary:
    if dictionary[word] == 1:
        result += 1
print("There are " + str(result) + " unique words")
UPDATED:
ls = ["apple", "bear", "cat", "cat", "drive"]
lu = set(ls)
d = {}
for s in ls:
    if s in d.keys():
        d[s] += 1
    else:
        d[s] = 1
print(d["cat"]) # count of one word in the list
print([s for s in lu if d[s] > 1]) # words that appear more than once
print(lu) # distinct words (set order may vary)
print(d) # count for every word
Out:
2
['cat']
{'cat', 'bear', 'drive', 'apple'}
{'apple': 1, 'bear': 1, 'cat': 2, 'drive': 1}

Counting subsequent letters

I am trying to implement code that counts which letter comes next after each letter in a sentence, using Python.
So for instance, given
"""So I am trying to implement code that will count the next letter in a sentence, using
python"""
the most common next letters would be:
for 's'
    'o' :1
    'e' :1
for 'o'
    ' ' :1
    'd' :1
    'u' :1
    'n' :1
I think you get the idea.
I have already written code for counting letters:
def count_letters(word, char):
    count = 0
    for c in word:
        if char == c:
            count += 1
    return count
As you can see, this only counts a given letter, not the letter that follows it. Can someone give me a hand with this one?
from collections import Counter, defaultdict
counts = defaultdict(Counter)
s = """So I am trying to implement code that will count the next letter in a sentence, using
python""".lower()
for c1, c2 in zip(s, s[1:]):
    counts[c1][c2] += 1
(apart from being simpler, this should be significantly faster than pault's answer by not iterating over the string for every letter)
Concepts to google that aren't named in the code:
for c1, c2 in ... (namely the fact that there are two variables): tuple unpacking
s[1:]: slicing. Basically this is a copy of the string after the first character.
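The two concepts named above can be seen on a short string. This is only an illustrative sketch; the variable names are not from the answer:

```python
s = "hello"

# s[1:] is "ello"; zip pairs each character with its successor
# and stops when the shorter input runs out.
pairs = list(zip(s, s[1:]))
print(pairs)  # [('h', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o')]

# Tuple unpacking: each pair is split into two loop variables.
for c1, c2 in pairs:
    print(c1, "is followed by", c2)
```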
Here is a relatively terse way to do it:
from itertools import groupby
from collections import Counter
def countTransitionFrequencies(text):
    prevNext = list(zip(text[:-1], text[1:]))
    prevNext.sort(key = lambda pn: pn[0])
    transitions = groupby(prevNext, lambda pn: pn[0])
    freqs = map(
        lambda kts: (kts[0], Counter(map(lambda kv: kv[1], kts[1]))),
        transitions
    )
    return freqs
Explanation:
zip creates list of pairs with (previous, next) characters
The pairs are sorted and grouped by the previous character
The frequencies of the next characters (extracted from pairs by kv[1]) are then counted using Counter.
Sorting is not really necessary, but unfortunately, this is how the provided groupby works.
An example:
for k, v in countTransitionFrequencies("hello world"):
    print("%r -> %r" % (k, v))
This prints:
' ' -> Counter({'w': 1})
'e' -> Counter({'l': 1})
'h' -> Counter({'e': 1})
'l' -> Counter({'l': 1, 'o': 1, 'd': 1})
'o' -> Counter({' ': 1, 'r': 1})
'r' -> Counter({'l': 1})
'w' -> Counter({'o': 1})
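The remark about sorting in this answer comes from the fact that itertools.groupby only merges adjacent items with equal keys. A small sketch with hand-picked sample pairs shows the difference:

```python
from itertools import groupby

pairs = [('l', 'l'), ('o', ' '), ('l', 'o')]  # hand-picked (previous, next) pairs

def first(pair):
    return pair[0]

# Without sorting, 'l' shows up as two separate groups.
unsorted_keys = [k for k, _ in groupby(pairs, key=first)]
# After sorting by the same key, equal keys are adjacent and merge into one group.
sorted_keys = [k for k, _ in groupby(sorted(pairs, key=first), key=first)]

print(unsorted_keys)  # ['l', 'o', 'l']
print(sorted_keys)    # ['l', 'o']
```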
Here's a way using collections.Counter:
Suppose the string you provided was stored in a variable s.
First we iterate over the set of all lower case letters in s. We do this by making another string s_lower which will convert the string s to lowercase. We then wrap this with the set constructor to get unique values.
For each char, we iterate through the string and check to see if the previous letter is equal to char. If so, we store this in a list. Finally, we pass this list into the collections.Counter constructor which will count the occurrences.
Each counter is stored in a dictionary, counts, where the keys are the unique characters in the string.
from collections import Counter
counts = {}
s_lower = s.lower()
for char in set(s_lower):
    counts[char] = Counter(
        [c for i, c in enumerate(s_lower) if i > 0 and s_lower[i-1] == char]
    )
For your string, this has the following outputs:
>>> print(counts['s'])
#Counter({'i': 1, 'e': 1, 'o': 1})
>>> print(counts['o'])
#Counter({' ': 2, 'd': 1, 'n': 1, 'u': 1})
One caveat is that this method iterates through the whole string once for each unique character, which could make it slow for long strings.
Here is an alternative approach using collections.Counter and collections.defaultdict that only loops through the string once:
from collections import defaultdict, Counter
def count_letters(s):
    s_lower = s.lower()
    counts = defaultdict(Counter)
    for i in range(len(s_lower) - 1):
        curr_char = s_lower[i]
        next_char = s_lower[i+1]
        counts[curr_char].update(next_char)
    return counts
counts = count_letters(s)
We loop over each character in the string (except the last) and on each iteration we update a counter using the next character.
This should work. The only thing is that it doesn't sort the values, but that can be solved by creating a new dictionary with a list of (char, occurrences) tuples and using the sorted function on tuple[1].
def countNext(word):
    d = {}
    word = word.lower()
    for i in range(len(word) - 1):
        c = word[i]
        cc = word[i+1]
        # skip pairs where either character is not a letter
        if(not c.isalpha() or not cc.isalpha()):
            continue
        if c in d:
            if cc in d[c]:
                d[c][cc] += 1
            else:
                d[c][cc] = 1
        else:
            d[c] = {}
            d[c][cc] = 1
    return d
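The sorting step mentioned in this answer could look roughly like the sketch below. The helper name sort_next_counts and the hand-written sample dictionary are assumptions for illustration; the sample merely has the same nested shape that countNext returns:

```python
def sort_next_counts(d):
    """Return {char: [(next_char, count), ...]}, each list sorted by count, highest first."""
    return {
        c: sorted(followers.items(), key=lambda pair: pair[1], reverse=True)
        for c, followers in d.items()
    }

# Hand-written stand-in for a countNext result.
sample = {'l': {'l': 1, 'o': 2, 'd': 1}, 'o': {'r': 1}}
print(sort_next_counts(sample)['l'])  # [('o', 2), ('l', 1), ('d', 1)]
```

Because sorted is stable, next letters with equal counts keep their original relative order.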

Most elegant way to count integers in a list

I am looking for the most elegant way to do the following:
Let's say that I want to count number of times each integer appears in a list; I could do it this way:
x = [1,2,3,2,4,1,2,5,7,2]
dicto = {}
for num in x:
    try:
        dicto[num] = dicto[num] + 1
    except KeyError:
        dicto[num] = 1
However, I think that
try:
    dicto[num] = dicto[num] + 1
except KeyError:
    dicto[num] = 1
is not the most elegant way to do it; I think that I saw the above code replaced by a single line. What is the most elegant way to do this?
I realized that this might be a repeat, but I looked around and couldn't find what I was looking for.
Thank You in advance.
Use the Counter class
>>> from collections import Counter
>>> x = [1,2,3,2,4,1,2,5,7,2]
>>> c = Counter(x)
Now you can use the Counter object c as a dictionary.
>>> c[1]
2
>>> c[10]
0
(This works for non-existent values too.)
>>> from collections import defaultdict
>>> x = [1,2,3,2,4,1,2,5,7,2]
>>> d = defaultdict(int)
>>> for i in x:
...     d[i] += 1
...
>>> dict(d)
{1: 2, 2: 4, 3: 1, 4: 1, 5: 1, 7: 1}
Or just collections.Counter, if you are on Python 2.7+.
Bucket sort, as you're doing, is entirely algorithmically appropriate (discussion). This seems ideal when you don't need the additional overhead from Counter:
from collections import defaultdict
wdict = defaultdict(int)
for word in words:
    wdict[word] += 1
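The question mentions having seen the try/except block replaced by a single line. One likely candidate, offered here as a guess rather than as what the asker actually saw, is dict.get with a default value:

```python
x = [1, 2, 3, 2, 4, 1, 2, 5, 7, 2]
dicto = {}
for num in x:
    # get returns 0 when num has not been seen yet, so no KeyError handling is needed.
    dicto[num] = dicto.get(num, 0) + 1

print(dicto)  # {1: 2, 2: 4, 3: 1, 4: 1, 5: 1, 7: 1}
```

This needs no imports and works with a plain dict, at the cost of one extra lookup per element compared with defaultdict.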

How to use dict in python?

10
5
-1
-1
-1
1
1
0
2
...
If I want to count the number of occurrences of each number in a file, how do I use python to do it?
This is almost the exact same algorithm described in Anurag Uniyal's answer, except using the file as an iterator instead of readline():
from collections import defaultdict
try:
    from io import StringIO # 2.6+, 3.x
except ImportError:
    from StringIO import StringIO # 2.5
data = defaultdict(int)
#with open("filename", "r") as f: # if a real file
with StringIO("10\n5\n-1\n-1\n-1\n1\n1\n0\n2") as f:
    for line in f:
        data[int(line)] += 1
# items() and the formatted print also work on Python 3
for number, count in data.items():
    print("%s was found %s times" % (number, count))
Counter is your best friend :)
http://docs.python.org/dev/library/collections.html#counter-objects
For Python 2.5 and 2.6: http://code.activestate.com/recipes/576611/
>>> from collections import Counter
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1
...
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})
# or just cnt = Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])
For this question:
print(Counter(int(line.strip()) for line in open("foo.txt", "rb")))
# output
Counter({-1: 3, 1: 2, 0: 1, 2: 1, 5: 1, 10: 1})
I think what you call map is, in python, a dictionary.
Here is a useful link on how to use it: http://docs.python.org/tutorial/datastructures.html#dictionaries
For a good solution, see the answer from Stephan or Matthew - but take also some time to understand what that code does :-)
Read the lines of the file into a list l, e.g.:
l = [int(line) for line in open('filename','r')]
Starting with a list of values l, you can create a dictionary d that gives you for each value in the list the number of occurrences like this:
>>> l = [10,5,-1,-1,-1,1,1,0,2]
>>> d = dict((x,l.count(x)) for x in l)
>>> d[1]
2
EDIT: as Matthew rightly points out, this is hardly optimal. Here is a version using defaultdict:
from collections import defaultdict
d = defaultdict(int)
for line in open('filename','r'):
    d[int(line)] += 1
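New in Python 3.1 (each line is converted with int here; otherwise the keys would be the raw lines, including their trailing newlines):
from collections import Counter
with open("filename","r") as lines:
    print(Counter(int(line) for line in lines))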
Use collections.defaultdict so that the default count for anything is zero.
After that, loop through the lines in the file using file.readline and convert each line to int.
Increment the counter for each value in your countDict.
Finally, go through the dict using for intV, count in countDict.iteritems() and print the values.
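The steps above can be sketched as follows for Python 3, where items() replaces iteritems() and the file object is iterated directly instead of calling readline. The count_numbers name and the in-memory StringIO sample standing in for a real file are assumptions made for this example:

```python
from collections import defaultdict
from io import StringIO

def count_numbers(lines):
    """Count how often each integer appears, given an iterable of text lines."""
    count_dict = defaultdict(int)  # missing keys start at 0
    for line in lines:
        line = line.strip()
        if line:  # ignore blank lines
            count_dict[int(line)] += 1
    return count_dict

# StringIO stands in for open("filename") so the sketch runs without a file.
sample = StringIO("10\n5\n-1\n-1\n-1\n1\n1\n0\n2\n")
for value, count in count_numbers(sample).items():
    print(value, "was found", count, "times")
```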
Use a dictionary where every line is a key and its count is the value. Increment the count for every line, and if there is no dictionary entry for the line yet, initialize it with 1 in the except clause. This should work with older versions of Python.
def count_same_lines(fname):
    line_counts = {}
    for l in open(fname):
        l = l.rstrip()
        if l:
            try:
                line_counts[l] += 1
            except KeyError:
                line_counts[l] = 1
    print('cnt\ttxt')
    for k in line_counts.keys():
        print('%d\t%s' % (line_counts[k], k))
l = [10,5,-1,-1,-1,1,1,0,2]
d = {}
for x in l:
    d[x] = (d[x] + 1) if (x in d) else 1
There will be a key in d for every distinct value in the original list, and the values of d will be the number of occurrences.
counter.py
#!/usr/bin/env python
import fileinput
from collections import defaultdict
frequencies = defaultdict(int)
for line in fileinput.input():
    frequencies[line.strip()] += 1
print frequencies
Example:
$ perl -E'say 1*(rand() < 0.5) for (1..100)' | python counter.py
defaultdict(<type 'int'>, {'1': 52, '0': 48})
