Why does my program below fail when I do not set the default value of the defaultdict to zero (int)?
>>> doc
'A wonderful serenity has taken possession of my entire soul, like these sweet mornings of spring which I enjoy with my whole heart. I am alone, and feel the charm of existence in this spot, which was created for the bliss of souls like mine. I am so happy'
>>> some = defaultdict()
>>> for i in doc.split():
...     some[i] = some[i]+1
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
KeyError: 'A'
>>> some
defaultdict(None, {})
>>> i
'A'
yet it works with a default value
>>> some = defaultdict(int)
>>> for i in doc.split():
...     some[i] = some[i]+1
...
>>> some
defaultdict(<class 'int'>, {'A': 1, 'wonderful': 1, 'serenity': 1, 'has': 1, 'taken': 1, 'possession': 1, 'of': 4, 'my': 2, 'entire': 1, 'soul,': 1, 'like': 2, 'these': 1, 'sweet': 1, 'mornings': 1, 'spring': 1, 'which': 2, 'I': 3, 'enjoy': 1, 'with': 1, 'whole': 1, 'heart.': 1, 'am': 2, 'alone,': 1, 'and': 1, 'feel': 1, 'the': 2, 'charm': 1, 'existence': 1, 'in': 1, 'this': 1, 'spot,': 1, 'was': 1, 'created': 1, 'for': 1, 'bliss': 1, 'souls': 1, 'mine.': 1, 'so': 1, 'happy': 1})
>>>
Could you tell me why it works this way?
As the documentation says:
The first argument provides the initial value for the default_factory
attribute; it defaults to None. All remaining arguments are treated
the same as if they were passed to the dict constructor, including
keyword arguments.
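Therefore, if you create a defaultdict without passing anything to the constructor, default_factory is None. In that case the object behaves like a plain dict, and looking up a missing key raises KeyError instead of creating a default value.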
See the output:
some = defaultdict()
print(some) # defaultdict(None, {})
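With default_factory set to None, some[i] raises KeyError for a key that is not yet present, so some[i] = some[i]+1 fails on the very first word.
Thus, you have to pass int as the default factory explicitly: some = defaultdict(int). Missing keys then start at int(), which is 0.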
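The difference can be checked directly. A minimal sketch (the names `plain` and `counts` are illustrative and not from the question):

```python
from collections import defaultdict

# With no factory, default_factory is None and missing keys raise KeyError,
# exactly as with a regular dict.
plain = defaultdict()
try:
    plain["A"]
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# With int as the factory, a missing key is created with int(), i.e. 0.
counts = defaultdict(int)
for word in "a b a".split():
    counts[word] += 1
print(dict(counts))  # {'a': 2, 'b': 1}
```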
Related
So I have a large nested dictionary which has the following structure:
dic = {Review0: [{'there': 1, 'good': 3, 'news': 4, 'bad': 4, 'first': 3}],
Review1: [{'roomat': 1, 'recent': 1, 'bought': 1, 'explor': 1, 'sport': 1, 'suv': 2, 'realli': 3, 'nice': 4}],
Review2: [{'found': 2, 'pregnanc': 2, 'also': 1, 'nice': 1, 'explor': 1, 'result': 2}]}
So in order to look at the keys in Review0, I can index through dictionary like this dic[0]
I want to find a way to loop through the nested dictionary to check if a key exists from Review0 to ReviewN, so for example if I want to look for the word pregnanc it will find it in Review2 and return True.
Any ideas?
def yourfunc(dic):
    for key, value in dic.items():
        if 'pregnanc' in value[0]:
            return True
    return False
data = {'Review0': [{'there': 1, 'good': 3, 'news': 4, 'bad': 4, 'first': 3}],
'Review1': [{'roomat': 1, 'recent': 1, 'bought': 1, 'explor': 1, 'sport': 1, 'suv': 2, 'realli': 3, 'nice': 4}],
'Review2': [{'found': 2, 'pregnanc': 2, 'also': 1, 'nice': 1, 'explor': 1, 'result': 2}]}
print (yourfunc(data))
or if your "reviews" might have multiple items :
def yourfunc(dic):
    for key, value in dic.items():
        for item in value:
            if 'pregnanc' in item:
                return True
    return False
let me know if this isn't what you're looking for.
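The same check can also be written with the built-in any(), which always returns a boolean and takes the word as a parameter. A sketch, where `contains_word` is a hypothetical name and the data is a shortened version of the sample from the question:

```python
def contains_word(reviews, word):
    """Return True if `word` is a key in any dict of any review's list."""
    return any(word in item
               for items in reviews.values()
               for item in items)

data = {'Review0': [{'there': 1, 'good': 3}],
        'Review1': [{'suv': 2, 'nice': 4}],
        'Review2': [{'found': 2, 'pregnanc': 2}]}

print(contains_word(data, 'pregnanc'))  # True
print(contains_word(data, 'missing'))   # False
```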
testWords is a list with words. setTestWords is the same list as a set. I want to create a dictionary with Dict Comprehension where I will use the word as key and the count as value. I'm also using the .count.
Example output would be like this:
>>> dictTestWordsCount[:2]
>>> {'hi': 22, 'hello': 99}
This is the line of code I'm using, but it seems to crash my notebook every time.
l = {x: testWords.count(x) for x in setTestwords}
Not sure what causes your notebook to crash...
In [62]: txt = "the quick red fox jumped over the lazy brown dog"
In [63]: testWords = txt.split()
In [64]: setTestWords = set(testWords)
In [65]: {x:testWords.count(x) for x in setTestWords}
Out[65]:
{'brown': 1,
'dog': 1,
'fox': 1,
'jumped': 1,
'lazy': 1,
'over': 1,
'quick': 1,
'red': 1,
'the': 2}
Or better, use collections.defaultdict:
from collections import defaultdict
d = defaultdict(int)
for word in txt.split():
    d[word]+=1
print(d)
defaultdict(int,
{'brown': 1,
'dog': 1,
'fox': 1,
'jumped': 1,
'lazy': 1,
'over': 1,
'quick': 1,
'red': 1,
'the': 2})
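One possible reason for the slowdown (an assumption, since the question does not say how large the list is): testWords.count(x) rescans the whole list for every distinct word, so the comprehension does work proportional to the number of distinct words times the length of the list. collections.Counter makes a single pass instead. A minimal sketch using the same sample text:

```python
from collections import Counter

txt = "the quick red fox jumped over the lazy brown dog"
word_counts = Counter(txt.split())  # one pass over the words

print(word_counts["the"])          # 2
print(word_counts.most_common(1))  # [('the', 2)]
```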
I have a default dictionary with name df:
defaultdict(<type 'int'>, {u'DE': 1, u'WV': 1, u'HI': 1, u'WY': 1, u'NH': 2, u'NJ': 1, u'NM': 1, u'TX': 1, u'LA': 1, u'NC': 1, u'NE': 1, u'TN': 1, u'RI': 1, u'VA': 1, u'CO': 1, u'AK': 1, u'AR': 1, u'IL': 1, u'GA': 1, u'IA': 1, u'MA': 1, u'ID': 1, u'ME': 1, u'OK': 2, u'MN': 1, u'MI': 1, u'KS': 1, u'MT': 1, u'MS': 1, u'SC': 2, u'KY': 1, u'OR': 1, u'SD': 1})
how do I get the keys of this dictionary whose values are more than 1.
If I do [df[val] for val in df if df[val]>1]
I get the output as [2, 2, 2]
If I print [df.keys() for val in df if df[val]>1] I still do not get the keys. I need the keys whose values are more than 1, like this: ['SC', 'OK', 'NH']
How do I do that??
Reading from a dictionary created using defaultdict() is the same as a normal dict.
To get the keys which have values > 1, you would do:
my_dict = defaultdict(...)
print [key for key, value in my_dict.iteritems() if value > 1]
If you're using Python 3 then it's my_dict.items().
We can use a list comprehension.
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> d['HI'] = 1
>>> d['NH'] = 2
>>> d['WY'] = 1
>>> d['OK'] = 2
>>> [i[0] for i in d.items() if i[1]>1]
['NH', 'OK']
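For Python 3, where iteritems() no longer exists and print is a function, the same filter can be sketched as follows (the sample data is a small subset of the question's dictionary):

```python
from collections import defaultdict

df = defaultdict(int, {'DE': 1, 'NH': 2, 'OK': 2, 'SC': 2, 'TX': 1})

# Unpack each (key, value) pair and keep the keys whose value exceeds 1.
keys = [key for key, value in df.items() if value > 1]
print(sorted(keys))  # ['NH', 'OK', 'SC']
```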
My problem is I can't figure out how to display the word count using the dictionary and refer
to keys length. For example, consider the following piece of text:
"This is the sample text to get an idea!. "
Then the required output would be
3 2
2 3
0 5
as there are 3 words of length 2, 2 words of length 3, and 0 words of length 5 in the
given sample text.
I got as far as displaying the word occurrence frequency:
def word_frequency(filename):
    word_count_list = []
    word_freq = {}
    text = open(filename, "r").read().lower().split()
    word_freq = [text.count(p) for p in text]
    dictionary = dict(zip(text,word_freq))
    return dictionary
print word_frequency("text.txt")
which displays the dict in this format:
{'all': 3, 'show': 1, 'welcomed': 1, 'not': 2, 'availability': 1, 'television,': 1, '28': 1, 'to': 11, 'has': 2, 'ehealth,': 1, 'do': 1, 'get': 1, 'they': 1, 'milestone': 1, 'kroes,': 1, 'now': 3, 'bringing': 2, 'eu.': 1, 'like': 1, 'states.': 1, 'them.': 1, 'european': 2, 'essential': 1, 'available': 4, 'because': 2, 'people': 3, 'generation': 1, 'economic': 1, '99.4%': 1, 'are': 3, 'eu': 1, 'achievement,': 1, 'said': 3, 'for': 3, 'broadband': 7, 'networks': 2, 'access': 2, 'internet': 1, 'across': 2, 'europe': 1, 'subscriptions': 1, 'million': 1, 'target.': 1, '2020,': 1, 'news': 1, 'neelie': 1, 'by': 1, 'improve': 1, 'fixed': 2, 'of': 8, '100%': 1, '30': 1, 'affordable': 1, 'union,': 2, 'countries.': 1, 'products': 1, 'or': 3, 'speeds': 1, 'cars."': 1, 'via': 1, 'reached': 1, 'cloud': 1, 'from': 1, 'needed': 1, '50%': 1, 'been': 1, 'next': 2, 'households': 3, 'commission': 5, 'live': 1, 'basic': 1, 'was': 1, 'said:': 1, 'more': 1, 'higher.': 1, '30mbps': 2, 'that': 4, 'but': 2, 'aware': 1, '50mbps': 1, 'line': 1, 'statement,': 1, 'with': 2, 'population': 1, "europe's": 1, 'target': 1, 'these': 1, 'reliable': 1, 'work': 1, '96%': 1, 'can': 1, 'ms': 1, 'many': 1, 'further.': 1, 'and': 6, 'computing': 1, 'is': 4, 'it': 2, 'according': 1, 'have': 2, 'in': 5, 'claimed': 1, 'their': 1, 'respective': 1, 'kroes': 1, 'areas.': 1, 'responsible': 1, 'isolated': 1, 'member': 1, '100mbps': 1, 'digital': 2, 'figures': 1, 'out': 1, 'higher': 1, 'development': 1, 'satellite': 4, 'who': 1, 'connected': 2, 'coverage': 2, 'services': 2, 'president': 1, 'a': 1, 'vice': 1, 'mobile': 2, "commission's": 1, 'points': 1, '"access': 1, 'rural': 1, 'the': 16, 'agenda,': 1, 'having': 1}
def freqCounter(infilepath):
    answer = {}
    with open(infilepath) as infile:
        for line in infile:  # iterate over the file's lines, not the path string
            for word in line.strip().split():
                l = len(word)
                if l not in answer:
                    answer[l] = 0
                answer[l] += 1
    return answer
Alternatively:
import collections
def freqCounter(infilepath):
    with open(infilepath) as infile:
        return collections.Counter(len(word) for line in infile for word in line.strip().split())
Use collections.Counter
import collections
sentence = "This is the sample text to get an idea"
Count = collections.Counter([len(a) for a in sentence.split()])
print Count
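To print the counts in the "count length" format the question asks for, one option is to look up each length of interest in the Counter; a length that never occurs yields 0 instead of raising KeyError. A sketch, assuming the lengths to report (2, 3 and 5) are chosen by the caller, and keeping in mind that split() leaves punctuation attached to words:

```python
from collections import Counter

sentence = "This is the sample text to get an idea"
length_counts = Counter(len(word) for word in sentence.split())

# Counter returns 0 for keys it has never seen.
lines = ["%d %d" % (length_counts[n], n) for n in (2, 3, 5)]
print("\n".join(lines))
# 3 2
# 2 3
# 0 5
```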
To count how many words in a text have given lengths: size -> frequency distribution, you could use a regular expression to extract words:
#!/usr/bin/env python3
import re
from collections import Counter
text = "This is the sample text to get an idea!. "
words = re.findall(r'\w+', text.casefold())
frequencies = Counter(map(len, words)).most_common()
print("\n".join(["%d word(s) of length %d" % (n, length)
                 for length, n in frequencies]))
Output
3 word(s) of length 2
3 word(s) of length 4
2 word(s) of length 3
1 word(s) of length 6
Note: unlike .split()-based solutions, this automatically ignores punctuation such as the !. after 'idea'.
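A short comparison of the two tokenizations on the same sample text makes the effect visible (a sketch; the names `split_words` and `regex_words` are illustrative):

```python
import re

text = "This is the sample text to get an idea!. "

split_words = text.casefold().split()
regex_words = re.findall(r'\w+', text.casefold())

# split() keeps the trailing punctuation, so the last "word" is too long.
print(split_words[-1], len(split_words[-1]))  # idea!. 6
print(regex_words[-1], len(regex_words[-1]))  # idea 4
```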
To read words from a file, you could read it line by line and extract the words in the same way as is done for text in the first code example:
from itertools import chain
with open(filename) as file:
    words = chain.from_iterable(re.findall(r'\w+', line.casefold())
                                for line in file)
    # use words here.. (the same as above)
    frequencies = Counter(map(len, words)).most_common()
    print("\n".join(["%d word(s) of length %d" % (n, length)
                     for length, n in frequencies]))
In practice, you could use a list to find the length frequency distribution if you ignore words that are longer than a threshold:
def count_lengths(words, maxlen=100):
    frequencies = [0] * (maxlen + 1)
    for length in map(len, words):
        if length <= maxlen:
            frequencies[length] += 1
    return frequencies
Example
import re
text = "This is the sample text to get an idea!. "
words = re.findall(r'\w+', text.casefold())
frequencies = count_lengths(words)
print("\n".join(["%d word(s) of length %d" % (n, length)
                 for length, n in enumerate(frequencies) if n > 0]))
Output
3 word(s) of length 2
2 word(s) of length 3
3 word(s) of length 4
1 word(s) of length 6
What I want is to feed in a multiline text file, about a paragraph long, and get back something like:
{'Total words': 'NUMBER', 'Words ending with LY': 'NUMBER'}
I have never used Counter before, but I believe that is how I would do it. I want it to count every word and, if the word ends in LY, add it to the second count. Since I have never used Counter, I don't know where to start...
with open('SOMETHING.txt') as f:
    # something to do with counter here?
EDIT: I have to do it without using Counter! How would I achieve the same result without it?
This should work for you...
def parse_file():
    with open('SOMETHING.txt', 'r') as f:
        c1 = 0
        c2 = 0
        for i in f:
            w = i.split()
            c1 += len(w)
            for j in w:
                if j.endswith('LY'):
                    c2 += 1
    return {'Total words': c1, 'Words ending with LY': c2}
However, I would recommend that you have a look at a few Python basics.
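Note that endswith('LY') is case-sensitive, so a word such as "quickly" is not counted. If the intent is to match any capitalisation (an assumption, since the question does not say), the words can be lower-cased first. A sketch that accepts any iterable of lines, so it works with an open file as well; `count_ly` is a hypothetical name:

```python
def count_ly(lines):
    """Count all words and the words ending in 'ly', ignoring case."""
    total = 0
    ending_ly = 0
    for line in lines:
        for word in line.split():
            total += 1
            # Strip trailing punctuation so "slowly." still counts.
            if word.lower().rstrip('.,;:!?"\'').endswith('ly'):
                ending_ly += 1
    return {'Total words': total, 'Words ending with LY': ending_ly}

sample = ["She spoke QUICKLY and", "he answered slowly."]
print(count_ly(sample))
# {'Total words': 7, 'Words ending with LY': 2}
```

With a real file, the open file object can be passed directly, for example inside a with open('SOMETHING.txt') as f: block.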
Is this hard to try?
from collections import defaultdict
result = defaultdict(int)
result_second = defaultdict(int)
for word in open('text.txt').read().split():
    result[word] += 1
    if word.endswith('LY'):
        result_second[word] +=1
print result,result_second
Output:
defaultdict(<type 'int'>, {'and': 1, 'Considering': 1, 'have': 2, "don't": 1, 'is': 1, 'it': 2, 'second': 1, 'want': 1, 'in': 1, 'before': 1, 'would': 1, 'to': 3, 'count.': 1, 'go...': 1, 'how': 1, 'add': 1, 'if': 1, 'LY': 1, 'it.': 1, 'do': 1, 'ends': 1, 'used': 2, 'that': 1, 'I': 1, 'Counter': 2, 'but': 1, 'So': 1, 'know': 1, 'never': 2, 'believe': 1, 'count': 1, 'word': 2, 'i': 5, 'every': 1, 'the': 2, 'where': 1})
Use collections.Counter()
import collections
with open('your_file.txt') as fp:
    text = fp.read()
counter = collections.Counter(['ends_in_ly' if token.endswith('LY') else 'doesnt_end_in_ly' for token in text.split()])
Without Counter:
with open('file.txt') as fp:
    tokens = fp.read().split()
c = sum(1 for token in tokens if token.endswith('LY'))
print({'ending_in_ly': c, 'not_ending_in_ly': len(tokens) - c})