Python - Check if a key exists in a large nested dictionary - python

So I have a large nested dictionary with the following structure:
dic = {'Review0': [{'there': 1, 'good': 3, 'news': 4, 'bad': 4, 'first': 3}],
       'Review1': [{'roomat': 1, 'recent': 1, 'bought': 1, 'explor': 1, 'sport': 1, 'suv': 2, 'realli': 3, 'nice': 4}],
       'Review2': [{'found': 2, 'pregnanc': 2, 'also': 1, 'nice': 1, 'explor': 1, 'result': 2}]}
To look at the keys in Review0, I can index into the dictionary like this: dic['Review0'][0]
I want to loop through the nested dictionary and check whether a key exists anywhere from Review0 to ReviewN. For example, if I look for the word pregnanc, it should find it in Review2 and return True.
Any ideas?

def yourfunc(dic):
    for key, value in dic.items():
        # each value is a list whose first element is the word-count dict
        if 'pregnanc' in value[0]:
            return True
    return False  # the word was not found in any review
data = {'Review0': [{'there': 1, 'good': 3, 'news': 4, 'bad': 4, 'first': 3}],
        'Review1': [{'roomat': 1, 'recent': 1, 'bought': 1, 'explor': 1, 'sport': 1, 'suv': 2, 'realli': 3, 'nice': 4}],
        'Review2': [{'found': 2, 'pregnanc': 2, 'also': 1, 'nice': 1, 'explor': 1, 'result': 2}]}
print(yourfunc(data))
Or, if your "reviews" might contain multiple items:
def yourfunc(dic):
    for key, value in dic.items():
        for item in value:
            if 'pregnanc' in item:
                return True
    return False  # the word was not found in any review
Let me know if this isn't what you're looking for.
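Both versions above hard-code the word being searched for. A minimal sketch of a more general helper, assuming every value in the outer dictionary is a list of dictionaries as in the question (the name contains_key is made up for illustration, and the data is shortened):

```python
def contains_key(reviews, word):
    """Return True if `word` is a key in any inner dictionary."""
    return any(word in item
               for items in reviews.values()
               for item in items)


data = {
    'Review0': [{'there': 1, 'good': 3, 'news': 4, 'bad': 4, 'first': 3}],
    'Review2': [{'found': 2, 'pregnanc': 2, 'also': 1, 'nice': 1}],
}

print(contains_key(data, 'pregnanc'))  # True
print(contains_key(data, 'missing'))   # False
```

any() stops at the first match, so it does no more work than the explicit loops.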

Related

from collections import defaultdict

Why is it that when I do not give defaultdict a default factory such as int, the program below raises an error instead of giving me results?
>>> doc
'A wonderful serenity has taken possession of my entire soul, like these sweet mornings of spring which I enjoy with my whole heart. I am alone, and feel the charm of existence in this spot, which was created for the bliss of souls like mine. I am so happy'
>>> some = defaultdict()
>>> for i in doc.split():
...     some[i] = some[i]+1
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
KeyError: 'A'
>>> some
defaultdict(None, {})
>>> i
'A'
Yet it works when I pass int as the default factory:
>>> some = defaultdict(int)
>>> for i in doc.split():
...     some[i] = some[i]+1
...
>>> some
defaultdict(<class 'int'>, {'A': 1, 'wonderful': 1, 'serenity': 1, 'has': 1, 'taken': 1, 'possession': 1, 'of': 4, 'my': 2, 'entire': 1, 'soul,': 1, 'like': 2, 'these': 1, 'sweet': 1, 'mornings': 1, 'spring': 1, 'which': 2, 'I': 3, 'enjoy': 1, 'with': 1, 'whole': 1, 'heart.': 1, 'am': 2, 'alone,': 1, 'and': 1, 'feel': 1, 'the': 2, 'charm': 1, 'existence': 1, 'in': 1, 'this': 1, 'spot,': 1, 'was': 1, 'created': 1, 'for': 1, 'bliss': 1, 'souls': 1, 'mine.': 1, 'so': 1, 'happy': 1})
>>>
Could you tell me why it works like this?
As the documentation says:
The first argument provides the initial value for the default_factory
attribute; it defaults to None. All remaining arguments are treated
the same as if they were passed to the dict constructor, including
keyword arguments.
Therefore, if you call defaultdict() without passing anything to the constructor, default_factory is set to None.
See the output:
some = defaultdict()
print(some) # defaultdict(None, {})
When default_factory is None, looking up a missing key raises KeyError, just as it does for a plain dict, so some[i] = some[i]+1 fails on the first word.
Thus, you have to pass int as the default factory explicitly: some = defaultdict(int). Missing keys then start at int(), which is 0.
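The two behaviours can be compared side by side. This is a small sketch using a shortened sentence rather than the full text from the question; the variable names no_factory and counts are illustrative only:

```python
from collections import defaultdict

doc = "I am alone, and I am so happy"

# No factory: default_factory is None, so a missing key raises KeyError,
# exactly like a plain dict.
no_factory = defaultdict()
try:
    no_factory['A'] + 1
except KeyError as exc:
    print('KeyError:', exc)

# With int as the factory, a missing key is created with int() == 0.
counts = defaultdict(int)
for word in doc.split():
    counts[word] += 1

print(counts['I'])   # 2
print(counts['am'])  # 2
```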

Dictionary returns only last key value pairs inside for loop

I have a list of strings as:
A = [
'philadelphia court excessive disappointed court hope hope',
'hope hope jurisdiction obscures acquittal court',
'mention hope maryland signal held mention problem internal reform life bolster level grievance'
]
and another list as:
B = ['court', 'hope', 'mention', 'life', 'bolster', 'internal', 'level']
I want to create a list of dictionaries holding the occurrence counts of the words in B for each string in A. Something like:
C = [
    {'court':2,'hope':2,'mention':0,'life':0,'bolster':0,'internal':0,'level':0},
    {'court':1,'hope':2,'mention':0,'life':0,'bolster':0,'internal':0,'level':0},
    {'court':0,'hope':1,'mention':2,'life':1,'bolster':1,'internal':1,'level':1}
]
This is what I tried:
dic={}
for i in A:
    t=i.split()
    for j in B:
        dic[j]=t.count(j)
But it returns only the counts for the last string:
print (dic)
{'court': 0,
'hope': 1,
'mention': 2,
'life': 1,
'bolster': 1,
'internal': 1,
'level': 1}
Instead of creating a list of dicts as in your example output, you are only creating a single dict (and overwriting the word counts each time you check a phrase). You could use re.findall to count the word occurrences in each phrase (which has the benefit of not failing if any of your phrases contain words followed by punctuation such as "hope?").
import re
words = ['court', 'hope', 'mention', 'life', 'bolster', 'internal', 'level']
phrases = ['philadelphia court excessive disappointed court hope hope','hope hope jurisdiction obscures acquittal court','mention hope maryland signal held mention problem internal reform life bolster level grievance']
counts = [{w: len(re.findall(r'\b{}\b'.format(w), p)) for w in words} for p in phrases]
print(counts)
# [{'court': 2, 'hope': 2, 'mention': 0, 'life': 0, 'bolster': 0, 'internal': 0, 'level': 0}, {'court': 1, 'hope': 2, 'mention': 0, 'life': 0, 'bolster': 0, 'internal': 0, 'level': 0}, {'court': 0, 'hope': 1, 'mention': 2, 'life': 1, 'bolster': 1, 'internal': 1, 'level': 1}]
Two issues: You are initializing the dic at the wrong place and not collecting those dics in a list. Here is the fix:
C = []
for i in A:
    dic = {}
    t=i.split()
    for j in B:
        dic[j]=t.count(j)
    C.append(dic)
# Result:
[{'court': 2, 'hope': 2, 'mention': 0, 'life': 0, 'bolster': 0, 'internal': 0, 'level': 0},
{'court': 1, 'hope': 2, 'mention': 0, 'life': 0, 'bolster': 0, 'internal': 0, 'level': 0},
{'court': 0, 'hope': 1, 'mention': 2, 'life': 1, 'bolster': 1, 'internal': 1, 'level': 1}]
Try this,
from collections import Counter
A = ['philadelphia court excessive disappointed court hope hope',
'hope hope jurisdiction obscures acquittal court',
'mention hope maryland signal held mention problem internal reform life bolster level grievance']
B = ['court', 'hope', 'mention', 'life', 'bolster', 'internal', 'level']
result = [{b: dict(Counter(i.split())).get(b, 0) for b in B} for i in A]
print(result)
output:
[{'court': 2, 'hope': 2, 'mention': 0, 'life': 0, 'bolster': 0, 'internal': 0, 'level': 0}, {'court': 1, 'hope': 2, 'mention': 0, 'life': 0, 'bolster': 0, 'internal': 0, 'level': 0}, {'court': 0, 'hope': 1, 'mention': 2, 'life': 1, 'bolster': 1, 'internal': 1, 'level': 1}]
You always overwrite the existing values in the dict dic with dic[j]=t.count(j). You could create a new dict for every i and append it to a list like:
dic=[]
for i in A:
    i_dict = {}
    t=i.split()
    for j in B:
        i_dict[j]=t.count(j)
    dic.append(i_dict)
print(dic)
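To avoid overwriting existing values, check if the entry is already in the dictionary (note that this gives totals across all phrases rather than one dictionary per phrase). Try adding, inside the inner loop:
if j in dic:
    dic[j] += t.count(j)
else:
    dic[j] = t.count(j)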
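Accumulating with += gives totals across all phrases, whereas the question asks for one dictionary per phrase. The difference can be sketched with collections.Counter, counting each phrase only once. The names per_phrase, total, seen and row are illustrative, and the data is shortened from the question:

```python
from collections import Counter

A = ['philadelphia court excessive disappointed court hope hope',
     'hope hope jurisdiction obscures acquittal court']
B = ['court', 'hope', 'mention']

per_phrase = []    # one dict per phrase (what the question asks for)
total = Counter()  # running total across all phrases
for phrase in A:
    seen = Counter(phrase.split())          # count every word once per phrase
    row = {word: seen[word] for word in B}  # a Counter returns 0 for missing words
    per_phrase.append(row)
    total.update(row)                       # update() adds counts, it does not overwrite

print(per_phrase)
# [{'court': 2, 'hope': 2, 'mention': 0}, {'court': 1, 'hope': 2, 'mention': 0}]
print(dict(total))
# {'court': 3, 'hope': 4, 'mention': 0}
```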

What's the correct way to loop over a list and make a dictionary with dict comprehension in python?

testWords is a list with words. setTestWords is the same list as a set. I want to create a dictionary with Dict Comprehension where I will use the word as key and the count as value. I'm also using the .count.
Example output would be like this:
>>> dictTestWordsCount[:2]
>>> {'hi': 22, 'hello': 99}
This is the line of code I'm using, but it seems to crash my notebook every time.
l = {x: testWords.count(x) for x in setTestwords}
Not sure what causes your notebook to crash...
In [62]: txt = "the quick red fox jumped over the lazy brown dog"
In [63]: testWords = txt.split()
In [64]: setTestWords = set(testWords)
In [65]: {x:testWords.count(x) for x in setTestWords}
Out[65]:
{'brown': 1,
'dog': 1,
'fox': 1,
'jumped': 1,
'lazy': 1,
'over': 1,
'quick': 1,
'red': 1,
'the': 2}
Or better, use collections.defaultdict:
from collections import defaultdict
d = defaultdict(int)
for word in txt.split():
    d[word]+=1
print(d)
defaultdict(int,
{'brown': 1,
'dog': 1,
'fox': 1,
'jumped': 1,
'lazy': 1,
'over': 1,
'quick': 1,
'red': 1,
'the': 2})
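One possible reason for the crash, which is only a guess because the question does not say how large testWords is: list.count rescans the whole list for every distinct word, so the comprehension does work proportional to len(testWords) times len(setTestWords). collections.Counter makes a single pass over the list instead. A small sketch with the sample sentence from above (word_counts is an illustrative name):

```python
from collections import Counter

txt = "the quick red fox jumped over the lazy brown dog"
testWords = txt.split()

# One pass over the list; no separate set of words is needed.
word_counts = Counter(testWords)

print(word_counts['the'])          # 2
print(word_counts.most_common(1))  # [('the', 2)]
```

A Counter is a dict subclass, so it can be used wherever the dict from the comprehension would be.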

How to get key values from default dictionary in Python?

I have a default dictionary with name df:
defaultdict(<type 'int'>, {u'DE': 1, u'WV': 1, u'HI': 1, u'WY': 1, u'NH': 2, u'NJ': 1, u'NM': 1, u'TX': 1, u'LA': 1, u'NC': 1, u'NE': 1, u'TN': 1, u'RI': 1, u'VA': 1, u'CO': 1, u'AK': 1, u'AR': 1, u'IL': 1, u'GA': 1, u'IA': 1, u'MA': 1, u'ID': 1, u'ME': 1, u'OK': 2, u'MN': 1, u'MI': 1, u'KS': 1, u'MT': 1, u'MS': 1, u'SC': 2, u'KY': 1, u'OR': 1, u'SD': 1})
How do I get the keys of this dictionary whose values are more than 1?
If I do [df[val] for val in df if df[val]>1]
I get the output [2, 2, 2]
If I print [df.keys() for val in df if df[val]>1] I still do not get the keys I want. I need the keys that have values more than 1, like this: ['SC', 'OK', 'NH']
How do I do that?
Reading from a dictionary created using defaultdict() is the same as a normal dict.
To get the keys which have values > 1, you would do:
my_dict = defaultdict(...)
print [key for key, value in my_dict.iteritems() if value > 1]
If you're using Python 3, use my_dict.items() instead and call print as a function.
We can use a list comprehension.
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> d['HI'] = 1
>>> d['NH'] = 2
>>> d['WY'] = 1
>>> d['OK'] = 2
>>> [i[0] for i in d.items() if i[1]>1]
['NH', 'OK']
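For Python 3, the same filter can be written with tuple unpacking instead of indexing into each item. This is a sketch with a shortened version of the data from the question; keys_over_one is an illustrative name, and the output order shown assumes Python 3.7+, where dictionaries keep insertion order:

```python
from collections import defaultdict

df = defaultdict(int, {'DE': 1, 'NH': 2, 'OK': 2, 'SC': 2, 'WY': 1})

# items() yields (key, value) pairs; keep the keys whose value exceeds 1.
keys_over_one = [key for key, value in df.items() if value > 1]
print(keys_over_one)  # ['NH', 'OK', 'SC']
```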

Word frequency using dictionary

My problem is that I can't figure out how to use a dictionary to display how many words there are of each length. For example, consider the following piece of text:
"This is the sample text to get an idea!. "
Then the required output would be
3 2
2 3
0 5
as there are 3 words of length 2, 2 words of length 3, and 0 words of length 5 in the
given sample text.
I got as far as displaying the word occurrence frequencies:
def word_frequency(filename):
    word_count_list = []
    word_freq = {}
    text = open(filename, "r").read().lower().split()
    word_freq = [text.count(p) for p in text]
    dictionary = dict(zip(text,word_freq))
    return dictionary
print word_frequency("text.txt")
which displays the dict in this format:
{'all': 3, 'show': 1, 'welcomed': 1, 'not': 2, 'availability': 1, 'television,': 1, '28': 1, 'to': 11, 'has': 2, 'ehealth,': 1, 'do': 1, 'get': 1, 'they': 1, 'milestone': 1, 'kroes,': 1, 'now': 3, 'bringing': 2, 'eu.': 1, 'like': 1, 'states.': 1, 'them.': 1, 'european': 2, 'essential': 1, 'available': 4, 'because': 2, 'people': 3, 'generation': 1, 'economic': 1, '99.4%': 1, 'are': 3, 'eu': 1, 'achievement,': 1, 'said': 3, 'for': 3, 'broadband': 7, 'networks': 2, 'access': 2, 'internet': 1, 'across': 2, 'europe': 1, 'subscriptions': 1, 'million': 1, 'target.': 1, '2020,': 1, 'news': 1, 'neelie': 1, 'by': 1, 'improve': 1, 'fixed': 2, 'of': 8, '100%': 1, '30': 1, 'affordable': 1, 'union,': 2, 'countries.': 1, 'products': 1, 'or': 3, 'speeds': 1, 'cars."': 1, 'via': 1, 'reached': 1, 'cloud': 1, 'from': 1, 'needed': 1, '50%': 1, 'been': 1, 'next': 2, 'households': 3, 'commission': 5, 'live': 1, 'basic': 1, 'was': 1, 'said:': 1, 'more': 1, 'higher.': 1, '30mbps': 2, 'that': 4, 'but': 2, 'aware': 1, '50mbps': 1, 'line': 1, 'statement,': 1, 'with': 2, 'population': 1, "europe's": 1, 'target': 1, 'these': 1, 'reliable': 1, 'work': 1, '96%': 1, 'can': 1, 'ms': 1, 'many': 1, 'further.': 1, 'and': 6, 'computing': 1, 'is': 4, 'it': 2, 'according': 1, 'have': 2, 'in': 5, 'claimed': 1, 'their': 1, 'respective': 1, 'kroes': 1, 'areas.': 1, 'responsible': 1, 'isolated': 1, 'member': 1, '100mbps': 1, 'digital': 2, 'figures': 1, 'out': 1, 'higher': 1, 'development': 1, 'satellite': 4, 'who': 1, 'connected': 2, 'coverage': 2, 'services': 2, 'president': 1, 'a': 1, 'vice': 1, 'mobile': 2, "commission's": 1, 'points': 1, '"access': 1, 'rural': 1, 'the': 16, 'agenda,': 1, 'having': 1}
def freqCounter(infilepath):
    answer = {}
    with open(infilepath) as infile:
        for line in infile:  # iterate over the file's lines, not the characters of the path
            for word in line.strip().split():
                l = len(word)
                if l not in answer:
                    answer[l] = 0
                answer[l] += 1
    return answer
Alternatively:
import collections
def freqCounter(infilepath):
    with open(infilepath) as infile:
        return collections.Counter(len(word) for line in infile for word in line.strip().split())
Use collections.Counter
import collections
sentence = "This is the sample text to get an idea"
Count = collections.Counter([len(a) for a in sentence.split()])
print(Count)
To count how many words in a text have given lengths: size -> frequency distribution, you could use a regular expression to extract words:
#!/usr/bin/env python3
import re
from collections import Counter
text = "This is the sample text to get an idea!. "
words = re.findall(r'\w+', text.casefold())
frequencies = Counter(map(len, words)).most_common()
print("\n".join(["%d word(s) of length %d" % (n, length)
                 for length, n in frequencies]))
Output
3 word(s) of length 2
3 word(s) of length 4
2 word(s) of length 3
1 word(s) of length 6
Note: Unlike .split()-based solutions, it automatically ignores punctuation such as the !. after 'idea'.
To read words from a file, you could read lines and extract words from them in the same way as is done for text in the first code example:
from itertools import chain
with open(filename) as file:
    words = chain.from_iterable(re.findall(r'\w+', line.casefold())
                                for line in file)
    # use words here.. (the same as above)
    frequencies = Counter(map(len, words)).most_common()
print("\n".join(["%d word(s) of length %d" % (n, length)
                 for length, n in frequencies]))
In practice, you could use a list to find the length frequency distribution if you ignore words that are longer than a threshold:
def count_lengths(words, maxlen=100):
    frequencies = [0] * (maxlen + 1)
    for length in map(len, words):
        if length <= maxlen:
            frequencies[length] += 1
    return frequencies
Example
import re
text = "This is the sample text to get an idea!. "
words = re.findall(r'\w+', text.casefold())
frequencies = count_lengths(words)
print("\n".join(["%d word(s) of length %d" % (n, length)
                 for length, n in enumerate(frequencies) if n > 0]))
Output
3 word(s) of length 2
2 word(s) of length 3
3 word(s) of length 4
1 word(s) of length 6
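None of the outputs above matches the exact "count length" format from the question, which also includes a length with zero words. A minimal sketch of one way to produce it, assuming the lengths of interest (2, 3 and 5) are known in advance; it relies on the fact that a Counter returns 0 for a key it has never seen (length_counts is an illustrative name):

```python
import re
from collections import Counter

text = "This is the sample text to get an idea!. "
length_counts = Counter(len(word) for word in re.findall(r'\w+', text))

# Print "<number of words> <length>" for each requested length.
for length in (2, 3, 5):
    print(length_counts[length], length)
# 3 2
# 2 3
# 0 5
```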
