I am trying to put a list into a dictionary and count the number of occurrences of each word in the list. The one thing I don't understand is that when I use the update function, it takes x itself as the dictionary key, whereas I want the key to be the current value of x from list_. I am new to Python, so any advice is appreciated. Thanks
list_ = ["hello", "there", "friend", "hello"]
d = {}
for x in list_:
    d.update(x = list_.count(x))
Use a Counter object if you want a simple way of converting a list of items to a dictionary that maps list_entry: number_of_occurrences.
>>> from collections import Counter
>>> words = ['hello', 'there', 'friend', 'hello']
>>> c = Counter(words)
>>> print(c)
Counter({'hello': 2, 'there': 1, 'friend': 1})
>>> print(dict(c))
{'there': 1, 'hello': 2, 'friend': 1}
An option would be using dictionary comprehension with list.count() like this:
list_ = ["hello", "there", "friend", "hello"]
d = {item: list_.count(item) for item in list_}
Output:
>>> d
{'hello': 2, 'there': 1, 'friend': 1}
But the best option should be collections.Counter(), as used in @AK47's solution, since list.count() rescans the whole list for every item.
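On the original question of why `d.update(x=...)` keeps writing to the key `'x'`: in a call, the name to the left of `=` is a keyword-argument name and is taken literally, so the loop variable's value is never used as the key. A minimal sketch of the difference, reusing the question's `list_` (the names `wrong` and `right` are only for illustration):

```python
list_ = ["hello", "there", "friend", "hello"]

# Keyword form: the key is always the literal string 'x'.
wrong = {}
for x in list_:
    wrong.update(x=list_.count(x))

# Subscript form: the key is the current value of x.
right = {}
for x in list_:
    right[x] = list_.count(x)  # same effect as right.update({x: list_.count(x)})

print(wrong)  # {'x': 2}
print(right)  # {'hello': 2, 'there': 1, 'friend': 1}
```

Passing a one-item dictionary, `d.update({x: list_.count(x)})`, would work as well, because then `x` is evaluated as an expression.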
Related
This is the question - '''Given a list of strings, create a dictionary, where keys are
the strings and the values are the number of times the corresponding string
repeats in the list. Do not use any additional libraries.
Example:
>>> problem1(['Hello', 'Hello', 'Friends', 'Friends', 'Friends', 'Home'])
{'Hello': 2, 'Friends': 3, 'Home': 1}
This is my code:
def problem2(mystrings):
    mydct = {}
    for x in mystrings:
        mydct = [mystrings]
    for in mydct:
        return mydct
print (problem2(['Hello', 'Hello', 'Friends', 'Friends', 'Friends', 'Home']))
I need help on how to make the keys into strings and the values into the number of times the string shows up, I can't figure out how to do it. Any help is appreciated.
You could simply use collections.Counter
>>> l = ['Hello', 'Hello', 'Friends', 'Friends', 'Friends', 'Home']
>>> c = Counter(l)
>>> c
Counter({'Friends': 3, 'Hello': 2, 'Home': 1})
>>> dict(c)
{'Hello': 2, 'Friends': 3, 'Home': 1}
So your function should look like this:
from collections import Counter
def problem2(mystrings):
    return dict(Counter(mystrings))
If for any reason you are not allowed to use collections.Counter (even though it is part of Python's standard library), do the following:
def problem2(mystrings):
    counter = {}
    for word in mystrings:
        # get() returns 0 for a word that has not been seen yet
        counter[word] = counter.get(word, 0) + 1
    return counter
words = ['Hello', 'Hello', 'Friends', 'Friends', 'Friends', 'Home']
result = {}
for word in words:
    if word not in result:
        result[word] = 1
    else:
        result[word] += 1
print(result) # {'Hello': 2, 'Friends': 3, 'Home': 1}
Note that we first want to check whether there's a key already in the dictionary because if we just run result[word] += 1 we'll get a KeyError since there's no key in the dictionary with that name.
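To make the KeyError point concrete, here is a small sketch (not from the original answer) that first triggers the error on an empty dictionary and then counts using try/except instead of the membership test. The flag name `raised` is just an illustrative choice.

```python
words = ['Hello', 'Hello', 'Friends', 'Friends', 'Friends', 'Home']

# Incrementing a key that does not exist yet raises KeyError.
result = {}
try:
    result['Hello'] += 1
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# Alternative to the "if word not in result" check:
# let the KeyError signal that the word is new.
for word in words:
    try:
        result[word] += 1
    except KeyError:
        result[word] = 1
print(result)  # {'Hello': 2, 'Friends': 3, 'Home': 1}
```

Both styles need no extra libraries; which one reads better is largely a matter of taste.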
I have this list:
list1 = [{'Hello'}, {'Welcome'}, {'BYE'}]
And I have a dictionary:
dict1 = {'Welcome': 5, 'BYE': 3, 'How are you': 3}
I would like the result to be something like:
dict2 = {'Welcome': 5, 'BYE': 3}
According to this post.
I tried:
dict2 = {k: dict1[k] for k in (dict1.keys() & list1)}
But it says:
TypeError: unhashable type: 'set'
Do I need first to make list1, like this:
list1 = ['Hello', 'Welcome', 'BYE']
And if this is the problem, then how?
You can build a single set of values from list1:
list1 = [{"Hello"}, {"Welcome"}, {"BYE"}]
dict1 = {"Welcome": 5, "BYE": 3, "How are you": 3}
dict2 = {k: dict1[k] for k in (dict1.keys() & {v for s in list1 for v in s})}
print(dict2)
Prints:
{'BYE': 3, 'Welcome': 5}
If you are required to start with that weird single-item-set-in-list list, I'd flatten it with itertools.chain.from_iterable into a simple list. From there, with a simple dictionary comprehension, you create a new dictionary for each key from list1 that exists in dict1:
>>> from itertools import chain
>>> list1 = [{'Hello'}, {'Welcome'}, {'BYE'}]
>>> dict1 = {'Welcome': 5, 'BYE': 3, 'How are you': 3}
>>> list(chain.from_iterable(list1))
['Hello', 'Welcome', 'BYE']
>>> {k: dict1[k] for k in chain.from_iterable(list1) if k in dict1}
{'Welcome': 5, 'BYE': 3}
Yes, your variable list1 is a list of sets. You might do this:
dict2 = {k: dict1[k] for k in set(dict1.keys()).intersection(set().union(*list1))}
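As a rough illustration of where the TypeError comes from: each element of list1 is itself a set, and sets are unhashable, so they cannot be matched against dictionary keys. The sketch below reproduces the failure and then flattens the sets first; `failed` and `wanted` are names made up for this example.

```python
list1 = [{'Hello'}, {'Welcome'}, {'BYE'}]
dict1 = {'Welcome': 5, 'BYE': 3, 'How are you': 3}

# The elements of list1 are sets, which cannot be hashed,
# so intersecting them with the dictionary keys fails.
try:
    dict1.keys() & list1
    failed = False
except TypeError:
    failed = True
print(failed)  # True

# Flatten the one-element sets into a single set of strings first.
wanted = set().union(*list1)
dict2 = {k: dict1[k] for k in dict1.keys() & wanted}
print(dict2 == {'Welcome': 5, 'BYE': 3})  # True
```

If list1 were the plain list ['Hello', 'Welcome', 'BYE'], the original expression `dict1.keys() & list1` would work unchanged.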
I'm being warned that this question has been frequently downvoted, but I haven't seen a solution for my particular problem.
I have a dictionary that looks like this:
d = {'a': [['I', 'said', 'that'], ['said', 'I']],
     'b': [['she', 'is'], ['he', 'was']]}
I would like the output to be a dictionary with the original keys, each mapped to a dictionary giving the count for each of the words (e.g., {'a': {'I': 2, 'said': 2, 'that': 1}}, and so on for b).
If the values were in a list instead of a sublist, I could get what I wanted just by using Counter:
d2 = {'a': ['I','said','that', 'I'],'b': ['she','was','here']}
from collections import Counter
counts = {k: Counter(v) for k, v in d2.items()}
However, with my nested data I'm getting TypeError: unhashable type: 'list', because the items Counter is asked to count are themselves lists (the sublists), and lists aren't hashable.
I also know that if I just had sublists, I could get what I want with something like:
lst = [['I', 'said', 'that'], ['said', 'I']]
Counter(word for sublist in lst for word in sublist)
But I just can't figure out how to combine these ideas to solve my problem (and I guess it lies in combining these two).
I did try this
for key, values in d.items():
    flat_list = [item for sublist in values for item in sublist]
    new_dict = {key: flat_list}
counts = {k: Counter(v) for k, v in new_dict.items()}
But that only gives me the counts for the second key, because after the loop new_dict only holds the flat_list for the last key.
To combine the two solutions, just replace Counter(v) from your first solution with the second solution.
from collections import Counter
d = {'a': [['I', 'said', 'that'], ['said', 'I']],
     'b': [['she', 'is'], ['he', 'was']]}
counts = {k: Counter(word
                     for sublist in lst
                     for word in sublist)
          for k, lst in d.items()}
print(counts)
Output:
{'a': Counter({'I': 2, 'said': 2, 'that': 1}),
'b': Counter({'she': 1, 'is': 1, 'he': 1, 'was': 1})}
You can merge your sublists to get your d2:
d2 = {k: reduce(list.__add__, d[k], []) for k in d}
In Python 3, you will first need to run from functools import reduce.
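A runnable sketch of the reduce idea, combined with the question's own Counter comprehension. It assumes Python 3, hence the explicit import.

```python
from collections import Counter
from functools import reduce  # a builtin in Python 2, imported in Python 3

d = {'a': [['I', 'said', 'that'], ['said', 'I']],
     'b': [['she', 'is'], ['he', 'was']]}

# Concatenate each key's sublists into one flat list, starting from [].
d2 = {k: reduce(list.__add__, d[k], []) for k in d}
counts = {k: Counter(v) for k, v in d2.items()}

print(d2['a'])      # ['I', 'said', 'that', 'said', 'I']
print(counts['a'])  # Counter({'I': 2, 'said': 2, 'that': 1})
```

Repeated list concatenation copies the growing list at every step, so for many or long sublists itertools.chain.from_iterable is probably the cheaper way to flatten.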
Use both itertools and collections modules for this. Flatten the nested lists with itertools.chain and count with collections.Counter
import itertools, collections
d = {
    'a': [['I', 'said', 'that'], ['said', 'I']],
    'b': [['she', 'is'], ['he', 'was']]
}
out_dict = {}
for d_key, data in d.items():
    # chain(*data) yields the words of every sublist in turn
    counter = collections.Counter(itertools.chain(*data))
    out_dict[d_key] = counter
print(out_dict)
Output:
{'a': Counter({'I': 2, 'said': 2, 'that': 1}),
'b': Counter({'she': 1, 'is': 1, 'he': 1, 'was': 1})}
I am trying to solve a difficult problem and am getting lost.
Here's what I'm supposed to do:
INPUT: file
OUTPUT: dictionary
Return a dictionary whose keys are all the words in the file (broken by
whitespace). The value for each word is a dictionary containing each word
that can follow the key and a count for the number of times it follows it.
You should lowercase everything.
Use strip and string.punctuation to strip the punctuation from the words.
Example:
>>> #example.txt is a file containing: "The cat chased the dog."
>>> with open('../data/example.txt') as f:
... word_counts(f)
{'the': {'dog': 1, 'cat': 1}, 'chased': {'the': 1}, 'cat': {'chased': 1}}
Here's all I have so far, in trying to at least pull out the correct words:
def word_counts(f):
    i = 0
    orgwordlist = f.split()
    for word in orgwordlist:
        if i<len(orgwordlist)-1:
            print orgwordlist[i]
            print orgwordlist[i+1]
with open('../data/example.txt') as f:
    word_counts(f)
I'm thinking I need to somehow use the .count method and eventually zip some dictionaries together, but I'm not sure how to count the second words for each first word.
I know I'm nowhere near solving the problem, but trying to take it one step at a time. Any help is appreciated, even just tips pointing in the right direction.
We can solve this in two passes:
in a first pass, we construct a Counter and count the tuples of two consecutive words using zip(..); and
then we turn that Counter into a dictionary of dictionaries.
This results in the following code:
from collections import Counter, defaultdict
def word_counts(f):
    st = f.read().lower().split()
    ctr = Counter(zip(st,st[1:]))
    dc = defaultdict(dict)
    for (k1,k2),v in ctr.items():
        dc[k1][k2] = v
    return dict(dc)
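The code above lowercases and splits but does not strip punctuation, so "dog." would stay as is. A hedged variant of the same two-pass idea that also applies strip with string.punctuation, as the assignment asks, might look like this. Here io.StringIO stands in for the opened example.txt, and the variable names are illustrative.

```python
import io
import string
from collections import Counter, defaultdict

def word_counts(f):
    # Lowercase, split on whitespace, strip surrounding punctuation.
    words = [w.strip(string.punctuation) for w in f.read().lower().split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    # First pass: count each pair of consecutive words.
    pair_counts = Counter(zip(words, words[1:]))
    # Second pass: regroup the pair counts by their first word.
    followers = defaultdict(dict)
    for (first, second), n in pair_counts.items():
        followers[first][second] = n
    return dict(followers)

# io.StringIO is used here in place of a real file object.
example = io.StringIO("The cat chased the dog.")
print(word_counts(example))
# {'the': {'cat': 1, 'dog': 1}, 'cat': {'chased': 1}, 'chased': {'the': 1}}
```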
We can do this in one pass:
Use a defaultdict as a counter.
Iterate over bigrams, counting in-place
So... For the sake of brevity, we'll leave the normalization and cleaning out:
>>> from collections import defaultdict
>>> counter = defaultdict(lambda: defaultdict(int))
>>> s = 'the dog chased the cat'
>>> tokens = s.split()
>>> from itertools import islice
>>> for a, b in zip(tokens, islice(tokens, 1, None)):
...     counter[a][b] += 1
...
>>> counter
defaultdict(<function <lambda> at 0x102078950>, {'the': defaultdict(<class 'int'>, {'cat': 1, 'dog': 1}), 'dog': defaultdict(<class 'int'>, {'chased': 1}), 'chased': defaultdict(<class 'int'>, {'the': 1})})
And a more readable output:
>>> {k:dict(v) for k,v in counter.items()}
{'the': {'cat': 1, 'dog': 1}, 'dog': {'chased': 1}, 'chased': {'the': 1}}
>>>
Firstly that is some brave cat who chased a dog! Secondly it is a little tricky because we don't interact with this type of parsing every day. Here's the code:
k = "The cat chased the dog."
sp = k.split()
res = {}
prev = ''
for w in sp:
    word = w.lower().replace('.', '')
    if prev in res:
        if word.lower() in res[prev]:
            res[prev][word] += 1
        else:
            res[prev][word] = 1
    elif not prev == '':
        # first time prev has been seen as a leading word
        res[prev] = {word: 1}
    prev = word
print(res)
You could:
Create a list of stripped words;
Create word pairs with either zip(list_, list_[1:]) or any method that iterates by pairs;
Create a dict of first words in the pair followed by a list of the second word of the pair;
Count the words in the list.
Like so:
from collections import Counter
s="The cat chased the dog."
li=[w.lower().strip('.,') for w in s.split()] # list of the words
di={}
for a,b in zip(li,li[1:]):                        # words by pairs
    di.setdefault(a,[]).append(b)                 # list of the words following first
di={k:dict(Counter(v)) for k,v in di.items()} # count the words
>>> di
{'the': {'dog': 1, 'cat': 1}, 'chased': {'the': 1}, 'cat': {'chased': 1}}
If you have a file, just read from the file into a string and proceed.
Alternatively, you could:
Follow the same first two steps;
Use a defaultdict with a Counter as a factory.
Like so:
from collections import Counter, defaultdict
li=[w.lower().strip('.,') for w in s.split()]
dd=defaultdict(Counter)
for a,b in zip(li, li[1:]):
    dd[a][b]+=1
>>> dict(dd)
{'the': Counter({'dog': 1, 'cat': 1}), 'chased': Counter({'the': 1}), 'cat': Counter({'chased': 1})}
Or,
>>> {k:dict(v) for k,v in dd.items()}
{'the': {'dog': 1, 'cat': 1}, 'chased': {'the': 1}, 'cat': {'chased': 1}}
I think this is a one pass solution without importing defaultdict. Also it has punctuation stripping. I have tried to optimize it for large files or repeated opening of files
from itertools import islice
class defaultdictint(dict):
    def __missing__(self,k):
        r = self[k] = 0
        return r
class defaultdictdict(dict):
    def __missing__(self,k):
        r = self[k] = defaultdictint()
        return r
# characters to keep; everything else (punctuation, newlines) is dropped
keep = set('1234567890abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ')
def count_words(file):
    d = defaultdictdict()
    with open(file,"r") as f:
        for line in f:
            line = ''.join(filter(keep.__contains__,line)).strip().lower().split()
            for one,two in zip(line,islice(line,1,None)):
                d[one][two] += 1
    return d
print (count_words("example.txt"))
will output:
{'chased': {'the': 1}, 'cat': {'chased': 1}, 'the': {'dog': 1, 'cat': 1}}
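One thing to keep in mind with a line-by-line loop like this: pairs are only formed within a single line, so a word at the end of one line is never counted as being followed by the first word of the next line. If that matters for your file, a possible sketch (my own variation, not the answer's code) is to carry the previous word across lines. The function name and the use of io.StringIO as a stand-in for a file are assumptions made for the example.

```python
import io
import string
from collections import defaultdict

def count_words_across_lines(lines):
    counts = defaultdict(lambda: defaultdict(int))
    prev = None  # last word seen, possibly on an earlier line
    for line in lines:
        for raw in line.lower().split():
            word = raw.strip(string.punctuation)
            if not word:
                continue  # token was only punctuation
            if prev is not None:
                counts[prev][word] += 1
            prev = word
    # convert the nested defaultdicts to plain dicts for display
    return {k: dict(v) for k, v in counts.items()}

# The sentence is split over two lines on purpose.
text = io.StringIO("The cat chased\nthe dog.\n")
print(count_words_across_lines(text))
# {'the': {'cat': 1, 'dog': 1}, 'cat': {'chased': 1}, 'chased': {'the': 1}}
```

This still reads the file one line at a time, so it should remain suitable for large files.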
I have a list of strings. I want to create a dictionary whose keys are all the strings in the list (each string is a key, of course).
Now for the values: the value for each key is the string that comes right after the key string in the list. The values will be of list type.
Remarks: The last word won't be included in the dictionary.
A key won't appear twice in the dictionary. If there is more than one value for a certain key, they will be added to that key's existing list of values.
The order doesn't matter (the dictionary need not be sorted if that makes the job easier).
Example:
for the list:
List = ['today','is','worm','and','dry']
the dictionary will be:
Dic={'today': ['is'], 'is': ['worm'],'worm': ['and'], 'and':['dry']}
Thanks,
l = ['today','is','worm','and','dry']
d = {}
for w1, w2 in zip(l, l[1:]):
    d.setdefault(w1, []).append(w2)
# d == {'and': ['dry'], 'is': ['worm'], 'today': ['is'], 'worm': ['and']}
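The example list has no repeated words, so it does not show the "add to the existing list" requirement. A small sketch with a made-up input that repeats a word, using collections.defaultdict(list) as an equivalent of the setdefault call:

```python
from collections import defaultdict

# Hypothetical input in which 'today' occurs twice.
words = ['today', 'is', 'dry', 'and', 'today', 'was', 'wet']

followers = defaultdict(list)
for w1, w2 in zip(words, words[1:]):
    followers[w1].append(w2)  # a repeated key keeps accumulating values

print(dict(followers))
# {'today': ['is', 'was'], 'is': ['dry'], 'dry': ['and'], 'and': ['today'], 'was': ['wet']}
```

The last word, 'wet', never becomes a key, which matches the remark in the question.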
Not very elegant, but it works (note that the values here are plain strings, not lists):
>>> List = ['today','is','worm','and','dry']
>>> Dic ={}
>>> key = None
>>> for item in List:
...     if key:
...         Dic.update({key:item})
...     key=item
...
>>> Dic
{'and': 'dry', 'is': 'worm', 'worm': 'and', 'today': 'is'}
>>>
(based on @eumiro's answer)
>>> l = ['today','is','worm','and','dry']
>>> print(dict(zip(l, l[1:])))
{'today': 'is', 'is': 'worm', 'worm': 'and', 'and': 'dry'}
>>> print(dict(zip(l, l[1:] + [None])))
{'today': 'is', 'is': 'worm', 'worm': 'and', 'and': 'dry', 'dry': None}
>>> print(dict((k, [v]) for (k, v) in zip(l, l[1:] + [None])))
{'today': ['is'], 'is': ['worm'], 'worm': ['and'], 'and': ['dry'], 'dry': [None]}
Try this:
lst = ['today','is','worm','and','dry']
dic = {}
for k in range(len(lst) - 1):
    dic[lst[k]] = [lst[k+1]]
It's an efficient answer, since it doesn't create any additional lists for building the dictionary. You can check the result, it's what was expected:
dic == {'today': ['is'], 'is': ['worm'],'worm': ['and'], 'and':['dry']}
> True
You can achieve it by the following code:
Dic = dict((key, [item]) for key, item in zip(List, List[1:]))
and the result will be the following:
>>> Dic
{'and': ['dry'], 'is': ['worm'], 'worm': ['and'], 'today': ['is']}
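If you are on Python 2 and need a more performant solution, you can use the izip() function:
from itertools import izip
Dic = dict((key, [item]) for key, item in izip(List, List[1:]))
which gives the same result but avoids building an intermediate list of pairs. In Python 3, itertools.izip no longer exists, because the built-in zip() is already lazy.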
But please follow Python naming conventions (e.g. call your variables my_dict and my_list, or preferably give them more meaningful names) and do not shadow the built-in names list and dict (in case you were tempted to give your variables such names).
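To illustrate the naming advice, here is a short sketch with descriptive lowercase names, followed by a demonstration of what goes wrong when a variable is called list. The names `words`, `next_words` and `shadowed` are only suggestions for this example.

```python
words = ['today', 'is', 'worm', 'and', 'dry']
next_words = {key: [item] for key, item in zip(words, words[1:])}
print(next_words)
# {'today': ['is'], 'is': ['worm'], 'worm': ['and'], 'and': ['dry']}

def shadowed():
    list = ['today', 'is']  # this local name hides the built-in list type
    try:
        list('abc')         # now tries to call a list object
    except TypeError:
        return True
    return False

print(shadowed())  # True
```

The shadowing is kept inside a function here so that the built-in list stays usable in the rest of the script.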