How to count words by grouping similar words in Python?

I have a list:
list_1 = ['warning', 'media', 'media-other', 'media-other', 'warning-type2', 'threat', 'threat-type1']
I need to count the occurrences of each type, starting from the following dictionary:
dict_1 = {'warning': 0, 'media': 0, 'threat': 0}
Similar types should be grouped under one key and counted together: media and media-other should both count as media, and warning and warning-type2 should both count as warning.
After counting, dict_1 should be {'warning': 2, 'media': 3, 'threat': 2}.

list_1 = ['warning', 'media', 'media-other','media-other','warning-type2','threat','threat-type1']
list_2 = [x.split('-')[0] for x in list_1]
dict_1 = {}
for key in list_2:
    if key not in dict_1:
        dict_1[key] = list_2.count(key)
print(dict_1)

Assuming the part before any hyphen gives you the 'type' of the items in the list, you can use split and collections.Counter to count them:
from collections import Counter
Counter(word.split("-")[0] for word in list_1)
# returns Counter({'warning': 2, 'media': 3, 'threat': 2})
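If the result has to keep exactly the keys of the predefined dict_1 from the question (including any type that never appears), one possible sketch is to count with Counter and then copy the counts into dict_1. This assumes, as above, that the text before the first hyphen is the type.

```python
from collections import Counter

list_1 = ['warning', 'media', 'media-other', 'media-other',
          'warning-type2', 'threat', 'threat-type1']

# Predefined categories from the question; each starts at zero.
dict_1 = {'warning': 0, 'media': 0, 'threat': 0}

# Count by the part before the first hyphen.
counts = Counter(word.split('-')[0] for word in list_1)

# Copy counts only for the predefined keys; a Counter returns 0 for
# missing keys, so categories that never occur stay at 0.
for key in dict_1:
    dict_1[key] = counts[key]

print(dict_1)  # {'warning': 2, 'media': 3, 'threat': 2}
```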

Related

compare a list with values in dictionary

I have a dictionary containing lists of values, and a list:
dict1={'first':['hi','nice'], 'second':['night','moon']}
list1= [ 'nice','moon','hi']
I want to compare the values in the dictionary with list1 and, for each key, count how many of its values appear in the list.
The output should look like this:
first 2
second 1
Here is my code:
count = 0
for list_item in list1:
    for dict_v in dict1.values():
        if list_item.split() == dict_v:
            count+= 1
            print(dict.keys,count)
Any help? Thanks in advance.

I would make a set out of list1 for the O(1) lookup time and access to the intersection method. Then employ a dict comprehension.
>>> dict1={'first':['hi','nice'], 'second':['night','moon']}
>>> list1= [ 'nice','moon','hi']
>>>
>>> set1 = set(list1)
>>> {k:len(set1.intersection(v)) for k, v in dict1.items()}
{'first': 2, 'second': 1}
intersection accepts any iterable argument, so creating sets from the values of dict1 is not necessary.

You can use the following dict comprehension:
{k: sum(1 for i in l if i in list1) for k, l in dict1.items()}
Given your sample input, this returns:
{'first': 2, 'second': 1}

You can get the intersection of your list and the values of dict1 using sets:
for key in dict1.keys():
    count = len(set(dict1[key]) & set(list1))
    print("{0}: {1}".format(key,count))

While brevity can be great, I thought it would be good to also provide an example that stays as close to the OP's original code as possible:
# notice conversion to set for O(1) lookup
# instead of O(n) lookup where n is the size of the list of desired items
dict1={'first':['hi','nice'], 'second':['night','moon']}
set1= set([ 'nice','moon','hi'])
for key, values in dict1.items():
    counter = 0
    for val in values:
        if val in set1:
            counter += 1
    print(key, counter)

Using collections.Counter
from collections import Counter
c = Counter(k for k in dict1 for i in list1 if i in dict1[k])
# Counter({'first': 2, 'second': 1})
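The answers above agree on the sample input, but they are not interchangeable when list1 contains repeated items. The sketch below uses a modified list1 (an assumption made for illustration, not the asker's data) to show the difference: the set-based version counts distinct matches per key, while the Counter version counts every matching item of list1.

```python
from collections import Counter

dict1 = {'first': ['hi', 'nice'], 'second': ['night', 'moon']}
list1 = ['hi', 'hi', 'moon']  # hypothetical input with a duplicate

# Set intersection: each distinct value is counted at most once.
distinct = {k: len(set(list1).intersection(v)) for k, v in dict1.items()}

# Counter over list1 items: a repeated item is counted every time.
with_repeats = Counter(k for k in dict1 for i in list1 if i in dict1[k])

print(distinct)            # {'first': 1, 'second': 1}
print(dict(with_repeats))  # {'first': 2, 'second': 1}
```

Which behaviour is wanted depends on whether duplicates in list1 should count once or several times; the question does not say.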

The simplest and most basic approach would be:
dict1={'first':['hi','nice'], 'second':['night','moon']}
list1= [ 'nice','moon','hi']
listkeys=list(dict1.keys())
listvalues=list(dict1.values())
for i in range(0,len(listvalues)):
    ctr=0
    for j in range(0,len(listvalues[i])):
        for k in range(0,len(list1)):
            if list1[k]==listvalues[i][j]:
                ctr+=1
    print(listkeys[i],ctr)
Hope it helps.

How do I represent a dictionary from a list assuming that every other number by its side is its value?

I have a list that looks like this,
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
In this list, each number after the word represents the value of the word. I want to represent this list in a dictionary such that the value of each repeated word gets added. I want the dictionary to be like this:
dict = {'hello':'2', 'go':'14', 'sit':'6','line':'3','play':'0'}
In the list, 'go' occurs twice with two different values, so we add the numbers that occur just after the word; the same applies to the other words.
This is my approach, but it does not seem to work.
import csv
with open('teest.txt', 'rb') as input:
    count = {}
    my_file = input.read()
    listt = my_file.split()
    i = i + 2
    for i in range(len(listt)-1):
        if listt[i] in count:
            count[listt[i]] = count[listt[i]] + listt[i+1]
        else:
            count[listt[i]] = listt[i+1]

Counting occurrences of unique keys is usually possible with defaultdict.
import collections as ct
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
dd = ct.defaultdict(int)
iterable = iter(lista)
for word in iterable:
    dd[word] += int(next(iterable))
dd
# defaultdict(int, {'go': 14, 'hello': 2, 'line': 3, 'play': 0, 'sit': 6})
Here we initialize the defaultdict with int, so missing keys start at 0. We make an iterator over the list, which lets us call next() on it. Since each word and its value occur as consecutive pairs in the list, the for loop yields a word and we immediately call next() to pull the value that follows it, keeping the two in sync. We add each value to the entry for its word in the defaultdict, which accumulates the running total.
Convert the integers to strings if this is required:
{k: str(v) for k, v in dd.items()}
# {'go': '14', 'hello': '2', 'line': '3', 'play': '0', 'sit': '6'}
An alternative tool is Counter (see @DexJ's answer), which is related to this type of defaultdict. In fact, Counter() can substitute for defaultdict(int) here and return the same result.
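As a minimal sketch of that substitution, the same loop can be written with Counter in place of defaultdict(int). The names totals and pairs are chosen here for illustration only.

```python
from collections import Counter

lista = ['hello', '2', 'go', '5', 'sit', '4', 'line', '3',
         'sit', '2', 'go', '9', 'play', '0']

totals = Counter()   # missing keys read as 0, like defaultdict(int)
pairs = iter(lista)
for word in pairs:
    # The loop consumed a word; next() consumes the number after it.
    totals[word] += int(next(pairs))

print(dict(totals))
```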

You can "stride" the array 2 items at a time using a range(). The optional 3rd argument in a range lets you define a "skip".
range(start, stop[, step])
Using this, we can create a range of indexes that skip ahead 2 at a time, for the entire length of your list. We can then ask the list what "name" is at that index lista[i] and what "value" is after it lista[i + 1].
new_dict = {}
for i in range(0, len(lista), 2):
    name = lista[i]
    value = lista[i + 1]
    # the name already exists
    # convert their values to numbers, add them, then convert back to a string
    if name in new_dict:
        new_dict[name] = str( int(new_dict[name]) + int(value) )
    # the name doesn't exist
    # simply append it with the value
    else:
        new_dict[name] = value

As explained by @Soviut, you can use the range() function with a step of 2 to land directly on each word. Since the values in your list are stored as strings, I have converted them to integers.
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
data = {}
for i in range(0, len(lista), 2): # step through the indexes 0, 2, 4, ... so i always points at a word
    if lista[i] in data.keys(): # check whether the word is already a key in the dictionary
        data[lista[i]] = int(data[lista[i]]) + int(lista[i+1])
    else:
        data[lista[i]] = int(lista[i+1])
print(data)
Output
{'hello': 2, 'go': 14, 'sit': 6, 'line': 3, 'play': 0}

lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
dictionary = {}
for keyword, value in zip(*[iter(lista)]*2): # iterate two at a time
    if keyword in dictionary: # if the key is present, add to the existing sum
        dictionary[keyword] = dictionary[keyword] + int(value)
    else: # if not present, set the value for the first time
        dictionary[keyword] = int(value)
print(dictionary)
Output:
{'hello': 2, 'go': 14, 'sit': 6, 'line': 3, 'play': 0}
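The zip(*[iter(lista)]*2) expression in the loop above can look cryptic. A small sketch on a shortened list (shortened here only to keep the output readable) shows what it produces and why it works:

```python
lista = ['hello', '2', 'go', '5', 'sit', '4']

# One shared iterator: both arguments to zip() pull from the same stream,
# so each tuple consumes two consecutive items of the list.
it = iter(lista)
pairs = list(zip(it, it))
print(pairs)  # [('hello', '2'), ('go', '5'), ('sit', '4')]

# [iter(lista)] * 2 is a list holding two references to one iterator;
# unpacking it with * gives the same call as zip(it, it).
same_pairs = list(zip(*[iter(lista)] * 2))
```

Note that zip() stops at the shortest input, so a trailing word without a value would be dropped silently.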

Another solution using iter(), itertools.zip_longest() and itertools.groupby() functions:
import itertools
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
it = iter(lista)
d = {k: sum(int(_[1]) for _ in g)
     for k,g in itertools.groupby(sorted(itertools.zip_longest(it, it)), key=lambda x: x[0])}
print(d)
The output:
{'line': 3, 'sit': 6, 'hello': 2, 'play': 0, 'go': 14}

You can use range(start, end, step) to get the end point of each pair, slice the list into pairs, and then use Counter() from collections to sum the values of duplicate keys.
Here yourdict will be {'go': 14, 'line': 3, 'sit': 6, 'play': 0, 'hello': 2}
from collections import Counter
counter_obj = Counter()
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
items, start = [], 0
for end in range(2,len(lista)+2,2):
    items.append(lista[start:end])
    start = end
for item in items:
    counter_obj[item[0]] += int(item[1])
yourdict = dict(counter_obj)
print(yourdict)

Python - Splitting dictionary into dictionaries with the same values?

Say I have a dictionary with many items that have the same values; for example:
dict = {'hello':'a', 'goodbye':'z', 'bonjour':'a', 'au revoir':'z', 'how are you':'m'}
How would I split the dictionary into dictionaries (in this case, three dictionaries) with the same values? In the example, I want to end up with this:
dict1 = {'hello':'a', 'bonjour':'a'}
dict2 = {'goodbye':'z', 'au revoir':'z'}
dict3 = {'how are you':'m'}

You can use itertools.groupby to collect by the common values, then create dict objects for each group within a list comprehension.
>>> from itertools import groupby
>>> import operator
>>> d = {'hello':'a', 'goodbye':'z', 'bonjour':'a', 'au revoir':'z', 'how are you':'m'}
>>> by_value = operator.itemgetter(1)
>>> [dict(g) for k, g in groupby(sorted(d.items(), key = by_value), by_value)]
[{'hello': 'a', 'bonjour': 'a'},
 {'how are you': 'm'},
 {'goodbye': 'z', 'au revoir': 'z'}]

Another way without importing any modules is as follows:
def split_dict(d):
    unique_vals = list(set(d.values()))
    split_dicts = []
    for i in range(len(unique_vals)):
        unique_dict = {}
        for key in d:
            if d[key] == unique_vals[i]:
                unique_dict[key] = d[key]
        split_dicts.append(unique_dict)
    return split_dicts
For each unique value in the input dictionary, we create a dictionary and add the key-value pairs from the input dictionary whose value equals that unique value. We then append each dictionary to a list, which is finally returned.
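The function above rescans the whole dictionary once per unique value. As an alternative sketch (not taken from either answer), a single pass with collections.defaultdict groups the items as they are seen. The name split_by_value is hypothetical.

```python
from collections import defaultdict

def split_by_value(d):
    """Group the items of d into one dict per distinct value (single pass)."""
    groups = defaultdict(dict)
    for key, value in d.items():
        # Each distinct value gets its own sub-dictionary.
        groups[value][key] = value
    return list(groups.values())

d = {'hello': 'a', 'goodbye': 'z', 'bonjour': 'a',
     'au revoir': 'z', 'how are you': 'm'}
result = split_by_value(d)
print(result)
```

This relies on the values being hashable, which holds for the strings in the question.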

dict comprehension with unique keys out of list

I haven't used list comprehensions very often, but I was wondering whether the lines below can be written as a one-liner (yes, the code is already small, but I am curious):
lst = ['hi', 'hello', 'bob', 'hello', 'bob', 'hello']
data = {}
for index in lst:
    data[index] = data.get(index,0) + 1
data would be: {'hi':1, 'hello':3, 'bob':2}
Something like:
d = { ... for index in lst } ????
I have tried some comprehensions but they don't work:
d = { index:key for index in lst if index in d: key = key + 1 else key = 1 }
Thanks in advance.

Simply use collections.Counter
A Counter is a dict subclass for counting hashable objects. It is an
unordered collection where elements are stored as dictionary keys and
their counts are stored as dictionary values. Counts are allowed to be
any integer value including zero or negative counts. The Counter class
is similar to bags or multisets in other languages.
import collections
l = ['hi', 'hello', 'bob', 'hello', 'bob', 'hello']
c = collections.Counter(l)
assert c['hello'] == 3
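Since the question asked specifically for a one-liner that yields a plain dict, here is a short sketch of two options. The dict comprehension is shown only for comparison: it calls lst.count() once per distinct word, so it rescans the list repeatedly, whereas Counter makes a single pass.

```python
from collections import Counter

lst = ['hi', 'hello', 'bob', 'hello', 'bob', 'hello']

# One pass over the list, converted back to an ordinary dict.
data = dict(Counter(lst))

# A literal dict comprehension; simple, but rescans lst for every distinct word.
data_comp = {word: lst.count(word) for word in set(lst)}

print(data)  # {'hi': 1, 'hello': 3, 'bob': 2}
```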

Adding a random key from a dictionary to a list in python 3.3

I'm struggling to figure out what's wrong with my code. I am trying to randomly select a key from several dictionaries, then add that key to the beginning of a list. For example:
import random
list1 = ["list"]
dict1 = {"key1" : "def1", "key2" : "def2", "key3" : "def3"}
dict2 = {"key4" : "def4", "key5" : "def5", "key6" : "def6"}
DICTIONARIES = [dict1, dict2]
value = random.choice(DICTIONARIES)
key = random.choice(list(value.keys()))
list1[:0] = key
print (list1)
What I want is a printed result of ['key5', 'list']. What I get is ['k', 'e', 'y', '5', 'list'].
Any ideas? Is there a better way to pick a random key from multiple dictionaries that will produce the desired result?
Thanks.

I suppose that item variable is the same as list1. If yes, try this:
list1[:0] = [key]
Or, alternatively you may use the list.insert function instead of slice assignment:
list1.insert(0, key)
Your version worked as follows:
Before the assignment: list1 = ['list'], key = 'key5'.
The left side of the assignment refers to the empty slice before 'list' in list1.
The right side refers to the value of key, which is "key5".
"key5" is a sequence of the characters k, e, y and 5.
So with list1[:0] = key, the characters of "key5" are inserted one by one at the front of list1.
With list1[:0] = [key], the sequence [key] (which has a single element equal to "key5") is inserted at the front of list1 instead, which is what we actually want.
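The difference described above can be checked directly. The sketch below fixes key to "key5" instead of choosing it at random, purely so the result is reproducible.

```python
key = "key5"  # fixed here instead of random.choice, for a repeatable result

# Slice assignment iterates over the right-hand side: a string yields characters.
as_string = ["list"]
as_string[:0] = key
print(as_string)   # ['k', 'e', 'y', '5', 'list']

# Wrapping the key in a list makes it a one-element sequence.
as_list = ["list"]
as_list[:0] = [key]
print(as_list)     # ['key5', 'list']

# list.insert() places the object itself at the given index.
inserted = ["list"]
inserted.insert(0, key)
print(inserted)    # ['key5', 'list']
```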

Something like this:
import random
list1 = ["list"]
dict1 = {"key1" : "def1", "key2" : "def2", "key3" : "def3"}
dict2 = {"key4" : "def4", "key5" : "def5", "key6" : "def6"}
all_keys = list(dict1.keys() | dict2.keys()) #creates a list containing all keys
key = random.choice(all_keys)
list1.insert(0,key)
print (list1) #prints e.g. ['key1', 'list'] (the key is chosen at random)

In Python, string objects are iterable, so this line
list1[:0] = key
and this one
list1[:0] = list(key)
are equivalent: both replace the (empty) slice from position 0 to position 0 in list1 with the elements of key. You should use the insert method, or wrap key in a list and assign that.
