Count occurrence of strings in list of lists - python

I want to count the number of times each string occurs in every sublist of a list of lists, and store the results in a list of dictionaries, with one dictionary of counts per sublist.
For example:
list = [['Sam','John','Alex','Sam','Alex'],['Max','Sam','Max']...]
and I want my list of dictionaries to be like:
count_list = [{'Sam':2,'Alex':2,'John':1}, {'Max':2, 'Sam':1}..]
I am iterating through each list to count the number of times each string occurs and adding each result to a dict. But I end up with a different result every time, and not the correct values.
count_list = []
for l in list :
    d = {}
    for str in l:
        if str not in d:
            d[str] = l.count(str)
        count_list.append(d)
Any help would be useful. Thanks.

It would be easier to use collections.Counter() here:
>>> from collections import Counter
>>> lst = [["Sam", "John", "Alex", "Sam", "Alex"], ["Max", "Sam", "Max"]]
>>> list(map(Counter, lst))
[Counter({'Sam': 2, 'Alex': 2, 'John': 1}), Counter({'Max': 2, 'Sam': 1})]
You could also use a list comprehension instead of map() if that's easier to understand:
>>> [Counter(l) for l in lst]
[Counter({'Sam': 2, 'Alex': 2, 'John': 1}), Counter({'Max': 2, 'Sam': 1})]
Note: Counter is a subclass of dict, so you can treat them like normal dictionaries.
You can always cast to dict() if you want to as well:
>>> [dict(Counter(l)) for l in lst]
[{'Sam': 2, 'John': 1, 'Alex': 2}, {'Max': 2, 'Sam': 1}]
You should also not use list as a variable name, since it shadows the builtin function list().

Currently, you are doing the following:
count_list = []
for l in list :
    d = {}
    for str in l:
        if str not in d:
            d[str] = l.count(str)
        count_list.append(d)
Note that you are appending the dictionary for each string in the sub lists, rather than one dictionary per sub list.
Doing the following should address the issue:
count_list = []
for l in list :
    d = {}
    for str in l:
        if str not in d:
            d[str] = l.count(str)
    count_list.append(d)
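As a quick check, the corrected loop (one append per sublist) can be wrapped in a function and compared against collections.Counter. This is only a sketch: the function name count_per_sublist and the variable names are illustrative and do not come from the question.

```python
from collections import Counter

def count_per_sublist(lists):
    """Return one {string: count} dict per sublist."""
    count_list = []
    for sub in lists:
        counts = {}
        for s in sub:
            if s not in counts:
                counts[s] = sub.count(s)
        # Append once per sublist, after the inner loop has finished.
        count_list.append(counts)
    return count_list

lst = [["Sam", "John", "Alex", "Sam", "Alex"], ["Max", "Sam", "Max"]]
result = count_per_sublist(lst)
print(result)
```

Note that list.count() rescans the sublist for every distinct string, whereas Counter counts in a single pass, which probably matters only for long sublists.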

Related

How to create a dictionary out of a list of lists in python?

Let's suppose I have the following list made out of lists
list1 = [['a','b'],['a'],['b','c'],['c','d'],['b'], ['a','d']]
I am wondering if there is a way to convert every element of list1 into a dictionary where all the new dictionaries use the same value for the same key. E.g.: if ['a'] becomes {'a':1} and ['b'] becomes {'b':2}, I would like every key a to have the value 1 and every key b to have the value 2. Therefore, ['a','b'] should turn into {'a':1, 'b':2}.
What I have found so far are ways to create a dictionary out of lists of lists but using the first element as the key and the rest of the list as the value:
Please note that's not what I am interested in.
The result I would want to obtain from list1 is something like:
dict_list1 = [{'a':1,'b':2}, {'a':1}, {'b':2,'c':3}, {'c':3,'d':4}, {'b':2}, {'a':1,'d':4}]
I am not that interested in the values being those particular numbers, but in the number being the same for each distinct key.
You need to declare your mapping first:
mapping = dict(a=1, b=2, c=3, d=4)
Then, you can just use dict comprehension:
[{e: mapping[e] for e in li} for li in list1]
# [{'a': 1, 'b': 2}, {'a': 1}, {'b': 2, 'c': 3}, {'c': 3, 'd': 4}, {'b': 2}, {'a': 1, 'd': 4}]
Using chain and OrderedDict, you can build the mapping automatically:
from itertools import chain
from collections import OrderedDict
list1 = [['a','b'],['a'],['b','c'],['c','d'],['b'], ['a','d']]
# flatten the list of lists so each item gets an index
flat_list = list(chain(*list1))
# remove duplicates while keeping first-seen order
flat_list = list(OrderedDict.fromkeys(flat_list))
mapping = {x:flat_list.index(x)+1 for x in set(flat_list)}
[{e: mapping[e] for e in li} for li in list1]
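The automatic mapping can also be built in a single pass without flattening. The following is a minimal sketch that uses dict.setdefault, a different technique from the chain and OrderedDict version above; the names mapping and dict_list1 follow the thread, the loop variables are illustrative.

```python
list1 = [['a', 'b'], ['a'], ['b', 'c'], ['c', 'd'], ['b'], ['a', 'd']]

mapping = {}
for sub in list1:
    for key in sub:
        # setdefault only assigns when the key is new, so each key keeps
        # the number it received the first time it was seen (1, 2, 3, ...).
        mapping.setdefault(key, len(mapping) + 1)

dict_list1 = [{key: mapping[key] for key in sub} for sub in list1]
print(dict_list1)
```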
Here is an attempt with ord(); it works for both uppercase and lowercase letters, as long as every element is a single ASCII letter:
[{e: ord(e)%32 for e in li} for li in list1]

Python dictionary comprehension: assign value to key, where value is a list

Example:
dictionary = {"key":[5, "string1"], "key2":[2, "string2"], "key3":[3, "string1"]}
After applying this dict comprehension:
another_dictionary = {key:value for (value,key) in dictionary.values()}
The result is like this:
another_dictionary = {"string1": 5, "string2": 2}
In other words, it doesn't sum up integer values under the same key which was a list item.
=================================================================
Desired result:
another_dictionary = {"string1": 8, "string2": 2}
You can use collections.defaultdict for this:
from collections import defaultdict
dictionary = {"key":[5, "string1"], "key2":[2, "string2"], "key3":[3, "string1"]}
d = defaultdict(int)
for num, cat in dictionary.values():
    d[cat] += num
print(d)
defaultdict(<class 'int'>, {'string1': 8, 'string2': 2})
The reason your code does not work is you have not specified any summation or aggregation logic. This will require either some kind of grouping operation or, as here, iterating and adding to relevant items in a new dictionary.
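To make the difference concrete, here is a small sketch contrasting the overwriting comprehension with an explicit accumulation using dict.get. The variable names overwritten and summed are illustrative. Which value survives in the comprehension depends on iteration order; on Python 3.7+, where dicts keep insertion order, the last pair wins, so "string1" ends up as 3 rather than the 5 shown in the question.

```python
dictionary = {"key": [5, "string1"], "key2": [2, "string2"], "key3": [3, "string1"]}

# A dict comprehension assigns each key once per pair, so a later value
# for "string1" replaces the earlier one instead of being added to it.
overwritten = {cat: num for num, cat in dictionary.values()}

# Accumulating explicitly adds to whatever total is already stored.
summed = {}
for num, cat in dictionary.values():
    summed[cat] = summed.get(cat, 0) + num

print(overwritten)
print(summed)
```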
You can also use itertools.groupby:
import itertools
dictionary = {"key":[5, "string1"], "key2":[2, "string2"], "key3":[3, "string1"]}
d = {a: sum(c for _, (c, _) in b)
     for a, b in itertools.groupby(
         sorted(dictionary.items(), key=lambda x: x[-1][-1]),
         key=lambda x: x[-1][-1])}
Output:
{'string2': 2, 'string1': 8}

How do I represent a dictionary from a list assuming that every other number by its side is its value?

I have a list that looks like this,
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
In this list, each number after the word represents the value of the word. I want to represent this list in a dictionary such that the value of each repeated word gets added. I want the dictionary to be like this:
dict = {'hello':'2', 'go':'14', 'sit':'6','line':'3','play':'0'}
In the list, 'go' occurs twice with two different values, so we add the numbers that occur just after the word, and similarly for other words.
This is my approach, it does not seem to work.
import csv
with open('teest.txt', 'rb') as input:
    count = {}
    my_file = input.read()
    listt = my_file.split()
    i = i + 2
    for i in range(len(listt)-1):
        if listt[i] in count:
            count[listt[i]] = count[listt[i]] + listt[i+1]
        else:
            count[listt[i]] = listt[i+1]
Accumulating values per unique key is commonly done with a defaultdict.
import collections as ct
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
dd = ct.defaultdict(int)
iterable = iter(lista)
for word in iterable:
    dd[word] += int(next(iterable))
dd
# defaultdict(int, {'go': 14, 'hello': 2, 'line': 3, 'play': 0, 'sit': 6})
Here we initialize the defaultdict with int, so missing keys start at 0. We make a list iterator with iter(), which allows us to call next() on it. Since each word and its value occur as consecutive pairs in the list, we iterate and immediately call next() to pull the matching value in sync. We add each value to the entry for its word in the defaultdict, which keeps the running total.
Convert the integers to strings if this is required:
{k: str(v) for k, v in dd.items()}
# {'go': '14', 'hello': '2', 'line': '3', 'play': '0', 'sit': '6'}
An alternative tool is the Counter (see @DexJ's answer), which is related to this type of defaultdict. In fact, Counter() can substitute for defaultdict(int) here and return the same result.
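As a sketch of that substitution, the same loop with Counter() in place of defaultdict(int) looks like this. The name totals is illustrative, and the code assumes the list has an even number of items, so that every word is followed by a value.

```python
from collections import Counter

lista = ['hello', '2', 'go', '5', 'sit', '4', 'line', '3',
         'sit', '2', 'go', '9', 'play', '0']

totals = Counter()
iterable = iter(lista)
for word in iterable:
    # next() consumes the value that follows the word just read.
    totals[word] += int(next(iterable))

print(dict(totals))
```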
You can "stride" the list 2 items at a time using range(). The optional 3rd argument of range() lets you define a step.
range(start, stop[, step])
Using this, we can create a range of indexes that skip ahead 2 at a time, for the entire length of your list. We can then ask the list what "name" is at that index lista[i] and what "value" is after it lista[i + 1].
new_dict = {}
for i in range(0, len(lista), 2):
    name = lista[i]
    value = lista[i + 1]
    # the name already exists
    # convert their values to numbers, add them, then convert back to a string
    if name in new_dict:
        new_dict[name] = str( int(new_dict[name]) + int(value) )
    # the name doesn't exist
    # simply append it with the value
    else:
        new_dict[name] = value
As explained by @Soviut, you can use range() with a step of 2 to reach each word directly. Since the values in your list are stored as strings, I have converted them to integers.
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
data = {}
for i in range(0, len(lista), 2): # step through the indexes two at a time, i.e. 0,2,4,...
    if lista[i] in data.keys(): # check whether the word is already a key in the dictionary
        data[lista[i]] = int(data[lista[i]]) + int(lista[i+1])
    else:
        data[lista[i]] = int(lista[i+1])
print(data)
Output
{'hello': 2, 'go': 14, 'sit': 6, 'line': 3, 'play': 0}
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
dictionary = {}
for keyword, value in zip(*[iter(lista)]*2): # iterate two at a time
    if keyword in dictionary: # if the key is present, add to the existing sum
        dictionary[keyword] = dictionary[keyword] + int(value)
    else: # if not present, set the value for the first time
        dictionary[keyword] = int(value)
print(dictionary)
Output:
{'hello': 2, 'go': 14, 'sit': 6, 'line': 3, 'play': 0}
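The zip(*[iter(lista)]*2) idiom used above can look opaque, so here is a small sketch of what it produces on a shortened list. The names it, it2 and pairs are illustrative.

```python
lista = ['hello', '2', 'go', '5', 'sit', '4']

it = iter(lista)
# [it] * 2 is a list holding the same iterator twice, so zip() pulls
# two consecutive items from it for every tuple it builds.
pairs = list(zip(*[it] * 2))
print(pairs)

# zip(it2, it2) is an equivalent, arguably more readable spelling.
it2 = iter(lista)
same_pairs = list(zip(it2, it2))
print(same_pairs)
```

If the list had an odd number of items, zip() would silently drop the final unpaired word.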
Another solution using iter(), itertools.zip_longest() and itertools.groupby() functions:
import itertools
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
it = iter(lista)
d = {k: sum(int(_[1]) for _ in g)
     for k,g in itertools.groupby(sorted(itertools.zip_longest(it, it)), key=lambda x: x[0])}
print(d)
The output:
{'line': 3, 'sit': 6, 'hello': 2, 'play': 0, 'go': 14}
You can use range(start, end, step) to get the end index of each pair, slice the list into pairs, and then use Counter() from collections to sum the values of duplicate keys.
Here yourdict will be {'go': 14, 'line': 3, 'sit': 6, 'play': 0, 'hello': 2}
from collections import Counter
counter_obj = Counter()
lista = ['hello','2','go','5','sit','4','line','3','sit','2', 'go','9','play','0']
items, start = [], 0
for end in range(2,len(lista)+2,2):
    # slice out one [word, value] pair
    items.append(lista[start:end])
    start = end
for item in items:
    counter_obj[item[0]] += int(item[1])
yourdict = dict(counter_obj)
print(yourdict)

Removing dictionaries from a list on the basis of duplicate value of key

I am new to Python. Suppose I have the following list of dictionaries:
mydictList= [{'a':1,'b':2,'c':3},{'a':2,'b':2,'c':4},{'a':2,'b':3,'c':4}]
From the above list, I want to remove dictionaries whose value for key b has already appeared, keeping the first one. So the resultant list should be:
mydictList = [{'a':1,'b':2,'c':3},{'a':2,'b':3,'c':4}]
You can create a new dictionary based on the value of b, iterating the mydictList backwards (since you want to retain the first value of b), and get only the values in the dictionary, like this
>>> {item['b'] : item for item in reversed(mydictList)}.values()
[{'a': 1, 'c': 3, 'b': 2}, {'a': 2, 'c': 4, 'b': 3}]
If you are using Python 3.x, you might want to use list function over the dictionary values, like this
>>> list({item['b'] : item for item in reversed(mydictList)}.values())
Note: This solution may not maintain the order of the dictionaries.
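If order matters, one possible variant is sketched below. It assumes Python 3.7+, where dicts preserve insertion order, and uses dict.setdefault instead of reversing the list; the names first_by_b and unique are illustrative.

```python
mydictList = [{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 2, 'c': 4}, {'a': 2, 'b': 3, 'c': 4}]

first_by_b = {}
for item in mydictList:
    # setdefault stores the item only if this b value has not been seen yet,
    # so the first dictionary for each b value is the one that is kept.
    first_by_b.setdefault(item['b'], item)

unique = list(first_by_b.values())
print(unique)
```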
First, sort the list by b-values (Python's sorting algorithm is stable, so dictionaries with identical b values will retain their relative order).
from operator import itemgetter
tmp1 = sorted(mydictList, key=itemgetter('b'))
Next, use itertools.groupby to create subiterators that iterate over dictionaries with the same b value.
import itertools
tmp2 = itertools.groupby(tmp1, key=itemgetter('b'))
Finally, create a new list that contains only the first element of each subiterator:
# Each x is a tuple (some-b-value, iterator-over-dicts-with-b-equal-some-b-value)
newdictList = [ next(x[1]) for x in tmp2 ]
Putting it all together:
from itertools import groupby
from operator import itemgetter
by_b = itemgetter('b')
newdictList = [ next(x[1]) for x in groupby(sorted(mydictList, key=by_b), key=by_b) ]
A very straightforward approach can go something like this:
mydictList= [{'a':1,'b':2,'c':3},{'a':2,'b':2,'c':4},{'a':2,'b':3,'c':4}]
b_set = set()
new_list = []
for d in mydictList:
    if d['b'] not in b_set:
        new_list.append(d)
        b_set.add(d['b'])
Result:
>>> new_list
[{'a': 1, 'c': 3, 'b': 2}, {'a': 2, 'c': 4, 'b': 3}]
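The same seen-set idea can be generalized to any key. The helper below is a hypothetical sketch (the name unique_by and its parameters are not from the thread), and it assumes every dictionary contains the key and that the values are hashable.

```python
def unique_by(dicts, key):
    """Keep the first dictionary seen for each distinct value of `key`."""
    seen = set()
    result = []
    for d in dicts:
        value = d[key]
        if value not in seen:
            seen.add(value)
            result.append(d)
    return result

mydictList = [{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 2, 'c': 4}, {'a': 2, 'b': 3, 'c': 4}]
by_b = unique_by(mydictList, 'b')
by_a = unique_by(mydictList, 'a')
print(by_b)
print(by_a)
```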

Dictionary comprehension to build list of lists: referencing the current value for a key during comprehension

I am trying to create a list of lists based on hashes. That is, I want a list of lists of items that hash the same. Is this possible in a single-line comprehension?
Here is the simple code that works without comprehensions:
from collections import defaultdict
def list_of_lists(items):
    items_by_hash = defaultdict(list)
    for item in items:
        items_by_hash[hash(item)].append(item)
    return items_by_hash.values()
For example, let's say we have this simple hash function:
def hash(string):
    import __builtin__
    return __builtin__.hash(string) % 10
Then,
>>> l = ['sam', 'nick', 'nathan', 'mike']
>>> [hash(x) for x in l]
[4, 3, 2, 2]
>>>
>>> list_of_lists(l)
[['nathan', 'mike'], ['nick'], ['sam']]
Is there any way I could do this in a comprehension? I need to be able to reference the dictionary I'm building mid-comprehension, in order to append the next item to the list-value.
This is the best I've got, but it doesn't work:
>>> { hash(word) : [word] for word in l }.values()
[['mike'], ['nick'], ['sam']]
It obviously creates a new list every time which is not what I want. I want something like
{ hash(word) : __this__[hash(word)] + [word] for word in l }.values()
or
>>> dict([ (hash(word), word) for word in l ])
{2: 'mike', 3: 'nick', 4: 'sam'}
but this causes the same problem.
[[y[1] for y in x[1]]
 for x in itertools.groupby(sorted((hash(y), y) for y in items),
                            operator.itemgetter(0))]
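The groupby expression above can be compared with the loop version in a runnable sketch. Because built-in string hashing varies between Python 3 runs, the sketch swaps in a hypothetical deterministic function, bucket(), as a stand-in for the question's hash(); all other names are illustrative too.

```python
import itertools
import operator
from collections import defaultdict

def bucket(word):
    # Hypothetical stand-in for the question's hash(): deterministic,
    # unlike built-in str hashing, which varies between Python 3 runs.
    return len(word) % 10

items = ['sam', 'nick', 'nathan', 'mike']

# Loop version, as in the question.
by_bucket = defaultdict(list)
for item in items:
    by_bucket[bucket(item)].append(item)
loop_groups = list(by_bucket.values())

# Single-expression version: sort (bucket, item) pairs, then group by bucket.
pairs = sorted((bucket(item), item) for item in items)
expr_groups = [[item for _, item in group]
               for _, group in itertools.groupby(pairs, operator.itemgetter(0))]

print(loop_groups)
print(expr_groups)
```

Both versions produce the same groups, but sorting the pairs orders the items inside each group alphabetically, whereas the loop keeps them in input order.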
