Convert list into dict of prefix with different delimiters - python

I am trying to convert a list of items that have three unique prefixes (e.g. apple_, banana_, water_melon_) into a dict keyed by prefix.
The initial list looks like this
table_ids = ["apple_1", "apple_2", "apple_3", "banana_1", "banana_2", "banana_3", "water_melon_1", "water_melon_2", "water_melon_3"]
My desired outcome would look like this:
{"apple": ["_1", "_2", "_3"], "banana": ["_1", "_2", "_3"], "water_melon": ["_1", "_2", "_3"]}
I've tried this
prefixes = ["apple_", "banana_", "water_melon_"]
res =[[id for id in table_ids if(id.startswith(prefix))] for prefix in prefixes]
However, this creates a list of lists grouped by prefix, not a dict.

You can use str.rsplit and collections.defaultdict.
from collections import defaultdict
res = defaultdict(list)
for t in table_ids:
    prefix, suffix = t.rsplit('_', 1)  # split on the last underscore only
    res[prefix].append('_' + suffix)
print(res)
Output:
defaultdict(<class 'list'>, {'apple': ['_1', '_2', '_3'], 'banana': ['_1', '_2', '_3'], 'water_melon': ['_1', '_2', '_3']})

You can't do this with a list comprehension because you're trying to create a dict (not a list), and you can't do it with a dict comprehension efficiently because you can't determine which entries go in each sublist without iterating over the original list in its entirety.
Here's an example of how to do it by iterating over the list and appending to entries in a dictionary:
>>> table_ids = ["apple_1", "apple_2", "apple_3", "banana_1", "banana_2", "banana_3", "water_melon_1", "water_melon_2", "water_melon_3"]
>>> tables = {}
>>> for x in table_ids:
...     t, _, i = x.rpartition("_")
...     tables.setdefault(t, []).append("_" + i)
...
>>> tables
{'apple': ['_1', '_2', '_3'], 'banana': ['_1', '_2', '_3'], 'water_melon': ['_1', '_2', '_3']}
If you really wanted to do it in a nested dict/list comprehension, that'd look like:
>>> {t: ["_" + x.rpartition("_")[2] for x in table_ids if x.startswith(t)] for t in {x.rpartition("_")[0] for x in table_ids}}
{'apple': ['_1', '_2', '_3'], 'banana': ['_1', '_2', '_3'], 'water_melon': ['_1', '_2', '_3']}
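Note that the list comprehension inside the dict comprehension rescans the whole list once per prefix, so this is O(N*K) for K distinct prefixes (O(N^2) in the worst case), whereas the first version is O(N). The startswith test would also misgroup items if one prefix were itself the start of another.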
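If the prefixes are known up front, as in the question's prefixes list, the grouping can also be done by matching each item against them directly instead of splitting on the last underscore. The following is a minimal sketch rather than something taken from the answers above: group_by_known_prefix is a hypothetical helper name, and trying longer prefixes first is an assumption made so that overlapping prefixes do not misgroup items.

```python
from collections import defaultdict

def group_by_known_prefix(ids, prefixes):
    """Group ids under the longest matching prefix (hypothetical helper)."""
    # Try longer prefixes first so "water_melon_" would win over a shorter "water_".
    ordered = sorted(prefixes, key=len, reverse=True)
    groups = defaultdict(list)
    for item in ids:
        for prefix in ordered:
            if item.startswith(prefix):
                key = prefix.rstrip("_")  # "apple_" -> "apple"
                groups[key].append("_" + item[len(prefix):])
                break  # items matching no prefix are simply skipped
    return dict(groups)

table_ids = ["apple_1", "apple_2", "apple_3", "banana_1", "banana_2", "banana_3",
             "water_melon_1", "water_melon_2", "water_melon_3"]
prefixes = ["apple_", "banana_", "water_melon_"]
result = group_by_known_prefix(table_ids, prefixes)
print(result)
```

Returning a plain dict instead of the defaultdict avoids accidentally creating empty groups on later lookups.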

Related

Custom sort list of dictionaries by other dictionary or list

I have a list of dictionaries that I would like to order based on an external ordering (list, dictionary, whatever works). Let's say I have the following list:
list_a = [{"dylan": "alice"}, {"arnie": "charles"}, {"chelsea": "bob"}]
and I want to sort it like this
sorted_list_a = [{"arnie": "charles"}, {"chelsea": "bob"}, {"dylan": "alice"}]
I've tried to do it like this:
# list_a will have a variable number of dictionaries all with unique keys
# list_order will have all dictionary keys ordered, even if they don't appear in list_a
list_order = [{"arnie": 1}, {"britta": 2}, {"chelsea": 3}, {"dylan": 4}]
list_a.sort(key=lambda x: list_order[x.keys()])
but I get TypeError: list indices must be integers or slices, not dict_keys. I feel like I'm close, but I can't quite get to the end.
Try this:
def fun(x):
    k, = (x)  # unpack the single key of the dictionary
    for d in list_order:
        if k in d:
            return d[k]  # position of that key in the external ordering
res = sorted(list_a, key=fun)
print(res)
Output:
[{'arnie': 'charles'}, {'chelsea': 'bob'}, {'dylan': 'alice'}]
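Another option is to merge the dictionaries and sort the items alphabetically by key (note that this prints a list of tuples and ignores list_order):
l = [{"dylan": "alice"}, {"arnie": "charles"}, {"chelsea": "bob"}]
d = {}
for i in l:
    for x, y in i.items():
        d[x] = y
print(sorted(d.items(), key=lambda x: x[0]))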
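The TypeError in the question comes from indexing a list with x.keys(). One way around it, sketched here under the assumption that every dictionary in list_a has exactly one key and that the key appears somewhere in list_order, is to flatten list_order into a single lookup dict first. The names rank and sort_key are illustrative only.

```python
list_a = [{"dylan": "alice"}, {"arnie": "charles"}, {"chelsea": "bob"}]
list_order = [{"arnie": 1}, {"britta": 2}, {"chelsea": 3}, {"dylan": 4}]

# Flatten the list of one-entry dicts into a single name -> position lookup.
rank = {name: pos for entry in list_order for name, pos in entry.items()}

def sort_key(d):
    (name,) = d  # assumes exactly one key per dictionary
    return rank[name]  # raises KeyError if the key is missing from list_order

sorted_list_a = sorted(list_a, key=sort_key)
print(sorted_list_a)
```

Building the lookup once keeps each key computation a constant-time dict access instead of a scan over list_order.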

Count occurrence of strings in list of lists

I want to count how many times each string occurs in each inner list of a list of lists, and store the results in a list of dictionaries, with one dictionary of counts per inner list.
For example:
list = [['Sam','John','Alex','Sam','Alex'],['Max','Sam','Max']...]
and I want my list of dictionaries to be like:
count_list = [{'Sam':2,'Alex':2,'John':1}, {'Max':2, 'Sam':1}..]
I am iterating through each list to count the number of times each string has occurred and adding each result to a dict. But I end up with a different result every time, and never the correct values.
count_list = []
for l in list :
    d = {}
    for str in l:
        if str not in d:
            d[str] = l.count(str)
        count_list.append(d)
Any help would be useful. Thanks.
It would be easier to use collections.Counter() here:
>>> from collections import Counter
>>> lst = [["Sam", "John", "Alex", "Sam", "Alex"], ["Max", "Sam", "Max"]]
>>> list(map(Counter, lst))
[Counter({'Sam': 2, 'Alex': 2, 'John': 1}), Counter({'Max': 2, 'Sam': 1})]
You could also use a list comprehension instead of map() if that's easier to understand:
>>> [Counter(l) for l in lst]
[Counter({'Sam': 2, 'Alex': 2, 'John': 1}), Counter({'Max': 2, 'Sam': 1})]
Note: Counter is a subclass of dict, so you can treat them like normal dictionaries.
You can always cast to dict() if you want to as well:
>>> [dict(Counter(l)) for l in lst]
[{'Sam': 2, 'John': 1, 'Alex': 2}, {'Max': 2, 'Sam': 1}]
You should also not use list as a variable name, since it shadows the builtin function list().
Currently, you are doing the following:
count_list = []
for l in list :
    d = {}
    for str in l:
        if str not in d:
            d[str] = l.count(str)
        count_list.append(d)
Note that you are appending the dictionary for each string in the sub lists, rather than one dictionary per sub list.
Doing the following should address the issue:
count_list = []
for l in list :
    d = {}
    for str in l:
        if str not in d:
            d[str] = l.count(str)
    count_list.append(d)
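To make the difference concrete, here is a small sketch comparing the two placements of the append call. It assumes the original append sat inside the inner loop, as the explanation above describes; the variable names lst, names, name, buggy and fixed are chosen here to avoid shadowing the builtins list and str.

```python
lst = [["Sam", "John", "Alex", "Sam", "Alex"], ["Max", "Sam", "Max"]]

# Assumed original placement: append runs once per string in each sub list.
buggy = []
for names in lst:
    d = {}
    for name in names:
        if name not in d:
            d[name] = names.count(name)
        buggy.append(d)

# Corrected placement: append runs once per sub list.
fixed = []
for names in lst:
    d = {}
    for name in names:
        if name not in d:
            d[name] = names.count(name)
    fixed.append(d)

print(len(buggy))  # 8 entries, one per string (5 + 3)
print(len(fixed))  # 2 entries, one per sub list
print(fixed)
```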

Python: summing up list of dictionaries with different keys and same values

I have the following list of dictionaries with different keys and values. Every dictionary has only one key-value pair.
[{'a':1}, {'b':1}, {'a':1}, {'b':2}]
I need the following output preferably using list/dict comprehension
[{'a': 2, 'b': 3}]
I had a look at a similar question, but couldn't figure it out.
Any suggestions please?
You could use collections.Counter:
from collections import Counter
lst = [{'a':1}, {'b':1}, {'a':1}, {'b':2}]
c = Counter()
for dct in lst:
    c.update(dct)  # Counter.update adds to existing counts instead of replacing them
print(c)
# Counter({'b': 3, 'a': 2})
There is an answer in the question you linked to that works as is:
dict1 = [{'a':1}, {'b':1}, {'a':1}, {'b':2}]
final = {}
for d in dict1:
    for k in d.keys():
        final[k] = final.get(k,0) + d[k]
print(final)
Yields:
{'a': 2, 'b': 3}
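Since the question asked for a list/dict comprehension, the same accumulation can also be sketched that way. This is only an illustration: the names lst, keys and result are assumptions, and the comprehension makes one pass over the list per distinct key, so the loop above is the cheaper option for many keys.

```python
lst = [{'a': 1}, {'b': 1}, {'a': 1}, {'b': 2}]

# Collect every key that appears in any of the dictionaries.
keys = sorted({k for d in lst for k in d})

# For each key, sum its value across all dictionaries (0 where it is absent).
result = [{k: sum(d.get(k, 0) for d in lst) for k in keys}]
print(result)
```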
This can also be done in a single expression using reduce. In Python 3, reduce is no longer a builtin and has to be imported from functools.
To understand how this works, you have to understand how reduce works. Essentially, it lets you define an operation that takes two elements from the list and reduces them to a single element. It then applies this operation cumulatively, from left to right, until the list has been reduced to a single element. Here is the one-line version:
from functools import reduce  # needed in Python 3
dict1 = [{'a':1}, {'b':1}, {'a':1}, {'b':2}]
print(reduce(lambda a, b: {k: a.setdefault(k, 0) + b.setdefault(k, 0) for k in set(a.keys()).union(b.keys())}, dict1))
In this case, the operation is defined as:
lambda a, b: {k: a.setdefault(k, 0) + b.setdefault(k, 0) for k in set(a.keys()).union(b.keys())}
Which can also be expressed as:
# a and b are two elements from the list. In this case they are dictionaries
def combine_dicts(a, b):
    output = {}
    for k in set(a.keys()).union(b.keys()): # the union of the keys in a and b
        # dict.setdefault returns the existing value; if the key doesn't exist,
        # it inserts the provided default into the dict and returns it
        output[k] = a.setdefault(k, 0) + b.setdefault(k, 0)
    return output
When this operation is applied to your list with reduce, you get the desired output:
>>> {'b': 3, 'a': 2}
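One side effect worth knowing: because setdefault inserts the default when a key is missing, the reduce version above adds zero-valued keys to the dictionaries in the input list. A variant that leaves the input untouched can be sketched with dict.get instead. This is an alternative offered here, not the original answer's code; the empty-dict initializer and the name total are assumptions.

```python
from functools import reduce

def combine_dicts(a, b):
    # dict.get returns the default without inserting it, so a and b are not modified.
    return {k: a.get(k, 0) + b.get(k, 0) for k in a.keys() | b.keys()}

dict1 = [{'a': 1}, {'b': 1}, {'a': 1}, {'b': 2}]

# The {} initializer also makes an empty input list return {} instead of raising.
total = reduce(combine_dicts, dict1, {})
print(total)
print(dict1)  # the input dictionaries are unchanged
```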

How to return a list of frequencies for a certain value in a dict

So I have a dict whose values are lists of strings. I want to count how many times each string appears across all of those lists. Perhaps my explanation was confusing so I'll provide an example:
function_name({'key1':['apple', 'orange'], 'key2':['orange', 'pear']})
>>> {'apple':1, 'orange':2, 'pear':1}
How would I create this function? I was thinking of somehow making a for loop like this:
count = 0
for fruit in dict_name:
    if food == 'apple'
        count = count + fruit
I am still unsure about how to format this especially how to count the values and collect them, thanks in advance for any suggestions!
You can un-nest the dict's values and apply a Counter.
>>> from collections import Counter
>>>
>>> d = {'key1':['apple', 'orange'], 'key2':['orange', 'pear']}
>>> Counter(v for sub in d.values() for v in sub)
Counter({'orange': 2, 'apple': 1, 'pear': 1})
If you don't like the nested generator comprehension, the un-nesting can be done with itertools.chain.from_iterable.
>>> from itertools import chain
>>> Counter(chain.from_iterable(d.values()))
Counter({'orange': 2, 'apple': 1, 'pear': 1})
Without imports and with traditional loops, it would look like this:
>>> result = {}
>>> for sub in d.values():
...     for v in sub:
...         result[v] = result.get(v, 0) + 1
...
>>> result
{'apple': 1, 'orange': 2, 'pear': 1}
Something like this should do the trick:
>>> from collections import Counter
>>> counts = Counter([item for sublist in your_dict.values() for item in sublist])
If you don't want to import any libraries you can do as follows:
function_name = {'key1':['apple', 'orange'], 'key2':['orange', 'pear']}
foobar = {}
for key, value in function_name.items():
    for element in value:
        if element in foobar:
            foobar[element] += 1
        else:
            foobar[element] = 1
print(foobar)
You check whether the element is already a key in the new dict 'foobar'. If it is, you increase its count by one. If it's not, you add the element as a key with a count of one. :)
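Since the question asks for a function, the Counter approach can be wrapped up as one. This is a minimal sketch; count_values is a hypothetical name, and converting to a plain dict at the end is a choice made here so that the return value matches the output shown in the question.

```python
from collections import Counter
from itertools import chain

def count_values(mapping):
    """Count how often each string occurs across all the lists in mapping."""
    # chain.from_iterable flattens the lists of values into one stream of strings.
    return dict(Counter(chain.from_iterable(mapping.values())))

counts = count_values({'key1': ['apple', 'orange'], 'key2': ['orange', 'pear']})
print(counts)
```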

Python dictionary comprehension: assign value to key, where value is a list

Example:
dictionary = {"key":[5, "string1"], "key2":[2, "string2"], "key3":[3, "string1"]}
After applying this dict comprehension:
another_dictionary = {key:value for (value,key) in dictionary.values()}
The result is like this (for a repeated key, only the last value seen is kept):
another_dictionary = {"string1": 3, "string2": 2}
In other words, it doesn't sum the integer values that share the same string.
=================================================================
Desired result:
another_dictionary = {"string1": 8, "string2": 2}
You can use collections.defaultdict for this:
from collections import defaultdict
dictionary = {"key":[5, "string1"], "key2":[2, "string2"], "key3":[3, "string1"]}
d = defaultdict(int)
for num, cat in dictionary.values():
    d[cat] += num
print(d)
defaultdict(<class 'int'>, {'string1': 8, 'string2': 2})
The reason your code does not work is you have not specified any summation or aggregation logic. This will require either some kind of grouping operation or, as here, iterating and adding to relevant items in a new dictionary.
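The difference between overwriting and accumulating can be seen side by side in a short sketch. It assumes Python 3.7 or later, where dicts keep insertion order, so the last value for a repeated key is the one that survives; the names overwritten and summed are illustrative.

```python
dictionary = {"key": [5, "string1"], "key2": [2, "string2"], "key3": [3, "string1"]}

# A plain dict comprehension assigns each key again, so later values replace earlier ones.
overwritten = {key: value for value, key in dictionary.values()}

# Accumulating explicitly adds each value to the running total for its key.
summed = {}
for value, key in dictionary.values():
    summed[key] = summed.get(key, 0) + value

print(overwritten)
print(summed)
```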
You can also use itertools.groupby:
import itertools
dictionary = {"key":[5, "string1"], "key2":[2, "string2"], "key3":[3, "string1"]}
d= {a:sum(c for _, [c, d] in b) for a, b in itertools.groupby(sorted(dictionary.items(), key=lambda x:x[-1][-1]), key=lambda x:x[-1][-1])}
Output:
{'string1': 8, 'string2': 2}
