Dict comprehension to group words by first letter

Does anyone know how to avoid the error <generator object dictionary.<locals>.<genexpr> at 0x000001D295344580> that I get while trying to create a dict comprehension that generates specific keys: values?
For example, if we have a list:
words = ["hallo" , "hell", "hype", "empty", "full", "charge", "hey"]
I want to create a dictionary
{starting character of the item in list : list of items in words that start with the specific character}
so, for my example, the expected output would be:
{"h": ["hallo", "hell", "hype", "hey"], "e": ["empty"], "f": ["full"], "c": ["charge"]}
My code:
{(chr(c) for c in range(ord("a"), ord("z")+1)):
[word for word in words if word.startswith("a")]}
The same happens if I try to generalize the word.startswith() statement.

Your current solution - and the corrected version - are rather inefficient, as they iterate on all letters, and for each letter, on all words, so 26*(number of words) loops.
You can do it by iterating only once on the list of words, by creating the dictionary key and the list that will contain the words on the fly. A defaultdict makes this easy:
from collections import defaultdict
words = ["hallo" , "hell", "hype", "empty", "full", "charge", "hey"]
out = defaultdict(list)
for word in words:
    out[word[0]].append(word)
print(out)
# defaultdict(<class 'list'>, {'h': ['hallo', 'hell', 'hype', 'hey'], 'e': ['empty'], 'f': ['full'], 'c': ['charge']})
That takes just 7 iterations instead of 26*7 (and as many tests), and the code is simpler.
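If you'd rather end up with a plain dict, setdefault gives the same single pass. This is only a sketch: the helper name group_by_first_letter is made up for illustration, and skipping empty strings is my assumption about what you'd want (word[0] would raise IndexError on them).

```python
def group_by_first_letter(words):
    """Group words by first character in one pass (hypothetical helper)."""
    groups = {}
    for word in words:
        if not word:
            # Assumption: empty strings are skipped, since word[0] would raise IndexError.
            continue
        # setdefault returns the list already stored for the key,
        # or stores a new empty list and returns that.
        groups.setdefault(word[0], []).append(word)
    return groups


words = ["hallo", "hell", "hype", "empty", "full", "charge", "hey"]
result = group_by_first_letter(words + [""])
print(result)
```

The keys come out in the order in which each first letter is first seen.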

You can do this easily with itertools.groupby:
>>> from itertools import groupby
>>> {k: list(v) for k, v in groupby(sorted(words), lambda s: s[0])}
{'c': ['charge'], 'e': ['empty'], 'f': ['full'], 'h': ['hallo', 'hell', 'hey', 'hype']}
Once the words are sorted in ordinary lexicographic order, it's safe to group them by their first letters. (Sorting by first letter only would be sufficient as well.)
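To see why the sort matters: groupby only groups consecutive items, so on unsorted input the same letter can come up as several separate runs. A small sketch with a shortened word list (the first_letter helper is just for illustration):

```python
from itertools import groupby


def first_letter(s):
    return s[0]


words = ["hallo", "empty", "hell", "full", "hype"]

# Without sorting, "h" appears as three separate runs, and each later run
# overwrites the earlier one in the dict comprehension.
unsorted_groups = {k: list(v) for k, v in groupby(words, first_letter)}

# Sorting by the same key makes each letter one contiguous run. sorted() is
# stable, so words keep their original relative order within a letter.
sorted_groups = {
    k: list(v) for k, v in groupby(sorted(words, key=first_letter), first_letter)
}

print(unsorted_groups)  # {'h': ['hype'], 'e': ['empty'], 'f': ['full']}
print(sorted_groups)    # {'e': ['empty'], 'f': ['full'], 'h': ['hallo', 'hell', 'hype']}
```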

That's not an error, it's an object that you inserted as a key. It seems like you're confused about the syntax for dict comprehensions. The generator expression you wrote ((chr(c) for c in ...)) doesn't expand, it gets used as the key instead. In fact, what you wrote isn't even a dict comprehension.
To do what you want, the loop needs to be after the key-value pair.
{chr(c): [word for word in words if word.startswith(chr(c))]
for c in range(ord("a"), ord("z")+1)}
For comparison, here's a loose version of the syntax:
{key: value for x in iterable}
This is the naive solution. See Thierry's and chepner's answers for the better solutions. With the naive one, you'd also need to remove the empty lists:
>>> d = {chr(c): [word for word in words if word.startswith(chr(c))]
... for c in range(ord("a"), ord("z")+1)}
>>> d
{'a': [], 'b': [], 'c': ['charge'], 'd': [], 'e': ['empty'], 'f': ['full'], 'g': [], 'h': ['hallo', 'hell', 'hype', 'hey'], 'i': [], 'j': [], 'k': [], 'l': [], 'm': [], 'n': [], 'o': [], 'p': [], 'q': [], 'r': [], 's': [], 't': [], 'u': [], 'v': [], 'w': [], 'x': [], 'y': [], 'z': []}
>>> {k: v for k, v in d.items() if v}
{'c': ['charge'], 'e': ['empty'], 'f': ['full'], 'h': ['hallo', 'hell', 'hype', 'hey']}
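If you want to stay with a comprehension but skip the filtering step, one option is to loop only over the first letters that actually occur. This is a sketch, not a performance fix: it still scans the whole word list once per distinct letter.

```python
words = ["hallo", "hell", "hype", "empty", "full", "charge", "hey"]

# Collect only the first letters that actually occur, so no empty lists are produced.
letters = sorted({word[0] for word in words})

grouped = {letter: [w for w in words if w.startswith(letter)] for letter in letters}
print(grouped)
# {'c': ['charge'], 'e': ['empty'], 'f': ['full'], 'h': ['hallo', 'hell', 'hype', 'hey']}
```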

Related

How to initialize defaultdict with keys?

I have a dictionary of lists, and it should be initialized with default keys. I guess, the code below is not good (I mean, it works, but I don't feel that it is written in the pythonic way):
d = {'a' : [], 'b' : [], 'c' : []}
So I want to use something more pythonic like defaultdict:
d = defaultdict(list)
However, every tutorial that I've seen sets the new keys dynamically. In my case all the keys should be defined from the start. I'm parsing other data structures, and I add a value to my dictionary only if the key found in the structure is also present in my dictionary.
How can I set the default keys?

From the comments, I'm assuming you want a dictionary that fits the following conditions:
- It is initialized with a set of keys, each with an empty list as its value.
- It has defaultdict behavior, creating an empty list for keys that don't exist yet.
@Aaron_lab has the right method, but there's a slightly cleaner way:
d = defaultdict(list,{ k:[] for k in ('a','b','c') })
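One thing to keep in mind with this approach, given that the question wants to add values only for keys that already exist: a membership test leaves the defaultdict alone, but indexing a missing key adds it. A short sketch, where the incoming pairs are invented sample data:

```python
from collections import defaultdict

d = defaultdict(list, {k: [] for k in ('a', 'b', 'c')})

# Hypothetical incoming data: only keys already present should be collected.
incoming = [('a', 1), ('x', 2), ('c', 3)]
for key, value in incoming:
    if key in d:  # a membership test does not create the key
        d[key].append(value)

collected = dict(d)
print(collected)  # {'a': [1], 'b': [], 'c': [3]}

# Indexing a missing key, by contrast, silently adds it.
d['x']
print('x' in d)  # True
```

So if stray keys must never appear, either guard every access with `in` or use a plain dict.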

That's already reasonable but you can shorten that up a bit with a dict comprehension that uses a standard list of keys.
>>> standard_keys = ['a', 'b', 'c']
>>> d1 = {key:[] for key in standard_keys}
>>> d2 = {key:[] for key in standard_keys}
>>> ...

If you're going to pre-initialize to empty lists, there is no need for a defaultdict. Simple dict-comprehension gets the job done clearly and cleanly:
>>> {k : [] for k in ['a', 'b', 'c']}
{'a': [], 'b': [], 'c': []}

If you have a closed set of keys (['a', 'b', 'c'] in your example) that you know you'll use, you can definitely use the answers above.
BUT...
dd = defaultdict(list) gives you much more than d = {'a':[], 'b':[], 'c':[]}.
You can append to "not existing" keys in a defaultdict:
>>> dd['d'].append(5)
>>> dd
defaultdict(<class 'list'>, {'d': [5]})
whereas if you do the same with the plain dict:
>>> d['d'].append(5)  # you'll face KeyError
KeyError: 'd'
I'd recommend doing something like:
>>> d = {'a' : [], 'b' : [], 'c' : []}
>>> default_d = defaultdict(list, **d)
Now you have a dict holding your 3 keys ['a', 'b', 'c'] with empty lists as values, and you can also append to other keys without explicitly writing d['new_key'] = [] before appending.

You can have a function defined which will return you a dict with preset keys.
def get_preset_dict(keys=('a', 'b', 'c'), values=None):
    d = {}
    if values is None:
        # one fresh list per key; [[]] * len(keys) would reuse a single list
        values = [[] for _ in keys]
    if len(keys) != len(values):
        raise ValueError('unequal lengths')
    for key, value in zip(keys, values):
        d[key] = value
    return d
In [8]: get_preset_dict()
Out[8]: {'a': [], 'b': [], 'c': []}
In [18]: get_preset_dict(keys=['a','e','i','o','u'])
Out[18]: {'a': [], 'e': [], 'i': [], 'o': [], 'u': []}
In [19]: get_preset_dict(keys=['a','e','i','o','u'],values=[[1],[2,2,2],[3],[4,2],[5]])
Out[19]: {'a': [1], 'e': [2, 2, 2], 'i': [3], 'o': [4, 2], 'u': [5]}
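A caveat when pre-filling values with empty lists: `[[]] * n` repeats a reference to one list rather than creating n lists, so appending through one key shows up under every key. A quick sketch of the difference:

```python
keys = ['a', 'b', 'c']

# [[]] * n repeats a reference to one list, so all values are the same object.
aliased = dict(zip(keys, [[]] * len(keys)))
aliased['a'].append(1)

# A list comprehension builds a new list on every iteration.
independent = dict(zip(keys, [[] for _ in keys]))
independent['a'].append(1)

print(aliased)      # {'a': [1], 'b': [1], 'c': [1]}
print(independent)  # {'a': [1], 'b': [], 'c': []}
```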

from collections import defaultdict
list(map((data := defaultdict(list)).__getitem__, 'abcde'))
data
Out[3]: defaultdict(list, {'a': [], 'b': [], 'c': [], 'd': [], 'e': []})

Adding multiple key,value pair using dictionary comprehension

For a list of dictionaries:
sample_dict = [
{'a': 'woot', 'b': 'nope', 'c': 'duh', 'd': 'rough', 'e': '1'},
{'a': 'coot', 'b': 'nope', 'c': 'ruh', 'd': 'rough', 'e': '2'},
{'a': 'doot', 'b': 'nope', 'c': 'suh', 'd': 'rough', 'e': '3'},
{'a': 'soot', 'b': 'nope', 'c': 'fuh', 'd': 'rough', 'e': '4'},
{'a': 'toot', 'b': 'nope', 'c': 'cuh', 'd': 'rough', 'e': '1'}
]
How do I make a separate dictionary that contains all the key-value pairs that match a certain key? With a list comprehension I created a list of all the key-value pairs like this:
container = [[key,val] for s in sample_dict for key,val in s.iteritems() if key == 'a']
Now the container gave me
[['a', 'woot'], ['a', 'coot'], ['a', 'doot'], ['a', 'soot'], ['a', 'toot']]
Which is all fine... but if I do the same with a dict comprehension, I get only a single key-value pair. Why does this happen?
container = {key : val for s in sample_dict for key,val in s.iteritems() if key == 'a'}
The container gives only a single element
{'a': 'toot'}
I want something like
{'a': ['woot','coot','doot','soot','toot']}
How do I do this with minimal change to the code above ?

You are generating multiple key-value pairs with the same key, and a dictionary will only ever store unique keys.
If you wanted just one key, you'd use a dictionary with a list comprehension:
container = {'a': [s['a'] for s in sample_dict if 'a' in s]}
Note that there is no need to iterate over the nested dictionaries in sample_dict if all you wanted was a specific key; in the above I simply test if the key exists ('a' in s) and extract the value for that key with s['a']. This is much faster than looping over all the keys.
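If you later need the same thing for every key rather than just 'a', a defaultdict(list) collects all of them in one pass. A minimal sketch on a shortened version of the sample data (the name collected is just for illustration):

```python
from collections import defaultdict

sample_dict = [
    {'a': 'woot', 'b': 'nope', 'e': '1'},
    {'a': 'coot', 'b': 'nope', 'e': '2'},
    {'a': 'doot', 'e': '3'},
]

collected = defaultdict(list)
for record in sample_dict:
    for key, value in record.items():
        # Each key accumulates its values in the order the records appear.
        collected[key].append(value)

print(dict(collected))
# {'a': ['woot', 'coot', 'doot'], 'b': ['nope', 'nope'], 'e': ['1', '2', '3']}
```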

Another option:
filter = lambda arr, x: { x: [ e.get(x) for e in arr] }
So, from here, you can construct the dict based on the original array and the key
filter(sample_dict, 'a')
# {'a': ['woot', 'coot', 'doot', 'soot', 'toot']}

Rearranging levels of a nested dictionary in python

Is there a library that would help me rearrange the levels of a nested dictionary?
E.g. from this:
{1:{"A":"i","B":"ii","C":"i"},2:{"B":"i","C":"ii"},3:{"A":"iii"}}
To this:
{"A":{1:"i",3:"iii"},"B":{1:"ii",2:"i"},"C":{1:"i",2:"ii"}}
i.e. the first two levels of a three-level dictionary are swapped. So instead of 1 mapping to A and 3 mapping to A, we have A mapping to 1 and 3.
The solution should be practical for an arbitrary depth and move from one level to any other within.

>>> d = {1:{"A":"i","B":"ii","C":"i"},2:{"B":"i","C":"ii"},3:{"A":"iii"}}
>>> keys = ['A','B','C']
>>> e = {key:{k:d[k][key] for k in d if key in d[k]} for key in keys}
>>> e
{'C': {1: 'i', 2: 'ii'}, 'B': {1: 'ii', 2: 'i'}, 'A': {1: 'i', 3: 'iii'}}
thank god for dict comprehension
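If the inner keys aren't known up front, the same swap can be done with two loops instead of a hard-coded key list. A sketch only: swap_outer_levels is a made-up helper, and it handles just the top two levels.

```python
def swap_outer_levels(nested):
    """Turn {outer: {inner: value}} into {inner: {outer: value}} (hypothetical helper)."""
    swapped = {}
    for outer, inner_map in nested.items():
        for inner, value in inner_map.items():
            # Create the per-inner-key dict on first use, then file the value under outer.
            swapped.setdefault(inner, {})[outer] = value
    return swapped


d = {1: {"A": "i", "B": "ii", "C": "i"}, 2: {"B": "i", "C": "ii"}, 3: {"A": "iii"}}
e = swap_outer_levels(d)
print(e)
# {'A': {1: 'i', 3: 'iii'}, 'B': {1: 'ii', 2: 'i'}, 'C': {1: 'i', 2: 'ii'}}
```

Applying it twice gives back a dict equal to the original, which is a handy sanity check.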

One way to think about this would be to consider your data as a (named) array and to take the transpose. An easy way to achieve this would be to use the data analysis package Pandas:
import pandas as pd
df = pd.DataFrame({1: {"A":"i","B":"ii","C":"i"},
2: {"B":"i","C":"ii"},
3: {"A":"iii"}})
df.transpose().to_dict()
{'A': {1: 'i', 2: nan, 3: 'iii'},
'B': {1: 'ii', 2: 'i', 3: nan},
'C': {1: 'i', 2: 'ii', 3: nan}}

I don't really care about performance for my application of this, so I haven't bothered checking how efficient it is. It's based on bubble sort, so my guess is ~O(N^2).
Maybe this is convoluted, but essentially the code below works as follows:
- dict_swap_index takes a nested dictionary and a list. The list should have the format [i,j,k], and its length should equal the depth of the dictionary. Each element gives the position the corresponding level should move to, e.g. [2,0,1] means move level 0 to position 2, level 1 to position 0 and level 2 to position 1.
- This function performs a bubble sort on the order list and dict_, calling deep_swap to swap the levels of the dictionary that are being swapped in the order list.
- deep_swap recursively calls itself to find the level provided and returns a dictionary which has been re-ordered.
- swap_two_level_dict is called to swap two adjacent levels of a dictionary.
Essentially the idea is to perform a bubble sort on the dictionary, but instead of swapping elements in a list, swap levels in a dictionary.
from collections import defaultdict
from copy import deepcopy

def dict_swap_index(dict_, order):
    # bubble sort on order; every swap of neighbours also swaps the matching dict levels
    for pas_no in range(len(order)-1, 0, -1):
        for i in range(pas_no):
            if order[i] > order[i+1]:
                temp = order[i]
                order[i] = order[i+1]
                order[i+1] = temp
                dict_ = deep_swap(dict_, i)
    return dict_, order

def deep_swap(dict_, level):
    # descend to the requested level, then swap it with the level below
    dict_ = deepcopy(dict_)
    if level == 0:
        dict_ = swap_two_level_dict(dict_)
    else:
        for key in dict_:
            dict_[key] = deep_swap(dict_[key], level-1)
    return dict_

def swap_two_level_dict(a):
    # {key1: {key2: value}} -> {key2: {key1: value}}
    b = defaultdict(dict)
    for key1, value1 in a.items():
        for key2, value2 in value1.items():
            b[key2].update({key1: value2})
    return b
e.g.
test_dict = {'a': {'c': {'e':0, 'f':1}, 'd': {'e':2,'f':3}}, 'b': {'c': {'g':4,'h':5}, 'd': {'j':6,'k':7}}}
result = dict_swap_index(test_dict, [2,0,1])
result
(defaultdict(dict,
{'c': defaultdict(dict,
{'e': {'a': 0},
'f': {'a': 1},
'g': {'b': 4},
'h': {'b': 5}}),
'd': defaultdict(dict,
{'e': {'a': 2},
'f': {'a': 3},
'j': {'b': 6},
'k': {'b': 7}})}),
[0, 1, 2])

Translating characters in a string to multiple characters using Python

I have a list of strings with prefix characters representing the multiplying factor for the number. So if I have data like:
data = ['101n', '100m', '100.100f']
I want to use the dictionary
prefix_dict = {'y': 'e-24', 'z': 'e-21', 'a': 'e-18', 'f': 'e-15', 'p': 'e-12',
'n': 'e-9', 'u': 'e-6', 'm': 'e-3', 'c': 'e-2', 'd': 'e-1',
'da': 'e1', 'h': 'e2', 'k': 'e3', 'M': 'e6', 'G': 'e9',
'T': 'e12', 'P': 'e15', 'E': 'e18', 'Z': 'e21', 'Y': 'e24'}
to insert the corresponding strings. The other questions similar to mine all translate one character into another single character. Is there a way to use the translate function to translate one character into multiple characters, or should I be approaching this differently?

You can use regex for this, this works for 'da' as well:
>>> data = ['101n', '100m', '100.100f', '1d', '1da']
>>> import re
>>> r = re.compile(r'([a-zA-Z]+)$')
>>> for d in data:
...     print(r.sub(lambda m: prefix_dict.get(m.group(1), m.group(1)), d))
...
101e-9
100e-3
100.100e-15
1e-1
1e1
And a non-regex version using itertools.takewhile:
>>> from itertools import takewhile
>>> def find_suffix(s):
...     return ''.join(takewhile(str.isalpha, s[::-1]))[::-1]
...
>>> for d in data:
...     sfx = find_suffix(d)
...     print(d.replace(sfx, prefix_dict.get(sfx, sfx)))
...
101e-9
100e-3
100.100e-15
1e-1
1e1
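Once the suffix has been replaced by its exponent, the string is in a form that float() accepts, so getting actual numbers is one more step. A sketch under some assumptions: to_float is a made-up helper, the prefix table is cut down to a few entries for the demo, and an unrecognized suffix is simply left in place so that float() raises ValueError.

```python
import re

# Subset of the prefix table from the question, enough for the demo.
prefix_dict = {'n': 'e-9', 'm': 'e-3', 'f': 'e-15', 'd': 'e-1', 'da': 'e1', 'k': 'e3'}

suffix_re = re.compile(r'([a-zA-Z]+)$')


def to_float(text):
    """Replace a trailing SI prefix with its exponent and parse the result (hypothetical helper)."""
    expanded = suffix_re.sub(lambda m: prefix_dict.get(m.group(1), m.group(1)), text)
    return float(expanded)  # raises ValueError if the suffix was not recognized


values = [to_float(s) for s in ['101n', '100m', '1da', '2k']]
print(values)
```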

Try:
for i, entry in enumerate(data):
    for key, value in sorted(prefix_dict.items(),
                             key=lambda x: len(x[0]), reverse=True):
        # need to sort the dictionary so that 'da' always comes before 'a'
        if key in entry:
            data[i] = entry.replace(key, value)
            break  # stop at the longest match, or 'd' and 'a' would overwrite the 'da' result
print(data)
This works for arbitrary combinations in the dictionary and the data. If the dictionary keys are always exactly 1 character long, you have lots of other solutions posted here.

import re
data = ['101da', '100m', '100.100f']
prefix_dict = {'y': 'e-24', 'z': 'e-21', 'a': 'e-18', 'f': 'e-15', 'p': 'e-12',
'n': 'e-9', 'u': 'e-6', 'm': 'e-3', 'c': 'e-2', 'd': 'e-1',
'da': 'e1', 'h': 'e2', 'k': 'e3', 'M': 'e6', 'G': 'e9',
'T': 'e12', 'P': 'e15', 'E': 'e18', 'Z': 'e21', 'Y': 'e24'}
comp = re.compile(r"[^A-Za-z]")
for ind, d in enumerate(data):
    pre = comp.sub("", d)  # strip everything that is not a letter
    data[ind] = d.replace(pre, prefix_dict.get(pre, pre))
print(data)
['101e1', '100e-3', '100.100e-15']
You can use pre = ''.join(x for x in d if x.isalpha()) instead of using re
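On the original question about translate: in Python 3, str.translate can map one character to a string of any length if the table is built from a dict with str.maketrans. The catch is that the keys must be single characters, so 'da' cannot go in the table. A sketch with a shortened prefix table:

```python
prefix_dict = {'n': 'e-9', 'u': 'e-6', 'm': 'e-3', 'f': 'e-15', 'da': 'e1', 'k': 'e3'}

# str.maketrans only accepts keys of length 1, so multi-character prefixes
# such as 'da' are left out here and would need separate handling.
table = str.maketrans({k: v for k, v in prefix_dict.items() if len(k) == 1})

data = ['101n', '100m', '100.100f']
translated = [s.translate(table) for s in data]
print(translated)  # ['101e-9', '100e-3', '100.100e-15']
```

Note that translate replaces every matching character anywhere in the string, not just a trailing one, so this only fits data where letters appear nowhere else.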

Concatenate lists together

I have a dictionary where the value elements are lists:
d1={'A': [], 'C': ['SUV'], 'B': []}
I need to concatenate the values into a single list, skipping the lists that are empty.
Expected output:
o=['SUV']
Help is appreciated.

from itertools import chain
d1={'A': [], 'C': ['SUV'], 'B': []}
print(list(chain.from_iterable(d1.values())))  # on Python 2, d1.itervalues() avoids an intermediate list

>>> d1 = {'A': [], 'C': ['SUV'], 'B': []}
>>> [ele for lst in d1.values() for ele in lst]
['SUV']

You can use itertools.chain, but before Python 3.7 dicts did not guarantee any ordering, so the order of the result could be arbitrary (from 3.7 on, the values come out in insertion order). You may have to sort the dict by keys or values to get the desired result.
>>> d1={'A': [], 'C': ['SUV'], 'B': []}
>>> from itertools import chain
>>> list(chain(*d1.values())) # on Python 2, d1.itervalues() returns an iterator (more memory efficient)
['SUV']

>>> from functools import reduce  # reduce is a builtin only on Python 2
>>> from operator import add
>>> d1={'A': [], 'C': ['SUV'], 'B': []}
>>> reduce(add, d1.values())
['SUV']
Or a more comprehensive example:
>>> d2={'A': ["I","Don't","Drive"], 'C': ['SUV'], 'B': ["Boy"]}
>>> reduce(add, d2.values())
['I', "Don't", 'Drive', 'SUV', 'Boy']
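If the order of the result matters, one way to make it independent of how the dict was built is to walk the keys in sorted order. A small sketch (sorting by key is my assumption about the desired order):

```python
from itertools import chain

d2 = {'A': ["I", "Don't", "Drive"], 'C': ['SUV'], 'B': ["Boy"]}

# Concatenate the lists in key order rather than in the dict's own order.
ordered = list(chain.from_iterable(d2[key] for key in sorted(d2)))
print(ordered)  # ['I', "Don't", 'Drive', 'Boy', 'SUV']
```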
